Commun. Math. Phys. 214, 1 – 56 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Modular-Invariance of Trace Functions in Orbifold Theory and Generalized Moonshine
Chongying Dong¹, Haisheng Li², Geoffrey Mason¹
¹ Department of Mathematics, University of California, Santa Cruz, CA 95064, USA
² Department of Mathematical Sciences, Rutgers University, Camden, NJ 08102, USA
Received: 7 January 1999 / Accepted: 14 March 2000
Abstract: The goal of the present paper is to provide a mathematically rigorous foundation for certain aspects of the theory of rational orbifold models in conformal field theory, in other words the theory of rational vertex operator algebras and their automorphisms. Under a certain finiteness condition on a rational vertex operator algebra V which holds in all known examples, we determine the precise number of g-twisted sectors for any automorphism g of V of finite order. We prove that the trace functions and correlation functions associated with such twisted sectors are holomorphic functions in the upper half-plane and, under suitable conditions, afford a representation of the modular group of the type prescribed in string theory. We establish the rationality of conformal weights and central charge. In addition to conformal field theory itself, where our conclusions are required on physical grounds, there are applications to the generalized Moonshine conjectures of Conway–Norton–Queen and to equivariant elliptic cohomology.

Contents
1. Introduction
2. Vertex Operator Algebras
3. Twisted Modules
4. P-Functions and Q-Functions
5. The Space of 1-Point Functions on the Torus
6. The Differential Equations
7. Formal 1-Point Functions
8. Correlation Functions
9. An Existence Theorem for g-Twisted Modules
10. The Main Theorems
11. Rationality of Central Charge and Conformal Weights
12. Condition C2
13. Applications to the Moonshine Module

C. Dong: Supported by NSF grants DMS-9303374, DMS-9700923 and faculty research funds granted by the University of California at Santa Cruz.
H. Li: Supported by NSF grant DMS-9616630.
G. Mason: Supported by NSF grants DMS-9401272, DMS-9700909 and faculty research funds granted by the University of California at Santa Cruz.
1. Introduction

The goal of the present paper is to give a mathematically rigorous study of rational orbifold models; more precisely, we study the questions of the existence and modular-invariance of twisted sectors of rational vertex operator algebras. The idea of orbifolding a vertex operator algebra with respect to an automorphism, and in particular the introduction of twisted sectors, goes back to some of the earliest papers in the subject [FLM1, Le, FLM2, FLM3, DHVW, DHVV, FFR, D3], while the question of modular-invariance underlies the whole enterprise. Apart from a few exceptions such as [DGM], the physical literature tends to treat the existence and modular-invariance of twisted sectors as axioms, while mathematical work has been mainly limited to studying special cases such as affine algebras and lattice theories [KP, Le, FLM2] and fermionic orbifolds [DM1]. Under some mild finiteness conditions on a rational vertex operator algebra $V$ we will, among other things, establish the following:

(A) The precise number of inequivalent, simple $g$-twisted sectors that $V$ possesses.
(B) Modular-invariance (in a suitable sense) of the characters of twisted sectors.

In order to facilitate the following discussion we assume that the reader has a knowledge of the basic definitions concerning vertex operator algebras as given, for example, in [FLM3, FHL, DM1] and below in Sects. 2 and 3. Let us suppose that $V$ is a vertex operator algebra. There are several approaches to what it means for $V$ to be rational, each of them referring to finiteness properties of $V$ of various kinds (cf. [MS, HMV, AM] for example). Our own approach is as follows (cf. [DLM2, DLM3]): following [Z], an admissible $V$-module is a certain linear space

\[ M = \bigoplus_{n \in \mathbb{Z}}^{\infty} M(n) \tag{1.1} \]

which admits an action of $V$ by vertex operators which satisfy certain axioms, the most important of which is the Jacobi identity. However, the homogeneous subspaces $M(n)$ are not assumed to be finite-dimensional. As explained in [DLM2], this definition includes as a special case the idea of an (ordinary) $V$-module, which is a graded linear space of the shape

\[ M = \bigoplus_{n \in \mathbb{C}} M_n \tag{1.2} \]

such that $M$ admits an appropriate action of $V$ by vertex operators and such that each $M_n$ has finite dimension, $M_{\lambda+n} = 0$ for fixed $\lambda \in \mathbb{C}$ and sufficiently small integer $n$, and each $M_n$ is the $n$-eigenspace of the $L(0)$-operator. Although one ultimately wants to establish the rationality of the grading of such modules, it turns out to be convenient to allow the gradings to be a priori more general. One then has to prove that the grading is
rational after all. By the way, the terms “simple module” and “sector” are synonymous in this context.

We call $V$ a rational vertex operator algebra in case each admissible $V$-module is completely reducible, i.e., a direct sum of simple admissible modules. We have proved in [DLM3] that this assumption implies that $V$ has only finitely many inequivalent simple admissible modules; moreover, each such module is in fact an ordinary simple $V$-module. These results are special cases of results proved in (loc.cit.) in which one considers the same set-up, but relative to an automorphism $g$ of $V$ of finite order. Thus one has the notions of $g$-twisted $V$-module, admissible $g$-twisted $V$-module and $g$-rational vertex operator algebra $V$, the latter being a vertex operator algebra all of whose admissible $g$-twisted modules are completely reducible. We do not give the precise definitions here (cf. Sect. 3), noting only that if $g$ has order $T$ then a simple $g$-twisted $V$-module has a grading of the shape

\[ M = \bigoplus_{n=0}^{\infty} M_{\lambda + n/T} \tag{1.3} \]

with $M_\lambda \neq 0$ for some complex number $\lambda$ which is called the conformal weight of $M$. This is an important invariant of $M$ which plays a rôle in the theory of the Verlinde algebra, for example. The basic result (loc.cit.) is that a $g$-rational vertex operator algebra has only finitely many inequivalent, simple, admissible $g$-twisted modules, and each of them is an ordinary simple $g$-twisted module.

Although the theory of twisted modules includes that of ordinary modules, which corresponds to the case $g = 1$, it is nevertheless common and convenient to refer to the untwisted theory if $g = 1$, and to the twisted theory otherwise. It seems likely that if $V$ is rational then in fact it is $g$-rational for all automorphisms $g$ of finite order, but this is not known. Nevertheless, our first main result shows that there is a close relation between the untwisted and twisted theories. To explain this, we first recall standard notation for the vertex operator determined by $v$:

\[ Y(v, z) = \sum_{n \in \mathbb{Z}} v(n) z^{-n-1} \tag{1.4} \]

so that each $v(n)$ is a linear operator on $V$. Define $C_2(V)$ to be the subspace of $V$ spanned by the elements $u(-2)v$ for $u, v$ in $V$. We say that $V$ satisfies Condition $C_2$ if $C_2(V)$ has finite codimension in $V$. This is closely related to Zhu’s condition C [Z], which also includes the hypothesis that $V$ is a sum of highest weight modules for the Virasoro algebra. Zhu asserted in [Z], and we verify in Sect. 12, that condition $C_2$ holds for a number of the most familiar rational vertex operator algebras.

Next, any automorphism $h$ of $V$ induces in a natural way (cf. Sect. 4 of [DM2] and Sect. 3 below) a bijection from the set of isomorphism classes of (simple, admissible) $g$-twisted modules to the corresponding set of $hgh^{-1}$-twisted modules, so that if $g$ and $h$ commute then one may consider the set of $h$-stable $g$-twisted modules. In particular we have the set of $h$-stable ordinary (or untwisted) modules, which includes the vertex operator algebra $V$ itself. Our first set of results may now be stated as follows:

Theorem 1.1. Suppose that $V$ is a rational vertex operator algebra which satisfies Condition $C_2$. Then the following hold:
(i) The central charge of $V$ and the conformal weight of each simple $V$-module are rational numbers.
(ii) If $g$ is an automorphism of $V$ of finite order then the number of inequivalent, simple $g$-twisted $V$-modules is at most equal to the number of $g$-stable simple untwisted $V$-modules, and is at least 1 if $V$ is simple.
(iii) If $V$ is $g$-rational, the number of inequivalent, simple $g$-twisted $V$-modules is precisely the number of $g$-stable simple untwisted $V$-modules.

We actually prove an extension of this result in which $V$ is also assumed to be $g^i$-rational for all integers $i$ ($g$ as in (ii)), where the conclusion is that each simple $g^i$-twisted $V$-module has rational conformal weight. There is a second variation of this theme involving the important class of holomorphic vertex operator algebras. This means that $V$ is assumed to be both simple and rational; moreover, $V$ is assumed to have a unique simple module, which is necessarily the adjoint module $V$ itself. Familiar examples include the Moonshine Module [B2, FLM3] and vertex operator algebras associated to positive-definite, even, unimodular lattices [B2], [FLM3]. Proof of rationality and holomorphy of these particular vertex operator algebras can be found in [D1, D2] and [DLM2]. We establish

Theorem 1.2. Suppose that $V$ is a holomorphic vertex operator algebra which satisfies Condition $C_2$, and that $g$ is an automorphism of $V$ of finite order. Then the following hold:
(i) $V$ possesses a unique simple $g$-twisted $V$-module up to isomorphism, call it $V(g)$.
(ii) The conformal weight of $V(g)$ is a rational number.

We turn now to a discussion of the general question of modular-invariance. One is concerned with various trace functions, the most basic of which is the formal character of a (simple) $g$-twisted sector $M$. If $M$ has grading (1.3) we define the formal character as

\[ \mathrm{char}_q M = q^{\lambda - c/24} \sum_{n=0}^{\infty} \dim M_{\lambda + n/T}\, q^{n/T}, \tag{1.5} \]

where $c$ is the central charge and $q$ a formal variable. More generally, if $M$ is an $h$-stable $g$-twisted sector as before, then $h$ induces a linear map on $M$ which we denote by $\phi(h)$, and one may consider the corresponding graded trace

\[ Z_M(g, h) = q^{\lambda - c/24} \sum_{n=0}^{\infty} \mathrm{tr}_{M_{\lambda + n/T}}\, \phi(h)\, q^{n/T}. \tag{1.6} \]

The linear map $\phi(h)$ is only determined up to a nonzero scalar and therefore $Z_M(g, h)$ is also only defined up to a nonzero scalar. Similarly, up to a nonzero scalar $Z_M(g, h)$ is independent of the choice of $g$-twisted sector in the isomorphism class of $M$. The choice of such a scalar does not interfere with any of the proofs and results in the present paper.

As is well-known, it is important to consider these trace functions as special cases of so-called $(g, h)$ correlation functions. These may be defined for homogeneous elements $v \in V$ of weight $k$ and any pair of commuting elements $(g, h)$ via

\[ T_M(v, g, h) = q^{\lambda - c/24} \sum_{n=0}^{\infty} \mathrm{tr}_{M_{\lambda + n/T}} \bigl( v(k-1)\phi(h) \bigr)\, q^{n/T}. \tag{1.7} \]

Note that $v(k-1)$ induces a linear map on each homogeneous subspace of $M$. If we take $v$ to be the vacuum vector then (1.7) reduces to (1.6).
In the special case when $V$ is holomorphic and the twisted sector $V(g)$ is unique up to equivalence, we set

\[ Z_{V(g)}(g, h) = Z(g, h), \qquad T_{V(g)}(v, g, h) = T(v, g, h). \tag{1.8} \]

Note that in this situation, the uniqueness of $V(g)$ shows that $V(g)$ is $h$-stable whenever $g$ and $h$ commute. As discussed in [DM1–DM2], this allows us to consider $\phi$ as a projective representation of the centralizer $C(g)$ on $V(g)$; in particular (1.7) is defined for all commuting pairs $(g, h)$. Linearizing this projective representation to an ordinary representation of a covering group of $C(g)$ involves the choice of a 2-cocycle with coefficients in $\mathbb{C}^*$. Such a choice corresponds to a choice of the scalar in defining $\phi(h)$, as discussed above.

One can also consider these trace functions less formally. Taking $q$ to be the usual local parameter at infinity in the upper half-plane

\[ \mathfrak{h} = \{ \tau \in \mathbb{C} \mid \mathrm{im}\, \tau > 0 \}, \tag{1.9} \]

i.e., $q = q_\tau = e^{2\pi i \tau}$, we will see that the trace functions $T_M(v, g, h, \tau)$ converge to holomorphic functions in $\mathfrak{h}$ under suitable conditions on $V$. By extending $T_M$ linearly to the whole of $V$ one obtains a function

\[ T_M : V \times P(G) \times \mathfrak{h} \to \mathbb{P}^1(\mathbb{C}), \tag{1.10} \]

where $P(G)$ is the set of commuting pairs of elements in $G$. We take $\Gamma = SL(2, \mathbb{Z})$ to operate on $\mathfrak{h}$ in the usual way via Möbius transformations, that is

\[ \gamma : \tau \mapsto \frac{a\tau + b}{c\tau + d}, \qquad \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma, \tag{1.11} \]

and we let it act on the right of $P(G)$ via

\[ (g, h)\gamma = (g^a h^c, g^b h^d). \tag{1.12} \]
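As a small illustration of (1.12), the following sketch checks by brute force that the formula really defines a right action of the generators of $SL(2, \mathbb{Z})$. It makes a simplifying assumption that is not part of the text: the commuting pair is taken inside a cyclic group $\langle x \rangle$ of order $N$, and $(g, h) = (x^i, x^j)$ is stored as the exponent pair $(i, j)$. The helper names `matmul`, `act` and `is_right_action` are hypothetical.

```python
# Sketch under an assumption: (g, h) = (x^i, x^j) in a cyclic group of order N
# is encoded by the exponent pair (i, j) modulo N.

def matmul(m1, m2):
    """Product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def act(pair, gamma, N):
    """Formula (1.12): (g, h).gamma = (g^a h^c, g^b h^d), written in exponents mod N."""
    i, j = pair
    (a, b), (c, d) = gamma
    return ((a * i + c * j) % N, (b * i + d * j) % N)

# The two standard generators of SL(2, Z).
S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))

def is_right_action(N, gammas):
    """Check (p.g1).g2 == p.(g1 g2) for every pair p and all g1, g2 in gammas."""
    pairs = [(i, j) for i in range(N) for j in range(N)]
    return all(
        act(act(p, g1, N), g2, N) == act(p, matmul(g1, g2), N)
        for p in pairs for g1 in gammas for g2 in gammas
    )

if __name__ == "__main__":
    print(is_right_action(6, [S, T, matmul(S, T)]))
```

In this encoding the action is just multiplication of the row vector $(i, j)$ by the matrix on the right, which is why the compatibility with matrix multiplication holds.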
Zhu has introduced in [Z] a second vertex operator algebra associated in a certain way to $V$; it has the same underlying space, but the grading is different. We are concerned with elements $v$ in $V$ which are homogeneous of weight $k$, say, with regard to the second vertex operator algebra. We write this as $\mathrm{wt}[v] = k$. For such $v$ we define an action of the modular group $\Gamma$ on $T_M$ in a familiar way, namely

\[ T_M|\gamma\,(v, g, h, \tau) = (c\tau + d)^{-k}\, T_M(v, g, h, \gamma\tau). \tag{1.13} \]

We state some of our main results concerning modular-invariance.

Theorem 1.3. Suppose that $V$ is a vertex operator algebra which satisfies Condition $C_2$, and let $G$ be a finite group of automorphisms of $V$.
(i) For each triple $(v, g, h) \in V \times P(G)$ and for each $h$-stable $g$-twisted sector $M$, the trace function $T_M(v, g, h, \tau)$ converges to a holomorphic function in $\mathfrak{h}$.
(ii) Suppose in addition that $V$ is $g$-rational for each $g \in G$. Let $v \in V$ satisfy $\mathrm{wt}[v] = k$. Then the space of (holomorphic) functions in $\mathfrak{h}$ spanned by the trace functions $T_M(v, g, h, \tau)$ for all choices of $g$, $h$ and $M$ is a (finite-dimensional) $\Gamma$-module with respect to the action (1.13).
More precisely, if $\gamma \in \Gamma$ then we have an equality

\[ T_M|\gamma\,(v, g, h, \tau) = \sum_{W} \sigma_W\, T_W(v, (g, h)\gamma, \tau), \tag{1.14} \]

where $(g, h)\gamma$ is as in (1.12) and $W$ ranges over the $g^a h^c$-twisted sectors which are $g^b h^d$-stable. The constants $\sigma_W$ depend only on $g$, $h$, $\gamma$ and $W$.

Theorem 1.4. Suppose that $V$ is a holomorphic vertex operator algebra which satisfies Condition $C_2$, and let $G$ be a cyclic group of automorphisms of $V$. If $(g, h) \in P(G)$, $v \in V$ satisfies $\mathrm{wt}[v] = k$, and $\gamma \in \Gamma$, then $T(v, g, h, \tau)$ is a holomorphic function in $\mathfrak{h}$ which satisfies

\[ T|\gamma\,(v, g, h, \tau) = \sigma(g, h, \gamma)\, T(v, (g, h)\gamma, \tau) \tag{1.15} \]

for some constant $\sigma(g, h, \gamma)$.

Note that if $V$ is as in Theorem 1.4, then $\mathrm{char}_q V$ is a modular function on $SL(2, \mathbb{Z})$, possibly with character. It follows from this that the central charge of $V$ is an integer divisible by 8.

We can summarize some of the results above by saying that the functions $T_M(v, g, h, \tau)$ and $T(v, g, h, \tau)$ are generalized modular forms of weight $k$ in the sense of [KM]. This means essentially that each of these functions and each of their transforms under the modular group have $q$-expansions in rational powers of $q$ with bounded denominators, and that up to scalar multiples there are only a finite number of such transforms. One would like to show that each of these functions is, in fact, a modular form of weight $k$ and some level $N$ in the usual sense of being invariant under the principal congruence subgroup $\Gamma(N)$. This will require further argument, as it is shown in (loc.cit.) that there are generalized modular forms which are not modular forms in the usual sense. All we can say in general at the moment is that $T_M(v, g, h, \tau)$ and $T(v, g, h, \tau)$ have finite level in the sense of Wohlfahrt [Wo]. This is true of all generalized modular forms, and means that $T_M(v, g, h, \tau)$ and $T(v, g, h, \tau)$ and each of their $\Gamma$-transforms are invariant under the group $\Delta(N)$ for some $N$, where we define $\Delta(N)$ to be the normal closure of $\begin{pmatrix} 1 & N \\ 0 & 1 \end{pmatrix}$ in $\Gamma$.

Let us emphasize the differences between Theorem 1.3 (ii) and Theorem 1.4. In the former we assume $g$-rationality, which in practice is hard to verify, even for known vertex operator algebras. In Theorem 1.4 there is no such assumption; however, we have to limit ourselves to cyclic pairs $(g, h)$ in $P(G)$. One expects Theorem 1.4 to hold for all commuting pairs $(g, h)$.

In [N], Simon Norton conjectured that (1.15) holds in the special case that $v$ is the vacuum vector and $V$ is the Moonshine Module [FLM3] whose automorphism group is the Monster (loc.cit.). His argument was based on extensive numerical evidence in Conway–Norton’s famous paper Monstrous Moonshine [CN], which was significantly expanded in the thesis of Larissa Queen [Q]. A little later it was given a string-theoretic interpretation in [DGH]. We will see in Sect. 12 that the Moonshine Module satisfies Condition $C_2$ (a proof is also outlined in [Z]), so that Theorem 1.4 applies. Norton also conjectured that each $Z(g, h, \tau)$ is either constant or a hauptmodul, the latter being a modular function (weight zero) on some discrete subgroup $\Gamma_0$ of $SL(2, \mathbb{R})$ of genus zero such that the modular function in question generates the full field of meromorphic modular functions on $\Gamma_0$. By utilizing the results of Borcherds [B2] which establish the original Moonshine conjectures in [CN] for the Moonshine Module we obtain
Theorem 1.5 (Generalized Moonshine). Let $V^\natural$ be the Moonshine Module, and let $g$ be an element of the Monster simple group $\mathbb{M}$. The following hold:
(i) $V^\natural$ has a unique $g$-twisted sector $V^\natural(g)$.
(ii) The formal character $\mathrm{char}_q V^\natural(g)$ is a hauptmodul.
(iii) More generally, if $g$ and $h$ in $\mathbb{M}$ generate a cyclic subgroup then the graded trace $Z(g, h, \tau)$ of $\phi(h)$ on $V^\natural(g)$ is a hauptmodul.

These results essentially establish the Conway–Norton–Queen conjectures about the Monster for cyclic pairs $(g, h)$. As we have said, the spaces $V^\natural(g)$ support faithful projective representations of the corresponding centralizer $C_{\mathbb{M}}(g)$ of $g$ in $\mathbb{M}$. It was these representations that were conjectured to exist in [N] and [Q], and they are of considerable interest in their own right. See [DLM1] for more information in some special cases. There also appear to be some remarkable connections with the work of Borcherds and Ryba [Ry, BR, B3] on modular moonshine which we hope to consider elsewhere.

Finally, we mention that Theorem 1.4 can be considerably strengthened, indeed the best possible results can be established, if we assume that $g$ has small order. For simplicity we state only a special case:

Theorem 1.6. Let $V$, $v$, $k$ be as in Theorem 1.4, and assume that the central charge of $V$ is divisible by 24. Suppose that $g$ has order $p = 2$ or $3$. Then the following hold:
(i) The conformal weight $\lambda$ of the (unique) $g$-twisted sector $V(g)$ is in $\frac{1}{p^2}\mathbb{Z}$.
(ii) The graded trace $T(v, 1, g, \tau)$ is a modular form of weight $k$ and level $p^2$.
(iii) We have $\lambda \in \frac{1}{p}\mathbb{Z}$ if, and only if, $T(v, g, h, \tau)$ is a modular form on the congruence subgroup $\Gamma_0(p)$.

These results follow from the previous theorems, and will be proved elsewhere. They can be used to make rigorous some of the assumptions commonly made by physicists (cf. [M, S, T1, T2, T3, V] for example) in the theory of $\mathbb{Z}_p$-orbifolds, and we hope that Theorem 1.6 may provide the basis for a more complete theory of such orbifolds.

We have already mentioned the work of Zhu [Z] on several occasions, and it is indeed this paper to which we are mainly indebted intellectually. In essence, we are going to prove an equivariant version of the theory laid down by Zhu (loc.cit.), though even in the special case that he was studying our work yields improvements on his results. In particular, Zhu’s hypothesis that $V$ is a sum of highest weight modules for the Virasoro algebra is eliminated in the present paper, and our notion of rationality, developed in [DLM3], is qualitatively weaker than that of Zhu [Z]. Nevertheless, the broad outline of our proof follows that of [Z].

The equivariant refinement of Zhu’s theory began with our paper [DLM3], which also plays a basic rôle in the present paper. In that paper we constructed so-called twisted Zhu algebras $A_g(V)$, which are associative algebras associated to a vertex operator algebra $V$ and automorphism $g$ of finite order. They have the property that, at least for suitable classes of vertex operator algebras, the module category for $A_g(V)$ and the category of $g$-twisted modules for $V$ are equivalent. This reduces the construction of $g$-twisted sectors to the corresponding problem for $A_g(V)$ (not a priori known to be non-zero!). As in [Z], the rôle of the finiteness condition $C_2$ is to show that the $(g, h)$ correlation functions that we have considered above satisfy certain differential equations of regular-singular type. These differential equations have coefficients which are essentially modular forms on a congruence subgroup, a fact which is ultimately attributable to the Jacobi identity satisfied by the vertex operators. One attempts to characterize the space of correlation functions as those solutions of
the differential equation which possess other technical features related to properties of $V$ and $A_g(V)$. This space is essentially what is sometimes referred to as the $(g, h)$-conformal block, and our results follow from the technical assertion that, under suitable circumstances, the $(g, h)$-conformal block is indeed spanned by the $(g, h)$-correlation functions. This whole approach to conformal blocks is inspired by [Z], but is more complicated when twisted sectors are involved.

We point out that the holomorphy of trace functions follows from the fact that they are solutions of suitable differential equations; moreover, the Frobenius–Fuchs theory of differential equations with regular-singular points leads to the representation of elements of the conformal block as $q$-expansions. One attempts to identify coefficients of such $q$-expansions with the trace function defined by some $A_g(V)$-module, and at the same time show that elements of the conformal blocks are free of logarithmic singularities. The point is that the Frobenius–Fuchs theory plays a critical rôle, as it does also in [Z].

The paper is organized as follows: after some preliminaries in Sects. 2 and 3, we take up in Sect. 4 the study of certain modular and Jacobi-type forms, the main goal being to write down the transformation laws which they satisfy. Our methods (and probably the results too) will be well-known to experts in elliptic functions, but it is fascinating to see how classical topics such as Eisenstein series and Bernoulli distributions play a rôle in an abstract theory of vertex operator algebras. In Sect. 5 we introduce the space of abstract $(g, h)$ 1-point functions associated to a vertex operator algebra and establish that it affords an action by the modular group (Theorem 5.4). In Sects. 6 and 7 we continue the study of such functions and in particular write down the differential equation that they satisfy and the general shape of the solutions in terms of $q$-expansions and logarithmic singularities. Next we prove in Sect. 8 (Theorems 8.1 and 8.7) that if $g$ and $h$ commute then distinct $h$-stable $g$-twisted sectors give rise to linearly independent trace functions which lie in the $(g, h)$-conformal block, and in Sect. 9 we give, as an application of the ideas developed so far, a general existence theorem for twisted sectors. Section 10 contains the main theorems, which give sufficient conditions under which the $(g, h)$-conformal block is spanned by trace functions. Having reached Sect. 11, we have enough information to be able to apply the methods and results of Anderson and Moore [AM], and this leads to the rationality results stated above in Theorems 1.1 and 1.2 as well as the applications to modular-invariance and to the generalized Moonshine Conjectures, which are discussed in Sect. 13. We also point out how one can use Theorem 1.4 to describe not only other correlation functions but also “Monstrous Moonshine of weight k.” Section 12 establishes condition $C_2$ for a number of well-known rational vertex operator algebras, so that our theory applies to all of these examples.

2. Vertex Operator Algebras

The definition of vertex operator algebra [FLM3] entails a $\mathbb{Z}$-graded complex vector space

\[ V = \bigoplus_{n \in \mathbb{Z}} V_n \tag{2.1} \]

satisfying $\dim V_n < \infty$ for all $n$ and $V_n = 0$ for $n \ll 0$.
(3.12) (3.13)
where $\delta_r = 1$ if $r = 0$ and $\delta_r = 0$ if $r \neq 0$. Extend $\circ_g$ and $*_g$ to bilinear products on $V$. We let $O_g(V)$ be the linear span of all $u \circ_g v$.

Theorem 3.3 ([DLM3, Z]). The quotient $A_g(V) = V / O_g(V)$ is an associative algebra with respect to $*_g$.

Note that if $g = 1$ then $A(V)$ is nonzero, whereas if $g \neq 1$ the analogous assertion may not be true. But if $A_g(V) \neq 0$ then the vacuum element maps to the identity of $A_g(V)$, and the conformal vector maps into the center of $A_g(V)$.

Let $M$ be a weak $g$-twisted $V$-module. Define

\[ \Omega(M) = \{ w \in M \mid u(\mathrm{wt}\,u - 1 + n)w = 0,\ u \in V,\ n > 0 \}. \]

For homogeneous $u \in V$ define

\[ o(u) = u(\mathrm{wt}\,u - 1) \tag{3.14} \]

(sometimes called the zero mode of $u$).

Theorem 3.4 ([DLM3]). Let $M$ be a weak $g$-twisted $V$-module. The following hold:
(a) The map $v \mapsto o(v)$ induces a representation of the associative algebra $A_g(V)$ on $\Omega(M)$.
(b) If $M$ is a simple admissible $g$-twisted $V$-module then $\Omega(M) = M(0)$ is a simple $A_g(V)$-module. Moreover, $M \mapsto M(0)$ induces a bijection between (isomorphism classes of) simple admissible $g$-twisted $V$-modules and simple $A_g(V)$-modules.

When combined with Theorem 3.2 one finds

Theorem 3.5 ([DLM3]). Suppose that $V$ is a vertex operator algebra with an automorphism $g$ of finite order, and that $V$ is $g$-rational (possibly $g = 1$). Then the following hold:
(a) $A_g(V)$ is a finite-dimensional, semi-simple associative algebra (possibly 0).
(b) The map $M \mapsto \Omega(M)$ induces an equivalence between the category of ordinary $g$-twisted $V$-modules and the category of finite-dimensional $A_g(V)$-modules.
There are various group actions that we need to explain. Let $g, h$ be automorphisms of $V$ with $g$ of finite order. If $(M, Y_g)$ is a weak $g$-twisted module for $V$ there is a weak $hgh^{-1}$-twisted $V$-module $(M, Y_{hgh^{-1}})$, where for $v \in V$ we define

\[ Y_{hgh^{-1}}(v, z) = Y_g(h^{-1}v, z). \tag{3.15} \]

This defines a left action of $\mathrm{Aut}(V)$ on weak twisted modules and on isomorphism classes of weak twisted modules. Symbolically, we write

\[ h \circ (M, Y_g) = (M, Y_{hgh^{-1}}) = h \circ M, \tag{3.16} \]

where we sometimes abuse notation slightly by identifying $(M, Y_g)$ with the isomorphism class that it defines. The action (3.16) induces an action

\[ h \circ \Omega(M) = \Omega(h \circ M). \tag{3.17} \]

Next, it follows easily from definitions (3.12) and (3.13) that the action of $h$ on $V$ induces an isomorphism of associative algebras

\[ h : A_g(V) \to A_{hgh^{-1}}(V), \qquad v \mapsto hv, \tag{3.18} \]

which then induces a functor

\[ h \circ : A_{hgh^{-1}}(V)\text{-mod} \to A_g(V)\text{-mod}. \tag{3.19} \]

To describe (3.19), let $(N, *_{hgh^{-1}})$ be a left $A_{hgh^{-1}}(V)$-module (extending the notation of (3.13)). Then $h \circ (N, *_{hgh^{-1}}) = (N, *_g)$, where, for $n \in N$, $v \in V$,

\[ v *_{hgh^{-1}} n = h^{-1}v *_g n. \tag{3.20} \]

Now if $(M, Y_g)$ is as before then (3.17) and (3.19) both define actions of $h$ on $\Omega(M)$; they are the same. For if $v \in V$ and we consider the image of $v$ in $A_{hgh^{-1}}(V)$, it acts on $\Omega(h \circ M)$ via the zero mode $o_{hgh^{-1}}(v)$ of $v$ in the vertex operator $Y_{hgh^{-1}}(v, z) = Y_g(h^{-1}v, z)$. In other words, if $m \in \Omega(h \circ M) = \Omega(M)$, then $o_{hgh^{-1}}(v)m = o_g(h^{-1}v)m$, which is precisely what (3.20) says.

We say that the $g$-twisted $V$-module $M$ is $h$-invariant if $h \circ M \cong M$. The set of all such automorphisms, the stabilizer of $M$, is a subgroup $C$ of $\mathrm{Aut}\,V$. There is a projective representation of $C$ on $M$ induced by the action (3.16). See [DM1] or [DM2] for more information on this point. Via (3.17) this induces a projective representation of $C$ on $\Omega(M)$.

Next we discuss the $C_2$-condition in more detail. Let $V$ be a vertex operator algebra and $M$ a $V$-module. We define

\[ C_2(M) = \{ v(-2)m \mid v \in V,\ m \in M \}. \tag{3.21} \]

We say that $M$ satisfies condition $C_2$ in case $C_2(M)$ has finite codimension in $M$. The most important case is that in which $M$ is taken to be $V$ itself.

Proposition 3.6. Suppose that $V$ satisfies condition $C_2$, and let $g$ be an automorphism of $V$ of finite order. Then the algebra $A_g(V)$ has finite dimension.
Proof. Note from (3.21) that $C_2(V)$ is a $\mathbb{Z}$-graded subspace of $V$. Since the codimension of $C_2(V)$ is finite, there is an integer $k$ such that $V = C_2(V) + W$, where $W$ is the sum of the first $k$ homogeneous subspaces of $V$. We will show that $V_m \subset W + O_g(V)$ for each $m \in \mathbb{Z}$, in which case $V = W + O_g(V)$ and $\dim A_g(V) \le \dim W$.

We proceed by induction on $m$. Recall that $V^r$ is the eigenspace of $g$ with eigenvalue $e^{-2\pi i r/T}$ for $0 \le r \le T - 1$, where $g$ has order $T$. Since $C_2(V)$ is a homogeneous and $g$-invariant subspace of $V$, we may write any $c \in C_2(V) \cap V_m$ in the form

\[ c = \sum_{i=1}^{n} u_i(-2)v_i \tag{3.22} \]

for homogeneous elements $u_i, v_i \in V$ satisfying $u_i \in V^r$ for some $r = r(i)$ and $\mathrm{wt}\,u_i + \mathrm{wt}\,v_i + 1 = m$. Suppose first that $u_i \in V^r$ with $r = 0$. According to (3.12), $O_g(V)$ contains

\[ \mathrm{Res}_z\, Y(u_i, z) \frac{(1+z)^{\mathrm{wt}\,u_i}}{z^2} v_i = \sum_{j \ge 0} \binom{\mathrm{wt}\,u_i}{j} u_i(j-2)v_i. \tag{3.23} \]

Now $\mathrm{wt}\bigl(u_i(j-2)v_i\bigr) = \mathrm{wt}\,u_i + \mathrm{wt}\,v_i - j + 1 = m - j$, so if $j \ge 1$ then $u_i(j-2)v_i \in W + O_g(V)$ by induction. But then $W + O_g(V)$ also contains the remaining summand $u_i(-2)v_i$ of (3.23). Now suppose that $u_i \in V^r$ with $r \ge 1$. By Lemma 2.2 (i) of [DLM3], $O_g(V)$ contains the element $\mathrm{Res}_z\, Y(u_i, z) \frac{(1+z)^{\mathrm{wt}\,u_i - 1 + r/T}}{z^2} v_i$, and we conclude once again that $u_i(-2)v_i$ lies in $W + O_g(V)$. So we have shown that all summands of (3.22) lie in $W + O_g(V)$. The proposition follows.

Proposition 3.7. Suppose that $V$ satisfies condition $C_2$, and let $g$ be an automorphism of $V$ of finite order. If $A_g(V) \neq 0$ then $V$ has a simple $g$-twisted $V$-module.

Proof. $A_g(V)$ has finite dimension by Proposition 3.6. Now the result follows from Theorem 9.1 of [DLM3].

The following lemma will be used in Sect. 12.

Lemma 3.8. Let $M$ be a $V$-module. Then $C_2(M)$ is invariant under the operators $v(0)$ and $v(-1)$ for any $v \in V$.

Proof. Consider $u(-2)w \in C_2(M)$ for $u \in V$ and $w \in M$. Then for $k = 0, -1$,

\[ v(k)u(-2)w = u(-2)v(k)w + \sum_{i=0}^{\infty} \binom{k}{i} (v(i)u)(-2+k-i)w \in C_2(M), \]

as required.

4. P-Functions and Q-Functions

We study certain functions, which we denote by $P$ and $Q$, which play a rôle in later sections. The $P$-functions are essentially Jacobi forms [EZ] and the $Q$-functions are certain modular forms. The main goal is to write down the transformation laws of these functions under the action of the modular group $\Gamma = SL(2, \mathbb{Z})$.
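Let $\mathfrak{h}$ denote the upper half plane $\mathfrak{h} = \{ z \in \mathbb{C} \mid \mathrm{im}\, z > 0 \}$ with the usual left action of $\Gamma$ via Möbius transformations

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \tau = \frac{a\tau + b}{c\tau + d}. \tag{4.1} \]

$\Gamma$ also acts on the right of $S^1 \times S^1$ via

\[ (\mu, \lambda) \begin{pmatrix} a & b \\ c & d \end{pmatrix} = (\mu^a \lambda^c, \mu^b \lambda^d). \tag{4.2} \]

Let $t$ be a torsion point of $S^1 \times S^1$. Thus $t = (\mu, \lambda)$ with $\mu = e^{2\pi i j/M}$ and $\lambda = e^{2\pi i l/N}$ for integers $j, l, M, N$ with $M, N > 0$. For each integer $k = 1, 2, \ldots$ and each $t$ we define a function $P_k$ on $\mathbb{C} \times \mathfrak{h}$ as follows:

\[ P_k(\mu, \lambda, z, q_\tau) = P_k(\mu, \lambda, z, \tau) = \frac{1}{(k-1)!} \sideset{}{'}\sum_{n \in \frac{j}{M} + \mathbb{Z}} \frac{n^{k-1} q_z^n}{1 - \lambda q_\tau^n}, \tag{4.3} \]

where the sign $\sum'$ means omit the term $n = 0$ if $(\mu, \lambda) = (1, 1)$. Here and below we write $q_x = e^{2\pi i x}$.

Remark 4.1. (i) (4.3) converges uniformly and absolutely on compact subsets of the region $|q_\tau| < |q_z| < 1$.
(ii) Theorem 4.2 holds also for $(\mu, \lambda) = (1, 1)$ in case $k \ge 3$ but not if $k = 1, 2$ (cf. [Z]).

We will prove

Theorem 4.2. Suppose that $(\mu, \lambda) \neq (1, 1)$. Then

\[ P_k\Bigl(\mu, \lambda, \frac{z}{c\tau + d}, \gamma\tau\Bigr) = (c\tau + d)^k P_k((\mu, \lambda)\gamma, z, \tau) \]

for all $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$.

We can reformulate Theorem 4.2 as follows: for suitable functions $F(\mu, \lambda, z, \tau)$ on $(\mathbb{Q}/\mathbb{Z})^2 \times \mathbb{C} \times \mathfrak{h}$, and for an integer $k$, we set

\[ F|_k \gamma\,(\mu, \lambda, z, \tau) = (c\tau + d)^{-k} F\Bigl((\mu, \lambda)\gamma^{-1}, \frac{z}{c\tau + d}, \gamma\tau\Bigr). \tag{4.4} \]

As is well-known, this defines a right action of $\Gamma$ on such functions $F$. Theorem 4.2 says precisely that $P_k$ is an invariant of this action. So it is enough to prove the theorem for the two standard generators $S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ of $\Gamma$. If $\gamma = T$ then Theorem 4.2 reduces to the assertion $P_k(\mu, \lambda, z, \tau + 1) = P_k(\mu, \mu\lambda, z, \tau)$, which follows immediately from definition (4.3), since $q_{\tau+1}^n = \mu q_\tau^n$ for $n \in \frac{j}{M} + \mathbb{Z}$. We also note the equality

\[ \frac{d}{dz} P_k(\mu, \lambda, z, \tau) = 2\pi i k\, P_{k+1}(\mu, \lambda, z, \tau). \tag{4.5} \]

So if Theorem 4.2 holds for $k$ then it holds for $k + 1$ by (4.5) and the chain rule. These observations reduce us to proving Theorem 4.2 in the case that $k = 1$ and $\gamma = S$, when it can be restated in the form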
Modular-Invariance of Trace Functions in Orbifold Theory and Moonshine
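Theorem 4.3. If $(\mu, \lambda) \neq (1, 1)$ then
$$P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) = \tau P_1(\lambda, \mu^{-1}, z, \tau).$$

We will need to make use of several other functions in the proof of Theorem 4.3. First there is the usual Eisenstein series $G_2(\tau)$ with $q$-expansion
$$G_2(\tau) = \frac{\pi^2}{3} + 2(2\pi i)^2\sum_{n=1}^{\infty}\frac{n q_\tau^n}{1 - q_\tau^n} \tag{4.6}$$
and well-known transformation law
$$G_2(\gamma\tau) = (c\tau + d)^2 G_2(\tau) - 2\pi i c(c\tau + d), \quad \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma. \tag{4.7}$$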
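(For these and other facts about elliptic functions, the reader may consult [La].) Let
$$\wp_1(z, \tau) = G_2(\tau)z + \pi i\,\frac{q_z + 1}{q_z - 1} + 2\pi i\sum_{n=1}^{\infty}\left(\frac{q_\tau^n q_z^{-1}}{1 - q_\tau^n q_z^{-1}} - \frac{q_z q_\tau^n}{1 - q_z q_\tau^n}\right). \tag{4.8}$$
The function $\wp_1(z, \tau)$ is not elliptic, but satisfies
$$\wp_1(z + 1, \tau) = \wp_1(z, \tau) + G_2(\tau), \tag{4.9}$$
$$\wp_1(z + \tau, \tau) = \wp_1(z, \tau) + G_2(\tau)\tau - 2\pi i, \tag{4.10}$$
$$\wp_1\left(\frac{z}{c\tau + d}, \gamma\tau\right) = (c\tau + d)\wp_1(z, \tau), \quad \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma. \tag{4.11}$$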
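Now introduce a further $P$-type function
$$P_\lambda(z, \tau) = 2\pi i\sum_{n \in \mathbb{Z},\, n \neq 0}\frac{q_z^n}{1 - \lambda q_\tau^n},$$
where $\lambda$ is a root of unity.

Lemma 4.4. Suppose that $|q_\tau| < |q_z| < 1$ and that $\lambda^N = 1$. Then
$$P_\lambda(z, \tau) = \sum_{k=0}^{N-1}\lambda^k\left(G_2(N\tau)(z + k\tau) - \wp_1(z + k\tau, N\tau) - \pi i\right).$$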
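Proof. As $|\lambda q_\tau^n| < 1$ for $n \ge 1$, we have
$$\sum_{n=1}^{\infty}\frac{q_z^n}{1 - \lambda q_\tau^n} = \sum_{n=1}^{\infty}\sum_{m=0}^{\infty} q_z^n\lambda^m q_\tau^{mn} = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty}\lambda^m q_z q_\tau^m (q_z q_\tau^m)^n = \sum_{m=0}^{\infty}\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m} = \frac{q_z}{1 - q_z} + \sum_{m=1}^{\infty}\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m}, \tag{4.12}$$
where the third equality holds as $|q_z q_\tau^m| < 1$ for $m \ge 0$.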
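The resummation in (4.12) is easy to test numerically. The sketch below truncates both sides at a point of the region $|q_\tau| < |q_z| < 1$; the function names, the truncation length and the sample point are illustrative choices, not part of the paper.

```python
import cmath


def q(x):
    """q_x = exp(2*pi*i*x), as in the text."""
    return cmath.exp(2j * cmath.pi * x)


def lhs(z, tau, lam, terms=80):
    """Truncation of sum_{n>=1} q_z^n / (1 - lam * q_tau^n)."""
    qz, qt = q(z), q(tau)
    return sum(qz ** n / (1 - lam * qt ** n) for n in range(1, terms))


def rhs(z, tau, lam, terms=80):
    """Truncation of sum_{m>=0} lam^m q_z q_tau^m / (1 - q_z q_tau^m)."""
    qz, qt = q(z), q(tau)
    return sum(lam ** m * qz * qt ** m / (1 - qz * qt ** m) for m in range(terms))
```

With, say, `z = 0.3 + 0.2j`, `tau = 0.1 + 1.0j` and `lam` a primitive cube root of unity, the two truncations should agree to roughly machine precision, since both tails decay geometrically.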
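Using $|q_z^{-1}q_\tau^m| < 1$ for $m \ge 1$, a similar calculation yields
$$\sum_{n=1}^{\infty}\frac{\lambda^{-1}q_z^{-n}q_\tau^n}{1 - \lambda^{-1}q_\tau^n} = \sum_{m=1}^{\infty}\frac{\lambda^{-m}q_z^{-1}q_\tau^m}{1 - q_z^{-1}q_\tau^m}.$$
From this and (4.12) we get
$$(2\pi i)^{-1}P_\lambda(z, \tau) = \frac{q_z}{1 - q_z} + \sum_{m=1}^{\infty}\left(\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m} - \frac{\lambda^{-m}q_z^{-1}q_\tau^m}{1 - q_z^{-1}q_\tau^m}\right). \tag{4.13}$$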
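Next, use the expansion
$$\sum_{m=0}^{\infty}\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m} = \sum_{n=0}^{\infty}\sum_{k=0}^{N-1}\frac{\lambda^k q_z q_\tau^{nN+k}}{1 - q_z q_\tau^{nN+k}} = \sum_{k=0}^{N-1}\lambda^k\sum_{n=0}^{\infty}\frac{q_{z+k\tau}q_{N\tau}^n}{1 - q_{z+k\tau}q_{N\tau}^n}$$
and a similar expression for the second term under the summation sign in (4.13) to see that
$$P_\lambda(z, \tau) = 2\pi i\sum_{k=0}^{N-1}\lambda^k\left(\sum_{n=0}^{\infty}\frac{q_{z+k\tau}q_{N\tau}^n}{1 - q_{z+k\tau}q_{N\tau}^n} - \sum_{n=1}^{\infty}\frac{q_{z+k\tau}^{-1}q_{N\tau}^n}{1 - q_{z+k\tau}^{-1}q_{N\tau}^n}\right). \tag{4.14}$$
Using the formula (4.8) for $\wp_1(z + k\tau, N\tau)$, the lemma follows readily from (4.14).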
For a root of unity $\lambda$, set
$$\epsilon(\lambda) = \begin{cases}\dfrac{1}{1 - \lambda}, & \lambda \neq 1,\\[4pt] 0, & \lambda = 1.\end{cases} \tag{4.15}$$
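Now we are ready for the proof of Theorem 4.3. Let $\nu = e^{2\pi i/M}$ with $\mu$ and $\lambda$ as before. For $t \in \mathbb{Z}$ we then have
$$\sum_{j=1}^{M}\nu^{jt}P_1(\nu^j, \lambda, z, \tau) = \sum_{j=1}^{M}\nu^{jt}\,{\sum_{n \in \frac{j}{M} + \mathbb{Z}}}'\frac{q_z^n}{1 - \lambda q_\tau^n} = {\sum_{n \in \frac{1}{M}\mathbb{Z}}}'\frac{q_{z+t}^n}{1 - \lambda q_\tau^n}.$$
From (4.12) and (4.15) we conclude that
$$\sum_{j=1}^{M}\nu^{jt}P_1(\nu^j, \lambda, z, \tau) = \frac{1}{2\pi i}P_\lambda\left(\frac{z + t}{M}, \frac{\tau}{M}\right) + \epsilon(\lambda). \tag{4.16}$$
Regarding this as a system of linear equations in $P_1(\nu^j, \lambda, z, \tau)$ for $t = 0, 1, \dots, M - 1$, we may invert to find that (with $\mu = \nu^j$)
$$P_1(\mu, \lambda, z, \tau) = \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\mu^{-t}\left(P_\lambda\left(\frac{z + t}{M}, \frac{\tau}{M}\right) + 2\pi i\,\epsilon(\lambda)\right).$$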
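Now use Lemma 4.4 to obtain
$$P_1(\mu, \lambda, z, \tau) = \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\left(G_2\left(\frac{N\tau}{M}\right)\frac{t + k\tau}{M} - \wp_1\left(\frac{z + t + k\tau}{M}, \frac{N\tau}{M}\right)\right) + \frac{\epsilon(\lambda)}{M}\sum_{t=0}^{M-1}\mu^{-t}. \tag{4.17}$$
(We used $(\mu, \lambda) \neq (1, 1)$ to eliminate some terms in (4.17).) So now we get, using (4.7), (4.11) and (4.17):
$$\begin{aligned}
P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) &= \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\left(G_2\left(\frac{-N}{M\tau}\right)\frac{-k + t\tau}{M\tau} - \wp_1\left(\frac{z - k + t\tau}{M\tau}, \frac{-N}{M\tau}\right)\right) + \frac{\epsilon(\lambda)}{M}\sum_{t=0}^{M-1}\mu^{-t}\\
&= \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\left(\left(\left(\frac{M\tau}{N}\right)^2 G_2\left(\frac{M\tau}{N}\right) - 2\pi i\,\frac{M\tau}{N}\right)\frac{-k + t\tau}{M\tau} - \frac{M\tau}{N}\wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right)\right) + \frac{\epsilon(\lambda)}{M}\sum_{t=0}^{M-1}\mu^{-t}.
\end{aligned} \tag{4.18}$$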
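Case 1. $\mu \neq 1 \neq \lambda$. Here (4.18) simplifies to
$$P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) = -\frac{\tau}{2\pi i N}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right) = -\frac{\tau}{2\pi i N}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\wp_1\left(\frac{z + N - k + t\tau}{N}, \frac{M\tau}{N}\right)$$
using (4.9). From (4.17) this is indeed equal to $\tau P_1(\lambda, \mu^{-1}, z, \tau)$, as required.

Case 2. $\lambda \neq 1 = \mu$. This time (4.18) reads
$$P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) = \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\lambda^k\left(\left(\left(\frac{M\tau}{N}\right)^2 G_2\left(\frac{M\tau}{N}\right) - \frac{2\pi i M\tau}{N}\right)\frac{-k}{M\tau} - \frac{M\tau}{N}\wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right)\right) + \epsilon(\lambda). \tag{4.19}$$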
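It is easy to check that $\frac{1}{N}\sum_{k=0}^{N-1}k\lambda^k = -\epsilon(\lambda)$, so that (4.19) simplifies to read
$$\begin{aligned}
P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) &= -\frac{\tau}{2\pi i N}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\lambda^k\left(G_2\left(\frac{M\tau}{N}\right)\frac{k}{N} + \wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right)\right)\\
&= \frac{\tau}{2\pi i N}\sum_{l=0}^{N-1}\sum_{t=0}^{M-1}\lambda^{-l}\left(G_2\left(\frac{M\tau}{N}\right)\frac{l}{N} - \wp_1\left(\frac{z + l + t\tau}{N}, \frac{M\tau}{N}\right)\right)\\
&= \tau P_1(\lambda, \mu^{-1}, z, \tau).
\end{aligned}$$
This completes the discussion of Case 2. The final case $\lambda = 1 \neq \mu$ is completely analogous, and we accordingly omit details. This completes the proof of Theorem 4.3, hence also that of Theorem 4.2.

We discuss some aspects of Bernoulli polynomials. Recall [La, Ra] that these polynomials $B_k(x)$ are defined by the generating function
$$\frac{te^{tx}}{e^t - 1} = \sum_{k=0}^{\infty}B_k(x)\frac{t^k}{k!}. \tag{4.20}$$
For example
$$B_0(x) = 1, \quad B_1(x) = x - \frac{1}{2}, \quad B_2(x) = x^2 - x + \frac{1}{6}. \tag{4.21}$$
We will need the following identities (loc. cit.):
$$\sum_{a=0}^{N-1}(a + x)^{k-1} = \frac{1}{k}\left(B_k(x + N) - B_k(x)\right), \tag{4.22}$$
$$B_k(1 - x) = (-1)^k B_k(x). \tag{4.23}$$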
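The identities (4.22) and (4.23) can be checked in exact arithmetic. The sketch below builds $B_k(x)$ from the Bernoulli numbers of the generating function (4.20) (so $B_1 = -1/2$); the helper names `bernoulli_numbers` and `bernoulli_poly` are illustrative, and the recurrence used is the standard one, not something taken from the paper.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Bernoulli numbers B_0..B_n with B_1 = -1/2, via the standard recurrence
    sum_{i=0}^{m} C(m+1, i) B_i = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return B


def bernoulli_poly(k, x):
    """B_k(x) = sum_i C(k, i) B_i x^(k-i), matching (4.20)."""
    B = bernoulli_numbers(k)
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))
```

For instance, `bernoulli_poly(2, Fraction(1, 3))` reproduces $x^2 - x + 1/6$ at $x = 1/3$, in agreement with (4.21).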
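Proposition 4.5. If $\mu = e^{2\pi i j/M}$ with $1 \le j \le M$ and $k \ge 2$ then
$$\frac{1}{(2\pi i)^k}\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \frac{-B_k(j/M)}{k!}.$$

Proof. This is a typical sort of calculation which we give using results from [La]. Now
$$\sum_{0 \neq m \in \mathbb{Z}}\mu^m/m^k = \sum_{m=1}^{\infty}\left(\frac{\mu^m}{m^k} + (-1)^k\frac{\mu^{-m}}{m^k}\right) = \sum_{t=1}^{M}\sum_{n=0}^{\infty}\frac{\mu^t + (-1)^k\mu^{-t}}{(Mn + t)^k} = M^{-k}\sum_{t=1}^{M}\zeta(k, t/M)(\mu^t + (-1)^k\mu^{-t}),$$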
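where $\zeta(k, x)$ is the Hurwitz zeta-function
$$\zeta(k, x) = \sum_{n=0}^{\infty}\frac{1}{(n + x)^k}.$$
For an $M$th root of unity $\alpha$ define $f_\alpha : \mathbb{Z}/M\mathbb{Z} \to \mathbb{C}$ by $f_\alpha(n) = \alpha^n$. Set
$$\xi(k, f_\alpha) = M^{-k}\sum_{n=1}^{M}f_\alpha(n)\zeta(k, n/M).$$
In fact, one can use this definition for any function $f : \mathbb{Z}/M\mathbb{Z} \to \mathbb{C}$. Thus
$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \xi(k, f_\mu) + (-1)^k\xi(k, f_{\mu^{-1}}).$$
Define
$$f_j(n) = \begin{cases}1, & n \equiv j \pmod{M},\\ 0, & n \not\equiv j \pmod{M},\end{cases} \qquad \hat{f}_j(n) = \sum_{a=1}^{M}f_j(a)e^{-2\pi i an/M}.$$
Then we get $\hat{f}_j(n) = \mu^{-n}$. Use (loc. cit. Theorem 2.1, p. 245) to get (remember $k \ge 2$)
$$\begin{aligned}
\xi(1 - k, f_j) &= (2\pi i)^{-1}\left(\frac{2\pi}{M}\right)^{1-k}\Gamma(k)\left(\xi(k, f_\mu)e^{\pi i(1-k)/2} - \xi(k, f_{\mu^{-1}})e^{-\pi i(1-k)/2}\right)\\
&= (2\pi i)^{-k}M^{k-1}(k - 1)!\left(\xi(k, f_\mu) + (-1)^k\xi(k, f_{\mu^{-1}})\right).
\end{aligned}$$
($\Gamma$ is the gamma-function here!) So now we have
$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \frac{(2\pi i)^k M^{1-k}}{(k - 1)!}\xi(1 - k, f_j).$$
On the other hand by definition,
$$\xi(1 - k, f_j) = M^{k-1}\sum_{n=1}^{M}f_j(n)\zeta(1 - k, n/M) = M^{k-1}\zeta(1 - k, j/M),$$
so
$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \frac{(2\pi i)^k}{(k - 1)!}\zeta(1 - k, j/M).$$
Moreover (loc. cit. Corollary on p. 243)
$$\zeta(1 - k, j/M) = -\Gamma(k)\,\mathrm{Res}_z\,\frac{z^{-k}e^{zj/M}}{e^z - 1} = -(k - 1)!\,B_k(j/M)/k! = -B_k(j/M)/k.$$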
So finally
$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = -\frac{(2\pi i)^k}{k!}B_k(j/M).$$
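Proposition 4.5 lends itself to a quick numerical check. The sketch below compares a symmetric truncation of the sum over $m \neq 0$ with $-B_k(j/M)/k!$; the cutoff and all function names are illustrative assumptions, and the Bernoulli polynomial is computed from the standard recurrence for Bernoulli numbers.

```python
import cmath
from fractions import Fraction
from math import comb, factorial


def bernoulli_poly(k, x):
    """B_k(x) from the Bernoulli numbers of t*e^(tx)/(e^t - 1), with B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, k + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))


def root_of_unity_sum(k, j, M, cutoff=5000):
    """Truncation of (2*pi*i)^(-k) * sum_{m != 0} mu^m / m^k, mu = exp(2*pi*i*j/M).
    The terms m and -m are paired, as in the first step of the proof."""
    total = 0
    for m in range(1, cutoff + 1):
        mu_m = cmath.exp(2j * cmath.pi * j * m / M)
        total += (mu_m + (-1) ** k / mu_m) / m ** k
    return total / (2j * cmath.pi) ** k


def predicted(k, j, M):
    """Right-hand side of Proposition 4.5."""
    return -float(bernoulli_poly(k, Fraction(j, M))) / factorial(k)
```

The truncation error decays like `cutoff**(1 - k)`, so the agreement is much tighter for $k = 4$ than for $k = 2$.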
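We next introduce the $Q$-functions. With $\mu = e^{2\pi i j/M}$ and $\lambda = e^{2\pi i l/N}$, we define for $k = 1, 2, \dots$ and $(\mu, \lambda) \neq (1, 1)$,
$$\begin{aligned}
Q_k(\mu, \lambda, q_\tau) = Q_k(\mu, \lambda, \tau) ={}& \frac{1}{(k - 1)!}\sum_{n \ge 0}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}}\\
&+ \frac{(-1)^k}{(k - 1)!}\sum_{n \ge 1}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}} - \frac{B_k(j/M)}{k!}.
\end{aligned} \tag{4.24}$$
Here $(n + j/M)^{k-1} = 1$ if $n = 0$, $j = 0$ and $k = 1$. Similarly, $(n - j/M)^{k-1} = 1$ if $n = 1$, $j = M$ and $k = 1$. For good measure we also set
$$Q_0(\mu, \lambda, \tau) = -1. \tag{4.25}$$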
We need to justify the notation, which suggests that $Q_k(\mu, \lambda, \tau)$ depends only on $\tau$ and the residue classes of $j$ and $l$ modulo $M$ and $N$ respectively. To see this, note that if we provisionally denote by $Q_k'(\mu, \lambda, \tau)$ the value of (4.24) in which $j$ is replaced by $j + M$, then we find that
$$Q_k'(\mu, \lambda, \tau) - Q_k(\mu, \lambda, \tau) = \frac{1}{(k - 1)!}(j/M)^{k-1} - \frac{B_k(j/M + 1)}{k!} + \frac{B_k(j/M)}{k!} = 0,$$
the last equality following from (4.22). We are going to prove

Theorem 4.6. If $k \ge 0$ then $Q_k(\mu, \lambda, \tau)$ is a holomorphic modular form of weight $k$. If $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$ it satisfies
$$Q_k(\mu, \lambda, \gamma\tau) = (c\tau + d)^k Q_k((\mu, \lambda)\gamma, \tau).$$

As usual one needs to deal with the cases $k = 1, 2$ of Theorem 4.6 separately. To this end, for each element $a = (a_1, a_2) \in \mathbb{Q}^2/\mathbb{Z}^2$, we recall the Klein and Hecke forms (loc. cit.) defined as follows:
$$\begin{aligned}
g_a(\tau) &= -q_\tau^{B_2(a_1)/2}e^{2\pi i a_2(a_1 - 1)/2}(1 - q_{a_1\tau + a_2})\prod_{n=1}^{\infty}(1 - q_\tau^n q_{a_1\tau + a_2})(1 - q_\tau^n q_{a_1\tau + a_2}^{-1}),\\
h_a(\tau) &= 2\pi i\left(a_1 - \frac{1}{2} - \frac{q_{a_1\tau + a_2}}{1 - q_{a_1\tau + a_2}} - \sum_{m=1}^{\infty}\left(\frac{q_\tau^m q_{a_1\tau + a_2}}{1 - q_\tau^m q_{a_1\tau + a_2}} - \frac{q_\tau^m q_{a_1\tau + a_2}^{-1}}{1 - q_\tau^m q_{a_1\tau + a_2}^{-1}}\right)\right).
\end{aligned} \tag{4.26}$$
Using (4.21) we easily find

Proposition 4.7. Let $a = (j/M, l/N) \notin \mathbb{Z}^2$. Then
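(i) $h_a(\tau) = -2\pi i Q_1(\mu, \lambda, \tau)$,
(ii) $\frac{d}{d\tau}(\log g_a(\tau)) = -2\pi i Q_2(\mu, \lambda, \tau)$.

Now we can prove Theorem 4.6 in the case $k = 1, 2$. For $k = 1$ we use (i) above together with Theorem 2 (i) and H3 of (loc. cit. p. 248). Similarly if $k = 2$ we use the calculation of (loc. cit. p. 251 et seq.). We now consider the case $k \ge 3$. In this case the result is a consequence of the following:

Theorem 4.8. If $k \ge 3$ then
$$Q_k(\mu, \lambda, \tau) = \frac{1}{(2\pi i)^k}\,{\sum_{m_1, m_2 \in \mathbb{Z}}}'\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k},$$
where $\sum'$ indicates that $(m_1, m_2) \neq (0, 0)$.

Proof. The non-constant part of $\sum'\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k}$ is equal to
$$\begin{aligned}
&\sum_{m_2 \in \mathbb{Z}}\sum_{m_1=1}^{\infty}\left(\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k} + (-1)^k\frac{\lambda^{m_1}\mu^{m_2}}{(m_1\tau - m_2)^k}\right)\\
&\quad= \sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\sum_{n \in \mathbb{Z}}\mu^t\left(\frac{\lambda^{-m_1}}{(m_1\tau + Mn + t)^k} + (-1)^k\frac{\lambda^{m_1}}{(m_1\tau - Mn - t)^k}\right)\\
&\quad= M^{-k}\sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\sum_{n \in \mathbb{Z}}\mu^t\left(\frac{\lambda^{-m_1}}{(m_1\tau/M + n + t/M)^k} + (-1)^k\frac{\lambda^{m_1}}{(m_1\tau/M - n - t/M)^k}\right).
\end{aligned}$$
Use (loc. cit. p. 155) to get this equal to
$$\begin{aligned}
&M^{-k}\sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\frac{(-1)^k\mu^t(2\pi i)^k}{(k - 1)!}\sum_{n=1}^{\infty}\left(n^{k-1}\lambda^{-m_1}q_{m_1\tau/M + t/M}^n + (-1)^k n^{k-1}\lambda^{m_1}q_{m_1\tau/M - t/M}^n\right)\\
&\quad= (-1)^k M^{-k}\sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\frac{(2\pi i)^k}{(k - 1)!}\mu^t\sum_{n=1}^{\infty}\left(n^{k-1}\lambda^{-m_1}\nu^{tn}q_{\tau/M}^{m_1 n} + (-1)^k n^{k-1}\lambda^{m_1}\nu^{-tn}q_{\tau/M}^{m_1 n}\right)\\
&\quad= (-1)^k M^{-k}\sum_{t=1}^{M}\sum_{n=1}^{\infty}\frac{(2\pi i)^k}{(k - 1)!}\mu^t\sum_{d \mid n}d^{k-1}\left(\lambda^{-n/d}\nu^{td} + (-1)^k\lambda^{n/d}\nu^{-td}\right)q_{\tau/M}^n
\end{aligned}$$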
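(where $\nu = e^{2\pi i/M}$). Using orthogonality relations for roots of unity, this is equal to
$$\begin{aligned}
&\frac{M^{1-k}(-2\pi i)^k}{(k - 1)!}\sum_{n=1}^{\infty}\Bigg(\sum_{\substack{d \mid n\\ d \equiv -j \ (\mathrm{mod}\ M)}}d^{k-1}\lambda^{-n/d} + (-1)^k\sum_{\substack{d \mid n\\ d \equiv j \ (\mathrm{mod}\ M)}}d^{k-1}\lambda^{n/d}\Bigg)q_{\tau/M}^n\\
&\quad= \frac{M^{1-k}(2\pi i)^k}{(k - 1)!}\sum_{n=0}^{\infty}\sum_{m=1}^{\infty}\left(\lambda^m(Mn + j)^{k-1}q_{\tau/M}^{m(Mn + j)} + (-1)^k\lambda^{-m}(Mn + M - j)^{k-1}q_{\tau/M}^{m(Mn + M - j)}\right)\\
&\quad= \frac{M^{1-k}(2\pi i)^k}{(k - 1)!}\sum_{n=0}^{\infty}\left(\frac{\lambda(Mn + j)^{k-1}q_{\tau/M}^{Mn + j}}{1 - \lambda q_{\tau/M}^{Mn + j}} + (-1)^k\frac{\lambda^{-1}(Mn + M - j)^{k-1}q_{\tau/M}^{Mn + M - j}}{1 - \lambda^{-1}q_{\tau/M}^{Mn + M - j}}\right),
\end{aligned}$$
where $1 \le j \le M$. Together with Proposition 4.5 we now get
$$\begin{aligned}
\frac{1}{(2\pi i)^k}\,{\sum}'\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k} &= -\frac{B_k(j/M)}{k!} + \frac{M^{1-k}}{(k - 1)!}\sum_{n=0}^{\infty}\left(\frac{\lambda(Mn + j)^{k-1}q_{\tau/M}^{Mn + j}}{1 - \lambda q_{\tau/M}^{Mn + j}} + (-1)^k\frac{\lambda^{-1}(Mn + M - j)^{k-1}q_{\tau/M}^{Mn + M - j}}{1 - \lambda^{-1}q_{\tau/M}^{Mn + M - j}}\right)\\
&= Q_k(\mu, \lambda, \tau).
\end{aligned}$$
This completes the proof of Theorem 4.8 and hence also Theorem 4.6.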
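The transformation law of Theorem 4.6 can be probed numerically from the $q$-series (4.24) alone. The sketch below is a minimal check for the generators $S$ and $T$ under simplifying assumptions: it takes $1 \le j < M$ (so no boundary terms with exponent zero occur), truncates both sums after a fixed number of terms, and uses hypothetical helper names throughout.

```python
import cmath
from fractions import Fraction
from math import comb, factorial


def bernoulli_poly(k, x):
    """B_k(x) from the Bernoulli numbers of t*e^(tx)/(e^t - 1), with B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, k + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))


def Q(k, j, M, lam, tau, terms=60):
    """Truncated q-series (4.24) for Q_k(mu, lam, tau) with mu = exp(2*pi*i*j/M).
    Assumes 1 <= j < M, so every exponent n + j/M and n - j/M is nonzero."""
    x = j / M

    def q_pow(s):
        # q_tau^s computed directly as exp(2*pi*i*s*tau) to avoid branch issues
        return cmath.exp(2j * cmath.pi * s * tau)

    total = 0
    for n in range(terms):
        w = lam * q_pow(n + x)
        total += (n + x) ** (k - 1) * w / (1 - w)
    for n in range(1, terms):
        w = q_pow(n - x) / lam
        total += (-1) ** k * (n - x) ** (k - 1) * w / (1 - w)
    return total / factorial(k - 1) - float(bernoulli_poly(k, Fraction(j, M))) / factorial(k)
```

For example, with $\mu = e^{2\pi i/3}$ and $\lambda = -1$ the $S$-law predicts `Q(k, 1, 3, -1, -1/tau)` equal to `tau**k * Q(k, 1, 2, 1/mu, tau)`, since $(\mu,\lambda)S = (\lambda,\mu^{-1})$, and the $T$-law predicts `Q(k, 1, 3, lam, tau + 1)` equal to `Q(k, 1, 3, lam * mu, tau)`.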
We remark the well-known fact that if we take $(\mu, \lambda) = (1, 1)$ in Theorem 4.8 we obtain the Eisenstein series of weight $k$ as long as $k \ge 3$ is even. Thus for $k \ge 4$ even,
$$G_k(\tau) = {\sum_{m_1, m_2 \in \mathbb{Z}}}'\frac{1}{(m_1\tau + m_2)^k} = (2\pi i)^k\left(\frac{-B_k(0)}{k!} + \frac{2}{(k - 1)!}\sum_{n=1}^{\infty}\sigma_{k-1}(n)q^n\right), \tag{4.27}$$
where $\sigma_{k-1}(n) = \sum_{d \mid n}d^{k-1}$. For this range of values of $k$, $G_k(\tau)$ is a modular form on $SL(2, \mathbb{Z})$ of weight $k$. We make use of the normalized Eisenstein series
$$E_k(\tau) = \frac{1}{(2\pi i)^k}G_k(\tau) = \frac{-B_k(0)}{k!} + \frac{2}{(k - 1)!}\sum_{n=1}^{\infty}\sigma_{k-1}(n)q^n \tag{4.28}$$
for even $k \ge 2$. (Warning: this is not the same notation as used in (loc. cit.), for example.)
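Because the normalization (4.28) differs from the more common one with constant term 1, it may help to see the first few coefficients explicitly. The following sketch computes them exactly; `bernoulli_number`, `sigma` and `eisenstein` are illustrative names, and $B_k(0)$ is taken to be the $k$th Bernoulli number from the standard recurrence.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_number(k):
    """k-th Bernoulli number B_k = B_k(0), with B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, k + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return B[k]


def sigma(p, n):
    """Divisor sum sigma_p(n) = sum of d^p over the divisors d of n."""
    return sum(d ** p for d in range(1, n + 1) if n % d == 0)


def eisenstein(k, terms):
    """First `terms` q-expansion coefficients of E_k as normalized in (4.28)."""
    c0 = -bernoulli_number(k) / factorial(k)
    return [c0] + [Fraction(2 * sigma(k - 1, n), factorial(k - 1)) for n in range(1, terms)]
```

In this normalization the constant terms of $E_2$, $E_4$, $E_6$ are $-1/12$, $1/720$ and $-1/30240$, and dividing each series by its constant term recovers the familiar expansions $1 - 24\sum\sigma_1(n)q^n$, $1 + 240\sum\sigma_3(n)q^n$ and $1 - 504\sum\sigma_5(n)q^n$.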
We also utilize the differential operator $\partial_k$ which acts on holomorphic functions on $\mathfrak{h}$ via
$$\partial_k f(\tau) = \frac{1}{2\pi i}\frac{df(\tau)}{d\tau} + kE_2(\tau)f(\tau). \tag{4.29}$$
One checks using (4.7) that
$$(\partial_k f)|_{k+2}\gamma = \partial_k(f|_k\gamma). \tag{4.30}$$
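A consequence of (4.30) is that $\partial_k$ raises the weight of a modular form on $SL(2,\mathbb{Z})$ by 2. The sketch below illustrates this on truncated $q$-series: in the normalization (4.28), Ramanujan's classical identity for the derivative of the weight 4 Eisenstein series becomes $\partial_4 E_4 = 14E_6$ (the constant 14 is our own computation from that identity, not a statement in the paper). All helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_number(k):
    """k-th Bernoulli number with B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, k + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return B[k]


def sigma(p, n):
    """Divisor sum sigma_p(n)."""
    return sum(d ** p for d in range(1, n + 1) if n % d == 0)


def eisenstein(k, terms):
    """q-expansion coefficients of E_k as normalized in (4.28)."""
    c0 = -bernoulli_number(k) / factorial(k)
    return [c0] + [Fraction(2 * sigma(k - 1, n), factorial(k - 1)) for n in range(1, terms)]


def serre(k, f):
    """The operator (4.29) on a truncated q-series f = [c_0, c_1, ...].
    (1/(2*pi*i)) d/dtau acts as q d/dq, i.e. c_n -> n*c_n, and the product
    with E_2 is a truncated Cauchy product."""
    E2 = eisenstein(2, len(f))
    return [n * f[n] + k * sum(E2[i] * f[n - i] for i in range(n + 1)) for n in range(len(f))]
```

Comparing `serre(4, eisenstein(4, 8))` with `[14 * c for c in eisenstein(6, 8)]` checks the identity coefficient by coefficient.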
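We complete this section with a discussion of some further functions related to $P$ and $Q$ which occur later. Again we take $\mu = e^{2\pi i j/M}$ and $\lambda = e^{2\pi i l/N}$. Set for $k \ge 1$,
$$\bar{P}_k(\mu, \lambda, z, q_\tau) = \bar{P}_k(\mu, \lambda, z, \tau) = \frac{1}{(k - 1)!}\,{\sum_{n \in j/M + \mathbb{Z}}}'\frac{n^{k-1}z^n}{1 - \lambda q_\tau^n} \tag{4.31}$$
with
$$\bar{P}_0 = 0. \tag{4.32}$$
Recall (3.6). We shall need the following result in Sect. 8.

Proposition 4.9. If $m \in \mathbb{Z}$, $\mu = e^{2\pi i j/M}$, $k \ge 0$ and $(\mu, \lambda) \neq (1, 1)$, then
$$\begin{aligned}
Q_k(\mu, \lambda, \tau) + \frac{1}{k!}B_k(1 - m + j/M) ={}& \mathrm{Res}_z\left(\iota_{z, z_1}((z - z_1)^{-1})z_1^{m - j/M}z^{-m + j/M}\bar{P}_k\left(\mu, \lambda, \frac{z_1}{z}, \tau\right)\right)\\
&- \mathrm{Res}_z\left(\lambda\,\iota_{z_1, z}((z - z_1)^{-1})z_1^{m - j/M}z^{-m + j/M}\bar{P}_k\left(\mu, \lambda, \frac{z_1 q_\tau}{z}, \tau\right)\right).
\end{aligned}$$

Proof. The result is clear if $k = 0$, so take $k \ge 1$. The first of the two residues we must evaluate is equal to
$$\begin{aligned}
\mathrm{Res}_z\sum_{n=0}^{\infty}\sum_{r \in \mathbb{Z}}\frac{1}{(k - 1)!\,z}\left(\frac{z_1}{z}\right)^{n + m - j/M}\frac{(r + j/M)^{k-1}\left(\frac{z_1}{z}\right)^{r + j/M}}{1 - \lambda q_\tau^{r + j/M}} &= \frac{1}{(k - 1)!}\sum_{n=0}^{\infty}\frac{(-n - m + j/M)^{k-1}}{1 - \lambda q_\tau^{-n - m + j/M}}\\
&= \frac{(-1)^k}{(k - 1)!}\sum_{n=m}^{\infty}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}}.
\end{aligned}$$
Similarly the second residue is equal to
$$\frac{-1}{(k - 1)!}\sum_{n=1-m}^{\infty}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}}.$$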
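Comparing with the definition of $Q_k(\mu, \lambda, \tau)$, we see that we are reduced to establishing that
$$\frac{1}{k!}\left(B_k(1 - m + j/M) - B_k(j/M)\right) = \begin{cases}
\displaystyle\frac{-1}{(k - 1)!}\sum_{n=0}^{-m}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}} + \frac{(-1)^k}{(k - 1)!}\sum_{n=m}^{0}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}}, & m \le 0,\\[10pt]
\displaystyle\frac{1}{(k - 1)!}\sum_{n=1-m}^{-1}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}} - \frac{(-1)^k}{(k - 1)!}\sum_{n=1}^{m-1}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}}, & m \ge 2,\\[10pt]
0, & m = 1.
\end{cases} \tag{4.33}$$
The case $m = 1$ is trivial. Assume next that $m \le 0$. Then the two summations in (4.33) are equal to
$$\frac{-1}{(k - 1)!}\sum_{n=0}^{-m}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}} - \frac{1}{(k - 1)!}\sum_{n=m}^{0}\frac{(-n + j/M)^{k-1}}{\lambda q_\tau^{-n + j/M} - 1} = \frac{1}{(k - 1)!}\sum_{n=0}^{-m}(n + j/M)^{k-1}.$$
Now the desired result follows from (4.22). Similarly, if $m \ge 2$ the two summands in (4.33) sum to
$$\frac{(-1)^k}{(k - 1)!}\sum_{n=1}^{m-1}(n - j/M)^{k-1}.$$
Two applications of (4.22) then yield the desired result.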
5. The Space of 1-Point Functions on the Torus

The following notation will be in force for some time:
(a) $V$ is a vertex operator algebra.
(b) $g, h \in \mathrm{Aut}(V)$ have finite order and satisfy $gh = hg$.
(c) $A = \langle g, h\rangle$.
(d) $g$ has order $T$, $h$ has order $T_1$ and $A$ has exponent $N = \mathrm{lcm}(T, T_1)$.
(e) $\Gamma(T, T_1)$ is the subgroup of matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in $SL(2, \mathbb{Z})$ satisfying $a \equiv d \equiv 1 \pmod{N}$, $b \equiv 0 \pmod{T}$, $c \equiv 0 \pmod{T_1}$.
(f) $M(T, T_1)$ is the ring of holomorphic modular forms on $\Gamma(T, T_1)$; it is naturally graded $M(T, T_1) = \oplus_{k \ge 0}M_k(T, T_1)$, where $M_k(T, T_1)$ is the space of forms of weight $k$. We also set $M(1) = M(1, 1)$.
(g) $V(T, T_1) = M(T, T_1) \otimes_{\mathbb{C}} V$.
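(h) $O(g, h)$ is the $M(T, T_1)$-submodule of $V(T, T_1)$ generated by the following elements, where $v \in V$ satisfies $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$:
$$v[0]w, \quad w \in V,\ (\mu, \lambda) = (1, 1), \tag{5.1}$$
$$v[-2]w + \sum_{k=2}^{\infty}(2k - 1)E_{2k}(\tau) \otimes v[2k - 2]w, \quad (\mu, \lambda) = (1, 1), \tag{5.2}$$
$$v, \quad (\mu, \lambda) \neq (1, 1), \tag{5.3}$$
$$\sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau) \otimes v[k - 1]w, \quad (\mu, \lambda) \neq (1, 1). \tag{5.4}$$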
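The congruence conditions in (e) are easy to experiment with. The sketch below tests membership in the group of (e) and checks the property that makes it relevant here: its elements fix every torsion point $(\mu,\lambda)$ with $\mu^T = \lambda^{T_1} = 1$ under the action (4.2). The function names and the exponent encoding (multiples of $1/N$) are assumptions made for illustration.

```python
from math import gcd


def mat_mul(g1, g2):
    """Product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = g1
    (e, f), (g, h) = g2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def in_gamma(g, T, T1):
    """Membership test for the subgroup of item (e): determinant 1,
    a = d = 1 mod N, b = 0 mod T, c = 0 mod T1, with N = lcm(T, T1)."""
    (a, b), (c, d) = g
    N = T * T1 // gcd(T, T1)
    return (a * d - b * c == 1 and a % N == 1 % N and d % N == 1 % N
            and b % T == 0 and c % T1 == 0)


def fixes_torsion_points(g, T, T1):
    """Check (mu^a lam^c, mu^b lam^d) == (mu, lam) whenever mu^T = lam^T1 = 1.
    Exponents are written in units of 1/N."""
    (a, b), (c, d) = g
    N = T * T1 // gcd(T, T1)
    for x in range(T):
        for y in range(T1):
            X, Y = x * (N // T), y * (N // T1)
            if ((a * X + c * Y) % N, (b * X + d * Y) % N) != (X, Y):
                return False
    return True
```

For $T = 2$, $T_1 = 3$ the matrices $\begin{pmatrix}1&2\\0&1\end{pmatrix}$ and $\begin{pmatrix}1&0\\3&1\end{pmatrix}$ pass the membership test, as does their product, while $\begin{pmatrix}1&1\\0&1\end{pmatrix}$ does not.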
Here, notation for modular forms is as in Sect. 4. These definitions are sensible because of the following:

Lemma 5.1. $M(T, T_1)$ is a Noetherian ring which contains each $E_{2k}(\tau)$, $k \ge 2$, and each $Q_k(\mu, \lambda, \tau)$, $k \ge 0$, for $\mu, \lambda$ a $T$th, resp. $T_1$th, root of unity.

Proof. It is well-known that the ring of holomorphic modular forms on any congruence subgroup of $SL(2, \mathbb{Z})$ is Noetherian. So the first statement holds. Each $E_{2k}$ is a modular form on $SL(2, \mathbb{Z})$, whereas the containment $Q_k(\mu, \lambda, \tau) \in M_k(T, T_1)$ follows from Theorem 4.6.

Lemma 5.2. Suppose that $V$ satisfies Condition $C_2$. Then $V(T, T_1)/O(g, h)$ is a finitely generated $M(T, T_1)$-module.

Proof. Since $C_2(V)$ is a graded subspace of $V$ of finite codimension, there is an integer $m$ such that $V_n \subset C_2(V)$ whenever $n > m$. Let $M$ be the $M(T, T_1)$-submodule of $V(T, T_1)$ generated by $W = \oplus_{n \le m}V_n$. The lemma will follow from the assertion that $V(T, T_1) = M + O(g, h)$. This will be established by proving that if $v \in V_{[k]}$ (cf. (2.22)) then $v \in M + O(g, h)$. If $k \le m$ then $v \in W$ by (2.23) and we are done, so we may take $k > m$. Since $V_{[k]} \subset C_2(V) + W$ then we may write $v$ in the form
$$v = w + \sum_{i=1}^{p}a_i(-2)b_i$$
with $a_i, b_i \in V$ homogeneous in the vertex operator algebra $(V, Y[\ ])$ such that $\mathrm{wt}[a_i] + \mathrm{wt}[b_i] + 1 = k$. Clearly, it suffices to show that $a_i(-2)b_i \in M + O(g, h)$. We may also assume that $ga_i = \mu^{-1}a_i$, $ha_i = \lambda^{-1}a_i$ for suitable scalars $\mu, \lambda$.

Suppose first that $(\mu, \lambda) = (1, 1)$. From (5.2) we see that $O(g, h)$ contains
$$a_i[-2]b_i + \sum_{l=2}^{\infty}(2l - 1)E_{2l}(\tau) \otimes a_i[2l - 2]b_i.$$
Since $\mathrm{wt}[a_i[2l - 2]b_i] = k - 2l$, then the sum
$$\sum_{l=2}^{\infty}(2l - 1)E_{2l}(\tau) \otimes a_i[2l - 2]b_i$$
lies in $M + O(g, h)$ by the inductive hypothesis, whence so too does $a_i[-2]b_i$.
On the other hand, it follows from (2.9) that we have
$$v(n) = v[n] + \sum_{j > n}\alpha_j v[j]$$
for $v \in V$, $j \in \mathbb{Z}$ and scalars $\alpha_j$. In particular we get
$$a_i(-2)b_i = a_i[-2]b_i + \sum_{j > -2}\alpha_j a_i[j]b_i.$$
Having already shown that each of the summands $a_i[j]b_i$ lies in $M + O(g, h)$, $j \ge -2$, we get $a_i(-2)b_i \in M + O(g, h)$ as desired.

Now suppose that $(\mu, \lambda) \neq (1, 1)$. In this case (5.4) tells us that $O(g, h)$ contains the element
$$-a_i[-1]b_i + \sum_{l=1}^{\infty}Q_l(\mu, \lambda, \tau) \otimes a_i[l - 1]b_i$$
(cf. (4.25)). More to the point, $O(g, h)$ also contains the same expression with $a_i$ replaced by $L[-1]a_i$. Since $(L[-1]a_i)[t] = -ta_i[t - 1]$ by (2.7), we see that $O(g, h)$ contains the element
$$a_i[-2]b_i + \sum_{l=1}^{\infty}(l - 1)Q_l(\mu, \lambda, \tau) \otimes a_i[l - 2]b_i.$$
Now we proceed as before to conclude that $a_i(-2)b_i \in M + O(g, h)$.

There is a natural grading on $V(T, T_1)$. Namely, the subspace of elements of degree $n$ is defined to be
$$V(T, T_1)_n = \oplus_{k + l = n}M_k(T, T_1) \otimes V_{[l]}. \tag{5.5}$$
Observe that $O(g, h)$ is a graded subspace of $V(T, T_1)$.

Lemma 5.3. Suppose $V$ satisfies Condition $C_2$. If $v \in V$ then there is $m \in \mathbb{N}$ and elements $r_i(\tau) \in M(T, T_1)$, $0 \le i \le m - 1$, such that
$$L[-2]^m v + \sum_{i=0}^{m-1}r_i(\tau) \otimes L[-2]^i v \in O(g, h). \tag{5.6}$$

Proof. Let $I$ be the $M(T, T_1)$-submodule of $V(T, T_1)/O(g, h)$ generated by $\{L[-2]^i v,\ i \ge 0\}$. Since $M(T, T_1)$ is a Noetherian ring, Lemma 5.2 tells us that $I$ is finitely generated and so some relation of the form (5.6) must hold.

We now define the space of $(g, h)$ 1-point functions $C_1(g, h)$ to be the $\mathbb{C}$-linear space consisting of functions
$$S : V(T, T_1) \times \mathfrak{h} \to \mathbb{C}$$
which satisfy
(C1) $S(v, \tau)$ is holomorphic in $\tau$ for $v \in V(T, T_1)$.
(C2) $S(v, \tau)$ is $M(T, T_1)$-linear in the sense that $S$ is $\mathbb{C}$-linear in $v$ and satisfies
$$S(f \otimes v, \tau) = f(\tau)S(v, \tau) \tag{5.7}$$
for $f \in M(T, T_1)$ and $v \in V$.
(C3) $S(v, \tau) = 0$ if $v \in O(g, h)$.
(C4) If $v \in V$ satisfies $gv = hv = v$ then
$$S(L[-2]v, \tau) = \partial S(v, \tau) + \sum_{l=2}^{\infty}E_{2l}(\tau)S(L[2l - 2]v, \tau). \tag{5.8}$$
In (5.8), $\partial S$ is the operator which is linear in $v$ and satisfies
$$\partial S(v, \tau) = \partial_k S(v, \tau) = \frac{1}{2\pi i}\frac{d}{d\tau}S(v, \tau) + kE_2(\tau)S(v, \tau) \tag{5.9}$$
for $v \in V_{[k]}$ (cf. (4.29)).

Theorem 5.4 (Modular-Invariance). For $S \in C_1(g, h)$ and $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$ define
$$S|\gamma(v, \tau) = S|_k\gamma(v, \tau) = (c\tau + d)^{-k}S(v, \gamma\tau) \tag{5.10}$$
for $v \in V_{[k]}$, and extend linearly. Then $S|\gamma \in C_1((g, h)\gamma)$.

Proof. We need to verify that $S|\gamma$ satisfies (C3)-(C4) with $(g, h)\gamma = (g^a h^c, g^b h^d)$ in place of $(g, h)$.

Step 1. $S|\gamma$ vanishes on $O((g, h)\gamma)$. Pick $v, w \in V$ homogeneous in $(V, Y[\ ])$ and with $g^a h^c v = \mu^{-1}v$, $g^b h^d v = \lambda^{-1}v$. Suppose to begin with that $(\mu, \lambda) = (1, 1)$. We must show that $S|\gamma(u, \tau) = 0$ when $u$ is one of the elements (5.1) and (5.2). This follows easily from (5.10), the equality $S(u, \tau) = 0$, and the fact that $E_{2k}$ is modular of weight $2k$.

Now assume that $(\mu, \lambda) \neq (1, 1)$. If $gv = \alpha^{-1}v$ and $hv = \beta^{-1}v$ then $(\alpha, \beta) \neq (1, 1)$, so that certainly $S|\gamma(v, \tau) = (c\tau + d)^{-\mathrm{wt}[v]}S(v, \gamma\tau) = 0$. So it remains to show that $S|\gamma(u, \tau) = 0$ for
$$u = \sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau) \otimes v[k - 1]w.$$
First note that we have $(\alpha, \beta) = (\mu, \lambda)\gamma^{-1}$. Then with Lemma 5.1 and Theorem 4.6 we calculate that
$$\begin{aligned}
S|\gamma(u, \tau) &= \sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau)S|\gamma(v[k - 1]w, \tau)\\
&= \sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau)(c\tau + d)^{-\mathrm{wt}[v] - \mathrm{wt}[w] + k}S(v[k - 1]w, \gamma\tau)\\
&= (c\tau + d)^{-\mathrm{wt}[v] - \mathrm{wt}[w]}\sum_{k=0}^{\infty}Q_k(\alpha, \beta, \gamma\tau)S(v[k - 1]w, \gamma\tau)\\
&= (c\tau + d)^{-\mathrm{wt}[v] - \mathrm{wt}[w]}\sum_{k=0}^{\infty}S(Q_k(\alpha, \beta, \gamma\tau) \otimes v[k - 1]w, \gamma\tau),
\end{aligned}$$
which is indeed 0 since $S \in C_1(g, h)$.
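Step 2. $S|\gamma$ satisfies (5.8). First note that if $g^a h^c v = g^b h^d v = v$ then also $gv = hv = v$. Then we calculate using (4.30) that
$$\begin{aligned}
S|\gamma(L[-2]v, \tau) &= (c\tau + d)^{-\mathrm{wt}[v] - 2}S(L[-2]v, \gamma\tau)\\
&= (c\tau + d)^{-\mathrm{wt}[v] - 2}\left(\partial S(v, \gamma\tau) + \sum_{k=2}^{\infty}E_{2k}(\gamma\tau)S(L[2k - 2]v, \gamma\tau)\right)\\
&= (\partial_{\mathrm{wt}[v]}S)|_{\mathrm{wt}[v] + 2}\gamma(v, \tau) + \sum_{k=2}^{\infty}(c\tau + d)^{2k - \mathrm{wt}[v] - 2}E_{2k}(\tau)S(L[2k - 2]v, \gamma\tau)\\
&= \partial_{\mathrm{wt}[v]}(S|_{\mathrm{wt}[v]}\gamma)(v, \tau) + \sum_{k=2}^{\infty}E_{2k}(\tau)S|\gamma(L[2k - 2]v, \tau).
\end{aligned}$$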
This completes the proof of Step 2, and with it that of the theorem.
6. The Differential Equations

In this section we study certain differential equations which are satisfied by functions $S(v, \tau)$ in the space of $(g, h)$ 1-point functions. The idea is to exploit Lemma 5.3 together with (5.8). We fix an element $S \in C_1(g, h)$.

Lemma 6.1. Let $v \in V$ and suppose that $V$ satisfies Condition $C_2$. There are $m \in \mathbb{N}$ and $r_i(\tau) \in M(T, T_1)$, $0 \le i \le m - 1$, such that
$$S(L[-2]^m v, \tau) + \sum_{i=0}^{m-1}r_i(\tau)S(L[-2]^i v, \tau) = 0. \tag{6.1}$$

Proof. Combine Lemma 5.3 together with (C2) and (C3).

In the following we extend (5.9) by setting for $f \in M_l(T, T_1)$, $v \in V_{[k]}$,
$$\partial S(f \otimes v, \tau) = \partial_{k+l}(S(f \otimes v, \tau)) = \partial_{k+l}(f(\tau)S(v, \tau)) \tag{6.2}$$
(cf. (4.29)). Then define
$$\partial^i S(f \otimes v, \tau) = \partial_{k + l + 2(i-1)}(\partial^{i-1}S(f \otimes v, \tau)) \tag{6.3}$$
for $i \ge 1$. Note that
$$\partial S(f \otimes v, \tau) = (\partial_l f(\tau))S(v, \tau) + f(\tau)\partial S(v, \tau). \tag{6.4}$$
Moreover $\partial_l f(\tau) \in M_{l+2}(T, T_1)$ as we see from (4.30).

The simplest case to study is that corresponding to a primary field, i.e. a vector $v$ which is a highest weight vector for the Virasoro algebra in $(V, Y[\ ])$. Thus $v$ satisfies $L[n]v = 0$ for $n > 0$. We assume that this holds until further notice. First note that we have
$$S(L[-2]v, \tau) = \partial S(v, \tau). \tag{6.5}$$
This follows from (5.8) if $gv = hv = v$. In general, it is a consequence of this special case, the linearity of $S(v, \tau)$ in $v$, and the identity $S(w, \tau) = 0$ if $gw = \mu^{-1}w$, $hw = \lambda^{-1}w$ and $(\mu, \lambda) \neq (1, 1)$. This latter equality follows from (5.3) and (C3). In the same way, we find from (5.8) that
$$S(L[-2]^{i+1}v, \tau) = \partial S(L[-2]^i v, \tau) + \sum_{k=2}^{\infty}E_{2k}(\tau)S(L[2k - 2]L[-2]^i v, \tau). \tag{6.6}$$
Using the Virasoro algebra relation we easily find that for $i \in \mathbb{N}$ and $k \ge 2$ there are scalars $\alpha_{ijk}$, $0 \le j \le i - 1$, such that
$$L[2k - 2]L[-2]^i v = \sum_{j=0}^{i-1}\alpha_{ijk}L[-2]^j v, \tag{6.7}$$
so that (6.6) becomes
$$S(L[-2]^{i+1}v, \tau) = \partial S(L[-2]^i v, \tau) + \sum_{j=0}^{i-1}\sum_{k=2}^{\infty}\alpha_{ijk}E_{2k}(\tau)S(L[-2]^j v, \tau). \tag{6.8}$$
Now proceeding by induction on $i$, the case $i = 1$ being (6.5), one proves

Lemma 6.2. Suppose that $L[n]v = 0$ for $n > 0$. Then for $i \ge 1$ there are elements $f_j(\tau) \in M(1)$, $0 \le j \le i - 1$, such that
$$S(L[-2]^i v, \tau) = \partial^i S(v, \tau) + \sum_{j=0}^{i-1}f_j(\tau)\partial^j S(v, \tau). \tag{6.9}$$

Combine Lemmas 6.2 and 6.1 to obtain

Lemma 6.3. Suppose that $V$ satisfies Condition $C_2$, and that $v \in V$ satisfies $L[n]v = 0$ for $n > 0$. Then there are $m \in \mathbb{N}$ and $g_i(\tau) \in M(T, T_1)$, $0 \le i \le m - 1$, such that
$$\partial^m S(v, \tau) + \sum_{i=0}^{m-1}g_i(\tau)\partial^i S(v, \tau) = 0. \tag{6.10}$$

Bearing in mind the definition of $\partial$ (cf. (5.9), (6.3)), (6.10) may be reformulated as follows:

Proposition 6.4. Let $R = R(T, T_1)$ be the ring of holomorphic functions generated by $E_2(\tau)$ and $M(T, T_1)$. Suppose that $V$ satisfies Condition $C_2$, and that $v \in V$ satisfies $L[n]v = 0$ for $n > 0$. Then there are $m \in \mathbb{N}$ and $r_i(\tau) \in R(T, T_1)$, $0 \le i \le m - 1$, such that
$$\left(q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}\right)^m S(v, \tau) + \sum_{i=0}^{m-1}r_i(\tau)\left(q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}\right)^i S(v, \tau) = 0, \tag{6.11}$$
where $q_{\frac{1}{T}} = e^{2\pi i\tau/T}$.
We observe here only that
$$q\frac{d}{dq} = \frac{1}{T}q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}} = \frac{1}{2\pi i}\frac{d}{d\tau}.$$
Now (6.11) is a homogeneous linear differential equation with holomorphic coefficients $r_i(\tau) \in R$, and such that 0 is a regular singular point. The forms in $R(T, T_1)$ have Fourier expansions at $\infty$ which are power series in $q_{\frac{1}{T}}$ because they are invariant under $\begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix}$. We are therefore in a position to apply the theory of Frobenius–Fuchs concerning the nature of the solutions to such equations. A good reference for the elementary aspects of this theory is [I], but the reader may also consult [AM] where they arise in a context related to that of the present paper. We will also need some results from (loc. cit.) in Sect. 11. Frobenius–Fuchs theory tells us that $S(v, \tau)$ may be expressed in the following form: for some $p \ge 0$,
$$S(v, \tau) = \sum_{i=0}^{p}(\log q_{\frac{1}{T}})^i S_i(v, \tau), \tag{6.12}$$
where
$$S_i(v, \tau) = \sum_{j=1}^{b(i)}q^{\lambda_{ij}}S_{i,j}(v, \tau), \tag{6.13}$$
$$S_{i,j}(v, \tau) = \sum_{n=0}^{\infty}a_{i,j,n}(v)q^{n/T} \tag{6.14}$$
are holomorphic on the upper half-plane, and
$$\lambda_{i,j_1} \not\equiv \lambda_{i,j_2} \pmod{\tfrac{1}{T}\mathbb{Z}} \tag{6.15}$$
for $j_1 \neq j_2$. We are going to prove

Theorem 6.5. Suppose that $V$ satisfies Condition $C_2$. For every $v \in V$, the function $S(v, \tau) \in C_1(g, h)$ can be expressed in the form (6.12)-(6.15). Moreover, $p$ is bounded independently of $v$.

We begin by proving by induction on $k$ that if $v \in V_{[k]}$ then $S(v, \tau)$ has an expression of the type (6.12). We have already shown this if $v$ is a highest weight vector for the Virasoro algebra and in particular if $v$ is in the top level $V_{[t]}$ of $(V, Y[\ ])$, i.e., if $V_{[t]} \neq 0$ and $V_{[s]} = 0$ for $s < t$. This begins the induction. The proof is an elaboration of the previous case. We may assume that $gv = hv = v$.

Lemma 6.6. Suppose that $l \ge 2$ and $i \ge 0$. Then there are scalars $\alpha_{ijl}$ and $w_{ijl} \in V_{[2i + 2 - 2l - 2j + k]}$, $0 \le j \le i - 1$, such that
$$L[2l - 2]L[-2]^i v = L[-2]^i L[2l - 2]v + \sum_{j=0}^{i-1}\alpha_{ijl}L[-2]^j w_{ijl}. \tag{6.16}$$
Moreover $\mathrm{wt}[w_{ijl}] \le \mathrm{wt}[v]$ with equality only if $w_{ijl} = v$.
Proof. By induction on $i + l$, the case $i = 0$ being trivial. Now we calculate
$$\begin{aligned}
L[2l - 2]L[-2]^{i+1}v &= \left(L[-2]L[2l - 2] + 2lL[2l - 4] + \delta_{l,2}\frac{c}{2}\right)L[-2]^i v\\
&= L[-2]^{i+1}L[2l - 2]v + \sum_{j=0}^{i-1}\alpha_{ijl}L[-2]^{j+1}w_{ijl} + 2lL[2l - 4]L[-2]^i v + \delta_{l,2}\frac{c}{2}L[-2]^i v.
\end{aligned}$$
Either $l = 2$ or the inductive hypothesis applies to $L[2l - 4]L[-2]^i v$, and in either case the lemma follows.

Now use (5.8) and Lemma 6.6 to see that
$$S(L[-2]^{i+1}v, \tau) = \partial S(L[-2]^i v, \tau) + \sum_{l=2}^{\infty}E_{2l}(\tau)\left(S(L[-2]^i L[2l - 2]v, \tau) + \sum_{j=0}^{i}\alpha_{ijl}S(L[-2]^j w_{ijl}, \tau)\right). \tag{6.17}$$
Note that (6.17) is the appropriate generalization of (6.8). By induction based on (6.17) we find

Lemma 6.7. For $i \ge 1$ we have
$$S(L[-2]^i v, \tau) = \partial^i S(v, \tau) + \sum_{j=0}^{i-1}f_{ij}(\tau)\partial^j S(v, \tau) + \sum_{j=0}^{i-1}\sum_{l}g_{ijl}(\tau)\partial^j S(w_{ijl}, \tau), \tag{6.18}$$
where $f_{ij}(\tau), g_{ijl}(\tau) \in M(1)$ and $\mathrm{wt}[w_{ijl}] < \mathrm{wt}[v]$.

The analogue of Lemma 6.3 is now

Lemma 6.8. There is $m \in \mathbb{N}$ such that
$$\partial^m S(v, \tau) + \sum_{i=0}^{m-1}g_i(\tau)\partial^i S(v, \tau) + \sum_{j=0}^{m}\sum_{l}h_{jl}(\tau)\partial^j S(w_{jl}, \tau) = 0 \tag{6.19}$$
for $g_i(\tau), h_{jl}(\tau) \in M(T, T_1)$, and $\mathrm{wt}[w_{jl}] < \mathrm{wt}[v]$.

We are now in a position to complete the proof that $S(v, \tau)$ has an expression of the form (6.12)-(6.15). By induction this is true of the terms $S(w_{jl}, \tau)$ in (6.19), and hence the third summand on the l.h.s. of (6.19) also has such an expression. Thus as before we may view (6.19) as a differential equation of regular singular type, this time inhomogeneous, namely,
$$\left(q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}\right)^m S(v, \tau) + \sum_{i=0}^{m-1}r_i(\tau)\left(q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}\right)^i S(v, \tau) + \sum_{i=0}^{p}(\log q_{\frac{1}{T}})^i u_i(v, \tau) = 0, \tag{6.20}$$
where $r_i(\tau) \in M(T, T_1)$ (cf. Proposition 6.4) and $u_i(v, \tau)$ satisfies (6.13)-(6.15). One easily sees (cf. [I, AM]) that the functions $(\log q_{\frac{1}{T}})^i u_i(v, \tau)$, $0 \le i \le p$, are themselves solutions of a differential equation of regular singular type (6.11) with
coefficients analytic in the upper half plane. Let us formally state this by saying that they are solutions of the differential equation $L_1 f = 0$, where $L_1$ is a suitable linear differential operator with 0 as regular singular point and coefficients analytic in the upper half plane. Now (6.20) takes the form $L_2 S + f = 0$ for the corresponding linear differential operator $L_2$, so that we get $L_1 L_2 S = 0$. But $L_1 L_2$ is once again a linear differential operator of the appropriate type, so again the Frobenius–Fuchs theory allows us to conclude that $S(v, \tau)$ indeed satisfies (6.12)-(6.15).

It remains to prove that the integer $p$ in (6.12) can be bounded independently of $v$. Indeed, we showed in Lemma 5.2 that if $W = \oplus_{n \le m}V_n$ and $V_n \subset C_2(V)$ for $n > m$ then $V(T, T_1)/O(g, h)$ is generated as $M(T, T_1)$-module by $W$. So for $v \in V$ we have $v \equiv \sum_i f_i(\tau) \otimes w_i \pmod{O(g, h)}$, where $\{w_i\}$ is a basis for $W$, whence $S(v, \tau) = \sum_i f_i(\tau)S(w_i, \tau)$ since $S$ vanishes on $O(g, h)$. Clearly then, we may take $p$ to be the maximum of the corresponding integers determined by $S(w_i, \tau)$. This completes the proof of Theorem 6.5.

7. Formal 1-Point Functions

Although we dealt with holomorphic functions in Sect. 6, the arguments were all formal in nature. In this short section we record a consequence of this observation. We identify elements of $M(T, T_1)$ with their Fourier expansions at $\infty$, which lie in the ring of formal power series $\mathbb{C}[[q_{\frac{1}{T}}]]$. Similarly, the functions $E_{2k}(\tau)$, $k \ge 1$, are considered to lie in $\mathbb{C}[[q]]$. The operator $\partial$ (cf. (5.9), (6.2)) operates on these and other power series via the identification
$$\frac{1}{2\pi i}\frac{d}{d\tau} = \frac{1}{T}q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}.$$
A formal $(g, h)$ 1-point function is a map
$$S : V(T, T_1) \to P,$$
where $P$ is the space of formal power series of the form
$$q^\lambda\sum_{n=0}^{\infty}a_n q^{n/T} \tag{7.1}$$
for some $\lambda \in \mathbb{C}$, and which satisfies the formal analogues of (C2)-(C4) in Sect. 5. We will establish

Theorem 7.1. Suppose that $S$ is a formal $(g, h)$ 1-point function. Then $S$ defines an element of $C_1(g, h)$, also denoted by $S$, via the identification
$$S(v, \tau) = S(v, q), \quad q = q_\tau = e^{2\pi i\tau}. \tag{7.2}$$

The main point is to show that if $S$ is a formal $(g, h)$ 1-point function, and if $v \in V$ is such that
$$S(v, q) = q^\lambda\sum_{n=0}^{\infty}a_n q^{n/T},$$
then $q_\tau^\lambda\sum_{n=0}^{\infty}a_n q_\tau^{n/T}$ is holomorphic in $\tau$. We prove this as in Sect. 6. Namely, by first showing that if $v$ is a highest weight vector for the Virasoro algebra then $S(v, q)$ satisfies a differential equation of type (6.11). Since the coefficients are holomorphic in $\mathfrak{h}$, the Frobenius–Fuchs theory tells us that $S(v, q)$ has the desired convergence.
Proceeding by induction on $\mathrm{wt}[v]$, in the general case we arrive at an inhomogeneous differential equation of type (6.20). Again convergence of $S(v, q)$ follows from the Frobenius–Fuchs theory. Since the proofs of these assertions are precisely the same as those of Sect. 6, we omit further discussion.

8. Correlation Functions

In this section we start to relate the theory of 1-point functions to that of twisted $V$-modules. We keep the notation (a)-(h) introduced at the beginning of Sect. 5, and introduce now a simple $g$-twisted $V$-module $M = M(g) = \oplus_{n=0}^{\infty}M_{\lambda + n/T}$ (cf. (3.11)). We further assume that $h$ leaves $M$ stable, that is $h \circ M \cong M$. As remarked in Sect. 3, there is a projective representation on $M$ of the stabilizer (in $\mathrm{Aut}\,V$) of $M$, and we let $\phi(h)$ be a linearized action on $M$ of the element corresponding to $h$. This all means (cf. (3.15), (3.16)) that if $v \in V$ operates on $M$ via the vertex operator $Y_M(v, z)$ then we have (as operators on $M$)
$$\phi(h)Y_M(v, z)\phi(h)^{-1} = Y_M(hv, z). \tag{8.1}$$
We define $M' = \oplus_{n=0}^{\infty}M'_{\lambda + n/T}$ to be the restricted dual of $M$, so that $M'_n = \mathrm{Hom}_{\mathbb{C}}(M_n, \mathbb{C})$ and there is a pairing $\langle\,,\rangle : M' \times M \to \mathbb{C}$ such that $\langle M'_n, M_m\rangle = 0$ if $m \neq n$. With this notation, a $(g, h)$ 1-point correlation function is essentially a trace function, namely
$$\mathrm{tr}\,Y_M(v, z)q^{L(0)} = \sum_{w, w'}\langle w', Y_M(v, z)q^{L(0)}w\rangle, \tag{8.2}$$
where $w$ ranges over a homogeneous basis of $M$, $w'$ ranges over the dual basis of $M'$, and $q$ is indeterminate. As Laurent series we have
$$\mathrm{tr}\,Y_M(v, z)q^{L(0)} = \sum_{w, w'}\sum_{n \in \frac{1}{T}\mathbb{Z}}\langle w', v(n)q^{L(0)}w\rangle z^{-n-1}. \tag{8.3}$$
It is easy to see that the trace function is independent of the choice of basis. Now we introduce the function $T$ which is linear in $v \in V$, and defined for homogeneous $v \in V$ as follows:
$$T(v) = T_M(v, (g, h), q) = z^{\mathrm{wt}\,v}\,\mathrm{tr}\,Y_M(v, z)\phi(h)q^{L(0) - c/24}. \tag{8.4}$$
Here $c$ is the central charge of $V$. Next observe that for $m \in \frac{1}{T}\mathbb{Z}$, $v(m)$ maps $M_n$ to $M_{n + \mathrm{wt}\,v - m - 1}$. So unless $m = \mathrm{wt}\,v - 1$, we have $\langle w', v(m)\phi(h)w\rangle = 0$. So only the zero mode $o(v) = v(\mathrm{wt}\,v - 1)$ contributes to the sum in (8.3). Thus $T(v)$ is independent of $z$, and
$$T(v) = q^{\lambda - c/24}\sum_{n=0}^{\infty}\mathrm{tr}_{M_{\lambda + n/T}}\,o(v)\phi(h)q^{n/T}. \tag{8.5}$$
We could equally write
$$T_M(v) = \mathrm{tr}_M\,o(v)\phi(h)q^{L(0) - c/24}. \tag{8.6}$$
We are going to prove
Theorem 8.1. $T(v) \in C_1(g,h)$.

The strategy is to prove that $T$ is a formal $(g,h)$ 1-point function, then invoke Theorem 7.1. Certainly $T(v)$ has the correct shape as a power series in $q$ (cf. (7.1)). So we must establish that $T(v)$ satisfies the formal analogues of (C2)-(C4). We can impose $M(T,T_1)$-linearity (C2) by extension of scalars. As we shall explain, the proof of (C4) is contained in Zhu's paper [Z]. So it remains to discuss (C3), that is, we must show that $T(v)$ vanishes on $O(g,h)$, i.e., on the elements of type (5.1)-(5.4). Again we shall later explain that (5.1) and (5.2) may be deduced from results in [Z], so we concentrate on (5.3) and (5.4). To this end, let us now fix a homogeneous $v \in V$ such that $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$ and $(\mu,\lambda) \neq (1,1)$. We need to establish

Lemma 8.2. $T(v) = 0$.

Theorem 8.3. $\sum_{k=0}^{\infty} Q_k(\mu,\lambda,\tau)T(v[k-1]w) = 0$ for any $w \in V$.

The proof of Lemma 8.2 is easy. We have already seen that only the zero mode $o(v)$ of $v$ contributes a possible non-zero term in the calculation of $T(v)$. On the other hand, if $\mu \neq 1$ then from (3.4) we see that $o(v) = 0$. So Lemma 8.2 certainly holds if $\mu \neq 1$. Suppose that $\lambda \neq 1$. We have $T(v) = \mathrm{tr}_M\,o(v)\phi(h)q^{L(0)-c/24} = \mathrm{tr}_M\,\phi(h)o(v)q^{L(0)-c/24}$, i.e.,
$$\mathrm{tr}\,Y_M(v,z)\phi(h)q^{L(0)} = \mathrm{tr}\,\phi(h)Y_M(v,z)q^{L(0)}. \qquad (8.7)$$
But (8.1) yields
$$\phi(h)Y_M(v,z) = \lambda^{-1}Y_M(v,z)\phi(h). \qquad (8.8)$$
As $\lambda \neq 1$, (8.7) and (8.8) yield $\mathrm{tr}\,Y_M(v,z)\phi(h)q^{L(0)} = 0$. This completes the proof of Lemma 8.2.

The proof of Theorem 8.3 is harder. We first need to define $n$-point correlation functions. These are multi-linear functions $T(v_1,\ldots,v_n)$, $v_i \in V$, defined for $v_i$ homogeneous via
$$T(v_1,\ldots,v_n) = T((v_1,z_1),\ldots,(v_n,z_n),(g,h),q) = z_1^{\mathrm{wt}\,v_1}\cdots z_n^{\mathrm{wt}\,v_n}\,\mathrm{tr}\,Y_M(v_1,z_1)\cdots Y_M(v_n,z_n)\phi(h)q^{L(0)-c/24}. \qquad (8.9)$$
We only need the case $n = 2$. We will prove

Theorem 8.4. Let $v, v_1 \in V$ be homogeneous with $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$ and $(\mu,\lambda) \neq (1,1)$. Then
$$T(v,v_1) = \sum_{k=1}^{\infty}\bar P_k\!\left(\mu,\lambda,\frac{z_1}{z},q\right)T(v[k-1]v_1), \qquad (8.10)$$
$$T(v_1,v) = \lambda\sum_{k=1}^{\infty}\bar P_k\!\left(\mu,\lambda,\frac{z_1q}{z},q\right)T(v[k-1]v_1), \qquad (8.11)$$
where $\bar P_k$ is as in (4.31).
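We start with

Lemma 8.5. Let $k \in \frac{1}{T}\mathbb{Z}$. Then
$$(1-\lambda q^k)\,\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)} = \sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,\mathrm{tr}\,Y_M(v(i)v_1,z_1)\phi(h)q^{L(0)}, \qquad (8.12)$$
$$(1-\lambda q^k)\,\mathrm{tr}\,Y_M(v_1,z_1)v(\mathrm{wt}\,v-1+k)\phi(h)q^{L(0)} = \lambda q^k\sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,\mathrm{tr}\,Y_M(v(i)v_1,z_1)\phi(h)q^{L(0)}. \qquad (8.13)$$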
Proof. We have
$$\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)} = \mathrm{tr}\,[v(\mathrm{wt}\,v-1+k),Y_M(v_1,z_1)]\phi(h)q^{L(0)} + \mathrm{tr}\,Y_M(v_1,z_1)v(\mathrm{wt}\,v-1+k)\phi(h)q^{L(0)}. \qquad (8.14)$$
From (8.8) we get $v(\mathrm{wt}\,v-1+k)\phi(h) = \lambda\phi(h)v(\mathrm{wt}\,v-1+k)$; moreover, $v(\mathrm{wt}\,v-1+k)q^{L(0)} = q^kq^{L(0)}v(\mathrm{wt}\,v-1+k)$. Hence
$$\mathrm{tr}\,Y_M(v_1,z_1)v(\mathrm{wt}\,v-1+k)\phi(h)q^{L(0)} = \lambda q^k\,\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)}. \qquad (8.15)$$
Using the relation
$$[v(m),Y_M(v_1,z_1)] = \sum_{i=0}^{\infty}\binom{m}{i}z_1^{m-i}Y_M(v(i)v_1,z_1),$$
which is a consequence of the Jacobi identity (2.3), we get
$$\mathrm{tr}\,[v(\mathrm{wt}\,v-1+k),Y_M(v_1,z_1)] = \sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}Y_M(v(i)v_1,z_1). \qquad (8.16)$$
Both parts of the lemma follow from (8.15)-(8.16).
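Now we turn to the proof of (8.10) of Theorem 8.4. Using (8.12) in the last lemma and setting $\mu = e^{2\pi ir/T}$ we have
$$\begin{aligned}
T(v,v_1) &= T((v,z),(v_1,z_1),(g,h),q)\\
&= z^{\mathrm{wt}\,v}z_1^{\mathrm{wt}\,v_1}\,\mathrm{tr}\,Y_M(v,z)Y_M(v_1,z_1)\phi(h)q^{L(0)-c/24}\\
&= z^{\mathrm{wt}\,v}z_1^{\mathrm{wt}\,v_1}\sum_{k\in\mathbb{Z}+\frac{r}{T}}z^{-\mathrm{wt}\,v-k}\,\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)-c/24}\\
&= z_1^{\mathrm{wt}\,v_1}\sum_{k\in\mathbb{Z}+\frac{r}{T}}z^{-k}(1-\lambda q^k)^{-1}\sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,\mathrm{tr}\,Y_M(v(i)v_1,z_1)\phi(h)q^{L(0)-c/24}\\
&= \sum_{k\in\mathbb{Z}+\frac{r}{T}}\left(\frac{z_1}{z}\right)^{k}(1-\lambda q^k)^{-1}\sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}T(v(i)v_1)\\
&= \sum_{k\in\mathbb{Z}+\frac{r}{T}}\left(\frac{z_1}{z}\right)^{k}(1-\lambda q^k)^{-1}\sum_{i=0}^{\infty}\sum_{m=0}^{i}c(\mathrm{wt}\,v,i,m)k^mT(v(i)v_1)\\
&= \sum_{i=0}^{\infty}\sum_{m=0}^{i}m!\,\bar P_{m+1}(\mu,\lambda,z_1/z,q)\,c(\mathrm{wt}\,v,i,m)T(v(i)v_1)\\
&= \sum_{m=0}^{\infty}\bar P_{m+1}(\mu,\lambda,z_1/z,q)T(v[m]v_1),
\end{aligned}$$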
where we have used (2.10) and (4.31). This is precisely (8.10) of Theorem 8.4. Equation (8.11) follows in the same way by using (8.13).

Before proving Theorem 8.3 we still need

Lemma 8.6. We have
$$\sum_{k=0}^{\infty}\frac{1}{k!}B_k(1-\mathrm{wt}\,v+r/T)\,v[k-1] = \sum_{i=0}^{\infty}\binom{r/T}{i}v(i-1).$$

Proof. The l.h.s. of the equality is equal to
$$\mathrm{Res}_w\,Y[v,w]\frac{e^{(1-\mathrm{wt}\,v+r/T)w}}{e^w-1} = \mathrm{Res}_w\,Y(v,e^w-1)\frac{e^{(1+r/T)w}}{e^w-1} = \mathrm{Res}_z\,Y(v,z)\frac{(1+z)^{r/T}}{z} = \sum_{i=0}^{\infty}\binom{r/T}{i}v(i-1)$$
as required.
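The identity of Lemma 8.6 can be checked coefficient by coefficient. The following is a minimal sketch, not part of the original argument: it assumes the standard expansion $v[m] = \mathrm{Res}_z\,Y(v,z)(\log(1+z))^m(1+z)^{\mathrm{wt}\,v-1}$ (the change of variable $z = e^w-1$ used in the proof), multiplies both sides by $z$, and compares truncated power series in exact rational arithmetic. The function names and the truncation order `N` are illustrative choices only.

```python
from fractions import Fraction
from math import comb, factorial

N = 8  # truncation order: all series are computed modulo z^(N+1)

def mul(a, b):
    """Product of two truncated power series given as coefficient lists."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: N + 1 - i]):
                out[i + j] += ai * bj
    return out

def inverse(a):
    """Inverse of a power series whose constant term is non-zero."""
    inv = [Fraction(0)] * (N + 1)
    inv[0] = 1 / a[0]
    for n in range(1, N + 1):
        inv[n] = -sum(a[k] * inv[n - k] for k in range(1, n + 1)) / a[0]
    return inv

def binom_series(a):
    """Coefficients binom(a, i) of (1+z)^a for a rational exponent a."""
    coeffs = [Fraction(1)]
    for i in range(1, N + 1):
        coeffs.append(coeffs[-1] * (a - (i - 1)) / i)
    return coeffs

def bernoulli_numbers(n):
    """Bernoulli numbers B_0..B_n with the convention B_1 = -1/2."""
    B = [Fraction(0)] * (n + 1)
    B[0] = Fraction(1)
    for m in range(1, n + 1):
        B[m] = -sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1)
    return B

def bernoulli_poly(k, x, B):
    """Bernoulli polynomial B_k(x) = sum_j binom(k, j) B_j x^(k-j)."""
    return sum(comb(k, j) * B[j] * x ** (k - j) for j in range(k + 1))

def lemma_8_6_sides(wt, s):
    """Return z*(l.h.s.) and z*(r.h.s.) of Lemma 8.6 as series in z.

    wt plays the role of wt v and s the role of r/T. The coefficient of
    z^i in either list is the coefficient of v(i-1).
    """
    x = 1 - wt + s
    B = bernoulli_numbers(N)
    ell = [Fraction((-1) ** n, n + 1) for n in range(N + 1)]  # log(1+z)/z
    log1p = [Fraction(0)] + ell[:N]                          # log(1+z)
    # k = 0 term: z * log(1+z)^(-1) * (1+z)^(wt-1)
    cur = mul(inverse(ell), binom_series(Fraction(wt - 1)))
    lhs = [Fraction(0)] * (N + 1)
    for k in range(N + 1):
        coef = bernoulli_poly(k, x, B) / factorial(k)
        lhs = [l + coef * c for l, c in zip(lhs, cur)]
        cur = mul(cur, log1p)  # pass from v[k-1] to v[k]
    rhs = binom_series(s)
    return lhs, rhs
```

For instance, `lemma_8_6_sides(2, Fraction(1, 3))` returns two equal lists; terms with $k > N$ are omitted because they only contribute beyond the truncation order.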
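Proof of Theorem 8.3. Combine Lemma 8.6 and Proposition 4.9 to get
$$\begin{aligned}
\sum_{k=0}^{\infty}Q_k(\mu,\lambda,q)T(v[k-1]w) ={}& \sum_{k=1}^{\infty}\mathrm{Res}_z\left(\iota_{z,z_1}((z-z_1)^{-1})\,z_1^{\mathrm{wt}\,v-r/T}z^{-\mathrm{wt}\,v+r/T}\bar P_k\!\left(\mu,\lambda,\frac{z_1}{z},q\right)\right)T(v[k-1]w)\\
&- \lambda\sum_{k=1}^{\infty}\mathrm{Res}_z\left(\iota_{z_1,z}((z-z_1)^{-1})\,z_1^{\mathrm{wt}\,v-r/T}z^{-\mathrm{wt}\,v+r/T}\bar P_k\!\left(\mu,\lambda,\frac{z_1q}{z},q\right)\right)T(v[k-1]w)\\
&- \sum_{i=0}^{\infty}\binom{r/T}{i}T(v(i-1)w).
\end{aligned}$$
On the other hand, use (3.7) to obtain
$$\begin{aligned}
\sum_{i=0}^{\infty}\binom{r/T}{i}T(v(i-1)w) &= \sum_{i=0}^{\infty}\binom{r/T}{i}z_1^{\mathrm{wt}\,v+\mathrm{wt}\,w-i}\,\mathrm{tr}\,Y_M(v(i-1)w,z_1)\phi(h)q^{L(0)-c/24}\\
&= \sum_{i=0}^{\infty}\binom{r/T}{i}z_1^{\mathrm{wt}\,v+\mathrm{wt}\,w-i}\,\mathrm{Res}_{z-z_1}(z-z_1)^{i-1}\,\mathrm{tr}\,Y_M(Y(v,z-z_1)w,z_1)\phi(h)q^{L(0)-c/24}\\
&= \mathrm{Res}_{z-z_1}\,\iota_{z_1,z-z_1}(z-z_1)^{-1}\left(\frac{z}{z_1}\right)^{r/T}z_1^{\mathrm{wt}\,v+\mathrm{wt}\,w}\,\mathrm{tr}\,Y_M(Y(v,z-z_1)w,z_1)\phi(h)q^{L(0)-c/24}\\
&= \mathrm{Res}_z\,\iota_{z,z_1}(z-z_1)^{-1}\left(\frac{z_1}{z}\right)^{\mathrm{wt}\,v-r/T}T(v,w) - \mathrm{Res}_z\,\iota_{z_1,z}(z-z_1)^{-1}\left(\frac{z_1}{z}\right)^{\mathrm{wt}\,v-r/T}T(w,v),
\end{aligned}$$
which by Theorem 8.4 is equal to
$$\sum_{k=1}^{\infty}\mathrm{Res}_z\,\iota_{z,z_1}(z-z_1)^{-1}\left(\frac{z_1}{z}\right)^{\mathrm{wt}\,v-r/T}\bar P_k\!\left(\mu,\lambda,\frac{z_1}{z},q\right)T(v[k-1]w) - \lambda\sum_{k=1}^{\infty}\mathrm{Res}_z\,\iota_{z_1,z}(z-z_1)^{-1}\left(\frac{z_1}{z}\right)^{\mathrm{wt}\,v-r/T}\bar P_k\!\left(\mu,\lambda,\frac{z_1q}{z},q\right)T(v[k-1]w).$$
This completes the proof of the theorem.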
In order to complete the proof of Theorem 8.1 we need to explain how (5.8), and the fact that $T(v)$ vanishes on (5.1) and (5.2), follow from [Z]. These results concern the case in which the critical vector $v$ satisfies $gv = hv = v$. Thus $v$ lies in the invariant sub vertex operator algebra $V^A$. Now Zhu's proof of (5.8), for example, is quite general in the sense that it does not depend on any special properties of $V$. In particular, his argument applies to $V^A$, which is what we need. (Note that $M$ is a module for $V^A$.) Similarly, Zhu's argument establishes that $T$ vanishes on (5.1) and (5.2) in the case that $g$ and $h$ both fix $v$ and $w$. On the other hand we may certainly assume that $w$ is an eigenvector for $g$ and $h$. If $gw = \alpha w$, $hw = \beta w$ and $(\alpha,\beta) \neq (1,1)$, we have already seen (cf. the proof of Lemma 8.2) that $T(v[2k-2]w) = 0$, so it is clear in this case that $T$ vanishes on (5.2). This completes the proof of Theorem 8.1.

Theorem 8.7. Let $M^1, M^2, \ldots$ be inequivalent simple $g$-twisted $V$-modules, each of which is $h$-stable. Let $T_1, T_2, \ldots$ be the corresponding trace functions (8.6). Then $T_1, T_2, \ldots$ are linearly independent elements of $C_1(g,h)$.

Proof. Suppose false. Then we may choose notation so that for some $m \in \mathbb{N}$ there are non-zero scalars $c_1, c_2, \ldots, c_m$ such that $c_1T_1 + \cdots + c_mT_m = 0$.
(8.17)
Let $\Omega_i$ be the top level of $M^i$, $1 \le i \le m$, and let $\lambda_i$ be the conformal weight of $M^i$ (cf. Sect. 3). Thus $M^i$ is graded by $\frac{1}{T}\mathbb{Z}$,
$$M^i = \oplus_{n=0}^{\infty}M^i_{\lambda_i+n/T}$$
and $\Omega_i = M^i_{\lambda_i}$. Define a partial order 0. Theorem 10.1 obviously follows from these two propositions. First we show that Proposition 10.5 follows from Proposition 10.4. To this end, pick $v \in V$ such that $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$. It suffices to show that $pS_p(v,\tau) = 0$. If $(\mu,\lambda) \neq (1,1)$ this follows from (C3) and (5.3), so we may assume $gv = hv = v$. Set
$$w = L[-2]v - \sum_{k=1}^{\infty}E_{2k}(\tau)L[2k-2]v. \qquad (10.2)$$
From (5.8) and (5.9) we get
$$S(w,\tau) = \frac{q_{\frac1T}}{T}\frac{d}{dq_{\frac1T}}S(v,\tau). \qquad (10.3)$$
Now Proposition 10.4 combined with Theorem 8.1 tells us that (10.3) is satisfied by each $S_i$. Then we calculate that
$$S(w,\tau) = \sum_{i=0}^{p}\left(\frac{i}{T}(\log q_{\frac1T})^{i-1}S_i(v,\tau) + (\log q_{\frac1T})^{i}S_i(w,\tau)\right). \qquad (10.4)$$
We may identify the parts of (10.1) which involve a given power $(\log q_{\frac1T})^i$. Taking $i = p-1$, we see that
$$S_{p-1}(w,\tau) = \frac{p}{T}S_p(v,\tau) + S_{p-1}(w,\tau),$$
so that $pS_p(v,\tau) = 0$, as desired.

We turn our attention to the proof of Proposition 10.4. We assume without loss that $S_p \neq 0$ and that each $S_{p,j} \neq 0$ (cf. (6.13)). We are then in the situation that was in effect in Sect. 9. We adopt the notation (9.1). It was shown that $\alpha : V \to \mathbb{C}$ vanishes on $O_g(V)$, and thus defines a linear functional $\alpha : A_g(V) \to \mathbb{C}$. We continue this line of reasoning, and now prove

Lemma 10.6. Suppose that $u, v \in V$ satisfy $hu = \rho u$, $hv = \sigma v$, $\rho, \sigma \in \mathbb{C}$. Then
$$\alpha(u *_g v) = \rho\,\delta_{\rho\sigma,1}\,\alpha(v *_g u). \qquad (10.5)$$
Proof. We may assume that $gu = \xi u$ and $gv = \nu v$ for scalars $\xi, \nu$. If $\xi$ or $\nu$ is not equal to 1 then $u$ (resp. $v$) lies in $O_g(V)$ (cf. Lemma 2.1 of [DLM3]), whence so too do $u *_g v$ and $v *_g u$ by Theorem 3.3. So in this case both sides of (10.5) are equal to 0. So we may assume $gu = u$, $gv = v$. Similarly, $u *_g v$ is an eigenvector for $h$ with eigenvalue $\rho\sigma$, so if $\rho\sigma \neq 1$ then $u *_g v$ and $v *_g u$ lie in $O(g,h)$ by (5.3). Then $S(u *_g v) = S(v *_g u) = 0$ by (C3), which again leads to both sides of (10.5) being 0. So we may assume that $\rho\sigma = 1$ and try to prove that $\alpha(u *_g v) = \rho\,\alpha(v *_g u)$. Now we know from [Z] (also see Lemma 2.2 (iii) of [DLM3]) that if $V^g$ is the space of $g$-invariants of $V$ then for $u$ homogeneous
$$u *_g v - v *_g u \equiv \mathrm{Res}_z\,Y(u,z)v(1+z)^{\mathrm{wt}\,u-1} \pmod{O(V^g)}.$$
Using (2.17) we get
$$u *_g v - v *_g u \equiv \sum_{i=0}^{\infty}\binom{\mathrm{wt}\,u-1}{i}u(i)v \equiv u[0]v \pmod{O(V^g)}. \qquad (10.6)$$
Now certainly $O(V^g) \subset O_g(V)$ (loc. cit.), and if $\rho = 1$ then $u[0]v \in O(g,h)$ by (5.1). So in this case (10.6) leads to $\alpha(u *_g v - v *_g u) = 0$ as desired. So we may take $\rho \neq 1$. In this case we follow the calculation of Lemma 9.2. Bearing in mind that $gu = u$ and $hv = \rho v$ with $\rho \neq 1$, we see from the proof of Lemma 9.2 (b) that the constant term of
$$\sum_{k=0}^{\infty}Q_k(1,\rho^{-1},q)\,u[k-1]v$$
is
$$-u *_g v + \frac{1}{1-\rho^{-1}}u[0]v.$$
Since $S$ vanishes on $\sum_{k=0}^{\infty}Q_k(1,\rho^{-1},q)\,u[k-1]v \in O(g,h)$, this shows that
$$\alpha(u *_g v) = \frac{1}{1-\rho^{-1}}\alpha(u[0]v).$$
However (10.6) still applies, so that
$$\alpha(u *_g v) = \frac{1}{1-\rho^{-1}}\left(\alpha(u *_g v) - \alpha(v *_g u)\right),$$
which is equivalent to the desired result.
We will need

Lemma 10.7. Let $A$ be a finite-dimensional semi-simple algebra over $\mathbb{C}$ with decomposition $A = \oplus_{i\in I}A_i$ into simple components. Let $h : A \to A$ be an automorphism of $A$ of finite order, and suppose that $F : A \to \mathbb{C}$ is a linear map which satisfies
$$F(ab) = \rho\,\delta_{\rho\sigma,1}F(ba) \qquad (10.7)$$
whenever $ha = \rho a$ and $hb = \sigma b$, $\rho, \sigma \in \mathbb{C}$. Then $F$ can be written as a linear combination with scalars $\alpha_j$:
$$F(a) = \sum_{j\in J}\alpha_j\,\mathrm{tr}_{W_j}(a\gamma_j), \qquad (10.8)$$
where in (10.8), $\{A_j\}_{j\in J}$ ranges over the $h$-invariant simple components of $A$, $W_j$ is the simple $A$-module such that $A_jW_j = W_j$, and $\gamma_j \in A_j^*$ satisfies $ha = \gamma_ja\gamma_j^{-1}$ for $a \in A_j$.

Remark. The existence of $\gamma_j$ is the Skolem–Noether Theorem.

Proof. Proceed by induction on the order of $h$ and the cardinality of $I$. The group $\langle h\rangle$ permutes the $A_i$ among themselves, and the conditions of the lemma apply to any $h$-invariant sum of $A_i$. So we may assume that $\langle h\rangle$ is transitive in its action on the $A_i$. First assume that there are at least two components. Then there are no $h$-invariant components, so we must show that $F = 0$ in this case. If $\sigma \neq 1$ and $hb = \sigma b$, taking $a = 1$ in (10.7) shows that $F(b) = 0$. So we only need show that $F$ is zero on the algebra $A^h$ of $h$-invariants of $A$. If there is $1 \neq k \in \langle h\rangle$ such that $k$ fixes each $A_i$ then the algebra of $k$-invariants $B$ is a semi-simple algebra admitting $\langle h\rangle/\langle k\rangle$. By induction we see that $F(B) = 0$, so we are done as $A^h \subset B$. So without loss there is no such $k$. So $h$ has order $|I|$, the number of components. Thus if $A_0$ is the first component then we may set $A_i = h^iA_0$, $0 \le i \le |I|-1$. Then
$$A^h = \left\{\sum_{i=0}^{|I|-1}h^ix \;\Big|\; x \in A_0\right\} \cong A_0.$$
By (10.7) $F(ab) = F(ba)$ for $a, b \in A^h$, so $F$ is a trace function, i.e., $F(a) = \alpha\,\mathrm{tr}_W a$ for some $\alpha \in \mathbb{C}$, $W$ the simple $A^h$-module. So we must show $F(1_A) = 0$. Let $\lambda$ be an $|I|$th root of unity, let $u, v \in A_0$ be units such that $uv = 1_{A_0}$, and let
$$a = \sum_{i=0}^{|I|-1}\lambda^ih^i(u), \qquad b = \sum_{i=0}^{|I|-1}\lambda^{-i}h^i(v).$$
Then $ba = ab = \sum_{i=0}^{|I|-1}h^i(u)h^i(v) = \sum_ih^i(uv) = 1_A$. On the other hand $ha = \lambda^{-1}a$, $hb = \lambda b$ and $\lambda \neq 1$. So (10.7) yields $F(1_A) = F(ab) = F(ba) = \lambda F(ab)$. So $F(1_A) = 0$ as desired.

This reduces us to the case that $A$ is itself a simple algebra. Pick $\gamma \in A^*$ such that $h(a) = \gamma a\gamma^{-1}$ and consider $F_1 : A \to \mathbb{C}$ defined by $F_1(a) = F(a\gamma^{-1})$. If $ha = \rho a$ and $\rho \neq 1$ then $F(a\gamma^{-1}) = 0$ by (10.7), so $F_1(a) = 0$ for such $a$. On the other hand we get for $hb = \sigma b$,
$$F_1(ab) = F(ab\gamma^{-1}) = \rho\,\delta_{\rho\sigma,1}F(b\gamma^{-1}a) = \rho\,\delta_{\rho\sigma,1}F(b\gamma^{-1}a\gamma\gamma^{-1}) = \delta_{\rho\sigma,1}F(ba\gamma^{-1}) = \delta_{\rho\sigma,1}F_1(ba).$$
From this we conclude that $F_1(ab) = F_1(ba)$ for all $a, b \in A$. So $F_1$ is a trace function $F_1(a) = \alpha\,\mathrm{tr}_W a$, so that $F(a) = \alpha\,\mathrm{tr}_W a\gamma$. This completes the proof of the lemma.

Now we return to the situation of Lemma 10.6. From (3.18) $h$ induces an automorphism of $A_g(V)$ via $h : v \mapsto hv$, and since $V$ is $g$-rational, $A_g(V)$ is semi-simple and Lemma 10.7 applies. From the discussion in Sect. 3, the $h$-invariant components of $A_g(V)$ correspond precisely to the $h$-invariant simple $A_g(V)$-modules, and these correspond to the $h$-invariant simple $g$-twisted $V$-modules. For such a simple $A_g(V)$-module $\Omega$ we have $\phi(h)o(v)\phi(h)^{-1} = o(hv)$ (cf. (8.1)), $o(v)$ being the corresponding zero mode (3.14). Also $o(hv) = \gamma o(v)\gamma^{-1}$ if $\gamma$ represents $h$ in the sense of Lemma 10.7. So $\gamma$ and $\phi(h)$ differ by a scalar when considered as operators on $\Omega$. By Lemmas 10.6 and 10.7 we get

Lemma 10.8. The linear function $\alpha : A_g(V) \to \mathbb{C}$ can be represented in the form
$$\alpha(v) = \sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h), \qquad (10.9)$$
where $\alpha_j$ are scalars and the spaces $\Omega(M^j)$ range over the top levels of the $h$-invariant simple $g$-twisted $V$-modules $M^j$.

Recall that we have
$$S_p(v,\tau) = \sum_{j=1}^{b}q^{\lambda_{p,j}}S_{p,j}(v,\tau) \qquad (10.10)$$
with $S_{p,1}$ as in (9.1).

Lemma 10.9. Suppose that $\alpha_j \neq 0$ in (10.9). Then the conformal weight of the corresponding $g$-twisted module $M^j$ is equal to $\lambda_{p,1} + c/24$.

Proof. We use the method of proof of Proposition 10.5 once more. For $v \in V$, let $w = w(v)$ be as in (10.2). Thus (10.3) holds whenever $S \in C_1(g,h)$. Applying (10.3) with $S = T_{M^j}$ (cf. (8.4)) and considering leading terms yields
$$\mathrm{tr}_{\Omega(M^j)}\,o(w)\phi(h) = (\lambda_j - c/24)\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h), \qquad (10.11)$$
where $\lambda_j$ is the conformal weight of $M^j$. Similarly applying (10.3) to $S$ itself and considering the leading term of $S_p$ yields
$$\alpha(w) = \lambda_{p,1}\alpha(v). \qquad (10.12)$$
Using (10.5), we find that for $v \in V$,
$$\lambda_{p,1}\sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) = \lambda_{p,1}\alpha(v) = \alpha(w) = \sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(w)\phi(h) = \sum_j\alpha_j(\lambda_j - c/24)\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h).$$
The linear independence of characters of $A_g(V)$ implies that $\lambda_{p,1}\alpha_j = \alpha_j(\lambda_j - c/24)$. The lemma follows.

We are ready for the final argument. We have in the previous notation
$$\begin{aligned}
q^{\lambda_{p,1}}S_{p,1}(v,\tau) &= q^{\lambda_{p,1}}\Big(\sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) + \sum_{n=1}^{\infty}a_{p,1,n}(v)q^{n/T}\Big)\\
&= \sum_jq^{\lambda_j-c/24}\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) + q^{\lambda_{p,1}}\sum_{n=1}^{\infty}a_{p,1,n}(v)q^{n/T}.
\end{aligned}$$
Now also
$$T_{M^j}(v,\tau) = q^{\lambda_j-c/24}\Big(\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) + \sum_{n=1}^{\infty}\mathrm{tr}_{M^j_{\lambda_j+n/T}}\,o(v)\phi(h)q^{n/T}\Big).$$
So we see that the function
$$S'(v,\tau) = S(v,\tau) - (\log q_{\frac1T})^p\sum_j\alpha_jT_{M^j}(v,\tau)$$
again has the form (6.12)-(6.15), but the leading term of the piece corresponding to $S_p$ now has a higher degree than $S_p$ itself. We now continue the argument, replacing $S$ with $S'$ and $S_p$ with $S'_p$. We find, since each $T_{M^j}$ already lies in $C_1(g,h)$, and since there are only finitely many $M^j$, that indeed $S_p$ is a linear combination of the $T_{M^j}$. But our argument applies equally well to each $S_i$, so each $S_i$ is a linear combination of the $T_{M^j}$. This completes the proof of Proposition 10.4.
11. Rationality of Central Charge and Conformal Weights

Recall from (3.11) that a simple $g$-twisted $V$-module $M$ has grading of the form $M = \oplus_{n=0}^{\infty}M_{\lambda+n/T}$ for some $\lambda \in \mathbb{C}$ called the conformal weight of $M$. We will show that, under suitable rationality conditions on $V$, the conformal weight $\lambda$ of $M$ is a rational number. We prove even more, namely

Theorem 11.1. Suppose that $V$ is a holomorphic vertex operator algebra which satisfies Condition $C_2$, and let $g \in \mathrm{Aut}\,V$ have finite order. Let $V(g)$ be the unique simple $g$-twisted $V$-module whose existence is guaranteed by Theorem 10.3. Then the conformal weight of $V(g)$ is rational, and the central charge $c$ of $V$ is also rational.
Theorem 11.2. Suppose that $V$ is a vertex operator algebra which satisfies Condition $C_2$, and let $g \in \mathrm{Aut}\,V$ have finite order. Suppose that $V$ is $g^i$-rational for all integers $i$. Then each simple $g^i$-twisted $V$-module has rational conformal weight, and the central charge $c$ of $V$ is rational.

Theorem 11.3. Suppose that $V$ is a rational vertex operator algebra which satisfies Condition $C_2$. Then each simple $V$-module has rational conformal weight, and the central charge of $V$ is rational.

These theorems complete the proofs of Theorems 1.1 and 1.2. Note that Theorem 11.3 is simply a restatement of Theorem 11.2 in the special case that $g = 1$. We will prove Theorems 11.1 and 11.2 simultaneously. Indeed, at this point in the paper the proof follows from ideas in a paper of Anderson and Moore [AM]: we have only to assemble the relevant facts.

First observe that to prove Theorem 11.2 it suffices to show that each simple $g$-twisted $V$-module has rational conformal weight, and that $c$ is rational. With this in mind, let $f(q)$ be one of the following $q$-expansions: $q^{-c/24}\sum_{n\ge n_0}(\dim V_n)q^n$, where $V = \oplus_{n\ge n_0}V_n$; $q^{\lambda-c/24}\sum_{n\ge n_0}(\dim V(g)_{\lambda+n/T})q^{n/T}$ with $\lambda$ the conformal weight of $V(g)$, where $V(g)$ is either the unique simple $g$-twisted $V$-module in the situation of Theorem 11.1, or any simple $g$-twisted $V$-module in the situation of Theorem 11.2. Let $U$ be the $SL(2,\mathbb{Z})$-module of holomorphic functions on $\mathfrak{h}$ generated by $f(q)$. In each case $U$ is a finite-dimensional $\mathbb{C}$-linear space, and the elements of $U$ have $q$-expansions in (not necessarily rational) powers of $q$. This assertion follows from Theorems 10.1 and 10.3. This puts us in the position of being able to apply methods and results of Anderson and Moore (loc. cit.).

The argument proceeds as follows. Define by $\lambda = \lambda(\tau)$ the usual Picard function which generates the field of rational functions on the compactification of $\mathfrak{h}/\Gamma(2)$. With $E = \frac{d}{d\lambda}$, there are unique meromorphic functions $k_i$ such that $U$ is precisely the space of solutions of the differential equation
$$E^ny + \sum_{i=0}^{n-1}k_iE^iy = 0. \qquad (11.1)$$
The $k_i$ are then in $\mathbb{C}(\lambda)$ (Proposition 1 of (loc. cit.)). For a given $\phi \in \mathrm{Aut}(\mathbb{C})$, and for $r(q) \in U$, let $r^\phi$ be as defined in (loc. cit.). By the Frobenius–Fuchs theory, the $r^\phi$ are then $q$-expansions of the solutions of the $\phi$-transform of (11.1), namely
$$E^ny + \sum_{i=0}^{n-1}k_i^{\phi}E^iy = 0. \qquad (11.2)$$
We claim that the solutions of (11.2) also afford a representation of $SL(2,\mathbb{Z})$. First note that since each $k_i$ lies in $\mathbb{C}(\lambda)$, the actions of $\phi$ and $\gamma \in SL(2,\mathbb{Z})$ on $\mathbb{C}(\lambda)$ commute: this follows from the well-known formulae for the action of the modular group on $\lambda$. Then if $y|\gamma$ is the $\gamma$-image of a solution $y$ of (11.1) we find that $(y|\gamma)^\phi|\gamma^{-1}$ is a solution of (11.2). The claim follows from this observation. Now $f(q) \in U$ has the form
$$f = q^{\lambda-c/24}\sum_{n\ge N}a_nq^{n/T},$$
where $\lambda = 0$ and $T = 1$ in the first case, and where $a_n \in \mathbb{Z}$ in all cases. Then
$$f^\phi = q^{\phi(\lambda-c/24)}\sum_{n\ge N}a_nq^{n/T},$$
i.e.,
$$f^\phi = q^{\phi(\lambda-c/24)-(\lambda-c/24)}f. \qquad (11.3)$$
One now applies $S$ to both sides of (11.3) to obtain
$$f^\phi|S = e^{-\alpha/\tau}f|S, \qquad (11.4)$$
where $\alpha = 2\pi i(\phi(\lambda-c/24) - (\lambda-c/24))$. On the other hand, we showed above that both $f^\phi|S$ and $f|S$ have $q$-expansions. This leads to a contradiction by using the limit argument of (loc. cit.) unless $\alpha = 0$. As this holds for all $\phi$, we conclude that $c/24 \in \mathbb{Q}$ and $\lambda - c/24 \in \mathbb{Q}$, which completes the proofs of the theorems.

Let us formalize the situation which prevails in case $V$ is a holomorphic vertex operator algebra which satisfies Condition $C_2$ and is equipped with a finite group $G$ of automorphisms. Let $V(g)$ be the unique simple $g$-twisted $V$-module (Theorem 10.3) for $g \in G$, and let $C(g) = \{h \in G \mid gh = hg\}$ be the centralizer of $g$ in $G$. According to Theorem 11.1 the grading of $V(g)$ has the form $V(g) = \oplus_{n=0}^{\infty}V(g)_{\lambda+n/T}$, where $g$ has order $T$ and $\lambda = \lambda(g) \in \mathbb{Q}$. As $V(g)$ is unique, it admits a (projective) representation of $C(g)$, so for any $h \in C(g)$ we may consider the trace function $T_{V(g)}(v,g,h,q)$. As usual, this is really only defined up to a nonzero scalar. By Theorem 10.3 this trace function spans the 1-dimensional space $C_1(g,h)$ if $\langle g,h\rangle$ is cyclic. The shape of the trace function is as in (8.5), with $\lambda - c/24 \in \mathbb{Q}$. Thus it has a $q$-expansion with rational powers of $q$ of bounded denominator, and is holomorphic as a function on $\mathfrak{h}$.

We now assume that $\langle g,h\rangle$ is cyclic and fix $v \in V_{[k]}$. It follows from Theorem 5.4 that $T_{V(g)}|_k\gamma(v,g,h,\tau) = (c\tau+d)^{-k}T_{V(g)}(v,g,h,\gamma\tau)$ lies in $C_1((g,h)\gamma,\tau)$ and hence is a scalar multiple of $T_{V(g^ah^c)}(v,(g,h)\gamma,\tau)$. Here $\gamma = \begin{pmatrix}a & b\\ c & d\end{pmatrix}$ lies in $SL(2,\mathbb{Z})$. Thus there are scalars $\sigma(g,h,\gamma)$ such that the following holds:
$$(c\tau+d)^{-k}T_{V(g)}(v,g,h,\gamma\tau) = \sigma(g,h,\gamma)T_{V(g^ah^c)}(v,(g,h)\gamma,\tau). \qquad (11.5)$$
Equation (11.5) together with the rationality of the corresponding $q$-expansions says precisely that each $T_{V(g)}(v,g,h,\tau)$ is a generalized modular form of weight $k$ in the language of [KM]. Note that Theorem 1.4 follows from these results. Theorem 1.3 follows in the same way.

A case of particular interest is when we take $v$ to be the vacuum element $\mathbf{1}$, in which case $k = 0$. In this case
$$Z(g,h) = T_{V(g)}(\mathbf{1},g,h,\tau) \qquad (11.6)$$
(see (1.6)-(1.8)). This is essentially the graded trace of φ(h) on the g-twisted module V (g), sometimes called a partition function or McKay-Thompson series. In this case, we have proved
Theorem 11.4. Let $V$ be a holomorphic vertex operator algebra which satisfies Condition $C_2$, and let $G$ be a finite group of automorphisms of $V$. For each pair of commuting elements $(g,h)$ which generates a cyclic group, $Z(g,h)$ is a generalized modular function (i.e., of weight zero) which is holomorphic on $\mathfrak{h}$ and satisfies
$$\gamma : Z(g,h) \mapsto \sigma(g,h,\gamma)Z((g,h)\gamma) \qquad (11.7)$$
for $\gamma \in SL(2,\mathbb{Z})$.

12. Condition C2

In order to be able to apply the preceding results to known vertex operator algebras, we need to verify that Condition $C_2$ is satisfied. We do this for some of the best known rational vertex operator algebras in this section. Refer to Sect. 3 for the definition of Condition $C_2$.

Lemma 12.1. If $V$ is a vertex operator algebra and $M$ is a $V$-module, then $C_2(M)$ contains $v(-n)M$ for all $v \in V$ and $n \ge 2$.

Proof. This follows from definition (3.21) together with the equality $(L(-1)^mv)(-2) = (m+1)!\,v(-m-2)$.
Lemma 12.2. Let $V_1, \ldots, V_k$ be vertex operator algebras such that for each $i$, all simple $V_i$-modules satisfy Condition $C_2$. Then the same is true for the tensor product vertex operator algebra $V_1 \otimes \cdots \otimes V_k$.

Proof. See [FHL] for tensor product vertex operator algebras and their modules. We may assume that $k = 2$. One knows (loc. cit.) that the simple $V_1 \otimes V_2$-modules are precisely those of the form $M_1 \otimes M_2$ with $M_i$ a simple $V_i$-module. If $v \in V_1$ then $(v \otimes \mathbf{1})(-2) = v(-2) \otimes \mathrm{id}$, from which it follows that $C_2(M_1 \otimes M_2)$ contains $C_2(M_1) \otimes M_2$. Similarly it contains $M_1 \otimes C_2(M_2)$. The lemma follows immediately.

Now we discuss Condition $C_2$ for the most well-known rational vertex operator algebras, namely,
(i) The vertex operator algebra $L(c_{p,q},0)$ associated with the (discrete series) simple Virasoro algebra $Vir$-module of highest weight 0 and central charge $c = c_{p,q} = 1 - \frac{6(p-q)^2}{pq}$ ([DMZ, FZ, Wa]).
(ii) The moonshine module $V^\natural$ ([B1, FLM3]).
(iii) The vertex operator algebra $V_L$ associated with a positive definite even lattice $L$ ([B1, D1, FLM3]).
(iv) The vertex operator algebra $L(k,0)$ associated to a $\hat{\mathfrak{g}}$-module of highest weight 0 and positive integral level $k$, $\mathfrak{g}$ a simple Lie algebra ([DL, FZ, Li]).

Lemma 12.3. $L(c_{p,q},0)$ satisfies Condition $C_2$.
Proof. Set $L = L(c_{p,q},0)$. It is a quotient of the corresponding Verma module $M = M(c_{p,q},0)$ and we have $M \cong U(Vir_-)\cdot\mathbf{1}$ (cf. [FZ]) where $Vir_- = \oplus_{n=1}^{\infty}\mathbb{C}L(-n)$ and the $L(n)$ are the usual generators of the Virasoro algebra $Vir$. Now $Y(\omega,z) = \sum_{n\in\mathbb{Z}}L(n)z^{-n-2}$, so that $C_2(L)$ contains $L(-n)L$ for all $n \ge 3$ by Lemma 12.1. We have $L = M/J$ and $J$ contains two singular vectors [FF]. The first is $L(-1)\cdot\mathbf{1}$, which shows that $C_2(L)$ contains $U(Vir_-)L(-1)\mathbf{1}$. From this we see that $L = C_2(L) + \sum_{k=0}^{\infty}\mathbb{C}L(-2)^k\mathbf{1}$. The second singular vector has the form
$$v = L(-2)^{pq}\mathbf{1} + \sum a_{n_1,\ldots,n_r}L(-n_1-2)\cdots L(-n_r-2)\mathbf{1}, \qquad (12.1)$$
where the sum ranges over certain $(n_1,\ldots,n_r) \in \mathbb{Z}_+^r$ with $n_1 + \cdots + n_r \neq 0$, $a_{n_1,\ldots,n_r} \in \mathbb{C}$ (cf. Eq. (3.11) of [DLM2]). From the previous paragraph we see that the terms under the summation sign in (12.1) each lie in $C_2(L)$, whence also $L(-2)^{pq}\mathbf{1}$ lies in $C_2(L)$. By Lemma 3.8, $C_2(L)$ is invariant under $L(-2)$. We conclude that $L = C_2(L) + \sum_{k=0}^{pq-1}\mathbb{C}L(-2)^k\mathbf{1}$, and the proposition follows.

The following result was stated without proof in [Z].

Proposition 12.4. The moonshine module $V^\natural$ satisfies Condition $C_2$.

Proof. Let $U$ be the tensor product $L(\frac12,0)^{\otimes 48}$. It is shown in [DMZ] that $V^\natural$ contains a sub vertex operator algebra isomorphic to $U$. Moreover when considered as a $U$-module, $V^\natural$ is a direct sum of finitely many simple $U$-modules. Suppose that each simple module for $L(\frac12,0)$ satisfies Condition $C_2$. Then this is true also for $U$ by Lemma 12.2, so that the space spanned by $u(-2)v$ for $u \in U$ and $v \in V^\natural$ already has finite codimension in $V^\natural$. So it suffices to show that the simple $L(\frac12,0)$-modules indeed satisfy Condition $C_2$. The proof of this latter assertion is similar to that of the last proposition. Apart from $L = L(\frac12,0)$ itself there are just two other simple modules for $L$, namely $L(\frac12,\frac12)$ and $L(\frac12,\frac1{16})$ [DMZ]. Let $M(\frac12,h)$ ($h = \frac12, \frac1{16}$) be the corresponding Verma module with $L(\frac12,h) = M(\frac12,h)/J_h$. As in the previous proposition we have $L(-n)L(\frac12,h) \subset C_2(L(\frac12,h))$ for $n \ge 3$, so that $L(\frac12,h) = C_2(L(\frac12,h)) + \sum_{a,b\ge 0}\mathbb{C}L(-2)^aL(-1)^bv_h$, where $v_h$ is a highest weight vector. Moreover $J_h$ contains two singular vectors (cf. [FF]). One of them is $(L(-2) - \frac34L(-1)^2)v_{\frac12}$ or $(L(-2) - \frac43L(-1)^2)v_{\frac1{16}}$ (cf. [DMZ]), from which we conclude that $L(\frac12,h) = C_2(L(\frac12,h)) + \sum_{b\ge 0}\mathbb{C}L(-1)^bv_h$. Because of the existence of a second singular vector we see that $C_2(L(\frac12,h))$ necessarily contains $L(-1)^bv_h$ for big enough $b$, whence our claim follows.

The reader is referred to [FLM3] for the construction of $V_L$ and associated notation, which we use in the next result.

Proposition 12.5. $V_L$ satisfies Condition $C_2$.

Proof. Set $H = L \otimes_{\mathbb{Z}}\mathbb{C}$. Then it is easy to see that
$$V_L = C_2(V_L) + S(H \otimes t^{-1}) \otimes \mathbb{C}\{L\}. \qquad (12.2)$$
Let $0 \neq \alpha, \gamma \in L$ and set $\beta = \gamma - \alpha$. Then $C_2(V_L)$ contains $u(-2)v$, where we take $u = L(-1)^k\iota(a)$ ($k \ge 0$) and $v = \iota(b)$, where $a, b, c \in \hat L$ are such that $\bar a = \alpha$, $\bar b = \beta$ and $c = ab$. Then $\bar c = \gamma$. So $C_2(V_L)$ contains
$$\mathrm{Res}_z\,z^{-k-2}Y(\iota(a),z)\iota(b) = \mathrm{Res}_z\,z^{-k-2}\exp\Big(\sum_{n<0}\frac{-\alpha(n)}{n}z^{-n}\Big)az^{\alpha}\iota(b).$$
Now $az^{\alpha}\iota(b) = z^{\langle\alpha,\beta\rangle}\iota(c)$. Using (12.2) we now see that $C_2(V_L)$ contains $\mathrm{Res}_z\,z^{-k-2}e^{\alpha(-1)z}z^{\langle\alpha,\beta\rangle}\iota(c)$, and we conclude that $C_2(V_L)$ contains
$$\alpha(-1)^{1+k-\langle\alpha,\beta\rangle}\iota(c) \qquad (12.3)$$
whenever $k \ge 0$ and $k \ge 1 - \langle\alpha,\beta\rangle$. So if $\langle\alpha,\beta\rangle \ge 1$ we may choose $k$ appropriately to see that $C_2(V_L)$ contains $\iota(c)$. So we have shown that $C_2(V_L)$ contains $\iota(c)$, and hence $S(H \otimes t^{-1}) \otimes \iota(c)$ by Lemma 3.8, unless $\langle\alpha,\gamma\rangle \le \langle\alpha,\alpha\rangle$ for all $\alpha \in L$. Let $\Gamma$ denote the set of $\gamma \in L$ with this latter property. Now $\Gamma$ is a finite set. Fix a $\mathbb{Z}$-basis $B$ of $L$ and let $M = \max_{\gamma\in\Gamma,\beta\in B}(1 - \langle\beta,\gamma-\beta\rangle)$. From (12.3) we see that $\beta(-1)^M \otimes \iota(c) \in C_2(V_L)$ for all $\beta \in B$, $\gamma \in \Gamma$. Now from the above calculations, we see that $C_2(V_L)$ contains $S^r(H \otimes t^{-1}) \otimes \mathbb{C}\{L\}$ for all big enough integers $r$ and also $C_2(V_L)$ contains $S(H \otimes t^{-1}) \otimes \mathbb{C}\iota(c)$ for all $c \in \hat L$ such that $\bar c \in L\setminus\Gamma$. It follows from (12.2) that indeed $C_2(V_L)$ has finite codimension in $V_L$.

Proposition 12.6. Let $k$ be a positive integer and $\mathfrak{g}$ a complex simple Lie algebra. Then the vertex operator algebra $L(k,0)$ associated to $\mathfrak{g}$ and $k$ satisfies Condition $C_2$.

Proof. See [FZ] for the vertex operator algebra structure of $L(k,0)$ and also the corresponding Verma module $M(k,0)$. By definition
$$M(k,0) = U(\hat{\mathfrak{g}}) \otimes_{U(\oplus_{n=0}^{\infty}t^n\otimes\mathfrak{g}+\mathbb{C}c)}\mathbb{C} \cong U\Big(\oplus_{n=1}^{\infty}t^{-n}\otimes\mathfrak{g}\Big)$$
(linearly). Then $L = L(k,0)$ is the quotient of $M(k,0)$ by the maximal $\hat{\mathfrak{g}}$-submodule. For $a \in \mathfrak{g}$, $Y(a,z) = \sum_{n\in\mathbb{Z}}a(n)z^{-n-1}$, so $C_2(L)$ contains $a(-n)L$ for all $a \in \mathfrak{g}$ and all $n \ge 2$ by Lemma 12.1. Thus $L = C_2(L) + U(t^{-1}\otimes\mathfrak{g})\mathbf{1}$. It is enough to show that $C_2(L)$ contains $a_1(-1)^{m_1}\cdots a_d(-1)^{m_d}\mathbf{1}$ whenever $m_i \ge 0$ and $m_1 + \cdots + m_d$ is large enough; here $a_1, \ldots, a_d$ is a basis of $\mathfrak{g}$. By Lemma 3.6 of [DLM2] we may choose the $a_i$ so that $[Y(a_i,z_1),Y(a_i,z_2)] = 0$, $Y(a_i,z)^{3k+1} = 0$ for each $i$. Now the constant term in $Y(a_i,z)^{3k+1}\mathbf{1}$ is equal to $a_i(-1)^{3k+1}\mathbf{1} + r$ where $r$ is a sum of products of the form $a_i(n_1)^{e_1}\cdots a_i(n_{3k+1})^{e_{3k+1}}\mathbf{1}$ with some $n_j \le -2$. Since the operators $a_i(n_j)$ commute, $r \in C_2(L)$. Hence $a_i(-1)^{3k+1}\mathbf{1} \in C_2(L)$. We can now conclude that $C_2(L)$ contains $a_1(-1)^{m_1}\cdots a_d(-1)^{m_d}\mathbf{1}$ whenever $m_i \ge 3k+1$ for some $i$. The proposition follows immediately.
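The coefficients of the level-two singular vectors quoted in the proof of Proposition 12.4 can be checked directly from the Virasoro relations. The sketch below is illustrative only: it assumes the standard commutator $[L(m),L(n)] = (m-n)L(m+n) + \frac{m^3-m}{12}\delta_{m+n,0}c$ and uses the resulting conditions for $(L(-2) - aL(-1)^2)v_h$ to be annihilated by $L(1)$ and $L(2)$, namely $3 - a(4h+2) = 0$ and $4h + c/2 - 6ah = 0$. The function names are hypothetical.

```python
from fractions import Fraction

def level2_coefficient(h):
    """Value of a for which L(1) kills (L(-2) - a L(-1)^2) v_h.

    From [L(1), L(-2)] v_h = 3 L(-1) v_h and
    L(1) L(-1)^2 v_h = (4h + 2) L(-1) v_h.
    """
    return Fraction(3) / (4 * h + 2)

def is_singular(c, h, a):
    """Check that L(1) and L(2) both annihilate (L(-2) - a L(-1)^2) v_h."""
    l1_condition = 3 - a * (4 * h + 2)       # coefficient of L(-1) v_h
    l2_condition = 4 * h + c / 2 - 6 * a * h  # coefficient of v_h
    return l1_condition == 0 and l2_condition == 0

c = Fraction(1, 2)
for h in (Fraction(1, 2), Fraction(1, 16)):
    a = level2_coefficient(h)
    print(h, a, is_singular(c, h, a))
```

With $c = \frac12$ this gives $a = \frac34$ for $h = \frac12$ and $a = \frac43$ for $h = \frac1{16}$, and both vectors pass the check.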
13. Applications to the Moonshine Module

We now apply our results to the study of the conjectures of Conway–Norton–Queen as discussed in the Introduction. Recall that the moonshine module $V^\natural$ is a vertex operator algebra whose automorphism group is precisely the Monster $M$. See [B1, FLM3, G] for details. The first author proved in [D2] that $V^\natural$ has a unique simple module, namely $V^\natural$ itself, and in [DLM3] we showed that in fact $V^\natural$ is rational, that is, every admissible $V^\natural$-module is completely reducible. Thus $V^\natural$ is holomorphic. It also satisfies Condition $C_2$ (Prop. 12.4). By Theorem 10.3 we conclude that there is a unique simple $g$-twisted $V^\natural$-module $V^\natural(g)$ for each $g \in M$. For each pair of commuting elements $(g,h)$ in $M$, recall from Sect. 11 that $Z(g,h,\tau) = Z(g,h)$ is the corresponding partition function. The function $Z(1,h)$ is precisely the graded character of $h \in M$ on $V^\natural$. By the results of Borcherds [B2], which confirm the original Conway-Norton conjecture [CN], each $Z(1,h)$ is a Hauptmodul, in fact the Hauptmodul conjectured in [CN]. We use this to prove the next result, which completes the proof of Theorem 1.5.

Theorem 13.1. The following hold:
(i) There is a scalar $\sigma = \sigma(g)$ such that the graded dimension $Z(g,1,\tau)$ of $V^\natural(g)$ is equal to $\sigma Z(1,g,S\tau)$. In particular, $Z(g,1,\tau)$ is a Hauptmodul.
(ii) More generally, if a commuting pair $g, h \in M$ generates a cyclic group then $Z(g,h,\tau)$ is a Hauptmodul.

Proof. Suppose that $\langle g,h\rangle = \langle k\rangle$. Then there is $\gamma \in SL(2,\mathbb{Z})$ such that $(g,h)\gamma^{-1} = (1,k)$. By Theorem 11.4 we have $Z(g,h,\tau) = \sigma Z(1,k,\gamma\tau)$ for some scalar $\sigma$. Since $Z(1,k,\tau)$ is a Hauptmodul, so too is $Z(g,h,\tau)$. If $h = 1$ then we may take $k = g$ and $\gamma = S$. Both parts of the theorem now follow.

More is known in special cases. Huang has shown [H] that if $g$ is of type 2B (in ATLAS notation) then in fact the constant $\sigma(g)$ in part (i) is equal to 1. This also follows from our results and [FLM3]. Similarly, if $g$ is of type 2A it is shown in [DLM1], on the basis of Theorem 13.1, that again $\sigma(g) = 1$. In this case, the precise description of $Z(g,h,\tau)$ for $gh = hg$ and $h$ of odd order is given in [DLM1]. As discussed in [DM1] and [DM2], the uniqueness of $V^\natural(g)$ leads to a projective representation of the centralizer $C_M(g)$ on $V^\natural(g)$. These are discussed in some detail, in the case that $g$ has order 2, in (loc. cit.). The conjectures of Conway–Norton–Queen state that there are (projective) representations of $C_M(g)$ on suitably graded spaces such that all graded traces are either Hauptmoduln or zero. There is no doubt that the twisted modules $V^\natural(g)$ are the desired spaces. One would still like to show that $\sigma(g) = 1$ in part (i) of the theorem for all $g \in M$, and to compute the McKay-Thompson series $Z(g,h,\tau)$ for $\langle g,h\rangle$ not cyclic.

Finally, we calculate some correlation functions. We fix a holomorphic vertex operator algebra $V$ satisfying Condition $C_2$. Recall that $T(v,(g,h),\tau)$ is the $(g,h)$ 1-point correlation function associated with $v$ and a pair of commuting elements $g, h \in \mathrm{Aut}(V)$ as defined in (8.4). If $\mathrm{wt}[v] = k$ then we have seen that $T(v,(1,1),\tau)$ is a generalized modular form, holomorphic in the upper half-plane $\mathfrak{h}$. In fact, Eq. (11.5) tells us that $T(v,(1,1),\tau)$ spans a 1-dimensional $SL(2,\mathbb{Z})$-module under the action (1.13), that is, we have
$$T|\gamma(v,(1,1),\tau) = \sigma(\gamma)T(v,(1,1),\tau) \qquad (13.1)$$
for some character $\sigma : SL(2,\mathbb{Z}) \to \mathbb{C}^*$. We use this to prove
Lemma 13.2. Let $V$ be a holomorphic vertex operator algebra which satisfies Condition $C_2$ and let $v \in V$ satisfy $\mathrm{wt}[v] = k$. If $k$ is odd, then the correlation function $T(v,(1,1),\tau)$ is identically zero.

Proof. We observe that the $q$-expansion of $T(v,(1,1),\tau)$ lies in $\mathbb{C}[[q^{1/3},q^{-1/3}]]$. This is because $V$ is holomorphic and so $8 \mid c$ (cf. the remark following Theorem 1.4). It follows that $T$ acts on $T(v,(1,1),\tau)$ as multiplication by a cube root of unity, and since $T$ covers the abelianization of $SL(2,\mathbb{Z})$, the kernel of the character $\sigma$ has index dividing 3. Since $S^4 = \mathrm{id}$ it follows that $S$, and in particular $S^2$, lies in the kernel of $\sigma$. Now setting $\gamma = S^2$ in (13.1) yields $T(v,(1,1),\tau) = T|S^2(v,(1,1),\tau) = (-1)^kT(v,(1,1),\tau)$. The lemma follows.
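The sign $(-1)^k$ in the last step comes from the fact that $S^2 = -I$ fixes every $\tau$ while contributing the factor $(c\tau+d)^{-k} = (-1)^{-k}$. A minimal sketch of this bookkeeping follows, assuming the usual matrix $S = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$ and the weight-$k$ factor $(c\tau+d)^{-k}$ from (11.5); the helper names are illustrative and not taken from the paper.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[0, -1], [1, 0]]
S2 = matmul(S, S)    # equals -I
S4 = matmul(S2, S2)  # equals I

def mobius(gamma, tau):
    """Action of gamma on a point tau of the upper half-plane."""
    (a, b), (c, d) = gamma
    return (a * tau + b) / (c * tau + d)

def automorphy_factor(gamma, tau, k):
    """The factor (c*tau + d)^(-k) appearing in the weight-k action."""
    c, d = gamma[1]
    return (c * tau + d) ** (-k)

tau = 0.3 + 2.0j
for k in (1, 2, 3):
    # S^2 fixes tau, so T|S^2 equals automorphy_factor * T = (-1)^k * T.
    print(k, mobius(S2, tau), automorphy_factor(S2, tau, k))
```

For odd $k$ the factor is $-1$, so a function fixed by $S^2$ under the weight-$k$ action must vanish, which is the conclusion of the lemma.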
Now let G be a finite group of automorphisms of V. Then T(v, 1, g, τ) is essentially the graded trace of o(v)g on V for g ∈ G. If we choose v to be the conformal vector $\tilde\omega$ of (V, Y[ ]) then $\mathrm{wt}[\tilde\omega] = 2$, so $T(\tilde\omega, 1, g, \tau)$ is a form of weight 2. It is easy to describe:

Lemma 13.3.
$$T(\tilde\omega, 1, g, \tau) = \frac{1}{2\pi i}\frac{d}{d\tau} Z(1, g, \tau).$$

Proof. One could proceed by setting $\tilde\omega = \omega - c/24$ and using $Y(\omega, z) = \sum_n L(n) z^{-n-2}$, but it is simpler to use (5.8) and (5.9), as we may because $g\tilde\omega = \tilde\omega$. As $\tilde\omega = L[-2]\mathbf{1}$, the lemma follows.

If we write $V = \oplus_n V_n$, then of course we have
$$Z(1, g, \tau) = q^{-c/24} \sum_n (\mathrm{tr}|_{V_n}\, g)\, q^n,$$
$$T(\tilde\omega, 1, g, \tau) = q^{-c/24} \sum_n (n - c/24)(\mathrm{tr}|_{V_n}\, g)\, q^n,$$
and we may think of $T(\tilde\omega, 1, g, \tau)$ as arising from a sequence of virtual characters of G. That is, instead of “Moonshine of weight 0,” one now has “Moonshine of weight 2.” This is relevant because of the work of Devoto [De] in which such things are interpreted as being elements of degree 2 in the elliptic cohomology of BG. Similarly suppose that v ∈ V satisfies wt[v] = k with gv = v for all g ∈ G. Then G commutes with o(v) in its action on each $V_n$, so each eigenspace of the semisimple part $o(v)_s$ of o(v) on $V_n$ is a G-module and gives rise to a “generalized module” for G, i.e., of the form $\sum_i \lambda_{in} V_{ni}$ with $\lambda_{in} \in \mathbb{C}$ the distinct eigenvalues of $o(v)_s$ on $V_n$ and $V_{ni}$ the corresponding eigenspaces of $o(v)_s$. In this way, the pair (V, v) gives rise to a sequence of generalized modules $\sum_{n,i} \lambda_{in} V_{ni}$ for G such that the corresponding trace functions T(v, 1, g, τ) are modular forms of weight k. This is “Moonshine of weight k,” and together with the analogues for the twisted sectors gives rise to elements of $Ell^k\, BG$ as in [De]. Actually, this is not quite what we have proved, because in [De] there are additional arithmetic requirements. It seems likely that the appropriate conditions do hold, but that remains to be investigated. The 1-point correlation functions for the Moonshine Module are completely described in a forthcoming paper [DM3].
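The passage from Z(1, g, τ) to the weight 2 function of Lemma 13.3 is just the operator q d/dq applied term by term. The following Python sketch illustrates this for g = 1 on the Moonshine module, assuming c = 24 and the first graded dimensions 1, 0, 196884, 21493760 (the leading coefficients of J). The function name `weight_two_coefficients` and the dictionary representation are illustrative choices, not notation from the paper.

```python
from fractions import Fraction

def weight_two_coefficients(traces, c):
    """Apply q d/dq to q^{-c/24} * sum_n tr|V_n(g) q^n.

    `traces` maps the degree n to tr|V_n(g); the result maps the
    exponent n - c/24 to the coefficient (n - c/24) * tr|V_n(g).
    """
    shift = Fraction(c, 24)
    return {n - shift: (n - shift) * t for n, t in traces.items()}

# Assumed input: graded dimensions of the Moonshine module (g = 1, c = 24).
moonshine_dims = {0: 1, 1: 0, 2: 196884, 3: 21493760}
coeffs = weight_two_coefficients(moonshine_dims, c=24)
for exponent in sorted(coeffs):
    print(f"q^{exponent}: {coeffs[exponent]}")
```

Under these assumptions the leading terms are -q^{-1} + 196884 q + 42987520 q^2; replacing the dimensions by character values tr|V_n(g) gives the corresponding series for other g.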
References
[AM] Anderson, G. and Moore, G.: Rationality in conformal field theory. Commun. Math. Phys. 117, 441–450 (1988)
[B1] Borcherds, R.E.: Vertex algebras, Kac-Moody algebras, and the Monster. Proc. Natl. Acad. Sci. USA 83, 3068–3071 (1986)
[B2] Borcherds, R.E.: Monstrous moonshine and monstrous Lie superalgebras. Invent. Math. 109, 405–444 (1992)
[B3] Borcherds, R.E.: Modular Moonshine III. Preprint
[BR] Borcherds, R.E. and Ryba, A.: Modular Moonshine II. Duke Math. J. 83, 435–459 (1996)
[CN] Conway, J.H. and Norton, S.P.: Monstrous Moonshine. Bull. London Math. Soc. 12, 308–339 (1979)
[De] Devoto, J.: Equivariant cohomology and finite groups. Michigan Math. J. 43, 3–32 (1996)
[DVVV] Dijkgraaf, R., Vafa, C., Verlinde, E. and Verlinde, H.: The operator algebra of orbifold models. Commun. Math. Phys. 123, 485–526 (1989)
[DGH] Dixon, L., Ginsparg, P. and Harvey, J.A.: Beauty and the beast: Super-conformal symmetry in a Monster module. Commun. Math. Phys. 119, 221–241 (1986)
[DHVW] Dixon, L., Harvey, J.A., Vafa, C. and Witten, E.: Strings on orbifolds. Nucl. Phys. B 261, 620–678 (1985); Strings on orbifolds II. Nucl. Phys. B 274, 285–314 (1986)
[DGM] Dolan, L., Goddard, P. and Montague, P.: Conformal field theory of twisted vertex operators. Nucl. Phys. B 338, 529 (1990)
[D1] Dong, C.: Vertex algebras associated with even lattices. J. Algebra 160, 245–265 (1993)
[D2] Dong, C.: Representations of the moonshine module vertex operator algebra. Contemp. Math. 175 (1994)
[D3] Dong, C.: Twisted modules for vertex algebras associated with even lattice. J. Algebra 165, 91–112 (1994)
[DL] Dong, C. and Lepowsky, J.: Generalized Vertex Algebras and Relative Vertex Operators. Progress in Math. Vol. 112, Boston: Birkhäuser, 1993
[DLM1] Dong, C., Li, H. and Mason, G.: Some twisted modules for the moonshine vertex operator algebras. Contemp. Math. 193, 25–43 (1996)
[DLM2] Dong, C., Li, H. and Mason, G.: Regularity of rational vertex operator algebras. Adv. in Math. 132, 148–166 (1997)
[DLM3] Dong, C., Li, H. and Mason, G.: Twisted representations of vertex operator algebras. Math. Ann. 310, 571–600 (1998)
[DM1] Dong, C. and Mason, G.: Nonabelian orbifolds and the boson-fermion correspondence. Commun. Math. Phys. 163, 523–559 (1994)
[DM2] Dong, C. and Mason, G.: Vertex operator algebras and moonshine: A survey. Adv. Studies in Pure Math. 24, 101–136 (1996)
[DM3] Dong, C. and Mason, G.: Monstrous moonshine of higher weight. math.QA/9803116
[DMZ] Dong, C., Mason, G. and Zhu, Y.: Discrete series of the Virasoro algebra and the moonshine module. Proc. Symp. Pure Math., American Math. Soc. 56 II, 295–316 (1994)
[EZ] Eichler, M. and Zagier, D.: On the Theory of Jacobi Forms I. Progress in Math. Vol. 55, Boston: Birkhäuser, 1985
[FF] Feigin, B.L. and Fuchs, D.B.: Verma modules over the Virasoro algebra. Lecture Notes in Math., Vol. 1060, Berlin–New York: Springer-Verlag, 1984
[FFR] Feingold, A.J., Frenkel, I.B. and Ries, J.F.X.: Spinor construction of vertex operator algebras, triality and $E_8^{(1)}$. Contemp. Math. 121 (1991)
[FHL] Frenkel, I.B., Huang, Y. and Lepowsky, J.: On axiomatic approaches to vertex operator algebras and modules. Memoirs Am. Math. Soc. 104 (1993)
[FLM1] Frenkel, I.B., Lepowsky, J. and Meurman, A.: A natural representation of the Fischer-Griess Monster with the modular function J as character. Proc. Natl. Acad. Sci. USA 81, 3256–3260 (1984)
[FLM2] Frenkel, I.B., Lepowsky, J. and Meurman, A.: Vertex operator calculus. In: Mathematical Aspects of String Theory, Proc. 1986 Conference, San Diego, ed. by S.-T. Yau, Singapore: World Scientific, 1987, pp. 150–188
[FLM3] Frenkel, I.B., Lepowsky, J. and Meurman, A.: Vertex Operator Algebras and the Monster. Pure and Applied Math. Vol. 134, London–New York: Academic Press, 1988
[FZ] Frenkel, I.B. and Zhu, Y.: Vertex operator algebras associated to representations of affine and Virasoro algebras. Duke Math. J. 66, 123–168 (1992)
[G] Griess, R.: The Friendly Giant. Invent. Math. 69, 1–102 (1982)
[HMV] Harvey, J., Moore, G. and Vafa, C.: Quasicrystalline compactification. Nucl. Phys. B 304, 269–290 (1988)
[H] Huang, Y.: A non-meromorphic extension of the moonshine module vertex operator algebra. Contemp. Math. 193, 123–148 (1996)
[I] Ince, E.: Ordinary Differential Equations. London: Dover Publications, Inc., 1956
[KP] Kac, V. and Peterson, D.: Infinite-dimensional Lie algebras, theta functions and modular forms. Advances in Math. 53, 125–264 (1984)
[KM] Knopp, M. and Mason, G.: Generalized modular forms. In preparation
[K] Koblitz, N.: Introduction to Elliptic Curves and Modular Forms. New York: Springer-Verlag, 1984
[La] Lang, S.: Introduction to Modular Forms. New York: Springer-Verlag, 1976
[Le] Lepowsky, J.: Calculus of twisted vertex operators. Proc. Natl. Acad. Sci. USA 82, 8295–8299 (1985)
[Li] Li, H.: Local systems of vertex operators, vertex superalgebras and modules. J. Pure Appl. Alg. 109, 143–195 (1996)
[M] Montague, P.: Orbifold constructions and the classification of self-dual c = 24 conformal field theory. Nucl. Phys. B 428, 233–258 (1994)
[MS] Moore, G. and Seiberg, N.: Classical and quantum conformal field theory. Commun. Math. Phys. 123, 177–254 (1989)
[N] Norton, S.: Generalized moonshine. Proc. Symp. Pure Math., American Math. Soc. 47, 208–209 (1987)
[Q] Queen, L.: Some relations between finite groups, Lie groups and modular functions. Ph.D. Thesis, University of Cambridge, 1980
[Ra] Rademacher, H.: Topics in Analytic Number Theory. New York: Springer-Verlag, 1973
[Ry] Ryba, A.: Modular Moonshine? Contemp. Math. 193, 307–336 (1996)
[S] Schellekens, A.N.: Meromorphic c = 24 Conformal Field Theories. Commun. Math. Phys. 153, 159 (1993)
[T1] Tuite, M.: Monstrous moonshine from orbifolds. Commun. Math. Phys. 146, 277–309 (1992)
[T2] Tuite, M.: On the relationship between monstrous moonshine and the uniqueness of the moonshine module. Commun. Math. Phys. 166, 495–532 (1995)
[T3] Tuite, M.: Generalized moonshine and abelian orbifold constructions. Contemp. Math. 193, 353–368 (1996)
[V] Vafa, C.: Modular invariance and discrete torsion on orbifolds. Nucl. Phys. B 273, 592 (1986)
[Wa] Wang, W.: Rationality of Virasoro vertex operator algebras. Duke Math. J. IMRN 71(1), 197–211 (1993)
[Wo] Wohlfahrt, K.: An extension of F. Klein’s level concept. Illinois J. Math. 8, 529–535 (1964)
[Z] Zhu, Y.: Modular invariance of characters of vertex operator algebras. J. Am. Math. Soc. 9, 237–302 (1996)
Communicated by T. Miwa
Commun. Math. Phys. 214, 57 – 89 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Random Matrix Theory and ζ(1/2 + it)
J. P. Keating¹,², N. C. Snaith¹
¹ School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
² BRIMS, Hewlett-Packard Laboratories, Filton Road, Stoke Gifford, Bristol BS34 6QZ, UK
Received: 20 December 1999 / Accepted: 24 March 2000
Abstract: We study the characteristic polynomials Z(U, θ) of matrices U in the Circular Unitary Ensemble (CUE) of Random Matrix Theory. Exact expressions for any matrix size N are derived for the moments of |Z| and $Z/Z^*$, and from these we obtain the asymptotics of the value distributions and cumulants of the real and imaginary parts of log Z as N → ∞. In the limit, we show that these two distributions are independent and Gaussian. Costin and Lebowitz [15] previously found the Gaussian limit distribution for Im log Z using a different approach, and our result for the cumulants proves a conjecture made by them in this case. We also calculate the leading order N → ∞ asymptotics of the moments of |Z| and $Z/Z^*$. These CUE results are then compared with what is known about the Riemann zeta function ζ(s) on its critical line Re s = 1/2, assuming the Riemann hypothesis. Equating the mean density of the non-trivial zeros of the zeta function at a height T up the critical line with the mean density of the matrix eigenvalues gives a connection between N and T. Invoking this connection, our CUE results coincide with a theorem of Selberg for the value distribution of log ζ(1/2 + iT) in the limit T → ∞. They are also in close agreement with numerical data computed by Odlyzko [29] for large but finite T. This leads us to a conjecture for the moments of |ζ(1/2 + it)|. Finally, we generalize our random matrix results to the Circular Orthogonal (COE) and Circular Symplectic (CSE) Ensembles.

1. Introduction

We investigate the distribution of values taken by the characteristic polynomials
$$Z(U, \theta) = \det(I - U e^{-i\theta}) \tag{1}$$
of N × N unitary matrices U with respect to the circular unitary ensemble (CUE) of random matrix theory (RMT). Our motivation is that it has been conjectured that the limiting distribution of the non-trivial zeros of the Riemann zeta function (and other
L-functions), on the scale of their mean spacing, is the same as that of the eigenphases $\theta_n$ of matrices in the CUE in the limit as N → ∞ [28, 29, 31]. Hence the distribution of values taken by the zeta function might be expected to be related to those of Z(U, θ), averaged over the CUE. The Riemann zeta function is defined by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1} \tag{2}$$
for Res > 1, and then by analytic continuation to the rest of the complex plane. It has infinitely many non-trivial zeros in the critical strip 0 < Res < 1. The Riemann Hypothesis (RH) states that all of these non-trivial zeros lie on the critical line Res = 1/2; that is, ζ (1/2 + it) = 0 has non-trivial solutions only when t = tn ∈ R. Montgomery [28] has conjectured that the two-point correlations between the heights tn (assumed real), on the scale of the mean asymptotic spacing 2π/ log tn , in the limit n → ∞, are the same as those which exist between the eigenvalues of random complex hermitian matrices in the limit as the matrix size tends to infinity. Such matrices form the Gaussian Unitary Ensemble (GUE) of RMT. The GUE correlations are in turn the same as those of the phases θn of the eigenvalues of N × N unitary matrices, on the scale of their mean separation 2π/N , averaged over the CUE, in the limit N → ∞. (For a review of the spectral statistics of random matrices, see [27]). This conjecture is supported by a theorem, also due to Montgomery [28], which implies that, in the appropriate limits, the Fourier transform of the two-point correlation function of the Riemann zeros coincides over a restricted range with the corresponding CUE result. It is also supported by extensive numerical computations [29]. Both the conjecture and Montgomery’s theorem (again for restricted ranges) extend to all n-point correlations [30]. There is also strong numerical evidence in support of this generalization; for example, the distribution of spacings between adjacent zeros, measured in units of the mean spacing, appears to have the same limit as for the CUE [29]. Furthermore, heuristic calculations based on a Hardy-Littlewood conjecture for the pair correlation of the primes imply the validity of the generalized conjecture for all n, without restriction on the correlation range [24, 7, 9]. Thus all available evidence suggests that, in the limit as N → ∞, local (i.e. 
short-range) statistics of the scaled (to have unit mean spacing) zeros $w_n = \frac{t_n}{2\pi}\log\frac{t_n}{2\pi}$, defined by averaging over the zeros up to the $N$th, coincide with the corresponding statistics of the similarly scaled eigenphases $\phi_n = \theta_n \frac{N}{2\pi}$, defined by averaging over the CUE of N × N unitary matrices. This then implies that locally-determined statistical properties of ζ(s), high up the critical line, might be modelled by the corresponding properties of Z(θ), averaged over the CUE. One of our aims here is to explore this link by comparing certain RMT calculations with the following theorem and conjecture concerning the value distribution of ζ(1/2 + it). First, according to a theorem of Selberg [33, 29], for any rectangle E in $\mathbb{R}^2$,
$$\lim_{T\to\infty} \frac{1}{T}\left|\left\{ t : T \le t \le 2T,\ \frac{\log\zeta(1/2+it)}{\sqrt{(1/2)\log\log T}} \in E \right\}\right| = \frac{1}{2\pi}\iint_E e^{-(x^2+y^2)/2}\,dx\,dy; \tag{3}$$
that is, in the limit as T, the height up the critical line, tends to infinity, the value distributions of the real and imaginary parts of $\log\zeta(1/2+iT)/\sqrt{(1/2)\log\log T}$ each tend independently to a Gaussian with unit variance and zero mean. Interestingly, Odlyzko’s computations for these distributions when $T \approx t_{10^{20}}$ show systematic deviations from this limiting form [29]. For example, increasing moments of both the real and imaginary parts diverge from the Gaussian values. We review this data in more detail in Sect. 3. Second, it is a long-standing conjecture that f(λ), defined by
$$\lim_{T\to\infty} \frac{1}{(\log T)^{\lambda^2}} \frac{1}{T}\int_0^T |\zeta(1/2+it)|^{2\lambda}\,dt = f(\lambda)\,a(\lambda), \tag{4}$$
where
$$a(\lambda) = \prod_p \left\{ (1 - 1/p)^{\lambda^2} \sum_{m=0}^{\infty} \left(\frac{\Gamma(\lambda+m)}{m!\,\Gamma(\lambda)}\right)^2 p^{-m} \right\}, \tag{5}$$
exists, and a much-studied problem then to determine the values it takes, in particular for integer λ (see, for example, [33, 21]). Obviously f(0) = 1. It is also known that f(1) = 1 [17] and f(2) = 1/12 [20]. Based on number-theoretical arguments, Conrey and Ghosh have conjectured that f(3) = 42/9! [13], and Conrey and Gonek that f(4) = 24024/16! [14]. Conrey and Ghosh have obtained a lower bound for f when λ ≥ 0 [12], and Heath-Brown [18] has obtained an upper bound for 0 < λ < 2.

We now state our main results, all of which hold for θ ∈ R.

(i) For Re s > −1,
$$M_N(s) = \left\langle |Z(U,\theta)|^s \right\rangle_{U(N)} = \prod_{j=1}^{N} \frac{\Gamma(j)\,\Gamma(j+s)}{(\Gamma(j+s/2))^2}, \tag{6}$$
where the average is over the CUE of N × N unitary matrices, that is over the group U(N) with respect to the normalized translation-invariant (Haar) measure [34, 27]. Clearly the result extends by analytic continuation to the rest of the complex s-plane. It follows from (6) that, for integers k ≥ 0, $M_N(2k)$ is a polynomial in N of degree $k^2$.

(ii) For s ∈ C,
$$L_N(s) = \left\langle \left(\frac{Z(U,\theta)}{Z^*(U,\theta)}\right)^{s/2} \right\rangle_{U(N)} = \prod_{j=1}^{N} \frac{(\Gamma(j))^2}{\Gamma(j+s/2)\,\Gamma(j-s/2)}, \tag{7}$$
where arg Z(U, θ) is defined by continuous variation along θ − iε, starting at −iε, in the limit ε → 0, assuming θ is not equal to any of the eigenphases $\theta_n$, with log Z(U, θ − iε) → 0 as ε → ∞. Thus Im log Z(U, θ) has a jump discontinuity of size π when $\theta = \theta_n$.

(iii) The value distributions of the real and imaginary parts of $\log Z(U,\theta)/\sqrt{(1/2)\log N}$ each tend independently to a Gaussian with zero mean and unit variance in the limit as N → ∞. This corresponds directly to Selberg’s theorem (3) for log ζ(1/2 + it) if we identify the mean density of the eigenangles $\theta_n$, N/2π, with the mean density of the Riemann zeros at a height T up the critical line, $\frac{1}{2\pi}\log\frac{T}{2\pi}$; that is if
$$N = \log\frac{T}{2\pi}. \tag{8}$$
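As a small numerical illustration of the identification (8), the Python sketch below converts a height T into the corresponding matrix size N and confirms that the two mean densities then agree. The function names and the sample height T = 10^6 are hypothetical choices for illustration only; since N is a matrix size, in practice one would round the result to a nearby integer.

```python
import math

def matrix_size_for_height(T):
    """N = log(T / 2 pi), the identification (8)."""
    return math.log(T / (2 * math.pi))

def zero_density(T):
    """Mean density of Riemann zeros at height T: (1 / 2 pi) log(T / 2 pi)."""
    return math.log(T / (2 * math.pi)) / (2 * math.pi)

def eigenphase_density(N):
    """Mean density of eigenphases of an N x N unitary matrix: N / 2 pi."""
    return N / (2 * math.pi)

T = 1.0e6  # hypothetical sample height
N = matrix_size_for_height(T)
print(N, zero_density(T), eigenphase_density(N))
```

Because N grows only logarithmically with T, even very large heights correspond to modest matrix sizes.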
This is a natural connection to make between matrix size and position on the critical line, because the mean eigenvalue density is the only parameter in the theory of spectral statistics for the circular and Gaussian ensembles of RMT. The central limit theorem for Im log Z was first proved by Costin and Lebowitz [15] for the characteristic polynomials of matrices in the GUE (see also [32] for a review of related results). Our proof is new, and goes further in that it allows us to compute the cumulants.

(iv) Let $Q_n(N)$ be the nth cumulant of the distribution of values of Re log Z, defined with respect to the CUE, and let $R_n(N)$ be the corresponding cumulant for Im log Z. Then
$$Q_n(N) = \frac{2^{n-1}-1}{2^{n-1}} \sum_{j=1}^{N} \psi^{(n-1)}(j), \tag{9}$$
and
$$R_n(N) = \begin{cases} \dfrac{(-1)^{1+n/2}}{2^{n-1}} \displaystyle\sum_{j=1}^{N} \psi^{(n-1)}(j), & n \text{ even}, \\ 0, & n \text{ odd}, \end{cases} \tag{10}$$
where ψ is a polygamma function. Thus $Q_1(N) = R_1(N) = 0$. It is straightforward to obtain a complete (large N) asymptotic expansion for these cumulants. For example,
$$Q_2(N) = \left\langle (\mathrm{Re}\log Z)^2 \right\rangle_{U(N)} = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} + O(N^{-4}), \tag{11}$$
$$Q_n(N) = (-1)^n \frac{2^{n-1}-1}{2^{n-1}}\,\zeta(n-1)\,\Gamma(n) + O(N^{2-n}), \quad n \ge 3, \tag{12}$$
and
$$R_2(N) = \left\langle (\mathrm{Im}\log Z)^2 \right\rangle_{U(N)} = Q_2(N) = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} + O(N^{-4}), \tag{13}$$
$$R_{2k}(N) = \frac{(-1)^{k+1}}{2^{2k-1}}\,\zeta(2k-1)\,\Gamma(2k) + O(N^{2-2k}), \quad k > 1. \tag{14}$$
The fact that when k > 1, $R_{2k}(N)$ tends to a constant as N → ∞ proves a conjecture made by Costin and Lebowitz [15].

(v) It follows from (6) that
$$f_{CUE}(\lambda) = \lim_{N\to\infty} \frac{1}{N^{\lambda^2}} \left\langle |Z(U,\theta)|^{2\lambda} \right\rangle_{U(N)} = \frac{G^2(1+\lambda)}{G(1+2\lambda)}, \tag{15}$$
where G denotes the Barnes G-function [3], and hence that $f_{CUE}(0) = 1$ (trivial) and
$$f_{CUE}(k) = \prod_{j=0}^{k-1} \frac{j!}{(j+k)!} \tag{16}$$
for integers k ≥ 1. Thus, for example, $f_{CUE}(1) = 1$, $f_{CUE}(2) = 1/12$, $f_{CUE}(3) = 42/9!$ and $f_{CUE}(4) = 24024/16!$. $f_{CUE}(k)$ is the coefficient of $N^{k^2}$ in $M_N(2k)$, which, as noted
above, is a polynomial in N of degree $k^2$. The coefficients of the lower-order terms can also be calculated explicitly. Similarly,
$$\lim_{N\to\infty} N^{\lambda^2} L_N(2\lambda) = G(1-\lambda)\,G(1+\lambda). \tag{17}$$
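The integer-moment statements in (6) and (16) can be checked with exact rational arithmetic. The Python sketch below evaluates the product (16) and the product (6) at s = 2k, using Γ(j) = (j − 1)! for positive integers j. The function names `f_cue` and `moment` are illustrative and do not appear in the paper.

```python
from fractions import Fraction
from math import factorial

def f_cue(k):
    """Product formula (16): prod_{j=0}^{k-1} j! / (j+k)!."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(factorial(j), factorial(j + k))
    return result

def moment(N, k):
    """M_N(2k) from (6) for integer k >= 0, with Gamma(j) = (j-1)!."""
    result = Fraction(1)
    for j in range(1, N + 1):
        result *= Fraction(factorial(j - 1) * factorial(j + 2 * k - 1),
                           factorial(j + k - 1) ** 2)
    return result

print([f_cue(k) for k in range(1, 5)])
print(moment(5, 1), moment(3, 2))
# Ratio M_N(4) / N^4 for a moderately large N, to compare with f_cue(2).
print(float(moment(200, 2) / 200 ** 4), float(f_cue(2)))
```

For k = 1 the product telescopes to M_N(2) = N + 1, and for k = 2 to M_N(4) = (N + 1)(N + 2)^2 (N + 3)/12, whose leading coefficient is f_CUE(2) = 1/12, in line with the statement that M_N(2k) is a polynomial of degree k^2 in N.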
The results listed above allow us to compute the value distributions of Re log Z, Im log Z, and |Z|, for any N, and to derive explicit asymptotics for these distributions when N → ∞. In comparing our random-matrix results with what is known about the zeta function, we find the following. First, the value distributions of Re log Z and Im log Z coincide with Odlyzko’s numerical data for the corresponding distributions of the values of the zeta function at a height T up the critical line if we make the identification (8). This implies that, with respect to its local statistics, the zeta function behaves like a finite polynomial of degree N given by (8). The value distribution of |Z| is similarly in agreement with our numerical data for that of |ζ(1/2 + it)|.

It is important at this stage to remark that Montgomery’s conjecture (and its generalization) refers to the short range correlations (i.e. correlations on the scale of mean separation) between the Riemann zeros at a height T up the critical line, in the limit as T → ∞. The finite-T correlations take the form of a sum of two contributions, one being the random-matrix limit and the other representing long range deviations which may be expressed as a sum over the primes [4, 25, 5]. This is also known to be the case for the second moment of Im log ζ(1/2 + it). Specifically, Goldston [16] has proved, under the assumption of RH and Montgomery’s conjecture, that as T → ∞,
$$\frac{1}{T}\int_0^T (\mathrm{Im}\log\zeta(1/2+it))^2\,dt = \frac{1}{2}\log\log\frac{T}{2\pi} + \frac{1}{2}(\gamma+1) + \sum_{m=2}^{\infty}\sum_p \frac{(1-m)}{m^2}\frac{1}{p^m} + o(1). \tag{18}$$
Here the first two terms on the right-hand side agree with those in (13) if we again make the identification (8). The same general behaviour also holds for the higher moments of log ζ . It is plausible then that the moments of |ζ (1/2 + it) | (which are determined by long-range correlations between the zeros) asymptotically split into a product of two terms, one coming from random matrix theory and the other from the primes. Taken together with the fact that fCUE (k) = f (k) for k = 1, 2, and, conjecturally, for k = 3, 4, this leads us to conjecture that f (λ) = fCUE (λ)
(19)
for all λ where the moments are defined. This is further supported by other heuristic arguments, and by the fact that the product of a(λ) and our formula (6) for the moments of |Z(U, θ )| matches Odlyzko’s numerical data for the moments of |ζ (1/2 + it)| over the range 0 < λ ≤ 2, where we can compare them, again making the identification (8). These results were first announced in lectures at the Erwin Schrödinger Institute in Vienna, in September 1998 and at the Mathematical Sciences Research Institute in Berkeley in June 1999. The structure of this paper is as follows. We derive the CUE results listed above in Sect. 2, and then compare them with numerical data (almost all taken from [29]) for the Riemann zeta-function in Sect. 3. Our conjecture (19) is also discussed in more detail
in this section. In Sect. 4 we state the analogues of the CUE results for the other circular ensembles of RMT, namely the Circular Orthogonal (COE) and Circular Symplectic (CSE) Ensembles. Numerical evidence suggests that the eigenvalues of the laplacian on certain compact (non-arithmetic) surfaces of constant negative curvature are asymptotically the same as those of matrices in the COE, and so our results might be expected to describe the associated Selberg zeta functions. More generally, it has been suggested that in the semiclassical (h¯ → 0) limit the quantum eigenvalue statistics of all generic, classically chaotic systems are related to those of the RMT ensembles (COE for time-reversal symmetric integer-spin systems, CUE for non-time-reversal integer-spin systems, and CSE for half-integer-spin systems) [10], and our results might then be expected to apply to the corresponding quantum spectral determinants. It is worth noting in this respect that extensive numerical evidence supports the conclusion that for classically chaotic systems the value distribution of the fluctuating part of the spectral counting function (which is proportional to the imaginary part of the logarithm of the spectral determinant) tends to a Gaussian in the semiclassical limit [6, 2]. Finally, it is worth remarking that Montgomery’s conjecture extends to many other classes of L-functions, and hence our results are expected to apply to them too, in the same way. However, Katz and Sarnak [22, 23] have conjectured that correlations between the zeros low down on the critical line, defined by averaging over L-functions within certain particular families, are described not by averages over the CUE, that is, over the unitary group U (N ), but by averages over other classical compact groups, for example the orthogonal group O(N ) or the unitary symplectic group U Sp(2N ). 
Thus the value distributions within these families close to the symmetry point t = 0 on the critical line will also be described by averages over the corresponding groups. We shall present our results in this case in a second paper [26].
2. CUE Random Matrix Polynomials

2.1. Generating functions. All of our CUE random-matrix results follow from the formulae (6) and (7) for the generating functions $M_N(s)$ and $L_N(s)$, and our goal in this section is to derive these expressions. Consider first $M_N(s)$. We start with the representation of Z(U, θ) in terms of the eigenvalues $e^{i\theta_n}$ of U:
$$Z(U, \theta) = \prod_{n=1}^{N} \left(1 - e^{i(\theta_n - \theta)}\right). \tag{20}$$
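The CUE average can then be performed using the joint probability density for the eigenphases $\theta_n$, $((2\pi)^N N!)^{-1}\prod_{j<m}\left|e^{i\theta_j} - e^{i\theta_m}\right|^2$ [34, 27]. Thus
$$\left\langle |Z|^s \right\rangle_{U(N)} = \frac{1}{(2\pi)^N N!}\int_0^{2\pi}\!\!\cdots\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N}\left|e^{i\theta_j} - e^{i\theta_m}\right|^2 \times \left|\prod_{n=1}^{N}\left(1 - e^{i(\theta_n-\theta)}\right)\right|^s. \tag{21}$$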
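This integral can be evaluated exactly using Selberg’s formula (see, for example, Chapter 17 of [27]):
$$J(a, b, \alpha, \beta, \gamma, N) = \int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} \prod_{1\le j<\ell\le N} |x_j - x_\ell|^{2\gamma} \prod_{j=1}^{N} (a + ix_j)^{-\alpha}(b - ix_j)^{-\beta}\,dx_j$$
$$= \frac{(2\pi)^N}{(a+b)^{(\alpha+\beta)N - \gamma N(N-1) - N}} \prod_{j=0}^{N-1} \frac{\Gamma(1+\gamma+j\gamma)\,\Gamma(\alpha+\beta-(N+j-1)\gamma-1)}{\Gamma(1+\gamma)\,\Gamma(\alpha-j\gamma)\,\Gamma(\beta-j\gamma)}, \tag{22}$$
for Re a, Re b, Re α, Re β > 0, Re(α + β) > 1 and
$$-\frac{1}{N} < \mathrm{Re}\,\gamma < \min\left(\frac{\mathrm{Re}\,\alpha}{N-1}, \frac{\mathrm{Re}\,\beta}{N-1}, \frac{\mathrm{Re}\,(\alpha+\beta-1)}{2(N-1)}\right). \tag{23}$$
To see this, note that (21) can be written in the form
$$\left\langle |Z|^s \right\rangle_{U(N)} = \frac{2^{N(N-1)}\,2^{sN}}{N!(2\pi)^N}\int_0^{2\pi}\!\!\cdots\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N}\left|\sin(\theta_j/2 - \theta_m/2)\right|^2 \times \prod_{n=1}^{N}\left|\sin(\theta_n/2 - \theta/2)\right|^s. \tag{24}$$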
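Clearly this integral is independent of θ (as it must be, since we are averaging over all unitary matrices) and so we set θ = 0. Using $\sin(\theta_j - \theta_m) = \sin\theta_j\cos\theta_m - \cos\theta_j\sin\theta_m$, we then have
$$\left\langle |Z|^s \right\rangle_{U(N)} = \frac{2^{N^2+sN}}{N!(2\pi)^N}\int_0^{\pi}\!\!\cdots\int_0^{\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N}\left|\cot\theta_m - \cot\theta_j\right|^2 \times \left(\prod_{n=1}^{N}\sin^2\theta_n\right)^{N-1} \prod_{k=1}^{N}|\sin\theta_k|^s. \tag{25}$$
Finally, the change of variables $x_n = \cot\theta_n$ gives
$$\left\langle |Z|^s \right\rangle_{U(N)} = \frac{2^{N^2+sN}}{N!(2\pi)^N}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<m\le N}|x_m - x_j|^2 \times \prod_{n=1}^{N}\left((1+ix_n)(1-ix_n)\right)^{-N-s/2}$$
$$= \frac{2^{N^2+sN}}{N!(2\pi)^N}\,J(1, 1, N+s/2, N+s/2, 1, N) = \prod_{j=1}^{N}\frac{\Gamma(j)\,\Gamma(s+j)}{(\Gamma(j+s/2))^2}, \tag{26}$$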
provided Re s > −1, which is just the result (6). Clearly the product (26) has an analytic continuation to the rest of the complex plane.

Consider next $L_N(s)$. Note first that, according to the definition given in the Introduction,
$$\left(\frac{Z}{Z^*}\right)^{1/2} = \exp\left(i\,\mathrm{Im}\log Z(U,\theta)\right) = \exp\left(-i\sum_{n=1}^{N}\sum_{m=1}^{\infty}\frac{\sin[(\theta_n-\theta)m]}{m}\right), \tag{27}$$
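where for each value of n, the sum of sines lies in (−π, π]. Hence, again using the joint probability density of the eigenphases $\theta_n$,
$$\left\langle \left(\frac{Z}{Z^*}\right)^{s/2} \right\rangle_{U(N)} = \frac{1}{N!(2\pi)^N}\int_0^{2\pi}\!\!\cdots\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N}\left|e^{i\theta_j} - e^{i\theta_m}\right|^2 \times \exp\left(-is\sum_{n=1}^{N}\sum_{k=1}^{\infty}\frac{\sin[(\theta_n-\theta)k]}{k}\right). \tag{28}$$
As before, this integral is independent of θ, and so we set θ = 0. The sum in (28) can be evaluated using
$$\sum_{k=1}^{\infty}\frac{\sin kx}{k} = \frac{\pi - x}{2}, \quad\text{for } 0 < x < 2\pi. \tag{29}$$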
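The classical Fourier series (29) is easy to check numerically. The Python sketch below compares partial sums with the closed form at a few sample points in (0, 2π); the function name, the truncation at 100000 terms and the sample points are arbitrary illustrative choices. The series converges only conditionally, so the agreement improves slowly, roughly like the reciprocal of the number of terms.

```python
import math

def sine_series_partial_sum(x, terms):
    """Partial sum of sum_{k>=1} sin(k x) / k."""
    return sum(math.sin(k * x) / k for k in range(1, terms + 1))

for x in (1.0, 3.0, 4.0):
    approx = sine_series_partial_sum(x, 100000)
    exact = (math.pi - x) / 2
    print(x, approx, exact)
```

At x = 4.0 the closed form is negative, which illustrates that the sum ranges over both signs inside the interval prescribed by the definition of the logarithm.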
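Note that this relation keeps the sine sum within the range (−π, π] prescribed by the definition of the logarithm. Substituting (29) into (28) then yields
$$\left\langle \left(\frac{Z}{Z^*}\right)^{s/2} \right\rangle_{U(N)} = \frac{2^{N(N-1)}}{N!(2\pi)^N}\int_0^{2\pi}\!\!\cdots\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N}\left|\sin(\theta_j/2 - \theta_m/2)\right|^2 \times \prod_{n=1}^{N}\exp\left(-\frac{is}{2}(\pi - \theta_n)\right). \tag{30}$$
Making the transformation $\phi_j = \theta_j/2 - \pi/2$ and using the identity $\sin(\phi_j - \phi_m) = (\tan\phi_j - \tan\phi_m)\cos\phi_j\cos\phi_m$ gives
$$\left\langle \left(\frac{Z}{Z^*}\right)^{s/2} \right\rangle_{U(N)} = \frac{2^{N^2}}{N!(2\pi)^N}\int_{-\pi/2}^{\pi/2}\!\!\cdots\int_{-\pi/2}^{\pi/2} d\phi_1\cdots d\phi_N \prod_{1\le j<m\le N}\left|\tan\phi_j - \tan\phi_m\right|^2 \prod_{n=1}^{N}(\cos^2\phi_n)^{N-1} \times \prod_{k=1}^{N}(\cos\phi_k + i\sin\phi_k)^s. \tag{31}$$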
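Finally, changing variables to $x_j = \tan\phi_j$, we have that
$$\left\langle \left(\frac{Z}{Z^*}\right)^{s/2} \right\rangle_{U(N)} = \frac{2^{N^2}}{N!(2\pi)^N}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<m\le N}|x_j - x_m|^2 \times \prod_{n=1}^{N}\left(\frac{1}{1+x_n^2}\right)^{N} \times \prod_{k=1}^{N}\left(\frac{1}{\sqrt{1+x_k^2}} + i\frac{x_k}{\sqrt{1+x_k^2}}\right)^{s}$$
$$= \frac{2^{N^2}}{N!(2\pi)^N}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<m\le N}|x_j - x_m|^2 \times \prod_{n=1}^{N}\left(\frac{1}{1+x_n^2}\right)^{N} \times \prod_{k=1}^{N}\left(\frac{\sqrt{1+ix_k}}{\sqrt{1-ix_k}}\right)^{s}. \tag{32}$$
This is in the form of Selberg’s integral (22) with a = b = 1, α = N − s/2, β = N + s/2 and γ = 1 (the condition (23) is satisfied when |s| < 2) and so we have
$$\left\langle \left(\frac{Z}{Z^*}\right)^{s/2} \right\rangle_{U(N)} = \prod_{j=1}^{N}\frac{(\Gamma(j))^2}{\Gamma(j+s/2)\,\Gamma(j-s/2)}, \tag{33}$$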
as required.

2.2. Value distribution of Re log Z. All information about the value distribution of Re log Z can be obtained from the generating function $M_N(s)$: the moments may be obtained from the coefficients in the Taylor expansion of $M_N$ about s = 0,
$$M_N(s) = \sum_{j=0}^{\infty}\frac{\left\langle (\log|Z|)^j \right\rangle_{U(N)}}{j!}\,s^j; \tag{34}$$
the corresponding cumulants $Q_j(N)$ are related to the Taylor coefficients of $\log M_N$,
$$\log M_N(s) = \sum_{j=1}^{\infty}\frac{Q_j(N)}{j!}\,s^j; \tag{35}$$
and the probability density for the values taken by Re log Z,
$$\rho_N(x) = \left\langle \delta(\log|Z| - x) \right\rangle_{U(N)}, \tag{36}$$
is given by
$$\rho_N(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iyx} M_N(iy)\,dy. \tag{37}$$
We now analyse these general formulae using the explicit expression (6) for $M_N(s)$. Differentiating $\log M_N(s)$, we have that
$$Q_n(N) = \frac{2^{n-1}-1}{2^{n-1}}\sum_{j=1}^{N}\psi^{(n-1)}(j), \tag{38}$$
where
$$\psi^{(n)}(z) = \frac{d^{n+1}\log\Gamma(z)}{dz^{n+1}} \tag{39}$$
are the polygamma functions. Thus it follows immediately that
$$Q_1(N) = \left\langle \log|Z| \right\rangle_{U(N)} = 0. \tag{40}$$
Furthermore, substituting the well-known integral representation for the polygamma functions [1], when n ≥ 2,
$$Q_n(N) = (-1)^n\frac{2^{n-1}-1}{2^{n-1}}\sum_{j=1}^{N}\int_0^{\infty}\frac{t^{n-1}e^{-jt}}{1-e^{-t}}\,dt$$
$$= (-1)^n\frac{2^{n-1}-1}{2^{n-1}}\int_0^{\infty}\frac{t^{n-1}e^{-t}}{1-e^{-t}}\,\frac{1-e^{-Nt}}{1-e^{-t}}\,dt$$
$$= (-1)^n\frac{2^{n-1}-1}{2^{n-1}}\int_0^{\infty}\frac{e^{-t}}{1-e^{-t}}\left((n-1)t^{n-2} - (n-1)t^{n-2}e^{-Nt} + Nt^{n-1}e^{-Nt}\right)dt, \tag{41}$$
where the last equality follows from an integration by parts. Consider first the second cumulant $Q_2(N)$. Rearranging the integrand in the final equality of (41),
$$Q_2(N) = \frac{1}{2}\int_0^{\infty}\left(\frac{1-e^{-Nt}}{1-e^{-t}}\,e^{-t} + Nt\,\frac{e^{-(N+1)t}}{1-e^{-t}}\right)dt, \tag{42}$$
and so, re-expanding the terms written as fractions to give geometric series and integrating these term-by-term, we have that
$$Q_2(N) = \left\langle (\log|Z|)^2 \right\rangle_{U(N)} = \frac{1}{2}\sum_{n=1}^{N}\frac{1}{n} + \frac{N}{2}\sum_{n=N+1}^{\infty}\frac{1}{n^2}. \tag{43}$$
The large-N asymptotics can then be obtained by substituting
$$\sum_{k=1}^{n}\frac{1}{k} = \gamma + \log n + \frac{1}{2n} - \sum_{k=2}^{\infty}\frac{A_k}{n(n+1)\cdots(n+k-1)}, \tag{44}$$
where $A_k = \frac{1}{k}\int_0^1 x(1-x)(2-x)(3-x)\cdots(k-1-x)\,dx$, into the first sum and applying the Euler–Maclaurin formula to the second. Any number of terms in the expansion in inverse powers of N can be calculated in this way; for example
$$Q_2(N) = \left\langle (\log|Z|)^2 \right\rangle_{U(N)} = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} - \frac{1}{80N^4} + O\left(\frac{1}{N^6}\right). \tag{45}$$
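The expansion (45) can be compared with the exact second cumulant. The Python sketch below evaluates (38) at n = 2, using the standard values ψ'(1) = π²/6 and ψ'(j + 1) = ψ'(j) − 1/j², and sets the result beside the truncated expansion. The function names and the hard-coded value of Euler's constant are illustrative choices, not part of the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma

def q2_exact(N):
    """Q_2(N) = (1/2) * sum_{j=1}^{N} psi'(j), i.e. (38) with n = 2."""
    trigamma = math.pi ** 2 / 6      # psi'(1)
    total = 0.0
    for j in range(1, N + 1):
        total += trigamma
        trigamma -= 1.0 / j ** 2     # psi'(j+1) = psi'(j) - 1/j^2
    return 0.5 * total

def q2_asymptotic(N):
    """Right-hand side of (45) without the O(N^-6) remainder."""
    return (0.5 * math.log(N) + 0.5 * (EULER_GAMMA + 1)
            + 1 / (24 * N ** 2) - 1 / (80 * N ** 4))

for N in (1, 2, 5, 10, 50):
    print(N, q2_exact(N), q2_asymptotic(N))
```

Already at N = 10 the two columns agree to roughly eight decimal places, consistent with a remainder of order N^{-6}.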
Consider next the cumulants $Q_n(N)$ when n ≥ 3. We now write
$$Q_n(N) = (-1)^n\frac{2^{n-1}-1}{2^{n-1}}\int_0^{\infty}\left((n-1)\frac{t^{n-2}}{e^t-1} + \left(Nt^{n-1} - (n-1)t^{n-2}\right)\frac{e^{-Nt}}{e^t-1}\right)dt. \tag{46}$$
The first term, which is independent of N, can be integrated explicitly using a well-known representation of the zeta-function [33]. Changing variables in the second to y = tN then gives
$$Q_n(N) = (-1)^n\frac{2^{n-1}-1}{2^{n-1}} \times \left(\Gamma(n)\,\zeta(n-1) + \frac{1}{N^{n-1}}\int_0^{\infty}\left(y^{n-1} - (n-1)y^{n-2}\right)e^{-y}\frac{1}{e^{y/N}-1}\,dy\right). \tag{47}$$
The N-dependent term in this equation clearly vanishes in the limit as N → ∞. Its large-N asymptotics can be obtained by expanding $(e^{y/N} - 1)^{-1}$ in powers of y/N and then integrating term-by-term; for example
$$Q_n(N) = (-1)^n\frac{2^{n-1}-1}{2^{n-1}}\left(\Gamma(n)\,\zeta(n-1) - \frac{(n-3)!}{N^{n-2}}\right) + O(N^{1-n}). \tag{48}$$
It follows immediately from the fact that $Q_n(N)/(Q_2(N))^{n/2} \to 0$ as N → ∞ for all n > 2 that the value distribution of $\mathrm{Re}\log Z/\sqrt{Q_2(N)}$ tends to a Gaussian in this limit. Specifically, we have from (37) and the definition of the cumulants that if
$$\tilde\rho_N(x) = \sqrt{Q_2(N)}\,\rho_N\!\left(\sqrt{Q_2(N)}\,x\right) \tag{49}$$
then
$$\tilde\rho_N(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-iyx - \frac{y^2}{2} - \frac{iQ_3y^3}{3!\,Q_2^{3/2}} + \frac{Q_4y^4}{4!\,Q_2^2} + \cdots\right)dy. \tag{50}$$
Hence all terms in the exponential that involve higher powers of y than $y^2$ vanish in the limit as N → ∞. Evaluating the resulting Gaussian integral then gives
$$\lim_{N\to\infty}\tilde\rho_N(x) = \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-x^2}{2}\right). \tag{51}$$
The large-N asymptotics describing the approach to this limit can be obtained by retaining more terms in (50). There are several ways to do this. One is to expand the exponential of all terms that involve higher powers of y than $y^2$ as a series in increasing powers of y, so that
$$\tilde\rho_N(x) = \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-x^2}{2}\right) + \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}e^{-y^2/2}\left[\left(\frac{Q_3(iy)^3}{3!\,Q_2^{3/2}} + \frac{Q_4(iy)^4}{4!\,Q_2^2} + \cdots\right) + \frac{1}{2!}\left(\frac{Q_3(iy)^3}{3!\,Q_2^{3/2}} + \frac{Q_4(iy)^4}{4!\,Q_2^2} + \cdots\right)^2 + \frac{1}{3!}\left(\frac{Q_3(iy)^3}{3!\,Q_2^{3/2}} + \frac{Q_4(iy)^4}{4!\,Q_2^2} + \cdots\right)^3 + \cdots\right]dy$$
$$= \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-x^2}{2}\right) + \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}e^{-y^2/2}\left(\frac{A_3(iy)^3}{Q_2^{3/2}} + \frac{A_4(iy)^4}{Q_2^2} + \frac{A_5(iy)^5}{Q_2^{5/2}} + \cdots\right)dy, \tag{52}$$
where the coefficients $A_n(N)$ are defined in terms of combinations of the cumulants $Q_n(N)$ with n ≥ 3 (for example, $A_3 = Q_3/3!$). Integrating term-by-term then gives
$$\tilde\rho_N(x) = \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-x^2}{2}\right) + \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\sum_{m=3}^{\infty}\frac{A_m}{Q_2^{m/2}} \times \sum_{p=0}^{m}\binom{m}{p}x^p\begin{cases} i^{m-p}(m-p-1)!!, & m-p \text{ even} \\ 0, & m-p \text{ odd} \end{cases} \tag{53}$$
from which it follows that the deviation from the Gaussian limit is of the order of $(\log N)^{-3/2}$ (because $A_n(N) \to$ constant as $N \to \infty$). It may be seen from (53) that it is only in the limit as $N \to \infty$ that $\tilde\rho_N(x)$ becomes even in $x$: when $N$ is finite it is asymmetric about $x = 0$. This can be traced back to the fact that the series in the exponential in (50) involves both even and odd powers of $y$. Indeed, the dominant $N \to \infty$ asymptotics can also be computed by retaining only the $y^3$ term in the exponential (and not expanding the exponential as a series itself). Thus
\[ \tilde\rho_N(x) \sim \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-ixy - \frac{y^2}{2} - \frac{iQ_3 y^3}{3!\,Q_2^{3/2}}\right) dy, \tag{54} \]
and this integral can then be computed exactly in terms of the Airy function $\mathrm{Ai}(z)$, giving
\[ \tilde\rho_N(x) \sim \sqrt{Q_2}\left(\frac{-2}{Q_3}\right)^{1/3}\exp\left(\frac{Q_2^3}{3Q_3^2} + \frac{xQ_2^{3/2}}{Q_3}\right)\mathrm{Ai}\left(\frac{2^{1/3}x\sqrt{Q_2}}{Q_3^{1/3}} + \frac{Q_2^2}{2^{2/3}Q_3^{4/3}}\right), \tag{55} \]
which itself is manifestly asymmetric in $x$.

Finally, we note that the formulae derived above lead directly to corresponding expressions for the moments, since these may be related to the cumulants by taking the exponential of the right-hand side of (35), re-expanding as a Taylor series in powers of $s$, and equating the coefficients with those in (34). Thus, for example, it is straightforward to see that
\[ \langle(\log|Z|)^n\rangle_{U(N)} = \left.\frac{d^n}{ds^n}M_N(s)\right|_{s=0} = \begin{cases} (2k-1)!!\,\langle(\log|Z|)^2\rangle^k_{U(N)} + O((\log N)^{k-2}) & \text{if } n = 2k \\ O((\log N)^{k-1}) & \text{if } n = 2k+1, \end{cases} \tag{56} \]
where the second moment is given by (45). This again implies that the limiting distribution is Gaussian.
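The inner sum over $p$ in (53) can be checked by machine. The sketch below is an illustration rather than part of the original derivation: it builds the polynomial $\sum_p \binom{m}{p} x^p\, i^{m-p}(m-p-1)!!$ (even $m-p$ only) with exact integer coefficients and compares it with the probabilists' Hermite polynomial $He_m(x)$ generated by the standard recurrence $He_{k+1} = x\,He_k - k\,He_{k-1}$. The identification with $He_m$ and the function names are assumptions introduced here.

```python
from math import comb


def double_factorial(k):
    """Return k!! for odd k >= -1, using the convention (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def inner_sum_coeffs(m):
    """Coefficients [a_0, ..., a_m] of sum_p C(m,p) x^p i^(m-p) (m-p-1)!!.

    Only even m - p contribute, and then i^(m-p) = (-1)^((m-p)/2) is real.
    """
    coeffs = [0] * (m + 1)
    for p in range(m + 1):
        if (m - p) % 2 == 0:
            sign = (-1) ** ((m - p) // 2)
            coeffs[p] = comb(m, p) * sign * double_factorial(m - p - 1)
    return coeffs


def hermite_coeffs(m):
    """Coefficients of He_m(x) from He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = [1], [0, 1]
    if m == 0:
        return prev
    for k in range(1, m):
        nxt = [0] + cur              # multiply He_k by x
        for i, c in enumerate(prev):
            nxt[i] -= k * c          # subtract k * He_{k-1}
        prev, cur = cur, nxt
    return cur
```

For instance, `inner_sum_coeffs(3)` gives the coefficients of $x^3 - 3x$, so the leading correction in (53) is an odd function of $x$, in line with the asymmetry discussed above.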
2.3. Value distribution of Im log Z. In the same way as for the real part, all information about the value distribution of $\mathrm{Im}\log Z$ is contained in the generating function $L_N(s)$. Thus,
\[ L_N(-it) = \sum_{j=0}^{\infty}\frac{\langle(\mathrm{Im}\log Z)^j\rangle_{U(N)}}{j!}\,t^j, \tag{57} \]
and similarly for the corresponding cumulants $R_j$,
\[ \log L_N(-it) = \sum_{j=1}^{\infty}\frac{R_j(N)}{j!}\,t^j, \tag{58} \]
where $L_N(s)$ is given by (7). Likewise, the probability density for the values taken by $\mathrm{Im}\log Z$,
\[ \sigma_N(x) = \langle\delta(\mathrm{Im}\log Z - x)\rangle_{U(N)}, \tag{59} \]
is given by
\[ \sigma_N(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iyx}L_N(y)\,dy. \tag{60} \]
All of the results of the previous section then extend immediately to $\mathrm{Im}\log Z$. Thus, taking the logarithm of (7) and differentiating,
\[ R_n(N) = \frac{(-i)^n}{2^n}\sum_{j=1}^{N}\left(-\psi^{(n-1)}(j) + (-1)^{n-1}\psi^{(n-1)}(j)\right) = \begin{cases} 0 & \text{if } n \text{ odd} \\ \dfrac{(-1)^{n/2+1}}{2^{n-1}}\displaystyle\sum_{j=1}^{N}\psi^{(n-1)}(j) & \text{if } n \text{ even.} \end{cases} \tag{61} \]
The fact that all of the odd cumulants are zero implies that all of the odd moments are also zero. This is the main difference compared to the case of $\mathrm{Re}\log Z$. For the even cumulants we have
\[ R_{2m}(N) = \frac{(-1)^{m+1}}{2^{2m-1}-1}\,Q_{2m}(N), \tag{62} \]
and so the asymptotics computed in the previous section apply immediately in this case too. Thus
\[ R_2(N) = \langle(\mathrm{Im}\log Z)^2\rangle_{U(N)} = Q_2(N) = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} + O\left(\frac{1}{N^4}\right), \tag{63} \]
and for $m > 1$,
\[ R_{2m}(N) = \frac{(-1)^{m+1}}{2^{2m-1}}\left[\Gamma(2m)\zeta(2m-1) - \frac{(2m-3)!}{N^{2m-2}}\right] + O(N^{1-2m}). \tag{64} \]
The fact that, for $m \ge 2$, $R_{2m}/R_2^m \to 0$ as $N \to \infty$ implies that the value distribution of $\mathrm{Im}\log Z/\sqrt{R_2(N)}$ tends to a Gaussian in the limit. This was first proved by Costin and Lebowitz [15] for the GUE of random matrices. Specifically, they proved that the fluctuating part of the eigenvalue counting function has a limiting value distribution that is Gaussian. The connection comes because the two functions are the same, up to multiplication by $\pi$; specifically, if $n(U, a, b)$ denotes the number of eigenvalues of $U$ with $a < \theta_n < b$, then
\[ n(U, a, b) = \frac{(b-a)N}{2\pi} + \frac{1}{\pi}\,\mathrm{Im}\log\frac{Z(U, b)}{Z(U, a)}, \tag{65} \]
assuming that none of the eigenphases coincides with the end-points of the range. In addition, Costin and Lebowitz conjectured that, for $m \ge 2$, $R_{2m}(N) \to$ constant when $N \to \infty$. Our asymptotic formula (64) proves this for averages over the CUE, and provides the value of the constant. Wieand [35] has independently given a proof of the central limit theorem for $n(U, a, b) - (b-a)N/(2\pi)$ in the CUE case. The asymptotics of the approach to the Gaussian can be calculated from (58) and (60). Defining
\[ \tilde\sigma_N(x) = \sqrt{R_2(N)}\,\sigma_N\!\left(\sqrt{R_2(N)}\,x\right), \tag{66} \]
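we have that
\begin{align}
\tilde\sigma_N(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-iyx - \frac{y^2}{2}\right) \times \exp\left(\frac{R_4 y^4}{R_2^2\,4!} - \frac{R_6 y^6}{R_2^3\,6!} + \cdots\right) dy \notag\\
&= \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right) + \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-iyx - \frac{y^2}{2}\right) \times \left(\frac{C_4 y^4}{R_2^2} + \frac{C_6 y^6}{R_2^3} + \frac{C_8 y^8}{R_2^4} + \cdots\right) dy, \tag{67}
\end{align}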
where the coefficients $C_n(N)$ are defined in terms of the cumulants $R_{2m}(N)$ with $m > 1$; for example $C_4(N) = R_4(N)/4!$. Thus, integrating term-by-term,
\[ \tilde\sigma_N(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right) + \frac{1}{\sqrt{2\pi}}\sum_{m=2}^{\infty}\frac{C_{2m}}{R_2^m}\,e^{-x^2/2} \sum_{p=0}^{2m}\binom{2m}{p}(-ix)^p \times \begin{cases} (2m-p-1)!! & \text{if } 2m-p \text{ is even} \\ 0 & \text{if } 2m-p \text{ is odd.} \end{cases} \tag{68} \]
In this case $\tilde\sigma_N(x)$ is an even function of $x$ for all $N$, and not just in the limit as $N \to \infty$. This is a consequence of the fact that all of the odd cumulants are identically zero. It follows from (68) that the deviation from the Gaussian limit is of the order of $(\log N)^{-2}$, and so is asymptotically smaller than in the case of $\mathrm{Re}\log Z$.
Finally, the expressions derived above for the cumulants may again be used to deduce information about the moments. We have already noted that the odd moments are identically zero. For the even moments we find the usual Gaussian relationship:
\[ \langle(\mathrm{Im}\log Z)^{2k}\rangle_{U(N)} = (2k-1)!!\,\langle(\mathrm{Im}\log Z)^2\rangle^k_{U(N)} + O((\log N)^{k-2}), \tag{69} \]
where the asymptotics of the second moment are given by (63). 2.4. Independence. We have shown in Sects. 2.2 and 2.3 that the values of both Re log Z and Im log Z have a Gaussian limit distribution as N → ∞. Our purpose in this section is to show that they are also independent in this limit. The generating function for the joint distribution is 2π 2π 1 t is(Im log Z) eiθj − eiθk 2 |Z| e U (N) = · · · dθ · · · dθ 1 N N!(2π )N 0 0 1≤j > − 21 log N , the lower bound being determined by the first pole of MN (s).
[Figure 4: plot with two data sets labelled "CUE" and "Zeta"; horizontal axis from 0 to 20, vertical axis from 0 to 0.6.]
Fig. 4. The CUE value distribution of $|Z|$, corresponding to $N = 12$, with numerical data for the value distribution of $|\zeta(1/2 + it)|$ near $t = 10^6$
For any finite $N$ we can plot $P_N(w)$ numerically by direct evaluation of (102). This is done in Fig. 4 together with data for the value distribution of $|\zeta(1/2 + it)|$ when $t \approx 10^6$, which corresponds via (92) to $N = 12$. As $w \to 0$, $P_N(w)$ tends to a constant for a given $N$, the value of which can be calculated by noting that the contribution to the integral is dominated by the pole of $M_N(is)$ (at $s = i$) closest to the real axis. Hence
\[ \lim_{w\to 0}P_N(w) = \frac{1}{\Gamma(N)}\prod_{j=1}^{N}\left(\frac{\Gamma(j)}{\Gamma(j-1/2)}\right)^2. \tag{105} \]
If $N$ is large, this is asymptotic to [19]
\[ N^{1/4}(G(1/2))^2 = \exp\left(\frac{1}{12}\log 2 + 3\zeta'(-1) - \frac{1}{2}\log\pi\right)N^{1/4}. \tag{106} \]
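A short numerical sketch of (105) and (106), using log-gamma functions to avoid overflow. It assumes the reading of (105) as $\frac{1}{\Gamma(N)}\prod_{j}\left(\Gamma(j)/\Gamma(j-1/2)\right)^2$, which reproduces the value $1/\pi$ expected for $N = 1$; the numerical value of $\zeta'(-1)$ is hard-coded and the function names are illustrative.

```python
import math

ZETA_PRIME_MINUS_ONE = -0.1654211437004509  # zeta'(-1)


def p_at_zero(n):
    """Evaluate (105): (1/Gamma(N)) * prod_{j=1}^{N} (Gamma(j)/Gamma(j-1/2))^2."""
    log_p = -math.lgamma(n)
    for j in range(1, n + 1):
        log_p += 2.0 * (math.lgamma(j) - math.lgamma(j - 0.5))
    return math.exp(log_p)


def p_at_zero_asymptotic(n):
    """Evaluate (106): N^(1/4) * G(1/2)^2."""
    log_g_half_sq = (math.log(2) / 12 + 3 * ZETA_PRIME_MINUS_ONE
                     - 0.5 * math.log(math.pi))
    return math.exp(log_g_half_sq) * n ** 0.25
```

Evaluating `p_at_zero(12)` gives a value close to the figure $P_{12}(0) \approx 0.671$ quoted later in the text, while `p_at_zero_asymptotic(12)` is about one percent larger, so the leading asymptotic form is already reasonable at $N = 12$.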
Based on the previous discussion of its moments, it is natural to expect that as $t \to \infty$ the way in which the primes contribute to the value distribution of $|\zeta(1/2 + it)|$ is given by
\[ \tilde P_N(w) = \frac{1}{2\pi w}\int_{-\infty}^{\infty} e^{-is\log w}\,a(is/2)\,M_N(is)\,ds. \tag{107} \]
Consequently,
\[ \tilde P_N(0) = a(-1/2)\,P_N(0). \tag{108} \]
We find that a(−1/2) ≈ 0.919, P12 (0) ≈ 0.671, and so a(−1/2)P12 (0) ≈ 0.617, which is indeed close to the numerically computed value of the probability density at zero, 0.613. Away from w = 0, in the region where (104) is valid, the stationary point of (107) is at s ∗ = −i log w/Q2 , so a(is ∗ /2) = a(log w/(2Q2 )). Since a(0) = 1, when | log w| 1, and by an analytical continuation in the rest of the complex plane. We conjectured that the moments of ζ (1/2 + it) high on the critical line t ∈ R factor into a part which is specific to the Riemann zeta function, and a universal component which is the corresponding moment of the characteristic polynomial Z(U, θ) of matrices in U (N ), defined with respect to an average over the CUE. The connection between N and the height T up the critical line corresponds to equating the mean density of eigenvalues 1 T N/2π with the mean density of zeros 2π log 2π . This idea has subsequently been applied by Brézin and Hikami [2] to other random matrix ensembles, and by Coram and Diaconis [4] to other statistics. Our purpose here is to extend these calculations to SO(2N ) and U Sp(2N ), and to compare the results with what is known about the L-functions. (Only SO(2N ) is relevant, because a family of L-functions governed by O(N ) falls approximately into two halves; one displaying even symmetry about s = 1/2, and the other odd symmetry. This latter class contributes zero to averages at the central value, while the zero statistics of the former are expected to follow those of SO(2N ).) We therefore consider the value distribution of the characteristic polynomial of 2N ×2N orthogonal or unitary symplectic matrices, Z(U, θ ) =
\[ \prod_{n=1}^{N}\left(1 - e^{i(\theta_n-\theta)}\right)\left(1 - e^{i(-\theta_n-\theta)}\right), \tag{3} \]
averaged over these groups when $\theta = 0$, which is the symmetry point for the eigenvalues, just as $s = 1/2$ is for the L-function zeros. In each case we derive an explicit expression for $\langle Z(U, 0)^s\rangle$, valid for all $N$, and from this obtain the leading order $N \to \infty$ asymptotics, together with a simplified formula when $s$ is an integer. We also derive the value
Random Matrix Theory and L-Functions at s = 1/2
93
distributions of $\log Z(U, 0)$ and $Z(U, 0)$. Comparing with results for various families of L-functions suggests that, as for $\zeta(s)$, random matrix theory determines the universal part of (1). This then provides further support for the programs of Katz and Sarnak, and Conrey and Farmer.

2. Symplectic Symmetry

2.1. Random matrices in USp(2N). We are interested here in the group of symplectic unitary matrices, $USp(2N)$. These are $2N \times 2N$ matrices, $U$, with $UU^\dagger = 1$ and $U^tJU = J$, where
\[ J = \begin{pmatrix} 0 & I_N \\ -I_N & 0 \end{pmatrix} \]
and $I_N$ is the $N \times N$ identity matrix. For these matrices, the eigenvalues lie on the unit circle and come in complex conjugate pairs. Thus the characteristic polynomial related to such a matrix with eigenvalues $e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \ldots, e^{i\theta_N}, e^{-i\theta_N}$ takes the form (3). Our first step will be to calculate the moments of $Z(U, 0)$ defined by averaging with respect to Haar measure. The joint probability density function of the eigenvalues is thus [17]
\[ \mathcal{N}_{Sp}\prod_{1\le i<j\le N}\left[\sin\left(\frac{\theta_i-\theta_j}{2}\right)\sin\left(\frac{\theta_i+\theta_j}{2}\right)\right]^2\prod_{k=1}^{N}\sin^2\theta_k = \mathcal{N}_{Sp}\prod_{1\le i<j\le N}\left[\frac{1}{2}(\cos\theta_j - \cos\theta_i)\right]^2\prod_{k=1}^{N}\sin^2\theta_k, \tag{4} \]
with normalization constant
\[ \mathcal{N}_{Sp} = 2^{-3N}\prod_{j=1}^{N}\frac{\Gamma(1+N+j)}{\Gamma(1+j)\,(\Gamma(1/2+j))^2} = \frac{2^{2N^2-2N}}{\pi^N N!}. \tag{5} \]
Noting that
\[ Z(U, 0) = \prod_{n=1}^{N}\left|1 - e^{i\theta_n}\right|^2 = 2^{2N}\prod_{n=1}^{N}\sin^2(\theta_n/2) = 2^N\prod_{n=1}^{N}(1 - \cos\theta_n), \tag{6} \]
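we proceed to
\begin{align}
\langle Z(U, 0)^s\rangle_{USp(2N)} &= \mathcal{N}_{Sp}\,2^{Ns}\int_0^{2\pi}\cdots\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le i<j\le N}\left[\frac{1}{2}(\cos\theta_j - \cos\theta_i)\right]^2\prod_{k=1}^{N}\sin^2\theta_k\prod_{n=1}^{N}(1-\cos\theta_n)^s \notag\\
&= \mathcal{N}_{Sp}\,2^{Ns+2N-N^2}\int_0^{\pi}\cdots\int_0^{\pi} d\theta_1\cdots d\theta_N \prod_{1\le i<j\le N}(\cos\theta_j - \cos\theta_i)^2\prod_{k=1}^{N}\sin^2\theta_k\prod_{n=1}^{N}(1-\cos\theta_n)^s, \tag{7}
\end{align}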
which, after the transformation $x_j = \cos\theta_j$, becomes
\[ \langle Z(U, 0)^s\rangle_{USp(2N)} = \mathcal{N}_{Sp}\,2^{Ns+2N-N^2}\int_{-1}^{1}\cdots\int_{-1}^{1} dx_1\cdots dx_N \prod_{1\le i<j\le N}(x_j - x_i)^2\prod_{k=1}^{N}(1-x_k)^{1/2+s}(1+x_k)^{1/2}. \tag{8} \]
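There is a form of Selberg's integral (detailed in [11]) which states that
\[ \int_{-1}^{1}\cdots\int_{-1}^{1}\prod_{1\le j<l\le n}|x_j - x_l|^{2\gamma}\prod_{j=1}^{n}(1-x_j)^{\alpha-1}(1+x_j)^{\beta-1}\,dx_1\cdots dx_n = 2^{\gamma n(n-1)+n(\alpha+\beta-1)}\prod_{j=0}^{n-1}\frac{\Gamma(1+\gamma+j\gamma)\,\Gamma(\alpha+j\gamma)\,\Gamma(\beta+j\gamma)}{\Gamma(1+\gamma)\,\Gamma(\alpha+\beta+\gamma(n+j-1))}, \tag{9} \]
for $\mathrm{Re}\,\alpha > 0$, $\mathrm{Re}\,\beta > 0$ and $\mathrm{Re}\,\gamma > -\min\left(\frac{1}{n}, \frac{\mathrm{Re}\,\alpha}{n-1}, \frac{\mathrm{Re}\,\beta}{n-1}\right)$. In our case $\gamma = 1$, $\alpha = 3/2 + s$ and $\beta = 3/2$, so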
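\begin{align}
\langle Z(U, 0)^s\rangle_{USp(2N)} &= \mathcal{N}_{Sp}\,2^{Ns+2N-N^2}\,2^{N^2+N+Ns} \times \prod_{j=0}^{N-1}\frac{\Gamma(2+j)\,\Gamma(3/2+s+j)\,\Gamma(3/2+j)}{\Gamma(2)\,\Gamma(3+s+N+j-1)} \notag\\
&= 2^{2Ns}\prod_{j=1}^{N}\frac{\Gamma(1+N+j)\,\Gamma(1/2+s+j)}{\Gamma(1/2+j)\,\Gamma(1+s+N+j)} \notag\\
&\equiv M_{Sp}(N, s). \tag{10}
\end{align}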
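As a sanity check on (10), the sketch below evaluates the product with log-gamma functions and, for $N = 1$, compares it with a direct midpoint-rule average of $Z(U,0)^s = (2(1-\cos\theta))^s$ against the density $\frac{1}{\pi}\sin^2\theta$ on $[0, 2\pi]$ obtained from (4) and (5). The function names and the quadrature are illustrative choices, and the code assumes real $s > -3/2$ so that every gamma-function argument is positive.

```python
import math


def m_sp(n, s):
    """M_Sp(N, s) from (10), for real s > -3/2."""
    log_m = 2 * n * s * math.log(2)
    for j in range(1, n + 1):
        log_m += (math.lgamma(1 + n + j) + math.lgamma(0.5 + s + j)
                  - math.lgamma(0.5 + j) - math.lgamma(1 + s + n + j))
    return math.exp(log_m)


def usp2_average(s, steps=20000):
    """Midpoint-rule average of (2(1 - cos t))^s over USp(2), i.e. N = 1."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (2 * (1 - math.cos(t))) ** s * math.sin(t) ** 2
    return total * h / math.pi
```

At $s = 1$ the product simplifies by hand to $M_{Sp}(N, 1) = N + 1$ (this simplification is not stated in the text), which gives a second quick check for any $N$.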
We now consider the coefficients $c_j$ in the expansion
\[ M_{Sp}(N, s) = e^{c_1 s + c_2 s^2/2 + c_3 s^3/3! + c_4 s^4/4! + \cdots}. \tag{11} \]
These coefficients are the cumulants of the distribution of $\log Z(U, 0)$, because $M_{Sp}(N, s)$ is the generating function for the moments of this distribution:
\[ M_{Sp}(N, s) = \sum_{j=0}^{\infty}\langle(\log Z(U, 0))^j\rangle_{USp(2N)}\frac{s^j}{j!}. \tag{12} \]
Since
\[ \log M_{Sp}(N, s) = 2sN\log 2 + \sum_{j=1}^{N}\left(\log\Gamma(1/2+s+j) + \log\Gamma(1+N+j) - \log\Gamma(1/2+j) - \log\Gamma(1+s+N+j)\right), \tag{13} \]
we find that
\[ c_1 = \left.\frac{d}{ds}\log M_{Sp}(N, s)\right|_{s=0} = 2N\log 2 + \sum_{j=1}^{N}\left(\psi(1/2+j) - \psi(1+N+j)\right) \tag{14} \]
is the first cumulant, while the higher ones are given by
\[ c_n = \left.\frac{d^n}{ds^n}\log M_{Sp}(N, s)\right|_{s=0} = \sum_{j=1}^{N}\left(\psi^{(n-1)}(1/2+j) - \psi^{(n-1)}(1+N+j)\right), \tag{15} \]
where $\psi^{(j)}(z) = \frac{d^{j+1}}{dz^{j+1}}\log\Gamma(z)$ is a polygamma function. We seek the behaviour of these cumulants for large $N$. For the first we use the asymptotic formula
\[ \psi(z) \sim \log z - \frac{1}{2z} - \sum_{n=1}^{\infty}\frac{B_{2n}}{2nz^{2n}}, \tag{16} \]
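which holds when $z \to \infty$ with $|\arg z| < \pi$, where $B_{2n}$ are the Bernoulli numbers. Also, we need the integral form of the digamma function,
\[ \psi(z) + \gamma = \int_0^{\infty}\frac{e^{-t} - e^{-zt}}{1 - e^{-t}}\,dt. \tag{17} \]
Applying (17), we obtain
\begin{align}
c_1 &= 2N\log 2 + \sum_{j=1}^{N}\left[\int_0^{\infty}\frac{e^{-t} - e^{-(j+1/2)t}}{1 - e^{-t}}\,dt - \gamma - \int_0^{\infty}\frac{e^{-t} - e^{-(j+N+1)t}}{1 - e^{-t}}\,dt + \gamma\right] \notag\\
&= 2N\log 2 + \sum_{j=1}^{N}\int_0^{\infty}\frac{e^{-(j+N+1)t} - e^{-(j+1/2)t}}{1 - e^{-t}}\,dt. \tag{18}
\end{align}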
We now interchange the summation and integration and perform the sum explicitly so that we can integrate by parts to arrive at
\begin{align*}
c_1 = 2N\log 2 &- (N+1)\int_0^{\infty}\frac{e^{-(N+2)t}}{1 - e^{-t}}\,dt + (2N+1)\int_0^{\infty}\frac{e^{-(2N+2)t}}{1 - e^{-t}}\,dt \\
&+ \frac{1}{2}\int_0^{\infty}\frac{e^{-3t/2}}{1 - e^{-t}}\,dt - (N+1/2)\int_0^{\infty}\frac{e^{-(N+3/2)t}}{1 - e^{-t}}\,dt.
\end{align*}
Converting back to polygamma function notation via (17), then applying (16) as $N$ becomes large, we see that
\begin{align}
c_1 &= 2N\log 2 + (N+1)\psi(N+2) - (2N+1)\psi(2N+2) - \frac{1}{2}\psi(3/2) + (N+1/2)\psi(N+3/2) \notag\\
&= \frac{1}{2}\log N + \frac{\gamma}{2} + O(N^{-1}). \tag{19}
\end{align}
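The three expressions for $c_1$ (the sum (14), the closed form in (19), and its leading asymptotics) can be compared numerically. The sketch below is illustrative: `digamma` is a hand-rolled helper, not a library call, which uses the recurrence $\psi(x) = \psi(x+1) - 1/x$ to shift the argument upward and then the first few terms of the asymptotic series (16).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def digamma(x):
    """psi(x) for x > 0 via upward recurrence plus the asymptotic series (16)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x             # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series


def c1_sum(n):
    """First cumulant from the defining sum (14)."""
    total = 2 * n * math.log(2)
    for j in range(1, n + 1):
        total += digamma(0.5 + j) - digamma(1 + n + j)
    return total


def c1_closed(n):
    """First cumulant from the closed form in (19)."""
    return (2 * n * math.log(2) + (n + 1) * digamma(n + 2)
            - (2 * n + 1) * digamma(2 * n + 2) - 0.5 * digamma(1.5)
            + (n + 0.5) * digamma(n + 1.5))


def c1_leading(n):
    """Leading large-N behaviour in (19)."""
    return 0.5 * math.log(n) + 0.5 * EULER_GAMMA
```

The difference `c1_sum(N) - c1_leading(N)` decays roughly like $1/(4N)$ in these experiments, consistent with the $O(N^{-1})$ remainder in (19).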
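For the second cumulant we need to use the asymptotic formula for higher polygamma functions, valid as $z \to \infty$ with $|\arg z| < \pi$,
\[ \psi^{(n)}(z) \sim (-1)^{n-1}\left[\frac{(n-1)!}{z^n} + \frac{n!}{2z^{n+1}} + \sum_{k=1}^{\infty}B_{2k}\frac{(2k+n-1)!}{(2k)!\,z^{2k+n}}\right]. \tag{20} \]
There is also an integral formula for the higher polygamma functions which will prove useful:
\[ \psi^{(n)}(z) = (-1)^{n-1}\int_0^{\infty}\frac{t^n e^{-zt}}{1 - e^{-t}}\,dt. \tag{21} \]
This leads us to
\begin{align}
c_2 &= \sum_{j=1}^{N}\left(\psi^{(1)}(j+1/2) - \psi^{(1)}(1+N+j)\right) \notag\\
&= \sum_{j=1}^{N}\left[\int_0^{\infty}\frac{te^{-(j+1/2)t}}{1 - e^{-t}}\,dt - \int_0^{\infty}\frac{te^{-(1+N+j)t}}{1 - e^{-t}}\,dt\right]. \tag{22}
\end{align}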
Again we interchange the order of the summation and integration, perform the sum and integrate by parts. The result, expressed in terms of polygamma functions, is
\begin{align}
c_2 &= -\psi(3/2) - \frac{1}{2}\psi^{(1)}(3/2) + \psi(N+3/2) + (N+1/2)\psi^{(1)}(N+3/2) \notag\\
&\quad + \psi(N+2) + (N+1)\psi^{(1)}(N+2) - \psi(2N+2) - (2N+1)\psi^{(1)}(2N+2) \notag\\
&= \log N + 1 + \gamma + \log 2 - \frac{3}{2}\zeta(2) + O(N^{-1}). \tag{23}
\end{align}
The higher cumulants follow in a similar manner
\begin{align}
c_n &= \sum_{j=1}^{N}\left[(-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-(1/2+j)t}}{1 - e^{-t}}\,dt - (-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-(1+N+j)t}}{1 - e^{-t}}\,dt\right] \notag\\
&= (-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-(3/2)t}}{1 - e^{-t}}\,\frac{1 - e^{-Nt}}{1 - e^{-t}}\,dt - (-1)^n\int_0^{\infty}\frac{t^{n-1}e^{(-2-N)t}}{1 - e^{-t}}\,\frac{1 - e^{-Nt}}{1 - e^{-t}}\,dt, \tag{24}
\end{align}
and so
\begin{align}
\lim_{N\to\infty}c_n &= (-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-(3/2)t}}{(1 - e^{-t})(1 - e^{-t})}\,dt \notag\\
&= (-1)^n\left[\left.-\frac{t^{n-1}e^{-(3/2)t}}{1 - e^{-t}}\right|_0^{\infty} + (n-1)\int_0^{\infty}\frac{t^{n-2}e^{-(1/2)t}}{1 - e^{-t}}\,dt - \frac{1}{2}\int_0^{\infty}\frac{t^{n-1}e^{-(1/2)t}}{1 - e^{-t}}\,dt\right] \notag\\
&= -(n-1)\psi^{(n-2)}(1/2) - \frac{1}{2}\psi^{(n-1)}(1/2) \notag\\
&= (-1)^n(n-1)!\left[(2^{n-1}-1)\zeta(n-1) - \frac{1}{2}(2^n-1)\zeta(n)\right]. \tag{25}
\end{align}
These expressions for the cumulants, inserted into (11), allow us to write the leading order coefficient of the moment $M_{Sp}(N, s)$ as
\begin{align}
f_{Sp}(s) &\equiv \lim_{N\to\infty}\frac{M_{Sp}(N, s)}{N^{s/2+s^2/2}} \notag\\
&= \exp\left(\frac{\gamma}{2}s + \left(1 + \gamma + \log 2 - \frac{3}{2}\zeta(2)\right)\frac{s^2}{2} + \sum_{n=3}^{\infty}\left[(-1)^n(2^{n-1}-1)\zeta(n-1) - (-1)^n\frac{1}{2}(2^n-1)\zeta(n)\right]\frac{s^n}{n}\right). \tag{26}
\end{align}
This coefficient can be expressed as a combination of gamma functions and the Barnes G-function [1, 16], which is defined by
\[ G(1+z) = (2\pi)^{z/2}e^{-[(1+\gamma)z^2+z]/2}\prod_{n=1}^{\infty}\left[(1+z/n)^n e^{-z+z^2/(2n)}\right], \tag{27} \]
and has zeros at the negative integers, $-n$, with multiplicity $n$ ($n = 1, 2, 3, \ldots$). Other properties useful to us are that
\[ G(1) = 1, \qquad G(z+1) = \Gamma(z)\,G(z), \tag{28} \]
and furthermore, for $|z| < 1$,
\[ \log G(1+z) = (\log(2\pi) - 1)\frac{z}{2} - (1+\gamma)\frac{z^2}{2} + \sum_{n=3}^{\infty}(-1)^{n-1}\zeta(n-1)\frac{z^n}{n}. \tag{29} \]
Combining this with
\[ \log\Gamma(1+z) = -\gamma z + \sum_{n=2}^{\infty}\zeta(n)\frac{(-z)^n}{n}, \tag{30} \]
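which holds for $|z| < 1$, we see that, for $|s| < 1/2$,
\begin{align}
&\log G(1+s) - \frac{1}{2}\log G(1+2s) - \frac{1}{2}\log\Gamma(1+2s) + \frac{1}{2}\log\Gamma(1+s) \notag\\
&\quad = \frac{\gamma}{2}s + (1+\gamma)\frac{s^2}{2} - \frac{3}{4}\zeta(2)s^2 + \sum_{n=3}^{\infty}\left[(-1)^n(2^{n-1}-1)\zeta(n-1) - (-1)^n\frac{1}{2}(2^n-1)\zeta(n)\right]\frac{s^n}{n}. \tag{31}
\end{align}
A comparison with (26) shows that
\[ f_{Sp}(s) = 2^{s^2/2} \times \frac{G(1+s)\sqrt{\Gamma(1+s)}}{\sqrt{G(1+2s)\,\Gamma(1+2s)}}, \tag{32} \]
for $|s| < 1/2$, and hence by analytic continuation for all $s$. For integer moments the formula is simpler. Using (28) we see that
\[ G(n) = \prod_{j=1}^{n-1}\Gamma(j), \tag{33} \]
and so for integer $n$,
\begin{align}
f_{Sp}(n) &= 2^{n^2/2}\,\frac{G(1+n)\sqrt{\Gamma(1+n)}}{\sqrt{G(1+2n)\,\Gamma(1+2n)}} \notag\\
&= 2^{n^2/2}\,\frac{\prod_{j=1}^{n}\Gamma(j)\,\sqrt{\Gamma(1+n)}}{\sqrt{\prod_{j=1}^{2n}\Gamma(j)\;\Gamma(1+2n)}} \notag\\
&= 2^{n^2/2}\,\frac{\prod_{j=1}^{n}(j-1)!\,\sqrt{n!}}{\sqrt{\prod_{j=1}^{2n+1}(j-1)!}} \notag\\
&= 2^{n^2/2}\,\frac{\prod_{j=1}^{n}(j-1)!\,\sqrt{n!}}{\sqrt{2^{2n-1}\,3^{2n-2}\,4^{2n-3}\cdots(2n-1)^2\,2n}} \notag\\
&= 2^{n^2/2}\,\frac{\sqrt{1}\sqrt{2}\cdots\sqrt{n-1}\sqrt{n}\;\prod_{j=1}^{n}(j-1)!}{\sqrt{2}\sqrt{4}\cdots\sqrt{2n}\;\sqrt{2^{2n-2}\,3^{2n-2}\,4^{2n-4}\cdots(2n-2)^2(2n-1)^2}} \notag\\
&= 2^{n^2/2}\,\frac{1^{n-1}\,2^{n-2}\cdots(n-2)^2(n-1)}{2^{n/2}\;2^{n-1}\,3^{n-1}\,4^{n-2}\,5^{n-2}\cdots(2n-2)(2n-1)} \notag\\
&= 2^{n^2/2}\,\frac{1}{2^{n/2}\;2^{\sum_{j=1}^{n-1}j}\;\prod_{j=1}^{n}(2j-1)!!} \notag\\
&= \left(\prod_{j=1}^{n}(2j-1)!!\right)^{-1}. \tag{34}
\end{align}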
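The chain of equalities in (34) can be spot-checked by evaluating its first and last lines independently. This sketch is an illustration only: it computes $G$ at integers from (33) with exact integer arithmetic, takes the square root in floating point, and compares with the double-factorial product. Function names are hypothetical.

```python
import math


def barnes_g_int(n):
    """G(n) = prod_{j=1}^{n-1} Gamma(j) for integer n >= 1, as in (33)."""
    result = 1
    for j in range(1, n):
        result *= math.factorial(j - 1)
    return result


def f_sp_from_g(n):
    """First line of (34): 2^(n^2/2) G(1+n) sqrt(n!) / sqrt(G(1+2n) (2n)!)."""
    numerator_sq = barnes_g_int(1 + n) ** 2 * math.factorial(n)
    denominator_sq = barnes_g_int(1 + 2 * n) * math.factorial(2 * n)
    return 2 ** (n * n / 2) * math.sqrt(numerator_sq / denominator_sq)


def f_sp_product(n):
    """Last line of (34): the reciprocal of prod_{j=1}^{n} (2j-1)!!."""
    odd_double_factorial = 1
    product = 1
    for j in range(1, n + 1):
        odd_double_factorial *= 2 * j - 1   # now equals (2j-1)!!
        product *= odd_double_factorial
    return 1 / product
```

For $n = 1, 2, 3, 4$ the product form gives $1, 1/3, 1/45, 1/4725$.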
Following the ideas developed in [10], these integer coefficients have also been calculated independently by Brézin and Hikami [2].
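Having the generating function, $M_{Sp}(N, s)$, it is a short step to find the value distributions of both $\log Z(U, 0)$ and $Z(U, 0)$ itself. The distribution of $\log Z(U, 0)/\log N$ is
\begin{align}
\langle\delta(x - \log Z(U, 0)/\log N)\rangle_{USp(2N)} &= \left\langle\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iy(x - \log Z(U,0)/\log N)}\,dy\right\rangle_{USp(2N)} \notag\\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}M_{Sp}(N, iy/\log N)\,dy \tag{35}\\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}e^{c_1 iy/\log N + c_2(iy/\log N)^2/2 + c_3(iy/\log N)^3/3! + \cdots}\,dy \notag\\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}\exp\left(\left(\frac{1}{2}\log N + O(1)\right)\frac{iy}{\log N} - (\log N + O(1))\frac{y^2}{2(\log N)^2} - i(O(1))\frac{y^3}{3!(\log N)^3} + \cdots\right)dy. \tag{36}
\end{align}
Thus
\[ \lim_{N\to\infty}\langle\delta(x - \log Z(U, 0)/\log N)\rangle_{USp(2N)} = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx+iy/2}\,dy = \delta(x - 1/2), \tag{37} \]
and so the distribution of values of $\log Z(U, 0)/\log N$ tends to a delta function centred at $x = 1/2$. If we instead retain the $y^2$ term in the exponent in (36), we have the central limit theorem
\[ \lim_{N\to\infty}\left\langle\delta\left(x + \frac{c_1}{\sqrt{c_2}} - \frac{\log Z(U, 0)}{\sqrt{c_2}}\right)\right\rangle_{USp(2N)} = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \tag{38} \]
where $c_1$ and $c_2$ are related to $N$ by (19) and (23), respectively. For finite $N$, the exact distribution is of course given by (35), where $M_{Sp}(N, s)$ is defined by (10).

It is not difficult to determine as well the distribution of the values of $Z(U, 0)^{1/\log N}$. Changing variables in (35) results in
\[ \left\langle\delta\left(x - Z(U, 0)^{1/\log N}\right)\right\rangle_{USp(2N)} = \frac{1}{2\pi x}\int_{-\infty}^{\infty}x^{-iy}M_{Sp}(N, iy/\log N)\,dy, \tag{39} \]
and so
\[ \lim_{N\to\infty}\left\langle\delta\left(x - Z(U, 0)^{1/\log N}\right)\right\rangle_{USp(2N)} = \frac{1}{2\pi x}\int_{-\infty}^{\infty}e^{-iy\log x}e^{iy/2}\,dy = \frac{1}{x}\,\delta(\log x - 1/2). \tag{40} \]
Alternatively, we can examine the value distribution of $Z(U, 0)$. Denoting it by $P_{Sp}(N, x)$, we see that
\[ P_{Sp}(N, x) = \frac{1}{2\pi x}\int_{-\infty}^{\infty}x^{-iy}M_{Sp}(N, iy)\,dy. \tag{41} \]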
Although $P_{Sp}(N, x)$ does not have a limiting distribution as $N \to \infty$, we suggest the approximation
\begin{align}
P_{Sp}(N, x) &\approx \frac{1}{2\pi x}\int_{-\infty}^{\infty}\exp\left(-iy\log x + ic_1 y - \frac{c_2 y^2}{2}\right)dy \notag\\
&= \frac{1}{x\sqrt{2\pi c_2}}\exp\left(-\frac{(\log x - c_1)^2}{2c_2}\right), \tag{42}
\end{align}
and plot it, for two values of $N$, in Fig. 1 along with the exact distribution (41). It should be noted that the approximation (42) is valid when $x$ is fixed and $N \to \infty$, and more generally is expected to be a good approximation when $\log x \gg -\log N$ and $N$ is large, the lower bound being determined by the first pole of $M_{Sp}(N, s)$.

[Figure 1: two panels, (a) and (b), each plotting $P(x)$ against $x$ for $0 \le x \le 8$, with curves labelled "exact" and "asymptotic".]
Fig. 1. Distribution of the values of $Z(U, 0)$ for matrices in $USp(2N)$, (a) $N = 6$, (b) $N = 42$. The solid curve is the exact distribution (41) and the dashed curve is the large $N$ approximation (42)
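The approximation (42) is a log-normal density in $x$ and is simple to evaluate. The sketch below is an illustration under one explicit assumption: it replaces $c_1$ and $c_2$ by their leading large-$N$ forms from (19) and (23), dropping the $O(N^{-1})$ terms, so it is not the exact dashed curve of Fig. 1. With these leading forms $c_2$ is positive only for $N \ge 2$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def cumulants_large_n(n):
    """Leading large-N forms of c_1 (19) and c_2 (23); O(1/N) terms dropped."""
    zeta2 = math.pi ** 2 / 6
    c1 = 0.5 * math.log(n) + 0.5 * EULER_GAMMA
    c2 = math.log(n) + 1 + EULER_GAMMA + math.log(2) - 1.5 * zeta2
    return c1, c2


def p_sp_lognormal(n, x):
    """Approximation (42) for the density of Z(U, 0) at x > 0."""
    c1, c2 = cumulants_large_n(n)
    gaussian = math.exp(-(math.log(x) - c1) ** 2 / (2 * c2))
    return gaussian / (x * math.sqrt(2 * math.pi * c2))
```

Because (42) is a Gaussian in $\log x$, it integrates to one over $x > 0$ and vanishes faster than any power of $x$ as $x \to 0$, which is why it cannot capture the $x^{1/2}$ behaviour discussed next.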
It may be seen from Fig. 1 that $P_{Sp}(N, 0) = 0$. Although the approximation (42) also tends to zero as $x \to 0$, it does not predict the correct rate of approach. This may instead be obtained by examining the poles of the integrand of
\[ P_{Sp}(N, x) = \frac{1}{2\pi x}\int_{-\infty}^{\infty}x^{-iy}\,2^{2iNy}\prod_{j=1}^{N}\frac{\Gamma(1+N+j)\,\Gamma(1/2+iy+j)}{\Gamma(1/2+j)\,\Gamma(1+iy+N+j)}\,dy. \tag{43} \]
These poles occur at the points $y = i(2k+1)/2$ and are of order $k$, for $k = 1, 2, \ldots, N$, then of order $N$ for all higher $k$. Due to the factor $x^{-iy}$ it is evident that in the limit $x \to 0$, the lowest pole, that at $y = (3i)/2$, gives the dominant contribution. From the residue at that lowest pole we thus find that as $x \to 0$,
\[ P_{Sp}(N, x) \sim x^{1/2}\,2^{-3N}\,\frac{1}{\Gamma(N)}\prod_{j=1}^{N}\frac{\Gamma(1+N+j)\,\Gamma(j)}{\Gamma(1/2+j)\,\Gamma(N+j-1/2)}. \tag{44} \]
2.2. L-functions with symplectic symmetry. In the Introduction we gave a brief description of the mean values at s = 1/2 for families of L-functions and the relation of these to the symmetry type displayed by the low-lying zeros. Here we consider the case of symplectic symmetry in more depth.
If we again use Conrey and Farmer's notation, as in (1), then in the symplectic case they have $V(z) = z$ and find that $B(k) = \frac{1}{2}k(k+1)$ [3]. They also list several families which are conjectured to have low-lying zeros with symplectic symmetry, the simplest of which is the Dirichlet L-functions, $L(s, \chi_d)$, where $\chi_d$ is a quadratic Dirichlet character. The sum (1) is then over all such characters with conductor $|d| \le D$: as $D \to \infty$,
\[ \frac{1}{D^*}\sum_{|d|\le D}L\left(\tfrac{1}{2}, \chi_d\right)^k \sim g_k\,\frac{a(k)}{\Gamma(1+\frac{1}{2}k(k+1))}\,(\log D^{\frac{1}{2}})^{\frac{1}{2}k(k+1)}, \tag{45} \]
where $D^*$ is the number of quadratic characters included in the sum. For this case the first few values of $g_k$ for integer $k$ have been found using number-theoretic techniques to be [7, 15, 3] $g_1 = 1$, $g_2 = 2$, $g_3 = 2^4$ and, by conjecture, $g_4 = 3 \cdot 2^8$. It might be expected that $g_k$ is related to the random matrix moment values calculated in Sect. 2.1, since it is believed to be purely symmetry-determined. Our purpose now is to provide evidence in favour of this. Making the identification
\[ N = \log(Q^A), \tag{46} \]
and recalling that as $N \to \infty$, $M_{Sp}(N, k) \sim f_{Sp}(k)N^{\frac{1}{2}k(k+1)}$, we conjecture that for symplectic families of L-functions
\[ \frac{g_k}{\Gamma(1+\frac{1}{2}k(k+1))} = f_{Sp}(k). \tag{47} \]
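The conjectured identity (47) can be checked in exact rational arithmetic for the four values of $g_k$ quoted above. The sketch uses $\Gamma(1+m) = m!$ for integer $m$ and the product form (34) for $f_{Sp}(k)$; the dictionary of $g_k$ values is copied from the text (with $g_4$ itself conjectural) and the names are illustrative.

```python
from fractions import Fraction
from math import factorial

# g_k for the symplectic case as quoted in the text; g_4 is conjectural.
G_SYMPLECTIC = {1: 1, 2: 2, 3: 2 ** 4, 4: 3 * 2 ** 8}


def f_sp(k):
    """f_Sp(k) = 1 / prod_{j=1}^{k} (2j-1)!!, from (34)."""
    product, odd_double_factorial = 1, 1
    for j in range(1, k + 1):
        odd_double_factorial *= 2 * j - 1
        product *= odd_double_factorial
    return Fraction(1, product)


def conjecture_lhs(k):
    """Left-hand side of (47): g_k / Gamma(1 + k(k+1)/2)."""
    return Fraction(G_SYMPLECTIC[k], factorial(k * (k + 1) // 2))
```

For example, `conjecture_lhs(3)` is $16/720 = 1/45$, matching `f_sp(3)`.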
Following the arguments of [10], the relation between $N$ and $Q$ should arise from equating the mean densities of zeros. For the L-functions we need the density near $s = 1/2$ because we are dealing with the L-functions just at this point. In the case of L-functions with quadratic Dirichlet characters, (45), the mean density at a fixed height up the critical line increases like $\frac{1}{2\pi}\log|d|$ as $|d| \to \infty$. Since the mean density of eigenvalues of a matrix in $USp(2N)$ is $N/\pi$, we equate $N = (1/2)\log D$, and obtain exactly the proposed relation, since $A = 1/2$ in this case.

It is then striking that the first few values of $f_{Sp}$ at the integers, $f_{Sp}(1) = 1$, $f_{Sp}(2) = \frac{1}{3}$, $f_{Sp}(3) = \frac{1}{45}$ and $f_{Sp}(4) = \frac{1}{4725}$, agree precisely, via (47), with the values that Conrey and Farmer report for the symplectic L-functions.

Further evidence in favour of (47) is the success of a very similar conjecture relating moments of the Riemann zeta function to averages over $U(N)$ [10]. The only difference is that in the case of $\zeta(s)$ the average was along the critical line rather than over a family of functions. This is not a significant difference, however, and Conrey and Farmer in fact suggest that we think of the Riemann zeta function as a unitary family (with zeros showing the statistics of the eigenvalues of matrices from $U(N)$) in its own right, where we are averaging over special values of the family $\{\zeta(1/2 + it)\}$ as $t$ ranges over the real numbers.

The validity of the conjecture (47) would imply many results on the value distribution of the central values of symplectic L-functions. The distribution for the logarithm of symplectic families of L-functions, for example, is expected to behave for asymptotically large $Q$ in the same way as that of the characteristic polynomial $Z$, always remembering that $N$ must be related to the L-function parameter via the density of zeros. This is because the conjecture (47) can also be written as
\[ \lim_{Q\to\infty}\frac{1}{(\log Q^A)^{\frac{1}{2}k(k+1)}}\,\frac{1}{Q^*}\sum_{\substack{f\in F \\ c(f)\le Q}}L_f\left(\tfrac{1}{2}\right)^k = a(k) \times \lim_{N\to\infty}\frac{M_{Sp}(N, k)}{N^{\frac{1}{2}k(k+1)}}, \tag{48} \]
so the value distribution of $\log L_f(\frac{1}{2})/\log\log Q^A$ defined by averages with $c(f) \le Q$, would be, for large $Q$ and making the identification (46),
\[ V_{Sp}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-ixy}\,a(iy/\log N)\,M_{Sp}(N, iy/\log N)\,dy, \tag{49} \]
leading to
\[ \lim_{N\to\infty}V_{Sp}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}a(0)\,e^{-ixy+iy/2}\,dy. \tag{50} \]
Since $a(0) = 1$, we see that this would imply that the distribution of $\log L_f(\frac{1}{2})/\log\log Q^A$ is asymptotic to $\delta(x - 1/2)$, in just the same way as for $\log Z(U, 0)/\log N$. Following the same line of argument, we suggest that
\begin{align}
\lim_{Q\to\infty}\left\langle\delta\left(x + \frac{\tilde c_1}{\sqrt{\tilde c_2}} - \frac{\log L_f(\frac{1}{2})}{\sqrt{\tilde c_2}}\right)\right\rangle_F &= \lim_{N\to\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty}a\left(\frac{iy}{\sqrt{c_2}}\right)e^{-iyx - iyc_1/\sqrt{c_2}}\,e^{iyc_1/\sqrt{c_2} - y^2/2 + c_3(iy)^3/(c_2^{3/2}3!) + \cdots}\,dy \notag\\
&= \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \tag{51}
\end{align}
where $\langle\cdot\rangle_F$ denotes an average over a family $F$ of L-functions, as in (1), and $\tilde c_1$ and $\tilde c_2$ are given by (19) and (23), respectively, again with the identification (46). If we now turn to the distribution of values of $L_f(\frac{1}{2})$ itself, $W_{Sp}(x)$, we can close the contour of
\[ W_{Sp}(x) = \frac{1}{2\pi x}\int_{-\infty}^{\infty}x^{-is}\,a(is)\,M_{Sp}(N, is)\,ds \tag{52} \]
around the poles and obtain, as $x \to 0$, the dominant contribution from the pole at $s = (3i)/2$:
\[ W_{Sp}(x) \sim x^{1/2}\,a(-3/2)\,2^{-3N}\,\frac{1}{\Gamma(N)}\prod_{j=1}^{N}\frac{\Gamma(1+N+j)\,\Gamma(j)}{\Gamma(1/2+j)\,\Gamma(N+j-1/2)}. \tag{53} \]
This is of particular note in the light of recent interest in the non-vanishing of the central values of L-functions, see for example [15, 6, 5] and references therein. Clearly (53) implies that as long as $a(-3/2)$ is finite for a particular family of symplectic L-functions, the probability that the central value of those L-functions lies in the range $(0, x)$ decreases like $x^{3/2}$ as $x \to 0$.
3. Orthogonal Symmetry

3.1. Random matrices in SO(2N). We now consider the characteristic polynomial of matrices from the group $SO(2N)$. Here the eigenphases also come in complex conjugate pairs, so $Z(U, \theta)$ takes the form (3), and the average at the point $\theta = 0$ is, once again using the joint probability density function for the eigenphases dictated by Haar measure [17],
\begin{align}
\langle Z(U, 0)^s\rangle_{SO(2N)} &= \mathcal{N}_O\int_0^{2\pi}\cdots\int_0^{2\pi}d\theta_1\cdots d\theta_N\prod_{1\le i<j\le N}\left[\frac{1}{2}(\cos\theta_j - \cos\theta_i)\right]^2 \times 2^{Ns}\prod_{n=1}^{N}(1-\cos\theta_n)^s \notag\\
&= \mathcal{N}_O\,2^{2N-N^2+Ns}\int_0^{\pi}\cdots\int_0^{\pi}d\theta_1\cdots d\theta_N\prod_{1\le i<j\le N}(\cos\theta_j - \cos\theta_i)^2\prod_{n=1}^{N}(1-\cos\theta_n)^s \notag\\
&= \mathcal{N}_O\,2^{2N-N^2+Ns}\int_{-1}^{1}\cdots\int_{-1}^{1}dx_1\cdots dx_N\prod_{1\le i<j\le N}(x_j - x_i)^2\prod_{n=1}^{N}(1-x_n^2)^{-1/2}(1-x_n)^s, \tag{54}
\end{align}
with a normalization constant
\[ \mathcal{N}_O = 2^{-N}\prod_{j=1}^{N}\frac{\Gamma(N+j-1)}{\Gamma(1+j)\,(\Gamma(j-1/2))^2} = \frac{2^{2N^2-4N+1}}{\pi^N N!}. \tag{55} \]
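We use the Selberg integral again, this time with $\gamma = 1$, $\alpha = s + 1/2$ and $\beta = 1/2$, obtaining
\begin{align}
\langle Z(U, 0)^s\rangle_{SO(2N)} &= \mathcal{N}_O\,2^{2N-N^2+Ns}\cdot 2^{N^2-N+sN} \times \prod_{j=0}^{N-1}\frac{\Gamma(2+j)\,\Gamma(s+1/2+j)\,\Gamma(1/2+j)}{\Gamma(2)\,\Gamma(s+1+N+j-1)} \notag\\
&= \mathcal{N}_O\,2^{N+2Ns}\prod_{j=1}^{N}\frac{\Gamma(1+j)\,\Gamma(s+j-1/2)\,\Gamma(j-1/2)}{\Gamma(s+N+j-1)} \notag\\
&= 2^{2Ns}\prod_{j=1}^{N}\frac{\Gamma(N+j-1)\,\Gamma(s+j-1/2)}{\Gamma(j-1/2)\,\Gamma(s+j+N-1)} \notag\\
&\equiv M_O(N, s). \tag{56}
\end{align}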
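The product (56) is again convenient to evaluate through log-gamma functions. In the sketch below, which is illustrative rather than part of the original argument, the checks rest on two hand simplifications of (56) that are not stated in the text: $M_O(N, 1) = 2$ and $M_O(N, 2) = 4N + 2$ for every $N$. The code assumes real $s > -1/2$ so that all gamma-function arguments are positive.

```python
import math


def m_o(n, s):
    """M_O(N, s) from (56), for real s > -1/2."""
    log_m = 2 * n * s * math.log(2)
    for j in range(1, n + 1):
        log_m += (math.lgamma(n + j - 1) + math.lgamma(s + j - 0.5)
                  - math.lgamma(j - 0.5) - math.lgamma(s + j + n - 1))
    return math.exp(log_m)


def leading_coefficient_estimate(n, k):
    """M_O(N, k) / N^(k(k-1)/2), which should approach f_O(k) for large N."""
    return m_o(n, k) / n ** (k * (k - 1) / 2)
```

With these simplifications, `leading_coefficient_estimate(N, 2)` equals $4 + 2/N$, which tends to 4 as $N$ grows.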
As for $M_{Sp}(N, s)$, $M_O(N, s)$ is the generating function for the moments of the log of $Z(U, 0)$, this time for the orthogonal ensemble, so if we write
\[ M_O(N, s) = e^{q_1 s + q_2 s^2/2 + q_3 s^3/3! + q_4 s^4/4! + \cdots}, \tag{57} \]
then the parameters $q_j$ are the cumulants of the value distribution of $\log Z(U, 0)$. These cumulants can be obtained by taking derivatives of
\[ \log M_O(N, s) = 2Ns\log 2 + \sum_{j=1}^{N}\left(\log\Gamma(N+j-1) + \log\Gamma(s+j-1/2) - \log\Gamma(j-1/2) - \log\Gamma(s+j+N-1)\right), \tag{58} \]
thus producing
\begin{align}
q_1 &= \left.\frac{d}{ds}\log M_O(N, s)\right|_{s=0} = 2N\log 2 + \sum_{j=1}^{N}\left(\psi(j-1/2) - \psi(j+N-1)\right), \notag\\
q_n &= \left.\frac{d^n}{ds^n}\log M_O(N, s)\right|_{s=0} = \sum_{j=1}^{N}\left(\psi^{(n-1)}(j-1/2) - \psi^{(n-1)}(j+N-1)\right), \tag{59}
\end{align}
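for $n = 2, 3, 4, \ldots$. The asymptotic behaviour of these cumulants for large $N$ may be recovered using the techniques of Sect. 2.1. Starting with $q_1$ and using (16) and (17),
\begin{align}
q_1 &= 2N\log 2 + \sum_{j=1}^{N}\left[\int_0^{\infty}\frac{e^{-t} - e^{-(j-1/2)t}}{1 - e^{-t}}\,dt - \gamma - \int_0^{\infty}\frac{e^{-t} - e^{-(j+N-1)t}}{1 - e^{-t}}\,dt + \gamma\right] \notag\\
&= 2N\log 2 + \sum_{j=1}^{N}\int_0^{\infty}\frac{e^{-(j+N-1)t} - e^{-(j-1/2)t}}{1 - e^{-t}}\,dt. \tag{60}
\end{align}
At this point we interchange the sum and integral, evaluate the sum and integrate by parts, resulting in
\begin{align}
q_1 &= 2N\log 2 - (N-1)\int_0^{\infty}\frac{e^{-Nt}}{1 - e^{-t}}\,dt + (2N-1)\int_0^{\infty}\frac{e^{-2Nt}}{1 - e^{-t}}\,dt \notag\\
&\quad - \frac{1}{2}\int_0^{\infty}\frac{e^{-t/2}}{1 - e^{-t}}\,dt - (N-1/2)\int_0^{\infty}\frac{e^{-(N+1/2)t}}{1 - e^{-t}}\,dt \notag\\
&= 2N\log 2 + (N-1)\psi(N) - (2N-1)\psi(2N) + \frac{1}{2}\psi(1/2) + (N-1/2)\psi(1/2+N) \notag\\
&= -\frac{1}{2}\log N - \frac{\gamma}{2} + O\left(\frac{1}{N}\right). \tag{61}
\end{align}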
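The second cumulant is determined similarly (with the help of (20) and (21)) to be
\begin{align}
q_2 &= \sum_{j=1}^{N}\left[\int_0^{\infty}\frac{te^{-(j-1/2)t}}{1 - e^{-t}}\,dt - \int_0^{\infty}\frac{te^{-(j+N-1)t}}{1 - e^{-t}}\,dt\right] \notag\\
&= \int_0^{\infty}\frac{te^{-t/2}(1 - e^{-Nt})}{(1 - e^{-t})^2}\,dt - \int_0^{\infty}\frac{te^{-Nt}(1 - e^{-Nt})}{(1 - e^{-t})^2}\,dt \notag\\
&= \int_0^{\infty}\frac{e^{-t/2}}{1 - e^{-t}}\,dt + \frac{1}{2}\int_0^{\infty}\frac{te^{-t/2}}{1 - e^{-t}}\,dt - \int_0^{\infty}\frac{e^{-(N+1/2)t}}{1 - e^{-t}}\,dt + (N-1/2)\int_0^{\infty}\frac{te^{-(N+1/2)t}}{1 - e^{-t}}\,dt \notag\\
&\quad - \int_0^{\infty}\frac{e^{-Nt}}{1 - e^{-t}}\,dt + (N-1)\int_0^{\infty}\frac{te^{-Nt}}{1 - e^{-t}}\,dt + \int_0^{\infty}\frac{e^{-2Nt}}{1 - e^{-t}}\,dt - (2N-1)\int_0^{\infty}\frac{te^{-2Nt}}{1 - e^{-t}}\,dt \notag\\
&= -\psi(1/2) + \frac{1}{2}\psi^{(1)}(1/2) + \psi(N+1/2) + (N-1/2)\psi^{(1)}(N+1/2) \notag\\
&\quad + \psi(N) + (N-1)\psi^{(1)}(N) - \psi(2N) - (2N-1)\psi^{(1)}(2N) \notag\\
&= \log N + 1 + \gamma + \log 2 + \frac{3}{2}\zeta(2) + O\left(\frac{1}{N}\right). \tag{62}
\end{align}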
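Finally, all the higher cumulants converge asymptotically to a constant,
\begin{align}
\lim_{N\to\infty}q_n &= \lim_{N\to\infty}\left[(-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-t/2}}{1 - e^{-t}}\,\frac{1 - e^{-Nt}}{1 - e^{-t}}\,dt - (-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-Nt}}{1 - e^{-t}}\,\frac{1 - e^{-Nt}}{1 - e^{-t}}\,dt\right] \notag\\
&= (-1)^n\int_0^{\infty}\frac{t^{n-1}e^{-t/2}}{(1 - e^{-t})^2}\,dt. \tag{63}
\end{align}
Evaluating the integral in (63) by integrating by parts and then rewriting it as a pair of polygamma functions,
\begin{align}
\lim_{N\to\infty}q_n &= -(n-1)\psi^{(n-2)}(-1/2) + \frac{1}{2}\psi^{(n-1)}(-1/2) \notag\\
&= (-1)^n(n-1)!\left[(2^{n-1}-1)\zeta(n-1) + \frac{1}{2}(2^n-1)\zeta(n)\right]. \tag{64}
\end{align}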
It is thus clear from (57) and the asymptotic form of the cumulants that the leading order coefficient of the moments of $Z(U, 0)$ is
\begin{align}
f_O(s) &\equiv \lim_{N\to\infty}\frac{M_O(N, s)}{N^{s^2/2-s/2}} \notag\\
&= \exp\left(-\frac{\gamma}{2}s + \left(1 + \gamma + \log 2 + \frac{3}{2}\zeta(2)\right)\frac{s^2}{2} + \sum_{n=3}^{\infty}\left[(-1)^n(2^{n-1}-1)\zeta(n-1) + (-1)^n\frac{1}{2}(2^n-1)\zeta(n)\right]\frac{s^n}{n}\right). \tag{65}
\end{align}
Examining the product form of $M_O(N, s)$ we see that the coefficient is expected to have poles of order $k$ at $s = -(2k-1)/2$, for $k = 1, 2, 3, \ldots$. Using (29) and (30), we see that, for $|s| < 1/2$,
\begin{align}
&\log G(1+s) - \frac{1}{2}\log G(1+2s) + \frac{1}{2}\log\Gamma(1+2s) - \frac{1}{2}\log\Gamma(1+s) \notag\\
&\quad = -\frac{\gamma}{2}s + \left(1 + \gamma + \frac{3}{2}\zeta(2)\right)\frac{s^2}{2} + \sum_{n=3}^{\infty}\left[(-1)^n(2^{n-1}-1)\zeta(n-1) + (-1)^n\frac{1}{2}(2^n-1)\zeta(n)\right]\frac{s^n}{n}, \tag{66}
\end{align}
and comparing with (65) we thus find that
\[ f_O(s) = 2^{s^2/2} \times \frac{G(1+s)\sqrt{\Gamma(1+2s)}}{\sqrt{G(1+2s)\,\Gamma(1+s)}}, \tag{67} \]
for $|s| < 1/2$, and hence by analytic continuation in the rest of the complex plane. This clearly has the correct combination of poles. This leading order coefficient reduces for integer moments, again using (28), to
\begin{align}
f_O(n) &= 2^{n^2/2}\,\frac{\prod_{j=1}^{n}\Gamma(j)\,\sqrt{\Gamma(1+2n)}}{\sqrt{\prod_{j=1}^{2n}\Gamma(j)\;\Gamma(1+n)}} \notag\\
&= 2^{n^2/2}\,\frac{\sqrt{(2n)!}\;\prod_{j=1}^{n}(j-1)!}{\sqrt{2^{2n-2}\,3^{2n-3}\cdots(2n-2)^2(2n-1)\,n!}} \notag\\
&= 2^{n^2/2}\,\frac{\sqrt{2^n\,n!\,(2n-1)!!}\;\prod_{j=1}^{n}(j-1)!}{2^{n-1}\,4^{n-2}\cdots(2n-2)\;\sqrt{3^{2n-3}\,5^{2n-5}\cdots(2n-1)\,n!}} \notag\\
&= 2^{n^2/2}\,\frac{\sqrt{2^n}\;\prod_{j=1}^{n}(j-1)!}{2^{\sum_{j=1}^{n-1}j}\;1^{n-1}\,2^{n-2}\cdots(n-1)\;\sqrt{3^{2n-4}\,5^{2n-6}\cdots(2n-3)^2}} \notag\\
&= 2^{n^2/2}\,\frac{1^{n-1}\,2^{n-2}\cdots(n-2)^2(n-1)\;2^{n/2}}{2^{n(n-1)/2}\;3^{n-2}\,5^{n-3}\cdots(2n-3)\;1^{n-1}\,2^{n-2}\cdots(n-1)} \notag\\
&= 2^n\left(\prod_{j=1}^{n-1}(2j-1)!!\right)^{-1}. \tag{68}
\end{align}
This result was also obtained independently in [2].
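Once more, we can examine the value distribution of $Z(U, 0)$ and its logarithm. The value distribution of $\log Z(U, 0)/\log N$ is
\begin{align}
\langle\delta(x - \log Z(U, 0)/\log N)\rangle_{SO(2N)} &= \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}M_O(N, iy/\log N)\,dy \notag\\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left(-iyx + \left(-\frac{1}{2}\log N + O(1)\right)\frac{iy}{\log N} - (\log N + O(1))\frac{y^2}{2(\log N)^2} - i(O(1))\frac{y^3}{3!(\log N)^3} + \cdots\right)dy, \tag{69}
\end{align}
yielding the limiting distribution
\[ \lim_{N\to\infty}\langle\delta(x - \log Z(U, 0)/\log N)\rangle_{SO(2N)} = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx-iy/2}\,dy = \delta(x + 1/2). \tag{70} \]
This is a delta distribution as in the symplectic case, but this time centred at $x = -1/2$. Keeping the $y^2$ term in the exponent in the integral above leads to the central limit theorem:
\[ \lim_{N\to\infty}\left\langle\delta\left(x + \frac{q_1}{\sqrt{q_2}} - \frac{\log Z(U, 0)}{\sqrt{q_2}}\right)\right\rangle_{SO(2N)} = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right). \tag{71} \]
The value distribution of $Z(U, 0)^{1/\log N}$ is similarly straightforward to compute. We see that
\[ \left\langle\delta\left(x - Z(U, 0)^{1/\log N}\right)\right\rangle_{SO(2N)} = \frac{1}{2\pi x}\int_{-\infty}^{\infty}x^{-iy}M_O(N, iy/\log N)\,dy, \tag{72} \]
and so
\[ \lim_{N\to\infty}\left\langle\delta\left(x - Z(U, 0)^{1/\log N}\right)\right\rangle_{SO(2N)} = \frac{1}{2\pi x}\int_{-\infty}^{\infty}e^{-iy\log x}e^{-iy/2}\,dy = \frac{1}{x}\,\delta(\log x + 1/2). \tag{73} \]
We also examine the distribution simply of $Z(U, 0)$, $P_O(N, x)$. As
\[ P_O(N, x) = \frac{1}{2\pi x}\int_{-\infty}^{\infty}x^{-iy}M_O(N, iy)\,dy, \tag{74} \]
we can make the approximation
\begin{align}
P_O(N, x) &\approx \frac{1}{2\pi x}\int_{-\infty}^{\infty}\exp\left(-iy\log x + iq_1 y - \frac{q_2 y^2}{2}\right)dy \notag\\
&= \frac{1}{x\sqrt{2\pi q_2}}\exp\left(\frac{-(\log x - q_1)^2}{2q_2}\right), \tag{75}
\end{align}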
[Figure 2: two panels, (a) and (b), each plotting $P(x)$ against $x$ for $0 \le x \le 8$, with curves labelled "exact" and "asymptotic".]
Fig. 2. Distribution of the values of $Z(U, 0)$ for matrices in the group $SO(2N)$, with (a) $N = 6$ and (b) $N = 42$. The solid curve is the exact distribution (74) and the dashed curve is the large $N$ approximation in (75)
valid as $N \to \infty$ when $x$ is fixed (and, like (42), expected to be a good approximation when $\log x \gg -\log N$ and $N$ is large). The result (75) is plotted in Fig. 2 for $N = 6$ and $N = 42$ along with the numerically calculated exact distribution, from (74). Unlike the symplectic case (and unlike the approximation (75)), $P_O(N, x)$ diverges as $x \to 0$. This can be seen by considering the poles of the integrand, which occur at $i/2, 3i/2, 5i/2, \ldots$. Once again it is the lowest pole, the simple one at $i/2$, that dominates the integral as $x \to 0$. In this case we find that
\[ P_O(N, x) \sim x^{-1/2}\,2^{-N}\,\frac{1}{\Gamma(N)}\prod_{j=1}^{N}\frac{\Gamma(N+j-1)\,\Gamma(j)}{\Gamma(j-1/2)\,\Gamma(j+N-3/2)}, \tag{76} \]
in that limit.

3.2. L-functions with orthogonal symmetry. We now turn our attention to families of L-functions with a symmetry governed by an ensemble of orthogonal matrices. L-functions of this type fall into two categories, even and odd, which are related to the ensembles $SO(2N)$ and $SO(2N+1)$ respectively. Of the L-functions comprising an orthogonal family, approximately one half will have even symmetry, and the other half odd symmetry, these latter vanishing at $s = 1/2$. Examples of such families are given in [3]. Referring to (1), in the orthogonal case $V(z) = z$ and $B(k) = \frac{1}{2}k(k-1)$. As in the symplectic case, the first few of the coefficients $g_k$ for integer $k$ have been calculated. The known values are $g_1 = 1$, $g_2 = 2$, $g_3 = 2^3$ and it is conjectured that $g_4 = 2^7$ [3]. With $N$ taking the place of $\log(Q^A)$, we conjecture this time that
\[ \lim_{Q\to\infty}\frac{1}{(\log Q^A)^{\frac{1}{2}k(k-1)}}\,\frac{1}{Q^*}\sum_{\substack{f\in F \\ c(f)\le Q}}L_f\left(\tfrac{1}{2}\right)^k = a(k) \times f_O(k)/2. \tag{77} \]
The right-hand side is divided by two because the random matrix average was taken over SO(2N) only, whereas the sum over central values of the L-functions contains an equal number of functions contributing zero to the average, namely the L-functions with odd symmetry about $s = 1/2$. Once again, we expect the relation (46) to follow from equating the density of zeros of the L-functions and the density of eigenphases of the matrices. Having posed the conjecture (77), we check it against the known values of $g_k$. The first four coefficients $f_O(1) = 2$, $f_O(2) = 4$, $f_O(3) = 8/3$ and $f_O(4) = 16/45$ satisfy conjecture (77); that is, $g_k/\Gamma\left(1 + \frac{1}{2}k(k-1)\right) = f_O(k)/2$ for $k = 1, 2, 3, 4$. As for the symplectic case, we can examine what (77) implies about the value distributions of L-functions and their logarithms. Since the L-functions with odd symmetry are zero at $s = 1/2$, we now restrict ourselves to averages over the orthogonal L-functions with even symmetry. These are expected to satisfy (77) without the factor 1/2 on the right-hand side. The value distribution of $\log L_f(\frac{1}{2})/\log\log Q^A$ for L-functions with even symmetry (defined by averaging as in (77)) is expected to be given, for large $Q$, and with the identification (46), by
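\[
V_O(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixs}\, a(is/\log N)\, M_O(N, is/\log N)\, ds, \tag{78}
\]
and following the argument laid out for the symplectic case, this converges to $\delta(x + 1/2)$ as $N \to \infty$. We can once again state a conjectural central limit theorem, this time for averages over a family $\mathcal{F}$ of L-functions with $c(f) \le Q$ governed by the symmetry SO(2N):
\[
\lim_{Q\to\infty} \left\langle \delta\left(x + \frac{\tilde q_1}{\sqrt{\tilde q_2}} - \frac{\log L_f(\frac{1}{2})}{\sqrt{\tilde q_2}}\right) \right\rangle_{\mathcal{F}}
= \lim_{N\to\infty} \frac{1}{2\pi}\int_{-\infty}^{\infty} a\left(\frac{iy}{\sqrt{q_2}}\right) e^{-iyx - iyq_1/\sqrt{q_2}}\; e^{iyq_1/\sqrt{q_2} - y^2/2 + q_3 (iy)^3/(q_2^{3/2}\,3!) + \cdots}\, dy
= \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \tag{79}
\]
where $\tilde q_1$ and $\tilde q_2$ are related to (61) and (62), respectively, via (46). For the value distribution of $L_f(\frac{1}{2})$ itself, which the conjecture suggests for large $Q$ is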
\[
W_O(x) = \frac{1}{2\pi x}\int_{-\infty}^{\infty} x^{-is}\, a(is)\, M_O(N, is)\, ds, \tag{80}
\]
we expect that near $x = 0$,
\[
W_O(x) \sim x^{-1/2}\, a(-1/2)\, 2^{-N}\, \frac{1}{\Gamma(N)} \prod_{j=1}^{N} \frac{\Gamma(N+j-1)\,\Gamma(j)}{\Gamma(j-1/2)\,\Gamma(j+N-3/2)}; \tag{81}
\]
the contribution to the integral (80) from the simple pole at $s = i/2$. For L-functions with even symmetry from an orthogonal family for which $a(-1/2) \neq 0$, this analysis therefore suggests that the likelihood that the central value vanishes is integrably singular, and that the probability of a value in the range $(0, x)$ vanishes as $x^{1/2}$ when $x \to 0$.

Acknowledgements. It is a pleasure to thank David Farmer and, especially, Brian Conrey for suggesting these calculations and for numerous helpful comments during the course of the research. We are grateful also to Peter Sarnak for stimulating discussions. NCS also owes a great deal of gratitude to NSERC and the CFUW in Canada for their generous funding.
References
1. Barnes, E.W.: The theory of the G-function. Q. J. Math. 31, 264–314 (1900)
2. Brézin, E. and Hikami, S.: Characteristic polynomials of random matrices. Preprint, 1999
3. Conrey, J.B. and Farmer, D.W.: Mean values of L-functions and symmetry. Preprint, 1999
4. Coram, M. and Diaconis, P.: New tests of the correspondence between unitary eigenvalues and the zeros of Riemann's zeta function. Preprint, 1999
5. Iwaniec, H., Luo, W. and Sarnak, P.: Low lying zeros of families of L-functions. Preprint, 1999
6. Iwaniec, H. and Sarnak, P.: The non-vanishing of central values of automorphic L-functions and Siegel's zeros. Preprint, 1997
7. Jutila, M.: On the mean value of L(1/2, χ) for real characters. Analysis 1, 149–161 (1981)
8. Katz, N.M. and Sarnak, P.: Random Matrices, Frobenius Eigenvalues and Monodromy. Providence, RI: AMS, 1999
9. Katz, N.M. and Sarnak, P.: Zeros of zeta functions and symmetry. Bull. Am. Math. Soc. 36, 1–26 (1999)
10. Keating, J.P. and Snaith, N.C.: Random matrix theory and ζ(1/2 + it). Commun. Math. Phys. 214, 57–89 (2000)
11. Mehta, M.L.: Random Matrices. London: Academic Press, second edition, 1991
12. Rubinstein, M.: Evidence for a Spectral Interpretation of Zeros of L-functions. PhD thesis, Princeton University, 1998
13. Rudnick, Z. and Sarnak, P.: Principal L-functions and random matrix theory. Duke Math. J. 81 (2), 269–322 (1996)
14. Rumely, R.: Numerical computations concerning ERH. Math. Comp. 61, 415–440 (1993)
15. Soundararajan, K.: Non-vanishing of quadratic Dirichlet L-functions at s = 1/2. Preprint, 1999
16. Vignéras, M.-F.: L'équation fonctionnelle de la fonction zêta de Selberg du groupe modulaire PSL(2, Z). Astérisque 61, 235–249 (1979)
17. Weyl, H.: Classical Groups. Princeton, NJ: Princeton University Press, 1946
Communicated by P. Sarnak
Commun. Math. Phys. 214, 111 – 135 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Characteristic Polynomials of Random Matrices
Edouard Brézin1, Shinobu Hikami2
1 Laboratoire de Physique Théorique de l'École Normale Supérieure, Unité Mixte de Recherche 8549 du Centre National de la Recherche Scientifique et de l'École Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France. E-mail: [email protected]
2 Department of Basic Sciences, University of Tokyo, Meguro-ku, Komaba 3-8-1, Tokyo 153, Japan. E-mail: [email protected]
Received: 1 October 1999 / Accepted: 18 May 2000
Abstract: Number theorists have studied extensively the connections between the distribution of zeros of the Riemann ζ-function, and of some generalizations, and the statistics of the eigenvalues of large random matrices. It is interesting to compare the average moments of these functions in an interval to their counterpart in random matrices, which are the expectation values of the characteristic polynomials of the matrix. It turns out that these expectation values are quite interesting. For instance, the moments of order 2K scale, for unitary invariant ensembles, as the density of eigenvalues raised to the power $K^2$; the prefactor turns out to be a universal number, i.e. it is independent of the specific probability distribution. An equivalent behaviour and prefactor had been found, as a conjecture, within number theory. The moments of the characteristic determinants of random matrices are computed here as limits, at coinciding points, of multi-point correlators of determinants. These correlators are in fact universal in Dyson's scaling limit in which the difference between the points goes to zero, the size of the matrix goes to infinity, and their product remains finite.

1. Introduction

The correlation functions of the eigenvalues of large N × N matrices are known to exhibit a number of universal features in the large-N limit. For instance in the Dyson limit [1, 2], when the distances between these eigenvalues, measured in units of the local spacing, become of order 1/N, the correlation functions, as well as the level spacing distribution, become universal, i.e. independent of the specific probability measure. For finite differences, upon a smoothing of the distribution, the two-point correlation function is again universal [3, 4]. The short distance universality was also shown to extend to external source problems [5–8], in which an external matrix is coupled to the random matrix.
In this article, we study the average of the characteristic polynomials, whose zeros are the eigenvalues of the random matrix. The probability distribution of the characteristic polynomial $\det(\lambda - X)$ of a random matrix $X$, a polynomial of degree $N$ in $\lambda$, may be characterized by its moments $\langle \det^K(\lambda - X)\rangle$, or better by its correlation functions $\left\langle \prod_{l=1}^{K} \det(\lambda_l - X) \right\rangle$.
This study is motivated by various conjectures which appeared recently in number theory for the zeros of the Riemann ζ-function and its generalizations known as L-functions [12]. Indeed the characteristic polynomials, as well as the zeta-functions, have their zeros on a straight line, and these zeros obey the same statistical distribution. For the 2K-th moment of the Riemann ζ-function (K is a positive integer), it has been conjectured [9, 10] that
\[
\frac{1}{T}\int_0^T dt\, \left|\zeta\left(\tfrac{1}{2} + it\right)\right|^{2K} \simeq \gamma_K\, a_K\, (\log T)^{K^2}, \tag{1}
\]
where $a_K$ is a number related to the Dirichlet coefficient (the divisor function) $d_K(n)$, and
\[
\gamma_K = \prod_{l=0}^{K-1} \frac{l!}{(l+K)!}. \tag{2}
\]
The explicit formula for $a_K$ is given in the Appendix, together with summation formulae for the Dirichlet coefficients, which are related to (1). In this work we shall compute the equivalent of (1) for random matrices, show that the density of states $\rho(\lambda)$ replaces $\log T$, and that the same number $\gamma_K$ is universally present. For the negative moments, similar conjectures have been proposed, with a cut-off parameter $\delta$ for avoiding divergences [11], and we show here how to obtain these negative moments for random matrices. Several types of L-functions have been introduced [12], which correspond to the three standard classes of random matrices. The conjecture for the average of the moments (1) has been extended to these L-functions [13]. The average is taken as a sum over the discriminant $d$, for instance, for the Dirichlet $L(\frac{1}{2}, \chi_d)$ function. The relations between the distributions of the eigenvalues in random matrix theory and the statistical distribution of the zeros of the various L-functions have also been studied [12, 14]. Our aim in this article is to clarify the universality of the moments of the characteristic polynomials for these three classes. The circular unitary ensemble has been studied earlier by Keating and Snaith [10], who did obtain the $\gamma_K$ in (2) from their calculation. However this ensemble has a constant density of states, and furthermore it does not allow one to study the universality of these properties. In this work we have considered a Gaussian ensemble and non-Gaussian extensions, instead of the circular ensemble, to verify both the explicit dependence on the density of states and the universality of the coefficient $\gamma_K$. In the process of the derivation, we have found it necessary to start with the K-point functions $\left\langle \prod_{l=1}^{K} \det(\lambda_l - X) \right\rangle$, which are shown to be themselves universal in the large-N Dyson limit, in which $N(\lambda_i - \lambda_j)$ is held fixed. The moments are then simply the limit of these functions when all the Dyson variables vanish.
2. Correlation Functions of Characteristic Polynomials

We consider random M × M Hermitian matrices $X$ with a normalized probability distribution
\[
P(X) = \frac{1}{Z} \exp\left(-N\, \mathrm{Tr}\, V(X)\right), \tag{3}
\]
in which $V$ is a given polynomial. It will turn out to be convenient to distinguish here $M$ and $N$, but we later restrict ourselves to a large $N$ and large $M$ limit, with $\lim M/N = 1$. Let us consider the correlation function of $K$ distinct characteristic polynomials:
\[
F_K(\lambda_1, \cdots, \lambda_K) = \left\langle \prod_{\alpha=1}^{K} \det(\lambda_\alpha - X) \right\rangle, \tag{4}
\]
in which the bracket denotes an expectation value with the weight (3). Integrating as usual over the unitary group, we obtain
\[
F_K(\lambda_1, \cdots, \lambda_K) = \frac{1}{Z_M} \int \prod_{i=1}^{M} d\mu(x_i)\; \Delta^2(x_1, \cdots, x_M) \prod_{\alpha=1}^{K}\prod_{i=1}^{M} (\lambda_\alpha - x_i), \tag{5}
\]
in which $d\mu(x)$ denotes the measure $d\mu(x) = dx\, \exp(-N V(x))$, $\Delta(x_1, \cdots, x_M) = \prod_{1\le i<j\le M} (x_i - x_j)$ the Vandermonde determinant, and $Z_M$ the normalization constant
\[
Z_M = \int \prod_{i=1}^{M} d\mu(x_i)\; \Delta^2(x_1, \cdots, x_M). \tag{6}
\]
We now use the obvious identity
\[
\Delta(x_1, \cdots, x_M) \prod_{\alpha=1}^{K}\prod_{i=1}^{M} (\lambda_\alpha - x_i) = \frac{\Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K)}{\Delta(\lambda_1, \cdots, \lambda_K)}, \tag{7}
\]
and represent the Vandermonde determinants $\Delta(x_1, \cdots, x_M)$ and $\Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K)$ as determinants of arbitrary polynomials whose coefficients of highest degree are equal to unity (the so-called monic polynomials)
\[
p_n(x) = x^n + \text{lower degree}. \tag{8}
\]
Then
\[
\Delta(x_1, \cdots, x_M) = \det p_n(x_m) \tag{9}
\]
($n$ runs from zero to $M - 1$ and $m$ from one to $M$), and
\[
\Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K) = \det p_a(u_b) \tag{10}
\]
in which $a$ runs from zero to $M + K - 1$, $b$ from one to $M + K$, and $u_b$ stands for $x_b$ if $b \le M$, or $\lambda_{b-M}$ for $M < b \le M + K$.
Choosing now the polynomials orthogonal with respect to the measure $d\mu$:
\[
\int p_n(x)\, p_m(x)\, d\mu(x) = h_n \delta_{nm}, \tag{11}
\]
we may easily integrate over the $M$ eigenvalues
\[
\int \prod_{i=1}^{M} d\mu(x_i)\; \Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K)\, \Delta(x_1, \cdots, x_M) = M! \prod_{n=0}^{M-1} h_n\; \det p_\alpha(\lambda_\beta), \tag{12}
\]
in which $\alpha$ runs from $M$ to $M + K - 1$ and $\beta$ from 1 to $K$. Similarly the normalization factor $Z_M$ is given by
\[
Z_M = \int \prod_{i=1}^{M} d\mu(x_i)\; \Delta^2(x_1, \cdots, x_M) = M! \prod_{n=0}^{M-1} h_n. \tag{13}
\]
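We thus end up with
\[
F_K(\lambda_1, \cdots, \lambda_K) = \frac{1}{\Delta(\lambda_1, \cdots, \lambda_K)} \det
\begin{pmatrix}
p_M(\lambda_1) & p_{M+1}(\lambda_1) & \cdots & p_{M+K-1}(\lambda_1) \\
p_M(\lambda_2) & p_{M+1}(\lambda_2) & \cdots & p_{M+K-1}(\lambda_2) \\
\vdots & & & \vdots \\
p_M(\lambda_K) & p_{M+1}(\lambda_K) & \cdots & p_{M+K-1}(\lambda_K)
\end{pmatrix}. \tag{14}
\]
If we are concerned simply with the moments of the distribution of a single characteristic polynomial, we obtain from (14),
\[
\mu_K(\lambda) = F_K(\lambda, \cdots, \lambda) = \left\langle [\det(\lambda - X)]^K \right\rangle
= \frac{(-1)^{K(K-1)/2}}{\prod_{l=0}^{K-1} l!} \det
\begin{pmatrix}
p_M(\lambda) & p_{M+1}(\lambda) & \cdots & p_{M+K-1}(\lambda) \\
p'_M(\lambda) & p'_{M+1}(\lambda) & \cdots & p'_{M+K-1}(\lambda) \\
\vdots & & & \vdots \\
p^{(K-1)}_M(\lambda) & p^{(K-1)}_{M+1}(\lambda) & \cdots & p^{(K-1)}_{M+K-1}(\lambda)
\end{pmatrix}. \tag{15}
\]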
These expressions are all exact, but in the next section we shall be concerned with the large N limit. Then (i) the interesting case is that of even $K$, since for odd $K$ the result is oscillatory (for instance, for $K = 1$, $\mu_1(\lambda) = p_M(\lambda)$); (ii) it will turn out that, even if we are interested simply in the moments $\mu_K(\lambda)$, it is more convenient to study first the large N limit of the $F_K$ with distinct $\lambda_i$ and afterwards let them approach a single $\lambda$. The results that will be derived later for those $F_K$'s and $\mu_K$'s will be shown to be universal in the Dyson limit, in which $N$ goes to infinity, the differences $\lambda_i - \lambda_j$ go to zero for any pair $i, j$, and the products $N(\lambda_i - \lambda_j)$ remain finite. We first derive explicit formulae for the Gaussian case, and show later that they do apply to any random matrix distribution $P(X)$ of the form (3).
3. The Gaussian Case

We now specialize the result (14) of the previous section to the Gaussian distribution of M × M Hermitian matrices
\[
P(X) = \frac{1}{Z_M} \exp\left(-\frac{N}{2}\, \mathrm{Tr}\, X^2\right), \tag{16}
\]
with
\[
M = N - K \tag{17}
\]
(the reason for this choice of $M$ will be clarified in the next section). Then the polynomials that we have introduced are Hermite polynomials, and with our normalizations,
\[
H_n(x) = \frac{(-1)^n}{N^n}\, e^{N x^2/2} \left(\frac{d}{dx}\right)^{n} e^{-N x^2/2} = x^n + \text{l.d.}, \tag{18}
\]
and
\[
h_n = \frac{n!}{N^n} \sqrt{\frac{2\pi}{N}}. \tag{19}
\]
The integral representation
\[
H_n(x) = \frac{(-1)^n\, n!}{N^n} \oint \frac{dz}{2i\pi}\, \frac{e^{-N(z^2/2 + xz)}}{z^{n+1}} \tag{20}
\]
over a contour which circles around the origin in the $z$-plane turns out to be well adapted. Repeated use of this formula in the result (14) yields
\[
F_{2K}(\lambda_1, \cdots, \lambda_{2K}) = \frac{(-1)^K \prod_{l=0}^{2K-1} (M+l)!}{\Delta(\lambda_1, \cdots, \lambda_{2K})\; N^{K(2M+2K-1)}}
\oint \prod_{l=1}^{2K} \frac{dz_l}{2i\pi\, z_l^{M+l}}\; \exp\left(-N \sum_{l=1}^{2K} \frac{z_l^2}{2}\right) \times \det\left(e^{-N\lambda_a z_b}\right). \tag{21}
\]
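The normalizations (18) and (19) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `monic_hermite`, `inner_product` and `h_exact` are ours and do not appear in the text. It uses the fact that $H_n(x) = N^{-n/2}\,\mathrm{He}_n(\sqrt{N}x)$, where $\mathrm{He}_n$ is the probabilists' Hermite polynomial, and evaluates the orthogonality integral (11) by Gauss-Hermite quadrature.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He


def monic_hermite(n, x, N):
    """H_n(x) of (18): N^(-n/2) * He_n(sqrt(N) x), monic in x."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select He_n in the HermiteE basis
    return N ** (-n / 2) * He.hermeval(np.sqrt(N) * x, coeffs)


def inner_product(n, m, N, deg=60):
    """Integral of H_n(x) H_m(x) exp(-N x^2 / 2) dx via Gauss-Hermite quadrature."""
    y, w = He.hermegauss(deg)   # nodes and weights for the weight exp(-y^2 / 2)
    x = y / np.sqrt(N)          # substitution y = sqrt(N) x, dx = dy / sqrt(N)
    values = monic_hermite(n, x, N) * monic_hermite(m, x, N)
    return np.sum(w * values) / np.sqrt(N)


def h_exact(n, N):
    """Norm (19): h_n = n! / N^n * sqrt(2 pi / N)."""
    return math.factorial(n) / N ** n * math.sqrt(2 * math.pi / N)
```

For instance, `inner_product(3, 3, 5.0)` should agree with `h_exact(3, 5.0)`, while `inner_product(2, 4, 5.0)` should vanish up to rounding.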
We can expand the determinant in the r.h.s. and keep only one of the $(2K)!$ terms, antisymmetrizing instead the integration variables $z_l$. This gives
\[
F_{2K}(\lambda_1, \cdots, \lambda_{2K}) = \frac{(-1)^K \prod_{l=0}^{2K-1} (M+l)!}{\Delta(\lambda_1, \cdots, \lambda_{2K})\; N^{K(2M+2K-1)}}
\oint \prod_{l=1}^{2K} \frac{dz_l}{2i\pi\, z_l^{M+2K}}\; \exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l\right)\right] \Delta(z_1, \cdots, z_{2K}). \tag{22}
\]
This expression for the expectation value of a product of $2K$ characteristic polynomials, as an integral over $2K$ complex variables, is exact for finite $N$ and $M$. We are now in a position to study the large N limit through a saddle point integration over each $z_l$. Since we have chosen $M + K = N$, each $z$ has a weight $\frac{1}{z^K} \exp\left[-N(z^2/2 + \lambda z + \log z)\right]$, which presents two saddle points $z_\pm$, solutions of the equation $z^2 + \lambda z + 1 = 0$, i.e. with the parametrization
\[
\lambda = 2 \sin\phi, \tag{23}
\]
when $\lambda$ lies on the support of the asymptotic Wigner semi-circle of the density of levels,
\[
z_+ = i e^{i\phi}, \qquad z_- = -i e^{-i\phi}. \tag{24}
\]
Therefore there are, a priori, $2^{2K}$ saddle-points at which the moduli of the weight $\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right]$ are the same. However, it is only when $\sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)$ is real (in the Dyson limit in which the differences between the $\lambda$'s are small) that the oscillations, which damp the result, are not present. Therefore we keep only the $\binom{2K}{K}$ saddle-points in which we take $K$ solutions of type $z_+$ and $K$ of type $z_-$. We are now interested in Dyson's short-distance limit. Defining
\[
\lambda = \frac{1}{2K} \sum_{l=1}^{2K} \lambda_l, \tag{25}
\]
and the density of eigenvalues at this point
\[
\rho(\lambda) = \frac{1}{2\pi} \sqrt{4 - \lambda^2} = \frac{1}{\pi} \cos\phi, \tag{26}
\]
we introduce the scaling variables
\[
x_a = 2\pi N \rho(\lambda)(\lambda_a - \lambda), \quad \text{with} \quad \sum_{a=1}^{2K} x_a = 0, \tag{27}
\]
which are kept finite in this limit. Then the fluctuations around a saddle-point may be taken all at the point $\lambda$, and they yield a factor
\[
\left(\frac{2\pi}{N}\right)^{K} \left[(1 - z_+^2)(1 - z_-^2)\right]^{-K/2} = (N\rho(\lambda))^{-K}. \tag{28}
\]
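As a quick sanity check of (23), (24), (26) and (28), the sketch below (plain Python, with hypothetical helper names of our own) verifies that $z_\pm$ solve the saddle-point equation, that $z_+ z_- = 1$, and that $(1 - z_+^2)(1 - z_-^2) = (2\pi\rho(\lambda))^2$, which is the identity that turns the Gaussian fluctuation factor into $(N\rho(\lambda))^{-K}$.

```python
import cmath
import math


def saddle_points(phi):
    """Saddle points (24) for lambda = 2 sin(phi)."""
    return 1j * cmath.exp(1j * phi), -1j * cmath.exp(-1j * phi)


def saddle_residuals(phi):
    """Return |z^2 + lambda z + 1| at z_+ and z_- (both should vanish)."""
    lam = 2.0 * math.sin(phi)
    z_plus, z_minus = saddle_points(phi)
    return abs(z_plus**2 + lam * z_plus + 1), abs(z_minus**2 + lam * z_minus + 1)


def fluctuation_identity(phi):
    """Return ((1 - z_+^2)(1 - z_-^2), (2 pi rho)^2) with rho = cos(phi) / pi."""
    z_plus, z_minus = saddle_points(phi)
    rho = math.cos(phi) / math.pi
    return (1 - z_plus**2) * (1 - z_minus**2), (2 * math.pi * rho) ** 2
```

Calling `fluctuation_identity(0.4)` should return two numerically equal values, both equal to $4\cos^2\phi$.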
We must now take into account the various factors in (22) at these saddle-points. In the Dyson limit the factor $\prod_{l=1}^{2K} z_l^K$ which remained in the denominator may be replaced by one, since at a given $\lambda$ one has $z_+ z_- = 1$. The only delicate factor is thus $\frac{\Delta(z_1, \cdots, z_{2K})}{\Delta(\lambda_1, \cdots, \lambda_{2K})} \exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right]$, which we must first compute at one of the $\binom{2K}{K}$ saddle-points, and then take the sum over the saddle-points. We consider first the saddle-point
\[
z_l(\lambda_l) = z_+(\lambda_l), \quad l = 1, \cdots, K; \qquad z_l(\lambda_l) = z_-(\lambda_l), \quad l = K+1, \cdots, 2K. \tag{29}
\]
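If we expand in the Dyson limit the weight $\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right]$, one finds
\[
\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right]
= \exp\left[N K \left(1 + \frac{\lambda^2}{2}\right)\right] \times \exp\left\{-N \left[\sum_{l=1}^{K} (\lambda_l - \lambda)\, z_+(\lambda) + \sum_{l=K+1}^{2K} (\lambda_l - \lambda)\, z_-(\lambda)\right]\right\} \tag{30}
\]
(we have used $\frac{d}{d\lambda}\left[\frac{1}{2} z_\pm^2(\lambda) + \lambda z_\pm + \log z_\pm\right] = z_\pm$). Therefore at that saddle-point, in terms of the scaling variables (27),
\[
\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right]
= \exp\left[N K \left(1 + \frac{\lambda^2}{2}\right)\right] \exp\left(-i \sum_{l=1}^{K} x_l\right). \tag{31}
\]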
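Let us consider now the ratio of Vandermonde determinants at that same saddle-point:
\[
\frac{\Delta(z_1, \cdots, z_{2K})}{\Delta(\lambda_1, \cdots, \lambda_{2K})}
= \prod_{1\le l<m\le K} \frac{z_+(\lambda_l) - z_+(\lambda_m)}{\lambda_l - \lambda_m}
\prod_{K+1\le l<m\le 2K} \frac{z_-(\lambda_l) - z_-(\lambda_m)}{\lambda_l - \lambda_m}
\times \prod_{1\le l\le K,\; K+1\le m\le 2K} \frac{z_+(\lambda_l) - z_-(\lambda_m)}{\lambda_l - \lambda_m}. \tag{32}
\]
In the scaling limit, this factor becomes
\[
\frac{\Delta(z_1, \cdots, z_{2K})}{\Delta(\lambda_1, \cdots, \lambda_{2K})}
= \left(\frac{dz_+}{d\lambda} \frac{dz_-}{d\lambda}\right)^{K(K-1)/2} (2i\cos\phi)^{K^2} \prod_{1\le l\le K,\; K+1\le m\le 2K} \frac{1}{\lambda_l - \lambda_m}
= (N i)^{K^2}\, (2\pi\rho(\lambda))^{K+K^2} \prod_{1\le l\le K,\; K+1\le m\le 2K} \frac{1}{x_l - x_m}. \tag{33}
\]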
Leaving aside for the moment the overall factors which do not change at the various saddle-points, we note the result from this particular one, which is $\exp\left(-i \sum_{l=1}^{K} x_l\right) \prod_{1\le l\le K,\; K+1\le m\le 2K} \frac{1}{x_l - x_m}$, and consider summing over the $\binom{2K}{K}$ saddle-points. The sum is best done in the form of an integral over $K$ variables. Indeed, if we consider
\[
I(x_1, \cdots, x_{2K}) = \frac{(-1)^{K(K-1)/2}}{K!} \oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2i\pi}\; \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K}\prod_{l=1}^{2K} (u_\alpha - x_l)}\; \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \tag{34}
\]
over a contour in which each $u_\alpha$ circles around the $x$'s, we recover exactly the contribution of the previous saddle-point by choosing $u_1 = x_1, \cdots, u_K = x_K$, or any permutation of those $K$ $x$'s. In view of the Vandermonde in the numerator, all the $u$'s have to be different, and
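thus there are indeed $\binom{2K}{K}$ poles to be added, which reconstruct exactly the sum over the $\binom{2K}{K}$ saddle-points that we needed to perform. Collecting the various factors that came on the way, we end up with the final formula
\[
\exp\left(-\frac{N}{2} \sum_{l=1}^{2K} V(\lambda_l)\right) F_{2K}(\lambda_1, \cdots, \lambda_{2K})
= (2\pi N \rho(\lambda))^{K^2}\, \frac{\exp(-N K)}{K!}
\times \oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\; \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K}\prod_{l=1}^{2K} (u_\alpha - x_l)}\; \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right). \tag{35}
\]
If we specialize to $K = 1$ one finds
\[
\exp\left(-\frac{N}{2}\left(V(\lambda_1) + V(\lambda_2)\right)\right) F_2(\lambda_1, \lambda_2) = e^{-N}\, (2\pi N \rho(\lambda))\, \frac{\sin x}{x} \tag{36}
\]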
with $x = \pi N \rho(\lambda)(\lambda_1 - \lambda_2)$, in which we recover the well-known Dyson kernel, which characterizes the correlation between eigenvalues, whose universality has been very much discussed over recent years. Note the dependence in $(N\rho(\lambda))^{K^2}$ of this function. This $K = 1$ result (36) is indeed equal to $(2\pi e^{-N})\, K(\lambda_1, \lambda_2)$, where the kernel $K(\lambda_1, \lambda_2)$ is
\[
K(\lambda_1, \lambda_2) = \frac{\sin[\pi N \rho(\lambda)(\lambda_1 - \lambda_2)]}{\pi(\lambda_1 - \lambda_2)}. \tag{37}
\]
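The $K = 1$ reduction of the contour integral in (35) to $\sin x / x$ can be illustrated numerically. The sketch below assumes NumPy; the function name `dyson_contour_k1`, the circular contour and its radius are our own choices. For $K = 1$ the constraint in (27) gives $x_1 = -x_2 = x$, and the integral $\oint \frac{du}{2\pi} \frac{e^{-iu}}{(u - x)(u + x)}$ is evaluated with the trapezoidal rule on a counterclockwise circle enclosing both poles.

```python
import numpy as np


def dyson_contour_k1(x, radius=None, n_points=4096):
    """Contour integral of (35) for K = 1 with poles at u = +x and u = -x.

    The contour is a counterclockwise circle |u| = radius around the origin;
    the radius is an arbitrary choice, it only has to enclose both poles.
    """
    if radius is None:
        radius = abs(x) + 2.0
    theta = 2.0 * np.pi * np.arange(n_points) / n_points
    u = radius * np.exp(1j * theta)
    du_dtheta = 1j * u                       # derivative of u(theta)
    integrand = np.exp(-1j * u) / ((u - x) * (u + x))
    d_theta = 2.0 * np.pi / n_points
    return np.sum(integrand * du_dtheta) * d_theta / (2.0 * np.pi)
```

Under these assumptions `dyson_contour_k1(1.3)` should reproduce `np.sin(1.3) / 1.3` to high accuracy, with a negligible imaginary part.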
(In the next section we return to the normalizations. It will be explained how the extra factor $2\pi e^{-N}$ is cancelled by the normalization constant $h_{N-1}$.) We can now specialize this formula to the moments of the distribution of the characteristic polynomial, by letting all the $\lambda$'s approach each other, i.e. letting the $x$'s vanish. Before we do that, we should point out that the procedure to obtain these moments is in fact subtle. In principle we could have set all the $\lambda$'s equal at an early stage of the calculation. If we returned for instance to (21) we might have replaced the limit of $\frac{\det(e^{-N\lambda_a z_b})}{\Delta(\lambda_1, \cdots, \lambda_{2K})}$ by $\Delta(z_1, \cdots, z_{2K})$ (up to a factor), but then the saddle-point method to obtain the large N limit becomes quite problematic. Indeed the Vandermonde of the $z$'s at the saddle-point vanishes and it is necessary to go far beyond the Gaussian integration. However it is now straightforward to obtain this moment from (35). We obtain
\[
\exp\left(-N K V(\lambda)\right) F_{2K}(\lambda, \cdots, \lambda)
= (2\pi N \rho(\lambda))^{K^2}\, \frac{\exp(-N K)}{K!} \oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\; \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} u_\alpha^{2K}}. \tag{38}
\]
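Expanding the Vandermonde determinant into a sum over permutations, we find
\[
\oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\; \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} u_\alpha^{2K}}
= (-1)^{K(K-1)/2} \sum_{P,Q} (-1)^{(P+Q)}\, \frac{1}{(2K - P_0 - Q_0 - 1)!} \cdots \frac{1}{(2K - P_{K-1} - Q_{K-1} - 1)!}, \tag{39}
\]
in which $P$ and $Q$ are permutations of the integers $(0, \cdots, K-1)$. Therefore
\[
\oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\; \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} u_\alpha^{2K}}
= (-1)^{K(K-1)/2}\, K! \det_{0\le i,j\le K-1} \frac{1}{(2K - i - j - 1)!}
= K! \prod_{l=0}^{K-1} \frac{l!}{(K+l)!}, \tag{40}
\]
and thus finally
\[
\exp\left(-N K V(\lambda)\right) F_{2K}(\lambda, \cdots, \lambda) = (2\pi N \rho(\lambda))^{K^2}\, e^{-N K} \prod_{l=0}^{K-1} \frac{l!}{(K+l)!}. \tag{41}
\]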
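The determinant evaluation in (40) can be checked exactly for small $K$. The following sketch uses only the Python standard library; the helper names `det_fraction`, `signed_determinant` and `gamma_K` are ours. It computes $(-1)^{K(K-1)/2} \det[1/(2K - i - j - 1)!]$ with rational arithmetic and compares it with the product $\prod_{l=0}^{K-1} l!/(K+l)!$ of (2).

```python
from fractions import Fraction
from math import factorial


def det_fraction(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def signed_determinant(K):
    """(-1)^(K(K-1)/2) * det[ 1/(2K-i-j-1)! ] for 0 <= i, j <= K-1."""
    m = [[Fraction(1, factorial(2 * K - i - j - 1)) for j in range(K)]
         for i in range(K)]
    return (-1) ** (K * (K - 1) // 2) * det_fraction(m)


def gamma_K(K):
    """The universal number (2): product of l!/(K+l)! for l = 0..K-1."""
    result = Fraction(1)
    for l in range(K):
        result *= Fraction(factorial(l), factorial(K + l))
    return result
```

For $K = 2$ both sides equal $1/12$, and for $K = 3$ both equal $1/8640$.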
4. Normalizations and Universality

We have studied in the previous section a Gaussian ensemble of random matrices and found that the result (41) for the moment involves $(2\pi N \rho(\lambda))^{K^2}$ times a number. One would like to see how general this result is, as far as the dependence on the density of states is concerned as well as for the normalization. We shall see that this behaviour is quite general and that, given a proper normalization, the prefactor is also universal. Indeed let us recall how the K-point correlation functions of the eigenvalues are defined in an ensemble of Hermitian N × N matrices $X$ with a probability weight proportional to $\exp(-N\, \mathrm{Tr}\, V(X))$. In [1] one finds
\[
R_K(\lambda_1, \cdots, \lambda_K) = \frac{N!}{(N-K)!}\, \frac{1}{Z_N} \int d\lambda_{K+1} \cdots d\lambda_N\; \exp\left(-N \sum_{l=1}^{N} V(\lambda_l)\right) \Delta^2(\lambda_1, \cdots, \lambda_N). \tag{42}
\]
Comparing with our initial definitions (5) we see that one has the relation
\[
R_K(\lambda_1, \cdots, \lambda_K) = \frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N}\, \exp\left(-N \sum_{l=1}^{K} V(\lambda_l)\right) \Delta^2(\lambda_1, \cdots, \lambda_K) \times F_{2K}(\lambda_1, \lambda_1, \cdots, \lambda_K, \lambda_K); \tag{43}
\]
the r.h.s. reduces, up to a normalization, to our previous product of characteristic functions of $(N-K) \times (N-K)$ matrices, each one being repeated twice. On the other hand it is well known [1] that this K-point function may be expressed in terms of a kernel $K_N$ as
\[
R_K(\lambda_1, \cdots, \lambda_K) = \det_{1\le i,j\le K} K_N(\lambda_i, \lambda_j), \tag{44}
\]
and without entering into the precise definition of $K_N$ in terms of orthogonal polynomials, one should simply recall that $\frac{K_N(\lambda, \mu)}{\rho((\lambda + \mu)/2)}$ is universal in the Dyson limit [4] ($\lambda - \mu$ goes to zero, $N$ goes to infinity, $N(\lambda - \mu)$ finite), i.e. it is independent of the polynomial $V$ which defines the probability measure.
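Therefore we define a modified weight, and modified moments,
\[
\mathcal{F}_{2K}(\lambda_1, \lambda_2, \cdots, \lambda_{2K}) = \frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N}\, \exp\left(-\frac{N}{2} \sum_{l=1}^{2K} V(\lambda_l)\right) F_{2K}(\lambda_1, \lambda_2, \cdots, \lambda_{2K}) \tag{45}
\]
and
\[
M_{2K}(\lambda) = \frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N}\, \left\{\exp\left(-N K V(\lambda)\right)\right\} F_{2K}(\lambda, \lambda, \cdots, \lambda). \tag{46}
\]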
The universality of level correlations implies the universality of $M_{2K}$. Therefore we have to return to the Gaussian case, in order to take into account this new normalization, and then the result will be universal. From (13) we have
\[
\frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N} = \frac{1}{\prod_{n=N-K}^{N-1} h_n}, \tag{47}
\]
and, given the explicit expression (19) of $h_n$ for the Gaussian case, we find, in the large N limit,
\[
\frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N} = (2\pi)^{-K} e^{N K}. \tag{48}
\]
With this normalization the universal moment $M_{2K}(\lambda)$ is given by
\[
M_{2K}(\lambda) = (2\pi)^{-K}\, (2\pi N \rho(\lambda))^{K^2} \prod_{l=0}^{K-1} \frac{l!}{(K+l)!}. \tag{49}
\]
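The large N statement (48) follows from (47), (19) and Stirling's formula, and it is easy to see the convergence numerically. The sketch below uses only the standard library; the function names are ours, and logarithms are compared to avoid overflow.

```python
import math


def log_norm_ratio(N, K):
    """log of N!/(N-K)! * Z_{N-K}/Z_N = -sum_{n=N-K}^{N-1} log h_n, from (47) and (19)."""
    total = 0.0
    for n in range(N - K, N):
        # log h_n with h_n = n! / N^n * sqrt(2 pi / N)
        log_h = math.lgamma(n + 1) - n * math.log(N) + 0.5 * math.log(2 * math.pi / N)
        total -= log_h
    return total


def log_asymptotic(N, K):
    """log of the large N form (48): (2 pi)^(-K) * exp(N K)."""
    return -K * math.log(2 * math.pi) + N * K
```

The difference `log_norm_ratio(N, K) - log_asymptotic(N, K)` is of order $1/N$ at fixed $K$, so it should shrink as $N$ is increased.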
In fact this connection between the usual correlation functions and the expectation values of a product of characteristic functions, (43) and (44), allows one to recover directly the moment $M_{2K}(\lambda)$, by using the universal expression for the kernel $K(\lambda_i, \lambda_j)$ in the Dyson limit,
\[
K(\lambda_i, \lambda_j) = \frac{\sin[\pi N \rho\, (\lambda_i - \lambda_j)]}{\pi(\lambda_i - \lambda_j)}. \tag{50}
\]
The integral representation, over $2K$ variables describing contours around the $K$ poles $\lambda_l$,
\[
\frac{\det_{1\le i,j\le K} K(\lambda_i, \lambda_j)}{\Delta^2(\lambda_1, \cdots, \lambda_K)}
= \frac{1}{K!} \oint \prod_{l=1}^{K} \frac{du_l}{2\pi i} \prod_{l=1}^{K} \frac{dv_l}{2\pi i}\; \frac{\Delta(u_1, \cdots, u_K)\, \Delta(v_1, \cdots, v_K)}{\prod_{i=1}^{K}\prod_{j=1}^{K} (u_i - \lambda_j)(v_i - \lambda_j)} \prod_{i=1}^{K} K(u_i, v_i) \tag{51}
\]
allows one to write easily the limit in which all the $\lambda$'s are equal:
\[
\lim \frac{\det_{1\le i,j\le K} K(\lambda_i, \lambda_j)}{\Delta^2(\lambda_1, \cdots, \lambda_K)}
= \frac{1}{K!} \oint \prod_{l=1}^{K} \frac{du_l}{2\pi i} \prod_{l=1}^{K} \frac{dv_l}{2\pi i}\; \frac{\Delta(u_1, \cdots, u_K)\, \Delta(v_1, \cdots, v_K)}{\prod_{i=1}^{K} (u_i - \lambda)^K (v_i - \lambda)^K} \times \prod_{i=1}^{K} K(u_i, v_i). \tag{52}
\]
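Since the kernel is a Toeplitz matrix, i.e. $K(\lambda_i, \lambda_j) = K(\lambda_i - \lambda_j)$, one can shift the $u$'s and the $v$'s by $\lambda$ and the r.h.s. becomes independent of $\lambda$. In the case of the sine kernel we obtain, in the limit in which all the $\lambda$'s are equal,
\[
\frac{1}{K!} \oint \prod_{l=1}^{K} \frac{dx_l}{2\pi i} \prod_{l=1}^{K} \frac{dv_l}{2\pi i}\; \frac{\Delta(v_1 + x_1, \cdots, v_K + x_K)\, \Delta(v_1, \cdots, v_K)}{\prod_{i=1}^{K} [(v_i + x_i)\, v_i]^K} \prod_{i=1}^{K} \frac{\sin(\pi N \rho\, x_i)}{\pi x_i}
= \frac{(2\pi \rho N)^{K^2}}{(2\pi)^K} \prod_{l=0}^{K-1} \frac{l!}{(l+K)!}. \tag{53}
\]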
We have indeed recovered, for any function $V$ defining the probability distribution, the universal moment (49).

5. Large N Asymptotics

Rather than starting, as in the previous sections, with an exact expression for the correlation functions of characteristic functions, and at the end letting $N$ go to infinity, we may use a different method to investigate directly the large N limit for the moments of their distribution. This method applies to a general probability distribution of the form (3), and it may also be applied to the more general case of an external matrix source coupled to the matrix $X$ [7] in this distribution. It turns out that here again it is necessary to consider first $F_{2K}$ for different $\lambda_j$'s, and let all the $\lambda_j$'s approach the same $\lambda$ at the end of the calculation. From (5), we have
\[
\frac{\partial}{\partial \lambda_i} \ln F_{2K} = M\, G_\lambda(\lambda_i), \tag{54}
\]
where $G_\lambda(\lambda_i)$ is the resolvent,
\[
G_\lambda(\lambda_i) = \frac{1}{M} \left\langle \mathrm{Tr}\, \frac{1}{\lambda_i - X} \right\rangle. \tag{55}
\]
The bracket here denotes an expectation value with a weight which includes both $P(X)$ and $\prod_{l=1}^{2K} \det(\lambda_l - X)$. We assume that the asymptotic spectrum of the eigenvalues $x_i$ of $X$ fills a single interval $[\alpha, \beta]$ in the large M limit. (It is sufficient to consider the single cut case, since we are interested in Dyson short distance universality, which involves only the local statistics.) Therefore $G_\lambda(z)$ is also analytic in a plane cut from the interval $[\alpha, \beta]$, and
\[
G_\lambda(x \pm i0) = \hat{G}_\lambda(x) \mp i\pi \rho_\lambda(x), \tag{56}
\]
where $\hat{G}_\lambda(x) = [G_\lambda(x + i0) + G_\lambda(x - i0)]/2$. The saddle point equation in the large M limit becomes
\[
2M\, G_\lambda(z) - N V'(z) + \sum_{j=1}^{2K} \frac{1}{z - \lambda_j} = 0. \tag{57}
\]
The last term of (57) is of relative order $1/N$ and thus we have to solve this Riemann-Hilbert problem to this order. At leading order, we have $2\hat{G}(x) = V'(x)$, and up to order $1/N$,
\[
G_\lambda(z) = G(z) + \frac{1}{N} \left[ C_G(z) + \sum_{i=1}^{2K} C_{\lambda_i}(z) \right]. \tag{58}
\]
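From the saddle point equation (57), we have $\hat{C}_G(x) = (N - M)\hat{G}(x)$ and $\hat{C}_{\lambda_i}(x) = \frac{1}{2(\lambda_i - x)}$. We now set $M = N - K$. The functions $C_G(z)$ and $C_{\lambda_i}(z)$ are uniquely determined from their analyticity in a plane cut from $\alpha$ to $\beta$, and their fall-off as $1/z^2$ for large $z$ (since both $G_\lambda(z)$ and $G(z)$ behave as $1/z$ at infinity). The result is
\[
C_G(z) = K G(z) - \frac{K}{\sqrt{(z-\alpha)(z-\beta)}}, \qquad
C_{\lambda_i}(z) = \frac{1}{2\sqrt{(z-\alpha)(z-\beta)}} \left[ 1 - \frac{\sqrt{(z-\alpha)(z-\beta)} - \sqrt{(\lambda_i-\alpha)(\lambda_i-\beta)}}{z - \lambda_i} \right]. \tag{59}
\]
These expressions lead to
\[
(N - K)\, G_\lambda(\lambda_i) = N G(\lambda_i) - \frac{1}{2} \frac{d}{d\lambda_i} \log \sqrt{(\lambda_i - \alpha)(\lambda_i - \beta)}
- \frac{1}{2} \sum_{j=1,\, j\neq i}^{2K} \frac{1}{\lambda_i - \lambda_j} \left[ 1 - \frac{\sqrt{(\lambda_j - \alpha)(\lambda_j - \beta)}}{\sqrt{(\lambda_i - \alpha)(\lambda_i - \beta)}} \right]. \tag{60}
\]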
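Since there is a branch cut between $\alpha$ and $\beta$, one must specify whether $\lambda_i$ approaches the real axis from above or from below. The sign of the square root on both sides of the cut will be denoted $\epsilon_i$. There are then a priori $2^{2K}$ saddle points corresponding to the different choices of $\epsilon_i$. For each choice of the $\epsilon_i$'s, we have
\[
\frac{\partial}{\partial \lambda_i} \log \tilde{F}_\epsilon = \epsilon_i N i\pi \rho(\lambda_i) - \frac{1}{2} \frac{d}{d\lambda_i} \log \sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)}
- \frac{1}{2} \sum_{j=1,\, j\neq i}^{2K} \frac{1}{\lambda_i - \lambda_j} \left[ 1 - \frac{\epsilon_j \sqrt{(\lambda_j - \alpha)(\beta - \lambda_j)}}{\epsilon_i \sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)}} \right], \tag{61}
\]
where $\tilde{F}_\epsilon$ means the value of $F_{2K}$ for given $\epsilon_j$'s multiplied by a factor $\exp\left(-\frac{N}{2} \sum_i V(\lambda_i)\right)$. Introducing the parametrization $\phi(x)$, defined by $x = \frac{1}{2}(\alpha + \beta) - \frac{1}{2}(\beta - \alpha)\cos\phi(x)$ and $\frac{1}{2}(\beta - \alpha)\sin\phi(x) = \sqrt{(x - \alpha)(\beta - x)}$, we have
\[
\frac{d}{d\lambda_i} \log \sin\left( \frac{\epsilon_i \phi(\lambda_i) - \epsilon_j \phi(\lambda_j)}{2} \right)
= \frac{\epsilon_i}{2\sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)}}\; \frac{\epsilon_i \sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)} + \epsilon_j \sqrt{(\lambda_j - \alpha)(\beta - \lambda_j)}}{\lambda_i - \lambda_j}. \tag{62}
\]
Thus we obtain $\tilde{F}_\epsilon$ by integration,
\[
\tilde{F}_\epsilon = C_\epsilon \prod_{i<j}^{2K} \frac{\sin\left( \frac{\epsilon_i \phi(\lambda_i) - \epsilon_j \phi(\lambda_j)}{2} \right)}{\lambda_i - \lambda_j}
\prod_{i=1}^{2K} \frac{1}{\sqrt{\sin\phi(\lambda_i)}}
\times \prod_{i=1}^{2K} \exp\left( \epsilon_i\, i N \pi \int_{\lambda_0}^{\lambda_i} \rho(x)\, dx \right). \tag{63}
\]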
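The trigonometric identity (62), on which the integration leading to (63) rests, can be checked by finite differences. The sketch below is plain Python; the cut endpoints `ALPHA`, `BETA`, the sample points and the helper names are hypothetical values chosen for illustration only. The logarithm is taken of the absolute value of the sine, which does not change its derivative.

```python
import math

ALPHA, BETA = -2.0, 2.0   # hypothetical endpoints of the cut [alpha, beta]


def phi(x):
    """Parametrization x = (alpha+beta)/2 - (beta-alpha)/2 * cos(phi), 0 < phi < pi."""
    centre, half_width = 0.5 * (ALPHA + BETA), 0.5 * (BETA - ALPHA)
    return math.acos((centre - x) / half_width)


def root(x):
    """sqrt((x - alpha)(beta - x)) = (beta - alpha)/2 * sin(phi(x))."""
    return math.sqrt((x - ALPHA) * (BETA - x))


def lhs_62(lam_i, lam_j, eps_i, eps_j, h=1e-6):
    """Central-difference derivative w.r.t. lam_i of log|sin((eps_i phi_i - eps_j phi_j)/2)|."""
    def f(t):
        return math.log(abs(math.sin(0.5 * (eps_i * phi(t) - eps_j * phi(lam_j)))))
    return (f(lam_i + h) - f(lam_i - h)) / (2.0 * h)


def rhs_62(lam_i, lam_j, eps_i, eps_j):
    """Right-hand side of (62)."""
    return (eps_i / (2.0 * root(lam_i))
            * (eps_i * root(lam_i) + eps_j * root(lam_j)) / (lam_i - lam_j))
```

For any of the four sign choices $\epsilon_i, \epsilon_j = \pm 1$, the two functions should agree up to the finite-difference error.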
We have to sum over all the saddle-point contributions, i.e. sum over all the different choices of $\epsilon_j$'s. We focus now on the Dyson limit in which the differences $\lambda_i - \lambda_j$ are all of order $1/N$. Among the $2^{2K}$ possibilities, we retain only the $\binom{2K}{K}$ solutions in which $K$ of the $\epsilon_l$ are positive and the remaining $K$ are negative. Otherwise, the exponential factor in the final result gives very rapid oscillations in the large N limit. This situation is thus exactly similar to that of the previous section. Again the sum over the $\binom{2K}{K}$ saddle-points is conveniently written as a contour integral
\[
\tilde{F} = \frac{1}{K!} \oint \frac{du_1\, du_2 \cdots du_K}{(2\pi i)^K}\; \frac{\prod_{n<m} (u_n - u_m)^2}{\prod_{n=1}^{K}\prod_{j=1}^{2K} (u_n - \lambda_j)}\; \cos\left[ \pi N \rho \left( \sum_{j=1}^{2K} \lambda_j - 2 \sum_{n=1}^{K} u_n \right) \right]. \tag{64}
\]
When we set all the $\lambda_j = \lambda$, this becomes
\[
\tilde{F} = \frac{1}{K!} \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\; \frac{\prod_{i<j} (u_i - u_j)^2}{\prod_{i=1}^{K} u_i^{2K}}\; \cos\left( 2\pi N \rho \sum_{n=1}^{K} u_n \right) \tag{65}
\]
and we recover the result (38). However in this method, since we re-integrated the logarithmic derivative of $F_{2K}$, the constant of integration remains undetermined. We may fix this constant by the same requirement that we have used in the previous section, and the final result agrees then with the previous calculation.

6. Symplectic Group Sp(N)

We have studied up to now unitary invariant measures, characterized for the probability law of the eigenvalues by the factor $|\Delta(x_1, \cdots, x_M)|^\beta$ with $\beta = 2$. We could also consider the Gaussian orthogonal ensemble (GOE, with $\beta = 1$) or the Gaussian symplectic ensemble (GSE, with $\beta = 4$). If we took the GOE for instance, we could immediately relate the correlation
functions of characteristic determinants to the correlations of the eigenvalues, as in (43) (except that since $\beta$ is one no doubling of the $\lambda$'s is needed), and therefore relate the universality of the moments to the Dyson universal limit. Remaining still with the unitary $\beta = 2$ class, in Cartan's classification of symmetric spaces, we find ensembles which are invariant under Sp(N) or O(N). One of the physical applications of random Sp(N) matrices is the statistics of the energy levels inside a superconductor vortex [8]. In number theory, it is known that some generalizations of Riemann's ζ-function, such as the Dirichlet L-function $L(s, \chi_d)$, where $\chi_d$ is a quadratic Dirichlet character mod $|d|$, present a spectrum of low lying zeros on the line $\mathrm{Re}\, s = 1/2$, which agrees with the statistics of the eigenvalues of the Sp(N) random matrix theory [12, 14]. In this Sp(N) invariant symmetric space, the eigenvalues always appear in pairs of positive and negative real numbers. Due to this fact, a new universality class governs the correlations of the eigenvalues near the origin, i.e. near $s = 1/2$ (whereas in the bulk one recovers the previous unitary class). Therefore we study now the new universality class, which governs the new scaling near the origin. We thus consider random Hermitian matrices $X$, which are $2M \times 2M$ and satisfy the condition
\[
X^T J + J X = 0, \tag{66}
\]
where $J$ is
\[
J = \begin{pmatrix} 0 & 1_M \\ -1_M & 0 \end{pmatrix}. \tag{67}
\]
The unitary symplectic group is the subgroup of SU(2M) consisting of $2M \times 2M$ unitary matrices satisfying the symplectic constraint
\[
U^T = -J U^\dagger J. \tag{68}
\]
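The pairing of eigenvalues implied by (66) can be seen on a random sample. The sketch below assumes NumPy; the block construction and the function names are ours. Solving (66) together with Hermiticity block by block gives $X = \begin{pmatrix} A & B \\ B^\dagger & -A^T \end{pmatrix}$ with $A$ Hermitian and $B$ complex symmetric, and $U = e^{iX}$ then satisfies the constraint (68).

```python
import numpy as np


def symplectic_form(M):
    """The matrix J of (67)."""
    identity, zero = np.eye(M), np.zeros((M, M))
    return np.block([[zero, identity], [-identity, zero]])


def random_constrained_hermitian(M, rng):
    """Random Hermitian 2M x 2M matrix obeying X^T J + J X = 0 (an illustrative sample)."""
    A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
    A = 0.5 * (A + A.conj().T)            # Hermitian block
    B = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
    B = 0.5 * (B + B.T)                   # complex symmetric block
    return np.block([[A, B], [B.conj().T, -A.T]])


def unitary_from_hermitian(X):
    """U = exp(iX), built from the eigen-decomposition of the Hermitian matrix X."""
    eigenvalues, vectors = np.linalg.eigh(X)
    return (vectors * np.exp(1j * eigenvalues)) @ vectors.conj().T
```

With `X = random_constrained_hermitian(4, np.random.default_rng(0))`, the sorted eigenvalues returned by `np.linalg.eigvalsh(X)` should be symmetric about zero.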
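The integration over this unitary symplectic group for $F_K(\lambda_1, \cdots, \lambda_K)$ gives [8]
\[
F_K(\lambda_1, \cdots, \lambda_K) = \left\langle \prod_{\alpha=1}^{K} \det(\lambda_\alpha - X) \right\rangle
= \frac{1}{Z_M} \int \prod_{i=1}^{M} d\mu(x_i)\; \Delta^2(x_1^2, \cdots, x_M^2) \prod_{i=1}^{M} x_i^2 \prod_{\alpha=1}^{K}\prod_{i=1}^{M} (\lambda_\alpha^2 - x_i^2). \tag{69}
\]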
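Repeating the analysis of Sect. 2, $F_K(\lambda_1, \cdots, \lambda_K)$ is given again by a determinantal form as (14). Changing $x_i$ to $x_i^2 = y_i$ and denoting $\mu_i = \lambda_i^2$, we have
\[ F_K(\mu_1, \cdots, \mu_K) = \int_0^\infty \prod_{i=1}^{M} dy_i \prod_{i=1}^{M} y_i^{1/2} \prod_{i<j} (y_i - y_j)^2 \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\mu_\alpha - y_i)\, e^{-N \sum_i y_i}. \tag{70} \]
The orthogonal monic polynomials for this measure are the Laguerre polynomials $L_n^{(1/2)}(y)$, which are defined by
\[ L_n^{(1/2)}(y) = \frac{(-1)^n e^{Ny}}{N^n \sqrt{y}} \Big(\frac{d}{dy}\Big)^n \big(y^{n+\frac12} e^{-Ny}\big) = \frac{(-1)^n\, n!}{N^n} \oint \frac{du}{2\pi i}\, \frac{(1+u)^{n+\frac12}}{u^{n+1}}\, e^{-Nuy}, \tag{71} \]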
Characteristic Polynomials of Random Matrices
125
normalized as required to $L_n^{(1/2)}(y) = y^n + \text{lower degree}$. The orthogonality condition is
\[ \int_0^\infty dy\, e^{-Ny} \sqrt{y}\, L_n^{(1/2)}(y)\, L_m^{(1/2)}(y) = h_n \delta_{n,m} \tag{72} \]
3
with hn = n! 0. Since 4 ∈ S with the property that div 4(x)dx = 0, Rn
it follows from Stein [13, Chap. IV, 4.3.3] that gL∞ ≤
N k=−N
≤
N
(div 4)5k ∗ f L∞ sup (div 4)t ∗ f L∞
k=−N t>0
≤ CN f BMO , which yields (2.14).
2.2. Proof of Theorem 2. It is proved by Kato–Lai [7] and Kato–Ponce [8] that for the given initial data $a \in W^{s,p}$ for $s > 1 + n/p$, the time interval $T$ of the existence of the solution $u$ to (E) in the class (1.4) depends only on $\|a\|_{W^{s,p}}$. Hence by the standard argument of continuation of local solutions, it suffices to establish an a priori estimate for $u$ in $W^{s,p}$ in terms of $a$, $T$, $M_0$ or $a$, $T$, $M_1$ according to (1.5) or (1.6). Indeed, we shall show that the solution $u(t)$ in the class (1.4) is subject to the following estimate:
Sobolev Inequality in BMO and Euler Equations
199
sup u(t)W s,p ≤ (aW s,p + e)αj exp(CT αj )
0 Re w, evaluate these operators products, and analytically continue the result. The above expression (2.27) thus becomes,
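\begin{align*}
&\int_z^\infty du\, S(u) \cdot V_1(z) \cdot V_2(w) \int_w^z dx\, V_1(z) \cdot S'(x) \cdot V_2(w) \\
&\quad + \int_w^z du\, V_1(z) \cdot S(u) \cdot V_2(w) \int_z^\infty dx\, S'(x) \cdot V_1(z) \cdot V_2(w) \\
&\quad + (-1)^{\alpha_1} \int_z^\infty du\, S(u) \cdot V_1(z) \cdot V_2(w) \int_z^\infty dx\, S'(x) \cdot V_1(z) \cdot V_2(w) \\
&\quad + (-1)^{\alpha_1'} \int_z^\infty du\, S(u) \cdot V_1(z) \cdot V_2(w) \int_z^\infty dx\, S'(x) \cdot V_1(z) \cdot V_2(w), \tag{2.28}
\end{align*}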
where (−1)α1 and (−1)α1 are the monodromies coming from
V1 (z) S (x) = (−1)α1 S (x) V1 (z),
V1 (z) S(x) = (−1)α1 S(x) V1 (z), √1
ϕ
√1
(2.29)
ϕ
(where Re x > Re z). When V1 is either e 2p or γ e 2p , as prescribed in (2.23), the monodromies are (−1)α1 = e−πi/p and (−1)α1 = e−πi/p = −eπi/p , and (2.27) becomes, (SV1 (z) V1 (z) + V1 (z) S V1 (z)) · (SV2 (w) V2 (w) + V2 (w) S V2 (w))
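\begin{align*}
={}& -2i \sin\tfrac{\pi}{p} \int_z^\infty du\, S(u)\, V_1(z)\, V_2(w) \int_z^\infty dx\, S'(x)\, V_1(z)\, V_2(w) \\
&+ \int_z^\infty du\, S(u)\, V_1(z)\, V_2(w) \int_w^z dx\, V_1(z)\, S'(x)\, V_2(w) \\
&+ \int_w^z du\, V_1(z)\, S(u)\, V_2(w) \int_z^\infty dx\, S'(x)\, V_1(z)\, V_2(w). \tag{2.30}
\end{align*}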
Vertex Operator Extensions of Dual Affine s(2) Algebras
505
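Appendix D is devoted to explicit checks of the $\widehat{s\ell}(2|1)$ operator product expansions in the Wakimoto representation. In addition to establishing the operator product expansions, the Wakimoto bosonisation is a useful tool to verify the relation between the energy-momentum tensors underlying the central charge balance in Eq. (1.5).

Lemma 2.2. The combined energy-momentum tensor of the $\widehat{s\ell}(2|1)$ algebra generated by (2.12)–(2.16) and the independent $\widehat{u}(1)$ current $A(z)$, Eq. (2.17), equals the sum of the energy-momentum tensors for two $\widehat{s\ell}(2)$ algebras and the free $\widehat{u}(1)$ current $\partial f(z)$:
\[ T_{\mathrm{Sug}} + \tfrac12 AA = \underbrace{\tfrac12 \partial\varphi\,\partial\varphi - \tfrac{1}{\sqrt{2p}}\,\partial^2\varphi - \beta\,\partial\gamma}_{\widehat{s\ell}(2)_{p-2}} + \underbrace{\tfrac12 \partial\varphi'\,\partial\varphi' - \tfrac{1}{\sqrt{2p'}}\,\partial^2\varphi' - \beta'\,\partial\gamma'}_{\widehat{s\ell}(2)_{p'-2}} + \tfrac12 \partial f\,\partial f. \]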
(2.31) Proof. To show this, we directly calculate the Sugawara energy-momentum tensor given by Eq. (B.5). First, the s(2) subalgebra contribution is given by, 1 p−1
E 12 F 12 + H − H − = −β∂γ +
1 p−1
∂βγ +
p 2(p−1)
∂ϕ∂ϕ
(2.32)
(which, obviously, is not the s(2) energy-momentum tensor). Second, the fermionic contribution can be read off Appendix D, 1 p−1
E1F 1 − E2F 2 =
+ 21 ∂ϕ ∂ϕ + √ 1 ∂β γ + 2β γ ∂f − − β ∂γ − p−1 1 2(1−p) ∂ϕ∂ϕ
p ∂ϕ ∂f −
p−2 2(p−1) ∂f ∂f
− √1 ∂ 2 ϕ ,
(2.33)
(p−2)2 H + H + = p (p − 2)∂ϕ ∂f − 2(p−1) ∂f ∂f − p2 ∂ϕ ∂ϕ √ + 2(p − 2)β γ ∂f + (p − 1)β ∂γ − (p − 1)∂β γ − 2p(p − 1)β γ ∂ϕ − (p − 1)β β γ γ .
(2.34)
√1 ∂ 2 ϕ 2p
2p
and, in addition, we have −
1 p−1
Finally, the free scalar current A has the energy-momentum tensor 1 2 AA
−
√
=
p 2 ∂ϕ ∂ϕ −
p(p − 1) ∂ϕ ∂f + 21 (p − 1)∂f ∂f +
2p(p − 1)β γ ∂ϕ
2(p − 1)β γ ∂f + (p − 1)β β γ γ − (p − 1)β ∂γ + (p − 1)∂β γ .
Adding all this together, we arrive at (2.31).
(2.35)
3. Extending $\widehat{s\ell}(2|1)_k$ to the $\widehat{D}(2|1; k')_k$ Algebra

So far, we have seen that the appropriate bilinear combinations of vertex operators extend the $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'} \oplus \widehat{u}(1)$ algebra to $\widehat{s\ell}(2|1)_k$. We now show that this is in fact part of a larger construction, that of the exceptional affine Lie superalgebra $\widehat{D}(2|1; \alpha)$.
506
P. Bowcock, B. L. Feigin, A. M. Semikhatov, A. Taormina
3.1. The “auxiliary” level-1 $\widehat{s\ell}(2)$ algebra. We first use the auxiliary scalar $f$ to construct an $\widehat{s\ell}(2)$ algebra of level 1,
\[ j^+(z) = e^{\sqrt{2} f(z)}, \qquad j^0(z) = \tfrac{1}{\sqrt{2}}\, \partial f(z), \qquad j^-(z) = e^{-\sqrt{2} f(z)}. \tag{3.1} \]
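Then the two Cartan currents of $\widehat{s\ell}(2|1)_{p-2}$ and the extra $\widehat{u}(1)$ current become
\begin{align*}
H^- &= J^0, \tag{3.2} \\
H^+ &= \tfrac{p}{p'}\, J'^0 - (p-2)\, j^0, \tag{3.3} \\
A &= \sqrt{2(p-1)}\, (J'^0 - j^0), \tag{3.4}
\end{align*}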
where, as before, $J^0$ is the Cartan current of $\widehat{s\ell}(2)_{p-2}$ and $J'^0$ is that of $\widehat{s\ell}(2)_{p'-2}$. We thus clearly see the interplay of three $\widehat{s\ell}(2)$ algebras, two at dual levels $k = p - 2$, $k' = p' - 2$, and one at level 1, in our construction of $\widehat{s\ell}(2|1)$. In the next subsection, we discuss how to embed $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ and $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ in the affine superalgebra $\widehat{D}(2|1; \alpha)$ in such a way that the $\widehat{u}(1)$ currents (3.2)–(3.4) are recovered. This will mean that the eight basis elements of $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}^2_1(z)$ (four of which were explicitly given in Eqs. (2.12)–(2.13) in terms of vertex operators) represent in fact the $\widehat{D}(2|1; \alpha)$ fermions.

3.2. Conformal embeddings in $\widehat{D}(2|1; \alpha)$. Our conventions on $D(2|1; \alpha)$ are summarised in Appendix C. Using the notations for the roots introduced there, we write the bosonic subalgebra of $D(2|1; \alpha)$ as $s\ell(2)^{(\alpha_2)} \oplus s\ell(2)^{(\alpha_3)} \oplus s\ell(2)^{(\alpha_\theta)}$. When $s\ell(2|1)$ is regularly embedded in $D(2|1; \alpha)$ (i.e. when the roots of the subalgebra are chosen as a subset of the roots of the embedding algebra), any of these three $s\ell(2)$ algebras can be taken as the $s\ell(2)$ subalgebra of $s\ell(2|1)$, for which we introduce the notation (reminding us of the $H^-$ current) $s\ell(2)^{(\alpha^-)}$, so that $\alpha^- = \alpha_2$, $\alpha_3$ or $\alpha_\theta$. For each choice of $\alpha^-$, there exist two regular embeddings of $s\ell(2|1)$ in $D(2|1; \alpha)$. The corresponding $s\ell(2|1)$ root lattices contain $\alpha^-$ and four of the eight odd roots of $D(2|1; \alpha)$. From now on, we use the parameter $\gamma = \frac{\alpha}{1+\alpha}$, $\gamma \in \mathbb{C} \setminus \{0, 1, \infty\}$, instead of $\alpha$. This parameter governs the norm of $\alpha_2$ and $\alpha_3$, and $\alpha_\theta$ is always the longest root if $\gamma \in [\frac12, 1[$ (see Appendix C). For the affine algebras, the levels of the algebras involved in the embedding
\[ \widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1) \subset \widehat{D}(2|1; \tfrac{\gamma}{1-\gamma})_\kappa \tag{3.5} \]
are related as $\lambda = -\frac{\kappa}{\gamma}$, $-\frac{\kappa}{1-\gamma}$ or $\kappa$ according to whether $\alpha^- = \alpha_2$, $\alpha_3$ or $\alpha_\theta$. On the other hand, one also has the regular embedding,
\[ \widehat{s\ell}(2)^{(\alpha_2)}_{-\frac{\kappa}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{-\frac{\kappa}{1-\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\kappa} \subset \widehat{D}(2|1; \tfrac{\gamma}{1-\gamma})_\kappa. \tag{3.6} \]
We now note that the Sugawara central charge of $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$ is 1 for any value of the level $\kappa$, because the dual Coxeter number of $D(2|1; \frac{\gamma}{1-\gamma})$ is zero and its superdimension (number of bosonic generators minus number of fermionic generators) is one; hence, the corresponding central charge is one. But the central charge of $\widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1)$ is also 1 for any level $\lambda$, this time because the superdimension of $s\ell(2|1)$ is zero, and it turns out that for certain relations between the level $\kappa$ and the parameter $\gamma$, which we analyse
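below, the central charge of the $\widehat{s\ell}(2)_{\lambda_1} \oplus \widehat{s\ell}(2)_{\lambda_2} \oplus \widehat{s\ell}(2)_{\lambda_3}$ subalgebra of $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$ is also one. It is therefore not surprising that $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ and $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ are intimately linked, a fact already noticed earlier when the energy-momentum tensors of the two theories were shown to coincide in a free field representation (2.31), and also seen from the representation decompositions in Sect. 5 and the character identities (5.55). The relation between $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ and $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ is based on the fact that the embeddings into $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$ which we consider give rise to dual $\widehat{s\ell}(2)$ pairs:

Lemma 3.1. The Sugawara central charge of $\widehat{s\ell}(2)^{(\alpha_2)}_{-\frac{\kappa}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{-\frac{\kappa}{1-\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\kappa}$ is one if and only if two of the $\widehat{s\ell}(2)$ subalgebras are dual to each other in the sense of Eq. (1.1). Moreover, when this is so, the third $\widehat{s\ell}(2)$ algebra has level 1.

Proof. Indeed, adding up the central charges on the left-hand side of (3.6), we find the sum is one for $\kappa = \gamma - 1$ (in which case $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ are dual and $\widehat{s\ell}(2)^{(\alpha_3)}$ has level 1), for $\kappa = -\gamma$ (with $\widehat{s\ell}(2)^{(\alpha_3)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ dual), or for $\kappa = 1$ (with $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_3)}$ dual). Conversely, we have to consider three different possibilities to pick out two dual $\widehat{s\ell}(2)$ algebras.

1. Taking the pair $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ leads to $\kappa = \gamma - 1$ and the relevant embedding is
\[ \widehat{s\ell}(2)^{(\alpha_2)}_{\frac{1-\gamma}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{1} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\gamma-1} \subset \widehat{D}(2|1; \tfrac{\gamma}{1-\gamma})_{\gamma-1}. \tag{3.7} \]
2. For the pair $\widehat{s\ell}(2)^{(\alpha_3)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ chosen as dual, $\kappa = -\gamma$ and the relevant embedding is
\[ \widehat{s\ell}(2)^{(\alpha_2)}_{1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\frac{\gamma}{1-\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{-\gamma} \subset \widehat{D}(2|1; \tfrac{\gamma}{1-\gamma})_{-\gamma}. \tag{3.8} \]
3. Taking $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_3)}$ as the dual pair leads to $\kappa = 1$ and, thus, to the embedding
\[ \widehat{s\ell}(2)^{(\alpha_2)}_{-\frac{1}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\frac{1}{\gamma-1}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{1} \subset \widehat{D}(2|1; \tfrac{\gamma}{1-\gamma})_{1}. \tag{3.9} \]
This completes the proof of the lemma.

So we have two subalgebras of $\widehat{g} = \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$, namely $\widehat{h}_1 = \widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1)$ and $\widehat{h}_2 = \widehat{s\ell}(2)^{(\alpha_2)}_{\lambda_1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\lambda_2} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\lambda_3}$, whose central charges coincide with that of $\widehat{g}$ as long as two of the three levels $k_1 = -\frac{\kappa}{\gamma}$, $k_2 = -\frac{\kappa}{1-\gamma}$, and $k_3 = \kappa$ obey $(k_a + 1)(k_b + 1) = 1$. For unitary representations, we would then conclude that embeddings (3.5), (3.7), (3.8), and (3.9) are conformal, since the coset Virasoro algebra $L = L_g - L_h$ has vanishing central charge, and the only unitary representation of the coset algebra in this case is the trivial one [18], leading to the following formal relation between characters,
\[ \mathrm{Tr}_\Lambda\, q^{L_g} = \mathrm{Tr}_\Lambda\, q^{L_h}. \tag{3.10} \]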
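The central-charge statement of Lemma 3.1 can be checked with exact rational arithmetic. The following is only a sketch: it assumes the standard Sugawara value $3k/(k+2)$ for affine $s\ell(2)$ at level $k$, takes the three levels from (3.6), and uses the duality condition $(k_a+1)(k_b+1)=1$ quoted above. The function names and the sample value of $\gamma$ are illustrative choices of ours, not taken from the text.

```python
from fractions import Fraction


def sugawara_c(level):
    """Sugawara central charge 3k/(k+2) of affine sl(2) at level k (k != -2)."""
    return 3 * level / (level + 2)


def embedding_levels(gamma, kappa):
    """Levels (k1, k2, k3) of the three sl(2) factors in (3.6)."""
    return (-kappa / gamma, -kappa / (1 - gamma), kappa)


def are_dual(ka, kb):
    """Duality condition (ka + 1)(kb + 1) = 1."""
    return (ka + 1) * (kb + 1) == 1


def summarize(gamma, kappa):
    """Return (total central charge, a dual pair exists, some level equals 1)."""
    gamma, kappa = Fraction(gamma), Fraction(kappa)
    k1, k2, k3 = embedding_levels(gamma, kappa)
    total = sugawara_c(k1) + sugawara_c(k2) + sugawara_c(k3)
    has_dual_pair = are_dual(k1, k2) or are_dual(k1, k3) or are_dual(k2, k3)
    return total, has_dual_pair, 1 in (k1, k2, k3)


if __name__ == "__main__":
    gamma = Fraction(3, 7)  # arbitrary sample value, chosen only for illustration
    # The three special levels of Lemma 3.1, followed by a generic level.
    for kappa in (gamma - 1, -gamma, Fraction(1), Fraction(2)):
        print(kappa, summarize(gamma, kappa))
```

For the three special values $\kappa = \gamma - 1$, $-\gamma$, $1$ the total should come out as one with a dual pair and a level-1 factor present, while a generic $\kappa$ gives neither.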
In the case at hand, though, we are dealing with non-unitary representations. A sum (of products) of characters corresponding to $\widehat{h}_1$ or to $\widehat{h}_2$ representations can only be interpreted as an irreducible $\widehat{g}$ character if the corresponding representation is acted upon trivially by the coset Virasoro algebra. We want to argue that, in order for this to be so in the case of a $\widehat{g}$ representation with highest vector $|\Lambda\rangle$, it is sufficient to demand that the state $L_{-2}|\Lambda\rangle$ is null (i.e., singular or a descendant of a singular state), and that the same condition holds for the vacuum representation, i.e. $L_{-2}|0\rangle$ is null, where $|0\rangle$ is the usual $sl(2, \mathbb{C})$ invariant vacuum, which is also invariant under the action of the finite dimensional algebra $g$. Indeed, the operator product expansion of $L$ with one of the currents $J^a(\zeta)$ of $\widehat{g}$ is of the form,
\[ L(z)\, J^a(\zeta) = \frac{X(\zeta)}{(z-\zeta)^2} + \frac{Y(\zeta)}{z-\zeta} + \text{regular}, \tag{3.11} \]
and the state associated with the residue of the double pole is,
\[ X_{-1}|0\rangle = (J^a)_1 L_{-2}|0\rangle = \lambda\, (J^a)_{-1}|0\rangle, \tag{3.12} \]
for some constant $\lambda$. Since $(J^a)_{-1}|\Lambda_0\rangle$ is not null and yet our assumption demands that the vector $(J^a)_1 L_{-2}|\Lambda_0\rangle$ vanishes or is null, it follows that $\lambda = 0$, so that $X(\zeta)$ must vanish identically. Thus,
\[ [J^a_m, L_n] = M^{ab}\, Y^b_{m+n}, \tag{3.13} \]
so that $L_n$ and $Y^a_n$ are just two Kac–Moody primary fields in some representation of $g$. Moreover this representation is finite since the number of weight-two states in $\Lambda_0$ is finite, and all the states in the multiplet are null. We now return to the representation $|\Lambda\rangle$ which is not the vacuum representation. It is easy to show that
\[ (L_g)_1 L_{-2}|\Lambda\rangle = 3 L_{-1}|\Lambda\rangle, \tag{3.14} \]
so that this state must also be null. Since $L_{-1}$ and $L_{-2}$ generate the whole Virasoro algebra, it follows that $L_{-n}|\Lambda\rangle$ must be null for any $n$, and hence any of the $Y_{-n}|\Lambda\rangle$ will also be null for a $Y$ in the null multiplet of weight-two fields. As a consequence of the above remarks, a state of the form $|W\rangle = L_{-n}\{\dots\}|\Lambda\rangle$, where $\dots$ consists of some arbitrary collection of $\widehat{g}$ current modes, is null, as can be shown inductively by pushing $L_{-n}$ to the right. The action of $L$ on the representation $\Lambda$ is therefore effectively trivial as claimed.

For instance, in the case of embedding (3.9), where $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})$ has level one, the state $(E^{\alpha_\theta}_{-1})^2|0\rangle$ is a singular vector, where $E^{\alpha_\theta}_{-1}$ is the usual affine simple root associated with the step operator corresponding to the highest root $\alpha_\theta$ of $g$. This state is a singular vector at grade two as is $L_{-2}|0\rangle$, so it isn't unreasonable to argue that they are in the same multiplet under $g$. If this were the case, our condition that $L_{-2}|\Lambda\rangle$ is null is equivalent to the condition that $(E^{\alpha_\theta}_{-1})^2|\Lambda\rangle$ is null. The latter is implied by the vanishing of $\langle\Lambda|(E^{-\alpha_\theta}_{1})^2 (E^{\alpha_\theta}_{-1})^2|\Lambda\rangle$, yielding the familiar $\widehat{s\ell}(2)$ result
\[ \alpha_\theta . \Lambda\, (\alpha_\theta . \Lambda - 1) = 0. \tag{3.15} \]
This means we only should see the unitary singlet and doublet representations of the $\widehat{s\ell}(2)^{(\alpha_\theta)}_1$ appearing in the $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_1$ character decomposition if the embedding is to be conformal, and indeed inspection of our character sum rules reveals that only the characters of these $\widehat{s\ell}(2)^{(\alpha_\theta)}_1$ representations occur.
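We now proceed to identify which of the $\widehat{s\ell}(2|1)$ embeddings (3.5) (with $\kappa = \gamma - 1$, $-\gamma$ or 1) is the one implied in (3.2)–(3.4). To make contact with the latter expressions, we must fix the level of $\widehat{s\ell}(2|1)$ to be $k$, so that its subalgebra $\widehat{s\ell}(2)^{(\alpha^-)}$ is also at level $k$. According to the above discussion, we have the embeddings
\[ \widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1) \subset \widehat{D} \supset \widehat{s\ell}(2)^{(\alpha_2)}_{\lambda_1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\lambda_2} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\lambda_3} \tag{3.16} \]
in $\widehat{D} \equiv \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{\gamma-1}$ for the levels
\[ (\lambda; \lambda_1, \lambda_2, \lambda_3) = \begin{cases} (k; k', 1, k),\ (k'; k', 1, k), & \text{for } \gamma = -\frac{k}{k'} = k + 1, \\ (k; k, 1, k'),\ (k'; k, 1, k'), & \text{for } \gamma = -\frac{k'}{k} = k' + 1, \end{cases} \tag{3.17} \]
in $\widehat{D} \equiv \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{-\gamma}$ for the levels
\[ (\lambda; \lambda_1, \lambda_2, \lambda_3) = \begin{cases} (k; 1, k', k),\ (k'; 1, k', k), & \text{for } \gamma = -k, \\ (k; 1, k, k'),\ (k'; 1, k, k'), & \text{for } \gamma = -k', \end{cases} \tag{3.18} \]
and in $\widehat{D} \equiv \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{1}$ for the levels
\[ (\lambda; \lambda_1, \lambda_2, \lambda_3) = \begin{cases} (k; k, k', 1),\ (k'; k, k', 1), & \text{for } \gamma = -\frac{1}{k}, \\ (k; k', k, 1),\ (k'; k', k, 1), & \text{for } \gamma = -\frac{1}{k'}. \end{cases} \tag{3.19} \]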
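We actually have twelve descriptions of $\widehat{s\ell}(2|1)_k \oplus \widehat{u}(1)$ in terms of $\widehat{s\ell}(2)^{(\alpha_2)} \oplus \widehat{s\ell}(2)^{(\alpha_3)} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}$, which we summarize in Table 1. There, we label the even subalgebra of $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ as
\[ [\widehat{s\ell}(2)^{(\alpha^-)} \oplus \widehat{u}(1)^{(\alpha^+)}] \oplus \widehat{u}(1)^{(\alpha^*)}. \tag{3.20} \]
We further define $\alpha'^-$ and $\beta$ such that $\widehat{s\ell}(2)^{(\alpha'^-)}$ is the algebra dual to $\widehat{s\ell}(2)^{(\alpha^-)}$, and $\widehat{s\ell}(2)^{(\beta)}$ is the third, level 1, algebra in the sense described in Lemma 3.1. The comparison of data in Table 1 with the currents (3.2)–(3.4) requires proper normalisation, dictated by the operator product expansions derived earlier. Since $H^-(z)\, H^-(w) = \frac{k/2}{(z-w)^2}$, the properly normalised corresponding root is $\widetilde{\alpha}^- = \mu\alpha^-$ with $\mu = \sqrt{k/2(\alpha^-)^2}$ (since $(\widetilde{\alpha}^-)^2 = k/2$). Similarly, $H^+(z)$ corresponds to $\widetilde{\alpha}^+ = \nu\alpha^+$ with $\nu = \sqrt{-k/2(\alpha^+)^2}$, $J'^0(z)$ corresponds to $\widetilde{\alpha}'^- = \rho\alpha'^-$ with $\rho = \sqrt{k'/2(\alpha'^-)^2}$, $j^0(z)$ corresponds to $\widetilde{\beta} = \sigma\beta$ with $\sigma = \sqrt{1/2(\beta)^2}$, and finally, $A(z)$ corresponds to $\widetilde{\alpha}^* = \tau\alpha^*$ with $\tau = \sqrt{1/(\alpha^*)^2}$. We conclude that the first embedding in Table 1 indeed corresponds to currents (3.2)–(3.4), with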
α− = +
α = ∗
α =
√
k 2 αθ , √ − k , α − − kβ 2 (α2 + α3 ) = (k + 1) √ −1 (kα2 + (k + 1)α3 ) = 2k(k 2(k+1)
(3.21) (3.22) −
+ 1)( α
). −β
(3.23)
This therefore indicates that the vertex operators corresponding to all eight elements from $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}^2_1(z)$ extend our vertex operator construction of $\widehat{s\ell}(2|1)_k$ to $\widehat{D}(2|1; k')_k = \widehat{D}(2|1; \frac{1}{k'})_k$. In Sect. 5, we establish character sumrules which confirm that the coset of $\widehat{D}(2|1; k')_k$ by $\widehat{s\ell}(2|1)_k \oplus \widehat{u}(1)$ actually coincides with the coset corresponding to the conformal embedding (3.7).
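Table 1. Embeddings of $\widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1)$ and $\widehat{s\ell}(2)^{(\alpha_2)}_{\lambda_1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\lambda_2} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\lambda_3}$ in $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{\gamma-1}$ (embeddings 1,3,5,7), $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{-\gamma}$ (embeddings 2,4,9,11) and $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{1}$ (embeddings 6,8,10,12). $\alpha^-$ and $\alpha^+$ are associated with the isospin and hypercharge in $s\ell(2|1)$, while $\alpha^*$ is in the $u(1)$ direction orthogonal to the $s\ell(2|1)$ root plane in $D(2|1; \alpha)$. $\widehat{s\ell}(2)^{(\alpha^-)}$ and $\widehat{s\ell}(2)^{(\alpha'^-)}$ are dual algebras and $\widehat{s\ell}(2)^{(\beta)}$ is the third, level 1 algebra. The transformations $t_1 : \gamma \to 1 - \gamma$ and $t_2 : \gamma \to \gamma^{-1}$ are discussed in Appendix C.

embedding  α^-   roots generating sℓ(2|1) plane   α^+          α^*                  α'^-  β     value of parameter γ          (λ; λ1, λ2, λ3)
1          α_θ   (α1 − α_θ, α1)                   −α2 − α3     (1 − γ)α2 − γα3      α2    α3    −k/k'                         (k; k', 1, k)
2          α_θ   (α1 − α_θ, α1)                   −α2 − α3     (1 − γ)α2 − γα3      α3    α2    −k = t1(−k/k')                (k; 1, k', k)
3          α_θ   (α1 + α3, −α1 − α2)              −α2 + α3     (1 − γ)α2 + γα3      α2    α3    −k/k'                         (k; k', 1, k)
4          α_θ   (α1 + α3, −α1 − α2)              −α2 + α3     (1 − γ)α2 + γα3      α3    α2    −k = t1(−k/k')                (k; 1, k', k)
5          α2    (α1 + α3, α_θ − α1)              α3 + α_θ     α3 + (1 − γ)α_θ      α_θ   α3    −k'/k                         (k; k, 1, k')
6          α2    (α1 + α3, α_θ − α1)              α3 + α_θ     α3 + (1 − γ)α_θ      α3    α_θ   −1/k = t2 t1 t2(−k'/k)        (k; k, k', 1)
7          α2    (α1, α1 + α2)                    −α3 + α_θ    −α3 + (1 − γ)α_θ     α_θ   α3    −k'/k                         (k; k, 1, k')
8          α2    (α1, α1 + α2)                    −α3 + α_θ    −α3 + (1 − γ)α_θ     α3    α_θ   −1/k = t2 t1 t2(−k'/k)        (k; k, k', 1)
9          α3    (α1 + α2, α_θ − α1)              α2 + α_θ     α2 + γα_θ            α_θ   α2    −k'                           (k; 1, k, k')
10         α3    (α1 + α2, α_θ − α1)              α2 + α_θ     α2 + γα_θ            α2    α_θ   −1/k' = t2(−k')               (k; k', k, 1)
11         α3    (α1, α1 + α3)                    −α2 + α_θ    −α2 + γα_θ           α_θ   α2    −k'                           (k; 1, k, k')
12         α3    (α1, α1 + α3)                    −α2 + α_θ    −α2 + γα_θ           α2    α_θ   −1/k' = t2(−k')               (k; k', k, 1)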
4. Constructing the Representations

We now construct relations between representations of $\widehat{s\ell}(2|1)_k$ and of the $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ algebras with dual $k$ and $k'$. The vertex operator realisation of $\widehat{s\ell}(2|1)$ in Sect. 2 involves taking “quantum traces” of products of $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ vertex operators, suggesting that representations of $\widehat{s\ell}(2|1)$ could be constructed as sums of products of representations of these dual $\widehat{s\ell}(2)$ algebras (and the auxiliary $\widehat{s\ell}(2)_1$ representations). However, identifying which $\widehat{s\ell}(2)$ representations to choose in order to obtain, e.g., the category O of $\widehat{s\ell}(2|1)$ representations does not follow from the operator construction alone, and the question is analysed in Sect. 4.3.

On the other hand, one can address the inverse problem of reconstructing $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ representations out of a given $\widehat{s\ell}(2|1)$ representation. By simply restricting the latter, one obviously arrives at representations of $\widehat{s\ell}(2)_k$. For the dual $\widehat{s\ell}(2)_{k'}$ algebra, which is not a subalgebra of $\widehat{s\ell}(2|1)$, the construction of its representations starting with an $\widehat{s\ell}(2|1)_k$ representation is provided in Secs. 4.1 and 4.2. The “direct” and the “inverse” problems each consist, first, in constructing a representation and second, in decomposing it. The second step, which requires a more detailed analysis of the representations constructed, will be addressed in Sect. 5. We continue using both $k = p - 2$ and $k' = p' - 2$ as the parameters.

4.1. Reconstructing the dual $\widehat{s\ell}(2)$ pair. We first address the problem of reconstructing the dual pair $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ starting with the $\widehat{s\ell}(2|1)_k$ algebra. One of these is obviously the subalgebra of $\widehat{s\ell}(2|1)$, and thus the problem is to “complete” the $\widehat{u}(1)$ subalgebra to the second (“primed”) $\widehat{s\ell}(2)$. The construction goes by picking out the appropriate terms in the operator products $(\mathbb{C}^2(z) \oplus \mathbb{C}^2(z)) \cdot (\mathbb{C}^2(w) \oplus \mathbb{C}^2(w))$, where $\mathbb{C}^2(z) \oplus \mathbb{C}^2(z)$ denotes the $\widehat{s\ell}(2|1)$ operators representing $\widehat{s\ell}(2)$ doublets. This yields the dual $\widehat{s\ell}(2)$ operator product expansions after a “correction” with an extra scalar field (recall that we also needed an auxiliary scalar in constructing $\widehat{s\ell}(2|1)$ out of two $\widehat{s\ell}(2)$ algebras). We, thus, introduce the scalar field $\phi$ with the operator product
\[ \partial\phi(z)\, \partial\phi(w) = -\frac{1}{(z-w)^2} \tag{4.1} \]
(note that the sign is opposite to the one for the $f$ scalar in (2.10)) and define
\[ C^+ = \tfrac{1}{k+1}\, E^1 F^2\, e^{\sqrt{2}\phi}, \qquad C^- = \tfrac{1}{k+1}\, F^1 E^2\, e^{-\sqrt{2}\phi}, \qquad C^0 = \tfrac{k'}{\sqrt{2}}\, \partial\phi + \tfrac{1}{k+1}\, H^+. \tag{4.2} \]
The operators $C^\pm$ representing $\mathbb{C}^2(z)$ (“corrected” by the free scalar, and therefore, local with respect to each other) commute with the $\widehat{s\ell}(2)$ subalgebra of $\widehat{s\ell}(2|1)$. It also follows from the $\widehat{s\ell}(2|1)$ algebra that
\[ (E^1 F^2)(z)\, (E^2 F^1)(w) = \frac{k(k+1)}{(z-w)^4} - \frac{2(k+1)H^+}{(z-w)^3} - (k+1)\, \frac{T_{\mathrm{Sug}} + E^1 F^1 - E^2 F^2 + \partial H^- + \partial H^+}{(z-w)^2} + \dots, \tag{4.3} \]
where $T_{\mathrm{Sug}}$ is the Sugawara energy-momentum tensor (B.5). Thus,
\[ C^+(z)\, C^-(w) = \frac{k'}{(z-w)^2} + 2\, \frac{C^0}{z-w}. \tag{4.4} \]
It is easy to see that $C^+(z)\, C^+(w)$ and $C^-(z)\, C^-(w)$ are nonsingular. We also find the operator products
\[ C^0(z)\, C^\pm(w) = \frac{\pm C^\pm}{z-w}, \qquad C^0(z)\, C^0(w) = \frac{k'/2}{(z-w)^2}. \tag{4.5} \]
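The normalisation of the Cartan current in (4.5) can be cross-checked against (4.1) with exact arithmetic. This is a sketch under stated assumptions rather than a derivation from the text: we take $C^0 = a\,\partial\phi + b\,H^+$ with $a = k'/\sqrt{2}$ and $b = 1/(k+1)$, use $\partial\phi\,\partial\phi \sim -1/(z-w)^2$ from (4.1), assume $H^+(z)H^+(w) \sim (-k/2)/(z-w)^2$ and that $\partial\phi$ and $H^+$ have no mutual singular terms, and obtain $k'$ from the duality relation $(k+1)(k'+1)=1$. The function names are ours.

```python
from fractions import Fraction


def dual_level(k):
    """Level k' dual to k, obtained by solving (k + 1)(k' + 1) = 1 (k != -1)."""
    k = Fraction(k)
    return -k / (k + 1)


def c0_double_pole(k):
    """Coefficient of 1/(z-w)^2 in C0(z) C0(w) for C0 = a*dphi + b*H+.

    Assumptions: a = k'/sqrt(2), b = 1/(k+1),
    dphi(z) dphi(w) ~ -1/(z-w)^2 and H+(z) H+(w) ~ (-k/2)/(z-w)^2.
    Only a**2 enters, so the square root never has to be evaluated.
    """
    k = Fraction(k)
    a_squared = dual_level(k) ** 2 / 2
    b = 1 / (k + 1)
    return a_squared * (-1) + b * b * (-k / 2)


if __name__ == "__main__":
    for k in (Fraction(1), Fraction(5, 2), Fraction(-1, 3)):
        # Second and third columns should agree: double pole versus k'/2.
        print(k, c0_double_pole(k), dual_level(k) / 2)
```

Under these assumptions the double-pole coefficient reduces to $-k/(2(k+1)) = k'/2$ for every admissible $k$, in agreement with the second relation in (4.5).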
Equations (4.4) and (4.5) show that $C^+$, $C^0$, and $C^-$ satisfy the “dual” $\widehat{s\ell}(2)$ algebra of level $k'$. However, we keep the notation $C^{\pm,0}$ instead of $J'^{\pm,0}$ in order to stress that the former are constructed out of the $\widehat{s\ell}(2|1)$ currents. In addition, we have the current
\[ D = \tfrac{1}{\sqrt{2}}\, \partial\phi + H^+, \qquad D(z)\, D(w) = -\frac{(k+1)/2}{(z-w)^2}, \tag{4.6} \]
that commutes with the $\widehat{s\ell}(2)_{k'}$ algebra.

Remark 4.1. Under the $\widehat{s\ell}(2|1)$ spectral flow
\[ E^1(z) \to z^{\theta} E^1(z), \quad E^2(z) \to z^{-\theta} E^2(z), \quad F^1(z) \to z^{-\theta} F^1(z), \quad F^2(z) \to z^{\theta} F^2(z), \quad H^+(z) \to H^+(z) - \theta\, \frac{k}{z}, \]
the $\widehat{s\ell}(2)_{k'}$ currents undergo the spectral flow transform with the parameter $2\theta$:
\[ C^\pm(z) \to z^{\pm 2\theta} C^\pm(z), \qquad C^0 \to C^0 + \theta\, \frac{k'}{z}. \]
Note also that if the $\widehat{s\ell}(2|1)$ spectral flow is accompanied by the transformation $\phi(z) \to \phi(z) - \sqrt{2}\, \theta \log z$ in the auxiliary scalar sector, the $\widehat{s\ell}(2)_{k'}$ currents remain invariant.

Remark 4.2. The operators emerging in the operator product expansion (4.3) already commute with the $\widehat{s\ell}(2)$ subalgebra; they can be further reorganised by separating the part that commutes with $H^+$. The resulting operators then generate the coset W algebra $w = \widehat{s\ell}(2|1)/\widehat{g\ell}(2)$ mentioned in the Introduction. Its central charge
\[ c = -2\, \frac{2k+1}{k+2} = 2\, \frac{k'-1}{k'+2} \]
satisfies, obviously, $c + 1 + \frac{3k}{k+2} = 0$. The lowest spin (spin-2 and spin-3) generators of $w$ can be read off from (4.3) (with the first-order pole restored) and are expressed through the $\widehat{s\ell}(2|1)$ currents as
\[ W_2 = \frac{E^1 F^1 - E^2 F^2}{k+1} + \frac{E^{12} F^{12} + H^- H^-}{(k+1)(k+2)} + \frac{H^+ H^+}{k(k+1)} + \frac{\partial H^-}{k+2} \]
and (choosing it to be a W2 -primary) 1 1 12 1 2 12 1 2k 1 1 12 W3 = k+1 (k+2)(k+4)(3k+2) 2 (k + 4)E ∂F + E F F − 2 E ∂F + 21 (k + 4)E 2 ∂F 2 − − F 12 E 1 E 2 + H − E 1 F 1 + H − E 2 F 2 − − 2k H + E 12 F 12 + −
− k+2 + k H ∂H
−
1 2 (k
−
k+4 + 2 2 2 + − − k H E F − − kH H H 1 1 12 12 1 1 2 (k + 4)∂E F + 2 ∂E F
−
2(k+4) H 3k 2
k+4 + 1 1 k H E F + + +
H H
+ 4)∂E F 2 − ∂H + H − − 21 ∂ 2 H − − 16 (k + 4)∂ 2 H + . 2
No higher-spin currents are generated for k = −3 (i.e., the W3 · W3 operator product is closed to W2 and W3 ), and similarly with more negative integer values: for k = −n, the highest spin is n.
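The two expressions for the central charge of $w$ quoted in Remark 4.2, and the balance $c + 1 + \frac{3k}{k+2} = 0$, can be confirmed with exact rational arithmetic. The sketch below assumes only the duality relation $(k+1)(k'+1) = 1$ and the formulas as stated in the remark; the function names are illustrative.

```python
from fractions import Fraction


def dual_level(k):
    """Level k' dual to k, obtained by solving (k + 1)(k' + 1) = 1 (k != -1)."""
    k = Fraction(k)
    return -k / (k + 1)


def coset_c_from_k(k):
    """Central charge of the coset W algebra written through k."""
    k = Fraction(k)
    return -2 * (2 * k + 1) / (k + 2)


def coset_c_from_dual(k_dual):
    """The same central charge written through the dual level k'."""
    k_dual = Fraction(k_dual)
    return 2 * (k_dual - 1) / (k_dual + 2)


def balance(k):
    """c + 1 + 3k/(k+2): coset part, one u(1) current, and affine sl(2) at level k."""
    k = Fraction(k)
    return coset_c_from_k(k) + 1 + 3 * k / (k + 2)


if __name__ == "__main__":
    for k in (Fraction(1), Fraction(3), Fraction(1, 2)):
        print(k, coset_c_from_k(k), coset_c_from_dual(dual_level(k)), balance(k))
```

Both forms of $c$ should coincide and the balance should vanish for every $k$ away from the poles $k = -1$, $-2$.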
4.2. From $\widehat{s\ell}(2|1)_k$ to $\widehat{s\ell}(2)_{k'}$ representations. We now consider the $\widehat{s\ell}(2)_{k'}$ representation furnished by the modes $C^{\pm,0}_n$ of the currents constructed in (4.2). We start with an $\widehat{s\ell}(2|1)_k$ module, which we take to be a twisted Verma module $P_{h_-,h_+,k;\theta}$ (see Appendix 6) with the twisted highest-weight vector $|h_-, h_+, k; \theta\rangle$. Let also $E_\sigma$ be the Fock module of the free current $\partial\phi$ with the highest-weight vector $|e^{\frac{\sigma}{\sqrt{2}}\phi}\rangle$ corresponding to the operator $e^{\frac{\sigma}{\sqrt{2}}\phi}$. Then $P_{h_-,h_+,k;\theta} \otimes \bigoplus_{n\in\mathbb{Z}} E_{\sigma+2n}$ carries a representation of $\widehat{s\ell}(2)_{k'}$. We now construct the $\widehat{s\ell}(2)_{k'}$ highest-weight vector, which we temporarily denote by $|h_-, h_+; \theta; \sigma\rangle$, by tensoring the highest-weight vectors in these modules.

Lemma 4.3. The $\widehat{s\ell}(2)_{k'}$ state
\[ |h_-, h_+; \theta; \sigma\rangle = |h_-, h_+, k; \theta\rangle \otimes \big|e^{\frac{\sigma}{\sqrt{2}}\phi}\big\rangle \tag{4.7} \]
is a twisted relaxed highest-weight state.

We recall [17] that a twisted relaxed highest-weight state $|j', \Lambda, k'; \theta'\rangle_{s\ell(2)}$ satisfies the annihilation conditions
\[ C^+_{\geq 1+\theta'}\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)} = 0, \qquad C^-_{\geq 1-\theta'}\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)} = 0, \qquad C^0_{\geq 1}\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)} = 0, \tag{4.8} \]
the eigenvalue condition
\[ C^0_0\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)} = \big(j' - \tfrac{k'}{2}\theta'\big)\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)}, \tag{4.9} \]
and the relation
\[ C^-_{-\theta'}\, C^+_{\theta'}\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)} = \Lambda\, |j', \Lambda, k'; \theta'\rangle_{s\ell(2)}. \tag{4.10} \]
We will write $|j', \Lambda, k'\rangle_{s\ell(2)}$ for the “untwisted” state $|j', \Lambda, k'; 0\rangle_{s\ell(2)}$. When there are no more independent relations (no singular vectors are factored over), the module spanned by the $\widehat{s\ell}(2)_{k'}$ generators acting on $|j', \Lambda, k'; \theta'\rangle$ is called the relaxed Verma module $R_{j',\Lambda,k';\theta'}$ [17]. The Sugawara dimension of the twisted relaxed highest-weight vector $|j', \Lambda, k'; \theta'\rangle_{s\ell(2)}$ is given by
\[ \Delta' = \frac{j'^2 + j' + \Lambda}{k' + 2} - \theta' j' + \frac{k'}{4}\, \theta'^2. \tag{4.11} \]
Proof of the Lemma. Let $U_{h_-,h_+;\theta}$ be the operator corresponding to the twisted highest-weight vector $|h_-, h_+, k; \theta\rangle$ (see (B.3)). It follows that the $\widehat{s\ell}(2)_{k'}$ currents $C^{\pm,0}$ develop the following poles in the operator product with the operator $X_{h_-,h_+;\theta;\sigma} = U_{h_-,h_+;\theta}\, e^{\frac{\sigma}{\sqrt{2}}\phi}$ corresponding to (4.7):
\[ C^+(z)\, X_{h_-,h_+;\theta;\sigma}(0) \sim z^{2\theta-1-\sigma}, \qquad C^-(z)\, X_{h_-,h_+;\theta;\sigma}(0) \sim z^{-2\theta-1+\sigma}. \tag{4.12} \]
This gives precisely the twisted relaxed highest-weight conditions for the state in (4.7). We now choose $\sigma = 2\theta$ for simplicity (taking $\sigma$ different from $2\theta$ would result in the spectral flow transform of this state and, thus, in the twist of the entire module). We
then find that the $|h_-, h_+; \theta; 2\theta\rangle$ state constructed in (4.7) satisfies condition (4.10) with $\theta' = 0$ and
\[ \Lambda = -\frac{1}{(k+1)^2}\, (k + 1 + h_+ - h_-)(h_+ + h_-). \tag{4.13} \]
We also see that the eigenvalue of $C^0_0$ on $|h_-, h_+; \theta; 2\theta\rangle$ is $j' = \frac{h_+}{k+1}$. Thus, $|h_-, h_+; \theta; 2\theta\rangle$ is the relaxed highest-weight state
\[ |h_-, h_+; \theta; 2\theta\rangle = \Big| \tfrac{h_+}{k+1},\ -\tfrac{1}{(k+1)^2} (k + 1 + h_+ - h_-)(h_+ + h_-),\ k' \Big\rangle_{s\ell(2)} \tag{4.14} \]
and the freely generated module is the relaxed Verma module $R_{\frac{h_+}{k+1},\ \frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\ k'}$.

The dependence of the space $\bigoplus_n E_{2\theta+2n}$ on $\theta$ is only mod $\mathbb{Z}$, and for the integer $\theta$ that we consider in what follows, we have the space $E_* = \bigoplus_n E_{2n}$. The $\widehat{s\ell}(2)_{k'}$ representations on $P_{h_-,h_+,k;\theta} \otimes E_*$ are distinguished by the eigenvalues $\xi$ of the current in (4.6). Let $X_\xi$ be the Fock module over the Heisenberg algebra of Fourier modes of the $D$ current with the highest-weight vector $|\xi\rangle$ such that $D_0|\xi\rangle = \xi|\xi\rangle$ and $D_{\geq 1}|\xi\rangle = 0$. It follows that
\[ D_0\, |h_-, h_+, k; \theta\rangle \otimes \big|e^{\sqrt{2}(\theta+n)\phi}\big\rangle = (h_+ - (k+1)\theta - n)\, |h_-, h_+, k; \theta\rangle \otimes \big|e^{\sqrt{2}(\theta+n)\phi}\big\rangle, \tag{4.15} \]
while as regards the $\widehat{s\ell}(2)_{k'}$ algebra, the state $|h_-, h_+, k; \theta\rangle \otimes |e^{\sqrt{2}(\theta+n)\phi}\rangle$ is a twisted relaxed highest-weight state with the twist $\theta' = 2n$. We thus expect the decomposition into a sum of spectral-flow transformed $\widehat{s\ell}(2)_{k'}$ representations,
\[ P_{h_-,h_+,k;\theta} \otimes E_* = \bigoplus_{n\in\mathbb{Z}} R_{\frac{h_+}{k+1},\ \frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\ k';\, 2n} \otimes X_{h_+-(k+1)\theta-n} \otimes \dots \tag{4.16} \]
(where the dots denote $\widehat{s\ell}(2)_k$ representations). Since the parameters of the $\widehat{s\ell}(2)_{k'}$ modules appearing on the right-hand side are insensitive to the $\widehat{s\ell}(2|1)$ spectral flow parameter $\theta$, this induces the correspondence
\[ P_{h_-,h_+,k;\bullet} \rightsquigarrow R_{\frac{h_+}{k+1},\ \frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\ k';\,\bullet} \tag{4.17} \]
between spectral-flow orbits of $\widehat{s\ell}(2|1)_k$ and $\widehat{s\ell}(2)_{k'}$ modules.

An interesting question is whether this correspondence can be extended to a functor; answering this involves, in particular, investigating the correspondence between singular vectors appearing in the modules related by the correspondence. We now show that the $\rightsquigarrow$ arrow descends to submodules generated from a class of the so-called “charged” singular vectors. The relaxed Verma modules may contain singular vectors such that the corresponding quotients are the usual Verma or twisted Verma modules. These are the charged singular
vectors [17], which occur in the module built on the vector $|j', \Lambda, k'\rangle_{s\ell(2)}$ whenever $\Lambda = \Lambda_{ch}(n, j')$ for $n \in \mathbb{Z}$, where
\[ \Lambda_{ch}(n, j') = n(n+1) + 2nj', \qquad n \in \mathbb{Z}. \tag{4.18} \]
When (4.18) holds, the charged singular vector reads
\[ |C(n, j', k')\rangle_{s\ell(2)} = \begin{cases} (C^-_0)^{-n}\, |j', \Lambda_{ch}(n, j'), k'\rangle_{s\ell(2)}, & n \leq -1, \\ (C^+_0)^{n+1}\, |j', \Lambda_{ch}(n, j'), k'\rangle_{s\ell(2)}, & n \geq 0. \end{cases} \tag{4.19} \]
This state satisfies the usual Verma-module highest-weight conditions for $n \leq -1$ and the twisted Verma highest-weight conditions with the twist parameter $\theta' = 1$ for $n \geq 1$ [17]. Remarkably, we see from (4.13) and (4.18) that a charged singular vector occurs in the relaxed Verma module $R_{\frac{h_+}{k+1},\ \frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\ k'}$ if and only if
\[ h_+ = -h_- - (k+1)n \qquad \text{or} \qquad h_+ = h_- - (k+1)(n+1), \tag{4.20} \]
which are the conditions (B.11) for the existence of charged singular vectors (B.12) and (B.14) in the $\widehat{s\ell}(2|1)_k$ Verma modules $P_{h_-,h_+,k;\bullet}$ (in the notations of [7], such conditions on the highest weight state lead to class IV and class V representations). Thus,

Lemma 4.4. The relaxed Verma module $R_{\frac{h_+}{k+1},\ \frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\ k'}$ generated from the state constructed in (4.7) contains a charged singular vector if and only if the $\widehat{s\ell}(2|1)$ Verma module $P_{h_-,h_+,k;\theta}$ contains a charged singular vector.

In addition to the “existence” result asserted in the lemma, the charged singular vectors (4.19) in the $\widehat{s\ell}(2)_{k'}$ representation on $P_{h_-,h_+,k;\theta} \otimes E_*$ actually evaluate as the respective $\widehat{s\ell}(2|1)_k$ charged singular vectors (B.12) or (B.14). Indeed, consider the representation on $P_{h_-,-h_--(k+1)n,k;\theta} \otimes E_*$ and let $n \leq -1$. Then, up to the factor $(k+1)^n$,
\[ (C^-_0)^{-n}\, |h_-, -h_- - (k+1)n, k; \theta\rangle \otimes \big|e^{\sqrt{2}\theta\phi}\big\rangle = E^2_{\theta+n} \dots E^2_{\theta-1} \cdot F^1_{\theta+n+1} \dots F^1_{\theta}\, |h_-, -h_- - (k+1)n, k; \theta\rangle \otimes \big|e^{\sqrt{2}(\theta+n)\phi}\big\rangle, \tag{4.21} \]
which is precisely the corresponding charged singular vector in (B.14). Next, let $n \geq 0$. Note that in this case, the charged singular vector $(C^+_0)^{n+1}\, |j', n(n+1) + 2nj', k'\rangle_{s\ell(2)}$ satisfies the highest-weight conditions that are twisted by 1 with respect to the $\widehat{s\ell}(2)$ Verma-module highest-weight conditions. It is immediate to see that (again, up to a nonvanishing factor)
\begin{align*}
(C^+_0)^{n+1}\, |h_-, -h_- - (k+1)n, k; \theta\rangle \otimes \big|e^{\sqrt{2}\theta\phi}\big\rangle &= E^1_{-\theta-n-1} \dots E^1_{-\theta-1} \cdot F^2_{-\theta-n} \dots F^2_{-\theta}\, |h_-, -h_- - (k+1)n, k; \theta\rangle \otimes \big|e^{\sqrt{2}(\theta+n+1)\phi}\big\rangle \\
&= E^1_{-\theta-n-1}\, \big|C^{(+)}(n, h_-, k; \theta)\big\rangle \otimes \big|e^{\sqrt{2}(\theta+n+1)\phi}\big\rangle. \tag{4.22}
\end{align*}
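The equivalence behind Lemma 4.4, namely that $\Lambda$ from (4.13) equals $\Lambda_{ch}(n, j')$ from (4.18) with $j' = h_+/(k+1)$ exactly when one of the conditions (4.20) holds, can be checked on sample values with exact arithmetic. This is only a numerical sanity check under the formulas as quoted; the function names and sample weights are our own choices.

```python
from fractions import Fraction


def relaxed_parameters(h_minus, h_plus, k):
    """Return (j', Lambda): j' = h+/(k+1) and Lambda as in (4.13)."""
    h_minus, h_plus = Fraction(h_minus), Fraction(h_plus)
    big_k = Fraction(k) + 1
    j_prime = h_plus / big_k
    lam = -(big_k + h_plus - h_minus) * (h_plus + h_minus) / big_k ** 2
    return j_prime, lam


def lambda_charged(n, j_prime):
    """Lambda_ch(n, j') = n(n+1) + 2 n j', Eq. (4.18)."""
    return n * (n + 1) + 2 * n * j_prime


def has_charged_vector(h_minus, h_plus, k, n):
    """True when Lambda of (4.13) coincides with Lambda_ch(n, j')."""
    j_prime, lam = relaxed_parameters(h_minus, h_plus, k)
    return lam == lambda_charged(n, j_prime)


if __name__ == "__main__":
    h_minus, k = Fraction(1, 3), Fraction(2)  # illustrative sample values
    for n in (-2, -1, 0, 1, 3):
        first = -h_minus - (k + 1) * n          # first condition in (4.20)
        second = h_minus - (k + 1) * (n + 1)    # second condition in (4.20)
        print(n, has_charged_vector(h_minus, first, k, n),
              has_charged_vector(h_minus, second, k, n))
```

For fixed $n$ the condition is quadratic in $h_+$, so the two values in (4.20) exhaust its solutions; a generic $h_+$ makes the check fail.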
1 Now, unless k + 1 − 2h− = 0, the state E−θ−n−1 |C (+) (n, h− , k; θ) generates the same module as the charged singular vector of which it is a descendant, since (+) 1 1 E−θ−n−1 Fθ+n+1 C (n, h− , k; θ) = (k + 1 − 2h− )C (+) (n, h− , k; θ) .
A similar analysis applies to the charged $\widehat{s\ell}(2)_{k'}$ singular vectors in $P_{h_-,h_--(k+1)n,k;\theta} \otimes E_*$, which again evaluate as the $\widehat{s\ell}(2|1)$ charged singular vectors. Namely, whenever $h_+ = h_- - (k+1)n$, the corresponding $\widehat{s\ell}(2)_{k'}$ module $R_{\frac{h_-}{k+1}-n,\ (n-1)(\frac{2h_-}{k+1}-n),\ k'}$ contains the charged singular vector $|C(n-1, \frac{h_-}{k+1} - n, k')\rangle_{s\ell(2)}$. When $n \leq 0$, it is given by the action of $(C^-_0)^{-n+1}$, which evaluates as $E^2_{\theta+n-1}\, |C^{(-)}(n, h_-, k; \theta)\rangle$ times a primary state in the $\phi$ sector. Unless $2h_- = k + 1$, this $\widehat{s\ell}(2|1)$ vector generates the same module as the charged singular vector $|C^{(-)}(n, h_-, k; \theta)\rangle$. When $n \geq 1$, the $\widehat{s\ell}(2)_{k'}$ charged singular vector $|C(n-1, \frac{h_-}{k+1} - n, k')\rangle_{s\ell(2)}$ is given by the action of $(C^+_0)^n$ on the highest-weight vector and evaluates precisely as $|C^{(-)}(n, h_-, k; \theta)\rangle$ (times a $\phi$-sector state).

4.3. From $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ to $\widehat{s\ell}(2|1)_k$ representations. We now build $\widehat{s\ell}(2|1)_k$ representations by combining $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ (and the “auxiliary” $\widehat{s\ell}(2)_1$) representations. The construction is summarised in Theorem 4.5, while Sect. 4.3.1–4.3.2 explain how one arrives at this result. As regards the $\widehat{s\ell}(2)_{k'}$ representations, we build on the result of the previous subsection, where we saw that the $\widehat{s\ell}(2|1)_k \rightsquigarrow \widehat{s\ell}(2)_{k'}$ correspondence produces relaxed Verma modules from the $\widehat{s\ell}(2|1)_k$ Verma modules. We can therefore expect that the correspondence acting in the opposite direction should start with relaxed Verma modules. This proves to be the case, as we see in what follows.

4.3.1. Choosing representation spaces: the $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_1$ sector. We recall from (2.6)–(2.7) that the vertex operators $\mathbb{C}^2(z)$ map between $\widehat{s\ell}(2)_k$ modules as
\[ \mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z) : V_{j,k} \to V_{j-\frac12,k} \otimes z^{\frac{-j-1}{p}}\, \mathbb{C}((z)) \oplus V_{j+\frac12,k} \otimes z^{\frac{j}{p}}\, \mathbb{C}((z)). \tag{4.23} \]
This gives rise to a chain of $\widehat{s\ell}(2)_k$ modules $V_{j+\frac{n}{2},k}$, $n \in \mathbb{Z}$, with the spins differing from each other by (half-)integers. For the future convenience, we rewrite (4.23) for these modules
\[ \mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z) : V_{j+\frac{n}{2},k} \to V_{j+\frac{n-1}{2},k} \otimes z^{\frac{-j-1-\frac{n}{2}}{p}}\, \mathbb{C}((z)) \oplus V_{j+\frac{n+1}{2},k} \otimes z^{\frac{j+\frac{n}{2}}{p}}\, \mathbb{C}((z)). \tag{4.24} \]
In the auxiliary sector, let $F_\mu$ be the Fock space of the auxiliary current $\partial f(z)$ with the highest-weight vector $|\mu\rangle$ such that
\[ (\partial f)_0\, |\mu\rangle = \sqrt{2}\, \mu\, |\mu\rangle. \tag{4.25} \]
We then have the vertex operator action
\[ \mathbb{C}^2_1(z) : F_\mu \to F_{\mu+\frac12} \otimes z^{\mu}\, \mathbb{C}((z)) + F_{\mu-\frac12} \otimes z^{-\mu}\, \mathbb{C}((z)). \tag{4.26} \]
[Figure: lattice of sites in the (n, m) plane, with n on the horizontal axis and m on the vertical axis.]
Fig. 1.
The $\widehat{s\ell}(2|1)$ fermions constructed in terms of the vertex operators would change both $n$ and $m$ in $V_{j+\frac n2,k}\otimes F_{\mu+\frac m2}$. However, $\mu$ is changed by a half-integer simultaneously with the $\widehat{s\ell}(2)_k$ spin $j$ changed by a half-integer. This involves only an index-2 sublattice of the $\mathbb{Z}\times\mathbb{Z}\ni(n,m)$ lattice in Fig. 1 (alternatively, one could choose to work with the other half of the lattice sites).
Our aim is to combine Eqs. (4.24) and (4.26) with the action of $\widehat{s\ell}(2)_{k'}$ vertex operators such that the $\widehat{s\ell}(2|1)$ fermions $E^1$, $E^2$, $F^1$, and $F^2$ then act on a space of the form
$$\bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k}\otimes R^{(n)}\otimes\bigoplus_{m\in\mathbb{Z}} F_{\mu-\frac n2+m} \tag{4.27}$$
(where $R^{(n)}$ are some $\widehat{s\ell}(2)_{k'}$ modules). The very existence of such a space endowed with the representations induced from the vertex operator action is not obvious a priori, because the vertex operator action in each sector gives rise to the spaces $z^{\nu}\mathbb{C}((z))$ with non-integral $\nu$ that are different for different terms (recall that $p = k+2 \in \mathbb{C}\setminus\{0,1\}$ and $j\in\mathbb{C}$ in (4.24)); it does not therefore induce a representation on a space of the form (4.27) unless the modules $R^{(n)}$ are judiciously chosen. A crucial point making such a choice possible is the quantum group trace in the construction of the $\widehat{s\ell}(2|1)$ fermions.

4.3.2. Choosing representation spaces: the $\widehat{s\ell}(2)_{k'}$ sector. As we saw in Sect. 4.2 from the construction that is in a certain sense inverse to the present one, the $\widehat{s\ell}(2)_{k'}$ representations have to include the relaxed Verma modules. As a hint in properly choosing the relaxed Verma module $R^{(0)} = R_{j',\Lambda,k';\theta'}$ in (4.27), let us assume that the product of an $\widehat{s\ell}(2)_k$ highest-weight state, an $\widehat{s\ell}(2)_{k'}$ relaxed highest-weight state, and an $\widehat{s\ell}(2)_1$ highest-weight state,
$$|j,k\rangle\otimes|j',\Lambda,k';\theta'\rangle_{\widehat{s\ell}(2)}\otimes|\mu\rangle \in V_{j,k}\otimes R_{j',\Lambda,k';\theta'}\otimes F_\mu,$$
is a twisted highest-weight state with respect to $\widehat{s\ell}(2|1)_k$ (see (B.3)) tensored with a highest-weight vector in the Fock module $A_a$ over the free current $A(z)$ in Eq. (2.17); we define the highest-weight vectors $|a\rangle_A \in A_a$ such that
$$A_0\,|a\rangle_A = \sqrt{2(k+1)}\;a\,|a\rangle_A. \tag{4.28}$$
P. Bowcock, B. L. Feigin, A. M. Semikhatov, A. Taormina
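Assuming, thus, that $|j,k\rangle\otimes|j',\Lambda,k';\theta'\rangle_{\widehat{s\ell}(2)}\otimes|\mu\rangle = |h_-,h_+,k;\theta\rangle\otimes|a\rangle_A$, we evaluate the eigenvalues of the $\widehat{s\ell}(2|1)$ Cartan generators (and also of $A_0$, see (2.17)) on this vector. Using (2.16) and (2.17), we obtain the eigenvalues
$$H_0^+ \approx (k+1)j' - k\left(\mu - \tfrac{\theta'}2\right), \tag{4.29}$$
$$A_0 \approx \sqrt{2(k+1)}\,\left(j' - \tfrac{k'\theta'}2 - \mu\right), \tag{4.30}$$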
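whence $h_+ = (1+k)j' + k\left(\theta - \mu + \tfrac12\theta'\right)$. We now compare the dimensions using Eq. (2.31) that we have established for our $\widehat{s\ell}(2|1)_k$ generators. The balance of dimensions of the highest-weight vectors reads
$$\frac{j(j+1)}{k+2} + \underbrace{\frac{j'(j'+1)+\Lambda}{k'+2} - j'\theta' + \frac{k'\theta'^2}{4}}_{(4.11)} + \mu^2 = \underbrace{\frac{j^2-(j'(k+1))^2}{k+1} + 2\left(\mu-\tfrac{\theta'}2\right)j'(k+1) - k\left(\mu-\tfrac{\theta'}2\right)^2}_{(B.6)} + (k+1)\left(j' - \tfrac{k'\theta'}2 - \mu\right)^2. \tag{4.31}$$
This fixes $\Lambda$ equal to $\Lambda(j,j')$ given by¹
$$\Lambda(j,j') = -\bigl(1 + j' - (k'+1)j\bigr)\bigl(j' + (k'+1)j\bigr). \tag{4.32}$$
Generalising this, we now take the modules $R^{(n)}$ in (4.27) to be $R^{(n)} = R_{j'-\frac n2,\Lambda_n(j,j'),k'}$ with
$$\Lambda_n(j,j') = -\bigl(1 - (k'+1)j + j'\bigr)\bigl(j' + (k'+1)j - n\bigr). \tag{4.33}$$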
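In accordance with (4.11), the Sugawara dimension $\Delta_n$ of $|j'-\tfrac n2,\Lambda_n(j,j'),k'\rangle$ is
$$\Delta_n = \frac{\bigl(1-(k'+1)j+\frac n2\bigr)\bigl(\frac n2-(k'+1)j\bigr)}{k'+2}, \tag{4.34}$$
and therefore, evaluating
$$\Delta_{n+1}-\Delta_n-\frac{3/4}{k'+2} = \frac{n/2}{k'+2}-\frac{j}{k+2} \quad\text{and}\quad \Delta_{n-1}-\Delta_n-\frac{3/4}{k'+2} = \frac{-n/2-1}{k'+2}+\frac{j}{k+2}$$
(where $\frac{3/4}{k'+2}$ is the Sugawara dimension of $|\tfrac12,k'\rangle$ corresponding to the $\mathbb{C}^2(z)$ vertex operator), we find the exponents in the $\widehat{s\ell}(2)_{k'}$ vertex operator action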
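$$\mathbb{C}^2(z)\otimes\mathbb{C}^2_q(z):\; R_{j'-\frac n2,\Lambda_n(j,j'),k'} \to R_{j'-\frac{n-1}2,\Lambda_{n-1}(j,j'),k'}\otimes z^{\frac{-\frac n2-1}{p'}+\frac jp}\,\mathbb{C}((z)) \;\oplus\; R_{j'-\frac{n+1}2,\Lambda_{n+1}(j,j'),k'}\otimes z^{\frac{n/2}{p'}-\frac jp}\,\mathbb{C}((z)), \tag{4.35}$$
where $p' = k'+2$. Remarkably, the exponents $\frac{-n/2-1}{p'}+\frac jp$ and $\frac{n/2}{p'}-\frac jp$ appearing here add up to (half-)integers with the exponents from (4.24).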
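The cancellation of the non-integral parts rests on $1/p + 1/p' = 1$, which follows from $(k+1)(k'+1)=1$. The sketch below, which assumes this duality relation and the exponents as read off (4.24) and (4.34), recomputes the two total exponents with exact fractions; the helper name `vertex_exponents` is hypothetical.

```python
from fractions import Fraction as Fr


def vertex_exponents(j, k, n):
    """Total z-exponents for the n -> n-1 and n -> n+1 maps, as a pair."""
    kp = 1 / (k + 1) - 1              # dual level, assuming (k+1)(k'+1) = 1
    p, pp = k + 2, kp + 2             # p = k + 2, p' = k' + 2
    a = (kp + 1) * j

    def delta(m):
        # Sugawara dimension (4.34) of the relaxed highest-weight state in R^(m)
        return (1 - a + Fr(m, 2)) * (Fr(m, 2) - a) / pp

    w = Fr(3, 4) / pp                 # dimension of |1/2, k'>
    down_r = delta(n - 1) - delta(n) - w
    up_r = delta(n + 1) - delta(n) - w
    down_v = (-j - 1 - Fr(n, 2)) / p  # exponents from (4.24)
    up_v = (j + Fr(n, 2)) / p
    return down_v + down_r, up_v + up_r


if __name__ == "__main__":
    # The pair should equal (-n/2 - 1, n/2), independently of j and k.
    print(vertex_exponents(Fr(2, 5), Fr(-1, 3), 3))
```

The dependence on $j$ and $k$ drops out of both sums, which is the content of the remark above.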
¹ We note that (4.32) implies
$$\Lambda\bigl(j,\tfrac{h_+}{k+1}\bigr) = -\frac{(k+1+h_+-h_-)(h_++j)}{(k+1)^2},$$
which reproduces Eq. (4.13) obtained in constructing $\widehat{s\ell}(2)_{k'}$ states from a given $\widehat{s\ell}(2|1)_k$ state and, thus, suggests that the constructions in Sects. 4.3 and 4.2 are parts of the direct and the inverse functors.
4.3.3. The vertex operator action. The last observation is a crucial point that, irrespective of the preceding motivations, allows us to construct the space carrying an $\widehat{s\ell}(2|1)$ action.

Theorem 4.5. For every pair $(j,j')\in\mathbb{C}\times\mathbb{C}$, there is an $\widehat{s\ell}(2|1)_k$ representation on the space
$$N_{j,j'} = \bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'}\otimes\bigoplus_{m\in\mathbb{Z}} F_{m-\frac n2}, \tag{4.36}$$
where $V_{j,k}$ is an $\widehat{s\ell}(2)_k$ module with the highest-weight vector $|j,k\rangle$, $R_{j',\Lambda,k'}$ is an $\widehat{s\ell}(2)_{k'}$ representation with the relaxed highest-weight vector $|j',\Lambda,k'\rangle_{\widehat{s\ell}(2)}$, and $F_\mu$ is the Fock space of the auxiliary current with the highest-weight vector $|\mu\rangle$ defined in (4.25).

Remark 4.6. The range of the $n$ summation in (4.36) can be restricted to a subset of $\mathbb{Z}$ depending on the $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ modules involved, as we will see in what follows.

To show (4.36), it only remains to check that combining (4.24), (4.35), and (4.26), we are left with the Laurent spaces $z^{\nu}\mathbb{C}((z))$ with integer $\nu$. Indeed, putting together (4.24) and (4.35) and recalling the trace in (2.9), we arrive at
$$\mathbb{C}^2(z)\otimes\mathbb{C}^2(z):\; V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'} \to V_{j+\frac{n-1}2,k}\otimes R_{j'-\frac{n-1}2,\Lambda_{n-1}(j,j'),k'}\otimes z^{-\frac n2-1}\,\mathbb{C}((z)) \;\oplus\; V_{j+\frac{n+1}2,k}\otimes R_{j'-\frac{n+1}2,\Lambda_{n+1}(j,j'),k'}\otimes z^{\frac n2}\,\mathbb{C}((z)). \tag{4.37}$$
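We next combine this with the auxiliary sector from (4.26) and, thus, obtain the vertex operator action
$$\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)\otimes\mathbb{C}^2_1(z):\; V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'}\otimes F_{\mu-\frac n2+m} \to \Bigl(V_{j+\frac{n-1}2,k}\otimes R_{j'-\frac{n-1}2,\Lambda_{n-1}(j,j'),k'}\otimes z^{-\frac n2-1}\,\mathbb{C}((z)) \oplus V_{j+\frac{n+1}2,k}\otimes R_{j'-\frac{n+1}2,\Lambda_{n+1}(j,j'),k'}\otimes z^{\frac n2}\,\mathbb{C}((z))\Bigr)\otimes\Bigl(z^{\mu-\frac n2+m}F_{\mu-\frac n2+m+\frac12} \oplus z^{-\mu+\frac n2-m}F_{\mu-\frac n2+m-\frac12}\Bigr). \tag{4.38}$$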
This indeed involves the $z^{\nu}\mathbb{C}((z))$ spaces with all $\nu$ being integer mod $\mu$, and therefore, induces an $\widehat{s\ell}(2|1)$ representation on the space $\bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'}\otimes\bigoplus_{m\in\mathbb{Z}} F_{\mu+m-\frac n2}$. This space actually depends only on the fractional part of $\mu$, and this dependence is nothing but the overall (non-integral) spectral flow; we can therefore set $\mu = 0$, which gives the space $N_{j,j'}$ in Eq. (4.36) endowed with the structure of an $\widehat{s\ell}(2|1)_k$ representation. (Choosing between integral and half-integral $\mu$ corresponds to Ramond and Neveu–Schwarz sectors.)

Remark 4.7. In the auxiliary sector modules in (4.36), we can replace $m-\frac n2$ with $m-\frac12$ for odd $n$ and with $m$ for even $n$; therefore, for odd and even $n$ we have the respective spaces $\bigoplus_{m\in\mathbb{Z}} F_{m+\frac12} = M_{\frac12,1}$ and $\bigoplus_{m\in\mathbb{Z}} F_m = M_{0,1}$ that are the spin-$\frac12$ and spin-0 irreducible
representations of the $\widehat{s\ell}(2)_1$ algebra introduced in (3.1). Thus, the $\widehat{s\ell}(2|1)$ representation space can be described as a sum of tensor products of three $\widehat{s\ell}(2)$ representations:
$$N_{j,j'} = \bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'}\otimes M_{\frac{\varepsilon(n)}2,1},\qquad \varepsilon(n) = n \bmod 2. \tag{4.39}$$
Replacing $\varepsilon(n)$ with $1-\varepsilon(n)$ gives the Neveu–Schwarz representation space $N^{1/2}_{j,j'}$.

Remark 4.8. It is obvious from the above that $N_{j,j'}$ is in fact a representation of $\widehat{D}(2|1;k')_k$.

5. Decomposition of Representations and Character Identities

In this section, we decompose the $\widehat{s\ell}(2|1)_k$ representations constructed as in (4.39) and check the decomposition formulas by using character identities. In Sect. 5.1, we explain the general strategy of decomposing the representations and obtaining the corresponding character identities. In Sect. 5.2, we calculate the characters of both sides of the decomposition formula for the corresponding (relaxed) Verma modules and show these characters to be identical. In Sect. 5.3, we outline the steps leading from the Verma-module case to the respective irreducible representations. We specialise to the admissible representations and, further, to the “principal admissible” ones. The relevant decomposition formula is given in (5.25). As we do not give a direct proof that the representations on the right-hand side are indeed the corresponding irreducible $\widehat{s\ell}(2|1)_{\frac1u-1}$ representations, we invoke the character sumrules derived independently. These are given in Sect. 5.4, where we first have to explain how the characters of the twisted representations are identified with the $\widehat{s\ell}(2|1)_{\frac1u-1}$ characters known from [7]. As a result, the sumrules in Eqs. (5.55)–(5.57) are found to be precisely the character identities for (5.25). We conclude the section with a remark on the modular properties of the characters involved.

5.1. Decomposing the $\widehat{s\ell}(2|1)$ representation. The $\widehat{s\ell}(2|1)_k$ representation on the space $N_{j,j'}$ constructed in the previous section is by no means irreducible, since all the $\widehat{s\ell}(2|1)$ generators commute with the current $A(z)$ constructed in (2.17).
Thus, $N_{j,j'}$ decomposes as $\bigoplus_\lambda P(\lambda)\otimes A_{a(\lambda)}$, where $P(\lambda)$ are some $\widehat{s\ell}(2|1)_k$ modules and $A_a$ are the Fock modules over the free scalar; $\lambda$ labels different eigenvalues that $A_0$ has on the highest-weight vector of each $A$ module. Evaluating the $\widehat{s\ell}(2|1)$ highest-weight parameters, we find that the different $\widehat{s\ell}(2|1)$ representations $P(\lambda)$ are the spectral flow transform of each other. For all these modules, the highest-weight parameters, except the twist, are the same. Taking for definiteness $V_{j+\frac n2,k}$ to be the Verma modules (and $R_{j'-\frac n2,\Lambda_n(j,j'),k';\theta'}$ the twisted relaxed Verma modules), we thus arrive at the decomposition
$$\bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k';\theta'}\otimes M_{\frac{\varepsilon(n)}2,1} = \bigoplus_{\theta\in\mathbb{Z}} P_{j,(k+1)j',k;2\mu-\theta'+\theta}\otimes A_{j'-\theta}, \tag{5.1}$$
where $A_{j'-\theta}$ are Fock modules (see (4.28)). A priori, $P_{j,(k+1)j',k;2\mu-\theta'+\theta}$ are some $\widehat{s\ell}(2|1)$ representations with the respective twisted highest-weight vectors $|j,(k+1)j',k;2\mu-\theta'+\theta\rangle$. That they are in fact the corresponding twisted Verma
modules is confirmed in Sect. 5.2 by showing that the characters on both sides of (5.1) are identical.²
In calculating the characters, we take the trace $\mathrm{Tr}\; z^{H_0^-}\zeta^{H_0^+}q^{L_0^{\rm Sug}}\,y^{\frac1{\sqrt{2(k+1)}}A_0}\,q^{\frac12(AA)_0}$ on the right-hand side of the decomposition formula and then use Eqs. (2.16), (2.17), and (2.31) to rewrite this in terms of the $\widehat{s\ell}(2)_k\oplus\widehat{s\ell}(2)_{k'}\oplus\widehat{s\ell}(2)_1$ Cartan generators and the energy-momentum tensors,
$$\mathrm{Tr}\; z^{H_0^-}\zeta^{H_0^+}q^{L_0^{\rm Sug}}\,y^{\frac1{\sqrt{2(k+1)}}A_0}\,q^{\frac12(AA)_0} = \mathrm{Tr}\; z^{J^0_0}q^{L_0^{\rm Sug}}\,\bigl(\zeta^{k+1}y\bigr)^{J'^0_0}q^{L'^{\rm Sug}_0}\,\bigl(\zeta^{-k}y^{-1}\bigr)^{j^0_0}q^{l_0^{\rm Sug}} \tag{5.2}$$
(with the respective Sugawara energy-momentum tensor for each algebra). The trace on each side of the last formula is taken over the space on the complementary side of (5.1). The resulting character identities are therefore of the general form
$$\chi^{\widehat{s\ell}(2|1)_k}(q,z,\zeta)\,\chi^{A}(q,y) = \chi^{\widehat{s\ell}(2)_k}(q,z)\,\chi^{\widehat{s\ell}(2)_{k'}}(q,\zeta^{k+1}y)\,\chi^{\widehat{s\ell}(2)_1}(q,\zeta^{-k}y^{-1}) \tag{5.3}$$
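(where $\chi^{\widehat{s\ell}(2)_1}$ and $\chi^{A}$ are the standard expressions essentially given by the corresponding theta function; the “nontrivial” ingredients are the $\widehat{s\ell}(2|1)_k$ and $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ characters). In what follows, we freely interchange between $(q,z,\zeta,y)$ and $(\tau,\sigma,\nu,\rho)$ related by
$$q = e^{2i\pi\tau},\qquad z = e^{2i\pi\sigma},\qquad \zeta = e^{2i\pi\nu},\qquad y = e^{2i\pi\rho}, \tag{5.4}$$
where $\tau\in\mathbb{C}$, $\mathrm{Im}\,\tau>0 \Rightarrow |q|<1$, and $\sigma\in\mathbb{C}$, $\nu\in\mathbb{C}$, $\rho\in\mathbb{C}$.

5.2. Verma-module case. Comparing the characters of both sides of (5.1), we have to deal with formal objects, since already the relaxed Verma module character involves the nowhere convergent series $\delta(z,y) = \sum_{n\in\mathbb{Z}}\left(\frac zy\right)^n$: the characters of a twisted relaxed Verma module $R_{j',\Lambda,k';\theta'}$ and of the $\widehat{s\ell}(2)_k$ Verma module $V_{j,k}$ are
$$\mathrm{char}\,R_{j',\Lambda,k';\theta'}(u,q) \equiv \mathrm{Tr}_{R_{j',\Lambda,k';\theta'}}\, u^{J'^0_0}q^{L'^{\rm Sug}_0} = q^{\frac{k'}4\theta'^2}u^{-\frac{k'}2\theta'}\,\delta(uq^{-\theta'},1)\,\frac{q^{\frac{j'^2+j'+\Lambda}{k'+2}}\,(uq^{-\theta'})^{j'}}{\prod_{i\ge1}(1-q^i)^3} = q^{-\frac{k'}4\theta'^2}\,\delta(uq^{-\theta'},1)\,\frac{q^{\frac{j'^2+j'+\Lambda}{k'+2}}\,u^{j'}q^{-\theta'j'}}{\prod_{i\ge1}(1-q^i)^3}, \tag{5.5}$$
$$\mathrm{char}\,V_{j,k}(z,q) \equiv \mathrm{Tr}_{V_{j,k}}\, z^{J^0_0}q^{L_0^{\rm Sug}} = \frac{q^{\frac{j^2+j}{k+2}}\,z^j}{\vartheta_{1,1}(q,z)}, \tag{5.6}$$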
2 Strictly speaking, this requires the assumption that the modules in question are generated from the respective highest-weight vectors.
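with the Jacobi theta-function (B.9). In the auxiliary sector, it is more convenient to return to the description as in (4.36); the character of $F_\mu$ is simply
$$\mathrm{Tr}_{F_\mu}\, v^{\frac1{\sqrt2}(\partial f)_0}\,q^{\frac12(\partial f\partial f)_0} = \frac{v^{\mu}q^{\mu^2}}{\prod_{i\ge1}(1-q^i)}. \tag{5.7}$$
For the character of $\bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k}\otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'}\otimes\bigoplus_{m\in\mathbb{Z}}F_{\mu-\frac n2+m}$, we thus have (recall the dimension in Eq. (4.34))
$$\sum_{m,n\in\mathbb{Z}} \mathrm{Tr}_{V_{j+\frac n2,k}}\, z^{J^0_0}q^{L_0^{\rm Sug}}\;\mathrm{Tr}_{R_{j'-\frac n2,\Lambda_n(j,j'),k'}}\, u^{J'^0_0}q^{L'^{\rm Sug}_0}\;\mathrm{Tr}_{F_{\mu-\frac n2+m}}\, v^{\frac1{\sqrt2}(\partial f)_0}q^{\frac12(\partial f\partial f)_0} = \sum_{n\in\mathbb{Z}}\sum_{m\in\mathbb{Z}} \frac{q^{\frac{(j+\frac n2)^2+j+\frac n2}{k+2}}\,z^{j+\frac n2}}{\vartheta_{1,1}(q,z)}\;\delta(u,1)\,q^{\frac{\frac{n^2}4+\frac n2}{k'+2}-\frac{j(n+1)}{k+2}+\frac{(k'+1)j^2}{k+2}}\;\frac{u^{j'-\frac n2}\,q^{(\mu+m-\frac n2)^2}\,v^{\mu+m-\frac n2}}{\prod_{i\ge1}(1-q^i)^4}. \tag{5.8}$$
Shifting the summation variable as $n\to n+m$, we continue this as (with $\vartheta_{1,0}$ defined in (B.10))
$$= z^j u^{j'} v^{\mu}\,q^{\frac{j^2}{k+1}+\mu^2}\,\delta(u,1)\,\frac{\vartheta_{1,0}(q, z^{\frac12}u^{-\frac12}v^{-\frac12}q^{-\mu})\,\vartheta_{1,0}(q, z^{\frac12}u^{-\frac12}v^{\frac12}q^{\mu})}{\vartheta_{1,1}(q,z)\prod_{i\ge1}(1-q^i)^4}. \tag{5.9}$$
In accordance with (5.2), we now substitute $u = \zeta^{k+1}y$ and $v = \zeta^{-k}y^{-1}$. Then $uv = \zeta$, and therefore, the argument of the first theta function in the numerator becomes $z^{\frac12}\zeta^{-\frac12}q^{-\mu}$. In the second theta-function, we have $u^{-\frac12}v^{\frac12} = u^{-1}(uv)^{\frac12} = u^{-1}\zeta^{\frac12}$; we now use the formal $\delta$-function property $z\,\delta(z,y) = y\,\delta(z,y)$ to replace $u$ with 1. Thus,
$$\sum_{m,n\in\mathbb{Z}} \mathrm{Tr}_{V_{j+\frac n2,k}}\, z^{J^0_0}q^{L_0^{\rm Sug}}\;\mathrm{Tr}_{R_{j'-\frac n2,\Lambda_n(j,j'),k'}}\,(\zeta^{k+1}y)^{J'^0_0}q^{L'^{\rm Sug}_0}\;\mathrm{Tr}_{F_{\mu-\frac n2+m}}\,(\zeta^{k}y)^{-\frac1{\sqrt2}(\partial f)_0}q^{\frac12(\partial f\partial f)_0} = z^j\zeta^{\mu}\,q^{\frac{j^2}{k+1}+\mu^2}\,\delta(\zeta^{k+1}y,1)\,\frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12}q^{-\mu})\,\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12}q^{\mu})}{\vartheta_{1,1}(q,z)\prod_{i\ge1}(1-q^i)^4}.$$
We set $\mu = 0$ for simplicity (this parameter has the meaning of the overall spectral flow transform). On the right-hand side of (5.1), we have the character (B.8) of the twisted $\widehat{s\ell}(2|1)$ Verma module $P_{h_-,h_+,k;\theta}$, and the character of $A_a$ is given by
$$\mathrm{Tr}_{A_a}\, y^{\frac1{\sqrt{2(k+1)}}A_0}\,q^{L_0} = \frac{y^{a}\,q^{(k+1)a^2}}{\prod_{i\ge1}(1-q^i)}, \tag{5.10}$$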
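and thus the character of the right-hand side of (5.1) equals
$$\sum_{\theta\in\mathbb{Z}} \mathrm{Tr}_{P_{j,(k+1)j',k;\theta}}\, z^{H_0^-}\zeta^{H_0^+}q^{L_0^{\rm Sug}}\;\mathrm{Tr}_{A_{j'-\theta}}\, y^{\frac1{\sqrt{2(k+1)}}A_0}q^{\frac12(AA)_0} = \sum_{\theta\in\mathbb{Z}} z^j\,\zeta^{(k+1)j'-(k+1)\theta}\,q^{\frac{j^2}{k+1}-(k+1)j'^2+2\theta(k+1)j'-(k+1)\theta^2}\,y^{j'-\theta}\,q^{(k+1)(j'-\theta)^2}\,\frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12})\,\vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12})}{\vartheta_{1,1}(q,z)\prod_{i\ge1}(1-q^i)^4}$$
$$= z^j\,y^{j'}\zeta^{(k+1)j'}\,q^{\frac{j^2}{k+1}}\,\delta(\zeta^{k+1}y,1)\,\frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12})\,\vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12})}{\vartheta_{1,1}(q,z)\prod_{i\ge1}(1-q^i)^4}, \tag{5.11}$$
where we again can use the $\delta$-function to replace $y^{j'}\zeta^{(k+1)j'}$ with 1. This is identical with the character obtained for the left-hand side (with $\mu = 0$; a nonzero value of $\mu$, as well as of the twist $\theta'$ of the $R$ modules, is restored by the overall spectral flow transform). We thus conclude that there are twisted $\widehat{s\ell}(2|1)$ Verma modules on the right-hand side of (5.1).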
5.3. From Verma modules to irreducible representations. To obtain the decomposition formula of form (5.1) for other (e.g., irreducible) representations, one can start with appropriate Verma modules (e.g., those with dominant highest-weights, so as to arrive at the admissible representations eventually) and follow the corresponding BGG resolution, with (5.1) applied again to each term in the resolution. If the decomposition formula is of a functorial nature (i.e., the appearance of singular vectors is “synchronised” on both sides), one then arrives at a similar formula for the irreducible representations. Here, we follow this program only as far as the first step consisting in taking the quotients with respect to the charged singular vectors. In the end, we confirm the decomposition identities for irreducible representations by verifying the corresponding character identities.

We see from (4.18) and (B.11) that whenever a charged singular vector occurs in one (and hence in all) of the relaxed Verma modules $R_{j'-\frac n2,\Lambda_n(j,j'),k'}$, $n\in\mathbb{Z}$, each of the twisted $\widehat{s\ell}(2|1)$ Verma modules $P_{j,(k+1)j',k;\theta}$, $\theta\in\mathbb{Z}$, contains a charged singular vector. (Proceeding in the “inverse” direction, we saw the same correspondence in Sect. 4.2, where we made it even more explicit.) For Weyl modules, we then obtain the result stated in the Introduction by taking the appropriate quotients of (5.1). In what follows, we discuss the form that the decomposition takes for a class of irreducible representations. We find which character identities (“sumrules” of type (5.3)) must follow from this formula, and derive them independently. This is a strong argument in favour of the functorial properties of decomposition (5.1).

We now explain which identity is to be tested at the level of character sumrules. We take a rational level $k$ in the positive zone expressed through two positive integers $t$ and $u$ via
$$k+2 = \frac tu \ne 1. \tag{5.12}$$
We require the relaxed Verma module $R_{j',\Lambda_0(j,j'),k'}$ “at the origin” in (5.1) to have the charged singular vector $|C(\ell,j',k')\rangle_{\widehat{s\ell}(2)}$; more precisely, we satisfy Eq. (4.18) by choosing $(k'+1)j = 1 + j' + \ell$, $\ell\in\mathbb{Z}$.
For both $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$, we are interested in the case where the irreducible representations are admissible [27]. We recall here that the quotient $R_{j',\Lambda,k'}/V$ with respect to the submodule generated from the charged singular vector $|C(\ell,j',k')\rangle_{\widehat{s\ell}(2)}$, $\ell\in\mathbb{Z}$, is a Verma module if $\ell\ge0$ or a twisted Verma module (with the twist 1) if $\ell\le-1$, and the spin of the highest-weight vector of the quotient module is
$$j_V = j' + \ell + (k'+2)\,\frac{1-\theta}2,\qquad\text{where}\quad \theta = \begin{cases}1, & \ell\ge0,\\ 0, & \ell\le-1.\end{cases}$$
We now require that the spins of $V_{j,k}$ and $R_{j',\Lambda_0(j,j'),k'}/V$ take the “admissible” values,
$$j = \frac{r-1}2 - \frac tu\,\frac{s-1}2,\qquad j' + \ell + \frac{t}{t-u}\,\frac{1-\theta}2 = \frac{r'-1}2 - \frac{t}{t-u}\,\frac{s'-1}2 \tag{5.13}$$
with
$$1\le r\le t-1,\qquad 1\le r'\le t-1,\qquad 1\le s\le u,\qquad 1\le s'\le t-u \tag{5.14}$$
(which obviously requires $t\ge u+1$). Conditions (5.13) give
$$(r' + s - s' + \theta)\,t = (r + r')\,u, \tag{5.15}$$
implying that $r + r'$ is a multiple of $t$, which means $r + r' = t$ once $r$ and $r'$ are in the “admissible range” given above. We then have $r' = u - s + s' - \theta$, and therefore (choosing $\ell\ge0$, so as to have the “untwisted” Verma modules in the quotients of the relaxed Verma modules, after which we can set $\ell = 0$), we obtain the representations
$$\bigoplus_{n} V_{\frac{t-u+n+1-s'}2-\frac{t-u}u\frac{s-1}2,\;\frac{t-2u}u}\otimes V'_{\frac{u-n-1-s}2-\frac{u}{t-u}\frac{s'-1}2,\;\frac{2u-t}{t-u}}\otimes M_{\frac{\varepsilon(n)}2,1}. \tag{5.16}$$
Here, $V$ and $V'$ are Verma modules; their quotients are the respective admissible representations $M$ and $M'$ for $n$ such that
$$-t + s' - s + 1 \le n - u + 1 \le s' - s - 1. \tag{5.17}$$
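The index bookkeeping behind (5.15)–(5.17) can be replayed on small integers. The sketch below assumes the choice $\theta = 1$, $\ell = 0$ made in the text, so that $r' = u - s + s' - 1$ and $r = t - r'$, and compares the window (5.17) with the direct requirement $1 \le r + n \le t-1$; the function names are illustrative only.

```python
def window_from_labels(t, u, s, sp):
    """Values of n for which the shifted label r + n stays in 1..t-1."""
    r_prime = u - s + sp - 1          # from (5.15) with theta = 1
    r = t - r_prime                   # r + r' = t
    return [n for n in range(-2 * t, 2 * t + 1) if 1 <= r + n <= t - 1]


def window_from_5_17(t, u, s, sp):
    """Values of n allowed by the inequality (5.17)."""
    lo = -t + sp - s + 1
    hi = sp - s - 1
    return [n for n in range(-2 * t, 2 * t + 1) if lo <= n - u + 1 <= hi]


if __name__ == "__main__":
    t, u = 7, 3
    for s in range(1, u + 1):
        for sp in range(1, t - u + 1):
            a = window_from_labels(t, u, s, sp)
            b = window_from_5_17(t, u, s, sp)
            print(s, sp, a == b, len(a))
```

In every case the two windows coincide and contain $t-1$ values of $n$, in line with the count quoted in Remark 5.1.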
Shifting the summation index, we thus arrive at the following combination of admissible representations:
$$N^{\rm adm}_{s,s'} = \bigoplus_{n=s'-s+1-t}^{s'-s-1} M_{\frac{t+n-s'}2-\frac{t-u}u\frac{s-1}2,\;\frac{t-2u}u}\otimes M'_{\frac{-n-s}2-\frac{u}{t-u}\frac{s'-1}2,\;\frac{2u-t}{t-u}}\otimes M_{\frac{\varepsilon(n+u-1)}2,1},\qquad 1\le s\le u,\quad 1\le s'\le t-u. \tag{5.18}$$
Remark 5.1. That the representations outside the “admissible range” decouple can also be understood by starting with free-field realisations of $\widehat{s\ell}(2)$ modules. The corresponding formula of type (5.1) then gives some $\widehat{s\ell}(2|1)$ representations (whose structure depends on the free-field representations chosen for $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$). Now, admissible representations of either $\widehat{s\ell}(2)_k$ or $\widehat{s\ell}(2)_{k'}$ are singled out from the free-field spaces as the cohomology of an appropriate “BRST” complex (for example, a Felder-like complex [2], but in general other free-field resolutions are needed). The vertex operators between the individual $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ representations then extend to a mapping of the entire complex; this allows one to single out trivial vertex operators (see, e.g., [6]) that induce zero mapping on the cohomology (as a rule, nontrivial mappings via vertex operators exist if and only if the corresponding fusion rule coefficient is nonzero). In the admissible case, the vertex operators become trivial as soon as the spin $j+\frac n2$ or $j'-\frac n2$ goes outside the corresponding Kač table (which occurs simultaneously for the unprimed and primed modules in (5.16)–(5.18)). This allows us to consistently restrict to only a finite number (exactly $t-1$) of terms on the left-hand side of (5.1).

For each pair $(s,s')$ in the corresponding range, the space $N^{\rm adm}_{s,s'}$ decomposes into a direct sum over the $\widehat{s\ell}(2|1)$ spectral flow orbit. The $\widehat{s\ell}(2|1)$ representations involved carry the labels
$$L_{h_-,h_+,k;\theta} = L_{\frac{t-u+1-s'}2-\frac{t-u}u\frac{s-1}2,\;\frac{t-u+1-s'}2-\frac{t-u}u\frac{s+1}2,\;\frac tu-2;\,m},\qquad m\in\mathbb{Z}.$$
One expects these to be the corresponding irreducible $\widehat{s\ell}(2|1)$ representations. As we have noted, proving this by studying the BGG resolution is a separate problem which we do not address here; however, this is confirmed by the character identities derived independently.

Specialising to the integrable (unitary) $\widehat{s\ell}(2)_{k'}$ representations, we take the level $k' = \frac{2u-t}{t-u}$ to be an integer, whence $t = u+1$, which implies $s' = 1$ and $k' = u-1$.
It follows that the level $k$ is a fraction of the form
$$k = \frac1u - 1, \tag{5.19}$$
and therefore, belongs to a subclass of $\widehat{s\ell}(2)$ “admissible” levels called “principal admissible”, which are parametrised as
$$u(k+h^\vee) = k^0 + h^\vee,\qquad k^0\in\mathbb{Z}_+, \tag{5.20}$$
where $h^\vee = 2$ is the dual Coxeter number of $s\ell(2)$ [27, 30]. In the present instance, we have $k^0 = u-1$. Note that (5.19) is also principal admissible with respect to $\widehat{s\ell}(2|1)$, with $k^0 = 0$, since $h^\vee_{s\ell(2|1)} = 1$. For each $s\in[1,u]$, we now have the space
$$N^{\rm int}_s = \bigoplus_{n=1-s-u}^{-s} M_{\frac{u+n}2-\frac1u\frac{s-1}2,\;\frac1u-1}\otimes M_{\frac{-n-s}2,\;u-1}\otimes M_{\frac{\varepsilon(n+u-1)}2,1} = \bigoplus_{n=1-s}^{u-s} M_{\frac n2-\frac1u\frac{s-1}2,\;\frac1u-1}\otimes M_{\frac{u-s-n}2,\;u-1}\otimes M_{\frac{\varepsilon(n-1)}2,1}. \tag{5.21}$$
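The statements around (5.19)–(5.21) reduce to short arithmetic. As a small sketch (assuming only the parametrisation (5.20) and the duality $(k+1)(k'+1)=1$; the function names are illustrative), one can confirm the values of $k^0$ and $k'$ and the number of terms in (5.21):

```python
from fractions import Fraction as Fr


def principal_k0(k, u, h_dual):
    """Solve u (k + h^vee) = k0 + h^vee for k0, as in (5.20)."""
    return u * (k + h_dual) - h_dual


def dual_level(k):
    """Dual level k', assuming (k + 1)(k' + 1) = 1."""
    return 1 / (k + 1) - 1


def integrable_sum_range(u, s):
    """Summation range of the second form of (5.21)."""
    return list(range(1 - s, u - s + 1))


if __name__ == "__main__":
    for u in range(1, 6):
        k = Fr(1, u) - 1
        print(u, principal_k0(k, u, 2), principal_k0(k, u, 1), dual_level(k))
```

For each $u$ this gives $k^0 = u-1$ with $h^\vee = 2$, $k^0 = 0$ with $h^\vee = 1$, and $k' = u-1$; the sum in (5.21) has $u$ terms for every $s$.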
We remind the reader that the notation $X_{j,k}$ for each module indicates the spin $j$ and the level $k$. When we are dealing with the characters in what follows, it will be useful to label each admissible representation of $\widehat{s\ell}(2)_{\frac tu-2}$ with the spin
$$j = \frac{r-1}2 - \frac tu\,\frac{s-1}2,\qquad 1\le r\le t-1,\quad 1\le s\le u, \tag{5.22}$$
by $r$ and $s$ in addition to $t$ and $u$, as $M_{t,u,r,s}$. In this notation, the integrable representations are $M_{t,1,r,1} = I_{t,r}$. In fact, we encounter the $\widehat{s\ell}(2)_{k'}$ integrable representations $M_{u+1,1,r',1} = I_{u+1,r'}$. Then (after a convenient redefinition of the summation index) the space $N^{\rm int}_s$ is rewritten as
$$N^{\rm int}_s = \bigoplus_{r=1}^{u} M_{u+1,u,r,s}\otimes I_{u+1,u+1-r}\otimes I_{3,\varepsilon(r-s-1)+1}. \tag{5.23}$$
This is the space to be decomposed into $\widehat{s\ell}(2|1)_{\frac1u-1}$ representations.
In addition to only a finite number of irreducible representations appearing on the left-hand side of the decomposition formula as discussed above, another effect occurring for the representations under consideration is the periodicity under the spectral flow on the $\widehat{s\ell}(2|1)$ side, where the spectral flow (B.7) with $\theta = u$ produces an isomorphic representation. In the analogue of (5.1) for such representations $L_{h_-,h_+,k}$, we split the summation over the twists on the right-hand side by writing $\theta = u\alpha+\beta$ with $\alpha\in\mathbb{Z}$ and $\beta = 0,\dots,u-1$. Since $L_{h_-,h_+,k;u\alpha+\beta}\approx L_{h_-,h_+,k;\beta}$, the decomposition formula for $N^{\rm int}_s$ takes a remarkable form involving precisely $u$ non-isomorphic $\widehat{s\ell}(2|1)$ representations:
$$\bigoplus_{r=1}^{u} M_{u+1,u,r,s}\otimes I_{u+1,u+1-r}\otimes I_{3,\varepsilon(r-s-1)+1} = \bigoplus_{\beta=0}^{u-1} L_{\frac{u-s+1}{2u},\frac{u-s-1}{2u},\frac1u-1;\beta}\otimes\bigoplus_{\alpha\in\mathbb{Z}} A_{\frac{u-1-s}2-u\alpha-\beta}. \tag{5.24}$$
In what follows, we test this formula by verifying the corresponding character identities.

5.4. Sumrules for irreducible representation characters. We now study the character sumrules corresponding to Eq. (5.25).

5.4.1. Free-scalar and $\widehat{s\ell}(2)$ characters. In order to write down the character identities corresponding to (5.25), we first of all note that the sum of the Fock module characters over $\alpha$ on the right-hand side gives rise to a level-$u$ generalised theta function, namely,
$$\sum_{\alpha\in\mathbb{Z}} q^{u\left[\alpha-\frac{2\beta-(u-1-s)}{2u}\right]^2}\,y^{u\left[\alpha-\frac{2\beta-(u-1-s)}{2u}\right]} = \theta_{-2\beta+(u-1-s),u}(q,y) = \theta_{2\beta-(u-1-s),u}(q,y^{-1}), \tag{5.25}$$
where we used (5.10) and some standard properties of the generalised theta functions defined by
$$\theta_{\mu,\kappa}(q,\zeta) = \sum_{n\in\mathbb{Z}} q^{\kappa\left(n+\frac{\mu}{2\kappa}\right)^2}\zeta^{\kappa\left(n+\frac{\mu}{2\kappa}\right)} = \zeta^{\frac\mu2}\,q^{\frac{\mu^2}{4\kappa}}\,\vartheta_{1,0}(q^{2\kappa},\zeta^{\kappa}q^{\mu-\kappa}) \tag{5.26}$$
(with the Jacobi theta function from (B.10)). This accounts for the level-$u$ theta function in Eqs. (5.55)–(5.57).
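The reflection property used in the second equality of (5.25), $\theta_{-\mu,\kappa}(q,y) = \theta_{\mu,\kappa}(q,y^{-1})$, follows from relabelling $n\to-n$ in the series (5.26). A minimal numerical sketch, using a truncated sum at real $0<q<1$ and real positive second argument (the truncation `cutoff` and the sample values are arbitrary choices, not from the paper):

```python
def theta(mu, kappa, q, zeta, cutoff=40):
    """Truncated series for the generalised theta function (5.26)."""
    total = 0.0
    for n in range(-cutoff, cutoff + 1):
        e = n + mu / (2 * kappa)
        total += q ** (kappa * e * e) * zeta ** (kappa * e)
    return total


if __name__ == "__main__":
    u, s, beta = 4, 2, 3
    mu = 2 * beta - (u - 1 - s)
    q, y = 0.3, 1.7
    lhs = theta(-mu, u, q, y)
    rhs = theta(mu, u, q, 1.0 / y)
    print(lhs, rhs, abs(lhs - rhs))
```

The two truncated sums agree to floating-point accuracy, since the terms are mapped into each other one by one.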
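We also recall [27] the character formula corresponding to the admissible $\widehat{s\ell}(2)_k$ module $M_{t,u,r,s}$ with the level as in (5.12), namely,
$$\chi^{\widehat{s\ell}(2)}_{t,u,r,s}(\tau,\sigma) = \frac{\theta_{b_+,ut}(\tau,\frac\sigma u)-\theta_{b_-,ut}(\tau,\frac\sigma u)}{\theta_{1,2}(\tau,\sigma)-\theta_{-1,2}(\tau,\sigma)}, \tag{5.27}$$
where $b_\pm = \pm ur-(s-1)t$ (and, as before, $1\le r\le t-1$, $1\le s\le u$). Under the spectral flow, the $\widehat{s\ell}(2)$ characters transform as follows (see also [16] for the $\widehat{s\ell}(2)$ spectral flow properties):
$$\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s}(\tau,\sigma-\lambda\tau) = (-1)^{\lambda}\,q^{\frac{\lambda^2(u-1)}{4u}}\,z^{-\frac{\lambda(u-1)}{2u}}\,\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s+\lambda}(\tau,\sigma), \tag{5.28}$$
with $\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s+\lambda}$ understood as follows: writing $s+\lambda = u\alpha+\beta$ with $\alpha\in\mathbb{Z}$, $\beta = 0,\dots,u-1$, we have
$$\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s+\lambda}(\tau,\sigma) = \begin{cases}\chi^{\widehat{s\ell}(2)}_{u+1,u,r,(s+\lambda)_u}(\tau,\sigma) & \text{for } \alpha\in2\mathbb{Z},\\[2pt] -\chi^{\widehat{s\ell}(2)}_{u+1,u,u+1-r,(s+\lambda)_u}(\tau,\sigma) & \text{for } \alpha\in2\mathbb{Z}+1,\end{cases} \tag{5.29}$$
where $(x)_u$ denotes the residue mod $u$. The integrable $\widehat{s\ell}(2)_{u-1}$ representations are periodic under the spectral flow with period 2, while the spectral flow transform by 1 maps the representations into one another according to $r\to u+1-r$; we then have the character transformation formula
$$\chi^{\widehat{s\ell}(2)}_{u+1,1,r,1}(\tau,\sigma+\tau) = q^{-\frac{u-1}4}\,z^{-\frac{u-1}2}\,\chi^{\widehat{s\ell}(2)}_{u+1,1,u+1-r,1}(\tau,\sigma). \tag{5.30}$$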
5.4.2. Identification of the $\widehat{s\ell}(2|1)$ characters and the spectral flow. Turning to the right-hand side of (5.25), we next identify which of the $\widehat{s\ell}(2|1)_{\frac1u-1}$ characters known from our previous work on the subject [7] correspond to the module $L_{h_-,h_+,k;\beta}$ with twist $\beta$ and
$$h_- = \frac{u-s+1}{2u},\qquad h_+ = \frac{u-s-1}{2u}. \tag{5.31}$$
We recall an extensive study of $\widehat{s\ell}(2|1)_{\frac1u-1}$ characters provided in [?,?,?]; these characters are organised according to the eigenvalues that $H_0^-$ and $H_0^+$ have on the twist-zero state (see (B.3))
$$|H_-, H_+ = H_- - \tfrac\nu u, k\rangle \equiv |H_-, H_- - \tfrac\nu u, k; 0\rangle. \tag{5.32}$$
To see how these are related to (5.31), we observe that the same irreducible module can be generated from the twisted highest-weight state
$$|X^-(\nu,H_-,k)\rangle = \underbrace{E^1_{-\nu+1}\dots E^1_{-1}}_{\nu-1}\cdot\underbrace{F^2_{1-\nu}\dots F^2_0}_{\nu}\,|H_-, H_- - \tfrac\nu u, k; 0\rangle, \tag{5.33}$$
which is singled out by the fact that it satisfies annihilation conditions of type (B.13), namely $E^1_{-\nu}\approx0$, $E^2_{\nu}\approx0$, $F^1_{\nu}\approx0$, and $F^2_{1-\nu}\approx0$ (in the Verma module, the action of $E^1_{-\nu}$ on $X^-$ produces the charged singular vector, and hence, $E^1_{-\nu}|X^-\rangle = 0$ in the
irreducible representation; in this sense, therefore, $X^-$ is a pre-singular vector to (B.12)). We find the eigenvalues
$$H_0^-(X^-) = H_- - \tfrac12,\qquad H_0^+(X^-) = H_+ + \nu - \tfrac12 \tag{5.34}$$
that the Cartan generators have on this state. On the other hand, we see from (5.31) that
$$h_+ - h_- = -(k+1) = -\tfrac1u. \tag{5.35}$$
The $X^-$ state is then expressed through the twisted highest-weight state as
$$|X^-\rangle = F^2_{-\beta}\,|h_-, h_- - \tfrac1u, k; \beta\rangle, \tag{5.36}$$
and therefore,
$$H_0^-(X^-) = h_- - \tfrac12 = \frac{-s+1}{2u},\qquad H_0^+(X^-) = h_+ + \tfrac12 - \left(\tfrac1u-1\right)\beta = \frac{-s-1}{2u} + 1 - \left(\tfrac1u-1\right)\beta. \tag{5.37}$$
By comparing (5.37) and (5.34), we obtain
$$H_- = h_- = \frac{u-s+1}{2u},\qquad H_+ = h_+ - \frac\beta u = \frac{u-s-1-2\beta}{2u}. \tag{5.38}$$
This gives the parameters of the twist-zero highest-weight vectors in the $\widehat{s\ell}(2|1)$ modules entering the decomposition formula. All of these representations are obtained by taking the quotient of the Verma modules with two charged singular vectors, one of type (B.12) and the other of type (B.14), with the integers labelling these singular vectors being of the opposite signs (obviously, there exist other singular vectors that have to be factored over to obtain the irreducible representation). According to the values of $H_-$ and $H_+$, the irreducible representation characters in any chosen “sector” of the theory (Ramond, Neveu–Schwarz, etc.) at level $k = \frac1u-1$ are those of class IV ($\frac12u(u+1)$ characters) and of class V ($\frac12u(u-1)$ characters) [10, 7, 8]. Choosing the Ramond sector for definiteness, we have the following.

IV. The class IV constraints on the highest-weight state isospin and hypercharge are given by
$$2H_- + (k+1)m = 0\qquad\text{and}\qquad H_+ - H_- = m'(k+1), \tag{5.39}$$
where $0\le m'\le m\le u-1$ and $m, m'\in\mathbb{Z}_+$. For the corresponding Verma module, this implies the existence of the charged singular vectors $|C^{(-)}(-m',H_-,k)\rangle$ and $|C^{(+)}(m-m',H_-,k)\rangle$. The corresponding $\frac12u(u+1)$ irreducible representation characters are given by
$$\chi^{R,IV}_{m,m'}(q,z,\zeta) = q^{\frac{H_-^2-H_+^2}{k+1}}\,z^{H_-}\zeta^{H_+}\,F^R(q,z,\zeta)\cdot\frac{q^{-\frac u8}\,\eta(q^u)^3\,\vartheta_{1,1}(q^u, zq^{-m})}{\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{\frac12}q^{-m'})\,\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{-\frac12}q^{m'-m})}, \tag{5.40}$$
with
$$F^R(q,z,\zeta) = \frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12})\,\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12})}{\vartheta_{1,1}(q,z)\prod_{n\ge1}(1-q^n)^3} \tag{5.41}$$
being essentially the Verma module character (B.8). Note that only one of these characters (the vacuum representation $m = m' = 0$) is regular in the limit where $z\to1$.

V. The class V constraints on the highest-weight state isospin and hypercharge are given by
$$2H_- - (k+1)(M+M'+2) = 0\qquad\text{and}\qquad H_+ - H_- = -(M'+1)(k+1), \tag{5.42}$$
where $0\le M+M'\le u-2$ and $M, M'\in\mathbb{Z}_+$. The charged singular vectors in the corresponding Verma module are $|C^{(-)}(M'+1,H_-,k)\rangle$ and $|C^{(+)}(-M-1,H_-,k)\rangle$; the corresponding $\frac12u(u-1)$ irreducible representation characters are
$$\chi^{R,V}_{M,M'}(q,z,\zeta) = q^{\frac{H_-^2-H_+^2}{k+1}}\,z^{H_-}\zeta^{H_+}\,F^R(q,z,\zeta)\cdot\frac{-q^{-\frac u8}\,\eta(q^u)^3\,\vartheta_{1,1}(q^u, zq^{M+M'+2})}{\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{\frac12}q^{M'+1})\,\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{-\frac12}q^{M+1})}. \tag{5.43}$$
Note that $u-1$ of them are regular in the limit as $z\to1$ (they correspond to $M+M' = u-2$). As we will see momentarily, a given orbit of the spectral flow (B.7) involves the $\chi^{R,V}$ as well as the $\chi^{R,IV}$ type characters.

Lemma 5.2. The character of $L_{\frac{u-s+1}{2u},\frac{u-s-1}{2u},\frac1u-1;\beta}$ is given by
$$\chi^{L}_{\frac{u-s+1}{2u},\frac{u-s-1}{2u},\frac1u-1;\beta}(q,z,\zeta) = \begin{cases}\chi^{R,IV}_{s-1,u-1-\beta}(q,z,\zeta) & \text{for } u-1 < s+\beta,\\[2pt] \chi^{R,V}_{u-s-1-\beta,\beta}(q,z,\zeta) & \text{for } u-1\ge s+\beta\end{cases}$$
$$= q^{-(\beta+1)\frac{\beta+s}u}\,z^{\frac{1-s}{2u}}\,\zeta^{-\frac{1+2\beta+s}{2u}}\,F^R(q,z,\zeta)\cdot\frac{q^{-\frac u8}\,\eta(q^u)^3\,\vartheta_{1,1}(q^u, zq^{1-s})}{\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{\frac12}q^{\beta+1})\,\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{-\frac12}q^{-s-\beta})}. \tag{5.44}$$
Indeed, according to (5.32), the relation between $H_+$ and $H_-$ involves a strictly positive integer $\nu$, and therefore, one is naturally in a class V context. The labels $M$ and $M'$ are given by (5.42) with $H_\pm$ from (5.38),
$$M = u(H_-+H_+) - 1 = -s-1-\beta+u,\qquad M' = u(H_--H_+) - 1 = \beta. \tag{5.45}$$
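The labels in (5.45) can be recomputed from (5.38) and checked against the class V constraints (5.42). A minimal sketch with exact fractions, assuming $k+1 = 1/u$ and the reading of (5.42) with $M'$ in the hypercharge condition; the function name is illustrative:

```python
from fractions import Fraction as Fr


def class_v_labels(u, s, beta):
    """Return (H_-, H_+, M, M') from (5.38) and (5.45)."""
    h_minus = Fr(u - s + 1, 2 * u)
    h_plus = Fr(u - s - 1 - 2 * beta, 2 * u)
    m_label = u * (h_minus + h_plus) - 1
    m_prime = u * (h_minus - h_plus) - 1
    return h_minus, h_plus, m_label, m_prime


def satisfies_5_42(u, s, beta):
    """Check both class V constraints (5.42) at level k = 1/u - 1."""
    h_minus, h_plus, m_label, m_prime = class_v_labels(u, s, beta)
    k1 = Fr(1, u)  # k + 1
    first = 2 * h_minus - k1 * (m_label + m_prime + 2) == 0
    second = h_plus - h_minus == -(m_prime + 1) * k1
    return first and second


if __name__ == "__main__":
    u = 5
    for s in range(1, u + 1):
        for beta in range(u):
            _, _, m_label, m_prime = class_v_labels(u, s, beta)
            # M is non-negative exactly when s + beta <= u - 1
            print(s, beta, m_label, m_prime, satisfies_5_42(u, s, beta))
```

The constraints hold for all $s$ and $\beta$, while $M \ge 0$ only for $s+\beta\le u-1$; the remaining cases are the ones that Lemma 5.2 assigns to class IV.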
These can be translated into class-IV labels in the appropriate range in accordance with the spectral flow properties of the class IV and V characters, which we now consider.
The spectral flow (B.7) acts on the $\chi^{R,IV}_{m,m'}$ characters by shifting the labels as
$$m\to m,\qquad m'\to m'-\theta, \tag{5.46}$$
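and for some values of $\theta$, the parameter $m'$ may flow to a value outside the fundamental range $0\le m'\le m\le u-1$. This gives the class-V characters. Indeed, first note that it is sufficient to consider the spectral flow parameter $\theta$ in the range $0\le\theta\le u-1$, given the (quasi)-periodicity property of the class IV characters,
$$\chi^{R,IV}_{m+nu,m'+n'u} = (-1)^n\,\chi^{R,IV}_{m,m'}. \tag{5.47}$$
Then we have
$$\chi^{R,IV}_{m,m'}(\tau,\sigma,\nu+2\theta\tau) = q^{\frac{\theta^2(1-u)}u}\,\zeta^{\frac{\theta(1-u)}u}\,\chi^{R,IV}_{m,m'-\theta}(\tau,\sigma,\nu)\qquad\text{for}\quad 0\le m'-\theta\le m,$$
$$\chi^{R,IV}_{m,m'}(\tau,\sigma,\nu+2\theta\tau) = q^{\frac{\theta^2(1-u)}u}\,\zeta^{\frac{\theta(1-u)}u}\,\chi^{R,V}_{u-1-(m-m'+\theta),\theta-m'-1}(\tau,\sigma,\nu)\qquad\text{for}\quad m'+1\le\theta\le u-1-(m-m'),$$
$$\chi^{R,IV}_{m,m'}(\tau,\sigma,\nu+2\theta\tau) = q^{\frac{\theta^2(1-u)}u}\,\zeta^{\frac{\theta(1-u)}u}\,\chi^{R,IV}_{m,m'-\theta+u}(\tau,\sigma,\nu)\qquad\text{for}\quad u-(m-m')\le\theta\le u-1.$$
On the other hand, the spectral flow transform (B.7) acts on class V characters by shifting the representation labels as
$$M'\to M'+\theta,\qquad M\to M-\theta, \tag{5.48}$$
and one has
$$\chi^{R,V}_{M,M'}(\tau,\sigma,\nu+2\theta\tau) = q^{\frac{\theta^2(1-u)}u}\,\zeta^{\frac{\theta(1-u)}u}\,\chi^{R,V}_{M-\theta,M'+\theta}(\tau,\sigma,\nu)\qquad\text{for}\quad M-(u-2)\le\theta\le u-2-M',$$
$$\chi^{R,V}_{M,M'}(\tau,\sigma,\nu+2(u-1-M')\tau) = q^{\frac{(M'+1-u)^2(u-1)}u}\,\zeta^{\frac{(M'+1-u)(u-1)}u}\,\chi^{R,IV}_{u-2-M-M',0}(\tau,\sigma,\nu)\qquad\text{for}\quad \theta = u-1-M',$$
$$\chi^{R,V}_{M,M'}(\tau,\sigma,\nu+2\theta\tau) = q^{\frac{\theta^2(1-u)}u}\,\zeta^{\frac{\theta(1-u)}u}\,\chi^{R,V}_{M-\theta+u,M'+\theta-u}(\tau,\sigma,\nu)\qquad\text{for}\quad u-M'\le\theta\le u-1.$$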
5.4.3. Sumrules for $\widehat{s\ell}(2|1)$ characters. A key ingredient in the derivation of the character identities is provided by the decomposition formulas of the $u^2$ characters given above into $\widehat{s\ell}(2)$ characters at the same level $k = \frac1u-1$ [20], namely,
$$\chi^{R,IV}_{m,m'}(\tau,\sigma,\nu) = \sum_{i=0}^{u-1}\chi^{\widehat{s\ell}(2)}_{u+1,u,u-i,m+1}(\tau,\sigma)\,F^{IV}(\tau,\nu), \tag{5.49}$$
$$\chi^{R,V}_{M,M'}(\tau,\sigma,\nu) = \sum_{i=0}^{u-1}\chi^{\widehat{s\ell}(2)}_{u+1,u,u-i,u-1-M-M'}(\tau,\sigma)\,F^{V}(\tau,\nu), \tag{5.50}$$
where
$$F^{IV}(\tau,\nu) = \sum_{a=0}^{u-2} c^{(u-1)}_{i,X(i,a)}(\tau)\,\theta_{(u-1)(m-2m'+u)+uX(i,a),\,u(u-1)}(\tau,\tfrac\nu u), \tag{5.51}$$
$$F^{V}(\tau,\nu) = \sum_{a=0}^{u-2} c^{(u-1)}_{i,X(i,a)}(\tau)\,\theta_{(u-1)(M'-M)+uX(i,a),\,u(u-1)}(\tau,\tfrac\nu u). \tag{5.52}$$
In the above, $c^{(u-1)}_{i,j}(\tau)$ are the $\widehat{s\ell}(2)$ string functions at level $u-1$ [26] and $X(i,a) = (u-1)i + 2i\left(\frac u2-\left[\frac u2\right]\right) - 2a$, where $\left[\frac u2\right]$ is the integer part of $\frac u2$. The $\widehat{s\ell}(2)$ string functions at the level $u-1$ were first introduced by Kač and Peterson to rewrite $\widehat{s\ell}(2)_{u-1}$ characters in terms of generalised theta functions at level $u-1$, namely,
$$\chi^{\widehat{s\ell}(2)}_{u+1,1,r,1}(\tau,\nu) = \sum_{m=-u+2}^{u-1} c^{(u-1)}_{r-1,m}(\tau)\,\theta_{m,u-1}(\tau,\nu), \tag{5.53}$$
with the properties
$$c^{(k)}_{\ell,m}(\tau) = c^{(k)}_{\ell,-m}(\tau) = c^{(k)}_{k-\ell,k-m}(\tau) = c^{(k)}_{\ell,m+2nk}(\tau), \tag{5.54}$$
where $n\in\mathbb{Z}$ and $\ell\equiv m \bmod 2$. As is clear from (5.49) and (5.50), each $\widehat{s\ell}(2|1)$ character is decomposed in exactly $u$ $\widehat{s\ell}(2)$ characters $\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s}(\tau,\sigma)$ with $s$ fixed. More precisely, at a fixed value of $s$ in the range $1\le s\le u$, the $u$ $\widehat{s\ell}(2|1)$ characters which can be decomposed into the $u$ characters $\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s}(\tau,\sigma)$ are,
IV. $\chi^{R,IV}_{s-1,m'}$ with $0\le m'\le s-1$ ($s$ characters),
V. $\chi^{R,V}_{M,M'}$ with $M+M'=u-1-s$ ($u-s$ characters).
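The counting in the list above can be checked with a short enumeration. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the labels $M$, $M'$ and $m'$ are non-negative integers, which the list above does not state explicitly. Under that assumption each value of $s$ contributes $s + (u-s) = u$ characters, and all values of $s$ together give the $u^2$ characters mentioned at the start of this subsection.

```python
def characters_at_fixed_s(u, s):
    """Labels of the characters attached to a fixed s (assumes labels are >= 0)."""
    # Class IV: chi^{R,IV}_{s-1,m'} with 0 <= m' <= s-1  (s characters)
    class_iv = [("IV", s - 1, m_prime) for m_prime in range(0, s)]
    # Class V: chi^{R,V}_{M,M'} with M + M' = u-1-s  (u-s characters)
    class_v = [("V", M, u - 1 - s - M) for M in range(0, u - s)]
    return class_iv + class_v


def all_characters(u):
    """Collect the labels for every s in the range 1 <= s <= u."""
    labels = []
    for s in range(1, u + 1):
        labels.extend(characters_at_fixed_s(u, s))
    return labels


for u in (2, 3, 5):
    per_s = [len(characters_at_fixed_s(u, s)) for s in range(1, u + 1)]
    total = len(set(all_characters(u)))
    # Every s should give u labels, and the total should be u**2.
    print(u, per_s, total, total == u * u)
```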
Decomposition formulas (5.49) and (5.50) for the $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ characters are not only explicitly given in terms of characters of the subalgebra $\widehat{s\ell}(2)_{\frac{1}{u}-1}$, but they also encode some information on a dual algebra $\widehat{s\ell}(2)_{u-1}$ through the $\widehat{s\ell}(2)$ string functions at the level $u-1$. We were able to derive character identities for the values $u=2$ and $u=3$ by using standard, albeit somewhat tedious, manipulations on generalised theta functions. On the basis of these two cases, we were led to conjecture identities for arbitrary $u\in\mathbb{N}$. One may write $2u$ sumrules for Ramond $\widehat{s\ell}(2|1)$ characters at the level $k=\frac{1}{u}-1$ corresponding to the decomposition (5.25) for irreducible representations. As before, we use $(\lambda)_u$ to denote the residue modulo $u$ of $\lambda$, and $[\frac{\lambda}{u}]$ to denote the integer part of $\frac{\lambda}{u}$. The $2u$ sumrules read
\[ S^R_\lambda(\tau,\rho,\sigma,\nu):\quad A^R_\lambda(\tau,\rho,\sigma,\nu) = B^R_\lambda(\tau,\rho,\sigma,\nu), \qquad \lambda=0,\dots,2u-1, \tag{5.55} \]
where
\[ A^R_\lambda(\tau,\rho,\sigma,\nu) = \sum_{i=0}^{(\lambda)_u} \theta_{u-2i+\lambda,u}(\tau,-\rho)\,\chi^{R,IV}_{(\lambda)_u,i}(\tau,\sigma,\nu) + \sum_{i=(\lambda)_u+1}^{u-1} \theta_{u-2i+\lambda,u}(\tau,-\rho)\,\chi^{R,V}_{i-1-(\lambda)_u,\,u-1-i}(\tau,\sigma,\nu) \tag{5.56} \]
P. Bowcock, B. L. Feigin, A. M. Semikhatov, A. Taormina
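and
\[ B^R_\lambda(\tau,\rho,\sigma,\nu) = \theta_{1+\lambda,1}\bigl(\tau,\tfrac{u-1}{u}\nu-\rho\bigr) \sum_{i=0}^{N} \chi^{\widehat{s\ell}(2)}_{u+1,u,2i+1+(u-1-4i)[\frac{\lambda}{u}],(\lambda)_u+1}(\tau,\sigma)\,\chi^{\widehat{s\ell}(2)}_{u+1,1,u-2i,1}\bigl(\tau,\tfrac{\nu}{u}+\rho\bigr) \]
\[ \qquad {}+ \theta_{\lambda,1}\bigl(\tau,\tfrac{u-1}{u}\nu-\rho\bigr) \sum_{i=0}^{N'} \chi^{\widehat{s\ell}(2)}_{u+1,u,2i+2+(u-3-4i)[\frac{\lambda}{u}],(\lambda)_u+1}(\tau,\sigma)\,\chi^{\widehat{s\ell}(2)}_{u+1,1,u-(2i+1),1}\bigl(\tau,\tfrac{\nu}{u}+\rho\bigr), \tag{5.57} \]
with $u-2\le 2N\le u-1$ and $u-3\le 2N'\le u-2$. Note that the second sum in (5.56) is zero for $(\lambda)_u=u-1$. There is complete agreement between the sumrules (5.55) and the decomposition formula (5.25). The appearance of the level-$u$ theta functions was explained in (5.26). The theta functions at the level $\kappa=1$ represent the level-one $\widehat{s\ell}(2)$ integrable characters, since one has
\[ \frac{1}{\eta(\tau)}\,\theta_{r-1,1}(\tau,\nu) = \chi^{\widehat{s\ell}(2)}_{3,1,r,1}(\tau,\nu), \qquad r=1,2. \tag{5.58} \]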
Strictly speaking, Eq. (5.25) provides the first u sumrules, but the other set is obtained by transforming each of them under the flow, σ → σ − uτ, ν → ν + uτ, ρ → ρ − τ.
(5.59)
To make the correspondence between (5.55) and (5.25) transparent, it remains to recall Lemma 5.2. We end this section by observing that the above sumrules behave in a remarkable way under two particular flows of the variables $\sigma$, $\nu$, and $\rho$. We first note that the class IV and V characters transform as
\[ \chi^{R,IV}_{m,m'}(\tau,\sigma+u\tau,\nu+(u-2)\tau) = \begin{cases} P\,\chi^{R,IV}_{m,m'+1}(\tau,\sigma,\nu) & \text{for } m'\le m-1,\\ P\,\chi^{R,IV}_{u-1,0}(\tau,\sigma,\nu) & \text{for } m=m'=u-1,\\ P\,\chi^{R,V}_{0,u-2-m}(\tau,\sigma,\nu) & \text{for } m=m'\le u-2, \end{cases} \tag{5.60} \]
\[ \chi^{R,V}_{M,M'}(\tau,\sigma+u\tau,\nu+(u-2)\tau) = \begin{cases} P\,\chi^{R,V}_{M+1,M'-1}(\tau,\sigma,\nu) & \text{for } 0\le M\le u-3 \text{ and } 1\le M'\le u-2,\\ P\,\chi^{R,IV}_{u-2-M,0}(\tau,\sigma,\nu) & \text{for } 0\le M\le u-2 \text{ and } M'=0, \end{cases} \tag{5.61} \]
where
\[ P = (-1)^{u+1}\, q^{\frac{(u-1)^2}{u}}\, z^{\frac{u-1}{2}}\, \zeta^{-\frac{(u-1)(u-2)}{2u}}. \tag{5.62} \]
Remark 5.3. The $u$ terms in $A^R_\lambda$ (resp. $B^R_\lambda$) form an orbit under the flow
\[ \sigma\to\sigma+u\tau, \qquad \nu\to\nu+(u-2)\tau, \qquad \rho\to\rho+\tfrac{2}{u}\tau, \tag{5.63} \]
as can be readily checked with the help of the spectral flow formulas (5.60), (5.61), and (5.28). The invariance under (5.63) indicates that each expression $A^R_\lambda$ (equivalently, $B^R_\lambda$) potentially describes a character corresponding to a representation of the bigger affine Lie superalgebra $D(2|1;\alpha)$. Indeed, the spectral flow generator has isospin $H_-=1/2$, hypercharge $H_+=\frac{2-u}{2u}$, a $u(1)$ charge proportional to $-\frac{1}{u}$ in the direction orthogonal to $s\ell(2|1)$, and conformal weight 1. (This follows by comparing the quantum numbers of the vacuum representation of $\widehat{s\ell}(2|1)\oplus\widehat{u}(1)$ with those of the representation with character $\theta_{2u-2,u}(\tau,\frac{\rho}{u})\,\chi^{R,V}_{0,u-2}(\tau,\sigma,\nu)$, which are in the same spectral flow orbit and appear in the $\lambda=u$ sumrule.) If one extends the algebra $\widehat{s\ell}(2|1)\oplus\widehat{u}(1)$ by this spectral flow generator, one generates $D(2|1;\alpha)$, as can be most quickly understood by looking at the $D(2|1;\alpha)$ root diagram in Appendix C. Indeed, consider for example the first embedding of $s\ell(2|1)$ in $D(2|1;\alpha)$ in Table 1; the spectral flow generator can be identified with the current corresponding to the root $\alpha_1+\alpha_2$, which is needed in order to extend the $s\ell(2|1)$ root diagram to $D(2|1;\alpha)$.

We next note that for $\alpha\in\mathbb{Z}$ chosen in such a way that $0\le m+\lambda-u\alpha\le u-1$, we have
\[ \chi^{R,IV}_{m,m'}(\tau,\sigma-\lambda\tau,\nu+\lambda\tau) = \begin{cases} (-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\, \zeta^{-\frac{\lambda(u-1)}{2u}}\, \chi^{R,IV}_{m+\lambda-u\alpha,\,m'}(\tau,\sigma,\nu) & \text{for } m'\le m+\lambda-u\alpha,\\ (-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\, \zeta^{-\frac{\lambda(u-1)}{2u}}\, \chi^{R,V}_{m'-m-\lambda+u\alpha-1,\,u-1-m'}(\tau,\sigma,\nu) & \text{for } m'> m+\lambda-u\alpha. \end{cases} \tag{5.64} \]
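For class V characters, with $\alpha\in\mathbb{Z}$ chosen such that $0\le u-2-M-M'+\lambda-u\alpha\le u-1$, similarly,
\[ \chi^{R,V}_{M,M'}(\tau,\sigma-\lambda\tau,\nu+\lambda\tau) = \begin{cases} (-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\, \zeta^{-\frac{\lambda(u-1)}{2u}}\, \chi^{R,V}_{M-\lambda+u\alpha,\,M'}(\tau,\sigma,\nu) & \text{for } M-\lambda+u\alpha\ge 0,\\ (-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\, \zeta^{-\frac{\lambda(u-1)}{2u}}\, \chi^{R,IV}_{u-2-(M-\lambda+u\alpha)-M',\,u-1-M'}(\tau,\sigma,\nu) & \text{for } M-\lambda+u\alpha< 0. \end{cases} \tag{5.65} \]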
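Remark 5.4. The $\lambda'$th sumrule may be obtained from the $\lambda$th sumrule by the transformation
\[ \sigma\to\sigma-(\lambda'-\lambda)\tau, \qquad \nu\to\nu+(\lambda'-\lambda)\tau, \qquad \rho\to\rho-(\lambda'-\lambda)\tfrac{\tau}{u}. \tag{5.66} \]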
In particular, because of the quasi-periodicity of the level-$u$ theta functions and because $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ characters are periodic under the simultaneous shifts $\sigma\to\sigma-u\tau$ and $\nu\to\nu+u\tau$, one has the quasi-periodicity properties
\[ A^R_\lambda(\tau,\rho-2\tau,\sigma-2u\tau,\nu+2u\tau) = q^{-u}\,y^{u}\,z^{1-u}\,\zeta^{1-u}\,A^R_\lambda(\tau,\rho,\sigma,\nu), \]
\[ B^R_\lambda(\tau,\rho-2\tau,\sigma-2u\tau,\nu+2u\tau) = q^{-u}\,y^{u}\,z^{1-u}\,\zeta^{1-u}\,B^R_\lambda(\tau,\rho,\sigma,\nu), \tag{5.67} \]
and the set of $2u$ sumrules (5.55) is closed under the above spectral flow. Moreover, this set carries a unitary representation of the modular group, as can be explicitly checked by using the modular transformations of $\widehat{s\ell}(2)$ characters at integrable and fractional levels. A complete treatment of the modular properties must include the Neveu–Schwarz, the “super” Ramond and “super” Neveu–Schwarz sectors, which may be obtained as follows [20]. The Neveu–Schwarz sumrules are derived by flowing the Ramond sector sumrules according to
\[ S^R_\lambda(\tau,\rho,-\sigma-\tau,\nu) = q^{-\frac{1-u}{4u}}\, z^{-\frac{1-u}{2u}}\, S^{NS}_\lambda(\tau,\rho,\sigma,\nu), \tag{5.68} \]
while the “super” Ramond and Neveu–Schwarz sectors are given by, SλR,NS (τ, ρ, σ, ν). SλR,NS (τ, ρ, σ + 1, ν) =
(5.69)
It should be noted that the group of modular transformations and the group of spectral flow transformations on characters are combined into an “extended modular group” via the semi-direct product, in which the spectral flow transformations are the invariant subgroup; thus, the “extended modular group” representation can be induced from the spectral flow representation, and these are the transformations that close on the chosen set of $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ representations. The relevance of these facts to the representation theory of $D(2|1;\alpha)$ is beyond the present scope and is left for future work.

6. Conclusions

We have seen that a vertex operator extension of two $\widehat{s\ell}(2)$ algebras at levels $k$ and $k'$ satisfying the duality relation $(k+1)(k'+1)=1$ yields an interesting structure, the exceptional affine Lie superalgebra $D(2|1;\alpha)$. This novel construction should provide the setting needed to build some classes of $D(2|1;\alpha)$ representations whose characters are given by either side of the sumrules (5.55).

In this paper, however, we made use of the above vertex operator extension to construct representations of the $\widehat{s\ell}(2|1)$ subalgebra of $D(2|1;\alpha)$, and saw that they give sums of representations “twisted” by the $\widehat{s\ell}(2|1)$ spectral flow (5.1). We also derived the corresponding character identities relating $\widehat{s\ell}(2|1)_k$ characters to the constituent $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ characters, both for Verma modules (Sect. 5.2) and irreducible representations (Eq. (5.55)). The latter identities involve $\widehat{s\ell}(2|1)$ representations at admissible (non-integrable) level $k=\frac{1}{u}-1$, $u\in\mathbb{N}$, and relate them to representations of $\widehat{s\ell}(2)_{\frac{1}{u}-1}$ and $\widehat{s\ell}(2)_{u-1}$. Interestingly enough, the admissible $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ representations are related to the admissible $\widehat{s\ell}(2)_{\frac{1}{u}-1}$ representations, whose characters are periodic under the spectral flow with period $2u$, and to the integrable $\widehat{s\ell}(2)_{u-1}$ representations, which are themselves periodic under the spectral flow with period 2 [16]. The interplay of admissible and integrable representations within a bigger algebraic structure is quite
remarkable and should be further exploited in the way any duality is: in this context for instance, it should relate the representation theory of $D(2|1;\alpha)$ at integer and fractional levels. We hope to return to these issues elsewhere.

Another important point raised by the relation between representations and by the corresponding character identities is that of the closure under modular transformations. The $\widehat{s\ell}(2|1)$ characters appearing in these identities do carry a unitary representation of the modular group [23], and using this information, one can explicitly check that the $2u$ functions $A(\lambda)$, $\lambda=0,\dots,2u-1$, on the left-hand side of the sumrules (5.55) also carry a unitary representation of the modular group. It is therefore tempting to identify these functions with the characters corresponding to a particular class of $D(2|1;\alpha)$ representations satisfying the requirements for $\widehat{s\ell}(2|1)\oplus\widehat{u}(1)$ to be conformally embedded in $D(2|1;\alpha)$, as discussed in Sect. 3.

A very interesting general problem is to verify the functorial properties of the correspondence established between the $\widehat{s\ell}(2)_k\oplus\widehat{s\ell}(2)_{k'}$ and $\widehat{s\ell}(2|1)_k$ representations; in the simpler case of the relation between $\widehat{s\ell}(2)$ and $N=2$ superconformal representations [17], a similar relation is the equivalence of categories modulo the spectral flow (i.e., the equivalence of representation theories of two algebras obtained by extending the universal enveloping of $\widehat{s\ell}(2)$ and $N=2$, respectively, by the spectral flow operator). We have seen that in the present case, the spectral flow plays a very similar rôle, and thus an interesting problem is whether we again can construct a similar functor; the two cases are actually related by the Hamiltonian reduction functor
\[ \begin{array}{ccc} \widehat{s\ell}(2)_k\oplus\widehat{s\ell}(2)_{k'} & \longleftrightarrow & \widehat{s\ell}(2|1)_k \\ \downarrow & \text{Hamiltonian reduction [4,3,22]} & \downarrow \\ \widehat{s\ell}(2)_{k'-1} & \stackrel{\text{mod spectral flow}}{\longleftrightarrow} & N=2 \end{array} \]
which may be extended to an argument demonstrating the functorial properties of the correspondence found in this paper. We note that the character identity corresponding to (1.2) turns out to be equivalent to the character identity derived from the correspondence between the $\widehat{s\ell}(2)$ and $N=2$ Verma modules [16]; this identity has also been known in a different representation-theory context [28].

As another future research direction, we note the possibility to use various free-field (and other) realisations of $\widehat{s\ell}(2|1)$ [35, 9] in the construction of Sects. 4.1–4.2; it would be interesting, for example, to relate the corresponding screening operators and interpret this relation in terms of the respective quantum groups.

Finally, it is worth exploring the geometric interpretation of the construction for $\widehat{s\ell}(2|1)$ found in this paper. Despite the complication generated by the presence of the odd integer $n=1$ on the right-hand side of (1.3), we expect to be able to proceed similarly to the $n=0$ case, relevant to the coupling of matter to gravity, where no auxiliary scalar is needed and the basic geometric setting involves the loop group $\widehat{SL}(2,\mathbb{C})=\{\gamma: S^1\to SL(2,\mathbb{C})\}$. In that case, the analogue of $\widehat{s\ell}(2|1)$ is an extended algebra consisting of the semi-direct product of two commuting $\widehat{s\ell}(2)$ algebras (corresponding to the left and the right actions on the loop group) with levels $k_1$ and $k_2$ constrained by Eq. (1.3) with $n=0$, and the contracted vertex operators $\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)$, which are functions on the group and can be viewed as matrix elements of the evaluation representation (in contrast with the $\widehat{s\ell}(2|1)$ case, $\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)$ do commute). The constraint on $k_1$ and $k_2$ actually
follows from imposing the Knizhnik–Zamolodchikov equation on the $\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)$ vertex operators. The vacuum representation of the extended algebra can be described in terms of distributions on $G_+$, the subgroup of the loop group consisting of mappings that extend to the origin from $S^1=\{z\in\mathbb{C}: |z|=1\}$, and is given by a sum of Weyl modules of the two $\widehat{s\ell}(2)$ algebras. A different piece of this picture is provided by distributions on the double quotient $G_-\backslash\widehat{SL}(2,\mathbb{C})/G_+$, where $G_-$ is the subgroup of $\widehat{SL}(2,\mathbb{C})$ consisting of loops which are the boundary values of holomorphic maps $\{z\in\mathbb{C}\cup\infty: |z|>1\}\to SL(2,\mathbb{C})$ [32]. The equivalence classes $X(n)$ are labelled by a positive integer $n\in\mathbb{Z}_+$. The distributions living on a given $X(n)$ carry the left and the right $\widehat{s\ell}(2)$ actions; this time, however, the Weyl modules are combined differently since $W_\ell\otimes W_m$ enters as many times as there are $n$-dimensional $s\ell(2)$ representations in the tensor product of the $\ell$- and $m$-dimensional ones. It is interesting to investigate how much of this description can be carried over to the “noncommutative” case of the $\widehat{s\ell}(2|1)$ algebra constructed in this paper.

Acknowledgements. We thank A. Belavin, H. Kausch, W. Oxbury, I. Shchepochkina, I. Tipunin, and G. Watts for discussions. This work was supported by the EPSRC grant GR/M12544 and partly by the RFBR Grant 9801-01155 and the Russian Federation President Grant 99-15-96037. A.M.S. gratefully acknowledges kind hospitality extended to him at the Department of Mathematical Sciences, University of Durham. A.T. acknowledges The Leverhulme Trust for a fellowship.
Appendix A. Some $s\ell(2)$ Quantum Group Relations

In describing the $s\ell(2)_q$ quantum group, we follow the conventions of [29]. The quantum group relations are
\[ K K^{-1} = K^{-1} K = 1, \qquad K E K^{-1} = q^2 E, \qquad K F K^{-1} = q^{-2} F, \qquad [E,F] = \frac{K-K^{-1}}{q-q^{-1}}. \tag{A.1} \]
The antipode acts on these generators as follows:
\[ S(E) = -E K^{-1}, \qquad S(F) = -K F, \qquad S(K) = K^{-1}, \qquad S(K^{-1}) = K, \tag{A.2} \]
and the comultiplication is given by
\[ \Delta(E) = 1\otimes E + E\otimes K, \qquad \Delta(F) = K^{-1}\otimes F + F\otimes 1, \tag{A.3} \]
\[ \Delta(K) = K\otimes K, \qquad \Delta(K^{-1}) = K^{-1}\otimes K^{-1}. \tag{A.4} \]
Together with the counit $\varepsilon$ given by $\varepsilon(E)=\varepsilon(F)=0$, $\varepsilon(K)=\varepsilon(K^{-1})=1$, these relations endow $s\ell(2)_q$ with a Hopf algebra structure. For a module $V$ over a Hopf algebra $A$, the $A$ action on the dual module $V^*$ is defined by
\[ (a f)(v) = f(S(a)\,v), \qquad a\in A, \quad f\in V^*, \quad v\in V. \tag{A.5} \]
Let $V_{\epsilon,n}$ be the $s\ell(2)_q$ module with the highest-weight vector $v_0$ such that
\[ E v_0 = 0, \qquad K v_0 = \epsilon q^n v_0, \tag{A.6} \]
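where $\epsilon^2=1$ and $n$ is a positive integer. We then define
\[ F v_{i-1} = [i]\, v_i, \tag{A.7} \]
whence
\[ E v_i = \epsilon\,[n-i+1]\, v_{i-1}, \qquad K v_i = \epsilon\, q^{n-2i}\, v_i. \tag{A.8} \]
We use the standard notation
\[ [i] = \frac{q^i-q^{-i}}{q-q^{-1}}, \qquad \begin{bmatrix} n \\ i \end{bmatrix} = \frac{[n]!}{[i]!\,[n-i]!}, \qquad [i]! = [1]\,[2]\cdots[i]. \tag{A.9} \]
There is an invariant scalar product
\[ (v_i, v_j) = \delta_{ij}\, q^{-i(n-i-1)} \begin{bmatrix} n \\ i \end{bmatrix}. \tag{A.10} \]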
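As a consistency check, the action (A.6)–(A.8) can be tested against the relations (A.1) with exact rational arithmetic. The sketch below is only an illustration: the helper names (`qnum`, `module_matrices`, `check_relations`) are hypothetical, $q$ is taken to be a rational number that is not a root of unity, and $\epsilon=\pm1$. Matrices are stored so that entry `[r][c]` is the $v_r$ component of the image of $v_c$.

```python
from fractions import Fraction


def qnum(i, q):
    """q-number [i] = (q^i - q^-i) / (q - q^-1), as in (A.9)."""
    return (q**i - q**(-i)) / (q - 1 / q)


def module_matrices(n, eps, q):
    """Matrices of E, F, K on the basis v_0..v_n of V_{eps,n}."""
    d = n + 1
    E = [[Fraction(0)] * d for _ in range(d)]
    F = [[Fraction(0)] * d for _ in range(d)]
    K = [[Fraction(0)] * d for _ in range(d)]
    for i in range(d):
        K[i][i] = eps * q**(n - 2 * i)              # K v_i = eps q^(n-2i) v_i
        if i >= 1:
            E[i - 1][i] = eps * qnum(n - i + 1, q)  # E v_i = eps [n-i+1] v_(i-1)
            F[i][i - 1] = qnum(i, q)                # F v_(i-1) = [i] v_i
    return E, F, K


def matmul(A, B):
    d = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(d)) for c in range(d)]
            for r in range(d)]


def scale(s, A):
    return [[s * x for x in row] for row in A]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def check_relations(n, eps, q):
    """True if the matrices satisfy the three nontrivial relations in (A.1)."""
    E, F, K = module_matrices(n, eps, q)
    d = n + 1
    Kinv = [[1 / K[r][c] if r == c else Fraction(0) for c in range(d)]
            for r in range(d)]
    ok_e = matmul(matmul(K, E), Kinv) == scale(q**2, E)
    ok_f = matmul(matmul(K, F), Kinv) == scale(q**(-2), F)
    ok_ef = sub(matmul(E, F), matmul(F, E)) == scale(1 / (q - 1 / q), sub(K, Kinv))
    return ok_e and ok_f and ok_ef


print(all(check_relations(n, eps, Fraction(3, 2))
          for n in (1, 2, 3, 4) for eps in (1, -1)))
```

The commutator relation reduces to the q-number identity $[i+1][n-i]-[i][n-i+1]=[n-2i]$, which is what the exact comparison verifies for each basis vector.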
In the $V_{\epsilon,n}$ modules with $n=1$, in particular, the $s\ell(2)_q$ action on the basis vectors $v_0$ and $v_1$ is given by
\[ E v_0 = 0, \qquad K v_0 = \epsilon q\, v_0, \qquad F v_0 = v_1, \tag{A.11} \]
\[ E v_1 = \epsilon v_0, \qquad K v_1 = \epsilon q^{-1} v_1, \qquad F v_1 = 0. \tag{A.12} \]
In the dual module with the dual basis $v^i$, we then find from (A.5) and (A.2):
\[ E v^0 = -q\, v^1, \qquad K v^0 = \epsilon q^{-1} v^0, \qquad F v^0 = 0, \tag{A.13} \]
\[ E v^1 = 0, \qquad K v^1 = \epsilon q\, v^1, \qquad F v^1 = -\epsilon q^{-1} v^0. \tag{A.14} \]
Let $V'_{\epsilon',1}$ be a similar module over $s\ell(2)_{q^{-1}}$, with the basis $v'_0$ and $v'_1$. As is easy to check, it is also a module over $s\ell(2)_q$, the $s\ell(2)_q$ action being given by
\[ E v'_0 = v'_1, \qquad K v'_0 = \epsilon' q^{-1} v'_0, \qquad F v'_0 = 0, \tag{A.15} \]
\[ E v'_1 = 0, \qquad K v'_1 = \epsilon' q\, v'_1, \qquad F v'_1 = \epsilon' v'_0. \tag{A.16} \]
The tensor product of $s\ell(2)_q$ modules $V'_{\epsilon',1}\otimes V_{\epsilon,1}$ is decomposed as
\[ V'_{\epsilon',1}\otimes V_{\epsilon,1} = V_{\epsilon\epsilon',0}\oplus V_{\epsilon\epsilon',2}, \tag{A.17} \]
where $V_{\epsilon\epsilon',2}$ is generated from $v'_1\otimes v_0$, and $V_{\epsilon\epsilon',0}$ from $v'_0\otimes v_0 - q\,v'_1\otimes v_1$. The projection $V'_{\epsilon',1}\otimes V_{\epsilon,1}\to V_{\epsilon\epsilon',0}$ can be defined in terms of the trace $\langle\cdot,\cdot\rangle$ such that
\[ \langle v'_0, v_0\rangle = 1, \qquad \langle v'_1, v_1\rangle = -q, \qquad \langle v'_0, v_1\rangle = 0, \qquad \langle v'_1, v_0\rangle = 0. \tag{A.18} \]
The vertex operator construction involves this trace operation in the case where $\epsilon=1$ and $\epsilon'=-1$: from the quantum group representation standpoint, each $\widehat{s\ell}(2|1)_k$ (or $D(2|1;\alpha)_k$) fermion is constructed as
\[ -q\,v'_1\otimes v_1 + v'_0\otimes v_0 = q\,v'_1\otimes F v_0 - \epsilon' F v'_1\otimes v_0. \tag{A.19} \]
With the basis of the quantum group module represented by the vertex operators as in Sect. 2, we identify $qF$ in the first term with the screening operator $S$ acting in the unprimed sector, and $-\epsilon' F$ in the second term with $S'$ acting in the primed sector; we then denote the action of $S$ or $S'$ on the respective vertex operator with a tilde. This gives (2.12)–(2.13), where we omit the tensor product sign and write the primed and the unprimed multipliers in the reversed order compared to the formula with the tensor-product notation; we hope that this minor notational discrepancy does not lead to confusion.
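The statement that $v'_0\otimes v_0 - q\,v'_1\otimes v_1$ spans the one-dimensional summand in (A.17) can be illustrated by acting on it with the coproducts (A.3)–(A.4). The following sketch uses hypothetical helper names and assumes a rational $q$ and $\epsilon,\epsilon'=\pm1$; the tensor basis is ordered as $v'_a\otimes v_b \mapsto 2a+b$, with the primed factor first, following (A.17).

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of two square matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def coproducts(eps, eps_p, q):
    """Delta(E), Delta(F), Delta(K) acting on V'_{eps',1} (x) V_{eps,1}."""
    one = [[1, 0], [0, 1]]
    # Unprimed module, (A.11)-(A.12); entry [r][c] is the v_r part of X v_c.
    E = [[0, eps], [0, 0]]
    F = [[0, 0], [1, 0]]
    K = [[eps * q, 0], [0, eps / q]]
    # Primed module, (A.15)-(A.16).
    Ep = [[0, 0], [1, 0]]
    Fp = [[0, eps_p], [0, 0]]
    Kp = [[eps_p / q, 0], [0, eps_p * q]]
    Kp_inv = [[1 / Kp[0][0], 0], [0, 1 / Kp[1][1]]]
    dE = add(kron(one, E), kron(Ep, K))       # Delta(E) = 1 (x) E + E (x) K
    dF = add(kron(Kp_inv, F), kron(Fp, one))  # Delta(F) = K^-1 (x) F + F (x) 1
    dK = kron(Kp, K)                          # Delta(K) = K (x) K
    return dE, dF, dK


def singlet_is_invariant(eps, eps_p, q):
    """Check that v'_0 (x) v_0 - q v'_1 (x) v_1 is killed by Delta(E), Delta(F)."""
    dE, dF, dK = coproducts(eps, eps_p, q)
    s = [1, 0, 0, -q]
    zero = [0, 0, 0, 0]
    return (apply(dE, s) == zero
            and apply(dF, s) == zero
            and apply(dK, s) == [eps * eps_p * x for x in s])


def top_vector_is_highest_weight(eps, eps_p, q):
    """Check that v'_1 (x) v_0 is killed by Delta(E) with K-weight eps*eps'*q^2."""
    dE, _, dK = coproducts(eps, eps_p, q)
    top = [0, 0, 1, 0]
    return (apply(dE, top) == [0, 0, 0, 0]
            and apply(dK, top) == [eps * eps_p * q**2 * x for x in top])


q = Fraction(3, 2)
print(singlet_is_invariant(1, -1, q), top_vector_is_highest_weight(1, -1, q))
```

The case $\epsilon=1$, $\epsilon'=-1$ printed here is the one singled out in the text; the same functions can be called with the other sign choices.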
Fig. 2. The $s\ell(2|1)$ root diagram (roots labelled $E^{12}$, $F^{12}$, $E^1$, $E^2$, $F^1$, $F^2$)
Appendix B. $\widehat{s\ell}(2|1)$ Algebra, Spectral Flow, and Charged Singular Vectors

The affine Lie superalgebra $\widehat{s\ell}(2|1)$ consists of four bosonic currents $E^{12}$, $H^-$, $F^{12}$, and $H^+$ and four fermionic ones, $E^1$, $E^2$, $F^1$, and $F^2$. The $\widehat{s\ell}(2)$ subalgebra is generated by $E^{12}$, $H^-$, and $F^{12}$, and it commutes with the $\widehat{u}(1)$ subalgebra generated by $H^+$. For reference, we give in Fig. 2 the two-dimensional root diagram of the finite-dimensional Lie superalgebra $s\ell(2|1)$, represented in Minkowski space with the fermionic roots along the light cone directions. We sometimes refer to the eigenvalue corresponding to a given eigenstate of the Cartan generator $H^-_0$ (resp. $H^+_0$) as the isospin (resp. the hypercharge) of that state. At level $k$, the nonvanishing commutation relations are given by
\[ \begin{aligned}
{}[H^-_m, E^{12}_n] &= E^{12}_{m+n}, & [H^-_m, F^{12}_n] &= -F^{12}_{m+n},\\
[E^{12}_m, F^{12}_n] &= m\,\delta_{m+n,0}\,k + 2H^-_{m+n}, & [H^\pm_m, H^\pm_n] &= \mp\tfrac12\, m\,\delta_{m+n,0}\,k,\\
[F^{12}_m, E^2_n] &= F^1_{m+n}, & [E^{12}_m, F^2_n] &= -E^1_{m+n},\\
[F^{12}_m, E^1_n] &= -F^2_{m+n}, & [E^{12}_m, F^1_n] &= E^2_{m+n},\\
[H^\pm_m, E^1_n] &= \tfrac12 E^1_{m+n}, & [H^\pm_m, F^1_n] &= -\tfrac12 F^1_{m+n},\\
[H^\pm_m, E^2_n] &= \mp\tfrac12 E^2_{m+n}, & [H^\pm_m, F^2_n] &= \pm\tfrac12 F^2_{m+n},\\
[E^1_m, F^1_n]_+ &= -m\,\delta_{m+n,0}\,k + H^+_{m+n} - H^-_{m+n}, & [E^2_m, F^2_n]_+ &= m\,\delta_{m+n,0}\,k + H^+_{m+n} + H^-_{m+n},\\
[E^1_m, E^2_n]_+ &= E^{12}_{m+n}, & [F^1_m, F^2_n]_+ &= F^{12}_{m+n}.
\end{aligned} \tag{B.1} \]
One of the $\widehat{s\ell}(2|1)_k$ spectral flows is given by
\[ U_\theta:\quad E^1_n\to E^1_{n-\theta}, \quad E^2_n\to E^2_{n+\theta}, \quad F^1_n\to F^1_{n+\theta}, \quad F^2_n\to F^2_{n-\theta}, \quad H^+_n\to H^+_n + k\theta\,\delta_{n,0} \tag{B.2} \]
(with the $\widehat{s\ell}(2)$ subalgebra remaining invariant). For $\theta\in\mathbb{Z}$, this is an automorphism of $\widehat{s\ell}(2|1)$. Applying the spectral flow to modules gives twisted modules. A twisted module with a vacuum vector is, thus, generated from the state $|h_-,h_+,k;\theta\rangle$ (which we call the twisted highest-weight vector) satisfying the twisted highest-weight conditions
\[ E^1_{-\theta}\,|h_-,h_+,k;\theta\rangle = 0, \qquad E^2_{\theta}\,|h_-,h_+,k;\theta\rangle = 0, \qquad F^{12}_1\,|h_-,h_+,k;\theta\rangle = 0, \tag{B.3} \]
and whose quantum numbers of hypercharge and isospin are given by
\[ H^+_0\,|h_-,h_+,k;\theta\rangle = (h_+-k\theta)\,|h_-,h_+,k;\theta\rangle, \qquad H^-_0\,|h_-,h_+,k;\theta\rangle = h_-\,|h_-,h_+,k;\theta\rangle, \tag{B.4} \]
where $k$ is the level and $\theta$ is the twist. The eigenvalue of $H^+_0$ is parametrised as $h_+-k\theta$ so as to have the same value of $h_+$ for all the modules differing from each other by a spectral flow transform. We assume $\theta\in\mathbb{Z}$ in most of our formulas, with the necessary modifications for $\theta\in\mathbb{Z}+\frac12$ to be done in accordance with the spectral flow transform. The dimension of $|h_-,h_+,k;\theta\rangle$ with respect to the Sugawara energy-momentum tensor
\[ T_{\mathrm{Sug}} = \frac{1}{k+1}\bigl(H^-H^- - H^+H^+ + E^{12}F^{12} + E^1F^1 - E^2F^2\bigr) \tag{B.5} \]
is given by
\[ \Delta_{h_-,h_+,k;\theta} = \frac{h_-^2-h_+^2}{k+1} + 2\theta h_+ - k\theta^2. \tag{B.6} \]
The character of a twisted module $N_{;\theta}$ is expressed through the “untwisted” character $\chi^N$ as
\[ \chi^N_{;\theta}(q,z,\zeta) = \zeta^{-k\theta}\, q^{-k\theta^2}\, \chi^N(q,z,\zeta q^{2\theta}). \tag{B.7} \]
The twisted Verma module $P_{h_-,h_+,k;\theta}$ is freely generated from $|h_-,h_+,k;\theta\rangle$ by $E^1_{\le-\theta-1}$, $E^2_{\le\theta-1}$, $F^1_{\le\theta}$, $F^2_{\le-\theta}$, $E^{12}_{\le-1}$, $F^{12}_{\le0}$, $H^+_{\le-1}$, and $H^-_{\le-1}$. For an integral $\theta$, the character of $P_{h_-,h_+,k;\theta}$ is
\[ \mathrm{Tr}\bigl(z^{H^-_0}\zeta^{H^+_0} q^{L^{\mathrm{Sug}}_0}\bigr) = z^{h_-}\,\zeta^{h_+-(k+1)\theta}\, q^{\frac{h_-^2-h_+^2}{k+1}+2\theta h_+-(k+1)\theta^2} \cdot \frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12})\,\vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12})}{\vartheta_{1,1}(q,z)\,\prod_{m\ge1}(1-q^m)^3}, \tag{B.8} \]
where the Jacobi theta functions are defined by
\[ \vartheta_{1,1}(q,z) = \sum_{m\in\mathbb{Z}} (-1)^m q^{\frac12(m^2-m)} z^{-m} = \prod_{m\ge0}(1-z^{-1}q^m)\prod_{m\ge1}(1-zq^m)\prod_{m\ge1}(1-q^m), \tag{B.9} \]
\[ \vartheta_{1,0}(q,z) = \sum_{m\in\mathbb{Z}} q^{\frac12(m^2-m)} z^{-m} = \prod_{m\ge0}(1+z^{-1}q^m)\prod_{m\ge1}(1+zq^m)\prod_{m\ge1}(1-q^m). \tag{B.10} \]
Among singular vectors that can exist in $P_{h_-,h_+,k;\theta}$, we note the so-called charged singular vectors. They occur whenever
\[ h_+ = \pm h_- - (k+1)n, \qquad n\in\mathbb{Z}, \tag{B.11} \]
and are given by an explicit construction as follows [34]. For $h_+-h_-=-(k+1)n$, $n\in\mathbb{Z}$, the charged singular vector in the twisted Verma module $P_{h_-,h_+,k;\theta}$ reads
\[ |C^{(-)}(n,h_-,k;\theta)\rangle = \begin{cases} \underbrace{E^2_{\theta+n}\cdots E^2_{\theta-1}}_{-n}\cdot\underbrace{F^1_{\theta+n}\cdots F^1_{\theta}}_{-n+1}\,|h_-,h_--n(k+1),k;\theta\rangle, & n\le0,\\[2ex] \underbrace{E^1_{-\theta-n}\cdots E^1_{-\theta-1}}_{n}\cdot\underbrace{F^2_{1-\theta-n}\cdots F^2_{-\theta}}_{n}\,|h_-,h_--n(k+1),k;\theta\rangle, & n\ge1. \end{cases} \tag{B.12} \]
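The equality of the sum and product forms of the theta functions in (B.9) and (B.10) is an instance of the Jacobi triple product identity, and it can be illustrated numerically. The sketch below is not part of the original derivation: the function names are hypothetical, the series and products are simply truncated, and real test values with $0<q<1$ and $z>0$ are assumed. The parameter `s` is $-1$ for $\vartheta_{1,1}$ and $+1$ for $\vartheta_{1,0}$.

```python
import math


def theta_sum(q, z, s, cutoff=40):
    """Truncated series: sum over m of s^m q^(m(m-1)/2) z^(-m)."""
    total = 0.0
    for m in range(-cutoff, cutoff + 1):
        sign = 1 if m % 2 == 0 else s            # s^m for s = +1 or -1
        total += sign * q ** (m * (m - 1) // 2) * z ** (-m)
    return total


def theta_product(q, z, s, cutoff=400):
    """Truncated product: (1 + s q^m / z) for m >= 0, (1 + s z q^m)(1 - q^m) for m >= 1."""
    value = 1.0
    for m in range(cutoff):
        value *= 1 + s * q ** m / z
    for m in range(1, cutoff):
        value *= (1 + s * z * q ** m) * (1 - q ** m)
    return value


q, z = 0.3, 1.7
for name, s in (("theta_{1,1}", -1), ("theta_{1,0}", +1)):
    series, product = theta_sum(q, z, s), theta_product(q, z, s)
    print(name, math.isclose(series, product, rel_tol=1e-10))
```

The truncation orders are generous for $q=0.3$; values of $q$ closer to 1 would need larger cutoffs.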
As is easy to check, this vector satisfies the highest-weight conditions (we omit the singular vector itself)
\[ E^1_{-\theta-n}\approx0, \qquad E^2_{\theta+n}\approx0, \qquad F^{12}_1\approx0 \tag{B.13} \]
together with $F^1_{\theta+n}\approx0$. Similarly, whenever $h_++h_-=-n(k+1)$, $n\in\mathbb{Z}$, the charged singular vector in $P_{h_-,h_+,k;\theta}$ is
\[ |C^{(+)}(n,h_-,k;\theta)\rangle = \begin{cases} \underbrace{E^2_{\theta+n}\cdots E^2_{\theta-1}}_{-n}\cdot\underbrace{F^1_{\theta+n+1}\cdots F^1_{\theta}}_{-n}\,|h_-,-h_--n(k+1),k;\theta\rangle, & n\le-1,\\[2ex] \underbrace{E^1_{-\theta-n}\cdots E^1_{-\theta-1}}_{n}\cdot\underbrace{F^2_{-\theta-n}\cdots F^2_{-\theta}}_{n+1}\,|h_-,-h_--n(k+1),k;\theta\rangle, & n\ge0. \end{cases} \tag{B.14} \]
This satisfies the highest-weight conditions (B.13) supplemented by $F^2_{-\theta-n}\approx0$. It is straightforward to check that the highest-weight conditions satisfied by $|C^{(\pm)}(n,h_-,k;\theta)\rangle$ imply that this vector generates a submodule in the respective $P_{h_-,h_+,k;\theta}$ module.
Appendix C. $D(2|1;\alpha)$

The one-parameter family of exceptional Lie superalgebras $D(2|1;\alpha)$, with $\alpha\in\mathbb{C}\setminus\{0,-1,\infty\}$, are basic type II classical simple complex Lie superalgebras in the Kač classification [24, 11]. At each fixed value of $\alpha$, $D(2|1;\alpha)$ is a rank 3 superalgebra with six even roots and eight odd roots. Its dual Coxeter number $h^\vee$ is zero and its superdimension, which is the number of bosonic generators minus the number of fermionic generators, is sdim $=9-8=1$. The central charge of the Virasoro algebra satisfied by the Sugawara energy-momentum tensor of the affine algebra is therefore 1 for any value of the level $\kappa$ since
\[ c = \frac{\kappa\,\mathrm{sdim}}{\kappa+h^\vee} = 1. \tag{C.1} \]
The bosonic part of $D(2|1;\alpha)$ is $s\ell(2)\oplus s\ell(2)\oplus s\ell(2)$, and the action of $D(2|1;\alpha)_0$ on $D(2|1;\alpha)_1$ is the product of 2-dimensional representations. The root diagram (see Fig. 3) can be visualised in a parallelepiped in 3d space with metric $g_{ij}=\mathrm{diag}(-1,-1,1)$. All odd roots are at the vertices of the parallelepiped on the light cone. Six even roots lie on the three lines through the centre of the faces. The Weyl group does not act transitively on the set of simple roots, and there are six choices of simple root systems. We describe in more detail the system of simple roots with one odd root, $\alpha_1$, and two even roots, $\alpha_2$ and $\alpha_3$. The three regular $s\ell(2)$ subalgebras are in the directions of $\alpha_2$, $\alpha_3$ and $\alpha_\theta=2\alpha_1+\alpha_2+\alpha_3$. The relevant scalar products are summarised as
\[ \alpha_1^2=0, \qquad \alpha_2^2=-2\gamma, \qquad \alpha_3^2=-2(1-\gamma), \qquad \alpha_\theta^2=2, \tag{C.2} \]
\[ \alpha_1\cdot\alpha_2=\gamma, \qquad \alpha_1\cdot\alpha_3=1-\gamma, \qquad \alpha_2\cdot\alpha_3=0, \tag{C.3} \]
\[ \alpha_3\cdot\alpha_\theta=0, \qquad \alpha_2\cdot\alpha_\theta=0, \qquad \alpha_1\cdot\alpha_\theta=1, \tag{C.4} \]
Fig. 3. The $D(2|1;\alpha)$ root diagram (roots labelled $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_\theta$)
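where
\[ \gamma = \frac{\alpha}{1+\alpha}, \qquad \gamma\in\mathbb{C}\setminus\{0,1,\infty\}. \tag{C.5} \]
With the metric above, we have
\[ \alpha_1 = \tfrac{1}{\sqrt2}\bigl(-\sqrt{\gamma},\,-\sqrt{1-\gamma},\,1\bigr), \qquad \alpha_2 = \bigl(\sqrt{2\gamma},\,0,\,0\bigr), \qquad \alpha_3 = \bigl(0,\,\sqrt{2(1-\gamma)},\,0\bigr). \tag{C.6} \]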
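The scalar products (C.2)–(C.4) follow directly from the explicit vectors (C.6) and the metric $\mathrm{diag}(-1,-1,1)$. The sketch below checks this numerically; the function names are hypothetical, and it assumes a real $\gamma$ with $0<\gamma<1$ so that the square roots are real.

```python
import math


def roots(gamma):
    """Simple roots of (C.6) and alpha_theta = 2 alpha_1 + alpha_2 + alpha_3."""
    a1 = (-math.sqrt(gamma) / math.sqrt(2),
          -math.sqrt(1 - gamma) / math.sqrt(2),
          1 / math.sqrt(2))
    a2 = (math.sqrt(2 * gamma), 0.0, 0.0)
    a3 = (0.0, math.sqrt(2 * (1 - gamma)), 0.0)
    a_theta = tuple(2 * x + y + w for x, y, w in zip(a1, a2, a3))
    return a1, a2, a3, a_theta


def dot(a, b):
    """Scalar product with the metric diag(-1, -1, 1)."""
    metric = (-1.0, -1.0, 1.0)
    return sum(g * x * y for g, x, y in zip(metric, a, b))


def scalar_products_match(gamma, tol=1e-12):
    a1, a2, a3, at = roots(gamma)
    expected = [
        (a1, a1, 0.0), (a2, a2, -2 * gamma), (a3, a3, -2 * (1 - gamma)), (at, at, 2.0),
        (a1, a2, gamma), (a1, a3, 1 - gamma), (a2, a3, 0.0),
        (a3, at, 0.0), (a2, at, 0.0), (a1, at, 1.0),
    ]
    return all(abs(dot(x, y) - value) < tol for x, y, value in expected)


print(all(scalar_products_match(g) for g in (0.25, 0.5, 0.8)))
```

In these coordinates $\alpha_\theta=(0,0,\sqrt2)$ for every $\gamma$, which makes $\alpha_\theta^2=2$ and the orthogonality to $\alpha_2$ and $\alpha_3$ immediate.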
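The mapping $t_1:\gamma\to1-\gamma$ interchanges the rôles of $\alpha_2$ and $\alpha_3$, while the mapping $t_2:\gamma\to\gamma^{-1}$ interchanges those of $\alpha_2$ and $\alpha_\theta$. These two transformations generate an order-6 group defined by the relations $t_1^2=t_2^2=1$, $t_1t_2t_1=t_2t_1t_2$, and one has the isomorphisms
\[ D\bigl(2|1;\tfrac{\gamma}{1-\gamma}\bigr) \cong D\bigl(2|1;\tfrac{1-\gamma}{\gamma}\bigr) \cong D\bigl(2|1;\tfrac{1}{\gamma-1}\bigr) \cong D\bigl(2|1;-\tfrac{1}{\gamma}\bigr) \cong D(2|1;\gamma-1) \cong D(2|1;-\gamma). \tag{C.7} \]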
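The relations between $t_1$ and $t_2$, and the six parameter values appearing in (C.7), can be checked with exact rational arithmetic. The sketch below uses hypothetical function names and assumes a generic rational $\gamma$ (one whose orbit really has six elements); it generates the orbit of $\gamma$ and maps each element back to $\alpha=\gamma/(1-\gamma)$ by inverting (C.5).

```python
from fractions import Fraction


def t1(g):
    return 1 - g


def t2(g):
    return 1 / g


def orbit(g):
    """All values reachable from g by repeatedly applying t1 and t2."""
    seen, frontier = {g}, [g]
    while frontier:
        x = frontier.pop()
        for t in (t1, t2):
            y = t(x)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


def alpha_of(g):
    """Invert gamma = alpha / (1 + alpha)."""
    return g / (1 - g)


g = Fraction(3, 4)
relations_hold = (t1(t1(g)) == g and t2(t2(g)) == g
                  and t1(t2(t1(g))) == t2(t1(t2(g))))
alphas_from_orbit = {alpha_of(x) for x in orbit(g)}
alphas_in_C7 = {g / (1 - g), (1 - g) / g, 1 / (g - 1), -1 / g, g - 1, -g}
print(relations_hold, len(orbit(g)), alphas_from_orbit == alphas_in_C7)
```

Special values such as $\gamma=1/2$ have shorter orbits, so the six parameters in (C.7) then coincide in pairs.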
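If one restricts the parameter $\gamma$ to real values, it is sufficient to consider the domain $\gamma\in[1/2,1[$ for which $\alpha_\theta$ is always the longest root. As far as the affine superalgebra is concerned, the isomorphisms are
\[ D\bigl(2|1;\tfrac{\gamma}{1-\gamma}\bigr)_k \cong D\bigl(2|1;\tfrac{1-\gamma}{\gamma}\bigr)_k \cong D\bigl(2|1;\tfrac{1}{\gamma-1}\bigr)_{-\frac{k}{\gamma}} \cong D(2|1;\gamma-1)_{-\frac{k}{\gamma}} \cong D\bigl(2|1;-\tfrac{1}{\gamma}\bigr)_{-\frac{k}{1-\gamma}} \cong D(2|1;-\gamma)_{-\frac{k}{1-\gamma}}. \tag{C.8} \]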
Appendix D. $\widehat{s\ell}(2|1)$ OPE’s

We use a number of standard integrals, assuming whenever necessary that they are analytically continued from the domain where they are well-defined. For $\mathrm{Re}\,z>\mathrm{Re}\,w$, we have
\[ \int_w^z dx\,(z-x)^\alpha (x-w)^\beta = (z-w)^{\alpha+\beta+1}\,\frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}. \tag{D.1} \]
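The integral (D.1) is the Euler beta integral rescaled to the interval $[w,z]$, and it can be checked numerically in the domain where it converges. The sketch below is only an illustration: the function names are hypothetical, real $z>w$ and exponents $\alpha,\beta\ge0$ are assumed, and a plain midpoint rule stands in for the integral.

```python
import math


def beta_integral_numeric(z, w, alpha, beta, n=100000):
    """Midpoint-rule approximation of the left-hand side of (D.1)."""
    h = (z - w) / n
    total = 0.0
    for j in range(n):
        x = w + (j + 0.5) * h
        total += (z - x) ** alpha * (x - w) ** beta
    return total * h


def beta_integral_closed(z, w, alpha, beta):
    """Right-hand side of (D.1)."""
    return ((z - w) ** (alpha + beta + 1)
            * math.gamma(alpha + 1) * math.gamma(beta + 1)
            / math.gamma(alpha + beta + 2))


z, w, alpha, beta = 2.0, 0.5, 1.5, 2.0
numeric = beta_integral_numeric(z, w, alpha, beta)
closed = beta_integral_closed(z, w, alpha, beta)
print(math.isclose(numeric, closed, rel_tol=1e-6))
```

For exponents outside the convergence domain the closed form is understood by analytic continuation, as stated above, and a direct quadrature of this kind no longer applies.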
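For the half-line,
\[ \int_z^\infty dx\,(x-z)^\alpha (x-w)^\beta = (z-w)^{\alpha+\beta+1}\,\frac{\Gamma(\alpha+1)\Gamma(-1-\alpha-\beta)}{\Gamma(-\beta)} = -(z-w)^{\alpha+\beta+1}\,\frac{\sin\pi\beta}{\sin\pi(\alpha+\beta)}\,\frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}. \tag{D.2} \]
In evaluating the operator product $E^1(z)\cdot E^2(w)$, we have the following operators √1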
ϕ
in (2.30): V1 = e 2p , V2 = e above integrals, we obtain
∞
z
du S(u) V1 (z) V2 (w)
= β(w)
sin sin
π p 2π p
(z − w)
√1 ϕ 2p
∞ z
1 2 2p +1− p
, V1 = e
√1 ϕ 2p
, and V2 = γ e
√1 ϕ 2p
. Using the
dx S (x) V1 (z) V2 (w) U(1 − p1 )U(1 − p1 ) U(2 − p2 )
· (z − w)
1 −2 2p p
U( p1 )U(2 − p2 ) U(2 − p1 )
= (z − w)− 2 β(w) 1
πp
sin
(D.3)
2π p
plus lower-order terms. This is to be multiplied with the operator product √1
f (z) − √1 f (w)
e 2 = (z − w)− 2 + . . . , which restores the first-order pole. We now e 2 recall the normalisation factors (−1) · πpi cos πp from (2.12) and the factor −2i sin πp from (2.30). This gives the operator product 1
E 12 (w) −β(w) = , z−w z−w which is in agreement with (B.1). The remaining terms in (2.30) cancel each other. Evaluating the operator product E 1 (z) · F 1 (w), we have V1 = e V1 = e
√1 ϕ 2p
, and V2 = γ e
√1 ϕ 2p
√1 ϕ 2p
, V2 = γ e
√1 ϕ 2p
. Then
∞
du S(u) V1 (z)V2 (w)
∞ 1 1 = (z − w) 2p du − u−w + p2 ∂ϕ(w) + βγ (w) − z
z
√1 2p
z−w ∂ϕ(w) u−w + ...
(u − z)
− p1
(u − w)
− p1
,
(where the dots denote higher-order terms). This equals '' & U(1 − p1 )U(−1 + p2 ) & p = (z − w) . p − 2 + (z − w) 2 ∂ϕ(w) + βγ (w) U( p1 ) (D.4) (∞ Next, the primed sector integral z du S (u) V1 (z)V2 (w) is obtained from the last expression by simply replacing p → p , ϕ → ϕ , β → β , and γ → γ . Multiplying the primed and the unprimed contributions, we thus obtain
∞
∞ du S(u) V1 (z) V2 (w) dx S (x) V1 (z) V2 (w) z z p ' 1 2 & U(1 − )U(−1 + ) 1 p−2 2 ∂ϕ(w) + βγ (w) p p 2 = (z − w) + (z − w)2 z−w U( p1 ) ' p U( p1 )U(1 − p2 ) & p − 2 2 ∂ϕ (w) + β γ (w) × + z−w (z − w)2 U(1 − p1 ) & ( p ∂ϕ + βγ ) − p( p ∂ϕ + β γ ) ' p 1 p (p − 2) π 2 2 = (z − w) 2 . + 2π 2 (z − w) z − w sin p 2 1 2p − p
√1 f (z) 2
This is further multiplied with e
e
− √1 f (w) 2
= (z−w)− 2 (1+ √1 (z−w)∂f +. . . ); in 1
2
addition, we recall the normalisations (−1)· πpi cos πp in (2.12) and the factor −2i sin from the first term in (2.30). Thus, the first term in (2.30) gives the contribution p p + β γ ) − ( p ∂ϕ + βγ ) − (p−2) √ ∂f ( ∂ϕ −(p − 2) 2 2 p 2 E 1 (z) F 1 (w) = + 2 (z − w) z−w −k H+ − H− = + . 2 (z − w) z−w
π p
The remaining terms in (2.30) cancel each other and, therefore, the operator product E 1 (z) F 1 (w) given by the last formula is in agreement with the respective commutator in (B.1). We will need the first regular term in the above expansion when we calculate the s(2|1) energy-momentum tensor in Lemma 2.2, namely, −p E 1 F 1 = √p βγ ∂f − √p β γ ∂f − p2 β γ ∂ϕ + p2 βγ ∂ϕ 2 2 √ √ p (p−2) p 2 √ √ − p(p−1) β γ ∂ϕ − βγ ∂ϕ + ∂ ϕ (D.5) 2 +
p+2 4(p−1) ∂ϕ∂ϕ
+
p−2 2
+
p 2 2∂ ϕ
2(1−p) 2 2(p−1) (p−2) p−2 2 √ p∂ f − √ ∂ϕ∂ϕ + 41 (2 − 3p)∂ϕ ∂ϕ 2 p−1 2 2
+
p √ 2 p ∂ϕ∂f
−
p 2
p ∂ϕ ∂f +
p 4
(p − 2)∂f ∂f + X,
where X is a contribution that cancels against similar terms coming from E 2 F 2 .
As regards E 2 (z) · F 2 (w), we again use (2.30), where now V1 = e √1 ϕ
√1 ϕ 2p
√1 ϕ 2p
√1 ϕ
, V2 =
γe ,(V1 = γ e 2p , and V2 = e 2p . We already know from (D.4) the unprimed ∞ integral z du S(u) V1 (z) V2 (w); the primed sector contributes is evaluated similarly. We only quote the first regular term −p E 2 F 2 = −pβ ∂γ −p ∂βγ + p2 βγ ∂ϕ − p2 β γ ∂ϕ + √p βγ ∂f 2 √ p p p(p−1) √ + √ β γ ∂f − √ βγ ∂ϕ − β γ ∂ϕ + (p−2)p ∂ 2f 2 2 2(1−p) 2 2 √ − p2 p2 ∂ 2 ϕ − p2 p2 ∂ 2 ϕ − p4 (p − 2)∂f ∂f + p2 p ∂ϕ∂f p−2 ∂ϕ∂ϕ + p2 p ∂ϕ ∂f − 4(p−1) +
2−p √ ∂ϕ∂ϕ 2 p−1
+ 41 (2 − p)∂ϕ ∂ϕ + X,
(D.6)
where X is a contribution that cancels between E 1 F 1 and E 2 F 2 . √1 ϕ The operator product F 1 (z) · F 2 (w) is evaluated similarly; we have V1 = γ e 2p , √1
√1 ϕ
ϕ
√1 ϕ
V2 = γ e 2p , V1 = γ e 2p , and V2 = e 2p . One then multiplies the contributions of the primed and unprimed sectors, given by
∞ z
and
z
∞
du S
(u) V1 (z) V2 (w)
= −(z − w)
1 −2 2p p
U(− p1 )U( p2 ) U( p1 )
,
∞ γ (w)
γ (z) − du S(u) V1 (z) V2 (w) = du βγ 2 − u−w u−z z
−1 −1 · 1 − p2 (u − w) ∂ϕ + √12p (z − w) ∂ϕ (u − z) p (u − w) p 1
= (z − w) 2p
− p2 +1
U(1 − p1 )U(−1 + p1 ) U( p1 )
(βγ 2 +
2p γ ∂ϕ + (p − 2)∂γ ),
where we see the J − current from (2.22). References 1. Aharony, G., Ganor, O., Sonnenschein, J., Yankielowicz, S., Sochen, N.: G/G models and WN strings. Phys. Lett. B289, 309–316 (1992) CurrentAlgebra. Commun. 2. Bernard, D., Felder, G.: Fock Representations and BRST Cohomology in s(2) Math. Phys. 127, 145–168 (1990) 3. Bershadsky, M., Lerche, W., Nemeschansky, D. and Warner, N.P.: Extended N = 2 Superconformal Structure of Gravity and W Gravity Coupled to Matter. Nucl. Phys. B401, 304–347 (1993) 4. Bershadsky, M. and Ooguri, H.: Hidden Osp(N, 2) Symmetries in Superconformal Field Theories. Phys. Lett. B 229, 374 (1989) 5. Bouwknegt, P., McCarthy, J., Pilch, K.: Quantum Group Structure in the Fock Space Resolutions of s(n) Representations. Commun. Math. Phys. 131, 125–155 (1990) 6. Bouwknegt, P., McCarthy, J., Pilch, K.: Free-Field Approach to 2-Dimensional Conformal Field Theories. Prog. Theor. Phys. Suppl. 102, 67–135 (1990)
7. Bowcock, P., Hayes, M.R., Taormina, A.: Characters of admissible representations of the affine superalgebra $\widehat{s\ell}(2|1)$. Nucl. Phys. B 510, 739–763 (1998)
8. Bowcock, P., Hayes, M.R., Taormina, A.: Parafermionic representation of the affine $\widehat{s\ell}(2|1)$ at fractional level. hep-th 9803024
9. Bowcock, P., Koktava, R-L., Taormina, A.: Wakimoto modules for the affine superalgebra $\widehat{s\ell}(2|1)$ and noncritical N = 2 strings. Phys. Lett. 388, 303–308 (1996)
10. Bowcock, P., Taormina, A.: Representation theory of the affine Lie superalgebra $\widehat{s\ell}(2|1)$ at fractional level. Commun. Math. Phys. 185, 467–493 (1997)
11. Cornwell, J.F.: Group Theory in Physics. Vol. 3, London–New York: Academic Press, 1989
12. David, F.: Conformal field theories coupled to 2D-gravity in the conformal gauge. Mod. Phys. Lett. A 3, 1651–1656 (1988)
13. Distler, J., Kawai, H.: Conformal field theory and 2D-quantum gravity. Nucl. Phys. B 321, 509–527 (1989)
14. Fan, J-B., Yu, M.: Modules over affine Lie superalgebras. hep-th 9304122
15. Fan, J-B., Yu, M.: G/G gauged supergroup valued WZNW field theory. hep-th 9304123
16. Feigin, B.L., Semikhatov, A.M., Sirota, V.A., and Tipunin, I.Yu.: Resolutions and Characters of Irreducible Representations of the N = 2 Superconformal Algebras. Nucl. Phys. B 536, 617–656 (1999)
17. Feigin, B.L., Semikhatov, A.M., and Tipunin, I.Yu.: Equivalence between Chain Categories of Representations of Affine $s\ell(2)$ and N = 2 Superconformal Algebras. J. Math. Phys. 39, 3865–3905 (1998)
18. Goddard, P., Olive, D.: Kac–Moody and Virasoro algebras in relation to quantum physics. Int. J. of Mod. Phys. A 1, 303–414 (1986)
19. Gomez, C., Sierra, S.: The Quantum Symmetry of Rational Conformal Field Theories. Nucl. Phys. B 352, 791–828 (1991)
20. Hayes, M.R., Taormina, A.: Admissible $\widehat{s\ell}(2|1)$ characters and parafermions. Nucl. Phys. B 529, 588–610 (1998)
21. Hu, H.L., Yu, M.: On the equivalence of non-critical strings and G/G topological field theories. Phys. Lett. B 289, 302–308 (1992)
22. Ito, K., and Kanno, H.: Hamiltonian Reduction and Topological Conformal Algebra in c ≤ 1 Non-Critical Strings. Mod. Phys. Lett. A 9, 1377 (1994)
23. Johnstone, G.B.: Modular Transformations and Invariants in the Context of Fractional Level $\widehat{s\ell}(2|1;\mathbb{C})$. In preparation
24. Kač, V.G.: A sketch of Lie Superalgebra Theory. Commun. Math. Phys. 53, 31–64 (1977)
25. Kač, V.G.: Infinite Dimensional Lie Algebras. Cambridge: Cambridge University Press, 1990
26. Kač, V.G., Peterson, D.H.: Infinite-dimensional Lie algebras, theta functions and modular forms. Adv. in Math. 53, 125–264 (1984)
27. Kač, V.G., Wakimoto, M.: Modular invariant representations of infinite-dimensional Lie algebras and superalgebras. Proc. Nat. Acad. Sci. 85, 4956 (1988)
28. Kač, V.G., Wakimoto, M.: Integrable highest weight modules over affine superalgebras and number theory. hep-th 9407057
29. Kassel, C.: Quantum Groups. Berlin–Heidelberg–New York: Springer, 1995
30. Mathieu, P., Walton, M.A.: On principal admissible representations and conformal field theory. hep-th 9812192
31. Mukhi, S., Panda, S.: Fractional-Level Current Algebras and the Classification of Characters. Nucl. Phys. B 338, 263–282 (1990)
32. Pressley, A., Segal, G.: Loop Groups. Oxford: Oxford Mathematical Monographs, 1986
33. Ramirez, C., Ruegg, H., Ruiz Altaba, M.: Explicit Quantum Symmetries of WZNW Theories. Phys. Lett. B 247, 499–508 (1990)
34. Semikhatov, A.M.: Verma Modules, Extremal Vectors, and Singular Vectors on the Non-Critical N = 2 String Worldsheet. hep-th/9610084
35. Semikhatov, A.M.: The Non-Critical N = 2 String is an $\widehat{s\ell}(2|1)$ Theory. Nucl. Phys. B 478, 209 (1996)
36. Tsuchiya, A., Kanie, Y.: Vertex Operators in the Conformal Field Theory on $\mathbb{P}^1$ and Monodromy Representations of the Braid Group. Lett. Math. Phys. 13, 303–312 (1987)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 214, 547 – 563 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Monotonicity Properties of Optimal Transportation and the FKG and Related Inequalities
Luis A. Caffarelli
Department of Mathematics, The University of Texas at Austin, Austin, TX 78712-1082, USA
Received: 18 October 1999 / Accepted: 24 March 2000
Abstract: Optimal transportation between densities $f(X)$, $g(Y)$ can be interpreted as a joint probability distribution with marginals $f(X)$ and $g(Y)$. We prove monotonicity and concavity properties of the optimal transportation map $Y(X)$ under suitable assumptions on $f$ and $g$. As an application we obtain the Fortuin, Kasteleyn, Ginibre correlation inequalities as well as some generalizations of the Brascamp–Lieb moment inequalities.

0. Introduction

We start this introduction by giving some background on optimal transportation and the FKG inequalities.

0.1. The problem of optimal transportation. We are given two probability densities $f(X)$, $g(Y)$, and we want to transport the (variable $X$ with) density $f$ onto the (variable $Y$ with) density $g$ in a way that minimizes transportation costs, say, for simplicity, $C(Y - X)$. Let us first say what we mean by transporting $f$ to $g$.

(Pre) Definition. A smooth map $Y(X)$ transports $f$ to $g$ if
$$g(Y(X)) \det D_X Y = f(X).$$
That is, a small differential of volume $g(Y)\,dY$ is pulled back to $f(X)\,dX$ by the map $Y(X)$.
Research was supported in part by the National Science Foundation, DMS-9714758.
A weak formulation is the following:

Definition 1. A (weak) transport is a measurable map $Y(X)$ such that for any $C_0$ function $h(Y)$ the following ("change of variable") formula is valid:
$$\int h(Y)\,g(Y)\,dY = \int h(Y(X))\,f(X)\,dX.$$
Now, given the cost function $C(X)$, we define

Optimal transportation. The (weak) transportation $Y(X)$ is optimal if it minimizes
$$J(Y) = \int C(Y(X) - X)\,f(X)\,dX$$
among all weak transportations.

Existence and regularity of such an optimal transportation have been studied in detail. (See for instance [B, C2, C3] and [G-M].) We will discuss (and use) in this paper the particular case where $C(X - Y) = \frac{1}{2}|X - Y|^2$. The correlation inequalities part of the paper holds true for more general cost functions, still convex and with the appropriate symmetries, but the proofs are technically involved and we will present them elsewhere. The second derivative estimates for the Monge–Ampère-like equations corresponding to non-quadratic cost functions are a completely open matter. In the quadratic case, there is a rather complete existence and regularity theory ([B, C2, C3]). We will be interested in the following results.

Theorem 1 (Existence and stability, [B]). Let $\Omega_1$, $\Omega_2$ be two open domains in $\mathbb{R}^n$, and $f(X)$, $g(Y)$ two strictly positive, bounded, measurable functions in $\Omega_i$, with
$$\int_{\Omega_1} f(X)\,dX = \int_{\Omega_2} g(Y)\,dY = 1.$$
Then,
a) There exists a unique optimal transportation map $Y(X)$.
b) The optimal transportation $Y(X)$ (and its inverse $X(Y)$) are obtained from the following minimization process:
b1) Among all pairs of continuous functions $\varphi(X)$, $\psi(Y)$ satisfying the constraint $\varphi(X) + \psi(Y) \ge \langle X, Y\rangle$, minimize
$$J(\varphi, \psi) = \int_{\Omega_1} \varphi(X)\,f(X)\,dX + \int_{\Omega_2} \psi(Y)\,g(Y)\,dY.$$
b2) $\varphi$ and $\psi$ are unique and convex, and $Y(X)$ is defined as the (possibly multiple valued) map $Y \in Y(X)$ if $\varphi(X) + \psi(Y) = \langle Y, X\rangle$.
Theorem 2 (Regularity, [C2, C3]). Hypothesis as before, assume further that $\Omega_1$, $\Omega_2$ are convex. Then
a) If $0 < \lambda \le f, g \le \Lambda$, the map $Y(X)$ and its inverse $X(Y)$ are single valued, of class $C^{\alpha}$ in $\Omega_i$ for some $\alpha$.
b) If $f, g$ are Hölder continuous with exponent $\beta$ for some $\beta$, then $Y(X)$, $X(Y)$ are of class $C^{1,\beta}$.
c) In both cases (a) and b)), there exists a pair of convex potentials $\varphi(X)$, $\psi(Y)$ such that $Y(X) = \nabla\varphi(X)$, $X(Y) = \nabla\psi(Y)$.
d) $\varphi$ satisfies the Monge–Ampère equation
$$\det D^2\varphi(X) = \frac{f(X)}{g(\nabla\varphi(X))},$$
in case a) in the Alexandrov weak sense, in case b) in the classical sense. (Note that $\varphi \in C^{2,\beta}$.)

By approximation, we will develop all our discussion for $f, g$ of class $C^{\alpha}$, so we will always talk of "classical" solutions. From the variational construction of $Y$, we also have a stability theorem.

Theorem 3 (Stability). Let $f_j$, $g_j$ be uniformly bounded, measurable and supported in a bounded domain $B_R$. Assume that $f_j \to f$ in $L^1$, $g_j \to g$ in $L^1$. Then $\varphi_j \to \varphi$, $\psi_j \to \psi$ uniformly in $B_R$. In particular, if $\varphi_j$, $\psi_j$ are uniformly $C^{1,\alpha}$, then $\nabla\varphi_j$, $\nabla\psi_j$ also converge uniformly to $\nabla\varphi$, $\nabla\psi$.

We complete the discussion with the following interpretation (see [B]). If we think of $f(X)$, $g(Y)$ as probability densities, we may think of the map $Y(X)$ as a joint probability distribution $\nu_0(X, Y)$ in $\Omega_1 \times \Omega_2$, sitting on the graph $(X, Y(X))$, with the property that the marginals $\mu_1(X)$, $\mu_2(Y)$ of $\nu_0$ are exactly $f(X)\,dX$ and $g(Y)\,dY$. In fact $\nu_0$ has the following minimizing property:

Theorem ([B]). Among all probability measures $\nu(X, Y)$ with marginals $f(X)\,dX$ and $g(Y)\,dY$, $Y(X)$ minimizes
$$E(\nu) = \int |X - Y|^2\,d\nu(X, Y).$$
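The minimizing property in the last theorem has an elementary discrete analogue on the line, which can be checked by brute force. The following is a minimal sketch and not part of the paper: it assumes equal point masses at randomly chosen points (the names `sources`, `targets`, `monotone_matching` are hypothetical) and compares the monotone, sorted pairing with every other assignment under the quadratic cost.

```python
import itertools
import random

def quadratic_cost(xs, ys):
    """Total cost sum |x - y|^2 when xs[i] is sent to ys[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

def monotone_matching(xs, ys):
    """Pair the k-th smallest source point with the k-th smallest target point."""
    return list(zip(sorted(xs), sorted(ys)))

random.seed(0)
sources = [random.random() for _ in range(6)]  # illustrative equal point masses
targets = [random.random() for _ in range(6)]

pairs = monotone_matching(sources, targets)
sorted_cost = sum((x - y) ** 2 for x, y in pairs)

# Brute force over all 6! = 720 assignments of targets to the sorted sources.
brute_cost = min(
    quadratic_cost(sorted(sources), perm)
    for perm in itertools.permutations(targets)
)
```

In this one-dimensional setting the increasing rearrangement plays the role of the gradient of a convex potential, so `sorted_cost` and `brute_cost` should agree up to rounding.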
0.2. The FKG inequalities. The FKG inequalities (see [FKG, H, P]) play a fundamental role in statistical mechanics. In this paper, we are interested in a theorem of Holley [H] from which the inequalities follow. Holley's Theorem establishes a monotonicity condition for probability measures $\mu_1$, $\mu_2$ defined on a finite lattice $\Lambda$. Let us discuss briefly his two main theorems. We consider a finite lattice $\Lambda$ that we will think of as embedded in the set $P$ of vertices of the unit cube of $\mathbb{R}^N$ for some $N$ (i.e., the set of all $N$-tuples $X = (x_1, \ldots, x_N)$ with $x_i = 0$ or $1$). On $\Lambda$, we have two non-vanishing probability measures $\mu_1(X)$, $\mu_2(X)$ with the "monotonicity property": Given $X, Y$ in $\Lambda$,
$$\mu_2(X \vee Y)\,\mu_1(X \wedge Y) \ge \mu_2(X)\,\mu_1(Y).$$
(As usual $\vee$ denotes taking max in each entry, $\wedge$ min.) Then
Theorem 4 ([H]). There exists a joint measure $\nu(X, Y)$ with marginals $\mu_1(X)$, $\mu_2(Y)$ such that
$$\nu(X, Y) \ne 0 \Rightarrow X \le Y.$$
As a corollary, he obtains

Corollary 1. If $h$ is an increasing function of $X$, then
$$\int h(X)\,d\mu_1(X) \le \int h(X)\,d\mu_2(X)$$
(that is, $\mu_2$ is "concentrated more to the right" than $\mu_1$).

The purpose of this paper is to study the relation between optimal transportation and the FKG inequalities, in particular to show:
a) In the continuous case, the optimal transportation from the unit cube of $\mathbb{R}^n$ into itself ($\mu_1 = f(X)$, $\mu_2 = g(Y)$) has the proper monotonicity properties ($Y(X) \ge X$) of Holley's joint probability density provided that $f, g$ do.
b) If we "spread" the measures $\mu_i$ from the vertices of the unit cube to half cubes, the densities $f, g$ so obtained satisfy these properties, recovering from this approach Holley's theorem for the lattice formed by all vertices of the cube.
c) For a general sublattice, one can extend the "spread" measure to all of the half cubes, recovering in full the theorem of Holley.
d) In fact the discrete optimal transportation satisfies $Y(X) \ge X$.

Our proof is based on the fact that first derivatives of solutions of the Monge–Ampère equation satisfy an equation themselves. But it is also known that second derivatives are subsolutions of an elliptic equation. In the last section we explore what the implications of that fact are in terms of correlation inequalities.

In closing this introduction we want to stress that in the continuous case the optimal transport map $Y(X)$, interpreted as a joint probability measure
$$\nu(X, Y) = \delta_{X, Y(X)}(X, Y)\,f(X)\,dX = \delta_{X, Y(X)}(X, Y)\,g(Y)\,dY,$$
is not just a joint distribution but a "change of variables", i.e., a one to one map that carries one density to the other, and it is further the gradient of a convex potential, giving the map (or the measure $\nu(X, Y)$) a lot of stability.

1. Optimal Transportation from the Unit Cube to the Unit Cube and Periodic Monge–Ampère

We start this section with a reflection property of optimal transportation maps. Given $X \in \mathbb{R}^n$ we denote by $\bar{X}$ its reflection with respect to $x_1$, i.e., if $X = (x_1, x_2, \ldots, x_n)$ then $\bar{X} = (-x_1, x_2, \ldots, x_n)$.

Lemma 1. Assume that
a) $\Omega_1$, $\Omega_2$ are symmetric with respect to $x_1$, i.e., $X \in \Omega_i \Leftrightarrow \bar{X} \in \Omega_i$,
b) $f, g$ are also symmetric, i.e., $f(X) = f(\bar{X})$, $g(X) = g(\bar{X})$.
Then the optimal transportation is also symmetric, i.e.,
a) $\varphi(X) = \varphi(\bar{X})$, $\psi(Y) = \psi(\bar{Y})$,
b) $Y(\bar{X}) = \overline{Y(X)}$.

Proof. By Brenier [B], $\varphi, \psi$ are the unique minimizing pair of
$$\int \varphi(X)\,f(X)\,dX + \int \psi(Y)\,g(Y)\,dY$$
under the constraint $\varphi(X) + \psi(Y) \ge \langle X, Y\rangle$. By uniqueness, then,
$$\varphi(X) = \varphi(\bar{X}), \qquad \psi(Y) = \psi(\bar{Y}),$$
since $\varphi(\bar{X})$, $\psi(\bar{Y})$ are a competing pair with the same energy.

Remark. The lemma is valid for a general cost function $C(X)$ symmetric in $x_1$.

Corollary 2. Under the hypothesis and with the notation of the lemma, if $Y^+$ is the optimal transportation from $\Omega_1^+$ to $\Omega_2^+$, then $Y^+ = Y|_{\Omega_1^+}$: again, $\varphi(X)$, $\psi(Y)$ restricted to $X, Y$ in $(\mathbb{R}^n)^+ = \{X : x_1 > 0\}$ must be the minimizing pair.

We apply the previous lemma and corollary to densities $f(X)$ and $g(Y)$ in the unit cube of $\mathbb{R}^n$. Let $f, g$ be densities in the unit cube of $\mathbb{R}^n$, $Q_1 = \{X : 0 \le x_i \le 1\}$, and $Y$ be the optimal transportation. Let us write $Y = X + V$ and respectively $\varphi(X) = \frac{1}{2}|X|^2 + u(X)$ (that is, $V = \nabla u$). Then

Theorem 5. If we extend $f, g$ to $f^*, g^*$ on a larger cube $Q^*$ by even reflections, then $u(X)$ also extends periodically to $u^*$, to the same cube $Q^*$, by even reflection, and $Y(X)$ extends to the optimal transportation map $Y^* = X + \nabla u^*(X)$ from $Q^*$ to $Q^*$.

Corollary 3. If $f, g$ are strictly positive and $C^{\alpha}$ in the unit cube $Q_1$, then $Y(X)$ maps each face of the cube to itself and both $Y(X)$, $X(Y)$ have a $C^{1,\alpha}$ extension across $\partial Q_1$.

Proof. It follows from the interior regularity theory (the above theorem) since each face of $Q_1$ becomes interior after a reflection.

Remark. The problem of finding "periodic" solutions to the Monge–Ampère equation was solved by Yanyan Li [L] by a different method.
2. Monotonicity Properties of Y(X)

We start with a heuristic discussion. Recall that the Holley condition on $\mu_2$, $\mu_1$ was that
$$\mu_2(A \vee B)\,\mu_1(A \wedge B) \ge \mu_2(A)\,\mu_1(B).$$
Logarithmically,
$$\log\mu_2(A \vee B) - \log\mu_2(A) \ge \log\mu_1(B) - \log\mu_1(A \wedge B).$$
Let us now think of smooth densities $f(X)$, $g(Y)$ on the unit cube, and assume we are trying to prove, by a continuity argument, that $Y(X)$ is monotone, that is, $Y(X) \ge X$. So we are looking at a continuous family of densities $f^t$, $g^t$ for which $Y(X) > X$, and we find a first time $t_0$ and a point $X_0$ for which $Y(X_0) \not> X_0$, that is, for some coordinate, say the first, $y_1(X_0) = x_1(X_0)$. That means that $y_1(X) - x_1(X)$ has a local minimum, zero, at $X_0$. But it is well known that $y_1 = D_1\varphi$ satisfies an elliptic equation, obtained by differentiating the equation for $\varphi$. From
$$\log\det D^2\varphi = \log f(X) - \log g(\nabla\varphi)$$
we get
$$M_{ij}\,D_{ij}(D_1\varphi) = (\log f)_1(X) - (\log g)_i(\nabla\varphi)\,D_{i1}\varphi.$$
Since $\varphi_1 - x_1$ has a minimum, zero, at $X_0$, $D_{i1}\varphi = \delta_{i1}$, and we get at $X_0$, $Y(X_0)$,
$$M_{ij}\,D_{ij}[y_1 - x_1] = (\log f)_1(X) - (\log g)_1(Y).$$
Since $M_{ij}$ is a strictly positive matrix for $\varphi$ strictly convex and $y_1 - x_1$ has a minimum, the left-hand side must be non-negative. If we impose the right-hand side to be non-positive we have a contradiction. About the right-hand side, we know that $Y > X$ and that $\langle Y - X, e_1\rangle = 0$, so the natural hypothesis we want to impose on $f, g$ is the following.

Monotonicity hypothesis. If $Y \ge X$ and $\langle Y - X, e_i\rangle = 0$, then
$$D_i(\log g)(Y) \ge D_i(\log f)(X).$$
Note. If we think of $A = Y$ and $B = X + te_i$ we can argue heuristically that $B \vee A = Y + te_i$ and $B \wedge A = X$, so
$$\log g(Y + te_i) - \log g(Y) \ge \log f(X + te_i) - \log f(X)$$
becomes Holley's condition. We will show in fact later how to associate to a discrete "Holley" pair a continuous one satisfying our hypothesis.
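The differentiation step used in the heuristic can be written out as follows. This is a sketch under the assumption that $\varphi$ is smooth and strictly convex; reading $M_{ij}$ as the entries of $(D^2\varphi)^{-1}$, the coefficients of the linearization of $\log\det$, is the usual convention and is assumed here rather than stated in the text.

```latex
% Jacobi's formula: d(\log\det A) = \operatorname{tr}(A^{-1}\,dA) for an invertible matrix A.
\begin{align*}
\log\det D^2\varphi(X) &= \log f(X) - \log g(\nabla\varphi(X)),\\
D_1\bigl[\log\det D^2\varphi\bigr] &= M_{ij}\,D_{ij}(D_1\varphi),
  \qquad (M_{ij}) = (D^2\varphi)^{-1},\\
D_1\bigl[\log g(\nabla\varphi)\bigr] &= (\log g)_i(\nabla\varphi)\,D_{i1}\varphi,\\
\text{hence}\quad M_{ij}\,D_{ij}(D_1\varphi) &= (\log f)_1(X) - (\log g)_i(\nabla\varphi)\,D_{i1}\varphi .
\end{align*}
```

Since $(D^2\varphi)^{-1}$ is positive definite when $\varphi$ is strictly convex, the operator $M_{ij}D_{ij}$ is elliptic, which is what the minimum argument uses.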
But first we prove our main comparison theorem.

Theorem 6. Let $f, g$ be $C^{1,\alpha}$, strictly positive probability densities in the unit cube $Q$ of $\mathbb{R}^n$. Assume that given any $X, Y, e_j$ with $X \le Y$ and $\langle X - Y, e_j\rangle = 0$ (i.e., $y_j - x_j = 0$),
$$(D_j\log f)(X) \le (D_j\log g)(Y),$$
and let $Y(X)$ be the optimal transportation map. Then for any $X$ in $Q$, $Y(X) \ge X$.

Proof. As we pointed out before, we know that the potentials $\varphi(X)$, $\psi(Y)$ are of class $C^{2,\alpha}$ across $\partial Q_1$ and the $C^{1,\alpha}$ optimal transportations $Y(X)$, $X(Y)$ map each face of the cube into itself in a $C^{1,\alpha}$ fashion. In particular, classical regularity theory for fully nonlinear equations applies to $\varphi$ in the interior of the cube. More precisely, $\varphi$ satisfies
$$\det D_{ij}\varphi = \frac{f(X)}{g(\nabla\varphi)}$$
(see [G-T]) and, $f, g$ being $C^{1,\alpha}$ (this is not kept by reflection along the faces), we have that $\varphi$ is of class $C^{3,\alpha}(Q)$.

We now study directional derivatives along the boundary of $Q_1$. Consider $D_1\varphi$ outside the faces $x_1 = 0$, $x_1 = 1$. Then, across the remaining boundary of $Q_1$, $y_1(X) = D_1\varphi$ satisfies
$$M_{ij}\,D_{ij}(D_1\varphi) = D_1\log f(X) - D_{\ell}(\log g)\,D_{\ell 1}\varphi.$$
Both $M_{ij}$ and the right-hand side are of class $C^{\alpha}$ (since $D_1\log f$ is tangential to the face). Hence $y_1(X)$ is of class $C^{2,\alpha}$ across that part of the boundary and the equation is satisfied in the classical sense.

In order to make the $f, g$ relation strict we change $g$ to $g_\varepsilon$ by defining
$$\log g_\varepsilon(Y) = \log g + \varepsilon\sum_i y_i + C_\varepsilon,$$
where the constant $C_\varepsilon$ is chosen so that $\int g_\varepsilon(Y)\,dY = 1$. Then from the condition $D_j\log f(X) \le D_j\log g(Y)$ for $y_j - x_j = 0$, we now have, for $0 < \gamma < \delta(\varepsilon)$ small enough:
$$D_{j_0}\log f(X) \le D_{j_0}\log g_\varepsilon(Y) - \delta$$
if $|y_{j_0} - x_{j_0}| < \gamma$ for some $j_0$ and $y_j - x_j > -\gamma$ for the remaining $j$.
We now look at the continuous family of densities $f_t$, $g_t$ defined by
$$\log f_t = t\log f + C(t), \qquad \log g_t = t\log g_\varepsilon + D(t),$$
where $C(t)$, $D(t)$ are chosen to keep $\int f_t = \int g_t = 1$, and we show

Lemma 2. For any $0 < t < 1$ the corresponding (continuous in $t$) family of optimal transports $Y_t(X)$ satisfies $y_j^t \ge x_j - \frac{1}{2}\gamma$.

Proof. For $t = 0$, $Y(X)$ is the identity map, and thus the inequality is satisfied for $t$ small. As usual, suppose there exists a first value $t_0 > 0$ for which the inequality is not satisfied. Thus, there exist $X_0$ and a $j$ (say $j = 1$) such that $y_1(X_0) = x_1(X_0) - \frac{1}{2}\gamma$ and still $y_1(X) \ge x_1(X) - \frac{1}{2}\gamma$ everywhere else. We first note that $x_1(X_0) \ne 0, 1$ because, if not, $y_1(X_0) = x_1(X_0)$. But everywhere else we have
$$0 \le M_{ij}\,D_{ij}\,y_1(X_0)$$
(since $y_1 - x_1$ has a minimum at $X_0$) and
$$D_1\log f(X_0) \le D_1\log g(Y(X_0)) - t\delta$$
(since $|y_1 - x_1| = \gamma/2$ and $y_j \ge x_j - \gamma/2$ for the remaining $j$). This is a contradiction that completes the proof of the lemma and the theorem.
Corollary 4. Let $0 < \lambda \le f, g \le \Lambda$ be measurable. Suppose that $\log f$, $\log g$ satisfy the hypothesis of the theorem in the sense of distributions. Then the theorem still holds, i.e., $Y(X) \ge X$.

Proof. Mollify $\log f$, $\log g$ to $\log f_\varepsilon$, $\log g_\varepsilon$ with a standard (radially symmetric, nonnegative, compactly supported) mollifier $\varphi_\varepsilon$. Then the hypothesis of Theorem 6 is satisfied as long as $X, Y$ stay at distance $\varepsilon$ from $\partial Q_1$. Take as center of coordinates the center of the cube, $X = (\frac{1}{2}, \frac{1}{2}, \ldots, \frac{1}{2})$, and make a $2\varepsilon$-dilation. The new $f_\varepsilon$, $g_\varepsilon$ satisfy the hypothesis of Theorem 6 when restricted to the unit cube. Thus Theorem 1 holds for them. By passing to the limit on the maps, the theorem holds for $f, g$.
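In one dimension the hypothesis of Theorem 6 reduces to $(\log f)'(x) \le (\log g)'(x)$, and the optimal map is the increasing rearrangement $Y = G^{-1}\circ F$ of the cumulative distributions. The sketch below is only an illustration, not the paper's argument: it assumes the specific densities $f \equiv 1$ and $g(y) \propto e^{y}$ on $[0, 1]$, and the helper names and grid size are hypothetical choices. It checks $Y(x) \ge x$ numerically.

```python
import math

def cdf_on_grid(density, n):
    """Normalised cumulative distribution of `density` on the grid i/n (trapezoid rule)."""
    vals = [density(i / n) for i in range(n + 1)]
    acc = [0.0]
    for a, b in zip(vals, vals[1:]):
        acc.append(acc[-1] + 0.5 * (a + b) / n)
    return [c / acc[-1] for c in acc]

def transport_map_1d(f, g, n=2000):
    """Grid values of the increasing map Y on [0, 1] with G(Y(x)) = F(x)."""
    F, G = cdf_on_grid(f, n), cdf_on_grid(g, n)
    ys, j = [], 0
    for Fx in F:
        while j < n - 1 and G[j + 1] < Fx:
            j += 1
        t = (Fx - G[j]) / (G[j + 1] - G[j])  # linear interpolation of G^{-1}
        ys.append((j + min(max(t, 0.0), 1.0)) / n)
    return ys

n = 2000
f = lambda x: 1.0          # uniform density, (log f)' = 0
g = lambda y: math.exp(y)  # unnormalised, (log g)' = 1 >= (log f)'
xs = [i / n for i in range(n + 1)]
ys = transport_map_1d(f, g, n)

# Closed form for this particular pair: Y(x) = log(1 + (e - 1) x).
exact = [math.log(1.0 + (math.e - 1.0) * x) for x in xs]
```

Swapping the roles of `f` and `g` gives a map with $Y(x) \le x$, which is consistent with the direction of the inequality in the hypothesis.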
3. Holley's Theorem when the Lattice is all of the Vertices of the Unit Cube

Given a vertex $X \in P$, we will denote by $Q_X$ the subcube of $Q_1$, of side $1/2$, that has $X$ as a vertex:
$$Q_X = \{Z : |Z - X|_{L^\infty} \le 1/2\}.$$
We prove the following theorem.

Theorem 7. Let $f, g$ be the step functions
$$f = \sum_{X \in P}\mu_1(X)\,\chi_{Q_X}, \qquad g = \sum_{X \in P}\mu_2(X)\,\chi_{Q_X}.$$
Assume that given vertices $X$, $Y$, $X + e_j$, $Y + e_j$ with $Y \ge X$ and $\langle Y, e_j\rangle = \langle X, e_j\rangle = 0$ we have
$$\log\mu_2(Y + e_j) - \log\mu_2(Y) \ge \log\mu_1(X + e_j) - \log\mu_1(X).$$
Then $Y(X) \ge X$.

Proof. As a distribution, $D_j\log f$ (resp. $D_j\log g$) is the jump function $\log\mu_1(X + e_j) - \log\mu_1(X)$ (resp. $\log\mu_2(X + e_j) - \log\mu_2(X)$) supported on the face of $Q_X$ lying in the plane $x_j = 1/2$. Hence the hypothesis of Theorem 6 holds in the sense of distributions, and Corollary 4 applies.
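The lattice condition and its consequence, Corollary 1, are easy to test numerically on a small cube. The sketch below is an illustration only: the Gibbs-type weights in `gibbs` (nearest-neighbour agreement plus an external field) are a hypothetical choice of measures, not taken from the paper, and the list `increasing` contains a few hand-picked coordinatewise increasing functions.

```python
import math
from itertools import product

N = 3
P = list(product((0, 1), repeat=N))  # vertices of the unit cube

def join(x, y):
    """Coordinatewise max, written X v Y in the text."""
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    """Coordinatewise min, written X ^ Y in the text."""
    return tuple(min(a, b) for a, b in zip(x, y))

def gibbs(field, beta=0.7):
    """Hypothetical test measure: weight exp(beta * #agreeing neighbours + field * sum(x))."""
    w = {}
    for x in P:
        agree = sum(1 for a, b in zip(x, x[1:]) if a == b)
        w[x] = math.exp(beta * agree + field * sum(x))
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}

def holley(mu1, mu2, tol=1e-12):
    """Check mu2(X v Y) mu1(X ^ Y) >= mu2(X) mu1(Y) for all X, Y in P."""
    return all(
        mu2[join(x, y)] * mu1[meet(x, y)] >= mu2[x] * mu1[y] - tol
        for x in P for y in P
    )

def expectation(h, mu):
    return sum(h(x) * p for x, p in mu.items())

mu1, mu2 = gibbs(field=-0.5), gibbs(field=0.4)
increasing = [sum, max, min, lambda x: x[0] * x[2]]
gaps = [expectation(h, mu2) - expectation(h, mu1) for h in increasing]
```

For this pair the condition holds with the larger field in the role of $\mu_2$ and fails with the roles exchanged, and every entry of `gaps` should be non-negative.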
Corollary 5. Let $Z_1, Z_2 \in P$. Define
$$\nu(Z_1, Z_2) = \frac{\mu_1(Z_1)}{|Q_{1/2}|}\,\bigl|\{X \in Q_{Z_1} : Y(X) \in Q_{Z_2}\}\bigr| = \frac{\mu_2(Z_2)}{|Q_{1/2}|}\,\bigl|\{Y \in Q_{Z_2} : X(Y) \in Q_{Z_1}\}\bigr|.$$
Then
a) $\nu$ is a probability measure with marginals $\mu_1(Z_1)$, $\mu_2(Z_2)$,
b) $\nu(Z_1, Z_2) \ne 0 \Rightarrow Z_2 \ge Z_1$.

4. Holley's Theorem for General Lattices

Given a lattice $\Lambda \subset P$ and two measures $\mu_1$, $\mu_2$ satisfying the Holley condition, we want to extend $\mu_1$, $\mu_2$ to small perturbations $\mu_1^*$, $\mu_2^*$ in all of $P$ keeping the inequalities. Usually, $\mu_1$, $\mu_2$ are extended by zero. We need to be a little more careful. We state the following presentation of $\Lambda$.

Lemma 3. There is a partition of
$$\mathbb{R}^N = \mathbb{R}^{k_1} \otimes \mathbb{R}^{k_2} \otimes \cdots \otimes \mathbb{R}^{k_\ell}$$
and a family of elements $w_i^j$ ($1 \le j \le \ell$, $1 \le i \le k_j$) such that any non zero element $X \in \Lambda$ is the max of $w_i^j$,
$$X = \bigvee_{(i,j) \in I_X} w_i^j,$$
and
$$w_i^j = e_i^j + v$$
with the coordinates $v_i^s = 0$ for all $s \ge j$. (More precisely, $w_i^1 = e_i^1$, $w_i^2 = e_i^2 + v$ with $v \in \mathbb{R}^{k_1}$, $w_i^3 = e_i^3 + v$ with $v \in \mathbb{R}^{k_1 + k_2}$, and so on.)
Proof. The decomposition is by first choosing the minimal elements $\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_{k_1}$ and contracting the ones in them to only one position. Next we choose minimal elements among those not in $\mathbb{R}^{k_1}$, and so on.

We now extend the lattice and the measure. Let $\bar{\Lambda}$ be the following extension of $\Lambda$:
$$\bar{\Lambda} = \Lambda \cup \Lambda_0,$$
where
$$w \in \Lambda_0 \Leftrightarrow \max(w, e_1) \in \Lambda$$
(that is, we add to all those elements with a 1 as first coordinate, those with a zero). Given $w$ in $\bar{\Lambda}$ define
$$w^+ = w \vee e_1, \qquad w^- = w^+ - e_1$$
(i.e., $w$ with a zero in the position $e_1$). Define
$$\mu^*(w) = \begin{cases} \mu(w) & \text{if } w \in \Lambda,\\ \mu(w^+)/M & \text{otherwise } (M \text{ large}).\end{cases}$$

Theorem 8. $\bar{\Lambda}$ is a lattice and $\mu_1^*$, $\mu_2^*$ still satisfy
$$\log\mu_2^*(v_1 \vee v_2) - \log\mu_2^*(v_2) \ge \log\mu_1^*(v_1) - \log\mu_1^*(v_1 \wedge v_2).$$

Proof. Elements in $\bar{\Lambda}$ are $w^+$ and $w^-$ of elements in $\Lambda$ ($w^+$ is always in $\Lambda$ since $e_1 \in \Lambda$). Then
$$v_1 \wedge v_2 = w_1^{\pm} \wedge w_2^{\pm}$$
for $w \in \Lambda$. If one of the signs is a $-$, $v_1 \wedge v_2 = (w_1 \wedge w_2)^-$. If not,
$$v_1 \wedge v_2 = w_1 \wedge w_2.$$
Also
$$v_1 \vee v_2 = w_1^{\pm} \vee w_2^{\pm}.$$
If one of the signs is a $+$ (since $w^+ \in \Lambda$), $v_1 \vee v_2 = w_1 \vee w_2$. If not,
$$v_1 \vee v_2 = (w_1 \vee w_2)^-.$$
About the measures $\mu_1^*$, $\mu_2^*$, let us verify the proper inequalities. For that purpose we choose $M \gg \mu_i(X)$ for any $X$. There are several cases to consider.
a) $w_1, w_2 \in \Lambda$; then $w_1 \wedge w_2$, $w_1 \vee w_2 \in \Lambda$ and everything is as before.
b) $w_1 \in \Lambda$, $w_2 \notin \Lambda$ (thus $w_2 = w_2^-$).
b1) If $w_1 = w_1^-$, we have that $w_1 \wedge w_2 \in \Lambda$ and $w_1 \vee w_2 \notin \Lambda$, and the factor $\log M$ cancels in the $\mu_2^*$ expression.
b2) If $w_1 = w_1^+$, $w_1 \vee w_2 \in \Lambda$. If $w_1 \wedge w_2 \in \Lambda$, the extra factor $\log M$ in the $\mu_2^*$ expression controls everything else (we choose $\log M \gg \sup|\log\mu_i|$). If $w_1 \wedge w_2 \notin \Lambda$,
$$\mu_1^*(w_1 \wedge w_2) = \mu_1(w_1 \wedge w_2^+)/M \quad\text{and}\quad \mu^*(w_2) = \mu(w_2^+)/M,$$
thus each term has an extra $\log M$ factor that cancels.
c) $w_2 \in \Lambda$, $w_1 \notin \Lambda$.
c1) If $w_2 = w_2^+$, then $w_1 \vee w_2 \in \Lambda$. If $w_1 \wedge w_2 \in \Lambda$, the extra term $-\log M$ in the $\mu_1$ expression controls everything. If $w_1 \wedge w_2 \notin \Lambda$, then
$$\mu_1^*(w_1 \wedge w_2) = \mu(w_1^+ \wedge w_2)/M, \qquad \mu_1^*(w_1) = \mu(w_1^+)/M,$$
and we have $\log M$ cancellation.
c2) If $w_2 = w_2^-$, then $w_1 \wedge w_2 \in \Lambda$, but $w_1 \vee w_2 \notin \Lambda$, and we have
$$\mu_2^*(w_1 \vee w_2) = \mu_2(w_1^+ \vee w_2)/M, \qquad \mu_1^*(w_1) = \mu_1(w_1^+)/M,$$
and there is a $\log M$ factor cancellation.
d) If $w_1 \notin \Lambda$, $w_2 \notin \Lambda$, then $w_1 \vee w_2 \notin \Lambda$. If $w_1 \wedge w_2 \notin \Lambda$, the factors $\log M$ cancel. If not, the extra factor $\log M$ in the $\mu_1^*$ expression controls everything else.
The proof of the theorem is complete.
Theorem 9. We are given $\Lambda \subset P$ and $\mu_1$, $\mu_2$. As before, let $f, g$ be the step functions
$$f = \sum_{w_i \in \Lambda}\mu_1(w_i)\,\chi_{Q_{w_i}}, \qquad g = \sum_{w_i \in \Lambda}\mu_2(w_i)\,\chi_{Q_{w_i}}.$$
Then the optimal transportation map $Y(X)$ is monotone.

Proof. If we start with $M = M_0$ and we repeat the extension process ($M_1 \gg M_0$, $M_2 \gg M_1$ and so on) we exhaust $P$. Note that once we have extended through $e_1^1, \ldots, e_{k_1}^1$, the elements $e_1^2, \ldots, e_{k_2}^2$ belong now to the lattice and are minimal, so we can keep extending. As $M_0$ goes to infinity the measures $\mu_i^*$ converge to $\mu_i$.

We complete this work by showing that, actually, the discrete optimal transportation map is monotone. In this case the map is in general multi-valued. That is, the mass $\mu_1(w)$ may have to be spread through several points $v$. Still, for all those $v$'s, $v(w) \ge w$.

Theorem 10. Let $\Lambda$ be a sublattice of $P$, the set of vertices of the unit cube in $\mathbb{R}^n$, and let $\mu_1$, $\mu_2$ be positive measures in $\Lambda$ satisfying the usual monotonicity condition. Let $\nu(X, Y)$ be the (discrete) optimal transportation. Then
$$\nu(X, Y) \ne 0 \Rightarrow Y \ge X.$$

Proof. From the previous theorem we may assume that $\mu_i$ is defined and positive in all of $P$. We will approximate it by bounded densities $f, g$ that satisfy the hypothesis of Theorem 6. We define them as follows. Let $\mathbf{1}$ be the vector $\mathbf{1} = (1, 1, \ldots, 1)$. In the strip
$$S_\omega^\varepsilon = \{\varepsilon\omega < X \le \omega + \varepsilon\mathbf{1}\},$$
let $N(X, \omega)$ be the number of coordinates $j$ for which $\omega_j - x_j > \varepsilon$, and we define there, for $\delta \ll \varepsilon$,
$$f(X) = \mu_1(\omega)\,\delta^N.$$
Note that the $S_\omega^\varepsilon$ cover $Q_1$ disjointly (given $X$ we determine $\omega$ by those coordinates $x_j > \varepsilon$). Same definition for $g$.
Of course, we have to multiply as usual by a normalization constant to make $\int f = \int g = 1$, but this does not affect the logarithmic inequality. Also, if $\delta$ goes to zero much faster than $\varepsilon$ (say like $\varepsilon^{2N}$), $f$ and $g$ converge to $\mu_1$ and $\mu_2$, since most of the mass concentrates in the cube $Q_\varepsilon(\omega) = \{|x_i - \omega_i| < \varepsilon\}$. About $D_i\log f$, $D_i\log g$, they are jump functions concentrated on the planes $x_j = \varepsilon$ or $1 - \varepsilon$, so we have to check that the jump inequalities are satisfied. We also may disregard plane intersections since they will not affect $D_i f$ in the distributional sense. So we check that
a) For $X \le Y$ and $x_i = y_i = \varepsilon$ we have $\mathrm{Jump}(\log g) \ge \mathrm{Jump}(\log f)$. Indeed, when $x_i, y_i$ go through $\varepsilon$ we change from evaluating the measures at $w_1$ (resp. $w_2$) to $w_1 + e_i$, $w_2 + e_i$, and both $N(X)$, $N(Y)$ increase by one, so the jump relation holds (they are the lattice relations plus a factor $\log\delta$).
b) When $x_i, y_i$ go through $(1 - \varepsilon)$, $w_1$ and $w_2$ remain unchanged and $N(X)$, $N(Y)$ both decrease by one. Also here the jump relation holds (both jumps are just $\log\delta$).
This completes the proof.

5. Second Derivative Estimates

In this section we explore what the implications are of the fact that second derivatives of solutions to Monge–Ampère equations are subsolutions of an elliptic equation. First a heuristic discussion: Let us take a second pure derivative of the equation
$$\log\det D_{ij}\varphi = \log f(X) - \log g(\nabla\varphi).$$
We get
$$M_{ij}\,D_{ij}\varphi_{\alpha\alpha} + M_{ij,k\ell}\,D_{ij\alpha}\varphi\,D_{k\ell\alpha}\varphi = D_{\alpha\alpha}\log f - (\log g)_{ij}\,\varphi_{i\alpha}\varphi_{j\alpha} - (\log g)_i\,\varphi_{\alpha\alpha i}.$$
From the concavity of $\log\det$ the second term on the left is negative. If $\varphi_{\alpha\alpha}$ reaches at $X_0$ the maximum value among all pure second derivatives, then the right-hand side must be negative. Let us look at the explicit case in which, up to a constant, $f = e^{-Q(X)}$ and $g = e^{-(Q(Y) + F(Y))}$, where $Q$ is a nonnegative quadratic polynomial, $\sum a_{ij}x_ix_j$ (for instance, near neighborhood or other "Dirichlet Integral"-like interactions in field theory). We may assume that $\alpha = e_1$. Then we must compute $D_{11}(-Q(X) + Q(\nabla\varphi) + F(\nabla\varphi))$; we have
$$D_{11}(-Q)(X) = -a_{11}, \qquad D_{11}Q(\nabla\varphi) = a_{ij}\,\varphi_{i1}\varphi_{j1} + a_{ij}\,\varphi_{i11}\varphi_j.$$
But since $\varphi_{11}(X_0)$ is the maximum among all pure second derivatives, $\varphi_{11i} = 0$ for all $i$, and $\varphi_{1i} = 0$ for $i \ne 1$. So
$$D_{11}Q(\nabla\varphi(X_0)) = a_{11}(\varphi_{11})^2.$$
Finally, if $F$ is convex,
$$D_{11}F(\nabla\varphi) = F_{ij}\,\varphi_{i1}\varphi_{j1} + F_i\,\varphi_{i11}$$
is non-negative.
Therefore
$$D_{11}(\mathrm{R.H.S.}) \ge a_{11}\bigl((\varphi_{11})^2 - 1\bigr).$$
We get a contradiction if $\varphi_{11} > 1$. That is

Theorem 11. Let, up to a multiplicative constant,
$$f(X) = e^{-Q(X)}, \qquad g(Y) = e^{-(Q(Y) + F(Y))}$$
with $F$ convex. Then the potential $\varphi$ of the optimal transportation satisfies $0 \le \varphi_{\alpha\alpha} \le 1$. In particular $Y = X + \nabla u(X)$, where $u = \varphi - \frac{1}{2}|X|^2$ is concave and $-1 \le u_{\alpha\alpha} \le 0$ (independently of dimension).

Proof. To make the previous discussion valid we have to take care of what happens when $X$ goes to infinity. Again by approximation we may assume that the convex function $F(X)$ is $+\infty$ outside the ball $B_R$ (that is, $g$ is supported in the ball of radius $R$, and smooth, bounded away from zero and infinity inside it). We will replace the second derivative by an incremental quotient, and show that it still satisfies a maximum principle and goes to zero at infinity. Let
$$(\delta\varphi_e)(X) = \varphi(X + he) + \varphi(X - he) - 2\varphi(X).$$
We fix $h$, and study what happens if $\delta\varphi = \delta\varphi_{e_1}$ attains a local maximum at $X_0$, for all possible $e$. From the concavity of $\log\det$, we still have that, for the linearization coefficients $M_{ij}$ of $\log\det$ at $X_0$,
$$M_{ij}\,D_{ij}\,\delta\varphi(X_0) \le \delta(\log f - \log g) = \delta\bigl(-Q(X) + Q(\nabla\varphi) + F(\nabla\varphi)\bigr).$$
From the fact that $\delta\varphi_{e_1}$ realizes a maximum among $X$ and $e$, we obtain
a) $\nabla\delta\varphi = \nabla\varphi(X_0 + he_1) + \nabla\varphi(X_0 - he_1) - 2\nabla\varphi(X_0) = 0$ and
b) for any $\tau \perp e_1$, $D_\tau\delta\varphi = \tau\cdot\bigl(\nabla\varphi(X_0 + he_1) - \nabla\varphi(X_0 - he_1)\bigr) = 0$.
Therefore
$$\nabla\varphi(X_0 \pm he_1) = \nabla\varphi(X_0) \pm \lambda e_1$$
and $\delta\varphi = 2\lambda$ ($\lambda$ positive). Then, from the convexity of $F$, $\delta F(\nabla\varphi(X_0)) \ge 0$.
If we write $Q(X)$ as a bilinear form $Q(X) = B(X, X)$,
$$\delta Q(\nabla\varphi) = B(\nabla\varphi(X_0) + \lambda e_1, \nabla\varphi(X_0) + \lambda e_1) + B(\nabla\varphi(X_0) - \lambda e_1, \nabla\varphi(X_0) - \lambda e_1) - 2B(\nabla\varphi(X_0), \nabla\varphi(X_0)) = \lambda^2 B(e_1, e_1).$$
Similarly $\delta Q(X) = h^2 B(e_1, e_1)$, so we get: If $\delta\varphi$ has an interior maximum at $X_0$, then it must hold that
$$\nabla\varphi(X_0 \pm he_1) = \nabla\varphi(X_0) \pm \lambda e_1$$
with $\lambda < h$. But, since $\varphi$ is convex,
$$\varphi(X_0 \pm he_1) - \varphi(X_0) \le \langle\nabla\varphi(X_0 \pm he_1) - \nabla\varphi(X_0), \pm he_1\rangle = \lambda h \le h^2.$$
Thus $\delta\varphi \le 2h^2$, the desired inequality. To complete the proof of the theorem it would be enough to show that $\delta\varphi$ goes to zero (for fixed $\delta$) when $X$ goes to infinity. We show that:

Lemma 4. As $X$ goes to infinity, $Y$ converges uniformly to $R\,\frac{X}{|X|}$.

Proof. Let $X_0 = \lambda e_1$ for $\lambda$ large and $Y_0$ its image. Let $\nu$ be a unit vector with
$$\mathrm{angle}(\nu, e_1) \le \frac{\pi}{2} - \varepsilon.$$
From the monotonicity of the map, any point in $B_R$ of the form
$$Y' = Y_0 + t\nu$$
must come from a vector
$$X' = X_0 + s\mu,$$
with $\langle\mu, \nu\rangle \ge 0$. In particular, we must have $\mathrm{angle}(\mu, e_1) \le \pi - \varepsilon$. In other words, if in $Y$ space we consider the cone
$$\Gamma = \{Y' = Y_0 + t\nu, \text{ with } t > 0,\ \mathrm{angle}(\nu, e_1) \le \tfrac{\pi}{2} - \varepsilon\},$$
its intersection with $B_R$ must be covered by the image of the (concave) cone
$$\tilde{\Gamma} = \{X' = X_0 + s\mu, \text{ with } s > 0 \text{ and } \mathrm{angle}(\mu, e_1) \le \pi - \varepsilon\}.$$
But $\tilde{\Gamma}$ has very small $f$ measure,
$$\mu_f(\tilde{\Gamma}) \le (\varepsilon\lambda)^n e^{-(\varepsilon\lambda)^2}, \qquad \varepsilon\lambda > \lambda^{1/2},$$
since the ball of radius $\varepsilon\lambda$ is not contained in $\tilde{\Gamma}$. On the other hand, $g$ is strictly positive in $B_R$, so
$$\mu_g(\Gamma \cap B_R) \sim |\Gamma \cap B_R| \le \mu_f(\tilde{\Gamma}).$$
This forces the exponential convergence of $Y$ to $Re_1$. This completes the proof of the lemma and the theorem, since the uniform convergence of $\nabla\varphi$ to $R\,\frac{X}{|X|}$ makes $\delta\varphi$ go to zero (for any fixed, positive $h$).

We state three corollaries of this last inequality. The first two are a generalization of the classic Brascamp–Lieb moment inequality and the third an eigenvalue inequality.

Corollary 6. Let $f(X) = e^{-Q(X)}$, $g(Y) = e^{-[Q(Y) + F(Y)]}$ with $Q$ quadratic and $F$ convex, and let $\Gamma$ be a convex function of one variable ($|x_1|^{\alpha}$ in [B-L]). Then
$$E_g\bigl(\Gamma(y_1 - E_g(y_1))\bigr) \le E_f\bigl(\Gamma(x_1)\bigr).$$

Proof. It follows from [B-L] that it is enough to prove it in the one dimensional case (see Theorem 5.1 of [B-L]). We can also assume by a translation that $E_g(y_1) = 0$. By the change of variable formula that means
$$\int y(x)\,f(x)\,dx = 0.$$
Also
$$E_g(\Gamma(y_1)) = \int \Gamma(y_1(x))\,f(x)\,dx.$$
But $y(x) = x + u(x)$, where $y = \varphi'(x)$, $\varphi$ convex, and $u = \psi'(x)$, $\psi$ concave. Thus $y$ is increasing, and $u$ is decreasing and changes sign, since
$$\int u(x)\,f(x)\,dx = \int y(x)\,f(x)\,dx = 0.$$
Say $u(x_0) = 0$. Then, since $\Gamma$ is convex, we write
$$\int \Gamma(y(x))\,f(x)\,dx \le \int \bigl[\Gamma(x) + \Gamma'(y(x))(y - x)\bigr] f(x)\,dx \le E_f(\Gamma(x)) + \int \bigl[\Gamma'(y(x)) - \Gamma'(x_0)\bigr](y - x)\,f(x)\,dx.$$
But at $x_0$, $\Gamma'(y(x_0)) = \Gamma'(x_0)$ and $y(x_0) = x_0$, and further $\Gamma'$ is increasing, while $y - x = u$ is decreasing; thus the last integrand is negative, and this completes the proof.
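In one dimension both the bound $\varphi_{\alpha\alpha} \le 1$ of Theorem 11 and the moment inequality of Corollary 6 can be examined numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it takes $Q(x) = x^2/2$ and the convex perturbation $F = \cosh$ (a hypothetical choice), truncates the line to $[-8, 8]$, builds the monotone map $Y = G^{-1}\circ F$ from the cumulative distributions (here $F$, $G$ denote the CDFs in the code), and uses $\Gamma(x) = |x|^a$ for $a = 1, 2, 4$.

```python
import math

L, n = 8.0, 4000
h = 2 * L / n
xs = [-L + i * h for i in range(n + 1)]

def trapezoid(vals):
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def normalise(vals):
    total = trapezoid(vals)
    return [v / total for v in vals]

def cdf(dens):
    acc = [0.0]
    for a, b in zip(dens, dens[1:]):
        acc.append(acc[-1] + 0.5 * h * (a + b))
    return acc

def expect(func, dens):
    return trapezoid([func(x) * d for x, d in zip(xs, dens)])

# f is Gaussian; g is the Gaussian times exp(-cosh), a log concave perturbation.
f = normalise([math.exp(-0.5 * x * x) for x in xs])
g = normalise([math.exp(-(0.5 * x * x + math.cosh(x))) for x in xs])

# Monotone map Y = G^{-1}(F(x)) evaluated on the grid.
F, G = cdf(f), cdf(g)
ys, j = [], 0
for Fx in F:
    while j < n - 1 and G[j + 1] < Fx:
        j += 1
    span = G[j + 1] - G[j]
    t = (Fx - G[j]) / span if span > 0 else 0.0
    ys.append(xs[j] + min(max(t, 0.0), 1.0) * h)

# Centred difference quotient of Y on |x| <= 3, away from the numerically empty tails.
slopes = [(ys[i + 1] - ys[i - 1]) / (2 * h) for i in range(1, n) if abs(xs[i]) <= 3.0]
max_slope = max(slopes)

# Pairs (E_g |y - E_g y|^a, E_f |x|^a) for a few exponents a.
mean_g = expect(lambda y: y, g)
moments = {
    a: (expect(lambda y: abs(y - mean_g) ** a, g), expect(lambda x: abs(x) ** a, f))
    for a in (1, 2, 4)
}
```

The slope check is restricted to a central window because the computed map is unreliable where both densities underflow; within that window `max_slope` should stay below 1 and each first entry of `moments` below the second.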
If we want to repeat the argument above for functions $\Gamma$ that depend on more than one variable, and we want to prove that
$$E_g\bigl(\Gamma(Y - E_g(Y))\bigr) \le E_f\bigl(\Gamma(X)\bigr),$$
we may as before assume that $E_g(Y) = 0$. That means, with $Y = X + U$, that $U(X_0) = 0$ for some $X_0$ (i.e., the concave function $-\psi$ has a maximum). The same computation then gives us
$$E_g(\Gamma(Y)) \le E_f(\Gamma(X)) + \int \bigl\langle\nabla\Gamma(Y) - \nabla\Gamma(X_0), -\nabla\psi(Y)\bigr\rangle f(X)\,dX,$$
where $\psi$ and $\Gamma - \langle\nabla\Gamma(X_0), X - X_0\rangle$ are both convex with a minimum at $X_0$, so there is some hope that the integrand be negative. For instance, if we are looking at statistics of $k$ variables we have the following corollary.

Corollary 7. Assume that $Q(X)$, $F(X)$ in the definition of $f(X)$, $g(Y)$ are symmetric with respect to $(x_1, \ldots, x_k)$ and that $\Gamma(x_1, \ldots, x_k)$ is convex and symmetric. Then
$$E_g(\Gamma(Y)) \le E_f(\Gamma(X)).$$

Proof. As before we may assume the problem is $k$-dimensional ([B-L], Theorem 4.3). Since $Q$ and $F$ are symmetric, the potentials $\varphi(X)$, $\psi(X)$ are symmetric. Therefore $\nabla\varphi$, $\nabla\psi$, $\nabla\Gamma = 0$ for $X = 0$ and further,
$$\operatorname{sign}\varphi_i(X) = \operatorname{sign}\psi_i(X) = \operatorname{sign}\Gamma_i(X) = \operatorname{sign}x_i = \operatorname{sign}y_i.$$
From the computation above it suffices to show that for all $Y$, $\nabla\Gamma\cdot\nabla\psi \ge 0$. That follows since $\Gamma_i\cdot\psi_i \ge 0$ for all $i$.
A final consequence of the estimate $\varphi_{\alpha\alpha} \le 1$ for log concave perturbations of the Gaussian is that any Rayleigh-like quotient (log Sobolev inequality, isoperimetric inequality, Poincaré inequality) that involves a quotient between first derivatives and the function itself is at least as large for the perturbation as for the Gaussian. For instance, let $F(t)$, $G(t)$, $H(t)$, $K(t)$ be non-negative, non-decreasing functions of $t \in [0, \infty)$; then we have the

Corollary 8. Let $f, g$ be densities as in Theorem 11 (i.e., a Gaussian and its log concave perturbation) and consider the "Rayleigh" quotient
$$\lambda_f = \inf_u \frac{F\bigl(\int G(|\nabla u|)\,f(X)\,dX\bigr)}{H\bigl(\int K(|u|)\,f(X)\,dX\bigr)}.$$
Then $\lambda_g \ge \lambda_f$.

Proof. If we apply the change of variable formula to any function $u(Y)$, we get
$$\int K(|u(Y)|)\,g(Y)\,dY = \int K(|u(Y(X))|)\,f(X)\,dX,$$
while
$$\nabla_X\bigl[u(Y(X))\bigr] = D_XY\,\nabla_Yu(Y(X)).$$
But $D_XY$ is a symmetric matrix with all eigenvalues less than one, so
$$|\nabla_X u(Y(X))| \le |\nabla_Y u(Y)|,$$
which proves the corollary.

Remark. The monotonicity for the log Sobolev inequality under log concave perturbations of the Gaussian follows from the Bakry–Emery theorem ([B-E]).
References

[B-E] Bakry, D. and Emery, M.: Diffusions hypercontractives. In: Sém. Prob. XIX, LNM 1123. Berlin–Heidelberg–New York: Springer, 1985, pp. 177–206
[B-L] Brascamp, H. and Lieb, E.: On extensions of the Brunn–Minkowski and Prékopa–Leindler Theorems, Including Inequalities for Log Concave functions, and with an Application to the Diffusion Equation. J. Funct. Anal. 22, 366–389 (1976)
[B] Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math. XLIV, 375–417 (1991)
[C1] Caffarelli, L.A.: Interior W^{2,p} estimates for solutions of the Monge–Ampère equation. Ann. of Math. 131, 135–150 (1989)
[C2] Caffarelli, L.A.: The regularity of mappings with a convex potential. J.A.M.S. 5, 99–104 (1992)
[C3] Caffarelli, L.A.: Boundary regularity of maps with convex potential I. Comm. Pure Appl. Math. 45, 1141–1151 (1992)
[FKG] Fortuin, C.M., Kasteleyn, P.W., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22, 89–103 (1971)
[G-M] Gangbo, W., McCann, R.J.: The geometry of optimal transport. Acta Math. 177, 2, 113–161 (1996)
[G-T] Gilbarg, D., Trudinger, N.: Elliptic partial differential equations of second order. Second edition, Berlin–Heidelberg–New York: Springer, 1983
[H] Holley, R.: Remarks on the FKG inequalities. Commun. Math. Phys. 36, 227–231 (1974)
[L] Li, Yanyan: Some existence results of fully non-linear elliptic equations of Monge–Ampère type. Comm. Pure Appl. Math. 43, 233–271 (1990)
[P] Preston, C.J.: A generalization of the FKG inequalities. Commun. Math. Phys. 36, 232–241 (1974)

Communicated by J. L. Lebowitz
Commun. Math. Phys. 214, 565 – 572 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Lifshitz Tail for Schrödinger Operator with Random Magnetic Field
Shu Nakamura
Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1, Komaba, Meguro-ku, Tokyo 153-8914, Japan
Received: 3 January 2000 / Accepted: 18 April 2000
Abstract: We study the behavior of the density of states at the lower edge of the spectrum for Schrödinger operators with random magnetic fields. We use a new estimate on magnetic Schrödinger operators, which is similar to the Avron–Herbst–Simon estimate but the bound is always nonnegative.

1. Introduction

In this paper, we consider the magnetic Schrödinger operator without scalar potential on $\mathbb{R}^d$ ($d \ge 2$):
$$H = (p - A(x))^2 \quad\text{on } L^2(\mathbb{R}^d),$$
where $p = -i\partial_x$ is the momentum operator and
$$A(x) = (A_1(x), \ldots, A_d(x)), \qquad x \in \mathbb{R}^d,$$
is a vector potential. For the moment we suppose $A$ is a $C^1$-class function. We identify $A(x)$ with the 1-form $A = \sum_{j=1}^d A_j(x)\,dx_j$. The magnetic field is the 2-form defined by
$$B(x) = dA = \sum_{i<j} B_{ij}(x)\,dx_i \wedge dx_j,$$
where $B_{ij}(x) = \partial_{x_i}A_j(x) - \partial_{x_j}A_i(x)$, and we identify $B$ with the skew-symmetric matrix valued function $(B_{ij}(x))$ on $\mathbb{R}^d$. It is well-known that for any closed 2-form $B$, there is a 1-form $A$ such that $B = dA$, and the corresponding Schrödinger operator $H$ is uniquely determined by $B$ modulo the gauge transformation group.

Supported in part by JSPS grant Kiban B 09440055
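We suppose that $B = B^\omega$ is a metrically transitive random closed 2-form in the following sense: There exists a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and $\mathbb{R}^d$ acts on $\Omega$ as a measure preserving transformation group. We denote the action by $\{T_x \mid x \in \mathbb{R}^d\}$. The action is supposed to be ergodic, i.e., if $E \subset \Omega$ is invariant under $\{T_x\}$, then $\mathbf{P}(E) = 0$ or $1$. $B = B^\omega$ is assumed to satisfy
\[ B^{\omega}(x - y) = B^{(T_x \omega)}(x), \quad x, y \in \mathbb{R}^d, \]
and it is a continuous closed 2-form almost surely. We denote
\[ \Lambda_L = \{x \in \mathbb{R}^d \mid |x_j| \le L/2 \text{ for } j = 1, \dots, d\}, \quad L > 0, \]
and let $H^\omega_\Lambda$ be the Schrödinger operator (with the same symbol as above) on $\Lambda \subset \mathbb{R}^d$ with, e.g., the Neumann boundary condition (cf. Sect. 3). We also denote the Lebesgue measure of $\Lambda$ by $|\Lambda|$. Then it is well-known that the density of states
\[ k(E) = \lim_{L \to \infty} \frac{1}{|\Lambda_L|}\,\#\{\text{eigenvalues of } H^\omega_{\Lambda_L} \le E\} \]
exists for almost all $E \in \mathbb{R}$ and $\omega \in \Omega$, and it is independent of $\omega$ (almost surely). Our main assumption is the following.

Assumption A.
(i) $B^\omega$ is a metrically transitive random closed 2-form on $\mathbb{R}^d$ in the above sense.
(ii) There is a constant $M \in \mathbb{R}_+ = [0, \infty)$ such that $|B^\omega(x)| \le M$ for any $x \in \mathbb{R}^d$ and almost all $\omega \in \Omega$, where $|B(x)| = \bigl(\sum_{i<j} |B_{ij}(x)|^2\bigr)^{1/2}$.
(iii) There exists a real-valued continuous function $\varphi$ on $\mathbb{R}_+$ such that $\varphi(\lambda) \to 0$ as $\lambda \to \infty$, and it satisfies the following condition: Let $\Lambda_1, \Lambda_2 \subset \mathbb{R}^d$ and let $f \in L^1(\Omega)$, $g \in L^\infty(\Omega)$ such that $f$ is $\mathcal{F}_{\Lambda_1}$-measurable and $g$ is $\mathcal{F}_{\Lambda_2}$-measurable, where $\mathcal{F}_\Lambda$ denotes the $\sigma$-algebra generated by $\{B^\omega(x) \mid x \in \Lambda\}$. Then
\[ |\mathbf{E}(fg) - \mathbf{E}(f)\mathbf{E}(g)| \le \varphi(\operatorname{dist}(\Lambda_1, \Lambda_2))\,\|f\|_{L^1}\|g\|_{L^\infty}, \]
where $\mathbf{E}(\cdot)$ denotes the expectation with respect to $\mathbf{P}$, and $\operatorname{dist}(\cdot, \cdot)$ is the Euclidean distance on $\mathbb{R}^d$.
(iv) Let $C_0$ be the unit cube in $\mathbb{R}^d$. Then
\[ \mathbf{E}\Bigl(\int_{C_0} |B^\omega(x)|\,dx\Bigr) > 0. \]

Theorem 1. Suppose Assumption A. Then
\[ \limsup_{E \downarrow 0} \frac{\log(-\log k(E))}{\log E} \le -\frac{d}{2}. \]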
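It may help to unpack what the bound in Theorem 1 says about $k(E)$ itself. The following is a reader's sketch and not part of the original argument; the symbol $\varepsilon$ is an auxiliary quantity introduced only for this illustration.

```latex
% Sketch under stated assumptions: 0 < \varepsilon < d/2, and E > 0 is small
% enough that \log E < 0 and the limsup bound holds up to \varepsilon.
\[
\frac{\log(-\log k(E))}{\log E} \le -\frac{d}{2} + \varepsilon
\;\Longrightarrow\;
\log(-\log k(E)) \ge \Bigl(-\frac{d}{2} + \varepsilon\Bigr)\log E
\;\Longrightarrow\;
k(E) \le \exp\bigl(-E^{-(d/2 - \varepsilon)}\bigr).
\]
```

The first implication reverses the inequality because $\log E < 0$ for small $E$; the second exponentiates twice. In this sense the theorem states that the density of states vanishes at least at a stretched-exponential rate at the bottom of the spectrum.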
The Lifshitz tail (or the Lifshitz singularity) of the density of states for the Schrödinger operator with random potential has been studied by many authors (see, e.g., the textbook of Carmona and Lacroix [2] or the lecture notes of Kirsch [3]), but very few results have been obtained on the Lifshitz tail for the Schrödinger operator with random magnetic field so far. The author is only aware of a work by Ueki [7], where the Lifshitz tail is studied for Schrödinger operators with a class of Gaussian random magnetic fields, and a work by the author on the 2D discrete Schrödinger operator with Anderson type random magnetic field [5].

The strategy of the proof is similar to [5]. We first prove that the magnetic Schrödinger operator is bounded from below by a nonnegative function, which does not vanish where $B$ does not (Sect. 2). This estimate is similar to the well-known AHS (Avron–Herbst–Simon) estimate:
\[ H \ge \pm B_{ij}(x), \quad x \in \mathbb{R}^d,\ i \ne j. \]
The right-hand side of the AHS estimate is not necessarily positive. In order to obtain global lower bounds of the operator, we have to use a partition of unity, and thus introduce (not necessarily positive) error terms. Our estimate (Theorem 2) is in fact weaker than the AHS estimate in many situations, but the right-hand side of the estimate is always nonnegative, and hence easier to apply in our context. In Sect. 3, we discuss Neumann decoupling and prove a simple a priori estimate on the density of states for magnetic Schrödinger operators. We then combine them with a result on the Lifshitz tail of the Schrödinger operator with random potential to conclude the assertion (cf. Kirsch–Martinelli [4], Kirsch [3]).

Notation. We denote the definition domain of an operator $A$ by $D(A)$. The quadratic form domain of $A$ is denoted by $Q(A)$. The inner product of a Hilbert space $\mathcal{H}$ is denoted by $\langle \varphi | \psi \rangle$ ($\varphi, \psi \in \mathcal{H}$), which is linear in the second entry. We denote the space of the Schwartz functions on $\mathbb{R}^d$ by $\mathcal{S}(\mathbb{R}^d)$.

2. Energy Estimates for Magnetic Schrödinger Operators

We consider a deterministic Schrödinger operator $H = (p - A(x))^2$ with the magnetic field $B = dA$. We fix a constant $r > 0$. For $x \in \mathbb{R}^d$ and $1 \le i < j \le d$, we denote
\[ D_{ij}(x) = \bigl\{ y \in \mathbb{R}^d \bigm| y_k = x_k \text{ for } k \ne i, j,\ (x_i - y_i)^2 + (x_j - y_j)^2 \le r^2 \bigr\}, \qquad \gamma_{ij}(x) = \partial D_{ij}(x). \]
We define $b_{ij}(x) \in \mathbb{T} = \mathbb{R}/(2\pi\mathbb{Z})$ by
\[ b_{ij}(x) \equiv \int_{D_{ij}(x)} B_{ij}(y)\,dy_i\,dy_j \pmod{2\pi\mathbb{Z}}. \]
We identify $\mathbb{T}$ with $[-\pi, \pi)$ and set
\[ W(x) = \frac{1}{(d-1)4\pi^3 r^3} \sum_{i<j} \int_{\gamma_{ij}(x)} |b_{ij}(y)|^2\,dy. \]
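Theorem 2. $H \ge W$ in the operator sense, i.e.,
\[ \langle \psi | H\psi \rangle \ge \int_{\mathbb{R}^d} W(x)|\psi(x)|^2\,dx, \quad \psi \in D(H). \]

Remark. Since $|b_{ij}(x)| \le \pi$, we have
\[ 0 \le W(x) \le \frac{1}{(d-1)4\pi^3 r^3} \cdot \frac{d(d-1)}{2} \cdot 2\pi r \cdot \pi^2 = \frac{d}{4r^2}, \quad x \in \mathbb{R}^d. \]

Proof. We denote the velocity operator by
\[ v_j = p_j - A_j(x), \quad j = 1, \dots, d. \]
We parameterize $\gamma_{ij}(x)$ as follows: For $t \in [0, 2\pi r)$, we set
\[ y_k(x; t) = \begin{cases} x_k & \text{if } k \ne i, j, \\ x_i + r\cos(t/r) & \text{if } k = i, \\ x_j + r\sin(t/r) & \text{if } k = j. \end{cases} \]
For $\psi \in \mathcal{S}(\mathbb{R}^d)$, we compute
\[ E_x(\psi) = \int_{\gamma_{ij}(x)} |\dot{y} \cdot (v\psi)|^2\,dy, \tag{2.1} \]
where $\dot{y}(x; t) = \frac{d}{dt} y(x; t)$. If we denote
\[ \tilde\psi(t) = \psi(y(x; t)), \qquad \tilde{A}(t) = \dot{y}(x; t) \cdot A(y(x; t)), \]
then the right-hand side of (2.1) is
\[ \int_0^{2\pi r} \bigl|(-i\partial_t - \tilde{A}(t))\tilde\psi(t)\bigr|^2\,dt = \langle \tilde\psi | h\tilde\psi \rangle, \]
where $h$ is an operator on $L^2([0, 2\pi r))$ defined by
\[ h\varphi(t) = (-i\partial_t - \tilde{A}(t))^2 \varphi(t), \quad \varphi \in D(h) = H^2(\mathbb{R}/(2\pi r\mathbb{Z})), \]
i.e., $\varphi \in D(h)$ satisfies the periodic boundary condition. Then using the gauge transform $G(t) = \exp\bigl(i\int_0^t \tilde{A}(s)\,ds\bigr)$, we can easily observe that $h$ is unitarily equivalent to $\tilde{h}$ defined by
\[ \tilde{h}\varphi(t) = -\partial_t^2 \varphi(t), \quad \varphi \in D(\tilde{h}) = \bigl\{ \varphi \in H^2((0, 2\pi r)) \bigm| \varphi(2\pi r) = e^{ib}\varphi(0) \bigr\}, \]
where $b \equiv \int_0^{2\pi r} \tilde{A}(t)\,dt \pmod{2\pi\mathbb{Z}}$. Now we note
\[ \int_0^{2\pi r} \tilde{A}(t)\,dt = \int_0^{2\pi r} \dot{y} \cdot A(y(x; t))\,dt = \int_{\gamma_{ij}(x)} A = \int_{D_{ij}(x)} B_{ij}(y)\,dy_i\,dy_j \equiv b_{ij}(x) \pmod{2\pi\mathbb{Z}} \]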
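by the Stokes formula. Thus $b = b_{ij}(x)$. The spectrum of $\tilde{h}$ is easily computed, and we learn
\[ \sigma(\tilde{h}) = \Bigl\{ \Bigl(\frac{b}{2\pi r} + \frac{n}{r}\Bigr)^2 \Bigm| n \in \mathbb{Z} \Bigr\}. \]
In particular,
\[ \inf\sigma(h) = \inf\sigma(\tilde{h}) = \Bigl(\frac{b_{ij}(x)}{2\pi r}\Bigr)^2 = \frac{1}{4\pi^2 r^2}|b_{ij}(x)|^2. \]
This implies
\[ E_x(\psi) = \langle \tilde\psi | h\tilde\psi \rangle \ge \inf\sigma(h)\,\|\tilde\psi\|^2 = \frac{1}{4\pi^2 r^2}|b_{ij}(x)|^2\,\|\tilde\psi\|^2. \]
Integrating this inequality over $x \in \mathbb{R}^d$, we have
\[ \int_{\mathbb{R}^d} E_x(\psi)\,dx \ge \frac{1}{4\pi^2 r^2} \int_{\mathbb{R}^d} \int_0^{2\pi r} |b_{ij}(x)|^2 |\psi(y(x; t))|^2\,dt\,dx = \frac{1}{4\pi^2 r^2} \int_{\mathbb{R}^d} \int_0^{2\pi r} |b_{ij}(y(x'; t'))|^2 |\psi(x')|^2\,dt'\,dx', \tag{2.2} \]
where $x' = y(x, t)$ and $t' \equiv t + \pi r \pmod{2\pi r\mathbb{Z}}$, so that $y(x'; t') = x$. Now we estimate the left-hand side of (2.2). We use the same change of variables as above to learn
\[ \int_{\mathbb{R}^d} E_x(\psi)\,dx = \int_{\mathbb{R}^d} \int_0^{2\pi r} \bigl|\dot{y}(x; t) \cdot (v\psi)(y(x; t))\bigr|^2\,dt\,dx = \int_{\mathbb{R}^d} \int_0^{2\pi r} \bigl(\sin^2(t/r)|v_i\psi(x)|^2 + \cos^2(t/r)|v_j\psi(x)|^2\bigr)\,dt\,dx = \pi r \int_{\mathbb{R}^d} \bigl(|v_i\psi(x)|^2 + |v_j\psi(x)|^2\bigr)\,dx. \]
Combining this with (2.2), we obtain
\[ \int_{\mathbb{R}^d} \bigl(|v_i\psi(x)|^2 + |v_j\psi(x)|^2\bigr)\,dx \ge \frac{1}{4\pi^3 r^3} \int_{\mathbb{R}^d} \int_0^{2\pi r} |b_{ij}(y(x; t))|^2 |\psi(x)|^2\,dt\,dx. \]
Then we sum up this estimate over $1 \le i < j \le d$ to conclude
\[ (d-1) \sum_{i=1}^d \int_{\mathbb{R}^d} |v_i\psi(x)|^2\,dx \ge \frac{1}{4\pi^3 r^3} \int_{\mathbb{R}^d} \sum_{i<j} \int_0^{2\pi r} |b_{ij}(y(x; t))|^2 |\psi(x)|^2\,dt\,dx, \]
and this implies the assertion of Theorem 2, since the left-hand side equals $(d-1)\langle \psi | H\psi \rangle$ and the inner integral on the right is $\int_{\gamma_{ij}(x)} |b_{ij}(y)|^2\,dy$.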
Example. Let $H$ be the Schrödinger operator on $\mathbb{R}^2$ with constant magnetic field $B > 0$. We choose $r > 0$ so that $b = b_{12} = \pi$, i.e., $r = B^{-1/2}$. Then we have $W(x) = B/2$, and hence $H \ge B/2$. It is well-known, however, that $\inf\sigma(H) = B$, and the estimate is not optimal.
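The arithmetic of the Example can be checked numerically. The following is a minimal sketch, assuming $d = 2$ and a constant field, so that $b_{12}$ is the same at every point and the circle integral in the definition of $W$ reduces to $|b_{12}|^2 \cdot 2\pi r$. The function names `reduced_flux` and `w_constant_field_2d` are hypothetical helpers introduced for this illustration only.

```python
import math


def reduced_flux(field: float, r: float) -> float:
    """Flux of a constant field through a disc of radius r, reduced mod 2*pi into [-pi, pi)."""
    flux = field * math.pi * r * r
    return (flux + math.pi) % (2.0 * math.pi) - math.pi


def w_constant_field_2d(field: float, r: float) -> float:
    """W(x) for d = 2 and a constant field (assumption of this sketch).

    b_12 is constant, so the integral over the circle of radius r
    equals |b_12|^2 times the circumference 2*pi*r.
    """
    d = 2
    b = reduced_flux(field, r)
    circle_integral = b * b * 2.0 * math.pi * r
    return circle_integral / ((d - 1) * 4.0 * math.pi ** 3 * r ** 3)


if __name__ == "__main__":
    B = 2.0
    r = B ** -0.5  # the choice made in the Example, so that |b_12| = pi
    print(w_constant_field_2d(B, r), B / 2.0)
```

For other radii the same function illustrates the Remark after Theorem 2: the value never exceeds $d/(4r^2) = 1/(2r^2)$, and it drops to zero whenever the flux through the disc is a multiple of $2\pi$.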
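The following estimate (or, strictly speaking, its generalization) will be useful in the proof of Theorem 1.

Corollary 3. Let $W$ be as above, and let $H_0 = p^2$ be the free Schrödinger operator. Then
\[ \inf\sigma(H) \ge \frac{1}{2}\inf\sigma(H_0 + W). \]

Proof. By Kato's inequality ([6] Theorem X.33), we have
\[ \langle \psi | H\psi \rangle \ge \langle |\psi| \,|\, H_0|\psi| \rangle, \quad \psi \in \mathcal{S}(\mathbb{R}^d). \]
Combining this with Theorem 2, we learn
\[ \langle \psi | H\psi \rangle \ge \frac{1}{2}\langle |\psi| \,|\, (H_0 + W)|\psi| \rangle, \]
and hence
\[ \inf\sigma(H) = \inf_{\psi \ne 0} \frac{\langle \psi | H\psi \rangle}{\|\psi\|^2} \ge \frac{1}{2}\inf_{\psi \ne 0} \frac{\langle |\psi| \,|\, (H_0 + W)|\psi| \rangle}{\|\psi\|^2} \ge \frac{1}{2}\inf\sigma(H_0 + W). \]
(Note that the last inequality is in fact equality, since the ground state of the Schrödinger operator is nonnegative.)

3. Proof of Theorem 1

As in the last section, we fix $A(x)$ and consider a deterministic magnetic Schrödinger operator $H = (p - A)^2$. Let $\Lambda \subset \mathbb{R}^d$ be an open set. Then the Neumann Hamiltonian $H_\Lambda$ on $L^2(\Lambda)$ is defined by
\[ H_\Lambda\psi(x) = (p - A(x))^2\psi(x), \quad \psi \in D(H_\Lambda), \]
\[ Q(H_\Lambda) = D((H_\Lambda)^{1/2}) = \bigl\{ \psi \in H^1_{loc}(\Lambda) \bigm| (p_j - A_j(x))\psi \in L^2(\Lambda),\ j = 1, \dots, d \bigr\}. \]
Note that $Q(H_\Lambda)$ is not necessarily the same as $H^1(\Lambda)$ if $\Lambda$ is unbounded. If $\tilde\Lambda \subset \Lambda$ are open sets in $\mathbb{R}^d$ such that $|\Lambda \setminus \tilde\Lambda| = 0$, then we can regard $L^2(\tilde\Lambda) = L^2(\Lambda)$, and $H_\Lambda$ and $H_{\tilde\Lambda}$ may be considered as operators on the same Hilbert space. Since $Q(H_{\tilde\Lambda}) \supset Q(H_\Lambda)$ and the symbols are the same, we learn
\[ H_{\tilde\Lambda} \le H_\Lambda \quad \text{on } L^2(\Lambda). \]
Hence, in particular, by the min-max principle ([6] Theorem XIII-1) we have
\[ \frac{1}{|\Lambda|}\#\{\text{eigenvalues of } H_\Lambda \le E\} \le \frac{1}{|\Lambda|}\#\{\text{eigenvalues of } H_{\tilde\Lambda} \le E\} \]
for $E \in \mathbb{R}$ if $|\Lambda|$ is finite. We denote
\[ C(L; x) = \bigl\{ y \in \mathbb{R}^d \bigm| x_j < y_j < x_j + L,\ j = 1, \dots, d \bigr\} \]
for $L > 0$, $x \in \mathbb{R}^d$. In particular, we write $\Lambda_L = C(L, 0)$. We will need the following a priori bound on the number of eigenvalues.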
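Lemma 4. Suppose that $|B(x)| \le M$ with $M < \infty$. Then there exists a constant $C > 0$ depending only on $d$ and $M$ such that
\[ \frac{1}{|\Lambda_L|}\#\{\text{eigenvalues of } H_{\Lambda_L} \le E\} \le C(1 + E^{d/2}) \]
for any $E \ge 0$ and for any integer $L > 0$.

Proof. Let $\Lambda = \Lambda_L$, and let
\[ \tilde\Lambda = \bigcup \bigl\{ C(1, n) \bigm| n \in \mathbb{Z}^d,\ 0 \le n_j < L,\ j = 1, \dots, d \bigr\}. \]
Then $\tilde\Lambda \subset \Lambda$ and $|\Lambda \setminus \tilde\Lambda| = 0$, and hence it suffices to estimate
\[ \frac{1}{|\Lambda_L|}\#\{\text{e.v.'s of } H_{\tilde\Lambda} \le E\} = L^{-d} \sum_{0 \le n_j < L} \#\{\text{e.v.'s of } H_{C(1,n)} \le E\}. \tag{3.1} \]
[…] $> 0$. (3.2)
The lemma now follows from (3.1) and (3.2).

We also need a generalization of the results of the last section for the Neumann Hamiltonian. Let $r > 0$ and $W(x)$ be as in the last section. For $\Lambda \subset \mathbb{R}^d$, we set
\[ W_\Lambda(x) = \begin{cases} W(x) & \text{if } \operatorname{dist}(x, \Lambda^c) > r, \\ 0 & \text{if } \operatorname{dist}(x, \Lambda^c) \le r. \end{cases} \]
Then, by exactly the same argument as in the last section, we can prove the following estimates:

Lemma 5. $H_\Lambda \ge W_\Lambda$ as operators on $L^2(\Lambda)$. Hence, in particular,
\[ \inf\sigma(H_\Lambda) \ge \frac{1}{2}\inf\sigma(H_{0,\Lambda} + W_\Lambda), \]
where $H_{0,\Lambda}$ denotes the free Schrödinger operator with the Neumann boundary condition on $\Lambda$.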
Proof of Theorem 1. Here we suppose that $B = B^\omega$ satisfies Assumption A. Let $L > 0$ and we set
\[ k_L(E) = \frac{1}{|\Lambda_L|}\mathbf{E}\bigl(\#\{\text{e.v.'s of } H^\omega_{\Lambda_L} \le E\}\bigr), \quad E \in \mathbb{R}, \]
where $H^\omega_{\Lambda_L}$ is the Neumann Hamiltonian on $\Lambda_L$. Then we can show
\[ k(E) \le k_L(E) \quad \text{for any } E \text{ and } L, \tag{3.3} \]
using exactly the same argument as in the case of the Schrödinger operator with random potential (cf. [3] Chapter 7, Corollary 2). Now we apply the argument of Kirsch–Martinelli [4]. It follows from Assumption A-(iii) that $W(x)$ satisfies Assumption B-(ii) of [4]. By Assumption A-(iv), we have
\[ \mathbf{E}\Bigl(\int_{C_0} |W(x)|\,dx\Bigr) > 0, \]
and hence $\mathbf{E}(|\{x \in C_0 \mid W(x) = 0\}|) < 1$. Then we learn from the argument of Sect. 3 of [4] that there are $c_1, c_2 > 0$ such that if $L \sim c_1 E^{-1/2}$ then
\[ \mathbf{P}\bigl(\inf\sigma(H_{0,\Lambda_L} + W_{\Lambda_L}) \le E\bigr) \le \exp(-c_2 E^{-d/2}), \quad E \sim 0. \]
We note that the support of $W - W_{\Lambda_L}$ is contained in a neighborhood of $\partial\Lambda_L$ with the width $r$, and the volume is $O(L^{d-1})$. Hence the difference of $W$ and $W_{\Lambda_L}$ does not affect the large deviation argument. This and Lemma 5 imply
\[ \mathbf{P}\bigl(\inf\sigma(H^\omega_{\Lambda_L}) \le E\bigr) \le \exp(-c_3 E^{-d/2}) \]
with $c_3 > 0$. Combining this with Lemma 4, we have
\[ k_L(E) \le C(1 + E^{d/2}) \cdot \mathbf{P}\bigl(\inf\sigma(H_{\Lambda_L}) \le E\bigr) \le \exp(-c_4 E^{-d/2}) \]
with $c_4 > 0$ if $E$ is sufficiently small. Theorem 1 follows from this and (3.3).
References
1. Avron, J., Herbst, I., Simon, B.: Schrödinger operators with magnetic fields I: General interactions. Duke Math. J. 45, 847–883 (1978)
2. Carmona, R., Lacroix, J.: Spectral Theory of Random Schrödinger Operators. Basel–Boston: Birkhäuser, 1990
3. Kirsch, W.: Random Schrödinger operators. In: Schrödinger Operators, H. Holden, A. Jensen, eds., Lecture Notes in Physics 345, Berlin–Heidelberg–New York: Springer, 1989
4. Kirsch, W., Martinelli, F.: Large deviations and Lifshitz singularity of the integrated density of states of random Hamiltonians. Commun. Math. Phys. 89, 27–40 (1983)
5. Nakamura, S.: Lifshitz tail for 2D discrete Schrödinger operator with random magnetic field. Preprint 1999 Sep. (mp_arc:99-420) To appear in Ann. Henri Poincaré
6. Reed, M., Simon, B.: The Methods of Modern Mathematical Physics, Vol. I–IV. New York: Academic Press, 1972–1979
7. Ueki, N.: Simple examples of Lifshitz tails in Gaussian random magnetic fields. To appear in Ann. Henri Poincaré
Communicated by B. Simon
Commun. Math. Phys. 214, 573 – 592 (2000)
Homotopy Classes for Stable Periodic and Chaotic Patterns in Fourth-Order Hamiltonian Systems
W. D. Kalies^1, J. Kwapisz^2, J. B. VandenBerg^3, R. C. A. M. VanderVorst^{3,4}
1 Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA. E-mail: [email protected]
2 Department of Mathematical Sciences, Montana State University-Bozeman, Bozeman, MT 59717-2400, USA. E-mail: [email protected]
3 Department of Mathematical Sciences, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands. E-mail: [email protected]; [email protected]
4 CDSNS, Georgia Institute of Technology, Atlanta, GA 30332, USA
Received: 6 April 1999 / Accepted: 2 May 2000
Abstract: We investigate periodic and chaotic solutions of Hamiltonian systems in $\mathbb{R}^4$ which arise in the study of stationary solutions of a class of bistable evolution equations. Under very mild hypotheses, variational techniques are used to show that, in the presence of two saddle-focus equilibria, minimizing solutions respect the topology of the configuration plane punctured at these points. By considering curves in appropriate covering spaces of this doubly punctured plane, we prove that minimizers of every homotopy type exist and characterize their topological properties.

1. Introduction

This work is a continuation of [?] where we developed a constrained minimization method to study heteroclinic and homoclinic local minimizers of the action functional
\[ J_I[u] = \int_I j(u, u', u'')\,dt = \int_I \Bigl(\frac{\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 + F(u)\Bigr)\,dt, \tag{1.1} \]
which are solutions of the equation
\[ \gamma u'''' - \beta u'' + F'(u) = 0 \tag{1.2} \]
with $\gamma, \beta > 0$. This equation with a double-well potential $F$ has been proposed in connection with certain models of phase transitions. For brevity we will omit a detailed background of this problem and refer only to those sources required in the proofs of the results. A more extensive history and reference list are provided in [?], to which we refer the interested reader. The above equation is Hamiltonian with
\[ H = -\gamma u' u''' + \frac{\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 - F(u). \tag{1.3} \]

This work was supported by grants ARO DAAH-0493G0199 and NIST G-06-605.
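As a quick consistency check between (1.2) and (1.3), one can differentiate $H$ along a solution. This is a reader's sketch, assuming $u$ is smooth enough for the derivatives below to exist; it is not taken from the paper.

```latex
% Differentiate (1.3) term by term and collect the factor u'.
\[
\frac{dH}{dt}
= -\gamma u''u''' - \gamma u'u'''' + \gamma u''u''' + \beta u'u'' - F'(u)\,u'
= -u'\bigl(\gamma u'''' - \beta u'' + F'(u)\bigr) = 0 .
\]
```

The two $\gamma u''u'''$ terms cancel, and what remains is $-u'$ times the left-hand side of (1.2), so $H$ is constant along every solution.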
The configuration space of the system is the $(u, u')$-plane, and solutions to (1.2) can be represented as curves in this plane. Initially these curves do not appear to be restricted in any way. However, the central idea presented here is that, when $(\pm 1, 0)$ are saddle-foci, the minimizers of $J$ respect the topology of this plane punctured at these two points, which allows for a rich set of minimizers to exist. Using the topology of the doubly-punctured plane and its covering spaces, we describe the structure of all possible types of minimizers, including those which are periodic and chaotic. Since the action of the minimizers of these latter types is infinite, a different notion of minimizer is required that is reminiscent of the minimizing (Class A) geodesics of Morse [?]. Such minimizers have been intensively studied in the context of geodesic flows on compact manifolds or the Aubry–Mather theory (see e.g. [1] for an introduction). A crucial difference is that we are dealing with a non-mechanical system on a non-compact space. Nevertheless, we are able to emulate many of Morse's original arguments about how the minimizers can intersect with themselves and each other. For a precise statement of the main results we refer to Theorem 3.2 and Theorem 5.8. For related work on mechanical Hamiltonian systems we refer to [2,?] and the references therein.

Another important aspect of the techniques employed here and in [?] is the mildness of the hypotheses. In particular, our approach requires no transversality or non-degeneracy conditions, such as those found in other variational methods and dynamical systems theory, see [?]. Specifically, we will assume the following hypothesis on $F$:

(H): $F \in C^2(\mathbb{R})$, $F(\pm 1) = F'(\pm 1) = 0$, $F''(\pm 1) > 0$, and $F(u) > 0$ for $u \ne \pm 1$. Moreover there are constants $c_1$ and $c_2$ such that $F(u) \ge -c_1 + c_2 u^2$.

We will also assume for simplicity of the formulation that $F$ is even, but many analogous results will hold for nonsymmetric potentials, cf. [?]. Finally, we assume that the parameters $\gamma$ and $\beta$ are such that $u = \pm 1$ are saddle-foci, i.e. $4\gamma/\beta^2 > 1/F''(\pm 1)$. An example of a nonlinearity satisfying these conditions is $F(u) = (u^2 - 1)^2/4$, in which case (1.2) is the stationary version of the so-called extended Fisher–Kolmogorov (EFK) equation.

In [?] we classify heteroclinic and homoclinic minimizers of $J$ by a finite sequence of even integers which represent the number of times a minimizer crosses $u = \pm 1$. In order to classify more general minimizers we must consider infinite and bi-infinite sequences, as we now describe. A function $u : \mathbb{R} \to \mathbb{R}$ can be represented as a curve in the $(u, u')$-plane, and the associated curve will be denoted by $\Gamma(u)$. Removing the equilibrium points $(\pm 1, 0)$ from the $(u, u')$-plane (the configuration space) creates a space with nontrivial topology, denoted by $\mathcal{P} = \mathbb{R}^2 \setminus \{(\pm 1, 0)\}$. In $\mathcal{P}$ we can represent functions $u$ which have the property that $u' \ne 0$ when $u = \pm 1$, and various equivalence classes of curves can be distinguished. For example, in [?] we considered classes of curves that terminate at the equilibrium points $(\pm 1, 0)$. Another important class consists of closed curves in $\mathcal{P}$, which represent periodic functions. We now give a systematic description of all classes to be considered.

Definition 1.1. A type is a sequence $g = (g_i)_{i \in I}$ with $g_i \in 2\mathbb{N} \cup \{\infty\}$, where $\infty$ acts as a terminator. To be precise, $g$ satisfies one of the following conditions:
i) $I = \mathbb{Z}$, and $g \in 2\mathbb{N}^{\mathbb{Z}}$ is referred to as a bi-infinite type.
ii) $I = \{0\} \cup \mathbb{N}$, and $g = (\infty, g_1, g_2, \dots)$ with $g_i \in 2\mathbb{N}$ for all $i \ge 1$, or $I = -\mathbb{N} \cup \{0\}$, and $g = (\dots, g_{-2}, g_{-1}, \infty)$ with $g_i \in 2\mathbb{N}$ for all $i \le -1$. In these cases $g$ is referred to as a semi-terminated type.
iii) $I = \{0, \dots, N + 1\}$ with $N \ge 0$, and $g = (\infty, g_1, \dots, g_N, \infty)$ with $g_i \in 2\mathbb{N}$. In this case $g$ is referred to as a terminated type.

These types will define function classes using the vector $g$ to count the crossings of $u$ at the levels $u = \pm 1$. Since there are two equilibrium points, we introduce the notion of parity denoted by $p$, which will be equal to either 0 or 1.

Definition 1.2. A function $u \in H^2_{loc}(\mathbb{R})$ is in the class $M(g, p)$ if there are nonempty sets $\{A_i\}_{i \in I}$ such that
i) $u^{-1}(\pm 1) = \bigcup_{i \in I} A_i$,
ii) $\#A_i = g_i$ for $i \in I$,
iii) $\max A_i < \min A_{i+1}$,
iv) $u(A_i) = (-1)^{i+p+1}$, and
v) $\bigcup_{i \in I} A_i$ consists of transverse crossings of $\pm 1$, i.e., $u'(x) \ne 0$ for $x \in A_i$.
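The saddle-focus assumption on $u = \pm 1$ made in the Introduction can be illustrated numerically. The sketch below assumes the standard linearization of (1.2) at an equilibrium, $\gamma v'''' - \beta v'' + F''(\pm 1)v = 0$, whose characteristic polynomial is $\gamma\lambda^4 - \beta\lambda^2 + F''(\pm 1)$; this linearization and the helper names `linearization_eigenvalues` and `is_saddle_focus` are assumptions of this illustration and are not stated in the paper.

```python
import cmath


def linearization_eigenvalues(gamma: float, beta: float, f2: float):
    """Roots of gamma*lam**4 - beta*lam**2 + f2 = 0, where f2 stands for F''(+-1).

    The polynomial is quadratic in mu = lam**2, so we solve for mu first
    and then take both square roots of each solution.
    """
    disc = cmath.sqrt(beta * beta - 4.0 * gamma * f2)
    roots = []
    for mu in ((beta + disc) / (2.0 * gamma), (beta - disc) / (2.0 * gamma)):
        lam = cmath.sqrt(mu)
        roots.extend([lam, -lam])
    return roots


def is_saddle_focus(gamma: float, beta: float, f2: float) -> bool:
    """The condition quoted in the text: 4*gamma/beta**2 > 1/F''(+-1)."""
    return 4.0 * gamma / beta ** 2 > 1.0 / f2


if __name__ == "__main__":
    # EFK nonlinearity F(u) = (u**2 - 1)**2 / 4 has F''(+-1) = 2.
    print(is_saddle_focus(1.0, 1.0, 2.0))
    print(linearization_eigenvalues(1.0, 1.0, 2.0))
```

When the condition holds, all four roots have nonzero real and imaginary parts, which is the oscillatory approach to the equilibrium that produces the crossings counted in Definition 1.2; when it fails with $\beta^2 > 4\gamma F''(\pm 1)$, the roots are real.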
Note that by Definition 1.1, a function $u$ in any class $M(g, p)$ has infinitely many crossings of $\pm 1$. Definition 1.2 is similar to the definition of the class $M(g)$ in [?] except that here it is assumed that all crossings of $\pm 1$ are transverse. Only finitely many crossings are assumed to be transverse in [?] so that the classes $M(g)$ would be open subsets of $\chi + H^2(\mathbb{R})$. Since we will not directly minimize over $M(g, p)$, we now require transversality of all crossings of $\pm 1$ to guarantee that $\Gamma(u) \in \mathcal{P}$. However, note that the minimizers found in [?] are indeed contained in classes $M(g, p)$ as defined above, where the types $g$ are terminated.

The classes $M(g, p)$ are nonempty for all pairs $(g, p)$. Conversely, any function $u \in H^2_{loc}(\mathbb{R})$ is contained in the closure of some class $M(g, p)$ with respect to the complete metric on $H^2_{loc}(\mathbb{R})$ given by $\rho(u, v) = \sum_i 2^{-i}\min\{1, \|u - v\|_{H^2(-i,i)}\}$, cf. [?]. That is, if we define $\overline{M(g, p)} := \{u \in H^2_{loc}(\mathbb{R}) \mid \exists\, u_n \in M(g, p), \text{ with } u_n \to u \text{ in } H^2_{loc}(\mathbb{R})\}$, then $H^2_{loc}(\mathbb{R}) = \bigcup_{(g,p)} \overline{M(g, p)}$. Note that the functions in $\partial M(g, p) := \overline{M(g, p)} \setminus \operatorname{int}(M(g, p))$ have tangencies at $u = \pm 1$ and thus are limit points of more than one class.

In the case of an infinite type, shifts of $g$ can give rise to the same function class. Therefore certain infinite types need to be identified. Let $\sigma$ be the shift map defined by $\sigma(g)_i = g_{i+1}$ and the map $\tau : \{0, 1\} \to \{0, 1\}$ be defined by $\tau(p) = (p + 1) \bmod 2 = |p - 1|$. Two infinite types $(g, p)$ and $(g', p')$ are equivalent if $g' = \sigma^n(g)$ and $p' = \tau^n(p)$ for some $n \in \mathbb{Z}$, and this implies $M(g, p) = M(g', p')$. A bi-infinite type $g$ is periodic if there exists an integer $n$ such that $\sigma^n(g) = g$.

When the domain of integration is $\mathbb{R}$, the action $J[u]$ given in (1.1) is well-defined only for terminated types $g$ and $u \in M(g, p) \cap \{\chi_p + H^2(\mathbb{R})\}$, where $\chi_p$ is a smooth function from $(-1)^{p+1}$ to $(-1)^p$. For semi-terminated types or infinite types the action $J$ is infinite for every $u \in M(g, p)$. In Sect. 2, we will define an alternative notion of minimizer in order to overcome this difficulty. The primary goal of this paper is to prove the following theorem, but we also prove additional results about the structure and relationships between various types of minimizers.

Theorem 1.3. If $F$ satisfies Hypothesis (H) and is even, then for any type $g$ and parity $p$ there exists a minimizer of $J$ in $M(g, p)$ in the sense of Definition 2.1. Moreover, if $g$ is periodic, then there exists a periodic minimizer in $M(g, p)$.

In Sects. 5 and 6 we show that other properties of the symbol sequences, such as symmetry, are reflected in the corresponding minimizers. The classification of minimizers by
symbol sequences has other properties in common with symbolic dynamics; for example, if a type is asymptotically periodic in both directions, then there exists a minimizer of that type which is a heteroclinic connection between two periodic minimizers. The minimizers discussed here all lie in the 3-dimensional "energy-manifold" $M_0 = \{(u, u', u'', u''') \mid H(u, u', u'', u''') = 0\}$. Exploiting certain properties of minimizers that are established in this paper, we can deduce various linking and knotting characteristics when they are represented as smooth curves in $M_0$ [?,?]. The minimizers found in this paper are also used in [?] to construct stable patterns for the evolutionary EFK equation on a bounded interval, and the dynamics of the evolutionary EFK is discussed in [?].

Some notation used in this paper was previously introduced in [?]. While we have attempted to present a self-contained analysis, we have avoided reproducing details (particularly in Sect. 5.1) which are not central to the ideas presented here, and which are thoroughly explained in [?].

2. Definition of Minimizer

For every compact interval $I \subset \mathbb{R}$ the restricted action $J_I$ is well-defined for all types. When we restrict $u$ to an interval $I$, we can define its type and parity relative to $I$, which we denote by $(g(u|_I), p(u|_I))$. Namely, let $u \in M(g, p)$. It is clear that $(u, u')|_{\partial I} \ne (\pm 1, 0)$ for any bounded interval $I$. Then $g(u|_I)$ is defined to be the finite-dimensional vector which counts the consecutive instances of $u|_I = \pm 1$, and $p(u|_I)$ is defined such that the first time $u|_I = \pm 1$ in $I$ happens at $(-1)^{p+1}$. Note that the components of $g(u|_I)$ are not necessarily all even, since the first and the last entries may be odd. We are now ready to state the definition of a (global) minimizer in $M(g, p)$.

Definition 2.1. A function $u \in M(g, p)$ is called a minimizer for $J$ over $M(g, p)$ if and only if for every compact interval $I$ the number $J_I[u|_I]$ minimizes $J_{I'}[v|_{I'}]$ over all functions $v \in M(g, p)$ and all compact intervals $I'$ such that $(v, v')|_{\partial I'} = (u, u')|_{\partial I}$ and $(g(v|_{I'}), p(v|_{I'})) = (g(u|_I), p(u|_I))$.

The pair $(g(u|_I), p(u|_I))$ defines a homotopy class of curves in $\mathcal{P}$ with fixed end points $(u, u')|_{\partial I}$. The above definition says that a function $u$, represented as a curve $\Gamma(u)$ in $\mathcal{P}$, is a minimizer if and only if for any two points $P_1$ and $P_2$ on $\Gamma(u)$, the segment $\Gamma(P_1, P_2) \subset \Gamma(u)$ connecting $P_1$ and $P_2$ is the most $J$-efficient among all connections $\Gamma'(P_1, P_2)$ between $P_1$ and $P_2$ that are induced by a function $v$ and are of the same homotopy type as $\Gamma(P_1, P_2)$, regardless of the length of the interval needed to parametrize the curve $\Gamma'(P_1, P_2)$. As we mentioned in the introduction, this is analogous to the length minimizing geodesics of Morse and Hedlund and the minimizers in the Aubry–Mather theory. The set of all (global) minimizers in $M(g, p)$ will be denoted by $CM(g, p)$.

Lemma 2.2. Let $u \in M(g, p)$ be a minimizer, then $u \in C^4(\mathbb{R})$ and $u$ satisfies Eq. (1.2). Moreover, $u$ satisfies the relation $H(u, u', u'', u''') = 0$, i.e. the associated orbit lies on the energy level $H = 0$.

Proof. From the definition of $M(g, p)$, on any bounded interval $I \subset \mathbb{R}$ there exists $\varepsilon_0(I) > 0$ sufficiently small such that $u + \varphi \in M(g, p)$ for all $\varphi \in H^2_0(I)$, with $\|\varphi\|_{H^2} < \varepsilon \le \varepsilon_0$. Therefore $J_I[u + \varphi] \ge J_I[u]$ for all such functions $\varphi$, which implies that $dJ_I[u] = 0$ for any bounded interval $I \subset \mathbb{R}$, and thus $u$ satisfies (1.2).
To prove the second statement we argue as follows. Since $u \in M(g, p)$, there exists a bounded interval $I$ such that $u'|_{\partial I} = 0$. Introducing the rescaled variable $s = t/T$ with $T = |I|$ and $v(s) = u(t)$, we have
\[ J_I[u] = J[T, v] \equiv \int_0^1 \Bigl(\frac{1}{T^3}\frac{\gamma}{2}|v''|^2 + \frac{1}{T}\frac{\beta}{2}|v'|^2 + T F(v)\Bigr)\,ds, \tag{2.1} \]
which decouples $u$ and $T$. Since $u'|_{\partial I} = 0$ we see from Definition 2.1 that $J[T \pm \varepsilon, v] \ge J_I[u] = J[T, v]$. The smoothness of $J$ in the variable $T > 0$ implies that $\frac{\partial}{\partial\tau} J[\tau, v]\big|_{\tau = T} = 0$. Differentiating yields
\[ \frac{\partial}{\partial\tau} J[\tau, v] = \int_0^1 \Bigl(-3\tau^{-4}\frac{\gamma}{2}|v''|^2 - \tau^{-2}\frac{\beta}{2}|v'|^2 + F(v)\Bigr)\,ds = \tau^{-1}\int_0^\tau \Bigl(-\frac{3}{2}\gamma|u''|^2 - \frac{\beta}{2}|u'|^2 + F(u)\Bigr)\,dt = -\tau^{-1}\int_0^\tau H(u, u', u'', u''')\,dt \equiv -E. \]
Thus $E = 0$, and $H(u, u', u'', u''') = 0$ for $t \in I$. This immediately implies that $H = 0$ for all $t \in \mathbb{R}$.

The minimizers for $J$ found in [?] also satisfy Definition 2.1, and we restate one of the main results of [?].

Proposition 2.3. Suppose $F$ is even and satisfies (H), and $\beta, \gamma > 0$ are chosen such that $u = \pm 1$ are saddle-focus equilibria. Then for any terminated type $g$ with parity either 0 or 1 there exists a minimizer $u \in M(g, p)$ of $J$.

From Definition 1.2, the crossings of $u \in M(g, p)$ with $\pm 1$ are transverse and hence isolated. We adapt from [?] the notion of a normalized function with a few minor changes to reflect the fact that we now require every crossing of $\pm 1$ to be transverse.

Definition 2.4. A function $u \in M(g, p)$ is normalized if, between each pair $u(a)$ and $u(b)$ of consecutive crossings of $\pm 1$, the restriction $u|_{(a,b)}$ is either monotone or $u|_{(a,b)}$ has exactly one local extremum.

Clearly, the case of $u|_{(a,b)}$ being monotone can occur only between two crossings at different levels $\pm 1$, in which case we say that $u$ has a transition on $[a, b]$.

Lemma 2.5. If $u \in CM(g, p)$, then $u$ is normalized.

Proof. Since $u \in M(g, p)$, all crossings of $u = \pm 1$ are transverse, i.e. $u' \ne 0$. Thus for any critical point $t_0 \in \mathbb{R}$, $u(t_0) \ne \pm 1$, and the Hamiltonian relation from Lemma 2.2 and (1.3) implies that $\gamma u''(t_0)^2/2 = F(u(t_0)) > 0$. Therefore $u$ is a Morse function, and between any two consecutive crossings of $\pm 1$ there are only finitely many critical points. Now on any interval between consecutive crossings where $u$ is not normalized, the clipping lemmas of Sect. 3 in [?] can be applied to obtain a more $J$-efficient function, which contradicts the fact that $u$ is a minimizer.
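The identity used in the proof of Lemma 2.2, which turns the $\tau$-derivative of the rescaled action into the integral of $H$, rests on one integration by parts. The following is a reader's sketch of that step, assuming $I = [0, \tau]$ and $u'|_{\partial I} = 0$ as in the proof.

```latex
% Integrate the first term of (1.3) by parts; the boundary term vanishes
% because u' = 0 at both endpoints of I = [0, \tau].
\[
-\gamma\int_0^\tau u'u'''\,dt
= -\gamma\bigl[u'u''\bigr]_0^\tau + \gamma\int_0^\tau |u''|^2\,dt
= \gamma\int_0^\tau |u''|^2\,dt ,
\]
\[
\int_0^\tau H\,dt
= \int_0^\tau \Bigl(\frac{3\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 - F(u)\Bigr)\,dt .
\]
```

Multiplying the last line by $-\tau^{-1}$ gives exactly the integrand obtained by differentiating (2.1) in $\tau$ and changing variables back from $s$ to $t$.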
3. Minimizers of Arbitrary Type

In this section we will introduce a notion of convergence of types which will be used in Sect. 5.2 to establish the existence of minimizers in every class $M(g, p)$ by building on the results proved in [?].

Definition 3.1. Consider a sequence of types $(g^n, p^n) = \bigl((g^n_i)_{i \in I_n}, p^n\bigr)$ and a type $(g, p) = \bigl((g_i)_{i \in I}, p\bigr)$. The sequence $(g^n, p^n)$ limits to the type $(g, p)$ if and only if there exist numbers $N_n \in 2\mathbb{Z}$ such that $g^n_{i + N_n + p^n - p} \to g_i$ for all $i \in I$ as $n \to \infty$. We will abuse notation and write $(g^n, p^n) \to (g, p)$.

We should point out that a sequence of types can limit to more than one type. For example the sequence $(g^n, 0) = \bigl((\infty, 2, 2, n, 4, 4, 4, 4, n, 2, 2, 2, \dots), 0\bigr)$ limits to the types $\bigl((\infty, 2, 2, \infty), 0\bigr)$, $\bigl((\infty, 4, 4, 4, 4, \infty), 1\bigr)$ and $\bigl((\infty, 2, 2, 2, \dots), 0\bigr)$.

Theorem 3.2. Let $(g^n, p^n) \to (g, p)$ and $u_n \in CM(g^n, p^n)$ with $\|u_n\|_{1,\infty} \le C$ for all $n$. Then there exists a subsequence $u_{n_k}$ such that $u_{n_k} \to u \in M(g, p)$ in $C^4_{loc}(\mathbb{R})$, and $u$ is a minimizer in the sense of Definition 2.1, i.e. $u \in CM(g, p)$.

Proof. This proof requires arguments developed in [?] to which the reader is referred for certain details. The idea is to take the limit of $u_n$ restricted to bounded intervals. We define the numbers $N_n$ as in Definition 3.1, and we denote the convex hull of $A_i$ by $I_i = \operatorname{conv}(A_i)$. Due to translation invariance we can pin the functions $u_n$ so that $u_n(0) = (-1)^{p+1}$, which is the beginning of the transition between $I^n_{N_n + p^n - p}$ and $I^n_{1 + N_n + p^n - p}$. Due to the assumed a priori bound and interpolation estimates which can be found in the appendix to [?], there is enough regularity to yield a limit function $u$ as a $C^4_{loc}$-limit of $u_n$, after perhaps passing to a subsequence. Moreover $u$ satisfies the differential equation (1.2) on $\mathbb{R}$. The question that remains is whether $u \in M(g, p)$. To simplify notation we will now assume that $N_n = 0$ and $p^n = p = 0$.

Fixing a small $\delta > 0$, we define $I^n_i(\delta) \supset I^n_i$ as the smallest interval containing $I^n_i$ such that $u_n|_{\partial I^n_i(\delta)} = (-1)^{i+1} - (-1)^{i+1}\delta$ (if $g$ is a (semi-)terminated type then $I^n_i(\delta)$ may be a half-line). The interval of transition between $I^n_i(\delta)$ and $I^n_{i+1}(\delta)$ is denoted by $L^n_i(\delta)$. To see that $u \in M(g, p)$, one has to eliminate the two possibilities that a priori may lead to the loss or creation of crossings in the limit so that $u \notin M(g, p)$: the distance between two consecutive crossings in $u_n$ could grow without bound or $u$ could possess tangencies at $u = \pm 1$. Due to the a priori estimates in $W^{1,\infty}$ we have the following bounds on $J$:
\[ J[u_n|_{I^n_i(\delta)}] \le C \quad \text{and} \quad J[u_n|_{L^n_i(\delta)}] \le C', \tag{3.1} \]
where $C$ and $C'$ are independent of $n$ and $i$. Indeed, note that for $n$ large enough the homotopy type of $u_n$ on the intervals $I^n_i(\delta)$ is constant by the definition of convergence of types. Since the functions $u_n$ are minimizers, $J[u_n|_{I^n_i(\delta)}]$ is less than the action of any test function of this homotopy type satisfying the a priori bounds on $u$ and $u'$ on $\partial I^n_i(\delta)$ (see [?, Sect. 6] for a similar test function argument). The estimate $|L^n_i(\delta)| \le C(\delta)$ is immediately clear from Lemma 5.1 of [?].

We now need to show that the distance between two crossings of $(-1)^{i+1}$ within the interval $I^n_i(\delta)$ cannot tend to infinity. First we will deal with the case when $g^n_i$ is finite for all $n$. Suppose that the distance between consecutive crossings of $(-1)^{i+1}$ in $I^n_i(\delta)$ tends to infinity as $n \to \infty$. Due to Inequality (3.1) and Lemma 2.5, minimizers have exactly one extremum between
crossings of (−1)i+1 for any # > 0, and hence there exist subintervals Kn ⊂ Iin (δ) with |Kn | → ∞, such that 0 < |un − (−1)qn | < # on Kn , where qn ∈ {0, 1}, and |u |∂Kn | < #. Taking a subsequence we may assume that qn is constant. We begin by considering the case where qn = i + 1. Now # can be chosen small enough, so that the local theory in [?] is applicable in Kn . If |Kn | becomes too large then un can be replaced by a function with lower action and with many crossings of (−1)i+1 . Subsequently, redundant crossings can be clipped out, thereby lowering the action. This implies that un is not a minimizer in the sense of Definition 2.1, a contradiction. The case where qn = i must be dealt with in a different manner. First, there are unique points tn ∈ Kn such that un (tn ) = 0, and for these points un (tn ) → (−1)i as |Kn | → ∞. Let un (sn ) be the first crossing of (−1)i+1 to the left of Kn . Taking the limit (along subsequences) of un (t − sn ) we obtain a limit function u which is a solution of (1.2). If |tn − sn | is bounded then u has a tangency to u = (−1)i at some t∗ ∈ R. All un lie in {H = 0} (see (1.3)) and so does u, hence u (t∗ ) = 0. Moreover u (t∗ ) = 0, because u(t∗ ) is an extremum. By uniqueness of the initial value problem this implies that u ≡ (−1)i , contradicting the fact that u(0) = (−1)i+1 . If |tn − sn | → ∞, then u is a monotone function on [0, ∞), tending to (−1)i as x → ∞, and its derivatives tend to zero (see Lemma 3 in [14] or Lemma 1, Part (ii) in [?] for details). This contradicts the saddle-focus nature of the equilibrium point. In the case that gin = ∞ we remark that (3.1) also holds when Iin is a half-line. It follows from the estimates in Lemma 5.1 in [?] that uni → (−1)i+1 as x → ∞ or x → −∞ (whichever is applicable). From the local theory in Sect. 4 of [?] and the fact that un is a minimizer, it follows that the derivatives of un tend to zero. 
The analysis above concerning the intervals $K_n$ and the clipping of redundant oscillations now goes on unchanged.
We have shown that the distance between two crossings of $\pm 1$ is bounded from above. Next we have to show that the limit function has only transverse crossings of $\pm 1$. This ensures that no crossings are lost in the limit. If $u$ were tangent to $(-1)^{i+1}$ in $I_i$, then we could construct a function $v \in M(g,p)$ in the same way as demonstrated in [?] by replacing tangent pieces by more $J$-efficient local minimizers and by clipping. The function $v$ still has a lower action than $u$ on a slightly larger interval (the limit function $u$ also obeys (3.1), so that the above clipping arguments still apply). Since $u_n \to u$ in $C^4_{loc}$ it follows that $J_I[u_n] \to J_I[u]$ on bounded intervals $I$. This then implies that for $n$ large enough the function $u_n$ is not a minimizer in the sense of Definition 2.1, which is a contradiction. The limit function $u$ could also be tangent to $(-1)^i$ for some $t_0 \in I_i$. As before, such tangencies satisfy $u(t_0) - (-1)^i = u'(t_0) = u''(t_0) = u'''(t_0) = 0$, which leads to a contradiction of the uniqueness of the initial value problem. Finally, crossings of $u = \pm 1$ cannot accumulate since this would imply that at the accumulation point all derivatives would be zero, leading to the same contradiction as above. In particular, if $g_i^n \to \infty$ for some $i$, then $|I_i^n| \to \infty$ and the crossings in $A_j^n$ for $j > i$ move off to infinity and do not show in $u$, which is compatible with the convergence of types.
We have now proved that $u \in M(g,p)$ and, since $u$ is the $C^4_{loc}$-limit of minimizers, $u$ is also a minimizer in the sense of Definition 2.1.
Remark 3.3. It follows from the estimates in Theorem 3 of [?] that in the theorem above we in fact only need an $L^\infty$-bound on the sequence $u_n$.
Remark 3.4. It follows from the proof of Theorem 3.2 that there exists a constant $\delta_0 > 0$ such that for all uniformly bounded minimizers $u(t)$ it holds that $|u(t) - (-1)^{i+p}| > \delta_0$ for all $t \in I_i$ and all $i \in I$. This means that the uniform separation property discussed in [?] is uniformly satisfied by all minimizers.
4. Periodic Minimizers
A bi-infinite type $g$ is periodic if there exists an integer $n$ such that $\sigma^n(g) = g$. The (natural) definition of the period of $g$ is $\min\{n \in 2\mathbb{N} \mid \sigma^n(g) = g\}$. We will write $g = \bar r$, where $r = (g_1, \dots, g_n)$ and $n$ is even. Cyclic permutations of $r$ with possibly a flip of $p$ give rise to the same function class $M(r,p)$. In reference to the type $r$ with parity $p$ we will use the notation $(r,p)$. Any such type pair $(r,p)$ can formally be associated with a homotopy class in $\pi_1(\mathcal{P},0)$ in the following way. Let $e_0$ and $e_1$ be the clockwise oriented circles of radius one centered at $(1,0)$ and $(-1,0)$ respectively, so that $[e_0]$ and $[e_1]$ are generators for $\pi_1(\mathcal{P},0)$. Defining $\theta(r,p) = e_{\tau^{n-1}(p)}^{r_n/2} \cdots e_p^{r_1/2}$, the map $\theta : \bigcup_{k \ge 1} 2\mathbb{N}^{2k} \times \{0,1\} \to \pi_1(\mathcal{P},0)$ is an injection, and we define $\pi_1^+(\mathcal{P},0)$ to be the image of $\theta$ in $\pi_1(\mathcal{P},0)$. Powers of a type pair $(r,p)^k$ for $k \ge 1$ are defined by concatenation of $r$ with itself $k$ times, which is equivalent to $(r,p)^k = \theta^{-1}((\theta(r,p))^k)$.
Definition 4.1. Two pairs $(r,p)$ and $(\tilde r,\tilde p)$ are equivalent if there are numbers $p, q \in \mathbb{N}$ such that $(r,p)^p = (\tilde r,\tilde p)^q$ up to cyclic permutations. This relation, $(r,p) \sim (\tilde r,\tilde p)$, is an equivalence relation.
Example. If $(r,p) = ((2,4,2,4), 0)$ and $(\tilde r,\tilde p) = ((4,2,4,2,4,2), 1)$, then $\theta(r,p)^3 = \theta(\tilde r,\tilde p)^2$.
The equivalence class of $(r,p)$ is denoted by $[r,p]$. A type $(r,p)$ is a minimal representative for $[r,p]$ if for each $(\tilde r,\tilde p) \in [r,p]$ there is $k \ge 1$ such that $(\tilde r,\tilde p) = (r,p)^k$ up to cyclic permutations. A minimal representative is unique up to cyclic permutations. It is clear that in the representation of a periodic type $g = \bar r$, the type $r$ is minimal if the length of $r$ is the minimal period. Due to the above equivalences we now have that $M(r,p) = M(\tilde r,\tilde p)$ for all $(\tilde r,\tilde p) \in [r,p]$. It is not a priori clear that minimizers in $M(r,p)$ are periodic. However, we will see that among these minimizers, periodic minimizers can always be found. For a given periodic type $r$ we consider the subset of periodic functions in $M(r,p)$, $M_{per}(r,p) = \{u \in M(r,p) \mid u \text{ is periodic}\}$. For any $u \in M_{per}(r,p)$ and a period $T$ of $u$, $\Gamma(u|_{[0,T]})$ is a closed loop in $\mathcal{P}$ whose homotopy type corresponds to a nontrivial element of $\pi_1^+(\mathcal{P},0)$. In this correspondence there is no natural choice of a basepoint. For specificity, we will describe how to make the correspondence with the origin as the basepoint and thereafter omit it from the notation. Translate $u$ so that $u(0) = 0$. Let $\gamma : [0,1] \to \mathcal{P}$ be the line from $0$ to $(0, u'(0))$, and let $\gamma^*(t) = \gamma(1-t)$. Then $\Gamma^*(u|_{[0,T]}) = \gamma^* \circ \Gamma(u|_{[0,T]}) \circ \gamma$, and $[\Gamma^*(u|_{[0,T]})] \in \pi_1^+(\mathcal{P},0)$. Now define $[\Gamma(u|_{[0,T]})] \equiv [\Gamma^*(u|_{[0,T]})]$. Thus there exists a pair $\theta^{-1}([\Gamma(u|_{[0,T]})]) = (\tilde r,\tilde p) \in [r,p]$, with $\tilde r = r^k$ for some $k \ge 1$. Therefore we define for any $(\tilde r,\tilde p) \in [r,p]$,
$$M_{per}(\tilde r,\tilde p) = \{u \in M_{per}(r,p) \mid [\Gamma(u|_{[0,T]})] \sim \theta(\tilde r,\tilde p) \in \pi_1(\mathcal{P}) \text{ for a period } T \text{ of } u\}.$$
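The equivalence of Definition 4.1 is purely combinatorial, so it can be checked mechanically. The following is a minimal sketch, not part of the original argument: the function names (`flip`, `power`, `shift`, `equivalent`) are hypothetical, and it assumes, as in the text, that a cyclic permutation of $r$ by $k$ places is accompanied by $k$ flips of the parity. Since both sides are powers of fixed words, it suffices to compare them at the least common multiple of the two lengths.

```python
from math import gcd

def flip(p, k=1):
    """Apply the parity flip tau k times, where tau(p) = |p - 1|."""
    return (p + k) % 2

def power(pair, k):
    """(r, p)^k: concatenate r with itself k times; the parity is unchanged."""
    r, p = pair
    return (tuple(r) * k, p)

def shift(pair, k):
    """Cyclic permutation of r by k places, together with k parity flips."""
    r, p = pair
    new_p = flip(p, k)
    k %= len(r)
    return (tuple(r[k:]) + tuple(r[:k]), new_p)

def same_up_to_cyclic_permutation(a, b):
    """True if some cyclic permutation (with matching parity flips) maps a to b."""
    if len(a[0]) != len(b[0]):
        return False
    target = (tuple(b[0]), b[1])
    return any(shift(a, k) == target for k in range(len(a[0])))

def equivalent(a, b):
    """Test (r, p)^i = (r~, p~)^j up to cyclic permutations for some i, j >= 1."""
    n, m = len(a[0]), len(b[0])
    common = n * m // gcd(n, m)  # least common multiple of the two lengths
    return same_up_to_cyclic_permutation(power(a, common // n),
                                         power(b, common // m))

# The pairs from the Example above: third power versus second power.
print(equivalent(((2, 4, 2, 4), 0), ((4, 2, 4, 2, 4, 2), 1)))
```

In the Example the two words differ by a cyclic permutation of one place, which is why the parities 0 and 1 are matched; with equal parities the same two words would not be identified by this check.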
The type $\tilde r = g(u|_{[0,T]})$, with $g = \bar r$, is the homotopy type of $u$ relative to a period $T$. This type has an even number of entries. It follows that $M_{per}(r,p) \subset M_{per}(\tilde r,\tilde p)$ for all $(\tilde r,\tilde p) = (r,p)^k$, $k \ge 1$. Furthermore $M_{per}(r,p) = \bigcup_{(\tilde r,\tilde p) \in [r,p]} M_{per}(\tilde r,\tilde p)$. In order to get a better understanding of periodic minimizers in $M(r,p)$ we consider the following minimization problem:
$$J_{per}(r,p) = \inf_{u \in M_{per}(r,p)} J_T[u] = \inf_{\substack{u \in M^T_{per}(r,p) \\ T \in \mathbb{R}^+}} J_T[u], \tag{4.1}$$
where $J_T$ is the action given in (1.1) integrated over one period of length $T$, and $M^T_{per}(r,p)$ is the set of $T$-periodic functions $u \in M_{per}(r,p)$ for which $g(u|_{[0,T]}) = r$. Note that $T$ is not necessarily the minimal period, unless $r$ is a minimal representative for $[r]$. It is clear that for $\gamma, \beta > 0$ the infima $J_{per}(r,p)$ are well-defined and are nonnegative for any homotopy type $r$. At this point it is not clear, however, that the infima $J_{per}(r,p)$ are attained for all homotopy types $r$. We will prove in Sect. 5 that existence of minimizers for (4.1) can be obtained using the existence of homoclinic and heteroclinic minimizers already established in [?].
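To make the quantity $J_T[u]$ in (4.1) concrete, the following sketch evaluates the action of a $T$-periodic test function on a uniform grid. It rests on assumptions that are not stated in this section: the integrand of (1.1) is taken to be $\frac{\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 + F(u)$, and $F(u) = \frac14 (u^2-1)^2$ is used as a model double-well potential. The function name `action_over_period` and the discretization by periodic central differences are likewise illustrative choices only.

```python
import math

def F(u):
    # ASSUMPTION: model double-well potential with minima at u = +1 and u = -1.
    return 0.25 * (u * u - 1.0) ** 2

def action_over_period(u, T, gamma=1.0, beta=1.0, n=2000):
    """Approximate J_T[u] for a T-periodic function u on n grid points.

    ASSUMPTION: the Lagrangian is gamma/2 |u''|^2 + beta/2 |u'|^2 + F(u).
    Derivatives are periodic central differences; the integral is a Riemann sum.
    """
    h = T / n
    vals = [u(i * h) for i in range(n)]
    total = 0.0
    for i in range(n):
        um, u0, up = vals[i - 1], vals[i], vals[(i + 1) % n]  # periodic wrap
        d1 = (up - um) / (2.0 * h)
        d2 = (up - 2.0 * u0 + um) / (h * h)
        total += (0.5 * gamma * d2 * d2 + 0.5 * beta * d1 * d1 + F(u0)) * h
    return total

# A test function oscillating around both equilibria, with period T = 10.
u = lambda t: 2.0 * math.cos(2.0 * math.pi * t / 10.0)
J_one = action_over_period(u, 10.0)
J_two = action_over_period(u, 20.0, n=4000)  # the same function over two periods
print(J_one, J_two)
```

Under these assumptions the constant functions $u = \pm 1$ have zero action, every other test function has positive action, and integrating over $k$ periods multiplies the action by $k$.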
Lemma 4.2. If Jper (r, p) is attained for some u ∈ Mper (r, p), then u ∈ C 4 (R) and satisfies (1.2). Moreover, since u is minimal with respect to T we have H (u, u , u , u ) = 0, i.e. the associated periodic orbit lies in the energy surface H = 0. Proof. Since Jper (r, p) is attained by some u ∈ Mper (r, p) for some period T , we have that JT [u + φ] − JT [u] ≥ 0 for all φ ∈ H 2 (S 1 , T ) with φH 2 ≤ #, sufficiently small. This implies that dJT [u] = 0, and thus u satisfies (1.2). The second part of this proof is analogous to the proof of Lemma 2.2. We now introduce the following notation: CM(r, p) = {u ∈ M(r, p) | u is a minimizer according to Definition 2.1}, CMper (r, p) = {u ∈ CM(r, p) | u is periodic}, CMper (r, p) = {u ∈ Mper (r, p) | u is a minimizer for Jper (r, p)}. 4.1. Existence of periodic minimizers of type r = (2, 2)k . If we seek periodic minimizers of type r = (2, 2)k , the uniform separation property for minimizing sequences (see Sect. 5 in [?]) is satisfied in the class Mper (r). Note that the parity is omitted because it does not distinguish different homotopy types here. The uniform separation property as defined in [?] prevents minimizing sequences from crossing the boundary of the given homotopy class. For any other periodic type the uniform separation property is not a priori satisfied. For the sake of simplicity we begin with periodic minimizers of type (2, 2) and minimize J in the class Mper ((2, 2)). Minimizing sequences can be chosen to be normalized due to the following lemma, which we state without proof. The proof is analogous to Lemma 3.5 in [?]. Lemma 4.3. Let u ∈ Mper ((2, 2)) and T be a period of u. Then for every # > 0 there exists a normalized function w ∈ Mper ((2, 2)) with period T ≤ T such that JT [w] ≤ JT [u] + #.
The goal of this subsection is to prove that when $F$ satisfies (H) and $\beta, \gamma > 0$ are such that $u = \pm 1$ are saddle-foci, $J_{per}((2,2))$ is attained (Theorem 4.5 below). The proof relies on the local structure of the saddle-focus equilibria $u = \pm 1$ and is a modification of arguments in [?]; hence we will provide only a brief argument. The reader is referred to [?] for further details. In preparation for the proof of Theorem 4.5, we fix $\tau_0 > 0$, $\varepsilon_0 > 0$, and $\delta > 0$ so that the conclusion of Theorem 4.2 of [?] holds, i.e. the characterization of the oscillatory behavior of solutions near the saddle-focus equilibria $u = \pm 1$ holds. Let $u \in M^T_{per}((2,2))$ be normalized, and let $t_0$ be such that $u(t_0) = 0$. Then $t_0$ is part of a transition from $\mp 1$ to $\pm 1$. Assume without loss of generality that this transition is from $-1$ to $1$. Define $t_- = \sup\{t < t_0 : |u(t)+1| < \delta\}$ and $t_+ = \inf\{t > t_0 : |u(t)-1| < \delta\}$. Then let $S(u) = \{t : |u(t) \pm 1| < \delta\}$ and $B[u,T] = |S(u) \cap [t_+, t_- + T]|$, and note that $[t_0, t_0+T] = \big(S(u) \cap [t_+, t_- + T]\big) \cup \big(S(u)^c \cap [t_0, t_0+T]\big)$. With these definitions we can establish the following estimate (cf. Lemma 5.4 in [?]). For all $u \in M_{per}((2,2))$ with $J_T[u] \le J_{per}((2,2)) + \varepsilon_0$,
$$\|u\|^2_{H^2} \le C\big(1 + J_{per}((2,2)) + B[u,T]\big). \tag{4.2}$$
First, $\|u'\|^2_{H^1} \le C(J_{per}((2,2)) + \varepsilon_0)$, and second, if $|u \pm 1| > \delta$, then $F(u) \ge \eta^2 u^2$, which implies that
$$\|u\|^2_{L^2} \le \frac{1}{\eta^2} \int_{t_0}^{t_0+T} F(u)\, dt + (1+\delta)^2 B[u,T] \le C\big(J_T[u] + B[u,T]\big).$$
Combining these two estimates proves (4.2).
For functions $u \in M^T_{per}((2,2))$ that satisfy $J_T[u] \le J_{per}((2,2)) + 1$, it follows from Lemma 5.1 of [?] that there exist (uniform in $u$) constants $T_1$ and $T_2$ such that $T_2 \ge |S(u)^c \cap [t_0, t_0+T]| \ge T_1 > 0$ and thus $T > T_1$. The next step is to give an a priori upper bound on $T$ by considering the minimization problem (cf. Sect. 5 in [?])
$$B_\varepsilon = \inf\{ B[u,T] \mid u \in M^T_{per}((2,2)) \text{ normalized},\ T \in \mathbb{R}^+,\ \text{and } J_T[u] \le J_{per}((2,2)) + \varepsilon \}.$$
Lemma 4.4. There exists a constant $K = K(\tau_0) > 0$ such that $B_\varepsilon \le K$ for all $0 < \varepsilon < \varepsilon_0$. Moreover, if $T_0 \equiv K + T_2$, then for any $0 < \varepsilon < \varepsilon_0$, there is a normalized $u \in M^T_{per}((2,2))$ with $J_T[u] \le J_{per}((2,2)) + 2\varepsilon$ and $T_1 < T \le T_0$.
Proof. Let $(u_n, T_n) \in M^{T_n}_{per}((2,2)) \times \mathbb{R}^+$ be a minimizing sequence for $B_\varepsilon$, with normalized functions $u_n$. As in the proof of Theorem 5.5 of [?], in the weak limit this yields a pair $(\hat u, \hat T)$ such that $B[\hat u, \hat T] \le B_\varepsilon$. We now define $K((2,2), \tau_0) = 8((2\tau_0 + 2) + 2)$. This gives two possibilities for $B[\hat u, \hat T]$: either $B[\hat u, \hat T] > K$ or $B[\hat u, \hat T] \le K$. If the former is true then we can construct (see Theorem 5.5 of [?]) a pair $(\tilde v, \tilde T) \in M^{\tilde T}_{per}((2,2)) \times \mathbb{R}^+$, with $\tilde v$ normalized, such that
$$J_{\tilde T}[\tilde v] < J_{\hat T}[\hat u] \le J_{per}((2,2)) + \varepsilon \quad\text{and}\quad B[\tilde v, \tilde T] < B[\hat u, \hat T] \le B_\varepsilon,$$
which is a contradiction excluding the first possibility. In the second case, where $B[\hat u, \hat T] \le K$, we can construct a pair $(\tilde v, \tilde T)$ with $\tilde v$ normalized such that
$$J_{\tilde T}[\tilde v] < J_{\hat T}[\hat u] + \varepsilon \le J_{per}((2,2)) + 2\varepsilon \quad\text{and}\quad B[\tilde v, \tilde T] < B[\hat u, \hat T] \le K,$$
which implies that $T_1 < \tilde T < \hat T \le K + T_2 = T_0$ and concludes the proof. For details concerning these constructions, see Theorem 5.5 in [?].
Theorem 4.5. Suppose that $F$ satisfies (H) and $\beta, \gamma > 0$ are such that $u = \pm 1$ are saddle-foci. Then $J_{per}((2,2)^k)$ is attained for any $k \ge 1$. Moreover, the projection of any minimizer in $CM_{per}((2,2))$ onto the $(u,u')$-plane is a simple closed curve.
Proof. By Lemma 4.4, we can choose a minimizing sequence $(u_n, T_n) \in M^{T_n}_{per}((2,2)) \times \mathbb{R}^+$, with $u_n$ normalized and with the additional properties that $\|u_n\|_{H^2} \le C$ and $T_1 < T_n \le T_0$. Since the uniform separation property is satisfied for the type $(2,2)$ this leads to a minimizing pair $(\hat u, \hat T)$ for (4.1) by following the proof of Theorem 2.2 in [?]. As for the existence of periodic minimizers of type $r = (2,2)^k$, the uniform separation property is automatically satisfied and the above steps are identical. Lemma 2.5 yields that minimizers are normalized functions and the projection of a normalized function in $M_{per}((2,2))$ is a simple closed curve in the $(u,u')$-plane.
We would like to have the same theorem for arbitrary periodic types r. For homotopy types that satisfy the uniform separation property the analogue of Theorem 4.5 can be proved. However, in Sect. 5 we will prove a more general result using the information about the minimizers with terminated types (homoclinic and heteroclinic minimizers) which was obtained in [?]. Remark 4.6. The existence of a (2, 2)-type minimizer is proved here in order to obtain a priori W 1,∞ -estimates for all minimizers (Sect. 5). However, if F satisfies the additional hypothesis that F (u) ∼ |u|s , s > 2 as |u| → ∞, then such estimates are automatic (cf. [?,?]). In that case the existence of a minimizer of type (2, 2) follows from Theorem 4.14 below. To prove existence of minimizers of arbitrary type r we will use an analogue of Theorem 4.14 (see Lemma 5.7 and Theorem 5.8 below). 4.2. Characterization of minimizers of type g = (2, 2). Periodic minimizers associated with [e0 ] or [e1 ] are the constant solutions u = −1 and u = 1 respectively. The simplest nontrivial periodic minimizers are those of type r = (2, 2)k , i.e. r ∈ [(2, 2)]. These minimizers are crucial to the further analysis of the general case. The type r = (2, 2) is a minimal type (associated with [e1 e0 ]), and we want to investigate the relation between minimizers in M((2, 2)) and periodic minimizers of type (2, 2)k . Considering curves in the configuration space P is a convenient method for studying minimizers of type (2, 2). For example, minimizers in CM((2, 2)) and CMper ((2, 2)) all satisfy the property that they do not intersect the line segment L = (−1, 1)×{0} in P. If other homotopy types r are considered, i.e. r ∈ [(2, 2)], then minimizers represented as curves in P necessarily have self-intersections and they must intersect the segment L, which makes their comparison more complicated. We will come back to this problem in Sect. 5. 
Note that for a C 1 -function u the associated curve (u) is a closed loop if and only if u is a periodic function. Lemma 4.7. For any non-periodic minimizer u ∈ CM((2, 2)) and any bounded interval I the curve [u|I ] has only a finite number of self-intersections. For periodic minimizers u ∈ CMper ((2, 2)) this property holds when the length of I is smaller than the minimal period. Proof. Fix a time interval I = [0, T ]. If u is periodic, T should be chosen smaller than the minimal period of u. Let P = (u0 , u0 ) be an accumulation point of self-intersections of u|I . Then P is a self intersection point, and there exists a monotone sequence of times τn ∈ I converging to t0 such that (u(τn )) are self-intersection points and (u(t0 )) = P .
Also there exists a corresponding sequence $\sigma_n \in I$ with $\sigma_n \ne \tau_n$ such that $\Gamma(u(\tau_n)) = \Gamma(u(\sigma_n))$. Choosing a subsequence if necessary, $\sigma_n \to s_0$ monotonically. Since $u$ is a minimizer in $CM((2,2))$, the intervals $[\sigma_n, \tau_n]$ must contain a transition, and hence $|\tau_n - \sigma_n| > T_0 > 0$. Therefore, $s_0 \ne t_0$, and we will assume that $s_0 < t_0$ (otherwise change labels). The homotopy type of $\Gamma(u|_{[s_0,t_0]})$ is $(2,2)^k$ for some $k \ge 1$ (since $I$ is bounded). Assume that $\sigma_n$ and $\tau_n$ are increasing; the other case is similar. Using the times $\sigma_n < s_0 < \tau_n < t_0$, the curve $\Gamma_* = \Gamma[u|_{[\sigma_n - \delta, t_0 + \delta]}]$, for $\delta$ sufficiently small, can be decomposed as $\Gamma_* = a \circ \gamma_2 \circ \gamma \circ \gamma_1 \circ b$, where $b = \Gamma(u|_{[\sigma_n - \delta, \sigma_n]})$, $\gamma_1 = \Gamma(u|_{[\sigma_n, s_0]})$, $\gamma = \Gamma(u|_{[s_0, \tau_n]})$, $\gamma_2 = \Gamma(u|_{[\tau_n, t_0]})$, and $a = \Gamma(u|_{[t_0, t_0+\delta]})$. For $n$ sufficiently large, $\gamma_1$ and $\gamma_2$ have the same homotopy type, and $\gamma_1 \ne \gamma_2$, since otherwise $u$ would be periodic with period smaller than $t_0 - \sigma_n < T$. We can now construct two more paths
$$\Gamma_1 = a \circ \gamma_1 \circ \gamma \circ \gamma_1 \circ b \quad\text{and}\quad \Gamma_2 = a \circ \gamma_2 \circ \gamma \circ \gamma_2 \circ b$$
which have the same homotopy type for $n$ sufficiently large. Since $J[\Gamma_*]$ is minimal, $J[\Gamma_1] \ge J[\Gamma_*]$ and $J[\Gamma_2] \ge J[\Gamma_*]$, and thus $J[\gamma_1] \ge J[\gamma_2]$ and $J[\gamma_2] \ge J[\gamma_1]$, which implies that $J[\gamma_1] = J[\gamma_2]$. Therefore $J[\Gamma_*] = J[\Gamma_1] = J[\Gamma_2]$, and $\Gamma_1$, $\Gamma_2$ and $\Gamma_*$ are all distinct minimizers with the same homotopy type and same boundary conditions. Since these curves all coincide along $\gamma$, the uniqueness of the initial value problem is contradicted.
An argument very similar to the one above is also used in the proof of Lemma 4.12 and is demonstrated in Fig. 4.1.
Lemma 4.8. If $r = (2,2)^k$ with $k > 1$, then $CM_{per}(r) = CM_{per}((2,2))$ and $J_{per}(r) = k \cdot J_{per}((2,2))$.
Proof. Let $u \in CM_{per}(r)$ with $r = (2,2)^k$ for $k > 1$, and let $T$ be the period$^1$ such that the associated curve in $\mathcal{P}$, $\Gamma(u|_{[0,T]})$, has the homotopy class of $\theta((2,2)^k)$. First we will prove that $\Gamma(u|_{[0,T]})$ is a simple closed curve in $\mathcal{P}$, and hence $u \in M_{per}((2,2))$. Suppose not. Then by Lemma 4.7 the curve $\Gamma(u|_{[0,T]})$ can be fully decomposed into $k$ distinct simple closed curves $\Gamma_i$ for $i = 1, \dots, k$ (just call the inner loop $\Gamma_1$, cut it out, and call the new inner loop $\Gamma_2$, and so on). Denote by $J_i$ the action associated with loop $\Gamma_i$; then $\sum_i J_i = J_T[u]$. Let $v_i \in M_{per}((2,2)^k)$ be the function obtained by pasting together $k$ copies of $u$ restricted to the loop $\Gamma_i$. If $v_i$ were a minimizer in $M_{per}((2,2)^k)$, then by Lemma 4.2 the functions $u$ and $v_i$ would be distinct solutions to the differential equation (1.2) which coincide over an interval. This would contradict the uniqueness of solutions of the initial value problem, and hence $v_i$ is not a minimizer, i.e. $J_T[v_i] = k \cdot J_i > J_{per}((2,2)^k)$. Consequently $J_{per}((2,2)^k) = \sum_i J_i > J_{per}((2,2)^k)$, which is a contradiction. Thus $u \in M_{per}((2,2))$ and $\Gamma(u|_{[0,T]})$ is a simple loop traversed $k$ times. Now we will show that $u \in CM_{per}((2,2))$.
Since (u) is the projection of a function into the (u, u )–plane, u traverses the loop once over the interval [0, T /k], and Jper ((2, 2)k ) = k · JT /k [u]. Suppose JT /k > Jper ((2, 2)). Then we can construct a function in Mper ((2, 2)k ) with action less than J [u] = Jper ((2, 2)k ) by gluing together k copies of a minimizer in Mper ((2, 2)), which is a contradiction. Lemma 4.9. For any k ≥ 1, CMper ((2, 2)k ) = CMper ((2, 2)) = CMper ((2, 2)). 1 One may assume without loss of generality that is a minimal period.
Proof. We have already shown in Lemma 4.8 that CMper ((2, 2)k ) = CMper ((2, 2)). We first prove that CMper ((2, 2)) ⊂ CMper ((2, 2)). Let u ∈ CMper ((2, 2)) have period T . Suppose u ∈ CMper ((2, 2)). Then there exist two points (u(t1 )) = P1 and (u(t2 )) = P2 on (u) such that the curve γ between P1 and P2 obtained by following (u) is not minimal. Replacing γ by a curve with smaller action and the same homotopy type yields a function v ∈ Mper ((2, 2)) for which J[t1 ,t2 ] [v] ≤ J[t1 ,t2 ] [u]. Choose k ≥ 0 such that kT > t2 − t1 . Then u is a minimizer in CMper ((2, 2)k ) = CMper ((2, 2)) which is a contradiction. To finish the proof of the lemma we show that CMper ((2, 2)) ⊂ CMper ((2, 2)). Let u ∈ CMper ((2, 2)) have period T . Let (u|[0,T ] ) be the associated closed curve in P and ω its winding number with respect to the segment L. Suppose JT [u] > Jper ((2, 2)ω ) = ω·Jper (2, 2). This implies the existence of a function v ∈ Mper ((2, 2)ω ) and a period T such that JT [v] < JT [u]. Choose a time t0 ∈ [0, T ] such that u(t0 ) = 1 and u (t0 ) > 0. Let P0 = (1, u (t0 )) ∈ P. There exists a δ > 0 sufficiently small such that u(t0 ± δ) > 0, u (t0 ± δ) > 0, and u does not cross ±1 in [t0 − δ, t0 + δ] except at t0 . Let P1 and P2 denote the points (u(t0 ∓ δ), u (t0 ∓ δ)) respectively. Let γ denote the piece of the curve (u) from P1 to P2 and γ ∗ the curve tracing (u) backward in time from P2 to P1 . Now choose a point P3 on (v) for which v = 1 and v > 0. We can easily construct cubic polynomials p1 and p2 for which the curve (p1 ) connects P1 to P3 and the curve (p2 ) connects P3 to P2 in P. These curves (pi ) are monotone functions, and hence the loop (p1 ) ◦ (p2 ) ◦ γ ∗ has trivial homotopy type in P. Therefore (u|[0,T ] )k ◦ γ ∼ (p2 ) ◦ (v|[0,T ] )k ◦ (p1 ) in P for any k ≥ 1, and from Definition 2.1 J [(u|[0,T ] )k ◦ γ ] ≤ J [(p2 ) ◦ (v|[0,T ] )k ◦ (p1 )]. 
Thus, k · JT [u] + J [γ ] ≤ J [p1 ] + J [p2 ] + k · JT [v], which implies 0 ≤ k(JT [u] − JT [v]) ≤ J [p1 ] + J [p2 ] − J [γ ]. These estimates lead to a contradiction for k sufficiently large.
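The winding number $\omega$ with respect to the segment $L$ used in the proof above can be computed numerically for a sampled closed curve. The sketch below is an illustration only: the helper names `winding_number` and `phase_curve` are hypothetical, the curve is replaced by a closed polygon through its sample points, and the test function $u(t) = 2\cos t$ is chosen for convenience and is not claimed to be a minimizer. A closed curve that does not meet $L$ winds the same number of times around every point of $L$, so it is enough to evaluate the winding number about one such point, for instance the origin.

```python
import math

def winding_number(points, center):
    """Signed number of counterclockwise turns of a closed polygon about center."""
    cx, cy = center
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i][0] - cx, points[i][1] - cy
        x1, y1 = points[(i + 1) % n][0] - cx, points[(i + 1) % n][1] - cy
        # Signed angle between consecutive position vectors, in (-pi, pi].
        total += math.atan2(x0 * y1 - x1 * y0, x0 * x1 + y0 * y1)
    return round(total / (2.0 * math.pi))

def phase_curve(u, du, T, n=400):
    """Sample the curve t -> (u(t), u'(t)) over [0, T) at n points."""
    h = T / n
    return [(u(i * h), du(i * h)) for i in range(n)]

u = lambda t: 2.0 * math.cos(t)
du = lambda t: -2.0 * math.sin(t)

once = phase_curve(u, du, 2.0 * math.pi)
twice = phase_curve(u, du, 4.0 * math.pi, n=800)

# Curves of the form (u, u') run clockwise, so clockwise turns are counted
# by the negative of the signed winding number.
print(-winding_number(once, (0.0, 0.0)), -winding_number(twice, (0.0, 0.0)))
```

For this test curve the count is the same about $(-1,0)$, $(0,0)$ and $(1,0)$, in line with the fact that the loop encloses the whole segment $L$.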
Lemma 4.10. For any two distinct minimizers $u_1$ and $u_2$ in $CM_{per}((2,2))$, the associated curves $\Gamma(u_i)$ do not intersect.
Proof. Suppose $\Gamma(u_1)$ and $\Gamma(u_2)$ intersect at a point $P \in \mathcal{P}$. Translate $u_1$ and $u_2$ so that $\Gamma(u_1(0)) = \Gamma(u_2(0)) = P$. Define the function $u \in M_{per}((2,2)^2)$ as the periodic extension of
$$u(t) = \begin{cases} u_1(t) & \text{for } t \in [0, T_1], \\ u_2(t - T_1) & \text{for } t \in [T_1, T_1 + T_2], \end{cases}$$
where $T_i$ is the minimal period of $u_i$. Then $J_{T_1+T_2}[u] = 2 J_{per}((2,2)) = J_{per}((2,2)^2)$. By Lemma 4.8 we have $u \in CM_{per}((2,2))$, which contradicts the fact that $u_1$ and $u_2$ are distinct minimizers with $\Gamma(u_1) \ne \Gamma(u_2)$.
As a direct consequence of this lemma, the periodic orbits in $M_{per}((2,2))$ are ordered in the sense that $\Gamma(u_1)$ lies either strictly inside or outside the region enclosed by $\Gamma(u_2)$. The ordering will be denoted by $>$.
Theorem 4.11. There exists a largest and a smallest periodic orbit in $CM_{per}((2,2))$ in the sense of the above ordering, which we will denote by $u_{max}$ and $u_{min}$ respectively. Moreover $1 < \|u_{min}\|_{1,\infty} \le \|u_{max}\|_{1,\infty} \le C_0$, and $u_{min} < u < u_{max}$ for every $u \in CM_{per}((2,2))$. In particular the set $CM_{per}((2,2))$ is compact.
Proof. Either the number of periodic minimizers is finite, in which case there is nothing to prove, or the set of minimizers is infinite. Let U = {(u) | u ∈ CMper ((2, 2))} ⊂ P, and let A = U ∩ {(u, u ) | u = 0, u > 0}. Every minimizer in CMper ((2, 2)) intersects the positive u–axis transversely exactly once. Moreover distinct minimizers cross this axis at distinct points by Lemma 4.10. Thus we can use A as an index set and label the minimizers as uα for α ∈ A. Due to the a priori upper bound on minimizers (Lemma 5.1 in [?]), A is a bounded set. The set A is contained in the u-axis and hence has an ordering induced by the real numbers. This order corresponds to the order on minimizers, i.e. α < β in A if and only if uα < uβ as minimizers. Suppose α∗ is an accumulation point of A. Then there exists a sequence αn converging to α∗ . From Theorem 3.2 (the a priori L∞ -bound on uαn is sufficient by Remark 3.3) we see that there exists u ∈ CM((2, 2)) which is a solution to Eq. (1.2) such that 1 (R). Since u is periodic and the C 1 –limit of a sequence of periodic uαn → u in Cloc αn loc functions with uniformly bounded periods (compare with the proof of Theorem 3.2 to find a uniform bound on the periods) is periodic, u ∈ CMper ((2, 2)). By Lemma 4.9, u ∈ CMper ((2, 2)). Furthermore u corresponds to uα∗ , and hence A is compact. Consequently A contains maximal and minimal elements. Let umax and umin be the periodic minimizers through the maximal and minimal points of A respectively. This proves the theorem. The above lemmas characterize periodic minimizers in CM((2, 2)). Now we turn our attention to non-periodic minimizers. We conclude this subsection with a theorem that gives a complete description of the set CM((2, 2)). Let u ∈ CM((2, 2)) be non-periodic. Suppose that P is a self-intersection point of (u). Then there exist times t1 < t2 such that (u(t1 )) = (u(t2 )) = P , and (u|[t1 ,t2 ] ) is a closed loop. 
By Lemma 4.7 there are only finitely many self-intersections on [t1 , t2 ]. Without loss of generality we may therefore assume that γ is a simple closed loop, i.e, we need only consider the case where P = (u(t1 )) = (u(t2 )) and (u|[t1 ,t2 ] ) is a simple closed loop. We now define + = (u|(t1 ,∞) ) and − = (u|(−∞,t2 ) ). We will refer to ± as the forward and backward orbits of u relative to P . Lemma 4.12. Let u ∈ CM((2, 2)) be a non-periodic minimizer with at least one selfintersection. Let P and ± be defined as above. Then the forward and backward orbits ± relative to P do not intersect themselves. Furthermore, P and ± are unique, and the curve (u) passes through any point in P at most twice. Proof. We will prove the result for + ; the argument for − is similar. Suppose that + has self-intersections. Define t∗ = min{t > t1 | (u(t)) = (u(τ )) for some τ ∈ (t1 , t)}. The minimum t∗ is attained by Lemma 4.7, and t∗ > t2 since γ ≡ (u|[t1 ,t2 ] ) is a simple closed loop. Let t0 ∈ (t1 , t∗ ) be the point such that (u(t0 )) = (u(t∗ )). This point is unique by the definition of t∗ , and γ˜ ≡ (u|[t0 ,t∗ ] ) is a simple closed loop. For small positive δ we define Q = (u(t∗ )), B = (u(t1 − δ)), E = (u(t∗ + δ)) and ∗ = (u|[t1 −δ,t∗ +δ] ), see Fig. 4.1. We can decompose this curve into five parts; ∗ = σ3 ◦ γ˜ ◦ σ2 ◦ γ ◦ σ1 , where σ1 joins B to P , σ2 joins P to Q, σ3 joins Q to E, and γ and γ˜ are simple closed loops based at P and Q respectively, see Fig. 4.1. The simple closed curves γ and γ˜ go around L exactly once and thus have the same homotopy type. Moreover, γ = γ˜ since u is non-periodic.
Besides $\Gamma_*$ we can construct two other distinct paths from $B$ to $E$:
$$\Gamma_1 = \sigma_3 \circ \sigma_2 \circ \gamma \circ \gamma \circ \sigma_1 \quad\text{and}\quad \Gamma_2 = \sigma_3 \circ \tilde\gamma \circ \tilde\gamma \circ \sigma_2 \circ \sigma_1.$$
It is not difficult to see that $\Gamma_1$, $\Gamma_2$ and $\Gamma_*$ all have the same homotopy type. Since $J[\Gamma_*]$ is minimal in the sense of Definition 2.1 we have, by the same reasoning as in Lemma 4.7, that $J[\Gamma_1] \ge J[\Gamma_*]$ and $J[\Gamma_2] \ge J[\Gamma_*]$, which implies that $J[\gamma] \ge J[\tilde\gamma]$ and $J[\tilde\gamma] \ge J[\gamma]$. Hence $J[\gamma] = J[\tilde\gamma]$. Therefore $J[\Gamma_1] = J[\Gamma_2] = J[\Gamma_*]$, which gives that $\Gamma_1$, $\Gamma_2$ and $\Gamma_*$ are all distinct minimizers of the same type as curves joining $B$ to $E$. Since these curves all contain the paths $\sigma_1$, $\sigma_2$ and $\sigma_3$, and are solutions to (1.2), the uniqueness of the initial value problem is contradicted.
Finally, the curve $\Gamma(u)$ can pass through a point at most twice because it is a union of $\Gamma_+$ and $\Gamma_-$, each visiting a point at most once. Moreover, points in $\Gamma(u|_{(t_1,t_2)})$, common to both $\Gamma_+$ and $\Gamma_-$, are passed exactly once. It now follows that if there is another self-intersection besides $P$, say at $R = \Gamma(u(s_1)) = \Gamma(u(s_2))$, then $s_1 < t_1$ and $t_2 < s_2$. We conclude that the curve $\Gamma(u|_{(s_1,s_2)})$ contains $\Gamma(u|_{[t_1,t_2]})$ and therefore it is not a simple closed curve. Thus $P$ is a unique self-intersection that cuts off a simple loop.
Fig. 4.1. The forward orbit $\Gamma_+$ starting at $P$ with a self-intersection at the point $Q$. Lemma 4.12 implies that this cannot happen for non-periodic $u \in CM((2,2))$. (The figure shows the points $B$, $P$, $Q$, $E$ and the segment $L$ between $(-1,0)$ and $(1,0)$.)
Lemma 4.13. Let $u \in CM((2,2))$ be non-periodic. Suppose that $u \in L^\infty(\mathbb{R})$. Then $u$ is a connecting orbit between two periodic minimizers $u_-, u_+ \in CM_{per}((2,2))$, i.e. there are sequences $t_n^-, t_n^+ \to \infty$ such that $u(t - t_n^-) \to u_-(t)$ and $u(t + t_n^+) \to u_+(t)$ in $C^4_{loc}(\mathbb{R})$.
Proof. Lemma 4.12 implies that $\Gamma_+$ is a spiral which intersects the positive $u$-axis at a bounded, monotone sequence of points $(\alpha_n, 0)$ in $\mathcal{P}$ converging to a point $(\alpha_*, 0)$. Let $t_n$ be the sequence of consecutive times such that $u(t_n) = \alpha_n$ and $u'(t_n) = 0$. Consider the sequence of minimizers in $CM((2,2))$ defined by $u_n(t) = u(t + t_n)$. By Theorem 3.2 there exists a $C^1_{loc}$-limit $u_+ \in CM((2,2))$. If $u_+$ is periodic, there is nothing more to prove. Thus suppose $u_+$ is non-periodic. Then the curve $\Gamma(u_+)$ crosses the $u$-axis infinitely many times. On the other hand, from the $C^1_{loc}$ convergence $\Gamma(u_+)$ crosses
this axis only at $\alpha_*$. By Lemma 4.12, $\Gamma(u_+)$ can intersect $\alpha_*$ at most twice, which is a contradiction. The $C^4_{loc}$-convergence follows from regularity (as in the proof of Theorem 4.2). The proof of the existence of $u_-$ is similar.
Theorem 4.14. Let $u \in CM((2,2))$. Either $u$ is unbounded, $u$ is periodic and $u \in CM_{per}((2,2))$, or $u$ is a connecting orbit between periodic minimizers in $CM_{per}((2,2))$.
Proof. Let $u \in CM((2,2))$ be bounded; then $u$ is either periodic or non-periodic. In the case that $u$ is periodic it follows from Lemma 4.9 that $u \in CM_{per}((2,2))$. Otherwise, if $u$ is not periodic, it follows from Lemma 4.13 that $u$ is a connecting orbit between two minimizers $u_-, u_+ \in CM_{per}((2,2))$.
In Sect. 5.2 we give analogues of the above theorems for arbitrary homotopy types $r$. Notice that the option of $u \in CM((2,2))$ being unbounded in the above theorem does not occur when $F(u) \sim |u|^s$, $s > 2$ as $|u| \to \infty$.
5. Properties of Minimizers
In Sect. 4, we proved the existence of minimizers in $M_{per}((2,2))$, which will provide a priori bounds on the minimizers of arbitrary type. These bounds and Theorem 3.2 will establish the existence of such minimizers. In this section we will also prove that certain properties of a type $g$ are often reflected in the associated minimizers. The most important examples are the periodic types $g = \bar r$. Although there are minimizers in every class $M(r,p)$, it is not clear a priori that among these minimizers there are also periodic minimizers. In order to prove existence of periodic minimizers for every periodic type $r$ we use the theory of covering spaces.
5.1. Existence. The periodic minimizers of type $(2,2)$ are special for the following reason. For a normalized $u \in M_{per}((2,2))$, define $D(u)$ to be the closed disk in $\mathbb{R}^2$ such that $\partial D(u) = \Gamma(u)$.
Theorem 5.1.
i) If $u \in CM(r,p)$, then $\Gamma(u) \subset D(u_{min})$ for any periodic type $r \ne (2,2)$.
ii) If $u \in CM(g,p)$, then $\Gamma(u) \subset D(u_{min})$ for any terminated type $g$.
Proof.
i) If $r \ne (2,2)$ then every $u \in CM(r,p)$ has the property that $\Gamma(u)$ intersects the $u$-axis between $u = \pm 1$. Suppose that $\Gamma(u)$ does not lie inside $D(u_{min})$. Then $\Gamma(u)$ must intersect $\Gamma(u_{min})$ at least twice, and let $P_1$ and $P_2$ be distinct intersection points with the property that the curve $\Gamma_1$ obtained by following $\Gamma(u)$ from $P_1$ to $P_2$ lies entirely outside of $D(u_{min})$. Let $\Gamma_2 \subset \Gamma(u_{min})$ be the curve from $P_1$ to $P_2$ following $u_{min}$, such that $\Gamma_1$ and $\Gamma_2$ are homotopic (traversing the loop $\Gamma(u_{min})$ as many times as necessary) and thus $J[\Gamma_1] = J[\Gamma_2]$ is minimal. Replacing $\Gamma_1$ by $\Gamma_2$ leads to a minimizer in $CM(r,p)$ which partially agrees with $u$. This contradicts the uniqueness of the initial value problem for (1.2).
ii) As in the previous case the associated curve $\Gamma(u)$ either intersects $\Gamma(u_{min})$ at least twice or lies completely inside $D(u_{min})$, and the proof is identical.
Corollary 5.2. For all minimizers in the above theorem, $\|u\|_{1,\infty} \le \|u_{min}\|_{1,\infty} \le C_0$.
In order to prove existence of minimizers in every class we now use the above theorem in combination with an existence result from [?].
Theorem 5.3. For any given type $g$ and parity $p$ there exists a (bounded) minimizer $u \in CM(g,p)$. Moreover $\|u\|_{1,\infty} \le C_0$, independent of $(g,p)$.
Proof. Given a type $g$ we can construct a sequence $g_n$ of terminated types such that $g_n \to g$ as $n \to \infty$. For any terminated type $g_n$ there exists a minimizer $u_n \in CM(g_n, p)$ by Proposition 2.3 (Theorem 1.3 of [?]). Clearly such a sequence $u_n$ satisfies $\|u_n\|_{1,\infty} \le C$ by Corollary 5.2. Applying Theorem 3.2 completes the proof.
5.2. Covering spaces and the action of the fundamental group. The fundamental group of $\mathcal{P}$ is isomorphic to the free group on two generators $e_0$ and $e_1$ which represent loops (traversed clockwise) around $(1,0)$ and $(-1,0)$ respectively with basepoint $(0,0)$. Indeed, $\mathcal{P}$ is homotopic to a bouquet of two circles $X = S^1 \vee S^1$. The universal covering of $X$, denoted by $\tilde X$, can be represented by an infinite tree whose edges cover either $e_0$ or $e_1$ in $X$, see Fig. 5.1. The universal covering of $\mathcal{P}$, denoted by $\wp : \tilde{\mathcal{P}} \to \mathcal{P}$, can then be viewed by thickening the tree $\tilde X$ so that $\tilde{\mathcal{P}}$ is homeomorphic to an open disk in $\mathbb{R}^2$.
Fig. 5.1. The universal cover $\tilde X$ of $X$ is a tree. Its origin is denoted by $O$. For $\theta = e_0 e_1 e_0$, the quotient space $\tilde X_\theta = \tilde X / \langle\theta\rangle$ is also a covering space over $X$, and $\tilde X_\theta \sim S^1$.
An important property of the universal covering is that the fundamental group $\pi_1(\mathcal{P})$ induces a left group action on $\tilde{\mathcal{P}}$ in a natural way, via the lifting of paths in $\mathcal{P}$ to paths in $\tilde{\mathcal{P}}$. This action will be denoted by $\theta \cdot p$ for $\theta \in \pi_1(\mathcal{P})$ and $p \in \tilde{\mathcal{P}}$. We will not reproduce the construction of this action here, and the reader is referred to an introductory book on algebraic topology such as [?]. However, we will utilize the structure of the quotient
spaces of P̃ obtained from this action, which are again coverings of P. These quotient spaces will be the natural spaces in which to consider the lifts of curves Γ(u) which lie in more complicated homotopy classes than those in the case of u ∈ Mper((2, 2)). A periodic type g = r̄ is generated by a finite type r, which together with the parity p determines an element of π1(P) of the form θ(r) = e_{|p−1|}^{r_{2n}} · · · · · e_p^{r_1}. Since we only consider curves in P which are of the form Γ(u) = (u(t), u′(t)), the numbers r_i are all positive. The infinite cyclic subgroup generated by any such element θ will be denoted by ⟨θ⟩ ⊂ π1(P). The quotient space P̃_θ = P̃/⟨θ⟩ is obtained by identifying points p and q in P̃ for which q = θ^k · p for some k ∈ Z. The resulting space P̃_θ is homotopic to an annulus, and ℘_θ : P̃_θ → P is a covering space. Figure 5.1 illustrates the situation for X, since it is easier to draw, and for P the reader should imagine that the edges in the picture are thin strips. The lift of the path θ = e0 e1 e0 to X̃ based at O is shown by the dashed line. This piece of the tree becomes a circle in the quotient space X̃_θ. Note that infinitely many edges in X̃ are identified with this circle. The dashed lines in both X̃ and X̃_θ are strong deformation retracts of X̃ and X̃_θ respectively, and hence X̃_θ is homotopic to a circle. Thickening X̃_θ gives that P̃_θ is homotopic to an annulus. Thus π1(P̃_θ) is generated by a simple closed loop in P̃_θ which will be denoted by ζ(r). Note that for convenience we suppress the dependence of θ and ζ on the parity p.
Remark 5.4. If we define the shift operator σ on finite types r to be a cyclic permutation, then Mper(r, p) = Mper(σ^k(r), τ^k(p)) for all k ∈ Z. Functions in Mper(r, p) have a unique lift to a simple closed curve in P̃_θ, θ = θ(r). However, functions in the shifted class Mper(σ^k(r), τ^k(p)) are not simple closed curves in P̃_θ. In order for such functions to be lifted to a unique simple closed curve we need to consider the covering space P̃_{θ_k}, where θ_k = θ(σ^k(r), τ^k(p)).
5.3. Characterization of minimizers of type r̄. In Sect. 5.2 we characterized minimizers in CM((2, 2)) by studying the properties of their projections into P. What was special about the types (2, 2)^k was that the projected curves were a priori contained in P \ L, which is topologically an annulus. The J-efficiency of minimizing curves restricts the possibilities for their self and mutual intersections. In particular, we showed that all periodic minimizers in CM((2, 2)) project onto simple closed curves in P \ L and that no two such minimizing curves intersect. These two properties, coupled with the simple topology of the annulus, already force the minimizing periodic curves to have a structure of a family of nested simple loops. Such a simple picture in the configuration plane P cannot be expected for minimizers in CM((r̄, p)) with r ≠ (2, 2). The simple intersection properties (of Lemma 5.9 and 5.11) no longer hold; in fact, periodic minimizing curves must have self-intersections in P as do any curves in P representing the homotopy class of (r̄, p). However, by lifting minimizing curves into the annulus P̃_θ, we can remove exactly these necessary self-intersections and put us in a position to emulate the discussion for the types (2, 2)^k. More precisely, for a minimal type (r, p), any u ∈ Mper((r, p)^k) with period T such that θ^{−1}[Γ(u|[0,T])] = (r, p)^k, there are infinitely many lifts of the closed loop Γ(u|[0,T]) into P̃_{θ(r)} (see the above remark) but there is exactly one lift, denoted Γ_θ(u|[0,T]), that is a closed loop homotopic to ζ^k(r) in P̃_{θ(r)}. We can repeat all of the arguments in Sect. 4 by identifying intersections between the curves Γ_θ(u|[0,T]) in P̃_{θ(r)} instead of intersections between the curves Γ(u|[0,T]) in P \ L.
Of course, when gluing together pieces of curves, the values of u and u′ come from the projections into P. In particular,
the arguments of Lemma 4.9 show that Γ_θ(u|[0,T]) must be a simple loop traced k times, which leads to the following:
Lemma 5.5. For any periodic type r and any k ≥ 1 it holds that CMper((r, p)^k) = CMper(r, p) = CMper(r̄, p).
The proof of the next theorem is a slight modification of Theorem 4.11.
Theorem 5.6. For any periodic type r the set CMper(r, p) is compact and totally ordered (in P̃_θ).
The following lemma is analogous to Lemma 4.13. Note however that by Theorem 5.1 we do not need to assume that the minimizer is uniformly bounded.
Lemma 5.7. Let u ∈ CM(r̄, p) for some periodic type r ≠ (2, 2). Either u is periodic and u ∈ CMper(r, p), or u is a connecting orbit between two periodic minimizers u−, u+ ∈ CMper(r, p), i.e. there are sequences t_n^−, t_n^+ → ∞ such that u(t − t_n^−) → u−(t) and u(t + t_n^+) → u+(t) in C^4_loc(R).
Combining Theorem 5.3 and Lemma 5.7 we obtain the existence of periodic minimizers in every class with a periodic type (this result can also be obtained in a way analogous to Theorem 4.5).
Theorem 5.8. For any periodic type r the set CMper(r, p) is nonempty.
The classification of functions by type has some properties in common with symbolic dynamics. For example, if a type g is asymptotic to two different periodic types, i.e. σ^n(g) → r+ and σ^{−n}(g) → r− as n → ∞, with r+ ≠ r−, then any minimizer u ∈ CM(g, p) is a connecting orbit between two periodic minimizers u− ∈ CMper(r−, p) and u+ ∈ CMper(r+, p), i.e. there exist sequences t_n^−, t_n^+ → ∞ such that u(t − t_n^−) → u−(t) and u(t + t_n^+) → u+(t) in C^4_loc(R). This result follows from Cantor's diagonal argument using Theorem 3.2 and Lemma 5.7, and hence we have used the symbol sequences to conclude the existence of heteroclinic and homoclinic orbits connecting any two types of periodic orbits. Symmetry properties of types g are also often reflected in the corresponding minimizers.
For example, define the map Bi0 on infinite types by Bi0 (g) = (g2i0 −i )i∈Z , and consider types that satisfy Bi0 (g) = g for some i0 . Moreover assume that g is periodic. In this case we can prove that the corresponding periodic minimizers are symmetric and satisfy Neumann boundary conditions. Theorem 5.9. Let g = r satisfy Bi0 (r) = r for some i0 . Then for any u ∈ CMper (r, p) there exists a shift τ such that uτ (x) = u(x − τ ) satisfies i) uτ (x) = uτ (T − x) for all x ∈ [0, T ] where T is the period of u, ii) uτ (0) = u τ (0) = 0 and uτ (T ) = uτ (T ) = 0, and iii) uτ is a local minimizer for the functional JT [u] on the Sobolev space Hn2 (0, T ) = {u ∈ H 2 (0, T ) | u (0) = u (T ) = 0}. Proof. Without loss of generality we may assume that i0 = 1 and g = (g1 , . . . , gN ) for some N ∈ 2N. We can choose a point t0 in the convex hull of A1 such that u (t0 ) = u (t0 + T ) = 0 and g(u|[t0 ,t0 +T ] ) = (g1 /2, g2 , . . . , gN , g1 /2). We now define v(t) = u(t0 +T − t). Then by the symmetry assumptions on g we have that g(v|[t0 ,t0 +T ] ) = g(u|[t0 ,t0 +T ] ). Since J[t0 ,t0 +T ] (v) = J[t0 ,t0 +T ] (u) and (u(t0 )) = (u(t0 +T )) = (v(t0 )) = (v(t0 + T )), we conclude from the uniqueness of the initial value problem that u(t) = v(t) for all t ∈ [t0 , t0 + T ], which proves the first statement. The second statement follows immediately from i). The third property follows from the definition of minimizer.
References 1. Bangert, V.: Mather Sets for Twist Maps and Geodesics on Tori. Volume 1, of Dynamics Reported. Oxford: Oxford University Press, 1988 2. Boyland, P. and Golé, C.: Lagrangian systems on hyperbolic manifolds. Ergodic Theory Dynam. Systems 19, 1157–1173 (1999) 3. Fulton, W.: Algebraic Topology: A First Course. Berlin–Heidelberg–New York: Springer-Verlag, 1995 4. Ghrist, R., VandenBerg, J.B. and VanderVorst, R.C.A.M.: Braided closed characteristics in fourth-order twist systems. Preprint 2000 5. Ghrist, R. and VandenBerg, J.B. and VanderVorst, R.C.A.M.: Morse theory on the space of braids and Lagrangian dynamics. In preparation 6. Hulshof, J. and VandenBerg, J.B. and VanderVorst, R.C.A.M.: Traveling waves for fourth order parabolic equations. To appear in SIAM J. Math. Anal. (1999) 7. Kalies, W.D. and Kwapisz, J. and VanderVorst, R.C.A.M.: Homotopy classes for stable connections between Hamiltonian saddle-focus equilibria. Commun. Math. Phys. 193, 337–371 (1998) 8. Kalies, W.D. and VanderVorst, R.C.A.M.: Multitransition homoclinic and heteroclinic solutions of the extended Fisher–Kolmogorov equation. J. Diff. Eq. 131, 209–228 (1996) 9. Kalies, W.D. and VanderVorst, R.C.A.M. and Wanner, T.: Slow motion in higher-order systems and -convergence in one space dimension. To appear in Nonlin. Anal. TMA 10. Kwapisz, J.: Uniqueness of the stationary wave for the extended Fisher–Kolmogorov equation. J. Diff. Eq. 165, 235–253 (2000) 11. Morse, M.: A fundamental class of geodesics on any closed surface of genus greater than one. Trans. Am. Math. Soc. 26, 25–60 (1924) 12. Rabinowitz, P.H.: Heteroclinics for a Hamiltonian system of double pendulum type. Top. Meth. Nonlin. Anal. 9, 41–76 (1997) 13. Schecter, E.: Handbook of analysis and its foundations. San Diego–New York–Boston: Acad. Press, 1997 14. VandenBerg, J.B.: The phase-plane picture for a class of fourth-order conservative differential equations. J. Diff. Eq. 161, 110–153 (2000) 15. 
VandenBerg, J.B.: Uniqueness of solutions for the extended Fisher–Kolmogorov equation. Comptes Rendus Acad. Sci. Paris (Série I) 326, 447–452 (1998) 16. VandenBerg, J.B. and VanderVorst, R.C.A.M.: Stable patterns for fourth order parabolic equations. Preprint (2000) Communicated by Ya. G. Sinai
Commun. Math. Phys. 214, 593 – 605 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
The Ordered K-Theory of C ∗ -Algebras Associated with Substitution Tilings Ian F. Putnam Department of Mathematics and Statistics, University of Victoria, Victoria, B.C., Canada Received: 12 August 1999 / Accepted: 2 May 2000
Abstract: We consider the C ∗ -algebra, AT , constructed from a substitution tiling system which is primitive, aperiodic and satisfies the finite pattern condition. Such a C ∗ -algebra has a unique trace. We show that this trace completely determines the order structure on the group K0 (AT ); a non-zero element in K0 (AT ) is positive if and only if its image under the map induced from the trace is positive. 1. Introduction and Statement of the Main Result We begin by introducing some of the terminology and notation. All of these things are developed more fully in the survey article [KP]. We have included other references to more original sources where appropriate. A substitution tiling system in Rd consists of a finite collection of bounded, regular closed sets p1 , . . . , pN in Rd called prototiles. We also have a constant λ > 1 and, for each i = 1, . . . , N, ω(pi ) which is a finite collection of subsets of Rd with pairwise disjoint interiors; each is a translate of one of the prototiles and their union is λpi = {λx | x ∈ pi }. In general, we call a translate of one of the prototiles a tile. Several one and two dimensional examples, including the Penrose tiles, are given in [AP]. As a generalization of the above, one can also have a finite set called labels. A labelled prototile is then a bounded, closed regular subset together with a label. (The idea being that we now have a way of distinguishing two prototiles which may be exactly the same geometric object.) It is clear how to extend the remainder of the definitions to this situation. All of our results apply equally well to the situation of labelled prototiles. Collections of tiles with pairwise disjoint interiors are called partial tilings. The union of such a set of tiles is called the support of the partial tiling and a partial tiling whose support is Rd is called a tiling. 
If T denotes a tiling (or even a partial tiling), then for any x in R^d, T + x denotes the tiling (or partial tiling) obtained by translating all tiles in T by x.
Supported in part by a grant from NSERC, Canada
Notice that we can extend our definition of ω to tiles by ω(p + x) = ω(p) + λx, for any prototile p and vector x. We can further extend this definition to partial tilings by ω(T) = {ω(t) | t ∈ T}. This also means that we can iterate ω, and for any prototile p, we may construct ω^n(p), for any n = 1, 2, . . . , which is a partial tiling with support λ^n p. We will assume here that all of our prototiles contain the origin in their interior. (This loses no generality.) We define the puncture of any prototile p to be the origin and for any vector x, we define the puncture of t = p + x, denoted x(t), to be x. So each tile has a distinguished point in its interior. We say that the substitution is primitive if there is a positive integer k such that, for every ordered pair of prototiles p, p′, a translate of p′ appears inside ω^k(p). We construct a tiling as follows. There exists a prototile p, a vector x, and a positive integer k so that the sequence of partial tilings ω^{nk}(p + x), n = 1, 2, . . . is coherent in the sense that the nth one contains all the earlier ones. Moreover, these grow to cover R^d. We will not prove this here, although the proof is not difficult. We let T denote the union of these partial tilings which is a tiling. We look at all translations of T and put a metric d on this set as in [RW, Rud, So1]. The completion of this set of translations of T is denoted Ω. It is also worth noting that the elements of this completion can be viewed as tilings with the same tiles. This space is actually independent of the choice of T as above. From now on, we revert to using T to denote an arbitrary element of Ω. Under a hypothesis called the finite pattern condition, [RW], it is a compact metric space. The map ω : Ω → Ω then becomes a continuous surjection. We will focus our attention on the case when this map is also injective, and hence, a homeomorphism. This is usually referred to as the unique composition property or that the substitution is locally invertible.
However, Solomyak [So2] has shown that, with the hypothesis of the finite pattern condition, this is equivalent to the set Ω containing no periodic tilings. That is, if T in Ω and x in R^d satisfy T + x = T, then x = 0. (In the terminology of [So2], Ω is the local isomorphism class of any of its elements.) Therefore, we will say that the substitution system is aperiodic if the map ω is injective. We will also assume that our substitution forces its border [Kel1]. As discussed in [KP], this loses no generality, provided we allow labelled tiles. We define Ω_punc to be the set of all tilings in Ω which have a puncture on the origin. This set is compact and totally disconnected. We want to describe a base for its topology consisting of clopen sets. Fix some finite partial tiling P inside ω^k(p), where p is a prototile and k is a positive integer, and let t be any tile in P. We let U(P, t) = {T | P − x(t) ⊂ T}. That is, we translate the patch P back by vector x(t), so that t now covers the origin, with its puncture exactly on the origin. Then we look at all tilings containing this patch. This set is closed and open in Ω_punc and such sets form a base for the topology. We are interested in the equivalence relation on Ω_punc which is simply translational equivalence. That is, we define R_punc = {(T, T + x) | T, T + x ∈ Ω_punc}. This set is also given a topology which is easiest to describe as follows. Let P be a patch as before and let t, t′ be two tiles in P. The map sending T in U(P, t) to
T + x(t) − x(t′) is a homeomorphism onto U(P, t′). Its graph is contained in R_punc and is denoted U(P, t, t′). These sets form a base for the topology of R_punc. Indeed, they are actually clopen sets and R_punc is totally disconnected. This makes R_punc into a locally compact, Hausdorff, σ-compact, r-discrete, principal groupoid with counting measure as a Haar system. (See [Ren] as a general reference on the subject of groupoids, or [Put2] for a leisurely treatment.) We use r, s to denote the range and source maps from R_punc to Ω_punc. That is, r(T, T′) = T and s(T, T′) = T′. We let A_T denote the C*-algebra of R_punc. We refer the reader to [KP] or to [Ren] as the main source for the construction of C*-algebras from groupoids. For general references to C*-algebras, we suggest [Fi, Da, Pe]. (We should note that this really doesn't depend on T. The notational confusion comes because this is a special case of a more general construction [KP]. It would probably be preferable to use the notation A_ω, but we will stay with this for historical reasons.) This C*-algebra is the completion in a certain norm of the *-algebra of continuous compactly supported functions on R_punc, denoted C_c(R_punc). Let us mention some properties of this C*-algebra. The key point is that the space Ω with map ω can be viewed as a Smale space [AP]. Then the space Ω_punc can be viewed as an abstract transversal to the relation of unstable equivalence. The reduction of this groupoid on Ω_punc is exactly R_punc. Hence A_T is strongly Morita equivalent to U(Ω, ω) and the results of [PS] apply. In particular, the equivalence relation R_punc is minimal in the sense that every equivalence class is dense in Ω_punc and also amenable in the sense of Renault [Ren]. We are interested in the computation of the K-theory of A_T, and especially its K-zero group.
We refer the reader to [Bl,W-O] as general references for K-theory for operator algebras and [Be1, Kel1] for further information and motivation for this problem in physics. Methods have been given in [AP,Be2,BCL,Kel1,Kel2] for the computation of the K-theory of A_T. In some cases, these included the order structure on K-zero. Here, we will prove a more general result. The space Ω_punc possesses a natural measure µ. It is most easily described as follows. The mixing Smale space, (Ω, ω), has a measure of maximum entropy which is a product measure with respect to the canonical stable and unstable coordinates. The entropy is d log(λ). The set Ω_punc is contained in a finite collection of local stable sets and the measure µ is simply the restriction of the stable component of the measure of maximal entropy. Its key properties are that it is finite and R_punc-invariant. This means that it is preserved under the local homeomorphisms whose graphs make up our topology base above. This measure has full support. This is because the equivalence relation R_punc is minimal and, since µ is R_punc-invariant, its support is also. This measure is also unique. We will have more to say about this later. This measure µ defines a trace, τ, on the C*-algebra A_T. For an element f which lies in the dense sub-algebra C_c(R_punc), its trace is given by

τ(f) = ∫_{Ω_punc} f(x, x) dµ(x).

This is a positive bounded linear functional of norm one. It is also faithful since the measure has full support. Such a trace induces a positive group homomorphism on the K-zero group of A_T [Bl,W-O], τ̂ : K0(A_T) → R.
It is our goal here to show that, under a very mild hypothesis regarding the topology of the prototiles, this homomorphism completely determines the order structure on K0(A_T) [Bl,W-O].
Theorem 1.1. Let p1, . . . , pN, ω be a substitution tiling system (or labelled substitution tiling system) in R^d which is primitive, aperiodic and satisfies the finite pattern condition. Suppose that for each prototile, the capacity or box-counting dimension of its boundary is strictly less than d. Then the order on K0(A_T) is determined by the trace. That is, for any element a in K0(A_T), a is in K0(A_T)^+ if and only if a = 0 or τ̂(a) > 0.
Notice that the hypothesis regarding dimension is satisfied by any polyhedra, where the boundaries are made up of lower dimensional hypersurfaces. Our proof will be presented in the last section. It will make use of a canonical C*-subalgebra of A_T, denoted AF_T [Kel1, Kel2]. This sub-algebra is reasonably large inside A_T, but also has the advantage of being AF or approximately finite dimensional. (Again, AF_ω might be more appropriate notation.) The structure of this C*-algebra is fairly well-understood. It is one of the AF-algebras constructed by Cuntz and Krieger from a mixing topological Markov chain, and the analogue of our main result above is known for such C*-algebras. Our proof will make use of this. The rest of the argument is to show how we may interpolate between projections in A_T with projections in AF_T. The details appear in Sect. 3. Let us give a description of AF_T. For each prototile p and each positive integer n, let Punc(p, n) denote the set of all punctures in the tiles of the partial tiling ω^n(p). Suppose that x is in some Punc(p, n), and T is any tiling in Ω_punc such that p is in T. Recall that the puncture in p is the origin. Then the tiling ω^n(T) − x is again in Ω_punc. We let W(p, n, x) denote the set of all tilings of this form.
It is not difficult to check that, for a fixed value of n, the collection of sets {W(p, n, x) | p a prototile, x ∈ Punc(p, n)} is a partition of Ω_punc into clopen sets. Since we assume the substitution forces its border, then as n varies, these generate the topology on Ω_punc. Suppose that p is a prototile and n is a positive integer. If we have x and y in Punc(p, n), then the map sending T to T + x − y is a homeomorphism from W(p, n, x) onto W(p, n, y). The graph of this map is denoted by W(p, n, x, y). It is a clopen subset of R_punc. We define R_AF to be the union of these sets, which is then an open subgroupoid of R_punc. Then the C*-algebra of R_AF is denoted by AF_T and the obvious inclusion of C_c(R_AF) ⊂ C_c(R_punc) extends to an inclusion AF_T ⊂ A_T. To see that AF_T is approximately finite dimensional, it suffices to notice that if we take A_n to be the linear span of the characteristic functions of the sets W(p, n, x, y), where p is a prototile and x, y are in Punc(p, n), then in fact, this is actually a finite dimensional C*-subalgebra. The details are given in [KP]. It is also shown there that the matrix which describes the embedding of A_n ⊂ A_{n+1} is the same for every n and is equal to the N × N matrix whose i, j entry is the number of different translates of p_i appearing in ω(p_j) for all i, j. Since the substitution is primitive, so is this matrix in the sense that some power has no zero entries [LM]. Our trace, τ, restricts to a trace on AF_T. By the results of [Ha], such a C*-algebra has a unique trace. This implies the
uniqueness of our R_punc-invariant measure µ since any other measure would give rise to another trace. These would be distinct on C(Ω_punc) which is contained in AF_T. While our main theorem gives a complete answer to the question of the order on K0(A_T), there is one important question which we leave unanswered. That is to compute the range of the map τ̂. There is a natural conjecture, namely

τ̂(K0(A_T)) = τ̂(K0(AF_T)) = {µ(E) | E ⊂ Ω_punc clopen} + Z.

One inclusion in the first equality is obvious. In some special situations in low dimensions (d ≤ 2), equality is known [vE]. As well, our result in the next section, Theorem 2.1, suggests that this will be true under the same hypothesis as Theorem 1.1. Furthermore, the set τ̂(K0(AF_T)) is known to be the subgroup of R generated by numbers of the form λ^{−nd} ξ_i, where n is a positive integer and ξ_i is an entry of the left Perron eigenvector of the primitive matrix of the last paragraph.
2. A Technical Result
In this section we will prove a technical result which we will need in the proof of the main theorem. We include it as a separate section because it may be of some independent interest, as we will try to explain below. As we discussed in the introduction, we have two equivalence relations (or principal groupoids), R_AF ⊂ R_punc, on the space Ω_punc. The second has a topology in which it is locally compact, Hausdorff, metrizable, r-discrete, σ-compact and for which counting measure is a Haar system. The first is an open subgroupoid. Roughly speaking, the structure of the subgroupoid is fairly well-understood and the difficulty in analyzing the C*-algebra A_T usually involves R_punc − R_AF. Our main technical result here is to show that, at the level of measure theory, the difference is negligible. Specifically, we will prove the following.
Theorem 2.1. Let p1, . . . , pN, ω be a substitution tiling system in R^d which is primitive, aperiodic and satisfies the finite pattern condition.
Let R_punc and R_AF be the associated principal groupoids and let µ be the unique R_punc-invariant probability measure on Ω_punc. If the boundary of each p_i has capacity or box-counting dimension strictly less than d, then we have µ(r(R_punc − R_AF)) = 0.
The proof will be broken into a series of lemmas and we will introduce some new notation. Recall that Punc(p, n) denotes the set of all punctures in the tiles of ω^n(p). For any x in Punc(p, n), we define ∂(x) to be the Euclidean distance from x to the boundary of λ^n p, which is the support of ω^n(p); that is, ∂(x) = inf{|x − y| | y ∈ R^d − λ^n p}, where | · | denotes the usual Euclidean norm on R^d. We fix b > 0 so that B(0, b) ⊂ p, for all prototiles p. Here, B(x, r) denotes the open ball in R^d centred at x and with radius r. Notice that this means for all tiles t, B(x(t), b) ⊂ t. We let #X denote the number of elements of any finite set X.
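Remark (illustration, not part of the original text). The substitution matrix and the counts #Punc(p, n) can be made concrete on a small example. The sketch below assumes the one-dimensional Fibonacci substitution, with prototiles a (length φ) and b (length 1), inflation constant λ = φ and rule ω(a) = {a, b}, ω(b) = {a}; this example system and all function names are choices made here for illustration and do not come from the paper. Because the matrix of this example is symmetric, the row and column conventions for its entries coincide.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # inflation constant lambda of the assumed example (d = 1)

# Assumed example: omega(a) = {a, b}, omega(b) = {a}; index 0 is a, index 1 is b.
B = [[1, 1],
     [1, 0]]
LENGTHS = [PHI, 1.0]           # one-dimensional "volumes" of the prototiles a and b

def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(X, n):
    """n-th power of a square matrix (n >= 0) by repeated multiplication."""
    result = [[int(i == j) for j in range(len(X))] for i in range(len(X))]
    for _ in range(n):
        result = mat_mul(result, X)
    return result

def is_primitive(X, max_power=10):
    """Check whether some power X^k, 1 <= k <= max_power, has no zero entries."""
    P = X
    for _ in range(max_power):
        if all(entry > 0 for row in P for entry in row):
            return True
        P = mat_mul(P, X)
    return False

def num_punctures(i, n):
    """#Punc(p_i, n): the number of tiles in omega^n(p_i), a row sum of B^n."""
    return sum(mat_pow(B, n)[i])

def is_volume_eigenvector():
    """Check that B v = lambda^d v for the vector v of prototile volumes."""
    Bv = [sum(B[i][j] * LENGTHS[j] for j in range(2)) for i in range(2)]
    return all(math.isclose(Bv[i], PHI * LENGTHS[i]) for i in range(2))

if __name__ == "__main__":
    print(is_primitive(B), is_volume_eigenvector())
    for n in range(1, 8):
        print(n, num_punctures(0, n), num_punctures(0, n) / PHI ** n)
```

In this example the ratio #Punc(a, n)/λ^{dn} stays bounded away from zero, which is the kind of lower bound on the number of punctures that the proof of Theorem 2.1 relies on.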
Lemma 2.2. For each prototile p, there is a positive constant a_p such that #Punc(p, n) ≥ a_p λ^{dn} for all positive integers n.
Proof. We define the N × N substitution matrix B as follows. The i, j entry of B is the number of occurrences of translates of p_j in ω(p_i). The fact that the substitution is primitive is equivalent to the fact that this non-negative matrix is primitive [LM]. Let v be the vector in R^N whose ith entry is the volume of p_i. It is easy to calculate that v is a (right) eigenvector of B with eigenvalue λ^d. Since this eigenvector is clearly positive, this is the Perron eigenvector for B and λ^d is the Perron eigenvalue. (See Sect. 4.2 of [LM].) Since B is primitive, we may apply 4.5.12 of [LM] to conclude that, for every pair i, j, we may find a positive constant a_{i,j} such that

lim_{n→∞} |(B^n)_{i,j} − a_{i,j} λ^{dn}| = 0.

But #Punc(p, n) is simply the number of different tiles in ω^n(p), which is the sum over all i of (B^n)_{i,j}, where p_j = p. The result follows easily from this.
Lemma 2.3. Let p1, . . . , pN, ω be a primitive substitution tiling system in R^d such that, for each prototile p, the boundary of p, ∂p, has box-counting dimension strictly less than d. Then, for any R > 0 and prototile p, we have

lim_{n→∞} #{x ∈ Punc(p, n) | ∂(x) ≤ R} / #Punc(p, n) = 0.

Proof. We let δ be the maximum box-counting dimension of the boundaries of the prototiles. So our hypothesis is that δ < d. This means that there is a constant K and a function m(ε) ≤ Kε^{−δ} such that, for any prototile p, we may cover its boundary with m(ε) balls of radius ε, for any ε > 0. Fix a prototile p and a positive integer n, let ε = Rλ^{−n}. Choose an open cover of ∂p with ε-balls as above and denote their centres by x_i, i = 1, . . . , m(ε). Now if x is in Punc(p, n) and ∂(x) ≤ R, then for some y in ∂(λ^n p), we have |x − y| ≤ R. Then we have |λ^{−n}x − λ^{−n}y| ≤ Rλ^{−n} = ε, λ^{−n}y is in ∂p and hence |λ^{−n}x − x_i| < 2ε, for some i. So each point x of Punc(p, n) within R of the boundary of λ^n p is contained in some λ^n B(x_i, 2ε). Notice that λ^n B(x_i, 2ε) = B(λ^n x_i, λ^n 2ε) = B(λ^n x_i, 2R). We next want an upper bound on the number of such x, for a fixed i. Let k_i = #(Punc(p, n) ∩ B(λ^n x_i, 2R)). We will use the fact that B(x(t), b) ⊂ t, for any tile t. This means that the balls B(x, b), for x in Punc(p, n), are pairwise disjoint. And if x is also in B(λ^n x_i, 2R), then B(x, b) is contained in B(λ^n x_i, 2R + b). This means that Vol(B(λ^n x_i, 2R + b)) ≥ Σ_x Vol(B(x, b)),
where the sum is taken over all x in Punc(p, n) ∩ B(λ^n x_i, 2R). There is a positive constant, V_d, so that for all positive r, Vol(B(x, r)) = V_d r^d. So we have V_d (2R + b)^d ≥ k_i V_d b^d, which in turn gives us

k_i ≤ (1 + 2R/b)^d.    (1)

Now if we sum over i, we obtain

#{x ∈ Punc(p, n) | ∂(x) ≤ R} ≤ Σ_{i=1}^{m(ε)} k_i
    ≤ Σ_{i=1}^{m(ε)} (1 + 2R/b)^d
    ≤ m(ε)(1 + 2R/b)^d
    ≤ Kε^{−δ}(1 + 2R/b)^d
    = K(Rλ^{−n})^{−δ}(1 + 2R/b)^d
    = K(1 + 2R/b)^d R^{−δ} λ^{nδ}
    = K′λ^{nδ},

where K′ = K(1 + 2R/b)^d R^{−δ} is independent of n. Now we combine this estimate with Lemma 2.2 to obtain

lim_{n→∞} #{x ∈ Punc(p, n) | ∂(x) ≤ R} / #Punc(p, n) ≤ lim_{n→∞} K′λ^{nδ} / (a_p λ^{dn})
    = lim_{n→∞} (K′/a_p) λ^{n(δ−d)}
    = 0,

since δ < d.
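Remark (illustration, not part of the original text). The decay in Lemma 2.3 can be observed numerically in dimension d = 1, where the boundary of each prototile consists of two points and so has box-counting dimension 0. The sketch below again assumes the Fibonacci substitution a → ab, b → a with tile lengths φ and 1, lays out ω^n(a) on an interval [0, L], and places each puncture at the midpoint of its tile; the choice of example, the midpoint convention and the function names are all assumptions made for this illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
LENGTH = {"a": PHI, "b": 1.0}      # assumed tile lengths
RULE = {"a": "ab", "b": "a"}       # assumed substitution rule

def iterate(tile, n):
    """Sequence of tile types in omega^n(tile), read from left to right."""
    word = tile
    for _ in range(n):
        word = "".join(RULE[c] for c in word)
    return word

def punctures(tile, n):
    """Punctures (tile midpoints) of omega^n(tile) on [0, L], together with L."""
    xs, left = [], 0.0
    for c in iterate(tile, n):
        xs.append(left + LENGTH[c] / 2)   # puncture at the midpoint of the tile
        left += LENGTH[c]
    return xs, left

def boundary_fraction(tile, n, R):
    """Fraction of punctures x with distance to the boundary at most R."""
    xs, L = punctures(tile, n)
    near = sum(1 for x in xs if min(x, L - x) <= R)
    return near / len(xs)

if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        print(n, boundary_fraction("a", n, R=5.0))
```

In one dimension the number of punctures within distance R of the two endpoints stays bounded while the total number of punctures grows like λ^n, so the printed fractions decrease towards zero as n increases.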
Definition 2.4. For any tiling T in r(R_punc − R_AF), there is a vector x in R^d such that T − x is in Ω_punc, but (T, T − x) is not in R_AF. For such T we define ρ(T) = inf{|x| | (T, T − x) ∈ R_punc − R_AF}.
Lemma 2.5. Let p be a prototile and n be a positive integer. Suppose that x is in Punc(p, n) and that T is in r(R_punc − R_AF) ∩ W(p, n, x). Then we have ρ(T) ≥ ∂(x).
Proof. The hypothesis means that we can write T = ω^n(T′) − x, where T′ contains the tile p. Suppose that y is any vector with |y| < ∂(x). We claim that if T − y is in Ω_punc, then (T, T − y) is in R_AF. From this it follows that if (T, T − y) is to be in R_punc − R_AF, we must have |y| ≥ ∂(x) and the conclusion follows from the definition of ρ. As for the claim, we begin by noting that if |y| < ∂(x), then x + y is in the interior of λ^n p. If, in addition, T − y is in Ω_punc, then T − y = ω^n(T′) − x − y = ω^n(T′) − (x + y) and so x + y is in Punc(p, n). The graph of the map sending ω^n(T′) − x to ω^n(T′) − (x + y) is contained in R_AF. In particular, the pair (T, T − y) is in R_AF. This completes the proof of the claim.
Recall that we are trying to prove that µ(r(R_punc − R_AF)) = 0. It is easy to check that for fixed R > 0, the set {(T, T − x) | |x| ≤ R} ∩ (R_punc − R_AF) is compact in R_punc. It follows that for any R > 0, r(R_punc − R_AF) ∩ ρ^{−1}[0, R] is compact in Ω_punc. To prove our result, it suffices to show that the µ-measure of this set is zero, for any R. We now fix R_0 > 0 and, for convenience, we denote {T ∈ r(R_punc − R_AF) | ρ(T) ≤ R_0} by Y_0. We will construct a sequence of positive constants, R_0 < R_1 < R_2 < . . . , and a sequence of locally defined maps γ_1, γ_2, . . . with the following properties. Each γ_m is a local homeomorphism whose graph is a clopen set in R_AF and whose domain contains Y_0. Moreover, for all T in Y_0, we will have R_{m−1} < ρ(γ_m(T)) ≤ R_m − 1. We may conclude from this last inequality that the sets γ_m(Y_0) are pairwise disjoint. Moreover, the maps γ_m all preserve the measure µ. So each of these sets has the same measure as Y_0 and since the measure is finite, we conclude that µ(Y_0) = 0, as desired. We begin by setting R_0 = sup{ρ(T) | T ∈ Y_0}. Assume that, for some m ≥ 1, we have R_{m−1} defined with the property that ρ(T) ≤ R_{m−1}, for all T in Y_0. We define γ_m as follows. We apply Lemma 2.3 using the value R = R_{m−1} + 1. We may find an n sufficiently large so that the ratio in the limit is less than 1/2, for all prototiles p. This means that, for this value of n,

#{x ∈ Punc(p, n) | ∂(x) ≤ R_{m−1}} ≤ #{x ∈ Punc(p, n) | ∂(x) ≥ R_{m−1} + 1},

for each prototile p. Now for each prototile, p, we may define an injection, η, from the first set above to the second. (Of course, there is a different η for each p, but we will suppress this in our notation.) The domain of the map γ_m will be the union of all sets W(p, n, x), where p is any prototile and x is in Punc(p, n) with ∂(x) ≤ R_{m−1}. For T in W(p, n, x), we define γ_m(T) = T + x − η(x). It is easy to check that γ_m is a homeomorphism on its domain.
Also, for every T in Y_0, we know that T is in some set W(p, n, x). It follows from Lemma 2.5 that R_{m−1} ≥ ρ(T) ≥ ∂(x) and so W(p, n, x), and hence T, is in the domain of γ_m. It is also clear from the definition and Lemma 2.5 that ρ(γ_m(T)) ≥ ∂(η(x)) ≥ R_{m−1} + 1. Therefore γ_m has all the required properties. To complete the induction, we choose R_m to be R_m = sup ρ(γ_m(Y_0)) + 1. This completes the proof of Theorem 2.1.
Ordered K-Theory of C ∗ -Algebras with Substitution Tilings
3. Proof of the Main Result

We begin a proof of the main result, Theorem 1.1. The key ingredient is the following.

Lemma 3.1. Let $p$ be a non-zero projection in $A_T$ and suppose that $0 < \epsilon < \tau(p)$. Then there is a projection $q$ in $AF_T$ satisfying $[q] \le [p]$ in $K_0(A_T)$ and $|\tau(p) - \tau(q)| < \epsilon$.

The proof will take some time and involve several lemmas. Begin by choosing $0 < \delta \le \epsilon/20$ such that $\delta < 1/400$. We use the facts that $C_c(R_{punc}) \subset A_T$ is dense and that $R_{punc}$ is totally disconnected to find a function $f$ in $C_c(R_{punc})$ which is locally constant (i.e. $f$ has finite range) and such that
\[
  \|p - f\| < \delta. \tag{2}
\]
By replacing $f$ by $f^* f$ if necessary, we may assume that $f$ is positive in $A_T$. We may also assume that $\|f\| \le 1$. It follows from Eq. (2) that
\[
  \|f^2 - f\| < 3\delta. \tag{3}
\]
Note that when we write $f^2$, we mean the product in $A_T$, which is the convolution product on $R_{punc}$, not the pointwise product. Let $K = r(\mathrm{supp}(f) \cap (R_{punc} - R_{AF}))$, which is a compact subset of $\Omega_{punc}$ and has $\mu(K) = 0$, by Theorem 2.1. We may choose a clopen set $F \supset K$ such that
\[
  \int_F f^2(x, x)\, d\mu(x) < \delta. \tag{4}
\]
We define a function $e$ on $R_{punc}$ by
\[
  e(T, T') =
  \begin{cases}
    1 & \text{if } T = T' \notin F, \\
    0 & \text{otherwise.}
  \end{cases}
\]
Notice that $e$ is a projection in $AF_T$.

Lemma 3.2. The element $ef$ (product in $A_T$) is a locally constant function on $R_{punc}$ and $ef$ is in $C_c(R_{AF})$. Finally, we have
\[
  |\tau(f) - \tau(f^2 e f)| < 7\delta. \tag{5}
\]
I. F. Putnam
Proof. The first statement is obvious since both $e$ and $f$ have the same property. As for the second, we only need to see that $ef$ is zero on $R_{punc} - R_{AF}$. We have $ef(T, T') = e(T, T) f(T, T')$ for any $(T, T')$ in $R_{punc}$. If $(T, T')$ is in $R_{punc} - R_{AF}$ and $f$ is not zero at this point, then $T = r(T, T')$ is in $K$ and so $e(T, T) = 0$. For the last inequality, we have
\begin{align*}
  |\tau(f) - \tau(f^2 e f)|
    &\le |\tau(f) - \tau(f^2)| + |\tau(f^2) - \tau(f e f)| + |\tau((f - f^2) e f)| \\
    &\le \|f - f^2\| + |\tau(f^2) - \tau(e f^2)| + \|f - f^2\| \\
    &\le 6\delta + \tau((1 - e) f^2) \\
    &= 6\delta + \int_F f^2(x, x)\, d\mu(x) \\
    &< 7\delta,
\end{align*}
by Eq. (4).
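The last equality in the estimate above can be unpacked as follows. This is a sketch under the assumption, suggested by Eq. (4) but not restated in this section, that the trace is given by integration over the diagonal, $\tau(g) = \int g(x, x)\, d\mu(x)$; the symbol $\chi_F$ is introduced only for this note and denotes the indicator function of $F$.

```latex
% e is supported on the diagonal, so ((1-e)g)(x,x) = (1 - e(x,x)) g(x,x),
% and 1 - e(x,x) equals 1 exactly when x lies in F.
\[
  \tau\bigl((1 - e) f^{2}\bigr)
  = \int \bigl(1 - e(x, x)\bigr)\, f^{2}(x, x)\, d\mu(x)
  = \int \chi_F(x)\, f^{2}(x, x)\, d\mu(x)
  = \int_F f^{2}(x, x)\, d\mu(x).
\]
```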
We now know that the element $f e f = (ef)^*(ef)$ is self-adjoint and lies in $AF_T$. Since it is a locally constant function on $R_{AF}$, it will actually lie in one of the canonical approximating finite-dimensional $C^*$-algebras, denoted by $A_N$ in [KP]. This means that its spectrum is finite and we may write
\[
  f e f = \sum_{i=1}^{m} \lambda_i e_i, \tag{6}
\]
where the $\lambda_i$ are positive constants less than or equal to 1 and the $e_i$ are projections in $AF_T$ satisfying
\[
  \sum_{i=1}^{m} e_i = 1, \qquad e_i e_j = 0 \quad \text{for } i \ne j.
\]
By re-arranging the order of the terms, we may assume that $\lambda_i \le 1/2$ for $i = 1, \dots, k$ and $\lambda_i \ge 1/2$ for $i = k + 1, \dots, m$, for some fixed $k$. We now define
\[
  q = \sum_{i=k+1}^{m} e_i.
\]
Notice immediately that q is a self-adjoint projection and lies in AFT .
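The claim that $q$ is a self-adjoint projection follows directly from the orthogonality relations for the $e_i$; a brief sketch, using nothing beyond $e_i e_j = 0$ for $i \ne j$ and $e_i^* = e_i = e_i^2$:

```latex
\[
  q^{2} = \sum_{i, j > k} e_i e_j = \sum_{i > k} e_i^{2} = \sum_{i > k} e_i = q,
  \qquad
  q^{*} = \sum_{i > k} e_i^{*} = q .
\]
```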
Lemma 3.3.
1. $\|pq - q\| < 4\delta$.
2. $\|pqp - q\| < 8\delta$.
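Proof. We use the definition of $q$ and Eq. (6):
\begin{align*}
  pq - q &= (p - 1) \sum_{i>k} e_i \\
         &= (p - 1) \Bigl(\sum_{j=1}^{m} \lambda_j e_j\Bigr) \Bigl(\sum_{i>k} \lambda_i^{-1} e_i\Bigr) \\
         &= (p - 1)\, f e f \sum_{i>k} \lambda_i^{-1} e_i,
\end{align*}
so that
\begin{align*}
  \|pq - q\| &\le \|(p - 1) f e f\| \, \Bigl\|\sum_{i>k} \lambda_i^{-1} e_i\Bigr\| \\
             &\le \bigl(\|p(f - p)\| + \|p - f\|\bigr) \sup_{i>k} \{\lambda_i^{-1}\} \\
             &< (2\delta) \cdot 2 = 4\delta.
\end{align*}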
The second inequality follows at once from the first. We omit the details.
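The omitted step can be sketched as follows, assuming only that $p$ and $q$ are self-adjoint projections (so that $\|p\| \le 1$, $\|q\| \le 1$ and $(pq - q)^* = qp - q$). The second display shows how the same bound controls $(pqp)^2 - pqp$; it uses $q^2 = q$.

```latex
\begin{align*}
  \|pqp - q\| &\le \|pqp - qp\| + \|qp - q\| \\
              &= \|(pq - q)\,p\| + \|(pq - q)^{*}\| \\
              &\le 2\,\|pq - q\| < 8\delta, \\[1ex]
  (pqp)^{2} - pqp &= pqp\,(pqp - q) + (pqp - q)\,q + (q - pqp), \\
  \|(pqp)^{2} - pqp\| &\le 3\,\|pqp - q\| < 24\delta .
\end{align*}
```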
Since $\delta < 1/400$, we obtain $\|(pqp)^2 - pqp\| < 24\delta < 1/16$, so the spectrum of $pqp$ is contained in $[-1/8, 1/8] \cup [7/8, 9/8]$ and we may apply functional calculus to obtain $q' = \chi_{(1/2, \infty)}(pqp)$. Then $q'$ is a self-adjoint projection in $A_T$ within $1/8$ of $pqp$ and hence within distance $1/2$ of $q$. Therefore $[q'] = [q]$, by 4.3.2 of [Bl] or 5.2.6 of [W-O]. Also, the element $q'$ can be obtained as a limit of polynomial functions with zero constant term applied to $pqp$. From this we see that $pq' = q' = q'p$, or $q' \le p$. We have all the properties we desired from $q$ and $q'$, except the estimate on the trace of $q$.

Lemma 3.4. $|\tau(p) - \tau(q)| < \epsilon$.

Proof. First, we want to estimate $|\tau(f) - \tau(fq)|$. Recall that the sum of the $e_i$'s was the identity, so that we have
\[
  |\tau(f) - \tau(fq)|
  = \Bigl|\sum_{i=1}^{m} \tau(f e_i) - \sum_{i>k} \tau(f e_i)\Bigr|
  = \Bigl|\sum_{i=1}^{k} \tau(f e_i)\Bigr|.
\]
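Now we use the fact that, for $i \le k$, we have $\lambda_i \le 1/2$, so that $1 \le 2(1 - \lambda_i)$. So we may continue
\begin{align*}
  |\tau(f) - \tau(fq)|
    &\le \Bigl|\sum_{i=1}^{k} 2(1 - \lambda_i)\, \tau(f e_i)\Bigr| \\
    &\le \Bigl|\sum_{i=1}^{m} 2(1 - \lambda_i)\, \tau(f e_i)\Bigr| \\
    &= 2\, \Bigl|\sum_{i=1}^{m} \tau(f e_i) - \tau(f \lambda_i e_i)\Bigr| \\
    &= 2\, |\tau(f) - \tau(f^2 e f)| < 14\delta
\end{align*}
by Lemma 3.2. Now we are ready to compute
\begin{align*}
  |\tau(p) - \tau(q)|
    &\le |\tau(p) - \tau(f)| + |\tau(f) - \tau(fq)| + |\tau(fq) - \tau(pq)| + |\tau(pq) - \tau(q)| \\
    &< \delta + 14\delta + \|f - p\| + \|pq - q\| \\
    &< 20\delta \le \epsilon,
\end{align*}
using Lemma 3.3 for the last term.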
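Two identities are used silently in the chain above. A short sketch, relying only on $\sum_i e_i = 1$, Eq. (6) and linearity of $\tau$:

```latex
\[
  \sum_{i=1}^{m} \tau(f e_i) = \tau\Bigl(f \sum_{i=1}^{m} e_i\Bigr) = \tau(f),
  \qquad
  \sum_{i=1}^{m} \tau(f \lambda_i e_i)
  = \tau\Bigl(f \sum_{i=1}^{m} \lambda_i e_i\Bigr)
  = \tau(f \cdot f e f) = \tau(f^{2} e f).
\]
% Final count in the last estimate: \delta + 14\delta + \delta + 4\delta = 20\delta.
```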
We have now completed the proof of Lemma 3.1 and we are ready to give a proof of Theorem 1.1.

First, we consider the "only if" direction. If $a$ is any positive element in $K_0(A_T)$, then by definition $a = [p]$ for some projection $p$ in some $M_n(A_T)$. By applying II.4.2 of [Ren], we may view $p$ as a matrix of functions on $R_{punc}$. Since $p = p^* p$, the diagonal elements of the matrix $p$ are non-negative on the diagonal in $R_{punc}$. This means that $\tau(p)$ is non-negative. Moreover, if it is zero, then, since $\mu$ has full support, each diagonal entry of $p$ is zero. This in turn implies that $p = 0$.

Now we turn to the "if" direction of the proof. That is, suppose that $a = [p] - [q]$ is in $K_0(A_T)$ and has $\hat{\tau}(a) = \tau(p) - \tau(q) > 0$. We will show that $[p] \ge [q]$ in $K_0(A_T)$. We will first consider the case that the projections actually lie in the algebra $A_T$, rather than in matrices over $A_T$. Begin with two projections $p_1$ and $p_2$ in $A_T$ and suppose that $\tau(p_1) > \tau(p_2)$. Let $\epsilon = (\tau(p_1) - \tau(p_2))/3$. We apply Lemma 3.1 to the projection $p_1$ and $\epsilon > 0$ to obtain $q_1$ in $AF_T$ with $[p_1] \ge [q_1]$ and $|\tau(p_1) - \tau(q_1)| < \epsilon$. We apply the same result to the projection $1 - p_2$ and the same $\epsilon$ to obtain a projection in $AF_T$. We let $q_2$ be its orthogonal complement. So we have $[q_2] \ge [p_2]$ and $|\tau(p_2) - \tau(q_2)| < \epsilon$. Then by a simple application of the triangle inequality, we have $\tau(q_1) - \tau(q_2) \ge \epsilon > 0$. The $C^*$-algebra $AF_T$ has a unique trace and it is a simple AF-algebra. For simple AF-algebras, the order on their K-zero groups is completely determined by the traces [EHS]. We know then that $[q_1] \ge [q_2]$ and hence $[p_1] \ge [p_2]$ in $K_0(A_T)$, as desired.

In the case that the projections lie in $M_n(A_T)$, we can use the same argument by replacing the groupoids $R_{punc}$ and $R_{AF}$ by their products with the trivial groupoid $\{1, \dots, n\} \times \{1, \dots, n\}$. All of the essential features of the groupoids remain and the
effect at the level of $C^*$-algebras is to tensor on $C^*(\{1, \dots, n\} \times \{1, \dots, n\}) \cong M_n$, the $C^*$-algebra of $n \times n$ matrices. We omit the details.

Acknowledgement. I would like to thank Johannes Kellendonk, Chris Bose, Florin Diacu and Rua Murray for helpful conversations.
References

[AP] Anderson, J.E. and Putnam, I.F.: Topological invariants for substitution tilings and their associated C*-algebras. Ergodic Theory and Dynamical Systems 18, 509–537 (1998)
[Be1] Bellissard, J.: K-theory of C*-algebras in solid state physics. In: Statistical Mechanics and Field Theory: Mathematical Aspects, ed. T.C. Dorlas, N.M. Hugenholtz and M. Winnik, Lecture Notes in Physics 257, Berlin–Heidelberg–New York: Springer-Verlag, 1986, pp. 99–156
[Be2] Bellissard, J.: Gap labelling theorems for Schrödinger's Operators. In: From Number Theory to Physics, ed. M. Waldschmidt, P. Moussa, J.M. Luck and C. Itzykson, Berlin–Heidelberg–New York: Springer-Verlag, 1992, pp. 538–630
[BCL] Bellissard, J., Contensou, E., Legrand, A.: K-théorie des quasicristaux, image par la trace, le cas du réseau octogonal. C. R. Acad. Sci. Paris t. 326, Série I, 197–200 (1998)
[Bl] Blackadar, B.: K-theory for Operator Algebras. MSRI Publications 5, Berlin–Heidelberg–New York: Springer-Verlag, 1986
[Co2] Connes, A.: Non-commutative Geometry. San Diego: Academic Press, 1994
[Da] Davidson, K.R.: C*-algebras by example. Providence, RI: Am. Math. Soc., 1996
[Ef] Effros, E.G.: Dimensions and C*-algebras. CBMS Regional Conf. Ser. no. 46, Providence, RI: Am. Math. Soc., 1981
[EHS] Effros, E.G., Handelman, D.E. and Shen, C.-L.: Dimension groups and their affine representations. Am. J. Math. 102, 385–407 (1980)
[vE] van Elst, A.: Gap-labelling theorems for Schrödinger operators on the square and cubic lattice. Rev. Math. Phys. 6, 319–342 (1994)
[Fi] Fillmore, P.A.: A user's guide to operator algebras. New York: Wiley, 1996
[GS] Grünbaum, B. and Shephard, G.C.: Tilings and Patterns. New York: Freeman and Co., 1987
[Ha] Handelman, D.: Positive matrices and dimension groups associated with topological Markov chains. J. Operator Th. 6, 55–74 (1981)
[Kel1] Kellendonk, J.: Non-commutative geometry of tilings and gap labelling. Rev. Math. Phys. 7, 1133–1180 (1995)
[Kel2] Kellendonk, J.: The local structure of tilings and their integer group of coinvariants. Commun. Math. Phys. 187, 115–157 (1997)
[KP] Kellendonk, J. and Putnam, I.F.: Tilings, C*-algebras and K-theory. Preprint
[LM] Lind, D. and Marcus, B.: An introduction to symbolic dynamics and coding. Cambridge: Cambridge University Press, 1995
[Pe] Pedersen, G.K.: C*-algebras and their automorphism groups. London: Academic Press, 1979
[Put1] Putnam, I.F.: C*-algebras from Smale spaces. Canad. J. Math. 48, 175–195 (1996)
[Put2] Putnam, I.F.: Hyperbolic dynamical systems and generalized Cuntz-Krieger algebras. Lecture Notes from the summer school in operator algebras, Odense, 1996
[PS] Putnam, I.F. and Spielberg, J.: The structure of C*-algebras associated with hyperbolic dynamical systems. J. Func. Anal. 163, 279–299 (1999)
[RW] Radin, C. and Wolff, M.: Space tilings and local isomorphism. Geom. Ded. 42, 355–360 (1992)
[Ren] Renault, J.N.: A groupoid approach to C*-algebras. Lecture Notes in Math. 793, Berlin–Heidelberg–New York: Springer-Verlag, 1980
[Rud] Rudolph, D.J.: Markov tilings of R^d and representations of R^d actions. Contemp. Math. 94, 271–289 (1989)
[So1] Solomyak, B.: Dynamics of self-similar tilings. Ergodic Theory and Dynamical Systems 17, 695–738 (1997)
[So2] Solomyak, B.: Non-periodicity implies unique composition for self-similar translationally-finite tilings. Disc. Comp. Geom. 20, 265–279 (1998)
[W-O] Wegge-Olsen, N.E.: K-theory and C*-algebras. Oxford: Oxford University Press, 1993
Communicated by A. Connes
Commun. Math. Phys. 214, 607 – 649 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Stratification of the Generalized Gauge Orbit Space
Christian Fleischhack 1,2
1 Mathematisches Institut and Institut für Theoretische Physik, Universität Leipzig, Augustusplatz 10/11, 04109 Leipzig, Germany. E-mail: [email protected]
2 Max-Planck-Institut für Mathematik in den Naturwissenschaften, Inselstraße 22–26, Leipzig, Germany. E-mail: [email protected]
Received: 12 January 2000 / Accepted: 8 May 2000
Abstract: Different versions for defining Ashtekar's generalized connections are investigated depending on the chosen smoothness category for the paths and graphs, which form the label set for the projective limit. Our definition covers the analytic case as well as the case of webs. Then the action of Ashtekar's generalized gauge group $\overline{\mathcal{G}}$ on the space $\overline{\mathcal{A}}$ of generalized connections is investigated for compact structure groups $\mathbf{G}$. Here, first, the orbit types of the generalized connections are determined. The stabilizer of a connection is homeomorphic to the holonomy centralizer, i.e. the centralizer of its holonomy group. It is proven that the gauge orbit type of a connection can be defined by the $\mathbf{G}$-conjugacy class of its holonomy centralizer equivalently to the standard definition via $\overline{\mathcal{G}}$-stabilizers. The connections of one and the same gauge orbit type form a so-called stratum. As the main result of this article a slice theorem is proven on $\overline{\mathcal{A}}$. This yields the openness of the strata. Afterwards, a denseness theorem is proven for the strata. Hence, $\overline{\mathcal{A}}$ is topologically regularly stratified by $\overline{\mathcal{G}}$. These results coincide with those of Kondracki and Rogulski for Sobolev connections. Furthermore, the set of all gauge orbit types equals the set of all (conjugacy classes of) Howe subgroups of $\mathbf{G}$. Finally, it is shown that the set of all gauge orbits with maximal type has the full induced Haar measure 1.
1. Introduction For quite a long time the geometric structure of gauge theories has been investigated. A classical (pure) gauge theory consists of three basic objects: First the set A of smooth connections (“gauge fields”) in a principal fiber bundle, then the set G of all smooth gauge transforms, i.e. vertical automorphisms of this bundle, and finally the action of G on A. Physically, two gauge fields that are related by a gauge transform describe one and the same situation. Thus, the space of all gauge orbits, i.e. elements in A/G, is the configuration space for the gauge theory. Unfortunately, in contrast to A, which is
an affine space, the space A/G has a very complicated structure: It is non-affine, noncompact and infinite-dimensional and it is not a manifold. Additionally, another typical “disadvantage” concerning A/G is the so-called Gribov problem: usually there do not exist global gauge fixings in A/G, i.e. smooth sections in A −→ A/G. Moreover, even in Airr −→ Airr /G (Airr ⊆ A being the set of all connections whose holonomy group is an irreducible subgroup of the structure group) there are often no smooth sections as proven by Singer [22]. All that causes enormous problems, in particular, when one wants to quantize a gauge theory. One possible quantization method is the path integral quantization. Here one has to find an appropriate measure on the configuration space of the classical theory, hence a measure on A/G. As just indicated, this is very hard to find. Thus, one has hoped for a better understanding of the structure of A/G. However, up to now, results are quite rare. But, should one restrict oneself to the case of smooth connections? Since in a quantization process smoothness is usually lost anyway, it is quite clear that one has to admit also non-smooth connections. This way, about 20 years ago, the efforts were focussed on a related problem: The consideration of connections and gauge transforms that are contained in a certain Sobolev class. (For basic results we refer, e.g., to [19].) Now, G is a Hilbert–Lie group and acts smoothly on A. About 15 years ago, Kondracki and Rogulski [14] found a lot of fundamental properties of this action. Perhaps, the most remarkable theorem they obtained was a slice theorem on A. This means, for every orbit A ◦ G ⊆ A there is an equivariant retraction from a (so-called tubular) neighborhood of A onto A ◦ G. Using this theorem they could clarify the structure of the so-called strata. A stratum contains all connections that have the same, fixed type, i.e. the same (conjugacy class of the) stabilizer under the action of G. 
Using a denseness theorem for the strata, Kondracki and Rogulski proved that the space $\mathcal{A}$ is regularly stratified by the action of $\mathcal{G}$. In particular, all the strata are smooth submanifolds of $\mathcal{A}$. Despite these results, the mathematically rigorous construction of a measure on $\mathcal{A}/\mathcal{G}$ has not been achieved. This problem was solved, at least preliminarily, by Ashtekar et al. [1, 2], although not for $\mathcal{A}/\mathcal{G}$ itself. Their idea was simply to drop all smoothness conditions for the connections and gauge transforms. In detail, they first used the fact that a (smooth) connection can always be reconstructed uniquely from its parallel transports. On the other hand, these parallel transports can be identified with an assignment of elements of the structure group $\mathbf{G}$ to the paths in the base manifold $M$ such that the concatenation of paths corresponds to the product of these group elements. It is intuitively clear that for smooth connections the parallel transports additionally depend smoothly on the paths [16]. But now this restriction is removed for the generalized connections. They are purely algebraic homomorphisms from the groupoid $\mathcal{P}$ of paths to the structure group $\mathbf{G}$. Analogously, the set $\overline{\mathcal{G}}$ of generalized gauge transforms collects all functions from $M$ to $\mathbf{G}$. Now the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$ is defined purely algebraically as well. Giving $\overline{\mathcal{A}}$ and $\overline{\mathcal{G}}$ the topologies induced by the topology of $\mathbf{G}$, one sees that, for compact $\mathbf{G}$, these spaces are again compact. This provides us with the huge apparatus of measure theory on compact spaces. A particularly nice theorem [2] guarantees the existence of a natural induced Haar measure on $\overline{\mathcal{A}}$ and $\overline{\mathcal{A}}/\overline{\mathcal{G}}$, the new configuration space for the path integral quantization.

Both from the mathematical and from the physical point of view it is very interesting how the "classical" regular gauge theories are related to the generalized formulation in the Ashtekar framework. First of all, it has been proven that $\mathcal{A}$ and $\mathcal{G}$ are dense subsets in $\overline{\mathcal{A}}$ and $\overline{\mathcal{G}}$, respectively [20].
Furthermore, A is contained in a set of induced Haar measure zero [18]. These properties coincide exactly with the experiences known from the Wiener
or Feynman path integral. Then the Wilson loop expectation values have been determined for the two-dimensional pure Yang–Mills theory [5, 11], in coincidence with the known results in the standard framework.

Now, we are going to investigate the action of the generalized gauge transforms on the space of generalized connections in comparison with its counterpart in the Sobolev case discussed in detail by Kondracki and Rogulski [14]. Our main goals are the determination of the gauge orbit types and the proof of a slice theorem and a denseness theorem in the Ashtekar framework. However, our methods are completely different from those in [14]: We use topology and algebra instead of differential geometry and analysis.

The outline of the present paper is as follows:

• In the first part we will give a quite detailed introduction to the algebraic and topological definitions and properties of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}}/\overline{\mathcal{G}}$. Here we closely follow Ashtekar and Lewandowski [4, 3] as well as Marolf and Mourão [18]. The most important difference to their definitions is that we do not restrict the paths to be (piecewise) analytic or smooth. For our purpose it is sufficient to fix a category of smoothness from the beginning. This is $C^r$, where $r$ can be any positive natural number, $\infty$ (smooth case) or $\omega$ (analytical case). We can also consider the corresponding cases $C^{r,+}$ of paths that are additionally (piecewise) immersions. We will show that in a certain sense the case $(\omega, +)$ corresponds to the loop structures introduced by Ashtekar and Lewandowski [2] and the case $(\infty, +)$ corresponds to the webs introduced by Baez and Sawin [6].

• For the following parts we still need a more detailed analysis of the properties of the space $\overline{\mathcal{A}}$ itself. However, these considerations are somewhat separate from the main goal of this paper, the investigation of the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$, so that we moved them to a second article [10]. There we will give a construction method for new connections that is crucial for most of the statements of that article as well as some of the present paper. Then, as a main result, we will show that an induced Haar measure $d\mu_0$ can be defined for arbitrary smoothness conditions. For this, we introduce the notion of a hyph that generalizes the notion of a web and a graph. We show that the paths of a hyph are holonomically independent and that the set of all hyphs is directed. These two properties yield the well-definedness of $d\mu_0$.

• The second part of the present paper is devoted to the type of the gauge orbit. Here and in the following, $\mathbf{G}$ is compact. In the general theory of transformation groups the type of an orbit (or, more precisely, an element of an orbit) is defined by the conjugacy class of its stabilizer (see, e.g., [8]). Here, we will derive the explicit form of the stabilizer for every generalized connection. As we will see, the stabilizer of a connection is homeomorphic to the centralizer of its holonomy group, hence a finite-dimensional Lie group. Since stabilizers are conjugate in $\overline{\mathcal{G}}$ if and only if these centralizers are conjugate in $\mathbf{G}$, the type of an orbit is uniquely determined by a certain equivalence class of a Howe subgroup of the structure group $\mathbf{G}$. (A Howe subgroup of $\mathbf{G}$ is a subgroup that can be written as the centralizer of some subset $V \subseteq \mathbf{G}$.) This is closely related to the observations of Kondracki and Sadowski [15].

• In the third part we will see how the results of Kondracki and Rogulski can be extended from the Sobolev framework to the generalized case. First of all we prove a very crucial lemma: Every centralizer in a compact Lie group is finitely generated. This implies that every orbit type (being the centralizer of the holonomy group) is determined by a finite set of holonomies of the corresponding connection. Using the projection onto these holonomies we can lift the slice theorem from an appropriate finite-dimensional $\mathbf{G}^n$ to the space $\overline{\mathcal{A}}$. A slice theorem means that for every connection $\overline{A} \in \overline{\mathcal{A}}$ there
is an open and $\overline{\mathcal{G}}$-invariant neighbourhood that can be retracted equivariantly to the orbit $\overline{A} \circ \overline{\mathcal{G}}$. It implies the (relative) openness of the strata, the sets of connections of one and the same type. Afterwards, we prove a denseness theorem for the strata. For this we need the construction method for new connections from [10] mentioned above. Altogether, we prove that the slice and the denseness theorem yield again a topologically regular stratification of $\overline{\mathcal{A}}$ as well as of $\overline{\mathcal{A}}/\overline{\mathcal{G}}$. However, in contrast to the Sobolev case, the strata are not proved to be manifolds. But two results for generalized connections go beyond those for Sobolev ones. First, as a corollary of the denseness theorem we obtain that the set of all gauge orbit types equals the set of all conjugacy classes of Howe subgroups of $\mathbf{G}$. This way we can explicitly derive the set of all gauge orbit types. This was not known until now for the Sobolev case. However, recently, Rudolph, Schmidt and Volobuev [21] solved this problem for all $SU(n)$-bundles over two-, three- and four-dimensional manifolds. Second, we prove that the generic stratum, i.e. the set of all connections whose holonomy centralizers equal the center of $\mathbf{G}$, is not only dense in $\overline{\mathcal{A}}$, but has also the total induced Haar measure 1. This shows finally that the Faddeev–Popov determinant for the projection $\overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}/\overline{\mathcal{G}}$ is equal to 1.

In the following, $M$ is always a connected and at least two-dimensional $C^r$-manifold with $r \in \mathbb{N}_+ \cup \{\infty\} \cup \{\omega\}$ being arbitrary, but fixed. Furthermore, $m$ is a likewise arbitrary, but fixed point in $M$, and $\mathbf{G}$ is a Lie group that is compact in Sects. 3ff.

2. Reformulation of Ashtekar's Gauge Theory

2.1. Paths. In the classical approach a connection can be described by the corresponding parallel transports along paths in the base manifold. But not every assignment of group elements to the paths yields a connection.
On the one hand, this map has to be a homomorphism, i.e., products of paths have to lead to products of the parallel transports, and on the other hand, it has to depend in a certain sense continuously on the paths. Moreover, additional topological obstructions may occur. In the Ashtekar approach, however, the second (and the third) condition is dropped. A connection is now simply a homomorphism from the set of paths to the structure group $\mathbf{G}$.

Up to now, it is not clear whether there is an "optimal" definition for the structure of the groupoid $\mathcal{P}$ of paths. The first version was given by Ashtekar and Lewandowski [2]. They used piecewise analytical paths. The advantage of this approach was that any finite set of paths forms a finite graph. Hence for two finite graphs there is always a third graph containing both of them, i.e. the set of all graphs forms a directed set. Using this it is easy to prove independence theorems for loops and then to define a natural measure on $\overline{\mathcal{A}}$ and $\overline{\mathcal{A}}/\overline{\mathcal{G}}$. But the restriction to analyticity seems a little bit unsatisfactory. Since one has desired from the very beginning to use $\overline{\mathcal{A}}$ for describing quantum gravity, one runs into trouble with the diffeomorphism invariance of this theory: After applying a diffeomorphism a path need no longer be analytical. That is why Baez and Sawin [6] introduced so-called webs and tassels built by only smooth paths fulfilling certain conditions. Any graph can be written as a web and for any finite number of webs there is a web containing all of them. So the directedness of the label set for the definition of $\overline{\mathcal{A}}$ remains valid, and, consequently, one can generalize the construction of the natural induced Haar measure and many more things.

In this paper we will introduce another definition for paths. Our definition will have the advantage that it does not depend explicitly on the chosen smoothness category
labelled by $r \in \mathbb{N}_+ \cup \{\infty\} \cup \{\omega\}$. Moreover, it does not matter whether we demand the paths to be piecewise immersions (cases $C^{r,+}$, $+$ denoting the immersion property) or not. Therefore, in what follows suppose that we have fixed the parameter $r$ from the very beginning. Furthermore, we decide now whether we additionally demand the paths to be piecewise immersions or not. Nevertheless, we always write simply $C^r$.

2.1.1. General case. In this paragraph we consider all smoothness categories at one stroke.

Definition 2.1. A path is a piecewise $C^r$-map $\gamma : [0, 1] \longrightarrow M$. If we consider piecewise immersed paths, we additionally have to define all $\gamma : [0, 1] \longrightarrow M$ that are piecewise constant, i.e. $\gamma|_{[\tau_1, \tau_2]} = \{x\}$ for some $x \in M$, or immersive to be paths. The initial point is $\gamma(0)$ and the terminal point $\gamma(1)$. Two paths $\gamma_1$ and $\gamma_2$ can be multiplied iff the terminal point of $\gamma_1$ and the initial point of $\gamma_2$ coincide. Then the product is given by
\[
  \gamma_1 \gamma_2(t) :=
  \begin{cases}
    \gamma_1(2t) & \text{for } 0 \le t \le \tfrac{1}{2}, \\
    \gamma_2(2t - 1) & \text{for } \tfrac{1}{2} \le t \le 1.
  \end{cases}
\]
A path $\gamma$ is called trivial iff $\mathrm{im}\, \gamma \equiv \gamma([0, 1])$ is a single point.

An important idea of the Ashtekar program is the assumption that the total information about the continuum theory is encoded in the set of all subtheories on finite lattices. Thus we need the definition of paths and graphs. The set of all paths is hard to manage. That is why we restrict ourselves to special paths.

Definition 2.2.
• A path $\gamma$ has no self-intersections iff from $\gamma(\tau_1) = \gamma(\tau_2)$ follows that
  – $\tau_1 = \tau_2$ or
  – $\tau_1 = 0$ and $\tau_2 = 1$ or
  – $\tau_1 = 1$ and $\tau_2 = 0$.
• Two paths $\gamma_1$ and $\gamma_2$ are non-intersecting iff $\gamma_1(\tau_1) = \gamma_2(\tau_2)$ implies $\tau_1, \tau_2 \in \{0, 1\}$.
• A path $\gamma'$ is called subpath of a path $\gamma$ iff there is an affine non-decreasing map $\phi : [0, 1] \to [0, 1]$ with $\gamma' = \gamma \circ \phi$. Iff additionally $\phi(0) = 0$ (or $\phi(1) = 1$), $\gamma'$ is called initial path (or terminal path) of $\gamma$. We define $\gamma^{t,+}(\tau) := \gamma(t + \tau(1 - t))$ for all $t \in [0, 1)$ and $\gamma^{t,-}(\tau) := \gamma(\tau t)$ for all $t \in (0, 1]$ to be the outgoing and incoming subpath of $\gamma$ in $t$, respectively. If $\gamma$ is a path without self-intersections then set $\gamma^{x,\pm} := \gamma^{t,\pm}$ for all $x \in \mathrm{im}\, \gamma$, where $t$ fulfills $\gamma(t) = x$. (We choose $t = 0$ in the $+$-case if $x = \gamma(0)$. Analogously for $t = 1$.)
• A (finite) graph $\Gamma$ is a (finite) union of paths $e_i$ that are mutually non-intersecting and have no self-intersections and of isolated points $v_j$. The elements of $V(\Gamma) := \bigcup_i \{e_i(0), e_i(1)\} \cup \bigcup_j \{v_j\}$ are called vertices, those of $E(\Gamma) := \bigcup_i \{e_i\}$ edges. A graph is called connected iff $V(\Gamma) \cup \bigcup_{e \in E(\Gamma)} \mathrm{im}\, e$ is connected.
• A path in a graph $\Gamma$ is a path in $M$ that equals a product of edges in $\Gamma$ and trivial paths (with values in $V(\Gamma)$), respectively, whereas the product of two consecutive paths has to exist. A path $\gamma$ in $M$ is called simple iff there is a finite graph $\Gamma$ such that $\gamma$ is a path in $\Gamma$.
• A path $\gamma$ in $M$ is called finite iff $\gamma$ is up to the parametrization equal to a finite product of simple paths. Here, two paths $\gamma_1$ and $\gamma_2$ are equal up to the parametrization iff there is a bijective $\phi : [0, 1] \longrightarrow [0, 1]$ with $\phi(0) = 0$ and $\gamma_2 = \gamma_1 \circ \phi$ such that $\phi$ and $\phi^{-1}$ are piecewise $C^r$.
• Two finite paths $\gamma_1$ and $\gamma_2$ are called equivalent iff there is a finite sequence of finite paths $\delta_i$ with $\delta_0 = \gamma_1$ and $\delta_n = \gamma_2$ such that for all $i = 1, \dots, n$
  – $\delta_i$ and $\delta_{i-1}$ coincide up to the parametrization or
  – $\delta_i$ arises from $\delta_{i-1}$ by inserting a retracing or
  – $\delta_{i-1}$ arises from $\delta_i$ by inserting a retracing.
  Inserting a retracing means that there is a $\tau \in [0, 1]$ and a finite path $\delta$ such that
\[
  \delta_i(t) =
  \begin{cases}
    \delta_{i-1}(2t) & \text{for } 0 \le t \le \tfrac{1}{2}\tau, \\
    \delta(4(t - \tfrac{1}{2}\tau)) & \text{for } \tfrac{1}{2}\tau \le t \le \tfrac{1}{2}\tau + \tfrac{1}{4}, \\
    \delta(4(\tfrac{1}{2}\tau + \tfrac{1}{2} - t)) & \text{for } \tfrac{1}{2}\tau + \tfrac{1}{4} \le t \le \tfrac{1}{2}\tau + \tfrac{1}{2}, \\
    \delta_{i-1}(2t - 1) & \text{for } \tfrac{1}{2}\tau + \tfrac{1}{2} \le t \le 1.
  \end{cases}
\]
  In the following, we denote by a retracing of a path $\gamma$ a subpath of the form $\delta \delta^{-1}$ with a certain finite $\delta$.
• The set of all classes of finite paths is denoted by $\mathcal{P}$, that of paths in $\Gamma$ by $\mathcal{P}_\Gamma$. Furthermore, we write $\mathcal{P}_{xy}$ for the set of all classes of finite paths from $x$ to $y$. The set of all classes of finite paths having base point $m$ forms the hoop group $\mathcal{HG} \equiv \mathcal{P}_{mm}$.

We have immediately from the definitions

Proposition 2.1. The multiplication on $\mathcal{P}$ induced by the multiplication of paths is well-defined and generates a groupoid structure on $\mathcal{P}$. (This means, roughly speaking, $\mathcal{P}$ possesses all properties of a group: associativity, existence of unit elements and of the inverse. But the product need not be defined for all paths.) The hoop group $\mathcal{HG}$ is a subgroup of $\mathcal{P}$.

Remark.
1. One can define an analogous equivalence relation on the set of paths in a fixed graph $\Gamma$: Two paths would be "$\Gamma$-equivalent" iff they arise from each other by reparametrizations or by inserting or cancelling of retracings contained in $\Gamma$. Obviously, two paths in $\Gamma$ are equivalent if they are $\Gamma$-equivalent. On the other hand, one can also prove that two paths contained in $\Gamma$ are already $\Gamma$-equivalent if they are equivalent. Consequently, we can identify $\mathcal{P}_\Gamma$ and the set of all $\Gamma$-equivalence classes of paths in $\Gamma$. In other words: $\mathcal{P}_\Gamma$ is the groupoid that is generated freely by the set of all edges of $\Gamma$.
2. In what follows we usually say simply "graph" instead of "finite connected graph" and only "path" instead of "finite path". Moreover, by a path we always mean an equivalence class of paths, unless the converse is said explicitly.
3. Finally, we identify two graphs if the (corresponding) edges are equivalent. Since edges are by definition free of retracings, this simply means that the edges are equal up to the parametrization.
4. Note that the paths $\gamma_1(\tau) := \tau$ and $\gamma_2(\tau) := \tau^2$ in $\mathbb{R}$ ($\subseteq \mathbb{R}^n \subseteq M$) are not equivalent. This comes from the fact that $\phi : \tau \longmapsto \tau^2$ is $C^r$, but $\phi^{-1} : \tau \longmapsto \sqrt{\tau}$ is not. (Likewise, it is not possible to transform $\gamma_1$ into $\gamma_2$ by successively inserting or deleting retracings as in Definition 2.2.) Furthermore, one sees that $\gamma_1 \circ \gamma_2^{-1}$ is an example of a path with retracings that is not equivalent to a path without.
5. If we restricted ourselves to piecewise analytical paths, i.e. the smoothness category $(\omega, +)$, from the very beginning, every path would be finite [2].
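Remark 1 describes the paths in a graph as the groupoid freely generated by its edges. A minimal Python sketch of this picture follows, under the assumption that a class of paths in a graph is stored as a word in the edges and their inverses and that retracings are cancelled letter by letter. The toy graph `EDGES`, the type aliases and all function names are hypothetical and chosen only for illustration; trivial paths are represented by the empty word, which forgets their base point.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: edge name -> (initial vertex, terminal vertex).
EDGES: Dict[str, Tuple[str, str]] = {
    "a": ("x", "y"),
    "b": ("y", "z"),
    "c": ("z", "x"),
}

Letter = Tuple[str, int]   # (edge, +1) stands for e, (edge, -1) for its inverse
Word = List[Letter]        # a path in the graph, written as a product of letters


def endpoints(letter: Letter) -> Tuple[str, str]:
    """Initial and terminal vertex of an edge or of its inverse."""
    name, sign = letter
    src, dst = EDGES[name]
    return (src, dst) if sign == 1 else (dst, src)


def is_path(word: Word) -> bool:
    """A word is a path iff every two consecutive letters can be multiplied."""
    return all(endpoints(u)[1] == endpoints(v)[0] for u, v in zip(word, word[1:]))


def inverse(word: Word) -> Word:
    """Inverse path: reverse the order and invert every letter."""
    return [(name, -sign) for name, sign in reversed(word)]


def reduce_word(word: Word) -> Word:
    """Cancel retracings e e^{-1} (and e^{-1} e) until none is left."""
    out: Word = []
    for name, sign in word:
        if out and out[-1] == (name, -sign):
            out.pop()            # the new letter undoes the previous one
        else:
            out.append((name, sign))
    return out


def multiply(w1: Word, w2: Word) -> Word:
    """Product of two classes; defined only if the end points match."""
    if w1 and w2 and endpoints(w1[-1])[1] != endpoints(w2[0])[0]:
        raise ValueError("terminal point of w1 differs from initial point of w2")
    return reduce_word(w1 + w2)
```

With these definitions, multiplying the word for `a` then `b` by `c` and afterwards by the inverse of `c` returns the word for `a` then `b`, which mirrors the cancellation of an inserted retracing; the word `a`, `b`, `c` starts and ends at the vertex `x` and so plays the role of a hoop based there.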
The main assumption quoted above suggests the usage of finite graphs as an index set for the subtheories. But, these theories are not “independent”. Roughly speaking, a subtheory defined on a smaller lattice arises by projecting the theory defined on the bigger lattice. Definition 2.3. Let 1 and 2 be two graphs. 1 is smaller or equal 2 (1 ≤ 2 ) iff each edge of 1 is (up to the parametrization) a product of edges of 2 and the vertex sets fulfill V(1 ) ⊆ V(2 ). Obviously, ≤ is a partial ordering. 2.1.2. Immersive case. In the case of piecewise immersed paths we can define another equivalence relation for finite paths. Here we use the fact that any piecewise immersed path can be parametrized proportionally to the arc length: Definition 2.4. We shortly call a path a pal-path iff it is parametrized proportionally to the arc length. Two finite paths γ1 and γ2 are called equivalent iff there is a finite sequence of finite paths δi with δ0 = γ1 and δn = γ2 such that for all i = 1, . . . , n • δi and δi−1 coincide when parametrized proportionally to the arc length or • δi arises from δi−1 by inserting a retracing or • δi−1 arises from δi by inserting a retracing. This definition seems to require a certain Riemannian structure on M. But, on the one hand, each manifold can be given a Riemannian structure. On the other hand, the definition of equivalence does not depend on the chosen Riemannian metric: if two paths coincide w.r.t. to the arc length to the first metric then they obviously coincide w.r.t. to the arc length of the other metric. Thus, this definition is indeed completely determined by the manifold structure of M. Lemma 2.2. 1. Two finite paths γ1 and γ2 are equivalent if they can be obtained from each other by a reparametrization. 2. Each nontrivial finite path is equivalent to a pal-path without retracings. Proof. 1. Clear. 2. We prove this inductively on the number n of simple paths γi that the finite path γ is decomposed into. 
We will even prove that $\gamma$ is equivalent to a pal-path $\gamma'$ that can be decomposed (up to the parametrization) into $n' \le n$ simple paths and that has no retracings. For $n = 1$ we have nothing to prove. Thus, let $n \ge 2$. First free $\gamma_0 := \prod_{i=1}^{n-1} \gamma_i$ of the retracings using the induction hypothesis. We get a pal-path $\gamma'_0 \sim \prod_{i=1}^{n'-1} \gamma'_i$ with the desired properties and $n' \le n$. Denote by $\gamma''$ the pal-path corresponding to $\gamma'_0 \gamma_n$. Obviously, $\gamma'' \sim \gamma$. Suppose $\gamma''$ is not free of retracings. Let $\delta\delta^{-1}$ be a retracing. Then a part of the retracing $\delta\delta^{-1}$ has to be in $\gamma_n$. Since $\gamma_n$ is simple (and w.l.o.g. non-trivial), the terminal point of $\delta$ cannot be in $\operatorname{int} \gamma_n$. Since by assumption $\gamma'_0$ is free of retracings, the terminal point has to be the initial point of $\gamma_n$, and thus $\delta^{-1}$ is (if necessary, after an appropriate [affine] reparametrization) an initial path of $\gamma_n$. Assume now $\delta$ to be maximal, i.e., any $\delta'$ “containing” the terminal path $\delta$ of $\gamma'_0$ that yields a retracing in $\gamma''$ is equal to $\delta$. Such a $\delta$ exists: Assume that every pal-subpath $\delta_\tau^{-1}$ of $\gamma_n$ corresponding to the parameter interval $[0, \tau]$ with $\tau < T$ yields a retracing. (Such
C. Fleischhack
a T exists, because there exists some retracing.) By the continuity of every path and the fact that the paths arising here, and so all their subpaths, are pal, also $\delta_T^{-1}$ has to yield a retracing. Consequently, there is a maximal T. Now, cancel out the retracing: If $\delta$ is not a (genuine) subpath of $\gamma'_{n'-1}$ (i.e., “exceeds” or equals it), define $\gamma'_n$ to be the “remaining” part of $\gamma_n$ “outside” $(\gamma'_{n'-1})^{-1}$; then $\gamma''' := \prod_{i=1}^{n'-2} \gamma'_i \circ \gamma'_n \sim \gamma$ consists of at most $n' - 2 + 1 < n$ finite paths. The induction hypothesis gives the assertion. Suppose now that $\delta$ is a (genuine) subpath of $\gamma'_{n'-1}$. Then define the pal-path $\gamma'$ by $\prod_{i=1}^{n'-2} \gamma'_i \, \gamma''_{n'-1} \circ \gamma'_n$, where $\gamma'_n$ denotes the “remaining” part of $\gamma_n$ outside of $\delta^{-1}$ and $\gamma''_{n'-1}$ that of $\gamma'_{n'-1}$ outside of $\delta$. By the maximality of $\delta$, $\gamma'$ contains no retracings. $\gamma' \sim \gamma'' \sim \gamma$ yields the assertion.

Most of the constructions in the following, as well as most of those in [10], do not actually depend on the choice of the equivalence relation for the paths. But the second one can only be used for piecewise immersed paths. Therefore, in what follows, we will use the general equivalence relation given in the previous paragraph.

2.2. Gauge theory on the lattice. In this subsection we will transfer the lattice gauge theory given by Ashtekar and Lewandowski [4, 3] to our case. The algebraic definitions for the connections, gauge transforms and the action of the latter follow these authors closely. In the last two subsections we will state some assertions, mainly on the basic properties of the action of the gauge transforms and the projections onto smaller graphs.

2.2.1. Algebraic definition. We use the standard definition: Globally, connections are parallel transports, i.e. $\mathbf{G}$-valued homomorphisms of paths in M, and gauge transforms are $\mathbf{G}$-valued functions over M. The lattice versions now come from restricting the domain of definition to edges and vertices in a graph.

Definition 2.5. Let $\Gamma$ be a graph. We define $\overline{\mathcal{A}}_\Gamma := \mathrm{Hom}(\mathcal{P}_\Gamma, \mathbf{G})$ . . .
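set of all connections on $\Gamma$ and $\overline{\mathcal{G}}_\Gamma := \mathrm{Maps}(V(\Gamma), \mathbf{G})$ . . . set of all gauge transforms on $\Gamma$.

Here, $\mathrm{Hom}(\mathcal{P}_\Gamma, \mathbf{G})$ denotes the set of all homomorphisms from the groupoid $\mathcal{P}_\Gamma$ freely generated by the edges of $\Gamma$ into the structure group, and $\mathrm{Maps}(V(\Gamma), \mathbf{G})$ the set of all maps from the set of all vertices of $\Gamma$ into the structure group.

In the classical case the action of a gauge transform on a connection can be described by the corresponding action on the parallel transports: $h_A(\gamma) \longmapsto g_{\gamma(0)}^{-1}\, h_A(\gamma)\, g_{\gamma(1)}$. By simply restricting to the lattice we obtain the action of $\overline{\mathcal{G}}_\Gamma$ on $\overline{\mathcal{A}}_\Gamma$ by
\[ \Theta_\Gamma : \overline{\mathcal{A}}_\Gamma \times \overline{\mathcal{G}}_\Gamma \longrightarrow \overline{\mathcal{A}}_\Gamma, \qquad (h_\Gamma, g_\Gamma) \longmapsto h_\Gamma \circ g_\Gamma \]
with $h_\Gamma \circ g_\Gamma(\gamma) := g_\Gamma(\gamma(0))^{-1}\, h_\Gamma(\gamma)\, g_\Gamma(\gamma(1))$ for all paths $\gamma$ in $\Gamma$.

Definition 2.6. For each graph $\Gamma$ we define $\overline{\mathcal{A}/\mathcal{G}}_\Gamma := \overline{\mathcal{A}}_\Gamma / \overline{\mathcal{G}}_\Gamma$ . . . set of all equivalence classes of connections in $\Gamma$.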
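The lattice action just defined can be tried out on a toy model. The following is a minimal sketch, not part of the paper: it assumes the finite group S3 as a stand-in for the structure group and a hypothetical triangle graph, and all names (`GROUP`, `EDGES`, `act`, `gauge_mul`, `count_orbits`) are illustrative. It checks that the edge-wise formula is a right action and counts the gauge equivalence classes by brute force.

```python
from itertools import permutations, product

# Assumption: S3 (as permutation tuples) stands in for the structure group.
GROUP = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def mul(p, q):
    """Group product p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

# Hypothetical graph: a triangle; each edge is (source vertex, target vertex).
EDGES = [(0, 1), (1, 2), (2, 0)]
N_VERTICES = 3

def act(h, g):
    """Lattice gauge action, edge by edge: (h∘g)(e) = g(e(0))^{-1} h(e) g(e(1))."""
    return tuple(mul(mul(inv(g[s]), h[k]), g[t])
                 for k, (s, t) in enumerate(EDGES))

def gauge_mul(g1, g2):
    """Pointwise product of two gauge transforms."""
    return tuple(mul(a, b) for a, b in zip(g1, g2))

def count_orbits():
    """Number of gauge equivalence classes of connections on the triangle."""
    gauge_group = list(product(GROUP, repeat=N_VERTICES))
    unseen = set(product(GROUP, repeat=len(EDGES)))
    orbits = 0
    while unseen:
        h = next(iter(unseen))
        unseen -= {act(h, g) for g in gauge_group}
        orbits += 1
    return orbits
```

For the triangle the fundamental group has one generator, so the orbit count should agree with the number of conjugacy classes of the chosen group, in line with Proposition 2.5.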
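2.2.2. Topological definition. It is obvious that the groupoid $\mathcal{P}_\Gamma$ is always freely generated by the edges $e_i$ of $\Gamma$. Hence, the set $\overline{\mathcal{A}}_\Gamma = \mathrm{Hom}(\mathcal{P}_\Gamma, \mathbf{G})$ can be identified via $h \longmapsto \bigl(h(e_1), \ldots, h(e_{\#E(\Gamma)})\bigr)$ with $\mathbf{G}^{\#E(\Gamma)}$ and can so be given a natural topology. Analogously, $\overline{\mathcal{G}}_\Gamma = \mathrm{Maps}(V(\Gamma), \mathbf{G})$ can be identified naturally via $g \longmapsto (g(x))_{x \in V(\Gamma)}$ with $\mathbf{G}^{\#V(\Gamma)}$. So $\overline{\mathcal{G}}_\Gamma$ is, by means of the pointwise multiplication, a topological group. We have immediately

Proposition 2.3. For all graphs $\Gamma$ the action $\Theta_\Gamma : \overline{\mathcal{A}}_\Gamma \times \overline{\mathcal{G}}_\Gamma \longrightarrow \overline{\mathcal{A}}_\Gamma$ is continuous.

Proof. $\Theta_\Gamma$ as a map from $\mathbf{G}^{\#E(\Gamma)} \times \mathbf{G}^{\#V(\Gamma)}$ to $\mathbf{G}^{\#E(\Gamma)}$ is a concatenation of multiplications, hence continuous.

Corollary 2.4. $\overline{\mathcal{A}/\mathcal{G}}_\Gamma = \overline{\mathcal{A}}_\Gamma / \overline{\mathcal{G}}_\Gamma$ is a Hausdorff space. $\overline{\mathcal{A}/\mathcal{G}}_\Gamma$ is compact for compact $\mathbf{G}$.

It is well-known that connections are dual to paths and equivalence classes of connections are dual to closed paths. This is again confirmed by

Proposition 2.5. $\overline{\mathcal{A}/\mathcal{G}}_\Gamma$ is isomorphic to $\mathrm{Hom}(\mathcal{HG}_{x,\Gamma}, \mathbf{G})/\mathrm{Ad}$, hence isomorphic to $\mathbf{G}^{\dim \pi_1(\Gamma)}/\mathrm{Ad}$, for each graph $\Gamma$ and for each vertex x in $\Gamma$. Here $\mathcal{HG}_{x,\Gamma}$ is the set of all (classes of) paths in $\Gamma$ starting and ending in x, and $\pi_1(\Gamma)$ is the fundamental group of $\Gamma$.

Proof. Define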
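\[ J : \overline{\mathcal{A}/\mathcal{G}}_\Gamma \longrightarrow \mathrm{Hom}(\mathcal{HG}_{x,\Gamma}, \mathbf{G})/\mathrm{Ad}, \qquad [h] \longmapsto \bigl[h|_{\mathcal{HG}_{x,\Gamma}}\bigr]_{\mathrm{Ad}}. \]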
• J is well-defined. If $h' = h'' \circ g$, then $h'(\alpha) = g(x)^{-1}\, h''(\alpha)\, g(x)$ for all $\alpha \in \mathcal{HG}_{x,\Gamma}$, i.e. $h'|_{\mathcal{HG}_{x,\Gamma}} = h''|_{\mathcal{HG}_{x,\Gamma}} \circ \mathrm{Ad}\, g(x)$.
• J is injective. Let $J([h']) = J([h''])$, i.e., let there exist a $g \in \mathbf{G}$ such that $h'(\alpha) = g^{-1}\, h''(\alpha)\, g$ for all $\alpha \in \mathcal{HG}_{x,\Gamma}$. Choose for all vertices $y \neq x$ a path $\gamma_y$ from x to y, set $\gamma_x := 1$ and set $g(y) := h''(\gamma_y)^{-1}\, g\, h'(\gamma_y)$ for all y. Now, $h' = h'' \circ (g(y))_{y \in V(\Gamma)}$ is clear.
• J is surjective. Let $[h] \in \mathrm{Hom}(\mathcal{HG}_{x,\Gamma}, \mathbf{G})/\mathrm{Ad}$ be given. Choose an $h \in [h]$ and, as above, for all vertices y a path $\gamma_y$ and some $g_y \in \mathbf{G}$. For each $\gamma \in \mathcal{P}_\Gamma$ set $h_0(\gamma) := g_{\gamma(0)}^{-1}\, h\bigl(\gamma_{\gamma(0)}\, \gamma\, \gamma_{\gamma(1)}^{-1}\bigr)\, g_{\gamma(1)}$. We have $J([h_0]) = [h]$.

Since $\mathcal{HG}_{x,\Gamma}$ is isomorphic to $\pi_1(\Gamma)$, hence a free group with $\dim \pi_1(\Gamma)$ generators [11, 2], we have $\overline{\mathcal{A}/\mathcal{G}}_\Gamma \cong \mathbf{G}^{\dim \pi_1(\Gamma)}/\mathrm{Ad}$.

2.2.3. Relations between the lattice theories. If one constructs a global theory from its subtheories one has to guarantee that these subtheories are “consistent”. This means, e.g., that the projection of a connection onto a smaller graph has to be already defined by its projection onto a bigger graph. So we need projections onto the subtheories induced by the partial ordering on the set of graphs.

Definition 2.7. Let $\Gamma_1 \le \Gamma_2$. We define
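\[ \pi_{\Gamma_1}^{\Gamma_2} : \overline{\mathcal{A}}_{\Gamma_2} \longrightarrow \overline{\mathcal{A}}_{\Gamma_1}, \qquad h \longmapsto h|_{\mathcal{P}_{\Gamma_1}}, \]
\[ \pi_{\Gamma_1}^{\Gamma_2} : \overline{\mathcal{G}}_{\Gamma_2} \longrightarrow \overline{\mathcal{G}}_{\Gamma_1}, \qquad g \longmapsto g|_{V(\Gamma_1)} \]
and
\[ \pi_{\Gamma_1}^{\Gamma_2} : \overline{\mathcal{A}/\mathcal{G}}_{\Gamma_2} \longrightarrow \overline{\mathcal{A}/\mathcal{G}}_{\Gamma_1}, \qquad [h] \longmapsto \bigl[h|_{\mathcal{P}_{\Gamma_1}}\bigr]. \]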
We denote all three maps by one and the same symbol because it should be clear in the following which map is meant. Obviously, from $h'' = h' \circ g$ on $\Gamma_2$ follows $h''|_{\mathcal{P}_{\Gamma_1}} = h'|_{\mathcal{P}_{\Gamma_1}} \circ g|_{V(\Gamma_1)}$ on $\Gamma_1$, i.e. $\pi_{\Gamma_1}^{\Gamma_2}$ is well-defined. Furthermore, we have
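Proposition 2.6. Let $\Gamma_1 \le \Gamma_2 \le \Gamma_3$. Then $\pi_{\Gamma_1}^{\Gamma_2}\, \pi_{\Gamma_2}^{\Gamma_3} = \pi_{\Gamma_1}^{\Gamma_3}$.

Finally, we write down the projections by operations on the structure group in order to see topological properties. Let again $\Gamma_1 \le \Gamma_2$. First we decompose each edge $e_i$ of $\Gamma_1$ into edges $f_j$ of $\Gamma_2$: $e_i = \prod_{k_i=1}^{K_i} f_{j(i,k_i)}^{\epsilon(i,k_i)}$. With this we get for the map between the connections ($n := \#E(\Gamma_1)$)
\[ \pi_{\Gamma_1}^{\Gamma_2} : \mathbf{G}^{\#E(\Gamma_2)} \longrightarrow \mathbf{G}^{\#E(\Gamma_1)}, \qquad \bigl(g_1, \ldots, g_{\#E(\Gamma_2)}\bigr) \longmapsto \Bigl( \prod_{k_1=1}^{K_1} g_{j(1,k_1)}^{\epsilon(1,k_1)}, \ldots, \prod_{k_n=1}^{K_n} g_{j(n,k_n)}^{\epsilon(n,k_n)} \Bigr). \]
On the level of gauge transforms the description is very easy: $\pi_{\Gamma_1}^{\Gamma_2}$ projects $(g_v)_{v \in V(\Gamma_2)}$ onto those elements belonging to vertices in $\Gamma_1$. For classes of connections a formula analogous to that for connections holds: First choose two free generating systems $\alpha$ and $\beta$ of $\mathcal{HG}_{x_1,\Gamma_1}$ and $\mathcal{HG}_{x_2,\Gamma_2}$, respectively, and then a path $\gamma$ from $x_2$ to $x_1$ in the bigger graph $\Gamma_2$. Thus we get decompositions $\alpha_i = \gamma^{-1} \prod_{k_i=1}^{K_i} \beta_{j(i,k_i)}^{\epsilon(i,k_i)}\, \gamma$. Hence, ($n_i := \dim \pi_1(\Gamma_i)$)
\[ \pi_{\Gamma_1}^{\Gamma_2} : \mathbf{G}^{n_2}/\mathrm{Ad} \longrightarrow \mathbf{G}^{n_1}/\mathrm{Ad}, \qquad \bigl[g_1, \ldots, g_{n_2}\bigr]_{\mathrm{Ad}} \longmapsto \Bigl[ \prod_{k_1=1}^{K_1} g_{j(1,k_1)}^{\epsilon(1,k_1)}, \ldots, \prod_{k_{n_1}=1}^{K_{n_1}} g_{j(n_1,k_{n_1})}^{\epsilon(n_1,k_{n_1})} \Bigr]_{\mathrm{Ad}}. \]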
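Proposition 2.7. $\pi_{\Gamma_1}^{\Gamma_2}$ is continuous, open and surjective.

Proof. The surjectivity is clear for all three cases. The continuity is trivial for the first two cases and follows in the third because the projections $\mathbf{G}^n \longrightarrow \mathbf{G}^n/\mathrm{Ad}$ are open, continuous and surjective (see [8]) and the map from $\mathbf{G}^{n_2}$ to $\mathbf{G}^{n_1}$ corresponding to $\pi_{\Gamma_1}^{\Gamma_2}$ is obviously continuous. The openness follows immediately in the case of gauge transforms because projections onto factors of a direct product are open anyway. In the case of connections one additionally needs the openness of the multiplication in $\mathbf{G}$: Each edge in $\Gamma_1$ is a product of edges in $\Gamma_2$, i.e., after possibly renumbering we have $e_i = f_{i,1} \cdots f_{i,K_i}$. Thus, $\pi_{\Gamma_1}^{\Gamma_2}(g_{1,1}, \ldots, g_{n,K_n}, \ldots) = (g_{1,1} \cdots g_{1,K_1}, \ldots, g_{n,1} \cdots g_{n,K_n})$. Let now W be open in $\overline{\mathcal{A}}_{\Gamma_2} = \mathbf{G}^{\#E(\Gamma_2)}$. Then W is a union of sets of the form $W_{1,1} \times \cdots \times W_{n,K_n} \times \cdots$, i.e., $\pi_{\Gamma_1}^{\Gamma_2}(W)$ is a union of sets of the form $(W_{1,1} \cdots W_{1,K_1}) \times \cdots \times (W_{n,1} \cdots W_{n,K_n})$. But these are open, i.e., $\pi_{\Gamma_1}^{\Gamma_2}$ is open. The openness of $\pi_{\Gamma_1}^{\Gamma_2} : \overline{\mathcal{A}/\mathcal{G}}_{\Gamma_2} \longrightarrow \overline{\mathcal{A}/\mathcal{G}}_{\Gamma_1}$ follows now because the map $\pi_{\Gamma_1}^{\Gamma_2} : \overline{\mathcal{A}}_{\Gamma_2} \longrightarrow \overline{\mathcal{A}}_{\Gamma_1}$ is open and the projections $\overline{\mathcal{A}}_\Gamma \longrightarrow \overline{\mathcal{A}/\mathcal{G}}_\Gamma$ are continuous, open and surjective.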
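The projection onto a smaller graph and its compatibility with the gauge action can be illustrated on the smallest non-trivial example. This is a sketch under stated assumptions, not the paper's construction in general: S3 stands in for the structure group, the bigger graph is a hypothetical two-edge chain, the smaller graph consists of the single composite edge, and the names `project_connection`, `project_gauge` and `act` are illustrative.

```python
from itertools import permutations, product

# Assumption: S3 (as permutation tuples) stands in for the structure group.
GROUP = list(permutations(range(3)))

def mul(p, q):
    """Group product p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

# Bigger graph: vertices 0, 1, 2 with edges f1: 0 -> 1 and f2: 1 -> 2.
EDGES_BIG = [(0, 1), (1, 2)]
# Smaller graph: vertices 0 and 2 with the single edge e = f1 f2.
VERTICES_SMALL = (0, 2)
EDGES_SMALL = [(0, 1)]  # indices into VERTICES_SMALL

def act(h, g, edges):
    """Lattice gauge action: (h∘g)(e) = g(e(0))^{-1} h(e) g(e(1))."""
    return tuple(mul(mul(inv(g[s]), h[k]), g[t])
                 for k, (s, t) in enumerate(edges))

def project_connection(h_big):
    """Multiply the group elements along the decomposition e = f1 f2."""
    return (mul(h_big[0], h_big[1]),)

def project_gauge(g_big):
    """Keep only the values at the vertices of the smaller graph."""
    return tuple(g_big[v] for v in VERTICES_SMALL)

def is_equivariant():
    """Check that projecting commutes with the gauge action for all inputs."""
    return all(
        project_connection(act(h, g, EDGES_BIG))
        == act(project_connection(h), project_gauge(g), EDGES_SMALL)
        for h in product(GROUP, repeat=2)
        for g in product(GROUP, repeat=3)
    )
```

The value of the gauge transform at the middle vertex cancels in the product, which is why the projection descends to classes of connections.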
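2.3. Continuum gauge theory. For completeness, in the first paragraph we will briefly quote the definitions of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$ from [4], and in the second we summarize the most important facts about these spaces. In the last two paragraphs we will first investigate the topological properties of the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$ and of the projections onto the lattice gauge theories and then prove that the connections etc. are algebraically described in exactly the same form both for our definition of paths and for that of Ashtekar and Lewandowski [2].

2.3.1. Definition of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$. By means of the continuity of the projections $\pi_{\Gamma_1}^{\Gamma_2}$ the spaces $(\overline{\mathcal{A}}_\Gamma)_\Gamma$, $(\overline{\mathcal{G}}_\Gamma)_\Gamma$ and $(\overline{\mathcal{A}/\mathcal{G}}_\Gamma)_\Gamma$ are projective systems of topological spaces. This leads to the crucial [4]

Definition 2.8 (Generalized Gauge Theories).
• $\overline{\mathcal{A}} := \varprojlim_\Gamma \overline{\mathcal{A}}_\Gamma$ is the space of generalized connections. The elements of $\overline{\mathcal{A}}$ are usually denoted by $A$ or $h_A$.
• $\overline{\mathcal{G}} := \varprojlim_\Gamma \overline{\mathcal{G}}_\Gamma$ is the space of generalized gauge transforms. The elements of $\overline{\mathcal{G}}$ are usually denoted by $\overline{g}$.
• $\overline{\mathcal{A}/\mathcal{G}} := \varprojlim_\Gamma \overline{\mathcal{A}/\mathcal{G}}_\Gamma$ is the space of generalized equivalence classes of connections.

Explicitly this means
\[ \overline{\mathcal{A}} = \bigl\{ (h_\Gamma)_\Gamma \in \times_\Gamma \overline{\mathcal{A}}_\Gamma \bigm| \pi_{\Gamma_1}^{\Gamma_2} h_{\Gamma_2} = h_{\Gamma_1} \text{ for all } \Gamma_1 \le \Gamma_2 \bigr\}, \]
\[ \overline{\mathcal{G}} = \bigl\{ (g_\Gamma)_\Gamma \in \times_\Gamma \overline{\mathcal{G}}_\Gamma \bigm| \pi_{\Gamma_1}^{\Gamma_2} g_{\Gamma_2} = g_{\Gamma_1} \text{ for all } \Gamma_1 \le \Gamma_2 \bigr\} \]
as well as
\[ \overline{\mathcal{A}/\mathcal{G}} = \bigl\{ ([h_\Gamma])_\Gamma \in \times_\Gamma \overline{\mathcal{A}/\mathcal{G}}_\Gamma \bigm| \pi_{\Gamma_1}^{\Gamma_2} [h_{\Gamma_2}] = [h_{\Gamma_1}] \text{ for all } \Gamma_1 \le \Gamma_2 \bigr\}. \]
We denote
\[ \pi_\Gamma : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}_\Gamma,\ (h_{\Gamma'})_{\Gamma'} \longmapsto h_\Gamma, \qquad \pi_\Gamma : \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{G}}_\Gamma,\ (g_{\Gamma'})_{\Gamma'} \longmapsto g_\Gamma, \qquad \pi_\Gamma : \overline{\mathcal{A}/\mathcal{G}} \longrightarrow \overline{\mathcal{A}/\mathcal{G}}_\Gamma,\ ([h_{\Gamma'}])_{\Gamma'} \longmapsto [h_\Gamma]. \tag{1} \]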
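2.3.2. Topological characterization of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$. We have [4, 13]

Theorem 2.8. 1. $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$ are completely regular Hausdorff spaces and, for compact $\mathbf{G}$, compact.
2. For every principal fibre bundle over M with structure group $\mathbf{G}$ the regular connections (gauge transforms, equivalence classes of connections) are also generalized connections (gauge transforms, equivalence classes of generalized connections): The maps $\mathcal{A} \longrightarrow \overline{\mathcal{A}}$, $\mathcal{G} \longrightarrow \overline{\mathcal{G}}$ and $\mathcal{A}/\mathcal{G} \longrightarrow \overline{\mathcal{A}/\mathcal{G}}$ are embeddings.
3. Let X be a topological space. A map $f : X \longrightarrow \overline{\mathcal{A}}$ is continuous iff $\pi_\Gamma \circ f : X \longrightarrow \overline{\mathcal{A}}_\Gamma \equiv \mathbf{G}^{\#E(\Gamma)}$ is continuous for all graphs $\Gamma$. The analogous assertion holds for maps from X to $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$, respectively, as well.
4. $\pi_\Gamma$ is continuous for all graphs $\Gamma$.
5. $\overline{\mathcal{G}}$ is a topological group.

We shall postpone the discussion whether the space $\mathcal{A}$ is dense in $\overline{\mathcal{A}}$ or not for several reasons. This, in fact, depends crucially on the chosen smoothness category and equivalence relation for the paths. It should be clear that, provided $\gamma_1(\tau) := \tau$ and $\gamma_2(\tau) := \tau^2$ (cf. Remark in paragraph 2.1.1) are seen to be non-equivalent, the denseness is unlikely: No classical smooth connection A can distinguish between these paths. So we will discuss this in more detail in the accompanying paper [10]. We will also show there that $\pi_\Gamma$ is open and surjective. But all that requires some technical effort that is not necessary for the actual goal of this paper, the determination of the gauge orbit types.

Proof. 1. The property of being compact, Hausdorff or completely regular is maintained by forming product spaces and by the transition to closed subsets. Thus the assertion follows from the corresponding properties of the structure group $\mathbf{G}$.
2. The embedding property follows from Giles’ reconstruction theorem [12] and [1].
3. See, e.g., [13].
4. Since $\mathrm{id} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}$ etc. is continuous, this follows from the facts just proven.
5. The multiplication on $\overline{\mathcal{G}}$ is defined by $(g_\Gamma)_\Gamma \circ (g'_\Gamma)_\Gamma = (g_\Gamma \circ g'_\Gamma)_\Gamma$. With this $\overline{\mathcal{G}}$ is a group with unit $(e_\Gamma)_\Gamma$ and inverse $(g_\Gamma)_\Gamma^{-1} = (g_\Gamma^{-1})_\Gamma$. The multiplication $m : \overline{\mathcal{G}} \times \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{G}}$ is continuous due to the continuity criterion above: $\pi_\Gamma \circ m = m_\Gamma \circ (\pi_\Gamma \times \pi_\Gamma)$ is continuous for all $\Gamma$, because the multiplication $m_\Gamma$ on $\overline{\mathcal{G}}_\Gamma$ is continuous.

2.3.3. Action of gauge transforms on connections. Because of the consistency of the actions of $\overline{\mathcal{G}}_\Gamma$ on $\overline{\mathcal{A}}_\Gamma$ one can also define an action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$. One simply sets [4]
\[ \Theta : \overline{\mathcal{A}} \times \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}, \qquad \bigl( (h_\Gamma)_\Gamma, (g_\Gamma)_\Gamma \bigr) \longmapsto (h_\Gamma \circ g_\Gamma)_\Gamma. \]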
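Theorem 2.9. 1. The action $\Theta$ of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$ is continuous.
2. For fixed $A \in \overline{\mathcal{A}}$ the map $\overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}$, $\overline{g} \longmapsto A \circ \overline{g}$, and for fixed $\overline{g} \in \overline{\mathcal{G}}$ the map $\overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}$, $A \longmapsto A \circ \overline{g}$, are continuous.
3. The canonical projection $\pi_{\overline{\mathcal{A}}/\overline{\mathcal{G}}} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}/\overline{\mathcal{G}}$ is continuous and open and, for compact $\mathbf{G}$, also closed and proper.
4. The map $\pi_\Gamma : \overline{\mathcal{A}}/\overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma$, $[(h_{\Gamma'})_{\Gamma'}] \longmapsto [h_\Gamma]$, is well-defined and continuous.

Proof. 1. $\pi_\Gamma \circ \Theta = \Theta_\Gamma \circ (\pi_\Gamma \times \pi_\Gamma) : \overline{\mathcal{A}} \times \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}_\Gamma$, as a concatenation of continuous maps on the right-hand side, is continuous for any graph $\Gamma$. By the continuity criterion for maps to $\overline{\mathcal{A}}$ in Theorem 2.8, $\Theta$ is continuous.
2. Follows from the continuity of $\Theta$.
3. Follows because $\Theta$ is a continuous action of a (compact) topological group $\overline{\mathcal{G}}$ on the Hausdorff space $\overline{\mathcal{A}}$ [8].
4. $\pi_\Gamma$ is well-defined. Namely, let $A' = A \circ \overline{g}$, i.e. $(h'_\Gamma)_\Gamma = (h_\Gamma \circ g_\Gamma)_\Gamma$, thus $h'_\Gamma = h_\Gamma \circ g_\Gamma$ for all graphs $\Gamma$. Then $[h'_\Gamma] = [h_\Gamma]$. The continuity of $\pi_\Gamma : \overline{\mathcal{A}}/\overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma$ follows from the continuity of $\pi_\Gamma : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}_\Gamma$ and $\pi_{\overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma}$ as well as from the continuity criterion for the quotient topology because the diagram
\[ \begin{array}{ccc} \overline{\mathcal{A}} & \xrightarrow{\ \pi_{\overline{\mathcal{A}}/\overline{\mathcal{G}}}\ } & \overline{\mathcal{A}}/\overline{\mathcal{G}} \\ {\scriptstyle \pi_\Gamma} \downarrow & & \downarrow {\scriptstyle \pi_\Gamma} \\ \overline{\mathcal{A}}_\Gamma & \xrightarrow{\ \pi_{\overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma}\ } & \overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma \end{array} \]
is commutative.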
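We note that for a compact structure group $\mathbf{G}$ and for analytic paths $\overline{\mathcal{A}}/\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$ are even homeomorphic (cf. [4, 3]).

2.3.4. Algebraic characterization of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$. In this paragraph we will show that our choice of the definition of paths leads to the same results as the definitions in [2] do.

Theorem 2.10. 1. We have $\overline{\mathcal{A}} \cong \mathrm{Hom}(\mathcal{P}, \mathbf{G})$. (This justifies the notation $h_A$ for a connection A.) Here, $\mathrm{Hom}(\mathcal{P}, \mathbf{G})$ is the set of all maps $h : \mathcal{P} \longrightarrow \mathbf{G}$ that fulfill $h(\gamma_1 \gamma_2) = h(\gamma_1) h(\gamma_2)$ for all multipliable paths $\gamma_1, \gamma_2 \in \mathcal{P}$.
2. We have $\overline{\mathcal{G}} \cong \times_{x \in M} \mathbf{G} \equiv \mathrm{Maps}(M, \mathbf{G})$. The isomorphism is even a homeomorphism of topological groups.
3. The action of gauge transforms on the connections is given by
\[ h_{A \circ \overline{g}}(\gamma) := g_{\gamma(0)}^{-1}\, h_A(\gamma)\, g_{\gamma(1)} \quad \text{for all } \gamma \in \mathcal{P}. \tag{2} \]
$h_A : \mathcal{P} \longrightarrow \mathbf{G}$ is the homomorphism corresponding to $A \in \overline{\mathcal{A}}$ and $g_x$ the component of the gauge transform $\overline{g} \in \overline{\mathcal{G}}$ in x.
4. We have $\overline{\mathcal{A}/\mathcal{G}} \cong \mathrm{Hom}(\mathcal{HG}, \mathbf{G})/\mathrm{Ad}$. Here, $\mathrm{Hom}(\mathcal{HG}, \mathbf{G})$ is the set of all homomorphisms $h : \mathcal{HG} \longrightarrow \mathbf{G}$.

Proof. 1. Define
\[ I : \mathrm{Hom}(\mathcal{P}, \mathbf{G}) \longrightarrow \overline{\mathcal{A}}, \qquad h \longmapsto (h|_{\mathcal{P}_\Gamma})_\Gamma. \]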
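• I is obviously well-defined.
• I is injective. From $h_1 \neq h_2$ follows the existence of a $\gamma \in \mathcal{P}$ with $h_1(\gamma) \neq h_2(\gamma)$. Since $\gamma$ equals $\prod \gamma_i$ with appropriate simple $\gamma_i$, we have $\prod h_1(\gamma_i) \neq \prod h_2(\gamma_i)$, hence $h_1(\gamma_i) \neq h_2(\gamma_i)$ for some $\gamma_i$. Choose a finite graph $\Gamma$ such that $\gamma_i$ is a path in $\Gamma$. Here we have $h_1|_{\mathcal{P}_\Gamma}(\gamma_i) = h_1(\gamma_i) \neq h_2(\gamma_i) = h_2|_{\mathcal{P}_\Gamma}(\gamma_i)$, i.e. $I(h_1) \neq I(h_2)$.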
• I is surjective. Let $(h_\Gamma)_\Gamma$ be given. We first consider not classes of paths, but the paths themselves. Construct for any simple $\gamma \in \mathcal{P}$ a graph $\Gamma$ containing $\gamma$. Define $h(\gamma) := h_\Gamma(\gamma)$. For general $\gamma \in \mathcal{P}$ define $h(\gamma) := \prod h(\gamma_i)$ according to some decomposition of $\gamma$ into simple paths $\gamma_i$. This construction is well-defined: First one easily realizes that it is independent of the decomposition of $\gamma$ into finite paths (thus also of the parametrization), see the remark below. Hence obviously, h is a homomorphism. Thus, also $h(\gamma \delta \delta^{-1} \gamma') = h(\gamma \gamma')$ etc., i.e., h is constant on equivalence classes of paths. Consequently, $h : \mathcal{P} \longrightarrow \mathbf{G}$ is a well-defined homomorphism with $I(h) = (h_\Gamma)_\Gamma$.

2. Set
\[ I : \mathrm{Maps}(M, \mathbf{G}) \longrightarrow \overline{\mathcal{G}}, \qquad (g_x)_{x \in M} \longmapsto \bigl( (g_x)_{x \in V(\Gamma)} \bigr)_\Gamma. \]
Obviously, I is bijective and a group homomorphism. The topology on $\mathrm{Maps}(M, \mathbf{G}) = \times_{x \in M} \mathbf{G}$ is generated by the preimages $\pi_y^{-1}(U)$ of open $U \subseteq \mathbf{G}$, by which $\pi_y : (g_x)_{x \in M} \longmapsto g_y$ is continuous. Hence, $\pi_\Gamma \circ I = \pi_{v_1} \times \cdots \times \pi_{v_{\#V(\Gamma)}}$ is continuous for all $\Gamma$, i.e., I is continuous. Due to the continuity criterion for maps into product spaces, also $I^{-1}$ is continuous because for all y the map $\pi_y \circ I^{-1} = \pi_\Gamma$ ($\Gamma$ consists only of the vertex y) is continuous.

3. This follows immediately from the preceding steps.

4. Use the map $J : \overline{\mathcal{A}/\mathcal{G}} \longrightarrow \mathrm{Hom}(\mathcal{HG}, \mathbf{G})/\mathrm{Ad}$, $[h] \longmapsto [h|_{\mathcal{HG}}]_{\mathrm{Ad}}$, and repeat the steps of the proof of Proposition 2.5.

Remark. We still have to show that $h(\gamma)$ defined in the surjectivity part of the first item above is independent of the decomposition of $\gamma$ into finite paths. Namely, let $\prod \gamma_i$ and $\prod \gamma'_j$ be two decompositions of $\gamma$. The terminal points of $\gamma_i$ and $\gamma'_j$ correspond to certain values of the parameters $\tau_i$ and $\tau'_j$, respectively, of the path $\gamma$. Order these values to a sequence $(\tau_k)$ and construct a decomposition of $\gamma$ into simple paths $\delta_k$ such that $\delta_k$ corresponds to the segment $\gamma|_{[\tau_k, \tau_{k+1}]}$. Now, on the one hand, $\gamma$ equals up to the parametrization $\prod \delta_k$, but, on the other hand, each $\gamma_i$ and $\gamma'_j$ equals up to the parametrization a product $\delta_\kappa \circ \delta_{\kappa+1} \circ \cdots \circ \delta_\lambda$ with certain $\kappa, \lambda$. Now let $\Gamma_i$ be the graph w.r.t. which $\gamma_i$ is simple. Construct from it the graph $\Gamma'_i$ by inserting the terminal points of all the $\gamma'_j$ as vertices. Finally, let $\Gamma_k$ be the graph spanned by $\delta_k$. Thus, $\Gamma_i \le \Gamma'_i$ and $\Gamma_k \le \Gamma'_i$, and we have
\[ h(\gamma_i) = h_{\Gamma_i}(\gamma_i) = h_{\Gamma'_i}(\gamma_i) = h_{\Gamma'_i}(\delta_\kappa \circ \delta_{\kappa+1} \circ \cdots \circ \delta_\lambda) = h_{\Gamma'_i}(\delta_\kappa)\, h_{\Gamma'_i}(\delta_{\kappa+1}) \cdots h_{\Gamma'_i}(\delta_\lambda) = h_{\Gamma_\kappa}(\delta_\kappa)\, h_{\Gamma_{\kappa+1}}(\delta_{\kappa+1}) \cdots h_{\Gamma_\lambda}(\delta_\lambda). \]
Using the analogous relation for $\gamma'_j$ we have $\prod h(\gamma_i) = \prod h_{\Gamma_k}(\delta_k) = \prod h(\gamma'_j)$. Thus, $h(\gamma)$ does not depend on the decomposition.

In the following we will usually write a gauge transform in the form $\overline{g} = (g_x)_{x \in M}$. Furthermore we have again by the continuity criterion for maps into product spaces:
Corollary 2.11. Let X be a topological space. A map f : X −→ G is continuous iff πx ◦ f : X −→ G is continuous for all x ∈ M. πx is continuous for all x ∈ M. Remark. If we work in the (ω, +)-category for the paths, i.e., we only consider piecewise analytical graphs, all the definitions and results coincide completely with those of Ashtekar and Lewandowski in [2, 4, 3].
2.4. Graphs vs. webs. In this subsection we will compare the consequences of our definition of paths to that of webs [6, 7, 17]. Within this subsection we only consider the smooth, piecewise immersed category (∞, +) for paths. Note moreover that, here, a path is simply a piecewise immersive and C ∞ -map from [0, 1] to M, i.e. it is not an equivalence class. But it is still finite as before. Let us briefly quote the basic properties of webs. A web consists of a finite number of so-called tassels. A tassel T with base point p ∈ M is a finite, ordered set of curves ci (piecewise immersive smooth maps from [0, 1] to M, i.e. the notion of a curve coincides with our notion of a general, usually non-finite path) that fulfills certain properties: 1. ci (0) = p for all i (common initial point). 2. ci is an embedding (in particular, has no self-intersections). 3. There is a positive constant ki ∈ R for each i such that ci (t) = cj (s) implies ki t = kj s (consistent parametrization). 4. Define Type(x) := {i ∈ I | x ∈ im ci } for all x ∈ M. Then, for all J ⊂ I the set Type−1 ({J }) is empty or has p as an accumulation point. Thus, in our notation, each ci is a simple path. A web is now a finite collection of tassels such that no path of one tassel contains the base point of another tassel. The following theorem on curves proven by Baez and Sawin [6] will be crucial: Theorem 2.12. Given a finite set C of curves. Then there is a web w, such that every curve c ∈ C is equivalent to a finite product of paths γ ∈ w and their inverses. This, namely, leads immediately to the following Proposition 2.13. Every curve is equivalent to a finite path. Thus, our restriction to finite paths is actually no restriction. Proof. Let there be given an arbitrary curve γ : [a, b] −→ M. By the preceding theorem γ depends on some web w, i.e., there is a family of curves ci being simple paths such that γ equals (modulo equivalence, i.e. up to reparametrizations, cf. 
[6]) a finite product of the curves ci and their inverses. By Definition 2.2, γ is finite. This means, roughly speaking, the sets of paths the connections are based on are the same for the webs and our case (∞, +). But this yields the equality of our definition of A and that of Baez and Sawin. Theorem 2.14. Suppose G to be compact and semi-simple. Then AWeb and A(∞,+) , i.e. the spaces of generalized connections defined by webs [6] and by Definition 2.8, respectively, are homeomorphic.
Proof. Using the proposition above we see analogously to the proof of Theorem 2.10 that IWeb : Hom(P, G) −→ AWeb h −→ (h |w )w is a bijection. (Now, the well-definedness is a consequence of the surjectivity of πw : AWeb −→ G#w [17]). Thus, I := IWeb ◦ I −1 : A(∞,+) −→ AWeb is a bijection, too. We are left with the proof that I is a homeomorphism. For this it is sufficient to prove that each element of a subbase of the one topology has an open image in the other topology. Possible subbases for A(∞,+) and AWeb are the families of all sets of the type π−1 (W ) and πw−1 (Ww ), respectively. Hereby, w is a web and Ww ⊆ Gk , k being the number of paths in w, open. (Here again where we need the semi-simplicity and compactness of G, because only for these assumptions it is proven [17] up to now that the projection πw |A : AWeb ⊇ A −→ Gk is surjective, i.e. Aw = Gk . Otherwise, it would be possible that πw (A) is a non-open Lie subgroup of Gk . So the sets πw−1 (Ww ) do no longer create a subbase.) Furthermore, is a graph and W ⊆ G#E() an element of a certain subbase, e.g., a set of type W = W1 × · · · × W#E() with open Wi ∈ G. Thus, we can take as a subbase for A(∞,+) simply all sets πc−1 (W ), where c is a simple path, i.e. a graph, and W ⊆ G is open. Since every web is a collection of a finite number of simple paths, we get completely analogously that the family of all πc−1 (W ) is a subbase for AWeb . The only difference here is that c has to be simple with different initial and terminal point. We are therefore left with the proof that I(πc−1 (W )) is open in AWeb for all simple, closed paths c and all open W , which is, however, quite easy. Decompose c into two paths c1 and c2 (with different initial and terminal points) which span the graph . Then I(πc−1 (W )) = I(π−1 ((πc )−1 (W ))). By the continuity of πc the set (πc )−1 (W ) is open in G2 , i.e. a union of sets of the type W1 × W2 , but I(π−1 (W1 × W2 )) is open as discussed above. 
We will continue the discussion on the relationship between graphs and webs in [10].

3. Determination of the Gauge Orbit Types

In contrast to the general theory above let now $\mathbf{G}$ be a compact Lie group throughout this section and the following ones. The goal of this section is the classification of the generalized connections by the type of their $\overline{\mathcal{G}}$-orbits. In contrast to the theory of classical connections in principal fiber bundles, topological subtleties do not play an important role: a generalized connection is only an (algebraic) homomorphism from the groupoid $\mathcal{P}$ of paths into the structure group $\mathbf{G}$, and the generalized gauge transforms are simply mappings from M to $\mathbf{G}$. Thus, also the theory of generalized gauge orbits is governed completely by the algebraic structure of the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$:
\[ h_{A \circ \overline{g}}(\gamma) = g_x^{-1}\, h_A(\gamma)\, g_y \quad \text{for all } A \in \overline{\mathcal{A}},\ \overline{g} \in \overline{\mathcal{G}},\ \gamma \in \mathcal{P}_{xy}. \tag{3} \]
For each element $\overline{g}$ of the stabilizer $B(A)$ of a connection A the following must be fulfilled:
\[ h_A(\gamma) = h_{A \circ \overline{g}}(\gamma) = g_x^{-1}\, h_A(\gamma)\, g_y \quad \text{for all } \gamma \in \mathcal{P}_{xy}, \tag{4} \]
hence, in particular,
• $h_A(\alpha) = g_m^{-1}\, h_A(\alpha)\, g_m$ for all $\alpha \in \mathcal{HG} \equiv \mathcal{P}_{mm}$ and
• $h_A(\gamma_x) = g_m^{-1}\, h_A(\gamma_x)\, g_x$ for all $x \in M$, where $\gamma_x$ is for any x some fixed path from m to x.

Any path $\gamma \in \mathcal{P}_{xy}$ can be written as $\gamma_x^{-1} (\gamma_x \gamma \gamma_y^{-1}) \gamma_y$, i.e. as a product of paths in $\mathcal{HG}$ and $\{\gamma_x\}$; thus, both conditions together are even equivalent to (4). From the first condition it follows that $g_m$ has to commute with all holonomies $h_A(\alpha)$, i.e. $g_m$ is contained in the centralizer $Z(H_A)$ of the holonomy group of A. Writing the second condition as
\[ g_x = h_A(\gamma_x)^{-1}\, g_m\, h_A(\gamma_x) \quad \text{for all } x \in M, \tag{5} \]
we see that an element $\overline{g}$ of the stabilizer of A is already completely determined by its value in the point m, i.e. by an element of the holonomy centralizer $Z(H_A)$. From this the isomorphy of $B(A)$ and $Z(H_A)$ follows immediately. Due to general theorems of the theory of transformation groups the gauge orbit $A \circ \overline{\mathcal{G}}$ is homeomorphic to the factor space $B(A) \backslash \overline{\mathcal{G}}$. Now, we define the subgroup $\overline{\mathcal{G}}_0 \subseteq \overline{\mathcal{G}}$ by $\pi_m^{-1}(e_{\mathbf{G}})$. This means it contains all gauge transforms that are trivial in m. Obviously, we have $\overline{\mathcal{G}} \cong \mathbf{G} \times \overline{\mathcal{G}}_0$. Since $B(A)$ and $Z(H_A) \cong Z(H_A) \times \{e_{\overline{\mathcal{G}}_0}\}$ are homeomorphic, we get for the moment heuristically
\[ B(A) \backslash \overline{\mathcal{G}} \;\cong\; \bigl( Z(H_A) \times \{e_{\overline{\mathcal{G}}_0}\} \bigr) \backslash \bigl( \mathbf{G} \times \overline{\mathcal{G}}_0 \bigr) \;\cong\; \bigl( Z(H_A) \backslash \mathbf{G} \bigr) \times \overline{\mathcal{G}}_0. \]
Using a rigorous argument we will prove that the left and the right space are indeed homeomorphic, i.e. the homeomorphism type of a gauge orbit is already determined by that of $Z(H_A) \backslash \mathbf{G}$. Consequently, two connections have homeomorphic gauge orbits, in particular, if the holonomy centralizers are conjugate. Finally, we can prove that the stabilizers of two connections are conjugate w.r.t. $\overline{\mathcal{G}}$ iff the corresponding holonomy centralizers are conjugate w.r.t. $\mathbf{G}$. This allows us to define the type of a connection not only (as known from the general theory of transformation groups) by the $\overline{\mathcal{G}}$-conjugacy class of its stabilizer $B(A)$, but equivalently by the $\mathbf{G}$-conjugacy class of its holonomy centralizer $Z(H_A)$. We mention again that in the following $\mathbf{G}$ is a compact Lie group. The purely algebraic results, of course, are valid also without this assumption.

3.1. Stabilizer of a connection.
Definition 3.1. Let $A \in \overline{\mathcal{A}}$. Then $E_A := A \circ \overline{\mathcal{G}} \equiv \{ A' \in \overline{\mathcal{A}} \mid \exists\, \overline{g} \in \overline{\mathcal{G}} : A' = A \circ \overline{g} \}$ is called the gauge orbit of A.

Obviously, two gauge orbits are equal or disjoint. We need some notations.

Definition 3.2. Let $A \in \overline{\mathcal{A}}$ be given.
1. The holonomy group $H_A$ of A is equal to $h_A(\mathcal{HG}) \subseteq \mathbf{G}$.
2. The centralizer $Z(H_A)$ of the holonomy group, also called holonomy centralizer of A, is the set of all elements in $\mathbf{G}$ that commute with all elements in $H_A$.
3. The base centralizer $B(A)$ of A is the set of all elements $\overline{g} = (g_x)_{x \in M}$ in $\overline{\mathcal{G}}$ such that $h_A(\gamma) = g_m^{-1}\, h_A(\gamma)\, g_x$ for all $x \in M$ and all paths $\gamma$ from m to x.
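Note that for regular connections the holonomy group defined above is exactly the holonomy group known from classical theory. We get immediately from the definitions

Lemma 3.1. Let $A \in \overline{\mathcal{A}}$ and $\overline{g} \in \overline{\mathcal{G}}$.
1. The holonomy group $H_A$ is a subgroup of $\mathbf{G}$.
2. $Z(H_A)$ is a closed subgroup of $\mathbf{G}$.
3. We have $H_{A \circ \overline{g}} = g_m^{-1} H_A\, g_m$ and $Z(H_{A \circ \overline{g}}) = g_m^{-1} Z(H_A)\, g_m$.
4. We have $\overline{g} \in B(A)$ iff a) $g_m \in Z(H_A)$ and b) for all $x \in M$ there is a path $\gamma$ from m to x with $h_A(\gamma) = g_m^{-1}\, h_A(\gamma)\, g_x$.

Proof. 1. This is an obvious consequence of the homomorphy property of $h_A : \mathcal{HG} \longrightarrow \mathbf{G}$.
2. Trivial.
3. This follows immediately from $h_{A \circ \overline{g}}(\alpha) = g_m^{-1}\, h_A(\alpha)\, g_m$ for all $\alpha \in \mathcal{HG}$.
4. $\Longrightarrow$ We have to prove only $g_m \in Z(H_A)$, but this is clear because we have $h_A(\alpha) = g_m^{-1}\, h_A(\alpha)\, g_m$ for all $\alpha \in \mathcal{HG}$ by assumption.
$\Longleftarrow$ Let $x \in M$ be fixed and $\delta$ be an arbitrary path from m to x. Choose a $\gamma$ such that $h_A(\gamma) = g_m^{-1}\, h_A(\gamma)\, g_x$. Then $\alpha := \delta \gamma^{-1} \in \mathcal{HG}$ and
\begin{align*}
g_m^{-1}\, h_A(\delta)\, g_x &= g_m^{-1}\, h_A(\alpha \gamma)\, g_x \\
&= g_m^{-1}\, h_A(\alpha)\, h_A(\gamma)\, g_x \\
&= h_A(\alpha)\, g_m^{-1}\, h_A(\gamma)\, g_x && \text{since } g_m \in Z(H_A) \\
&= h_A(\alpha)\, h_A(\gamma) && \text{by the choice of } \gamma \\
&= h_A(\delta).
\end{align*}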
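Now we can determine the stabilizer of a connection.

Proposition 3.2. For all $A \in \overline{\mathcal{A}}$ and all $\overline{g} \in \overline{\mathcal{G}}$ we have $A \circ \overline{g} = A \iff \overline{g} \in B(A)$.

Proof. By definition we have
\[ A \circ \overline{g} = A \iff \forall x, y \in M,\ \gamma \in \mathcal{P}_{xy} : \ h_A(\gamma) = h_{A \circ \overline{g}}(\gamma) = g_x^{-1}\, h_A(\gamma)\, g_y. \tag{6} \]
$\Longrightarrow$ Let $A \circ \overline{g} = A$. Due to (6), $g_m^{-1}\, h_A(\alpha)\, g_m = h_A(\alpha)$ holds for all $\alpha \in \mathcal{P}_{mm} \equiv \mathcal{HG}$, i.e. $g_m \in Z(H_A)$. Again by (6) we have $h_A(\gamma_x) = g_m^{-1}\, h_A(\gamma_x)\, g_x$ for all $x \in M$ and all $\gamma_x \in \mathcal{P}_{mx}$. Thus, $\overline{g} \in B(A)$.
$\Longleftarrow$ Let $\overline{g} \in B(A)$ and $x, y \in M$ be given. Choose some $\gamma_x \in \mathcal{P}_{mx}$, $\gamma_y \in \mathcal{P}_{my}$. Then for all $\gamma \in \mathcal{P}_{xy}$ the following holds:
\begin{align*}
g_x^{-1}\, h_A(\gamma)\, g_y &= g_x^{-1}\, h_A(\gamma_x^{-1} \gamma_x \gamma \gamma_y^{-1} \gamma_y)\, g_y \\
&= g_x^{-1}\, h_A(\gamma_x^{-1})\, g_m\, g_m^{-1}\, h_A(\gamma_x \gamma \gamma_y^{-1})\, g_m\, g_m^{-1}\, h_A(\gamma_y)\, g_y \\
&= \bigl( g_m^{-1}\, h_A(\gamma_x)\, g_x \bigr)^{-1}\, h_A(\gamma_x \gamma \gamma_y^{-1})\, \bigl( g_m^{-1}\, h_A(\gamma_y)\, g_y \bigr) && \text{(since } \gamma_x \gamma \gamma_y^{-1} \in \mathcal{HG} \text{ and } g_m \in Z(H_A)\text{)} \\
&= h_A(\gamma_x)^{-1}\, h_A(\gamma_x \gamma \gamma_y^{-1})\, h_A(\gamma_y) \\
&= h_A(\gamma).
\end{align*}
By (6) we have $A \circ \overline{g} = A$.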
Since for compact transformation groups every stabilizer is closed (see, e.g., [8]), we have using the proposition above

Corollary 3.3. $B(A)$ is a closed, hence compact subgroup of $\overline{\mathcal{G}}$.

Furthermore, by the proposition above we get $A \circ \overline{g}_1 = A \circ \overline{g}_2 \iff A \circ (\overline{g}_1 \circ \overline{g}_2^{-1}) = A \iff \overline{g}_1 \circ \overline{g}_2^{-1} \in B(A)$, i.e. we can identify $E_A$ and $B(A) \backslash \overline{\mathcal{G}}$ by
\[ \tau : B(A) \backslash \overline{\mathcal{G}} \longrightarrow E_A, \qquad [\overline{g}] \longmapsto A \circ \overline{g}. \]
Again by the general theory of compact transformation groups we get [8]

Proposition 3.4. $\tau : B(A) \backslash \overline{\mathcal{G}} \longrightarrow E_A$ is an equivariant isomorphism between compact Hausdorff spaces.

3.2. Isomorphy of $B(A)$ and $Z(H_A)$. In the next subsection we shall determine the homeomorphism class of a gauge orbit $E_A$. For that purpose, we should use the base centralizer. This object seems, at least at first glance, to be quite inaccessible from the algebraic point of view. However, looking carefully at its definition (Def. 3.2) one sees that for given A, due to $h_A(\gamma_x) = g_m^{-1}\, h_A(\gamma_x)\, g_x$, the value of $g_x$ is already determined by $g_m \in Z(H_A)$. Therefore, the base centralizer is completely determined by the holonomy centralizer.

Proposition 3.5. For any $A \in \overline{\mathcal{A}}$ the map
\[ \phi : B(A) \longrightarrow Z(H_A), \qquad \overline{g} \longmapsto g_m \]
is an isomorphism of Lie groups. (The topologies on $B(A)$ and $Z(H_A)$ are the relative ones induced by $\overline{\mathcal{G}}$ and $\mathbf{G}$, respectively.)

Proof. • Obviously, $\phi$ is a homomorphism.
• Surjectivity. Let $g \in Z(H_A)$. Choose for each $x \in M$ a path $\gamma_x$ from m to x (w.l.o.g. $\gamma_m$ is the trivial path) and define
\[ g_x := h_A(\gamma_x)^{-1}\, g\, h_A(\gamma_x). \tag{7} \]
Obviously, $\overline{g} = (g_x) \in \overline{\mathcal{G}}$ and $\phi(\overline{g}) = g$. By Lemma 3.1, 4 we have $\overline{g} \in B(A)$ because
1. $g_m = h_A(\gamma_m)^{-1}\, g\, h_A(\gamma_m) = g \in Z(H_A)$ by the triviality of $\gamma_m \in \mathcal{HG}$ and
2. $h_A(\gamma_x) = g_m^{-1}\, h_A(\gamma_x)\, g_x$ for the $\gamma_x$ chosen above.
• Injectivity. Clear, because $g_x$ is uniquely determined by A and $g_m$ due to $h_A(\gamma_x) = g_m^{-1}\, h_A(\gamma_x)\, g_x$.
• Continuity of $\phi$. $\phi$ is the restriction of $\pi_m : \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{G}}_m \equiv \mathbf{G}$ to $B(A)$. The continuity of $\phi$ is now a consequence of the continuity of $\pi_m$.
• Continuity of $\phi^{-1}$. $\phi : B(A) \longrightarrow Z(H_A)$ is a continuous and bijective map of a compact space onto a Hausdorff space. Therefore, $\phi^{-1}$ is continuous.

Finally, we note that obviously the isomorphism $\phi$ does not depend on the special choice of the paths $\gamma_x$.
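The algebraic content of this proposition can be checked by brute force in a finite toy model. The sketch below is an illustration only and makes several assumptions that are not in the paper: S3 replaces the compact Lie group, the manifold is replaced by a hypothetical graph with base vertex m = 0, one further vertex, and three parallel edges a, b, c from 0 to 1, and the function names are invented for this example. The loops a b^{-1} and a c^{-1} generate the loops at m, so their images generate the holonomy group.

```python
from itertools import permutations, product

# Assumption: S3 (as permutation tuples) stands in for the structure group.
GROUP = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def mul(p, q):
    """Group product p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def stabilizer(h):
    """All gauge transforms (g0, g1) with g0^{-1} h(e) g1 = h(e) for every edge e."""
    return [(g0, g1) for g0, g1 in product(GROUP, repeat=2)
            if all(mul(mul(inv(g0), he), g1) == he for he in h)]

def holonomy_centralizer(h):
    """Elements commuting with the holonomies of the loops a b^{-1} and a c^{-1}."""
    a, b, c = h
    generators = [mul(a, inv(b)), mul(a, inv(c))]
    return [z for z in GROUP
            if all(mul(z, x) == mul(x, z) for x in generators)]

def evaluation_at_m_is_bijection(h):
    """Check that (g0, g1) -> g0 maps the stabilizer bijectively onto the
    holonomy centralizer and that g1 is recovered as h(a)^{-1} g0 h(a)."""
    stab = stabilizer(h)
    cent = holonomy_centralizer(h)
    values = sorted(g0 for g0, _ in stab)
    recovered = all(g1 == mul(mul(inv(h[0]), g0), h[0]) for g0, g1 in stab)
    return values == sorted(cent) and len(set(values)) == len(stab) and recovered
```

Looping `evaluation_at_m_is_bijection` over all 216 connections of this toy graph is cheap; a connection with trivial holonomy has the full group as centralizer, while one whose holonomies generate S3 has only the identity.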
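3.3. Determination of the homeomorphism class. As we have seen in the last subsection, $B(A)$ and $Z(H_A) \times \{e_{\overline{\mathcal{G}}_0}\}$ are homeomorphic subgroups of $\overline{\mathcal{G}}$. One could conjecture that consequently
\[ B(A) \backslash \overline{\mathcal{G}} \quad \text{and} \quad \bigl( Z(H_A) \times \{e_{\overline{\mathcal{G}}_0}\} \bigr) \backslash \bigl( \mathbf{G} \times \overline{\mathcal{G}}_0 \bigr) \cong \bigl( Z(H_A) \backslash \mathbf{G} \bigr) \times \overline{\mathcal{G}}_0 \]
are homeomorphic. But this is not clear at all. For instance, $2\mathbb{Z}$ and $3\mathbb{Z}$ are isomorphic, but $\mathbb{Z}/2\mathbb{Z} = \{0, 1\}$ and $\mathbb{Z}/3\mathbb{Z} = \{0, 1, 2\}$ are not. Nevertheless, in our case the claimed relation holds:

Proposition 3.6. For any $A \in \overline{\mathcal{A}}$ there is a homeomorphism $\Psi_0 : \overline{\mathcal{G}}_0 \times Z(H_A) \backslash \mathbf{G} \longrightarrow B(A) \backslash \overline{\mathcal{G}}$. Hence, the homeomorphism type of $E_A$ is not only determined by $B(A) \backslash \overline{\mathcal{G}}$, but already by $Z(H_A) \backslash \mathbf{G}$.

Before we prove this proposition, we shall motivate our choice of the homeomorphism. First we again choose for each $x \in M$ a path $\gamma_x$ from m to x, where w.l.o.g. $\gamma_m$ is the trivial path. By Eq. (7) we get a homomorphism
\[ \phi' : \mathbf{G} \longrightarrow \overline{\mathcal{G}}, \qquad g \longmapsto \bigl( h_A(\gamma_x)^{-1}\, g\, h_A(\gamma_x) \bigr)_{x \in M} \]
with $\phi'(Z(H_A)) = B(A)$ and therefore a map from $Z(H_A) \backslash \mathbf{G}$ to $B(A) \backslash \overline{\mathcal{G}}$. Furthermore, we have $\phi'(\mathbf{G})\, \overline{\mathcal{G}}_0 = \overline{\mathcal{G}} \cong \phi'(\mathbf{G}) \times \overline{\mathcal{G}}_0$ with $\overline{g} \longmapsto \bigl( \phi'(g_m), \phi'(g_m)^{-1}\, \overline{g} \bigr)$. Although there is no group structure on $B(A) \backslash \overline{\mathcal{G}}$ (in general, $B(A)$ is only a subgroup and not a normal subgroup of $\overline{\mathcal{G}}$), there is at least a canonical right action of $\overline{\mathcal{G}}$ and $\overline{\mathcal{G}}_0$, respectively, by $[\overline{g}] \circ \overline{g}' := [\overline{g}\, \overline{g}']$. Thus, $(\overline{g}, [g]) \longmapsto [\phi'(g)] \circ \overline{g}$ is a good candidate for our desired homeomorphism.

Proof. First we choose some path $\gamma_x$ from m to x for each $x \in M$, where w.l.o.g. $\gamma_m$ is trivial. Now we define
\[ \Psi_0 : \overline{\mathcal{G}}_0 \times Z(H_A) \backslash \mathbf{G} \longrightarrow B(A) \backslash \overline{\mathcal{G}}, \qquad \bigl( (g_x)_{x \in M}, [g] \bigr) \longmapsto \bigl[ \phi'(g)\, (g_x)_{x \in M} \bigr] \quad \text{with } g_m = e_{\mathbf{G}}. \]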
Stratification of the Generalized Gauge Orbit Space
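1. $\Phi_0$ is well-defined.
Let $g_1 \sim g_2$, i.e. $g_1 = z g_2$ for some $z \in Z(H_{\overline A})$, and let $\overline g := (g_x)_{x\in M} \in \overline{\mathcal G}_0$. Then we have
$$\begin{aligned}
\Phi_0\bigl((g_x)_{x\in M}, [g_1]\bigr) &= [\phi'(g_1)\,\overline g] = [\phi'(z g_2)\,\overline g] \\
&= [\phi'(z)\,\phi'(g_2)\,\overline g] && \text{(homomorphy property of } \phi'\text{)} \\
&= [\phi'(g_2)\,\overline g] && (\phi'(Z(H_{\overline A})) = B(\overline A) \text{ by Proposition 3.5)} \\
&= \Phi_0\bigl((g_x)_{x\in M}, [g_2]\bigr).
\end{aligned}$$
2. $\Phi_0$ is injective.
Let $\Phi_0\bigl((g_{1,x})_{x\in M}, [g_1]\bigr) = \Phi_0\bigl((g_{2,x})_{x\in M}, [g_2]\bigr)$. Then there exists a $\overline z \in B(\overline A)$ with $\phi'(g_1)_x\, g_{1,x} = z_x\, \phi'(g_2)_x\, g_{2,x}$, i.e.
$$h_{\overline A}(\gamma_x)^{-1}\, g_1\, h_{\overline A}(\gamma_x)\, g_{1,x} = z_x\, h_{\overline A}(\gamma_x)^{-1}\, g_2\, h_{\overline A}(\gamma_x)\, g_{2,x}$$
for all $x \in M$. Thus,
• for $x = m$: $g_1 = z_m g_2$, i.e. $[g_1] = [g_2]$, and
• for $x \neq m$:
$$\begin{aligned}
g_{1,x} &= h_{\overline A}(\gamma_x)^{-1}\, g_1^{-1}\, h_{\overline A}(\gamma_x)\, z_x\, h_{\overline A}(\gamma_x)^{-1}\, g_2\, h_{\overline A}(\gamma_x)\, g_{2,x} \\
&= h_{\overline A}(\gamma_x)^{-1}\, g_1^{-1}\, z_m\, g_2\, h_{\overline A}(\gamma_x)\, g_{2,x} \\
&= h_{\overline A}(\gamma_x)^{-1}\, h_{\overline A}(\gamma_x)\, g_{2,x} = g_{2,x},
\end{aligned}$$
i.e. $\Phi_0$ is injective.
3. $\Phi_0$ is surjective.
Let $[\tilde g] \in B(\overline A)\backslash\overline{\mathcal G}$ be given. Define $g_x := \bigl(\phi'(\tilde g_m)^{-1}\,\tilde g\bigr)_x$ for all $x \in M$. Then we have $\Phi_0\bigl((g_x)_{x\in M}, [\tilde g_m]\bigr) = [\tilde g]$.
4. $\Phi_0^{-1}$ is continuous.
It is sufficient to prove that the projections $\mathrm{pr}_i \circ \Phi_0^{-1}$ of $\Phi_0^{-1}$ to the factors $\overline{\mathcal G}_0$ ($i = 1$) and $Z(H_{\overline A})\backslash\mathbf G$ ($i = 2$) are continuous.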
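a) $\mathrm{pr}_1 \circ \Phi_0^{-1}$ is continuous.
For all $x \in M \setminus \{m\}$ the map
$$\overline{\mathcal G} \xrightarrow{\ \pi_{mx}\ } \mathbf G \times \mathbf G \xrightarrow{\ \mathrm{mult.}\ } \mathbf G, \qquad \overline g \longmapsto (g_m, g_x) \longmapsto h_{\overline A}(\gamma_x)^{-1}\, g_m^{-1}\, h_{\overline A}(\gamma_x)\, g_x$$
is a composition of continuous maps and consequently continuous itself. Since $\pi_{B(\overline A)} : \overline{\mathcal G} \to B(\overline A)\backslash\overline{\mathcal G}$ is open and surjective, we get the continuity of $\pi_x \circ \mathrm{pr}_1 \circ \Phi_0^{-1}$ for all $x \in M \setminus \{m\}$ by
$$\pi_x \circ (\mathrm{pr}_1 \circ \Phi_0^{-1}) \circ \pi_{B(\overline A)} = \mathrm{mult.} \circ \pi_{mx}.$$
For $x = m$ the corresponding statement is trivial. Thus, $\mathrm{pr}_1 \circ \Phi_0^{-1}$ is continuous.
b) $\mathrm{pr}_2 \circ \Phi_0^{-1}$ is continuous.
We use $\pi_{Z(H_{\overline A})} \circ \pi_m = (\mathrm{pr}_2 \circ \Phi_0^{-1}) \circ \pi_{B(\overline A)} : \overline{\mathcal G} \to Z(H_{\overline A})\backslash\mathbf G$. The statement now follows because $\pi_{B(\overline A)}$ is an open and surjective map and $\pi_{Z(H_{\overline A})}$ and $\pi_m$ are continuous.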
5. $\Phi_0$ is a homeomorphism because $\Phi_0^{-1}$ is a continuous and bijective map of a compact space onto a Hausdorff space.
Thus we get the following important result: The homeomorphism class of a gauge orbit of a connection is completely determined by its holonomy centralizer. Finally, we should emphasize that, in general, the homeomorphism $\Phi_0$ is not an equivariant map w.r.t. the canonical action of $\overline{\mathcal G}$ on $\overline{\mathcal G}_0 \times Z(H_{\overline A})\backslash\mathbf G$.
3.4. Criteria for the homeomorphy of gauge orbits. It is well known that orbits of general transformation groups are classified by the conjugacy classes of their stabilizers. In our case this means that the gauge orbits are characterized by the conjugacy class of their corresponding base centralizer w.r.t. $\overline{\mathcal G}$. As we have already seen, the base centralizer of a connection $\overline A$ is isomorphic to the holonomy centralizer of $\overline A$, and the homeomorphism type of the gauge orbit is completely determined by that of $Z(H_{\overline A})\backslash\mathbf G$.
Now we are going to show that base centralizers are conjugate w.r.t. $\overline{\mathcal G}$ if and only if the corresponding holonomy centralizers are conjugate w.r.t. $\mathbf G$. This will allow us to define the type of a gauge orbit $E_{\overline A}$ to be the conjugacy class of $Z(H_{\overline A})$ w.r.t. $\mathbf G$. The investigation of the set of all these classes is much easier than in the case of classes in $\overline{\mathcal G}$. We want to prove the following
Proposition 3.7. Let $\overline A_1, \overline A_2 \in \overline{\mathcal A}$ be two generalized connections. Then the following statements are equivalent:
1. $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ are conjugate in $\mathbf G$.
2. $B(\overline A_1)$ and $B(\overline A_2)$ are conjugate in $\overline{\mathcal G}$.
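It would be quite easy to prove this directly using Proposition 3.5. Nevertheless, we do not want to do this. Instead, we shall first derive some concrete criteria for the homeomorphy of two gauge orbits. Finally, the proposition just claimed will be a nice by-product.
Proposition 3.8. Let $\overline A_1, \overline A_2 \in \overline{\mathcal A}$ be two generalized connections. Furthermore, let there exist an isomorphism $\Phi : \overline{\mathcal G} \to \overline{\mathcal G}$ of topological groups with $\Phi(B(\overline A_1)) = B(\overline A_2)$. Then the map
$$\Omega : E_{\overline A_1} \longrightarrow E_{\overline A_2}, \qquad \overline A_1 \circ \overline g \longmapsto \overline A_2 \circ \Phi(\overline g)$$
is a homeomorphism compatible with the action of $\overline{\mathcal G}$.
Proof.
• $\Omega$ is well-defined.
Let $\overline A_1 \circ \overline g = \overline A_1 \circ \overline g'$. Then we have $\overline A_1 \circ (\overline g \circ \overline g'^{-1}) = \overline A_1$, i.e. $\overline g \circ \overline g'^{-1} \in B(\overline A_1)$ by Proposition 3.2. By assumption we have $\Phi(\overline g \circ \overline g'^{-1}) = \Phi(\overline g) \circ \Phi(\overline g')^{-1} \in B(\overline A_2)$, i.e. $\overline A_2 \circ \Phi(\overline g) = \overline A_2 \circ \Phi(\overline g')$.
• Since $\Phi$ is a group isomorphism, $\Omega$ is again an isomorphism that is compatible with the action of $\overline{\mathcal G}$.
• For the proof of the homeomorphy property of $\Omega$ we consider the following commutative diagram:
$$\begin{array}{ccc}
E_{\overline A_1} & \xrightarrow{\ \Omega\ } & E_{\overline A_2} \\
{\scriptstyle \tau_1}\big\downarrow{\scriptstyle \cong} & & {\scriptstyle \cong}\big\downarrow{\scriptstyle \tau_2} \\
B(\overline A_1)\backslash\overline{\mathcal G} & \xrightarrow{\ \widetilde\Omega\ } & B(\overline A_2)\backslash\overline{\mathcal G}
\end{array}
\qquad\qquad
\begin{array}{ccc}
\overline A_1 \circ \overline g & \longmapsto & \overline A_2 \circ \Phi(\overline g) \\
\big\downarrow & & \big\downarrow \\
{[\overline g]}_{B(\overline A_1)} & \longmapsto & {[\Phi(\overline g)]}_{B(\overline A_2)}
\end{array}$$
Since $\tau_1$ and $\tau_2$ are homeomorphisms, it is sufficient to prove that the bottom map $\widetilde\Omega : [\overline g]_{B(\overline A_1)} \mapsto [\Phi(\overline g)]_{B(\overline A_2)}$ and its inverse $\widetilde\Omega^{-1}$ are continuous. But this follows immediately from the fact that $\pi_{B(\overline A)} : \overline{\mathcal G} \to B(\overline A)\backslash\overline{\mathcal G}$ is an orbit space projection and that $\widetilde\Omega \circ \pi_{B(\overline A_1)} = \pi_{B(\overline A_2)} \circ \Phi$.
To simplify the wording in what follows, we state
Definition 3.3. Let $G$ be a Lie group (topological group) and let $U_1$ and $U_2$ be closed subgroups of $G$. $U_1$ and $U_2$ are called extendibly isomorphic (w.r.t. $G$) iff there is an isomorphism $\psi : G \to G$ of Lie groups (topological groups) with $\psi(U_1) = U_2$.
If no misunderstanding is likely, we simply drop “w.r.t. G” and write “extendibly isomorphic”.
In Proposition 3.8 we compared gauge orbits w.r.t. their base centralizers. Now we will compare them using their holonomy centralizers. In order to manage this we need an extendibility lemma. Let the holonomy centralizers of two connections be extendibly isomorphic, i.e. let there exist a $\psi : \mathbf G \to \mathbf G$ with $\psi(Z(H_{\overline A_1})) = Z(H_{\overline A_2})$. By $\Phi := \phi_2^{-1} \circ \psi \circ \phi_1$ the base centralizers are isomorphic. Extending $\Phi$ to $\overline{\mathcal G}$ we get
Lemma 3.9. Let $\overline A_1, \overline A_2 \in \overline{\mathcal A}$ be two generalized connections. Then the following statement holds: If $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ are extendibly isomorphic, then $B(\overline A_1)$ and $B(\overline A_2)$ are also extendibly isomorphic.
We have explicitly: Let $\psi : \mathbf G \to \mathbf G$ be an isomorphism of Lie groups with $\psi(Z(H_{\overline A_1})) = Z(H_{\overline A_2})$. Furthermore, let $\gamma_x$ be an arbitrary, but fixed path in $M$ from $m$ to $x$ for each $x \in M$. Then we have:
• The map $\Phi : \overline{\mathcal G} \to \overline{\mathcal G}$ defined by
$$\Phi(\overline g)_x := h_{\overline A_2}(\gamma_x)^{-1}\; \psi\bigl(h_{\overline A_1}(\gamma_x)\, g_x\, h_{\overline A_1}(\gamma_x)^{-1}\bigr)\; h_{\overline A_2}(\gamma_x) \tag{8}$$
is an isomorphism of topological groups.
• $\Phi|_{B(\overline A_1)}$ is an isomorphism of Lie groups between $B(\overline A_1)$ and $B(\overline A_2)$. Furthermore, $\Phi|_{B(\overline A_1)}$ is independent of the choice of the paths $\gamma_x$.
Proof. Let $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ be extendibly isomorphic with the corresponding isomorphism $\psi$.
• Obviously, we have $\Phi(\overline g) \in \overline{\mathcal G}$ and $\Phi$ is a homomorphism of groups. Moreover, $\Phi$ is bijective with the inverse
$$\Phi^{-1}(\overline g)_x = h_{\overline A_1}(\gamma_x)^{-1}\; \psi^{-1}\bigl(h_{\overline A_2}(\gamma_x)\, g_x\, h_{\overline A_2}(\gamma_x)^{-1}\bigr)\; h_{\overline A_1}(\gamma_x). \tag{9}$$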
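To prove the continuity of $\Phi$ it is sufficient to prove the continuity of $\pi_x \circ \Phi$ for all $x$. Hence, let $U \subseteq \mathbf G$ be open. Then we have
$$\begin{aligned}
(\pi_x \circ \Phi)^{-1}(U) &= \bigl\{\overline g \in \overline{\mathcal G} \;\big|\; (\pi_x \circ \Phi)(\overline g) = \Phi(\overline g)_x = h_{\overline A_2}(\gamma_x)^{-1}\, \psi\bigl(h_{\overline A_1}(\gamma_x)\, g_x\, h_{\overline A_1}(\gamma_x)^{-1}\bigr)\, h_{\overline A_2}(\gamma_x) \in U \bigr\} \\
&= \pi_x^{-1}\Bigl( h_{\overline A_1}(\gamma_x)^{-1}\; \psi^{-1}\bigl(h_{\overline A_2}(\gamma_x)\bigr)\; \psi^{-1}(U)\; \psi^{-1}\bigl(h_{\overline A_2}(\gamma_x)^{-1}\bigr)\; h_{\overline A_1}(\gamma_x) \Bigr).
\end{aligned}$$
Since $\psi$ is a homeomorphism and $\pi_x$ is continuous, $(\pi_x \circ \Phi)^{-1}(U)$ is open. The continuity of $\Phi$ is now a consequence of Corollary 2.11; that of $\Phi^{-1}$ follows in the same way from (9).
• Let $\phi_i$ be the isomorphism for $\overline A_i$ ($i = 1, 2$) corresponding to Proposition 3.5. Then we have $\Phi|_{B(\overline A_1)} = \phi_2^{-1} \circ \psi \circ \phi_1 : B(\overline A_1) \to B(\overline A_2)$. Since $\phi_1$, $\phi_2$ and $\psi$ are Lie isomorphisms and, moreover, independent of the choice of the $\gamma_x$, $\Phi|_{B(\overline A_1)}$ is again an isomorphism of Lie groups that is independent of the choice of the $\gamma_x$.
Thus, $B(\overline A_1)$ and $B(\overline A_2)$ are extendibly isomorphic.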
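The algebraic part of Lemma 3.9 can be checked by brute force in a toy model. The following is only a sketch under stated assumptions: the structure group is replaced by the finite group S_3, the manifold by three points with m = 0, and the parallel transports `h1`, `h2`, the conjugating element `c` and the centralizer `Z1` are hypothetical data chosen for illustration, not objects taken from the paper. The code verifies that the map of Eq. (8) is a homomorphism, that Eq. (9) gives its inverse, and that it carries the toy base centralizer of the first connection onto that of the second.

```python
from itertools import permutations, product

def compose(p, q):
    """Group product p*q of permutations (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def mul(*factors):
    """Product of several group elements, multiplied left to right."""
    result = tuple(range(3))
    for f in factors:
        result = compose(result, f)
    return result

E = (0, 1, 2)                         # identity of S_3
G = list(permutations(range(3)))      # toy structure group (assumption)
M = range(3)                          # toy base points, m = 0 (assumption)

# Hypothetical parallel transports along gamma_x; trivial at x = m.
h1 = {0: E, 1: (1, 0, 2), 2: (1, 2, 0)}
h2 = {0: E, 1: (2, 1, 0), 2: (0, 2, 1)}

c = (2, 0, 1)                         # psi is conjugation by c (assumption)
def psi(g):
    return mul(inverse(c), g, c)
def psi_inv(g):
    return mul(c, g, inverse(c))

def Phi(g):
    """Eq. (8), evaluated componentwise."""
    return tuple(mul(inverse(h2[x]), psi(mul(h1[x], g[x], inverse(h1[x]))), h2[x]) for x in M)

def Phi_inv(g):
    """Eq. (9), evaluated componentwise."""
    return tuple(mul(inverse(h1[x]), psi_inv(mul(h2[x], g[x], inverse(h2[x]))), h1[x]) for x in M)

gauge_group = list(product(G, repeat=len(M)))   # all maps M -> G

def pointwise(g, k):
    return tuple(compose(g[x], k[x]) for x in M)

phi_table = {g: Phi(g) for g in gauge_group}
is_hom = all(phi_table[pointwise(g, k)] == pointwise(phi_table[g], phi_table[k])
             for g in gauge_group for k in gauge_group)
is_inv = all(Phi_inv(phi_table[g]) == g for g in gauge_group)

# Toy holonomy centralizers with psi(Z1) = Z2, and the induced base centralizers.
Z1 = [E, (1, 0, 2)]
Z2 = [psi(z) for z in Z1]
def base_centralizer(h, Z):
    return {tuple(mul(inverse(h[x]), z, h[x]) for x in M) for z in Z}
B1, B2 = base_centralizer(h1, Z1), base_centralizer(h2, Z2)
maps_B1_onto_B2 = {phi_table[b] for b in B1} == B2
```

The check is purely algebraic; the topological statements of the lemma (continuity, Lie group structure) have no counterpart in this finite model.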
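The next lemma is obvious.
Lemma 3.10. Let $\overline A_1, \overline A_2 \in \overline{\mathcal A}$ be two generalized connections. Then $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ are extendibly isomorphic provided they are conjugate w.r.t. $\mathbf G$.
Now we can prove Proposition 3.7.
Proof of Proposition 3.7.
• Let $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ be conjugate and thus also extendibly isomorphic. The map $\Phi : \overline{\mathcal G} \to \overline{\mathcal G}$ from Lemma 3.9 now fulfills
$$\Phi(\overline g') = \Bigl(\bigl(h_{\overline A_1}(\gamma_x)^{-1}\, g\, h_{\overline A_2}(\gamma_x)\bigr)^{-1}\; g'_x\; \bigl(h_{\overline A_1}(\gamma_x)^{-1}\, g\, h_{\overline A_2}(\gamma_x)\bigr)\Bigr)_{x\in M},$$
where $g \in \mathbf G$ was chosen such that $Z(H_{\overline A_2}) = (\mathrm{Ad}\, g)\, Z(H_{\overline A_1})$. We define $\overline g := \bigl(h_{\overline A_1}(\gamma_x)^{-1}\, g\, h_{\overline A_2}(\gamma_x)\bigr)_{x\in M}$. Hence, the map $\Phi$ from Lemma 3.9 is simply $\mathrm{Ad}\, \overline g$. Moreover, $\mathrm{Ad}\, \overline g$ maps $B(\overline A_1)$ isomorphically onto $B(\overline A_2)$. Thus, $B(\overline A_2) = (\mathrm{Ad}\, \overline g)\, B(\overline A_1)$.
• Let $B(\overline A_1)$ and $B(\overline A_2)$ be conjugate, i.e. let there exist a $\overline g \in \overline{\mathcal G}$ with $B(\overline A_2) = \overline g^{-1} B(\overline A_1)\, \overline g$. Then we obviously have $Z(H_{\overline A_2}) = g_m^{-1}\, Z(H_{\overline A_1})\, g_m$.
Let us summarize:
Theorem 3.11. Let $\overline A_1, \overline A_2 \in \overline{\mathcal A}$ be two generalized connections. Then the following implication chain holds:
$B(\overline A_1)$ and $B(\overline A_2)$ are conjugate in $\overline{\mathcal G}$.
$\Longleftrightarrow$ $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ are conjugate in $\mathbf G$.
$\Longrightarrow$ $Z(H_{\overline A_1})$ and $Z(H_{\overline A_2})$ are extendibly isomorphic.
$\Longrightarrow$ $B(\overline A_1)$ and $B(\overline A_2)$ are extendibly isomorphic.
$\Longrightarrow$ The gauge orbits $E_{\overline A_1}$ and $E_{\overline A_2}$ are homeomorphic.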
This theorem has an interesting and perhaps slightly surprising consequence: Even after projecting $\overline{\mathcal A}$ down to $\overline{\mathcal A}/\overline{\mathcal G} \equiv \mathrm{Hom}(\mathcal{HG}, \mathbf G)/\mathrm{Ad}$, the complete knowledge about the homeomorphism class of the corresponding gauge orbit is preserved. Naively one would expect that this information is lost after projecting the total gauge orbit onto one single point. But the homeomorphism class is already determined by the holonomy centralizer, which, conversely, can be reconstructed from $[\overline A]$ up to a global conjugation.
Proposition 3.12. For each $[\overline A] \in \overline{\mathcal A}/\overline{\mathcal G}$ the homeomorphism class of the gauge orbit corresponding to $[\overline A]$ can be reconstructed from $[\overline A]$.
3.5. Discussion on how to define the gauge orbit type. If we ignored the usual definition of the type of an orbit in a general G-space, then Theorem 3.11 would open up several possibilities to define the type of a gauge orbit. If the type should characterize the homeomorphism class of the gauge orbit as “uniquely” as possible, then it would be advisable to define the base centralizer modulo extendible isomorphy to be the type. But even this choice would not guarantee that two gauge orbits with different type are in fact non-homeomorphic. Moreover, the base centralizers as subgroups of $\overline{\mathcal G}$ are not as easily controllable as centralizers in $\mathbf G$ are. Thus, we will take the holonomy centralizer for the definition. Only one question remains: should we take the centralizer modulo conjugation or modulo extendible isomorphy? We have to collect conjugate centralizers in one type anyway in order to make points of one orbit be of the same type. (Note that the holonomy centralizers of two gauge equivalent connections are generally not equal but only conjugate.) If we now include the general definition of an orbit type into our considerations again, it becomes clear that we shall use the centralizer modulo conjugation.
Since two connections have one and the same (usual) orbit type iff their base centralizers are conjugate, i.e. iff their holonomy centralizers are conjugate, we define the gauge orbit type by
Definition 3.4. The type of a gauge orbit $E_{\overline A}$ is the holonomy centralizer of $\overline A$ modulo conjugation. We write $\mathrm{Typ}([\overline A])$ or simply $\mathrm{Typ}(\overline A)$.
We emphasize that this definition of the type of the gauge orbit $E_{\overline A}$ is, as mentioned above, independent of the choice of the connection $\overline A \in E_{\overline A}$. In fact, if $\overline A'$ is gauge equivalent to $\overline A$, by Lemma 3.1 there is a $g \in \mathbf G$ with $Z(H_{\overline A'}) = g^{-1}\, Z(H_{\overline A})\, g$. Hence, the holonomy centralizers of $\overline A$ and $\overline A'$ are conjugate. Thus, we can assign to each $[\overline A] \in \overline{\mathcal A}/\overline{\mathcal G}$ a unique gauge orbit type. Using Theorem 3.11 we get immediately
Corollary 3.13. Two gauge orbits with the same type are homeomorphic.
Finally, we want to give a further justification for our definition of the gauge orbit type. Let us consider regular connections. In the literature there are two different definitions for the type of a “classical” gauge orbit: On the one hand [14], one chooses the total stabilizer of A ∈ A in G. On the other hand [15], one sees first that the pointed gauge group G0 (the set of all gauge transforms that are the identity on a fixed fibre) is a normal and closed subgroup in G. Obviously, G := G/G0 can be identified with the structure group G. Moreover, the action of G0 on A is free, proper and smooth. This way one gets an action of G, the "essential part" of the gauge transforms, on the space
$\mathcal A/\mathcal G_0$. Now, the gauge orbit types are the conjugacy classes of stabilizers being closed subgroups of $\mathcal G/\mathcal G_0 \cong \mathbf G$. This definition corresponds to our choice of the centralizer of the holonomy group. Due to the statements proven above these two descriptions are equivalent if we consider generalized connections, but in general not if we work in the classical framework. There it is under certain circumstances possible [15] that two connections have conjugate holonomy centralizers, but this conjugation cannot be lifted to a conjugation of the base centralizers. The deeper reason behind this is that the gauge transform $\overline g = \bigl(h_{\overline A_1}(\gamma_x)^{-1}\, g\, h_{\overline A_2}(\gamma_x)\bigr)_{x\in M}$ (cf. proof of Proposition 3.7) generally is not a classical gauge transform, i.e. it is not smooth. Nevertheless, in case of the definition using the holonomy group we have
Corollary 3.14. The gauge orbit type is conserved by the embedding $\mathcal A \hookrightarrow \overline{\mathcal A}$.
But note that this does not mean at all that the classical and the generalized gauge orbit of a classical connection are equal or at least homeomorphic.
4. Stratification of the Generalized Gauge Orbit Space
Throughout this section let $\mathbf G$ be a compact Lie group. The main goal of this section is to prove that $\overline{\mathcal A}$ admits a stratification. The basic concept here is the notion of a stratum. A stratum simply collects all connections having the same type. Natural questions are: Where in $\overline{\mathcal A}$ does which stratum lie? Which strata are “bigger” or “smaller”? Which stratum is perhaps the boundary of another? The aim of a stratification theorem is to reduce these geometrical questions to a certain ordering of the types of the strata under consideration. As we have seen above, types can be characterized by subgroups of $\mathbf G$ that, in turn, are naturally ordered by the inclusion relation. And indeed this will induce an appropriate ordering of the strata.
For the proof of the stratification theorem we will need first a lemma showing that the orbit type of every connection is determined by a finite set of holonomies. This allows us to lift the slice theorem from a certain Gk to A. Together with a denseness theorem we get the stratification.
4.1. Partial ordering of types. Since every gauge orbit type is an equivalence class of the centralizer of the corresponding holonomy group, the following notation is useful. Definition 4.1. A subgroup U of G is called Howe subgroup iff there is a set V ⊆ G with U = Z(V ). Analogously to the general theory we define a partial ordering for the gauge orbit types [9]. Definition 4.2. Let T denote the set of all conjugacy classes of Howe subgroups of G. Let t1 , t2 ∈ T . Then t1 ≤ t2 holds iff there are G1 ∈ t1 and G2 ∈ t2 with G1 ⊇ G2 . Obviously, we have Lemma 4.1. The maximal element in T is the class tmax of the center Z(G) of G, the minimal is the class tmin of G itself. Every connection whose type equals tmax will be called generic.
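Definitions 4.1 and 4.2 and Lemma 4.1 can be made concrete in a finite group, which is a compact Lie group of dimension zero. The following sketch is an illustration only: the choice of S_3 and all helper names (`centralizer`, `conjugacy_class`, `leq`) are assumptions made for the example. It enumerates all Howe subgroups of S_3, collects them into conjugacy classes, and checks that the class of G is the minimal and the class of the center the maximal element, while the ordering is only partial.

```python
from itertools import permutations, combinations

def compose(p, q):
    """Group product p*q of permutations (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

G = list(permutations(range(3)))  # toy group S_3 (assumption)

def centralizer(subset):
    """Z(V): all elements of G commuting with every element of V."""
    return frozenset(g for g in G if all(compose(g, v) == compose(v, g) for v in subset))

def conjugacy_class(subgroup):
    """The set of all conjugates g^{-1} U g of a subgroup U."""
    return frozenset(
        frozenset(compose(compose(inverse(g), u), g) for u in subgroup) for g in G
    )

# Howe subgroups are exactly the centralizers Z(V) of subsets V of G.
howe = {centralizer(V) for r in range(len(G) + 1) for V in combinations(G, r)}
types = {conjugacy_class(U) for U in howe}

def leq(t1, t2):
    """t1 <= t2 iff there are G1 in t1 and G2 in t2 with G1 containing G2."""
    return any(G1 >= G2 for G1 in t1 for G2 in t2)

t_min = conjugacy_class(frozenset(G))     # class of G itself
t_max = conjugacy_class(centralizer(G))   # class of the center Z(G)

# Types of the cyclic subgroups of order 2 and 3; neither contains the other.
t_c2 = conjugacy_class(centralizer([(1, 0, 2)]))
t_c3 = conjugacy_class(centralizer([(1, 2, 0)]))
incomparable = not leq(t_c2, t_c3) and not leq(t_c3, t_c2)
```

In this toy case the Howe subgroups are S_3, its three subgroups of order two, the subgroup of order three, and the trivial group, so the ordering has one minimal type, one maximal type, and two incomparable types in between.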
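Definition 4.3. Let $t \in \mathcal T$. We define the following expressions:
$$\overline{\mathcal A}_{\geq t} := \{\overline A \in \overline{\mathcal A} \mid \mathrm{Typ}(\overline A) \geq t\}, \qquad \overline{\mathcal A}_{=t} := \{\overline A \in \overline{\mathcal A} \mid \mathrm{Typ}(\overline A) = t\}, \qquad \overline{\mathcal A}_{\leq t} := \{\overline A \in \overline{\mathcal A} \mid \mathrm{Typ}(\overline A) \leq t\}.$$
All the $\overline{\mathcal A}_{=t}$ are called strata. The justification for the notation “strata” can be found in Subsect. 4.5.
4.2. Reducing the problem to finite-dimensional G-spaces.
4.2.1. Finiteness lemma for centralizers. We start with the crucial
Lemma 4.2. Let $U$ be a nonempty subset of a compact Lie group $\mathbf G$. Then there exist an $n \in \mathbb N$ and $u_1, \ldots, u_n \in U$, such that $Z(\{u_1, \ldots, u_n\}) = Z(U)$.
Proof.
• The case $Z(U) = \mathbf G$ is trivial.
• Let $Z(U) \neq \mathbf G$. Then there is a $u_1 \in U$ with $Z(\{u_1\}) \neq \mathbf G$. Choose now for $i \geq 1$ successively $u_{i+1} \in U$ with $Z(\{u_1, \ldots, u_i\}) \supsetneq Z(\{u_1, \ldots, u_{i+1}\})$ as long as there is such a $u_{i+1}$. This procedure stops after a finite number of steps, since each nonincreasing sequence of compact subgroups in $\mathbf G$ stabilizes [9]. (Centralizers are always closed, thus compact.) Therefore there is an $n \in \mathbb N$, such that $Z(\{u_1, \ldots, u_n\}) = Z(\{u_1, \ldots, u_n\} \cup \{u\})$ for all $u \in U$. Thus, we have
$$Z(\{u_1, \ldots, u_n\}) = \bigcap_{u\in U} Z(\{u_1, \ldots, u_n\} \cup \{u\}) = Z(\{u_1, \ldots, u_n\} \cup U) = Z(U).$$
Corollary 4.3. Let $\overline A \in \overline{\mathcal A}$. Then there is a finite set $\boldsymbol\alpha \subseteq \mathcal{HG}$, such that $Z(H_{\overline A}) = Z(h_{\overline A}(\boldsymbol\alpha))$.
We set $h_{\overline A}(\boldsymbol\alpha) := \{h_{\overline A}(\alpha_1), \ldots, h_{\overline A}(\alpha_n)\} \subseteq \mathbf G$, where $n := \#\boldsymbol\alpha$. To avoid cumbersome notation we also denote $\bigl(h_{\overline A}(\alpha_1), \ldots, h_{\overline A}(\alpha_n)\bigr) \in \mathbf G^n$ by $h_{\overline A}(\boldsymbol\alpha)$. It should be clear from the context what is meant. Furthermore, $\boldsymbol\alpha$ is always finite.
Proof. Due to $H_{\overline A} \subseteq \mathbf G$ and the lemma just proven there are an $n \in \mathbb N$ and $g_1, \ldots, g_n \in H_{\overline A}$ with $Z(\{g_1, \ldots, g_n\}) = Z(H_{\overline A})$. On the other hand, since $g_1, \ldots, g_n \in H_{\overline A}$, there are $\alpha_1, \ldots, \alpha_n \in \mathcal{HG}$ with $g_i = h_{\overline A}(\alpha_i)$ for all $i = 1, \ldots, n$.
4.2.2. Reduction mapping.
Definition 4.4. Let $\boldsymbol\alpha \subseteq \mathcal{HG}$. Then the map
$$\varphi_{\boldsymbol\alpha} : \overline{\mathcal A} \longrightarrow \mathbf G^{\#\boldsymbol\alpha}, \qquad \overline A \longmapsto h_{\overline A}(\boldsymbol\alpha)$$
is called reduction mapping.
Lemma 4.4. Let $\boldsymbol\alpha \subseteq \mathcal{HG}$ be arbitrary. Then $\varphi_{\boldsymbol\alpha}$ is continuous, and for all $\overline A \in \overline{\mathcal A}$ and $\overline g \in \overline{\mathcal G}$ we have $\varphi_{\boldsymbol\alpha}(\overline A \circ \overline g) = \varphi_{\boldsymbol\alpha}(\overline A) \circ g_m$.
Here $\mathbf G$ acts on $\mathbf G^{\#\boldsymbol\alpha}$ by the adjoint map.
Proof. $\varphi_{\boldsymbol\alpha}$ is continuous by the continuity criterion for maps into product spaces and by the fact that each $\alpha \in \boldsymbol\alpha$ is a finite product of edges of graphs. The compatibility with the group action follows from $h_{\overline A \circ \overline g}(\alpha) = g_m^{-1}\, h_{\overline A}(\alpha)\, g_m$.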
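The selection procedure in the proof of Lemma 4.2 can be sketched for a finite group, where termination is automatic. This is a toy illustration, not the general compact case: the group S_4, the subset `U` and the function name `finite_centralizer_witnesses` are assumptions made for the example. A single pass over `U` suffices here, because an element that does not shrink the current centralizer cannot shrink any later, smaller one.

```python
from itertools import permutations

def compose(p, q):
    """Group product p*q of permutations (apply q first, then p)."""
    return tuple(p[i] for i in q)

G = list(permutations(range(4)))  # toy group S_4 (assumption)

def centralizer(elements):
    """Z(V): all elements of G commuting with every element of V."""
    return frozenset(g for g in G if all(compose(g, u) == compose(u, g) for u in elements))

def finite_centralizer_witnesses(U):
    """Pick u_1, ..., u_n from U with Z({u_1, ..., u_n}) = Z(U).

    Mirrors the proof: keep adding elements of U as long as the
    centralizer becomes strictly smaller.
    """
    chosen = []
    current = centralizer(chosen)          # Z(empty set) = G
    for u in U:
        smaller = centralizer(chosen + [u])
        if smaller < current:              # proper inclusion
            chosen.append(u)
            current = smaller
    return chosen, current

U = G                                      # hypothetical subset: the whole group
chosen, Z = finite_centralizer_witnesses(U)
```

For a compact Lie group the same loop is not an algorithm in the literal sense, since U may be infinite; the proof replaces the finite pass by the fact that nonincreasing sequences of compact subgroups stabilize.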
4.2.3. Adjoint action of $\mathbf G$ on $\mathbf G^n$. In this short paragraph we will summarize the most important facts about the adjoint action of $\mathbf G$ on $\mathbf G^n$ that can be deduced from the general theory of transformation groups (see, e.g., [8]). First we determine the stabilizer $\mathbf G_{\vec g}$ of an element $\vec g = (g_1, \ldots, g_n) \in \mathbf G^n$. We have
$$\mathbf G_{\vec g} = \{g \in \mathbf G \mid \vec g \circ g = \vec g\} = \{g \in \mathbf G \mid g^{-1} g_i\, g = g_i \ \ \forall i\} = Z(\{g_1, \ldots, g_n\}).$$
Consequently, we have for the type of the corresponding orbit $\mathrm{Typ}(\vec g) = [\mathbf G_{\vec g}] = [Z(\{g_1, \ldots, g_n\})]$. The slice theorem now reads as follows:
Proposition 4.5. Let $\vec g \in \mathbf G^n$. Then there is an $S \subseteq \mathbf G^n$ with $\vec g \in S$, such that:
• $S \circ \mathbf G$ is an open neighbourhood of $\vec g \circ \mathbf G$ and
• there is an equivariant retraction $f : S \circ \mathbf G \to \vec g \circ \mathbf G$ with $f^{-1}(\{\vec g\}) = S$.
Both on $\overline{\mathcal A}$ and on $\mathbf G^n$ the type is (the class of) a Howe subgroup of $\mathbf G$. The transformation behaviour of the types under a reduction mapping is stated in the next
Proposition 4.6. Any reduction mapping is type-minorifying, i.e. for all $\boldsymbol\alpha \subseteq \mathcal{HG}$ and all $\overline A \in \overline{\mathcal A}$ we have $\mathrm{Typ}(\varphi_{\boldsymbol\alpha}(\overline A)) \leq \mathrm{Typ}(\overline A)$.
Proof. We have $\mathrm{Typ}(\varphi_{\boldsymbol\alpha}(\overline A)) = [Z(\varphi_{\boldsymbol\alpha}(\overline A))] \equiv [Z(h_{\overline A}(\boldsymbol\alpha))] \leq [Z(H_{\overline A})] = \mathrm{Typ}(\overline A)$.
4.3. Slice theorem for $\overline{\mathcal A}$. We now state the main theorem of the present paper.
Theorem 4.7. There is a tubular neighbourhood for any gauge orbit. Equivalently we have: For all $\overline A \in \overline{\mathcal A}$ there is an $\overline S \subseteq \overline{\mathcal A}$ with $\overline A \in \overline S$, such that:
• $\overline S \circ \overline{\mathcal G}$ is an open neighbourhood of $\overline A \circ \overline{\mathcal G}$ and
• there is an equivariant retraction $F : \overline S \circ \overline{\mathcal G} \to \overline A \circ \overline{\mathcal G}$ with $F^{-1}(\{\overline A\}) = \overline S$.
4.3.1. The idea. Our proof imitates in a certain sense the proof of the standard slice theorem (see, e.g., [8]), which is valid for the action of a finite-dimensional compact Lie group $G$ on a Hausdorff space $X$. Let us review the main idea of this proof. Let $x \in X$ be given, and let $H \subseteq G$ be the stabilizer of $x$, i.e., $[H]$ is an orbit type on the $G$-space $X$. Now, this situation is simulated on an $\mathbb R^n$, i.e., for an appropriate action of $G$ on $\mathbb R^n$ one chooses a point with stabilizer $H$. So the orbits on $X$ and on $\mathbb R^n$ can be identified. For the case of $\mathbb R^n$ the proof of a slice theorem is not very complicated. The crucial point of the general proof is the usage of the Tietze-Gleason extension theorem because this yields an equivariant extension $\psi : X \to \mathbb R^n$, mapping one orbit onto the other. Finally, by means of $\psi$ the slice theorem can be lifted from $\mathbb R^n$ to $X$. What can we learn for our problem? Obviously, $\overline{\mathcal G}$ is not a finite-dimensional Lie group.
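But we know that the stabilizer $B(\overline A)$ of a connection is homeomorphic to the centralizer $Z(H_{\overline A})$ of the holonomy group, which is a subgroup of $\mathbf G$. Since every centralizer is finitely generated in the sense of Lemma 4.2, $Z(H_{\overline A})$ equals $Z(h_{\overline A}(\boldsymbol\alpha))$ with an appropriate finite $\boldsymbol\alpha \subseteq \mathcal{HG}$. This is nothing but the stabilizer of the point $h_{\overline A}(\boldsymbol\alpha)$ under the adjoint action of $\mathbf G$ on $\mathbf G^n$. Thus, the reduction mapping $\varphi_{\boldsymbol\alpha}$ is the desired equivalent for $\psi$.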
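The identification of the stabilizer under the adjoint action with a centralizer, which the idea above relies on, can be verified exhaustively in a small case. The sketch below assumes the toy group S_3 and pairs of group elements (n = 2); the function names are chosen for this example only. It also checks the orbit-stabilizer count, the finite shadow of the homeomorphism between an orbit and the quotient of the acting group by the stabilizer.

```python
from itertools import permutations, product

def compose(p, q):
    """Group product p*q of permutations (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

G = list(permutations(range(3)))  # toy group S_3 (assumption)

def act(gs, g):
    """Right adjoint action on G^n: (g_1, ..., g_n) o g = (g^{-1} g_i g)_i."""
    g_inv = inverse(g)
    return tuple(compose(compose(g_inv, x), g) for x in gs)

def stabilizer(gs):
    return {g for g in G if act(gs, g) == gs}

def centralizer(elements):
    return {g for g in G if all(compose(g, x) == compose(x, g) for x in elements)}

def orbit(gs):
    return {act(gs, g) for g in G}

# Check every point of G^2: the stabilizer is the centralizer of the entries,
# and the orbit has as many points as there are cosets of the stabilizer.
all_ok = all(
    stabilizer(gs) == centralizer(gs)
    and len(orbit(gs)) * len(stabilizer(gs)) == len(G)
    for gs in product(G, repeat=2)
)
```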
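We are now looking for an appropriate $\overline S \subseteq \overline{\mathcal A}$, such that
$$F : \overline S \circ \overline{\mathcal G} \longrightarrow \overline A \circ \overline{\mathcal G}, \qquad \overline A' \circ \overline g \longmapsto \overline A \circ \overline g$$
is well-defined and has the desired properties. In order to make $F$ well-defined, we need $\overline A' \circ \overline g = \overline A' \Longrightarrow \overline A \circ \overline g = \overline A$ for all $\overline A' \in \overline S$ and $\overline g \in \overline{\mathcal G}$, i.e. $B(\overline A') \subseteq B(\overline A)$. Applying the projections $\pi_x$ on the stabilizers we get for $\gamma_x \in \mathcal P_{mx}$ (let $\gamma_m$ be the trivial path)
$$h_{\overline A'}(\gamma_x)^{-1}\, Z(H_{\overline A'})\, h_{\overline A'}(\gamma_x) = \pi_x(B(\overline A')) \subseteq \pi_x(B(\overline A)) = h_{\overline A}(\gamma_x)^{-1}\, Z(H_{\overline A})\, h_{\overline A}(\gamma_x),$$
thus
$$Z(H_{\overline A'}) \subseteq h_{\overline A'}(\gamma_x)\, h_{\overline A}(\gamma_x)^{-1}\; Z(H_{\overline A})\; h_{\overline A}(\gamma_x)\, h_{\overline A'}(\gamma_x)^{-1} \tag{10}$$
for all $x \in M$. In particular, we have $Z(H_{\overline A'}) \subseteq Z(H_{\overline A})$ for $x = m$. Now we choose an $\boldsymbol\alpha \subseteq \mathcal{HG}$ with $Z(H_{\overline A}) = Z(h_{\overline A}(\boldsymbol\alpha))$, an $S \subseteq \mathbf G^{\#\boldsymbol\alpha}$ and an equivariant retraction $f : S \circ \mathbf G \to \varphi_{\boldsymbol\alpha}(\overline A) \circ \mathbf G$. Since equivariant mappings magnify stabilizers (or at least do not reduce them), we have $Z(\vec g') \subseteq Z(\varphi_{\boldsymbol\alpha}(\overline A))$ for all $\vec g' \in S$. Therefore, the condition (10) would be fulfilled, e.g., if we had for all $\overline A' \in \overline S$,
1. $\varphi_{\boldsymbol\alpha}(\overline A') \in S$ and
2. $h_{\overline A'}(\gamma_x) = h_{\overline A}(\gamma_x)$ for all $x \in M$,
because the first condition implies
$$Z(H_{\overline A'}) \subseteq Z(h_{\overline A'}(\boldsymbol\alpha)) \equiv Z(\varphi_{\boldsymbol\alpha}(\overline A')) \subseteq Z(\varphi_{\boldsymbol\alpha}(\overline A)) = Z(H_{\overline A}).$$
We could now choose $\overline S$ such that these two conditions are fulfilled. However, this would imply $F^{-1}(\{\overline A\}) \supsetneq \overline S$ in general, because for $\overline g \in B(\overline A)$ together with $\overline A'$ the connection $\overline A' \circ \overline g$ is contained in $F^{-1}(\{\overline A\})$ as well (we have $F(\overline A') = \overline A = \overline A \circ \overline g = F(\overline A' \circ \overline g)$), but $\overline A' \circ \overline g$ need no longer fulfill the two conditions above. It is now natural to define $\overline S$ as the set of all connections fulfilling these conditions, multiplied with $B(\overline A)$. And indeed, the well-definedness remains valid.
4.3.2. The proof.
Proof.
1. Let $\overline A \in \overline{\mathcal A}$. Choose for $\overline A$ an $\boldsymbol\alpha \subseteq \mathcal{HG}$ with $Z(H_{\overline A}) = Z(h_{\overline A}(\boldsymbol\alpha))$ according to Corollary 4.3 and denote the corresponding reduction mapping $\varphi_{\boldsymbol\alpha} : \overline{\mathcal A} \to \mathbf G^{\#\boldsymbol\alpha}$ shortly by $\varphi$.
2. Due to Proposition 4.5 there is an $S \subseteq \mathbf G^{\#\boldsymbol\alpha}$ with $\varphi(\overline A) \in S$, such that
• $S \circ \mathbf G$ is an open neighbourhood of $\varphi(\overline A) \circ \mathbf G$ and
• there exists an equivariant mapping $f$ with
– $f : S \circ \mathbf G \to \varphi(\overline A) \circ \mathbf G$ and
– $f^{-1}(\{\varphi(\overline A)\}) = S$.
3. We define the mapping $\psi : \overline{\mathcal A} \to \overline{\mathcal G}$, $\overline A' \mapsto \bigl(h_{\overline A'}(\gamma_x)\bigr)_{x\in M}$, where for all $x \in M \setminus \{m\}$ the (arbitrary, but fixed) path $\gamma_x$ runs from $m$ to $x$ and $\gamma_m$ is the trivial path.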
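4. As we motivated above we set
$$\overline S_0 := \varphi^{-1}(S) \cap \psi^{-1}(\psi(\overline A)), \qquad \overline S := \bigl(\varphi^{-1}(S) \cap \psi^{-1}(\psi(\overline A))\bigr) \circ B(\overline A) \equiv \overline S_0 \circ B(\overline A)$$
and
$$F : \overline S \circ \overline{\mathcal G} \longrightarrow \overline A \circ \overline{\mathcal G}, \qquad \overline A' \circ \overline g \longmapsto \overline A \circ \overline g.$$
5. $F$ is well-defined.
• Let $\overline A' \circ \overline g' = \overline A'' \circ \overline g''$ with $\overline A', \overline A'' \in \overline S$ and $\overline g', \overline g'' \in \overline{\mathcal G}$. Then there exist $\overline z', \overline z'' \in B(\overline A)$ with $\overline A' = \overline A'_0 \circ \overline z'$ and $\overline A'' = \overline A''_0 \circ \overline z''$ as well as $\overline A'_0, \overline A''_0 \in \overline S_0$.
• Due to $\overline S_0 \subseteq \psi^{-1}(\psi(\overline A))$ we have $\psi(\overline A'_0) = \psi(\overline A) = \psi(\overline A''_0)$, i.e. $h_{\overline A'_0}(\gamma_x) = h_{\overline A}(\gamma_x) = h_{\overline A''_0}(\gamma_x)$ for all $x$.
• Furthermore, we have
$$\begin{aligned}
f(\varphi(\overline A' \circ \overline g')) &= f(\varphi(\overline A'_0 \circ \overline z' \circ \overline g')) \\
&= f(\varphi(\overline A'_0) \circ z'_m \circ g'_m) && (\varphi \text{ “equivariant”}) \\
&= f(\varphi(\overline A'_0)) \circ z'_m \circ g'_m && (f \text{ equivariant}) \\
&= \varphi(\overline A) \circ z'_m \circ g'_m && (\varphi(\overline A'_0) \in S) \\
&= \varphi(\overline A \circ \overline z') \circ g'_m && (\varphi \text{ “equivariant”}) \\
&= \varphi(\overline A) \circ g'_m && (\overline z' \in B(\overline A))
\end{aligned}$$
and analogously $f(\varphi(\overline A'' \circ \overline g'')) = \varphi(\overline A) \circ g''_m$. Therefore, we have $\varphi(\overline A) \circ g'_m = \varphi(\overline A) \circ g''_m$, i.e. $g'_m (g''_m)^{-1}$ is an element of the stabilizer of $\varphi(\overline A)$, thus $g'_m (g''_m)^{-1} \in Z(\varphi(\overline A)) = Z(H_{\overline A})$.
• Since $\overline A'_0 \circ \overline z' \circ \overline g' = \overline A''_0 \circ \overline z'' \circ \overline g''$, we have $\overline A''_0 = \overline A'_0 \circ \bigl(\overline z'\, \overline g'\, (\overline g'')^{-1} (\overline z'')^{-1}\bigr)$, and so for all $x \in M$,
$$h_{\overline A''_0}(\gamma_x) = \bigl(\overline z'\, \overline g'\, (\overline g'')^{-1} (\overline z'')^{-1}\bigr)_m^{-1}\; h_{\overline A'_0}(\gamma_x)\; \bigl(\overline z'\, \overline g'\, (\overline g'')^{-1} (\overline z'')^{-1}\bigr)_x.$$
Moreover, since $\bigl(\overline g' (\overline g'')^{-1}\bigr)_m \in Z(H_{\overline A})$, we have $\bigl(\overline z'\, \overline g'\, (\overline g'')^{-1} (\overline z'')^{-1}\bigr)_m \in Z(H_{\overline A})$. From $h_{\overline A'_0}(\gamma_x) = h_{\overline A}(\gamma_x) = h_{\overline A''_0}(\gamma_x)$ for all $x$ now $\overline z'\, \overline g'\, (\overline g'')^{-1} (\overline z'')^{-1} \in B(\overline A)$ follows, and thus $\overline g' (\overline g'')^{-1} \in B(\overline A)$.
• By this we have $\overline A \circ \overline g' = \overline A \circ \overline g''$, i.e. $F$ is well-defined.
6. $F$ is equivariant.
• Let $\overline A'' = \overline A' \circ \overline g' \in \overline S \circ \overline{\mathcal G}$. Then
$$F(\overline A'' \circ \overline g) = F(\overline A' \circ (\overline g' \circ \overline g)) = \overline A \circ (\overline g' \circ \overline g) = (\overline A \circ \overline g') \circ \overline g = F(\overline A' \circ \overline g') \circ \overline g = F(\overline A'') \circ \overline g.$$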
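7. $F$ is retracting.
• Let $\overline A' = \overline A \circ \overline g \in \overline A \circ \overline{\mathcal G}$. Then $F(\overline A') = F(\overline A \circ \overline g) = \overline A \circ \overline g = \overline A'$.
8. $\overline S \circ \overline{\mathcal G}$ is an open neighbourhood of $\overline A \circ \overline{\mathcal G}$.
• Obviously, $\overline A \circ \overline{\mathcal G} \subseteq \overline S \circ \overline{\mathcal G}$.
• We have $\overline S \circ \overline{\mathcal G} = \varphi^{-1}(S \circ \mathbf G)$.
“⊆” Let $\overline A' = \overline A'' \circ \overline g \in \overline S_0 \circ \overline{\mathcal G} = \overline S \circ \overline{\mathcal G}$. Then we have $\varphi(\overline A') = \varphi(\overline A'' \circ \overline g) = \varphi(\overline A'') \circ g_m \in S \circ \mathbf G$ because $\varphi(\overline S_0) \subseteq S$. Thus, $\overline A' \in \varphi^{-1}(S \circ \mathbf G)$.
“⊇” – Let $\overline A' \in \varphi^{-1}(S \circ \mathbf G)$, i.e. $\varphi(\overline A') = \vec g' \circ g$ with appropriate $\vec g' \in S$ and $g \in \mathbf G$.
– Choose some $\overline g \in \overline{\mathcal G}$ with $g_m = g$. Then $\varphi(\overline A' \circ \overline g^{-1}) = \varphi(\overline A') \circ g_m^{-1} = \vec g' \in S$. Now set $\overline A'' := \overline A' \circ \overline g^{-1}$.
– Using $g'_x := h_{\overline A''}(\gamma_x)^{-1}\, h_{\overline A}(\gamma_x)$ and $\overline A''' := \overline A'' \circ \overline g'$ we get
a) $\varphi(\overline A''') = \varphi(\overline A'') \in S$ because of $g'_m = e_{\mathbf G}$ and
b) $h_{\overline A'''}(\gamma_x) = h_{\overline A''}(\gamma_x)\, g'_x = h_{\overline A}(\gamma_x)$ for all $x \in M$.
Thus, we have $\overline A''' \in \overline S_0 \subseteq \overline S$ and $\overline A' = \overline A'' \circ \overline g = \overline A''' \circ \bigl((\overline g')^{-1} \circ \overline g\bigr) \in \overline S \circ \overline{\mathcal G}$.
• Consequently, $\overline S \circ \overline{\mathcal G} = \varphi^{-1}(S \circ \mathbf G)$ is open as the preimage of an open set under the continuous map $\varphi$.
9. $F$ is continuous.
• We consider the following diagram: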
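$$\begin{array}{ccccc}
\overline S \circ \overline{\mathcal G} & \xrightarrow{\ F\ } & \overline A \circ \overline{\mathcal G} & & \\
\big\downarrow{\scriptstyle \varphi} & & \big\downarrow{\scriptstyle \varphi} & & \\
S \circ \mathbf G & \xrightarrow{\ f\ } & \varphi(\overline A) \circ \mathbf G & \xrightarrow[\ \cong\ ]{\ \tau_{\mathbf G}\ } & Z(H_{\overline A})\backslash\mathbf G
\end{array}
\qquad
\begin{array}{ccccc}
\overline A' \circ \overline g' & \longmapsto & \overline A \circ \overline g' & & \\
\big\downarrow & & \big\downarrow & & \\
\varphi(\overline A') \circ g'_m & \longmapsto & \varphi(\overline A) \circ g'_m & \longmapsto & [g'_m]_{Z(H_{\overline A})}
\end{array} \tag{11}$$
It is commutative due to $\varphi(\overline S \circ \overline{\mathcal G}) \subseteq S \circ \mathbf G$, $\varphi(\overline A \circ \overline{\mathcal G}) \subseteq \varphi(\overline A) \circ \mathbf G$ and the definition of $F$. $\tau_{\mathbf G}$ is the canonical homeomorphism between the orbit of $\varphi(\overline A)$ and the quotient of the acting group $\mathbf G$ by the stabilizer of $\varphi(\overline A)$. Since $\varphi$, $f$ and $\tau_{\mathbf G}$ are continuous, the map
$$F' := \tau_{\mathbf G} \circ \varphi \circ F = \tau_{\mathbf G} \circ f \circ \varphi : \overline S \circ \overline{\mathcal G} \longrightarrow Z(H_{\overline A})\backslash\mathbf G, \qquad \overline A' \circ \overline g' \longmapsto [g'_m]_{Z(H_{\overline A})}$$
is continuous.
• Now, we consider the map
$$F'' : (\overline S \circ \overline{\mathcal G}) \times \mathbf G \longrightarrow \overline{\mathcal G}, \qquad (\overline A' \circ \overline g', g_m) \longmapsto \bigl(h_{\gamma_x}(\overline A)^{-1}\, g_m\, h_{\gamma_x}(\overline A' \circ \overline g')\bigr)_{x\in M}.$$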
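$F''$ is continuous because
$$\pi_x \circ F'' : (\overline S \circ \overline{\mathcal G}) \times \mathbf G \longrightarrow \mathbf G \times \mathbf G \xrightarrow{\ \mathrm{mult.}\ } \mathbf G, \qquad (\overline A', g_m) \longmapsto (h_{\gamma_x}(\overline A'), g_m) \longmapsto h_{\gamma_x}(\overline A)^{-1}\, g_m\, h_{\gamma_x}(\overline A')$$
is obviously continuous for all $x \in M$.
• $F''$ induces a map $F'''$ via the following commutative diagram
$$\begin{array}{ccc}
(\overline S \circ \overline{\mathcal G}) \times \mathbf G & \xrightarrow{\ F''\ } & \overline{\mathcal G} \\
\big\downarrow{\scriptstyle \mathrm{id} \times \pi_{Z(H_{\overline A})}} & & \big\downarrow{\scriptstyle \pi_{B(\overline A)}} \\
(\overline S \circ \overline{\mathcal G}) \times Z(H_{\overline A})\backslash\mathbf G & \xrightarrow{\ F'''\ } & B(\overline A)\backslash\overline{\mathcal G}
\end{array}$$
i.e.,
$$F'''\bigl(\overline A', [g_m]_{Z(H_{\overline A})}\bigr) = \Bigl[\bigl(h_{\gamma_x}(\overline A)^{-1}\, g_m\, h_{\gamma_x}(\overline A')\bigr)_{x\in M}\Bigr]_{B(\overline A)}.$$
– $F'''$ is well-defined.
Let $g_{2,m} = z\, g_{1,m}$ with $z \in Z(H_{\overline A})$. Then
$$\begin{aligned}
F'''\bigl(\overline A', [g_{2,m}]_{Z(H_{\overline A})}\bigr) &= \Bigl[\bigl(h_{\gamma_x}(\overline A)^{-1}\, g_{2,m}\, h_{\gamma_x}(\overline A')\bigr)_{x\in M}\Bigr]_{B(\overline A)} \\
&= \Bigl[\bigl(h_{\gamma_x}(\overline A)^{-1}\, z\, g_{1,m}\, h_{\gamma_x}(\overline A')\bigr)_{x\in M}\Bigr]_{B(\overline A)} \\
&= \Bigl[\bigl(z_x\, h_{\gamma_x}(\overline A)^{-1}\, g_{1,m}\, h_{\gamma_x}(\overline A')\bigr)_{x\in M}\Bigr]_{B(\overline A)} \\
&= F'''\bigl(\overline A', [g_{1,m}]_{Z(H_{\overline A})}\bigr),
\end{aligned}$$
because $(z_x)_{x\in M} := \bigl(h_{\gamma_x}(\overline A)^{-1}\, z\, h_{\gamma_x}(\overline A)\bigr)_{x\in M} \in B(\overline A)$ for $z \in Z(H_{\overline A})$.
– $F'''$ is continuous, because $\mathrm{id} \times \pi_{Z(H_{\overline A})}$ is open and surjective and $\pi_{B(\overline A)}$ and $F''$ are continuous.
• For $\overline A' \in \overline S$ there is an $\overline A'_0 \in \overline S_0$ and a $\overline g' \in B(\overline A)$ with $\overline A' = \overline A'_0 \circ \overline g'$. Thus, we have $h_{\gamma_x}(\overline A'_0) = h_{\gamma_x}(\overline A)$ and
$$\begin{aligned}
F'''\bigl(\overline A' \circ \overline g, [g_m]\bigr) &= \Bigl[\bigl(h_{\gamma_x}(\overline A)^{-1}\, g_m\, h_{\gamma_x}(\overline A'_0 \circ \overline g' \circ \overline g)\bigr)_{x\in M}\Bigr]_{B(\overline A)} \\
&= \Bigl[\bigl(h_{\gamma_x}(\overline A)^{-1}\, g_m\, g_m^{-1} (g'_m)^{-1}\, h_{\gamma_x}(\overline A)\, g'_x\, g_x\bigr)_{x\in M}\Bigr]_{B(\overline A)} \\
&= \Bigl[\bigl(h_{\gamma_x}(\overline A)^{-1}\, h_{\gamma_x}(\overline A \circ \overline g')\, g_x\bigr)_{x\in M}\Bigr]_{B(\overline A)} \\
&= \bigl[(g_x)_{x\in M}\bigr]_{B(\overline A)} = [\overline g]_{B(\overline A)},
\end{aligned}$$
where we used $\overline g' \in B(\overline A)$.
• Now, $F$ is the concatenation of the following continuous maps:
$$F : \overline S \circ \overline{\mathcal G} \xrightarrow{\ \mathrm{id} \times F'\ } (\overline S \circ \overline{\mathcal G}) \times Z(H_{\overline A})\backslash\mathbf G \xrightarrow{\ F'''\ } B(\overline A)\backslash\overline{\mathcal G} \xrightarrow{\ (\tau_{\overline{\mathcal G}})^{-1}\ } \overline A \circ \overline{\mathcal G},$$
$$\overline A' \circ \overline g \longmapsto \bigl(\overline A' \circ \overline g, [g_m]_{Z(H_{\overline A})}\bigr) \longmapsto [\overline g]_{B(\overline A)} \longmapsto \overline A \circ \overline g,$$
where $\tau_{\overline{\mathcal G}}$ is the canonical homeomorphism between the orbit $\overline A \circ \overline{\mathcal G}$ and the acting group $\overline{\mathcal G}$ modulo the stabilizer $B(\overline A)$ of $\overline A$. Hence, $F$ is continuous.
10. We have $F^{-1}(\{\overline A\}) = \overline S$.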
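• “⊆” Let $\overline A' \in F^{-1}(\{\overline A\})$, i.e. $F(\overline A') = \overline A$.
– By the commutativity of (11) we have $f(\varphi(\overline A')) = \varphi(F(\overline A')) = \varphi(\overline A)$, hence $\overline A' \in \varphi^{-1}(f^{-1}(\varphi(\overline A))) = \varphi^{-1}(S)$.
– Define $g_x := h_{\overline A'}(\gamma_x)^{-1}\, h_{\overline A}(\gamma_x)$ and $\overline A'' := \overline A' \circ \overline g$. Then we have $\varphi(\overline A'') = \varphi(\overline A') \in S$, i.e. $\overline A'' \in \varphi^{-1}(S)$, and $h_{\overline A''}(\gamma_x) = h_{\overline A}(\gamma_x)$ for all $x$, i.e. $\overline A'' \in \psi^{-1}(\psi(\overline A))$. By this, $\overline A'' \in \overline S_0$.
– Consequently, $F(\overline A'') = \overline A = F(\overline A')$, and therefore also $\overline A \circ \overline g = F(\overline A') \circ \overline g = F(\overline A' \circ \overline g) = F(\overline A'') = \overline A$, i.e. $\overline g \in B(\overline A)$. Thus, $\overline A' = \overline A'' \circ \overline g^{-1} \in \overline S_0 \circ B(\overline A) = \overline S$.
• “⊇” Let $\overline A' \in \overline S$. Then $F(\overline A') = F(\overline A' \circ 1) = \overline A \circ 1 = \overline A$, i.e. $\overline A' \in F^{-1}(\{\overline A\})$.
4.3.3. Openness of the strata.
Proposition 4.8. $\overline{\mathcal A}_{\geq t}$ is open for all $t \in \mathcal T$.
Corollary 4.9. $\overline{\mathcal A}_{=t}$ is open in $\overline{\mathcal A}_{\leq t}$ for all $t \in \mathcal T$.
Proof. Since $\overline{\mathcal A}_{=t} = \overline{\mathcal A}_{\geq t} \cap \overline{\mathcal A}_{\leq t}$, $\overline{\mathcal A}_{=t}$ is open w.r.t. the relative topology on $\overline{\mathcal A}_{\leq t}$.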
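Corollary 4.10. $\overline{\mathcal A}_{\leq t}$ is compact for all $t \in \mathcal T$.
Proof. $\overline{\mathcal A} \setminus \overline{\mathcal A}_{\leq t} = \bigcup_{t' \in \mathcal T,\, t' \not\leq t} \overline{\mathcal A}_{=t'} = \bigcup_{t' \in \mathcal T,\, t' \not\leq t} \overline{\mathcal A}_{\geq t'}$ is open because $\overline{\mathcal A}_{\geq t'}$ is open for all $t' \in \mathcal T$. Thus, $\overline{\mathcal A}_{\leq t}$ is closed and therefore compact.
The proposition on the openness of the strata can be proven in two ways: first as a simple corollary of the slice theorem on $\overline{\mathcal A}$, and second directly using the reduction mapping. Altogether, the second variant needs less effort.
Proof of Proposition 4.8. We have to show that any $\overline A \in \overline{\mathcal A}_{\geq t}$ has a neighbourhood that again is contained in $\overline{\mathcal A}_{\geq t}$. So, let $\overline A \in \overline{\mathcal A}_{\geq t}$.
• Variant 1
Due to the slice theorem there is an open neighbourhood $U$ of $\overline A \circ \overline{\mathcal G}$, and so of $\overline A$, too, and an equivariant retraction $F : U \to \overline A \circ \overline{\mathcal G}$. Since every equivariant mapping reduces types, we have $\mathrm{Typ}(\overline A') \geq \mathrm{Typ}(\overline A) \geq t$ for all $\overline A' \in U$, thus $U \subseteq \overline{\mathcal A}_{\geq t}$.
• Variant 2
Choose again for $\overline A$ an $\boldsymbol\alpha \subseteq \mathcal{HG}$ with $\mathrm{Typ}(\overline A) = [Z(H_{\overline A})] = [Z(h_{\overline A}(\boldsymbol\alpha))] \equiv [Z(\varphi_{\boldsymbol\alpha}(\overline A))] = \mathrm{Typ}(\varphi_{\boldsymbol\alpha}(\overline A))$. Due to the slice theorem for general transformation groups there is an open, invariant neighbourhood $U'$ of $\varphi_{\boldsymbol\alpha}(\overline A)$ in $\mathbf G^{\#\boldsymbol\alpha}$ and an equivariant retraction $f : U' \to \varphi_{\boldsymbol\alpha}(\overline A) \circ \mathbf G$. Since $\varphi_{\boldsymbol\alpha}$ and $f$ are type-reducing, we have
$$\mathrm{Typ}(\overline A') \geq \mathrm{Typ}(\varphi_{\boldsymbol\alpha}(\overline A')) \geq \mathrm{Typ}\bigl(f(\varphi_{\boldsymbol\alpha}(\overline A'))\bigr) = \mathrm{Typ}(\varphi_{\boldsymbol\alpha}(\overline A)) = \mathrm{Typ}(\overline A) \geq t$$
for all $\overline A' \in U := \varphi_{\boldsymbol\alpha}^{-1}(U')$, i.e. $U \subseteq \overline{\mathcal A}_{\geq t}$. Obviously, $U$ contains $\overline A$ and is open as a preimage of an open set.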
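4.4. Denseness of the strata. The next theorem we want to prove is that the set $\overline{\mathcal A}_{=t}$ is not only open, but also dense in $\overline{\mathcal A}_{\leq t}$. In contrast to the slice theorem and the openness of the strata, this assertion does not follow from the general theory of transformation groups. We have to show this directly on the level of $\overline{\mathcal A}$. As we will see in a moment, the next proposition will be very helpful.
Proposition 4.11. Let $\overline A \in \overline{\mathcal A}$ and let $\Gamma_i$ be finitely many graphs. Then there is for any $t \geq \mathrm{Typ}(\overline A)$ an $\overline A' \in \overline{\mathcal A}$ with $\mathrm{Typ}(\overline A') = t$ and $\pi_{\Gamma_i}(\overline A) = \pi_{\Gamma_i}(\overline A')$ for all $i$.
Namely, we have
Corollary 4.12. $\overline{\mathcal A}_{=t}$ is dense in $\overline{\mathcal A}_{\leq t}$ for all $t \in \mathcal T$.
Proof. Let $\overline A \in \overline{\mathcal A}_{\leq t} \subseteq \overline{\mathcal A}$. We have to show that any neighbourhood $U$ of $\overline A$ contains an $\overline A'$ having type $t$. It is sufficient to prove this assertion for all $U = \bigcap_i \pi_{\Gamma_i}^{-1}(W_i)$ with graphs $\Gamma_i$, open $W_i \subseteq \mathbf G^{\#E(\Gamma_i)}$ and $\pi_{\Gamma_i}(\overline A) \in W_i$ for all $i \in I$ with finite $I$, because any general open $U$ contains such a set. Now let $\Gamma_i$ and $U$ be chosen as just described. Due to Proposition 4.11 above there exists an $\overline A' \in \overline{\mathcal A}$ with $\mathrm{Typ}(\overline A') = t \geq \mathrm{Typ}(\overline A)$ and $\pi_{\Gamma_i}(\overline A) = \pi_{\Gamma_i}(\overline A')$ for all $i$, i.e. with $\overline A' \in \overline{\mathcal A}_{=t}$ and $\overline A' \in \pi_{\Gamma_i}^{-1}\bigl(\pi_{\Gamma_i}(\{\overline A\})\bigr) \subseteq \pi_{\Gamma_i}^{-1}(W_i)$ for all $i$, thus $\overline A' \in \bigcap_i \pi_{\Gamma_i}^{-1}(W_i) = U$.
Along with the proposition about the openness of the strata we get
Corollary 4.13. For all $t \in \mathcal T$ the closure of $\overline{\mathcal A}_{=t}$ w.r.t. $\overline{\mathcal A}$ is equal to $\overline{\mathcal A}_{\leq t}$.
Proof. Denote the closure of $F$ w.r.t. $E$ by $\mathrm{Cl}_E(F)$. Due to the denseness of $\overline{\mathcal A}_{=t}$ in $\overline{\mathcal A}_{\leq t}$ we have $\mathrm{Cl}_{\overline{\mathcal A}_{\leq t}}(\overline{\mathcal A}_{=t}) = \overline{\mathcal A}_{\leq t}$. Since the closure is compatible with the relative topology, we have $\overline{\mathcal A}_{\leq t} = \mathrm{Cl}_{\overline{\mathcal A}_{\leq t}}(\overline{\mathcal A}_{=t}) = \overline{\mathcal A}_{\leq t} \cap \mathrm{Cl}_{\overline{\mathcal A}}(\overline{\mathcal A}_{=t})$, i.e. $\overline{\mathcal A}_{\leq t} \subseteq \mathrm{Cl}_{\overline{\mathcal A}}(\overline{\mathcal A}_{=t})$. But, due to Corollary 4.10, $\overline{\mathcal A}_{\leq t} \supseteq \overline{\mathcal A}_{=t}$ itself is closed in $\overline{\mathcal A}$. Hence, $\overline{\mathcal A}_{\leq t} \supseteq \mathrm{Cl}_{\overline{\mathcal A}}(\overline{\mathcal A}_{=t})$.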
4.4.1. How to prove Proposition 4.11? Which ideas will the proof of Proposition 4.11 be based on? As in the last two subsections we get help from the finiteness lemma for centralizers. Namely, let $\boldsymbol\alpha \subseteq \mathcal{HG}$ be chosen such that $\mathrm{Typ}(\overline A) = [Z(H_{\overline A})] = [Z(\varphi_{\boldsymbol\alpha}(\overline A))]$. The type $t \geq \mathrm{Typ}(\overline A)$ is finitely generated as well. Thus, we have to construct a connection whose type is determined by $\varphi_{\boldsymbol\alpha}(\overline A)$ and the generators of $t$. For this we use induction on the number of generators of $t$. In conclusion, we have to construct inductively from $\overline A$ new connections $\overline A_i$, such that $\overline A_{i-1}$ coincides with $\overline A_i$ at least along the paths that pass $\boldsymbol\alpha$ or that lie in the graphs $\Gamma_i$. But, at the same time, there has to exist a path $e$, such that $h_{\overline A_i}(e)$ equals the $i$-th generator of $t$. Now, it should be obvious that we get help from the construction method for new connections introduced in [10]. Before we do this we recall an important notation used there.
Definition 4.5. Let $\gamma_1, \gamma_2 \in \mathcal P$. We say that $\gamma_1$ and $\gamma_2$ have the same initial segment (shortly: $\gamma_1 \uparrow\uparrow \gamma_2$) iff there exist $0 < \delta_1, \delta_2 \leq 1$ such that $\gamma_1|_{[0,\delta_1]}$ and $\gamma_2|_{[0,\delta_2]}$ coincide up to the parametrization.
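We say analogously that the final segment of $\gamma_1$ coincides with the initial segment of $\gamma_2$ (shortly: $\gamma_1 \downarrow\uparrow \gamma_2$) iff there exist $0 < \delta_1, \delta_2 \leq 1$ such that $\gamma_1^{-1}|_{[0,\delta_1]}$ and $\gamma_2|_{[0,\delta_2]}$ coincide up to the parametrization. Iff the corresponding relations are not fulfilled, we write $\gamma_1 \not\uparrow\uparrow \gamma_2$ and $\gamma_1 \not\downarrow\uparrow \gamma_2$, respectively.
Finally, we recall the decomposition lemma.
Lemma 4.14. Let $x \in M$ be a point. Any $\gamma \in \mathcal P$ can be written (up to parametrization) as a product $\prod \gamma_i$ with $\gamma_i \in \mathcal P$, such that
• $\mathrm{int}\, \gamma_i \cap \{x\} = \emptyset$ or
• $\mathrm{int}\, \gamma_i = \{x\}$.
4.4.2. Successive magnifying of the types. In order to prove Proposition 4.11 we need the following lemma for magnifying the types. Here we will use explicitly the construction of a new connection $\overline A'$ from $\overline A$ as given in [10].
Lemma 4.15. Let $\Gamma_i$ be finitely many graphs, $\overline A \in \overline{\mathcal A}$ and $\boldsymbol\alpha \subseteq \mathcal{HG}$ be a finite set of paths with $Z(H_{\overline A}) = Z(h_{\overline A}(\boldsymbol\alpha))$. Furthermore, let $g \in \mathbf G$ be arbitrary. Then there is an $\overline A' \in \overline{\mathcal A}$, such that:
• $h_{\overline A'}(\boldsymbol\alpha) = h_{\overline A}(\boldsymbol\alpha)$,
• $\pi_{\Gamma_i}(\overline A') = \pi_{\Gamma_i}(\overline A)$ for all $i$,
• $h_{\overline A'}(e) = g$ for an $e \in \mathcal{HG}$ and
• $Z(H_{\overline A'}) = Z(\{g\} \cup h_{\overline A}(\boldsymbol\alpha))$.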
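Proof. 1. Let $m' \in M$ be some point that is neither contained in the images of the $\Gamma_i$ nor in that of $\alpha$, and join $m$ with $m'$ by some path $\gamma$. Now let $e'$ be some closed path in $M$ with base point $m'$ and without self-intersections, such that

$\mathrm{im}\, e' \cap \bigl(\mathrm{int}\,\gamma \cup \mathrm{im}(\alpha) \cup \bigcup_i \mathrm{im}(\Gamma_i)\bigr) = \emptyset. \qquad (12)$

Obviously, there exists such an $e'$ because $M$ is supposed to be at least two-dimensional. Set $e := \gamma\, e'\, \gamma^{-1} \in \mathcal{HG}$ and $g' := h_{\overline{A}}(\gamma)^{-1}\, g\, h_{\overline{A}}(\gamma)$. Finally, define a connection $\overline{A}'$ for $\overline{A}$, $e'$ and $g'$ as follows:

2. Construction of $\overline{A}'$
• Let $\delta \in \mathcal{P}$ be for the moment a "genuine" path (i.e., not an equivalence class) that does not contain the initial point $e'(0) \equiv m'$ of $e'$ as an inner point. Explicitly we have $\mathrm{int}\,\delta \cap \{e'(0)\} = \emptyset$. Define

$h_{\overline{A}'}(\delta) := g'\, h_{\overline{A}}(e')^{-1}\, h_{\overline{A}}(\delta)\, h_{\overline{A}}(e')\, g'^{-1}$, for $\delta \uparrow\uparrow e'$ and $\delta \downarrow\uparrow e'$,
$h_{\overline{A}'}(\delta) := g'\, h_{\overline{A}}(e')^{-1}\, h_{\overline{A}}(\delta)$, for $\delta \uparrow\uparrow e'$ and $\delta \not\downarrow\uparrow e'$,
$h_{\overline{A}'}(\delta) := h_{\overline{A}}(\delta)\, h_{\overline{A}}(e')\, g'^{-1}$, for $\delta \not\uparrow\uparrow e'$ and $\delta \downarrow\uparrow e'$,
$h_{\overline{A}'}(\delta) := h_{\overline{A}}(\delta)$, else.

• For every trivial path $\delta$ set $h_{\overline{A}'}(\delta) = e_G$.
• Now, let $\delta \in \mathcal{P}$ be an arbitrary path. Decompose $\delta$ into a finite product $\prod_i \delta_i$ due to Lemma 4.14 such that no $\delta_i$ contains the point $e'(0)$ in its interior unless $\delta_i$ is trivial. Then set $h_{\overline{A}'}(\delta) := \prod_i h_{\overline{A}'}(\delta_i)$.

We know from [10] that $\overline{A}'$ is indeed a connection.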
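3. The assertion $\pi_{\Gamma_i}(\overline{A}') = \pi_{\Gamma_i}(\overline{A})$ for all $i$ is an immediate consequence of the construction because $\mathrm{im}(\Gamma_i) \cap \mathrm{im}\, e' = \emptyset$. Likewise, we get $h_{\overline{A}'}(\alpha) = h_{\overline{A}}(\alpha)$.

4. Moreover, from (12), the fact that $e'$ has no self-intersections and the definition of $\overline{A}'$ we get $h_{\overline{A}'}(\gamma) = h_{\overline{A}}(\gamma)$ and so

$h_{\overline{A}'}(e) = h_{\overline{A}'}(\gamma)\, h_{\overline{A}'}(e')\, h_{\overline{A}'}(\gamma^{-1}) = h_{\overline{A}}(\gamma)\, g'\, h_{\overline{A}}(\gamma)^{-1} = g.$

5. We have $Z(H_{\overline{A}'}) = Z(\{g\} \cup H_{\overline{A}})$.

"$\subseteq$" Let $f \in Z(H_{\overline{A}'})$, i.e. $f\, h_{\overline{A}'}(\alpha) = h_{\overline{A}'}(\alpha)\, f$ for all $\alpha \in \mathcal{HG}$.
• From $h_{\overline{A}'}(e) = g$ it follows that $fg = gf$, i.e. $f \in Z(\{g\})$.
• From $\mathrm{im}\, e' \cap \mathrm{im}(\alpha) = \emptyset$ it follows that $h_{\overline{A}'}(\alpha_i) = h_{\overline{A}}(\alpha_i)$, i.e. $f \in Z(h_{\overline{A}}(\alpha_i))$ for all $i$.
Thus, $f \in Z(\{g\}) \cap Z(h_{\overline{A}}(\alpha)) = Z(\{g\} \cup H_{\overline{A}})$.

"$\supseteq$" Let $f \in Z(\{g\} \cup H_{\overline{A}})$.
• Let $\alpha'$ be a path from $m'$ to $m'$ such that $\mathrm{int}\,\alpha' \cap \{m'\} = \emptyset$ or $\mathrm{int}\,\alpha' = \{m'\}$. Set $\alpha := \gamma\, \alpha'\, \gamma^{-1}$. Then by construction we have

$h_{\overline{A}'}(\alpha) = h_{\overline{A}'}(\gamma)\, h_{\overline{A}'}(\alpha')\, h_{\overline{A}'}(\gamma)^{-1} = h_{\overline{A}}(\gamma)\, h_{\overline{A}'}(\alpha')\, h_{\overline{A}}(\gamma)^{-1}.$

There are four cases. Suppose, e.g., $\alpha' \uparrow\uparrow e'$ and $\alpha' \downarrow\uparrow e'$. Then

$h_{\overline{A}'}(\alpha) = h_{\overline{A}}(\gamma)\, g'\, h_{\overline{A}}(e')^{-1}\, h_{\overline{A}}(\alpha')\, h_{\overline{A}}(e')\, (g')^{-1}\, h_{\overline{A}}(\gamma)^{-1}$
$\qquad = g\, h_{\overline{A}}(\gamma)\, h_{\overline{A}}(e')^{-1}\, h_{\overline{A}}(\alpha')\, h_{\overline{A}}(e')\, h_{\overline{A}}(\gamma)^{-1}\, g^{-1}$
$\qquad = g\, h_{\overline{A}}(\gamma\, e'^{-1}\, \alpha'\, e'\, \gamma^{-1})\, g^{-1}.$

Thus, $f \in Z(\{h_{\overline{A}'}(\alpha)\})$. The remaining cases yield the same conclusion.
• Now, let $\alpha \in \mathcal{HG}$ be arbitrary and $\alpha' := \gamma^{-1}\, \alpha\, \gamma$. By the Decomposition Lemma 4.14 there is a decomposition $\alpha' = \prod_i \alpha_i'$ with $\mathrm{int}\,\alpha_i' \cap \{m'\} = \emptyset$ or $\mathrm{int}\,\alpha_i' = \{m'\}$ for all $i$. Thus, $\alpha = \gamma \bigl(\prod_i \alpha_i'\bigr) \gamma^{-1} = \prod_i \gamma\, \alpha_i'\, \gamma^{-1}$. Using the result just proven we get

$f \in Z\bigl(\bigl\{h_{\overline{A}'}\bigl(\prod_i \gamma\, \alpha_i'\, \gamma^{-1}\bigr)\bigr\}\bigr) = Z(\{h_{\overline{A}'}(\alpha)\}).$

Thus, $f \in Z(H_{\overline{A}'})$. Due to the definition of $\alpha$ we have $Z(H_{\overline{A}'}) = Z(\{g\} \cup h_{\overline{A}}(\alpha))$.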
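As a sanity check on step 4 of the proof, the following worked computation spells out why the new connection assigns $g$ to $e$. It is only a sketch: it assumes that the case distinction in step 2 is read so that the left factor $g'\, h_{\overline{A}}(e')^{-1}$ appears exactly when $\delta \uparrow\uparrow e'$ and the right factor $h_{\overline{A}}(e')\, g'^{-1}$ exactly when $\delta \downarrow\uparrow e'$, and that the closed path $e'$, having no self-intersections, satisfies $e' \uparrow\uparrow e'$ but not $e' \downarrow\uparrow e'$.

```latex
\begin{align*}
h_{\overline{A}'}(e')
  &= g'\, h_{\overline{A}}(e')^{-1}\, h_{\overline{A}}(e') = g', \\
h_{\overline{A}'}(e)
  &= h_{\overline{A}'}(\gamma)\, h_{\overline{A}'}(e')\, h_{\overline{A}'}(\gamma)^{-1} \\
  &= h_{\overline{A}}(\gamma)\,
     \bigl( h_{\overline{A}}(\gamma)^{-1}\, g\, h_{\overline{A}}(\gamma) \bigr)\,
     h_{\overline{A}}(\gamma)^{-1}
   = g.
\end{align*}
```

The conjugation by $h_{\overline{A}}(\gamma)$ built into $g'$ is what cancels the transport along $\gamma$, so the prescribed value $g$ appears at the base point $m$ rather than at $m'$.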
4.4.3. Construction of arbitrary types. Finally, we can now prove the desired proposition.

Proof of Proposition 4.11.
• Let $t \in \mathcal{T}$ and $t \geq \mathrm{Typ}(\overline{A})$. Then there exist a Howe subgroup $V \subseteq G$ with $t = [V]$ and a $g \in G$ such that $Z(H_{\overline{A}}) \supseteq g^{-1} V g =: V'$. Since $V'$ is a Howe subgroup, we have $Z(Z(V')) = V'$ and so by Lemma 4.2 there exist certain $u_0, \ldots, u_k \in Z(V') \subseteq G$ such that $V' = Z(Z(V')) = Z(\{u_0, \ldots, u_k\})$.
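• Now let $Z(H_{\overline{A}}) = Z(h_{\overline{A}}(\alpha))$ with an appropriate $\alpha \subseteq \mathcal{HG}$ as in Corollary 4.3. Because of $V' \subseteq Z(H_{\overline{A}})$ we have $V' = V' \cap Z(H_{\overline{A}}) = Z(\{u_0, \ldots, u_k\}) \cap Z(h_{\overline{A}}(\alpha)) = Z(\{u_0, \ldots, u_k\} \cup h_{\overline{A}}(\alpha))$.
• We now use Lemma 4.15 inductively. Let $\overline{A}_0 := \overline{A}$ and $\alpha^0 := \alpha$. Construct for all $j = 0, \ldots, k$ a connection $\overline{A}_{j+1}$ and an $e_j \in \mathcal{HG}$ from $\overline{A}_j$ and $\alpha^j$ by that lemma, such that $\pi_{\Gamma_i}(\overline{A}_{j+1}) = \pi_{\Gamma_i}(\overline{A}_j)$ for all $i$, $h_{\overline{A}_{j+1}}(\alpha^j) = h_{\overline{A}_j}(\alpha^j)$, $h_{\overline{A}_{j+1}}(e_j) = u_j$ and $Z(H_{\overline{A}_{j+1}}) = Z(\{u_j\} \cup h_{\overline{A}_j}(\alpha^j))$. Setting $\alpha^{j+1} := \alpha^j \cup \{e_j\}$ we get $Z(H_{\overline{A}_{j+1}}) = Z(\{u_j\} \cup h_{\overline{A}_j}(\alpha^j)) = Z(h_{\overline{A}_{j+1}}(\alpha^{j+1}))$.

Finally, we define $\overline{A}' := \overline{A}_{k+1}$. Now, we get $\pi_{\Gamma_i}(\overline{A}') = \pi_{\Gamma_i}(\overline{A})$ for all $i$, $h_{\overline{A}'}(\alpha) = h_{\overline{A}}(\alpha)$ and $h_{\overline{A}'}(e_j) = u_j$. Thus,

$Z(H_{\overline{A}'}) = Z(h_{\overline{A}'}(\alpha^{k+1}))$
$\qquad = Z(h_{\overline{A}'}(\{e_0, \ldots, e_k\} \cup \alpha))$
$\qquad = Z(\{u_0, \ldots, u_k\} \cup h_{\overline{A}}(\alpha)) = V',$

i.e., $\mathrm{Typ}(\overline{A}') = [V'] = t$.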
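To make the choice of generators in the first step of this proof concrete, here is a small worked example for the structure group $G = SU(2)$. This example is not taken from the paper; the group, the torus $T$ and the matrices $u_0, u_1$ are chosen purely for illustration. It lists the Howe subgroups of $SU(2)$ up to conjugacy and, for two of them, a finite set of elements whose centralizer is the given subgroup.

```latex
\begin{align*}
G = SU(2), \quad
  & T := \{ \operatorname{diag}(e^{i\theta}, e^{-i\theta}) \mid \theta \in \mathbb{R} \} \cong U(1), \\
\text{Howe subgroups up to conjugacy:} \quad
  & Z(G) = \{\pm \mathbf{1}\}, \qquad T = Z(T), \qquad G = Z(\{\pm \mathbf{1}\}), \\
V' = T: \quad
  & u_0 = \operatorname{diag}(e^{i\theta}, e^{-i\theta}),\ \theta \notin \pi\mathbb{Z}
    \ \Longrightarrow\ Z(\{u_0\}) = T, \\
V' = \{\pm \mathbf{1}\}: \quad
  & u_0 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},\
    u_1 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
    \ \Longrightarrow\ Z(\{u_0, u_1\}) = \{\pm \mathbf{1}\}.
\end{align*}
```

In this illustration a single application of Lemma 4.15 ($k = 0$) suffices to reach the type $[T]$, whereas the maximal type $[Z(G)]$ needs two applications ($k = 1$), because the centralizers of $u_0$ and $u_1$ are two distinct maximal tori that intersect only in the center.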
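The proposition just proven has a further immediate consequence.

Corollary 4.16. $\overline{\mathcal{A}}_{=t}$ is non-empty for all $t \in \mathcal{T}$.

Proof. Let $\overline{A}$ be the trivial connection, i.e. $h_{\overline{A}}(\alpha) = e_G$ for all $\alpha \in \mathcal{P}$. The type of $\overline{A}$ is $[G]$, thus minimal, i.e. we have $t \geq \mathrm{Typ}(\overline{A})$ for all $t \in \mathcal{T}$. By means of Proposition 4.11 there is an $\overline{A}' \in \overline{\mathcal{A}}$ with $\mathrm{Typ}(\overline{A}') = t$.

This corollary answers the question of which gauge orbit types exist for generalized connections.

Theorem 4.17. The set of all gauge orbit types on $\overline{\mathcal{A}}$ is the set of all conjugacy classes of Howe subgroups of $G$.

Furthermore we have

Corollary 4.18. Let $\Gamma$ be some graph. Then $\pi_\Gamma(\overline{\mathcal{A}}_{=t_{\max}}) = \pi_\Gamma(\overline{\mathcal{A}})$. In other words: $\pi_\Gamma$ is surjective even on the generic connections.

Proof. $\pi_\Gamma$ is surjective on $\overline{\mathcal{A}}$ as proven in [10]. By Proposition 4.11 there is, for every $\overline{A} \in \overline{\mathcal{A}}$, an $\overline{A}'$ with $\mathrm{Typ}(\overline{A}') = t_{\max}$ and $\pi_\Gamma(\overline{A}') = \pi_\Gamma(\overline{A})$.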
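4.5. Stratification of $\overline{\mathcal{A}}$. First we recall the general definition of a stratification [14].

Definition 4.6. A countable family $\mathcal{S}$ of non-empty subsets of a topological space $X$ is called a stratification of $X$ iff $\mathcal{S}$ is a covering for $X$ and for all $U, V \in \mathcal{S}$ we have (with $\overline{U}$ denoting the closure of $U$)
• $U \cap V \neq \emptyset \Longrightarrow U = V$,
• $\overline{U} \cap V \neq \emptyset \Longrightarrow \overline{U} \supseteq V$ and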
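• $\overline{U} \cap V \neq \emptyset \Longrightarrow \overline{V} \cap (U \cup V) = V$.
The elements of such a stratification $\mathcal{S}$ are called strata. A stratification is called topologically regular iff for all $U, V \in \mathcal{S}$,
$U \neq V$ and $\overline{U} \cap V \neq \emptyset \Longrightarrow \overline{V} \cap U = \emptyset$.

Theorem 4.19. $\mathcal{S} := \{\overline{\mathcal{A}}_{=t} \mid t \in \mathcal{T}\}$ is a topologically regular stratification of $\overline{\mathcal{A}}$. Analogously, $\{(\overline{\mathcal{A}}/\overline{\mathcal{G}})_{=t} \mid t \in \mathcal{T}\}$ is a topologically regular stratification of $\overline{\mathcal{A}}/\overline{\mathcal{G}}$.

Proof.
• Obviously, $\mathcal{S}$ is a covering of $\overline{\mathcal{A}}$.
• For a compact Lie group the set of all types, i.e. all conjugacy classes of Howe subgroups of $G$, is at most countable (cf. [14]).
• Moreover, from $\overline{\mathcal{A}}_{=t_1} \cap \overline{\mathcal{A}}_{=t_2} \neq \emptyset$, $\overline{\mathcal{A}}_{=t_1} = \overline{\mathcal{A}}_{=t_2}$ immediately follows.
• Due to Corollary 4.13 we have $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) = \overline{\mathcal{A}}_{\leq t_1}$, i.e. from $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) \cap \overline{\mathcal{A}}_{=t_2} \neq \emptyset$, $t_2 \leq t_1$ follows and thus $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) \supseteq \overline{\mathcal{A}}_{=t_2}$. (Note that $\mathrm{Cl}(U)$ denotes again the closure of $U$, here w.r.t. $\overline{\mathcal{A}}$.)
• Analogously we get $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_2}) \cap (\overline{\mathcal{A}}_{=t_1} \cup \overline{\mathcal{A}}_{=t_2}) = \overline{\mathcal{A}}_{\leq t_2} \cap (\overline{\mathcal{A}}_{=t_1} \cup \overline{\mathcal{A}}_{=t_2}) = \overline{\mathcal{A}}_{=t_2}$.
• Likewise, from $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) \cap \overline{\mathcal{A}}_{=t_2} \neq \emptyset$ and $\overline{\mathcal{A}}_{=t_1} \neq \overline{\mathcal{A}}_{=t_2}$, $t_1 > t_2$ follows, i.e. $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_2}) \cap \overline{\mathcal{A}}_{=t_1} = \emptyset$.
Consequently, $\mathcal{S}$ is a topologically regular stratification of $\overline{\mathcal{A}}$.

For a regular stratification it would be required that each stratum carries the structure of a manifold that is compatible with the topology of the total space. In contrast to the case of the classical gauge orbit space [14], this is not fulfilled for generalized connections.

5. Non-Complete Connections

We round off this paper with the proof that the set of the so-called non-complete connections is contained in a set of induced Haar measure zero. This section stands somewhat apart from the rest because it is the only section that is not only algebraic and topological, but also measure theoretical. Again, $G$ is compact.

Definition 5.1. Let $\overline{A} \in \overline{\mathcal{A}}$ be a connection.
1. $\overline{A}$ is called complete $\iff H_{\overline{A}} = G$.
2. $\overline{A}$ is called almost complete $\iff \overline{H_{\overline{A}}} = G$.
3. $\overline{A}$ is called non-complete $\iff \overline{H_{\overline{A}}} \neq G$.

Obviously, we have

Lemma 5.1. If $\overline{A} \in \overline{\mathcal{A}}$ is complete (almost complete, non-complete), then $\overline{A} \circ \overline{g}$ is complete (almost complete, non-complete) for all $\overline{g} \in \overline{\mathcal{G}}$.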
Thus, the total information about the completeness of a connection is already contained in its gauge orbit. Now, to the main assertion of this section.

Proposition 5.2. Let $\mathcal{N} := \{\overline{A} \in \overline{\mathcal{A}} \mid \overline{A} \text{ non-complete}\}$. Then $\mathcal{N}$ is contained in a set of $\mu_0$-measure zero, where $\mu_0$ is the induced Haar measure on $\overline{\mathcal{A}}$ [2, 6, 10].
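Since $\mathcal{N}$ is gauge invariant, we have

Corollary 5.3. Let $[\mathcal{N}] := \{[\overline{A}] \in \overline{\mathcal{A}}/\overline{\mathcal{G}} \mid \overline{A} \text{ non-complete}\}$. Then $[\mathcal{N}]$ is contained in a set of $\mu_0$-measure zero.

For the proof of the proposition we still need the following

Lemma 5.4. Let $U \subseteq G$ be measurable with $\mu_{\mathrm{Haar}}(U) > 0$ and $\mathcal{N}_U := \{\overline{A} \in \overline{\mathcal{A}} \mid H_{\overline{A}} \subseteq G \setminus U\}$. Then $\mathcal{N}_U$ is contained in a set of $\mu_0$-measure zero.

Proof.
• Let $k \in \mathbb{N}$ and $\Gamma_k$ be some connected graph with one vertex $m$ and $k$ edges $\alpha_1, \ldots, \alpha_k \in \mathcal{HG}$. Such a graph does indeed exist for $\dim M \geq 2$. For instance, take $k$ circles $K_i$ with centers in $(\frac{1}{i}, 0, \ldots)$ and radii $\frac{1}{i}$. By means of an appropriate chart mapping around $m$ these circles define a graph with the desired properties. Furthermore, let
$\pi_k : \overline{\mathcal{A}} \longrightarrow G^k, \quad \overline{A} \longmapsto (h_{\overline{A}}(\alpha_1), \ldots, h_{\overline{A}}(\alpha_k)).$
• Denote now by $\mathcal{N}_{k,U} := \pi_k^{-1}((G \setminus U)^k)$ the set of all connections whose holonomies on $\Gamma_k$ are not contained in $U$. By construction we have $\mathcal{N}_U \subseteq \mathcal{N}_{k,U}$.
• Since the characteristic function $\chi_{\mathcal{N}_{k,U}}$ for $\mathcal{N}_{k,U}$ is obviously a cylindrical function, we get
$\mu_0(\mathcal{N}_{k,U}) = \int_{\overline{\mathcal{A}}} \chi_{\mathcal{N}_{k,U}}\, d\mu_0 = \int_{\overline{\mathcal{A}}} \pi_k^*(\chi_{(G \setminus U)^k})\, d\mu_0 = \int_{G^k} \chi_{(G \setminus U)^k}\, d\mu_{\mathrm{Haar}}^k = [\mu_{\mathrm{Haar}}(G \setminus U)]^k.$
• From $\mathcal{N}_U \subseteq \mathcal{N}_{k,U}$ for all $k$ follows $\mathcal{N}_U \subseteq \bigcap_k \mathcal{N}_{k,U}$. But $\mu_0(\bigcap_k \mathcal{N}_{k,U}) \leq \mu_0(\mathcal{N}_{k,U}) = \mu_{\mathrm{Haar}}(G \setminus U)^k$ for all $k$, i.e. $\mu_0(\bigcap_k \mathcal{N}_{k,U}) = 0$, because $\mu_{\mathrm{Haar}}(G \setminus U) = 1 - \mu_{\mathrm{Haar}}(U) < 1$.

Proof of Proposition 5.2.
• Let $(\varepsilon_k)_{k \in \mathbb{N}}$ be some null sequence. Furthermore, let $\{U_{k,i}\}_i$ be for each $k$ a finite covering of $G$ by open sets $U_{k,i}$ whose respective diameters are smaller than $\varepsilon_k$. Now define $\mathcal{N}' := \bigcup_k \bigcup_i \mathcal{N}_{U_{k,i}}$.
• Since $U_{k,i}$ is open and $G$ is compact, $U_{k,i}$ is measurable with $\mu_{\mathrm{Haar}}(U_{k,i}) > 0$. Due to Lemma 5.4 we have $\mathcal{N}_{U_{k,i}} \subseteq \mathcal{N}^*_{U_{k,i}}$ with $\mu_0(\mathcal{N}^*_{U_{k,i}}) = 0$ for all $k, i$; thus $\mathcal{N}' \subseteq \mathcal{N}^* := \bigcup_k \bigcup_i \mathcal{N}^*_{U_{k,i}}$ with $\mu_0(\mathcal{N}^*) = 0$.
• We are left to show $\mathcal{N} \subseteq \mathcal{N}'$. Let $\overline{A} \in \mathcal{N}$. Then there is an open $U \subseteq G$ with $H_{\overline{A}} \subseteq G \setminus U$. Now let $m \in U$. Then $\varepsilon := \mathrm{dist}(m, \partial U) > 0$. Choose $k$ such that $\varepsilon_k < \varepsilon$. Then choose a $U_{k,i}$ with $m \in U_{k,i}$. We get for all $x \in U_{k,i}$: $d(x, m) \leq \mathrm{diam}\, U_{k,i} < \varepsilon_k < \varepsilon$, i.e. $x \in U$. Consequently, $U_{k,i} \subseteq U$ and thus $H_{\overline{A}} \subseteq G \setminus U_{k,i}$, i.e. $\overline{A} \in \mathcal{N}'$.

Corollary 5.5. The set of all generic connections (i.e. connections of maximal type) has $\mu_0$-measure 1.

Proof. Every almost complete connection $\overline{A}$ has type $[Z(H_{\overline{A}})] = [Z(G)] = t_{\max}$. (Observe that the centralizer of a set $U \subseteq G$ equals that of its closure $\overline{U}$.) Since $\overline{\mathcal{A}}_{=t_{\max}}$ is open due to Proposition 4.8, thus measurable, Proposition 5.2 yields the assertion.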
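The geometric decay used in the proof of Lemma 5.4 can be made tangible with a toy case. The following worked example is an illustration only and does not appear in the paper: it assumes $G = U(1)$ with normalized Haar measure and takes $U$ to be an open arc covering one quarter of the circle.

```latex
\begin{align*}
G = U(1), \quad \mu_{\mathrm{Haar}}(U) = \tfrac{1}{4}
  \quad &\Longrightarrow \quad \mu_{\mathrm{Haar}}(G \setminus U) = \tfrac{3}{4}, \\
\mu_0(\mathcal{N}_{k,U}) = \bigl[ \mu_{\mathrm{Haar}}(G \setminus U) \bigr]^k
  &= \bigl( \tfrac{3}{4} \bigr)^k, \\
k = 10: \quad \bigl( \tfrac{3}{4} \bigr)^{10}
  &= \tfrac{59049}{1048576} \approx 0.0563, \\
\mu_0\Bigl( \bigcap_{k} \mathcal{N}_{k,U} \Bigr)
  \leq \lim_{k \to \infty} \bigl( \tfrac{3}{4} \bigr)^k &= 0.
\end{align*}
```

Each additional independent loop of the graph $\Gamma_k$ multiplies the measure by the same factor below one, which is why demanding that all holonomies avoid $U$ leaves only a null set.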
The last assertion is very important: it justifies the definition of the natural induced Haar measure on $\overline{\mathcal{A}}/\overline{\mathcal{G}}$ (cf. [2, 10]). Actually, there are (at least) two different possibilities for this. Namely, let $X$ be some general topological space equipped with a measure $\mu$ and let $G$ be some topological group acting on $X$. The problem now is to find a natural measure $\mu_G$ on the orbit space $X/G$. On the one hand, one could simply define $\mu_G(U) := \mu(\pi^{-1}(U))$ for all measurable $U \subseteq X/G$, where $\pi : X \longrightarrow X/G$ is the canonical projection. On the other hand, one could also stratify the orbit space. For instance, in the easiest case we could have $X = X/G \times G$. In general, one gets (roughly speaking)

$X = \bigcup_V V/G \times G_V \backslash G,$

where the sets $V$ form an appropriate disjoint decomposition of $X$ and $G_V$ characterizes the type of the orbits on $V$. Now one naively defines

$\mu_G(U) := \sum_V \frac{\mu(\pi^{-1}(U) \cap V)}{\mu_{G,V}(G/G_V)} := \sum_V \mu(\pi^{-1}(U) \cap V)\, \mu_V(G_V),$

where $\mu_V$ measures the "size" of the stabilizer $G_V$ in $G$. This second variant is nothing but the transformation of the measures using the Faddeev-Popov determinant (i.e. the Jacobi determinant) $\frac{d\mu}{d\mu_G}$. In contrast to the first method, here the orbit space and not the total space is regarded as primary. For a uniform distribution of the measure over all points of the total space, the image measure on the orbit space need no longer be uniformly distributed; the orbits are weighted by size. For the second method, however, the uniformity is maintained. In other words, the gauge freedom does not play any rôle when the Faddeev-Popov method is used. Nevertheless, we see in our concrete case of $\pi_{\overline{\mathcal{A}}/\overline{\mathcal{G}}} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}/\overline{\mathcal{G}}$ that both methods are equivalent because the Faddeev-Popov determinant is equal to 1 (at least outside a set of $\mu_0$-measure zero). This follows immediately from the slice theorem and the corollary above that the generic connections have total measure 1.

6. Discussion

In the present paper we gained a lot of information about the structure of the generalized gauge orbit space within the Ashtekar framework. The most important tool was the theory of compact transformation groups on topological spaces. This enabled us to investigate the action of the group of generalized gauge transforms on the space of generalized connections. Our considerations were guided by the results of Kondracki and Rogulski [14] about the structure of the classical gauge orbit space for Sobolev connections. The methods used there are, however, fundamentally different from ours. Within the Ashtekar approach most of the proofs are purely algebraic or topological; in the classical case the methods are based mainly on the theory of fiber bundles, i.e. analysis and differential geometry.

In Sect. 3 we proved that the $\overline{\mathcal{G}}$-stabilizer $B(\overline{A})$ of a connection $\overline{A}$ is isomorphic to the $G$-centralizer $Z(H_{\overline{A}})$ of the holonomy group of $\overline{A}$. Furthermore, two connections have conjugate $\overline{\mathcal{G}}$-stabilizers if and only if their holonomy centralizers are conjugate. Thus, the type of a generalized connection can be defined equivalently both by the $\overline{\mathcal{G}}$-conjugacy class of $B(\overline{A})$ (as known from the general theory of transformation groups) and by the $G$-conjugacy class of $Z(H_{\overline{A}})$. This is a significant difference from the classical case.

The reduction of our problem from structures in $\overline{\mathcal{G}}$ to those in $G$ was the crucial idea in Sect. 4. Since centralizers in compact groups are even generated by a finite number of elements, we could model the gauge orbit type $[Z(H_{\overline{A}})]$ on a finite-dimensional space. Using an appropriate mapping we lifted the corresponding slice theorem to a slice theorem on $\overline{\mathcal{A}}$. This is the main result of our paper. Collecting connections of one and the same type we got the so-called strata whose openness was an immediate
consequence of the slice theorem. In the next step we showed that the natural ordering on the set of the types encodes the topological properties of the strata. More precisely, we proved that the closure of a stratum contains (besides the stratum itself) exactly the union of all strata having a smaller type. This implied that this decomposition of $\overline{\mathcal{A}}$ is a topologically regular stratification.

All these results hold in the classical case as well. This is very remarkable because our proofs partly used completely different ideas. However, two results of this paper go beyond the classical theorems. First, we were able to determine the full set of all gauge orbit types occurring in $\overline{\mathcal{A}}$. For Sobolev connections this set is known, to the best of our knowledge, only for certain bundles. Recently, Rudolph, Schmidt and Volobuev solved this problem completely for $SU(n)$-bundles $P$ over two-, three- and four-dimensional manifolds [21]. The main problem in the Sobolev case is the non-triviality of the bundle $P$. This can exclude orbit types that occur in the trivial bundle $M \times SU(n)$. But this problem is irrelevant for the Ashtekar framework: every regular connection in every $G$-bundle over $M$ is contained in $\overline{\mathcal{A}}$ [2]. This means that, in a certain sense, we only have to deal with trivial bundles. Second, in the Ashtekar framework there is a well-defined natural measure on $\overline{\mathcal{A}}$. Using this we could show that the generic stratum has total measure one; this is not true in the classical case. The proposition above now implies that the Faddeev-Popov determinant for the transformation from $\overline{\mathcal{A}}$ to $\overline{\mathcal{A}}/\overline{\mathcal{G}}$ is equal to 1. This, in turn, justifies the definition of the induced Haar measure on $\overline{\mathcal{A}}/\overline{\mathcal{G}}$ by projecting the corresponding measure for $\overline{\mathcal{A}}$, which has been discussed in detail in Sect. 5.

Hence, we were able to "transfer" the classical theory of strata in a certain sense (almost) completely to the Ashtekar program. We emphasize that all assertions are valid for each compact structure group, both in the analytic and in the $C^r$-smooth case.

What could be the next steps in this area? An important item, completely ignored in this paper, is the physical interpretation of the knowledge gained. So we conclude our paper with a few ideas that could link mathematics and physics:
• Topology. What is the topological structure of the strata? Are they connected, or is $\overline{\mathcal{A}}$ itself connected (at least for connected $G$)? Is $\overline{\mathcal{A}}_{=t}$ globally trivial over $(\overline{\mathcal{A}}/\overline{\mathcal{G}})_{=t}$, at least for the generic stratum with $t = t_{\max}$? What sections exist in these bundles, i.e. what gauge fixings exist in $\overline{\mathcal{A}}$? These problems are closely related to the so-called Gribov problem, the non-existence of global gauge fixings for classical connections in principal fiber bundles with compact, non-commutative structure group (see, e.g., [22]). This causes many difficulties for the quantization of such a Yang–Mills theory that have not been circumvented up to now.
• Algebraic topology. Is there a meaningful, i.e. in particular non-trivial, cohomology theory on $\overline{\mathcal{A}}$? First abstract attempts can be found, e.g., in [4, 3]. Is it possible to construct characteristic classes or even topological invariants in this way?
• Measure theory. How are arbitrary measures distributed over single strata? In other words: what properties do measures have that are defined by the choice of a measure on each single stratum?
This is extremely interesting, in particular from the physical point of view, because the choice of a $\mu_0$-absolutely continuous measure $\mu$ on $\overline{\mathcal{A}}$ corresponds to the choice of an action functional $S$ on $\overline{\mathcal{A}}$ by $\int_{\overline{\mathcal{A}}} f\, d\mu = \int_{\overline{\mathcal{A}}} f\, e^{-S}\, d\mu_0$. According to Lebesgue's decomposition theorem all measures whose support is not fully contained in the generic stratum have singular parts.

Finally, we have to stress that the present paper only investigates the case of pure gauge theories. Of course, this is physically not satisfying. Therefore the next goal should be the inclusion of matter fields. A first step has already been taken by Thiemann [23], although the aspects considered in the present paper did not play any rôle in Thiemann's paper.

Acknowledgements. I am very grateful to Gerd Rudolph and Eberhard Zeidler for their great support while I wrote my diploma thesis and the present paper. Additionally, I thank Gerd Rudolph for reading the drafts. Moreover, I am grateful to Domenico Giulini and Matthias Schmidt for convincing me to hope for the existence of a slice theorem on $\overline{\mathcal{A}}$. I thank Jerzy Lewandowski for asking me how the notion of webs is related to the notion of paths in the present paper. Finally, I thank the Max-Planck-Institut für Mathematik in den Naturwissenschaften for its generous promotion.
References

1. Ashtekar, A. and Isham, C. J.: Representations of the holonomy algebras of gravity and nonabelian gauge theories. Class. Quant. Grav. 9, 1433–1468 (1992)
2. Ashtekar, A. and Lewandowski, J.: Representation theory of analytic holonomy $C^*$ algebras. In: Baez, J. C. (ed.) Knots and Quantum Gravity. Oxford Lecture Series in Mathematics and its Applications, Oxford: Oxford University Press, 1994, pp. 21–61
3. Ashtekar, A. and Lewandowski, J.: Differential geometry on the space of connections via graphs and projective limits. J. Geom. Phys. 17, 191–230 (1995)
4. Ashtekar, A. and Lewandowski, J.: Projective techniques and functional integration for gauge theories. J. Math. Phys. 36, 2170–2191 (1995)
5. Ashtekar, A., Lewandowski, J., Marolf, D., Mourão, J. and Thiemann, Th.: SU(N) quantum Yang–Mills theory in two dimensions: A complete solution. J. Math. Phys. 38, 5453–5482 (1997)
6. Baez, J. C. and Sawin, S.: Functional integration on spaces of connections. J. Funct. Anal. 150, 1–26 (1997)
7. Baez, J. C. and Sawin, S.: Diffeomorphism-invariant spin network states. J. Funct. Anal. 158, 253–266 (1998)
8. Bredon, G. E.: Introduction to Compact Transformation Groups. New York: Academic Press, Inc., 1972
9. Burbaki, N.: Gruppy i algebry Li (Gl. IX: Kompaktnye veshchestvennye gruppy Li). Moskva: Izdatelstvo «Mir», 1986
10. Fleischhack, Ch.: Hyphs and the Ashtekar-Lewandowski Measure. MIS-Preprint 3/2000. To appear in J. Geom. Phys.
11. Fleischhack, Ch.: A new type of loop independence and SU(N) quantum Yang–Mills theory in two dimensions. J. Math. Phys. 41, 76–102 (2000)
12. Giles, R.: Reconstruction of gauge potentials from Wilson loops. Phys. Rev. D24, 2160–2168 (1981)
13. Kelley, J. L.: General Topology. Toronto, New York, London: D. van Nostrand Company, Inc., 1955
14. Kondracki, W. and Rogulski, J.: On the stratification of the orbit space for the action of automorphisms on connections (Dissertationes mathematicae 250). Warszawa: 1985
15. Kondracki, W. and Sadowski, P.: Geometric structure on the orbit space of gauge connections. J. Geom. Phys. 3, 421–434 (1986)
16. Lewandowski, J.: Group of loops, holonomy maps, path bundle and path connection. Class. Quant. Grav. 10, 879–904 (1993)
17. Lewandowski, J. and Thiemann, Th.: Diffeomorphism invariant quantum field theories of connections in terms of webs. Class. Quant. Grav. 16, 2299–2322 (1999)
18. Marolf, D. and Mourão, J.: On the support of the Ashtekar-Lewandowski measure. Commun. Math. Phys. 170, 583–606 (1995)
19. Mitter, P. K.: Geometry of the space of gauge orbits and the Yang-Mills dynamical system. Lectures given at Cargèse Summer Inst. on Recent Developments in Gauge Theories, Cargèse, 1979
20. Rendall, A.: Comment on a paper of Ashtekar and Isham. Class. Quant. Grav. 10, 605–608 (1993)
21. Rudolph, G., Schmidt, M. and Volobuev, I.: Classification of gauge orbit types for SU(n) gauge theories. math-ph/0003044
22. Singer, I. M.: Some remarks on the Gribov ambiguity. Commun. Math. Phys. 60, 7–12 (1978)
23. Thiemann, Th.: Kinematical Hilbert spaces for Fermionic and Higgs quantum field theories. Class. Quant. Grav. 15, 1487–1512 (1998)

Communicated by H. Nicolai
Commun. Math. Phys. 214, 651 – 677 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
On Action-Angle Variables for the Second Poisson Bracket of KdV

T. Kappeler, M. Makarov

Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland

Received: 12 November 1999 / Accepted: 9 May 2000
Abstract: We prove that on the Sobolev spaces $H^N(S^1)$ ($N \geq 0$), each leaf of the foliation induced by the second Poisson bracket of KdV admits global action-angle variables. The actions with respect to the first bracket raise to the actions with respect to the second bracket. The angles for the first bracket are, at the same time, angles for the second bracket.
0. Introduction

Consider the Korteweg–de Vries equation (KdV) with periodic boundary conditions,

$\partial_t u = -\partial_x^3 u + 6u\, \partial_x u; \qquad u(x+1, t) = u(x, t) \quad (x, t \in \mathbb{R}).$

It is well known that this equation can be viewed as a bihamiltonian system (cf. e.g. [GZ, Ma, Mc]): i.e. there exist two Poisson structures, $\partial_x$, referred to as the first Poisson structure, and

$L_q := -\frac{1}{2}\partial_x^3 + q\, \partial_x + \partial_x q, \qquad (0.1)$

referred to as the second Poisson structure, with the following two properties:
(BH1) $\partial_x$ and $L_q$ are compatible, i.e. $L_q + \partial_x$ is a Poisson structure;
(BH2) KdV is a Hamiltonian system with respect to $\partial_x$ and $L_q$.
Indeed,

$\partial_t u = \partial_x \frac{\partial H_1}{\partial q(x)}(u), \qquad (0.2)$
where the Hamiltonian $H_1$ is given by $H_1(q) := \int_0^1 \bigl(\frac{1}{2}(\partial_x q)^2 + q^3\bigr)\, dx$, and

$\partial_t u = L_q \frac{\partial H_2}{\partial q(x)}(u), \qquad (0.3)$

where $H_2(q) := \frac{1}{2} \int_0^1 q(x)^2\, dx$. Both Poisson structures are degenerate and induce symplectic foliations. However, the second Poisson structure is not constant. Nevertheless the second Poisson structure is, in many respects, the more natural one of the two (cf. e.g. [GZ, KZ]).

For the first Poisson structure, the average $[q] := \int_0^1 q(x)\, dx$ is a Casimir and the symplectic leaves of the induced foliation on the Sobolev spaces $(H^N(S^1))_{N \geq 0}$ are given by the affine spaces

$H_c^N(S^1) := \{q \in H^N(S^1) \mid [q] = c\}$

($S^1$ denotes the unit circle.) It has been proved in [KM2] (cf. also [BBGK] and [BKM1]) that each symplectic leaf admits global action-angle coordinates. The aim of this paper is to prove that each leaf of the symplectic foliation induced by the second Poisson structure admits global action-angle variables as well. To the best of our knowledge, this is the first result of its kind, providing action-angle coordinates for a completely integrable system of infinite dimension with a non-constant Poisson structure.

To state our results more precisely, we first have to describe the symplectic foliation of the second Poisson structure. For $q \in H^N(S^1)$, consider the Schrödinger equation

$-y'' + qy = \lambda y. \qquad (0.4)$

Denote by $y_1(x, \lambda, q)$ and $y_2(x, \lambda, q)$ the fundamental solutions of (0.4), which are elements in $H^{N+2}_{\mathrm{loc}}(\mathbb{R})$, and by $\Delta(\lambda, q)$ the discriminant

$\Delta(\lambda, q) := y_1(1, \lambda, q) + y_2'(1, \lambda, q). \qquad (0.5)$
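For orientation, the discriminant can be computed by hand in the simplest case $q \equiv 0$. The following worked example is a sketch under the usual normalization of the fundamental solutions, $y_1(0) = 1$, $y_1'(0) = 0$, $y_2(0) = 0$, $y_2'(0) = 1$, which is assumed here rather than quoted from the text, and with the discriminant taken as $y_1(1) + y_2'(1)$.

```latex
\begin{align*}
q \equiv 0: \quad
  & y_1(x, \lambda) = \cos(\sqrt{\lambda}\, x), \qquad
    y_2(x, \lambda) = \frac{\sin(\sqrt{\lambda}\, x)}{\sqrt{\lambda}}, \\
  & \Delta(\lambda, 0) = y_1(1, \lambda) + y_2'(1, \lambda) = 2 \cos \sqrt{\lambda}, \\
  & \Delta(\lambda, 0) = \pm 2 \iff \lambda = \pi^2 j^2 \quad (j = 0, 1, 2, \dots), \\
  & \lambda_0 = 0, \qquad \lambda_{2j-1} = \lambda_{2j} = \pi^2 j^2 \quad (j \geq 1).
\end{align*}
```

So for the zero potential every eigenvalue of the problem on $[0,2]$ except the lowest one is double, all gaps $(\lambda_{2j-1}, \lambda_{2j})$ are collapsed, and $\Delta(0, 0) = 2$.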
It turns out that $\Delta(0, q)$ is a Casimir for the second Poisson structure whose level sets describe the regular leaves on the Sobolev spaces $(H^N(S^1))_{N \geq 0}$. However, the foliation induced by the second Poisson structure is not regular and admits singular leaves. To describe them, denote by $\mathrm{spec}(q)$ the spectrum $\lambda_0(q) < \lambda_1(q) \leq \lambda_2(q) < \ldots$ of the operator $-\frac{d^2}{dx^2} + q$ when considered with periodic boundary conditions on the interval $[0, 2]$ and let $\mathrm{spec}_d(q) \subseteq \mathrm{spec}(q)$ denote the subset of double eigenvalues. Further introduce, for $N \geq 0$, the following level sets: For $n \geq 0$,

$\mathcal{L}^N_{c,n} := \{q \in H^N(S^1) \mid \lambda_{n-1}(q) < 0 < \lambda_n(q);\ \Delta(0, q) = c\} \qquad (0.6)$

(with the convention $\lambda_{-1} = -\infty$). For $n = 2k \geq 2$,

$\mathcal{I}^N_n := \{q \in H^N(S^1) \mid \lambda_{n-1}(q) = \lambda_n(q) = 0\}; \qquad (0.7)$

further $\mathcal{J}^N_0 := \{q \in H^N(S^1) \mid \lambda_0(q) = 0\}$ and for $n = 2k \geq 2$,

$\mathcal{J}^N_{\pm,n} := \{q \in H^N(S^1) \mid \lambda_{n - \frac{1}{2} \pm \frac{1}{2}}(q) = 0;\ 0 \notin \mathrm{spec}_d(q)\}. \qquad (0.8)$

To make notation shorter, we set $\mathcal{L}_{c,n} := \mathcal{L}^0_{c,n}$, $\mathcal{I}_n := \mathcal{I}^0_n$, $\mathcal{J}_0 := \mathcal{J}^0_0$, and $\mathcal{J}_{\pm,n} := \mathcal{J}^0_{\pm,n}$.
Notice that, for $N \geq 0$, $\mathcal{L}^N_{c,n} = \emptyset$ for $|c| < 2$ and $n$ even, or $c > 2$ and $n \neq 4k$, or $c < -2$ and $n \neq 4k + 2$, and that $\mathcal{L}^N_{(-1)^k 2, 2k} = \mathcal{I}^N_{2k} \sqcup \mathcal{J}^N_{+,2k} \sqcup \mathcal{J}^N_{-,2k}$ (disjoint union). In the sequel, the pair $(c, n)$ will always be assumed to satisfy one of the following three conditions ($k \geq 0$)

(i) $|c| < 2$, $n = 2k + 1$; (ii) $c > 2$, $n = 4k$; (iii) $c < -2$, $n = 4k + 2$,

as in all other cases $\mathcal{L}^N_{c,n} = \emptyset$. The following result was established in [KM1]:

Theorem 1. For $N \geq 0$, $\mathcal{L}^N_{c,n}$ ($|c| \neq 2$, $n \geq 0$), $\mathcal{J}^N_{\pm,n}$ ($n \geq 2$, even), and $\mathcal{J}^N_0$ are connected, real analytic submanifolds of $H^N(S^1)$ of codimension 1 and are the regular symplectic leaves of the foliation of the second Poisson structure, whereas $\mathcal{I}^N_n$ ($n \geq 2$, even) are connected, real analytic submanifolds of $H^N(S^1)$ of codimension 3 and are the singular symplectic leaves.
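Conditions (i) and (ii) can be checked by hand on constant potentials. The worked example below is an illustration that does not appear in the paper: it assumes $q \equiv a$ with a real constant $a$, for which the equation $-y'' + ay = \lambda y$ is the free equation with spectral parameter $\lambda - a$, so that $\Delta(\lambda, a) = 2\cos\sqrt{\lambda - a}$ and the spectrum on $[0, 2]$ is that of the zero potential shifted by $a$.

```latex
\begin{align*}
q \equiv a: \quad
  & \Delta(\lambda, a) = 2 \cos \sqrt{\lambda - a}, \qquad
    \lambda_0 = a, \quad \lambda_{2j-1} = \lambda_{2j} = a + \pi^2 j^2 \ (j \geq 1), \\
a = 1: \quad
  & \Delta(0, 1) = 2 \cos \sqrt{-1} = 2 \cosh 1 \approx 3.086 > 2, \qquad
    \lambda_{-1} = -\infty < 0 < \lambda_0 = 1, \\
  & \text{so } n = 0 = 4k \text{ with } k = 0: \text{ case (ii)}, \\
a = -\tfrac{\pi^2}{4}: \quad
  & \Delta\bigl(0, -\tfrac{\pi^2}{4}\bigr) = 2 \cos \tfrac{\pi}{2} = 0, \qquad
    \lambda_0 = -\tfrac{\pi^2}{4} < 0 < \lambda_1 = \tfrac{3\pi^2}{4}, \\
  & \text{so } n = 1 = 2k + 1 \text{ with } k = 0: \text{ case (i)}.
\end{align*}
```

In both instances the sign pattern of the eigenvalues around $0$ and the value of $\Delta(0, q)$ land in one of the admissible combinations listed above, as expected.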
To describe our results in a convenient form, we introduce the following

Definition 1. We say that a real analytic submanifold $M$ of a Hilbert space of sequences with elements $(v_j, w_j)_{j \in A}$ ($A \subset \mathbb{N}$), endowed with the canonical symplectic structure $\sum_j dv_j \wedge dw_j$, is an action-angle model space for a leaf $\mathcal{L}$ of the symplectic foliation of a Poisson structure if there exists a symplectomorphism $\Phi : \mathcal{L} \to M$ which, together with its inverse, is real analytic so that, for any $j \in A$, the elements $(v_j, w_j)$ are either action-angle variables or Birkhoff variables (i.e. the associated symplectic polar coordinates are action-angle coordinates). We call $\Phi$ an analytic action-angle diffeomorphism.

Introduce, for $A \subseteq \mathbb{N}$,

$h^m(A; \mathbb{R}^2) := \bigl\{ z = (x_j, y_j)_{j \in A} \bigm| \|z\|_m^2 := \sum_{j \in A} j^{2m} (x_j^2 + y_j^2) < \infty \bigr\},$

the cylinder $Z := \mathbb{R} \times S^1$, and the half cylinders $Z^+ := \mathbb{R}_{>0} \times S^1$ and $Z^- := \mathbb{R}_{<0} \times S^1$.

Theorem 2.
(A) For [...] 2 and $n = 0$, $h^{N+\frac{3}{2}}(\mathbb{N}; \mathbb{R}^2)$ is an action-angle model space for $\mathcal{L}^N_{c,n}$.
(B) For $|c| > 2$ and $n = 2k \geq 2$ even, $Z \times h^{N+\frac{3}{2}}(\mathbb{N} \setminus \{k\}; \mathbb{R}^2)$ is an action-angle model space for $\mathcal{L}^N_{c,n}$.
(C) For $n = 2k \geq 2$, $Z^{\mp} \times h^{N+\frac{3}{2}}(\mathbb{N} \setminus \{k\}; \mathbb{R}^2)$ is an action-angle model space for $\mathcal{J}^N_{\pm,n}$.
(D) For $n = 2k \geq 2$, $h^{N+\frac{3}{2}}(\mathbb{N} \setminus \{k\}; \mathbb{R}^2)$ is an action-angle model space for the singular leaf $\mathcal{I}^N_n$.
(E) $h^{N+\frac{3}{2}}(\mathbb{N}; \mathbb{R}^2)$ is an action-angle model space for $\mathcal{J}^N_0$.

To shorten notation we denote by $\mathcal{L}^N_*$ an arbitrary leaf of the foliation and by $M^N_*$ its corresponding action-angle model space, as given by Theorem 2.
Remark 1. It follows from statement (B) that for $|c| > 2$ and $n$ even, $\mathcal{L}_{c,n}$ has nontrivial homology. Its fundamental group is given by $\mathbb{Z}$. Despite this fact, $\mathcal{L}_{c,n}$ admits global action-angle variables (cf. [Du] for a discussion of the existence of global action-angle coordinates in the case of integrable Hamiltonian systems of finite dimension).

Remark 2. The proof of Theorem 2 is presented for the leaves $\mathcal{L}_{c,n}$ with $|c| \neq 2$ and $n \geq 1$. The remaining leaves are treated in a similar fashion.

To show Theorem 2 we use the Birkhoff coordinates $\bigl(\sqrt{2 I_n^{(1)}}\, (\cos\theta_n, \sin\theta_n)\bigr)_{n \geq 1}$ on $H_0^N(S^1)$ ($N \geq 0$) for the first Poisson structure $\partial_x$ (cf. [KM2]) as a starting point, where $(I_n^{(1)}, \theta_n)_{n \geq 1}$ denote action-angle variables with respect to $\partial_x$. Similarly as for $I_n^{(1)}$ ($n \geq 1$), the actions $I_n^{(2)}$ with respect to the second Poisson structure can be represented by periods of a holomorphic differential on the hyperelliptic surface $\mu = \sqrt{\Delta(\lambda, q)^2 - 4}$ associated to the spectrum $\mathrm{spec}(q)$ of the operator $-\frac{d^2}{dx^2} + q$ (cf. [FM, NV]). However, if $\lambda_{2n-1} \leq 0 \leq \lambda_{2n}$, the period integral for $I_n^{(2)}$ is not well defined and needs to be regularized. As a result, in the case where $\lambda_{2n-1} < 0 < \lambda_{2n}$, $I_n^{(2)}$ can take all real values. This phenomenon accounts for the cylinder $Z$ appearing in the action-angle model space in part (B) of Theorem 2. The formulas for the actions $I_n^{(1)}$ and $I_n^{(2)}$ extend to the whole phase space $L^2(S^1)$. Moreover, the bihamiltonian structure relates the actions $I_n^{(1)}$ and $I_n^{(2)}$,

$L_q \frac{\partial I_n^{(2)}}{\partial q(x)} = \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)},$

i.e. $I_n^{(1)}$ can be raised to $I_n^{(2)}$.

In a first step of the proof of Theorem 2 (cf. Sect. 4) it is shown that, on any leaf $\mathcal{L}^N_{c,n}$ with $|c| < 2$ and $N \geq 0$ (with corresponding model space denoted by $M_*$), one obtains a real analytic diffeomorphism $\Phi^{(2)} : \mathcal{L}_* \to M_*$ when the actions $I_n^{(1)}$ in

$\Phi^{(1)}(q) = \bigl(\sqrt{2 I_n^{(1)}}\, (\cos\theta_n, \sin\theta_n)\bigr)_{n \geq 1}$

are replaced by $I_n^{(2)}$ ($n \geq 1$). Similar results hold for all other leaves, with some extra care to be taken for the leaves with topology.

In Sect. 3 we prove that the angles $(\theta_n)_{n \geq 1}$ in the definition of $\Phi^{(1)}$ are, at the same time, conjugate, with respect to the second Poisson structure, to the actions $(I_n^{(2)})_{n \geq 1}$. To establish this result, we were inspired by computations due to McKean–Vaninsky [MV], who proved canonical relations for the defocusing nonlinear Schrödinger equation (NLS).

In future work we plan to extend the results of Theorem 2 to more general phase spaces, including in particular quasiperiodic potentials.

1. Action Variables of the Second Poisson Structure

1.1. Action variables for KdV. Based on Arnold's formula for the action variables of a finite dimensional integrable system, Flaschka and McLaughlin [FM] introduced variables $I_n^{(1)}$ which can be shown to be action variables for KdV with respect to the first
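Poisson structure ($n \geq 1$, $q \in L^2(S^1)$)

$I_n^{(1)}(q) = \frac{1}{\pi} \int_{\Gamma_n} \mathrm{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu, q)}{2}\Bigr)\, d\mu, \qquad (1.1)$

where

$\mathrm{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu, q)}{2}\Bigr) := \log\Bigl(\frac{(-1)^n}{2} \bigl(\Delta(\mu) - \sqrt{\Delta(\mu)^2 - 4}\bigr)\Bigr). \qquad (1.2)$

Here $\Gamma_n$ ($n \geq 1$) is a counterclockwise oriented circuit around the $n$th gap $(\lambda_{2n-1}, \lambda_{2n})$, the root $\sqrt{\Delta(\mu)^2 - 4}$ is the continuous branch (in $q$ and $\lambda$) defined on $\mathbb{C} \setminus \bigcup_{n \geq 0} [\lambda_{2n-1}, \lambda_{2n}]$ and normalized by $i \sqrt{\Delta(\mu, q)^2 - 4} > 0$ for $q = 0$, $\mu = 1$. It is convenient to choose the principal branch of the logarithm in (1.2). For this purpose we assume that $\Gamma_n$ is chosen sufficiently close around the gap $(\lambda_{2n-1}, \lambda_{2n})$ so that it is contained in the domain of the principal branch of the logarithm. In addition we may ensure that $\mathrm{Re}\bigl(\frac{(-1)^n \Delta(\mu)}{2}\bigr) \geq \frac{1}{2}$ for $\mu$ inside $\Gamma_n$.

By the same approach as in [FM] one can define variables $\tilde{I}_n^{(2)}$ with respect to the second Poisson structure (cf. [NV]),

$\tilde{I}_n^{(2)} := \frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu}\, \mathrm{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu, q)}{2}\Bigr)\, d\mu,$

where we assume for the moment that, for any $n \geq 1$, zero is not on $\Gamma_n$. However, the above definition depends on the choice of $\Gamma_n$. More precisely, the integral depends on whether $0$ is inside of $\Gamma_n$ or not. To remove this dependence we slightly change the above definition by adding the Casimir $F_n(q) := -\frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu}\, \mathrm{arccosh}\bigl(\frac{(-1)^n \Delta(-i0)}{2}\bigr)\, d\mu$ to $\tilde{I}_n^{(2)}$, which leads to

$I_n^{(2)} := \frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \Bigl[ \mathrm{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu)}{2}\Bigr) - \mathrm{arccosh}\Bigl(\frac{(-1)^n \Delta(-i0)}{2}\Bigr) \Bigr]\, d\mu, \qquad (1.3)$

where, as usual, $\mathrm{arccosh}\bigl(\frac{(-1)^n \Delta(-i0)}{2}\bigr) = \lim_{\varepsilon \to 0+} \mathrm{arccosh}\bigl(\frac{(-1)^n \Delta(-i\varepsilon)}{2}\bigr)$. Notice that, due to Cauchy's theorem, the Casimir $F_n(q)$ vanishes if $0$ is not inside $\Gamma_n$. Further, the right side of (1.3) is well defined even in the case where $0$ is on the circuit $\Gamma_n$, as the function $\mathrm{arccosh}\bigl(\frac{(-1)^n \Delta(\mu)}{2}\bigr)$ is differentiable in $\mu$ on $\Gamma_n$.

The fact that $F_n(q) = 0$ if $0$ is not inside $\Gamma_n$ can be used to verify that indeed $I_n^{(2)}(q)$ does not depend on the choice of the circuit $\Gamma_n$, as long as $\Gamma_n$ is close to $[\lambda_{2n-1}, \lambda_{2n}]$. Hence, in the case where $0 \notin [\lambda_{2n-1}, \lambda_{2n}]$, (1.3) leads to

$I_n^{(2)} = \frac{1}{\pi} \int_{\lambda_{2n-1}}^{\lambda_{2n}} \frac{1}{\mu}\, \mathrm{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu - i0)}{2}\Bigr)\, d\mu, \qquad (1.4)$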
T. Kappeler, M. Makarov
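whereas in the case $0 \in (\lambda_{2n-1}, \lambda_{2n})$ one gets
\[ I_n^{(2)} = \frac{1}{\pi} \int_{\lambda_{2n-1}}^{\lambda_{2n}} \frac{1}{\mu} \Big( \operatorname{arccosh} \frac{(-1)^n \Delta(\mu - i0)}{2} - \operatorname{arccosh} \frac{(-1)^n \Delta(-i0)}{2} \Big) d\mu. \tag{1.5} \]
Notice that for $0 \in \{\lambda_{2n}, \lambda_{2n-1}\}$, $\operatorname{arccosh} \frac{(-1)^n \Delta(0)}{2} = 0$. Nevertheless, the integral (1.5) remains convergent: in the case $\lambda_{2n} \ne \lambda_{2n-1}$, this holds as arccosh is Hölder continuous of order $\frac12$ for $x \ge 1$. In the case $\lambda_{2n} = \lambda_{2n-1} = 0$, the convergence follows from the case $\lambda_{2n-1} = 0 < \lambda_{2n}$ by a limiting argument which shows that $I_n^{(2)} = 0$ in this case.

1.2. Raising of actions.

Proposition 2. For $q \in L^2(S^1)$ and $n \ge 1$,
\[ L_q \frac{\partial I_n^{(2)}}{\partial q(x)} = \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)}. \]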
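Remark. For a bihamiltonian system with Poisson structures $P_1$ and $P_2$, we say that a functional $F_1$, defined on the phase space, raises to a functional $F_2$ if $P_1 \, dF_1 = P_2 \, dF_2$. Proposition 2 states that $I_n^{(1)}$ raises to $I_n^{(2)}$ ($n \ge 1$). In fact, for any integrable, non degenerate, finite dimensional bihamiltonian system, actions with respect to the first Poisson structure raise to actions for the second Poisson structure.

Proof. Choose $\Gamma_n$ so that $0 \notin \Gamma_n$. Then from (1.3) one obtains
\[ \frac{\partial I_n^{(2)}}{\partial q(x)} = -\frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \Big( \frac{1}{\sqrt{\Delta^2(\mu) - 4}} \frac{\partial \Delta(\mu)}{\partial q(x)} - \frac{1}{\sqrt{\Delta^2(-i0) - 4}} \frac{\partial \Delta(0)}{\partial q(x)} \Big) d\mu. \tag{1.6} \]
Observe that any product $y_i \cdot y_j$ of the two fundamental solutions $y_1 = y_1(x, \mu, q)$, $y_2 = y_2(x, \mu, q)$ of $-y'' + qy = \mu y$ is in the domain of $L_q$ and satisfies $L_q(y_i y_j) = 2\mu \frac{d}{dx}(y_i y_j)$. As $\frac{\partial \Delta(\mu)}{\partial q(x)}$ is a linear combination of $y_1(x, \mu)^2$, $y_1(x, \mu) y_2(x, \mu)$ and $y_2(x, \mu)^2$ we conclude that
\[ L_q \frac{\partial \Delta(\mu)}{\partial q(x)} = 2\mu \frac{d}{dx} \frac{\partial \Delta(\mu)}{\partial q(x)}. \]
In particular it follows that $L_q \frac{\partial \Delta(0)}{\partial q(x)} = 0$ and, in view of (1.6), this leads to
\[ L_q \frac{\partial I_n^{(2)}}{\partial q(x)} = -\frac{d}{dx} \frac{1}{\pi} \int_{\Gamma_n} \frac{1}{\sqrt{\Delta^2(\mu) - 4}} \frac{\partial \Delta(\mu)}{\partial q(x)} \, d\mu = \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)}. \]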
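The identity $L_q(y_i y_j) = 2\mu \frac{d}{dx}(y_i y_j)$ used in the proof can be checked by hand. The sketch below assumes that $L_q = -\frac12 \partial_x^3 + q \partial_x + \partial_x q$; this explicit form is not stated in the present section and is an assumption, chosen because it is consistent with the identity $L_q 1 = \frac{d}{dx} q$ used later in this section. The shorthand $f$ is introduced here for illustration.

```latex
% Assumption (not stated in this section): L_q = -\tfrac12 \partial_x^3 + q\,\partial_x + \partial_x q,
% i.e. L_q f = -\tfrac12 f''' + 2 q f' + q' f.
% f := y_i y_j with -y_i'' + q y_i = \mu y_i and -y_j'' + q y_j = \mu y_j.
\begin{align*}
  f''  &= y_i'' y_j + 2 y_i' y_j' + y_i y_j''
        = 2 (q - \mu) f + 2 y_i' y_j' , \\
  f''' &= 2 q' f + 2 (q - \mu) f' + 2 \big( y_i'' y_j' + y_i' y_j'' \big)
        = 2 q' f + 4 (q - \mu) f' , \\
  L_q f &= -\tfrac12 f''' + 2 q f' + q' f
        = - q' f - 2 (q - \mu) f' + 2 q f' + q' f
        = 2 \mu f' .
\end{align*}
```

Under the same assumption, taking $\mu = 0$ gives $L_q(y_i y_j) = 0$, which is the mechanism behind $L_q \frac{\partial \Delta(0)}{\partial q(x)} = 0$.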
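Introduce the Poisson brackets $\{\cdot, \cdot\}_j$ ($j = 1, 2$) defined by
\[ \{F, G\}_1 := \int_0^1 \frac{\partial F}{\partial q(x)} \frac{d}{dx} \frac{\partial G}{\partial q(x)} \, dx; \qquad \{F, G\}_2 := \int_0^1 \frac{\partial F}{\partial q(x)} L_q \frac{\partial G}{\partial q(x)} \, dx \tag{1.7} \]
for functionals $F$ and $G$ with sufficiently regular gradients.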
Corollary 3. For a differentiable functional $F$, $\{F, I_n^{(1)}\}_1 = \{F, I_n^{(2)}\}_2$ ($n \ge 1$).

Proposition 2 allows one to express the average $[q] = \int_{S^1} q(x) \, dx$ of $q \in L^2(S^1)$ as a function of the actions $(I_n^{(2)})_{n \ge 1}$ and Casimir(s). To prove this we use the fact that if a Casimir $F$ of a first Poisson structure of a bihamiltonian system raises to a functional $G$, then $G$ is a Casimir for the second Poisson structure.

Corollary 4. There exists a Casimir $f(q)$ with respect to the second Poisson structure, so that
\[ [q] = \sum_{j \ge 1} 2\pi j \, I_j^{(2)} + f(q). \tag{1.8} \]

Proof. Recall that $\frac12 \|q\|^2$ is the Hamiltonian of translation with respect to the first Poisson structure, hence the frequency $\omega_j$ corresponding to the $j$th action variable $I_j^{(1)}$ is given by $\omega_j = \frac{\partial}{\partial I_j^{(1)}} \big( \frac12 \|q\|^2 \big) = 2\pi j$. Thus $\frac12 \|q\|^2 - \sum_{j \ge 1} 2\pi j \, I_j^{(1)}$ is constant on each leaf $L_c^2(S^1) := \{q \in L^2(S^1) \mid [q] = c\}$ of the foliation induced by the first Poisson structure. For $q \equiv c$, $I_j^{(1)} = 0$ $\forall j \ge 1$ and therefore $\frac12 \|q\|^2 - \sum_{j \ge 1} 2\pi j \, I_j^{(1)} = \frac12 c^2$, which leads to
\[ \frac12 \|q\|^2 = \sum_{j \ge 1} 2\pi j \, I_j^{(1)} + \frac12 [q]^2 \qquad (q \in L^2(S^1)). \tag{1.9} \]
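Notice that, for $q \in H^1(S^1)$, $\frac12 \|q\|^2$ raises to $[q]$, i.e.
\[ L_q \frac{\partial [q]}{\partial q(x)} = \frac{d}{dx} q = \frac{d}{dx} \frac{\partial \, \frac12 \|q\|^2}{\partial q(x)}. \]
Thus, in view of Proposition 2, $\frac12 \|q\|^2 - \sum_{j \ge 1} 2\pi j \, I_j^{(1)}$ (which is a Casimir with respect to the first Poisson structure) raises to $[q] - \sum_{j \ge 1} 2\pi j \, I_j^{(2)}$ which by the remark before Corollary 4 is a Casimir with respect to the second Poisson structure on $H^1(S^1)$ and, by continuity, on $L^2(S^1)$.

The next result compares the span of the $L^2$-gradients $\frac{\partial I_n^{(2)}}{\partial q(x)}$ with the span of $\frac{\partial I_n^{(1)}}{\partial q(x)}$. For $q \in L^2(S^1)$, let
\[ O := \{n \ge 1 \mid \lambda_{2n-1}(q) < \lambda_{2n}(q)\}. \]
For a subset $A \subseteq L^2(S^1)$, denote by $\operatorname{span}\langle A \rangle$ the $L^2$-closure of the linear span of $A$.

Lemma 5.
\[ \operatorname{span}\Big\langle \frac{\partial \Delta(0)}{\partial q(x)}, \frac{\partial I_n^{(2)}}{\partial q(x)} \,\Big|\, n \in O \Big\rangle = \operatorname{span}\Big\langle 1, \frac{\partial I_n^{(1)}}{\partial q(x)} \,\Big|\, n \in O \Big\rangle. \]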
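Proof. First notice that either $\lambda_{2n} \ne 0$ $\forall n \in O$ or $\lambda_{2n-1} \ne 0$ $\forall n \in O$. As the two cases are treated in the same way we consider the case $\lambda_{2n} \ne 0$ $\forall n \in O$. By $\operatorname{Iso}(q)$ we denote the isospectral set of $q$, $\operatorname{Iso}(q) := \{p \in L^2(S^1) \mid \operatorname{spec}(p) = \operatorname{spec}(q)\}$. The generalized tangent space to $\operatorname{Iso}(q)$ is given by (cf. [MT])
\[ T_q \operatorname{Iso}(q) = \operatorname{span}\Big\langle \frac{d}{dx} f_{2n}^2 \,\Big|\, n \in O \Big\rangle, \]
where $f_{2n}$ is an $L^2$-normalized eigenfunction corresponding to $\lambda_{2n}$. Thus, with $M_q$ denoting the right inverse of $L_q$ (cf. [KM1]),
\[ \operatorname{span}\Big\langle 1, \Big( \frac{d}{dx} \Big)^{-1} T_q \operatorname{Iso}(q) \Big\rangle = \operatorname{span}\big\langle 1, f_{2n}^2 \mid n \in O \big\rangle; \]
\[ \operatorname{span}\Big\langle \frac{\partial \Delta(0)}{\partial q(x)}, M_q T_q \operatorname{Iso}(q) \Big\rangle = \operatorname{span}\Big\langle \frac{\partial \Delta(0)}{\partial q(x)}, f_{2n}^2 \,\Big|\, n \in O \Big\rangle, \]
where we used the identity $L_q f_{2n}^2 = 2\lambda_{2n} \frac{d}{dx} f_{2n}^2$. As $1 = \sum_{n \ge 0} a_n f_{2n}^2$ with $a_n \ne 0$ iff $n \in \{0\} \cup O$ and $\frac{\partial \Delta(0)}{\partial q(x)} = \sum_{n \ge 0} b_n f_{2n}^2$ with $b_n \ne 0$ iff $n \in \{0\} \cup O$ (cf. [MT], p. 175–176) we conclude that $f_0^2$ is in $\operatorname{span}\langle 1, f_{2n}^2 \mid n \in O \rangle$ as well as in $\operatorname{span}\big\langle \frac{\partial \Delta(0)}{\partial q(x)}, f_{2n}^2 \mid n \in O \big\rangle$. This leads to
\[ \operatorname{span}\Big\langle 1, \Big( \frac{d}{dx} \Big)^{-1} T_q \operatorname{Iso}(q) \Big\rangle = \operatorname{span}\Big\langle \frac{\partial \Delta(0)}{\partial q(x)}, M_q T_q \operatorname{Iso}(q) \Big\rangle = N_O, \]
where $N_O := \operatorname{span}\langle f_0^2, f_{2n}^2 \mid n \in O \rangle$. On the other hand, using the Birkhoff coordinates provided by $\Omega^{(1)}$, one sees that $T_q \operatorname{Iso}(q) = \operatorname{span}\big\langle \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)} \mid n \in O \big\rangle$. Thus (cf. Proposition 2)
\[ N_O = \operatorname{span}\Big\langle 1, \Big( \frac{d}{dx} \Big)^{-1} T_q \operatorname{Iso}(q) \Big\rangle = \operatorname{span}\Big\langle 1, \frac{\partial I_n^{(1)}}{\partial q(x)} \,\Big|\, n \in O \Big\rangle; \]
\[ N_O = \operatorname{span}\Big\langle \frac{\partial \Delta(0)}{\partial q(x)}, M_q T_q \operatorname{Iso}(q) \Big\rangle = \operatorname{span}\Big\langle \frac{\partial \Delta(0)}{\partial q(x)}, \frac{\partial I_n^{(2)}}{\partial q(x)} \,\Big|\, n \in O \Big\rangle. \]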
2. Angle Variables

2.1. Abel map and angle variables. In this section we review the definition of the angle variables $(\theta_n)_{n \ge 1}$ (cf. [MT, KM2]). First let us introduce some more notation. For $q \in L^2(S^1)$ and $j \ge 1$, denote by $\varphi_j(\lambda, q)$ the function
\[ \varphi_j(\lambda, q) = \frac{1}{j^2 \pi^2} \prod_{l \in \mathbb{N} \setminus \{j\}} \frac{\mu_l^{(j)}(q) - \lambda}{l^2 \pi^2}, \tag{2.1} \]
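where $\mu_l^{(j)}(q)$ satisfy $\lambda_{2l-1}(q) \le \mu_l^{(j)}(q) \le \lambda_{2l}(q)$ ($j, l \ge 1$, $l \ne j$) and are uniquely determined by
\[ \frac{1}{\pi} \int_{\lambda_{2k-1}(q)}^{\lambda_{2k}(q)} \varphi_j(\lambda, q) \frac{d\lambda}{\sqrt{\Delta^2(\lambda, q) - 4}} = 0 \qquad (k \ne j). \tag{2.2} \]
Further, for $j \ge 1$, denote by $c_j(q)$ the spectral invariant, defined by (cf. [KM2])
\[ c_j(q) \, \frac{1}{2\pi} \int_{\Gamma_j} \varphi_j(\lambda, q) \frac{d\lambda}{\sqrt{\Delta^2(\lambda, q) - 4}} = 1. \tag{2.3} \]
In the case the $j$th gap is open, $c_j(q)$ is given by
\[ c_j(q) \, \frac{1}{\pi} \int_{\lambda_{2j-1}(q)}^{\lambda_{2j}(q)} \varphi_j(\lambda, q) \frac{d\lambda}{\sqrt{\Delta^2(\lambda - i0, q) - 4}} = 1. \tag{2.4} \]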
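Define $\psi_j(\lambda) := c_j \varphi_j(\lambda)$ and introduce, for $n \ge 1$ and $q$ with $\gamma_n(q) > 0$, the multivalued (mod $2\pi$) map (cf. [KM2])
\[ \theta_n(q) := \sum_{k=1}^{\infty} \int_{\lambda_{2k}(q)}^{\mu_k^*(q)} \psi_n(\lambda, q) \frac{d\lambda}{\sqrt[*]{\Delta^2(\lambda, q) - 4}}, \tag{2.5} \]
where $\mu_k(q)$ ($k \ge 1$) denote the Dirichlet eigenvalues of $-\frac{d^2}{dx^2} + q$, considered on the interval $[0, 1]$, and $\mu_k^*(q)$ the Dirichlet divisor, $\mu_k^*(q) = \big( \mu_k(q), \sqrt[*]{\Delta^2(\mu_k, q) - 4} \big)$, on the hyperelliptic surface $y = \sqrt[*]{\Delta^2(\lambda, q) - 4}$ with
\[ \sqrt[*]{\Delta^2(\mu_k, q) - 4} = y_1(1, \mu_k, q) - y_2'(1, \mu_k, q). \tag{2.6} \]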
Recall from [KM2] that the infinite sum in (2.5) converges uniformly.
2.2. On the gradient of the angle variables. For the convenience of the reader we recall two propositions proved in [KM2]. Denote by $U_n$ the open set
\[ U_n := \{q \in L^2(S^1) \mid \gamma_n(q) \ne 0\}. \]
For $A \subseteq L^2(S^1)$ denote by $\operatorname{Nor} A$ the (possibly empty) subset of $A$ given by
\[ \operatorname{Nor} A := \{q \in A \mid \mu_k(q) = \lambda_{2k}(q), \ \forall k \ge 1\}. \]
Further, denote by $m_{ij} \equiv m_{ij}(\lambda, q)$ the entries of the Floquet matrix,
\[ m_{11} := y_1(1, \lambda, q); \quad m_{21} := y_1'(1, \lambda, q); \quad m_{12} := y_2(1, \lambda, q); \quad m_{22} := y_2'(1, \lambda, q). \]
For $k, n \ge 1$ and $q \in U_n$, define ($\dot{}$ denotes the derivative $\frac{d}{d\lambda}$)
\[ c_{n,k}(q) := -\frac{\psi_n(\lambda_{2k}, q)}{\dot\Delta(\lambda_{2k}, q)}; \qquad d_k(q) := (-1)^{k+1} \frac{\dot m_{11}(\lambda_{2k}, q) \, m_{21}(\lambda_{2k}, q)}{\dot\Delta(\lambda_{2k}, q)}. \]
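Proposition 6 ([KM2]). For $n \ge 1$ and $q \in \operatorname{Nor} U_n$,
\[ \frac{\partial \theta_n}{\partial q(x)} = \sum_{k=1}^{\infty} c_{n,k}(q) \big( y_1(x, \lambda_{2k}, q) \, y_2(x, \lambda_{2k}, q) + d_k(q) \, y_2^2(x, \lambda_{2k}, q) \big), \]
where the series converges in $H^2(S^1)$.

Proposition 7 ([KM2]). For any $N \ge 0$ and $n \ge 1$ the map
\[ \nabla \theta_n : U_n \cap H_0^N \to H_0^{N+1}, \qquad q \mapsto \frac{\partial \theta_n}{\partial q(x)} \]
is real analytic.

The following corollary follows from Proposition 7.

Corollary 8. For $q \in U_n \cap U_m \cap H^2(S^1)$, the bracket $\{\theta_n, \theta_m\}_2(q)$ is well defined.

3. Canonical Relations

In this section we prove the following

Theorem 3. Let $n, m \ge 1$.
(i) For $q \in L^2(S^1)$, $\{I_n^{(2)}, I_m^{(2)}\}_2 = 0$.
(ii) For $q \in L^2(S^1) \cap U_n$, $\{\theta_n, I_m^{(2)}\}_2 = -\delta_{n,m}$.
(iii) For $q \in U_n \cap U_m \cap H^2(S^1)$, $\{\theta_n, \theta_m\}_2 = 0$.

Proof. For $a \in \mathbb{R}$, introduce $L_{q;a} := L_q - 2a \frac{d}{dx}$. Then, for any $a \in \mathbb{R}$, $L_{q;a} \frac{\partial \Delta(a)}{\partial q(x)} = 0$ (cf. [KM2], Appendix B). Notice that $(a - b) L_q = a L_{q;b} - b L_{q;a}$. Hence, for $a \ne b$,
\[ \Big\langle \frac{\partial \Delta(b)}{\partial q(x)}, L_q \frac{\partial \Delta(a)}{\partial q(x)} \Big\rangle_{L^2} = \frac{1}{a - b} \Big\langle \frac{\partial \Delta(b)}{\partial q(x)}, (a L_{q;b} - b L_{q;a}) \frac{\partial \Delta(a)}{\partial q(x)} \Big\rangle_{L^2} \]
\[ = \frac{1}{b - a} \Big( b \Big\langle \frac{\partial \Delta(b)}{\partial q(x)}, L_{q;a} \frac{\partial \Delta(a)}{\partial q(x)} \Big\rangle_{L^2} + a \Big\langle L_{q;b} \frac{\partial \Delta(b)}{\partial q(x)}, \frac{\partial \Delta(a)}{\partial q(x)} \Big\rangle_{L^2} \Big) = 0, \]
and in view of (1.6), statement (i) follows.

(ii) By Corollary 3, $\{\theta_n, I_m^{(2)}\}_2 = \{\theta_n, I_m^{(1)}\}_1$ on $U_n$. It is proved in [KM2] that $\theta_n$ is an angle variable for KdV with respect to the first Poisson structure. Hence $\theta_n$ satisfies $\{\theta_n, I_m^{(1)}\}_1 = -\delta_{n,m}$ and (ii) follows.

To prove (iii) we need two auxiliary results.

Proposition 9. For $n, m \ge 1$ and $q \in \operatorname{Nor} U_n \cap \operatorname{Nor} U_m$, $\{\theta_n, \theta_m\}_2(q)$ is well defined and $\{\theta_n, \theta_m\}_2(q) = 0$.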
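Proof. For $k \ge 1$, introduce
\[ a_k(x, q) := y_1(x, \mu_k(q), q) \, y_2(x, \mu_k(q), q), \qquad g_k(x, q) := \frac{y_2(x, \mu_k(q), q)}{\|y_2(\cdot, \mu_k(q), q)\|_{L^2}}. \]
Then (cf. [PT]), for $i, j \ge 1$,
\[ \Big\langle \frac{d}{dx} g_i^2, g_j^2 \Big\rangle_{L^2} = \Big\langle \frac{d}{dx} a_i, a_j \Big\rangle_{L^2} = 0; \qquad \Big\langle \frac{d}{dx} g_i^2, a_j \Big\rangle_{L^2} = \frac12 \delta_{i,j}. \]
As $L_q a_k = 2\mu_k \frac{d}{dx} a_k$ and $L_q g_k^2 = 2\mu_k \frac{d}{dx} g_k^2$, we conclude that, for $i, j \ge 1$,
\[ \big\langle L_q g_i^2, g_j^2 \big\rangle_{L^2} = \big\langle L_q a_i, a_j \big\rangle_{L^2} = 0; \qquad \big\langle L_q g_i^2, a_j \big\rangle_{L^2} = \mu_i \delta_{i,j}. \]
The claimed statement then follows from Proposition 6.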
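The last step of the proof of Proposition 9 can be spelled out as follows. This is a formal sketch only: it ignores the convergence of the double series, uses the skew-symmetry of $L_q$, and introduces the shorthand $\tilde d_k$, which is not notation from the paper. For $q \in \operatorname{Nor} U_n \cap \operatorname{Nor} U_m$ one has $\mu_k = \lambda_{2k}$, so the gradients in Proposition 6 are combinations of the $a_k$ and $g_k^2$.

```latex
% \tilde d_k := d_k(q) \, \| y_2(\cdot,\mu_k,q) \|_{L^2}^2   (illustrative shorthand)
% so that, by Proposition 6 with \mu_k = \lambda_{2k},
%   \partial\theta_n / \partial q(x) = \sum_k c_{n,k} ( a_k + \tilde d_k g_k^2 ).
\begin{align*}
  \{\theta_n, \theta_m\}_2
  &= \sum_{k,l} c_{n,k} c_{m,l}
     \big\langle a_k + \tilde d_k g_k^2 ,\; L_q ( a_l + \tilde d_l g_l^2 ) \big\rangle_{L^2} \\
  &= \sum_{k,l} c_{n,k} c_{m,l}
     \Big( \tilde d_l \big\langle a_k , L_q g_l^2 \big\rangle_{L^2}
         + \tilde d_k \big\langle g_k^2 , L_q a_l \big\rangle_{L^2} \Big)
     && \text{(diagonal pairings vanish)} \\
  &= \sum_{k,l} c_{n,k} c_{m,l}
     \Big( \tilde d_l \, \mu_l \delta_{k,l} - \tilde d_k \, \mu_k \delta_{k,l} \Big)
     && \text{(skew-symmetry of } L_q \text{)} \\
  &= 0 .
\end{align*}
```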
Recall that by Corollary 8, $\{\theta_n, \theta_m\}_2(q)$ is well defined on $U_n \cap U_m \cap H^2(S^1)$.

Proposition 10. For $n, m, k \ge 1$ and $q \in U_n \cap U_m \cap H^2(S^1)$, the derivative of $\{\theta_n, \theta_m\}_2$ at $q$ in the direction $L_q \frac{\partial I_k^{(2)}}{\partial q(x)}$ is zero.

Remark. Reformulated (with a slight abuse of notation), Proposition 10 states that $\{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 = 0$.

Proof. We want to use the Jacobi identity
\[ \{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 + \{\{\theta_m, I_k^{(2)}\}_2, \theta_n\}_2 + \{\{I_k^{(2)}, \theta_n\}_2, \theta_m\}_2 = 0. \tag{3.1} \]
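Given $F, G, H \in C^2(U)$ with sufficiently regular gradients and $U$ an open subset of $L^2(S^1)$, the Jacobi identity
\[ \{\{F, G\}_2, H\}_2 + \{\{G, H\}_2, F\}_2 + \{\{H, F\}_2, G\}_2 = 0 \tag{3.2} \]
is established as follows: for $h \in L^2(S^1)$, one obtains, using that $\frac{\partial^2 F}{\partial q(x) \partial q(y)}$ and $\frac{\partial^2 G}{\partial q(x) \partial q(y)}$ are symmetric, that the derivative $D_h \{F, G\}_2$ of $\{F, G\}_2$ in the direction of $h \in L^2(S^1)$ is given by
\begin{align*}
D_h \{F, G\}_2 = {} & \Big\langle \Big\langle \frac{\partial^2 F}{\partial q(x) \partial q(y)}, L_q \frac{\partial G}{\partial q(y)} \Big\rangle_{L^2}, h(x) \Big\rangle_{L^2} + \Big\langle \frac{\partial F}{\partial q(x)}, h(x) \frac{d}{dx} \frac{\partial G}{\partial q(x)} \Big\rangle_{L^2} \\
& - \Big\langle \Big\langle \frac{\partial^2 G}{\partial q(x) \partial q(y)}, L_q \frac{\partial F}{\partial q(y)} \Big\rangle_{L^2}, h(x) \Big\rangle_{L^2} - \Big\langle \frac{\partial G}{\partial q(x)}, h(x) \frac{d}{dx} \frac{\partial F}{\partial q(x)} \Big\rangle_{L^2}. \tag{3.3}
\end{align*}
Substituting (3.3) (and similar expressions for the other terms) in the left side of (3.2), one verifies the identity (3.2).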
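For orientation, formula (3.3) is the product rule applied to $\{F, G\}_2 = \langle \nabla F, L_q \nabla G \rangle_{L^2}$. The sketch below assumes, as an illustration not stated in this section, that $L_q = -\frac12 \partial_x^3 + q \partial_x + \partial_x q$, so that the derivative of $L_q$ in the direction $h$ is $h \partial_x + \partial_x h$; it writes $\nabla F$ for $\frac{\partial F}{\partial q(x)}$ and $D_h \nabla F$ for the Hessian of $F$ applied to $h$.

```latex
% Assumption: L_q = -\tfrac12 \partial_x^3 + q \partial_x + \partial_x q, hence D_h L_q = h \partial_x + \partial_x h.
\begin{align*}
  D_h \{F, G\}_2
  &= \langle D_h \nabla F , L_q \nabla G \rangle
   + \langle \nabla F , (h \partial_x + \partial_x h) \nabla G \rangle
   + \langle \nabla F , L_q D_h \nabla G \rangle , \\
  \langle \nabla F , (h \partial_x + \partial_x h) \nabla G \rangle
  &= \langle \nabla F , h \, \partial_x \nabla G \rangle
   - \langle \nabla G , h \, \partial_x \nabla F \rangle
   && \text{(integration by parts)} , \\
  \langle \nabla F , L_q D_h \nabla G \rangle
  &= - \langle L_q \nabla F , D_h \nabla G \rangle
   && \text{(skew-symmetry of } L_q \text{)} .
\end{align*}
% The symmetry of the Hessians then moves h to the outer slot, e.g.
% \langle D_h \nabla F , L_q \nabla G \rangle
%   = \langle \langle \partial^2 F / \partial q(x) \partial q(y) , L_q \nabla G (y) \rangle_y , h(x) \rangle_x ,
% which gives the four terms of (3.3).
```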
In our case $F = \theta_n$, $G = \theta_m$, $H = I_k^{(2)}$, and $U = U_n \cap U_m \cap H^2(S^1)$. Taking into account that on $U$, the functions $I_k^{(2)}, \theta_n, \theta_m$ are analytic and $\frac{\partial \theta_n}{\partial q(x)}, \frac{\partial \theta_m}{\partial q(x)} \in H^3(S^1)$ (Proposition 7) as well as $\frac{\partial I_n^{(2)}}{\partial q(x)} \in H^4(S^1)$, one verifies that all the expressions in the formal derivation of the Jacobi identity are well defined. It follows from Theorem 3(ii) that $\{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 = 0$.
Proof of Theorem 3(iii). For q ∈ Un ∩ Um ∩ H 2 (S 1 ), the set I so(q) := {p ∈ L2 | spec(p) = spec(q)} satisfies I so(q) ⊂ Un ∩ Um ∩ H 2 (S 1 ). By Proposition 10, for (2) k, n, m ≥ 1, {{θn , θm }2 , Ik }2 = 0 on Un ∩ Um ∩ H 2 (S 1 ). Thus {θn , θm }2 is constant on I so(q). By Proposition 9, for the unique element p in N or(I so(q)), {θn , θm }2 (p) = 0, hence {θn , θm }2 (q) = 0. 4. Action-angle Map 4.1. Definition of (2) . In [KM2] it is shown that the map (1) : H0N (S 1 ) → hn+ 2 (N; R2 ), 1
defined by (1) (q) :=
(1) 2Ij (cos θj , sin θj )
j ≥1
is a real analytic action-angle map with respect to the first Poisson structure. (1) has a real analytic extension to all of L2 (S 1 ) (again denoted by (1) ), (1) (q) := (1) (q−[q]). We define the map (2) using the same angles (θn )n≥1 as in (1) . From (1.4) one (2) (2) sees that for n ≥ 1 with λ2n (q) < 0, In (q) satisfies In (q) ≤ 0 whereas for n with (2) 0 < λ2n−1 (q), In (q) ≥ 0. (2) = (vn , wn )n≥1 , where Definition 11. For q ∈ L2 (S 1 ), set (2) (q) := n (q) (2) −2In (cos θn , sin θn ) − I (2) , θ n n (vn , wn ) := (0, 0) (2) 2In (cos θn , sin θn )
n≥1
if λ2n < 0; if
λ2n−1 ≤ 0 ≤ λ2n , γn = 0;
if
0 = λ2n−1 = λ2n ;
if
0 < λ2n−1 .
(2) To simplify notation, the restriction of (2) to a leaf LN ∗ is denoted again by . N N (2) For example, for a leaf L∗ = Lc,2k with |c| > 2 and k ≥ 1, (q) is given by
(2) (2) (2) , Ik , θk , 2In (cos θn , sin θn ) , − −2In (cos θn , sin θn ) n>k
1≤nk
.
N 4.2. Analyticity of (2) . Recall that MN ∗ denotes the model space corresponding to L∗ (cf. Theorem 2). In this subsection we prove (2)
N Proposition 12. For any N ≥ 0, ∗ : LN ∗ → M∗ is a real analytic map.
First let us introduce the following real analytic submanifolds (c ∈ R fixed): L∗ (C) ≡ Lc,k (C) := {q ∈ L2 (S 1 ; C) | (0, q) = c}, J∗ (C) ≡ J±,k (C) := {q ∈ L2 (S 1 ; C) | λk− 1 ± 1 = 0; 0 ∈ specd (q)}, 2
2
Ik (C) := {q ∈ L2 (S 1 ; C) | λk−1 (q) = λk (q) = 0}. Using that λj (q) (j ≥ 0) satisfy the asymptotics λ2n (q), λ2n−1 (q) = n2 π 2 + O(||q||), one obtains the following Lemma 13. (i) Given q0 ∈ Lc,k , there exist a complex neighborhood Vq0 ⊆ L2 (S 1 ; C) of q0 and K > 0 with the properties sup |λ2n+1 (q) − λ2n (q)| ≥ K;
q∈Vq0 n≥0
sup |λj (q)| ≥ K.
q∈Vq0 j ≥0
(ii) Given q0 ∈ J±,k , there exist a complex neighborhood Vq0 ⊆ L2 (S 1 ; C) of q0 and K > 0 with the properties sup |λ2n+1 (q) − λ2n (q)| ≥ K;
q∈Vq0 n≥0
sup
q∈Vq0
|λj (q)| ≥ K.
j =k− 21 ± 21
(iii) Given q0 ∈ Ik , there exist a complex neighborhood Vq0 ⊆ L2 (S 1 ; C) of q0 and K > 0 with the property sup |λ2n+1 (q) − λ2n (q)| ≥ K;
q∈Vq0 n≥0
sup
q∈Vq0
|λj (q)| ≥ K.
j =k− 21 ± 21 (1)
(2)
Lemma 13 allows us to analytically extend the variables In and In , defined in (1.1) (2) and (1.2). With regard to the variables In , note that on J±,k and Lc,k (C) (C),n Ik (C), 1 (−1) (−i0) 1 with |c| < 2, the Casimir Fn (q) = − 2π #n µ dµ arccosh vanishes for 2
any n ≥ 1, whereas on Lc,k with |c| > 2, Fn (q) vanishes for n = 2k . From Lemma 13 and the analyticity of (µ, q) in µ and q it follows that the defi(1) (2) nitions (1.1) and (1.2) can be used to define the variables In respectively In on the (1) complex neighborhoods specified in Lemma 13. According to [KM2], the actions In 2 1 are analytic on a complex neighborhood of L (S ) which is independent of n. A similar (2) result holds for the variables In : Lemma 14. For any given q0 ∈ L2 (S 1 ), there exists a complex neighborhood Uq0 ⊆ (2) L2 (S 1 ; C) of q0 so that, for any n ≥ 1, In (q), defined by (1.2), is analytic on Uq0 .
We consider the case $L_* = L_{c,k}$ with $|c| < 2$. (Similar arguments are used in the other cases.) To prove that $\Omega_*^{(2)}$ is real analytic we have to show that, for any $q_0 \in L_*$, there exists a complex neighborhood $U_{q_0} \subset L^2(S^1; \mathbb{C})$ of $q_0$ such that, for any $n \ge 1$, the coordinate functions $v_n, w_n$ are analytic and $(v_n, w_n)_{n \ge 1}$ is uniformly bounded in $U_{q_0}$. Taking $\Omega^{(1)} : H^N(S^1) \to h^{N + \frac12}(\mathbb{N}; \mathbb{R}^2)$ as a starting point, we want to replace in $\Omega_j^{(1)}$ the action $I_j^{(1)}$ by $I_j^{(2)}$. Notice that for $q$ real valued and $n$ with $\lambda_{2n-1} < \lambda_{2n} < 0$,
\[ \frac{I_n^{(1)}}{2 |\lambda_{2n-1}|} \le -I_n^{(2)} \le \frac{I_n^{(1)}}{2 |\lambda_{2n}|} \]
or
\[ (2 |\lambda_{2n-1}|)^{-1/2} \le -\zeta_n(q) := \big( -I_n^{(2)} / I_n^{(1)} \big)^{1/2} \le (2 |\lambda_{2n}|)^{-1/2} \tag{4.1} \]
whereas for $0 < \lambda_{2n-1} < \lambda_{2n}$, one obtains
\[ (2 |\lambda_{2n}|)^{-1/2} \le \zeta_n(q) := \big( I_n^{(2)} / I_n^{(1)} \big)^{1/2} \le (2 |\lambda_{2n-1}|)^{-1/2}. \tag{4.2} \]
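The two-sided bound (4.2) can be read off from the integral formulas for the actions. The sketch below is one way to see it, under the assumption that collapsing the circuit $\Gamma_n$ onto the gap turns (1.1) into twice the integral over the gap, in the same way that (1.3) becomes (1.4); the symbol $p_n$ abbreviates the nonnegative integrand $\operatorname{arccosh} \frac{(-1)^n \Delta(\mu - i0)}{2}$ on the gap.

```latex
% Assumed gap representations (0 < \lambda_{2n-1} < \lambda_{2n}, q real valued):
\[
  I_n^{(1)} = \frac{2}{\pi} \int_{\lambda_{2n-1}}^{\lambda_{2n}} p_n(\mu) \, d\mu ,
  \qquad
  I_n^{(2)} = \frac{1}{\pi} \int_{\lambda_{2n-1}}^{\lambda_{2n}} \frac{1}{\mu} \, p_n(\mu) \, d\mu .
\]
% On the gap, 1/\lambda_{2n} \le 1/\mu \le 1/\lambda_{2n-1} and p_n \ge 0, hence
\[
  \frac{I_n^{(1)}}{2 \lambda_{2n}} \;\le\; I_n^{(2)} \;\le\; \frac{I_n^{(1)}}{2 \lambda_{2n-1}} .
\]
% Dividing by I_n^{(1)} > 0 and taking square roots gives (4.2):
\[
  (2 \lambda_{2n})^{-1/2} \;\le\; \big( I_n^{(2)} / I_n^{(1)} \big)^{1/2} \;\le\; (2 \lambda_{2n-1})^{-1/2} .
\]
```

The case $\lambda_{2n-1} < \lambda_{2n} < 0$ is analogous with $1/\mu$ replaced by $-1/|\mu|$, which accounts for the sign in (4.1).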
The inequalities (4.1)–(4.2) show that the quotient
\[ \zeta_n(q) := \pm \big( \pm I_n^{(2)} / I_n^{(1)} \big)^{1/2} \tag{4.3} \]
can be defined in a continuous fashion for $q \in L^2(S^1)$ with $\lambda_{2n-1} \le \lambda_{2n} < 0$ resp. $0 < \lambda_{2n-1} \le \lambda_{2n}$: if $\lambda_{2n} = \lambda_{2n-1} \ne 0$, $\zeta_n(q)$ is given by $\zeta_n(q) = -(-2\lambda_{2n})^{-1/2}$ resp. $(2\lambda_{2n})^{-1/2}$.

As we are in the case $L_* = L_{c,k}$ with $|c| < 2$, $\Omega_n^{(2)}$ satisfies $\Omega_n^{(2)} = \zeta_n \Omega_n^{(1)}$ ($n \ge 1$). For $n \ge 1$, denote by $\gamma_n$ the length of the $n$th gap, $\gamma_n := \lambda_{2n} - \lambda_{2n-1}$. To prove that $\Omega^{(2)}$ is real analytic we need the following

Lemma 15. Let $q_0 \in L_*$. Then there exists a (small) neighborhood $U_{q_0} \subseteq L^2(S^1; \mathbb{C})$ with the following properties:
(A) For $q \in U_{q_0}$,
(i) $I_n^{(1)}(q) = \frac{1}{2n\pi} \big( \frac{\gamma_n}{2} \big)^2 \big( 1 + r_n^{(1)}(q) \big)$;
(ii) $\pm I_n^{(2)}(q) = \frac{1}{4n^3\pi^3} \big( \frac{\gamma_n}{2} \big)^2 \big( 1 + r_n^{(2)}(q) \big)$,
where the error terms $r_n^{(j)}(q)$ ($j = 1, 2$) satisfy $r_n^{(j)}(q) = O\big( \frac{\log n}{n} \big)$ uniformly for $q \in U_{q_0}$ as well as $|1 + r_n^{(j)}(q)| \le C$ and $\frac{1}{C} \le \operatorname{Re}(1 + r_n^{(j)}(q))$ with $C > 1$ independent of $n$ and $q \in U_{q_0}$.
(B) The functions $\zeta_n(q)$ (cf. (4.3)) admit an analytic continuation on $U_{q_0}$ and satisfy
(iii) $\zeta_n(q)^2 = \frac{1}{2n^2\pi^2} \big( 1 + r_n^{(3)}(q) \big)$,
where the error terms $r_n^{(3)}(q)$ satisfy $r_n^{(3)}(q) = O\big( \frac{\log n}{n} \big)$ uniformly for $q \in U_{q_0}$ as well as $|1 + r_n^{(3)}(q)| \le C$ and $\frac{1}{C} \le \operatorname{Re}(1 + r_n^{(3)}(q))$ with $C > 1$ independent of $n$ and $q \in U_{q_0}$.
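Parts (A) and (B) of Lemma 15 are tied together by a one-line computation, sketched here for $\gamma_n \ne 0$. It also shows that the leading constant in (iii) agrees with the bounds $(2|\lambda_{2n}|)^{-1/2}$ and $(2|\lambda_{2n-1}|)^{-1/2}$ of (4.1)–(4.2) once the asymptotics $\lambda_{2n}, \lambda_{2n-1} = n^2\pi^2 + O(\|q\|)$ quoted before Lemma 13 are inserted.

```latex
\[
  \zeta_n(q)^2
  = \frac{\pm I_n^{(2)}(q)}{I_n^{(1)}(q)}
  = \frac{\frac{1}{4 n^3 \pi^3} \big( \frac{\gamma_n}{2} \big)^2 \big( 1 + r_n^{(2)}(q) \big)}
         {\frac{1}{2 n \pi} \big( \frac{\gamma_n}{2} \big)^2 \big( 1 + r_n^{(1)}(q) \big)}
  = \frac{1}{2 n^2 \pi^2} \, \frac{1 + r_n^{(2)}(q)}{1 + r_n^{(1)}(q)} ,
\]
% so that 1 + r_n^{(3)} = (1 + r_n^{(1)})^{-1} (1 + r_n^{(2)}), and
\[
  (2 \lambda_{2n})^{-1/2} = \big( 2 n^2 \pi^2 + O(\|q\|) \big)^{-1/2}
  \approx \frac{1}{\sqrt{2} \, n \pi} .
\]
```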
Remark. From Lemma 15(B) and Cauchy's formula it follows that
\[ d_q \zeta_n(p) = O\Big( \frac{\log n}{n^2} \Big) \|p\| \qquad (q \in U_{q_0}; \ p \in L^2(S^1; \mathbb{C})). \]

Proof. Statement (B) follows from (A) and Lemma 14: By (A) and Lemma 14, $r_n^{(1)}$ and $r_n^{(2)}$ are real analytic. In view of the asymptotics $r_n^{(j)}(q) = O\big( \frac{\log n}{n} \big)$ ($j = 1, 2$) we can choose $U_{q_0}$ so small that for any $q \in U_{q_0}$, $n \ge 1$ and $j = 1, 2$, $|\operatorname{Im}(1 + r_n^{(j)}(q))| \le \frac12 \operatorname{Re}\big( 1 + r_n^{(j)}(q) \big)$. As $1 + r_n^{(3)}(q) = (1 + r_n^{(1)}(q))^{-1} (1 + r_n^{(2)}(q))$, the claimed statements then follow from (A) (with $C > 1$ appropriately chosen).

Concerning (A), statements (i) and (ii) are proven in a similar fashion, thus we consider (ii) only (for (i) cf. also [BBGK]). Choose a neighborhood $U \subseteq L^2(S^1; \mathbb{C})$ of $q_0$ and, as above, circuits $\Gamma_n$ ($n \ge 1$) around the $n$th gap $(\lambda_{2n-1}(q_0), \lambda_{2n}(q_0))$ so that, for all $q$ in $U$ and $n \ge 1$, the only eigenvalues of $-\frac{d^2}{dx^2} + q$ inside $\Gamma_n$ are $\lambda_{2n-1}, \lambda_{2n}$, the only zero of $\dot\Delta(\mu) \equiv \frac{d}{d\mu} \Delta(\mu)$ inside $\Gamma_n$ is $\dot\lambda_n$, and $\operatorname{spec}(q) \cap \Gamma_n = \emptyset$. ($\dot\lambda_n$ denotes the zero of $\dot\Delta(\mu)$ close to $\tau_n = (\lambda_{2n-1} + \lambda_{2n})/2$.) Then the variables $I_n^{(2)}(q)$ are analytic on $U$ (cf. Lemma 14). As we are in the case $L_* = L_{c,k}$ with $|c| < 2$, $I_n^{(2)}$ is given by ($n \ge 1$)
\[ I_n^{(2)}(q) = \frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \operatorname{arccosh} \frac{(-1)^n \Delta(\mu)}{2} \, d\mu. \]
As $q_0 \in L_{c,k}$ with $|c| < 2$, $\lambda_{k-1}(q_0) < 0 < \lambda_k(q_0)$ and $k$ is odd. Introduce
\[ \varepsilon_n := \begin{cases} -1, & \text{for } n \le \frac{k-1}{2}, \\ +1, & \text{for } n > \frac{k-1}{2}. \end{cases} \]
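With this choice, $\operatorname{Re}(\varepsilon_n \mu) > 0$ inside $\Gamma_n$ and hence the value $\log(\varepsilon_n \mu)$ of the principal branch of the logarithm is well defined. Integrate by parts to obtain (with $\dot\Delta(\mu) = \frac{d}{d\mu} \Delta(\mu)$)
\[ \varepsilon_n I_n^{(2)}(q) = \frac{1}{2\pi} \int_{\Gamma_n} \varepsilon_n \log(\varepsilon_n \mu) \frac{\dot\Delta(\mu)}{\sqrt{\Delta(\mu)^2 - 4}} \, d\mu. \]
Use that $\int_{\Gamma_n} \frac{\dot\Delta(\mu)}{\sqrt{\Delta(\mu)^2 - 4}} \, d\mu = 0$, to obtain
\[ \varepsilon_n I_n^{(2)} = \frac{\varepsilon_n}{2\pi} \int_{\Gamma_n} \frac{\dot\Delta(\mu)}{\sqrt{\Delta(\mu)^2 - 4}} \big( \log(\varepsilon_n \mu) - \log(\varepsilon_n \dot\lambda_n) \big) \, d\mu. \]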
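Introduce, for $\mu$ inside and near $\Gamma_n$,
\[ w_1(\mu) \equiv w_1(\mu, q) := 2 \frac{(\mu - \lambda_0)^{1/2}}{n\pi} \prod_{k \ne n} \frac{\big( (\lambda_{2k} - \mu)(\lambda_{2k-1} - \mu) \big)^{1/2}}{k^2 \pi^2}, \]
\[ w_2(\mu) \equiv w_2(\mu, q) := (-1)^{n-1} \prod_{k \ne n} \frac{\dot\lambda_k - \mu}{k^2 \pi^2}. \]
(Here, as usual, $z^{1/2}$ denotes the branch on $\mathbb{C} \setminus \mathbb{R}_{\le 0}$ with $1^{1/2} = 1$.) Then both $\sqrt{\Delta(\mu)^2 - 4}$ and $\dot\Delta(\mu)$ admit product representations,
\[ \sqrt{\Delta(\mu)^2 - 4} = \frac{w_1(\mu)}{n\pi} \sqrt{(\lambda_{2n} - \mu)(\mu - \lambda_{2n-1})}, \qquad \dot\Delta(\mu) = (-1)^{n-1} \frac{\mu - \dot\lambda_n}{n^2 \pi^2} w_2(\mu) \]
with the sign of the radical $\sqrt{(\lambda_{2n} - \mu)(\mu - \lambda_{2n-1})}$ determined by $\sqrt{\Delta(\mu)^2 - 4}$. Hence
\[ \varepsilon_n I_n^{(2)} = \frac{1}{2\pi} \int_{\Gamma_n} \log\Big( \frac{\mu}{\dot\lambda_n} \Big) \frac{\varepsilon_n (\mu - \dot\lambda_n) \, w(\mu)}{n\pi \sqrt{(\lambda_{2n} - \mu)(\mu - \lambda_{2n-1})}} \, d\mu, \]
where $w(\mu) := \frac{w_2(\mu)}{w_1(\mu)}$.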
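If $\gamma_n = 0$, (ii) holds and thus we may assume $\gamma_n \ne 0$. By deforming the contour $\Gamma_n$, one obtains with $\mu(t) := \tau_n + t \frac{\gamma_n}{2}$ and $\sigma_n := \frac{\tau_n - \dot\lambda_n}{\gamma_n / 2}$,
\[ \varepsilon_n I_n^{(2)} = \frac{\varepsilon_n}{n\pi} \frac{\gamma_n}{2} \frac{1}{\pi} \int_{-1}^{1} \log\Big( \frac{\mu(t)}{\dot\lambda_n} \Big) \frac{(\sigma_n + t)}{(1 - t^2)^{1/2}} \, w(\mu(t)) \, dt. \]
In order to investigate the integrand we make a Taylor expansion of $\log\big( \frac{\mu(t)}{\dot\lambda_n} \big)$,
\[ \log\Big( \frac{\mu(t)}{\dot\lambda_n} \Big) = \lambda(\mu(t)) \, b(\mu(t)), \]
where, with $\lambda(\mu) := \frac{\mu}{\dot\lambda_n} - 1$,
\[ \lambda(\mu(t)) = \frac{\mu(t)}{\dot\lambda_n} - 1 = \frac{\gamma_n / 2}{\dot\lambda_n} \Big( \frac{\tau_n - \dot\lambda_n}{\gamma_n / 2} + t \Big) = \frac{\gamma_n / 2}{\dot\lambda_n} (\sigma_n + t) \]
and
\[ b(\mu) := \int_0^1 \frac{ds}{1 + s \lambda(\mu)} = \int_0^1 \frac{ds}{1 + s \big( \frac{\mu}{\dot\lambda_n} - 1 \big)}. \]
Therefore,
\[ \varepsilon_n I_n^{(2)} = \frac{(\gamma_n / 2)^2}{4 n \pi \, \varepsilon_n \dot\lambda_n} J_n \]
with
\[ J_n := \frac{4}{\pi} \int_{-1}^{1} (\sigma_n + t)^2 \, b(\mu(t)) \, w(\mu(t)) \frac{dt}{(1 - t^2)^{1/2}}. \]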
It remains to obtain the stated estimates for $I_n^{(2)}$. According to [PT], for $\mu - \tau_n = O(1)$, the functions $w_1(\mu)$ and $w_2(\mu)$ satisfy asymptotic estimates
\[ w_1(\mu) = 1 + O\Big( \frac{\log n}{n} \Big); \qquad w_2(\mu) = \frac12 \Big( 1 + O\Big( \frac{\log n}{n} \Big) \Big). \]
Hence
\[ w(\mu) = \frac{w_2(\mu)}{w_1(\mu)} = \frac12 \big( 1 + r_n^{(4)} \big); \qquad r_n^{(4)} = O\Big( \frac{\log n}{n} \Big). \]
By Lemma 16 below, $\sigma_n = \gamma_n O\big( \frac{\log n}{n} \big)$ and hence, with $\dot\lambda_n = n^2 \pi^2 + O(1)$,
\[ \lambda(\mu(t)) = \frac{\gamma_n}{2 \dot\lambda_n} (\sigma_n + t) = \gamma_n O\Big( \frac{1}{n^2} \Big) \qquad (0 \le |t| \le 1) \]
which leads to
\[ b(\mu(t)) = 1 + \gamma_n O\Big( \frac{1}{n^2} \Big). \]
Combining these estimates we conclude
\[ J_n = \frac{4}{\pi} \int_{-1}^{1} t^2 \, \frac12 \, \frac{dt}{\sqrt{1 - t^2}} + O\Big( \frac{\log n}{n} \Big) = 1 + O\Big( \frac{\log n}{n} \Big) \]
locally uniformly in $q$. It remains to establish estimates for $J_n$ for finitely many $n$'s. They are obtained by proving them first for real valued potentials in $U$, which is straightforward, and then taking, if necessary, a sufficiently small neighborhood $V_{q_0} \subseteq U$ of $q_0$ to conclude that the same estimates hold for the complex valued potentials in $V_{q_0}$ as well.

The following result has been obtained in [BKM1, Lemma 2.4].

Lemma 16. Locally uniformly on a (small) neighborhood $U \subset L^2(S^1; \mathbb{C})$ of $L^2$,
\[ \dot\lambda_n - \tau_n = \gamma_n^2 \, O\Big( \frac{\log n}{n} \Big), \]
where $\tau_n = \frac{\lambda_{2n} + \lambda_{2n-1}}{2}$, $\gamma_n = \lambda_{2n} - \lambda_{2n-1}$, and $\dot\lambda_n$ is the zero of $\dot\Delta(\lambda)$ close to $\tau_n$.
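The leading term $1$ in the estimate $J_n = 1 + O\big( \frac{\log n}{n} \big)$ from the proof of Lemma 15 comes from an elementary integral. The following worked evaluation uses the substitution $t = \sin\varphi$, a choice made here for illustration.

```latex
\[
  \int_{-1}^{1} \frac{t^2}{\sqrt{1 - t^2}} \, dt
  \;\overset{t = \sin\varphi}{=}\;
  \int_{-\pi/2}^{\pi/2} \sin^2\varphi \, d\varphi
  = \frac{\pi}{2} ,
  \qquad\text{hence}\qquad
  \frac{4}{\pi} \cdot \frac12 \cdot \frac{\pi}{2} = 1 .
\]
% Consistency with Lemma 15(A)(ii): inserting J_n \approx 1 and \dot\lambda_n = n^2\pi^2 + O(1) into
% \varepsilon_n I_n^{(2)} = (\gamma_n/2)^2 J_n / (4 n \pi \varepsilon_n \dot\lambda_n) gives
\[
  \pm I_n^{(2)} \approx \frac{1}{4 n^3 \pi^3} \Big( \frac{\gamma_n}{2} \Big)^2 .
\]
```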
As a consequence of Lemma 15, we conclude that, for $q \in L^2(S^1)$,
\[ \sum_{j \ge 1} (2\pi j)^3 I_j^{(2)}(q) < \infty. \]

Proof of Proposition 12. The statement follows by combining Lemma 13 and Lemma 15 together with the result from [KM2] saying that $\Omega^{(1)} : H_0^N(S^1) \to h^{N + \frac12}(\mathbb{N}; \mathbb{R}^2)$ is real analytic.
4.3. Properness of (2) . To shorten notation let, for λ2n−1 ≤ µ ≤ λ2n , (−1)n (µ, q) pn (µ) ≡ pn (µ, q) := arccosh 2 1/2
n (−1) (µ) 1 . = log 1+ 1− 2 ((µ)/2)2 N Proposition 17. For any N ≥ 0, (2) : LN ∗ → M∗ is proper. N Proof. Let us consider the case where LN ∗ = Lc,2k with |c| > 2 and k ≥ 1. (The other cases are treated similarly.) Given a compact subset K ⊂ MN c,2k , there exists M ≥ 1 (2) −1 (K) ⊆ Lc,2k , and, for any ε > 0, nε ≥ 1 so that for, q ∈ Q := n3+2N |In(2) (q)| ≤ M; (4.4) n≥1
n≥nε
n3+2N |In(2) (q)| ≤ ε.
(4.5)
By Lemma 27 (cf. Appendix B), there exists n1 = n1 (M, k, c) ≥ k + 1 such that for n ≥ n1 , 1 γn2 1 1 λ2n 1 (1) 1 In(2) (q) ≥ . pn (µ)dµ = In ≥ λ2n π λ2n−1 2λ2n 2(8π )2 λ2n n By Lemma 26 (cf. Appendix B), there exist constants Cj = Cj (M, c, k) (j ≥ 1) such that, for all q ∈ Lc,2k , γj (q) ≤ Cj . Let #l ≡ #l (q) := ( j ≥l j 2N γj2 )1/2 and Dk := kj =1 Cj . Recall that the sum of the bands nk=1 (λ2k−1 − λ2k−2 ) can be estimated by n2 π 2 (cf. [GT]). Then for n ≥ n1 , using that λ0 < λ2k−1 < 0 < λ2k (as k ≥ 1), we obtain √ λ2n ≤ λ2n − λ0 ≤ γ1 + . . . + γn + n2 π 2 ≤ Dn1 + n#n1 + n2 π 2 . This leads to n3+2N |In(2) (q)| ≥ M≥ n≥n1
≥
1 2(8π)2 Dn1
1 1 2 1 n3+2N γ √ 2 2(8π ) n≥n Dn1 + n#n1 + n2 π 2 n n 1 1 1 1 2N 2 n γn = #2 2 2 + #n1 + π n≥n 2(8π ) Dn1 + #n1 + π 2 n1 1
1 ≥ (#n1 − Dn1 − π 2 ), 2(8π)2 hence #n1 ≤ 2(8π )2 M + Dn1 + π 2 . 2 2 Substitute this inequality into #12 ≤ n2N 1 Dn1 + #n1 to obtain, for q ∈ Q, 2 2 2 2 #12 ≤ n2N D + 2(8π ) M + D + π . n 1 n 1 1
(4.6)
Thus the set {(γn (q))n≥1 | q ∈ Q} is bounded in hN (N; R). Using the same argument as above we obtain from (4.5), with n˘ := max (nε , n1 ), ε≥
n3+2N |In(2) (q)| ≥
n≥n˘
1 1 n2N γn (q)2 2 2 2(8π ) #1 (q) + π n≥n˘
and thus the set {(γn (q))n≥1 | q ∈ Q} is compact in hN (N; R). Corollary 4 together with (4.4) implies that, for q ∈ Q ⊂ Lc,2k , |[q]| ≤ M + f (q),
(4.7)
where f (q) is the Casimir introduced in Corollary 4. By [GT] it then follows that Q is compact in H N (S 1 ). 4.4. Local properties of (2) . In this subsection we prove N Proposition 18. For any N ≥ 0, the map (2) : LN ∗ → M∗ is a local diffeomorphism, N (2) N N i.e. for any q ∈ L∗ , dq : Tq L∗ → T(2) (q) M∗ is bijective. N We prove Proposition 18 for LN ∗ = Lc,2k with |c| > 2, k ≥ 1. (The other cases are ∼ N+ 23 (N; R2 ). To show that dq (2) : treated similarly.) Notice that T(2) (q) MN c,2k = h N+ 2 (N; R2 ) is invertible we prove that dq (2) is a Fredholm operator of Tq LN c,2k → h index 0 and is 1-1. 3
N+ 2 Lemma 19. The map dq (2) : Tq LN (N; R2 ) is a Fredholm operator of c,2k → h index 0. 3
(2)
Proof. In view of the definition of n , for n ∈ N \ {k} and p ∈ Tq LN c,2k , (1) (1) dq (2) n (p) = ζn (q)dq n (p) + dq ζn (p)n (q).
(4.8)
3 Formula (4.8) allows to extend dq (2) to a map dq (2) : H N (S 1 ) → hN+ 2 N; R2 . We 3 first prove that dq (2) : H N (S 1 ) → hN+ 2 N; R2 is a Fredholm operator of index 1. Rewrite (4.8) as (with n ∈ N \ {k}, p ∈ H N (S 1 ))
1 1 (1) (1) dq (2) (p) = (p) + ζ (q) − d dq (1) √ √ q n n n n (p) + dq ζn (p)n (q). 2nπ 2nπ Recall from [KM2] that dq (1) (p) : H N (S 1 ) → hN+ 2 (N; R2 ) is a Fredholm operator of index 1. By Lemma 15(B),
1 log n |ζn (q) − √ . |=O n2 2nπ 1
n ). Hence As (ζn (q))n=k ∈ l ∞ is analytic we then conclude that |dq ζn (p)| ≤ ||p||O( log n2 3 (1) 1 N+ 2 (2) N 1 2 dq : H (S ) → h (N; R ) is a compact perturbation of √ dq n 2nπ
n≥1
and thus a Fredholm operator of index 1. As a consequence, dq (2) : Tq LN c,2k → hN+ 2 (N; R2 ) is a Fredholm operator of index 0. 3
To finish the proof of Proposition 18, it remains to prove N Lemma 20. The map dq (2) : Tq LN c,2k → T(2) (q) Mc,2k is 1-1.
Proof. It suffices to consider the case N = 0. Assume that, for some h ∈ Tq Lc,2k , dq (2) (h) = 0. Notice that ∂vn ∂wn (2) dq (h) = ,h ,h en + e−n , ∂q(x) ∂q(x) L2 L2 n≥1
(2) where en := (δk,n , 0)k≥1 and e−n := (0, δk,n )k≥1 . Introduce the sequence Fn (2)
defined by F0
:=
∂(0) ∂q(x) ,
n≥0
and
(2) F2n−1
:=
∂θn ∂q(x) , ∂vn ∂q(x) ,
n ∈ O; n ∈ O;
(2) F2n
:=
(2) Then, for n ≥ 0, h, Fn
(2)
∂In ∂q(x) , ∂wn ∂q(x) ,
n ∈ O; n ∈ O.
(2) = 0. We prove that h = 0 by showing that F is n L2 n≥0 (1) a complete system in L2 (S 1 ), i.e. span (Fn )n≥0 = L2 (S 1 ). We use themap (q) = (1)
1
(xn , yn )n≥1 . As dq (1) : L20 (S 1 ) → h 2 (N; R2 ) is 1-1, the system Fn (1) F0
n≥0
with
= 1 and, for n ≥ 1, (1) F2n−1
:=
∂θn ∂q(x) , ∂xn ∂q(x) ,
n ∈ O; n ∈ O;
(1) F2n
:=
(1)
∂In ∂q(x) , ∂yn ∂q(x) , (2)
n∈O n ∈ O (1)
is complete in L2 (S 1 ) (cf. [KM2]). Notice that, for n ∈ O, F2n−1 = F2n−1 . From (4.8) (2)
(1)
(2)
(1)
we conclude that, for n ∈ O, F2n−1 = ζn F2n−1 as well as F2n = ζn F2n with ζn = 0 (2) (2) (1) (1) (Lemma 15 ). By Lemma 5, span F0 , F2n | n ∈ O = span F0 , F2n | n ∈ O . Thus span Fn(2) | n ≥ 0 = span Fn(1) | n ≥ 0 = L2 (S 1 ). 4.5. Global properties of (2) . In this section we prove N Theorem 4. For N ≥ 0, the map (2) : LN ∗ → M∗ as well as its inverse is a real analytic diffeomorphism. N Proof. We have established that (2) : LN ∗ → M∗ is a real analytic map and a local (2) N diffeomorphism. It remains to show that : L∗ → MN ∗ is 1-1 and onto. Consider (2) −1 N (z) = 1}. Then V is open and closed in MN the set V := {z ∈ M∗ | F ∗ (as (2) is a local diffeomorphism) and proper. In order to show that V = MN it suffices ∗ therefore to prove that V = ∅. N N In the case where LN ∗ = Lc,n with |c| < 2 we take w = 0 ∈ Mc,n . Then, for any (2) −1 q∈ (0) and n ≥ 1, γn (q) = 0 (cf. (1.4)) and therefore q is a constant potential.
Using that a leaf LN c,n with |c| < 2, contains exactly one constant potential (cf. [KM1]) one concludes that 0 ∈ V which proves that V = MN c,n . N with |c| > 2 and k ≥ 1, it follows that γ (q) > 0 for In the case where LN = L k ∗ c,2k N all q ∈ L∗ and thus we have to argue differently. Introduce the sets gap{k} := {(vj , wj )j ≥1 | (vj , wj ) = (0, 0) for j = k}; Gapc,{k} := {q ∈ LN c,2k | γj (q) = 0 iff j = k}. and k ≥ 1),is 1-1. By (1.4)–(1.5), −1 gap{k} ⊆ Gapc,{k} . (2) Gapc,{k} ⊆ gap{k} and (2) Therefore it suffices to show that Gapc,{k} is non-empty and (2) , restricted to Gapc,{k} , is 1-1. By Subsect. 2.4 [KM1], Gapc,{k} is a 2-dimensional cylinder. Further, each isospectral set in Gapc,{k} is a circle, parametrized by the angle θk , and contains a unique normalized potential which is completely determined by its average. By Corollary 4, (2) [q] = 2πkIk + const on Gapc,{k} and hence the restriction of (2) to Gapc,{k} is 1-1. Thus ∅ = (2) Gapc,{k} ⊆ V which proves that V = MN c,n in this case as well. 4.6. Symplectic properties of (2) . In this section we prove N Theorem 5. For N ≥ 0 arbitrary, (2) : LN ∗ → M∗ is a symplectomorphism. (2) Proof. Let q ∈ LN ∗ and z := (q). Denote by (e±n )n≥1 the standard basis in 3 (2) N+ 2 2 h (N; R ), i.e. en := (δk,n , 0)k≥1 and e−n := (0, δk,n )k≥1 . Let u±n (q) be n≥1
the basis of Tq LN ∗ given by
−1 (2) (2) [e±n ]. u±n (q) ≡ u±n := dz (2)
(2)
Denote by ωq the symplectic structure evaluated at q, induced by the restriction of the second Poisson structure to LN ∗ . We have to prove that (2) (2) ωq(2) u±n , u±m = ∓δ±n,∓m , n, m ≥ 1. (4.9) It suffices to consider the case N = 0. Let us first verify (4.9) for q ∈ L2∗ ∩ Gap∞ , where Gap∞ := {q ∈ L2 (S 1 ) | γk (q) > 0, ∀k ≥ 1}. In view of Theorem 3, for any n, m ≥ 1 and q ∈ L2∗ ∩ Gap∞ , {vn , vm }2 (q) = {wn , wm }2 (q) = 0;
{vn , wm }2 (q) = δn,m ,
(4.10)
where we recall that (2) (q) = (vn , wn )n≥1 . For a functional f : L2 (S 1 ) → R and q ∈ L∗ , denote by δq f the orthogonal ∂f ∂f projection of ∂q(x) on Tq L∗ . Notice that Lq ∂q(x) − δq f = 0. Take q ∈ L2∗ ∩ Gap∞ . (2) As dq (2) u±k = e±k and ∂vn ∂wn en + e−n , dq (2) (h) = ,h ,h ∂q(x) ∂q(x) L2 L2 n≥1
the system $(\delta_q v_k, \delta_q w_k)_{k \ge 1}$ is a basis of $T_q L_*$, biorthogonal to the basis $\big( u_k^{(2)}, u_{-k}^{(2)} \big)_{k \ge 1}$ of $T_q L_*$. On the other hand, by (4.10), $(L_q \delta_q w_k, -L_q \delta_q v_k)_{k \ge 1}$ is biorthogonal to $(\delta_q v_k, \delta_q w_k)_{k \ge 1}$. Thus, for $k \ge 1$, $u_k^{(2)}(q) = L_q \delta_q w_k$ and $u_{-k}^{(2)}(q) = -L_q \delta_q v_k$. Using again (4.10) one concludes that, for $q \in L_*^2 \cap \mathrm{Gap}_\infty$, (4.9) holds. As $L_*^2 \cap \mathrm{Gap}_\infty$ is dense in $L_*$ and $\omega_q^{(2)}$ as well as $u_{\pm k}^{(2)}(q)$ ($k \ge 1$) depend continuously on $q$, we conclude that (4.9) holds for arbitrary $q \in L_*$.

Remark. It follows from the computations above that, for $k \ge 1$ and $q \in L_*$ arbitrary, $\delta_q w_k = M_q u_k^{(2)}$ and $\delta_q v_k = -M_q u_{-k}^{(2)}$, where $M_q$ is the canonical right inverse of $L_q$ (cf. [KM1]). Therefore we can improve on Theorem 3(iii) in the following way:

Corollary 21. For $n, m \ge 1$ and $q \in U_n \cap U_m$, $\{\theta_n, \theta_m\}_2 = 0$.

A. Appendix

In this appendix we state some well known results that are frequently used in this paper. The following lemma follows by direct computations.

Lemma 22. Let $f$ and $g$ be two solutions of $-y'' + q(x) y = \lambda y$ with $q \in H^1(S^1)$. Then $L_q(fg) = 2\lambda \frac{d}{dx}(fg)$.

Recall that, for $q \in L^2(S^1)$, $y_1(x, \lambda, q)$ and $y_2(x, \lambda, q)$ denote the fundamental solutions of $-y'' + q(x) y = \lambda y$ and $m_{ij}$ are given by $m_{11} := y_1(1, \lambda, q)$, $m_{21} := y_1'(1, \lambda, q)$, $m_{12} := y_2(1, \lambda, q)$, $m_{22} := y_2'(1, \lambda, q)$.

Lemma 23. (i) The functional $\Delta(\cdot, \cdot)$ is analytic on $\mathbb{C} \times L_{\mathbb{C}}^2(S^1)$.
(ii) For $q \in H^N(S^1)$ and $\lambda \in \mathbb{C}$, $\frac{\partial \Delta(\lambda, q)}{\partial q(x)}$ is in $H^{N+2}(S^1)$ and is given by
\[ \frac{\partial \Delta(\lambda, q)}{\partial q(x)} = m_{12} \, y_1^2(x, \lambda, q) + (m_{22} - m_{11}) \, y_1(x, \lambda, q) \, y_2(x, \lambda, q) - m_{21} \, y_2^2(x, \lambda, q). \]
(iii) For $q \in L^2(S^1)$, $L_q \frac{\partial \Delta(\lambda, q)}{\partial q(x)} = 2\lambda \frac{d}{dx} \frac{\partial \Delta(\lambda, q)}{\partial q(x)}$.

Lemma 24. (i) $\Delta^2(\lambda, q) - 4 = 4 (\lambda_0 - \lambda) \prod_{n \ge 1} \frac{(\lambda_{2n} - \lambda)(\lambda_{2n-1} - \lambda)}{n^4 \pi^4}$.
(ii) $\dot\Delta(\lambda, q) = -\prod_{n \ge 1} \frac{\dot\lambda_n - \lambda}{n^2 \pi^2}$.
(iii) $y_2(1, \lambda, q) = \prod_{n \ge 1} \frac{\mu_n - \lambda}{n^2 \pi^2}$.

The following lemma is proved in [PT, p. 168].

Lemma 25. Suppose $z_m$, $m \ge 1$, is a sequence of complex numbers such that $z_m = m^2 \pi^2 + O(1)$. Then, for each $n \ge 1$, the product $\prod_{m \ge 1, \, m \ne n} \frac{z_m - \lambda}{m^2 \pi^2}$ is an entire function of $\lambda$ such that
\[ \prod_{m \ge 1, \, m \ne n} \frac{z_m - \lambda}{m^2 \pi^2} = \frac12 (-1)^{n+1} \Big( 1 + O\Big( \frac{\log n}{n} \Big) \Big) \]
uniformly for $\lambda = n^2 \pi^2 + O(1)$.
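B. Appendix

In this appendix we prove estimates needed in Subsect. 4.3. Recall that, for $\lambda_{2n-1} \le \mu \le \lambda_{2n}$, we have introduced
\[ p_n(\mu) \equiv p_n(\mu, q) := \operatorname{arccosh} \frac{(-1)^n \Delta(\mu, q)}{2}. \]
Define
\[ h_n \equiv h_n(q) := p_n(\dot\lambda_n(q), q) \quad \Big( = \max_{\lambda_{2n-1} \le \mu \le \lambda_{2n}} p_n(\mu, q) \Big). \]

Lemma 26. Let $c \in \mathbb{R}$, $k \ge 0$, $M > 0$. Then, for any $n \ge 1$, there exists a constant $C_n = C_n(M, k, c) > 0$ so that for $q \in L_{c,k}$ with
\[ \sum_{j=1}^{\infty} (2\pi j)^3 |I_j^{(2)}(q)| \le M, \tag{B.1} \]
the following two estimates hold:
(i) $\gamma_n(q) \le C_n$, $n \ge 1$;
(ii) $I_n^{(1)}(q) \le C_n$, $n \ne k/2$.

Proof. (i) Case $[|c| < 2]$: for any $j \ge 1$, the actions $I_j^{(2)}$ are given by $I_j^{(2)} = \frac{1}{\pi} \int_{\lambda_{2j-1}}^{\lambda_{2j}} \frac{1}{\mu} p_j(\mu) \, d\mu$. Let $q \in L_{c,k}$ with (B.1) and $n \ge 1$. If $h_n(q) \le 1$,
\[ \gamma_n \le 4 \max\{2\pi n h_n, h_n^2\} \]
(cf. [GT, Theorem 3.3]) and thus $\gamma_n \le 8\pi n$. If $h_n(q) > 1$ (hence $\lambda_{2n-1} < \lambda_{2n}$) we consider the set $A_n := \{\lambda_{2n-1} \le \mu \le \lambda_{2n} \mid p_n(\mu) \le 1\}$. From [GT, Theorem 3.3] and the maximum principle, we obtain
\[ l_n := \operatorname{meas}(A_n) \le 4 \max\{2\pi n, 1\} = 8\pi n. \tag{B.2} \]
As $|c| < 2$, there exists $n_0 \ge 1$ with $\lambda_{2n_0 - 2} < 0 < \lambda_{2n_0 - 1}$. By assumption (B.1), we obtain, for $n \ge n_0$,
\[ \frac{M}{n^3} \ge I_n^{(2)} \ge \frac{1}{\pi} \int_{\lambda_{2n-1} + l_n}^{\lambda_{2n}} \frac{1}{\mu} \, d\mu = \frac{1}{\pi} \log \frac{\lambda_{2n-1} + \gamma_n}{\lambda_{2n-1} + l_n} \]
or
\[ \gamma_n \le (\lambda_{2n-1} + l_n) \exp\Big( \frac{\pi M}{n^3} \Big) - \lambda_{2n-1} \le \lambda_{2n-1} \Big( \exp\Big( \frac{\pi M}{n^3} \Big) - 1 \Big) + 8\pi n \exp\Big( \frac{\pi M}{n^3} \Big). \tag{B.3} \]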
Using the a priori estimates on the bands, λ2j +1 − λ2j ≤ (2j + 1)π 2 (cf. [GT]) we get λ2n−1 ≤
n−1 j =n0
γj + n 2 π 2 .
(B.4)
674
T. Kappeler, M. Makarov
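Combining (B.3) and (B.4) leads to ($n\ge n_0$)
$$\gamma_n\le\Bigl(n^2\pi^2+\sum_{j=n_0}^{n-1}\gamma_j\Bigr)\Bigl(\exp\Bigl(\frac{\pi M}{n^3}\Bigr)-1\Bigr)+8\pi n\exp\Bigl(\frac{\pi M}{n^3}\Bigr).$$
Similarly, for $n<n_0$, one obtains
$$\gamma_n\le\Bigl(n_0^2\pi^2+\sum_{j=n+1}^{n_0-1}\gamma_j\Bigr)\Bigl(\exp\Bigl(\frac{\pi M}{n^3}\Bigr)-1\Bigr)+8\pi n\exp\Bigl(\frac{\pi M}{n^3}\Bigr).$$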
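The recursive inequality for $n\ge n_0$ can be unrolled numerically. The following is a minimal sketch, not part of the original argument: it assumes that each $\gamma_j$ on the right-hand side is replaced by its own previously computed bound (legitimate because the right-hand side is increasing in every $\gamma_j$), and the function name `gap_bounds` and the sample values of `M` and `n0` are hypothetical.

```python
import math


def gap_bounds(M, n0, n_max):
    """Unroll C_n = (n^2 pi^2 + sum_{j=n0}^{n-1} C_j)(e^{pi M/n^3} - 1) + 8 pi n e^{pi M/n^3}.

    Returns a dict n -> C_n for n0 <= n <= n_max. Each C_n is an upper
    bound for gamma_n provided the displayed recursion holds.
    """
    bounds = {}
    running_sum = 0.0  # sum of the bounds C_j for n0 <= j < n
    for n in range(n0, n_max + 1):
        growth = math.exp(math.pi * M / n**3)
        bounds[n] = (n**2 * math.pi**2 + running_sum) * (growth - 1.0) \
            + 8.0 * math.pi * n * growth
        running_sum += bounds[n]
    return bounds
```

Because $\exp(\pi M/n^3)-1$ decays like $n^{-3}$, each $C_n$ is finite, which is all that Lemma 26 (i) asserts.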
By induction we then obtain the claimed estimate in the case [|c| < 2].

The case [c ≥ 2, k = 0] is treated similarly as the case [|c| < 2], once one has obtained an estimate for $\lambda_0(q)$: If $c\ge2$ and $k=0$, then $0\le\lambda_0$. By the variational characterization of the eigenvalue $\lambda_0$, $\lambda_0(q)\le[q]$ and, by Corollary 4, $[q]\le M+f(q)$. Thus, for any $n\ge1$,
$$0\le\lambda_{2n-1}\le\lambda_{2n-1}-\lambda_0+[q]\le\sum_{j=1}^{n-1}\gamma_j+n^2\pi^2+M+f(q)$$
and these estimates lead to bounds of $\gamma_n$ as above.

In the case [c > 2, k = 2m ≥ 2] as well as [c = ±2, k = 2m ≥ 2] one argues again similarly as in the case [|c| < 2]: Consider e.g. [c > 2, k = 2m ≥ 2]. Only the estimate for $\gamma_m$ is different as $\lambda_{2m-1}<0<\lambda_{2m}$ and $I_m^{(2)}$ is given by
$$I_m^{(2)} = \frac1\pi\int_{\lambda_{2m-1}}^{\lambda_{2m}}\frac1\mu\,\bigl(p_m(\mu)-p_m(0)\bigr)\,d\mu.$$
Notice that for $q\in L_{c,k}$, $p_m(0) = \operatorname{arccosh}\dfrac{(-1)^m c}{2}$ is independent of $q$. If $h_m(q)\le p_m(0)+1$, then $\gamma_m\le4\max\{2\pi m(p_m(0)+1),\ (p_m(0)+1)^2\}$. If $h_m(q)>p_m(0)+1$, introduce $A_m := \{\lambda_{2m-1}\le\mu\le\lambda_{2m}\mid p_m(\mu)\le p_m(0)+1\}$. The function $p_m(\mu)$ is increasing on $(\lambda_{2m-1},\dot\lambda_m)$ and decreasing on $(\dot\lambda_m,\lambda_{2m})$, hence $A_m = [\lambda_{2m-1},x_m]\cup[y_m,\lambda_{2m}]$, where $p_m(x_m) = p_m(y_m) = p_m(0)+1$. Notice that either $0\le x_m<y_m$ or $x_m<y_m\le0$. Let $l_m = \operatorname{meas}(A_m)$. Again by [GT, Theorem 3.3] and the maximum principle, $l_m\le4\max\{2\pi m(p_m(0)+1),\ (p_m(0)+1)^2\}$. Thus, if $\gamma_m\le4l_m$, $\gamma_m$ is bounded. If $\gamma_m-4l_m>0$ we argue as follows: Assume that $0\le x_m<y_m$. (The case $x_m<y_m\le0$ is treated similarly.) Then
$$I_m^{(2)} = \frac1\pi\int_{\lambda_{2m-1}}^{x_m}\frac1\mu\bigl(p_m(\mu)-p_m(0)\bigr)d\mu+\frac1\pi\int_{x_m}^{y_m}\frac1\mu\bigl(p_m(\mu)-p_m(0)\bigr)d\mu+\frac1\pi\int_{y_m}^{\lambda_{2m}}\frac1\mu\bigl(p_m(\mu)-p_m(0)\bigr)d\mu. \tag{B.5}$$
The integrals on the right hand side of (B.5) are estimated separately: As $\dfrac{p_m(\mu)-p_m(0)}{\mu}\ge0$ for $\lambda_{2m-1}\le\mu<0$ and for $0<\mu\le x_m$, we have
$$\int_{\lambda_{2m-1}}^{x_m}\frac1\mu\bigl(p_m(\mu)-p_m(0)\bigr)d\mu\ge0.$$
Using that $y_m-x_m = \gamma_m-l_m$ and $y_m<\gamma_m$ one obtains
$$\frac1\pi\int_{x_m}^{y_m}\frac1\mu\bigl(p_m(\mu)-p_m(0)\bigr)d\mu\ge\frac1\pi\int_{x_m}^{y_m}\frac{d\mu}{\mu}\ge\frac1\pi\int_{l_m}^{\gamma_m}\frac{d\mu}{\mu} = \frac1\pi\log\frac{\gamma_m}{l_m}.$$
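As $\gamma_m-l_m = y_m-x_m$, $0\le x_m\le y_m$, and $4l_m\le\gamma_m$ in the case considered, $2l_m\le\gamma_m-2l_m\le\gamma_m-l_m<\lambda_{2m}$. This leads to
$$0\ge\int_{y_m}^{\lambda_{2m}}\frac1\mu\bigl(p_m(\mu)-p_m(0)\bigr)d\mu\ge-p_m(0)\int_{y_m}^{\lambda_{2m}}\frac{d\mu}{\mu}\ge-p_m(0)\int_{\gamma_m-2l_m}^{\gamma_m-l_m}\frac{d\mu}{\mu} = -p_m(0)\log\Bigl(1+\frac{l_m}{\gamma_m-2l_m}\Bigr)\ge-p_m(0)\log\frac32.$$
Substituting these estimates into (B.5) yields
$$I_m^{(2)}\ge\frac1\pi\log\frac{\gamma_m}{l_m}-\frac1\pi\,p_m(0)\log\frac32$$
or
$$\gamma_m\le e^{\pi|I_m^{(2)}|}\Bigl(\frac32\Bigr)^{p_m(0)}l_m,$$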
where $l_m$ can be estimated as above.

(ii) Notice that, for $n=k/2$, $0\in[\lambda_{2n-1},\lambda_{2n}]$ and therefore $I_n^{(1)}(q)\le2(|\lambda_{2n}|+|\lambda_{2n-1}|)\,|I_n^{(2)}(q)|$ and, according to Corollary 4, $|[q]|\le M+f(q)$ for $q\in L_{c,k}$. Using that for $q\in L_{c,k}$, $\lambda_k>0$ and hence $|\lambda_j|\le\max\{\lambda_j-\lambda_0+|[q]|,\ \lambda_k-\lambda_0\}$, and using the same estimate for $\lambda_{2n-1}-\lambda_0$ as above
$$\lambda_{2n-1}-\lambda_0\le n^2\pi^2+\sum_{j=1}^{n}\gamma_j$$
one obtains, with an appropriate choice of Cn = Cn (M, k, c), the claimed estimate (ii). Lemma 27. Under the same assumption as in Lemma 26, there exists n1 = n1 (M, k, c) so that for n ≥ n1 and q ∈ Lc,k , (i) γn (q) ≤ 18nπ; (1) 1 1 2 (ii) In (q) ≥ (8π) 2 n γn (q). Proof. (i) Let q ∈ Lc,k and choose n0 sufficiently large so that for n ≥ n0 , exp 1
$\tau_1$ when one of them hits $\partial D$. At the time $\tau_2$, the particle which approaches the boundary jumps to the current location of a randomly (uniformly) chosen particle among the ones strictly inside $D$. The subsequent evolution of the process $X_t$ proceeds along the same lines.

Before we start to study properties of $X_t$, we have to check if the process is well defined. Since the distribution of the hitting time of $\partial D$ has a continuous density, only one particle can hit $\partial D$ at time $\tau_k$, for every $k$, a.s. However, the process $X_t$ can be defined for all $t\ge0$ using the informal recipe given above only if $\tau_k\to\infty$ as $k\to\infty$. This is because there is no obvious way to continue the process $X_t$ after the time $\tau_\infty = \lim_{k\to\infty}\tau_k$ if $\tau_\infty<\infty$. Hence, the question of the finiteness of $\tau_\infty$ has a fundamental importance for our model.

Theorem 1.1. We have $\lim_{k\to\infty}\tau_k = \infty$ a.s.

Consider an open set $D$ which has more than one connected component. If at some time $t$ all processes $X_t^k$ belong to a single connected component of $D$ then they will obviously stay in the same component from then on. Will there be such a time $t$? The answer is yes, according to the theorem below, and so we could assume without loss of generality that $D$ is a connected set, especially in Theorem 1.4.

Theorem 1.2. With probability 1, there exists $t<\tau_\infty$ such that all processes $X_t^k$ belong to a single connected component of $D$ at time $t$.

Before we continue the presentation of our results, we will provide a slightly more formal description of the process $X_t$ than that at the beginning of the introduction. The fully rigorous definition would be a routine but tedious task and so it is left to the reader. One can show that given $(x^1,x^2,\dots,x^N)\in D^N$, there exists a strong Markov process $X_t$, unique in the sense of distribution, with the following properties. The process starts from $X_0 = (x^1,x^2,\dots,x^N)$, a.s. Let
$$\tau_1 = \inf_{1\le m\le N}\inf\{t>0:\lim_{s\to t-}X_s^m\in D^c\},$$
and for $n\ge1$,
$$\tau_{n+1} = \inf_{1\le m\le N}\inf\{t>\tau_n:\lim_{s\to t-}X_s^m\in D^c\}.$$
Then $\tau_{n+1}>\tau_n$ for every $n\ge1$, a.s. For every $n\ge1$, there exists a unique $k_n$ such that $\lim_{s\to\tau_n-}X_s^{k_n}\in D^c$, a.s. We have $X^m_{\tau_n} = X^m_{\tau_n-}$, for every $m\ne k_n$. For some random $j = j(n,k_n)\ne k_n$ we have $X^{k_n}_{\tau_n} = X^j_{\tau_n}$. The distribution of $j(n,k_n)$ is uniform on the set $\{1,2,\dots,N\}\setminus\{k_n\}$ and independent of $\{X_t,\ 0\le t<\tau_n\}$. For every $n$, the process $\{X_{(t\wedge\tau_{n+1})-},\ t\ge\tau_n\}$ is a Brownian motion on $D^N$ stopped at the hitting time of $\partial D^N$.

Let $P_t^D(x,dy)$ be the transition probability for the Brownian motion killed at the time of hitting of $D^c$. Given a probability measure $\mu_0(dx)$ on $D$, we define measures $\mu_t$ for $t>0$ by
$$\mu_t(A) = \frac{\int_D P_t^D(x,A)\,\mu_0(dx)}{\int_D P_t^D(x,D)\,\mu_0(dx)}, \tag{1.1}$$
Fleming–Viot Particle Representation of the Dirichlet Laplacian
681
for open sets $A\subset D$. Note that $\mu_t$ is a probability measure, for every $t\ge0$. Let $\delta_x(dy)$ be the probability measure with the unit atom at $x$. We will write $\mathcal{X}_t^N(dy) = (1/N)\sum_{k=1}^N\delta_{X_t^k}(dy)$ to denote the empirical (probability) distribution representing the particle process $X_t$. We will say that $D$ has a regular boundary if for every $x\in\partial D$, Brownian motion starting from $x$ hits $D^c$ for arbitrarily small $t>0$, a.s.

Theorem 1.3. Suppose that $D$ is bounded and has a regular boundary. Fix a probability distribution $\mu_0$ on $D$ and recall the definition (1.1). Suppose that for every $N$, the initial distribution $\mathcal{X}_0^N$ is a non-random measure $\mu_0^N$. If the measures $\mu_0^N$ converge as $N\to\infty$ to $\mu_0$ then for every fixed $t>0$ the empirical distributions $\mathcal{X}_t^N$ converge to $\mu_t$ in the sense that for every set $A\subset D$, the sequence $\mathcal{X}_t^N(A)$ converges to $\mu_t(A)$ in probability.

The regularity of $\partial D$ seems to be only a technical assumption, i.e., Theorem 1.3 is likely to hold without this assumption. We conjecture that for any $S>0$, the measure-valued processes $\{\mathcal{X}_t^N(\,\cdot\,),\ 0\le t\le S\}$ converge to $\{\mu_t(\,\cdot\,),\ 0\le t\le S\}$ in the Skorohod topology, as $N\to\infty$. The arguments presented in this paper do not seem to be sufficient to justify this claim.

One may wonder whether $E\mathcal{X}_t^N(A) = \mu_t(A)$ for all sets $A$ and $t>0$ if we assume that $\mathcal{X}_0^N = \mu_0$. One can find intuitive arguments both for and against this claim but none of them seemed to be quite clear to us. We will have to resort to brute calculation to show that the statement is false. The word “brute” refers only to the lack of a clear intuitive explanation and not to the difficulty of the example which is in fact quite elementary (see Example 2.1 in Sect. 2). The example is concerned with a process on a finite state space. We presume that a similar example can be based on the Brownian motion process.
We will say that an open set $D\subset\mathbb{R}^d$ satisfies the interior ball condition if for some $r>0$ and every $x\in D$ there exists an open ball $B(y,r)\subset D$ such that $x\in B(y,r)$.

Theorem 1.4. Suppose that $D\subset\mathbb{R}^d$ is a bounded domain, has a regular boundary and satisfies the interior ball condition.
(i) For every $N$, there exists a unique stationary probability measure $M^N$ for $X_t$. The process $X_t$ converges to its stationary distribution exponentially fast, i.e., there exists $\lambda>0$ such that for every $A\subset D^N$, and every $x\in D^N$,
$$\lim_{t\to\infty}e^{\lambda t}\,|P^x(X_t\in A)-M^N(A)| = 0.$$
(ii) Let $\mathcal{X}_M^N$ be the stationary empirical measure, i.e., let $\mathcal{X}_M^N$ have the same distribution as $(1/N)\sum_{k=1}^N\delta_{X_t^k}(dy)$, assuming that $X_t$ has the distribution $M^N$. Let $\varphi(x)$ be the first eigenfunction for the Laplacian in $D$ with the Dirichlet boundary conditions, normalized so that $\int_D\varphi = 1$. Then the random measures $\mathcal{X}_M^N$ converge as $N\to\infty$ to the (non-random) measure with density $\varphi(x)$, in the sense of weak convergence of random measures.
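The particle system behind Theorems 1.3 and 1.4 can be explored numerically. The following is a minimal sketch under stated assumptions, not the construction used in the paper: it takes $D=(0,1)$, replaces Brownian motion by an Euler scheme with time step `dt` (so boundary hits are only detected at grid times), and all function names and default parameters are hypothetical. On this interval the normalized first Dirichlet eigenfunction is $(\pi/2)\sin(\pi x)$, which a histogram of the returned positions can be compared against for large $N$ and $t$.

```python
import math
import random


def simulate_fleming_viot(n_particles=200, t_end=2.0, dt=1e-3, seed=0):
    """Euler sketch of the particle system on D = (0, 1).

    A particle whose proposed step leaves D is moved to the current
    location of a uniformly chosen other particle instead.
    """
    if n_particles < 2:
        raise ValueError("need at least two particles")
    rng = random.Random(seed)
    step_sd = math.sqrt(dt)  # standard deviation of a Brownian increment
    positions = [rng.uniform(0.05, 0.95) for _ in range(n_particles)]
    for _ in range(int(round(t_end / dt))):
        for k in range(n_particles):
            proposal = positions[k] + step_sd * rng.gauss(0.0, 1.0)
            if 0.0 < proposal < 1.0:
                positions[k] = proposal
            else:
                # uniform choice among the other N - 1 particles
                j = rng.randrange(n_particles - 1)
                if j >= k:
                    j += 1
                positions[k] = positions[j]
    return positions


def first_eigenfunction_density(x):
    """Normalized first Dirichlet eigenfunction of the Laplacian on (0, 1)."""
    return 0.5 * math.pi * math.sin(math.pi * x)
```

Every stored position lies strictly inside $D$, so the jump target is always an interior point, mirroring the rule that the exiting particle lands on a particle inside $D$.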
682
K. Burdzy, R. Hołyst, P. March
2. Proofs

This section is devoted to proofs of the main results. It also contains an example related to Theorem 1.3.

Proof of Theorem 1.1. Fix an arbitrary $S<\infty$. Let $B_t$ be a Brownian motion, and
$$h(x,t) = P(\inf\{s>t:B_s\notin D\}>S\mid B_t = x).$$
We will first prove the theorem for $N=2$ as this special case presents the main idea of the proof in a clear way. Let $M_t = h(X^1_{t-},t)+h(X^2_{t-},t)$. Consider an arbitrary $y\in D$ and assume that $X_0^1 = X_0^2 = y$. Let $a = h(y,0)$ and $\tau_* = \tau_1\wedge S$. An application of the optional stopping theorem to the martingale $M_{t\wedge\tau_*}$ gives
$$\begin{aligned}2h(y,0) = EM_0 = EM_{\tau_*} &= E(M_{\tau_*}\mid\tau_* = S)P(\tau_* = S)+E(M_{\tau_*}\mid\tau_*<S)P(\tau_*<S)\\&= 2\cdot P(\tau_* = S)+E(M_{\tau_*}\mid\tau_1<S)P(\tau_1<S)\\&= 2\cdot h(y,0)^2+E(M_{\tau_*}\mid\tau_1<S)(1-h(y,0)^2).\end{aligned}$$
From this we obtain
$$E(M_{\tau_*}\mid\tau_1<S) = \frac{2h(y,0)-2h(y,0)^2}{1-h(y,0)^2} = \frac{2h(y,0)}{1+h(y,0)}\ge h(y,0).$$
The process $X_t^k$ which hits $\partial D$ at time $\tau_1$ jumps to the location of $X^{3-k}_{\tau_1}$, so we have
$$E(h(X^1_{\tau_1},\tau_1)+h(X^2_{\tau_1},\tau_1)\mid\tau_1<S)\ge2h(y,0) = E(h(X_0^1,0)+h(X_0^2,0)).$$
By applying the strong Markov property at the stopping time $\tau_1$ we obtain
$$E(h(X^1_{\tau_2},\tau_2)+h(X^2_{\tau_2},\tau_2)\mid\tau_2<S)\ge E(h(X_0^1,0)+h(X_0^2,0)).$$
By induction, for all $k\ge1$,
$$E(h(X^1_{\tau_k},\tau_k)+h(X^2_{\tau_k},\tau_k)\mid\tau_k<S)\ge E(h(X_0^1,0)+h(X_0^2,0)) = 2a. \tag{2.1}$$
Let $J_k = h(X^1_{\tau_k},\tau_k)+h(X^2_{\tau_k},\tau_k)$. Since $h(x,t)\le1$, we have $J_k\le2$. Hence,
$$E(J_k\mid\tau_k<S)\le2P(J_k\ge a\mid\tau_k<S)+aP(J_k<a\mid\tau_k<S) = P(J_k\ge a\mid\tau_k<S)(2-a)+a,$$
and so, using (2.1),
$$P(h(X^1_{\tau_k},\tau_k) = h(X^2_{\tau_k},\tau_k)\ge a/2\mid\tau_k<S) = P(J_k\ge a\mid\tau_k<S)\ge\frac{E(J_k\mid\tau_k<S)-a}{2-a}\ge\frac{2a-a}{2-a} = \frac{a}{2-a}.$$
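It follows that
$$\begin{aligned}P(\tau_{k+1}\ge S\mid\tau_k<S) &= P(\inf\{s>\tau_k:X^1_{s-}\notin D\}>S,\ \inf\{s>\tau_k:X^2_{s-}\notin D\}>S\mid\tau_k<S)\\&= \int[P(\inf\{s>\tau_k:X^1_s\notin D\}>S\mid X^1_{\tau_k} = x)]^2\,P(X^1_{\tau_k}\in dx\mid\tau_k<S)\\&= \int h(x,\tau_k)^2\,P(X^1_{\tau_k}\in dx\mid\tau_k<S)\\&\ge(a/2)^2\cdot\frac{a}{2-a} = \frac{a^3}{8-4a}.\end{aligned}$$
This implies that
$$P(\tau_{k+1}<S) = \prod_{j=1}^{k}P(\tau_{j+1}<S\mid\tau_j<S)\le\Bigl(1-\frac{a^3}{8-4a}\Bigr)^{k},$$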
and so $P(\tau_\infty<S) = 0$. Recall that we have assumed that $X_0^1 = X_0^2$. If $X_0^1$ is not equal to $X_0^2$, we can apply the argument to the post-$\tau_1$ process to see that $P(\tau_\infty<S) = 0$ for every starting position of $X_t$. Since $S<\infty$ is arbitrarily large, the proof of the theorem is complete in the special case $N=2$.

Now we generalize the argument to arbitrary $N\ge2$. Recall $S$, $\tau_*$ and $h(x,t)$ from the first part of the proof. Let
$$M_t = \sum_{k=1}^{N}h(X^k_{t-},t),$$
and $a_k = h(X_0^k,0)$. Then
$$\begin{aligned}\sum_{k=1}^{N}a_k = EM_0 = EM_{\tau_*} &= E(M_{\tau_*}\mid\tau_* = S)P(\tau_* = S)+E(M_{\tau_*}\mid\tau_*<S)P(\tau_*<S)\\&= N\cdot P(\tau_* = S)+E(M_{\tau_*}\mid\tau_1<S)P(\tau_1<S)\\&= N\prod_{k=1}^{N}a_k+E(M_{\tau_*}\mid\tau_1<S)\Bigl(1-\prod_{k=1}^{N}a_k\Bigr).\end{aligned}$$
From this we obtain
$$E(M_{\tau_*}\mid\tau_1<S) = \frac{\sum_{k=1}^{N}a_k-N\prod_{k=1}^{N}a_k}{1-\prod_{k=1}^{N}a_k}. \tag{2.2}$$
Our immediate goal is to prove that the right hand side of (2.2) is bounded below by $((N-1)/N)\sum_{k=1}^{N}a_k$.
The derivative of the function $x\mapsto\bigl(\sum_{k=1}^{N}a_k-Nx\bigr)/(1-x)$ is equal to
$$\frac{\sum_{k=1}^{N}a_k-N}{(1-x)^2}.$$
The derivative is non-positive since $\sum_{k=1}^{N}a_k\le N$. If we let $b = \sum_{k=1}^{N}a_k/N$ then for a fixed value of $b$, the value of the product $\prod_{k=1}^{N}a_k$ is maximized if we take $a_k = b$ for all $k$. These facts imply that
$$\frac{\sum_{k=1}^{N}a_k-N\prod_{k=1}^{N}a_k}{1-\prod_{k=1}^{N}a_k}\ge\frac{\sum_{k=1}^{N}a_k-N\cdot b^N}{1-b^N} = \frac{Nb-N\cdot b^N}{1-b^N} = Nb\cdot\frac{1-b^{N-1}}{1-b^N}. \tag{2.3}$$
We will show that
$$Nb\cdot\frac{1-b^{N-1}}{1-b^N}\ge(N-1)b, \tag{2.4}$$
for $b\in[0,1)$. The last inequality is equivalent to $N(1-b^{N-1})\ge(N-1)(1-b^N)$. After multiplying out and regrouping the terms we obtain
$$1+Nb^N-Nb^{N-1}-b^N\ge0. \tag{2.5}$$
The function $f(b) = 1+Nb^N-Nb^{N-1}-b^N$ has the derivative $f'(b) = N(N-1)b^{N-2}(b-1)$ which is non-positive for $0\le b<1$. Since $f(1) = 0$, we have $f(b)\ge0$ for $b\in[0,1)$, i.e., (2.5) holds. Consequently, (2.4) is true as well. Combining (2.2), (2.3) and (2.4) yields
$$E(M_{\tau_*}\mid\tau_1<S) = \frac{\sum_{k=1}^{N}a_k-N\prod_{k=1}^{N}a_k}{1-\prod_{k=1}^{N}a_k}\ge(N-1)b = \frac{N-1}{N}\sum_{k=1}^{N}a_k.$$
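The chain (2.2) to (2.5) lends itself to a quick numerical sanity check. The sketch below is only an illustration and no substitute for the proof; the function names, the grid resolution and the floating-point tolerance of `1e-12` are all assumptions. It tests inequality (2.4), the sign of $f$ in (2.5), and the resulting lower bound for the right hand side of (2.2) on randomly drawn values $a_k$.

```python
import random


def conditional_mean(a):
    """Right hand side of (2.2) for a list a of values a_k in [0, 1)."""
    n = len(a)
    product = 1.0
    for x in a:
        product *= x
    return (sum(a) - n * product) / (1.0 - product)


def lhs_2_4(n, b):
    """Left hand side of (2.4): N b (1 - b^(N-1)) / (1 - b^N)."""
    return n * b * (1.0 - b ** (n - 1)) / (1.0 - b ** n)


def f_2_5(n, b):
    """The function f(b) = 1 + N b^N - N b^(N-1) - b^N from (2.5)."""
    return 1.0 + n * b ** n - n * b ** (n - 1) - b ** n


def check_inequalities(max_n=8, grid=200, samples=1000, seed=0):
    """Return True if (2.4), (2.5) and the bound on (2.2) hold on all test points."""
    rng = random.Random(seed)
    tol = 1e-12  # allowance for floating-point rounding
    for n in range(2, max_n + 1):
        for i in range(grid):
            b = i / grid  # grid over [0, 1)
            if lhs_2_4(n, b) < (n - 1) * b - tol or f_2_5(n, b) < -tol:
                return False
        for _ in range(samples):
            a = [rng.uniform(0.0, 0.95) for _ in range(n)]
            if conditional_mean(a) < (n - 1) / n * sum(a) - tol:
                return False
    return True
```

For $N=2$ and $b=1/2$ the left hand side of (2.4) equals $2/3$, comfortably above $(N-1)b = 1/2$.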
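The process $X^k$ which hits the boundary at time $\tau_1$ jumps to the location of a process $X^j$, uniformly chosen from other processes. Hence,
$$\begin{aligned}E\Bigl(\sum_{k=1}^{N}h(X^k_{\tau_*},t)\Bigm|\tau_1<S\Bigr) &= \Bigl(1+\frac{1}{N-1}\Bigr)E\Bigl(\sum_{k=1}^{N}h(X^k_{\tau_*-},t)\Bigm|\tau_1<S\Bigr)\\&= \frac{N}{N-1}E(M_{\tau_*}\mid\tau_1<S)\ge\frac{N}{N-1}\cdot\frac{N-1}{N}\sum_{k=1}^{N}a_k = \sum_{k=1}^{N}a_k.\end{aligned}$$
By induction and the strong Markov property applied at $\tau_k$’s, we have for every $k\ge1$,
$$E\Bigl(\sum_{k=1}^{N}h(X^k_{\tau_k},t)\Bigm|\tau_k<S\Bigr)\ge\sum_{k=1}^{N}a_k = Nb. \tag{2.6}$$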
Let $J_k = \sum_{j=1}^{N}h(X^j_{\tau_k},\tau_k)$. Since $h(x,t)\le1$, we have $J_k\le N$. Recall that $b = (1/N)\sum_{k=1}^{N}a_k$. Hence,
$$E(J_k\mid\tau_k<S)\le NP(J_k\ge b\mid\tau_k<S)+bP(J_k<b\mid\tau_k<S) = P(J_k\ge b\mid\tau_k<S)(N-b)+b.$$
This and (2.6) imply that
$$P(J_k\ge b\mid\tau_k<S)\ge\frac{E(J_k\mid\tau_k<S)-b}{N-b}\ge\frac{Nb-b}{N-b}.$$
It follows that
$$P(\exists j:h(X^j_{\tau_k},\tau_k)\ge b/N\mid\tau_k<S)\ge\frac{Nb-b}{N-b}. \tag{2.7}$$
Fix some $t\in(0,S)$. Suppose that $h(X_t^j,t)\ge b/N$ for some $j$ and assume that $j$ is the smallest number with this property. Let $T = \inf\{s>t:h(X_s^j,s)\notin(b/(2N),1)\}$. Note that $h(X_T^j,T) = 1$ if and only if $T = S$. The process $h(X_s^j,s)$ is a martingale on the interval $(t,T)$. By the martingale property and the optional stopping theorem, the probability of not hitting $b/(2N)$ before time $S$ is greater than or equal to
$$\frac{b/N-b/(2N)}{1-b/(2N)} = \frac{b}{2N-b}. \tag{2.8}$$
Consider the event $A$ that $h(X_t^j,t)\ge b/N$ at some time $t$ and that the process $h(X_s^j,s)$ does not hit $b/(2N)$ between $t$ and $S$. Given this event, for any $k\ne j$, the process $X^k$ may jump at most once before time $S$ with probability greater than $(1/(N-1))\cdot(b/(2N))$, independent of other processes $X^m$, $m\ne k,j$. To see this, observe that $X^k$ might not hit $\partial D$ before time $S$ at all; or it may hit $\partial D$, then jump to the location of $X^j$ with probability $1/(N-1)$. If the jump takes place, the process $X^k$ lands at some time $u$ at a place where we have $h(X_u^k,u) = h(X_u^j,u)\ge b/(2N)$, because we are assuming that $A$ holds. The definition of the function $h$ now implies that after $u$, the process $X^k$ will not hit $\partial D$ before time $S$, with probability greater than $b/(2N)$.

Multiplying the probabilities for all $k\ne j$ and using (2.8), we conclude that if we have $h(X_t^j,t)\ge b/N$ then the probability that there will be at most $N-1$ jumps (counting all particles) before time $S$ is greater than
$$p_0 = \frac{b}{2N-b}\cdot\Bigl(\frac{b}{2N(N-1)}\Bigr)^{N-1}.$$
Hence, in view of (2.7),
$$\begin{aligned}P(\tau_{k+N}\ge S\mid\tau_k<S) &\ge P(\exists j:h(X^j_{\tau_k},\tau_k)\ge b/N\mid\tau_k<S)\,P(\tau_{k+N}>S\mid\exists j:h(X^j_{\tau_k},\tau_k)\ge b/N)\\&\ge\frac{Nb-b}{N-b}\cdot p_0.\end{aligned}$$
Thus
$$P(\tau_{(m+1)N}<S) = \prod_{j=1}^{m}P(\tau_{(j+1)N}<S\mid\tau_{jN}<S)\le\Bigl(1-\frac{Nb-b}{N-b}\cdot p_0\Bigr)^{m},$$
and so
P (τ∞ < S) = 0.
Since S is arbitrarily large, the proof is complete.
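The constants appearing in (2.7), (2.8) and in the definition of $p_0$ can be evaluated exactly for concrete $N$ and $b$. The following is a minimal sketch using exact rational arithmetic; the function names and the sample values are hypothetical and serve only to make the geometric decay in the last display tangible.

```python
from fractions import Fraction


def renewal_constants(N, b):
    """Return the lower bounds from (2.7), (2.8) and the constant p_0.

    N is the number of particles (N >= 2); b is the average of the a_k,
    given as a Fraction (or anything Fraction accepts) in (0, 1).
    """
    b = Fraction(b)
    bound_2_7 = (N * b - b) / (N - b)
    bound_2_8 = (b / N - b / (2 * N)) / (1 - b / (2 * N))
    p0 = bound_2_8 * (b / (2 * N * (N - 1))) ** (N - 1)
    return bound_2_7, bound_2_8, p0


def tail_bound(N, b, m):
    """Upper bound (1 - (2.7) * p_0)^m for P(tau_{(m+1)N} < S)."""
    bound_2_7, _, p0 = renewal_constants(N, b)
    return (1 - bound_2_7 * p0) ** m
```

For $N=2$ and $b=1/2$ this gives $1/3$, $1/7$ and $p_0 = 1/56$, so the tail bound is $(167/168)^m$, which tends to zero as $m\to\infty$, albeit slowly.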
Proof of Theorem 1.2. Fix arbitrary points $x^j\in D$ and suppose that $X_0^j = x^j$ for all $j$. Let $\tau^j$ be the first jump time for the process $X^j$. Since there are $N!$ permutations of $\{1,2,\dots,N\}$, there exists a permutation $(j_1,j_2,\dots,j_N)$, such that $P(\tau^{j_1}<\tau^{j_2}<\dots<\tau^{j_N})\ge1/N!$. In order to simplify the notation we will assume that $(j_1,j_2,\dots,j_N) = (1,2,\dots,N)$. Thus we have $P(\tau^1<\tau^2<\dots<\tau^N)\ge1/N!$ and
$$P(\tau^1<\tau^2<\dots<\tau^N,\ X^1_{\tau^1} = X^N_{\tau^1})\ge\frac{1}{N\cdot N!}.$$
Let $\tau_2^j$ denote the time of the second jump of process $X_t^j$. By independence,
$$\begin{aligned}&P(\tau^1<\tau^2<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})\\&\quad= P(\tau^1<\tau^2<\dots<\tau^{N-1}<\tau^N,\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})\\&\qquad\times P(\tau^1<\tau^2<\dots<\tau^{N-1}<\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})\\&\quad= P(\tau^1<\tau^2<\dots<\tau^{N-1}<\tau^N,\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})^2.\end{aligned}$$
It follows that,
$$\begin{aligned}&P(\tau^1<\tau^2<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1})\\&\quad= E\,P(\tau^1<\tau^2<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})\\&\quad= E\bigl[P(\tau^1<\tau^2<\dots<\tau^{N-1}<\tau^N,\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})^2\bigr]\\&\quad\ge\bigl[E\,P(\tau^1<\tau^2<\dots<\tau^{N-1}<\tau^N,\ X^1_{\tau^1} = X^N_{\tau^1}\mid\tau^1,\tau^2,\dots,\tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})\bigr]^2\\&\quad= P(\tau^1<\tau^2<\dots<\tau^{N-1}<\tau^N,\ X^1_{\tau^1} = X^N_{\tau^1})^2\ge\frac{1}{(N\cdot N!)^2}.\end{aligned}$$
We proceed by induction. Let us display one induction step. We start with
$$P(\tau^1<\tau^2<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\ge\frac1N\cdot\frac{1}{(N\cdot N!)^2}.$$
Then we observe that
$$\begin{aligned}&P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1,\tau_2^2),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\\&\quad= P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\\&\qquad\times P(\tau^1<\dots<\tau^{N-1}<\min(\tau_2^2,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\\&\quad= P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})^2.\end{aligned}$$
From this we deduce that
$$\begin{aligned}&P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1,\tau_2^2),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\\&\quad= E\,P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1,\tau_2^2),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\\&\quad= E\bigl[P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})^2\bigr]\\&\quad\ge\bigl[E\,P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}\mid\tau^1,\dots,\tau^{N-1},\tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\bigr]^2\\&\quad= P(\tau^1<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})^2\ge\Bigl(\frac1N\cdot\frac{1}{(N\cdot N!)^2}\Bigr)^2.\end{aligned}$$
Proceeding in this way, we can prove that
$$P(\tau^1<\tau^2<\dots<\tau^{N-1}<\min(\tau^N,\tau_2^1,\tau_2^2,\dots,\tau_2^{N-1}),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2},\ \dots,\ X^{N-1}_{\tau^{N-1}} = X^N_{\tau^{N-1}})\ge c_1, \tag{2.9}$$
where $c_1>0$ is a constant which depends on $N$ but not on the starting position of $X_t^k$’s. If the event in (2.9) occurs then at time $\tau^{N-1}$ all particles are present in the same connected component of $D$ as $X^N$. They will stay in this connected component of $D$ forever. If the event in (2.9) does not occur then we wait until the time $\max(\tau^N,\tau_2^1,\tau_2^2,\dots,\tau_2^{N-1})$ and restart our argument, using the strong Markov property. We can construct in this way a sequence of events whose conditional probabilities (given the outcomes of previous “trials”) are bounded below by $c_1$. With probability 1, at least one of these events will occur and so all particles will end up in a single connected component of $D$.

Proof of Theorem 1.3. Fix some $S\in(0,\infty)$. We will prove that $\mathcal{X}_S^N$ converges to $\mu_S$. Our proof will consist of three parts.
Part 1. In this part of the proof, we will define the tree of descendants of a particle and estimate its size.

Fix an arbitrarily small $\varepsilon_1>0$. Let $B_t$ denote a Brownian motion and $T_A = \inf\{t>0:B_t\in A\}$. Find open subsets $A_1$, $A_2$ and $A_3$ of $D$ such that $A_1\subset A_2\subset A_3$, $\mu_0(A_1)\ge\varepsilon_1>0$, and for some $p_1,p_2>0$,
$$\inf_{x\in A_1}P^x(T_{A_2^c}>S)>p_1\qquad\text{and}\qquad\inf_{x\in A_2}P^x(T_{A_3^c}>S)>p_2.$$
We would like to set aside a small family of particles starting from $A_1$ and containing about $\varepsilon_1N$ particles. Since the measure $\mu_0$ may be purely atomic with all atoms greater than $\varepsilon_1$, we cannot designate the particles in that family just by their starting position. We have assumed that the measures $\mathcal{X}_0^N$ converge to $\mu_0$, so we must have $\mathcal{X}_0^N(A_1)>\varepsilon_1/2$ for large $N$. Let $[b]$ denote the integer part of a number $b$. For each sufficiently large $N$, we arbitrarily choose $[N\varepsilon_1/2]$ particles with the property that their starting positions lie inside $A_1$. The choice is deterministic (non-random). The family of all $N-[N\varepsilon_1/2]$ remaining particles will be called $H$. By the law of large numbers, for any $p_3<1$ and sufficiently large $N$, more than $N\varepsilon_1p_1/4$ particles will stay inside $A_2$ until time $S$, with probability greater than $p_3$.

We will say that a particle has label $k$ if its motion is represented by $X_t^k$. We will identify the families $H^c$ and $H$ with the sets of labels so that we can write, for example, $k\in H^c$. Let $F$ be the event that at least $N_1 = N\varepsilon_1p_1/4$ particles from the family $H^c$ stay inside $A_2$ until time $S$. Consider the motion of a particle $X^k$ belonging to the $H$ family, conditional on $F$. Given $F$, the probability that the particle lands on a particle from the family $H^c$ at the time of a jump is not less than $(N\varepsilon_1p_1/4)/(N-1)$. If this event occurs, then the $k$th particle can stay within the set $A_3$ until time $S$ with probability $p_2$ or higher. Hence, each jump of particle $k$ has at least probability
$$p_2(N\varepsilon_1p_1/4)/(N-1)\ge p_2\varepsilon_1p_1/4\equiv p_4$$
of being the last jump for this particle before time $S$. We see that the total number of jumps of $X^k$ before time $S$ is stochastically bounded by the geometric distribution with mean $1/p_4$. In the rest of Part 1, we will assume that $F$ occurred, i.e., all the probabilities will be conditional probabilities given $F$, even if the conditioning is not reflected in the notation.
We will now define a tree $T^m$ of particle trajectories representing descendants of particle $m$ (see Fig. 1). Informally, the family of all descendants of a particle $X^m$ can be described as the smallest family of points $(t,n)$ with the following properties. The particle $X_t^m$ is its own descendant for all $t$, i.e., $(t,m)\in T^m$ for all $t$. If a particle $X^k$ jumps on a descendant of $X^m$ at time $s$ then $X_t^k$ becomes a descendant of $X^m$ for all $t\ge s$, i.e., $(t,k)\in T^m$ for all $t\ge s$.

We now present a more formal definition. We will say that $(t,n)\in T^m$ if there exists a set of pairs $(s_j,y_j)$, $0\le j\le j_0$, with $(s_0,y_0) = (0,x^m)$, $(s_{j_0},y_{j_0}) = (t,X_t^n)$, such that there exist distinct $k(j)\in H$, $j\ge1$, with the following properties:
(i) $(s_{j-1},X^{k(j)}_{s_{j-1}}) = (s_{j-1},y_{j-1})$ and $(s_j,X^{k(j)}_{s_j}) = (s_j,y_j)$ for all $j$,
(ii) $X^{k(j)}$ does not jump at time $s_j$ for $0\le j\le j_0-1$, and
(iii) $k(0) = m$ and $k(j_0) = n$.
Note that the definition assumes that we use only pieces of trajectories of particles from the family $H$.

Let $K_t^m$ be the set of all descendants of particle $m$ until $t$, i.e., the set of all $k$ such that $(s,k)\in T^m$ for some $s\le t$. The function $t\mapsto K_t^m$ is monotone.

Fix some $m\in H$. Let $\alpha_1^k$ be the number of jumps made by particle $k$ from the family $H$ during the time interval $[0,S]$ but before the first time $T_m^k$ when it becomes a descendant of $m$ (if there is no such time, we count all jumps before $S$). Let $\alpha_2^k$ be the
Fig. 1. Descendants of the mth particle are represented by thick lines. The domain D is the interval (0, a)
number of jumps made after $T_m^k$ but before $S$ ($\alpha_2^k = 0$ if $k$ does not become a descendant of $m$ before $S$). It is easy to see that every random variable $\alpha_1^k$ and $\alpha_2^k$ is stochastically bounded by the geometric distribution with mean $1/p_4$; we can use the same argument as earlier in the proof. We will need a substantially stronger bound, though. It is not very hard to see that one can define our processes on a probability space which will also carry random variables $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$ for all $k\in H$, such that $\alpha_1^k\le\tilde\alpha_1^k$ and $\alpha_2^k\le\tilde\alpha_2^k$ for all $k\in H$, and every random variable $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$ is geometric with mean $1/p_4$. Moreover, random variables $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$ can be constructed so that they are jointly independent and independent of the process $t\mapsto K_t^m$.

The construction of such a family of random variables is standard so we will only sketch it. We start with constructing $\tilde\alpha_1^k$’s and $\tilde\alpha_2^k$’s and then we use them to construct $\alpha_1^k$’s and $\alpha_2^k$’s. We consider a probability space which carries independent sequences of Bernoulli coin tosses with success probability $p_4$. We then identify $\tilde\alpha_1^k$’s and $\tilde\alpha_2^k$’s with the number of tosses until and including the first success in different sequences; we need one sequence of Bernoulli trials for each $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$. The results of coin tosses corresponding to $\tilde\alpha_1^k$ are used to determine whether particle $k$ jumps onto a particle from the family $H^c$ and then stays inside $A_3$ until $S$ (this would be considered a “success”), for all jumps before the time $T_m^k$. The analogous events after time $T_m^k$ are determined by the sequence of coin tosses corresponding to $\tilde\alpha_2^k$. All other aspects of the motion of particle $k$ are determined by some other random mechanism. Such mechanisms need not be independent for different particles.

Let $|K_t^m|$ denote the cardinality of $K_t^m$. Note that if a particle from the family $H\setminus K_t^m$ jumps then with probability $1/(N-1)$ it lands on any other given particle.
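The informal description of the descendant set $K_t^m$ can be turned into a short bookkeeping routine. The sketch below is an illustration under simplifying assumptions: it follows the informal rule (a particle that jumps onto a current descendant becomes a descendant from then on), it ignores the restriction to trajectories of particles from the family $H$, and the input format, a list of `(time, jumper, target)` triples, is hypothetical.

```python
def descendant_set(m, jumps, t):
    """Return the labels of all descendants of particle m up to time t.

    jumps is an iterable of (time, jumper, target) triples: at the given
    time, particle `jumper` hits the boundary and lands on `target`.
    """
    descendants = {m}  # particle m is its own descendant
    for time, jumper, target in sorted(jumps):
        if time > t:
            break
        if target in descendants:
            descendants.add(jumper)  # once a descendant, always a descendant
    return descendants
```

The returned set can only grow with `t`, which reflects the monotonicity of $t\mapsto K_t^m$, and its size increases by one exactly when a non-descendant lands on a descendant.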
The value of $|K_t^m|$ increases by 1 at the time $t$ of a jump of a particle from $H\setminus K_t^m$ with probability equal to $|K_{t-}^m|/(N-1)$. Hence,
$$P\bigl(|K_t^m| = a+1\bigm||K_{t-}^m| = a,\ \exists k\in H\setminus K_{t-}^m:X_t^k\ne X_{t-}^k\bigr) = \frac{a}{N-1},$$
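so for integer $a,r\ge1$, and $c_1 = c_1(r)<\infty$,
$$\begin{aligned}E\bigl(|K_t^m|^r\bigm||K_{t-}^m| = a,\ \exists k\in H\setminus K_{t-}^m:X_t^k\ne X_{t-}^k\bigr) &= a^r\Bigl(1-\frac{a}{N-1}\Bigr)+(a+1)^r\frac{a}{N-1}\\&= a^r\Bigl(1-\frac{a}{N-1}+\Bigl(\frac{a+1}{a}\Bigr)^r\frac{a}{N-1}\Bigr)\\&\le a^r\Bigl(1-\frac{a}{N-1}+\Bigl(1+\frac{c_1r}{a}\Bigr)\frac{a}{N-1}\Bigr)\\&= a^r\Bigl(1+\frac{c_1r}{N-1}\Bigr).\end{aligned}$$
Hence, the expectation of $|K_t^m|^r$ jumps by at most the factor of $1+c_1r/(N-1)$ at the time of a jump of a particle from $H\setminus K_t^m$. Let $\alpha = \sum_{k\in H}\alpha_1^k$. By conditioning on the times of jumps, we obtain for $m\in H$,
$$E|K_t^m|^r\le E\Bigl(1+\frac{c_1r}{N-1}\Bigr)^{\alpha}.$$
We estimate this quantity as follows, using the fact that the family of random variables $\alpha_1^k$ may be simultaneously stochastically bounded by a sequence of independent geometric random variables $\tilde\alpha_1^k$ with mean $1/p_4$. The number of particles in $H$ is obviously bounded by $N$. In the following calculation we will pretend that the number of $\tilde\alpha_1^k$’s is $N$; this is harmless because if the number of particles in $H$ is smaller than $N$, we can always add a few independent $\tilde\alpha_1^k$’s to the family. If
$$\log\Bigl(1+\frac{c_1r}{N-1}\Bigr) = c_2,$$
then $c_2$ is small for large $N$, and the following holds,
$$\begin{aligned}E\Bigl(1+\frac{c_1r}{N-1}\Bigr)^{\alpha} &= E\exp(c_2\alpha) = E\exp\Bigl(c_2\sum_{k\in H}\alpha_1^k\Bigr) = E\prod_{k\in H}\exp\bigl(c_2\alpha_1^k\bigr)\\&\le E\prod_{k\in H}\exp\bigl(c_2\tilde\alpha_1^k\bigr)\le E\prod_{k=1}^{N}\exp\bigl(c_2\tilde\alpha_1^k\bigr) = \prod_{k=1}^{N}E\exp\bigl(c_2\tilde\alpha_1^k\bigr)\\&= \Bigl(\sum_{j=0}^{\infty}e^{c_2(j+1)}(1-p_4)^jp_4\Bigr)^{N} = \Bigl(\frac{e^{c_2}p_4}{1-e^{c_2}(1-p_4)}\Bigr)^{N}\\&= \Bigl(1+\frac{1}{p_4}\bigl(e^{-c_2}-1\bigr)\Bigr)^{-N} = \Bigl(1+\frac{1}{p_4}\Bigl(\frac{1}{1+\frac{c_1r}{N-1}}-1\Bigr)\Bigr)^{-N}\\&= \Bigl(1-\frac{1}{p_4}\,\frac{c_1r}{N-1+c_1r}\Bigr)^{-N}\le c_3 = c_3(r,p_4)<\infty.\end{aligned}$$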
Thus, for some $c_3$ which depends only on $r$ and $p_4$,
$$E|K_t^m|^r\le c_3. \tag{2.10}$$
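The moment generating function computation behind (2.10) can be checked numerically. The sketch below is illustrative only: it compares a truncated series for $E\exp(c_2\tilde\alpha)$, where $\tilde\alpha$ is geometric on $\{1,2,\dots\}$ with success probability $p_4$, against the closed form, and evaluates the $N$th power in its two equivalent shapes. The function names and all numerical values (in particular the value used for $c_1$) are hypothetical.

```python
import math


def geometric_mgf_series(c2, p4, terms=2000):
    """Truncated series sum_j exp(c2 (j+1)) (1 - p4)^j p4."""
    return sum(math.exp(c2 * (j + 1)) * (1.0 - p4) ** j * p4 for j in range(terms))


def geometric_mgf_closed(c2, p4):
    """Closed form e^{c2} p4 / (1 - e^{c2} (1 - p4)); needs e^{c2} (1 - p4) < 1."""
    return math.exp(c2) * p4 / (1.0 - math.exp(c2) * (1.0 - p4))


def moment_bound(N, r, c1, p4):
    """N-th power of the closed form with c2 = log(1 + c1 r / (N - 1))."""
    c2 = math.log(1.0 + c1 * r / (N - 1))
    return geometric_mgf_closed(c2, p4) ** N


def moment_bound_simplified(N, r, c1, p4):
    """The final expression (1 - (1/p4) c1 r / (N - 1 + c1 r))^(-N)."""
    return (1.0 - (1.0 / p4) * c1 * r / (N - 1 + c1 * r)) ** (-N)
```

As $N\to\infty$ the simplified expression tends to $\exp(c_1r/p_4)$, which is why a constant $c_3$ depending only on $r$ and $p_4$ exists.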
Next we will estimate the total number of jumps $\beta^m$ on the tree of descendants of particle $m$. For each descendant, this will include not only the first jump, at the time of which a particle becomes a descendant of $m$, but also all subsequent jumps by the descendant. Recall that given the whole genealogical tree $\{K_t^m,\ 0\le t\le S\}$, the numbers of jumps of descendants of $m$ can be simultaneously bounded by $\tilde\alpha_2^k$’s, i.e., independent geometric random variables with mean $1/p_4$. We have, using (2.10), for some $c_4 = c_4(r)<\infty$,
$$\begin{aligned}E(\beta^m)^r &\le E\Bigl(|K_t^m|+\sum_{k=1}^{|K_t^m|}\tilde\alpha_2^k\Bigr)^r\\&\le c_4E|K_t^m|^r+c_4E\Bigl(\sum_{k=1}^{|K_t^m|}\tilde\alpha_2^k\Bigr)^r\\&\le c_4E|K_t^m|^r+c_4E\,E\Bigl(\Bigl(\sum_{k=1}^{a}\tilde\alpha_2^k\Bigr)^r\Bigm||K_t^m| = a\Bigr)\\&\le c_4E|K_t^m|^r+c_4E\,E\Bigl(a^r\sum_{k=1}^{a}\bigl(\tilde\alpha_2^k\bigr)^r\Bigm||K_t^m| = a\Bigr)\\&\le c_4E|K_t^m|^r+c_4E\,E\Bigl(a^{r+1}\bigl(\tilde\alpha_2^k\bigr)^r\Bigm||K_t^m| = a\Bigr)\\&\le c_4E|K_t^m|^r+c_4E\,E\bigl(a^{r+1}c_5\bigm||K_t^m| = a\bigr)\\&= c_4E|K_t^m|^r+c_4c_5E|K_t^m|^{r+1}\le c_6 = c_6(r)<\infty.\end{aligned} \tag{2.11}$$
Part 2. This part of the proof is devoted to some qualitative estimates of the transition probabilities of the killed Brownian motion.

Suppose $A\subset D$ is an open set. We recall our assumption of regularity of $\partial D$. It implies that the function $(x,t)\mapsto P_t^D(x,A)$ vanishes continuously as $x\to D^c$ and so it has a continuous extension to $\overline{D}\times[0,\infty)$. The notation of the following remarks partly anticipates the notation in Part 3 of the proof. Fix some $t>0$ and arbitrarily small $\delta_1>0$. The set $\overline{D}\times[0,t]$ is compact, so the continuous function $(x,t)\mapsto P_t^D(x,A)$ is uniformly continuous on this set. It follows that we can find an integer $n<\infty$ and $\delta_2>0$ such that $|P^D_{t-s}(x,A)-P^D_{t-s_j}(y,A)|<\delta_1$ when $s\in[s_j,s_{j+1}]$, $|s_j-s_{j+1}|\le t/n$, and $|x-y|\le\delta_2$.

Fix arbitrarily small $t_1,\delta_1>0$. Let $D_{\delta_2}$ denote the set of points whose distance to $D^c$ is greater than $\delta_2$. The transition density $p_t(x,y)$ of the free Brownian motion is bounded by $r_1<\infty$ for $x,y\in\mathbb{R}^d$ and $t\ge t_1$. The same bound holds for the transition densities $p_t^D(x,y)$ for the killed Brownian motion because $p_t^D(x,y)\le p_t(x,y)$. Choose $\delta_2>0$ so small that the volume of $D\setminus D_{\delta_2}$ is less than $\delta_1/r_1$. Then for every $s_j\ge t_1$ and $x\in D$,
$$P^D_{s_j}(x,D^c_{\delta_2}) = \int_{D\setminus D_{\delta_2}}p^D_{s_j}(x,y)\,dy\le(\delta_1/r_1)\cdot r_1 = \delta_1.$$
Part 3. We start with the definition of marks which we will use to label particles. We will prove, in a sense, that the theorem holds separately for each family of particles bearing the same marks. Typically, a particle $X^j$ will bear different marks at different times.

The family of marks $\Theta$ is defined as the smallest set which contains 0, and which has the property that if $\theta_1,\theta_2\in\Theta$ then the ordered pair $(\theta_1,\theta_2)$ also belongs to $\Theta$. Note that we do not assume that $\theta_1\ne\theta_2$. We will write $(\theta_1\to\theta_2)$ rather than $(\theta_1,\theta_2)$. Here are some examples of marks:
$$0,\qquad(0\to0),\qquad(0\to(0\to0)),\qquad((0\to0)\to0).$$
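Marks are finite nested pairs built from 0, so they map directly onto nested tuples. The following is a minimal sketch of that representation; the function names are hypothetical, and the `height` function uses the convention that the mark 0 has height 1 and that an arrow adds one to the larger of the heights of its two components.

```python
def arrow(theta1, theta2):
    """Build the mark (theta1 -> theta2) from two existing marks."""
    return (theta1, theta2)


def height(theta):
    """Height of a mark: 1 for the mark 0, else 1 + max of component heights."""
    if theta == 0:
        return 1
    left, right = theta
    return 1 + max(height(left), height(right))


def show(theta):
    """Render a mark in arrow notation, e.g. (0 -> (0 -> 0))."""
    if theta == 0:
        return "0"
    left, right = theta
    return "(" + show(left) + " -> " + show(right) + ")"


def jump_mark(mark_of_jumper, mark_of_target):
    """Mark carried by a particle after it jumps onto another particle."""
    return arrow(mark_of_jumper, mark_of_target)
```

For instance, a particle with mark 0 that jumps onto a particle with mark $(0\to0)$ receives the mark $(0\to(0\to0))$ of height 3.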
Our marks can be identified with vertices of a binary tree and are introduced only because our notation seems more intuitive in our context. Every mark will have an associated “height”. The height of 0 is defined to be 1. The height of $(\theta_1\to\theta_2)$ is one plus the maximum of heights of $\theta_1$ and $\theta_2$.

We assign marks as follows. If a particle $X^j$ has not jumped before time $t$ then its mark is equal to 0 on the interval $(0,t)$. The mark of every particle is going to change every time it jumps, and only at such times. If a particle $X^j$ jumps at time $t$ onto a particle $X^k$, the mark of $X^j$ has been $\theta_1$ just before $t$, and the mark of $X^k$ has been $\theta_2$ just before $t$ then the mark of $X^j$ will be $(\theta_1\to\theta_2)$ on the interval between (and including) $t$ and the first jump of $X^j$ after $t$. To see that the above definition uniquely assigns marks to all particles at all times, note that we assign mark 0 to all particles until the first jump by any particle. Recall that $\tau_1<\tau_2<\tau_3<\dots$ denote the jump times of all particles. If we know the marks on the interval $[\tau_j,\tau_{j+1})$ then it is easy to assign them in a unique way on the interval $[\tau_{j+1},\tau_{j+2})$. An easy inductive procedure allows us to assign marks to all particles at all times.

The mark of $X_t^k$ will be denoted $\theta(X_t^k)$. For any $\theta_1\in\Theta$, let
$$\mathcal{X}_t^{N,\theta_1}(dy) = \frac1N\sum_{k=1}^{N}\mathbf{1}_{\{\theta_1\}}(\theta(X_t^k))\,\delta_{X_t^k}(dy).$$
Note that $\mathcal{X}_t^{N,\theta_1}(dy)$ is a (sub-probability) empirical measure supported by the particles marked with $\theta_1$ at time $t$. The law of large numbers and the continuity of $x\mapsto P_t^D(x,dy)$ (see Part 2 of the proof) imply that for every fixed $t\le S$, the measures $\mathcal{X}_t^{N,0}(dy)$ converge in probability as $N\to\infty$ to the measure $\int_DP_t^D(x,dy)\mu_0(dx)$, in the sense of weak convergence of measures. In particular, $\mathcal{X}_S^{N,0}(dy)$ converge weakly to a multiple of $\mu_S(dy)$. The main goal of this part of the proof is to show that $\mathcal{X}_S^{N,\theta}(dy)$ converge weakly to a multiple of $\mu_S(dy)$, for any fixed mark $\theta$. This will be achieved by an inductive argument. We will elaborate the details of one inductive step, showing how the convergence of $\mathcal{X}_t^{N,0}(dy)$ to $\int_DP_t^D(x,dy)\mu_0(dx)$ for every $t\le S$ implies the convergence of $\mathcal{X}_t^{N,(0\to0)}(dy)$ to $c_1\int_DP_t^D(x,dy)\mu_0(dx)$ for every $t\le S$.

Consider some $t\in(0,S]$. We will show that for any $\delta>0$, $p<1$, open $A\subset D$ and $0<t_1<t_2<t$, we have
$$(1-\delta)\mu_t(A)\le\inf_{t_1\le s\le t_2}\frac{\int_DP^D_{t-s}(x,A)\,\mathcal{X}_s^{N,0}(dx)}{\int_DP_t^D(x,D)\,\mu_0(dx)}\le\sup_{t_1\le s\le t_2}\frac{\int_DP^D_{t-s}(x,A)\,\mathcal{X}_s^{N,0}(dx)}{\int_DP_t^D(x,D)\,\mu_0(dx)}\le(1+\delta)\mu_t(A), \tag{2.12}$$
with probability greater than $p$, when $N$ is sufficiently large. If we fix any integer $n \ge 1$, then for every $s_j = j t_2/n$, $j = 0, 1, \dots, n$, and every open set $A_1$ we have $X_{s_j}^{N,0}(A_1) \to \int_D P_{s_j}^D(x, A_1)\mu_0(dx)$ in probability as $N \to \infty$, by the law of large numbers and the continuity of $x \to P_{s_j}^D(x, A_1)$. This and the continuity of $x \to P_{t-s_j}^D(x, A)$ imply that

$$\frac{\int_D P_{t-s_j}^D(x, A)\, X_{s_j}^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \to \mu_t(A),$$
in probability. Since there is only a finite number of $s_j$'s, we immediately obtain a weak version of (2.12), namely,

$$(1 - \delta)\mu_t(A) \le \inf_{0 \le j \le n} \frac{\int_D P_{t-s_j}^D(x, A)\, X_{s_j}^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \le \sup_{0 \le j \le n} \frac{\int_D P_{t-s_j}^D(x, A)\, X_{s_j}^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \le (1 + \delta)\mu_t(A), \tag{2.13}$$

with probability greater than $p$, when $N$ is sufficiently large.

Fix arbitrarily small $\delta_1, p_1 > 0$ and let $\delta_2 > 0$ be so small and $n$ so large that the following conditions are satisfied, according to Part 2 of the proof. First, $|P_{t-s_j}^D(x, A) - P_{t-s}^D(y, A)| < \delta_1$ when $s \in [s_j, s_{j+1}]$, $0 \le j \le n - 1$, and $|x - y| \le \delta_2$. Second, if $D_{\delta_2}$ denotes the set of points whose distance to $D^c$ is greater than $\delta_2$, we want to have $P_{s_j}^D(x, D_{\delta_2}^c) < \delta_1$, for every $x$ and $j$ with $s_j \ge t_1$. Finally, increase $n$ if necessary so that the probability that a Brownian path has an oscillation larger than $\delta_2$ within a subinterval of $[0, S]$ of length $t_2/n$ or less, is less than $p_1$.

With this choice of various constants, we see that for large $N$, with probability greater than $p$ the following will be true for all $j$ with $t_1 \le s_j \le S$. First, the proportion of $X^k$'s which will be within distance $\delta_2$ of the boundary at time $s_j$ will be less than $2\delta_1$ and the proportion of $X^k$'s which will jump during the interval $[s_j, s_{j+1}]$ will be less than $3\delta_1$. Because of this and the other parameter choices, we will have $|P_{t-s_j}^D(X_{s_j}^k, A) - P_{t-s}^D(X_s^k, A)| < \delta_1$ for all $s \in [s_j, s_{j+1}]$ and all labels $k$ in a subset of $\{1, 2, \dots, N\}$ whose cardinality is bounded below by $(1 - 2p_1 - 3\delta_1)N$. This implies that simultaneously for all $j$ and $s \in [s_j, s_{j+1}]$, for large $N$,

$$\left| \int_D P_{t-s_j}^D(x, A)\, X_{s_j}^{N,0}(dx) - \int_D P_{t-s}^D(x, A)\, X_s^{N,0}(dx) \right| \le \delta_1 + 2p_1 + 3\delta_1,$$
with probability greater than $p$. This, the fact that $p$ can be arbitrarily close to 1 and $\delta_1$ and $p_1$ arbitrarily small, and (2.13) prove (2.12).

We will now prove that a suitable version of (2.12) holds when we replace $X_s^{N,0}$ with $X_s^{N,(0\to 0)}$. Consider an arbitrary $t \le S$, and $0 < t_1 < t_2 < t$. Suppose that a particle $X^k$ with mark 0 hits the boundary of $D$ at a time $s \in (t_1, t_2)$. Then it will jump onto a randomly chosen particle. If $X^k$ jumps onto a particle marked 0, its mark will change to $(0 \to 0)$. Given this event, conditional on the value of $X_s^{N,0}$, the distribution of $X^k$ at time $t$ will be $\int_D P_{t-s}^D(x, \cdot\,)\, X_s^{N,0}(dx)$, by the strong Markov property. The same holds true for all other particles with mark 0 which hit $D^c$ between times $t_1$ and $t_2$. Since these particles evolve independently after the jump, we see from (2.12) that
the empirical distribution at time $t$ of all particles marked $(0 \to 0)$ which received this mark at a time between $t_1$ and $t_2$ converges in probability to a constant multiple of $\mu_t$, as $N \to \infty$. If we fix $t > 0$, it is easy to see that for sufficiently small $t_1 > 0$ and large $t_2 < t$, the probability that a particle with mark 0 will hit the boundary of $D$ in one of the intervals $[0, t_1]$ or $[t_2, t]$ will be arbitrarily small. Hence, $X_t^{N,(0\to 0)}$ converges to a constant multiple of $\mu_t$, in probability. Given the last fact, the same argument which proves (2.13) yields for some $\eta(\theta) \in (0, 1]$,

$$\eta((0 \to 0))(1 - \delta)\mu_t(A) \le \inf_{0 \le j \le n} \int_D P_{t-s_j}^D(x, A)\, X_{s_j}^{N,(0\to 0)}(dx) \le \sup_{0 \le j \le n} \int_D P_{t-s_j}^D(x, A)\, X_{s_j}^{N,(0\to 0)}(dx) \le \eta((0 \to 0))(1 + \delta)\mu_t(A), \tag{2.14}$$

with large probability, when $N$ is large. The argument following (2.13) is not specific to the case when the particles have the mark 0 and so it can be applied to the present case of particles marked $(0 \to 0)$. Hence, we obtain the following formula, which differs from (2.12) only in that the normalizing constant is $\eta((0 \to 0))$ and not $\int_D P_t^D(x, D)\mu_0(dx)$,

$$\eta((0 \to 0))(1 - \delta)\mu_t(A) \le \inf_{t_1 \le s \le t_2} \int_D P_{t-s}^D(x, A)\, X_s^{N,(0\to 0)}(dx) \le \sup_{t_1 \le s \le t_2} \int_D P_{t-s}^D(x, A)\, X_s^{N,(0\to 0)}(dx) \le \eta((0 \to 0))(1 + \delta)\mu_t(A). \tag{2.15}$$

The last formula holds with probability greater than $p$ if $N$ is sufficiently large, for any fixed $t \le S$, any $0 < t_1 < t_2 < t$, and any $p < 1$.

Proceeding by induction, one can show that (2.15) applies not only to the mark $\theta = (0 \to 0)$ but also to $(0 \to (0 \to 0))$, $((0 \to 0) \to 0)$, $((0 \to 0) \to (0 \to 0))$, and to every other mark $\theta$. Imbedded in an induction step for a mark $\theta$ is the proof that the measures $X_S^{N,\theta}(dy)$ converge to a constant (deterministic) multiple of $\mu_S(dy)$. It follows that for every finite deterministic subset $\Gamma_1$ of $\Gamma$, the measures $\sum_{\theta \in \Gamma_1} X_S^{N,\theta}(dy)$ converge to a multiple of $\mu_S(dy)$.

It will now suffice to show that for any $p_2, p_3 > 0$, there exists a finite set $\Gamma_1$ such that $\sum_{\theta \notin \Gamma_1} X_S^{N,\theta}(D) < p_2$ with probability greater than $1 - p_3$. In other words, we want to show that for some finite $\Gamma_1$, the number of particles with a mark from $\Gamma_1^c$ at time $S$ is less than $p_2 N$ with probability greater than $1 - p_3$. In order to prove this, we will use the result proved in Part 1 of the proof.

Recall the notion of the “height” of a mark from the beginning of the second part of the proof. Suppose that a particle $X^k$ has a mark with height $j$ at time $S$. Let $t_j$ be the infimum of $t$ with the property that $X_t^k$ has a mark with height $j$. Then $t_j$ must be the time of a jump of $X^k$. Let $X^{n_j}$ be the particle on which $X^k$ jumps at time $t_j$. By definition, the height of the mark of $X^k$ or the height of the mark of $X^{n_j}$ must be equal to $j - 1$ just before time $t_j$. We define $k_j$ to be $k$ or $n_j$, so that the height of the mark of $X^{k_j}$ is equal to $j - 1$ prior to $t_j$. We proceed by induction. Suppose we have identified a particle $X^{k_m}$ which has a mark with height $m - 1$ prior to a time $t_m$, where $m \le j$.
Then we let $t_{m-1}$ be the infimum of $t < t_m$ with the property that the height of the mark of $X_t^{k_m}$ is $m - 1$. We see that $X^{k_m}$ must jump at time $t_{m-1}$ on a particle $X^{n_{m-1}}$. We choose $k_{m-1}$ to be either $k_m$ or $n_{m-1}$, so that the height of the mark of $X^{k_{m-1}}$ is $m - 2$ just before the time $t_{m-1}$. Proceeding in this way, we will end up with a particle $X^{k_2}$ which has a mark with height 1. The mark of this particle is 0, as it is the only mark with height 1. This implies that $t_1 = 0$.

We claim that for all $m \le j$ and $t \in [t_{m-1}, t_m)$, we have $(t, k_m) \in T^{k_2}$. In other words, every particle $X^{k_m}$ is a descendant of $X^{k_2}$ at times $t \ge t_{m-1}$. To see this, note that the claim is obviously true for $m = 2$. If $k_3 = k_2$ then the claim is true for $m = 3$, because $X^{k_2}$ always remains its own descendant. If $k_3 \ne k_2$, it is clear that the particle $X^{k_3}$ has jumped at time $t_2$ on $X^{k_2}$, a descendant of $k_2$, and so became a descendant of $k_2$ at this time. Proceeding by induction, we can show that all particles in our chain are descendants of $k_2$ on the intervals specified above.

Note that at every time $t_m$, either a descendant of $k_2$ jumps or a descendant of $k_2$ is born. Hence, if $X^k$ has a mark with height $j$ at time $S$, it must belong to the family of descendants of a particle for which the sum of descendants and their jumps is not less than $j$.

Recall the event $F$ from Part 1 of the proof and choose the parameters in Part 1 so that the probability of $F^c$ is less than $p_3/2$ and the cardinality of $H^c$ is less than $Np_2/2$. Conditional on $F$, we have the following estimate. If the sum of the number of descendants of a particle $k$ and the number of all their jumps is equal to $j$ then a crude estimate says that at most $j$ descendants of $k$ end up at time $S$ with marks of height $j$ or lower; by the argument in the previous paragraph, the marks cannot be higher than $j$. Hence, the expected number of particles with marks higher than $n$ at time $S$ among descendants of a particle $k \in H$ can be bounded using (2.11) by

$$\sum_{j \ge n} j\, P(\beta^k \ge j) \le \sum_{j \ge n} j\, \frac{E(\beta^k)^3}{j^3} \le c_1 \sum_{j \ge n} j^{-2} \le c_2/n.$$

The expected number of all particles in the family $H$ with marks higher than $n$ is bounded by $N c_2/n$. Choose $\Gamma_1$ to be the set of all marks of height less than $n$, where $n$ is so large that $N c_2/n < (p_3/2)(p_2 N/2)$. Then, by Markov's inequality, conditional on $F$, the total number of particles in $H$ with marks higher than $n$ is bounded by $Np_2/2$ with probability $1 - p_3/2$ or higher. We add to that estimate all particles from $H^c$; their number is bounded by $Np_2/2$, so the total number of particles with marks higher than $n$ is bounded by $Np_2$, with probability greater than or equal to $1 - p_3/2$. This estimate was obtained under the assumption that $F$ holds. Since the probability of $F^c$ is less than $p_3/2$, we are done.

Example 2.1. We will show that for some process $X_t$, some $t$, $A$, $\mu_0$ and $N$ we have $E X_t^N(A) \ne \mu_t(A)$. Our process has a finite state space; we presume that a similar example can be constructed for Brownian motion.

We will consider a continuous time Markov process on the state space $\{0, 1, 2\}$. The set $\{1, 2\}$ will play the role of $D$. The possible jumps of the process are $0 \to 1$, $1 \to 2$, $2 \to 1$ and $1 \to 0$. The jump rates are equal to 1 for each one of these possible transitions. The measure $\mu_0$ will be uniform on $D$, i.e., $\mu_0(1) = \mu_0(2) = 1/2$.

First we will argue that $\mu_t = \mu_0$ for all $t > 0$. To see this, note that if we condition the process not to jump to 0, it will jump from the state 1 to the state 2 and from 2 to 1 at the rates 1, i.e., at the original rates. This is because all four types of jumps $0 \to 1$, $1 \to 2$, $2 \to 1$ and $1 \to 0$ may be thought of as coming from four independent Poisson
processes. Conditioning on the lack of jumps of one of these processes does not influence the other three jump processes. Since the conditioned process makes jumps from 1 to 2 and vice versa with equal rates, the symmetry of $\mu_0$ on the set $\{1, 2\}$ is preserved forever, i.e., $\mu_t = \mu_0$ for all $t > 0$.

Now let $N = 2$ and consider the distribution of the process $X_t$. Let $A_t$ denote the number of particles among $X^1$ and $X^2$ at the state 1 at time $t$. The process $A_t$ is a continuous time Markov process with possible values 0, 1 and 2. Its possible transitions are $0 \to 1$, $1 \to 2$, $2 \to 1$ and $1 \to 0$, just like for the original process $X_t$. If $A_t = 0$, i.e., if both particles are at the state 2, the waiting time for a jump of $A_t$ has expectation 1/2 because each of the particles jumps independently of the other one with the jump rate 1; hence the rate for the jumps of $A_t$ from 0 to 1 is 2. When both particles are in the state 1, and one of them jumps to 0, it immediately returns to 1 (the location of the other particle) so the jumps of $X^k$'s from 1 to 0 have no effect on $A_t$, if $A_t = 2$. It follows that the rate for the jumps of $A_t$ from 2 to 1 is 2, i.e., it is the same as for the jumps of $A_t$ from 0 to 1. Finally, let us analyze the case $A_t = 1$. When only one of the particles is at 2, it jumps to 1 at the rate 1, so the rate of the transitions $1 \to 2$ for the $A_t$ process is 1. However, its rate of transitions $1 \to 0$ is equal to 2 because any jump of a particle from the state 1 will result in its landing at 2, either directly or through the instantaneous visit to 0.

Given these transition rates, the stationary distribution $\pi$ of $A_t$ satisfies the balance equations $2\pi_0 = 2\pi_1$ and $\pi_1 = 2\pi_2$, so it assigns probabilities 2/5, 2/5 and 1/5 to the states 0, 1 and 2. This implies that $E X_t^2(\{1\}) \approx (1 \cdot 2/5 + 2 \cdot 1/5)/2 = 2/5$ for large $t$ (no matter what $X_0^2$ is) and so $E X_t^2(\{1\}) \ne \mu_t(\{1\})$ for some $t$ when we choose $\mu_0(\{1\})$ to be equal to 1/2.

Proof of Theorem 1.4. For a point $x \in D$ let $\rho_x$ be the supremum of $\mathrm{dist}(x, \partial B(y, r))$ over all open balls $B(y, r)$ such that $x \in B(y, r) \subset D$.
For each $x \in D$ we will choose a ball $B_x$ with radius $r$, such that $x \in B_x \subset D$ and $\mathrm{dist}(x, \partial B_x) > \rho_x/2$. The center of $B_x$ will be denoted $v_x$. We would like the mapping $x \to v_x$ to be measurable. One way to achieve this goal is to construct a countable family of balls with radius $r$ and make the mapping $x \to v_x$ constant on every element of a countable family of squares, closed on two sides, disjoint, and summing up to $D$. Such a construction is known as “Whitney squares”; it is quite elementary and so it is left to the reader.

We will construct $X_t^k$'s in a special way. Two 1-dimensional processes $U_t^k$ and $R_t^k$ will be associated with each $X_t^k$. The processes $U_t^k$ and $R_t^k$ will take their values in $[0, r]$. The processes $R_t^k$ will be independent $d$-dimensional Bessel processes reflected at $r$. In other words, every process $R_t^k$ will have the same distribution as the radial part of the $d$-dimensional Brownian motion reflected inside the ball $B(0, r)$. We will define $U_t^k$ so that $U_t^k \le R_t^k$ for all $k$ and $t$. The processes $R_t^k$ will give us a bound on the distance of $X_t^k$ from $D^c$; more precisely, we will have, according to our construction, $\mathrm{dist}(X_t^k, D^c) \ge r - U_t^k \ge r - R_t^k$. No matter what distribution for $X_0$ is desirable, it is easy to see that we can choose the starting values for $R_t^k$'s so that $R_0^k = \mathrm{dist}(X_0^k, v_{X_0^k})$, a.s.

In our construction, we assume that $R_t^k$'s are given and we proceed to describe how to define $U_t^k$'s and $X_t^k$'s given $R_t^k$'s. We will first fix a $k$. Let $T_1^k$ be the first time when the process $R_t^k$ hits $r$. On the interval $[0, T_1^k)$, we can define $X_t^k$ as Brownian motion in $\mathbf{R}^d$ such that $R_t^k = \mathrm{dist}(X_t^k, v_{X_0^k})$. This requires only generating an angular part for $X_t^k$, relative to the initial position $X_0^k$. A classical “skew-product” decomposition (see Itô and McKean (1974)) achieves the goal by generating a Brownian motion on a sphere (independent of $R_t^k$) and then time-changing it according to a clock defined by $R_t^k$. Note
that according to the definition of $v_x$, this constructed process $X_t^k$ will remain inside $D$ for $t \in [0, T_1^k)$. We let $U_t^k = R_t^k$ for $t \in [0, T_1^k)$. At time $T_1^k$, the process $U_t^k$ jumps to the value $\mathrm{dist}(X_{T_1^k-}^k, v_{X_{T_1^k-}^k})$, i.e., we let $U_{T_1^k}^k = \mathrm{dist}(X_{T_1^k-}^k, v_{X_{T_1^k-}^k})$. We let the process $U_t^k$ evolve after time $T_1^k$ as a $d$-dimensional Bessel process independent of $R_t^k$, until time $T_2^k = \inf\{t \ge T_1^k : U_t^k = R_t^k\}$. Let $T_3^k = \inf\{t \ge T_2^k : R_t^k = r\}$. We couple the processes $U_t^k$ and $R_t^k$ on the interval $[T_2^k, T_3^k)$, i.e., we let $U_t^k = R_t^k$ for $t \in [T_2^k, T_3^k)$. For $t \in [T_1^k, T_3^k)$, we construct $X_t^k$ so that $\mathrm{dist}(X_t^k, v_{X_{T_1^k}^k}) = U_t^k$. The spherical part is constructed in an “independent” way, in the sense of the skew-product decomposition.

We proceed by induction. Recall that $R_t^k$ is given, and suppose that processes $X_t^k$ and $U_t^k$ are defined on the interval $[0, T_{2j-1}^k)$. Moreover, suppose that $R_t^k$ approaches $r$ as $t \uparrow T_{2j-1}^k$. We define $U_{T_{2j-1}^k}^k$ to be $\mathrm{dist}(X_{T_{2j-1}^k-}^k, v_{X_{T_{2j-1}^k-}^k})$ and we let the process $U_t^k$ evolve after time $T_{2j-1}^k$ as a $d$-dimensional Bessel process independent of $R_t^k$, until time $T_{2j}^k = \inf\{t \ge T_{2j-1}^k : U_t^k = R_t^k\}$. Note that $U_t^k < R_t^k \le r$ for $t \in [T_{2j-1}^k, T_{2j}^k)$. Let $T_{2j+1}^k = \inf\{t \ge T_{2j}^k : R_t^k = r\}$. We couple the processes $U_t^k$ and $R_t^k$ (i.e., we make them equal) on the interval $[T_{2j}^k, T_{2j+1}^k)$. The Brownian motion $X_t^k$ is defined on $[T_{2j-1}^k, T_{2j+1}^k)$ so that $\mathrm{dist}(X_t^k, v_{X_{T_{2j-1}^k}^k}) = U_t^k$. Its spherical part is generated in an independent way from other elements of the construction and then time-changed according to the skew-product recipe. We see that $U_t^k \le R_t^k \le r$ and $U_t^k < r$ for every $t \in [T_{2j-1}^k, T_{2j+1}^k)$, and $\lim_{t \to T_{2j+1}^k} R_t^k = r$. This implies that $X_t^k$ stays inside $D$ on every interval $[T_{2j-1}^k, T_{2j+1}^k)$.

Let $\tau_1^k = \lim_{j \to \infty} T_j^k$ and note that typically, $\tau_1^k < \infty$. The above procedure allows us to define the processes $X_t^k$ and $U_t^k$ on the interval $[0, \tau_1^k)$. We repeat the construction for all particles $X_t^k$ in such a way that the processes in the family $\{(X_t^k, U_t^k)\}_{1 \le k \le N}$ are jointly independent.

Let $\tau_1 = \min_{1 \le j \le N} \tau_1^j$ and suppose that the minimum is attained at $k$, i.e., $\tau_1^k = \tau_1 < \infty$. Since infinitely many independent Bessel processes $\{U_t^k, t \in [T_{2j-1}^k, T_{2j+1}^k)\}$ traveled from $\mathrm{dist}(X_{T_{2j-1}^k}^k, v_{X_{T_{2j-1}^k}^k})$ to $r$, and their travel times sum up to a finite number, bounded by $\tau_1$, it follows that $\lim_{j \to \infty} \mathrm{dist}(X_{T_{2j-1}^k}^k, v_{X_{T_{2j-1}^k}^k}) = r$. We will show that $X_t^k$ must approach $\partial D$ at time $\tau_1^k = \tau_1$. Recall the function $\rho_x$ from the beginning of the proof. If $x$ belongs to an open ball $B(y, r) \subset D$ then the same holds for all points in a small neighborhood of $x$. The definition of $\rho_x$ now easily implies that the function $x \to \rho_x$ is Lipschitz inside $D$. Since, by assumption, $\rho_x$ does not vanish inside $D$, every sequence $x_n$ satisfying $\mathrm{dist}(x_n, v_{x_n}) \to r$ also satisfies $\rho_{x_n} \to 0$ and so must approach $\partial D$ as $n \to \infty$. This finishes the proof that $\lim_{t \to \tau_1^k-} \mathrm{dist}(X_t^k, D^c) = 0$. Since two independent Brownian particles cannot hit $\partial D$ at the same time, we see that there is only one process $X_t^k$ with $\tau_1^k = \tau_1 < \infty$.

Still assuming that $\tau_1^k = \tau_1 < \infty$, we uniformly and independently of everything else choose $j \ne k$ and let $X_{\tau_1}^k = X_{\tau_1}^j$ and $U_{\tau_1}^k = U_{\tau_1}^j$. We then proceed with the
construction of $X_t^k$ and $U_t^k$ on the interval $[\tau_1^k, \tau_2^k)$, such that $\lim_{t \to \tau_2^k-} \mathrm{dist}(X_t^k, D^c) = 0$. The construction is completely analogous to that outlined above. Note that we necessarily have $U_{\tau_1}^k < R_{\tau_1}^k$ so we have to start our construction as in the inductive step of the original algorithm. Recall that the construction generates a process $U_t^k$ satisfying $U_t^k \le R_t^k$ for $t \in [\tau_1^k, \tau_2^k)$.

We let $\tau_2 = \tau_2^k \wedge \min_{1 \le j \le N, j \ne k} \tau_1^j$. A particle $X^j$ will have to approach $\partial D$ at time $\tau_2$. We will make this particle jump and then proceed by induction. Theorem 1.1 shows that there will be no accumulation of jumps of $X_t^k$'s at any finite time.

Recall that the inner ball radius $r > 0$ is a constant depending only on the domain $D$. It is well known that the reflected process $R_t^k$ spends zero time on the boundary (i.e., at the point $r$) so if it starts from $R_0^k = r$ then its distribution at time $t = 1$ is supported on $(0, r)$. It follows that for any $p_1 < 1$ there exists $r_1 \in (0, r)$ such that we have, with $r_2 = r$,

$$P\bigl(R_1^k \in [0, r_1] \mid R_0^k = r_2\bigr) > p_1.$$

This estimate can be extended to all $r_2 \in [0, r]$, by an easy coupling argument. It follows from this and the independence of processes $R_t^k$ that there exists $p_2 > 0$ such that with probability greater than $p_2$, more than $Np_1/2$ processes $R_t^k$ happen to be in $[0, r_1]$ at time 1, no matter what their starting positions are at time 0, for every $N > 0$.

Let $D_a$ be the set of all points in $D$ whose distance from $D^c$ is greater than or equal to $a$. The processes $X_t^k$ have been constructed in such a way that a.s., for every $k$ and $t$ we have $\mathrm{dist}(X_t^k, D^c) \ge r - R_t^k$. This and the claim in the previous paragraph show that for any starting position of $X_t^k$'s, with probability greater than $p_2$, more than $Np_1/2$ processes $X_t^k$ happen to be in $D_{r_3}$ at time 1, where $r_3 = r - r_1$. We will now proceed as at the beginning of Part 1 in the proof of Theorem 1.3.
Fix an arbitrary $p_1 < 1$, a corresponding $r_1 = r_1(p_1) \in (0, r)$, $r_3 = r - r_1$, and arbitrary $0 < r_5 < r_4 < r_3$. Let $H$ be the family of all processes $X^k$ such that $X_1^k \in D_{r_3}$. Assume that $H$ has at least $Np_1/2$ elements. There is $p_3 > 0$ (depending on $N$, $p_1$ and $r_j$'s) such that with probability greater than $p_3$, all processes in $H$ will stay in $D_{r_4}$ for all $t \in [1, 2]$. For some $p_4 > 0$, with probability greater than $p_4$, all processes in $H^c$ will have a jump in the interval $[1, 2]$, will land on a particle from the $H$ family, and subsequently stay in $D_{r_5}$ until time $t = 2$. Altogether, there is a strictly positive probability $p_2 p_3 p_4 \equiv p_5$ that all particles will be in $D_{r_5}$ at time $t = 2$, given any initial distribution at time $t = 0$.

Now let us rephrase the last statement in terms of the vector process $X_t$ whose state space is $D^N$. We have just shown that with probability higher than $p_5$ the process $X_t$ can reach a compact set $D_{r_5}^N$ within 2 units of time. This and the strong Markov property applied at times 2, 4, 6, . . . show that the hitting time of $D_{r_5}^N$ is stochastically bounded by an exponential random variable with the expectation independent of the starting point of $X_t$. Since the transition densities $p_t^X(x, y)$ for $X_t$ are bounded below by the densities for the Brownian motion killed at the exit time from $D^N$, we see that $p_t^X(x, y) > c_1 > 0$ for $x, y \in D_{r_5}^N$. Fix arbitrarily small $s > 0$ and consider the “skeleton” $\{X_{ns}\}_{n \ge 0}$. It is standard to prove that the properties listed in this paragraph imply that the skeleton has a stationary probability distribution and that it converges to that distribution exponentially fast. This can be done, for example, using Theorem 2.1 in Down, Meyn and Tweedie (1995). Extending the convergence claim to the continuous process $t \to X_t$ from its skeleton can be done in a very general context, as was kindly shown to us by Richard Tweedie. In our case, a simple argument based on “continuity” can be supplied. More precisely, one can use a lower estimate for $p_t^X(x, y)$ in terms of the transition densities
for Brownian motion killed upon leaving $D^N$, which are continuous. We leave the details to the reader. This completes the proof of part (i) of the theorem.

Recall that we have proved that for any $p_1 < 1$ there exists $r_1 < r$ such that for any starting position of $X_0^k$, the particle $X^k$ is in $D_{r-r_1}$ at time $t = 1$, with probability greater than $p_1$. It follows that for any $N$, the mean measure $E X_M^N$ of the compact set $D_{r-r_1}$ is not less than $p_1$. Hence, the mean measures $E X_M^N$ are tight in $D$. Lemma 3.2.7 of Dawson (1992, p. 32) implies that the sequence of random measures $X_M^N$ is tight and so it contains a convergent subsequence.

Choose a subsequence $N_j$ such that the sequence $X_M^{N_j}$ is convergent to a probability measure $\Lambda(\mu)$, carried by the family of probability measures on $D$. It will be enough to prove part (ii) of the theorem for this sequence. Consider the sequence of processes $X_t = X_t^{N_j}$, each with the stationary distribution $X_M^{N_j}$ as its starting distribution. Fix an open set $A \subset D$. By an argument totally analogous to the proof of Theorem 1.3, the following holds in the sense of convergence in probability,

$$\lim_{j \to \infty} X_t^{N_j}(A) = \int \frac{\int_D P_t^D(x, A)\,\mu(dx)}{\int_D P_t^D(x, D)\,\mu(dx)}\, \Lambda(d\mu). \tag{2.16}$$

We will now apply a few results from Bass and Burdzy (1992). Check Sect. 3 of that paper for the definition of a John domain. It is elementary to see that our domain $D$ is a John domain, because it satisfies the interior ball condition. By Proposition 3.2 in Bass and Burdzy (1992), every John domain is a twisted Hölder domain of order 1. Hence, the parabolic boundary Harnack principle (Theorem 1.2 of Bass and Burdzy (1992)) holds for $D$. That theorem says that if $p_t^D(x, y)$ denotes the transition densities for Brownian motion killed upon exiting $D$, then for each $u > 0$ there exists $c = c(D, u) \in (0, 1)$ such that

$$\frac{p_t^D(x, y)}{p_t^D(x, z)} \ge c\, \frac{p_s^D(v, y)}{p_s^D(v, z)} \tag{2.17}$$

for all $s, t \ge u$ and all $v, x, y, z \in D$. We will need a stronger version of this inequality.
The proof will be based on a lemma of Burdzy, Toby and Williams (1989). The following version of that lemma is taken from Burdzy and Khoshnevisan (1998). Suppose that functions $h(x, y)$, $g(x, y)$ and $h_1(x, y)$ are defined on product spaces $W_1 \times W_2$, $W_2 \times W_3$ and $W_1 \times W_3$, resp. Assume that for some constants $c_1, c_2 \in (0, 1)$ the functions satisfy for all $x, y, x_1, x_2, y_1, y_2, z_1, z_2$,

$$h_1(x, y) = \int_{W_2} h(x, z)\, g(z, y)\, dz,$$

$$\frac{h(x_1, z_1)}{h(x_1, z_2)} \ge (1 - c_1)\, \frac{h(x_2, z_1)}{h(x_2, z_2)},$$

and

$$\frac{g(z_1, y_1)}{g(z_1, y_2)} \ge c_2\, \frac{g(z_2, y_1)}{g(z_2, y_2)}.$$

Then

$$\frac{h_1(x_1, y_1)}{h_1(x_1, y_2)} \ge (1 - c_1 + c_2^2 c_1)\, \frac{h_1(x_2, y_1)}{h_1(x_2, y_2)}. \tag{2.18}$$
We will apply the lemma with $p_2^D(x, y)$ in place of $h_1(x, y)$, and $p_1^D(x, y)$ in place of $h(x, z)$ and $g(z, y)$. We see from (2.18) that the constant $c(D, 2)$ in (2.17) may be taken to
be $c(D, 1) + c(D, 1)^2 (1 - c(D, 1))$. By induction, we see that the constants $c(D, 2^n)$ may be chosen in such a way that $c(D, 2^n) = c(D, 2^{n-1}) + c(D, 2^{n-1})^2 (1 - c(D, 2^{n-1}))$. Then $c(D, 2^n) \to 1$ as $n \to \infty$. Obviously, we may assume that the function $u \to c(D, u)$ is non-decreasing. Hence, (2.17) holds for some $c(D, u)$ satisfying $c(D, u) \to 1$ as $u \to \infty$. The inequality (2.17) easily implies that

$$c(D, t)\, \frac{P_t^D(y, A)}{P_t^D(y, D)} \le \frac{P_t^D(x, A)}{P_t^D(x, D)} \le c(D, t)^{-1}\, \frac{P_t^D(y, A)}{P_t^D(y, D)},$$

for all $x, y \in D$. This in turn shows that

$$c(D, t)\, \frac{\int_D P_t^D(x, A)\,\mu_2(dx)}{\int_D P_t^D(x, D)\,\mu_2(dx)} \le \frac{\int_D P_t^D(x, A)\,\mu_1(dx)}{\int_D P_t^D(x, D)\,\mu_1(dx)} \le c(D, t)^{-1}\, \frac{\int_D P_t^D(x, A)\,\mu_2(dx)}{\int_D P_t^D(x, D)\,\mu_2(dx)},$$

for any probability measures $\mu_1$ and $\mu_2$ on $D$. Since $c(D, t) \to 1$ as $t \to \infty$, we see that for some fixed probability measure $\mu_1$ on $D$ and any $\Lambda(\mu)$,

$$\lim_{t \to \infty} \left( \frac{\int_D P_t^D(x, A)\,\mu_1(dx)}{\int_D P_t^D(x, D)\,\mu_1(dx)} \right)^{-1} \int \frac{\int_D P_t^D(x, A)\,\mu(dx)}{\int_D P_t^D(x, D)\,\mu(dx)}\, \Lambda(d\mu) = 1. \tag{2.19}$$

The normalized distribution of the killed Brownian motion in $D$ converges to the normalized first eigenfunction $\varphi_1$ of the Dirichlet Laplacian in $D$, i.e.,

$$\lim_{t \to \infty} \frac{\int_D P_t^D(x, A)\,\mu_1(dx)}{\int_D P_t^D(x, D)\,\mu_1(dx)} = \frac{\int_A \varphi_1(y)\,dy}{\int_D \varphi_1(y)\,dy},$$

by the eigenfunction expansion for $p_t^D(x, y)$. In view of (2.19),

$$\int \frac{\int_D P_t^D(x, A)\,\mu(dx)}{\int_D P_t^D(x, D)\,\mu(dx)}\, \Lambda(d\mu) \to \frac{\int_A \varphi_1(y)\,dy}{\int_D \varphi_1(y)\,dy}, \tag{2.20}$$

as $t \to \infty$. By the stationarity of $X_M^{N_j}$, the right hand side of (2.16) does not depend on $t$ and so (2.20) is in fact an equality. This observation combined with (2.16) completes the proof.
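The recursion for the constants $c(D, 2^n)$ used in the proof above can be checked numerically. The following Python sketch iterates the map $c \mapsto c + c^2(1 - c)$ from an arbitrary starting value; the starting value 0.1 and the function names are assumptions made for illustration only, since the actual constant $c(D, 1)$ depends on the domain.

```python
def improve_constant(c):
    """One doubling step: c(D, 2u) obtained from c(D, u) via (2.18) with c1 = 1 - c, c2 = c."""
    return c + c * c * (1.0 - c)


def doubling_constants(c_start, n_doublings):
    """Return [c(D,1), c(D,2), c(D,4), ...] for a hypothetical starting constant."""
    constants = [c_start]
    for _ in range(n_doublings):
        constants.append(improve_constant(constants[-1]))
    return constants


if __name__ == "__main__":
    for n, c in enumerate(doubling_constants(0.1, 25)):
        print(n, c)
```

The sequence is non-decreasing and bounded by 1 because $1 - c$ is multiplied by $1 - c^2 \in (0, 1)$ at each step, which is the elementary reason behind the convergence to 1.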
3. Appendix. Related Probabilistic and Physical Models

We will discuss a few well known models and problems in probability and mathematical physics to which our paper is related. Before we do so, let us note that the original impulse for the article came from heuristic and numerical results presented in Burdzy, Hołyst, Ingerman and March (1996). This largely determined the direction of our research. The notes below may include some ideas for future research on our model, perhaps different in their flavor from the present article.

(i) Superprocesses with interactions. Superprocesses, also known as measure-valued diffusions or Dawson–Watanabe diffusions, are processes whose states are measures.
Super-Brownian motion and the Fleming–Viot process with Brownian spatial motion are two of the most studied models in this class. The model introduced in this paper resembles most the Fleming–Viot process, which can be described in a heuristic way as follows. Consider $N$ particles performing independent Brownian motions in $\mathbf{R}^d$. Every $\varepsilon$ units of time, two particles are chosen uniformly and the first particle jumps to the location of the second one. Between the jumps, the particles are independent Brownian motions. Assume for simplicity that all particles start from a fixed point. If $N \to \infty$ and $\varepsilon \to 0$, at a rate related to $N$, then the empirical distributions of the particles converge for every time $t \ge 0$ to a random measure. It is known that in dimensions $d \ge 2$, the measures are carried by sets of fractal nature. The original Fleming–Viot model sketched above assumes independence of the branching mechanism from the spatial distribution of the particles. In recent years, a number of papers have been devoted to processes which are similar but whose branching mechanism does depend on the spatial distribution of the particles. Roughly speaking, two closely related models have been considered: in one of them a “catalyst” is present, facilitating the branching of particles; the other model assumes that branching can be influenced by the local density of particles (see, e.g., Adler and Ivanitskaya (1996), Dawson and Fleischmann (1997), Dawson and Greven (1996), Dawson and Perkins (1998) and Klenke (1999)). Our model goes in a slightly different direction because we consider an “obstacle” (the set $D^c$) where the particles are killed although the offspring are generated in a uniform way across the whole population as in the original model. Our process might possibly represent a biological population, with a region $D^c$ having fatal effect on individuals. The assumption of the constant number of individuals is an idealization of the constant carrying capacity of an environment.
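To make the model of this paper (as opposed to the classical Fleming–Viot process) concrete, here is a minimal Python sketch in the interval $D = (0, 1)$. It is an illustration under explicit assumptions and not the construction used in the proofs: the Brownian motions are replaced by an Euler scheme with a hypothetical time step `dt`, exits from $D$ are detected only at grid times, and the function name and default parameters are invented for this example.

```python
import math
import random


def simulate_fleming_viot(n_particles=50, n_steps=2000, dt=1e-3, seed=0):
    """Euler sketch of the particle system in D = (0, 1); needs n_particles >= 2."""
    rng = random.Random(seed)
    step_sd = math.sqrt(dt)          # standard deviation of one Brownian increment
    positions = [0.5] * n_particles  # all particles start at the midpoint of D
    n_jumps = 0
    for _ in range(n_steps):
        for k in range(n_particles):
            x = positions[k] + rng.gauss(0.0, step_sd)
            if not 0.0 < x < 1.0:
                # The particle left D: it jumps onto a uniformly chosen other particle.
                j = rng.randrange(n_particles - 1)
                if j >= k:
                    j += 1
                x = positions[j]
                n_jumps += 1
            positions[k] = x
    return positions, n_jumps


if __name__ == "__main__":
    final_positions, jumps = simulate_fleming_viot()
    print("jumps:", jumps)
    print("mean position:", sum(final_positions) / len(final_positions))
```

For large populations and long times one would expect a histogram of the returned positions to resemble the normalized first Dirichlet eigenfunction of the interval, in line with the theorems of the paper, although this crude discretization comes with no error estimate.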
Fleming–Viot models are sometimes applied to “populations” whose individual members are genes. The main qualitative difference between our model and the classical Fleming–Viot process with Brownian spatial motion is that in the limit, we obtain measures with smooth densities.

(ii) Propagation of chaos. When we consider a large number of interacting particles then under some assumptions, two tagged particles will behave in an almost independent way (see, e.g., Sznitman (1991)). In our case, two particles $X_t^k$ and $X_t^j$ are almost independent when $N$ is large. An even stronger result is true: the propagation of chaos holds for the entire trees of descendants of particles labeled $k$ and $j$. The two claims are quite clear in view of the theorems and techniques presented in the paper but we will not give a rigorous proof here.

(iii) Genetic algorithms. A very active area of applied and theoretical research deals with “genetic algorithms”. We mention a book of Man, Tang and Kwong (1999) as a possible entry point to this rapidly growing field. A genetic algorithm is a way to search for an answer to a problem by imitating biological genetic processes. Our model might be thought of as a genetic algorithm generating the first eigenvalue and the corresponding eigenfunction for the Dirichlet Laplacian. We do not make any claims of direct applicability of our model, especially in view of the fact that we do not present any theoretical estimates of the rate of convergence or computer simulations. We note however, that a related problem of finding the second Neumann eigenvalue (the “spectral gap”) is one of the most studied problems from both theoretical and practical points of view, for various Markov processes.

(iv) Minimization of entropy production. It is postulated in physics (Wio (1994) III.5) that an irreversible system achieves a stationary state characterized by the minimum
entropy production. See the Prigogine–de Groot Theorem in Yourgrau, van der Merwe and Raw (1982); consult also a recent article of Ruelle (1997) on this topic. Entropy production has been studied in the context of stochastic processes, for example by Gong and Qian (1997) but we could not find a direct relationship between that paper and our model. We will explain how our model relates the principle of minimum entropy production to a minimizing property of the first Laplacian eigenfunction.

In order to simplify the presentation, we will consider a slightly modified model in which the branching rate is constantly equal to $\lambda_1$, the first eigenvalue of the Laplacian in $D$ with Dirichlet boundary conditions. In general, the branching rate does not have to be a constant. In the limit, when $N \to \infty$, we obtain the following formula for the evolution of the density of the particle process,

$$\frac{\partial p(x, t)}{\partial t} = \Delta p(x, t) + \lambda_1 p(x, t), \tag{3.1}$$

where $\Delta$ represents the Laplacian (we ignored the probabilistic constant 1/2). The first term on the right-hand side represents the Brownian motion effect and the second one represents branching. We will use the notion of entropy proposed by Rényi (1961). He introduced a family of entropy measures parametrized by $\beta$,

$$S_t(\beta) = (1 - \beta)^{-1} \log \int_D p(x, t)^\beta\, dx.$$

We will consider one of these definitions corresponding to $\beta = 2$,

$$S_t = -\log \int_D p(x, t)^2\, dx.$$
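As a small numerical illustration of the Rényi entropy just defined, the following Python sketch evaluates $S(\beta)$ for a density sampled on a midpoint grid of the interval $(0, 1)$. The choice of the interval, the grid and the function names are assumptions made for this example only.

```python
import math


def midpoint_grid(n_cells):
    """Midpoints of n_cells equal subintervals of (0, 1)."""
    return [(i + 0.5) / n_cells for i in range(n_cells)]


def renyi_entropy(density_values, dx, beta):
    """(1 - beta)^(-1) * log of the integral of p^beta, using a midpoint Riemann sum."""
    if beta == 1:
        raise ValueError("beta = 1 is excluded from this family")
    integral = sum(p ** beta for p in density_values) * dx
    return math.log(integral) / (1.0 - beta)


if __name__ == "__main__":
    n_cells = 2000
    grid = midpoint_grid(n_cells)
    # Normalized first Dirichlet eigenfunction of (0, 1), used as a probability density.
    density = [0.5 * math.pi * math.sin(math.pi * x) for x in grid]
    print(renyi_entropy(density, 1.0 / n_cells, beta=2))
```

For this density the integral of $p^2$ equals $\pi^2/8$, so the case $\beta = 2$ gives $-\log(\pi^2/8)$; the uniform density has entropy 0 for every $\beta$.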
Using this definition of entropy, we obtain from (3.1), after integrating by parts and using the Dirichlet boundary condition,

$$\frac{1}{2}\frac{dS}{dt} = -\lambda_1 + \frac{\int_D |\nabla p(x, t)|^2\, dx}{\int_D p(x, t)^2\, dx}.$$

The first term represents the decrease of the entropy in the system due to the flux of particles through the boundary. The second term represents the entropy production. The last quantity is always positive and is minimal in the stationary state, i.e., when $dS/dt = 0$. We see that the entropy production is minimal when

$$\frac{\int_D |\nabla p(x, t)|^2\, dx}{\int_D p(x, t)^2\, dx} = \lambda_1.$$

However, the same minimization problem defines the first eigenfunction of the Laplacian in $D$ with Dirichlet boundary conditions leading to Eq. (3.1) in the stationary regime ($dp/dt = 0$). In this sense, the first eigenfunction minimizes the entropy production. We note that since $\lambda_1$ is the mean escape rate from the system, the property of minimum entropy production is equivalent to the property of the minimum mean escape rate from the system.

The Rényi entropy belongs to the class of entropies introduced in the nonextensive thermostatistics (Pennini, Plastino and Plastino (1998)). In ordinary physical systems it is usually assumed, in view of the second law of thermodynamics, that entropy is an additive quantity and therefore has a properly defined density. This is the case when the
boundary conditions do not strongly influence the bulk properties of the system. This does not hold for the stochastic process considered in this paper since the process of branching in the middle of the system is induced by the flux of particles through the boundary. Acknowledgement. We are grateful to David Aldous, Wilfrid Kendall, Tom Kurtz, Jeff Rosenthal, Dan Stroock, Kathy Temple and Richard Tweedie for very useful advice. We would like to thank the anonymous referee for many suggestions for improvement.
References
1. Adler, R.J. and Ivanitskaya, L.: A superprocess with a disappearing self-interaction. J. Theoret. Probab. 9, 245–261 (1996)
2. Bass, R. and Burdzy, K.: Lifetimes of conditioned diffusions. Probab. Th. Rel. Fields 91, 405–443 (1992)
3. Burdzy, K., Hołyst, R., Ingerman, D. and March, P.: Configurational transition in a Fleming-Viot-type model and probabilistic interpretation of Laplacian eigenfunctions. J. Phys. A 29, 2633–2642 (1996)
4. Burdzy, K. and Khoshnevisan, D.: Brownian motion in a Brownian crack. Ann. Appl. Probab. 8, 708–748 (1998)
5. Burdzy, K., Toby, E. and Williams, R.J.: On Brownian excursions in Lipschitz domains. Part II. Local asymptotic distributions. In: Seminar on Stochastic Processes 1988, E. Cinlar, K.L. Chung, R. Getoor, J. Glover, editors, Boston: Birkhäuser, 1989, pp. 55–85
6. Dawson, D.A.: Infinitely divisible random measures and superprocesses. In: Stochastic Analysis and Related Topics, H. Körezlioglu and A.S. Üstünel, Eds, Boston: Birkhäuser, 1992
7. Dawson, D.A. and Fleischmann, K.: Longtime behavior of a branching process controlled by branching catalysts. Stoch. Process. Appl. 71, 241–257 (1997)
8. Dawson, D.A. and Greven, A.: Multiple Space-Time Scale Analysis For Interacting Branching Models. Electronic J. Probab. 1, paper no. 14, 1–84 (1996)
9. Dawson, D.A. and Perkins, E.A.: Long-time behavior and coexistence in a mutually catalytic branching model. Ann. Probab. 26, 1088–1138 (1998)
10. Down, D., Meyn, S.P. and Tweedie, R.L.: Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23, 1671–1691 (1995)
11. Gong, G. and Qian, M.: Entropy production of stationary diffusions on non-compact Riemannian manifolds. Sci. China Ser. A 40, 926–931 (1997)
12. Itô, K. and McKean, P.: Diffusion Processes and Their Sample Paths. New York: Springer-Verlag, 2nd edition, 1974
13. Klenke, A.: A Review on Spatial Catalytic Branching. In: Festschrift in honour of D. Dawson, 1999, to appear
14. Man, K.F., Tang, K.S. and Kwong, S.: Genetic algorithms. Concepts and designs. London: Springer-Verlag (1999)
15. Pennini, F., Plastino, A.R. and Plastino, A.: Rényi entropies and Fisher informations as measures of nonextensivity in a Tsallis setting. Physica A 258, 446–457 (1998)
16. Rényi, A.: On measures of entropy and information. Proc. 4-th Berkeley Symp. Math. Stat. Probab. 1, 547–561 (1961)
17. Ruelle, D.: Entropy production in nonequilibrium statistical mechanics. Commun. Math. Phys. 189, 365–371 (1997)
18. Sznitman, A.-S.: Topics in propagation of chaos, École d’Été de Probabilités de Saint-Flour XIX–1989, Lecture Notes in Math., 1464. Berlin: Springer, 1991, pp. 165–251
19. Wio, H.S.: An Introduction to Stochastic Processes and Nonequilibrium Statistical Physics. Singapore: World Scientific, 1994
20. Yourgrau, W., van der Merwe, A. and Raw, G.: Treatise on Irreversible and Statistical Thermophysics. New York: Dover Publications Inc., 2nd edition, 1982, pp. 48–52

Communicated by D. Brydges
Commun. Math. Phys. 214, 705 – 731 (2000)
Communications in
Mathematical Physics
© Springer-Verlag 2000
Passivity and Microlocal Spectrum Condition Hanno Sahlmann, Rainer Verch Institut für Theoretische Physik, Universität Göttingen, Bunsenstr. 9, 37073 Göttingen, Germany. E-mail:
[email protected];
[email protected] Received: 9 February 2000 / Accepted: 7 June 2000
Abstract: In the setting of vector-valued quantum fields obeying a linear wave-equation in a globally hyperbolic, stationary spacetime, it is shown that the two-point functions of passive quantum states (mixtures of ground- or KMS-states) fulfill the microlocal spectrum condition (which in the case of the canonically quantized scalar field is equivalent to saying that the two-point function is of Hadamard form). The fields can be of bosonic or fermionic character. We also give an abstract version of this result by showing that passive states of a topological ∗-dynamical system have an asymptotic pair correlation spectrum of a specific type.

1. Introduction

A recurrent theme in quantum field theory in curved spacetime is the selection of suitable states which may be viewed as generalizations of the vacuum state familiar from quantum field theory in flat spacetime. The selection criterion for such states should, in particular, reflect the idea of dynamical stability under temporal evolution of the system. If a spacetime possesses a time-symmetry group (generated by a timelike Killing vector field), then a ground state with respect to the corresponding time-evolution appears as a good candidate for a vacuum-like state. More generally, any thermal equilibrium state for that time-evolution should certainly also be viewed as a dynamically stable state. Ground- and thermal equilibrium states, and mixtures thereof, fall into the class of the so-called “passive” states, defined in [34]. An important result by Pusz and Woronowicz [34] asserts that a dynamical system is in a passive state exactly if it is impossible to extract energy from the system by means of cyclic processes. Since the latter form of passivity, i.e.
the validity of the second law of thermodynamics, expresses a thermodynamical stability which is to be expected to hold generally for physical dynamical systems, one would expect that passive states are natural candidates for physical (dynamically stable) states in quantum field theory in curved spacetime, at least when the spacetime, or parts of it, possess time-symmetry groups. This point of view has been expressed in [7].
In this work we study the relationship between passivity of a quantum field state and the microlocal spectrum condition for free quantum fields on a stationary, globally hyperbolic spacetime. The microlocal spectrum condition (abbreviated, µSC ) is a condition restricting the form of the wavefront sets, WF(ωn ), of the n-point distributions ωn of a quantum field state [6, 35]. For quasifree states, it suffices to restrict the form of WF(ω2 ); see relation (1.1) near the end of this Introduction for a definition of µSC in this case. There are several reasons why the µSC may rightfully be viewed as an appropriate generalization of the spectrum condition (i.e. positivity of the energy in any Lorentz frame), required for quantum fields in flat spacetime, to quantum field theory in curved spacetime. Among the most important is the proof by Radzikowski [35] (based on mathematical work by Duistermaat and Hörmander [12]) that, for the free scalar Klein–Gordon field on any globally hyperbolic spacetime, demanding that the two-point function ω2 obeys the µSC is equivalent to ω2 being of Hadamard form. This is significant since it appears nowadays well-established to take the condition that ω2 be of Hadamard form as criterion for physical (dynamically stable) quasifree states for linear quantum fields on curved spacetime in view of a multitude of results, cf. e.g. [15, 14, 31, 42, 43, 45, 47] and references given therein. Moreover, µSC has several interesting structural properties which are quite similar to those of the usual spectrum condition, and allow to some extent similar conclusions [6, 5, 44]. It is particularly worth mentioning that one may, in quasifree states of linear quantum fields fulfilling µSC, covariantly define Wick-products and develop the perturbation theory for P (φ)4 -type interactions along an Epstein–Glaser approach generalized to curved spacetime [5, 6]. 
Also worth mentioning is the fact that µSC has proved useful in the analysis of other types of problems in quantum field theory in curved spacetime [36, 30, 13]. In view of what we said initially about the significance of the concept of passivity for quantum field states on stationary spacetimes one would be inclined to expect that, on a stationary, globally hyperbolic spacetime, a passive state fulfills the µSC, at least for quasifree states of linear fields. And this is what we are going to establish in the present work. We should like to point out that more special variants of such a statement have been established earlier. For the scalar field obeying the Klein–Gordon equation on a globally hyperbolic, static spacetime, Fulling, Narcowich and Wald [16] proved that the quasifree ground state with respect to the static Killing vector field has a two-point function of Hadamard form, and thus fulfills µSC, as long as the norm of the Killing vector field is globally bounded away from zero. Junker [26] has extended this result by showing that, if the spacetime has additionally compact spatial sections, then the quasifree KMS-states (thermal equilibrium states) at any finite temperature fulfill µSC. But the requirement of having compact Cauchy-surfaces, or the constraint that the static Killing vector field have a norm bounded globally away from zero, exclude several interesting situations from applying the just mentioned results. A prominent example is Schwarzschild spacetime, which possesses a static timelike Killing flow, but the norm of the Killing vector field tends to zero as one approaches the horizon along any Cauchy-surface belonging to the static foliation. In [28] (cf. also [17]), quasifree ground- and KMS-states with respect to the Killing flow on Schwarzschild spacetime have been constructed for the scalar Klein– Gordon field, and it has long been conjectured that the two-point functions of these states are of Hadamard form. 
However, when trying to prove this along the patterns of [16] or [26], who use the formulation of quasifree ground- and KMS-states in terms of the Klein–Gordon field’s Cauchy-data, one is faced with severe infra-red problems even for massive fields upon giving up the constraint that the norm of the static Killing vector
field be globally bounded away from zero. This has called for trying to develop a new approach to proving µSC for passive states, the result of which is our Theorem 5.1; see further below in this Introduction for a brief description. Then our Thm. 5.1 shows, as a corollary, that the quasifree ground- and KMS-states of the scalar Klein–Gordon field on Schwarzschild spacetime satisfy µSC (thus their two-point functions are of Hadamard form). [We caution the reader that this does not show that these states or rather, their “doublings” defined in [28], were extendible to Hadamard states on the whole of the Schwarzschild–Kruskal spacetime. There can be at most one single quasifree, isometry-invariant Hadamard state on Schwarzschild–Kruskal spacetime and this state necessarily restricts to a KMS-state at Hawking temperature on the (“outer, right”-)Schwarzschild-part of Schwarzschild–Kruskal spacetime, cf. [31, 29].]

Before describing next the contents of this work, we wish to note that we have aimed at a quite self-contained presentation. Therefore, Sections 3 and 4 consist to major parts of summaries of well-established material from the literature (as will be described shortly), with some adaptations required for the present purposes. The inclusion of this material is mainly for the convenience of the reader. The novel results of the present work appear in Sections 2 and 5.

In more detail, the organization of the paper is as follows. In Section 2, we will introduce the notion of “asymptotic pair correlation spectrum” of a state ω of a topological ∗-dynamical system. This object is to be viewed as a generalization of the wavefront set of the two-point function ω2 in the stated general setting, see [44] for further discussion. We then show that for (strictly) passive states ω the asymptotic pair correlation spectrum must be of a certain, asymmetric form.
This asymmetry can be interpreted as the microlocal remnant of the asymmetric form of the spectrum that one would obtain for a ground state. Section 3 will be concerned with some aspects of wavefront sets of distributions on test-sections of general vector bundles. Section 3.1 contains a reformulation of the wavefront set for vector-bundle distributions along the lines of Prop. 2.2 in [44]. We briefly recapitulate some notions of spacetime geometry, as far as needed, in Subsect. 3.2. In Subsect. 3.3 we quote the propagation of singularities theorem (PST) for wave-operators acting on vector bundles, in the form used later in Section 5, from [8, 12]. In Subsect. 4.1 we introduce, following [32], the Borchers algebra of smooth test-sections with compact support in a vector bundle over a Lorentzian spacetime, and briefly summarize the connection between states on the Borchers algebra, their GNS-representations, the induced quantum fields, and the Wightman n-point functions. We require that the quantum fields associated with the states are, in a weak sense, bosonic or fermionic, i.e., they fulfill a weak form of (twisted) locality. A quite general formulation of (bosonic or fermionic) quasifree states will be given in Subsect. 4.3. Section 5 contains our main result, saying that for a state ω on the Borchers algebra associated with a given vector bundle, over a globally hyperbolic, stationary spacetime (M, g) as base manifold, the properties (i) ω is (strictly) passive, (ii) ω fulfills a weak form of (twisted) locality, and (iii) ω2 is a bi-solution up to C∞ for a wave operator, imply
\[ \mathrm{WF}(\omega_2) \subset R, \tag{1.1} \]
where R is the set of pairs of non-zero covectors (q, ξ; q′, ξ′) ∈ T∗M × T∗M so that $g^{\mu\nu}\xi_\nu$ is past-directed and lightlike, the base points q and q′ are connected by an affinely
parametrized, lightlike geodesic γ, and both ξ and −ξ′ are co-tangent to γ, or ξ = −ξ′ if q = q′. Following [6], we say that the quasifree state with two-point function ω2 fulfills the µSC if the inclusion (1.1) holds. If one had imposed the additional requirement that ω (resp., the associated quantum fields) fulfill appropriate vector-bundle versions of the CCR or CAR, one would conclude that WF(ω2) = R, as is e.g. the case for the free scalar Klein–Gordon field (cf. [35]). Moreover, for a quasifree state ω on the Borchers algebra of a vector bundle over any globally hyperbolic spacetime one can show that imposing CCR or CAR implies that ω2 is of Hadamard form (appropriately generalized) if and only if WF(ω2) = R. The discussion of these matters will be contained in a separate article [38].

2. Passivity and Asymptotic Pair Correlation Spectrum

Let A be a C∗-algebra with unit and {αt}t∈R a one-parametric group of automorphisms of A, supposed to be strongly continuous, that is, ||αt(A) − A|| → 0 as t → 0 for each A ∈ A. Moreover, let D(δ) denote the set of all A ∈ A such that the limit
\[ \delta(A) := \lim_{t\to 0} \frac{1}{t}\,(\alpha_t(A) - A) \]
exists. One can show that D(δ) is a dense ∗-subalgebra of A, and δ is a derivation with domain D(δ). Following [34], one calls a state ω on A passive if for all unitary elements U ∈ D(δ) which are continuously connected to the unit element,¹ the estimate
\[ \frac{1}{i}\,\omega(U^{*}\delta(U)) \ge 0 \tag{2.1} \]
is fulfilled. As a consequence, ω is invariant under {αt}t∈R: ω◦αt = ω for all t ∈ R. Furthermore, it can be shown (cf. [34]) that ground states or KMS-states at inverse temperature β ≥ 0 for αt are passive, as are convex sums of such states. (In Appendix A we will summarize some basic properties of ground states and KMS-states. Standard references include [4, 39].) However, the significance of passive states is based on two remarkable results in [34]. First, a converse of the previous statement is proven there: If a state is completely passive, then it is a ground state or a KMS-state at some inverse temperature β ≥ 0. Here a state is called completely passive if, for each n ∈ N, the product state ⊗ⁿω is a passive state on ⊗ⁿA with respect to the dynamics {⊗ⁿαt}t∈R. Secondly, the following is established in [34]: the dynamical system modelled by A and {αt}t∈R is in a passive state precisely if it is impossible to extract energy from the system by means of cyclic processes. In that sense, passive states may be viewed as good candidates for physically realistic states of any dynamical system since for these states the second law of thermodynamics is warranted.

¹ i.e. there exists a continuous curve [0, 1] ∋ t ↦ U(t) ∈ D(δ) with each U(t) unitary and U(0) = 1_A, U(1) = U.

In the present section we are interested in studying the asymptotic high frequency behaviour of passive states along similar lines as developed recently in [44]. We shall,
however, generalize the setup since this will prove useful for developments later in this work. Thus, we assume now that A is a topological ∗-algebra with a locally convex topology and with a unit element (cf. e.g. [40]). We denote by S the set of continuous semi-norms for A. Moreover, we say that {αt}t∈R is a continuous one-parametric group of ∗-automorphisms of A if for each t, αt is a topological ∗-automorphism of A, and if the group action is locally bounded and continuous in the sense that for each σ ∈ S there is σ′ ∈ S, r > 0 with σ(αt(A)) ≤ σ′(A) for all |t| < r, A ∈ A, and σ(αt(A) − A) → 0 as t → 0 for each A ∈ A. Then we refer to the pair (A, {αt}t∈R) as a topological ∗-dynamical system. Using the fact that for all A, B ∈ A and σ ∈ S, the maps C ↦ σ(AC) and C ↦ σ(CB) are again continuous semi-norms on A, one deduces by a standard argument that also σ(αs(A)αt(B) − AB) → 0 as s, t → 0. A continuous linear functional ω on A will be called a state if ω(A∗A) ≥ 0 for all A ∈ A and if ω(1_A) = 1. Furthermore, we say that ω is a ground state, or a KMS-state at inverse temperature β > 0, for {αt}t∈R, if the functions t ↦ ω(Aαt(B)) are bounded for all A, B ∈ A, and if ω satisfies the ground state condition (A.1) or the KMS-condition (A.2) given in Appendix A, respectively.

Now we call a family $(A_\lambda)_{\lambda>0}$ with $A_\lambda$ ∈ A a global testing family in A provided there is for each σ ∈ S an s ≥ 0 (depending on σ and on the family) such that
\[ \sup_{\lambda}\, \lambda^{s}\, \sigma(A_\lambda^{*} A_\lambda) < \infty. \tag{2.2} \]
The set of all global testing families will be denoted by $\mathbf{A}$. Let ω be a state on A, and ξ = (ξ1, ξ2) ∈ R²\{0}. Then we say that ξ is a regular direction for ω, with respect to the continuous one-parametric group {αt}t∈R, if there exists some h ∈ C0∞(R²) and an open neighbourhood V of ξ in R²\{0} such that²
\[ \sup_{k\in V} \left| \int e^{-i\lambda^{-1} k\cdot t}\, h(t)\, \omega(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda))\, dt \right| = O^{\infty}(\lambda) \quad \text{as } \lambda \to 0 \tag{2.3} \]
holds for all global testing families $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$ ∈ $\mathbf{A}$. Then we define the set $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ as the complement in R²\{0} of all k which are regular directions for ω. We call $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ the global asymptotic pair correlation spectrum of ω. The asymptotic pair correlation spectrum, and more generally, asymptotic n-point correlation spectra of a state, may be regarded as generalizations of the notion of a wavefront set of a distribution in the setting of states on a dynamical system. We refer to [44] for considerable further discussion and motivation. The properties of $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ are analogous to those of ACS²(ω) described in [44, Prop. 3.2]. In particular, $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ is a closed conic set in R²\{0}. It is evident that, if ω is a finite convex sum of states ωi, then $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ is contained in $\bigcup_i \mathrm{ACS}^2_{\mathbf{A}}(\omega_i)$.

² We shall write ϕ(λ) = O∞(λ) as λ → 0 iff for each s ∈ N there are $C_s, \lambda_s > 0$ so that $|\varphi(\lambda)| \le C_s \lambda^{s}$ for all $0 < \lambda < \lambda_s$.

Now we are going to establish an upper bound for $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$, distinguished by a certain asymmetry, for all ω in a subset P of the set of all passive states, to be defined next: We define P as the set of all states on A which are of the form
\[ \omega(A) = \sum_{i=1}^{m} \rho_i\, \omega_i(A), \qquad A \in A, \tag{2.4} \]
where m ∈ N, ρi > 0, $\sum_{i=1}^{m}\rho_i = 1$, and each ωi is a ground state or a KMS-state at some inverse temperature βi > 0 (note that βi = 0 is not admitted!) on A with respect to {αt}t∈R. The states in P will be called strictly passive. We should like to remark that in the present general setting where A is not necessarily a C∗-algebra, the criterion for passivity given at the beginning in (2.1) may be inappropriate since it could happen that D(δ), even if dense in A, doesn’t contain sufficiently many unitary elements. In the C∗-algebraic situation, (2.1) entails the slightly weaker variant
\[ \frac{1}{i}\,\omega(A\,\delta(A)) \ge 0 \tag{2.5} \]
for all A = A∗ ∈ D(δ), and one may take this as a substitute for the condition of passivity of a state in the present more general framework (supposing that D(δ) is dense). In fact, each ω ∈ P is {αt}t∈R-invariant and satisfies (2.5) (see Appendix A), and in the C∗-algebraic situation, every ω ∈ P also satisfies (2.1).

Proposition 2.1. Let (A, {αt}t∈R) be a topological ∗-dynamical system as described above.
(1) Let ω ∈ P. Then either $\mathrm{ACS}^2_{\mathbf{A}}(\omega) = \emptyset$, or
\[ \mathrm{ACS}^2_{\mathbf{A}}(\omega) = \{(\xi_1,\xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0,\ \xi_2 \ge 0\}. \]
(2) Let ω be an {αt}t∈R-invariant KMS-state at inverse temperature β = 0. Then either $\mathrm{ACS}^2_{\mathbf{A}}(\omega) = \emptyset$, or
\[ \mathrm{ACS}^2_{\mathbf{A}}(\omega) = \{(\xi_1,\xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0\}. \]
Proof. 1) By assumption ω is continuous, hence we can find a seminorm σ ∈ S so that |ω(A)| ≤ σ(A) for all A ∈ A. Thus there are positive constants c and s so that
\[ \omega(\alpha_t(A_\lambda^{*}A_\lambda)) = \omega(A_\lambda^{*}A_\lambda) \le c\cdot(1+\lambda^{-1})^{s} \tag{2.6} \]
holds for all t ∈ R. In the first equality, the invariance of ω was used, and in the second, condition (2.2) was applied. Thus, for any Schwartz-function $\hat h$ ∈ S(R²), and any $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$ in $\mathbf{A}$, one obtains that the following function of λ > 0 and k ∈ R²,
\[ w_\lambda(k) := \int e^{-ik\cdot t}\,\hat h(t)\,\omega(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda))\, dt \]
depends smoothly on k and satisfies the estimate
\[ |w_\lambda(\lambda^{-1}k)| \le c'\,(|k| + \lambda^{-1} + 1)^{r} \]
with suitable constants c′ > 0, r ∈ R. Hence, this function satisfies the assumptions of Lemma 2.2 in [44]. Application of that lemma entails the following: Suppose that for some open neighbourhood V of ξ ∈ R²\{0} we can find some $\hat h$ ∈ S(R²) with $\hat h(0) = 1$ and
\[ \sup_{k\in V} \left| \int e^{-i\lambda^{-1}k\cdot t}\,\hat h(t)\,\omega(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda))\, dt \right| = O^{\infty}(\lambda) \quad \text{as } \lambda \to 0 \tag{2.7} \]
for all $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$ ∈ $\mathbf{A}$. Then this implies that the analogous relation holds with $\hat h$ replaced by $\varphi\cdot\hat h$ for any φ ∈ C0∞(R²) when simultaneously V is replaced by some slightly smaller neighbourhood V′ of ξ. Consequently, relation (2.7) – with $\hat h$ ∈ S(R²), $\hat h(0) = 1$ – entails that ξ is absent from $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$.

2) Some notation needs to be introduced before we can proceed. For f ∈ S(R), we define
\[ (\tau_s f)(s') := f(s'-s) \quad \text{and} \quad ({}^{r}\!f)(s') := f(-s'), \qquad s, s' \in \mathbb{R}. \]
Then we will next establish
\[ \omega\circ\alpha_t = \omega \quad\Rightarrow\quad \mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset \{(\xi_1,\xi_2)\in\mathbb{R}^2\setminus\{0\} : \xi_1+\xi_2 = 0\}. \tag{2.8} \]
To this end, let ξ = (ξ1, ξ2) ∈ R²\{0} be such that ξ1 + ξ2 ≠ 0, and pick some δ > 0 and an open neighbourhood Vξ of ξ so that |k1 + k2| > δ for all k ∈ Vξ. Now pick two functions hj ∈ C0∞(R) (j = 1, 2) such that their Fourier-transforms
\[ \hat h_j(t_j) = \frac{1}{\sqrt{2\pi}} \int e^{-it_j\cdot p}\, h_j(p)\, dp \]
have the property $\hat h_j(0) = 1$. Define $\hat h$ ∈ S(R²) by $\hat h(t) := \hat h_1(t_1)\hat h_2(t_2)$. Then observe that one can find λ0 > 0 such that the functions
\[ g_{\lambda,k}(p) := \big((\tau_{-\lambda^{-1}(k_1+k_2)}\,{}^{r}h_1)\cdot h_2\big)(p), \qquad p \in \mathbb{R}, \tag{2.9} \]
vanish for all k = (k1, k2) ∈ Vξ and all 0 < λ < λ0. Consequently, also the functions
\[ f_{\lambda,k}(p) := (\tau_{\lambda^{-1}k_2}\, g_{\lambda,k})(p), \qquad p \in \mathbb{R}, \tag{2.10} \]
vanish for all k ∈ Vξ and all 0 < λ < λ0. Denoting the Fourier-transform of $f_{\lambda,k}$ by $\hat f_{\lambda,k}$, one obtains for all k ∈ Vξ, 0 < λ < λ0:
\begin{align*}
0 &= \int \hat f_{\lambda,k}(s)\,\omega(A_\lambda\alpha_s(B_\lambda))\, ds \\
  &= \int\!\!\int e^{-i\lambda^{-1}(k_1+k_2)s'}\, e^{-i\lambda^{-1}k_2 s}\, \hat h_1(s')\,\hat h_2(s'+s)\,\omega(A_\lambda\alpha_s(B_\lambda))\, ds'\, ds \\
  &= \int e^{-i\lambda^{-1}k\cdot t}\,\hat h(t)\,\omega(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda))\, dt
\end{align*}
for all testing-families $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$. Invariance of ω under {αt}t∈R was used in passing from the second equality to the last. In view of step 1.) above, this shows (2.8).

3) In a further step we will argue that
\[ \omega \text{ ground state} \quad\Rightarrow\quad \mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset \{(\xi_1,\xi_2)\in\mathbb{R}^2\setminus\{0\} : \xi_2 \ge 0\}. \tag{2.11} \]
So let again $h_j$ and $\hat h_j$ be as above, and $f_{\lambda,k}$ as in (2.10) with Fourier-transform $\hat f_{\lambda,k}$. Let ξ = (ξ1, ξ2) ∈ R²\{0} have ξ2 < 0. Then there is an open neighbourhood Vξ of ξ and an ε > 0 so that k2 < −ε for all k ∈ Vξ. The support of $f_{\lambda,k}$ is contained in the support of $\tau_{\lambda^{-1}k_2}h_2$, and there is clearly some λ0 > 0 such that supp $\tau_{\lambda^{-1}k_2}h_2$ ⊂ (−∞, 0) for all k = (k1, k2) ∈ Vξ as soon as 0 < λ < λ0. By the characterization of a ground state
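given in (A.1), and using also the {αt}t∈R-invariance of a ground state, one therefore obtains
\begin{align*}
\sup_{k\in V_\xi} \left| \int e^{-i\lambda^{-1}k\cdot t}\,\hat h(t)\,\omega(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda))\, dt \right|
 &= \sup_{k\in V_\xi} \left| \int \hat f_{\lambda,k}(s)\,\omega(A_\lambda\alpha_s(B_\lambda))\, ds \right| \\
 &= 0 \qquad \text{if } 0 < \lambda < \lambda_0
\end{align*}
for all $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$ ∈ $\mathbf{A}$. Relation (2.11) is thereby proved.

4) Now we turn to the case
\[ \omega \text{ KMS at } \beta > 0 \quad\Rightarrow\quad \mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset \{(\xi_1,\xi_2)\in\mathbb{R}^2\setminus\{0\} : \xi_2 \ge 0\}. \tag{2.12} \]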
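Consider a ξ ∈ R²\{0} with ξ2 < 0 and pick some ε > 0 and an open neighbourhood Vξ of ξ so that k2 < −ε for all k = (k1, k2) ∈ Vξ. Choose again $h_j$ and $\hat h_j$ as above and define correspondingly $g_{\lambda,k}$ and $f_{\lambda,k}$ as in (2.9) and (2.10), respectively. Denote again their Fourier-transforms by $\hat g_{\lambda,k}$ and $\hat f_{\lambda,k}$. Note that $g_{\lambda,k}$ and $f_{\lambda,k}$ are in C0∞(R) for all λ > 0 and all k ∈ R², so their Fourier-transforms are entire analytic. Moreover, a standard estimate shows that
\[ \sup_{\lambda>0,\ k\in\mathbb{R}^2} \int |\hat g_{\lambda,k}(s+i\beta)|\, ds \le c < \infty. \tag{2.13} \]
One calculates
\[ \hat f_{\lambda,k}(s+i\beta) = e^{\lambda^{-1}k_2\beta}\, e^{-i\lambda^{-1}k_2 s}\, \hat g_{\lambda,k}(s+i\beta), \qquad s \in \mathbb{R}, \]
and now the KMS-condition (A.2) yields for all $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$ ∈ $\mathbf{A}$,
\begin{align*}
\left| \int \hat f_{\lambda,k}(s)\,\omega(A_\lambda\alpha_s(B_\lambda))\, ds \right|
 &= \left| e^{\lambda^{-1}k_2\beta} \int e^{-i\lambda^{-1}k_2 s}\, \hat g_{\lambda,k}(s+i\beta)\,\omega(\alpha_s(B_\lambda)A_\lambda)\, ds \right| \\
 &\le e^{\lambda^{-1}k_2\beta}\, c\cdot c'\,(1+\lambda^{-1})^{s'}, \qquad \lambda > 0,\ k \in \mathbb{R}^2,
\end{align*}
for suitable c′, s′ > 0, where (2.6) and (2.13) have been used. Making use also of the {αt}t∈R-invariance of ω one finds, with suitable γ > 0,
\begin{align*}
\sup_{k\in V_\xi} \left| \int e^{-i\lambda^{-1}k\cdot t}\,\hat h(t)\,\omega(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda))\, dt \right|
 &= \sup_{k\in V_\xi} \left| \int \hat f_{\lambda,k}(s)\,\omega(A_\lambda\alpha_s(B_\lambda))\, ds \right| \\
 &\le \gamma\, e^{-\lambda^{-1}\varepsilon\beta}\,(1+\lambda^{-1})^{s'} = O^{\infty}(\lambda) \qquad \text{as } \lambda \to 0
\end{align*}
for all $(A_\lambda)_{\lambda>0}$, $(B_\lambda)_{\lambda>0}$ ∈ $\mathbf{A}$. This establishes statement (2.12).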
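The identities used in steps 2) to 4) follow from elementary properties of the Fourier transform. The computation below is a sketch only: it takes the normalization $\hat f(s) = (2\pi)^{-1/2}\int e^{-isp}f(p)\,dp$ from step 2), writes ε for the positive constant with k2 < −ε on Vξ, and uses the shorthand a, b and the constants $C_n$, which are our own notation introduced for readability.

```latex
% Shorthand (ours): a := \lambda^{-1}k_2,  b := \lambda^{-1}(k_1+k_2),
% so that g_{\lambda,k} = (\tau_{-b}\,{}^{r}h_1)\cdot h_2 and f_{\lambda,k} = \tau_a g_{\lambda,k}.
\begin{align*}
\hat f_{\lambda,k}(z) &= e^{-iaz}\,\hat g_{\lambda,k}(z), \qquad z\in\mathbb{C},\\
\hat g_{\lambda,k}(s) &= \frac{1}{\sqrt{2\pi}}\int e^{-ibs'}\,\hat h_1(s')\,\hat h_2(s'+s)\,ds',\\
\hat f_{\lambda,k}(s+i\beta) &= e^{a\beta}\,e^{-ias}\,\hat g_{\lambda,k}(s+i\beta).
\end{align*}
% With t_1 = s', t_2 = s'+s one has (k_1+k_2)s' + k_2 s = k\cdot t,
% which produces the phase e^{-i\lambda^{-1}k\cdot t} in step 2).
% For k_2 < -\varepsilon the prefactor obeys e^{a\beta} \le e^{-\varepsilon\beta/\lambda},
% and for every n = 1, 2, \dots:
\[
e^{-\varepsilon\beta/\lambda} \le C_n\,\lambda^{n},
\qquad C_n := \Big(\frac{n}{e\,\varepsilon\beta}\Big)^{n},
\qquad \lambda>0,
\]
% because x \mapsto x^{n}e^{-\varepsilon\beta x} attains its maximum at x = n/(\varepsilon\beta).
```

Multiplying by the polynomially bounded factor $(1+\lambda^{-1})^{s'}$ only lowers the available power of λ by a fixed amount, which is why the final bound in step 4) is still O∞(λ).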
5) Combining now the assertions (2.8), (2.11) and (2.12), one can see that for each ω ∈ P there holds $\mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset \{(\xi_1,\xi_2)\in\mathbb{R}^2\setminus\{0\} : \xi_1+\xi_2 = 0,\ \xi_2\ge 0\}$. Since the set on the right-hand side obviously has no proper conic subset in R²\{0}, one concludes that statement (1) of the proposition holds true.

6) As ω is KMS at β = 0, this means that it is a trace: ω(AB) = ω(BA). Since ω is also {αt}t∈R-invariant, we have $\mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset \{(\xi_1,\xi_2)\in\mathbb{R}^2\setminus\{0\} : \xi_1+\xi_2 = 0\}$. The set on the right hand side has precisely two proper closed conic subsets $W_\pm := \{(\xi_1,\xi_2)\in\mathbb{R}^2\setminus\{0\} : \xi_1+\xi_2 = 0,\ \pm\xi_2\ge 0\}$. These two sets are disjoint, W+ ∩ W− = ∅, and we have W+ = −W−. Hence, since ω is a trace, one can argue exactly as in [44, Prop. 4.2] to conclude that either $\mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset W_+$ or $\mathrm{ACS}^2_{\mathbf{A}}(\omega) \subset W_-$ imply $\mathrm{ACS}^2_{\mathbf{A}}(\omega) = \emptyset$. This establishes statement (2) of the proposition. □

Hence we see that strict passivity of ω results in its $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ being asymmetric. This is due to the fact that, roughly speaking, the negative part of the spectrum of the unitary group implementing {αt}t∈R in such a state is suppressed by an exponential weight factor. It is worth noting that this asymmetry is not present for KMS-states at β = 0. Such states at infinite temperature would hardly be regarded as candidates for physical states, and they can be ruled out by the requirement that $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ be asymmetric.

Remark 2.2. One can modify or, effectively, enlarge the set of testing families by allowing a testing family to depend on additional parameters: Define A2 as the set of all families $(A_{y,\lambda})_{\lambda>0,\,y\in\mathbb{R}^m}$, where m ∈ N is arbitrary (and depends on the family), having the property that for each semi-norm σ ∈ S there is an s ≥ 0 (depending on σ and on the family) such that
\[ \sup_{\lambda,\,y}\, \lambda^{s}\, \sigma(A_{y,\lambda}^{*}A_{y,\lambda}) < \infty. \tag{2.14} \]
Then the definition of a regular direction k ∈ R²\{0} for a state ω of the dynamical system (A, {αt}t∈R) may be altered through declaring ξ a regular direction iff there are an open neighbourhood V of ξ and a function h ∈ C0∞(R²), h(0) = 1, so that
\[ \sup_{k\in V}\ \sup_{y,z} \left| \int e^{-i\lambda^{-1}k\cdot t}\, h(t)\,\omega(\alpha_{t_1}(A_{y,\lambda})\alpha_{t_2}(B_{z,\lambda}))\, dt \right| = O^{\infty}(\lambda) \quad \text{as } \lambda \to 0 \]
holds for any pair of elements $(A_{y,\lambda})_{\lambda>0,\,y\in\mathbb{R}^m}$, $(B_{z,\lambda})_{\lambda>0,\,z\in\mathbb{R}^n}$ in A2. This makes the set of regular directions a priori smaller, and if we define $\mathrm{ACS}^2_{A2}(\omega)$ as the complement of all ξ ∈ R²\{0} that are regular directions for ω according to the just given, altered definition then clearly we have, in general, $\mathrm{ACS}^2_{A2}(\omega) \supset \mathrm{ACS}^2_{\mathbf{A}}(\omega)$. However, essentially by repeating – with somewhat more laborious notation – the proof of Prop. 2.1, one can see that the statements of Prop. 2.1 remain valid upon replacing $\mathrm{ACS}^2_{\mathbf{A}}(\omega)$ by $\mathrm{ACS}^2_{A2}(\omega)$. We shall make use of that observation later.
3. Wavefront Sets and Propagation of Singularities

3.1. Wavefront sets of vectorbundle-distributions. Let X be a C∞ vector bundle over a base manifold N (n = dim N ∈ N) with typical fibre isomorphic to $\mathbb{C}^r$ or to $\mathbb{R}^r$; the bundle projection will be denoted by $\pi_N$. (We note that here and throughout the text, we take manifolds to be C∞, Hausdorff, 2nd countable, finite dimensional and without boundary.) We shall write C∞(X) for the space of smooth sections of X and C0∞(X) for the subspace of smooth sections with compact support. These spaces can be endowed with locally convex topologies in a like manner as for the corresponding test-function spaces $E(\mathbb{R}^n)$ and $D(\mathbb{R}^n)$, cf. [9, 10] for details. By (C∞(X))′ and (C0∞(X))′ we denote the respective spaces of continuous linear functionals, and by $C_0^\infty(X_U)$ the space of all smooth sections in X having compact support in the open subset U of N.

For later use, we introduce the following terminology. We say that ρ is a local diffeomorphism of some manifold X if ρ is defined on some open subset U1 = dom ρ of X and maps it diffeomorphically onto another open subset U2 = Ran ρ of X. If U1 = U2 = X, then ρ is a diffeomorphism as usual. Let ρ be a (local) diffeomorphism of the base manifold N. Then we say that R is a (local) bundle map of X covering ρ if R is a smooth map from $\pi_N^{-1}(\mathrm{dom}\,\rho)$ to $\pi_N^{-1}(\mathrm{Ran}\,\rho)$ so that, for each q in dom ρ, R maps the fibre over q linearly into the fibre over ρ(q). If this map is also one-to-one and if R is also a local diffeomorphism, then R will be called a (local) morphism of X covering ρ. Moreover, let $(\rho_x)_{x\in B}$ be a family of (local) diffeomorphisms of N depending smoothly on x ∈ B, where B is an open neighbourhood of 0 ∈ $\mathbb{R}^s$ for some s ∈ N. Then we call $(R_x)_{x\in B}$ a family of (local) morphisms of X covering $(\rho_x)_{x\in B}$ if each $R_x$ is a morphism of X covering $\rho_x$, depending smoothly on x ∈ B.
Note that each bundle map R of X covering a (local) diffeomorphism ρ of N induces a (local) action on C0∞(X) in the form of a continuous linear map $R^{\star} : C_0^\infty(X_{\mathrm{dom}\,\rho}) \to C_0^\infty(X_{\mathrm{Ran}\,\rho})$ given by
\[ R^{\star} f := R\circ f\circ\rho^{-1}, \qquad f \in C_0^\infty(X_{\mathrm{dom}\,\rho}). \tag{3.1} \]
Given a local trivialization of X over some open U ⊂ N, this induces a one-to-one correspondence between $C_0^\infty(X_U)$ and $\oplus^r D(U)$, inducing in turn a one-to-one correspondence between $(C_0^\infty(X_U))'$ and $\oplus^r D'(U)$. Now let u ∈ $(C_0^\infty(X_U))'$ and let $(u_1, \dots, u_r) \in \oplus^r D'(U)$ be the corresponding r-tuple of scalar distributions on U induced by the local trivialization of X over U. The wavefront set WF(u) of u ∈ $(C_0^\infty(X_U))'$ may then be defined as the union of the wavefront sets of the components $u_a$, i.e.
\[ \mathrm{WF}(u) := \bigcup_{a=1}^{r} \mathrm{WF}(u_a), \tag{3.2} \]
cf. [8].³ It is not difficult to check that this definition is, in fact, independent of the choice of local trivialization of X over U, and thus yields a definition of WF(u) for all u ∈ (C0∞(X))′ having the properties familiar of the wavefront set of scalar distributions on the base manifold N, so that WF(u) is a conical subset of T∗N\{0}.

³ We assume that the reader is familiar with the concept of the wavefront set of a scalar distribution, which is presented e.g. in the textbooks [25, 10, 37].

Another characterization of WF(u) may be given in the following way. Let q ∈ U and ξ ∈ $T_q^{*}N\setminus\{0\}$. Choose any chart for U around q, thus identifying q with 0 ∈ $\mathbb{R}^n$
Passivity and Microlocal Spectrum Condition
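and $\xi$ with $\xi \in T_0^*\mathbb{R}^n \equiv \mathbb{R}^n$ via the dual tangent map of the chart. With respect to the chosen coordinates, we introduce translations, $\check\rho_x(y) := y + x$, and dilations, $\check\delta_\lambda(y) := \lambda y$, on a sufficiently small coordinate ball around $y = 0$, taking $\lambda > 0$ and the norm of $x \in \mathbb{R}^n$ small enough that the coordinate range is not left. Pulling these actions back with the help of the chart, they induce families of local diffeomorphisms $(\rho_x)_{x\in B}$ and $(\delta_\lambda)_{0 < \lambda < \cdots}$ [...] $(f_\lambda)_{\lambda>0} \in F_q(X)$ one has
\[ \sup_{k\in V} \left| \int e^{-i\lambda^{-1} k\cdot x}\, h(x)\, u(R_x^\star f_\lambda)\, dx \right| = O^\infty(\lambda) \quad \text{as } \lambda \to 0. \tag{3.3} \]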
Proof. Select a local trivialization of $X$ over $U$. With respect to it, there are smooth $GL(r)$-valued functions $(R^a_b(x))_{a,b=1}^r$ of $x$ such that$^4$
\[ u(R_x^\star f_\lambda) = R^a_b(x)\, u_a(f^b_\lambda \circ \rho_x^{-1}). \]
Now suppose that $(q, \xi)$ is not in $\mathrm{WF}(u)$, so that $(q, \xi)$ is not contained in any of the $\mathrm{WF}(u_a)$. Then, making use of the fact that the wavefront set of a scalar distribution may be characterized in terms of the decay properties of its localized Fourier transforms in any coordinate chart (cf. [25]) in combination with Prop. 2.1 and Lemma 2.2 in [44], one obtains immediately the relation (3.3).

Conversely, assume that (3.3) holds. Since $(R^a_b(x))_{a,b=1}^r$ is in $GL(r)$ for each $x$ and depends smoothly on $x$, we can find a smooth family $(S^b_c(x))_{b,c=1}^r$ of functions of $x$ so that $S^b_c(x) R^a_b(x) = \delta^a_c$, $x \in B$. Since (3.3) holds, one may apply Lemma 2.2 of [44] to the effect that for some open neighbourhood $V$ of $\xi$ and for all $((0, \dots, \varphi_\lambda, \dots, 0))_{\lambda>0} \in F_q(X)$, where only the $c$th entry is non-vanishing, one has
\begin{align*}
\sup_{k\in V} \left| \int e^{-i\lambda^{-1} k\cdot x}\, h(x)\, u_c(\varphi_\lambda \circ \rho_x^{-1})\, dx \right|
&= \sup_{k\in V} \left| \int e^{-i\lambda^{-1} k\cdot x}\, h(x)\, S^b_c(x) R^a_b(x)\, u_a(\varphi_\lambda \circ \rho_x^{-1})\, dx \right| \\
&= O^\infty(\lambda) \quad \text{as } \lambda \to 0.
\end{align*}
Then one concludes from Prop. 2.1 in [44] that $(q, \xi)$ is not contained in $\mathrm{WF}(u_c)$ for each $c = 1, \dots, r$. $\Box$

4 Summation over repeated indices will be assumed from now on. See also footnote 5.
H. Sahlmann, R. Verch
A very useful property is the behaviour of the wavefront set under (local) morphisms of $X$. We put on record here the following lemma without proof, which may be obtained by extending the proof for the scalar case in [25] together with some of the arguments appearing in the proof of Lemma 3.1.

Lemma 3.2. Let $U_1$ and $U_2$ be open subsets of $N$, and let $R : X_{U_1} \to X_{U_2}$ be a vector bundle map covering a diffeomorphism $\rho : U_1 \to U_2$. Let $u \in (C_0^\infty(X_{U_1}))'$. Then it holds that
\[ \mathrm{WF}(R^\star u) \subset {}^tD\rho\, \mathrm{WF}(u) = \{(\rho^{-1}(x), {}^tD\rho \cdot \xi) : (x, \xi) \in \mathrm{WF}(u)\}, \tag{3.4} \]
where ${}^tD\rho$ denotes the transpose (or dual) of the tangent map of $\rho$. If $R$ is even a bundle morphism, then the inclusion (3.4) specializes to an equality.
3.2. Briefing on spacetime geometry. Since several concepts of spacetime geometry are going to play some role later on, we take the opportunity to introduce them here and establish the corresponding notation. We refer to the standard references [46, 23] for a more thorough discussion and also for the definition of some well-established terminology that is not always introduced explicitly in the following.

Let us assume that $(M, g)$ is a spacetime, so that $M$ is a smooth manifold of dimension $m \ge 2$, and $g$ is a Lorentzian metric having signature $(+, -, \dots, -)$. It will also be assumed that the spacetime is time-orientable, and that a time-orientation has been chosen. Then one introduces, for any subset $G$ of $M$, the corresponding future/past sets $J^\pm(G)$, consisting of all points lying on piecewise smooth, continuous future/past-directed causal curves emanating from $G$. A subset $G' \subset M$ is, by definition, causally separated from $G$ if it has void intersection with $J^+(G) \cup J^-(G)$. Thus a pair of points $(q, p) \in M \times M$ is called causally separated if $q$ is causally separated from $p$ or vice versa, since this relation is symmetric. A smooth hypersurface $\Sigma$ in $M$ is called a Cauchy-surface if each inextendible causal curve in $(M, g)$ intersects $\Sigma$ exactly once. Spacetimes $(M, g)$ possessing Cauchy-surfaces are called globally hyperbolic. It can be shown that a globally hyperbolic spacetime admits smooth one-parametric foliations into Cauchy-surfaces. Globally hyperbolic spacetimes have a very well-behaved causal structure.

A certain property of globally hyperbolic spacetimes will be important for applying the propagation of singularities theorem in Sect. 5, so we mention it here: Let $v$ be a nonzero lightlike vector in $T_qM$ for some $q \in M$. It defines a maximal smooth, affinely parametrized geodesic $\gamma : I \to M$ with the properties $\gamma(0) = q$ and $\frac{d}{dt}\gamma(t)\big|_{t=0} = v$, where "maximal" here refers to choosing $I$ as the largest real interval ($I$ is taken as a neighbourhood of 0, and may coincide e.g. with $\mathbb{R}$) where $\gamma$ is a smooth solution of the geodesic equation compatible with the specified data at $q$. Then $\gamma$ is both future- and past-inextendible (see e.g. the argument in [35, Prop. 4.3]), and consequently, given an arbitrary Cauchy-surface $\Sigma \subset M$, there is exactly one parameter value $t \in I$ so that $\gamma(t) \in \Sigma$.

3.3. Wave-operators and propagation of singularities. Suppose that we are given a time-oriented spacetime $(M, g)$. Then let $V$ be a vector bundle with base manifold $M$, typical fibre isomorphic to $\mathbb{C}^r$, and bundle projection $\pi_M$. Moreover, we assume that there exists a morphism $C$ of $V$ covering the identity map of $M$ which is involutive ($C \circ C = \mathrm{id}_V$) and
acts anti-isomorphically on the fibres; in other words, $C$ acts like a complex conjugation in each fibre space. Therefore, the $C$-invariant part $V^\circ$ of $V$ is a vector bundle over the base $M$ with typical fibre $\mathbb{R}^r$. A linear partial differential operator $P : C_0^\infty(V) \to C_0^\infty(V)$ will be said to have metric principal part if, upon choosing a local trivialization of $V$ over $U \subset M$ in which sections $f \in C_0^\infty(V_U)$ take the component representation $(f^1, \dots, f^r)$, and a chart $(x^\mu)_{\mu=1}^m$, one has the following coordinate representation for $P$:$^5$
\[ (Pf)^a(x) = g^{\mu\nu}(x)\,\partial_\mu \partial_\nu f^a(x) + A^{\nu a}{}_{b}(x)\,\partial_\nu f^b(x) + B^a_b(x)\, f^b(x). \]
Here, $\partial_\mu$ denotes the coordinate derivative $\frac{\partial}{\partial x^\mu}$, and $A^{\nu a}{}_{b}$ and $B^a_b$ are suitable collections of smooth, complex-valued functions. Observe that thus the principal part of $P$ diagonalizes in all local trivializations (it is "scalar"). If $P$ has metric principal part and is in addition $C$-invariant, i.e.
\[ C^\star \circ P \circ C^\star = P, \tag{3.5} \]
then we call $P$ a wave operator. In this case, $P$ leaves the space $C_0^\infty(V^\circ)$ of $C^\star$-invariant sections invariant. As an aside we note that then there is a covariant derivative (linear connection) $\nabla^{(P)}$ on $V^\circ$ together with a bundle map $v$ of $V^\circ$ covering $\mathrm{id}_M$ such that
\[ Pf = g^{\mu\nu}\, \nabla^{(P)}_\mu \nabla^{(P)}_\nu f + v^\star f \]
for all $f \in C_0^\infty(V^\circ)$; this covariant derivative is given by
\[ 2 \cdot \nabla^{(P)}_{\mathrm{grad}\,\varphi} f = P(\varphi f) - \varphi P(f) - (\Box_g \varphi) f \]
for all $\varphi \in C_0^\infty(M, \mathbb{R})$ and $f \in C_0^\infty(V^\circ)$, where $\Box_g$ denotes the d'Alembert operator induced by the metric $g$ on scalar functions [20].

Before we can state the version of the propagation of singularities theorem that will be relevant for our considerations later, we need to introduce further notation. By $V \boxtimes V$ we denote the outer product bundle of $V$. This is the $C^\infty$-vector bundle over $M \times M$ whose fibres over $(q_1, q_2) \in M \times M$ are $V_{q_1} \otimes V_{q_2}$, where $V_{q_j}$ denotes the fibre over $q_j$ ($j = 1, 2$), and with base projection defined by
\[ v_{q_1} \otimes v'_{q_2} \mapsto (q_1, q_2) \quad \text{for} \quad v_{q_1} \otimes v'_{q_2} \in V_{q_1} \otimes V_{q_2}. \]
Note also that the conjugation $C$ on $V$ induces a conjugation $\boxtimes^2 C$ on $V \boxtimes V$ by anti-linear extension of the assignment
\[ \boxtimes^2 C\,(v_{q_1} \otimes v'_{q_2}) := C v_{q_1} \otimes C v'_{q_2}, \qquad q_j \in M. \]
The definition of $\boxtimes^n V$, the $n$-fold outer tensor product of $V$, should then be obvious, and likewise the definition of $\boxtimes^n C$. Going to local trivializations and using partition of unity arguments, it is not difficult to see that the canonical embedding $C_0^\infty(V) \otimes C_0^\infty(V) \subset C_0^\infty(V \boxtimes V)$ is dense ([10]).

5 Greek indices are raised and lowered with $g^\mu_\nu(x)$, latin indices with $\delta^a_b$.
Moreover, if we take some $L \in (C_0^\infty(V \boxtimes V))'$, then it induces a bilinear form $F$ over $C_0^\infty(V)$ by setting
\[ F(f, f') = L(f \otimes f'), \qquad f, f' \in C_0^\infty(V). \tag{3.6} \]
Clearly $F$ is then jointly continuous in both entries. On the other hand, if $F$ is a bilinear form over $C_0^\infty(V)$ which is separately continuous in both entries ($f' \mapsto F(f, f')$ and $f' \mapsto F(f', f)$ are continuous maps for each fixed $f$), then the nuclear theorem implies that there is an $L \in (C_0^\infty(V \boxtimes V))'$ inducing $F$ according to (3.6) [10]. These statements generalize to the case of $n$-fold tensor products in the obvious manner.

Now define$^6$
\[ \mathcal{N} := \{(q, \xi) \in T^*M \setminus \{0\} : g^{\mu\nu}(q)\, \xi_\mu \xi_\nu = 0\}. \]
Moreover, define for each pair $(q, \xi; q', \xi') \in \mathcal{N} \times \mathcal{N}$:

$(q, \xi) \sim (q', \xi')$ iff there exists an affinely parametrized lightlike geodesic $\gamma$ in $(M, g)$ connecting $q$ and $q'$ and such that $\xi$ and $\xi'$ are co-tangent to $\gamma$ at $q$ and $q'$, respectively.

Here, we say that $\xi$ is co-tangent to $\gamma$ at $q = \gamma(s)$ if $\left(\frac{d}{dt}\gamma(t)\right)^\mu\big|_{t=s} = g^{\mu\nu}(q)\, \xi_\nu$, where $t$ is the affine parameter. Therefore, $(q, \xi) \sim (q', \xi')$ means that $\xi$ and $\xi'$ are parallel transports of each other along the lightlike geodesic $\gamma$ connecting $q$ and $q'$. Note that the possibility $q = q'$ is included, in which case $(q, \xi) \sim (q', \xi')$ means $\xi = \xi'$. One can introduce the following two disjoint future/past-oriented parts (with respect to the time-orientation of $(M, g)$) of $\mathcal{N}$,
\[ \mathcal{N}_\pm := \{(q, \xi) \in \mathcal{N} \mid \pm\xi \rhd 0\}, \tag{3.7} \]
where $\xi \rhd 0$ means that the vector $\xi^\mu = g^{\mu\nu}\xi_\nu$ is future-pointing. The relation "$\sim$" is obviously an equivalence relation between elements in $\mathcal{N}$. For $(q, \xi) \in \mathcal{N}$, the corresponding equivalence class is denoted by $B(q, \xi)$; it is a bicharacteristic strip of any wave operator $P$ on $V$ since such an operator has metric principal part and therefore its bi-characteristics are lightlike geodesics (see, e.g. [30]).

Now we are ready to state a specialized version of the propagation of singularities theorem (PST) which is tailored for two-point distributions that are solutions (up to $C^\infty$-terms) of wave operators, and which derives as a special case of the PST in [8]. We should like to point out that the formulation of the PST in [8] (extending arguments developed in [12] for the scalar case) is considerably more general in two respects: First, it applies, with suitable modifications, not only to linear second order differential operators with metric principal part, but to pseudo-differential operators on $C_0^\infty(V)$ that have a so-called "real principal part" (of which "metric principal part" is a special case; note also that a metric principal part is homogeneous). Secondly, the general formulation of the PST gives not only information about the wavefront set of a $u \in (C_0^\infty(V))'$ which is a solution up to $C^\infty$-terms of a pseudo-differential operator $A$ having real principal part (i.e. $\mathrm{WF}(Au) = \emptyset$), but even describes properties of the polarization set of such a $u$. The polarization set $\mathrm{WF}_{\mathrm{pol}}(u)$ of $u \in (C_0^\infty(V))'$ is a subset of the direct product bundle $T^*M \oplus V$ over $M$ and specifies which components of $u$ (in a local trivialization of $V$) have the worst decay properties in Fourier-space near any given base point in $M$;

6 The notation $(q, \xi) \in T^*M$ means that $\xi \in T_q^*M$, i.e. $q$ denotes the base point of the cotangent vector $\xi$.
the projection of $\mathrm{WF}_{\mathrm{pol}}(u)$ onto its $T^*M$-part coincides with the wavefront set $\mathrm{WF}(u)$. The reader is referred to [8] for details and further discussion, and also to [33, 24] for a discussion of the polarization set for Dirac fields on curved spacetimes. As a corollary to the PST formulated in [8] together with Lemma 6.5.5 in [12] (see also [30] for an elementary account), one obtains the following:

Proposition 3.3. Let $P$ be a wave operator on $C_0^\infty(V)$ and define for $w \in (C_0^\infty(V \boxtimes V))'$ the distributions $w^{(P)}, w_{(P)} \in (C_0^\infty(V \boxtimes V))'$ by
\begin{align*}
w^{(P)}(f \otimes f') &:= w(Pf \otimes f'), \tag{3.8} \\
w_{(P)}(f \otimes f') &:= w(f \otimes Pf'),
\end{align*}
for all $f, f' \in C_0^\infty(V)$. Suppose that $\mathrm{WF}(w^{(P)}) = \emptyset = \mathrm{WF}(w_{(P)})$. Then it holds that $\mathrm{WF}(w) \subset \mathcal{N} \times \mathcal{N}$ and
\[ (q, \xi; q', \xi') \in \mathrm{WF}(w) \ \text{with}\ \xi \neq 0 \ \text{and}\ \xi' \neq 0 \ \Rightarrow\ B(q, \xi) \times B(q', \xi') \subset \mathrm{WF}(w). \]

4. Quantum Fields

4.1. The Borchers algebra. We begin our discussion of linear quantum fields obeying a wave equation by recalling the definition and basic properties of the Borchers-algebra [2]. Let $V$ denote a vector bundle over the base-manifold $M$ as in the previous section. Then consider the set
\[ \mathcal{B} := \{\mathbf{f} \equiv (f_n)_{n=0}^\infty : f_0 \in \mathbb{C},\ f_n \in C_0^\infty(\boxtimes^n V),\ \text{only finitely many } f_n \neq 0\}, \]
where $\boxtimes^n V$ denotes the $n$-fold outer product bundle of $V$, cf. Sect. 3. The set $\mathcal{B}$ is a priori a vector space, but one may also introduce a $*$-algebraic structure on it: A product $\mathbf{f} \cdot \mathbf{g}$ for elements $\mathbf{f}, \mathbf{g} \in \mathcal{B}$ is given by defining the $n$th component $(\mathbf{f} \cdot \mathbf{g})_n$ to be
\[ (\mathbf{f} \cdot \mathbf{g})_n := \sum_{i+j=n} f_i \otimes g_j. \]
Here, $f_i \otimes g_j$ is understood as the element in $C_0^\infty(\boxtimes^n V)$ induced by the canonical embedding $C_0^\infty(\boxtimes^i V) \otimes C_0^\infty(\boxtimes^j V) \subset C_0^\infty(\boxtimes^n V)$. Observe that $\mathcal{B}$ possesses a unit element $1_{\mathcal{B}}$, given by the sequence $((1_{\mathcal{B}})_n)_{n=0}^\infty$ having the number 1 in the 0th component while all other components vanish. Moreover, for $\mathbf{f} \in \mathcal{B}$ one can define $\mathbf{f}^*$ by setting
\[ f_n^*(q_1, \dots, q_n) := (\boxtimes^n C) f_n(q_n, \dots, q_1), \qquad q_j \in M, \tag{4.1} \]
for the $n$th component of $\mathbf{f}^*$, where $C$ denotes the complex conjugation assumed to be given on $V$. This yields an anti-linear involution on $\mathcal{B}$. With these definitions of product and $*$-operation, $\mathcal{B}$ is a $*$-algebra. Furthermore, $\mathcal{B}$ has a natural "local net structure" in the sense that one obtains an inclusion-preserving map $M \supset O \mapsto \mathcal{B}(O) \subset \mathcal{B}$ taking subsets $O$ of $M$ to unital $*$-subalgebras $\mathcal{B}(O)$ of $\mathcal{B}$ upon defining $\mathcal{B}(O)$ to consist of all $(f_n)_{n=0}^\infty$ for which $\mathrm{supp}\, f_n \subset O$, $n \in \mathbb{N}$.
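As a quick illustration of the product and the involution just defined, the following worked sketch (not part of the original text) multiplies two elements whose components vanish above degree one. The scalars $a, b \in \mathbb{C}$ and the sections $u, v \in C_0^\infty(V)$ are placeholders introduced here, and the degree-zero case of (4.1) is read as ordinary complex conjugation, which is an assumption of this sketch.

```latex
% Illustrative only: a, b are complex numbers, u, v are test sections of V.
\begin{align*}
  \mathbf{f} &= (a,\, u,\, 0,\, 0,\, \dots), \qquad
  \mathbf{g} = (b,\, v,\, 0,\, 0,\, \dots), \\
  (\mathbf{f}\cdot\mathbf{g})_0 &= a\,b, \\
  (\mathbf{f}\cdot\mathbf{g})_1 &= a\,v + b\,u, \\
  (\mathbf{f}\cdot\mathbf{g})_2 &= u \otimes v, \qquad
  (\mathbf{f}\cdot\mathbf{g})_n = 0 \quad (n \ge 3), \\
  % involution: conjugate each factor and reverse the order of the arguments
  \mathbf{f}^* &= (\bar a,\, C^\star u,\, 0,\, 0,\, \dots), \qquad
  (u \otimes v)^* = C^\star v \otimes C^\star u .
\end{align*}
```

Taking $a = 1$ and $u = 0$ reproduces $1_{\mathcal{B}} \cdot \mathbf{g} = \mathbf{g}$, and the last identity shows that the involution reverses the order of the tensor factors, as one expects of an adjoint.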
Another simple fact is that (local) morphisms of $V$ commuting with $C$ can be lifted to (local) automorphisms of $\mathcal{B}$. To this end, let $(R_x)_{x\in B}$ be a family of (local) morphisms of $V$ covering $(\rho_x)_{x\in B}$, and assume that $C R_x = R_x C$ for all $x$. Suppose that $O \subset M$ is in the domain of $\rho_x$; then define a map $\alpha_x$ on $\mathcal{B}(O)$ by setting for $\mathbf{f} \in \mathcal{B}(O)$ the $n$th component, $(\alpha_x \mathbf{f})_n$, of $\alpha_x \mathbf{f}$ to be
\[ (\alpha_x \mathbf{f})_n := (\boxtimes^n R_x^\star) f_n, \tag{4.2} \]
where
\[ (\boxtimes^n R_x^\star)(g^{(1)} \otimes \cdots \otimes g^{(n)}) := R_x^\star g^{(1)} \otimes \cdots \otimes R_x^\star g^{(n)}, \qquad g^{(j)} \in C_0^\infty(V), \]
defines the outer product action of $R_x^\star$ via linear extension on $C_0^\infty(\boxtimes^n V)$. It is not difficult to check that this yields a $*$-isomorphism $\alpha_x : \mathcal{B}(O) \to \mathcal{B}(\rho_x(O))$.

We will now turn $\mathcal{B}$ into a locally convex space by giving it the topology of the strict inductive limit of the topological vector spaces
\[ \mathcal{B}_n := \mathbb{C} \oplus \bigoplus_{k=1}^{n} C_0^\infty(\boxtimes^k V), \qquad n \in \mathbb{N}. \]
This topology is known as the locally convex direct sum topology (see e.g. [3, Chap. II, §4 n° 5]). Some important properties of $\mathcal{B}$, equipped with this topology, are given in the following lemma, the proof of which will be deferred to Appendix B.

Lemma 4.1. With the topology given above, $\mathcal{B}$ is complete and a topological $*$-algebra. Moreover, a linear functional $u : \mathcal{B} \to \mathbb{C}$ is continuous if and only if there is a sequence $(u_n)_{n=0}^\infty$ with $u_0 \in \mathbb{C}$ and $u_j \in (C_0^\infty(\boxtimes^j V))'$ for $j \in \mathbb{N}$ so that
\[ u(\mathbf{f}) = u_0 f_0 + \sum_{j\in\mathbb{N}} u_j(f_j), \qquad \mathbf{f} \in \mathcal{B}. \tag{4.3} \]
If $\alpha$ is a $*$-automorphism lifting a morphism $R$ of $V$ to $\mathcal{B}$ as in (4.2), then $\alpha$ is continuous. Moreover, let $(R_x)_{x\in B}$ be a family of morphisms of $V$ depending smoothly on $x$ with $C R_x = R_x C$ and $R_0 = \mathrm{id}_V$, and let $(\alpha_x)_{x\in B}$ be the family of $*$-automorphisms of $\mathcal{B}$ induced according to (4.2). Then for each $\mathbf{f} \in \mathcal{B}$ it holds that
\[ \alpha_x(\mathbf{f}) \to \mathbf{f} \quad \text{for} \quad x \to 0, \tag{4.4} \]
and there is a constant $r > 0$ such that to each continuous semi-norm $\sigma$ of $\mathcal{B}$ one can find another semi-norm $\sigma'$ with the property
\[ \sigma(\alpha_x(\mathbf{f})) \le \sigma'(\mathbf{f}), \qquad |x| \le r,\ \mathbf{f} \in \mathcal{B}. \tag{4.5} \]
4.2. States and quantum fields. A state $\omega$ on $\mathcal{B}$ is a continuous linear form on $\mathcal{B}$ which fulfills the positivity requirement $\omega(\mathbf{f}^* \mathbf{f}) \ge 0$ for all $\mathbf{f} \in \mathcal{B}$. By Lemma 4.1 such a state $\omega$ is completely characterized by a set $\{\omega_n \mid n \in \mathbb{N}_0\}$ of linear functionals $\omega_n \in (C_0^\infty(\boxtimes^n V))'$, the so-called $n$-point functions. The positivity requirement allows one to associate with any state $\omega$ a Hilbert-space $*$-representation by the well-known Gelfand–Naimark–Segal (GNS) construction (or the Wightman reconstruction theorem [41]). More precisely, given a state $\omega$ on $\mathcal{B}$, there exists a triple $(\varphi, \mathcal{D} \subset \mathcal{H}, G)$, called GNS-representation of $\omega$, possessing the following properties:
(a) $\mathcal{H}$ is a Hilbert space, and $\mathcal{D}$ is a dense linear subspace of $\mathcal{H}$.
(b) $\varphi$ is a $*$-representation of $\mathcal{B}$ on $\mathcal{H}$ by closable operators with common domain $\mathcal{D}$.
(c) $G$ is a unit vector contained in $\mathcal{D}$ which is cyclic, i.e. $\mathcal{D} = \varphi(\mathcal{B})G$, and has the property that $\omega(\mathbf{f}) = \langle G, \varphi(\mathbf{f}) G \rangle$, $\mathbf{f} \in \mathcal{B}$.

Furthermore, the GNS-representation is unique up to unitary equivalence. We refer to [40, Part II] for further details on $*$-representations of $*$-algebras as well as for a proof of these statements and references to the relevant original literature.

Therefore, a state $\omega$ on $\mathcal{B}$ induces a quantum field, that is to say, an operator-valued distribution
\[ C_0^\infty(V) \ni f \mapsto H(f) := \varphi(\mathbf{f}), \qquad \mathbf{f} = (0, f, 0, 0, \dots), \tag{4.6} \]
where the $H(f)$ are, for each $f \in C_0^\infty(V)$, closable operators on the dense and invariant domain $\mathcal{D}$ and one has $H(C^\star f) \subset H(f)^*$, where $H(f)^*$ denotes the adjoint operator of $H(f)$. Conversely, such a quantum field induces states on $\mathcal{B}$: Given some unit vector $\psi \in \mathcal{D}$, the assignment
\begin{align*}
\omega^{(\psi)}(c \cdot 1_{\mathcal{B}}) &:= c, \qquad c \in \mathbb{C}, \\
\omega^{(\psi)}(f^{(1)} \otimes \cdots \otimes f^{(n)}) &:= \langle \psi, H(f^{(1)}) \cdots H(f^{(n)}) \psi \rangle, \qquad f^{(j)} \in C_0^\infty(V),
\end{align*}
defines, by linear extension, a state $\omega^{(\psi)}$ on $\mathcal{B}$. (Obviously this generalizes from vector states to mixed states.)

If the quantum field $H$ is an observable field, then one would require commutativity at causal separation, and this means $H(f)H(f') = H(f')H(f)$ whenever the supports of $f$ and $f'$ are causally separated. Such commutative behaviour (locality) of $H$ at causal separation is characteristic of bosonic fields. On the other hand, a field $H$ is fermionic if it anti-commutes at causal separation (twisted locality), i.e. $H(f)H(f') = -H(f')H(f)$ for causally separated supports of $f$ and $f'$. The general analysis of quantum field theory so far has shown that the alternative of having quantum fields of bosonic or fermionic character may largely be viewed as generic at least for spacetime dimensions greater than 2 [21, 41, 11, 19].

If $\omega$ is a state on $\mathcal{B}$ inducing via its GNS-representation a bosonic field, then it follows that the commutator $\omega_2^{(-)}$ of its two-point function, defined by
\[ \omega_2^{(-)}(f \otimes f') := \tfrac{1}{2}\left(\omega_2(f \otimes f') - \omega_2(f' \otimes f)\right), \qquad f, f' \in C_0^\infty(V), \]
vanishes as soon as the supports of $f$ and $f'$ are causally separated. If, on the other hand, $\omega$ induces a fermionic field, then the anti-commutator,
\[ \omega_2^{(+)}(f \otimes f') := \tfrac{1}{2}\left(\omega_2(f \otimes f') + \omega_2(f' \otimes f)\right), \qquad f, f' \in C_0^\infty(V), \]
of its two-point function vanishes when the supports of $f$ and $f'$ are causally separated. For our purposes in Sect. 5, we may assume a weaker version of bosonic or fermionic behaviour of quantum fields: We shall later suppose that $\omega_2^{(+)}$ or $\omega_2^{(-)}$ is smooth ($C^\infty$) at causal separation. The definition relevant for that terminology is as follows:
Definition 4.2. Let $w \in (C_0^\infty(V \boxtimes V))'$. We say that $w$ is smooth at causal separation if $\mathrm{WF}(w_Q) = \emptyset$, where $Q$ is the set of all pairs of points $(q, q') \in M \times M$ which are causally separated in $(M, g)$$^7$ and $w_Q$ denotes the restriction of $w$ to $C_0^\infty((V \boxtimes V)_Q)$.
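The following short computation, added here as a reader's aid, records how the symmetric and anti-symmetric parts recombine and how they behave under exchange of the arguments. The last line uses the standard fact that the wavefront set of a sum is contained in the union of the wavefront sets; the symbols $w$, $w'$ there stand for arbitrary distributions and are introduced only for this sketch.

```latex
\begin{align*}
  % the two parts add up to the full two-point function
  \omega_2^{(+)}(f\otimes f') + \omega_2^{(-)}(f\otimes f')
    &= \omega_2(f\otimes f'), \\
  % behaviour under exchange of the arguments
  \omega_2^{(\pm)}(f'\otimes f) &= \pm\,\omega_2^{(\pm)}(f\otimes f'), \\
  % generic fact for distributions w, w' (placeholders)
  \mathrm{WF}(w + w') &\subset \mathrm{WF}(w) \cup \mathrm{WF}(w').
\end{align*}
```

Consequently, if one of the two parts has empty wavefront set over $Q$, then the restriction of $\omega_2$ to $Q$ has the same wavefront set as the other part, since each of the two can be written as the sum of the other and a smooth term.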
4.3. Quasifree states. Of particular interest are quasifree states associated with quantum fields obeying canonical commutation relations (CCR) or canonical anti-commutation relations (CAR). A simple way of introducing them is via the characterization of such states given in [29] which we will basically follow here. Note, however, that in this reference the map $K$ in (4.7) is defined on certain quotients of $C_0^\infty(V^\circ)$ while we define $K$ on $C_0^\infty(V^\circ)$ itself (recall that $C_0^\infty(V^\circ)$ is the space of $C^\star$-invariant sections). This is due to the fact that we have not imposed CCR or CAR for states on the Borchers algebra, so the notion of quasifree states given here is, in this respect, more general.

Let $h$ be a complex Hilbert space (the so-called "one-particle Hilbert space") and $F_\pm(h)$ the bosonic/fermionic Fock-space over $h$. By $a_\pm(\cdot)$ and $a_\pm^\dagger(\cdot)$ we denote the corresponding annihilation and creation operators, respectively. The Fock-vacuum vector will be denoted by $G_\pm$. Then we say that a state $\omega$ on $\mathcal{B}$ is a (bosonic/fermionic) quasifree state if there exists a real-linear map
\[ K : C_0^\infty(V^\circ) \to h \tag{4.7} \]
whose complexified range is dense in $h$, such that the GNS-representation $(\varphi, \mathcal{D} \subset \mathcal{H}, G)$ of $\omega$ takes the following form: $\mathcal{H} = F_\pm(h)$, $G = G_\pm$, and
\[ H(f) = \frac{1}{\sqrt{2}}\left(a_\pm(K(f)) + a_\pm^\dagger(K(f))\right), \qquad f \in C_0^\infty(V^\circ), \]
where $H(\cdot)$ relates to $\varphi(\cdot)$ as in (4.6).

Quasifree states are in a sense the simplest states. It is, however, justified to consider prominently those states since for quantum fields obeying a linear wave-equation, ground- and KMS-states turn out to be quasifree in examples. Any quasifree state $\omega$ is entirely determined by its two-point function, i.e. by the map
\[ C_0^\infty(V) \times C_0^\infty(V) \ni (f^{(1)}, f^{(2)}) \mapsto \omega_2(f^{(1)} \otimes f^{(2)}) = \langle G, H(f^{(1)}) H(f^{(2)}) G \rangle, \]
in the sense that the $n$-point functions
\[ \omega_n(f^{(1)} \otimes \cdots \otimes f^{(n)}) = \langle G, H(f^{(1)}) \cdots H(f^{(n)}) G \rangle, \qquad f^{(j)} \in C_0^\infty(V), \]
vanish for all odd $n$, while the $n$-point functions for even $n$ can be expressed as polynomials in the variables $\omega_2(f^{(i)} \otimes f^{(j)})$, $i, j = 1, \dots, n$. This attaches particular significance to the two-point functions for quantum fields obeying linear wave equations. We refer to [4, 29, 1] for further discussion of quasifree states and their basic properties.

7 $Q$ is an open subset in $M \times M$ due to global hyperbolicity.
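To make the statement about even $n$ concrete, here is the case $n = 4$, written out as a sketch under the assumption that the standard Wick (bosonic) and Pfaffian-type (fermionic) expansion for Fock-vacuum expectation values applies; the text above does not spell this formula out. The shorthand $\omega_2(i,j) := \omega_2(f^{(i)} \otimes f^{(j)})$ is introduced here for brevity.

```latex
% Shorthand (introduced for this sketch): \omega_2(i,j) := \omega_2(f^{(i)} \otimes f^{(j)}).
\begin{align*}
  \text{bosonic:}\quad
  \omega_4(f^{(1)}\otimes f^{(2)}\otimes f^{(3)}\otimes f^{(4)})
    &= \omega_2(1,2)\,\omega_2(3,4) + \omega_2(1,3)\,\omega_2(2,4)
       + \omega_2(1,4)\,\omega_2(2,3), \\
  \text{fermionic:}\quad
  \omega_4(f^{(1)}\otimes f^{(2)}\otimes f^{(3)}\otimes f^{(4)})
    &= \omega_2(1,2)\,\omega_2(3,4) - \omega_2(1,3)\,\omega_2(2,4)
       + \omega_2(1,4)\,\omega_2(2,3).
\end{align*}
```

In both cases the order of the indices inside each factor follows the order of the fields, and the fermionic sign is the parity of the permutation that brings the paired indices next to each other.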
5. Passivity and Microlocal Spectrum Condition

In the present section we will state and prove our main result connecting passivity and microlocal spectrum condition for linear quantum fields obeying a hyperbolic wave equation on a globally hyperbolic, stationary spacetime. First, we need to collect the assumptions. It will be assumed that $V$ is a vector bundle, equipped with a conjugation $C$, over a base manifold $M$ carrying a time-orientable Lorentzian metric $g$, and that $(M, g)$ is globally hyperbolic. Moreover, we assume that the spacetime $(M, g)$ is stationary, so that there is a one-parametric $C^\infty$-group $\{\tau_t\}_{t\in\mathbb{R}}$ of isometries whose generating vector field, denoted by $\partial^\tau$, is everywhere timelike and future-pointing (with respect to a fixed time-orientation). We recall that the notation $\mathcal{N}_\pm$ for the future/past-oriented parts of the set of null-covectors $\mathcal{N}$ has been introduced in (3.7), and note that $(q, \xi) \in \mathcal{N}_\pm$ iff $\pm\xi(\partial^\tau) > 0$. It is furthermore supposed that there is a smooth one-parametric group $\{T_t\}_{t\in\mathbb{R}}$ of morphisms of $V$ covering $\{\tau_t\}_{t\in\mathbb{R}}$, and a wave operator $P$ on $C_0^\infty(V)$, having the following properties:
\[ T_t^\star \circ P = P \circ T_t^\star, \qquad C \circ T_t = T_t \circ C, \qquad t \in \mathbb{R}. \]
Now let $\mathcal{B}$ again denote the Borchers algebra as in Sect. 4. The automorphism group induced by lifting $\{T_t\}_{t\in\mathbb{R}}$ on $\mathcal{B}$ according to (4.2) will be denoted by $\{\alpha_t\}_{t\in\mathbb{R}}$. Hence, by Lemma 4.1, $(\mathcal{B}, \{\alpha_t\}_{t\in\mathbb{R}})$ is a topological $*$-dynamical system. Recall that a state $\omega$ on $\mathcal{B}$ is, by definition, contained in $\mathcal{P}$ if it is a convex combination of ground- or KMS-states at strictly positive inverse temperature for $\{\alpha_t\}_{t\in\mathbb{R}}$.

Theorem 5.1. Let $\omega \in \mathcal{P}$ and let $\omega_2$ be the two-point distribution of $\omega$ (see Sect. 4.1). Suppose that $\mathrm{WF}(\omega_2^{(P)}) = \emptyset = \mathrm{WF}(\omega_{2(P)})$, where $\omega_2^{(P)}$ and $\omega_{2(P)}$ are defined as in (3.8), and suppose also that the symmetric part $\omega_2^{(+)}$ or the anti-symmetric part $\omega_2^{(-)}$ of the two-point distribution is smooth at causal separation (Definition 4.2). Then it holds that $\mathrm{WF}(\omega_2) \subset \mathcal{R}$, where $\mathcal{R}$ is the set
\[ \mathcal{R} := \{(q, \xi; q', \xi') \in \mathcal{N}_- \times \mathcal{N}_+ : (q, \xi) \sim (q', -\xi')\}. \tag{5.1} \]
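To get a feeling for the set in (5.1), it may help to unpack it in the simplest stationary example. The sketch below assumes $m$-dimensional Minkowski spacetime with time translations as the isometry group; this choice, the coordinates $(y^0, \mathbf{y})$, and the parameter $s$ are illustrative assumptions and are not taken from the text. Lightlike geodesics are then straight lines and parallel transport is trivial.

```latex
% Assumption: (M,g) = Minkowski spacetime, g = diag(+1,-1,...,-1),
% tau_t(y^0, y) = (y^0 + t, y), so xi applied to the Killing field equals xi_0.
\begin{align*}
  \mathcal{N} &= \{(q,\xi) : \xi \neq 0,\ \xi_0^2 - |\boldsymbol{\xi}|^2 = 0\}, \\
  \mathcal{N}_\pm &= \{(q,\xi) \in \mathcal{N} : \pm\,\xi_0 > 0\}, \\
  (q,\xi) \sim (q',\xi') &\iff \xi' = \xi \ \text{and}\
     q'^{\mu} - q^{\mu} = s\, g^{\mu\nu}\xi_\nu \ \text{for some } s \in \mathbb{R}, \\
  \mathcal{R} &= \{(q,\xi;\, q',-\xi) : (q,\xi) \in \mathcal{N}_-,\
     q'^{\mu} - q^{\mu} = s\, g^{\mu\nu}\xi_\nu,\ s \in \mathbb{R}\}.
\end{align*}
```

In this example every element of the set has a past-directed null covector in the first slot and its negative in the second, the two energy components add up to zero, and the base points are either equal or joined by the light ray singled out by $\xi$.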
Proof. 1) Let $q$ be any point in $M$. Then there is a coordinate chart $\kappa = (y^0, \mathbf{y}) = (y^0, y^1, \dots, y^{m-1})$ around $q$ so that, for small $|t|$, $\kappa \circ \tau_t = \check\tau_t \circ \kappa$ holds on a neighbourhood of $q$, where $\check\tau_t(y^0, \mathbf{y}) := (y^0 + t, \mathbf{y})$. In such a coordinate system, we can also define "spatial" translations $\check\rho_x(y^0, \mathbf{y}) := (y^0, \mathbf{y} + x)$ for $x = (x^1, \dots, x^{m-1})$ in a sufficiently small neighbourhood $B$ of the origin in $\mathbb{R}^{m-1}$. Let $(R_x)_{x\in B}$ be any smooth family of local morphisms around $q$ covering $(\rho_x)_{x\in B}$, where $\rho_x := \kappa^{-1} \circ \check\rho_x \circ \kappa$ (on a sufficiently small neighbourhood of $q$). Now let $q'$ be another point, and choose in an analogous manner as for $q$ a coordinate system $\kappa'$, and $(\rho'_{x'})_{x'\in B'}$ and $(R'_{x'})_{x'\in B'}$.
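2) In a further step we shall now establish the relation
\[ \mathrm{WF}(\omega_2) \subset \{(q, \xi; q', \xi') \in (T^*M \times T^*M) \setminus \{0\} : \xi(\partial^\tau) + \xi'(\partial^\tau) = 0,\ \xi'(\partial^\tau) \ge 0\}. \tag{5.2} \]
Since we have $\mathrm{WF}(\omega_2) \subset \mathcal{N} \times \mathcal{N}$ by Prop. 3.3, this then allows us to conclude that
\[ \mathrm{WF}(\omega_2) \subset \{(q, \xi; q', \xi') \in \mathcal{N}_- \times \mathcal{N}_+ : \xi(\partial^\tau) + \xi'(\partial^\tau) = 0\}, \tag{5.3} \]
and we observe that thereby the possibility $(q, \xi; q', \xi') \in \mathrm{WF}(\omega_2)$ with $\xi = 0$ or $\xi' = 0$ is excluded, because that would entail both $\xi = 0$ and $\xi' = 0$.

For proving (5.2) it is, in view of Lemma 3.1 and according to our choice of the coordinate systems $\kappa, \kappa'$ and corresponding actions $(R_x)_{x\in B}$ and $(R'_{x'})_{x'\in B'}$, sufficient to demonstrate that the following holds: There is a function $h \in C_0^\infty(\mathbb{R}^m \times \mathbb{R}^m)$ with $h(0) = 1$, and for each $(\xi; \xi') = (\xi_0, \boldsymbol{\xi}; \xi'_0, \boldsymbol{\xi}') \in (\mathbb{R}^m \times \mathbb{R}^m) \setminus \{0\}$ with $\xi_0 + \xi'_0 \neq 0$ or $\xi'_0 < 0$ there is an open neighbourhood $V \subset (\mathbb{R}^m \times \mathbb{R}^m) \setminus \{0\}$ so that
\[ \sup_{(k;k')\in V} \left| \int e^{-i\lambda^{-1}(t k_0 + x\cdot\mathbf{k})}\, e^{-i\lambda^{-1}(t' k'_0 + x'\cdot\mathbf{k}')}\, h(t, x; t', x')\, \omega_2\big((T_t^\star R_x^\star \otimes T_{t'}^\star R'^{\star}_{x'}) F_\lambda\big)\, dt\, dt'\, dx\, dx' \right| = O^\infty(\lambda) \tag{5.4} \]
as $\lambda \to 0$ holds for all $(F_\lambda)_{\lambda>0} \in F_{(q;q')}(V \boxtimes V)$. (The notation $k = (k_0, \mathbf{k})$ should be obvious.) However, making use of part (c) of the statement of Prop. 2.1 in [44], for proving (5.4) it is actually enough to show that there are $h$ and $V$ as above so that
\[ \sup_{(k;k')\in V} \left| \int e^{-i\lambda^{-1}(t k_0 + x\cdot\mathbf{k})}\, e^{-i\lambda^{-1}(t' k'_0 + x'\cdot\mathbf{k}')}\, h(t, x; t', x')\, \omega_2\big(T_t^\star R_x^\star f_\lambda \otimes T_{t'}^\star R'^{\star}_{x'} f'_\lambda\big)\, dt\, dt'\, dx\, dx' \right| = O^\infty(\lambda) \tag{5.5} \]
as $\lambda \to 0$ holds for all $(f_\lambda)_{\lambda>0} \in F_q(V)$ and all $(f'_\lambda)_{\lambda>0} \in F_{q'}(V)$.

In order now to exploit the strict passivity of $\omega$ via Prop. 2.1, we define the set $\mathbf{B}_2$ of testing families with respect to the Borchers algebra $\mathcal{B}$ in the same manner as we have defined the set $\mathbf{A}_2$ of testing families for the algebra $\mathcal{A}$ in Remark 2.2. In other words, a $\mathcal{B}$-valued family $(\mathbf{f}_{z,\lambda})_{\lambda>0, z\in\mathbb{R}^n}$ is a member of $\mathbf{B}_2$, for arbitrary $n \in \mathbb{N}$, whenever for each continuous seminorm $\sigma$ on $\mathcal{B}$ there is some $s \ge 0$ so that
\[ \sup_{z,\lambda}\ \lambda^s\, \sigma(\mathbf{f}_{z,\lambda}^*\, \mathbf{f}_{z,\lambda}) < \infty. \]
Now if $(f_\lambda)_{\lambda>0}$ is in $F_q(V)$, then $(\mathbf{f}_{x,\lambda})_{\lambda>0, x\in B}$ defined by
\[ \mathbf{f}_{x,\lambda} := (0, R_x^\star f_\lambda, 0, 0, \dots) \tag{5.6} \]
is easily seen to be a testing family in $\mathbf{B}_2$. The same of course holds when taking any $(f'_\lambda)_{\lambda>0} \in F_{q'}(V)$ and defining $(\mathbf{f}'_{x',\lambda})_{\lambda>0, x'\in B'}$ accordingly. Since $\omega \in \mathcal{P}$, it follows from Prop. 2.1 and Remark 2.2 that, with respect to the time-translation group $\{\alpha_t\}_{t\in\mathbb{R}}$,
\[ \mathrm{ACS}^2_{\mathbf{B}_2}(\omega) \subset \{(\xi_0, \xi'_0) \in \mathbb{R}^2 \setminus \{0\} : \xi_0 + \xi'_0 = 0,\ \xi'_0 \ge 0\}. \]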
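And this means that there is some $h_0 \in C_0^\infty(\mathbb{R}^2)$ with $h_0(0) = 1$, and for each $(\xi_0, \xi'_0) \in \mathbb{R}^2 \setminus \{0\}$ with $\xi_0 + \xi'_0 \neq 0$ or $\xi'_0 < 0$ an open neighbourhood $V_0$ in $\mathbb{R}^2 \setminus \{0\}$ so that
\[ \sup_{(k_0, k'_0)\in V_0,\ x, x'} \left| \int e^{-i\lambda^{-1}(t k_0 + t' k'_0)}\, h_0(t, t')\, \omega\big(\alpha_t(\mathbf{f}_{x,\lambda})\, \alpha_{t'}(\mathbf{f}'_{x',\lambda})\big)\, dt\, dt' \right| = O^\infty(\lambda) \tag{5.7} \]
as $\lambda \to 0$ for all $(\mathbf{f}_{x,\lambda})_{\lambda>0, x\in B}$ and $(\mathbf{f}'_{x',\lambda})_{\lambda>0, x'\in B'}$ in $\mathbf{B}_2$. When $(\mathbf{f}_{x,\lambda})_{\lambda>0, x\in B} \in \mathbf{B}_2$ relates to $(f_\lambda)_{\lambda>0} \in F_q(V)$ as in (5.6), and if their primed counterparts are likewise related, then for sufficiently small $|t|$, $|t'|$ and $x \in B$, $x' \in B'$ one has
\[ \omega\big(\alpha_t(\mathbf{f}_{x,\lambda})\, \alpha_{t'}(\mathbf{f}'_{x',\lambda})\big) = \omega_2\big(T_t^\star R_x^\star f_\lambda \otimes T_{t'}^\star R'^{\star}_{x'} f'_\lambda\big) \]
for small enough $\lambda$. Hence, upon taking
\[ V = \{(k_0, \mathbf{k}; k'_0, \mathbf{k}') : (k_0, k'_0) \in V_0,\ \mathbf{k}, \mathbf{k}' \in \mathbb{R}^{m-1}\} \]
and $h(t, x; t', x') = h_0(t, t')\, \tilde h(x, x')$, where $\tilde h$ is in $C_0^\infty(\mathbb{R}^{m-1} \times \mathbb{R}^{m-1})$ with $\tilde h(0) = 1$, and with $h_0$ and $\tilde h$ having sufficiently small supports, it is now easy to see that (5.7) entails the required relation (5.5), proving (5.2), whence (5.3) is also established.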
3) Now we shall show that the assumption that $\omega_2^{(+)}$ is smooth at causal separation implies that also $\omega_2^{(-)}$, and hence $\omega_2$ itself, is smooth at causal separation. The same conclusion can be drawn assuming instead that $\omega_2^{(-)}$ is smooth at causal separation. We will present the proof only for the first mentioned case, the argument for the second being completely analogous.

We define $Q$ as the set of pairs of causally separated points $(q, q') \in M \times M$. The restriction of $\omega_2$ to $C_0^\infty((V \boxtimes V)_Q)$ will be denoted by $\omega_{2Q}$, and similarly for $\omega_2^{(\pm)}$. By assumption, $\omega_{2Q}^{(+)}$ has empty wavefront set and therefore $\mathrm{WF}(\omega_{2Q}) = \mathrm{WF}(\omega_{2Q}^{(-)})$. Since $(q, q') \in Q$ iff $(q', q) \in Q$, the "flip" map $\rho : (q, q') \mapsto (q', q)$ is a diffeomorphism of $Q$. Then
\[ R : V_q \otimes V_{q'} \ni v_q \otimes v'_{q'} \mapsto v'_{q'} \otimes v_q \in V_{q'} \otimes V_q \tag{5.8} \]
is a morphism of $(V \boxtimes V)_Q$ covering $\rho$. Thus one finds $[R^\star(f \otimes f')](q, q') = f'(q) \otimes f(q')$, implying
\[ \omega_{2Q}^{(-)}(R^\star(f \otimes f')) = \omega_{2Q}^{(-)}(f' \otimes f) = -\omega_{2Q}^{(-)}(f \otimes f') \]
for all $f \otimes f' \in C_0^\infty((V \boxtimes V)_Q)$. Noting that multiplication by constants different from zero does not change the wavefront set of a distribution, this entails, with Lemma 3.2,
\[ \mathrm{WF}(\omega_{2Q}^{(-)}) = \mathrm{WF}(\omega_{2Q}^{(-)} \circ R^\star) = {}^tD\rho\, \mathrm{WF}(\omega_{2Q}^{(-)}). \tag{5.9} \]
Now it is easy to check that
\[ {}^tD\rho\,(q, \xi; q', \xi') = (q', \xi'; q, \xi) \]
for all $(q, \xi; q', \xi') \in T^*M \times T^*M$, and this implies
\[ {}^tD\rho\,(\mathcal{N}_- \times \mathcal{N}_+) = \mathcal{N}_+ \times \mathcal{N}_-. \tag{5.10} \]
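The action of the transposed tangent map of the flip can be verified in coordinates. The sketch below assumes product coordinates $(y, y')$ on $M \times M$ near $(q, q')$ and writes $I$ for the $m \times m$ identity matrix; these names are introduced only for this illustration.

```latex
% (y, y') are local product coordinates near (q, q'); I is the m x m identity.
\begin{align*}
  \check\rho(y, y') &= (y', y), \qquad
  D\check\rho = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} = {}^tD\check\rho, \\
  {}^tD\check\rho \begin{pmatrix} \xi \\ \xi' \end{pmatrix}
    &= \begin{pmatrix} \xi' \\ \xi \end{pmatrix}, \qquad
  \rho^{-1}(q, q') = (q', q), \\
  % consequence for the product of the past and future parts of the null covectors
  (q, \xi;\, q', \xi') \in \mathcal{N}_- \times \mathcal{N}_+
    &\;\longmapsto\; (q', \xi';\, q, \xi) \in \mathcal{N}_+ \times \mathcal{N}_- .
\end{align*}
```

Since the Jacobian of the flip is a symmetric block permutation matrix, transposition changes nothing, and the only effect is to exchange both the base points and the covectors.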
However, since we already know from (5.3) that $\mathrm{WF}(\omega_{2Q}) \subset \mathcal{N}_- \times \mathcal{N}_+$ and $\mathrm{WF}(\omega_{2Q}^{(+)}) = \emptyset$, we see that $\mathrm{WF}(\omega_{2Q}^{(-)}) \subset \mathcal{N}_- \times \mathcal{N}_+$. Combining this with (5.9) and (5.10) yields
\[ \mathrm{WF}(\omega_{2Q}^{(-)}) \subset (\mathcal{N}_- \times \mathcal{N}_+) \cap (\mathcal{N}_+ \times \mathcal{N}_-) = \emptyset. \]
And thus we conclude that $\omega_2$ is smooth at causal separation.

4) Now we will demonstrate that the wavefront set has the form (5.1) for points $(q, q)$ on the diagonal in $M \times M$, by demonstrating that otherwise singularities for causally separated points would occur according to the propagation of singularities (Prop. 3.3). To this end, let $(q, \xi; q, \xi')$ be in $\mathrm{WF}(\omega_2)$ with $\xi$ not parallel to $\xi'$. In view of the observation made below (5.3) that we must have $\xi \neq 0$ and $\xi' \neq 0$, we obtain from Prop. 3.3
\[ B(q, \xi) \times B(q, \xi') \subset \mathrm{WF}(\omega_2). \]
For any Cauchy surface of $M$, one can find $(p, \eta; p', \eta')$ in $B(q, \xi) \times B(q, \xi')$ with $p$ and $p'$ lying on that Cauchy surface because of the inextendibility of the bi-characteristics. Since $\xi$ is not parallel to $\xi'$, one can even choose that Cauchy surface so that $p \neq p'$ (if such a choice were not possible, the bi-characteristics through $q$ with cotangent $\xi$ and $\xi'$ would coincide). But this is in contradiction to the result of 3) since $p$ and $p'$ are causally separated. Hence, only $(q, \xi; q, \xi')$ with $\xi = \lambda\xi'$, $\lambda \in \mathbb{R}$, can be in $\mathrm{WF}(\omega_2)$. Applying the constraint $\xi(\partial^\tau) + \xi'(\partial^\tau) = 0$ found in (5.3) gives $\lambda = -1$. Together with the other constraint $\mathrm{WF}(\omega_2) \subset \mathcal{N}_- \times \mathcal{N}_+$ of (5.3) we now see that if $(q, \xi; q, \xi')$ is in $\mathrm{WF}(\omega_2)$ it must be in $\mathcal{R}$.

5) It will be shown next that $\omega_2$ is smooth at points $(q, q')$ in $M \times M$ which are causally related but not connected by any lightlike geodesic: Suppose $(q, \xi; q', \xi')$ were in $\mathrm{WF}(\omega_2)$ with $q, q'$ as described. Using global hyperbolicity and the inextendibility of the bi-characteristics, we can then find $(p, \eta)$ in $B(q, \xi)$ with $p$ lying on the same Cauchy surface as $q'$. As $p$ cannot be equal to $q'$ by assumption, it must be causally separated from $q'$, and so we have by Prop. 3.3 a contradiction to 3). Thus, $\omega_2$ must indeed be smooth at $(q, q')$.

6) Finally, we consider the case of points $(q, q')$ connected by at least one lightlike geodesic: Let $(q, \xi; q', \xi')$ be in $\mathrm{WF}(\omega_2)$. To begin with, we assume additionally that $\xi$ is not co-tangential to any of the lightlike geodesics connecting $q$ and $q'$. As in 4) we then find $(p, \eta; p', \eta')$ in $B(q, \xi) \times B(q', \xi')$ with $p$ and $p'$ lying on the same Cauchy surface and $p \neq p'$, thus establishing a contradiction to 3). To cover the remaining case, let $\xi$ be co-tangential to one of the lightlike geodesics connecting $q$ and $q'$. As a consequence, we find $\eta$ with $(q', \eta) \in B(q, \xi)$. By 4), we have $\eta = -\xi'$, $\xi' \rhd 0$, showing $(q, \xi; q', \xi')$ to be in $\mathcal{R}$. $\Box$

We conclude this article with a few remarks. First we mention that for the canonically quantized scalar Klein–Gordon field, $\mathrm{WF}(\omega_2) \subset \mathcal{R}$ implies $\mathrm{WF}(\omega_2) = \mathcal{R}$ and thus the two-point function of every strictly passive state is of Hadamard form, see [35]. Results allowing similar conclusions for vector-valued fields subject to CCR or CAR will appear in [38]. In [27], quasifree ground states have been constructed for the scalar Klein–Gordon field on stationary, globally hyperbolic spacetimes where the norm of the Killing vector field is globally bounded away from zero. Our result shows that they all have two-point functions of Hadamard form. As mentioned in the introduction, quasifree ground- and KMS-states have also been constructed for the scalar Klein–Gordon field on Schwarzschild spacetime [28], and again we conclude that their two-point functions are of Hadamard form.
Passivity and Microlocal Spectrum Condition
In [18], massive vector fields are quantized on globally hyperbolic, ultrastatic spacetimes using (apparently) a ground state representation, and our methods apply also in this case.
Appendix A. Ground- and KMS-States, Passivity
Let (A, {αt}t∈R) be a topological ∗-dynamical system as described in Sect. 2. We recall that a continuous linear functional ω : A → C is called a state if ω(A∗A) ≥ 0 for all A ∈ A and ω(1A) = 1. Now let
\[ \hat f(t) := \frac{1}{\sqrt{2\pi}} \int e^{-ipt} f(p)\,dp, \qquad f \in C_0^\infty(\mathbb{R}), \]
denote the Fourier-transform. Note that fˆ extends to an entire analytic function of t ∈ C. Then a convenient way of defining ground- and KMS-states is the following: The state ω is a ground state for (A, {αt}t∈R) if R ∋ t ↦ ω(Aαt(B)) is, for each A, B ∈ A, a bounded function and if moreover,
\[ \int_{-\infty}^{\infty} \hat f(t)\,\omega(A\alpha_t(B))\,dt = 0, \qquad A, B \in \mathcal{A}, \tag{A.1} \]
holds for all f ∈ C0∞((−∞, 0)). The state ω is a KMS state at inverse temperature β > 0 for (A, {αt}t∈R) if R ∋ t ↦ ω(Aαt(B)) is, for each A, B ∈ A, a bounded function and if moreover,
\[ \int_{-\infty}^{\infty} \hat f(t)\,\omega(A\alpha_t(B))\,dt = \int_{-\infty}^{\infty} \hat f(t+i\beta)\,\omega(\alpha_t(B)A)\,dt, \qquad A, B \in \mathcal{A}, \tag{A.2} \]
holds for all f ∈ C0∞(R). The state ω is a KMS state at inverse temperature β = 0 if ω is {αt}t∈R-invariant and a trace, i.e.
\[ \omega(AB) = \omega(BA), \qquad A, B \in \mathcal{A}. \tag{A.3} \]
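As a consistency check on (A.2) and (A.3), the following sketch works out the familiar finite-dimensional Gibbs state. The setting (a matrix algebra, a Hermitian matrix H, and the normalization Z) is an assumption made purely for illustration and is not taken from the appendix itself.

```latex
% Illustrative assumptions (not from the text):
%   \mathcal{A} = M_n(\mathbb{C}),  H = H^* \in M_n(\mathbb{C}),
%   \alpha_t(B) := e^{itH} B e^{-itH},
%   \omega(A) := Z^{-1}\,\mathrm{Tr}(e^{-\beta H} A),  Z := \mathrm{Tr}\,e^{-\beta H}.
\begin{align*}
F(z) &:= \omega\bigl(A\,\alpha_z(B)\bigr)
      = Z^{-1}\,\mathrm{Tr}\bigl(e^{-\beta H} A\, e^{izH} B\, e^{-izH}\bigr)
      \quad\text{is entire in } z \in \mathbb{C}, \\
F(t+i\beta)
     &= Z^{-1}\,\mathrm{Tr}\bigl(e^{-\beta H} A\, e^{-\beta H} e^{itH} B\, e^{-itH} e^{\beta H}\bigr) \\
     &= Z^{-1}\,\mathrm{Tr}\bigl(e^{-\beta H}\,\alpha_t(B)\,A\bigr)
      = \omega\bigl(\alpha_t(B)A\bigr)
      \quad\text{(cyclicity of the trace)}, \\
\int_{-\infty}^{\infty} \hat f(t+i\beta)\,\omega\bigl(\alpha_t(B)A\bigr)\,dt
     &= \int_{-\infty}^{\infty} \hat f(t+i\beta)\,F(t+i\beta)\,dt
      = \int_{-\infty}^{\infty} \hat f(t)\,F(t)\,dt .
\end{align*}
% The last equality shifts the contour from \mathbb{R}+i\beta to \mathbb{R}:
% F is bounded on the strip 0 \le \mathrm{Im}\,z \le \beta, and \hat f decays
% rapidly there because f \in C_0^\infty(\mathbb{R}).  This is (A.2).
% For \beta = 0 the same formula gives \omega(A) = \mathrm{Tr}(A)/n, which is
% \alpha_t-invariant and a trace, in line with (A.3).
```

In this toy case the boundedness requirement on t ↦ ω(Aαt(B)) holds automatically, since αt acts by unitary conjugation.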
(Note that we have here additionally imposed {αt}t∈R-invariance in the definition of KMS state at β = 0. Other references define a KMS state at β = 0 just by requiring it to be a trace. The invariance does not follow from that, cf. [4].)
We note that various other, equivalent definitions of ground- and KMS-states are known (mostly formulated for the case that (A, {αt}t∈R) is a C∗-dynamical system), see e.g. [4] and [39] as well as references cited there. The term “KMS” stands for Kubo, Martin and Schwinger who introduced and used the first versions of condition (A.2). The significance of KMS-states as thermal equilibrium states, particularly for infinite systems in quantum statistical mechanics, has been established in [22].
The following properties of any state ω that is a ground state or a KMS-state at inverse temperature β > 0 are standard in the setting of C∗-dynamical systems, and the proofs known for this case carry over to topological ∗-dynamical systems:
H. Sahlmann, R. Verch
(i) ω is {αt}t∈R-invariant,
(ii) −iω(Aδ(A)) ≥ 0 for all A = A∗ ∈ D(δ) (where δ and D(δ) are as introduced at the beginning of Sect. 2).
Let us indicate how one proceeds in proving these statements. We first consider the case where ω is a ground state. Since A contains a unit element, the ground state condition (A.1) says that for any A ∈ A the Fourier-transform of the function t → ω(αt(A)) vanishes on (−∞, 0). For A = A∗, that Fourier-transform is symmetric and hence is supported at the origin. As t → ω(αt(A)) is bounded, its Fourier-transform can thus only be a multiple of the Dirac-distribution. This entails that t → ω(αt(A)) is constant. By linearity, this carries over to arbitrary A ∈ A, and thus ω is {αt}t∈R-invariant.
Now we may pass to the GNS-representation (ϕ, D ⊂ H, G) of ω (cf. Sect. 4 where this object was introduced for the Borchers-algebra, but the construction can be carried out for topological ∗-algebras, see [40]) and we observe that, if ω is invariant, then {αt}t∈R is in the GNS-representation implemented by a strongly continuous unitary group {Ut}t∈R leaving G as well as the domain D = ϕ(A)G invariant. This unitary group is defined by
\[ U_t\,\varphi(A)G := \varphi(\alpha_t A)G, \qquad A \in \mathcal{A},\ t \in \mathbb{R}. \]
Since it is continuous, it possesses a selfadjoint generator H, i.e. Ut = e^{itH}, and the ground state condition implies that the spectrum of H is contained in [0, ∞). Therefore, one has for all A ∈ D(δ),
\[ -i\,\omega(A^{*}\delta(A)) = \langle \varphi(A)G,\ H\varphi(A)G \rangle \ge 0, \]
and this entails property (ii).
Now let ω be a KMS-state at inverse temperature β > 0. For the proof of its {αt}t∈R-invariance, see Prop. 4.3.2 in [39]. Property (ii) is then a consequence of the so-called “auto-correlation lower bounds”, see [39, Thm. 4.3.16] or [4, Thm. 5.3.15]. (Note that the proofs of the cited theorems generalize to the case where ω is a state on a topological ∗-algebra.)
B. Proof of Lemma 4.1
We now want to give the proof of Lemma 4.1. First, we state some properties of the strict inductive limit of a sequence of locally convex spaces, the topology given to B being a specific example. See for example [3, II, §4] for proofs as well as for further details. Let (En)n∈N be a sequence of locally convex linear spaces such that En ⊂ En+1 and the relative topology of En in En+1 coincides with the genuine topology of En for all n ∈ N. Let E be the inductive limit of the En, denote by πn : En ↪ E the canonical imbeddings of the En into E and let F be some locally convex space. In this situation, we have:
(a) E is a locally convex space.
(b) A map f : E → F is continuous iff f ◦ πn is continuous for each n in N.
(c) A family of maps (fι)ι, fι : E → F is equicontinuous iff the family (fι ◦ πn)ι is equicontinuous for each n in N.
(d) The relative topology of En in E coincides with the genuine topology of En.
(e) If the En are complete, so is E.
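To make properties (a)–(e) concrete, here is a minimal worked instance. The choice E_n = C^n is an illustrative assumption, far simpler than the space B of the lemma, and the maps are taken to be linear.

```latex
% Illustrative assumption (not from the text): E_n := \mathbb{C}^n, embedded in
% E_{n+1} = \mathbb{C}^{n+1} by (z_1,\dots,z_n) \mapsto (z_1,\dots,z_n,0).
\begin{align*}
E &= \bigcup_{n \ge 1} \mathbb{C}^n
   = \bigl\{ (z_k)_{k \ge 1} : z_k = 0 \text{ for all but finitely many } k \bigr\}, \\
\pi_n &: \mathbb{C}^n \hookrightarrow E, \qquad
   \pi_n(z_1,\dots,z_n) = (z_1,\dots,z_n,0,0,\dots).
\end{align*}
% Hypotheses: each \mathbb{C}^n carries its unique Hausdorff vector space topology,
% so the relative topology of E_n in E_{n+1} is the genuine topology of E_n.
% (b): a linear map f : E \to F is continuous iff each f \circ \pi_n :
%      \mathbb{C}^n \to F is continuous.  Linear maps on finite-dimensional
%      spaces are always continuous, hence every linear map on E is continuous.
% (d): E induces on each \mathbb{C}^n its usual topology.
% (e): each \mathbb{C}^n is complete, hence so is E.
```

The proof of the lemma uses exactly this pattern: a continuity question on the limit space is reduced to the finitely many summands of each Bn.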
Now we can prove the lemma: Because of (e), B is complete. The characterization (4.3) of the continuous linear forms on B is a special case of (b). We want to check now that B is a topological ∗-algebra, i.e. that its ∗-operation is continuous and its multiplication m : B × B → B separately continuous in both entries: For f ∈ B let mf : B → B be the right multiplication with f and denote by [f] the smallest integer such that fk = 0 for all k > [f]. By (b), showing continuity of mf amounts to showing the continuity of the maps mf ◦ πn : Bn → Bn+[f], where by (d), we can take the topologies involved to be the genuine topologies of the respective spaces. As those topologies are direct sum topologies with finitely many summands, the question of continuity can be further reduced, finding that mf is continuous iff the maps C0∞(⊠ⁿV) ∋ g ↦ g ⊗ f ∈ C0∞(⊠ⁿ⁺ᵏV) are continuous for all f ∈ C0∞(⊠ᵏV), k ∈ N. That this is indeed the case can be checked by taking recourse to the topologies of the C0∞(⊠ˡV). Therefore, the maps mf ◦ πn are continuous for all n, which in turn shows the continuity of right multiplication on B. In the same way, the proof of continuity of the ∗-operation reduces to showing continuity of the ∗-operation (4.1) on C0∞(⊠ⁿV), which in turn is easy. The continuity of the left-multiplication can be proven in complete analogy to that of right-multiplication or inferred from it, using the continuity of the ∗-operation. Therefore, B equipped with the locally convex direct sum topology is indeed a topological ∗-algebra.
For the proof of the last statements of the lemma, let αx be a ∗-homomorphism of B which is the lift of a morphism Rx of V covering ρx as stated in Lemma 4.1. Because of (b) and (d) above, αx is continuous iff its restrictions αx ◦ πn : Bn → Bn are continuous, which in turn is the case iff the maps ⊗ⁿRx⋆ : C0∞(⊠ⁿV) → C0∞(⊠ⁿV) are continuous.
That the ⊗ⁿRx⋆ are indeed continuous follows from density of ⊗ⁿC0∞(V) in C0∞(⊠ⁿV) together with the continuity of Rx⋆ on C0∞(V), the latter of which can again be checked by inspection of the topology on C0∞(V). For the proof of the continuity property (4.4), note that [αx(f)] = [f] for all x, thus it suffices to prove the convergence of αx(f) for x → 0 in the topology induced on B[f]. But this convergence is implied by the assumed smoothness of Rx (hence of Rx⋆) in x together with (d). The proof of (4.5) amounts to showing that (αx)|x|≤r is an equicontinuous set of maps. By (c) and (d), the proof can in the by now familiar way be reduced to proving equicontinuity of (Rx⋆)|x|≤r for some r > 0. For the proof of the latter, note that because of the assumed smoothness of ρx in x we find r > 0 such that for each compact set K ⊂ M the set ∪|x|<r ρx(K) is contained in some other compact set. Inspection of the …
… > 0,
\[ \log|\det S(-re^{i\theta})| > -C_\delta r^n, \qquad r > r_0,\ \theta \in (0,\theta_0) \setminus (r),\ |(r)| < \delta, \tag{4} \]
which follows from [4, Lemma 1] and the minimum modulus estimates for functions of finite order in an angle (see [2, Theorem 56] where it is a consequence of the Carleman inequality recalled in [4, (4.2)]; see also [5, Sect. 7] for a direct argument leading to a similar estimate). As in the proof of [4, Lemma 2], Cartan’s lemma gives that
\begin{align*}
|\operatorname{Re} g_{\rho,\tau}(re^{i\theta})| &\le C_\delta \tau^n, && \theta \in (0,\theta_0) \setminus (r),\ |(r)| < \delta,\ |re^{i\theta} - \tau| < \tau/B, \\
\operatorname{Re} g_{\rho,\tau}(\lambda) &\le C\tau^n, && |\tau - \lambda| < \tau/B,\ \operatorname{Im}\lambda \ge 0,
\end{align*}
V. Petkov, M. Zworski
where we used (4) to obtain the first inequality. We can now apply the Harnack inequality correctly to conclude that
\[ |\operatorname{Re} g_{\rho,\tau}(\lambda)| \le C\gamma^{-1}\tau^n, \qquad \operatorname{Im}\lambda > \gamma\tau,\ |\lambda - \tau| \le \tau/2B. \]
From the unitarity of the scattering matrix we also have that
\[ \operatorname{Re} g_{\rho,\tau}(\bar\lambda) = -\operatorname{Re} g_{\rho,\tau}(\lambda). \]
To obtain a lower bound everywhere we apply the following lemma which was kindly pointed out to us by W. K. Hayman:
Lemma. Suppose that u is harmonic in D(0, 1), and
\[ |u(z)| \le \frac{K}{|\operatorname{Im} z|}, \qquad u(z) = -u(\bar z), \qquad z \in D(0,1). \]
Then, for every ε > 0, there exists C = C(ε) such that
\[ |u(z)| \le CK|\operatorname{Im} z|, \qquad z \in D(0, 1-\epsilon). \]
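Proof. We use the Poisson formula and the symmetry (and we can assume that the hypotheses hold in a slightly bigger disc):
\begin{align*}
u(re^{i\theta}) &= \frac{1}{2\pi}\int_0^{2\pi} \frac{(1-r^2)\,u(e^{i\varphi})}{1-2r\cos(\theta-\varphi)+r^2}\,d\varphi \\
&= \frac{1}{\pi}\int_0^{\pi} \frac{2(1-r^2)\,r\sin\theta\,\sin\varphi\;u(e^{i\varphi})}{\bigl(1-2r\cos(\theta-\varphi)+r^2\bigr)\bigl(1-2r\cos(\theta+\varphi)+r^2\bigr)}\,d\varphi.
\end{align*}
Since we know that
\[ |u(e^{i\varphi})| \le K/\sin\varphi, \qquad 0 \le \varphi \le \pi, \]
we conclude that for r < 1 − ε,
\[ |u(z)/\operatorname{Im} z| \le 8\,\epsilon^{-4}K. \]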
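The passage from the full Poisson integral to the integral over (0, π), and the size of the resulting constant, can be sketched as follows. The shorthand D± is introduced here for illustration only, and z = re^{iθ} is taken with 0 ≤ θ ≤ π (the lower half disc follows from the symmetry of u).

```latex
% Shorthand (not in the text): D_\pm := 1 - 2r\cos(\theta \pm \varphi) + r^2.
\begin{align*}
u(re^{i\theta})
 &= \frac{1-r^2}{2\pi}\int_0^{\pi} u(e^{i\varphi})
    \Bigl(\frac{1}{D_-} - \frac{1}{D_+}\Bigr)\,d\varphi
    && \text{(map } \varphi \mapsto 2\pi-\varphi \text{ on } (\pi,2\pi),\
       u(e^{-i\varphi}) = -u(e^{i\varphi})\text{)}, \\
\frac{1}{D_-} - \frac{1}{D_+}
 &= \frac{2r\bigl(\cos(\theta-\varphi)-\cos(\theta+\varphi)\bigr)}{D_- D_+}
  = \frac{4r\sin\theta\sin\varphi}{D_- D_+}, \\
D_\pm &\ge (1-r)^2 > \epsilon^2
    && (r < 1-\epsilon), \\
|u(re^{i\theta})|
 &\le \frac{1-r^2}{2\pi}\int_0^{\pi} \frac{K}{\sin\varphi}\cdot
      \frac{4r\sin\theta\sin\varphi}{\epsilon^{4}}\,d\varphi
  = 2(1-r^2)\,\epsilon^{-4} K\, r\sin\theta
  \le 8\,\epsilon^{-4} K \operatorname{Im} z .
\end{align*}
```

The factor sin φ produced by the symmetrized kernel cancels the boundary singularity K/sin φ, which is why the hypothesis |u| ≤ K/|Im z| suffices; the constant 8 is not sharp in this computation.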
To get (1) we apply the lemma with u(z) = τ^{−n} Re gρ,τ(τ + 2Bτ z). We point out that in the factorization defining gρ,τ in [4, Lemma 2], it only matters that we take a product over resonances in |τ − λ| ≤ τ/C, for some C. That makes gρ,τ defined in |τ − λ| ≤ τ/2C, with (1) holding there. Cauchy estimates then imply that we gain decay in λ when derivatives are applied. The estimate (3) follows from the argument of [4, Prop. 1]:
\begin{align*}
\#\{\lambda_j : |\lambda - \lambda_j| \le R\}
 &\le C \sum_{\substack{|\lambda_j - \lambda| \le \lambda/2 \\ \lambda_j \in \#_\rho}}
      \int_{\lambda-3R}^{\lambda+3R} \frac{\operatorname{Im}\lambda_j}{|t-\lambda_j|^2}\,dt \\
 &= C\bigl(s(\lambda+3R) - s(\lambda-3R) - i\,(g_{\rho,\lambda}(\lambda+3R) - g_{\rho,\lambda}(\lambda-3R))\bigr) \tag{5} \\
 &= O(R)\,\lambda^{n-1}, \qquad 1 \le R \le \lambda/6,
\end{align*}
by (1) and Cauchy inequalities. For R > λ/C the estimate follows from the global bound on the number of resonances. We stress here that the estimate on gρ,τ is only
Breit–Wigner Approximation and Distribution of Resonances
needed on the real axis, and hence, in odd but not even dimensions, it follows directly from Melrose’s argument in [3]. To get (2), we proceed as in [4, (4.3)], and hence need to find a bound for 1 π 1 log | det S(−λ − ρeiθ )| sin θdθ = − 2 Re log | det S(z)|dz ρ 0 ρ %λ,ρ 1 = 2 Re 2i ∂¯z log | det S(z)|L(dz) ρ & λ,ρ 1 = − 2 Re i ∂z log det S(z)L(dz), ρ &λ,ρ where &λ,ρ = {z : Im z ≥ 0, |z + λ| ≤ ρ}, %λ,ρ , its boundary, L(dz), the Lebesgue measure on C, and where we used Green’s formula. We now write 1 1 (z) + − ∂z log det S(z) = gρ,τ z − λj z − λ¯ j |λ−λ |