Commun. Math. Phys. 219, 1 – 3 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Harry Lehmann

Harry Lehmann died in Hamburg on November 22, 1998. He left his wife Margot, whom he married in 1971, and two adult children. Lehmann was a founding father of modern Quantum Field Theory, the part of Quantum Mechanics that underlies elementary particle physics.

Harry Lehmann was born in 1924 in Güstrow, Mecklenburg. After he graduated from school in Rostock, the German army drafted him in 1942 for service in North Africa, where he was taken prisoner of war by American forces. He spent three years in a prison camp in the United States; there he had the opportunity to study on his own and to prepare for university. When he was released in 1946, he soon returned to his parents in Rostock and began to study physics, first at the University of Rostock, then at the Humboldt University in East Berlin, where he obtained his diploma with a thesis in experimental physics. In 1949 he became assistant to Friedrich Hund at the University of Jena, where he wrote his doctoral dissertation on classical electrodynamics. When Hund moved to the University of Frankfurt, Lehmann served at Jena as acting professor until Hund's successor was appointed.

In the fall of 1952, Heisenberg offered Lehmann a position at the Max Planck Institute for Physics in Göttingen. There he joined an active group of young theorists from Germany and abroad who had come to collaborate with Heisenberg. After his initial stay, he requested permission to extend his visit; but the authorities of the former German
Democratic Republic never responded, and so Harry Lehmann remained in the West. As a result, it was not until 1976 that he could once again visit his parents in Rostock.

A main topic of discussion in Heisenberg’s institute was the method of renormalization, which had been developed in the United States and Japan right after the war. This technique made it possible to compute measurable quantities of quantum electrodynamics and to compare them with experiment, even though divergent integrals entered intermediate stages of the calculations. Despite the enormous success of renormalization theory, made evident by the high-precision agreement between theory and experiment, many physicists of the older generation in Europe remained skeptical and were convinced that the infinities indicated a serious deficiency of quantum field theory. Dirac, for instance, called renormalization theory “a sin against theoretical physics”. The younger theoretical physicists, on the other hand, were quite enthusiastic. They considered it a challenge to reformulate the theory in such a way that renormalization infinities never occur, either in the formulation of the principles or in the calculation of observable quantities.

Harry Lehmann’s publication on the properties of propagators [1] was an early, decisive step in this direction. From minimal assumptions he derived the main properties of propagators, and he expressed the renormalization constants as integrals over finite quantities, even though these constants diverge in perturbation theory. He carried out a large part of his pioneering work in the 1950s in collaboration with Kurt Symanzik and Wolfhart Zimmermann. The shorthand LSZ is familiar to all elementary particle physicists to this day. The LSZ formalism and the Lehmann representation are among the most important basic tools of the theory of elementary particles [2].
An important application of this technique in scattering theory is the relation between scattering amplitudes and time ordered correlation functions (the LSZ reduction formulae). These relations were derived as an immediate consequence of the asymptotic behavior of the field operators in the distant past and future.

In 1955 Harry Lehmann left Heisenberg’s institute to visit Copenhagen as a member of the CERN Study Group, and in 1956 he accepted a professorship at the University of Hamburg as the successor to Wilhelm Lenz. He founded the theoretical elementary particle physics group, and for thirty years his strong personality determined the character of the II. Institut für Theoretische Physik at Hamburg University. He became Professor Emeritus in 1986. He also advised the German Electron Synchrotron laboratory DESY and helped start its theory group by persuading Kurt Symanzik to return there from New York. Many young scientists were strongly influenced by his personality, by his style of discussion in seminars, by the conciseness of his insightful contributions to research, and by his views on physics in general.

Harry Lehmann’s interest in the theory of dispersion relations led to his close collaboration with Res Jost. For the scattering of particles of equal mass, Jost and Lehmann found a representation for matrix elements of the commutator of two field operators between energy-momentum eigenstates [3]. This representation was extended by Dyson to the general case of unequal masses [4]. On the basis of the Dyson representation, Lehmann derived dispersion relations and other analytic properties of the scattering amplitudes as consequences of locality, relativistic invariance and conditions on the particle spectrum [5]. These results are also valid for composite particles, despite their internal structure, since the derivation uses only general properties that are independent of the dynamics of the system.
Dispersion relations, therefore, provide an experimental test for the principles of local quantum field theory.
Harry Lehmann remained active in research until the end of his life. He directed several NATO Advanced Study Institutes in Cargèse (France). With K. Pohlmeyer he worked on field theories with non-polynomial Lagrangians. During his last years he investigated, together with T. T. Wu, symmetry breaking effects in the quark mass spectrum.

Harry Lehmann’s scientific merits were recognized in many ways. He received the Max Planck Medal of the German Physical Society in 1967, and he was made a Chevalier de la Légion d’Honneur on December 31, 1969. In 1997 he was honoured with the Dannie Heineman Prize of the American Physical Society and the American Institute of Physics.

Harry Lehmann was an excellent speaker with the remarkable ability to make involved and difficult subjects understandable. We remember him gratefully, in friendship, and with esteem for his scientific work.

References
1. Lehmann, H.: Nuovo Cimento 11, 342 (1954)
2. Lehmann, H., Symanzik, K., and Zimmermann, W.: Nuovo Cimento 1, 1425 (1955)
3. Jost, R., and Lehmann, H.: Nuovo Cimento 5, 1598 (1957)
4. Dyson, F. J.: Phys. Rev. 110, 1460 (1958)
5. Lehmann, H.: Nuovo Cimento 10, 579 (1958), and Suppl. to Nuovo Cimento 14, 153 (1950)
Arthur Jaffe, Gerhard Mack, Wolfhart Zimmermann
Commun. Math. Phys. 219, 5 – 30 (2001)
Algebraic Quantum Field Theory, Perturbation Theory, and the Loop Expansion

M. Dütsch, K. Fredenhagen

II. Institut für Theoretische Physik, Universität Hamburg, Luruper Chaussee 149, Hamburg, Germany

Received: 9 February 2000 / Accepted: 21 March 2000
Dedicated to the memory of Harry Lehmann

Work supported by the Deutsche Forschungsgemeinschaft.

Abstract: The perturbative treatment of quantum field theory is formulated within the framework of algebraic quantum field theory. We show that the algebra of interacting fields is additive, i.e. fully determined by its subalgebras associated to arbitrarily small subregions of Minkowski space. We also give an algebraic formulation of the loop expansion by introducing a projective system $\mathcal A^{(n)}$ of observables “up to n loops”, where $\mathcal A^{(0)}$ is the Poisson algebra of the classical field theory. Finally we give a local algebraic formulation for two cases of the quantum action principle and compare it with the usual formulation in terms of Green’s functions.

1. Introduction

Quantum field theory is a very successful framework for our present understanding of elementary particle physics. In the case of QED it has led to fantastically precise predictions of experimentally measurable quantities; moreover, the present standard model of elementary particle physics has a similar structure and is also in good agreement with experiment. Unfortunately, it is not so clear what an interacting quantum field theory really is, expressed in meaningful mathematical terms. In particular, it is by no means evident how the local algebras of observables can be defined. A direct approach by the methods of constructive field theory led to the paradoxical conjecture that QED does not exist; the situation seems to be better for Yang–Mills theories because of asymptotic freedom, but there the problem of large fields, which can appear in large volumes, poses problems that are at present insurmountable [1, 21].

In this paper we take a pragmatic point of view: interacting quantum field theory certainly exists on the level of perturbation theory, and our confidence in quantum field theory relies mainly on the agreement of experimental data with results from low orders of perturbation theory. On the other hand, the general structure of algebraic quantum
field theory (or “local quantum physics”) agrees nicely with the qualitative features of elementary particle physics, and it therefore seems worthwhile to revisit perturbation theory from the point of view of algebraic quantum field theory. This will, on the one hand, provide physically relevant examples for algebraic quantum field theory and, on the other hand, give new insight into the structure of perturbation theory. In particular, we will see that we can achieve a complete separation of the infrared problem from the ultraviolet problem. This might be of relevance for Yang–Mills theory, and it is important for the construction of the theory on curved spacetimes [7].

The plan of the paper is as follows. We start by describing the Stückelberg–Bogoliubov–Shirkov–Epstein–Glaser version of perturbation theory [6, 14, 28, 26, 7]. This construction yields the local S-matrices $S(g)$ ($g \in \mathcal D(\mathbb R^4)$) as formal power series in $g$ (Sect. 2). The most important requirement used in this construction is the condition of causality (15), which is a functional equation for $g \mapsto S(g)$. The results of Sects. 3 and 4 are to a large extent valid beyond perturbation theory; we only assume that we are given a family of unitary solutions of the condition of causality. In terms of these local S-matrices we construct nets of local algebras of observables for the interacting theory (Sect. 3). We will see that, as a consequence of causality, the interacting theory is completely determined once it is known for arbitrarily small spacetime volumes (Sect. 4). In Sect. 5 we algebraically quantize a free field by deforming the (classical) Poisson algebra. In a second step we generalize this quantization procedure to the perturbative interacting field. We end up with an algebraic formulation of the expansion of the interacting observables in $\hbar$ (“loop expansion”).
In the last section we investigate two examples of the quantum action principle: the field equation and the variation of a parameter in the interaction. Usually this principle is formulated in terms of Green’s functions [20, 18, 22], i.e. the vacuum expectation values of time ordered products of interacting fields. Here we give a local algebraic formulation, i.e. an operator identity for a localized interaction. In the case of the variation of a parameter in the interaction this requires the use of the retarded product of interacting fields, instead of only time ordered products (as in the formulation in terms of Green’s functions). For a local construction of observables and physical states in gauge theories we refer to [10, 11, 5]. There, perturbative positivity (“unitarity”) is, by a local version of the Kugo–Ojima formalism [17], reduced to the validity of BRST symmetry [3].

2. Free Fields, Borchers’ Class and Local S-Matrices

An algebra of observables corresponding to the Klein–Gordon equation
$$(\Box + m^2)\varphi = 0 \tag{1}$$
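can be defined as follows. Let $\Delta_{\mathrm{ret}}$ and $\Delta_{\mathrm{av}}$ be the retarded and advanced Green’s functions, respectively, of $\Box + m^2$,
$$(\Box + m^2)\,\Delta_{\mathrm{ret,av}} = \delta, \qquad \operatorname{supp}\Delta_{\mathrm{ret,av}} \subset \bar V_\pm, \tag{2}$$
where $\bar V_\pm$ denotes the closed forward and backward lightcone, respectively, and let $\Delta = \Delta_{\mathrm{ret}} - \Delta_{\mathrm{av}}$. The algebra of observables $\mathcal A$ is generated by smeared fields $\varphi(f)$, $f \in \mathcal D(\mathbb R^4)$, which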
obey the following relations:
$$f \mapsto \varphi(f) \ \text{is linear}, \tag{3}$$
$$\varphi\bigl((\Box + m^2)f\bigr) = 0, \tag{4}$$
$$\varphi(f)^* = \varphi(\bar f), \tag{5}$$
$$[\varphi(f), \varphi(g)] = i\,\langle f, \Delta * g\rangle, \tag{6}$$
where the star denotes convolution and $\langle f, g\rangle = \int d^4x\, f(x)g(x)$. As a matter of fact, $\mathcal A$ (as a $*$-algebra with unit) is uniquely determined by these relations. The Fock space representation $\pi$ of the free field is induced via the GNS construction from the vacuum state $\omega_0$. Namely, let $\omega_0 : \mathcal A \to \mathbb C$ be the quasifree state given by the two-point function
$$\omega_0\bigl(\varphi(f)\varphi(g)\bigr) = i\,\langle f, \Delta_+ * g\rangle, \tag{7}$$
where $\Delta_+$ is the positive frequency part of $\Delta$. Then the Fock space $\mathcal H$, the vector $\Omega$ representing the vacuum and the Fock representation are, up to equivalence, determined by the relation $(\Omega, \pi(A)\Omega) = \omega_0(A)$, $A \in \mathcal A$. On $\mathcal H$, the field $\varphi$ (we will omit the representation symbol $\pi$) is an operator valued distribution, i.e. there is some dense subspace $\mathcal D \subset \mathcal H$ with
(i) $\varphi(f) \in \operatorname{End}(\mathcal D)$,
(ii) $f \mapsto \varphi(f)\Psi$ is continuous $\forall\, \Psi \in \mathcal D$.
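There are other fields $A$ on $\mathcal H$, on the same domain, which are relatively local to $\varphi$,
$$[A(f), \varphi(g)] = 0 \quad \text{if} \quad (x - y)^2 < 0 \ \ \forall (x, y) \in \operatorname{supp} f \times \operatorname{supp} g. \tag{8}$$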
They form the so-called Borchers class $\mathcal B$. In the case of the free field in 4 dimensions, $\mathcal B$ consists of Wick polynomials and their derivatives [13]. Fields from the Borchers class can be used to define local interactions,
$$H_I(t) = -\int d^3x\, g(t, \mathbf x)\, A(t, \mathbf x), \qquad g \in \mathcal D(\mathbb R^4), \tag{9}$$
(where the minus sign comes from the interpretation of $A$ as an interaction term in the Lagrangian), provided they can be restricted to spacelike surfaces. The corresponding time evolution operator (the S-matrix) from $-\tau$ to $\tau$, where $\tau > 0$ is so large that $\operatorname{supp} g \subset (-\tau, \tau) \times \mathbb R^3$, is formally given by the Dyson series
$$S(g) = 1 + \sum_{n=1}^{\infty} \frac{i^n}{n!} \int dx_1 \dots dx_n\, T\bigl(A(x_1) \dots A(x_n)\bigr)\, g(x_1) \dots g(x_n) \tag{10}$$
with the time ordered products (“T-products”) $T(\dots)$. It is difficult to derive (10) from (9) if the field $A$ cannot be restricted to spacelike surfaces. Unfortunately, this is almost always the case in four spacetime dimensions, the only exceptions being the field $\varphi$ itself and its derivatives. Therefore one defines the time ordered products of $n$ factors directly as multilinear (with respect to $C^\infty$-functions as coefficients) symmetric mappings from
$\mathcal B^n$ to operator valued distributions $T\bigl(A_1(x_1) \dots A_n(x_n)\bigr)$ on $\mathcal D$ such that they satisfy the factorization condition¹
$$T\bigl(A(x_1) \dots A(x_n)\bigr) = T\bigl(A(x_1) \dots A(x_k)\bigr)\, T\bigl(A(x_{k+1}) \dots A(x_n)\bigr) \tag{11}$$
if $\{x_{k+1}, \dots, x_n\} \cap \bigl(\{x_1, \dots, x_k\} + \bar V_+\bigr) = \emptyset$. The S-matrix $S(g)$ is then, as a formal power series, by definition given by (10). Since its zeroth order term is 1, it has an inverse in the sense of formal power series,
$$S(g)^{-1} = 1 + \sum_{n=1}^{\infty} \frac{(-i)^n}{n!} \int dx_1 \dots dx_n\, \bar T\bigl(A(x_1) \dots A(x_n)\bigr)\, g(x_1) \dots g(x_n), \tag{12}$$
where the “antichronological products” $\bar T(\dots)$ can be expressed in terms of the time ordered products,
$$\bar T\bigl(A(x_1) \dots A(x_n)\bigr) \overset{\mathrm{def}}{=} \sum_{P \in \mathcal P(\{1, \dots, n\})} (-1)^{|P| + n} \prod_{p \in P} T\bigl(A(x_i),\ i \in p\bigr). \tag{13}$$
(Here $\mathcal P(\{1, \dots, n\})$ is the set of all ordered partitions of $\{1, \dots, n\}$ and $|P|$ is the number of subsets in $P$.) The $\bar T$-products satisfy anticausal factorization,
$$\bar T\bigl(A(x_1) \dots A(x_n)\bigr) = \bar T\bigl(A(x_{k+1}) \dots A(x_n)\bigr)\, \bar T\bigl(A(x_1) \dots A(x_k)\bigr) \tag{14}$$
if $\{x_{k+1}, \dots, x_n\} \cap \bigl(\{x_1, \dots, x_k\} + \bar V_+\bigr) = \emptyset$.

The crucial observation now (cf. [16]) is that $S(g)$ satisfies the remarkable functional equation
$$S(f + g + h) = S(f + g)\, S(g)^{-1}\, S(g + h), \qquad f, g, h \in \mathcal D(\mathbb R^4), \tag{15}$$
whenever $(\operatorname{supp} f + \bar V_+) \cap \operatorname{supp} h = \emptyset$ (independently of $g$). Equivalent forms of this equation play an important role in [6] and [14]. For $g = 0$ this is just the functional equation for the time evolution and may be interpreted as the requirement of causality [6]. Actually, for formal power series $S(\cdot)$ of operator valued distributions, the $g = 0$ equation is equivalent to the seemingly stronger relation (15), because both are equivalent to condition (11) for the time ordered products. We call (15) the “condition of causality”.

¹ Due to the symmetry and linearity of $T(\dots)$ it suffices to consider the case $A_1 = A_2 = \dots = A_n \equiv A$.

3. Interacting Local Nets

The arguments of this and the next section are to a large extent independent of perturbation theory. We start from the assumption that we are given a family of unitaries $S(f) \in \mathcal A$, $\forall f \in \mathcal D(\mathbb R^4, \mathcal V)$ (i.e. $f$ has the form $f = \sum_i f_i(x) A_i$, $f_i \in \mathcal D(\mathbb R^4, \mathbb R)$, $A_i \in \mathcal V$), where $\mathcal V$ is an abstract, finite dimensional, real vector space, interpreted as the space of possible interaction Lagrangians, and $\mathcal A$ is some unital $*$-algebra. In perturbation theory $\mathcal V$ is a real subspace of the Borchers class. The unitaries $S(f)$ are required to satisfy the causality condition (15). We first observe that we obtain new solutions of (15) by introducing the relative S-matrices
$$S_g(f) \overset{\mathrm{def}}{=} S(g)^{-1}\, S(g + f), \tag{16}$$
where now $g$ is kept fixed and $S_g(f)$ is considered as a functional of $f$. In particular, the relative S-matrices satisfy local commutation relations,
$$[S_g(h), S_g(f)] = 0 \quad \text{if} \quad (x - y)^2 < 0 \ \ \forall (x, y) \in \operatorname{supp} h \times \operatorname{supp} f. \tag{17}$$
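The algebraic content of (15), and the claim that the relative S-matrices (16) solve (15) again, can be checked in a deliberately simple toy model. The model is our own assumption and not the construction of the paper: spacetime is replaced by six discrete time slices (so there is no spacelike separation and (17) is empty), the interaction at slice $t$ is a random Hermitian $3 \times 3$ matrix `A[t]`, and $S(g)$ is the time-ordered product of the factors $\exp(i\, g_t A_t)$. All names (`S`, `S_rel`, `late`, `early`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SLICES, DIM = 6, 3  # toy model: 6 time slices, 3x3 matrices (assumption)

def random_hermitian(dim):
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (m + m.conj().T) / 2

# A[t] plays the role of the interaction A(x) on time slice t.
A = [random_hermitian(DIM) for _ in range(N_SLICES)]

def exp_i_hermitian(h):
    """exp(i h) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w)) @ v.conj().T

def S(g):
    """Time-ordered product of exp(i g[t] A[t]); later slices stand to the left."""
    u = np.eye(DIM, dtype=complex)
    for t in range(N_SLICES):
        u = exp_i_hermitian(g[t] * A[t]) @ u
    return u

def S_rel(g, f):
    """Relative S-matrix S_g(f) = S(g)^{-1} S(g + f), cf. (16)."""
    return np.linalg.inv(S(g)) @ S(g + f)

def late(values):   # "test function" supported on the last two slices
    f = np.zeros(N_SLICES)
    f[-2:] = values
    return f

def early(values):  # "test function" supported on the first three slices
    h = np.zeros(N_SLICES)
    h[:3] = values
    return h

f, h = late(rng.normal(size=2)), early(rng.normal(size=3))
g, g0 = rng.normal(size=N_SLICES), rng.normal(size=N_SLICES)

# (15): supp h does not meet the causal future of supp f, and g is arbitrary.
causal_S = np.allclose(S(f + g + h), S(f + g) @ np.linalg.inv(S(g)) @ S(g + h))
# The relative S-matrices S_{g0}(.) satisfy (15) as well.
causal_rel = np.allclose(
    S_rel(g0, f + g + h),
    S_rel(g0, f + g) @ np.linalg.inv(S_rel(g0, g)) @ S_rel(g0, g + h),
)
print(causal_S, causal_rel)  # True True
```

Swapping the roles of $f$ and $h$, so that $h$ lies in the future of $f$, generically breaks the identity: the support condition in (15) is exactly what lets the factors of the intermediate slices cancel.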
As a consequence of (17), their functional derivatives
$$A_g(x) = \frac{\delta}{\delta h(x)}\, S_g(hA)\Big|_{h=0}, \qquad A \in \mathcal V,\ h \in \mathcal D(\mathbb R^4),$$
provided they exist, are local fields (in the limit $g \to$ constant this is Bogoliubov’s definition of interacting fields) [6]. We now introduce local algebras of observables by assigning to a region $\mathcal O$ of Minkowski space the $*$-algebra $\mathcal A_g(\mathcal O)$ which is generated by $\{S_g(h),\ h \in \mathcal D(\mathcal O, \mathcal V)\}$. A remarkable consequence of relation (15) is that the structure of the algebra $\mathcal A_g(\mathcal O)$ depends only locally on $g$ [16, 7]: if $g \equiv g'$ in a neighbourhood of a causally closed region containing $\mathcal O$, then there exists a unitary $V \in \mathcal A$ such that
$$V\, S_g(h)\, V^{-1} = S_{g'}(h) \qquad \forall\, h \in \mathcal D(\mathcal O, \mathcal V). \tag{18}$$
Hence the system of local algebras of observables is completely determined if one knows the relative S-matrices for test functions $g \in \mathcal D(\mathbb R^4, \mathcal V)$; according to the principles of algebraic quantum field theory, this system (“the local net”) contains the full physical content of a quantum field theory. The construction of the global algebra of observables for an interaction Lagrangian $\mathcal L \in \mathcal V$ may be performed explicitly (cf. [7]). Let $\Theta(\mathcal O)$ be the set of all functions $\theta \in \mathcal D(\mathbb R^4)$ which are identically 1 in a causally closed open neighbourhood of $\mathcal O$, and consider the bundle
$$\bigcup_{\theta \in \Theta(\mathcal O)} \{\theta\} \times \mathcal A_{\theta\mathcal L}(\mathcal O). \tag{19}$$
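Let $\mathcal U(\theta, \theta')$ be the set of all unitaries $V \in \mathcal A$ with
$$V\, S_{\theta\mathcal L}(h) = S_{\theta'\mathcal L}(h)\, V \qquad \forall\, h \in \mathcal D(\mathcal O, \mathcal V). \tag{20}$$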
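Then $\mathcal A_{\mathcal L}(\mathcal O)$ is defined as the algebra of covariantly constant sections, i.e.
$$\mathcal A_{\mathcal L}(\mathcal O) \ni A = (A_\theta)_{\theta \in \Theta(\mathcal O)}, \qquad A_\theta \in \mathcal A_{\theta\mathcal L}(\mathcal O), \tag{21}$$
$$V A_\theta = A_{\theta'} V \qquad \forall\, V \in \mathcal U(\theta, \theta'). \tag{22}$$
$\mathcal A_{\mathcal L}(\mathcal O)$ contains in particular the elements $S_{\mathcal L}(h)$,
$$\bigl(S_{\mathcal L}(h)\bigr)_\theta = S_{\theta\mathcal L}(h). \tag{23}$$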
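The construction of the local net is completed by fixing the embeddings $i_{21} : \mathcal A_{\mathcal L}(\mathcal O_1) \hookrightarrow \mathcal A_{\mathcal L}(\mathcal O_2)$ for $\mathcal O_1 \subset \mathcal O_2$. These embeddings are inherited from the inclusions $\mathcal A_{\theta\mathcal L}(\mathcal O_1) \subset \mathcal A_{\theta\mathcal L}(\mathcal O_2)$ for $\theta \in \Theta(\mathcal O_2)$ by restricting the sections from $\Theta(\mathcal O_1)$ to $\Theta(\mathcal O_2)$. The embeddings evidently satisfy the compatibility relation $i_{12} \circ i_{23} = i_{13}$ for $\mathcal O_3 \subset \mathcal O_2 \subset \mathcal O_1$ and thus define an inductive system. Therefore, the global algebra can be defined as the inductive limit of the local algebras,
$$\mathcal A_{\mathcal L} \overset{\mathrm{def}}{=} \bigcup_{\mathcal O} \mathcal A_{\mathcal L}(\mathcal O). \tag{24}$$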
In perturbation theory, the unitaries $V \in \mathcal U(\theta, \theta')$ are themselves formal power series; therefore it makes no sense to say that two elements $A, B \in \mathcal A_{\mathcal L}(\mathcal O)$ agree in $n$th order, but only that they agree up to $n$th order (because $A_{\theta'} - B_{\theta'} = O(g^{n+1})$ implies $A_\theta - B_\theta = V^{-1}(A_{\theta'} - B_{\theta'})V = O(g^{n+1})$). The time ordered products, and hence the relative S-matrices $S_{\theta\mathcal L}(h)$, are chosen so as to satisfy Poincaré covariance (see the normalization condition N1 below), i.e. the unitary positive energy representation $U$ of the Poincaré group $\mathcal P_+^\uparrow$ under which the free field transforms satisfies
$$U(L)\, S_{\theta\mathcal L}(h)\, U(L)^{-1} = S_{\theta_L\mathcal L}(h_L), \qquad \theta_L(x) := \theta(L^{-1}x), \quad h_L(x) := D(L)\, h(L^{-1}x), \tag{25}$$
$\forall L \in \mathcal P_+^\uparrow$, provided $\mathcal L$ is a Lorentz scalar and $\mathcal V$ transforms under the finite dimensional representation $D$ of the Lorentz group. This enables us to define an automorphic action of the Poincaré group on the algebra of observables. For $A \in \mathcal A_{\mathcal L}(\mathcal O)$ and $\theta \in \Theta(L\mathcal O)$ let
$$\bigl(\alpha_L(A)\bigr)_\theta \overset{\mathrm{def}}{=} U(L)\, A_{\theta_{L^{-1}}}\, U(L)^{-1}. \tag{26}$$
By inserting the definitions one finds that $\alpha_L(A)$ is again a covariantly constant section (22). So $\alpha_L$ is an automorphism of the net which realizes the Poincaré symmetry,
$$\alpha_L\bigl(\mathcal A_{\mathcal L}(\mathcal O)\bigr) = \mathcal A_{\mathcal L}(L\mathcal O), \qquad \alpha_{L_1 L_2} = \alpha_{L_1}\alpha_{L_2}. \tag{27}$$
For the purposes of perturbation theory, we have to enlarge the local algebras somewhat. In perturbation theory, the relative S-matrices are formal power series in two variables, and therefore the generators of the local algebras,
$$S_{\mathcal L}(\lambda f) = \sum_{n=0}^{\infty} \frac{i^n \lambda^n}{n!}\, T_{\mathcal L}(f^{\otimes n}), \tag{28}$$
are formal power series with coefficients which are covariantly constant sections in the sense of (22). The first order terms in (28) are, according to Bogoliubov, the interacting local fields,
$$T_{\mathcal L}(hA) =: A_{\mathcal L}(h), \qquad A \in \mathcal V,\ h \in \mathcal D(\mathbb R^4); \tag{29}$$
the higher order terms satisfy the causality condition (11) and may therefore be interpreted as time ordered products of interacting fields (cf. [14], Sect. 8.1). Our enlarged local algebra $\mathcal A_{\mathcal L}(\mathcal O)$ (we use the same symbol as before) now consists of all formal power series with coefficients from the algebra generated by all time ordered products $T_{\mathcal L}(f^{\otimes n})$ with $f \in \mathcal D(\mathcal O, \mathcal V)$, $n \in \mathbb N_0$.

4. Consequences of Causality

Another consequence of the causality relation (15) is that the S-matrices $S(f)$ are uniquely fixed if they are known for test functions with arbitrarily small supports. Namely, by a repeated use of (15) we find that $S\bigl(\sum_{i=1}^n f_i\bigr)$ is a product of factors $S\bigl(\sum_{i \in K} f_i\bigr)^{\pm 1}$, where the sets $K \subset \{1, \dots, n\}$ have the property that for every pair $i, j \in K$ the causal closures of $\operatorname{supp} f_i$ and $\operatorname{supp} f_j$ overlap. Hence if the supports of all $f_i$ are contained in double cones of diameter $d$, the supports of $\sum_{i \in K} f_i$ fit into double cones of diameter
$2d$. As $d > 0$ can be chosen arbitrarily small and the relative S-matrices also satisfy (15), this implies additivity of the net,
$$\mathcal A_{\mathcal L}(\mathcal O) = \bigvee_\alpha \mathcal A_{\mathcal L}(\mathcal O_\alpha), \tag{30}$$
where $(\mathcal O_\alpha)$ is an arbitrary covering of $\mathcal O$ and where the symbol $\bigvee$ means the generated algebra.

One might also pose the existence question: suppose we have a family of unitaries $S(f)$ for all $f$ with sufficiently small support which satisfy the causality condition (15) for $f, g, h \in \mathcal D(\mathcal O, \mathcal V)$, $\operatorname{diam}(\mathcal O)$ sufficiently small, and local commutativity for arbitrarily large separation,
$$[S(f), S(g)] = 0 \quad \text{if } \operatorname{supp} f \text{ is spacelike to } \operatorname{supp} g.$$
By repeated use of causality (15) we can then define S-matrices for test functions with larger support. It is, however, not evident that these S-matrices are independent of the way of construction and that they satisfy the causality condition. (We found a consistent construction only in the simple case of one dimension: $x$ = time.) Fortunately, a general positive answer can be given in perturbation theory. Let $S(f)$ be given for $f \in \mathcal D(\mathcal O, \mathcal V)$ for all double cones $\mathcal O$ with $\operatorname{diam}(\mathcal O) < r$. The time ordered product of $n$ factors is the $n$-fold functional derivative of $S$ at $f = 0$. It is an operator valued distribution² $T_n$ defined on test functions of $n$ variables with support contained in
$$U_n \overset{\mathrm{def}}{=} \Bigl\{(y_1, \dots, y_n) \in \mathbb R^{4n} \ \Big|\ \max_{i<j} |y_i - y_j| < \tfrac{r}{2}\Bigr\}$$
and with values in $\mathcal V^{\otimes n}$. In particular, we know $T_1(x)$ on all of $\mathbb R^4$. On this domain the time ordered products satisfy the factorization condition (11). In addition, local commutativity of the S-matrices implies
$$[T_n(x_1, \dots, x_n), T_m(y_1, \dots, y_m)] = 0 \tag{31}$$
for $(x_i - y_j)^2 < 0$ $\forall (i, j)$ and $(x_1, \dots, x_n) \in U_n$, $(y_1, \dots, y_m) \in U_m$. By construction $T_n|_{U_n}$ is symmetric with respect to permutations of the factors. We now show that this input suffices to construct $T_n(x_1, \dots, x_n)$ on the whole $\mathbb R^{4n}$ by induction on $n$. We assume that the $T_k$ have been constructed for $k \le n-1$, that they fulfil causality (11) and
$$[T_m(x_1, \dots, x_m), T_k(y_1, \dots, y_k)] = 0 \quad \text{for} \quad (x_1, \dots, x_m) \in U_m,\ k \le n-1 \tag{32}$$
($m$ arbitrary), and
$$[T_l(x_1, \dots, x_l), T_k(y_1, \dots, y_k)] = 0 \quad \text{for} \quad l, k \le n-1, \tag{33}$$
if $(x_i - y_j)^2 < 0$ $\forall (i, j)$ in the latter two equations. We can now proceed as in Sect. 4 of [7].³
² Here we change the notation for the time ordered products: let $f = \sum_i f_i(x) A_i$, $f_i \in \mathcal D(\mathbb R^4)$, $A_i \in \mathcal V$. Instead of $\int dx_1 \dots dx_n \sum_{i_1 \dots i_n} T\bigl(A_{i_1}(x_1) \dots A_{i_n}(x_n)\bigr) f_{i_1}(x_1) \dots f_{i_n}(x_n)$ (10) we write $\int dx_1 \dots dx_n\, T_n(x_1, \dots, x_n) f(x_1) \dots f(x_n) \equiv T_n(f^{\otimes n})$.
³ In contrast to the (inductive) Epstein–Glaser construction of $T_n(x_1, \dots, x_n)$ [14, 7], the present construction is unique; normalization conditions (e.g. N1–N4 in Sect. 5) are not needed, because the non-uniqueness of the Epstein–Glaser construction is located at the total diagonal $\Delta_n \equiv \{(x_1, \dots, x_n) \mid x_1 = \dots = x_n\}$. But here the time ordered products are given in the neighbourhood $U_n$ of $\Delta_n$.
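Let $\mathcal J$ denote the family of all non-empty proper subsets $I$ of the index set $\{1, \dots, n\}$ and define the sets
$$C_I \overset{\mathrm{def}}{=} \{(x_1, \dots, x_n) \in \mathbb R^{4n} \mid x_i \notin J^-(x_j),\ i \in I,\ j \in I^c\}$$
for any $I \in \mathcal J$, where $J^-(x) = x - \bar V_+$ denotes the causal past of $x$. Then
$$\bigcup_{I \in \mathcal J} C_I \cup U_n = \mathbb R^{4n}. \tag{34}$$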
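The covering (34) rests on the fact that every configuration off the total diagonal lies in some $C_I$. A minimal constructive sketch, assuming the metric signature $(+,-,-,-)$ with the time component first and using hypothetical helper names: if the time coordinates are not all equal, take for $I$ the indices of the latest points; if they are all equal but the points do not all coincide, simultaneous distinct points are spacelike separated, and $I = \{i : x_i = x_1\}$ works.

```python
import numpy as np

def in_causal_past(x, y):
    """True if x lies in J^-(y): y - x is zero or a future-directed causal vector.
    Signature (+,-,-,-), component 0 is time (assumption)."""
    d = np.asarray(y, float) - np.asarray(x, float)
    return bool(d[0] >= 0 and d[0] ** 2 >= d[1:] @ d[1:])

def in_C(points, I):
    """Membership test for C_I: no x_i (i in I) lies in J^-(x_j) for j outside I."""
    return all(not in_causal_past(points[i], points[j])
               for i in I for j in range(len(points)) if j not in I)

def find_I(points):
    """Return a non-empty proper subset I with the configuration in C_I,
    or None if all points coincide (total diagonal)."""
    pts = [np.asarray(p, float) for p in points]
    t_max = max(p[0] for p in pts)
    I = {i for i, p in enumerate(pts) if p[0] == t_max}
    if len(I) < len(pts):
        return I  # strictly later points are never in the causal past of earlier ones
    # All times are equal: distinct simultaneous points are spacelike separated.
    I = {i for i, p in enumerate(pts) if np.array_equal(p, pts[0])}
    return I if len(I) < len(pts) else None

simultaneous = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
I = find_I(simultaneous)
print(I, in_C(simultaneous, I))  # {0, 2} True
```

Configurations near the total diagonal are only covered by $U_n$, which is why the construction needs $T_n|_{U_n}$ as input.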
We use the short-hand notations
$$T^I(x_I) = T\Bigl(\prod_{i \in I} A_i(x_i)\Bigr), \qquad x_I = (x_i,\ i \in I). \tag{35}$$
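On $\mathcal D(C_I)$ we set
$$T_I(x) \overset{\mathrm{def}}{=} T^I(x_I)\, T^{I^c}(x_{I^c}) \tag{36}$$
for any $I \in \mathcal J$. For $I_1, I_2 \in \mathcal J$ with $C_{I_1} \cap C_{I_2} \neq \emptyset$ one easily verifies⁴
$$T_{I_1}\big|_{C_{I_1} \cap C_{I_2}} = T_{I_2}\big|_{C_{I_1} \cap C_{I_2}}. \tag{37}$$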
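Let now $\{f_I\}_{I \in \mathcal J} \cup \{f_0\}$ be a finite smooth partition of unity of $\mathbb R^{4n}$ subordinate to $\{C_I\}_{I \in \mathcal J} \cup U_n$: $\operatorname{supp} f_I \subset C_I$, $\operatorname{supp} f_0 \subset U_n$. Then we define
$$T_n(h) \overset{\mathrm{def}}{=} T_n|_{U_n}(f_0 h) + \sum_{I \in \mathcal J} T_I(f_I h), \qquad h \in \mathcal D(\mathbb R^{4n}, \mathcal V^{\otimes n}). \tag{38}$$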
As in [7] one may prove that this definition is independent of the choice of $\{f_I\}_{I \in \mathcal J} \cup \{f_0\}$, that $T_n$ is symmetric with respect to permutations of the factors, and that it satisfies causality (11). Local commutativity (32) and (33) (with $n-1$ replaced by $n$) is verified by inserting the definition (38) and using the assumptions. By (10) we obtain from the T-products the corresponding S-matrix $S(g)$ for arbitrarily large support of $g \in \mathcal D(\mathbb R^4, \mathcal V)$, and $S(g)$ satisfies the functional equation (15).
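Once $S(g)$ is built from the T-products via (10), its inverse (12) involves the antichronological products (13). Away from coinciding points a T-product is just the time-ordered operator product, and the sum (13) over ordered partitions then reduces to the anti-time-ordered product. The sketch below checks this with random complex $3 \times 3$ matrices standing in for the operators $A(x_i)$ at distinct times; the matrices, the times and the helper names are illustrative assumptions.

```python
from itertools import combinations

import numpy as np

def ordered_partitions(items):
    """Yield every ordered partition of `items` as a list of non-empty tuples."""
    if not items:
        yield []
        return
    for r in range(1, len(items) + 1):
        for first in combinations(items, r):
            rest = [x for x in items if x not in first]
            for tail in ordered_partitions(rest):
                yield [first] + tail

def ordered_product(ops, times, block, latest_first=True):
    """Product of ops[i], i in block, sorted by time (latest leftmost if latest_first)."""
    out = np.eye(ops[0].shape[0], dtype=complex)
    for i in sorted(block, key=lambda k: times[k], reverse=latest_first):
        out = out @ ops[i]
    return out

def tbar_from_T(ops, times):
    """Right-hand side of (13): signed sum over ordered partitions of T-products."""
    n = len(ops)
    total = np.zeros_like(ops[0], dtype=complex)
    for P in ordered_partitions(list(range(n))):
        term = np.eye(ops[0].shape[0], dtype=complex)
        for block in P:
            term = term @ ordered_product(ops, times, block)
        total += (-1) ** (len(P) + n) * term
    return total

rng = np.random.default_rng(1)
n, dim = 4, 3
ops = [rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)) for _ in range(n)]
times = rng.permutation(n).astype(float)  # distinct times: no coinciding points

anti = ordered_product(ops, times, range(n), latest_first=False)
print(np.allclose(tbar_from_T(ops, times), anti))  # True
```

The identity is purely combinatorial: it follows from expanding $S^{-1} = \sum_k (-1)^k (S - 1)^k$ and reading off the coefficient that is linear in each coupling.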
5. Perturbative Quantization and Loop Expansion

Causal perturbation theory was traditionally formulated in terms of operator valued distributions on Fock space. It is therefore well suited for describing the deformation of the free field into an interacting field by turning on the interaction $g \in \mathcal D(\mathbb R^4, \mathcal V)$. It is much less clear how an expansion in powers of $\hbar$, describing the deformation of the classical field theory, can be performed, mainly because the Fock space has no classical counterpart. Usually the expansion in powers of $\hbar$ is done in functional approaches to field theory by ordering Feynman graphs according to loop number. In this section we show that the algebraic description provides a natural formulation of the loop expansion, and we point out the connection to formal quantization theory.

⁴ In contrast to [7] the Wick expansion of the T-products is not used here, because local commutativity of the T-products is contained in the inductive assumption.
5.1. Quantization of a free field and Wick products. In quantization theory one associates a quantum theory to a given classical theory. One procedure is deformation (or star-product) quantization [2]. This procedure starts from a Poisson algebra, i.e. a commutative and associative algebra together with a second product, a Poisson bracket, satisfying the Leibniz rule and the Jacobi identity, and deforms the product as a function of $\hbar$ such that⁵ $a \times_\hbar b$ is a formal power series in $\hbar$, associativity is maintained, and
$$a \times_\hbar b \xrightarrow{\ \hbar \to 0\ } ab, \qquad \frac{1}{i\hbar}\,(a \times_\hbar b - b \times_\hbar a) \xrightarrow{\ \hbar \to 0\ } \{a, b\}. \tag{39}$$
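For a single degree of freedom the requirements (39) can be checked on polynomials in $z, \bar z$ with the Wick-type product $f \times_\hbar g = \sum_k \frac{\hbar^k}{k!}\,\partial_z^k f\, \partial_{\bar z}^k g$, the one-mode analogue of Wick's theorem (normal ordering). This is a sketch under our own conventions: $z \leftrightarrow a$, $\bar z \leftrightarrow a^*$ with $[a, a^*] = \hbar$, and $z = (q + ip)/\sqrt 2$ with $\{q, p\} = 1$, so that $\{z, \bar z\} = -i$. The dictionary encoding and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, perm

# A polynomial is {(i, j, h): coeff}, meaning sum coeff * z**i * zbar**j * hbar**h.

def clean(p):
    return {k: c for k, c in p.items() if c != 0}

def add(p, q, sign=1):
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) + sign * c
    return clean(out)

def mul(p, q):
    """Classical (commutative) product."""
    out = {}
    for (i1, j1, h1), c1 in p.items():
        for (i2, j2, h2), c2 in q.items():
            key = (i1 + i2, j1 + j2, h1 + h2)
            out[key] = out.get(key, 0) + c1 * c2
    return clean(out)

def star(p, q):
    """Wick-type product: sum_k hbar^k / k! * (d_z^k p) * (d_zbar^k q)."""
    out = {}
    for (i1, j1, h1), c1 in p.items():
        for (i2, j2, h2), c2 in q.items():
            for k in range(min(i1, j2) + 1):
                key = (i1 + i2 - k, j1 + j2 - k, h1 + h2 + k)
                weight = Fraction(perm(i1, k) * perm(j2, k), factorial(k))
                out[key] = out.get(key, 0) + c1 * c2 * weight
    return clean(out)

def d_z(p):
    return clean({(i - 1, j, h): c * i for (i, j, h), c in p.items() if i > 0})

def d_zb(p):
    return clean({(i, j - 1, h): c * j for (i, j, h), c in p.items() if j > 0})

def classical_part(p):  # coefficient of hbar**0
    return clean({k: c for k, c in p.items() if k[2] == 0})

def first_order(p):  # coefficient of hbar**1, as an hbar-free polynomial
    return clean({(i, j, 0): c for (i, j, h), c in p.items() if h == 1})

z, zb = {(1, 0, 0): Fraction(1)}, {(0, 1, 0): Fraction(1)}
f = {(2, 1, 0): Fraction(1), (1, 0, 0): Fraction(3)}  # z^2 zbar + 3 z
g = {(0, 2, 0): Fraction(1), (1, 1, 0): Fraction(1)}  # zbar^2 + z zbar
h = {(1, 3, 0): Fraction(1)}                          # z zbar^3

print(add(star(z, zb), star(zb, z), -1))           # {(0, 0, 1): Fraction(1, 1)}
print(star(star(f, g), h) == star(f, star(g, h)))  # associativity: True
print(classical_part(star(f, g)) == mul(f, g))     # first limit in (39): True
```

The first line shows $[z, \bar z]_\hbar = \hbar$, so $(1/i\hbar)[z, \bar z]_\hbar = -i = \{z, \bar z\}$, in line with the second limit in (39); more generally the $\hbar^1$ part of $[f, g]_\hbar$ is $\partial_z f\,\partial_{\bar z} g - \partial_z g\,\partial_{\bar z} f = i\{f, g\}$.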
Actually this scheme can easily be realized in free field theory (cf. [9]). Basic functions are the evaluation functionals $\varphi_{\mathrm{class}}(x)$, $(\Box + m^2)\varphi_{\mathrm{class}} = 0$, with the Poisson bracket
$$\{\varphi_{\mathrm{class}}(x), \varphi_{\mathrm{class}}(y)\} = \Delta(x - y) \tag{40}$$
($\Delta$ is the commutator function defined after (2)). Because of the singular character of the fields, they should be smoothed out in order to belong to the Poisson algebra. Hence our fundamental classical observables are
$$\phi(t) = t_0 + \sum_{n=1}^{N} \int dx_1 \dots dx_n\, \varphi_{\mathrm{class}}(x_1) \dots \varphi_{\mathrm{class}}(x_n)\, t_n(x_1, \dots, x_n), \tag{41}$$
$t \equiv (t_0, t_1, \dots)$, where $t_0 \in \mathbb C$ is arbitrary, $N < \infty$, and $t_n$ is a suitable test “function” (we will also admit certain distributions) with compact support. The Klein–Gordon equation shows up in the property: $\phi(t) = 0$ if $t_0 = 0$ and $t_n = (\Box_i + m^2)\, g_n$ for all $n > 0$, some $i = i(n)$ and some $g_n$ with compact support. In the quantization procedure we identify $\varphi_{\mathrm{class}}(x_1) \dots \varphi_{\mathrm{class}}(x_n)$ with the normally ordered product (Wick product) $:\varphi(x_1) \dots \varphi(x_n):$ ($\varphi$ is the free quantum field (3)–(6)). Wick’s theorem may be interpreted as the definition of an $\hbar$-dependent associative product,
$$:\prod_{i \in I} \varphi(x_i): \times_\hbar :\prod_{j \in J} \varphi(x_j): \;=\; \sum_{K \subset I}\ \sum_{\substack{\alpha : K \to J \\ \text{injective}}}\ \prod_{j \in K} i\hbar\, \Delta_+(x_j - x_{\alpha(j)})\ :\prod_{l \in (I \setminus K) \cup (J \setminus \alpha(K))} \varphi(x_l): \tag{42}$$
in the linear space spanned by Wick products (the “Wick quantization”).⁶ To be precise, we have to fix a suitable test function space (or better: test distribution space) in (41) which is small enough that the product is well defined for all $\hbar$ and which contains the interesting cases occurring in perturbation theory; e.g., products of translation invariant distributions (particularly $\delta$-distributions of difference variables) with test functions of compact support should be allowed for $t_n$, as in Theorem 0 of Epstein and Glaser.

⁵ The deformed product is called a $*$-product in deformation theory. In order to avoid confusion with the $*$-operation we denote the product by $\times_\hbar$.
⁶ The observation that the Wick quantization is appropriate for the quantization of the free field goes back to Dito [9].
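Let
$$\mathcal W_n \overset{\mathrm{def}}{=} \bigl\{t \in \mathcal D'(\mathbb R^{4n})_{\mathrm{symm}},\ \operatorname{supp} t \text{ compact},\ \operatorname{WF}(t) \cap \bigl(\mathbb R^{4n} \times (V_+^n \cup V_-^n)\bigr) = \emptyset\bigr\} \tag{43}$$
(see the Appendix for a definition of the wave front set WF of a distribution). In [7] it was shown that Wick polynomials smeared with distributions $t \in \mathcal W_n$,
$$(\varphi^{\otimes n})(t) \overset{\mathrm{def}}{=} \int dx_1 \dots dx_n\, :\varphi(x_1) \dots \varphi(x_n):\, t(x_1, \dots, x_n), \qquad (\varphi^{\otimes 0}) \overset{\mathrm{def}}{=} 1, \tag{44}$$
are densely defined operators on an invariant domain in Fock space. This includes in particular the Wick powers
$$:\varphi^n:(f) = (\varphi^{\otimes n})(t), \qquad f \in \mathcal D(\mathbb R^4), \quad t(x_1, \dots, x_n) = f(x_1) \prod_{i=2}^{n} \delta(x_i - x_1). \tag{45}$$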
The product of two such operators is given by
$$(\varphi^{\otimes n})(t) \times_\hbar (\varphi^{\otimes m})(s) = \sum_{k=0}^{\min\{n, m\}} \hbar^k\, (\varphi^{\otimes(n+m-2k)})(t \otimes_k s) \tag{46}$$
with the $k$-times contracted tensor product
$$(t \otimes_k s)(x_1, \dots, x_{n+m-2k}) = \mathcal S\, \frac{n!\, m!\, i^k}{k!\,(n-k)!\,(m-k)!} \int dy_1 \dots dy_{2k}\, \Delta_+(y_1 - y_2) \cdots \Delta_+(y_{2k-1} - y_{2k})\; t(x_1, \dots, x_{n-k}, y_1, y_3, \dots, y_{2k-1})\; s(x_{n-k+1}, \dots, x_{n+m-2k}, y_2, y_4, \dots, y_{2k}) \tag{47}$$
($\mathcal S$ means symmetrization in $x_1, \dots, x_{n+m-2k}$). The conditions on the wave front sets of $t$ and $s$ imply that the product $t \otimes_k s$ exists (see the Appendix) and is an element of $\mathcal W_{n+m-2k}$. The $*$-operation reduces to complex conjugation of the smearing function.

Let $\mathcal W_0 \overset{\mathrm{def}}{=} \mathbb C$ and $\mathcal W \overset{\mathrm{def}}{=} \bigoplus_{n=0}^{\infty} \mathcal W_n$. For $t \in \mathcal W$ let $t_n$ denote the component of $t$ in $\mathcal W_n$. The $*$-operation is defined by $(t^*)_n \overset{\mathrm{def}}{=} \bar t_n$. Equation (46) can be thought of as the definition of an associative product on $\mathcal W$,
$$(t \times_\hbar s)_n = \sum_{m+l-2k=n} \hbar^k\, t_m \otimes_k s_l. \tag{48}$$
The Klein–Gordon equation defines an ideal $\mathcal N$ in $\mathcal W$ which is generated by $(\Box + m^2)f$, $f \in \mathcal D(\mathbb R^4)$. Actually this ideal is independent of $\hbar$ (because a contraction with $(\Box + m^2)f$ vanishes) and coincides with the kernel of $\phi$ defined in (41). Hence the product (48) is well defined on the quotient space $\overline{\mathcal W} = \mathcal W/\mathcal N$. For a given positive value of $\hbar$, $\overline{\mathcal W}$ is isomorphic to the algebra generated by the Wick products $(\varphi^{\otimes n})(t)$, $t \in \mathcal W_n$ (44). In the limit $\hbar \to 0$ we find
$$\lim_{\hbar \to 0} \phi(t) \times_\hbar \phi(s) = \lim_{\hbar \to 0} \phi\Bigl(\sum_n \hbar^n\, t \otimes_n s\Bigr) = \phi(t \otimes_0 s) = \phi(t) \cdot \phi(s) \tag{49}$$
Algebraic Quantum Field Theory, Perturbation Theory, Loop Expansion def
(we set (t ⊗k s)n = lim
h¯ →0
m+l=n tm+k
15
⊗k sl+k , cf. (47)), with the classical product ·, and
1 [φ(t), φ(s)]h¯ = φ(t ⊗1 s − s ⊗1 t) = {φ(t), φ(s)} i h¯
(50)
with the classical Poisson bracket. Thus (W, ×h¯ ) provides a quantization of the given Poisson algebra of the classical free field ϕclass (40). We point out that we have formulated the algebraic structure of smeared Wick products without using the Fock space. The Fock representation is recovered, via the GNS construction, from the vacuum state ω0 (t) = t0 . It is faithful for h¯ = 0 but is one dimensional in the classical limit h¯ = 0. This illustrates the superiority of the algebraic point of view for a discussion of the classical limit.
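The algebraic structure (46)–(50) can be checked on a finite-dimensional toy analogue in Python. This is our own illustration under stated assumptions, not a construction from the text. The field is replaced by finitely many commuting variables $x_0,\dots,x_{d-1}$, Wick monomials by ordinary monomials, and $i\Delta^+$ by an arbitrary constant matrix $w$. The matrix is deliberately not symmetric, so that the product is noncommutative. The product applies $\exp(\hbar B)$, with $B = \sum_{i,j} w_{ij}\,\partial_i \otimes \partial_j$, to $f \otimes g$ and then multiplies the two factors. This contracts legs of $f$ with legs of $g$ with the same combinatorics as in (46)–(47). Exact rational arithmetic lets the sketch check three things: associativity (48), the classical limit (49) at $\hbar = 0$, and $\frac{1}{\hbar}[x_0, x_1]_\hbar = w_{01} - w_{10}$. The last is the toy counterpart of (50), with the factor $i$ absorbed into $w$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Toy model (assumption): a polynomial in d commuting variables is a dict
# {exponent tuple: Fraction}; the constant d x d matrix w stands in for i*Delta^+.


def poly_mul(f, g):
    """Classical (commutative) product of two polynomials."""
    out = {}
    for a, ca in f.items():
        for b, cb in g.items():
            e = tuple(x + y for x, y in zip(a, b))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}


def star(f, g, w, hbar):
    """Deformed product: sum_k hbar^k / k! * mult(B^k (f (x) g)),
    with B = sum_ij w[i][j] d_i (x) d_j acting on the tensor product."""
    d = len(w)
    # f (x) g as a polynomial in 2d variables: first d slots for f, last d for g
    term = {a + b: ca * cb for a, ca in f.items() for b, cb in g.items()}
    result, k, k_factorial = {}, 0, 1
    while term:
        for e, c in term.items():
            mono = tuple(e[i] + e[d + i] for i in range(d))  # identify both copies
            result[mono] = result.get(mono, 0) + c * hbar**k / k_factorial
        nxt = {}
        for e, c in term.items():  # one more contraction: a leg of f with a leg of g
            for i, j in product(range(d), range(d)):
                if e[i] and e[d + j] and w[i][j]:
                    e2 = list(e)
                    e2[i] -= 1
                    e2[d + j] -= 1
                    key = tuple(e2)
                    nxt[key] = nxt.get(key, 0) + c * w[i][j] * e[i] * e[d + j]
        term = {e: c for e, c in nxt.items() if c != 0}
        k += 1
        k_factorial *= k
    return {e: c for e, c in result.items() if c != 0}


def contraction_coefficient(n, m, k):
    """n! m! / (k! (n-k)! (m-k)!) as an exact fraction, cf. (47)."""
    return Fraction(factorial(n) * factorial(m),
                    factorial(k) * factorial(n - k) * factorial(m - k))
```

Any constant $w$ gives an associative product here, because the bidifferential operators involved have constant coefficients and commute with each other. The noncommutativity comes only from the antisymmetric part of $w$.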
5.2. Normalization conditions and retarded products. To study the perturbative quantization of interacting fields we need some technical tools, which are given in this subsection. The time ordered products are constructed by induction on the number $n$ of factors (which is also the order of the perturbation series (10)). In contrast to the inductive construction of the $T$-products in Sect. 4, we do not know $T_n|_{U_n}$ here. So causality (11) and symmetry determine the time ordered products uniquely (in terms of time ordered products of fewer factors) up to the total diagonal $\Delta_n = \{(x_1,\dots,x_n) \in \mathbb{R}^{4n} \mid x_1 = x_2 = \dots = x_n\}$. There is some freedom in the extension to $\Delta_n$. To restrict it we introduce the following additional defining conditions (“normalization conditions”). They are formulated for a scalar field without derivative coupling, i.e. $L$ is a Wick polynomial in $\phi$ alone and contains no derivatives of $\phi$; for the generalization to derivative couplings see [5].
N1 covariance with respect to Poincaré transformations and possibly discrete symmetries,
N2 unitarity: $T(A_1(x_1)\dots A_n(x_n))^* = \bar{T}(A_1^*(x_1)\dots A_n^*(x_n))$,
N3 $[T(A_1(x_1)\dots A_n(x_n)), \phi(x)] = i\hbar \sum_{k=1}^{n} T\big(A_1(x_1)\dots \tfrac{\partial A_k}{\partial \phi}(x_k)\dots A_n(x_n)\big)\, \Delta(x_k - x)$,
N4 $(\Box_x + m^2)\, T(A_1(x_1)\dots A_n(x_n)\phi(x)) = -i\hbar \sum_{k=1}^{n} T\big(A_1(x_1)\dots \tfrac{\partial A_k}{\partial \phi}(x_k)\dots A_n(x_n)\big)\, \delta(x_k - x)$,
where $[\phi(x), \phi(y)] = i\hbar\,\Delta(x - y)$. N1 implies covariance of the resulting theory, and N2 provides a ∗-structure. N3 gives the relation to time ordered products of sub-Wick polynomials. Once these are known (in an inductive procedure), only a scalar distribution has to be fixed. Due to translation invariance the latter depends only on the relative coordinates. Hence the extension of the (operator valued) $T$-product to $\Delta_n$ is reduced to the extension of a C-number distribution $t_0 \in \mathcal{D}'(\mathbb{R}^{4(n-1)} \setminus \{0\})$ to $t \in \mathcal{D}'(\mathbb{R}^{4(n-1)})$. (We call $t$ an extension of $t_0$ if $t(f) = t_0(f)$ for all $f \in \mathcal{D}(\mathbb{R}^{4(n-1)} \setminus \{0\})$.)
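The singularity of $t_0(y)$ and $t(y)$ at $y = 0$ is classified in terms of Steinmann's scaling degree [27, 7]
$$\mathrm{sd}(t) \stackrel{\mathrm{def}}{=} \inf\{\delta \in \mathbb{R} \mid \lim_{\lambda \to 0} \lambda^{\delta}\, t(\lambda x) = 0\}. \tag{51}$$
By definition $\mathrm{sd}(t_0) \le \mathrm{sd}(t)$, and the possible extensions are restricted by requiring
$$\mathrm{sd}(t_0) = \mathrm{sd}(t). \tag{52}$$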
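The freedom that survives (52) can be counted directly. The sketch below assumes the bound $\mathrm{sd}(t_0) \le 4n - b$ that the text quotes in Sect. 6.2 for a scalar theory without derivative couplings and with dimension-4 vertices ($b$ = number of external legs). It then evaluates the maximal order $\omega = \mathrm{sd}(t_0) - 4(n-1)$ of the δ-derivatives allowed in (53). With the bound saturated this gives $\omega = 4 - b$, independently of the order $n$, which is the power-counting statement. The sketch also lists the multi-indices $a$ with $|a| \le \omega$ in the $4(n-1)$ relative coordinates. This is the raw number of constants $C_a$, before the restrictions N1, N2, N4 and permutation symmetry are imposed. The function names are ours.

```python
from math import comb


def multi_indices(dim: int, omega: int) -> list:
    """All multi-indices a in N_0^dim with |a| <= omega (empty if omega < 0)."""
    if omega < 0:
        return []
    if dim == 0:
        return [()]
    out = []
    for first in range(omega + 1):
        for rest in multi_indices(dim - 1, omega - first):
            out.append((first,) + rest)
    return out


def count_multi_indices(dim: int, omega: int) -> int:
    """Closed form: #{a in N_0^dim : |a| <= omega} = C(omega + dim, dim)."""
    return comb(omega + dim, dim) if omega >= 0 else 0


def delta_derivative_order(sd_t0: int, n: int) -> int:
    """omega = sd(t0) - 4(n-1): highest order of derivatives of delta in (53)."""
    return sd_t0 - 4 * (n - 1)


def power_counting_sd(n: int, b: int) -> int:
    """Assumed bound sd <= 4n - b (scalar field, dimension-4 vertices, b legs)."""
    return 4 * n - b
```

With the bound saturated, $b = 4$ gives $\omega = 0$ (a single term $C\delta$), $b = 2$ gives $\omega = 2$, and $b \ge 5$ gives $\omega < 0$, where the extension is unique.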
Then the extension is unique if $\mathrm{sd}(t_0) < 4(n-1)$. In the general case there remains the freedom to add derivatives of the δ-distribution up to order $\mathrm{sd}(t_0) - 4(n-1)$, i.e.
$$t(y) + \sum_{|a| \le \mathrm{sd}(t_0) - 4(n-1)} C_a\, \partial^a \delta(y) \tag{53}$$
is the general solution, where $t$ is a special extension [7, 24, 14]. The constants $C_a$ are restricted by N1, N2, N4, permutation symmetries and possibly further normalization conditions, e.g. the Ward identities for QED [10, 5]. For an interaction with mass dimension $\dim(L) \le 4$ the requirement (52) implies renormalizability by power counting, i.e. the number of indeterminate constants $C_a$ does not increase when going over to higher perturbative orders. In [10] it is shown that the normalization condition N4 implies the field equation for the interacting field corresponding to the free field $\phi$ (see also (77) and Sect. 6.1 below).

We have defined the interacting fields as functional derivatives of relative S-matrices (29). Hence, to formulate the perturbation series of interacting fields we need the perturbative expansion of the relative S-matrices:
$$S_g(f) = \sum_{n,m} \frac{i^{n+m}}{n!\,m!}\, R_{n,m}(g^{\otimes n}; f^{\otimes m}), \tag{54}$$
where $g, f \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$. The coefficients are the so-called retarded products (“R-products”). They can be expressed in terms of time ordered and anti-time ordered products by
$$R_{n,m}(g^{\otimes n}; f^{\otimes m}) = \sum_{k=0}^{n} (-1)^k\, \frac{n!}{k!\,(n-k)!}\, \bar{T}_k(g^{\otimes k}) \times_\hbar T_{n-k+m}(g^{\otimes(n-k)} \otimes f^{\otimes m}). \tag{55}$$
They vanish if one of the first $n$ arguments is not in the past light cone of some of the last $m$ arguments ([14], Sect. 8.1):
$$\mathrm{supp}\, R_{n,m}(\dots) \subset \{(y_1,\dots,y_n,x_1,\dots,x_m) \mid \{y_1,\dots,y_n\} \subset (\{x_1,\dots,x_m\} + \bar{V}_-)\}. \tag{56}$$
In the remaining part of this subsection we show that the time ordered products can be defined in such a way that $R_{n,m}$ is of order $\hbar^n$. For this purpose we introduce two connected parts. The first is the connected part $(a_1 \times_\hbar \dots \times_\hbar a_n)^c$ of $(a_1 \times_\hbar \dots \times_\hbar a_n)$, where the $a_i$ are normally ordered products of free fields. The second is the connected part $T_n^c$ of the time ordered product $T_n$ (or “truncated time ordered product”). In both cases the connected part corresponds to the sum of connected diagrams, provided the vertices belonging to the same $a_i$ are identified. Besides the (deformed) product $\times_\hbar$ (42),
$$a \times_\hbar b = \sum_{n \ge 0} \hbar^n M_n(a, b), \tag{57}$$
where $a, b$ are normally ordered products of free fields, we have the classical product $a \cdot b = M_0(a, b)$, which is just the Wick product
$${:}\prod_{i \in I} \varphi(x_i){:} \cdot {:}\prod_{j \in J} \varphi(x_j){:} \;=\; {:}\prod_{i \in I} \varphi(x_i) \prod_{j \in J} \varphi(x_j){:} \tag{58}$$
and which is also associative and in addition commutative. Then we define $(a_1 \times_\hbar \dots \times_\hbar a_n)^c$ recursively by
$$(a_1 \times_\hbar \dots \times_\hbar a_n)^c \stackrel{\mathrm{def}}{=} (a_1 \times_\hbar \dots \times_\hbar a_n) - \sum_{|P| \ge 2} \prod_{J \in P} (a_{j_1} \times_\hbar \dots \times_\hbar a_{j_{|J|}})^c, \tag{59}$$
where $\{j_1,\dots,j_{|J|}\} = J$, $j_1 < \dots < j_{|J|}$, the sum runs over all partitions $P$ of $\{1,\dots,n\}$ into at least two subsets, and $\prod$ means the classical product (58). $T_n^c$ is defined analogously,
$$T_n^c(f_1 \otimes \dots \otimes f_n) \stackrel{\mathrm{def}}{=} T_n(f_1 \otimes \dots \otimes f_n) - \sum_{|P| \ge 2} \prod_{p \in P} T^c_{|p|}(\otimes_{j \in p} f_j), \tag{60}$$
and similarly we introduce the connected antichronological product $\bar{T}_n^c \equiv (\bar{T}_n)^c$.

Proposition 1. Let the normally ordered products of free fields $a_1,\dots,a_n$ be of order $O(\hbar^0)$. Then
$$(a_1 \times_\hbar \dots \times_\hbar a_n)^c = O(\hbar^{n-1}). \tag{61}$$
Proof. We identify the vertices belonging to the same $a_i$ and apply Wick's theorem (42) to $a_1 \times_\hbar \dots \times_\hbar a_n$. Each “contraction” (i.e. each factor $\Delta^+$) is accompanied by a factor $\hbar$. In the terms $\sim \hbar^0$ (i.e. without any contraction) $a_1,\dots,a_n$ are completely disconnected: the number of connected components is $n$. Each contraction reduces this number by 1 or 0. So to obtain a connected term we need at least $(n-1)$ contractions. Hence the connected terms are of order $O(\hbar^{n-1})$. ∎

Let $\mathcal{B} \ni A_1,\dots,A_n = O(\hbar^0)$ and $x_i \neq x_j$ for all $1 \le i < j \le n$. Then there exists a permutation $\pi \in S_n$ such that
$$T^c\big(A_1(x_1)\dots A_n(x_n)\big) = \big(A_{\pi 1}(x_{\pi 1}) \times_\hbar \dots \times_\hbar A_{\pi n}(x_{\pi n})\big)^c = O(\hbar^{n-1}). \tag{62}$$
We want this estimate to hold true also for coinciding points:
$$T^c\big(A_1(x_1)\dots A_n(x_n)\big) = O(\hbar^{n-1}) \quad \text{on } \mathcal{D}(\mathbb{R}^{4n}). \tag{63}$$
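The recursions (59) and (60) have the same shape as the familiar moment-to-cumulant inversion over set partitions. Here is a small Python sketch of such a recursion. It assumes only that the “full” value on each index set is known and that blocks are multiplied with a commutative product, as for the classical product (58). Numbers stand in for operators, and the function names are ours. With Gaussian moments $(|S|-1)!!$ as the full values, only pairs survive as connected parts. With fully factorized full values, every connected part with two or more factors vanishes. This mirrors the contraction-free $\hbar^0$ terms in the proof of Proposition 1.

```python
def set_partitions(elements):
    """Yield every partition of the list `elements` into non-empty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            # put `first` into an existing block ...
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a block of its own
        yield [[first]] + smaller


def connected_part(full, indices, cache=None):
    """Recursive connected part in the pattern of (59)/(60):
    c(S) = full(S) - sum over partitions P of S with |P| >= 2 of prod_{J in P} c(J).
    `full` maps a sorted tuple of indices to the full (not connected) value."""
    if cache is None:
        cache = {}
    key = tuple(sorted(indices))
    if key in cache:
        return cache[key]
    value = full(key)
    for partition in set_partitions(list(key)):
        if len(partition) < 2:
            continue  # the trivial partition is the full value itself
        term = 1
        for block in partition:
            term *= connected_part(full, block, cache)
        value -= term
    cache[key] = value
    return value
```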
By the following argument this can indeed be satisfied by appropriate normalization of the time ordered products, i.e. (63) is an additional normalization condition, which is compatible with N1–N4. We proceed by induction on the number $n$ of factors. Let us assume that the $T^c$-products with fewer than $n$ factors fulfil (63) and that we are away from the total diagonal $\Delta_n$. We use causal factorization, (60) and the shorthand notation $T(J) \stackrel{\mathrm{def}}{=} T\big(\prod_{j \in J} A_j(x_j)\big)$, $J \subset \{1,\dots,n\}$. We then know that there exists $I \subset \{1,\dots,n\}$, $I \neq \emptyset$, $I^c \neq \emptyset$, with
$$T\big(A_1(x_1)\dots A_n(x_n)\big) = T(I) \times_\hbar T(I^c) = \sum_{r=1}^{|I|} \sum_{s=1}^{|I^c|}\ \sum_{I_1 \sqcup \dots \sqcup I_r = I}\ \sum_{J_1 \sqcup \dots \sqcup J_s = I^c}\ \sum_{k \ge 0} \hbar^k M_k\big(T^c(I_1) \cdot \ldots \cdot T^c(I_r),\ T^c(J_1) \cdot \ldots \cdot T^c(J_s)\big), \tag{64}$$
where $\sqcup$ means the disjoint union. We now pick out the connected diagrams. The term $k = 0$ on the r.h.s. has $(r + s)$ disconnected components. Analogously to Proposition 1 we conclude that $k \ge r + s - 1$ must hold for a connected diagram. Taking the validity of (63) for $T^c(I_l)$ and $T^c(J_m)$ into account, we obtain
$$\sum_{l=1}^{r} (|I_l| - 1) + \sum_{m=1}^{s} (|J_m| - 1) + (r + s - 1) = n - 1$$
for the minimal order in $\hbar$ of a connected diagram. So the $\hbar$-power behaviour (62) holds true on $\mathcal{D}(\mathbb{R}^{4n} \setminus \Delta_n)$, and (63) is in fact a normalization condition. Due to (60), $(T_n - T_n^c)$ is completely given by time ordered products of lower orders $< n$ and hence is known also on $\Delta_n$. The problem of extending $T_n$ to $\Delta_n$ concerns solely $T_n^c$. The normalization conditions N1–N4 are equivalent to the same conditions for $T_n^c$ and $\bar{T}_n^c$ (i.e. with $T_n$ and $\bar{T}_n$ everywhere replaced by $T_n^c$ and $\bar{T}_n^c$). Due to N3–N4 it only remains to extend $\langle \Omega, T^c(A_1 \dots A_n)\Omega \rangle$, where all $A_j$ are different from free fields and $\Omega$ is the vacuum. It is obvious that this can be done in a way which maintains (63) and is in accordance with N1–N2. We emphasize that the (ordinary) time ordered product $T_n$ does not satisfy (63) because of the presence of disconnected diagrams. On the other hand, the connected antichronological product $\bar{T}_n^c$ fulfils the estimate (63), as may be seen by unitarity N2. We now turn to the retarded products (55):

Proposition 2. Let $\mathcal{D}(\mathbb{R}^4, \mathcal{V}) \ni f_j, g_k = O(\hbar^0)$. Then the following statements hold true:
(i) All diagrams which contribute to $R_{n,m}(f_1 \otimes \dots \otimes f_n; g_1 \otimes \dots \otimes g_m)$ have the property that each $f_j$-vertex is connected with at least one $g_k$-vertex.
(ii) $R_{n,m}(f_1 \otimes \dots \otimes f_n; g_1 \otimes \dots \otimes g_m) = O(\hbar^n)$.

Proof. (i) We work with the notation $R_{n,m}(Y; X)$, $Y \equiv \{y_1,\dots,y_n\}$, $X \equiv \{x_1,\dots,x_m\}$ (cf. [14]). Consider a subdiagram with vertices $J \subset Y$ which is not connected with the other vertices $(Y \setminus J) \cup X$. Because disconnected diagrams factorize with respect to the classical product (58), the corresponding contribution to $R_{n,m}(Y; X)$ (55) reads
$$\sum_{I \subset Y} (-1)^{|I|}\, \big[\bar{T}(I \cap J^c) \cdot \bar{T}(I \cap J)\big] \times_\hbar \big[T(I^c \cap J) \cdot T(I^c \cap J^c, X)\big]. \tag{65}$$
However, this expression vanishes due to $\sum_{P \subset J} (-1)^{|P|}\, \bar{T}(P) \times_\hbar T(J \setminus P) = 0$. The latter equation is equivalent to (13); it is the perturbative version of $S^{-1}S = 1$. Hence for non-vanishing diagrams $J$ must be the empty set.
(ii) We express the R-product in terms of the connected $T$- and $\bar{T}$-products:
$$R_{n,m}(f_1 \otimes \dots \otimes f_n; g_1 \otimes \dots \otimes g_m) = \sum_{I \subset \{1,\dots,n\}} (-1)^{|I|} \sum_{P \in \mathrm{Part}(I)}\ \sum_{Q \in \mathrm{Part}(I^c \sqcup \{1,\dots,m\})} \Big(\prod_{p \in P} \bar{T}^c_{|p|}(\otimes_{i \in p} f_i)\Big) \times_\hbar \Big(\prod_{q \in Q} T^c_{|q|}(\otimes_{i \in q} f_i \otimes \otimes_{j \in q} g_j)\Big), \tag{66}$$
where again $\prod$ means the classical product (58) and $\sqcup$ again stands for the disjoint union. From (63) we know
$$\prod_{p \in P} \bar{T}^c_{|p|}(\otimes_{i \in p} f_i) = O(\hbar^{|I| - |P|}), \qquad \prod_{q \in Q} T^c_{|q|}(\otimes_{i \in q} f_i \otimes \otimes_{j \in q} g_j) = O(\hbar^{|I^c| + m - |Q|}). \tag{67}$$
From part (i) we conclude that the terms of lowest order (in $\hbar$) in
$$\Big(\prod_{p \in P} \bar{T}^c_{|p|}(\dots)\Big) \times_\hbar \Big(\prod_{q \in Q} T^c_{|q|}(\dots)\Big) = \sum_{k \ge 0} \hbar^k M_k\Big(\prod_{p \in P} \bar{T}^c_{|p|}(\dots),\ \prod_{q \in Q} T^c_{|q|}(\dots)\Big) \tag{68}$$
do not contribute. For simplicity we first consider the special case $m = 1$. Then only connected diagrams contribute. Hence we obtain $k \ge |P| + |Q| - 1$, similarly to the reasoning after (64). For arbitrary $m \ge 1$ the terms with minimal power in $\hbar$ correspond to diagrams which are maximally disconnected. According to part (i) these diagrams have $m$ disconnected components, each containing precisely one vertex $g_j$. Applying the argument for $m = 1$ to each of these components we get $k \ge |P| + |Q| - m$. Taking (67) into account, the assertion follows:
$$(|I| - |P|) + (|I^c| + m - |Q|) + (|P| + |Q| - m) = n.$$ ∎

5.3. Interacting fields. We first describe the perturbative construction of the interacting classical field. Let $L$ be a function of the field which serves as the interaction Lagrangian (for simplicity, we do not consider derivative couplings). We want to find a Poisson algebra generated by a solution of the field equation
$$(\Box + m^2)\,\varphi_L(x) = -\Big(\frac{\partial L}{\partial \varphi}\Big)_L(x), \tag{69}$$
with the initial conditions
$$\{\varphi_L(0,\mathbf{x}), \varphi_L(0,\mathbf{y})\} = 0 = \{\dot{\varphi}_L(0,\mathbf{x}), \dot{\varphi}_L(0,\mathbf{y})\}, \qquad \{\varphi_L(0,\mathbf{x}), \dot{\varphi}_L(0,\mathbf{y})\} = \delta(\mathbf{x} - \mathbf{y}). \tag{70}$$
We proceed in analogy to the construction of the interacting quantum field in Sect. 3. In a first step we construct solutions with localized interactions $\theta L$, $\theta \in \mathcal{D}(\mathbb{R}^4)$, which coincide at early times with the free field (hence the initial conditions (70) are trivially satisfied for sufficiently early times). They are given by a formal power series in the Poisson algebra of the free field:
$$\varphi_{\theta L}(x) = \sum_{n=0}^{\infty} \int_{y_1^0 \le y_2^0 \le \dots \le y_n^0 \le x^0} dy_1\, dy_2 \dots dy_n\, \theta(y_1) \dots \theta(y_n)\, \{L(y_1), \{L(y_2), \dots \{L(y_n), \varphi(x)\} \dots \}\}. \tag{71}$$
As in the quantum case, the structure of the Poisson algebra associated to a causally closed region $\mathcal{O}$ does not depend on the behaviour of the interaction Lagrangian outside of $\mathcal{O}$. That is, for $\theta, \theta' \in \Theta(\mathcal{O})$ there is a canonical transformation $v$ with $v(\varphi_{\theta L}(x)) = \varphi_{\theta' L}(x)$ for all $x \in \mathcal{O}$. The interacting field $\varphi_L$ may then be defined as a covariantly constant section within a bundle of Poisson algebras. Starting from the classical interacting field, one may try to define the quantized interacting field by two replacements. Products of free classical fields are replaced by the normally ordered product of the corresponding free quantum fields (as in Sect. 5.1), and the Poisson brackets in (71) are replaced by commutators:
$$\{\cdot\,, \cdot\} \to \frac{1}{i\hbar}\, [\cdot\,, \cdot]_\hbar, \tag{72}$$
where the commutator refers to the quantized product $\times_\hbar$. Note that in general this replacement produces additional terms, e.g. the terms $k \ge 2$ in
$$\frac{1}{i\hbar}\, [{:}\varphi^n(x){:}, {:}\varphi^m(y){:}]_\hbar = \sum_{k=1}^{\min\{n,m\}} (i\hbar)^{k-1}\, \frac{n!\,m!}{k!\,(n-k)!\,(m-k)!}\, \big(\Delta^+(x-y)^k - \Delta^+(y-x)^k\big)\, {:}\varphi^{n-k}(x)\, \varphi^{m-k}(y){:}, \tag{73}$$
which correspond to loop diagrams. Due to the distributional character of the fields with respect to the quantized product, the integral in (71), as it stands, is not well defined (there is an ambiguity for coinciding points due to the time ordering). But, as we will see, Bogoliubov's formula (29) for the interacting quantum field may be interpreted as a precise version of this integral; that formula expresses the field as a functional derivative of the relative S-matrix. From the factorization properties (11), (14) of time ordered and anti-time ordered products one gets a recursion formula for the retarded products ((54), (55)). If $\mathrm{supp}\, g$ is contained in the past and $\mathrm{supp}\, f$, $\mathrm{supp}\, h$ in the future of some Cauchy surface, we find
$$R_{n+1,m}(g \otimes h^{\otimes n}; f^{\otimes m}) = -[T_1(g), R_{n,m}(h^{\otimes n}; f^{\otimes m})]_\hbar, \tag{74}$$
where we used the fact that $\bar{T}_1 = T_1$. Hence, for $m = 1$ and $y_i \neq y_j$ for all $i \neq j$, the retarded product $R_{n,1}(y_1,\dots,y_n; x)$ can be written in the form⁷
$$R\big(L(y_1)\dots L(y_n); \varphi(x)\big) = (-1)^n \sum_{\pi \in S_n} \Theta(x^0 - y^0_{\pi n})\, \Theta(y^0_{\pi n} - y^0_{\pi(n-1)}) \cdots \Theta(y^0_{\pi 2} - y^0_{\pi 1})\, [L(y_{\pi 1}), [L(y_{\pi 2}), \dots [L(y_{\pi n}), \varphi(x)]_\hbar \dots]_\hbar]_\hbar. \tag{75}$$
(Due to the locality of the interaction $L$ this is a Poincaré covariant expression.) This formula confirms part (ii) of Proposition 2 for non-coinciding $y_i$. Our main application of (75) is the study of the classical limit $\hbar \to 0$ of the quantized interacting field (29). Due to Proposition 2 (part (ii)), $R\big(\hbar^{-1}L(y_1)\dots \hbar^{-1}L(y_n); \varphi(x)\big)$ contains no terms with negative powers of $\hbar$ and thus has a well-defined classical limit. We conclude that the quantized interacting field (29), (54),
$$\varphi_{\theta L}(h) = \sum_{n=0}^{\infty} \frac{i^n}{n!\,\hbar^n}\, R_{n,1}\big((\theta L)^{\otimes n}; h\varphi\big), \qquad h \in \mathcal{D}(\mathbb{R}^4), \tag{76}$$
tends to the classical interacting field (71) in this limit. Note that the factor $\hbar^{-1}$ in the interaction Lagrangian is in accordance with the quantization rule (72), since in (75) there is precisely one commutator for each factor $L$. In $R_{n,1}((\theta L)^{\otimes n}; f\varphi)$ the above mentioned ambiguities for coinciding points in the iterated retarded commutators have been fixed by the definition of time ordered products as everywhere defined distributions. The normalization condition N4 implies an analogous equation for the retarded product $R_{n,1}$ (cf. [10]). The latter means that $\varphi_L$ (76) satisfies the same field equation as the classical interacting field (69):
$$(\Box + m^2)\,\varphi_L(x) = -\Big(\frac{\partial L}{\partial \varphi}\Big)_L(x). \tag{77}$$
⁷ The notation for the time ordered products introduced in Sect. 2 is used here for the retarded products.
Here $\big(\frac{\partial L}{\partial \varphi}\big)_L$ is not necessarily a polynomial in $\varphi_L$ (the pointwise product of interacting fields is in general not defined). We found that the relative S-matrices $S_{\hbar^{-1}\theta L}(f)$ ($f \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$) are power series in $\hbar$, and hence so are all elements of the algebra $\mathcal{A}_{\hbar^{-1}\theta L}$. For the global algebras of covariantly constant sections we recall from [7] that the unitaries $V \in \mathcal{U}(\theta, \theta')$ can be chosen as relative S-matrices
$$V = S_{\hbar^{-1}\theta L}(\hbar^{-1}\theta_- L)^{-1} \in \mathcal{U}(\theta, \theta'), \tag{78}$$
where $\theta_- \in \mathcal{D}(\mathbb{R}^4)$ depends on $(\theta' - \theta)$ in the following way. We split $\theta' - \theta = \theta_+ + \theta_-$ with $\mathrm{supp}\,\theta_+ \cap (C(\mathcal{O}) + \bar{V}_-) = \emptyset$ and $\mathrm{supp}\,\theta_- \cap (C(\mathcal{O}) + \bar{V}_+) = \emptyset$. Here $C(\mathcal{O})$ means the causally closed region containing $\mathcal{O}$ in which $\theta$ and $\theta'$ agree, cf. (18). So $V$ is a formal Laurent series in $\hbar$, and the sections are no longer well defined power series. Replace $\mathcal{A}$ and $\mathcal{A}(\mathcal{O})$ by $\sum_{n \in \mathbb{N}_0} \hbar^n \mathcal{A}$ and $\sum_{n \in \mathbb{N}_0} \hbar^n \mathcal{A}(\mathcal{O})$ (for the new algebras the same symbol $\mathcal{A}$ will be used again). We then obtain modules over the ring of formal power series in $\hbar$ with complex coefficients. For the further construction the validity of part (iii) of the following proposition is crucial:

Proposition 3. (i) Let $R_{n,m}(\dots;\dots) = \sum_{a=1}^{m} R^{(a)}_{n,m}(\dots;\dots)$, where $R^{(a)}_{n,m}(\dots;\dots)$ is the sum of all diagrams with $a$ connected components. Then
$$R^{(a)}_{n,m}\big((\hbar^{-1}\theta L)^{\otimes n}; (\hbar^{-1}\theta_- L)^{\otimes m}\big) = O(\hbar^{-a}). \tag{79}$$
(Note that the range of $a$ is restricted by part (i) of Proposition 2.) This estimate is of more general validity. Instead of a retarded product we could have, e.g., a multiple $\times_\hbar$-product, a time ordered or an antichronological product, and the factors may be quite arbitrary. It is only essential that each factor is of order $O(\hbar^{-1})$.
(ii) Let $A \in \mathcal{A}(\mathcal{O})$. Then all diagrams which contribute to $V \times_\hbar A \times_\hbar V^{-1}$ (where $V$ is given by (78)) have the property that each vertex of $V$ and of $V^{-1}$ is connected with at least one vertex of $A$. (It may happen that a connected component of $V$ is not directly connected with $A$, but that it is connected with a connected component of $V^{-1}$ and the latter is connected with $A$.)
(iii)
$$\mathcal{A}(\mathcal{O}) \ni A = O(\hbar^n) \implies V \times_\hbar A \times_\hbar V^{-1} = O(\hbar^n). \tag{80}$$
In particular, if $A$ is the term of $n$-th order in $\hbar$ of an interacting field, then $V \times_\hbar A \times_\hbar V^{-1}$ is a power series in $\hbar$ in which the terms up to order $\hbar^{n-1}$ vanish.

Proof. Part (i) is obtained essentially in the same way as Proposition 1. Part (iii) is a consequence of parts (i) and (ii) and the following observation. Consider a diagram which contributes to $V \times_\hbar A \times_\hbar V^{-1}$ according to part (ii). If the subdiagrams belonging to $V$ and $V^{-1}$ have $r$ and $s$ connected components, then the whole diagram has at least $(r + s)$ contractions, which yield a factor $\hbar^{r+s}$. It remains to prove (ii). We use the same notations as in the proof of Proposition 2. Let $Y_1 \sqcup Y_2 = Y$, $X_1 \sqcup X_2 = X$. We now consider the sum of all diagrams contributing to $R(Y, X)$ in which the vertices $(Y_1, X_1)$ are not connected with the vertices $(Y_2, X_2)$.
Using (55) and the fact that disconnected diagrams factorize with respect to the classical product (58), this (partial) sum is equal to
$$\sum_{I \subset Y} (-1)^{|I \cap Y_1|} \big[\bar{T}(I \cap Y_1) \times_\hbar T(I^c \cap Y_1, X_1)\big] \cdot (-1)^{|I \cap Y_2|} \big[\bar{T}(I \cap Y_2) \times_\hbar T(I^c \cap Y_2, X_2)\big] = R(Y_1, X_1) \cdot R(Y_2, X_2). \tag{81}$$
From $1 = VV^{-1} = VV^*$, (54) and (78) we know
$$\sum_{Y_1 \sqcup Y_2 = Y,\ X_1 \sqcup X_2 = X} (-1)^{|Y_1| + |X_1|}\, R^*(Y_1, X_1) \times_\hbar R(Y_2, X_2) = 0 \tag{82}$$
for fixed $(Y, X)$, $Y \cup X \neq \emptyset$. Next we note
$$V \times_\hbar A \times_\hbar V^{-1} = \sum_{n,m} \frac{1}{n!\,m!} \int dy_1 \dots dy_n\, dx_1 \dots dx_m\, \theta(y_1) \dots \theta(y_n)\, \theta_-(x_1) \dots \theta_-(x_m) \sum_{Y_1 \sqcup Y_2 = Y,\ X_1 \sqcup X_2 = X} (-i)^{|Y_1| + |X_1|}\, i^{|Y_2| + |X_2|}\, R^*(Y_1, X_1) \times_\hbar A \times_\hbar R(Y_2, X_2), \tag{83}$$
where we have used the notations $Y \equiv \{y_1,\dots,y_n\}$, $X \equiv \{x_1,\dots,x_m\}$. In the integrand of the latter expression we consider (for given $Y$ and $X$) fixed decompositions $Y = Y_3 \sqcup Y_4$ and $X = X_3 \sqcup X_4$, $Y_3 \cup X_3 \neq \emptyset$. Now we consider the (partial) sum of all diagrams in which the vertices $(Y_3, X_3)$ are not connected with $A$ and each of the vertices $(Y_4, X_4)$ is connected with $A$. Part (ii) is proved if we can show that this partial sum vanishes. This indeed holds true, because $R^*$ and $R$ factorize according to (81), and due to the unitarity (82):
$$\sum_{Y_1 \sqcup Y_2 = Y,\ X_1 \sqcup X_2 = X} (-1)^{|Y_1 \cap Y_4| + |X_1 \cap X_4|}\, R^*(Y_1 \cap Y_4, X_1 \cap X_4) \times_\hbar A \times_\hbar R(Y_2 \cap Y_4, X_2 \cap X_4) \cdot (-1)^{|Y_1 \cap Y_3| + |X_1 \cap X_3|}\, R^*(Y_1 \cap Y_3, X_1 \cap X_3) \times_\hbar R(Y_2 \cap Y_3, X_2 \cap X_3) = 0.$$ ∎
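Proposition 3(i) rests on graph counting, and so does the loop-counting relation at the end of this subsection. Each line (contraction) carries $\hbar$, and each vertex coming from $\hbar^{-1}\theta L$ (or $\hbar^{-1}\theta_- L$) carries $\hbar^{-1}$. A graph with $I$ lines, $V$ vertices and $C$ components has $I - V + C$ independent loops. Below is a Python check of the resulting relation, order $= I - n =$ loops $+\, m - C$. Here $n$ vertices carry $\hbar^{-1}$ and $m$ vertices are of order $\hbar^0$; for $m = 0$ the order is loops $- C \ge -C$, as in (79). Loops are computed independently of the formula, as $I$ minus the rank over GF(2) of the vertex–line incidence matrix. The diagram encoding is our own convention: vertices $0,\dots,n-1$ are interaction vertices, $n,\dots,n+m-1$ are the $O(\hbar^0)$ vertices, and there is one edge per contraction. No self-lines occur, consistent with normally ordered vertices.

```python
import random


def count_components(num_vertices, edges):
    """Connected components of a multigraph, via union-find."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(v) for v in range(num_vertices)})


def count_loops(edges):
    """Independent loops = #lines - rank of the incidence matrix over GF(2)."""
    basis = {}  # leading bit -> reduced incidence vector
    rank = 0
    for a, b in edges:
        vec = (1 << a) ^ (1 << b)
        while vec:
            lead = vec.bit_length() - 1
            if lead not in basis:
                basis[lead] = vec
                rank += 1
                break
            vec ^= basis[lead]  # eliminate the leading bit
    return len(edges) - rank


def hbar_order(num_interaction_vertices, edges):
    """One hbar per line, one hbar^-1 per vertex taken from hbar^-1 * theta * L."""
    return len(edges) - num_interaction_vertices


def check_random_diagrams(trials=300, seed=0):
    """Test order == loops + m - components on random toy diagrams."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, m = rng.randint(1, 4), rng.randint(0, 3)
        vertices = n + m
        if vertices < 2:
            continue
        edges = [tuple(rng.sample(range(vertices), 2)) for _ in range(rng.randint(0, 7))]
        if hbar_order(n, edges) != count_loops(edges) + m - count_components(vertices, edges):
            return False
    return True
```

For example, the sunset diagram of the interacting field with two ${:}\varphi^4{:}$ vertices has edges $(2,0),(0,1),(0,1),(0,1)$. It is connected with $m = 1$, and its order in $\hbar$ equals its number of loops, namely 2.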
Now we are ready to give an algebraic formulation of the expansion in $\hbar$. Let $\mathcal{I}_n \stackrel{\mathrm{def}}{=} \hbar^n \mathcal{A}_L$. $\mathcal{I}_n$ is an ideal in the global algebra $\mathcal{A}_L$. We define
$$\mathcal{A}_L^{(n)} \stackrel{\mathrm{def}}{=} \frac{\mathcal{A}_L}{\mathcal{I}_{n+1}}, \qquad \mathcal{A}_L^{(n)}(\mathcal{O}) \stackrel{\mathrm{def}}{=} \frac{\mathcal{A}_L(\mathcal{O})}{\mathcal{I}_{n+1} \cap \mathcal{A}_L(\mathcal{O})}, \tag{84}$$
which means that we neglect all terms of order $O(\hbar^{n+1})$. The embeddings $i_{21}: \mathcal{A}_L(\mathcal{O}_1) \hookrightarrow \mathcal{A}_L(\mathcal{O}_2)$ for $\mathcal{O}_1 \subset \mathcal{O}_2$ induce embeddings $\mathcal{A}_L^{(n)}(\mathcal{O}_1) \hookrightarrow \mathcal{A}_L^{(n)}(\mathcal{O}_2)$. Thus we obtain a projective system of local nets $(\mathcal{A}_L^{(n)}(\mathcal{O}))$ of algebras of quantum observables up to order $\hbar^n$ (i.e. modulo terms of order $\hbar^{n+1}$).
Note that we may equip our algebras $\mathcal{A}_L^{(n)}$ also with the Poisson bracket induced by $\frac{1}{i\hbar}[\cdot, \cdot]_\hbar$, because the ideals $\mathcal{I}_n$ are also Poisson ideals with respect to these brackets. Then $\mathcal{A}_L^{(0)}$ becomes the local net of Poisson algebras of the classical field theory, whereas for $n \neq 0$ we obtain a net of noncommutative Poisson algebras. The expansion in powers of $\hbar$ is usually called “loop expansion”. This is because of the order in $\hbar$ of a given Feynman diagram belonging to $R_{n,m}\big((\hbar^{-1}\theta L)^{\otimes n}; f_1 \otimes \dots \otimes f_m\big)$, $\mathcal{D}(\mathbb{R}^4, \mathcal{V}) \ni f_j = O(\hbar^0)$. This order is equal to
(number of propagators (i.e. inner lines)) − n = (number of loops) + m − (number of connected components).
In particular, using part (i) of Proposition 2, we find that for the interacting fields ($m = 1$) the order in $\hbar$ agrees with the number of loops.

6. Local Algebraic Formulation of the Quantum Action Principle

The method of algebraic renormalization (for an overview see [22]) relies on the so-called “quantum action principle” (QAP), which is due to Lowenstein [20] and Lam [18]. This principle is a formula for the variation of (possibly connected or one-particle-irreducible) Green's functions (or of the corresponding generating functional) under
– a change of coordinates (e.g. one applies the differential operator of the field equation to the Green's functions),
– a variation of the fields (e.g. the BRST transformation),
– a variation of a parameter. This may be a parameter in the Lagrangian or in the normalization conditions for the Green's functions.
These are different theorems with different proofs. The common statement is that the variation of the Green's functions is equal to the insertion of a local or spacetime-integrated composite field operator (for details see [22]). In this section we study two simple cases of the QAP: the field equation and the variation of a parameter which appears only in the interaction Lagrangian. The aim of this section is to formulate the QAP (in these two cases) for our local algebras of observables $\mathcal{A}_L(\mathcal{O})$, i.e. we are looking for an operator identity which holds true independently of the adiabatic limit.
Such an identity does not depend on the choice of a state, unlike the Green's functions. In a second step we compare our formula with the usual formulation of the QAP in terms of Green's functions. The latter are the vacuum expectation values in the adiabatic limit $g \to 1$.⁸ We specialize to models for which the adiabatic limit is known to exist. This is the case for purely massive theories [14] and certain theories with (some) massless particles, such as QED and $\lambda {:}\varphi^{2n}{:}$-theories [4], provided the time ordered products are appropriately normalized.

Remarks. (1) From the usual QAP (in terms of Green's functions) one obtains an operator identity by means of the Lehmann–Symanzik–Zimmermann reduction formalism [19]. The latter relies on the adiabatic limit. Nevertheless, an analogous conclusion from the Fock vacuum expectation values to arbitrary matrix elements is possible in our local construction. Let $\mathcal{O}$ be an open double cone and let $x_1,\dots,x_k \notin \big((\bar{\mathcal{O}} \cup \{x_{k+l+1},\dots,x_n\}) + \bar{V}_-\big)$, $x_{k+1},\dots,x_{k+l} \in \mathcal{O}$ and $x_{k+l+1},\dots,x_n \notin (\bar{\mathcal{O}} + \bar{V}_+)$. Using the causal factorization of time ordered products of interacting fields (28) we obtain
$$\big(\Omega, T_{\theta L}(\varphi(x_1)\dots\varphi(x_n))\Omega\big) = \big(T_{\theta L}(\varphi(x_1)\dots\varphi(x_k))^*\Omega,\ T_{\theta L}(\varphi(x_{k+1})\dots\varphi(x_{k+l}))\, T_{\theta L}(\varphi(x_{k+l+1})\dots\varphi(x_n))\Omega\big). \tag{85}$$
Now we choose $\theta \in \Theta(\mathcal{O})$ such that $\{x_1,\dots,x_k\} \cap (\mathrm{supp}\,\theta + \bar{V}_-) = \emptyset$ and $\{x_{k+l+1},\dots,x_n\} \cap (\mathrm{supp}\,\theta + \bar{V}_+) = \emptyset$. Due to the retarded support (56) of the R-products, $T_{\theta L}(\varphi(x_{k+l+1})\dots\varphi(x_n))$ then agrees with the time ordered product $T_0(\varphi(x_{k+l+1})\dots\varphi(x_n))$ of the corresponding free fields. By means of $S_{\theta L}(f\varphi) = S(\theta L)^{-1} S(f\varphi) S(\theta L)$ for $\mathrm{supp}\, f \cap (\mathrm{supp}\,\theta + \bar{V}_-) = \emptyset$ we obtain
$$T_{\theta L}(\varphi(x_1)\dots\varphi(x_k))^* = S(\theta L)^{-1}\, T_0(\varphi(x_1)\dots\varphi(x_k))^*\, S(\theta L). \tag{86}$$
Our assertion now follows from two facts. The states $T_0(\varphi(x_{k+l+1})\dots\varphi(x_n))\Omega$ generate a dense subspace of the Fock space, and the same holds for the states $S(\theta L)^{-1} T_0(\varphi(x_1)\dots\varphi(x_k))^* S(\theta L)\Omega$. (For the validity of the latter statement it is important that $x_1,\dots,x_k$ can be spread arbitrarily over a Cauchy surface which is later than $(\bar{\mathcal{O}} \cup \{x_{k+l+1},\dots,x_n\})$.) (2) Recently Pinter [23] presented an alternative derivation of the QAP for the variation of a parameter in the Lagrangian, also in the framework of causal perturbation theory. In contrast to our presentation, Pinter's QAP is formulated for the S-matrix.

6.1. Field equation. The normalization condition N4 implies
$$(\Box_x + m^2)\, R\big(L(y_1)\dots L(y_n); \phi(x)\phi(x_1)\dots\phi(x_m)\big) = -i \sum_{l=1}^{n} \delta(x - y_l)\, R\Big(L(y_1)\dots \hat{l} \dots L(y_n); \frac{\partial L}{\partial \phi}(x)\, \phi(x_1)\dots\phi(x_m)\Big) - i \sum_{j=1}^{m} \delta(x - x_j)\, R\big(L(y_1)\dots L(y_n); \phi(x_1)\dots \hat{j} \dots \phi(x_m)\big), \tag{87}$$
where $\hat{l}$ and $\hat{j}$ mean that the corresponding factor is omitted. This equation takes a simple form for the corresponding generating functionals (i.e. the relative S-matrices (16)):
$$f(x)\, S_{gL}(f\phi) = (\Box_x + m^2) \frac{\delta}{i\delta f(x)} S_{gL}(f\phi) - \frac{\delta}{i\delta\rho(x)}\Big|_{\rho=0} S_{gL}\Big(f\phi + \rho g \frac{\partial L}{\partial \phi}\Big). \tag{88}$$
To formulate this in terms of our local algebras of observables (cf. Sect. 3) we set $g \equiv \theta \in \Theta(\mathcal{O})$; for $x \in \mathcal{O}$ we can choose $\rho$ such that $\mathrm{supp}\,\rho \subset \{y \mid \theta(y) = 1\}$. Then (88) turns into
$$(\Box_x + m^2) \frac{\delta}{i\delta f(x)} S_L(f\phi) = f(x)\, S_L(f\phi) + \frac{\delta}{i\delta\rho(x)}\Big|_{\rho=0} S_L\Big(f\phi + \rho \frac{\partial L}{\partial \phi}\Big), \qquad x \in \mathcal{O}. \tag{89}$$

⁸ This limit is taken by scaling the test function $g$: let $g_0 \in \mathcal{D}(\mathbb{R}^4)$, $g_0(0) = 1$; then one considers the limit $A \to 0$ ($A > 0$) of $g_A(x) \equiv g_0(Ax)$. Uniqueness of the adiabatic limit means independence of the particular choice of $g_0$.
This is the QAP (in the case of the field equation) for the local algebras of observables. To compare with the usual form of the QAP we consider the generating functional $Z(f)$ for the Green's functions $\langle \Omega | T(\phi_L(x_1)\dots\phi_L(x_m)) | \Omega \rangle$, which is obtained from the relative S-matrices by
$$Z(f) = \lim_{g \to 1} \big(\Omega, S_{gL}(f\phi)\Omega\big), \tag{90}$$
where $\Omega$ is the Fock vacuum [14]. So by taking the vacuum expectation value and the adiabatic limit of (88) we get
$$f(x)\, Z(f) = -\Delta(x) \cdot Z(f), \tag{91}$$
where $\Delta(x)$ is an insertion of UV-dimension⁹ 3. In the classical approximation it coincides with the classical field polynomial $\frac{\delta S}{\delta \phi(x)}$, where $S = \int d^4x\, \big[\tfrac{1}{2}(\partial_\mu \phi(x)\, \partial^\mu \phi(x) - m^2 \phi^2(x)) + g(x) L(x)\big]$ is the classical action. Equation (91) is the usual form of the QAP (cf. eqn. (3.20) in [22]). In the present case the local algebraic formulation (89) contains more information than the usual QAP (91).
6.2. Variation of a parameter in the interaction. In (54) we have defined retarded products of Wick polynomials, i.e. of elements of the Borchers class. Analogously we now introduce retarded products $R_L(g^{\otimes n}; f^{\otimes m})$ of interacting fields:
$$S_{L+g}(f) = S_L(g)^{-1} S_L(g + f) \stackrel{\mathrm{def}}{=} \sum_{n,m=0}^{\infty} \frac{i^{n+m}}{n!\,m!}\, R_L(g^{\otimes n}; f^{\otimes m}), \tag{92}$$
where $L, g, f \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$. They can be expressed in terms of antichronological and time ordered products of interacting fields by exactly the same formula as in the case of Wick polynomials (55):
$$R_L(g^{\otimes n}; f^{\otimes m}) = \sum_{k=0}^{n} (-1)^k\, \frac{n!}{k!\,(n-k)!}\, \bar{T}_L(g^{\otimes k})\, T_L(g^{\otimes(n-k)} \otimes f^{\otimes m}). \tag{93}$$
Here the antichronological product of interacting fields is defined analogously to the time ordered product (28), namely by
$$\bar{T}_L(f^{\otimes m}) = \frac{d^m}{(-i)^m\, d\lambda^m}\Big|_{\lambda=0} S_L(\lambda f)^{-1}, \tag{94}$$
and it satisfies anticausal factorization (14) (which justifies the name). The support property (56) of the retarded products relies on the (anti)causal factorization of the $T$- and $\bar{T}$-products (11), (14). Hence the R-product of interacting fields ((92), (93)) also has retarded support (56). Following Lowenstein in [20], Sect. II.B, we consider an infinitesimal change of the interaction Lagrangian
$$L_0 \to L_0 + A L_1, \tag{95}$$
⁹ We assume that $L$ has UV-dimension 4.
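where $L_0, L_1 \in \mathcal{V}$ or $\mathcal{D}(\mathbb{R}^4, \mathcal{V})$. For the $m$-fold variation of the time ordered product of the interacting fields (28) we obtain
$$\frac{d^m}{dA^m}\Big|_{A=0} T_{\theta(L_0 + AL_1)}(f^{\otimes l}) = \frac{\partial^m}{\partial A^m}\Big|_{A=0} \frac{\partial^l}{i^l\, \partial\lambda^l}\Big|_{\lambda=0} S_{\theta(L_0 + AL_1)}(\lambda f) = i^m\, R_{\theta L_0}\big((\theta L_1)^{\otimes m}; f^{\otimes l}\big). \tag{96}$$
To formulate this identity for our local algebras of observables we assume that $L_1$ has compact support, i.e. $L_1 \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$. We set
$$\Theta_0(\mathcal{O}) \stackrel{\mathrm{def}}{=} \{\theta \in \Theta(\mathcal{O}) \mid \theta|_{\mathrm{supp}\, L_1} \equiv 1\}. \tag{97}$$
We consider the observables as covariantly constant sections in the bundle over $\Theta_0(\mathcal{O})$ (instead of $\Theta(\mathcal{O})$ as in Sect. 3). Then we obtain
$$\frac{d^m}{dA^m}\Big|_{A=0} T_{L_0 + AL_1}(f^{\otimes l}) = i^m\, R_{L_0}(L_1^{\otimes m}; f^{\otimes l}). \tag{98}$$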
This is the local algebraic formulation of the QAP for the variation of a parameter in the interaction. We are now going to investigate the usual QAP by using Epstein and Glaser's definition of Green's functions (90). In (96) the $m$-fold variation of the parameter $A$ results in a retarded insertion of $(\theta L_1)^{\otimes m}$. In the usual QAP, $(\theta L_1)^{\otimes m}$ is inserted into the time ordered product, i.e. one considers
$$i^m\, T_{\theta L_0}\big((\theta L_1)^{\otimes m} \otimes f^{\otimes l}\big) = \frac{\partial^m}{\partial A^m}\Big|_{A=0} \frac{\partial^l}{i^l\, \partial\lambda^l}\Big|_{\lambda=0} S_{\theta L_0}(\theta A L_1 + \lambda f). \tag{99}$$
Obviously (96) and (99) do not agree. However, let us assume that we are dealing with a purely massive theory and that $L_0$ and $L_1$ have UV-dimension $\dim(L_j) = 4$. Alternatively, if $\dim(L_j) < 4$, we assume that $L_j$ is treated in the extension to the total diagonal as if $\dim(L_j) = 4$ held. Hence the scaling degree may increase by a certain amount in the extension: $\mathrm{sd}(t_0) \le \mathrm{sd}(t) \le 4n - b$ for a scalar theory without derivative couplings, where $b$ is the number of external legs (cf. (51)–(53)). (In the BPHZ framework one says that $L_j$ is “oversubtracted with degree 4”.) Then there exists a normalization of the time ordered products, compatible with the other normalization conditions N1–N4 and (63), such that the Green's functions corresponding to (96) and (99) exist and agree, i.e. we assert
d m ⊗l m ⊗m ⊗l , T , T lim (f ) = i lim ((θL ) ⊗ f ) 1 θ(L0 +A L1 ) θ L0 θ→1 dA m A=0 θ→1 (100) for all m, l ∈ N0 , which is equivalent to
lim , Sθ(L0 +A L1 ) (λf ) = lim , Sθ L0 (θ AL1 + λf ) . θ→1
θ→1
m
l
(101)
∂ ∂ commute with the adiabatic limit θ → 1. (We assume that the derivatives ∂A m and ∂λl This seems to be satisfied for vacuum expectation values in pure massive theories as it is the case here [14].) This is the usual form of the QAP (in terms of Epstein and Glaser’s
Algebraic Quantum Field Theory, Perturbation Theory, Loop Expansion
27
Green’s functions) for the present case (cf. Eq. (2.6) of [20]10 ). In contrast to the field equation, the QAP (100) does not hold for the operators before the adiabatic limit. Proof of (100). For a better comparison with Lowenstein’s formulation, we present a proof which makes the detour over the corresponding Gell–Mann Low expressions. First we comment on the equality of Epstein and Glaser’s Green’s functions with the Gell–Mann Low series lim (, Sθ L (f )) = lim
θ→1
θ→1
(, S(θ L + f )) , (, S(θL))
(102)
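which is proved in the appendix of [12]. This can be understood in the following way: let $P$ be the projector on the Fock vacuum and $P^\perp \overset{\mathrm{def}}{=} 1 - P$. Using $S(\theta L)^* = S(\theta L)^{-1}$ we obtain
$$(\Omega, S_{\theta L}(f)\,\Omega) = \big(S(\theta L)\Omega, (P+P^\perp)S(\theta L+f)\,\Omega\big) = \frac{(\Omega, S(\theta L+f)\,\Omega)}{(\Omega, S(\theta L)\,\Omega)}\,\big|(\Omega, S(\theta L)\,\Omega)\big|^2 + \big(\Omega, S(\theta L)^{-1}P^\perp S(\theta L+f)\,\Omega\big) \qquad (103)$$
and
$$1 = \big(\Omega, S(\theta L)^{-1}(P+P^\perp)S(\theta L)\,\Omega\big) = \big|(\Omega, S(\theta L)\,\Omega)\big|^2 + \big(\Omega, S(\theta L)^{-1}P^\perp S(\theta L)\,\Omega\big). \qquad (104)$$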
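The decompositions (103) and (104) use only the unitarity of $S$ and $P + P^\perp = 1$, so their algebra can be checked in a finite-dimensional toy model. The sketch below does this under stated assumptions: random unitary matrices stand in for $S(\theta L)$ and $S(\theta L + f)$, the first basis vector plays the role of $\Omega$, and $S_{\theta L}(f) = S(\theta L)^{-1}S(\theta L+f)$, as the first equality in (103) presupposes. The helper names are hypothetical, and the check says nothing about the adiabatic limit or about (105), which is the genuinely field-theoretic input.

```python
import numpy as np


def random_unitary(dim, seed):
    """Random unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale columns by unit phases; the result stays unitary


def check_decomposition(dim=6):
    """Return both sides of the toy analogue of (103) and the right-hand side of (104).

    s0 plays the role of S(theta L), s1 that of S(theta L + f); omega is the
    first basis vector, P = |omega><omega| and P_perp = 1 - P.
    """
    s0 = random_unitary(dim, seed=1)
    s1 = random_unitary(dim, seed=2)
    omega = np.zeros(dim, dtype=complex)
    omega[0] = 1.0
    p_perp = np.eye(dim) - np.outer(omega, omega.conj())
    s0_inv = s0.conj().T  # unitarity: S^{-1} = S^*

    def vev(op):
        return np.vdot(omega, op @ omega)  # (omega, op omega); vdot conjugates omega

    lhs_103 = vev(s0_inv @ s1)  # toy version of (Omega, S_{theta L}(f) Omega)
    rhs_103 = vev(s1) / vev(s0) * abs(vev(s0)) ** 2 + vev(s0_inv @ p_perp @ s1)
    rhs_104 = abs(vev(s0)) ** 2 + vev(s0_inv @ p_perp @ s0)
    return lhs_103, rhs_103, rhs_104
```

Both identities hold exactly for any unitaries (up to rounding), which is why all the physics of the argument sits in the vanishing of the $P^\perp$ terms in the adiabatic limit.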
In $(\Omega, S(\theta L)^{-1}P^\perp S(\theta L+f)\,\Omega)$ there is at least one contraction between $S(\theta L)^{-1}$ and $S(\theta L+f)$ (equivalently: the terms without such a contraction are precisely $(\Omega, S(\theta L)^{-1}\Omega)(\Omega, S(\theta L+f)\,\Omega)$). In [12] the momentum-space support properties of the contracted terms are analysed, and in this way it is proved that
$$\lim_{\theta\to1}\big(\Omega, S(\theta L)^{-1}P^\perp S(\theta L+f)\,\Omega\big) = 0. \qquad (105)$$
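Inserting this into (103), and (with $f = 0$) into (104), yields (102). Because of (102), our assertion (101) is equivalent to
$$\lim_{\theta\to1}\frac{(\Omega, S(\theta(L_0+AL_1)+\lambda f)\,\Omega)}{(\Omega, S(\theta(L_0+AL_1))\,\Omega)} = \lim_{\theta\to1}\frac{(\Omega, S(\theta(L_0+AL_1)+\lambda f)\,\Omega)}{(\Omega, S(\theta L_0)\,\Omega)}. \qquad (106)$$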
This is the QAP in terms of the Gell-Mann–Low series. Obviously the nontrivial statement is
$$\lim_{\theta\to1}\frac{(\Omega, S(\theta(L_0+AL_1))\,\Omega)}{(\Omega, S(\theta L_0)\,\Omega)} = 1. \qquad (107)$$
One way to ensure the validity of this equation is the above assumption (which has not been used so far) that $L_0$ and $L_1$ have mass dimension $\dim(L_j) \le 4$ and are treated as

¹⁰ Lowenstein works with Zimmermann’s definition of normal products of interacting fields, $N_\delta\{\prod_{j=1}^{l}\varphi_{i_j L}(x)\}$, $\delta \ge d \equiv \sum_{j=1}^{l} d(\varphi_{i_j L})$ [29]. For $\delta = d$ (i.e. without oversubtraction), $N_\delta\{\prod_{j=1}^{l}\varphi_{i_j L}(x)\}$ agrees essentially with our $(:\prod_{j=1}^{l}\varphi_{i_j}(x):)_{gL}$ (29). The difference is due to the adiabatic limit and the different ways of defining Green’s functions (Zimmermann uses the Gell-Mann–Low series, cf. (102), (106)).
M. Dütsch, K. Fredenhagen
dimension 4 vertices in the renormalization procedure. Due to this additional assumption and the requirements that the adiabatic limit exists and is unique, the normalization of the vacuum diagrams is uniquely fixed, and with this normalization the vacuum diagrams vanish in the adiabatic limit:
$$\lim_{\theta\to1}(\Omega, S(\theta L_0)\,\Omega) = 1, \qquad \lim_{\theta\to1}(\Omega, S(\theta(L_0+AL_1))\,\Omega) = 1. \qquad (108)$$
(For a proof see also the appendix of [12].)
□
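Remarks. (1) Without the assumption about $L_0$ and $L_1$ we find
$$\lim_{\theta\to1}\big(\Omega, S_{\theta(L_0+AL_1)}(\lambda f)\,\Omega\big) = \lim_{\theta\to1}\frac{(\Omega, S_{\theta L_0}(\theta A L_1+\lambda f)\,\Omega)}{(\Omega, S_{\theta L_0}(\theta A L_1)\,\Omega)} \qquad (109)$$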
instead of (101), by using (102) only. This is a formulation of the QAP for general situations in which (107) does not hold. (2) By means of the QAP (98) (or (100), or (109)) one can compute the change of the time-ordered products of interacting fields (or of the Green’s functions) under a variation of parameters $\lambda_1, \ldots, \lambda_s$ if the interaction Lagrangian has the form $L(x) = \sum_i a_i(\lambda_1, \ldots, \lambda_s)L_i(x)$, $L_i \in \mathcal V$ resp. $\mathcal D(\mathbb R^4, \mathcal V)$ (cf. Eqs. (2.7), (2.8) of [20]). However, only the interaction $L$ may depend on the parameters, not the time-ordering operator (i.e. not the normalization conditions for the time-ordered products).

Appendix: Wavefront Sets and the Pointwise Product of Distributions

In this appendix we briefly recall the definition of the wavefront set of a distribution and mention a simple criterion for the existence of the pointwise product of distributions in terms of their wavefront sets. For a detailed treatment we refer to Hörmander [15]; the application to quantum field theory on curved spacetimes can be found in [25, 8, 7].

Let $t \in \mathcal D'(\mathbb R^d)$ be singular at the point $x$ and let $f \in \mathcal D(\mathbb R^d)$ with $f(x) \neq 0$. Then $ft \in \mathcal D'(\mathbb R^d)$ is also singular at $x$ and has compact support. Hence the Fourier transform $\widehat{ft}$ is a $C^\infty$ function. In some directions $\widehat{ft}$ does not decay rapidly, because otherwise $ft$ would be infinitely differentiable at $x$. Here a function $g$ is called rapidly decaying in the direction $k \in \mathbb R^d \setminus \{0\}$ if there is an open cone $C$ with $k \in C$ and $\sup_{k' \in C}|k'|^N|g(k')| < \infty$ for all $N \in \mathbb N$.

Definition. The wavefront set $\mathrm{WF}(t)$ of a distribution $t \in \mathcal D'(\mathbb R^d)$ is the set of all pairs $(x, k) \in \mathbb R^d \times (\mathbb R^d \setminus \{0\})$ such that the Fourier transform $\widehat{ft}$ does not decay rapidly in the direction $k$ for all $f \in \mathcal D(\mathbb R^d)$ with $f(x) \neq 0$.

For example, the delta distribution satisfies $\widehat{f\delta}(k) = f(0)$, hence $\mathrm{WF}(\delta) = \{0\} \times (\mathbb R^d \setminus \{0\})$. The wavefront set is a refinement of the singular support of $t$ (the complement of the largest open set where $t$ is smooth):
$$t \text{ is singular at } x \iff \exists\, k \in \mathbb R^d \setminus \{0\} \text{ with } (x, k) \in \mathrm{WF}(t).$$
For the wavefront set of the two-point function one finds
$$\mathrm{WF}(\Delta_+) = \{(x, k) \mid x^2 = 0,\ k^2 = 0,\ x \parallel k,\ k^0 > 0\}. \qquad (110)$$
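Over $x = 0$ the fibre of (110) is the forward light cone $\{k \mid k^2 = 0,\ k^0 > 0\}$, so the sum of two of its covectors always has $k^0 > 0$ and never vanishes; this is why the criterion (111) stated below is satisfied for the product $\Delta_+ \cdot \Delta_+$ at $x = 0$. The following sketch checks this on finitely many sampled covectors. The sampling scheme (golden-angle directions at two scales) and the function names are hypothetical choices for illustration; a finite sample cannot prove the statement for the whole cone. Pairing with backward-cone covectors, such as those carried at $x = 0$ by $x \mapsto \Delta_+(-x)$, violates the sampled criterion.

```python
import itertools
import math

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))  # azimuthal step of the sample grid


def forward_null_covectors(n_dirs=16, scales=(1.0, 2.5)):
    """Sample covectors k = (k0, k1, k2, k3) with k0 > 0 and k0^2 = |k|^2.

    They stand in for the fibre of WF(Delta_+) over x = 0 in (110); a finite
    sample only illustrates the forward light cone, it does not exhaust it.
    """
    samples = []
    for i in range(n_dirs):
        cos_t = 1.0 - 2.0 * (i + 0.5) / n_dirs  # polar angles spread uniformly in cos(theta)
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        phi = i * GOLDEN_ANGLE
        direction = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
        for s in scales:
            samples.append((s,) + tuple(s * c for c in direction))
    return samples


def product_criterion_holds(wf_t, wf_s, tol=1e-12):
    """Sampled form of condition (111) at one point x: no k1 from wf_t and
    k2 from wf_s may add up to the zero covector."""
    for k1, k2 in itertools.product(wf_t, wf_s):
        if all(abs(a + b) < tol for a, b in zip(k1, k2)):
            return False
    return True
```

For example, `product_criterion_holds(fwd, fwd)` is `True` for a forward sample `fwd`, while replacing the second argument by the negated sample gives `False`.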
Let $t$ and $s$ be two distributions which are singular at the same point $x$. We localize them by multiplying with $f \in \mathcal D(\mathbb R^d)$, where $f(x) \neq 0$. We assume that $ft$ and $fs$ have only one overlapping singularity, namely at $x$. In general the pointwise product $(ft)(y)(fs)(y)$ does not exist. Heuristically this can be seen from the divergence of the convolution integral $\int dk\, \widehat{ft}(p-k)\,\widehat{fs}(k)$. But this integral converges if $k_1 + k_2 \neq 0$ for all $k_1, k_2$ with $(x, k_1) \in \mathrm{WF}(t)$ and $(x, k_2) \in \mathrm{WF}(s)$. This makes the following theorem plausible.

Theorem. Let $t, s \in \mathcal D'(\mathbb R^d)$ with
$$\{(x, k_1 + k_2) \mid (x, k_1) \in \mathrm{WF}(t) \wedge (x, k_2) \in \mathrm{WF}(s)\} \cap (\mathbb R^d \times \{0\}) = \emptyset. \qquad (111)$$
Then the pointwise product $ts \in \mathcal D'(\mathbb R^d)$ exists.

By means of this theorem one verifies the existence of the distributional products $(\varphi^{\otimes n})_\hbar(t)$ (44) and $(t \otimes_{k,\hbar} s)$ (47).

Acknowledgements. We thank Gudrun Pinter for several discussions on the quantum action principle, and Volker Schomerus and Stefan Waldmann for discussions on deformation quantization. In particular we are grateful to Stefan Waldmann for drawing our attention to reference [9].
Note added in proof. Renormalization can also be done entirely on the level of retarded products [1, 2, 3]. This leads to a direct proof that the interacting fields are power series in $\hbar$.
[1] Steinmann, O.: Perturbation expansions in axiomatic field theory. Lecture Notes in Physics 11, Berlin–Heidelberg–New York: Springer-Verlag, 1971
[2] Dütsch, M. and Fredenhagen, K.: Perturbative Algebraic Field Theory, and Deformation Quantization. To appear in the proceedings of the Conference on Mathematical Physics in Mathematics and Physics, Siena, June 20–25, 2000
[3] Dütsch, M. and Fredenhagen, K.: Causal perturbation theory in terms of retarded products and perturbative algebraic field theory. Work in progress
References
1. Balaban, T.: Large Field Renormalization. II. Localization, Exponentiation, and Bounds for the R Operation. Commun. Math. Phys. 122, 355 (1989); and earlier works of Balaban cited therein
2. Bayen, F., Flato, M., Fronsdal, C., Lichnerowicz, A., Sternheimer, D.: Deformation Theory and Quantization. Ann. Phys. (N.Y.) 111, 61, 111 (1978)
3. Becchi, C., Rouet, A. and Stora, R.: Renormalization of the abelian Higgs–Kibble model. Commun. Math. Phys. 42, 127 (1975); Becchi, C., Rouet, A. and Stora, R.: Renormalization of gauge theories. Ann. Phys. (N.Y.) 98, 287 (1976)
4. Blanchard, P. and Sénéor, R.: Green’s functions for theories with massless particles (in perturbation theory). Ann. Inst. H. Poincaré A 23, 147 (1975)
5. Boas, F.M., Dütsch, M. and Fredenhagen, K.: A local (perturbative) construction of observables in gauge theories: Nonabelian gauge theories. Work in progress
6. Bogoliubov, N.N. and Shirkov, D.V.: Introduction to the Theory of Quantized Fields. New York, 1959
7. Brunetti, R. and Fredenhagen, K.: Microlocal analysis and interacting quantum field theories: Renormalization on physical backgrounds. math-ph/9903028, Commun. Math. Phys. 208, 623 (2000)
8. Brunetti, R., Fredenhagen, K. and Köhler, M.: The microlocal spectrum condition and Wick polynomials of free fields on curved space times. Commun. Math. Phys. 180, 312 (1996)
9. Dito, J.: Star-Product Approach to Quantum Field Theory: The Free Scalar Field. Lett. Math. Phys. 20, 125 (1990); Dito, J.: Star-products and nonstandard quantization for Klein–Gordon equation. J. Math. Phys. 33, 791 (1992)
10. Dütsch, M. and Fredenhagen, K.: A local (perturbative) construction of observables in gauge theories: The example of QED. Commun. Math. Phys. 203, 71 (1999)
11. Dütsch, M. and Fredenhagen, K.: Deformation stability of BRST-quantization. Preprint: hep-th/9807215, DESY 98-098, In: Proceedings of the conference ’Particles, Fields and Gravitation’, Lodz, Poland (1998)
12. Dütsch, M.: Slavnov–Taylor identities from the causal point of view. Int. J. Mod. Phys. A 12, 3205 (1997)
13. Epstein, H.: On the Borchers’ class of a free field. N. Cimento 27, 886 (1963); Schroer, B.: Unpublished preprint (1963)
14. Epstein, H. and Glaser, V.: The role of locality in perturbation theory. Ann. Inst. H. Poincaré A 19, 211 (1973)
15. Hörmander, L.: The Analysis of Linear Partial Differential Operators I. Berlin: Springer-Verlag, 1983
16. Il’in, V.A., Slavnov, D.A.: Algebras of observables in the S-matrix approach. Theor. Math. Phys. 36, 578 (1978)
17. Kugo, T. and Ojima, I.: Local covariant operator formalism of non-abelian gauge theories and quark confinement problem. Suppl. Progr. Theor. Phys. 66, 1 (1979)
18. Lam, Y.-M.P.: Perturbation Lagrangian Theory for Scalar Fields – Ward–Takahashi Identity and Current Algebra. Phys. Rev. D 6, 2145 (1972); Equivalence Theorem on Bogoliubov–Parasiuk–Hepp–Zimmermann – Renormalized Lagrangian Field Theories. Phys. Rev. D 7, 2943 (1973)
19. Lehmann, H., Symanzik, K., Zimmermann, W.: Zur Formulierung quantisierter Feldtheorien. Nuovo Cimento 1, 205 (1955)
20. Lowenstein, J.H.: Differential vertex operations in Lagrangian field theory. Commun. Math. Phys. 24, 1 (1971)
21. Magnen, J., Rivasseau, V. and Sénéor, R.: Construction of YM₄ with an infrared cutoff. Commun. Math. Phys. 155, 325 (1993)
22. Piguet, O. and Sorella, S.P.: Algebraic Renormalization. Berlin–Heidelberg–New York: Springer-Verlag, 1995
23. Pinter, G.: The Action Principle in Epstein Glaser Renormalization and Renormalization of the S-Matrix of φ⁴-Theory. hep-th/9911063
24. Prange, D.: Epstein–Glaser renormalization and differential renormalization. J. Phys. A 32, 2225 (1999)
25. Radzikowski, M.: Micro-local approach to the Hadamard condition in quantum field theory on curved space-time. Commun. Math. Phys. 179, 529 (1996)
26. Scharf, G.: Finite Quantum Electrodynamics. The causal approach. 2nd ed., Berlin–Heidelberg–New York: Springer-Verlag, 1995
27. Steinmann, O.: Perturbation expansions in axiomatic field theory. Lecture Notes in Physics 11, Berlin–Heidelberg–New York: Springer-Verlag, 1971
28. Stora, R.: Differential algebras in Lagrangean field theory. ETH-Zürich Lectures, January–February 1993; Popineau, G. and Stora, R.: A pedagogical remark on the main theorem of perturbative renormalization theory. Unpublished preprint (1982)
29. Zimmermann, W.: In: Lectures on Elementary Particles and Quantum Field Theory. Brandeis Summer Institute in Theoretical Physics (1970), S. Deser (ed.)

Communicated by G. Mack
Commun. Math. Phys. 219, 31 – 44 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Weak Transition Matrix Elements from Finite-Volume Correlation Functions
Laurent Lellouch¹, Martin Lüscher²
¹ LAPTH, Chemin de Bellevue, B.P. 110, 74941 Annecy-Le-Vieux Cedex, France
² CERN, Theory Division, 1211 Geneva 23, Switzerland
Received: 29 March 2000 / Accepted: 10 April 2000
Dedicated to the memory of Harry Lehmann

Abstract: The two-body decay rate of a weakly decaying particle (such as the kaon) is shown to be proportional to the square of a well-defined transition matrix element in finite volume. Contrary to the physical amplitude, the latter can be extracted from finite-volume correlation functions in euclidean space without analytic continuation. The K → ππ transitions and other non-leptonic decays thus become accessible to established numerical techniques in lattice QCD.

1. Introduction

The computation of the non-leptonic kaon decay rates from first principles, using lattice QCD and numerical simulations, meets a number of technical difficulties (see [1], for example). Apart from the operator renormalization, which must be controlled at the nonperturbative level, the central problem is that the computational framework is limited to correlation functions in euclidean space and that there is apparently no simple relation between the behaviour of these functions at large time separations and the desired transition matrix elements [2, 3]. This statement (which is often referred to as the Maiani–Testa no-go theorem) applies to very large or infinite lattices, where the spectrum of final states is continuous. One might think that having a finite volume (as is unavoidable when numerical simulations are employed) makes it even more difficult to extract the transition amplitudes. In the present paper we wish to show that this is actually not so. The key observation is that the two-pion energy spectrum is far from being continuous when the lattice is only a few fermis wide. Under these conditions, a kaon at rest cannot decay into two pions unless one of these energy levels happens to be close to its mass. This is the case for certain

Work supported in part by TMR, EC-Contract No. ERBFMRX-CT980169.
On leave from Centre de Physique Théorique, CNRS Luminy, 13288 Marseille Cedex 9, France
On leave from Deutsches Elektronen-Synchrotron DESY, 22603 Hamburg, Germany
lattice sizes, and a simple formula then relates the square of the corresponding transition amplitude in finite volume to the physical decay rate in infinite volume. The problem is thus reduced to calculating the required finite-volume transition amplitudes. Since the initial and final states are isolated energy eigenstates, these matrix elements can in principle be computed using established techniques, such as those commonly employed to determine form factors. An additional difficulty is that the relevant two-pion states are not the lowest ones in the specified sector. Two-particle states in finite volume have, however, previously been studied [4]–[14], and practical methods have been devised to calculate the higher levels.

To keep the presentation as transparent as possible, we shall consider a simplified generic theory with two kinds of spinless particles, referred to as the kaon and the pion. Details are given in the next section, after which we discuss the form of the two-pion energy spectrum in finite volume. This is essentially a summary of the relevant results of refs. [15]–[17]. In Sect. 4 we define the transition amplitudes in finite volume and state the formula that relates them to the corresponding decay rates in infinite volume. The following sections contain the proof of this relation and a discussion of its application to the physical kaon decays.

2. Preliminaries

As announced above, we consider a generic situation with two particles, the “kaon” and the “pion”, with spin zero and masses such that
$$2m_\pi < m_K < 4m_\pi. \qquad (2.1)$$
We assume that the symmetries of the theory are such that the kaon is stable in the absence of the weak interactions and that the pions scatter purely elastically below the four-pion threshold. The weak interactions, described by a local effective lagrangian $\mathcal L_w(x)$, then allow the kaon to decay into two pions. The corresponding transition amplitude is
$$T(K\to\pi\pi) = \langle \pi\,p_1, \pi\,p_2\ \mathrm{out}|\,\mathcal L_w(0)\,|K\,p\rangle, \qquad (2.2)$$
with $p_1$, $p_2$ and $p$ the four-momenta of the pions and the kaon. We shall only be interested in the physical case where the total momentum $p = p_1 + p_2$ is conserved. Lorentz invariance and the kinematical constraints then imply that the transition amplitude is independent of the momentum configuration.

The meson states in Eq. (2.2) are normalized according to the standard relativistic conventions (Appendix A) and their phases are constrained by the LSZ formalism. In the case of the pions, for example, one assumes that there exists an interpolating hermitian field $\varphi(x)$ such that
$$\langle 0|\varphi(x)|\pi\,p\rangle = \sqrt{Z_\pi}\,e^{-ipx} \qquad (2.3)$$
for some positive constant $Z_\pi$. If the phase of the kaon states is chosen in the same way, the CPT symmetry implies
$$T(K\to\pi\pi) = A\,e^{i\delta_0} \qquad (2.4)$$
with $A$ real and $\delta_0$ the S-wave scattering phase shift of the outgoing pion state. The decay rate is then given by the usual expression
$$\Gamma = \frac{k_\pi}{16\pi m_K^2}\,|A|^2, \qquad k_\pi \equiv \frac12\sqrt{m_K^2 - 4m_\pi^2}, \qquad (2.5)$$
proportional to the pion momentum $k_\pi$ in the centre-of-mass frame.
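As a quick numerical reading of Eq. (2.5), the sketch below evaluates $k_\pi$ and $\Gamma$. The function names are hypothetical, the masses in the usage example (charged pion 139.57 MeV, charged kaon 493.677 MeV) are inserted only for illustration, and $|A|$ is a free input; no value of the physical amplitude is implied.

```python
import math


def pion_momentum(m_k, m_pi):
    """k_pi = (1/2) sqrt(m_K^2 - 4 m_pi^2) of Eq. (2.5): pion momentum in the kaon rest frame."""
    if m_k <= 2.0 * m_pi:
        raise ValueError("decay closed: need m_K > 2 m_pi, cf. Eq. (2.1)")
    return 0.5 * math.sqrt(m_k ** 2 - 4.0 * m_pi ** 2)


def decay_rate(abs_a, m_k, m_pi):
    """Gamma = k_pi |A|^2 / (16 pi m_K^2), Eq. (2.5), in the units of the inputs."""
    return pion_momentum(m_k, m_pi) * abs_a ** 2 / (16.0 * math.pi * m_k ** 2)


# Usage with illustrative masses in MeV; |A| = 1 MeV is a placeholder, not a physical value.
k_pi = pion_momentum(493.677, 139.57)
gamma = decay_rate(1.0, 493.677, 139.57)
```

With these masses $k_\pi \approx 203.6$ MeV, and at fixed masses the rate simply scales as $|A|^2$.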
3. Two-Pion States in Finite Volume

In a spatial box of size $L \times L \times L$ with periodic boundary conditions, the eigenvalues of the total momentum operator are integer multiples of $2\pi/L$. The energy spectrum is also discrete in this situation, with level spacings that can be appreciable. In the following we consider the subspace of states with zero total momentum and trivial transformation behaviour under cubic rotations and reflections. The energy spectrum of the two-pion states in this sector below the inelastic threshold $W = 4m_\pi$ has been studied in detail in refs. [15]–[18]. In particular, for the lowest energy value the expansion
$$W = 2m_\pi - \frac{4\pi a_0}{m_\pi L^3}\Big\{1 + c_1\frac{a_0}{L} + c_2\frac{a_0^2}{L^2}\Big\} + O(L^{-6}), \qquad (3.1)$$
$$c_1 = -2.837297, \qquad c_2 = 6.375183, \qquad (3.2)$$
has been obtained, where
$$a_0 = \lim_{k\to0}\frac{\delta_0(k)}{k} \qquad (3.3)$$
is the S-wave scattering length (here and below the scattering phase is considered to be a function of the pion momentum $k$ in the centre-of-mass frame). The higher energy values in the elastic region are determined through
$$W = 2\sqrt{m_\pi^2 + k^2}, \qquad (3.4)$$
$$n\pi - \delta_0(k) = \phi(q), \qquad q \equiv \frac{kL}{2\pi}, \qquad (3.5)$$
where $n = 1, 2, \ldots$ labels the energy levels in increasing order and the angle $\phi(q)$ is a known kinematical function (Appendix B). Apart from the lowest level, the energy spectrum at any given value of $L$ is thus obtained by inserting the solutions $k$ of Eq. (3.5) into Eq. (3.4)¹. All these results are valid up to terms that vanish exponentially at large $L$. Box sizes a few times larger than the diameter of the pion should be safe from these corrections. Equation (3.5) moreover assumes that the scattering phases $\delta_l$ for angular momenta $l \ge 4$ are small in the elastic region, which is usually the case since $\delta_l$ is proportional to $k^{2l+1}$ at low energies.

For illustration, let us consider QCD with three flavours of quarks, unbroken isospin symmetry and quark masses such that the masses of the charged pions and kaons coincide with their physical values. In the subspace with isospin 0, the two-pion energy spectrum is then given by Eqs. (3.1)–(3.5), with $\delta_0$ the appropriate pion scattering phase. Inserting the phase shift obtained at one-loop order of chiral perturbation theory [20]–[22] yields the curves shown in Fig. 1. For any other reasonable choice of the scattering phase the plot would look essentially the same, because the interaction effects are proportional to $1/L^3$ and thus tend to be small. Note that the spacing between successive levels is quite large. One is clearly very far away from having a continuous spectrum when $L \le 10$ fm.

¹ Similar formulae have been derived for the spectrum in the subspaces of states with non-zero total momentum [19]. The extension of our results to these sectors could give further insight into the connection between finite and infinite volume matrix elements and may prove useful in practice.
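The large-$L$ expansion (3.1) with the coefficients (3.2) is straightforward to evaluate. The sketch below does so in arbitrary consistent units (for instance fm for $L$ and $a_0$, fm⁻¹ for $m_\pi$); the function name and the unit choice are illustrative assumptions. In the convention (3.3), a positive $a_0$ lowers the lowest level below $2m_\pi$, and for $|a_0| \ll L$ the shift falls off like $L^{-3}$.

```python
import math

C1 = -2.837297  # coefficient c_1 of Eq. (3.2)
C2 = 6.375183   # coefficient c_2 of Eq. (3.2)


def lowest_two_pion_energy(m_pi, a0, box_l):
    """Lowest two-pion level from the expansion (3.1), dropping the O(L^-6) remainder.

    a0 is the S-wave scattering length in the convention (3.3), a0 = lim delta_0(k)/k;
    all arguments must be given in consistent units.
    """
    x = a0 / box_l
    shift = 4.0 * math.pi * a0 / (m_pi * box_l ** 3) * (1.0 + C1 * x + C2 * x * x)
    return 2.0 * m_pi - shift
```

Since the truncated series is only asymptotic, it should be trusted only when $|a_0|/L$ is small.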
Fig. 1. Two-pion energy spectrum in QCD below the inelastic threshold, in the sector with isospin 0, calculated from Eqs. (3.1)–(3.5) with the scattering phase shift given by next-to-leading order chiral perturbation theory. The levels shown in this plot are all non-degenerate.
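The large level spacing visible in Fig. 1 is already present without interactions, since the interaction effects are of order $1/L^3$. The sketch below lists the non-interacting levels $2\sqrt{m_\pi^2 + n(2\pi/L)^2}$ below $4m_\pi$, where $n = z^2$ labels the shell of integer vectors $z$, and skips shells that are empty. It rests on stated assumptions: the phase shift is set to zero, the pion mass is fixed at the charged-pion value 139.57 MeV, and $\hbar c = 197.327$ MeV fm converts box sizes; the function names are hypothetical.

```python
import itertools
import math

HBARC = 197.327  # hbar*c in MeV fm, used to convert 2*pi/L into MeV
M_PI = 139.57    # illustrative pion mass in MeV (charged pion)


def nu(n):
    """Number of integer vectors z in Z^3 with z.z = n."""
    r = math.isqrt(n)
    return sum(1 for z in itertools.product(range(-r, r + 1), repeat=3)
               if z[0] ** 2 + z[1] ** 2 + z[2] ** 2 == n)


def free_level(n, box_l_fm, m_pi=M_PI):
    """Non-interacting two-pion energy 2*sqrt(m_pi^2 + n*(2*pi/L)^2) in MeV."""
    p_unit = 2.0 * math.pi * HBARC / box_l_fm  # 2*pi/L in MeV
    return 2.0 * math.sqrt(m_pi ** 2 + n * p_unit ** 2)


def free_levels_below_threshold(box_l_fm, m_pi=M_PI, n_max=20):
    """(n, W_n) for occupied shells (nu(n) > 0) with W_n below 4*m_pi."""
    return [(n, free_level(n, box_l_fm, m_pi))
            for n in range(n_max + 1)
            if nu(n) > 0 and free_level(n, box_l_fm, m_pi) < 4.0 * m_pi]
```

For $L = 10$ fm this gives four levels ($n = 0, 1, 2, 3$) at roughly 279, 373, 448 and 512 MeV, i.e. spacings between about 64 and 94 MeV.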
4. Kaon Decays in Finite and Infinite Volume

Let us imagine that a state $|K\rangle$ describing a kaon in finite volume with zero momentum has been prepared at time $x_0 = 0$. In the absence of the weak interactions, this is an energy eigenstate (and thus a stationary state) with energy $m_K$. However, through the interaction hamiltonian
$$H_w = \int_{x_0=0} d^3x\, \mathcal L_w(x), \qquad (4.1)$$
the time evolution of the state becomes non-trivial and it starts to mix with the other eigenstates of the unperturbed hamiltonian. It is straightforward to work this out using ordinary time-dependent perturbation theory. For the transition probability at time $x_0 = t$ to any finite-volume two-pion state $|\pi\pi\rangle$ with energy $W$, the result
$$P(K\to\pi\pi) = 4\,|\langle\pi\pi|H_w|K\rangle|^2\,\frac{\sin^2(\tfrac12\omega t)}{\omega^2}, \qquad \omega \equiv W - m_K, \qquad (4.2)$$
is then obtained (in this equation the states are assumed to be normalized to unity and higher-order weak-interaction effects have been neglected). From Eq. (4.2) one infers that the transition probabilities tend to be very small unless the energy of one of the two-pion final states happens to be close to the kaon mass. Recalling Fig. 1, it is clear that this will be the case only for certain box sizes $L$. In the following we focus on these special values of $L$ and introduce the associated transition matrix element
$$M = \langle\pi\pi|H_w|K\rangle, \qquad (4.3)$$
where both states are normalized to unity as before, while their phases do not matter and can be chosen arbitrarily. Since $W = m_K$ in this case, Eq. (4.2) becomes
$$P(K\to\pi\pi) = |M|^2 t^2 \qquad (4.4)$$
and the kaon will thus have an appreciable probability to decay into the two-pion state if one waits long enough (the formula breaks down at very large times, because the higher-order terms are then no longer negligible). The central result obtained in the present paper is that the finite-volume matrix element M is related to the decay rate of the kaon in infinite volume through
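$$|A|^2 = 8\pi\Big\{q\frac{\partial\phi}{\partial q} + k\frac{\partial\delta_0}{\partial k}\Big\}_{k=k_\pi}\Big(\frac{m_K}{k_\pi}\Big)^3|M|^2 \qquad (4.5)$$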
[cf. Eqs. (2.5), (3.5)]. The relation holds under the same premises as Eq. (3.5), and the comments made in Sect. 3 thus apply here too. Another restriction is that the two-pion final state has to be non-degenerate in the specified sector of the unperturbed theory. This condition is satisfied for $n < 8$ [17], but degeneracies can occur at higher level numbers and the formula then ceases to be valid.

In principle Eq. (4.5) allows one to compute the kaon decay rate in infinite volume by studying the theory in finite volume. Note that in the course of such a calculation it should also be possible to determine the two-pion energy spectrum and thus the scattering phase $\delta_0$ in the elastic region. The proportionality factor in Eq. (4.5) essentially accounts for the different normalizations of the particle states in finite and infinite volume. One can easily check this in the free theory, where the pion self-interactions are neglected. In this case and for $n \le 6$, the $n$th two-pion energy level passes through $m_K$ at
$$L = \frac{2\pi}{k_\pi}\sqrt n. \qquad (4.6)$$
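The free-theory value of the proportionality factor in Eq. (4.5) can be traced explicitly. This sketch assumes Lüscher’s definitions, which are not reproduced in this excerpt: $\tan\phi(q) = -\pi^{3/2}q/Z_{00}(1;q^2)$ and $Z_{00}(s;q^2) = (4\pi)^{-1/2}\sum_{z\in\mathbb Z^3}(z^2-q^2)^{-s}$, consistent with the pole subtracted in (6.14). With $\delta_0 = 0$, only the pole of $Z_{00}$ at $q^2 = n$ contributes to $\partial\phi/\partial q$ at $q = \sqrt n$:

```latex
\begin{aligned}
Z_{00}(1;q^2) &= \frac{1}{\sqrt{4\pi}}\,\frac{\nu_n}{n-q^2} + O(1)
  \;\Longrightarrow\;
  \tan\phi(q) = -\frac{2\pi^2}{\nu_n}\,q\,(n-q^2)\,\bigl[1+O(n-q^2)\bigr],\\
q\,\frac{\partial\phi}{\partial q}\Big|_{q=\sqrt n}
  &= q\,\frac{\partial\tan\phi}{\partial q}\Big|_{q=\sqrt n}
  = \frac{4\pi^2 n^{3/2}}{\nu_n}
  \qquad(\text{since } \tan\phi = 0 \text{ at } q=\sqrt n),\\
\Bigl(\frac{m_K}{k_\pi}\Bigr)^3 &= \frac{(m_K L)^3}{8\pi^3 n^{3/2}}
  \qquad(\text{by Eq. (4.6)}),\\
8\pi\,\Bigl\{q\,\frac{\partial\phi}{\partial q}\Bigr\}\Bigl(\frac{m_K}{k_\pi}\Bigr)^3
  &= 8\pi\cdot\frac{4\pi^2 n^{3/2}}{\nu_n}\cdot\frac{(m_K L)^3}{8\pi^3 n^{3/2}}
  = \frac{4\,(m_K L)^3}{\nu_n}.
\end{aligned}
```

Here $\nu_n$ is the number of integer vectors with $z^2 = n$; the factor $n^{3/2}$ cancels between the kinematical function and the box size.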
Equation (4.5) then assumes the form
$$|A|^2 = 4\,\frac{(m_K L)^3}{\nu_n}\,|M|^2, \qquad (4.7)$$
$$\nu_n \equiv \text{number of integer vectors } z \text{ with } z^2 = n, \qquad (4.8)$$
which is precisely what is derived from the relative normalizations of the plane waves in finite and infinite volume that describe the (non-interacting) kaon and pion states (Sect. 6).

5. Proof of Equation (4.5)

The interpretation of the proportionality factor in Eq. (4.5) given above also applies in the interacting case. This follows from the fact that the transition matrix elements probe the S-wave component of the two-pion wave function near the origin and that this component is the same in finite and infinite volume apart from its phase and normalization. The latter can be worked out explicitly in the framework of refs. [15]–[17], but the calculation is rather involved and will not be presented here. Instead we shall go through a different argument, in which one studies the influence of the weak interaction on the energy spectrum in finite volume. This can be done directly, using ordinary perturbation theory, or one may start from Eq. (3.5) and take the weak-interaction effects on the scattering phase into account. Combining the results of these calculations then yields Eq. (4.5).
Fig. 2. Kaon resonance contribution to the elastic pion scattering amplitude in the s-channel. The diagram appears at second order of the expansion in powers of the weak interaction, with the bubbles representing the first-order Kππ vertex function.
As already mentioned in Sect. 2, the kaon is assumed to carry a quantum number (alias strangeness) that forbids its decay into pions in the unperturbed theory. Since only the strangeness-changing part of the weak interaction lagrangian contributes to the kaon transition amplitudes, all other terms may be dropped without loss. The matrix elements of the weak hamiltonian $H_w$ between states with the same strangeness are then all equal to zero. As a consequence, most energy values in finite volume are affected by the weak interaction only at second order. First-order energy shifts do occur, however, if there are degenerate states at lowest order that mix under the action of $H_w$. This is the case at the values of $L$ where one of the two-pion energy values coincides with the kaon mass, i.e. at the special points considered in the preceding section. Degenerate perturbation theory then yields
$$W = m_K \pm |M| + \ldots \qquad (5.1)$$
for the first-order change of these energy values (here and below the ellipses denote higher-order terms that do not contribute to the final results). The energy shifts (5.1) can also be calculated by including the weak corrections to the scattering phase on the left-hand side of Eq. (3.5). From the above one infers, using $dW/dk = 4k/W$ from Eq. (3.4) with $W = m_K$ at $k = k_\pi$, that the solutions of Eq. (3.5) we are interested in are given by
$$k = k_\pi \pm \Delta k + \ldots, \qquad \Delta k \equiv \frac{m_K|M|}{4k_\pi}. \qquad (5.2)$$
Compared to the kaon resonance width (which is of second order in the weak interaction), these values of $k$ are far away from the kaon pole. The weak corrections to the pion scattering amplitude in the relevant range of energies are hence small and can be safely computed by working out the perturbation expansion in powers of the interaction lagrangian. One might think that these corrections are all of second or higher order, because the interaction is strangeness-changing. The reason this is not so is that the kaon propagator in a diagram like the one shown in Fig. 2 evaluates to
$$\frac{iZ_K}{p^2 - m_K^2} = \pm\frac{iZ_K}{2m_K|M|} + \ldots \qquad (5.3)$$
at the energies (5.1) and thus reduces the effective order of the term by 1. This diagram is in fact the only one that yields a first-order contribution to the scattering amplitude. It can be calculated by noting that the momenta flowing into the three-point vertices are all on shell up to higher-order corrections. The vertices are hence proportional to the kaon decay amplitude $A$. Together with Eq. (5.3) this leads to the result
$$\bar\delta_0(k) = \delta_0(k) \mp \frac{k_\pi|A|^2}{32\pi m_K^2|M|} + \ldots \pmod{\pi} \qquad (5.4)$$
for the scattering phase in the full theory at the point (5.2) (as in the previous section, $\delta_0$ stands for the phase shift in the unperturbed theory). We now replace $\delta_0$ in Eq. (3.5) by $\bar\delta_0$ and expand all terms in powers of the weak interaction. The lowest-order terms cancel while, at first order, the equation implies
$$\frac{k_\pi|A|^2}{32\pi m_K^2|M|} - \Delta k\,\frac{\partial\delta_0(k)}{\partial k}\Big|_{k=k_\pi} = \Delta k\,\frac{\partial\phi(q)}{\partial k}\Big|_{k=k_\pi}. \qquad (5.5)$$
This is easily seen to be equivalent to Eq. (4.5) after substituting the expression (5.2) for $\Delta k$ (note that $\partial\phi/\partial k = (q/k)\,\partial\phi/\partial q$), and we have thus proved this relation.

6. Verification of Equation (4.5) in Perturbation Theory

In a low-energy effective theory, such as the chiral non-linear σ-model, it is possible to obtain an independent check on Eq. (4.5) by working out the transition amplitudes in finite and infinite volume in perturbation theory. Since this calculation does not rely on any of the results presented above, it can provide additional confidence in the correctness of the equation. Perturbation theory may also prove helpful when considering more complicated situations, where one has several decay channels or particles with non-zero spin. In this section we describe how such a calculation proceeds, without giving too many details.

6.1. Specification of the model. The two-pion energy spectrum and the proportionality factor in Eq. (4.5) depend on the final-state interactions only through the phase shift $\delta_0$. All other properties of the pion interactions do not matter, and to check the equation we may thus consider an arbitrary effective meson theory with the correct particle spectrum. For the pion interaction lagrangian the simplest choice is
$$\mathcal L_{\rm int}(x) = \frac{1}{4!}\lambda\varphi(x)^4, \qquad (6.1)$$
where $\varphi(x)$ denotes the pion field and $\lambda$ the bare coupling. To make the perturbation expansion completely well-defined, we introduce a Pauli–Villars cutoff $\Lambda$. At tree level the euclidean pion propagator is then given by
$$\int d^4x\, e^{-ipx}\langle\varphi(x)\varphi(0)\rangle = \frac{1}{m^2+p^2} - \frac{1}{\Lambda^2+p^2}, \qquad (6.2)$$
with $m$ the bare mass of the pion (its physical mass is denoted by $m_\pi$ as before). The cutoff should be large enough so that ghost particles cannot be produced at energies below the four-pion threshold, but in view of the universality of Eq. (4.5) there is no need to take $\Lambda$ to infinity at the end of the calculation. As far as the kaon is concerned, the least complicated possibility is to describe it by a hermitian free field $\theta(x)$ with mass $m_K$ and to take
$$\mathcal L_w(x) = \frac12 g\,\theta(x)\varphi(x)^2 \qquad (6.3)$$
as the weak-interaction lagrangian. One then first has to expand the transition amplitude (2.2) in powers of $\lambda$, but we shall not discuss this here since the calculation is completely standard. The way to obtain the perturbation expansion of the finite-volume matrix element (4.3) may be less obvious, however, and we thus proceed to explain this in some detail.
[Figure: diagrams (a), (b), (c), (d); axis labelled “time”]
Fig. 3. Feynman diagrams contributing to the correlation function (6.7). The lines represent the free pion propagator in the time-momentum representation (6.8) and the filled circles the self-interaction vertex. All external lines end at times $x_0$ or $y_0$.
6.2. Two-pion states. In finite volume the low-lying two-pion energy eigenstates with zero total momentum and trivial transformation behaviour under the cubic group may be labelled by an integer $n = 0, 1, 2, \ldots$ such that the associated energies $W_n$ increase monotonically with $n$. We denote these states by $|\pi\pi, n\rangle$ and assume that they have unit norm. To lowest order in $\lambda$, the energy values are determined through the free energy-momentum relation and the relative momentum of the pions. Since we only consider cubically invariant states, any two momenta that are related to each other by a cubic transformation describe the same state. For $n \le 6$ the momenta in the set
$$\Omega_n = \{\,k = 2\pi z/L \mid z \in \mathbb Z^3,\ z^2 = n\,\} \qquad (6.4)$$
are all equivalent in this sense. The corresponding state is thus non-degenerate, and one concludes from this that
$$W_n = 2\sqrt{m^2 + n(2\pi/L)^2} + O(\lambda), \qquad 0 \le n \le 6. \qquad (6.5)$$
In the following, our attention will be restricted to these levels. The corresponding energy eigenstates $|\pi\pi, n\rangle$ can be created from the vacuum by applying the operators
$$O_n(x_0) = \sum_{k\in\Omega_n}\int_0^L d^3x\, d^3y\; e^{ik(x-y)}\,\varphi(x_0, x)\,\varphi(x_0, y). \qquad (6.6)$$
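The statement that for $n \le 6$ all momenta in the set (6.4) are related by cubic transformations can be checked by brute force. The sketch below enumerates the integer vectors with $z^2 = n$ and compares them with the orbit of one of them under the 48 elements of the cubic group (coordinate permutations combined with sign flips). The helper names are hypothetical. For $n = 9$ the shell splits into two orbits, represented by $(3,0,0)$ and $(2,2,1)$, which is one way in which degeneracies arise at higher levels.

```python
import itertools


def shell(n):
    """Integer vectors z with z.z = n; the momenta of Eq. (6.4) are 2*pi*z/L."""
    r = int(n ** 0.5) + 1
    return {z for z in itertools.product(range(-r, r + 1), repeat=3)
            if z[0] ** 2 + z[1] ** 2 + z[2] ** 2 == n}


def cubic_group():
    """The 48 cubic symmetries: a permutation of the axes combined with sign flips."""
    return [(perm, signs)
            for perm in itertools.permutations(range(3))
            for signs in itertools.product((1, -1), repeat=3)]


def transform(op, z):
    """Apply one cubic symmetry (perm, signs) to the integer vector z."""
    perm, signs = op
    return tuple(signs[i] * z[perm[i]] for i in range(3))


def is_single_orbit(n):
    """True if all vectors of the shell z.z = n are related by cubic transformations."""
    vectors = shell(n)
    if not vectors:
        return False  # empty shell, e.g. n = 7: no state at all
    z0 = min(vectors)  # any representative works; min() makes the choice reproducible
    return {transform(op, z0) for op in cubic_group()} == vectors
```

A single orbit means exactly one cubically invariant combination of plane waves, hence a non-degenerate level at tree level.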
Note that $O_n(x_0)$ couples to all two-pion states in the given sector, since there are no quantum numbers that would forbid this. In euclidean space and at large time separations $x_0 - y_0$, its connected two-point function is thus given by
$$\langle O_n(x_0)\,O_n(y_0)\rangle_{\rm con} = \sum_{l=0}^{6}|\langle 0|O_n(0)|\pi\pi, l\rangle|^2\, e^{-W_l(x_0-y_0)} + \ldots, \qquad (6.7)$$
where the ellipses stand for more rapidly decaying terms. The perturbation expansion of the two-pion energy $W_n$ and the associated matrix element $|\langle 0|O_n(0)|\pi\pi, n\rangle|$ may now be obtained by expanding the left-hand side of Eq. (6.7) in Feynman diagrams in the standard way. If one uses the time-momentum representation
$$\int_0^L d^3x\, e^{-ipx}\langle\varphi(x)\varphi(0)\rangle = \frac{e^{-\omega_p|x_0|}}{2\omega_p} - (m \leftrightarrow \Lambda), \qquad \omega_p \equiv \sqrt{m^2 + p^2}, \qquad (6.8)$$
[Figure: diagrams (a), (b), (c)]
Fig. 4. Diagrams contributing to the correlation function (6.10). The double line represents the kaon propagator and the circled cross the weak interaction vertex at the origin. All other graphical elements are as in Fig. 3.
for the tree-level pion propagator, the diagrams evaluate to a sum of exponentials. The desired expansions can then be read off from the coefficients of the exponential factor that corresponds to the $n$th level. To leading order the diagrams (a) and (b) in Fig. 3 yield the expected expression (6.5) for the two-pion energy and
$$|\langle 0|O_n(0)|\pi\pi, n\rangle| = \sqrt{2\nu_n}\,L^3/W_n + O(\lambda) \qquad (6.9)$$
for the matrix element [cf. Eq. (4.8)]. At the next order in the coupling, there are two types of diagrams. Diagram (c) and three further diagrams of this kind amount to an additive renormalization of the pion mass by a term that is independent of $L$ up to exponentially small corrections [15]. These exponentially small corrections are neglected here, and the renormalization is thus equivalent to replacing $m$ by $m_\pi$ in the tree-level expressions. One is then left with diagram (d), which can be worked out analytically in a few lines.
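Eq. (6.7) writes the correlator as a sum of exponentials, one per two-pion level. In numerical work the higher levels, which are the ones needed here, are usually isolated with a matrix of correlators. One standard technique, not described in this paper, is the generalized eigenvalue method of Lüscher and Wolff: with $C_{ij}(t) = \langle O_i(t)O_j(0)\rangle$, the eigenvalues of $C(t_0)^{-1}C(t)$ behave as $e^{-W_l(t-t_0)}$. The sketch below applies it to synthetic, noise-free data in which exactly as many levels as operators contribute, so the energies come out exactly; the energies, overlaps and function names are made up for illustration.

```python
import numpy as np


def synthetic_correlator_matrix(t, energies, overlaps):
    """C_ij(t) = sum_l v_il v_jl exp(-W_l t): a noise-free toy analogue of Eq. (6.7)
    for several interpolating operators O_i (the rows of `overlaps`)."""
    v = np.asarray(overlaps, dtype=float)
    return v @ np.diag(np.exp(-np.asarray(energies, dtype=float) * t)) @ v.T


def gevp_energies(c_t, c_t0, dt):
    """Energies from C(t) u = lambda C(t0) u, using lambda_l = exp(-W_l (t - t0)).

    Exact only when the number of contributing states equals the number of operators.
    """
    lam = np.linalg.eigvals(np.linalg.solve(c_t0, c_t))  # eigenvalues of C(t0)^{-1} C(t)
    return np.sort(-np.log(np.real(lam)) / dt)


# Usage with made-up levels and overlaps (three operators, three states):
levels = [0.3, 0.5, 0.8]
overlaps = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.1, 0.2, 1.0]]
c_t0 = synthetic_correlator_matrix(1.0, levels, overlaps)
c_t = synthetic_correlator_matrix(6.0, levels, overlaps)
recovered = gevp_energies(c_t, c_t0, 5.0)
```

With real data, statistical noise and the omitted higher states make the extracted values depend on $t$, so one looks for plateaus at large $t$.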
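6.3. Transition matrix element. The finite-volume transition matrix element (4.3) can be computed by studying the euclidean correlation function

$$\int_0^L d^3y\; \langle O_n(x_0)\, \mathcal{L}_w(0)\, \theta(y)\rangle_{\rm con} = \sum_{l=0}^{6} e^{-W_l x_0 + m_K y_0}\, \langle 0|O_n(0)|\pi\pi, l\rangle\, \langle \pi\pi, l|H_w|K\rangle\, \langle K|\theta(0)|0\rangle + \ldots \tag{6.10}$$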
at large $x_0$ and large negative $y_0$. As in the case of the two-pion states, the terms we are interested in are found by looking for the appropriate exponential factor. To lowest order, diagram (a) in Fig. 4 yields

$$|\langle \pi\pi, n|H_w|K\rangle| = \frac{g\sqrt{\nu_n}}{2W_n\sqrt{m_K L^3}}\,\{1 + O(\lambda)\}. \tag{6.11}$$

The pion mass in this expression is renormalized by the tadpole insertions at the next order (diagram (b) and its mirror image). Diagram (c), the only other diagram at this order, may be evaluated by inserting the time-momentum representation for the external and also the internal lines. Apart from various simple terms, one then ends up with the momentum sum
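$$S_n = L^{-3} \sum_{\mathbf{p} \notin \Omega_n} \left\{ \frac{1}{\omega_p (\mathbf{p}^2 - \mathbf{k}^2)} - R_\Lambda(\mathbf{p}^2, \mathbf{k}^2) \right\}, \qquad \mathbf{k} \in \Omega_n, \tag{6.12}$$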
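where $R_\Lambda$ is an expression that arises from the Pauli-Villars regularization. A general summation formula proved in ref. [16] allows one to compute such sums up to terms that vanish more rapidly than any power of $1/L$. The precise form of $R_\Lambda$ is not important for this. One only needs to know that it is a smooth function of $\mathbf{p}$ and $\mathbf{k}$ and that it makes the sum absolutely convergent. The result

$$S_n = \frac{1}{(2\pi)^3} \int d^3p \left\{ \frac{1}{2\omega_p}\left[\frac{1}{\mathbf{p}^2 - \mathbf{k}^2 + i\epsilon} + \frac{1}{\mathbf{p}^2 - \mathbf{k}^2 - i\epsilon}\right] - R_\Lambda(\mathbf{p}^2, \mathbf{k}^2) \right\} + \frac{z_n}{4\pi^2 \omega_k L} + \frac{\nu_n}{2(\omega_k L)^3} + \frac{\nu_n}{L^3}\, R_\Lambda(\mathbf{k}^2, \mathbf{k}^2) \tag{6.13}$$

is then obtained, with the constant $z_n$ given by

$$z_n = \lim_{q^2 \to n} \left\{ \sqrt{4\pi}\, Z_{00}(1; q^2) + \frac{\nu_n}{q^2 - n} \right\} \tag{6.14}$$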
(the zeta function $Z_{00}(s; q^2)$ is defined in Appendix B).

6.4. Final steps. To check Eq. (4.5) one has to tune the box size so that $W_n = m_K$ for a specified level number $n$. This condition determines $L$ order by order in the coupling. The perturbation expansion of the right-hand side of Eq. (4.5) is then obtained by inserting this series in the proportionality factor and in the perturbative expressions for the matrix element $|\langle \pi\pi, n|H_w|K\rangle|$. To lowest order, the box size is given by Eq. (4.6), and the function $\phi(q)$ in the proportionality factor is thus to be expanded around $q = \sqrt{n}$. This generates a term proportional to $z_n$, which cancels the corresponding term in Eq. (6.13). The integral in this equation matches the contribution to the transition amplitude $A$ of the infinite-volume diagram with the topology of diagram (c). All other terms that occur at first order in the coupling cancel, and one finds that Eq. (4.5) holds as expected.

7. Application to the Physical Kaon Decays

Compared to the generic theory considered so far, the situation for the physical kaon decays is complicated by the fact that there are several decay channels. To a first approximation we may, however, assume that isospin is an exact symmetry in the absence of the weak interactions. The decay channels can then be separated from each other by passing to a basis of states with definite quantum numbers. As an example we discuss the CP-conserving decays of the neutral kaon into two-pion states with isospin 0 and 2. The corresponding decay amplitudes, $A_0$ and $A_2$, are related to the physical transition matrix elements through

$$T(K_S^0 \to \pi^+\pi^-) = \frac{2}{\sqrt{6}}\, A_0\, e^{i\delta_0^0} + \frac{1}{\sqrt{3}}\, A_2\, e^{i\delta_0^2}, \tag{7.1}$$

$$T(K_S^0 \to \pi^0\pi^0) = -\frac{2}{\sqrt{6}}\, A_0\, e^{i\delta_0^0} + \frac{2}{\sqrt{3}}\, A_2\, e^{i\delta_0^2}. \tag{7.2}$$
Table 7.1. Calculation of the proportionality factor in Eq. (7.4) at the first level crossing

I | L [fm] | q | q∂φ/∂q | k∂δ_0^I/∂k
0 | 5.34 | 0.89 | 4.70 | 1.12
2 | 6.09 | 1.02 | 6.93 | −0.09
In these equations $\delta_0^I$ denotes the S-wave pion scattering phase in the channel with isospin $I$, and the normalization and phase conventions are as in Sect. 2. Consider the sector of two-pion states with isospin $I$, zero electric charge, zero total momentum and trivial transformation behaviour under cubic rotations and reflections. In this sector the energy spectrum in finite volume is determined by the equations that we have previously discussed, with $\delta_0$ replaced by $\delta_0^I$. At the points where one of these energy levels passes through $m_K$, we define the associated transition matrix element

$$M_I = \langle (\pi\pi)_I | H_w | K^0 \rangle, \tag{7.3}$$
where it is understood that the states are normalized to unity and that $H_w$ is the CP-conserving part of the effective weak hamiltonian. With these conventions, the physical amplitudes are given by
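$$|A_I|^2 = 8\pi \left\{ q\frac{\partial\phi}{\partial q} + k\frac{\partial\delta_0^I}{\partial k} \right\}_{k = k_\pi} \left(\frac{m_K}{k_\pi}\right)^3 |M_I|^2. \tag{7.4}$$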
Note that $A_0$ and $A_2$ are real and only their relative sign is observable. Up to this sign, the complete information can thus be retrieved from the matrix elements and the energy spectrum in finite volume. For illustration, let us suppose that the scattering phases $\delta_0^I$ are accurately described by the one-loop formulae of chiral perturbation theory [20]–[22]. The two-pion energy spectrum in the subspaces with isospin $I$ can then be calculated. So can the box sizes $L$ at which the next-to-lowest levels in these sectors (the ones with level number $n = 1$) coincide with the kaon mass. After that the proportionality factor in Eq. (7.4) is easily evaluated (Table 7.1), and one ends up with

$$|A_0| = 44.9 \times |M_0|, \tag{7.5}$$

$$|A_2| = 48.7 \times |M_2|, \tag{7.6}$$

$$|A_0/A_2| = 0.92 \times |M_0/M_2|. \tag{7.7}$$
As can be seen from these figures, the scattering phases in the two isospin channels differ greatly (by about 45° at $k = k_\pi$). This difference does not, however, lead to a big variation in the proportionality factors. In fact, if we set the scattering phases to zero altogether, Eqs. (4.6)–(4.8) give $|A_I| = 47.7 \times |M_I|$ for $n = 1$, which is not far from the results quoted above. The proportionality factor in Eq. (7.4) thus appears to depend only weakly on the final-state interactions. In particular, if the theory is to reproduce the $\Delta I = 1/2$ enhancement, the large factor has to come from the ratio of the finite-volume matrix elements $M_I$.
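The entries of Table 7.1 can be turned into the factors (7.5)–(7.7) with a few lines of Python. The kaon and pion masses are not quoted in this section. The values $m_K = 497.7$ MeV and $m_\pi = 137$ MeV used below are our assumption (an isospin-averaged pion mass), as is $\hbar c = 197.327$ MeV fm for converting $L$.

With these inputs the sketch reproduces $q$ from $L$ and the factors 44.9, 48.7 and 0.92 to within about 0.1. The ratio does not depend on the masses at all. For vanishing phases the sketch uses $q\,\partial\phi/\partial q = 4\pi^2 n^{3/2}/\nu_n$ at $q = \sqrt{n}$. This follows from the pole $\nu_n/[\sqrt{4\pi}(n - q^2)]$ of $Z_{00}$ defined in Appendix B. For $n = 1$ ($\nu_1 = 6$) it gives the 47.7 quoted above.

```python
import math

HBARC = 197.327           # MeV fm, for converting L from fm to MeV^-1
M_K, M_PI = 497.7, 137.0  # MeV; assumed inputs, not quoted in this section


def k_pi(m_k=M_K, m_pi=M_PI):
    """Pion momentum in K -> pi pi at rest, from m_K = 2 sqrt(m_pi^2 + k^2)."""
    return math.sqrt(m_k ** 2 / 4.0 - m_pi ** 2)


def q_of_L(L_fm, m_k=M_K, m_pi=M_PI):
    """q = k_pi L / (2 pi), with L given in fm."""
    return k_pi(m_k, m_pi) * L_fm / (2.0 * math.pi * HBARC)


def ll_factor(q_dphi_dq, k_ddelta_dk, m_k=M_K, m_pi=M_PI):
    """|A_I| / |M_I| from Eq. (7.4)."""
    kin = (m_k / k_pi(m_k, m_pi)) ** 3
    return math.sqrt(8.0 * math.pi * (q_dphi_dq + k_ddelta_dk) * kin)


def free_q_dphi_dq(n, nu_n):
    """q dphi/dq at q = sqrt(n) when the scattering phase vanishes."""
    return 4.0 * math.pi ** 2 * n ** 1.5 / nu_n


if __name__ == "__main__":
    # Rows of Table 7.1: isospin I, L [fm], q dphi/dq, k ddelta/dk
    for I, L, qphi, kdelta in [(0, 5.34, 4.70, 1.12), (2, 6.09, 6.93, -0.09)]:
        print(I, round(q_of_L(L), 3), round(ll_factor(qphi, kdelta), 2))
    print("zero phases, n = 1:", round(ll_factor(free_q_dphi_dq(1, 6), 0.0), 2))
```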
8. Concluding Remarks

Finite-volume techniques have been used in lattice field theory for many years and have long proved to be a most effective tool. It may well be that weak transition matrix elements are also best approached in this way. For two-body decays a concrete proposal along these lines has been made here. It is conceptually satisfactory, and we believe it has a fair chance of working out in practice.

In the case of the physical kaon decays, the proportionality factor relating the transition matrix elements in finite and infinite volume turned out to be nearly the same in the two isospin channels. This may be surprising at first sight, since the interactions of the pions in the isospin 0 state are much stronger than in the isospin 2 state. One should, however, take into account the fact that the comparison is made at box sizes $L$ greater than 5 fm. It is hence quite plausible that the finite-volume matrix elements already include most of the final-state interaction effects (such as the ones recently discussed in refs. [23]–[25]). Apart from a purely kinematical factor, only a small correction is then required to pass to the matrix elements in infinite volume.

Since the unitarity of the underlying field theory has been essential for our argumentation, it is not obvious that Eq. (4.5) holds in quenched QCD. As usual, however, one expects to be safe from the deficits of the quenched approximation when the quark masses are not too small, and our results should then be applicable. An investigation of the problem in quenched chiral perturbation theory, following refs. [26, 27], may be worthwhile at this point to find out where precisely the unphysical effects set in.

As a final comment we note that the ideas developed in this paper may also be applied to baryon decays, such as $\Lambda \to N\pi$, $\Sigma \to N\pi$ and $\Xi \to \Lambda\pi$. They apply as well to any other decay where the particles in the final state scatter only elastically.
Depending on the kinematical details, the relation between the finite and infinite volume transition matrix elements may, however, assume a slightly different form.

Appendix A

The components of four-vectors in real and euclidean space are labelled by an index running from 0 to 3. Bold-face types denote the spatial parts of the corresponding four-vectors. Scalar products are always taken with euclidean metric, except for Lorentz vectors in real space, where $xy = x_0 y_0 - \mathbf{x}\mathbf{y}$. States $|\mathbf{p}\rangle$ in infinite volume describing a spinless particle, with mass $m$ and four-momentum

$$p = (p_0, \mathbf{p}), \qquad p_0 = \sqrt{m^2 + \mathbf{p}^2} > 0, \tag{A.1}$$

are normalized in such a way that

$$\langle \mathbf{p}|\mathbf{p}'\rangle = 2p_0\, (2\pi)^3\, \delta(\mathbf{p} - \mathbf{p}'). \tag{A.2}$$
Particle states in finite volume are always normalized to unity. In the centre-of-mass frame, the elastic scattering amplitude of two spinless particles of mass $m$ may be expanded in partial waves according to

$$T = 16\pi W \sum_{l=0}^{\infty} (2l+1)\, P_l(\cos\theta)\, t_l(k), \qquad W = 2\sqrt{m^2 + k^2}, \tag{A.3}$$
where $W$ denotes the total energy of the particles, $\theta$ the scattering angle and $P_l(z)$ the Legendre polynomials [28]. Below the inelastic threshold, unitarity implies

$$t_l = \frac{1}{2ik}\left(e^{2i\delta_l} - 1\right), \tag{A.4}$$

with $\delta_l$ the (real) scattering phase for angular momentum $l$.

Appendix B

For all $q \ge 0$ the angle $\phi(q)$ is determined through

$$\tan\phi(q) = -\frac{\pi^{3/2} q}{Z_{00}(1; q^2)}, \qquad \phi(0) = 0, \tag{B.1}$$

and the requirement that it depends continuously on $q$. The zeta function in this equation is defined by

$$Z_{00}(s; q^2) = \frac{1}{\sqrt{4\pi}} \sum_{\mathbf{n} \in \mathbb{Z}^3} (\mathbf{n}^2 - q^2)^{-s} \tag{B.2}$$

if $\mathrm{Re}\, s > \tfrac{3}{2}$, and elsewhere through analytic continuation. Numerical methods to compute the zeta function are described in ref. [17], and a table of values of $\phi(q)$ is included in ref. [18]. The source code of a set of ANSI C programs for these functions can be obtained from the authors.
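At $s = 1$ the defining sum for $Z_{00}$ diverges, so the function has to be continued analytically. The Python sketch below uses a heat-kernel split at $t = 1$ with Poisson resummation of the small-$t$ part:

$$Z_{00}(1;q^2) = \frac{1}{\sqrt{4\pi}}\sum_{\mathbf{m}} \frac{e^{-(\mathbf{m}^2-q^2)}}{\mathbf{m}^2-q^2} - \pi + \frac{\pi}{2}\int_0^1 dt\, t^{-3/2}\left(e^{tq^2}-1\right) + \frac{\pi}{2}\int_0^1 dt\, t^{-3/2}\, e^{tq^2}\sum_{\mathbf{m}\neq 0} e^{-\pi^2\mathbf{m}^2/t}.$$

This is a standard route, though not necessarily the method of ref. [17]. The helper names and numerical parameters are ours. The sketch also evaluates the pole-subtracted constant $z_n$ of Eq. (6.14). Three checks are built in:

- $z_0$ should come out close to the known value $-8.91363$, which is $\pi$ times the coefficient $-2.837297$ of the finite-volume expansion of the two-particle ground-state energy.
- At $q = 1$, where $\tan\phi$ vanishes, $d\phi/dq$ should equal $4\pi^2/\nu_1 = 4\pi^2/6$.
- For small $q > 0$, $\tan\phi(q)$ should be positive.

```python
import itertools
import math

# Multiplicities of |m|^2 over m in Z^3 with |m_i| <= 6 (shells with |m|^2 < 49 are complete).
SHELLS = {}
for _m in itertools.product(range(-6, 7), repeat=3):
    _r2 = _m[0] ** 2 + _m[1] ** 2 + _m[2] ** 2
    SHELLS[_r2] = SHELLS.get(_r2, 0) + 1


def simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def _small_t_part(q2):
    """-pi plus the two t <= 1 integrals; t = u^2 removes the t^(-3/2) singularity."""
    def f1(u):  # 2 (e^{u^2 q^2} - 1) / u^2, regular at u = 0
        return 2.0 * q2 if u == 0.0 else 2.0 * math.expm1(u * u * q2) / (u * u)

    def f2(u):  # image terms m != 0; |m|^2 <= 4 suffices since e^{-pi^2 |m|^2} is tiny
        if u == 0.0:
            return 0.0
        w = sum(c * math.exp(-math.pi ** 2 * r2 / (u * u))
                for r2, c in SHELLS.items() if 0 < r2 <= 4)
        return 2.0 * math.exp(u * u * q2) * w / (u * u)

    return -math.pi + 0.5 * math.pi * (simpson(f1, 0.0, 1.0) + simpson(f2, 0.0, 1.0))


def z00(q2):
    """Z_00(1; q^2); q^2 must not coincide with an integer |m|^2."""
    big_t = sum(c * math.exp(q2 - r2) / (r2 - q2) for r2, c in SHELLS.items())  # t >= 1 part
    return big_t / math.sqrt(4.0 * math.pi) + _small_t_part(q2)


def tan_phi(q):
    """tan phi(q) = -pi^(3/2) q / Z_00(1; q^2), as in Appendix B."""
    return -math.pi ** 1.5 * q / z00(q * q)


def z_n(n):
    """Constant z_n of Eq. (6.14), with the pole subtracted analytically."""
    q2 = float(n)
    nu = SHELLS.get(n, 0)
    # The shell |m|^2 = n together with nu_n/(q^2 - n) tends to -nu_n as q^2 -> n.
    rest = sum(c * math.exp(q2 - r2) / (r2 - q2) for r2, c in SHELLS.items() if r2 != n)
    return rest - nu + math.sqrt(4.0 * math.pi) * _small_t_part(q2)


if __name__ == "__main__":
    print("z_0 =", z_n(0))
    h = 1e-5
    print("dphi/dq at q = 1:", (tan_phi(1 + h) - tan_phi(1 - h)) / (2 * h), 4 * math.pi ** 2 / 6)
```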
References
1. Dawson, C., Martinelli, G., Rossi, G.C., Sachrajda, C.T., Sharpe, S., Talevi, M., Testa, M.: Nucl. Phys. B 514, 313 (1998)
2. Maiani, L., Testa, M.: Phys. Lett. B 245, 585 (1990)
3. Ciuchini, M., Franco, E., Martinelli, G., Silvestrini, L.: Phys. Lett. B 380, 353 (1996)
4. Montvay, I., Weisz, P.: Nucl. Phys. B 290 [FS20], 327 (1987)
5. Frick, Ch., Jansen, K., Jersák, J., Montvay, I., Münster, G., Seuferling, P.: Nucl. Phys. B 331, 515 (1990)
6. Lüscher, M., Wolff, U.: Nucl. Phys. B 339, 222 (1990)
7. Guagnelli, M., Marinari, E., Parisi, G.: Phys. Lett. B 240, 188 (1990)
8. Gattringer, C.R., Lang, C.B.: Nucl. Phys. B 391, 463 (1993)
9. Gupta, R., Patel, A., Sharpe, S.: Phys. Rev. D 48, 388 (1993)
10. Fiebig, H.R., Dominguez, A., Woloshyn, R.M.: Nucl. Phys. B 418, 649 (1994)
11. Göckeler, M., Kastrup, H.A., Westphalen, J., Zimmermann, F.: Nucl. Phys. B 425, 413 (1994)
12. Fukugita, M., Kuramashi, Y., Okawa, M., Mino, H., Ukawa, A.: Phys. Rev. D 52, 3003 (1995)
13. Aoki, S. et al. (JLQCD Collab.): Phys. Rev. D 58, 054503 (1998)
14. Gutsfeld, C., Kastrup, H.A., Stergios, K.: Nucl. Phys. B 560, 431 (1999)
15. Lüscher, M.: Commun. Math. Phys. 104, 177 (1986)
16. Lüscher, M.: Commun. Math. Phys. 105, 153 (1986)
17. Lüscher, M.: Nucl. Phys. B 354, 531 (1991)
18. Lüscher, M.: Nucl. Phys. B 364, 237 (1991)
19. Rummukainen, K., Gottlieb, S.: Nucl. Phys. B 450, 397 (1995)
20. Gasser, J., Leutwyler, H.: Phys. Lett. B 125, 325 (1983); Ann. Phys. (NY) 158, 142 (1984); Nucl. Phys. B 250, 465 (1985)
21. Gasser, J., Meissner, U.-G.: Phys. Lett. B 258, 219 (1991)
22. Knecht, M., Moussallam, B., Stern, J., Fuchs, N.H.: Nucl. Phys. B 457, 513 (1995)
23. Pallante, E., Pich, A.: Phys. Rev. Lett. 84, 2568 (2000)
24. Paschos, E.A.: Rescattering effects for ε'/ε. hep-ph/9912230
25. Buras, A.J., Ciuchini, M., Franco, E., Isidori, G., Martinelli, G., Silvestrini, L.: Phys. Lett. B 480, 80 (2000)
26. Bernard, C.W., Golterman, M.F.L.: Phys. Rev. D 53, 476 (1996)
27. Golterman, M.F.L., Leung, K.C.: Phys. Rev. D 56, 2950 (1997); ibid. D 57, 5703 (1998); ibid. D 58, 097503 (1998)
28. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series and Products. New York: Academic Press, 1965

Communicated by G. Mack
Commun. Math. Phys. 219, 45 – 56 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Computations of $M/\Lambda_{\overline{\rm MS}}$ in the 2-d O(n) Non-Linear σ-Model
Peter Weisz
Max-Planck-Institut für Physik, Föhringer Ring 6, 80805 München, Germany
Received: 23 March 2000 / Accepted: 10 April 2000
Dedicated to the memory of Harry Lehmann

Abstract: We review the various computations of the ratio of the mass gap $M$ to the $\Lambda$-parameter in the O(n) non-linear σ-model. The $\Lambda$-parameter enters the perturbative computations of amplitudes at high energies. In particular, we reproduce from original notes of H. Lehmann his computation of this ratio in the (next-to-)leading order of the $1/n$ expansion.

1. Introduction

In his later career Harry Lehmann was well known to have a strong interest in phenomenology. It may thus come as a surprise to many that he also had a secret love for soluble models in two dimensions! The reason was that he was always concerned about basic field theoretical principles and structures, and these can generally be investigated more easily in 2 dimensions than in our real world. Historically, studies of 2-dimensional models have played a significant role in the development of quantum field theory and statistical mechanics. They range from studies of phase transitions and critical exponents to non-Euclidean geometry, solitons, duality etc.; moreover, they lie at the heart of the presently popular string theories. The main topic of this review is the non-linear O(n) sigma model. Classically it is simply described by the action

$$S = \frac{1}{2g_0^2} \int d^2x\, \big(\partial_\mu s^a(x)\big)^2, \tag{1.1}$$

where $s^a(x)$ are fields satisfying the constraint $s^2 = 1$. There are many approaches to the quantization of this model, although the question as to which of these lead to the same theory has not yet been answered. What makes the model particularly interesting is that it shares many properties with Yang–Mills theory, the pure gluonic part of QCD, the candidate theory of the strong interactions. For example, for the case $n = 3$ the model has instanton solutions. Moreover, perturbatively it has been shown to be renormalizable
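[1, 2] and has the property of being asymptotically free. That is, at high energies $q$ various physical amplitudes can be computed as power series in a running coupling, e.g. $g_{\overline{\rm MS}}(q)$, the coupling of the $\overline{\rm MS}$ scheme of dimensional regularization. The running coupling of any scheme falls logarithmically at high energies, and the scale is set by the so-called $\Lambda$-parameter (of that scheme). For example, in the $\overline{\rm MS}$ scheme

$$\frac{1}{g^2_{\overline{\rm MS}}(q)} = \frac{n-2}{2\pi}\left[\ln\frac{q}{\Lambda_{\overline{\rm MS}}} + \frac{1}{n-2}\ln\ln\frac{q}{\Lambda_{\overline{\rm MS}}} + O\!\left(\frac{\ln\ln q/\Lambda_{\overline{\rm MS}}}{\ln q/\Lambda_{\overline{\rm MS}}}\right)\right]. \tag{1.2}$$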
Perturbation theory describes the running of amplitudes at high energy, but one needs non-perturbative methods to relate the $\Lambda$-parameter to low-energy mass parameters.

Table 1. Determinations of $M/\Lambda_{\overline{\rm MS}}$ in the σ-model

Date | Authors | n = 3 | general n | Method
'82 | Lüscher | 1.7 ± 0.4 | | small vol. L + extrapolation L → ∞
'85 | Floratos, Petcher | 2.1 ± 0.2 | | small vol. L + extrapolation L → ∞
'82 | Fox et al. | ∼3.7 | | Lattice MC (SA)
'83 | Berg et al. | ∼1.3 | | Lattice MC (IA)
'89 | Wolff | 2.75(25) | | Lattice MC (SA)
'90 | Hasenfratz, Niedermayer | 3.4 ± 0.1 | |
'85 | Müller et al. | | 1 + (ln 8 + γ − 1)/n + ··· | 1/n-expansion
'90 | Biscari et al. | | 1 + (ln 8 + γ − 1)/n + ··· | 1/n-expansion
 | Lehmann | | 1 + (ln 8 + γ − 1)/n + ··· | 1/n-expansion
'85 | Gliozzi | 2.71··· | exp[1/(n − 2)] | Hamilton form. + OPE (wrong!)
'81 | Iwasaki | 8/e = 2.94··· | ?! | Instantons
'90 | Hasenfratz, Maggiore, Niedermayer | 8/e | (8/e)^{1/(n−2)}/Γ[1 + 1/(n−2)] | Thermodynamic Bethe ansatz + S-matrix
'91 | Lüscher, Weisz, Wolff | 2.9(3) | | Finite size scaling + lattice
'97 | Shin | | | Finite size scaling + lattice
It is widely assumed that the ground state $|0\rangle$ is O(n) invariant, and that the spectrum of stable particles consists only of an O(n) vector multiplet of particles of mass $M > 0$ (i.e. there are no stable bound states). The computation of the ratio of $M$ to the $\Lambda$-parameter of the $\overline{\rm MS}$ scheme in the σ-model has a long history. It is summarized in Table 1 and will be discussed further in the next section. Lehmann's name appears in connection with the computation of the leading orders in the $1/n$ expansion:

$$M/\Lambda_{\overline{\rm MS}} \equiv C(n) = 1 + c_1/n + c_2/n^2 + O(1/n^3). \tag{1.3}$$
He completed the analytic calculation of the coefficient $c_1$ around the same time as Biscari, Campostrini and Rossi [3]. But, as was typical for him, he did not publish his result immediately; he wanted first to compute the next coefficient $c_2$. He thought that with this information he could actually guess the exact result! In the meantime Hasenfratz, Maggiore and Niedermayer [4] showed that, amazingly, the exact result could be obtained analytically using the thermodynamic Bethe ansatz. The result on $c_1$ thus served as a useful consistency check.

In 1991 Forgács, Niedermayer and I were discussing with Lehmann the extension of the computation of $M/\Lambda$ to the Gross–Neveu model. He agreed to send his unpublished hand-written notes, and I have taken this opportunity to convert them, practically unchanged, to print in Appendix A. We adopted his method for our $1/n$ computation [5]. The method is rather elegant, since it makes use of his spectral representation and does not require a particular regularization.

2. Methods of Computing the Ratio $C = M/\Lambda_{\overline{\rm MS}}$ in the O(n) σ-Models

The first computations of $C$ were performed using lattice regularization. The literature on the subject is vast, and only a few representative references have been included in the table. The lattice $\Lambda$-parameter associated with a given lattice action is a function of the bare coupling $g_0$:

$$a\Lambda_{\rm latt} = (b_0 g_0^2)^{-b_1/b_0^2}\, \exp[-1/(b_0 g_0^2)]\, \left[1 + O(g_0^2)\right], \tag{2.1}$$

where $a$ is the lattice spacing. One can compute the mass gap in lattice units, $M(g_0)a$, directly through numerical measurements of the exponential decay of the spin 2-point function. Its behavior can then be compared with the right-hand side of (2.1). The lattice data are consistent with $M(g_0)/\Lambda_{\rm latt}(g_0)$ varying slowly for small $g_0$. Unfortunately, perturbation theory in the bare coupling (for the standard action) seems badly convergent. This could explain the deviations of the values in the table from the exact result.
Nevertheless, we note that the ratio $\Lambda_{\overline{\rm MS}}/\Lambda_{\rm latt}$, which is needed to convert the result to the $\overline{\rm MS}$ scheme, often introduces a significant factor. For the standard action and $n = 3$, for example, it is ∼27, and hence any discrepancy could a priori have been much worse! Wolff [8] obtained his result by comparing the spin 2-point function measured on the lattice at short physical distance with renormalized perturbation theory. His result, obtained prior to the paper by Hasenfratz et al., is in remarkably good agreement with it.

Another approach was that of Lüscher [9]. His tactic was to consider the mass gap $m(L)$ in a finite periodic (1d) volume $L$. He pointed out that for very small volumes, $z = m(L)L \ll 1$, the ratio $c(z) = m(L)/\Lambda_{\overline{\rm MS}}$ could be computed perturbatively. In leading order the ratio is a rapidly falling function for small $z$ (e.g. for $n = 3$, $c(z) \propto z^2 \exp(2\pi/z)$), which flattens and attains a minimum at some finite $z$. In the limit $n \to \infty$ the value of $c(z)$ at the minimum is close to the known result at $z = \infty$. Lüscher was encouraged by this. He had also previously shown [10]¹ that $m(L)$ approaches $M$ exponentially for large $ML$:

$$\frac{m(L) - M}{M} = f(n)\, \frac{1}{\sqrt{2\pi ML}}\, e^{-ML}\, \left[1 + O(1/(ML))\right], \tag{2.2}$$
On this basis Lüscher assumed that $c(z)$ was a monotonically falling function of $z$ and tried to estimate $C = c(\infty)$. Unfortunately, it turned out that his systematic error was underestimated.

¹ Here $f(n)$ is related to the forward scattering amplitude; one obtains e.g. $f(3) = 32/9$.
Indeed the computation to next order, performed by Floratos and Petcher [11], yielded a central value just within his expected errors. Their result is indeed closer to the exact result, but the error is still underestimated. It seems that straightforward 2-loop perturbation theory for $c(z)$ is not quantitatively accurate at intermediate values $z \sim 1$.

A refined version of the above method was put forward by Lüscher, Weisz and Wolff [12] and extended later by Shin [13]. Here the running of the "coupling" $m(L)L$ was measured on the lattice over a wide range of volumes. The continuum limit was taken by assuming that ratios of physical quantities reach their limit rapidly, as a power of the lattice cut-off (modulated by powers of logs), as proposed by Symanzik [14]. The range measured extends from large volumes, where $m(L) \sim M$, to small volumes, where renormalized perturbative behavior seems to set in. It hence permits an approximate determination of $C$.

Actually, the first computation of the coefficient $c_1$ in (1.3) was done by Müller, Raddatz and Rühl [15] in 1985. They started from the $1/n$ expansion in the lattice regularization and evaluated $c_1$ numerically. Their result was in excellent agreement with the analytic result obtained much later by Biscari et al. [3] and by Lehmann, working in the continuum². It remains to remark that Lehmann invested a lot of effort in computing the coefficient $c_2$. He did not complete this calculation, however, and to my knowledge it has not yet been accomplished by anyone else.

Iwasaki [17] produced the exact result for the case $n = 3$ already in 1981! He obtained it by assuming that classical multi-instanton configurations completely saturate the path integral. He further argued that either instantons or anti-instantons should be included, but not both. Since the approximations involved are rather questionable, it is not clear whether this agreement is an accident or has a deeper significance.
In the table we also note the result of Gliozzi [18], which does not agree with the exact result. We include it because the story here is rather unfortunate: Gliozzi obtained his result crucially using a wrong equation in a paper of Lüscher [19]³. Fortunately this mistake did not affect the results in which Lüscher was actually interested in that publication. Neither author published an erratum, but notes discussing the necessary corrections to [19] are available [20].

We conclude this section by outlining the computation of Hasenfratz et al. [4]. It is based on a remarkable property of the O(n) sigma models which we have not mentioned so far: classically they have an infinite set of conserved local and non-local charges. It was shown by Polyakov [1] and Lüscher [19] that these survive quantization. In the quantum theory they imply that there is no particle production and that the N-particle S-matrix factorizes into a product of 2-particle S-matrices. The 2-particle S-matrix is in turn determined (up to CDD ambiguities) to be the one proposed by Zamolodchikov and Zamolodchikov [21]. To compute $C$ exactly one must find a quantity which is calculable both in perturbation theory and in a non-perturbative approach. One such quantity is the free energy $f$ as a function of a chemical potential $h$ coupled to a Noether charge $J^{12}$. The euclidean action is
$$S = \frac{1}{2g_0^2} \int d^Dx \left[ (\partial s)^2 + 2ih\,(s^1 \partial_0 s^2 - s^2 \partial_0 s^1) - h^2 \left(1 - s_3^2 - \cdots - s_n^2\right) \right]. \tag{2.3}$$

² The correct result also appears in the paper of H. Flyvbjerg [16]; however, his 1989 preprint on which this paper is based contained an error, which was pointed out by the authors of ref. [3].
³ When Lüscher saw Gliozzi's paper he realized that there must be an error (which he hardly ever makes!). He immediately informed the author of [18], but it seems that it was too late for the paper to be withdrawn.
A standard one-loop perturbative calculation using dimensional regularization yields

$$f(h) - f(0) = -\frac{h^2}{2}\left[\frac{1}{g^2_{\overline{\rm MS}}(h)} - \frac{n-2}{4\pi} + O\big(g^2_{\overline{\rm MS}}(h)\big)\right], \tag{2.4}$$
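and hence, using (1.2),

$$f(h) - f(0) = -\frac{h^2}{2}\,\frac{n-2}{2\pi}\left[\ln\frac{h}{\sqrt{e}\,\Lambda_{\overline{\rm MS}}} + \frac{1}{n-2}\ln\ln\frac{h}{\Lambda_{\overline{\rm MS}}} + O\!\left(\frac{\ln\ln h/\Lambda_{\overline{\rm MS}}}{\ln h/\Lambda_{\overline{\rm MS}}}\right)\right]. \tag{2.5}$$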
On the other hand, Polyakov and Wiegmann [22] derived an integral equation for $f(h)$ for the cases $n = 3, 4$ by applying the Bethe ansatz technique to related fermionic models. Hasenfratz and Niedermayer [4] showed that the integral equation can be derived for general $n$. The largest eigenvalue of $J^{12}$ on one-particle states is 1. Thus, as $h$ exceeds the threshold value $h_c = M$, a finite density of such particles will be formed. The momenta $p_j = M\sinh\theta_j$ of a dilute gas of $N$ such particles in a periodic system of size $L$ satisfy the eigenvalue equation

$$\exp(-iLM\sinh\theta_j) = \prod_{r \neq j} S(\theta_j - \theta_r), \qquad j = 1, 2, \ldots, N, \tag{2.6}$$
where $S(\theta)$ is the invariant amplitude in the symmetric and traceless channel. If the gas is not dilute, multi-particle scattering processes will also enter, which in general leads to a more complicated problem. However, because of the factorization of the S-matrix, one can argue that Eq. (2.6) remains true for arbitrary densities $\rho = N/L$. Due to the property $S(0) = -1$, the threshold behavior $h \ge M$ is described by a dilute, non-relativistic, non-interacting Fermi gas. The derivation of the integral equation starts from (2.6) and follows standard steps. We take the thermodynamic limit $N \to \infty$, $L \to \infty$ with $\rho$ fixed. The density $g(\theta)$, normalized so that $dN = (L/2\pi)\, g(\theta)\, d\theta$, then satisfies the integral equation

$$g(\theta) - \int_{-B}^{B} d\theta'\, K(\theta - \theta', n)\, g(\theta') = M\cosh\theta, \tag{2.7}$$
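where the kernel $K$ is related to the S-matrix amplitude through

$$K(\theta, n) = \frac{1}{2\pi i}\frac{d}{d\theta}\ln S(\theta). \tag{2.8}$$

The energy $E$ of the ground state and the particle density $\rho$ are given by

$$E = \frac{1}{2\pi}\int_{-B}^{B} d\theta\, g(\theta)\, M\cosh\theta, \tag{2.9}$$

$$\rho = \frac{1}{2\pi}\int_{-B}^{B} d\theta\, g(\theta). \tag{2.10}$$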
The free energy density can be obtained from the above equations by performing the Legendre transform

$$f(h) = \min_\rho\, [E(\rho) - h\rho]. \tag{2.11}$$
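One obtains

$$f(h) - f(0) = -\frac{M}{2\pi}\int_{-B}^{B} d\theta\, \epsilon(\theta)\cosh\theta, \tag{2.12}$$

where $\epsilon(\theta)$ satisfies the integral equation

$$\epsilon(\theta) - \int_{-B}^{B} d\theta'\, K(\theta - \theta', n)\, \epsilon(\theta') = h - M\cosh\theta. \tag{2.13}$$

The parameter $B$ is determined through the boundary condition

$$\epsilon(\pm B) = 0. \tag{2.14}$$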
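The analysis of the integral equation in the limit $h \gg M$ uses the generalized Wiener–Hopf technique and is rather involved [4]. The result for $h \gg M$ is

$$f(h) - f(0) = -\frac{h^2}{2}\,\frac{n-2}{2\pi}\left[\ln\frac{c(n)\, h}{\sqrt{e}\, M} + \frac{1}{n-2}\ln\ln\frac{h}{M} + O\!\left(\frac{\ln\ln(h/M)}{\ln(h/M)}\right)\right], \tag{2.15}$$

where

$$c(n) = \left(\frac{8}{e}\right)^{\frac{1}{n-2}} \frac{1}{\Gamma\!\left(1 + \frac{1}{n-2}\right)}. \tag{2.16}$$

Comparing (2.5) and (2.15), we see that

$$C(n) = c(n). \tag{2.17}$$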
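The exact result (2.16) can be compared directly with the $1/n$ coefficient $c_1 = 3\ln 2 + \gamma - 1$ of (1.3). The Python sketch below uses function names of our own and evaluates $C(n) = c(n)$ with `math.gamma`. For $n = 3$ it gives $8/e \approx 2.943$, the value listed in Table 1. At large $n$, the quantity $n\,(C(n) - 1)$ approaches $c_1 \approx 1.657$.

```python
import math

GAMMA_E = 0.5772156649015329  # Euler's constant


def c_exact(n):
    """M / Lambda_MSbar = (8/e)^(1/(n-2)) / Gamma(1 + 1/(n-2)), valid for n >= 3."""
    x = 1.0 / (n - 2)
    return (8.0 / math.e) ** x / math.gamma(1.0 + x)


def c_large_n(n, c1=3.0 * math.log(2.0) + GAMMA_E - 1.0):
    """Truncated 1/n expansion 1 + c1/n of Eq. (1.3)."""
    return 1.0 + c1 / n


if __name__ == "__main__":
    for n in (3, 4, 6, 10, 100):
        print(n, round(c_exact(n), 4), round(c_large_n(n), 4))
```

At $n = 3$ the truncated expansion is far off, since $c_1/n$ is not small there.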
3. Discussion

A glance at the table gives a quantitatively satisfactory impression: all methods for determining the ratio $C$ agree rather well. As noted before, for the lattice determinations it is not a priori obvious that this should be so. On the other hand, the overall agreement could just be an accident! Obviously, the task of computing $C(n)$ only makes sense if there really exists a non-perturbative definition of the theory which behaves according to renormalized perturbation theory at high energies. The problem is that there is no non-perturbative definition of the model for which this has been rigorously proven. Indeed, although the model is supposedly integrable, no completely satisfactory solution starting from first principles (e.g. à la Bethe ansatz) has been given⁴. As mentioned above, the thermodynamic Bethe ansatz computation is fully consistent with perturbation theory, but some unproven assumptions enter into it.

In this connection one may argue that it is not surprising that the leading $\ln h$ perturbative behavior is obtained from the thermodynamic Bethe ansatz. The reason is that the Zamolodchikov S-matrix satisfies (on-shell) asymptotic freedom, in the sense that the phase shifts go to zero logarithmically at high energies. What is highly non-trivial, however, is that the coefficients of the non-leading terms involving $\ln\ln h$ also match. In the perturbative approach these are related to the 2-loop coefficient of the beta function. Without this match, of course, one would have obtained an inconsistency!

⁴ A few physicists, e.g. L. Faddeev, might disagree with this statement!
There is one non-perturbative approach to the quantization of the sigma model, the form factor bootstrap [23, 24], which almost certainly yields an asymptotically free theory. The best evidence for this has been produced by Balog and Niedermaier [25]. In this approach off-shell correlation functions are obtained starting from the on-shell data and using general field theoretic properties. The crucial question that remains here is whether this approach really defines a quantum field theory (although no reason why it should not has yet been put forward).

The situation with the construction of the theory starting from the lattice regularization is more subtle. Practically all present lattice investigations crucially rely on one or both of the following assumptions. The first is that the critical point of the (standard) lattice model is at $g_0 = 0$. The second is that the continuum limit is reached à la Symanzik (as mentioned in the previous section). Although much data is consistent with these assumptions, a proof of either is lacking. A failure of the Symanzik hypothesis would unfortunately be a big blow to the goal of obtaining accurate results in QCD from numerical simulations. In fact, both assumptions have been questioned by Patrascioiu and Seiler [26]. They claim that, for the standard action, there is a critical point at some $g_c > 0$, and that the continuum limit of the lattice O(n) model is not asymptotically free for any $n \ge 2$! If Patrascioiu and Seiler are correct, this would point to a serious gap in our understanding of universality. It would imply that there are various classes of non-linear O(n) sigma models with differing high-energy behavior.

It is known that the continuum limit of the lattice model is quantitatively consistent with the Zamolodchikov S-matrix at low energies [28]. Its correlation functions are also consistent with perturbation theory up to very high energies, $q/M = O(50)$ [25].
Thus although the question of whether the continuum limit of the standard lattice theory is asymptotically free is theoretically highly interesting, it is probably phenomenologically irrelevant in the sense that there is a wide range of energies effectively described by the lattice regularization and by renormalized perturbation theory. Infinite energies are an idealization, and if one extends similar thoughts to QCD then this theory (assuming it correctly describes hadronic phenomena) does not “stand alone” in Nature; in particular at high energies the other interactions which come into play must be taken into account.
Appendix. Lehmann's Computation

The one-particle states

$$|p, a\rangle, \qquad a = 1, \ldots, n, \tag{A.1}$$

are labeled by a momentum

$$p = (p^0, p^1), \qquad p^0 = \sqrt{M^2 + (p^1)^2} > 0, \tag{A.2}$$

and an isospin label $a$. The normalization of these states may be chosen such that

$$\langle p, a|q, b\rangle = \delta^{ab}\, 2p^0\, 2\pi\, \delta(p^1 - q^1). \tag{A.3}$$

Let $s^a(x)$ be the renormalized spin field, normalized such that

$$\langle 0|s^a(x)|q, b\rangle = \delta^{ab}\, e^{-iqx}. \tag{A.4}$$
We consider the 2-point function

$$\langle 0|s^a(x)\, s^b(0)|0\rangle = \delta^{ab}\, i\Delta^+(x). \tag{A.5}$$
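The leading terms of $i\Delta^+(x)$ for euclidean distance $|x| \to 0$, computed in renormalized perturbation theory, are

$$i\Delta^+ = \frac{1}{2\pi}\, a(n)\, L^{\frac{n-1}{n-2}} \left[1 + \frac{n-1}{(n-2)L}\left(\frac{1}{n-2}\ln L + \ln\frac{C(n)}{K(n)}\right)\right] \left[1 + O\!\left(\left(\frac{\ln L}{L}\right)^2\right)\right], \tag{A.6}$$

with

$$L = -\ln\xi, \qquad \xi = M|x|, \tag{A.7}$$

and

$$K(n) = \frac{1}{2}\exp[\gamma - 1/(n-2)], \tag{A.8}$$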
where γ denotes Euler's constant. To order $1/n$ we have

$$a(n) = 1 + \frac{a_1}{n} + \ldots, \tag{A.9}$$

$$\ln\frac{C(n)}{K(n)} = \ln 2 - \gamma + \frac{1}{n}(1 + c_1) + \ldots, \tag{A.10}$$
and hence

$$i\Delta^+ = \frac{1}{2\pi}\Big\{L + \ln 2 - \gamma + \frac{1}{n}\big[L\ln L + a_1 L + (1 + \ln 2 - \gamma)\ln L + (\ln 2 - \gamma)(1 + a_1) + 1 + c_1\big] + O(1/L) + O(1/n^2)\Big\}. \tag{A.11}$$
In the following we will show

$$a_1 = \ln\frac{4}{\pi} + \gamma - 3, \tag{A.12}$$

$$c_1 = 3\ln 2 + \gamma - 1. \tag{A.13}$$
First we write

$$i\Delta^+ = \frac{1}{2\pi}K_0(\xi) + \frac{i}{n}\Delta_1^+ + O(1/n^2). \tag{A.14}$$

Now $\Delta_1^+$ is given by the imaginary part of the diagram
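where the solid line corresponds to the propagator $1/(k^2 - M^2)$, and the wavy line to the propagator $D(q)$:

$$D(q) = \int_{4M^2}^{\infty} \frac{d\kappa^2}{q^2 - \kappa^2}\, E(\kappa^2), \tag{A.15}$$

$$E(\kappa^2) = \frac{4\pi\kappa\sqrt{\kappa^2 - 4M^2}}{\ln^2\!\left[\big(\kappa + \sqrt{\kappa^2 - 4M^2}\big)^2/4M^2\right] + \pi^2}. \tag{A.16}$$

Thus we have the spectral representation

$$i\Delta_1^+(x) = \frac{1}{2\pi}\int d^2k\; e^{-ikx}\, \theta(k_0)\, \rho_1(k^2) = \frac{1}{2\pi}\int_{9M^2}^{\infty} dK^2\; K_0(K|x|)\, \rho_1(K^2), \tag{A.17}$$

with

$$(k^2 - M^2)^2\, \rho_1(k^2) = \frac{1}{2\pi}\int d^2q\; \theta(k_0 - q_0)\, \theta(q_0)\, \delta\big((k-q)^2 - M^2\big)\, E(q^2) = \frac{1}{2\pi}\int_{4M^2}^{\infty} d\kappa^2\, \frac{E(\kappa^2)\, \theta\big(\sqrt{k^2} - \kappa - M\big)}{\sqrt{(k^2 + \kappa^2 - M^2)^2 - 4k^2\kappa^2}}. \tag{A.18}$$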
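Now we change the integration variables in (A.17) to $\eta, t$ by
$$K = Mu, \qquad u = \left(\frac{\cosh 3\eta - \cosh t}{\cosh\eta - \cosh t}\right)^{1/2}, \tag{A.19}$$
$$\kappa = 2M\cosh\eta, \tag{A.20}$$
to obtain
$$i\Delta_+^{(1)}(\xi) = \frac{1}{2\pi}\int_0^{\infty} d\eta\,(\eta^2 + \pi^2/4)^{-1}\int_0^{\eta} dt\, K_0(\xi u)\,\frac{\cosh\eta - \cosh t}{\sinh\eta}. \tag{A.21}$$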
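The Jacobian of the substitution (A.19), (A.20) is not written out. A sketch of the intermediate steps follows, in units $M = 1$ and with our own shorthand $s = K^2 = u^2$, $c = \cosh\eta$, $y = \cosh t$, $C = \cosh 3\eta$, using $(\kappa \pm 1)^2 = (C \mp 1)/(c \mp 1)$ for $\kappa = 2c$:
$$s - (\kappa+1)^2 = \frac{(C-c)(y-1)}{(c-y)(c-1)},\qquad s - (\kappa-1)^2 = \frac{(C-c)(y+1)}{(c-y)(c+1)},$$
so that the square root in (A.18) and the measure become
$$\sqrt{(s+\kappa^2-1)^2 - 4s\kappa^2} = \frac{(C-c)\sinh t}{(c-y)\sinh\eta},\qquad ds = \frac{(C-c)\sinh t}{(c-y)^2}\,dt,\qquad (s-1)^2 = \frac{(C-c)^2}{(c-y)^2},$$
while $C - c = 4\sinh^2\eta\,\cosh\eta$ and (A.16) gives
$$d\kappa^2\,E(\kappa^2) = \frac{32\pi\cosh^2\eta\,\sinh^2\eta}{\eta^2 + \pi^2/4}\,d\eta.$$
Multiplying, with the two factors $1/2\pi$ from (A.17) and (A.18),
$$\frac{1}{(2\pi)^2}\,\frac{ds}{(s-1)^2}\,\frac{d\kappa^2\,E(\kappa^2)}{\sqrt{(s+\kappa^2-1)^2-4s\kappa^2}} = \frac{1}{2\pi}\,\frac{d\eta\,dt}{\eta^2+\pi^2/4}\,\frac{\cosh\eta - \cosh t}{\sinh\eta},$$
which is the measure in (A.21). Here $K_0(K|x|) = K_0(\xi u)$ because $K|x| = u\,M|x|$. At $t = 0$ one finds $u = 1 + 2\cosh\eta = 1 + \kappa$, which is the threshold $K = \kappa + M$ of (A.18), and $u \to \infty$ as $t \to \eta$.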
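We need the leading terms of $i\Delta_+^{(1)}(\xi)$ for $\xi \to 0$, neglecting terms of $O(1/L)$. To deal with this we proceed in three steps.

Step 1. We show that the $\eta$-integration can be restricted to $0 \le \eta \le L$. Consider
$$I_0 = \int_L^{\infty} d\eta\,(\eta^2 + \pi^2/4)^{-1}\int_0^{\eta} dt\, K_0(\xi u)\,\frac{\cosh\eta - \cosh t}{\sinh\eta}. \tag{A.22}$$
Since $K_0(z)$ decreases monotonically and $u > e^{\eta}$, we have
$$I_0 < \int_L^{\infty} d\eta\, K_0(\xi e^{\eta})\,\hat I_0(\eta), \tag{A.23}$$
with
$$\hat I_0(\eta) = \frac{1}{\sinh\eta\,(\eta^2 + \pi^2/4)}\int_0^{\eta} dt\,(\cosh\eta - \cosh t). \tag{A.24}$$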
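Now
$$\hat I_0(\eta) = \frac{\eta\coth\eta - 1}{\eta^2 + \pi^2/4} < \frac{\eta}{\eta^2 + \pi^2/4} < \frac{1}{\eta}, \tag{A.25}$$
and thus, substituting $s = \xi e^{\eta}$ and using $\xi e^{L} = 1$,
$$I_0 < \frac{1}{L}\int_L^{\infty} d\eta\, K_0(\xi e^{\eta}) = \frac{1}{L}\int_1^{\infty}\frac{ds}{s}\,K_0(s) = O(1/L). \tag{A.26}$$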
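Step 2. Now we define $\hat K_0$ through
$$K_0(z) = -\ln z + \ln 2 - \gamma + \hat K_0(z), \tag{A.27}$$
so that $\hat K_0(z) = O(z^2\ln z)$ for small $z$. Then
$$I_1 \equiv \int_0^{L} d\eta\,(\eta^2 + \pi^2/4)^{-1}\int_0^{\eta} dt\,\hat K_0(\xi u)\,\frac{\cosh\eta - \cosh t}{\sinh\eta} = O(1/L) \tag{A.28}$$
follows by elementary calculation if the limit $\xi \to 0$ can be taken in the integral.⁵

Step 3. So we finally have
$$i\Delta_+^{(1)}(\xi) = \frac{1}{2\pi}\,(L + \ln 2 - \gamma)\,I_2 + I_3 + O(1/L), \tag{A.29}$$
where
$$I_2 = \int_0^{L} d\eta\,\hat I_0(\eta), \tag{A.30}$$
$$I_3 = -\frac{1}{4\pi}\int_0^{L}\frac{d\eta\,\hat I_3(\eta)}{\sinh\eta\,(\eta^2 + \pi^2/4)}, \tag{A.31}$$
$$\hat I_3(\eta) = \int_0^{\eta} dt\,(\cosh\eta - \cosh t)\ln u^2. \tag{A.32}$$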
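Now, using $\eta\coth\eta - 1 = (\eta - 1) + \eta e^{-\eta}/\sinh\eta$ in (A.25) and extending the second integral to infinity,
$$I_2 = \int_0^{L}\frac{d\eta\,(\eta - 1)}{\eta^2 + \pi^2/4} + \int_0^{\infty}\frac{d\eta\,\eta\,e^{-\eta}}{\sinh\eta\,(\eta^2 + \pi^2/4)} + O(\xi^{2}) \tag{A.33}$$
$$= \frac{1}{2}\ln\left(\eta^2 + \pi^2/4\right)\Big|_0^{L} - \frac{2}{\pi}\arctan\frac{2L}{\pi} + \ln 2 + \gamma - 1 + O(\xi^{2}) \tag{A.34}$$
$$= \ln L + \ln\frac{4}{\pi} + \gamma - 2 + \frac{1}{L} + O(1/L^2). \tag{A.35}$$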
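As a numerical sanity check of (A.33)–(A.35), which is our own and not part of Lehmann's notes, the following Python sketch evaluates $I_2$ from (A.25) and (A.30) by composite Simpson quadrature and compares it with the asymptotic form (A.35). The quadrature rule, the step counts and the cutoff $\eta = 40$ standing in for infinity are arbitrary choices made for this illustration. Only the standard library is used.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant
Q = math.pi ** 2 / 4        # the pi^2/4 in (eta^2 + pi^2/4)


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; an odd n is bumped to the next even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def i0_hat(eta):
    """Integrand of (A.30): (eta*coth(eta) - 1) / (eta^2 + pi^2/4); its limit at 0 is 0."""
    if eta == 0.0:
        return 0.0
    return (eta / math.tanh(eta) - 1.0) / (eta ** 2 + Q)


def tail_term(eta):
    """Second integrand in (A.33): eta*e^(-eta) / (sinh(eta) (eta^2 + pi^2/4))."""
    eta_over_sinh = 1.0 if eta == 0.0 else eta / math.sinh(eta)
    return eta_over_sinh * math.exp(-eta) / (eta ** 2 + Q)


# Constant in (A.34); the integrand is ~e^(-80) at the cutoff eta = 40.
const = simpson(tail_term, 0.0, 40.0, 8000)
print("constant:", const, "expected ln2 + gamma - 1 =", math.log(2) + GAMMA - 1)

# I_2(L) from (A.30) versus its large-L form (A.35).
diffs = {}
for L in (10.0, 20.0, 40.0):
    i2 = simpson(i0_hat, 0.0, L, 20000)
    asym = math.log(L) + math.log(4 / math.pi) + GAMMA - 2 + 1 / L
    diffs[L] = i2 - asym
    print(f"L = {L:4.0f}: I_2 - (A.35) = {diffs[L]:.3e}")
```

The remaining difference should fall off like $\pi^2/(8L^2)$, since $\tfrac12\ln(L^2 + \pi^2/4) - \ln L = \pi^2/(8L^2) + \ldots$, while the arctan term in (A.34) only contributes at $O(1/L^3)$.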
Next we note that $\hat I_3(\eta)$ can be calculated by elementary integration, with the result
$$\hat I_3 = \hat I_3^{(1)} + \hat I_3^{(2)}, \tag{A.36}$$
with
$$\hat I_3^{(1)} = -4\sinh\eta\,\cosh^2\eta\,\ln(2\cosh\eta) + 4\eta\cosh\eta\,\sinh^2\eta \tag{A.37}$$
and
$$\hat I_3^{(2)} = 2\cosh\eta\left[\int_0^{2\eta}dx\,\ln\sinh x - 2\int_0^{\eta}dx\,\ln\sinh x\right] \tag{A.38}$$
$$= 2\cosh\eta\left[\eta^2 + 2G(\eta)\right], \tag{A.39}$$
where
$$G(\eta) = \int_0^{\eta}dx\,\ln\left(1 + e^{-2x}\right) \tag{A.40}$$
$$= \eta\ln\left(1 + e^{-2\eta}\right) + \int_0^{\eta}dx\,\frac{x\,e^{-x}}{\cosh x}. \tag{A.41}$$

⁵ Lehmann remarks in his notes that he didn't show that this was actually possible.
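The "elementary integration" behind (A.36)–(A.41) is not spelled out. The Python sketch below is our own check rather than Lehmann's. It compares the defining integral (A.32), with $u^2$ taken from (A.19), against the closed form (A.37)–(A.40) at a few arbitrarily chosen values of $\eta$, using composite Simpson quadrature from the standard library only. $G(\eta)$ is evaluated directly from its integral definition (A.40).

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; an odd n is bumped to the next even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def i3_hat_direct(eta, n=20000):
    """(A.32): integral over t in [0, eta] of (cosh eta - cosh t) * ln u^2, u^2 from (A.19)."""
    def integrand(t):
        # cosh a - cosh b = 2 sinh((a+b)/2) sinh((a-b)/2) avoids cancellation near t = eta
        small = 2 * math.sinh((eta + t) / 2) * math.sinh((eta - t) / 2)
        if small <= 0.0:
            return 0.0  # behaves like d * ln(1/d) -> 0 as d = eta - t -> 0
        big = 2 * math.sinh((3 * eta + t) / 2) * math.sinh((3 * eta - t) / 2)
        return small * math.log(big / small)
    return simpson(integrand, 0.0, eta, n)


def i3_hat_closed(eta):
    """(A.36)-(A.39), with G(eta) evaluated from its integral definition (A.40)."""
    ch, sh = math.cosh(eta), math.sinh(eta)
    part1 = -4 * sh * ch ** 2 * math.log(2 * ch) + 4 * eta * ch * sh ** 2   # (A.37)
    g = simpson(lambda x: math.log1p(math.exp(-2 * x)), 0.0, eta, 2000)    # (A.40)
    part2 = 2 * ch * (eta ** 2 + 2 * g)                                     # (A.39)
    return part1 + part2


for eta in (0.3, 1.0, 2.0):
    direct, closed = i3_hat_direct(eta), i3_hat_closed(eta)
    print(f"eta = {eta}: (A.32) = {direct:.10f}, (A.36) = {closed:.10f}")
```

The integrand of (A.32) vanishes only like $\delta\ln(1/\delta)$ at $t = \eta$ (with $\delta = \eta - t$), so Simpson's rule converges more slowly there than for a smooth integrand, and the large step count compensates for this. As a quick hand check, both sides behave like $\tfrac{8}{3}(1 - \ln 2)\,\eta^3$ for small $\eta$. The $O(\eta)$ terms of (A.37) and (A.39) cancel exactly.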
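By contour integration (shifting to a path parallel to the real axis with imaginary part $i\pi$) we get
$$I_3^{(1)} = \frac{1}{2\pi}\left(\ln L + \gamma + \ln 2 - 1\right). \tag{A.42}$$
Further
$$I_3^{(2)} = -\frac{1}{2\pi}\left(I_3^{(2,I)} + I_3^{(2,II)} + I_3^{(2,III)}\right) + O(1/L), \tag{A.43}$$
where
$$I_3^{(2,I)} = \int_0^{L}\frac{d\eta\,\eta^2}{\eta^2 + \pi^2/4} = L - \frac{\pi^2}{4}, \tag{A.44}$$
$$I_3^{(2,II)} = \int_0^{\infty}\frac{d\eta}{\eta^2 + \pi^2/4}\,\frac{\eta^2 e^{-\eta}}{\sinh\eta} = \frac{\pi^2}{4} - \ln 2 - \int_0^{\pi/2}\frac{dy}{y}\left(\frac{\pi}{2} - y\right)\tan y, \tag{A.45}$$
$$I_3^{(2,III)} = 2\int_0^{\infty}\frac{d\eta}{\eta^2 + \pi^2/4}\,\coth\eta\; G(\eta). \tag{A.46}$$
We introduce
$$H(z) = G(z + i\pi/2) = G(z - i\pi/2) \qquad (\mathrm{Re}\,z \ge 0) \tag{A.47}$$
$$= \frac{\pi^2}{8} + z\ln\left(1 - e^{-2z}\right) - \int_0^{z}dx\,\frac{x\,e^{-x}}{\sinh x}, \tag{A.48}$$
which is analytic for $|\mathrm{Im}\,z| < \pi$. Now
$$I_3^{(2,III)} = \frac{2}{\pi}\int_0^{\pi/2}\frac{dy}{y}\,\tan y\,\big[H(iy) + H(-iy)\big] \tag{A.49}$$
$$= \frac{2}{\pi}\int_0^{\pi/2}\frac{dy}{y}\,\tan y\left(\frac{\pi}{2} - y\right)^2 \tag{A.50}$$
$$= \int_0^{\pi/2}\frac{dy}{y}\left(\frac{\pi}{2} - y\right)\tan y - \ln 2, \tag{A.51}$$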
and so
$$I_3^{(2,II)} + I_3^{(2,III)} = \frac{\pi^2}{4} - 2\ln 2. \tag{A.52}$$
Putting everything together we get the final result (A.13).
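The following is a numerical cross-check of (A.42), (A.45), (A.51) and (A.52), and of how they combine into (A.12) and (A.13). It is our own sketch and not part of Lehmann's notes. Collecting terms in $2\pi\, i\Delta_+^{(1)}$ from (A.29), (A.35) and (A.42)–(A.44), and comparing with (A.11), gives
$$a_1 = \Big(\ln\frac{4}{\pi} + \gamma - 2\Big) - 1, \qquad c_1 = (\gamma + \ln 2 - 1) + \frac{\pi^2}{4} - \big(I_3^{(2,II)} + I_3^{(2,III)}\big).$$
The bracket is the constant of $I_2$ in (A.35), the $-1$ comes from $I_3^{(2,I)} = L + \ldots$, and $\gamma + \ln 2 - 1$ is the constant in $2\pi I_3^{(1)}$ from (A.42). With (A.52) these yield (A.12) and (A.13). The numerical techniques are our own choices:
- For (A.42), the large-$\eta$ asymptote $(\eta/2 + 1/4)/(\eta^2 + \pi^2/4)$ of the integrand is subtracted and integrated in closed form.
- For $G(\eta)$ with $\eta \ge 0.5$, we use the series $G = \pi^2/24 - \sum_k (-1)^{k+1}e^{-2k\eta}/(2k^2)$, obtained by expanding the logarithm in (A.40).
- The cutoffs are arbitrary.

```python
import math

GAMMA = 0.5772156649015329   # Euler's constant
LN2 = math.log(2.0)
Q = math.pi ** 2 / 4         # the pi^2/4 in (eta^2 + pi^2/4)


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; an odd n is bumped to the next even number."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def j_integrand(eta):
    """[F - eta/2 - 1/4] / (eta^2 + pi^2/4), with F = cosh(eta) [cosh(eta) ln(2 cosh eta) - eta sinh(eta)].

    pi * I_3^(1) is the integral of F / (eta^2 + pi^2/4) over [0, L]. Writing F in
    w = exp(-2 eta) makes the cancellation of the growing parts exact.
    """
    w = math.exp(-2 * eta)
    num = eta * w / 2 + (1 + w) ** 2 * math.log1p(w) / (4 * w) - 0.25
    return num / (eta ** 2 + Q)


def t_integrand(y):
    """(pi/2 - y) tan(y) / y on [0, pi/2], with the finite limits at both ends."""
    if y == 0.0:
        return math.pi / 2
    u = math.pi / 2 - y
    if u == 0.0:
        return 2 / math.pi  # (pi/2 - y) tan(y) -> 1 as y -> pi/2
    return u * math.tan(y) / y


def i2ii_integrand(eta):
    """Integrand of (A.45): eta^2 e^(-eta) / (sinh(eta) (eta^2 + pi^2/4))."""
    eta_over_sinh = 1.0 if eta == 0.0 else eta / math.sinh(eta)
    return eta * eta_over_sinh * math.exp(-eta) / (eta ** 2 + Q)


def g_func(eta):
    """G(eta) of (A.40); for eta >= 0.5 via pi^2/24 minus a fast alternating series."""
    if eta < 0.5:
        return simpson(lambda x: math.log1p(math.exp(-2 * x)), 0.0, eta, 200)
    w = math.exp(-2 * eta)
    return math.pi ** 2 / 24 - sum((-1) ** (k + 1) * w ** k / (2 * k * k) for k in range(1, 60))


def i2iii_integrand(eta):
    """Integrand of (A.46): 2 coth(eta) G(eta) / (eta^2 + pi^2/4); G/tanh -> ln 2 at 0."""
    if eta == 0.0:
        return 2 * LN2 / Q
    return 2 * g_func(eta) / math.tanh(eta) / (eta ** 2 + Q)


# (A.42): 2*pi*I_3^(1) = ln L + c_i31 + O(1/L), after integrating eta/2 + 1/4 in closed form.
J = simpson(j_integrand, 0.0, 30.0, 6000)
c_i31 = 2 * (J - 0.5 * math.log(math.pi / 2) + 0.25)

# (A.45), (A.51): T is the y-integral appearing in both right-hand sides.
T = simpson(t_integrand, 0.0, math.pi / 2, 4000)
i2ii = simpson(i2ii_integrand, 0.0, 40.0, 8000)

# (A.46): integrate to X = 20, then add the tail with G -> pi^2/24 and coth -> 1 (errors ~e^(-40)).
X = 20.0
i2iii = simpson(i2iii_integrand, 0.0, X, 4000) + (math.pi / 6) * (math.pi / 2 - math.atan(2 * X / math.pi))

# Assemble a_1 and c_1 as described above.
a1 = (math.log(4 / math.pi) + GAMMA - 2) - 1
c1 = c_i31 + Q - (i2ii + i2iii)

print("A.42:", c_i31, GAMMA + LN2 - 1)
print("A.45:", i2ii, Q - LN2 - T)
print("A.51:", i2iii, T - LN2)
print("A.52:", i2ii + i2iii, Q - 2 * LN2)
print("a1:  ", a1, math.log(4 / math.pi) + GAMMA - 3)
print("c1:  ", c1, 3 * LN2 + GAMMA - 1)
```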
References
1. Polyakov, A.M.: Phys. Lett. B 59, 79 (1975)
2. Brézin, E. and Zinn-Justin, J.: Phys. Rev. B 14, 3110 (1976); Brézin, E., Zinn-Justin, J. and Le Guillou, J.C.: Phys. Rev. D 14, 2615 (1976)
3. Biscari, P., Campostrini, M. and Rossi, P.: Phys. Lett. B 242, 225 (1990)
4. Hasenfratz, P., Maggiore, M. and Niedermayer, F.: Phys. Lett. B 245, 522 (1990); Hasenfratz, P. and Niedermayer, F.: Phys. Lett. B 245, 529 (1990)
5. Forgács, P., Niedermayer, F. and Weisz, P.: Nucl. Phys. B 367, 123 (1991); ibid. B 367, 144 (1991)
6. Fox, G., Gupta, R., Martin, O. and Otto, S.: Nucl. Phys. B 205 [FS5], 188 (1982)
7. Berg, B., Meyer, S. and Montvay, I.: Nucl. Phys. B 235 [FS11], 149 (1984)
8. Wolff, U.: Nucl. Phys. B 334, 581 (1990); Phys. Rev. Lett. 62, 361 (1989)
9. Lüscher, M.: Phys. Lett. B 118, 391 (1982)
10. Lüscher, M.: In: Progress in Gauge Field Theory, ed. G. ’t Hooft et al., New York: Plenum, 1984
11. Floratos, E. and Petcher, D.: Nucl. Phys. B 252, 689 (1985)
12. Lüscher, M., Weisz, P. and Wolff, U.: Nucl. Phys. B 359, 221 (1991)
13. Shin, D.-S.: Nucl. Phys. B 496, 408 (1997)
14. Symanzik, K.: Nucl. Phys. B 226, 187 (1983). For a review see Lüscher, M.: Improved Lattice Gauge Theories, In: Les Houches 1984, Proceedings, Critical Phenomena, Random Systems, Gauge Theories, pp. 359–374; and “Advanced Lattice QCD”, Talk given at Les Houches Summer School in Theoretical Physics 1997, hep-lat/9802029
15. Müller, V.F., Raddatz, T. and Rühl, W.: Nucl. Phys. B 231 [FS13], 212 (1985)
16. Flyvbjerg, H.: Phys. Lett. B 245, 533 (1990)
17. Iwasaki, Y.: Phys. Lett. B 104, 458 (1981); Prog. Theor. Phys. 68, 448 (1982)
18. Gliozzi, F.: Phys. Lett. B 153, 403 (1985)
19. Lüscher, M.: Nucl. Phys. B 135, 1 (1978)
20. Lüscher, M.: Addendum to ref. [19]. Unpublished notes (1986)
21. Zamolodchikov, A.B. and Zamolodchikov, Al.B.: Ann. Phys. 120, 253 (1979); Nucl. Phys. B 133, 525 (1979)
22. Polyakov, A. and Wiegmann, P.B.: Phys. Lett. B 131, 121 (1983); Wiegmann, P.B.: Phys. Lett. B 152, 209 (1985); JETP Lett. 41, 95 (1985)
23. Karowski, M. and Weisz, P.: Nucl. Phys. B 139, 455 (1978)
24. Smirnov, F.A.: Form factors in Completely Integrable Models of Quantum Field Theory. Singapore: World Scientific, 1992
25. Balog, J. and Niedermaier, M.: Nucl. Phys. B 500, 421 (1997); Phys. Rev. Lett. 78, 4151 (1997)
26. Patrascioiu, A. and Seiler, E.: Phys. Rev. Lett. 74, 1920 (1995); ibid. 1924
27. Patrascioiu, A. and Seiler, E.: hep-th/0002153
28. Lüscher, M. and Wolff, U.: Nucl. Phys. B 339, 222 (1990)

Communicated by W. Zimmermann
Commun. Math. Phys. 219, 57 – 76 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Facts and Fictions About Anti deSitter Spacetimes with Local Quantum Matter Bert Schroer Institut für Theoretische Physik, FU-Berlin, Arnimallee 14, 14195 Berlin, Germany Received: 24 March 2000 / Accepted: 27 February 2001
This work received financial support from the CNPq.
Present address: CBPF, Rua Dr. Xavier Sigaud, 22290-180 Rio de Janeiro, Brazil.

Dedicated to the memory of Harry Lehmann

Abstract: It is natural to analyse the $\mathrm{AdS}_{d+1}$–$\mathrm{CQFT}_d$ correspondence in the context of the conformal-compactification and covering formalism. In this way one obtains additional insight into Rehren's rigorous algebraic holography in connection with the degree-of-freedom issue, which in turn allows one to illustrate the subtle but important differences between the original string-theory-based Maldacena conjecture and Rehren's theorem in the setting of an intrinsic, field-coordinatization-free formulation of algebraic QFT. I also discuss another, more generic type of holography related to light fronts, which seems to be closer to ’t Hooft's original ideas on holography. This in turn is naturally connected with the generic concept of “Localization Entropy”, a quantum pre-form of Bekenstein's classical black-hole surface entropy.

1. Historical Background

There has hardly been any problem in particle physics which has attracted as much attention as the question whether, and in what way, quantum matter in Anti deSitter spacetime and the conformal field theories one dimension lower are related, and whether this could possibly contain clues about the meaning of quantum gravity. In more specific quantum physical terms the question is about a conjectured [1–3] (and meanwhile in large part generically and rigorously understood [6]) correspondence between two quantum field theories in different spacetime dimensions, the lower-dimensional conformal one being the “holographic image” or projection of the AdS theory. Conjectures, unlike mathematical proofs, of course almost always allow a certain margin in their precise mathematical formulation and in their physical interpretation. The field theoretic content of this conjecture has often been interpreted as
a correspondence between two Lagrangian field theories (e.g. between a conformally invariant 4-dimensional SYM and a higher-dimensional spin 2 gravitational-like theory). The exact theorem says that such a correspondence cannot exist; one side has to be non-Lagrangian. There is no exception to this proposition; not even the assumption of supersymmetry helps here. One of our goals is to spell this out in detail and to illustrate this interesting point with a simple model. The community of string physicists has placed this correspondence problem at the center of their interest. Remembering the great conceptual and calculational achievements, such as the derivation of scattering theory and dispersion relations from field theory, with which the name of Harry Lehmann (to whose memory this article is dedicated) is inseparably linked, I will limit myself to analyzing the particle physics content of the so-called Anti deSitter conformal QFT correspondence from the conservative point of view of a quantum field theorist who, although having no active ambitions outside QFT, still nourishes a certain curiosity about present activities in particle physics, e.g. string theory or noncommutative geometry. In the times of Harry Lehmann the acceptance of a theoretical proposal in particle physics was primarily coupled to its experimental verifiability and/or its conceptual standing within physics. The AdS model of a curved spacetime has a long history [4, 5] as a theoretical laboratory of what can happen with particle physics in a universe which is the extreme opposite of globally hyperbolic, in that it possesses a self-closing time, whereas the proper de Sitter spacetime was once considered among the more realistic models of the universe. The recent surge of interest in AdS came from string theory; it is different in motivation and more related to the hope (or dream) of attributing a meaning to “Quantum Gravity” from a string theory viewpoint.
Fortunately for a curious outsider (otherwise I would have to quit right here), this motivation has no bearing on the conceptual and mathematical problems posed by the would-be AdS-conformal QFT correspondence. The latter turned out to be one of those properties discovered in the setting of string theory which allow an interesting and rigorous formulation in QFT, one which confirms some, but not all, of the conjectured properties. The rigorous treatment, however, requires a reformulation of (conformal) QFT within a more algebraic setting. The standard formalism based on pointlike “field coordinatizations”, which underlies the Lagrangian (and the Wightman) formulations, does not provide a natural setting for the study of isomorphisms between models in different spacetime dimensions, even though the underlying physical principles are the same. One would have to introduce so many additional concepts and auxiliary tricks into the standard framework that the formulation would appear contrived, containing too many ad hoc prescriptions. The important aspects of this isomorphism are related to space- and timelike (Einstein, Huygens) causality, localization of corresponding objects and problems of degree-of-freedom counting. All these issues belong to real-time physics, and in most cases their meaning in terms of Euclidean continuation (statistical mechanics) remains obscure; but this of course does not make them less physical. This note is organized as follows. In the next section I elaborate the kinematical aspects of the $\mathrm{AdS}_{d+1}$–$\mathrm{CQFT}_d$ situation as a collateral result of the old (1974/75) compactification formalism for the “conformalization” of the $d$-dimensional Minkowski spacetime. In this way the seemingly more demanding problem of studying QFT directly in AdS within a curved spacetime formalism can be bypassed.
The natural question whose answer would have led directly from $\mathrm{CQFT}_4$ to $\mathrm{AdS}_5$ in the particle physics setting (without string theory as a midwife) is: does there exist a quantum field theory which has the same $SO(4,2)$ symmetry and just reprocesses the $\mathrm{CQFT}_4$ matter content in such a way that the “conformal Hamiltonian” (the timelike rotational generator through compactified $\bar M$) becomes the true Hamiltonian? This theory indeed exists: it is an AdS theory with a specific local matter content computable from the CQFT matter content. The answer is unique, but as a result of the different dimensionality one cannot describe this one-to-one relation between spacetime-indexed matter contents in terms of pointlike fields. This will be treated in Sect. 3, where we will also compare the content of Rehren's isomorphism [6, 8] with the conjectures of Maldacena, Witten et al. [1–3], and notice some subtle but potentially serious differences in case one interprets the conjecture (as was done in most of the subsequent literature) as a relation between two Lagrangian theories. Whoever is aware of the fact that subtle differences have often been the enigmatic motor of progress will not dismiss such observations. The last section presents some general results of AQFT on degree-of-freedom counting and holography. Closely connected is the idea of “chiral scanning”, i.e. the encoding of the full content of a higher-dimensional (massive) QFT into a finite number of copies of one chiral theory in a carefully selected relative position within a common Hilbert space. In this case the price one has to pay for this more generic holography (light-front holography) is that some of the geometrically acting spacetime symmetry transformations become “fuzzy” in the holographic projection, and some of the geometrically acting symmetries on the holographic image are not represented by diffeomorphisms if pulled back into the original QFT.
2. Conformal Compactification and AdS

The simplest type of conformal QFT is obtained by realizing a zero-mass Wigner representation of the Poincaré group with positive energy (and discrete helicity) and allowing for a natural extension to the conformal symmetry group $SO(4,2)/\mathbb{Z}_2$ without any enlargement of the Hilbert space. Besides scale transformations, this larger symmetry also incorporates the fractional transformations (proper conformal transformations)
$$x' = \frac{x - b\,x^2}{1 - 2bx + b^2x^2}. \tag{1}$$
It is often convenient to view this formula as the action of the translation group $T(b)$ conjugated with a (hyperbolic) inversion $I$,
$$I: x \to \frac{-x}{x^2}, \tag{2}$$
$$x' = I\,T(b)\,I\,x. \tag{3}$$
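The following minimal sketch, which is ours and not from the paper, checks numerically that the fractional transformation (1) coincides with the conjugated translation (3). It assumes the metric signature $(+,-,-,-)$ and generic sample points away from the singular set where $x^2 = 0$ or the denominator of (1) vanishes. The sample vectors are arbitrary.

```python
# Minkowski product with signature (+, -, -, -); vectors are 4-tuples (x0, x1, x2, x3).
def mdot(a, b):
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def inversion(x):
    """Eq. (2): I: x -> -x / x^2 (undefined on the light cone x^2 = 0)."""
    x2 = mdot(x, x)
    return tuple(-xi / x2 for xi in x)


def translate(x, b):
    return tuple(xi + bi for xi, bi in zip(x, b))


def special_conformal(x, b):
    """Eq. (1): x' = (x - b x^2) / (1 - 2 b.x + b^2 x^2)."""
    x2, bx, b2 = mdot(x, x), mdot(b, x), mdot(b, b)
    den = 1 - 2 * bx + b2 * x2
    return tuple((xi - bi * x2) / den for xi, bi in zip(x, b))


def conjugated_translation(x, b):
    """Eq. (3): x' = I T(b) I x."""
    return inversion(translate(inversion(x), b))


samples = [((0.7, 0.2, -0.4, 1.1), (0.05, -0.1, 0.02, 0.03)),   # spacelike x
           ((1.5, 0.3, 0.1, -0.2), (-0.2, 0.1, 0.3, 0.05))]    # timelike x
for x, b in samples:
    print(special_conformal(x, b))
    print(conjugated_translation(x, b))
```

The agreement follows from $(b - x/x^2)^2 = (1 - 2bx + b^2x^2)/x^2$, so the identity holds for any nondegenerate quadratic form, not only for this signature.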
$I$ does not belong to the above conformal group, although it is unitarily represented (and hence a Wigner symmetry) in these special Wigner representations. For fixed $x$ and small $b$ formula (1) is well defined, but globally it mixes finite spacetime points with infinity and hence requires a more precise definition (in particular in view of the positivity of the energy-momentum spectrum) in its action on quantum fields. Hence, as a preparatory step for the adequate formulation of quantum field theory concepts, one has to achieve a geometric compactification. This starts most conveniently from a linear representation of the conformal group $SO(d,2)$ in a $(d+2)$-dimensional auxiliary space $\mathbb{R}^{(d,2)}$ (i.e. without field theoretic significance) with two negative (time-like) signatures,
$$G = \begin{pmatrix} g_{\mu\nu} & & \\ & -1 & \\ & & +1 \end{pmatrix}, \tag{4}$$
and restricts this representation to the $(d+1)$-dimensional forward light cone
$$LC^{(d,2)} = \left\{\xi = (\xi, \xi_d, \xi_{d+1});\ \xi^2 - \xi_d^2 + \xi_{d+1}^2 = 0\right\}, \tag{5}$$
where $\xi^2 = \xi_0^2 - \vec\xi^{\,2}$ denotes the $d$-dimensional Minkowski length square. The compactified Minkowski space $\bar M_d$ is obtained by adopting a projective point of view (stereographic projection),
$$\bar M_d = \left\{x = \frac{\xi}{\xi_d + \xi_{d+1}};\ \xi \in LC^{(d,2)}\right\}. \tag{6}$$
It is then easy to verify that the linear transformations which keep the last two components invariant consist of the Lorentz group, and that those which only transform the last two coordinates yield the scaling formula
$$\xi_d \pm \xi_{d+1} \to e^{\pm s}(\xi_d \pm \xi_{d+1}), \tag{7}$$
leading to $x \to \lambda x$, $\lambda = e^s$. The remaining transformations, namely the translations and the fractional proper conformal transformations, are obtained by composing rotations in the $\xi_i$–$\xi_d$ planes and boosts in the $\xi_i$–$\xi_{d+1}$ planes. A convenient description of Minkowski spacetime $M$ in terms of this $(d+2)$-dimensional auxiliary formalism is obtained in terms of a “conformal time” $\tau$,
$$M_d = (\sin\tau, e, \cos\tau), \qquad e \in S^{d-1}, \tag{8}$$
$$t = \frac{\sin\tau}{e_d + \cos\tau}, \qquad \vec x = \frac{\vec e}{e_d + \cos\tau}, \qquad e_d + \cos\tau > 0, \quad -\pi < \tau < +\pi, \tag{9}$$
so that the Minkowski spacetime is a piece of the $d$-dimensional wall of a cylinder in $(d+1)$-dimensional spacetime, which becomes tiled with the closure of infinitely many Minkowski worlds. If one cuts the wall on the backside appropriately, this carved-out piece representing $d$-dimensional compactified Minkowski spacetime has the form of a $d$-dimensional double cone positioned symmetrically around $\tau = 0$, $e = (0, e_d = 1)$, without its boundary¹. The above directional compactification leads to an identification of boundary points at “infinity” and gives, e.g. for $d = 1+1$, the compactified manifold the topology of a torus. The points which have been added at infinity to $M$, namely $\bar M \setminus M$, are best described in terms of the $(d-1)$-dimensional submanifold of points which are lightlike with respect to the past infinity apex at $m_{-\infty} = (0, 0, 0, 0, 1, \tau = -\pi)$. The cylinder walls $\widetilde M_d = S^{d-1} \times \mathbb{R}$, which are “tiled” in both $\tau$-directions by infinitely many Minkowski spacetimes (“heavens and hells”) [12], form the universal covering. If the only interest is the description of the compactification $\bar M$, then one may as well stay with the original $x$-coordinates and, following Dirac and Weyl, write the $d+2$ $\xi$-coordinates as
$$\xi^\mu = x^\mu,\ \ \mu = 0,1,2,3, \qquad \xi^4 = \tfrac{1}{2}(1 + x^2), \qquad \xi^5 = \tfrac{1}{2}(1 - x^2), \qquad \text{i.e.}\ \ (\xi - \xi')^2 = (x - x')^2. \tag{10}$$

¹ The graphical representations are, apart from the compactification (which involves identifications between past and future points at time/light-infinity), the famous Penrose pictures of $M$.
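The following small sketch is our own illustration, for $d = 4$. It assumes the ambient metric $G = \mathrm{diag}(g_{\mu\nu}, -1, +1)$ read off from (4), with $g = \mathrm{diag}(+1,-1,-1,-1)$. It checks three properties of the Dirac–Weyl coordinates (10):
- they satisfy the light-cone condition $G(\xi,\xi) = 0$;
- $G$-differences reproduce Minkowski differences;
- the projection (6) returns $x$, independently of the overall scale of $\xi$.

The sample points are arbitrary.

```python
# Ambient metric G = diag(g, -1, +1) with g = diag(+1, -1, -1, -1), i.e. d = 4 in (4).
G = (1.0, -1.0, -1.0, -1.0, -1.0, 1.0)


def gdot(a, b):
    return sum(g * ai * bi for g, ai, bi in zip(G, a, b))


def mdot(a, b):
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def dirac_weyl(x):
    """Eq. (10): xi^mu = x^mu, xi^4 = (1 + x^2)/2, xi^5 = (1 - x^2)/2."""
    x2 = mdot(x, x)
    return tuple(x) + ((1 + x2) / 2, (1 - x2) / 2)


def project(xi):
    """Eq. (6): x = xi / (xi_4 + xi_5), keeping the first four components."""
    s = xi[4] + xi[5]
    return tuple(c / s for c in xi[:4])


x = (0.3, -1.2, 0.5, 0.8)
y = (2.0, 0.1, -0.7, 0.4)
xi, eta = dirac_weyl(x), dirac_weyl(y)
diff = tuple(a - b for a, b in zip(xi, eta))
dx = tuple(a - b for a, b in zip(x, y))

print("G(xi, xi) =", gdot(xi, xi))
print("(xi - xi')^2 =", gdot(diff, diff), "  (x - x')^2 =", mdot(dx, dx))
print("projection of 3.7*xi:", project(tuple(3.7 * c for c in xi)))
```

The cancellation behind the second check is $(\xi^4 - \xi'^4)^2 = (\xi^5 - \xi'^5)^2 = \tfrac14(x^2 - x'^2)^2$, which enter $G$ with opposite signs.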
Since $\xi$ is only defined up to a scale factor, we conclude that lightlike differences retain an objective meaning in $\bar M$, even though the space- and time-like separation does lose its meaning. Free photons are an example of a physical theory on $\bar M$. The impossibility of a distinction between space- and time-like finds its mathematical formulation in the Huygens principle, which says that the lightlike separation is the only one where the physical fields do not commute and hence where an interaction can happen. In the terminology of local quantum physics this means that the commutant of an observable algebra localized in a double cone apparently consists of a connected spacelike piece (Einstein causality) as well as two disconnected timelike pieces (Huygens causality). But taking the compactification into consideration, one realizes that all three parts are connected and that the space/time-like distinction is meaningless on $\bar M$. In terms of Wightman correlation functions this is equivalent to the rationality of the analytically continued Wightman functions of observable fields, which includes an analytic extension into timelike Jost points [37, 18]. Therefore, in order to make contact with particle physics aspects, the use of either the covering $\widetilde M$ or of more general fields (see the next section) on $\bar M$ is very important, since only in this way can one implement the pivotal property of causality together with the associated localization concepts. As first observed by I. Segal [11], and later elaborated and brought into the by now standard form in field theory by Lüscher and Mack [12], a global form of causality can be based on the sign of the invariant
$$\big(\xi(e,\tau) - \xi(e',\tau')\big)^2 \gtrless 0, \qquad\text{hence}\qquad |\tau - \tau'| \gtrless 2\,\mathrm{Arcsin}\,\tfrac{1}{2}|e - e'| = \mathrm{Arccos}\; e\cdot e', \tag{11}$$
where the $<$ inequality characterizes global spacelike distances and $>$ corresponds to positive and negative global timelike separations. Whereas the globally spacelike region of a point is compact, the timelike region is not. The concept of global causality solves the so-called Einstein causality paradox of CQFT [13]. In the next section we will meet a global decomposition method which also avoids this paradox without the necessity of using the covering space. The central theme, namely the connection with QFT on AdS, enters this section naturally if one asks whether one can use, instead of the surface of the forward light cone, a mass hyperboloid $H_{d+1}$ inside the forward light cone of the same ambient $(d+2)$-dimensional space,
$$H_{d+1} = \left\{\eta;\ \eta^2 = 1\right\}, \qquad \eta_0 = \sqrt{1 + r^2}\,\sin\tau, \qquad \eta_i = r\,e_i,\ \ i = 1, \ldots, d, \qquad \eta_{d+1} = \sqrt{1 + r^2}\,\cos\tau. \tag{12}$$
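The next sketch, also ours, illustrates two geometric facts used here. It assumes the ambient metric $G = \mathrm{diag}(g_{\mu\nu}, -1, +1)$ of (4) with $g = \mathrm{diag}(+1,-1,\ldots,-1)$.
- For the points (8), $\big(\xi(e,\tau) - \xi(e',\tau')\big)^2 = 4\sin^2\frac{\tau-\tau'}{2} - |e - e'|^2$. Its sign therefore decides global space- versus timelike separation as in (11), for $|\tau - \tau'| \le \pi$.
- The points (12) lie on the hyperboloid $\eta^2 = 1$.

The unit vectors and times are arbitrary sample values in $d = 4$.

```python
import math


def gdot(a, b):
    """G = diag(+1, -1, ..., -1, +1) on R^(d+2): xi_0 and xi_(d+1) time-like, as in (4)."""
    return a[0] * b[0] - sum(p * q for p, q in zip(a[1:-1], b[1:-1])) + a[-1] * b[-1]


def cone_point(e, tau):
    """Eq. (8): xi(e, tau) = (sin tau, e, cos tau), with e a unit vector in R^d."""
    return (math.sin(tau),) + tuple(e) + (math.cos(tau),)


def ads_point(e, tau, r):
    """Eq. (12): eta = (sqrt(1+r^2) sin tau, r e, sqrt(1+r^2) cos tau)."""
    c = math.sqrt(1 + r * r)
    return (c * math.sin(tau),) + tuple(r * ei for ei in e) + (c * math.cos(tau),)


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def is_spacelike_pair(e1, t1, e2, t2):
    """Sign criterion of (11): globally spacelike iff (xi - xi')^2 < 0."""
    d = tuple(a - b for a, b in zip(cone_point(e1, t1), cone_point(e2, t2)))
    return gdot(d, d) < 0


e1 = unit((1.0, 0.3, -0.2, 0.5))
e2 = unit((-0.4, 1.0, 0.6, 0.1))
angle = math.acos(sum(a * b for a, b in zip(e1, e2)))
chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
print("2 Arcsin(|e-e'|/2) =", 2 * math.asin(chord / 2), " Arccos(e.e') =", angle)

for dtau in (0.5 * angle, 1.5 * angle):
    print(f"|tau - tau'| = {dtau:.3f}: spacelike = {is_spacelike_pair(e1, 0.0, e2, dtau)}")

p = ads_point(e1, 0.7, 2.3)
print("eta^2 on H_(d+1):", gdot(p, p))
```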
This space, which because of its formal relation to the analogous deSitter spacetime (defined by the spacelike hyperboloid) is called “Anti deSitter” spacetime, is noncompact. It is obvious from its construction that its asymptotic part is the same as $\bar M_d$. It was conjectured by Maldacena and others [1–3] that there is also a correspondence between quantum field theories. This conjecture implies the tacit assumption (not explicitly stated in these papers) that an $\mathrm{AdS}_{d+1}$ QFT which coalesces asymptotically² with a $\mathrm{CQFT}_d$ theory has a unique extension into the AdS bulk. Since there can be no mapping between pointlike fields on spacetimes of different dimensions, the question of the origin of this unique extension is non-trivial. The conjecture came from some speculations concerning possible relations of string theory with some supersymmetric gauge theories (SYM), i.e. from ideas far removed from the present particle physics setting, which therefore will not be explained here. In the 70s, at the time of the conformal compactifications, free fields on $\mathrm{AdS}_4$ were studied from a particle physics viewpoint by Fronsdal [4]. The correspondence to $\mathrm{CQFT}_3$ was overlooked, probably because, despite the obvious group theoretical connection through the common $SO(3,2)$, the multiplicities of the discrete AdS free Hamiltonian turned out too big for matching those of the rotational conformal Hamiltonian, a fact which will find its explanation in the next section. Although the two spacetimes cannot be mapped into each other, their shared spacetime symmetry group $SO(4,2)$ suggests that there is at least a correspondence between certain subsets, which may be obtained by projecting down wedge regions from the ambient space onto the two spacetime manifolds. Wedges have a natural relation to $SO(4,2)$; they may all be generated from the standard wedge in the ambient auxiliary space, $W_{st} = \{\xi;\ \xi^1 > |\xi^0|\}$.
The fixed point group of this transitive action on wedges consists of a boost and transversal translations and rotations³. The projected wedges $pW$ on AdS are by definition again wedges in AdS/CQFT, and the $SO(d,2)$ symmetry group has the same transitive action, i.e. the system of wedges is described by $SO(d,2)$ modulo the fixed point subgroup. This geometric situation clearly suggests that one should consider algebras associated with these wedges instead of looking for a relation between pointlike fields. On the conformal side this includes all double cone algebras of arbitrarily small size, since the noncompact wedge regions are conformally equivalent to compact double cone regions. The logic of algebraic QFT requires one to continue this algebraic correspondence to all intersections obtained from wedges. In this way one expects to arrive at an isomorphism which carries the full content of both theories and which includes the asymptotic relation (on the conformal surface of the aforementioned cylinder) in terms of the field coordinatizations used by Maldacena et al. In order to obtain a rigorous proof, one must check some consistency conditions in the conversion of maps between spacetime regions and algebras indexed by those regions. This was achieved by Rehren [6], and his theorem will be briefly commented on (including its relation to the original conjecture) in the next section. According to our previous remarks, interacting conformal local fields live on the covering space $\widetilde M$. Fortunately the geometric isomorphism between wedge regions can be lifted to an $\widetilde{\mathrm{AdS}}_{d+1}$–$\widetilde M$ correspondence. The conformal decomposition theory of the next section avoids the use of the rather complicated coverings by using an operator analog of fibre bundles on $\bar M$.

² Using the previous cylindric representation of the conformal covering, the covering of AdS corresponds to the full cylinder, of which the lateral surface (mantle) is the conformal covering.
³ If one adds the two longitudinal lightlike translations, which in one direction cause a compression into the wedge, one obtains an 8-dimensional Galilei group [21].

3. The Conformal Hamiltonian as the True Hamiltonian

There is another, less geometric but more particle-physics type of argument which leads to the AdS-CQFT correspondence. For this one should recall that in $SO(d,2)$ there are, besides the usual translations with infinity as a fixed point, also “conformal translations” which act without fixed points on the compactified $\bar M$ as some kind of “timelike rotations”. They are the analogs of the light-like chiral rotation $R^{(\pm)}$ ($L_0$ in standard Virasoro algebra notation), and their connection with the light-ray translation $P^{(\pm)}$, with which they share the positivity of their spectrum, is
$$R^{(\pm)} = P^{(\pm)} + K^{(\pm)}, \qquad K^{(\pm)} = I^{(\pm)}P^{(\pm)}I^{(\pm)}, \tag{13}$$
where $I^{(\pm)}$ is the representer of the chiral conformal reflection $x \to -\frac{1}{x}$ (in linear light-ray coordinates $x$) and $K$ is the generator of the fractional special conformal transformation (1). For free zero-mass fields the discrete $R$-spectrum can be understood in terms of that of a Hamiltonian for a massless model in a spatial box. This is, however, not possible for the $R$-spectrum of chiral theories with anomalous scale dimensions (the $R$-spectrum is known to be identical to that of the scale dimensions). In that case the only theory for which the spectrum is that of its Hamiltonian is the QFT on $\mathrm{AdS}_2$. So if one wants to read the $SL(2,\mathbb{Z})$ modular characters of chiral conformal field theory in the spirit of a Hamiltonian Gibbs formula, one should use the AdS side. An analogous statement holds in higher dimensions, where the $\bar M$ rotation is described in terms of a Lorentz vector $R_\mu$, $R_\mu = P_\mu + I P_\mu I$, where the inversion $I$ was defined at the beginning of the previous section. It leads to a family of operators $e\cdot R$ with discrete spectrum, which depend on a timelike vector $e_\mu$. Again the operator $R_0$ is the true Hamiltonian of only one theory with the same symmetry group and the same system of algebras (but with a different spacetime indexing): the associated $(d+1)$-dimensional AdS theory. Now it is time to quote (adapted to our purpose) Rehren's theorem and comment on it.

Theorem 1. The geometric bijection between projected wedges $pW$ on $\mathrm{AdS}_{d+1}$ and the conformal double cones in $\bar M_d$ which constitute the asymptotic infinity of $pW$ (as described in the previous section) extends to an isomorphism of the corresponding algebras. Both theories share the same Hilbert space and the same family of operator algebras, but their spacetime organization, and with it their physical interpretation, changes.
For the proof we refer to Rehren [6]. Some comments are in order. There are no additional restrictive assumptions (supersymmetry, vanishing β-functions) on either side. If the algebras of the AdS theory are generated by pointlike fields, then the associated conformal algebra cannot be generated by a field which has an energy-momentum tensor or obeys a causal equation of motion. This is one of Rehren's conclusions, and it is very instructive to illustrate it with an example. Consider a free scalar AdS field [7]. A simple calculation, which will not be repeated here, reveals that it corresponds to a conformal generalized free field with a homogeneous Källén–Lehmann spectral function. Generalized free fields have always been physically suspect, and if the spectral functions increase in the manner the homogeneous degree demands in this case, one can prove that primitive causality [17] is violated, since the algebra on a piece of time-slice (represented as a chain of small double cones which approximate the compact slice from the inside) is not equal to its causal completion (causal shadow) algebra. As one moves up in time inside the causal shadow from the time-slice, more and more degrees of freedom coming from the inner parts of the bulk enter the causal shadow which were not in the time-slice. Rehren's graphical representation [33] of the CQFT world on the wall of a full AdS cylinder makes this undesired sidewise propagation geometrically visible. This free field situation is generic in the sense that pointlike AdS fields always carry too many degrees of freedom, which lead to a violation of causal propagation in the aforementioned sense⁴. Such theories have to be abandoned for general physical reasons (not just because they do not fit into a Lagrangian picture, which automatically implies causal propagation).
Therefore the nice idea [34] of circumventing the scarcity of Lagrangian conformal models (the β-function restrictions) by starting instead with AdS Lagrangians does not work; the resulting conformal theories all share the above defect. In passing we mention that the brane idea shares the same causality conflict with pointlike fields. Whereas from a mathematical viewpoint a manifold of interest may in certain cases be considered a brane of a larger-dimensional space, the assignment of a physical reality to the ambient spacetime generates causality problems of the above kind for restrictions to the brane in the case of pointlike field theories in the larger ambient spacetime. Only if the ambient degrees of freedom are carefully tuned to the brane can such causality violations be avoided. Note that it is always the causal shadow property which may get lost in such constructions, and not Einstein causality. This is not visible if one restricts one's attention to (semi)classical solutions concentrated on a brane and/or to euclidean formulations. Whereas the principles of AQFT confirm in a very precise way that there exists an isomorphism, it is very interesting that there is a clash with certain concepts which have been used in string theory for the last two decades. This clash extends beyond the above remarks on the AdS-CQFT correspondence and the brane concept, and casts doubt on the consistency of such quasiclassical pictures as the Kaluza–Klein dimensional reduction. As a matter of fact, the quasiclassical Kaluza–Klein reduction idea has not been shown to be consistent with causal QFT. For this one would have to demonstrate that the idea works on the vacuum expectation values of the ambient QFT, and not just on the objects involved in the formal quantization approach which is used in the tentative construction of a QFT.
4 Contrary to a widespread belief, the number of degrees of freedom of causally propagating AdS theories is always larger than that of causally propagating conformal theories so that the isomorphism cannot be one among causally propagating theories. If the AdS theory is pointlike and causally propagating, the associated conformal theory has no causal propagation.
As far as the strict conceptual requirements of causality and Haag duality in AQFT are concerned, the K-K mechanism, to the extent that it is not just a mathematical trick (but an asymptotic property of a genuine inclusion of two local quantum physics worlds), has at best remained an enigmatic speculative idea (and at worst a tautology caused by not doing what one actually claims to do). The above degree-of-freedom discussion creates the suspicion that “good” causal conformal theories may have too few degrees of freedom to yield AdS pointlike fields, as the other side of the coin of the above observation that pointlike AdS fields create causally bad conformal theories. This is indeed the case and can be seen by starting from the Wigner zero-mass representation space of the Poincaré group, $H_{Wig} = \{\psi(p) \mid \int |\psi(p)|^2\, d^{d-1}p \ldots\}$
(18)
where the R-matrices are determined from admissible braid group representations. For more on the timelike braid group structure in higher-dimensional conformal QFT we refer to [18]. Since the AdS-CQFT isomorphism implies a radical reprocessing on the physical side, it would be interesting to perform the timelike commutation relation analysis directly within the AdS setting. This has not been done yet.

4. Generalized Holography in Local Quantum Physics

The message we can learn from the AdS-conformal correspondence is two-fold. On the one hand there is the recognition that there are situations where it is necessary to avoid the use of “field coordinates” in favor of working directly with local algebras. In most concrete situations there were always convenient field coordinatizations available, in terms of which the calculations simplified. The AdS-conformal correspondence, however, is a new type of problem, for which the best way is to stay intrinsic, i.e. to use the net of algebras. The second message is that there may exist a holographic relation between QFTs and their lower-dimensional boundaries. We have argued that the degrees of freedom of $\mathrm{AdS}_{d+1}$ are the same as in the corresponding $\mathrm{CQFT}_d$ on the boundary, even though the Hamiltonians and the associated thermal aspects are different⁵. This is the only known case of a bijection of nets of algebras associated with spacetimes of different dimensions but with the same maximal spacetime diffeomorphism group. Another, more frequent kind of holography⁶ occurs for spacetimes with a causal horizon. In that case certain spacetime diffeomorphisms of the original spacetime act in a “fuzzy”, nongeometric manner, thus accounting for the fact that the diffeomorphism group of the horizon does not admit all original diffeomorphisms. Let us consider a simple example: the holographic image of a two-dimensional massive theory in the vacuum representation restricted to the standard wedge, i.e. a Rindler–Unruh situation.
We want to restrict the d = 1 + 1 wedge algebra $\mathcal{A}(W)$ to its upper half-line horizon $\mathbb{R}_+$. In a massive theory we expect that both operator algebras are globally identical,

$$\mathcal{A}(W) = \mathcal{A}(\mathbb{R}_+), \qquad (19)$$
although their local net structure is quite different. Classically this corresponds to the fact that characteristic data on either of the two horizons uniquely determine the function in the wedge⁷. It is very important to control the data on the entire upper horizon $\mathbb{R}_+$: in contradistinction to a spacelike interval, compact intervals on $\mathbb{R}_+$ do not cast two-dimensional causal shadows. The physical reason is, of course, that each point in a small neighborhood below such an interval lies in the backward influence cone of some points on $\mathbb{R}_+$ which are far removed to the right, outside that interval. Only if we take all of $\mathbb{R}_+$ do we obtain W as its two-dimensional causal shadow.

5 Contrary to a widespread belief, the number of degrees of freedom of causally propagating pointlike $\mathrm{AdS}_{d+1}$ theories is always larger than that of a causally propagating conformal theory $\mathrm{CQFT}_d$. Hence the isomorphism cannot be one among causally propagating pointlike theories: if the AdS theory is pointlike and causally propagating, the associated conformal theory has no causal propagation and has to be discarded as unphysical.
6 Since our approach tries to relate the holographic aspects via modular localization ideas to the old principles of particle physics, we do not have to invoke a new “holographic principle”.
7 This is true in any dimension. The only exception is d = 1 + 1, mass = 0, in which case both horizons are needed to specify the two chiral components of conformal theories.
B. Schroer
In the general approach to QFT, the von Neumann algebra of a compact spacetime region is identical to the algebra of its causally completed region. This follows from the causal shadow property of AQFT, which is a local version of the time-slice property mentioned in the previous section [17]. Every field theory with causal propagation (in particular Lagrangian field theory) fulfills this requirement. If one takes a sequence of spacelike intervals which approximate a lightlike interval, the causal shadow region becomes gradually smaller and, in the limit, approaches an interval on the light ray. The only way to counteract this shrinking is to extend the spacelike interval gradually to the right, so that the growing lower part of the causal shadow becomes the full wedge in the limit. This intuitive idea, which suggests the correctness of (19), can be checked against other rigorous results. One such result comes from Wigner representation theory (and is therefore limited to free field theories) together with the application of the Weyl or CAR (for half-integer spin) functor. It states that the cyclicity spaces for an interval I on $\mathbb{R}_+$ agree with the total space [19]:

$$\overline{\mathcal{A}(I)\Omega} = \overline{\mathcal{A}(\mathbb{R}_+)\Omega} = \overline{\mathcal{A}(W)\Omega} = \mathcal{H}, \qquad (20)$$
i.e. the validity of the Reeh–Schlieder theorem for the light-ray subalgebra. In fact this holds for all positive-energy representations including zero mass, except zero mass in d = 1 + 1, in which case the decomposition into two chiral factors prevents its validity. Therefore one only needs to prove the spatial statement $\overline{\mathcal{A}(\mathbb{R}_+)\Omega} = \overline{\mathcal{A}(W)\Omega} = \mathcal{H}$ in order to derive (19). This spatial completeness follows from the causal shadow property for spacelike half-lines L starting at the origin, since $\overline{\mathcal{A}(L)\Omega} = \mathcal{H}$, and this completeness property cannot get lost in the light-ray limit $L \to \mathbb{R}_+$. The step from spaces to (19) is done with the help of Takesaki's theorem (mentioned later). We still have to define rigorously the holographic algebra $\mathcal{A}(\mathbb{R}_+)$ (which turns out to be chiral conformal) and its net structure $\mathcal{A}(I)$ from $\mathcal{A}(W)$. This is done by the modular inclusion technique, which is one of AQFT's most recent mathematical achievements [20, 19]. The modular way of associating a chiral conformal theory with, e.g., a d = 1 + 1 massive theory is the following. Start from the right wedge algebra $\mathcal{A}(W)$ with apex at the origin. Let an upper lightlike translation $a_+$ (which fulfills energy positivity!) act on $\mathcal{A}(W)$ and produce an inclusion (all the algebras are von Neumann algebras)

$$\mathcal{A}(W_{a_+}) \subset \mathcal{A}(W). \qquad (21)$$
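This inclusion is half-sided “modular”, i.e. the modular group⁸ $\Delta^{it}$ of $(\mathcal{A}(W), \Omega)$, which is the Lorentz boost, acts on $\mathcal{A}(W_{a_+})$ for t < 0 as a “compression”:

$$\mathrm{Ad}\,\Delta^{it}\,\mathcal{A}(W_{a_+}) \subset \mathcal{A}(W_{a_+}), \qquad t < 0. \qquad (22)$$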
The assumed nontriviality of the net, i.e. of the intersections⁹ of wedge algebras, entails that the relative commutant (primes on algebras denote their commutant in $B(\mathcal{H})$)

$$\mathcal{A}(W_{a_+})' \cap \mathcal{A}(W) \qquad (23)$$
8 For presentations of the Tomita–Takesaki modular theory which are close to the present concepts and notations see [10]. A more extensive presentation which pays due attention to the importance of modular theory for the new conceptual setting of QFT is that by Borchers [36]. 9 The nontriviality of the intersections is in some sense the algebraic counterpart of the renormalizable short distance behaviour in a quantization approach which is believed to be required by the mathematical existence of the Lagrangian theory.
is also nontrivial. Such inclusions are called “standard”. It is known that standard modular inclusions correspond to chiral conformal theories, i.e. the classification problem for the latter is identical to the classification of all standard modular inclusions [35]. In the case at hand the emergence of the chiral theory is intuitively clear. The only “living space” in agreement with Einstein causality (within the closure of W and spacelike with respect to the open $W_{a_+}$) which one can attribute to the relative commutant is the light-ray interval of length $a_+$ starting at the origin. In the abstract modular inclusion setting, the Hilbert space which the relative commutant generates from the vacuum could be a subspace $\mathcal{H}_+ = P\mathcal{H} \subset \mathcal{H}$ of the original one. The already mentioned causal shadow property, however, ensures that $\mathcal{H}_+ = \mathcal{H}$, i.e. P = 1. With the help of the L-boost (the modular group $\Delta^{it}$ of $(\mathcal{A}(W), \Omega)$) one then defines a net on the half-line $\mathbb{R}_+$ and a global algebra $\mathcal{A}(\mathbb{R}_+) = \mathrm{alg}\bigcup_t$ […]

$$g_{R,\delta}(\mathbf{x}) = \begin{cases} 1, & |\mathbf{x}| < R, \\ 0, & |\mathbf{x}| > R + \delta, \end{cases} \qquad f(x_0) \ge 0, \qquad \int f(x_0)\,dx_0 = 1, \qquad (25)$$
one finds that the squared norm of the partial charge applied to the vacuum, $\|Q_R\Omega\|^2 = \langle Q_R Q_R \rangle$, diverges as δ → 0. For fixed δ it increases in the limit R → ∞ as $R^{d-2}$, where d is the spacetime dimension [24]. But what, if anything, could this area law tell us about the would-be localization entropy? We first have to understand the algebraic analogue of the surface vacuum fluctuations of the partial charges. This turns out to be the split property, i.e. the necessity to work with fuzzy spacetime boxes in the form of double cones with a “collar” region of thickness δ. The collar separates the inside of the smaller box of radius R from the outside of the bigger one of radius R + δ. In this split situation we do recover the quantum mechanical inside/outside tensor factorization, which refers to a fuzzy box algebra N that extends beyond the smaller box into the collar without sharp geometric boundaries [16]. This sets the stage for defining the von Neumann entropy, which needs the type I tensor factorization of boxes, as in QM. There remains, however, another important difference from Schrödinger quantum mechanics: the vacuum state remains entangled. It does not factorize into an inside and an outside part but remains a highly correlated state with the Hawking–Unruh temperature. This has paradigmatic consequences for the conceptual framework of the measurement process in local quantum physics [25]. It is also the origin of the localization entropy we have been looking for. One can show that the vacuum state restricted to the fuzzy QFT box leads to a nontrivial entropy which diverges as δ → 0 and increases with the size R of the box. This agrees with the above analogy, which intuitively pictures the box entropy of the vacuum as related to a partial “Hamiltonian charge” via a Gibbs formula in the above sense.
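The $R^{d-2}$ growth quoted above is the growth of a boundary area. The following sketch is a purely geometric heuristic and does not compute $\langle Q_R Q_R \rangle$. It takes the volume of the collar shell $R < |\mathbf{x}| < R + \delta$ in d − 1 space dimensions and extracts an effective exponent in R; for δ ≪ R the exponent tends to d − 2. The function names are illustrative only. The sketch says nothing about the divergence as δ → 0, which geometry alone cannot capture.

```python
import math

# Heuristic sketch (geometry only, not a computation of <Q_R Q_R>):
# the collar {R < |x| < R + delta} in d - 1 space dimensions has a volume
# that, for delta << R, grows like R**(d - 2), i.e. like a surface area.

def collar_volume(R, delta, d):
    """Volume of a spherical shell with radii R and R + delta in d - 1 dimensions."""
    n = d - 1                                                # number of space dimensions
    unit_ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # volume of the unit n-ball
    return unit_ball * ((R + delta) ** n - R ** n)

def scaling_exponent(R, delta, d):
    """Effective exponent log2(V(2R) / V(R)); tends to d - 2 when delta << R."""
    return math.log2(collar_volume(2 * R, delta, d) / collar_volume(R, delta, d))

for d in (3, 4, 5):
    # Prints values close to d - 2 for these (illustrative) parameters.
    print(d, scaling_exponent(R=100.0, delta=0.01, d=d))
```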
As in that case, one would also expect an entropic area law to hold, at least for large ratios of the diameter to the collar size. One would further expect the matter dependence to show up, if not in the coefficient of the area law itself, then at least in its correction terms. The “Hamiltonian charge” which
we intuitively relate to a Gibbs formula is not expected to be associated with a geometrical symmetry. It is rather expected to be associated with one of the infinitely many modular-generated fuzzy/hidden symmetries which any QFT possesses. In particular, we find the use of the conformal rotational Hamiltonian which appeared in the recent literature [26] physically ad hoc and too restrictive, especially since Bekenstein's area law does not require conformal invariance. In very special conformal situations its spectrum may happen to be similar to that of the logarithm of the modular operator of the splitting box algebra with respect to the vacuum, and the resulting entropy may comply with the Bekenstein area law. Even so, such an enigmatic observation will be helpful only if it leads to a general physical concept. In our view, the Minkowski analog (the Unruh problem) of black hole thermodynamics/statistical mechanics is the understanding of thermal aspects resulting from (modular) localization, rather than the application of the heat-bath Gibbs formalism. Various intuitively equivalent forms of localization entropy related to the split inclusion situation were introduced via the concept of relative entropy for a pair of states in the work of H. Narnhofer [27]. For the purpose of extracting a possible area law, the most manageable version, which refers directly to the states, seems to be the relative entropy of the vacuum relative to the “split vacuum”. This is taken on the restricted tensor product algebra $\mathcal{A} \otimes \mathcal{B}$, $\mathcal{A} \subset \mathcal{N}$, $\mathcal{B} \subset \mathcal{N}'$, where $\mathcal{A}$ is the smaller double cone algebra and $\mathcal{B}$ the commutant of the bigger one. For such a relative entropy of a von Neumann algebra M between two states $\omega_i$, i = 1, 2, there exists a nice variational formula in terms of states only [28]:

$$S(\omega_1|\omega_2)_M = -\big(\omega_1, \log\Delta_{\omega_1,\omega_2}\,\omega_1\big) = \sup \int_0^\infty \Big[\frac{\omega_1(\mathbf{1})}{1+t} - \omega_1\big(y^*(t)y(t)\big) - \frac{1}{t}\,\omega_2\big(x^*(t)x(t)\big)\Big]\frac{dt}{t}, \qquad x(t) = \mathbf{1} - y(t), \quad x(t) \in M. \qquad (26)$$
Here $\Delta_{\omega_1,\omega_2}$ is the relative modular operator. For the case at hand we have to identify $\omega_1 = \Omega$, $\omega_2 = \Omega \otimes \Omega$ (the split vacuum) and $M = \mathcal{A} \otimes \mathcal{B}$. Using some previous nuclearity estimates of Buchholz and Wichmann [16], Narnhofer carried out a rough estimate of this entropy and found that it grows more slowly than the volume of the relativistic box of size R [27]. In the present setting her result may be interpreted as a first indication in favor of a Bekenstein area law for localized quantum matter. To obtain more structural insight into this fundamental and universal phenomenon, I started to investigate this problem in the mathematically better controllable situation of two double cones separated by a collar in conformal theories [10]. By conformal invariance, the large-R behavior becomes coupled to the short-distance behavior in the limit of vanishing collar size δ → 0. One expects this ultraviolet behavior to be easier to grasp conceptually, since it reflects truly intrinsic properties of the local algebras and not short-distance divergences of particular field coordinates. Besides, from an analytic viewpoint, conformal theories are the simplest theories after free fields. There are as yet no sufficiently concrete results worth reporting here. To avoid misunderstanding: we do not claim that the area law for black holes is a simple consequence of the area law for localized quantum matter. It would be a pity if it were, because then black hole physics would reveal little about the still extremely speculative issue of quantum gravity. Rather, I believe that it is the seemingly
very nontrivial conversion of the localization entropy of local quantum physics into the more geometric Killing horizon entropy of black holes¹⁰ which will be the crucial step.

5. Epilogue

The present analysis of the AdS-CQFT correspondence has its roots in the LSZ setting of particle physics, from which the conformally invariant QFT should result in the zero-mass limit [30]. The step from the traditional use of pointlike (Lagrangian) fields to operator algebras indexed by spacetime regions was taken a long time ago. Its intention was to obtain a more profound understanding of an observed insensitivity of the S-matrix, obtained as the asymptotic limit in the Lehmann–Symanzik–Zimmermann formalism: it does not change under changes of field coordinates (“interpolating fields”) within the (Borchers) equivalence class. This led to a more intrinsic formulation of QFT called algebraic QFT (AQFT). AQFT relegates fields to the role of a coordinatization of local algebras by the selection of particular generators. If one wants to avoid field coordinatizations altogether, as was needed in Rehren's proof of the AdS-CQFT isomorphism, it is appropriate to avoid the word “field” and to talk about Local Quantum Physics [16]. The step from differential geometry with coordinates to the modern intrinsic coordinate-free formulation did not change the geometrical content. In the same way, passing from QFT to LQP does not change the physical principles, but only some concepts for their implementation. Certain problems always seemed to threaten the existence of Lagrangian QFT throughout its long journey through renormalization theory, e.g. the abominable short-distance problem in the pointlike formulation (∼coordinate singularities? of which field coordinate?)¹¹. In the new formulation these problems become de-emphasized in favor of apparently different aspects (ultraviolet divergences → nontriviality of certain intersections of algebras). This reprocessing of concepts therefore represents a very healthy change. The conjecture about the AdS-CQFT correspondence comes from string theory. String theory has been the dominant way of thinking in particle physics publications for at least two decades. Nevertheless, its main achievement seems to be that (with some training and coaching) it allows theoretical physicists to make contributions to mathematics. Its historical origin in the dual S-matrix model of Veneziano was very close to the framework of LSZ scattering theory. In fact it started as a proposal for a nonperturbative crossing-symmetric S-matrix which fulfilled a very strong form of crossing (not suggested by QFT) called duality, i.e. saturation of crossing on the level of reggeized one-particle states. This forced the S-matrix to live in a high-dimensional spacetime of at least 10 dimensions (by invoking another invented structure: supersymmetry). The next step in the LSZ logic would have been to seek an understanding of the high-dimensional QFT (i.e. the unique equivalence class of fields or the local net of algebras) which has this S-matrix as a bona fide physical S-matrix, i.e. as a large-time LSZ limit. Unfortunately this never happened. Instead, the off-shell transition was performed at a completely different, purely technical place. It was based on the auxiliary observation that the particle towers which appeared in the lowest order (or lowest genus of Riemann surfaces, which is the analogue of Schwinger's auxiliary eigentime in QFT)

10 The role of the double cone restricted vacuum in the black hole situation is played by the Hartle–Hawking state restricted to the outside of the black hole [29].
11 There are also intrinsic ultraviolet aspects of the local algebras. For example, one may use the “split property” to define the vacuum entropy of a local algebra with a “collar” for controlling the vacuum fluctuations near the causal horizon of the localization region. The entropy then diverges with shrinking collar size in a way which is characteristic of the model but not of one of its field coordinates [10].
can be reproduced by the mass spectrum of a string. In the original strong-interaction interpretation of the model, this tower was thought to lead to resonances (poles in the second Riemann sheet). These resonances resulted from higher-order interactions destabilizing the higher-lying particles in the tower. This step occurred even before string theory was decreed to be a quantum theory of gravity. It is responsible for the loss of the relation to causality and localization, the cornerstones of QFT, and that loss has never been recovered since. In earlier times quantum field theorists thought (without success) about nonlocal alternatives in the form of an elementary length or a cutoff. Recent developments in algebraic QFT, by contrast, have made it abundantly clear that Einstein causality and its strengthened form, Haag duality, are inexorably linked with the mathematics of the Tomita–Takesaki modular theory. This is an extremely deep theory. It can convert abstract domain properties of operators and subspaces, obtained by applying algebras of local quantum physics to distinguished state vectors, into concrete spacetime localization geometry, without the need to impose any additional structure from the mathematics of noncommutative geometry. All structural insight obtained up to now depends on the causality aspects of QFT. This includes the charge superselection structure, TCP, braid group statistics¹², the V. Jones inclusion theory as well as the new modular inclusion theory mentioned in this paper, the universal nature of holography and the concept of localization entropy. So the reasons for giving these aspects up must be very strong (theoretical or experimental) and amount to much more than aesthetic observations of differential-geometric consistency. The biggest difference from the more scholarly and less marketing-driven Zeitgeist of previous times becomes visible if one looks at the terminology. Whereas e.g.
the quasiclassical Bohr–Sommerfeld theory was presented in a way that left no doubt about its transitory character, and the step towards quantum mechanics was the demystification of the quasiclassical antinomies and loose ends, string theorists often praise their product as a theory of everything. They invite their fellow physicists to read the big Latin letter M as “mystery”, in a science whose main aim used to be demystification. I enjoyed the good fortune of proximity to Harry Lehmann, to whose memory I have dedicated this paper, and the present crisis often reminds me of the good and healthy times in particle physics when he made his lasting contributions to the field. It would seem to me that, in the present absence of profound experimental discoveries, it would be more reasonable and safer to develop local quantum physics according to its very strong intrinsic logic and the guidance of its underlying physical principles. This seems preferable to taking off into the blue yonder under the maxim “everything goes”. But apart from a few exceptions, there is a lamentable dominance of ideas which, despite their long age, have not contributed anything tangible to particle physics. This dominance seriously threatens the chance of our most gifted and original young minds to contribute to the progress of particle physics, and it may even wipe out the very successful scholarly traditions in the exact sciences altogether. The late Harry Lehmann certainly realized this danger. He reacted to it with his characteristic mocking irony, which his friends and collaborators will not easily forget and which, besides his scientific achievements, probably explains the sympathy and support Pauli extended to him.
There are indications that members of the older generation are slowly becoming aware of the potential danger [31, 32]; until now they have kept silent in the face of the mathematical brilliance and exclusiveness behind some of the present dominant fashions in particle physics.

12 Including the appearance of temporal plektonic structures in higher-dimensional conformal field theories mentioned at the end of the third section.
Note added. Although the interest of the majority has recently shifted away from the field-theoretic AdS-CQFT problem, we find that it serves as an ideal illustration of how the powerful concepts of AQFT can solve a problem which otherwise (despite a very large number of papers) would have remained unsolved.

Acknowledgement. I am indebted to Gerhard Mack for valuable suggestions and encouragement.
References

1. Maldacena, J.: Adv. Theor. Math. Phys. 2, 231 (1998)
2. Gubser, S.S., Klebanov, I.R. and Polyakov, A.M.: Phys. Lett. B 428, 105 (1998)
3. Witten, E.: Adv. Theor. Math. Phys. 2, 253 (1998)
4. Fronsdal, C.: Phys. Rev. D 10, 589 (1974)
5. Avis, S.J., Isham, C.J. and Storey, D.: Phys. Rev. D 18, 3565 (1978)
6. Rehren, K.-H.: Ann. Henri Poincaré 1, 607 (2000)
7. Bertola, M., Bros, J., Moschella, U. and Schaeffer, R.: AdS/CFT correspondence for n-point functions. hep-th/9908140
8. Buchholz, D., Florig, M. and Summers, S.J.: Hawking–Unruh temperature and Einstein Causality in anti-de Sitter space-time. hep-th/9905178
9. Schroer, B.: Ann. Phys. (N.Y.) 275, 190 (1999), and references therein; Schroer, B.: Local Quantum Theory beyond Quantization. In: Quantum Theory and Symmetries, ed. H.-D. Doebner, V.K. Dobrev, J.-D. Henning and W. Luecke, Singapore: World Scientific, 2000, hep-th/9912008
10. Schroer, B.: J. Math. Phys. 41, 3801 (2000)
11. Segal, I.E.: Causality and Symmetry in Cosmology and the Conformal Group. Montreal 1976, Proceedings, Group Theoretical Methods in Physics, New York 1977, 433, and references therein to earlier work of the same author
12. Luescher, M. and Mack, G.: Commun. Math. Phys. 41, 203 (1975)
13. Hortacsu, M., Schroer, B. and Seiler, R.: Phys. Rev. D 5, 2519 (1972)
14. Schroer, B. and Swieca, J.A.: Phys. Rev. D 10, 480 (1974); Schroer, B., Swieca, J.A. and Voelkel, A.H.: Phys. Rev. D 11, 11 (1975)
15. Belavin, A.A., Polyakov, A.M. and Zamolodchikov, A.B.: Nucl. Phys. B 247, 83 (1984)
16. Haag, R.: Local Quantum Physics. Berlin: Springer-Verlag, 1992
17. Haag, R. and Schroer, B.: J. Math. Phys. 3, 248 (1962)
18. Schroer, B.: Anomalous Scale Dimensions from Timelike Braiding. hep-th/0005134; Schroer, B.: Space- and Time-Like Superselection Rules in Conformal Quantum Field Theory. hep-th/0010290
19. Guido, D., Longo, R., Roberts, J.E. and Verch, R.: Charged Sectors, Spin and Statistics in Quantum Field Theory on Curved Spacetimes. math-ph/9906019
20. Schroer, B. and Wiesbrock, H.-W.: Rev. Math. Phys. 12, No. 2, 301–326 (2000)
21. Schroer, B. and Wiesbrock, H.-W.: Rev. Math. Phys. 12, No. 1, 139 (2000); Schroer, B. and Wiesbrock, H.-W.: Looking beyond the Thermal Horizon: Hidden Symmetries in Chiral Models. To appear in Rev. Math. Phys. 12, No. 3 (2000)
22. Susskind, L.: J. Math. Phys. 36, 6377 (1995)
23. ’t Hooft, G.: Dimensional reduction in quantum gravity. In: Salam-Festschrift, A. Ali et al., eds., Singapore: World Scientific, 1993, p. 284
24. Buchholz, D., Doplicher, S., Longo, R. and Roberts, J.E.: Rev. Math. Phys. Special Issue, 49 (1992)
25. Clifton, R. and Halvorson, H.: Entanglement and Open Systems in Algebraic Quantum Field Theory. University of Pittsburgh, preprint, Jan. 2000
26. Verlinde, E.: On the Holographic Principle in a Radiation Dominated Universe. hep-th/0008140; see also critical remarks in Wang, B., Abdalla, E. and Su, R.-K.: Relating Friedmann equations to Cardy formula in universes with cosmological constant. hep-th/0101073
27. Narnhofer, H.: In: The State of Matter, ed. by M. Aizenman and H. Araki, Singapore: World Scientific, 1994
28. Kosaki, H.: J. Operator Theory 16, 335 (1986)
29. Wald, R.M.: Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press, 1994
30. Schroer, B.: Phys. Lett. B 494, 124 (2000), hep-th/0005110
31. Todorov, I.T.: Two-dimensional conformal field theory and beyond. Lessons from a continuing fashion. math-ph/0011014
32. Penrose, R.: How to compute-help-and hurt scientific research. Convergence, Winter 1999, p. 30
33. Rehren, K.-H.: Local Quantum Observables in the Anti de Sitter-Conformal QFT Correspondence. hep-th/0003120
34. Kupsch, J., Ruehl, W. and Yunn, B.C.: Ann. Phys. (N.Y.) 89, 141 (1975)
35. Wiesbrock, H.-W.: Lett. Math. Phys. 31, 303 (1994); Guido, D., Longo, R. and Wiesbrock, H.-W.: Commun. Math. Phys. 192, 217 (1998)
36. Borchers, H.J.: J. Math. Phys. 41, 3604 (2000)
37. Nikolov, N.M. and Todorov, I.T.: Rationality of conformally invariant local correlation functions on compactified Minkowski space. hep-th/0009004
38. Buchholz, D., Mund, J. and Summers, S.J.: Transplantation of Local Nets and Geometric Modular Action on Robertson–Walker Space-Times. hep-th/0011237; Buchholz, D.: Algebraic Quantum Field Theory. A Status Report. hep-th/0011015

Communicated by G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 77 – 88 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Mikheyev–Smirnov–Wolfenstein Effect for Linear Electron Density

Harry Lehmann¹,*, Per Osland²,**, Tai Tsun Wu³,⁴,***

1 II. Institut für Theoretische Physik der Universität Hamburg, Hamburg, Germany
2 Department of Physics, University of Bergen, Allégaten 55, 5007 Bergen, Norway
3 Gordon McKay Laboratory, Harvard University, Cambridge, MA 02138, USA
4 Theoretical Physics Division, CERN, 1211 Geneva 23, Switzerland

* Deceased 22 November 1998.
** This work was supported in part by the Research Council of Norway.
*** Work supported in part by the United States Department of Energy under Grant No. DE-FG02-84ER40158.

Received: 3 April 2000 / Accepted: 10 April 2000

Abstract: When the electron density is a linear function of distance, it is known that the MSW equations for two neutrino species can be solved in terms of known functions. It is shown here that, more generally, for any number of neutrino species, these MSW equations can be solved exactly in terms of single integrals. While these integrals cannot be expressed in terms of known functions, some of their simple properties are obtained. Application to the solar neutrino problem is briefly discussed.

1. Introduction

In studying the Mikheyev–Smirnov–Wolfenstein (MSW) effect [1] due to the coherent forward scattering of neutrinos by electrons in matter, it is often instructive to consider first special cases where the electron density is taken to be a simple function of distance. It is the purpose of the present paper to investigate perhaps the simplest case: the case where the electron density is a linear function of distance. The problem of the linear electron density is formulated in Sect. 2. The case of two neutrino species has a long history [2]. As reviewed in Sect. 3, its solution can be expressed in terms of parabolic cylinder functions (see, for example, Chapter VIII of [3]) or, equivalently, confluent hypergeometric functions. However, the solution in this form is specific to the case of two neutrino species and is not convenient for generalizations to more neutrino species. Physically, this generalization is essential because there are at least three types of neutrinos. Therefore, Sect. 4 is devoted to treating in a different way the MSW differential equations for linear electron density and two neutrino species. On the one hand, this alternative method must lead to the same solutions as those in Sect. 3; on the other hand, this new treatment can be readily generalized to any number
of neutrino species. This case of linear electron density but any number of neutrino species forms the main content of the present paper, and various aspects of this case are treated in Sects. 5 and 6.

2. Formulation of the Problem

Let there be N types of neutrinos, denoted by $\nu_1, \nu_2, \ldots, \nu_N$, where $\nu_1$ is the neutrino of the first generation, i.e., the one that forms the SU(2) doublet with the electron. It is assumed that $\nu_1$ is the only neutrino which interacts differently with the electron because of the exchange of the intermediate boson W. The other neutrinos $\nu_2, \nu_3, \ldots, \nu_N$ all have the same interaction with the electron. Thus, the neutrino mass matrix M [4] is an N × N matrix. The eigenvalues of M give the N neutrino masses µ. In analyzing the MSW effect, the neutrino masses are usually taken to be much smaller than the momentum p of the neutrino. Under this assumption, because

$$(p^2 + \mu^2)^{1/2} \sim p + \frac{\mu^2}{2p}, \qquad (2.1)$$
it is $M^2$ that enters the differential equation for the MSW effect. Let Ψ(x) be the N-component neutrino wave function; then this differential equation is

$$i\frac{d}{dx}\Psi(x) = W(x)\,\Psi(x) + \frac{1}{2p}M^2\,\Psi(x), \qquad (2.2)$$

where W(x) is an N × N matrix whose only non-zero element is

$$[W(x)]_{11} = \sqrt{2}\,G_F N_e(x), \qquad (2.3)$$
with $G_F$ the Fermi weak-interaction constant and $N_e(x)$ the electron density at the point x. The terminology “linear electron density” is used to mean that $N_e(x)$ is a linear function of x. Since $N_e(x)$ is the density of electrons, it cannot be negative. Therefore, the MSW differential equation (2.2) is physically meaningful only on the half-line of x where $N_e(x) \ge 0$. On the other hand, when the neutrino or the electron, but not both, is replaced by its antiparticle, the quantity $[W(x)]_{11}$ of Eq. (2.3) changes sign. Therefore, the complementary half-line of x describes this slightly different physical situation. For this reason, Eq. (2.2) is to be studied for the entire range of x, from −∞ to +∞. For the present case of the linear electron density, Eq. (2.2) can be reduced, for a given value of p, to the dimensionless standard form

$$i\frac{d}{dt}\psi(t) = A(t)\,\psi(t), \qquad (2.4)$$

where

$$\psi(t) = \begin{pmatrix} \psi_1(t) \\ \psi_2(t) \\ \psi_3(t) \\ \vdots \\ \psi_N(t) \end{pmatrix} \qquad (2.5)$$

and

$$A(t) = \begin{pmatrix} -t & a_2 & a_3 & \cdots & a_N \\ a_2 & b_2 & 0 & \cdots & 0 \\ a_3 & 0 & b_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_N & 0 & 0 & \cdots & b_N \end{pmatrix}. \qquad (2.6)$$
This is accomplished as follows. (i) To change the independent variable from x to t, there is a shift in origin and a rescaling, possibly with a reversal of sign. (ii) To change the dependent variable from Ψ to ψ, there is a rotation in the second to Nth components and an introduction of exponential factors, possibly with some minus signs. Furthermore, from (i) and (ii), the elements of the matrix A(t) can be chosen to satisfy the conditions

$$\sum_{j=2}^{N} b_j = 0, \qquad (2.7)$$

$$b_2 \le b_3 \le b_4 \le \cdots \le b_{N-1} \le b_N, \qquad (2.8)$$

and

$$a_j \ge 0 \quad \text{for } j = 2, 3, 4, \ldots, N. \qquad (2.9)$$
Consider the following special cases. (a) If, for some j, say $j_0$, $a_{j_0} = 0$, then it is seen from Eqs. (2.5) and (2.6) that $\psi_{j_0}$ is decoupled from the other $\psi_j$'s. Thus, this special case of N types of neutrinos is reduced to a problem with N − 1 types of neutrinos. (b) If, again for some j, say $j_0$, $b_{j_0} = b_{j_0+1}$, then a rotation can be carried out between $\psi_{j_0}$ and $\psi_{j_0+1}$ such that, after this additional rotation, the new $a_{j_0}$ is zero. Thus, this special case of $b_{j_0} = b_{j_0+1}$ can be reduced to the above case of $a_{j_0} = 0$. Hence this second special case of N types of neutrinos is again reduced to a problem with N − 1 types of neutrinos. It is therefore sufficient to study the ordinary differential equation (2.4) with Eqs. (2.5) and (2.6) under the condition (2.7) together with

$$b_2 < b_3 < b_4 < \cdots < b_{N-1} < b_N \qquad (2.10)$$

and

$$a_j > 0 \quad \text{for } j = 2, 3, 4, \ldots, N. \qquad (2.11)$$

In view of the inequality (2.10), it turns out to be convenient to define symbolically

$$b_1 = -\infty \quad \text{and} \quad b_{N+1} = +\infty. \qquad (2.12)$$
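Integrating the reduced system (2.4)–(2.6) numerically gives a useful cross-check on any closed-form treatment. The following is a minimal pure-Python sketch with assumptions of our own. It uses a fixed-step classical Runge–Kutta (RK4) scheme, which is not a method used in the paper. The parameter values a_j, b_j are hypothetical and were chosen to satisfy (2.7), (2.10) and (2.11). Because A(t) is real symmetric, the norm of ψ is conserved, and the integrator reproduces this up to discretization error.

```python
# Minimal sketch: integrate i dpsi/dt = A(t) psi, Eq. (2.4), with the "arrow"
# matrix A(t) of Eq. (2.6). Fixed-step RK4 is an illustrative choice only.

def arrow_matrix(t, a, b):
    """A(t) as a list of lists; a = [a_2..a_N], b = [b_2..b_N]; index 0 is nu_1."""
    n = len(a) + 1
    A = [[0.0] * n for _ in range(n)]
    A[0][0] = -t                          # the only t-dependent element
    for j in range(1, n):
        A[0][j] = A[j][0] = a[j - 1]      # coupling of nu_1 to nu_j
        A[j][j] = b[j - 1]                # diagonal entry b_j
    return A

def rhs(t, psi, a, b):
    """Right-hand side dpsi/dt = -i A(t) psi."""
    A = arrow_matrix(t, a, b)
    n = len(psi)
    return [-1j * sum(A[r][c] * psi[c] for c in range(n)) for r in range(n)]

def integrate(psi0, t0, t1, a, b, steps=2000):
    """Classical fourth-order Runge-Kutta from t0 to t1 with fixed step size."""
    h = (t1 - t0) / steps
    t, psi = t0, list(psi0)
    for _ in range(steps):
        k1 = rhs(t, psi, a, b)
        k2 = rhs(t + h / 2, [p + h / 2 * k for p, k in zip(psi, k1)], a, b)
        k3 = rhs(t + h / 2, [p + h / 2 * k for p, k in zip(psi, k2)], a, b)
        k4 = rhs(t + h, [p + h * k for p, k in zip(psi, k3)], a, b)
        psi = [p + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
               for p, q1, q2, q3, q4 in zip(psi, k1, k2, k3, k4)]
        t += h
    return psi

def norm2(psi):
    """Squared norm sum |psi_j|^2, conserved because A(t) is Hermitian."""
    return sum(abs(p) ** 2 for p in psi)

# Hypothetical N = 3 example: b_2 + b_3 = 0, b_2 < b_3, a_j > 0.
a, b = [0.5, 0.3], [-1.0, 1.0]
psi = integrate([1.0, 0.0, 0.0], -10.0, 10.0, a, b)
print(norm2(psi))   # stays close to 1 up to RK4 discretization error
```

Setting one $a_{j_0}$ to zero in this sketch leaves the corresponding component identically zero when it starts at zero. This is the decoupling of special case (a).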
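3. Case N = 2

Let us first review the well-known case of the MSW effect for two types of neutrinos [1, 2]. By Eqs. (2.2)–(2.7), the MSW equations are

$$i\frac{d}{dt}\begin{pmatrix}\psi_1(t)\\ \psi_2(t)\end{pmatrix} = \begin{pmatrix}-t & a_2\\ a_2 & 0\end{pmatrix}\begin{pmatrix}\psi_1(t)\\ \psi_2(t)\end{pmatrix}, \qquad (3.1)$$

or more explicitly

$$i\frac{d}{dt}\psi_1(t) = -t\,\psi_1(t) + a_2\,\psi_2(t), \qquad (3.2)$$

$$i\frac{d}{dt}\psi_2(t) = a_2\,\psi_1(t), \qquad (3.3)$$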
with $a_2 > 0$. A second-order ordinary differential equation for $\psi_1(t)$ is obtained by applying d/dt to Eq. (3.2) and using Eq. (3.3):

$$\frac{d^2\psi_1(t)}{dt^2} - it\,\frac{d\psi_1(t)}{dt} + (a_2^2 - i)\,\psi_1(t) = 0. \qquad (3.4)$$
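The elimination leading to Eq. (3.4) can be spelled out in two lines. The worked step below uses only Eqs. (3.2) and (3.3); primes denote d/dt.

```latex
\begin{aligned}
i\,\psi_1'' &= -\psi_1 - t\,\psi_1' + a_2\,\psi_2'
  && \text{(derivative of (3.2))}\\
 &= -\psi_1 - t\,\psi_1' - i a_2^2\,\psi_1
  && \text{(since } \psi_2' = -i a_2 \psi_1 \text{ by (3.3))}\\
\Longrightarrow\quad \psi_1'' - it\,\psi_1' + (a_2^2 - i)\,\psi_1 &= 0
  && \text{(multiply by } -i\text{)}
\end{aligned}
```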
In order to remove the first-derivative term, let

$$\psi_1(t) = e^{it^2/4}\,\phi_1(t). \qquad (3.5)$$
Then the equation for φ1 (t) is d 2 φ1 (t) + ( 41 t 2 + a22 − 21 i)φ1 (t) = 0. dt 2
(3.6)
Two linearly independent solutions of Eq. (3.6) are the parabolic cylinder functions [3]
$$D_\rho\!\left(\pm e^{i\pi/4}t\right), \tag{3.7}$$
where
$$\rho = -ia_2^2 - 1. \tag{3.8}$$
Parabolic cylinder functions are special cases of the confluent hypergeometric functions [5], the relation being
$$D_\rho(z) = 2^{(\rho-1)/2}\,e^{-z^2/4}\,z\,\Psi\!\left(\tfrac12 - \tfrac12\rho, \tfrac32; \tfrac12 z^2\right). \tag{3.9}$$
Since the confluent hypergeometric functions Φ and Ψ satisfy the same second-order differential equation, the general solution of Eq. (3.4) is
$$\psi_1(t) = t\left[C_1\,\Phi\!\left(1 + \tfrac12 ia_2^2, \tfrac32; \tfrac12 it^2\right) + C_2\,\Psi\!\left(1 + \tfrac12 ia_2^2, \tfrac32; \tfrac12 it^2\right)\right], \tag{3.10}$$
with arbitrary constants $C_1$ and $C_2$. This is one convenient form of the solution for N = 2.
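A minimal numerical sketch of the claim that the Φ part of (3.10) solves (3.4). It assumes an illustrative coupling a2 = 0.8 (not a value from the text) and uses the Kummer series Φ(a, c; x) = Σ (a)_k/(c)_k x^k/k!. The helper names `kummer_phi` and `residual_34` are hypothetical. The script evaluates the left-hand side of (3.4) by central finite differences for t Φ(1 + ½ia2², 3/2; ½it²), and also for Φ(½ + ½ia2², ½; ½it²), the t-dependence that reappears in (4.10).

```python
def kummer_phi(a, c, x, terms=200):
    """Kummer series Phi(a, c; x) = sum_k (a)_k / (c)_k * x**k / k!."""
    term, total = 1 + 0j, 1 + 0j
    for k in range(terms):
        # ratio of consecutive terms: (a + k) / (c + k) * x / (k + 1)
        term *= (a + k) / (c + k) * x / (k + 1)
        total += term
    return total

def residual_34(psi, t, a2, h=1e-4):
    """Left-hand side of (3.4), using central finite differences for psi' and psi''."""
    d1 = (psi(t + h) - psi(t - h)) / (2 * h)
    d2 = (psi(t + h) - 2 * psi(t) + psi(t - h)) / h ** 2
    return d2 - 1j * t * d1 + (a2 ** 2 - 1j) * psi(t)

a2 = 0.8  # illustrative value of the coupling a_2 (not taken from the text)

def psi_odd(t):
    # t * Phi(1 + i a2^2 / 2, 3/2; i t^2 / 2): the Phi part of (3.10)
    return t * kummer_phi(1 + 0.5j * a2 ** 2, 1.5, 0.5j * t ** 2)

def psi_even(t):
    # Phi(1/2 + i a2^2 / 2, 1/2; i t^2 / 2): the t-dependence of psi_c in (4.10)
    return kummer_phi(0.5 + 0.5j * a2 ** 2, 0.5, 0.5j * t ** 2)

for t in (0.3, 1.0, 2.5):
    print(t, abs(residual_34(psi_odd, t, a2)), abs(residual_34(psi_even, t, a2)))
```

The residuals should be at the level of the finite-difference error. Changing the second Kummer parameter (for example 3/2 to 1/2 in `psi_odd`) makes the residual of order one.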
Mikheyev–Smirnov–Wolfenstein Effect for Linear Electron Density
4. Case N = 2 – an Alternative Approach

In the existing treatment in the literature for linear electron density and two types of neutrinos [1, 2], as reviewed in Sect. 3, the crucial step is to recognize that the second-order differential equation (3.4) can be solved exactly in terms of known higher transcendental functions, either parabolic cylinder functions or confluent hypergeometric functions. More generally, for N types of neutrinos, the corresponding differential equation is of Nth order. Even for N = 3, the third-order differential equation is not that of any well-known transcendental function. Therefore, in order to generalize the treatment of N = 2 to larger values of N, we must recast the solution of Sect. 3 so that parabolic cylinder functions and confluent hypergeometric functions do not play an essential role.

A useful question to ask is the following: in what way is the linear electron density especially simple? The answer must be sought in Eq. (2.6), from which it is seen that the independent variable t appears in only one matrix element, and only linearly there. This implies that, if the Fourier transform is applied to the differential equation (2.4), differentiation with respect to the Fourier-transform variable appears only once. Hence an explicit expression can be expected for the Fourier transform of ψ. Let
$$F(\zeta) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{i\zeta t}\,\psi_1(t); \tag{4.1}$$
then, since d/dt corresponds to −iζ and multiplication by t corresponds to −i d/dζ, it follows from Eq. (3.4) that F(ζ) satisfies the first-order differential equation
$$-\zeta^2F(\zeta) - \frac{d}{d\zeta}\left[-i\zeta F(\zeta)\right] + (a_2^2 - i)F(\zeta) = 0,$$
where we have omitted all boundary terms at t = ±∞. This differential equation simplifies immediately to
$$i\zeta\,\frac{dF(\zeta)}{d\zeta} - (\zeta^2 - a_2^2)F(\zeta) = 0,$$
or
$$\frac{1}{F(\zeta)}\frac{dF(\zeta)}{d\zeta} = \frac{i}{\zeta}\left(a_2^2 - \zeta^2\right). \tag{4.2}$$
Integration over ζ gives
$$F(\zeta) = \text{const.}\; e^{-i\zeta^2/2}\,\zeta^{ia_2^2}. \tag{4.3}$$
From the inequality (2.11), it is seen that the function on the right-hand side of Eq. (4.3) has a singularity at
$$\zeta = 0 = b_2. \tag{4.4}$$
Therefore the constant in (4.3) can take on different values for ζ positive and for ζ negative. In other words, the differential equation (4.2) is really two differential equations, one for ζ > 0 and the other for ζ < 0, consistent with the fact that the right-hand side of Eq. (4.2) has a singularity at ζ = 0. With this observation, it is natural to define
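$$F_1(\zeta) = \begin{cases} e^{-i\zeta^2/2}\,(-\zeta)^{ia_2^2} & \text{for } \zeta < 0,\\ 0 & \text{for } \zeta > 0,\end{cases} \tag{4.5a}$$
and
$$F_2(\zeta) = \begin{cases} 0 & \text{for } \zeta < 0,\\ e^{-i\zeta^2/2}\,\zeta^{ia_2^2} & \text{for } \zeta > 0.\end{cases} \tag{4.5b}$$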
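Inverting the Fourier transform (4.1), this choice leads to
$$\begin{aligned}\psi_1^{(1)}(t) &= \int_{-\infty}^{0} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\,(-\zeta)^{ia_2^2},\\ \psi_1^{(2)}(t) &= \int_{0}^{\infty} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\,\zeta^{ia_2^2}.\end{aligned} \tag{4.6}$$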
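With the notation (2.12), these two formulas (4.6) can be written as
$$\psi_1^{(n)}(t) = \int_{b_n}^{b_{n+1}} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\,|\zeta|^{ia_2^2} \tag{4.7}$$
for n = 1, 2. It remains to show that both $\psi_1^{(1)}(t)$ and $\psi_1^{(2)}(t)$ are confluent hypergeometric functions of the correct parameters and argument. For this purpose, it is convenient to define
$$\begin{aligned}\psi_c(t) &= \int_0^\infty d\zeta\,\cos(\zeta t)\,e^{-i\zeta^2/2}\,\zeta^{ia_2^2},\\ \psi_s(t) &= \int_0^\infty d\zeta\,\sin(\zeta t)\,e^{-i\zeta^2/2}\,\zeta^{ia_2^2},\end{aligned} \tag{4.8}$$
so that it follows from Eqs. (4.6) that
$$\psi_1^{(1)}(t) = \psi_c(t) + i\psi_s(t), \qquad \psi_1^{(2)}(t) = \psi_c(t) - i\psi_s(t). \tag{4.9}$$
It is found that
$$\psi_c(t) = e^{-i\pi/4}\,e^{\pi a_2^2/4}\,2^{(-1+ia_2^2)/2}\,\Gamma\!\left(\tfrac12 + \tfrac12 ia_2^2\right)\Phi\!\left(\tfrac12 + \tfrac12 ia_2^2, \tfrac12; \tfrac12 it^2\right) \tag{4.10}$$
and
$$\psi_s(t) = -i\,e^{\pi a_2^2/4}\,2^{ia_2^2/2}\,\Gamma\!\left(1 + \tfrac12 ia_2^2\right)t\,\Phi\!\left(1 + \tfrac12 ia_2^2, \tfrac32; \tfrac12 it^2\right). \tag{4.11}$$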
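There are various ways to verify Eqs. (4.10) and (4.11), including carrying out power series expansions in t of the left-hand and right-hand sides. Finally, we note from Eq. (7) on p. 257 of reference [5] that
$$\Psi(a, c; x) = \frac{\Gamma(1-c)}{\Gamma(a-c+1)}\,\Phi(a, c; x) + \frac{\Gamma(c-1)}{\Gamma(a)}\,x^{1-c}\,\Phi(a-c+1, 2-c; x). \tag{4.12}$$
With $a = 1 + \tfrac12 ia_2^2$, $c = \tfrac32$ and $x = \tfrac12 it^2$, the second term is proportional to $t^{-1}\,\Phi(\tfrac12 + \tfrac12 ia_2^2, \tfrac12; \tfrac12 it^2)$, so $t\Psi$ in (3.10) is a linear combination of ψc and ψs. Therefore the results of Sect. 3 and of this section are the same.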
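As a quick numerical sanity check of the constant prefactors in (4.10) and (4.11), the values ψc(0) and ψs′(0) can be computed directly from the definitions (4.8). The sketch makes several assumptions: a2 = 0.8 is an illustrative coupling; the contour of ∫₀^∞ ζ^(s−1) e^(−iζ²/2) dζ is rotated onto the ray ζ = e^(−iπ/4) r, where the integrand decays like e^(−r²/2) (we take this rotation to be legitimate rather than proving it); and, because Python's `math.gamma` accepts only real arguments, a complex gamma function is built with the standard Lanczos approximation. The helper names are hypothetical.

```python
import cmath
import math

# Lanczos coefficients (g = 7, n = 9) for a complex gamma function.
_G = 7
_P = [0.99999999999980993, 676.5203681218851, -1259.1392167224028,
      771.32342877765313, -176.61502916214059, 12.507343278686905,
      -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7]

def cgamma(z):
    """Complex gamma function via the Lanczos approximation."""
    z = complex(z)
    if z.real < 0.5:  # reflection formula
        return cmath.pi / (cmath.sin(cmath.pi * z) * cgamma(1 - z))
    z -= 1
    x = _P[0]
    for i in range(1, len(_P)):
        x += _P[i] / (z + i)
    t = z + _G + 0.5
    return cmath.sqrt(2 * cmath.pi) * t ** (z + 0.5) * cmath.exp(-t) * x

def ray_integral(s, umin=-40.0, umax=3.5, steps=8000):
    """I(s) = int_0^inf zeta**(s-1) exp(-i zeta**2/2) d zeta, after rotating the
    contour to zeta = exp(-i pi/4) r: I(s) = exp(-i pi s/4) int_0^inf r**(s-1) exp(-r**2/2) dr.
    The r-integral is done with r = exp(u) and the trapezoidal rule in u."""
    h = (umax - umin) / steps
    total = 0j
    for k in range(steps + 1):
        u = umin + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(s * u - math.exp(2 * u) / 2)
    return cmath.exp(-1j * math.pi * s / 4) * total * h

a2 = 0.8  # illustrative coupling
psi_c0 = (cmath.exp(-1j * math.pi / 4) * math.exp(math.pi * a2**2 / 4)
          * 2 ** ((-1 + 1j * a2**2) / 2) * cgamma(0.5 + 0.5j * a2**2))   # (4.10) at t = 0
dpsi_s0 = (-1j * math.exp(math.pi * a2**2 / 4)
           * 2 ** (1j * a2**2 / 2) * cgamma(1 + 0.5j * a2**2))         # d/dt of (4.11) at t = 0

print(abs(ray_integral(1 + 1j * a2**2) - psi_c0))   # psi_c(0) from (4.8)
print(abs(ray_integral(2 + 1j * a2**2) - dpsi_s0))  # psi_s'(0) from (4.8)
```

This only checks the normalization at t = 0. The t-dependence is the Kummer-function part, which can be checked separately by series.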
5. General Values of N

The procedure of Sect. 4 for N = 2 can be generalized in a straightforward way to larger values of N; indeed, this is its major advantage over the previously known approach reviewed in Sect. 3. This generalization to arbitrary values of N is carried out in this section. Thus the differential equations (2.4) need to be solved under the constraints (2.7), (2.10), and (2.11). By Eqs. (2.5) and (2.6), Eqs. (2.4) are, more explicitly,
$$i\frac{d\psi_1(t)}{dt} = -t\,\psi_1(t) + \sum_{j=2}^{N} a_j\,\psi_j(t) \tag{5.1}$$
and, for k = 2, 3, 4, …, N,
$$\left(i\frac{d}{dt} - b_k\right)\psi_k(t) = a_k\,\psi_1(t). \tag{5.2}$$
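Eqs. (5.1)–(5.2) have the form i dψ/dt = A(t)ψ, with A(t) real symmetric (−t in the top-left corner, a_k in the first row and column, b_k on the rest of the diagonal). The evolution is therefore unitary, and Σ_j |ψ_j(t)|² is conserved. The rough sketch below integrates the system with a fixed-step fourth-order Runge–Kutta scheme and checks this. That is a cheap test for any solution, for instance one built from (5.13). The values a = (0.5, 0.3) and b = (−1, 1) for N = 3 are illustrative assumptions chosen to satisfy (2.7), (2.10) and (2.11). The function names are hypothetical.

```python
def msw_rhs(t, psi, a, b):
    """Right-hand side of d(psi)/dt = -i A(t) psi, written out from (5.1)-(5.2)."""
    out = [0j] * len(psi)
    # (5.1): i psi_1' = -t psi_1 + sum_j a_j psi_j
    out[0] = -1j * (-t * psi[0] + sum(aj * pj for aj, pj in zip(a, psi[1:])))
    # (5.2): i psi_k' = b_k psi_k + a_k psi_1, for k = 2..N
    for k in range(1, len(psi)):
        out[k] = -1j * (b[k - 1] * psi[k] + a[k - 1] * psi[0])
    return out

def rk4(psi, t0, t1, steps, a, b):
    """Classical fourth-order Runge-Kutta with a fixed step."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = msw_rhs(t, psi, a, b)
        k2 = msw_rhs(t + h / 2, [p + h / 2 * d for p, d in zip(psi, k1)], a, b)
        k3 = msw_rhs(t + h / 2, [p + h / 2 * d for p, d in zip(psi, k2)], a, b)
        k4 = msw_rhs(t + h, [p + h * d for p, d in zip(psi, k3)], a, b)
        psi = [p + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
               for p, d1, d2, d3, d4 in zip(psi, k1, k2, k3, k4)]
        t += h
    return psi

a = [0.5, 0.3]   # illustrative a_2, a_3 > 0, cf. (2.11)
b = [-1.0, 1.0]  # illustrative b_2 < b_3 with b_2 + b_3 = 0, cf. (2.7), (2.10)
psi_end = rk4([1 + 0j, 0j, 0j], -8.0, 8.0, 8000, a, b)
print([abs(p) ** 2 for p in psi_end], sum(abs(p) ** 2 for p in psi_end))
```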
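In order to get a differential equation for ψ1(t), apply the operator
$$\prod_{k=2}^{N}\left(i\frac{d}{dt} - b_k\right)$$
to Eq. (5.1). By Eq. (5.2), this gives
$$\prod_{k=2}^{N}\left(i\frac{d}{dt} - b_k\right)\left(i\frac{d}{dt} + t\right)\psi_1(t) = \sum_{j=2}^{N} a_j^2 \prod_{\substack{k=2\\ k\neq j}}^{N}\left(i\frac{d}{dt} - b_k\right)\psi_1(t). \tag{5.3}$$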
Equation (5.3) is an Nth-order ordinary differential equation for ψ1(t); it reduces to Eq. (3.4) when N = 2. Following Sect. 4, define the Fourier transform F(ζ) of ψ1(t) by Eq. (4.1); then the first-order differential equation for F(ζ) is
$$\prod_{k=2}^{N}(\zeta - b_k)\left(\zeta - i\frac{d}{d\zeta}\right)F(\zeta) = \sum_{j=2}^{N} a_j^2 \prod_{\substack{k=2\\ k\neq j}}^{N}(\zeta - b_k)\,F(\zeta), \tag{5.4}$$
or
$$\left(\zeta - i\frac{d}{d\zeta}\right)F(\zeta) = \sum_{j=2}^{N}\frac{a_j^2}{\zeta - b_j}\,F(\zeta), \tag{5.5}$$
or
$$\frac{1}{F(\zeta)}\frac{dF(\zeta)}{d\zeta} = i\left(-\zeta + \sum_{j=2}^{N}\frac{a_j^2}{\zeta - b_j}\right). \tag{5.6}$$
Eq. (5.6) is the generalization of Eq. (4.2) to arbitrary N. Integration over ζ gives the generalization of Eq. (4.3):
$$F(\zeta) = \text{const.}\; e^{-i\zeta^2/2}\prod_{j=2}^{N}(\zeta - b_j)^{ia_j^2}. \tag{5.7}$$
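A short finite-difference sketch of the step from (5.6) to (5.7), written with the absolute values |ζ − b_j| that are used from (5.9) on. Since d/dζ ln|ζ − b_j| = 1/(ζ − b_j) on either side of b_j, the logarithmic derivative of e^(−iζ²/2) ∏ |ζ − b_j|^(i a_j²) should reproduce the right-hand side of (5.6) in every interval (b_n, b_(n+1)). The parameters for N = 4 are illustrative assumptions satisfying (2.7), (2.10) and (2.11). The names `F` and `rhs_56` are hypothetical.

```python
import cmath

def F(zeta, a, b):
    """Candidate solution (5.7) with const = 1, using |zeta - b_j| as in (5.9)."""
    val = cmath.exp(-0.5j * zeta ** 2)
    for aj, bj in zip(a, b):
        val *= abs(zeta - bj) ** (1j * aj ** 2)
    return val

def rhs_56(zeta, a, b):
    """Right-hand side of (5.6)."""
    return 1j * (-zeta + sum(aj ** 2 / (zeta - bj) for aj, bj in zip(a, b)))

a = [0.5, 0.3, 0.7]   # illustrative a_2, a_3, a_4 > 0
b = [-1.5, 0.2, 1.3]  # illustrative b_2 < b_3 < b_4 with zero sum
h = 1e-5
for zeta in (-3.0, -0.4, 0.9, 2.2):  # one point in each interval (b_n, b_{n+1})
    dlog = (F(zeta + h, a, b) - F(zeta - h, a, b)) / (2 * h) / F(zeta, a, b)
    print(zeta, abs(dlog - rhs_56(zeta, a, b)))
```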
From (2.11), the function on the right-hand side of Eq. (5.7) has singularities at
$$\zeta = b_j \tag{5.8}$$
for j = 2, 3, …, N. Therefore define, for n = 1, 2, 3, …, N,
$$F_n(\zeta) = \begin{cases} e^{-i\zeta^2/2}\displaystyle\prod_{j=2}^{N}|\zeta - b_j|^{ia_j^2} & \text{for } b_n < \zeta < b_{n+1},\\ 0 & \text{otherwise.}\end{cases} \tag{5.9}$$
Inverting the Fourier transform (4.1) then gives the desired N linearly independent solutions of the differential equation (5.3) as
$$\psi_1^{(n)}(t) = \int_{b_n}^{b_{n+1}} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\prod_{j=2}^{N}|\zeta - b_j|^{ia_j^2} \tag{5.10}$$
for n = 1, 2, 3, …, N. In both Eq. (5.9) and Eq. (5.10), the notation of (2.12) has been used. The general solution of (5.3) is of course
$$\psi_1(t) = \sum_{n=1}^{N} C_n\,\psi_1^{(n)}(t) \tag{5.11}$$
with arbitrary constants $C_n$. The other components of the ψ(t) of (2.5) are also easily obtained, and the result is
$$\psi(t) = \sum_{n=1}^{N} C_n\,\psi^{(n)}(t), \tag{5.12}$$
where
$$\psi^{(n)}(t) = \int_{b_n}^{b_{n+1}} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\prod_{j=2}^{N}|\zeta - b_j|^{ia_j^2}\begin{pmatrix} 1\\[2pt] \dfrac{a_2}{\zeta - b_2}\\[6pt] \dfrac{a_3}{\zeta - b_3}\\ \vdots\\ \dfrac{a_{N-1}}{\zeta - b_{N-1}}\\[6pt] \dfrac{a_N}{\zeta - b_N}\end{pmatrix}. \tag{5.13}$$
6. Limiting Behaviors for Large Distances

The next task is to obtain the limiting behaviors of the various components of the wave function when the distance x is large, either positive or negative. In other words, the problem is to find the limiting behaviors of the $\psi_j^{(n)}(t)$, as given explicitly by Eq. (5.13), both for t → −∞ and t → ∞, with all the a's and b's fixed. It is important to remember that these two limits correspond to different physical problems, as discussed after Eq. (2.3). The consideration here will be limited to the part of the asymptotic behavior that does not vanish as t → ±∞; this is the physically interesting part. There are two possible types of contributions: from points of stationary phase and from end points of integration.
6.1. Points of stationary phase. From Eq. (5.13), the points of stationary phase are determined by
$$\frac{\partial}{\partial\zeta}\left(-\zeta t - \tfrac12\zeta^2\right) = 0 \tag{6.1}$$
or
$$\zeta = -t. \tag{6.2}$$
In Eq. (6.1), the additional phase due to the factor
$$\prod_{j=2}^{N}|\zeta - b_j|^{ia_j^2}$$
is not included, because the $a_j$ and $b_j$ are all fixed while t → ±∞. Equation (6.2), together with the ranges of integration in Eq. (5.13), implies that this point of stationary phase is relevant only to
• $\psi_1^{(1)}(t)$ as t → ∞, and
• $\psi_1^{(N)}(t)$ as t → −∞.
For example, when j > 1, $\psi_j^{(1)}(t)$ as t → ∞ and $\psi_j^{(N)}(t)$ as t → −∞ both behave as 1/t in absolute value, as far as the contribution from this point of stationary phase (6.2) is concerned.

6.2. End points of integration. It is seen from Eq. (5.13) that, when k ≥ 2, there is an extra factor
$$\frac{a_k}{\zeta - b_k} \tag{6.3}$$
associated with $\psi_k^{(n)}(t)$. But the range of integration for this $\psi_k^{(n)}(t)$, as given by Eq. (5.13), is from $b_n$ to $b_{n+1}$. Therefore the contribution from these end points of integration can lead to a non-zero answer only when the index k appearing in the expression (6.3) agrees with either n or n + 1. In other words, as t → ±∞ there are non-zero contributions only
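to $\psi_k^{(k-1)}(t)$ [i.e., n = k − 1] and $\psi_k^{(k)}(t)$ [i.e., n = k]. These particular components are given by
$$\begin{aligned}\psi_k^{(k-1)}(t) &= \int_{b_{k-1}}^{b_k} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\prod_{j=2}^{k-1}(\zeta - b_j)^{ia_j^2}\prod_{j=k}^{N}(b_j - \zeta)^{ia_j^2}\;\frac{-a_k}{b_k - \zeta},\\ \psi_k^{(k)}(t) &= \int_{b_k}^{b_{k+1}} d\zeta\, e^{-i\zeta t}\,e^{-i\zeta^2/2}\prod_{j=2}^{k}(\zeta - b_j)^{ia_j^2}\prod_{j=k+1}^{N}(b_j - \zeta)^{ia_j^2}\;\frac{a_k}{\zeta - b_k}.\end{aligned} \tag{6.4}$$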
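Equations (6.4) are exact. Since the important contributions come from the vicinity of ζ = b_k, all the ζ's in Eq. (6.4) can be replaced approximately by b_k, except in the factors $e^{-i\zeta t}$, $b_k - \zeta$, and $\zeta - b_k$. Therefore
$$\begin{aligned}\psi_k^{(k-1)}(t) &\sim e^{-ib_k^2/2}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\,(-a_k)\int^{b_k} d\zeta\, e^{-i\zeta t}\,(b_k - \zeta)^{-1+ia_k^2},\\ \psi_k^{(k)}(t) &\sim e^{-ib_k^2/2}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\;a_k\int_{b_k} d\zeta\, e^{-i\zeta t}\,(\zeta - b_k)^{-1+ia_k^2},\end{aligned} \tag{6.5}$$
or, with x = b_k − ζ and x = ζ − b_k respectively,
$$\begin{aligned}\psi_k^{(k-1)}(t) &\sim e^{-ib_k^2/2}\,e^{-ib_kt}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\,(-a_k)\int_0^\infty dx\, e^{ixt}\,x^{-1+ia_k^2},\\ \psi_k^{(k)}(t) &\sim e^{-ib_k^2/2}\,e^{-ib_kt}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\;a_k\int_0^\infty dx\, e^{-ixt}\,x^{-1+ia_k^2}.\end{aligned} \tag{6.6}$$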
These integrals can be evaluated exactly in terms of the gamma function.
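For reference, the gamma-function evaluation can be written out as below. This is the standard Mellin-type integral. We read it as defined for 0 < Re s < 1 and continued analytically in s to Re s = 0, an interpretation the text does not spell out:
$$\int_0^\infty dx\, x^{s-1}e^{\pm ixt} = \Gamma(s)\, e^{\pm i\pi s\,\operatorname{sgn}(t)/2}\,|t|^{-s}, \qquad t \neq 0.$$
Setting $s = ia_k^2$ gives $e^{\pm i\pi s\,\operatorname{sgn}(t)/2} = e^{\mp\pi a_k^2\operatorname{sgn}(t)/2}$. This is the origin of the factors $e^{\mp\pi a_k^2/2}$, $\Gamma(ia_k^2)$ and $|t|^{-ia_k^2}$ in the asymptotic results of Sect. 6.3.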
6.3. Results. Figure 1 shows which of the various $\psi_k^{(n)}(t)$ have non-vanishing behaviors for t → −∞ and t → ∞. These non-vanishing behaviors are as follows. For t positive and large,
$$\psi_1^{(1)}(t) \sim \sqrt{2\pi}\,e^{-i\pi/4}\,t^{i\alpha}\,e^{it^2/2}, \tag{6.7}$$
$$\psi_k^{(k-1)}(t) \sim e^{-ib_k^2/2}\,e^{-ib_kt}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\,(-a_k)\,e^{-\pi a_k^2/2}\,\Gamma(ia_k^2)\,t^{-ia_k^2}, \tag{6.8}$$
$$\psi_k^{(k)}(t) \sim e^{-ib_k^2/2}\,e^{-ib_kt}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\;a_k\,e^{\pi a_k^2/2}\,\Gamma(ia_k^2)\,t^{-ia_k^2}; \tag{6.9}$$
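$$\begin{array}{c|ccccccc} k \backslash n & 1 & 2 & 3 & \cdots & N-2 & N-1 & N\\ \hline 1 & \infty & & & & & & -\infty\\ 2 & \times & \times & & & & & \\ 3 & & \times & \times & & & & \\ \vdots & & & \ddots & \ddots & & & \\ N-1 & & & & & \times & \times & \\ N & & & & & & \times & \times \end{array}$$
Fig. 1. Table of non-vanishing components of $\psi_k^{(n)}(t)$ as t → ±∞. A cross means that the component is non-vanishing both for t → −∞ and t → +∞; the symbol ∞ means for t → +∞ only, and −∞ means for t → −∞ only.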
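while, for t negative and large,
$$\psi_1^{(N)}(t) \sim \sqrt{2\pi}\,e^{-i\pi/4}\,|t|^{i\alpha}\,e^{it^2/2}, \tag{6.10}$$
$$\psi_k^{(k-1)}(t) \sim e^{-ib_k^2/2}\,e^{-ib_kt}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\,(-a_k)\,e^{\pi a_k^2/2}\,\Gamma(ia_k^2)\,|t|^{-ia_k^2}, \tag{6.11}$$
$$\psi_k^{(k)}(t) \sim e^{-ib_k^2/2}\,e^{-ib_kt}\prod_{\substack{j=2\\ j\neq k}}^{N}|b_j - b_k|^{ia_j^2}\;a_k\,e^{-\pi a_k^2/2}\,\Gamma(ia_k^2)\,|t|^{-ia_k^2}. \tag{6.12}$$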
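The moduli of the end-point amplitudes in (6.8) and (6.9) can be worked out by hand. Every phase factor (e^(−ib_k²/2), e^(−ib_k t), |b_j − b_k|^(ia_j²), t^(−ia_k²)) has modulus one, and |Γ(iy)|² = π/(y sinh πy) for real y > 0. The small sketch below does this arithmetic for a few illustrative values of a_k. The ratio of the two squared weights comes out as e^(−2πa_k²), the same exponential that appears in the Landau–Zener formula [2]. Their difference is exactly 2π. The function names are hypothetical.

```python
import math

def gamma_imag_abs2(y):
    """|Gamma(i y)|^2 = pi / (y sinh(pi y)) for real y > 0."""
    return math.pi / (y * math.sinh(math.pi * y))

def asymptotic_weights(ak):
    """Squared moduli of the t -> +infinity amplitudes (6.8) and (6.9)."""
    g2 = gamma_imag_abs2(ak ** 2)
    w_prev = ak ** 2 * math.exp(-math.pi * ak ** 2) * g2  # |psi_k^(k-1)|^2, from (6.8)
    w_same = ak ** 2 * math.exp(math.pi * ak ** 2) * g2   # |psi_k^(k)|^2, from (6.9)
    return w_prev, w_same

for ak in (0.2, 0.5, 1.0):  # illustrative couplings
    w_prev, w_same = asymptotic_weights(ak)
    print(ak, w_prev / w_same, math.exp(-2 * math.pi * ak ** 2), w_same - w_prev)
```

For t → −∞, (6.11) and (6.12) simply exchange the two exponentials, so the roles of the two weights are swapped.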
All the other components approach zero as t → ∞ and as t → −∞. In the asymptotic formulas (6.7) and (6.10), α is the quantity
$$\alpha = \sum_{j=2}^{N} a_j^2. \tag{6.13}$$
In the formulas (6.8), (6.9), (6.11), and (6.12),
$$2 \le k \le N, \tag{6.14}$$
where N, as always, is the number of neutrino species.

7. Discussion

When we started to investigate the MSW differential equations for three neutrino species in the case of the linear electron density, we were mostly interested in various possibilities of finding approximate solutions. Therefore, it was quite a surprise to us that these
coupled differential equations can be solved exactly not only for three, but also for any number of neutrino species.

In the work of Wolfenstein, Mikheyev, and Smirnov [1] and others [2] on solar neutrinos, taking into account two species of neutrinos, it has been found that most of the effect takes place in a fairly narrow region around a particular value of the electron density. Because of this, it is quite accurate to use a linear approximation to the electron density. For more than two species of neutrinos, it is no longer true in general that most of the activity takes place in a narrow region. Nevertheless, there are a number of circumstances where this is true. However, the conditions for this to hold have not yet been studied systematically. This is one direction for future work.

Under the assumption that the electron density is a linear function of distance, the exact general solution of the MSW differential equations is given by Eqs. (5.12) and (5.13). This solution is in the form of a number of single integrals. When the number of neutrino species is more than 2, these integrals cannot be evaluated in terms of known functions, and therefore their properties need to be investigated. A small step in this direction has been taken in Sect. 6, where the asymptotic behaviors of these integrals have been evaluated for large distances but with all the a's and b's held fixed. It is believed that, in so far as this case of linear electron density is applicable to the physically interesting case of solar neutrinos, the asymptotic evaluation of Sect. 6 is far from sufficient. It is more likely that not only the distance, but also some of the parameters (the a's and b's) are large. This is a second direction for future work.

Acknowledgements. We are greatly indebted to Dr. Conrad Newton for collaboration at the early stage of this work. One of us (TTW) thanks the Theory Division at CERN for its kind hospitality.
References

1. Wolfenstein, L.: Phys. Rev. D 17, 2369 (1978); ibid. 20, 2634 (1979); Mikheyev, S.P. and Smirnov, A.Yu.: Yad. Fiz. 42, 1441 (1985) [Sov. J. Nucl. Phys. 42, 913 (1985)]; Nuovo Cimento C 9, 17 (1986)
2. See, for example, Landau, L.D.: Phys. Z. Sovjetunion 2, 46 (1932); Zener, C.: Proc. Roy. Soc. (London) A137, 696 (1932); Haxton, W.C.: Phys. Rev. Lett. 57, 1271 (1987); Parke, S.J.: Phys. Rev. Lett. 57, 1275 (1986); Petcov, S.T.: Phys. Lett. B 191, 299 (1986)
3. Bateman Manuscript Project, Higher Transcendental Functions, Vol. II, A. Erdélyi, ed., New York: McGraw-Hill, 1953
4. Rosen, S.P.: In: Symmetries and Fundamental Interactions in Nuclei, edited by W. C. Haxton and E. M. Henley, Singapore: World Scientific, 1995, p. 251
5. Bateman Manuscript Project, Higher Transcendental Functions, Vol. I, A. Erdélyi, ed., New York: McGraw-Hill, 1953

Communicated by A. Jaffe, G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 89 – 124 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Elliptic Genus and Hidden Symmetry Arthur Jaffe Harvard University, Cambridge, MA 02138, USA Received: 7 January 2000 / Accepted: 10 April 2000
Dedicated to the memory of Harry Lehmann

Abstract: We study the elliptic genus (a partition function) in certain interacting, twist quantum field theories. Without twists, these theories have N = 2 supersymmetry. The twists provide a regularization, and also partially break the supersymmetry. In spite of the regularization, one can establish a homotopy of the elliptic genus in a coupling parameter. Our construction relies on a priori estimates and other methods from constructive quantum field theory; this mathematical underpinning allows us to justify evaluating the elliptic genus at one endpoint of the homotopy. We obtain a version of Witten's proposed formula for the elliptic genus in terms of classical theta functions. As a consequence, the elliptic genus has a hidden SL(2, Z) symmetry characteristic of conformal theory, even though the underlying theory is not conformal.

1. Introduction

We study coupled complex bosonic and fermionic quantum fields on a two-dimensional space-time cylinder S^1 × R, where S^1 denotes a circle of length . The equations are determined by a holomorphic polynomial in n variables, called the superpotential,
$$V : \mathbb{C}^n \to \mathbb{C}. \tag{1.1}$$
We denote the degree of this polynomial by ñ = degree(V), and we assume
$$\tilde n \ge 2. \tag{1.2}$$
The complex scalar field ϕ and the Dirac field ψ have n and 2n components, respectively:
$$\varphi = \{\varphi_i\},\ \text{where } 1 \le i \le n, \qquad\text{and}\qquad \psi = \{\psi_{\alpha,i}\},\ \text{where } 1 \le \alpha \le 2,\ 1 \le i \le n. \tag{1.3}$$
Work supported in part by the Department of Energy under Grant DE-FG02-94ER-25228. The author performed this research in part for the Clay Mathematics Institute.
A. Jaffe
In the literature one finds these equations called "Wess–Zumino equations" or sometimes "Landau–Ginzburg equations". For cubic V, the equations reduce to the coupling of a non-linear boson field to the Dirac field by a Yukawa interaction. Hence one occasionally also refers to the equations arising from general V as "generalized Yukawa" equations. In [15, 12] we established the existence of solutions to the Wess–Zumino equations for massive fields. Recently we extended these results by proving the existence of solutions for the equations coupling massless, multicomponent, twist fields. The word "twist" refers to the fact that the fields are multi-valued; translation about the spatial circle results in each component of the field being multiplied by a phase. This phase is proportional to a real parameter φ that we choose in the interval φ ∈ (0, 2π], and the periodic case (no twist) corresponds to the limiting value φ = 0. The operators in the field theory act on a Fock–Hilbert space H over the circle, with domains and other properties of the operators depending on φ. For details of these definitions and results see [4, 10, 11].

We study a subset of polynomials V with properties detailed in Sect. 1.1. For these examples, the Hamiltonian H = H(V) is self-adjoint, it is bounded from below, and the heat kernel $e^{-\beta H}$ has a trace for all β > 0. This semigroup commutes with the translation group generated by the momentum operator P. There is also a U(1) group $U(\theta) = e^{i\theta J}$ of "twist" symmetries of H, where the generator J = J(V) depends on V; see Sect. 1.1. Denote the fermion number operator by $N^f$, and let $\Gamma = (-I)^{N^f}$ denote a $\mathbb{Z}_2$-grading. In our examples, all four operators H, P, J, and Γ are self-adjoint and mutually commute. Hence the operator $A = e^{-i\theta J - i\sigma P}$ is unitary, and the operator $\Gamma A e^{-\beta H} = \Gamma e^{-i\theta J - i\sigma P - \beta H}$ has a trace for all β > 0. The elliptic genus is the partition function
$$Z^V = \operatorname{Tr}_{\mathcal H}\left(\Gamma\, e^{-i\theta J - i\sigma P - \beta H}\right). \tag{1.4}$$
In a seminal paper [21], Witten suggested that one could calculate the elliptic genus of these examples in closed form. He gave a proposed formula (for φ = 0) based on an argument that $Z^{\lambda V}$ should be independent of a parameter λ, and on an "evaluation" of Z for V = 0. Kawai, Yamada, and Yang [17] elaborated on the algebraic aspects of Witten's work and made contact with related proposals of Vafa [19]. From a mathematical point of view, these insights are not definitive: the representation (1.4) is ill-defined if both V = 0 and φ = 0, as $e^{-\beta H}$ does not have a trace, and the evaluation is only suggestive. Furthermore, establishing the existence and continuity of $Z^{\lambda V}$ requires extensive analysis, beyond the scope of earlier work.

We introduce a regularized $Z^V$, with two regularizing parameters. The first regularization mollifies the zero-frequency modes, and enters through the non-zero twisting parameter φ, as explained in Sect. 1.1. The second regularization mollifies the high-frequency modes. We denote the regularization parameter by , and we discuss it in detail in Sect. 5, when we give an explicit expression for the supercharge as a densely defined sesquilinear form on the Hilbert space H. The regularized supercharges determine self-adjoint operators. The elliptic genus depends on the parameter φ, and has a regular limit as φ → 0. (In fact, the genus continues holomorphically to all φ ∈ C.) The genus does not depend on the high-frequency mollifier . Our goal in this paper is to find and exploit infra-red and ultra-violet regularizations that yield all of the following:
• a self-adjoint Hamiltonian H that is bounded from below, with a trace class heat kernel,
• the two-parameter group of Lie symmetries of H generated by J and P, and
• a sufficient number of invariant supercharges to study and to compute the elliptic genus.

The method that we use in this paper has many advantages. We use twists to provide the infra-red regularization, and an ultra-violet regularization with the property of slow decrease at infinity to provide a non-local cutoff in the Hamiltonian. This regularized Hamiltonian has a form that allows us to establish stability and self-adjointness, as well as the existence of a trace for the heat kernel. This trace is uniform in the ultraviolet regularization parameter , but diverges as the twist parameter φ → 0. This regularization leaves us with half the number of translation-invariant supercharges that one expects in a twist-free theory. These supercharges also commute with J. On the other hand, more straightforward regularizations cause difficulty in at least one of these areas: they produce a heat kernel with continuous spectrum, destroy the $e^{i\theta J}$ symmetry that one needs to study the elliptic genus, break all supersymmetries, make it impractical to establish stability, or produce error terms in the supersymmetry algebra that elude estimation. For example, introducing a bosonic mass without a corresponding fermionic mass provides an infra-red regularization compatible with a trace-class heat kernel and with J-symmetry, but all supersymmetries will be broken. We used this method in [9] to study the quantum-mechanics version of the present problem. As a result, the mathematical analysis became quite lengthy, even in the case of a finite number of degrees of freedom. On the other hand, introducing a mass in both the boson and the fermion destroys the J-symmetry of the Hamiltonian, as well as of all supercharges, requiring the analysis of other types of error estimates. Furthermore, a sharp upper momentum cutoff in the interaction produces non-localities that defy estimation.
One new ingredient in our program is to generalize the framework of constructive quantum field theory to cover twist fields; we carry this out in more detail in [10]. A second new ingredient involves identifying and studying cancellations that occur in the geometric invariants we study, and we give the details of these cancellations. We begin in Sect. 5 with operator estimates that justify representations of the invariants by invariants of a sequence of approximating problems. Related estimates show that we can exhibit cancellations in the difference quotients for the approximating problems. In order to estimate these cancellations, we pass from operator estimates to the study of traces in Sect. 6 and Sect. 7.

Twisting partially breaks supersymmetry, as explained in detail in [11]. Half the supercharges are translation and twist invariant, while the other half are not. The elliptic genus can be written as a function of the invariant charges. We restrict the 3n real twisting angles to lie on a line in $\mathbb{R}^{3n}$, parameterized by one angle φ. Doing this yields one invariant supercharge, which we denote Q, and which commutes both with translations and with the twist group. This supercharge satisfies $Q^2 = H + P$. A second supercharge (one that formally exists for φ = 0) is neither translation nor twist invariant. But it is well-behaved in the sense that we can estimate the error terms in the supersymmetry algebra, and we use one of these estimates in this paper.¹ In the end, we obtain the representation of the elliptic genus in terms of theta functions. The partition function then satisfies certain properties under transformations defined by the modular group SL(2, Z), acting on the complex space-time coordinate τ defined below. At first this seems surprising, as theta functions and conformal symmetry are generally associated with zero-mass fields or with conformal field theory. For this reason, we describe the SL(2, Z) symmetry as a hidden aspect of these Wess–Zumino models.

Our results here build on work of Witten [21] and Connes [1], combining these ideas with results from our theory of twist quantum fields [4, 10] and our work in [6]. The elliptic genus is an index invariant, and, as explained in Sect. IX of [6], it fits into the general framework of equivariant, non-commutative geometry (entire cyclic cohomology), characterized by the Dirac operator Q on loop space. However, the elliptic genus is only one invariant from a whole family of invariants that result from the JLO cocycle [13]. Therefore we suggest that it may be possible, within the framework of the Wess–Zumino examples that we study here, to find closed-form expressions for some other invariants given in [6]. We formulated various representations for such invariants in [7, 9], and these might be useful in computation.

We prove here the representation for the elliptic genus $Z^V$. Our proof relies on a series of a priori estimates and other methods from constructive quantum field theory. In particular, we study $Z^{\lambda V}$, where λ denotes a real parameter, and establish differentiability of $Z^{\lambda V}$ in λ for λ > 0, and eventually that $Z^{\lambda V}$ is a constant function of λ. Another key estimate shows that $Z^{\lambda V}$ does not jump at λ = 0. In fact, $Z^{\lambda V}$ is a priori Hölder continuous at λ = 0. We obtain any positive Hölder exponent α < 2/(ñ − 1); namely, there is a constant M = M(α, V, ) such that
$$\left|Z^{\lambda V} - Z^0\right| \le M\lambda^\alpha, \quad \text{for } \lambda \in [0, 1]. \tag{1.5}$$
For potentials of large degree this exponent is small, but strictly positive. These two results, combined with the vanishing of the derivative, show that $Z^{\lambda V}$ is actually a constant function of λ ∈ [0, 1]. We then compute $Z^V$ by evaluating $Z^0$.

¹ Other estimates on the error terms in the supersymmetry algebra play a role if one wants to identify the limiting quantum field theory with full supersymmetry in the limit as the twists are removed. The elliptic genus turns out to be the boundary value of an entire function of φ ∈ C. In particular, the limit φ → 0 exists. Since the Hilbert space and operators we study depend on φ, we define a limit of field theories as a limit of expectation values. With such a limit, as long as we keep a well-behaved, non-zero potential, we recover a standard quantum field theory as φ → 0.

1.1. Assumptions. Let us give more details. The real-time bosonic field $\varphi_{RT} = \{\varphi_{RT,i}\}$ has n components $\varphi_{RT,i}$, where 1 ≤ i ≤ n. The corresponding real-time fermionic field $\psi_{RT} = \{\psi_{RT,\alpha,i}\}$ has 2n components labeled by α, i, with i as before and 1 ≤ α ≤ 2. All these fields are complex, and so, given 3n twist constants $\Omega = \{\Omega^b_i, \Omega^f_{\alpha,i}\}$, there is a one-parameter group U(θ) such that
$$U(\theta)\varphi_{RT,i}U(\theta)^* = e^{i\Omega^b_i\theta}\varphi_{RT,i}, \quad\text{and}\quad U(\theta)\psi_{RT,\alpha,i}U(\theta)^* = e^{i\Omega^f_{\alpha,i}\theta}\psi_{RT,\alpha,i}. \tag{1.6}$$
Also, the momentum operator implements spatial translations,
$$e^{i\sigma P}\varphi_{RT,i}(x,t)e^{-i\sigma P} = \varphi_{RT,i}(x-\sigma,t), \quad\text{and}\quad e^{i\sigma P}\psi_{RT,\alpha,i}(x,t)e^{-i\sigma P} = \psi_{RT,\alpha,i}(x-\sigma,t). \tag{1.7}$$
These properties uniquely determine each generator J and P, up to an additive constant; we choose these constants in the normalization condition NC below. A twist field has the additional property that these two groups are related: translation around the circle results in multiplying each component of the field by a phase. Thus there are 3n independent twisting angles $\chi = \{\chi^b_i, \chi^f_{\alpha,i}\}$ such that
b
ϕRT,i (x + , t) = eχi ϕRT,i (x, t), and ψRT,α,i (x + , t) = eχα,i ψRT,i (x, t).
(1.8)
Our superpotential V is a holomorphic polynomial from $\mathbb{C}^n$ to $\mathbb{C}$, and it determines the coupling of $\varphi_{RT}$ with $\psi_{RT}$. Let $V_i$ denote the directional derivative of V, namely $V_i(z) = \partial V(z)/\partial z_i$. We study a holomorphic polynomial superpotential V with two other basic properties: the potential is quasi-homogeneous (QH) and the potential satisfies certain elliptic bounds (EL). Furthermore, we assume that the twist constants and twisting angles satisfy certain twist relations (TR). Finally, we assume certain normalization conditions (NC). We now briefly summarize these four hypotheses.

QH (Quasi-homogeneity) The superpotential $V : \mathbb{C}^n \to \mathbb{C}$ is a holomorphic, quasi-homogeneous polynomial of degree ñ ≥ 2. This means that there are n constants $\Omega_i$, called quasi-homogeneous weights, such that $0 < \Omega_i \le \tfrac12$ and
$$V(z) = \sum_{i=1}^{n}\Omega_i\, z_i\,\frac{\partial V(z)}{\partial z_i}. \tag{1.9}$$
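A small sketch of what the QH condition (1.9) asks of a concrete polynomial. The operator Σ_i w_i z_i ∂/∂z_i multiplies a monomial z^m by Σ_i w_i m_i, so (1.9) holds exactly when every monomial of V has weighted degree 1. The polynomial representation (a dict from exponent tuples to coefficients), the helper names, and the example V(z₁, z₂) = z₁³ + z₂⁵ are all illustrative assumptions. The sketch also evaluates the central charge ĉ = Σ_i (1 − 2w_i) defined in (1.13) below.

```python
from fractions import Fraction

def is_quasi_homogeneous(monomials, weights):
    """(1.9) holds iff sum_i w_i * m_i == 1 for every monomial z**m of V."""
    return all(sum(w * m for w, m in zip(weights, exps)) == 1
               for exps in monomials)

def central_charge(weights):
    """c-hat = sum_i (1 - 2 w_i), as in (1.13)."""
    return sum(1 - 2 * w for w in weights)

# Hypothetical example: V(z1, z2) = z1**3 + z2**5, stored as {exponents: coefficient}.
V = {(3, 0): 1, (0, 5): 1}
w = (Fraction(1, 3), Fraction(1, 5))  # satisfies 0 < w_i <= 1/2
print(is_quasi_homogeneous(V, w))     # True
print(central_charge(w))              # 14/15
```

Exact rational arithmetic (`Fraction`) is used so that the test "weighted degree equals 1" is not affected by rounding.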
EL (Elliptic Property) Given 0 < ε, there exists M < ∞ such that the function V satisfies
$$|\partial^\alpha V| \le \varepsilon|\partial V|^2 + M, \quad\text{and}\quad |z|^2 + |V| \le M\left(|\partial V|^2 + 1\right). \tag{1.10}$$
Here $\partial^\alpha V$ denotes any multi-derivative of V, while |z| denotes the magnitude of z, and $|\partial V|^2 = \sum_{i=1}^{n}|\partial V/\partial z_i|^2$ is the squared magnitude of the gradient of V.

TR (Twist Relations) Define the 3n twist constants Ω in J as functions of the n quasi-homogeneous weights $\Omega_i$,
$$\Omega^b_i = \Omega_i, \quad \Omega^f_{1,i} = \Omega_i, \quad\text{and}\quad \Omega^f_{2,i} = 1 - \Omega_i. \tag{1.11}$$
Choose the 3n twisting angles χ to be proportional to the twist constants Ω, namely
$$\chi^b_i = \Omega_i\,\phi, \quad \chi^f_{1,i} = \Omega_i\,\phi, \quad\text{and}\quad \chi^f_{2,i} = (1 - \Omega_i)\,\phi, \tag{1.12}$$
where φ is a single twisting parameter that we take to lie in the interval (0, π].

NC (Normalization Conditions) Choose the additive constants in the generators J and P so that the Fock ground state $\Omega_{\text{vac}}$ is an eigenvector with the following eigenvalues²:
$$P\,\Omega_{\text{vac}} = 0 \quad\text{and}\quad J\,\Omega_{\text{vac}} = -\tfrac12\hat c\,\Omega_{\text{vac}}, \quad\text{where}\quad \hat c = \sum_{i=1}^{n}\left(\Omega^f_{2,i} - \Omega^f_{1,i}\right) = \sum_{i=1}^{n}(1 - 2\Omega_i). \tag{1.13}$$
This ensures that J and −J have the same spectrum. In [10] we establish

Proposition 1.1. Assume that V is a holomorphic polynomial satisfying EL of Sect. 1.1.
(i) There exists a self-adjoint quantum field twist Hamiltonian H(V) that is the norm-resolvent limit of a sequence of approximating Hamiltonians defined in Sect. 4.
(ii) The self-adjoint semigroup $e^{-\beta H(V)}$ is trace class for β > 0.
(iii) Suppose in addition that V is quasi-homogeneous, and that the twist constants Ω and the twisting angles χ satisfy TR. Then the Hamiltonian H and the approximating Hamiltonians of (i) all commute with the two-parameter unitary group $e^{i\theta J + i\sigma P}$ of space translations and twists, and they also commute with $\Gamma = (-I)^{N^f}$.

² The constant ĉ recurs in these problems and is called the central charge. In fact ĉ characterizes the weight of the elliptic genus as a modular function, as pointed out in [21].

We introduce some further notation. With ℑ(τ) the imaginary part of τ, let H = {τ : 0 < ℑ(τ)} designate the upper half plane. We use the parameter σ ∈ R, and the strictly positive parameters β, θ, and φ. We take τ =
σ + iβ ∈ H.
(1.14)
In terms of these parameters, define the variables
$$q = e^{2\pi i\tau},\ \text{so } |q| < 1; \qquad y = e^{i\theta},\ \text{so } |y| = 1; \qquad\text{and}\qquad z = e^{i\phi\tau},\ \text{so } |z| < 1. \tag{1.15}$$
Consider partition functions as functions of τ, θ, and φ, related to q, y, and z as above. The Jacobi theta function of the first kind ϑ1(τ, θ), defined for τ ∈ H and θ ∈ C, with period 8 in τ and period 4π in θ, is given by
$$\vartheta_1(\tau,\theta) = i\,q^{1/8}\left(y^{-1/2} - y^{1/2}\right)\prod_{n=1}^{\infty}(1 - q^n)(1 - q^n y)(1 - q^n y^{-1}), \tag{1.16}$$
where $q^{1/8} = e^{\pi i\tau/4}$ and $y^{\pm 1/2} = e^{\pm i\theta/2}$.
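A numerical sketch of the product formula (1.16) and the properties quoted alongside it. The branches q^(1/8) = e^(πiτ/4) and y^(1/2) = e^(iθ/2) are the choices that produce the stated periods 8 and 4π. The check compares the product with the standard Fourier series 2 Σ_n (−1)^n q^((n+½)²/2) sin((2n+1)θ/2), which agrees with it by the Jacobi triple product. It then checks oddness in θ and the two periods. The truncation lengths and the sample point are arbitrary choices.

```python
import cmath

def theta1_product(tau, theta, terms=60):
    """theta_1(tau, theta) from the product (1.16), with q**(1/8) = exp(i pi tau/4)
    and y**(1/2) = exp(i theta/2)."""
    q = cmath.exp(2j * cmath.pi * tau)
    y = cmath.exp(1j * theta)
    val = 1j * cmath.exp(1j * cmath.pi * tau / 4) * (cmath.exp(-0.5j * theta) - cmath.exp(0.5j * theta))
    for n in range(1, terms + 1):
        qn = q ** n
        val *= (1 - qn) * (1 - qn * y) * (1 - qn / y)
    return val

def theta1_series(tau, theta, terms=30):
    """Fourier series 2 * sum_n (-1)**n q**((n + 1/2)**2 / 2) sin((2n + 1) theta / 2)."""
    total = 0j
    for n in range(terms):
        total += ((-1) ** n * cmath.exp(1j * cmath.pi * tau * (n + 0.5) ** 2)
                  * cmath.sin((2 * n + 1) * theta / 2))
    return 2 * total

tau, theta = 0.3 + 0.8j, 0.7 + 0.2j  # arbitrary sample point with Im(tau) > 0
print(theta1_product(tau, theta), theta1_series(tau, theta))
```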
This function is odd in the second variable, namely ϑ1(τ, θ) = −ϑ1(τ, −θ). We follow the standard notation in Sect. 21.3 of Whittaker and Watson [20], with the exceptions noted above.

2. Main Results

We study the partition function
$$Z^{\lambda V} = \operatorname{Tr}_{\mathcal H}\left(\Gamma\, e^{-i\theta J - i\sigma P - \beta H(\lambda V)}\right). \tag{2.1}$$
For V = 0, the heat kernel $e^{-\beta H_0}$ is also trace class, on account of the non-zero twisting parameter φ. Given a non-zero potential V satisfying QH and EL, we associate a family of potentials λV, where λ ∈ [0, 1], and also a generator J of symmetry with parameters Ω specified by TW and normalization given by NC. The partition function $Z^0$ defined by λ = 0 has an implicit dependence on V, brought about through the choice of J. We devote the remainder of this paper to establishing the following theorem and its corollary.

Theorem 2.1. Assume the polynomial potential V of degree ñ ≥ 2 satisfies QH and EL of Sect. 1.1. Consider the self-adjoint Hamiltonian H = H(λV), as defined in Proposition 1.1 for λ ≥ 0. Assume that the twist fields satisfy assumptions TR, and that P and J satisfy NC. (i) The map
$$\lambda \mapsto Z^{\lambda V}(\tau,\theta,\phi) \tag{2.2}$$
is differentiable in λ for λ > 0.
The Elliptic Genus and Hidden Symmetry
95
(ii) Choose α so that 0 ≤ α < 2/(ñ − 1). There exists a constant M = M(α, V) such that for λ ∈ [0, 1],
$$\left|Z^0 - Z^{\lambda V}\right| \le M\lambda^\alpha. \tag{2.3}$$
Corollary 2.2. The map (2.2) is constant for 0 ≤ λ ≤ 1. The partition function $Z^V$ depends on V only through its weights Ω, and it equals
$$Z^V(\tau,\theta,\phi) = z^{\hat c/2}\prod_{i=1}^{n}\frac{\vartheta_1\big(\tau,(1-\Omega_i)(\theta-\phi\tau)\big)}{\vartheta_1\big(\tau,\Omega_i(\theta-\phi\tau)\big)}. \tag{2.4}$$
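Remark. Corollary 2.2 shows that $Z^V(\tau,\theta,\phi)$ extends to a holomorphic function for $\tau \in \mathfrak{H}$, $\theta \in \mathbb{C}$, and $\phi \in \mathbb{C}$. If $a, b, c, d \in \mathbb{Z}$ and $ad - bc = 1$, then $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2,\mathbb{Z})$. Let
$$\tau' = \frac{a\tau+b}{c\tau+d}, \qquad \theta' = \frac{\theta}{c\tau+d}, \qquad\text{and}\qquad \phi' = \frac{\phi\tau}{a\tau+b}. \tag{2.5}$$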
The analytic continuation of the partition function $Z^V(\tau,\theta,\phi)$ obeys the transformation law
$$Z^V(\tau',\theta',\phi') = e^{2\pi i\,\frac{\hat c}{8}\,\frac{c(\theta-\phi\tau)^2}{c\tau+d}}\; Z^V(\tau,\theta,\phi). \tag{2.6}$$
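The substitution (2.5) is arranged so that $\phi'\tau' = \phi\tau/(c\tau+d)$; hence the combination θ − φτ that enters (2.4) and (2.6) rescales as $\theta' - \phi'\tau' = (\theta-\phi\tau)/(c\tau+d)$, while τ' stays in the upper half plane because $\Im(\tau') = \Im(\tau)/|c\tau+d|^2$. The short Python check below verifies both facts numerically for a few SL(2, Z) matrices; the helper name `sl2_action` is ours, not the paper's.

```python
def sl2_action(a: int, b: int, c: int, d: int, tau: complex, theta: complex, phi: complex):
    """Apply the substitution (2.5) for an integer matrix with ad - bc = 1."""
    if a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, Z)")
    tau_p = (a * tau + b) / (c * tau + d)
    theta_p = theta / (c * tau + d)
    phi_p = phi * tau / (a * tau + b)   # a*tau + b != 0 because tau is not real
    return tau_p, theta_p, phi_p


if __name__ == "__main__":
    tau, theta, phi = 0.3 + 1.1j, 0.4 - 0.2j, 0.9
    for a, b, c, d in [(1, 1, 0, 1), (0, -1, 1, 0), (2, 1, 1, 1), (3, 2, 4, 3)]:
        tp, thp, php = sl2_action(a, b, c, d, tau, theta, phi)
        # compare theta' - phi' tau' with (theta - phi tau)/(c tau + d)
        print(tp.imag > 0, abs((thp - php * tp) - (theta - phi * tau) / (c * tau + d)))
```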
One obtains limiting values from the representation (2.4) as the parameters φ, θ, or q vanish; these limits are not uniform and do not commute. Define the integer-valued index of the self-adjoint operator Q with respect to the grading Γ as the difference between the dimension of the kernel and the dimension of the cokernel of Q, regarded as a map from the +1 eigenspace of Γ to the −1 eigenspace of Γ. Denote this integer by Index(Q).

Corollary 2.3. We have the following limits. (i) As φ tends to zero, the partition function converges to³
$$\lim_{\phi\to0} Z^V = \prod_{i=1}^{n}\frac{\vartheta_1\big(\tau,(1-\Omega_i)\theta\big)}{\vartheta_1(\tau,\Omega_i\theta)}. \tag{2.7}$$
As θ → 0, the partition function converges to
$$\lim_{\theta\to0} Z^V = z^{\hat c/2}\prod_{i=1}^{n}\frac{\vartheta_1\big(\tau,(1-\Omega_i)\phi\tau\big)}{\vartheta_1(\tau,\Omega_i\phi\tau)}. \tag{2.8}$$
³ The existence of a field theory for φ = 0 requires special analysis. For λ ≠ 0, this can be established as a consequence of the assumption EL for V. The field theory is the φ → 0 limit of the twist field theory, and the elliptic genus of the limiting theory is the limit (2.7). It agrees with the formula proposed in [17]. In the case λ = 0, the elliptic genus also has a φ → 0 limit as long as 0 < |θ| < 2π, but this limit is not the genus of a limiting theory.
96
A. Jaffe
(ii) For θ ∈ (0, π), we may take the iterated limit as φ → 0 and then q → 0 to obtain the equivariant, quantum-mechanical index studied in [9],
$$\lim_{q\to0}\lim_{\phi\to0} Z^V = \prod_{i=1}^{n}\frac{\sin\big((1-\Omega_i)\theta/2\big)}{\sin(\Omega_i\theta/2)}. \tag{2.9}$$
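(iii) The integer-valued index Index(Q) can be obtained as
$$\operatorname{Index}(Q) = \lim_{\theta\to0}\lim_{\phi\to0} Z^V = \lim_{\phi\to0}\lim_{\theta\to0} Z^V = \lim_{\theta\to0}\lim_{q\to0}\lim_{\phi\to0} Z^V = \prod_{i=1}^{n}\left(\frac{1}{\Omega_i} - 1\right). \tag{2.10}$$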
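The limits in Corollary 2.3 can be observed numerically from the product formula (2.4). The sketch below reuses a truncated version of (1.16), takes $\hat c = \sum_i (1 - 2\Omega_i)$ (an assumption on our part, though it reproduces the values of ĉ quoted in (2.12) and (2.13)), and approximates each iterated limit by evaluating at a single small or large parameter value. With the weights Ω = (1/3, 1/4) of $V = z_1^3 + z_2^4$ it should reproduce Index(Q) = 2 · 3 = 6 from (2.10), the sine product (2.9), and the value 1 of the other order of limits in (2.11).

```python
import cmath
import math


def theta1(tau, theta, terms=60):
    """Truncated Jacobi product (1.16)."""
    q = cmath.exp(2j * cmath.pi * tau)
    q_eighth = cmath.exp(2j * cmath.pi * tau / 8)
    y_half = cmath.exp(0.5j * theta)
    y = y_half * y_half
    product, q_n = 1.0 + 0.0j, 1.0 + 0.0j
    for _ in range(terms):
        q_n *= q
        product *= (1 - q_n) * (1 - q_n * y) * (1 - q_n / y)
    return 1j * q_eighth * (1 / y_half - y_half) * product


def genus(tau, theta, phi, omegas):
    """Right-hand side of (2.4), assuming c_hat = sum(1 - 2 * Omega_i)."""
    c_hat = sum(1 - 2 * w for w in omegas)
    u = theta - phi * tau                          # the combination theta - phi*tau
    value = cmath.exp(0.5j * c_hat * phi * tau)    # z^(c_hat/2) with z = exp(i*phi*tau)
    for w in omegas:
        value *= theta1(tau, (1 - w) * u) / theta1(tau, w * u)
    return value


omegas = (1 / 3, 1 / 4)   # Fermat weights for V = z1^3 + z2^4

# (2.10): phi = 0 first, then theta small; expect prod(1/Omega_i - 1) = 2 * 3
index_estimate = genus(0.2 + 0.9j, 1e-4, 0.0, omegas)

# (2.9): phi = 0, then q -> 0 (large Im tau) at a fixed theta in (0, pi)
theta = 1.0
sine_product = math.prod(math.sin((1 - w) * theta / 2) / math.sin(w * theta / 2)
                         for w in omegas)
q_limit = genus(12j, theta, 0.0, omegas)

# (2.11): phi fixed, q -> 0 and theta = 0; expect 1 rather than the index
other_order = genus(40j, 0.0, 1.0, omegas)

print(index_estimate, q_limit, sine_product, other_order)
```

Evaluating at one small or large parameter only approximates an iterated limit, so agreement holds up to small errors; the point is that the same formula yields the index in one order and 1 in the other.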
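(iv) On the other hand,
$$\lim_{\theta\to0}\lim_{q\to0} Z^V = \lim_{q\to0}\lim_{\theta\to0} Z^V = 1. \tag{2.11}$$
Examples. For any n, if $V(z) = \sum_{i=1}^{n} z_i^{k_i}$ with $2 \le k_i \in \mathbb{Z}$, then V satisfies QH and EL, and
$$\Omega_i = \frac{1}{k_i}, \qquad \hat c = \sum_{i=1}^{n}\frac{k_i - 2}{k_i}, \qquad\text{and}\qquad \operatorname{Index}(Q) = \prod_{i=1}^{n}(k_i - 1). \tag{2.12}$$
For n = 2, with $V(z) = z_1^{k_1} + z_1 z_2^{k_2}$, the potential also satisfies QH and EL. In this case,
$$\Omega_1 = \frac{1}{k_1}, \qquad \Omega_2 = \frac{k_1 - 1}{k_1 k_2}, \qquad \hat c = 2\,\frac{(k_1-1)(k_2-1)}{k_1 k_2}, \qquad\text{and}\qquad \operatorname{Index}(Q) = k_1(k_2 - 1) + 1. \tag{2.13}$$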
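The invariants in (2.12) and (2.13) follow from the weights by exact rational arithmetic. Assuming the standard reading of QH (every monomial of V has weighted degree 1 when $z_i$ carries weight $\Omega_i$), and taking $\hat c = \sum_i (1 - 2\Omega_i)$, a normalization that reproduces both examples, the Python sketch below recomputes ĉ and $\operatorname{Index}(Q) = \prod_i(1/\Omega_i - 1)$ from (2.10). The function names are illustrative.

```python
from fractions import Fraction
from math import prod


def weighted_degree(exponents, omegas):
    """Weighted degree of the monomial z^exponents; QH asks for 1 on every monomial of V."""
    return sum(a * w for a, w in zip(exponents, omegas))


def invariants(omegas):
    """Return (c_hat, Index(Q)) with c_hat = sum(1 - 2*Omega_i), Index = prod(1/Omega_i - 1)."""
    c_hat = sum((1 - 2 * w for w in omegas), Fraction(0))
    index = prod((1 / w - 1 for w in omegas), start=Fraction(1))
    return c_hat, index


def fermat(ks):
    """Weights of V = sum_i z_i^{k_i}."""
    return [Fraction(1, k) for k in ks]


def chain(k1, k2):
    """Weights of V = z1^{k1} + z1 z2^{k2}."""
    return [Fraction(1, k1), Fraction(k1 - 1, k1 * k2)]


if __name__ == "__main__":
    print(invariants(fermat([3, 4])))   # the Fermat example (2.12) with k = (3, 4)
    print(invariants(chain(3, 2)))      # the n = 2 example (2.13) with k1 = 3, k2 = 2
```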
Remark. The integer-valued index (2.10) is stable under a class of perturbations of V that are not necessarily quasi-homogeneous. Briefly, we require that $V = V_1 + V_2$, where $V_1$ satisfies the hypotheses QH and EL above. While $V_2$ is a holomorphic polynomial, it is not necessarily quasi-homogeneous. In place of this, we assume that the perturbation $V_2$ is small with respect to $V_1$ in the following sense: given ε > 0, there exists a constant $M_\epsilon < \infty$ such that for any multi-derivative $\partial^\alpha$ of total degree $|\alpha| \ge 1$,
$$|\partial^\alpha V_2| \le \epsilon\,|\partial V_1| + M_\epsilon. \tag{2.14}$$
3. Supercharge Forms

In this section, we define the supercharge Q as a densely-defined, symmetric, sesquilinear form. In later sections, we consider a family of self-adjoint operators $Q_j$ that are mollifications of Q. The operators $Q_j$ have a norm-resolvent limit, showing that the sesquilinear form Q actually defines an unbounded operator. The definition of Q does not require renormalization.
The Hilbert space of our example is a Fock space $\mathcal{H} = \mathcal{H}_b \otimes \mathcal{H}_f$. The bosonic Hilbert space $\mathcal{H}_b$ and the fermionic Hilbert space $\mathcal{H}_f$ are, respectively, the symmetric and the skew-symmetric tensor algebras over the one-particle space $\mathcal{K}$. Here $\mathcal{K}$ is the direct sum of 2n copies of $L^2(S^1)$. The free Hamiltonian $H_0$, the momentum operator P, the total number operator $N = N^b + N^f$, and the twist generator $J = J(\Omega)$ are self-adjoint operators on $\mathcal{H}$. Here $N^b$ is the total bosonic number operator, and it acts on $\mathcal{H} = \mathcal{H}_b \otimes \mathcal{H}_f$ as $N^b \otimes I$, etc. The bosonic time-zero field ϕ(x), its conjugate field π(x), and the fermionic time-zero fields ψ(x) are operator-valued distributions on $\mathcal{H}$. There is a dense linear subset $\mathcal{D} \subset \mathcal{H}$, obtained by replacing $L^2(S^1)$ by $C_0^\infty(S^1)$ and by taking vectors with a finite number of particles. The domain $\mathcal{D}$ provides a natural domain on which to define operators, and then to extend them by closure. Furthermore the operators N, Γ, $H_0$, P, and $e^{i\theta J}$ all map $\mathcal{D}$ into $\mathcal{D}$. In addition to defining operators with the domain $\mathcal{D}$, we also define sesquilinear forms with domain $\mathcal{D} \times \mathcal{D}$. These are maps from pairs of vectors in $\mathcal{D}$ to $\mathbb{C}$ that are antilinear in the first vector and linear in the second vector. By polarization, each such form can be expressed as a sum of four diagonal elements, namely as a sum of four expectations in vectors in $\mathcal{D}$. On the domain $\mathcal{D} \times \mathcal{D}$, the components of the time-zero fields $\varphi_i(x)$, $\pi_i(x)$, and $\psi_{\alpha,i}(x)$, as well as normal-ordered polynomials in these components, are sesquilinear forms; see for example [2]. The values of the forms defined in this way are $C^\infty$ functions of x. We call them $C^\infty$-sesquilinear forms with the domain $\mathcal{D} \times \mathcal{D}$. Unless we specify otherwise, we use these domains and then extend the resulting operators or forms by closure. Ultimately our goal is to redefine operators and forms with domains determined by the range of a heat kernel of the Hamiltonian. Choose a potential function V satisfying QH and EL.
This potential, as a function of the complex scalar boson field ϕ(x), determines the energy density of our system as follows. Let ψ(x) denote our Dirac field. Monomials in the components of the scalar field $\varphi_i(x)$ (or in the components of the adjoint field, but not simultaneously in the components of the field and of its adjoint) are normal ordered. Since the boson fields and the Dirac fields act on different factors in the tensor product, the product of a normal-ordered boson field and a Dirac field is also normal ordered. Let λ denote a real parameter lying in the interval [0, 1]. Define the normal-ordered density D(λ; x) as the $C^\infty$ sesquilinear form
$$D(\lambda;x) = \sum_{j=1}^{n}\Big(i\psi_{1,j}(x)\big(\pi_j(x) - \partial_x\varphi_j(x)^*\big) + \lambda\,\psi_{2,j}(x)\,V_j(\varphi(x))^*\Big), \tag{3.1}$$
with domain $\mathcal{D} \times \mathcal{D}$. The adjoint of a $C^\infty$ sesquilinear form is also a $C^\infty$ sesquilinear form. Define the sesquilinear form $D(\lambda;x)^*$ by polarization of the expectations $\langle f, D(\lambda;x)^* f\rangle = \overline{\langle f, D(\lambda;x) f\rangle}$, for $f \in \mathcal{D}$. Define the supercharge density Q(λ; x) as the sesquilinear form
$$Q(\lambda;x) = D(\lambda;x) + D(\lambda;x)^*. \tag{3.2}$$
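The integrals of these densities over $S^1$ yield supercharges that are densely-defined sesquilinear forms with the domain $\mathcal{D} \times \mathcal{D}$, namely
$$D(\lambda) = \int_0^\ell D(\lambda;x)\,dx, \qquad\text{and}\qquad Q(\lambda) = \int_0^\ell Q(\lambda;x)\,dx = D(\lambda) + D(\lambda)^*, \tag{3.3}$$
where $D(\lambda)^* = \int_0^\ell D(\lambda;x)^*\,dx$. If we also assume the twist assumption TW, then these forms have the following properties for all λ ∈ [0, 1], all σ, and all θ:
$$\Gamma Q(\lambda) = -Q(\lambda)\Gamma, \qquad e^{i\sigma P}Q(\lambda) = Q(\lambda)\,e^{i\sigma P}, \qquad\text{and}\qquad e^{i\theta J}Q(\lambda) = Q(\lambda)\,e^{i\theta J}. \tag{3.4}$$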
The supercharge that we denote Q(λ), or sometimes Q(λV), is the one that we study most in this paper. Define $D_0 = D(0)$ and $Q_0 = Q(0)$, and define $D_I$ and $Q_I$ so that
$$D(\lambda) = D_0 + \lambda D_I, \qquad\text{and}\qquad Q(\lambda) = Q_0 + \lambda Q_I. \tag{3.5}$$
The supercharge $Q_0$ extends by closure to a self-adjoint operator $Q_0$. (This means that the form obtained by closing $Q_0$ with the domain $\mathcal{D} \times \mathcal{D}$ uniquely determines a self-adjoint operator that we also name $Q_0$.) This operator is essentially self-adjoint on the domain $\mathcal{D}$, and also maps this domain into itself. Furthermore, $Q_0$ commutes with the operator P and with the operator J defined for any Ω. The operator $Q_0$ anticommutes with $\Gamma = (-I)^{N^f}$. The square of the supercharge operator $Q_0$ has the property
$$Q_0^2 = H_0 + P. \tag{3.6}$$
As $Q_0$ commutes with P, it follows from (3.6) that $Q_0$ commutes with $H_0$. Furthermore, as $P \le H_0$, we have $H_0 + P \le 2H_0$; since $|Q_0| = (H_0 + P)^{1/2}$, we obtain the elementary inequality of forms
$$\pm Q_0 \le |Q_0| \le \sqrt{2}\,H_0^{1/2}. \tag{3.7}$$
We also require the second component of the supercharge. This is a sesquilinear form $Q_2(\lambda)$, defined as the integral $\int_0^\ell Q_2(\lambda;x)\,dx$ of the density $Q_2(\lambda;x) = D_2(\lambda;x) + D_2(\lambda;x)^*$. Here $D_2(\lambda;x)$ is the $C^\infty$ sesquilinear form
$$D_2(\lambda;x) = \sum_{j=1}^{n}\Big(i\psi_{2,j}(x)\big(\pi_j(x)^* + \partial_x\varphi_j(x)\big) + \lambda\,\psi_{1,j}(x)\,V_j(\varphi(x))\Big)\,e^{-i\phi x/\ell}. \tag{3.8}$$
As with the first component of the supercharge,
$$\Gamma Q_2(\lambda) = -Q_2(\lambda)\Gamma, \tag{3.9}$$
and we have the decomposition
$$Q_2(\lambda) = Q_{2,0} + \lambda Q_{2,I}, \tag{3.10}$$
where $Q_{2,0}$ and $Q_{2,I}$ are independent of λ. The form $Q_{2,0}$ uniquely determines a self-adjoint operator that we also denote as $Q_{2,0}$. However, unlike the case of the operator $Q_0$, the operator $Q_{2,0}$ is neither translation invariant nor twist invariant. Nevertheless, the square of $Q_{2,0}$ is invariant under both these groups. This square equals
$$Q_{2,0}^2 = H_0 - P + \phi R, \tag{3.11}$$
where
$$R = -\frac{2}{\ell}\sum_{i=1}^{n}\int_0^\ell :\psi_{2,i}(x)\,\psi_{2,i}(x)^*:\,dx. \tag{3.12}$$
Here :·: denotes normal ordering. An explicit representation of $\frac{\ell}{2}R$ can be given as a difference of two terms, each term being a sum of number operators for a subset of the fermionic modes, see [4]. This ensures, in particular, that
$$\pm R \le \frac{2}{\ell}\,N, \tag{3.13}$$
where N denotes the total number operator.

4. Approximating Supercharge Operators

In order to study the properties of the Hamiltonian, we introduce approximating families of supercharge forms $Q_j(\lambda)$ indexed by a parameter $j \in [0, \infty]$, and with the property $Q_0(\lambda) = Q_0$ and $Q_\infty(\lambda) = Q(\lambda)$. Let
$$Q_j(\lambda) = Q_0 + \lambda Q_{I,j}, \qquad\text{and}\qquad Q_{2,j}(\lambda) = Q_{2,0} + \lambda Q_{2,I,j}. \tag{4.1}$$
In [4] we introduce a family of mollifier functions $\kappa^b_{i,j}$ and $\kappa^f_{\alpha,i,j}$ for the scalar and Dirac fields, respectively. These mollifiers act by convolution, with a particular mollifier for each field component. The mollifiers have an index j that specifies a momentum scale for the mollifier, and each mollifier converges to the Dirac measure δ as j → ∞. We define mollified time-zero fields $\varphi_j(x)$ and $\psi_j(x)$ as sesquilinear forms with components
$$\varphi_{i,j}(x) = \int_0^\ell \kappa^b_{i,j}(x-y)\,\varphi_i(y)\,dy, \qquad\text{and}\qquad \psi_{\alpha,i,j}(x) = \int_0^\ell \kappa^f_{\alpha,i,j}(x-y)\,\psi_{\alpha,i}(y)\,dy. \tag{4.2}$$
We apply the mollifiers only to the fields that occur in the terms $Q_{I,j}$ and $Q_{2,I,j}$. These terms are the interaction terms and are proportional to λ; in this way we mollify the boson and the fermion fields symmetrically. We construct the mollifiers from a single smooth, positive function κ̃ as follows. Let
$$\tilde\kappa_{\rm sdi}(k) = \left(\frac{1}{1+k^2}\right)^{\epsilon}, \tag{4.3}$$
where 0 < ε ≤ ε(V), and where we choose ε(V) sufficiently small. We choose for κ̃(k) any smooth function such that
$$\tilde\kappa_{\rm sdi}(k) \le \tilde\kappa(k) \le \tilde\kappa(0) = 1. \tag{4.4}$$
The lower bound on κ̃(k) by the strictly positive function $\tilde\kappa_{\rm sdi}(k)$ is the property that we call slow decrease at infinity, or sdi, and it ensures that κ̃(k) is sufficiently close to being local, i.e. to κ̃(k) = 1. We introduced this sdi property in [14] in order to establish stability for a purely bosonic, bilocal interaction. In the supersymmetric case, the mollified Hamiltonian is bilocal, and it is therefore natural to use an sdi mollifier. In [10] we establish stability based on these ideas. We represent the trace of the heat kernel of an approximate Hamiltonian as a functional integral. The sdi property allows us to study a partition of unity of function space, and to show on each patch that the bilocal bosonic self-interaction can be bounded by a similar local self-interaction (with a coefficient that depends on the patch). The method is sufficiently robust that we can also estimate the non-local contributions from the fermionic determinant. We describe this phenomenon in more detail in Sect. 5. We define the family of periodic mollifier functions indexed by j by the Fourier series
$$\kappa_j(x) = \frac{1}{2\pi}\sum_{k\in\mathbb{Z}}\tilde\kappa(k/j)\,e^{-ikx}, \tag{4.5}$$
where the series for $\kappa_j$ converge in the sense of distributions. Each kernel $\kappa_j(x)$ satisfies
$$\kappa_j(x+\ell) = \kappa_j(x). \tag{4.6}$$
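A small numerical sketch of (4.3) to (4.5) may help fix ideas. It assumes κ̃ = κ̃_sdi itself (allowed by (4.4)), ε = 0.1, a truncation of the series (4.5) to |k| ≤ 200, and the 2π-periodic normalization of the series exactly as written in (4.5); all of these are illustrative choices. The check shows that the truncated κ_j is real, even, and periodic, has total integral κ̃(0) = 1, and that convolution by κ_j multiplies the mode cos(3x) by κ̃(3/j), a factor that tends to 1 as j → ∞; this is the sense in which κ_j → δ.

```python
import math


def kappa_tilde(k: float, eps: float = 0.1) -> float:
    """The sdi profile (4.3); we take kappa~ = kappa~_sdi, which is allowed by (4.4)."""
    return (1.0 + k * k) ** (-eps)


def kappa_j(x: float, j: float, cutoff: int = 200, eps: float = 0.1) -> float:
    """Partial sum |k| <= cutoff of the series (4.5); real and even since kappa~ is."""
    total = kappa_tilde(0.0, eps)
    for k in range(1, cutoff + 1):
        total += 2.0 * kappa_tilde(k / j, eps) * math.cos(k * x)
    return total / (2.0 * math.pi)


def smooth(f, x0: float, j: float, n_points: int = 1024, cutoff: int = 200) -> float:
    """(kappa_j * f)(x0) on [0, 2*pi) by the n-point rectangle rule.

    The rule is exact for trigonometric polynomials of degree < n_points, so for
    f = cos(m*y) with m small the only error is floating-point rounding.
    """
    h = 2.0 * math.pi / n_points
    return h * sum(kappa_j(x0 - r * h, j, cutoff) * f(r * h) for r in range(n_points))


if __name__ == "__main__":
    for j in (1.0, 10.0, 1000.0):
        smoothed = smooth(lambda y: math.cos(3 * y), 0.4, j)
        # the multiplier kappa~(3/j) approaches 1 as j grows, i.e. kappa_j -> delta
        print(j, smoothed / math.cos(1.2), kappa_tilde(3 / j))
```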
Denote by S the space of $C^\infty$ periodic functions on the circle. Let $\kappa_j$ denote the integral operator $\kappa_j : S \to S$ defined by the integral kernel $\kappa_j(x,y) = \kappa_j(x-y)$. In other words, $\kappa_j$ is the operator of convolution by $\kappa_j(x)$ on S. Given the usual topology on these smooth functions, the adjoint $\kappa_j^+$ of the operator $\kappa_j$ acts on the dual space of distributions on the circle, defined by $(\kappa_j^+\varphi)(f) = \varphi(\kappa_j f)$. This adjoint is an integral operator with the kernel $\kappa_j^+(x,y) = \kappa_j(y-x) = \kappa_j(x-y)$. Consider the space $S^b_i = e^{-i\Omega_i\phi x/\ell}S$ of smooth functions on the circle satisfying the twist relation $f(x+\ell) = e^{-i\Omega_i\phi}f(x)$. These are the test functions for the components of the bosonic, time-zero, twist field. Likewise define the spaces $S^f_{1,i} = S^b_i$ and $S^f_{2,i} = e^{-i(1-\Omega_i)\phi x/\ell}S$. For each j, define operators $\kappa^b_{i,j}$ acting on $S^b_i$, operators $\kappa^f_{1,i,j}$ acting on $S^f_{1,i}$, and operators $\kappa^f_{2,i,j}$ acting on $S^f_{2,i}$. To simplify notation we designate the mollifiers acting on the dual space by $\kappa^b_{i,j}$, etc., without the adjoints, defining them as the convolution operators with kernels
$$\kappa^b_{i,j}(x) = \kappa^f_{1,i,j}(x) = e^{i\Omega_i\phi x/\ell}\,\kappa_j(x), \qquad\text{and}\qquad \kappa^f_{2,i,j}(x) = e^{i(1-\Omega_i)\phi x/\ell}\,\kappa_j(x). \tag{4.7}$$
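The kernels satisfy $\kappa^b_{i,j}(x) = \overline{\kappa^b_{i,j}(-x)}$, and similarly $\kappa^f_{\alpha,i,j}(x) = \overline{\kappa^f_{\alpha,i,j}(-x)}$. They satisfy the twist relations
$$\kappa^b_{i,j}(x+\ell) = \kappa^f_{1,i,j}(x+\ell) = e^{i\Omega_i\phi}\,\kappa^b_{i,j}(x), \qquad\text{and}\qquad \kappa^f_{2,i,j}(x+\ell) = e^{i(1-\Omega_i)\phi}\,\kappa^f_{2,i,j}(x). \tag{4.8}$$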
The operators $\kappa^b_{i,j}$ converge as j → ∞ to the identity as operators on $S^b_i$, and similarly for $\kappa^f_{\alpha,i,j}$ on $S^f_{\alpha,i}$. Correspondingly, the kernels converge as distributions to a Dirac measure δ,
$$\lim_{j\to\infty}\kappa^b_{i,j}(x) = \lim_{j\to\infty}\kappa^f_{\alpha,i,j}(x) = \delta(x). \tag{4.9}$$
Also define n families of spatially dependent kernels $v_{i,j}(x)$ by the Fourier representations, which converge in the sense of distributions,
$$v_{i,j}(x) = e^{i(1-\Omega_i)x\phi/\ell}\,\frac{1}{2\pi}\sum_{k\in\mathbb{Z}}|\tilde\kappa(k/j)|^2\,e^{-ikx}. \tag{4.10}$$
In the sense of distributions,
$$\lim_{j\to\infty} v_{i,j}(x) = \delta(x). \tag{4.11}$$
With these definitions, we establish in [10] that the forms $Q_j$ and $Q_{2,j}$ determine self-adjoint operators. The operators $Q_j$ have the properties
$$\Gamma Q_j = -Q_j\Gamma, \qquad\text{and}\qquad Q_j\,e^{i\theta J + i\sigma P} = e^{i\theta J + i\sigma P}\,Q_j, \tag{4.12}$$
for all real θ, σ. Furthermore the operators $Q_{2,j}$ satisfy
$$\Gamma Q_{2,j} = -Q_{2,j}\Gamma. \tag{4.13}$$
But these operators $Q_{2,j}$ do not commute with the group $e^{i\theta J + i\sigma P}$. The operator $Q_j$ satisfies the normal relation between the first component of a supercharge and a Hamiltonian $H_j$,
$$Q_j^2 = H_j + P. \tag{4.14}$$
Here the Hamiltonian $H_j$ is a perturbation of the free, twist-field Hamiltonian $H_0 = H_0^b + H_0^f$, and has the (non-local) form
$$H_j = H_j(\lambda V) = H_0 + \sum_{i=1}^{n}\int_0^\ell dx\int_0^\ell dy\; V_i(\varphi_j(x))^*\,\lambda^2 v_{i,j}(x-y)\,V_i(\varphi_j(y)) + \lambda\left(Y_j + Y_j^*\right), \tag{4.15}$$
where the boson-fermion coupling $Y_j$ is the generalized Yukawa interaction
$$Y_j = Y_j(V) = \sum_{i,i'=1}^{n}\int_0^\ell \psi_{1,i,j}(x)\,\psi_{2,i',j}(x)^*\,V_{ii'}(\varphi_j(x))\,dx. \tag{4.16}$$
On account of the positive-definite nature of the kernel $\lambda^2 v_{i,j}(x)$, the bosonic part of $H_j$, namely
$$H_j^b = H_0^b + \sum_{i=1}^{n}\int_0^\ell dx\int_0^\ell dy\; V_i(\varphi_j(x))^*\,\lambda^2 v_{i,j}(x-y)\,V_i(\varphi_j(y)), \tag{4.17}$$
is a sum of positive operators. In fact, the bosonic Hamiltonian can also be written
$$H_j^b = H_0^b + \lambda^2 Q_{I,j}^2, \tag{4.18}$$
where we note the identity
$$Q_{I,j}(V)^2 = \sum_{i=1}^{n}\int_0^\ell dx\int_0^\ell dy\; V_i(\varphi_j(x))^*\,v_{i,j}(x-y)\,V_i(\varphi_j(y)). \tag{4.19}$$
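The positivity behind (4.17) can be seen in a discretized model: a translation-invariant kernel whose Fourier coefficients $|\tilde\kappa(k/j)|^2$ are nonnegative yields a positive semidefinite quadratic form. The Python sketch below drops the twist phase $e^{i(1-\Omega_i)x\phi/\ell}$ from (4.10); this is harmless for positivity, since that factor equals $e^{iax}e^{-iay}$ with $a = (1-\Omega_i)\phi/\ell$ and only conjugates the form by a unitary multiplication operator. The grid size, cutoff, ε, and the 2π normalization of the circle are illustrative assumptions.

```python
import cmath
import math


def kappa_tilde(k: float, eps: float = 0.1) -> float:
    """The sdi profile (4.3), used here as the mollifier kappa~ (allowed by (4.4))."""
    return (1.0 + k * k) ** (-eps)


def v_untwisted(x: float, j: float, cutoff: int = 20) -> float:
    """(4.10) without the twist phase: (1/2pi) * sum_{|k|<=cutoff} kappa~(k/j)^2 e^{-ikx}."""
    total = kappa_tilde(0.0) ** 2
    for k in range(1, cutoff + 1):
        total += 2.0 * kappa_tilde(k / j) ** 2 * math.cos(k * x)
    return total / (2.0 * math.pi)


def circulant_spectrum(j: float, n_points: int = 64, cutoff: int = 20) -> list:
    """Eigenvalues of the matrix C[m][n] = v(x_m - x_n) on a uniform grid of [0, 2pi).

    C is circulant, so its eigenvalues are the discrete Fourier transform of its first column.
    """
    h = 2.0 * math.pi / n_points
    column = [v_untwisted(r * h, j, cutoff) for r in range(n_points)]
    return [sum(column[r] * cmath.exp(-2j * math.pi * p * r / n_points) for r in range(n_points))
            for p in range(n_points)]


if __name__ == "__main__":
    spectrum = circulant_spectrum(4.0)
    print(min(z.real for z in spectrum), max(abs(z.imag) for z in spectrum))
```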
The bosonic Hamiltonian $H_j^b$ is not normal ordered and, unlike $H_j(\lambda V)$, it has no limit as j → ∞. The second family of approximate supercharges $Q_{2,j}$ is also related to $H_j$. However, their squares have an error term relative to the standard supersymmetry algebra,
$$Q_{2,j}^2 = H_j - P + \phi R, \tag{4.20}$$
where R is the same operator that arose when analyzing the square of the free supercharge $Q_{2,0}$. The error term is given in (3.12). We use the following result from [10]:

Proposition 4.1. Assume the potential V satisfies the assumptions QH and EL, assume the relations TR, and assume the definitions of $Q_j$, $Q_{2,j}$, $H_j$, and P in Sect. 3 and Sect. 4. Then the forms $Q_j$, $Q_{2,j}$, $H_j$, and P define self-adjoint operators on $\mathcal{H}$. The operators $H_j$ are bounded from below. The operators $Q_j$, $H_j$, and P mutually commute, and they also commute with J.

5. Estimates on Operators

We consider here the basic properties of the Hamiltonian and the supercharges. This leads to consideration of estimates that involve implicit renormalization cancellations. These estimates depend only on the form of the underlying operators $H_j$, P, N, etc., and they lead to inequalities of operators or their norms. These estimates do not involve further cancellations of the sort that arise in the proof of estimates on partition functions, which we consider in the following section.
5.1. A Priori Estimates. The results here require certain a priori estimates involving the family of Hamiltonians $H_j = H_j(\lambda V)$, or the associated family of self-adjoint semigroups $e^{-\beta H_j(\lambda V)}$ that the $H_j(\lambda V)$ generate. The proofs of these estimates are lengthy, so we establish them as the central results in the companion paper [10]. These estimates are of utmost importance, so we give an overview by collecting together the necessary statements. Within the context of constructive quantum field theory, the estimates we assume are of a standard nature, though they have not been previously proved in the context of zero-mass (twist) fields that we use here. The operators occurring in this section have been introduced earlier in this paper. For more details about these definitions, see [4]; for analytic details, see [10].

• In case the following inequalities involve β, we take β > 0. We choose a given, fixed φ ∈ (0, π], and a given, fixed V satisfying QH and EL of Sect. 1.1, and we define the Hamiltonian with the twist relations TR. The operators in question act on a Fock space $\mathcal{H} = \mathcal{H}(\Omega, \phi)$ depending on the parameters Ω, φ. We fix these parameters throughout the approximations in this paper. By convention, we generally do not note the dependence of constants on φ, while we generally indicate the dependence on V.
• We require certain estimates that are uniform in j, the parameter that designates the high-frequency mollifier. There exist positive, finite constants $M_1 = M_1(V)$, $M_2 = M_2(V)$, and $M = M(\beta, V)$ that are independent of j and of λ ∈ (0, 1], and such that
$$N \le M_1 H_j(\lambda V) + M_2, \tag{5.1}$$
$$H_0^{1/2} \le M_1 H_j(\lambda V) + M_2, \tag{5.2}$$
and
$$\operatorname{Tr}_{\mathcal H}\, e^{-\beta H_j(\lambda V)} \le M(\beta). \tag{5.3}$$
There exists a self-adjoint R(β) = R(β; λ), which is a semigroup in β and which depends on the parameter λ, such that
$$\left\|e^{-\beta H_j(\lambda V)} - R(\beta)\right\| \to 0, \quad\text{as } j \to \infty, \tag{5.4}$$
for each λ ∈ (0, 1].

• We require the following estimate, which is not uniform in j. Given j, there exist constants $M_1 = M_1(j, V)$ and $M_2 = M_2(j, V)$ such that for all λ ∈ [0, 1],
$$H_0 + \lambda^2 Q_{I,j}^2 \le M_1 H_j(\lambda V) + M_2. \tag{5.5}$$
Remarks. It is no loss of generality in (5.5) to increase the constants, if necessary, so that in addition $1 \le M_1$ and $H_0 + \lambda^2 Q_{I,j}^2 + I \le M_1 H_j(\lambda V) + M_2$, so $H_0 + \lambda^2 Q_{I,j}^2 + I \le M_1\big(H_j(\lambda V) + M_2\big)$ as well. We make this assumption. From the norm convergence of semigroups (5.4), we infer that the limiting semigroup R(β) has a self-adjoint generator H = H(λV). This defines the limiting Hamiltonian, and $R(\beta) = e^{-\beta H(\lambda V)}$. The uniform bound on the trace of $e^{-\beta H_j(\lambda V)}$ ensures that H(λV) is bounded from below,⁴ and there exists a constant $M_3 = M_3(\lambda V)$ such that
$$0 \le H(\lambda V) + M_3. \tag{5.6}$$
For this limiting theory, there is a self-adjoint operator Q = Q(λV) that commutes with P and anticommutes with Γ, for which
$$Q(\lambda V)^2 = H(\lambda V) + P. \tag{5.7}$$
⁴ Without good control over convergence, such as the norm convergence of semigroups that is the case here, a uniform bound like (5.1) or (5.2) on $H_j(\lambda V)$ is insufficient information to establish a lower bound on H(λV).

We comment briefly on the mollifier κ̃(k/j) that we employ, rather than a mollifier that, for example, completely eliminates Fourier modes with large |k|. In the latter case, the approximating Hamiltonians have not been proved to be bounded from below. We first studied the special advantages of a mollifier with slow decrease at infinity in [14], where we used this property, which expresses "almost-locality", to show that a class of bosonic Hamiltonians are bounded from below. We showed that the normal-ordered, purely bosonic bilocal Hamiltonian $:H_j^b:$, with $H_j^b$ of the form (4.17), is bounded from below. Specifically, in [14] we treat the case with n = 1, with a massive (rather than a massless) unperturbed Hamiltonian $H^b_{0,m}$, and with no twist, φ = 0.

We outline the basic idea of our method in [10] to utilize the slowly decreasing property of the mollifier to prove the estimate (5.3). We begin by representing $\operatorname{Tr}_{\mathcal H} e^{-\beta H_j}$ as a functional integral. This is the functional integral for the normal-ordered, purely bosonic action, multiplied by a regularized Fredholm determinant arising from the expectation in the fermionic modes. We insert an appropriate partition of unity $1 = \sum_{i=1}^{\infty}\chi_i$ into this integral, thus dividing the integration into a sum of integrals over patches. To obtain an effective bound, we need to replace the non-local bosonic part of the action by a related local term. We do this on each patch, using several things: the positive-definite form of the interaction term, and the explicit form of the mollifier function κ̃(k/j), in particular its monotonic property and its slowly decreasing character as a function of |k|. Using these properties, we bound the bilocal (bosonic) action from below on the patch $\chi_i$. We obtain a lower bound on the bilocal action with the non-local coupling constant $\lambda^2 v_{i,j}(x-y)$ by a similar local action, but with a local coupling constant of the form $\lambda^2\tilde\kappa(i^d/j)^2\,\delta(x-y)$. Here d + 1 = ñ denotes the degree of the polynomial V. The coefficient of λ² here is $\tilde\kappa(i^d/j)^2$, and this vanishes as i → ∞ (namely at high momentum). In fact, for constant j, we have the asymptotics $\lambda^2\tilde\kappa(i^d/j)^2 \sim \lambda^2 i^{-2\epsilon}$. We use the local action to estimate further non-local perturbations of lower degree, as well as local perturbations of lower degree, on the patch $\chi_i$. This results in an additive, constant error term $r_i$ with magnitude $|r_i| \le o(1)\,\tilde\kappa(i^d/j)^{-2} \le o(1)\,i^{2\epsilon}$, which diverges as i → ∞. The measure $|\chi_i|$ of the set $\chi_i$ satisfies $|\chi_i| \le e^{-i^{\epsilon'}}$, where $\epsilon' = \epsilon'(V) > 0$. This constant is small, and it depends only on the polynomial V. Therefore, fixing V, we can choose ε(V) ≥ ε > 0 sufficiently small so that the product $e^{|r_i|}|\chi_i|$ is small for large i. When summed over i, it leads to a finite estimate on the integral. We also use the approximate local bosonic action to estimate the non-local terms arising from the regularized Fredholm determinants.
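The summation step at the end of this argument can be illustrated with made-up constants. As a stand-in for the estimates of [10], suppose $|r_i| \le c\,i^a$ with a = 2ε and $|\chi_i| \le e^{-i^b}$ with b = ε'. Then $\sum_i e^{|r_i|}|\chi_i|$ converges whenever a < b, which is why ε is chosen small relative to ε'; for a > b the terms themselves grow without bound. The exponents in this Python sketch are illustrative assumptions, not values from the paper.

```python
import math


def patch_term(i: int, a: float, b: float, c: float = 1.0) -> float:
    """Model for exp(|r_i|) * |chi_i| with |r_i| <= c * i**a and |chi_i| <= exp(-i**b)."""
    return math.exp(c * i ** a - i ** b)


def partial_sum(n_max: int, a: float, b: float) -> float:
    """Sum of the model terms over the patches i = 1, ..., n_max."""
    return sum(patch_term(i, a, b) for i in range(1, n_max + 1))


if __name__ == "__main__":
    # a = 2*eps with eps = 0.05, b = eps' = 0.5: the sum converges (a < b)
    print(partial_sum(10_000, 0.1, 0.5), partial_sum(20_000, 0.1, 0.5))
    # a > b: the terms themselves grow, so no finite estimate is possible
    print(patch_term(100, 0.8, 0.5), patch_term(1000, 0.8, 0.5))
```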
In this fashion we establish the uniform upper bound (5.3) on the trace of the family of approximating heat kernels. The method to establish the remaining bounds is similar.

5.2. Traces. In this section we collect a few general remarks that we use later. The Schatten p-norm of an operator T on $\mathcal{H}$ is defined as
$$\|T\|_p = \left(\operatorname{Tr}_{\mathcal H}\,(T^*T)^{p/2}\right)^{1/p}.$$
These norms satisfy Hölder's inequalities $\|TS\|_r \le \|T\|_p\,\|S\|_q$, where r = pq/(p + q) and 1 ≤ r, p, q. Furthermore, the trace norm $\|\cdot\|_1$ is also given by $\|T\|_1 = \sup_{U \text{ unitary}}|\operatorname{Tr}_{\mathcal H}(UT)|$, see Sect. III of [18]. Thus
$$|\operatorname{Tr}_{\mathcal H}(T)| \le \|T\|_1. \tag{5.8}$$
An operator T with $\|T\|_1 < \infty$ is said to be trace class, and such operators have a basis-independent trace. A sufficient condition to ensure the cyclicity identity of the trace,
$$\operatorname{Tr}_{\mathcal H}(AB) = \operatorname{Tr}_{\mathcal H}(BA), \tag{5.9}$$
is that A is trace class and B is bounded. One says that a self-adjoint semigroup R(t) is C-summable if there is a function M(t) < ∞ such that $\|R(t)\|_1 < M(t)$ for all 0 < t. A family of semigroups $R_j(t)$ is uniformly C-summable if
$$\|R_j(t)\|_1 \le M(t), \tag{5.10}$$
for all j.
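The trace-norm facts collected here are easy to test in finite dimensions, where every operator is trace class. The NumPy sketch below is an illustration only (random finite matrices stand in for operators on $\mathcal{H}$): it computes Schatten norms from singular values, checks Hölder's inequality for a few exponent pairs, checks the bound (5.8), confirms that the supremum defining $\|T\|_1$ is attained at the unitary $U = VW^*$ built from a singular value decomposition $T = W\Sigma V^*$, and compares $\|T\|_2$ with the Hilbert-Schmidt (Frobenius) norm.

```python
import numpy as np


def schatten_norm(t: np.ndarray, p: float) -> float:
    """||T||_p = (Tr (T*T)^(p/2))^(1/p) from the singular values; p = inf gives the operator norm."""
    s = np.linalg.svd(t, compute_uv=False)
    if np.isinf(p):
        return float(s.max())
    return float(np.sum(s ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
S = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

# Hoelder: ||T S||_r <= ||T||_p ||S||_q with 1/r = 1/p + 1/q
for p, q in [(2.0, 2.0), (3.0, 6.0), (1.0, np.inf), (4.0, 4.0)]:
    r = 1.0 / (1.0 / p + 1.0 / q)
    print(p, q, schatten_norm(T @ S, r), schatten_norm(T, p) * schatten_norm(S, q))

# ||T||_1 = sup_U |Tr(U T)|, attained at U = V W* when T = W diag(s) V*
W, s, Vh = np.linalg.svd(T)
U = Vh.conj().T @ W.conj().T
print(abs(np.trace(U @ T)), schatten_norm(T, 1.0), abs(np.trace(T)))
```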
Proposition 5.1. Assume that $\{R_j(t)\}$ is a family of uniformly C-summable semigroups on a Hilbert space $\mathcal{H}$, and assume that $\|R_j(t) - R(t)\| \to 0$ as j → ∞. Then R(t) is trace class, and $R_j(t)$ converges to R(t) in trace norm,
$$\lim_{j\to\infty}\|R_j(t) - R(t)\|_1 = 0, \qquad\text{and}\qquad \|R(t)\|_1 \le M(t), \quad\text{for all } 0 < t. \tag{5.11}$$
Furthermore, for any bounded operator A,
$$\operatorname{Tr}_{\mathcal H}\big(AR(t)\big) = \lim_{j\to\infty}\operatorname{Tr}_{\mathcal H}\big(AR_j(t)\big), \quad\text{for all } 0 < t. \tag{5.12}$$
Proof. Write
$$R_j(t) - R_m(t) = R_j(t/2)\big(R_j(t/2) - R_m(t/2)\big) + \big(R_j(t/2) - R_m(t/2)\big)R_m(t/2). \tag{5.13}$$
Thus by Hölder's inequality,
$$\|R_j(t) - R_m(t)\|_1 \le 2M(t/2)\,\|R_j(t/2) - R_m(t/2)\|. \tag{5.14}$$
Hence $R_j(t)$ is a Cauchy sequence in the Schatten ideal of trace-class operators. Thus there exists a trace-class limit $\tilde R(t)$, for which
$$\|R_j(t) - \tilde R(t)\|_1 \to 0, \qquad\text{and}\qquad \|\tilde R(t)\|_1 \le M(t). \tag{5.15}$$
Since $\|R_j(t) - \tilde R(t)\| \le \|R_j(t) - \tilde R(t)\|_1$, we infer from (5.15) that $\tilde R(t) = R(t)$. Since R(t) and $R_j(t)$ are trace class, if A is bounded then AR(t) and $AR_j(t)$ are also trace class. For a trace-class operator T, we use (5.8) and Hölder's inequality to obtain
$$\left|\operatorname{Tr}_{\mathcal H}\big(AR_j(t) - AR(t)\big)\right| \le \|AR_j(t) - AR(t)\|_1 \le \|A\|\,\|R_j(t) - R(t)\|_1, \tag{5.16}$$
from which (5.12) follows. This completes the proof of the proposition. □
Lemma 5.2. Let $e^{-\beta H}$ be a self-adjoint, C-summable semigroup, and let A be a bounded operator on $\mathcal{H}$. Then the map
$$(\sigma, \beta) \mapsto \operatorname{Tr}_{\mathcal H}\big(A\,e^{i\sigma P - \beta H}\big) \tag{5.17}$$
extends holomorphically in β to all $i\beta \in \mathfrak{H}$ (keeping σ ∈ ℝ fixed). Suppose the unitary group $e^{i\sigma P}$ is a symmetry of H, and there exist constants $M_1, M_2 < \infty$ such that
$$\pm P \le M_1 H + M_2. \tag{5.18}$$
Then for $i\beta \in \mathfrak{H}$, the map (5.17) extends analytically in σ into a strip about the real axis of width proportional to ℜ(β), and otherwise depending only on $M_1$ and $M_2$.

Proof. Theta summability ensures that H is bounded from below, so it is no loss of generality to add a constant to H so that H ≥ I. With this convention, we can replace (5.18) by the assumption that there exists a constant $M = M(M_1, M_2) < \infty$ such that
$$\pm P \le M H. \tag{5.19}$$
To prove analytic continuation in β, it is sufficient to establish a neighborhood of absolute convergence for the power series in ε of
$$\operatorname{Tr}_{\mathcal H}\, e^{-(\beta+\epsilon)H} = \sum_{n=0}^{\infty}\frac{(-\epsilon)^n}{n!}\operatorname{Tr}_{\mathcal H}\big(H^n e^{-\beta H}\big),$$
starting initially with real β. Express β in its real and imaginary parts, β = ℜ(β) + iℑ(β). The operator $e^{-i\Im(\beta)H}$ is unitary, so for 0 < ℜ(β), the operator $H^n e^{-\beta H/2}$ is bounded in norm by $(n/\Re(\beta))^n$. So using Hölder's inequality and (5.8),
$$\left|\operatorname{Tr}_{\mathcal H}\big(H^n e^{-\beta H}\big)\right| \le \big(n/\Re(\beta)\big)^n\,\big\|e^{-\Re(\beta)H/2}\big\|_1.$$
Then the exponential series converges absolutely for |ε| < ℜ(β)/e, yielding
$$\sum_{n=0}^{\infty}\frac{|\epsilon|^n}{n!}\left|\operatorname{Tr}_{\mathcal H}\big(H^n e^{-\beta H}\big)\right| \le \big(1 - |\epsilon|e/\Re(\beta)\big)^{-1}\big\|e^{-\Re(\beta)H/2}\big\|_1 < \infty, \tag{5.20}$$
as desired. We assume that P and H commute, so we simultaneously diagonalize these operators. We conclude from the spectral representation and (5.19) that $|P|^n \le M^n H^n$ for non-negative integers n. Proceeding as above in the domain |ε| < ℜ(β)/(Me), the power series in ε for $e^{i(\sigma+\epsilon)P}e^{-\beta H/2}$ converges absolutely in operator norm. Using Hölder's inequality and (5.8), it then follows that $\operatorname{Tr}_{\mathcal H}\big(A\,e^{i\sigma P - \beta H}\big)$ is real analytic in σ for $i\beta \in \mathfrak{H}$, and the proof is complete. □

Proposition 5.3. Assume quantum twist fields interact, with the nonlinearity determined by a polynomial V as specified above. Assume QH, EL, and TR of Sect. 1.1. Then there exist constants $M_1$ and $M_2$, independent of j, such that
$$\pm P \le M_1 H_j + M_2. \tag{5.21}$$
As a consequence, with a new constant $M_1$,
$$Q_j^2 \le M_1 H_j + M_2. \tag{5.22}$$
Proof. The identity $Q_j^2 = H_j + P$ of (4.14) gives an upper bound on −P,
$$-P \le H_j. \tag{5.23}$$
In order to obtain an upper bound on P, we take into account the details concerning the second component of the supercharge, $Q_{2,j}$. From the relation (4.20) we infer that
$$P \le H_j + \phi R. \tag{5.24}$$
Thus to establish an upper bound on P, it is sufficient to establish an upper bound on R in terms of $H_j$. We use the explicit form for R in (3.12), and the comment following it; see [4] for details. It therefore follows that R satisfies the bound
$$\pm R \le \frac{2}{\ell}\,N, \tag{5.25}$$
where N is the total number-of-particles operator. Using (5.1), we infer that $P \le M_1 H_j + M_2$, with constants independent of j. The bound (5.21) then follows, and from (4.14) we also infer (5.22). □
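The convergence argument in the proof of Lemma 5.2 rests on two elementary scalar facts: the spectral bound $\sup_{x\ge0}x^n e^{-\Re(\beta)x/2} = \big(2n/(e\,\Re(\beta))\big)^n \le (n/\Re(\beta))^n$, and $n^n/n! \le e^n$, which together make the n-th term of the series (5.20) at most $(|\epsilon|e/\Re(\beta))^n$. The short Python sketch below checks both numerically for sample values (β is taken real, standing in for ℜ(β)); it is a sanity check of these inequalities only, not of the lemma itself.

```python
import math


def sup_power_exp(n: int, beta: float) -> float:
    """max over x >= 0 of x^n exp(-beta*x/2); the maximum sits at x = 2n/beta."""
    if n == 0:
        return 1.0
    x = 2.0 * n / beta
    return math.exp(n * math.log(x) - beta * x / 2.0)


def series_term(n: int, eps: float, beta: float) -> float:
    """|eps|^n / n! * (n/beta)^n, computed in log space to avoid overflow."""
    if n == 0:
        return 1.0
    return math.exp(n * math.log(abs(eps) * n / beta) - math.lgamma(n + 1))


beta, eps = 1.7, 0.5            # illustrative values with |eps| < beta/e
partial = sum(series_term(n, eps, beta) for n in range(400))
bound = 1.0 / (1.0 - abs(eps) * math.e / beta)
print(partial, bound)
```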
5.3. Continuity of the Heat Kernel for λ > 0. We establish Lipschitz continuity, in the trace-norm topology, of the map
$$\lambda \mapsto e^{-\beta H_j(\lambda V)}, \tag{5.26}$$
from the parameter λ ∈ (0, 1] into the approximating heat kernels. Stated in detail, for each allowed V, each fixed j < ∞, and each fixed λ ∈ (0, 1], and for |λ − λ'| sufficiently small, there exists a constant M such that
$$\left\|e^{-\beta H_j(\lambda V)} - e^{-\beta H_j(\lambda' V)}\right\|_1 \le M\,|\lambda - \lambda'|. \tag{5.27}$$
Unfortunately, the estimates that we have proved for $H_j(\lambda V)$ are insufficient to show that the map (5.26) is differentiable in λ, and we do not know whether this is true. Also, we do not know whether $\operatorname{Tr}_{\mathcal H}\, e^{-H(\lambda V)}$ is differentiable in λ. However, in the next subsection we show that the partition function $\operatorname{Tr}_{\mathcal H}\,\Gamma e^{-H(\lambda V)}$ is differentiable in λ. We study the λ-derivative of the approximating family of heat kernels. For λ, λ' ∈ (0, 1] with λ ≠ λ', define the difference quotient of $e^{-\beta H_j(\lambda V)}$ by
$$D^\beta(\lambda,\lambda') = \frac{e^{-\beta H_j(\lambda V)} - e^{-\beta H_j(\lambda' V)}}{\lambda - \lambda'}, \qquad\text{and let}\qquad \lambda_{\min} = \min\{\lambda, \lambda'\}. \tag{5.28}$$
In the following we let R(β) denote the self-adjoint, trace-class semigroup generated by $H_j(\lambda V)$, and let R'(β) denote the similar semigroup generated by $H_j(\lambda' V)$,
$$R(\beta) = e^{-\beta H_j(\lambda V)}, \qquad\text{and}\qquad R'(\beta) = e^{-\beta H_j(\lambda' V)}. \tag{5.29}$$
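Define the function $F^\beta(\lambda, \lambda', s)$ for λ, λ', s ∈ (0, 1) and for allowed potentials V by
$$F^\beta(\lambda,\lambda',s) = -\beta\, e^{-s\beta H_j(\lambda V)}\Big(Q_j(\lambda V)\,Q_{I,j}(V) + Q_{I,j}(V)\,Q_j(\lambda' V)\Big)\,e^{-(1-s)\beta H_j(\lambda' V)}. \tag{5.30}$$
We also write this as
$$F^\beta(\lambda,\lambda',s) = -\beta R(s\beta)\Big(Q_j(\lambda V)\,Q_{I,j}(V) + Q_{I,j}(V)\,Q_j(\lambda' V)\Big)R'\big((1-s)\beta\big). \tag{5.31}$$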
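The function $F^\beta$ is the integrand of a Duhamel formula for the difference quotient. Since $H_j(\lambda V) = Q_j(\lambda V)^2 - P$ by (4.14) and $Q_j(\lambda V) = Q_0 + \lambda Q_{I,j}(V)$, one has $H_j(\lambda V) - H_j(\lambda' V) = (\lambda - \lambda')\big(Q_j(\lambda V)Q_{I,j}(V) + Q_{I,j}(V)Q_j(\lambda' V)\big)$, and the identity $e^{-\beta A} - e^{-\beta B} = -\beta\int_0^1 e^{-s\beta A}(A - B)\,e^{-(1-s)\beta B}\,ds$ then gives, at least formally, $D^\beta(\lambda,\lambda') = \int_0^1 F^\beta(\lambda,\lambda',s)\,ds$. The NumPy sketch below checks this in a finite-dimensional toy model, assuming random real symmetric matrices for $Q_0$ and $Q_{I,j}$ and P = 0; it illustrates the algebra only, not the trace-norm estimates.

```python
import numpy as np


def heat(a: np.ndarray, t: float) -> np.ndarray:
    """exp(-t A) for a real symmetric matrix A, via its spectral decomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(-t * w)) @ v.T


rng = np.random.default_rng(1)
dim = 6
x0, xi = rng.standard_normal((2, dim, dim))
Q0 = 0.5 * (x0 + x0.T)      # toy stand-in for the free supercharge Q_0
QI = 0.5 * (xi + xi.T)      # toy stand-in for the interaction Q_{I,j}(V)
beta, lam, lam_p = 0.5, 0.7, 0.4


def Q(l: float) -> np.ndarray:
    return Q0 + l * QI


def H(l: float) -> np.ndarray:
    return Q(l) @ Q(l)      # H = Q^2 - P, with P = 0 in this toy model


def F(s: float) -> np.ndarray:
    """The integrand (5.30)."""
    middle = Q(lam) @ QI + QI @ Q(lam_p)
    return -beta * heat(H(lam), s * beta) @ middle @ heat(H(lam_p), (1.0 - s) * beta)


D = (heat(H(lam), beta) - heat(H(lam_p), beta)) / (lam - lam_p)   # difference quotient (5.28)

nodes, weights = np.polynomial.legendre.leggauss(40)   # Gauss-Legendre rule on [-1, 1]
integral = sum(0.5 * w * F(0.5 * (x + 1.0)) for x, w in zip(nodes, weights))
print(np.max(np.abs(D - integral)))
```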
Note that the bound (5.3) ensures that $D^\beta(\lambda, \lambda')$ is trace class. By itself, this does not establish (5.27), as the trace norm may diverge as λ' → λ. Also the bound (5.3), taken together with the bound (5.5), ensures that $F^\beta(\lambda, \lambda', s)$ is the sum of two trace-class operators. In order to verify that $F^\beta(\lambda, \lambda', s)$ is trace class, write each of the two heat kernels in (5.30) as the square of a heat kernel. The bound (5.3) shows that one of the heat kernel factors by itself is trace class. The second heat kernel multiplies $Q_j(\lambda V)$, $Q_j(\lambda' V)$, or $Q_{I,j}(V)$ (either on the left or on the right); the estimates (5.3) and (5.5) show that each such product is bounded. Since the product of a bounded operator with a trace-class operator is trace class, we infer that $F^\beta(\lambda, \lambda', s)$ is trace class. But we have no control over how the trace norm diverges (for fixed β) as s approaches an endpoint of the interval. We now address these issues. Let us denote the degree of the polynomial V by
$$\tilde n = \operatorname{degree}(V), \qquad\text{and note}\qquad 2 \le \tilde n, \tag{5.32}$$
in order to satisfy the elliptic growth assumption EL of Sect. 1.1.
Theorem 5.4. Assume quantum twist fields interact, with the nonlinearity determined by a polynomial V as specified above. Assume QH, EL, and TR of Sect. 1.1. Let β > 0. Let $j \in \mathbb{Z}_+$ be fixed. Then there exists a constant M = M(β, j, V) < ∞ such that the difference quotient $D^\beta(\lambda, \lambda')$ satisfies the trace-norm bound
$$\left\|D^\beta(\lambda,\lambda')\right\|_1 \le M\,\lambda_{\min}^{-1+1/(\tilde n-1)}, \tag{5.33}$$
for all λ, λ' ∈ (0, 1]. Lipschitz continuity (5.27) then follows.

Theorem 5.4 is contained in Proposition 5.5 and Corollary 5.7, which follow.

Proposition 5.5. Under the hypotheses of Theorem 5.4, there exists a constant M = M(β, j, V) < ∞ such that for λ, λ' ∈ (0, 1], for s ∈ (0, 1), and for 0 ≤ α ≤ 1/(ñ − 1), the following holds:

(i) The operator $F^\beta(\lambda, \lambda', s)$ defined in (5.30) has a trace norm bounded by
$$\left\|F^\beta(\lambda,\lambda',s)\right\|_1 \le M\,\lambda_{\min}^{-1+\alpha}\left(s^{-1+\alpha/2}(1-s)^{-1/2} + s^{-1/2}(1-s)^{-1+\alpha/2}\right). \tag{5.34}$$
(ii) The map $s \mapsto F^\beta(\lambda, \lambda', s)$ is continuous in the trace-norm topology.

Lemma 5.6. There exists a constant $M_3 = M_3(j, V)$ such that the following bounds hold: (i) For any α ∈ [0, 1], the interaction $Q_{I,j}(V)$ satisfies
$$Q_{I,j}(V)^{2\alpha} \le M_3^{\alpha}(N + I)^{\alpha(\tilde n-1)}, \qquad\text{and also}\qquad Q_{I,j}(V)^{2\alpha} \le M_3^{\alpha}(H_0 + I)^{\alpha(\tilde n-1)}. \tag{5.35}$$
Here N is the total number operator and ñ is the degree of V. (ii) The generalized Yukawa interaction $Y_j + Y_j^* = \{Q_0, Q_{I,j}(V)\}$ satisfies
$$\pm\{Q_0, Q_{I,j}(V)\} \le M_3(H_0 + I)^{\tilde n-1}. \tag{5.36}$$
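(iii) For $0 \le \alpha \le (\tilde n - 1)^{-1}$, 0 < λ ≤ 1, and 0 ≤ λ' ≤ 1,
$$\left\|\big(H_j(\lambda V) + M_2\big)^{-(1-\alpha)/2}\,Q_{I,j}(V)\,\big(H_j(\lambda' V) + M_2\big)^{-\alpha(\tilde n-1)/2}\right\| \le M_3\,\lambda^{-1+\alpha}. \tag{5.37}$$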
Proof. The estimates leading to this bound rely on the expansion of the bosonic field into its Fourier representation. The Fourier coefficients of the field are linear in creation and annihilation operators, multiplied by a kernel that is l 2 , by virtue of the mollifier κ, ˜ but with an l 2 norm depending on V and also on . These expansions and properties are given in detail in [4]. As a consequence, the operator Q2I, , that equals (4.19), has an expansion in terms of the fields that is a polynomial in creation and annihilation operators of degree 2(n˜ − 1). Each monomial in this expansion, expressed in terms of creation and annihilation operators, has an l 2 kernel. As a consequence, there is a constant M3 = M3 (j, V ), such that the purely bosonic interaction term QI, (V )2
The Elliptic Genus and Hidden Symmetry
109
satisfies the upper bound $Q_{I,j}(V)^2 \le M_3(N+I)^{\tilde n-1}$. This estimate is a standard property of monomials in creation and annihilation operators with $\ell^2$ kernels; in the constructive quantum field theory literature it is known as an $N_\tau$ bound, and the contribution of each monomial to the constant $M_3$ is the $\ell^2$ norm of the corresponding kernel, see [3]. Because the twisting angle is fixed and lies in the interval 0 < φ ≤ π, there is a constant $M_5 = M_5(\phi)$ such that the commuting operators N and $H_0$ satisfy $N \le M_5H_0$. Thus, with a new choice of the constant $M_3$ (suppressing the dependence on φ, which is fixed), we obtain the bounds (5.35) with α = 1. The interpolation inequalities for 0 ≤ α ≤ 1 then follow from the Cauchy representation for fractional powers of the resolvents, see Chapter V, Remark 3.50 of [16].

(ii) The bound
$$\pm\{Q_0, Q_{I,j}\} \le Q_0^2 + Q_{I,j}^2 = H_0 + P + Q_{I,j}^2 \tag{5.38}$$
leads to the desired estimate with a new constant $M_3$. Use the elementary bound $P \le H_0$ to estimate P, and use the bound (5.35) with α = 1 to estimate $Q_{I,j}^2$.

(iii) The bound (5.5) with $1 \le M_1$ and $I \le H_j(\lambda V) + M_2$ ensures that
$$\lambda^2Q_{I,j}(V)^2 \le M_1\big(H_j(\lambda V)+M_2\big). \tag{5.39}$$
As a consequence, the domain of the self-adjoint operator $Q_{I,j}(V)$ contains all vectors in the domain of $(H_j(\lambda V)+M_2)^{1/2}$, for any λ > 0. It follows that we have an interpolation inequality: for any α ∈ [0, 1],
$$\lambda^{2(1-\alpha)}|Q_{I,j}(V)|^{2(1-\alpha)} \le M_1^{1-\alpha}\big(H_j(\lambda V)+M_2\big)^{1-\alpha}. \tag{5.40}$$
The operator form of this inequality is
$$\Big\||Q_{I,j}(V)|^{1-\alpha}\big(H_j(\lambda V)+M_2\big)^{-(1-\alpha)/2}\Big\| \le M_1^{(1-\alpha)/2}\lambda^{-1+\alpha}. \tag{5.41}$$
Using part (i) of the lemma, we also have the operator interpolation inequality
$$\Big\||Q_{I,j}(V)|^{\alpha}(N+I)^{-\alpha(\tilde n-1)/2}\Big\| \le M_1^{\alpha/2}. \tag{5.42}$$
Note that the bound (5.42) does not involve λ. Combining (5.41) and (5.42) with the self-adjointness of $Q_{I,j}$ and $H_j(\lambda V)$, we have
$$\begin{aligned}
&\Big\|\big(H_j(\lambda V)+M_2\big)^{-(1-\alpha)/2}Q_{I,j}(V)\big(H_j(\lambda'V)+M_2\big)^{-\alpha(\tilde n-1)/2}\Big\|\\
&\quad\le \Big\|\big(H_j(\lambda V)+M_2\big)^{-(1-\alpha)/2}Q_{I,j}(V)(N+I)^{-\alpha(\tilde n-1)/2}\Big\|\,\Big\|(N+I)^{\alpha(\tilde n-1)/2}\big(H_j(\lambda'V)+M_2\big)^{-\alpha(\tilde n-1)/2}\Big\|\\
&\quad\le M_1^{(1-\alpha)/2}\,\lambda^{-1+\alpha}.
\end{aligned}\tag{5.43}$$
We obtain the interpolation bound
$$\Big\|(N+I)^{\alpha(\tilde n-1)/2}\big(H_j(\lambda'V)+M_2\big)^{-\alpha(\tilde n-1)/2}\Big\| \le M_1^{\alpha(\tilde n-1)/2}$$
using (5.1), as long as $\alpha(\tilde n-1)\le1$, which we assume. This completes the proof of the lemma. □
110
A. Jaffe β
Proof of Proposition 5.5. Expand $F_j^\beta(\lambda,\lambda',s)$ according to the definition (5.31). Write the first term $R(s\beta)Q_j(\lambda V)Q_{I,j}(V)R'((1-s)\beta)$ in $-F_j^\beta(\lambda,\lambda',s)$ as the following product of four bounded operators, separated by braces:
$$R(s\beta)Q_j(\lambda V)Q_{I,j}(V)R'((1-s)\beta) = \{R(s\beta/4)\}\,\{R(s\beta/4)Q_j(\lambda V)R(s\beta/4)\}\times\{R(s\beta/4)Q_{I,j}(V)R'(3(1-s)\beta/4)\}\,\{R'((1-s)\beta/4)\}. \tag{5.44}$$
Apply Hölder's inequality to bound the trace norm of this product of four terms, using the exponents $1/s$, ∞, ∞, $1/(1-s)$. Then
$$\begin{aligned}\big\|R(s\beta)Q_j(\lambda V)Q_{I,j}(V)R'((1-s)\beta)\big\|_1 &\le \big\|R(s\beta/4)\big\|_{1/s}\,\big\|R(s\beta/4)Q_j(\lambda V)R(s\beta/4)\big\|\\&\quad\times\big\|R(s\beta/4)Q_{I,j}(V)R'(3(1-s)\beta/4)\big\|\,\big\|R'((1-s)\beta/4)\big\|_{1/(1-s)}.\end{aligned}\tag{5.45}$$
Bound the first and last factors on the right of (5.45) using the uniform estimate (5.3). Thus
$$\big\|R(s\beta/4)\big\|_{1/s}\,\big\|R'((1-s)\beta/4)\big\|_{1/(1-s)} \le M(\beta/4). \tag{5.46}$$
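The uniform estimate (5.46) rests on a simple spectral identity: for a positive operator H with discrete spectrum, the Schatten $1/s$-norm of $e^{-s\beta H/4}$ equals $(\mathrm{Tr}\,e^{-\beta H/4})^{s}$, so each factor in (5.46) is controlled by a single trace, and a trace bound uniform in λ gives a uniform constant. The Python sketch below checks this identity numerically on a hypothetical finite spectrum (the list `spectrum` and the helper names are illustrative assumptions, not objects from the model); it is a sanity check, not part of the proof.

```python
import math

def schatten_norm(singular_values, p):
    """Schatten p-norm (sum of s_k**p)**(1/p), for p >= 1."""
    return sum(sv ** p for sv in singular_values) ** (1.0 / p)

def heat_kernel_singular_values(spectrum, t):
    """Singular values of e^{-tH} for a self-adjoint H with the given eigenvalues."""
    return [math.exp(-t * e) for e in spectrum]

# Hypothetical spectrum of a positive operator (illustrative only).
spectrum = [0.5 + 0.7 * k for k in range(200)]
beta = 2.0

def lhs(s):
    # ||e^{-s beta H/4}||_{1/s}
    return schatten_norm(heat_kernel_singular_values(spectrum, s * beta / 4), 1.0 / s)

def rhs(s):
    # (Tr e^{-beta H/4})^s
    return sum(heat_kernel_singular_values(spectrum, beta / 4)) ** s

for s in (0.1, 0.5, 0.9):
    print(s, lhs(s), rhs(s))
```

Since the product of the two factors in (5.46) is $(\mathrm{Tr}\,e^{-\beta H/4})^{s}(\mathrm{Tr}\,e^{-\beta H'/4})^{1-s}$, it never exceeds the larger of the two traces, whatever s is.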
Use the spectral theorem to bound the second factor on the right of (5.45) uniformly in λ, by
$$\big\|R(s\beta/2)Q_j(\lambda V)R(s\beta/4)\big\| \le O(1)\,s^{-1/2}, \tag{5.47}$$
where the constant in O(1) depends on β and j, but not on λ. Bound the third factor in (5.45) by
$$\begin{aligned}\big\|R(s\beta/4)Q_{I,j}(V)R'((1-s)\beta/2)\big\| &\le \big\|R(s\beta/4)\big(H_j(\lambda V)+M_2\big)^{(1-\alpha)/2}\big\|\\&\quad\times\Big\|\big(H_j(\lambda V)+M_2\big)^{-(1-\alpha)/2}Q_{I,j}(V)\big(H_j(\lambda'V)+M_2\big)^{-\alpha(\tilde n-1)/2}\Big\|\\&\quad\times\big\|\big(H_j(\lambda'V)+M_2\big)^{\alpha(\tilde n-1)/2}R'((1-s)\beta/2)\big\|.\end{aligned}\tag{5.48}$$
The first factor on the right of (5.48) is $O(1)\,s^{-(1-\alpha)/2}$, again with the constant in O(1) depending on β and j, but not on λ. From Lemma 5.6 we infer that the second factor in (5.48) is $O(\lambda^{-1+\alpha})$, with the same proviso about O(1). Finally, we estimate the third factor in (5.48) by $O(1)(1-s)^{-\alpha(\tilde n-1)/2}$, with O(1) depending on j and β. These three bounds yield
$$\big\|R(s\beta/4)Q_{I,j}(V)R'((1-s)\beta/2)\big\| \le O(1)\,\lambda^{-1+\alpha}s^{-(1-\alpha)/2}(1-s)^{-\alpha(\tilde n-1)/2}. \tag{5.49}$$
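The powers $s^{-1/2}$ in (5.47) and $s^{-(1-\alpha)/2}$ in (5.48) come from the spectral theorem together with the elementary bound $\sup_{x\ge0}x^{\gamma}e^{-tx} = (\gamma/(e t))^{\gamma}$ for γ > 0, t > 0, applied with x a spectral value of $H_j(\lambda V)+M_2$ and t proportional to sβ; the result scales like $t^{-\gamma}$. The Python sketch below compares the closed form with a brute-force maximum on a grid; the grid size and parameter values are arbitrary illustrative choices.

```python
import math

def sup_closed_form(gamma, t):
    """max over x >= 0 of x**gamma * exp(-t*x); attained at x = gamma/t."""
    return (gamma / (math.e * t)) ** gamma

def sup_on_grid(gamma, t, x_max=200.0, n=50_000):
    """Brute-force maximum of x**gamma * exp(-t*x) on a uniform grid of [0, x_max]."""
    best = 0.0
    for k in range(n + 1):
        x = x_max * k / n
        best = max(best, x ** gamma * math.exp(-t * x))
    return best

# gamma = 1/2 gives the s^{-1/2} behaviour of (5.47); gamma = (1 - alpha)/2 that of (5.48).
for gamma in (0.5, 0.3):
    for t in (0.05, 0.5, 2.0):
        print(gamma, t, sup_closed_form(gamma, t), sup_on_grid(gamma, t))
```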
We combine the estimates (5.46), (5.47), and (5.49) to obtain
$$\big\|R(s\beta)Q_j(\lambda V)Q_{I,j}(V)R'((1-s)\beta)\big\|_1 \le O(1)\,\lambda^{-1+\alpha}s^{-1+\alpha/2}(1-s)^{-\alpha(\tilde n-1)/2}, \tag{5.50}$$
which is the first term in the bound (5.34). In order to bound the second term $R(s\beta)Q_{I,j}(V)Q_j(\lambda'V)R'((1-s)\beta)$ in $-F_j^\beta(\lambda,\lambda',s)$, repeat this procedure, but use the adjoint bounds. This yields the estimate
$$\big\|R(s\beta)Q_{I,j}(V)Q_j(\lambda'V)R'((1-s)\beta)\big\|_1 \le O(1)\,(\lambda')^{-1+\alpha}s^{-\alpha(\tilde n-1)/2}(1-s)^{-1+\alpha/2}. \tag{5.51}$$
Adding (5.50) and (5.51) completes the proof of the desired estimate (5.34).

We now establish statement (ii). Let 0 < s < s' < 1. Using the bound (5.1), we infer that there is a constant M such that, for λ ∈ [0, 1], the heat kernel $e^{-s\beta H_j(\lambda V)}$ is bounded in norm by $M^{s\beta}$. Therefore $\|e^{-s\beta H_j(\lambda V)} - e^{-s'\beta H_j(\lambda V)}\| \le 2M^\beta$. We can also bound the difference
$$e^{-s\beta H_j(\lambda V)} - e^{-s'\beta H_j(\lambda V)} = \big(I - e^{-(s'-s)\beta H_j(\lambda V)}\big)e^{-s\beta H_j(\lambda V)} \tag{5.52}$$
using the fundamental theorem of calculus, giving $\|e^{-s\beta H_j(\lambda V)} - e^{-s'\beta H_j(\lambda V)}\| \le M^\beta|s'-s|/s$. Combining these two bounds on the difference, we infer that there is a new constant M > 1 such that, for any 0 < ε ≤ 1,
$$\big\|e^{-s\beta H_j(\lambda V)} - e^{-s'\beta H_j(\lambda V)}\big\| \le M^\beta\big(|s'-s|/s\big)^{\varepsilon}. \tag{5.53}$$
The same bounds hold with $H_j(\lambda V)$ replaced by $H_j(\lambda'V)$. To simplify notation, let us denote $H = H_j(\lambda V)$, $H' = H_j(\lambda'V)$, $Q = Q_j(\lambda V)$, $Q' = Q_j(\lambda'V)$, and $Q_I = Q_{I,j}(V)$. Now write the difference
$$\begin{aligned} F_j^\beta(\lambda,\lambda',s) - F_j^\beta(\lambda,\lambda',s') &= \beta e^{-s\beta H}(QQ_I + Q_IQ')e^{-(1-s)\beta H'} - \beta e^{-s'\beta H}(QQ_I + Q_IQ')e^{-(1-s')\beta H'} \\ &= \beta\big(I - e^{-(s'-s)\beta H}\big)e^{-s\beta H/2}F_j^{\beta/2}(\lambda,\lambda',s)e^{-(1-s)\beta H'/2} \\ &\quad+ \beta e^{-s'\beta H/2}F_j^{\beta/2}(\lambda,\lambda',s')e^{-(1-s')\beta H'/2}\big(e^{-(s'-s)\beta H'} - I\big). \end{aligned}\tag{5.54}$$
From (5.53) and Hölder's inequality we obtain, for any 0 < ε ≤ 1,
$$\big\|F_j^\beta(\lambda,\lambda',s) - F_j^\beta(\lambda,\lambda',s')\big\|_1 \le \beta M^\beta|s'-s|^{\varepsilon}\Big(s^{-\varepsilon}\big\|F_j^{\beta/2}(\lambda,\lambda',s)\big\|_1 + (1-s')^{-\varepsilon}\big\|F_j^{\beta/2}(\lambda,\lambda',s')\big\|_1\Big). \tag{5.55}$$
Taking the bound (5.34) into account, we conclude in the case 0 < s < s' < 1 that the map $s\mapsto F_j^\beta(\lambda,\lambda',s)$ is Hölder continuous in trace norm with exponent ε. A similar bound holds if 0 < s' < s < 1, with s and s' interchanged, completing the proof of the proposition. □
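Integrating the bound (5.34) over s ∈ (0, 1), as in Corollary 5.7.iv, produces Beta integrals $\int_0^1 s^{a-1}(1-s)^{b-1}\,ds = B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, which are finite exactly when a, b > 0. For the first term of (5.34), a = α/2 and b = 1/2, so integrability needs α > 0; the constant π in (6.5) is $B(1/2,1/2)$. A minimal Python sketch using only `math.gamma` (the helper names `beta_integral` and `bound_534_integral` are ours):

```python
import math

def beta_integral(a, b):
    """B(a, b) = integral_0^1 s^(a-1) (1-s)^(b-1) ds; finite iff a > 0 and b > 0."""
    if a <= 0 or b <= 0:
        return math.inf  # the integral diverges at an endpoint
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def bound_534_integral(alpha):
    """s-integral of s^(-1+alpha/2)(1-s)^(-1/2) + s^(-1/2)(1-s)^(-1+alpha/2), cf. (5.34)."""
    return beta_integral(alpha / 2, 0.5) + beta_integral(0.5, alpha / 2)

print(beta_integral(0.5, 0.5))  # the constant pi appearing in (6.5)
for alpha in (0.0, 0.1, 0.5, 1.0):
    print(alpha, bound_534_integral(alpha))
```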
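Corollary 5.7. We have the following.
(i) Let η > 0. For any bounded operator A,
$$\int_\eta^{1-\eta}\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds = \mathrm{Tr}_{\mathcal H}\Big(A\int_\eta^{1-\eta}F_j^\beta(\lambda,\lambda',s)\,ds\Big). \tag{5.56}$$
(ii) Let η > 0. The operators $\int_\eta^{1-\eta}F_j^\beta(\lambda,\lambda',s)\,ds$ converge in trace norm as η → 0, defining $\int_0^1F_j^\beta(\lambda,\lambda',s)\,ds$. Thus, for any bounded A,
$$\int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds = \mathrm{Tr}_{\mathcal H}\Big(A\int_0^1F_j^\beta(\lambda,\lambda',s)\,ds\Big). \tag{5.57}$$
(iii) For A = I, this limit equals the difference quotient,
$$\lim_{\eta\to0}\Big\|D_j^\beta(\lambda,\lambda') - \int_\eta^{1-\eta}F_j^\beta(\lambda,\lambda',s)\,ds\Big\|_1 = 0, \tag{5.58}$$
and
$$D_j^\beta(\lambda,\lambda') = \int_0^1F_j^\beta(\lambda,\lambda',s)\,ds. \tag{5.59}$$
(iv) For any bounded operator A,
$$\mathrm{Tr}_{\mathcal H}\big(A\,D_j^\beta(\lambda,\lambda')\big) = \int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds, \tag{5.60}$$
yielding the estimate
$$\big|\mathrm{Tr}_{\mathcal H}\big(A\,D_j^\beta(\lambda,\lambda')\big)\big| \le O\big(\lambda_{\min}^{-1+\alpha}\big)\,\|A\|. \tag{5.61}$$

Proof. Statement (i) of the corollary follows from the continuity of $F_j^\beta(\lambda,\lambda',s)$ in s, namely Proposition 5.5.ii. Statement (ii) is a consequence of the estimate of Proposition 5.5.i. We now verify (iii). Consider the domain $\mathcal D_s\times\mathcal D_{1-s} = e^{-sH_j(\lambda V)}\mathcal H\times e^{-(1-s)H_j(\lambda'V)}\mathcal H$. Both $H_j(\lambda V)$ and $H_j(\lambda'V)$ are sesquilinear forms on this domain. Furthermore, from Proposition 5.3 we infer that $H_j(\lambda V) = Q_j(\lambda V)^2 - P$ and $H_j(\lambda'V) = Q_j(\lambda'V)^2 - P$ on this domain. Therefore we have the identity of forms
$$\begin{aligned}H_j(\lambda V) - H_j(\lambda'V) &= Q_j(\lambda V)^2 - Q_j(\lambda'V)^2 \\ &= Q_j(\lambda V)\big(Q_j(\lambda V) - Q_j(\lambda'V)\big) + \big(Q_j(\lambda V) - Q_j(\lambda'V)\big)Q_j(\lambda'V) \\ &= (\lambda-\lambda')\big(Q_j(\lambda V)Q_{I,j}(V) + Q_{I,j}(V)Q_j(\lambda'V)\big).\end{aligned}\tag{5.62}$$
Consequently, on $\mathcal D_s\times\mathcal D_{1-s}$,
$$e^{-sH_j(\lambda V)}\,\frac{H_j(\lambda V) - H_j(\lambda'V)}{\lambda-\lambda'}\,e^{-(1-s)H_j(\lambda'V)} = -F_j^\beta(\lambda,\lambda',s). \tag{5.63}$$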
Part (ii) of the corollary asserts that this expression has an integral over s ∈ [η, 1 − η] that converges in trace norm as η → 0. Therefore
$$\int_0^1 e^{-sH_j(\lambda V)}\,\frac{H_j(\lambda'V) - H_j(\lambda V)}{\lambda-\lambda'}\,e^{-(1-s)H_j(\lambda'V)}\,ds = \int_0^1F_j^\beta(\lambda,\lambda',s)\,ds. \tag{5.64}$$
But the left side of this identity is the difference quotient $D_j^\beta(\lambda,\lambda')$, so we have identified the η → 0 limit. Finally, the same argument proves that
$$\lim_{\eta\to0}\Big\|A\,D_j^\beta(\lambda,\lambda') - A\int_\eta^{1-\eta}F_j^\beta(\lambda,\lambda',s)\,ds\Big\|_1 = 0, \tag{5.65}$$
and the bounds of statement (iv) then follow from integrating the estimate of Proposition 5.5.i. □

6. Estimates on Traces

In this section we estimate certain partition functions. The proofs involve further cancellations that are not captured by the estimates of the previous section. They use the operator estimates of the previous section both to justify the existence of the objects studied here and to estimate the quantities that arise after exhibiting cancellations in the trace that defines the partition functions.

6.1. Differentiability for λ > 0

In this section we establish differentiability of $Z_j^{\lambda V}$ as a function of λ. Choose the bounded operator A in Corollary 5.7.iv to be $A = e^{-i\theta J - i\sigma P}$. Then the corollary yields a representation for the difference quotient
$$\delta_j(\lambda,\lambda') = \frac{Z_j^{\lambda V} - Z_j^{\lambda'V}}{\lambda-\lambda'} = \lim_{\eta\to0}\int_\eta^{1-\eta}\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds = \int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds. \tag{6.1}$$
Furthermore, the putative derivative of $Z_j^{\lambda V}$ also has an integral representation, namely
$$\delta_j(\lambda,\lambda) = \lim_{\eta\to0}\int_\eta^{1-\eta}\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda,s)\big)\,ds = \int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda,s)\big)\,ds. \tag{6.2}$$
Although both representations (6.1) and (6.2) are well defined, we have not established that $\delta_j(\lambda,\lambda')$ has a limit as λ' → λ, nor, if this limit exists, that it equals $\delta_j(\lambda,\lambda)$. In this section we exploit the smoothing provided by the specific operator A in the partition function. This allows us to prove differentiability of the partition function, and in fact that its derivative vanishes.

Theorem 6.1. Under the conditions of Theorem 2.1, the map $\lambda\mapsto Z_j^{\lambda V}$ is a differentiable function of λ for all λ ∈ (0, 1]. In fact, the derivative vanishes:
$$\frac{\partial}{\partial\lambda}Z_j^{\lambda V} = \delta_j(\lambda,\lambda) = 0.$$
Proof. The bounds in the previous section show that $\delta_j(\lambda,\lambda')$ is bounded. To establish the theorem, we show that the difference quotient (6.1) actually converges to zero,
$$\lim_{\lambda'\to\lambda}\delta_j(\lambda,\lambda') = 0, \tag{6.3}$$
for λ > 0. A similar argument shows that $\delta_j(\lambda,\lambda) = 0$. We claim that for each fixed λ ∈ (0, 1] there exists a positive constant M = M(β, j, λ, V), not depending on λ', such that
$$\big|\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\big| \le M\,|\lambda-\lambda'|\,s^{-1/2}(1-s)^{-1/2}, \tag{6.4}$$
whenever λ' ∈ (0, 1] lies in the neighborhood of λ defined by $B_\lambda = \{\lambda' : |\lambda'-\lambda| \le \tfrac12\lambda\}$. Let us assume this bound. As a consequence,
$$\Big|\int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds\Big| = \Big|\lim_{\eta\to0}\int_\eta^{1-\eta}\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\,ds\Big| \le \pi M\,|\lambda-\lambda'|, \tag{6.5}$$
for λ, λ' ∈ (0, 1] and λ' ∈ $B_\lambda$. Thus, according to the representation (6.1),
$$|\delta_j(\lambda,\lambda')| \le \pi M\,|\lambda-\lambda'|, \tag{6.6}$$
for λ, λ' ∈ (0, 1] and λ' ∈ $B_\lambda$, and the derivative of $Z_j^{\lambda V}$ vanishes as claimed. Thus we have reduced the proof of the theorem to the proof of (6.4), which we now establish. We use the notation of the proof of Proposition 5.5. Write the density $\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)$ for the difference quotient as
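$$\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big) = \beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)QQ_IR'((1-s)\beta)\big) + \beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)Q_IQ'R'((1-s)\beta)\big). \tag{6.7}$$
The operator A commutes with R and with R', and it anticommutes with Q, Q', and $Q_I$. Also, we have seen that $R(s\beta)Q_I$ and $Q'R'((1-s)\beta)$ are both trace class. Therefore, using cyclicity of the trace,
$$\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big) = -\beta\,\mathrm{Tr}_{\mathcal H}\big(A\,Q_IR'((1-s)\beta)QR(s\beta)\big) + \beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)Q_IQ'R'((1-s)\beta)\big). \tag{6.8}$$
The bound (5.5) assures that the range of R is in the domain of both $Q_0$ and $Q_I$, and hence in the domain of both Q and Q'. Thus in the first term we can write $Q = Q' + (Q - Q') = Q' + (\lambda-\lambda')Q_I$, to yield
$$\begin{aligned}\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big) &= -\beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)Q_IR'((1-s)\beta)Q'\big) + \beta(\lambda-\lambda')\,\mathrm{Tr}_{\mathcal H}\big(A\,Q_IR'((1-s)\beta)Q_IR(s\beta)\big)\\
&\quad+ \beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)Q_IQ'R'((1-s)\beta)\big)\\
&= -\beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)Q_IQ'R'((1-s)\beta)\big) + \beta(\lambda-\lambda')\,\mathrm{Tr}_{\mathcal H}\big(A\,Q_IR'((1-s)\beta)Q_IR(s\beta)\big)\\
&\quad+ \beta\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta)Q_IQ'R'((1-s)\beta)\big)\\
&= \beta(\lambda-\lambda')\,\mathrm{Tr}_{\mathcal H}\big(A\,Q_IR'((1-s)\beta)Q_IR(s\beta)\big)\\
&= \beta(\lambda-\lambda')\,\mathrm{Tr}_{\mathcal H}\big(A\,R(s\beta/2)Q_IR'((1-s)\beta)Q_IR(s\beta/2)\big).\end{aligned}\tag{6.9}$$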
We estimate (6.9) using Hölder's inequality, obtaining
$$\begin{aligned}\big|\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\big| &\le \beta|\lambda-\lambda'|\,\|A\|\,\big\|R(s\beta/4)\big\|_{1/s}\,\big\|R(s\beta/4)Q_IR'((1-s)\beta/4)\big\|\\
&\quad\times\big\|R'((1-s)\beta/4)\big\|_{1/(1-s)}\,\big\|R'((1-s)\beta/2)Q_IR(s\beta/2)\big\|\\
&\le \beta|\lambda-\lambda'|\,M(\beta/4)\,\big\|R(s\beta/4)Q_IR'((1-s)\beta/4)\big\|^2.\end{aligned}\tag{6.10}$$
The constant M(β/4) in the last line is the constant in (5.3), and the bound on $Q_I$ uses the self-adjointness of $Q_I$, R, and R'. From (5.5) we infer that, with a new constant $M_4 = M_4(\beta,j,V)$,
$$\big|\mathrm{Tr}_{\mathcal H}\big(A\,F_j^\beta(\lambda,\lambda',s)\big)\big| \le \beta|\lambda-\lambda'|\,M_4\,\lambda^{-1}\lambda'^{-1}s^{-1/2}(1-s)^{-1/2}. \tag{6.11}$$
On the set $B_\lambda$ we have $\lambda' = \lambda + (\lambda'-\lambda) \ge \tfrac12\lambda$. Thus, taking $M(\beta,j,\lambda,V) = 2\beta M_4(\beta,j,V)\lambda^{-2}$, we establish (6.4) and complete the proof of the theorem. □

6.2. Hölder Continuity at λ = 0

In Theorem 6.1, we found that the partition function $Z_j^{\lambda V}$ is a constant function of λ for all λ ∈ (0, 1]. At the λ = 0 endpoint of the interval, $H_j(\lambda V) = H_0$. If both 0 < φ ≤ π and 0 < β, then the heat kernel $e^{-\beta H_0}$ is trace class, and the partition function $Z_j^0 = Z^0$ is well defined. However, $Z_j^{\lambda V}$ might have a jump discontinuity at λ = 0, so it may not be the case that $Z_j^{\lambda V} = Z^0$. It is therefore important to demonstrate the continuity of $Z_j^{\lambda V}$, and we do so by establishing Hölder continuity at λ = 0 with an exponent depending on the degree ñ = ñ(V) of the polynomial potential V.

Theorem 6.2. Assume the hypotheses of Theorem 2.1. Let 0 ≤ α < 2/(ñ − 1). Then there exists a constant M = M(α, β, j, V) such that the partition function $Z_j^{\lambda V}$ satisfies
$$\big|Z_j^{\lambda V} - Z^0\big| \le M\lambda^{\alpha}, \qquad\text{for all } 0<\lambda\le1. \tag{6.12}$$
Corollary 6.3. Under the hypotheses of Theorem 2.1, the functions $Z_j^{\lambda V}$ are independent of j and of λ, and
$$Z_j^{\lambda V}(\tau,\theta,\phi) = Z^0(\tau,\theta,\phi), \qquad\text{for all } \lambda\in\mathbb C. \tag{6.13}$$
Proof. The corollary for λ ∈ [0, 1] is an immediate consequence of the theorem. Substituting γV for V, with γ ∈ ℂ, we also have an allowed potential, and $Z_j^{\lambda(\gamma V)} = Z_j^{(\lambda\gamma)V} = Z^0$. So the identity $Z_j^{\lambda V}(\tau,\theta,\phi) = Z^0(\tau,\theta,\phi)$ extends to all λ ∈ ℂ.

The first step in the proof of the theorem is to establish a representation for the difference $Z^0 - Z_j^{\lambda V}$ that is similar to the representation in the previous section for the difference quotient (5.60), except that it is convergent at the λ = 0 endpoint of the interval.
Lemma 6.4. There are constants $M_1 = M_1(j,V) < \infty$ and $M_2 = M_2(j,V) < \infty$ such that
$$I \le H_j(\lambda V) + M_2 \le M_1(H_0+I)^{\tilde n-1}, \tag{6.14}$$
and, for all 0 ≤ α ≤ 1,
$$\Big\|\big(H_j+M_2\big)^{\alpha/2}(H_0+I)^{-\alpha(\tilde n-1)/2}\Big\| \le M_1^{\alpha/2}. \tag{6.15}$$
Proof. Write $H_j(\lambda V) = H_0 + \lambda^2Q_{I,j}(V)^2 + \lambda(Y + Y^*)$, where $Y + Y^* = \{Q_0, Q_{I,j}\}$. Since ñ ≥ 2, the upper bound (6.14) holds trivially for λ = 0. The bound of Lemma 5.6.i ensures that $Q_{I,j}^2 \le M_3(H_0+I)^{\tilde n-1}$. Finally, as a consequence of Lemma 5.6.ii, the term $Y + Y^*$ is bounded from above by $M_3(H_0+I)^{\tilde n-1}$. Taken together, these bounds establish (6.14). We choose $M_2$ sufficiently large so that $I \le H_j(\lambda V) + M_2$. The lemma then follows from the interpolation inequality $(H_j+M_2)^{\alpha} \le M_1^{\alpha}(H_0+I)^{\alpha(\tilde n-1)}$, valid for 0 ≤ α ≤ 1.

For s ∈ (0, 1), define the operator-valued function
$$f_j^\beta(\lambda,s) = e^{-s\beta H_j(\lambda V)}\big(H_0 - H_j(\lambda V)\big)e^{-(1-s)\beta H_0}. \tag{6.16}$$
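Lemma 6.5. Under the hypotheses of the theorem, and for s ∈ (0, 1):
(i) Both $e^{-s\beta H_j(\lambda V)}H_0e^{-(1-s)\beta H_0}$ and $e^{-s\beta H_j(\lambda V)}H_j(\lambda V)e^{-(1-s)\beta H_0}$ are trace class.
(ii) There exists a constant $M_6 = M_6(\beta,j,V)$ such that the function $f_j^\beta(\lambda,s)$ has trace norm bounded by
$$\big\|f_j^\beta(\lambda,s)\big\|_1 \le M_6\,s^{-1+1/(2(\tilde n-1))}(1-s)^{-1/2}. \tag{6.17}$$
(iii) The map $s\mapsto f_j^\beta(\lambda,s)$ is continuous in the trace-norm topology.
(iv) The integral of $f_j^\beta$ exists, and for any bounded linear transformation A,
$$\begin{aligned}\int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big)\,ds &= \lim_{\eta\to0}\int_\eta^{1-\eta}\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big)\,ds\\ &= \mathrm{Tr}_{\mathcal H}\Big(\lim_{\eta\to0}\int_\eta^{1-\eta}A\,f_j^\beta(\lambda,s)\,ds\Big) = \mathrm{Tr}_{\mathcal H}\Big(\int_0^1A\,f_j^\beta(\lambda,s)\,ds\Big).\end{aligned}\tag{6.18}$$
(v) The difference $Z_j^{\lambda V} - Z^0$ has the representation
$$Z_j^{\lambda V} - Z^0 = \beta\int_0^1\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big)\,ds, \tag{6.19}$$
where $A = e^{-i\theta J - i\sigma P}$.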
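Proof. Write $H = H_j(\lambda V)$. Then
$$\big\|e^{-s\beta H}H_0e^{-(1-s)\beta H_0}\big\|_1 \le \big\|e^{-s\beta H/2}\big\|_{1/s}\,\big\|e^{-s\beta H/2}H_0e^{-(1-s)\beta H_0/2}\big\|\,\big\|e^{-(1-s)\beta H_0/2}\big\|_{1/(1-s)}. \tag{6.20}$$
Hence, using (5.3) and also (5.5), we conclude that there is a constant $M_6 = M_6(\beta,j,V)$ such that
$$\big\|e^{-s\beta H}H_0e^{-(1-s)\beta H_0}\big\|_1 \le M_6\,s^{-1/2}(1-s)^{-1/2}, \tag{6.21}$$
so $e^{-s\beta H}H_0e^{-(1-s)\beta H_0}$ is trace class. With a possibly larger constant $M_6(\beta,j,V)$, we also have the bound
$$\begin{aligned}\big\|e^{-s\beta H}He^{-(1-s)\beta H_0}\big\|_1 &\le \big\|e^{-s\beta H/2}\big\|_{1/s}\,\big\|e^{-s\beta H/2}(H+M_2)^{1-1/(2(\tilde n-1))}\big\|\\
&\quad\times\big\|(H+M_2)^{1/(2(\tilde n-1))}(H_0+I)^{-1/2}\big\|\,\big\|(H_0+I)^{1/2}e^{-(1-s)\beta H_0/2}\big\|\,\big\|e^{-\beta H_0/2}\big\|_1^{1-s}\\
&\le M_6\,s^{-1+1/(2(\tilde n-1))}(1-s)^{-1/2},\end{aligned}\tag{6.22}$$
where we use the bound of Lemma 6.4 to bound the third factor of (6.22), as well as (5.3) to estimate the product of the first and last factors. This proves that $e^{-s\beta H}He^{-(1-s)\beta H_0}$ is trace class. As ñ ≥ 2, the two bounds (6.21) and (6.22) taken together yield the proof of (i)–(ii).

Let us use the notation $R(s) = e^{-s\beta H_j(\lambda V)}$ and $R_0(s) = e^{-s\beta H_0}$. In order to establish (iii), take s < s' and consider the difference
$$\begin{aligned}\big\|f_j^\beta(\lambda,s) - f_j^\beta(\lambda,s')\big\|_1 &\le \big\|\big(R(s)-R(s')\big)(H_0-H)R_0(1-s)\big\|_1 + \big\|R(s')(H_0-H)\big(R_0(1-s)-R_0(1-s')\big)\big\|_1\\
&= \big\|\big(I-R(s'-s)\big)R(s/2)R(s/2)(H_0-H)R_0(1-s)\big\|_1\\
&\quad+ \big\|R(s')(H_0-H)R_0((1-s')/2)R_0((1-s')/2)\big(R_0(s'-s)-I\big)\big\|_1.\end{aligned}\tag{6.23}$$
We bound this using Hölder's inequality by
$$\begin{aligned}\big\|f_j^\beta(\lambda,s) - f_j^\beta(\lambda,s')\big\|_1 &\le \big\|\big(I-R(s'-s)\big)R(s/2)\big\|\,\big\|R(s/2)(H_0-H)R_0((1-s)/2)\big\|_1\,\big\|R_0((1-s)/2)\big\|\\
&\quad+ \big\|R(s'/2)\big\|\,\big\|R(s'/2)(H_0-H)R_0((1-s')/2)\big\|_1\,\big\|R_0((1-s')/2)\big(R_0(s'-s)-I\big)\big\|\\
&\le \big\|\big(I-R(s'-s)\big)R(s/2)\big\|\,\big\|R_0((1-s)/2)\big\|\,\big\|f_j^{\beta/2}(\lambda,s)\big\|_1\\
&\quad+ \big\|\big(I-R_0(s'-s)\big)R_0((1-s')/2)\big\|\,\big\|R(s'/2)\big\|\,\big\|f_j^{\beta/2}(\lambda,s')\big\|_1.\end{aligned}\tag{6.24}$$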
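Use the bound (5.53), with $0<\varepsilon<1/(2(\tilde n-1))$, as well as Lemma 6.5.ii, to obtain, with a new constant $M_6 = M_6(\beta,j,V)$,
$$\begin{aligned}\big\|f_j^\beta(\lambda,s) - f_j^\beta(\lambda,s')\big\|_1 &\le M_6\,|s'-s|^{\varepsilon}\,s^{-1+1/(2(\tilde n-1))-\varepsilon}(1-s)^{-1/2}\\ &\quad+ M_6\,|s'-s|^{\varepsilon}\,(s')^{-1+1/(2(\tilde n-1))}(1-s')^{-1/2-\varepsilon}.\end{aligned}\tag{6.25}$$
This establishes continuity. The proof of (iv) follows that of Corollary 5.7, and we omit the details. Taking $A = e^{-i\theta J - i\sigma P}$, and observing that
$$\frac{\partial}{\partial s}\Big(e^{-s\beta H_j(\lambda V)}e^{-(1-s)\beta H_0}\Big) = \beta f_j^\beta(\lambda,s),$$
yields (v). This completes the proof of the lemma. □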
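The Hölder estimate (5.53), reused in (6.25), reduces by the spectral theorem to a scalar inequality. For a spectral value x ≥ 0 (β absorbed into x) and 0 < s < s', one has $|e^{-sx}-e^{-s'x}| = (1-e^{-(s'-s)x})e^{-sx} \le \min\{1,(s'-s)/(es)\} \le ((s'-s)/s)^{\varepsilon}$ for every 0 < ε ≤ 1, because $1-e^{-y}\le\min\{1,y\}$, $xe^{-sx}\le 1/(es)$, and $\min\{1,y\}\le y^{\varepsilon}$. This sketch assumes a positive operator; for $H\ge -M$ one picks up the factor $M^{\beta}$ of (5.53). The Python code below checks the scalar inequality on a grid; the grid and parameter choices are illustrative only, and the helper names are ours.

```python
import math

def heat_difference(s, s_prime, x):
    """|e^{-s x} - e^{-s' x}| for a spectral value x >= 0 (beta absorbed into x)."""
    return abs(math.exp(-s * x) - math.exp(-s_prime * x))

def holder_bound(s, s_prime, eps):
    """((s' - s)/s)**eps, the scalar form of the right side of (5.53)."""
    return (abs(s_prime - s) / s) ** eps

def check(s, s_prime, eps, x_max=500.0, n=20_000):
    """True if the scalar inequality holds at every grid point of [0, x_max]."""
    bound = holder_bound(s, s_prime, eps)
    return all(heat_difference(s, s_prime, x_max * k / n) <= bound + 1e-15
               for k in range(n + 1))

for s, s_prime in ((0.1, 0.2), (0.3, 0.31), (0.5, 0.9)):
    for eps in (0.1, 0.5, 1.0):
        print(s, s_prime, eps, check(s, s_prime, eps))
```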
Lemma 6.6. Assume the hypotheses of Theorem 2.1, take $A = e^{-i\theta J - i\sigma P}$, and let s ∈ (0, 1).
(i) We have the identity
$$\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big) = -\lambda^2\,\mathrm{Tr}_{\mathcal H}\Big(A\,e^{-(1-s)\beta H_0/2}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)}Q_{I,j}(V)e^{-(1-s)\beta H_0/2}\Big). \tag{6.26}$$
(ii) There exists a constant $M_7 = M_7(\beta,j,V)$ such that, for all α ∈ [0, 1/(ñ − 1)],
$$\big|\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big)\big| \le M_7\,\lambda^{2\alpha}s^{-1+\alpha}(1-s)^{-\alpha(\tilde n-1)}. \tag{6.27}$$
Proof. Part (i) of the lemma is a consequence of the fact that both $e^{-s\beta H_j(\lambda V)}$ and $e^{-(1-s)H_0}$ are trace class. Furthermore, the bound ±P ≤ $H_0$, along with Proposition 5.3, which establishes a similar upper bound with $H_j$, shows that $e^{-s\beta H_j(\lambda V)}Pe^{-(1-s)H_0}$ is trace class. We therefore rewrite $H_0 - H_j(\lambda V)$ in $f_j^\beta(\lambda,s)$ as
$$H_0 - H_j(\lambda V) = H_0 + P - H_j(\lambda V) - P = Q_0^2 - Q_j(\lambda V)^2 = Q_j(\lambda V)\big(Q_0 - Q_j(\lambda V)\big) + \big(Q_0 - Q_j(\lambda V)\big)Q_0.$$
Thus
$$\begin{aligned}f_j^\beta(\lambda,s) &= e^{-s\beta H_j(\lambda V)}\Big(Q_j(\lambda V)\big(Q_0-Q_j(\lambda V)\big) + \big(Q_0-Q_j(\lambda V)\big)Q_0\Big)e^{-(1-s)H_0}\\ &= -\lambda\,e^{-s\beta H_j(\lambda V)}\big(Q_j(\lambda V)Q_{I,j}(V) + Q_{I,j}(V)Q_0\big)e^{-(1-s)H_0}.\end{aligned}\tag{6.28}$$
Furthermore, in the first term $Q_j(\lambda V)$ commutes with the heat-kernel mollifier on the left, so the above methods show that $e^{-s\beta H_j(\lambda V)}Q_j(\lambda V)Q_{I,j}(V)e^{-(1-s)H_0}$ is trace class. Similarly, $Q_0$ commutes with the mollifier on the right, so the second term is also trace class. Consider the first term. The operator $e^{-s\beta H_j(\lambda V)/2}Q_j(\lambda V)$ is bounded,
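the operator $e^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}$ is trace class, and A anticommutes with $e^{-s\beta H_j(\lambda V)/2}Q_j(\lambda V)$. Thus, using cyclicity (5.9), one can write
$$\begin{aligned}
-\lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)}Q_j(\lambda V)Q_{I,j}(V)e^{-(1-s)H_0}\big)
&= -\lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_j(\lambda V)\,e^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}\big)\\
&= \lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}\,e^{-s\beta H_j(\lambda V)/2}Q_j(\lambda V)\big)\\
&= \lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}\,Q_j(\lambda V)e^{-s\beta H_j(\lambda V)/2}\big)\\
&= \lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}\big(Q_0+\lambda Q_{I,j}(V)\big)e^{-s\beta H_j(\lambda V)/2}\big)\\
&= \lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)Q_0e^{-(1-s)H_0}e^{-s\beta H_j(\lambda V)/2}\big)\\
&\quad+ \lambda^2\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)/2}\big)\\
&= \lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)}Q_{I,j}(V)Q_0e^{-(1-s)H_0}\big)\\
&\quad+ \lambda^2\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)/2}\big).
\end{aligned}\tag{6.29}$$
On the other hand, since each term in (6.28) is trace class, we have
$$\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big) = -\lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)}Q_j(\lambda V)Q_{I,j}(V)e^{-(1-s)H_0}\big) - \lambda\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)}Q_{I,j}(V)Q_0e^{-(1-s)H_0}\big). \tag{6.30}$$
Substituting (6.29) into (6.30), we obtain
$$\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big) = -\lambda^2\,\mathrm{Tr}_{\mathcal H}\big(Ae^{-s\beta H_j(\lambda V)/2}Q_{I,j}(V)e^{-(1-s)H_0}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)/2}\big), \tag{6.31}$$
which proves (i). In order to prove (ii), observe that a consequence of Lemma 5.6.iii, with $\alpha<(\tilde n-1)^{-1}$, is the following bound: there is a constant $M_8 = M_8(\beta,j,V)$ such that
$$\begin{aligned}\big\|e^{-s\beta H_j(\lambda V)/4}Q_{I,j}(V)e^{-(1-s)H_0/4}\big\| &\le \big\|e^{-s\beta H_j(\lambda V)/4}\big(H_j(\lambda V)+M_2\big)^{(1-\alpha)/2}\big\|\\
&\quad\times\Big\|\big(H_j(\lambda V)+M_2\big)^{-(1-\alpha)/2}Q_{I,j}(V)(H_0+I)^{-\alpha(\tilde n-1)/2}\Big\|\\
&\quad\times\big\|(H_0+I)^{\alpha(\tilde n-1)/2}e^{-(1-s)H_0/4}\big\|\\
&\le M_8\,\lambda^{-1+\alpha}s^{-(1-\alpha)/2}(1-s)^{-\alpha(\tilde n-1)/2}.\end{aligned}\tag{6.32}$$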
As a consequence of the representation (6.26), the fact that A is unitary, and (5.8), we have
$$\begin{aligned}
\big|\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big)\big| &\le \lambda^2\Big|\mathrm{Tr}_{\mathcal H}\Big(Ae^{-(1-s)\beta H_0/2}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)}Q_{I,j}(V)e^{-(1-s)\beta H_0/2}\Big)\Big|\\
&\le \lambda^2\Big\|e^{-(1-s)\beta H_0/2}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)}Q_{I,j}(V)e^{-(1-s)\beta H_0/2}\Big\|_1\\
&\le \lambda^2\big\|e^{-(1-s)\beta H_0/4}\big\|_{1/(1-s)}\Big\|e^{-(1-s)\beta H_0/4}Q_{I,j}(V)e^{-s\beta H_j(\lambda V)/4}\Big\|\\
&\quad\times\big\|e^{-s\beta H_j(\lambda V)/2}\big\|_{1/s}\Big\|e^{-s\beta H_j(\lambda V)/4}Q_{I,j}(V)e^{-(1-s)\beta H_0/2}\Big\|\\
&\le \lambda^2\big\|e^{-\beta H_0/4}\big\|_1^{1-s}\big\|e^{-\beta H_j(\lambda V)/2}\big\|_1^{s}\Big\|e^{-s\beta H_j(\lambda V)/4}Q_{I,j}(V)e^{-(1-s)\beta H_0/4}\Big\|^2.
\end{aligned}\tag{6.33}$$
We have used Hölder's inequality with the exponents $(1-s)^{-1}$, ∞, $s^{-1}$, ∞, and the fact that $\|T^*\| = \|T\|$, as well as $\|e^{-(1-s)\beta H_0/4}\| \le 1$. We use the bound (5.3), along with (6.32), to complete the proof of (6.27). □

Proof of Theorem 6.2. Bound the difference $|Z_j^{\lambda V} - Z^0|$ by using the representation of Lemma 6.5.v and the bound of Lemma 6.6.ii. Integrating this bound, we obtain, for any $\alpha\in(0,(\tilde n-1)^{-1})$,
$$\big|Z_j^{\lambda V} - Z^0\big| \le \beta\int_0^1\big|\mathrm{Tr}_{\mathcal H}\big(A\,f_j^\beta(\lambda,s)\big)\big|\,ds \le \beta M_7(\alpha)\big(1-\alpha(\tilde n-1)\big)^{-1}\big(1-\alpha(\tilde n-2)\big)\lambda^{2\alpha}. \tag{6.34}$$
The parameter 2α in the bound (6.34) becomes α in the bound (6.12). Thus we obtain Hölder continuity with any Hölder exponent strictly less than 2/(ñ − 1), and the proof of the theorem is complete. □

Proof of Theorem 2.1. The bound (5.4), along with Proposition 5.1, ensures that the limit of partition functions $\lim_{j\to\infty}Z_j^{\lambda V}$ actually equals $Z^{\lambda V}$. There is no question about the existence or the numerical value of the limit: Theorem 6.1 ensures that the function $Z_j^{\lambda V}$ is constant in λ for λ > 0, and Theorem 6.2 ensures that $Z_j^{\lambda V}$ equals the same function at λ = 0. Since $Z_j^0$ is j-independent, $Z_j^{\lambda V}$ is also j-independent. As a result, not only do the differentiability and continuity of $Z_j^{\lambda V}$ also hold for $Z^{\lambda V}$, but $Z^{\lambda V}$ is also λ-independent for λ ∈ [0, 1]. So we have established Theorem 2.1 and the first statement in Corollary 2.2.
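The constant in (6.34) comes from integrating (6.27) over s. In closed form, $\int_0^1 s^{-1+\alpha}(1-s)^{-\alpha(\tilde n-1)}\,ds = B(\alpha,1-\alpha(\tilde n-1)) = \Gamma(\alpha)\Gamma(1-\alpha(\tilde n-1))/\Gamma(1-\alpha(\tilde n-2))$, which is finite precisely for $0<\alpha<1/(\tilde n-1)$ and grows like $(1-\alpha(\tilde n-1))^{-1}$ at the upper end, consistent with the factor $(1-\alpha(\tilde n-1))^{-1}$ in (6.34) and with the restriction of the Hölder exponent to $2\alpha<2/(\tilde n-1)$. The Python sketch below tabulates this for an illustrative degree ñ = 3; the function names are ours.

```python
import math

def s_integral_634(alpha, n_tilde):
    """integral_0^1 s^(-1+alpha) (1-s)^(-alpha*(n_tilde-1)) ds, via the Beta function."""
    a = alpha                           # s-exponent is a - 1
    b = 1.0 - alpha * (n_tilde - 1)     # (1-s)-exponent is b - 1
    if a <= 0 or b <= 0:
        return math.inf
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def max_holder_exponent(n_tilde):
    """Supremum 2/(n_tilde - 1) of the Hölder exponents allowed by Theorem 6.2."""
    return 2.0 / (n_tilde - 1)

n_tilde = 3                             # degree of V; illustrative choice
alpha_max = 1.0 / (n_tilde - 1)
for frac in (0.25, 0.5, 0.9, 0.99, 0.999):
    alpha = frac * alpha_max
    value = s_integral_634(alpha, n_tilde)
    blowup = 1.0 - alpha * (n_tilde - 1)
    print(f"alpha={alpha:.4f}  integral={value:.4f}  (1-alpha(n-1))*integral={blowup * value:.4f}")
print("Hölder exponents below", max_holder_exponent(n_tilde))
```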
7. Analyticity

In the previous section we saw that $Z_j^V(\tau,\theta,\phi) = Z^V(\tau,\theta,\phi) = Z^0(\tau,\theta,\phi)$. In the next section we calculate $Z^0(\tau,\theta,\phi)$ and find that it is holomorphic for all $\tau\in\mathbb H$ and all θ ∈ ℂ. Furthermore, it extends to a holomorphic function of φ. (There is an independent way to verify that $Z_j^V(\tau,\theta,\phi)$ is holomorphic, using a priori estimates. That analyticity holds in a smaller domain, but in a j-independent one.)
Proposition 7.1. Assume QH, EL, and TR, with a fixed potential V. Then for fixed real θ and φ, the partition function $Z^V(\tau,\theta,\phi)$ is holomorphic in τ for all $\tau\in\mathbb H$. Furthermore, for fixed $\tau\in\mathbb H$ and fixed φ ∈ ℝ, the function $Z^V(\tau,\theta,\phi)$ extends analytically in θ to a strip $|\mathrm{Im}\,\theta| < R$, where R = R(τ).

Let $A = e^{-i\theta J - i\sigma P}$. One can express the partition function $Z^V$ as
$$Z^V(\tau,\theta,\phi) = \mathrm{Tr}_{\mathcal H}\big(Ae^{-\beta H}\big) = \mathrm{Tr}_{\mathcal H}\big(e^{-i\theta J - i\tau(H-P)/2 + i\bar\tau(H+P)/2}\big) = \mathrm{Tr}_{\mathcal H}\big(e^{-i\theta J - i\tau(Q^2/2-P) + i\bar\tau Q^2/2}\big), \tag{7.1}$$
where $\bar\tau$ denotes the complex conjugate of τ. We have a representation similar to (7.1) for the approximating family of partition functions,
$$Z_j^V(\tau,\theta,\phi) = \mathrm{Tr}_{\mathcal H}\big(Ae^{-\beta H_j}\big) = \mathrm{Tr}_{\mathcal H}\big(e^{-i\theta J - i\tau(Q_j^2/2-P) + i\bar\tau Q_j^2/2}\big). \tag{7.2}$$

Lemma 7.2. The approximating partition functions $Z_j^V(\sigma,\beta,\theta,\phi)$ are holomorphic in the following senses:
(i) Fix σ ∈ ℝ, θ ∈ ℝ, and φ ∈ (0, π]. Then $Z_j^V(\sigma,\beta,\theta,\phi)$, defined for β > 0, is the boundary value of a holomorphic function of β extending to $i\beta\in\mathbb H$.
(ii) Fix $i\beta\in\mathbb H$, θ ∈ ℝ, and φ ∈ (0, π]. Then $Z_j^V(\sigma,\beta,\theta,\phi)$ extends analytically in σ into a strip around the real σ axis whose width is independent of j.
(iii) Fix σ ∈ ℝ, $i\beta\in\mathbb H$, and φ ∈ (0, π]. Then $Z_j^V(\sigma,\beta,\theta,\phi)$ extends holomorphically in θ to a strip around the real θ axis whose width is independent of j.

Proof. Express the partition function $Z_j^V$ in terms of the real variables,
$$Z_j^V = Z_j^V(\sigma,\beta,\theta,\phi) = \mathrm{Tr}_{\mathcal H}\big(e^{-i\theta J - i\sigma P - \beta H_j}\big).$$
The uniform trace bound (5.3) ensures that $Z_j^V$ extends to a holomorphic function of β in the right half-plane. In order to establish part (ii), we use (5.21), combined with Lemma 5.2. Finally, to establish part (iii) of the lemma, we observe that J, P, and $H_j$ are mutually commuting. Furthermore, we use a bound on J in terms of |P|. In fact, using the explicit form of these operators, see [4], we conclude that for fixed φ > 0 there is a constant $M_3 < \infty$ such that
$$\pm J \le M_3|P|. \tag{7.3}$$
It then follows from (5.21) that, for constants $M_1$ and $M_2$ independent of j,
$$\pm J \le M_3\big(M_1H_j + M_2\big). \tag{7.4}$$
We then apply Lemma 5.2 with θ replacing σ and J replacing P, to conclude that $Z_j^{\lambda V}(\tau,\theta,\phi)$ is real analytic in θ. The constants $M_1$, $M_2$, and $M_3$ do not depend on j, so there is a strip of uniform width about the real θ axis in which $Z_j^{\lambda V}$ is uniformly bounded and holomorphic. □

Lemma 7.3. The approximate partition functions $Z_j^V(\sigma,\beta,\theta,\phi)$ satisfy the Cauchy–Riemann identity
$$\frac{\partial Z_j^V}{\partial\sigma} + i\,\frac{\partial Z_j^V}{\partial\beta} = 0, \tag{7.5}$$
for $\tau\in\mathbb H$. Therefore $Z_j^V$ is holomorphic for $\tau\in\mathbb H$.
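The cancellation used to prove Lemma 7.3, see (7.7), is the familiar supersymmetric one: if A commutes with $H = Q^2$ and anticommutes with Q, cyclicity forces $\mathrm{Tr}(AQ^2e^{-\beta H}) = 0$. In finite dimensions this can be seen directly with A the grading $\Gamma = \mathrm{diag}(I,-I)$ and $Q = \begin{pmatrix}0 & B^{\mathsf T}\\ B & 0\end{pmatrix}$, for which $\mathrm{Tr}(\Gamma Q^2e^{-\beta Q^2}) = \mathrm{Tr}(B^{\mathsf T}Be^{-\beta B^{\mathsf T}B}) - \mathrm{Tr}(BB^{\mathsf T}e^{-\beta BB^{\mathsf T}})$; this vanishes because $B^{\mathsf T}B$ and $BB^{\mathsf T}$ have the same eigenvalues. The Python sketch below checks this for hypothetical real 2×2 blocks B; it is a toy model chosen for illustration, not the field-theory operators of the text.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def sym_eigenvalues(S):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mean - radius, mean + radius]

def graded_trace(B, beta):
    """Tr(Gamma Q^2 e^{-beta Q^2}) for Q = [[0, B^T], [B, 0]] and Gamma = diag(I, -I).

    Q^2 = diag(B^T B, B B^T), so the trace splits into the two diagonal blocks.
    """
    h = lambda mu: mu * math.exp(-beta * mu)
    even = sum(h(mu) for mu in sym_eigenvalues(matmul(transpose(B), B)))
    odd = sum(h(mu) for mu in sym_eigenvalues(matmul(B, transpose(B))))
    return even - odd

B = [[1.0, 2.0], [0.5, -0.3]]           # hypothetical block, chosen arbitrarily
for beta in (0.1, 1.0, 3.0):
    print(beta, graded_trace(B, beta))  # zero up to rounding
```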
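Proof. By Lemma 7.2, the derivatives of $Z_j^{\lambda V}(\sigma,\beta,\theta,\phi)$ with respect to β and σ exist. Differentiating the representation (7.2) and using the identity (4.14) yields
$$\frac{\partial Z_j^V}{\partial\sigma} + i\,\frac{\partial Z_j^V}{\partial\beta} = -i\,\mathrm{Tr}_{\mathcal H}\big(A(H_j+P)e^{-\beta H_j}\big) = -i\,\mathrm{Tr}_{\mathcal H}\big(AQ_j^2e^{-\beta H_j}\big). \tag{7.6}$$
Proposition 5.1 ensures that $Q_j(H_j+M_2)^{-1/2}$ is bounded, at least if we choose $M_2$ sufficiently large so that $I \le H_j + M_2$. But (5.3) ensures that $Q_je^{-\beta H_j/2} = e^{-\beta H_j/2}Q_j$ is also bounded and trace class. As a consequence, we use cyclicity of the trace and $AQ_j = -Q_jA$ to give
$$\begin{aligned}\mathrm{Tr}_{\mathcal H}\big(AQ_j^2e^{-\beta H_j}\big) &= \mathrm{Tr}_{\mathcal H}\big(Ae^{-\beta H_j/2}Q_j\,Q_je^{-\beta H_j/2}\big) = \mathrm{Tr}_{\mathcal H}\big(Q_je^{-\beta H_j/2}\,Ae^{-\beta H_j/2}Q_j\big)\\ &= -\mathrm{Tr}_{\mathcal H}\big(AQ_je^{-\beta H_j/2}Q_je^{-\beta H_j/2}\big) = -\mathrm{Tr}_{\mathcal H}\big(AQ_j^2e^{-\beta H_j}\big) = 0,\end{aligned}\tag{7.7}$$
completing the proof of analyticity in $\tau\in\mathbb H$. □

8. Evaluation

We verify the representation for the elliptic genus in the case that the potential V is zero.

Proposition 8.1. Choose $\Omega_i\in(0,\tfrac12]$ for 1 ≤ i ≤ n. Take V = 0 and assume TR and NC. Then the partition function $Z^0$ is given by (2.4).

Proof. Define
$$\Omega_i^f(k) = \begin{cases}\Omega_i, & \text{if } 0<k,\\ 1-\Omega_i, & \text{if } k<0,\end{cases} \tag{8.1}$$
and the functions
$$\gamma^b_{\pm,i}(\pm k) = e^{\mp i\theta\Omega_i - \beta|k|}, \qquad \gamma^f_{\pm,i}(\pm k) = e^{\mp i\theta\Omega^f_i(\pm k) - \beta|k|}. \tag{8.2}$$
The momenta range over the following lattices:
$$K^b_i = \{k : k\in 2\pi\mathbb Z - \Omega_i\phi\}, \qquad K^f_{\pm,i} = \{k : k\in 2\pi\mathbb Z - \Omega^f_i(\pm k)\phi\}. \tag{8.3}$$
We require that 0 < φ ≤ 2π and $0 < \Omega_i,\ 1-\Omega_i < 1$, so zero is not an allowed momentum:
$$0\notin K^b_i,\ K^f_{+,i},\ K^f_{-,i}. \tag{8.4}$$
In case V = 0, the partition function factors into a product of a fermionic free-field part and a bosonic free-field part. We calculated the free bosonic and fermionic partition functions in Theorems 2.2.1 and 5.4.1 of [4], yielding
$$Z^0 = y^{\hat c/2}\prod_{i=1}^{n}\frac{\prod_{k'\in K^f_{+,i}}\big(1-\gamma^f_{+,i}(k')\big)\prod_{k''\in K^f_{-,i}}\big(1-\gamma^f_{-,i}(-k'')\big)}{\prod_{k\in K^b_i}\big|1-\gamma^b_{+,i}(k)\big|^2}. \tag{8.5}$$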
The overall factor $y^{\hat c/2}$ arises from the normalization constant $\hat c/2$ in (1.13). Split each product into terms indexed by n ∈ ℤ, and separate the terms with positive, negative, and zero n. Note that $\gamma^b_{+,i}(k) = \gamma^b_{-,i}(-k)^*$. (The $\gamma^f_{\pm,i}$ satisfy such a relation only when $\Omega_i = 1/2$.) For $k = (2\pi n - \chi^{b,f}_{\pm,i})/\ell$ and n ∈ ℤ, the functions $\gamma^{b,f}_{\pm,i}(\pm k)$ take the following values:
$$\begin{array}{c|cccc}
 & \gamma^b_{+,i}(k) & \gamma^b_{-,i}(-k) & \gamma^f_{+,i}(k) & \gamma^f_{-,i}(-k)\\ \hline
n=0 & (z/y)^{\Omega_i} & (yz)^{\Omega_i} & (z/y)^{1-\Omega_i} & (yz)^{\Omega_i}\\
n>0 & q^n(1/yz)^{\Omega_i} & q^n(y/z)^{\Omega_i} & q^n(1/yz)^{\Omega_i} & q^n(y/z)^{1-\Omega_i}
\end{array}$$
Strongly Coupled Quantum Discrete Liouville Theory
219
Communicated by A. Jaffe, G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 221 – 245 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Scheme Independence of the Reduction Principle and Asymptotic Freedom in Several Couplings Wolfhart Zimmermann Max-Planck-Institut für Physik, Föhringer Ring 6, 80805 München, Germany Received: 14 June 2000 / Accepted: 28 June 2000
Dedicated to the memory of Harry Lehmann

Abstract: It is proved that reduction in the number of coupling and mass parameters is a scheme-independent concept. This result justifies the use of special renormalization schemes suitable for applications of the reduction method. Scheme-changing transformations are discussed with the aim of removing gauge and mass parameters from the reduction equations. Necessary and sufficient conditions for asymptotic freedom in models with several couplings are stated.

1. Introduction

The method of reducing the number of couplings was originally proposed for renormalizable models of quantum field theory with dimensionless couplings $\lambda_0, \lambda_1, \dots, \lambda_n$ and a normalization mass κ as the only parameters [1]. Since the reduction method is based exclusively on the form of the β functions, it may also be applied to other models, in formulations for which the β functions are massless and independent of gauge parameters. To this end the Landau gauge is used for gauge theories, together with a renormalization scheme, such as dimensional renormalization, in which the β functions are mass independent [2, 3]. Then the β functions depend on the dimensionless couplings only,
$$\beta_j = \beta_j(\lambda_0, \lambda_1, \dots, \lambda_n), \qquad j = 0, 1, \dots, n. \tag{1.1}$$
By the principle of reduction, all couplings $\lambda_j$ are required to be functions of a single one, denoted by $\lambda_0$,
$$\lambda_j = \lambda_j(\lambda_0) \qquad (j = 1, \dots, n), \tag{1.2}$$
in a way which is compatible with invariance under the renormalization group [4–6]. Substituting the functions (1.2) for the couplings $\lambda_j$ of the original model, one obtains a formulation of a reduced model involving a single coupling parameter $\lambda_0$ only. As a
222
W. Zimmermann
consequence of the renormalization group invariance of both the original and the reduced model, one finds a system of ordinary differential equations
$$\beta_0\,\frac{d\lambda_j}{d\lambda_0} = \beta_j \tag{1.3}$$
to be satisfied by the functions (1.2). For the solutions to be meaningful, it is required that all couplings vanish simultaneously in the weak coupling limit,
$$\lambda_j \to 0 \quad\text{for}\quad \lambda_0 \to 0. \tag{1.4}$$
In many cases it is natural to impose further that all couplings admit power series expansions with respect to a suitably selected primary coupling $\lambda_0$,
$$\lambda_j = \sum_l c_{jl}\,\lambda_0^l. \tag{1.5}$$
In this case the correlation functions of the reduced model have formal expansions in powers of $\lambda_0$, thus resembling a renormalizable theory with a single coupling $\lambda_0$. For some applications it is useful to consider partial reductions, where several parameters remain independent. It may also be of interest to require, instead of (1.4), that all couplings simultaneously approach a non-trivial zero of the β functions. Coupling relations (1.2) which follow from the invariance of a model under a symmetry group satisfy the conditions (1.3)–(1.5), provided the symmetry can be implemented to all orders of perturbation theory. The reduction method may thus be considered as a generalization of this particular aspect of symmetry¹.

The reduction method was extended by Piguet and Sibold to formulations of models with β functions depending on mass and gauge parameters [20]. In that case the reduction equations become a system of partial differential equations including derivatives with respect to the normalization mass and gauge parameters. Because of these partial derivatives it is difficult to study the solutions of the reduction equations in the general case. However, Piguet and Sibold found the remarkable result that, on the basis of the Callan–Symanzik equations [21, 22], the reduction equations take the form of ordinary differential equations with parametric dependence on the mass and gauge parameters. Since in general the renormalization group equations [23] and the Callan–Symanzik equations are independent, the question of consistency between the two types of reduction equations arises. For solutions which are uniquely determined power series in the primary coupling, Piguet and Sibold proved the consistency. For general solutions the issue is more involved.
But transforming to a scheme with massless β functions for which renormalization group equations and Callan–Symanzik equations coincide should furnish a resolution of this problem in general. Another important development concerns the combined reduction of couplings and masses in supersymmetric grand unified theories [24]. In this work Kubo, Mondragón and Zoupanos reduced the coefficients of the soft supersymmetry breaking terms in order to minimize the number of independent parameters. The scheme of dimensional renormalization was used with mass parameters introduced similarly to couplings. Then the differential equations of the renormalization group also involve derivatives with respect to the masses. It is characteristic for dimensional renormalization that those β functions which carry a dimension are linear or quadratic forms in the dimensional 1 For reviews see, for instance, refs. [7–14]. Refs. [15–19] contain earlier work related to the reduction of couplings.
Reduction and Asymptotic Freedom
couplings and masses, while the coefficients of these polynomials depend on the dimensionless couplings only. Since in this approach the mass parameters enter similarly to the couplings, masses are included with the couplings in the reduction process. In this way Kubo, Mondragón and Zoupanos obtained non-trivial constraints on the soft supersymmetry breaking terms which are compatible with renormalization and lead to surprisingly simple sum rules [25].

In the present paper it will be proved that the principle of reduction is invariant under transformations of couplings and masses which change the scheme of renormalization². This scheme independence justifies the use of special renormalization schemes chosen such that the β functions take a particularly simple form. The proof includes the case of couplings with the dimension of mass and of variable masses treated similarly to couplings (Sect. 2).

In Sect. 3 methods of eliminating gauge and mass parameters are discussed. For a comprehensive treatment we refer to the work of Breitenlohner and Maison [27]. For the purpose of the reduction method an alternative approach is proposed which is based exclusively on the differential equations of the renormalization group. In models with dimensionless couplings and pole masses, transformations are constructed which lead to a renormalization scheme with massless β functions. The proof is based on formal expansions in powers of the couplings and uses the assumption that the massless limit of the β functions exists and is approached smoothly. The formulation obtained should be equivalent to dimensional renormalization with appropriate normalization conditions. The generalization to models which also involve dimensional couplings and variable mass parameters is only sketched. In this case mass parameters cannot be eliminated completely from the β functions; instead, a polynomial dependence on the dimensional couplings and masses remains.
The final form of the reduction equations agrees with ref. [24].

A different interpretation of the reduction method is provided by the evolution equations [28]. A systematic discussion of the effective couplings in this respect is given in Sect. 4 for models with dimensionless couplings and pole masses. It is shown how in the reduced model the effective couplings are expressed as functionals of the primary coupling, and an evolution equation for the primary coupling alone is derived. Again, particularly simple results are obtained if a renormalization scheme with massless β functions is used, as is justified by scheme independence. Eliminating the momentum variable $|k|$ from the evolution equations, the reduction equations for the effective couplings $\bar\lambda_j$ then take the form
$$\bar\beta_0\,\frac{d\bar\lambda_j}{d\bar\lambda_0} = \bar\beta_j \quad (j = 1, \dots, n), \qquad \bar\beta_j = \beta_j(\bar\lambda_0, \bar\lambda_1, \dots, \bar\lambda_n). \tag{1.6}$$
Corresponding to (1.4) the condition
$$\bar\lambda_j \to 0 \quad \text{for} \quad \bar\lambda_0 \to 0 \tag{1.7}$$
is imposed. In the case of
$$\bar\lambda_0 \to 0 \quad \text{for} \quad |k| \to \infty \tag{1.8}$$
the property of asymptotic freedom holds [29, 30]: all couplings vanish simultaneously in the high momentum limit,
$$\bar\lambda_0 \to 0, \ \dots, \ \bar\lambda_n \to 0 \quad \text{for} \quad |k| \to \infty. \tag{1.9}$$

² A preliminary report on this work was given in ref. [26].
W. Zimmermann
Equation (1.8) is implied by the evolution equation for $\bar\lambda_0$ if $\bar\beta_0$ has the appropriate sign for $\bar\lambda_0 \to 0$. For example, $\bar\beta_0$ should be negative if $\lambda_0 > 0$ in the model considered. In this way necessary and sufficient conditions for asymptotic freedom in several couplings follow³.
2. Scheme Independence

We consider models of local quantum field theory with renormalizable interactions involving several coupling and mass parameters. Apart from dimensionless coupling parameters and a normalization mass we allow for the possibility of intrinsic masses, of coupling parameters with the dimension of mass, and of gauge parameters, should gauge fields be present. For the intrinsic masses we use either pole masses, defined by the lowest propagator singularities, or variable masses, suitably defined by propagators at the normalization point. For implementing the concept of reduction some of the parameters are selected as independent variables, with the other parameters depending on them. Usually a single parameter is chosen as the independent variable. There are interesting applications, however, where a partial reduction with several independent parameters is useful; see ref. [24], for instance. For this reason the case of partial reduction is included. The parameters involved are:
– dimensionless couplings $g^0_{01}, \dots, g^0_{0A}$, $g^1_{01}, \dots, g^1_{0E}$;
– couplings of dimension mass $g^0_{11}, \dots, g^0_{1B}$, $g^1_{11}, \dots, g^1_{1F}$;
– variable masses $g^0_{21}, \dots, g^0_{2C}$, $g^1_{21}, \dots, g^1_{2G}$;
– variable mass squares $g^0_{31}, \dots, g^0_{3D}$, $g^1_{31}, \dots, g^1_{3H}$;
– pole masses $m_1, \dots, m_I$;
– gauge parameters $\alpha_1, \dots, \alpha_J$;
– normalization mass $\kappa$.
The independent parameters are denoted by $g^0_{ij}$; the parameters $g^1_{ij}$ will be considered functions of them,
$$g^1_{ij} = r_{ij}(g^0_{01}, \dots, g^0_{3D}, m_1, \dots, m_I, \alpha_1, \dots, \alpha_J, \kappa^2) \tag{2.1}$$
or, in vector notation,
$$g^1 = r(g^0, m, \alpha, \kappa^2), \tag{2.2}$$
$$g^0 = (g^0_{01}, \dots, g^0_{3D}), \quad g^1 = (g^1_{01}, \dots, g^1_{3H}), \quad r = (r_{01}, \dots, r_{3H}), \quad m = (m_1, \dots, m_I), \quad \alpha = (\alpha_1, \dots, \alpha_J). \tag{2.3}$$
The distinction between linear and quadratic mass parameters is a matter of convenience relevant for the massless limit. For the time-ordered correlation functions
$$\tau = \tau(k, g^0, g^1, m, \alpha, \kappa^2) \tag{2.4}$$
($k$ denotes the vector of momentum variables) the partial differential equations of the renormalization group are
$$\kappa^2\frac{\partial\tau}{\partial\kappa^2} + \beta^l_{ij}\frac{\partial\tau}{\partial g^l_{ij}} + \delta_j\frac{\partial\tau}{\partial\alpha_j} + \gamma_j\,\tau = 0. \tag{2.5}$$

³ For reduced models with asymptotic freedom see refs. [15–18, 31–34]; reviews are given in refs. [8, 10].
In the original model all variables $g^l_{ij}$ of the correlation functions are independent. By substituting the functions (2.1) for the variables $g^1_{ij}$ in (2.4), the number of independent parameters is decreased. The correlation functions thus obtained,
$$\tau'(k, g^0, m, \alpha, \kappa^2) = \tau(k, g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2), \tag{2.6}$$
define a new model, which is called a reduced model with the reducing functions (2.1). By the reduction principle the reduced model is again invariant under the renormalization group. This means that the correlation functions (2.6) should also satisfy partial differential equations of the form
$$\kappa^2\frac{\partial\tau'}{\partial\kappa^2} + \tilde\beta^0_{ij}\frac{\partial\tau'}{\partial g^0_{ij}} + \tilde\delta_j\frac{\partial\tau'}{\partial\alpha_j} + \tilde\gamma_j\,\tau' = 0. \tag{2.7}$$
Comparing (2.5) with (2.7) we obtain
$$\tilde\beta^0_{ij} = \beta^{0\prime}_{ij}, \qquad \tilde\delta_j = \delta'_j, \qquad \tilde\gamma_j = \gamma'_j \tag{2.8}$$
with the prime indicating that the functions (2.1) are to be inserted for the variables $g^1_{ij}$. For the reducing functions (2.1) the partial differential equations
$$\kappa^2\frac{\partial r_{st}}{\partial\kappa^2} + \beta^{0\prime}_{ij}\frac{\partial r_{st}}{\partial g^0_{ij}} + \delta'_j\frac{\partial r_{st}}{\partial\alpha_j} = \beta^{1\prime}_{st} \tag{2.9}$$
follow. The reduction principle requires further that the couplings vanish simultaneously in the weak coupling limit $g^0 \to 0$,
$$r_{0t} = 0, \quad r_{1u} = 0 \quad \text{at} \quad g^0_{0j} = 0, \ g^0_{1l} = 0. \tag{2.10}$$
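The step from (2.7) to (2.9) is the chain rule applied to (2.6). As a sketch, write $\tau'$ for the reduced correlation functions, use (2.8) to replace the coefficients of (2.7) by the primed functions, and read the comparison of (2.5) with (2.7) as a comparison of the coefficients of $(\partial\tau/\partial g^1_{st})'$ (this reading is an assumption of the sketch). Differentiating (2.6) gives
$$\begin{aligned}
\kappa^2\frac{\partial\tau'}{\partial\kappa^2} &= \Bigl(\kappa^2\frac{\partial\tau}{\partial\kappa^2}\Bigr)' + \Bigl(\frac{\partial\tau}{\partial g^1_{st}}\Bigr)'\,\kappa^2\frac{\partial r_{st}}{\partial\kappa^2},\\
\frac{\partial\tau'}{\partial g^0_{ij}} &= \Bigl(\frac{\partial\tau}{\partial g^0_{ij}}\Bigr)' + \Bigl(\frac{\partial\tau}{\partial g^1_{st}}\Bigr)'\,\frac{\partial r_{st}}{\partial g^0_{ij}},\\
\frac{\partial\tau'}{\partial\alpha_j} &= \Bigl(\frac{\partial\tau}{\partial\alpha_j}\Bigr)' + \Bigl(\frac{\partial\tau}{\partial g^1_{st}}\Bigr)'\,\frac{\partial r_{st}}{\partial\alpha_j}.
\end{aligned}$$
Inserting these into (2.7) and subtracting (2.5) evaluated on the surface (2.1) leaves
$$\Bigl(\kappa^2\frac{\partial r_{st}}{\partial\kappa^2} + \beta^{0\prime}_{ij}\frac{\partial r_{st}}{\partial g^0_{ij}} + \delta'_j\frac{\partial r_{st}}{\partial\alpha_j} - \beta^{1\prime}_{st}\Bigr)\Bigl(\frac{\partial\tau}{\partial g^1_{st}}\Bigr)' = 0,$$
and (2.9) is the condition that each bracket vanishes.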
A considerably stronger restriction may be imposed on the reducing functions by demanding that, in addition to (2.10), formal expansions of the dependent couplings $r_{0t}$, $r_{1u}$, and of the masses $r_{2u}$, $r_{3w}$ as well, exist with respect to the independent couplings $g^0_{0j}$, $g^0_{1l}$. In that case the correlation functions can also be expanded with respect to the independent couplings, so that the reduced system resembles a renormalizable model.

If the renormalization scheme is changed, the couplings and variable masses are transformed like
$$G^l_{ij} = \varphi^l_{ij}(g^0_{01}, \dots, g^1_{3H}, m, \alpha, \kappa^2) \tag{2.11}$$
or $G^l = \varphi^l(g^0, g^1, m, \alpha, \kappa^2)$ in vector form. Here $G^l$ and $\varphi^l$ denote the vectors
$$G^l = (G^l_{01}, \dots, G^l_{2F}), \qquad \varphi^l = (\varphi^l_{01}, \dots, \varphi^l_{2F}). \tag{2.12}$$
These transformations can be expanded in powers of the couplings $g^u_{0t}$, $g^w_{1v}$. In lowest order we have
$$\varphi^l_{ij} = g^l_{ij} + \text{higher orders in } g^u_{0t}, \ g^w_{1v}. \tag{2.13}$$
The correlation functions $\hat\tau$ in the new scheme are given by
$$\tau(k, g^0, g^1, m, \alpha, \kappa^2) = \hat\tau(k, G^0, G^1, m, \alpha, \kappa^2) \tag{2.14}$$
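with the transformation (2.11) to be substituted for $G^0$, $G^1$. In the new scheme the renormalization group equations are
$$\kappa^2\frac{\partial\hat\tau}{\partial\kappa^2} + \hat\beta^u_{st}\frac{\partial\hat\tau}{\partial G^u_{st}} + \delta_j\frac{\partial\hat\tau}{\partial\alpha_j} + \gamma_j\,\hat\tau = 0 \tag{2.15}$$
with the coefficients
$$\hat\beta^u_{st} = \kappa^2\frac{\partial\varphi^u_{st}}{\partial\kappa^2} + \beta^l_{ij}\frac{\partial\varphi^u_{st}}{\partial g^l_{ij}} + \delta_j\frac{\partial\varphi^u_{st}}{\partial\alpha_j}. \tag{2.16}$$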
The functions (2.1) represent a surface $S$ in the space of coordinates $g^l_{ij}$. By the transformation (2.11) the surface $S$ is mapped into a surface $\hat S$ in the space of coordinates $G^l_{ij}$, which will be described by functions
$$G^1 = R(G^0, m, \alpha, \kappa^2), \qquad R = (R_{01}, \dots, R_{2F}). \tag{2.17}$$
Inserting these functions into the transformed correlation functions we obtain a reduced system with the correlation functions
$$\hat\tau'(k, G^0, m, \alpha, \kappa^2) = \hat\tau(k, G^0, R(G^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2). \tag{2.18}$$
In order to prove the scheme independence of the reduction principle we have to show that $\hat\tau'$ satisfies a renormalization group equation. We begin with the construction of the functions (2.17). The surface $S$ is mapped into the surface $\hat S$ by
$$G^0 = \varphi^0(g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = L^0(g^0, m, \alpha, \kappa^2), \tag{2.19}$$
$$G^1 = \varphi^1(g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = L^1(g^0, m, \alpha, \kappa^2) \tag{2.20}$$
(see Eqs. (2.1) and (2.11)). At given $m$, $\alpha$ and $\kappa^2$ the coordinates $G^l$ of $\hat S$ are thus expressed as functions of $g^0$, which we denote by $L^l$. For constructing the parametrization (2.17) we have to replace $g^0$ by $G^0$. To this end we invert (2.19) with respect to $g^0$,
$$g^0 = f(G^0, m, \alpha, \kappa^2) \qquad \bigl(\text{inversion of } G^0 = L^0(g^0, m, \alpha, \kappa^2)\bigr). \tag{2.21}$$
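The inversion is possible for values of $g^0$ that are not too large, since
$$\frac{\partial L^0_{ij}}{\partial g^0_{st}} = \frac{\partial\varphi^0_{ij}}{\partial g^0_{st}} + \frac{\partial\varphi^0_{ij}}{\partial g^1_{vw}}\frac{\partial r_{vw}}{\partial g^0_{st}} = \delta_{is}\delta_{jt} \quad \text{at} \quad g^0_{0p} = 0, \ g^0_{1q} = 0 \tag{2.22}$$
(see Eq. (2.13)). Substituting (2.21) for $g^0$ into (2.20) we obtain
$$G^1 = L^1(f(G^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = R(G^0, m, \alpha, \kappa^2). \tag{2.23}$$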
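The construction (2.19)–(2.23) can be checked on truncated power series. The following is a minimal toy sketch, not the general multi-parameter case: it assumes a single independent coupling $g^0$, a single dependent coupling $g^1 = r(g^0) = c\,g^0$, and a hypothetical scheme change $G^0 = g^0 + a\,(g^0)^2$, $G^1 = g^1 + b\,g^0 g^1$ of the form (2.13). The coefficients $a, b, c$ and the helper names `mul`, `compose`, `invert` are illustrative only. The script inverts $L^0$ as in (2.21), which works because $L^0$ starts with the identity (2.22), and then composes with $L^1$ to obtain $R$ as in (2.23).

```python
from fractions import Fraction


def mul(p, q, order):
    """Product of two truncated power series (coefficient lists) up to x**order."""
    out = [Fraction(0)] * (order + 1)
    for i, pi in enumerate(p[:order + 1]):
        for j, qj in enumerate(q[:order + 1 - i]):
            out[i + j] += pi * qj
    return out


def compose(p, q, order):
    """Truncated series of p(q(x)); the inner series q must vanish at x = 0."""
    assert q[0] == 0, "inner series must vanish at the origin"
    out = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order  # q(x)**0
    for coef in p[:order + 1]:
        for i in range(order + 1):
            out[i] += coef * power[i]
        power = mul(power, q, order)  # advance to the next power of q(x)
    return out


def invert(p, order):
    """Series f with p(f(G)) = G, for p(0) = 0 and p'(0) != 0 (the role of (2.22))."""
    assert p[0] == 0 and p[1] != 0
    lead = Fraction(p[1])
    higher = [Fraction(0), Fraction(0)] + [Fraction(coef) for coef in p[2:order + 1]]
    f = [Fraction(0), 1 / lead] + [Fraction(0)] * (order - 1)
    # Fixed-point iteration f = (G - higher(f)) / lead; each pass fixes one more order.
    for _ in range(order):
        h = compose(higher, f, order)
        f = [((1 if i == 1 else 0) - h[i]) / lead for i in range(order + 1)]
    return f


# Hypothetical toy data: one independent coupling g0, one dependent coupling g1.
a, b, c = Fraction(3), Fraction(5), Fraction(2)
N = 3                   # truncation order
L0 = [0, 1, a]          # G0 = g0 + a*g0**2                  cf. (2.19)
L1 = [0, c, b * c]      # G1 = g1 + b*g0*g1 with g1 = c*g0   cf. (2.20)
f = invert(L0, N)       # g0 = f(G0)                         cf. (2.21)
R = compose(L1, f, N)   # G1 = R(G0)                         cf. (2.23)
print("f:", [str(x) for x in f])
print("R:", [str(x) for x in R])
```

For these values the script gives $f = G^0 - 3(G^0)^2 + 18(G^0)^3 + \dots$ and $R = 2G^0 + 4(G^0)^2 - 24(G^0)^3 + \dots$, i.e. the coefficients $c$, $c(b-a)$, $2ac(a-b)$. The vanishing constant term of $R$ is the one-variable analogue of (2.35), and exact `Fraction` arithmetic keeps the series coefficients free of rounding.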
By this we have constructed the parametrization (2.17) of the surface $\hat S$. After this preparation we turn to the proof of the renormalization group equations for the functions $\hat\tau'$ defined by (2.18). Into the transformation law (2.14) of the correlation functions we substitute the reducing functions (2.1) and their image (2.17) for the variables $g^1$ and $G^1$, respectively:
$$\tau(k, g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = \hat\tau(k, G^0, R(G^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2). \tag{2.24}$$
By the definitions (2.6) and (2.18) of $\tau'$ and $\hat\tau'$, this represents the transformation law for the correlation functions of the reduced system,
$$\tau'(k, g^0, m, \alpha, \kappa^2) = \hat\tau'(k, G^0, m, \alpha, \kappa^2), \tag{2.25}$$
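with (2.19) expressing the dependence of $G^0$ on $g^0$. Differentiating (2.25) with respect to $\kappa^2$, $g^0_{ij}$ and $\alpha_j$ we get
$$\frac{\partial\tau'}{\partial\kappa^2} = \frac{\partial\hat\tau'}{\partial\kappa^2} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial L^0_{st}}{\partial\kappa^2} = \frac{\partial\hat\tau'}{\partial\kappa^2} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial\varphi^0_{st}}{\partial\kappa^2} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial\varphi^0_{st}}{\partial g^1_{vw}}\frac{\partial r_{vw}}{\partial\kappa^2}, \tag{2.26}$$
$$\frac{\partial\tau'}{\partial g^0_{ij}} = \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial L^0_{st}}{\partial g^0_{ij}} = \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial\varphi^0_{st}}{\partial g^0_{ij}} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial\varphi^0_{st}}{\partial g^1_{vw}}\frac{\partial r_{vw}}{\partial g^0_{ij}}, \tag{2.27}$$
$$\frac{\partial\tau'}{\partial\alpha_j} = \frac{\partial\hat\tau'}{\partial\alpha_j} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial L^0_{st}}{\partial\alpha_j} = \frac{\partial\hat\tau'}{\partial\alpha_j} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial\varphi^0_{st}}{\partial\alpha_j} + \frac{\partial\hat\tau'}{\partial G^0_{st}}\frac{\partial\varphi^0_{st}}{\partial g^1_{vw}}\frac{\partial r_{vw}}{\partial\alpha_j}. \tag{2.28}$$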
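Inserting these expressions into (2.7), (2.8) and using first (2.9), then (2.16) (for $u = 0$), we obtain
$$\kappa^2\frac{\partial\hat\tau'}{\partial\kappa^2} + \hat\beta^{0\prime}_{ij}\frac{\partial\hat\tau'}{\partial G^0_{ij}} + \delta'_j\frac{\partial\hat\tau'}{\partial\alpha_j} + \gamma'_j\,\hat\tau' = 0. \tag{2.29}$$
These are the renormalization group equations of the reduced system in the new scheme. Combining this result with the renormalization group equations (2.15) of the original system in the new scheme, we find the differential equations
$$\kappa^2\frac{\partial R_{st}}{\partial\kappa^2} + \hat\beta^{0\prime}_{ij}\frac{\partial R_{st}}{\partial G^0_{ij}} + \delta'_j\frac{\partial R_{st}}{\partial\alpha_j} = \hat\beta^{1\prime}_{st} \tag{2.30}$$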
for the reducing functions (2.17). This completes the proof of the scheme independence of the reduction principle.

It is easy to check that condition (2.10), as well as the power series requirement, is scheme independent. We begin with transforming (2.10). By (2.13),
$$\varphi^0_{0s} = 0, \quad \varphi^0_{1t} = 0, \quad \varphi^1_{0u} = 0, \quad \varphi^1_{1v} = 0 \quad \text{at} \quad g^0_{0a} = 0, \ g^0_{1b} = 0, \ g^1_{0c} = 0, \ g^1_{1d} = 0. \tag{2.31}$$
Setting $g^0_{0a} = 0$, $g^0_{1b} = 0$, it follows from (2.10) that $r_{0c} = 0$ and $r_{1d} = 0$, so that, using (2.31), we obtain in (2.19), (2.20)
$$L^0_{0s} = 0, \quad L^0_{1t} = 0 \quad \text{at} \quad g^0_{0a} = 0, \ g^0_{1b} = 0 \tag{2.32}$$
and
$$L^1_{0u} = 0, \quad L^1_{1v} = 0 \quad \text{at} \quad g^0_{0a} = 0, \ g^0_{1b} = 0. \tag{2.33}$$
Since (2.19) is inverted uniquely by (2.21), (2.32) implies
$$f_{0a} = 0, \quad f_{1b} = 0 \quad \text{at} \quad G^0_{0s} = 0, \ G^0_{1t} = 0. \tag{2.34}$$
Inserting (2.34) followed by (2.33) into (2.23), the final result
$$R_{0u} = 0, \quad R_{1v} = 0 \quad \text{at} \quad G^0_{0s} = 0, \ G^0_{1t} = 0 \tag{2.35}$$
is obtained. This is the transformed version of (2.10) in the new scheme.

Similarly the power series requirement can be checked. An expansion of $r$ together with the expansion (2.13) implies that $L^0$ and $L^1$, as defined by (2.19) and (2.20) respectively, can be expanded in powers of $g^0_{0a}$, $g^0_{1b}$. Because of (2.22), the power series of $L^0$ may be inverted to a power series of $f$ (see Eq. (2.21)). Inserting the power series of $f$ into (2.23), followed by the expansion of $L^1$, we find that the reducing functions $R$ in the new scheme can be expanded in powers of $G^0_{0s}$ and $G^0_{1t}$. This completes the proof of scheme independence for the condition that all couplings simultaneously approach zero, and for the additional requirement that the reducing functions can be expanded in the independent couplings.

3. Elimination of Parameters

A comprehensive treatment of the elimination of gauge and mass parameters is given in the work of Breitenlohner and Maison published in this volume [27]. In this section we discuss possibilities of eliminating parameters which are based on the renormalization group alone and should be sufficient for applications to the reduction method. Only minimal assumptions on the dynamics of the system will be needed for that purpose.
The aim is to find parameter transformations which lead to schemes with particularly simple β functions. In the last section the relations (2.16) served to determine the β functions $\hat\beta^u_{st}$ in a new scheme after applying a given transformation (2.11) to the parameters. A different point of view will be taken now: we consider the β functions $\hat\beta^u_{st}$ as given in a suitable form and determine transformations (2.11) as solutions of Eqs. (2.16).

Postponing the removal of masses to a second step, we first discuss the elimination of gauge parameters. For this purpose we consider (2.16) with $\hat\beta^u_{st}$ taken to be the values of the β functions in the Landau gauge. In this case solutions of (2.16) can be found, but in general they involve additional parameters carrying a dimension or require a positive lower bound for the masses. Thus either the correlation functions will depend on new mass parameters, or a final elimination of masses is impossible. Using a few simple consequences of gauge invariance, however, parameter transformations can be constructed as solutions of (2.16) which do not introduce new parameters and apply to a range of mass values including the massless limit. A detailed treatment of this possibility for eliminating the gauge parameters will be given in another publication. For the remainder of this section it will be assumed that the gauge parameters have been removed.

We next turn to the problem of eliminating masses. First we consider models with parameters
$$\lambda_0, \lambda_1, \dots, \lambda_n; \quad m_1, \dots, m_I; \quad \zeta. \tag{3.1}$$
The couplings $\lambda_i$ are all dimensionless. The mass parameters $m_j$ denote pole masses defined by the location of the lowest propagator singularities. The normalization mass $\kappa$ is replaced by its inverse
$$\zeta = \frac{1}{|\kappa|}, \tag{3.2}$$
which is more convenient for the discussion of the massless limit. Opposite signs of the same coupling parameter are interpreted as belonging to different models, unless the square may be used instead of the original coupling parameter in the renormalization group analysis. For a specific model each coupling parameter is defined such that
$$\lambda_j \ge 0 \tag{3.3}$$
by changing its sign, if necessary. The renormalization group equations (2.5) simplify to
$$\beta_s\frac{\partial\tau}{\partial\lambda_s} - \frac12\zeta\frac{\partial\tau}{\partial\zeta} + \gamma_s\,\tau = 0 \tag{3.4}$$
with
$$\beta_s = \beta_s(\lambda_0, \lambda_1, \dots, \lambda_n, m_1\zeta, \dots, m_I\zeta) \tag{3.5}$$
(similarly for $\gamma_s$). In this and the following section it is assumed that the β functions are differentiable and do not vanish in⁴
$$(\lambda_0, \dots, \lambda_n) \in D, \qquad 0 \le m_j\zeta < \pi_j, \qquad \zeta > 0, \tag{3.6}$$
where $D$ is a bounded domain in the sector $\lambda_j > 0$ ($j = 1, \dots, n$) with the origin on the boundary of $D$. In the simplest case a cube
$$0 < \lambda_j < \omega_j \quad (j = 0, \dots, n), \qquad \omega_j > 0,$$
may be chosen for $D$. The interior of a cone section in $\lambda_j > 0$ ($j = 1, \dots, n$) with its tip at the origin should be sufficiently general. This assumption excludes the case in which the β functions vanish identically, and restricts (3.6) by an appropriate boundary such that non-trivial zeros of the β functions remain outside. Moreover, by (3.5) and (3.6) the massless limit
$$\hat\beta_j(\lambda_0, \dots, \lambda_n) = \beta_j(\lambda_0, \dots, \lambda_n, 0, \dots, 0) \tag{3.7}$$
exists independently of the way the limit $m_j \to 0$ is taken.

⁴ Instead of differentiability, Lipschitz conditions would be sufficient for the existence theorems applied in this paper.

We want to change the scheme by constructing a transformation (2.11),
$$\Lambda_j = \varphi_j(\lambda_0, \lambda_1, \dots, \lambda_n, m, \zeta), \qquad m = (m_1, \dots, m_I), \tag{3.8}$$
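which leads to renormalization group equations
$$\hat\beta_s\frac{\partial\hat\tau}{\partial\Lambda_s} - \frac12\zeta\frac{\partial\hat\tau}{\partial\zeta} + \gamma_s\,\hat\tau = 0 \tag{3.9}$$
with the massless β functions (3.7),
$$\hat\beta_s = \hat\beta_s(\Lambda_0, \dots, \Lambda_n). \tag{3.10}$$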
The transformations (3.8) are solutions of the partial differential equations (2.16),
$$\beta_s\frac{\partial\varphi_j}{\partial\lambda_s} - \frac12\zeta\frac{\partial\varphi_j}{\partial\zeta} = \hat\beta_j. \tag{3.11}$$
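The passage from (2.16) to (3.11), and from (2.5) to (3.4), rests on the change of variable (3.2). A short worked step, assuming as stated above that the gauge parameters have been removed (so the $\delta_j$ terms drop out) and noting that the pole masses are parameters of (2.11), not transformed variables:
$$\zeta = (\kappa^2)^{-1/2} \;\Rightarrow\; \frac{d\zeta}{d\kappa^2} = -\frac12(\kappa^2)^{-3/2} = -\frac12\zeta^3 \;\Rightarrow\; \kappa^2\frac{\partial}{\partial\kappa^2} = \kappa^2\frac{d\zeta}{d\kappa^2}\frac{\partial}{\partial\zeta} = -\frac12\zeta\frac{\partial}{\partial\zeta}.$$
With this, (2.16) for the dimensionless couplings reads $\hat\beta_j = -\frac12\zeta\,\partial\varphi_j/\partial\zeta + \beta_s\,\partial\varphi_j/\partial\lambda_s$, which is (3.11); the same substitution turns the term $\kappa^2\,\partial\tau/\partial\kappa^2$ of (2.5) into the term $-\frac12\zeta\,\partial\tau/\partial\zeta$ of (3.4).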
There are many solutions of (3.11). A unique solution can be constructed, for instance, by adjusting the new couplings to the old ones at a normalization mass $\kappa = \kappa_0$, i.e.
$$\Lambda_j = \lambda_j \quad \text{at} \quad \zeta = \zeta_0 = 1/|\kappa_0| > 0. \tag{3.12}$$
The existence of such a solution will be proved in a region (3.6). For given mass values the functions (3.8) represent an $(n+2)$-dimensional surface $S$ in the $(2n+3)$-dimensional space of coordinates $\lambda_i$, $\Lambda_j$, $\zeta$. A solution of (3.11) must be found for which $S$ contains the $(n+1)$-dimensional surface $S_0$ given by (3.12). The characteristic determinants of the $n+1$ equations (3.11) are identical and have the value $-\frac12\zeta_0$ on the surface $S_0$. Thus the characteristic determinants do not vanish at $\zeta = \zeta_0 > 0$. Therefore a unique solution of (3.11) exists which satisfies the initial conditions (3.12)⁵. In this way a new renormalization scheme is defined for which the β functions are those of the massless model. By this construction, however, a new dimensional parameter $\kappa_0^2$ is introduced. The β functions of the new scheme do not depend on it, but the transformation (3.8) as well as the correlation functions $\hat\tau$ in the new scheme involve this parameter $\kappa_0$. Moreover, the dependence on $\kappa_0$ is not controlled by the renormalization group equation. Instead, a satisfactory method of eliminating masses is provided by adjusting the couplings
$$\Lambda_j = \lambda_j \quad \text{at} \quad \zeta = 0. \tag{3.13}$$

⁵ See ref. [35], Chapter 2 and ref. [36], Chapter 2.2.
This condition may be interpreted as adjusting the couplings of the old and the new scheme for $|\kappa_0^2| \to \infty$. The procedure should not be confused with trying to normalize coupling parameters at infinite momentum. Even in the case of asymptotic freedom such a normalization is not easily possible, since then all effective couplings vanish in the high momentum limit. In contradistinction, the issue here is to find solutions of the partial differential equations (3.11) satisfying the initial conditions (3.13) with the β functions (3.5) and (3.10). The choice of boundary conditions (3.13) seems particularly natural, since the new β functions $\hat\beta_s$ are the limits of the original β functions for vanishing $\zeta$,
$$\hat\beta_s = \lim_{\zeta\to 0}\beta_s(\lambda_0, \dots, \lambda_n, m_1\zeta, \dots, m_I\zeta). \tag{3.14}$$
For the method to work this limit should exist, of course. But it should be stressed that the massless limit of the correlation functions is not required here. It will be shown that indeed a power series solution of (3.11) can be constructed uniquely by imposing condition (3.13). An existence and uniqueness proof which is not based on expansions is also possible, but requires in addition the use of Callan–Symanzik equations, as in the work of Breitenlohner and Maison [27].

For the construction of the power series expansions a few assumptions concerning the limit (3.14) will be made. In the formal expansions
$$\beta_j = \sum_{M\ge 2}\beta_{j\mu}\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n}, \qquad \mu = (\mu_0, \dots, \mu_n), \qquad M = \sum_j\mu_j \ge 2, \tag{3.15}$$
the coefficients
$$\beta_{j\mu} = \beta_{j\mu}(m_1, \dots, m_I; \zeta) = \beta_{j\mu}(\nu_1, \dots, \nu_I), \qquad \nu_j = m_j\zeta, \tag{3.16}$$
are assumed to exist in a region including the massless case $\zeta = 0$. The expansions of the β functions in the new scheme are then
$$\hat\beta_j = \sum_{M\ge 2}\hat\beta_{j\mu}\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n} \tag{3.17}$$
with the constants $\hat\beta_{j\mu}$ given as the values of $\beta_{j\mu}$ at $\zeta = 0$,
$$\hat\beta_{j\mu} = \beta_{j\mu}(0, \dots, 0). \tag{3.18}$$
It is further assumed that the value $\hat\beta_{j\mu}$ is approached smoothly by $\beta_{j\mu}$ in the limit $\zeta \to 0$. The condition that $|\Delta\beta_{j\mu}(m_1\zeta, \dots, m_I\zeta)| \le a_{j\mu}\,\zeta^{\eta_{j\mu}}$ if 0 … 0.
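The aim is to solve (3.11) by a formal expansion
$$\Lambda_s = \varphi_s(\lambda_0, \lambda_1, \dots, \lambda_n, m_1, \dots, m_I, \zeta) = \sum_\mu\Lambda_{s\mu}(m_1, \dots, m_I; \zeta)\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n} \tag{3.21}$$
with the initial condition (3.13) imposed. This implies
$$\Lambda_{s\mu}(m_1, \dots, m_I; 0) = 0 \tag{3.22}$$
for all coefficients except for the coefficient of $\lambda_s$, for which
$$\Lambda_{s(s)}(m_1, \dots, m_I; 0) = 1. \tag{3.23}$$
For the low order terms of the β functions (3.15), (3.17) and of the transformation (3.21) we use the simplified notation
$$\beta_j = \frac12 b_j^{kl}\lambda_k\lambda_l + \cdots, \tag{3.24}$$
$$\hat\beta_j = \frac12\hat b_j^{kl}\Lambda_k\Lambda_l + \cdots, \tag{3.25}$$
$$\Delta b_j^{kl} = b_j^{kl} - \hat b_j^{kl}, \tag{3.26}$$
$$\Lambda_s = L_s + L_s^k\lambda_k + \frac12 L_s^{kl}\lambda_k\lambda_l + \cdots. \tag{3.27}$$
The differential equations (3.11) imply
$$\frac{\partial L_s}{\partial\zeta} = 0, \qquad \frac{\partial L_s^k}{\partial\zeta} = 0,$$
and hence $L_s = 0$, $L_s^k = \delta_s^k$ by the conditions (3.22), (3.23). With this the expansion (3.21) takes the form
$$\Lambda_s = \lambda_s + \sum_{M\ge 2}\Lambda_{s\mu}(m_1, \dots, m_I; \zeta)\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n}. \tag{3.28}$$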
In the notation of (3.24)–(3.27) we obtain the differential equations
$$\frac12\zeta\frac{\partial L_s^{kl}}{\partial\zeta} = \Delta b_s^{kl} \tag{3.29}$$
for the coefficients of the quadratic terms. The solutions are
$$L_s^{kl} = 2\int_0^\zeta\Delta b_s^{kl}(m_1x, \dots, m_Ix)\,\frac{dx}{x}. \tag{3.30}$$
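The integral (3.30) can be evaluated numerically. The following sketch assumes a single mass $m$ and a hypothetical mass dependence $\Delta b(\nu) = \nu^2/(1+\nu^2)$ for one coefficient $\Delta b^{kl}_s$; this toy function vanishes like $\nu^2$ at $\nu = 0$, which is the kind of behaviour that keeps the integrand $\Delta b(mx)/x$ finite at $x = 0$. For this choice (3.30) has the closed form $L(\zeta) = \ln(1 + m^2\zeta^2)$, which satisfies $\frac12\zeta\,dL/d\zeta = \Delta b(m\zeta)$ as required by (3.29) and vanishes at $\zeta = 0$ as required by (3.13). The function names are illustrative.

```python
import math


def delta_b(nu):
    """Hypothetical Delta b(nu) = b(nu) - b(0) for one quadratic coefficient."""
    return nu * nu / (1.0 + nu * nu)


def integrand(x, m):
    """Delta b(m x) / x, with its limit 0 at x = 0 (valid since Delta b = O(nu**2))."""
    if x == 0.0:
        return 0.0
    return delta_b(m * x) / x


def simpson(func, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0


def quadratic_coefficient(zeta, m):
    """L(zeta) = 2 * integral_0^zeta Delta b(m x) dx / x, cf. (3.30)."""
    return 2.0 * simpson(lambda x: integrand(x, m), 0.0, zeta)


m = 1.5
for zeta in (0.0, 0.5, 1.0, 2.0):
    numeric = quadratic_coefficient(zeta, m)
    closed = math.log1p((m * zeta) ** 2)
    print(f"zeta={zeta:.1f}  numeric={numeric:.10f}  closed form={closed:.10f}")
```

Because the integral starts at $\zeta = 0$, no additional constant of integration appears, which is how the initial condition (3.13) fixes the solution at this order.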
By (3.19) the integrals converge; additional constants of integration vanish due to the initial condition (3.13). For treating higher orders we proceed by induction. The hypothesis of induction is: on the basis of the differential equations (3.11) with the initial conditions (3.13), all coefficients
$$\Lambda_{s\mu} = \Lambda_{s\mu}(m_1, \dots, m_I; \zeta) \tag{3.31}$$
of the expansion (3.28) with
$$2 \le M = \sum_j\mu_j < N \tag{3.32}$$
have been constructed. This construction is unique, and it has been shown that the coefficients (3.31) are bounded by
$$|\Lambda_{s\mu}(m_1, \dots, m_I; \zeta)| \le c_{s\mu}\,\zeta^{\eta_{s\mu}} \quad \text{if} \quad 0 < \zeta < u_{s\mu}, \tag{3.33}$$
for suitable numbers $c_{s\mu}$, $\eta_{s\mu}$, $u_{s\mu}$ with
$$c_{s\mu} > 0, \qquad 0 < \eta_{s\mu} < 1, \qquad u_{s\mu} > 0.$$
We remark that (3.33) holds for the integral (3.30) as a consequence of (3.19). It will now be shown that each coefficient
$$\Lambda_{t\nu} = \Lambda_{t\nu}(m_1, \dots, m_I; \zeta) \quad \text{with} \quad \nu = (\nu_0, \dots, \nu_n), \quad \sum_j\nu_j = N \tag{3.34}$$
is also determined uniquely by (3.11), (3.13) and bounded similarly to (3.33). Equation (3.11) implies the differential equation
$$\beta_{t\nu} - \frac12\zeta\frac{\partial\Lambda_{t\nu}}{\partial\zeta} + \sum_l E^l_{t\nu} = \hat\beta_{t\nu} \tag{3.35}$$
for (3.34). The terms $E^l_{t\nu}$ are determined by lower orders only, with $M < N$. They are monomials in the coefficients (3.31) with (3.32) and involve coefficients of the β functions. Therefore they are bounded similarly to (3.33). Equation (3.35) is solved by
$$\Lambda_{t\nu} = 2\int_0^\zeta\Delta\beta_{t\nu}(m_1x, \dots, m_Ix)\,\frac{dx}{x} + 2\sum_l\int_0^\zeta E^l_{t\nu}(m_1, \dots, m_I; x)\,\frac{dx}{x}. \tag{3.36}$$
Due to (3.19) and similar bounds for $E^l_{t\nu}$, all integrals converge and are again bounded like (3.33). Therefore (3.33) also holds for $\Lambda_{t\nu}$. This completes the proof by induction.

On the basis of formal expansions it is thus possible to construct a renormalization scheme in which the β functions depend neither on the pole masses $m_j$ nor on the normalization mass $\kappa$. This result will now be applied to the reduction of a model involving the parameters (3.1), with $\lambda_0$ chosen as primary coupling. For a set of reducing functions
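$$\lambda_j = r_j(\lambda_0, m_1\zeta, \dots, m_I\zeta), \tag{3.37}$$
the reduction equations (2.9) take the form
$$\beta'_0\frac{\partial r_j}{\partial\lambda_0} - \frac12\zeta\frac{\partial r_j}{\partial\zeta} = \beta'_j \quad (j = 1, \dots, n), \tag{3.38}$$
$$\beta'_j = \beta_j(\lambda_0, r_1, \dots, r_n, m_1\zeta, \dots, m_I\zeta). \tag{3.39}$$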
The reducing functions are supposed to satisfy the condition
$$\lim_{\lambda_0\to 0} r_j = 0 \tag{3.40}$$
or the stronger power series requirement
$$r_j = \sum_{l=1}^{\infty} c_{jl}\,\lambda_0^{\,l}, \qquad c_{jl} = c_{jl}(m_1\zeta, \dots, m_I\zeta). \tag{3.41}$$
After transforming to massless β functions, (3.37) is mapped into $\Lambda_j = R_j(\Lambda_0, m_1\zeta, \dots, m_I\zeta)$ satisfying
$$\hat\beta'_0\frac{\partial R_j}{\partial\Lambda_0} - \frac12\zeta\frac{\partial R_j}{\partial\zeta} = \hat\beta'_j \quad (j = 1, \dots, n), \tag{3.42}$$
$$\hat\beta'_j = \hat\beta_j(\Lambda_0, R_1, \dots, R_n) = \beta_j(\Lambda_0, R_1, \dots, R_n, 0, \dots, 0).$$
Although the β functions do not explicitly depend on $m_j$ or $\zeta$, such a dependence cannot be excluded for the solutions $R_j$. But it will be shown in the following section that any $\zeta$-dependent solution of (3.42) may be replaced by an equivalent solution of the same equations which is independent of $\zeta$. Therefore we may set
$$\frac{\partial R_j}{\partial\zeta} = 0$$
in (3.42) and solve the ordinary differential equations
$$\hat\beta'_0\frac{dR_j}{d\Lambda_0} = \hat\beta'_j \quad (j = 1, \dots, n) \tag{3.43}$$
by functions $\Lambda_j = R_j(\Lambda_0)$ with the requirement
$$\lim_{\Lambda_0\to 0} R_j = 0 \tag{3.44}$$
or the stronger power series condition
$$R_j = \sum_{l=1}^{\infty} C_{jl}\,\Lambda_0^{\,l}. \tag{3.45}$$
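At lowest order, (3.43) combined with the power series (3.45) reduces to an algebraic condition for $C_{j1}$. The sketch below treats a hypothetical model with one primary coupling $\Lambda_0$ and one dependent coupling $\Lambda_1$, and assumes massless one-loop β functions of the form $\hat\beta_0 = b_0\Lambda_0^2$ and $\hat\beta_1 = a_{11}\Lambda_1^2 + a_{10}\Lambda_0\Lambda_1 + a_{00}\Lambda_0^2$. Inserting $R_1 = C\Lambda_0 + O(\Lambda_0^2)$ into (3.43) and comparing the coefficients of $\Lambda_0^2$ gives $b_0 C = a_{11}C^2 + a_{10}C + a_{00}$. The coefficient names and numerical values are illustrative only.

```python
import math


def one_loop_reduction(b0, a11, a10, a00):
    """Real roots C of a11*C**2 + (a10 - b0)*C + a00 = 0.

    This is the Lambda_0**2 coefficient of (3.43) for R_1 = C*Lambda_0 with the
    hypothetical one-loop beta functions
        beta0_hat = b0*L0**2,
        beta1_hat = a11*L1**2 + a10*L0*L1 + a00*L0**2.
    """
    p = a10 - b0
    if a11 == 0:
        if p == 0:
            raise ValueError("degenerate case a11 = 0, a10 = b0 not treated here")
        return [-a00 / p]
    disc = p * p - 4.0 * a11 * a00
    if disc < 0:
        return []  # no real lowest-order reduction of this form
    root = math.sqrt(disc)
    return sorted({(-p + root) / (2.0 * a11), (-p - root) / (2.0 * a11)})


# Illustrative coefficients: b0 < 0, so beta0_hat is negative for Lambda_0 > 0.
solutions = one_loop_reduction(b0=-1.0, a11=1.0, a10=0.0, a00=-2.0)
admissible = [C for C in solutions if C >= 0]  # sign convention (3.3)
print("all roots:", solutions, " with C >= 0:", admissible)
```

For these values the roots are $C = -2$ and $C = 1$; only $C = 1$ respects the sign convention (3.3), giving $R_1 = \Lambda_0 + O(\Lambda_0^2)$. The higher coefficients $C_{jl}$ of (3.45) would then be sought order by order from (3.43); whether a real root exists at all depends on the model.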
We conclude this section with some brief remarks on the elimination of the normalization mass and on the reduction method for models involving dimensional couplings and variable mass squares, as in ref. [24]. The parameters are denoted by
– dimensionless couplings $\lambda_0, \lambda_1, \dots, \lambda_n$;
– couplings of dimension mass $\xi^0_1, \dots, \xi^0_B$, $\xi^1_1, \dots, \xi^1_F$;
– variable mass squares $\omega^0_1, \dots, \omega^0_C$, $\omega^1_1, \dots, \omega^1_G$;
– inverse normalization mass $\zeta = 1/|\kappa|$.
The independent parameters are
$$\lambda_0, \ \xi^0_1, \dots, \xi^0_B, \ \omega^0_1, \dots, \omega^0_C, \tag{3.46}$$
while the parameters
$$\lambda_1, \dots, \lambda_n, \ \xi^1_1, \dots, \xi^1_F, \ \omega^1_1, \dots, \omega^1_G \tag{3.47}$$
are treated as functions depending on (3.46),
$$\begin{aligned}
\lambda_t &= r_t(\lambda_0, \xi^0, \omega^0, \zeta) && (t = 1, \dots, n),\\
\xi^1_t &= r_{1t}(\lambda_0, \xi^0, \omega^0, \zeta) && (t = 1, \dots, F),\\
\omega^1_t &= r_{2t}(\lambda_0, \xi^0, \omega^0, \zeta) && (t = 1, \dots, G),
\end{aligned} \tag{3.48}$$
with the vector notation
$$\xi^0 = (\xi^0_1, \dots, \xi^0_B), \qquad \omega^0 = (\omega^0_1, \dots, \omega^0_C). \tag{3.49}$$
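The renormalization group equations (2.5) are
$$\sum_j\beta_j\frac{\partial\tau}{\partial\lambda_j} + \sum_{j,l}\beta^l_{1j}\frac{\partial\tau}{\partial\xi^l_j} + \sum_{j,l}\beta^l_{2j}\frac{\partial\tau}{\partial\omega^l_j} + \gamma_j\,\tau - \frac12\zeta\frac{\partial\tau}{\partial\zeta} = 0. \tag{3.50}$$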
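Taking into account the dimensionality of the β functions we write the representations
$$\beta_t = \beta_t, \qquad \beta^u_{1t} = \beta^{ui}_{1tk}\,\xi^i_k, \qquad \beta^u_{2t} = \beta^{ui}_{2tk}\,\omega^i_k + \beta^{uij}_{2tkl}\,\xi^i_k\xi^j_l \tag{3.51}$$
with coefficients
$$F = \beta_t, \ \beta^{ui}_{1tk}, \ \beta^{ui}_{2tk}, \ \beta^{uij}_{2tkl}$$
depending on dimensionless ratios only,
$$F = F(\lambda_0, \lambda_1, \dots, \lambda_n, \zeta\xi^0, \zeta\xi^1, \zeta^2\omega^0, \zeta^2\omega^1). \tag{3.52}$$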
Terms involving $\zeta^{-1}$ or $\zeta^{-2}$ with non-vanishing coefficients for $\zeta \to 0$ should not be expected in realistic models. It is assumed that the limits
$$\hat\beta_t = \lim_{\zeta\to 0}\beta_t, \qquad \hat\beta^u_{st} = \lim_{\zeta\to 0}\beta^u_{st} \tag{3.53}$$
exist. By (3.51) and (3.52) these limits yield quadratic forms in the dimensional couplings and masses, with coefficients depending on the dimensionless couplings. The reduction equations (2.9) take the form
$$\beta'_0\frac{\partial r_t}{\partial\lambda_0} + \beta^{0\prime}_{1j}\frac{\partial r_t}{\partial\xi^0_j} + \beta^{0\prime}_{2j}\frac{\partial r_t}{\partial\omega^0_j} - \frac12\zeta\frac{\partial r_t}{\partial\zeta} = \beta'_t, \tag{3.54}$$
$$\beta'_0\frac{\partial r_{st}}{\partial\lambda_0} + \beta^{0\prime}_{1j}\frac{\partial r_{st}}{\partial\xi^0_j} + \beta^{0\prime}_{2j}\frac{\partial r_{st}}{\partial\omega^0_j} - \frac12\zeta\frac{\partial r_{st}}{\partial\zeta} = \beta^{1\prime}_{st} \tag{3.55}$$
with primes indicating the insertion of the reducing functions. On the basis of formal power series expansions a transformation to a scheme can be constructed for which the β functions assume their values at $\zeta = 0$. Details will not be given in this paper. The transformed coupling and mass parameters are denoted by
$$\Lambda_0, \Lambda_1, \dots, \Lambda_n, \quad A^0_1, \dots, A^0_B, \ A^1_1, \dots, A^1_F, \quad B^0_1, \dots, B^0_C, \ B^1_1, \dots, B^1_G. \tag{3.56}$$
For the transformed reducing functions we write the representations
$$\Lambda_t = R_t(\Lambda_0, A^0, B^0, \zeta) = R_t(\Lambda_0, \zeta A^0, \zeta^2 B^0), \tag{3.57}$$
$$A^1_t = R_{1t}(\Lambda_0, A^0, B^0, \zeta) = S_{tk}A^0_k + S^0_t\,\zeta^{-1}, \tag{3.58}$$
$$B^1_t = R_{2t}(\Lambda_0, A^0, B^0, \zeta) = T_{tk}B^0_k + T^0_t\,\zeta^{-2} + T_{tkl}A^0_kA^0_l + T^0_{tk}A^0_k\,\zeta^{-1}. \tag{3.59}$$
Here the coefficients
$$F = S_{tk}, \ S^0_t, \ T_{tk}, \ T^0_t, \ T_{tkl}, \ T^0_{tk}$$
depend on dimensionless ratios,
$$F = F(\Lambda_0, \zeta A^0, \zeta^2 B^0). \tag{3.60}$$
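In the transformed version of the reduction equations
$$\hat\beta'_0\frac{\partial R_t}{\partial\Lambda_0} + \hat\beta^{0\prime}_{1j}\frac{\partial R_t}{\partial A^0_j} + \hat\beta^{0\prime}_{2j}\frac{\partial R_t}{\partial B^0_j} - \frac12\zeta\frac{\partial R_t}{\partial\zeta} = \hat\beta'_t, \tag{3.61}$$
$$\hat\beta'_0\frac{\partial R_{st}}{\partial\Lambda_0} + \hat\beta^{0\prime}_{1j}\frac{\partial R_{st}}{\partial A^0_j} + \hat\beta^{0\prime}_{2j}\frac{\partial R_{st}}{\partial B^0_j} - \frac12\zeta\frac{\partial R_{st}}{\partial\zeta} = \hat\beta^{1\prime}_{st}, \tag{3.62}$$
the β functions are $\zeta$-independent. Therefore it is consistent (and can be justified by an equivalence argument) that the reducing functions (3.57)–(3.59) do not depend on $\zeta$. This excludes terms involving $\zeta^{-1}$ or $\zeta^{-2}$. In the remaining terms $\zeta$ may be set equal to zero, so that the coefficients (3.60) become independent of masses and dimensional couplings. Thus
$$\Lambda_t = R_t(\Lambda_0), \tag{3.63}$$
$$A^1_t = R_{1t}(\Lambda_0, A^0) = S_{tk}A^0_k, \tag{3.64}$$
$$B^1_t = R_{2t}(\Lambda_0, A^0, B^0) = T_{tk}B^0_k + T_{tkl}A^0_kA^0_l. \tag{3.65}$$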
After insertion of (3.63) the β functions take the form
$$\hat\beta_t = \phi_t(\Lambda_0), \qquad \hat\beta^u_{1t} = \chi^{ui}_{tk}(\Lambda_0)A^i_k, \qquad \hat\beta^u_{2t} = \psi^{ui}_{tk}(\Lambda_0)B^i_k + \psi^{uij}_{tkl}(\Lambda_0)A^i_kA^j_l. \tag{3.66}$$
Here (3.64) and (3.65) should be substituted for the variables $A^1_k$ and $B^1_l$. Eventually the β functions and the reducing functions are expressed as quadratic forms in the independent variables $A^0_k$ and $B^0_l$. Using
$$\frac{\partial R_t}{\partial A^0_j} = 0, \quad \frac{\partial R_t}{\partial B^0_j} = 0, \quad \frac{\partial R_t}{\partial\zeta} = 0, \quad \frac{\partial R_{1t}}{\partial B^0_j} = 0, \quad \frac{\partial R_{1t}}{\partial\zeta} = 0, \quad \frac{\partial R_{2t}}{\partial\zeta} = 0,$$
the reduction equations (3.61), (3.62) simplify considerably. With the representations (3.63)–(3.66) a first order system of ordinary differential equations is found for the coefficients $R_t$, $S_{tk}$, $T_{tk}$, $T_{tkl}$ of the reducing functions (3.63)–(3.65). The final result is Eqs. (2)–(11) of ref. [24].

4. Evolution Equations and Asymptotic Freedom

In this section evolution equations will be studied in connection with asymptotic freedom and reduction for models involving dimensionless couplings and masses defined by propagator singularities. For the notation see (3.1). Effective couplings
$$\bar\lambda_j = \bar\lambda_j(z, m; \lambda_0, \lambda_1, \dots, \lambda_n, \zeta) \quad (j = 0, \dots, n), \qquad z = \frac{1}{|k|}, \quad \zeta = \frac{1}{|\kappa|}, \quad m = (m_1, \dots, m_I), \tag{4.1}$$
depending on a momentum square $k^2$ are introduced by suitable vertex functions with initial values at the normalization point,
$$\bar\lambda_j = \lambda_j > 0 \quad \text{at} \quad z = \zeta > 0. \tag{4.2}$$
For the effective couplings, evolution equations hold in the form
$$-\frac12 z\frac{d\bar\lambda_j}{dz} = \beta_j(\bar\lambda_0, \dots, \bar\lambda_n, m_1z, \dots, m_Iz) \tag{4.3}$$
with the initial values (4.2). The masses and initial values we restrict by (3.6), likewise $z$ and the values $\bar\lambda_j$ assumed by the solutions of (4.3). Then by the Cauchy–Picard theorem a unique solution (4.1) of (4.3) exists with initial values (4.2). Unless the dependence on the initial values $\lambda_j$, $\zeta$ is relevant, the simplified notation
$$\bar\lambda_j = \bar\lambda_j(z, m) \tag{4.4}$$
will be used instead of (4.1). Asymptotic freedom means that all effective couplings vanish in the high momentum limit, i.e. for $z = 1/|k| \to 0$,
$$\lim_{z\to 0}\bar\lambda_j(z, m) = 0 \quad (j = 0, 1, \dots, n). \tag{4.5}$$
In the case of several couplings this is not a property of the model as such; rather, it selects, if at all possible, particular solutions of the evolution equations, while other solutions are not asymptotically free. By imposing (4.5) the couplings are no longer independent. In fact, it will be seen that (4.5) induces a reduction of couplings. Since zeros are absent in the domain (3.6), the evolution equations (4.3) imply that each effective coupling is either monotonically increasing or monotonically decreasing. Therefore condition (4.5) combined with convention (3.3) implies
$$\frac{d\bar\lambda_j}{dz} > 0 \tag{4.6}$$
and
$$\beta_j(\bar\lambda_0, \dots, \bar\lambda_n, m_1z, \dots, m_Iz) < 0 \tag{4.7}$$
for asymptotically free couplings $\bar\lambda_j$ on the domain (3.6). Thus a negative sign of the β functions is a necessary condition for asymptotic freedom. Unlike the case of a single coupling, however, it is not sufficient in general. Sufficient conditions will be stated later, after elimination of the mass parameters in the β functions. In preparation for this, we discuss how evolution equations transform under a change of the renormalization scheme. After a scheme changing transformation (3.8), new effective couplings may be defined by
$$\bar\Lambda_j = \varphi_j(\bar\lambda_0, \dots, \bar\lambda_n, m, z). \tag{4.8}$$
Through the dependence (4.4), the transformed couplings (4.8) also become functions of $z$ and $m$, with initial values
$$\bar\Lambda_j = \Lambda_j \quad \text{at} \quad z = \zeta. \tag{4.9}$$
For these functions the notation
$$\bar\Lambda_j = \bar\Lambda_j(z, m; \Lambda_0, \dots, \Lambda_n, \zeta), \tag{4.10}$$
Reduction and Asymptotic Freedom
239
or, more simply,
$$\bar\Lambda_j = \bar\Lambda_j(z, m), \tag{4.11}$$
will be used. With Eq. (3.11) it is easy to check that the new effective couplings again satisfy evolution equations of the form
$$-\frac{1}{2} z \frac{d\bar\Lambda_j}{dz} = \hat\beta_j(\bar\Lambda_0, \dots, \bar\Lambda_n, m_1 z, \dots, m_I z). \tag{4.12}$$
The condition (4.5) of asymptotic freedom is scheme independent. Indeed, according to the properties of the transformations (2.11) stated in the last section, a Taylor formula
$$\bar\Lambda_j = \bar\lambda_j + \sum_{s,t} \bar\lambda_s \bar\lambda_t R^j_{st} \tag{4.13}$$
holds with appropriate remainders $R^j_{st}$. Thus (4.5) implies the corresponding condition
$$\lim_{z \to 0} \bar\Lambda_j(z, m) = 0 \quad (j = 0, \dots, n) \tag{4.14}$$
in the new scheme. The scheme independence justifies studying asymptotic freedom in a special scheme where the β functions are massless. The evolution equations then take the simplified form
$$-\frac{1}{2} z \frac{d\bar\Lambda_j}{dz} = \hat\beta_j(\bar\Lambda_0, \dots, \bar\Lambda_n) \quad (j = 0, \dots, n) \tag{4.15}$$
with $\hat\beta_j$ denoting the massless limit (3.7). For asymptotically free solutions we write (4.6) and (4.7) in transformed form:
$$\frac{d\bar\Lambda_j}{dz} > 0, \tag{4.16}$$
$$\hat\beta_j(\bar\Lambda_0, \dots, \bar\Lambda_n) < 0. \tag{4.17}$$
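The simplest case of (4.15)–(4.17) is a single coupling. As a minimal sketch, assume the hypothetical one-loop massless β function $\hat\beta_0(\Lambda) = -b\Lambda^2$ with $b > 0$ (an illustrative choice, not taken from the text). In the variable $t = \ln z$, (4.15) becomes $d\bar\Lambda_0/dt = 2b\bar\Lambda_0^2$, whose solution $\bar\Lambda_0(z) = \Lambda_0/(1 + 2b\Lambda_0\ln(\zeta/z))$ increases with $z$, as in (4.16), and vanishes as $z \to 0$. The code compares a fourth-order Runge–Kutta integration with this closed form; the function names are hypothetical.

```python
import math


def beta_hat(lam, b):
    """Hypothetical one-loop massless beta function: beta(lam) = -b * lam**2."""
    return -b * lam ** 2


def evolve(lam0, zeta, z, b, steps=2000):
    """Integrate -(1/2) z dlam/dz = beta(lam) from z = zeta to z with RK4.

    In t = ln z the equation reads dlam/dt = -2 * beta(lam).
    """
    h = (math.log(z) - math.log(zeta)) / steps

    def rhs(lam):
        return -2.0 * beta_hat(lam, b)

    lam = lam0
    for _ in range(steps):
        k1 = rhs(lam)
        k2 = rhs(lam + 0.5 * h * k1)
        k3 = rhs(lam + 0.5 * h * k2)
        k4 = rhs(lam + h * k3)
        lam += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return lam


def closed_form(lam0, zeta, z, b):
    """Exact solution lam(z) = lam0 / (1 + 2 b lam0 ln(zeta / z)), used for z <= zeta."""
    return lam0 / (1.0 + 2.0 * b * lam0 * math.log(zeta / z))


if __name__ == "__main__":
    for z in (1e-2, 1e-6, 1e-12):
        print(z, evolve(0.5, 1.0, z, 1.0), closed_form(0.5, 1.0, z, 1.0))
```

The decrease toward $z \to 0$ is only logarithmic, which is why very small values of $z$ are needed before the coupling becomes small.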
With massless β functions it is possible to treat asymptotic freedom in two separate steps: first, all couplings are reduced to functions of a primary coupling; then the high-momentum behavior is determined by a single evolution equation involving the primary coupling only. To show this, we select $\bar\Lambda_0$ as the primary coupling and introduce it in (4.15) as an independent variable instead of $z$. Because of (4.16), the function
$$\bar\Lambda_0 = \bar\Lambda_0(z, m) \tag{4.18}$$
may be inverted to
$$z = \bar\zeta(\bar\Lambda_0, m). \tag{4.19}$$
In this way all $\bar\Lambda_j$ may be expressed as functions of $\bar\Lambda_0$,
$$\bar\Lambda_j = \bar\Lambda_j(z, m) = \bar\Lambda_j(\bar\zeta(\bar\Lambda_0, m), m), \tag{4.20}$$
which we denote by
$$\bar\Lambda_j = \bar s_j(\bar\Lambda_0, m), \quad j = 1, \dots, n. \tag{4.21}$$
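Introducing $\bar\Lambda_0$ as an independent variable, the system (4.15) takes the equivalent form
$$\hat\beta_0 \frac{d\bar\zeta}{d\bar\Lambda_0} = -\frac{1}{2}\bar\zeta, \tag{4.22}$$
$$\hat\beta_0 \frac{d\bar s_j}{d\bar\Lambda_0} = \hat\beta_j \tag{4.23}$$
with the notation
$$\hat\beta_j = \beta_j(\bar\Lambda_0, \bar s_1(\bar\Lambda_0, m), \dots, \bar s_n(\bar\Lambda_0, m)). \tag{4.24}$$
Equation (4.22) is integrated by
$$\lg \bar\zeta = \frac{1}{2}\int_{\bar\Lambda_0}^{c} \frac{dx}{\tilde\beta_0} + d, \qquad c > \bar\Lambda_0, \qquad \tilde\beta_0 = \beta_0(x, \bar s_1(x, m), \dots, \bar s_n(x, m)). \tag{4.25}$$
Equation (4.14) may be written equivalently as
$$\lim_{\bar\Lambda_0 \to 0} \bar\zeta(\bar\Lambda_0, m) = 0, \tag{4.26}$$
$$\lim_{\bar\Lambda_0 \to 0} \bar s_j(\bar\Lambda_0, m) = 0. \tag{4.27}$$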
Equations (4.23) constitute reduction equations for the reducing functions (4.21) of the primary coupling $\bar\Lambda_0$, with the condition (4.27) to be imposed. Once the reduction equations are solved for the functions (4.21), the evolution of the system becomes a problem in one variable only: Eq. (4.22) or (4.25) controls the momentum dependence of the primary coupling $\bar\Lambda_0$ in the high-momentum limit. Depending on the sign of $\tilde\beta_0$ for small $x$, the divergence of the integral for small couplings implies either $\bar\zeta \to 0$ or $\bar\zeta \to \infty$ for $\bar\Lambda_0 \to 0$. The results of this analysis are summarized by the following necessary and sufficient conditions for asymptotic freedom: Among the effective couplings a primary coupling $\bar\Lambda_0$ is chosen so that the other couplings $\bar\Lambda_j$ become functions of $\bar\Lambda_0$. These functions should satisfy the reduction equations (4.23) with the requirement (4.27) that the couplings vanish together with $\bar\Lambda_0$. The β function of $\bar\Lambda_0$ should be negative for sufficiently small couplings after inserting the solution of (4.23). As a corollary we note that for asymptotically free couplings all β functions simultaneously become negative for small couplings. More generally, as a consequence of (4.27), reduction solutions of (4.23) satisfy
$$\frac{d\bar s_j}{d\bar\Lambda_0} > 0 \tag{4.28}$$
in (3.6), due to the absence of zeroes of the β functions and the convention (3.3). This means that all β functions have the same sign for small couplings. A negative sign corresponds to asymptotic freedom in the original sense. A positive sign of the β functions can be interpreted as asymptotic freedom in the infrared region; this is relevant for models without intrinsic masses. The case that β functions vanish identically for some solutions of the evolution equations is not discussed in this paper.
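The reduction conditions can be illustrated with two couplings. Assume, purely for illustration, massless one-loop β functions $\hat\beta_0 = -b g^2$ and $\hat\beta_1 = a h^2 + c\,gh + e\,g^2$, where $g$ stands for the primary coupling $\bar\Lambda_0$ and $h$ for $\bar\Lambda_1$ (the coefficients are hypothetical). The ansatz $\bar s_1(g) = \rho g$ vanishes with $g$, as required by (4.27), and turns (4.23) into $a\rho^2 + (b + c)\rho + e = 0$. With $a = b = 1$, $c = -4$, $e = 2$ the roots are $\rho = 1$ and $\rho = 2$, both positive as (4.28) demands. The sketch checks that the full evolution equations (4.15), started on the line $h = \rho g$, stay on it while both couplings decrease toward $z \to 0$ with negative β functions.

```python
import math

# Hypothetical one-loop coefficients, chosen only for illustration.
A, B, C, E = 1.0, 1.0, -4.0, 2.0


def betas(g, h):
    """Massless beta functions (beta_0, beta_1) for primary coupling g and coupling h."""
    return -B * g * g, A * h * h + C * g * h + E * g * g


def reduction_slopes():
    """Real roots rho of A rho^2 + (B + C) rho + E = 0, i.e. solutions h = rho g of (4.23)."""
    p, q = (B + C) / A, E / A
    disc = p * p / 4.0 - q
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return [-p / 2.0 - r, -p / 2.0 + r]


def evolve(g, h, t_end, steps=4000):
    """RK4 for -(1/2) z d(g, h)/dz = (beta_0, beta_1) in the variable t = ln(z / zeta)."""
    dt = t_end / steps

    def rhs(g, h):
        b0, b1 = betas(g, h)
        return -2.0 * b0, -2.0 * b1

    for _ in range(steps):
        k1 = rhs(g, h)
        k2 = rhs(g + 0.5 * dt * k1[0], h + 0.5 * dt * k1[1])
        k3 = rhs(g + 0.5 * dt * k2[0], h + 0.5 * dt * k2[1])
        k4 = rhs(g + dt * k3[0], h + dt * k3[1])
        g += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        h += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return g, h


if __name__ == "__main__":
    for rho in reduction_slopes():
        g, h = evolve(0.3, rho * 0.3, t_end=-20.0)  # z = zeta * exp(-20)
        print(rho, g, h / g)  # the ratio h / g stays at rho
```

Since the primary equation does not involve $h$ here, $g$ follows the one-coupling law $1/g = 1/g_0 - 2bt$; the reduction line is an invariant line of the two-dimensional flow.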
We return to the theory of reduction in general schemes of renormalization. In the last section it was found that the reduction equations still involve the normalization mass after transforming to massless β functions. The reduction equations (4.23) obtained above for the case of asymptotic freedom, on the other hand, contain no such dependence, which suggests that it should not be expected. It will be shown that the normalization mass can indeed be eliminated, independently of the scheme, by making use of the evolution equations. We begin by setting up the evolution equations of the reduced model. To this end we combine the reduction equations (3.38) with the original form (4.3) of the evolution equations. As initial values (4.2) for the solutions (4.1) of (4.3), reducing functions $r_j$ are taken:
$$\bar\lambda_0 = \lambda_0, \qquad \bar\lambda_j = r_j(\lambda_0, m, \zeta) \quad \text{at} \quad z = \zeta \quad (j = 1, \dots, n). \tag{4.29}$$
The functions $r_j$ are supposed to obey the reduction equations (3.38) with the condition (3.40) or the stronger power series requirement (3.41). The assumptions stated on the β functions for the domain (3.6) imply the existence and uniqueness of the effective couplings (4.1). Corresponding to the primary coupling $\lambda_0$, we define an effective coupling $\bar\lambda_0$ by (4.1),
$$\bar\lambda_0 = \bar\lambda_0(z, m), \tag{4.30}$$
using the simplified notation (4.4). For the reduced model an evolution equation for $\bar\lambda_0$ alone is expected. As such we propose
$$-\frac{1}{2} z \frac{d\underline\lambda_0}{dz} = \bar\beta_0 \tag{4.31}$$
with the notation
$$\bar\beta_j = \beta_j(\underline\lambda_0, r_1(\underline\lambda_0, m, z), \dots, r_n(\underline\lambda_0, m, z)), \tag{4.32}$$
and the initial condition
$$\underline\lambda_0 = \lambda_0 \quad \text{at} \quad z = \zeta \tag{4.33}$$
to be imposed. We have chosen another notation, $\underline\lambda_0$, for this effective coupling, since it has yet to be shown that (4.30) indeed solves (4.31). In the domain (3.6),
$$\underline\lambda_0 = \underline\lambda_0(z, m) \tag{4.34}$$
exists as a unique solution of (4.31) with the initial condition (4.33). The other effective couplings $\underline\lambda_j$ are introduced as functions of $\underline\lambda_0$ by
$$\underline\lambda_j = \underline\lambda_j(z, m) = r_j(\underline\lambda_0(z, m), m, z) \quad (j = 1, \dots, n). \tag{4.35}$$
It will be seen that the function (4.34) solving (4.31)–(4.33), combined with the functions (4.35), on the one hand, and the effective couplings $\bar\lambda_j$ solving (4.3), (4.29) on the other hand, are identical:
$$\underline\lambda_j \equiv \bar\lambda_j \quad (j = 0, \dots, n). \tag{4.36}$$
For the proof we need only check that the functions (4.34), (4.35) likewise solve the evolution equations (4.3) with the initial conditions (4.2). Identity (4.36) then follows from
the uniqueness property of these differential equations. For $j = 0$, Eq. (4.3) is satisfied according to the defining equation (4.31) of $\underline\lambda_0$. In order to verify the remaining equations we differentiate (4.35) with respect to $z$:
$$-\frac{1}{2} z \frac{d\underline\lambda_j}{dz} = -\frac{1}{2} z \frac{\partial r_j}{\partial \zeta} - \frac{1}{2} z \frac{\partial r_j}{\partial \lambda_0} \frac{d\underline\lambda_0}{dz} = -\frac{1}{2} z \frac{\partial r_j}{\partial \zeta} + \bar\beta_0 \frac{\partial r_j}{\partial \lambda_0} = \bar\beta_j. \tag{4.37}$$
Here $\underline\lambda_0(z, m)$ and $z$ are to be substituted for the arguments $\lambda_0$ and $\zeta$, respectively, in the partial derivatives of $r_j$, as in (4.35); the last step uses the reduction equations (3.38), and for the notation $\bar\beta_j$ see Eq. (4.32). Thus the functions $\underline\lambda_j$ indeed satisfy the evolution equations (4.3). Since the initial conditions (4.2) are also fulfilled, the proof of (4.36) is complete. The results may be summarized as follows. The effective coupling (4.30) of the reduced model solves the evolution equation (4.31),
$$-\frac{1}{2} z \frac{d\bar\lambda_0}{dz} = \bar\beta_0, \qquad \bar\lambda_0 = \lambda_0 \quad \text{at} \quad z = \zeta, \tag{4.38}$$
with $\bar\beta_0$ given by (4.32). Defining the other couplings by
$$\bar\lambda_j = \bar\lambda_j(z, m) = r_j(\bar\lambda_0(z, m), m, z), \tag{4.39}$$
a solution of the original evolution equations (4.3) in the form
$$-\frac{1}{2} z \frac{d\bar\lambda_j}{dz} = \bar\beta_j \tag{4.40}$$
is obtained with the initial conditions (4.2). We next turn to the question of the extent to which the reduction equations (3.38) contain redundant information, and how it can be eliminated. On the basis of the evolution equations, a natural constraint on the reducing functions will be found. Clearly, only the final functional dependence of the effective couplings $\bar\lambda_j$ on the primary coupling $\bar\lambda_0$ can be relevant for the interpretation of the reduction method. Accordingly, we call two sets of reducing functions equivalent,
$$r_j^{(1)} \sim r_j^{(2)}, \tag{4.41}$$
if the resulting functional dependence
$$\bar\lambda_j(z, m) = \bar s_j(\bar\lambda_0(z, m), m) \quad (j = 1, \dots, n) \tag{4.42}$$
is the same. To find an appropriate formulation, we take the reduced form (4.38) of the evolution equations and invert its solution (4.30) with respect to $z$,
$$z = \bar\zeta(\bar\lambda_0, m), \tag{4.43}$$
using that $\bar\beta_0$ does not vanish in the domain considered. Then the effective couplings $\bar\lambda_j$ may be expressed as functions of $\bar\lambda_0$,
$$\bar\lambda_j = r_j(\bar\lambda_0, m, \bar\zeta(\bar\lambda_0, m)) = \bar s_j(\bar\lambda_0, m). \tag{4.44}$$
Hence $r_j^{(1)}$ and $r_j^{(2)}$ are equivalent if
$$r_j^{(1)}(\bar\lambda_0, m, \bar\zeta^{(1)}(\bar\lambda_0, m)) = r_j^{(2)}(\bar\lambda_0, m, \bar\zeta^{(2)}(\bar\lambda_0, m)),$$
or
$$\bar s_j^{(1)}(\bar\lambda_0, m) = \bar s_j^{(2)}(\bar\lambda_0, m). \tag{4.45}$$
The $\bar s_j$ are not reducing functions per se, but may be viewed as such by admitting a sliding normalization mass. To see this, we replace $z$ by $\bar\lambda_0$ as the independent variable in (4.40). As in the discussion of asymptotic freedom, the equivalent set of differential equations
$$\bar\beta_0 \frac{d\bar\zeta}{d\bar\lambda_0} = -\frac{1}{2}\bar\zeta, \tag{4.46}$$
$$\bar\beta_0 \frac{d\bar s_j}{d\bar\lambda_0} = \bar\beta_j, \tag{4.47}$$
$$\bar\beta_j = \beta_j(\bar\lambda_0, \bar s_1(\bar\lambda_0, m), \dots, \bar s_n(\bar\lambda_0, m), m\bar\zeta(\bar\lambda_0, m)) \tag{4.48}$$
is obtained. In Eqs. (4.46)–(4.48) we replace $\bar\lambda_0$ by its value $\lambda_0$ at the normalization point and change the notation $\bar s_j$, $\bar\zeta$, $\bar\beta_j$ to $s_j$, $\zeta$, $\beta_j$ accordingly. We then have a set of $n + 1$ ordinary differential equations
$$\beta_0 \frac{d\zeta}{d\lambda_0} = -\frac{1}{2}\zeta, \tag{4.49}$$
$$\beta_0 \frac{ds_j}{d\lambda_0} = \beta_j \quad (j = 1, \dots, n), \tag{4.50}$$
$$\beta_j = \beta_j(\lambda_0, s_1(\lambda_0, m), \dots, s_n(\lambda_0, m), m\zeta(\lambda_0, m)) \quad (j = 0, \dots, n), \tag{4.51}$$
for the functions
$$\zeta = \zeta(\lambda_0, m), \qquad \lambda_j = s_j(\lambda_0, m) \quad (j = 1, \dots, n). \tag{4.52}$$
The functions $s_j$ are related to reducing functions by (4.44):
$$s_j(\lambda_0, m) = r_j(\lambda_0, m, \zeta(\lambda_0, m)). \tag{4.53}$$
Equations (4.50) may thus be interpreted as reduction equations modified by a sliding normalization mass
$$|\kappa| = \frac{1}{\zeta(\lambda_0, m)}, \tag{4.54}$$
which satisfies the differential equation (4.49). In general, Eqs. (4.49) and (4.50) are coupled through the dependence of the β functions on the normalization mass. But in a scheme with massless β functions the system (4.50) can be solved independently of (4.49). Any set $s_j$ of solutions of (4.50) then also satisfies (3.38) with
$$\frac{\partial s_j}{\partial \zeta} = 0. \tag{4.55}$$
Therefore, in a scheme with massless β functions the functions $s_j$ coincide with reducing functions $R_j$ independent of $\zeta$, thus representing an equivalence class. Hence, without loss of information, the dependence on the normalization mass may be disregarded, so that the reduction equations (3.38) with massless β functions become a set of ordinary differential equations
$$\hat\beta_0 \frac{dR_j}{d\Lambda_0} = \hat\beta_j \quad (j = 1, \dots, n) \tag{4.56}$$
for functions $\Lambda_j = R_j(\Lambda_0)$. Equation (4.49) may be integrated to
$$\zeta = c \exp\left[-\frac{1}{2}\int_a^{\lambda_0} \frac{dx}{\tilde\beta_0}\right], \qquad \tilde\beta_0 = \hat\beta_0(x, R_1(x), \dots, R_n(x)), \tag{4.57}$$
so that identity (4.53) becomes
$$\Lambda_j = R_j(\Lambda_0) = R_j\left(\Lambda_0, m, c \exp\left[-\frac{1}{2}\int_a^{\Lambda_0} \frac{dx}{\tilde\beta_0}\right]\right). \tag{4.58}$$
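For a quick consistency check of (4.57), assume the hypothetical one-loop form $\tilde\beta_0(x) = -bx^2$ (an illustration, not a result of the text). The integral is then elementary and gives $\zeta = c\exp[(1/a - 1/\lambda_0)/(2b)]$, which tends to zero as $\lambda_0 \to 0$. A short sketch comparing this closed form with a composite Simpson evaluation of (4.57), and checking (4.49) by a central finite difference; the function names are illustrative:

```python
import math


def beta_tilde(x, b=1.0):
    """Hypothetical reduced beta function beta~_0(x) = -b x**2 (one-loop form)."""
    return -b * x * x


def zeta_quadrature(lam0, a, c, b=1.0, n=2000):
    """zeta = c exp(-1/2 int_a^lam0 dx / beta~_0(x)), integral by composite Simpson (n even)."""
    h = (lam0 - a) / n
    s = 1.0 / beta_tilde(a, b) + 1.0 / beta_tilde(lam0, b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) / beta_tilde(a + k * h, b)
    return c * math.exp(-0.5 * s * h / 3.0)


def zeta_exact(lam0, a, c, b=1.0):
    """Closed form of (4.57) for beta~_0 = -b x**2: c exp[(1/a - 1/lam0) / (2b)]."""
    return c * math.exp((1.0 / a - 1.0 / lam0) / (2.0 * b))


if __name__ == "__main__":
    print(zeta_quadrature(0.2, 0.5, 1.0), zeta_exact(0.2, 0.5, 1.0))
```

The constants $a$ and $c$ only rescale $\zeta$, which is the freedom that the text absorbs into the integration constants of (4.56).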
Since the constants $m_1, \dots, m_I$, $a$ and $c$ (correlated to $a$) do not otherwise occur in (4.56), they may be absorbed into the arbitrary integration constants of the general solution of (4.56). Thus a set of reducing functions $R_j$ is selected in each equivalence class by the solutions of (4.56). In the original formulation of the model, based on mass-dependent β functions, a corresponding set $r_j$ may be constructed by applying the inverse of (3.8) with (3.13) to $R_j$. The condition (3.40) or the power series requirement (3.41) is imposed on the reducing functions thus selected.

Acknowledgement. With great pleasure I thank my colleagues P. Breitenlohner, J. Kubo, D. Maison, R. Oehme and K. Sibold for helpful discussions.
References

1. Zimmermann, W.: Commun. Math. Phys. 97, 211 (1985)
2. Weinberg, S.: Phys. Rev. D 8, 3497 (1973)
3. Collins, J.C. and McFarlane, A.J.: Phys. Rev. D 10, 1201 (1974)
4. Stueckelberg, E. and Petermann, A.: Helv. Phys. Acta 26, 499 (1953)
5. Gell-Mann, M. and Low, F.: Phys. Rev. 95, 1300 (1954)
6. Bogoliubov, N.N. and Shirkov, D.V.: Dokl. Akad. Nauk SSSR 102, 391 (1955)
7. Zimmermann, W.: In: XIV. Intern. Coll. on Group Theor. Methods in Physics, Seoul, Korea, ed. Y.M. Cho, Singapore: World Scientific, 1985, p. 145
8. Sibold, K.: In: Proc. of the Intern. Europhysics Conf. on High Energy Physics, Bari, Italy 1985
9. Oehme, R.: Progr. Theor. Phys. Suppl. 86, 215 (1986)
10. Zimmermann, W.: In: Renormalization Group 1986, Dubna, USSR, eds. D.V. Shirkov, D.I. Kazakov and A.A. Vladimirov, Singapore: World Scientific, 1986, p. 51
11. Kubo, J.: In: Proc. of the 1989 Workshop on Dynamical Symmetry Breaking, eds. T. Muta and K. Yamawaki, Nagoya 1989, p. 48
12. Sibold, K.: Acta Physica Polonica 19, 295 (1989)
13. Kubo, J.: In: Recent Developments in Quantum Field Theory, eds. P. Breitenlohner, D. Maison and J. Wess, Heidelberg: Springer-Verlag, 2000
14. Oehme, R.: In: Recent Developments in Quantum Field Theory, eds. P. Breitenlohner, D. Maison and J. Wess, Heidelberg: Springer-Verlag, 2000
15. Chang, N.-P.: Phys. Rev. D 10, 2706 (1974)
16. Fradkin, E.S. and Kalashnikov, O.K.: J. Phys. A 8, 1814 (1975); Phys. Lett. B 64, 177 (1976)
17. Ma, E.: Phys. Rev. D 11, 322 (1975); D 17, 623 (1978)
18. Chang, N.-P., Das, A. and Perez-Mercader, J.: Phys. Rev. D 22, 1829 (1980)
19. Kazakov, D.I. and Shirkov, D.V.: In: Proc. of the 1975 Smolenice Conf. on High Energy Particle Interactions, eds. D. Krupa and J. Pišút, Bratislava: VEDA, Publishing House of the Slovak Academy of Sciences, 1976
20. Piguet, O. and Sibold, K.: Phys. Lett. B 229, 83 (1989)
21. Callan, C.: Phys. Rev. D 2, 1541 (1970)
22. Symanzik, K.: Commun. Math. Phys. 18, 227 (1970)
23. Ovsiannikov, L.V.: Dokl. Akad. Nauk SSSR 109, 1112 (1956)
24. Kubo, J., Mondragón, M. and Zoupanos, G.: Phys. Lett. B 389, 523 (1996)
25. Kawamura, Y., Kobayashi, T. and Kubo, J.: Phys. Lett. B 405, 64 (1997)
26. Zimmermann, W.: In: Proc. of the 12th Max Born Symposium, eds. A. Borowiec, W. Cegła, B. Jancewicz and W. Karwowski, Heidelberg: Springer-Verlag, 1998
27. Breitenlohner, P. and Maison, D.: published in this volume
28. Wilson, K.: Phys. Rev. D 3, 1818 (1971)
29. Gross, D.J. and Wilczek, F.: Phys. Rev. Lett. 30, 1343 (1973); Phys. Rev. D 8, 3633 (1973)
30. Politzer, H.D.: Phys. Rev. Lett. 30, 1346 (1973)
31. Oehme, R. and Zimmermann, W.: Commun. Math. Phys. 97, 569 (1985)
32. Oehme, R., Sibold, K. and Zimmermann, W.: Phys. Lett. B 153, 147 (1985)
33. Zimmermann, W.: Lett. Math. Phys. 30, 61 (1994)
34. Kubo, J., Mondragón, M. and Zoupanos, G.: Nucl. Phys. B 424, 29 (1994)
35. Courant, R. and Hilbert, D.: Methoden der mathematischen Physik, Vol. II. Berlin: Springer-Verlag, 1931 and 1937
36. Epstein, B.: Partial Differential Equations. New York: McGraw-Hill Book Company, 1962

Communicated by A. Jaffe
Commun. Math. Phys. 219, 247 – 257 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Non-Abelian Gauge Theories on Non-Commutative Spaces

J. Wess

Sektion Physik der Ludwig-Maximilians-Universität, Theresienstr. 37, 80333 München, and Max-Planck-Institut für Physik (Werner-Heisenberg-Institut), Föhringer Ring 6, 80805 München, Germany

Received: 9 August 2000 / Accepted: 12 August 2000

Dedicated to the memory of Harry Lehmann

Abstract: The construction of a non-abelian gauge theory on non-commutative spaces is based on enveloping algebra-valued gauge fields. The number of independent field components is reduced to the number of gauge fields in a usual gauge theory. This is done with the help of the Seiberg–Witten map. The dynamics is formulated with a Lagrangian where additional couplings appear.
In a recent development, gauge theories on non-commutative space-time structures [1–3] have attracted some interest. This approach can certainly lead to a regularization scheme for gauge theories. Non-commutative space-time coordinates will in general have a discrete eigenvalue spectrum when considered as self-adjoint linear operators on a Hilbert space, so the gauge theory will resemble a gauge theory on a lattice. Particular non-commutative spaces have a comodule structure under the action of a quantum group [4, 16]. These spaces allow us to maintain a symmetry structure, usually referred to as a quantum group. Quantum groups can be considered as deformations of groups in the category of Hopf algebras. They depend on one or more parameters, which we shall call q, such that for a special value of this parameter, say q = 1, the quantum group coincides with a given group. For other values of the parameter quantum groups are not groups; they are, however, still Hopf algebras. Among the groups that can be deformed in this way is the Lorentz group. Thus it seems natural to consider non-commutative spaces that are comodules of the q-deformed Lorentz group. For the regularized gauge theory a symmetry structure will be maintained which in the limit q → 1 becomes the Lorentz group. The hope, however, is that nature could accept the regularization as a natural cutoff for gauge theories. At large distances space-time does not show a lattice structure related to a q-deformed Lorentz group. We therefore think along the lines that the dynamics of a gauge theory forces a phase transition on space-time at high energy densities or, equivalently, at very short distances. The phase at large distances is the well-known commutative structure of space-time; the other phase, at short distances, is governed by a non-commutative structure. The energy density or distance where the phase transition is supposed to take place is characterized by the gauge coupling; the Planck scale is not a natural scale in this scenario. To probe such a scenario we have to formulate a gauge theory on non-commutative spaces. We shall outline a quite general method of how this can be done. The associative algebra we have in mind will be defined by elements that generate the algebra and by relations between these elements. We call the generating elements coordinates, $\{\hat x^1, \dots, \hat x^N\}$; the relations generate an ideal $\mathcal{R}$, and the algebra freely generated by the coordinates, divided by this ideal, is the respective algebra:
$$\mathcal{A}_x = \frac{\mathbb{C}[[\hat x^1, \dots, \hat x^N]]}{\mathcal{R}}.$$
Formal power series are accepted. Three examples should illustrate this.

Canonical structure: The relations are of the type
$$[\hat x^i, \hat x^j] = i\theta^{ij}, \qquad \theta^{ij} \in \mathbb{C}.$$
In phase space this is the algebraic structure of quantum mechanics, and it has quite nontrivial consequences. In our approach, however, the $\hat x^i$ are elements of configuration space. In this context the canonical structure has been considered in non-commutative Yang–Mills theory in ref. [1].

Lie structure:
$$[\hat x^i, \hat x^j] = i\theta^{ij}_k \hat x^k, \qquad \theta^{ij}_k \in \mathbb{C}.$$
This is the algebraic structure of Lie algebras. It has also been used for non-commutative coordinates in matrix models. The rich structure of Lie algebras is well known.

Quantum plane structure:
$$[\hat x^i, \hat x^j] = i\theta^{ij}_{kl} \hat x^k \hat x^l.$$
This is the algebraic structure of quantum planes, which are comodules of quantum groups. Based on such relations, a very rich mathematical structure has been revealed within the last twenty years. The simplest example is the Manin plane with N = 2:
$$\hat x \hat y = q \hat y \hat x, \qquad q \in \mathbb{C}.$$
In general the constants $\theta^{ij}_k$ and $\theta^{ij}_{kl}$ will be subject to consistency relations. For the structure constants of a Lie algebra this is the Jacobi identity; for the quantum plane structure it leads to the quantum Yang–Baxter equation.
Gauge transformations

Fields will be elements of the algebra $\mathcal{A}_x$: $\hat\psi(\hat x) \in \mathcal{A}_x$. Under an infinitesimal gauge transformation a field transforms, as usual, as
$$\delta\hat\psi(\hat x) = i\hat\alpha(\hat x)\hat\psi(\hat x), \qquad \hat\alpha \in \mathcal{A}_x.$$
$\hat\alpha(\hat x)$ is also an element of the enveloping algebra of a Lie group, the gauge group. The fields are representations of this Lie group, and $\hat\alpha$ acts on these representations. Coordinates are not a covariant concept under gauge transformations:
$$\delta(\hat x^i\hat\psi) = i\hat x^i\hat\alpha\hat\psi \neq i\hat\alpha(\hat x^i\hat\psi).$$
We can try to define covariant coordinates in the same way as we are used to defining covariant derivatives:
$$\hat X^i = \hat x^i + \hat A^i(\hat x), \qquad \hat A^i(\hat x) \in \mathcal{A}_x.$$
At the same time $\hat A$ is an element of the enveloping algebra; we call it enveloping algebra-valued. We demand $\delta(\hat X^i\hat\psi) = i\hat\alpha\hat X^i\hat\psi$ and find
$$\delta\hat A^i = i[\hat\alpha, \hat A^i] - i[\hat x^i, \hat\alpha].$$
Tensors can be formed:
$$\hat T^{ij} = [\hat X^i, \hat X^j] - i\hat\theta^{ij}(\hat X),$$
where $\hat\theta^{ij}(\hat X)$ is the right-hand side of the commutator of the coordinates, with the coordinates replaced by covariant coordinates. The tensors are enveloping algebra-valued as well. The transformation law of these tensors is
$$\delta\hat T^{ij} = i[\hat\alpha, \hat T^{ij}].$$
This now allows us to proceed formally, but it does not tell us how to make contact with the real or complex numbers through which we communicate with nature. Finally we will have to study the representations of the algebraic elements in a Hilbert space to relate the formalism to measurements. To formulate the gauge theory in terms of fields that depend on elements of $\mathbb{R}^n$ we can, however, use the formalism of Weyl quantization without studying the representation theory first. The elements of $\mathbb{R}^n$ we shall denote by $(x^1, \dots, x^n)$.
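The transformation law of $\hat A^i$ and the covariance of $\hat T^{ij}$ rely only on associativity, so they can be checked to first order in any associative algebra. The sketch below assumes random complex $3\times 3$ matrices as stand-ins for $\hat x^1$, $\hat x^2$, $\hat A^i$, $\hat\alpha$ and $\hat\psi$. Finite matrices cannot satisfy $[\hat x^1, \hat x^2] = i\theta$, but that relation is not needed here: $\hat\theta^{ij}$ is taken as $\theta$ times the identity, which commutes with everything. The helper functions are hypothetical and use only the standard library.

```python
import random

N = 3  # size of the stand-in matrices (arbitrary)


def rand_mat(rng):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)] for _ in range(N)]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def add(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(N)] for i in range(N)]


def scale(s, a):
    return [[s * v for v in row] for row in a]


def comm(a, b):
    return add(mul(a, b), scale(-1, mul(b, a)))


def max_abs(a):
    return max(abs(v) for row in a for v in row)


rng = random.Random(0)
x = [rand_mat(rng), rand_mat(rng)]          # stand-ins for x^1, x^2
A = [rand_mat(rng), rand_mat(rng)]          # potentials A^1, A^2
alpha, psi = rand_mat(rng), rand_mat(rng)   # gauge parameter and field
theta = 0.7
ident = [[complex(i == j) for j in range(N)] for i in range(N)]

X = [add(x[i], A[i]) for i in range(2)]     # covariant coordinates X^i = x^i + A^i
d_psi = scale(1j, mul(alpha, psi))          # delta psi = i alpha psi
d_A = [add(scale(1j, comm(alpha, A[i])), scale(-1j, comm(x[i], alpha))) for i in range(2)]

# x^1 psi alone is not covariant: x^1 (delta psi) differs from i alpha x^1 psi.
non_cov = max_abs(add(mul(x[0], d_psi), scale(-1j, mul(alpha, mul(x[0], psi)))))

# First-order variation of X^1 psi (x^1 itself does not transform).
lhs = add(mul(x[0], d_psi), mul(d_A[0], psi), mul(A[0], d_psi))
cov_error = max_abs(add(lhs, scale(-1j, mul(alpha, mul(X[0], psi)))))

# T^{12} = [X^1, X^2] - i theta 1 and its first-order variation.
T = add(comm(X[0], X[1]), scale(-1j * theta, ident))
d_T = add(comm(d_A[0], X[1]), comm(X[0], d_A[1]))
tensor_error = max_abs(add(d_T, scale(-1j, comm(alpha, T))))
```

Here `non_cov` is clearly nonzero, while `cov_error` and `tensor_error` vanish up to rounding, in line with $\delta(\hat X^i\hat\psi) = i\hat\alpha\hat X^i\hat\psi$ and $\delta\hat T^{ij} = i[\hat\alpha, \hat T^{ij}]$.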
Weyl quantization

With a function $f(x^1, \dots, x^n)$ defined on $\mathbb{R}^n$ we can associate an element of the algebra. First Fourier transform the function $f$:
$$\tilde f(k_1, \dots, k_n) = \frac{1}{(2\pi)^{n/2}} \int d^n x\, e^{-i\sum_j k_j x^j} f(x^1, \dots, x^n),$$
and define the corresponding element of the algebra by
$$W(f) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, e^{i\sum_j k_j \hat x^j} \tilde f(k_1, \dots, k_n).$$
The element $\hat x$ of the algebra replaces the variable $x$ in $f$ in the most symmetric way; this follows from the power series expansion of the exponential. The algebra elements obtained in this way can be multiplied. If these products can again be associated with a classical function, we write the product as follows:
$$W(f)\, W(g) = W(f \diamond g).$$
In this case the symbol $f \diamond g$ is defined by the left-hand side of this equation; it is bilinear in $f$ and $g$. We can write the left-hand side more explicitly:
$$W(f)\, W(g) = \frac{1}{(2\pi)^n} \int d^n k\, d^n p\, e^{ik\cdot\hat x} e^{ip\cdot\hat x} \tilde f(k)\, \tilde g(p).$$
For the canonical structure and the Lie structure this product exists and can be calculated using the Baker–Campbell–Hausdorff formula; the function $f \diamond g$ exists.

Canonical case:
$$e^{ik\cdot\hat x} e^{ip\cdot\hat x} = e^{i(k+p)\cdot\hat x - \frac{i}{2} k\cdot\theta\cdot p}, \qquad f \diamond g = e^{\frac{i}{2}\theta^{ij}\frac{\partial}{\partial x^i}\frac{\partial}{\partial y^j}} f(x)\, g(y)\Big|_{y \to x}.$$
We have obtained the Moyal–Weyl ∗ product $f \ast g$ [9–14]. For the Lie structure the product of the two exponentials corresponds to group multiplication,
$$e^{ik\cdot\hat x} e^{ip\cdot\hat x} = e^{i(k + p + \frac{1}{2} g(k,p))\cdot\hat x}, \qquad f \diamond g = e^{\frac{i}{2} x^i g_i\left(i\frac{\partial}{\partial y}, i\frac{\partial}{\partial z}\right)} f(y)\, g(z)\Big|_{y \to x,\; z \to x}.$$
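For polynomials the exponential in the canonical ∗ product terminates, so it can be evaluated exactly. A minimal sketch, assuming two coordinates with $\theta^{12} = -\theta^{21} = \theta$ and storing a polynomial as a dict that maps exponent pairs $(i, j)$ to the coefficient of $x^i y^j$ (this representation and the function names are illustrative assumptions):

```python
import math
from collections import defaultdict


def deriv(p, ax, ay):
    """d^ax/dx^ax d^ay/dy^ay of a polynomial stored as {(i, j): coeff} for coeff * x^i y^j."""
    out = defaultdict(complex)
    for (i, j), c in p.items():
        if i >= ax and j >= ay:
            out[(i - ax, j - ay)] += c * math.perm(i, ax) * math.perm(j, ay)
    return out


def moyal(f, g, theta):
    """Moyal-Weyl star product for [x, y] = i theta (theta^{12} = -theta^{21} = theta).

    The n-th term is (i theta / 2)^n / n! * sum_k C(n, k) (-1)^k
    (d_x^(n-k) d_y^k f)(d_x^k d_y^(n-k) g); terms with n above the highest degree vanish.
    """
    deg = max((i + j for i, j in list(f) + list(g)), default=0)
    out = defaultdict(complex)
    for n in range(deg + 1):
        pref = (0.5j * theta) ** n / math.factorial(n)
        for k in range(n + 1):
            w = pref * math.comb(n, k) * (-1) ** k
            for (i, j), a in deriv(f, n - k, k).items():
                for (m, l), b in deriv(g, k, n - k).items():
                    out[(i + m, j + l)] += w * a * b
    return {key: c for key, c in out.items() if abs(c) > 1e-14}


def close(p, q, tol=1e-12):
    """Compare two polynomials coefficient by coefficient."""
    return all(abs(p.get(k, 0) - q.get(k, 0)) < tol for k in set(p) | set(q))


if __name__ == "__main__":
    X, Y = {(1, 0): 1}, {(0, 1): 1}
    print(moyal(X, Y, 0.3))  # coefficients of x*y + i*theta/2
```

For example, `moyal(X, Y, theta)` with `X = {(1, 0): 1}` and `Y = {(0, 1): 1}` returns the coefficients of $xy + \frac{i}{2}\theta$, so that $x \ast y - y \ast x = i\theta$.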
For the quantum space structure the Baker–Campbell–Hausdorff formula is not manageable. Moreover, Weyl quantization using the Fourier transformation works well only for functions that have a well-behaved Fourier transform, and polynomials are not among these functions. In our algebraic setting, fields have been defined as elements of $\mathcal{A}_x$ and are thus essentially polynomials and power series. In this setting it is more natural to start from a more algebraic interpretation of Weyl quantization.
We study algebras which have a basis. We denote the elements forming that basis by $\beta^\nu$. Any element of the algebra can be expressed in this basis with certain coefficient functions,
$$f(\hat x^1, \dots, \hat x^n) = \sum_\nu f_\nu \beta^\nu, \qquad g(\hat x^1, \dots, \hat x^n) = \sum_\nu g_\nu \beta^\nu,$$
and so can the product,
$$f \cdot g = \sum_\nu d_\nu \beta^\nu.$$
If we symbolically denote the elements by their coefficient functions $\{f_\nu\}$, $\{g_\nu\}$ and $\{d_\nu\}$, respectively, we write $\{d_\nu\} = \{f_\nu\} \diamond \{g_\nu\}$. The algebraic properties are mapped to the coefficient functions, and the non-commutative structure is reflected in the diamond product $\diamond$. A special role is played by algebras that have the Poincaré–Birkhoff–Witt property. This property states that, considered as a graded algebra, the subspace of polynomials of fixed degree has the same dimension as the corresponding subspace of polynomials in commuting variables. We then choose a particular basis in this grading and characterize an arbitrary element of the algebra by its coefficient function when expanded in this basis. Then we construct the function of commuting variables by replacing the basis of non-commuting variables by the corresponding basis elements of the commuting variables. In this way we obtain a unique map of elements of the algebra into the set of real analytic functions, and vice versa. Let me illustrate this with the simplest example, the canonical case. We consider two variables $x$, $y$ and $\hat x\hat y - \hat y\hat x = i\theta$. To the constant $c$ and the elements $\hat x$, $\hat y$ of $\mathcal{A}_x$ correspond the constant $c$ and the monomials $x$, $y$. For the basis in $\mathcal{A}_x$ we choose the symmetric monomials. The basis elements for the polynomials of second degree are $\hat x\hat x$, $\frac{1}{2}(\hat x\hat y + \hat y\hat x)$, $\hat y\hat y$. We obtain the map
$$\hat x\hat x \longleftrightarrow xx, \qquad \tfrac{1}{2}(\hat x\hat y + \hat y\hat x) \longleftrightarrow xy, \qquad \hat y\hat y \longleftrightarrow yy.$$
For the elements $\hat x\hat y$ and $\hat y\hat x$ of $\mathcal{A}_x$ this implies:
$$\hat x\hat y = \tfrac{1}{2}(\hat x\hat y + \hat y\hat x) + \tfrac{1}{2}i\theta \longleftrightarrow xy + \tfrac{1}{2}i\theta,$$
$$\hat y\hat x = \tfrac{1}{2}(\hat y\hat x + \hat x\hat y) - \tfrac{1}{2}i\theta \longleftrightarrow xy - \tfrac{1}{2}i\theta.$$
In the language of the Weyl quantization that maps the functions $f(x)$ to elements of $\mathcal{A}_x$, this reads:
$$W(c) = c, \quad W(x) = \hat x, \quad W(y) = \hat y, \quad W(xy) = \tfrac{1}{2}(\hat x\hat y + \hat y\hat x), \quad W(x^2) = \hat x\hat x, \quad W(y^2) = \hat y\hat y.$$
The multiplication law is easily obtained:
$$W(x)\,W(y) = \hat x\hat y = W\!\left(xy + \tfrac{1}{2}i\theta\right), \qquad x \diamond y = xy + \tfrac{i}{2}\theta.$$
This is exactly the result we obtain from the Moyal–Weyl ∗ product. With some combinatorics we can show that the algebraic approach using the completely symmetrized basis exactly reproduces the diamond product of the Weyl quantization for the canonical and Lie structures. Let me now treat the Manin plane $\hat x\hat y = q\hat y\hat x$, $q \in \mathbb{R}$, $q > 1$. The Poincaré–Birkhoff–Witt property holds, and what is more, we can choose any ordering for a basis. We choose the $\hat x\hat y$ ordering for simplicity:
$$W(c) = c, \quad W(x) = \hat x, \quad W(y) = \hat y, \quad W(f(x, y)) = \,:\!f(\hat x, \hat y)\!:.$$
The colons $:\;:$ mean that in the power series expansion of the real analytic function $f$ each monomial in $x$, $y$ has to be replaced by the ordered monomial in $\hat x$, $\hat y$. Moving $\hat y^s$ past $\hat x^n$ with $\hat y\hat x = q^{-1}\hat x\hat y$ produces a factor $q^{-sn}$, so we derive the multiplication law
$$W(x^r y^s)\, W(x^n y^m) = W(q^{-sn} x^{r+n} y^{s+m}).$$
This immediately leads to the diamond product
$$f \diamond g\,(x, y) = q^{-x'\frac{\partial}{\partial x'}\, y\frac{\partial}{\partial y}} f(x, y)\, g(x', y')\Big|_{x' \to x,\; y' \to y}.$$
This should serve as an example for the general case. The Poincaré–Birkhoff–Witt property holds for all the quantum spaces mentioned above. Therefore we have a unique and invertible map of the fields as elements of $\mathcal{A}_x$ to the fields defined as functions of the commutative variables $x^1, \dots, x^N$. Actually, monomials in any well-defined ordering form a basis; among them are the completely symmetrized polynomials.
$$f(x^1, \dots, x^N) \longleftrightarrow \hat W(f).$$
In the case of algebras with the Poincaré–Birkhoff–Witt property, when an arbitrary ordering is allowed, we shall denote the (diamond) product as the ∗ (star) product.
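The Manin-plane multiplication law acts directly on $\hat x\hat y$-ordered monomials. A minimal sketch, storing a polynomial as a dict $\{(r, s): \text{coefficient}\}$ for $:\!x^r y^s\!:$ and using exact rational arithmetic; the function name is hypothetical and the value $q = 2$ in the usage lines is arbitrary:

```python
from collections import defaultdict
from fractions import Fraction


def diamond(f, g, q):
    """Product of xy-ordered polynomials on the Manin plane x y = q y x.

    Polynomials are dicts {(r, s): coeff} meaning sum coeff * :x^r y^s:.
    Implements W(x^r y^s) W(x^n y^m) = W(q^(-s n) x^(r+n) y^(s+m)).
    """
    q = Fraction(q)
    out = defaultdict(Fraction)
    for (r, s), a in f.items():
        for (n, m), b in g.items():
            out[(r + n, s + m)] += a * b * q ** (-s * n)
    return {k: v for k, v in out.items() if v != 0}


if __name__ == "__main__":
    q = Fraction(2)
    X, Y = {(1, 0): Fraction(1)}, {(0, 1): Fraction(1)}
    print(diamond(X, Y, q), diamond(Y, X, q))
```

With these conventions `diamond(Y, X, q)` gives $\frac{1}{2}xy$ while `diamond(X, Y, q)` gives $xy$, reproducing $\hat x\hat y = q\hat y\hat x$; products of three polynomials agree for either bracketing, as associativity requires.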
Our aim is to formulate a dynamics for a physical system with non-commuting coordinates in terms of fields that depend on commuting variables. These fields we shall call physical fields. The non-commuting structure is encoded in the ∗ product as the multiplication law for the physical fields. To formulate a dynamics the algebra has to be enlarged by derivatives. This can be done on purely algebraic grounds, derivatives being introduced as elements of an algebra and the Leibniz rule taking the role of relations. The Leibniz rule has to be defined in such a way that it does not lead to new relations for the coordinates. This is what we mean when we say that the derivatives have to be consistent with the non-commutative structure of the coordinates. For the quantum plane such derivatives have been introduced in ref. [15]. Following the same strategy, derivatives can be defined for the canonical structure as well. For the canonical case the usual Leibniz rule will be consistent:
$$\hat\partial_\rho \hat x^\mu = \delta^\mu_\rho + \hat x^\mu \hat\partial_\rho.$$
For the canonical structure the expression $\hat x^\alpha - i\theta^{\alpha\rho}\hat\partial_\rho$ commutes with all coordinates, since $[\hat x^\beta, \hat x^\alpha - i\theta^{\alpha\rho}\hat\partial_\rho] = i\theta^{\beta\alpha} + i\theta^{\alpha\beta} = 0$ by the antisymmetry of θ. It can be used to define a relation on the $\hat x, \hat\partial$ algebra. For invertible θ this amounts to defining derivatives as follows:
$$\hat\partial_\rho = -i\,(\theta^{-1})_{\rho\mu}\,\hat x^\mu.$$
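Since $\hat\partial_\rho = -i(\theta^{-1})_{\rho\mu}\hat x^\mu$, the derivative acts on a field through a commutator, and in terms of the Moyal ∗ product one expects $-i(\theta^{-1})_{\rho\mu}[x^\mu \stackrel{\ast}{,} f] = \partial_\rho f$, because $[x^\mu \stackrel{\ast}{,} f] = i\theta^{\mu\nu}\partial_\nu f$. The sketch checks this in two dimensions with $\theta^{12} = -\theta^{21} = \theta$, on polynomials stored as $\{(i, j): \text{coefficient}\}$ for $x^i y^j$; the representation and helper names are hypothetical.

```python
import math
from collections import defaultdict


def deriv(p, ax, ay):
    """d^ax/dx^ax d^ay/dy^ay of a polynomial {(i, j): coeff} (coeff * x^i y^j)."""
    out = defaultdict(complex)
    for (i, j), c in p.items():
        if i >= ax and j >= ay:
            out[(i - ax, j - ay)] += c * math.perm(i, ax) * math.perm(j, ay)
    return out


def star(f, g, theta):
    """Moyal product for [x, y] = i theta; the series terminates on polynomials."""
    deg = max((i + j for i, j in list(f) + list(g)), default=0)
    out = defaultdict(complex)
    for n in range(deg + 1):
        pref = (0.5j * theta) ** n / math.factorial(n)
        for k in range(n + 1):
            w = pref * math.comb(n, k) * (-1) ** k
            for (i, j), a in deriv(f, n - k, k).items():
                for (m, l), b in deriv(g, k, n - k).items():
                    out[(i + m, j + l)] += w * a * b
    return out


def inner_derivative(f, theta):
    """[d_rho, f] = -i (theta^-1)_{rho mu} [x^mu *, f] for rho in (x, y).

    theta = [[0, t], [-t, 0]] has inverse [[0, -1/t], [1/t, 0]].
    """
    coords = {"x": {(1, 0): 1}, "y": {(0, 1): 1}}
    theta_inv = {("x", "y"): -1.0 / theta, ("y", "x"): 1.0 / theta}
    result = {}
    for rho in ("x", "y"):
        acc = defaultdict(complex)
        for (r, mu), w in theta_inv.items():
            if r != rho:
                continue
            left, right = star(coords[mu], f, theta), star(f, coords[mu], theta)
            for key in set(left) | set(right):
                acc[key] += -1j * w * (left.get(key, 0) - right.get(key, 0))
        result[rho] = {k: v for k, v in acc.items() if abs(v) > 1e-12}
    return result
```

For $f = x^3y + 2y^2$, `inner_derivative(f, 0.4)` returns $3x^2y$ for $\rho = x$ and $x^3 + 4y$ for $\rho = y$, i.e. the ordinary partial derivatives, independently of the value of θ.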
As a consequence, we do not have to enlarge the algebra by derivatives. In the following we shall restrict the discussion to the canonical structure.

Gauge theories

As an example we treat non-abelian gauge theories on non-commutative spaces. Such gauge theories cannot be formulated with Lie algebra-valued infinitesimal transformations, and consequently not with Lie algebra-valued gauge fields. The reason is that in the composition of infinitesimal transformations both commutators and anticommutators of the generators of the group appear. The enveloping algebra of the Lie algebra is the proper setting for such gauge theories. An enveloping algebra-valued infinitesimal transformation will in general depend on infinitely many parameters, and an enveloping algebra-valued gauge field on infinitely many component fields. A dynamics based on infinitely many fields is not very attractive; it is, however, possible to define enveloping algebra-valued infinitesimal transformations and gauge fields that depend on a finite number of parameters and component fields only. The construction of such transformations is based on the Seiberg–Witten map [3]. This map allows us to construct enveloping algebra-valued gauge fields $A^\nu$ and gauge parameters $\alpha$ parametrized by a finite number of gauge fields $a^a_\nu(x)$ and parameters $\alpha^1_a(x)$ and their derivatives [8, 17]. The gauge fields $a^a_\nu(x)$ will transform like the usual gauge fields. The gauge field $A^\nu$ will transform as outlined above. The transformation law of the tensor $T^{ij}$ will be unchanged. These tensors can then be used to build invariant Lagrangians for the gauge fields $a^a_\nu(x)$. These Lagrangians will be different from the usual gauge-invariant Lagrangians, and so will the coupling of matter fields to the gauge fields, because $A^\nu$ has to be used in the gauge-covariant coupling.
We are now going to outline the construction of such a theory.

Non-abelian: $[T^a, T^b] = i f^{ab}{}_c T^c$.

Non-commutative: $[\hat x^\mu, \hat x^\nu] = i\theta^{\mu\nu}$.

It is natural to consider both algebras simultaneously:
$$\hat z^i = \{\hat x^1, \dots, \hat x^N, T^1, \dots, T^M\}, \qquad \mathcal{A}_z = \frac{\mathbb{C}[[\hat z^1, \dots, \hat z^{N+M}]]}{\mathcal{R}}.$$
The ∗ product formalism developed with the help of the Baker–Campbell–Hausdorff formula can be applied. We introduce the commuting variables $z^i = \{x^1, \dots, x^N, t^1, \dots, t^M\}$ and find the ∗ product:
$$(F \ast G)(z) = e^{\frac{i}{2}\left(\theta^{\mu\nu}\frac{\partial}{\partial x'^\mu}\frac{\partial}{\partial x''^\nu} + t^a g_a\left(i\frac{\partial}{\partial t'}, i\frac{\partial}{\partial t''}\right)\right)} F(x', t')\, G(x'', t'')\Big|_{x', x'' \to x;\ t', t'' \to t}.$$
We define the “physical” fields $\psi(x)$ by the correspondence $\hat\psi(\hat x) \longleftrightarrow \psi(x)$. They span a representation of the gauge group and depend on the commuting coordinates $x^i$. The transformation parameters and the gauge field remain enveloping algebra-valued:
$$\hat A^\nu(\hat z) \longleftrightarrow A^\nu(z), \qquad \hat\alpha(\hat z) \longleftrightarrow \alpha(z).$$
It is only via the Seiberg–Witten map that the physical fields $a^a_\nu$ enter. In the ∗ product formulation the transformation of the gauge field is:
$$\delta A^\nu = -i[x^\nu \stackrel{\ast}{,} \alpha] + i[\alpha \stackrel{\ast}{,} A^\nu] = \theta^{\nu\mu}\partial_\mu\alpha + i[\alpha \stackrel{\ast}{,} A^\nu].$$
This, and the definition of the derivative $\hat\partial_\mu$, justifies the ansatz
$$A^\nu = \theta^{\nu\rho} V_\rho,$$
which leads to
$$\delta V_\rho = \partial_\rho\alpha + i[\alpha \stackrel{\ast}{,} V_\rho].$$
We expand this equation in powers of θ and find a solution to zeroth order that is linear in $t$:
Zeroth order:
$$\alpha = \alpha^1_a t^a, \qquad V_\rho = a^1_{\rho,a} t^a.$$
The gauge field $a^1_{\rho,a}$ transforms in the usual way:
$$\delta a^1_{\rho,a} = \partial_\rho\alpha^1_a - f^{bc}{}_a\,\alpha^1_b\,a^1_{\rho,c}.$$
In first order in θ we find a solution that is of second order in t:

First order:
$$\alpha = \alpha^1_a t^a + \alpha^2_{ab}t^at^b + \dots, \qquad V_\rho = a^1_{\rho,a}t^a + a^2_{\rho,ab}t^at^b + \dots.$$
To this order the transformation law of $V_\rho$ is satisfied with the following choice for $\alpha^{(2)}$ and $a^{(2)}$:
$$\alpha^2_{ab}t^at^b = \frac12\theta^{\nu\mu}\,\partial_\nu\alpha^1_a\,a^1_{\mu,b}\,t^at^b,$$
$$a^2_{\rho,ab}t^at^b = -\frac12\theta^{\nu\mu}\,a^1_{\nu,a}\big(\partial_\mu a^1_{\rho,b} + F^1_{\mu\rho,b}\big)t^at^b,$$
$$F^1_{\nu\mu,b} = \partial_\nu a^1_{\mu,b} - \partial_\mu a^1_{\nu,b} + f^{cd}{}_b\,a^1_{\nu,c}\,a^1_{\mu,d}.$$
This procedure can be generalized to all higher powers of θ. We will find solutions in which $\alpha^n$ and $a^n$, the coefficients of $\theta^{n-1}$, are polynomials of degree n in t. They will be functions of $\alpha^1_a$, $a^1_{\rho,a}$, and their derivatives. This holds to all orders in θ as a consequence of the existence of the Seiberg–Witten map [5–7].

It is natural to introduce the field strength $F_{\nu\rho}$ in analogy to the tensor $T_{\nu\rho}$:
$$F_{\kappa\lambda} = \partial_\kappa V_\lambda - \partial_\lambda V_\kappa - iV_\kappa * V_\lambda + iV_\lambda * V_\kappa.$$
It also transforms like a tensor under the restricted transformations:
$$\delta_{\alpha^1} F_{\kappa\lambda} = i[\alpha \overset{*}{,} F_{\kappa\lambda}].$$
It is desirable to formulate the dynamics in the Lagrangian formalism, and an integral has to be defined for this purpose. For the canonical structure an integral can be defined:
$$\int\hat\phi = \int d^Nx\,\phi(x).$$
We have restricted the integral to the x-coordinates and disregard the t-coordinates for the moment. For the integral over a product of fields the ∗ product has to be used:
$$\int\hat\psi\,\hat\phi = \int d^Nx\,\psi(x) * \phi(x).$$
From the definition of the ∗ product follows
$$\int\hat\psi\,\hat\phi = \int\hat\phi\,\hat\psi$$
and Stokes’ theorem
$$\int[\hat\partial_l, \hat\psi] = \int d^Nx\,\frac{\partial}{\partial x^l}\psi = 0.$$
A natural choice for the Lagrangian of a gauge theory is
$$L = \frac14\operatorname{Tr} F_{\alpha\beta} * F^{\alpha\beta}.$$
The trace is to be taken over the T-matrices, which replace the coordinates t. Under gauge transformations $\delta L = i[\alpha \overset{*}{,} L]$. The action will therefore be invariant, as a consequence of the properties of the trace and of the integration discussed above. To first order in θ the following terms appear in the field strength:
$$F_{\kappa\lambda} = F^1_{\kappa\lambda,a}T^a + \frac12\theta^{\mu\nu}(T^aT^b + T^bT^a)\Big(F^1_{\kappa\mu,a}F^1_{\lambda\nu,b} - \frac12 a^1_{\mu,a}\big(2\partial_\nu F^1_{\kappa\lambda,b} + a^1_{\nu,c}F^1_{\kappa\lambda,d}f^{cd}{}_b\big)\Big).$$
This should serve as an example of how a non-commutative space-time structure manifests itself in the couplings of a gauge theory formulated on ordinary space in terms of ordinary fields. The consequences of such couplings remain to be investigated, both for the field-theoretical properties of the model and for its phenomenological properties.

References
1. Madore, J., Schraml, S., Schupp, P. and Wess, J.: Gauge theory on noncommutative spaces. Eur. Phys. J. C 16, 161 (2000), hep-th/0001203
2. Connes, A., Douglas, M.R., Schwarz, A.: Noncommutative Geometry and Matrix Theory: Compactification on Tori. JHEP 9802, 003 (1998), hep-th/9711162
3. Seiberg, N. and Witten, E.: String theory and noncommutative geometry. JHEP 9909, 032 (1999), hep-th/9908142
4. Faddeev, L., Reshetikhin, N. and Takhtajan, L.: Quantization of Lie groups and Lie algebras. Leningrad Math. J. 1, 193 (1990)
5. Jurčo, B., Schupp, P.: Noncommutative Yang-Mills from equivalence of star products. Eur. Phys. J. C 14, 367 (2000), hep-th/0001032
6. Jurčo, B., Schupp, P. and Wess, J.: Noncommutative gauge theory for Poisson manifolds. Nucl. Phys. B 584, 784 (2000), hep-th/0005005
7. Jurčo, B., Schupp, P. and Wess, J.: Nonabelian noncommutative gauge theory and Seiberg–Witten map. In preparation, hep-th/0102129
8. Jurčo, B., Schraml, S., Schupp, P. and Wess, J.: Enveloping algebra-valued gauge transformations for non-abelian gauge groups on non-commutative spaces. Eur. Phys. J. C 17, 521 (2000), hep-th/0006246
9. Weyl, H.: Quantenmechanik und Gruppentheorie. Z. Physik 46, 1 (1927); The theory of groups and quantum mechanics. New York: Dover, 1931, translated from Gruppentheorie und Quantenmechanik, Leipzig: Hirzel Verlag, 1928
10. Wigner, E.P.: Quantum corrections for thermodynamic equilibrium. Phys. Rev. 40, 749 (1932)
11. Moyal, J.E.: Quantum mechanics as a statistical theory. Proc. Cambridge Phil. Soc. 45, 99 (1949)
12. Bayen, F., Flato, M., Fronsdal, C., Lichnerowicz, A., Sternheimer, D.: Deformation theory and quantization. I. Deformations of symplectic structures. Ann. Physics 111, 61 (1978)
13. Kontsevich, M.: Deformation quantization of Poisson manifolds, I. q-alg/9709040
14. Sternheimer, D.: Deformation Quantization: Twenty Years After. math/9809056
15. Wess, J. and Zumino, B.: Covariant differential calculus on the quantum hyperplane. Nucl. Phys. Proc. Suppl. B 18, 302 (1991)
16. Wess, J.: q-deformed Heisenberg Algebras. In: H. Gausterer, H. Grosse and L. Pittner, eds., Proceedings of the 38. Internationale Universitätswochen für Kern- und Teilchenphysik, Lect. Notes in Phys. 543, Berlin–Heidelberg–New York: Springer-Verlag, 2000, Schladming, January 1999, math-ph/9910013
17. Bonora, L., Schnabl, M., Sheikh-Jabbari, M.M. and Tomasiello, A.: Noncommutative SO(n) and Sp(n) gauge theories. hep-th/0006091

Communicated by W. Zimmermann
Commun. Math. Phys. 219, 259 – 270 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Classical Versions of q-Gaussian Processes: Conditional Moments and Bell’s Inequality
Wlodzimierz Bryc
Department of Mathematics, University of Cincinnati, PO Box 210025, Cincinnati, OH 45221-0025, USA
Received: 18 September 2000 / Accepted: 17 November 2000
Abstract: We show that classical processes corresponding to operators which satisfy a q-commutation relation have linear regressions and quadratic conditional variances. From this we deduce that Bell’s inequality for their covariances can be extended from q = −1 to the entire range −1 ≤ q < 1.

1. Introduction

In this paper we consider a linear mapping $H \ni f \mapsto a_f \in \mathcal{B}$ from the real Hilbert space H into the algebra $\mathcal{B}$ of bounded operators acting on a complex Hilbert space. The mapping satisfies the q-commutation relations
$$a_f a_g^* - q\,a_g^* a_f = \langle f, g\rangle I, \tag{1}$$
and $a_f\Omega = 0$ for a vacuum vector Ω. This defines a non-commutative stochastic process $X_f = a_f + a_f^*$, first studied in [5], which, following [2], we call the q-Gaussian process. For different values of q, these processes interpolate between the bosonic (q = 1) and fermionic (q = −1) processes. They include the free processes of Voiculescu [7] (q = 0).

One of the basic problems arising in this context is the existence of classical versions of q-Gaussian processes, see Definition 2.

- For q = 1, these are the classical Gaussian processes with covariances $(\langle f, g\rangle)_{f,g\in H}$.
- For q = −1, the classical versions are two-valued, so Bell’s inequality [1] shows that only some covariances can have classical versions. In [5] classical versions were constructed for covariances corresponding to stationary two-valued Markov processes (q = −1).
- In [2, Prop. 3.9], the existence of such classical versions was proved for all −1 < q < 1 in the case where the q-Gaussian process is Markovian (which can be characterized in terms of the covariance function).

The situation for other covariances remained open in [2], and it was unclear which q-Gaussian processes have no classical realizations. This issue is addressed in the present
paper. Using a formula for conditional variances of classical versions, we derive a constraint on the covariance. This constraint extends one of Bell’s inequalities from q = −1 to general −1 ≤ q < 1. The inequality implies that there are covariances for which the corresponding non-commutative q-Gaussian processes cannot have classical versions over the entire range −1 ≤ q < 1. The parameter q interpolates between q = −1, where classical versions may fail to exist, and q = 1, where they always exist. It is therefore interesting that there is a version of Bell’s inequality which does not depend on q.

The proof relies on formulas for conditional moments of the first two orders, which are of independent interest. The computations deriving them were made possible by recent advances in the Fock space representation of the q-commutation relations (1), see [2, 3].

2. Preliminaries

This section introduces the Fock space representation of q-Gaussian processes and states known results in a form convenient for us. It is based on [2].

2.1. Notation. Throughout the paper, q is a fixed parameter and −1 < q < 1. For n = 0, 1, 2, … we define the q-integers $[n]_q := \frac{1-q^n}{1-q}$. The q-factorials are $[n]_q! := [1]_q[2]_q\cdots[n]_q$, with the convention $[0]_q! := 1$. The q-Hermite polynomials are defined by the recurrence
$$xH_n(x) = H_{n+1}(x) + [n]_q H_{n-1}(x), \qquad n \ge 0, \tag{2}$$
with $H_{-1}(x) := 0$ and $H_0(x) := 1$. These polynomials are orthogonal with respect to the unique absolutely continuous probability measure $\nu_q(dx) = f_q(x)\,dx$ supported on $[-2/\sqrt{1-q},\, 2/\sqrt{1-q}]$. The density $f_q(x)$ has an explicit product expansion, see [2, Theorem 1.10] or [6]. The second moments of the q-Hermite polynomials are
$$\int_{-2/\sqrt{1-q}}^{2/\sqrt{1-q}} (H_n(x))^2\,\nu_q(dx) = [n]_q!.$$
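The recurrence (2) and the norm formula above can be checked exactly with rational arithmetic, without the density $f_q$. The sketch below assumes a rational q and uses the standard combinatorial reading of a three-term recurrence (Flajolet–Viennot). For monic polynomials with $xH_n = H_{n+1} + [n]_qH_{n-1}$, the k-th moment of $\nu_q$ is a sum over Dyck paths of length k. In that sum, each down-step from height h carries the weight $[h]_q$. The helper names are illustrative only.

```python
from fractions import Fraction

def q_int(n, q):
    """[n]_q = 1 + q + ... + q^(n-1), i.e. (1 - q^n)/(1 - q)."""
    return sum((q ** k for k in range(n)), Fraction(0))

def q_factorial(n, q):
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= q_int(k, q)
    return out

def q_hermite(n, q):
    """Coefficients (lowest degree first) of H_n, via H_{k+1} = x H_k - [k]_q H_{k-1}."""
    prev, cur = [Fraction(0)], [Fraction(1)]            # H_{-1}, H_0
    for k in range(n):
        shifted = [Fraction(0)] + cur                    # x * H_k
        nxt = [c - q_int(k, q) * (prev[i] if i < len(prev) else 0)
               for i, c in enumerate(shifted)]
        prev, cur = cur, nxt
    return cur

def moments(kmax, q):
    """m_k = int x^k nu_q(dx) for k = 0..kmax, as weighted Dyck paths."""
    result = []
    for k in range(kmax + 1):
        paths = {0: Fraction(1)}                         # height -> accumulated weight
        for _ in range(k):
            step = {}
            for h, w in paths.items():
                step[h + 1] = step.get(h + 1, 0) + w                      # up-step
                if h > 0:
                    step[h - 1] = step.get(h - 1, 0) + w * q_int(h, q)    # down-step from height h
            paths = step
        result.append(paths.get(0, Fraction(0)))
    return result

def integrate(p, r, m):
    """int p(x) r(x) nu_q(dx) for coefficient lists p, r and the moment list m."""
    return sum(a * b * m[i + j] for i, a in enumerate(p) for j, b in enumerate(r))

q = Fraction(1, 3)
m = moments(12, q)
H = [q_hermite(n, q) for n in range(7)]
assert m[4] == 2 + q                                     # three pairings, one of them crossing
assert all(integrate(H[n], H[n], m) == q_factorial(n, q) for n in range(7))
assert all(integrate(H[n], H[k], m) == 0 for n in range(7) for k in range(n))
```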
In our notation we suppress the dependence of $H_n(x)$ on q.

2.2. q-Fock space. Let H be a real Hilbert space with complexification $H_c = H \oplus iH$. We define its q-Fock space $\Gamma_q(H)$ as the closure of $\mathbb{C}\Omega \oplus \bigoplus_{n\ge 1} H_c^{\otimes n}$, the linear span of the vacuum and of the vectors $f_1\otimes\cdots\otimes f_n$, in the scalar product
$$\langle f_1\otimes\cdots\otimes f_n \mid g_1\otimes\cdots\otimes g_m\rangle_q = \begin{cases}\sum_{\sigma\in S_n} q^{|\sigma|}\prod_{j=1}^n \langle f_j, g_{\sigma(j)}\rangle & \text{if } m = n,\\ 0 & \text{if } m \ne n.\end{cases} \tag{3}$$
Here:

- Ω is the vacuum vector;
- $S_n$ is the group of permutations of $\{1,\dots,n\}$;
- $|\sigma| = \#\{(i,j): i<j,\ \sigma(i)>\sigma(j)\}$ is the number of inversions of σ.

For the proof that (3) is indeed non-negative definite, see [3].

Given the q-Fock space $\Gamma_q(H)$ and $f \in H$, we define the annihilation operator $a_f: \Gamma_q(H)\to\Gamma_q(H)$ and its $\langle\cdot|\cdot\rangle_q$-adjoint, the creation operator $a_f^*: \Gamma_q(H)\to\Gamma_q(H)$, as follows:
$$a_f\Omega := 0, \qquad a_f\,f_1\otimes\cdots\otimes f_n := \sum_{j=1}^n q^{j-1}\langle f, f_j\rangle\, f_1\otimes\cdots\otimes f_{j-1}\otimes f_{j+1}\otimes\cdots\otimes f_n, \tag{4}$$
and
$$a_f^*\Omega := f, \qquad a_f^*\,f_1\otimes\cdots\otimes f_n := f\otimes f_1\otimes\cdots\otimes f_n. \tag{5}$$
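Definitions (3)–(5) can be exercised directly on small tensors. The sketch below is a toy model built on stated assumptions:

- H = ℝ^d with an orthonormal basis $e_0,\dots,e_{d-1}$.
- Fock vectors are stored as dicts from index tuples $(i_1,\dots,i_n)$, standing for $e_{i_1}\otimes\cdots\otimes e_{i_n}$, to rational coefficients.
- The vacuum is the empty tuple.
- The sum in (3) is computed by brute force over permutations.

It checks two things on all basis tensors of degree at most 3: that $a_{e_i}$ and $a^*_{e_i}$ are mutually adjoint for $\langle\cdot|\cdot\rangle_q$, and that they satisfy (1). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def inversions(p):
    """|sigma| = #{(i, j): i < j, sigma(i) > sigma(j)}."""
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def inner(u, v, q):
    """q-scalar product (3) for real vectors {index tuple: coefficient}, orthonormal basis."""
    total = Fraction(0)
    for s, a in u.items():
        for t, b in v.items():
            if len(s) != len(t):
                continue                   # tensors of different degree are orthogonal
            for sigma in permutations(range(len(s))):
                if all(s[k] == t[sigma[k]] for k in range(len(s))):
                    total += a * b * q ** inversions(sigma)
    return total

def create(i, v):
    """a*_{e_i}, Eq. (5): prepend e_i."""
    return {(i,) + s: c for s, c in v.items()}

def annihilate(i, v, q):
    """a_{e_i}, Eq. (4): remove a factor e_i in position j (1-based) with weight q^(j-1)."""
    out = {}
    for s, c in v.items():
        for j, idx in enumerate(s):        # j is 0-based here, so the weight is q**j
            if idx == i:
                t = s[:j] + s[j + 1:]
                out[t] = out.get(t, 0) + c * q ** j
    return out

def combine(u, v, alpha=1):
    """u + alpha * v, dropping zero coefficients."""
    out = dict(u)
    for s, c in v.items():
        out[s] = out.get(s, 0) + alpha * c
    return {s: c for s, c in out.items() if c != 0}

q, d = Fraction(2, 5), 2
assert annihilate(0, {(): Fraction(1)}, q) == {}          # a_f Omega = 0
basis = [s for n in range(4) for s in product(range(d), repeat=n)]
for i in range(d):
    for s in basis:
        xi = {s: Fraction(1)}
        for t in basis:
            eta = {t: Fraction(1)}
            # adjointness: <a_i xi | eta>_q == <xi | a*_i eta>_q
            assert inner(annihilate(i, xi, q), eta, q) == inner(xi, create(i, eta), q)
        for k in range(d):
            # commutation relation (1): (a_i a*_k - q a*_k a_i) xi = <e_i, e_k> xi
            lhs = combine(annihilate(i, create(k, xi), q), create(k, annihilate(i, xi, q)), -q)
            assert lhs == (xi if i == k else {})
```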
These operators are bounded, satisfy the commutation relation (1), and $a_{f+g} = a_f + a_g$, see [3].

2.3. q-Gaussian processes. We now consider (non-commutative) random variables as the elements of the algebra $\mathcal{A}$ generated by the self-adjoint operators $X_f := a_f + a_f^*$. The state on $\mathcal{A}$ is the vacuum expectation $E: \mathcal{A}\to\mathbb{R}$, given by $E(X) = \langle\Omega|X\Omega\rangle_q$.

Definition 1. We call $\{X(t): t\in T\}$ a q-Gaussian (non-commutative) process indexed by T if there are vectors $h(t)\in H$ such that $X(t) = X_{h(t)}$.

For a q-Gaussian process the covariance function $c_{t,s} := E(X_tX_s)$ becomes $c_{t,s} = \langle h(t), h(s)\rangle$. The Wick products $\psi(f_1\otimes\cdots\otimes f_n)\in\mathcal{A}$ are defined recursively by $\psi(\Omega) := I$ and
$$\psi(f\otimes f_1\otimes\cdots\otimes f_n) := X_f\,\psi(f_1\otimes\cdots\otimes f_n) - \sum_{j=1}^n q^{j-1}\langle f, f_j\rangle\,\psi(f_1\otimes\cdots\otimes f_{j-1}\otimes f_{j+1}\otimes\cdots\otimes f_n). \tag{6}$$
An important property of Wick products is that if $X = \psi(f_1\otimes\cdots\otimes f_n)$, then
$$X\Omega = f_1\otimes\cdots\otimes f_n. \tag{7}$$
We will also use the connection with q-Hermite polynomials. If $\|f\| = 1$, then
$$\psi(f^{\otimes n}) = H_n(X_f), \tag{8}$$
see [2, Prop. 2.9]. Formulas (3), (7), and (8) show that for a unit vector $f\in H$ we have
$$E\big(H_n(X_f)^2\big) = \sum_{\sigma\in S_n} q^{|\sigma|} = [n]_q!. \tag{9}$$
Thus $\nu_q$ is indeed the distribution of $X_f$. Our main use of the Wick product is to compute certain conditional expectations.
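The middle equality in (9) specializes (3) to $f_j = g_j = f$: every factor equals $\langle f, f\rangle = 1$, so only the inversion counts remain. The last equality is the classical fact that inversions of permutations are distributed according to the q-factorial. A brute-force check with exact rational q is shown below; the value of q and the helper names are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

def inversions(p):
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def q_factorial(n, q):
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= sum((q ** j for j in range(k)), Fraction(0))   # [k]_q
    return out

def vacuum_norm(n, q):
    """<f^{(x)n} | f^{(x)n}>_q for a unit vector f: each factor <f, f> in (3) equals 1."""
    return sum((q ** inversions(p) for p in permutations(range(n))), Fraction(0))

q = Fraction(-2, 7)
assert all(vacuum_norm(n, q) == q_factorial(n, q) for n in range(7))
```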
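2.4. Conditional expectations. Recall that a (non-commutative) conditional expectation on the probability space $(\mathcal{A}, E)$ with respect to the subalgebra $\mathcal{B}\subset\mathcal{A}$ is a mapping $\mathcal{E}: \mathcal{A}\to\mathcal{B}$ such that
$$E(Y_1 X Y_2) = E\big(Y_1\,\mathcal{E}(X)\,Y_2\big) \quad\text{for all } X\in\mathcal{A},\ Y_1, Y_2\in\mathcal{B}. \tag{10}$$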
We will study only algebras $\mathcal{B}$ generated by the identity and a finite number of random variables $X_{f_1},\dots,X_{f_n}$. In this situation we use a more probabilistic notation: we write $E(X|X_{f_1},\dots,X_{f_n})$ for the conditional expectation of $X\in\mathcal{A}$ with respect to $\mathcal{B}$. In this setting, conditional expectations are easily computed for X given by Wick products. This important result comes from [2, Theorem 2.13].

Theorem 1. Let $Y = \psi(g_1\otimes\cdots\otimes g_m)$ and $X_1 = X_{f_1},\dots,X_k = X_{f_k}$ for some $f_i, g_j\in H$. If $P: H\to H$ denotes the orthogonal projection onto the span of $f_1,\dots,f_k$, then
$$E(Y|X_1,\dots,X_k) = \psi(Pg_1\otimes\cdots\otimes Pg_m).$$

The following formula is an immediate consequence of Theorem 1 and (8), and is implicit in [2, Proof of Theorem 4.6].

Corollary 1. Let $X = X_f$ and $Y = X_g$ with unit vectors $\|f\| = \|g\| = 1$, and let $H_n$ be the nth q-Hermite polynomial, see (2). Then
$$E(H_n(Y)|X) = \langle f, g\rangle^n H_n(X). \tag{11}$$
For a finite number of vectors $f_0, f_1,\dots,f_k\in H$, let $X_j := X_{f_j}$. Like classical (commutative) Gaussian random variables, these (non-commutative) random variables have linear regressions and constant conditional variances.

Proposition 1.
$$E(X_0|X_1,\dots,X_k) = \sum_{j=1}^k a_jX_j \tag{12}$$
and
$$E(X_0^2|X_1,\dots,X_k) = \Big(\sum_{j=1}^k a_jX_j\Big)^2 + cI. \tag{13}$$
If $f_1,\dots,f_k\in H$ are linearly independent, then the coefficients $a_j$ and c are uniquely determined by the covariance matrix $C = [c_{i,j}] := [\langle f_i, f_j\rangle]$. Notice that Eq. (13) can indeed be rewritten as the statement that the conditional variance is constant:
$$\operatorname{Var}(X_0|X_1,\dots,X_k) := E\Big(\big(X_0 - E(X_0|X_1,\dots,X_k)\big)^2\,\Big|\,X_1,\dots,X_k\Big) = cI.$$
Proof. This follows from Theorem 1 and (6). Write the orthogonal projection of $f_0$ onto the span of $f_1,\dots,f_k$ as the linear combination $g = \sum_j a_jf_j$. Then
$$E(X_0|X_1,\dots,X_k) = E(\psi(f_0)|X_1,\dots,X_k) = \psi(g) = \sum_j a_j\psi(f_j).$$
By (6), $\psi(f_j) = X_j$, which proves (12). Similarly,
$$E\big(X_0^2 - \|f_0\|^2 I\,\big|\,X_1,\dots,X_k\big) = E\big(\psi(f_0\otimes f_0)\,\big|\,X_1,\dots,X_k\big) = \psi(g\otimes g) = \Big(\sum_j a_jX_j\Big)^2 - \|g\|^2 I.$$
This proves (13) with $c = \|f_0\|^2 - \|g\|^2$. If $f_1,\dots,f_k\in H$ are linearly independent, then the representation $g = \sum_j a_jf_j$ is unique.

To analyze standardized triplets in more detail we need the explicit form of the coefficients. (We omit the straightforward calculation.)

Corollary 2. Let $X := X_f$, $Y := X_g$, $Z := X_h$, where $f, h\in H$ are linearly independent unit vectors. Then
$$E(Y|X,Z) = aX + bZ, \tag{14}$$
$$E(Y^2|X,Z) = (aX + bZ)^2 + cI, \tag{15}$$
where
$$a = \frac{\langle f,g\rangle - \langle g,h\rangle\langle f,h\rangle}{1 - \langle f,h\rangle^2}, \tag{16}$$
$$b = \frac{\langle g,h\rangle - \langle f,g\rangle\langle f,h\rangle}{1 - \langle f,h\rangle^2}. \tag{17}$$
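The coefficients (16), (17) and the constant $c = \det(C)/(1-\langle f,h\rangle^2)$ stated just below can be cross-checked with exact rational arithmetic. Two checks are made:

- The projection $Pg = af + bh$ must reproduce $\langle g, f\rangle$ and $\langle g, h\rangle$; these are the normal equations of the projection.
- c must equal $\|g\|^2 - \|Pg\|^2$.

The sketch assumes unit vectors f, g, h. The covariance values and the function name are arbitrary illustrative choices.

```python
from fractions import Fraction

def regression_coefficients(fg, gh, fh):
    """a, b from (16)-(17) and c = det(C)/(1 - <f,h>^2) for unit vectors f, g, h.

    fg, gh, fh stand for <f,g>, <g,h>, <f,h>; requires <f,h>^2 < 1.
    """
    den = 1 - fh ** 2
    a = (fg - gh * fh) / den
    b = (gh - fg * fh) / den
    det = 1 + 2 * fg * gh * fh - fg ** 2 - gh ** 2 - fh ** 2   # det of the 3x3 covariance matrix
    return a, b, det / den

fg, gh, fh = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
a, b, c = regression_coefficients(fg, gh, fh)
assert a + b * fh == fg and a * fh + b == gh          # <Pg, f> = <g, f>, <Pg, h> = <g, h>
assert c == 1 - (a ** 2 + b ** 2 + 2 * a * b * fh)    # c = ||g||^2 - ||Pg||^2
```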
Another calculation shows that $c = \det(C)/(1 - \langle f,h\rangle^2)$, where C is the covariance matrix; in particular, c ≥ 0.

3. Conditional Moments of Classical Versions

We give the definition of a classical version which is convenient for bounded processes; for a more general definition, see [2, Def. 3.1].

Definition 2. A classical version of the process X(t) indexed by $t\in T\subset\mathbb{R}$ is a stochastic process $\tilde X(t)$ defined on some classical probability space with the following property: for any finite number of indices $t_1<t_2<\cdots<t_k$ and any polynomials $P_1,\dots,P_k$,
$$E\big(P_1(X(t_1))P_2(X(t_2))\cdots P_k(X(t_k))\big) = \mathbb{E}\big(P_1(\tilde X(t_1))P_2(\tilde X(t_2))\cdots P_k(\tilde X(t_k))\big). \tag{18}$$
Here $\mathbb{E}(\cdot)$ denotes the classical expected value, given by the Lebesgue integral with respect to the classical probability measure.

Our main interest is in the finite index set $T = \{t_1, t_2, t_3\}$, where $t_1<t_2<t_3$. In this case we write $X := X(t_1)$, $Y := X(t_2)$, $Z := X(t_3)$. We say that an ordered triplet (X, Y, Z) has a classical version $\tilde X, \tilde Y, \tilde Z$ if $E(P_1(X)P_2(Y)P_3(Z)) = \mathbb{E}(P_1(\tilde X)P_2(\tilde Y)P_3(\tilde Z))$ for all polynomials $P_1, P_2, P_3$. The classical version of a non-commutative process is order-dependent: the left-hand side of (18) may depend on the ordering of the variables, while the right-hand side does not. For a specific example in the context of q-Gaussian random variables, see [5, formulas (2.64) and (2.65)].
3.1. Triplets. All pairs $(X_f, X_g)$ of q-Gaussian random variables have classical versions, because $E(X_f^mX_g^n) = E(X_f^nX_g^m)$ for all integers m, n. However, the classical version of a triplet may fail to exist. With this in mind we consider q-Gaussian triplets
$$X := X_f, \qquad Y := X_g, \qquad Z := X_h. \tag{19}$$
To simplify the notation we take unit vectors, $\|f\| = \|g\| = \|h\| = 1$. We assume that there is a classical version $(\tilde X, \tilde Y, \tilde Z)$ of (X, Y, Z), in this order. From Corollary 2 we know that the non-commutative random variables X, Y, Z have linear regressions and constant conditional variance. It turns out that the corresponding classical random variables $\tilde X, \tilde Y, \tilde Z$ also have linear regressions, while their conditional variances are perturbed into quadratic polynomials.

Theorem 2. If $(\tilde X, \tilde Y, \tilde Z)$ is a classical version of the q-Gaussian triplet (19), then
$$\mathbb{E}(\tilde Y|\tilde X, \tilde Z) = a\tilde X + b\tilde Z, \tag{20}$$
$$\mathbb{E}(\tilde Y^2|\tilde X, \tilde Z) = A\tilde X^2 + B\tilde X\tilde Z + C\tilde Z^2 + D, \tag{21}$$
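where a, b are given by (16), (17),
$$A = \frac{ab(1-q)\langle f,h\rangle + a^2\big(1 - q\langle f,h\rangle^2\big)}{1 - q\langle f,h\rangle^2}, \tag{22}$$
$$B = \frac{ab(1+q)\big(1 - \langle f,h\rangle^2\big)}{1 - q\langle f,h\rangle^2}, \tag{23}$$
$$C = \frac{ab(1-q)\langle f,h\rangle + b^2\big(1 - q\langle f,h\rangle^2\big)}{1 - q\langle f,h\rangle^2}, \tag{24}$$
and
$$D = 1 - A - B\langle f,h\rangle - C. \tag{25}$$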
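As a consistency check on (22)–(25), exact arithmetic can confirm two facts:

- These coefficients satisfy the three linear conditions (31), (32), (33) derived in the proof below.
- They reduce to the Gaussian values $A = a^2$, $B = 2ab$, $C = b^2$ when q = 1.

Because the identities are algebraic, the values of a, b and $\langle f,h\rangle$ below are arbitrary rationals rather than coefficients of a particular triplet. The function names are illustrative.

```python
from fractions import Fraction

def second_moment_coefficients(a, b, r, q):
    """A, B, C, D of (22)-(25); r stands for <f,h>."""
    d = 1 - q * r ** 2
    A = (a * b * (1 - q) * r + a ** 2 * d) / d
    B = a * b * (1 + q) * (1 - r ** 2) / d
    C = (a * b * (1 - q) * r + b ** 2 * d) / d
    D = 1 - A - B * r - C
    return A, B, C, D

def satisfies_31_to_33(a, b, r, q):
    """Check the linear conditions (31), (32), (33) from the proof of Theorem 2."""
    A, B, C, _ = second_moment_coefficients(a, b, r, q)
    eq31 = A * r**2 + B * r + C == a**2 * r**2 + 2 * a * b * r + b**2
    eq32 = A + B * r + C * r**2 == a**2 + 2 * a * b * r + b**2 * r**2
    eq33 = ((1 + q) * r * (A + C) + B * (q * r**2 + 1)
            == (1 + q) * ((a**2 + b**2) * r + a * b * (r**2 + 1)))
    return eq31 and eq32 and eq33

a, b, r = Fraction(3, 7), Fraction(-2, 5), Fraction(1, 3)
assert all(satisfies_31_to_33(a, b, r, Fraction(k, 10)) for k in range(-9, 10))
# q = 1 (bosonic case): the Gaussian values A = a^2, B = 2ab, C = b^2
assert second_moment_coefficients(a, b, r, Fraction(1))[:3] == (a**2, 2 * a * b, b**2)
```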
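The proof relies on the following technical result.

Lemma 1. If $H_n$, $H_m$ are the q-Hermite polynomials given by (2), then
$$E\big(H_n(X)ZXH_m(Z)\big) = \begin{cases}\langle f,h\rangle^{n+1}[n+2]_q! & \text{if } m = n+2,\\ \langle f,h\rangle^{n-1}[n]_q! & \text{if } m = n-2,\\ \langle f,h\rangle^{n-1}\big(([n]_q+1)\langle f,h\rangle^2 + q[n]_q\big)[n]_q! & \text{if } m = n,\\ 0 & \text{otherwise,}\end{cases} \tag{26}$$
$$E\big(H_n(X)XZH_m(Z)\big) = \begin{cases}\langle f,h\rangle^{n+1}[n+2]_q! & \text{if } m = n+2,\\ \langle f,h\rangle^{n-1}[n]_q! & \text{if } m = n-2,\\ \langle f,h\rangle^{n-1}\big([n+1]_q\langle f,h\rangle^2 + [n]_q\big)[n]_q! & \text{if } m = n,\\ 0 & \text{otherwise,}\end{cases} \tag{27}$$
$$E\big(H_n(X)X^2H_m(Z)\big) = E\big(H_m(X)Z^2H_n(Z)\big) = \begin{cases}\langle f,h\rangle^{n+2}[n+2]_q! & \text{if } m = n+2,\\ \langle f,h\rangle^{n-2}[n]_q! & \text{if } m = n-2,\\ \langle f,h\rangle^{n}\big([n+1]_q + [n]_q\big)[n]_q! & \text{if } m = n,\\ 0 & \text{otherwise,}\end{cases} \tag{28}$$
$$E\big(H_n(X)H_m(Z)\big) = \begin{cases}\langle f,h\rangle^n[n]_q! & \text{if } m = n,\\ 0 & \text{otherwise.}\end{cases} \tag{29}$$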
Proof of Theorem 2. Since $\tilde X, \tilde Y, \tilde Z$ are bounded random variables, to prove (20) we only need to verify that for arbitrary polynomials P, Q,
$$\mathbb{E}\big(P(\tilde X)\,\tilde Y\,Q(\tilde Z)\big) = \mathbb{E}\big(P(\tilde X)(a\tilde X + b\tilde Z)Q(\tilde Z)\big).$$
By (18), this is equivalent to $E(P(X)YQ(Z)) = E(P(X)(aX + bZ)Q(Z))$. The latter follows from (14) and (10), proving (20).

To prove (21), we verify that for arbitrary polynomials P, Q we have
$$\mathbb{E}\big(P(\tilde X)\,\tilde Y^2\,Q(\tilde Z)\big) = \mathbb{E}\big(P(\tilde X)(A\tilde X^2 + B\tilde X\tilde Z + C\tilde Z^2 + D)Q(\tilde Z)\big).$$
By definition (18), this is equivalent to
$$E\big(P(X)Y^2Q(Z)\big) = E\big(P(X)(AX^2 + BXZ + CZ^2 + D)Q(Z)\big). \tag{30}$$
It suffices to show that (30) holds true when $P = H_n$ and $Q = H_m$ are the q-Hermite polynomials defined by (2). Formula (15) implies that the left-hand side of (30) is given by
$$cE(H_n(X)H_m(Z)) + a^2E(H_n(X)X^2H_m(Z)) + b^2E(H_n(X)Z^2H_m(Z)) + abE(H_n(X)XZH_m(Z)) + abE(H_n(X)ZXH_m(Z)),$$
and the right-hand side becomes
$$AE(H_n(X)X^2H_m(Z)) + CE(H_n(X)Z^2H_m(Z)) + BE(H_n(X)XZH_m(Z)) + DE(H_n(X)H_m(Z)).$$
Using the formulas from Lemma 1 we see that both sides are zero, except when m = n or m = n ± 2. We now consider these three cases separately.

Case m = n + 2. Using Lemma 1, (30) simplifies to
$$\big(a^2\langle f,h\rangle^2 + 2ab\langle f,h\rangle + b^2\big)\langle f,h\rangle^n[n+2]_q! = \big(A\langle f,h\rangle^2 + B\langle f,h\rangle + C\big)\langle f,h\rangle^n[n+2]_q!.$$
This equation is satisfied when the coefficients A, B, C satisfy
$$A\langle f,h\rangle^2 + B\langle f,h\rangle + C = a^2\langle f,h\rangle^2 + 2ab\langle f,h\rangle + b^2. \tag{31}$$
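Case m = n − 2. Using Lemma 1, (30) simplifies to
$$\big(a^2 + 2ab\langle f,h\rangle + b^2\langle f,h\rangle^2\big)\langle f,h\rangle^{n-2}[n]_q! = \big(A + B\langle f,h\rangle + C\langle f,h\rangle^2\big)\langle f,h\rangle^{n-2}[n]_q!.$$
This equation is satisfied whenever
$$A + B\langle f,h\rangle + C\langle f,h\rangle^2 = a^2 + 2ab\langle f,h\rangle + b^2\langle f,h\rangle^2. \tag{32}$$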
Case m = n. We use Lemma 1 again. On both sides of Eq. (30) we factor out $\langle f,h\rangle^{n-1}[n]_q!$ and equate the remaining coefficients. (This is allowed since we are after sufficient conditions only!) We get
$$\langle f,h\rangle(a^2+b^2)\big([n+1]_q + [n]_q\big) + ab\Big(\langle f,h\rangle^2\big([n+1]_q + [n]_q + 1\big) + (1+q)[n]_q\Big) + c\langle f,h\rangle$$
$$= \langle f,h\rangle(A+C)\big([n+1]_q + [n]_q\big) + B\big(\langle f,h\rangle^2[n+1]_q + [n]_q\big) + D\langle f,h\rangle.$$
Now we use $[n+1]_q = 1 + q[n]_q$. Suppressing the corrections to the constant terms (i.e., the terms free of n), we get
$$(1+q)\Big(\langle f,h\rangle(a^2+b^2) + ab\big(1 + \langle f,h\rangle^2\big)\Big)[n]_q + c\langle f,h\rangle + \dots = \Big((1+q)(A+C)\langle f,h\rangle + B\big(q\langle f,h\rangle^2 + 1\big)\Big)[n]_q + D\langle f,h\rangle + \dots,$$
where the dots denote the suppressed constant-term corrections. This equation holds true when two conditions are met. First, the coefficients at $[n]_q$ must match, which gives
$$(1+q)\langle f,h\rangle(A+C) + B\big(q\langle f,h\rangle^2 + 1\big) = (1+q)\Big((a^2+b^2)\langle f,h\rangle + ab\big(\langle f,h\rangle^2 + 1\big)\Big). \tag{33}$$
Second, the constant terms must match. This holds true when the expectations are equal for n = m = 0, and hence this condition is equivalent to (25). The remaining three equations (31), (32), and (33) have a unique solution, given by the expressions (22), (23), (24).

Proof of Lemma 1. Using the definition of the vacuum expectation state, (7) and (8), we get
$$E\big(H_n(X)ZXH_m(Z)\big) = \langle ZH_n(X)\Omega\,|\,XH_m(Z)\Omega\rangle_q = \langle X_hf^{\otimes n}\,|\,X_fh^{\otimes m}\rangle_q.$$
Therefore (4) and (5) imply
$$E\big(H_n(X)ZXH_m(Z)\big) = \big\langle [n]_q\langle f,h\rangle f^{\otimes(n-1)} + h\otimes f^{\otimes n}\,\big|\,[m]_q\langle f,h\rangle h^{\otimes(m-1)} + f\otimes h^{\otimes m}\big\rangle_q. \tag{34}$$
The latter is zero, except when m = n or m = n ± 2. We consider these cases separately. If m = n, by orthogonality we have
$$E\big(H_n(X)ZXH_n(Z)\big) = [n]_q^2\langle f,h\rangle^2\langle f^{\otimes(n-1)}|h^{\otimes(n-1)}\rangle_q + \langle h\otimes f^{\otimes n}|f\otimes h^{\otimes n}\rangle_q.$$
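Clearly, $\langle f^{\otimes(n-1)}|h^{\otimes(n-1)}\rangle_q = \langle f,h\rangle^{n-1}[n-1]_q!$; this can be seen either from (9) and (11), or directly from the definition (3). By (3), the second term splits into two sums: one over the permutations $\sigma\in S_{n+1}$ with σ(1) = 1, and one over the permutations with σ(1) = k > 1. This gives
$$\langle h\otimes f^{\otimes n}|f\otimes h^{\otimes n}\rangle_q = \langle f,h\rangle\sum_{\sigma\in S_n} q^{|\sigma|}\langle f,h\rangle^n + \sum_{k=2}^{n+1}\sum_{\sigma\in S_n} q^{k-1+|\sigma|}\langle f,h\rangle^{n-1} = \langle f,h\rangle^{n+1}[n]_q! + \langle f,h\rangle^{n-1}q[n]_q[n]_q!.$$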
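The identity just derived can be checked by brute force over $S_{n+1}$, using concrete vectors in ℝ² in definition (3). The choices f = (1, 0) and h = (3/5, 4/5), which give $\langle f,h\rangle = 3/5$, and the value of q are arbitrary. The helper names are illustrative. The companion formula $\langle f^{\otimes(n-1)}|h^{\otimes(n-1)}\rangle_q = \langle f,h\rangle^{n-1}[n-1]_q!$ is checked as well.

```python
from fractions import Fraction
from itertools import permutations

def inversions(p):
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def q_inner(fs, gs, q):
    """<f_1 x ... x f_n | g_1 x ... x g_n>_q from (3); fs, gs are lists of coordinate tuples."""
    if len(fs) != len(gs):
        return Fraction(0)
    total = Fraction(0)
    for sigma in permutations(range(len(fs))):
        term = q ** inversions(sigma)
        for j, vec in enumerate(fs):
            term *= sum(x * y for x, y in zip(vec, gs[sigma[j]]))   # <f_j, g_sigma(j)>
        total += term
    return total

def q_int(n, q):
    return sum((q ** k for k in range(n)), Fraction(0))

def q_factorial(n, q):
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= q_int(k, q)
    return out

f = (Fraction(1), Fraction(0))
h = (Fraction(3, 5), Fraction(4, 5))          # unit vector with <f, h> = 3/5
r, q = Fraction(3, 5), Fraction(1, 4)
for n in range(1, 5):
    lhs = q_inner([h] + [f] * n, [f] + [h] * n, q)
    rhs = r ** (n + 1) * q_factorial(n, q) + r ** (n - 1) * q * q_int(n, q) * q_factorial(n, q)
    assert lhs == rhs
    assert q_inner([f] * (n - 1), [h] * (n - 1), q) == r ** (n - 1) * q_factorial(n - 1, q)
```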
Elementary algebra now yields (26) for m = n. If m = n + 2, then the right-hand side of (34) consists of only one term, and we get
$$E\big(H_n(X)ZXH_{n+2}(Z)\big) = [n+2]_q\langle f,h\rangle\langle h\otimes f^{\otimes n}|h^{\otimes(n+1)}\rangle_q = [n+2]_q\langle f,h\rangle\sum_{\sigma\in S_{n+1}} q^{|\sigma|}\langle f,h\rangle^n = \langle f,h\rangle^{n+1}[n+2]_q!.$$
The case m = n − 2 is given by the same expression with the roles of m and n switched, which ends the proof of (26).

The remaining expectations match the corresponding commutative values and can also be evaluated using recurrence (2) and formulas (11) and (9). To prove (27), note that X and $H_n(X)$ commute. Using (2) and (11) we get
$$E\big(H_n(X)XZH_m(Z)\big) = E\big(XH_n(X)(H_{m+1}(Z) + [m]_qH_{m-1}(Z))\big) = \langle f,h\rangle^{m+1}E\big(XH_n(X)H_{m+1}(X)\big) + [m]_q\langle f,h\rangle^{m-1}E\big(XH_n(X)H_{m-1}(X)\big).$$
The only non-zero values occur when m = n or m = n ± 2. Using (2) again, and then (9), we get (27). By (11) we have $E(H_n(X)X^2H_m(Z)) = \langle f,h\rangle^mE(X^2H_n(X)H_m(X))$, and recurrence (2) used twice then proves (28). Formula (29) is an immediate consequence of (11) and (9).
3.2. Relation to processes with independent increments. In [2, Definition 3.5] the authors define the non-commutative q-Brownian motion. They show that it has a classical version, see [2, Cor. 4.5]. Since the classical version of the q-Brownian motion is Markov, Theorem 2 implies that all its regressions are linear and all its conditional variances are quadratic. A computation gives the following expression for the conditional variances.

Proposition 2. Let $\tilde X_t$ be the classical version of the q-Brownian motion, i.e., $\langle f_t, f_s\rangle = \min\{s, t\}$. Then for $t_1<t_2<\cdots<t_n<s<t$ we have
$$\operatorname{Var}\big(\tilde X_s\,\big|\,\tilde X_{t_1},\dots,\tilde X_{t_n},\tilde X_t\big) = \frac{(t-s)(s-t_n)}{t-qt_n}\left(1 + (1-q)\,t\,t_n\,\frac{\big(\tilde X_t - \tilde X_{t_n}\big)\big(\tilde X_t/t - \tilde X_{t_n}/t_n\big)}{(t-t_n)^2}\right).$$
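By the Markov property, Proposition 2 reduces to Theorem 2 applied to the standardized triplet $(\tilde X_{t_n}/\sqrt{t_n},\ \tilde X_s/\sqrt{s},\ \tilde X_t/\sqrt{t})$. For this triplet $\langle f,h\rangle = \sqrt{t_n/t}$, $\langle f,g\rangle = \sqrt{t_n/s}$ and $\langle g,h\rangle = \sqrt{s/t}$. The sketch below compares the two expressions numerically at a few arbitrarily chosen points. It is a consistency check on the displayed formula under these assumptions, not a proof. The function names are hypothetical.

```python
import math

def cond_var_theorem2(x, z, fg, gh, fh, q):
    """Var(Y|X=x, Z=z) = E(Y^2|X,Z) - E(Y|X,Z)^2, built from (16), (17) and (20)-(25)."""
    a = (fg - gh * fh) / (1 - fh ** 2)
    b = (gh - fg * fh) / (1 - fh ** 2)
    d = 1 - q * fh ** 2
    A = (a * b * (1 - q) * fh + a ** 2 * d) / d
    B = a * b * (1 + q) * (1 - fh ** 2) / d
    C = (a * b * (1 - q) * fh + b ** 2 * d) / d
    D = 1 - A - B * fh - C
    return A * x * x + B * x * z + C * z * z + D - (a * x + b * z) ** 2

def var_via_theorem2(x_tn, x_t, tn, s, t, q):
    """Standardize X = X_tn/sqrt(tn), Y = X_s/sqrt(s), Z = X_t/sqrt(t), with <f_u, f_v> = min(u, v)."""
    fh, fg, gh = math.sqrt(tn / t), math.sqrt(tn / s), math.sqrt(s / t)
    return s * cond_var_theorem2(x_tn / math.sqrt(tn), x_t / math.sqrt(t), fg, gh, fh, q)

def var_prop2(x_tn, x_t, tn, s, t, q):
    """Right-hand side of Proposition 2 (only X_tn and X_t enter, by the Markov property)."""
    quad = (1 - q) * t * tn * (x_t - x_tn) * (x_t / t - x_tn / tn) / (t - tn) ** 2
    return (t - s) * (s - tn) / (t - q * tn) * (1 + quad)

for tn, s, t in [(1.0, 2.0, 4.0), (0.5, 0.7, 3.0)]:
    for q in (-0.9, 0.0, 0.6):
        for x_tn, x_t in [(1.0, 2.0), (-0.3, 0.8), (0.0, 0.0)]:
            assert math.isclose(var_via_theorem2(x_tn, x_t, tn, s, t, q),
                                var_prop2(x_tn, x_t, tn, s, t, q), rel_tol=1e-9, abs_tol=1e-12)
```

At q = 1 the quadratic correction vanishes, and the formula reduces to the Brownian-bridge variance $(t-s)(s-t_n)/(t-t_n)$.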
In [8], classical processes with independent increments, linear regressions, and quadratic conditional variances are analyzed. These processes have the same covariances as the q-Brownian motion, but their conditional variances are quadratic functions of the increment $\tilde X_t - \tilde X_{t_n}$ only. Proposition 2 shows that the classical realizations of the q-Brownian motion are not among the processes in [8]; they therefore have dependent increments.

4. Bell’s Inequality

It is well known that all q-Gaussian n-tuples with q = 1 have classical versions: these are given by the classical Gaussian distribution with the same covariance matrix $[\langle f_i, f_j\rangle]$. For q = −1, the classical version of the standardized q-Gaussian triplet (X, Y, Z) consists of ±1-valued symmetric random variables. The celebrated Bell’s inequality [1] therefore restricts their covariances:
$$1 - \langle f,h\rangle \ge \big|\langle f,g\rangle - \langle g,h\rangle\big|. \tag{35}$$
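Inequality (35) is easy to test for a given triple of covariances. The sketch below pairs it with a positive-semidefiniteness check of the 3 × 3 covariance matrix. Together they exhibit a legitimate covariance matrix that violates (35). The particular triple is of the type considered in Example 1 below, and the helper names are illustrative.

```python
from fractions import Fraction

def is_covariance(fg, gh, fh):
    """Is [[1, fg, fh], [fg, 1, gh], [fh, gh, 1]] positive semidefinite?

    Checks all principal minors; the 1x1 minors are the unit diagonal entries.
    """
    minors = [1 - fg ** 2, 1 - gh ** 2, 1 - fh ** 2,
              1 + 2 * fg * gh * fh - fg ** 2 - gh ** 2 - fh ** 2]
    return all(m >= 0 for m in minors)

def bell_35(fg, gh, fh):
    """Inequality (35): 1 - <f,h> >= |<f,g> - <g,h>|."""
    return 1 - fh >= abs(fg - gh)

# <f,h> = <g,h> = 3/5 and <f,g> = 0: a genuine covariance matrix that violates (35)
fg, gh, fh = Fraction(0), Fraction(3, 5), Fraction(3, 5)
assert is_covariance(fg, gh, fh) and not bell_35(fg, gh, fh)
```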
In particular, there are triplets of q-Gaussian random variables with q = −1 which do not have a classical version. The following theorem shows that restriction (35) remains in force for sub-Markov covariances over the entire range −1 ≤ q < 1.

Theorem 3. Suppose that $(\tilde X, \tilde Y, \tilde Z)$ is a classical version of the q-Gaussian triplet $(X, Y, Z) := (X_f, X_g, X_h)$, where $f, h\in H$ are linearly independent and −1 ≤ q < 1. Suppose further that either
$$\langle f,g\rangle\langle g,h\rangle \le \langle f,h\rangle \quad\text{and}\quad 0 < \langle f,h\rangle < 1, \tag{36}$$
or $\langle f,h\rangle = 0$, or q = −1. Then inequality (35) holds true.

Proof. Since the case q = −1 is well known, we restrict our attention to the case −1 < q < 1. Our starting point is expression (21). A computation shows that the conditional variance $\operatorname{Var}(\tilde Y|\tilde X, \tilde Z) := \mathbb{E}(\tilde Y^2|\tilde X, \tilde Z) - \big(\mathbb{E}(\tilde Y|\tilde X, \tilde Z)\big)^2$ is
$$\operatorname{Var}(\tilde Y|\tilde X, \tilde Z) = 1 - a^2 - b^2 - ab\langle f,h\rangle\frac{(1+q)\big(1 - \langle f,h\rangle^2\big) + 2(1-q)}{1 - q\langle f,h\rangle^2} + \frac{ab(1-q)}{1 - q\langle f,h\rangle^2}\big(\tilde X - \langle f,h\rangle\tilde Z\big)\big(\langle f,h\rangle\tilde X - \tilde Z\big). \tag{37}$$
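The passages from (21) to (37) and from (37) to (38) are routine algebra, and they can be double-checked with exact rational arithmetic. The sketch below does three things:

- It compares (37) with $A\tilde X^2 + B\tilde X\tilde Z + C\tilde Z^2 + D - (a\tilde X + b\tilde Z)^2$.
- It verifies the identity (38) at $\tilde X = -\tilde Z = \sqrt{2/(1-q)}$.
- It reproduces the threshold $\langle f,g\rangle = (q+5)/36$ of Example 2 below.

The helper names are illustrative, and the covariances and evaluation points are arbitrary choices. Only the monomials $\tilde X^2$, $\tilde X\tilde Z$, $\tilde Z^2$ enter, so no square root is ever evaluated.

```python
from fractions import Fraction

def coefficients(fg, gh, fh, q):
    """a, b of (16)-(17) and A, B, C, D of (22)-(25)."""
    a = (fg - gh * fh) / (1 - fh ** 2)
    b = (gh - fg * fh) / (1 - fh ** 2)
    d = 1 - q * fh ** 2
    A = (a * b * (1 - q) * fh + a ** 2 * d) / d
    B = a * b * (1 + q) * (1 - fh ** 2) / d
    C = (a * b * (1 - q) * fh + b ** 2 * d) / d
    return a, b, A, B, C, 1 - A - B * fh - C

def var_from_21(xx, xz, zz, fg, gh, fh, q):
    """E(Y^2|X,Z) - E(Y|X,Z)^2 from (20)-(25), given the monomials x^2, x z, z^2."""
    a, b, A, B, C, D = coefficients(fg, gh, fh, q)
    return A * xx + B * xz + C * zz + D - (a * a * xx + 2 * a * b * xz + b * b * zz)

def var_37(xx, xz, zz, fg, gh, fh, q):
    """Right-hand side of (37); (x - r z)(r x - z) = r x^2 - (1 + r^2) x z + r z^2."""
    a, b, *_ = coefficients(fg, gh, fh, q)
    r, d = fh, 1 - q * fh ** 2
    const = 1 - a * a - b * b - a * b * r * ((1 + q) * (1 - r * r) + 2 * (1 - q)) / d
    return const + a * b * (1 - q) / d * (r * xx - (1 + r * r) * xz + r * zz)

fg, gh, fh, q = Fraction(1, 2), Fraction(2, 3), Fraction(1, 5), Fraction(-1, 3)
x, z = Fraction(7, 4), Fraction(-2, 9)
assert var_from_21(x * x, x * z, z * z, fg, gh, fh, q) == var_37(x * x, x * z, z * z, fg, gh, fh, q)

# (38): at X = sqrt(2/(1-q)), Z = -X only u2 = X^2 = 2/(1-q) is needed
u2 = 2 / (1 - q)
lhs = (1 - q * fh**2) * (1 - fh)**2 * var_37(u2, -u2, u2, fg, gh, fh, q)
rhs = (1 - fh)**2 * (1 - q * fh**2 + (1 + q) * fh * fg * gh) - (fg - gh)**2 * (1 + fh**2)
assert lhs == rhs

# Example 2: <f,h> = <g,h> = 1/2, X = 2/sqrt(1-q), Z = -X; the variance vanishes at <f,g> = (q+5)/36
for q in (Fraction(-1, 2), Fraction(0), Fraction(1, 2)):
    u2 = 4 / (1 - q)
    edge = (q + 5) / 36
    assert var_37(u2, -u2, u2, edge, Fraction(1, 2), Fraction(1, 2), q) == 0
    assert var_37(u2, -u2, u2, edge - Fraction(1, 100), Fraction(1, 2), Fraction(1, 2), q) < 0
```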
The right-hand side of this expression must be non-negative over the support of $\tilde X, \tilde Z$. It is known, see [4, Lemma 8.1] or [2, Theorem 1.10], that $\tilde X, \tilde Z$ have a joint probability density function f(x, z) with respect to the product of the marginals $\nu_q$. Moreover, f is defined for all $-2/\sqrt{1-q} \le x, z \le 2/\sqrt{1-q}$. From its explicit product expansion we can see that
$$f(x,z) \ge \prod_{k=0}^{\infty}\frac{1 - \langle f,h\rangle^2q^k}{\big(1 + \langle f,h\rangle q^k\big)^4},$$
which is strictly positive. In particular, the right-hand side of (37) must be non-negative when evaluated at $\tilde X = \sqrt2/\sqrt{1-q}$, $\tilde Z = -\sqrt2/\sqrt{1-q}$.
Using formulas (16), (17) with the above values of $\tilde X$, $\tilde Z$, we get a rational expression for the conditional variance. It can be written as follows:
$$\big(1 - q\langle f,h\rangle^2\big)\big(1 - \langle f,h\rangle\big)^2\operatorname{Var}(\tilde Y|\tilde X, \tilde Z) = \big(1 - \langle f,h\rangle\big)^2\Big(1 - q\langle f,h\rangle^2 + (1+q)\langle f,h\rangle\langle f,g\rangle\langle g,h\rangle\Big) - \big(\langle f,g\rangle - \langle g,h\rangle\big)^2\big(1 + \langle f,h\rangle^2\big). \tag{38}$$
Therefore
$$\big(1 - \langle f,h\rangle\big)^2\Big(1 - q\langle f,h\rangle^2 + (1+q)\langle f,h\rangle\langle f,g\rangle\langle g,h\rangle\Big) \ge \big(\langle f,g\rangle - \langle g,h\rangle\big)^2\big(1 + \langle f,h\rangle^2\big).$$
The assumptions imply that $1 - q\langle f,h\rangle^2 + (1+q)\langle f,h\rangle\langle f,g\rangle\langle g,h\rangle \le 1 + \langle f,h\rangle^2$. Hence $(1 - \langle f,h\rangle)^2 \ge (\langle f,g\rangle - \langle g,h\rangle)^2$, proving (35).

4.1. Examples. The first example shows that there are covariances for which the q-Gaussian random variables have no classical version for any q in the range −1 ≤ q < 1.

Example 1. Consider the case $\langle f,h\rangle = \langle g,h\rangle > 0$. This can be realized when the covariance matrix is non-negative definite; a computation shows that this is equivalent to the condition $2\langle f,h\rangle^2 \le 1 + \langle f,g\rangle$. Since (36) is satisfied, Bell’s inequality (35) implies $1 + \langle f,g\rangle \ge 2\langle f,h\rangle$. Now take any vectors f, g, h ∈ H such that

- $\langle f,h\rangle = \langle g,h\rangle$,
- $0 < \langle f,h\rangle < 1$,
- $2\langle f,h\rangle^2 - 1 < \langle f,g\rangle < 2\langle f,h\rangle - 1$.

All such choices lead to q-Gaussian triplets with no classical version for −1 ≤ q < 1.

A nice feature of Theorem 3 is that its statement does not depend on q, as long as q < 1. But such a result cannot be sharp. The less transparent requirement that the conditional variance be non-negative is a stronger restriction on the covariances, and it depends on q. This is illustrated by the next example.

Example 2. Suppose $\langle f,h\rangle = \langle g,h\rangle = 1/2$. Inequality (35), used as in Example 1, implies that if a classical version of the q-Gaussian process exists, then $\langle f,g\rangle \ge 0$. Evaluating the conditional variance $\operatorname{Var}(\tilde Y|\tilde X, \tilde Z)$ at $\tilde X = 2/\sqrt{1-q}$, $\tilde Z = -\tilde X$, we get the more restrictive constraint $\langle f,g\rangle \ge \frac{q+5}{36}$.

Acknowledgements. I would like to thank the referee, A. Dembo, and T. Hodges for suggestions which improved the presentation, and V. Kaftal for several helpful discussions.
References
1. Bell, J.S.: On the Einstein–Podolsky–Rosen paradox. Physics 1, 195–200 (1964)
2. Bożejko, M., Kümmerer, B., and Speicher, R.: q-Gaussian processes: Non-commutative and classical aspects. Commun. Math. Phys. 185, 129–154 (1997)
3. Bożejko, M. and Speicher, R.: An example of a generalized Brownian motion. Commun. Math. Phys. 137, (3), 519–531 (1991)
4. Bryc, W.: Stationary random fields with linear regressions. Ann. Probab., in print, 2001
5. Frisch, U. and Bourret, R.: Parastochastics. J. Math. Phys. 11, (2), 364–390 (1970)
6. Koekoek, R. and Swarttouw, R.F.: The Askey-scheme of hypergeometric orthogonal polynomials and its q-analogue. Report no. 98–17, Delft University of Technology, 1998, http://aw.twi.tudelft.nl/~koekoek/askey.html
7. Voiculescu, D.V., Dykema, K.J., and Nica, A.: Free random variables. Providence, RI: American Mathematical Society, 1992
8. Wesołowski, J.: Stochastic processes with linear conditional expectation and quadratic conditional variance. Probab. Math. Statist. (Wrocław) 14, 33–44 (1993)

Communicated by H. Araki
Commun. Math. Phys. 219, 271 – 322 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Multiplicity of Phase Transitions and Mean-Field Criticality on Highly Non-Amenable Graphs
Roberto H. Schonmann
Department of Mathematics, UCLA, Los Angeles, CA 90095, USA
Received: 27 March 2000 / Accepted: 7 December 2000
Abstract: We consider independent percolation, Ising and Potts models, and the contact process, on infinite, locally finite, connected graphs. It is shown that on graphs with edge-isoperimetric Cheeger constant sufficiently large, in terms of the degrees of the vertices of the graph, each of the models exhibits more than one critical point, separating qualitatively distinct regimes. For unimodular transitive graphs of this type, the critical behaviour in independent percolation, the Ising model and the contact process are shown to be mean-field type. For Potts models on unimodular transitive graphs, we prove the monotonicity in the temperature of the property that the free Gibbs measure is extremal in the set of automorphism invariant Gibbs measures, and show that the corresponding critical temperature is positive if and only if the threshold for uniqueness of the infinite cluster in independent bond percolation on the graph is less than 1. We establish conditions which imply the finite-island property for independent percolation at large densities, and use those to show that for a large class of graphs the q-state Potts model has a low temperature regime in which the free Gibbs measure decomposes as the uniform mixture of the q ordered phases. In the case of non-amenable transitive planar graphs with one end, we show that the q-state Potts model has a critical point separating a regime of high temperatures in which the free Gibbs measure is extremal in the set of automorphism-invariant Gibbs measures from a regime of low temperatures in which the free Gibbs measure decomposes as the uniform mixture of the q ordered phases.

Contents
1. Introduction
1.1 Preliminaries
1.2 Percolation
1.3 Potts and Ising models
1.4 Contact process
1.5 Organization of the paper
2. Background on Infinite Graphs
2.1 Basics
2.2 Transitive graphs
2.3 Examples
2.4 Isoperimetric constants
2.5 Spectral radius
2.6 Number of ends
2.7 Cut sets
2.8 Planar graphs
3. Independent Bond Percolation
3.1 Background on independent bond percolation on graphs
3.2 Mean field criticality for independent percolation
3.3 Consequences for independent percolation of “high non-amenability”
4. Potts and Ising Models I
4.1 Applications of the Fortuin–Kasteleyn random cluster model
4.2 Mean field criticality for the Ising model
4.3 Consequences for FK and Potts models of “high non-amenability”
5. Potts and Ising Models II. The Finite Island Property
5.1 Sufficient condition for the Potts free Gibbs measure to decompose as the uniform mixture of the q ordered phases
5.2 Finite island property for graphs with positive anchored vertex isoperimetric constant
5.3 Finite island property for transitive graphs with one end which satisfy the quasi-connected minimal cut sets property
5.4 Finite island property for planar graphs
6. Related Results
6.1 Multiplicity of phase transitions in the FK model
6.2 Site percolation
7. The Contact Process
7.1 Separation of critical points for the contact process
7.2 Mean field criticality for the contact process

Work partially supported by the N.S.F. through grants DMS-9703814 and DMS-0071766 and by a Guggenheim Foundation fellowship
1. Introduction
1.1. Preliminaries. In this paper we address issues in the fast-growing area of research that can be called “statistical-mechanics type processes on graphs”. In this area one is interested in studying the basic lattice models from statistical mechanics and related areas, including percolation and interacting particle systems, but with the typical Euclidean lattices ($\mathbb{Z}^d$ and its relatives) replaced by a fairly general graph. For a recent review, see [Lyo5]. It has always been clear that several results in statistical mechanics are best formulated on arbitrary graphs, and do not depend on the particular structure of the graph. For instance, this is typically the case with correlation inequalities. It has also long been accepted that the study of the systems on homogeneous trees is worthwhile, in that it
Multiplicity of Phase Transitions
is typically simpler than on Euclidean lattices, and it sheds light on the behavior of the systems on high-dimensional Euclidean lattices. It is important, therefore, to stress that the current surge of interest in statistical mechanics type processes on graphs, and in particular the motivation for the current paper, derive from a much broader perspective. Some of the main reasons for this interest are listed next.
(1) Certain elegant and enlightening results are not present when one only considers Euclidean lattices. In other words, relevant mathematical structure is hidden when the systems are studied only on Euclidean lattices.
(2) The interplay between the geometry of the graph and the behavior of the statistical mechanics system is often very interesting. The study of the processes sheds light on the geometry of the graph and conversely.
(3) The study of systems on general graphs is important if one is interested in systems in random environment. For instance, the study of a diluted system on a cubic lattice may be seen as the study of the system on percolation clusters.
(4) Non-Euclidean geometries are important in Physics, so that the study of physical systems on non-Euclidean lattices may be of relevance also to Physics. In Chemistry, especially after the discovery of fullerenes, it became clear that even in a three-dimensional Euclidean universe atoms and molecules can arrange themselves in fashions which locally display the geometry of non-Euclidean surfaces (see, e.g., [VT] and [TM]). Quasi-crystals are yet another source of motivation.
(5) For applications to other areas of Science, including Biology, Economics, etc., it is natural to consider the agents as being modeled by the vertices of a graph, with the edges of the graph indicating the presence of interaction among agents (infection, mating, trade, etc.). Such graphs do not need to be regular in any sense.
(6) Problems which remain open on Euclidean lattices may be better understood, and perhaps even eventually solved, by placing them in this broader context. Moreover, the added breadth of the enterprise attracts to it mathematicians with various backgrounds and in this way brings in new ideas to the field.
Here we will consider three of the most basic statistical-mechanics type processes: independent percolation, Potts models (especially Ising models) and the contact process. We will obtain certain results which are similar for all these processes, and other results that relate some of these processes to each other. Papers in which percolation has been studied with emphasis on graphs which are not Euclidean lattices include: [Lyo2, GN, Lyo3, Wu1, Lyo4, BS1, Häg2, Lal1, BB, BLPS1, BLPS2, GS, HP, Sch2, Sch3, HPS, BLS, LyS, Häg3, CP, HSS, Per, MP, CS, PS-N, Lal3, BS3]. See also [BS2] for an on-line, continuously updated, account of progress in this area, as well as for links to various related Web sites. For an introduction to the Ising model and other statistical mechanics systems on homogeneous trees, see, e.g., Chapter 12 of [Geo] and the references provided there, as well as the papers [BRZ] and [Iof1] that appeared afterwards. The Ising model on more general graphs (with and without an external field) has been studied in a number of recent papers, including: [Lyo1, SeS, RNO, NW, Wu1, Wu3, Iof2, ST, JS, Häg3, EKPS, HSS] and [Wu4]. The contact process on homogeneous trees has been studied in [Pem, MS, MSZ, DS, Wu2, Zha, Lig2, Sta, Lig3, LS, SaS2, Lal2] and [Sch1]. For a comprehensive presentation on the subject see [Lig4]. The contact process on more general graphs has been studied primarily in [SaS1, SaS3, Sal] and [PS].
R. H. Schonmann
This introductory section is written having in mind a reader who is familiar with mathematical statistical mechanics and related subjects, but may not be so familiar with the study of such processes on graphs. Still, to keep this introduction focused, we will postpone the presentation of a detailed background on graphs, including various technical definitions, to Sect. 2. Also, to enhance readability, we state the results in the introduction not in their strongest possible form, but rather in some form which seems particularly transparent and appealing to us. Stronger, but more cumbersome results will appear in later sections of the paper. One of the main concerns in this paper is with the critical behavior. We will provide results that show that in certain cases it is of “mean-field nature”. We suppose that the reader is familiar with the general meaning of the above expression in quotation marks, and we will use it freely in this introduction, even when stating conjectures. Precise meanings will appear in the following sections. For the moment it is worth saying that in the cases in this paper in which we proved the “mean-field nature” of a critical point, what we proved is that a certain diagrammatic condition (the open triangle condition for percolation and the contact process and the finiteness of the bubble diagram for the Ising model) holds. The precise definitions of the diagrammatic conditions and of the critical exponents which are implied to take mean-field value are postponed to later sections, to keep this introduction from becoming too long. A very large class of graphs on which it is natural to study statistical mechanics type systems is that of the infinite, locally finite, connected graphs. But for most of the purposes in this paper the smaller class of transitive (same as vertex-transitive or homogeneous) graphs is the appropriate playground. Informally, a graph is transitive if all its vertices play exactly the same role. 
Therefore transitive graphs can be seen as “discrete models of a homogeneous space”. Basic examples of transitive graphs are the cubic lattices $\mathbb{Z}^d$, $d = 1, 2, \ldots$, other regular tilings of Euclidean space, like the triangular and hexagonal lattices, and the homogeneous trees. Other examples of transitive graphs include regular tilings of hyperbolic spaces, which can be thought of as representing “non-Euclidean crystals”. Moreover, new transitive graphs can be obtained from old ones, for instance by taking Cartesian products. A very important numerical feature of a graph $G = (V, E)$ is its Cheeger constant (same as edge-isoperimetric constant), defined as
$$i_E(G) = \inf\left\{ \frac{|\partial_E S|}{|S|} : S \subset V,\ 0 < |S| < \infty \right\},$$
where $\partial_E S$ is the edge boundary of $S$, i.e., the set of edges with exactly one end-point in $S$. Note that $i_E(G)$ always lies in the interval $[0, D(G))$, where $D(G)$ is the maximal degree (same as coordination number) of the vertices of $G$. Graphs which have $i_E(G) = 0$ (e.g., Euclidean lattices, including $\mathbb{Z}^d$, $d = 1, 2, \ldots$) are called amenable graphs, while graphs which have $i_E(G) > 0$ (e.g., homogeneous trees with degree at least three and regular tilings of hyperbolic spaces) are called non-amenable graphs. We now state in succession results on percolation, Potts and Ising models and the contact process. We will suppose that the reader is either familiar with the basics of these models, or will learn them from the references that are provided.
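As a numerical illustration of the Cheeger constant, the sketch below computes $|\partial_E S|/|S|$ for balls in the $k$-regular tree and for square boxes in $\mathbb{Z}^2$. It is a minimal sketch, not taken from the paper: the vertex encoding (tuples of child indices) and the helper names `tree_neighbors`, `ball`, `edge_boundary_size`, `grid_neighbors` and `grid_box` are hypothetical. For the tree, every finite connected set $S$ satisfies $|\partial_E S| = (k-2)|S| + 2$ (sum the degrees over a subtree with $|S| - 1$ internal edges), so the ratio decreases to $k - 2 > 0$ when $k \ge 3$, whereas for a box of side $n$ in $\mathbb{Z}^2$ it equals $4/n \to 0$.

```python
from itertools import product


def tree_neighbors(k):
    """Neighbour function of the k-regular tree (hypothetical encoding).

    The root is (); it has children (0,), ..., (k-1,). Every other vertex v
    has its parent v[:-1] and k - 1 children v + (i,).
    """
    def nbrs(v):
        if v == ():
            return [(i,) for i in range(k)]
        return [v[:-1]] + [v + (i,) for i in range(k - 1)]
    return nbrs


def ball(nbrs, root, radius):
    """Vertices at graph distance <= radius from root (breadth-first search)."""
    seen = {root}
    frontier = [root]
    for _ in range(radius):
        nxt = []
        for v in frontier:
            for w in nbrs(v):
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return seen


def edge_boundary_size(nbrs, S):
    """|d_E S|: each edge {x, y} with x in S and y outside S is counted once, from x."""
    return sum(1 for x in S for y in nbrs(x) if y not in S)


def grid_neighbors(v):
    """Nearest neighbours in Z^2."""
    a, b = v
    return [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]


def grid_box(n):
    """The n x n box {0, ..., n-1}^2 in Z^2."""
    return set(product(range(n), repeat=2))


if __name__ == "__main__":
    nbrs3 = tree_neighbors(3)
    for r in (1, 2, 5, 10):
        S = ball(nbrs3, (), r)
        print("3-regular tree, radius", r, edge_boundary_size(nbrs3, S) / len(S))
    for n in (2, 10, 100):
        S = grid_box(n)
        print("Z^2 box, side", n, edge_boundary_size(grid_neighbors, S) / len(S))
```

The tree ratios approach 1 (that is, $k - 2$ for $k = 3$) from above, while the grid ratios go to 0, matching the non-amenable versus amenable dichotomy described above.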
1.2. Percolation. For a general introduction to percolation see for instance [Gri2]. We will mostly consider independent bond percolation in this paper, adding the usual remark that results similar to the ones here hold also for independent site percolation, with
similar proofs. For the reader’s convenience, Sect. 6 contains the statements of some of the corresponding results for independent site percolation, with indications on how to adapt the proofs to this setting. In independent bond percolation, each edge (same as bond) of a graph $G = (V, E)$ is occupied with probability $p$ and vacant otherwise, these decisions being independent for distinct edges. Clusters are the connected components of the graph obtained by deleting the vacant edges from $G$. A basic result about percolation on a transitive graph $G$ is that there are two critical points, $0 < p_c(G) \le p_u(G) \le 1$, which define the boundaries of three distinct phases described as follows:
(1) For $p < p_c(G)$ there is no infinite cluster a.s.
(2) For $p_c(G) < p < p_u(G)$ there are infinitely many infinite clusters a.s.
(3) For $p > p_u(G)$ there is exactly one infinite cluster a.s.
The monotonicity in $p$ of uniqueness of the infinite cluster, contained in this statement, was proved for unimodular graphs (see the definition in the next section) in [HP], and generalized to all transitive graphs in [Sch2]. On amenable transitive graphs it follows from the arguments of [BK] that $p_c(G) = p_u(G)$. On the other hand, there are a number of examples of non-amenable transitive graphs for which $p_c(G) < p_u(G)$ (see [GN, BS1, Lal1, PS-N, Lal3] and [BS3]). One of the most interesting conjectures about percolation on transitive graphs is therefore the following one, originally stated in [BS1].
Conjecture 1.1 (Benjamini and Schramm). Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then $p_c(G) < p_u(G)$ iff $G$ is non-amenable.
As for the critical behavior near and at $p_c(G)$, the arguments which lead to the conclusion that on $\mathbb{Z}^d$, for large $d$, the behavior is of mean-field nature (see, e.g., [Aiz2, AN, BA]) suggest the following conjecture.
Conjecture 1.2. Suppose that $G$ is an infinite, locally finite, connected transitive graph.
If $p_c(G) < p_u(G)$, then the behavior near and at $p_c(G)$ is of mean-field nature.
The following theorem is motivated by these two conjectures. The term “unimodular”, which appears there, is very technical and will only be defined in Subsect. 2.2. For the moment, we simply point out that most transitive graphs of interest are known to be unimodular. For instance, this is the case for all Cayley graphs of finitely generated infinite groups (see the definition, also in Subsect. 2.2). Various conjectures for transitive graphs have only been proved to hold for the unimodular ones. This is due mostly to the availability in this case of a technique called “mass transport” (see, e.g., [BLPS1, BLPS2, HPS, BLS, LyS, Per, Lyo5] and Subsect. 3.2 of the current paper).
Theorem 1.1. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > 1/\sqrt{2}$, then $p_c(G) < p_u(G)$ and the open triangle condition holds, implying that, if $G$ is unimodular, a.s. there is no infinite cluster at $p_c(G)$ and the critical exponents $\gamma$, $\beta$, $\delta$ and $\Delta$ exist and take their mean-field values.
Note. The conclusion that $p_c(G) < p_u(G)$ in Theorem 1.1 is not new. It is morally contained in [BS1], where an upper bound on $p_c(G)$ ((3.1) here) and a lower bound on $p_u(G)$ ((3.6) here) were obtained. When combined with an inequality available in [Moh1] ((2.4) here), these bounds imply $p_c(G) < p_u(G)$. This fact was realized independently
by I. Pak and T. Smirnova-Nagnibeda, who stated it (in a somewhat more restricted setting) as Proposition 1(1) in [PS-N]. Note that if Conjecture 1.1 turns out to be false, then the numerical constant defined by $\varphi = \inf\{\phi : i_E(G)/D(G) > \phi \Rightarrow p_c(G) < p_u(G)\}$ would be non-trivial, i.e., different from 0 (or 1). In this case, Conjecture 1.1 would be replaced with the question of obtaining the value of this constant. The triangle condition (to be defined in Subsect. 3.2) was proved to hold on certain Euclidean lattices in fundamental work by Hara and Slade (see, e.g., [HS]). Their work includes the cubic lattices $\mathbb{Z}^d$ with $d \ge 19$. It also includes modifications of $\mathbb{Z}^d$ with $d > 6$, obtained by adding edges between any two vertices which in the original graph are within a large fixed distance $L$ of each other (spread-out models). The only published result on the triangle condition being valid for a non-amenable graph that we are aware of is in [Wu1]. In that paper this is proved for a Cartesian product of $\mathbb{Z}$ and a homogeneous tree with large degree (the same graphs for which [GN] first showed the possibility of having $0 < p_c < p_u < 1$). Theorem 1.1 extends this result. Absence of percolation at $p_c$ has been proved for all the unimodular non-amenable transitive graphs in [BLPS1] and [BLPS2], but it remains unproved for general transitive graphs. It is also natural to ask when $p_u(G) < 1$. This issue will appear, for instance, in connection with our results on the Potts model, presented in the next subsection. A fundamental topological notion related to the conjectured answer to this question is that of the number of ends of a graph. This number is the supremum of the number of infinite connected components of subgraphs produced by removing finitely many edges from the original graph. It is known (see, e.g., Sect. 6 of [Moh2]) that infinite, locally finite, connected, transitive graphs can only have 1, 2 or infinitely many ends. It is easy to prove (see Sect.
8 of [HPS] and Subsect. 3.1 of the current paper) that if the number of ends of the transitive graph $G$ is 2, then $p_c(G) = p_u(G) = 1$, and that if the number of ends is infinite, then $p_c(G) < p_u(G) = 1$. In [BS1] it was asked whether transitive graphs with 1 end would have $p_u(G) < 1$. By now this is widely believed to be the case, and it was explicitly stated as a conjecture (in the non-amenable case) in [Lyo5].
Conjecture 1.3. Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then $p_u(G) < 1$ iff $G$ has 1 end.
The following classes of transitive graphs with one end have been proved to have $p_u(G) < 1$. In [BB] this was done for graphs which, in the terminology being introduced in the current paper, have the quasi-connected minimal cut sets property. (Roughly speaking, a graph has this property if any minimal set of edges which separates two arbitrarily chosen vertices cannot be split into two sets which are arbitrarily far apart. For a precise definition, see Subsect. 2.7.) In [HPS] this was done for graphs which are Cartesian products of infinite transitive graphs. (For the definition of a Cartesian product of graphs, see Subsect. 2.3.) In [Lal1, Lal3] and [BS3] this was done for planar graphs with one end. (Roughly speaking, a graph is planar if it can be embedded in $\mathbb{R}^2$ with vertices being represented by points and edges being represented by lines which connect the corresponding vertices and can only intersect at their end-points. For a precise definition, see Subsect. 2.8.) In [LyS] this was done for Cayley graphs of Kazhdan groups (see the definition in that paper).
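For homogeneous trees, which have infinitely many ends and hence $p_u = 1$, the value of $p_c$ can be read off from a one-line recursion. The sketch below is an illustration under stated assumptions, not a method from the paper, and its function name is hypothetical. It uses the rooted tree in which every vertex has exactly $d$ children; this differs from the $(d+1)$-regular tree only at the root, which does not affect whether an infinite cluster occurs. If $\theta_n$ is the probability that the root is joined by occupied edges to some vertex at distance $n$, then $\theta_0 = 1$ and $\theta_n = 1 - (1 - p\,\theta_{n-1})^d$; the limit is positive exactly when $p > 1/d$, the standard value $p_c = 1/d$ for these trees.

```python
def root_connection_probability(p, d, depth):
    """Probability that, in bond percolation with parameter p on the rooted tree
    in which every vertex has d children, the root is connected to distance `depth`."""
    theta = 1.0  # depth 0: the root is trivially connected to itself
    for _ in range(depth):
        # The root reaches distance n iff, for at least one child, the edge to that
        # child is occupied (prob. p) and the child reaches distance n - 1 in its own
        # subtree (prob. theta); the d children behave independently.
        theta = 1.0 - (1.0 - p * theta) ** d
    return theta


if __name__ == "__main__":
    d = 2  # every vertex has 2 children: the 3-regular tree, up to the root
    for p in (0.3, 0.45, 0.5, 0.55, 0.75):
        print(p, root_connection_probability(p, d, 10_000))
```

For $d = 2$ and $p = 3/4$ the fixed-point equation $\theta = 1 - (1 - 3\theta/4)^2$ gives $\theta = 8/9$, while for $p \le 1/2$ the iterates tend to 0 (slowly, like $4/n$, at $p = 1/2$).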
1.3. Potts and Ising models. For a general introduction to the statistical mechanics of lattice systems see for instance [Geo]. For an introduction to the Potts model see for instance [GHM]. For $q \in \{2, 3, \ldots\}$, the (ferromagnetic) $q$-Potts model on an infinite, locally finite, connected graph $G = (V, E)$ is the statistical mechanics model in which at each site $x$ there is a spin $\sigma_x$ which can take values in $\{1, \ldots, q\}$ and for which the formal Hamiltonian is given by
$$H(\sigma) = -\sum_{\{x,y\} \in E} \delta_{\sigma_x, \sigma_y},$$
where $\delta_{a,b} = 1$ if $a = b$ and $\delta_{a,b} = 0$ if $a \ne b$. The Ising model corresponds to the case $q = 2$. The inverse temperature parameter $\beta > 0$ is included in the definition of the Gibbs distributions, formally given by
$$\mu_\beta(\sigma) = \frac{\exp(-\beta H(\sigma))}{\text{Normalization}}.$$
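On a finite graph the formal expression above is an honest probability distribution, with the normalization equal to the partition function $Z = \sum_\sigma \exp(-\beta H(\sigma))$. The brute-force sketch below (finite graph, free boundary conditions, hypothetical function names, cost exponential in the number of vertices) only makes the definition concrete; it is not a construction of the infinite-volume Gibbs distributions discussed next. For a single edge, $Z = q e^{\beta} + q(q-1)$, which gives a simple check.

```python
import itertools
import math


def potts_energy(sigma, edges):
    """H(sigma) = -sum over edges {x, y} of delta(sigma_x, sigma_y)."""
    return -sum(1 for x, y in edges if sigma[x] == sigma[y])


def potts_gibbs(n_vertices, edges, q, beta):
    """Exact Gibbs distribution of the q-Potts model on a finite graph.

    Vertices are 0, ..., n_vertices - 1, spins take values 1, ..., q, and
    edges is a list of pairs (x, y). Returns (probabilities, Z).
    """
    configs = list(itertools.product(range(1, q + 1), repeat=n_vertices))
    weights = [math.exp(-beta * potts_energy(s, edges)) for s in configs]
    Z = sum(weights)  # the "Normalization" in the formula above
    return {s: w / Z for s, w in zip(configs, weights)}, Z


if __name__ == "__main__":
    # 4-cycle with q = 3: probability that spins 0 and 1 agree, for a few values of beta
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    for beta in (0.0, 1.0, 3.0):
        probs, Z = potts_gibbs(4, edges, 3, beta)
        agree = sum(p for s, p in probs.items() if s[0] == s[1])
        print(beta, agree)
```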
The set of all the Gibbs distributions (DLR states) for the $q$-Potts model on the infinite graph $G$ at inverse temperature $\beta$ will be denoted by $\mathcal{G}^{G,q}_\beta$. The set of its extremal elements will be denoted by $(\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}$. The set of automorphism invariant Gibbs distributions will be denoted by $\mathcal{G}^{G,q}_{\beta,A}$, and the set of its extremal elements will be denoted by $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
It is well known that $q$ Gibbs distributions $\mu^{G,q}_{1,\beta}, \ldots, \mu^{G,q}_{q,\beta}$ can be obtained by taking infinite volume limits using as boundary conditions the configurations in which all the spins take, respectively, the same value $1, \ldots, q$. These distributions, called the ordered phases, belong to $(\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}} \cap \mathcal{G}^{G,q}_{\beta,A}$ (and hence to $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$), as can be seen by constructing them using the Fortuin–Kasteleyn (FK) random cluster model, in the coupled fashion introduced in [ES], and using the FKG inequalities for the FK model. It is also well known that $|\mathcal{G}^{G,q}_\beta| = 1$ iff $\mu^{G,q}_{1,\beta} = \cdots = \mu^{G,q}_{q,\beta}$. Moreover, there is $\beta_c(G, q) \in [0, \infty]$ such that:
(1) For $\beta < \beta_c(G, q)$, $|\mathcal{G}^{G,q}_\beta| = 1$.
(2) For $\beta > \beta_c(G, q)$, the distributions $\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, are distinct (hence $|(\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}| \ge q$).
A further Gibbs distribution, $\mu^{G,q}_{f,\beta}$, can be obtained by taking infinite-volume limits using free boundary conditions. In this case, the construction using the FK model, together with the FKG inequalities for the FK model, gives that $\mu^{G,q}_{f,\beta} \in \mathcal{G}^{G,q}_{\beta,A}$, but not necessarily that it is an extremal element of this set of distributions. On $G = \mathbb{Z}^2$ it is known that for each $q$ and all $\beta > \beta_c(G, q)$,
$$(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \ldots, \mu^{G,q}_{q,\beta}\}, \tag{1.1}$$
so that in particular $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$. (This was proved for the Ising model in [MM-S] and extended to the other Potts models in [BCK].) In contrast, on homogeneous trees, $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ for all $\beta > 0$. (This was proved in Theorem 12.31 of [Geo].) In [NW] the Potts model was studied on the Cartesian product of $\mathbb{Z}$ and a homogeneous tree with large degree. The following contrasting results were proved, for each value of
$q$. For large $\beta$,
$$\mu^{G,q}_{f,\beta} = \sum_{i=1}^{q} \frac{1}{q}\, \mu^{G,q}_{i,\beta}, \tag{1.2}$$
so that in particular $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$. But for an interval of values of $\beta$ above $\beta_c(G, q)$, $\mu^{G,q}_{f,\beta}$ has decaying correlations, so that (1.2) must fail. Similar results were obtained in [Wu3] for other non-amenable (but in this case non-transitive) graphs. In [Jon] (Theorem 4.4(a)) it was shown that if $G$ is transitive and non-amenable, then for large $q$, (1.2) fails for values of $\beta$ in a non-degenerate interval. In contrast, the following can be proved, for instance by combining Lemma 4.3 of [Jon] with Theorem 4.2 of [Gri1] (adapted to transitive amenable graphs in Sect. 3.2 of [Jon]) and Corollary 4.5 of [Pfi]. (This theorem strengthens Theorem 4.4(b) in [Jon] in the transitive case.)
Theorem 1.2. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ is amenable, then for each $q$ and each value of $\beta$, (1.1) and (1.2) are equivalent. Moreover, these statements can fail for at most countably many values of $\beta$.
The following detailed conjecture contains natural extensions of the results above. It includes an analogue of Conjecture 1.1 for Potts models, and also proposes relations between the behavior of percolation and Potts models. (Part (a.1) is (1) above, included for comparison with the other statements. The fact that for transitive graphs, or more generally for bounded degree graphs, $\beta_c(G, q) > 0$ is also well known; it can be proved, e.g., using the techniques in Chapter 8 of [Geo]. Compare also (b.2) in the conjecture with the known fact that $\beta_c(G, q) < \infty$ iff $p_c(G) < 1$; for this see [Häg3].)
Conjecture 1.4. Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then for each $q$ there exist $0 < \beta_c(G, q) \le \bar\beta(G, q)$ such that:
(a.1) For $\beta < \beta_c(G, q)$, $|\mathcal{G}^{G,q}_\beta| = 1$.
(a.2) For $\beta_c(G, q) < \beta < \bar\beta(G, q)$, $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \ldots, \mu^{G,q}_{q,\beta}, \mu^{G,q}_{f,\beta}\}$ (so that $|(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}| = q + 1$).
(a.3) For $\beta > \bar\beta(G, q)$, $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \ldots, \mu^{G,q}_{q,\beta}\}$ (so that $|(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}| = q$).
Moreover:
(b.1) $\beta_c(G, q) < \bar\beta(G, q)$ iff $G$ is non-amenable.
(b.2) $\bar\beta(G, q) < \infty$ iff $p_u(G) < 1$.
To provide an idea of how open the conjecture above is, we make two observations. First, even in the widely studied case of the Ising model on $\mathbb{Z}^d$, only in dimension $d = 2$ is it known that $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \mu^{G,q}_{2,\beta}\}$ for all $\beta > \beta_c$. In higher dimensions this is known only up to the possible existence of a countable set of exceptional values of $\beta > \beta_c$, where it would fail (this was proved in [Leb]; it was extended to Potts models in [Pfi]). Second, in the case of the Ising model on a homogeneous tree it is easy to see
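that $\bar\beta = \infty$ (since $\mu^{G,q}_{f,\beta}$ has decaying correlations), but while it is known that $\mu^{G,q}_{1,\beta}$, $\mu^{G,q}_{2,\beta}$ and $\mu^{G,q}_{f,\beta}$ are three distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ for $\beta > \beta_c$, it is not known that these distributions exhaust this set.
We now state some results which can be seen as further contributions in support of Conjecture 1.4. We will use the following definitions:
$$\bar\beta_1(G, q) = \sup\left\{\beta \ge 0 : \mu^{G,q}_{f,\beta'} \in (\mathcal{G}^{G,q}_{\beta',A})_{\mathrm{ext}} \text{ for all } \beta' < \beta\right\},$$
$$\bar\beta_2(G, q) = \inf\left\{\beta \ge \beta_c(G, q) : \mu^{G,q}_{f,\beta'} = \sum_{i=1}^{q} \frac{1}{q}\, \mu^{G,q}_{i,\beta'} \text{ for all } \beta' > \beta\right\}.$$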
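The remark above that, for the Ising model on a homogeneous tree, $\mu^{G,q}_{f,\beta}$ has decaying correlations can be checked by hand in a finite setting. With the Potts normalization used here ($q = 2$, spins $s = \pm 1$, so that $\delta_{\sigma_x,\sigma_y} = (1 + s_x s_y)/2$), under free boundary conditions on a finite tree the edge products $s_x s_y$ are independent, each with mean $\tanh(\beta/2)$; hence two spins at distance $n$ have correlation $\tanh(\beta/2)^n$, which tends to 0 for every $\beta > 0$. The sketch below, a brute-force computation on a path (the simplest tree) with a hypothetical function name, compares the two. It is an illustration only, not part of the arguments of the paper.

```python
import itertools
import math


def free_path_correlation(n, beta):
    """E[s_0 s_n] for the q = 2 Potts (Ising) model on a path with n edges and
    free boundary conditions, summing over all 2^(n+1) configurations."""
    numerator = 0.0
    Z = 0.0
    for sigma in itertools.product((1, 2), repeat=n + 1):
        agreements = sum(sigma[i] == sigma[i + 1] for i in range(n))
        weight = math.exp(beta * agreements)  # exp(-beta * H), H = -(number of agreeing edges)
        s0 = 1 if sigma[0] == 1 else -1
        sn = 1 if sigma[n] == 1 else -1
        numerator += weight * s0 * sn
        Z += weight
    return numerator / Z


if __name__ == "__main__":
    beta = 2.0
    for n in (1, 2, 4, 8):
        print(n, free_path_correlation(n, beta), math.tanh(beta / 2) ** n)
```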
Note that clearly $\beta_c(G, q) \le \bar\beta_1(G, q) \le \bar\beta_2(G, q)$, and that if Conjecture 1.4 is true, then $\bar\beta_1(G, q) = \bar\beta_2(G, q)$ $(= \bar\beta(G, q))$.
Theorem 1.3. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > q/\sqrt{1 + q^2}$, then $\beta_c(G, q) < \bar\beta_1(G, q)$, so that for the $q$-Potts model on $G$ there exists a non-degenerate interval of values of $\beta$ on which $\mu^{G,q}_{1,\beta}, \ldots, \mu^{G,q}_{q,\beta}$ and $\mu^{G,q}_{f,\beta}$ are $q + 1$ distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
Recall that transitive graphs with more than one end have $p_u(G) = 1$.
Theorem 1.4. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ has more than one end, then for each $q$ and every $\beta > 0$ we have $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, so that $\bar\beta_1(G, q) = \infty$.
The next theorem shows, in particular, that under the extra assumption of unimodularity, the property that $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ is monotone decreasing in $\beta$, meaning that once it holds for some value of $\beta$, it holds also for smaller values of $\beta$.
Theorem 1.5. Suppose that $G$ is an infinite, locally finite, connected transitive unimodular graph. Then for each $q$:
(a) For each $\beta > 0$ either $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, or else $\mu^{G,q}_{f,\beta}$ is a mixture of exactly $q$ distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
(b) For all $\beta > \bar\beta_1(G, q)$, $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
(c) $\bar\beta_1(G, q) < \infty$ iff $p_u(G) < 1$.
Amenable transitive graphs are unimodular ([SW], Corollary 1), so that Theorem 1.5 applies to them. In this case Theorem 1.2 implies that $\beta_c(G, q) = \bar\beta_1(G, q)$. In contrast, Theorem 4.4(a) of [Jon] and Theorem 1.3 above provide sufficient conditions for $\beta_c(G, q) < \bar\beta_1(G, q)$. Combined, Theorems 1.3 and 1.6 below and their extensions in Sects. 4 and 5 (see Theorem 4.5 and Theorem 5.4) generalize the work of [NW] and [Wu3], and show that for a class of non-amenable graphs, for all $q$, $0 < \beta_c(G, q) < \bar\beta_1(G, q) \le \bar\beta_2(G, q) < \infty$.
Theorem 1.6. Suppose that $G$ is an infinite, locally finite, connected transitive non-amenable graph. Then if $p_u(G) < 1$, we have $\bar\beta_2(G, q) < \infty$ for each $q$.
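The sufficient conditions in Theorems 1.1, 1.3, 1.9 and 1.10 are explicit enough to evaluate on examples. The sketch below does this for the $k$-regular tree, using the standard value $i_E = k - 2$ (not derived in this section) and $D = k$; the function names are hypothetical. The threshold $q/\sqrt{1 + q^2}$ of Theorem 1.3 equals $2/\sqrt{5}$, the threshold of Theorem 1.9, when $q = 2$, and it evaluates to $1/\sqrt{2}$, the threshold of Theorems 1.1 and 1.10, at $q = 1$. Trees are of course covered by simpler arguments (Theorem 1.4 applies to them, for instance); the point is only to show how large the degree must be before these ratios are exceeded.

```python
import math


def isoperimetric_threshold(q):
    """Right-hand side q / sqrt(1 + q^2) of the condition i_E(G)/D(G) > ... in Theorem 1.3."""
    return q / math.sqrt(1 + q * q)


def regular_tree_ratio(k):
    """i_E(G)/D(G) for the k-regular tree, assuming the standard value i_E = k - 2."""
    return (k - 2) / k


def smallest_tree_degree(threshold):
    """Smallest k >= 3 with (k - 2)/k > threshold (requires threshold < 1)."""
    k = 3
    while regular_tree_ratio(k) <= threshold:
        k += 1
    return k


if __name__ == "__main__":
    print("1/sqrt(2) (Theorems 1.1, 1.10):", smallest_tree_degree(1 / math.sqrt(2)))
    for q in (2, 3, 4):
        print("q =", q, "(Theorem 1.3):", smallest_tree_degree(isoperimetric_threshold(q)))
```

Since $(k-2)/k > t$ is equivalent to $k > 2/(1 - t)$, the loop could be replaced by a closed formula; the loop avoids any rounding question at the boundary.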
The statement that $p_u(G) < 1$ in the next theorem is due to [BB] (see the proof of Corollary 10 there), where it is also proved that a large class of transitive graphs with one end have the quasi-connected minimal cut sets property.
Theorem 1.7. Suppose that $G$ is an infinite, locally finite, connected transitive graph with one end which satisfies the quasi-connected minimal cut sets property. Then $p_u(G) < 1$ and $\bar\beta_2(G, q) < \infty$ for each $q$.
Often the extra tool of duality available in the planar case allows faster progress, and this is the case with Conjecture 1.4. Our results in this case are summarized in the next theorem. Note that here, in addition to knowing that the property $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ is monotone decreasing in $\beta$, we know that (1.2) is monotone increasing in $\beta$ (meaning that once it holds for some value of $\beta$, it holds also for larger values of $\beta$). Moreover, we know in this case that at each $\beta$ one of these two properties must hold; informally, “the critical point $\bar\beta(G, q)$ $(= \bar\beta_1(G, q) = \bar\beta_2(G, q))$ is sharp”. The statement that $p_u(G) < 1$ in the next theorem is due to [BS3].
Theorem 1.8. Suppose that $G$ is an infinite, locally finite, connected transitive non-amenable planar graph with one end. Then $p_u(G) < 1$ and for each $q$:
(a) For each $\beta > 0$ either $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, or else (1.2) holds.
(b) $\bar\beta_1(G, q) = \bar\beta_2(G, q) < \infty$.
Note. Theorem 1.8 extends results obtained independently in [Wu4], a paper that was submitted before the current paper was completed, but of which we learned only after the current paper was submitted.
Theorems 1.4–1.8 relate the behavior of percolation and Potts models, and are similar in this respect to results in [Häg3]. But while [Häg3] is concerned with relations between the critical points $p_c(G)$ and $\beta_c(G, q)$, the theorems above relate $p_u(G)$ to the conjectured $\bar\beta(G, q)$, through its technical surrogates $\bar\beta_1(G, q)$ and $\bar\beta_2(G, q)$, and address item (b.2) of Conjecture 1.4.
It is important to stress that in our discussion above of Potts models on transitive graphs, we are concerned with the set $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, rather than with the set $(\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}$. In the case of the Ising model on a homogeneous tree, Conjecture 1.4 predicts a single critical point, $\beta_c$, separating two regimes in which $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ has, respectively, cardinality 1 and 3. It is nevertheless known (see [Hig, MoS, BRZ] and [Iof1]; for similar results on more general trees see [Iof2] and [EKPS]) that there is a second critical point $\tilde\beta > \beta_c$, at which the structure of the set $(\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}$ changes: $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}$ for $\beta < \tilde\beta$ and $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}$ for $\beta > \tilde\beta$. Also for the Ising model on $\mathbb{Z}^3$ a second critical point is believed to exist at which the structure of the set $(\mathcal{G}^{G,q}_\beta)_{\mathrm{ext}}$ changes (roughening transition). Related results can also be found in [SeS].
Conjecture 1.4 does not state anything about the behavior at the critical points. These are much more delicate questions, since even in the case of $\mathbb{Z}^2$ the behavior at $\beta_c$ depends on $q$. These questions may be very challenging in the case of amenable graphs, even at the non-rigorous level. Also for non-amenable graphs the behavior at $\bar\beta(G, q)$ may depend on $G$, since this is the case for the analogous problem (the behavior at $p_u(G)$) for percolation (see [Sch3, Per, Lal3, BS3]). Nevertheless, as for percolation, it seems reasonable to make the following conjecture.
Conjecture 1.5. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ is non-amenable, then for each $q$ the behavior near and at $\beta_c(G, q)$ is of mean-field nature. In particular, for $q = 2$ (Ising model) the transition is continuous, with $|\mathcal{G}^{G,q}_{\beta_c(G,q)}| = 1$, and the critical exponents take their mean-field values, while for $q = 3, 4, \ldots$ the transition is discontinuous, with $|(\mathcal{G}^{G,q}_{\beta_c(G,q),A})_{\mathrm{ext}}| = q + 1$.
For the mean-field behavior of the Potts model see, e.g., Sect. 1.C of [Wu5]. Most aspects of Conjecture 1.5 are known to hold for homogeneous trees; see, e.g., Sect. 4.8 of [Bax] for the computation of some critical exponents for the Ising model and [PLM] for the discontinuity of the transition for the Potts model. In [RNO] evidence from series expansions was found supporting mean-field values for the exponents $\beta$ and $\gamma$ for the Ising model on some non-amenable planar transitive graphs with one end. We have no contribution to the $q > 2$ case of this conjecture. But for $q = 2$ we have the following result.
Theorem 1.9. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > 2/\sqrt{5}$, then the Ising model bubble diagram is finite, implying that, if $G$ is unimodular, the Ising model on $G$ has a continuous phase transition, the critical exponents $\gamma$, $\beta$ and $\delta$ exist and take their mean-field values, and, if the critical exponent $\alpha$ exists, it is non-positive.
The bubble diagram (to be defined in Subsect. 4.2) is known to be finite on $\mathbb{Z}^d$ for $d > 4$ (see [Aiz1, Aiz2, AF]). The only published result on the bubble diagram being finite for a non-amenable graph that we are aware of is in [Wu1]. As for percolation, in that paper this is proved for a Cartesian product of $\mathbb{Z}$ and a homogeneous tree with large degree.
1.4. Contact process. For a general introduction to interacting particle systems and to the contact process see for instance [Lig1] and [Lig4]. On an infinite connected graph $G$ of bounded degree, one defines the contact process with infection parameter $\lambda > 0$ as a continuous-time Markov process in which sites can be in state 0 (healthy) or 1 (infected), with infected sites recovering at rate 1 and healthy sites being infected by each one of their infected neighbors, independently, at rate $\lambda$.
(The assumption that $G$ is of bounded degree ensures that the process is well defined and does not explode.) It is natural to define the following two critical points, $\lambda_s(G) \le \lambda_r(G)$, which provide the boundaries of three distinct phases described as follows. In the description below, in each case, we suppose that the process starts with only finitely many infected sites.
(1) For $\lambda < \lambda_s(G)$, the infection eventually disappears a.s. (One says that the contact process dies out.)
(2) For $\lambda_s(G) < \lambda < \lambda_r(G)$, the infection persists forever with positive probability, but for every finite set of vertices the infection eventually disappears from them a.s. (One says that the contact process survives globally but not locally. “s” stands for survival.)
(3) For $\lambda > \lambda_r(G)$, with positive probability the infection recurs to every vertex infinitely often. (One says that the contact process survives locally or recurs. “r” stands for recurrence.)
It is well known that, under the assumption introduced above that $G$ is of bounded degree, $\lambda_s(G) > 0$. Also $\lambda_r(G) \le \lambda_r(\mathbb{Z}_+) < \infty$. Much deeper results include the facts that on
cubic lattices $\lambda_s(G) = \lambda_r(G)$ (this was proved in [BG]), and that on homogeneous trees of degree at least three $\lambda_s(G) < \lambda_r(G)$ (this was proved in [Pem] and [Lig2] and the proof was greatly simplified in [Sta]). It is natural to conjecture the following.
Conjecture 1.6. Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then $\lambda_s(G) < \lambda_r(G)$ iff $G$ is non-amenable.
Both directions of this conjecture are open problems. The assumption that $G$ is transitive is crucial in Conjecture 1.6. For instance, a homogeneous tree with an infinite linear chain glued to it by one edge is amenable, but has $\lambda_s(G) < \lambda_r(G)$. It is much harder to find examples of non-amenable graphs which have $\lambda_s(G) = \lambda_r(G)$, but this was done in [PS], where a spherically symmetric tree with this property was presented. Nevertheless, in Sect. 7 we will show that even without assuming transitivity, a large value of $i_E(G)$ compared to an appropriate function of the maximum and the minimum degrees of $G$ implies a separation between $\lambda_s(G)$ and $\lambda_r(G)$. In the transitive case the result is contained in Theorem 1.10 below. Regarding criticality, we state next a conjecture analogous to the ones for percolation and for the Potts models.
Conjecture 1.7. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ is non-amenable, then the behavior of the contact process near and at $\lambda_s(G)$ is of mean-field nature.
The following theorem is motivated by the two conjectures above.
Theorem 1.10. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > 1/\sqrt{2}$, then $\lambda_s(G) < \lambda_r(G)$ and the contact process open triangle condition holds, implying that, if $G$ is unimodular, the contact process on $G$ dies out at $\lambda_s(G)$ and the critical exponents $\gamma$, $\beta$ and $\delta$ exist and take their mean-field values. As far as we know, the contact process open triangle condition (to be defined in Subsect.
7.2) has not been proved to hold for any amenable graph (but for the related oriented percolation process, this was done in the case of Zd with large d in [NY]). The contact process open triangle condition was proved to hold for homogeneous trees of large degree in [Wu2] and this result was extended to all the homogeneous trees with degree at least 3 in [Sch1].
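As a quick numerical illustration of how demanding the hypothesis of Theorem 1.10 is, the following minimal Python sketch tests it on the homogeneous trees $T_b$. It assumes the standard values $i_E(T_b) = b - 1$ and $D(T_b) = b + 1$ ($i_E$ is defined in Subsect. 2.4); the function name is ours. Algebraically, $(b-1)/(b+1) > 1/\sqrt{2}$ is equivalent to $b > (\sqrt{2}+1)/(\sqrt{2}-1) = 3 + 2\sqrt{2} \approx 5.83$, so the criterion holds exactly for $b \ge 6$. (For trees, λs < λr is already known for all b ≥ 2 by [Pem], [Lig2] and [Sta]; this only shows the reach of the criterion.)

```python
import math


def satisfies_theorem_1_10_bound(b: int) -> bool:
    """Check i_E(T_b) / D(T_b) > 1/sqrt(2) for the homogeneous tree T_b.

    Assumes the standard values i_E(T_b) = b - 1 and D(T_b) = b + 1.
    """
    i_e = b - 1
    degree = b + 1
    return i_e / degree > 1 / math.sqrt(2)


# Smallest branching number b for which the hypothesis of Theorem 1.10 holds.
smallest_b = next(b for b in range(1, 100) if satisfies_theorem_1_10_bound(b))
print(smallest_b)  # 6
```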
1.5. Organization of the paper. In Sect. 2 basic facts about graphs are introduced and reviewed, along with examples. In Sect. 3 independent bond percolation is discussed in more detail, and an extension of Theorem 1.1 (Theorem 3.2) is proved. In Sect. 4 Potts and Ising models are discussed in more detail, and their relations to Fortuin–Kasteleyn random cluster models are reviewed and extended; Theorem 1.4, an extension of Theorem 1.5 (Theorem 4.3), Theorem 1.3 and Theorem 1.9 are proved there, along with related results. In Sect. 5 the finite island property of [NW] is introduced and used to prove an extension of Theorem 1.6 (Theorem 5.4), Theorem 1.7 and Theorem 1.8. The methods and the discussion in Subsect. 5.2 are of independent interest in connection with the study of independent bond percolation on non-amenable, not necessarily transitive, graphs.
Multiplicity of Phase Transitions
Section 6 contains further results, related to those in Sections 3, 4 and 5. First, some results on the Fortuin–Kasteleyn random cluster models, which are by-products of our study of the Potts model, are presented. Second, the analogues for independent site percolation of various results on independent bond percolation are stated, with an explanation of how to adapt some of the proofs to this case. In Sect. 7 the contact process is discussed in more detail, and Theorem 1.10 and related results are proved.

2. Background on Infinite Graphs

2.1. Basics. A graph is a pair G = (V, E), where V is an arbitrary set and E is a subset of the set {{x, y} : x, y ∈ V}. V is called the set of vertices (or sites) of G, and E is called the set of edges (or bonds) of G. Given an edge e = {x, y}, the vertices x and y are called its end-points; e is said to be incident to x and y, and x and y are said to be incident to e. Vertices will be said to be neighbors or adjacent if they belong to a common edge. Edges will be said to be neighbors or adjacent if they share a common vertex. The degree of a vertex x ∈ V is the number, $d_x$, of edges incident to it. All the graphs considered in this paper will be locally finite, i.e., the degree of each vertex will be finite. Set $D(G) = \sup\{d_x : x \in V\}$. If D(G) < ∞, the graph G is said to be of bounded degree, and D(G) is its maximal degree. A finite chain is a sequence $x_0, x_1, \ldots, x_n$ of distinct sites in which, for each i < n, $x_i$ is a neighbor of $x_{i+1}$. The length of this chain is n. The sites $x_0$ and $x_n$ are its end-points. We also express this by saying that this chain connects $x_0$ to $x_n$. An infinite chain is a sequence $x_0, x_1, \ldots$ of distinct sites in which, for each i, $x_i$ is a neighbor of $x_{i+1}$. The head of the chain is the vertex $x_0$. (In an abuse of terminology we will sometimes think of chains as unordered sets of vertices.) A finite path is a sequence $e_1, e_2, \ldots, e_n$ of edges in which, for each i < n, $e_i$ is a neighbor of $e_{i+1}$.
An infinite path is a sequence $e_1, e_2, \ldots$ of distinct edges in which, for each i, $e_i$ is a neighbor of $e_{i+1}$. Such a path is said to be edge-self-avoiding if $e_i \neq e_j$ for $i \neq j$. It is said to be vertex-self-avoiding if it is edge-self-avoiding and $e_i \cap e_j = \emptyset$ when $|i - j| > 1$. Two vertices of a graph are said to belong to the same connected component of the graph if there is a finite chain which has them as end-points. This is an equivalence relation, and its equivalence classes, the connected components, partition the vertex set of the graph. A graph is said to be connected if it has a single connected component. A tree is a graph such that for each pair of vertices x, y ∈ V there is a unique chain which connects them. The distance dist(y, z) between sites y and z is the minimal length of the chains which have y and z as end-points. The distance between a set of sites R and a set of sites S is the minimal distance between sites in R and sites in S; in an abuse of notation it will be denoted by dist(R, S). A site of G will be singled out and called its root, denoted by r. The ball of center x ∈ V and radius N is the set B(x, N) = {y ∈ V : dist(x, y) ≤ N}.
We will use the abbreviation B(r, N) = B(N). The edge boundary of a set S ⊂ V is $\partial_E S = \{\{x, y\} \in E : x \in S,\ y \in V \setminus S\}$, and its vertex boundary is $\partial_V S = \{y \in V \setminus S : \{x, y\} \in \partial_E S \text{ for some } x \in S\}$. It is easy to see that for finite S,

$$\frac{|\partial_E S|}{D(G)} \le |\partial_V S| \le |\partial_E S|, \tag{2.1}$$
where the left-hand-side expression is interpreted as 0 if D(G) = ∞. (Indeed, each edge of $\partial_E S$ has exactly one end-point in $\partial_V S$, and each vertex of $\partial_V S$ is an end-point of at least one and at most D(G) edges of $\partial_E S$.) If G is a tree and S is connected, then $|\partial_E S| = |\partial_V S|$. The expression $S \Subset V$ will mean that S is a finite subset of V. In a common abuse of notation we will sometimes denote a subset of V which contains a single element by the name of this element.

2.2. Transitive graphs. An automorphism of G is a one-to-one map φ from V onto V which preserves the graph structure, i.e., such that E = {{φ(x), φ(y)} : {x, y} ∈ E}. The set of all the automorphisms of G will be denoted by Aut(G). A graph is said to be transitive (or vertex-transitive or homogeneous) if for each pair x and y of its vertices there is an automorphism of the graph which maps x to y. Intuitively, a graph is transitive if all its vertices play the same role. A graph is said to be quasi-transitive if there is a finite set of vertices, $V_0$, with the property that each vertex of the graph can be mapped into one of the vertices of $V_0$ by an automorphism. Intuitively, a graph is quasi-transitive if there is a finite number of types of vertices, and vertices of the same type play the same role. Typically, results on transitive graphs can be extended to quasi-transitive graphs with minor modifications. This is the case for the results in this paper. An important class of transitive graphs is that of the Cayley graphs of finitely generated groups. (Here and in Subsect. 2.3 we will use a number of terms which come from the basic theory of infinite groups. Readers not familiar with them may consult any standard text, e.g., [Hun]. No results from group theory will be used in the proofs contained in this paper.) Suppose that V is such a group and that S is a finite symmetric set of generators for it. Then the (right) Cayley graph of V for this set of generators is the graph G = (V, E) which has E = {{x, y} : x, y ∈ V, y = xz for some z ∈ S}. As we will see in Subsect. 2.3 below, the class of Cayley graphs of finitely generated groups includes a large number of examples of transitive graphs, among them most of those of greater interest. Nevertheless, this class of graphs is known not to exhaust the class of transitive graphs (see [CK, BLPS1] or [Lyo5]). Moreover, there is a class of graphs, known as transitive unimodular graphs, which is a proper subclass of the class of transitive graphs and properly contains the class of Cayley graphs of finitely generated groups. To define this class of graphs, we first define the stabilizer of a site x ∈ V as S(x) = {γ ∈ Aut(G) : γ(x) = x}. A transitive graph G is unimodular if for each x, y ∈ V, |{γ(y) : γ ∈ S(x)}| = |{γ(x) : γ ∈ S(y)}|.
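The definitions of balls, boundaries and Cayley graphs can be made concrete with a small computation. The following Python sketch (an illustration, not part of the paper's arguments) builds the right Cayley graph of $\mathbb{Z}^2$ for the symmetric generating set $\{\pm e_1, \pm e_2\}$, computes balls B(N) by breadth-first search, and checks inequality (2.1) on them. With these generators one finds $|B(N)| = 2N^2 + 2N + 1$, $|\partial_E B(N)| = 8N + 4$ and $|\partial_V B(N)| = 4N + 4$. The helper names are hypothetical.

```python
from collections import deque

# Symmetric generating set {±e1, ±e2} of Z^2; the group operation is addition.
GENERATORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
D = len(GENERATORS)  # every vertex has degree D(G) = 4


def neighbors(x):
    """Right Cayley graph: y is adjacent to x iff y = x + z for some generator z."""
    return [(x[0] + z[0], x[1] + z[1]) for z in GENERATORS]


def ball(center, radius):
    """B(center, radius), computed by breadth-first search on graph distance."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        x = queue.popleft()
        if dist[x] == radius:
            continue
        for y in neighbors(x):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def edge_boundary(s):
    """All edges {x, y} with x in S and y outside S."""
    return {frozenset((x, y)) for x in s for y in neighbors(x) if y not in s}


def vertex_boundary(s):
    """All vertices outside S that are adjacent to some vertex of S."""
    return {y for x in s for y in neighbors(x) if y not in s}


rows = []
for n in range(6):
    s = ball((0, 0), n)
    e, v = len(edge_boundary(s)), len(vertex_boundary(s))
    assert e / D <= v <= e  # inequality (2.1)
    rows.append((n, len(s), e, v))
print(rows)
```

Breadth-first search is used because, in a graph, the distance of the definition of B(x, N) is exactly the BFS layer index.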
2.3. Examples. The basic examples of graphs of interest in statistical mechanics and related areas are the cubic lattices $\mathbb{Z}^d$, d ≥ 1, and other graphs which can be embedded in a periodic fashion in $\mathbb{R}^d$, for some d ≥ 1 (see [Kes] for precise definitions). The cubic lattice $\mathbb{Z}^d$ is a Cayley graph of the free Abelian group of rank d. The next most important class of examples is that of the homogeneous trees, i.e., trees which are transitive. Clearly there is exactly one homogeneous tree with degree D for each D = 0, 1, 2, …, and it is infinite only when D ≥ 2. We will denote by $T_b$ the homogeneous tree of degree D = b + 1. (This somewhat standard notation is motivated by the fact that the index b is the branching number of the tree, as defined for instance in [Lyo1].) Note that $T_1 = \mathbb{Z}$. When D = b + 1 is even, $T_b$ is a Cayley graph of the free group with D/2 free generators. When D = b + 1 is odd, $T_b$ is a Cayley graph of the group with b/2 free generators and one generator which is identical to its inverse. Discrete groups of isometries of the hyperbolic spaces $\mathbb{H}^d$, d ≥ 2 (Fuchsian groups in the d = 2 case), define an important class of Cayley graphs (see [SeS, Lal1, Lal2] and [BS3]). The vertices of such graphs can be thought of as the tiles of a tessellation of $\mathbb{H}^d$, with edges connecting vertices which correspond to tiles whose boundaries intersect in a (d − 1)-dimensional surface. Such examples may be seen as “crystals in a non-Euclidean space” (their Euclidean counterparts are Cayley graphs of discrete groups of isometries of Euclidean space, including $\mathbb{Z}^d$ and the triangular and hexagonal lattices). New graphs can be obtained from old graphs by means of various operations. One important example of such an operation is the Cartesian product.
Given a pair of graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, we will let $G_1 \times G_2$ denote the graph which has as vertices the elements of the Cartesian product $V_1 \times V_2$, and in which an edge connects (x, u) to (y, v) iff either x = y and $\{u, v\} \in E_2$, or $\{x, y\} \in E_1$ and u = v. In the special case in which $G_1$ and $G_2$ are Cayley graphs, the operation of taking their Cartesian product corresponds to the group-theoretic notion of taking a direct product of the groups, so the resulting graph is again a Cayley graph. Given two finitely generated groups $G_1$ and $G_2$, one can also take their free product, $G_1 * G_2$. Informally, this free product corresponds to the group generated by the collection of all the generators of $G_1$ and $G_2$, keeping all the relations among them as in $G_1$ and $G_2$, but not adding any new relation. For a formal definition, see [Hun]. Cayley graphs of free products of finitely generated groups are an important source of counterexamples for statements about transitive graphs. Next we introduce two somewhat special examples of non-transitive graphs, which will be used to clarify several issues in the remarks in this paper. T! is the tree in which the vertices at distance n from the root have n + 1 neighbors. Note that the number of vertices at distance n ≥ 1 from the root is (n − 1)!. Given a number a ∈ {2, 3, …}, the graph $\mathrm{Br}(a)$ (for “bridges”) will be defined as follows. For each odd positive integer n, take a disjoint copy, $G_n = (V_n, E_n)$, of the complete graph with $a^{(n+1)/2}$ vertices (i.e., $E_n$ contains all pairs of distinct vertices in $V_n$). For each even non-negative integer n take a distinct vertex $x_n$, distinct also from all the vertices in $V_1, V_3, \ldots$. The set of vertices of $\mathrm{Br}(a)$ is $V = \{x_0, x_2, \ldots\} \cup V_1 \cup V_3 \cup \cdots$, and its set of edges E is obtained by taking all the edges in each set $E_n$, n = 1, 3, …, and adding an edge between $x_n$ and each vertex in $V_{n-1}$ and $V_{n+1}$, for n = 2, 4, …, and an edge between $x_0$ and each vertex in $V_1$.
x0 will be taken as the root of Br(a). Note that the set of vertices at distance n from the root is {xn } for n even and is Vn for n odd.
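The sphere sizes of the trees above can be checked by brute force. The Python sketch below (illustrative; the function names are ours) enumerates reduced words in the free group on m generators. Its Cayley graph for the generators and their inverses is $T_{2m-1}$, and the word length of a reduced word is its distance to the identity, so the number of reduced words of length n ≥ 1 should be $2m(2m-1)^{n-1}$. It also grows T! generation by generation, recovering the count (n − 1)! of vertices at distance n ≥ 1 from the root.

```python
from math import factorial


def reduced_words(m, length):
    """All reduced words of the given length in the free group on m generators.

    A letter is a pair (generator index, +1 or -1); a word is reduced when no
    letter is immediately followed by its inverse.
    """
    letters = [(i, s) for i in range(m) for s in (1, -1)]
    words = [()]
    for _ in range(length):
        words = [w + (a,) for w in words for a in letters
                 if not w or w[-1] != (a[0], -a[1])]
    return words


def factorial_tree_sphere_sizes(max_n):
    """Number of vertices at distance 0, 1, ..., max_n from the root of T!.

    A vertex at distance n has n + 1 neighbours. For n >= 1 one of them is its
    parent, so it has n children; the root (n = 0) has 1 child.
    """
    sizes = [1]
    for n in range(max_n):
        children = n if n >= 1 else 1
        sizes.append(sizes[-1] * children)
    return sizes


print([len(reduced_words(2, n)) for n in range(5)])  # spheres of T_3: [1, 4, 12, 36, 108]
print(factorial_tree_sphere_sizes(6))                # spheres of T!: [1, 1, 1, 2, 6, 24, 120]
```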
2.4. Isoperimetric constants. Given an infinite, locally finite, connected graph G = (V, E), its edge-isoperimetric constant is defined as

$$i_E(G) = \inf\left\{ \frac{|\partial_E S|}{|S|} : \emptyset \neq S \Subset V \right\}.$$

Its vertex-isoperimetric constant is defined as

$$i_V(G) = \inf\left\{ \frac{|\partial_V S|}{|S|} : \emptyset \neq S \Subset V \right\}.$$

Its anchored edge-isoperimetric constant (introduced in [BLS]) is defined as

$$i_E^*(G) = \lim_{n\to\infty} \inf\left\{ \frac{|\partial_E S|}{|S|} : r \in S \Subset V,\ S \text{ connected},\ |S| \ge n \right\}.$$

Its anchored vertex-isoperimetric constant is defined as

$$i_V^*(G) = \lim_{n\to\infty} \inf\left\{ \frac{|\partial_V S|}{|S|} : r \in S \Subset V,\ S \text{ connected},\ |S| \ge n \right\}.$$

Note that none of the isoperimetric constants defined above depends on the choice of the root r. Clearly $i_E(G) \le i_E^*(G)$ and $i_V(G) \le i_V^*(G)$. And from (2.1) we have

$$\frac{i_E(G)}{D(G)} \le i_V(G) \le i_E(G), \qquad \frac{i_E^*(G)}{D(G)} \le i_V^*(G) \le i_E^*(G),$$

so that for graphs of bounded degree, $i_E(G) > 0$ iff $i_V(G) > 0$, and $i_E^*(G) > 0$ iff $i_V^*(G) > 0$.
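For homogeneous trees these constants can be computed by counting. If S is a finite connected set of vertices of $T_b$, it spans a subtree with |S| − 1 edges, and since every vertex has degree b + 1, $|\partial_E S| = (b+1)|S| - 2(|S| - 1) = (b-1)|S| + 2$. The ratio $|\partial_E S|/|S|$ therefore decreases to b − 1 along large connected sets. Disconnected sets do no better, because the edge boundary of a union of non-adjacent connected pieces is the disjoint union of their edge boundaries; this gives the standard values $i_E(T_b) = i_E^*(T_b) = b - 1$. The Python sketch below (illustrative; the vertex encoding and names are ours) checks the counting identity on randomly grown connected sets containing the root.

```python
import random


def tree_neighbors(v, b):
    """Neighbours of v in T_b; a vertex is encoded by its path from the root."""
    n_children = b + 1 if not v else b
    nbrs = [v + (j,) for j in range(n_children)]
    if v:
        nbrs.append(v[:-1])  # the parent
    return nbrs


def random_connected_set(b, size, rng):
    """Grow a connected set containing the root by repeatedly adding a boundary vertex."""
    s = {()}
    while len(s) < size:
        outside = sorted({y for x in s for y in tree_neighbors(x, b) if y not in s})
        s.add(rng.choice(outside))
    return s


def edge_boundary_size(s, b):
    """|∂_E S|: each boundary edge is counted once, from its end-point in S."""
    return sum(1 for x in s for y in tree_neighbors(x, b) if y not in s)


rng = random.Random(0)
for b in (1, 2, 3):
    for size in (1, 5, 40):
        s = random_connected_set(b, size, rng)
        assert edge_boundary_size(s, b) == (b - 1) * size + 2
```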
The same is true for trees, since for them $i_E(G) = i_V(G)$ and $i_E^*(G) = i_V^*(G)$. While even for trees of bounded degree one can have $i_E(G) = 0$ and $i_E^*(G) > 0$, or $i_V(G) = 0$ and $i_V^*(G) > 0$ (see [BLS]), this is not possible if G is transitive (see Proposition 7.4 of [HSS]). The graphs $\mathrm{Br}(a)$, a ≥ 2, can be used to show that $i_E(G) > 0$ does not imply $i_V(G) > 0$ (nor even $i_V^*(G) > 0$). It is easy to see that $i_V^*(\mathrm{Br}(a)) = 0$ by considering the sets $S_k = \{x \in V : \operatorname{dist}(r, x) \le 2k + 1\}$, k = 0, 1, 2, …: for these sets $|S_k| \to \infty$ as k → ∞, while $\partial_V S_k = \{x_{2k+2}\}$, so $|\partial_V S_k| = 1$ for all k. Now we argue why $i_E(\mathrm{Br}(a)) > 0$ for each a ∈ {2, 3, …}. Given a finite nonempty set S of vertices of $\mathrm{Br}(a)$, let depth(S) be the maximal distance to the root of the vertices in S. There are two cases to consider. First, if depth(S) = 2k for some integer k ≥ 0, then $x_{2k} \in S$ and $\partial_E S$ contains all the edges between $x_{2k}$ and vertices in $V_{2k+1}$, so that $|\partial_E S| \ge |V_{2k+1}| = a^{k+1}$. On the other hand,

$$|S| \le |\{x \in V : \operatorname{dist}(r, x) \le 2k\}| = k + 1 + a + a^2 + \cdots + a^k = k + \frac{a^{k+1} - 1}{a - 1} \le k + a^{k+1}.$$

Second, if depth(S) = 2k − 1 for some integer k ≥ 1, then S contains some vertex $y \in V_{2k-1}$, and $\partial_E S$ contains, for each $x \in V_{2k-1}$, either the edge between x and y (in case x ∉ S) or the edge between x and the vertex $x_{2k}$ (in case x ∈ S), so that $|\partial_E S| \ge |V_{2k-1}| = a^k$. On the other hand, $|S| \le |\{x \in V : \operatorname{dist}(r, x) \le 2k - 1\}| \le |\{x \in V : \operatorname{dist}(r, x) \le 2k\}| \le k + a^{k+1}$, as above. So

$$i_E(\mathrm{Br}(a)) \ge \inf_{k \ge 1} \frac{a^k}{2k + a^{k+1}} = \frac{a}{a^2 + 2} > 0,$$

where the infimum is attained at k = 1, since the quotient equals $1/(a + 2k/a^k)$ and $2k/a^k$ is non-increasing in k.
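The facts about $\mathrm{Br}(a)$ used above are easy to check numerically on a truncation of the graph. The following Python sketch (illustrative; the vertex encoding and function names are ours) builds $\mathrm{Br}(a)$ up to a given depth, verifies $\partial_V S_k = \{x_{2k+2}\}$, $|\partial_E S_k| = a^{k+1}$ and the size of $S_k$ for the balls of radius 2k + 1, and evaluates the lower bound $\inf_{k\ge1} a^k/(2k + a^{k+1})$.

```python
from collections import defaultdict


def build_bridges(a, max_depth):
    """Adjacency sets of Br(a), truncated to vertices at distance <= max_depth from x_0.

    Vertices are ("x", n) for even n and ("v", n, i) for odd n, where
    0 <= i < a**((n + 1) // 2) indexes the complete graph G_n.
    """
    adj = defaultdict(set)

    def link(u, w):
        adj[u].add(w)
        adj[w].add(u)

    def block(n):
        return [("v", n, i) for i in range(a ** ((n + 1) // 2))]

    for n in range(1, max_depth + 1, 2):          # complete graphs G_n
        vs = block(n)
        for i, u in enumerate(vs):
            for w in vs[i + 1:]:
                link(u, w)
    for n in range(0, max_depth + 1, 2):          # the bridge vertices x_n
        for m in (n - 1, n + 1):
            if 1 <= m <= max_depth:
                for u in block(m):
                    link(("x", n), u)
    return adj


def ball(adj, root, radius):
    """Vertices within graph distance `radius` of `root`."""
    seen, frontier = {root}, {root}
    for _ in range(radius):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen


a, K = 2, 4
adj = build_bridges(a, 2 * K + 2)
for k in range(K):
    s = ball(adj, ("x", 0), 2 * k + 1)
    vertex_bd = {w for u in s for w in adj[u] if w not in s}
    edge_bd = sum(1 for u in s for w in adj[u] if w not in s)
    assert vertex_bd == {("x", 2 * k + 2)}
    assert edge_bd == a ** (k + 1)
    assert len(s) == (k + 1) + sum(a ** j for j in range(1, k + 2))

bound = min(a ** k / (2 * k + a ** (k + 1)) for k in range(1, 60))
print(bound, a / (a * a + 2))
```

The truncation depth 2K + 2 is chosen so that every vertex of the balls examined has all its neighbours present.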
2.5. Spectral radius. For more on the topic of this subsection, see, e.g., [MW] and references therein. Given a locally finite connected graph G = (V, E), let $A_G$ stand for its adjacency matrix, i.e., $A_G(x, y)$ takes the value 1 or 0 according to whether the vertices x and y are neighbors in G or not. Let $A_G^n$ denote the nth power of $A_G$, and note that $A_G^n(x, y)$ is the number of paths of length n which connect x to y. It is easy to see that

$$R(G) = \limsup_{n\to\infty} \big(A_G^n(x, y)\big)^{1/n}$$

does not depend on x and y. The quantity R(G) is the spectral radius associated to the matrix $A_G$. A graph G = (V, E) is said to be bipartite if V is the disjoint union of two sets, $V_1$ and $V_2$, and each edge in E contains one element of $V_1$ and one element of $V_2$ (e.g., the graphs $\mathbb{Z}^d$ are bipartite). Set bip(G) = 2 if G is bipartite and bip(G) = 1 otherwise. For x, y ∈ V, set $\mathrm{odd}_G(x, y) = 1$ if dist(x, y) is odd and $\mathrm{odd}_G(x, y) = 0$ if dist(x, y) is even. From Theorem 4.6 in [MW], we have

$$A_G^n(x, y) \le (R(G))^n \quad \text{for all } x, y \in V,\ n = 1, 2, \ldots. \tag{2.2}$$

For $n(k) = \mathrm{bip}(G)\,k + \mathrm{odd}_G(x, y)$,

$$\lim_{k\to\infty} \big(A_G^{n(k)}(x, y)\big)^{1/n(k)} = R(G) \quad \text{for all } x, y \in V. \tag{2.3}$$
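The role of the parity correction n(k) in (2.3) is easy to see on G = $\mathbb{Z}$, which is bipartite with R($\mathbb{Z}$) = 2 (a standard fact). Here $A^n(0, y)$ is the binomial coefficient $\binom{n}{(n+y)/2}$ when n + y is even and 0 otherwise, so closed walks of odd length do not exist, while along $n(k) = 2k + \mathrm{odd}(0, y)$ the counts grow like $2^n$ up to a polynomial factor. The Python sketch below (illustrative; names are ours) counts walks by dynamic programming and compares them with (2.2) and (2.3).

```python
from math import comb, exp, log


def walk_counts(n):
    """A^n(0, y) on Z for every y, i.e. the number of length-n walks from 0 to y."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for y, c in counts.items():
            for z in (y - 1, y + 1):
                nxt[z] = nxt.get(z, 0) + c
        counts = nxt
    return counts


R = 2  # spectral radius of the adjacency matrix of Z
k = 200
for y in (0, 1):
    n = 2 * k + (y % 2)          # n(k) = bip(Z) k + odd_Z(0, y), with bip(Z) = 2
    a_n = walk_counts(n)[y]
    assert a_n == comb(n, (n + y) // 2)
    assert a_n <= R ** n                      # inequality (2.2)
    print(y, exp(log(a_n) / n))               # close to R = 2, cf. (2.3)
print(walk_counts(401).get(0, 0))             # 0: no closed walks of odd length
```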
It is clear that R(G) ≤ D(G), and it is important to know when this inequality is strict. For transitive graphs it is known that R(G) < D(G) is equivalent to $i_E(G) > 0$. Quantitative versions of this statement are available, and the following one, valid for all infinite, connected, bounded degree graphs, will play a central role in this paper:

$$(i_E(G))^2 + (R(G))^2 \le (D(G))^2. \tag{2.4}$$
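Inequality (2.4) can be checked by hand on the homogeneous trees, using the standard values $i_E(T_b) = b - 1$ and $R(T_b) = 2\sqrt{b}$ together with $D(T_b) = b + 1$: indeed $(b-1)^2 + 4b = (b+1)^2$, so equality holds. The Python sketch below (illustrative; names are ours) also counts closed walks at the root of $T_b$, tracking only the distance from the root, and checks numerically that $(A^n(r, r))^{1/n}$ stays below $2\sqrt{b}$, as (2.2) requires, while approaching it.

```python
from math import exp, log, sqrt


def closed_walks_tree(b, n):
    """A^n(r, r) on T_b, computed by tracking only the distance from the root r."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for d, c in counts.items():
            # From the root all b + 1 neighbours are one step farther away;
            # from any other vertex one neighbour is closer and b are farther.
            moves = [(1, b + 1)] if d == 0 else [(d - 1, 1), (d + 1, b)]
            for e, mult in moves:
                nxt[e] = nxt.get(e, 0) + c * mult
        counts = nxt
    return counts.get(0, 0)


ratios = {}
for b in (2, 3, 5):
    i_e, spectral_radius, degree = b - 1, 2 * sqrt(b), b + 1
    assert i_e ** 2 + 4 * b == degree ** 2      # equality case of (2.4)
    n = 400
    root_n = exp(log(closed_walks_tree(b, n)) / n)
    ratios[b] = root_n / spectral_radius
print(ratios)
```

The convergence to $2\sqrt{b}$ is slow, because the closed-walk counts carry a polynomial correction factor.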
This inequality (which saturates for homogeneous trees) was derived originally in [Moh1], Theorem 2.1(a). It is sometimes referred to as a “Cheeger-type inequality”.

2.6. Number of ends. Given $S \Subset V$, the graph G \ S is the graph obtained from the graph G by removing the vertices which belong to S and the edges incident to these vertices. The number of ends of the graph G is

$$E(G) = \sup_{S \Subset V} \{\text{number of infinite connected components of } G \setminus S\}.$$
Any Cartesian product of infinite graphs can easily be seen to have a single end. Any Cayley graph of a free product of two non-trivial finitely generated groups, other than $\mathbb{Z}_2 * \mathbb{Z}_2$ (which has two ends), can easily be seen to have infinitely many ends. It is known (see Sect. 6 of [Moh2]) that for transitive graphs the number of ends can only be 1 (e.g., $\mathbb{Z}^d$, d ≥ 2), 2 (e.g., $\mathbb{Z}$), or ∞ (e.g., $T_b$, b ≥ 2). Moreover, when the number of ends is 2 the graph is amenable, and when the number of ends is infinite the graph is non-amenable. The following proposition will be used in this paper; it can easily be proved with the arguments in the proofs of Propositions 6.1 and 6.2 in [Moh2].

Proposition 2.1. Suppose that G is an infinite, locally finite, connected transitive graph. If G has more than one end, then there are a positive integer n and vertices $\ldots, x_{-1}, x_0, x_1, \ldots$ such that the balls $B(x_k, n)$, k ∈ $\mathbb{Z}$, have the following property: for each i < j, any chain from $B(x_i, n)$ to $B(x_j, n)$ intersects each $B(x_k, n)$, k = i + 1, …, j − 1. If G has two ends, then the choices above can be made so that there is also l < ∞ such that any x ∈ V is within distance l of one of the balls $B(x_k, n)$, k ∈ $\mathbb{Z}$.
2.7. Cut sets. Given a graph G = (V, E), we define $G^l$ as the graph with vertex set V and edge set $E^l = \{\{x, y\} : x, y \in V,\ \operatorname{dist}(x, y) \le l\}$. The line graph (or cover graph) of G, denoted $G_E$, is the graph which has vertex set $V_E = E$ and edge set $E_E = \{\{e, f\} : \text{as elements of } E,\ e \text{ and } f \text{ share an endpoint}\} = \{\{e, f\} : e, f \in E,\ e \cap f \neq \emptyset\}$. We will denote by $\operatorname{dist}_E(\cdot, \cdot)$ the distance function on $V_E \times V_E = E \times E$ for the graph $G_E$. Given two vertices x, y ∈ V, an (x, y) cut set Π ⊂ E is a set of edges that has nonempty intersection with every path from x to y. An (x, y) cut set is minimal (an mcs) if it contains no proper subset which is also an (x, y) cut set. Given Σ ⊂ E, set

$$C(\Sigma) = \sup_{\Sigma_1 \cup \Sigma_2 = \Sigma} \operatorname{dist}_E(\Sigma_1, \Sigma_2).$$
If C(Σ) ≤ l, then we say that Σ is l-close. We will say that an infinite, locally finite, connected graph has the quasi-connected minimal cut sets property if for some l < ∞, for each pair of vertices x, y, any (x, y) mcs is l-close. It is easy to see that in this case any infinite mcs, when considered as a set of vertices in $(G_E)^l$, must contain an infinite chain in this graph. The following result from [BB] shows that the most commonly considered transitive graphs with one end satisfy the quasi-connected minimal cut sets property.

Theorem 2.1 (Babson, Benjamini). Suppose that G is the Cayley graph of a finitely generated, finitely presented group, with respect to any symmetric finite set of generators. Then G has the quasi-connected minimal cut sets property.

See Theorem 1 and Example 3 in [BB] for the proof. It is natural to ask whether all transitive graphs with one end satisfy the quasi-connected minimal cut sets property. Unfortunately this seems not to be the case. (I thank Russ Lyons for having told me that R. Kenyon may have a counterexample.)

2.8. Planar graphs. Let Π be the Euclidean (or the hyperbolic) plane. A graph G = (V, E) is planar if it can be embedded in Π satisfying the following restrictions. Each vertex x ∈ V is mapped into a point $v_x \in \Pi$. Each edge e = {x, y} ∈ E is mapped into the image $\Sigma_e = \gamma_e([0, 1])$ of a curve $\gamma_e : [0, 1] \to \Pi$, with $\gamma_e(0) = v_x$ and $\gamma_e(1) = v_y$. If $e_1 \neq e_2$, then $\gamma_{e_1}((0, 1)) \cap \gamma_{e_2}((0, 1)) = \emptyset$. The connected components of $\Pi \setminus (\cup_{e \in E} \Sigma_e)$ are called the faces of the embedding. The dual multigraph (meaning that more than one edge can connect two vertices and edges may have two identical endpoints) $G^\dagger = (V^\dagger, E^\dagger)$ is defined as follows. $V^\dagger$ is the set of faces, and to each edge e ∈ E one associates a dual edge $e^\dagger$ connecting the two faces which have $\Sigma_e$ in their boundary. When G is planar, transitive and has one end, it is not hard to show that each face is bounded.
Proposition 2.1 of [BS3] gives much more detailed information in this case. In particular, the embedding can be done so that each bounded set in Π contains only finitely many points $v_x$, x ∈ V (embedded vertices), and intersects only finitely many curve images $\Sigma_e$, e ∈ E (embedded edges).

3. Independent Bond Percolation

3.1. Background on independent bond percolation on graphs. Given an infinite, locally finite, connected graph G = (V, E), the probability measure which corresponds to
making each bond of G occupied independently with probability p and vacant otherwise will be denoted by $P^G_p$, with corresponding expectation $E^G_p$. Clusters are the connected components of the graph obtained from G by removing all the vacant bonds. In a common abuse of terminology, we sometimes think of the clusters as being graphs and sometimes as being the corresponding vertex sets. Given A, B ⊂ V, we will write {A ↔ B} for the event that A and B intersect a common cluster. We will write {A → ∞} for the event that A intersects an infinite cluster. The union of the clusters that intersect A is denoted by C(A). For x ∈ V set $\theta^G_x(p) = P^G_p(x \to \infty)$. Obviously, in case G is transitive, $\theta^G_x(p) = \theta^G(p)$ does not depend on x ∈ V. The number of infinite clusters will be denoted by N. The threshold for percolation is

$$\begin{aligned} p_c(G) &= \inf\{p : \theta^G_x(p) > 0 \text{ for some } x \in V\} = \inf\{p : \theta^G_x(p) > 0 \text{ for all } x \in V\} \\ &= \inf\{p : P^G_p(N > 0) > 0\} = \inf\{p : P^G_p(N > 0) = 1\}, \end{aligned}$$
where the last equality is a consequence of Kolmogorov’s 0-1 law. The analogue of Theorem 2 in [BS1] for bond percolation states that

$$p_c(G) \le \frac{1}{1 + i_E(G)}. \tag{3.1}$$
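This result will be strengthened in Sect. 5, when we state and prove Theorem 5.3. It is very natural to define several other critical points based on the connectivity properties under $P^G_p$:

$$p_{\mathrm{exp}}(G) = \sup\{p : \text{for some } C, \gamma \in (0, \infty),\ P^G_p(x \leftrightarrow y) \le C e^{-\gamma \operatorname{dist}(x, y)} \text{ for all } x, y \in V\},$$

$$p_{\mathrm{conn}}(G) = \sup\Big\{p : \lim_{n\to\infty} \sup_{x, y \in V,\ \operatorname{dist}(x, y) \ge n} P^G_p(x \leftrightarrow y) = 0\Big\},$$

$$\bar p_{\mathrm{conn}}(G) = \sup\Big\{p : \inf_{x, y \in V} P^G_p(x \leftrightarrow y) = 0\Big\}.$$

Obviously,

$$p_{\mathrm{exp}}(G) \le p_{\mathrm{conn}}(G) \le \bar p_{\mathrm{conn}}(G). \tag{3.2}$$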
For an infinite, locally finite, connected transitive graph, one defines $p_u(G) = \inf\{p : P^G_p(N = 1) = 1\}$. It is known from [HP] and [Sch2] that $P^G_p(N = 1) = 1$ for all $p > p_u(G)$. A simple application of Harris’ inequality (see, e.g., (26) in [HPS]) then gives that, for $p > p_u(G)$, $P^G_p(x \leftrightarrow y) \ge (\theta^G(p))^2$. Therefore, when G is transitive, (3.2) can be extended to

$$p_{\mathrm{exp}}(G) \le p_{\mathrm{conn}}(G) \le \bar p_{\mathrm{conn}}(G) \le p_u(G). \tag{3.3}$$
For transitive G, Theorem 4 in [BS1] provided a lower bound for $p_u(G)$ in terms of the spectral radius of the simple symmetric random walk on G. This result was strengthened and extended in Proposition 8.3 in [HPS], which states that for an infinite, locally finite, connected graph, for p < 1/R(G) and for any x, y ∈ V,

$$P^G_p(x \leftrightarrow y) \le \frac{(pR(G))^{\operatorname{dist}(x, y)}}{1 - pR(G)}. \tag{3.4}$$

In particular,

$$p_{\mathrm{exp}}(G) \ge \frac{1}{R(G)}. \tag{3.5}$$

In the transitive case, the bound on $p_u(G)$ obtained by combining (3.3) with (3.5) is identical to the bound implicit in Theorem 4 in [BS1]:

$$p_u(G) \ge \frac{1}{R(G)}. \tag{3.6}$$
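To see how (3.1) and (3.6) compare, consider $T_b$, using the standard values $i_E(T_b) = b - 1$ and $R(T_b) = 2\sqrt{b}$. Then (3.1) gives $p_c(T_b) \le 1/b$ (which is in fact the exact value of $p_c(T_b)$), and (3.6) gives $p_u(T_b) \ge 1/(2\sqrt{b})$. These two bounds alone separate $p_c(T_b)$ from $p_u(T_b)$ as soon as $2\sqrt{b} < b$, i.e., for b ≥ 5. (For trees, (3.7) below gives the stronger statement $p_u = 1$, so this only illustrates the reach of the spectral bound.) A minimal Python sketch; the function name is ours:

```python
from math import sqrt


def tree_bounds(b):
    """(upper bound on p_c, lower bound on p_u) for T_b from (3.1) and (3.6).

    Assumes i_E(T_b) = b - 1 and R(T_b) = 2 * sqrt(b).
    """
    pc_upper = 1 / (1 + (b - 1))   # (3.1)
    pu_lower = 1 / (2 * sqrt(b))   # (3.6)
    return pc_upper, pu_lower


separated = [b for b in range(2, 12) if tree_bounds(b)[0] < tree_bounds(b)[1]]
print(separated)  # [5, 6, 7, 8, 9, 10, 11]
```

For b = 4 the two bounds coincide (both equal 1/4), which is why the separation starts at b = 5.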
In Subsect. 1.2, in connection with Conjecture 1.3, we mentioned some known facts on the issue of when $p_u(G) < 1$. The following is a strengthening of one of these results. From the fact that transitive graphs with infinitely many ends are non-amenable, from (3.1) and from Proposition 2.1, we obtain for such graphs

$$p_c(G) < \bar p_{\mathrm{conn}}(G) = p_u(G) = 1. \tag{3.7}$$
However, the following example from [LyS] shows that there are transitive graphs with infinitely many ends and $p_{\mathrm{conn}}(G) < p_u(G)$. This is the case for a Cayley graph G of the free product $\mathbb{Z}^2 * \mathbb{Z}$: since G has infinitely many ends, $p_u(G) = 1$, but since copies of $\mathbb{Z}^2$ are embedded in G, $p_{\mathrm{conn}}(G) \le p_c(\mathbb{Z}^2) < 1$. Observe also that, from Proposition 2.1, if G is transitive and has 2 ends, then

$$p_c(G) = p_{\mathrm{exp}}(G) = p_{\mathrm{conn}}(G) = \bar p_{\mathrm{conn}}(G) = p_u(G) = 1. \tag{3.8}$$
The following is a natural conjecture.

Conjecture 3.1. For any infinite, locally finite, connected transitive graph G and for any p ∈ [0, 1],

$$\inf_{x, y \in V} P^G_p(x \leftrightarrow y) > 0 \iff P^G_p(N = 1) = 1.$$

In particular,

$$\bar p_{\mathrm{conn}}(G) = p_u(G). \tag{3.9}$$
In [LyS], this was proved to be the case if G is also unimodular. Note then that, from (3.7) and (3.8), the only cases in which (3.9) has not been proved are non-unimodular transitive graphs with one end.

Open Problem 3.1. For which transitive graphs G is $p_{\mathrm{conn}}(G) = p_u(G)$? The counterexample $\mathbb{Z}^2 * \mathbb{Z}$ of [LyS] has infinitely many ends; is there any counterexample with one end?
Open Problem 3.2. Is $p_{\mathrm{exp}}(G) = p_{\mathrm{conn}}(G)$ for every transitive graph G?

The methods of [AB] show that for every transitive graph G and all $p < p_c(G)$, $E^G_p(|C(r)|) < \infty$. Combining this with the methods of [Ham] gives, under the same conditions, $P^G_p(x \leftrightarrow y) \le C \exp(-\gamma \operatorname{dist}(x, y))$ for some C, γ ∈ (0, ∞). Therefore, for transitive graphs, (3.3) can be extended to

$$p_c(G) \le p_{\mathrm{exp}}(G) \le p_{\mathrm{conn}}(G) \le \bar p_{\mathrm{conn}}(G) \le p_u(G). \tag{3.10}$$
If G is transitive and amenable, then we know that also $p_c(G) = p_u(G)$, so that from (3.10) we learn that these graphs satisfy the equalities in Conjecture 3.1 and Open Problem 3.2. In [Lal3], Proposition 5.1, and in [BS3], Corollary 4.5, the equality $p_{\mathrm{exp}}(G) = p_{\mathrm{conn}}(G) = p_u(G)$ was also proved for a class of transitive non-amenable planar graphs with one end, which were also shown to satisfy the expected inequalities $p_c(G) < p_u(G) < 1$.

3.2. Mean field criticality for independent percolation. Next we introduce diagrams which are commonly employed to prove mean field critical behavior near and at $p_c$. Given an infinite, locally finite, connected graph G = (V, E) and p ∈ [0, 1], for k = 0, 1, … and x, y ∈ V set

$$\mathrm{Diag}^G_{p,k}(x, y) = \sum_{z_1, \ldots, z_k \in V} P^G_p(x \leftrightarrow z_1)\, P^G_p(z_1 \leftrightarrow z_2) \cdots P^G_p(z_{k-1} \leftrightarrow z_k)\, P^G_p(z_k \leftrightarrow y). \tag{3.11}$$
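On a finite vertex set, (3.11) is simply an entry of a matrix power: if τ denotes the matrix $\tau(u, v) = P^G_p(u \leftrightarrow v)$ (with τ(u, u) = 1), then $\mathrm{Diag}^G_{p,k}(x, y) = \tau^{k+1}(x, y)$. The Python sketch below checks this identity by brute force. The two-point function used, $\tau(u, v) = 0.3^{\operatorname{dist}(u, v)}$ on a cycle of length 8, is a hypothetical stand-in chosen only to have numbers to multiply; it is not an actual percolation connectivity function, and on infinite graphs the sums in (3.11) are infinite series.

```python
from itertools import product


def diagram_by_sum(tau, k, x, y):
    """Sum over z_1, ..., z_k of tau(x, z_1) tau(z_1, z_2) ... tau(z_k, y), as in (3.11)."""
    n = len(tau)
    total = 0.0
    for zs in product(range(n), repeat=k):
        chain = (x, *zs, y)
        term = 1.0
        for u, v in zip(chain, chain[1:]):
            term *= tau[u][v]
        total += term
    return total


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def diagram_by_power(tau, k):
    """The matrix tau^(k+1); its (x, y) entry equals diagram_by_sum(tau, k, x, y)."""
    result = tau
    for _ in range(k):
        result = mat_mul(result, tau)
    return result


# Hypothetical two-point function on the cycle Z/8Z (for illustration only).
N = 8
tau = [[0.3 ** min(abs(u - v), N - abs(u - v)) for v in range(N)] for u in range(N)]
for k in range(3):
    power = diagram_by_power(tau, k)
    for x, y in [(0, 0), (0, 3), (2, 7)]:
        assert abs(diagram_by_sum(tau, k, x, y) - power[x][y]) < 1e-12
```

Computing the matrix power costs $O(k|V|^3)$ operations instead of the $|V|^k$ terms of the direct sum.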
$\mathrm{Diag}^G_{p,1}(x, y)$ is known as the bubble diagram, and is of relevance in the study of the critical behavior of the Ising model, as will be reviewed in the next section. $\mathrm{Diag}^G_{p,2}(x, y)$ is known as the triangle diagram, and is of relevance in the study of the critical behavior of the independent percolation model. Before we can state the known results, we need one more definition:

$$M^G(p, h) = 1 - \sum_{n=1}^{\infty} e^{-hn} P^G_p(|C(r)| = n),$$

where h ≥ 0. Below we will sometimes abbreviate $p_c = p_c(G)$, since there is no risk of confusion. The results below on mean field critical behavior were proved in the union of the papers [AN, BA] and [Ngu]. Suppose that G is an infinite, locally finite, connected transitive unimodular graph. (These papers considered smaller classes of graphs, usually only $\mathbb{Z}^d$, but their arguments work in this generality. The role of unimodularity will be explained below.) Under the open triangle condition, which reads

$$\lim_{n\to\infty} \sup_{x, y \in V,\ \operatorname{dist}(x, y) \ge n} \mathrm{Diag}^G_{p_c,2}(x, y) = 0, \tag{3.12}$$

one has the following (the labels on the left indicate the way one usually refers to each result, in terms of a corresponding critical exponent):

[γ = 1] $C_1 (p_c - p)^{-1} \le E^G_p(|C(r)|) \le C_2 (p_c - p)^{-1}$, for $p < p_c$,
[β = 1] $C_1 (p - p_c)^1 \le \theta^G(p) \le C_2 (p - p_c)^1$, for $p > p_c$,
[δ = 2] $C_1 h^{1/2} \le M^G(p_c, h) \le C_2 h^{1/2}$, for h > 0,
[Δ = 2] for m = 1, 2, …, $C_1 (p_c - p)^{-2} \le E^G_p(|C(r)|^{m+1}) / E^G_p(|C(r)|^m) \le C_2 (p_c - p)^{-2}$, for $p < p_c$,

where in each case $C_1, C_2 \in (0, \infty)$. Since by Harris’ inequality $P^G_p(z_2 \leftrightarrow x) \le P^G_p(z_2 \leftrightarrow y)/P^G_p(x \leftrightarrow y)$, it is clear that the open triangle condition (3.12) implies the closed triangle condition:

$$\mathrm{Diag}^G_{p_c,2}(r, r) < \infty. \tag{3.13}$$
Equations (3.12) and (3.13) are actually equivalent (C. Newman and C. Wu, private communication), but this fact (which seems to have appeared in print only with a proof restricted to the special case of $\mathbb{Z}^d$) will not be needed in this paper. It is beyond the scope of this paper to review the long and technical proofs that diagrammatic conditions imply certain mean-field features of criticality. Nevertheless, it is important to point out the sort of step in such proofs which may not be true for a general transitive graph, but that can be justified under the extra assumption of unimodularity. For concreteness, we consider one example: the derivation in [AN] of their equation (6.4) from Russo’s formula. The following statement follows that equation: “Where we made a simple use of translation invariance (in effect to simplify later notation we replaced 0 with x in the more natural expression)”. In our notation, they performed the transformation

$$\sum_{\substack{x, u, y \\ \operatorname{dist}(x, u) = 1}} P(r \leftrightarrow x \leftrightarrow u \leftrightarrow y) = \sum_{\substack{x, u, y \\ \operatorname{dist}(r, u) = 1}} P(x \leftrightarrow r \leftrightarrow u \leftrightarrow y). \tag{3.14}$$
This transformation is justified on $\mathbb{Z}^d$, using properties of the group of translations (as the authors indicated). The arguments can also easily be extended to Cayley graphs, with the group of automorphisms $\{\phi_v : v \in V\}$, where $\phi_v(u) = vu$ for each u ∈ V ($\phi_v$ is left group multiplication by v), taking the role of the group of translations in the case of $\mathbb{Z}^d$. On the other hand, (3.14) may not hold in general for an arbitrary transitive graph. The validity of (3.14) in the case of unimodular transitive graphs is directly related to the “mass transport technique”, which works as follows. Let M(·, ·) be a function from V × V to [0, ∞) which is invariant under diagonal actions of the automorphisms of G, i.e., M(a, b) = M(φ(a), φ(b)) for all a, b ∈ V, φ ∈ Aut(G). Corollary 3.5 in [BLPS1] states that if G = (V, E) is an infinite, locally finite, connected transitive unimodular graph, then for each a ∈ V,

$$\sum_b M(a, b) = \sum_b M(b, a). \tag{3.15}$$
Equation (3.14) follows immediately by taking

$$M(a, b) = \sum_{\substack{u, y \\ \operatorname{dist}(b, u) = 1}} P(a \leftrightarrow b \leftrightarrow u \leftrightarrow y)$$

and using (3.15) with a = r.

Open Problem 3.3. Can one avoid the use of unimodularity in the derivation of mean-field criticality from diagrammatic conditions?
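The mass transport identity (3.15) has a transparent finite analogue that may help intuition: if a finite group acts transitively on V and M is invariant under it, then every vertex sends out exactly as much mass as it receives, because both quantities are independent of the vertex and their totals over V coincide. The Python sketch below checks this on the torus $\mathbb{Z}_n \times \mathbb{Z}_m$ with a hypothetical transport rule M(a, b) = g(b − a) built from random weights g, so that M is translation invariant but not symmetric. It illustrates the identity only; the infinite, non-unimodular situation, where (3.15) can fail, is exactly what a finite computation cannot exhibit.

```python
import random


def mass_transport_holds(n, m, seed=0):
    """Check sum_b M(a, b) == sum_b M(b, a) for every a on the torus Z_n x Z_m."""
    rng = random.Random(seed)
    # Hypothetical translation-invariant rule M(a, b) = g(b - a), with random weights g.
    g = {(i, j): rng.random() for i in range(n) for j in range(m)}

    def M(a, b):
        return g[((b[0] - a[0]) % n, (b[1] - a[1]) % m)]

    vertices = [(i, j) for i in range(n) for j in range(m)]
    for a in vertices:
        sent = sum(M(a, b) for b in vertices)
        received = sum(M(b, a) for b in vertices)
        if abs(sent - received) > 1e-9:
            return False
    return True


print(mass_transport_holds(5, 7))  # True
```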
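3.3. Consequences for independent percolation of “high non-amenability”. Note that $\mathrm{Diag}^G_{p,k}(x, y)$ is non-decreasing in k and $\mathrm{Diag}^G_{p,0}(x, y) = P^G_p(x \leftrightarrow y)$. Therefore, the following result extends Proposition 8.3 from [HPS] ((3.4), (3.5) and (3.6) above).

Theorem 3.1. Suppose that G = (V, E) is an infinite, locally finite, connected graph. For each p ∈ [0, 1], each k = 0, 1, …, and each x, y ∈ V,

$$\mathrm{Diag}^G_{p,k}(x, y) \le \sum_{l=\operatorname{dist}(x, y)}^{\infty} (l+1)^k (pR(G))^l.$$

Therefore, if p < 1/R(G),

$$\sup_{x, y \in V,\ \operatorname{dist}(x, y) \ge n} \mathrm{Diag}^G_{p,k}(x, y) \to 0 \quad \text{exponentially fast as } n \to \infty,$$

and for each x ∈ V, $\mathrm{Diag}^G_{p,k}(x, x) < \infty$.

Proof. For u, v ∈ V, let $N_l(u, v)$ be the number of edge-self-avoiding paths from u to v with length l. Set $x = z_0$ and $y = z_{k+1}$. Using (2.2) we obtain:

$$\begin{aligned}
\mathrm{Diag}^G_{p,k}(x, y) &= \sum_{z_1, \ldots, z_k \in V} \prod_{i=0}^{k} P^G_p(z_i \leftrightarrow z_{i+1}) \\
&\le \sum_{z_1, \ldots, z_k \in V} \prod_{i=0}^{k} \sum_{l_i \ge 0} N_{l_i}(z_i, z_{i+1})\, p^{l_i} \\
&\le \sum_{z_1, \ldots, z_k \in V} \prod_{i=0}^{k} \sum_{l_i \ge 0} A_G^{l_i}(z_i, z_{i+1})\, p^{l_i} \\
&= \sum_{l_0, l_1, \ldots, l_k \ge 0} p^{l_0 + \cdots + l_k} \sum_{z_1, \ldots, z_k \in V} \prod_{i=0}^{k} A_G^{l_i}(z_i, z_{i+1}) \\
&= \sum_{l_0, l_1, \ldots, l_k \ge 0} p^{l_0 + \cdots + l_k} A_G^{l_0 + \cdots + l_k}(x, y) \\
&= \sum_{\substack{l_0, l_1, \ldots, l_k \ge 0 \\ l_0 + \cdots + l_k \ge \operatorname{dist}(x, y)}} p^{l_0 + \cdots + l_k} A_G^{l_0 + \cdots + l_k}(x, y) \\
&\le \sum_{\substack{l_0, l_1, \ldots, l_k \ge 0 \\ l_0 + \cdots + l_k \ge \operatorname{dist}(x, y)}} p^{l_0 + \cdots + l_k} (R(G))^{l_0 + \cdots + l_k} \\
&= \sum_{l=\operatorname{dist}(x, y)}^{\infty} \sum_{\substack{l_0, l_1, \ldots, l_k \ge 0 \\ l_0 + \cdots + l_k = l}} (pR(G))^l \le \sum_{l=\operatorname{dist}(x, y)}^{\infty} (l+1)^k (pR(G))^l,
\end{aligned}$$

where the restriction $l_0 + \cdots + l_k \ge \operatorname{dist}(x, y)$ is possible because $A_G^n(x, y) = 0$ for n < dist(x, y), and the last step uses that the number of (k + 1)-tuples of non-negative integers with sum l is $\binom{l+k}{k} \le (l+1)^k$. ∎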
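The double sum in the second-to-last expression of the proof of Theorem 3.1 is easy to evaluate numerically. In the sketch below (illustrative; the function name is ours), q stands for pR(G) < 1, and the inner count of (k + 1)-tuples $(l_0, \ldots, l_k)$ of non-negative integers with sum l is computed exactly as the binomial coefficient $\binom{l+k}{k}$. For k = 0 the result reduces to $q^d/(1 - q)$, the bound (3.4); for d = 0 the series sums to $(1 - q)^{-(k+1)}$; increasing d shows the exponential decay in the distance.

```python
from math import comb


def diagram_series(dist, k, q, rel_tol=1e-15):
    """Sum over l >= dist of C(l + k, k) * q**l, where C(l + k, k) counts the
    (k + 1)-tuples of non-negative integers with sum l and q = p R(G)."""
    if not 0 < q < 1:
        raise ValueError("requires 0 < p R(G) < 1")
    # Consecutive terms have ratio q (l + k + 1) / (l + 1), which is <= 1
    # once l >= q k / (1 - q); from there on the terms decrease.
    decreasing_from = q * k / (1 - q)
    total, l = 0.0, dist
    while True:
        term = comb(l + k, k) * q ** l
        total += term
        if l >= decreasing_from and term <= rel_tol * total:
            return total
        l += 1


for d in (0, 5, 10, 20):
    print(d, diagram_series(d, 2, 0.5))
```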
The following theorem, which thanks to (3.3) extends Theorem 1.1, is an immediate consequence of (3.1), (3.5), Theorem 3.1 and (2.4). The reader can either do the elementary computation to check this claim, or wait until we state and prove the stronger Theorem 4.4 in the next section.

Theorem 3.2. Suppose that G is an infinite, locally finite, connected graph. If

$$i_E(G) > \frac{\sqrt{2D(G)^2 - 1} - 1}{2}$$

(in particular, if $i_E(G)/D(G) > 1/\sqrt{2}$), then $p_c(G) < p_{\mathrm{exp}}(G)$ and the open triangle condition (3.12) holds.

The following question is motivated by this theorem.

Open Problem 3.4. Is $p_c(G) < p_{\mathrm{exp}}(G)$ for every infinite, bounded degree, connected, non-amenable graph G? An affirmative answer would confirm Corollary 1.1.

Note that for a = 2, 3, …, $\mathrm{Br}(a)$ is non-amenable and has $p_c = p_{\mathrm{exp}} = 0$ (and $P^G_p(N = 1) = 1$ for p > 0), but these graphs do not have bounded degree.

4. Potts and Ising Models I

4.1. Applications of the Fortuin–Kasteleyn random cluster model. We will use the well-known construction of the Gibbs measures of the Potts models by means of the dependent bond percolation process known as the Fortuin–Kasteleyn random cluster model, denoted by FK model in the sequel. We will summarize next the needed facts about the FK model and its relation to the Potts models. Readers who want to learn more about it are referred to the review [GHM]. On an infinite, locally finite, connected graph G, for each q ∈ [1, ∞) and each p ∈ [0, 1] there are two q-FK measures which will be of relevance to us in this paper. The wired one will be denoted by $P^{G,q}_{w,p}$, while the free one will be denoted by $P^{G,q}_{f,p}$. In the case q = 1, we have $P^{G,1}_{w,p} = P^{G,1}_{f,p} = P^G_p$.
The Gibbs distributions $\mu^{G,q}_{i,\beta}$, i = 1, ..., q, for the q-Potts model on G at inverse temperature β can be obtained as follows: take a random configuration of occupied and vacant bonds with law $P^{G,q}_{w,p}$, $p = 1-e^{-\beta}$; assign the state i to all sites in each infinite cluster; and assign to all sites in each finite cluster a state from {1, ..., q}, chosen uniformly at random for each cluster and independently for different clusters. The law of this coupled process, with marginals $P^{G,q}_{w,p}$ and $\mu^{G,q}_{i,\beta}$, will be denoted by $\mathbf{P}^{G,q}_{w,i,p}$.
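The assignment of spins to clusters can be illustrated in code. The sketch below is a hypothetical finite-volume illustration, not part of the paper: on a finite graph there are no infinite clusters, so, as an assumption, clusters that meet a designated vertex set `boundary` play the role of the infinite clusters and receive the state i, while every other cluster receives an independent uniform state. The names `cluster_labels` and `assign_spins` are illustrative only.

```python
import random


def cluster_labels(n, open_edges):
    """Label the clusters of the open bonds on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u, v in open_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return [find(a) for a in range(n)]


def assign_spins(n, open_edges, q, i, boundary, rng):
    """Assign Potts spins cluster by cluster.

    Hypothetical finite-volume stand-in for the construction above: clusters
    meeting `boundary` (a proxy for infinite clusters) get state i; every other
    cluster gets a state uniform on {1, ..., q}, independently of the others.
    An empty `boundary` gives the all-uniform assignment.
    """
    label = cluster_labels(n, open_edges)
    pinned = {label[b] for b in boundary}
    state = {}
    sigma = []
    for a in range(n):
        c = label[a]
        if c not in state:
            state[c] = i if c in pinned else rng.randint(1, q)
        sigma.append(state[c])
    return sigma


# Path 0-1-2-3-4 with bonds {0,1} and {3,4} occupied; vertex 4 plays the role of "infinity".
sigma = assign_spins(5, [(0, 1), (3, 4)], q=3, i=2, boundary={4}, rng=random.Random(1))
print(sigma)
```

All sites of one cluster share a spin by construction, which is the only feature of the coupling the sketch is meant to display.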
The Gibbs distribution $\mu^{G,q}_{f,\beta}$ for the q-Potts model on G at inverse temperature β can be obtained similarly: take a random configuration of occupied and vacant bonds with law $P^{G,q}_{f,p}$, $p = 1-e^{-\beta}$, and assign to all sites in each cluster a state from {1, ..., q}, chosen uniformly at random for each cluster and independently for different clusters. The law of this coupled process, with marginals $P^{G,q}_{f,p}$ and $\mu^{G,q}_{f,\beta}$, will be denoted by $\mathbf{P}^{G,q}_{f,p}$. In the remainder of this paper the symbol ∗ will represent w or f (when it appears more than once in an expression or statement, it is understood to take the same meaning in each appearance). Unless otherwise stated, facts stated for the FK models
Multiplicity of Phase Transitions
295
are valid for all q ∈ [1, ∞), and relations between the FK and the Potts models are valid for all q ∈ {2, 3, ...}. The following stochastic inequalities will be used to relate the FK model to the independent bond percolation model:
$$P^{G,q}_{*,p} \le P^G_p, \tag{4.1}$$
and
$$P^{G,q}_{*,p} \ge P^G_{p/(p+(1-p)q)}. \tag{4.2}$$
The measures $P^{G,q}_{*,p}$ have positive correlations,
$$P^{G,q}_{f,p} \le P^{G,q}_{w,p}, \tag{4.3}$$
and if $p_1 \le p_2$, then
$$P^{G,q}_{*,p_1} \le P^{G,q}_{*,p_2}. \tag{4.4}$$
The measures $P^{G,q}_{*,p}$ have a trivial tail σ-field. This can be proved by the argument in the proof of Theorem 3.1(c) in [Gri1], where the case of $\mathbb{Z}^d$ was considered. To adapt that argument to an arbitrary infinite, locally finite, connected graph, one can use Theorem 6.17 of [GHM] and the last display in that paper before that theorem. In case G is transitive, the measures $P^{G,q}_{*,p}$ are automorphism invariant; since they have trivial tail σ-fields, they are extremal among automorphism invariant measures. The definitions of critical points for independent percolation extend naturally to percolation under the measure $P^{G,q}_{*,p}$. In particular we define
$$p_{c,*}(G,q) = \inf\{p : P^{G,q}_{*,p}(x\to\infty) > 0 \text{ for some } x \in V\} = \inf\{p : P^{G,q}_{*,p}(x\to\infty) > 0 \text{ for all } x \in V\},$$
$$p_{\exp,*}(G,q) = \sup\{p : \text{for some } C,\gamma\in(0,\infty),\ P^{G,q}_{*,p}(x\leftrightarrow y) \le Ce^{-\gamma\,\mathrm{dist}(x,y)} \text{ for all } x,y\in V\},$$
$$\bar p_{\mathrm{conn},*}(G,q) = \sup\{p : \inf_{x,y\in V} P^{G,q}_{*,p}(x\leftrightarrow y) = 0\}.$$
We denote by $\langle\cdot\rangle^{G,q}_{i,\beta}$ (resp. $\langle\cdot\rangle^{G,q}_{f,\beta}$) the expectation with respect to the Gibbs measure $\mu^{G,q}_{i,\beta}$ (resp. $\mu^{G,q}_{f,\beta}$). It is well known that for i = 1, ..., q,
$$\frac{q}{q-1}\Big\langle \delta_{\sigma_x,i} - \frac1q \Big\rangle^{G,q}_{i,\beta} = P^{G,q}_{w,p}(x\to\infty), \tag{4.5}$$
which leads to
$$p_{c,w}(G,q) = 1 - e^{-\beta_c(G,q)}. \tag{4.6}$$
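It is also well known that for i = 1, ..., q,
$$\frac{q}{q-1}\Big\langle \delta_{\sigma_x,\sigma_y} - \frac1q\Big\rangle^{G,q}_{i,\beta} = P^{G,q}_{w,p}(x\leftrightarrow y \not\to\infty) + P^{G,q}_{w,p}(x\to\infty,\ y\to\infty), \tag{4.7}$$
where $x\leftrightarrow y\not\to\infty$ means that x and y lie in the same finite cluster, and
$$\frac{q}{q-1}\Big\langle \delta_{\sigma_x,\sigma_y} - \frac1q\Big\rangle^{G,q}_{f,\beta} = P^{G,q}_{f,p}(x\leftrightarrow y). \tag{4.8}$$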
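Identity (4.8) has an exact finite-volume analogue that can be checked by brute force. The sketch below assumes, for illustration only, the Potts weight $\exp(\beta\sum_{\{u,v\}\in E}\delta_{\sigma_u,\sigma_v})$ and the free FK weight $p^{|\omega|}(1-p)^{|E|-|\omega|}q^{k(\omega)}$ ($k(\omega)$ the number of clusters) on a small finite graph; these finite-volume conventions and the function names are assumptions, not the paper's infinite-volume setting. With $p = 1-e^{-\beta}$ the two sides of (4.8) then coincide.

```python
import itertools
import math


def num_clusters_and_labels(n, open_edges):
    """Return (number of clusters, cluster label of each vertex) for the open bonds."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in open_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    labels = [find(a) for a in range(n)]
    return len(set(labels)), labels


def fk_connection_prob(n, edges, p, q, x, y):
    """Free FK measure on a finite graph: weight p^|open| (1-p)^|closed| q^(#clusters)."""
    total = hit = 0.0
    for omega in itertools.product((0, 1), repeat=len(edges)):
        open_edges = [e for e, b in zip(edges, omega) if b]
        k, labels = num_clusters_and_labels(n, open_edges)
        w = p ** len(open_edges) * (1 - p) ** (len(edges) - len(open_edges)) * q ** k
        total += w
        if labels[x] == labels[y]:
            hit += w
    return hit / total


def potts_same_spin_prob(n, edges, beta, q, x, y):
    """Potts measure with weight exp(beta * number of edges with equal spins)."""
    total = hit = 0.0
    for sigma in itertools.product(range(q), repeat=n):
        w = math.exp(beta * sum(sigma[u] == sigma[v] for u, v in edges))
        total += w
        if sigma[x] == sigma[y]:
            hit += w
    return hit / total


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # a 4-cycle plus one diagonal
beta, q = 0.7, 3
p = 1 - math.exp(-beta)
lhs = q / (q - 1) * (potts_same_spin_prob(4, edges, beta, q, 1, 3) - 1 / q)
rhs = fk_connection_prob(4, edges, p, q, 1, 3)
print(lhs, rhs)  # the two numbers agree up to rounding
```

The agreement is the Edwards–Sokal identity: conditioned on the bonds, spins are uniform and independent across clusters, so the probability of equal spins is the connection probability plus 1/q times its complement.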
Equations (4.5) and (4.7) can be derived from the coupling $\mathbf{P}^{G,q}_{w,i,p}$, and similarly (4.8) can be obtained from the coupling $\mathbf{P}^{G,q}_{f,p}$. It is well known that
$$P^{G,q}_{w,p}(N=0) = 1 \implies P^{G,q}_{f,p} = P^{G,q}_{w,p}. \tag{4.9}$$
Lemma 4.3 of [Jon] and its proof (presented in compact form in the proof of Proposition 5.1 in [Lyo5]) provide
$$\mu^{G,q}_{f,\beta} = \sum_{i=1,\dots,q}\frac1q\,\mu^{G,q}_{i,\beta} \iff P^{G,q}_{f,p} = P^{G,q}_{w,p} \ \overset{q>1}{\implies}\ P^{G,q}_{f,p}(N\le 1) = 1. \tag{4.10}$$
Note the restriction on q in the last implication in (4.10); if q = 1 and $p_c(G) < p_u(G)$, this implication clearly fails. In this paper we will not be able to use the equivalence in (4.10) to prove results about the Potts model; rather, we will use this relation in Subsect. 6.1 in the opposite direction. The following theorem provides another relation between properties of the Gibbs measures of the Potts model and connectivity properties of the related FK model. Note that, using (4.8), this theorem can be restated with no mention of the FK model. Given φ ∈ Aut(G) and a spin configuration σ ∈ {1, ..., q}^V, define the action of φ on σ by $(\phi(\sigma))_x = \sigma_{\phi^{-1}(x)}$ for each x ∈ V.
Theorem 4.1. Suppose that G = (V, E) is an infinite, locally finite, connected transitive graph and set $p = 1-e^{-\beta}$. For each q, if
$$\inf_{x,y\in V} P^{G,q}_{f,p}(x\leftrightarrow y) = 0,$$
then there exists a sequence of automorphisms of G, $(\phi_n)_{n\ge 1}$, such that for any pair of spin events A, B ⊂ {1, ..., q}^V,
$$\lim_{n\to\infty}\mu^{G,q}_{f,\beta}(A\cap\phi_n^{-1}(B)) = \mu^{G,q}_{f,\beta}(A)\,\mu^{G,q}_{f,\beta}(B). \tag{4.11}$$
In particular $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
Note. The conclusion that $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ in the theorem above is not new; it is contained in Lemma 6.4 of [LyS].
Proof. From the hypothesis of the theorem we know that there is a sequence of sites $x_n \in V$ such that
$$\lim_{n\to\infty} P^{G,q}_{f,p}(r\leftrightarrow x_n) = 0. \tag{4.12}$$
Because G is transitive, there are corresponding $\phi_n \in \mathrm{Aut}(G)$ such that $\phi_n(r) = x_n$. In order to prove that (4.12) implies (4.11), we use the coupling $\mathbf{P}^{G,q}_{f,p}$ between $P^{G,q}_{f,p}$ and $\mu^{G,q}_{f,\beta}$. The measure $\mathbf{P}^{G,q}_{f,p}$ acts on events contained in $\Omega = \{1,\dots,q\}^V\times\{0,1\}^E$. To avoid cumbersome notation, we identify below any spin event $D \subset \{1,\dots,q\}^V$ with the event $D\times\{0,1\}^E \subset \Omega$, and any percolation event $D\subset\{0,1\}^E$ with the event $\{1,\dots,q\}^V\times D\subset\Omega$. It is a standard fact that to prove (4.11) it suffices to establish it when A and B are cylinder events, i.e., depend only on the states of finitely many spins. We suppose therefore that A and B depend only on the states of the spins in a finite set $\Lambda\subset V$. Consider the percolation event
$$F_n = \{\Lambda \leftrightarrow \phi_n(\Lambda)\} \subset \{0,1\}^E.$$
Since $P^{G,q}_{f,p}$ has positive correlations and is automorphism invariant, we have
$$\mathbf{P}^{G,q}_{f,p}(F_n) = P^{G,q}_{f,p}(F_n) \le C\,P^{G,q}_{f,p}(r\leftrightarrow x_n),$$
where $C = \big(P^{G,q}_{f,p}(r\leftrightarrow\Lambda)\,P^{G,q}_{f,p}(\text{all edges in }\Lambda\text{ are occupied})\big)^{-2}$ does not depend on n. Therefore (4.12) implies
$$\lim_{n\to\infty} P^{G,q}_{f,p}(F_n) = 0. \tag{4.13}$$
For each finite $S\subset V$, let $\mathcal{P}_S$ be the set of all possible partitions of S, and let $\Pi_S$ be the random partition of S according to the relation of being in the same percolation cluster. Let $\mathcal{F}_n$ be the set of partitions of $\Lambda\cup\phi_n(\Lambda)$ in which some site in Λ and some site in $\phi_n(\Lambda)$ are in the same piece of the partition. This means that
$$\{\Pi_{\Lambda\cup\phi_n(\Lambda)} \in \mathcal{F}_n\} = F_n. \tag{4.14}$$
Suppose that $\pi \in (\mathcal{F}_n)^c$. Then, conditioned on $\Pi_{\Lambda\cup\phi_n(\Lambda)} = \pi$, the states of the spins in Λ and in $\phi_n(\Lambda)$ are chosen independently, i.e.,
$$\mathbf{P}^{G,q}_{f,p}(A\cap\phi_n^{-1}(B)\mid \Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi) = \mathbf{P}^{G,q}_{f,p}(A\mid \Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\phi_n^{-1}(B)\mid \Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi).$$
Therefore,
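$$
\begin{aligned}
\mu^{G,q}_{f,\beta}(A\cap\phi_n^{-1}(B)) &= \mathbf{P}^{G,q}_{f,p}(A\cap\phi_n^{-1}(B))\\
&= \sum_{\pi\in(\mathcal{F}_n)^c}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\phi_n^{-1}(B)\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\\
&\qquad\times\mathbf{P}^{G,q}_{f,p}(\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi) + \mathbf{P}^{G,q}_{f,p}(A\cap\phi_n^{-1}(B)\cap F_n)\\
&= \sum_{\pi\in\mathcal{P}_{\Lambda\cup\phi_n(\Lambda)}}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\phi_n^{-1}(B)\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\\
&\qquad - \sum_{\pi\in\mathcal{F}_n}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\phi_n^{-1}(B)\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\\
&\qquad + \mathbf{P}^{G,q}_{f,p}(A\cap\phi_n^{-1}(B)\cap F_n).
\end{aligned}
$$
From (4.13) and (4.14) we know that the last two terms above vanish as n → ∞. As for the other term, we can write the following. (Below, $\phi_n(\pi_2)$ is the partition of $\phi_n(\Lambda)$ obtained from the partition $\pi_2$ of Λ by relabeling the sites of $\phi_n(\Lambda)$ according to their $\phi_n$-preimages in Λ. We write “π comp $\pi_1, \phi_n(\pi_2)$” for the statement that the partition π of $\Lambda\cup\phi_n(\Lambda)$ is compatible with the partitions $\pi_1$ and $\phi_n(\pi_2)$ of Λ and $\phi_n(\Lambda)$.)
$$
\begin{aligned}
&\sum_{\pi\in\mathcal{P}_{\Lambda\cup\phi_n(\Lambda)}}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\phi_n^{-1}(B)\mid\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\,\mathbf{P}^{G,q}_{f,p}(\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\\
&= \sum_{\pi_1,\pi_2\in\mathcal{P}_\Lambda}\ \sum_{\substack{\pi\in\mathcal{P}_{\Lambda\cup\phi_n(\Lambda)}\\ \pi\text{ comp }\pi_1,\phi_n(\pi_2)}}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_\Lambda=\pi_1)\,\mathbf{P}^{G,q}_{f,p}(\phi_n^{-1}(B)\mid\Pi_{\phi_n(\Lambda)}=\phi_n(\pi_2))\,\mathbf{P}^{G,q}_{f,p}(\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\\
&= \sum_{\pi_1,\pi_2\in\mathcal{P}_\Lambda}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_\Lambda=\pi_1)\,\mathbf{P}^{G,q}_{f,p}(B\mid\Pi_\Lambda=\pi_2)\sum_{\substack{\pi\in\mathcal{P}_{\Lambda\cup\phi_n(\Lambda)}\\ \pi\text{ comp }\pi_1,\phi_n(\pi_2)}}\mathbf{P}^{G,q}_{f,p}(\Pi_{\Lambda\cup\phi_n(\Lambda)}=\pi)\\
&= \sum_{\pi_1,\pi_2\in\mathcal{P}_\Lambda}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_\Lambda=\pi_1)\,\mathbf{P}^{G,q}_{f,p}(B\mid\Pi_\Lambda=\pi_2)\,\mathbf{P}^{G,q}_{f,p}\big(\{\Pi_\Lambda=\pi_1\}\cap\{\Pi_{\phi_n(\Lambda)}=\phi_n(\pi_2)\}\big).
\end{aligned}
$$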
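Since $P^{G,q}_{f,p}$ has a trivial tail σ-field, it follows from, e.g., Proposition 7.9 of [Geo] that, as n → ∞, the last expression converges to
$$
\begin{aligned}
&\sum_{\pi_1,\pi_2\in\mathcal{P}_\Lambda}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_\Lambda=\pi_1)\,\mathbf{P}^{G,q}_{f,p}(B\mid\Pi_\Lambda=\pi_2)\,\mathbf{P}^{G,q}_{f,p}(\Pi_\Lambda=\pi_1)\,\mathbf{P}^{G,q}_{f,p}(\Pi_\Lambda=\pi_2)\\
&= \sum_{\pi_1\in\mathcal{P}_\Lambda}\mathbf{P}^{G,q}_{f,p}(A\mid\Pi_\Lambda=\pi_1)\,\mathbf{P}^{G,q}_{f,p}(\Pi_\Lambda=\pi_1)\times\sum_{\pi_2\in\mathcal{P}_\Lambda}\mathbf{P}^{G,q}_{f,p}(B\mid\Pi_\Lambda=\pi_2)\,\mathbf{P}^{G,q}_{f,p}(\Pi_\Lambda=\pi_2)\\
&= \mathbf{P}^{G,q}_{f,p}(A)\,\mathbf{P}^{G,q}_{f,p}(B) = \mu^{G,q}_{f,\beta}(A)\,\mu^{G,q}_{f,\beta}(B).
\end{aligned}
$$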
This completes the proof of (4.11). Applying (4.11) with A = B an automorphism invariant event gives $\mu^{G,q}_{f,\beta}(A) = (\mu^{G,q}_{f,\beta}(A))^2$, which is equivalent to
$$\mu^{G,q}_{f,\beta}(A) \in \{0,1\}. \tag{4.15}$$
In other words, the σ-algebra of automorphism invariant spin events is trivial under $\mu^{G,q}_{f,\beta}$. Using now Corollary 7.4 of [Geo], as in Step 5 of the proof of Theorem 12.31 in the same book, we conclude that $\mu^{G,q}_{f,\beta}\in(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, as claimed. □
Proof of Theorem 1.4. This claim is an immediate consequence of Theorem 4.1, (4.1) and (3.7). □
The remainder of this subsection is devoted to the proof of an extension of Theorem 1.5. For later reference, we next state the extension of Conjecture 3.1 to the FK models.
Conjecture 4.1. For any infinite, locally finite, connected transitive graph G, any q ≥ 1 and any p ∈ [0, 1],
$$\inf_{x,y\in V}P^{G,q}_{*,p}(x\leftrightarrow y) > 0 \iff P^{G,q}_{*,p}(N=1) = 1.$$
In particular, for all $p > \bar p_{\mathrm{conn},*}(G,q)$, $P^{G,q}_{*,p}(N=1)=1$. The proof in [LyS] that Conjecture 3.1 holds for unimodular transitive graphs also applies to the FK models, as noted in the proof of Proposition 5.2 in [Lyo5]; this proves Conjecture 4.1 in this case. The “⇐” part of Conjecture 4.1 is easy to prove, using the fact that $P^{G,q}_{*,p}$ has positive correlations to conclude that
$$P^{G,q}_{*,p}(N\le 1) = 1 \implies P^{G,q}_{*,p}(x\leftrightarrow y) \ge \big(P^{G,q}_{*,p}(r\to\infty)\big)^2. \tag{4.16}$$
Inequality (4.3) shows that if Conjecture 4.1 holds, then
$$P^{G,q}_{f,p}(N=1)=1 \implies P^{G,q}_{w,p}(N=1)=1. \tag{4.17}$$
Given an infinite, locally finite, connected graph G = (V, E), q ∈ {2, 3, ...}, and β > 0, we define for i = 1, ..., q the probability measure $\tilde\mu^{G,q}_{i,\beta}$ on $\{1,\dots,q\}^V$ as follows. Consider a random configuration of occupied and vacant bonds with law $P^{G,q}_{f,p}$, $p = 1-e^{-\beta}$; assign the state i to all sites in the infinite clusters, and assign to all sites in each finite cluster a state from {1, ..., q}, chosen uniformly at random for each cluster and independently for different clusters. It is clear that if G is transitive, the measures $\tilde\mu^{G,q}_{i,\beta}$ are automorphism invariant. Furthermore, these measures then have a trivial invariant σ-field and are therefore extremal in the set of automorphism invariant probability measures on $\{1,\dots,q\}^V$. This can be proved by adapting the proof of Theorem 4.1, with the following definitions replacing the ones used in that proof. Let $F_n$ be the event that some finite cluster intersects both Λ and $\phi_n(\Lambda)$; (4.13) is then clearly satisfied. Let $\mathcal{P}_S$ be the set of all possible marked partitions of a finite set $S\subset V$, meaning that each piece of the partition may be marked or not.
Let $\Pi_S$ be the random marked partition of S according to the relation of being in the same percolation cluster, with marks identifying the pieces of the partition which belong to infinite clusters. Let $\mathcal{F}_n$ be the set of marked partitions of $\Lambda\cup\phi_n(\Lambda)$ in which some site in Λ and some site in $\phi_n(\Lambda)$ are in the same unmarked piece of the partition. Then (4.14) also holds in this case, and the proof proceeds as before.
If $P^{G,q}_{f,p}(r\to\infty) > 0$, then the measures $\tilde\mu^{G,q}_{i,\beta}$, i = 1, ..., q, are distinct from each other. Indeed, with self-explanatory notation,
$$\widetilde{\langle\delta_{\sigma_r,i}\rangle}^{G,q}_{i,\beta} = P^{G,q}_{f,p}(r\to\infty) + \frac1q\big(1-P^{G,q}_{f,p}(r\to\infty)\big) = \frac{q-1}{q}P^{G,q}_{f,p}(r\to\infty) + \frac1q > \frac1q,$$
while, for i ≠ j,
$$\widetilde{\langle\delta_{\sigma_r,j}\rangle}^{G,q}_{i,\beta} = \frac1q\big(1-P^{G,q}_{f,p}(r\to\infty)\big) < \frac1q.$$
From (4.3) and the construction of $\mu^{G,q}_{i,\beta}$ and $\tilde\mu^{G,q}_{i,\beta}$, i = 1, ..., q, by assigning spins in the same fashion to the clusters of a percolation process with law $P^{G,q}_{w,p}$ or $P^{G,q}_{f,p}$, respectively, we have the equivalence
$$\tilde\mu^{G,q}_{i,\beta} = \mu^{G,q}_{i,\beta} \iff P^{G,q}_{f,p} = P^{G,q}_{w,p}. \tag{4.18}$$
(Compare with (4.10).) On the other hand, it is not clear in general whether the measures $\tilde\mu^{G,q}_{i,\beta}$ are in $\mathcal{G}^{G,q}_\beta$. The next theorem provides a sufficient condition for this to happen. It is complementary to Theorem 4.1, in that it provides a sufficient condition for $\mu^{G,q}_{f,\beta}\notin(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ in terms of percolation properties of $P^{G,q}_{f,p}$.
Theorem 4.2. Suppose that G = (V, E) is an infinite, locally finite, connected transitive graph and set $p = 1-e^{-\beta}$. If $P^{G,q}_{f,p}(N=1) = 1$, then $\tilde\mu^{G,q}_{i,\beta}$, i = 1, ..., q, are distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ and
$$\mu^{G,q}_{f,\beta} = \sum_{i=1,\dots,q}\frac1q\,\tilde\mu^{G,q}_{i,\beta}. \tag{4.19}$$
In particular $\mu^{G,q}_{f,\beta}\notin(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
Proof. $P^{G,q}_{f,p}(N=1)=1$ implies $P^{G,q}_{f,p}(r\to\infty)>0$, which implies that the measures $\tilde\mu^{G,q}_{i,\beta}$, i = 1, ..., q, are distinct from each other, as we saw above. From the construction of $\mu^{G,q}_{f,\beta}$ and of $\tilde\mu^{G,q}_{i,\beta}$, i = 1, ..., q, by assigning spins to the clusters of a percolation process with law $P^{G,q}_{f,p}$, we obtain (4.19) from the uniqueness of the infinite cluster in this percolation process. From Theorem 14.15(c) of [Geo] and (4.19) we learn that, for each i, $\tilde\mu^{G,q}_{i,\beta}\in\mathcal{G}^{G,q}_{\beta,A}$. (While this theorem is stated in [Geo] in the setting of translation invariant Gibbs measures on $\mathbb{Z}^d$, its proof also applies in the more general setting of automorphism invariant Gibbs measures on a transitive graph.) Being automorphism invariant Gibbs measures with trivial invariant σ-field, the measures $\tilde\mu^{G,q}_{i,\beta}$ are in $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ by, e.g., Corollary 7.4 of [Geo]. □
The following is an extended version of Theorem 1.5.
Theorem 4.3. Suppose that G is an infinite, locally finite, connected transitive unimodular graph. Then for each q:
(a) We have the following dichotomy for each β > 0 and $p = 1-e^{-\beta}$. Either
$$\inf_{x,y\in V}P^{G,q}_{f,p}(x\leftrightarrow y) = 0,\qquad P^{G,q}_{f,p}(N=1) = 0,\qquad \mu^{G,q}_{f,\beta}\in(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}};$$
or else
$$\inf_{x,y\in V}P^{G,q}_{f,p}(x\leftrightarrow y) > 0,\qquad P^{G,q}_{f,p}(N=1) = 1,$$
and $\mu^{G,q}_{f,\beta}$ is a mixture of exactly q measures in $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, given by (4.19).
(b) The first alternative above holds for all $\beta < \bar\beta_1(G,q)$ and the second alternative holds for all $\beta > \bar\beta_1(G,q)$, with
$$\bar\beta_1(G,q) = \begin{cases}\log\dfrac{1}{1-\bar p_{\mathrm{conn},f}(G,q)} & \text{if } \bar p_{\mathrm{conn},f}(G,q) < 1,\\[4pt] \infty & \text{if } \bar p_{\mathrm{conn},f}(G,q) = 1.\end{cases} \tag{4.20}$$
(c) $\bar\beta_1(G,q) < \infty$ iff $p_u(G) < 1$.
Proof. Claim (a) is an immediate consequence of Theorem 4.1, Theorem 4.2 and the fact that Conjecture 4.1 has been proved in the unimodular case (see the remark after that conjecture). Claim (b) then follows immediately from (a) and the monotonicity (4.4). Claim (c) is immediate from (4.20), the fact that Conjecture 3.1 holds in the unimodular case (see the remark after that conjecture), and the inequalities (4.1) and (4.2). □
Note from the proof above that the assumption in Theorem 1.5 and Theorem 4.3 that G is unimodular would not be necessary if Conjecture 4.1 were vindicated (this would include the vindication of Conjecture 3.1). The following conjecture is unfortunately open even in the unimodular case.
Conjecture 4.2. Suppose that G = (V, E) is an infinite, locally finite, connected transitive graph. Then for each q ≥ 1,
$$P^{G,q}_{f,p}(N=1)=1 \implies P^{G,q}_{f,p} = P^{G,q}_{w,p}.$$
(Compare with (4.9), (4.10) and (4.17).) Because of (4.18) and (4.19) (or, alternatively, because of (4.10)), if Conjecture 4.2 were proved, we would obtain in Theorem 4.3 the stronger conclusion that when $\mu^{G,q}_{f,\beta}\notin(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, then (1.2) holds, so that in particular $\bar\beta_1(G,q)=\bar\beta_2(G,q)$. Note that even for the lattices $\mathbb{Z}^d$, d ≥ 3, these conclusions are not currently available. To the best of our knowledge, even the weaker statement (a) in Theorem 1.5 is new in this case.
Note. The proof of Proposition 6.1(iv) of [HJL], which was written independently of this paper, shows that Conjecture 4.2 is true for planar non-amenable transitive graphs with one end. This result can also be obtained from the arguments in the proof of Theorem 5.6 in the current paper.
4.2. Mean field criticality for the Ising model. One extends the definition of the diagrams (3.11) in the obvious way:
$$\mathrm{Diag}^{G,q}_{p,*,k}(x,y) = \sum_{z_1,\dots,z_k\in V}P^{G,q}_{*,p}(x\leftrightarrow z_1)\,P^{G,q}_{*,p}(z_1\leftrightarrow z_2)\cdots P^{G,q}_{*,p}(z_{k-1}\leftrightarrow z_k)\,P^{G,q}_{*,p}(z_k\leftrightarrow y).$$
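Because every factor in the diagram is a two-point connectivity, on a finite vertex set the diagram is a matrix power: writing $\tau(x,y) = P^{G,q}_{*,p}(x\leftrightarrow y)$ (with $\tau(x,x)=1$), one has $\mathrm{Diag}^{G,q}_{p,*,k}(x,y) = (\tau^{k+1})(x,y)$, and for k = 1 with τ the Ising two-point function this is the bubble of (4.21). The sketch below checks the matrix-power form against the defining sum on a small symmetric matrix; the finite vertex set, the arbitrary matrix entries (not computed from any percolation model) and the function names are assumptions for illustration.

```python
import itertools
import random


def diagram_by_sum(tau, k, x, y):
    """Diag_k(x, y) = sum over z_1..z_k of tau(x,z_1) tau(z_1,z_2) ... tau(z_k,y)."""
    n = len(tau)
    total = 0.0
    for zs in itertools.product(range(n), repeat=k):
        chain = (x, *zs, y)
        term = 1.0
        for a, b in zip(chain, chain[1:]):
            term *= tau[a][b]
        total += term
    return total


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diagram_matrix(tau, k):
    """All diagram entries at once: the (k+1)-th matrix power of tau."""
    result = tau
    for _ in range(k):
        result = matmul(result, tau)
    return result


rng = random.Random(0)
n = 5
tau = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        tau[a][b] = tau[b][a] = rng.random()  # placeholder values, not from any model
bubble = diagram_matrix(tau, 1)  # k = 1: the bubble diagram
```

The matrix form also makes the manipulation at the end of the proof of Theorem 3.1 transparent, where products of path-counting matrices collapse into a single power.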
The diagram corresponding to the Ising model (q = 2) with k = 1 is of fundamental relevance in studying the critical behavior of this model. First we introduce the usual Ising spins,
$$s_x = \begin{cases}-1 & \text{if } \sigma_x = 1,\\ +1 & \text{if } \sigma_x = 2,\end{cases}$$
for x ∈ V. We also use below the more standard notation for the (+)-phase and the (−)-phase of the Ising model:
$$\mu^{G,\mathrm{Ising}}_{-,\beta} = \mu^{G,2}_{1,\beta},\qquad \mu^{G,\mathrm{Ising}}_{+,\beta} = \mu^{G,2}_{2,\beta},$$
as well as further similar self-explanatory notation. Note that (4.7) can now be rewritten as follows, where q = 2:
$$\langle s_xs_y\rangle^{G,\mathrm{Ising}}_{+,\beta} = \frac{q}{q-1}\Big\langle\delta_{\sigma_x,\sigma_y}-\frac1q\Big\rangle^{G,q}_{2,\beta} = P^{G,q}_{w,p}(x\leftrightarrow y\not\to\infty) + P^{G,q}_{w,p}(x\to\infty,\ y\to\infty).$$
So we have, for q = 2 and k = 1,
$$\mathrm{Diag}^{G,q}_{p,w,k}(x,y) = \sum_{z\in V}\langle s_xs_z\rangle^{G,\mathrm{Ising}}_{+,\beta}\,\langle s_zs_y\rangle^{G,\mathrm{Ising}}_{+,\beta} = \mathrm{Bubble}^{G,\mathrm{Ising}}_\beta(x,y), \tag{4.21}$$
provided $P^{G,q}_{w,p}(N=0)=1$. Before we can state the known results on mean field critical behavior, we need a few more definitions. We will sometimes abbreviate $\beta_c = \beta_c(G,\mathrm{Ising})$ below, since there is no risk of confusion. Since there is a unique Gibbs measure when $\beta<\beta_c$, we will drop the indication of + or − when referring to this unique measure. We will also have to consider the Ising model with an external field h. We denote by $\mu^{G,\mathrm{Ising}}_{+,\beta,h}$ the Gibbs measure for this system obtained by taking the infinite volume limit with + boundary conditions (this measure is automorphism invariant when G is transitive). (Note that on non-amenable graphs there may be more than one Gibbs distribution even when h ≠ 0. This is well known for homogeneous trees, and was extended to other transitive non-amenable graphs in [JS].) The results below on mean field critical behavior were proved in the papers [Sok, Aiz, AG] and [AF] taken together (see also [ABF]). Suppose that G is an infinite, locally finite, connected transitive unimodular graph. Under the bubble diagram condition, which reads
$$\mathrm{Bubble}^{G,\mathrm{Ising}}_{\beta_c(G,\mathrm{Ising})}(r,r) < \infty,$$
one has the following (the labels on the left indicate the way one usually refers to each result, in terms of a corresponding critical exponent; note that in this standard notation, the critical exponent β is not the same as the inverse temperature β):
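$$
\begin{aligned}
&[\alpha\le 0] && 0 \le \frac{d}{d\beta}\sum_{x:\,\mathrm{dist}(x,r)=1}\langle s_rs_x\rangle^{G,\mathrm{Ising}}_{\beta} < C, && \text{for } \beta<\beta_c,\\
&[\gamma=1] && C_1(\beta_c-\beta)^{-1} \le \sum_{x\in V}\langle s_rs_x\rangle^{G,\mathrm{Ising}}_\beta \le C_2(\beta_c-\beta)^{-1}, && \text{for } \beta<\beta_c,\\
&[\beta=1/2] && C_1(\beta-\beta_c)^{1/2} \le \langle s_r\rangle^{G,\mathrm{Ising}}_{+,\beta} \le C_2(\beta-\beta_c)^{1/2}, && \text{for } \beta>\beta_c,\\
&[\delta=3] && C_1h^{1/3} \le \langle s_r\rangle^{G,\mathrm{Ising}}_{+,\beta_c,h} \le C_2h^{1/3}, && \text{for } h>0,
\end{aligned}
$$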
where in each case C, C_1, C_2 ∈ (0, ∞).
4.3. Consequences for FK and Potts models of “high non-amenability”. The next theorem extends Theorem 3.2, and hence Theorem 1.1, to the FK models. It implies Theorem 1.3 and Theorem 1.9, as will be explained after we prove it.
Theorem 4.4. Suppose that G is an infinite, locally finite, connected graph and q ≥ 1. If
$$i_E(G) > \frac{q}{1+q^2}\Big(-1+\sqrt{D(G)^2+q^2D(G)^2-q^2}\Big), \tag{4.22}$$
then for the q-FK model on G,
$$p_{c,w}(G,q) \le p_{c,f}(G,q) < p_{\exp,w}(G,q) \le p_{\exp,f}(G,q). \tag{4.23}$$
Moreover, for k = 0, 1, ...,
$$\sup_{x,y\in V,\ \mathrm{dist}(x,y)\ge n}\mathrm{Diag}^{G,q}_{p_{c,w}(G,q),w,k}(x,y)\to 0 \quad\text{exponentially fast as } n\to\infty,$$
and for each x ∈ V,
$$\mathrm{Diag}^{G,q}_{p_{c,w}(G,q),w,k}(x,x) < \infty.$$
Proof. From (4.2) and (3.1) we know that
$$\frac{p_{c,*}(G,q)}{p_{c,*}(G,q)+(1-p_{c,*}(G,q))q} \le \frac{1}{1+i_E(G)}.$$
This is equivalent to
$$p_{c,*}(G,q) \le \frac{q}{q+i_E(G)}. \tag{4.24}$$
From (4.1) and (3.5) we know that $p_{\exp,*}(G,q)\ge 1/R(G)$. Combining this with (2.4), we have
$$p_{\exp,*}(G,q) \ge \frac{1}{\sqrt{D(G)^2-i_E(G)^2}}. \tag{4.25}$$
The strict inequality in (4.23) is implied by (4.24) and (4.25), provided that
$$\frac{q}{q+i_E(G)} < \frac{1}{\sqrt{D(G)^2-i_E(G)^2}},$$
which is equivalent to $(1+q^2)i_E(G)^2 + 2q\,i_E(G) - q^2(D(G)^2-1) > 0$. Since $i_E(G)\ge 0$, this is equivalent to the inequality (4.22). This completes the proof of (4.23), since the non-strict inequalities there are clear from $P^{G,q}_{f,p}\le P^{G,q}_{w,p}$. The claims about the diagrams follow by the same argument, using Theorem 3.1 in place of (3.5). □
Note that the condition
$$\frac{i_E(G)}{D(G)} > \frac{q}{\sqrt{1+q^2}}, \tag{4.26}$$
which appears in Theorem 1.3 and Theorem 1.9, is stronger than (4.22), and that therefore, thanks to (4.6), Theorem 4.1 and (4.21), Theorem 1.3 and Theorem 1.9 follow from Theorem 4.4. We chose to state Theorem 1.1, Theorem 1.3 and Theorem 1.9 in the introduction with the less cumbersome condition (4.26) rather than (4.22), to make these theorems more transparent. Theorem 1.3 has the following counterpart for graphs that are not necessarily transitive.
Theorem 4.5. Suppose that G is an infinite, locally finite, connected graph. If (4.22) holds, then there is a non-degenerate interval of values of β on which (1.2) fails and $\mu^{G,q}_{f,\beta}$ has exponentially decaying correlations, in the sense that
$$\sup_{\substack{x,y\in V\\ \mathrm{dist}(x,y)\ge l}}\Big(\langle\delta_{\sigma_x,\sigma_y}\rangle^{G,q}_{f,\beta}-\frac1q\Big)\to 0 \quad\text{exponentially fast, as } l\to\infty. \tag{4.27}$$
In particular $\bar\beta_2(G,q) > \beta_c(G,q)$.
Proof. Set $p'(G,q) = q/(q+i_E(G))$, which is equivalent to
$$\frac{p'(G,q)}{p'(G,q)+(1-p'(G,q))q} = \frac{1}{1+i_E(G)}. \tag{4.28}$$
From the proof of Theorem 4.4 we know that $p'(G,q) < p_{\exp,f}(G,q)$, so that the interval
$$I = \Big(\log\frac{1}{1-p'(G,q)},\ \log\frac{1}{1-p_{\exp,f}(G,q)}\Big)$$
is well defined and non-empty. For β ∈ I, (4.27) follows from (4.8), since $p = 1-e^{-\beta} < p_{\exp,f}(G,q)$. On the other hand, the proof of Theorem 2 in [BS1] ((3.1) in the current paper, when restated for bond percolation) shows that for $p > 1/(1+i_E(G))$,
$$\inf_{x\in V}P^G_p(x\to\infty) > 0.$$
(For details, see the proof of Theorem 5.3 in Subsect. 5.2, which extends (3.1).) For β ∈ I, $p = 1-e^{-\beta} > p'(G,q)$ and therefore $p/(p+(1-p)q) > 1/(1+i_E(G))$, by (4.28). Using (4.2), we conclude that
$$\inf_{x\in V}P^{G,q}_{w,p}(x\to\infty) \ge \inf_{x\in V}P^G_{p/(p+(1-p)q)}(x\to\infty) > 0.$$
From (4.7) and the fact that $P^{G,q}_{w,p}$ has positive correlations, we now obtain for β ∈ I and i = 1, ..., q,
$$\inf_{x,y\in V}\langle\delta_{\sigma_x,\sigma_y}\rangle^{G,q}_{i,\beta} \ge \frac1q + \frac{q-1}{q}\Big(\inf_{x\in V}P^{G,q}_{w,p}(x\to\infty)\Big)^2 > \frac1q.$$
But if (1.2) held, this would contradict (4.27). □
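The arithmetic behind (4.22)–(4.26) and the interval I in the proof of Theorem 4.5 can be checked numerically. The sketch below, with illustrative function names, compares the upper bound (4.24) on $p_{c,*}(G,q)$ with the lower bound (4.25) on $p_{\exp,*}(G,q)$, taking $i_E(G)$ and $D(G)$ as plain numbers supplied by the user. Since $p_{\exp,f}(G,q)$ itself is not determined by these two numbers, the window it returns uses the bound (4.25) instead and is therefore only a sub-interval of I.

```python
import math


def rhs_422(D, q):
    """Right-hand side of condition (4.22)."""
    return q / (1 + q * q) * (-1 + math.sqrt(D * D + q * q * D * D - q * q))


def pc_upper(iE, q):
    """Bound (4.24): p_{c,*}(G, q) <= q / (q + i_E(G))."""
    return q / (q + iE)


def pexp_lower(iE, D):
    """Bound (4.25): p_{exp,*}(G, q) >= 1 / sqrt(D(G)^2 - i_E(G)^2); needs i_E(G) < D(G)."""
    return 1 / math.sqrt(D * D - iE * iE)


def beta_of(p):
    """Inverse of p = 1 - exp(-beta), for 0 <= p < 1."""
    return math.log(1 / (1 - p))


def guaranteed_beta_window(iE, D, q):
    """Sub-interval of I from the proof of Theorem 4.5, or None if (4.22) fails.

    The unknown p_{exp,f}(G, q) is replaced by the smaller bound (4.25), so the
    returned window (beta_low, beta_high) is contained in I.
    """
    lo, hi = pc_upper(iE, q), pexp_lower(iE, D)
    if not lo < hi:
        return None
    upper = math.inf if hi >= 1 else beta_of(hi)
    return beta_of(lo), upper


# Example with hypothetical values i_E(G) = 9.5, D(G) = 10, q = 2.
print(rhs_422(10, 2), guaranteed_beta_window(9.5, 10, 2))
```

With q = 1, `rhs_422` reduces to the threshold $(\sqrt{2D(G)^2-1}-1)/2$ of Theorem 3.2, which is a convenient consistency check.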
5. Potts and Ising Models II. The Finite Island Property
5.1. Sufficient condition for the Potts free Gibbs measure to decompose as the uniform mixture of the q ordered phases. Our approach in this section to Theorem 1.6 and Theorem 1.7 is motivated by the proof of Proposition 2.1.5 in [NW], where the same conclusion was obtained in the special case in which the graph is $T_b\times\mathbb{Z}$ with large b. This approach, combined with results in [BS3], will also lead to the proof of Theorem 1.8. First we need to review a notion which is stronger than the requirement that percolation occurs. The setting is that of bond percolation on an infinite, locally finite, connected graph G. We say that the finite island property holds if every infinite chain in the graph intersects some infinite cluster (in the sense of containing some site which belongs to some infinite cluster). In other words, if we remove all the sites which belong to infinite clusters (along with the edges incident to them), then the remaining graph contains no infinite connected component.
Theorem 5.1 (Newman and Wu). Suppose that G is an infinite, locally finite, connected graph and set $p = 1-e^{-\beta}$. If, for bond percolation on G under the law $P^{G,q}_{f,p}$ of the FK model with parameters q and p and free boundary conditions, there is a.s. a unique infinite cluster and the finite island property holds, then
$$\mu^{G,q}_{f,\beta} = \sum_{i=1,\dots,q}\frac1q\,\mu^{G,q}_{i,\beta}. \tag{5.1}$$
The hypothesis, and hence the conclusion above, holds if for independent bond percolation on G with density $p/(p+(1-p)q)$ there is a.s. a unique infinite cluster and the finite island property holds. For a complete proof of this theorem, we refer the reader to the proof of Proposition 2.1.5 in [NW]; for convenience we sketch the idea of the proof here.
Sketch of Proof. Suppose that we construct a random spin configuration with Gibbs distribution $\mu^{G,q}_{f,\beta}$ on the same probability space on which a bond percolation process with law $P^{G,q}_{f,p}$ has been constructed, in the fashion reviewed in the last section, i.e., according to the coupling $\mathbf{P}^{G,q}_{f,p}$. Suppose that in this bond percolation process there is a unique infinite cluster and the finite island property holds. The spins in the unique infinite cluster all take the same value, say S, chosen uniformly from {1, ..., q}. Therefore every infinite chain must contain spins which take the value S. This implies that the spin at any site is “shielded from infinity by spins S”. It is then a fairly standard matter to use the Markov property of $\mu^{G,q}_{f,\beta}$ (with no further reference to the FK model and the coupling $\mathbf{P}^{G,q}_{f,p}$) to derive (5.1) rigorously.
As for the last statement in the theorem, it is enough to observe that the event that there is a unique infinite cluster and that the finite island property holds is increasing, and then to use (4.2). □
Theorem 5.1 motivates the following problem, which is also interesting in its own right as a natural extension of the problem of determining for which graphs $p_c < 1$.
Open Problem 5.1. For which graphs does the finite island property hold for independent bond percolation with large p? In particular, is the condition $i^*_E(G) > 0$ (or the stronger condition $i_E(G) > 0$) sufficient for this?
Theorem 5.2 and Theorem 5.5 contain partial answers to this question. In combination with Theorem 5.1 they imply Theorem 1.6 and Theorem 1.7. Theorem 5.2 and Theorem 5.5 extend results in [NW] and [Wu3], and also a result in [Jon] stating that the finite island property holds for bond percolation on homogeneous trees with degree at least 3, for large p (see the proof of Theorem 1.3 in that paper). Note that Theorem 5.1 cannot be applied to the homogeneous trees, since uniqueness of the infinite cluster always fails there.
5.2. Finite island property for graphs with positive anchored vertex isoperimetric constant.
Theorem 5.2. Suppose that G is an infinite, locally finite, connected graph. If $i^*_V(G) > 0$, then for independent bond percolation with large p the finite island property holds.
Our proof of Theorem 5.2 was motivated by the approach used in [NW]. The first step in its proof is Theorem 5.3, which extends Theorem 2 in [BS1] (inequality (3.1) here) by providing exponential estimates. Its proof is adapted from [BS1], with a large deviation estimate used to obtain the desired exponential bound. (Note that [BS1] considered site percolation; the adaptation to bond percolation, as done here or in Lemma 4 in [PSN], is straightforward.) We stress that the precise form of the exponential estimates obtained in Theorem 5.3 is not a priori obvious. In a sequence of remarks later in this section we discuss this point and related aspects of the proof of Theorem 5.2.
Theorem 5.3. Suppose that G is an infinite, locally finite, connected graph.
(a) If $i^*_E(G) > 0$, then for $p > 1/(1+i^*_E(G))$ there is γ ∈ (0, ∞) and for each x ∈ V there is $C_x\in(0,\infty)$ such that for any finite connected $A\subset V$ with x ∈ A,
$$P^G_p(A\not\to\infty) \le C_xe^{-\gamma(|A|+|\partial_VA|)}.$$
Moreover, γ can be taken as large as desired, provided p is close enough to 1.
(b) If $i_E(G) > 0$, then for $p > 1/(1+i_E(G))$ there are C, γ ∈ (0, ∞) such that for any finite $A\subset V$,
$$P^G_p(A\not\to\infty) \le Ce^{-\gamma(|A|+|\partial_VA|)}.$$
Moreover, γ can be taken as large as desired, provided p is close enough to 1.
Proof. We will prove (a); the proof of (b) is analogous but simpler. Since $p > 1/(1+i^*_E(G))$, we can take $0 < h < i^*_E(G)$ such that
$$p > \frac{1}{1+h}. \tag{5.2}$$
Given x ∈ V, with no loss of generality we can suppose that |A| is large enough that for all finite connected $S\subset V$ such that x ∈ S and |S| ≥ |A| we have
$$|\partial_ES| \ge h|S|. \tag{5.3}$$
As in the proofs of Theorem 2 in [BS1] and Lemma 4 in [PSN], we will use a technique known as “growing the cluster of the set A”, which we now review. A random sequence of finite subsets $V_i$, i = 0, 1, ..., of V and a random sequence of finite subsets $E_i$, i = 0, 1, ..., of E will be defined so that if $|C(A)| < \infty$, $V_i$ grows towards C(A) and $E_i$ grows towards a set which contains $\partial_EC(A)$. To describe these sequences we order the elements of E in an arbitrary way. Set $V_0 = A$, $E_0 = \emptyset$. Suppose now that $V_i$ and $E_i$ are known; we explain how $V_{i+1}$ and $E_{i+1}$ are obtained. In case $\partial_EV_i\subset E_i$, set $V_{i+1}=V_i$ and $E_{i+1}=E_i$. Otherwise the set $\partial_EV_i\setminus E_i$ is not empty and we let $e_i$ be its first element. Let $x_i$ be the endpoint of $e_i$ contained in $(V_i)^c$. We check whether the edge $e_i$ is occupied or not. If $e_i$ is occupied, set $V_{i+1} = V_i\cup\{x_i\}$ and $E_{i+1} = E_i$. If $e_i$ is vacant, set $V_{i+1} = V_i$ and $E_{i+1} = E_i\cup\{e_i\}$. This completes the description of the construction. On the event $\{A\not\to\infty\}$ there exists n such that $C(A) = V_n$ and $\partial_EC(A)\subset E_n$. In what follows we take the minimal such n, so that
$$|V_n| + |E_n| = |A| + n. \tag{5.4}$$
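The exploration just described is straightforward to express in code. The following is a minimal sketch on a finite graph, where the exploration always terminates (unlike on an infinite graph, where it stops only on $\{A\not\to\infty\}$); the 6 × 6 grid, the occupation density 0.45 and all names are assumptions for illustration. It exhibits the identity (5.4) and the inclusion $\partial_EV_n\subset E_n$.

```python
import random


def edge_boundary(V, edges):
    """Edges with exactly one endpoint in V."""
    return [e for e in edges if (e[0] in V) != (e[1] in V)]


def grow_cluster(A, edges, is_open):
    """Explore the open cluster of A, checking one edge per step.

    `edges` is a list in a fixed (arbitrary) order and `is_open` maps each edge
    to True/False. Returns (V_n, E_n, n) for the first n with
    boundary(V_n) contained in E_n.
    """
    V, E, n = set(A), set(), 0
    while True:
        unchecked = [e for e in edge_boundary(V, edges) if e not in E]
        if not unchecked:
            return V, E, n
        e = unchecked[0]                      # first element in the fixed order
        x = e[1] if e[0] in V else e[0]       # the endpoint outside V_i
        if is_open[e]:
            V.add(x)                          # occupied: V_{i+1} = V_i with x_i added
        else:
            E.add(e)                          # vacant: E_{i+1} = E_i with e_i added
        n += 1


def grid_edges(L):
    """Edges of the L x L grid, vertices labelled (i, j)."""
    out = []
    for i in range(L):
        for j in range(L):
            if i + 1 < L:
                out.append(((i, j), (i + 1, j)))
            if j + 1 < L:
                out.append(((i, j), (i, j + 1)))
    return out


rng = random.Random(3)
edges = grid_edges(6)
is_open = {e: rng.random() < 0.45 for e in edges}
A = {(0, 0), (0, 1)}
V, E, n = grow_cluster(A, edges, is_open)
```

Each step adds exactly one element to $V_i$ or to $E_i$, which is why $|V_n|+|E_n|$ grows by one per step; an edge stored in $E_i$ may later become interior, which is why only the inclusion $\partial_EV_n\subset E_n$ holds in general.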
Since $\partial_EV_n\subset E_n$ and $x\in A\subset V_n$, from (5.3) we obtain $|E_n|\ge|\partial_EV_n|\ge h|V_n|$. Using (5.4) then gives
$$|E_n| \ge \frac{h}{1+h}(|A|+n). \tag{5.5}$$
Note that we must have $n\ge|\partial_VA|$, since for each vertex in $\partial_VA$ at least one of the edges incident to it must have its occupancy status checked in the construction before we can decide whether this vertex belongs to C(A) or not. Therefore, from (5.5) and the fact that $E_i$ is non-decreasing in i, we obtain
$$
\begin{aligned}
P^G_p(A\not\to\infty) &\le P^G_p\Big(|E_n|\ge\frac{h}{1+h}(|A|+n)\text{ for some } n\ge|\partial_VA|\Big)\\
&\le P^G_p\Big(|E_{|A|+n}|\ge\frac{h}{1+h}(|A|+n)\text{ for some } n\ge|\partial_VA|\Big)\\
&\le \sum_{n=|\partial_VA|}^{\infty} P^G_p\Big(|E_{|A|+n}|\ge\frac{h}{1+h}(|A|+n)\Big)
= \sum_{i=|A|+|\partial_VA|}^{\infty} P^G_p\Big(|E_i|\ge\frac{h}{1+h}\,i\Big).
\end{aligned}
$$
For each i, the distribution of $|E_i|$ is stochastically dominated by a binomial distribution corresponding to i independent attempts, each with probability 1 − p of success.
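The large deviation step can be illustrated numerically. In the sketch below (names are illustrative), the dominating variable Bin(i, 1 − p) is compared with the threshold $\frac{h}{1+h}\,i$. Since (5.2) gives $1-p < \frac{h}{1+h}$, the standard Chernoff bound $\exp\!\big(-i\,H(\tfrac{h}{1+h}\,\|\,1-p)\big)$, with H the relative entropy of Bernoulli laws, decays exponentially in i, and its rate grows without bound as p → 1. This is one standard way to obtain such an estimate and is not claimed to reproduce the argument of the cited exercise.

```python
import math


def binom_upper_tail(i, r, k):
    """Exact P(Bin(i, r) >= k)."""
    return sum(math.comb(i, j) * r**j * (1 - r) ** (i - j) for j in range(k, i + 1))


def kl_bernoulli(a, r):
    """Relative entropy of Bernoulli(a) with respect to Bernoulli(r), for 0 < a, r < 1."""
    return a * math.log(a / r) + (1 - a) * math.log((1 - a) / (1 - r))


def tail_with_bound(i, p, h):
    """P(Bin(i, 1-p) >= h/(1+h) i) and its Chernoff bound; assumes p > 1/(1+h)."""
    a, r = h / (1 + h), 1 - p
    k = math.ceil(a * i - 1e-12)  # smallest integer >= a*i (guarding against rounding)
    return binom_upper_tail(i, r, k), math.exp(-i * kl_bernoulli(a, r))


h = 0.5  # then (5.2) requires p > 1/(1+h) = 2/3
for p in (0.8, 0.95, 0.99):
    exact, bound = tail_with_bound(100, p, h)
    print(p, exact, bound, kl_bernoulli(h / (1 + h), 1 - p))
```

Summing the bound over $i\ge|A|+|\partial_VA|$ gives a geometric series, which is where the exponent $|A|+|\partial_VA|$ in Theorem 5.3 comes from.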
Therefore from (5.2) and standard facts about large deviations for the binomial distribution (see, e.g., Exercise 2.2.23(b) on p. 35 of [OZ]) we obtain the desired conclusion, including the statement that γ can be made as large as desired by taking p close to 1. □
Remark 5.1. In the exponential estimates in Theorem 5.3, either of the terms |A| and $|\partial_VA|$ may dominate the other. For instance, if G = T! and A is a long chain, then $|\partial_VA|$ dominates |A|. If G = Br(a) for some a = 2, 3, ... and $A = \{x\in V : \mathrm{dist}(r,x)\le 2k-1\}$ with large k, then |A| dominates $|\partial_VA|$.
Remark 5.2. Even under the condition $i_V(G) > 0$ (stronger than $i^*_V(G)>0$, $i_E(G)>0$ and $i^*_E(G)>0$), $P^G_p(A\not\to\infty)$ in Theorem 5.3 may not decay as fast as an exponential of $|\partial_EA|$, for any p < 1, even if A is required to be a chain starting at a given fixed site. An example proving this assertion is as follows. Consider the graph Br(a) with some a = 2, 4, 6, ..., and take for $A = A_k$ a finite chain with exactly one vertex at distance i from the root for i = 0, ..., 2k, and $a^{k+1}/2$ vertices at distance 2k + 1 from the root (this is half of the existing ones). Then $P^G_p(A_k\not\to\infty)\ge(1-p)^{a^{k+1}}$, while $|\partial_EA_k|\ge(a^{k+1}/2)^2 = a^{2k+2}/4$.
Proposition 5.1 (Chen, Peres). Suppose that G is an infinite, locally finite, connected graph.
(a) If $i^*_E(G)>0$, then there is γ ∈ (0, ∞) and for each x ∈ V there is $C_x\in(0,\infty)$ such that $|\{S : x\in S\subset V,\ S\text{ connected},\ |\partial_ES| = n\}| \le C_xe^{\gamma n}$.
(b) If $i^*_V(G)>0$, then there is γ ∈ (0, ∞) and for each x ∈ V there is $C_x\in(0,\infty)$ such that $|\{S : x\in S\subset V,\ S\text{ connected},\ |\partial_VS| = n\}|\le C_xe^{\gamma n}$.
Part (a) is Lemma 2.1 in [CP], and part (b) can be proved in an analogous manner. (These proofs are applications of probability to combinatorics; for part (a) one uses bond percolation, and for part (b) one replaces it with site percolation.)
Remark 5.3.
The condition $i_E(G)>0$ (stronger than $i^*_E(G)>0$) imposes no restriction on the way $|\{S : r\in S\subset V,\ S\text{ is a chain},\ |\partial_VS| = n\}|$ $\big(\le|\{S : r\in S\subset V,\ S\text{ connected},\ |\partial_VS|=n\}|\big)$ grows with n. For an extreme example, consider the graph Br(a) and, for $k > \log(n-1)/\log a$, let $S_k$ be a finite chain which contains all vertices at distance i from the root, for i = 0, ..., 2k, and $a^{k+1}-n+1$ vertices at distance 2k + 1 from the root (this is all but n − 1 of the existing ones). Then $|\partial_VS_k| = n$ for each admissible k. This shows that for each n we have $|\{S : r\in S\subset V,\ S\text{ is a chain},\ |\partial_VS| = n\}| = \infty$.
Remark 5.4. Note that even under the condition $i_V(G) > 0$ (stronger than $i^*_V(G)>0$, $i_E(G)>0$ and $i^*_E(G)>0$), $|\{S : r\in S\subset V,\ S\text{ is a chain},\ |S| = n\}|$ may grow faster than exponentially with n. For instance, for T! this last expression takes the value (n − 2)!.
Proof of Theorem 5.2. For x ∈ V let $F_x$ denote the event that there is an infinite chain starting at x which does not contain any site belonging to an infinite cluster. Our task is to show that $P^G_p(F_x) = 0$ for each x.
Set
$$i_V^x(G) = \inf\left\{\frac{|\partial_V S|}{|S|} : x \in S \Subset V,\ S \text{ connected}\right\}.$$
Since $i_V^*(G) > 0$, we also have $i_V^x(G) > 0$. If $F_x$ happens, then for each n there is a finite chain $A \Subset V$ with $x \in A$ and $|A| \ge n/i_V^x(G)$ such that $A \nrightarrow \infty$. (A can be chosen as a finite chain contained in the chain whose existence is assumed in the definition of $F_x$.) In particular $|\partial_V A| \ge n$. When p is large enough that the γ in Theorem 5.3(a) is larger than the γ in Proposition 5.1(b), we now obtain
$$P^G_p(F_x) \le \lim_{n\to\infty} \sum_{i \ge n}\ \sum_{\substack{A \Subset V,\ A \text{ is a chain},\\ |\partial_V A| = i,\ x \in A}} P^G_p(A \nrightarrow \infty) = 0. \qquad \square$$
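The final step in the proof of Theorem 5.2 is a Peierls-type summation: the number of chains with $|\partial_V A| = i$ grows at most like $e^{\gamma' i}$ (Proposition 5.1(b)), each is isolated from infinity with probability at most of order $e^{-\gamma i}$ (Theorem 5.3(a)), and the tail over $i \ge n$ is a geometric series. A minimal numerical sketch, in which all constants are hypothetical stand-ins for the ones provided by those two results:

```python
import math


def peierls_tail_bound(n, c_count, gamma_count, c_prob, gamma_prob):
    """Bound on sum_{i >= n} #{chains A containing x with |dV A| = i} * P(A not connected to infinity).

    Hypothetical inputs: the count is assumed to be at most c_count * exp(gamma_count * i)
    (the shape of Proposition 5.1(b)) and each probability at most
    c_prob * exp(-gamma_prob * i) (the shape of Theorem 5.3(a)).
    """
    delta = gamma_prob - gamma_count
    if delta <= 0:
        raise ValueError("need gamma_prob > gamma_count for the series to converge")
    # Geometric series: sum_{i >= n} exp(-delta * i) = exp(-delta * n) / (1 - exp(-delta)).
    return c_count * c_prob * math.exp(-delta * n) / (1.0 - math.exp(-delta))


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, peierls_tail_bound(n, 2.0, 1.0, 3.0, 1.5))
```

The bound tends to 0 as n grows precisely because the probabilistic rate exceeds the combinatorial one, which is why p must be taken large.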
Remark 5.5. The presence of |A| in the exponential estimate in Theorem 5.3(a) was not needed in the above proof of Theorem 5.2. The presence of $|\partial_V A|$ there was the crucial element in this proof, and Remark 5.4 shows that if we only had |A| there, our approach would not work. Remark 5.3 explains why our approach does not settle the question raised in Open Problem 5.1 of whether $i_E^*(G) > 0$ (or $i_E(G) > 0$) could replace the condition $i_V^*(G) > 0$ in Theorem 5.2.

The following theorem is an extension of Theorem 1.6.

Theorem 5.4. Suppose that G is an infinite, locally finite, connected graph. If $i_V^*(G) > 0$ and, for all large p < 1, $P^G_p$-a.s. there is a unique infinite cluster, then $\bar\beta_2(G,q) < \infty$ for each q. In particular, for large β, $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$.

Proof. Combine Theorem 5.1 with Theorem 5.2. □
A class of graphs which satisfy the hypothesis of Theorem 5.4 is that of Cartesian products of two infinite, connected, bounded degree graphs, at least one of which is non-amenable. It is easy to see that the product is then also non-amenable, and from Theorem 1.9 of [HPS] we have on it a.s. uniqueness of the infinite cluster for $p > p_c(\mathbb{Z}^2)$.

5.3. Finite island property for transitive graphs with one end which satisfy the quasi-connected minimal cut sets property.

Theorem 5.5. Suppose that G = (V, E) is an infinite, locally finite, connected transitive graph. If G has the quasi-connected minimal cut sets property and has a single end, then $p_u(G) < 1$ and for independent bond percolation with large p the finite island property holds.

Proof. We know from the proof of Corollary 10 of [BB] that $p_c(G) \le p_u(G) < 1$. Take $p \in (p_c(G), 1)$, and suppose that the finite island property does not hold for independent bond percolation with parameter p. Then in this percolation process we have a.s. the coexistence of an infinite cluster (call it C) and an infinite chain which does not intersect any infinite cluster (call it I). Choose x ∈ C and y ∈ I. Let Π be the set of edges {u, v} ∈ E with the property that there is a path from u to x inside C (i.e., all the vertices it crosses belong to C) and there is a path from v to y inside $C^c$ (i.e., all the vertices it crosses belong to $C^c$). It is clear that Π is an (x, y) mcs.
We want to argue that Π is infinite. Since G has a single end, given any n, the removal of the ball B(n) from the graph leaves exactly one infinite connected component in the remaining graph. This infinite component must contain a subset $C_n$ of C and a subset $I_n$ of I. Therefore there must be a path outside B(n) from $C_n$ to $I_n$. Obviously, this path can be chosen so that one of its endpoints, u, belongs to $C_n$, but it does not cross any other site of $C_n$. Say that {u, v} is the edge of this path incident to u, and that w is the endpoint of this path distinct from u. By pasting this path to a path from u to x inside C and to a path from w to y inside I, we obtain a path which shows that {u, v} ∈ Π. Since n is arbitrary, one can obtain infinitely many elements of Π in this manner. From the definition of the quasi-connected mcs property, we know that there exists l < ∞ such that any infinite mcs, when considered as a set of vertices in $(G_E)^l$, must contain an infinite chain in this graph. So we learn that Π must contain an infinite chain in $(G_E)^l$. But all edges in Π are vacant (since each of them has one endpoint in C and the other in $C^c$). Consider independent site percolation on $(G_E)^l$ with parameter 1 − p, in which a vertex of $(G_E)^l$ (i.e., an edge of G) is declared occupied when the corresponding edge of G is vacant. From what we just learned, there is an infinite cluster in this process. But since G is a transitive graph, $(G_E)^l$ must be a quasi-transitive graph and, in particular, it must have bounded degree. Therefore $(G_E)^l$ does not support infinite clusters in site percolation with parameter 1 − p when p is close to 1. To avoid a contradiction we must conclude that for independent bond percolation on our original graph G the finite island property holds when p is large enough. □

Proof of Theorem 1.7. Combine Theorem 5.1 with Theorem 5.5.
□
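The fact that a bounded degree graph has no infinite site-percolation cluster when the occupation parameter 1 − p is small can be made quantitative with the standard self-avoiding path count (a step the proof of Theorem 5.5 leaves implicit): if a site lies in an infinite cluster, then for every n there is an open self-avoiding path with n steps starting there, so that probability is at most the expected number of such paths. A minimal sketch, with D a hypothetical stand-in for the maximal degree of $(G_E)^l$ and q = 1 − p:

```python
def open_path_bound(D, q, n):
    """Bound on E[number of open self-avoiding paths with n >= 1 steps from a fixed site].

    In a graph of maximal degree D there are at most D * (D - 1)**(n - 1) self-avoiding
    paths with n steps from a given site; each visits n + 1 distinct sites, and all of
    them must be occupied, which happens with probability q**(n + 1).
    """
    return D * (D - 1) ** (n - 1) * q ** (n + 1)


def path_bound_vanishes(D, q):
    """True when the bound above tends to 0 as n grows, i.e. when q * (D - 1) < 1."""
    return q * (D - 1) < 1


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, open_path_bound(5, 0.2, n))
```

So any $1 - p < 1/(D - 1)$ suffices in this sketch; the argument in the text only needs some such threshold to exist.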
5.4. Finite island property for planar graphs.

Suppose that G = (V, E) is a planar graph, as defined in Subsect. 2.8, and let $G^\dagger = (V^\dagger, E^\dagger)$ be its dual, so that there is a one-to-one correspondence between the edges of E and those of $E^\dagger$. Given a random configuration of occupied and vacant edges in $\{0,1\}^E$, say that $e^\dagger \in E^\dagger$ is occupied (resp. vacant) if the corresponding edge e ∈ E is vacant (resp. occupied). Let, as before, N be the number of infinite clusters in G according to a random configuration in $\{0,1\}^E$, and let $N^\dagger$ be the number of infinite clusters in $G^\dagger$ according to the corresponding random configuration in $\{0,1\}^{E^\dagger}$. From the proof of Theorem 3.4 in [BS3], we know that if G = (V, E) is an infinite, locally finite, connected transitive non-amenable planar graph with one end, and P is a probability distribution on $\{0,1\}^E$ which is automorphism invariant, has a trivial invariant σ-field and satisfies the finite energy condition (which means that on any finite set of edges each configuration of occupied and vacant edges has positive conditional probability given any configuration outside of this set), then
$$P\big((N, N^\dagger) \in \{(1,0),\ (0,1),\ (\infty,\infty)\}\big) = 1. \tag{5.6}$$
The measures $P^{G,q}_{*,p}$ are known to satisfy the conditions above on P, which are the same as those in the next theorem.

Theorem 5.6. Suppose that G = (V, E) is an infinite, locally finite, connected transitive non-amenable planar graph with one end. Suppose that P is a probability distribution on $\{0,1\}^E$ which is automorphism invariant, has a trivial invariant σ-field and satisfies the finite energy condition. If P(N = 1) = 1, then P(the finite island property holds) = 1.
Proof. We will use the notation from Subsect. 2.8. Given a bounded set A ⊂ Π, we say that a set S ⊂ Π surrounds A if there exists a simple closed curve whose image is contained in S and which separates Π into two components, one of which is bounded and contains A. Let O (resp. C) be the set of embedded edges of G which are occupied (resp. belong to infinite clusters). Since P(N = 1) = 1, (5.6) tells us that P-a.s. $N^\dagger = 0$, i.e., there are no dual infinite clusters. Any open ball B ⊂ Π intersects only finitely many faces of the embedding of G in Π. The random set of faces which belong to the same dual cluster as some face which intersects B is therefore P-a.s. finite. Let $\tilde B$ be the closure of the union of these faces. $\tilde B$ must be connected and bounded. Hence $\Pi \setminus \tilde B$ contains exactly one unbounded component $\check B$. The boundary of $\check B$ is a subset of O which has the property needed to show that O surrounds B. Since P-a.s. there is an infinite cluster, large balls must intersect this infinite cluster, and since they are surrounded by O, they must also be surrounded by C. This implies that P-a.s. C surrounds each bounded A ⊂ Π. But this implies the finite island property. □

Proof of Theorem 1.8. The statement that $p_u(G) < 1$ is contained in Theorem 1.1 of [BS3]. From Proposition 2.1 in [BS3], we know that G is unimodular, and we can therefore use Theorem 4.3. From the dichotomy in part (a) of that theorem, Theorem 5.6 and Theorem 5.1, we obtain claim (a). Using (a) and the fact that $p_u(G) < 1$, parts (b) and (c) of Theorem 4.3 now imply
$$\bar\beta_1(G,q) = \bar\beta_2(G,q) = \log\frac{1}{1 - \bar p_{\mathrm{conn},f}(G,q)}. \qquad \square$$
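The last formula is the change of variables $\beta = \log\frac{1}{1-p}$, equivalently $p = 1 - e^{-\beta}$, between the Potts inverse temperature and the FK edge parameter. A small sketch of the two conversions, assuming this is the convention in force (it is the one consistent with the display); the function names are ours:

```python
import math


def beta_from_p(p):
    """beta = log(1 / (1 - p)) for 0 <= p < 1, computed stably with log1p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log1p(-p)


def p_from_beta(beta):
    """Inverse map p = 1 - exp(-beta), computed stably with expm1."""
    return -math.expm1(-beta)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, beta_from_p(p))
```

Since the map is increasing, "β large" in the Potts model corresponds to "p close to 1" in the FK model, which is how the percolation results above feed into statements about $\bar\beta_1$ and $\bar\beta_2$.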
6. Related Results

6.1. Multiplicity of phase transitions in the FK model. The FK model is worthy of study for its own sake (see, e.g., [Gri1, Häg1, Jon, GHM] and [Lyo5]). To keep the current paper from becoming too long, we will limit ourselves here to stating a conjecture similar to Conjecture 1.4, and then stating two results which are immediate consequences of our results on the Potts model and of (4.10). This will restrict our results to the cases q ∈ {2, 3, ...}. (Part (a.1) of the next conjecture is a consequence of the well known (4.9) and is included in the statement of the conjecture for comparison with the other statements there. Also the fact that $p_{c,w}(G,q) > 0$ for transitive graphs, or more generally for bounded degree graphs, is well known.)

Conjecture 6.1. Suppose that G is an infinite, locally finite, connected transitive graph. Then for each q > 1 there exist $0 < p_{c,w}(G,q) \le \bar p(G,q)$ such that:
(a.1) For $p < p_{c,w}(G,q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.
(a.2) For $p_{c,w}(G,q) < p < \bar p(G,q)$, $P^{G,q}_{f,p} \ne P^{G,q}_{w,p}$.
(a.3) For $p > \bar p(G,q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.
Moreover:
(b.1) $p_{c,w}(G,q) < \bar p(G,q)$ iff G is non-amenable.
(b.2) $\bar p(G,q) < 1$ iff $p_u(G) < 1$.
As with the last implication in (4.10), the conjecture above cannot be extended to q = 1. Note that if Conjecture 4.1 (which includes Conjecture 3.1) and Conjecture 4.2 were both proved, then using (4.10) the statements in Conjecture 6.1, with the exception of (b.1), would follow, with $\bar p(G,q) = \bar p_{\mathrm{conn},f}(G,q)$. From Theorem 1.3, Theorem 1.6 and (4.10) we obtain:

Theorem 6.1. Suppose that G is an infinite, locally finite, connected transitive graph with $p_u(G) < 1$ and that q ∈ {2, 3, ...}. If $i_E(G)/D(G) > q/\sqrt{1+q^2}$, then there are $p_{c,w}(G,q) < \bar p_1(G,q) \le \bar p_2(G,q) < 1$ such that for the q-FK model on G:
(a) For $p_{c,w}(G,q) < p < \bar p_1(G,q)$, $P^{G,q}_{f,p} \ne P^{G,q}_{w,p}$.
(b) For $p > \bar p_2(G,q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.

(Only the use of Theorem 1.6 restricts Theorem 6.1 to integer values of q. The use of Theorem 1.3 can be replaced with Theorem 4.4 and (4.16), which hold for q ≥ 1.) From Theorem 1.8 and (4.10) we obtain:

Theorem 6.2. Suppose that G is an infinite, locally finite, connected transitive non-amenable planar graph with one end. Then for each q ∈ {2, 3, ...} there is $\bar p(G,q)$ with $p_{c,w}(G,q) \le \bar p(G,q) < 1$ such that:
(a) For $p_{c,w}(G,q) < p < \bar p(G,q)$, $P^{G,q}_{f,p} \ne P^{G,q}_{w,p}$.
(b) For $p > \bar p(G,q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.

Note that combining Theorem 6.2 with Theorem 4.4(a) in [Jon] and (4.10) again, we learn that under the conditions of Theorem 6.2, if q is a large integer (depending on G), then $p_{c,w}(G,q) < \bar p(G,q)$. This further supports Conjecture 6.1.

Note. Parts (iii) and (iv) of Proposition 6.11 of [HJL], a paper written independently of this one, contain further partial results which support Conjecture 6.1.
6.2. Site percolation. In this subsection we summarize the independent site percolation analogues of some of the independent bond percolation results in this paper. In independent site percolation, each site of a graph G = (V, E) is occupied with probability p and vacant otherwise, these decisions being independent for distinct sites. Clusters are the connected components of the graph obtained from G by deleting the vacant sites, along with the edges incident to them. The notation below should be self-explanatory. First we note that Br(a), a ≥ 2, can be used to illustrate the fact that even under $i_E(G) > 0$ we can have $p_c^{\mathrm{site}}(G) = 1$. On the other hand, Theorem 2 in [BS1] states that, similarly to (3.1),
$$p_c^{\mathrm{site}}(G) \le \frac{1}{1 + i_V(G)}. \tag{6.1}$$
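As a sanity check of (6.1), consider the d-regular tree $T_d$ (d ≥ 3), assuming that $\partial_V S$ denotes the outer vertex boundary (sites outside S with a neighbour in S) and that $i_V(G)$ is the infimum of $|\partial_V S|/|S|$ over finite S. A finite subtree with n sites has exactly $(d-2)n + 2$ outer boundary sites, so $i_V(T_d) = d - 2$ and (6.1) gives $1/(d-1)$, which is the well-known value of $p_c^{\mathrm{site}}(T_d)$; the bound is sharp on trees. The sketch below (helper names are ours) verifies the boundary count on balls with exact rational arithmetic:

```python
from fractions import Fraction


def tree_ball_counts(d, R):
    """(|S|, |dV S|) for the ball S of radius R around a site of the d-regular tree.

    dV S is taken to be the outer vertex boundary: sites outside S with a neighbour in S,
    which for a ball are exactly the sites at distance R + 1.
    """
    size, layer = 1, 1
    for r in range(1, R + 1):
        layer = d if r == 1 else layer * (d - 1)  # number of sites at distance r
        size += layer
    boundary = d * (d - 1) ** R                   # number of sites at distance R + 1
    return size, boundary


def site_pc_bound(i_V):
    """Right-hand side of (6.1): 1 / (1 + i_V(G))."""
    return Fraction(1) / (1 + i_V)


if __name__ == "__main__":
    for R in range(6):
        n, b = tree_ball_counts(3, R)
        print(R, n, b, Fraction(b, n))
```

The ratios decrease to d − 2 but never reach it, so the infimum defining $i_V(T_d)$ is not attained.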
Theorem 1.1 has an analogue with the condition $i_V(G)/D(G) > 1/\sqrt{2}$ replacing the condition $i_E(G)/D(G) > 1/\sqrt{2}$. And Theorem 4.4 has an analogue with the condition $i_V(G) > \big(-1 + \sqrt{2D(G)^2 - 1}\big)/2$ replacing (4.22). These claims are consequences of the following. First, (3.1) is replaced with (6.1). Second, (3.4) and Theorem 3.1 are essentially unchanged for site percolation. Third, $i_V(G) \le i_E(G)$, so that (2.4) yields
$$(i_V(G))^2 + (R(G))^2 \le (D(G))^2. \tag{6.2}$$
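One way to see where the threshold in the site analogue of Theorem 4.4 comes from, assuming (as the parallel with the bond case suggests) that its role is to guarantee $p_c^{\mathrm{site}}(G) < 1/R(G)$: by (6.1) it suffices that $R(G) < 1 + i_V(G)$, and by (6.2) this follows once the first inequality below holds. Solving the quadratic (and using $i_V(G) \ge 0$ to discard the negative root) gives:

```latex
D(G)^2 - i_V(G)^2 < \bigl(1 + i_V(G)\bigr)^2
\iff 2\,i_V(G)^2 + 2\,i_V(G) + 1 - D(G)^2 > 0
\iff i_V(G) > \frac{-1 + \sqrt{2D(G)^2 - 1}}{2}.
```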
Theorem 5.3 has the following analogue, which improves on (6.1).

Theorem 6.3. Suppose that G is an infinite, locally finite, connected graph.
(a) If $i_V^*(G) > 0$, then for $p > 1/(1 + i_V^*(G))$ there is γ ∈ (0, ∞) and for each x ∈ V there is $C_x \in (0,\infty)$ such that for any connected $A \Subset V$ with x ∈ A,
$$P^{G,\mathrm{site}}_p(A \nrightarrow \infty) \le C_x e^{-\gamma|\partial_V A|}.$$
Moreover γ can be taken as large as desired, provided p is close enough to 1.
(b) If $i_V(G) > 0$, then for $p > 1/(1 + i_V(G))$ there are C, γ ∈ (0, ∞) such that for any $A \Subset V$,
$$P^{G,\mathrm{site}}_p(A \nrightarrow \infty) \le C e^{-\gamma|\partial_V A|}.$$
Moreover γ can be taken as large as desired, provided p is close enough to 1.

A term |A| can also be included in the exponent in the upper bounds in Theorem 6.3, but unless one is interested in good estimates for γ, this is irrelevant, since the term $|\partial_V A|$ dominates the term |A| up to a constant factor. (Compare with Remark 5.1.) While the finite island property for site percolation cannot be motivated by a statistical mechanics Gibbs measure problem, as was the case for bond percolation (see Subsect. 5.1), it is still a natural property to investigate. As for bond percolation, for site percolation we say that the finite island property holds if any infinite chain in the graph intersects some infinite cluster (in the sense of containing some site which belongs to some infinite cluster). Using Theorem 6.3(a) and Proposition 5.1(b), we obtain the analogue of Theorem 5.2.

Theorem 6.4. Suppose that G is an infinite, locally finite, connected graph. If $i_V^*(G) > 0$, then for independent site percolation with large p the finite island property holds.

7. The Contact Process

7.1. Separation of critical points for the contact process. As usual we will identify the state of the contact process at time t with the set of infected sites at this time. In doing so, we will also, as in previous sections, denote sets with a single element by the name of this element.
Denote by $(\xi^A_{G,\lambda;t})_{t\ge0}$ the contact process with infection parameter λ started from $A \subset V$. Also let $\xi^A_{G,\lambda;t}(x)$ be the indicator of the event that in this process the site x ∈ V is infected at time t. The following is the contact process analogue of Theorem 5.3.
Theorem 7.1. Suppose that G is an infinite, connected, bounded degree graph. If $i_E(G) > 0$, then for $\lambda > 1/i_E(G)$ there are C, γ ∈ (0, ∞) such that for any $A \Subset V$,
$$P(\xi^A_{G,\lambda;t} = \emptyset \text{ for some } t > 0) \le C e^{-\gamma|A|}.$$
Moreover γ can be taken as large as desired, provided λ is large enough. In particular
$$\lambda_s(G) \le \frac{1}{i_E(G)}. \tag{7.1}$$
Proof. When the state of the contact process is $S \Subset V$, the process jumps to states with one less infected site at rate |S|, and to states with one more infected site at rate $\lambda|\partial_E S| \ge \lambda i_E(G)|S|$. Therefore the process $(|\xi^A_{G,\lambda;t}|)_{t\ge0}$ is stochastically larger than a birth and death process $(X_{\lambda;t})_{t\ge0}$ with $X_0 = |A|$, in which a death occurs at rate $X_{\lambda;t}$ and a birth occurs at rate $\lambda i_E(G) X_{\lambda;t}$. A time change argument shows that the probability that this birth and death process will ever reach the state 0 is the same as the corresponding probability for a birth and death process started from the same state, in which a death occurs at rate 1 and a birth occurs at rate $\lambda i_E(G)$. But if $\lambda > 1/i_E(G)$, the probability that this process (which is just a biased random walk) will ever reach the state 0 decays exponentially with |A| (by the gambler's ruin formula it equals $(\lambda i_E(G))^{-|A|}$), with a rate of exponential decay which can be made as large as desired, provided λ is large enough. □

Theorem 7.2 below is the analogue of Proposition 8.3 of [HPS] ((3.4) in the current paper). Before we can state and prove it, we need to introduce some notation and review some facts. We will compare the contact process to a branching random walk (BRW). This BRW on an infinite, connected, bounded degree graph G is described as follows. At each time t ≥ 0 the state of the BRW is described by associating a number of particles to each site, with only finitely many sites having a positive number of particles. The evolution is then described by saying that at each time each particle dies at rate 1 and gives birth to a new particle at rate α, with the new particle being placed at a uniformly chosen site from among the neighbors of its parent's site. Denote by $(\zeta^A_{G,\alpha;t})_{t\ge0}$ the BRW on G started from the configuration with one particle at each site in $A \Subset V$ and no other particle. Also let $\zeta^A_{G,\alpha;t}(x)$ be the number of particles in this process at the site x ∈ V at time t.
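Returning to the proof of Theorem 7.1: the jump chain of the comparison process moves up with probability $\lambda i_E(G)/(1+\lambda i_E(G))$ and down otherwise, so the classical gambler's ruin formula gives the probability of ever reaching 0 from |A| as $(\lambda i_E(G))^{-|A|}$ when $\lambda i_E(G) > 1$. This suggests taking $\gamma = \log(\lambda i_E(G))$, which indeed grows without bound in λ. A minimal sketch (helper names are ours) that also cross-checks the formula against the finite-horizon ruin probability:

```python
def ruin_probability(k, birth_rate, death_rate=1.0):
    """P(ever reach 0 | start at k) for a walk stepping +1 / -1 in proportion to the rates."""
    if k <= 0:
        return 1.0
    r = death_rate / birth_rate
    return r ** k if r < 1 else 1.0


def ruin_before(k, birth_rate, N, death_rate=1.0):
    """P(reach 0 before N | start at k) for 0 <= k <= N; tends to ruin_probability as N grows."""
    r = death_rate / birth_rate
    if r == 1:
        return 1 - k / N
    return (r ** k - r ** N) / (1 - r ** N)


if __name__ == "__main__":
    # birth_rate plays the role of lambda * i_E(G), with deaths at rate 1.
    for k in (1, 5, 10):
        print(k, ruin_probability(k, 2.0), ruin_before(k, 2.0, 100))
```

Because both rates in the original birth and death process are proportional to the current population, the time change does not alter the jump chain, which is why only the ratio of the rates enters.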
It is clear that $(\zeta^A_{G,\lambda D(G);t}(\cdot))_{t\ge0}$ is stochastically larger than $(\xi^A_{G,\lambda;t}(\cdot))_{t\ge0}$, i.e., the two processes can be coupled in such a way that whenever the contact process has site x infected, the BRW has at least one particle at this site. (Each particle at a site x of degree $d_x \le D(G)$ sends offspring to each given neighbor at rate $\lambda D(G)/d_x \ge \lambda$, the rate at which an infected site infects each neighbor in the contact process. Note that the BRW that we consider is different from the one obtained from the contact process by simply allowing births of new particles on already occupied sites. That process lies stochastically between the contact process and the BRW that we are considering.) In order to analyse the BRW in the manner of [MS] and [Schi], we first need to consider an individual random walk on G. Recall that $A_G(\cdot,\cdot)$ denotes the adjacency matrix of G and R(G) denotes its spectral radius. For x, y ∈ V set
$$p_G(x,y) = \frac{A_G(x,y)}{d_x}.$$
$p_G(\cdot,\cdot)$ is the transition kernel for the simple random walk on G. We will denote by $p^n_G(x,y)$ the probability that this random walk started at x is at y at time n. The corresponding spectral radius is defined by
$$R_{rw}(G) = \limsup_{n\to\infty}\big(p^n_G(x,y)\big)^{1/n},$$
which does not depend on x, y ∈ V. Similarly to (2.2) and (2.3), we have
$$p^n_G(x,y) \le (R_{rw}(G))^n \quad \text{for all } x, y \in V,\ n = 1, 2, \dots, \tag{7.2}$$
and, for $n(k) = \mathrm{bip}(G)\,k + \mathrm{odd}_G(x,y)$,
$$\lim_{k\to\infty}\big(p^{n(k)}_G(x,y)\big)^{1/n(k)} = R_{rw}(G) \quad \text{for all } x, y \in V. \tag{7.3}$$
Clearly
$$\frac{R(G)}{D(G)} \le R_{rw}(G) \le \frac{R(G)}{d(G)}, \tag{7.4}$$
where d(G) is the minimal degree of G and, as before, D(G) is its maximal degree. Consider also a continuous time simple random walk on G which jumps at rate α. We will denote by $P_{G,\alpha;t}(x,y)$ the probability that this continuous time random walk started at x is at y at time t. The following lemma must be well known, but we could not find a reference.

Lemma 7.1.
$$\lim_{t\to\infty}\frac{1}{t}\log P_{G,\alpha;t}(x,y) = \alpha(R_{rw}(G) - 1), \tag{7.5}$$
and for all t ≥ 0,
$$P_{G,\alpha;t}(x,y) \le e^{\alpha(R_{rw}(G)-1)t}. \tag{7.6}$$
Proof. By definition,
$$P_{G,\alpha;t}(x,y) = \sum_{n=0}^{\infty} e^{-\alpha t}\frac{(\alpha t)^n}{n!}\,p^n_G(x,y). \tag{7.7}$$
So (7.6) follows from (7.2). We will prove (7.5) first when $\mathrm{odd}_G(x,y) = 0$. Note that from (7.3) we then have
$$\lim_{k\to\infty}\big(p^{2k}_G(x,y)\big)^{1/(2k)} = R_{rw}(G).$$
So, given a small ε > 0, there exists K < ∞ such that $p^{2k}_G(x,y) \ge (R_{rw}(G) - \varepsilon)^{2k}$ for k ≥ K. Therefore, from (7.7),
$$P_{G,\alpha;t}(x,y) \ge \sum_{k=K}^{\infty} e^{-\alpha t}\frac{(\alpha t)^{2k}}{(2k)!}(R_{rw}(G)-\varepsilon)^{2k} \ge e^{-\alpha t}\left[\cosh\big(\alpha(R_{rw}(G)-\varepsilon)t\big) - \sum_{k=0}^{K-1}\frac{\big(\alpha(R_{rw}(G)-\varepsilon)t\big)^{2k}}{(2k)!}\right].$$
Hence
$$\liminf_{t\to\infty}\frac{1}{t}\log P_{G,\alpha;t}(x,y) \ge \alpha(R_{rw}(G) - \varepsilon - 1). \tag{7.8}$$
Since ε > 0 can be taken arbitrarily small, (7.5) for $\mathrm{odd}_G(x,y) = 0$ follows from (7.8) and (7.6). One can derive (7.5) for $\mathrm{odd}_G(x,y) = 1$ in a similar fashion, or also easily derive it from (7.5) for $\mathrm{odd}_G(x,y) = 0$. □
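Lemma 7.1 can be checked numerically on a graph where $R_{rw}$ is known in closed form. For the d-regular tree, Kesten's theorem gives $R_{rw} = 2\sqrt{d-1}/d$, and return probabilities depend only on the distance to the starting site, so (7.7) can be evaluated by truncating the Poisson series. The sketch below (function names and the truncation point are our own choices) checks (7.6) and shows $\frac{1}{t}\log P_{G,\alpha;t}(o,o)$ approaching $\alpha(R_{rw}-1)$ from below, rather slowly:

```python
import math


def tree_return_probs(d, n_max):
    """p^n(o, o), n = 0..n_max, for simple random walk on the d-regular tree.

    Only the distance to the root matters: from distance 0 the walk moves to 1, and
    from distance k >= 1 it moves to k + 1 w.p. (d - 1)/d and to k - 1 w.p. 1/d.
    """
    dist = [0.0] * (n_max + 2)
    dist[0] = 1.0
    probs = [1.0]
    for _ in range(n_max):
        new = [0.0] * (n_max + 2)
        new[1] += dist[0]
        for k in range(1, n_max + 1):
            if dist[k]:
                new[k + 1] += dist[k] * (d - 1) / d
                new[k - 1] += dist[k] / d
        dist = new
        probs.append(dist[0])
    return probs


def ct_return_prob(d, alpha, t):
    """P_{G,alpha;t}(o, o) via the Poisson mixture (7.7); requires alpha * t > 0."""
    mean = alpha * t
    n_max = int(mean + 12 * math.sqrt(mean) + 20)  # Poisson mass beyond this is negligible
    total = 0.0
    for n, p in enumerate(tree_return_probs(d, n_max)):
        log_weight = -mean + n * math.log(mean) - math.lgamma(n + 1)
        total += math.exp(log_weight) * p
    return total


def kesten_radius(d):
    """Spectral radius R_rw of simple random walk on the d-regular tree (Kesten)."""
    return 2 * math.sqrt(d - 1) / d


if __name__ == "__main__":
    d, alpha = 3, 1.0
    rho = kesten_radius(d)
    for t in (5.0, 20.0, 100.0):
        P = ct_return_prob(d, alpha, t)
        print(t, P, math.exp(alpha * (rho - 1) * t), math.log(P) / t, alpha * (rho - 1))
```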
Set
$$\lambda_{\exp}(G) = \sup\big\{\lambda : P(\xi^{\{r\}}_{G,\lambda;t}(r) = 1) \text{ decays exponentially with } t\big\}.$$
Clearly
$$\lambda_{\exp}(G) \le \lambda_r(G). \tag{7.9}$$
To see this, note that for $\lambda < \lambda_{\exp}$, by a Borel–Cantelli argument, $\xi^r_{G,\lambda;t}(r) = 1$ for only finitely many integer values of t > 0, a.s. But using the Markov property of the contact process, it is clear that if the root were infected at arbitrarily large times, then it would a.s. be infected at arbitrarily large integer times (since starting from any configuration that has the root infected at time 0, one has probability at least $e^{-1}$ of having the root infected throughout the next unit interval of time).
Theorem 7.2. Suppose that G = (V, E) is an infinite, connected, bounded degree graph. For each λ > 0 and x, y ∈ V,
$$P(\xi^x_{G,\lambda;t}(y) = 1) \le e^{(\lambda D(G) R_{rw}(G) - 1)t}. \tag{7.10}$$
In particular
$$\lambda_r(G) \ge \lambda_{\exp}(G) \ge \frac{1}{D(G)R_{rw}(G)} \ge \frac{d(G)}{D(G)R(G)}. \tag{7.11}$$
Proof. The BRW $(\zeta^A_{G,\lambda D(G);t}(\cdot))_{t\ge0}$ is the same BRW studied in [Schi] (with the quantity λ there being our α = λD(G)). Using (3) from that paper and (7.6), we obtain
$$\begin{aligned}
P(\xi^x_{G,\lambda;t}(y) = 1) &\le E(\zeta^x_{G,\lambda D(G);t}(y)) = e^{(\lambda D(G)-1)t}\,P_{G,\lambda D(G);t}(x,y)\\
&\le e^{(\lambda D(G)-1)t + \lambda D(G)(R_{rw}(G)-1)t} = e^{(\lambda D(G)R_{rw}(G)-1)t},
\end{aligned}$$
proving (7.10). The first inequality in (7.11) is (7.9). The second inequality in (7.11) is a direct consequence of (7.10). The third inequality in (7.11) follows from (7.4). □

Theorem 7.3. Suppose that G is an infinite, connected, bounded degree graph. If
$$i_E(G) > \frac{D(G)^2}{\sqrt{D(G)^2 + d(G)^2}}, \tag{7.12}$$
then for the contact process on G, $\lambda_s(G) < \lambda_{\exp}(G) \le \lambda_r(G)$.

Proof. From Theorem 7.1 and Theorem 7.2, it is enough to verify that $1/i_E(G) < d(G)/(D(G)R(G))$. From (2.4) we see that it is therefore enough to verify that $1/i_E(G)^2 < d(G)^2/\big(D(G)^2(D(G)^2 - i_E(G)^2)\big)$. But this is equivalent to (7.12). □
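On the d-regular tree $T_d$ (d ≥ 3) the bounds behind Theorems 7.1 to 7.3 are explicit, assuming the standard values $i_E(T_d) = d - 2$ (a finite subtree with n sites has $(d-2)n + 2$ boundary edges) and $R_{rw}(T_d) = 2\sqrt{d-1}/d$ (Kesten), with D(G) = d(G) = d. The sketch below (helper names are ours) compares the upper bound (7.1) on $\lambda_s$ with the lower bound $1/(D(G)R_{rw}(G))$ from (7.11), and checks that they are separated exactly when (7.12) holds, which for these trees means d ≥ 7:

```python
import math


def tree_contact_bounds(d):
    """(upper bound on lambda_s, lower bound on lambda_exp) on the d-regular tree.

    Assumes i_E(T_d) = d - 2 and R_rw(T_d) = 2 * sqrt(d - 1) / d, with D(G) = d.
    """
    i_E = d - 2
    R_rw = 2 * math.sqrt(d - 1) / d
    upper_lambda_s = 1 / i_E            # (7.1)
    lower_lambda_exp = 1 / (d * R_rw)   # second inequality in (7.11)
    return upper_lambda_s, lower_lambda_exp


def condition_7_12(d):
    """(7.12) with D(G) = d(G) = d: i_E > D^2 / sqrt(D^2 + d^2) = d / sqrt(2)."""
    return (d - 2) > d * d / math.sqrt(2 * d * d)


if __name__ == "__main__":
    for d in range(3, 11):
        up, low = tree_contact_bounds(d)
        print(d, round(up, 4), round(low, 4), up < low, condition_7_12(d))
```

On trees the inequality (2.4) is an equality, which is why the two tests coincide exactly here; on other graphs (7.12) is only a sufficient condition.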
7.2. Mean field criticality for the contact process. We suppose in this subsection that G = (V, E) is an infinite, locally finite, connected transitive graph. In this case the transition kernel $p_G(\cdot,\cdot)$ is symmetric, since $d_x = D(G)$ does not depend on x ∈ V. As a consequence, $P_{G,\alpha;t}(\cdot,\cdot)$ is also symmetric. Furthermore, (7.4) then becomes
$$D(G)R_{rw}(G) = R(G). \tag{7.13}$$
We will have to consider the contact process started at time s ≥ 0 from a configuration $A \subset V$. This process will be denoted by $(\xi^{A,s}_{G,\lambda;t})_{t\ge s}$. Similarly, $(\zeta^{A,s}_{G,\alpha;t})_{t\ge s}$ will denote the same BRW that we considered before, but started at time s ≥ 0 from the configuration with one particle at each site in $A \Subset V$ and no other particle. The contact process triangle diagram is defined as follows, where z ∈ V and s ≥ 0:
$$\nabla^G_\lambda(z,s) = \sum_{u,v\in V}\int_s^\infty dt_1\int_{t_1}^\infty dt_2\; P(\xi^{u,t_1}_{G,\lambda;t_2}(v) = 1)\,P(\xi^{r,0}_{G,\lambda;t_1}(u) = 1)\,P(\xi^{z,s}_{G,\lambda;t_2}(v) = 1).$$
Before we can state some known results on the mean field critical behavior of the contact process, we need some more definitions:
$$\chi^G(\lambda) = E\left(\int_0^\infty dt\sum_{x\in V}\xi^r_{G,\lambda;t}(x)\right),$$
$$\rho^G(\lambda) = P(\xi^r_{G,\lambda;t} \ne \emptyset \text{ for all } t > 0),$$
$$\rho^G(\lambda,h) = 1 - E\left(\exp\left(-h\int_0^\infty dt\sum_{x\in V}\xi^r_{G,\lambda;t}(x)\right)\right),$$
where h ≥ 0. Below we will sometimes abbreviate $\lambda_s = \lambda_s(G)$, since there is no risk of confusion. The results below on mean field critical behavior were proved in [BW]. Suppose that G is an infinite, locally finite, connected transitive unimodular graph. Under the contact process open triangle condition, which reads
$$\lim_{l\to\infty}\ \sup_{z:\,\mathrm{dist}(r,z)\ge l}\ \sup_{s:\,s\ge 0}\ \nabla^G_{\lambda_s}(z,s) = 0, \tag{7.14}$$
one has the following (the labels on the left indicate the way one usually refers to each result, in terms of a corresponding critical exponent):
[γ = 1] $C_1(\lambda_s - \lambda)^{-1} \le \chi^G(\lambda) \le C_2(\lambda_s - \lambda)^{-1}$, for λ < λ_s,
[β = 1] $C_1(\lambda - \lambda_s)^{1} \le \rho^G(\lambda) \le C_2(\lambda - \lambda_s)^{1}$, for λ > λ_s,
[δ = 2] $C_1 h^{1/2} \le \rho^G(\lambda_s, h) \le C_2 h^{1/2}$, for h > 0,
where in each case $C_1, C_2 \in (0,\infty)$. The following result is the contact process counterpart to Theorem 3.1. It is related to Theorem 7.2 in the same way that Theorem 3.1 is related to Proposition 8.3 of [HPS] ((3.4) in this paper).

Theorem 7.4. Suppose that G = (V, E) is an infinite, connected, bounded degree graph. For each λ < 1/R(G), (7.14) holds.
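Proof. For arbitrary z ∈ V and s ≥ 0,
$$\begin{aligned}
\nabla^G_\lambda(z,s) &\le \sum_{u,v\in V}\int_s^\infty dt_1\int_{t_1}^\infty dt_2\; E(\zeta^{u,t_1}_{G,\lambda D(G);t_2}(v))\,E(\zeta^{r,0}_{G,\lambda D(G);t_1}(u))\,E(\zeta^{z,s}_{G,\lambda D(G);t_2}(v))\\
&= \sum_{v\in V}\int_s^\infty dt_2\; E(\zeta^{z,s}_{G,\lambda D(G);t_2}(v))\int_s^{t_2} dt_1\sum_{u\in V}E(\zeta^{r,0}_{G,\lambda D(G);t_1}(u))\,E(\zeta^{u,t_1}_{G,\lambda D(G);t_2}(v))\\
&= \sum_{v\in V}\int_s^\infty dt_2\;(t_2-s)\,E(\zeta^{r,0}_{G,\lambda D(G);t_2}(v))\,E(\zeta^{z,s}_{G,\lambda D(G);t_2}(v))\\
&= \sum_{v\in V}\int_s^\infty dt_2\;(t_2-s)\,e^{(\lambda D(G)-1)t_2}P_{G,\lambda D(G);t_2}(r,v)\,e^{(\lambda D(G)-1)(t_2-s)}P_{G,\lambda D(G);t_2-s}(z,v)\\
&= \int_s^\infty dt_2\; e^{(\lambda D(G)-1)(2t_2-s)}(t_2-s)\sum_{v\in V}P_{G,\lambda D(G);t_2}(r,v)\,P_{G,\lambda D(G);t_2-s}(v,z)\\
&= \int_s^\infty dt_2\; e^{(\lambda D(G)-1)(2t_2-s)}(t_2-s)\,P_{G,\lambda D(G);2t_2-s}(r,z)\\
&= \int_s^\infty dw\; e^{(\lambda D(G)-1)w}\,\frac{w-s}{4}\,P_{G,\lambda D(G);w}(r,z),
\end{aligned}$$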
where in the first step we used the stochastic domination of the contact process by the BRW, in the third step we used the Markov property of the BRW and the fact that in this process particles do not interact after having been created, in the fourth step we used (3) from [Schi], in the fifth step we used the symmetry of $P_{G,\lambda D(G);t}(\cdot,\cdot)$, in the sixth step we used the Markov property of the continuous time simple random walk on G which jumps at rate λD(G), and in the seventh step we changed variables: $w = 2t_2 - s$. Therefore, using $(w-s)/4 \le w$ and extending the integral to $[0,\infty)$,
$$\sup_{z:\,\mathrm{dist}(r,z)\ge l}\ \sup_{s:\,s\ge0}\ \nabla^G_\lambda(z,s) \le \int_0^\infty dw\, f_l(w),$$
where
$$f_l(w) = e^{(\lambda D(G)-1)w}\,w\sup_{z:\,\mathrm{dist}(r,z)\ge l}P_{G,\lambda D(G);w}(r,z).$$
Thanks to (7.6) and (7.13), we have
$$0 \le f_l(w) \le w\,e^{(\lambda D(G)-1)w}\,e^{\lambda D(G)(R_{rw}(G)-1)w} = w\,e^{(\lambda R(G)-1)w}.$$
And clearly, for each w,
$$\lim_{l\to\infty} f_l(w) = 0.$$
Therefore (7.14) follows from the dominated convergence theorem, since λR(G) − 1 < 0. □
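For concreteness, the integrable majorant used in the last step can be integrated explicitly. Taking l = 0 in the same estimate also yields a bound on the triangle diagram that is uniform in z and s whenever λ < 1/R(G):

```latex
\sup_{z \in V}\ \sup_{s \ge 0}\ \nabla^G_\lambda(z,s)
\le \int_0^\infty w\, e^{-(1-\lambda R(G))w}\, dw
= \frac{1}{\bigl(1-\lambda R(G)\bigr)^2} < \infty .
```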
Proof of Theorem 1.10. The claim that $\lambda_s(G) < \lambda_r(G)$ is contained in Theorem 7.3, since in the case of transitive G, (7.12) is the same as the condition $i_E(G)/D(G) > 1/\sqrt{2}$. The claim that the contact process open triangle condition (7.14) holds follows from the same arguments as in the proof of Theorem 7.3, using Theorem 7.4 in place of Theorem 7.2. □

Acknowledgements. It is a pleasure to thank Marcia Salzano and Ander Holroyd for discussions on topics in this paper, and Oded Schramm for conversations on [BS3]. I also thank Russ Lyons for several important comments on the first version of this paper and for sending me a copy of [HJL] shortly after the current paper was submitted. The current paper and [HJL] were written independently, but turned out to have some interesting interrelations. For instance, the explicit expression in [HJL] for the edge-isoperimetric constant of transitive planar graphs with one end shows that when such a graph has a large degree (or has a dual with large degree) it is highly non-amenable in the sense of the current paper. See also the notes at the end of Subsects. 4.1 and 6.1. Thanks go also to Chris Wu for sending me a copy of [Wu4] shortly after the current paper was submitted, and for pointing out to me the contribution in [RNO]. For the relation between the current paper and [Wu4], see the note after the statement of Theorem 1.8.
References
[Aiz1] Aizenman, M.: Geometric analysis of ϕ⁴ fields and Ising models. I, II. Commun. Math. Phys. 86, 1–48 (1982)
[Aiz2] Aizenman, M.: Stochastic geometry in statistical mechanics and quantum field theory. In: Proceedings of the International Congress of Mathematicians, Vol. 1, 2, Warsaw: PWN, 1984, pp. 1297–1307
[AB] Aizenman, M. and Barsky, D.: Sharpness of the phase transition in percolation models. Commun. Math. Phys. 108, 489–526 (1987)
[AF] Aizenman, M. and Fernández, R.: On the behavior of the magnetization in high-dimensional Ising models. J. Stat. Phys. 44, 393–454 (1986)
[ABF] Aizenman, M., Barsky, D. and Fernández, R.: The phase transition in a general class of Ising-type models is sharp. J. Stat. Phys. 47, 343–374 (1987)
[AN] Aizenman, M. and Newman, C.M.: Tree graph inequalities and critical behavior in percolation models. J. Stat. Phys. 16, 811–828 (1983)
[BB] Babson, E. and Benjamini, I.: Cut sets and normed cohomology with application to percolation. Proc. Am. Math. Soc. 127, 589–597 (1999)
[BA] Barsky, D.J. and Aizenman, M.: Percolation critical exponents under the triangle condition. Commun. Math. Phys. 19, 1520–1536 (1991)
[BW] Barsky, D.J. and Wu, C.C.: Critical exponents for the contact process under the triangle condition. J. Stat. Phys. 91, 95–124 (1998)
[Bax] Baxter, R.J.: Exactly Solved Models in Statistical Mechanics. London: Academic Press, 1982
[BLPS1] Benjamini, I., Lyons, R., Peres, Y. and Schramm, O.: Group-invariant percolation on graphs. Geom. and Funct. Anal. 9, 29–66 (1999)
[BLPS2] Benjamini, I., Lyons, R., Peres, Y. and Schramm, O.: Critical percolation on any non-amenable group has no infinite clusters. Ann. Probab. 27, 1347–1356 (1999)
[BLS] Benjamini, I., Lyons, R. and Schramm, O.: Percolation perturbations in potential theory and random walks. In: Random Walks and Discrete Potential Theory (Cortona, 1997), M. Picardello and W. Woess, editors. Cambridge: Cambridge Univ. Press, 1999, pp. 56–84
[BS1] Benjamini, I. and Schramm, O.: Percolation beyond Z^d, many questions and a few answers. Electronic Communications in Probability 1, 71–82 (1996)
[BS2] Benjamini, I. and Schramm, O.: Recent progress on percolation beyond Z^d. Available electronically from the web page www.wisdom.weizmann.ac.il/users/~schramm
[BS3] Benjamini, I. and Schramm, O.: Percolation in the hyperbolic plane. Preprint 2000
[BG] Bezuidenhout, C. and Grimmett, G.: The critical contact process dies out. Ann. Probab. 18, 1462–1482 (1990)
[BCK] Biskup, M., Chayes, L., Kotecký, R.: On the continuity of the magnetization and the energy density for Potts models on two-dimensional graphs. Preprint 1998
[BRZ] Bleher, P.M., Ruiz, P.M. and Zagrebnov, V.A.: On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. J. Stat. Phys. 79, 473–482 (1995)
[BK] Burton, R.M. and Keane, M.: Density and uniqueness in percolation. Commun. Math. Phys. 121, 501–505 (1989)
[CK] Chaboud, T. and Kenyon, C.: Planar Cayley graphs with regular dual. Int. J. Alg. and Comp. 6, 553–561 (1996)
[CS] Chayes, L. and Schonmann, R.H.: Mixed percolation as a bridge between site and bond percolation. Ann. Appl. Prob. (to appear)
[CP] Chen, D. and Peres, Y.: Anchored expansion, percolation and speed. Preprint 1999
[DS] Durrett, R. and Schinazi, R.B.: Intermediate phase for the contact process on a tree. Ann. Probab. 23, 668–673 (1995)
[ES] Edwards, R.G. and Sokal, A.D.: Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Phys. Rev. D 38, 2009–2112 (1988)
[EKPS] Evans, W., Kenyon, C., Peres, Y. and Schulman, L.: Broadcasting on trees and the Ising model. Ann. Appl. Prob. 10, 410–433 (2000)
[Geo] Georgii, H.-O.: Gibbs Measures and Phase Transitions. Berlin, New York: Walter de Gruyter, 1988
[GHM] Georgii, H.-O., Häggström, O. and Maes, C.: The random geometry of equilibrium phases. In: Phase Transitions and Critical Phenomena, Vol. 18, C. Domb and J.L. Lebowitz, editors. London: Academic Press, 2000, pp. 1–142
[Gri1] Grimmett, G.R.: The stochastic random-cluster process and the uniqueness of random-cluster measures. Ann. Probab. 23, 1461–1510 (1995)
[Gri2] Grimmett, G.R.: Percolation. (2nd edition) New York–Berlin: Springer-Verlag, 1999
[GN] Grimmett, G.R. and Newman, C.M.: Percolation in ∞ + 1 dimensions. In: Disorder in Physical Systems, G.R. Grimmett and D.J.A. Welsh, editors. Oxford: Clarendon Press, 1990, pp. 219–240
[GS] Grimmett, G.R. and Stacey, A.M.: Critical probabilities for site and bond percolation models. Ann. Probab. 26, 1788–1812 (1998)
[Häg1] Häggström, O.: The random-cluster model on a homogeneous tree. Probab. Theory and Related Fields 104, 231–253 (1996)
[Häg2] Häggström, O.: Infinite clusters in dependent automorphism invariant percolation on trees. Ann. Probab. 25, 1423–1436 (1997)
[Häg3] Häggström, O.: Markov random fields and percolation on general graphs. Adv. Appl. Probab. 32, 39–66 (2000)
[HJL] Häggström, O., Jonasson, J. and Lyons, R.: Explicit isoperimetric constants, phase transitions in the random-cluster and Potts models, and Bernoullicity. Preprint (2000)
[HP] Häggström, O. and Peres, Y.: Monotonicity of uniqueness for percolation on Cayley graphs: All infinite clusters are born simultaneously. Probab. Theory and Related Fields 113, 273–285 (1999)
[HPS] Häggström, O., Peres, Y. and Schonmann, R.H.: Percolation on transitive graphs as a coalescent process: Relentless merging followed by simultaneous uniqueness. In: Perplexing Problems in Probability. Festschrift in honor of Harry Kesten, M. Bramson, R. Durrett, editors. Basel–Boston: Birkhäuser, 1999, pp. 69–90
[HSS] Häggström, O., Schonmann, R.H. and Steif, J.: The Ising model on diluted graphs and strong amenability. Ann. Probab. (to appear)
[Ham] Hammersley, J.M.: Percolation processes. Lower bounds for the critical probability. Ann. Math. Stat. 28, 790–795 (1957)
[HS] Hara, T. and Slade, G.: Mean-field behaviour and the lace expansion. In: Probability Theory of Spatial Disorder and Phase Transition, G. Grimmett, ed. Dordrecht, Boston, London: Kluwer Publ. Co, 1994, pp. 87–122
[Hig] Higuchi, Y.: Remarks on the limiting Gibbs states on a (d + 1)-tree. Publ. RIMS, Kyoto Univ. 13, 335–348 (1977)
[Hun] Hungerford, T.W.: Algebra. Berlin–Heidelberg–New York: Springer Verlag, 1974
[Iof1] Ioffe, D.: On the extremality of the disordered state for the Ising model on the Bethe lattice. Lett. Math. Phys. 37, 137–143 (1996)
[Iof2] Ioffe, D.: Extremality of the disordered state for the Ising model on general trees. In: Trees (Versailles, 1995), B. Chauvin, S. Cohen and A. Roualt, editors. Basel–Boston: Birkhäuser, 1996, pp. 3–14
[Jon] Jonasson, J.: The random cluster model on a general graph and a phase transition characterization of nonamenability. Stochastic Processes and their Appl. 79, 335–354 (1999)
[JS] Jonasson, J. and Steif, J.E.: Amenability and phase transition in the Ising model. J. Theor. Probab. 12, 549–559 (1999)
[Kes] Kesten, H.: Percolation Theory for Mathematicians. Boston–Basel–Stuttgart: Birkhäuser, 1982
[Lal1] Lalley, S.P.: Percolation on Fuchsian groups. Ann. L'Institut Henri Poincaré (Probability and Statistics) 34, 151–177 (1998)
[Lal2] Lalley, S.: Growth profile and invariant measures for the weakly supercritical contact process on a homogeneous tree. Ann. Probab. 27, 206–225 (1999)
[Lal3] Lalley, S.: Percolation clusters in hyperbolic tessellations. Preprint, 1999
[LS] Lalley, S. and Sellke, T.: Limit set of a weakly supercritical contact process on a homogeneous tree. Ann. Probab. 26, 644–657 (1998)
[Leb] Lebowitz, J.: Coexistence of phases in Ising ferromagnets. J. Stat. Phys. 16, 463–476 (1977)
[Lig1] Liggett, T.M.: Interacting Particle Systems. Berlin–Heidelberg–New York: Springer Verlag, 1985
[Lig2] Liggett, T.M.: Multiple transition points for the contact process on the binary tree. Ann. Probab. 24, 1675–1710 (1996)
[Lig3] Liggett, T.M.: Branching random walks and contact processes on homogeneous trees. Probab. Theory and Related Fields 106, 495–519 (1996)
[Lig4] Liggett, T.M.: Stochastic Interacting Systems: Contact, Voter and Exclusion Processes. Berlin–Heidelberg–New York: Springer Verlag, 1999
[Lyo1] Lyons, R.: The Ising model and percolation on trees and tree-like graphs. Commun. Math. Phys. 125, 337–353 (1989)
[Lyo2] Lyons, R.: Random walk and percolation on trees. Ann. Probab. 18, 931–958 (1990)
[Lyo3] Lyons, R.: Random walk, capacity and percolation on trees. Ann. Probab. 20, 2043–2088 (1992)
[Lyo4] Lyons, R.: Random walks and the growth of groups. Comptes Rendus de l’Academie des Sciences, Series I – Mathematique 320, 1361–1366 (1995)
[Lyo5] Lyons, R.: Phase transition on non-amenable graphs. J. Math. Phys. 41, 1099–1126 (2000)
[LyS] Lyons, R. and Schramm, O.: Indistinguishability of Percolation Clusters. Ann. Probab. 27, 1809–1836 (1999)
[MS] Madras, N. and Schinazi, R.B.: Branching Random Walk on trees. Stochastic Processes and Their Appl. 42, 255–267 (1992)
[MM-S] Messager, A. and Miracle-Sole, S.: Equilibrium states of the two dimensional Ising model in the two-phase region. Commun. Math. Phys. 40, 187–196 (1975)
[Moh1] Mohar, B.: Isoperimetric inequalities, growth, and the spectrum of graphs. Linear Alg. and its Appl. 103, 119–131 (1988)
[Moh2] Mohar, B.: Some relations between analytic and geometric properties of infinite graphs. Discrete Math. 95, 193–219 (1991)
[MW] Mohar, B. and Woess, W.: A survey on spectra of infinite graphs. Bull. London Math. Soc. 21, 209–234 (1989)
[MoS] Moore, T. and Snell, E.J.: A branching process showing a phase transition. J. Appl. Probab. 16, 252–260 (1979)
[MSZ] Morrow, G.J., Schinazi, R.B. and Zhang, Y.: The critical contact process on a homogeneous tree. J. Appl. Probab. 31, 250–255 (1994)
[MP] Muchnik, R. and Pak, I.: Percolation on Grigorchuk groups. Commun. Alg. (To appear)
[NW] Newman, C.M. and Wu, C.C.: Markov fields on branching planes. Probab. Theory and Related Fields 85, 539–552 (1990)
[Ngu] Nguyen, B.: Gap exponent for percolation processes with triangle condition. J. Stat. Phys. 49, 235–243 (1987)
[NY] Nguyen, B. and Yang, W.-S.: Triangle condition for oriented percolation in high dimensions. Ann. Probab. 21, 1809–1844 (1993)
[PS-N] Pak, I. and Smirnova-Nagnibeda, T.: On non-uniqueness of percolation on nonamenable Cayley graphs. Comptes Rendus de l’Academie des Sciences, Serie I – Mathematique 330, 495–500 (2000)
[Pem] Pemantle, R.: The contact process on trees. Ann. Probab. 20, 2089–2116 (1992)
[PS] Pemantle, R. and Stacey, A.: The branching random walk and contact process on Galton–Watson and non-homogeneous trees. Preprint 1999
[Per] Peres, Y.: Percolation on nonamenable products at the uniqueness threshold. Ann. L’Institut Henri Poincaré (Probability and Statistics) 36, 395–406 (2000)
[PLM] Peruggi, F., di Liberto, F. and Monroy, G.: The Potts model on Bethe lattices: I. General results. J. Phys. A 16, 811–828 (1983)
[Pfi] Pfister, C.: Translation invariant equilibrium states of ferromagnetic Abelian lattice systems. Commun. Math. Phys. 86, 375–390 (1982)
[RNO] Rietman, R., Nienhuis, B. and Oitmaa, J.: The Ising model on hyperlattices. J. Phys. A 25, 6577–6592 (1992)
[Sal] Salzano, M.: Infinitely many contact process transitions on a tree. J. Stat. Phys. 97, 817–826 (1999)
[SaS1] Salzano, M. and Schonmann, R.H.: The second lowest extremal invariant measure of the contact process. Ann. Probab. 25, 1846–1871 (1997)
[SaS2] Salzano, M. and Schonmann, R.H.: A new proof that for the contact process on homogeneous trees local survival implies complete convergence. Ann. Probab. 26, 1251–1258 (1998)
[SaS3] Salzano, M. and Schonmann, R.H.: The second lowest extremal invariant measure of the contact process II. Ann. Probab. 27, 845–875 (1999)
[Schi] Schinazi, R.B.: On multiple phase transitions for branching Markov chains. J. Stat. Phys. 71, 507–511 (1993)
[Sch1] Schonmann, R.H.: The triangle condition for contact processes on homogeneous tree. J. Stat. Phys. 90, 1429–1440 (1998)
[Sch2] Schonmann, R.H.: Stability of infinite clusters in supercritical percolation. Preprint (1997), Probability Theory and Related Fields 113, 287–300 (1999)
[Sch3] Schonmann, R.H.: Percolation in ∞ + 1 dimensions at the uniqueness threshold. In: Perplexing Problems in Probability. Festschrift in honor of Harry Kesten, M. Bramson, R. Durrett, editors. Basel–Boston: Birkhäuser 1999, pp. 53–67
[ST] Schonmann, R.H. and Tanaka, N.I.: Lack of monotonicity in ferromagnetic Ising model phase diagrams. The Ann. Appl. Probab. 8, 234–245 (1998)
[SeS] Series, C.M. and Sinai, Ya.G.: Ising models on the Lobachevsky Plane. Commun. Math. Phys. 128, 63–76 (1990)
[SW] Soardi, P.M. and Woess, W.: Amenability, unimodularity, and the spectral radius of random walks on infinite graphs. Math. Z. 205, 471–486 (1990)
[Sok] Sokal, A.: A rigorous inequality for the specific heat of an Ising or φ⁴ ferromagnet. Phys. Lett. A 71, 451–453 (1979)
[Sta] Stacey, A.M.: The existence of an intermediate phase for the contact process on trees. Ann. Probab. 24, 1711–1726 (1996)
[TM] Terrones, H. and Mackay, A.L.: The geometry of hypothetical curved graphite structure. Carbon 30, 1251–1260 (1992)
[VT] Vanderbilt, D. and Tersoff, J.: Negative-curvature fullerene Analog of C60. Phys. Rev. Lett. 68, 511–513 (1992)
[Wu1] Wu, C.C.: Critical behavior of percolation and Markov fields on branching planes. J. Appl. Probab. 30, 538–547 (1993)
[Wu2] Wu, C.C.: The contact process on a tree – behavior near the first transition. Stochastic Processes and their Appl. 57, 99–112 (1995)
[Wu3] Wu, C.C.: Ising models on hyperbolic graphs. J. Stat. Phys. 85, 251–259 (1996)
[Wu4] Wu, C.C.: Ising models on hyperbolic graphs II. J. Stat. Phys. (to appear)
[Wu5] Wu, F.Y.: The Potts model. Rev. Mod. Phys. 54, 235–268 (1982)
[Zha] Zhang, Y.: The complete convergence theorem of the contact process on trees. Ann. Probab. 24, 1408–1443 (1996)
Communicated by J. L. Lebowitz
Commun. Math. Phys. 219, 323 – 355 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Mixing Properties for Mechanical Motion of a Charged Particle in a Random Medium
V. Sidoravicius¹, L. Triolo², M. E. Vares³
¹ IMPA, Rio de Janeiro, Brazil and Institute of Mathematics and Informatics, Vilnius, Lithuania
² Dipartimento di Matematica, Università di Roma Tor Vergata, Roma, Italy
³ IMPA, Rio de Janeiro, Brazil
Received: 22 March 2000 / Accepted: 8 December 2000
Abstract: We study a one-dimensional semi-infinite system of particles driven by a constant positive force F, which acts only on the leftmost particle, of mass M, called the heavy particle (the h.p.); all other particles are mechanically identical and have the same mass m < M. Particles interact through elastic collisions. At the initial time all neutral particles are at rest, and the initial measure is such that the interparticle distances $\xi_i$ are i.i.d. random variables. Under conditions on the distribution of $\xi$ which imply that the minimal velocity acquired by each neutral particle after its first interaction with the h.p. is larger than the drift of an associated Markovian dynamics (in which each neutral particle is annihilated after its first collision), we prove that the dynamics has a strong cluster property. As a consequence, we prove the existence of the discrete-time limit distribution for the system as seen from the first particle, a ψ-mixing property, the existence of a drift velocity, and a central limit theorem for the tracer particle.
1. Introduction
In the present work we are concerned with the long-time behaviour of a mechanical system with infinitely many point particles. The system consists of one charged particle of mass M (hereafter called the heavy particle, or the h.p.), initially located at the origin and subject to a constant positive force F, and infinitely many neutral particles of mass m < M (sometimes called the n.p.) located on the positive half-line. The h.p. plays the role of the tracer particle for the system. Initially all neutral particles are at rest. During the evolution they keep their velocities constant between collisions, which are assumed to be elastic. Due to the external constant force, the heavy particle is steadily accelerated between two consecutive collisions.
We will focus on a set of typical questions, such as the existence of an asymptotic drift and the diffusivity of the motion of the heavy particle, the ergodic behaviour of the medium as seen from the heavy particle, and its mixing properties. We refer to Presutti et al. [12] for various motivations which led to the study of analogous semi-infinite systems (see also
[2–4]). For a more general account of this subject we refer to [15–17]; some recent developments are discussed in [6] and [13]. In contrast to finite systems, for which it is generally believed that the main mechanism of “chaos” is the existence of foliations of the phase space into stable and unstable manifolds (see [1]), for infinite systems it is not completely clear which features are responsible for good ergodic properties. Besides attempts to find some analogue of the notion of hyperbolicity, which still remains quite unclear in the context of infinite mechanical systems, another rather natural mechanism was proposed by Landau and Lifschitz: “ . . . A subsystem undergoes all kinds of interactions with other parts of the system [due to infinite size of the system] . . . [and] owing to the complexity of [such] interactions it will pass sufficiently often through all its possible states . . . ” [9]. Yet another possible mechanism comes from the fact that the dynamics is expected to be asymptotically free (far from the h.p.), i.e. the neutral particles which interact with the h.p. eventually move away towards +∞, becoming asymptotically free. This naturally recalls the Möller wave operators of classical scattering theory, as was pointed out by Sinai more than fifteen years ago, but very little is known in this direction and the questions remain hard to analyse rigorously. To our knowledge the only known result for this infinite mechanical system was obtained by Presutti et al. [12], who proved the existence of the Möller wave operator, as well as asymptotic completeness and the existence of a non-trivial scattering matrix. Although we do not address the same questions in this work, we exploit some ideas which appeared in [12], combining them with coupling methods of probability theory, to prove convergence and mixing properties of the underlying discrete dynamics at the instants the h.p.
collides with the standing neutral particles. On the one hand, the analysis relies on three characteristic properties of the evolution:
a) the cluster structure of the dynamics, i.e. the h.p. interacts with a finite set of neutral particles, and when this interaction is over, a new finite cluster of particles “comes in for the interaction”; when this is finished, the same phenomenon starts again, and so on indefinitely;
b) the occurrence of rare events in the flow of neutral particles, which cause a (locally) significant change in the behaviour of the h.p. and produce an almost complete loss of the memory of the past history of the h.p. (the meaning of “almost complete” will be specified below);
c) the (almost everywhere) existence of a contractive manifold [12].
On the other hand, as already mentioned, we make strong use of couplings and comparisons of two dynamics associated with the same system. In spirit, this is again similar to scattering, which normally involves a comparison of two different dynamics for the same system: the given dynamics and a simplified “free” dynamics. For our concrete dynamical system the role of the “simplified evolution” is played by what we call the Markovian dynamics, in which each neutral particle collides only once with the h.p.; moreover, since all n.p. are indistinguishable, we can think of them as “pulses” which just interact with the h.p., and in this (Markovian) case each one moves freely after its unique collision. Under Assumptions 2.5 given below, we will prove the existence of an invariant measure describing the medium as seen from the heavy particle at the moments of collisions with the standing neutral particles (discrete dynamics) and a ψ-mixing property for the sequence of flight times; as a consequence, we shall show the existence of an asymptotic drift and the diffusive behaviour of the h.p. The main content of our assumptions is the existence of a minimal velocity for the moving n.p.
which is larger than the asymptotic drift velocity of the h.p. in the Markovian dynamics. Informally the scheme of the proof can be described as follows. We allow the interacting dynamics to act for a “long time” and then look at what happened to the system.
Roughly, the “long time” intervals are characterized by the “typical” time needed in the Markovian dynamics for the h.p. to get close to its asymptotic behaviour, provided that its initial velocity is not too large (later specified by a suitable constant $V_*$). At these random instants we check whether there is a large enough cluster of tightly placed neutral particles right in front of the h.p. (the event described in property b) above), and if so, we follow the evolution of the velocity process of the h.p. while it moves “through” this cluster. If one compares two dynamics associated with the same initially standing particles, such that one of them initially has no moving neutral particles and the other has them distributed according to some limiting measure, then, due to property c), one observes that the two velocity processes become close to each other in variational distance. This fact and particular properties of the distribution of the initial interparticle distances enable us to couple these velocity processes with positive probability, thereby achieving, in the case of a successful coupling, a temporary loss of “inertial memory” of the past. This is one of the most delicate parts of the proof. The next step is to exploit property a), the cluster structure of the dynamics, in order to eliminate possible “returns” of the memory via recollisions with particles from the distant past. We obtain control over this by following Sinai’s argument of “continuous loss of memory”, and by proving that the interdistances between the h.p. and the neutral particles with which it collided in the past are stochastically bounded from below by a certain Markov chain with positive drift; thus, from time to time, those neutral particles no longer interact with the h.p. in the future, producing the so-called cluster indices.
These are the principal ingredients in the proof of the existence of an invariant measure describing the medium as seen from the heavy particle at the moments of collisions with the standing neutral particles (discrete dynamics), and of the ψ-mixing property for the sequence of flight times. The existence of an asymptotic drift follows quite easily from the mixing property. The verification of the diffusive behaviour of the h.p. is of a more technical nature. We also note that the equal-mass case (m = M) was treated in [5]. The paper is organized as follows. In Sect. 2 we give a formal description of the model and state our main results. Section 3 is dedicated to the study of the long-time behaviour of the Markovian dynamics associated with the original one. In Sect. 4 we prove the tightness of the family of measures which describe the medium as seen from the h.p., for the discrete-time dynamics. In Sect. 5 we prove the strong cluster property, i.e. the existence of cluster indices, and as a consequence we obtain the existence and uniqueness of the invariant measure for the discrete dynamics, as well as mixing properties and the existence of an asymptotic drift for the h.p. The diffusivity of the motion of the h.p. is proven in Sect. 6, using a technical lemma whose proof is postponed to the Appendix.
2. The Model and Results
We consider a system of particles on the half-line $\mathbb{R}_+ = [0, +\infty)$, with nonnegative velocities. No two particles are allowed to occupy the same point with the same velocity, so that the state of the system at any given time is represented by a countable subset of $\mathbb{R}_+ \times \mathbb{R}_+$, where the first coordinate represents position and the second represents velocity. Thus the state space of our system is taken as
$$X = \{x \subseteq \mathbb{R}_+ \times \mathbb{R}_+ : x \cap ([0, b] \times \mathbb{R}_+) \text{ is finite for each } b < +\infty\},$$
i.e., the configurations of particles are locally (in space) finite.
On X we consider the topology for which a fundamental system of neighbourhoods of $x \in X$ is given by the sets
$$G_{A,V}(x) = \{x' \in X : \#(x' \cap (A \times V)) = \#(x \cap (A \times V))\},$$
where A and V are bounded open subsets of $\mathbb{R}_+$ such that $x \cap \partial(A \times V) = \emptyset$, and where #B denotes the cardinality of a finite set B. That is, elements of X are thought of as discrete measures and X is endowed with the vague topology. It is well known that with this topology X is a Polish space (see [5] and references therein). We then let $\mathcal{B}$ denote its Borel σ-field. Since the system is one-dimensional, it is convenient to represent a configuration x as a sequence $(q_n(x), v_n(x))_{n \ge 0}$, where $q_n$ and $v_n$ denote, respectively, the position and the velocity of the (n+1)-th particle, ordered according to position, i.e., $q_n \ge q_{n-1}$, and if $q_n = q_{n-1}$ then $v_n > v_{n-1}$. Particles are assumed to be pointlike, with the same mass m, except for the leftmost particle, which has mass M > m and is called “the heavy particle” (the h.p.). The h.p. is subject to a constant force F > 0. The other particles do not feel the field and are called neutral particles (n.p.). The interaction among particles is elastic. Neutral particles are indistinguishable, so we may think of them as “pulses” (without interaction among themselves). In this setup all neutral particles keep their own velocities until they collide with the h.p. At such a collision, velocities change according to the following rule, in which v and V are the incoming velocities of the neutral particle and the h.p., respectively, and v′ and V′ are the corresponding outgoing velocities:
$$V' = \alpha V + (1-\alpha)\, v, \qquad v' = (1+\alpha)\, V - \alpha v, \tag{2.1}$$
where $\alpha := (M-m)/(M+m)$, so that 0 < α < 1.
The h.p. is under the action of the constant force F > 0, so that between collisions it moves with constant acceleration $f := F/M$.
Assumptions 2.1. Initially all n.p. are at rest, i.e. $v_i = 0$ for $i \ge 1$. The h.p. is located at the origin ($q_0 = 0$) and has velocity $v_0 \in \mathbb{R}_+$. We then completely describe an
initial probability measure $\mu_0$ on X by requiring that the interparticle distances $\xi_n := q_n - q_{n-1}$ ($n \ge 1$) are i.i.d. positive random variables with an absolutely continuous distribution. Under these assumptions, the dynamics just described is known to be globally well defined $\mu_0$-a.s. on X.
Proposition 2.2 (Existence of dynamics). Under Assumptions 2.1 there exists a measurable flow $(T_t x)_{t \ge 0}$ on $(X, \mathcal{B}, \mu_0)$ corresponding to the dynamical rules prescribed above.
The proof of this statement, as given in [14], combines the following three facts, which hold $\mu_0$-almost surely:
i) only finitely many neutral particles enter any given bounded neighbourhood of the h.p. in finite time;
ii) all collisions are transversal, i.e. the h.p. never collides simultaneously with several neutral particles, nor with a neutral particle which has the same velocity as the h.p.;
iii) only finitely many collisions occur in any finite time interval.
Remark. Assumptions 2.1 require absolute continuity of the distribution of the interdistances $\xi_i$, though heuristics seem to indicate that the absence of non-transversal collisions might be obtained by a codimension-type argument, thus requiring only continuity of this distribution. Nevertheless, for degenerate initial velocities a rigorous proof along these lines is not known to the authors.
As follows from the proof, we may take $(T_t x)$ to be right-continuous, in the sense that if a collision happens at time t, then the $v_n(T_t x)$ denote the outgoing velocities ($n \ge 0$).
Definition 2.3. The position of the h.p. at time t is denoted by $q_0(t, x) = q_0(T_t x)$, where x is the initial configuration. The law of the system at time t, as seen from the position of the h.p., is described by the measure $\mu_t(B) = \mu_0((T^0_t)^{-1}(B))$, where
$$T^0_t x = S_{q_0(t,x)}\, T_t x \tag{2.2}$$
and
$$S_r x = \{(q - r, v) : (q, v) \in x\}. \tag{2.3}$$
From Proposition 2.2, $T^0_t$ is well defined $\mu_0$-a.s.
A useful discrete-time dynamics is obtained by observing the system from the position of the h.p. at the moments of “fresh collisions”, i.e., the first collision with each given n.p., or equivalently, the moments of collisions with standing particles. That is, if $x \in X$ is such that the dynamics is well defined and $n \ge 1$, we define $t_n(x)$ via $q_0(t_n(x), x) = q_n(x^0)$, where $x^0 = \{(q, v) \in x : v = 0\}$ is the configuration of standing particles in x. Clearly $t_n(x) < +\infty$ $\mu_0$-a.s. Considering the rule (2.1) of elastic collisions, the times $t_n(x)$, $n \ge 1$, coincide with the times of successive visits of $(T^0_t x)$ to the set
$$X_0 = \Big\{x \in X : q_0(x) = q_1(x) = 0,\ v_0(x) = \frac{\alpha}{1+\alpha}\, v_1(x)\Big\}.$$
In particular, defining $T_1 x := T^0_{t_1(x)} x$, we have $T_1 x \in X_0$, $t_2(x) = t_1(x) + t_1(T_1 x)$, and $T_1(T_1 x) = T^0_{t_2(x)}(x)$. In fact, for each n,
$$T_1^n(x) = T^0_{t_n(x)}(x),$$
which is well defined $\mu_0$-almost surely for all $n \ge 1$, and which we denote by $T_n x$. We also let $\tau_n(x) := t_n(x) - t_{n-1}(x)$ for $n \ge 1$, with $t_0(x) = 0$, so that $\tau_n(x) = \tau_1(T_{n-1} x)$, $\mu_0$-a.s.
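The following Python sketch is a quick sanity check on the collision rule (2.1). The function names `collide` and `check_collision` are ours, for illustration only, and the masses and velocities are arbitrary test values. It verifies three things: the rule conserves momentum and kinetic energy, it reverses the relative velocity ($v' - V' = V - v$), and a collision with a standing particle leaves the pair in the relation $v_0 = \frac{\alpha}{1+\alpha} v_1$ that defines $X_0$.

```python
import math


def collide(V, v, alpha):
    """Collision rule (2.1): incoming (V, v) of the h.p. and the n.p. -> outgoing (V', v')."""
    return alpha * V + (1 - alpha) * v, (1 + alpha) * V - alpha * v


def check_collision(M, m, V, v):
    """Apply (2.1) with alpha = (M - m)/(M + m); return conservation defects and outputs."""
    alpha = (M - m) / (M + m)
    V_out, v_out = collide(V, v, alpha)
    momentum_defect = (M * V_out + m * v_out) - (M * V + m * v)
    energy_defect = 0.5 * (M * V_out**2 + m * v_out**2) - 0.5 * (M * V**2 + m * v**2)
    return momentum_defect, energy_defect, V_out, v_out


# Example: a heavy particle of mass 3 hits a standing neutral particle of mass 1.
M, m = 3.0, 1.0
alpha = (M - m) / (M + m)
dp, de, V_out, v_out = check_collision(M, m, V=2.0, v=0.0)
print(dp, de, math.isclose(V_out, alpha / (1 + alpha) * v_out))
```

Because the h.p. leaves every fresh collision with velocity $\alpha V$ and the hit particle with $(1+\alpha)V$, the discrete-time dynamics always restarts from $X_0$.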
Definition 2.4. For each $n \ge 1$, $\mu^{(n)}$ denotes the law of the system at time $t_n$, as seen from the random position of the h.p., i.e. $\mu^{(n)}(B) = \mu_0((T_1^n)^{-1}(B))$ for $B \in \mathcal{B}$.
Assumptions 2.5. In the setup of Assumptions 2.1 we allow $v_0$, the initial non-negative velocity of the h.p., to be random, independent of the interdistances $(\xi_n)_{n \ge 1}$. Moreover we require:
(a) There exists a > 0 with $E_{\mu_0}(e^{a\xi_1}) < +\infty$, and
$$d := \inf\{r : \mu_0(\xi_1 \le r) > 0\} > 0.$$
(b1) $\alpha\sqrt{\dfrac{2df}{1-\alpha^2}} \le v_0$, $\mu_0$-a.s.;
(b2) $E_{\mu_0}(v_0^h) < \infty$ for some h > 0.
(c)
$$2\sqrt{(1-\alpha^2)\, d} > \frac{E_{\mu_0}(\xi_1)}{E_{\mu_0}\sqrt{\sum_{i=1}^{\infty} \alpha^{2(i-1)} \xi_i}}.$$
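Condition (c) involves the expectation of a random series, so a closed form is not always available. One way to probe it for a given law of $\xi$ is a Monte Carlo estimate of both sides, truncating the series once $\alpha^{2k}$ falls below $10^{-12}$. The Python sketch below does this for the shifted-exponential law $\xi = d + Z$ of the example that follows. The names `condition_c` and `shifted_exponential`, the sample size and the seed are all illustrative assumptions. A numerical estimate of this kind does not prove that (c) holds.

```python
import math
import random


def condition_c(alpha, d, sample_xi, n_samples=20_000, seed=0):
    """Monte Carlo estimate of both sides of condition (c) in Assumptions 2.5.

    sample_xi(rng) must draw one interparticle distance xi >= d (assumed law).
    Returns (lhs, rhs); condition (c) asks for lhs > rhs.
    """
    rng = random.Random(seed)
    # Truncate sum_{i>=1} alpha^{2(i-1)} xi_i once alpha^{2k} < 1e-12.
    if alpha == 0:
        n_terms = 1
    else:
        n_terms = math.ceil(math.log(1e-12) / math.log(alpha**2))
    sum_xi1 = 0.0
    sum_sqrt_s = 0.0
    for _ in range(n_samples):
        xs = [sample_xi(rng) for _ in range(n_terms)]
        sum_xi1 += xs[0]
        s = sum(alpha ** (2 * k) * x for k, x in enumerate(xs))  # k = i - 1
        sum_sqrt_s += math.sqrt(s)
    lhs = 2 * math.sqrt((1 - alpha**2) * d)
    rhs = sum_xi1 / sum_sqrt_s  # ratio of the two sample means
    return lhs, rhs


def shifted_exponential(d, rate):
    """Sampler for xi = d + Z with Z ~ Exp(rate)."""
    return lambda rng: d + rng.expovariate(rate)


lhs, rhs = condition_c(0.5, 1.0, shifted_exponential(1.0, 2.0))
print(f"lhs = {lhs:.3f}, rhs = {rhs:.3f}, condition (c) holds: {lhs > rhs}")
```

For a rate far below 1/d (e.g. 0.1 with d = 1) the estimated right-hand side exceeds the left-hand side. By Jensen's inequality the right-hand side is then at least about $\sqrt{(1-\alpha^2)E\xi_1}$, which is large.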
For α = 0, condition (c) coincides with condition (ii) in [5], and (b1) becomes trivial. An example satisfying Assumptions 2.5 (a) and (c) is $\{\xi_i = d + Z_i\}_i$, where the $Z_i$ are i.i.d. exponential random variables with rate greater than 1/d. We may now state our main results.
Theorem 2.6. Under Assumptions 2.5 there exists a probability measure µ on X, concentrated on $X_0$, stationary for the discrete-time dynamics, and such that $\mu^{(n)} \to \mu$ weakly as $n \to +\infty$.
The proof of the above theorem is based on the construction of the so-called cluster index (see Definition 5.1); using this construction we will also prove the following:
Theorem 2.7. Under Assumptions 2.5 we have:
I) There exist positive constants c, c′ such that
$$\sup_{A \in \mathcal{M}_0^k,\ B \in \mathcal{M}_{k+n}^{+\infty}} |\mu(B \mid A) - \mu(B)| \le c\, e^{-c' n}$$
for all $k, n \ge 1$, where $\mathcal{M}_m^n$ denotes the σ-field generated by the variables $\tau_i$, $m \le i \le n$.
II) There exists a positive constant $v_D$ (the drift velocity) such that, as $t \to +\infty$,
$$\frac{q_0(t)}{t} \to v_D \qquad \mu_0\text{-a.s.}$$
III) There exists a positive constant $\tilde\sigma$ such that
$$\left(\frac{q_0(ut) - v_D\, ut}{\tilde\sigma\sqrt{t}}\right)_{0 \le u \le 1} \to (W_u)_{0 \le u \le 1}$$
in law in the Skorohod space $D([0,1], \mathbb{R})$ as $t \to +\infty$, where $(W_u)$ denotes the standard Wiener process.
Notation. According to the interpretation of n.p. as pulses crossing each other, for an initial configuration $x = (q_i, v_i)_{i \ge 0}$ with $q_0 = 0$ and $v_i = 0$ for all $i \ge 1$, for which the evolution is well defined, we use the symbol $p_i$, $i \ge 1$, to denote the pulse initially located at $q_i \equiv q_i(x)$, and let $q_i(t, x)$ and $v_i(t, x)$ be its position and velocity at time t. The initial condition x will usually be omitted from the notation, and we then write $q_i(t)$, $v_i(t)$. Thus, in general, $q_i(t) \ne q_i(T_t x)$ for $i \ge 1$, due to the different labelling. Further, we use the symbols c, c′ to denote positive constants; these constants may vary from one line to another, even though we use the same symbol.
Remark. Though the limit in Theorem 2.6 does not depend on the initial velocity $v_0$, the speed of convergence clearly depends on its distribution.
3. Markovian Approximation
In this section we focus on an auxiliary dynamics obtained by introducing the so-called “annihilation hypothesis”, i.e., each n.p. disappears immediately after its first collision with the h.p. This quite simple evolution will be called the Markovian approximation dynamics, due to the Markov property of the velocity of the h.p. The Markovian approximation dynamics will be used here as an auxiliary tool to provide useful bounds. For the case M = m, Piasecki studied particular cases in which the test particle collides only once with each pulse, so that the Markovian dynamics coincides with the real one (see [11]).
Remark. All quantities such as velocities, hitting times and positions related to the Markovian approximation dynamics carry a “bar” in their notation, e.g. $\bar v_0$, $\bar q_0$, etc. As in the previous section, $\bar t_n$ is defined by $\bar q_0(\bar t_n) = \sum_{i=1}^n \xi_i$, and $\bar\tau_n = \bar t_n - \bar t_{n-1}$, $n \ge 1$ (with $\bar t_0 = 0$).
Proposition 3.1.
$$\lim_{n \to +\infty} \frac{\bar q_0(\bar t_n)}{\bar t_n} = \frac{1}{1-\alpha}\,\sqrt{\frac{f}{2}}\;\frac{E_{\mu_0}(\xi_1)}{E_{\mu_0}\sqrt{\sum_{j \ge 1} \alpha^{2(j-1)} \xi_j}} =: \bar v_D \qquad \mu_0\text{-a.s.} \tag{3.1}$$
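Formula (3.1) can be compared with a direct run of the Markovian dynamics, which by (3.3) and (3.4) needs only the gaps $\xi_n$. The Python sketch below computes $\bar q_0(\bar t_n)/\bar t_n$ along one long trajectory. It also evaluates the right-hand side of (3.1) by Monte Carlo, truncating the series after 25 terms. The helper names, the law $\xi = 1 + \mathrm{Exp}(2)$, the values α = 0.5 and f = 1, the sample sizes and the seeds are illustrative assumptions. The two numbers should agree up to sampling error.

```python
import math
import random


def markov_drift_simulated(xi, v0, alpha, f):
    """Run the Markovian dynamics through the gaps xi and return q0(t_n) / t_n."""
    u, t, q = v0, 0.0, 0.0
    for x in xi:
        w = math.sqrt(u * u + 2 * f * x)  # h.p. speed on reaching the next standing particle
        t += (w - u) / f                  # flight time, Eq. (3.3)
        u = alpha * w                     # outgoing velocity, Eq. (3.4)
        q += x
    return q / t


def markov_drift_formula(alpha, f, sample_xi, n_samples, n_terms, seed=0):
    """Monte Carlo evaluation of the closed form (3.1) for the Markovian drift."""
    rng = random.Random(seed)
    sum_xi1 = 0.0
    sum_sqrt_s = 0.0
    for _ in range(n_samples):
        xs = [sample_xi(rng) for _ in range(n_terms)]
        sum_xi1 += xs[0]
        sum_sqrt_s += math.sqrt(sum(alpha ** (2 * k) * x for k, x in enumerate(xs)))
    return math.sqrt(f / 2) * sum_xi1 / ((1 - alpha) * sum_sqrt_s)


def draw_xi(rng):
    # assumed law of the gaps: xi = 1 + Exp(2), so d = 1 and E(xi) = 1.5
    return 1.0 + rng.expovariate(2.0)


alpha, f = 0.5, 1.0
rng = random.Random(1)
xi = [draw_xi(rng) for _ in range(200_000)]
simulated = markov_drift_simulated(xi, 0.0, alpha, f)
predicted = markov_drift_formula(alpha, f, draw_xi, n_samples=40_000, n_terms=25)
print(simulated, predicted)
```

The simulated value uses the ergodic average of $\bar u_i$ implicitly, as in (3.7). The formula instead uses independent samples of the series, so agreement between the two is a genuine cross-check.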
Proof. Let $\bar u_n = \bar v_0(\bar t_n) = \bar u_n(\bar v_0(x), \xi_1, \xi_2, \dots, \xi_n)$ be the outgoing velocity of the h.p. at the n-th collision. The velocity change during $(\bar t_{n-1}, \bar t_n)$ is due only to the constant field F and, since there are no recollisions in this dynamics, one may explicitly compute:
$$f \cdot \bar\tau_n = \frac{1}{\alpha}\,\bar u_n - \bar u_{n-1}, \tag{3.2}$$
$$\bar\tau_n = \frac{1}{f}\Big(-\bar u_{n-1} + \sqrt{\bar u_{n-1}^2 + 2 f \xi_n}\Big), \tag{3.3}$$
so that we immediately get
$$\bar u_n = \alpha\sqrt{\bar u_{n-1}^2 + 2 f \xi_n}\,; \tag{3.4}$$
and iterating we get
$$\bar u_n^2 = \alpha^{2n}\, \bar v_0^2(x) + 2f \sum_{i=1}^n \alpha^{2(n-i+1)} \xi_i \tag{3.5}$$
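for any initial velocity $\bar v_0(x) \ge 0$ (here $\bar u_0 = \bar v_0(x)$). From Eq. (3.2) we have:
$$\frac{\bar t_n}{n} = \frac{1}{n}\sum_{i=1}^n \bar\tau_i = \frac{1}{\alpha f}\,\frac{\bar u_n - \bar u_0}{n} + \frac{1-\alpha}{\alpha f}\,\frac{1}{n}\sum_{i=0}^{n-1} \bar u_i, \tag{3.6}$$
so that
$$\frac{\bar q_0(\bar t_n)}{\bar t_n} = \frac{\frac{1}{n}\sum_{i=1}^n \xi_i}{\frac{1}{n}\sum_{i=1}^n \bar\tau_i} = \frac{\frac{1}{n}\sum_{i=1}^n \xi_i}{\frac{1}{\alpha f}\,\frac{\bar u_n - \bar u_0}{n} + \frac{1-\alpha}{\alpha f}\,\frac{1}{n}\sum_{i=0}^{n-1} \bar u_i}. \tag{3.7}$$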
Since $\xi_1, \xi_2, \dots$ are i.i.d. random variables, the numerator in Eq. (3.7) converges a.s. to $E_{\mu_0}(\xi_1)$ as $n \to \infty$. On the other hand, $(\bar u_n^2, n \ge 1)$ form a Markov chain, and for each $n \ge 1$:
$$\bar u_n^2 = \alpha^{2n}\,\bar v_0^2 + Z_n(\xi_1, \xi_2, \dots, \xi_n) = Z_n\Big(\frac{\bar v_0^2}{2f} + \xi_1, \xi_2, \dots, \xi_n\Big), \tag{3.8}$$
where
$$Z_n(\xi_1, \xi_2, \dots, \xi_n) = 2f\,(\alpha^2 \xi_n + \dots + \alpha^{2n} \xi_1). \tag{3.9}$$
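Obviously, for each n the law of $Z_n(\xi_1, \dots, \xi_n)$ is the same as that of $\tilde Z_n(\xi_1, \dots, \xi_n)$, defined as
$$\tilde Z_n(\xi_1, \xi_2, \dots, \xi_n) = Z_n(\xi_n, \xi_{n-1}, \dots, \xi_1) = 2f \sum_{i=1}^n \alpha^{2i} \xi_i, \tag{3.10}$$
and moreover
$$0 \le \tilde Z_n \nearrow \tilde Z_\infty = 2f \sum_{i=1}^{\infty} \alpha^{2i} \xi_i < +\infty \ \text{a.s.}, \qquad E_{\mu_0}\tilde Z_\infty = \frac{2f\alpha^2}{1-\alpha^2}\, E_{\mu_0}(\xi_1) < \infty. \tag{3.11}$$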
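The algebra behind (3.4), (3.5) and (3.8)–(3.10) is easy to cross-check numerically. The Python sketch below iterates (3.4) and compares the result with the closed form (3.5). The names `u_bar_recursive`, `z_n` and `u_bar_sq_closed_form` are illustrative, and α, f, $\bar v_0$ and the gaps are arbitrary test values. The last lines also show the synchronous-coupling fact behind (3.13). Two Markovian chains driven by the same gaps but started from velocities a and b satisfy $\bar u_n^2(a) - \bar u_n^2(b) = \alpha^{2n}(a^2 - b^2)$.

```python
import math
import random


def u_bar_recursive(v0, xi, alpha, f):
    """Iterate Eq. (3.4): u_n = alpha * sqrt(u_{n-1}^2 + 2 f xi_n), with u_0 = v0."""
    u = v0
    for x in xi:
        u = alpha * math.sqrt(u * u + 2 * f * x)
    return u


def z_n(xs, alpha, f):
    """Eq. (3.9): Z_n(x_1, ..., x_n) = 2f (alpha^2 x_n + ... + alpha^{2n} x_1)."""
    n = len(xs)
    return 2 * f * sum(alpha ** (2 * (n - i + 1)) * xs[i - 1] for i in range(1, n + 1))


def u_bar_sq_closed_form(v0, xi, alpha, f):
    """Eq. (3.5): u_n^2 = alpha^{2n} v0^2 + Z_n(xi_1, ..., xi_n)."""
    return alpha ** (2 * len(xi)) * v0**2 + z_n(xi, alpha, f)


alpha, f, v0 = 0.6, 1.0, 1.7
rng = random.Random(3)
xi = [1.0 + rng.expovariate(2.0) for _ in range(8)]
n = len(xi)
print(u_bar_recursive(v0, xi, alpha, f) ** 2, u_bar_sq_closed_form(v0, xi, alpha, f))

# Synchronous coupling: same gaps, different starting velocities a and b.
a, b = 0.3, 2.5
gap = u_bar_recursive(a, xi, alpha, f) ** 2 - u_bar_recursive(b, xi, alpha, f) ** 2
print(gap, alpha ** (2 * n) * (a**2 - b**2))
```

The exact geometric contraction of the difference of squares is what makes $\bar u_n^2$ forget its initial condition exponentially fast.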
In particular, $\bar u_n^2$ converges in distribution to ν, the law of $\tilde Z_\infty$. By enlarging the probability space if needed, we take ζ distributed according to ν, independent of $\xi_1, \xi_2, \dots$, and consider the stationary Markov chain
$$\bar Z_n = Z_n\Big(\frac{\zeta}{2f} + \xi_1, \xi_2, \dots, \xi_n\Big). \tag{3.12}$$
Thus
$$\bar u_n^2 - \bar Z_n = \alpha^{2n}(\bar v_0^2 - \zeta), \tag{3.13}$$
from which we see at once that the Markov chain $\bar u_n^2$ converges exponentially fast (in law) to its unique stationary measure ν, and by the ergodic theorem
$$\frac{1}{n}\sum_{i=1}^n \bar u_i \xrightarrow[n\to\infty]{} \int_{\mathbb{R}_+} \sqrt{u}\; d\nu(u) = E_{\mu_0}\sqrt{\tilde Z_\infty}, \tag{3.14}$$
$\mu_0$-a.s. On the other hand, a trivial application of the Borel–Cantelli lemma gives $\lim_{n\to\infty} (\bar u_n - \bar u_0)/n = 0$ $\mu_0$-a.s. Recalling Eq. (3.7), we get (3.1).
The limit $0 < \bar v_D < +\infty$ in (3.1) will be called the Markovian (asymptotic) drift.
Remark 3.2. The condition $\mu_0(\xi_1 \ge d) = 1$ ensures that at any collision the outgoing velocity of the n.p. is at least
$$v_{\min} := (1+\alpha)\sqrt{\frac{2df}{1-\alpha^2}},$$
provided $v_0(0) \ge \alpha\sqrt{2df/(1-\alpha^2)}$, as follows from a direct calculation using (2.1). Thus condition (c) in Assumptions 2.5 can be read as
$$v_{\min} > \bar v_D, \tag{3.15}$$
which is a crucial property in the analysis that follows. When α is not too close to one, condition (b1) in Assumptions 2.5 might be omitted, and we would then get $(1+\alpha)\sqrt{2df}$ as the new value of the minimal velocity; in that case condition (c) of Assumptions 2.5 would have to be replaced by
$$2(1-\alpha^2)\sqrt{d} > \frac{E_{\mu_0}(\xi_1)}{E_{\mu_0}\sqrt{\sum_{j \ge 1} \alpha^{2(j-1)} \xi_j}},$$
which, however, can never hold if $1 - \alpha^2 \le 1/4$: since $\xi_j \ge d$ a.s., Jensen's inequality gives $E_{\mu_0}\sqrt{\sum_{j\ge1}\alpha^{2(j-1)}\xi_j} \le \sqrt{E_{\mu_0}(\xi_1)/(1-\alpha^2)}$, so the right-hand side is at least $\sqrt{(1-\alpha^2)E_{\mu_0}(\xi_1)} \ge \sqrt{(1-\alpha^2)d} \ge 2(1-\alpha^2)\sqrt{d}$.
We end this section by stating and proving some elementary facts which will be useful later on.
Proposition 3.3. Let x, x̄ be two initial configurations such that $v_0 \le \bar v_0$, $v_i = \bar v_i = 0$ for $i \ge 1$, and $q_i = \bar q_i$ for $i \ge 1$, and such that the dynamics is well defined when starting from x. Let $s_z \equiv s_z(x)$ (resp. $\bar s_z$) denote the time at which the h.p. reaches the point $z \in \mathbb{R}_+$ in the original (resp. Markovian) dynamics starting from configuration x (resp. x̄). Then
$$v_0(s_z, x) \le \bar v_0(\bar s_z, \bar x) \tag{3.16}$$
for all $z \in \mathbb{R}_+$.
Proof. Let $x \in X$ be such that the dynamics $(T_t x)_{t > 0}$ is well defined, as in Proposition 2.2. Then $s_z(x) < +\infty$ and the number of collisions up to time $s_z(x)$ is finite. We prove (3.16) by induction on K, the number of collisions of the h.p. in the original dynamics before it reaches the point z. The statement is obviously true for K = 0 and K = 1. Assume it is true for $K \le n$; we prove it for K = n + 1. Let K = n + 1, and denote by $z_n(x) < z_{n+1}(x) < z$ the points where the nth and (n+1)st collisions of the h.p. happened. By the induction assumption, for any $\tilde z < z_{n+1}$ we have $v_0(s_{\tilde z}, x) \le \bar v_0(\bar s_{\tilde z}, \bar x)$. The same immediately holds for the left limits:
$$v_0(s^-_{z_{n+1}(x)}, x) \le \bar v_0(\bar s^-_{z_{n+1}(x)}, \bar x), \tag{3.17}$$
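and, regardless of whether at the point $z_{n+1}(x)$ we have a recollision or a collision with a standing particle, inequality (3.17) implies that $v_0(s_{z_{n+1}(x)}, x) \le \bar v_0(\bar s_{z_{n+1}(x)}, \bar x)$ as well. Indeed, at a recollision with a slower pulse the h.p. velocity can only decrease, since $V' = \alpha V + (1-\alpha)v < V$ when $v < V$, while the Markovian h.p. does not collide there; at a collision with a standing particle both velocities are multiplied by α. Now, by assumption, between $z_{n+1}(x)$ and z the h.p. moves without interaction in both dynamics (with constant acceleration f). Thus we get $v_0(s_z, x) \le \bar v_0(\bar s_z, \bar x)$.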
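Proposition 3.3 and Corollary 3.4 can be illustrated with a small event-driven simulation. The full mechanical dynamics, with neutral particles as non-interacting pulses and possible recollisions, runs side by side with the Markovian dynamics on the same gaps and initial velocity. The Python sketch below is a hypothetical illustration, not the authors' construction. The names, the parameters (α = 0.6, f = 1, $v_0 = 1.2$, $\xi = 1 + \mathrm{Exp}(2)$) and the floating-point tolerance are our assumptions. At each step the code finds the earliest time at which the h.p., moving as $q + Vt + ft^2/2$, reaches either the next standing particle or a moving pulse $p + vt$. That time is the positive root of $ft^2/2 + (V - v)t = p - q$, and (2.1) is then applied. Up to rounding, every flight time of the mechanical dynamics should dominate the Markovian one.

```python
import math
import random


def collide(V, v, alpha):
    """Elastic collision rule (2.1): outgoing (V', v')."""
    return alpha * V + (1 - alpha) * v, (1 + alpha) * V - alpha * v


def catch_time(gap, rel_v, f):
    """Positive root of f t^2 / 2 + rel_v t = gap: time for the h.p. to reach a pulse `gap` ahead."""
    return (-rel_v + math.sqrt(max(rel_v * rel_v + 2 * f * gap, 0.0))) / f


def real_flight_times(xi, v0, alpha, f):
    """Event-driven mechanical dynamics with recollisions; returns tau_1, ..., tau_n."""
    q, V, t, t_last = 0.0, v0, 0.0, 0.0
    pulses = []       # moving pulses stored as [position, velocity]
    standing = xi[0]  # position of the next standing particle
    taus = []
    while len(taus) < len(xi):
        dt, hit = catch_time(standing - q, V, f), None
        for j, (p, v) in enumerate(pulses):
            dt_j = catch_time(p - q, V - v, f)
            if dt_j < dt:
                dt, hit = dt_j, j
        q += V * dt + 0.5 * f * dt * dt  # advance the h.p. ...
        V += f * dt
        t += dt
        for pulse in pulses:             # ... and every pulse
            pulse[0] += pulse[1] * dt
        if hit is None:                  # fresh collision with a standing particle
            V, v_new = collide(V, 0.0, alpha)
            pulses.append([q, v_new])
            taus.append(t - t_last)
            t_last = t
            if len(taus) < len(xi):
                standing += xi[len(taus)]
        else:                            # recollision with a moving pulse
            V, pulses[hit][1] = collide(V, pulses[hit][1], alpha)
            pulses[hit][0] = q
    return taus


def markov_flight_times(xi, v0, alpha, f):
    """Flight times of the Markovian dynamics, Eqs. (3.3)-(3.4)."""
    u, taus = v0, []
    for x in xi:
        w = math.sqrt(u * u + 2 * f * x)
        taus.append((w - u) / f)
        u = alpha * w
    return taus


rng = random.Random(5)
xi = [1.0 + rng.expovariate(2.0) for _ in range(300)]
real = real_flight_times(xi, 1.2, 0.6, 1.0)
markov = markov_flight_times(xi, 1.2, 0.6, 1.0)
print(min(a - b for a, b in zip(real, markov)))  # should be >= 0 up to rounding
```

Before the first fresh collision there are no moving pulses, so $\tau_1 = \bar\tau_1$ exactly. Afterwards recollisions can only slow the h.p. down.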
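Corollary 3.4. Under Assumptions 2.1 we have $\tau_n(x) \ge \bar\tau_n(x)$ for each $n \in \mathbb{N}$, $\mu_0$-a.s.
Proof. Immediate from Proposition 3.3 (with x̄ = x): since the h.p. is at every point no faster in the original dynamics than in the Markovian one, it needs at least as much time to cross each interval $[q_{n-1}, q_n]$.
4. Tightness of $\{\mu^{(n)}\}_{n \ge 1}$
Our goal in this section is to prove the following:
Theorem 4.1. The family $\{\mu^{(n)}\}_{n \ge 1}$ is tight.
For the proof of the above theorem we consider a suitable sequence of random indices, defined below. To this end, let us start by observing that
$$\bar v_0(\bar t_n) = \bar u_{n-m}(\bar v_0(\bar t_m), \xi_{m+1}, \dots, \xi_n), \qquad \bar\tau_n = \bar\tau_{n-m}(\bar v_0(\bar t_m), \xi_{m+1}, \dots, \xi_n),$$
for any $1 \le m < n$.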
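In what follows we fix a constant $V_* > \sqrt{E_{\mu_0}\tilde Z_\infty}$ and we write $\bar v_{m,n}$ and $\bar\tau_{m,n}$ for the restrictions of the previously defined functions to the set $\{\bar v_0(\bar t_m) = V_*\}$, i.e.,
$$\bar v_{m,n}(\xi_{m+1}, \dots, \xi_n) \stackrel{\text{def}}{=} \bar u_{n-m}(V_*, \xi_{m+1}, \dots, \xi_n),$$
$$\bar\tau_{m,n}(\xi_{m+1}, \dots, \xi_n) \stackrel{\text{def}}{=} \bar\tau_{n-m}(V_*, \xi_{m+1}, \dots, \xi_n).$$

Definition 4.2. Given $x \in X$, we say that an index $n > k$ is $k$-admissible if:
(i) $\bar v_{k,n}(x) \le V_*$,
(ii) $\Big|\frac{1}{n-k}\sum_{j=k+1}^{n}\xi_j - E_{\mu_0}(\xi_1)\Big| \le \delta$,
(iii) $\dfrac{\sum_{j=k+1}^{n}\xi_j}{\sum_{l=k+1}^{n}\bar\tau_{k,l}} \le \bar v_D + \delta$,
where $\delta$ is defined as
$$\delta = \frac{v_{\min} - \bar v_D}{4}. \qquad (4.1)$$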
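Definition 4.2 can be turned into a small test. The sketch below is only illustrative and rests on assumptions not spelled out in this section: in the Markovian dynamics the h.p. accelerates freely between standing particles, so that covering a distance $\xi$ from speed $u$ takes $(\sqrt{u^2+2f\xi}-u)/f$, and its speed is multiplied by $\alpha$ at each collision (consistent with the squared-velocity formula used in (4.5)). The quantities $V_*$, $\bar v_D$, $\delta$ and $E_{\mu_0}\xi_1$ are passed in as parameters, and the helper names are hypothetical.

```python
import math


def markov_path(v_start, xis, alpha, f):
    """Velocities after each collision and flight times for the Markovian
    dynamics started at speed v_start (assumed free acceleration f between
    standing particles, speed multiplied by alpha at each collision)."""
    vels, times = [], []
    u = v_start
    for xi in xis:
        arrival = math.sqrt(u * u + 2.0 * f * xi)   # speed just before the collision
        times.append((arrival - u) / f)
        u = alpha * arrival
        vels.append(u)
    return vels, times


def is_admissible(xis_after_k, V_star, mean_xi, v_bar_D, delta, alpha, f):
    """Conditions (i)-(iii) of Definition 4.2 for n = k + len(xis_after_k),
    given the interdistances xi_{k+1}, ..., xi_n."""
    if not xis_after_k:
        return False
    vels, times = markov_path(V_star, xis_after_k, alpha, f)
    cond_i = vels[-1] <= V_star
    cond_ii = abs(sum(xis_after_k) / len(xis_after_k) - mean_xi) <= delta
    cond_iii = sum(xis_after_k) / sum(times) <= v_bar_D + delta
    return cond_i and cond_ii and cond_iii
```

Condition (iii) compares the empirical speed of the Markovian h.p. over the block with $\bar v_D + \delta$; since the Markovian path is started at $V_*$, it is the comparison behind (4.8) below.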
Mixing Properties for the Mechanical Motion of Charged Particle
We then consider the sequence of random indices $\{R_n(x)\}_{n\ge0}$ defined as follows:
$$R_0(x) = \min\{j : \bar v_0(\bar t_j) \le V_*\}, \qquad R_{n+1}(x) = \min\{j > R_n(x) : j \text{ is } R_n(x)\text{-admissible}\}. \qquad (4.2)$$
Notice that if $n \ge 1$ then $R_n - R_0$ depends only on $(\xi_1, \xi_2, \dots)$. From the properties of the initial distribution and the expression for $\bar u_n$ we can check that $\mu_0(R_n < +\infty) = 1$ for each $n \ge 0$. In fact, the following holds:

Proposition 4.3. Provided Assumptions 2.5 are satisfied, under $\mu_0$ the random variables $Y_r \stackrel{\text{def}}{=} R_r - R_{r-1}$, $r \ge 1$, are i.i.d. and have exponentially decaying tails, i.e.:
$$\mu_0(Y_r > n) \le c\,e^{-c'n} \qquad (4.3)$$
for suitable $c, c' > 0$ (which depend on $V_*$ and $\delta$) and all $n \ge 1$. Moreover, $\mu_0(R_0 > n) \le c_1 e^{-c_1'n}$, with $c_1'$ depending on $V_*$, $\delta$ and $h > 0$ as in (b2), and $c_1$ depending also on $E_{\mu_0}(v_0^h)$.

Proof. The first statement is obvious. Eq. (4.3) will follow if we prove that each of the following probabilities
$$\text{(i')}\ \ \mu_0\Big(\frac1n\sum_{j=1}^n\bar v_{0,j} > V_*\Big), \qquad \text{(ii')}\ \ \mu_0\Big(\Big|\frac1n\sum_{j=1}^n\xi_j - E_{\mu_0}(\xi_1)\Big| > \delta\Big), \qquad \text{(iii')}\ \ \mu_0\Big(\frac{\sum_{j=1}^n\xi_j}{\sum_{j=1}^n\bar\tau_{0,j}} > \bar v_D + \delta\Big)$$
tends to zero exponentially fast in $n$, as $n \to +\infty$. For (ii') the estimate follows at once from Cramér's theorem applied to $\frac1n\sum_{j=1}^n\xi_j$, due to our assumption on the tail of the distribution of $\xi_1$. As for (iii'), the main point consists in checking that if $\eta > 0$,
$$\mu_0\Big(\frac1n\sum_{j=1}^n\bar v_{0,j} \le E_{\mu_0}\sqrt{\tilde Z_\infty} - \eta\Big) \le c\,e^{-c'n}, \qquad (4.4)$$
for suitable $c, c' > 0$ (depending on $\eta$). Indeed, having (4.4), we use Eq. (3.7) with $\bar u_0 = V_*$; since $\bar u_n \ge 0$, straightforward algebraic manipulations yield (iii') by properly choosing $\eta$ and also using (ii') with a suitable $\eta$ in place of the fixed $\delta$. Thus, it remains to check Eq. (4.4).

Remark. Equation (4.4) is a one-sided large deviation estimate for the sample average of the Markov chain $\bar v_{0,i}$. As we shall see below, there is no problem in applying the Gärtner–Ellis condition to get the usual large deviation estimates for the sample average of $Z_i$ (or $\bar v_{0,i}^2$). With our choice $V_* > \sqrt{E_{\mu_0}\tilde Z_\infty}$, instead of $V_* > E_{\mu_0}\sqrt{\tilde Z_\infty}$, we avoid the consideration of the analogous result for $\bar v_{0,j}$. In this way the estimate (i') is rougher, but this is irrelevant for our problem.
Proof of (4.4). Clearly, to prove (4.4) we may replace the initial velocity $V_*$ by $0$, i.e. it suffices to prove that if $\eta > 0$, then
$$\mu_0\Big(\frac1n\sum_{j=1}^n\sqrt{Z_j(\xi_1, \dots, \xi_j)} \le E_{\mu_0}\sqrt{\tilde Z_\infty} - \eta\Big) \le c\,e^{-c'n}.$$
Write
$$A_n = \frac1n\sum_{j=1}^n\sqrt{Z_j(\xi_1, \dots, \xi_j)} = \frac1n\sum_{j=1}^n\Big(2f\sum_{i=1}^j\alpha^{2(j-i+1)}\xi_i\Big)^{1/2}.$$
Since the $\xi_i$ are non-negative, dropping from each inner sum the terms with $i \le (r-1)k$ can only decrease it, so that
$$A_{kn} \ge \frac1n\sum_{r=1}^n B_r^{(k)}, \qquad \text{where} \quad B_r^{(k)} = \frac1k\sum_{j=(r-1)k+1}^{rk}\Big(2f\sum_{i=(r-1)k+1}^{j}\alpha^{2(j-i+1)}\xi_i\Big)^{1/2}.$$
Clearly, for each $k$ the random variables $B_r^{(k)}$, $r = 1, 2, \dots$, are i.i.d. and distributed as $A_k$. In particular, for $k$ large enough we have $E_{\mu_0}B_1^{(k)} > E_{\mu_0}\sqrt{\tilde Z_\infty} - \eta/2$. Fix one such $k$. Since the moment generating function of $A_k$ is finite in a neighborhood of the origin (due to our assumption on the tails of the $\xi_i$), we again apply Cramér's theorem for i.i.d. random variables and get (4.4). As for (i') we write
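$$\mu_0\Big(\frac1n\sum_{l=1}^n\bar v_{0,l} > V_*\Big) \le \mu_0\Big(\frac1n\sum_{l=1}^n\bar v_{0,l}^2 > V_*^2\Big)$$
(by the Cauchy–Schwarz inequality), and now observe that the Markov chain $\bar v_{0,l}^2$ (or equivalently $\bar u_l^2$) satisfies the Gärtner–Ellis condition (see e.g. Assumption 2.3.2, Ch. 2 of [7]). In fact, if $\phi(a) = E_{\mu_0}(e^{a\xi_1})$, Eq. (3.5) gives:
$$\begin{aligned}\lim_{n\to\infty}\frac1n\log E_{\mu_0}e^{a\sum_{l=1}^n\bar v_{0,l}^2} &= \lim_{n\to\infty}\frac1n\log E_{\mu_0}\exp\Big(a\alpha^2\frac{1-\alpha^{2n}}{1-\alpha^2}V_*^2 + \frac{2af\alpha^2}{1-\alpha^2}\sum_{j=1}^n\big(1-\alpha^{2(n-j+1)}\big)\xi_j\Big)\\
&= \lim_{n\to\infty}\frac1n\sum_{j=1}^n\log\phi\Big(\frac{2af\alpha^2}{1-\alpha^2}\big(1-\alpha^{2(n-j+1)}\big)\Big) = \log\phi\Big(\frac{2af\alpha^2}{1-\alpha^2}\Big),\end{aligned}\qquad (4.5)$$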
which is finite and differentiable in $a$ in some open interval around the origin. Using then Theorem 2.3.6 of [7] we get (i'), since $V_*^2 > E_{\mu_0}(\tilde Z_\infty) = \frac{2f\alpha^2}{1-\alpha^2}E_{\mu_0}(\xi_1)$, cf. (3.11).

It remains to check that $\mu_0(R_0 > n)$ decays exponentially fast. This follows from the same argument as in (i'). Clearly the initial condition $\bar v_0^2$ does not affect the limit, its contribution to the above sample average being $\frac1n\sum_{j=1}^n\alpha^{2j}\bar v_0^2$. Nevertheless, since we allowed $\bar v_0^2$ to be random and not necessarily with exponentially decaying tails, we proceed more carefully to control the decay rate. More precisely, take $\beta > 0$ sufficiently small so that $V_*^2 - \beta > E_{\mu_0}(\tilde Z_\infty)$, and observe that
$$\mu_0(R_0 > n) \le \mu_0\Big(\frac2n\sum_{j=[n/2]+1}^n\bar u_j^2 > V_*^2\Big) \le \mu_0\Big(\frac2n\sum_{j=[n/2]+1}^n\alpha^{2j}\bar v_0^2 > \beta\Big) + \mu_0\Big(\frac2n\sum_{j=[n/2]+1}^n Z_j > V_*^2 - \beta\Big).$$
The statement follows at once from condition (b2) in Assumptions 2.5 and the previous large deviation estimate.

The basic ingredients for the proof of tightness are contained in the following lemma.

Lemma 4.4. a) For any $b > 0$ there exists a constant $C_b$ such that
$$\sup_n E_{\mu^{(n)}}\big(\#(x|_{[0,b]})\big) \le C_b, \qquad (4.6)$$
where $x|_I = x \cap (I \times \mathbb{R}_+)$ for any bounded interval $I \subset \mathbb{R}_+$.
b) The family of random variables $(v_0(t_n))_{n\ge1}$ is tight under the measure $\mu_0$.

Proof. Statement b) follows at once from Proposition 3.3, cf. Eq. (3.16), and the expression for $\bar v_0^2(\bar t_n)$ given by Eq. (3.5). Since
$$\mu^{(n)}\Big(x : \#\big(x \cap ([0,b] \times \{0\})\big) > \frac bd\Big) = 0,$$
for the proof of a) it suffices to consider only the moving particles, i.e. it is enough to prove that
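$$\sup_n E_{\mu^{(n)}}\#\big(x \cap ([0,b] \times (0,+\infty))\big) \le C_b. \qquad (4.7)$$
Of course, having arbitrarily fixed $k \ge 1$, it is enough to restrict the supremum in (4.7) to $n \ge k$. From condition (iii) of Definition 4.2 and Corollary 3.4 it follows that
$$\sum_{l=R_n+1}^{R_{n+1}}\tau_l \ge \sum_{l=R_n+1}^{R_{n+1}}\bar\tau_l \ge \sum_{l=R_n+1}^{R_{n+1}}\bar\tau_{R_n,l} \ge \frac{\sum_{l=R_n+1}^{R_{n+1}}\xi_l}{\bar v_D + \delta} \qquad (4.8)$$
for all $n \ge 0$, where in the second inequality we used that $\bar v_0(\bar t_{R_n}) \le V_*$. From Eq. (4.8) and the choice $\delta \le \frac{v_{\min} - \bar v_D}{4}$, cf. (4.1), we have
$$v_{\min}\sum_{l=R_n+1}^{R_{n+1}}\tau_l \ge \frac{v_{\min}}{\bar v_D + \delta}\sum_{l=R_n+1}^{R_{n+1}}\xi_l \ge \frac{\bar v_D + 4\delta}{\bar v_D + \delta}\sum_{l=R_n+1}^{R_{n+1}}\xi_l. \qquad (4.9)$$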
Equation (4.9) implies that during the time interval $(t_{R_n}, t_{R_{n+1}}]$, when the h.p. travels a distance $\sum_{l=R_n+1}^{R_{n+1}}\xi_l$, all n.p. which were moving at $t_{R_n}$ travel a distance of at least
$$\frac{\bar v_D + 4\delta}{\bar v_D + \delta}\sum_{l=R_n+1}^{R_{n+1}}\xi_l.$$
Thus $\delta^* \stackrel{\text{def}}{=} \frac{3\delta}{\bar v_D + \delta}\,d$ is a lower bound for their relative displacement with respect to the h.p. during the time interval $(t_{R_n}, t_{R_{n+1}}]$. For any $n \ge 1$, let $\iota_n$ be defined through the relation
$$R_{\iota_n} \le n < R_{\iota_n+1}, \qquad (4.10)$$
provided $R_0 \le n$, and $\iota_n = -1$ if $R_0 > n$. For notational completeness we set $Y_0 = R_0$, $Y_{-1} = 0$. From the above considerations, and since, due to condition (ii) of Definition 4.2, for any $j \ge 0$,
$$[R_{j+1} < +\infty] \subseteq \big[q_{R_{j+1}} - q_{R_j} \le (E_{\mu_0}\xi_1 + \delta)\,Y_{j+1}\big],$$
we see that
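$$\begin{aligned}E_{\mu^{(n)}}\#\big(x \cap ([0,b] \times (0,+\infty))\big) &\le E_{\mu_0}\big(R_1 1_{(R_1>n)}\big) + E_{\mu_0}(R_0) + E_{\mu_0}\Big(Y_{\iota_n+1} + \sum_{i=0}^{\iota_n-1}Y_{\iota_n-i}\,1_{(i\delta^* \le Y_{\iota_n+1}(E_{\mu_0}\xi_1+\delta)+b)}\Big)\\
&\le E_{\mu_0}(Y_{\iota_n+1}) + \sum_{j=1}^n\sum_{i=0}^{j-1}E_{\mu_0}\Big(\frac{R_j}{j}\,1_{(i\delta^* \le Y_{j+1}(E_{\mu_0}\xi_1+\delta)+b)}\,1_{(\iota_n=j)}\Big) + E_{\mu_0}(R_0 + R_1),\end{aligned}\qquad (4.11)$$
where we have used that $[\iota_n = j] \in \sigma(R_j, Y_{j+1})$ and that, since $Y_1, Y_2, \dots$ are i.i.d. and integrable,
$$E_{\mu_0}(Y_k \mid R_j, Y_{j+1}) = E_{\mu_0}\Big(\frac{R_j - R_0}{j}\,\Big|\,R_j, Y_{j+1}\Big) \le \frac{R_j}{j}, \qquad \mu_0 \text{ a.s., if } 1 \le k \le j.$$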
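Thus, by Eq. (4.10), and fixing $k \ge 1$, we bound the second term on the r.h.s. of Eq. (4.11) as follows:
$$\begin{aligned}&\sum_{j=1}^n\sum_{i=0}^{j-1}\frac nj\,\mu_0\big(\iota_n = j,\ Y_{j+1}(E_{\mu_0}\xi_1+\delta) + b \ge i\delta^*\big)\\
&\quad\le n\,\mu_0\Big(\iota_n \le \frac nk\Big) + k\sum_{j=[n/k]+1}^n\sum_{i=0}^{j-1}\mu_0\big(\iota_n = j,\ Y_{j+1}(E_{\mu_0}\xi_1+\delta) + b \ge i\delta^*\big)\\
&\quad\le n\,\mu_0\big(R_{[n/k]+1} > n\big) + k\sum_{i=0}^{+\infty}\mu_0\big(Y_{\iota_n+1}(E_{\mu_0}\xi_1+\delta) + b \ge i\delta^*\big)\\
&\quad\le n\,\mu_0\big(R_{[n/k]+1} > n\big) + k\Big(\frac{b+1}{\delta^*} + 1\Big) + k\Big(\frac{E_{\mu_0}\xi_1+\delta}{\delta^*} + 1\Big)E_{\mu_0}(Y_{\iota_n+1})\\
&\quad\le n\,\mu_0\Big(R_{[n/k]+1} - R_0 > \frac n2\Big) + n\,\mu_0\Big(R_0 > \frac n2\Big) + k\Big(\frac{b+1}{\delta^*} + 1\Big) + k\Big(\frac{E_{\mu_0}\xi_1+\delta}{\delta^*} + 1\Big)E_{\mu_0}(Y_{\iota_n+1}),\end{aligned}\qquad (4.12)$$
where $[y]$ denotes the integer part of $y$. Fix $k > 4E_{\mu_0}Y_1$; since $k/2 < n/([n/k]+1)$ for $n > k$, we have:
$$n\,\mu_0\Big(R_{[n/k]+1} - R_0 > \frac n2\Big) \le n\,\mu_0\Big(\frac{1}{[n/k]+1}\sum_{i=1}^{[n/k]+1}Y_i > \frac k4\Big) \le c\,n\,e^{-c'n/k} \qquad (4.13)$$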
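for suitable positive constants $c, c'$, and all $n > k$. Here we are using Proposition 4.3, the choice $k/4 > E_{\mu_0}(Y_1)$, and again the upper bound in Cramér's theorem. Thus, the r.h.s. of (4.13) remains bounded in $n$, and the same happens to $n\,\mu_0(R_0 > n/2)$, due to Proposition 4.3. From this, (4.11), and (4.12), the proof is reduced to showing that $\sup_n E_{\mu_0}(Y_{\iota_n+1}) < +\infty$. This follows from direct computation, using the independence of $R_0, Y_1, Y_2, \dots$:
$$\begin{aligned}E_{\mu_0}(Y_{\iota_n+1}) &= E_{\mu_0}\big(R_0 1_{(R_0>n)}\big) + \sum_{j=1}^\infty j\sum_{i=0}^n\mu_0\big(Y_{i+1} = j,\ R_i \le n < R_{i+1}\big)\\
&\le E_{\mu_0}R_0 + \sum_{j=1}^\infty j\sum_{i=0}^n\mu_0\big(Y_{i+1} = j,\ n - j < R_i \le n\big)\\
&= E_{\mu_0}R_0 + \sum_{j=1}^\infty j\sum_{i=0}^n\mu_0(Y_{i+1} = j)\sum_{s=(n-j+1)\vee 0}^{n}\mu_0(R_i = s)\\
&= E_{\mu_0}R_0 + \sum_{j=1}^\infty j\,\mu_0(Y_1 = j)\sum_{s=(n-j+1)\vee 0}^{n}\sum_{i=0}^n\mu_0(R_i = s)\\
&\le E_{\mu_0}R_0 + \sum_{j=1}^\infty j^2\mu_0(Y_1 = j) = E_{\mu_0}R_0 + E_{\mu_0}Y_1^2,\end{aligned}$$
where in the last step we used that the $R_i$ are strictly increasing, so that $\sum_i\mu_0(R_i = s) \le 1$ for each $s$.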
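The bound $\sup_n E_{\mu_0}(Y_{\iota_n+1}) < +\infty$ reflects the familiar inspection paradox: the renewal interval covering a fixed time is size-biased. The following sketch computes $E(Y_{\iota_n+1})$ exactly for a toy renewal sequence with $R_0 = 0$ and i.i.d. gaps of finite support, using the same decomposition as in the display above. The gap law is a hypothetical choice; for this aperiodic example the value approaches $E Y_1^2 / E Y_1$, which is consistent with the bound $E_{\mu_0}R_0 + E_{\mu_0}Y_1^2$.

```python
def renewal_mass(p, n_max):
    """u[s] = sum_i P(R_i = s) for R_0 = 0, R_i = Y_1 + ... + Y_i,
    where p[j] = P(Y = j) for j >= 1 (p[0] must be 0)."""
    u = [0.0] * (n_max + 1)
    u[0] = 1.0
    for s in range(1, n_max + 1):
        u[s] = sum(p[j] * u[s - j] for j in range(1, min(s, len(p) - 1) + 1))
    return u


def covering_gap_mean(p, n):
    """E[Y_{iota_n + 1}], the mean length of the renewal interval covering n:
    sum_j j P(Y = j) sum_{s = max(n - j + 1, 0)}^{n} u[s]."""
    u = renewal_mass(p, n)
    return sum(j * p[j] * sum(u[s] for s in range(max(n - j + 1, 0), n + 1))
               for j in range(1, len(p)))


# Gaps uniform on {1, 2, 3}: E Y = 2, E Y^2 = 14/3, size-biased limit 7/3.
p_gap = [0.0, 1 / 3, 1 / 3, 1 / 3]
values = [covering_gap_mean(p_gap, n) for n in (0, 10, 100)]
```

The limit $EY_1^2/EY_1$ exceeds $EY_1$ whenever $Y_1$ is not degenerate, which is why the second moment, and not only the mean, enters the bound.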
Theorem 4.1 follows immediately from the previous lemma.
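Two computations in the proof of Proposition 4.3 lend themselves to a numerical check. The sketch below assumes, for illustration only, that $\xi_1 = d + \mathrm{Exp}(\lambda)$, so that $\phi(a) = e^{ad}\lambda/(\lambda - a)$ for $a < \lambda$. It evaluates the middle expression of (4.5) for finite $n$ and compares it with the limit $\log\phi(2af\alpha^2/(1-\alpha^2))$, and it tests the pathwise block inequality $A_{kn} \ge \frac1n\sum_r B_r^{(k)}$ used in the proof of (4.4), with $A_n$ the average of $\sqrt{Z_j}$. The parameter values and function names are hypothetical.

```python
import math


def log_mgf_xi(a, lam, d):
    """log E[exp(a xi)] for the illustrative law xi = d + Exp(lam); needs a < lam."""
    return a * d + math.log(lam / (lam - a))


def scgf_n(a, n, alpha, f, V, lam, d):
    """(1/n) log E exp(a * sum_{l<=n} v_{0,l}^2), evaluated through the middle
    expression of (4.5) (the xi_j are independent)."""
    a2 = alpha ** 2
    c = 2.0 * a * f * a2 / (1.0 - a2)
    total = a * a2 * (1.0 - a2 ** n) / (1.0 - a2) * V ** 2
    for j in range(1, n + 1):
        total += log_mgf_xi(c * (1.0 - a2 ** (n - j + 1)), lam, d)
    return total / n


def scgf_limit(a, alpha, f, lam, d):
    """Right-hand side of (4.5): log phi(2 a f alpha^2 / (1 - alpha^2))."""
    a2 = alpha ** 2
    return log_mgf_xi(2.0 * a * f * a2 / (1.0 - a2), lam, d)


def block_inequality_holds(xis, k, alpha, f):
    """Check A_{kn} >= (1/n) sum_r B_r^{(k)} on one sample path, n = len(xis) // k."""
    n = len(xis) // k

    def sqrt_z(start, j):
        # sqrt(2 f sum_{i=start}^{j} alpha^{2(j-i+1)} xi_i), 1-based indices
        return math.sqrt(2.0 * f * sum(alpha ** (2 * (j - i + 1)) * xis[i - 1]
                                       for i in range(start, j + 1)))

    a_kn = sum(sqrt_z(1, j) for j in range(1, k * n + 1)) / (k * n)
    blocks = [sum(sqrt_z((r - 1) * k + 1, j) for j in range((r - 1) * k + 1, r * k + 1)) / k
              for r in range(1, n + 1)]
    return a_kn >= sum(blocks) / n - 1e-12   # tolerance covers rounding only
```

The derivative of the limit at $a = 0$ equals $\frac{2f\alpha^2}{1-\alpha^2}E\xi_1 = E\tilde Z_\infty$, which is the threshold that $V_*^2$ must exceed in the argument for (i').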
5. Strong Cluster Property and Invariant Measure for the Discrete Dynamics

Definition 5.1. Given $x \in X$ for which the dynamics is well defined, we say that $k$ is a cluster index for $x$ if after time $t_k$ the h.p. never collides with the particles which are moving at time $t_k$, including the $k$th standing neutral particle.
Theorem 5.2. There exists $K : X \to \{0, 1, 2, \dots\}$ verifying
$$\mu_0\{x : K(x) \le n\} \ge 1 - c\,e^{-c'n} \qquad (5.1)$$
for suitable positive constants $c, c'$ and all $n \ge 1$, such that $K(x)$ is a cluster index for $x$, $\mu_0$ a.s.

Before starting the proof we need to introduce several definitions and specify some auxiliary parameters.

Definition 5.3. Given $x \in X$, integers $r > 1$, $r' > 1$, and $0 < \epsilon < 1$ ($r$, $r'$ and $\epsilon$ will be specified below), we recursively define the sequence of pairs $\{(R_n(x), R'_n(x))\}_{n\ge0}$ as follows: $R_0(x) = R'_0(x) = \min\{j : \bar v_0(\bar t_j) \le V_*\}$,
$$R_{n+1}(x) = \min\{j : j \ge R'_n(x) + r' \text{ and } j \text{ is } R'_n(x)\text{-admissible}\}, \qquad (5.2)$$
and
$$R'_{n+1}(x) = R_{n+1}(x) + \max\Big\{1 \le k \le r : \max_{1\le j\le k}\xi_{R_{n+1}(x)+j} \le d + \epsilon\Big\}, \qquad (5.3)$$
with the understanding that $R'_{n+1} = R_{n+1}$ if $\xi_{R_{n+1}(x)+1} > d + \epsilon$, $n \ge 0$.

Clearly the random indices $R_n$, $R'_n$ are $\mu_0$ a.s. finite ($R_0 = R'_0$ coincides with $R_0$ as defined in Sect. 4).

Definition 5.4. For $n \ge 1$, the pair $(R_n(x), R'_n(x))$ is called separating if $R'_n(x) - R_n(x) = r$.

Before specifying the constants $\epsilon$, $r'$ and $r$, we briefly justify our choice of the notion of separating pair. The heuristics is as follows: (i) the pair $(R_n(x), R'_n(x))$ being separating means that there is "a quite dense block" of $r$ standing neutral particles; (ii) this affects the evolution of our system in such a way that, interacting with the first $r_1 < r$ neutral particles of the block, the h.p. slows down to a certain level; (iii) as a consequence of (ii), in the further motion within the block the h.p. not only is unable to interact with the n.p. which were moving at time $t_{R_n+r_1}$, but also, when it arrives at the end of the block, all neutral particles with indices less than or equal to $R_n + r_1$ are relatively far. On the other hand, the constant $r'$ which appears in (5.2) will be taken big enough so that any two subsequent pairs $(R_n(x), R'_n(x))$ and $(R_{n+1}(x), R'_{n+1}(x))$ are not too close to each other. This will allow us to get a lower bound for the increment of the interdistances between the h.p. and the neutral particles with indices less than or equal to $R'_n$, over the time interval $[t_{R'_n}, t_{R'_{n+1}})$. This bound still does not exclude interactions between the h.p. and "old" neutral particles during the time interval $[t_{R'_n}, t_{R'_{n+1}})$, but it will be a fundamental ingredient for the construction of cluster indices.
Now we specify the constants @, r and r. We first fix 0 < @ such that @+
2d 2(d + @) < (1 + α)2 , 1 − α2 1 − α2
(5.4)
i.e. $0 < \epsilon$

$$\frac{(\bar v_D + \delta)(B + E_{\mu_0}\xi_1 + \delta)}{3\delta d}, \qquad (5.8)$$
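where $\delta$ is given by (4.1). Finally we set:
$$r_2 \stackrel{\text{def}}{=} \Big[\frac{(r'+1)(E_{\mu_0}\xi_1 + \delta)\,\bar w_\epsilon}{d\,(v_{\min} - \bar w_\epsilon)}\Big] + 1, \qquad (5.9)$$
where $[y]$ denotes the integer part of $y$, and
$$r \stackrel{\text{def}}{=} r_1 + r_2. \qquad (5.10)$$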
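The arithmetic behind these choices can be traced with a small helper. This is a sketch under stated assumptions: $B = r_1(d+\epsilon)$ as in (5.18) below, $r'$ is taken as the smallest integer satisfying $r' \ge \frac{(\bar v_D+\delta)(B+E_{\mu_0}\xi_1+\delta)}{3\delta d}$ (the requirement used in the proof of Lemma 5.6), and $\bar w_\epsilon$ is simply an input, since its definition is not reproduced here. All numerical values are hypothetical.

```python
import math


def section5_constants(v_min, v_bar_D, d, eps, r1, mean_xi, w_bar):
    """Hypothetical helper: delta from (4.1), delta*, B = r1 (d + eps),
    the smallest admissible r' (our reading of (5.8)), r2 from (5.9) and
    r = r1 + r2 from (5.10)."""
    if not (v_bar_D < v_min and 0.0 < w_bar < v_min):
        raise ValueError("need v_bar_D < v_min and 0 < w_bar < v_min")
    delta = (v_min - v_bar_D) / 4.0
    delta_star = 3.0 * delta * d / (v_bar_D + delta)
    B = r1 * (d + eps)
    r_prime = math.ceil((v_bar_D + delta) * (B + mean_xi + delta) / (3.0 * delta * d))
    r2 = math.floor((r_prime + 1) * (mean_xi + delta) * w_bar / (d * (v_min - w_bar))) + 1
    return {"delta": delta, "delta_star": delta_star, "B": B,
            "r_prime": r_prime, "r2": r2, "r": r1 + r2}


consts = section5_constants(v_min=2.0, v_bar_D=1.0, d=1.0, eps=0.1,
                            r1=3, mean_xi=1.5, w_bar=0.5)
```

With these illustrative inputs one can check the two margins used later: $r'd\cdot 3\delta/(\bar v_D+\delta) \ge B + E_{\mu_0}\xi_1 + \delta$ (Lemma 5.6) and $(v_{\min} - \bar w_\epsilon)\,r_2 d/\bar w_\epsilon \ge (r'+1)(E_{\mu_0}\xi_1 + \delta)$ (used for (5.20)).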
Remark 5.5. (i) With $\epsilon$ and $r_1$ as in (5.4) and (5.5), we see that if the interdistances $\xi_{R_n+1}, \dots, \xi_{R_n+r_1}$ happen to satisfy $d < \xi_{R_n+j} \le d + \epsilon$, $1 \le j \le r_1$, then at time $\bar t_{R_n+r_1}$ the (outgoing) velocity of the h.p. in the corresponding Markovian approximation dynamics (and therefore in the original one, at time $t_{R_n+r_1}$) will be less than $\bar w_\epsilon$. In this case $B$ is an upper bound for $\sum_{j=1}^{r_1}\xi_{R_n+j}$.
(ii) Inequality (5.4) says that $\bar w_\epsilon^2 + 2f(d + \epsilon) < v_{\min}^2$. Consequently, if at some time $t_k$ the h.p. gets an (outgoing) velocity less than or equal to $\bar w_\epsilon$ and the next interdistance is at most $d + \epsilon$, then it cannot reach any moving n.p. during the time interval $[t_k, t_{k+1}]$. In this case, at time $t_{k+1}$ its (outgoing) velocity will be smaller than $\bar w_\epsilon$.
(iii) Similarly to the proof of Proposition 4.3, we see that under $\mu_0$ the random variables $R_0$ and $Y_n \stackrel{\text{def}}{=} R_n - R'_{n-1}$, $n \ge 1$, are independent, and $Y_n$, $n \ge 2$, are also identically distributed. With minor modifications of that proof we see that the distribution of $Y_1$ and the common distribution of $Y_n$, $n \ge 2$, have exponentially decaying tails, i.e. there exist $c, c' > 0$ such that
$$\mu_0(Y_n > r' + k) \le c\,e^{-c'k} \qquad (5.11)$$
for all $n$. Taking $r'$ large enough, the proof shows that we can always assume
$$\mu_0(Y_n > r' + 1) < 1 \qquad (5.12)$$
for all $n$. Recall that, by Proposition 4.3, $R_0$ also has an exponentially decaying tail.
Lemma 5.6. The choice of $r'$ in (5.8) ensures that
$$\min_{1\le i\le R'_n}\Big\{\big[q_i(t_{R'_{n+1}}) - q_{R'_{n+1}}\big] - \big[q_i(t_{R'_n}) - q_{R'_n}\big]\Big\} \ge E_{\mu_0}\xi_1 + \delta \qquad (5.13)$$
for any $n \ge 0$, i.e. the increment of distance over the time interval $(t_{R'_n}, t_{R'_{n+1}}]$ between the h.p. and each n.p. which was already moving at time $t_{R'_n}$ is at least $E_{\mu_0}\xi_1 + \delta$.
Proof. By (iii) of Definition 4.2 and Corollary 3.4 (as in (4.8)) we have
$$t_{R_{n+1}} - t_{R'_n} \ge \frac{\sum_{j=R'_n+1}^{R_{n+1}}\xi_j}{\bar v_D + \delta}, \qquad (5.14)$$
and, due to the existence of a minimal velocity, cf. Remark 3.2,
$$q_i(t_{R_{n+1}}) \ge q_i(t_{R'_n}) + v_{\min}\,(t_{R_{n+1}} - t_{R'_n}) \qquad (5.15)$$
for $1 \le i \le R'_n$. Together with (5.14) this implies that
$$\begin{aligned}q_i(t_{R_{n+1}}) - q_{R_{n+1}} &\ge q_i(t_{R'_n}) - q_{R'_n} + \Big(\frac{v_{\min}}{\bar v_D + \delta} - 1\Big)\sum_{j=R'_n+1}^{R_{n+1}}\xi_j\\
&\ge q_i(t_{R'_n}) - q_{R'_n} + r'd\Big(\frac{v_{\min}}{\bar v_D + \delta} - 1\Big)\\
&\ge q_i(t_{R'_n}) - q_{R'_n} + B + E_{\mu_0}\xi_1 + \delta,\end{aligned}\qquad (5.16)$$
where in the last inequality we have used (5.8). We next prove that
$$\min_{t_{R_{n+1}} \le t \le t_{R'_{n+1}}}\Big\{q_i(t) - q_0(t) - \big[q_i(t_{R_{n+1}}) - q_{R_{n+1}}\big]\Big\} \ge -B, \qquad (5.17)$$
which together with (5.16) implies (5.13). Consider two cases:

Case I. $R'_{n+1} - R_{n+1} \le r_1$. By (5.3) we have, for all $t_{R_{n+1}} \le t \le t_{R'_{n+1}}$,
$$q_0(t) - q_{R_{n+1}} \le r_1(d + \epsilon) = B, \qquad (5.18)$$
and since $q_i(t) \ge q_i(t_{R_{n+1}})$ for all $i$, (5.17) follows trivially.

Case II. $R'_{n+1} - R_{n+1} > r_1$. As in Case I, one has the following estimate up to time $t_{R_{n+1}+r_1}$:
$$\min_{t_{R_{n+1}} \le t \le t_{R_{n+1}+r_1}}\Big\{q_i(t) - q_0(t) - \big[q_i(t_{R_{n+1}}) - q_{R_{n+1}}\big]\Big\} \ge -B.$$
As noticed in Remark 5.5, during the time interval $(t_{R_{n+1}+r_1}, t_{R'_{n+1}}]$ the h.p. moves with velocity at most $\bar w_\epsilon$, while all moving n.p. have velocity at least $v_{\min} > \bar w_\epsilon$. Thus, over this time interval the interdistance between the h.p. and each n.p. with label $i \le R'_n$ (indeed for each $i \le R_{n+1} + r_1$) is increasing, and therefore (5.17) is verified, concluding the proof.
The choice of $r_2$, and consequently that of $r$, implies suitable bounds on the increase of the minimal distance between the moving n.p. and the h.p. during the time interval $[t_{R_n+r_1}, t_{R'_n}]$, provided the pair $(R_n, R'_n)$ is separating. This will be used in the proof below.

Proof of Theorem 5.2. We begin with a couple of simple observations:

A) Each $R_j$ is a stopping time with respect to the filtration $(\sigma(v_0, \xi_1, \dots, \xi_n))_{n\ge1}$, i.e., for each $n \ge 1$ the event $[R_j = n]$ belongs to $\sigma(v_0, \xi_1, \dots, \xi_n)$, the $\sigma$-field generated by the random variables $v_0, \xi_1, \dots, \xi_n$. From this and (5.3) it follows that
$$\mu_0[(R_j, R'_j) \text{ is separating}] = \sum_{n\ge j}\mu_0[(R_j, R'_j) \text{ is separating},\ R_j = n] = \sum_{n\ge j}\prod_{i=n+1}^{n+r}\mu_0[\xi_i \le d + \epsilon]\ \mu_0[R_j = n] = \big(\mu_0[\xi_1 \le d + \epsilon]\big)^r > 0. \qquad (5.19)$$
B) On the other hand, if the pair $(R_j, R'_j)$ is separating, then by the choice of $r_2$ in (5.9), at time $t_{R'_j}$ all n.p. with labels less than or equal to $R_j + r_1$ will be at distance at least $(r'+1)(E_{\mu_0}\xi_1 + \delta)$ from the h.p., i.e.,
$$\min_{1\le i\le R_j+r_1}\big\{q_i(t_{R'_j}) - q_{R'_j}\big\} \ge (r'+1)(E_{\mu_0}\xi_1 + \delta). \qquad (5.20)$$
In fact, to check (5.20) we first recall observation (ii) of Remark 5.5, from which we see that
$$q_i(t_{R'_j}) - q_{R'_j} \ge q_i(t_{R_j+r_1}) - q_{R_j+r_1} + (v_{\min} - \bar w_\epsilon)(t_{R'_j} - t_{R_j+r_1}).$$
Recalling the choice of $r_2$ (cf. (5.9)) and that $t_{R'_j} - t_{R_j+r_1} \ge r_2 d/\bar w_\epsilon$ for a separating pair $(R_j, R'_j)$, we get (5.20). We now check that property (5.20), together with observation (ii) in Remark 5.5 and Lemma 5.6, implies the following inclusion:
$$[(R_j, R'_j) \text{ is separating}] \cap \bigcap_{i=1}^{\infty}\big[R_{j+i} - R'_{j+i-1} \le r' + i\big] \subseteq [R_j + r_1 \text{ is a cluster index}]. \qquad (5.21)$$
Indeed, as observed in (ii) of Remark 5.5, the occurrence of the event $[(R_j, R'_j) \text{ is separating}]$ guarantees that during the time interval $[t_{R_j+r_1}, t_{R'_j}]$ the h.p. cannot interact with the n.p. with labels less than or equal to $R_j + r_1$; moreover, according to (5.20), at time $t_{R'_j}$ all these n.p. are to the right of the point $q_{R'_j} + (r'+1)(E_{\mu_0}\xi_1 + \delta)$. On the other hand, the occurrence of the event $[R_{j+1} - R'_j \le r' + 1]$ implies that
$$q_{R_{j+1}} \le q_{R'_j} + (r'+1)(E_{\mu_0}\xi_1 + \delta),$$
and we see that, in this case, at time $t_{R'_j}$ all moving n.p. with labels less than or equal to $R_j + r_1$ are to the right of the point $q_{R_{j+1}}$. In particular, the h.p. cannot interact with any of them during the time interval $[t_{R_j+r_1}, t_{R_{j+1}}]$. Moreover, from (5.16) we have
$$\min_{1\le i\le R_j+r_1}\big\{q_i(t_{R_{j+1}}) - q_{R_{j+1}}\big\} \ge (r'+2)(E_{\mu_0}\xi_1 + \delta) + B;$$
thus, using (5.17), we conclude that the absence of interaction extends up to time $t_{R'_{j+1}}$ and that
$$\min_{1\le i\le R_j+r_1} q_i(t_{R'_{j+1}}) \ge q_{R'_{j+1}} + (r'+2)(E_{\mu_0}\xi_1 + \delta). \qquad (5.22)$$
Therefore, if the event $[R_{j+2} - R'_{j+1} \le r' + 2]$ also occurs, then at time $t_{R'_{j+1}}$ all n.p. with labels less than or equal to $R_j + r_1$ will not only be to the right of the point $q_{R_{j+2}}$, but they will be unreachable for the h.p. during the time interval $(t_{R'_j}, t_{R'_{j+2}}]$, and we have
$$\min_{1\le i\le R_j+r_1} q_i(t_{R'_{j+2}}) \ge q_{R'_{j+2}} + (r'+3)(E_{\mu_0}\xi_1 + \delta). \qquad (5.23)$$
Repeating the above argument, we see that the occurrence of $[(R_j, R'_j) \text{ separating}]$ and of each of the successive events $[R_{j+1} - R'_j \le r'+1], \dots, [R_{j+k+1} - R'_{j+k} \le r'+k+1]$ implies, by (5.13), that
$$\min_{1\le i\le R_j+r_1} q_i(t_{R'_{j+k+1}}) \ge q_{R'_{j+k+1}} + (r'+k+2)(E_{\mu_0}\xi_1 + \delta). \qquad (5.24)$$
In these circumstances, the further occurrence of the event $[R_{j+k+2} - R'_{j+k+1} \le r'+k+2]$ guarantees that during the time interval $(t_{R'_{j+k+1}}, t_{R'_{j+k+2}}]$ the h.p. cannot interact with any n.p. with labels less than or equal to $R_j + r_1$, and moreover the interparticle distance will again increase by at least $E_{\mu_0}\xi_1 + \delta$. Repeatedly using the same argument we obtain (5.21).

Let us now use (5.21) in order to estimate the probability in the statement of Theorem 5.2. For this, we first prove the existence of positive constants $c, c'$ such that
$$\mu_0[\exists\, j \le n : R_j + r_1 \text{ is a cluster index}] \ge 1 - c\,e^{-c'n}. \qquad (5.25)$$
For the proof of (5.25) let us start by observing that, due to (5.21), the event of interest can be controlled in terms of the time of the last visit to $0$ of a renewal Markov chain $\zeta$ on $\mathbb{N}$, starting at $0$ and with transition probabilities satisfying
$$\mu_0[\zeta_{k+1} = i+1 \mid \zeta_k = i] \ge 1 - c\,e^{-c'i}, \qquad \mu_0[\zeta_{k+1} = 0 \mid \zeta_k = i] = 1 - \mu_0[\zeta_{k+1} = i+1 \mid \zeta_k = i] \qquad (5.26)$$
for each $i \in \mathbb{N}$, where $c, c'$ are positive, with $c < 1$ (these of course might differ from those in (5.25)). A possible way to define such a chain is indicated in (5.21) and (5.11). Namely, $\zeta_0 = 0$, and for any $k \ge 0$: if $\zeta_k = 0$ we set $\zeta_{k+1} = 1$ provided $(R_{k+1}, R'_{k+1})$ is separating, and $\zeta_{k+1} = 0$ otherwise; if $\zeta_k = i$ ($i \ge 1$), then $\zeta_{k+1} = i+1$ provided $R_{k+1} - R'_k \le r' + i$, and $\zeta_{k+1} = 0$ otherwise. That $\zeta$ is a Markov chain with the transition probabilities described above follows at once from (5.11), (5.12) and (5.19). From (5.21) we get
$$\mu_0[\exists\, j \le n : R_j + r_1 \text{ is a cluster index}] \ge \mu_0[\zeta_k > 0 \text{ for all } k > n].$$
This observation, (5.26), and standard facts on the above simple Markov chain ([8], Ch. 2) immediately give (5.25). To conclude the proof of Theorem 5.2 it now suffices to observe that there exists $b > 0$ such that $\mu_0\{R_n > bn\}$ decays exponentially in $n$. This follows from (5.11), the exponentially decaying tail of the distribution of $R_0 = R'_0$, and standard arguments, as already used in Sect. 4 (upper estimate in Cramér's theorem).

Now we turn to the convergence of the measures $\mu^{(n)}$. The existence of weak limit points for the sequence $\{\mu^{(n)}\}$ was established by proving tightness in Sect. 4. We prove the convergence by suitably coupling any Cesàro limit of $\{\mu^{(n)}\}$ with the initial measure $\mu_0$. In particular, Theorem 2.6 will follow at once from the next lemma.

Lemma 5.7. Let $\mu$ be a stationary measure for $T_1$, obtained as some weak limit point of $\frac1n\sum_{i=1}^n\mu^{(i)}$. Then we can define a coupling $Q$ of $\mu$ and $\mu_0$ in such a way that for any $b > 0$ there exist positive constants $c, c'$ such that
$$Q\{(x', x'') : T_n x'|_{[0,b]} \ne T_n x''|_{[0,b]}\} \le c\,e^{-c'n} \qquad (5.27)$$
for all $n \ge 1$.

Proof. Let $\mu$ be as in the statement of the lemma. As in [5], under $\mu$ the moving particles are independent of the standing ones, which are distributed as under $\mu_0$. Let $\mu_m$ be the marginal distribution corresponding to the moving particles in $\mu$. Due to Assumptions 2.5 we know that under $\mu_m$ all n.p. have velocity at least $v_{\min}$, except possibly for a set of measure zero (see Remark 3.2). Let us now make a joint construction of $\mu$ and $\mu_0$ which verifies (5.27). Let $x = (x_m, \xi_1(x), \xi_2(x), \dots)$ be a random configuration distributed according to $\mu$, where $x_m$ represents the moving particles, $\xi_1$ denotes the position of the leftmost standing n.p., and $\xi_2, \xi_3, \dots$ denote the interparticle distances between successive standing neutral particles. (Recall that we look at the system as seen from the position of the h.p., which is then located at the origin.) We start the construction of the two initial configurations $x'$, $x''$, distributed as $\mu$ and $\mu_0$ respectively, by first setting their moving particles: $x'_m = x_m$, and $x''_m$ contains only one particle, namely the h.p., located at the origin, with velocity $v_0(x'')$ distributed according to the initial measure $\mu_0$ and independent of $x$. For this we might need to enlarge the original probability space where $x$ is defined. We shall enlarge it even more, if needed, so as to accommodate the configurations $x'$, $x''$, cf. the remark at the end of this proof.

Notation. To make notation lighter we shall denote by $\xi'_i$, $\xi''_i$, $i \ge 1$, the successive interdistances between standing n.p. in the configurations $x'$ and $x''$ respectively, and by $\xi_i = \xi_i(x)$ the interdistances in the starting configuration $x$.

Let us now slightly modify the definition of $R_0$ given in Sect. 4, by setting
$$R_0(x, v_0(x'')) = \min\Big\{n : \alpha^{2n}\big(v_0^2(x) \vee v_0^2(x'')\big) + 2f\sum_{i=1}^n\alpha^{2(n-i+1)}\xi_i(x) \le V_*^2\Big\},$$
and set $\xi'_i = \xi''_i = \xi_i$ for $1 \le i \le R_0(x, v_0(x''))$. This automatically implies that in the Markovian evolution for both configurations $x'$ and $x''$, at time $\bar t_{R_0(x,v_0(x''))}$ (i.e., at times $\bar t_{R_0(x,v_0(x''))}(x')$ $(= \bar t_{R_0(x,v_0(x''))}(x))$ and $\bar t_{R_0(x,v_0(x''))}(x'')$ resp.), the velocity of the h.p. will be bounded from above by $V_*$. By Proposition 3.3, the same will be true for the original dynamics, at the corresponding discrete times $t_{R_0(x,v_0(x''))}(x')$ and $t_{R_0(x,v_0(x''))}(x'')$. The further indices $R_j$, $R'_j$ are also modified and defined as in Eqs. (5.2) and (5.3), provided $r = r_1 + r_2$ is changed to $\tilde r = \tilde r_1 + r_2$, with $\tilde r_1$ to be chosen below. Before giving the prescription for $\tilde r$, it is important to observe that with such a rule the random variables $R_j - R_0$, $R'_j - R_0$ depend only on the configuration $x$, being described exactly by (5.2) and (5.3), with the admissibility condition settled in terms of the interdistances $\xi_i$, according to Definition 4.2. Letting $\epsilon$ be defined according to (5.4) and $\rho > 0$, we set
$$r(\rho) \stackrel{\text{def}}{=} \min\{j : \alpha^{2j}\bar w_\epsilon^2 \le \rho\},$$
and $\tilde r_1$ will be taken as $\tilde r_1 = r_1 + r(\rho)$ for a suitable $\rho > 0$ to be described below; $r_2$ is the same as in (5.9). The definition of a separating pair is changed accordingly, replacing $r$ by $\tilde r$. Let then both configurations have the same interdistances between successive standing particles as the configuration $x$ up to the index $R_{j_1} + \tilde r_1$, where $j_1 = \min\{k : R'_k = R_k + \tilde r\}$, i.e. $j_1$ is the index of the first separating pair. Observe that $[R_{j_1} = n] \in \mathcal{A}_n$, where
$$\mathcal{A}_n = \sigma\big(v_0(x''), x_m, \xi_1, \dots, \xi_{n+\tilde r}, \xi'_1, \dots, \xi'_n, \xi''_1, \dots, \xi''_n\big),$$
for each $n \ge 1$. The choice of the parameters was made in such a way that at $t_{R_{j_1}+\tilde r_1}$ the difference of the squared velocities of the h.p. in the two evolutions (from $x'$, $x''$) is at most $\rho$. Indeed, recalling the definition of $r(\rho)$ and the choice of the same interdistances, this is a consequence of the fact that for both evolutions, at time $t_{R_{j_1}+r_1}$, the h.p. has velocity at most $\bar w_\epsilon$, and that it does not interact with moving particles during the time intervals $[t_{R_{j_1}+r_1}, t_{R_{j_1}+\tilde r_1}]$ (for both configurations), as observed in (ii) of Remark 5.5.
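The role of $r(\rho)$ can be seen on a toy version of the two evolutions. The sketch below assumes that, between $t_{R_{j_1}+r_1}$ and $t_{R_{j_1}+\tilde r_1}$, each h.p. only meets standing particles and updates its squared speed by $s \mapsto \alpha^2(s + 2f\xi)$ (consistent with the squared-velocity formula used in (4.5)); with common interdistances the difference of squared speeds is then multiplied by $\alpha^2$ at every step, so after $r(\rho)$ steps it is at most $\alpha^{2r(\rho)}\bar w_\epsilon^2 \le \rho$. Function names and values are hypothetical.

```python
def r_of_rho(alpha, w_bar, rho):
    """r(rho) = min{ j >= 0 : alpha^(2j) * w_bar^2 <= rho }."""
    if not (0.0 < alpha < 1.0 and rho > 0.0):
        raise ValueError("need 0 < alpha < 1 and rho > 0")
    j = 0
    while alpha ** (2 * j) * w_bar ** 2 > rho:
        j += 1
    return j


def evolve_squared_speeds(s1, s2, xis, alpha, f):
    """Push two squared speeds through the same interdistances with the assumed
    standing-particle rule s -> alpha^2 (s + 2 f xi); the difference s1 - s2
    is multiplied by alpha^2 at every step."""
    for xi in xis:
        s1 = alpha ** 2 * (s1 + 2.0 * f * xi)
        s2 = alpha ** 2 * (s2 + 2.0 * f * xi)
    return s1, s2


steps = r_of_rho(alpha=0.5, w_bar=1.0, rho=0.01)            # 0.25**4 <= 0.01 < 0.25**3
s1, s2 = evolve_squared_speeds(1.0, 0.25, [1.0, 1.1, 1.05, 1.02][:steps], 0.5, 1.0)
```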
The distance to the next standing particle in both configurations $x'$, $x''$ shall be properly chosen, so as to guarantee that with positive probability the next collision will happen with a standing particle and in both configurations the h.p. will get the same velocity. In order to do that we select these interdistances, $(\xi'_{R_{j_1}+\tilde r_1+1}, \xi''_{R_{j_1}+\tilde r_1+1})$, in such a way that their regular conditional distribution given the $\sigma$-field $\mathcal{A}_{R_{j_1}}$ is the measure $\nu_{v',v''}$ described in Remark 5.8 below, where $v' = v_0(T_{R_{j_1}+\tilde r_1}(x'))$ and $v'' = v_0(T_{R_{j_1}+\tilde r_1}(x''))$; each of the marginals of $\nu_{v',v''}$ coincides with the distribution of $\xi_1$ conditioned on $\xi_1 \le d + \epsilon$, and
$$\nu_{v',v''}\{(\xi', \xi'') : (v')^2 + 2f\xi' = (v'')^2 + 2f\xi''\} \ge a(\epsilon, \rho) > 0 \qquad (5.28)$$
for a proper choice of $\rho$. More precisely, given $R_{j_1} = n$, $v_0(T_{R_{j_1}+\tilde r_1}(x')) = v'$ and $v_0(T_{R_{j_1}+\tilde r_1}(x'')) = v''$, we take $(\xi', \xi'')$ (conditionally) independent of $\mathcal{A}_n$ and distributed according to the measure $\nu_{v',v''}$, and these will be the values of the interdistances $(\xi'_{R_{j_1}+\tilde r_1+1}, \xi''_{R_{j_1}+\tilde r_1+1})$. (Notice that the marginals of $\nu_{v',v''}$ do not depend on $v'$, $v''$, as needed to ensure the correctness of the marginal distributions of $x'$ and $x''$.) We are here using the standard definition of $\mathcal{A}_{R_{j_1}}$, i.e., $A \in \mathcal{A}_{R_{j_1}}$ if and only if $A \in \mathcal{A}_\infty$ and $A \cap [R_{j_1} = n] \in \mathcal{A}_n$ for each $n$.

Remark 5.8. Due to the absolute continuity of the distribution of $\xi_1$ and the definition of $d$, cf. Assumptions 2.5, we see that for any given $\epsilon > 0$, if $Z$ denotes a random variable distributed according to the conditional distribution of $\xi_1$ given $d < \xi_1 \le d + \epsilon$, then the variational distance between the distributions of $u' + 2fZ$ and of $u'' + 2fZ$ tends to zero as $|u' - u''| \to 0$. On the other hand, given two probability measures $P'$, $P''$ on $\mathbb{R}$, there exists a probability measure $Q$ on $\mathbb{R} \times \mathbb{R}$ with marginals $P'$ and $P''$ and such that the mass off the diagonal is equal to half of the variational distance between $P'$ and $P''$. This measure is called a "maximal coupling" of $P'$ and $P''$, since it maximizes the mass on the diagonal among all probability measures with the given marginals. (For a proof of this classical result on coupling see e.g. p. 18 of [10].) As a consequence of this observation we can find a probability measure $\nu_{v',v''}$ satisfying (5.28), provided $v'$ and $v''$ are close enough. In order to fix notation, for $\epsilon$ as fixed before, we can take $\rho(\epsilon)$ so that $a(\epsilon, \rho) > 0$ for $0 < \rho < \rho(\epsilon)$, since the choice of $\tilde r_1$ guarantees that $|(v')^2 - (v'')^2| \le \rho$, as previously seen.

We now set $\xi'_{R_{j_1}+\tilde r_1+i} = \xi''_{R_{j_1}+\tilde r_1+i} = \xi_{R_{j_1}+\tilde r_1+i}(x)$ for $2 \le i \le r_2$. By the choice of $j_1$ we already know that all these variables are at most $d + \epsilon$ (since the pair $(R_{j_1}, R'_{j_1})$ is separating). Two cases have to be considered:

(a) If $(v')^2 + 2f\xi' = (v'')^2 + 2f\xi''$, we continue with the same interdistances between standing particles as in the configuration $x$, and look at the Markov chain $\zeta$ used in the proof of Theorem 5.2, now defined in terms of the random variables $R_{j_1+1} - R'_{j_1}, \dots$, starting with $\zeta_{j_1} = 1$. The coupling with the same interdistances is done up to the first label $k_1 > j_1$ such that $\zeta_{k_1} = 0$, if any.
If no such $k_1$ exists, it follows from the construction that $R_{j_1} + \tilde r_1 + 1$ will be a cluster index for both configurations $x'$ and $x''$, and that at this step of the discrete dynamics both evolutions will have the same velocity. If $k_1$ is finite, notice that at the corresponding $R'_{k_1}$ we need to restart the previous construction. That is, we keep using the same interdistances between standing particles as prescribed by the configuration $\{\xi_n\}_n$ up to the index $R_{j_2} + \tilde r_1$, where $j_2 = \min\{k > k_1 : R'_k = R_k + \tilde r\}$, and then we repeat the proper coupling of the next interdistances as done above. More precisely, we take $(\xi'_{R_{j_2}+\tilde r_1+1}, \xi''_{R_{j_2}+\tilde r_1+1})$ in such a way that their regular conditional distribution given the $\sigma$-field $\mathcal{A}_{R_{j_2}}$ is $\nu_{v_0(T_{R_{j_2}+\tilde r_1}(x')),\,v_0(T_{R_{j_2}+\tilde r_1}(x''))}$, and repeat the previous argument.

(b) If $(v')^2 + 2f\xi' \ne (v'')^2 + 2f\xi''$, then we use the same procedure but taking $j_2 = \min\{k > j_1 : R'_k = R_k + \tilde r\}$.

In this way two infinite configurations $x'$, $x''$ can be constructed, and it is not hard (though tedious to write down in all details) to see that each of them has the proper distribution. Using the same arguments as in the proof of Theorem 5.2, we conclude that if $Q$ denotes their joint distribution, then it has the correct marginals, $\mu$ and $\mu_0$, and verifies
$$Q\Big\{(x', x'') : \text{there exists } j \le n \text{ such that } R_j + \tilde r_1 + 1 \text{ is a cluster index for both configurations and } v_0(T_{R_j+\tilde r_1+1}(x')) = v_0(T_{R_j+\tilde r_1+1}(x''))\Big\} \ge 1 - c\,e^{-c'n}, \qquad (5.29)$$
from which the lemma follows.
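The renewal chain $\zeta$ behind (5.25) and (5.29) can be explored numerically. The sketch below takes equality in (5.26), i.e. $P(i \to i+1) = p_i = 1 - c\,e^{-c'i}$ and $P(i \to 0) = 1 - p_i$, with hypothetical constants $c = c' = 1/2$ (so $c < 1$ as required), and computes exactly, by propagating the law of $\zeta_n$, the probability $q_n = P(\zeta_k > 0 \text{ for all } k > n)$. Since $\sum_i e^{-c'i} < \infty$, the chain eventually escapes to infinity, and $1 - q_n$ decays exponentially in $n$.

```python
import math


def escape_probabilities(n_max, c=0.5, c_prime=0.5, m_max=2000):
    """q[n] = P(zeta_k > 0 for all k > n), n = 0..n_max, for the chain started at 0
    with P(i -> i+1) = p_i = 1 - c*exp(-c_prime*i) and P(i -> 0) = 1 - p_i.
    Taking equality in (5.26) and these constants is an illustrative assumption."""
    if not (0.0 < c < 1.0 and n_max < m_max):
        raise ValueError("need 0 < c < 1 and n_max < m_max")
    p = [1.0 - c * math.exp(-c_prime * i) for i in range(m_max + 1)]
    # H[i] = prod_{m >= i} p_m = P(zeta never returns to 0 afterwards | zeta_n = i),
    # truncated at m_max (the neglected factors are equal to 1 in floating point)
    H = [1.0] * (m_max + 2)
    for i in range(m_max, -1, -1):
        H[i] = H[i + 1] * p[i]
    dist = [1.0]                      # law of zeta_n, supported on {0, ..., n}
    q = []
    for _ in range(n_max + 1):
        q.append(sum(w * H[i] for i, w in enumerate(dist)))
        new = [0.0] * (len(dist) + 1)
        for i, w in enumerate(dist):
            new[i + 1] += w * p[i]    # climb one level
            new[0] += w * (1.0 - p[i])  # fall back to 0
        dist = new
    return q


q = escape_probabilities(200)
```

The sequence `q` is nondecreasing and approaches $1$ geometrically fast, which is the qualitative content of (5.25).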
Remarks. (1) There are two crucial ingredients for the correctness of the above construction: (a) given $R_{j_i}$, the future of the Markov chain $\zeta_{j_i+1}, \dots, \zeta_{j_i+k}$ is a function only of the interdistances in the configuration $x$: $\xi_{R_{j_i}+1}, \dots, \xi_{R_{j_i}+k}$; (b) the marginals of the measure $\nu_{v',v''}$ do not depend on the values of $v'$, $v''$, and are the conditional distribution of $\xi_1$ given $d < \xi_1 \le d + \epsilon$.
(2) A "constructive" approach to performing such a coupling would be to enlarge the original space where $x$ is defined by taking a product of independent random variables $U_0, U'_1, U''_1, U'_2, U''_2, \dots$, all of them uniformly distributed on the unit interval $(0,1)$ and completely independent of $x$. Then $U_0$ is used to generate $v_0(x'')$, and each pair $(U'_i, U''_i)$ is used to generate the observation of $\nu_{v',v''}$ needed at the index $R_{j_i} + \tilde r_1 + 1$, i.e. to generate $(\xi'_{R_{j_i}+\tilde r_1+1}, \xi''_{R_{j_i}+\tilde r_1+1})$.
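The maximal coupling of Remark 5.8 and the two-uniform recipe of Remark (2) can be illustrated on finitely supported laws. This is a simplification: in the text the laws are absolutely continuous (those of $(v')^2 + 2fZ$ and $(v'')^2 + 2fZ$), and the discrete vectors below are only hypothetical stand-ins, e.g. a law and its shift by one grid step. The matrix has the prescribed marginals and puts mass exactly $\frac12\sum_i|p_i - q_i|$ off the diagonal; the sampler uses one uniform to decide between the diagonal and off-diagonal parts and the other to pick the values.

```python
def maximal_coupling(p, q):
    """Maximal coupling of probability vectors p, q on {0, ..., K-1}: a K x K matrix
    with row sums p, column sums q and off-diagonal mass (1/2) sum |p_i - q_i|."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    overlap = [min(a, b) for a, b in zip(p, q)]
    beta = sum(overlap)                       # total mass placed on the diagonal
    K = len(p)
    Q = [[0.0] * K for _ in range(K)]
    for i in range(K):
        Q[i][i] = overlap[i]
    if beta < 1.0:
        rest_p = [a - m for a, m in zip(p, overlap)]
        rest_q = [b - m for b, m in zip(q, overlap)]
        for i in range(K):
            for j in range(K):
                # rest_p[i] * rest_q[i] == 0, so nothing is added on the diagonal
                Q[i][j] += rest_p[i] * rest_q[j] / (1.0 - beta)
    return Q


def _pick(weights, u):
    """Index i with sum(weights[:i]) <= u * sum(weights) < sum(weights[:i+1])."""
    target = u * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if target < acc:
            return i
    return max(i for i, w in enumerate(weights) if w > 0)   # rounding guard


def sample_pair(p, q, u1, u2):
    """One draw from the maximal coupling driven by two independent uniforms."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    beta = sum(overlap)
    if u1 < beta:                             # coupled: both coordinates agree
        i = _pick(overlap, u2)
        return i, i
    v = (u1 - beta) / (1.0 - beta)            # a fresh uniform on [0, 1)
    i = _pick([a - m for a, m in zip(p, overlap)], v)
    j = _pick([b - m for b, m in zip(q, overlap)], u2)
    return i, j


p_law = [0.2, 0.3, 0.5, 0.0]                  # hypothetical law
q_law = [0.0, 0.2, 0.3, 0.5]                  # the same law shifted by one grid step
Q_mat = maximal_coupling(p_law, q_law)
```

Here half of the mass can be put on the diagonal, so a draw with $u_1 < 1/2$ returns equal coordinates, which is the analogue of the event in (5.28).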
As a consequence of Theorem 2.6, we may now consider the stationary process $(\tau_n)_{n\ge1}$, defined on the space $(X_0, \mathcal{B}(X_0), \mu)$ as: $\tau_1(x) = t_1(x)$ and, for $n \ge 2$, $\tau_n(x) = t_1(T_1^{(n-1)}(x))$. Notice that this is well defined, i.e., $\tau_n(x) < +\infty$ $\mu$ a.s., since $\tau_n \le \sqrt{2\xi_n/f} + \xi_n/v_{\min} \le c\,\xi_n$ for some positive constant $c$. Moreover, the same argument used in Lemma 5.7 yields the mixing property for this process, as stated in Theorem 2.7 and proven below.
6. Proof of Theorem 2.7 Proof of Parts (I) and (II). For any set A ∈ Mk0 with µ(A) > 0 we consider the measures µ(·|A) and µ restricted to M+∞ k+1 . Proceeding as in the proof of Lemma 5.7 but applied to the evolution starting with Tk x, we see that a coupling Qk,A of two configurations x and x distributed according to µ(·) and µ(·|A), respectively, may be constructed in such a way that with probability at least 1 − ce−c m a joint cluster index less than m (which would correspond to the label k + m in the original system) will be found and such that at this epoch of the discrete dynamics, both configurations will exhibit the same velocity for the heavy particle. This automatically will ensure that on this set τj (x ) = τj (x ), for all j ≥ m. In order to perform this coupling we need first to observe that under each of these measures the law of the standing neutral particles is the same as under µ0 , and that they are independent of the moving particles. All moving n.p. have velocity at least vmin and moreover, under both measures the velocity of h.p. satisfies condition (b1) of Assumption 2.5. The construction done in the proof of Lemma 5.7 can be properly repeated here. With that same construction we have that supB∈M+∞ |µ(B|A) − µ(B)| k+m will be bounded by the complementary probability of the event in the l.h.s of Eq. (5.29), under the measure Qk,A . Since the constants c, c in this analogue of Eq. (5.29) can be chosen independently of k and A, as the construction shows, we conclude the proof of the first part of the theorem. def
Recall that under our conditions $\tilde\tau := E_\mu \tau_1 < +\infty$. Thus, the ψ-mixing property of the process $(\tau_n)$ yields at once
$$\frac1n \sum_{i=1}^n \tau_i \to \tilde\tau \qquad \mu\text{-a.s.}$$
(the same holds also $\mu_0$-a.s.). Since the random variables $\xi_i$ satisfy the law of large numbers under both $\mu$ and $\mu_0$, if $\vartheta_t(x)$ denotes the number of collisions of the
Mixing Properties for the Mechanical Motion of Charged Particle
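h.p. with standing particles up to time $t$, i.e., $\vartheta_t(x) = k$ if $t_k(x) \le t < t_{k+1}(x)$, for $k \ge 0$, we get at once that
$$\lim_{s\to+\infty} \frac{q_{\vartheta_s}(x)}{t_{\vartheta_s}(x)} = \frac{E_{\mu_0}(\xi_1)}{\tilde\tau} =: v_D \qquad \mu_0\text{-a.s.}$$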
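The identification of $v_D$ is a ratio of two averages taken over the same collisions: $q_n$ and $t_n$ are partial sums of the $\xi_i$ and of the $\tau_i$. The Python sketch below only illustrates that arithmetic. It assumes, purely for illustration, i.i.d. interdistances $\xi_i = d + \mathrm{Exp}(1)$ and a hypothetical flight time $\tau_i = \sqrt{2\xi_i/f}$ (start from rest, constant force $f$); in the model the $\tau_i$ are dependent and only mixing, so this is not a simulation of the dynamics.

```python
import math
import random


def drift_ratio(n_collisions, d=1.0, f=2.0, seed=0):
    """Ratio (xi_1 + ... + xi_n) / (tau_1 + ... + tau_n) after n collisions.

    Illustrative assumptions, not the model's dynamics:
      xi_i  = d + Exp(1), i.i.d. interdistances between standing particles;
      tau_i = sqrt(2 * xi_i / f), a hypothetical flight time (start from rest,
              constant force f, travel distance xi_i).
    """
    rng = random.Random(seed)
    q = 0.0  # analogue of q_n: distance covered after n collisions
    t = 0.0  # analogue of t_n: time of the n-th collision
    for _ in range(n_collisions):
        xi = d + rng.expovariate(1.0)
        q += xi
        t += math.sqrt(2.0 * xi / f)
    return q / t


def drift_limit(d=1.0, f=2.0, grid=100_000, cutoff=40.0):
    """E(xi) / E(tau) under the same assumptions (midpoint rule for E(tau))."""
    h = cutoff / grid
    mean_tau = sum(
        math.sqrt(2.0 * (d + (k + 0.5) * h) / f) * math.exp(-(k + 0.5) * h) * h
        for k in range(grid)
    )
    return (d + 1.0) / mean_tau


print(drift_ratio(200_000), drift_limit())
```

With a few hundred thousand collisions the two numbers agree closely; under the i.i.d. assumption the plain law of large numbers suffices, whereas the text relies on the ψ-mixing property.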
From completely standard arguments we conclude that
$$\lim_{t\to+\infty} \frac{q_0(T_t(x))}{t} = \lim_{t\to+\infty} \frac{q_0(t,x)}{t} = v_D \qquad \mu_0\text{-a.s.},$$
completing the proof of Part II.

Proof of Part III. An important step for the proof of Part III of Theorem 2.7 is the diffusivity of $\sum_{i=1}^n (\tau_i - E_\mu\tau_1)/\sqrt n$ and of $\sum_{i=1}^n (\xi_i - v_D\tau_i)/\sqrt n$ under the measure $\mu_0$. We first use the coupling constructed in the proof of Lemma 5.7 to reduce the proof of the invariance principle to the analysis of the stationary system, under the measure $\mu$. For this, analogously to [5], we use Eq. (5.29) with $n$ replaced by $n^a$ for some $0 < a < 1/2$, which guarantees that with probability tending to one as $n \to +\infty$ we have $\tau_i(x') = \tau_i(x'')$ for $i \ge n^a$. To treat the stationary system, the main technical point is the following lemma, whose proof is postponed to the Appendix.

Lemma 6.1. The quantities
$$\sigma^2 := \lim_{n\to+\infty} \frac{E_\mu(t_n - E_\mu t_n)^2}{n} = E_\mu(\tau_1 - E_\mu\tau_1)^2 + 2\sum_{i=2}^{+\infty} E_\mu(\tau_1 - E_\mu\tau_1)(\tau_i - E_\mu\tau_1), \tag{6.1}$$
$$\sigma'^2 := \lim_{n\to+\infty} \frac{E_\mu(q_n - v_D t_n)^2}{n} = E_\mu(\xi_1 - v_D\tau_1)^2 + 2\sum_{i=2}^{+\infty} E_\mu(\xi_1 - v_D\tau_1)(\xi_i - v_D\tau_i), \tag{6.2}$$
with $v_D = E_\mu(\xi_1)/E_\mu(\tau_1)$, are finite and strictly positive. (Recall that $E_\mu(\xi_1) = E_{\mu_0}(\xi_1)$.)

Once Lemma 6.1 has been proven, the proof of Part III of Theorem 2.7 follows exactly the same pattern as that of part (ii) of Theorem 2 in [5], yielding the invariance principle with $\tilde\sigma = \sigma'/\sqrt{E_\mu(\tau_1)}$. Details are omitted at this point.
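The constant $\sigma^2$ in (6.1) is the long-run variance of the stationary sequence $(\tau_i)$: the variance plus twice the sum of the covariances. A minimal Python sketch, assuming a toy moving-average sequence $\tau_i = \sum_r \theta_r \varepsilon_{i+r}$ with i.i.d. unit-variance $\varepsilon_k$ (not the model's flight times), compares the series with the exact $\operatorname{Var}(t_n)/n$. The second example shows why positivity is the real content of Lemma 6.1: for $\theta = (1, -1)$ the series sums to $0$ and $\operatorname{Var}(t_n)$ stays bounded.

```python
def ma_autocovariances(theta):
    """gamma(h) = Cov(tau_1, tau_{1+h}), h = 0..q, for tau_i = sum_r theta[r] * eps_{i+r}
    with i.i.d. eps of unit variance (a toy stationary sequence)."""
    q = len(theta) - 1
    return [sum(theta[r] * theta[r + h] for r in range(q + 1 - h)) for h in range(q + 1)]


def long_run_variance(theta):
    """Analogue of the series in (6.1): gamma(0) + 2 * sum_{h >= 1} gamma(h)."""
    g = ma_autocovariances(theta)
    return g[0] + 2 * sum(g[1:])


def var_partial_sum(theta, n):
    """Exact Var(t_n) for t_n = tau_1 + ... + tau_n: sum of squared eps-coefficients."""
    coeff = [0] * (n + len(theta) - 1)
    for i in range(n):
        for r, th in enumerate(theta):
            coeff[i + r] += th
    return sum(c * c for c in coeff)


print(var_partial_sum([1, 1], 1000) / 1000, long_run_variance([1, 1]))  # -> 3.998 4
print(var_partial_sum([1, -1], 1000), long_run_variance([1, -1]))       # -> 2 0
```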
7. Appendix: Proof of Lemma 6.1

Let us first notice that the finiteness of $\sigma$ and $\sigma'$, as well as the last equality in each of Eqs. (6.1) and (6.2), follow at once from Part I of Theorem 2.7 and the stationarity of µ. The important point which remains to be proven is their positivity. The proof of Lemma 6.1 is analogous for $\sigma$ and $\sigma'$, so for brevity we consider only the first. Using a construction similar to, but more involved than, that of Sect. 5, we will show the existence of weakly dependent regions for which the corresponding
V. Sidoravicius, L. Triolo, M. E. Vares
flight times have variances that can be explicitly estimated from below by a positive constant. Together with a suitable control of the covariances with the remaining flight times, this gives a lower bound for the variance of $t_n$ which grows linearly in $n$. We begin with a “technical lemma”, which provides some needed estimates.

Lemma 7.1. Given $\varepsilon > 0$, there exist constants $c_\varepsilon > 0$, $0 < \varepsilon_1 \le \varepsilon_2 \le \varepsilon_3 \le \varepsilon_4 \le \varepsilon_5$ and integers $k_1 > 0$, $k_2 > 0$ such that, if we define a finite sequence $H = \{H_i\}_{i=1}^{k_1+k_2+r_2+4}$ with terms
$$H_i = \begin{cases} \varepsilon_1 & \text{if } i \in \{1, \dots, k_1\},\\ \varepsilon_2 & \text{if } i = k_1+1,\\ \varepsilon & \text{if } i \in \{k_1+2\} \cup \{k_1+k_2+5, \dots, k_1+k_2+r_2+4\},\\ \varepsilon_3 & \text{if } i \in \{k_1+3, \dots, k_1+k_2+2\},\\ \varepsilon_4 & \text{if } i = k_1+k_2+3,\\ \varepsilon_5 & \text{if } i = k_1+k_2+4, \end{cases} \tag{7.1}$$
and
$$E := \{\xi_i \le d + H_i,\ i = 1, \dots, k_1+k_2+r_2+4\},$$
then we have:

(a) $\mu$-almost surely on the set $[v_0(0) \le V_*]$,
$$\operatorname{Var}_\mu\big(t_{k_1+k_2+3} - t_{k_1+1} \,\big|\, E, v_0(0)\big) \ge c_\varepsilon > 0, \tag{7.2}$$
$$\operatorname{Cov}_\mu\big(t_{k_1+k_2+3} - t_{k_1+1},\ \tau_{k_1+1} \,\big|\, E, v_0(0)\big) \ge -\frac{c_\varepsilon}{4}, \tag{7.3}$$
$$\operatorname{Cov}_\mu\big(\tau_{k_1+k_2+3},\ \tau_{k_1+k_2+4} \,\big|\, E, v_0(0)\big) \ge -\frac{c_\varepsilon}{4}; \tag{7.4}$$

(b) on the event $E \cap [v_0(0) \le V_*]$,
$$v_0(t_{j-1}) \le \alpha\sqrt{\frac{2df}{1-\alpha^2}} + \rho(H_j)$$
for $j = k_1+1,\ k_1+k_2+3,\ k_1+k_2+4$, where $\rho(\cdot)$ is defined in Remark 5.8.

We should notice that the constants $\varepsilon_i$, $k_1$, $k_2$ appearing above are “of technical character”: they have no direct relation to the nature of the model, but are sufficient to perform some explicit computations below. The proof of Lemma 7.1 is postponed to the end of the appendix; we now check that Lemma 6.1 follows once it is proven. For this we shall again use coupling methods, the basic idea being to combine the “trap”, defined by the finite set of restrictions on the interdistances (cf. Lemma 7.1), with the notion of cluster index and a coupling as in Lemma 5.7. This replaces the notion of “good cluster index” in [5]; the situation here is more involved due to the dependence on the velocities for the discrete time dynamics. We start with a convenient modification of Definition 5.3.
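The threshold $\alpha\sqrt{2df/(1-\alpha^2)}$ that recurs in Lemma 7.1 and in the coupling is the fixed point of the velocity recursion when every interdistance equals $d$. The Python sketch assumes the map $v \mapsto \alpha\sqrt{v^2 + 2f\xi}$ (acceleration by $f$ over a distance $\xi$, then multiplication by $\alpha$ at the collision), which is the form consistent with the sums appearing in (7.26) and (7.32); it ignores moving particles and is an illustration, not the paper's definitions.

```python
import math


def next_velocity(v, xi, alpha, f):
    """Assumed map: accelerate over distance xi under force f, then multiply by alpha."""
    return alpha * math.sqrt(v * v + 2.0 * f * xi)


def equilibrium_velocity(alpha, d, f):
    """alpha * sqrt(2 d f / (1 - alpha^2)): fixed point of the map when xi = d."""
    return alpha * math.sqrt(2.0 * d * f / (1.0 - alpha * alpha))


def velocity_after(v0, xis, alpha, f):
    """Iterate the map over the interdistances xis."""
    v = v0
    for xi in xis:
        v = next_velocity(v, xi, alpha, f)
    return v


def velocity_after_squared_closed_form(v0, xis, alpha, f):
    """alpha^(2k) v0^2 + 2 f sum_{i=1}^k alpha^(2(k-i+1)) xi_i, the shape of (7.26), (7.32)."""
    k = len(xis)
    return alpha ** (2 * k) * v0 ** 2 + 2.0 * f * sum(
        alpha ** (2 * (k - i + 1)) * xi for i, xi in enumerate(xis, start=1)
    )


v_star = equilibrium_velocity(0.5, 1.0, 2.0)
print(v_star, velocity_after(10.0, [1.0] * 60, 0.5, 2.0))
```

The squared velocity is an affine function of its previous value with slope $\alpha^2 < 1$, which is why a few collisions at distances close to $d$ bring any bounded initial velocity close to the threshold.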
Definition 7.2. Given $x \in X$, we recursively define the sequence of pairs $\{(\hat R_n(x), \check R_n(x))\}_{n\ge0}$ as follows:
$$\hat R_0(x) = \check R_0(x) = \min\{j : \bar v_0(\bar t_j) \le V_*\},$$
$$\hat R_{n+1}(x) = \min\{j : j \ge \check R_n(x) + r \text{ and } j \text{ is } \check R_n(x)\text{-admissible}\},$$
where $r$ satisfies (5.8) and (5.12), and
$$\check R_{n+1}(x) = \hat R_{n+1}(x) + \max\{1 \le k \le k_1+k_2+r_2+4 : \xi_{\hat R_{n+1}(x)+j} \le d + H_j,\ \forall\, 1 \le j \le k\}. \tag{7.5}$$
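Definition 7.2 measures how far a stretch of interdistances stays inside the “trap” thresholds $d + H_j$ of (7.1). The Python sketch below (function names are hypothetical) builds $H$ from (7.1) and computes the length added in (7.5) after a given start index; admissibility and the velocity condition that select the start index are not modelled, and returning 0 when even the first threshold fails is a convention assumed here.

```python
def build_H(k1, k2, r2, eps, eps_i):
    """Thresholds H_1, ..., H_{k1+k2+r2+4} of (7.1); eps_i = (eps_1, ..., eps_5)."""
    e1, e2, e3, e4, e5 = eps_i
    H = []
    for i in range(1, k1 + k2 + r2 + 5):
        if i <= k1:
            H.append(e1)
        elif i == k1 + 1:
            H.append(e2)
        elif i == k1 + 2 or i >= k1 + k2 + 5:
            H.append(eps)
        elif i <= k1 + k2 + 2:
            H.append(e3)
        elif i == k1 + k2 + 3:
            H.append(e4)
        else:  # i == k1 + k2 + 4
            H.append(e5)
    return H


def trap_length(xi, start, d, H):
    """max{k : xi_{start+j} <= d + H_j for all 1 <= j <= k}, as in (7.5).

    xi is a 0-based list with xi[m - 1] = xi_m, so xi_{start+j} is xi[start + j - 1].
    Returns 0 if the first threshold already fails (assumed convention).
    """
    k = 0
    for j in range(1, len(H) + 1):
        if start + j - 1 >= len(xi) or xi[start + j - 1] > d + H[j - 1]:
            break
        k = j
    return k


def strongly_separating(xi, start, d, H):
    """Definition 7.3: the whole trap of length k1 + k2 + r2 + 4 is filled."""
    return trap_length(xi, start, d, H) == len(H)


H = build_H(2, 1, 1, 0.5, (0.1, 0.2, 0.3, 0.4, 0.45))
print(H, trap_length([1.0] * 8, 0, 1.0, H))
```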
As before, the random indices $\hat R_n$, $\check R_n$ are $\mu$- and $\mu_0$-a.s. finite, and each $\hat R_n$ is a stopping time for the filtration $(\sigma(x_m, \xi_1, \dots, \xi_m))_{m\ge1}$. (Notice that $\hat R_0 = \check R_0 = R_0$, cf. Sect. 4.)

Definition 7.3. Given the sequence $\{(\hat R_n(x), \check R_n(x))\}_{n\ge1}$ defined above, we say that the pair $(\hat R_n(x), \check R_n(x))$ is strongly separating if $\check R_n(x) - \hat R_n(x) = k_1+k_2+r_2+4$.

Proof of Lemma 6.1. Due to (b) of Lemma 7.1, we may take the measure $\nu_{v,v'}$ as in Remark 5.8, with $v = \alpha\sqrt{2df/(1-\alpha^2)}$ and $v' \le v + \rho(H_i)$, for $i = k_1+1,\ k_1+k_2+3,\ k_1+k_2+4$. Let $x$ be distributed according to $\mu$; if needed, we shall enlarge the probability space, as in the proof of Lemma 5.7, and construct two random configurations $x'$, $x''$, both distributed according to $\mu$. We shall use this enlarged space to estimate the conditional variance of $t_n(x')$ given a suitable σ-field $\mathcal M$; upon integration, this gives the desired estimate on the variance of $t_n$ under $\mu$. We start by fixing the moving particles: $x'_m = x''_m = x_m$, i.e., both configurations have exactly the same moving particles as the configuration $x$. As before we use $\xi_i$ to denote the successive interparticle distances between standing particles in the configuration $x$, and $\xi'_i$, $\xi''_i$ denote the corresponding interdistances in the configurations $x'$, $x''$, respectively. The random indices $\hat R_n$, $\check R_n$ are taken as functions of the configuration $x$ only, according to the previous definition. Moreover, we set, for $s \ge 1$,
$$j_s = \min\{j \ge j_{s-1} + 1 : (\hat R_j, \check R_j) \text{ is strongly separating}\},$$
with $j_0 = 0$. As before, setting
$$\hat{\mathcal A}_n = \sigma(x_m, \xi_1, \dots, \xi_{n+k_1+k_2+r_2+4}, \xi'_1, \dots, \xi'_n, \xi''_1, \dots, \xi''_n),$$
we see that $[\hat R_{j_s} = n] \in \hat{\mathcal A}_n$ for each $n \ge 1$, $s \ge 1$. Moreover we define $\xi'_i = \xi''_i = \xi_i$ for $i \ne \hat R_{j_s}+k_1+1,\ \hat R_{j_s}+k_1+k_2+3,\ \hat R_{j_s}+k_1+k_2+4$, for all $s \ge 1$. The conditional distribution of $(\xi'_{\hat R_{j_s}+k+1}, \xi''_{\hat R_{j_s}+k+1})$ given $\hat{\mathcal A}_{\hat R_{j_s}+k}$ is $\nu_{v,v'}$, as described in Remark 5.8, where $v = \alpha\sqrt{2df/(1-\alpha^2)}$ and $v' = v_0\big(T^{\hat R_{j_s}+k}(x')\big)$, for $k = k_1,\ k_1+k_2+2,\ k_1+k_2+3$ and for each $s \ge 1$. The procedure is well defined, and in this way we construct two configurations $(x', x'')$, each distributed according to $\mu$, analogously to the coupling in the proof of Lemma 5.7. Besides the change in the finite “trap” involved in the definition of “strongly separating” with respect to what we called “separating” in Sect. 5, our coupling procedure has been modified to fit the present purposes.
We denote by $\tilde Q$ the distribution of the configurations $(x', x'')$ constructed by this prescription; it follows from the construction that the random variables $\hat R_j$, $\check R_j$ take the same value in $x'$, $x''$ and $x$. For $s \ge 1$ we define
$$Z_s(x', x'') = \begin{cases} 1 & \text{if conditions a) and c) below hold},\\ 0 & \text{otherwise}, \end{cases} \tag{7.6}$$
where
a) $v_0^2\big(T^{\hat R_{j_s}+k}(x')\big) + 2f\,\xi'_{\hat R_{j_s}(x)+k+1} = \alpha^2\,\dfrac{2df}{1-\alpha^2} + 2f\,\xi''_{\hat R_{j_s}(x)+k+1}$ for $k = k_1,\ k_1+k_2+2,\ k_1+k_2+3$;
c) $\hat R_{j_s+i} - \hat R_{j_s+i-1} \le r + i$ for all $i \ge 1$.

Call $\mathcal M = \sigma(Z_1, Z_2, Z_3, \dots, \hat R_{j_1}, \hat R_{j_2}, \dots)$ the σ-algebra generated by the variables $Z_s$, $\hat R_{j_s}$, $s \ge 1$. In particular, from (5.28) and the properties of the Markov chain $\zeta$ involved in the proof of Theorem 5.2, we conclude that the pair correlations of the variables $Z_s$ decay exponentially fast and that there exists $\theta > 0$ such that
$$\inf_s \tilde Q(Z_s = 1) \ge \theta. \tag{7.7}$$
The next step is to prove that
$$\operatorname{Var}_{\tilde Q}\big(t_n(x') \,\big|\, \mathcal M\big) \ge \frac{c_\varepsilon}{2}\sum_s Z_s\,\mathbf 1_{[\hat R_{j_s}+k_1+k_2+3 \le n]} \qquad \tilde Q\text{-a.s.} \tag{7.8}$$
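The passage from the conditional bound (7.8) to a bound on the unconditional variance uses the law of total variance, $\operatorname{Var}(Y) = E[\operatorname{Var}(Y|\mathcal M)] + \operatorname{Var}(E[Y|\mathcal M])$, hence $\operatorname{Var}(Y) \ge E[\operatorname{Var}(Y|\mathcal M)]$. A small exact check in Python, with a two-valued conditioning variable as an illustrative stand-in for the indicators $Z_s$:

```python
from fractions import Fraction


def variance_decomposition(joint):
    """joint: {(m, y): probability}. Returns (Var Y, E[Var(Y|M)], Var(E[Y|M]))."""
    pm = {}
    for (m, _), p in joint.items():
        pm[m] = pm.get(m, 0) + p
    ey = sum(p * y for (_, y), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (_, y), p in joint.items())
    cond_mean = {
        m: sum(p * y for (mm, y), p in joint.items() if mm == m) / pm[m] for m in pm
    }
    within = sum(p * (y - cond_mean[m]) ** 2 for (m, y), p in joint.items())
    between = sum(pm[m] * (cond_mean[m] - ey) ** 2 for m in pm)
    return var_y, within, between


# M = 1 plays the role of an active trap (Y spread out), M = 0 of an inactive one.
joint = {(1, 0): Fraction(1, 4), (1, 2): Fraction(1, 4), (0, 3): Fraction(1, 2)}
print(variance_decomposition(joint))
```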
Before proving Eq. (7.8), let us observe that the validity of Eq. (6.1) follows quite easily from it. For this, let us first observe that
$$\hat R_m \le (k_1+k_2+r_2+4)(m-1) + \hat R_1 + \sum_{j=2}^m \hat Y_j,$$
where $\hat Y_j := \hat R_j - \check R_{j-1}$, $j \ge 2$, are i.i.d. integrable random variables under $\mu$ (as well as $\mu_0$), and their common distribution has exponentially decaying tails, analogously to (5.11). (Observe that these variables depend only on the configuration $x$.) Thus, if $M$ is large enough, we have
$$\lim_{n\to+\infty} \mu\big[\hat R_{[n/M]} + k_1+k_2+3 > n\big] = 0. \tag{7.9}$$
(We use $[y]$ to denote the integer part of a non-negative real number $y$.) Also, the indices $j_s$ are partial sums of i.i.d. random variables with exponentially decaying tails: $\mu[j_1 = k] \le c\eta^k$ for suitable $c$ and $0 < \eta < 1$. Thus, for $M$ large enough,
$$\lim_{n\to+\infty} \mu\big[j_{[n/M]} > [n/M]\big] = 0. \tag{7.10}$$
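Bounds such as (7.9) are law-of-large-numbers statements for partial sums with a linear drift. A Monte Carlo sketch in Python, assuming geometric increments in place of the $\hat Y_j$ and hypothetical stand-ins `block` for $k_1+k_2+r_2+4$ and `c` for $k_1+k_2+3$: the overshoot probability vanishes when $M$ exceeds the mean increment per step (here $3 + 2 = 5$) and tends to one when it does not.

```python
import random


def prob_overshoot(n, M, block=3, p=0.5, c=3, trials=200, seed=1):
    """Monte Carlo estimate of P[block*(m-1) + Y_1 + ... + Y_m + c > n], m = n // M.

    Y_j are i.i.d. geometric on {1, 2, ...} with success probability p; Y_1 stands
    in for the first index. All parameter values are illustrative choices.
    """
    rng = random.Random(seed)
    m = n // M
    hits = 0
    for _ in range(trials):
        total = block * (m - 1) + c
        for _ in range(m):
            y = 1
            while rng.random() >= p:
                y += 1
            total += y
        if total > n:
            hits += 1
    return hits / trials


print(prob_overshoot(2000, 10), prob_overshoot(2000, 4))
```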
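On the other hand, from (7.8) we have
$$\frac1n \operatorname{Var}_{\tilde Q}(t_n) \ge \frac{c_\varepsilon}{2n} E_{\tilde Q}\Big(\sum_{s=1}^{[n/M]} Z_s\,\mathbf 1_{[\hat R_{j_{[n/M]}}+k_1+k_2+3 \le n]}\Big) \ge \frac{c_\varepsilon [n/M]}{2n}\Big(\theta - \mu\big[\hat R_{j_{[n/M]}}+k_1+k_2+3 > n\big]\Big), \tag{7.11}$$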
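and (6.1) follows at once from (7.9), (7.10) and (7.11).

The proof of Eq. (7.8) is based on Lemma 7.1, by decomposing $t_n$ into a sum of time intervals determined by $Z_s = 1$. Namely, let $s_1, s_2, \dots$ be the successive indices such that $Z_s = 1$ (recall Eq. (7.7) and the comment just before it) and, to avoid too messy indexation, write $R_i^* = \hat R_{j_{s_i}}$ and set
$$\hat\tau_i = t_{R_i^*+k_1+k_2+3} - t_{R_i^*+k_1+1}, \qquad i \ge 1,$$
$$\tilde\tau_i = t_{R_i^*+k_1+1} - t_{R_{i-1}^*+k_1+k_2+3}, \qquad i \ge 2, \qquad \tilde\tau_1 = t_{R_1^*+k_1+1}.$$
Letting $L$ be determined by $R_L^* + k_1+k_2+3 \le n < R_{L+1}^* + k_1+k_2+3$, we can write
$$\begin{aligned} \operatorname{Var}_{\tilde Q}(t_n(x')\,|\,\mathcal M) &= \operatorname{Var}_{\tilde Q}\Big(\sum_{i=1}^L (\tilde\tau_i + \hat\tau_i) + t_n - t_{R_L^*+k_1+k_2+3} \,\Big|\, \mathcal M\Big)\\ &= \sum_{i=1}^L \operatorname{Var}_{\tilde Q}(\tilde\tau_i\,|\,\mathcal M) + \sum_{i=1}^L \operatorname{Var}_{\tilde Q}(\hat\tau_i\,|\,\mathcal M) + \operatorname{Var}_{\tilde Q}\big(t_n - t_{R_L^*+k_1+k_2+3}\,\big|\,\mathcal M\big) + 2\sum_{1\le i\,\cdots} \end{aligned}$$
… 0, the same as in (5.4), and can take a positive integer $b$ such that $\mu_0\{d < \xi_1 < d + \varepsilon/b\}$ and $\mu_0\{d + 2\varepsilon/b < \xi_1 < d + \varepsilon\}$ are both positive. With $\phi > 0$ defined through
$$\tau\Big(0, \frac{2\varepsilon}{b}\Big) - \tau\Big(0, \frac{\varepsilon}{b}\Big) = 2\phi, \tag{7.14}$$
we take $u_2$ such that
$$0 < u_2 \le \frac{(1-\alpha)f}{4\alpha}\,\phi, \tag{7.15}$$
$$\tau\Big(u_2, \frac{2\varepsilon}{b}\Big) - \tau\Big(0, \frac{\varepsilon}{b}\Big) \ge \phi, \tag{7.16}$$
and
$$v\Big(0, \frac{2\varepsilon}{b}\Big) > v\Big(u_2, \frac{\varepsilon}{b}\Big). \tag{7.17}$$
Take now
$$p_1 := \frac{\mu_0\{d < \xi_1 < d + \varepsilon/b\}}{\mu_0\{d < \xi_1 < d + \varepsilon\}} > 0, \tag{7.18}$$
$$p_2 := \frac{\mu_0\{d + 2\varepsilon/b < \xi_1 < d + \varepsilon\}}{\mu_0\{d < \xi_1 < d + \varepsilon\}} > 0, \tag{7.19}$$
and define
$$c_\varepsilon := \frac12\Big(\frac{(1-\alpha)\phi}{2}\Big)^2 p_1 p_2. \tag{7.20}$$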
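Now we fix $0 < \varepsilon_5 < \varepsilon$ such that
$$\tau(0,d)\big(\tau(0,\varepsilon_5) - \tau(0,0)\big) \le \frac{c_\varepsilon}{8}. \tag{7.21}$$
In what follows we take $\rho(\cdot)$ sufficiently small, according to Remark 5.8. By continuity of $\tau(\cdot,\cdot)$ we can fix $0 < u_5 < \rho(\varepsilon_5)$ such that
$$\tau(0,d)\big(\tau(0,\varepsilon_5) - \tau(u_5,0)\big) \le \frac{c_\varepsilon}{4}, \tag{7.22}$$
and, due to the above mentioned properties of the functions $\tau(\cdot,\cdot)$, $v(\cdot,\cdot)$ and $\rho(\cdot)$, we may take $0 < \varepsilon_4 < \varepsilon$ and $u_4 > 0$ such that $0 < u_4 < \rho(\varepsilon_4)$,
$$v(u_4, \varepsilon_4) \le \alpha\sqrt{\frac{2df}{1-\alpha^2}} + u_5, \tag{7.23}$$
and
$$\tau(0,\varepsilon_4) - \tau(u_4,0) \le \frac{1-\alpha}{4}\,\phi. \tag{7.24}$$
Now we fix $k_2$ as the first integer such that
$$\alpha^{k_2} v(u_2, \varepsilon) \le \frac{u_4}{2}, \tag{7.25}$$
and pick $0 < \varepsilon_3 < \varepsilon$ such that
$$2f(d+\varepsilon_3)\sum_{i=1}^{k_2}\alpha^{2i} \le \Big(\alpha\sqrt{\frac{2df}{1-\alpha^2}} + \frac{u_4}{2}\Big)^2, \tag{7.26}$$
as well as
$$\alpha^{2k_2}\Big(v\Big(0, \frac{2\varepsilon}{b}\Big) - v\Big(u_2, \frac{\varepsilon}{b}\Big)\Big) > 2f\varepsilon_3\sum_{i=1}^{k_2}\alpha^{2i}. \tag{7.27}$$
We set $\varepsilon_2 > 0$ such that
$$\big(\tau(0,\varepsilon_2) - \tau(0,0)\big)(k_2+2)\,\tau(0,\varepsilon) \le \frac{c_\varepsilon}{8}, \tag{7.28}$$
and take $u_1$ such that $0 < u_1 < \rho(\varepsilon_2)$,
$$v(u_1, \varepsilon_2) \le \alpha\sqrt{\frac{2df}{1-\alpha^2}} + u_2, \tag{7.29}$$
and
$$\big(\tau(0,\varepsilon_2) - \tau(u_1,0)\big)(k_2+2)\,\tau(0,\varepsilon) \le \frac{c_\varepsilon}{4}. \tag{7.30}$$
Now we set
$$k_1 = \min\Big\{k \ge r_1 : \alpha^{2k}V_*^2 \le \Big(\frac{u_1}{2}\Big)^2\Big\} \tag{7.31}$$
and finally take $\varepsilon_1 > 0$ such that
$$2f(d+\varepsilon_1)\sum_{i=1}^{k_1}\alpha^{2(k_1-i+1)} \le \Big(\alpha\sqrt{\frac{2df}{1-\alpha^2}} + \frac{u_1}{2}\Big)^2. \tag{7.32}$$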
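The integers $k_2$ and $k_1$ are defined in (7.25) and (7.31) as first indices at which a geometrically decaying quantity drops below a threshold. A Python sketch of these two minimisations; the numerical inputs in the usage line are arbitrary placeholders, since $v(u_2,\varepsilon)$, $u_1$, $u_4$, $V_*$ and $r_1$ come from earlier choices in the paper.

```python
def first_index(predicate, start):
    """Smallest integer k >= start with predicate(k) true (assumes one exists)."""
    k = start
    while not predicate(k):
        k += 1
    return k


def choose_k2(alpha, v_u2_eps, u4):
    """(7.25): first k2 >= 1 with alpha**k2 * v(u2, eps) <= u4 / 2.

    Starting the search at 1 reflects k2 > 0 in Lemma 7.1.
    """
    return first_index(lambda k: alpha ** k * v_u2_eps <= u4 / 2, start=1)


def choose_k1(alpha, v_star_cap, u1, r1):
    """(7.31): k1 = min{k >= r1 : alpha**(2k) * V_*^2 <= (u1 / 2)**2}."""
    return first_index(
        lambda k: alpha ** (2 * k) * v_star_cap ** 2 <= (u1 / 2) ** 2, start=r1
    )


print(choose_k2(0.5, 8.0, 0.5), choose_k1(0.5, 4.0, 0.5, 2))  # -> 5 4
```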
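Next we verify that with the above choices of $u_i$, $\varepsilon_i$, $i = 1, \dots, 4$, we have
$$\inf_U (t_{k_1+k_2+3} - t_{k_1+1}) - \sup_L (t_{k_1+k_2+3} - t_{k_1+1}) \ge \frac{1-\alpha}{2}\,\phi, \tag{7.33}$$
where $U = [v_0(0) \le V_*,\ \xi_i \le \kappa_i,\ 1 \le i \le k_1+k_2+3,\ \xi_{k_1+2} \ge d + 2\varepsilon/b]$ and $L = [v_0(0) \le V_*,\ \xi_i \le \kappa_i,\ 1 \le i \le k_1+k_2+3,\ \xi_{k_1+2} \le d + \varepsilon/b]$. Indeed, let us observe that
$$\begin{aligned} &\inf_U (t_{k_1+k_2+3} - t_{k_1+1}) - \sup_L (t_{k_1+k_2+3} - t_{k_1+1})\\ &\quad\ge \inf_U\big(\tau_{k_1+2} + [t_{k_1+k_2+2} - t_{k_1+2}] + \tau(u_4,0)\big) - \sup_L\big(\tau_{k_1+2} + [t_{k_1+k_2+2} - t_{k_1+2}] + \tau(0,\varepsilon_4)\big)\\ &\quad\ge \inf_U\Big(\tau_{k_1+2} - \frac1f v_0(t_{k_1+2}) + \frac{1-\alpha}{\alpha f}\sum_{i=1}^{k_2-1} v_0(t_{k_1+2+i}) + \frac{1}{\alpha f} v_0(t_{k_1+k_2+2})\Big) + \tau(u_4,0)\\ &\qquad - \sup_L\Big(\tau_{k_1+2} - \frac1f v_0(t_{k_1+2}) + \frac{1-\alpha}{\alpha f}\sum_{i=1}^{k_2-1} v_0(t_{k_1+2+i}) + \frac{1}{\alpha f} v_0(t_{k_1+k_2+2}) + \tau(0,\varepsilon_4)\Big), \end{aligned} \tag{7.34}$$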
where we have used Eq. (3.6) and remarks (i) and (ii) just preceding Lemma 5.6, implying that under the given conditions there are no collisions with moving particles in the time interval $[t_{k_1}, t_{k_1+k_2+3}]$, and the fact that on $L$ or $U$ we have
$$v_0(t_{k_1+k_2+2}) \in \Big(\alpha\sqrt{\frac{2df}{1-\alpha^2}},\ \alpha\sqrt{\frac{2df}{1-\alpha^2}} + u_4\Big],$$
as follows from Eqs. (7.25) and (7.26). Moreover, from Eq. (7.24) we have $\tau(u_4,0) - \tau(0,\varepsilon_4) \ge -\frac{1-\alpha}{4}\phi$, and Eqs. (3.5) and (7.27) imply that $\inf_U v_0(t_i) \ge \sup_L v_0(t_i)$ for $k_1+2 \le i \le k_1+k_2+2$. On the other hand, due to the absence of recollisions during the time interval under consideration, we have
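$$\begin{aligned} &\inf_U\Big(\tau_{k_1+2} - \frac1f v_0(t_{k_1+2})\Big) - \sup_L\Big(\tau_{k_1+2} - \frac1f v_0(t_{k_1+2})\Big)\\ &\quad= \inf_U\Big((1-\alpha)\tau_{k_1+2} - \frac{\alpha}{f} v_0(t_{k_1+1})\Big) - \sup_L\Big((1-\alpha)\tau_{k_1+2} - \frac{\alpha}{f} v_0(t_{k_1+1})\Big)\\ &\quad\ge (1-\alpha)\Big(\tau\Big(u_2, \frac{2\varepsilon}{b}\Big) - \tau\Big(0, \frac{\varepsilon}{b}\Big)\Big) - \frac{1-\alpha}{4}\,\phi \ \ge\ \frac34(1-\alpha)\,\phi \end{aligned} \tag{7.35}$$
by (7.15) and (7.16). From (7.35) and the previous considerations, (7.33) follows. We now use the following trivial fact: if $Y$ is a real random variable for which there are constants $a$, $a'$ with $a' - a \ge \gamma > 0$, $P(Y \le a) \ge p_1$ and $P(Y \ge a') \ge p_2$, then
$$\operatorname{Var}(Y) \ge \frac12\gamma^2 p_1 p_2. \tag{7.36}$$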
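The “trivial fact” (7.36) can be checked exactly on discrete distributions. Writing $\operatorname{Var}(Y) = \tfrac12 E(Y - Y')^2$ with $Y'$ an independent copy, the disjoint events $\{Y \le a, Y' \ge a'\}$ and $\{Y \ge a', Y' \le a\}$ even give $\operatorname{Var}(Y) \ge \gamma^2 p_1 p_2$, so the factor $\tfrac12$ in (7.36) leaves room. The Python sketch computes both bounds (using the actual probabilities for $p_1$, $p_2$); the two-point example attains the sharper one.

```python
from fractions import Fraction


def variance(dist):
    """Variance of a finitely supported law given as {value: probability}."""
    mean = sum(p * y for y, p in dist.items())
    return sum(p * (y - mean) ** 2 for y, p in dist.items())


def bounds_736(dist, a, a_prime):
    """(gamma^2 p1 p2 / 2, gamma^2 p1 p2) with gamma = a' - a, p1 = P(Y <= a),
    p2 = P(Y >= a'); the first entry is the bound stated in (7.36)."""
    gamma = a_prime - a
    p1 = sum(p for y, p in dist.items() if y <= a)
    p2 = sum(p for y, p in dist.items() if y >= a_prime)
    sharp = gamma ** 2 * p1 * p2
    return sharp / 2, sharp


two_point = {0: Fraction(1, 3), 3: Fraction(2, 3)}
spread = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 5: Fraction(1, 4)}
print(variance(two_point), bounds_736(two_point, 0, 3))
print(variance(spread), bounds_736(spread, 1, 2))
```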
From (7.36), (7.33), (7.18) and (7.19), statement (7.2) of Lemma 7.1 follows at once. To prove (7.3), we first notice that, since each $\varepsilon_i \le \varepsilon$, on $E \cap [v_0(0) \le V_*]$ we have
$$t_{k_1+k_2+3} - t_{k_1+1} \le (k_2+2)\,\tau(0,\varepsilon) \quad\text{and}\quad \tau(u_1,0) \le \tau_{k_1+1} \le \tau(0,\varepsilon_2). \tag{7.37}$$
Due to the choice of $\varepsilon_2$ and $u_1$, cf. Eqs. (7.28) and (7.30), we immediately get (7.3). Using (7.22) and analogous estimates we get (7.4). Statement (b) of the lemma follows at once from the choice of $u_2$, $u_4$ and $u_5$ (according to (7.29), (7.25), (7.26) and (7.23)).

Acknowledgements. The authors would like to thank E. Presutti for many helpful discussions. Partial support by the CNPq-CNR agreement and FINEP (Pronex) is gratefully acknowledged. The project has been partially supported by FAPERJ grant E-26/150.940/99 and by CNPq (Brazil).
References
1. Arnold, V. and Avez, A. (1967): Problèmes ergodiques de la mécanique classique. Paris: Gauthier-Villars
2. Boldrighini, C. (1986): Bernoulli property for a one-dimensional system with localized interaction. Commun. Math. Phys. 103, 499–514
3. Boldrighini, C., De Masi, A., Nogueira, A. and Presutti, E. (1985): The dynamics of a particle interacting with a semi-infinite ideal gas is a Bernoulli flow. In: Statistical physics and dynamical systems: Rigorous results. Fritz, J., Jaffe, A. and Szasz, D. (eds). Progress in Physics 10, Boston–Basel: Birkhäuser, pp. 153–189
4. Boldrighini, C., Pellegrinotti, A., Presutti, E., Sinai, Ya. and Soloveitchik, M. (1985): Ergodic properties of a semi-infinite one-dimensional system of statistical mechanics. Commun. Math. Phys. 101, 363–382
5. Boldrighini, C., Cosimi, G., Frigio, S. and Nogueira, A. (1989): Convergence to a stationary state and diffusion for a charged particle in a standing medium. Probab. Theory Relat. Fields 80, 481–500
6. Boldrighini, C. and Soloveitchik, M. (1995): Drift and diffusion for a mechanical system. Probab. Theory Relat. Fields 103, 349–379
7. Dembo, A. and Zeitouni, O. (1993): Large deviations techniques and applications. Boston: Jones and Bartlett Publishers, Inc.
8. Karlin, S. and Taylor, H. (1975): A First Course in Stochastic Processes. 2nd edition, New York: Academic Press
9. Landau, L. and Lifschitz, E. (1959): Statistical physics. London–Paris: Pergamon Press
10. Lindvall, T. (1992): Lectures on the Coupling Method. New York: Wiley
11. Piasecki, J. (1983): Approach to field-induced stationary state in a gas of hard rods. J. Stat. Phys. 30, 201–209
12. Presutti, E., Sinai, Ya. and Soloveitchik, M. (1985): Hyperbolicity and Møller morphism for a model of classical statistical mechanics. In: Statistical physics and dynamical systems: Rigorous results. Fritz, J., Jaffe, A. and Szasz, D. (eds). Progress in Physics 10, Basel–Boston: Birkhäuser, pp. 253–284
13. Pellegrinotti, A., Sidoravicius, V. and Vares, M.E. (1999): Stationary state and diffusion for a charged particle in a one dimensional medium with lifetimes. SIAM Probab. Theory Appl. 44(4), 796–825
14. Sidoravicius, V., Triolo, L. and Vares, M.E. (1998): On the forced motion of a heavy particle in a random medium I. Existence of dynamics. Markov Proc. Rel. Fields 4(4), 629–648
15. Sinai, Ya.G. (1970): Dynamical systems with elastic reflections. Russ. Math. Surv. 25, 137–189
16. Sinai, Ya.G. and Soloveitchik, M. (1986): One dimensional Classical Massive Particle in the ideal Gas. Commun. Math. Phys. 104, 423–443
17. Spohn, H. (1991): Large scale Dynamics of Interacting Particles. Texts and Monographs in Physics, Berlin: Springer

Communicated by Ya. G. Sinai
Commun. Math. Phys. 219, 357 – 398 (2001)
Communications in Mathematical Physics
© Springer-Verlag 2001
Application of the τ-Function Theory of Painlevé Equations to Random Matrices: PIV, PII and the GUE

P. J. Forrester¹, N. S. Witte¹,²

¹ Department of Mathematics and Statistics, University of Melbourne, Victoria 3010, Australia
² School of Physics, University of Melbourne, Victoria 3010, Australia
Received: 27 June 2000 / Accepted: 8 December 2000
Abstract: Tracy and Widom have evaluated the cumulative distribution of the largest eigenvalue for the finite and scaled infinite GUE in terms of a PIV and PII transcendent, respectively. We generalise these results to the evaluation of $\tilde E_N(\lambda; a) := \big\langle \prod_{l=1}^N \chi^{(l)}_{(-\infty,\lambda]}(\lambda - \lambda_l)^a \big\rangle$, where $\chi^{(l)}_{(-\infty,\lambda]} = 1$ for $\lambda_l \in (-\infty, \lambda]$ and $\chi^{(l)}_{(-\infty,\lambda]} = 0$ otherwise, and the average is with respect to the joint eigenvalue distribution of the GUE, as well as to the evaluation of $F_N(\lambda; a) := \big\langle \prod_{l=1}^N (\lambda - \lambda_l)^a \big\rangle$. Of particular interest are $\tilde E_N(\lambda; 2)$ and $F_N(\lambda; 2)$, and their scaled limits, which give the distribution of the largest eigenvalue and the density, respectively. Our results are obtained by applying the Okamoto τ-function theory of PIV and PII, for which we give a self-contained presentation based on the recent work of Noumi and Yamada. We point out that the same approach can be used to study the quantities $\tilde E_N(\lambda; a)$ and $F_N(\lambda; a)$ for the other classical matrix ensembles.

Contents
1. Introduction and Summary
2. τ-Function Theory for PIV
   2.1 Affine Weyl group symmetry
   2.2 Toda lattice equation
   2.3 Classical solutions
   2.4 Bäcklund transformations and discrete Painlevé systems
3. τ-Function Theory for PII
   3.1 Affine Weyl group symmetry
   3.2 Toda lattice equation
   3.3 Classical solutions
   3.4 Bäcklund transformations and discrete dPI
   3.5 Coalescence from PIV
4. Application to Finite GUE Matrices
   4.1 Calculation of $E_N(0; (s, \infty))$ and $\tilde E_N(s; a)$
   4.2 Calculation of $F_N(s; a)$
   4.3 $U_N(t; a)$ and $V_N(t; a)$ as Painlevé transcendents
5. Edge Scaling in the GUE
   5.1 Calculation of $E^{\rm soft}(s)$ and $\tilde E^{\rm soft}(s; a)$
   5.2 Calculation of $F^{\rm soft}(\lambda; a)$
6. Conclusions – A Programme
References
1. Introduction and Summary

Hermitian random matrices $X$ with a unitary symmetry are defined so that the joint distribution of the independent elements $P(X)$ is unchanged by the similarity transformation $X \mapsto U^\dagger X U$ for $U$ unitary. For example, an ensemble of matrices with $P(X) := \exp\big(\sum_{j=0}^\infty \alpha_j \operatorname{Tr}(X^j)\big) =: \prod_{j=1}^N g(\lambda_j)$, for general $g(x) \ge 0$, possesses a unitary symmetry. Such ensembles have the property that the corresponding eigenvalue probability density function $p(\lambda_1, \dots, \lambda_N)$ is given by the explicit functional form
$$p(\lambda_1, \dots, \lambda_N) = \frac1C \prod_{l=1}^N g(\lambda_l) \prod_{1\le j<k\le N} |\lambda_k - \lambda_j|^2, \tag{1.1}$$
… 0), Laguerre; $g(x) = (1-x)^a(1+x)^b$ $(-1 < x < 1)$, Jacobi; $g(x) = (1+x^2)^{-\alpha}$, Cauchy. (1.6)

To summarise the correspondence, which applies to any of the cases, we set out Table 1. The quantity $\tilde E_N(s; a)$ in Table 1 is specified by
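$$\tilde E_N(\lambda; a) := \Big\langle \prod_{l=1}^N \chi^{(l)}_{(-\infty,\lambda]}(\lambda - \lambda_l)^a \Big\rangle, \tag{1.7}$$
where $\chi^{(l)}_{(-\infty,\lambda]} = 1$ for $\lambda_l \in (-\infty, \lambda]$ and $\chi^{(l)}_{(-\infty,\lambda]} = 0$ otherwise, and the average is with respect to the eigenvalue probability density function (1.1). For general $a$ we obtain the evaluation
$$\tilde E_N(s; a) = \tilde E_N(s_0; a)\exp\Big(\int_{s_0}^s U_N(t; a)\,dt\Big) \tag{1.8}$$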
(Eq. (4.14) with the substitution (4.10)), where $U_N(t; a)$ satisfies the nonlinear equation
$$(U_N'')^2 - 4(tU_N' - U_N)^2 + 4U_N'(U_N' - 2a)(U_N' + 2N) = 0 \tag{1.9}$$
(Eq. (4.15)), subject to the boundary condition
$$U_N(t; a) \underset{t\to-\infty}{\sim} -2Nt - \frac{N(a+N)}{t} + O\Big(\frac{1}{t^3}\Big) \tag{1.10}$$
(Eq. (4.18)). For $(N+1)\times(N+1)$ dimensional GUE matrices, $p_{\max}(s)$ is proportional to $e^{-s^2}\tilde E_N(s; 2)$. We therefore have
$$p_{\max}(s)\Big|_{N\to N+1} = p_{\max}(s_0)\Big|_{N\to N+1}\exp\Big(\int_{s_0}^s [-2t + U_N(t; 2)]\,dt\Big) \tag{1.11}$$
(Eq. (4.20)). The quantity $F_N(s; a)$ in Table 1 is specified by
$$F_N(\lambda; a) := \Big\langle \prod_{l=1}^N (\lambda - \lambda_l)^a \Big\rangle \tag{1.12}$$
(Eq. (1.7)). For general positive integers $a$, (1.12) has been computed by Brézin and Hikami in terms of the determinant of an $a \times a$ matrix involving Hermite polynomials. Note that for $a$ not equal to a positive integer, (1.12) is well defined provided $\lambda$ has a non-zero imaginary part. For general $a$ we obtain the evaluation
$$F_N(\lambda; a) = F_N(\lambda_0; a)\exp\Big(\int_{\lambda_0}^\lambda V_N(t; a)\,dt\Big) \tag{1.13}$$
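(Eq. (4.33)), where $V_N(t; a)$ also satisfies the nonlinear equation (4.15), but now with the boundary conditions
$$V_N(t; a) \underset{t\to\pm\infty}{\sim} \chi\,\frac{Na}{t}\big(1 + O(1/t)\big) \tag{1.14}$$
(Eq. (4.35)), where $\chi = 1$ for $t \to \infty$ and $|\chi| = 1$ for $t \to -\infty$. In the case $a = 2$ this average is proportional to the polynomial part of the eigenvalue density for $(N+1)\times(N+1)$ dimensional GUE matrices, which in terms of the Hermite polynomials $H_N(\lambda)$ is proportional to each of the $2\times2$ determinants termed Turánians [17]
$$\begin{vmatrix} H_N(\lambda) & H_{N+1}(\lambda)\\ H_N'(\lambda) & H_{N+1}'(\lambda) \end{vmatrix}, \quad \begin{vmatrix} H_{N+1}(\lambda) & H_{N+1}'(\lambda)\\ H_{N+1}'(\lambda) & H_{N+1}''(\lambda) \end{vmatrix}, \quad \begin{vmatrix} H_N(\lambda) & H_{N+1}(\lambda)\\ H_{N+1}(\lambda) & H_{N+2}(\lambda) \end{vmatrix} \tag{1.15}$$
(which are of course proportional to each other). The result (4.33) with $a = 2$ implies
$$\rho(\lambda)\Big|_{N\to N+1} = \rho(\lambda_0)\Big|_{N\to N+1}\exp\Big(\int_{\lambda_0}^\lambda [-2t + V_N(t; 2)]\,dt\Big) \tag{1.16}$$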
(Eq. (4.36)). In Sect. 2 we review the τ-function theory of the Painlevé IV equation, revising relevant aspects of the work of Okamoto [24, 26], Noumi and Yamada [21, 22] and Kajiwara et al. [16]. The culmination of this theory from our perspective is the derivation of determinant formula expressions for the τ-function corresponding to special values of the parameters in the Painlevé IV equation. On the other hand, it follows easily from the definitions that $\tilde E_N$ and $F_N$ can be written as determinants. These are presented in Sect. 4. The determinant formulas in fact coincide precisely with those occurring in Sect. 2, and consequently we can characterise both $\tilde E_N$ and $F_N$ in terms of solutions of the nonlinear equation (4.15). The theory presented in Sect. 2 also allows $\tilde E_N$, $F_N$ to be
characterised as solutions of a certain fourth order difference equation (Eq. (2.82)), and $U_N$, $V_N$ as solutions of a particular third order difference equation (4.19).

Also of interest is the scaling limit of (1.7) and (1.12) with $\lambda \mapsto \sqrt{2N} + \lambda/(\sqrt2 N^{1/6})$. This choice of coordinate corresponds to shifting the origin to the edge of the leading order support of the eigenvalue density, then scaling the coordinate so as to make the spacings of order unity as $N \to \infty$. We find that the scaled quantities can be expressed in terms of particular solutions of the general Jimbo–Miwa–Okamoto σ form of the Painlevé II equation
$$(u'')^2 + 4u'\big((u')^2 - su' + u\big) - a^2 = 0 \tag{1.17}$$
(Eq. (5.10)). Specifically, as already known from [36],
$$E^{\rm soft}(s) := \lim_{N\to\infty} E_N\Big(0; \Big(\sqrt{2N} + \frac{s}{\sqrt2 N^{1/6}}, \infty\Big)\Big) = \exp\Big(-\int_s^\infty r(t)\,dt\Big), \tag{1.18}$$
where $r(s)$ satisfies (5.10) with $a = 0$. Also
$$\tilde E^{\rm soft}(s; a) := \lim_{\substack{N\to\infty\\ s\mapsto\sqrt{2N}+s/(\sqrt2 N^{1/6})}} C e^{-as^2/2}\,\tilde E_N(s; a) = \tilde E^{\rm soft}(s_0; a)\exp\Big(\int_{s_0}^s u(t; a)\,dt\Big), \tag{1.19}$$
where $u(s; a)$ satisfies (5.10) subject to the boundary condition
$$u(s; a) \underset{s\to-\infty}{\sim} \frac{s^2}{4} + \frac{4a^2-1}{8s} + \frac{(4a^2-1)(4a^2-9)}{64s^4} + \cdots \tag{1.20}$$
(Eq. (5.11)). In the case $a = 2$, Eq. (5.7) gives the formula
$$p_{\max}^{\rm soft}(s) = p_{\max}^{\rm soft}(s_0)\exp\Big(\int_{s_0}^s u(t; 2)\,dt\Big) \tag{1.21}$$
(Eq. (5.20)) for the scaled distribution of the largest eigenvalue in the GUE. Analogous to the formula (5.7), for the scaled limit of $F_N(\lambda; a)$ we have
$$F^{\rm soft}(\lambda; a) := \lim_{\substack{N\to\infty\\ \lambda\mapsto\sqrt{2N}+\lambda/(\sqrt2 N^{1/6})}} C e^{-a\lambda^2/2} F_N(\lambda; a) = F^{\rm soft}(\lambda_0; a)\exp\Big(\int_{\lambda_0}^\lambda v(t; a)\,dt\Big) \tag{1.22}$$
(Eq. (5.31)), where $v(s; a)$, like $u(s; a)$, satisfies (5.10). The difference between $u$ and $v$ is in the boundary condition; for the latter we require
$$v(t; a) \underset{t\to\infty}{\sim} -a t^{1/2} - \frac{a^2}{4t} + \frac{a(4a^2+1)}{32t^{5/2}} \tag{1.23}$$
(Eq. (5.35)). The case $a = 2$ corresponds to the scaled eigenvalue density at the spectrum edge, which has the known evaluation [10]
$$\rho^{\rm soft}(s) = -\begin{vmatrix} \mathrm{Ai}(s) & \mathrm{Ai}'(s)\\ \mathrm{Ai}'(s) & \mathrm{Ai}''(s) \end{vmatrix}, \tag{1.24}$$
where Ai(s) denotes the Airy function. In fact, for all $a \in \mathbb Z_{\ge0}$ we have the determinantal form
$$F^{\rm soft}(\lambda; a) = (-1)^{a(a-1)/2}\det\Big[\frac{d^{j+k}}{d\lambda^{j+k}}\mathrm{Ai}(\lambda)\Big]_{j,k=0,\dots,a-1} \tag{1.25}$$
(Eq. (5.33)).

In Sect. 3 we present the τ-function theory of the Painlevé II equation in an analogous fashion to the theory presented in Sect. 2 for the Painlevé IV equation. In particular we derive the second order second degree equation satisfied by the Hamiltonian (which is known from [14] and [26]) as well as a fourth order difference equation satisfied by the τ-functions. Also derived is the fact that the right-hand side of (5.33) corresponds to a τ-function sequence in the PII theory, which is a result of Okamoto [26]. In Sect. 5 the results (5.3), (5.7), (5.31) and (5.33) are derived from a limiting process applied to the corresponding finite $N$ results. A programme for further study is outlined in Sect. 6.

2. τ-Function Theory for PIV

2.1. Affine Weyl group symmetry. It has been demonstrated in the works of Okamoto [26] (in a series of papers treating all the Painlevé equations), Noumi and Yamada [21] (see also their works [20, 23, 22]) and the earlier work of Adler [2] that the fourth Painlevé equation
$$y'' = \frac{1}{2y}(y')^2 + \frac32 y^3 + 4ty^2 + 2(t^2 - \alpha)y + \frac{\beta}{y} \tag{2.1}$$
can be recast in a way which reveals its symmetries in a particularly manifest and transparent form.

Proposition 1 ([21, 22]). The fourth Painlevé equation is equivalent to the coupled set of autonomous differential equations (where $' = d/dt$)
$$f_0' = f_0(f_1 - f_2) + 2\alpha_0, \qquad f_1' = f_1(f_2 - f_0) + 2\alpha_1, \qquad f_2' = f_2(f_0 - f_1) + 2\alpha_2, \tag{2.2}$$
with $y = -f_1$, where the parameters $\alpha_j \in \mathbb R$ with $\alpha_0 + \alpha_1 + \alpha_2 = 1$ are related by
$$\alpha = \alpha_0 - \alpha_2, \qquad \beta = -2\alpha_1^2, \tag{2.3}$$
and the constraint is conventionally taken as
$$f_0 + f_1 + f_2 = 2t. \tag{2.4}$$

Proof. Equation (2.4) reduces the three first order equations of (2.2) to two. Eliminating a further variable by introducing a second derivative shows that $y = -f_1$ satisfies the PIV equation. The form of these equations implies
$$(f_0 + f_1 + f_2)' = 2\alpha_0 + 2\alpha_1 + 2\alpha_2 = k, \qquad k \ne 0 \text{ constant}, \tag{2.5}$$
thus permitting the normalization given above.
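The elimination in the proof of Proposition 1 can be checked pointwise: fix $(f_0, f_1, t)$, let (2.4) determine $f_2$, take $y'$ and $y''$ from (2.2), and evaluate the residual of (2.1) with the parameters (2.3). A Python sketch using exact rational arithmetic; the sample point and parameters are arbitrary choices.

```python
from fractions import Fraction


def symmetric_rhs(f, alpha):
    """Right-hand sides of (2.2) for f = (f0, f1, f2), alpha = (alpha0, alpha1, alpha2)."""
    f0, f1, f2 = f
    a0, a1, a2 = alpha
    return (f0 * (f1 - f2) + 2 * a0,
            f1 * (f2 - f0) + 2 * a1,
            f2 * (f0 - f1) + 2 * a2)


def piv_residual(f0, f1, t, alpha):
    """y'' minus the right-hand side of (2.1) for y = -f1, at one point of phase space."""
    a0, a1, a2 = alpha
    f2 = 2 * t - f0 - f1                      # constraint (2.4)
    d0, d1, d2 = symmetric_rhs((f0, f1, f2), alpha)
    dd1 = d1 * (f2 - f0) + f1 * (d2 - d0)     # derivative of the f1 equation in (2.2)
    y, yp, ypp = -f1, -d1, -dd1
    A, B = a0 - a2, -2 * a1 ** 2              # parameters (2.3)
    rhs = (yp ** 2 / (2 * y) + Fraction(3, 2) * y ** 3 + 4 * t * y ** 2
           + 2 * (t ** 2 - A) * y + B / y)
    return ypp - rhs


alpha = (Fraction(1, 5), Fraction(1, 2), Fraction(3, 10))   # alpha0 + alpha1 + alpha2 = 1
print(piv_residual(Fraction(3, 7), Fraction(-2, 3), Fraction(5, 4), alpha))  # -> 0
```

The sum of the three right-hand sides is $2(\alpha_0+\alpha_1+\alpha_2)$ identically, which is the content of (2.5).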
Fig. 1. Parameter space for $(\alpha_0, \alpha_1, \alpha_2)$ associated with the simple roots of the root system $A_2^{(1)}$ [figure: triangular lattice of lines $\alpha_0 = 0, 1, 2, 3$, $\alpha_1 = -1, 0, 1, 2$ and $\alpha_2 = -1, 0, 1, 2$, with the translations $T_2$, $T_3$ indicated]
Note. Many differing conventions are in use for such a description of the PIV system; for example, we have written $2\alpha_j$ ($j = 0, 1, 2$) in place of the $\alpha_j$ used in [21, 23, 22] in order to eliminate unnecessary factors of two appearing in the ensuing theory.

The hyperplane $\alpha_0 + \alpha_1 + \alpha_2 = 1$ in parameter space $(\alpha_0, \alpha_1, \alpha_2) \in \mathbb R^3$ is associated with the simple roots $\alpha_0, \alpha_1, \alpha_2$ spanning the root system of type $A_2^{(1)}$. From this perspective the parameters $\alpha_0$, $\alpha_1$ and $\alpha_2$ define a triangular lattice in the plane (see Fig. 1). Let the fundamental reflections $s_i$ ($i = 0, 1, 2$) represent the automorphism of the lattice specified by a reflection with respect to the line $\alpha_i = 0$. Their action on the simple roots is given by
$$s_i(\alpha_j) = \alpha_j - \alpha_i a_{ij}, \tag{2.6}$$
where $a_{ij}$ are the elements of the Cartan matrix
$$A = \begin{pmatrix} 2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2 \end{pmatrix}. \tag{2.7}$$
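The reflections (2.6) are determined by the Cartan matrix (2.7). A short Python sketch, acting on parameter triples with exact rationals, checks that each $s_i$ reproduces its row of Table 2, is an involution, preserves $\alpha_0+\alpha_1+\alpha_2$, and satisfies the relations $(s_i s_j)^3 = 1$ and $s_i s_j s_i = s_j s_i s_j$ listed in (2.9). The helper names are ours.

```python
from fractions import Fraction

CARTAN = ((2, -1, -1),
          (-1, 2, -1),
          (-1, -1, 2))   # Cartan matrix (2.7)


def reflect(i, alpha):
    """s_i of (2.6) on a parameter triple: alpha_j -> alpha_j - alpha_i * a_ij."""
    return tuple(alpha[j] - alpha[i] * CARTAN[i][j] for j in range(3))


def apply_word(word, alpha):
    """Apply s_{word[0]} first, then s_{word[1]}, and so on."""
    for i in word:
        alpha = reflect(i, alpha)
    return alpha


a = (Fraction(1, 5), Fraction(1, 2), Fraction(3, 10))
# Row s_0 of Table 2: (-alpha_0, alpha_1 + alpha_0, alpha_2 + alpha_0)
print(reflect(0, a))
```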
Let π represent the lattice automorphism corresponding to a rotation by 120° around the barycentre of the fundamental alcove $C$ defined by $\alpha_i > 0$ ($i = 0, 1, 2$). Then
$$\pi(\alpha_j) = \alpha_{j+1}, \qquad j \in \mathbb Z/3\mathbb Z. \tag{2.8}$$
The operators π, $s_i$ obey the algebra
$$s_j^2 = 1, \quad (s_j s_{j+1})^3 = 1, \quad s_j s_{j\pm1} s_j = s_{j\pm1} s_j s_{j\pm1}, \quad \pi^3 = 1, \quad \pi s_j = s_{j+1}\pi, \tag{2.9}$$
and generate $\tilde W = \langle \pi, s_0, s_1, s_2 \rangle$, defining an extension of the affine Weyl group associated with the $A_2^{(1)}$ root system.

Proposition 2 ([20, 23]). The Bäcklund transformations of the PIV system are given by the actions of the extended affine Weyl group $\tilde W$ on the parameters as specified by (2.6) and (2.8), and on the functions as specified by
$$s_i(f_j) = f_j + \frac{2\alpha_i u_{ij}}{f_i}, \qquad \pi(f_j) = f_{j+1} \qquad (i, j = 0, 1, 2), \tag{2.10}$$
where the $u_{ij}$ are the elements of the orientation matrix
$$U = \begin{pmatrix} 0 & 1 & -1\\ -1 & 0 & 1\\ 1 & -1 & 0 \end{pmatrix}, \tag{2.11}$$
associated with the boundary of the fundamental alcove [21].

Proof. Let $V$ denote one of π, $s_0$, $s_1$, $s_2$ and let $\beta_i := V(\alpha_i)$. Using (2.6) and (2.8) it is a simple exercise to verify explicitly that $g_i := V(f_i)$ of the form (2.10) satisfy the structurally identical equations
$$g_0' = g_0(g_1 - g_2) + 2\beta_0, \qquad g_1' = g_1(g_2 - g_0) + 2\beta_1, \qquad g_2' = g_2(g_0 - g_1) + 2\beta_2, \tag{2.12}$$
thus giving rise to the stated Bäcklund transformation.
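The “simple exercise” in the proof of Proposition 2 can also be checked at a single point: take any $(f, \alpha)$, compute $f'$ from (2.2), transform with (2.10) and (2.6) or (2.8), obtain $g'$ by the chain rule, and compare with the right-hand side of (2.12). A Python sketch with exact rationals; the point is arbitrary and the helper names are ours.

```python
from fractions import Fraction

CARTAN = ((2, -1, -1), (-1, 2, -1), (-1, -1, 2))   # Cartan matrix (2.7)
ORIENT = ((0, 1, -1), (-1, 0, 1), (1, -1, 0))      # orientation matrix (2.11)


def rhs(f, alpha):
    """Right-hand sides f_j (f_{j+1} - f_{j+2}) + 2 alpha_j of (2.2), indices mod 3."""
    return tuple(f[j] * (f[(j + 1) % 3] - f[(j + 2) % 3]) + 2 * alpha[j] for j in range(3))


def s_action(i, f, df, alpha):
    """Image of (f, f', alpha) under s_i: (2.10) for f, chain rule for f', (2.6) for alpha."""
    g = tuple(f[j] + 2 * alpha[i] * ORIENT[i][j] / f[i] for j in range(3))
    dg = tuple(df[j] - 2 * alpha[i] * ORIENT[i][j] * df[i] / f[i] ** 2 for j in range(3))
    beta = tuple(alpha[j] - alpha[i] * CARTAN[i][j] for j in range(3))
    return g, dg, beta


def _shift(x):
    return (x[1], x[2], x[0])


def pi_action(f, df, alpha):
    """Image under pi: f_j -> f_{j+1}, alpha_j -> alpha_{j+1}, see (2.8) and (2.10)."""
    return _shift(f), _shift(df), _shift(alpha)


f = (Fraction(2, 3), Fraction(-5, 4), Fraction(7, 2))
alpha = (Fraction(1, 5), Fraction(1, 2), Fraction(3, 10))
df = rhs(f, alpha)
for i in range(3):
    g, dg, beta = s_action(i, f, df, alpha)
    print(i, dg == rhs(g, beta))
```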
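Following [22, 16], in Tables 2 and 3 the actions (2.6), (2.8) and (2.10) are listed in tabular format.

Table 2. Action of the generators of the extended affine Weyl group associated with the root system $A_2^{(1)}$ on the simple roots
$$\begin{array}{c|ccc} & \alpha_0 & \alpha_1 & \alpha_2\\ \hline s_0 & -\alpha_0 & \alpha_1+\alpha_0 & \alpha_2+\alpha_0\\ s_1 & \alpha_0+\alpha_1 & -\alpha_1 & \alpha_2+\alpha_1\\ s_2 & \alpha_0+\alpha_2 & \alpha_1+\alpha_2 & -\alpha_2\\ \pi & \alpha_1 & \alpha_2 & \alpha_0\\ T_1 & \alpha_0+1 & \alpha_1-1 & \alpha_2\\ T_2 & \alpha_0 & \alpha_1+1 & \alpha_2-1\\ T_3 & \alpha_0-1 & \alpha_1 & \alpha_2+1 \end{array}$$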
From the earlier work of Okamoto it is known that the PIV system, as for all the Painlevé transcendents, admits a Hamiltonian formulation and that from this viewpoint the Bäcklund transformations are birational canonical transformations $\{q, p; H\} \to \{\tilde q, \tilde p; \tilde H\}$.
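Table 3. Bäcklund transformations for the PIV system
$$\begin{array}{c|ccc} & f_0 & f_1 & f_2\\ \hline s_0 & f_0 & f_1 + \dfrac{2\alpha_0}{f_0} & f_2 - \dfrac{2\alpha_0}{f_0}\\ s_1 & f_0 - \dfrac{2\alpha_1}{f_1} & f_1 & f_2 + \dfrac{2\alpha_1}{f_1}\\ s_2 & f_0 + \dfrac{2\alpha_2}{f_2} & f_1 - \dfrac{2\alpha_2}{f_2} & f_2\\ \pi & f_1 & f_2 & f_0 \end{array}$$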
Proposition 3 ([26, 16]). The PIV dynamical system is a Hamiltonian system $\{q, p; H\}$ with the Hamiltonian
$$H = (2p - q - 2t)pq - 2\alpha_1 p - \alpha_2 q = \tfrac12 f_0 f_1 f_2 + \alpha_2 f_1 - \alpha_1 f_2, \tag{2.13}$$
and canonical variables $q$, $p$ given by
$$-f_1 = q, \qquad f_2 = 2p. \tag{2.14}$$
Proof. With H specified by (2.13), Hamilton’s equations of motion read q =
∂H = q(4p − q − 2t) − 2α1 , ∂p
p = −
∂H = p(2q − 2p + 2t) + α2 . (2.15) ∂q
Substituting for p and q according to (2.14) shows that these equations are identical to the final two equations in (2.2). Note. Because −f1 satisfies the PIV equation (2.1), it follows immediately from the first equation in (2.14) that q satisfies the PIV equation (2.1). Furthermore, use of the first equation in (2.15) shows p=
1 (q + q 2 + 2tq + 2α1 ), 4q
(2.16)
so H is completely specified in terms of the Painlevé IV transcendent (2.1) with parameters (2.3). There is a degree of ambiguity in constructing a Hamiltonian in that arbitrary functions of time can be added, and in fact there is a more symmetrical form HS = 1/2f0 f1 f2 + 1/3(α1 − α2 )f0 + 1/3(α1 + 2α2 )f1 − 1/3(2α1 + α2 )f2 ,
(2.17)
which is central to the Okamoto theory (termed the auxiliary Hamiltonian). However for our purposes this complicates some later results so we prefer the unsymmetrical form. Furthermore, in the full theory of PIV [21, 22] the Hamiltonian H0 ≡ H is associated with two additional Hamiltonians H1 = π(H0 ), H2 = π 2 (H0 ) but these are not required in the random matrix context. It is also true that H (t) can be specified as the solution of a certain second order second degree equation.
366
P. J. Forrester, N. S. Witte
Proposition 4 ([26, 14]). The Hamiltonian (2.13) satisfies the second order second degree differential equation of the Jimbo–Miwa–Okamoto σ form for PIV, (H )2 − 4(tH − H )2 + 4H (H + 2α1 )(H − 2α2 ) = 0.
(2.18)
Proof. Making use of Hamilton’s equations(2.15), we have for H (t) = H (t; q(t), p(t)), H = f1 f2 ,
(2.19)
H = f1 f2 (f2 − f1 ) + 2α2 f1 + 2α1 f2 .
(2.20)
Use of (2.13) and (2.19) in (2.20) shows −1/2H + tH − H , H − 2α2 1/ H + tH − H f2 = 2 . H + 2α1 f1 =
(2.21)
Substituting (2.21) in (2.13) gives the desired equation (2.18).
For future reference we note that use of Tables 2, 3 shows that under the action of , H transforms according to the generators of W 2α0 , f0 s1 (H ) = H + 2α1 t, s2 (H ) = H − 2α2 t, π(H ) = H + f2 − 2α2 t.
s0 (H ) = H +
(2.22)
2.2. Toda lattice equation. The τ -function τ = τ (t) is defined in terms of the Hamiltonian H (t) by H (t) =
d log τ (t). dt
(2.23)
It is possible to derive a Toda lattice equation for the sequences of τ -functions {τk [n]}n=0,1,... (k = 1, 3) associated with the Hamiltonians (2.24) H α →α +n , H α →α +n , 0 0 0 0 α1 →α1 −n
α2 →α2 −n
respectively (the reason for the subscripts 1 and 3 on τ will become apparent subse which quently). An essential point is that there exist shift operators from the algebra W after n applications on H generate the shifts required by (2.24). There are in fact three fundamental shift operators [21] T1 := π s2 s1 , T2 := s1 π s2 , T3 := s2 s1 π corresponding to translations on the root lattice by the fundamental weights ω˜ j , j = 1, 2, 3 of the root (1) system A2 . As can be checked from Tables 2, 3 and (2.22) these operators have the property that T1 H = H α0 →α0 +1 , T3−1 H = H α0 →α0 +1 . (2.25) α1 →α1 −1
α2 →α2 −1
Application of the τ -Function Theory of PIV, PII to the GUE
367
Table 2 also shows that when acting on the parameters themselves, the same shifts occurring in the transformed Hamiltonian results, and thus T1 (α0 , α1 , α2 ) −1 T3 (α0 , α1 , α2 )
= (α0 + 1, α1 − 1, α2 ), = (α0 + 1, α1 , α2 − 1).
(2.26)
After a further n iterations the equations (2.25) can be written in the form T1n+1 H − T1n H = f(1)2 [n],
−(n+1)
T3
H − T3−n H = −f(3)1 [n],
(2.27)
where the subscripts (1) ((3)) refer to the system of Eqs. (2.2) with the parameters replaced as in the first (second) Hamiltonian (2.24) and use has been made of (2.13). We remark that the two results of (2.25) are inter-related. Thus consider the mapping ω defined by multiplication by −1 together with the replacements (α0 , α1 , α2 ) → (−α0 , −α2 , −α1 ),
(f0 , f1 , f2 ) → (−f0 , −f2 , −f1 ).
(2.28)
We see immediately that the system (2.2) is unchanged by ω, as is the Hamiltonian (2.13), while we can check from Table 2 that ωT1 ω = T3−1 .
(2.29)
Applying ω to the first equation of (2.25) using (2.28) and (2.29) gives the second equation. With the τ -functions τ1 [n] and τ3 [n] defined by T1n H =
d log τ1 [n], dt
T3−n H =
d log τ3 [n], dt
(2.30)
application of (2.29) shows ωτ1 [n] = Cτ3 [n].
(2.31)
In light of the relation (2.31), let us focus attention on the first equation of (2.25) only. Proposition 5 ([26, 16]). The τ -function sequence τ1 [n] corresponding to the parameter sequence (α0 + n, α1 − n, α2 ) obeys the Toda lattice equation d2 σ1 [n+1]σ1 [n−1] log σ1 [n] = , 2 dt σ12 [n]
(2.32)
where σ1 [n] := Cet
2 (α −n) 1
τ1 [n].
(2.33)
Proof. Following [26, 16] we make use of the first equation in (2.27) and consider the difference T1n+1 H − T1n H − T1n H − T1n−1 H = f(1)2 [n] − T1−1 f(1)2 [n]. (2.34) A crucial fact, which follows from Table 3 and (2.15), is that this difference is a total derivative d f(1)2 [n] − T1−1 f(1)2 [n] = (2.35) log f(1)1 [n]f(1)2 [n] + 2(α1 − n) . dt
368
P. J. Forrester, N. S. Witte
But it follows from (2.19) and (2.13) that f(1)1 [n]f(1)2 [n] + 2(α1 − n) =
2 d n d2 T1 H + 2t (α1 − n) = 2 log et (α1 −n) τ1 [n] , dt dt (2.36)
and hence the right-hand side of (2.35) is equal to 2 2 d d t (α1 −n) log log e τ1 [n] . dt dt 2
(2.37)
On the other hand (2.34) and (2.30) shows the left-hand side of (2.35) is equal to d τ1 [n+1]τ1 [n−1] log . (2.38) dt τ12 [n] Equating (2.37) and (2.38), and integrating shows that 2 d2 τ1 [n+1]τ1 [n−1] t (α1 −n) log e τ [n] =C , 1 2 dt τ12 [n]
(2.39)
and the stated result (2.32) follows. There is the ambiguity of a multiplicative constant C, possibly dependent on n but not on t, and this can be chosen freely, for example to render the Toda lattice equation in a simple form. The Toda lattice equation obeyed by the τ3 [n] with parameters (α0 + n, α1 , α2 − n) is obtained by applying the mapping ω to both sides of (2.39) and making use of (2.31). This shows 2 τ3 [n+1]τ3 [n−1] d2 log e−t (α2 −n) τ3 [n] = C , 2 dt τ32 [n]
(2.40)
which with σ3 [n] := Ce−t
2 (α −n) 2
τ3 [n],
(2.41)
gives the Toda lattice equation d2 σ3 [n+1]σ3 [n−1] log σ3 [n] = . dt 2 σ32 [n]
(2.42)
Another way to deduce (2.40) is via (2.39) and the differential equation (2.18). Now, the second Hamiltonian in (2.24) is obtained from the first by simply interchanging α1 and α2 . On the other hand α1 and α2 are interchanged in (2.18) if we replace t by it then replace H (it) by −iH (t). This tells us that in (2.39) we can make the replacements t → it,
τ1 (it) → τ3 (t),
α1 → α2 ,
(2.43)
which indeed gives (2.40). Furthermore, since (2.43) shows τ1 and τ3 are simply related, it suffices to consider one sequence only, τ3 [n] say.
Application of the τ -Function Theory of PIV, PII to the GUE
369
2.3. Classical solutions. For a special initial choice of the parameters it is possible to choose τ3 [0] = 1, and then to determine τ3 [1] in terms of a classical function, that is to say the solution of a second order linear differential equation. What is essential here is the condition for the decoupling of the two independent first order differential equations so that what remains is a Riccati equation. Proposition 6 ([26]). For the special initial choice of the parameters such that α2 = 0, i.e. for parameters (1 − α1 , α1 , 0) the first nontrivial member of the τ -function sequence τ3 [1] satisfies the Hermite-Weber equation, τ3 [1] = −2tτ3 [1] − 2α1 τ3 [1].
(2.44)
Proof. With n = 0, (1 − α1 + n, α1 , −n) implies α2 = 0. Now we see from (2.13) that H = pq(2p − q − 2t) − 2α1 p, (2.45) α2 =0
which allows us to take
H
α2 =0
= 0,
(2.46)
provided we set p = 0. Recalling (2.30) this implies τ3 [0] = 1.
(2.47)
We read off from the n = 1 case of the second equation in (2.27), together with (2.14), that T3−1 H − H = q, (2.48) α2 =0
α2 =0
and thus, after recalling the second equation in (2.30) and (2.46), d log τ3 [1] = q. dt
(2.49)
The first of Hamilton’s equations (2.15) gives, with α2 = 0 and p = 0 (after the differentiation), the Riccati equation q = −q 2 − 2tq − 2α1 .
(2.50)
Substituting (2.49) this reduces to the linear equation (2.44) first obtained in the present context by Okamoto [26]. Proposition 7. Two linearly independent solutions to the Toda lattice equation (2.40) for sequences of τ -functions with parameters (α0 + n, α1 , −n), n ≥ 0, starting from the Weyl chamber wall α2 = 0 are given by the determinant forms t 2 τ3 [n](t; α1 ) = C det (t − x)−α1 +i+j e−x dx , (2.51) i,j =0,...,n−1
−∞
and τ¯3 [n](t; α1 ) = C det
∞
−∞
(t − x)−α1 +i+j e−x dx 2
i,j =0,...,n−1
.
(2.52)
370
P. J. Forrester, N. S. Witte
Proof. In the special case α1 = 0, we observe that a solution of (2.44) is t 2 τ3 [1] = C e−x dx.
(2.53)
−∞
In fact it is possible to solve (2.44) in a form analogous to (2.53) for general α1 . Thus consider the integral t 2 Ia (t) := (t − x)a e−x dx, (2.54) −∞
and suppose temporarily that Re(a) > −1. Simple manipulation gives t d 2 Ia (t) = tIa−1 (t) + 1/2 (t − x)a−1 e−x dx dx −∞ (a − 1) Ia−2 (t). = tIa−1 (t) + 2
(2.55)
But (a − 1)Ia−2 (t) =
1 d2 Ia (t), a dt 2
Ia−1 (t) =
1 d Ia (t). a dt
(2.56)
Thus we see that I−α1 (t) satisfies (2.44) and this implies τ3 [1] = C
t
−∞
(t − x)−α1 e−x dx, 2
(2.57)
where we require Re(α1 ) < 1. Starting with (2.47) and (2.57), up to a multiplicative constant the τ -functions τ3 [n] (n = 2, 3, . . . ) are uniquely specified by the Toda lattice equation (2.42). In fact it was known to Sylvester ([19, pp. 115–117]) that the solution of (2.42) with initial condition σ3 [0] = 1 is the double Wronskian or Hankel determinant i+j d σ3 [n] = det σ3 [1] . (2.58) dt i+j i,j =0,...,n−1 Recalling (2.41) and (2.57) we therefore have τ3 [n] := τ3 [n](t; α1 ) i+j 2 t 2 d t −α1 −x 2 (t − x) e dx e = C det e−t dt i+j −∞
i,j =0,...,n−1
Making use of (2.55) we can check that t dp t2 t 2 −α1 −x 2 p t2 e (t − x) e dx = 2 e (t − x)−α1 +p e−x dx, dt p −∞ −∞ so (2.59) can also be written in the final form of (2.51).
.
(2.59)
(2.60)
Application of the τ -Function Theory of PIV, PII to the GUE
371
The second linearly dependent solution of (2.44) can also be written in an integral form similar to (2.54). Thus we see the integral I¯a (t) :=
∞
(t − x)a e−x dx, 2
−∞
(2.61)
satisfies the formulas (2.55) and (2.56), and thus satisfies (2.44) with a = −α1 . Hence in addition to (2.57) we have the solution τ¯3 [1] = C
∞
(t − x)−α1 e−x dx, 2
−∞
(2.62)
(note that for α1 not equal to a non-positive integer, this is well defined only if t ∈ R). Proceeding as in the derivation of (2.51), we deduce from the Toda lattice equation (2.42), and the initial values (2.47), (2.62), the sequence of τ -functions given by (2.52). Let us now consider the sequence of τ -functions τ¯1 [n]. Proposition 8 ([26]). The sequence of τ -function solutions to the Toda lattice equation (2.39) τ¯1 [n], n ≥ 0, corresponding to the parameter sequence (α0 + n, −n, α2 ) starting from the line α1 = 0 has the determinantal form τ¯1 [n](t; −p) = C det Hp+i+j (t)
(2.63)
i,j =0,...,n−1
for −α2 = p ∈ Z≥0 . Proof. This sequence can be obtained from τ¯3 [n] by the (inverse of) the mappings (2.43). Replacing t by −it in (2.51) does not lead to an integral of interest in random matrix applications, but doing the same in (2.52) gives τ¯1 [n](t; α2 ) = C det
∞
−∞
(t − ix)−α2 +i+j e−x dx 2
i,j =0,...,n−1
,
(2.64)
(n)
which is of interest. We recall that for τ¯1 the parameters (α0 , α1 , α2 ) in the corresponding Hamiltonian are given by (1 − α2 + n, −n, α2 ).
(2.65)
For p ∈ Z≥0 we know
∞ −∞
(t − ix)p e−x dx = 2
√ −p π 2 Hp (t),
and thus setting α2 = −p equation (2.64) yields (2.63).
Note that with p = N , n = 2, this is precisely the final determinant in (1.15).
(2.66)
372
P. J. Forrester, N. S. Witte
2.4. Bäcklund transformations and discrete Painlevé systems. It has been known that some of the Bäcklund transformations of the PIV transcendent can be identified with discrete Painlevé equations [6, 11], although no systematic study has been undertaken for this class. We will find that the difference equations for fj [n], H [n], τ [n] which are generated by the Bäcklund transformations for the two shift operations T3−1 , T1 are in fact manifestations of discrete Painlevé equations. Proposition 9 ([11]). The Bäcklund transformations of the PIV system corresponding to the shift operator T3−1 generating the parameter sequence of (α0 + n, α1 , −n) with n ∈ Z, 0 < α0 , α1 < 1 and α0 + α1 = 1 are second order difference equations of the first discrete Painlevé equation dPI, namely χk+1 + χk + χk−1 = 2t +
k − (1/2 + α1 ) + (−1)k (1/2 − α1 ) χk
k ≥ 1,
(2.67)
where χ2n+1 = f(3)2 [n],
χ2n+2 = f(3)0 [n] n ≥ 0.
(2.68)
Proof. The action of the shift operators on the fj is expressible in a terminating continued fraction, which for T3 and its inverse takes the form T3 (f0 ) = f1 − T3 (f1 ) = f2 +
T3 (f2 ) = f0 +
T3−1 (f0 ) = f2 −
T3−1 (f1 ) = f0 −
T3−1 (f2 ) = f1 +
2α2 , f2 2(α1 +α2 ) , 2α2 f1 − f2 2α2 2(α1 +α2 ) − , 2α2 f2 f1 − f2 2α0 2(α0 +α1 ) + , 2α0 f0 f1 + f0 2(α0 +α1 ) , 2α0 f1 + f0 2α0 , f0
(2.69) (2.70)
(2.71)
(2.72)
(2.73)
(2.74)
as one can verify using the action of the affine Weyl group reflections and diagram rotations as given in Tables 2, 3. For simplicity of notation we suppress the subscript (3) labelling the sequence (α0 + n, α1 , −n) during the discussion of our proofs as there is no risk of confusion. Taking the first and last members of this set, now at the nth rung of the T3 ladder, and adding their unshifted f -variable we have 2n , f2 [n] 2(n+α0 ) f2 [n+1] + f2 [n] = 2t − f0 [n] + , f0 [n]
f0 [n] + f0 [n−1] = 2t − f2 [n] +
(2.75)
Application of the τ -Function Theory of PIV, PII to the GUE
373
so that one has a closed system. This can be recognised as the two components of a staggered system of difference equations and employing the definitions of χk above we arrive at the discrete Painlevé equation dPI. In terms of the coordinate and momenta of the Hamiltonian system this difference system was found by Okamoto [26, 28] and can be expressed as
q[n+1] = (2t +q[n]−2p[n]) q[n−1] = −2p[n]
q[n]p[n] − α1 , q[n]p[n] − n
p[n+1] = −1/2q[n] + p[n−1] = t +
q[n](2t +q[n]−2p[n]) + 2α1 , 2(n+α0 ) − q[n](2t +q[n]−2p[n])
α0 + n , 2t +q[n]−2p[n]
q[n]p[n] − α1 q[n]p[n] − n − p[n] . 2p[n] q[n]p[n] − n
(2.76) (2.77) (2.78) (2.79)
Consequently a third order difference equation exists for the Hamiltonian through the relation H [n + 1] − H [n] = −f1 [n].
(2.80)
Eliminating p[n] between (2.76), (2.77) we find a second order difference equation for q[n], 2 nq[n]q[n−1] 4t + 2q[n] + q[n+1] + q[n−1] = 2 (n+1)q[n+1] − nq[n−1] − (2t + q[n] + q[n+1])(α1 + 1/2q[n](2t + q[n])) × (n+1)q[n+1] − nq[n−1] − (2t + q[n] + q[n−1])(−α1 + 1/2q[n](2t + q[n] + q[n+1] + q[n−1])) . (2.81) Use of H [n+1] − H [n] = q[n] leads to the third order equation in H [n]. In addition we have a higher order difference equation for the τ -function. Proposition 10. The τ -function sequence, appropriately normalised, associated with the shift operator T3−1 with parameter values (α0 + n, α1 , −n), n ∈ Z≥0 , 0 < α0 , α1 < 1
374
P. J. Forrester, N. S. Witte
and α0 + α1 = 1 satisfies the fourth order difference equation 4t 2 2nτ 2 [n] − τ [n+1]τ [n−1] × 2(n−α1 )τ 2 [n] − τ [n+1]τ [n−1] × τ [n−2]τ [n+1]τ [n] + 2τ 2 [n−1]τ [n+1] + 4n(α1 −n)τ 2 [n]τ [n−1] × τ [n+2]τ [n−1]τ [n] − 2τ 2 [n+1]τ [n−1] + 4n(α1 −n)τ 2 [n]τ [n+1] ! = τ [n+2]τ [n−2]τ 3 [n] − 16n2 (α1 −n)2 τ 5 [n] (2.82) + 16n(α1 −n)(α1 −2n)τ 3 [n]τ [n+1]τ [n−1] − 4(2n2 −2α1 n+1)τ [n]τ 2 [n+1]τ 2 [n−1] + τ [n+2]τ 2 [n−1] τ [n+1]τ [n−1] + 2(α1 +1−2n)τ 2 [n] "2 . + τ [n−2]τ 2 [n+1] τ [n+1]τ [n−1] + 2(α1 −1−2n)τ 2 [n] Proof. We first seek to express all the fundamental quantities in terms of the product f1 [n]f2 [n]. By multiplying the two transformations (2.69) and (2.70) we find a quadratic relation for f1 [n] (and f1 [n−1]), f1 [n](2t − f1 [n]) = f1 [n+1]f2 [n+1] + f1 [n]f2 [n] + 2α1 .
(2.83)
Next we multiply (2.70) by f1 [n] which yields a relation for the product f1 [n]f1 [n−1] = f1 [n]f2 [n]
f1 [n]f2 [n] + 2α1 . f1 [n]f2 [n] + 2n
(2.84)
One can verify then that a linear proportionality exists between f1 [n] and f1 [n−1] via the product f1 [n]f2 [n], ! " f1 [n]f2 [n] + 2α1 f1 [n] f1 [n]f2 [n] + f1 [n−1]f2 [n−1] + 2α1 − f1 [n]f2 [n] f1 [n]f2 [n] + 2n ! " f1 [n]f2 [n] + 2α1 = f1 [n−1] f1 [n+1]f2 [n+1] + f1 [n]f2 [n] + 2α1 − f1 [n]f2 [n] , f1 [n]f2 [n] + 2n (2.85) so that f1 [n] and f1 [n−1] may now be linearly related to f1 [n]f2 [n]. Multiplying these two later relations and using C
τ3 [n+1]τ3 [n−1] = 2n + f1 [n]f2 [n], τ32 [n]
with C = 1 to introduce the τ -functions we arrive at (2.82).
(2.86)
Note. The difference equations (2.81) and (2.82) have the advantage of being of the lowest order we have found possible, but the disadvantage of not being linear in the highest order terms (q[n + 1] and τ [n + 2] respectively). In fact difference equations linear in the highest order terms can be given by increasing by one the order of the equations in each case [28].
Application of the τ -Function Theory of PIV, PII to the GUE
375
Applying the operator ω (recall (2.28)) we obtain analogous results for the sequence generated by T1 . Proposition 11 ([11]). The Bäcklund transformations generated by the shift operator T1 corresponding to the parameter sequence (α0 + n, −n, α2 ) with n ∈ Z, 0 < α0 , α2 < 1, and α0 + α2 = 1, are second order difference equations of the first discrete Painlevé equation dPI, that is ηk+1 + ηk + ηk−1 = 2t −
k − [1+(−1)k ]α2 − 1/2[1−(−1)k ] , ηk
k ≥ 1,
(2.87)
where η2n+1 = f(1)1 [n],
η2n+2 = f(1)0 [n],
n ≥ 0.
Proof. This follows immediately upon applying ω to both sides of (2.67).
(2.88)
The analogue of (2.81) for the parameter sequence generated by the shift T1 can be found by applying the ω map to this relation, 2 −2np[n]p[n − 1] 2t − 2p[n] − p[n + 1] − p[n − 1] = (n + 1)p[n + 1]−np[n−1] + (t −p[n]−p[n + 1])(α2 + 2p[n](t −p[n])) × (n + 1)p[n + 1] − np[n − 1] + (t − p[n] − p[n − 1])(−α2 + 2p[n](t − p[n] − p[n + 1] − p[n − 1])) , (2.89) and this implies a third order difference equation for the Hamiltonian via H [n+1] − H [n] = f2 [n] = 2p[n].
(2.90)
There is also a higher order difference equation for the τ -function which can be derived using the relation τ1 [n+1]τ1 [n−1] = f1 [n]f2 [n] − 2n, τ12 [n]
(2.91)
although we do not reproduce this here. 3. τ -Function Theory for PII 3.1. Affine Weyl group symmetry. In the general Painlevé theory the second Painlevé equation naturally appears as a coalescence limit of PIV. From the work of [36] it is known that in random matrix theory PII occurs in the edge scaling limit of the GUE. This suggests that before studying this limit we should develop a theory of PII analogous to that developed for PIV in the previous section. We take the PII equation to be defined in the standard manner y = 2y 3 + ty + α.
(3.1)
376
P. J. Forrester, N. S. Witte
Proposition 12. The second Painlevé equation with the transcendent y = q(t) and parameter α is equivalent to the system of first order differential equations f0 = −2qf0 + α0 ,
(3.2)
= 2qf1 + α1 ,
(3.3)
f1
where f0 + f1 = 2q 2 + t and α0 + α1 = 1 with α = α1 − 1/2 = 1/2 − α0 . Proof. This is established by eliminating p through the substitutions f0 = 2q 2 − p + t and f1 = p. = s0 , s1 , π be the extended affine Weyl group of the root system Proposition 13. Let W (1) of type A1 generated by the reflections s0 , s1 and the diagram rotation π , with action on the roots α0 , α1 as given in Table 4. The coupled system (3.2), (3.3) is symmetric under the Bäcklund transformations induced by the elements of the above affine Weyl group as specified in Table 5. (1)
Table 4. Action of the generators of the extended affine Weyl group associated with the root system A1 on the simple roots s0 s1 π
α0 −α0 α0 + 2α1 α1
α1 α1 + 2α0 −α1 α0
Table 5. Bäcklund transformations for the PII system f0 s0 s1 π
f1 f1 −
f0 f0 +
2α 2 4α1 q + 21 f1 f1 f1
2α 2 4α0 q + 20 f0 f0 f1 f0
q q−
α0 f0
q+
α1 f1
−q
Proof. This can be directly verified using the equations of motion (3.2), (3.3).
Underlying the dynamics of the PII system is a Hamiltonian structure. Proposition 14. The PII dynamical system is equivalent to a Hamiltonian system {q, p; H } with Hamiltonian H = −1/2f0 f1 − α1 q,
(3.4)
and canonical coordinates and momenta q, p defined by p = f1 ,
2q 2 = f0 + f1 − t.
(3.5)
Application of the τ -Function Theory of PIV, PII to the GUE
377
Proof. Using the symmetrised differential equations (3.2, 3.3) the Hamilton equations of motion q = p − q 2 − 1/2t,
p = 2qp + α1 ,
(3.6)
can be verified. Remark. The fundamental domain or Weyl chamber for PII can be taken as the interval α ∈ (−1/2, 0] or α ∈ [0, 1/2), and so there exist identities relating the transcendents and related quantities at the endpoints of these intervals. In particular, denoting the transcendent q(t, α) and with 7 2 = 1, t = −21/3 s we have [12] d q(t, 1/27) − 7 q 2 (t, 1/27) − 1/27 t, dt d 1 q(t, 1/27) = 7 2−1/3 q(s, 0). q(s, 0) ds
−7 21/3 q 2 (s, 0) =
(3.7)
The action of the affine Weyl group on the Hamiltonian is given in Table 6. We define two shift operators corresponding to translations by the fundamental (1) weights of the affine Weyl group A1 , T 1 = π s1 ,
T2 = s1 π,
(3.8)
although T1 T2 = 1 so only one is independent. Their action on the parameter space is given in Table 7. Table 6. Bäcklund transformations of the Hamiltonian
s0
H0
H1 = π(H0 ) = H0 + q
α0 f0
H1
H0 +
s1
H0
π
H1
H1 +
α1 f1
H0 (1)
Table 7. Action of the shift operators on the simple roots of the root system A1 T1 T2
α0 α0 + 1 α0 − 1
α1 α1 − 1 α1 + 1
Proposition 15. The Bäcklund transformations corresponding to the shifts are given by T1 (f0 ) = f1 −
2α 2 4α0 q + 20 , f0 f0
T1 (f1 ) = f0 , T2 (f0 ) = f1 , T2 (f1 ) = f0 +
(3.9) (3.10) (3.11)
2α 2 4α1 q + 21 , f1 f1
(3.12)
378
P. J. Forrester, N. S. Witte
T1 (H0 ) = H0 + q,
(3.13)
α0 T1 (H1 ) = H1 − q + , f0 α1 T2 (H0 ) = H0 + q + , f1 T2 (H1 ) = H1 − q.
(3.14) (3.15) (3.16)
Proposition 16. The Hamiltonian H (t) satisfies the second order second degree differential equation of Jimbo–Miwa–Okamoto σ form for PII, # $2 # $3 H + 4 H + 2H [tH − H ] − 1/4α12 = 0. (3.17) Proof. Using first two derivatives of H H = −1/2f1 , H = −qf1 − 1/2α1 ,
(3.18)
one can solve for q, f0 , f1 and then substitute these back into the expression for H . The result (3.17) then follows after simplification. With H given by (3.4), and p and q specified by (3.5), Hamilton’s equation for q implies p = q + q 2 + 1/2t.
(3.19)
Thus H can be expressed in terms of the Painlevé II transcendent q according to H = 1/2(q )2 − 1/2(q 2 + 1/2t)2 − (α + 1/2)q.
(3.20)
3.2. Toda lattice equation. The τ -functions are defined as before (2.23) and corresponding to each sequence generated by the shift operators is a Toda lattice equation. Proposition 17. The τ -function sequence generated by the shift operator T1 with the parameter sequence (α0 + n, α1 − n) for n ≥ 0 T1n (H ) = H [n] =
d ln τ [n], dt
(3.21)
obeys the Toda lattice equation C
τ [n+1]τ [n−1] d2 = ln τ [n]. τ 2 [n] dt 2
(3.22)
Proof. This parallels the argument employed for PIV case, by utilising the relations H [n+1] − H [n] = q[n],
(3.23)
and q[n] − T1−1 (q[n]) =
d ln f1 [n], dt
(3.24)
along with
d H = −1/2f1 . dt
(3.25)
Application of the τ -Function Theory of PIV, PII to the GUE
379
3.3. Classical solutions. When the parameter values are those on a chamber wall (a point) α1 = n ∈ Z then the τ -functions are known to be expressible in terms of Airy functions [26]. Proposition 18. The solution for the first non-trivial member of the τ -function sequence τ [1](t) generated by the shift operator T1 with initial parameters (α0 , α1 ) = (1, 0) that is bounded as t → −∞ is τ [1](t) = CAi(−2−1/3 t). The nth member of this sequence is i+j d τ [n](t) = C det Ai(−2−1/3 t) dt i+j
i,j =0,... ,n−1
(3.26)
.
(3.27)
Proof. Starting from α1 = 0 at n = 0 one can take p[0] = 0 so that H [0] = 0 and conventionally τ [0] = 1. Using (3.13) we find that H [1] = q[0] and so the equation of motion (3.6) gives the second order linear differential equation τ [1] + 1/2tτ [1] = 0, and thus (3.26). The determinant formula (3.27) follows from (2.58).
(3.28)
Another special parameter value of the Hamiltonian system (3.4) is α1 = −1/2 [37] when Hamilton’s equation (3.6) permit the solution (q, p) = (t −1 , 1/2t).
(3.29)
However the corresponding value of H is not zero so in this case we do not have τ [0] = 1 (rather H [0] and thus log τ [0] is a rational function of t), and thus the sequence of τ functions generated by T1 is not given by a determinant [15]. Nonetheless the Bäcklund transformations of Prop. 15 show that H [n] and thus log τ [n] remain rational functions of t for all n = 1, 2, 3, . . . . 3.4. Bäcklund transformations and discrete dPI. The discrete dynamical system generated by the Bäcklund transformations is also integrable and can be identified with a discrete Painlevé system Proposition 19. The members of the sequence {q[n]}, n ≥ 0 generated by the shift operator T1 with the parameters (α0 +n, α1 −n) are related by a second order difference equation which is the alternate form of the first discrete Painlevé equation, a-dPI, α + 1/2 − n α − 1/2 − n + = −2q 2 [n] − t. q[n] + q[n−1] q[n+1] + q[n]
(3.30)
Proof. We deduce from (3.8) and Table 5 that α − 1/2 , p − 2q 2 − t α + 1/2 T2 (q) = −q − , p T1 (q) = −q +
(3.31) (3.32)
so eliminating p through the combination of these two we arrive at the stated result.
380
P. J. Forrester, N. S. Witte
The full set of forward and backward difference equations are [26] α − 1/2 − n , p[n] − 2q[n]2 − t α + 1/2 − n q[n−1] = −q[n] − , p[n] q[n+1] = −q[n] +
p[n+1] = −p[n] + 2q[n]2 + t, α + 1/2 − n 2 p[n−1] = t − p[n] + 2 q[n] + . p[n]
(3.33) (3.34) (3.35) (3.36)
The discrete Painlevé equation (3.30) implies a third order difference equation for the Hamiltonian α + 1/2 − n α − 1/2 − n + = −2(H [n+1] − H [n])2 − t, (3.37) H [n+1] − H [n−1] H [n+2] − H [n] because q[n] = H [n + 1] − H [n]. Equations (3.35) and (3.36) also imply q[n] =
1 αn p[n] (p[n−1] − p[n+1]) − , 4αn 2p[n]
αn := α + 1/2 − n.
(3.38)
Using this to eliminate q[n] (in say (3.35)) yields a second order difference equation for the p[n], 1 2 α2 p [n] (p[n+1] − p[n−1])2 + 2 n − 2p[n] − p[n+1] − p[n−1] + 2t = 0. 2 4αn p [n] (3.39) Furthermore, Eqs. (3.21), (3.22) and (3.25) give C
τ [n+1]τ [n−1] = p[n]. τ 2 [n]
(3.40)
So substituting in (3.39) (with say C = −2) implies a fourth order difference equation for τ [n], 2 1 2 2 [n+1] − τ [n+2]τ [n−1] + 1/8αn2 τ 6 [n] τ [n−2]τ 2αn2 (3.41) + τ [n−2]τ 3 [n]τ 2 [n+1] + τ [n+2]τ 3 [n]τ 2 [n−1] + 2τ 3 [n−1]τ 3 [n+1] + t τ 2 [n]τ 2 [n−1]τ 2 [n+1] = 0. While (3.39) and the corresponding equation for τ [n] provide a polynomial relation between the smallest set of consecutive sequence members {p[n]} and {τ [n]} we have found possible, they have the disadvantage of not being linear in the highest order term (p[n+1] and τ [n+2] respectively). This disadvantage can be remedied by increasing by one the order of the equation in each case. Thus by replacing n by n − 1 in (3.38), then adding the result to the original equation and using (3.34) implies a third order difference equation for the p[n] [29], 0=
1 αn p[n] (p[n−1] − p[n+1]) + 4αn 2p[n] 1 αn−1 + , p[n−1] (p[n−2] − p[n]) − 4αn−1 2p[n]
(3.42)
Application of the τ -Function Theory of PIV, PII to the GUE
381
which is indeed linear in p[n + 1]. Substituting (3.40) gives a fifth order equation for τ [n], linear in the highest order term τ [n+2]. 3.5. Coalescence from PIV. Since the earliest works on the Painlevé transcendents [31] it was known how to obtain the PII system from a limiting procedure or coalescence applied to the PIV, however there is in fact more than one such coalescence path. In fact our analysis of the scaled GUE requires the application of a second coalescence path rather than the one commonly employed. In the first limit the parameters (α0 , α1 , α2 ) and variables tIV , qIV , pIV , HIV in the PIV system scale in a way that α2 is fixed so that α0 = 1/2 − αII − 1/27 −6 ,
(3.43)
−6
α1 = 1/27 , α2 = αII + 1/2,
(3.44) (3.45)
tIV = −7 −3 + 2−2/3 7 tII ,
(3.46)
qIV = 7
−3
pIV = 2
+2
−2/3
HIV = −7
−3
2/3 −1
7
qII ,
(3.47)
7 pII ,
(3.48)
(αII + 1/2) + 2
2/3 −1
7
HII ,
(3.49)
as 7 → 0 then the function qII (tII ) satisfies the PII differential equation with parameter αII [13]. The second limiting procedure is obtained from the first by the mapping ω introduced in (2.28). Proposition 20. If α1 is fixed so that the variables scale like α0 = αII + 3/2 + 1/27 −6 , α1 = −αII − 1/2,
(3.50) (3.51)
α2 = −1/27 −6 ,
(3.52)
tIV = 7
−3
−2
−2/3
7 tII ,
qIV = 21/3 7 pII , 2pIV = 7
−3
HIV + α1 tIV = −2
+2 7
(3.54)
2/3 −1
2/3 −1
(3.53)
7
qII ,
HII ,
(3.55) (3.56)
as 7 → 0 then the function qII (tII ) satisfies the PII differential equation with parameter αII . Furthermore the third order difference equation for HIV (2.89, 2.90), related to the discrete Painlevé equation dPI, corresponding to the Bäcklund transformation under the shift operator T1 transforms into the third order difference equation for HII (3.37), related to the alternate discrete Painlevé equation a-dPI, under this scaling. Proof. Under the mapping HIV → −HIV , 2pIV ↔ qIV , α1 ↔ −α2 and tIV ↔ −tIV . The only equation which isn’t immediate from the mapping is (3.56). Substituting (3.53) for tIV and ignoring the term proportional to 7 tII shows this equation is equivalent to HIV = 7 −3 (αII + 1/2) − 22/3 7 −1 HII ,
(3.57)
which is precisely what results from applying the mapping ω to (3.49). The scaling of the third order difference equation for HIV associated with the discrete Painlevé equation, dPI, (2.89) to (3.37) can be verified directly.
382
P. J. Forrester, N. S. Witte
4. Application to Finite GUE Matrices In this section we will show that the determinants in (2.51) and (2.52) occur in the calculation of the quantities E˜ N and FN , introduced in the Introduction, relating to GUE random matrices. 4.1. Calculation of EN (0; (s, ∞)) and E˜ N (s; a). Consider first the probability EN (0; (s, ∞)). Proposition 21. The gap probability EN (0; (s, ∞)) is identical to the N th τ -function of the sequence generated by T3−1 from the corner of the Weyl chamber (α0 , α1 , α2 ) = (1, 0, 0), EN (0; (s, ∞)) = τ3 [N ](s; 0),
(4.1)
where the normalization of τ3 [N] must be such that lim τ3 [N ](s; 0) = 1.
(4.2)
s→∞
The resolvent kernel function R = RN (t) occurring in (1.5) is the N th Hamiltonian associated with this sequence, RN (t) = H (t) . (4.3) (α0 ,α1 ,α2 )=(1+N,0,−N)
Proof. From the meaning of EN we see from (1.1) with g(x) = e−x that 2
1 EN (0; (s, ∞)) = C
s
−∞
dx1 · · ·
s −∞
dxN
N j =1
e−xj
2
(xk − xj )2 .
(4.4)
1≤j 1/2 the metric on the covering surface is the one induced from the z plane ds 2 = dzd z¯ = dtd t¯|
dz 2 | = 4|t|2 dtd t¯, dt
|t| > 1/2 .
(2.13)
We “close the hole” in the t space by choosing, for |t| < 1/2 , the metric ds 2 = 4dtd t˜,
|t| < 1/2 .
(2.14)
Thus we have glued in a “flat patch” in the t space to close the hole created in the definition of the twist operator. The metric is continuous across the boundary |t| = 1/2 , but there is curvature concentrated along this boundary. The path integral of X over the disc |t| < 1/2 creates the required state along the edge of the hole. The map (2.7) is only the leading order approximation to the actual map in general, but our prescription is to “close with a flat patch” the hole in the t space, where the hole is the image on of a circular hole in the z plane. As → 0, the small departure of the map from the form (2.7) will cease to matter.
406
O. Lunin, S. D. Mathur
Note that we could have chosen a different metric to replace the choice (2.14) inside the hole, but this would just correspond to a different overall normalization of the twist operator. (Thus it would be like taking a different choice of $\epsilon$.) Once we make the choice (2.14) we must use the same construction of the twist operator in all correlators, and then the non-universal choices in the definitions will cancel out. The other holes, of types (ii) and (iii), arise from the hole at infinity in the $z$ plane, and we proceed by first replacing the $z$ plane by a closed surface. We take another disc with radius $1/\delta$ (parameterized by a coordinate $\tilde z$) and glue it to the boundary of the $z$ plane. Thus we get a sphere with metric given by
$$ds^2 = dz\,d\bar z, \quad |z| < \frac{1}{\delta}; \qquad ds^2 = d\tilde z\,d\bar{\tilde z}, \quad |\tilde z| < \frac{1}{\delta}, \quad \tilde z = \frac{1}{\delta^2 z}.$$
… the region $|z| > 1/\delta$ gives no contribution to $S_L$, since the curvature of the fiducial metric is zero, and the map gives $\phi = $ constant.

2.6. The correlator in terms of the Liouville field. Let us collect all the above contributions together. Note that
$$S_L^{(2)} + S_L^{(3)} = \frac{1}{3}\log\frac{\delta'}{\delta}, \tag{2.40}$$
so that the variable $b$ drops out of this combination. Now let us go back to the expression (2.6) that we want to evaluate:
$$\langle\sigma_n(0)\sigma_n(a)\rangle_\delta = \frac{Z_{\epsilon,\delta}[\sigma_n(z_1),\sigma_n(z_2)]}{(Z_\delta)^N} = e^{S_L}\,\frac{Z^{(\hat s)}}{(Z_\delta)^n}, \tag{2.41}$$
where we used Eq. (2.16). Taking into account the relations (2.19) and (2.20) we finally get:
$$\langle\sigma_n(0)\sigma_n(a)\rangle_\delta = e^{S_L}\,\frac{\delta^{n/3}}{\delta'^{1/3}}\,Q^{1-n}. \tag{2.42}$$
Substituting the expression for the Liouville action $S_L$, we conclude that
$$\langle\sigma_n(0)\sigma_n(a)\rangle_\delta = e^{S_L^{(1)}}\,e^{S_L^{(2)}+S_L^{(3)}}\,\frac{\delta^{n/3}}{\delta'^{1/3}}\,Q^{1-n} = e^{S_L^{(1)}}\,\delta^{\frac{n-1}{3}}\,Q^{1-n}. \tag{2.43}$$
Correlation Functions for M N /SN Orbifolds
411
Thus we observe a cancellation of $\delta'$, which served only to choose a fiducial metric on $\Sigma$ and thus should not appear in any final result. The only quantity that needs computation is $S_L^{(1)}$, using (2.33). Let us mention that formula (2.41) has a simple extension to the case of a general correlation function:
$$\langle\sigma_{n_1}(z_1)\cdots\sigma_{n_k}(z_k)\rangle_\delta = e^{S_L}\,\frac{Z^{(\hat s)}}{(Z_\delta)^s}, \tag{2.44}$$
where $Z^{(\hat s)}$ is the partition function of the covering Riemann surface $\Sigma$ with the fiducial metric $d\hat s^2$ ($\Sigma$ may have any genus), and $s$ is the number of fields involved in nontrivial permutations ($s = n$ in the case of the two point function (2.41)). The partition function $Z^{(\hat s)}$ may depend on the moduli of the surface and on its size (there are no moduli in the case of the sphere, and the size is parameterized by $\delta'$).

3. The 2-Point Function

3.1. The calculation. Let us apply the above scheme to evaluate the 2-point function of twist operators. If one of the twist operators corresponds to the permutation
$$(1\ldots n), \tag{3.1}$$
then the other one should correspond to the permutation
$$(n\ldots 1), \tag{3.2}$$
since otherwise the correlation function vanishes. Thus we can write
$$\langle\sigma_n(0)\sigma_n(a)\rangle \tag{3.3}$$
instead of $\langle\sigma_{(1\ldots n)}(0)\sigma_{(n\ldots 1)}(a)\rangle$ without causing confusion. The generalization of the map (2.10) to the case of $\sigma_n$ is
$$z = a\,\frac{t^n}{t^n-(t-1)^n}. \tag{3.4}$$
For this map we have
$$\phi = \log\left|\frac{dz}{dt}\right|^2 = \log\left[\frac{a\,n\,t^{n-1}(t-1)^{n-1}}{\left(t^n-(t-1)^n\right)^2}\right] + \text{c.c.}, \qquad \frac{d\phi}{dt} = -\frac{(2t+n-1)(t-1)^n-(2t-n-1)t^n}{t(t-1)\left((t-1)^n-t^n\right)}. \tag{3.5}$$
This map has the branch points located at
$$t = 0 \to z = 0 \qquad\text{and}\qquad t = 1 \to z = a. \tag{3.6}$$
There are $n$ images of the point $z = \infty$ in the $t$ plane:
$$t_k = \frac{1}{1-\alpha_k}, \qquad \alpha_k = e^{\frac{2\pi i k}{n}}, \qquad k = 0, 1, \ldots, n-1. \tag{3.7}$$
We note that $\alpha_0 = 1$ gives $t = \infty$.
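The closed forms in (3.5) to (3.7) are easy to spot-check numerically. The sketch below uses only the Python standard library; the function names and the sample values of $a$ and $t$ are our own choices, not the paper's. It compares $dz/dt$ and $d\phi/dt$ from (3.5) with central finite differences of the map (3.4), and confirms that the points $t_k = 1/(1-\alpha_k)$ are poles.

```python
import cmath

def z_map(t, n, a):
    """Covering map (3.4): z = a t^n / (t^n - (t-1)^n)."""
    return a * t**n / (t**n - (t - 1)**n)

def dz_dt(t, n, a):
    """dz/dt whose |.|^2 gives phi in (3.5): a n t^(n-1) (t-1)^(n-1) / (t^n - (t-1)^n)^2."""
    return a * n * t**(n - 1) * (t - 1)**(n - 1) / (t**n - (t - 1)**n)**2

def dphi_dt(t, n):
    """Holomorphic derivative d(phi)/dt quoted in (3.5)."""
    num = (2*t + n - 1) * (t - 1)**n - (2*t - n - 1) * t**n
    return -num / (t * (t - 1) * ((t - 1)**n - t**n))

def check_map(n, a=0.7 + 0.2j, t=0.3 + 0.45j, h=1e-6):
    # (3.6): the branch points t = 0 and t = 1 map to z = 0 and z = a
    assert z_map(0.0, n, a) == 0 and abs(z_map(1.0, n, a) - a) < 1e-12
    # (3.7): t_k = 1/(1 - alpha_k), k = 1..n-1, make the denominator vanish
    for k in range(1, n):
        tk = 1 / (1 - cmath.exp(2j * cmath.pi * k / n))
        assert abs(tk**n - (tk - 1)**n) < 1e-9
    # dz/dt against a central finite difference of z
    fd = (z_map(t + h, n, a) - z_map(t - h, n, a)) / (2 * h)
    assert abs(fd - dz_dt(t, n, a)) < 1e-6 * abs(fd)
    # d(phi)/dt = (d/dt of dz/dt) / (dz/dt), again by finite differences
    fd2 = (dz_dt(t + h, n, a) - dz_dt(t - h, n, a)) / (2 * h)
    assert abs(fd2 / dz_dt(t, n, a) - dphi_dt(t, n)) < 1e-5
    return True
```

The sample point $t$ is kept away from the line $\mathrm{Re}\,t = 1/2$, on which all the finite $t_k$ lie.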
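Let us compute the contribution (2.33) for the point $z = 0$. Near this point we have:
$$z \approx (-1)^{n+1}a\,t^n, \qquad |t| \approx \frac{|z|^{1/n}}{|a|^{1/n}}, \qquad \phi \approx \log\left[a\,n\,t^{n-1}\right] + \text{c.c.}, \qquad \partial_t\phi \approx \frac{n-1}{t}. \tag{3.8}$$
Then we get the contribution to the Liouville action (2.25):
$$S_L(t=0) = -\frac{n-1}{12}\left[\frac{1}{n}\log|a| + \log\left(n\,\epsilon^{\frac{n-1}{n}}\right)\right]. \tag{3.9}$$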
By the reflection symmetry $t \to 1-t$, $z \to a-z$, we get the same contribution from the other branch point:
$$S_L(t=1) = -\frac{n-1}{12}\left[\frac{1}{n}\log|a| + \log\left(n\,\epsilon^{\frac{n-1}{n}}\right)\right]. \tag{3.10}$$
Now we look at the images of infinity. First we note that the integral over the boundary located near $t = \infty$ will give zero, since $d\phi/dt$ goes like $1/t^2$, the length of the circle goes like $t$, and the value of $\phi$ is at best logarithmic in $t$. But we do get a contribution from the images of $z = \infty$ located at finite points in the $t$ plane. Note that if
$$t = \frac{1}{1-\alpha_k} + x, \qquad\text{then}\qquad (t-1)^n - t^n \approx \frac{x\,n}{\alpha_k(1-\alpha_k)^{n-2}}. \tag{3.11}$$
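This leads to
$$z \approx -\frac{a\alpha_k}{n(1-\alpha_k)^2}\,\frac{1}{x}, \qquad x \approx -\frac{a\alpha_k}{n(1-\alpha_k)^2}\,\frac{1}{z}, \tag{3.12}$$
$$\phi = \log\left[a^{-1}n(1-\alpha_k)^2\alpha_k^{n-1}z^2\right] + \text{c.c.}, \qquad \partial_t\phi \approx -\frac{2}{t-(1-\alpha_k)^{-1}}. \tag{3.13}$$
The point $t = t_k$ we are considering gives the following contribution to the Liouville action:
$$S_L(t=t_k) = \frac{1}{6}\log\left|a^{-1}n(1-\alpha_k)^2\alpha_k^{n-1}\delta^{-2}\right|. \tag{3.14}$$
Thus the total contribution from the images of infinity is:
$$S_L(z=\infty) = \sum_{k=1}^{n-1}\frac{1}{6}\log\left|\frac{n(1-\alpha_k)^2\alpha_k^{n-1}}{a\,\delta^2}\right| = -\frac{n-1}{6}\log\left[|a|\delta^2\right] + \frac{n+1}{6}\log n. \tag{3.15}$$
We have used the following properties of $\alpha_k$:
$$\prod_{k=1}^{n-1}\alpha_k = (-1)^{n-1}, \quad\text{so}\quad \left|\prod_{k=1}^{n-1}\alpha_k\right| = 1; \qquad \prod_{k=1}^{n-1}(q-\alpha_k) = \frac{q^n-1}{q-1} \to n \quad\text{if}\quad q \to 1, \tag{3.16}$$
which follow from the fact that $\{\alpha_k\}$ is the set of distinct solutions of the equation $\alpha^n - 1 = 0$ and $\alpha_0 = 1$.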
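A quick numerical check of (3.16), and of the sum it feeds into (3.15), can be written with the standard library alone (the helper names are ours). Note that the product of the $\alpha_k$ is $(-1)^{n-1}$; only its modulus enters the logarithms.

```python
import cmath
import math

def alphas(n):
    """Nontrivial n-th roots of unity alpha_k = exp(2 pi i k/n), k = 1..n-1 (alpha_0 = 1 excluded)."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(1, n)]

def product(values):
    out = 1
    for v in values:
        out *= v
    return out

def check_roots(n):
    al = alphas(n)
    # product of the alpha_k is (-1)^(n-1); its modulus is 1
    assert abs(product(al) - (-1) ** (n - 1)) < 1e-9
    # prod_k (q - alpha_k) = (q^n - 1)/(q - 1), checked at q = 2, and equal to n at q = 1
    assert abs(product(2 - x for x in al) - (2**n - 1)) < 1e-9 * 2**n
    assert abs(product(1 - x for x in al) - n) < 1e-9
    # consequence used in (3.15): sum_k log|n (1 - alpha_k)^2 alpha_k^(n-1)| = (n + 1) log n
    s = sum(math.log(abs(n * (1 - x)**2 * x**(n - 1))) for x in al)
    assert abs(s - (n + 1) * math.log(n)) < 1e-9
    return True
```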
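Adding all the contributions together, we get an expression for the interesting part of the Liouville action:
$$S_L^{(1)} = -\frac{1}{6}\left[\left(n-\frac{1}{n}\right)\log|a| + \frac{(n-1)^2}{n}\log\epsilon + 2(n-1)\log\delta - 2\log n\right]. \tag{3.17}$$
This leads to the final expression for the correlation function (see (2.43)):
$$\langle\sigma_n(0)\sigma_n(a)\rangle_\delta = e^{S_L^{(1)}}\,\delta^{\frac{n-1}{3}}\,Q^{1-n} = |a|^{-\frac{1}{6}\left(n-\frac{1}{n}\right)}\,C_n\,\epsilon^{A_n}\,Q^{B_n}, \tag{3.18}$$
$$A_n = -\frac{(n-1)^2}{6n}, \qquad B_n = 1-n, \qquad C_n = n^{1/3}. \tag{3.19}$$
Since the 2-point function of an operator with $h = \bar h = \Delta_n$ falls off as $|a|^{-4\Delta_n}$, we read off the dimension $\Delta_n$ of $\sigma_n$,
$$\Delta_n = \frac{1}{24}\left(n-\frac{1}{n}\right). \tag{3.20}$$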
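The step from (3.9), (3.10) and (3.15) to (3.17) through (3.20) is bookkeeping on the coefficients of $\log|a|$, $\log\epsilon$, $\log\delta$ and $\log n$. A minimal sketch with exact fractions (the dictionary keys and function names are ours) redoes it for a range of $n$:

```python
from fractions import Fraction as F

def combine(*parts):
    """Add coefficient dictionaries such as {"log|a|": c, ...} term by term."""
    total = {}
    for part in parts:
        for key, c in part.items():
            total[key] = total.get(key, 0) + c
    return total

def branch_point(n):
    # (3.9) and (3.10): -(n-1)/12 [ (1/n) log|a| + log n + ((n-1)/n) log eps ]
    pref = -F(n - 1, 12)
    return {"log|a|": pref / n, "log n": pref, "log eps": pref * F(n - 1, n)}

def images_of_infinity(n):
    # (3.15): -(n-1)/6 log(|a| delta^2) + (n+1)/6 log n
    return {"log|a|": -F(n - 1, 6), "log delta": -F(n - 1, 3), "log n": F(n + 1, 6)}

def two_point_action(n):
    return combine(branch_point(n), branch_point(n), images_of_infinity(n))

def check_two_point(n):
    S = two_point_action(n)
    assert S["log|a|"] == -F(1, 6) * (n - F(1, n))   # (3.17)
    assert S["log eps"] == -F((n - 1) ** 2, 6 * n)   # exponent A_n in (3.19)
    assert S["log delta"] + F(n - 1, 3) == 0         # cancelled by delta^((n-1)/3) in (3.18)
    assert S["log n"] == F(1, 3)                     # C_n = n^(1/3)
    # <sigma_n sigma_n> ~ |a|^(-4 Delta_n), so Delta_n = -S["log|a|"]/4, i.e. (3.20)
    assert -S["log|a|"] / 4 == F(1, 24) * (n - F(1, n))
    return True
```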
The other constants in (3.18) are to be absorbed into the normalization of $\sigma_n$. We will discuss this renormalization after computing the 3-point functions.

3.2. “Universality” of the 2-point function. The theory we have considered above is that of the orbifold $M^N/S_N$, where the manifold $M$ is just $\mathbb{R}$, the real line. If $M$ were $\mathbb{R}^d$ instead, we could treat the $d$ different species of fields independently, and obtain
$$\Delta_n = \frac{c}{24}\left(n-\frac{1}{n}\right), \tag{3.21}$$
where $c = d$ is the central charge of the CFT for one copy of $M = \mathbb{R}^d$. But we see that we would obtain the result (3.21) for the symmetric orbifold with any choice of $M$; we just use the value of $c$ for the CFT on $M$. Around the insertion of the twist operator we permute the copies of $M$, but the definition of the twist operator does not directly involve the structure of $M$ itself. The Liouville action (2.17) determines the correlation function using only the value $c$ of the CFT. Thus we recover the result (3.21) for any $M$. This “universality” of $\Delta_n$ is well known, and the value of $\Delta_n$ can be deduced from the following standard argument. Consider the CFT on a cylinder parameterized by $w = x + iy$, $0 < y < 2\pi$. At $x \to -\infty$ let the state be the vacuum of the orbifold CFT $M^N/S_N$. Since there is no twist, each copy of $M$ gives its own contribution to the vacuum energy, which thus equals $-\frac{c}{24}N$. Now insert the twist operator $\sigma_n$ at $w = 0$, and look at the state for $x \to \infty$. The copies of $M$ not involved in the twist contribute $-\frac{c}{24}$ each as before, but those that are twisted by $\sigma_n$ turn into effectively one copy of $M$ defined on a circle of length $2\pi n$. Thus the latter set contributes $-\frac{c}{24n}$ to the vacuum energy. The change in the energy between $x = \infty$ and $x = -\infty$ gives the dimension of $\sigma_n$ (since the state at $x \to -\infty$ is the vacuum):
$$-\frac{c}{24n} - \left[-\frac{cn}{24}\right] = \frac{c}{24}\left(n-\frac{1}{n}\right) = \Delta_n. \tag{3.22}$$
Thus while our calculation of the 2-point function has not taught us anything new, we have obtained a scheme that will yield the higher point functions for symmetric orbifolds using an extension of the same universal features that gave the value of $\Delta_n$ in the above argument.
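The cylinder argument behind (3.22) amounts to comparing two Casimir energies. A small sketch with exact fractions (function names are ours) makes the bookkeeping explicit, including the fact that the $N - n$ spectator copies drop out of the difference:

```python
from fractions import Fraction as F

def casimir_energy_untwisted(c, N):
    """Vacuum energy of N untwisted copies on a circle of length 2*pi: -c/24 each."""
    return -F(c, 24) * N

def casimir_energy_twisted(c, N, n):
    """N - n untwisted copies, plus the n twisted ones acting as one copy on a circle of length 2*pi*n."""
    return -F(c, 24) * (N - n) - F(c, 24 * n)

def twist_dimension(c, N, n):
    """(3.22): Delta_n is the jump in vacuum energy across the insertion of sigma_n."""
    return casimir_energy_twisted(c, N, n) - casimir_energy_untwisted(c, N)
```

For example, `twist_dimension(1, 10, 2)` gives `Fraction(1, 16)`, the familiar dimension of the $\mathbb{Z}_2$ twist field of a single real boson.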
4. The Map for the 3-Point Function

4.1. Genus of the covering surface. Let us first discuss the nature of the covering surface $\Sigma$ for the case where we have an arbitrary number of twist operators in the correlation function $\langle\sigma_{n_1}\sigma_{n_2}\cdots\sigma_{n_k}\rangle$. The CFT is still defined on the plane $z$, which we will for the moment regard as a sphere by including the point at infinity. At the insertion of the operator $\sigma_{n_j}(z_j)$ the covering surface has a branch point of order $n_j$, which means that $n_j$ sheets of $\Sigma$ meet at $z_j$. One says that the ramification order at $z_j$ is $r_j = n_j - 1$. Suppose further that over a generic point $z$ there are $s$ sheets of the covering surface $\Sigma$. Then the genus $g$ of $\Sigma$ is given by the Riemann–Hurwitz formula:
$$g = \frac{1}{2}\sum_j r_j - s + 1. \tag{4.1}$$
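Formula (4.1) can be applied mechanically once each twist insertion is written as a cycle. The sketch below (our own helper, standard library only) computes $g$ from the cycle lengths and the set of copies involved; it uses only the cycle data and does not check that the product of the permutations is trivial. The two triples of permutations discussed next, (4.2) and (4.3), serve as examples.

```python
from fractions import Fraction

def genus_of_cover(cycles):
    """Riemann-Hurwitz formula (4.1): g = (1/2) sum_j r_j - s + 1.

    `cycles` holds one cycle per twist insertion, e.g. (1, 2) for sigma_12.
    r_j = n_j - 1 is the ramification order, and s counts the copies (sheets)
    touched by any of the permutations.
    """
    r_total = sum(len(c) - 1 for c in cycles)
    s = len(set().union(*map(set, cycles)))
    g = Fraction(r_total, 2) - s + 1
    if g.denominator != 1:
        raise ValueError("sum of ramification orders must be even")
    return int(g)
```

With this helper, `genus_of_cover([(1, 2), (1, 3), (1, 2, 3)])` returns 0 and `genus_of_cover([(1, 2, 3)] * 3)` returns 1, in line with the text.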
Let us now consider the 3-point function. We require each twist operator to correspond to a single cycle of the permutation group, and regard the product of two cycles to represent the product of two different twist operators. Let the cycles have lengths $n, m, q$ respectively. It is easy to see that we can obtain covering surfaces of various genera. For example, if we have
$$\sigma_{12}\,\sigma_{13}\,\sigma_{123} \tag{4.2}$$
as the three permutations, then we have $r_1 = 1$, $r_2 = 1$, $r_3 = 2$, $s = 3$, and we get $g = 0$. On the other hand with
$$\sigma_{123}\,\sigma_{123}\,\sigma_{123} \tag{4.3}$$
we get $r_1 = r_2 = r_3 = 2$, $s = 3$, and we get $g = 1$. (This genus 1 surface is a singular limit of the torus, however.) Let us concentrate on the case where we get $g = 0$. Without loss of generality we can take the first permutation $\sigma_n$ to be the cycle
$$(1, 2, \ldots, k, k+1, \ldots, n). \tag{4.4}$$
The second permutation is restricted by the requirement that when composed with (4.4) it yields a single cycle (which would be the conjugate permutation of the third twist operator). In addition we must have a sufficiently small number of indices in the result of the first two permutations so that we do get $g = 0$. A little inspection shows that $\sigma_m$ must have the form
$$(k, k-1, \ldots, 1, n+1, n+2, \ldots, n+m-k). \tag{4.5}$$
Thus the elements $1, 2, \ldots, k$ of the first permutation occur in the second permutation in the reverse order, and then we have a new set of elements $n+1, \ldots, n+m-k$. These two permutations compose (with $\sigma_m$ acting first) to give the cycle $\sigma_m\sigma_n$ equal to
$$(k+1, k+2, \ldots, n, 1, n+1, n+2, \ldots, n+m-k). \tag{4.6}$$
Thus $\sigma_q$ must be the inverse of the cycle (4.6), and we have
$$q = n + m - 2k + 1. \tag{4.7}$$
Note that the number of “overlaps” (i.e., common indices) between $\sigma_n$ and $\sigma_m$ is $k$. Note that we must have $k \ge 1$ in order that the product $\sigma_m\sigma_n$ be a single cycle rather than just a product of two cycles. Also note that if we have $q = n + m - 1$, then since $s \ge q$, (4.1) gives that $\Sigma$ must have genus zero (this will be a “single overlap” correlator). Let $\Sigma$ be the covering surface that corresponds to the insertions $\sigma_n(z_1)\sigma_m(z_2)\sigma_q(z_3)$. Then the number of sheets of $\Sigma$ over a generic point $z$ is just the total number of indices used in the permutations,
$$s = n + m - k = \frac{1}{2}(n + m + q - 1). \tag{4.8}$$
Thus the genus of $\Sigma$ is
$$g = \frac{n-1}{2} + \frac{m-1}{2} + \frac{q-1}{2} - s + 1 = 0. \tag{4.9}$$
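The construction (4.4) to (4.9) can be verified by brute force. The sketch below (helper names are ours) builds $\sigma_n$ and $\sigma_m$ as in (4.4) and (4.5), composes them with $\sigma_m$ acting first (the convention that reproduces the cycle written in (4.6)), and checks that the result is a single cycle of length $q = n + m - 2k + 1$ on $s = n + m - k$ copies with genus zero.

```python
def as_map(cycle):
    """Dict sending each entry of `cycle` to the next one (cyclically)."""
    return {x: cycle[(i + 1) % len(cycle)] for i, x in enumerate(cycle)}

def apply_then(first, second):
    """Permutation x -> second(first(x))."""
    keys = set(first) | set(second)
    return {x: second.get(first.get(x, x), first.get(x, x)) for x in keys}

def nontrivial_cycles(perm):
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen or perm[start] == start:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        cycles.append(tuple(cyc))
    return cycles

def rotate_to_min(cyc):
    i = cyc.index(min(cyc))
    return cyc[i:] + cyc[:i]

def three_point_cover(n, m, k):
    """Returns (q, s, g) for sigma_n of (4.4) and sigma_m of (4.5) with k overlaps."""
    sigma_n = tuple(range(1, n + 1))                                       # (4.4)
    sigma_m = tuple(range(k, 0, -1)) + tuple(range(n + 1, n + m - k + 1))  # (4.5)
    cycles = nontrivial_cycles(apply_then(as_map(sigma_m), as_map(sigma_n)))
    assert len(cycles) == 1, "sigma_m sigma_n should be a single cycle"
    expected = tuple(range(k + 1, n + 1)) + (1,) + tuple(range(n + 1, n + m - k + 1))  # (4.6)
    assert rotate_to_min(cycles[0]) == rotate_to_min(expected)
    q = len(cycles[0])
    s = len(set(sigma_n) | set(sigma_m))
    g = ((n - 1) + (m - 1) + (q - 1)) // 2 - s + 1                        # (4.1)
    return q, s, g
```

For instance `three_point_cover(4, 3, 2)` gives `(4, 5, 0)`: a 4-cycle on five copies, with a genus zero cover.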
4.2. The map for the case $g = 0$. We are looking for a covering surface of the sphere that is ramified at three points on the sphere, with a finite order of ramification at each point. We look for the map from $z$ to $\Sigma$ as a ratio of two polynomials
$$z = \frac{f_1(t)}{f_2(t)}; \tag{4.10}$$
the existence of such a map will be evident from its explicit construction. By using the $SL(2,\mathbb{C})$ symmetry group of the $z$ sphere, we will place the twist operators $\sigma_n, \sigma_m, \sigma_q$ at $z = 0$, $z = a$, $z = \infty$ respectively. We can assume without loss of generality that
$$n \le q, \qquad m \le q. \tag{4.11}$$
Note that we had placed a cutoff in the $z$ plane to remove the region at infinity, and it will not be immediately clear how to normalize a twist that occurs around the circle at infinity. We will discuss this issue of normalization later. By making an $SL(2,\mathbb{C})$ transformation $t \to \frac{at+b}{ct+d}$ of the surface $\Sigma$, which we assume is parameterized by the coordinate $t$, we can take
$$z(t=0) = 0, \qquad z(t=\infty) = \infty, \qquad z(t=1) = a. \tag{4.12}$$
Note that this $SL(2,\mathbb{C})$ transformation maintains the form (4.10) of $z$ as a ratio of two polynomials, and we will use the symbols $f_1, f_2$ to denote the polynomials after the choice (4.12) has been made. Since we need $s$ values of $t$ for a generic value of $z$, with $s$ given by (4.8), the relation (4.10) should give a polynomial equation of order $s$ for $t$. Thus the degrees $d_1, d_2$ of the polynomials $f_1, f_2$ should satisfy
$$\max(d_1, d_2) = s = \frac{1}{2}(n + m + q - 1).$$
Since we have chosen $t = \infty$ for $z = \infty$, we get $d_1 > d_2$, and we have
$$d_1 = \frac{1}{2}(n + m + q - 1). \tag{4.13}$$
The requirement of the proper behavior at infinity ($z \sim t^q$) then gives
$$d_2 = d_1 - q = \frac{1}{2}(n + m - q - 1). \tag{4.14}$$
Finally, the number of indices common to the permutations $\sigma_n(0)$ and $\sigma_m(a)$ (the overlap) is
$$\frac{1}{2}\big(n + m - (q-1)\big) = d_2 + 1. \tag{4.15}$$
Let us now look at the structure required of the map (4.10). For $z \to 0$ we need
$$z = t^n\left(C_0 + O(t)\right). \tag{4.16}$$
Similarly for $t \to 1$ we need
$$z = a + (t-1)^m\left(C_1 + O(t-1)\right). \tag{4.17}$$
Then we find
$$f_2^2\,\frac{dz}{dt} = f_1'f_2 - f_2'f_1 = C\,t^{n-1}(1-t)^{m-1} \tag{4.18}$$
($C$ is a constant). The last step follows on noting that the expression $f_1'f_2 - f_2'f_1$ is a polynomial of degree $d_1 + d_2 - 1 = n + m - 2$, and the behavior of $z$ near $z = 0$ and $z = a$ already provides all the possible zeros of this polynomial $f_2^2\,dz/dt$. The expression in (4.18) is just the Wronskian of $f_1, f_2$, and our knowledge of this Wronskian gives an easy way to find these polynomials. We seek a second order linear differential equation whose solutions are the linear span $f = \alpha f_1 + \beta f_2$. Such an equation is found by observing that
$$\begin{vmatrix} f & f' & f'' \\ f_1 & f_1' & f_1'' \\ f_2 & f_2' & f_2'' \end{vmatrix} = 0, \tag{4.19}$$
so that we get the equation
$$W f'' - W' f' + c(t) f = 0, \tag{4.20}$$
where
$$W = f_2 f_1' - f_1 f_2', \qquad c(t) = f_2' f_1'' - f_1' f_2''. \tag{4.21}$$
Here $W$ is given by (4.18). The coefficient $-W'$ of $f'$ is
$$-W' = -C\,t^{n-2}(1-t)^{m-2}\left[(n-1) - (n+m-2)t\right]. \tag{4.22}$$
The coefficient $c(t)$ must be a polynomial of degree $n + m - 4$, but in fact we can argue further that it must have the form
$$\gamma\,t^{n-2}(1-t)^{m-2}, \qquad \gamma = \text{constant}. \tag{4.23}$$
To see this, look at the equation near $t = 0$. Let $c(t) \sim \alpha t^k$ with $k < n - 2$. Then the equation reads
$$t^{n-1-k}f'' - (n-1)t^{n-2-k}f' + \frac{\alpha}{C}f = 0. \tag{4.24}$$
Note that the two polynomials $f_1, f_2$ which solve the equation must not have a common root $t = 0$, since we assume that (4.10) is already expressed in reduced form. Thus at least one of the solutions must go like $f \sim$ constant at $t = 0$, which is in contradiction with (4.24), since the first two terms on the LHS vanish while the last does not ($\alpha \ne 0$ by definition). Thus $c(t)$ has a zero of order at least $n - 2$ at $t = 0$, and by a similar argument, a zero of order at least $m - 2$ at $t = 1$. Thus the result (4.23) follows. Dividing through by $C\,t^{n-2}(1-t)^{m-2}$ and writing $\tilde\gamma = \gamma/C$, we can write Eq. (4.20) as
$$t(1-t)f'' + \left[-(n-1) + (n+m-2)t\right]f' + \tilde\gamma f = 0. \tag{4.25}$$
Let us now look at $t \to \infty$, and let the solutions to the above equation go like $t^p$. Then we get
$$-p(p-1) + p(m+n-2) + \tilde\gamma = 0, \tag{4.26}$$
which has the solutions
$$p_\pm = \frac{1}{2}\left[m + n - 1 \pm \sqrt{(m+n-1)^2 + 4\tilde\gamma}\right]. \tag{4.27}$$
But since we have a twist operator of order $q$ at infinity, we must have
$$p_+ - p_- = q. \tag{4.28}$$
This gives
$$\tilde\gamma = \frac{1}{4}(q - m - n + 1)(q + m + n - 1) = -d_1 d_2. \tag{4.29}$$
Thus we have found the equation which is satisfied by both $f_1$ and $f_2$:
$$t(1-t)y'' + \left(-n + 1 - (-d_1 - d_2 + 1)t\right)y' - d_1 d_2\,y = 0, \tag{4.30}$$
which is the hypergeometric equation. Its general solution is given by
$$y = A\,F(-d_1, -d_2; -n+1; t) + B\,t^n F(-d_1+n, -d_2+n; n+1; t). \tag{4.31}$$
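The exponent argument (4.26) to (4.29) can be confirmed with exact arithmetic: once $\tilde\gamma$ is fixed by $p_+ - p_- = q$, the two exponents at infinity are exactly the degrees $d_1$ and $d_2$ of (4.13) and (4.14). A minimal sketch (function names are ours), restricted to triples obeying (4.11) and $q \le n + m - 1$:

```python
import math
from fractions import Fraction as F

def exponents_at_infinity(n, m, q):
    """Solve (4.26) with gamma~ fixed by (4.28); returns (d1, d2, gamma~, p_plus, p_minus)."""
    assert (n + m + q) % 2 == 1, "n + m + q must be odd so that d1, d2 are integers"
    d1 = (n + m + q - 1) // 2                         # (4.13)
    d2 = (n + m - q - 1) // 2                         # (4.14)
    gamma = F((q - m - n + 1) * (q + m + n - 1), 4)   # (4.29)
    disc = (m + n - 1) ** 2 + 4 * gamma               # radicand in (4.27)
    root = math.isqrt(int(disc))
    assert root * root == disc                        # the radicand is q^2
    p_plus, p_minus = F(m + n - 1 + root, 2), F(m + n - 1 - root, 2)
    for p in (p_plus, p_minus):                       # both roots solve (4.26)
        assert -p * (p - 1) + p * (m + n - 2) + gamma == 0
    return d1, d2, gamma, p_plus, p_minus

def consistent(n, m, q):
    d1, d2, gamma, pp, pm = exponents_at_infinity(n, m, q)
    return pp - pm == q and gamma == -d1 * d2 and (pp, pm) == (d1, d2)
```

For $n = m = q = 3$ this returns $d_1 = 4$, $d_2 = 1$, $\tilde\gamma = -4$ and exponents $4$ and $1$.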
The map we are looking for can be written as
$$z = a\,\frac{d_2!\,d_1!}{n!\,(d_1-n)!}\,\frac{\Gamma(1-n)}{\Gamma(1-n+d_2)}\;t^n\,\frac{F(-d_1+n, -d_2+n; n+1; t)}{F(-d_1, -d_2; -n+1; t)}, \tag{4.32}$$
where we have chosen the normalizations of $f_1, f_2$ such that $t = 1$ maps to $z = a$. In our case $d_1$, $d_2$ and $n$ are integers. Some of the individual terms in the above expression are undefined for integer $d_1, d_2, n$, and a limit should be taken from non-integer values of $n$ (while keeping $d_1, d_2$ fixed at their integer values). We can write the result in a well defined way by using Jacobi polynomials, which are a set of orthogonal polynomials defined through the hypergeometric function
$$P_n^{(\alpha,\beta)}(x) \equiv \binom{n+\alpha}{n}F\left(-n, n+\alpha+\beta+1; \alpha+1; \frac{1-x}{2}\right) = \frac{1}{n!}\sum_{\nu=0}^{n}\binom{n}{\nu}(n+\alpha+\beta+1)\cdots(n+\alpha+\beta+\nu)\cdot(\alpha+\nu+1)\cdots(\alpha+n)\left(\frac{x-1}{2}\right)^{\nu}. \tag{4.33}$$
Then (4.32) becomes
$$z = a\,t^n\,P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1-2t)\left[P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1-2t)\right]^{-1}. \tag{4.34}$$
We will have occasion to use the Wronskian of the polynomials later, and we define $\tilde W$ to be normalized as follows:
$$\tilde W(t) = \frac{d}{dt}\left[t^n P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1-2t)\right]P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1-2t) - t^n P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1-2t)\,\frac{d}{dt}P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1-2t) = \frac{n\,d_1!\,\Gamma(d_2-n+1)}{n!\,d_2!\,(d_1-n)!\,\Gamma(1-n)}\,t^{n-1}(1-t)^{d_1+d_2-n}. \tag{4.35}$$
We will also have occasion to use the relation (4.32) containing hypergeometric functions, and we define
$$W(t) = \frac{d}{dt}\left[t^n F(-d_1+n, -d_2+n; n+1; t)\right]F(-d_1, -d_2; -n+1; t) - t^n F(-d_1+n, -d_2+n; n+1; t)\,\frac{d}{dt}F(-d_1, -d_2; -n+1; t) = n\,t^{n-1}(1-t)^{d_1+d_2-n}. \tag{4.36}$$
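Because both hypergeometric series in (4.31) terminate for the integer parameters at hand, $f_1$ and $f_2$ can be built exactly and (4.36) checked as a polynomial identity. The sketch below is our own (exact fractions, coefficient lists with the lowest degree first); it also confirms the degree difference $d_1 - d_2 = q$ and that $f_2$ does not vanish at $t = 1$.

```python
from fractions import Fraction as F

def hyp_poly(a, b, c, deg):
    """Coefficients of F(a, b; c; t) up to t^deg (exact when the series terminates there)."""
    coeffs = [F(1)]
    for j in range(deg):
        coeffs.append(coeffs[-1] * (a + j) * (b + j) / ((c + j) * (j + 1)))
    return coeffs

def pmul(p, q):
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def pderiv(p):
    return [i * c for i, c in enumerate(p)][1:] or [F(0)]

def psub(p, q):
    size = max(len(p), len(q))
    p, q = p + [F(0)] * (size - len(p)), q + [F(0)] * (size - len(q))
    return [x - y for x, y in zip(p, q)]

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def check_map_polynomials(n, m, q):
    d1, d2 = (n + m + q - 1) // 2, (n + m - q - 1) // 2           # (4.13), (4.14)
    f2 = hyp_poly(-d1, -d2, 1 - n, d2)                            # F(-d1, -d2; 1-n; t)
    f1 = [F(0)] * n + hyp_poly(n - d1, n - d2, n + 1, d1 - n)     # t^n F(n-d1, n-d2; n+1; t)
    assert len(f1) - 1 == d1 and len(f2) - 1 == d2 and d1 - d2 == q  # order q at infinity
    assert sum(f2) != 0                                           # no pole at t = 1
    W = psub(pmul(pderiv(f1), f2), pmul(f1, pderiv(f2)))
    one_minus_t_pow = [F(1)]
    for _ in range(d1 + d2 - n):                                  # (1 - t)^(d1+d2-n) = (1 - t)^(m-1)
        one_minus_t_pow = pmul(one_minus_t_pow, [F(1), F(-1)])
    expected = [F(0)] * (n - 1) + [n * c for c in one_minus_t_pow]  # right-hand side of (4.36)
    return trim(W) == trim(expected)
```

For $n = m = q = 3$ this gives $f_2 = 1 - 2t$ and $f_1 = t^3(1 - t/2)$, whose Wronskian is $3t^2(1-t)^2$.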
We will calculate the three point function using the map (4.32), (4.34) in the next section.

5. The Liouville Action for the 3-Point Function

Let us evaluate the three point function
$$\langle\sigma_n(0)\sigma_m(a)\sigma_q(\infty)\rangle \tag{5.1}$$
using the map (4.32), (4.34). Recall that we cut circles of radius $\epsilon$ in the $z$ plane around the twist operators at $z = 0$ and $z = a$ to regularize these twist operators. But unlike the case of the 2-point function discussed in Sect. 3, now we have the twist operator $\sigma_q$ inserted at infinity. This means that the fields $X^I$ have boundary conditions around $z = \infty$ such that $q$ of the $X^I$ form a cycle under rotation around the circle, $X^{i_1} \to X^{i_2} \to \cdots \to X^{i_q} \to X^{i_1}$, while the remaining fields $X^I$ are single valued around this circle. Note that if the covering surface has $s$ sheets over a generic $z$ then there will be $s - q$ such single valued fields $X^I$.
The covering surface $\Sigma$ will have punctures at $t = 0$ and $t = 1$ corresponding to $z = 0$ and $z = a$ respectively. In addition it will have punctures corresponding to the “puncture at infinity” in the $z$ plane. These latter punctures are of two kinds. The first kind of puncture in the $t$ plane corresponds to the place where $q$ sheets meet in the $z$ plane, i.e., the lift of the point where the twist operator was inserted. But we will also have $s - q$ other punctures in the $t$ plane that correspond to the cut at $|z| = 1/\delta$ for the $X^I$ that are single valued around $z = \infty$. We will choose (when defining the “regular region”) a cutoff at the value $|z| = 1/\tilde\delta$ for the first kind of puncture (i.e. the puncture arising from fields $X^I$ that are twisted at $z = \infty$) and the value $|z| = 1/\delta$ for the second kind of puncture (i.e. punctures for fields $X^I$ which are not twisted at infinity). We will see that both $\delta$ and $\tilde\delta$ cancel from all final results.

5.1. The contribution from $z = 0$, $t = 0$. Let us first consider the point $z = 0$, which gives $t = 0$. Near this point the map (4.32) gives:
$$z \approx a\,\frac{d_2!\,d_1!}{n!\,(d_1-n)!}\,\frac{\Gamma(1-n)}{\Gamma(1-n+d_2)}\,t^n, \qquad t \approx \left[\frac{z\,n!\,(d_1-n)!\,\Gamma(1-n+d_2)}{a\,d_2!\,d_1!\,\Gamma(1-n)}\right]^{1/n}. \tag{5.2}$$
Note that by using the relation
$$\Gamma(x)\Gamma(1-x) = \frac{\pi}{\sin(\pi x)} \tag{5.3}$$
we can write
$$\frac{\Gamma(1-n)}{\Gamma(1-n+d_2)} = \frac{\Gamma(n-d_2)\,\sin\big(\pi(n-d_2)\big)}{\Gamma(n)\,\sin(\pi n)} = (-1)^{d_2}\,\frac{(n-d_2-1)!}{(n-1)!}, \tag{5.4}$$
so that the $\Gamma$ functions in the above expressions are in reality well defined. The Liouville field and its derivative are given by:
$$\phi \approx \log\left[\frac{n\,a\,d_2!\,d_1!}{n!\,(d_1-n)!}\,\frac{(n-d_2-1)!}{(n-1)!}\,t^{n-1}\right] + \text{c.c.}, \qquad \partial_t\phi \approx \frac{n-1}{t}, \tag{5.5}$$
where we have dropped the factor $(-1)^{d_2}$ in (5.4) since $\phi$ is the real part of the logarithm. Substituting these values into the expression for the Liouville action
$$S_L = \frac{i}{96\pi}\int dt\,\phi\,\partial_t\phi, \tag{5.6}$$
we get a contribution from the point $t = 0$:
$$S_L(t=0) = -\frac{n-1}{12}\log\left(n\,\epsilon^{\frac{n-1}{n}}\right) - \frac{n-1}{12n}\log\left|a\,\frac{d_2!\,d_1!}{n!\,(d_1-n)!}\,\frac{(n-d_2-1)!}{(n-1)!}\right|, \tag{5.7}$$
where we note that the integration in (5.6) is performed along the circle
$$|t| = \left[\frac{\epsilon\,n!\,(d_1-n)!\,(n-1)!}{|a|\,d_2!\,d_1!\,(n-d_2-1)!}\right]^{1/n}. \tag{5.8}$$
A simplification analogous to (5.4) will occur in many relations below, but for simplicity we leave the $\Gamma$ functions in the form where they have negative arguments; we replace them with factorials of positive numbers only in the final expressions.
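The limit in (5.4) can be watched numerically by evaluating the $\Gamma$ ratio slightly off the integer $n$, as the text prescribes. A small sketch with `math.gamma` (the offset `eps` and the function names are our choices):

```python
import math

def gamma_ratio_near(n, d2, eps=1e-7):
    """Gamma(1-x)/Gamma(1-x+d2) at x = n + eps, i.e. approaching the integer n."""
    x = n + eps
    return math.gamma(1 - x) / math.gamma(1 - x + d2)

def gamma_ratio_limit(n, d2):
    """Right-hand side of (5.4): (-1)^d2 (n-d2-1)!/(n-1)!, for 0 <= d2 <= n-1."""
    return (-1) ** d2 * math.factorial(n - d2 - 1) / math.factorial(n - 1)
```

Both $\Gamma$ functions sit next to poles, but their ratio stays finite; the relative deviation from the limit is of order `eps`.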
5.2. The contribution from $z = a$, $t = 1$. Let us look at the point $t = 1$. Using the expression for the Wronskian (4.36), we find the derivative of the map (4.32):
$$\frac{dz}{dt} = a\,\frac{d_2!\,d_1!}{n!\,(d_1-n)!}\,\frac{\Gamma(1-n)}{\Gamma(1-n+d_2)}\,\frac{n\,t^{n-1}(1-t)^{d_1+d_2-n}}{\left[F(-d_1, -d_2; -n+1; t)\right]^2}, \tag{5.9}$$
which can be combined with the known property of the hypergeometric function
$$F(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)} \tag{5.10}$$
to give the result:
$$z \approx a - \beta a(1-t)^{d_1+d_2-n+1}, \tag{5.11}$$
$$\beta = \frac{n}{d_1+d_2-n+1}\,\frac{d_1!\,d_2!\,(d_1-n)!}{n!\,\left[(d_1+d_2-n)!\right]^2}\,\frac{\Gamma(1-n+d_2)}{\Gamma(1-n)}. \tag{5.12}$$
Our usual analysis gives:
$$z \approx a - \beta a(1-t)^{d_1+d_2-n+1}, \qquad 1-t \approx \left(-\frac{z-a}{a\beta}\right)^{\frac{1}{d_1+d_2-n+1}},$$
$$\phi \approx \log\left[a\beta(d_1+d_2-n+1)(1-t)^{d_1+d_2-n}\right] + \text{c.c.}, \qquad \partial_t\phi \approx -\frac{d_1+d_2-n}{1-t}, \tag{5.13}$$
$$S_L(t=1) = -\frac{d_1+d_2-n}{12}\log(d_1+d_2-n+1) - \frac{(d_1+d_2-n)^2}{12(d_1+d_2-n+1)}\log\epsilon - \frac{d_1+d_2-n}{12(d_1+d_2-n+1)}\log(a|\beta|).$$
Note that $d_1 + d_2 - n + 1 = m$, so that we can rewrite the contribution from $t = 1$ in a way which makes it look more symmetrical with the contribution from $t = 0$. But we will defer all such simplifications to the final expressions for the fusion coefficients.

5.3. The contribution from $z = \infty$. To analyze the contribution from the point $t = \infty$ it is convenient to look at the map written in terms of Jacobi polynomials (4.34). Then one can use Rodrigues’ formula to represent the Jacobi polynomials in the form:
$$P_k^{(\alpha,\beta)}(x) = 2^{-k}\sum_{j=0}^{k}\binom{k+\alpha}{j}\binom{k+\beta}{k-j}(x-1)^{k-j}(x+1)^{j}. \tag{5.14}$$
The limit $x \to \infty$ gives, by the Chu-Vandermonde identity for the sum of binomials,
$$P_k^{(\alpha,\beta)}(x) \to x^k\,2^{-k}\sum_{j=0}^{k}\binom{k+\alpha}{j}\binom{k+\beta}{k-j} = x^k\,2^{-k}\binom{2k+\alpha+\beta}{k}. \tag{5.15}$$
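The finite sum (5.14) and the hypergeometric definition (4.33) should agree identically, and (5.15) is the Chu-Vandermonde identity for generalized binomials. A sketch with exact fractions (helper names are ours) checks both, including negative integer $\alpha$ of the kind that appears in the denominator of (4.34), provided $\alpha + 1, \ldots, \alpha + k$ are nonzero:

```python
from fractions import Fraction as F

def gbinom(r, j):
    """Generalized binomial coefficient r(r-1)...(r-j+1)/j! for rational r and integer j >= 0."""
    out = F(1)
    for i in range(j):
        out = out * (r - i) / (i + 1)
    return out

def jacobi_sum(k, alpha, beta, x):
    """Finite-sum form (5.14)."""
    return F(1, 2**k) * sum(gbinom(k + alpha, j) * gbinom(k + beta, k - j)
                            * (x - 1) ** (k - j) * (x + 1) ** j for j in range(k + 1))

def jacobi_hyp(k, alpha, beta, x):
    """Definition (4.33): C(k+alpha, k) F(-k, k+alpha+beta+1; alpha+1; (1-x)/2).
    Assumes alpha+1, ..., alpha+k are nonzero, so that no denominator vanishes."""
    u = (1 - x) / 2
    term = total = F(1)
    for j in range(k):
        term = term * (j - k) * (k + alpha + beta + 1 + j) / ((alpha + 1 + j) * (j + 1)) * u
        total += term
    return gbinom(k + alpha, k) * total

def leading_coefficient_ok(k, alpha, beta):
    """(5.15): sum_j C(k+alpha, j) C(k+beta, k-j) = C(2k+alpha+beta, k)."""
    lhs = sum(gbinom(k + alpha, j) * gbinom(k + beta, k - j) for j in range(k + 1))
    return lhs == gbinom(2 * k + alpha + beta, k)
```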
Substitution of this limit into the expression (4.34) gives the behavior near $t = \infty$:
$$z \approx a\gamma(-1)^{d_1-d_2-n}t^{d_1-d_2}, \qquad t \approx \left[(-1)^{d_1-d_2-n}\frac{z}{a\gamma}\right]^{\frac{1}{d_1-d_2}}, \qquad \gamma = \frac{d_2!\,(d_1-d_2-1)!}{(d_1-n)!\,(n-d_2-1)!}\,\frac{\Gamma(-d_1)}{\Gamma(d_2-d_1)},$$
$$\phi \approx \log\left[a\gamma(d_1-d_2)t^{d_1-d_2-1}\right] + \text{c.c.}, \qquad \partial_t\phi \approx \frac{d_1-d_2-1}{t}. \tag{5.16}$$
Consider first the point $z = \infty$, $t = \infty$. Recall that we have taken the “regular region” on $\Sigma$ to be bounded by the image of $|z| = 1/\tilde\delta$ (rather than $1/\delta$) when a twist operator is inserted. The contour around the puncture at infinity in the $t$ plane should be taken to go clockwise rather than anti-clockwise, so that it looks like a normal anti-clockwise contour in the local coordinate $t' = 1/t$ around the puncture. Thus to compute the contribution from this puncture we should follow our usual procedure but reverse the overall sign. The result reads:
$$S_L(t=\infty) = (-1)\left[-\frac{d_1-d_2-1}{12}\log(d_1-d_2) + \frac{(d_1-d_2-1)^2}{12(d_1-d_2)}\log\tilde\delta - \frac{d_1-d_2-1}{12(d_1-d_2)}\log(a|\gamma|)\right]. \tag{5.17}$$
Finally let us analyze the images of $z = \infty$ that give finite values $t_i$ of $t$. At each of these points the map $t \to z$ is one-to-one, in contrast to the above case $z = \infty$, $t = \infty$, where $q$ values of $t$ correspond to each value of $z$ in a neighborhood of the puncture. Further, there is no sign reversal for the contour of integration around these punctures when we use the coordinate $t$ to describe the contour. Looking at the structure of the map (4.34) one can easily identify the locations of the $t_i$: they coincide with zeroes of the denominator. So to evaluate the contribution to the Liouville action from the $t_i$ we will need some information about zeroes of Jacobi polynomials. Using the fact that Jacobi polynomials have only simple zeroes we can expand the map (4.34) around any of the $t_i$:
$$z \approx \frac{a\xi_i}{t-t_i}, \tag{5.18}$$
$$\xi_i = \frac{t_i^n\,P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1-2t_i)}{\left.\frac{d}{dt}P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1-2t)\right|_{t=t_i}}. \tag{5.19}$$
Then everything can be evaluated in terms of $\xi_i$:
$$t - t_i \approx \frac{a\xi_i}{z}, \qquad \phi \approx \log\frac{-a\xi_i}{(t-t_i)^2} + \text{c.c.}, \qquad \partial_t\phi \approx -\frac{2}{t-t_i}, \tag{5.20}$$
$$S_L(t=t_i) = -\frac{1}{6}\log(\delta^2 a\xi_i).$$
Collecting the contributions from all the $t_i$ we get:
$$S_L(\text{all } t_i) = -\frac{d_2}{6}\log(\delta^2 a) - \frac{1}{6}\log\prod_{i=1}^{d_2}\xi_i, \tag{5.21}$$
and we only need to evaluate the product of the $\xi_i$. Note that the regularization parameter $\delta$ we use here has the same meaning as the one considered in Sect. 3. This product can be written in terms of the Wronskian (4.35) and the discriminant of Jacobi polynomials. To see this we first rewrite (5.19) in terms of zeroes of Jacobi polynomials. If $z = P/Q$, then the Wronskian (4.35) is $\tilde W = P'Q - PQ' = -PQ'$ at a zero of $Q$. Writing any of the $\xi_i$ as $\xi = P/Q' = PQ'/Q'^2$ we find
$$\xi_i = -\tilde W(t_i)\left[-2a_0\prod_{j\ne i}(x_i - x_j)\right]^{-2}, \tag{5.22}$$
where $a_0$ is the coefficient in front of the highest power in the polynomial $P_{d_2}^{(-n,\,-d_1-d_2+n-1)}$; it can be evaluated using (5.15). The $x_i$ are the zeros of the polynomial $Q(x)$ in the denominator. Applying the general definition of the discriminant to Jacobi polynomials,
$$D_{d_2}^{(-n,\,-d_1-d_2+n-1)} \equiv a_0^{2d_2-2}\prod_{i<j}(x_i - x_j)^2, \tag{5.23}$$
we get
$$\prod_{i=1}^{d_2}\xi_i = (-1)^{d_2}\,2^{-2d_2}\,a_0^{2(d_2-2)}\left[D_{d_2}^{(-n,\,-d_1-d_2+n-1)}\right]^{-2}\prod_{i=1}^{d_2}\tilde W(t_i). \tag{5.24}$$
The discriminant of Jacobi polynomials can be evaluated [17]:
$$D \equiv D_{d_2}^{(-n,\,-d_1-d_2+n-1)} = 2^{-d_2(d_2-1)}\prod_{j=1}^{d_2}j^{\,j+2-2d_2}(j-n)^{j-1}(j-d_1-d_2+n-1)^{j-1}(j-d_1-1)^{d_2-j}. \tag{5.25}$$
To evaluate the right-hand side of (5.24) we only need the expressions for
$$\prod_{i=1}^{d_2}t_i \qquad\text{and}\qquad \prod_{i=1}^{d_2}(1-t_i). \tag{5.26}$$
Let us consider the general Jacobi polynomial:
$$P_k^{(\alpha,\beta)}(1-2t) = (-2)^k a_0 t^k + \cdots + a_{k+1} = (-2)^k b_0 (t-1)^k + \cdots + b_{k+1}. \tag{5.27}$$
Obviously $b_0 = a_0$. By taking the limits $t \to \infty$, $t \to 0$ and $t \to 1$ in the above expression we find:
$$\prod_{i=1}^{k}t_i = \frac{a_{k+1}}{2^k a_0} = \frac{\Gamma(k+\alpha+1)\,\Gamma(k+\alpha+\beta+1)}{\Gamma(\alpha+1)\,\Gamma(2k+\alpha+\beta+1)}, \tag{5.28}$$
$$\prod_{i=1}^{k}(1-t_i) = (-1)^k\,\frac{b_{k+1}}{2^k b_0} = \frac{\Gamma(k+\beta+1)\,\Gamma(k+\alpha+\beta+1)}{\Gamma(\beta+1)\,\Gamma(2k+\alpha+\beta+1)}. \tag{5.29}$$
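The root products (5.28) and (5.29) follow from the constant term and the value at $t = 1$ of $P_k^{(\alpha,\beta)}(1-2t)$. The sketch below (our own helpers, exact fractions for the coefficients and `math.gamma` for the right-hand sides, valid for $\alpha, \beta > -1$) checks both, and also shows the sign: with the expansion (5.27), $b_{k+1}/(2^k b_0)$ equals $\prod_i (t_i - 1)$, which is why a factor $(-1)^k$ appears in front of it in (5.29).

```python
import math
from fractions import Fraction as F

def gbinom(r, j):
    out = F(1)
    for i in range(j):
        out = out * (r - i) / (i + 1)
    return out

def jacobi_in_t(k, alpha, beta):
    """Coefficients (lowest degree first) of P_k^(alpha,beta)(1 - 2t), via (5.14).

    With x = 1 - 2t: x - 1 = -2t and x + 1 = 2(1 - t), so the 2^-k prefactor cancels.
    """
    c = [F(0)] * (k + 1)
    for j in range(k + 1):
        w = gbinom(k + alpha, j) * gbinom(k + beta, k - j) * (-1) ** (k - j)
        for i in range(j + 1):                       # expand t^(k-j) (1-t)^j
            c[k - j + i] += w * math.comb(j, i) * (-1) ** i
    return c

def root_products(k, alpha, beta):
    """(prod t_i, prod (1 - t_i)) over the zeros t_i of P_k^(alpha,beta)(1 - 2t)."""
    c = jacobi_in_t(k, alpha, beta)
    lead = c[k]                                      # equals (-2)^k a_0
    prod_t = (-1) ** k * c[0] / lead                 # c[0] = P(1) = lead * prod(-t_i)
    prod_1mt = sum(c) / lead                         # sum(c) = P(-1) = lead * prod(1 - t_i)
    return prod_t, prod_1mt

def b_ratio(k, alpha, beta):
    """b_{k+1} / (2^k b_0) with b_0 = a_0 and b_{k+1} = P(-1), as in (5.27)."""
    c = jacobi_in_t(k, alpha, beta)
    a0 = c[k] / (-2) ** k
    return sum(c) / (2**k * a0)

def gamma_forms(k, alpha, beta):
    """Right-hand sides of (5.28) and (5.29), for alpha, beta > -1."""
    a, b, g = float(alpha), float(beta), math.gamma
    common = g(k + a + b + 1) / g(2 * k + a + b + 1)
    return g(k + a + 1) / g(a + 1) * common, g(k + b + 1) / g(b + 1) * common
```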
Collecting all contributions together, we get
$$\log\prod_{i=1}^{d_2}\xi_i = -2d_2(d_2-1)\log 2 + d_2\log n - 2\log D - (3d_2-4)\log d_2! + d_2\log\frac{d_1!}{n!\,(d_1-n)!} + (n+d_2-1)\log\frac{(n-1)!}{(n-d_2-1)!} + (d_1-d_2+3)\log\frac{(d_1-d_2)!}{d_1!} + (d_1+d_2-n)\log\frac{(d_1+d_2-n)!}{(d_1-n)!}. \tag{5.30}$$
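Formula (5.30), together with the discriminant (5.25), can be tested against a direct evaluation of (5.19): find the zeros $t_i$ of the denominator of (4.34) numerically and multiply the $\xi_i$. The sketch below is ours: it uses a Durand-Kerner iteration as a stand-in root finder, takes $|D|$ inside the logarithm, and is only exercised on a few small cases $(n, d_1, d_2)$ coming from the triples $(n, m, q) = (3,3,3), (3,4,4), (4,4,5), (5,5,5)$.

```python
import math
from fractions import Fraction as F

def gbinom(r, j):
    out = F(1)
    for i in range(j):
        out = out * (r - i) / (i + 1)
    return out

def jacobi_in_t(k, alpha, beta):
    """Coefficients (lowest degree first) of P_k^(alpha,beta)(1 - 2t), from the finite sum (5.14)."""
    c = [F(0)] * (k + 1)
    for j in range(k + 1):
        w = gbinom(k + alpha, j) * gbinom(k + beta, k - j) * (-1) ** (k - j)
        for i in range(j + 1):
            c[k - j + i] += w * math.comb(j, i) * (-1) ** i
    return c

def polyval(c, t):
    return sum(complex(ci) * t**i for i, ci in enumerate(c))

def roots(c, iters=500):
    """Durand-Kerner iteration for all complex roots of the polynomial with coefficients c."""
    monic = [complex(ci) / complex(c[-1]) for ci in c]
    z = [(0.4 + 0.9j) ** k for k in range(len(c) - 1)]
    for _ in range(iters):
        z = [zi - polyval(monic, zi) / math.prod(zi - zj for j, zj in enumerate(z) if j != i)
             for i, zi in enumerate(z)]
    return z

def log_xi_product_direct(n, d1, d2):
    beta = -d1 - d2 + n - 1
    num = jacobi_in_t(d1 - n, n, beta)          # numerator polynomial of (4.34)
    den = jacobi_in_t(d2, -n, beta)             # denominator of (4.34); its zeros are the t_i
    dden = [i * ci for i, ci in enumerate(den)][1:]
    prod = 1
    for ti in roots(den):
        prod *= ti**n * polyval(num, ti) / polyval(dden, ti)   # xi_i as in (5.19)
    return math.log(abs(prod))

def log_xi_product_formula(n, d1, d2):
    """Right-hand side of (5.30), with the discriminant D taken from (5.25)."""
    lf = lambda k: math.lgamma(k + 1)           # log k!
    D = 2.0 ** (-d2 * (d2 - 1))
    for j in range(1, d2 + 1):
        D *= (float(j) ** (j + 2 - 2 * d2) * (j - n) ** (j - 1)
              * (j - d1 - d2 + n - 1) ** (j - 1) * (j - d1 - 1) ** (d2 - j))
    return (-2 * d2 * (d2 - 1) * math.log(2) + d2 * math.log(n) - 2 * math.log(abs(D))
            - (3 * d2 - 4) * lf(d2) + d2 * (lf(d1) - lf(n) - lf(d1 - n))
            + (n + d2 - 1) * (lf(n - 1) - lf(n - d2 - 1))
            + (d1 - d2 + 3) * (lf(d1 - d2) - lf(d1))
            + (d1 + d2 - n) * (lf(d1 + d2 - n) - lf(d1 - n)))
```

For $n = m = q = 3$ (so $d_1 = 4$, $d_2 = 1$) the single pole sits at $t_1 = 1/2$ and $\xi_1 = 3/32$.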
5.4. The total Liouville action. Collecting the contributions from the different branching points we obtain the final expression for the Liouville action
$$\begin{aligned}
S_L^{(1)} = {}& -\left[\frac{(n-1)^2}{12n} + \frac{(d_1+d_2-n)^2}{12(d_1+d_2-n+1)}\right]\log\epsilon - \frac{(d_1-d_2-1)^2}{12(d_1-d_2)}\log\tilde\delta - \frac{d_2}{3}\log\delta \\
& - \left[\frac{n-1}{12n} + \frac{d_1+d_2-n}{12(d_1+d_2-n+1)} - \frac{d_1-d_2-1}{12(d_1-d_2)} + \frac{d_2}{6}\right]\log a \\
& - \frac{n-1}{12}\log n - \frac{d_1+d_2-n}{12}\log(d_1+d_2-n+1) + \frac{d_1-d_2-1}{12}\log(d_1-d_2) \\
& - \frac{n-1}{12n}\log\left[\frac{d_1!\,d_2!}{n!\,(d_1-n)!}\,\frac{\Gamma(1-n)}{\Gamma(1-n+d_2)}\right] - \frac{d_1+d_2-n}{12(d_1+d_2-n+1)}\log|\beta| + \frac{d_1-d_2-1}{12(d_1-d_2)}\log|\gamma| - \frac{1}{6}\log\prod_{i=1}^{d_2}\xi_i.
\end{aligned}\tag{5.31}$$
The values of $\beta$ and $\gamma$ are given by (5.12) and (5.16), and the last term is given through (5.30). According to (2.44), the three point function is given by:
$$\langle\sigma_n(0)\sigma_m(a)\sigma_q^{\tilde\delta}(\infty)\rangle_\delta = e^{S_L^{(1)}}\,e^{S_L^{(2)}+S_L^{(3)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^s}, \tag{5.32}$$
where $s$ is the number of fields involved in the permutation; it is defined by (4.8). Note that we have not determined the values of $S_L^{(2)}$ and $S_L^{(3)}$ for the case under consideration; as we will see, these quantities will cancel in the final answer.

6. Normalizing the Twist Operators

This Liouville action (5.31) yields the correlation function for twist operators with the regularization parameters $\epsilon, \delta, \tilde\delta$. We immediately see that the power of $a$ in the correlator is
$$a:\quad -\left[\frac{n-1}{12n} + \frac{d_1+d_2-n}{12(d_1+d_2-n+1)} - \frac{d_1-d_2-1}{12(d_1-d_2)} + \frac{d_2}{6}\right] = -2\left(\Delta_n + \Delta_m - \Delta_q\right) = -\frac{1}{12}\left(n - \frac{1}{n} + m - \frac{1}{m} - q + \frac{1}{q}\right), \tag{6.1}$$
which agrees with the expected $a$ dependence of the 3-point function
$$\langle\sigma_n(0)\sigma_m(a)\sigma_q^{\tilde\delta}(\infty)\rangle \sim |a|^{-2(\Delta_m+\Delta_n-\Delta_q)}. \tag{6.2}$$
To obtain the final correlation functions and fusion coefficients we have two sources of renormalization coefficients that need to be considered: (a) We have to normalize the operators $\sigma_n$ such that their 2-point functions are set to unity at unit $z$ separation; at this point we should find that the parameters $\epsilon, \delta, \tilde\delta$ disappear from the 3-point (and higher point) functions as well. After we normalize the twist operators $\sigma_n$ in this way we will call them $\sigma_n$.
(b) The CFT had $N$ fields $X^I$, though only $n$ of them are affected by the twist operator $\sigma_n$. However, at the end of the calculation of any correlation function of the operators $\sigma_{n_i}$ we must sum over all the possible ways that the $n_i$ fields that are twisted can be chosen from the total set of $N$ fields. Thus we will have to define operators $O_n$ that are sums over conjugacy classes of the permutation group, and these operators $O_n$ are the only ones that will finally be well defined operators in the CFT [10]. The correctly normalized $O_n$ will thus have combinatoric factors multiplying the normalized operators $\sigma_n$. We choose to arrive at the final normalized operators $O_n$ in these two steps since the calculations involved in steps (a) and (b) are quite different; further when $n_i$ …

… $|z| > 1/\delta$ region: there is no curvature on the torus $t$ and
$$\frac{d\tilde z}{dt} = -\frac{1}{\delta^2 z^2}\frac{dz}{dt} \approx -\frac{\alpha}{\delta^2}, \tag{7.47}$$
giving a constant $\phi$ to leading order and thus a vanishing kinetic term for $\phi$. Thus $S_L = S_L^{(1)}$ and the general expression (2.44) gives:
$$\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2(z_\infty)\rangle_\delta = e^{S_L^{(1)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^2}. \tag{7.48}$$
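To obtain the normalized 4-point function we write
$$\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2(\infty)\rangle \equiv \lim_{|z_\infty|\to\infty}|z_\infty|^{4\Delta_2}\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2(z_\infty)\rangle = \lim_{|z_\infty|\to\infty}|z_\infty|^{4\Delta_2}\,\frac{\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2^{\tilde\delta}(z_\infty)\rangle}{\langle\sigma_2(0)\sigma_2(1)\rangle^2} = 2^{-2/3}\,|w(1-w)|^{-\frac{1}{12}}\,Z_\tau. \tag{7.49}$$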
Here we have used the fact that in this case the partition function $Z^{(\hat s)}$ in (2.16) is that on the flat torus with modular parameter $\tau$ given through (7.38). Since the group $S_2$ equals the group $\mathbb{Z}_2$, we can compare (7.49) with the 4-point function obtained for $\sigma_2$ operators for the $\mathbb{Z}_2$ orbifold in [9]. Equation (7.49) agrees with (4.13) of [9] for the case of a noncompact boson field and with (4.16) for the compact boson field. One observes that if the fields $X^i$ are noncompact bosons, then as $w \to 1$ we find a factor $\log(w-1)$ in the OPE in addition to the expected power $(w-1)^{1/4}$. We suggest the following interpretation of this logarithm. There is a continuous family of momentum modes for the noncompact boson, with energy going to zero. If we do not orbifold the target space, then momentum conservation allows only a definite momentum mode to appear in the OPE of two fields. But the orbifolding destroys the translation invariance in $X_1 - X_2$, and nonzero momentum modes can be exchanged between sets of operators where each set does not carry any net momentum charge. The exchange of such modes (with dimensions accumulating to zero) between the pair $\sigma_2(w)\sigma_2(1)$ and the pair $\sigma_2(0)\sigma_2(\infty)$ gives rise to the logarithm. Of course when the boson is compact, this logarithm disappears, as can be verified from (7.49) or the equivalent results in [9].

8. Discussion

The motivation for our study of correlation functions of symmetric orbifolds was the fact that the dual of the $AdS_3 \times S^3 \times M$ spacetime (which arises in black hole studies) is the CFT arising from the low energy limit of the D1–D5 system, and the D1–D5 system is believed to be a deformation of an orbifold CFT (with the undeformed orbifold as a special point in moduli space). To study this duality we must really study the supersymmetric orbifold theory, while in this paper we have just studied the bosonic theory. It turns out, however, that the supersymmetric orbifold can be studied with only
a small extension of what we have done here. Following [14] we can bosonize the fermions. Then if we go to the covering space near the insertion of a twist operator, we find only the following difference from the bosonic case: at the location of the twist operator we do not have the identity operator, but instead a “charge operator” of the form $P(\partial X_a, \partial^2 X_a, \ldots)\,e^{i k_a X_a}$. Here $X_a$ are the bosons that arise from bosonizing the complex fermions, and $P$ is a polynomial expression in its arguments. It is easy to compute the correlation function of these charge operators on the covering space $\Sigma$, and then we have the twist correlation functions for the supersymmetric theory. We will present this calculation elsewhere, but here we note that many properties of interest for the supersymmetric correlation functions can already be seen from the bosonic analysis that we have done here. In this section we recall the features of the AdS/CFT duality map and analyse some properties of the 3-point functions in the CFT.
8.1. “Universality” of the correlation functions. We have mentioned before that while we have discussed the orbifold theory $\mathbb{R}^N/S_N$ (where the coordinate of $\mathbb{R}$ gave $X$, a real scalar field), we could replace the CFT of $X$ by any other CFT of our choice, and the calculations performed here would remain essentially the same. When the covering surface $\Sigma$ had genus zero, the results depend only on the value of $c$, and thus if we had the $(T^4)^N/S_N$ theory or a $(K3)^N/S_N$ theory, then we would simply choose $c = 4$ in the Liouville action (2.17) (instead of $c = 1$). If $\Sigma$ had $g = 1$, then we would need to put in the partition function (of a single copy) of $T^4$ or $K3$ for the value of $Z^{(\hat s)}$ in (2.16). But apart from the value of $c$ and the value of partition functions on $\Sigma$ there is no change in the calculation. Thus in particular the 3-point functions that we have computed at genus zero are universal in the sense that if we take them to the power $c$ then we get the 3-point functions for any CFT of the form $M^N/S_N$ with the CFT on $M$ having central charge $c$. There is a small change in the calculation when we consider the supersymmetric case. The fermions from different copies of $M$ anticommute, and the twist operators carry a representation of the $R$ symmetry. As a consequence the dimensions of the twist operators are not given by (3.21), but for the supersymmetric theory based on $M = T^4$ are given by $\frac{1}{2}(n-1)$. However, as mentioned above, our analysis can be extended with small modifications to such theories as well. Note that our method does not work if we have an orbifold group other than $S_N$. Thus for example if we had a $\mathbb{Z}_N$ orbifold of a complex boson [9], then we could go to the covering space over a twist operator $\sigma_n$, but not write the CFT in terms of an unconstrained field on this covering space.
The reason is that we have n sheets or more of the cover over any point in the base space, but the central charge of the theory is just 2, and so we cannot attribute one scalar field to each sheet of the cover. Thus our method, and its associated universalities, are special to SN orbifolds, where a twist operator just permutes copies of a given CFT but does not exploit any special symmetry of the CFT itself.
8.2. The genus expansion and the fusion rules of WZW models. We have studied the orbifold CFT on the plane, but found that the correlation functions can be organized in a genus expansion, arising from the genus of the covering surface. In the large $N$ limit the contribution of a higher genus surface goes like $1/N^{g+\frac{1}{2}}$. This situation is similar to that in the Yang–Mills theory that is dual to $AdS_5 \times S^5$. The Yang–Mills theory has correlation functions that can be expanded in a genus expansion, with higher
Correlation Functions for M N /SN Orbifolds
genus surfaces suppressed by $1/N^g$. In the Yang–Mills theory the genus expansion has its origins in the structure of Feynman diagrams for fields carrying two indices (the “double line representation” of gauge bosons). In our case the genus expansion has quite a different origin. In the case of $AdS_5 \times S^5$ it is believed that the genus expansion of the dual Yang–Mills theory is related to the genus expansion of string theory on this spacetime, though the precise relationship is not clear. It would be interesting if the genus expansion we have for the D1–D5 CFT were related to the genus expansion of the string theory on $AdS_3 \times S^3 \times M$. In this context we observe the following relation. It was argued in [7] that the orbifold CFT $M^N/S_N$ indeed corresponds to a point in the D1–D5 system moduli space. Further, at this point the numbers of 1-branes ($n_1$) and of 5-branes ($n_5$) are given by $n_5 = 1$, $N = n_1 n_5 = n_1$. The dual string theory is in general an SU(2) Wess–Zumino–Witten (WZW) model [18], though at the orbifold point of the CFT this string theory is complicated to analyze. The twist operators $\sigma_n$, $n = 1, \ldots, N$, of the CFT ($\sigma_1$ = identity) correspond to WZW primaries with $j = (n-1)/2$, $0 \le j \le \frac{N-1}{2}$. Since in a usual WZW model we have $0 \le j \le k/2$, we set $k = N - 1$. The fusion rules for the WZW model, which give the 3-point functions of the string theory on the sphere (tree level), are as follows. The spins $j$ follow the rules for spin addition in SU(2), except that there is also a “truncation from above”:
$$(j_1, j_2) \to j_3, \qquad |j_1 - j_2| \le j_3 \le |j_1 + j_2|, \qquad j_1 + j_2 + j_3 \le k. \qquad (8.1)$$
Now consider the 3-point function in the orbifold CFT for the case where the genus of the covering surface is $g = 0$. The ramification order of the covering surface at the insertion of $\sigma_{n_i}$ is $r_i = n_i - 1 = 2j_i$. The rules in (4.4), (4.5), (4.6) translate to $|j_1 - j_2| \le j_3 \le |j_1 + j_2|$. Further, the number of sheets $s$ is bounded as $s \le N$. Then the relation (4.1) gives
$$\frac{1}{2}\sum_i r_i = g - 1 + s \le -1 + N \quad\Longrightarrow\quad j_1 + j_2 + j_3 \le k. \qquad (8.2)$$
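As a bookkeeping check of the correspondence between (8.1) and (8.2), the following Python sketch compares the two selection rules on all triples of twist operators for small N. It is our illustration, not the authors' computation, and it rests on stated assumptions: spins are encoded as integers $a_i = 2j_i = n_i - 1$; the parity condition $j_1 + j_2 + j_3 \in \mathbb{Z}$ is taken from SU(2) spin addition (it matches the requirement that $\frac{1}{2}\sum_i r_i$ be an integer in (4.1)); and the triangle inequality quoted above stands in for the detailed conditions (4.4)–(4.6), which are not reproduced here. The function names are hypothetical.

```python
from itertools import product


def wzw_fusion_allowed(a1: int, a2: int, a3: int, k: int) -> bool:
    """SU(2)_k fusion rule (8.1), with a_i = 2 j_i (twice the spins)."""
    if (a1 + a2 + a3) % 2:                      # j1 + j2 + j3 must be an integer
        return False
    triangle = abs(a1 - a2) <= a3 <= a1 + a2
    return triangle and a1 + a2 + a3 <= 2 * k   # truncation from above


def orbifold_genus0_allowed(n1: int, n2: int, n3: int, N: int) -> bool:
    """Genus-0 rule (8.2) for sigma_{n_i} in M^N/S_N (assumed triangle rule).

    Uses r_i = n_i - 1 and (1/2) sum r_i = g - 1 + s with g = 0,
    together with the bound s <= N on the number of sheets.
    """
    r = (n1 - 1, n2 - 1, n3 - 1)
    if sum(r) % 2:                              # (1/2) sum r_i must be an integer
        return False
    triangle = abs(r[0] - r[1]) <= r[2] <= r[0] + r[1]
    sheets = sum(r) // 2 + 1                    # s = (1/2) sum r_i + 1 - g, g = 0
    return triangle and sheets <= N


def rules_agree(N: int) -> bool:
    """True if (8.1) with k = N - 1 and (8.2) select the same triples."""
    k = N - 1
    for n1, n2, n3 in product(range(1, N + 1), repeat=3):
        a = (n1 - 1, n2 - 1, n3 - 1)            # a_i = 2 j_i = r_i
        if wzw_fusion_allowed(*a, k) != orbifold_genus0_allowed(n1, n2, n3, N):
            return False
    return True


print(all(rules_agree(N) for N in range(1, 9)))  # True
```

Because $s \le N$ is equivalent to $j_1 + j_2 + j_3 \le N - 1 = k$ once $r_i = 2j_i$, the agreement holds by construction at genus zero; the check confirms the bookkeeping rather than providing independent evidence for the correspondence.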
While (8.2) is a relation for the bosonic orbifold theory, we expect an essentially similar relation for the supersymmetric case. Thus we observe a similarity between the $g = 0$ 3-point functions of the WZW model (8.1) and of the CFT (8.2). At genus $g = 1$, however, we find that any three spins $j_1, j_2, j_3$ can give a nonzero 3-point function in the string theory. In the orbifold CFT, on the other hand, we get only a slight relaxation of the rule (8.2): $j_1 + j_2 + j_3 \le k + 1$. Roughly speaking, we can reproduce this rule in the string theory if we require that in the string theory one-loop diagram there be a way to draw the lines such that only spins $j \le 1/2$ are allowed to circulate in the loop. Of course we are outside the domain of any good perturbation expansion at this point: if the spins are of order $k$, there is no small parameter in the theory to expand in, and thus there is no requirement that there be an exact relation between the rules in a WZW string theory and the rules in the orbifold CFT. We note that in [14] the 3-point functions of chiral primaries that were studied had “one overlap” in their indices. This corresponds to $j_1 + j_2 = j_3$ in the above fusion rules, and since for the supersymmetric case the dimension is linear in the charge, we also have $\Delta_1 + \Delta_2 = \Delta_3$. This corresponds to the case of “extremal” correlation functions in the language of [20]. In [14] the 3-point correlators for this special case were found by an elegant recursion relation, which arises from the fact that there is no singularity in the OPE, and thus the duality relation of conformal blocks becomes a “chiral ring” type of
O. Lunin, S. D. Mathur
associativity law among the fusion coefficients. It is not clear, however, how to extend this method to the non-extremal case, and one motivation for the present work was to develop a scheme to compute the correlators for $j_3 < j_1 + j_2$, which corresponds to more than one overlap. In the case of one overlap we have extended our calculations to the supersymmetric case and found results in agreement with [14].
8.3. 3-point couplings and the stringy exclusion principle. In the $AdS_5 \times S^5$ case the 3-point couplings of supergravity agree with the large $N$ limit of the 3-point functions in the free Yang–Mills theory; thus there is a nonrenormalization of this correlator as the coupling $g$ is varied. It is not clear if a similar result holds for the $AdS_3 \times S^3 \times M$ case, and even less clear what nonrenormalization theorems hold at finite $N$. But it is nevertheless interesting to ask how the correlators in the orbifold CFT behave as we go from infinite $N$ to finite $N$, and in particular what happens as we approach the limits of the stringy exclusion principle. Thus we examine the ratio
$$R(m, n, q; \bar{N}) \equiv \frac{\sqrt{\bar{N}}\,\langle O_n O_m O_q \rangle_{\bar{N}}}{\lim_{N\to\infty} \sqrt{N}\,\langle O_n O_m O_q \rangle_N}, \qquad (8.3)$$
where the subscripts on the correlators give the value of $N$. We have rescaled the correlators by $\sqrt{N}$ to obtain the effective coupling of the 3-point function; the correlator itself goes as $1/\sqrt{N}$. For n, m, q 0 : T n x ∈ A} = ∞ for µ-almost every point x ∈ A. Unfortunately this information is only of a qualitative nature. In particular, it does not address the following natural problems: 1. with which frequency an orbit visits a given set of positive measure; 2. with which rate a given point returns to an arbitrarily small neighborhood of itself.
L. B. was partially supported by FCT’s Funding Program, and grants PRAXIS XXI 2/2.1/MAT/199/94 and NATO CRG970161. B. S. was supported by FCT’s Funding Program and by the Center for Mathematical Analysis, Geometry, and Dynamical Systems.
L. Barreira, B. Saussol
The Birkhoff Ergodic Theorem provides a comprehensive answer to the first problem. The second problem has received considerable and growing interest during the last decade, also in connection with other fields, including compression algorithms, the numerical study of dynamical systems, and applications in linguistics. In particular, there exist several results that partially answer this problem, including the noteworthy work of Boshernitzan [3] and of Ornstein and Weiss [8] (see Sects. 2 and 4 for details). The purpose of this paper is to provide a comprehensive answer to the above-mentioned problem 2, concerning the quantitative behavior of recurrence. In particular, our results are non-trivial generalizations of the above-mentioned results of Boshernitzan, and provide a dimensional version of the work of Ornstein and Weiss for the entropy (see Sects. 2 and 4 for explanations and examples). We emphasize that our approach uses different techniques. In particular, we obtain a new proof of one of the main results of Boshernitzan in [3]. We now illustrate our results with a rigorous statement; see Sect. 4 for details. We shall prove that if µ is an ergodic Gibbs measure of a $C^{1+\alpha}$ diffeomorphism $f$ on a locally maximal hyperbolic set, then
$$\lim_{r\to 0} \frac{\log \inf\{k > 0 : f^k x \in B(x, r)\}}{-\log r} = \lim_{r\to 0} \frac{\log \mu(B(x, r))}{\log r} \qquad (1)$$
for µ-almost every point x, where B(x, r) is the ball of radius r centered at x. Note that the identity (1) relates two quantities of a very different nature, called respectively the recurrence rate and the pointwise dimension. In particular, only the first quantity depends on the diffeomorphism, while only the second quantity depends on the measure. Furthermore, our results motivate the introduction of a new method to compute the Hausdorff dimension of a measure. The structure of the paper is as follows. The main statements and inequalities relating the lower and upper pointwise dimensions and the lower and upper recurrence rates are formulated and discussed in Sects. 2 and 3. We also present examples which indicate that the hypotheses in our results are optimal. In Sect. 4 we apply those results to the case of equilibrium measures supported on locally maximal hyperbolic sets, and establish the identity (1) for µ-almost every point. Section 5 contains an application to suspension flows. The proofs are collected in Sect. 6.
2. Lower Bounds for the Pointwise Dimension
Let $T\colon X \to X$ be a Borel measurable transformation on the separable metric space $(X, d)$. Note that $T$ is not necessarily invertible. We define the return time of a point $x \in X$ into the open ball $B(x, r)$ by
$$\tau_r(x) \overset{\text{def}}{=} \inf\{k \in \mathbb{N} : T^k x \in B(x, r)\} = \inf\{k \in \mathbb{N} : d(T^k x, x) < r\},$$
where $\mathbb{N}$ denotes the set of positive integers. We also define the lower and upper recurrence rates of $x$ by
$$\underline{R}(x) = \liminf_{r\to 0} \frac{\log \tau_r(x)}{-\log r} \quad\text{and}\quad \overline{R}(x) = \limsup_{r\to 0} \frac{\log \tau_r(x)}{-\log r}.$$
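The definitions of $\tau_r(x)$ and of the recurrence rates translate directly into code. The sketch below is only illustrative and its test system is our own assumption: the circle rotation by the golden mean $\omega = (\sqrt{5} - 1)/2$ with the arc-length distance, which is an isometry rather than a hyperbolic map. Since $\tau_r(x)$ is an infimum over all of $\mathbb{N}$, the search is capped by a hypothetical `max_iter` parameter, and the recurrence rates are approximated by the ratio at a single small $r$.

```python
import math
from typing import Callable

# Assumed test system (not from the paper): rotation by the golden mean.
OMEGA = (math.sqrt(5) - 1) / 2


def rotate(x: float) -> float:
    """Circle rotation T(x) = x + OMEGA (mod 1)."""
    return (x + OMEGA) % 1.0


def circle_dist(a: float, b: float) -> float:
    """Arc-length distance on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def return_time(T: Callable[[float], float], x: float, r: float,
                dist: Callable[[float, float], float],
                max_iter: int = 10**6) -> int:
    """tau_r(x) = inf{k >= 1 : d(T^k x, x) < r}, searched up to max_iter."""
    y = x
    for k in range(1, max_iter + 1):
        y = T(y)
        if dist(y, x) < r:
            return k
    raise ValueError("no return within max_iter iterations")


def recurrence_ratio(T: Callable[[float], float], x: float, r: float,
                     dist: Callable[[float, float], float]) -> float:
    """Finite-r value of log tau_r(x) / (-log r); the rates are its liminf/limsup."""
    return math.log(return_time(T, x, r, dist)) / -math.log(r)


print(return_time(rotate, 0.2, 0.1, circle_dist))    # 5
print(return_time(rotate, 0.2, 0.05, circle_dist))   # 13
print(recurrence_ratio(rotate, 0.2, 1e-4, circle_dist))
```

For this rotation the first returns occur at Fibonacci numbers and the ratio approaches 1, the dimension of Lebesgue measure; Example 3 in Sect. 4 shows that for other rotation numbers the recurrence rate can fall below the dimension.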
Hausdorff Dimension of Measures via Poincaré Recurrence
Furthermore, the lower and upper pointwise dimensions of µ at a point $x \in X$ are given by
$$\underline{d}_\mu(x) = \liminf_{r\to 0} \frac{\log \mu(B(x, r))}{\log r} \quad\text{and}\quad \overline{d}_\mu(x) = \limsup_{r\to 0} \frac{\log \mu(B(x, r))}{\log r}.$$
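As a worked check of these definitions, consider, as an example of our own choosing, the Cantor measure µ on [0, 1], whose distribution function is the Cantor function. At $x = 0$ one has $\mu(B(0, 3^{-n})) = 2^{-n}$, so both pointwise dimensions at 0 equal $\log 2/\log 3$. The sketch evaluates the Cantor function with exact rational arithmetic and forms the ratio $\log\mu(B(x, r))/\log r$; `cantor_cdf` and `ball_measure` are hypothetical helper names.

```python
import math
from fractions import Fraction


def cantor_cdf(x: Fraction, depth: int = 60) -> Fraction:
    """Cantor function F(x), read off from the ternary digits of x.

    Exact when a ternary digit 1 appears within `depth` digits;
    otherwise the expansion is truncated (error at most 2**-depth).
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = math.floor(x)       # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:              # F is constant on this middle third
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value


def ball_measure(x: Fraction, r: Fraction) -> Fraction:
    """mu(B(x, r)) = F(x + r) - F(x - r); the Cantor measure has no atoms."""
    return cantor_cdf(x + r) - cantor_cdf(x - r)


for n in (4, 8, 16):
    r = Fraction(1, 3**n)
    ratio = math.log(ball_measure(Fraction(0), r)) / math.log(r)
    print(n, ratio)                 # log 2 / log 3 up to float rounding
print(math.log(2) / math.log(3))
```

At a generic point of the Cantor set the ratio oscillates with r before settling, which is why the definitions use the lower and upper limits.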
The following statement provides upper bounds for the lower and upper recurrence rates in terms of the lower and upper pointwise dimensions, respectively.
Theorem 1. If $T\colon X \to X$ is a Borel measurable transformation on a measurable set $X \subset \mathbb{R}^d$ for some $d \in \mathbb{N}$, and µ is a T-invariant probability measure on X, then
$$\underline{R}(x) \le \underline{d}_\mu(x) \quad\text{and}\quad \overline{R}(x) \le \overline{d}_\mu(x) \qquad (2)$$
for µ-almost every $x \in X$. It follows from Whitney’s embedding theorem that if X is an arbitrary subset of a finite-dimensional smooth manifold, then it can be smoothly embedded into $\mathbb{R}^d$ for some $d \in \mathbb{N}$, and thus Theorem 1 applies. Example 3 in Sect. 4 illustrates that the inequalities in (2) may be strict on a set of positive measure. Boshernitzan proved in [3] that if the α-dimensional Hausdorff measure $m_\alpha$ is σ-finite on X (that is, if X can be written as a countable union of sets $X_i$ for $i = 1, 2, \ldots$ such that $m_\alpha(X_i) < \infty$ for all i), and µ is an invariant probability measure on X, then
$$\liminf_{n\to\infty}\, [n^{1/\alpha} \cdot d(T^n x, x)] < \infty$$
for µ-almost every $x \in X$. He also showed that if, in addition, $m_\alpha(X) = 0$, then
$$\liminf_{n\to\infty}\, [n^{1/\alpha} \cdot d(T^n x, x)] = 0 \qquad (3)$$
for µ-almost every $x \in X$. Recall that the Hausdorff dimension of a probability measure µ on X is given by $\dim_H \mu = \inf\{\dim_H Z : \mu(Z) = 1\}$, where $\dim_H Z$ denotes the Hausdorff dimension of the set Z. The measure µ is called exact dimensional if there exists a constant d such that
$$\underline{d}_\mu(x) = \overline{d}_\mu(x) = d \quad\text{for µ-almost every } x \in X. \qquad (4)$$
It follows from Young’s criteria (see [13] for details) that if (4) holds, then $\dim_H \mu = d$. In our setting Boshernitzan’s result can be reformulated in the following manner (for details, see Sect. 6, and in particular Lemma 4).
Theorem 2 ([3]). If T is a Borel measurable transformation on the separable metric space X, and µ is a T-invariant probability measure on X, then $\underline{R}(x) \le \dim_H \mu$ for µ-almost every $x \in X$.
We can also rephrase the first inequality in (2) in a form similar to (3).
Theorem 3. If $T\colon X \to X$ is a Borel measurable transformation on a measurable set $X \subset \mathbb{R}^d$ for some $d \in \mathbb{N}$, and µ is a T-invariant probability measure on X, then (3) holds for µ-almost every $x \in X$ such that $\underline{d}_\mu(x) < \alpha$.
Using Young’s criteria (see [13]), one can show that $\dim_H \mu \ge \underline{d}_\mu(x)$ for µ-almost every $x \in X$. Therefore, Theorem 3 may in general provide a stronger statement than that in (3), and the first inequality in (2) may be sharper than that in Theorem 2. This possibility indeed occurs in the following example.
Example 1. In [10], Pesin and Weiss presented an example of a Hölder homeomorphism on a closed subset X of [0, 1] whose unique (and thus ergodic) measure of maximal entropy µ is not exact dimensional. More precisely, there exist disjoint sets $A_1, A_2 \subset [0, 1]$ with positive µ-measure whose union is equal to X, and there exist positive constants $c_1$ and $c_2$ with $c_1 \ne c_2$ such that $\mu|A_i$ is exact dimensional and $\underline{d}_\mu(x) = \overline{d}_\mu(x) = c_i$ for µ-almost every $x \in A_i$ and $i = 1, 2$. Clearly $\dim_H \mu = \max\{c_1, c_2\}$, and thus $\underline{d}_\mu(x) < \dim_H \mu$ on a set of positive µ-measure (namely on the set $A_i$ with i such that $c_i = \min\{c_1, c_2\}$). This example illustrates that in general Theorem 3 provides a stronger statement than that in (3). Therefore, one can see the first inequality in Theorem 1 as a non-trivial generalization of one of Boshernitzan’s main results in [3]. Furthermore, we are able to give an estimate for the upper recurrence rate, and we shall see (in Sects. 3 and 4) that for several classes of maps and measures the inequalities in (2) are in fact identities on a full measure set. Therefore, the inequalities in Theorems 1 and 3 are optimal. Example 1 also illustrates that for an arbitrary transformation the functions $\underline{d}_\mu$ and $\overline{d}_\mu$ need not be invariant µ-almost everywhere. Assume now that T is a Lipschitz map with Lipschitz inverse, and let $c > 1$ be a Lipschitz constant for T and $T^{-1}$. It is easy to verify that
$$\tau_{cr}(Tx) \le \tau_r(x) \le \tau_{r/c}(Tx)$$
for every $x \in X$ and every $r > 0$. Thus
$$\underline{R}(Tx) = \underline{R}(x) \quad\text{and}\quad \overline{R}(Tx) = \overline{R}(x)$$
for every $x \in X$. Under this assumption on T, one can also verify that for every $x \in X$ we have
$$\underline{d}_\mu(Tx) = \underline{d}_\mu(x) \quad\text{and}\quad \overline{d}_\mu(Tx) = \overline{d}_\mu(x).$$
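The two-sided estimate $\tau_{cr}(Tx) \le \tau_r(x) \le \tau_{r/c}(Tx)$ can be checked on a concrete bi-Lipschitz map. As an illustration of our own, the sketch uses the cat map $(u, v) \mapsto (2u + v, u + v) \bmod 1$ on the 2-torus with the max-norm torus distance. Both the matrix and its inverse have max-row-sum norm 3 and map $\mathbb{Z}^2$ to itself, so $c = 3$ is a Lipschitz constant for T and $T^{-1}$. Rational starting points keep the arithmetic exact and make every orbit periodic, so all return times are finite; the names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Tuple

Point = Tuple[Fraction, Fraction]
C = 3  # Lipschitz constant of the cat map and of its inverse (max-norm)


def cat_map(p: Point) -> Point:
    """Arnold's cat map T(u, v) = (2u + v, u + v) mod 1, computed exactly."""
    u, v = p
    return ((2 * u + v) % 1, (u + v) % 1)


def torus_dist(p: Point, q: Point) -> Fraction:
    """Max-norm distance on the torus R^2 / Z^2."""
    def circ(a: Fraction, b: Fraction) -> Fraction:
        d = (a - b) % 1
        return min(d, 1 - d)
    return max(circ(p[0], q[0]), circ(p[1], q[1]))


def return_time(T: Callable[[Point], Point], x: Point, r: Fraction,
                max_iter: int = 10**5) -> int:
    """tau_r(x) = inf{k >= 1 : d(T^k x, x) < r}, search capped at max_iter."""
    y = x
    for k in range(1, max_iter + 1):
        y = T(y)
        if torus_dist(y, x) < r:
            return k
    raise ValueError("no return within max_iter iterations")


def sandwich_holds(x: Point, r: Fraction) -> bool:
    """Check tau_{cr}(Tx) <= tau_r(x) <= tau_{r/c}(Tx) for the cat map."""
    tx = cat_map(x)
    return (return_time(cat_map, tx, C * r)
            <= return_time(cat_map, x, r)
            <= return_time(cat_map, tx, r / C))


x = (Fraction(3, 101), Fraction(17, 101))
print(sandwich_holds(x, Fraction(1, 20)))  # True
```

Taking logarithms and dividing by $-\log r$ shows why the estimate forces $\underline{R}(Tx) = \underline{R}(x)$ and $\overline{R}(Tx) = \overline{R}(x)$: the factor c only shifts $\log r$ by the constant $\log c$.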
3. Coincidence of Recurrence Rate and Pointwise Dimension
3.1. Formulation of the main result. In this section we investigate conditions under which the inequalities in (2) become identities (see also Sect. 4). These conditions will be shown to hold for a large class of invariant measures. The return time of the point $y \in B(x, r)$ into $B(x, r)$ is defined by
$$\tau_r(y, x) \overset{\text{def}}{=} \inf\{k > 0 : d(T^k y, x) < r\}. \qquad (5)$$
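One can easily verify that if $d(x, y) < r$, then
$$\tau_{4r}(y) \le \tau_{2r}(y, x) \le \tau_r(y). \qquad (6)$$
Indeed, by the triangle inequality, if $d(T^k y, y) < r$ then $d(T^k y, x) < 2r$, and if $d(T^k y, x) < 2r$ then $d(T^k y, y) < 3r < 4r$.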
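Because (6) rests only on the triangle inequality, it holds for any map. A small exact check, with an example of our own choosing, is sketched below: a rational circle rotation $x \mapsto x + 13/47 \pmod 1$, picked only so that every orbit is periodic and every return time finite. The helper names are hypothetical; `tau_to` implements the return time $\tau_r(y, x)$ of (5).

```python
from fractions import Fraction

OMEGA = Fraction(13, 47)   # assumed rotation number: every orbit has period 47


def T(x: Fraction) -> Fraction:
    """Rational circle rotation x -> x + OMEGA (mod 1)."""
    return (x + OMEGA) % 1


def dist(a: Fraction, b: Fraction) -> Fraction:
    """Arc-length distance on the circle R/Z."""
    d = (a - b) % 1
    return min(d, 1 - d)


def tau_to(y: Fraction, x: Fraction, r: Fraction) -> int:
    """tau_r(y, x) = inf{k > 0 : d(T^k y, x) < r}, as in (5).

    Terminates whenever d(x, y) < r, because T^47 y = y.
    """
    z, k = T(y), 1
    while dist(z, x) >= r:
        z, k = T(z), k + 1
    return k


def tau(y: Fraction, r: Fraction) -> int:
    """tau_r(y) = tau_r(y, y)."""
    return tau_to(y, y, r)


def inequality_6_holds(x: Fraction, y: Fraction, r: Fraction) -> bool:
    """Check tau_{4r}(y) <= tau_{2r}(y, x) <= tau_r(y), assuming d(x, y) < r."""
    if not dist(x, y) < r:
        raise ValueError("(6) requires d(x, y) < r")
    return tau(y, 4 * r) <= tau_to(y, x, 2 * r) <= tau(y, r)


print(inequality_6_holds(Fraction(1, 10), Fraction(11, 100), Fraction(1, 50)))  # True
```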
For each $x \in X$ and $r, \varepsilon > 0$, we consider the set
$$A_\varepsilon(x, r) = \{y \in B(x, r) : \tau_r(y, x) \le \mu(B(x, r))^{-1+\varepsilon}\}.$$
We shall say that the measure µ has long return time (with respect to T) if
$$\liminf_{r\to 0} \frac{\log \mu(A_\varepsilon(x, r))}{\log \mu(B(x, r))} > 1 \qquad (7)$$
for µ-almost every $x \in X$ and every sufficiently small $\varepsilon > 0$. The class of measures µ with long return time includes equilibrium measures supported on locally maximal hyperbolic sets (see Theorem 5 below). On the other hand, Example 2 below illustrates that a T-invariant measure µ may have long return time even if T is not uniformly hyperbolic. See also Sect. 3.2 for a discussion of the relation between the notion of long return time and return time statistics. The following is a considerably strengthened version of Theorem 1 for measures with long return time.
Theorem 4. Let $T\colon X \to X$ be a Borel measurable transformation on a measurable set $X \subset \mathbb{R}^d$ for some $d \in \mathbb{N}$, and µ a T-invariant probability measure on X. If µ has long return time, and $\underline{d}_\mu(x) > 0$ for µ-almost every $x \in X$, then
$$\underline{R}(x) = \underline{d}_\mu(x) \quad\text{and}\quad \overline{R}(x) = \overline{d}_\mu(x) \qquad (8)$$
for µ-almost every $x \in X$. See Sects. 3.2 and 4 for applications of Theorem 4. Note that the recurrence rates $\underline{R}(x)$ and $\overline{R}(x)$ are essentially quantities of a topological nature, which are defined independently of any measure. Therefore, the identities in (8) provide non-trivial relations between topological and measure-theoretic quantities.
3.2. Relation with return time statistics. We define the distribution of return time of T (with respect to µ) on the ball $B(x, r)$ by
$$F_{x,r}(t) \overset{\text{def}}{=} \frac{\mu(\{y \in B(x, r) : \tau_r(y, x) > t/\mu(B(x, r))\})}{\mu(B(x, r))}$$
for each $t \ge 0$. In a variety of systems with some kind of hyperbolicity it has been established that $F_{x,r}(t) \to e^{-t}$ as $r \to 0$, for µ-almost every $x \in X$. This behavior is known as the exponential statistic of return time, and is becoming an important ingredient in the analysis of recurrence in dynamical systems. This study is closely related to the notion of long return time introduced above, as can be readily seen from the relation
$$\mu(A_\varepsilon(x, r)) = [1 - F_{x,r}(\mu(B(x, r))^\varepsilon)]\,\mu(B(x, r)) \le \mu(B(x, r))^{1+\varepsilon} + \mu(B(x, r)) \sup_{t \ge 0} |F_{x,r}(t) - e^{-t}|, \qquad (9)$$
which holds for all sufficiently small $r > 0$. This implies the following criterion for long return time.
Proposition 1. Let T be a Borel measurable transformation on the separable metric space X, and µ a probability measure on X. Assume that for µ-almost every $x \in X$ there exists $\gamma = \gamma(x) > 0$ such that $\sup\{|F_{x,r}(t) - e^{-t}| : t \ge 0\} \le \mu(B(x, r))^\gamma$ for all sufficiently small $r > 0$. Then µ has long return time.
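The first relation in (9) only expresses that $A_\varepsilon(x, r)$ is the complement in $B(x, r)$ of the event $\{\tau_r(\cdot, x) > t/\mu(B(x, r))\}$ at $t = \mu(B(x, r))^\varepsilon$. The sketch below, an illustration of our own, computes the empirical counterparts from a hypothetical sample of return times $\tau_r(y, x)$ of points y drawn from µ restricted to the ball, with `ball_mass` standing for $\mu(B(x, r))$. It also measures the distance to the exponential law on a grid of t values, which only approximates the supremum in (9).

```python
import math
from typing import Callable, Sequence


def empirical_F(times: Sequence[int], ball_mass: float) -> Callable[[float], float]:
    """Empirical F_{x,r}(t): fraction of sampled y with tau_r(y, x) > t / mu(B)."""
    n = len(times)
    return lambda t: sum(1 for s in times if s > t / ball_mass) / n


def long_return_fraction(times: Sequence[int], ball_mass: float, eps: float) -> float:
    """Empirical mu(A_eps(x, r)) / mu(B(x, r)): fraction with tau <= mu(B)^(-1+eps)."""
    threshold = ball_mass ** (-1.0 + eps)
    return sum(1 for s in times if s <= threshold) / len(times)


def sup_distance_to_exp(F: Callable[[float], float], t_max: float = 10.0,
                        steps: int = 1000) -> float:
    """Approximate sup_t |F(t) - e^{-t}| on a uniform grid of [0, t_max]."""
    return max(abs(F(i * t_max / steps) - math.exp(-i * t_max / steps))
               for i in range(steps + 1))


sample = [1, 3, 7, 20, 50, 200]           # hypothetical return times
F = empirical_F(sample, 0.05)
print(long_return_fraction(sample, 0.05, 0.3), 1 - F(0.05 ** 0.3))  # 0.5 0.5
```

With return times sampled from a system that has exponential statistics, `sup_distance_to_exp` would be expected to shrink as r decreases; we make no claim here about any particular system.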
On the other hand, it is not necessary to have an exponential statistic of return time. For example, assume that for µ-almost every $x \in X$ there exist $\gamma = \gamma(x) > 0$ and a function $F_x\colon [0, +\infty) \to [0, 1]$ which is Hölder continuous in an open neighborhood of 0 such that $\sup\{|F_{x,r}(t) - F_x(t)| : t \ge 0\} \le \mu(B(x, r))^\gamma$ for all sufficiently small $r > 0$. Note that one must have $F_x(0) = 1$. Then it follows from (9) that the measure µ has long return time. The following example illustrates that an invariant measure may have long return time even if the map is not hyperbolic.
Example 2. Let $\alpha \in (0, 1)$ and consider the map $T\colon [0, 1] \to [0, 1]$ defined by
$$T(x) = \begin{cases} x(1 + 2^\alpha x^\alpha) & \text{if } x \in [0, 1/2], \\ 2x - 1 & \text{otherwise}. \end{cases}$$
Note that due to the presence of the neutral fixed point 0 (since $T'(0^+) = 1$), the map is not uniformly hyperbolic. We recall that there exists a unique invariant probability measure µ which is ergodic and absolutely continuous with respect to Lebesgue measure (see, for example, [6] for references). Since the set of points $x \in [0, 1]$ such that $|T'(x)| > 1$ has full µ-measure, the measure µ is hyperbolic (see Sect. 4 for the definition). Denote by $a_n$ the left preimages of $a_0 = 1$. Let ξ be the countable partition of [0, 1] defined by $\xi = \{(a_{n+1}, a_n] : n \ge 0\}$. It is proved in [6] that T has exponential statistic of return time for cylinders of the partition ξ. More precisely, the following sharp estimate is established: there exists $\gamma > 0$ such that for µ-almost every $x \in [0, 1]$ and all sufficiently large $m \in \mathbb{N}$, we have
$$\sup_{t \ge 0} \left| \frac{\mu(\{y \in \xi_m(x) : \tau_{\xi_m(x)}(y) > t/\mu(\xi_m(x))\})}{\mu(\xi_m(x))} - e^{-t} \right| \le \mu(\xi_m(x))^\gamma,$$
where $\xi_m = \bigvee_{k=0}^{m-1} T^{-k}\xi$ and $\tau_A(y) = \inf\{k \in \mathbb{N} : T^k y \in A\}$.
Proceeding as in (9), we conclude that for µ-almost every $x \in [0, 1]$ we have
$$\mu(\{y \in \xi_m(x) : \tau_{\xi_m(x)}(y) \le \mu(\xi_m(x))^{-1+\varepsilon}\}) \le 2\mu(\xi_m(x))^{1+\varepsilon}$$
for every $\varepsilon \le \gamma$ and every sufficiently large m. We now present a simple argument showing that one can replace cylinders by balls in the last inequality. For each sufficiently small $r > 0$, it is possible to choose integers $m_r \ge n_r$ such that $\xi_{m_r}(x) \subset B(x, r) \subset \xi_{n_r}(x)$ with $m_r/n_r \to 1$ as $r \to 0$. Since
$$\tau_{\xi_{n_r}(x)}(y) \le \tau_r(y, x) \quad\text{and}\quad \mu(B(x, r)) \ge \mu(\xi_{m_r}(x)),$$
we obtain
$$\mu(A_\varepsilon(x, r)) \le \mu(\{y \in \xi_{n_r}(x) : \tau_{\xi_{n_r}(x)}(y) \le \mu(\xi_{m_r}(x))^{-1+\varepsilon}\}).$$
In view of the inequalities
$$\mu(\xi_{m_r}(x))^{-1+\varepsilon} \le \mu(\xi_{n_r}(x))^{-1+\varepsilon/2} \quad\text{and}\quad \mu(\xi_{n_r}(x)) \le \mu(B(x, r))^{1-\varepsilon/4},$$
which are valid for all sufficiently small r, we conclude that for µ-almost every $x \in [0, 1]$ we have
$$\mu(A_\varepsilon(x, r)) \le 2\mu(\xi_{n_r}(x))^{1+\varepsilon/2} \le 2\mu(B(x, r))^{(1+\varepsilon/2)(1-\varepsilon/4)}$$
for all sufficiently small r. Since $(1+\varepsilon/2)(1-\varepsilon/4) = 1 + \varepsilon/4 - \varepsilon^2/8 > 1$ for $0 < \varepsilon < 2$, by taking $\varepsilon > 0$ sufficiently small this inequality implies that the measure µ has long return time. It follows from Theorem 4 that the identities in (8) hold for Lebesgue almost every point. We remark that it is an open problem to decide whether all hyperbolic measures (not necessarily supported on uniformly hyperbolic sets; see Sect. 4) have long return time.
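The map of Example 2 and the endpoints $a_n$ of the partition ξ are easy to compute. In the sketch below (ours, with the illustrative choice α = 1/2), the left preimages are found by bisection. This works because the left branch $x \mapsto x(1 + (2x)^\alpha)$, which equals $x(1 + 2^\alpha x^\alpha)$, is continuous and strictly increasing from 0 to 1 on [0, 1/2]. The function names are hypothetical.

```python
ALPHA = 0.5  # illustrative choice of alpha in (0, 1)


def T(x: float, alpha: float = ALPHA) -> float:
    """Example 2: x(1 + 2^alpha x^alpha) on [0, 1/2], 2x - 1 otherwise."""
    if x <= 0.5:
        return x * (1.0 + (2.0 * x) ** alpha)
    return 2.0 * x - 1.0


def T_prime(x: float, alpha: float = ALPHA) -> float:
    """T'(x): 1 + (1 + alpha)(2x)^alpha on the left branch (equal to 1 at 0)."""
    if x <= 0.5:
        return 1.0 + (1.0 + alpha) * (2.0 * x) ** alpha
    return 2.0


def left_preimage(y: float, alpha: float = ALPHA, tol: float = 1e-15) -> float:
    """Solve x(1 + (2x)^alpha) = y for x in [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if T(mid, alpha) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def partition_points(n: int, alpha: float = ALPHA) -> list:
    """a_0 = 1 and a_{k+1} = left preimage of a_k, for k < n."""
    a = [1.0]
    for _ in range(n):
        a.append(left_preimage(a[-1], alpha))
    return a


print(partition_points(6))
```

The gaps $a_n - a_{n+1}$ shrink polynomially rather than geometrically, reflecting the neutral fixed point at 0.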
4. Hyperbolic Gibbs Measures
We now consider equilibrium measures supported on a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism. The following result shows that in this situation the identities in (8) hold on a set of full measure.
Theorem 5. Let X be a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism on a compact smooth manifold, for some $\alpha > 0$. If µ is an ergodic equilibrium measure of a Hölder continuous potential on X, then: 1. µ has long return time; 2. $\underline{R}(x) = \overline{R}(x) = \dim_H \mu$ for µ-almost every $x \in X$.
When µ is not ergodic, one can consider the finite ergodic decomposition of µ on (relatively) open subsets $X_i \subset X$ such that $(X_i, T, \mu|X_i)$ is ergodic. Since each $X_i$ is (relatively) open, it follows immediately from Theorem 5 that $\underline{R}(x) = \overline{R}(x) = \dim_H(\mu|X_i)$ for µ-almost every $x \in X_i$. The following example illustrates that for invariant measures which are not hyperbolic the second statement in Theorem 5 may not hold.
Example 3. Consider a rotation of the circle by an irrational number ω which is well approximable by rational numbers. We recall that ω is said to be well approximable by rational numbers if $\nu(\omega) > 1$, where $\nu(\omega)$ is the supremum of all $\nu > 0$ such that $|\omega - p/q| < 1/q^{\nu+1}$ for infinitely many relatively prime integers p and q. The unique invariant measure of the rotation is the Lebesgue measure m, which is clearly exact dimensional. Furthermore, it is easy to verify that if $0 < q_1 < q_2 < \cdots$ is a sequence of positive integers such that $|q_n\omega - p_n| < 1/q_n^\nu$ for some integer $p_n$, then
$$\tau_{1/q_n^\nu}(x) = \inf\{k \in \mathbb{N} : \|k\omega\| < 1/q_n^\nu\} \le q_n$$
for every x in the circle, where $\|\cdot\|$ denotes the distance to the nearest integer. Choosing $1 < \nu < \nu(\omega)$, such a sequence exists, and thus
$$\underline{R}(x) \le \liminf_{n\to\infty} \frac{\log \tau_{1/q_n^\nu}(x)}{-\log(1/q_n^\nu)} \le \frac{1}{\nu} < 1 = \dim_H m.$$
Note that the Lebesgue measure is not hyperbolic in this example.
In view of Theorem 4, Example 3 also illustrates that an exact dimensional measure may not have long return time. We now show that the above Theorems 4 and 5 can be seen as generalizations of work of Ornstein and Weiss for the measure-theoretic entropy. Let $T\colon X \to X$ be a measurable transformation (note that X need not be a metric space), and Z a measurable partition of X. Consider the partitions $Z_n = \bigvee_{k=0}^{n-1} T^{-k} Z$ for each n. We shall denote by $h_\mu(T, Z)$ the µ-entropy of T with respect to Z. Then Theorem 1 in [8] can be reformulated in the following manner.
Proposition 2 ([8]). Let $T\colon X \to X$ be a measurable transformation, Z a measurable partition of X, and µ an ergodic T-invariant probability measure on X. If we endow X with the (pseudo) metric $d_Z(x, y) = e^{-n}$, where n is the smallest positive integer such that $Z_n(x) \ne Z_n(y)$, then $\underline{R}(x) = \overline{R}(x) = h_\mu(T, Z)$ for µ-almost every $x \in X$.
We stress that with the special metric $d_Z$, the measure-theoretic entropy $h_\mu(T, Z)$ coincides with the Hausdorff dimension $\dim_H \mu$ of the measure µ. Theorems 4 and 5 provide versions of Proposition 2 in metric spaces which may have “non-homogeneous” distances. Let now $f\colon M \to M$ be a diffeomorphism of a compact smooth manifold. Given $x \in M$ and $v \in T_x M$, the Lyapunov exponent of v at x is defined by
$$\lambda(x, v) = \limsup_{n\to\infty} \frac{1}{n} \log \|d_x f^n v\|.$$
The measure µ is said to be hyperbolic if there exists a set $Y \subset M$ of full µ-measure such that $\lambda(x, v) \ne 0$ for every $x \in Y$ and every $v \in T_x M$. One should notice that the relation between Proposition 2 and Theorem 5 is similar to the relation between the Shannon–McMillan–Breiman theorem and the following statement established by Barreira, Pesin, and Schmeling.
Proposition 3 ([1]). If f is a $C^{1+\alpha}$ diffeomorphism on a compact smooth manifold M, for some $\alpha > 0$, and µ is a hyperbolic f-invariant probability measure on M, then $\underline{d}_\mu(x) = \overline{d}_\mu(x)$ for µ-almost every $x \in M$.
The role of Proposition 3 in the dimension theory of dynamical systems is similar to the role of the Shannon–McMillan–Breiman theorem in entropy theory. While the former ensures the coincidence of many characteristics of dimension type of the measure (such as the Hausdorff dimension, lower and upper box dimensions, and lower and upper information dimensions), the latter ensures the coincidence of various definitions of the entropy (such as those due to Kolmogorov and Sinai, Katok, Brin and Katok, and Pesin). See [1] for details. In a similar fashion, while Proposition 2 relates the measure-theoretic entropy with recurrence, Theorem 5 relates dimension-like characteristics with recurrence. Both results provide non-trivial insight concerning the quantitative behavior of recurrence. A similar observation can be made about the hypotheses under which the results are established. Namely, the assumptions in Proposition 3 are known to be optimal (see [1, 9] for details), while the Shannon–McMillan–Breiman theorem only assumes the invariance of the probability measure. Similarly, while Proposition 2 only requires the measure to be ergodic, Theorem 5 requires more from the dynamical system and the invariant measure. Example 3 illustrates that the assumption that the measure is hyperbolic is essential in Theorem 5.
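Proposition 2 can be made concrete on the full shift over {0, 1}, taking Z to be the partition into the two one-symbol cylinders. Then $d_Z(x, y) = e^{-n}$, where $n - 1$ is the length of the common prefix of x and y. Consequently $\tau_{e^{-n}}(x)$ is the first time the initial word of length n reappears, and Proposition 2 says that $(\log \tau_{e^{-n}}(x))/n$ tends to the entropy, which is $\log 2$ for independent fair coin flips. The sketch below is our own illustration on finite sequences; the helper names are hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def prefix_return_time(bits: Sequence[int], n: int) -> Optional[int]:
    """Smallest k >= 1 with bits[k:k+n] == bits[:n], i.e. tau_{e^{-n}} for d_Z.

    Returns None if the initial word does not reappear in the finite sequence.
    """
    seq = list(bits)
    word = seq[:n]
    for k in range(1, len(seq) - n + 1):
        if seq[k:k + n] == word:
            return k
    return None


def recurrence_estimate(bits: Sequence[int], n: int) -> Optional[float]:
    """(log tau_{e^{-n}}) / n, the quantity whose limit Proposition 2 identifies."""
    k = prefix_return_time(bits, n)
    return None if k is None else math.log(k) / n


# Usage: fair coin flips (entropy log 2); a single run fluctuates around it.
rng = random.Random(0)
coin_flips = [rng.randint(0, 1) for _ in range(100_000)]
print(recurrence_estimate(coin_flips, 12), math.log(2))
```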
We believe that the other assumption in Theorem 5, concerning the regularity of the map, is also essential, although to the best of our knowledge no counterexample is known with weaker regularity. Moreover, we would like to formulate the following plausible conjecture.
Conjecture. Let f be a $C^{1+\alpha}$ diffeomorphism on a compact smooth manifold M, for some $\alpha > 0$. If µ is an ergodic hyperbolic f-invariant probability measure on M, then $\underline{R}(x) = \overline{R}(x) = \dim_H \mu$ for µ-almost every $x \in M$.
Theorem 5 establishes this statement when µ is supported on a uniformly hyperbolic set. By Theorem 1 and Proposition 3, observe that in order to establish the conjecture in the affirmative, one must show that $\underline{R}(x) \ge \dim_H \mu$ for µ-almost every $x \in M$. Our results motivate the introduction of a new method to compute the Hausdorff dimension of measures. More precisely, one can use Statement 2 in Theorem 5 to compute the Hausdorff dimension of an equilibrium measure µ supported on a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism. Namely, for µ-almost every point we have
$$\lim_{r\to 0} \frac{\log \tau_r(x)}{-\log r} = \dim_H \mu.$$
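This limit can be probed numerically through best return times, following the four-step procedure listed in the next paragraph. The sketch below is our own rendering and makes two labeled assumptions. First, the test system is the golden-mean circle rotation, chosen only because its orbits can be computed accurately in floating point. It is not hyperbolic, so Theorem 5 does not cover it, and the estimate comes out near 1 only because the golden mean is badly approximable (compare Example 3). Second, the asymptotic slope is estimated by a least-squares fit of $\log m_n$ against $-\log d(T^{m_n}x, x)$, which gives $\dim_H \mu$ directly rather than its reciprocal.

```python
import math
from typing import Callable, List, Tuple


def best_returns(T: Callable[[float], float], dist: Callable[[float, float], float],
                 x0: float, n_iter: int) -> List[Tuple[int, float]]:
    """Steps 1-2: successive record-closest returns (m_n, d(T^{m_n} x0, x0))."""
    records, best, x = [], math.inf, x0
    for m in range(1, n_iter + 1):
        x = T(x)
        d = dist(x, x0)
        if 0.0 < d < best:
            best = d
            records.append((m, d))
    return records


def dimension_estimate(records: List[Tuple[int, float]], skip: int = 3) -> float:
    """Steps 3-4: least-squares slope of log m_n against -log d_n."""
    pts = [(-math.log(d), math.log(m)) for m, d in records[skip:]]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    cov = sum((p - mx) * (q - my) for p, q in pts)
    var = sum((p - mx) ** 2 for p, _ in pts)
    return cov / var


OMEGA = (math.sqrt(5) - 1) / 2   # assumed test system: golden-mean rotation


def rotate(x: float) -> float:
    return (x + OMEGA) % 1.0


def circle_dist(a: float, b: float) -> float:
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


recs = best_returns(rotate, circle_dist, 0.1, 100_000)
print([m for m, _ in recs[:8]])   # [1, 2, 3, 5, 8, 13, 21, 34]
print(dimension_estimate(recs))
```

In the plane of step 3, with $\log m_n$ on the horizontal axis, the corresponding slope is the reciprocal of this estimate.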
Therefore, one can use the following algorithm: 1. choose “µ-randomly” a point $x \in X$; 2. iterate the point x and determine the successive “best” return times to a neighborhood of x, i.e., the smallest possible positive integers $m_1 < m_2 < \cdots$ such that $d(T^{m_1} x, x) > d(T^{m_2} x, x) > \cdots$; 3. plot the points $(\log m_n, -\log d(T^{m_n} x, x))$ in a plane; 4. estimate $\dim_H \mu$ from the asymptotic slope defined by these points.
5. Application to Suspension Flows
We assume that $T\colon X \to X$ is a bi-Lipschitz transformation on the separable metric space $(X, d)$. Let $\varphi\colon X \to (0, \infty)$ be a Lipschitz function. Consider the space $Y = \{(x, s) \in X \times \mathbb{R} : 0 \le s \le \varphi(x)\}$, with the points $(x, \varphi(x))$ and $(Tx, 0)$ identified for each $x \in X$. The suspension flow over T with height function φ is the flow $\Psi = \{\psi_t\}_t$ on Y, where each transformation $\psi_t\colon Y \to Y$ is defined by $\psi_t(x, s) = (x, s + t)$. We equip the space Y with the Bowen–Walters distance $d_Y$ introduced in [4], and define the return time of a point $y \in Y$ (with respect to the flow Ψ) into the open ball $B_Y(y, r)$ by
$$\tau_r^\Psi(y) \overset{\text{def}}{=} \inf\{t > \rho_r(y) : \psi_t y \in B_Y(y, r)\} = \inf\{t > \rho_r(y) : d_Y(\psi_t y, y) < r\},$$
where $\rho_r(y) = \inf\{t > 0 : \psi_t y \notin B_Y(y, r)\}$ is the escape time of y from the ball $B_Y(y, r)$. Observe that $\psi_t y \in B_Y(y, r)$ for all sufficiently small t, and thus we need to ensure that the orbit $\psi_t y$ has escaped from $B_Y(y, r)$ when defining $\tau_r^\Psi(y)$.
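The flow $\psi_t$ on Y, with the identification $(x, \varphi(x)) \sim (Tx, 0)$, can be evaluated by raising the vertical coordinate until it reaches the roof and then jumping to the base point Tx. The sketch below is our own and treats only $t \ge 0$. The base map and roof functions in the example (a rational rotation with a constant or an affine roof) are hypothetical choices that allow exact rational arithmetic.

```python
from fractions import Fraction
from typing import Callable, Tuple

State = Tuple[Fraction, Fraction]


def suspension_flow(T: Callable[[Fraction], Fraction],
                    phi: Callable[[Fraction], Fraction],
                    point: State, t: Fraction) -> State:
    """psi_t(x, s) for t >= 0 on Y = {(x, s) : 0 <= s <= phi(x)}.

    Uses the identification (x, phi(x)) ~ (T x, 0) and returns the
    representative with 0 <= s < phi(x); terminates if phi is bounded below.
    """
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    x, s = point
    s += t
    while s >= phi(x):        # reached the roof: continue over the base point T x
        s -= phi(x)
        x = T(x)
    return x, s


def rot(x: Fraction) -> Fraction:
    """Hypothetical base map: rotation by 1/3."""
    return (x + Fraction(1, 3)) % 1


origin = (Fraction(0), Fraction(0))
print(suspension_flow(rot, lambda x: Fraction(1), origin, Fraction(5, 2)))
# (Fraction(2, 3), Fraction(1, 2))
print(suspension_flow(rot, lambda x: 1 + x, origin, Fraction(2)))
# (Fraction(1, 3), Fraction(1, 1))
```

Because each call subtracts the same roof heights in the same order, the sketch satisfies the flow property $\psi_{t_1 + t_2} = \psi_{t_2} \circ \psi_{t_1}$ exactly.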
We also define the lower and upper recurrence rates of y by
$$\underline{R}^\Psi(y) = \liminf_{r\to 0} \frac{\log \tau_r^\Psi(y)}{-\log r} \quad\text{and}\quad \overline{R}^\Psi(y) = \limsup_{r\to 0} \frac{\log \tau_r^\Psi(y)}{-\log r}.$$
Let µ be a T-invariant Borel probability measure on X. It is well known that µ induces a Ψ-invariant probability measure ν on Y such that
$$\int_Y g\, d\nu = \frac{\int_X \int_0^{\varphi(x)} g(x, s)\, ds\, d\mu(x)}{\int_X \varphi\, d\mu}$$
for every continuous function $g\colon Y \to \mathbb{R}$ (where ds refers to Lebesgue measure on the line), and that any Ψ-invariant measure ν on Y is of this form for some T-invariant Borel probability measure µ on X. Theorem 5 can be used to establish the following result for suspension flows.
Theorem 6. Let X be a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism T on a compact smooth manifold, for some $\alpha > 0$, and µ an equilibrium measure of a Hölder continuous potential on X. If Ψ is a suspension flow over $T|X$, then
$$\underline{R}^\Psi(y) = \overline{R}^\Psi(y) = \dim_H \nu - 1$$
for ν-almost every $y \in Y$.
6. Proofs
Following Federer [5], a measure µ is called diametrically regular if there exist constants $\eta > 1$ and $c > 0$ such that $\mu(B(x, \eta r)) \le c\,\mu(B(x, r))$ for every $x \in X$ and $r > 0$. Examples include equilibrium measures of Hölder continuous potentials for several classes of topologically mixing hyperbolic systems, namely subshifts of finite type, conformal expanding maps, surface Axiom A diffeomorphisms, and, more generally, conformal Axiom A diffeomorphisms. See [9] for full details. We shall say that a measure µ is weakly diametrically regular on a set $Z \subset X$ if there is a constant $\eta > 1$ such that for µ-almost every $x \in Z$ and every $\varepsilon > 0$, there exists $\delta > 0$ such that if $r < \delta$ then
$$\mu(B(x, \eta r)) \le \mu(B(x, r))\, r^{-\varepsilon}. \qquad (10)$$
It is easy to verify that if µ is a weakly diametrically regular measure on a set Z, then for each fixed constant $\eta > 1$ there exists $\delta = \delta(x, \varepsilon) > 0$ for µ-almost every $x \in Z$ and every $\varepsilon > 0$ such that (10) holds for every $r < \delta$. Clearly, diametrically regular measures are weakly diametrically regular on X.
Lemma 1. Any Borel probability measure on $\mathbb{R}^d$ is weakly diametrically regular.
Proof of Lemma 1. Let µ be a Borel probability measure on $\mathbb{R}^d$. Clearly, it is sufficient to show that for µ-almost every $x \in \mathbb{R}^d$ we have $\mu(B(x, 2^{-n})) \le n^2 \mu(B(x, 2^{-n-1}))$ for all sufficiently large $n \in \mathbb{N}$. For each $n \in \mathbb{N}$ and $\delta > 0$ let
$$K_n(\delta) \overset{\text{def}}{=} \{x \in \operatorname{supp}\mu : \mu(B(x, 2^{-n-1})) < \delta\, \mu(B(x, 2^{-n}))\}. \qquad (11)$$
Taking a maximal $2^{-n-2}$-separated set $E \subset K_n(\delta)$ (see Lemma 2 below), we obtain
$$\mu(K_n(\delta)) \le \sum_{x \in E} \mu(B(x, 2^{-n-1})) \le \sum_{x \in E} \delta\, \mu(B(x, 2^{-n})).$$
Since E is $2^{-n-2}$-separated, there exists a constant M (depending only on d) such that E can be decomposed into the union $E = \bigcup_{i=1}^M E_i$, where each set $E_i$ is $2^{-n}$-separated. Thus for each $i = 1, \ldots, M$ the union $\bigcup_{x \in E_i} B(x, 2^{-n})$ is disjoint. Therefore
$$\mu(K_n(\delta)) \le \sum_{i=1}^M \sum_{x \in E_i} \delta\, \mu(B(x, 2^{-n})) \le M\delta.$$
Since
$$\sum_{n>0} \mu(K_n(n^{-2})) \le M \sum_{n>0} n^{-2} < \infty,$$
we conclude from the Borel–Cantelli lemma that (11) holds, that is, $\mu(B(x, 2^{-n})) \le n^2 \mu(B(x, 2^{-n-1}))$, for µ-almost every $x \in \mathbb{R}^d$ and all sufficiently large $n \in \mathbb{N}$. This completes the proof.
This shows that the class of weakly diametrically regular measures is very broad. In particular, due to Whitney’s embedding theorem, this class contains all probability measures supported on a finite-dimensional smooth manifold. Further examples of weakly diametrically regular measures on arbitrary metric spaces include any measure µ on a separable metric space X restricted to the set $\{x \in X : \underline{d}_\mu(x) = \overline{d}_\mu(x)\}$. This readily follows from the definition of pointwise dimension. Note that the property of $\mathbb{R}^d$ that we used in the proof of Lemma 1 is that the maximal cardinality, say M(r), of a $\frac{1}{4}r$-separated subset of a ball of radius r is bounded by some constant M. Our proof readily extends to separable metric spaces X with the property that $M(r) = o(r^{-\varepsilon})$ for any $\varepsilon > 0$. In this case, instead of (11) one can show that for each $\delta > 0$ we have $\mu(B(x, 2^{-n})) \le 2^{n\delta} \mu(B(x, 2^{-n-1}))$ for µ-almost every $x \in X$ and all sufficiently large $n \in \mathbb{N}$. This readily implies the weak regularity of µ. We now provide an example of a very different nature.
Example 4. Let $\alpha \in (\frac{1}{2}, 1)$ and define the sequence $\beta_n = n^\alpha$. Consider the space of sequences $X = \{0, 1\}^{\mathbb{N}}$ and define a metric on X by requiring that $\operatorname{diam} C_n(x) = e^{-\beta_n}$, where $C_n(x)$ is any cylinder of length n. Consider also the Bernoulli measure µ on X such that $\mu(C_n(x)) = 2^{-n}$. One can easily verify that µ is weakly diametrically regular (by checking that any ball of radius r contains at most $r^{-\varepsilon(r)}$ balls of radius $\frac{1}{4}r$, where $\varepsilon(r) \to 0$ as $r \to 0$), and also that $\dim_H \mu = +\infty$. In particular, $X = \operatorname{supp}\mu$ cannot be smoothly embedded into $\mathbb{R}^d$ for any $d \in \mathbb{N}$. The same measure µ may not be weakly diametrically regular if the sequence $\beta_n$ increases more slowly. This is the case, for example, when $\beta_n = \log n$. We continue with an auxiliary statement.
454
L. Barreira, B. Saussol
Lemma 2. Let µ be a finite Borel measure on the separable metric space X, and $G\subset\operatorname{supp}\mu$ a measurable set. Given $r>0$, there exists a countable set $E\subset G$ such that:
1. $B(x,r)\cap B(y,r)=\emptyset$ for any two distinct points $x,y\in E$;
2. $\mu\bigl(G\setminus\bigcup_{x\in E}B(x,2r)\bigr)=0$.
Proof of Lemma 2. The existence of the set E can be obtained by applying Zorn’s lemma to the non-empty family of subsets of G which satisfy the first property, ordered by inclusion. The second property is then satisfied by any maximal element. Since $\mu(B(x,r))>0$ for each $x\in E\subset\operatorname{supp}\mu$, the set E is at most countable.
We shall call the set E in Lemma 2 a maximal r-separated set for G. We recall that for any $a>0$ the following identities hold:
$$\underline{d}_\mu(x)=\liminf_{n\to\infty}\frac{\log\mu(B(x,ae^{-n}))}{-n},\qquad\underline{R}(x)=\liminf_{n\to\infty}\frac{\log\tau_{ae^{-n}}(x)}{n},\tag{12}$$
$$\overline{d}_\mu(x)=\limsup_{n\to\infty}\frac{\log\mu(B(x,ae^{-n}))}{-n},\qquad\overline{R}(x)=\limsup_{n\to\infty}\frac{\log\tau_{ae^{-n}}(x)}{n}.\tag{13}$$
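For a finite sample of points, the maximal element in Lemma 2 can be produced greedily instead of by Zorn’s lemma. The following Python sketch is only an illustration under that assumption (a finite point set in the Euclidean plane; the helper names `maximal_separated_set` and `is_covered` are hypothetical): it keeps a point only if it lies at distance at least 2r from every point already kept, so that the open balls $B(x,r)$, $x\in E$, are pairwise disjoint, and it then checks that every sample point lies in some ball $B(x,2r)$, the finite analogue of property 2.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def maximal_separated_set(points: Sequence[Point], r: float) -> List[Point]:
    """Greedy maximal set E of sample points that are pairwise at distance >= 2r."""
    selected: List[Point] = []
    for p in points:
        # Keep p only if it is at distance >= 2r from every point chosen so far;
        # by the triangle inequality the open balls B(x, r), x in E, are then disjoint.
        if all(math.dist(p, q) >= 2 * r for q in selected):
            selected.append(p)
    return selected

def is_covered(points: Sequence[Point], centers: Sequence[Point], r: float) -> bool:
    """Check that every sample point lies in the union of the balls B(x, 2r)."""
    return all(any(math.dist(p, c) < 2 * r for c in centers) for p in points)

grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
E = maximal_separated_set(grid, r=0.15)
print(len(E), is_covered(grid, E, r=0.15))
```

Every rejected point was rejected because some selected point is closer than 2r, which is exactly why the covering check succeeds; this mirrors the maximality argument in the proof of Lemma 2.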
Lemma 3. Let T be a Borel measurable transformation on the separable metric space X, and µ a T-invariant probability measure on X. If µ is weakly diametrically regular on a measurable set $Z\subset X$ with $\mu(Z)>0$, then (2) holds for µ-almost every $x\in Z$.
Proof of Lemma 3. Observe that the function $\delta(x,\cdot)$ in the definition of a weakly regular measure (see (10)) can be made measurable for each fixed x. Fix $\varepsilon>0$, and choose $\delta>0$ sufficiently small such that the set $G=\{x\in Z:\delta(x,\varepsilon)>\delta\}$ has measure $\mu(G)>\mu(Z)-\varepsilon$. For any $r,\lambda>0$ and $x\in X$ consider the set
$$A_{r,x}=\{y\in B(x,4r):\tau_{4r}(y,x)\ge\lambda^{-1}\mu(B(x,4r))^{-1}\},$$
where $\tau_{4r}(y,x)$ is defined in (5). Chebyshev’s inequality implies that
$$\mu(A_{r,x})\le\lambda\,\mu(B(x,4r))\int_{B(x,4r)}\tau_{4r}(y,x)\,d\mu(y).$$
Since µ is invariant, Kac’s lemma tells us that
$$\int_{B(x,4r)}\tau_{4r}(y,x)\,d\mu(y)=\mu(\{y\in X:\tau_{4r}(y,x)<\infty\})\le1.$$
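Kac’s lemma is what bounds the integral above by 1. For an ergodic measure it says more precisely that the mean return time to a set of positive measure is the reciprocal of its measure. A quick numerical illustration, assuming the ergodic rotation $x\mapsto x+\theta \bmod 1$ with $\theta=(\sqrt5-1)/2$ and Lebesgue measure (an example chosen here, not taken from the paper; `mean_return_time` is a hypothetical helper): along a long orbit, the average gap between consecutive visits to $A=[0,a)$ should be close to $1/a$.

```python
import math

def mean_return_time(a: float, theta: float, n_steps: int, x0: float = 0.1) -> float:
    """Average gap between consecutive visits to A = [0, a) along the orbit
    of x0 under the rotation x -> x + theta (mod 1)."""
    visit_times = []
    x = x0
    for n in range(n_steps):
        if x < a:
            visit_times.append(n)
        x = (x + theta) % 1.0
    gaps = [later - earlier for earlier, later in zip(visit_times, visit_times[1:])]
    return sum(gaps) / len(gaps)

theta = (math.sqrt(5) - 1) / 2
estimate = mean_return_time(0.05, theta, 200_000)
print(estimate, 1 / 0.05)  # Kac's lemma predicts a value near 1 / mu(A) = 20
```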
Since $B(x,2r)\subset B(x,4r)$, we obtain
$$\mu(\{y\in B(x,2r):\tau_{4r}(y,x)\,\mu(B(x,4r))\ge\lambda^{-1}\})\le\lambda\,\mu(B(x,4r)).$$
Furthermore $\tau_{4r}(y,x)\,\mu(B(x,4r))\ge\tau_{8r}(y)\,\mu(B(y,2r))$ whenever $d(x,y)<2r$ (see (6)), and thus
$$\mu(\{y\in B(x,2r):\tau_{8r}(y)\,\mu(B(y,2r))\ge\lambda^{-1}\})\le\lambda\,\mu(B(x,4r)).\tag{14}$$
Hausdorff Dimension of Measures via Poincaré Recurrence
455
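By Lemma 2 we can find an at most countable maximal r-separated set $E\subset G$. Using (14) with $\lambda=r^{2\varepsilon}$ and (10) with $\eta=4$ (see also the discussion after (10)), we obtain
$$\begin{aligned}D_\varepsilon(r)&\overset{\text{def}}{=}\mu(\{y\in G:\tau_{8r}(y)\,\mu(B(y,2r))\ge r^{-2\varepsilon}\})\\&\le\sum_{x\in E}\mu(\{y\in B(x,2r):\tau_{8r}(y)\,\mu(B(y,2r))\ge r^{-2\varepsilon}\})\\&\le r^{2\varepsilon}\sum_{x\in E}\mu(B(x,4r))\le r^{\varepsilon}\sum_{x\in E}\mu(B(x,r))\le r^{\varepsilon}.\end{aligned}$$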
We conclude that
$$\sum_{n>-\log\delta}D_\varepsilon(e^{-n})\le\sum_{n>-\log\delta}e^{-\varepsilon n}<\infty.$$
By the Borel–Cantelli lemma we find that for µ-almost every $x\in G$,
$$\frac{\log\tau_{8e^{-n}}(x)}{n}\le2\varepsilon+\frac{\log\mu(B(x,2e^{-n}))}{-n}$$
for all sufficiently large n. The desired result now follows from the identities in (12) and (13), and the arbitrariness of ε.
In particular, if µ is an exact dimensional probability measure on a separable metric space X, then by Lemma 3 we have $\overline R(x)\le\dim_H\mu$ for µ-almost every $x\in X$. On the other hand, if µ is not exact dimensional, then in general one can only show that $\underline R(x)\le\dim_H\mu$ for µ-almost every $x\in X$ (see Theorem 2 and Example 1). We note that, as in Lemma 3, the condition $X\subset\mathbb R^d$ in Theorem 4 may also be replaced by the hypothesis that µ is weakly diametrically regular on X. We now start establishing the results of the previous sections.
Proof of Theorem 1. The desired statement follows from Lemmas 1 and 3.
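As a concrete illustration of the quantities in (12) and (13) (an example chosen here, not taken from the paper), take again the rotation $Tx=x+\theta \bmod 1$ with $\theta=(\sqrt5-1)/2$ and Lebesgue measure, for which $d_\mu(x)=1$. Since $\tau_r(x)=\min\{n\ge1:d(T^nx,x)<r\}$ does not depend on x for a rotation, the sketch below computes it directly and prints $\log\tau_r(x)/(-\log r)$. For this rotation the values of $\tau_r$ are Fibonacci numbers, and the ratio approaches 1 as $r\to0$ (slowly, and not monotonically), consistent with $R(x)=d_\mu(x)$. The helper names are hypothetical.

```python
import math

def circle_distance(u: float, v: float) -> float:
    """Distance between u and v on the circle R/Z."""
    t = abs(u - v) % 1.0
    return min(t, 1.0 - t)

def return_time(x: float, r: float, theta: float, max_steps: int = 10**7) -> int:
    """tau_r(x): first n >= 1 with d(T^n x, x) < r for T x = x + theta (mod 1)."""
    y = x
    for n in range(1, max_steps + 1):
        y = (y + theta) % 1.0
        if circle_distance(y, x) < r:
            return n
    raise RuntimeError("no return within max_steps")

theta = (math.sqrt(5) - 1) / 2
for r in (1e-2, 1e-4, 1e-6):
    tau = return_time(0.3, r, theta)
    print(f"r={r:g}  tau_r={tau}  log(tau)/(-log r)={math.log(tau) / -math.log(r):.3f}")
```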
We continue with an auxiliary statement.
Lemma 4. Given $x\in X$, we have $\underline R(x)\le d$ if and only if for every $\varepsilon>0$,
$$\liminf_{n\to\infty}\bigl[n^{1/(d+\varepsilon)}\,d(T^nx,x)\bigr]=0.\tag{15}$$
Proof of Lemma 4. Assume first that $\underline R(x)\le d$. Given $\varepsilon>0$ there exists a sequence of numbers $r_n$ such that $r_n\to0$ and $\tau_{r_n}(x)<r_n^{-(d+\varepsilon)}$ for all n. Let $m_n=\tau_{r_n}(x)$. If the sequence $m_n$ is bounded, then x is periodic and (15) holds. Assume now that $m_n$ is unbounded. Note that $d(T^{m_n}x,x)<r_n$ and
$$m_n^{1/(d+2\varepsilon)}\,d(T^{m_n}x,x)<\tau_{r_n}(x)^{1/(d+2\varepsilon)}\,r_n<r_n^{-(d+\varepsilon)/(d+2\varepsilon)}\,r_n=r_n^{\varepsilon/(d+2\varepsilon)}.$$
Therefore
$$\liminf_{n\to\infty}\bigl[n^{1/(d+2\varepsilon)}\,d(T^nx,x)\bigr]\le\liminf_{n\to\infty}\bigl[m_n^{1/(d+2\varepsilon)}\,d(T^{m_n}x,x)\bigr]=0.$$
This establishes (15) for each $\varepsilon>0$. Assume now that (15) holds for every $\varepsilon>0$. Setting $r_n=2d(T^nx,x)$, we conclude that $\tau_{r_n}(x)\le n$, and it follows from (15) that
$$\liminf_{n\to\infty}\bigl[\tau_{r_n}(x)^{1/(d+\varepsilon)}\,r_n\bigr]=0.$$
Thus there exists a diverging sequence of positive integers $k_n$ such that $\tau_{r_{k_n}}(x)^{1/(d+\varepsilon)}\,r_{k_n}<1$ for each n. Therefore
$$\underline R(x)\le\liminf_{n\to\infty}\frac{\log\tau_{r_n}(x)}{-\log r_n}\le\liminf_{n\to\infty}\frac{\log\bigl(r_{k_n}^{-(d+\varepsilon)}\bigr)}{-\log r_{k_n}}=d+\varepsilon.$$
The arbitrariness of ε implies the desired result.
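Criterion (15) can be probed along a single orbit. The sketch below (hypothetical helper name; again the golden rotation, for which $d=1$ and $d(T^nx,x)=\|n\theta\|$, the distance from $n\theta$ to the nearest integer) computes $\min_{1\le n\le N}n^{1/(d+\varepsilon)}\|n\theta\|$. Its decrease as N grows is consistent with the lim inf in (15) being 0, although a finite computation can only suggest this, not prove it.

```python
import math

def boshernitzan_min(theta: float, d: float, eps: float, n_max: int) -> float:
    """min over 1 <= n <= n_max of n**(1/(d+eps)) * ||n*theta||."""
    best = math.inf
    for n in range(1, n_max + 1):
        frac = (n * theta) % 1.0
        dist = min(frac, 1.0 - frac)  # ||n*theta|| = d(T^n x, x) for the rotation
        best = min(best, n ** (1.0 / (d + eps)) * dist)
    return best

theta = (math.sqrt(5) - 1) / 2
for n_max in (10**2, 10**3, 10**4, 10**5):
    print(n_max, round(boshernitzan_min(theta, d=1.0, eps=0.1, n_max=n_max), 4))
```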
Proof of Theorem 2. We recall that the statement in Theorem 2 is a reformulation of a result of Boshernitzan. By the definition of the Hausdorff dimension of a measure, for any $\alpha>\dim_H\mu$ and all sufficiently small $\delta>0$ there exists a set $Z\subset X$ of full µ-measure such that $\alpha>\dim_HZ>\dim_H\mu-\delta$, and hence $m_\alpha(Z)=0$. It follows from (3) and Lemma 4 that $\underline R(x)\le\alpha$ for µ-almost every $x\in Z$. Letting $\alpha\to\dim_HZ$ and $\delta\to0$ we obtain $\underline R(x)\le\dim_H\mu$ for µ-almost every $x\in X$.
Proof of Theorem 3. The desired statement follows from Theorem 1 and Lemma 4.
Proof of Theorem 4. By Theorem 1 we have $\underline R(x)\le\underline d_\mu(x)$ and $\overline R(x)\le\overline d_\mu(x)$ for µ-almost every $x\in X$. We shall now establish the reverse inequalities. By Lemma 1 the measure µ is weakly diametrically regular on X. Since µ has long return time (see (7)), µ is weakly diametrically regular on X (see (10)), and $\underline d_\mu(x)>0$ for µ-almost every $x\in X$, if $\varepsilon>0$ is sufficiently small we conclude that there exist numbers $a,\gamma,\rho>0$ and a set $G\subset X$ with $\mu(G)>1-\varepsilon$ such that if $x\in G$ and $r\in(0,\rho)$ then
$$\mu(A_\varepsilon(x,2r))\le\mu(B(x,2r))^{1+\gamma},\tag{16}$$
$$\mu(B(x,2r))\le\mu(B(x,r/2))\,r^{-a\gamma/2},\tag{17}$$
$$\mu(B(x,r))\le r^{a}.\tag{18}$$
Consider the set
$$A_\varepsilon(r)\overset{\text{def}}{=}\{y\in G:\tau_r(y)\le\mu(B(y,3r))^{-1+\varepsilon}\}.$$
Whenever $d(x,y)<r$ we have $\tau_r(y)\ge\tau_{2r}(y,x)$ (see (6)). Since $B(x,2r)\subset B(y,3r)$, if $x\in G$ then using (16), (17), and (18), we obtain
$$\begin{aligned}\mu(B(x,r)\cap A_\varepsilon(r))&\le\mu(\{y\in B(x,r):\tau_{2r}(y,x)\le\mu(B(x,2r))^{-1+\varepsilon}\})\le\mu(A_\varepsilon(x,2r))\\&\le\mu(B(x,2r))^{1+\gamma}\le\mu(B(x,r/2))\,r^{-a\gamma/2}(2r)^{a\gamma}.\end{aligned}$$
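If $E\subset G$ is a maximal $\frac r2$-separated set given by Lemma 2, then
$$\mu(A_\varepsilon(r))\le\sum_{x\in E}\mu(B(x,r)\cap A_\varepsilon(r))\le\sum_{x\in E}\mu(B(x,r/2))\,r^{-a\gamma/2}(2r)^{a\gamma}\le2^{a\gamma}r^{a\gamma/2}.$$
We conclude that
$$\sum_{n=1}^{\infty}\mu(A_\varepsilon(e^{-n}))<\infty.$$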
The Borel–Cantelli lemma implies that for µ-almost every $x\in G$ we have $\tau_{e^{-n}}(x)>\mu(B(x,3e^{-n}))^{-1+\varepsilon}$ for all sufficiently large n. The identities in (12) and (13) imply that
$$\underline R(x)\ge(1-\varepsilon)\,\underline d_\mu(x)\quad\text{and}\quad\overline R(x)\ge(1-\varepsilon)\,\overline d_\mu(x)$$
for µ-almost every $x\in G$. The desired statement follows from the arbitrariness of ε.

For a non-uniformly hyperbolic set X in the manifold M, and a Lyapunov regular point $x\in X$, let $\Psi_x:U_x\to M$ be the Lyapunov chart at x for some sufficiently small open neighborhood $U_x\subset\mathbb R^{\dim M}$ of 0, which satisfies $\Psi_x0=x$,
$$d_0\Psi_x\bigl(\mathbb R^{\dim E^s(x)}\times\{0\}\bigr)=E^s(x)\quad\text{and}\quad d_0\Psi_x\bigl(\{0\}\times\mathbb R^{\dim E^u(x)}\bigr)=E^u(x).$$
See the appendix to [1] for details. We denote by $D^s(x,r)\subset\mathbb R^{\dim E^s(x)}$ and $D^u(x,r)\subset\mathbb R^{\dim E^u(x)}$ the balls of radius r centered at the origin. We also denote by $B^s(x,r)\subset V^s(x)$ and $B^u(x,r)\subset V^u(x)$ the balls of radius r centered at the point x, with respect to the distances induced on the local stable and unstable manifolds $V^s(x)$ and $V^u(x)$. In [7], Ledrappier and Young constructed two measurable partitions $\xi^s$ and $\xi^u$ of M such that for µ-almost every $x\in M$:
1. $\xi^s(x)\subset V^s(x)$ and $\xi^u(x)\subset V^u(x)$;
2. $\xi^s(x)\supset V^s(x)\cap B(x,\gamma)$ and $\xi^u(x)\supset V^u(x)\cap B(x,\gamma)$ for some $\gamma=\gamma(x)>0$.
We denote by $\mu^s_x$ and $\mu^u_x$ the conditional measures associated respectively to the partitions $\xi^s$ and $\xi^u$. Recall that any measurable partition ξ of M has an associated family of conditional measures: for µ-almost every $x\in M$ there exists a probability measure $\mu_x$ defined on the element $\xi(x)$ of ξ containing x. The conditional measures are completely characterized by the following property: if $\mathcal B_\xi$ is the σ-subalgebra of the Borel σ-algebra generated by unions of elements of ξ, then for each Borel set $A\subset M$ the function $x\mapsto\mu_x(A\cap\xi(x))$ is $\mathcal B_\xi$-measurable and
$$\mu(A)=\int_M\mu_x(A\cap\xi(x))\,d\mu(x).$$
We will later need the following result concerning the product structure of hyperbolic measures. Instead of formulating the result in all its generality we state it in a form adapted to our purposes.
Proposition 4 ([12], after [1]). Let X be a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism f on a compact smooth manifold, for some $\alpha>0$. If µ is an equilibrium measure of a Hölder continuous potential on X and $a,b,c>0$, then for µ-almost every $x\in X$ there exists $\varepsilon(r)>0$ for each $r>0$ such that $\varepsilon(r)\to0$ as $r\to0$ and
$$r^{\varepsilon(r)}\le\frac{\mu^s_x(B^s(x,r^a))\,\mu^u_x(B^u(x,r^b))}{\mu\bigl(\Psi_x(D^s(x,cr^a)\times D^u(x,cr^b))\bigr)}\le r^{-\varepsilon(r)}$$
for all sufficiently small $r>0$.
Proof of Proposition 4. We use the notation of [12]. Take a Lyapunov regular point x, and let $\chi_1,\dots,\chi_d$ be the values of the Lyapunov exponent at x. Setting $a_i=-a$ for $\chi_i<0$ and $a_i=-b$ for $\chi_i>0$, we conclude that for any sufficiently small r there exist $n(r),m(r)\in\mathbb N$ such that
$$R_{m(r)}(x)\subset\Psi_x\bigl(D^s(x,r^a)\times D^u(x,r^b)\bigr)\subset R_{n(r)}(x)$$
and $m(r)/n(r)\to1$ as $r\to0$. The desired result now follows immediately from Theorem 3.9 in [12].
We define the return time of a set A into itself by $\tau(A)=\inf\{n\in\mathbb N:T^nA\cap A\ne\emptyset\}$. Given a partition ξ of X we consider the new partition $\xi_n=\bigvee_{k=0}^{n-1}T^{-k}\xi$ for each n. Saussol, Troubetzkoy, and Vaienti show in [11] that the first return time of an element of $\xi_n$ is typically large.
Proposition 5 ([11]). Let $T:X\to X$ be a measurable transformation preserving an ergodic probability measure µ. If ξ is a finite or countable measurable partition with entropy $h_\mu(T,\xi)>0$, then
$$\liminf_{n\to\infty}\frac{\tau(\xi_n(x))}{n}\ge1$$
for µ-almost every $x\in X$. Using this result, it can be shown that the first return time of a ball is also typically large.
Lemma 5. Let $T:X\to X$ be a Lipschitz map with Lipschitz constant $L>1$ on a compact metric space X. If µ is an ergodic T-invariant Borel probability measure with entropy $h_\mu(T)>0$, then
$$\liminf_{r\to0}\frac{\tau(B(x,r))}{-\log r}\ge\frac{1}{\log L}$$
for µ-almost every $x\in X$.
Proof of Lemma 5. We claim that for each $n>0$ there exists a partition $\zeta_n$ of X with diameter $\operatorname{diam}\zeta_n\le2^{-n}$ such that if $r>0$, then
$$\mu(\{x\in X:d(x,X\setminus\zeta_n(x))<r\})<c_nr,\tag{19}$$
where $\zeta_n(x)$ is the atom of $\zeta_n$ containing x, and $c_n$ is some positive constant depending only on n. Take a finite set $E_n\subset X$ such that $\bigcup_{x\in E_n}B(x,2^{-n-1})=X$. For each $x\in E_n$, we can find a sequence of real numbers $r_k=r_k(x)\in(2^{-n-1},2^{-n})$ satisfying $|r_{k+1}-r_k|\le2^{-k-1}$ and
$$\mu\bigl(B(x,r_k+2^{-k-1})\setminus B(x,r_k-2^{-k-1})\bigr)\le2^{-k}.$$
Take $r(x)=\lim_{k\to\infty}r_k$ and consider the set
$$A_k=\{y\in X:r(x)-2^{-k}\le d(x,y)\le r(x)+2^{-k}\text{ for some }x\in E_n\}$$
for each $k\in\mathbb N$. Note that the balls $B(x,r(x))$, $x\in E_n$, cover X, and that $\mu(A_k)\le2^{-k+1}\operatorname{card}E_n$. Writing $E_n=\{x_1,\dots,x_p\}$ we set
$$B_1=B(x_1,r(x_1))\quad\text{and}\quad B_\ell=B(x_\ell,r(x_\ell))\setminus\bigcup_{j=1}^{\ell-1}B_j$$
for $\ell=2,\dots,p$. Then the partition $\zeta_n=\{B_1,B_2,\dots,B_p\}$ has the desired properties, with $c_n=\operatorname{card}E_n$. Since $\operatorname{diam}\zeta_n\to0$ as $n\to\infty$, there exists $n^*\in\mathbb N$ such that $h_\mu(T,\zeta_{n^*})>h_\mu(T)/2>0$. Let $Z=\zeta_{n^*}$, $C=c_{n^*}$, and denote by $Z_m(x)$ the unique element of $\bigvee_{k=0}^{m-1}T^{-k}Z$ which contains $x\in X$.
Fix $\sigma<1/L<1$. Clearly, if $d(x,X\setminus Z_m(x))<\sigma^m$, then $d(T^kx,X\setminus Z_1(T^kx))<\sigma^mL^k$ for some $k<m$. It follows from (19), using the invariance of the measure, that
$$\begin{aligned}\mu(\{x\in X:d(x,X\setminus Z_m(x))<\sigma^m\})&\le m\max\{\mu(\{x:d(T^kx,X\setminus Z_1(T^kx))<\sigma^mL^k\}):k=0,\dots,m-1\}\\&\le Cm(\sigma L)^m.\end{aligned}$$
By the Borel–Cantelli lemma we conclude that for µ-almost every $x\in X$ we have $B(x,\sigma^m)\subset Z_m(x)$ for all sufficiently large m. By Proposition 5, for µ-almost every $x\in X$ we have
$$1\le\liminf_{m\to\infty}\frac{\tau(Z_m(x))}{m}\le-\log\sigma\,\liminf_{m\to\infty}\frac{\tau(B(x,\sigma^m))}{-m\log\sigma}=-\log\sigma\,\liminf_{r\to0}\frac{\tau(B(x,r))}{-\log r}.$$
Since σ can be made arbitrarily close to 1/L, we obtain the desired statement.
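The return time $\tau(A)$ of a set to itself, as used in Lemma 5, can be computed exactly for arcs under the doubling map $Tx=2x \bmod 1$ on the circle, which has Lipschitz constant $L=2$. The sketch below is an illustration under these assumptions (exact rational arithmetic via Python’s `fractions`; the function names and the sample point are hypothetical choices, and a rational point is not a µ-typical point, so the printed ratios only illustrate the computation, not the almost-everywhere bound $1/\log 2$).

```python
import math
from fractions import Fraction

def circle_dist(u: Fraction, v: Fraction) -> Fraction:
    """Exact distance on the circle R/Z."""
    t = (u - v) % 1
    return min(t, 1 - t)

def set_return_time(x: Fraction, r: Fraction) -> int:
    """tau(B) = min{n >= 1 : T^n B meets B} for B = B(x, r) and T x = 2x mod 1."""
    n = 1
    while True:
        radius_n = (2 ** n) * r          # T^n B is the open arc of radius 2^n r around T^n x,
        if 2 * radius_n >= 1:            # unless its length 2^(n+1) r already covers the circle
            return n
        center_n = ((2 ** n) * x) % 1
        if circle_dist(center_n, x) < radius_n + r:   # two open arcs overlap
            return n
        n += 1

x = Fraction(123456789, 2**40 + 15)   # arbitrary illustrative point
for k in (10, 20, 30):
    tau = set_return_time(x, Fraction(1, 2**k))
    print(k, tau, tau / (k * math.log(2)))   # tau(B(x, r)) / (-log r) with r = 2^-k
```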
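Proof of Theorem 5. Let $\mathcal Z$ be a Markov partition, and consider the new partitions $\mathcal Z_m^n=\bigvee_{k=m}^{n}f^{-k}\mathcal Z$ whenever $m\le n$. There exist constants $c>0$ and $\lambda>0$ such that, for any cylinder $Z_m^n\in\mathcal Z_m^n$,
$$\operatorname{diam}_sZ_m^n\le ce^{\lambda m}\quad\text{and}\quad\operatorname{diam}_uZ_m^n\le ce^{-\lambda n},\tag{20}$$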
where $\operatorname{diam}_s$ and $\operatorname{diam}_u$ denote the diameters along the stable and unstable manifolds. For µ-almost every $x\in X$, we have $\underline d_\mu(x)=\overline d_\mu(x)=d>0$ (see Proposition 3) and, by Lemma 5, there exist $\delta>0$ and $\rho>0$ such that
$$B(x,r)\cap f^{-k}B(x,r)=\emptyset$$
for every $r<\rho$ and $k\le t_r\overset{\text{def}}{=}-\delta\log r$. Let $r<\rho$ and write $B=B(x,r)$. There exists a countable collection of points $\{x_a\}_{a\in A}\subset B$ and integers $m_a<0<n_a$ for each $a\in A$ such that, if $Z_{m_a}^{n_a}(x_a)\in\mathcal Z_{m_a}^{n_a}$ denotes the unique element containing $x_a$, then
$$B=\bigcup_{a\in A}Z_{m_a}^{n_a}(x_a)\pmod0,$$
in such a way that the union is disjoint (mod 0). We may also assume, without loss of generality, that $\min\{-m_a,n_a\}\ge t_r$. Let $k\ge t_r$ and write $p=[k/2]>0$ and $q=p+1-k<0$. We have
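$$B\subset\bigcup_{a\in A}Z_{m_a}^{p}(x_a)\overset{\text{def}}{=}Z_{-\infty}^{p}(x,r)\quad\text{and}\quad B\subset\bigcup_{b\in A}Z_{q}^{n_b}(x_b)\overset{\text{def}}{=}Z_{q}^{+\infty}(x,r).$$
One can find sets $U\subset A$ and $V\subset A$ such that
$$Z_{-\infty}^{p}(x,r)=\bigcup_{a\in U}Z_{m_a}^{p}(x_a)\quad\text{and}\quad Z_{q}^{+\infty}(x,r)=\bigcup_{b\in V}Z_{q}^{n_b}(x_b),$$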
with the two unions being disjoint (mod 0). Since $f^{-k}Z_q^{n_b}(x_b)=Z_{p+1}^{n_b+k}(f^{-k}x_b)$, we obtain
$$B\cap f^{-k}B\subset\bigcup_{a\in U,\,b\in V}\bigl[Z_{m_a}^{p}(x_a)\cap Z_{p+1}^{n_b+k}(f^{-k}x_b)\bigr].$$
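The Gibbs property of the measure implies that there exists a constant $\kappa>0$ such that
$$\begin{aligned}\mu(B\cap f^{-k}B)&\le\sum_{a\in U,\,b\in V}\mu\bigl(Z_{m_a}^{p}(x_a)\cap Z_{p+1}^{n_b+k}(f^{-k}x_b)\bigr)\\&\le\kappa\sum_{a\in U,\,b\in V}\mu\bigl(Z_{m_a}^{p}(x_a)\bigr)\,\mu\bigl(Z_{p+1}^{n_b+k}(f^{-k}x_b)\bigr)\\&\le\kappa\sum_{a\in U,\,b\in V}\mu\bigl(Z_{m_a}^{p}(x_a)\bigr)\,\mu\bigl(Z_{q}^{n_b}(x_b)\bigr)\\&\le\kappa\,\mu\bigl(Z_{-\infty}^{p}(x,r)\bigr)\,\mu\bigl(Z_{q}^{+\infty}(x,r)\bigr).\end{aligned}\tag{21}$$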
Setting
$$r_k=\max\{\operatorname{diam}_sZ_q^{+\infty}(x,r),\ \operatorname{diam}_uZ_{-\infty}^{p}(x,r)\},$$
it follows from (20) that $r_k\le r+ce^{-\lambda k/2}$. Furthermore, for µ-almost every $x\in X$ (in fact for all Lyapunov regular points) we have
$$Z_{-\infty}^{p}(x,r)\subset\Psi_x\bigl(D^s(x,2r)\times D^u(x,2r_k)\bigr)\tag{22}$$
and
$$Z_q^{+\infty}(x,r)\subset\Psi_x\bigl(D^s(x,2r_k)\times D^u(x,2r)\bigr)\tag{23}$$
for all sufficiently small $r>0$. Since $k\ge t_r$, we have $r_k\le r+cr^{\lambda\delta/2}\le r^g$, where $g=\lambda\delta/3>0$ (provided that r is sufficiently small). Proposition 4 together with (21), (22), and (23), and the main theorem in [1], yield
$$\begin{aligned}\mu(B\cap f^{-k}B)&\le\kappa\,\mu\bigl(\Psi_x(D^s(x,2r)\times D^u(x,2r^g))\bigr)\,\mu\bigl(\Psi_x(D^s(x,2r^g)\times D^u(x,2r))\bigr)\\&\le\kappa r^{-2\varepsilon(r)}\,\mu^s_x(B^s(x,r))\,\mu^u_x(B^u(x,r^g))\,\mu^s_x(B^s(x,r^g))\,\mu^u_x(B^u(x,r))\\&\le\kappa r^{-4\varepsilon(r)}\,\mu(B(x,r))\,\mu(B(x,r^g))\le\kappa\,\mu(B)\,r^{-5\varepsilon(r)}r^{gd},\end{aligned}$$
where $\varepsilon(r)>0$ and $\varepsilon(r)\to0$ as $r\to0$. Set now
$$k_r=\frac{2}{\lambda}(\log c-\log r)+1.$$
If $k\ge k_r$ then $r_k\le2r$, and a similar argument establishes that $\mu(B\cap f^{-k}B)\le\kappa\,\mu(B)\,r^{-5\varepsilon(r)}r^{d}$. Then for any $s_r>0$, summing the estimate above as k runs from $t_r$ through $s_r$, we obtain
$$\frac{\mu(\{y\in B(x,r):\tau_r(y,x)\le s_r\})}{\mu(B(x,r))}\le\sum_{k=t_r}^{k_r}\kappa r^{-5\varepsilon(r)}r^{gd}+\sum_{k=k_r}^{s_r}\kappa r^{-5\varepsilon(r)}r^{d}\le\kappa r^{-5\varepsilon(r)}\bigl(k_rr^{gd}+s_rr^{d}\bigr).$$
By rechoosing ε(r) if necessary, we may assume that $\mu(B(x,r))\ge r^{d+\varepsilon(r)}$ for all sufficiently small r. Setting $s_r=\mu(B(x,r))^{-1+\varepsilon}$ we obtain
$$\frac{\mu(\{y\in B(x,r):\tau_r(y,x)\le s_r\})}{\mu(B(x,r))}\le\kappa r^{-5\varepsilon(r)}\bigl[(-\delta\log r)\,r^{gd}+r^{(d+\varepsilon(r))(-1+\varepsilon)}r^{d}\bigr],$$
provided that ε is sufficiently small. Since $gd>0$, $\varepsilon d>0$, and $\varepsilon(r)\to0$ as $r\to0$, we conclude that (7) holds. Since ε is arbitrarily small, the measure µ has long return time. The second statement is now an immediate consequence of Theorem 4 and Proposition 3, together with Young’s criteria (see [13]).
Proof of Theorem 6. It follows immediately from Proposition 17 in [2] that there exists a constant $c>1$ such that if $y=(x,s)\in Y\setminus(X\times\{0\})$ and $r>0$ is sufficiently small (possibly depending on x), then
$$B(x,r/c)\times I(s,r/c)\subset B_Y(y,r)\subset B(x,cr)\times I(s,cr),\tag{24}$$
where $I(s,r)=(s-r,s+r)\subset\mathbb R$. Therefore
$$\mu(B(x,r/c))\,\frac{2r}{c}\le\nu(B_Y(y,r))\le\mu(B(x,cr))\,2cr$$
for all sufficiently small r, and thus
$$\underline d_\nu(y)=\underline d_\mu(x)+1\quad\text{and}\quad\overline d_\nu(y)=\overline d_\mu(x)+1.$$
It follows from Proposition 3 that $\underline d_\nu(y)=\overline d_\nu(y)=\dim_H\mu+1$ for ν-almost every $y\in Y$, and applying Young’s criteria (see [13]) we obtain $\dim_H\nu=\dim_H\mu+1$. By (24) we obtain
$$\tau^\psi_r(y)\le\inf\{t>\rho_r(y):\psi_ty\in B(x,r/c)\times\{s\}\}+r/c,$$
$$\tau^\psi_r(y)\ge\inf\{t>\rho_r(y):\psi_ty\in B(x,cr)\times\{s\}\}-cr,$$
and hence
$$\sum_{k=0}^{\tau_{cr}(x)-1}\varphi(T^kx)-cr\le\tau^\psi_r(y)\le\sum_{k=0}^{\tau_{r/c}(x)-1}\varphi(T^kx)+r/c\tag{25}$$
for all sufficiently small $r>0$ (possibly depending on x). By Theorem 5, given $\varepsilon>0$ we have $r^{-d+\varepsilon}<\tau_r(x)<r^{-d-\varepsilon}$ for all sufficiently small r, where $d=\dim_H\mu$. By (25) and the ergodicity of µ, we obtain
$$(cr)^{-d+\varepsilon}\Bigl(\int_X\varphi\,d\mu-\varepsilon\Bigr)-cr\le\tau^\psi_r(y)\le(r/c)^{-d-\varepsilon}\Bigl(\int_X\varphi\,d\mu+\varepsilon\Bigr)+r/c$$
for all sufficiently small r. Therefore
$$\underline R^\psi(y)=\overline R^\psi(y)=\lim_{r\to0}\frac{\log\tau^\psi_r(y)}{-\log r}=d=\dim_H\mu$$
for ν-almost every $y\in Y$. This completes the proof of the theorem.
Acknowledgement. We gratefully thank Joerg Schmeling for several enlightening discussions. We also thank the referee for the detailed and helpful comments.
References
1. Barreira, L., Pesin, Ya. and Schmeling, J.: Dimension and product structure of hyperbolic measures. Ann. of Math. 149 (2), 755–783 (1999)
2. Barreira, L. and Saussol, B.: Multifractal analysis of hyperbolic flows. Commun. Math. Phys. 214, 339–371 (2000)
3. Boshernitzan, M.: Quantitative recurrence results. Invent. Math. 113, 617–631 (1993)
4. Bowen, R. and Walters, P.: Expansive one-parameter flows. J. Differential Equations 12, 180–193 (1972)
5. Federer, H.: Geometric measure theory. Berlin–Heidelberg–New York: Springer, 1969
6. Hirata, M., Saussol, B. and Vaienti, S.: Statistics of return times: A general framework and new applications. Commun. Math. Phys. 206, 33–55 (1999)
7. Ledrappier, F. and Young, L.-S.: The metric entropy of diffeomorphisms. Part II: Relations between entropy, exponents and dimension. Ann. of Math. 122 (2), 540–574 (1985)
8. Ornstein, D. and Weiss, B.: Entropy and data compression schemes. IEEE Trans. Inform. Theory 39, 78–83 (1993)
9. Pesin, Ya.: Dimension theory in dynamical systems: Contemporary views and applications. Chicago Lectures in Mathematics, Chicago: Chicago University Press, 1997
10. Pesin, Ya. and Weiss, H.: On the dimension of deterministic and random Cantor-like sets, symbolic dynamics, and the Eckmann–Ruelle conjecture. Commun. Math. Phys. 182, 105–153 (1996)
11. Saussol, B., Troubetzkoy, S. and Vaienti, S.: Recurrence, dimensions and Lyapunov exponents. In preparation
12. Schmeling, J. and Troubetzkoy, S.: Scaling properties of hyperbolic measures. DANSE Preprint 50/98, 1998
13. Young, L.-S.: Dimension, entropy and Lyapunov exponents. Ergodic Theory Dynam. Systems 2, 109–124 (1982)

Communicated by J. L. Lebowitz
Commun. Math. Phys. 219, 465 – 480 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Time Quasi-Periodic Unbounded Perturbations of Schrödinger Operators and KAM Methods
Dario Bambusi¹, Sandro Graffi²
¹ Dipartimento di Matematica “F. Enriques”, Università di Milano, Via Saldini 50, 20133 Milano, Italy.
² Dipartimento di Matematica, Università di Bologna, Piazza di Porta S Donato 5, 40127 Bologna, Italy.
Received: 3 October 2000 / Accepted: 20 December 2000
Abstract: We eliminate by KAM methods the time dependence in a class of linear differential equations in $\ell^2$ subject to an unbounded, quasi-periodic forcing. This entails the pure-point nature of the Floquet spectrum of the operator $H_0+\varepsilon P(\omega t)$ for ε small. Here $H_0$ is the one-dimensional Schrödinger operator $p^2+V$, $V(x)\sim|x|^\alpha$, $\alpha>2$ for $|x|\to\infty$, the time quasi-periodic perturbation P may grow as $|x|^\beta$, $\beta<(\alpha-2)/2$, and the frequency vector ω is non-resonant. The proof extends to infinite-dimensional spaces the result valid for quasiperiodically forced linear differential equations and is based on Kuksin’s estimate of solutions of homological equations with non-constant coefficients.

1. Introduction and Statement of the Results

Consider the non-autonomous linear differential equation in a separable Hilbert space $\mathcal H$
$$i\dot\psi(t)=\bigl(A+\varepsilon P(\omega_1t,\omega_2t,\dots,\omega_nt)\bigr)\psi(t),\qquad\psi(t)\in\mathcal H,\ \varepsilon\in\mathbb R,\tag{1.1}$$
under the following conditions:
A1 The operator A is positive self-adjoint. Spec(A) is discrete, and all eigenvalues $0<\lambda_1<\lambda_2<\lambda_3<\cdots$ are simple. There is $d>1$ such that
$$\lambda_i\sim i^d,\qquad i\to\infty.\tag{1.2}$$
A2 $P(\phi_1,\dots,\phi_n)\equiv P(\phi)$ is a function from the n-dimensional torus $\mathbb T^n\equiv\mathbb R^n/2\pi\mathbb Z^n$ into the symmetric operators in $\mathcal H$, and $\omega:=(\omega_1,\dots,\omega_n)\in[0,1]^n$ is a frequency vector.
A3 For $\delta\ge0$, denote by $\mathcal B^\delta$ the Banach space of all closed operators T in $\mathcal H$ such that $A^{-\delta/d}T$ is bounded (note that $\mathcal B^0=\mathcal L(\mathcal H)$), with norm
$$\|T\|_\delta:=\sup_{\|x\|_{\mathcal H}=1}\|A^{-\delta/d}Tx\|_{\mathcal H}.\tag{1.3}$$
466
D. Bambusi, S. Graffi
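Then the map $\mathbb T^n\ni\phi\mapsto P(\phi)\in\mathcal B^\delta$ is analytic for some $\delta<d-1$. Our purpose is to prove the following
Theorem 1.1. There exist $\varepsilon^*>0$, a subset $\Pi_\varepsilon\subset\Pi:=[0,1]^n$ and, if $|\varepsilon|<\varepsilon^*$ and $\omega\in\Pi_\varepsilon$, a unitary operator $U(\omega t)\equiv U(\omega_1t,\omega_2t,\dots,\omega_nt)$ in $\mathcal H$ with the following properties:
T1 $U(\omega t)$ is analytic in t and quasiperiodic with frequencies ω;
T2 $U(\omega t)$ transforms Eq. (1.1) into a system of the form
$$i\dot\chi(t)=A_\infty(\omega t)\chi(t),\tag{1.4}$$
$$A_\infty:=\operatorname{diag}\bigl(\lambda^\infty_1+\mu^\infty_1(\omega t),\ \lambda^\infty_2+\mu^\infty_2(\omega t),\ \lambda^\infty_3+\mu^\infty_3(\omega t),\dots\bigr).\tag{1.5}$$
Here $\{\lambda^\infty_i\}_{i=1}^\infty\subset\mathbb R$ and each function $\mu^\infty_i(\phi):\mathbb T^n\to\mathbb R$ is analytic with zero average;
T3 There exists $C>0$ such that
$$\|1-U(\omega t)\|_0\le C|\varepsilon|,\qquad|\lambda^\infty_i-\lambda_i|\le C|\varepsilon|i^\delta,\qquad|\mu^\infty_i(\omega t)|\le C|\varepsilon|i^\delta,\qquad|\Pi\setminus\Pi_\varepsilon|\to0\ \text{as}\ \varepsilon\to0.$$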
Straightforward integration of (1.4) reduces (1.1) to an autonomous system, which makes the almost-periodic nature of all its solutions evident.
Corollary 1.1. 1. If $|\varepsilon|<\varepsilon^*$ and $\omega\in\Pi_\varepsilon$, there exists a unitary transformation $U_F(\omega t)$, quasiperiodic with frequency ω and such that $\|1-U_F(\omega t)\|_\delta\le C|\varepsilon|$, which transforms (1.1) into the system
$$i\dot x=A_Fx,\qquad A_F:=\operatorname{diag}(\lambda^\infty_1,\lambda^\infty_2,\lambda^\infty_3,\dots);\tag{1.6}$$
2. For any initial datum $\psi_0$ the solution $\psi(t)$ of (1.1) is almost-periodic with frequencies $2\pi/\lambda^\infty_1,2\pi/\lambda^\infty_2,\dots;\ \omega_1,\dots,\omega_n$, i.e. it has the form
$$\psi(t)=\sum_{i=1}^{\infty}\phi^i_0(\omega t)\,e^{-i\lambda^\infty_it},\tag{1.7}$$
where $\{\phi^i_0(\omega t)\}_{i=1}^\infty$ are the components of $U(\omega t)\psi_0$ along the eigenvector basis of A.
The above result can be equivalently formulated in terms of the Floquet spectrum ([21], and [12] for the quasi-periodic case). Consider indeed on $\mathcal K:=\mathcal H\otimes L^2(\mathbb T^n)$ the Floquet Hamiltonian operator
$$K_F:=-i\sum_{l=1}^{n}\omega_l\frac{\partial}{\partial\phi_l}+A+\varepsilon P(\phi).\tag{1.8}$$
The maximal operator in $\mathcal K$ generated by the differential expression (1.8), still denoted $K_F$, is self-adjoint by A3, which makes $A+\varepsilon P(\omega t)$ self-adjoint on $D(A)$ for all t. Then:
Corollary 1.2. For $|\varepsilon|\le\varepsilon^*$ and $\omega\in\Pi_\varepsilon$ the spectrum of $K_F$ is pure point; its eigenvalues are $\nu_{j,k}:=\lambda^\infty_j+k\cdot\omega$, $j=1,2,3,\dots$, $k\in\mathbb Z^n$.
Nonautonomous Schrödinger Operators and KAM Methods
467
Remark 1. 1. This corollary extends to unbounded and quasiperiodic perturbations the analogous result valid for operators $K_F$ with $P(\phi)$ periodic and differentiable in φ as a bounded operator in $\mathcal H$ [5, 6]. The gap condition is the same by condition A1, but here the analyticity of the perturbation is required.
2. The KAM methods of [5, 6], first implemented in [2] (see also [3]), made it possible to strengthen, for small coupling, the original result of [10] (see also [14, 17]) from absence of absolutely continuous spectrum to absence of continuous spectrum. Here too the set $\Pi_\varepsilon$ is the set of all frequencies fulfilling a diophantine condition with respect to the differences $\lambda_i-\lambda_j$. Moreover, a result of the type of Corollary 1.1, up to an exponentially small error, has been proved in [11] for a class of bounded perturbations via the Nekhoroshev technique.
3. Our proof extends to infinite-dimensional spaces the KAM technique used to eliminate the time dependence of quasiperiodically forced ordinary linear differential equations [1, 13, 20]. The main technical point is that the relevant homological equation has variable coefficients, but it can be solved by a technique developed by Kuksin [16] in the context of his analysis of the KdV equation by KAM theory.
As in [3, 5, 6, 10, 14, 17, 11], the main motivation for this corollary is the (Floquet) spectral analysis of the time-dependent Schrödinger equation in dimension one, namely:
Theorem 1.2. Consider the time-dependent Schrödinger equation
$$H(t)\psi(x,t)=i\partial_t\psi(x,t),\quad x\in\mathbb R;\qquad H(t):=-\frac{d^2}{dx^2}+Q(x)+\varepsilon V(x,\omega t),\quad\varepsilon\in\mathbb R,\tag{1.9}$$
and the corresponding Floquet Hamiltonian (1.8) under the following conditions:
1. $Q(x)\in C^\infty(\mathbb R;\mathbb R)$, $Q(x)\sim|x|^\alpha$ for some $\alpha>2$ as $|x|\to\infty$;
2. $V(x,\phi)$ is a $C^\infty(\mathbb R;\mathbb R)$-valued holomorphic function of $\phi\in\mathbb T^n$, with $|V(x,\phi)|\,|x|^{-\beta}$ bounded as $|x|\to\infty$ for some $\beta<\frac{\alpha-2}{2}$.
Then there is $\varepsilon^*>0$ such that the spectrum of $K_F$ is pure point for all $|\varepsilon|<\varepsilon^*$, $\omega\in\Pi_\varepsilon$.
Remark 2. 1. We prove the result in the more general case where V is a $C^\infty(\mathbb R^2;\mathbb R)$-valued holomorphic function $V(x,\xi;\phi)$ of $\phi\in\mathbb T^n$ with $|V(x,\xi;\phi)|\,(|\xi|^2+|x|^\alpha)^{-\delta/d}$ bounded as $|\xi|+|x|\to\infty$. Here $V(\phi)$ is realized as a pseudodifferential operator family in $L^2(\mathbb R)$ of class $G^\beta_\rho$ (see e.g. [19], Chapter 8) with Weyl symbol V.
2. For $\alpha=4$ we get $\beta<1$. Hence the quantum version of the original Duffing oscillator $H(t)=-\frac{d^2}{dx^2}+x^4+\varepsilon x\sin(\omega t)$ lies just outside the validity range of this corollary.
3. In the periodic case ($n=1$) we see that, as in classical mechanics (see e.g. [7, Chap. 5.13]), not even an unbounded perturbation delocalizes the system if its strength is small enough and its frequency is not too close to a resonant one. There is no diffusion (for ε small enough) in the classical counterpart of (1.9) even for resonant values of ω, but there are chaotic regions in phase space localized around the resonant actions. In this case it is still unknown whether or not the quantum Floquet spectrum is pure point, even for bounded perturbations. On the other hand, for $0<\alpha\le2$, when condition (1.2) is not satisfied, the nature of the Floquet spectrum is still unknown apart from the globally resonant case [8, 9].
4. In the quasiperiodic case ($n\ge2$) the quantized system behaves as in the periodic one, even though in the classical counterpart of (1.9) there are no topological obstructions to the growth of energy.

2. The Formal Construction

Without loss of generality equation (1.1) can be written as a first-order system in $\ell^2$:
$$i\dot x=(A+\varepsilon P(\omega t))x,\qquad x\in\ell^2,\tag{2.1}$$
$$A=\operatorname{diag}(\lambda_1,\lambda_2,\lambda_3,\dots),\qquad\lambda_i\in\mathbb R,\quad\lambda_i>0,\tag{2.2}$$
where $\lambda_i$ and $P(\omega t)\equiv P(\omega_1t,\omega_2t,\dots,\omega_nt)$ fulfill conditions A1–A3. The key point of any KAM method is the construction of a coordinate transformation mapping the original problem into a new one of the same form with a much smaller perturbation, typically of the size of the square of the original one. Here we construct and estimate, by an algorithm very close to that of [11], a unitary operator which maps (2.1) into an equation of the same form but with a perturbation of order $\varepsilon^2$. In this section we describe the procedure; in Sect. 3 we work out the estimates, and in Sect. 4 we set up the iterative scheme and prove its convergence.
Let $B(\phi_1,\dots,\phi_n)\in\mathcal B^0$ be anti-selfadjoint for all $\phi\in\mathbb T^n$. Given the unitary operator $e^{B(\phi)}$, for fixed $\omega\in\Pi$ perform the change of basis $x=e^{B(\omega t)}y$. Substitution in (2.1) yields
$$i\dot y=(A+\tilde P^1(\omega t))y.\tag{2.3}$$
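The new perturbation $\tilde P^1$ is (the explicit dependence of B on t is omitted)
$$\tilde P^1:=\bigl\{[A,B]-i\dot B+\varepsilon P\bigr\}+e^{-B}Ae^{B}-A-[A,B]+\varepsilon\bigl(e^{-B}Pe^{B}-P\bigr)-i\Bigl(e^{-B}\frac{d}{dt}\bigl(e^{B}\bigr)-\dot B\Bigr).\tag{2.4}$$
If B makes the curly bracket vanish, $\tilde P^1$ becomes of order $\varepsilon^2$. Hence we study the equation
$$[A,B]-i\dot B+\varepsilon P=0.\tag{2.5}$$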
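For the reader’s convenience, here is the formal first-order expansion behind (2.4). It is a standard computation written out as a sketch rather than quoted from the paper; it assumes that B is of order ε and ignores the domain questions raised by the unboundedness of A and P.

$$
\begin{aligned}
x=e^{B}y\ \Longrightarrow\ i\dot y&=e^{-B}(A+\varepsilon P)e^{B}y-i\,e^{-B}\frac{d}{dt}\bigl(e^{B}\bigr)y,\\
e^{-B}Ae^{B}&=A+[A,B]+O(\|B\|^2),\qquad\varepsilon\,e^{-B}Pe^{B}=\varepsilon P+O(\varepsilon\|B\|),\\
e^{-B}\frac{d}{dt}\bigl(e^{B}\bigr)&=\dot B+O(\|B\|\,\|\dot B\|),\\
\text{hence}\quad\tilde P^1&=\bigl\{[A,B]-i\dot B+\varepsilon P\bigr\}+O(\varepsilon^2)\quad\text{if }B=O(\varepsilon).
\end{aligned}
$$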
Taking its matrix elements between the eigenvectors of A, this equation becomes
$$-i\sum_{l=1}^{n}\omega_l\frac{\partial B_{ij}}{\partial\phi_l}+(\lambda_i-\lambda_j)B_{ij}=-\varepsilon P_{ij}.\tag{2.6}$$
Expand both sides in Fourier series, i.e. write
$$B_{ij}=\sum_{k\in\mathbb Z^n}\hat B_{ijk}e^{ik\cdot\phi},\qquad P_{ij}=\sum_{k\in\mathbb Z^n}\hat P_{ijk}e^{ik\cdot\phi}.$$
Equating the Fourier coefficients of both sides, (2.6) becomes
$$(\omega\cdot k+\lambda_i-\lambda_j)\hat B_{ijk}=-\varepsilon\hat P_{ijk}.$$
Clearly this equation cannot be solved when $i=j$ and $k=0$. Assuming now ω such that $\omega\cdot k+\lambda_i-\lambda_j\ne0$ when $i\ne j$ or $k\ne0$, the natural definition of B would be the operator with matrix elements
$$B_{ij}:=-\varepsilon\sum_{k\in\mathbb Z^n}\frac{\hat P_{ijk}}{\omega\cdot k+\lambda_i-\lambda_j}e^{ik\cdot\phi},\quad i\ne j;\qquad B_{ii}:=-\varepsilon\sum_{k\in\mathbb Z^n\setminus\{0\}}\frac{\hat P_{iik}}{\omega\cdot k}e^{ik\cdot\phi}.\tag{2.7}$$
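In the simplest case, with a single frequency ($n=1$) and a scalar unknown, an equation of the type (2.6), $-i\omega\chi'(\phi)+E\chi(\phi)=b(\phi)$ on the circle, is solved coefficientwise: $\hat\chi_k=\hat b_k/(\omega k+E)$ whenever $\omega k+E\ne0$. This is also the constant-coefficient case $E_2=0$ of Kuksin’s equation (3.14). The Python sketch below assumes that b is a trigonometric polynomial stored as a dictionary of Fourier coefficients (an illustrative choice; the helper names are hypothetical) and checks the equation at one point using a finite-difference derivative.

```python
import cmath
from typing import Dict

def solve_homological(b_hat: Dict[int, complex], omega: float, energy: float) -> Dict[int, complex]:
    """Fourier coefficients of the solution of -i*omega*chi' + energy*chi = b on R/(2*pi*Z)."""
    chi_hat: Dict[int, complex] = {}
    for k, b_k in b_hat.items():
        divisor = omega * k + energy          # the (possibly small) divisor omega*k + E
        if divisor == 0:
            raise ZeroDivisionError(f"resonant Fourier mode k={k}")
        chi_hat[k] = b_k / divisor
    return chi_hat

def evaluate(coeffs: Dict[int, complex], phi: float) -> complex:
    """Evaluate sum_k c_k exp(i k phi)."""
    return sum(c * cmath.exp(1j * k * phi) for k, c in coeffs.items())

omega, energy = (5 ** 0.5 - 1) / 2, 3.0
b_hat = {-2: 0.5, 0: 1.0, 1: 0.25j, 3: -0.1}
chi_hat = solve_homological(b_hat, omega, energy)

# Check the equation at a sample point, using a central difference for chi'.
phi, h = 0.7, 1e-5
dchi = (evaluate(chi_hat, phi + h) - evaluate(chi_hat, phi - h)) / (2 * h)
residual = -1j * omega * dchi + energy * evaluate(chi_hat, phi) - evaluate(b_hat, phi)
print(abs(residual))  # small, limited only by the finite-difference error
```

The same division by $\omega\cdot k+\lambda_i-\lambda_j$ appears in (2.7); the difficulty addressed in the text is that after the first step the diagonal entries depend on φ, so this coefficientwise division is no longer available.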
The second line in (2.4) is of order $\varepsilon^2$ only if the operator B is bounded. However, P is not bounded; as a consequence the operator $\operatorname{diag}(B_{ii})$ is in general unbounded, and the above definition cannot yield the desired result. The idea is therefore to define B by the first of (2.7) with $B_{ii}=0$; one can guess that, since the denominators $\omega\cdot k+\lambda_i-\lambda_j$ tend to infinity as i or j diverge, it should be possible to generate a bounded B even if P is unbounded. In the next section we will prove that this is actually the case. With the above definition of B the curly bracket in (2.4) turns out to be the operator $\varepsilon\operatorname{diag}(P_{ii})$, and hence in terms of the variables y the equation takes the form
$$i\dot y=(A_1+\varepsilon^2P^1(\omega t))y,\qquad A_1=A+\varepsilon\operatorname{diag}(P_{ii}(\omega t)).$$
This system is defined only for ω in the subset of Π where the denominators in (2.7) do not vanish. In the next section we will assume a diophantine-type condition also for such denominators, valid on a Cantor subset of Π. It will then turn out that $P^1$ depends in a Lipschitz way on ω in such a subset. Iterating the construction, we see that the operator A is replaced by the operator $A_1$, which depends also on the angles φ. As we shall see, this is precisely the point where Kuksin’s result [16] enters in a critical way.

3. Squaring the Order of the Perturbation

Keeping in mind the discussion of the preceding section, we first set some notation, and then construct and estimate the transformation squaring the order of the perturbation. Let $\mathbb T^n_s$ be the complexified torus with $|\operatorname{Im}\phi_i|\le s$. If f is an analytic function from $\mathbb T^n_s$ to a Banach space (in what follows $\mathbb C$ or the complexification of $\mathcal B^\delta$), we denote
$$\|f\|_s=\sup_{\phi\in\mathbb T^n_s}\|f(\phi)\|.$$
For $\mathcal B^\delta$-valued functions we use the particular symbol
$$\|f\|_{\delta,s}:=\sup_{\phi\in\mathbb T^n_s}\|f(\phi)\|_\delta.$$
Let $\Pi^-$ be a closed nonempty subset of Π of positive measure. If f has an additional (Lipschitz continuous) dependence on $\omega\in\Pi^-$, we define the norm
$$\|f\|^L_s:=\|f\|_s+\sup_{\phi\in\mathbb T^n}\ \sup_{\omega\ne\omega'\in\Pi^-}\frac{\|f(\phi,\omega)-f(\phi,\omega')\|}{|\omega-\omega'|}.$$
In particular, for $\mathcal B^\delta$-valued functions we use the notation $\|\cdot\|^L_{\delta,s}$.
Let us now include our system into a more general framework, which, by the above discussion, is convenient for the iteration scheme. Consider in $\ell^2$ the equation
$$i\dot x=(A^-+P^-(\omega t))x\tag{3.1}$$
under the following conditions:
H1)
$$A^-=\operatorname{diag}\bigl(\lambda^-_1(\omega)+\mu^-_1(\omega t,\omega),\ \lambda^-_2(\omega)+\mu^-_2(\omega t,\omega),\ \lambda^-_3(\omega)+\mu^-_3(\omega t,\omega),\dots\bigr).\tag{3.2}$$
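Here:
H1.a) For all $i=1,2,\dots$, $\lambda^-_i(\omega)$ is positive and Lipschitz continuous w.r.t. $\omega\in\Pi^-$; moreover $\lambda^-_i\sim i^d$, uniformly in $\omega\in\Pi^-$. Hence there is $C^-_\lambda>0$ independent of ω such that
$$|\lambda^-_i-\lambda^-_j|\ge C^-_\lambda|i^d-j^d|.\tag{3.3}$$
H1.b) There are $C^-_\omega>0$ suitably small and $\delta<d-1$ such that
$$\sup_{\omega\ne\omega'\in\Pi^-}\frac{|\lambda^-_i(\omega)-\lambda^-_i(\omega')|}{|\omega-\omega'|}\le C^-_\omega i^\delta.\tag{3.4}$$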
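H1.c) For all $i=1,2,\dots$, $\mu^-_i:\mathbb T^n_s\times\Pi^-\to\mathbb R$ is analytic w.r.t. φ, Lipschitz continuous w.r.t. ω, and has zero average, i.e. $\int_{\mathbb T^n}\mu^-_i(\phi,\omega)\,d\phi=0$. Moreover it fulfills the estimates
$$\|\mu^-_i\|_s\le C^-_\mu i^\delta,\tag{3.5}$$
$$\sup_{\phi\in\mathbb T^n_s}\ \sup_{\omega\ne\omega'\in\Pi^-}\frac{|\mu^-_i(\omega,\phi)-\mu^-_i(\omega',\phi)|}{|\omega-\omega'|}\le C^-_\omega i^\delta.\tag{3.6}$$
H2) The operator-valued function $P^-:\mathbb T^n_s\times\Pi^-\to\mathcal B^\delta$ is analytic with respect to $\phi\in\mathbb T^n_s$ and Lipschitz continuous w.r.t. $\omega\in\Pi^-$.
H3) There exist $\gamma^->0$ and $\tau>n+1+2/(d-1)$ such that, for any $\omega\in\Pi^-$, one has
$$|\omega\cdot k|\ge\frac{\gamma^-}{|k|^\tau},\qquad\forall k\in\mathbb Z^n\setminus\{0\},\tag{3.7}$$
$$|\lambda^-_i-\lambda^-_j+\omega\cdot k|\ge\frac{\gamma^-|i^d-j^d|}{1+|k|^\tau},\qquad\forall k\in\mathbb Z^n,\ i\ne j.\tag{3.8}$$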
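Condition (3.7) can be inspected numerically for a given frequency vector. The sketch below is illustrative only: it assumes that $|k|$ denotes the $\ell^1$ norm (the norm is not specified in this section), uses the hypothetical helper name `diophantine_constant`, and takes $\omega=(1,(\sqrt5-1)/2)\in[0,1]^2$ with, for instance, $\tau=5.5$. It returns $\min|\omega\cdot k|\,|k|^\tau$ over $0<|k|\le K$, i.e. the largest γ for which (3.7) holds on that finite range of k; no finite computation can certify (3.7) for all k.

```python
import itertools
import math
from typing import Sequence

def diophantine_constant(omega: Sequence[float], tau: float, K: int) -> float:
    """min of |omega . k| * |k|**tau over integer vectors k with 0 < |k|_1 <= K."""
    best = math.inf
    for k in itertools.product(range(-K, K + 1), repeat=len(omega)):
        norm = sum(abs(c) for c in k)          # |k| taken as the l^1 norm (assumption)
        if norm == 0 or norm > K:
            continue
        small_divisor = abs(sum(w * c for w, c in zip(omega, k)))
        best = min(best, small_divisor * norm ** tau)
    return best

omega = (1.0, (math.sqrt(5) - 1) / 2)
print(diophantine_constant(omega, tau=5.5, K=40))
```

For this ω the minimum is attained at $k=(0,\pm1)$, so the computed constant equals $(\sqrt5-1)/2$; the golden ratio is badly approximable, which keeps $|\omega\cdot k|\,|k|^\tau$ large for all larger k.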
Remark 1. In the next section we will prove that it is possible to construct a set $\Pi^-$ of positive measure such that the original system (1.1) also fulfills the above assumptions.
Let now
$$B:\mathbb T^n_s\ni(\phi_1,\dots,\phi_n)\mapsto B(\phi_1,\dots,\phi_n)\in\mathcal B^0\tag{3.9}$$
be an analytic map with $B(\phi_1,\dots,\phi_n)$ anti-selfadjoint for each real value of $(\phi_1,\dots,\phi_n)$. Consider the corresponding unitary operator $e^{B(\phi_1,\dots,\phi_n)}$ and, as above, for any $\omega\in\Pi^-$ consider the unitary change of basis $x=e^{B(\omega t)}y$. Substitution in Eq. (3.1) yields
$$i\dot y=(A^++P^+(\omega t))y,\tag{3.10}$$
$$A^+:=A^-+\operatorname{diag}(P^-).\tag{3.11}$$
Here $\operatorname{diag}(P^-)$ is the diagonal matrix formed by the diagonal elements of $P^-$, that is $\operatorname{diag}(P^-):=\operatorname{diag}(P^-_{11}(\omega t),P^-_{22}(\omega t),P^-_{33}(\omega t),\dots)$. The new perturbation $P^+$ is given by (the explicit dependence of B on t is omitted)
$$P^+:=\bigl\{[A^-,B]-i\dot B+(P^--\operatorname{diag}(P^-))\bigr\}+e^{-B}A^-e^{B}-A^--[A^-,B]+e^{-B}P^-e^{B}-P^--i\Bigl(e^{-B}\frac{d}{dt}\bigl(e^{B}\bigr)-\dot B\Bigr).\tag{3.12}$$
According to the standard procedure we subtract the mean of the perturbation. Namely, we write $A^+=\operatorname{diag}(\lambda^+_i+\mu^+_i(\omega t))$, where $\lambda^+_i=\lambda^-_i+\overline{P^-_{ii}(\phi)}$ (the overline denotes the angular average). Hence the functions $\mu^+_i(\phi)$ have zero average; the quantities $\lambda^+_i$ are independent of φ and by A3 fulfill the estimate $|\lambda^+_i-\lambda^-_i|\le C^-_\mu i^\delta$. The main step of the proof is to construct B so as to make the curly bracket in (3.12) vanish, i.e. to solve for the unknown B the equation
$$[A^-,B]-i\dot B+(P^--\operatorname{diag}(P^-))=0.\tag{3.13}$$
The procedure explained in the previous section has to be modified, since now the eigenvalues of $A^-$ depend also on the angles $\phi$. The construction is based on a lemma by Kuksin [16] that we now summarize. On the $n$-dimensional torus consider the equation
$$-i\sum_{k=1}^{n}\omega_k\frac{\partial}{\partial\phi_k}\chi(\phi) + E_1\chi(\phi) + E_2h(\phi)\chi(\phi) = b(\phi). \qquad (3.14)$$
Here $\chi$ denotes the unknown, while $b$ and $h$ denote given analytic functions on $\mathbb{T}^n_s$; $h$ has zero average and $\|h\|_s \le 1$; $E_1$ and $E_2$ are positive constants. Concerning the frequency vector $\omega = (\omega_1, \dots, \omega_n)$ the assumptions are:
$$|\omega\cdot k| \ge \frac{\gamma_2}{|k|^\tau}, \quad \forall\, k \in \mathbb{Z}^n - \{0\}, \qquad |\omega\cdot k + E_1| \ge \frac{\gamma_1}{1+|k|^\tau}, \quad \forall\, k \in \mathbb{Z}^n. \qquad (3.15)$$
The final hypothesis is an order assumption on the magnitude of the different parameters: given $0 < \theta < 1$ and $C > 0$, we assume
$$E_1^\theta \ge CE_2. \qquad (3.16)$$
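When $h \equiv 0$ (equivalently $E_2 = 0$), Eq. (3.14) has constant coefficients and can be solved mode by mode. Writing $\chi(\phi) = \sum_k \chi_k e^{ik\cdot\phi}$, one gets $(\omega\cdot k + E_1)\chi_k = b_k$, so $\chi_k = b_k/(\omega\cdot k + E_1)$. The second condition in (3.15) is exactly what keeps these divisors away from zero.

The NumPy sketch below carries out this division on the 2-torus with an FFT. It only illustrates the small-divisor structure in that simplified case. It does not implement Kuksin's lemma, whose point is precisely to handle the variable coefficient $E_2h(\phi)$. The function name is hypothetical.

```python
import numpy as np


def solve_constant_coefficient(b, omega, E1):
    """Solve -i (omega . grad) chi + E1 * chi = b on the 2-torus (case h = 0).

    b     : complex samples of b on a uniform N x N grid of [0, 2*pi)^2
    omega : frequency vector (omega_1, omega_2)
    E1    : constant shift; the divisors omega.k + E1 must not vanish
    """
    N = b.shape[0]
    k = np.fft.fftfreq(N, d=1.0 / N)               # integer wave numbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    divisors = omega[0] * k1 + omega[1] * k2 + E1   # small divisors omega.k + E1
    if np.min(np.abs(divisors)) == 0.0:
        raise ZeroDivisionError("resonance: omega.k + E1 = 0 for some k")
    chi_hat = np.fft.fft2(b) / divisors             # (omega.k + E1) chi_k = b_k
    return np.fft.ifft2(chi_hat)
```

A trigonometric polynomial $\chi$ can be recovered exactly from $b$: build $b$ by multiplying each Fourier mode of $\chi$ by $\omega\cdot k + E_1$, then call the function.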
472
D. Bambusi, S. Graffi
Lemma 3.1. (Kuksin) Under the above assumptions Eq. (3.14) has a unique analytic solution $\chi$ which, for any $0 < \sigma < s$, fulfills the estimate
$$\|\chi\|_{s-\sigma} \le \frac{C_1}{\gamma_1\sigma^{a_1}}\exp\left(\frac{C_2}{\gamma_2^{a_2}\sigma^{a_3}}\right)\|b\|_s. \qquad (3.17)$$
Here $a_1, a_2, a_3, C_1, C_2$ are constants independent of $E_1, E_2, \sigma, s, \gamma_1, \gamma_2, \omega$.

To apply this lemma to the construction and estimation of $B$, denote by $G$ the Banach space of all bounded operators $B$ in $\ell^2$ such that $A^{-\delta/d}BA^{\delta/d}$ extends to a bounded linear operator. The norm in $G$ is denoted
$$\|B\|^G := \max\left(\|B\|_0,\ \|A^{-\delta/d}BA^{\delta/d}\|_0\right). \qquad (3.18)$$
Moreover, for the $s$-norms of an analytic function on the torus taking values in $G$ (possibly Lipschitz continuous in $\omega \in \Omega^-$) we will use the notations $\|B\|^G_s$ and $\|B\|^{G,L}_s$.
In what follows the notation $a \lesssim b$ stands for “there exists a constant $C$ independent of $C_\omega^\pm, C_\mu^\pm, \gamma^\pm, s, \sigma, i, j, K$ (some of these parameters will be defined later on) such that $a \le Cb$”. Equivalently we will use the notation $b \gtrsim a$.

Lemma 3.2. Let $\frac{\delta}{d-1} < \theta < 1$, $\gamma_* > 0$, $C_\omega^* > 0$, and $C^* > 0$ be fixed. Assume that
$$C^* > \frac{C_\mu^-}{C_\lambda^-}, \qquad \gamma^- \ge \gamma_*, \qquad C_\omega^- \le C_\omega^*. \qquad (3.19)$$
Then for any $0 < \sigma < s$ Eq. (3.13) has a unique solution $B \in G$, analytic on $\mathbb{T}^n_{s-\sigma}$, fulfilling the estimate
$$\|B\|^{G,L}_{s-\sigma} \lesssim \frac{1}{\sigma^{b_1}}\exp\left(\frac{c}{\sigma^{b_2}}\right)\|P^-\|^L_{\delta,s}. \qquad (3.20)$$
Here $c, b_1, b_2$ are constants depending only on $\theta, n, \tau, \delta, C^*, \gamma_*, C_\omega^*$.

Proof. Taking matrix elements among eigenvectors of $A^-$, Eq. (3.13) becomes
$$-i\sum_{k=1}^{n}\omega_k\frac{\partial}{\partial\phi_k}B_{ij} + (\lambda_i^- - \lambda_j^-)B_{ij} + \left(\mu_i^-(\phi) - \mu_j^-(\phi)\right)B_{ij} = -P^-_{ij}, \qquad i \ne j. \qquad (3.21)$$
The first inequality of (3.19) ensures that (3.16) holds with a suitable $C$ independent of all the relevant constants. A direct application of Kuksin’s Lemma then shows that (3.21) has a unique analytic solution fulfilling the estimate
$$\|B_{ij}\|_{s-\sigma} \lesssim \frac{1}{\gamma^-|i^d - j^d|\,\sigma^{a_1}}\exp\left(\frac{c}{(\gamma^-)^{a_2}\sigma^{a_3}}\right)\|P^-_{ij}\|_s. \qquad (3.22)$$
To estimate the sup norm of $B$ we use Lemma 5.2. To this end, first remark that $|i^d - j^d| \ge |i-j|(i^\delta + j^\delta)$. Then consider the infinite matrices of elements
$$\frac{P_{ij}}{i^\delta + j^\delta}, \qquad \frac{P_{ij}\,i^\delta}{j^\delta(i^\delta + j^\delta)}.$$
Assumption H2 entails a fortiori that these infinite matrices represent bounded operators in $\ell^2$. Lemma 5.2 then yields the estimate of the sup norm of $B$ and of $A^{-\delta/d}BA^{\delta/d}$, i.e. one has
$$\|B\|^G_{s-2\sigma} \lesssim \frac{1}{\sigma^{a_1+n}}\exp\left(\frac{c}{\sigma^{a_3}}\right)\|P^-\|_{\delta,s} \qquad (3.23)$$
after redefinition of $\sigma$ as $2\sigma$ and of the constant $c$. To obtain the estimate of the Lipschitz norm we proceed as follows. Given a function $B$ of $\omega$, set
$$\Delta B := B(\omega) - B(\omega'). \qquad (3.24)$$
Applying the operator $\Delta$ to (3.21), one gets that $\Delta B_{ij}$ fulfills an analogous equation. Hence, by Kuksin’s Lemma, its solution $\Delta B$ can be estimated by the same argument used for $B$. Dividing by $|\omega - \omega'|$ and applying Lemma 5.2 again, one gets
$$\left\|\frac{\Delta B}{\Delta\omega}\right\|_{s-3\sigma} \lesssim \frac{1}{\sigma^{a_1}}\exp\left(\frac{c}{\sigma^{a_3}}\right)\left(\|P^-\|^L_{\delta,s} + \|P^-\|_{\delta,s}\right),$$
whence the proof, redefining $\sigma$ as $3\sigma$ and taking the sup as above. □
We are now ready to state and prove the main result of this section.

Lemma 3.3. Consider the system (3.1) within the stated assumptions, and assume furthermore that (3.19) also holds. Then there exists an anti-selfadjoint operator $B \in G$, analytically depending on $\phi \in \mathbb{T}^n_{s-\sigma}$ and Lipschitz continuous in $\omega \in \Omega^-$, such that
1. $B$ fulfills the estimate (3.20);
2. for any $\omega \in \Omega^-$ the unitary operator $e^{B(\omega t)}$ transforms the system (3.1) into the system (3.10);
3. the new perturbation $P^+$ fulfills the estimate
$$\|P^+\|^L_{\delta,s-\sigma} \lesssim \left(\|P^-\|^L_{\delta,s}\right)^2\exp\left(\frac{c}{\sigma^{b_1}}\right); \qquad (3.25)$$
4. for any positive $K$ such that $1 + K^\tau < \gamma^-/\|P^-\|_{\delta,s}$, there exist a closed set $\Omega^+ \subset \Omega^-$ and a $d_4 > 1$ (independent of $K$) fulfilling
$$|\Omega^- - \Omega^+| \lesssim \frac{\gamma^-}{K^{d_4}}; \qquad (3.26)$$
5. if $\omega \in \Omega^+$, then assumptions H1–H3 above are also fulfilled by $A^+$, provided the constants are replaced by the new ones defined by
$$\gamma^+ = \gamma^- - \|P^-\|_{\delta,s}(1 + K^\tau), \qquad C_\mu^+ = C_\mu^- + \|P^-\|_{\delta,s}, \qquad (3.27)$$
$$C_\omega^+ = C_\omega^- + \|P^-\|^L_{\delta,s}, \qquad C_\lambda^+ = C_\lambda^- - 2\|P^-\|_{\delta,s}. \qquad (3.28)$$
Proof. The estimates on $B$ are an obvious consequence of Lemma 3.2 above. The estimate (3.25) is an immediate consequence of Lemmas 5.3 and 5.4. Concerning (3.27) and (3.28), the only nontrivial fact to be proved is the existence of a set $\Omega^+$ such that, for $\omega \in \Omega^+$, (3.7) and (3.8) are fulfilled with the new value of $\gamma$. Since (3.7) obviously holds, we examine (3.8). First remark that one has $|\lambda_i^- - \lambda_i^+| \le \|P^-\|_{\delta,s}\,i^\delta$. Therefore, for $|k| \le K$, we can write, by (3.8) and the inequality $|i^d - j^d| \ge (i^\delta + j^\delta)$:
$$\left|\lambda_i^+ - \lambda_j^+ - \omega\cdot k\right| \ge \left|\lambda_i^- - \lambda_j^- - \omega\cdot k\right| - \|P^-\|_{\delta,s}(i^\delta + j^\delta) \ge \frac{\gamma^- - \|P^-\|_{\delta,s}(1 + K^\tau)}{1 + |k|^\tau}\,|i^d - j^d|.$$
Hence (3.8) is satisfied for such values of $k$. Fix $i, j, k$ and set
$$R_{ijk}(\alpha) := \left\{\omega \in \Omega^- : \left|\lambda_i^+ - \lambda_j^+ - \omega\cdot k\right| \le \alpha\right\}, \qquad i \ne j, \qquad (3.29)$$
$$\Omega^+ := \Omega^- - \bigcup_{i,j,\ |k|\ge K} R_{ijk}\left(\frac{\gamma|i^d - j^d|}{1 + |k|^\tau}\right). \qquad (3.30)$$
By Lemma 5.5 the set (3.29) is nonempty only if $|k| \ge |i^d - j^d|(C_\lambda^- - \gamma^-)$, and by Lemma 5.6 one has
$$\left|R_{ijk}\left(\frac{\gamma|i^d - j^d|}{1 + |k|^\tau}\right)\right| \lesssim \frac{\gamma|i^d - j^d|}{(1 + |k|^\tau)|k|}.$$
Since $|i^d - j^d| \ge |i - j|(i^{d-1} + j^{d-1})$, the cardinality of the set $\{(i, j) : |i^d - j^d| \le L\}$ is bounded by an absolute constant times $L^{2/(d-1)}$. Hence, if $\tau > n + 1 + 2/(d-1)$, one has
$$\sum_{ijk:\,|k|\ge K}\left|R_{ijk}\left(\frac{\gamma|i^d - j^d|}{1 + |k|^\tau}\right)\right| \lesssim \sum_{|k|\ge K,\ |i^d - j^d|\le C|k|}\frac{\gamma|i^d - j^d|}{(1 + |k|^\tau)|k|} \lesssim \gamma\sum_{s\ge K}\frac{1}{s^{\tau - n + 1 - 2/(d-1)}} \lesssim \frac{\gamma}{K^{d_4}}, \qquad (3.31)$$
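and this proves the assertion. □

4. Iteration

In this section we set up the iteration needed to prove the stated results. First we preassign the values of the various constants occurring in the iterative estimates. Hence we keep $\varepsilon$, $K$, $s$ and $\gamma$ fixed and define, for $l \ge 1$,
$$\varepsilon_l := \varepsilon^{(4/3)^l}, \qquad \sigma_l := \frac{s}{4l^2}, \qquad s_l = s_{l-1} - \sigma_l, \qquad K_l := lK, \qquad (4.1)$$
$$\gamma_l = \gamma_{l-1} - 4\varepsilon_l(1 + K_l^\tau), \qquad (4.2)$$
$$C_{\mu,l} = C_{\mu,l-1} + \varepsilon_l, \qquad C_{\lambda,l} = C_{\lambda,l-1} - 2\varepsilon_l, \qquad C_{\omega,l} = C_{\omega,l-1} + \varepsilon_l. \qquad (4.3)$$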
The initial values of the sequences are chosen as follows:
$$\gamma_0 := \gamma, \qquad s_0 := s, \qquad C_{\mu,0} := 0, \qquad C_{\lambda,0} := C_\lambda, \qquad C_{\omega,0} := 0.$$
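The bookkeeping in (4.1)–(4.3) is easy to tabulate. The sketch below assumes the reading of (4.1)–(4.3) given above, in particular $\sigma_l = s/(4l^2)$ and the factor 4 in (4.2). The function name and the sample parameter values are hypothetical.

It illustrates two facts used in the proof of Theorem 1.1. First, the strip widths $s_l$ decrease to $s(1-\pi^2/24) > s/2$. Second, because $\varepsilon_l$ decays super-exponentially, $\gamma_l$ stays above $\gamma/2$ for small $\varepsilon$, even though $K_l^\tau$ grows polynomially.

```python
import math


def kam_schedule(eps, s, K, gamma, tau, C_lambda, steps):
    """Tabulate the sequences (4.1)-(4.3) for l = 0, ..., steps.

    Assumed reading: eps_l = eps**((4/3)**l), sigma_l = s / (4 l^2), K_l = l K,
    gamma_l = gamma_{l-1} - 4 eps_l (1 + K_l**tau). Requires 0 < eps < 1.
    """
    rows = [{"l": 0, "eps": None, "s": s, "gamma": gamma,
             "C_mu": 0.0, "C_lambda": C_lambda, "C_omega": 0.0}]
    for l in range(1, steps + 1):
        prev = rows[-1]
        # eps**((4/3)**l) computed through logs; it underflows harmlessly to 0.0
        eps_l = math.exp((4.0 / 3.0) ** l * math.log(eps))
        sigma_l = s / (4.0 * l * l)      # width of the strip lost at step l
        K_l = l * K                      # truncation in |k| at step l
        rows.append({
            "l": l,
            "eps": eps_l,
            "s": prev["s"] - sigma_l,
            "gamma": prev["gamma"] - 4.0 * eps_l * (1.0 + K_l ** tau),
            "C_mu": prev["C_mu"] + eps_l,
            "C_lambda": prev["C_lambda"] - 2.0 * eps_l,
            "C_omega": prev["C_omega"] + eps_l,
        })
    return rows
```

For example, `kam_schedule(1e-3, 1.0, 2.0, 0.1, 4.0, 1.0, 40)` gives a final strip width close to $1 - \pi^2/24 \approx 0.589$ and a final $\gamma_l$ still above $\gamma/2 = 0.05$.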
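Proposition 4.1. There exist $\varepsilon^* = \varepsilon^*(\gamma) > 0$ and, for any $l \ge 1$, a closed set $\Omega^\gamma_l \subset \Omega$ such that, if $|\varepsilon| < \varepsilon^*$, one can construct for $\omega \in \Omega^\gamma_l$ a unitary transformation $U_l$, analytic and quasiperiodic in $t$ with frequencies $\omega$, mapping the system (2.1) into the system
$$i\dot x = \left(A^l + P^l(\omega t)\right)x, \qquad (4.4)$$
where:
1. $U_l(\omega t)$ is as follows: $U_l(\phi) = e^{B^1(\phi)}e^{B^2(\phi)}\cdots e^{B^l(\phi)}$, and the anti-selfadjoint operators $B^j \in G$, $j = 1, \dots, l$, depend analytically on $\phi \in \mathbb{T}^n_{s-\sigma_l}$, are Lipschitz continuous in $\omega \in \Omega^\gamma_l$, and fulfill (3.20) with $P^{l-1}$, $\sigma_l$ in place of $P^-$, $\sigma$, respectively.
2. $A^l$ has the form of (3.2) with the upper index “minus” replaced by $l$, i.e.
$$A^l = \operatorname{diag}\left(\lambda^l_1(\omega) + \mu^l_1(\omega t, \omega),\ \lambda^l_2(\omega) + \mu^l_2(\omega t, \omega),\ \lambda^l_3(\omega) + \mu^l_3(\omega t, \omega),\ \dots\right). \qquad (4.5)$$
3. The corresponding $\lambda^l_i$ and $\mu^l_i$ fulfill conditions H1, H2, H3 of the previous section, provided $\lambda^-_i$, $\mu^-_i$ are replaced by $\lambda^l_i$, $\mu^l_i$, respectively.
4. The following estimates hold:
$$\|P^l\|_{\delta,s_l} \le \varepsilon_l, \qquad \|B^l\|^{G,L}_{s_{l+1}} \le \varepsilon_l, \qquad |\Omega^\gamma_l - \Omega^\gamma_{l+1}| \lesssim \frac{\gamma_l}{(lK)^{d_4}}. \qquad (4.6)$$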
Proof. We proceed by induction, applying Lemma 3.3. First we apply it to the original system (2.1) in order to obtain a system of the form (4.4) with $l = 1$. To this end remark that (2.1) satisfies all the assumptions of Lemma 3.3 except the nonresonance conditions (3.7) and (3.8) on the frequencies, so we have to restrict the set of frequencies. Define therefore
$$\Omega^\gamma_0 := \Omega - \bigcup_{ijk}R_{ijk}\left(\frac{\gamma|i^d - j^d|}{1 + |k|^\tau}\right)$$
and remark that, by Lemma 5.6, $|\Omega - \Omega^\gamma_0| \lesssim \gamma$. Hence we can apply Lemma 3.3, and the starting point of our induction procedure is established. To go from step $l$ to step $l + 1$ one has to verify that the assumptions of Lemma 3.3 are satisfied for any $l$. More specifically, defining $\gamma_* := \gamma/2$ and fixing $C^*$ and $C_\omega^*$, we must verify that (3.19) holds. It is easy to check that this is true provided $\varepsilon$ is smaller than a constant which, in particular, vanishes as $\gamma \to 0$. It is then immediately seen that the conclusions of Lemma 3.3 imply the thesis if $\varepsilon$ is small enough (independently of $l$). □
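Proof of Theorem 1.1. Proposition 4.1 ensures the existence of $\varepsilon^* > 0$ such that, for $|\varepsilon| < \varepsilon^*(\gamma)$, $\lim_{l\to\infty}\gamma_l = \gamma^\infty$ with $\gamma^\infty > \gamma/2$, and $\lim_{l\to\infty}s_l \ge s/2$. This entails the uniform convergence of the operator valued sequence of functions $U_l$ on $\mathbb{T}^n_{s/4}$. Hence the limit, denoted $U_\infty(\omega t)$, is analytic and quasi-periodic. Moreover, writing $A^\infty := \operatorname{diag}\left(\lim_{l\to\infty}(\lambda^l_i + \mu^l_i)\right)$, one has
$$\lim_{l\to\infty}\|A^l(\phi) - A^\infty(\phi)\|_\delta = 0$$
uniformly on $\mathbb{T}^n_{s/4}$. This proves T1 and T2. The first three estimates of T3 are also clearly implied by the above convergence. Set now $\Omega^\gamma := \bigcap_{l=1}^\infty\Omega^{\gamma/2}_l$. By the third of (4.6) we have
$$|\Omega - \Omega^\gamma| \lesssim \gamma_0 = \gamma.$$
Denote now by $\gamma(\varepsilon^*)$ the inverse function of $\gamma \mapsto \varepsilon^*(\gamma)$, and define $\Omega_\varepsilon := \Omega^{\gamma(\varepsilon)}$. Then the fourth estimate of assertion T3 follows. □

Proof of Corollaries 1.1 and 1.2. Integration of (1.4) yields
$$\chi_i(t) = \chi_i(0)\,e^{i\lambda^\infty_i t}e^{iF^\infty_i(t)}, \qquad F^\infty_i(t) := \sum_{k\in\mathbb{Z}^n - \{0\}}\frac{\mu^\infty_{i,k}}{\omega\cdot k}\left(e^{i\omega\cdot k t} - 1\right), \qquad i = 0, 1, \dots,$$
where $\mu^\infty_{i,k}$, $k \in \mathbb{Z}^n$, are the Fourier coefficients of $\mu^\infty_i(\phi)$. Setting $x_i := e^{iF_i(t)}\chi_i$ we get $i\dot x_i = \lambda^\infty_i x_i$. Formula (1.7) follows by taking $\chi = U_\infty\phi$. Moreover, it is trivially verified that $\phi_{i0}(\omega t)e^{i\lambda^\infty_i t}$ solves (1.1) if and only if $\lambda^\infty_i + \langle k, \omega\rangle$ is an eigenvalue of (1.8). □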
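The proof of Theorem 1.2 below starts from the eigenvalue asymptotics $\lambda_i \sim i^{2\alpha/(\alpha+2)}$ for $-\frac{d^2}{dx^2} + Q(x)$ when $Q(x)$ grows like $|x|^\alpha$. A quick numerical sanity check is to fit the log–log slope of the computed eigenvalues; it should come out close to $2\alpha/(\alpha+2) = 4/3$. The small excess comes from the $i - 1/2$ shift in the semiclassical quantization rule.

The sketch rests on several assumptions made only for this illustration:
- $Q(x) = |x|^\alpha$ with $\alpha = 4$;
- Dirichlet conditions on the truncated interval $[-6, 6]$;
- a second-order finite-difference discretization solved with a dense symmetric eigensolver.

The function name is hypothetical.

```python
import numpy as np


def low_eigenvalues(alpha, half_width=6.0, n_grid=800, count=40):
    """Lowest `count` eigenvalues of -u'' + |x|^alpha u on [-half_width, half_width]
    with Dirichlet boundary conditions (3-point finite differences, dense solver)."""
    x = np.linspace(-half_width, half_width, n_grid + 2)[1:-1]   # interior nodes
    h = x[1] - x[0]
    main = 2.0 / h**2 + np.abs(x) ** alpha
    off = -np.ones(n_grid - 1) / h**2
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(H)[:count]                         # ascending order


alpha = 4.0
lam = low_eigenvalues(alpha)
i = np.arange(1, lam.size + 1)
fit = slice(19, 40)                                  # fit over i = 20, ..., 40
slope = np.polyfit(np.log(i[fit]), np.log(lam[fit]), 1)[0]
predicted = 2.0 * alpha / (alpha + 2.0)              # 4/3 for alpha = 4
```

For larger grids, a tridiagonal solver would be cheaper than the dense `eigvalsh` used here; the dense call only keeps the sketch short.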
Proof of Theorem 1.2. Let $A$ denote the maximal operator in $L^2(\mathbb{R})$ generated by the differential expression $-\frac{d^2}{dx^2} + Q(x)$. It is well known that $A$ is self-adjoint, strictly positive and has compact resolvent, and that, denoting by $\lambda_i$, $i = 1, 2, \dots$, its eigenvalues, one has $\lambda_i \sim i^{2\alpha/(\alpha+2)}$ as $i \to \infty$. Hence Condition A1 is fulfilled if $\alpha > 2$.

$A$ can also be realized as a pseudodifferential operator of symbol $\sigma_A(x,\xi) := \xi^2 + Q(x)$ under Weyl quantization. $\sigma_A(x,\xi)$ belongs to the symbol class $\Gamma^\alpha_\rho(\mathbb{R}) := \Gamma^\alpha_\rho$ for any $0 < \rho < 1$ (notations as in [19], Sect. 23). This class of symbols generates the class $G^\alpha_\rho$ of pseudodifferential operators in $L^2(\mathbb{R})$ under the Weyl quantization formula
$$(Au)(x) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n\times\mathbb{R}^n}e^{i(x-y)\xi}\,\sigma_A\!\left(\frac{x+y}{2}, \xi\right)u(y)\,dy\,d\xi, \qquad u \in \mathcal{S}(\mathbb{R}).$$
The inverse $[A + 1]^{-1}$, whose principal symbol is $\sigma_{(A+1)^{-1}}(x,\xi) = (\xi^2 + Q(x) + 1)^{-1}$, belongs to the class $G^{-\alpha}_\rho$. The functional calculus for pseudodifferential operators (see e.g. [19], Chapt. II.10, 11 or [4], Chap. 8) can be applied to operators in these classes. Hence the self-adjoint operator $A^q$, $q > 0$, defined by the spectral theorem can also be realized as a pseudodifferential operator in $G^{\alpha q}_\rho$, with symbol in $\Gamma^{\alpha q}_\rho$. Its principal symbol is $\sigma_{A^q}(x,\xi) := (\xi^2 + Q(x))^q$, and the principal symbol of $[A^q + 1]^{-1} \in G^{-\alpha q}_\rho$ is $\sigma_{(A^q+1)^{-1}}(x,\xi) := [(\xi^2 + Q(x))^q + 1]^{-1}$.

By assumption, the symbol of the perturbation $V$ belongs to $\Gamma^\beta_\rho$ for any $0 < \rho < 1$, and hence $V$ belongs to $G^\beta_\rho$. By the composition property, the operator $T := V[A^q + 1]^{-1}$ admits a symbol in $\Gamma^{-\alpha q+\beta}_\rho$, and it is bounded if $-\alpha q + \beta \le 0$ ([19], Thm. 24.3). In turn, it is enough to verify this property for the principal symbol, which in this case, by the composition formula, is given by $\sigma^P_T(x,\xi) = v(x,\xi;\phi)[(\xi^2 + Q(x))^q + 1]^{-1}$. Since here $q = \delta/d$, $|\sigma^P_T(x,\xi)|$ is bounded for all $(x,\xi) \in \mathbb{R}^n\times\mathbb{R}^n$ if there is $D > 0$ such that $|v(x,\xi;\phi)| \le D(\xi^2 + |x|^\alpha)^{\delta/d}$. If $V \sim |x|^\beta$ as $|x| \to \infty$, the inequality is satisfied for $\beta \le \alpha\delta/d$. Now we can set $1 < d = \frac{2\alpha}{\alpha+2}$. Then $\delta < d - 1$ means $0 < \delta < \frac{\alpha-2}{\alpha+2}$, and therefore $\beta < \frac{\alpha-2}{2}$. □

5. Technical Lemmas

Lemma 5.1. Let $f_j$ be analytic functions on $\mathbb{T}^n_s$. Then for any $0 < \sigma < s$ one has
$$\Big(\sum_{j\ge1}\|f_j\|^2_{s-\sigma}\Big)^{1/2} \le \frac{4^n}{\sigma^n}\,\Big\|\Big(\sum_{j\ge1}|f_j|^2\Big)^{1/2}\Big\|_s.$$
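Proof. This is Lemma B.3 of [15]; we reproduce its proof here for the convenience of the reader. First consider the case $n = 1$. For each $j \ge 1$ there exists a point $\phi_j \in \mathbb{T}_{s-\sigma}$ such that $\|f_j\|_{s-\sigma} \le |f_j(\phi_j)|$. By the Cauchy integral formula
$$f_j(\phi_j) = \frac{1}{2\pi i}\oint_{\partial\Gamma_\rho}\frac{f_j(\zeta)}{\zeta - \phi_j}\,d\zeta,$$
where $0 < \rho < \sigma$ is a parameter independent of $j$, and $\partial\Gamma_\rho$ is the boundary of the set
$$\Gamma_\rho := \{\phi : -\rho < \operatorname{Re}\phi < 2\pi + \rho,\ -(s - \sigma + \rho) < \operatorname{Im}\phi < s - \sigma + \rho\}.$$
One has
$$\Big(\sum_{j\ge1}\|f_j\|^2_{s-\sigma}\Big)^{1/2} \le \Big(\sum_{j\ge1}\Big|\frac{1}{2\pi i}\oint_{\partial\Gamma_\rho}\frac{f_j(\zeta)}{\zeta - \phi_j}\,d\zeta\Big|^2\Big)^{1/2} \le \frac{1}{2\pi}\oint_{\partial\Gamma_\rho}\Big(\sum_{j\ge1}\Big|\frac{f_j(\zeta)}{\zeta - \phi_j}\Big|^2\Big)^{1/2}|d\zeta| \le \frac{4}{\rho}\sup_{\mathbb{T}_s}\Big(\sum_{j\ge1}|f_j(\phi)|^2\Big)^{1/2}. \qquad (5.1)$$
Taking the limit $\rho \to \sigma$ one gets the result. The case $n > 1$ follows similarly. □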
Lemma 5.2. Let $F = (F_{ij})$ be a bounded operator on $\ell^2$, and let the matrix elements $F_{ij}$ be analytic functions of $\phi \in \mathbb{T}^n_s$. Let $R = (R_{ij})$ be another operator with matrix elements depending analytically on $\phi \in \mathbb{T}^n_s$ and such that
$$\sup_{\phi\in\mathbb{T}^n_s}|R_{ij}(\phi)| \le \frac{1}{|i - j|}\sup_{\phi\in\mathbb{T}^n_s}|F_{ij}(\phi)|, \qquad i \ne j.$$
Then, for any $\phi \in \mathbb{T}^n_s$, $R$ is bounded in $\ell^2$, and for any positive $\sigma < s$ it fulfills the estimate
$$\|R\|_{0,s-\sigma} \le \frac{4^{n+1}}{\sigma^n}\|F\|_{0,s}.$$
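Proof. This is Lemma B.4 of [15]; again we reproduce its proof here for the convenience of the reader. Fix $\phi \in \mathbb{T}^n_{s-\sigma}$. By Lemma 5.1 and the Schwarz inequality we have
$$\sum_{j\ge1}|R_{ij}(\phi)| \le \sum_{j\ge1}\|R_{ij}\|_{s-\sigma} \le \Big(\sum_{j\ne i}\frac{1}{|i - j|^2}\Big)^{1/2}\Big(\sum_{j\ge1}\|F_{ij}\|^2_{s-\sigma}\Big)^{1/2} \le \frac{4^{n+1}}{\sigma^n}\sup_{\mathbb{T}^n_s}\Big(\sum_{j\ge1}|F_{ij}|^2\Big)^{1/2} \le \frac{4^{n+1}}{\sigma^n}\|F\|_{0,s}. \qquad (5.2)$$
The same estimate holds for $\sum_{i\ge1}|R_{ij}(\phi)|$. Hence, for $\phi \in \mathbb{T}^n_{s-\sigma}$,
$$\|R(\phi)v\|^2_{\ell^2} \le \sum_{i\ge1}\Big(\sum_{j\ge1}|R_{ij}(\phi)||v_j|\Big)^2 \le \sum_{i\ge1}\Big(\sum_{j\ge1}|R_{ij}(\phi)|\Big)\Big(\sum_{j\ge1}|R_{ij}(\phi)||v_j|^2\Big) \le \Big(\frac{4^{n+1}}{\sigma^n}\|F\|_{0,s}\Big)^2\|v\|^2_{\ell^2}, \qquad (5.3)$$
which proves the result. □

Lemma 5.3. Let $B \in G$ be a bounded anti-selfadjoint operator, and let $P \in \mathcal{B}^\delta$ be a selfadjoint operator. Then $e^{-B}Pe^{B} \in \mathcal{B}^\delta$ and, provided $\|B\|^G \le 1/2$, the following estimate holds:
$$\|e^{-B}Pe^{B} - P\|_\delta \le 4\|P\|_\delta\|B\|^G. \qquad (5.4)$$
Moreover, if both $B$ and $P$ are Lipschitz continuous with respect to $\omega \in \Omega$, then
$$\|e^{-B}Pe^{B} - P\|^L_\delta \le 4\|P\|^L_\delta\|B\|^{G,L}. \qquad (5.5)$$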
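Proof. Define $P(t) := e^{-tB}Pe^{tB}$. Then $P(t)$ fulfills the linear differential equation
$$\dot P = [P, B], \qquad P(0) = P,$$
whence
$$\|\dot P(t)\|_\delta \le 2\|B\|^G\|P(t)\|_\delta \implies \|P(t)\|_\delta \le \exp\left(2\|B\|^G t\right)\|P\|_\delta.$$
Then (5.4) follows on account of
$$P(t) - P = \int_0^t[P(s), B]\,ds,$$
which gives $\|e^{-B}Pe^{B} - P\|_\delta \le \left(e^{2\|B\|^G} - 1\right)\|P\|_\delta \le 2(e - 1)\|B\|^G\|P\|_\delta \le 4\|P\|_\delta\|B\|^G$ for $\|B\|^G \le 1/2$, by convexity of the exponential.

To obtain the Lipschitz estimate, remark that (same notation as in the proof of Lemma 3.2) $\Delta P$ fulfills the equation $\frac{d}{dt}(\Delta P) = [\Delta P, B] + [P, \Delta B]$, and then proceed as in the estimation of the operator norm. □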
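In finite dimension and with $\delta = 0$, the weighted norms $\|\cdot\|_\delta$ and $\|\cdot\|^G$ both reduce to the operator norm, so (5.4) can be tested directly. The NumPy sketch below draws random anti-Hermitian $B$ with $\|B\| \le 1/2$ and Hermitian $P$, then compares $\|e^{-B}Pe^{B} - P\|$ with $4\|P\|\,\|B\|$. Helper names are hypothetical, and the weighted norms of the lemma are not modeled.

Since $B$ is anti-Hermitian, $e^{B}$ is unitary. It is computed from the eigendecomposition of the Hermitian matrix $-iB$, and $e^{-B}$ is its adjoint.

```python
import numpy as np


def expm_antihermitian(B):
    """e^B for anti-Hermitian B, via the eigendecomposition of the Hermitian -iB."""
    w, V = np.linalg.eigh(-1j * B)          # -iB = V diag(w) V^*, so B = i V diag(w) V^*
    return (V * np.exp(1j * w)) @ V.conj().T


def conjugation_defect(B, P):
    """Return (||e^{-B} P e^{B} - P||, 4 ||P|| ||B||) in the spectral norm."""
    U = expm_antihermitian(B)                # unitary, so e^{-B} = U^*
    lhs = np.linalg.norm(U.conj().T @ P @ U - P, 2)
    rhs = 4.0 * np.linalg.norm(P, 2) * np.linalg.norm(B, 2)
    return lhs, rhs


def random_pair(rng, n, b_norm):
    """Random anti-Hermitian B with ||B|| = b_norm and random Hermitian P."""
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    B = (G - G.conj().T) / 2.0
    B *= b_norm / np.linalg.norm(B, 2)
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    P = (M + M.conj().T) / 2.0
    return B, P
```

The proof above actually gives the sharper factor $e^{2\|B\|} - 1$, and the computed defect typically sits well below $4\|P\|\,\|B\|$.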
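Lemma 5.4. Let $B \in G$ be the solution of Eq. (3.13) and let $0 < \sigma < s/2$. Then:
$$\left\|e^{-B}A^-e^{B} - A^- - [A^-, B]\right\|_{\delta,s-2\sigma} \lesssim \|B\|^G_{s-\sigma}\left(\frac{1}{\sigma}\|B\|_{\delta,s-\sigma} + \|P^-\|_\delta\right), \qquad (5.6)$$
$$\left\|e^{-B}A^-e^{B} - A^- - [A^-, B]\right\|^L_{\delta,s-2\sigma} \lesssim \|B\|^{G,L}_{s-\sigma}\left(\frac{1}{\sigma}\|B\|^L_{\delta,s-\sigma} + \|P^-\|^L_\delta\right). \qquad (5.7)$$

Proof. The proof follows the same argument as that of Lemma 5.3; just use the formula
$$e^{-B}A^-e^{B} - A^- - [A^-, B] = \int_0^1 ds\int_0^s e^{-s_1B}\left[[A^-, B], B\right]e^{s_1B}\,ds_1$$
and compute $[A^-, B]$ from Eq. (3.13). The assertion easily follows. □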
Lemma 5.5. Assume that the sequence $\lambda_i$ fulfills Assumption H1a) of Sect. 3 and (3.4), and fix $\alpha < C_\lambda/2$. Then, if $i \ne j$, the set $R_{ijk}(\alpha|i^d - j^d|)$ is empty whenever $|k| < (C_\lambda/2)|i^d - j^d|$. The proof of this lemma is straightforward and therefore omitted.

Lemma 5.6. If the sequence $\lambda_i$ fulfills assumption H1a) and (3.4), there exists $C > 0$ such that, if
$$\frac{nC_\omega}{C_\lambda} \le \frac{1}{2},$$
then one has
$$|R_{ijk}(\alpha)| \le \frac{C\alpha}{|k|}.$$

Proof. Following the proof of Lemma 5 of ref. [18], we fix $v \in \{-1, 1\}^n$ such that $v\cdot k = |k|$ and write $\omega = av + w$ with $w \in v^\perp$. One has that, as a function of $a$,
$$\left.(\omega\cdot k)\right|_s^t = |k|(t - s), \qquad \left.(\lambda_i - \lambda_j)\right|_s^t \le C_\omega(i^\delta + j^\delta)|v|(t - s),$$
so, by Lemma 5.5, either $R_{ijk}$ is empty or
$$\left.(\omega\cdot k + \lambda_i - \lambda_j)\right|_s^t \ge |k|(t - s)\left(1 - \frac{nC_\omega}{C_\lambda}\right) \ge \frac{1}{2}|k|(t - s),$$
and therefore, by the assumption, we can conclude
$$|R_{ijk}(\alpha)| \le \frac{4\alpha}{|k|}.$$
□
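The double-integral formula used in the proof of Lemma 5.4 is the second-order Taylor formula for $t \mapsto e^{-tB}A^-e^{tB}$. Its first two derivatives at $s_1$ are $e^{-s_1B}[A^-,B]e^{s_1B}$ and $e^{-s_1B}[[A^-,B],B]e^{s_1B}$. Swapping the order of integration turns $\int_0^1 ds\int_0^s(\cdot)\,ds_1$ into $\int_0^1(1-s_1)(\cdot)\,ds_1$.

The sketch below checks the identity numerically for a small diagonal $A$ and a random anti-Hermitian $B$, using Gauss–Legendre quadrature. Matrix sizes, helper names and the quadrature order are assumptions made for the illustration.

```python
import numpy as np


def expm_antihermitian(B):
    """e^B for anti-Hermitian B, via the eigendecomposition of the Hermitian -iB."""
    w, V = np.linalg.eigh(-1j * B)
    return (V * np.exp(1j * w)) @ V.conj().T


def commutator(X, Y):
    return X @ Y - Y @ X


def taylor_remainder_check(A, B, nodes=20):
    """Return (lhs, rhs) with
    lhs = e^{-B} A e^{B} - A - [A, B],
    rhs = int_0^1 (1 - s1) e^{-s1 B} [[A, B], B] e^{s1 B} ds1 (Gauss-Legendre)."""
    E = expm_antihermitian(B)
    lhs = E.conj().T @ A @ E - A - commutator(A, B)
    C2 = commutator(commutator(A, B), B)
    x, wts = np.polynomial.legendre.leggauss(nodes)
    t, wts = 0.5 * (x + 1.0), 0.5 * wts           # map nodes from [-1, 1] to [0, 1]
    rhs = np.zeros_like(lhs)
    for ti, wi in zip(t, wts):
        Et = expm_antihermitian(ti * B)
        rhs += wi * (1.0 - ti) * (Et.conj().T @ C2 @ Et)
    return lhs, rhs
```

The integrand is an entire function of $s_1$, so a modest number of Gauss nodes already reproduces the left-hand side to near machine precision.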
References
1. Arnold, V.I.: Chapitres supplémentaires de la théorie des équations différentielles ordinaires. Moscou: Mir, 1980
2. Bellissard, J.: Stability and instability in quantum mechanics. In: Trends and Developments in the Eighties, S. Albeverio and Ph. Blanchard, Editors, Singapore: World Scientific, 1985, pp. 1–106
3. Combescure, M.: The quantum stability problem for time-periodic perturbation of the harmonic oscillator. Ann. Inst. H. Poincaré 47, 62–82 (1987); Erratum ibidem, 451–454
4. Dimassi, M., Sjöstrand, J.: Spectral Asymptotics in the Semiclassical Limit. London Math. Soc. Lecture Note Series 268, Cambridge: Cambridge University Press, 1999
5. Duclos, P., Stovicek, P.: Floquet Hamiltonians with Pure Point Spectrum. Commun. Math. Phys. 177, 327–347 (1996)
6. Duclos, P., Stovicek, P., Vittot, M.: Perturbation of an eigen-value from a dense point spectrum: A general Floquet Hamiltonian. Ann. Inst. H. Poincaré Phys. Théor. 71, 241–301 (1999)
7. Gallavotti, G.: The Elements of Mechanics. Berlin–Heidelberg–New York: Springer-Verlag, 1983
8. Graffi, S., Yajima, K.: Absolute Continuity of the Floquet Spectrum for a Nonlinearly Forced Harmonic Oscillator. Commun. Math. Phys. 215, 245–250 (2000)
9. Hagedorn, G., Loss, M., Slawny, J.: Non-stochasticity of time-dependent quadratic Hamiltonians and the spectra of canonical transformations. J. Phys. A 19, 521–531 (1986)
10. Howland, J.: Floquet Operators with Singular Spectrum, I. Ann. Inst. H. Poincaré 49, 309–323 (1989); II, ibidem, 325–334 (1989)
11. Jauslin, H.R., Monti, F.: Quantum Nekhoroshev theorem for quasi-periodic Floquet Hamiltonians. Rev. Math. Phys. 10, 393–428 (1998)
12. Jauslin, H.R., Lebowitz, J.L.: Spectral and stability aspects of quantum chaos. Chaos 1, 114–121 (1991)
13. Jorba, A., Simó, C.: On the reducibility of linear differential equations with quasiperiodic coefficients. J. Differ. Eqs. 98, 111–124 (1992)
14. Joye, A.: Absence of absolutely continuous spectrum of Floquet operators. J. Stat. Phys. 75, 929–952 (1994)
15. Kappeler, T., Pöschel, J.: Perturbation of KdV Equations – The KAM proof. Preprint 1997
16. Kuksin, S.B.: On small-denominators equations with large variable coefficients. J. Appl. Math. Phys. (ZAMP) 48, 262–271 (1997)
17. Nenciu, G.: Floquet operators without absolutely continuous spectrum. Ann. Inst. H. Poincaré 59, 91–97 (1993)
18. Pöschel, J.: A KAM-Theorem for some Partial Differential Equations. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 23, 119–148 (1996)
19. Shubin, M.A.: Pseudodifferential Operators and Spectral Theory. Berlin–Heidelberg–New York: Springer-Verlag, 1987
20. Xu, J., Zheng, Q.: On the reducibility of linear differential equations with quasiperiodic coefficients which are degenerate. Proc. Am. Math. Soc. 126, 1445–1451 (1998)
21. Yajima, K.: Scattering Theory for Schrödinger Operators with Potentials Periodic in Time. J. Math. Soc. Japan 29, 729–743 (1977)

Communicated by B. Simon
Commun. Math. Phys. 219, 481 – 487 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Absolutely Singular Dynamical Foliations

David Ruelle¹, Amie Wilkinson²

¹ I.H.E.S., 35, route de Chartres, 91440 Bures-sur-Yvette, France.
² Department of Mathematics, Northwestern University, 2033 Sheridan Rd., Evanston, IL 60208-2730, USA.

Received: 26 September 2000 / Accepted: 8 December 2000
Abstract: Let $A_3$ be the product of the automorphism $\begin{pmatrix}2&1\\1&1\end{pmatrix}$ of $\mathbb{T}^2$ and of the identity on $\mathbb{T}^1$. A small perturbation $g$ of $A_3$ among volume preserving diffeomorphisms will have an invariant family of smooth circles forming a continuous foliation of $\mathbb{T}^3$. Corresponding to the vector bundle tangent to the circles there is a “central” Lyapunov exponent of $(g, \text{volume})$, which is nonzero for an open set of ergodic $g$’s. This surprising result of Shub and Wilkinson is complemented here by showing that the volume on $\mathbb{T}^3$ has atomic conditional measures on these circles: there is a finite $k$ such that almost every circle carries $k$ atoms of mass $1/k$.

Introduction
Let $A_2$ be the automorphism of the 2-torus $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$ given by $\begin{pmatrix}2&1\\1&1\end{pmatrix}$. Let $A_3$ be the automorphism of the 3-torus $\mathbb{T}^3 = \mathbb{R}^3/\mathbb{Z}^3$ given by $\begin{pmatrix}A_2&0\\0&1\end{pmatrix}$. Let $\mathrm{Diff}^2_\mu(\mathbb{T}^3)$ be the set of $C^2$ diffeomorphisms of $\mathbb{T}^3$ that preserve Lebesgue-Haar measure $\mu$. In [SW1], M. Shub and A. Wilkinson prove the following theorem.

Theorem. Arbitrarily close to $A_3$ there is a $C^1$-open set $U \subset \mathrm{Diff}^2_\mu(\mathbb{T}^3)$ such that for each $g \in U$:
1. $g$ is ergodic.
2. There is an equivariant fibration $\pi: \mathbb{T}^3 \to \mathbb{T}^2$ such that $\pi g = A_2\pi$. The fibers of $\pi$ are the leaves of a foliation $W^c_g$ of $\mathbb{T}^3$ by $C^2$ circles. The 2-jets along the fibers of $\pi$ vary continuously in $M$.
3. There exists $\lambda_c > 0$ such that, for $\mu$-almost every $w \in \mathbb{T}^3$, if $v \in T_w\mathbb{T}^3$ is tangent to the leaf of $W^c_g$ containing $w$, then
$$\lim_{n\to\infty}\frac{1}{n}\log\|T_wg^nv\| = \lambda_c.$$
4. Consequently, there exists a set $S \subseteq \mathbb{T}^3$ of full $\mu$-measure that meets every leaf of $W^c_g$ in a set of leaf-measure 0. The foliation $W^c_g$ is not absolutely continuous.

Additionally, it is shown that the diffeomorphisms in $U$ are nonuniformly hyperbolic and Bernoullian. In this note, we prove:

Theorem I. Let $g$ satisfy conclusions 1–3 of the previous theorem. Then there exist $S \subseteq \mathbb{T}^3$ of full $\mu$-measure and $k \in \mathbb{N}$ such that $S$ meets every leaf of $W^c_g$ in exactly $k$ points. The foliation $W^c_g$ is absolutely singular.

Remark. Theorem I was also proved several years ago by Anatole Katok, as a first step in an attempt to show that examples such as those later constructed in [SW1] cannot exist (since the full argument turned out not to be valid, this work remains unpublished). We are indebted to Katok for useful conversations, and for pointing out the argument that shows that the number $k$ in Theorem I might necessarily be greater than 1. We also thank Michael Shub for useful conversations.

In Katok’s example of an absolutely singular foliation in [Mi], the leaves of the foliation meet the set of full measure in one point. In the [SW1] examples, the set $S$ may necessarily meet leaves of $W^c_g$ in more than one point, as the following argument of Katok’s shows. It follows from Theorem II in [SW2] that, for $k \in \mathbb{Z}^+$ and for small $a, b > 0$, the map $g = j_{a,k} \circ h_b$ satisfies the hypotheses of Theorem I, where
$$h_b(x, y, z) = (2x + y,\ x + y,\ x + y + z + b\sin 2\pi y)$$
and
$$j_{a,k}(x, y, z) = (x, y, z) + a\cos(2\pi kz)\cdot(1 + \sqrt{5},\ 2,\ 0).$$
For $k \in \mathbb{N}$, let $\rho_k$ be the vertical translation that sends $(x, y, z)$ to $(x, y, z + \frac{1}{k})$. Note that $h_b \circ \rho_k = \rho_k \circ h_b$ and $j_{a,k} \circ \rho_k = \rho_k \circ j_{a,k}$. Thus $g \circ \rho_k = \rho_k \circ g$. The fibration $\pi: \mathbb{T}^3 \to \mathbb{T}^2$ was obtained in [SW1] by using the persistence of normally hyperbolic submanifolds under perturbations. In the present case the symmetries $\rho_k$ preserve the fibers of the trivial fibration $P: \mathbb{T}^3 \to \mathbb{T}^2$ from which one starts, and also the maps $g$. Therefore the fibers of $\pi: \mathbb{T}^3 \to \mathbb{T}^2$ (i.e., the leaves of the center foliation $W^c_g$) are invariant under the action of the finite group $\langle\rho_k\rangle$.

Let $S$ be the (full measure) set of points in $\mathbb{T}^3$ for which the center direction is a positive Lyapunov direction (i.e. for which conclusion 3 holds). Since $\rho_k(W^c_g) = W^c_g$, it follows that $\rho_kS = S$. If $p \in S \cap W^c(p)$, then $\rho_k(p) \in \rho_k(S) \cap \rho_k(W^c(p)) = S \cap W^c(p)$. Applying this repeatedly, the $k$ distinct points $p, \rho_k(p), \dots, \rho_k^{k-1}(p)$ all lie in $S \cap W^c(p)$; that is, $S \cap W^c(p)$ contains at least $k$ points. Thus Theorem I is “sharp” in the sense that we cannot say more about the value of $k$ in general.

Theorem I has an interesting interpretation. Recall that a $G$-extension of a dynamical system $f: X \to X$ is a map $f_\rho: X \times G \to X \times G$, where $G$ is a compact group, of the form $(x, y) \mapsto (f(x), \rho(x)y)$. If $f$ preserves $\nu$, and $\rho: X \to G$ is measurable, then $f_\rho$ preserves the product of $\nu$ with Haar measure on $G$. A $\mathbb{Z}/k\mathbb{Z}$-extension is also called a $k$-point extension. Let $\lambda$ be an invariant probability measure for a $k$-point extension of $f: X \to X$, and $\{\lambda_x\}$ the family of conditional measures associated with the partition $\{\{x\} \times G\}$. We remark that if $\lambda$ is ergodic, then each atom of $\lambda_x$ must have the same weight $1/k$ (up to a set of $\lambda$-measure 0).
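The commutation relations $h_b\circ\rho_k = \rho_k\circ h_b$ and $j_{a,k}\circ\rho_k = \rho_k\circ j_{a,k}$ stated above can be spot-checked numerically at random points of $\mathbb{T}^3$, comparing coordinates modulo 1. The same sketch checks that $(1+\sqrt5, 2)$ is an eigenvector of $A_2$ for the expanding eigenvalue $(3+\sqrt5)/2$. Hence $j_{a,k}$ displaces points along the unstable direction of $A_2$. Function names and the sample values of $a$ and $b$ are hypothetical.

```python
import math

import numpy as np

SQRT5 = math.sqrt(5.0)


def h_b(p, b):
    """h_b(x, y, z) = (2x + y, x + y, x + y + z + b sin 2 pi y) mod 1."""
    x, y, z = p
    return np.array([2 * x + y, x + y, x + y + z + b * math.sin(2 * math.pi * y)]) % 1.0


def j_ak(p, a, k):
    """j_{a,k}(x, y, z) = (x, y, z) + a cos(2 pi k z) (1 + sqrt 5, 2, 0) mod 1."""
    direction = np.array([1.0 + SQRT5, 2.0, 0.0])
    p = np.asarray(p, dtype=float)
    return (p + a * math.cos(2 * math.pi * k * p[2]) * direction) % 1.0


def rho(p, k):
    """Vertical translation (x, y, z) -> (x, y, z + 1/k) mod 1."""
    x, y, z = p
    return np.array([x, y, (z + 1.0 / k) % 1.0])


def torus_close(p, q, tol=1e-12):
    """Compare two points of T^3 coordinatewise modulo 1."""
    d = (np.asarray(p) - np.asarray(q) + 0.5) % 1.0 - 0.5
    return bool(np.all(np.abs(d) < tol))
```

Comparing modulo 1 matters: a coordinate such as $z + 1/k$ may wrap around, so equality of the raw floating-point arrays would be the wrong test.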
Now take $g \in U$. Choose a coherent orientation on the leaves of $\{\pi^{-1}(x)\}_{x\in\mathbb{T}^2}$. Take $h: \mathbb{T}^3 \to \mathbb{T}^2 \times \mathbb{T}$ to be any continuous change of coordinates such that $h$ restricted to $\pi^{-1}(x)$ is smooth and orientation preserving to $\{x\} \times \mathbb{T}$. We may then write $F = h \circ g \circ h^{-1}: \mathbb{T}^2 \times \mathbb{T} \to \mathbb{T}^2 \times \mathbb{T}$ in the form $F(x, p) = (A_2x, \varphi_x(p))$, where $\varphi_x: \mathbb{T} \to \mathbb{T}$ is smooth and orientation preserving. If $P: \mathbb{T}^2 \times \mathbb{T} \to \mathbb{T}^2$ is the projection on the first factor of the product, we have $P \circ h = \pi$. Therefore, writing $\lambda = h_*\mu$, we have $P_*\lambda = \pi_*\mu$. Let $\{\lambda_x\}$ be the disintegration of the measure $\lambda$ along the fibers $\{x\} \times \mathbb{T}$. By a further measurable change of coordinates, smooth along each $\{x\} \times \mathbb{T}$ fiber, we may assume that, $\lambda$-almost everywhere, the atoms of $\lambda_x$ are at $l/k$, for $l = 0, \dots, k - 1$. But then $\varphi_x$ permutes the atoms cyclically, and we obtain the following corollary.

Corollary. For every $g \in U$ there exists $k \in \mathbb{N}$ such that $(\mathbb{T}^3, \mu, g)$ is isomorphic to an (ergodic) $k$-point extension of $(\mathbb{T}^2, \pi_*\mu, A_2)$.

M. Shub has observed that if $g = j_{a,k} \circ h_b$, then entropy considerations imply that $\pi_*\mu$ is actually Lebesgue measure on $\mathbb{T}^2$. Hence, for this $g$, $\pi$ is a finite-to-one semiconjugacy between $g$ and $A_2$, sending Lebesgue measure on $\mathbb{T}^3$ to Lebesgue measure on $\mathbb{T}^2$.

1. Proof of Theorem I

The proof of Theorem I follows from a more general result about fibered diffeomorphisms. Before stating this result, we describe the underlying setup and assumptions. Let $(X, \nu)$ be a probability space, and let $f: X \to X$ be invertible and ergodic with respect to $\nu$. It is convenient to assume that $X$ is in fact a Polish topological space, since this assumption is made in the study of random smooth dynamical systems in [BL]¹. Let $M$ be a compact Riemannian manifold and $\varphi$ a map $X \to \mathrm{Diff}^{1+\alpha}(M)$. Consider the skew-product transformation $F: X \times M \to X \times M$ given by $F(x, p) = (f(x), \varphi_x(p))$, and assume that it is (Borel) measurable. Also, let $\mu$ be an $F$-invariant ergodic probability measure on $X \times M$ such that $\pi_*\mu = \nu$, where $\pi: X \times M \to X$ is the projection onto the first factor.

For $x \in X$, let $\varphi^{(0)}_x$ be the identity map on $M$ and, for $k \in \mathbb{Z}$, define $\varphi^{(k)}_x$ by $\varphi^{(k+1)}_x = \varphi_{f^k(x)} \circ \varphi^{(k)}_x$. Since the tangent bundle to $M$ is measurably trivial, the derivative map of $\varphi$ along the $M$ direction gives a cocycle $X \times M \times \mathbb{Z} \to GL(n, \mathbb{R})$, where $n = \dim(M)$:
$$(x, p, k) \mapsto D_p\varphi^{(k)}_x.$$
If $\log^+\|D\varphi\| \in L^1(X \times M, \mu)$, then Oseledec’s Theorem and ergodicity imply that the Lyapunov exponents $\lambda_1 < \lambda_2 < \dots < \lambda_l$ of this cocycle exist and are constant for $\mu$-a.e. $(x, p)$. We call these the fiberwise exponents of $F$.

¹ A Polish space is a separable topological space whose topology is given by a complete metric. We use only the Borel structure defined by the topology.
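For a one-dimensional fiber, the fiberwise exponent is the Birkhoff average of $\log|\varphi'_{f^k(x)}(p_k)|$ along a typical orbit. The sketch below estimates it for a toy skew product over the cat map $A_2$ on $\mathbb{T}^2$ with circle fibers. The fiber maps $\varphi_x(p) = p + \frac{c}{2\pi}\sin 2\pi p + a\sin 2\pi x_1 \pmod 1$ are a hypothetical choice made only for this illustration; they are circle diffeomorphisms for $0 \le c < 1$.

For small $a$ the fiber dynamics is dominated by the sink near $p = 1/2$, so the estimate comes out negative. This is the situation of Theorem II below. The base orbit is computed in floating point, so it is only a pseudo-orbit of $A_2$.

```python
import math


def fiber_exponent(x1, x2, p, c=0.5, a=0.05, steps=20000, burn_in=500):
    """Birkhoff average of log|phi_x'(p)| along an orbit of
    F(x, p) = (A2 x, phi_x(p)),  A2 = [[2, 1], [1, 1]] acting on T^2,
    phi_x(p) = p + c/(2 pi) sin(2 pi p) + a sin(2 pi x1)   (mod 1)."""
    total = 0.0
    for n in range(burn_in + steps):
        deriv = 1.0 + c * math.cos(2.0 * math.pi * p)   # phi_x'(p) > 0 for c < 1
        if n >= burn_in:
            total += math.log(deriv)
        # fiber step uses the current base point x, then the base moves by A2
        p = (p + c / (2.0 * math.pi) * math.sin(2.0 * math.pi * p)
             + a * math.sin(2.0 * math.pi * x1)) % 1.0
        x1, x2 = (2.0 * x1 + x2) % 1.0, (x1 + x2) % 1.0
    return total / steps
```

With $c = 0$ the fiber maps are rotations and the estimate is exactly $0$. With the default parameters it lies near $\log(1 - c) = \log 0.5$.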
The next result, Theorem II, states that if these exponents are negative, then $\mu$ has atomic disintegration along $M$-fibers of $X \times M$. The proof of Theorem II uses a fibered Pesin stable manifold theorem, which requires a stronger hypothesis on $\varphi$ than integrability of $\log^+\|D\varphi\|$. Namely, we assume that for some $\alpha > 0$, $\log^+\|D\varphi\|_\alpha \in L^1(X, \nu)$, where $\|\cdot\|_\alpha$ is the $\alpha$-Hölder norm.

Theorem II. Suppose that $\lambda_l < 0$. Then there exist a set $S \subseteq X \times M$ and an integer $k \ge 1$ such that
• $\mu(S) = 1$,
• for every $(x, p) \in S$, we have $\#(S \cap \{x\} \times M) = k$.

This has the immediate corollary:

Corollary. Let $f \in \mathrm{Diff}^{1+\alpha}(M)$. If $\mu$ is an ergodic measure with all of its exponents negative, then it is concentrated on the orbit of a periodic sink.

The corollary has a simple proof using regular neighborhoods. Our proof is a fibered version.

Theorem I is also a corollary of Theorem II. For this, the argument is actually applied to the inverse of $g$, which has negative fiberwise exponents, rather than to $g$ itself, whose fiberwise exponents are positive. As described in the previous remarks, there is a continuous change of coordinates, smooth along the fibers of $\pi$, in which $g^{-1}$ is expressed as a skew product of $\mathbb{T}^2 \times \mathbb{T}$: $F(x, p) = (A_2^{-1}x, \varphi_x(p))$. Since the 2-jets of the fibers of $\pi$ vary continuously (by Assumption 2), the maps $x \mapsto \varphi_x$ can be chosen to vary continuously in the $C^2$-norm on $\mathrm{Diff}^2(M)$. This implies that $\log^+\|D\varphi\|_\alpha \in L^1(X, \nu)$ for $\alpha = 1$.

Remark. Without the assumption that $f$ is invertible, Theorem II is false. An example is described by Y. Kifer [Ki], which we recall here. Let $f: \mathbb{T} \to \mathbb{T}$ be a $C^{1+\alpha}$ diffeomorphism with exactly two fixed points, one attracting and one repelling. Consider the following random diffeomorphism of $\mathbb{T}$: with probability $p \in (0, 1)$, apply $f$, and with probability $1 - p$, rotate by an angle chosen randomly from the interval $[-\epsilon, \epsilon]$. Let $X = (\{0, 1\} \times \mathbb{T})^{\mathbb{N}}$. To generate a sequence of diffeomorphisms $f_0, f_1, \dots$ according to the above rule, we first define $\varphi: X \to \mathrm{Diff}^{1+\alpha}(\mathbb{T})$ by
$$\varphi(\omega) = \begin{cases} f & \text{if } \omega(0) = (0, \theta),\\ R_\theta & \text{if } \omega(0) = (1, \theta),\end{cases}$$
where $R_\theta$ is rotation through angle $\theta$. Next, we let $\nu_\epsilon$ be the product of the $(p, 1-p)$-measure on $\{0, 1\}$ with the measure on $\mathbb{T}$ that is uniformly distributed on $[-\epsilon, \epsilon]$. Then, corresponding to $\nu_\epsilon^{\mathbb{N}}$-almost every element $\omega \in X$, there is the sequence $\{f_k = \varphi(\sigma^k(\omega))\}_{k=0}^\infty$, where $\sigma: X \to X$ is the one-sided shift $\sigma(\omega)(n) = \omega(n + 1)$. Put another way, the random diffeomorphism is generated by the (noninvertible) skew product $\tau: X \times \mathbb{T} \to X \times \mathbb{T}$, where $\tau(\omega, x) = (\sigma(\omega), \varphi(\omega)(x))$. An ergodic $\nu_\epsilon$-stationary measure for this random diffeomorphism is a measure $\mu_\epsilon$ on $\mathbb{T}$ such that $\nu_\epsilon^{\mathbb{N}} \times \mu_\epsilon$ is $\tau$-invariant and ergodic. Such measures always exist ([Ki], Lemma I.2.2), but, for this example, there is an ergodic stationary measure with additional special properties.
Specifically, for every $\epsilon > 0$, there exists an ergodic $\nu_\epsilon$-stationary measure $\mu_\epsilon$ on $\mathbb{T}$ such that, as $\epsilon \to 0$, $\mu_\epsilon \to \delta_{x_0}$ in the weak topology, where $\delta_{x_0}$ is the Dirac measure concentrated on the sink $x_0$ of $f$. From this it follows that, as $\epsilon \to 0$, the fiberwise Lyapunov exponent for $\mu_\epsilon$ approaches $p\log|f'(x_0)| < 0$ (the rotations have derivative 1, so only the steps at which $f$ is applied contribute). Thus, for $\epsilon$ sufficiently small, the fiberwise exponent for $\tau$ with respect to $\mu_\epsilon$ is negative.

Nonetheless, it is easy to see that $\mu_\epsilon$ for $\epsilon > 0$ cannot be uniformly distributed on $k$ atoms. If $\mu_\epsilon$ were atomic, then $\tau$-invariance of $\nu_\epsilon^{\mathbb{N}} \times \mu_\epsilon$ would imply that, for every $x \in \mathbb{T}$,
$$\mu_\epsilon(\{x\}) = p\,\mu_\epsilon(\{f^{-1}(x)\}) + \frac{1-p}{2\epsilon}\int_{-\epsilon}^{\epsilon}\mu_\epsilon(\{R_\theta(x)\})\,d\theta = p\,\mu_\epsilon(\{f^{-1}(x)\}),$$
which is impossible if $\mu_\epsilon$ has finitely many atoms (summing over the atoms would give $1 \le p$). In fact, $\mu_\epsilon$ can be shown to be absolutely continuous with respect to Lebesgue measure (see [Ki], p. 173 ff. and the references cited therein). Hence invertibility is essential, and we indicate in the proof of Theorem II where it is used.

Proof of Theorem II. We first establish the existence of fiberwise “stable manifolds” for the skew product $F$. A general theory of stable manifolds for random dynamical systems is worked out in [Ki] (Theorem V.1.6) and, more explicitly, in [BL]. Since we are assuming that all of the fiberwise exponents of $F$ are negative, we are faced with the simpler task of constructing fiberwise regular neighborhoods for $F$ (see the Appendix by Katok and Mendoza in [KH]). We outline a proof, following [KH] closely.

Theorem 1.1 (Existence of Regular Neighborhoods). There exists a set $\Lambda_0 \subseteq X \times M$ of full measure such that, for $\epsilon > 0$:
• There exist a measurable function $r: \Lambda_0 \to (0, 1]$ and a collection of embeddings $\Psi_{(x,p)}: B(0, r(x,p)) \to M$ such that $\Psi_{(x,p)}(0) = p$ and $\exp(-\epsilon) < r(F(x,p))/r(x,p) < \exp(\epsilon)$.
• If $\varphi_{(x,p)} = \Psi_{F(x,p)}^{-1} \circ \varphi_x \circ \Psi_{(x,p)}: B(0, r(x,p)) \to \mathbb{R}^n$, then $D_0\varphi_{(x,p)}$ satisfies
$$\exp(\lambda_1 - \epsilon) \le \left\|(D_0\varphi_{(x,p)})^{-1}\right\|^{-1}, \qquad \left\|D_0\varphi_{(x,p)}\right\| \le \exp(\lambda_l + \epsilon).$$
• The $C^1$ distance $d_{C^1}(\varphi_{(x,p)}, D_0\varphi_{(x,p)}) < \epsilon$ in $B(0, r(x,p))$.
• There exist a constant $K > 0$ and a measurable function $A: \Lambda_0 \to \mathbb{R}$ such that, for $y, z \in B(0, r(x,p))$,
$$K^{-1}d\left(\Psi_{(x,p)}(y), \Psi_{(x,p)}(z)\right) \le \|y - z\| \le A(x,p)\,d\left(\Psi_{(x,p)}(y), \Psi_{(x,p)}(z)\right),$$
with $\exp(-\epsilon) < A(F(x,p))/A(x,p) < \exp(\epsilon)$.

Proof. See the proof of Theorem S.3.1 in [KH].

Decompose $\mu$ into a system of fiberwise measures $d\mu(x,p) = d\mu_x(p)\,d\nu(x)$. Invariance of $\mu$ with respect to $F$ implies that, for $\nu$-a.e. $x \in X$, $(\varphi_x)_*\mu_x = \mu_{f(x)}$.

Corollary. There exist a set $\Lambda \subseteq X \times M$ and real numbers $R > 0$, $C > 0$, and $c < 1$ such that
(1) $\mu(\Lambda) > .5$, and, if $(x,p) \in \Lambda$, then $\mu_x(\Lambda_x) > .5$, where $\Lambda_x = \{p \in M \mid (x,p) \in \Lambda\}$.
(2) If $(x,p) \in \Lambda$ and $d_M(p,q) \le R$, then $d_M\big(\varphi_x^{(m)}(p), \varphi_x^{(m)}(q)\big) \le C c^m d_M(p,q)$ for all $m \ge 0$.
Proof. This follows in a standard way from the Mean Value Theorem and Lusin's Theorem.

To prove Theorem II, it suffices to show that there is a positive $\nu$-measure set $B \subseteq X$ such that for $x \in B$ the measure $\mu_x$ has an atom, as the following argument shows. For $x \in X$, let $d(x) = \sup_{p \in M} \mu_x(p)$. Clearly $d$ is measurable, $f$-invariant, and positive on $B$. Ergodicity of $f$ implies that $d(x) = d > 0$ is positive and constant for almost all $x \in X$. Let $S = \{(x,p) \in X \times M \mid \mu_x(p) \ge d\}$. Observe that $S$ is $F$-invariant, has measure at least $d$, and hence has measure 1. The conclusions of Theorem II follow immediately.

Let $\Lambda$, $R > 0$, $C > 0$ and $c < 1$ be given by Corollary 1, and let $B = \pi(\Lambda)$. Let $N$ be the number of $R/10$-balls needed to cover $M$. We now show that for $\nu$-almost every $x \in B$, the measure $\mu_x$ has at least one atom. For $x \in X$, let $m(x) = \inf \sum_j \operatorname{diam}(U_j)$, where the infimum is taken over all collections of closed balls $U_1, \ldots, U_k$ in $M$ such that $k \le N$ and $\mu_x\big(\bigcup_{j=1}^{k} U_j\big) \ge .5$. Let $m = \operatorname{ess\,sup}_{x \in B} m(x)$. We now show that $m = 0$. If $m > 0$, then there exists an integer $J$ such that
$$C\Delta c^J N < m/2, \tag{1}$$
where $\Delta$ is the diameter of $M$. Let $\mathcal{U}$ be a cover of $M$ by $N$ closed balls of radius $R/10$. For $x \in B$, let $U_1(x), \ldots, U_{k(x)}(x)$ be those balls in $\mathcal{U}$ that meet $\Lambda_x$. Since these balls cover $\Lambda_x$, and $\mu_x(\Lambda_x) > .5$, it follows that $\mu_x\big(\bigcup_{j=1}^{k(x)} U_j(x)\big) \ge .5$. But $(\varphi_x^{(i)})_*\mu_x = \mu_{f^i(x)}$, and so it is also true that
$$\mu_{f^i(x)}\Big(\bigcup_{j=1}^{k(x)} \varphi_x^{(i)}(U_j(x))\Big) \ge .5, \tag{2}$$
for all $i$. We now use the fact that $\varphi_x^{(i)}$ contracts regular neighborhoods to derive a contradiction. The balls $U_j(x)$ meet $\Lambda_x$ and have diameter less than $R/10$, and so by Corollary 1, (2), we have
$$\operatorname{diam}\big(\varphi_x^{(i)}(U_j(x))\big) \le C\Delta c^i. \tag{3}$$
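A toy numerical illustration of the contraction estimate (3) follows. It rests on assumptions that are not part of the proof. The fibre is taken to be $M = [0,1]$, so $\Delta = 1$. Each fibre map is an affine contraction drawn at random from a short hypothetical list `MAPS`. The constants are $C = 1$, with $c$ equal to the largest slope. The sketch tracks the diameter of $\varphi_x^{(i)}(M)$ and checks it against $C\Delta c^i$.

```python
import random

# Hypothetical fibre maps phi(t) = a*t + b, each sending [0, 1] into itself
# with slope 0 < a <= c < 1 (a stand-in for the fibrewise contraction).
MAPS = [(0.5, 0.0), (0.3, 0.6), (0.7, 0.2)]


def image_diameters(maps, steps, seed=0):
    """Diameters of phi^(i)([0, 1]) for i = 0..steps, where phi^(i) is a
    random composition of maps drawn from `maps`."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0          # current image of M = [0, 1]; Delta = 1
    diams = [hi - lo]
    for _ in range(steps):
        a, b = rng.choice(maps)
        lo, hi = sorted((a * lo + b, a * hi + b))
        diams.append(hi - lo)
    return diams


c = max(a for a, _ in MAPS)
diams = image_diameters(MAPS, steps=40)

# Estimate (3) with C = 1 and Delta = 1: diam <= c**i (up to rounding).
for i, d in enumerate(diams):
    assert d <= c**i + 1e-15
```

Because affine maps scale every interval by the same factor, the diameter here is exactly the product of the chosen slopes. The bound $c^i$ is therefore attained only when the largest slope is chosen every time.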
Let $\tau : B \to \mathbb{N}$ be the first-return time of $f^J$ to $B$, so that $f^{J\tau(x)}(x) \in B$ and $f^{Ji}(x) \notin B$ for $i \in \{1, \ldots, \tau(x) - 1\}$. Decompose the set $B$ according to these first return times:
$$B = \bigcup_{i=1}^{\infty} B_i \pmod 0,$$
Absolutely Singular Dynamical Foliations
where $B_i = \tau^{-1}(i)$. Because $f$ is invertible and $f^{-1}$ preserves measure, we also have the mod 0 equivalence
$$B' := \bigcup_{i=1}^{\infty} f^{Ji}(B_i) = B \pmod 0.$$
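The first-return decomposition and the identity $B' = B$ can be checked on a finite toy model. The model assumes that $f$ is a permutation of a finite set, so "mod 0" becomes exact equality. The function name and the example data are illustrative only. For an invertible $f$, the first-return map of $f^J$ to $B$ is a bijection of $B$. The images $f^{Ji}(B_i)$ are therefore pairwise disjoint and cover $B$, and this is where invertibility enters.

```python
def first_return_decomposition(f, B, J):
    """Split B by the first-return time tau of f^J and return
    (parts, images) with parts[i] = B_i = tau^{-1}(i) and
    images[i] = f^{J i}(B_i).  `f` is a dict encoding a permutation."""
    def f_J(x):
        for _ in range(J):
            x = f[x]
        return x

    parts, images = {}, {}
    for x in B:
        y, i = f_J(x), 1
        while y not in B:          # terminates: orbits of a permutation are periodic
            y, i = f_J(y), i + 1
        parts.setdefault(i, set()).add(x)
        images.setdefault(i, set()).add(y)
    return parts, images


# Toy example: rotation x -> x + 3 (mod 10), B = {0, 1, 2}, J = 2.
f = {x: (x + 3) % 10 for x in range(10)}
B = {0, 1, 2}
parts, images = first_return_decomposition(f, B, J=2)
B_prime = set().union(*images.values())
assert B_prime == B                                      # B' = B
assert sum(len(s) for s in images.values()) == len(B)    # images are disjoint
```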
Let $y \in B'$. Then $y = f^{Ji}(x)$, where $x \in B_i \subseteq B$, for some $i \ge 1$. It follows from the definition of $m(y)$ and inequalities (2), (3) and (1) that
$$m(y) \le \sum_{j=1}^{k(x)} \operatorname{diam}\big(\varphi_x^{(Ji)}(U_j(x))\big) \le C k(x)\Delta c^{Ji} \le C N \Delta c^{J} < m/2.$$
But then $m = \operatorname{ess\,sup}_{x \in B} m(x) = \operatorname{ess\,sup}_{y \in B'} m(y) < m/2$, contradicting the assumption $m > 0$. Thus $m = 0$, and, for $\nu$-almost every $x \in B$, we have $m(x) = 0$. If $m(x) = 0$, then there is a sequence of closed balls $U^1(x), U^2(x), \ldots$ with $\lim_{i\to\infty} \operatorname{diam}(U^i(x)) = 0$ and $\mu_x(U^i(x)) \ge .5/N$ for all $i$. Take $p_i \in U^i(x)$; any accumulation point of $\{p_i\}$ is an atom for $\mu_x$. Since we have shown that $\mu_x$ has an atom for $\nu$-a.e. $x \in B$, the proof of Theorem II is complete.
References
[BL] Bahnmüller, J. and Liu, P.-D.: Characterization of measures satisfying Pesin's entropy formula for random dynamical systems. J. Dynam. Diff. Eq. 10, no. 3, 425–448 (1998)
[KH] Katok, A. and Hasselblatt, B.: Introduction to the modern theory of dynamical systems. Cambridge, 1995
[Ki] Kifer, Y.: Ergodic theory of random transformations. Boston: Birkhäuser, 1986
[Mi] Milnor, J.: Fubini foiled: Katok's paradoxical example in measure theory. Math. Intelligencer 19, no. 2, 30–32 (1997)
[SW1] Shub, M. and Wilkinson, A.: Pathological foliations and removable zero exponents. Inv. Math. 139, 495–508 (2000)
[SW2] Shub, M. and Wilkinson, A.: A stably Bernoullian diffeomorphism that is not Anosov. Preprint

Communicated by Ya. G. Sinai
Commun. Math. Phys. 219, 489 – 522 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Modulating Pulse Solutions for a Class of Nonlinear Wave Equations
M. D. Groves¹,*, G. Schneider²
¹ Mathematisches Institut A, Universität Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany
² Mathematisches Institut, Universität Bayreuth, 95440 Bayreuth, Germany
Received: 8 November 2000 / Accepted: 12 December 2000
Abstract: We consider modulating pulse solutions for a nonlinear wave equation on the infinite line. Such a solution consists of a permanent pulse-like envelope steadily advancing in the laboratory frame and modulating an underlying wave-train. The problem is formulated as an infinite-dimensional dynamical system with one stable, one unstable and infinitely many neutral directions. Using a partial normal form and invariant-manifold theory we establish the existence of modulating pulse solutions which decay to small-amplitude disturbances at large distances.

1. Introduction

1.1. The problem and main result. We consider the nonlinear wave equation
$$\partial_t^2 u = \partial_x^2 u - u + g(u)$$
on the infinite line $x \in \mathbb{R}$, where $g : \mathbb{R} \to \mathbb{R}$ is a smooth, odd function which satisfies $g(u) = O(u^3)$ and $g'''(0) > 0$. This class of equations includes for example the sine-Gordon equation $\partial_t^2 u = \partial_x^2 u - \sin u$ and the $\phi^4$ equation
$$\partial_t^2 u = \partial_x^2 u - u + u^3, \tag{1}$$
upon which we will concentrate in order to keep the notation simple. It is well known that on time-scales of order $O(1/\varepsilon^2)$ Eq. (1) has $O(\varepsilon)$-amplitude solutions which are slow spatial and temporal modulations of an underlying wave train $e^{i(k_0 x - \omega_0 t)}$, where $k_0$ and $\omega_0$ are related by the linear dispersion relation $\omega_0^2 = k_0^2 + 1$. Such solutions are described by the formula
$$u_A(x,t) = \varepsilon\big(A(X,T)\,e^{i(k_0 x - \omega_0 t)} + \mathrm{c.c.}\big) + O(\varepsilon^2),$$
where $X = \varepsilon(x - c_g t)$, $T = \varepsilon^2 t$, $c_g = k_0/(1 + k_0^2)^{1/2}$ is the linear group velocity and $A$ satisfies the nonlinear Schrödinger equation
$$2i\omega_0\,\partial_T A + (1 - c_g^2)\,\partial_X^2 A + 3|A|^2 A = 0 \tag{2}$$
(e.g. see Kalyakin [10], Kirrmann, Schneider and Mielke [13] and Schneider [23]). Equation (2) possesses a three-parameter family of time-periodic solutions of the form $A(X,T) = B(X - X_0)\,e^{-i\gamma_0 T} e^{i\phi_0}$, in which the real-valued function $B$ satisfies the second-order ordinary differential equation
$$B_{XX} = C_1 B - C_2 B^3, \qquad C_1 = -\frac{2\gamma_0\omega_0}{1 - c_g^2}, \qquad C_2 = \frac{3}{1 - c_g^2}.$$
For $\gamma_0 < 0$ and $\omega_0 > 0$ this equation has two homoclinic solutions
$$B_{\mathrm{pulse}}(X) = \pm\Big(\frac{2C_1}{C_2}\Big)^{1/2} \operatorname{sech}\big(C_1^{1/2} X\big)$$
which connect the origin with itself. This procedure therefore identifies modulating pulse solutions of the nonlinear wave equation which are described by the approximate formula
$$u_{\mathrm{pulse}} = \varepsilon\big(B_{\mathrm{pulse}}(X)\,e^{-i\gamma_0 T} e^{i(k_0 x - \omega_0 t)} + \mathrm{c.c.}\big) = \varepsilon\big(B_{\mathrm{pulse}}(\varepsilon(x - c_g t))\,e^{ik_0(x - (c_p + \gamma_1\varepsilon^2)t)} + \mathrm{c.c.}\big)$$
accurately over time-scales of order $O(1/\varepsilon^2)$; here $c_p = (1 + k_0^2)^{1/2}/k_0$ is the linear phase velocity and $\gamma_1 = \gamma_0/k_0$.

In this paper we consider whether Eq. (1) has modulating pulse solutions which exist for all times $t \in \mathbb{R}$. We establish the following result.

Theorem 1. Fix a positive integer $n$ and a positive real number $k_0$. For sufficiently small $\varepsilon > 0$ (depending upon $n$ and $k_0$) there exists an infinite-dimensional, continuous family of modulating pulse solutions to Eq. (1) of the form $u(x,t) = v(x - \tilde c_g t,\, x - \tilde c_p t)$, where $v$ is $2\pi/k_0$-periodic in its second argument and
$$\tilde c_p = c_p + \gamma_1\varepsilon^2, \qquad \tilde c_g = \frac{1}{\tilde c_p}.$$
These solutions satisfy
$$|v(\xi,y) - 2h(\xi,\varepsilon)\sin k_0 y| \le \varepsilon^{n+1}, \qquad v(\xi,y) = v(-\xi,y), \qquad \xi, y \in \mathbb{R},$$
where $h(\xi,\varepsilon) = \varepsilon B_{\mathrm{pulse}}(\varepsilon\xi) + O(\varepsilon^2)$ and $\lim_{\xi\to\pm\infty} h(\xi,\varepsilon) = 0$.

* Supported by a Research Fellowship from the Alexander von Humboldt Foundation. Permanent address: Department of Mathematical Sciences, Loughborough University, Loughborough, LE11 3TU, UK
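The constants of the NLS approximation can be checked numerically. The sketch assumes the $\phi^4$ case $g(u) = u^3$ and the illustrative values $k_0 = 1.5$, $\gamma_0 = -0.4$. These values are hypothetical and were chosen only so that $C_1 > 0$. The sketch confirms three things:
- $c_g = d\omega/dk$ at $k_0$;
- $c_g c_p = 1$;
- $B_{\mathrm{pulse}}$ satisfies $B_{XX} = C_1 B - C_2 B^3$ up to finite-difference error.

```python
import math


def nls_constants(k0, gamma0):
    """Constants of the NLS approximation for the phi^4 equation."""
    omega0 = math.sqrt(k0**2 + 1)          # dispersion relation omega0^2 = k0^2 + 1
    cg = k0 / math.sqrt(1 + k0**2)         # linear group velocity
    cp = math.sqrt(1 + k0**2) / k0         # linear phase velocity
    C1 = -2 * gamma0 * omega0 / (1 - cg**2)
    C2 = 3 / (1 - cg**2)
    return omega0, cg, cp, C1, C2


def b_pulse(X, C1, C2):
    """Positive homoclinic solution (2 C1/C2)^{1/2} sech(C1^{1/2} X)."""
    return math.sqrt(2 * C1 / C2) / math.cosh(math.sqrt(C1) * X)


def ode_residual(X, C1, C2, h=1e-3):
    """Central-difference value of B_XX - (C1 B - C2 B^3) at X."""
    B = lambda s: b_pulse(s, C1, C2)
    B_xx = (B(X + h) - 2 * B(X) + B(X - h)) / h**2
    return B_xx - (C1 * B(X) - C2 * B(X) ** 3)


k0, gamma0 = 1.5, -0.4                     # illustrative values with gamma0 < 0
omega0, cg, cp, C1, C2 = nls_constants(k0, gamma0)

omega = lambda k: math.sqrt(k**2 + 1)
dk = 1e-5
assert abs((omega(k0 + dk) - omega(k0 - dk)) / (2 * dk) - cg) < 1e-8   # c_g = omega'(k0)
assert abs(cg * cp - 1) < 1e-12
assert max(abs(ode_residual(x / 10, C1, C2)) for x in range(-30, 31)) < 1e-4
```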
The modulating pulse solutions identified in Theorem 1 consist of a permanent pulse-like envelope which moves with constant speed $c_g$ and modulates a periodic wave train moving with velocity $c_p$. A modulating pulse of this kind is shown in Fig. 1. In the subsequent analysis it will become clear that our result is optimal in the sense that we cannot expect that $\lim_{\xi\to\pm\infty} v(\xi,y) = 0$.
Fig. 1. A modulating pulse solution (figure labels: $c_g$, $c_p$)
1.2. The method. Theorem 1 is proved by formulating the governing equation for $v(\xi,y)$ as an evolutionary problem in which the unbounded spatial coordinate $\xi$ plays the role of time. This idea was introduced for nonlinear problems by Kirchgässner [11] and has become known as “spatial dynamics”. The evolutionary problem is considered as an infinite-dimensional dynamical system in which the coordinates are the components of the Fourier sine-series expansion of $v(\xi,\cdot)$. The dynamical system obtained in this fashion is a reversible Hamiltonian system with infinitely many degrees of freedom. The spectrum of the linearised system consists of infinitely many purely imaginary eigenvalues together with one positive and one negative real eigenvalue which are both $O(\varepsilon)$ in the bifurcation parameter $\varepsilon$. Our task is to find pulse-like solutions of this dynamical system.

To understand the main steps and difficulties in the proof of Theorem 1 using the above spatial dynamics formulation, it is helpful to review an analogous finite-dimensional problem. Consider a Hamiltonian system with $n$ degrees of freedom whose Hamiltonian function takes the form
$$H(q,p,\varepsilon) = \frac{1}{2}p_1^2 - \frac{1}{2}C_1\varepsilon^2 q_1^2 + \sum_{m=2}^{n} \frac{\mu_m}{2}(q_m^2 + p_m^2) + O(|(q,p)|^3),$$
in which $C_1 > 0$ and the higher-order term is a smooth function of $q, p$ with coefficients which depend analytically upon $\varepsilon$. The associated linearised system has one positive real eigenvalue $\varepsilon\sqrt{C_1}$, one negative real eigenvalue $-\varepsilon\sqrt{C_1}$ and $n-1$ pairs of purely imaginary eigenvalues $\pm i\mu_m$, $m = 2, \ldots, n$. Let us also suppose that the system is reversible, so that the Hamiltonian vector field anticommutes with the reverser $S : (q,p) \mapsto (q,-p)$; it follows that $(q(-\xi), -p(-\xi))$ solves Hamilton's equations
whenever $(q(\xi), p(\xi))$ is a solution. This Hamiltonian system is amenable to standard normal-form theory (see Elphick et al. [5] and Meyer and Hall [15, Ch. VII]), an application of which yields the following result.

Lemma 1. For each $N \ge 3$ there exists a near-identity, analytic, symplectic change of coordinates with the property that
$$H(q,p,\varepsilon) = \frac{1}{2}p_1^2 - \frac{1}{2}C_1\varepsilon^2 q_1^2 + \sum_{m=2}^{n} \frac{\mu_m}{2}(p_m^2 + q_m^2) + H_{\mathrm{BNF}}\Big(q_1^2, \tfrac{1}{2}(q_2^2 + p_2^2), \ldots, \tfrac{1}{2}(q_n^2 + p_n^2), \varepsilon\Big) + O\big(|(q,p,\varepsilon)|^{N-1}|(q,p)|^3\big)$$
in the new coordinates. The term $H_{\mathrm{BNF}}$ is a polynomial of order $N+1$ in $(q,p,\varepsilon)$ which satisfies $H_{\mathrm{BNF}}(q,p,\varepsilon) = O(|(q,p)|^3)$. The change of coordinates preserves the reversibility.

The finite-dimensional Hamiltonian system obtained by using this lemma and omitting the $O(|(q,p,\varepsilon)|^{N-1}|(q,p)|^3)$ remainder term in the Hamiltonian function is completely integrable (the truncated Hamiltonian function and the action functionals $I_m = \frac{1}{2}(q_m^2 + p_m^2)$, $m = 2, \ldots, n$, are independent first integrals). Its solution set can therefore be completely described in a systematic manner, as the following remarks indicate. The subspace $\{(q_1, p_1) = (0,0)\}$ is clearly invariant under the flow, and since the action variables $I_2, \ldots, I_n$ act as integrals it is foliated by $(n-1)$-dimensional tori. We may use this construction to obtain a foliation of the $2n$-dimensional phase space into a family of two-dimensional reduced phase spaces parameterised by $I_2, \ldots, I_n$: for each fixed value of these integrals the equations for $q_1$, $p_1$, namely
$$\partial_\xi q_1 = p_1, \qquad \partial_\xi p_1 = C_1\varepsilon^2 q_1 - 2q_1\,\partial_1 H_{\mathrm{BNF}}(q_1^2, I_2, \ldots, I_n, \varepsilon) = C_1\varepsilon^2 q_1 - C_2 q_1^3 + \ldots,$$
where $C_1, C_2 > 0$, describe a one-degree-of-freedom Hamiltonian system whose phase portrait is calculated by elementary methods. The angles $\alpha_2, \ldots, \alpha_n$ corresponding to the action variables $I_2, \ldots, I_n$ are recovered by quadrature. In particular, the above system has a pair of small-amplitude orbits which are homoclinic to the origin whenever the coefficient $C_2/2$ of $q_1^4$ in $H_{\mathrm{BNF}}$ is positive, and these orbits correspond to solutions of the truncated Hamiltonian system from Lemma 1 which are homoclinic to an $(n-1)$-torus in $\{(q_1, p_1) = (0,0)\}$. For $I_2 = \ldots = I_n = 0$ we recover two homoclinic orbits $\pm h(\xi)$ which connect the origin with itself.
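The homoclinic orbit of the reduced system can be seen numerically. The sketch keeps only the leading terms $\partial_\xi q_1 = p_1$, $\partial_\xi p_1 = C_1\varepsilon^2 q_1 - C_2 q_1^3$. That is, it assumes $I_2 = \ldots = I_n = 0$ and drops the "$\ldots$" terms. It starts at the symmetric point $(q_1, p_1) = ((2C_1\varepsilon^2/C_2)^{1/2}, 0)$ on $\mathrm{Fix}\,S$ and integrates forward with a classical fourth-order Runge–Kutta scheme. The values $C_1 = C_2 = 1$, $\varepsilon = 0.5$ are hypothetical. The numerical solution follows the closed-form orbit $q_1(\xi) = (2C_1\varepsilon^2/C_2)^{1/2}\operatorname{sech}(\varepsilon C_1^{1/2}\xi)$ and decays towards the origin.

```python
import math


def rhs(q, p, C1, C2, eps):
    """Truncated reduced system: q' = p, p' = C1 eps^2 q - C2 q^3."""
    return p, C1 * eps**2 * q - C2 * q**3


def rk4(q, p, h, steps, C1, C2, eps):
    """Classical fourth-order Runge-Kutta integration over steps*h."""
    for _ in range(steps):
        k1 = rhs(q, p, C1, C2, eps)
        k2 = rhs(q + h / 2 * k1[0], p + h / 2 * k1[1], C1, C2, eps)
        k3 = rhs(q + h / 2 * k2[0], p + h / 2 * k2[1], C1, C2, eps)
        k4 = rhs(q + h * k3[0], p + h * k3[1], C1, C2, eps)
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, p


def homoclinic_q(xi, C1, C2, eps):
    """Closed-form homoclinic orbit through the symmetric point at xi = 0."""
    return math.sqrt(2 * C1 * eps**2 / C2) / math.cosh(eps * math.sqrt(C1) * xi)


C1, C2, eps = 1.0, 1.0, 0.5                # hypothetical sample values
q0 = homoclinic_q(0.0, C1, C2, eps)        # start on Fix S: p = 0
q, p = rk4(q0, 0.0, h=0.01, steps=1000, C1=C1, C2=C2, eps=eps)   # integrate to xi = 10
assert abs(q - homoclinic_q(10.0, C1, C2, eps)) < 1e-6
```

The origin is a saddle with rate $\varepsilon\sqrt{C_1}$, so integration errors grow like $e^{\varepsilon\sqrt{C_1}\,\xi}$. The integration interval is therefore kept moderate.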
A dynamical system with the above eigenvalue structure has a one-dimensional stable and a one-dimensional unstable manifold at zero; a homoclinic orbit at zero arises when these two manifolds coincide in the 2n-dimensional phase space. Such a situation can clearly only arise in 2n-dimensional space as the result of a degeneracy. In the case of the truncated Hamiltonian system from Lemma 1 the degeneracy is the invariance of {(qj , pj ) = (0, 0), j = 2, . . . , n}, itself a consequence of the complete integrability of the system. This degeneracy, and hence the homoclinic orbits at zero, generically do not persist when the higher-order remainder terms are re-introduced. One can also consider pulse-like solutions which consist of homoclinic connections to small-amplitude periodic or quasiperiodic orbits. The stable manifold to an invariant
m-torus consists of solutions which start at time $\xi = 0$ and converge to the torus as $\xi \to \infty$; it is an $(m+1)$-dimensional invariant manifold parameterised by time $\xi$ and the initial data for the angles defining the quasiperiodic flow on the torus. The stable manifold to one of the invariant $(n-1)$-tori in $\{(q_1, p_1) = (0,0)\}$ for the truncated Hamiltonian system from Lemma 1 can be calculated explicitly, and it is readily confirmed that this $n$-dimensional manifold intersects the $n$-dimensional symmetric section $\Sigma = \mathrm{Fix}\,S$ transversally in $2n$ points. Such points correspond to points of symmetry on homoclinic connections to the $(n-1)$-torus. Because the intersection is transversal it generically survives the re-introduction of the higher-order remainder terms, and we expect that $2n$ pulse-like solutions persist. This procedure is carried out for the case $n = 2$, in which the tori are periodic orbits, by Groves and Mielke [8] for a finite-dimensional system which arises in water-wave theory. For $n > 2$ there is an additional complication, namely that the persistence of the invariant $(n-1)$-tori is itself an issue which has to be resolved using KAM theory (see Pöschel [20] and the references therein). The above construction also relies upon the fact that the invariant tori are of maximal dimension $n-1$ in the subspace $\{(q_1, p_1) = (0,0)\}$. It therefore cannot be extended to problems with an infinite-dimensional centre part, since KAM theory is at present restricted to existence results for finite-dimensional invariant tori in infinite-dimensional phase space (e.g. see Kuksin [14], Craig and Wayne [3] and Pöschel [21]). Another possibility of constructing pulse-like solutions is to consider intersections of an appropriately defined centre-stable manifold with the symmetric section. Following standard constructions we can define a local centre-stable manifold $W^{cs}$ at time $\xi = 0$ for solutions near $h(\xi)$ in the following manner.
Consider solutions $v$ on $\xi \ge 0$ of the form $v(\xi) = w(\xi) + h(\xi)$, where the hyperbolic part of $w(\xi)$ remains $O(\delta)$-small for all $\xi \ge 0$ (so that the hyperbolic part of $v(\xi)$ remains $O(\delta)$-close to $h(\xi)$ for all $\xi \ge 0$). The function $w(\xi)$ satisfies a nonautonomous differential equation and we construct $W^{cs}$ from the initial data $w(0)$ on solutions whose centre parts are $O(\delta)$ on some time interval starting at $\xi = 0$; it is $(2n-1)$-dimensional and parameterised by the centre and stable components of the initial data. Of course intersections of $W^{cs}$ and $\Sigma$ do not necessarily correspond to points of symmetry on solutions which stay $O(\delta)$-close to $h(\xi)$ for all $\xi \in \mathbb{R}$ (the centre part of any solution can a priori grow polynomially in time). We can however exploit the Hamiltonian structure to show that there is a submanifold $\tilde W^{cs}$ of $W^{cs}$ which constitutes a global centre-stable manifold; this manifold is constructed by restricting to sufficiently small initial data (see below). The manifolds $W^{cs}$ and $\tilde W^{cs}$ can be calculated explicitly for the truncated Hamiltonian system from Lemma 1; in particular one finds that $\tilde W^{cs}$ intersects $\Sigma$ transversally in a continuum of points. The usual transversality argument therefore indicates that the symmetric pulse-like solutions survive the re-introduction of the higher-order remainder terms. The existence of the submanifold $\tilde W^{cs}$ of $W^{cs}$ is a consequence of the positive-definiteness of $H$ on the locally invariant centre manifold $W^c$. This manifold is constructed by considering solutions on $\xi \in \mathbb{R}$ whose hyperbolic part remains $O(\delta)$ for all $\xi \in \mathbb{R}$. The manifold $W^c$ consists of those points on such solutions whose centre parts are $O(\delta)$; it is $(2n-2)$-dimensional and parameterised by the initial data for the centre parts of the solutions. A standard argument shows that $W^c$ can be described as the graph of a quadratic reduction function which defines the hyperbolic coordinate of a point as a function of its centre coordinate.
Inserting the reduction function into the Hamiltonian, we find that it is positive definite on $W^c$, and the usual Lyapunov stability argument shows that solutions which lie on $W^c$ with sufficiently small centre part at some time
$\xi$ remain on $W^c$ with $O(\delta)$ centre part for all subsequent times. Finally, we apply a familiar result from dynamical systems theory which asserts that a solution $h(\xi) + w_{cs}(\xi)$ with $w_{cs}(0) \in W^{cs}$ converges exponentially to a solution $v_c(\xi)$ on $W^c$ as $\xi \to \infty$. By taking sufficiently small initial data $w_{cs}(0)$ we can find $\bar\xi > 0$ so that $w_{cs}(\bar\xi)$ and $|w_{cs}(\bar\xi) + h(\bar\xi) - v_c(\bar\xi)|$ are both sufficiently small (the first quantity grows at most polynomially in time while the second decays exponentially in time). It follows that the centre part of $w_{cs}(\xi)$ remains $O(\delta)$ for all $\xi > \bar\xi$ and hence that $w_{cs}(\xi) + h(\xi)$ remains $O(\delta)$-close to $h(\xi)$ for all $\xi \ge 0$. In this paper we use the above strategy based upon centre-stable manifolds to prove Theorem 1 in its infinite-dimensional setting. The problem is formulated as a reversible Hamiltonian system with infinitely many degrees of freedom in Sect. 2. A normal-form result as complete as that given in Lemma 1 is not possible due to asymptotic resonances among the purely imaginary eigenvalues, but a partial normal-form theory is available which yields a sequence of approximate systems, each of which possesses two homoclinic orbits connecting the origin to itself (see Sect. 3). The local centre-stable and centre manifolds $W^{cs}$, $W^c$ described above are constructed in Sect. 4, while the construction of the global centre-stable manifold $\tilde W^{cs}$ is handled in Sect. 5 (with $\delta = \varepsilon^{n+1}$ in the above notation). The semilinearity of the equation studied here allows us to follow the standard method used for finite-dimensional systems, in which the nonlinearities are modified by use of a cut-off function in the centre part and the solutions defining the required manifold are characterised as solutions of an integral equation.
The integral equation is solved using the contraction-mapping principle, and using initial data from the solutions we define global centre-stable and stable manifolds for the modified equations which correspond to local centre-stable and stable manifolds for the original equations. It is important to obtain precise estimates of the $\varepsilon$-dependence of the manifolds in the singular limit $\varepsilon \to 0$ studied here; we therefore give full details of their construction. Sect. 6 contains the proof that $\tilde W^{cs}$ intersects the symmetric section $\Sigma$ in a continuum of points. The proof is an analytical version of the heuristic transversality argument given above. The key is an application of the contraction-mapping principle to the hyperbolic part of initial data on $W^{cs}$; here the precise dependence of solutions upon $\varepsilon$ clearly plays a central role. Spatial dynamics methods have recently been used in a number of applications involving modulating travelling waves. Haragus-Courcelle and Schneider [9] examined small-amplitude bifurcating fronts in the Taylor-Couette problem which occur when a trivial ground state (the Couette flow) loses stability and bifurcates into a spatially periodic pattern (the Taylor vortices). Their spatial dynamics formulation also has an infinite-dimensional centre part at criticality, but as the bifurcation parameter is increased from zero all the eigenvalues leave the imaginary axis, four with speed $O(\varepsilon)$ and all others with speed $O(\varepsilon^{1/2})$. The small-amplitude dynamics are therefore controlled by a four-dimensional centre manifold. A similar situation had previously been discussed for the Swift-Hohenberg equation by Eckmann and Wayne [4]. Sandstede and Scheel [22] consider a bifurcation scenario for a reaction-diffusion equation in which the zero equilibrium becomes unstable and bifurcates into a spatially periodic pattern (a Turing instability).
They assume the existence of a large-amplitude localised pulse at criticality and examine how the Turing instability affects it. Their problem has a finite-dimensional centre part but an infinite-dimensional hyperbolic part and is therefore ill-posed. By contrast, our problem is well-posed but has an infinite-dimensional centre part. Cubic nonlinear Schrödinger equations have been derived as modulation equations in applications such as nonlinear optics (Newell and Moloney [18]), nonlinear elasticity
(Fu [6]) and water waves (Zakharov [25], Craig, Sulem and Sulem [2]). The extension of the present result to these problems is planned as future research. There the situation is complicated by the quasilinearity of the governing equations and the presence of an additional infinite number of stable and unstable directions in the spatial dynamics formulation.

2. Spatial Dynamics Formulation

We look for modulating pulse solutions of the nonlinear wave Eq. (1) of the form $u(x,t) = v(x - \tilde c_g t,\, x - \tilde c_p t) = v(\xi, y)$, where $v$ is periodic in $y$ with period $2\pi/k_0$ for some $k_0 > 0$. Making this Ansatz, one arrives at the equation
$$(1 - \tilde c_g^2)\,\partial_\xi^2 v + 2(1 - \tilde c_g\tilde c_p)\,\partial_\xi\partial_y v + (1 - \tilde c_p^2)\,\partial_y^2 v - v + v^3 = 0.$$
It is convenient to choose
$$\tilde c_p = c_p + \gamma_1\varepsilon^2, \qquad \tilde c_g = 1/\tilde c_p,$$
so that $\tilde c_p$ is a small perturbation of the phase velocity $c_p$ of the linearised problem and the equation simplifies to
$$(1 - \tilde c_g^2)\,\partial_\xi^2 v + (1 - \tilde c_p^2)\,\partial_y^2 v - v + v^3 = 0. \tag{3}$$
We now formulate Eq. (3) as an infinite-dimensional dynamical system in which the coordinate $\xi$ is the time-like variable and $y$ is a spatial coordinate. Introducing the new variable $w = \partial_\xi v$, we find that
$$\partial_\xi v = w, \tag{4}$$
$$\partial_\xi w = -\frac{1 - \tilde c_p^2}{1 - \tilde c_g^2}\,\partial_y^2 v + \frac{1}{1 - \tilde c_g^2}\,v - \frac{1}{1 - \tilde c_g^2}\,v^3, \tag{5}$$
and these equations can be understood as a dynamical system in the infinite-dimensional phase space
$$\mathcal{X} = \{(v,w) \in H^{s+1}_{\mathrm{per}}(2\pi/k_0) \times H^{s}_{\mathrm{per}}(2\pi/k_0)\}, \qquad s \ge 0;$$
the domain of the densely-defined vector field on the right-hand side of Eqs. (4), (5) is
$$\mathcal{D} = \{(v,w) \in H^{s+2}_{\mathrm{per}}(2\pi/k_0) \times H^{s+1}_{\mathrm{per}}(2\pi/k_0)\}.$$
Equations (4), (5) represent Hamilton's equations for the Hamiltonian system $(\mathcal{X}, \Omega, H)$, where the position-independent symplectic form $\Omega : T\mathcal{X}|_x \times T\mathcal{X}|_x \to \mathbb{R}$ is given by
$$\Omega\big((v_1,w_1),(v_2,w_2)\big) = \int_0^{2\pi/k_0} (w_2 v_1 - v_2 w_1)\,dy,$$
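where in the following the tangent spaces $T\mathcal{X}|_x$ are identified with the phase space $\mathcal{X}$. The Hamiltonian function $H \in C^\infty(\mathcal{X}, \mathbb{R})$, which depends upon the parameter $\varepsilon$, is defined by
$$H(v,w) = \int_0^{2\pi/k_0} \left( \frac{w^2}{2} - \frac{1 - \tilde c_p^2}{2(1 - \tilde c_g^2)}(\partial_y v)^2 - \frac{v^2}{2(1 - \tilde c_g^2)} + \frac{v^4}{4(1 - \tilde c_g^2)} \right) dy.$$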
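The signs in $H$ can be checked against (5) on a discretisation. This is an illustration only, not part of the original argument. The sketch replaces $y$ by a periodic grid and approximates $H$ by a Riemann sum, using forward differences for $\partial_y v$. It then compares $-\frac{1}{\Delta y}\,\partial H/\partial v_j$, computed by central differences, with a discrete form of the right-hand side of (5). Here $a = (1-\tilde c_p^2)/(1-\tilde c_g^2)$ and $b = 1/(1-\tilde c_g^2)$. The grid size and the values of $k_0$, $\gamma_1$, $\varepsilon$ are hypothetical.

```python
import math


def coefficients(k0, gamma1, eps):
    """a = (1 - cp~^2)/(1 - cg~^2), b = 1/(1 - cg~^2), cp~ = cp + gamma1 eps^2, cg~ = 1/cp~."""
    cp_t = math.sqrt(1 + k0**2) / k0 + gamma1 * eps**2
    cg_t = 1 / cp_t
    return (1 - cp_t**2) / (1 - cg_t**2), 1 / (1 - cg_t**2)


def discrete_H(v, w, dy, a, b):
    """Riemann-sum version of H on a periodic grid (forward differences for v_y)."""
    n = len(v)
    total = 0.0
    for j in range(n):
        vy = (v[(j + 1) % n] - v[j]) / dy
        total += (w[j] ** 2 / 2 - a * vy**2 / 2 - b * v[j] ** 2 / 2 + b * v[j] ** 4 / 4) * dy
    return total


def rhs_w(v, dy, a, b):
    """Discrete right-hand side of (5): -a v_yy + b v - b v^3 (periodic indices)."""
    n = len(v)
    return [-a * (v[(j + 1) % n] - 2 * v[j] + v[j - 1]) / dy**2 + b * v[j] - b * v[j] ** 3
            for j in range(n)]


def minus_grad_v(v, w, dy, a, b, h=1e-5):
    """-(1/dy) dH/dv_j, computed by central differences in v_j."""
    out = []
    for j in range(len(v)):
        vp, vm = list(v), list(v)
        vp[j] += h
        vm[j] -= h
        out.append(-(discrete_H(vp, w, dy, a, b) - discrete_H(vm, w, dy, a, b)) / (2 * h * dy))
    return out


k0, gamma1, eps, n = 1.0, -1.0, 0.1, 32          # illustrative values
a, b = coefficients(k0, gamma1, eps)
dy = 2 * math.pi / k0 / n
ys = [j * dy for j in range(n)]
v = [0.3 * math.sin(k0 * y) + 0.1 * math.sin(2 * k0 * y) for y in ys]
w = [0.2 * math.sin(k0 * y) for y in ys]
err = max(abs(g - r) for g, r in zip(minus_grad_v(v, w, dy, a, b), rhs_w(v, dy, a, b)))
assert err < 1e-6
```

With this bracket convention, $\partial_\xi w = -\delta H/\delta v$. The term in $(\partial_y v)^2$ therefore needs a minus sign to reproduce $-\frac{1-\tilde c_p^2}{1-\tilde c_g^2}\partial_y^2 v$ in (5).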
The corresponding Hamiltonian vector field $v_H$ is calculated by noting that the point $x \in \mathcal{X}$ belongs to $D(v_H)$ with $v_H|_x = \bar v|_x$ if and only if $\Omega(\bar v|_x, v) = dH[x](v)$ for all $v \in T\mathcal{X}|_x$; a straightforward calculation based on this procedure confirms that Hamilton's equations are given by (4) and (5). Equations (4), (5) also have a number of discrete symmetries. Firstly, they are reversible: the Hamiltonian vector field anticommutes with the reverser $S : (v,w) \mapsto (v,-w)$. This symmetry has the consequence that $(v(-\xi), -w(-\xi))$ solves the equations whenever $(v(\xi), w(\xi))$ is a solution. There are two further discrete symmetries, namely antisymmetry in the dependent variable and symmetry in the spatial variable: the equations are invariant under the transformations $(v,w) \mapsto (-v,-w)$ and $y \mapsto -y$. Note also that the periodicity in $y$ combines with the translation invariance in this variable to give an $O(2)$ symmetry. The next step in the analysis is to examine the spectrum of the linearised system associated with (4) and (5). Writing the variables as Fourier series
$$v = \sqrt{\frac{k_0}{2\pi}} \sum_{m\in\mathbb{Z}} v_m e^{ik_0 m y}, \qquad w = \sqrt{\frac{k_0}{2\pi}} \sum_{m\in\mathbb{Z}} w_m e^{ik_0 m y}$$
with reality condition $v_{-m} = \bar v_m$, we find that the $m$th Fourier component satisfies the system
$$\partial_\xi v_m = w_m, \tag{6}$$
$$\partial_\xi w_m = \frac{m^2 k_0^2 (1 - \tilde c_p^2) + 1}{1 - \tilde c_g^2}\, v_m \tag{7}$$
of ordinary differential equations. The eigenvalues $\lambda_{m,\varepsilon}$ of this system are given by
$$\lambda_{m,\varepsilon}^2 = \frac{m^2 k_0^2 (1 - \tilde c_p^2) + 1}{1 - \tilde c_g^2} = (k_0^2 + 1)(1 - m^2) - 2k_0(1 + k_0^2)^{1/2}(k_0^2 + m^2)\gamma_1\varepsilon^2 + O(\varepsilon^4),$$
in which the $O(\varepsilon^4)$ estimate on the remainder term holds uniformly in $m$. To determine the spectrum of the linearised system associated with (4), (5) we decompose the phase space $\mathcal{X}$ into a direct sum $\bigoplus_{m\in\mathbb{N}_0} E_m$ of subspaces, where $E_m$ consists of the generalised eigenspaces corresponding to the $m$th and $-m$th Fourier components. Examining Eqs. (6), (7), we arrive at the following conclusions concerning the structure of $E_m$, $m \in \mathbb{N}_0$.
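m = 0: We have two simple, real eigenvalues $\lambda^{\pm}_{0,\varepsilon} = \pm(1 + k_0^2)^{1/2} + O(\varepsilon)$. The corresponding eigenvectors are given by $(v, w)^{T} = (1, \lambda^{\pm}_{0,\varepsilon})^{T}$.
m = 1: For $\varepsilon = 0$ we have a geometrically double zero eigenvalue. The two corresponding eigenvectors
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\cos(k_0 y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\sin(k_0 y)$$
each have a generalised eigenvector, namely
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\cos(k_0 y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\sin(k_0 y).$$
For $\varepsilon > 0$ we have two geometrically double eigenvalues $\lambda^{\pm}_{1,\varepsilon}$ which satisfy the equation $(\lambda^{\pm}_{1,\varepsilon})^2 = -2k_0\gamma_1\varepsilon^2(1 + k_0^2)^{3/2} + O(\varepsilon^4)$; they are therefore real if $\gamma_1 < 0$. The eigenvectors are
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^{\pm}_{1,\varepsilon} \end{pmatrix}\cos(k_0 y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^{\pm}_{1,\varepsilon} \end{pmatrix}\sin(k_0 y).$$
m > 1: We have two geometrically double purely imaginary eigenvalues given by $\lambda^{\pm}_{m,\varepsilon} = \pm i(m^2 - 1)^{1/2}(k_0^2 + 1)^{1/2} + O(\varepsilon)$. The eigenvectors are
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^{\pm}_{m,\varepsilon} \end{pmatrix}\cos(k_0 m y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^{\pm}_{m,\varepsilon} \end{pmatrix}\sin(k_0 m y).$$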
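The eigenvalue statements can be checked against the exact expression for $\lambda_{m,\varepsilon}^2$ given above. The sketch first evaluates $\lambda_{m,\varepsilon}^2 = (m^2k_0^2(1-\tilde c_p^2)+1)/(1-\tilde c_g^2)$, with $\tilde c_p = c_p + \gamma_1\varepsilon^2$ and $\tilde c_g = 1/\tilde c_p$. It then compares this value with the two-term expansion for a few modes. Finally, it classifies the pair $\pm\lambda_{m,\varepsilon}$ as real or purely imaginary. The values of $k_0$, $\gamma_1$, $\varepsilon$ and the numerical bound used in the test are illustrative assumptions.

```python
import math


def lam2_exact(m, k0, gamma1, eps):
    """lambda_{m,eps}^2 = (m^2 k0^2 (1 - cp~^2) + 1) / (1 - cg~^2)."""
    cp_t = math.sqrt(1 + k0**2) / k0 + gamma1 * eps**2
    cg_t = 1 / cp_t
    return (m**2 * k0**2 * (1 - cp_t**2) + 1) / (1 - cg_t**2)


def lam2_expansion(m, k0, gamma1, eps):
    """(k0^2+1)(1-m^2) - 2 k0 (1+k0^2)^{1/2} (k0^2+m^2) gamma1 eps^2."""
    return ((k0**2 + 1) * (1 - m**2)
            - 2 * k0 * math.sqrt(1 + k0**2) * (k0**2 + m**2) * gamma1 * eps**2)


def classify(m, k0, gamma1, eps):
    """Type of the eigenvalue pair +/- lambda_{m,eps}."""
    l2 = lam2_exact(m, k0, gamma1, eps)
    if l2 > 0:
        return "real"        # hyperbolic pair
    if l2 < 0:
        return "imaginary"   # pair on the imaginary axis
    return "zero"


k0, gamma1, eps = 1.0, -1.0, 0.1
print([classify(m, k0, gamma1, eps) for m in range(1, 5)])
# ['real', 'imaginary', 'imaginary', 'imaginary']
```

Changing the sign of $\gamma_1$ moves the $m = 1$ pair onto the imaginary axis. This matches the statement that the $m = 1$ eigenvalues are real only for $\gamma_1 < 0$.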
Recall that (4), (5) are antisymmetric in $(v,w)$ and symmetric in $y$. The equations are therefore invariant under the transformation $(y, v, w) \mapsto (-y, -v, -w)$, and this further symmetry allows us to look for solutions in the form of sine series
$$v = \sqrt{\frac{k_0}{\pi}} \sum_{m=1}^{\infty} v_m \sin(k_0 m y), \qquad w = \sqrt{\frac{k_0}{\pi}} \sum_{m=1}^{\infty} w_m \sin(k_0 m y).$$
This reduction has the effect of eliminating the subspace $E_0$ and making all eigenvalues geometrically simple. This eigenvalue picture is summarised in Fig. 2; for $\varepsilon > 0$ we have a two-dimensional hyperbolic part and an infinite-dimensional centre part. Working in the coordinates $v_m$, $w_m$, so that
$$\mathcal{X} = \ell^2_{s+1} \times \ell^2_{s}, \qquad \mathcal{D} = \ell^2_{s+2} \times \ell^2_{s+1},$$
where
$$\ell^2_t = \Big\{ x = (x_m)_{m\in\mathbb{N}} \;\Big|\; \|x\|_t^2 := \sum_{m=1}^{\infty} m^{2t} x_m^2 < \infty \Big\},$$
we find that the Hamiltonian is given by
$$H = \sum_{m=1}^{\infty} \left( \frac{w_m^2}{2} - \frac{\lambda_{m,\varepsilon}^2}{2} v_m^2 \right)$$

Fig. 2. The spectrum of the linearised problem consists of infinitely many simple purely imaginary eigenvalues together with a Jordan block at the origin for $\varepsilon = 0$ and two simple real eigenvalues for $\varepsilon > 0$
3. Normal-Form Theory

In this section we perform a sequence of symplectic changes of variable which simplifies the Hamiltonian function and predicts the existence of pulse solutions. The first step is to transform the Hamiltonian into its quadratic canonical form. To this end we make the linear symplectic change of variable
$$v_m = \frac{q_m}{\sqrt{\mu_m}}, \qquad w_m = \sqrt{\mu_m}\,p_m, \qquad m = 1, 2, \ldots,$$
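where $\mu_1 = 1$ and $\mu_m^2 = -\lambda_{m,0}^2 = (k_0^2 + 1)(m^2 - 1)$, $m = 2, 3, \ldots$. This change of variable transforms the phase space into $\mathcal{X} = \ell^2_{s+1/2} \times \ell^2_{s+1/2}$ and the Hamiltonian function into
$$H = \frac{p_1^2}{2} - \frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(p_m^2 + q_m^2) - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2 - \sum_{\pm i \pm j \pm k \pm \ell = 0} \tilde g_{ijk\ell}\, q_i q_j q_k q_\ell,$$
in which
$$\tilde g_{ijk\ell} = \frac{1}{\sqrt{\mu_i\mu_j\mu_k\mu_\ell}}\, g_{ijk\ell}, \qquad \nu_{m,\varepsilon} = \frac{1}{\mu_m}\big(\lambda_{m,\varepsilon}^2 - \lambda_{m,0}^2\big).$$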
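The quadratic part of the change of variable can be checked numerically. The check covers only the quadratic terms, since the quartic coefficients $g_{ijk\ell}$ are not needed for it. For each mode, the sketch evaluates $\frac{1}{2}w_m^2 - \frac{1}{2}\lambda_{m,\varepsilon}^2 v_m^2$ at $v_m = q_m/\sqrt{\mu_m}$, $w_m = \sqrt{\mu_m}\,p_m$. It then compares the result with the corresponding terms of the transformed Hamiltonian. Each single-mode map $(q_m, p_m) \mapsto (v_m, w_m)$ is linear with determinant 1, which is why it is symplectic. The parameter values are illustrative.

```python
import math


def lam2(m, k0, gamma1, eps):
    """lambda_{m,eps}^2 from Eq. (7)."""
    cp_t = math.sqrt(1 + k0**2) / k0 + gamma1 * eps**2
    cg_t = 1 / cp_t
    return (m**2 * k0**2 * (1 - cp_t**2) + 1) / (1 - cg_t**2)


def mu(m, k0):
    """mu_1 = 1 and mu_m = ((k0^2 + 1)(m^2 - 1))^{1/2} for m >= 2."""
    return 1.0 if m == 1 else math.sqrt((k0**2 + 1) * (m**2 - 1))


def nu(m, k0, gamma1, eps):
    """nu_{m,eps} = (lambda_{m,eps}^2 - lambda_{m,0}^2) / mu_m."""
    return (lam2(m, k0, gamma1, eps) - lam2(m, k0, gamma1, 0.0)) / mu(m, k0)


def quad_old(v, w, m, k0, gamma1, eps):
    """Quadratic part of H for mode m in the (v_m, w_m) coordinates."""
    return w**2 / 2 - lam2(m, k0, gamma1, eps) * v**2 / 2


def quad_new(q, p, m, k0, gamma1, eps):
    """Quadratic part of H for mode m in the (q_m, p_m) coordinates."""
    if m == 1:
        return p**2 / 2 - lam2(1, k0, gamma1, eps) * q**2 / 2
    return mu(m, k0) * (p**2 + q**2) / 2 - nu(m, k0, gamma1, eps) * q**2 / 2


k0, gamma1, eps = 1.0, -1.0, 0.1
for m in range(1, 8):
    s = math.sqrt(mu(m, k0))
    for q, p in [(0.3, -0.7), (1.2, 0.4)]:
        assert abs(quad_old(q / s, s * p, m, k0, gamma1, eps)
                   - quad_new(q, p, m, k0, gamma1, eps)) < 1e-9
```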
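Hamilton's equations become
$$\partial_\xi q_m = \frac{\partial H}{\partial p_m}, \qquad \partial_\xi p_m = -\frac{\partial H}{\partial q_m}, \qquad m = 1, 2, \ldots, \tag{8}$$
where the domain of the Hamiltonian vector field is $\mathcal{D} = \ell^2_{s+3/2} \times \ell^2_{s+3/2}$. Note that the action of the reverser $S$ on $\mathcal{X}$ is given by $S(q_m, p_m) = (q_m, -p_m)$ and Hamilton's equations are invariant under the transformation $(q_m, p_m) \mapsto (-q_m, -p_m)$. We now take $s > 0$, so that $\mathcal{X}$ is a Banach algebra; this step leads to simpler estimates in the subsequent analysis. Writing $H = H_L + H_L^\varepsilon + H_N$, where
$$H_L = \frac{1}{2}p_1^2 + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(p_m^2 + q_m^2), \qquad H_L^\varepsilon = -\frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2,$$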
we can also decompose $v_H = v_{H_L} + v_{H_L^\varepsilon} + v_{H_N}$, where the linear vector fields $v_{H_L}, v_{H_L^\varepsilon} : \mathcal{D} \subset \mathcal{X} \to \mathcal{X}$ are unbounded and densely defined, while $v_{H_N} : \mathcal{X} \to \mathcal{X}$ is bounded and defined upon the whole of $\mathcal{X}$. Note that $v_{H_N}$ satisfies the estimate $v_{H_N}(q,p) = O(\|(q,p)\|^3)$; here, and in the remainder of this article, the symbol $\|\cdot\|$ denotes the norm in $\mathcal{X}$ and the order-of-magnitude estimates relate to terms which are smooth functions $\mathcal{X} \to \mathcal{X}$. This semilinearity plays a central role in the subsequent analysis. The next step is to apply a series of normal-form transformations which eliminate certain terms in the Hamiltonian function. It is not possible to obtain a normal-form result as complete as that given in Lemma 1 for the corresponding finite-dimensional problem. The main difficulty lies in the fact that the normal-form transformations contain linear combinations of the frequencies $\mu_m = (k_0^2 + 1)^{1/2}(m^2 - 1)^{1/2}$, $m = 2, 3, \ldots$, in their denominators, and asymptotic resonances among these frequencies lead to a small-divisor problem (see Pöschel [21] for a thorough explanation of this point). Nevertheless, it is possible to use a partial normal form to arrive at the same conclusions concerning homoclinic orbits in the normal-form approximations. The essential requirements on the normal-form approximations are that $\mathcal{S} = \{(q_m, p_m) = 0,\ m = 2, 3, \ldots\}$ is an invariant subspace and that the dynamics in this subspace are controlled by a dynamical system of the form
$$\partial_\xi q_1 = p_1, \tag{9}$$
$$\partial_\xi p_1 = \lambda_{1,\varepsilon}^2 q_1 - 2q_1\,\partial_1 F(q_1^2, \varepsilon). \tag{10}$$
The following result shows that these requirements are met.
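Theorem 2. Consider the Hamiltonian system $(\mathcal{X}, \Omega, H)$. For each $N \ge 3$ there is a near-identity, analytic, symplectic, local change of coordinates $\Phi : \mathcal{Y} \to \mathcal{X}$, where $\mathcal{Y}$ is a neighbourhood of the origin in $\mathcal{X}$, with the property that
$$H = \frac{p_1^2}{2} - \frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(p_m^2 + q_m^2) - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2 + H_{\mathrm{NF}}(q,p,\varepsilon) + O\big(\|(q,p,\varepsilon)\|^{N-2}\|(q,p)\|^4\big)$$
in the new coordinates. The term $H_{\mathrm{NF}}$ is a polynomial of order $N+1$ in $(q,p,\varepsilon)$; it satisfies $H_{\mathrm{NF}}(q,p,\varepsilon) = O(\|(q,p)\|^4)$, contains no terms which are linear in $(q_m, p_m)$, $m = 2, 3, \ldots$, and $H_{\mathrm{NF}}|_{\mathcal{S}} = F(q_1^2, \varepsilon)$. The change of coordinates preserves the reversibility and the invariance under the transformation $(q,p) \mapsto (-q,-p)$.

Proof. We construct a sequence $\Phi_j$, $j = 4, \ldots, N+1$, of symplectic transformations with the property that $\Phi_j$ eliminates those monomials in the Hamiltonian of the form
$$\varepsilon^k p_1^\alpha q_1^\beta q_m, \qquad \varepsilon^k p_1^\alpha q_1^\beta p_m,$$
where $m \ge 2$, $k \le j - 4$ and $\alpha + \beta = j - k - 1$, together with those monomials of the form $\varepsilon^k p_1^\alpha q_1^\beta$, where $\alpha > 0$, $k \le j - 4$ and $\alpha + \beta = j - k$. Using the symbol $\{\cdot,\cdot\}$ to denote the Poisson bracket given by
$$\{F, G\} = \sum_{m=1}^{\infty} \left( \frac{\partial F}{\partial p_m}\frac{\partial G}{\partial q_m} - \frac{\partial F}{\partial q_m}\frac{\partial G}{\partial p_m} \right),$$
we find that
$$\{H_L, P_1\} = p_1^\alpha q_1^\beta q_m, \qquad \{H_L, P_2\} = p_1^\alpha q_1^\beta p_m, \qquad \{H_L, P_3\} = p_1^\alpha q_1^\beta,$$
where
$$P_1 = \begin{cases} c_{\beta+1} p_1^{\alpha+\beta} p_m + c_\beta p_1^{\alpha+\beta-1} q_1 q_m + \cdots + c_2 p_1^{\alpha+1} q_1^{\beta-1} q_m + c_1 p_1^\alpha q_1^\beta p_m, & \beta \text{ even},\\ c_{\beta+1} p_1^{\alpha+\beta} q_m + c_\beta p_1^{\alpha+\beta-1} q_1 p_m + \cdots + c_2 p_1^{\alpha+1} q_1^{\beta-1} q_m + c_1 p_1^\alpha q_1^\beta p_m, & \beta \text{ odd}, \end{cases}$$
$$P_2 = \begin{cases} d_{\beta+1} p_1^{\alpha+\beta} q_m + d_\beta p_1^{\alpha+\beta-1} q_1 p_m + \cdots + d_2 p_1^{\alpha+1} q_1^{\beta-1} p_m + d_1 p_1^\alpha q_1^\beta q_m, & \beta \text{ even},\\ d_{\beta+1} p_1^{\alpha+\beta} p_m + d_\beta p_1^{\alpha+\beta-1} q_1 q_m + \cdots + d_2 p_1^{\alpha+1} q_1^{\beta-1} p_m + d_1 p_1^\alpha q_1^\beta q_m, & \beta \text{ odd}, \end{cases}$$
$$P_3 = \frac{1}{\beta+1}\, p_1^{\alpha-1} q_1^{\beta+1},$$
and
$$c_k = \frac{(-1)^k \beta!}{\mu_m^k (\beta - k + 1)!}, \qquad d_k = \frac{(-1)^{k+1} \beta!}{\mu_m^k (\beta - k + 1)!}, \qquad k = 1, \ldots, \beta + 1.$$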
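The denominators in $c_k$ and $d_k$ are powers of a single frequency $\mu_m$. That frequency is bounded below by $\mu_2 = (3(k_0^2+1))^{1/2} > 0$. By contrast, integer combinations of several frequencies, such as $\mu_{m+2} - 2\mu_{m+1} + \mu_m$, are nonzero but tend to zero as $m \to \infty$. This is the kind of asymptotic resonance that rules out a complete normal form. The sketch illustrates both facts numerically. The particular combination and the value of $k_0$ are chosen for illustration only.

```python
import math


def mu(m, k0):
    """Frequencies mu_m = (k0^2 + 1)^{1/2} (m^2 - 1)^{1/2}, m >= 2."""
    return math.sqrt(k0**2 + 1) * math.sqrt(m**2 - 1)


def second_difference(m, k0):
    """|mu_{m+2} - 2 mu_{m+1} + mu_m|: nonzero, but decays like m^{-3}."""
    return abs(mu(m + 2, k0) - 2 * mu(m + 1, k0) + mu(m, k0))


k0 = 1.0
mu2 = mu(2, k0)
# Single-frequency denominators (as in c_k, d_k) stay away from zero:
assert all(mu(m, k0) >= mu2 > 0 for m in range(2, 2000))
# Multi-frequency combinations become arbitrarily small:
defects = [second_difference(m, k0) for m in (10, 100, 1000)]
assert defects[0] > defects[1] > defects[2] > 0
assert defects[2] < 1e-6
```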
Note that there are no small-divisor problems in the above formulae since µm ≥ µ2 > 0 for all m ≥ 2. We can therefore write down a finite sum Sj of monomials of degree j in (q, p, ) with the property that the expression {HL , Sj } consists of precisely the terms of order j we wish to eliminate. For a sufficiently small neighbourhood Y of the origin in X we find that the Hamiltonian vector field vSj associated with the Hamiltonian system (Y, /, Sj ) is an analytic function Y → X which depends analytically upon . The time-one map vSt j |t=1 of its flow is therefore well defined on Y and by Liouville’s theorem it constitutes a symplectic change of variable 6j : Y → X which depends analytically upon . Using Taylor’s theorem, we find that the Hamiltonian H is transformed into H ◦ 6j = H ◦ vSt j |t=1
= H + {H, Sj } +
1 0
(1 − t){{H, Sj }, Sj } ◦ vSt j dt
∞ νm, 2 = HL − q12 − q + H4 + . . . + Hj −1 + Hj + {HL , Sj } + HR , 2 2 m
λ21,
m=2
where the symbol Hi represents those terms in the Taylor series which are homogeneous of order i in ( , q, p) with Hi = O((q, p, )i−4 (q, p)4 ) and the remainder HR : Y → R satisfies HR = O((q, p, )j −3 (q, p)4 ). Observe that terms in H4 , . . . , Hj −1 are not affected, while those of order j identified above are removed. Note that the above argument relies upon the fact that vSj is defined upon the whole of Y, a consequence of the semilinearity of (8). The reversibility implies that H (q, −p) = H (q, p), so that monomials of the type β β β
$\varepsilon^k p_1^\alpha q_1^\beta q_m$, $\varepsilon^k p_1^\alpha q_1^\beta p_m$ and $\varepsilon^k p_1^\alpha q_1^\beta$ can only appear in $H$ for respectively even, odd and even values of $\alpha$. Inspecting the above formulae, we find that $P_1(q,-p) = -P_1(q,p)$ ($\alpha$ even), $P_2(q,-p) = -P_2(q,p)$ ($\alpha$ odd) and $P_3(q,-p) = -P_3(q,p)$ ($\alpha$ even). The vector field $v_{S_j}$ is therefore antireversible, that is $S \circ v_{S_j} = v_{S_j} \circ S$, and the change of variables $\Phi_j$ inherits this property. It follows that $(H \circ \Phi_j)(q,-p) = (H \circ \Phi_j)(q,p)$, so that the transformed Hamiltonian system is reversible. A similar argument shows that the change of variable preserves the invariance under the transformation $(q,p) \mapsto (-q,-p)$. Let us now write the Hamiltonian system obtained using Theorem 2 as

$$\partial_\xi X = L(X) + L^h_\varepsilon(X) + L^c_\varepsilon(X) + F_\varepsilon(X) + R_\varepsilon(X), \qquad (11)$$
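where $L : D \subset \mathcal X \to \mathcal X$ and $L^c_\varepsilon : D \subset \mathcal X \to \mathcal X$ are the unbounded linear operators corresponding respectively to the quadratic Hamiltonians $H_L$ and $-\frac12\sum_{m=2}^\infty \nu_{m,\varepsilon}q_m^2$, $L^h_\varepsilon : \mathcal X \to \mathcal X$ is the bounded linear operator corresponding to the quadratic Hamiltonian $-\frac12\lambda_{1,\varepsilon}^2 q_1^2$, $F_\varepsilon : \mathcal Y \to \mathcal X$ is the analytic vector field corresponding to the Hamiltonian $H_{NF}$, and $R_\varepsilon$ is the smooth $O(\|(q,p,\varepsilon)\|^{N-1}\|(q,p)\|^3)$ remainder term. Clearly $\mathcal S = \{(q_m, p_m) = 0,\ m = 2, 3, \ldots\}$ is an invariant subspace for the truncated system

$$\partial_\xi X = L(X) + L^h_\varepsilon(X) + L^c_\varepsilon(X) + F_\varepsilon(X), \qquad (12)$$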
the dynamics in which are controlled by the two-dimensional system (9), (10). The solution set of Eqs. (9), (10) is analysed by introducing the scaled variables

$$\tilde\xi = \varepsilon\xi, \qquad q_1(\xi) = \varepsilon\tilde q_1(\tilde\xi), \qquad p_1(\xi) = \varepsilon^2\tilde p_1(\tilde\xi)$$
502
M. D. Groves, G. Schneider
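and writing

$$\lambda_{1,\varepsilon}^2 = C_1\varepsilon^2 + O(\varepsilon^4), \qquad F(q_1^2, \varepsilon) = \frac{C_2}{2}q_1^4 + O(q_1^6),$$

where

$$C_1 = -2k_0\gamma_1(k_0^2+1)^{3/2} > 0, \qquad C_2 = \frac{3k_0(k_0^2+1)}{4\pi} > 0.$$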
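It follows that

$$\partial_{\tilde\xi}\tilde q_1 = \tilde p_1, \qquad \partial_{\tilde\xi}\tilde p_1 = C_1\tilde q_1 - C_2\tilde q_1^3 + \tilde R(\tilde q_1, \varepsilon),$$

where the $O(\varepsilon)$ remainder term $\tilde R$ is odd in its first argument. In the limit $\varepsilon \to 0$ these equations are equivalent to

$$\partial_{\tilde\xi}\tilde q_1 = \tilde p_1, \qquad \partial_{\tilde\xi}\tilde p_1 = C_1\tilde q_1 - C_2\tilde q_1^3,$$

whose phase portrait is easily calculated by elementary methods and is depicted in Fig. 3. Notice in particular that it has two homoclinic orbits $(\tilde q_1^+, \tilde p_1^+)$, $(\tilde q_1^-, \tilde p_1^-)$ given by the explicit formulae

$$\tilde q_1^\pm(\tilde\xi) = \pm\left(\frac{2C_1}{C_2}\right)^{1/2}\operatorname{sech}(C_1^{1/2}\tilde\xi), \qquad \tilde p_1^\pm(\tilde\xi) = \frac{d}{d\tilde\xi}\bigl(\tilde q_1^\pm(\tilde\xi)\bigr).$$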
Fig. 3. Dynamics in the $(\tilde q_1, \tilde p_1)$-subspace
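The explicit homoclinic formula can be spot-checked numerically. The following minimal Python sketch assumes illustrative values $C_1 = 1.3$, $C_2 = 0.7$ (any positive pair would do; they are not the coefficients of a particular wave equation). It evaluates $\tilde q_1^+$ and $\tilde p_1^+$, measures the residual of $\partial_{\tilde\xi}^2\tilde q_1 = C_1\tilde q_1 - C_2\tilde q_1^3$ by central differences, and checks that the first integral $\frac12\tilde p_1^2 - \frac{C_1}{2}\tilde q_1^2 + \frac{C_2}{4}\tilde q_1^4$ of the limiting system vanishes along the orbit, as it must for an orbit homoclinic to the origin.

```python
import math


def homoclinic(xi, c1, c2):
    """Return (q, p) on the '+' orbit q = sqrt(2 c1/c2) sech(sqrt(c1) xi), p = dq/dxi."""
    amp = math.sqrt(2.0 * c1 / c2)
    k = math.sqrt(c1)
    sech = 1.0 / math.cosh(k * xi)
    q = amp * sech
    p = -amp * k * sech * math.tanh(k * xi)  # derivative of amp * sech(k xi)
    return q, p


def residual(xi, c1, c2, h=1e-4):
    """Central-difference residual of q'' - (c1 q - c2 q^3) at xi."""
    q_minus, _ = homoclinic(xi - h, c1, c2)
    q_mid, _ = homoclinic(xi, c1, c2)
    q_plus, _ = homoclinic(xi + h, c1, c2)
    q_dd = (q_plus - 2.0 * q_mid + q_minus) / h ** 2
    return q_dd - (c1 * q_mid - c2 * q_mid ** 3)


def first_integral(q, p, c1, c2):
    """Conserved quantity of q' = p, p' = c1 q - c2 q^3; zero on orbits homoclinic to 0."""
    return 0.5 * p ** 2 - 0.5 * c1 * q ** 2 + 0.25 * c2 * q ** 4


C1, C2 = 1.3, 0.7  # illustrative positive constants (assumption)
for xi in (-3.0, -1.0, 0.0, 0.5, 2.0):
    q, p = homoclinic(xi, C1, C2)
    print(f"xi={xi:5.1f}  q={q:.6f}  residual={residual(xi, C1, C2):.1e}  "
          f"first integral={first_integral(q, p, C1, C2):.1e}")
```

The same script also makes the exponential decay visible: since $\operatorname{sech}x \leq 2e^{-|x|}$, the orbit decays at the rate $C_1^{1/2}$ in $\tilde\xi$, which is the origin of the $e^{-r|\xi|}$ bounds with $r$ of order $\varepsilon$ below.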
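We can exploit the reversibility to show that the phase portrait remains qualitatively unchanged for $\varepsilon > 0$. In particular, we find that (9), (10) has two homoclinic orbits $(q_1^+, p_1^+)$, $(q_1^-, p_1^-)$ of the form

$$\begin{pmatrix} q_1^\pm(\xi) \\ p_1^\pm(\xi) \end{pmatrix} = \begin{pmatrix} \varepsilon f^\pm(\varepsilon\xi) \\ \varepsilon^2 g^\pm(\varepsilon\xi) \end{pmatrix},$$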
where $f^\pm$, $g^\pm$ are smooth functions with bounded derivatives. These homoclinic orbits satisfy

$$\begin{pmatrix} q_1^\pm(\xi) \\ p_1^\pm(\xi) \end{pmatrix} = \begin{pmatrix} q_1^\pm(-\xi) \\ -p_1^\pm(-\xi) \end{pmatrix}, \qquad \xi \in \mathbb R,$$

and

$$|q_1^\pm(\xi)| \leq M\varepsilon e^{-r|\xi|}, \qquad |p_1^\pm(\xi)| \leq M\varepsilon^2 e^{-r|\xi|}, \qquad \xi \in \mathbb R$$

for $r \in (0, \lambda_{1,\varepsilon})$ (see Kirchgässner [12, Proposition 5.1] for an analytical explanation or Groves [7, §4] for a geometric explanation of this result). The Hamiltonian system (12) therefore has a pair of homoclinic orbits which are clearly related by the symmetry $(q,p) \mapsto (-q,-p)$. We denote one of these orbits by $h$ and note that its $q_j$ and $p_j$ components vanish for $j \geq 2$. Moreover $\|h(\xi)\| \leq M\varepsilon e^{-r|\xi|}$ and $(Sh)(-\xi) = h(\xi)$ for each $\xi \in \mathbb R$; these facts play an important role in the subsequent analysis.

4. The Local Centre-Stable and Centre Manifolds

4.1. Notation and preliminary estimates. The change of variable $X = \varepsilon\hat X$ transforms (11) into

$$\partial_\xi\hat X = L(\hat X) + L^h_\varepsilon(\hat X) + L^c_\varepsilon(\hat X) + \hat F_\varepsilon(\hat X) + \rho\hat R_\varepsilon(\hat X), \qquad (13)$$
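and (12) into the same equation with $\rho = 0$; here

$$\hat F_\varepsilon(\hat X) = \varepsilon^{-1}F_\varepsilon(\varepsilon\hat X), \qquad \hat R_\varepsilon(\hat X) = \varepsilon^{-N}R_\varepsilon(\varepsilon\hat X), \qquad \rho = \varepsilon^{N-1}.$$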
In the following analysis $\rho$ is used to activate or deactivate the remainder term $\hat R_\varepsilon$ in Eq. (13). We treat $\rho$ as a bifurcation parameter which is independent of $\varepsilon$, but check that all results are compatible with the relationship $\rho = \varepsilon^{N-1}$. Note that (13) represents Hamilton's equations for the scaled Hamiltonian system $(\mathcal Y, \Omega, \hat H)$, where the scaled Hamiltonian is given by the formula $\hat H(\hat X) = \varepsilon^{-2}H(\varepsilon\hat X)$, so that

$$\hat H = \frac{\hat p_1^2}{2} + \sum_{m=2}^\infty \frac{\mu_m}{2}(\hat p_m^2 + \hat q_m^2) + O(\varepsilon^2\|\hat X\|^2) + O(\rho\|(\hat X, \varepsilon)\|^{N-2}\|\hat X\|^4). \qquad (14)$$
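We begin by writing $\hat X = \hat Z + \hat h$, to find that

$$\partial_\xi\hat Z = L(\hat Z) + L^h_\varepsilon(\hat Z) + L^c_\varepsilon(\hat Z) + N_1(\hat Z) + \rho N_2(\hat Z), \qquad (15)$$

in which

$$N_1(\hat Z) = \hat F_\varepsilon(\hat h + \hat Z) - \hat F_\varepsilon(\hat h), \qquad N_2(\hat Z) = \hat R_\varepsilon(\hat h + \hat Z).$$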
Here we have used the fact that $\hat h = \varepsilon^{-1}h$ solves Eq. (13) with $\rho = 0$; this function satisfies

$$\|\hat h(\xi)\| \leq Me^{-r|\xi|}, \qquad \xi \in \mathbb R. \qquad (16)$$
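Note that $N_1$, $N_2$ are smooth functions $\mathcal Y \to \mathcal X$, so that (13) is semilinear. The function $N_1$ contains terms which are linear in $\hat Z$; we return to this point below.

Lemma 2. The estimates

$$\|N_2(\hat Z)\| \leq M\varepsilon(\|\hat Z\| + e^{-r|\xi|}), \qquad \|N_2(\hat Z_1) - N_2(\hat Z_2)\| \leq M\varepsilon\|\hat Z_1 - \hat Z_2\|$$

hold for all $\hat Z, \hat Z_1, \hat Z_2 \in \mathcal Y$.

Proof. Because $\hat R_\varepsilon$ is a smooth function on the open set $\mathcal Y$ of the Banach algebra $\mathcal X$, we have the estimate

$$\|N_2(\hat Z)\| = \|\hat R_\varepsilon(\hat Z + \hat h)\| \leq M\varepsilon\|\hat Z + \hat h\|^3 \leq M\varepsilon(\|\hat Z\|^3 + \|\hat h\|^3) \leq M\varepsilon(\|\hat Z\| + e^{-r|\xi|}).$$

Similarly, we find that

$$\|N_2(\hat Z_1) - N_2(\hat Z_2)\| = \|\hat R_\varepsilon(\hat Z_1 + \hat h) - \hat R_\varepsilon(\hat Z_2 + \hat h)\| \leq M\varepsilon\|\hat Z_1 - \hat Z_2\|.$$

Lemma 3. The estimates

$$\|N_1(\hat Z)\| \leq M\varepsilon^2(\|\hat Z\|^2 + \|\hat Z\|e^{-r|\xi|}), \qquad \|N_1(\hat Z_1) - N_1(\hat Z_2)\| \leq M\varepsilon^2(\|\hat Z_1\| + \|\hat Z_2\| + e^{-r|\xi|})\|\hat Z_1 - \hat Z_2\|$$

hold for all $\hat Z, \hat Z_1, \hat Z_2 \in \mathcal Y$.

Proof. Recall that $\hat F_\varepsilon$ is a polynomial in $\hat X$ with no quadratic terms; we have that

$$\hat F_\varepsilon(\hat X) = \hat F_3(\hat X, \hat X, \hat X) + \cdots + \hat F_N(\hat X, \ldots, \hat X), \qquad \text{where } \hat F_j = \frac{1}{j!}d^j\hat F_\varepsilon[0],$$

and $\|\hat F_j\|_{L^j(\mathcal X, \mathcal X)}$ is $O(\varepsilon^{j-1})$ for $j \geq 3$. It follows that

$$N_1(\hat Z) = \hat F_3(\hat Z + \hat h, \hat Z + \hat h, \hat Z + \hat h) - \hat F_3(\hat h, \hat h, \hat h) + \cdots + \hat F_N(\hat Z + \hat h, \ldots, \hat Z + \hat h) - \hat F_N(\hat h, \ldots, \hat h),$$

and we can estimate

$$\begin{aligned} \|\hat F_j(\hat Z + \hat h, \ldots, \hat Z + \hat h) - \hat F_j(\hat h, \ldots, \hat h)\| &= \|\hat F_j(\hat Z, \ldots, \hat Z) + j\hat F_j(\hat Z, \ldots, \hat Z, \hat h) + \cdots + j\hat F_j(\hat Z, \hat h, \ldots, \hat h)\| \\ &\leq M\varepsilon^{j-1}(\|\hat Z\|^2 + \|\hat Z\|e^{-r|\xi|}), \qquad j = 3, \ldots, N. \end{aligned}$$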
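To obtain the second estimate note that

$$N_1(\hat Z_1) - N_1(\hat Z_2) = \sum_{j=3}^N \bigl[\hat F_j(\hat Z_1 + \hat h, \ldots, \hat Z_1 + \hat h) - \hat F_j(\hat Z_2 + \hat h, \ldots, \hat Z_2 + \hat h)\bigr]$$

and write, for example,

$$\begin{aligned} &\hat F_3(\hat Z_1 + \hat h, \ldots, \hat Z_1 + \hat h) - \hat F_3(\hat Z_2 + \hat h, \ldots, \hat Z_2 + \hat h) \\ &\quad = \hat F_3(\hat Z_1, \hat Z_1, \hat Z_1) - \hat F_3(\hat Z_2, \hat Z_2, \hat Z_2) + 3\hat F_3(\hat Z_1 - \hat Z_2, \hat h, \hat h) + 3\hat F_3(\hat Z_1, \hat Z_1, \hat h) - 3\hat F_3(\hat Z_2, \hat Z_2, \hat h) \\ &\quad = \hat F_3(\hat Z_1 - \hat Z_2, \hat Z_1, \hat Z_1) + \hat F_3(\hat Z_1 - \hat Z_2, \hat Z_2, \hat Z_2) + \hat F_3(\hat Z_1 - \hat Z_2, \hat Z_1, \hat Z_2) \\ &\qquad + 3\hat F_3(\hat Z_1 - \hat Z_2, \hat h, \hat h) + 3\hat F_3(\hat Z_1 - \hat Z_2, \hat Z_1, \hat h) + 3\hat F_3(\hat Z_1 - \hat Z_2, \hat Z_2, \hat h), \end{aligned}$$

so that

$$\|\hat F_3(\hat Z_1 + \hat h, \ldots, \hat Z_1 + \hat h) - \hat F_3(\hat Z_2 + \hat h, \ldots, \hat Z_2 + \hat h)\| \leq M\varepsilon^2(\|\hat Z_1\| + \|\hat Z_2\| + e^{-r|\xi|})\|\hat Z_1 - \hat Z_2\|.$$

The remaining summands are estimated in a similar fashion.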
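The six-term expansion of $\hat F_3$ above is an algebraic identity valid for any symmetric trilinear map, so it can be checked mechanically. The minimal Python sketch below uses, as a hypothetical stand-in for $\hat F_3$, the symmetric trilinear map $F(x,y,z) = (x\cdot y)z + (y\cdot z)x + (z\cdot x)y$ on $\mathbb R^4$ (not the paper's $\hat F_3$), and compares both sides of the expansion for random vectors; every summand on the right contains the factor $\hat Z_1 - \hat Z_2$, which is what yields the Lipschitz estimate.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def add(*vectors):
    return [sum(components) for components in zip(*vectors)]


def scale(c, x):
    return [c * a for a in x]


def sub(x, y):
    return [a - b for a, b in zip(x, y)]


def F3(x, y, z):
    """Hypothetical symmetric trilinear map R^d x R^d x R^d -> R^d (stand-in for F_3)."""
    return add(scale(dot(x, y), z), scale(dot(y, z), x), scale(dot(z, x), y))


def lhs(z1, z2, h):
    """F3(Z1+h, Z1+h, Z1+h) - F3(Z2+h, Z2+h, Z2+h)."""
    a, b = add(z1, h), add(z2, h)
    return sub(F3(a, a, a), F3(b, b, b))


def rhs(z1, z2, h):
    """Six-term expansion in which every summand contains the factor Z1 - Z2."""
    d = sub(z1, z2)
    return add(F3(d, z1, z1), F3(d, z2, z2), F3(d, z1, z2),
               scale(3.0, F3(d, h, h)), scale(3.0, F3(d, z1, h)), scale(3.0, F3(d, z2, h)))


def random_vector(dim=4):
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


random.seed(0)
z1, z2, h = random_vector(), random_vector(), random_vector()
print(max(abs(u - v) for u, v in zip(lhs(z1, z2, h), rhs(z1, z2, h))))  # rounding-level difference
```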
In the theory below it is also necessary to estimate the strictly nonlinear part of $N_1$.

Lemma 4. The function $N_0$ defined by

$$N_0(\hat Z) = N_1(\hat Z) - d\hat F_\varepsilon[\hat h](\hat Z)$$

satisfies the estimates

$$\|N_0(\hat Z)\| \leq M\varepsilon^2\|\hat Z\|^2, \qquad \|N_0(\hat Z_1) - N_0(\hat Z_2)\| \leq M\varepsilon^2(\|\hat Z_1\| + \|\hat Z_2\|)\|\hat Z_1 - \hat Z_2\|$$

for all $\hat Z, \hat Z_1, \hat Z_2 \in \mathcal Y$.

Proof. In the notation of the previous lemma, we have that

$$N_1(\hat Z) - N_0(\hat Z) = \sum_{j=3}^N j\hat F_j(\hat h, \ldots, \hat h, \hat Z),$$
and the stated results are found by repeating the steps in the proof of the previous lemma.

Let us now decompose the space $\mathcal X$ into the hyperbolic and centre parts $\mathcal X_h$ and $\mathcal X_c$ determined by the spectrum of the operator $L$, so that $\mathcal X = \mathcal X_h \oplus \mathcal X_c$,
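where $\mathcal X_h = \{u \in \mathcal X \mid q_m = p_m = 0,\ m = 2, 3, \ldots\}$ and $\mathcal X_c = \{u \in \mathcal X \mid q_1 = p_1 = 0\}$. Denote the projection onto $\mathcal X_h$ along $\mathcal X_c$ by $P : \mathcal X \to \mathcal X$ and define $Q = I - P$. We also use the subscripts $c$ and $h$ as a shorthand notation, so that $X_h = PX$, $X_c = QX$. Using this notation, we now establish some basic facts concerning the linear operators $\mathcal L_\varepsilon = L + L^h_\varepsilon + L^c_\varepsilon$ and $\mathcal L_{\hat h} = \mathcal L_\varepsilon + d\hat F_\varepsilon[\hat h]$. Recall that $\mathcal X_c$ and $\mathcal X_h$ are invariant under $\mathcal L_\varepsilon$. The spectrum of the $2 \times 2$ matrix $\mathcal L_\varepsilon|_{\mathcal X_h}$ consists of two real eigenvalues $\pm\lambda$, where $\lambda = |\lambda_{1,\varepsilon}|$. The corresponding eigenvectors are $u = (1, \lambda)$ and $s = (1, -\lambda)$, and a direct calculation shows that the basis $\{s^*, u^*\}$ of $\mathbb R^2$ which is dual to $\{u, s\}$ in $\mathcal X_h$ satisfies

$$|u^*| \leq \frac{M}{\lambda}, \qquad |s^*| \leq \frac{M}{\lambda}.$$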
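The "direct calculation" of the dual basis can be written out. The sketch below assumes, consistently with the eigenvalues $\pm\lambda$ and eigenvectors $u = (1,\lambda)$, $s = (1,-\lambda)$ stated above, that $\mathcal L_\varepsilon|_{\mathcal X_h}$ acts as $(q_1, p_1) \mapsto (p_1, \lambda^2 q_1)$; this matrix representation is an assumption, not quoted from the paper. It verifies the eigenpairs and computes the dual vectors $u^* = (\frac12, \frac{1}{2\lambda})$, $s^* = (\frac12, -\frac{1}{2\lambda})$, whose length grows like $1/(2\lambda)$ as $\lambda \to 0$; this growth is the source of the factor $M/\lambda$.

```python
import math


def dual_basis(u, s):
    """Return (u_star, s_star) with <u,u*> = <s,s*> = 1 and <s,u*> = <u,s*> = 0 in R^2.

    The dual vectors are the rows of the inverse of the matrix whose columns are u and s.
    """
    det = u[0] * s[1] - u[1] * s[0]
    u_star = (s[1] / det, -s[0] / det)
    s_star = (-u[1] / det, u[0] / det)
    return u_star, s_star


def apply_hyperbolic_block(lam, v):
    """Assumed action of the hyperbolic block: (q1, p1) -> (p1, lam^2 q1)."""
    return (v[1], lam ** 2 * v[0])


def inner(x, y):
    return x[0] * y[0] + x[1] * y[1]


for lam in (0.5, 0.1, 0.01):
    u, s = (1.0, lam), (1.0, -lam)
    u_star, s_star = dual_basis(u, s)
    print(f"lambda={lam}: u*={u_star}, s*={s_star}, lambda*|u*|={lam * math.hypot(*u_star):.4f}")
```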
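The spectrum of $\mathcal L_\varepsilon|_{\mathcal X_c}$ consists of one pair of simple purely imaginary eigenvalues associated with each mode $(q_m, p_m)$, $m \geq 2$. The following lemma is a direct consequence of this observation.

Lemma 5. The operator $\mathcal L_\varepsilon|_{\mathcal X_c}$ generates a strongly continuous group $\{K(\xi)\}_{\xi \in \mathbb R}$ of operators in $L(\mathcal X_c, \mathcal X_c)$ such that

$$\sup_{\xi \in \mathbb R}\|K(\xi)\|_{L(\mathcal X_c, \mathcal X_c)} \leq M.$$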
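Lemma 5 rests on the fact that each centre mode contributes a $2 \times 2$ block with purely imaginary eigenvalues. As an illustration only, the sketch below assumes a single block of the form $q' = ap$, $p' = -bq$ with $a, b > 0$ (for the leading part of (14) one would have $a = b = \mu_m$; the exact block produced by $L^c_\varepsilon$ is not reproduced here). Its flow $K(\xi)$ is written in closed form, and the script checks the group property $K(\xi_1 + \xi_2) = K(\xi_1)K(\xi_2)$, the generator $K'(0)$, and a bound on the Frobenius norm of $K(\xi)$ that is uniform in $\xi$. A bound uniform over all modes $m$ would additionally require the ratios $a/b$ to stay bounded.

```python
import math


def K(xi, a, b):
    """Flow of q' = a p, p' = -b q (a, b > 0) as a 2x2 matrix (tuple of rows)."""
    w = math.sqrt(a * b)  # eigenvalues of the block are +/- i w
    c, s = math.cos(w * xi), math.sin(w * xi)
    return ((c, math.sqrt(a / b) * s),
            (-math.sqrt(b / a) * s, c))


def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))


a, b = 2.0, 0.5  # illustrative block coefficients (assumption)
bound = math.sqrt(2.0 + a / b + b / a)  # ||K(xi)||_F^2 = 2 cos^2 + (a/b + b/a) sin^2
print(max(frobenius(K(0.1 * j, a, b)) for j in range(-500, 501)), bound)
```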
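The following results state some spectral properties of the operator $\mathcal L_{\hat h}$.

Lemma 6. The hyperbolic subspace $\mathcal X_h$ is invariant under $\mathcal L_{\hat h}$.

Proof. It suffices to show that $\mathcal X_h$ is invariant under $d\hat F_\varepsilon[\hat h]$. The function $\hat F_\varepsilon$ has by construction the property that $Q\hat F_\varepsilon(\hat X) = 0$ whenever $\hat X_c = 0$ (see Sect. 3). Since $\hat h_c = 0$, we conclude that $Q\,d\hat F_\varepsilon[\hat h](\hat Z) = 0$ whenever $\hat Z_c = 0$.

Lemma 7. The equation $\partial_\xi\hat Z_h = \mathcal L_{\hat h}|_{\mathcal X_h}\hat Z_h$ has solutions $s(\xi)$, $u(\xi)$ on $[0, \infty)$ such that

$$|s(\xi)| \leq Me^{-\lambda\xi}, \qquad |u(\xi)| \leq Me^{\lambda\xi}, \qquad \xi \in [0, \infty).$$

The dual basis $\{s^*(\xi), u^*(\xi)\}$ to $\{s(\xi), u(\xi)\}$ in $\mathcal X_h$ satisfies

$$|s^*(\xi)| \leq \frac{M}{\lambda}e^{\lambda\xi}, \qquad |u^*(\xi)| \leq \frac{M}{\lambda}e^{-\lambda\xi}, \qquad \xi \in [0, \infty).$$

Proof. Note that

$$\|(\mathcal L_{\hat h} - \mathcal L_\varepsilon)|_{\mathcal X_h}\|_{L(\mathcal X_h, \mathcal X_h)} \leq Me^{-r|\xi|}, \qquad \xi \in \mathbb R,$$

and use the method explained by Groves and Mielke [8, §4.3].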
4.2. The local centre-stable manifold. We now construct the local centre-stable manifold for solutions of Eq. (15) at time $\xi = 0$. This manifold consists of the initial data for solutions of (15) which exist for some time interval starting at $\xi = 0$. Such solutions are constructed by modifying the nonlinearities by a cut-off function and proving a global existence result for solutions of the modified system on $[0, \infty)$ whose hyperbolic parts are small. We write Eq. (15) as

$$\partial_\xi\hat Z_h = \mathcal L_{\hat h}(\hat Z_h) + P(N_0(\hat Z) + \rho N_2(\hat Z)), \qquad (17)$$

$$\partial_\xi\hat Z_c = \mathcal L_\varepsilon(\hat Z_c) + Q(N_1(\hat Z) + \rho N_2(\hat Z)) \qquad (18)$$

and modify the nonlinearities by means of a cut-off function $\theta \in C^\infty(\mathcal X, \mathbb R)$ which satisfies

$$\theta(\hat Z_c) = \begin{cases} 1, & \|\hat Z_c\| \leq \delta, \\ 0, & \|\hat Z_c\| \geq 2\delta, \end{cases}$$

where $\delta = \varepsilon^n$ for some $n \in \mathbb N$. We therefore consider the equations

$$\partial_\xi\hat Z_h = \mathcal L_{\hat h}(\hat Z_h) + P(\underline N_0(\hat Z) + \rho\underline N_2(\hat Z)), \qquad (19)$$

$$\partial_\xi\hat Z_c = \mathcal L_\varepsilon(\hat Z_c) + Q(\underline N_1(\hat Z) + \rho\underline N_2(\hat Z)), \qquad (20)$$
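in which $\underline N_j(\hat Z) = N_j(\hat Z\theta(\hat Z_c))$. Note that $\underline N_0$, $\underline N_1$, $\underline N_2$ are controlled by the same estimates as respectively $N_0$, $N_1$, $N_2$. Consider the integral equation

$$\begin{aligned} \hat Z(\xi) &= V_s s(\xi) + K(\xi)V_c + \int_0^\xi \langle P(\underline N_0(\hat Z) + \rho\underline N_2(\hat Z))(\tau), s^*(\tau)\rangle\,d\tau\,s(\xi) \\ &\quad - \int_\xi^\infty \langle P(\underline N_0(\hat Z) + \rho\underline N_2(\hat Z))(\tau), u^*(\tau)\rangle\,d\tau\,u(\xi) + \int_0^\xi K(\xi - \tau)Q(\underline N_1(\hat Z) + \rho\underline N_2(\hat Z))(\tau)\,d\tau, \end{aligned}$$

where $V_s \in \mathbb R$, $V_c \in D_c$ and $\langle\cdot,\cdot\rangle$ denotes the usual inner product in $\mathbb R^2$. We write this equation as $\hat Z = \mathcal F(\hat Z)$ and study it in the Banach space

$$E_r^+ = \Bigl\{\hat Z \in C([0, \infty), \mathcal X) \Bigm| \|\hat Z\|_r := \sup_{\xi \geq 0} e^{-r\xi}\|\hat Z(\xi)\| < \infty\Bigr\},$$

in which the exponent $r$ satisfies $0 < r < \lambda$. The following lemma is proved using standard techniques for the regularity of mild solutions of semilinear evolutionary problems (see Pazy [19, Ch. 6]).

Lemma 8. Any fixed point $\hat Z \in E_r^+$ of $\mathcal F$ is a solution of (19), (20) on $[0, \infty)$ with the property that

$$\langle\hat Z_h(0), s^*(0)\rangle = V_s, \qquad Q\hat Z(0) = V_c.$$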
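The paper does not specify the cut-off $\theta$ used in (19), (20). One standard way to build a smooth cut-off with the required properties uses the $C^\infty$ function $t \mapsto e^{-1/t}$; the sketch below does this for a vector in a finite-dimensional Euclidean space. In $\mathcal X$ itself the smoothness of $\theta$ also depends on the smoothness of the chosen norm, which this sketch does not address. The helper `cut_off_argument` (a hypothetical name) returns the argument $\hat Z\theta(\hat Z_c)$ at which $\underline N_j$ evaluates $N_j$.

```python
import math


def _f(t):
    """Smooth function that vanishes, with all derivatives, for t <= 0."""
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def smooth_step(t):
    """C-infinity step: 0 for t <= 0, 1 for t >= 1, strictly between 0 and 1 otherwise."""
    return _f(t) / (_f(t) + _f(1.0 - t))


def theta(z_c, delta):
    """Cut-off equal to 1 for |z_c| <= delta and 0 for |z_c| >= 2 delta (Euclidean norm)."""
    norm = math.sqrt(sum(x * x for x in z_c))
    return 1.0 - smooth_step(norm / delta - 1.0)


def cut_off_argument(z_h, z_c, delta):
    """Return the hyperbolic and centre parts of Z * theta(Z_c)."""
    weight = theta(z_c, delta)
    return [weight * x for x in z_h], [weight * x for x in z_c]


delta = 1e-3  # e.g. delta = eps**n with eps = 0.1, n = 3 (illustrative)
for radius in (0.5e-3, 1.2e-3, 1.5e-3, 1.9e-3, 2.5e-3):
    print(radius, theta([radius, 0.0], delta))
```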
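Theorem 3. Suppose that $|V_s| \leq M\varepsilon^{n+1}$ and $\rho \leq \varepsilon^{n+2}$. The function $\mathcal F$ is a contraction on

$$B_\delta^+ = \Bigl\{\hat Z \in E_r^+ \Bigm| \sup_{\xi \geq 0}|\hat Z_h(\xi)| \leq \delta\Bigr\}$$

and the unique fixed point of $\mathcal F$ in $B_\delta^+$ satisfies

$$\sup_{\xi \geq 0}|\hat Z_h(\xi)| \leq M\varepsilon^{n+1}.$$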
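Proof. The first step is to show that $B_\delta^+$ is invariant under $\mathcal F$. For $|\hat Z_h(\xi)| \leq \delta$ on $[0, \infty)$, we find that

$$\begin{aligned} |(\mathcal F(\hat Z))_h(\xi)| &\leq M|V_s|e^{-\lambda\xi} + \frac{M}{\lambda}\int_0^\xi [\varepsilon^2\delta^2 + \rho\varepsilon(\delta + e^{-r\tau})]e^{\lambda\tau}\,d\tau\,e^{-\lambda\xi} + \frac{M}{\lambda}\int_\xi^\infty [\varepsilon^2\delta^2 + \rho\varepsilon(\delta + e^{-r\tau})]e^{-\lambda\tau}\,d\tau\,e^{\lambda\xi} \\ &= M|V_s|e^{-\lambda\xi} + \frac{M}{\lambda^2}[\varepsilon^2\delta^2 + \rho\varepsilon\delta](1 - e^{-\lambda\xi}) + \frac{M}{\lambda(\lambda - r)}\rho\varepsilon(e^{-r\xi} - e^{-\lambda\xi}) \\ &\quad + \frac{M}{\lambda^2}[\varepsilon^2\delta^2 + \rho\varepsilon\delta] + \frac{M}{\lambda(\lambda + r)}\rho\varepsilon e^{-r\xi} \\ &\leq M\Bigl(|V_s| + \delta^2 + \frac{\rho}{\varepsilon}\Bigr) \qquad (21) \end{aligned}$$

and

$$\begin{aligned} \|(\mathcal F(\hat Z))_c(\xi)\| &\leq M\|V_c\| + M\int_0^\xi [\varepsilon^2(\delta^2 + \delta e^{-r\tau}) + \rho\varepsilon(\delta + e^{-r\tau})]\,d\tau \\ &\leq M\|V_c\| + M(\varepsilon^2\delta^2 + \rho\varepsilon\delta)\xi + M(\varepsilon\delta + \rho). \qquad (22) \end{aligned}$$

It follows that

$$|(\mathcal F(\hat Z))_h(\xi)| \leq M\varepsilon^{n+1} + M|V_s|, \qquad \|(\mathcal F(\hat Z))_c\|_r < \infty$$

whenever $\delta = \varepsilon^n$, $\rho \leq \varepsilon^{n+2}$, and we conclude that $\mathcal F$ maps $B_\delta^+$ into itself with $|(\mathcal F(\hat Z))_h(\xi)| \leq M\varepsilon^{n+1}$ on $[0, \infty)$ for $|V_s| \leq M\varepsilon^{n+1}$.

It remains to show that $\mathcal F$ is a contraction in $B_\delta^+$. Observe that

$$\begin{aligned} |(\mathcal F(\hat Z_1))_h(\xi) - (\mathcal F(\hat Z_2))_h(\xi)| &\leq \frac{M}{\lambda}\int_0^\xi (\varepsilon^2\delta + \rho\varepsilon)\|\hat Z_1 - \hat Z_2\|e^{\lambda\tau}\,d\tau\,e^{-\lambda\xi} + \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|\hat Z_1 - \hat Z_2\|e^{-\lambda\tau}\,d\tau\,e^{\lambda\xi} \\ &\leq \frac{M}{\lambda}\int_0^\xi (\varepsilon^2\delta + \rho\varepsilon)e^{(\lambda + r)\tau}\,d\tau\,e^{-\lambda\xi}\|\hat Z_1 - \hat Z_2\|_r + \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)e^{(r - \lambda)\tau}\,d\tau\,e^{\lambda\xi}\|\hat Z_1 - \hat Z_2\|_r \\ &= \frac{M}{\lambda(r + \lambda)}(\varepsilon^2\delta + \rho\varepsilon)(e^{r\xi} - e^{-\lambda\xi})\|\hat Z_1 - \hat Z_2\|_r + \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)e^{r\xi}\|\hat Z_1 - \hat Z_2\|_r \end{aligned}$$

and

$$\begin{aligned} \|(\mathcal F(\hat Z_1))_c(\xi) - (\mathcal F(\hat Z_2))_c(\xi)\| &\leq M\int_0^\xi [\varepsilon^2(\delta + e^{-r\tau}) + \rho\varepsilon]\|\hat Z_1 - \hat Z_2\|\,d\tau \\ &\leq M\int_0^\xi [(\varepsilon^2\delta + \rho\varepsilon)e^{r\tau} + \varepsilon^2]\,d\tau\,\|\hat Z_1 - \hat Z_2\|_r \\ &\leq \Bigl(\frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon)(e^{r\xi} - 1) + M\varepsilon^2\xi\Bigr)\|\hat Z_1 - \hat Z_2\|_r \end{aligned}$$

for $\hat Z_1, \hat Z_2 \in B_\delta^+$, so that

$$|(\mathcal F(\hat Z_1))_h - (\mathcal F(\hat Z_2))_h|_r \leq \Bigl(\frac{M}{\lambda(r + \lambda)} + \frac{M}{\lambda(\lambda - r)}\Bigr)(\varepsilon^2\delta + \rho\varepsilon)\|\hat Z_1 - \hat Z_2\|_r$$

and

$$\|(\mathcal F(\hat Z_1))_c - (\mathcal F(\hat Z_2))_c\|_r \leq \Bigl(\frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon) + \frac{M\varepsilon^2}{r}\Bigr)\|\hat Z_1 - \hat Z_2\|_r,$$

in which the estimate

$$\sup_{\xi \geq 0} r\xi e^{-r\xi} \leq M$$

has been used. It follows that $\mathrm{Lip}_{B_\delta^+}\mathcal F < 1$ for $\delta = \varepsilon^n$ and $\rho \leq \varepsilon^{n+2}$.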
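The closed-form integrals used in (21) and in the Lipschitz estimate for the hyperbolic part can be spot-checked by quadrature. The sketch below uses composite Simpson's rule with illustrative rates $\lambda = 0.8$, $r = 0.3$ (any $0 < r < \lambda$ would do), truncates the integrals over $[\xi, \infty)$ at $\xi + 60$, where the integrands are negligible, and also checks the elementary bound $\sup_{\xi \geq 0} r\xi e^{-r\xi} = 1/e$.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


LAM, R, TAIL = 0.8, 0.3, 60.0  # illustrative values with 0 < r < lambda (assumption)


def checks(xi, lam=LAM, r=R):
    """Pairs (quadrature, closed form) for the four integrals in the proof of Theorem 3."""
    e = math.exp
    return [
        (simpson(lambda t: e((lam - r) * t), 0.0, xi) * e(-lam * xi),
         (e(-r * xi) - e(-lam * xi)) / (lam - r)),
        (simpson(lambda t: e(-(lam + r) * t), xi, xi + TAIL) * e(lam * xi),
         e(-r * xi) / (lam + r)),
        (simpson(lambda t: e((lam + r) * t), 0.0, xi) * e(-lam * xi),
         (e(r * xi) - e(-lam * xi)) / (lam + r)),
        (simpson(lambda t: e((r - lam) * t), xi, xi + TAIL) * e(lam * xi),
         e(r * xi) / (lam - r)),
    ]


for xi in (0.0, 0.7, 2.5):
    print(xi, [(round(quad, 8), round(closed, 8)) for quad, closed in checks(xi)])
print(max(R * x * math.exp(-R * x) for x in [0.01 * k for k in range(5000)]), math.exp(-1.0))
```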
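Lemma 9. Suppose that $\rho \leq \varepsilon^{n+2}$. Any solution $\hat Z$ of (19), (20) on $[0, \infty)$ which satisfies

$$\sup_{\xi \geq 0}|\hat Z_h(\xi)| \leq \delta$$

is a fixed point of $\mathcal F$ in $B_\delta^+$ with

$$\langle\hat Z_h(0), s^*(0)\rangle = V_s, \qquad \hat Z_c(0) = V_c.$$

Proof. Any solution of (19), (20) on $[0, \infty)$ satisfies

$$\begin{aligned} \hat Z(\xi) &= \langle\hat Z_h(0), s^*(0)\rangle s(\xi) + \int_0^\xi \langle P(\underline N_0(\hat Z) + \rho\underline N_2(\hat Z)), s^*(\tau)\rangle\,d\tau\,s(\xi) \\ &\quad + \langle\hat Z_h(0), u^*(0)\rangle u(\xi) + \int_0^\xi \langle P(\underline N_0(\hat Z) + \rho\underline N_2(\hat Z)), u^*(\tau)\rangle\,d\tau\,u(\xi) \\ &\quad + K(\xi)\hat Z_c(0) + \int_0^\xi K(\xi - \tau)Q(\underline N_1(\hat Z) + \rho\underline N_2(\hat Z))\,d\tau. \end{aligned}$$

Using this analogue of the variation-of-constants formula, we now follow the standard arguments given by Coddington and Levinson [1, pp. 332–333].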
We now use the above results to define a local centre-stable manifold at time $\xi = 0$ for the nonautonomous Eqs. (17), (18); this manifold is a global centre-stable manifold for Eqs. (19), (20), in which the cut-off function is used.

Definition 1. Suppose that $\rho \leq \varepsilon^{n+2}$ and take $V_c \in D_c$, $V_s \in \mathbb R$ with the property that $\mathcal F$ has a unique fixed point $\hat Z_{V_s,V_c}$ in $B_\delta^+$. The set of points

$$W^{cs} = \bigcup_{V_s, V_c}\{\hat Z_{V_s,V_c}(0)\}$$
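is called the local centre-stable manifold for solutions to (17), (18) at time $\xi = 0$.

4.3. The local centre manifold. We now construct a locally invariant manifold for Eq. (13) which consists of solutions whose hyperbolic parts are small. More precisely, we consider the equations

$$\partial_\xi\hat X_h = \mathcal L_\varepsilon(\hat X_h) + P(\underline{\hat F}_\varepsilon(\hat X) + \rho\underline{\hat R}_\varepsilon(\hat X)), \qquad (23)$$

$$\partial_\xi\hat X_c = \mathcal L_\varepsilon(\hat X_c) + Q(\underline{\hat F}_\varepsilon(\hat X) + \rho\underline{\hat R}_\varepsilon(\hat X)), \qquad (24)$$

where the underscore represents modification by the cut-off function $\theta$. The estimates

$$\|\hat R_\varepsilon(\hat X)\| \leq M\varepsilon\|\hat X\|^3, \qquad \|\hat R_\varepsilon(\hat X_1) - \hat R_\varepsilon(\hat X_2)\| \leq M\varepsilon\|\hat X_1 - \hat X_2\|, \qquad (25)$$

$$\|\hat F_\varepsilon(\hat X)\| \leq M\varepsilon^2\|\hat X\|^3, \qquad \|\hat F_\varepsilon(\hat X_1) - \hat F_\varepsilon(\hat X_2)\| \leq M\varepsilon^2(\|\hat X_1\| + \|\hat X_2\|)\|\hat X_1 - \hat X_2\| \qquad (26)$$

clearly hold for all $\hat X, \hat X_1, \hat X_2 \in \mathcal Y$ and are also valid for $\underline{\hat F}_\varepsilon$ and $\underline{\hat R}_\varepsilon$. Consider the integral equation

$$\begin{aligned} \hat X(\xi) &= K(\xi)V_c + \int_{-\infty}^\xi \langle P(\underline{\hat F}_\varepsilon(\hat X) + \rho\underline{\hat R}_\varepsilon(\hat X))(\tau), s^*e^{\lambda\tau}\rangle\,d\tau\,se^{-\lambda\xi} \\ &\quad - \int_\xi^\infty \langle P(\underline{\hat F}_\varepsilon(\hat X) + \rho\underline{\hat R}_\varepsilon(\hat X))(\tau), u^*e^{-\lambda\tau}\rangle\,d\tau\,ue^{\lambda\xi} + \int_0^\xi K(\xi - \tau)Q(\underline{\hat F}_\varepsilon(\hat X) + \rho\underline{\hat R}_\varepsilon(\hat X))(\tau)\,d\tau, \end{aligned}$$

where $V_c \in D_c$. We write this equation as $\hat X = \mathcal G(\hat X)$ and study it in the Banach space

$$E_r = \Bigl\{\hat X \in C(\mathbb R, \mathcal X) \Bigm| \|\hat X\|_r = \sup_{\xi \in \mathbb R} e^{-r|\xi|}\|\hat X(\xi)\| < \infty\Bigr\}.$$

The following theorem is obtained in the same way as the results in Sect. 4.2.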
Theorem 4. Suppose that $\rho \leq \varepsilon^{n+2}$.
(i) Any fixed point $\hat X \in E_r$ of $\mathcal G$ is a solution of (23), (24) on $\mathbb R$ with the property that $Q\hat X(0) = V_c$.
(ii) The function $\mathcal G$ is a contraction on

$$B_\delta = \Bigl\{\hat X \in E_r \Bigm| \sup_{\xi \in \mathbb R}|\hat X_h(\xi)| \leq \delta\Bigr\}$$

and the unique fixed point of $\mathcal G$ in $B_\delta$ satisfies $\sup_{\xi \in \mathbb R}|\hat X_h(\xi)| \leq M\varepsilon^{n+1}$.
(iii) Any solution to (23), (24) on $\mathbb R$ which satisfies $\sup_{\xi \in \mathbb R}|\hat X_h(\xi)| \leq \delta$ is a fixed point of $\mathcal G$ in $B_\delta$ with $\hat X_c(0) = V_c$.

We now define a local centre manifold for (13) as the set of initial data of solutions of (23), (24) on $\mathbb R$ whose hyperbolic parts are bounded by $\delta$. The precise choice of starting time for the solutions is not important, however, because the equations are autonomous. Notice that $W^c$ is a locally invariant manifold for (13) and a globally invariant manifold for (23), (24).

Definition 2. Suppose that $\rho \leq \varepsilon^{n+2}$, take $V_c \in D_c$ and denote the unique fixed point of $\mathcal G$ in $B_\delta$ by $\hat X_{V_c}$. The set of points

$$W^c = \bigcup_{V_c}\{\hat X_{V_c}(0)\}$$

is called the local centre manifold for (13). It will prove useful to introduce a reduction function $\psi$ to describe the local part of $W^c$, so that

$$W^c \cap \{\|\hat X_c\| \leq \delta\} = \{(\psi(\hat X_c), \hat X_c) \mid \|\hat X_c\| \leq \delta\}.$$

The reduction function $\psi$ is defined by $\psi(\hat X_c) = (\hat X_{\hat X_c})_h(0)$, and a familiar argument shows that $|\psi(\hat X_c)| \leq M\|\hat X_c\|^2$ (see Mielke [16, p. 77]).
5. The Global Centre-Stable Manifold

In this section we show that solutions $\hat Z_{V_s,V_c}$ of (19), (20) whose initial data lie on $W^{cs}$ satisfy $\|\hat Z_c(\xi)\| \leq \delta$ for $\xi \in [0, \infty)$ whenever $|V_s|, \|V_c\| \leq \varepsilon^{n+2}$. Such solutions solve (17), (18), in which the cut-off function $\theta$ is not used, and we then refer to the submanifold $\tilde W^{cs}$ of $W^{cs}$ given by restricting to these values of $V_s$ and $V_c$ as a global centre-stable manifold for solutions of (17), (18) at $\xi = 0$.

The proof of the above assertion is given in Theorem 6 below. It relies upon three preliminary results. Firstly, the origin is Lyapunov stable within the centre manifold $W^c$, so that solutions on $W^c$ which are $O(\varepsilon^{n+1})$ at some time remain $O(\varepsilon^{n+1})$ for all subsequent times (Lemma 10). Secondly, the centre part of solutions with initial data on $W^{cs}$ remains $O(\varepsilon^{n+1})$ on timescales of order $1/\varepsilon^{n+1}$ (Lemma 11). Thirdly, any solution $\hat Z$ with initial data on $W^{cs}$ has the property that $\hat Z + \hat h$ converges exponentially to a solution $\hat X$ on $W^c$ (Theorem 5). The idea is therefore to control the centre part of $\hat Z$ over a long timescale using Lemma 11; at the end of this timescale $\hat Z_c$ is very close to $\hat X_c$ because of the exponential decay and can be controlled by the Lyapunov stability of the origin in $W^c$. The main issue is careful book-keeping of the exponential decay rates and multiplicative constants in the various estimates.

Lemma 10. Any solution $\hat X$ of (23), (24) on $W^c$ with the property that $\|\hat X_c(\xi')\| \leq M\varepsilon^{n+1}$ for some $\xi' > 0$ satisfies

$$\|\hat X_c(\xi)\| \leq M\varepsilon^{n+1} < \delta, \qquad \xi \geq \xi'.$$

Proof. Let $\hat X$ be a solution of (23), (24) that lies on $W^c$, so that $\hat X(\xi) = (\psi(\hat X_c(\xi)), \hat X_c(\xi))$ whenever $\|\hat X_c(\xi)\| \leq \delta$. It follows that $\hat H(\hat X(\xi)) = \hat H(\psi(\hat X_c(\xi)), \hat X_c(\xi))$, and using the formula (14) for $\hat H$ and the estimate $|\psi(\hat X_c)| \leq M\|\hat X_c\|^2$ for $\psi$, we find that

$$\hat H(\hat X(\xi)) \geq M\|\hat X_c(\xi)\|^2$$

whenever $\|\hat X_c(\xi)\| \leq \delta$. The classical Lyapunov stability argument now shows that $\|\hat X_c(\xi')\| \leq M\varepsilon^{n+1}$ implies that $\|\hat X_c(\xi)\| \leq M\varepsilon^{n+1}$ for $\xi \geq \xi'$ (cf. Meyer & Hall [15, p. 5]).
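Lemma 11. Suppose that $\rho \leq \varepsilon^{n+2}$. Any solution $\hat Z_{V_s,V_c}$ of (19), (20) with initial data on $W^{cs}$ which satisfies

$$|V_s| \leq M\varepsilon^{n+1}, \qquad \|V_c\| \leq M\varepsilon^{n+1}$$

has the properties that

$$|\hat Z_h(\xi)| \leq M\varepsilon^{n+1}, \quad \xi \in [0, \infty), \qquad \text{and} \qquad \|\hat Z_c(\xi)\| \leq M\varepsilon^{n+1}, \quad \xi \in [0, \xi_1]$$

for any $\xi_1 \leq 1/\varepsilon^{n+1}$.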
Proof. The fact that $|\hat Z_h(\xi)| \leq M\varepsilon^{n+1}$ for $\xi \in [0, \infty)$ follows directly from Theorem 3, and the estimate (22) in its proof shows that

$$\|\hat Z_c(\xi)\| \leq M\|V_c\| + M(\varepsilon^2\delta^2 + \rho\varepsilon\delta)\xi + M(\varepsilon\delta + \rho), \qquad \xi \in [0, \infty).$$

The result for $\|\hat Z_c(\xi)\|$ follows from this inequality, the hypothesis on $V_c$ and the relations $\delta = \varepsilon^n$, $\rho \leq \varepsilon^{n+2}$.

Consider a solution $\hat Z$ to (19), (20) with initial data on $W^{cs}$ which meets the hypotheses in Lemma 11. We now use the strategy explained by Mielke [17] to construct a solution $\hat X$ of (23), (24) to which $\hat Z + \hat h$ converges exponentially as $\xi \to \infty$. Define $\hat Y : \mathbb R \to \mathcal Y$ by

$$\hat Y(\xi) = \mathcal I(\xi)(\hat Z(\xi) + \hat h(\xi)), \qquad (27)$$
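where $\mathcal I \in C^\infty(\mathbb R, \mathbb R)$ is a cut-off function such that

$$\mathcal I(\xi) = \begin{cases} 0, & \xi \leq \xi_2/2, \\ 1, & \xi \geq \xi_2, \end{cases}$$

and is chosen so that $|\mathcal I'(\xi)| \leq M/\xi_2$ for all $\xi \in \mathbb R$. It follows that $\hat Y$ solves the equation

$$\partial_\xi\hat Y = \mathcal L_\varepsilon\hat Y + \hat F_\varepsilon(\hat Y) + \rho\hat R_\varepsilon(\hat Y) + \mathcal S,$$

in which

$$\begin{aligned} \mathcal S(\xi) &= \mathcal I(\xi)[\hat F_\varepsilon(\hat Z(\xi) + \hat h(\xi)) + \rho\hat R_\varepsilon(\hat Z(\xi) + \hat h(\xi))] \\ &\quad - \hat F_\varepsilon(\mathcal I(\xi)(\hat Z(\xi) + \hat h(\xi))) - \rho\hat R_\varepsilon(\mathcal I(\xi)(\hat Z(\xi) + \hat h(\xi))) + \mathcal I'(\xi)(\hat Z(\xi) + \hat h(\xi)). \end{aligned}$$

Notice that the difference $\mathcal J$ between $\hat Y$ and a solution $\hat X$ of Eqs. (23), (24) solves the equation

$$\partial_\xi\mathcal J = \mathcal L_\varepsilon\mathcal J + \hat F_\varepsilon(\hat Y) - \hat F_\varepsilon(\hat Y - \mathcal J) + \rho[\hat R_\varepsilon(\hat Y) - \hat R_\varepsilon(\hat Y - \mathcal J)] + \mathcal S. \qquad (28)$$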
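A cut-off $\mathcal I$ with $|\mathcal I'(\xi)| \leq M/\xi_2$ can be obtained by rescaling one fixed smooth step to the interval $[\xi_2/2, \xi_2]$: the chain rule then contributes exactly the factor $2/\xi_2$. The paper only states the properties of $\mathcal I$, so the construction below (built from $t \mapsto e^{-1/t}$, with the hypothetical name `cutoff_I`) is an assumption. The sketch checks numerically that $\xi_2\max|\mathcal I'|$ does not depend on $\xi_2$.

```python
import math


def _f(t):
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def smooth_step(t):
    """C-infinity step: 0 for t <= 0, 1 for t >= 1."""
    return _f(t) / (_f(t) + _f(1.0 - t))


def cutoff_I(xi, xi2):
    """Hypothetical cut-off: 0 for xi <= xi2/2, 1 for xi >= xi2, smooth in between."""
    return smooth_step((xi - 0.5 * xi2) / (0.5 * xi2))


def scaled_max_slope(xi2, samples=2000):
    """Return xi2 * max |I'(xi)| on [xi2/2, xi2], with I' from central differences."""
    h = 1e-6 * xi2
    slopes = []
    for k in range(samples + 1):
        xi = 0.5 * xi2 + 0.5 * xi2 * k / samples
        slopes.append(abs(cutoff_I(xi + h, xi2) - cutoff_I(xi - h, xi2)) / (2.0 * h))
    return xi2 * max(slopes)


for xi2 in (10.0, 100.0, 1000.0):
    print(xi2, scaled_max_slope(xi2))  # the same constant M for every xi2
```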
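We therefore seek a solution $\mathcal J$ of Eq. (28) which decays exponentially to zero as $\xi \to \infty$ and has the property that $\hat X = \hat Y - \mathcal J$ lies on $W^c$. Consider the integral equation

$$\begin{aligned} \mathcal J(\xi) &= \int_{-\infty}^\xi \langle P(\hat F_\varepsilon(\hat Y) - \hat F_\varepsilon(\hat Y - \mathcal J) + \rho[\hat R_\varepsilon(\hat Y) - \hat R_\varepsilon(\hat Y - \mathcal J)] + \mathcal S)(\tau), s^*e^{\lambda\tau}\rangle\,d\tau\,se^{-\lambda\xi} \\ &\quad - \int_\xi^\infty \langle P(\hat F_\varepsilon(\hat Y) - \hat F_\varepsilon(\hat Y - \mathcal J) + \rho[\hat R_\varepsilon(\hat Y) - \hat R_\varepsilon(\hat Y - \mathcal J)] + \mathcal S)(\tau), u^*e^{-\lambda\tau}\rangle\,d\tau\,ue^{\lambda\xi} \\ &\quad - \int_\xi^\infty K(\xi - \tau)Q(\hat F_\varepsilon(\hat Y) - \hat F_\varepsilon(\hat Y - \mathcal J) + \rho[\hat R_\varepsilon(\hat Y) - \hat R_\varepsilon(\hat Y - \mathcal J)] + \mathcal S)(\tau)\,d\tau. \end{aligned}$$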
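We write this equation as $\mathcal J = \mathcal K(\mathcal J)$ and study it in the Banach space

$$E_r^- = \Bigl\{\mathcal J \in C(\mathbb R, \mathcal X) \Bigm| \|\mathcal J\|_r = \sup_{\xi \in \mathbb R} e^{r\xi}\|\mathcal J(\xi)\| < \infty\Bigr\}.$$

Theorem 5. Suppose that $\rho \leq \varepsilon^{n+2}$ and $\xi_2$ satisfies $e^{-r\xi_2/2} = \varepsilon^{n+1}$.
(i) Any fixed point $\mathcal J \in E_r^-$ of $\mathcal K$ is a solution of (28) on $\mathbb R$.
(ii) The function $\mathcal K$ is a contraction on

$$B_\delta^- = \Bigl\{\mathcal J \in E_r^- \Bigm| \sup_{\xi \in \mathbb R}|\mathcal J_h(\xi)| \leq \delta\Bigr\}$$

and the unique fixed point of $\mathcal K$ in $B_\delta^-$ satisfies

$$\|\mathcal J(\xi)\| \leq \frac{M}{\varepsilon^2}e^{r\xi_2/2}e^{-r\xi}, \qquad |\mathcal J_h(\xi)| \leq M\frac{\varepsilon^n}{|\log\varepsilon|}, \qquad \xi \in \mathbb R.$$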
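Proof. The first statement is proved in the same way as Lemma 8. Turning to the second statement, observe that

$$\|\hat F_\varepsilon(\hat Y) - \hat F_\varepsilon(\hat Y - \mathcal J)\| \leq M\varepsilon^2(\|\hat Y\| + \|\mathcal J\|)\|\mathcal J\|, \qquad \|\hat R_\varepsilon(\hat Y) - \hat R_\varepsilon(\hat Y - \mathcal J)\| \leq M\varepsilon\|\mathcal J\|$$

(see (25), (26)) and

$$\|\mathcal S(\xi)\| \leq \frac{M}{\xi_2}(\|\hat Z(\xi)\| + \|\hat h(\xi)\|) + M\varepsilon^2(\|\hat Z(\xi)\|^3 + \|\hat h(\xi)\|^3), \qquad \xi \in [\xi_2/2, \xi_2]$$

($\mathcal S(\xi)$ vanishes outside this interval). Because $\xi_2 = 2(n+1)|\log\varepsilon|/r$ and $\|\hat Z(\xi)\| \leq M\varepsilon^{n+1}$ for $\xi \leq \xi_1$ (see Lemma 11), we find that

$$\|\mathcal S(\xi)\| \leq M\frac{\varepsilon^{n+2}}{|\log\varepsilon|}, \qquad \xi \in [\xi_2/2, \xi_2] \qquad (29)$$

(because $\xi_1 \leq 1/\varepsilon^{n+1}$ and $\xi_2 = 2(n+1)|\log\varepsilon|/r$, we can always choose $\xi_1 > \xi_2$). Furthermore, the estimate

$$\|\hat Z(\xi)\| \leq \sup_{\xi \in [\xi_2/2, \xi_2]}\|\hat Z(\xi)\|\,e^{-r(\xi - \xi_2)} \leq M\varepsilon^{n+1}e^{-r(\xi - \xi_2)} \leq Me^{r\xi_2/2}e^{-r\xi}, \qquad \xi \in [\xi_2/2, \xi_2],$$

shows that

$$\|\mathcal S(\xi)\| \leq Me^{r\xi_2/2}e^{-r\xi}, \qquad \xi \in [\xi_2/2, \xi_2]. \qquad (30)$$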
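Using (30) to estimate $\mathcal S(\xi)$, we find that

$$\begin{aligned} \|(\mathcal K(\mathcal J))(\xi)\|e^{r\xi} &\leq \frac{M}{\lambda}\int_{-\infty}^\xi (\varepsilon^2\delta\|\mathcal J\|_r + \rho\varepsilon\|\mathcal J\|_r + e^{r\xi_2/2}\chi)e^{(\lambda - r)\tau}\,d\tau\,e^{(r - \lambda)\xi} \\ &\quad + \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta\|\mathcal J\|_r + \rho\varepsilon\|\mathcal J\|_r + e^{r\xi_2/2}\chi)e^{-(\lambda + r)\tau}\,d\tau\,e^{(r + \lambda)\xi} \\ &\quad + M\int_\xi^\infty (\varepsilon^2\delta\|\mathcal J\|_r + \rho\varepsilon\|\mathcal J\|_r + e^{r\xi_2/2}\chi)e^{-r\tau}\,d\tau\,e^{r\xi} \\ &\leq \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J\|_r + \frac{M}{\lambda(\lambda - r)}e^{r\xi_2/2} + \frac{M}{\lambda(\lambda + r)}(\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J\|_r + \frac{M}{\lambda(\lambda + r)}e^{r\xi_2/2} \\ &\quad + \frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J\|_r + \frac{M}{r}e^{r\xi_2/2}, \qquad (31) \end{aligned}$$

where $\chi$ is the characteristic function of the interval $[\xi_2/2, \xi_2]$. It follows that $\|\mathcal K(\mathcal J)\|_r < \infty$, so that $\mathcal K$ maps $E_r^-$ into itself. Replacing $\|\mathcal J\|_r$ by $\delta$ and $r$ by $0$ in the above inequality and using (29) to estimate $\mathcal S(\xi)$, we find that

$$|(\mathcal K(\mathcal J))_h(\xi)| \leq \frac{M}{\lambda^2}(\varepsilon^2\delta + \rho\varepsilon)\delta + \frac{M}{\lambda^2}\frac{\varepsilon^{n+2}}{|\log\varepsilon|},$$

so that $\mathcal K$ maps $B_\delta^-$ into itself. To show that $\mathcal K$ is a contraction on $B_\delta^-$, note that

$$\begin{aligned} (\mathcal K(\mathcal J_1))(\xi) - (\mathcal K(\mathcal J_2))(\xi) &= \int_{-\infty}^\xi \langle P(\hat F_\varepsilon(\hat Y - \mathcal J_2) - \hat F_\varepsilon(\hat Y - \mathcal J_1) + \rho[\hat R_\varepsilon(\hat Y - \mathcal J_2) - \hat R_\varepsilon(\hat Y - \mathcal J_1)])(\tau), s^*e^{\lambda\tau}\rangle\,d\tau\,se^{-\lambda\xi} \\ &\quad - \int_\xi^\infty \langle P(\hat F_\varepsilon(\hat Y - \mathcal J_2) - \hat F_\varepsilon(\hat Y - \mathcal J_1) + \rho[\hat R_\varepsilon(\hat Y - \mathcal J_2) - \hat R_\varepsilon(\hat Y - \mathcal J_1)])(\tau), u^*e^{-\lambda\tau}\rangle\,d\tau\,ue^{\lambda\xi} \\ &\quad - \int_\xi^\infty K(\xi - \tau)Q(\hat F_\varepsilon(\hat Y - \mathcal J_2) - \hat F_\varepsilon(\hat Y - \mathcal J_1) + \rho[\hat R_\varepsilon(\hat Y - \mathcal J_2) - \hat R_\varepsilon(\hat Y - \mathcal J_1)])(\tau)\,d\tau. \end{aligned}$$

Using the estimates

$$\|\hat F_\varepsilon(\hat Y - \mathcal J_2) - \hat F_\varepsilon(\hat Y - \mathcal J_1)\| \leq M\varepsilon^2(\|\hat Y\| + \|\mathcal J_1\| + \|\mathcal J_2\|)\|\mathcal J_1 - \mathcal J_2\|, \qquad \|\hat R_\varepsilon(\hat Y - \mathcal J_2) - \hat R_\varepsilon(\hat Y - \mathcal J_1)\| \leq M\varepsilon\|\mathcal J_1 - \mathcal J_2\|,$$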
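which are obtained from (25), (26), we find that

$$\begin{aligned} \|(\mathcal K(\mathcal J_1))(\xi) - (\mathcal K(\mathcal J_2))(\xi)\|e^{r\xi} &\leq \frac{M}{\lambda}\int_{-\infty}^\xi (\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J_1 - \mathcal J_2\|e^{\lambda\tau}\,d\tau\,e^{-\lambda\xi}e^{r\xi} + \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J_1 - \mathcal J_2\|e^{-\lambda\tau}\,d\tau\,e^{\lambda\xi}e^{r\xi} \\ &\quad + M\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J_1 - \mathcal J_2\|\,d\tau\,e^{r\xi} \\ &\leq \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J_1 - \mathcal J_2\|_r + \frac{M}{\lambda(\lambda + r)}(\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J_1 - \mathcal J_2\|_r + \frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon)\|\mathcal J_1 - \mathcal J_2\|_r, \end{aligned}$$

so that $\mathrm{Lip}_{B_\delta^-}\mathcal K < \frac12$.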
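The estimates $\|\mathcal K(0)\|_r \leq Me^{r\xi_2/2}/\varepsilon^2$ (see (31)) and $\mathrm{Lip}_{B_\delta^-}\mathcal K < \frac12$ show that the unique fixed point $\mathcal J$ of $\mathcal K$ in $B_\delta^-$ satisfies $\|\mathcal J\|_r \leq Me^{r\xi_2/2}/\varepsilon^2$, so that $\|\mathcal J(\xi)\| \leq Me^{r\xi_2/2}e^{-r\xi}/\varepsilon^2$ for each $\xi \in \mathbb R$.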
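The timescales in Theorems 5 and 6 can be made concrete. The sketch below assumes $\xi_1 = 1/\varepsilon^{n+1}$ (the largest value allowed by Lemma 11), $r = \varepsilon/2$ (any $r$ of order $\varepsilon$ below $\lambda \approx C_1^{1/2}\varepsilon$ behaves similarly) and $M = 1$; these are illustrative choices, not values from the paper. It computes $\xi_2 = 2(n+1)|\log\varepsilon|/r$, checks $e^{-r\xi_2/2} = \varepsilon^{n+1}$ and $\xi_1 > \xi_2$, and compares the bound $(M/\varepsilon^2)e^{r\xi_2/2}e^{-r\xi_1}$ used in the proof of Theorem 6 with $\varepsilon^{n+1}$. Logarithms are compared to avoid floating-point underflow, and the inequality only holds once $\varepsilon$ is small enough.

```python
import math


def timescales(eps, n, r_factor=0.5, M=1.0):
    """Return (xi1, xi2, log_bound, log_target) for the illustrative choices described above."""
    r = r_factor * eps
    xi2 = 2.0 * (n + 1) * abs(math.log(eps)) / r
    xi1 = eps ** (-(n + 1))
    # log of (M / eps^2) * exp(r xi2 / 2) * exp(-r xi1)
    log_bound = math.log(M) - 2.0 * math.log(eps) + 0.5 * r * xi2 - r * xi1
    log_target = (n + 1) * math.log(eps)  # log of eps^(n+1)
    return xi1, xi2, log_bound, log_target


for eps in (0.01, 0.001):
    for n in (1, 2, 3):
        xi1, xi2, lb, lt = timescales(eps, n)
        print(f"eps={eps:g} n={n}: xi2={xi2:.3e} xi1={xi1:.3e} "
              f"log bound={lb:.1f} log eps^(n+1)={lt:.1f}")
```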
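It now remains to combine the above results to prove the result announced at the start of this section and to construct a global centre-stable manifold for Eqs. (17), (18).

Theorem 6. Any solution $\hat Z_{V_s,V_c}$ of (19), (20) with $\hat Z_{V_s,V_c}(0) \in W^{cs}$ and $|V_s|, \|V_c\| \leq \varepsilon^{n+2}$ satisfies $\|\hat Z_c(\xi)\| \leq \delta$ for $\xi \in [0, \infty)$.

Proof. Define $\hat Y$ by (27) and construct $\mathcal J$ using Theorem 5. The function $\hat X : \mathbb R \to \mathcal Y$ given by

$$\hat X(\xi) = \hat Y(\xi) - \mathcal J(\xi), \qquad \xi \in \mathbb R,$$

is a solution of (23), (24), and because

$$|\hat X_h(\xi)| \leq |\hat Y_h(\xi)| + |\mathcal J_h(\xi)| \leq M\frac{\varepsilon^n}{|\log\varepsilon|} < \delta, \qquad \xi \in \mathbb R,$$

it lies on $W^c$ (see Theorem 4(iii)). Observe that

$$\hat Y(\xi) = \hat Z(\xi) + \hat h(\xi), \qquad \xi \geq \xi_2,$$

so that

$$\|\hat Z_c(\xi) - \hat X_c(\xi)\| \leq \|\mathcal J(\xi)\| \leq \frac{M}{\varepsilon^2}e^{r\xi_2/2}e^{-r\xi}, \qquad \xi \geq \xi_2.$$

Lemma 11 shows that $\|\hat Z_c(\xi)\| \leq M\varepsilon^{n+1}$, $\xi \leq \xi_1$, and clearly

$$\|\hat Z_c(\xi) - \hat X_c(\xi)\| \leq \|\mathcal J(\xi)\| \leq \frac{M}{\varepsilon^2}e^{r\xi_2/2}e^{-r\xi_1}, \qquad \xi \geq \xi_1$$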
because $\xi_1 > \xi_2$. It follows that

$$\|\hat Z_c(\xi) - \hat X_c(\xi)\| \leq \|\mathcal J(\xi)\| \leq M\varepsilon^{n+1}, \qquad \xi \geq \xi_1,$$

and in particular $\|\hat X_c(\xi_1)\| \leq M\varepsilon^{n+1}$. Lemma 10 now asserts that $\|\hat X_c(\xi)\| \leq M\varepsilon^{n+1}$, $\xi \geq \xi_1$, so that $\|\hat Z_c(\xi)\| \leq M\varepsilon^{n+1}$, $\xi \geq \xi_1$.

Definition 3. The submanifold $\tilde W^{cs}$ of $W^{cs}$ given by restricting to $|V_s|, \|V_c\| \leq \varepsilon^{n+2}$ is called the global centre-stable manifold for solutions of (17), (18) at time $\xi = 0$.

6. Construction of Symmetric Modulating Pulses

In this section we identify solutions $\hat Z_{V_s,V_c}$ of (17), (18) on $[0, \infty)$ with $\hat Z_{V_s,V_c}(0) \in \tilde W^{cs}$ which can be extended to solutions of (17), (18) on $\mathbb R$. The idea is to exploit the reversibility of Eqs. (17), (18) (see Sect. 2); in particular, solutions with the property that $\hat Z_{V_s,V_c}(0)$ lies on the symmetric section $\Sigma := \mathrm{Fix}\,S = \mathcal X \cap \{p = 0\}$ can be extended to symmetric solutions on $\mathbb R$. Because $\hat Z_c(0) = V_c$ we have that $\hat Z_c(0) \in \Sigma_c$ whenever $V_c \in \Sigma_c$, and our task is reduced to that of identifying a criterion on $V_s$ which guarantees that $\hat Z_h(0) \in \Sigma_h$. We consider a solution $\hat Z_{V_s,V_c}$ with $\hat Z_{V_s,V_c}(0) \in \tilde W^{cs}$ as a function of $V_s$ which depends upon $\rho \in \mathbb R$ and $V_c \in \Sigma_c$ as parameters (with $\rho, \|V_c\| \leq \varepsilon^{n+2}$); accordingly we use the alternative notation $\hat Z_{\rho,V_c}(V_s)$. The above comments show that $\hat Z_{\rho,V_c}(V_s)|_{\xi=0} \in \Sigma$ whenever $V_s$ is a solution of the equation

$$J_{\rho,V_c}(V_s) = 0, \qquad (32)$$
where $J_{\rho,V_c} : \bar B_{\varepsilon^{n+2}}(0) \subset \mathbb R \to \mathbb R$ is defined by $J_{\rho,V_c}(V_s) = (I - S)\hat Z_{\rho,V_c}(V_s)|_{\xi=0}$. (The right-hand side of this equation is a vector in $\mathcal X$ with only one nonzero entry, namely its $p_1$-component, and is therefore identified with a real number.) Notice that Eq. (32) has the solution $V_s = 0$ at $(\rho, V_c) = (0, 0)$, since the unique solution of (17), (18) with $(\rho, V_s, V_c) = (0, 0, 0)$ is $\hat Z = 0$. We therefore seek a solution of (17), (18) near this known solution for parameter values $(\rho, V_c)$ near $(0, 0)$. It therefore seems natural to apply the implicit-function theorem; notice, however, that we are forced to work from first principles (by applying the contraction mapping principle) since we require precise information concerning the parameter-dependence of the solutions. To carry out the above programme it is necessary to show that $J_{\rho,V_c}$ is differentiable and to obtain some estimates on its derivative. We therefore need to show that the solutions $\hat Z_{\rho,V_c}(V_s)$ described above are differentiable with respect to $V_s$ and to obtain some estimates on their derivatives. Formally differentiating the fixed-point equation for $\mathcal F$ which defines $\hat Z_{\rho,V_c}(V_s)$ and replacing $\hat Z_{\rho,V_c}(V_s)$ and $d\hat Z_{\rho,V_c}[V_s](\tilde V_s)$ with respectively $\hat Z$ and $\hat Y$, we obtain the integral equation
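$$\begin{aligned} \hat Y(\xi) &= \tilde V_s s(\xi) + K(\xi)\tilde V_c + \int_0^\xi \langle\partial_{\hat Z}(P(N_0(\hat Z) + \rho N_2(\hat Z)))\hat Y, s^*(\tau)\rangle\,d\tau\,s(\xi) \\ &\quad - \int_\xi^\infty \langle\partial_{\hat Z}(P(N_0(\hat Z) + \rho N_2(\hat Z)))\hat Y, u^*(\tau)\rangle\,d\tau\,u(\xi) + \int_0^\xi K(\xi - \tau)\partial_{\hat Z}(Q(N_1(\hat Z) + \rho N_2(\hat Z)))\hat Y\,d\tau, \end{aligned}$$

where the cut-off function $\theta$ has not been used because $\|\hat Z_c(\xi)\| < \delta$, $\xi \in [0, \infty)$. We write this equation as $\hat Y = \mathcal F'(\hat Z)(\hat Y)$ and study it in the space $E_\mu^+$, where $\mu \in [r, 2r]$ and $r$ is now chosen so that $0 < 2r < \lambda$. Using the results of the following lemma, we find from the fibre contraction mapping principle and standard uniform continuity arguments that $\hat Z_{\rho,V_c}(V_s)$ is differentiable with $\hat Y = d\hat Z_{\rho,V_c}[V_s](\tilde V_s)$ (cf. Vanderbauwhede [24, §1.3]).

Lemma 12. (i) The mapping $\mathcal F'(\hat Z)$ is a uniform contraction on $E_\mu^+$ for $\hat Z$ on $\tilde W^{cs}$.
(ii) For each fixed $\hat Y \in E_r^+$ the mapping $\mathcal F'(\cdot)(\hat Y) : E_r^+ \to E_{2r}^+$ is continuous.

Proof. Observe that

$$\|(\mathcal F'(\hat Z)(\hat Y))_c(\xi)\| \leq M\varepsilon^2\int_0^\xi \|\hat Y\|\,d\tau \leq \frac{M\varepsilon^2}{\mu}\|\hat Y\|_\mu e^{\mu\xi},$$

so that

$$\|(\mathcal F'(\hat Z)(\hat Y))_c\|_\mu \leq \frac{M\varepsilon^2}{\mu}\|\hat Y\|_\mu,$$

and

$$\begin{aligned} |(\mathcal F'(\hat Z)(\hat Y))_h(\xi)| &\leq M|\tilde V_s|e^{-\lambda\xi} + \frac{M}{\lambda}\int_0^\xi (\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|e^{\lambda\tau}\,d\tau\,e^{-\lambda\xi} + \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|e^{-\lambda\tau}\,d\tau\,e^{\lambda\xi} \\ &\leq M|\tilde V_s|e^{-\lambda\xi} + \frac{M}{\lambda}(\varepsilon^2\delta + \rho\varepsilon)\Bigl(\frac{1}{\lambda + \mu} + \frac{1}{\lambda - \mu}\Bigr)\|\hat Y\|_\mu e^{\mu\xi} \\ &\leq M|\tilde V_s|e^{-\lambda\xi} + \frac{M}{\varepsilon^2}(\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|_\mu e^{\mu\xi}, \end{aligned}$$

so that

$$|(\mathcal F'(\hat Z)(\hat Y))_h|_\mu \leq M|\tilde V_s| + \frac{M}{\varepsilon^2}(\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|_\mu.$$

The first assertion follows from the above estimates and the fact that $\mathcal F'(\hat Z)(\hat Y)$ is affine in $\hat Y$.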
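Turning to the second assertion, note that

$$\|(\mathcal F'(\hat Z_1)(\hat Y) - \mathcal F'(\hat Z_2)(\hat Y))_c(\xi)\| \leq M\varepsilon^2\int_0^\xi \|\hat Z_1 - \hat Z_2\|\,\|\hat Y\|\,d\tau \leq \frac{M\varepsilon^2}{r}\|\hat Z_1 - \hat Z_2\|_r\|\hat Y\|_r e^{2r\xi},$$

so that

$$\|(\mathcal F'(\hat Z_1)(\hat Y) - \mathcal F'(\hat Z_2)(\hat Y))_c\|_{2r} \leq \frac{M\varepsilon^2}{r}\|\hat Z_1 - \hat Z_2\|_r\|\hat Y\|_r,$$

and similarly we find that $|(\mathcal F'(\hat Z_1)(\hat Y) - \mathcal F'(\hat Z_2)(\hat Y))_h|_{2r} \leq M\|\hat Z_1 - \hat Z_2\|_r\|\hat Y\|_r$.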
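The following corollary of Lemma 12 states the requisite estimates on the derivative of $J_{\rho,V_c}$.

Corollary 1. (i) The operator $dJ_{0,0}[0] : \mathbb R \to \mathbb R$ is a bijection and

$$|dJ_{0,0}[0]^{-1}| \leq \frac{M}{\lambda}. \qquad (33)$$

(ii) The operator $dJ_{\rho,V_c}[V_s] : \mathbb R \to \mathbb R$ satisfies the estimate

$$|dJ_{\rho,V_c}[V_s] - dJ_{0,0}[0]| \leq M\Bigl(|V_s| + \|V_c\| + \frac{\rho}{\varepsilon}\Bigr). \qquad (34)$$

Proof. Clearly

$$dJ_{0,0}[0] = (I - S)\,d\hat Z_{0,0}[0]|_{\xi=0} \qquad \text{and} \qquad d\hat Z_{0,0}[0]|_{\xi=0}(\tilde V_s) = \tilde V_s s(0),$$

so that

$$dJ_{0,0}[0](\tilde V_s) = \tilde V_s(I - S)s(0).$$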
Taking the inner product of this equation with $(I - S)s^*(0)$ and using the fact that $S : \mathcal X \to \mathcal X$ is a self-adjoint involution, one finds that

$$\tilde V_s = \frac12\langle dJ_{0,0}[0](\tilde V_s), (I - S)s^*(0)\rangle,$$

from which the first assertion is a direct consequence. Define $\hat Z_1 = \hat Z_{\rho,V_c}(V_s)$, $\hat Y_1 = d\hat Z_{\rho,V_c}[V_s]$, $\hat Y_2 = d\hat Z_{0,0}[0]$ and note that $\hat Z_{0,0}$ is identically zero. By construction we have that

$$\hat Y_1 - \hat Y_2 = \mathcal F'(\hat Z_1)(\hat Y_1) - \mathcal F'(0)(\hat Y_2),$$

so that

$$(\hat Y_1 - \hat Y_2)_h(\xi) = \int_0^\xi \langle\partial_{\hat Z}(P(N_0(\hat Z_1) + \rho N_2(\hat Z_1)))\hat Y_1, s^*(\tau)\rangle\,d\tau\,s(\xi) - \int_\xi^\infty \langle\partial_{\hat Z}(P(N_0(\hat Z_1) + \rho N_2(\hat Z_1)))\hat Y_1, u^*(\tau)\rangle\,d\tau\,u(\xi).$$
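The estimates

$$\|\partial_{\hat Z}(N_0(\hat Z_1) + \rho N_2(\hat Z_1))\hat Y_1\| \leq M(\varepsilon^2\|\hat Z_1\| + \rho\varepsilon)\|\hat Y_1\|$$

and

$$\|\hat Z_1\|_r \leq M\Bigl(|V_s| + \|V_c\| + \frac{\rho}{\varepsilon^2}\Bigr)$$

show that

$$\|\partial_{\hat Z}(N_0(\hat Z_1) + \rho N_2(\hat Z_1))\hat Y_1\| \leq M(\varepsilon^2|V_s| + \varepsilon^2\|V_c\| + \rho)\|\hat Y_1\|e^{r\xi}.$$

It follows that

$$\begin{aligned} |(\hat Y_1 - \hat Y_2)_h(\xi)| &\leq \frac{M}{\lambda}(\varepsilon^2|V_s| + \varepsilon^2\|V_c\| + \rho)\Bigl(\int_0^\xi \|\hat Y_1\|e^{r\tau}e^{\lambda\tau}\,d\tau\,e^{-\lambda\xi} + \int_\xi^\infty \|\hat Y_1\|e^{r\tau}e^{-\lambda\tau}\,d\tau\,e^{\lambda\xi}\Bigr) \\ &\leq \frac{M}{\lambda}(\varepsilon^2|V_s| + \varepsilon^2\|V_c\| + \rho)\|\hat Y_1\|_r\Bigl(\int_0^\xi e^{2r\tau}\,d\tau + \int_\xi^\infty e^{(2r - \lambda)\tau}\,d\tau\,e^{\lambda\xi}\Bigr) \\ &\leq \frac{M}{\lambda}(\varepsilon^2|V_s| + \varepsilon^2\|V_c\| + \rho)\|\hat Y_1\|_r\Bigl(\frac{e^{2r\xi} - 1}{2r} + \frac{e^{2r\xi}}{\lambda - 2r}\Bigr), \end{aligned}$$

from which the stated result is an immediate consequence.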
We now study the solution set of the equation $J_{\rho,V_c}(V_s) = 0$ near the known solution $V_s = 0$ at $(\rho, V_c) = (0, 0)$ by writing it as

$$V_s = V_s - dJ_{0,0}[0]^{-1}J_{\rho,V_c}(V_s) \qquad (35)$$

and examining this fixed-point problem. According to a standard argument in nonlinear analysis the fixed-point problem (35) has a unique solution $V_s = V_s(\rho, V_c)$ in $\bar B_\eta(0) \subset \mathbb R$ whenever

$$|dJ_{0,0}[0]^{-1}J_{\rho,V_c}(0)| \leq \frac{\eta}{2}, \qquad |dJ_{0,0}[0]^{-1}|\,|dJ_{\rho,V_c}[V_s] - dJ_{0,0}[0]| \leq \frac12, \quad V_s \in \bar B_\eta(0).$$

The estimates (33), (34) and

$$|J_{\rho,V_c}(0)| \leq M|(\hat Z_{\rho,V_c}(0)|_{\xi=0})_h| \leq \frac{M}{\varepsilon^2}(\varepsilon^2\delta^2 + \rho)$$

(see formula (21)) show that for $\rho = \varepsilon^{n+5}$ we can take $\eta = \varepsilon^{n+2}$, which is compatible with the restriction $|V_s| \leq \varepsilon^{n+2}$ for $\hat Z_{\rho,V_c}$ on $\tilde W^{cs}$.
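Problem (35) is a chord (simplified Newton) iteration: the derivative is frozen at the known solution $(\rho, V_c, V_s) = (0, 0, 0)$. The sketch below applies it to a hypothetical scalar stand-in $J(V) = -2\lambda V + \beta + \gamma V^2$, since the actual $J_{\rho,V_c}$ is only known implicitly through $\hat Z_{\rho,V_c}$. The linear coefficient $-2\lambda$ mirrors $dJ_{0,0}[0]\tilde V_s = \tilde V_s(I - S)s(0)$ for the model eigenvector $s = (1, -\lambda)$; the values of $\lambda$, $\beta$, $\gamma$ and $\eta$ are illustrative. The script first checks the two sufficient conditions stated above on a grid over $\bar B_\eta(0)$ and then iterates. It illustrates the mechanism only; the estimates in the text are what guarantee the conditions for the real problem.

```python
def chord_solve(J, dJ0, v0=0.0, tol=1e-15, max_iter=100):
    """Iterate V <- V - J(V)/dJ0 (derivative frozen at the reference point) until the step < tol."""
    v = v0
    for _ in range(max_iter):
        v_new = v - J(v) / dJ0
        if abs(v_new - v) <= tol:
            return v_new
        v = v_new
    raise RuntimeError("chord iteration did not converge")


def hypotheses_hold(J, dJ, dJ0, eta, samples=1001):
    """Check |J(0)/dJ0| <= eta/2 and |dJ(V) - dJ0|/|dJ0| <= 1/2 on a grid covering [-eta, eta]."""
    if abs(J(0.0) / dJ0) > eta / 2.0:
        return False
    grid = [-eta + 2.0 * eta * k / (samples - 1) for k in range(samples)]
    return all(abs(dJ(v) - dJ0) / abs(dJ0) <= 0.5 for v in grid)


lam, beta, gamma = 0.05, 1e-5, 1.0  # hypothetical coefficients


def J(v):
    return -2.0 * lam * v + beta + gamma * v * v


def dJ(v):
    return -2.0 * lam + 2.0 * gamma * v


dJ0, eta = dJ(0.0), 4e-4
print(hypotheses_hold(J, dJ, dJ0, eta))
v_star = chord_solve(J, dJ0)
print(v_star, J(v_star))
```

Freezing the derivative costs quadratic convergence, but each step is then a contraction with constant at most $\frac12$ on $\bar B_\eta(0)$, which is exactly the structure the contraction mapping argument above exploits.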
We have therefore constructed a family of symmetric solutions $\hat Z_{V_c}$ to (17), (18) on $\mathbb R$ which are parameterised by $V_c \in \Sigma_c$ with $\|V_c\| \leq \varepsilon^{n+2}$ and satisfy $\|\hat Z(\xi)\| \leq \delta = \varepsilon^n$ for each $\xi \in \mathbb R$. The formula

$$X(\xi) = h(\xi) + \varepsilon\hat Z_{V_c}(\xi), \qquad \xi \in \mathbb R$$

defines a family of pulse-like solutions to the Hamiltonian system (11), which was obtained from the original spatial dynamics formulation of the problem by the normal-form theory in Sect. 3 (because $\rho = \varepsilon^{n+5}$ we have to take $N = n + 6$ in this normal-form theory). These solutions are parameterised by $V_c$, and although all $p_m$-components of $V_c$ vanish because $V_c \in \Sigma_c$, there still exists a continuum of solutions parameterised by the $q_m$-components of $V_c$. By construction we have that $\|X(\xi) - h(\xi)\| \leq \varepsilon^{n+1}$ for each $\xi \in \mathbb R$, and because $h(0)$ and $\hat Z_{V_c}(0)$ lie in the symmetric section $\Sigma$, all $p_m$-components of $X(0)$ vanish. Tracing the coordinate transformations back to the original variable $v(\xi, y)$, we find that $\partial_\xi v(0, y) = 0$ for these pulse-like solutions, which are therefore indeed symmetric. These remarks complete the proof of the existence result given in the Introduction (Theorem 1).

References

1. Coddington, E.A. and Levinson, N.: Theory of Ordinary Differential Equations. New York: McGraw-Hill, 1955
2. Craig, W., Sulem, C. and Sulem, P.L.: Nonlinear modulation of gravity waves: A rigorous approach. Nonlinearity 5, 497–522 (1992)
3. Craig, W. and Wayne, C.E.: Newton's method and periodic solutions of nonlinear wave equations. Commun. Pure Appl. Math. 46, 1409–1498 (1993)
4. Eckmann, J.-P. and Wayne, C.E.: Propagating fronts and the center manifold theorem. Commun. Math. Phys. 136, 285–307 (1991)
5. Elphick, C., Tirapegui, E., Brachet, M.E., Coullet, P. and Iooss, G.: A simple global characterization for normal forms of singular vector fields. Physica D 29, 95–127 (1987)
6. Fu, Y.: On the propagation of nonlinear travelling waves in an incompressible plate. Wave Motion 19, 271–292 (1994)
7. Groves, M.D.: An existence theory for three-dimensional periodic travelling gravity-capillary water waves with bounded transverse profiles. Physica D 152–153, 395–415 (2001)
8. Groves, M.D. and Mielke, A.: A spatial dynamics approach to three-dimensional gravity-capillary steady water waves. Proc. Roy. Soc. Edin. A 131, 83–136 (2001)
9. Haragus-Courcelle, M. and Schneider, G.: Bifurcating fronts for the Taylor-Couette problem in infinite cylinders. J. Appl. Math. Phys. (Z.A.M.P.) 50, 120–151 (1999)
10. Kalyakin, L.A.: Long wave asymptotics, integrable equations as asymptotic limits of nonlinear systems. Russ. Math. Surv. 44, 3–42 (1989)
11. Kirchgässner, K.: Wave solutions of reversible systems and applications. J. Diff. Eqns. 45, 113–127 (1989)
12. Kirchgässner, K.: Nonlinear resonant surface waves and homoclinic bifurcation. Adv. Appl. Math. 26, 135–181 (1988)
13. Kirrmann, P., Schneider, G. and Mielke, A.: The validity of modulation equations for extended systems with cubic nonlinearities. Proc. Roy. Soc. Edin. A 122, 85–91 (1992)
14. Kuksin, S.B.: Nearly integrable infinite-dimensional Hamiltonian systems. Berlin: Springer-Verlag, 1993
15. Meyer, K.A. and Hall, G.R.: Introduction to Hamiltonian dynamics and the N-body problem. New York: Springer-Verlag, 1992
16. Mielke, A.: A reduction principle for nonautonomous systems in infinite-dimensional spaces. J. Diff. Eqns. 65, 68–88 (1986)
17. Mielke, A.: Normal hyperbolicity of center manifolds and Saint-Venant's principle. Arch. Rational Mech. Anal. 110, 353–372 (1986)
18. Newell, A.C. and Moloney, J.V.: Nonlinear Optics. Reading, MA: Addison-Wesley, 1992
19. Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations. New York: Springer-Verlag, 1983
20. Pöschel, J.: Über invariante Tori in differenzierbaren Hamiltonschen Systemen. Bonner Mathematische Schriften 120 (1980)
M. D. Groves, G. Schneider
21. Pöschel, J.: Quasi-periodic solutions for a nonlinear wave equation. Comment. Math. Helv. 71, 269–296 (1996)
22. Sandstede, B. and Scheel, A.: Essential instability of pulses and bifurcations to modulated travelling waves. Proc. Roy. Soc. Edin. A 129, 1263–1290 (1999)
23. Schneider, G.: Justification of modulation equations for hyperbolic systems via normal forms. Nonlinear Differential Equations and Applications (NODEA) 5, 69–82 (1998)
24. Vanderbauwhede, A.: Centre manifolds, normal forms and elementary bifurcations. Dynamics Reported 2, 89–169 (1989)
25. Zakharov, V.E.: Stability of periodic waves of finite amplitude on the surface of a deep fluid. Zh. Prikl. Mekh. Tekh. Fiz. 9, 86–94 (1968) (English translation: J. Appl. Mech. Tech. Phys. 2, 190–194)
Communicated by A. Kupiainen
Commun. Math. Phys. 219, 523 – 565 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Uniqueness of the Invariant Measure for a Stochastic PDE Driven by Degenerate Noise
J.-P. Eckmann¹,², M. Hairer¹
¹ Département de Physique Théorique, Université de Genève, 1211 Genève, Switzerland
² Section de Mathématiques, Université de Genève, 1211 Genève, Switzerland
Received: 10 September 2000 / Accepted: 13 December 2000
Abstract: We consider the stochastic Ginzburg–Landau equation in a bounded domain. We assume the stochastic forcing acts only on high spatial frequencies. The low-lying frequencies are then only connected to this forcing through the non-linear (cubic) term of the Ginzburg–Landau equation. Under these assumptions, we show that the stochastic PDE has a unique invariant measure. The techniques of proof combine a controllability argument for the low-lying frequencies with an infinite dimensional version of the Malliavin calculus to show positivity and regularity of the invariant measure. This then implies the uniqueness of that measure.
Contents
1. Introduction 523
2. Some Preliminaries on the Dynamics 527
3. Controllability 527
4. Strong Feller Property and Proof of Theorem 1.1 531
5. Regularity of the Cutoff Process 533
6. Malliavin Calculus 539
7. The Partial Malliavin Matrix 544
8. Existence Theorems 554
1. Introduction
In this paper, we study a stochastic variant of the Ginzburg–Landau equation on a finite domain with periodic boundary conditions. The deterministic equation is
$$\dot u = \Delta u + u - u^3, \qquad u(0) = u^{(0)} \in H, \qquad (1.1)$$
where H is the real Hilbert space $W^1_{\mathrm{per}}([-\pi,\pi])$, i.e., the closure of the space of smooth periodic functions $u: [-\pi,\pi] \to \mathbb R$ equipped with the norm
$$\|u\|^2 = \int_{-\pi}^{\pi} \big(|u(x)|^2 + |u'(x)|^2\big)\,dx.$$
(The restriction to the interval [−π, π] is irrelevant since other lengths of intervals can be obtained by scaling space, time and amplitude u in (1.1).) While we work exclusively with the real Ginzburg–Landau equation (1.1), our methods generalize immediately to the complex Ginzburg–Landau equation
$$\dot u = (1 + ia)\Delta u + u - (1 + ib)|u|^2 u, \qquad a, b \in \mathbb R, \qquad (1.2)$$
which has richer dynamics than (1.1). However, the notation is slightly more involved because u is complex-valued, so we stick with (1.1). While much is known about existence and regularity of solutions of (1.1) or (1.2), very little is known about the attractor of such systems; in particular, nothing seems to be known about invariant measures on the attractor. On the other hand, when (1.1) is replaced by a stochastic differential equation, more can be said about the invariant measure; see [DPZ96] and references therein. Since the problem (1.1) involves only functions with periodic boundary conditions, it can be rewritten in terms of the Fourier series for u:
$$u(x,t) = \sum_{k\in\mathbb Z} e^{ikx} u_k(t), \qquad u_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-ikx} u(x)\,dx.$$
We call k the momenta and $u_k$ the modes; since u(x, t) is real, we always have $u_k(t) = \bar u_{-k}(t)$, where $\bar z$ is the complex conjugate of z. With this notation, (1.1) takes the form
$$\dot u_k = (1 - k^2)\,u_k - \sum_{k_1+k_2+k_3=k} u_{k_1} u_{k_2} u_{k_3}$$
for all $k \in \mathbb Z$, and the initial condition satisfies $\{(1 + |k|)\,u_k(0)\} \in \ell^2$. In the sequel, we will use the symbol H indifferently for the space $W^1_{\mathrm{per}}([-\pi,\pi])$ and for its counterpart in Fourier space.
In the earlier literature on uniqueness of the invariant measure for stochastic differential equations (see the recent review [MS98]), the authors are mostly interested in systems where each of the $u_k$ is forced by some external noise term. The main aim of our work is to study forcing by noise which acts only on the high-frequency part of u, namely on the $u_k$ with $|k| \ge k_*$ for some finite $k_* \in \mathbb N$. The low-frequency amplitudes $u_k$ with $|k| < k_*$ are then forced only indirectly, through the nonlinear coupling of the modes. In this respect, our approach is reminiscent of the work on thermally driven chains in [EPR99a, EPR99b, EH00], where the chains were stochastically driven only at the ends. In the context of our problem, the existence of an invariant measure is a classical result for the noise we consider [DPZ96], and the main novelty of our paper is a proof of uniqueness of that measure. To prove uniqueness, we first prove controllability of the equations, i.e., we show that the high-frequency noise together with the nonlinear coupling effectively drives the low-frequency modes. We then use Malliavin calculus in infinite dimensions to show regularity of the transition probabilities. This then implies uniqueness of the invariant measure.
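The Fourier form of (1.1) turns the cubic term into a discrete convolution over $k_1 + k_2 + k_3 = k$. The following minimal Python/NumPy sketch is our own illustration, not code from the paper. It assumes a truncation to the modes |k| ≤ K and a hypothetical trigonometric test function, and it compares the repeated convolution with Fourier coefficients of u³ obtained by quadrature.

```python
import numpy as np

def cubic_coefficients(u_hat):
    """Coefficients (u^3)_k from the triple sum over k1 + k2 + k3 = k.

    u_hat[j] holds u_k for k = j - K (modes -K..K, a truncation assumed for this
    sketch); the result holds (u^3)_k for k = -3K..3K.
    """
    u2 = np.convolve(u_hat, u_hat)   # sum over k1 + k2
    return np.convolve(u2, u_hat)    # then over (k1 + k2) + k3 = k

# Hypothetical even test function u(x) = 1 + 0.5 cos x + 0.2 cos 2x,
# whose Fourier coefficients are u_0 = 1, u_{+-1} = 0.25, u_{+-2} = 0.1.
K = 2
u_hat = np.array([0.1, 0.25, 1.0, 0.25, 0.1])   # u_{-2}, ..., u_2
cubed_hat = cubic_coefficients(u_hat)

# Reference values (1/2pi) int e^{-ikx} u(x)^3 dx by the trapezoidal rule, which is
# exact here: the integrand is a trigonometric polynomial of degree at most 12 < n.
n = 64
x = 2 * np.pi * np.arange(n) / n - np.pi
u = 1.0 + 0.5 * np.cos(x) + 0.2 * np.cos(2 * x)
ks = np.arange(-3 * K, 3 * K + 1)
direct = np.array([np.mean(np.exp(-1j * k * x) * u**3) for k in ks])

print(np.allclose(cubed_hat, direct.real))  # True
```

Convolving twice is the direct transcription of the triple sum. For long truncations, an FFT-based product would be the usual faster alternative.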
We will study the system of equations
$$du_k = -k^2 u_k\,dt + \big(u_k - (u^3)_k\big)\,dt + \frac{q_k}{\sqrt{4\pi(1+k^2)}}\,dw_k(t), \qquad (1.3)$$
with u ∈ H. The above equations hold for k ∈ Z, and it is always understood that
$$(u^3)_k = \sum_{\substack{k_1+k_2+k_3=k\\ k_1,k_2,k_3\in\mathbb Z}} u_{k_1} u_{k_2} u_{k_3}, \qquad (1.4)$$
with $u_{-k} = \bar u_k$. To avoid inessential notational problems we will work with even periodic functions, so that $u_k = u_{-k} \in \mathbb R$. We will work with the basis
$$e_k(x) = \frac{1}{\sqrt{\pi(1+k^2)}}\cos(kx). \qquad (1.5)$$
Note that this basis is orthogonal w.r.t. the scalar product in H (the $e_k$ with k ≥ 1 have unit norm, while $\|e_0\| = \sqrt 2$), but the $u_k$ are actually given by $u_k = (4\pi(1+k^2))^{-1/2}\langle u, e_k\rangle$. (We choose this normalization to make the cubic term (1.4) look simple.) The noise is supposed to act only on the high frequencies, but there it must be strong enough in the following sense. Let $a_k = k^2 + 1$. Then we require that there exist constants $c_1, c_2 > 0$ such that for $k \ge k_*$,
$$c_1 a_k^{-\alpha} \le q_k \le c_2 a_k^{-\beta}, \qquad \alpha \ge 2, \qquad \alpha - 1/8 < \beta \le \alpha. \qquad (1.6)$$
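These conditions imply
$$\sum_{k=0}^{\infty} (1 + k^{4\alpha-3/2})\, q_k^2 < \infty, \qquad \sup_{k \ge k_*} k^{-2\alpha} q_k^{-1} < \infty.$$
(The upper bound in (1.6) makes the summand of order $k^{4\alpha-3/2-4\beta}$, and the exponent is below −1 precisely because $\beta > \alpha - 1/8$. The lower bound gives $q_k^{-1} \le c_1^{-1} a_k^{\alpha}$, which yields the second estimate.)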
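The exponent count behind these two conditions can be illustrated numerically. The Python sketch below is only an illustration. It uses the extremal choices $q_k = a_k^{-\beta}$ for the upper bound and $q_k = a_k^{-\alpha}$ for the lower bound, and α = 2 is an arbitrary choice. It compares an admissible β = α with the inadmissible β = α − 3/8, for which the summand no longer decays.

```python
import numpy as np

def decay_exponent(alpha, beta):
    """Exponent p with (1 + k^(4 alpha - 3/2)) q_k^2 ~ k^p for q_k = a_k^(-beta), a_k = k^2 + 1.

    The first series converges iff p < -1, i.e. iff beta > alpha - 1/8.
    """
    return 4 * alpha - 1.5 - 4 * beta

def partial_sum(alpha, beta, n):
    """Sum over 0 <= k < n of (1 + k^(4 alpha - 3/2)) q_k^2 with the extremal q_k = a_k^(-beta)."""
    k = np.arange(n, dtype=float)
    q = (k**2 + 1) ** (-beta)
    return np.sum((1 + k ** (4 * alpha - 1.5)) * q**2)

def lower_bound_ratio(alpha, k):
    """k^(-2 alpha) q_k^(-1) for the extremal lower bound q_k = a_k^(-alpha); it tends to 1."""
    return k ** (-2 * alpha) * (k**2 + 1.0) ** alpha

alpha = 2.0
for beta in (alpha, alpha - 0.375):   # admissible beta, and beta = alpha - 3/8 (not admissible)
    sums = [partial_sum(alpha, beta, n) for n in (10**3, 10**5)]
    print(f"beta={beta}: exponent {decay_exponent(alpha, beta)}, "
          f"partial sums {sums[0]:.3f}, {sums[1]:.3f}")
print(lower_bound_ratio(alpha, np.array([10.0, 100.0, 1000.0])))
```

With β = α, the partial sums barely change between n = 10³ and n = 10⁵. With β = α − 3/8, they grow roughly linearly in n.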
We formulate the problem in a more general setting. Let F(u) be a polynomial of odd degree with negative leading coefficient. Let A be the operator of multiplication by $1 + k^2$ and let Q be the operator of multiplication by $q_k$. Then (1.3) is of the form
$$d\Phi^t = -A\Phi^t\,dt + F(\Phi^t)\,dt + Q\,dW(t), \qquad (1.7)$$
where $dW(t) = \sum_{k=0}^{\infty} e_k\,dw_k(t)$ is the cylindrical Wiener process on H, with the $w_k$ mutually independent real Brownian motions.¹ We define $\Phi^t(\xi)$ as the solution of (1.7) with initial condition $\Phi^0(\xi) = \xi$. Clearly, the conditions on Q can be formulated as
$$\|A^{\alpha-3/8} Q\|_{HS} < \infty, \qquad (1.8a)$$
$$q_k^{-1} k^{-2\alpha} \ \text{is bounded for}\ k \ge k_*, \qquad (1.8b)$$
where $\|\cdot\|_{HS}$ is the Hilbert–Schmidt norm on H. Note that for each k, (1.3) is obtained by multiplying (1.7) by $(4\pi(1+k^2))^{-1/2}\langle\cdot, e_k\rangle$.

¹ It is convenient to have, in the case of (1.3), $A = 1 - \Delta$ and $F(u) = 2u - u^3$ rather than $A = -1 - \Delta$ and $F(u) = -u^3$.
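The normalization of (1.5) and the coefficient formula $u_k = (4\pi(1+k^2))^{-1/2}\langle u, e_k\rangle$ can be checked numerically. The sketch below is an illustration under stated assumptions: it uses trapezoidal quadrature on [−π, π] and a hypothetical test function, and it evaluates the H scalar product directly from its definition. With (1.5) as written, $e_1, \dots, e_5$ come out orthonormal, while the constant $e_0$ has squared norm 2.

```python
import numpy as np

def h_inner(f, fp, g, gp, n=4096):
    """Scalar product of H: <f, g> = int_{-pi}^{pi} (f g + f' g') dx.

    Uses the trapezoidal rule on n equispaced points, which is exact for the
    trigonometric polynomials below. The derivatives fp, gp are passed explicitly.
    """
    x = 2 * np.pi * np.arange(n) / n - np.pi
    return 2 * np.pi * np.mean(f(x) * g(x) + fp(x) * gp(x))

def basis(k):
    """e_k(x) = cos(kx) / sqrt(pi (1 + k^2)) from (1.5), together with its derivative."""
    c = 1.0 / np.sqrt(np.pi * (1 + k**2))
    return (lambda x: c * np.cos(k * x)), (lambda x: -c * k * np.sin(k * x))

# Gram matrix of e_1, ..., e_5 in H.
ks = range(1, 6)
gram = np.array([[h_inner(*basis(j), *basis(k)) for k in ks] for j in ks])
print(np.allclose(gram, np.eye(5)))     # True
print(h_inner(*basis(0), *basis(0)))    # squared norm of e_0: 2 up to rounding

# u_k = (4 pi (1 + k^2))^{-1/2} <u, e_k> for u(x) = 1 + 0.5 cos x + 0.2 cos 2x,
# whose Fourier coefficients are u_0 = 1, u_1 = 0.25, u_2 = 0.1.
u = lambda x: 1 + 0.5 * np.cos(x) + 0.2 * np.cos(2 * x)
du = lambda x: -0.5 * np.sin(x) - 0.4 * np.sin(2 * x)
coeffs = [h_inner(u, du, *basis(k)) / np.sqrt(4 * np.pi * (1 + k**2)) for k in range(3)]
print(coeffs)
```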
Important Remark. The crucial aspect of our conditions is the possibility of choosing $q_k = 0$ for all $k < k_*$, i.e., the noise drives only the high frequencies. But we also allow any of the $q_k$ with $k < k_*$ to be different from 0, which corresponds to long-wavelength forcing. Furthermore, since α may be arbitrarily large, the forcing at high frequencies may have an amplitude that decays like any power of k. The point of this paper is to show that these conditions are sufficient to ensure the existence of a unique invariant measure for (1.7).

Theorem 1.1. The process (1.7) has a unique invariant Borel measure on H.

There are two main steps in the proof of Theorem 1.1. First, the nature of the nonlinearity F implies that the modes with $k \ge k_*$ couple to those with $k < k_*$ in a way that allows controllability. Intuitively, this means that any point in phase space can be reached to arbitrary precision in any given time by a suitable choice of the high-frequency controls. Second, by verifying a Hörmander-like condition, we show that a version of the Malliavin calculus can be implemented in our infinite-dimensional context. This is the hard part of our study, and its main result is a proof that the strong Feller property holds. This means that for any bounded measurable function $\varphi \in B_b(H)$, the function
$$P^t\varphi(\xi) \equiv \mathbf E\big(\varphi \circ \Phi^t(\xi)\big) \qquad (1.9)$$
is continuous.² We show this by proving that a cutoff version of (1.7), which modifies the dynamics at large amplitudes by means of a parameter ϱ, makes $P^t_\varrho\varphi$ a differentiable map. The interest in such highly degenerate stochastic PDEs is related to questions in hydrodynamics, where one asks how "energy" is transferred from high- to low-frequency modes, and vice versa, when only some of the modes are driven. This could shed some light on the entropy-enstrophy problem in the (driven) Navier–Stokes equation. To end this introduction, we compare the results of our paper to current work of others.
These groups consider the 2-D Navier–Stokes equation without deterministic external forces, also in bounded domains. In these equations, any initial condition eventually converges to zero as long as there is no stochastic forcing. First, there is earlier work by Flandoli and Maslowski [FM95] dealing with noise whose amplitude is bounded below by $|k|^{-c}$. In the work of Bricmont, Kupiainen and Lefevere [BKL00a, BKL00b], the stochastic forcing acts on modes with low k, and they obtain uniqueness of the invariant measure and analyticity with probability 1. Furthermore, they obtain exponential convergence to the stationary measure. In the work of Kuksin and Shirikyan [KS00], the bounded noise is quite general, acts on low-lying Fourier modes, and acts at definite times with "noise-less" intervals in between. Again, the invariant measure is unique; it is supported by $C^\infty$ functions, is mixing, and has a Gibbs property. In [EMS00], a result similar to [BKL00b] is shown. The main difference between those results and the present paper is that we control a situation which is already unstable at the deterministic level. In this sense, our setting comes closer to a description of a deterministically turbulent fluid (e.g., one driven by an external force). On the other hand, we need to force all high spatial frequencies. Perhaps this restriction could be removed by combining our approach with ideas from the papers above.

² Throughout the paper, E denotes expectation and P denotes probability for the random variables.
2. Some Preliminaries on the Dynamics
Here, we summarize some facts about deterministic and stochastic GL equations from the literature which we need to get started. We will consider the dynamics on the following space:
Definition 2.1. We define H as the subspace of even functions in $W^1_{\mathrm{per}}([-\pi,\pi])$. The norm on H will be denoted by $\|\cdot\|$, and the scalar product by $\langle\cdot,\cdot\rangle$.
We consider first the deterministic equation
$$\dot u = \Delta u + u - u^3, \qquad u(0) = u^{(0)} \in H. \qquad (2.1)$$
Due to its dissipative character, the solutions are, for positive times, analytic in a strip around the real axis. More precisely, denote by $\|\cdot\|_{A_\eta}$ the norm
$$\|f\|_{A_\eta} = \sup_{|\mathrm{Im}\,z| \le \eta} |f(z)|,$$
and by $A_\eta$ the corresponding Banach space of analytic functions. Then the following result holds.
Lemma 2.2. For every initial value $u^{(0)} \in H$, there exist a time T and a constant C such that for $0 < t \le T$, the solution $u(t, u^{(0)})$ of (2.1) belongs to $A_{\sqrt t}$ and satisfies
$$\|u(t, u^{(0)})\|_{A_{\sqrt t}} \le C.$$
Proof. The statement is proven in [Col94] for the case of the infinite line. Since the periodic functions form an invariant subspace under the evolution, the result applies to our case.
We next collect some useful results for the stochastic equation (1.7):
Proposition 2.3. For every t > 0 and every p ≥ 1 the solution of (1.7) with initial condition $\Phi^0(\xi) = \xi \in H$ exists in H up to time t. It defines by (1.9) a Markovian transition semigroup on H. One has the bound
$$\mathbf E \sup_{s\in[0,t]} \|\Phi^s(\xi)\|^p \le C_{t,p}\,(1 + \|\xi\|)^p.$$
Furthermore, the process (1.7) has an invariant measure. These results are well known, and in Sect. 8.6 we sketch where to find them in the literature.
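To illustrate the forcing structure considered here, the following is a minimal simulation sketch (Python/NumPy) of a Galerkin truncation of (1.3) in which only the modes k ≥ k_* are forced. All parameters are hypothetical, including the truncation K, the amplitudes $q_k = a_k^{-\alpha}$ and the time step. The linearly implicit Euler–Maruyama step is our choice for handling the stiff term $-k^2 u_k$ and is not a scheme taken from the paper. Starting from u = 0, the unforced low modes become non-zero only through the cubic coupling.

```python
import numpy as np

def simulate_gl(K=16, k_star=4, alpha=2.0, noise_scale=1.0, T=1.0, dt=1e-3, seed=0):
    """Galerkin truncation (|k| <= K) of (1.3) for even u, started from u = 0.

    Hypothetical choices for this sketch: q_k = noise_scale * a_k^(-alpha) for k >= k_star
    and q_k = 0 below k_star (only high modes are forced), and a linearly implicit
    Euler-Maruyama step that treats -k^2 u_k implicitly. Returns u_0, ..., u_K at time T.
    """
    rng = np.random.default_rng(seed)
    k = np.arange(K + 1)                                   # u_{-k} = u_k for even u
    q = np.where(k >= k_star, noise_scale * (k**2 + 1.0) ** (-alpha), 0.0)
    sigma = q / np.sqrt(4 * np.pi * (1 + k**2))            # noise coefficient in (1.3)
    u = np.zeros(K + 1)
    for _ in range(int(round(T / dt))):
        full = np.concatenate([u[:0:-1], u])               # u_{-K}, ..., u_K
        cube = np.convolve(np.convolve(full, full), full)  # (u^3)_k for k = -3K..3K, as in (1.4)
        cube_k = cube[3 * K: 4 * K + 1]                    # keep k = 0..K (Galerkin projection)
        dw = rng.normal(0.0, np.sqrt(dt), K + 1)           # one Brownian increment per k >= 0
        u = (u + dt * (u - cube_k) + sigma * dw) / (1 + dt * k**2)
    return u

u_T = simulate_gl()
print(u_T[:4])   # low modes: non-zero although q_k = 0 there (excited via the cubic term)
```

With `noise_scale=0.0`, the solution stays identically zero, which isolates the noise as the only source of excitation in this sketch.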
3. Controllability
In this section we show the "approximate controllability" of (1.3). The control problem under consideration is
$$\dot u = \Delta u + u - u^3 + Qf(t), \qquad u(0) = u^{(i)} \in H, \qquad (3.1)$$
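where f is the control. Using Fourier series and the hypotheses on Q, we see that by choosing $f_k \equiv 0$ for $|k| < k_*$, (3.1) can be brought to the form
$$\dot u_k = \begin{cases} -k^2 u_k + u_k - \displaystyle\sum_{\ell+m+n=k} u_\ell u_m u_n + \frac{q_k}{\sqrt{4\pi(1+k^2)}}\, f_k(t), & |k| \ge k_*, \\[2ex] -k^2 u_k + u_k - \displaystyle\sum_{\ell+m+n=k} u_\ell u_m u_n, & |k| < k_*, \end{cases} \qquad (3.2)$$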
with $\{u_k\} \in H$ and $t \mapsto \{f_k(t)\} \in L^\infty([0,\tau], H)$. We will refer in the sequel to {u_k}_{|k| […] 0 the following is true: For every $u^{(i)}, u^{(f)} \in H$ and every ε > 0, there exists a control $f \in L^\infty([0,\tau], H)$ such that the solution u(t) of (3.1) with $u(0) = u^{(i)}$ satisfies $\|u(\tau) - u^{(f)}\| \le \varepsilon$.
Proof. The construction of the control proceeds in four phases, of which the third is the actual control of the low-frequency part by the high-frequency controls. In the construction, we will encounter a time $\tau_*(R, \varepsilon')$ which depends on the norm R of $u^{(f)}$ and on a precision ε′. Given this function, we split the given time τ as $\tau = \sum_{i=1}^{4} \tau_i$, with $\tau_4 \le \tau_*(\|u^{(f)}\|, \varepsilon/2)$ and all $\tau_i > 0$. We will use the cumulated times $t_j = \sum_{i=1}^{j} \tau_i$.
Step 1. In this step we choose f ≡ 0, and we define $u^{(1)} = u(t_1)$, where $t \mapsto u(t)$ is the solution of (3.1) with initial condition $u(0) = u^{(i)}$. Since there is no control, we really have (2.1) and hence, by Lemma 2.2, $u^{(1)} \in A_\eta$ for some η > 0.
Step 2. We will construct a smooth control $f: [t_1, t_2] \to H$ such that $u^{(2)} = u(t_2)$ satisfies $\Pi_H u^{(2)} = 0$. In other words, in this step we drive the high-frequency part to 0. To construct f, we choose a $C^\infty$ function $\varphi: [t_1, t_2] \to \mathbb R$ interpolating between 1 and 0, with vanishing derivatives at the ends. Define $u_H(t) = \varphi(t)\,\Pi_H u^{(1)}$ for $t \in [t_1, t_2]$. This will be the evolution of the high-frequency part. We next define the low-frequency part $u_L = u_L(t)$ as the solution of the ordinary differential equation
$$\dot u_L = \Delta u_L + u_L - \Pi_L (u_L + u_H)^3, \qquad u_L(t_1) = \Pi_L u^{(1)}.$$
We then set $u(t) = u_L(t) \oplus u_H(t)$ and substitute into (3.1), which we need to solve for the control $Qf(t)$ for $t \in [t_1, t_2]$. Since $u_L(t) \oplus u_H(t)$ as constructed above is in $A_\eta$, since $Qf = \dot u - \Delta u - u + u^3$, and since Δ maps $A_\eta$ to $A_{\eta/2}$, we conclude that $Qf \in A_{\eta/2}$. By construction, the components $q_k$ of Q decay polynomially with k and do not vanish for $k \ge k_*$, while the Fourier coefficients of a function in $A_{\eta/2}$ decay exponentially. Therefore, $Q^{-1}$ is a bounded operator from $A_{\eta/2} \cap H_H$ to $H_H$, and we can solve for f in this step.
Step 3. As mentioned before, this step really exploits the coupling between high and low frequencies. Here, we start from $u^{(2)}$ at time $t_2$ and we want to reach $\Pi_L u^{(f)}$ at time $t_3$. In fact, we will instead reach a point $u^{(3)}$ with $\|\Pi_L u^{(3)} - \Pi_L u^{(f)}\| < \varepsilon/2$.
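The paper does not specify the interpolating function ϕ of Step 2. One common construction, assumed here, builds it from the $C^\infty$ function $\psi(s) = e^{-1/s}$ for s > 0 and ψ(s) = 0 for s ≤ 0. The Python sketch below also shows the resulting high-frequency path $u_H(t) = \varphi(t)\,\Pi_H u^{(1)}$ for a hypothetical vector of high modes.

```python
import numpy as np

def bump_tail(s):
    """psi(s) = exp(-1/s) for s > 0 and 0 otherwise: C^infinity, all derivatives vanish at 0."""
    s = np.asarray(s, dtype=float)
    safe = np.where(s > 0, s, 1.0)            # avoid dividing by zero in the masked branch
    return np.where(s > 0, np.exp(-1.0 / safe), 0.0)

def smooth_step(t, t1, t2):
    """phi(t) on [t1, t2]: equals 1 at t1 and 0 at t2, all derivatives vanishing at both ends."""
    s = (np.asarray(t, dtype=float) - t1) / (t2 - t1)
    a, b = bump_tail(1.0 - s), bump_tail(s)
    return a / (a + b)                        # a + b > 0 for every s in [0, 1]

# High-frequency part u_H(t) = phi(t) Pi_H u^(1) for a hypothetical vector of high modes.
high_modes = np.array([0.3, -0.1, 0.05])
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, smooth_step(t, 0.0, 1.0) * high_modes)
```

Because ϕ is monotone and flat at both ends, $\dot u_H = \varphi'(t)\,\Pi_H u^{(1)}$ stays in the same analytic class as $\Pi_H u^{(1)}$. This is the property the control computation in Step 2 relies on.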
The idea is to choose, for every low frequency $|k| < k_*$, a set of three³ high frequencies that will be used to control $u_k$. To simplify matters we will assume (without loss of generality) that $k_* > 2$:
Definition 3.2. We define for every k with $0 \le k < k_*$ the set $I_k$ by
$$I_k = \{10^{k_*+k} + k,\; 2\cdot 10^{k_*+k},\; 3\cdot 10^{k_*+k}\}.$$
We also define $I_L^0 = \{k : 0 \le k < k_*\}$ and $I = I_L^0 \cup \bigcup_{0 \le k < k_*} I_k$.
[…] 0) since it has only a finite number of non-vanishing modes. By construction we also have $\|\Pi_L u^{(3)} - \Pi_L u^{(f)}\| \le \varepsilon/2$. We only need to adapt the high-frequency part without moving the low-frequency part too much. Since $A_\eta$ is dense in H, there is a $u^{(4)} \in A_\eta$ with $\|u^{(4)} - u^{(f)}\| \le \varepsilon/4$. By the reasoning of Step 2, for every τ > 0 there is a control for which $\Pi_H u(t_3 + \tau) = \Pi_H u^{(4)}$ when starting from $u(t_3) = u^{(3)}$. Given ε, there is a $\tau_*$ such that if $\tau < \tau_*$ then
$$\|\Pi_L u(t_3 + \tau) - \Pi_L u(t_3)\| < \varepsilon/4.$$
This $\tau_*$ depends only on $u^{(f)}$ and ε, as can be seen from the following argument. Since $\Pi_H u^{(3)} = 0$, we can choose the controls in such a way that $\|\Pi_H u(t_3 + t)\|$ is an increasing function of t and is therefore bounded by $\|\Pi_H u^{(f)}\|$. The equation for the low-frequency part is then a finite-dimensional ODE in which all high-frequency contributions can be bounded in terms of $R = \|u^{(f)}\|$. Combining the estimates, we see that
$$\begin{aligned} \|u(t_4) - u^{(f)}\| &= \big\|\Pi_L(u(t_4) - u^{(f)}) + \Pi_H(u(t_4) - u^{(f)})\big\| \\ &\le \|\Pi_L(u(t_4) - u(t_3))\| + \|\Pi_L(u(t_3) - u^{(f)})\| + \|\Pi_H(u^{(4)} - u^{(f)})\| \le \varepsilon. \end{aligned}$$
The proof of Theorem 3.1 is complete.
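The index sets of Definition 3.2 can be generated and checked mechanically. The Python sketch below uses k_* = 4, an arbitrary choice satisfying k_* > 2, and verifies two elementary properties. First, for each k the triple $(10^{k_*+k}+k,\ 2\cdot10^{k_*+k},\ -3\cdot10^{k_*+k})$ sums to k, so for even u it contributes to the cubic term at mode k. Second, the sets $I_k$ are pairwise disjoint and consist of high frequencies. The sketch does not reproduce the full combinatorial argument of Step 3.

```python
def index_sets(k_star):
    """I_k from Definition 3.2 (real Ginzburg-Landau case), for 0 <= k < k_star."""
    return {k: (10 ** (k_star + k) + k, 2 * 10 ** (k_star + k), 3 * 10 ** (k_star + k))
            for k in range(k_star)}

sets = index_sets(4)   # k_star = 4 is an arbitrary choice with k_star > 2
for k, (a, b, c) in sets.items():
    # For even u, u_{-c} = u_c, so u_a u_b u_{-c} enters the cubic term (1.4) at mode a + b - c.
    print(k, (a, b, c), a + b - c)

flat = [n for triple in sets.values() for n in triple]
print(len(flat) == len(set(flat)), min(flat) >= 4)   # True True: disjoint, all high
```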
3.1. The combinatorics for the complex Ginzburg–Landau equation. We sketch here those aspects of the combinatorics which change for the complex Ginzburg–Landau equation. In this case, the real and imaginary parts of $u_n$ and of $u_{-n}$ are all independent. Thus, we would need a noise which acts independently on each of the real and imaginary components of $u_n$ and of $u_{-n}$, i.e., on four components per n > 0 and two for n = 0. A possible definition of $I_k$ for $|k| < k_*$ is
$$I_k = \begin{cases} \{10^{k_*+2k} + k,\; 2\cdot 10^{k_*+2k},\; -3\cdot 10^{k_*+2k}\} & \text{for } k \ge 0, \\ \{10^{k_*+2|k|+1} - |k|,\; 2\cdot 10^{k_*+2|k|+1},\; -3\cdot 10^{k_*+2|k|+1}\} & \text{for } k < 0. \end{cases}$$
We also define $I_L^0 = \{k : |k| < k_*\}$ and $I = I_L^0 \cup \bigcup_{|k| < k_*} I_k$.
[…] 2, if z < 1.
Similarly, we define
$$Q_\varrho(x) = Q + \chi(\|x\|/\varrho)\,\Pi_{k_*}, \qquad (4.1)$$
where $\Pi_{k_*}$ is the projection onto the frequencies below $k_*$.
Remark 4.2. These cutoffs have the following effect as a function of $\|x\|$:
– When $\|x\| \le \varrho$, then $Q_\varrho(x) = Q$ and $F_\varrho(x) = F(x)$.
– When $\varrho < \|x\| \le 2\varrho$, then $Q_\varrho(x)$ depends on x and $F_\varrho(x) = F(x)$.
– When $2\varrho < \|x\| \le 6\varrho$, then all Fourier components of $Q_\varrho(x)$, including the ones below $k_*$, are non-zero, and $F_\varrho(x)$ equals $F(x)$ times a factor ≤ 1.
– When $6\varrho < \|x\|$, then all Fourier components of $Q_\varrho(x)$, including the ones below $k_*$, are non-zero, and $F_\varrho(x) = 0$.
At high amplitudes, the nonlinearity is truncated to 0. Thus, the Hörmander condition cannot be satisfied there unless the diffusion process is non-degenerate. We achieve this non-degeneracy by extending the stochastic forcing to all degrees of freedom when $\|x\|$ is large. Instead of (1.7) we then consider the modified problem
$$d\Phi^t_\varrho = -A\Phi^t_\varrho\,dt + F_\varrho \circ \Phi^t_\varrho\,dt + Q_\varrho \circ \Phi^t_\varrho\,dW(t), \qquad (4.2)$$
with $\Phi^0_\varrho(\xi) = \xi \in H$. Note that the cutoffs are chosen in such a way that the dynamics of $\Phi^t_\varrho(\xi)$ coincides with that of $\Phi^t(\xi)$ as long as $\|\Phi^t(\xi)\| < \varrho$. We will show that the solution of (4.2) defines a Markov semigroup $P^t_\varrho\varphi(\xi) = \mathbf E\big(\varphi \circ \Phi^t_\varrho(\xi)\big)$ with the following smoothing property:
Theorem 4.3. There exist exponents µ, ν > 0, and for all ϱ > 0 there is a constant $C_\varrho$, such that for every $\varphi \in B_b(H)$, every t > 0 and every ξ ∈ H, the function $P^t_\varrho\varphi$ is differentiable and its derivative satisfies
$$\|DP^t_\varrho\varphi(\xi)\| \le C_\varrho\,(1 + t^{-\mu})(1 + \|\xi\|^\nu)\,\|\varphi\|_{L^\infty}. \qquad (4.3)$$
Using this theorem, the proof of Theorem 4.1 follows from a limiting argument.
Proof of Theorem 4.1. Choose x ∈ H, t > 0, and ε > 0. We denote by B the ball of radius 2‖x‖ centered around the origin in H. Using Proposition 2.3 we can find a sufficiently large constant ϱ = ϱ(x, t, ε) such that for every y ∈ B the inequality
$$\mathbf P\Big(\sup_{s\in[0,t]} \|\Phi^s(y)\| > \varrho\Big) \le \frac{\varepsilon}{8}$$
holds. Choose $\varphi \in B_b(H)$ with $\|\varphi\|_{L^\infty} \le 1$. By the triangle inequality,
$$|P^t\varphi(x) - P^t\varphi(y)| \le |P^t\varphi(x) - P^t_\varrho\varphi(x)| + |P^t_\varrho\varphi(x) - P^t_\varrho\varphi(y)| + |P^t_\varrho\varphi(y) - P^t\varphi(y)|.$$
Since the dynamics of the cutoff equation and the dynamics of the original equation coincide on the ball of radius ϱ, we can write, for every z ∈ B,
$$|P^t\varphi(z) - P^t_\varrho\varphi(z)| = \big|\mathbf E\big(\varphi\circ\Phi^t(z) - \varphi\circ\Phi^t_\varrho(z)\big)\big| \le 2\|\varphi\|_{L^\infty}\,\mathbf P\Big(\sup_{s\in[0,t]}\|\Phi^s(z)\| > \varrho\Big) \le \frac{\varepsilon}{4}.$$
This implies that
$$|P^t\varphi(x) - P^t\varphi(y)| \le \frac{\varepsilon}{2} + |P^t_\varrho\varphi(x) - P^t_\varrho\varphi(y)|.$$
By Theorem 4.3, if y is sufficiently close to x, then $|P^t_\varrho\varphi(x) - P^t_\varrho\varphi(y)| \le \varepsilon/2$. Since ε is arbitrary, we conclude that $P^t\varphi$ is continuous when $\|\varphi\|_{L^\infty} \le 1$. The generalization to any value of $\|\varphi\|_{L^\infty}$ follows by linearity in ϕ. The proof of Theorem 4.1 is complete.
5. Regularity of the Cutoff Process
In this section, we start the proof of Theorem 4.3. If the cutoff problem were finite dimensional, a result like Theorem 4.3 could be derived easily using, e.g., the works of Hörmander [Hör67, Hör85], Malliavin [Mal78], Stroock [Str86], or Norris [Nor86]. In the present infinite-dimensional context we need to modify the corresponding techniques, but we retain the general idea of Norris. The main idea is to treat the (infinitely many) high-frequency modes by a method which extends [DPZ96, Cer99], while the low-frequency part is handled by a variant of the Malliavin calculus adapted from [Nor86]. It is at the juncture of these two techniques that we need a cutoff in the nonlinearity.
5.1. Splitting and interpolation spaces. Throughout the remainder of this paper, we again denote by $H_L$ and $H_H$ the spaces corresponding to the low- (resp. high-) frequency parts. We slightly change the meaning of "low-frequency" by including in the low-frequency part the noise-driven frequencies that belong to the set I of Definition 3.2. More precisely, the low-frequency part is now $\{k : |k| \le L - 1\}$, where $L = \max\{k : k \in I\} + 1$. Note that L is finite. Since $A = 1 - \Delta$ is diagonal with respect to this splitting, we can define its low- (resp. high-) frequency parts $A_L$ and $A_H$ as operators on $H_L$ and $H_H$. From now on, L will always denote the dimension of $H_L$, which will therefore be identified with $\mathbb R^L$.⁴ We also allow ourselves to switch freely between equivalent norms on $\mathbb R^L$ when deriving the various bounds. In the sequel, we will always use the notations $D_L$ and $D_H$ to denote the derivatives with respect to $H_L$ (resp. $H_H$) of a differentiable function defined on H. The words "derivative" and "differentiable" will always be understood in the strong sense, i.e., if $f: B_1 \to B_2$ with $B_1$ and $B_2$ some Banach spaces, then $Df: B_1 \to \mathcal L(B_1, B_2)$, i.e., $Df(x)$ is a bounded linear operator from $B_1$ to $B_2$. We introduce the interpolation spaces $H_\gamma$ (for every γ ≥ 0), defined as the domain of $A^\gamma$ equipped with the graph norm
$$\|x\|_\gamma^2 = \|A^\gamma x\|^2 = \|(1 - \Delta)^\gamma x\|^2.$$
Clearly, the $H_\gamma$ are Hilbert spaces, and we have the inclusions $H_\gamma \subset H_\delta$ if $\gamma \ge \delta$.
Note that in usual conventions, Hγ would be the Sobolev space of index 2γ + 1. Our motivation for using non-standard notation comes from the fact that our basic space is that with one derivative, which we call H, and that γ measures additional smoothness in terms of powers of the generator of the linear part.
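In Fourier variables, A = 1 − Δ multiplies $u_k$ by $1 + k^2$, and the H-norm is $\|u\|^2 = 2\pi\sum_k (1+k^2)|u_k|^2$. Hence $\|u\|_\gamma^2 = 2\pi\sum_k (1+k^2)^{2\gamma+1}|u_k|^2$, which is why $H_\gamma$ corresponds to Sobolev index 2γ + 1. The following short Python sketch computes this norm; the coefficient layout is our assumption. It also shows that the norms increase with γ, which underlies the inclusions $H_\gamma \subset H_\delta$.

```python
import numpy as np

def h_gamma_norm(u_hat, gamma):
    """||u||_gamma = ||A^gamma u|| with A = 1 - Laplacian, for u = sum_k u_k e^{ikx}.

    u_hat[j] holds u_k for k = j - K (modes -K..K). Since ||u||^2 = 2 pi sum (1 + k^2)|u_k|^2
    and A multiplies u_k by 1 + k^2, ||u||_gamma^2 = 2 pi sum (1 + k^2)^(2 gamma + 1) |u_k|^2.
    """
    K = (len(u_hat) - 1) // 2
    k = np.arange(-K, K + 1)
    return np.sqrt(2 * np.pi * np.sum((1.0 + k**2) ** (2 * gamma + 1) * np.abs(u_hat) ** 2))

cos_x = np.array([0.5, 0.0, 0.5])               # u(x) = cos x
print(h_gamma_norm(cos_x, 0) ** 2 / np.pi)      # approximately 2.0: ||cos||^2 = 2 pi
print(h_gamma_norm(cos_x, 1) ** 2 / np.pi)      # approximately 8.0: A cos x = 2 cos x

# Norms increase with gamma, consistent with H_gamma subset H_delta for gamma >= delta.
u_hat = np.array([0.1, 0.25, 1.0, 0.25, 0.1])
norms = [h_gamma_norm(u_hat, g) for g in (0.0, 0.5, 1.0, 2.0)]
print(np.all(np.diff(norms) >= 0))
```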
5.2. Proof of Theorem 4.3. The proof of Theorem 4.3 is based on Proposition 5.1 and Proposition 5.2, which we now state.
Proposition 5.1. Assume that the noise satisfies condition (1.6). Then (4.2) defines a stochastic flow $\Phi^t_\varrho$ on H with the following properties, which hold for any p ≥ 1:
(A) If $\xi \in H_\gamma$ with some γ satisfying $0 \le \gamma \le \alpha$, the solution of (4.2) stays in $H_\gamma$, with a bound
$$\mathbf E \sup_{0<t<T} \|\Phi^t_\varrho(\xi)\|_\gamma^p \le C_{T,p,\varrho}\,(1 + \|\xi\|_\gamma)^p, \qquad (5.1a)$$
for every T > 0 and every ξ ∈ H. Furthermore, for every T > 0 there is a constant $C_{T,p,\varrho}$ for which
$$\mathbf E \sup_{0<t<T} t^{\alpha p}\,\|\Phi^t_\varrho(\xi)\|_\alpha^p \le C_{T,p,\varrho}\,(1 + \|\xi\|)^p. \qquad (5.1b)$$
[…] 1,
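$$DP^\tau_\varrho\varphi(\xi) = D\big(P^1_\varrho(P^{\tau-1}_\varrho\varphi)\big)(\xi).$$
Therefore, if we can show (4.3) for t ≤ 1, then we find for any τ > 1:
$$\|DP^\tau_\varrho\varphi(\xi)\| \le 2C_\varrho\,(1 + \|\xi\|^\nu)\,\|P^{\tau-1}_\varrho\varphi\|_{L^\infty} \le 2C_\varrho\,(1 + \|\xi\|^\nu)\,\|\varphi\|_{L^\infty}$$
(the factor 2 is $1 + t^{-\mu}$ at t = 1, and the second inequality holds because the Markov semigroup $P^{\tau-1}_\varrho$ does not increase the sup norm). In view of the above, it suffices to show Theorem 4.3 for t ∈ (0, 1]. We first prove the bound for the case $\varphi \in C_b^2(H)$. Let h ∈ H. Using Definition (1.9) of $P^t_\varrho\varphi$ and the Markov property of the flow, we write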
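$$\begin{aligned} |DP^{2t}_\varrho\varphi(\xi)h| &= \big|D\,\mathbf E\big(P^t_\varrho\varphi \circ \Phi^t_\varrho(\xi)\big)h\big| = \big|\mathbf E\big((DP^t_\varrho\varphi)(\Phi^t_\varrho(\xi))\,D\Phi^t_\varrho(\xi)h\big)\big| \\ &\le \sqrt{\mathbf E\,\|(DP^t_\varrho\varphi)(\Phi^t_\varrho(\xi))\|^2}\;\sqrt{\mathbf E\,\|D\Phi^t_\varrho(\xi)h\|^2}. \end{aligned}$$
Bounding the first square root by Proposition 5.2 and then applying Proposition 5.1 (B–C) (with T = 1), we get the bound
$$\begin{aligned} |DP^{2t}_\varrho\varphi(\xi)h| &\le C_\varrho\,\|\varphi\|_{L^\infty}(1 + t^{-\mu_*})\sqrt{\mathbf E\big(1 + \|\Phi^t_\varrho(\xi)\|_\alpha^{\nu_*}\big)^2}\;\sqrt{\mathbf E\,\|D\Phi^t_\varrho(\xi)h\|^2} \\ &\le C_\varrho\,\|\varphi\|_{L^\infty}(1 + t^{-\mu_*})\,t^{-\alpha\nu_*}(1 + \|\xi\|^{\nu_*})\,\|h\|. \end{aligned}$$
Choosing $\mu = \mu_* + \alpha\nu_*$ and $\nu = \nu_*$, we find (4.3) in the case $\varphi \in C_b^2(H)$. The method of extension to arbitrary $\varphi \in B_b(H)$ can be found in [DPZ96, Lemma 7.1.5]. The proof of Theorem 4.3 is complete.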
5.3. Smoothing properties of the transition semigroup. In this subsection we prove the smoothing bound of Proposition 5.2. Thus, we are no longer interested in smoothing in position space, as shown in Proposition 5.1, but in smoothing properties of the transition semigroup associated with (4.2).
Important remark. In this section and up to Sect. 8.6 we always tacitly assume that we are considering the cutoff equation (4.2), and we omit the index ϱ. Thus, we write Eq. (4.2) as
$$d\Phi^t = -A\Phi^t\,dt + F \circ \Phi^t\,dt + Q \circ \Phi^t\,dW(t). \qquad (5.3)$$
The solution of (5.3) generates a semigroup on the space $B_b(H)$ of bounded Borel functions over $H = H_L \oplus H_H$ by $P^t\varphi = \mathbf E(\varphi \circ \Phi^t)$, $\varphi \in B_b(H)$. Our goal is to show that the mixing properties of the nonlinearity are strong enough to make $P^t\varphi$ differentiable, even if ϕ is only measurable. We need a separate treatment of the high and low frequencies, and so we reformulate (5.3) as
$$d\Phi^t_L = -A_L\Phi^t_L\,dt + F_L \circ \Phi^t\,dt + Q_L \circ \Phi^t\,dW_L(t), \qquad \Phi^t_L \in H_L, \qquad (5.4a)$$
$$d\Phi^t_H = -A_H\Phi^t_H\,dt + F_H \circ \Phi^t\,dt + Q_H\,dW_H(t), \qquad \Phi^t_H \in H_H, \qquad (5.4b)$$
where $H_L$ and $H_H$ are defined in Sect. 5.1 and the cutoff version of Q was defined in (4.1). Note that $Q_H(\Phi^t(\xi))$ is independent of ξ and t by construction, which is why we can use $Q_H$ in (5.4b). The proof of Proposition 5.2 is based on the following two results, dealing with the low-frequency part and with the cross-terms between low and high frequencies, respectively.
Proposition 5.3. There exist exponents µ, ν > 0 such that for every $\varphi \in C_b^2(H)$, every $\xi \in H_\alpha$ and every T > 0, one has
$$\big\|\mathbf E\big((D_L\varphi)(\Phi^t(\xi))\,(D_L\Phi^t_L)(\xi)\big)\big\| \le C_T\,t^{-\mu}\,(1 + \|\xi\|_\alpha^\nu)\,\|\varphi\|_{L^\infty}$$
for all t ∈ (0, T].⁵
Lemma 5.4. For every T > 0 and every p ≥ 1, there is a constant $C_{T,p} > 0$ such that for every t ≤ T, one has the estimates (valid for $h_L \in H_L$ and $h_H \in H_H$):
$$\mathbf E \sup_{0<s<t} \|D_L\Phi^s_H(\xi)h_L\|^p \le C_{T,p}\,t^p\,\|h_L\|^p, \qquad (5.5a)$$
[…] 0 and denote by $S^L$ the unit sphere in $\mathbb R^L$. Our bound is
Theorem 7.1. There are constants µ, ν ≥ 0 such that for every T > 0 and every p ≥ 1 there is a $C_{T,p,\varrho}$ such that for all initial conditions $\xi \in H_\alpha$ for the flow $\Phi^t$ and all t < T, one has
$$\mathbf E\,|\det C^t_L|^{-p} \le C_{T,p,\varrho}\,t^{-\mu p}\,(1 + \|\xi\|_\alpha)^{\nu p}.$$
Corollary 7.2. There are constants µ, ν ≥ 0 such that for every T > 0 and every p ≥ 1 there is a $C_{T,p,\varrho}$ such that for all initial conditions $\xi \in H_\alpha$ for the flow $\Phi^t$ and all t < T, one has, with v given by Definition 6.4,
$$\mathbf E\,\|(D_v\Phi^t_L)^{-1}\|^p \le C_{T,p,\varrho}\,t^{-\mu p}\,(1 + \|\xi\|_\alpha)^{\nu p}.$$
This corollary follows from $(D_v\Phi^t_L)^{-1} = (C^t_L)^{-1}V^t_L$ and Eq. (6.6a). As a first step, we formulate a bound from which Theorem 7.1 follows easily.
Theorem 7.3. There are a µ > 0 and a ν > 0 such that for every p ≥ 1, every t < T and every $\xi \in H_2$, one has
$$\mathbf P\Big(\inf_{h\in S^L} \int_0^t \|Q^s_L (V^s_L)^* h\|^2\,ds < \varepsilon\Big) \le C_{T,p,\varrho}\,\varepsilon^p\,t^{-\mu p}\,(1 + \|\xi\|_2)^{\nu p},$$
with $C_{T,p,\varrho}$ independent of ξ.
Proof of Theorem 7.1. Note that $\int_0^t \|Q^s_L (V^s_L)^* h\|^2\,ds$ is, by Eq. (6.13), nothing but the quantity $\langle h, C^t_L h\rangle$. Theorem 7.1 then follows at once from Theorem 7.3. The proof of Theorem 7.3 is largely inspired by [Nor86, Sect. 4], but we need some new features to deal with the infinite-dimensional high-frequency part. This will take up the next three subsections. Our proof needs a modification of the Lie brackets considered when we study the Hörmander condition. We first explain these identities in a finite-dimensional setting.
7.1. Finite dimensional case. Throughout this subsection we assume that both $H_L$ and $H_H$ are finite dimensional, and we denote by N the dimension of H. The function Q maps H to $\mathcal L(H, H)$, and we denote by $Q_i: H \to H$ its i-th column (i = 0, …, N − 1).⁶ Finally, $\hat F$ is the drift (in this section, we absorb the linear part of the SDE into $\hat F = -A + F$ to simplify the expressions). The equation for $\Phi^t$ is
$$\Phi^t(\xi) = \xi + \int_0^t \hat F \circ \Phi^s(\xi)\,ds + \int_0^t \sum_{i=0}^{N-1} Q_i \circ \Phi^s(\xi)\,dw_i(s).$$
Let $K: H \to H_L$ be a smooth function whose derivatives are all bounded, and define $K^t = K \circ \Phi^t$, $\hat F^t = \hat F \circ \Phi^t$, and $Q^t_i = Q_i \circ \Phi^t$. We then have by Itô's formula
$$dK^t = (DK)^t \hat F^t\,dt + \sum_{i=0}^{N-1} (DK)^t Q^t_i\,dw_i(t) + \frac12 \sum_{i=0}^{N-1} (D^2K)^t(Q^t_i; Q^t_i)\,dt. \qquad (7.1)$$
We next rewrite Eq. (6.5) for $V^t_L$ as
$$dV^t_L = -V^t_L (D_L\hat F_L)^t\,dt - \sum_{i=0}^{L-1} V^t_L (D_L Q_i)^t\,dw_i(t) + \sum_{i=0}^{L-1} V^t_L \big((D_L Q_i)^t\big)^2\,dt.$$
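By Itô's formula, we therefore have the following equation for the product $V^t_L K^t$:
$$\begin{aligned} d\big(V^t_L K^t\big) ={}& -V^t_L (D_L\hat F_L)^t K^t\,dt - \sum_{i=0}^{L-1} V^t_L (D_L Q_i)^t K^t\,dw_i(t) \\ &+ V^t_L \sum_{i=0}^{L-1} \big((D_L Q_i)^t\big)^2 K^t\,dt + V^t_L (DK)^t \hat F^t\,dt \\ &+ V^t_L \sum_{i=0}^{N-1} (DK)^t Q^t_i\,dw_i(t) + \frac12 V^t_L \sum_{i=0}^{N-1} (D^2K)^t(Q^t_i; Q^t_i)\,dt \\ &- V^t_L \sum_{i=0}^{L-1} (D_L Q_i)^t (DK)^t Q^t_i\,dt. \end{aligned} \qquad (7.2)$$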
⁶ There is a slight ambiguity of notation here, since $Q_i$ really means $Q_{\varrho,i}$, which is not the same as $Q_\varrho$.
By construction, $D_L Q_i = 0$ for i ≥ L, and therefore we can extend all the sums above to N − 1. The following definition is useful to simplify (7.2). Let $A: H \to H$ and $B: H \to H_L$ be two functions with continuous bounded derivatives. We define the projected Lie bracket $[A, B]_L: H \to H_L$ by
$$[A, B]_L(x) = \Pi_L[A, B](x) = DB(x)\,A(x) - D_L A_L(x)\,B(x).$$
A straightforward calculation then leads to
$$\begin{aligned} d\big(V^t_L K^t\big) ={}& V^t_L [\hat F, K]^t_L\,dt + \frac12 V^t_L \sum_{i=0}^{N-1} \big[Q_i, [Q_i, K]_L\big]^t_L\,dt + V^t_L \sum_{i=0}^{N-1} [Q_i, K]^t_L\,dw_i(t) \\ &+ \frac12 V^t_L \sum_{i=0}^{N-1} \Big(\big((D_L Q_i)^t\big)^2 K^t - (DK)^t (DQ_i)^t Q^t_i + (DD_L Q_i)^t(Q^t_i; K^t)\Big)\,dt. \end{aligned} \qquad (7.3)$$
Note next that for i < L, both K and $Q_i$ map to $H_L$, and therefore $DD_L Q_i(Q_i; K) = D_L^2 Q_i(Q_i; K)$ when i < L, while it is 0 otherwise. Similarly, $(DK)(DQ_i)Q_i$ equals $(DK)(D_L Q_i)Q_i$ when i < L and vanishes otherwise. Thus, the last sum in (7.3) only extends to L − 1. In order to simplify (7.3) further, we define the vector field $\tilde F: H \to H$ by
$$\tilde F = \hat F - \frac12 \sum_{i=0}^{L-1} (D_L Q_i)\,Q_i.$$
Then we get
$$d\big(V^t_L K^t\big) = V^t_L [\tilde F, K]^t_L\,dt + \frac12 V^t_L \sum_{i=0}^{N-1} \big[Q_i, [Q_i, K]_L\big]^t_L\,dt + V^t_L \sum_{i=0}^{N-1} [Q_i, K]^t_L\,dw_i(t).$$
This is very similar to the formula in [Nor86, p. 128], where conventional Lie brackets are used instead of $[\cdot,\cdot]_L$.
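The "straightforward calculation" leading from (7.2) to the formula with $\tilde F$ can be sanity-checked in the scalar case $N = L = 1$, where $H = H_L = \mathbb R$, the projection is the identity, and the bracket reduces to $[a,b] = b'a - a'b$. The sketch below takes the drift, the single noise column $Q = Q_0$, and $K$ to be polynomials with arbitrarily chosen (hypothetical) coefficients and compares the two expressions using exact polynomial arithmetic from NumPy.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def bracket(a, b):
    """Scalar Lie bracket [a, b] = b' a - a' b (N = L = 1, so Pi_L is the identity)."""
    return b.deriv() * a - a.deriv() * b

# Hypothetical polynomial data: drift F_hat, noise column Q (= Q_0), test function K
F_hat = P([1, -2, 0, -1])   # 1 - 2x - x^3
Q = P([0.5, 1, 2])          # 0.5 + x + 2x^2
K = P([0, 3, 0, 1])         # 3x + x^3

# Drift of d(V K) read off term by term from (7.2) with N = L = 1
drift_72 = (-F_hat.deriv() * K + Q.deriv() ** 2 * K + K.deriv() * F_hat
            + 0.5 * K.deriv(2) * Q ** 2 - Q.deriv() * K.deriv() * Q)

# Same drift via F_tilde = F_hat - (1/2)(DQ)Q and the (projected) brackets
F_tilde = F_hat - 0.5 * Q.deriv() * Q
drift_brackets = bracket(F_tilde, K) + 0.5 * bracket(Q, bracket(Q, K))

# Martingale part of (7.2): -(DQ)K + (DK)Q, which should equal [Q, K]
noise_72 = -Q.deriv() * K + K.deriv() * Q
```

All coefficients are half-integers, so the comparison is exact up to floating-point representation of these values.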
7.2. Infinite dimensional case. In this case, some additional care is needed when we transcribe (7.1). The problem is that the stochastic flow $\Phi^t$ solves (5.4) in the mild sense but not in the strong sense. Nevertheless, this technical difficulty can be circumvented by choosing the initial condition in $H_\alpha$. Indeed, by Proposition 5.1 (A), if the initial condition is in $H_\gamma$ with $\gamma\in[1,\alpha]$, then the solution of (5.4) stays in the same space. Thus, Proposition 5.1 allows us to use Itô's formula also in the infinite-dimensional case. For any two Banach (or Hilbert) spaces $B_1$, $B_2$, we denote by $\mathcal P(B_1,B_2)$ the set of all $C^\infty$ functions $B_1\to B_2$ which are polynomially bounded together with all their
derivatives. Let $K\in\mathcal P(H,H_L)$ and $X\in\mathcal P(H,H)$. We define as above $[X,K]_L\in\mathcal P(H,H_L)$ by $[X,K]_L(x) = DK(x)X(x) - D_LX_L(x)K(x)$. Furthermore, we define $[A,K]_L\in\mathcal P(D(A),H_L)$ by the corresponding formula, i.e., $[A,K]_L(x) = DK(x)Ax - A_LK(x)$, where $A = 1-\Delta$. Notice that if $K$ is a constant vector field, i.e., $DK = 0$, then $[A,K]_L$ extends uniquely to an element of $\mathcal P(H,H_L)$. We choose again the basis $\{e_i\}_{i=0}^\infty$ of Fourier modes in $H$ (see Eq. (1.5)) and define $dw_i(t) = \langle e_i,dW(t)\rangle$. We also define the stochastic process $K^t(\xi) = (K\circ\Phi^t)(\xi)$ and
$$\tilde F = F - \frac12\sum_{i=0}^{L-1}(D_LQ_i)Q_i,$$
where $Q_i(x) = Q(x)e_i$. Then one has:

Proposition 7.4. Let $\xi\in H_1$ and $K\in\mathcal P(H,H_L)$. Then the equality
$$\begin{aligned}V_L^t(\xi)K^t(\xi) = K(\xi) &+ \int_0^tV_L^s(\xi)\Big(-[A,K]_L^s(\xi) + [\tilde F,K]_L^s(\xi)\Big)\,ds + \int_0^tV_L^s(\xi)\sum_{i=0}^\infty[Q_i,K]_L^s(\xi)\,dw_i(s)\\ &+ \frac12\int_0^tV_L^s(\xi)\sum_{i=0}^\infty\big[Q_i,[Q_i,K]_L\big]_L^s(\xi)\,ds\end{aligned}$$
holds almost surely. The same equality also holds if $\xi\in H_2$ and $K\in\mathcal P(H_1,H_L)$. Note that by $[A,K]_L^s(\xi)$ we mean $(DK)^s(\xi)\,A\Phi^s(\xi) - A_LK^s(\xi)$.

Proof. This follows as in the finite-dimensional case by Itô's formula.
7.3. The restricted Hörmander condition. The condition for having appropriate mixing properties is the following Hörmander-like condition.

Definition 7.5. Let $\mathcal K = \{K^{(i)}\}_{i=1}^M$ be a collection of functions in $\mathcal P(H,H_L)$. We say that $\mathcal K$ satisfies the restricted Hörmander condition if there exist constants $\delta,R>0$ such that for every $h\in H_L$ and every $y\in H$ one has
$$\sup_{K\in\mathcal K}\ \inf_{\|x-y\|\le R}\ \langle h,K(x)\rangle^2\ge\delta\|h\|^2. \tag{7.4}$$
We now construct the set $\mathcal K$ for our problem. We define the operator $[X_0,\cdot\,]_L : \mathcal P(H_\gamma,H_L)\to\mathcal P(H_{\gamma+1},H_L)$ by
$$[X_0,K]_L = -[A,K]_L + [F,K]_L + \frac12\sum_{i=0}^\infty\big[Q_i,[Q_i,K]_L\big]_L - \frac12\Big[\sum_{i=0}^{L-1}(D_LQ_i)Q_i,\,K\Big]_L.$$
This is a well-defined operation, since $Q$ is Hilbert–Schmidt and $DQ$ is of finite rank, and we can write
$$\sum_{i=0}^\infty\big[Q_i,[Q_i,K]_L\big]_L = \sum_{i=0}^\infty D^2K(Q_i;Q_i) + r,$$
with $r$ a finite sum of bounded terms.

Definition 7.6. We define
– $\mathcal K_0 = \{Q_i$, with $i = 0,\dots,L-1\}$,
– $\mathcal K_1 = \{[X_0,Q_i]_L$, with $i = k_*,\dots,L-1\}$,
– $\mathcal K_\ell = \{[Q_i,K]_L$, with $K\in\mathcal K_{\ell-1}$ and $i = k_*,\dots,L-1\}$, when $\ell>1$.
Finally, $\mathcal K = \mathcal K_0\cup\cdots\cup\mathcal K_3$.$^7$
Remark 7.7. Since the $Q_i$ are constant vector fields for $i\ge k_*$, the quantity $[X_0,K]_L$ is in $\mathcal P(H,H_L)$ and not only in $\mathcal P(H_1,H_L)$. Furthermore, if $K\in\mathcal K$, then $D^jK$ is bounded for all $j\ge0$.

We have:

Theorem 7.8. The set $\mathcal K$ constructed above satisfies the restricted Hörmander condition for the cutoff GL equation if $\varrho$ is chosen sufficiently large. Furthermore, the inequality (7.4) holds for $R = \varrho/2$. Finally, $\delta>\delta_0>0$ for all sufficiently large $\varrho$.

Proof. The basic idea of the proof is as follows. The leading term of $F$ is the cubic term $u^m$ with $m = 3$. Clearly, if $i_1,i_2,i_3$ are any 3 modes, we find
$$\big[e_{i_1},[e_{i_2},[u\mapsto u^3,e_{i_3}]_L]_L\big]_L = \sum_{k=\pm i_1\pm i_2\pm i_3}C_k\,\Pi_Le_k, \tag{7.5}$$
where the $e_\ell$ are the basis vectors of $H$ defined in (1.5), and the $C_k$ are nonzero combinatorial constants. By Lemma 3.3 the following is true: for every fixed $k$, the three numbers $i_1$, $i_2$, and $i_3$ of $I_k$ satisfy
– For $j = 1,2,3$ one has $i_j\in\{k_*,\dots,L-1\}$.
– If $|k|<k_*$, exactly one of the six sums $\pm i_1\pm i_2\pm i_3$ lies in the set $\{0,\dots,k_*-1\}$ and exactly one lies in $\{-(k_*-1),\dots,0\}$.
In particular, the expression (7.5) does not depend on $u$. If instead of $u^3$ we take a lower power, the triple commutator vanishes. The basic idea has to be slightly modified because of the cutoff $\varrho$. First of all, the constant $R$ in the definition of the Hörmander condition is set to $R = \varrho/2$. Consider first the case where $\|x\|\ge5\varrho/2$. In that case we see from (4.1) that the $Q_{\varrho,i}$, viewed as vector fields, are of the form
$$Q_{\varrho,i}(x) = \begin{cases}(q_i+1)e_i, & \text{if } i<k_*,\\ q_ie_i, & \text{if } i\ge k_*.\end{cases}$$
Since these vectors form a basis of $H_L$, the inequality (7.4) follows in this case (already by choosing only $K\in\mathcal K_0$). Consider next the more delicate case when $\|x\|\le5\varrho/2$.

$^7$ The number 3 is the power 3 in $u^3$.
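In the regime $\|x\|\ge5\varrho/2$ only constant fields enter, and (7.4) becomes a statement about finitely many vectors. The sketch below assumes these vectors are the rows of a matrix $V$ spanning $H_L = \mathbb R^L$; it estimates $\min_{\|h\|=1}\max_K\langle h,K\rangle^2$ by sampling the unit sphere and compares it with the elementary bound $\sigma_{\min}(V)^2/M$, which follows from $\max_K\langle h,K\rangle^2\ge\frac1M\sum_K\langle h,K\rangle^2 = \frac1M\|Vh\|^2$. The coefficients `c` are hypothetical stand-ins for $q_i+1$ or $q_i$.

```python
import numpy as np

def delta_estimate(V, n_samples=20000, seed=0):
    """Sampled min over unit h of max_K <h, K>^2; V holds one constant field K per row."""
    V = np.asarray(V, dtype=float)
    h = np.random.default_rng(seed).normal(size=(n_samples, V.shape[1]))
    h /= np.linalg.norm(h, axis=1, keepdims=True)   # points on the unit sphere S_L
    return float(np.min(np.max((h @ V.T) ** 2, axis=1)))

def delta_lower_bound(V):
    """sigma_min(V)^2 / M, valid whenever the M rows of V span R^L."""
    V = np.asarray(V, dtype=float)
    return float(np.linalg.svd(V, compute_uv=False).min() ** 2 / V.shape[0])

c = np.array([1.5, 2.0, 0.7])        # hypothetical nonzero coefficients
V = np.diag(c)                       # constant fields c_i e_i, i = 0, 1, 2
exact = 1.0 / np.sum(1.0 / c ** 2)   # exact minimum for diagonal fields
est, bound = delta_estimate(V), delta_lower_bound(V)
```

Sampling can only overestimate the true minimum, so `est` is an upper estimate of the best constant, while `bound` is a guaranteed lower one.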
Lemma 7.9. For all $\|x\|\le3\varrho$ one has, for $\{i_1,i_2,i_3\} = I_k$, the identity
$$\big[e_{i_1},[e_{i_2},[X_0,e_{i_3}]_L]_L\big]_L(x) = \sum_{k=\pm i_1\pm i_2\pm i_3}C_k\,\Pi_Le_k + r_\varrho(x), \tag{7.6}$$
where $r_\varrho$ satisfies a bound
$\|r_\varrho(x)\|\le C\varrho^{-1}$, with the constant $C$ independent of $x$ and of $k<k_*$.

Proof. In $[X_0,\cdot\,]_L$ there are 4 terms. The first, $A$, leads successively to $[A,e_{i_3}]_L = (1+i_3^2)e_{i_3}$, which is constant, and hence the Lie bracket with $e_{i_2}$ vanishes. The second term contains the nonlinear interaction $F_\varrho$. Since $\|x\|\le3\varrho$, one has $F_\varrho(x) = F(x)$. Thus, (7.5) yields the leading term of (7.6). The two remaining terms contribute to $r_\varrho(x)$. We just discuss the first one. We have, using (4.1),
$$[Q_{\varrho,i},e_{i_3}]_L(x) = -DQ_{\varrho,i}(x)e_{i_3} = -\frac1\varrho\,\chi'\big(\|x\|/\varrho\big)\,\frac{\langle x,e_{i_3}\rangle}{\|x\|}\,e_i.$$
This clearly gives a bound of order $\varrho^{-1}$ for this Lie bracket, and the further ones are handled in the same way.

We continue the proof of Theorem 7.8. When $k<k_*$, we consider the elements of $\mathcal K_3$. They are of the form
$$\big[Q_{\varrho,i_1},[Q_{\varrho,i_2},[X_0,Q_{\varrho,i_3}]_L]_L\big]_L(x) = q_{i_1}q_{i_2}q_{i_3}\Big(\sum_{k=\pm i_1\pm i_2\pm i_3}C_k\,\Pi_Le_k + r_\varrho(x)\Big).$$
Thus, for $\varrho = \infty$ these vectors, together with the $Q_i$ with $i\in\{k_*,\dots,L-1\}$, span $H_L$ (independently of $y$ with $\|y\|\le3\varrho$), and therefore (7.4) holds in this case if $\|x\|\le5\varrho/2$ and $R = \varrho/2$. The assertion for finite but sufficiently large $\varrho$ follows immediately by a perturbation argument. This completes the case $\|x\|\le5\varrho/2$ and hence the proof of Theorem 7.8.

Proof of Theorem 7.3. The proof is very similar to the one in [Nor86], but we have to keep track of the $x$- and $t$-dependence of the estimates. First of all, choose $x\in H_2$ and $t\in(0,t_0]$. From now on, we use the notation $O(\nu)$ as a shortcut for $C(1+\|x\|_2^\nu)$, where the constant $C$ may depend on $t_0$ and $p$, but is independent of $x$ and $t$. Denote by $R$ the constant found in Theorem 7.8 and define the subset $B_x$ of $H_2$ by
$$B_x = \big\{y\in H_2 : \|y-x\|\le R\ \text{and}\ \|y\|_\gamma\le\|x\|_\gamma+1\ \text{for}\ \gamma = 1,2\big\}.$$
We also denote by $B(I)$ a ball of (small) radius $O(1/L)$ centered at the identity in the space of all $L\times L$ matrices. (Recall that $L$ is the dimension of $H_L$, and that every $K\in\mathcal K$ maps to $H_L$.) We then have a bound of the type
$$\sup_{y\in B_x}\sup_{K\in\mathcal K}\sum_{i=0}^\infty\big\|[Q_i,K]_L(y)\big\|^2\le O(0). \tag{7.7}$$
This is a consequence of the fact that $QQ^*$ is trace class, so that the sum converges, and its principal term is equal to
$$\operatorname{Tr}\Big(Q^*(y)\big(DK(y)\big)^*DK(y)Q(y)\Big) = \operatorname{Tr}\Big(DK(y)Q(y)Q^*(y)\big(DK(y)\big)^*\Big) = \sum_{i=0}^{L-1}\big\|Q^*(y)\big(DK(y)\big)^*e_i\big\|^2\le C_\varrho.$$
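The trace identity above is cyclicity of the trace combined with the Hilbert–Schmidt norm. A quick finite-dimensional check, with random matrices standing in for $Q(y)$ and $DK(y)$ (an assumption made only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 6, 3
Q = rng.normal(size=(N, N))    # stands in for Q(y): H -> H
DK = rng.normal(size=(L, N))   # stands in for DK(y): H -> H_L

t1 = np.trace(Q.T @ DK.T @ DK @ Q)            # Tr Q^* (DK)^* DK Q
t2 = np.trace(DK @ Q @ Q.T @ DK.T)            # Tr DK Q Q^* (DK)^*
t3 = sum(np.linalg.norm(Q.T @ DK.T @ e) ** 2  # sum_{i<L} ||Q^* (DK)^* e_i||^2
         for e in np.eye(L))
# The same number as a sum over noise directions: sum_i ||DK Q e_i||^2 = ||DK Q||_HS^2
t4 = sum(np.linalg.norm(DK @ Q[:, i]) ** 2 for i in range(N))
```

The last form, a sum over the columns $Q_i$, is the one that matches $\sum_i\|DK\,Q_i\|^2$, the principal part of the sum in (7.7).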
The bound by $C_\varrho$ follows from Remark 7.7. The other terms form a finite sum containing derivatives of the $Q_i$ and are bounded in a similar way. We have furthermore bounds of the type
$$\sup_{y\in B_x}\sup_{K\in\mathcal K}\big\|[X_0,K]_L(y)\big\|^2\le O(\nu),\qquad\sup_{y\in B_x}\sup_{K\in\mathcal K}\big\|\big[X_0,[X_0,K]_L\big]_L(y)\big\|^2\le O(\nu),$$
$$\sup_{y\in B_x}\sup_{K\in\mathcal K}\sum_{i=0}^\infty\big\|\big[Q_i,[X_0,K]_L\big]_L(y)\big\|^2\le O(\nu), \tag{7.8}$$
where $\nu = 1$. Let $S_L$ be the unit sphere in $H_L$. By the assumptions on $\mathcal K$ and the choice of $B(I)$ we see that:

(A) For every $h_0\in S_L$, there exist a $K\in\mathcal K$ and a neighborhood $N$ of $h_0$ in $S_L$ such that
$$\inf_{y\in B_x}\inf_{V\in B(I)}\inf_{h\in N}\langle VK(y),h\rangle^2\ge\frac\delta2,$$
with $\delta$ the constant appearing in (7.4).

Next, we define a stopping time $\tau$ by $\tau = \min\{t,\tau_1,\tau_2\}$ with
$$\tau_1 = \inf\{s\ge0 : \Phi^s(x)\notin B_x\},\qquad\tau_2 = \inf\{s\ge0 : V_L^s(x)\notin B(I)\},$$
and $t<T$ as chosen in the statement of Theorem 7.3. It follows easily from Proposition 5.1 (E) that the probability of $\tau_1$ being small can be bounded by
$$P(\tau_1<\varepsilon)\le C_p(1+\|x\|_2)^{16p}\varepsilon^p,$$
with $C_p$ independent of $x$ (here and in the sequel we always assume $\varepsilon\le1$). Similarly, using Lemma 6.3, we see that $P(\tau_2<\varepsilon)\le C_p\varepsilon^p$. Observing that $P(t<\varepsilon)<t^{-p}\varepsilon^p$ and combining this with the two estimates, we get for every $p\ge1$:
$$P(\tau<\varepsilon)\le O(16p)\,t^{-p}\varepsilon^p.$$
From this and (A) we deduce:

(B) For every $h_0\in S_L$ there exist a $K\in\mathcal K$ and a neighborhood $N$ of $h_0$ in $S_L$ such that, for $\varepsilon<1$,
$$\sup_{h\in N}P\Big(\int_0^\tau\big\langle V_L^s(x)K^s(x),h\big\rangle^2\,ds\le\varepsilon\Big)\le P(\tau<2\varepsilon/\delta)\le O(16p)\,t^{-p}\varepsilon^p,$$
with $\delta$ the constant appearing in (7.4).

Following [Nor86], we will show below that (B) implies:

(C) For every $h_0\in S_L$ there exist an $i\in\{k_*,\dots,L-1\}$, a neighborhood $N$ of $h_0$ in $S_L$, and constants $\nu,\mu>0$ such that for $\varepsilon<1$ and $p>1$ one has
$$\sup_{h\in N}P\Big(\int_0^\tau\big\langle V_L^s(x)Q_i^s(x),h\big\rangle^2\,ds\le\varepsilon\Big)\le O(\nu p)\,t^{-\mu p}\varepsilon^p.$$
Remark 7.10. Note that for small $\|x\|$, $Q_i(x) = Q_{i,\varrho}(x)$ may be 0 when $i<k_*$, but the point is that we can then find another $i$ for which the inequality holds.

By a straightforward argument, given in detail in [Nor86, p. 127], one concludes that (C) implies Theorem 7.3. It thus only remains to show that (B) implies (C). We follow Norris closely and choose a $K\in\mathcal K$ such that (B) holds. If $K$ happens to be in $\mathcal K_0$, then it is equal to some $Q_i$, and thus we already have (C). Otherwise, assume $K\in\mathcal K_j$ with $j\ge1$. Then we use a martingale inequality.

Lemma 7.11. Let $H$ be a separable Hilbert space and $W(t)$ be the cylindrical Wiener process on $H$. Let $\beta^t$ be a real-valued predictable process and $\gamma^t$, $\zeta^t$ be predictable $H$-valued processes. Define
$$a^t = a^0 + \int_0^t\beta^s\,ds + \int_0^t\langle\gamma^s,dW(s)\rangle,\qquad b^t = b^0 + \int_0^ta^s\,ds + \int_0^t\langle\zeta^s,dW(s)\rangle.$$
Suppose $\tau\le t_0$ is a bounded stopping time such that, for some constant $C_0<\infty$,
$$\sup_{0<s<\tau}\big\{|\beta^s|,\ |a^s|,\ \|\zeta^s\|,\ \|\gamma^s\|\big\}\le C_0.$$
Then, for every $p>1$, there exists a constant $C_{p,t_0}$ such that
$$P\Big(\int_0^\tau(b^s)^2\,ds<\varepsilon^{20}\ \text{and}\ \int_0^\tau\big(|a^s|^2+\|\zeta^s\|^2\big)\,ds\ge\varepsilon\Big)\le C_{p,t_0}(1+C_0^6)^p\varepsilon^p$$
for every $\varepsilon\le1$.

Proof. The proof is given in [Nor86], but without the explicit dependence on $C_0$. Following his proof carefully, we get an estimate of the type
$$P\Big(\int_0^\tau(b^s)^2\,ds<\varepsilon^{10}\ \text{and}\ \int_0^\tau\big(|a^s|^2+\|\zeta^s\|^2\big)\,ds\ge(1+C_0^3)\varepsilon\Big)\le C_1(1+C_0^{12})^p\varepsilon^p.$$
Replacing $\varepsilon$ by $\varepsilon^2$ and assuming $\varepsilon<1/(1+C_0^3)$, we recover our statement. The statement is trivial for $\varepsilon>1/(1+C_0^3)$, since any probability is bounded by 1.
We apply this inequality as follows. Define, for $K_0\in\mathcal K$,
$$a^t(x) = \big\langle V_L^t[X_0,K_0]_L^t(x),h\big\rangle,\qquad b^t(x) = \big\langle V_L^tK_0^t(x),h\big\rangle,\qquad\beta^t(x) = \big\langle V_L^t\big[X_0,[X_0,K_0]_L\big]_L^t(x),h\big\rangle,$$
$$(\gamma^t)_i(x) = \big\langle V_L^t\big[Q_i,[X_0,K_0]_L\big]_L^t(x),h\big\rangle,\qquad(\zeta^t)_i(x) = \big\langle V_L^t[Q_i,K_0]_L^t(x),h\big\rangle.$$
In this expression, $\zeta^t(x)\in H$, $(\zeta^t)_i(x) = \langle\zeta^t(x),e_i\rangle$, and similarly for $\gamma^t$. It is clear by Proposition 7.4, Eq. (7.7), and Eq. (7.8) that the assumptions of Lemma 7.11 are satisfied with $C_0 = O(\nu)$ for some $\nu>0$. We continue the proof that (B) implies (C) in the case when $K\in\mathcal K_j$, with $j = 1$. Then, by the construction of $\mathcal K_j$ with $j\ge1$, there is a $K_0\in\mathcal K_{j-1}$ such that either $K = [Q_i,K_0]_L$ for some $i\in\{k_*,\dots,L-1\}$, or $K = [X_0,K_0]_L$. In fact, for $j = 1$ only the second case occurs and $K_0 = Q_i$ for some $i$, but we are already preparing an inductive step. Applying Lemma 7.11, we have for every $\varepsilon\le1$:
$$P\Big(\int_0^\tau\big\langle V_L^sK_0^s(x),h\big\rangle^2\,ds$$
$: Z(Y)\otimes Z(Y)\longrightarrow\mathbb C$, and hence a canonical isomorphism $Z(Y)^*\equiv Z(Y)$. A4 is the corresponding expected behaviour of $Z_X$. These axioms are idealized, and in practice some modifications are needed. This is illustrated in the FQFT we consider in Sect. 3.

1.2. Heuristic path integral formulae. The above framework aims to algebraicize the relation between the Feynman path integral formulation of QFT and its Hilbert space formulation. The following heuristic interpretation is useful to bear in mind. If $X$ has connected boundary, the vector $Z_X$ represents the partition function, which is given by a formal path integral
$$Z_X : \mathcal E(Y)\longrightarrow\mathbb C,\qquad Z_X(f) = \int_{\mathcal E_f(X)}e^{-S(\psi)}\,\mathcal D\psi, \tag{1.4}$$
where $\mathcal D\psi$ is a formal measure. Here $S : \mathcal E_f(X)\to\mathbb C$ is an action functional on a space of fields on $X$, which for definiteness we shall take to be the space of $C^0$ functions on $X$ with boundary value $f\in\mathcal E(Y)$. The vector space $Z(Y)$ is a space of functions on $\mathcal E(Y)$ and forms the Hilbert space of the theory, and $Z_X$ is the vacuum state. For a cobordism $X\in\mathcal C_d$ with $\partial X = Y_0\sqcup Y_1$ one has $f = (f_0,f_1)$, and then
$$Z_X(f_0,f_1) = \int_{\mathcal E_{(f_0,f_1)}(X)}e^{-S(\psi)}\,\mathcal D\psi$$
is the kernel of the linear operator $Z_X\in\mathrm{Hom}(Z(Y_0),Z(Y_1))$ defined by
$$Z_X(\xi_0)(f_1) = \int_{\mathcal E(Y_0)}Z_X(f_0,f_1)\,\xi_0(f_0)\,\mathcal Df_0.$$
If $Y_0 = Y_1 = Y$, we hence obtain the bilinear form corresponding to (1.1):
$$(\xi_0,\xi_1) = \int_{\mathcal E(Y)}\xi_1(f)\,Z_X(\xi_0)(f)\,\mathcal Df.$$
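In the case of a closed manifold $M = X_0\cup_YX_1$ we can express the space of $C^0$ functions on $M$ as a fibre product $\mathcal E(M) = \mathcal E(X_0)\times_{\mathcal E(Y)}\mathcal E(X_1)$, and so formally one expects an equality
$$\int_{\mathcal E(M)}e^{-S(\psi)}\,\mathcal D\psi = \int_{\mathcal E(Y)}\mathcal Df\int_{\mathcal E_f(X_0)}e^{-S(\psi_0)}\,\mathcal D\psi_0\int_{\mathcal E_f(X_1)}e^{-S(\psi_1)}\,\mathcal D\psi_1 = \int_{\mathcal E(Y)}Z_{X_0}(f)\,Z_{X_1}(f)\,\mathcal Df, \tag{1.5}$$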
Determinant Bundles and FQFT
571
which is the path integral version of the algebraic sewing formula (1.2). The Hamiltonian of the theory is defined by the Euclidean time evolution operator $e^{-tH} = Z_{Y\times[0,t]}\in\mathrm{End}(Z(Y))$, and to compute the trace one has the integral formulae
$$\operatorname{Tr}(e^{-tH}) = \int_{\mathcal E(Y)}Z_{Y\times[0,t]}(f,f)\,\mathcal Df,$$
and, corresponding to (1.3),
$$\operatorname{Tr}\big(e^{-tH}(f)\big) = \int_{\mathcal E(X_f)}e^{-S(\psi)}\,\mathcal D\psi. \tag{1.6}$$
The sewing formula (1.5) says that the partition function on $M$ is the vacuum-vacuum expectation value calculated from the partition functions on the two halves. Equivalently, the invariant $Z_M$ is obtained from $Z_{X_0}(f)$ and $Z_{X_1}(f)$ by integrating ("averaging") away the choice of boundary data $f$. In the case of determinants of Dirac operators, this formalism provides some insight into sewing formulae relative to a partition of the underlying manifold (see Sect. 5). First, we need to review some facts about determinants and Fock bundles for families of Dirac operators.
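A finite toy model makes these kernel formulae concrete. Assume, purely for illustration, that $\mathcal E(Y)$ is replaced by $n$ points with counting measure, so that kernels become $n\times n$ matrices, and take $Z_{Y\times[0,t]}$ to be the kernel of $e^{-tH}$ for a random symmetric matrix $H$. Gluing cylinders then becomes matrix multiplication, the trace formula becomes a sum over the diagonal, and the sewing formula (1.5) becomes a dot product. None of this is a construction from the text.

```python
import numpy as np

def cylinder_kernel(H, t):
    """Kernel Z_{Y x [0,t]}(f0, f1) of exp(-tH) for a real symmetric matrix H."""
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(-t * lam)) @ U.T   # U diag(e^{-t lam}) U^T

rng = np.random.default_rng(2)
n = 6
M = rng.normal(size=(n, n))
H = 0.5 * (M + M.T)                        # hypothetical Hamiltonian on a 6-point "E(Y)"

Z_s, Z_t, Z_st = (cylinder_kernel(H, s) for s in (0.3, 0.5, 0.8))
glued = Z_s @ Z_t                          # integrate out the middle boundary value

trace_kernel = np.trace(Z_st)              # Tr e^{-tH} = integral of Z(f, f) Df
trace_spectrum = np.exp(-0.8 * np.linalg.eigvalsh(H)).sum()

# Sewing: two "caps" Z_{X0}(f), Z_{X1}(f) with boundary Y and a cylinder in between.
cap0, cap1 = rng.normal(size=n), rng.normal(size=n)
cut_left = cap0 @ (Z_st @ cap1)            # cut the closed manifold next to X0
cut_right = (cap0 @ Z_st) @ cap1           # cut it next to X1
```

Independence of where the closed manifold is cut is, in this toy, just associativity of matrix multiplication.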
2. Determinant Line Bundles and Fock Spaces

The determinant of a family of first-order elliptic operators arises canonically not as a function, but as a section of a complex line bundle called the determinant line bundle. The anomalies we shall discuss may be realized as obstructions to constructing appropriate trivializations of that bundle. Equivalently, we can view the determinant line of an operator as a ray in the associated Fock space (via the Plücker embedding), and globally the determinant bundle as a rank-1 subbundle of an infinite-dimensional Fock bundle to which the gauge group lifts as a projective bundle map.

First, recall the construction of the determinant line bundle for a family of Dirac-type operators over a closed compact manifold $X$. Such a family can be specified by a smooth fibration of manifolds $\pi : M\longrightarrow B$ with fibre diffeomorphic to $X$, endowed with a Riemannian metric $g_{M/B}$ along the fibres and a vertical bundle of Clifford modules $S(M/B)$, which we may identify with the vertical spinor bundle tensored with an external vertical gauge bundle $\xi$. We assume that $\xi$ is endowed with a Hermitian structure with compatible connection. The manifold $B$ is not required to be compact. We refer to these data as a geometric fibration. Associated to a geometric fibration one has a smooth elliptic family of Dirac operators $D = \{D_b\mid b\in B\} : H\longrightarrow H$, where $H = \pi_*(S(M/B))$ is the infinite-dimensional Hermitian vector bundle on $B$ whose fibre at $b$ is the Fréchet space of smooth sections $H_b = C^\infty(M_b,S_b)$, with $S_b$ a bundle of Clifford modules. If $X$ is even-dimensional, there is a $\mathbb Z_2$ bundle grading $H = H^+\oplus H^-$ into positive and negative chirality fields, and we then have a family of chiral Dirac operators $D : H^+\longrightarrow H^-$. The Quillen determinant line bundle $\mathrm{DET}(D)$ is a complex line bundle over $B$ with fibre at $b$ canonically isomorphic to the complex line $\mathrm{Det}(\mathrm{Ker}\,D_b)^*\otimes\mathrm{Det}(\mathrm{Coker}\,D_b)$ [5, 18], where, for a finite-dimensional vector space $V$, $\mathrm{Det}\,V$ is the complex line $\wedge^{\max}V$.
The bundle structure is defined relative to the covering of B by open subsets Uλ , with λ ∈ R+ , parameterising those operators Db for which λ is not in the spectrum of the Laplacian
572
J. Mickelsson, S. Scott
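$D_b^*D_b$. Over each $U_\lambda$ there are smooth finite-rank vector bundles $H_\lambda^+$, $H_\lambda^-$, equal to the sum of the eigenspaces of $D_b^*D_b$ (resp. of $D_bD_b^*$) for eigenvalues less than $\lambda$, and one defines
$$\mathrm{DET}(D)|_{U_\lambda} = \mathrm{Det}(H_\lambda^+)^*\otimes\mathrm{Det}\,H_\lambda^-. \tag{2.1}$$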
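In finite dimensions the spectral construction (2.1) can be probed directly. The sketch below is a toy in which a random $4\times7$ matrix $D$ stands in for $D_b$ (an assumption, not the Dirac family). It counts the eigenvalues of $D^*D$ and $DD^*$ below several cut-offs $\lambda$. Because $D^*D$ and $DD^*$ share their nonzero eigenvalues, $\dim H_\lambda^+ - \dim H_\lambda^-$ equals $\dim\mathrm{Ker}\,D - \dim\mathrm{Coker}\,D$ for every admissible $\lambda$, even though each rank separately jumps with $\lambda$.

```python
import numpy as np

def spectral_cut_ranks(D, lam):
    """(dim H_lam^+, dim H_lam^-): number of eigenvalues < lam of D^*D and of DD^*."""
    n_plus = int(np.sum(np.linalg.eigvalsh(D.conj().T @ D) < lam))
    n_minus = int(np.sum(np.linalg.eigvalsh(D @ D.conj().T) < lam))
    return n_plus, n_minus

rng = np.random.default_rng(3)
D = rng.normal(size=(4, 7))   # toy operator C^7 -> C^4, index 7 - 4 = 3
ranks = {lam: spectral_cut_ranks(D, lam) for lam in (0.5, 3.0, 10.0, 1000.0)}
```

The cut-offs are arbitrary; for a random matrix they almost surely avoid the spectrum, which is the finite analogue of working over $U_\lambda$.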
The locally defined line bundles patch together naturally over the overlaps $U_\lambda\cap U_{\lambda'}$ [4]. This “spectral” construction of the determinant line bundle is designed to allow one to define the Quillen $\zeta$-function metric and a compatible connection whose curvature $R^\zeta$ is identified with the 2-form component of the index density of the Bismut family:
$$R^\zeta = \Big[(2\pi i)^{-n/2}\int_{M/B}\hat A(M/B)\,\mathrm{ch}(\xi)\Big]_{(2)}, \tag{2.2}$$
where $\hat A(M/B)$ is the vertical $\hat A$-form and $\mathrm{ch}(\xi)$ the Chern character; see [18, 5, 4]. There is, however, a natural alternative construction of the determinant line bundle, due to Segal [24, 17] and applied to Dirac families in [20], which allows us to consider more general smooth families of Fredholm operators that need not be elliptic operators. Let $\alpha : H^0\to H^1$ be a Fredholm operator of index zero. Then a point of the complex line over $\alpha$ is an equivalence class $[A,\lambda]$ of pairs $(A,\lambda)$, where $\lambda\in\mathbb C$ and the invertible operator $A : H^0\to H^1$ is such that $A^{-1}\alpha - I\in\mathrm{End}(H^0)$ is trace-class; the equivalence relation is defined by $(Aq,\lambda)\sim(A,\det_F(q)\lambda)$ for $q\in\mathrm{End}(H^0)$ an operator of the form identity plus trace-class, where $\det_F$ denotes the Fredholm determinant. If $\operatorname{ind}\alpha = d$, we define $\mathrm{Det}(\alpha) := \mathrm{Det}(\alpha\oplus0)$, with $\alpha\oplus0$ acting $H^0\to H^1\oplus\mathbb C^d$ if $d>0$, or $H^0\oplus\mathbb C^{-d}\to H^1$ if $d<0$. Note that a Fredholm operator of index zero always admits an invertible operator $A$ such that $A-\alpha$ has finite rank. We work with the larger ideal of trace-class operators in order to be able to use the (complete) topology determined by the trace norm. With the above notation, the abstract determinant of $\alpha$ is defined to be the canonical element $\det\alpha := [A,\det_F(A^{-1}\alpha)]\in\mathrm{Det}(\alpha)$. For an admissible smooth family of Fredholm operators $\mathcal A = \{\alpha_b\mid b\in B\} : H^0\to H^1$ acting between (weak) vector bundles $H^i$ [20, 24], the union $\mathrm{DET}(\mathcal A)$ of the determinant lines is naturally a complex line bundle. The bundle structure is defined relative to a denumerable covering by open sets $U_\tau$, where $\tau : H^0\to H^1$ is finite-rank and $U_\tau$ parameterizes those $b$ for which $A_b = \alpha_b+\tau_b$ is invertible, via the local trivialization $b\mapsto\det(\alpha_b+\tau_b)$ over $U_\tau$. On the intersection $U_{\tau^1}\cap U_{\tau^2}$ the transition function is $b\mapsto\det_F\big((\alpha_b+\tau_b^1)^{-1}(\alpha_b+\tau_b^2)\big)$.
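In finite dimensions every operator is "identity plus trace-class", and the Segal line can be trivialized explicitly by $[A,\lambda]\mapsto\lambda\det A$; in infinite dimensions $\det A$ does not exist, which is why only the line $\mathrm{Det}(\alpha)$ survives. The sketch below uses random matrices (an illustrative assumption; the finite-rank perturbations $\tau$ are replaced by arbitrary ones, which is harmless in finite dimensions). It checks that this map respects the equivalence relation, that it sends the canonical element $\det\alpha$ to the ordinary determinant, and that the transition functions satisfy the cocycle identity $g_{12}g_{23} = g_{13}$.

```python
import numpy as np

def trivialize(A, lam):
    """Finite-dimensional trivialization of the Segal line: [A, lam] -> lam * det(A)."""
    return lam * np.linalg.det(A)

rng = np.random.default_rng(4)
n = 4
alpha = rng.normal(size=(n, n))    # toy operator of index zero
A = rng.normal(size=(n, n))        # invertible reference operator (generic)
q = rng.normal(size=(n, n))        # stand-in for "identity plus trace-class"
lam = 1.7 - 0.3j

# (A q, lam) ~ (A, det(q) lam) must give the same point of the line
lhs = trivialize(A @ q, lam)
rhs = trivialize(A, np.linalg.det(q) * lam)

# Canonical element det(alpha) := [A, det(A^{-1} alpha)]
canonical = trivialize(A, np.linalg.det(np.linalg.solve(A, alpha)))

def transition(tau1, tau2):
    """b -> det((alpha + tau1)^{-1} (alpha + tau2)) on an overlap U_tau1 and U_tau2."""
    return np.linalg.det(np.linalg.solve(alpha + tau1, alpha + tau2))

t1, t2, t3 = (rng.normal(size=(n, n)) for _ in range(3))
```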
For a family of elliptic operators such as $D$, there is a canonical isomorphism between the two constructions of the determinant bundle described above which preserves the determinant section $b\mapsto\det D_b$, and we may therefore use them interchangeably [20]. This is important when we consider the determinant line bundle for a family of elliptic boundary value problems (EBVPs). To define such a family we proceed initially as in the case of a closed manifold, with a geometric fibration $\pi : M\longrightarrow B$ of connected manifolds, but with fibre diffeomorphic to a compact connected manifold $X$ with boundary $\partial X = Y$. Note that the boundary manifolds $\partial M$ and $\partial X$ may be disconnected. Globally we obtain as before a family of Dirac operators $D = \{D_b\mid b\in B\} : H\longrightarrow H$. We assume that the geometry in a neighbourhood $U\equiv\partial M\times[0,1]$ of the boundary is a pull-back of the geometry induced on the boundary geometric fibration of closed boundary manifolds $\partial\pi : \partial M\longrightarrow B$. This means that all metrics and connections on $T(M/B)$ and $S(M/B)$ restricted to $U$ are geometric products composed of the trivial geometry in the normal $u$-coordinate direction and the boundary geometry in tangential
directions, so $g_{M/B} = du^2+g_{\partial M/B}$, and so forth. In $U_b := U\equiv\partial X_b\times[0,1]$ the Dirac operator $D_b$ then has the form
$$D_b|_U = G_b\Big(\frac{\partial}{\partial u}+D_{Y,b}\Big), \tag{2.3}$$
where $D_{Y,b}$ is a boundary Dirac operator and $G_b$ is a unitary bundle automorphism. The family of boundary Dirac operators $D_Y = \{D_{Y,b}\mid b\in B\} : H_Y\longrightarrow H_Y$, where $H_Y$ is the bundle with fibre $C^\infty(Y_b,S_{Y_b})$ at $b\in B$, is identified with the family defined by the fibration $\partial\pi : \partial M\longrightarrow B$. In contrast to the closed manifold case, the operators $D_b$ are not Fredholm. The crucial analytical property underlying the following determinant line and Fock bundle identifications is the existence of a canonical identification between the infinite-dimensional space $\mathrm{Ker}(D_b)$ of solutions to the Dirac operator and the boundary traces $K(D_b) = \gamma\,\mathrm{Ker}(D_b)$, where $\gamma : C^\infty(X_b,S_b)\to C^\infty(Y_b,S_{Y_b})$ is the operator restricting sections to the boundary. More precisely, the Poisson operator $K_b : C^\infty(Y_b,S_{Y_b})\to C^\infty(X_b,S_b)$ restricts to define the above isomorphism. It extends to a continuous operator $K_b : H^{s-1/2}(Y_b;S|_{Y_b})\to H^s(X_b;S)$ on the Sobolev completions with range $\mathrm{Ker}(D_b,s) = \{f\in H^s(X_b;S)\mid D_bf = 0\ \text{in}\ X_b\setminus Y_b\}$, and the restriction
$$K_b^s : K(D_b,s)\to\mathrm{Ker}(D_b,s) \tag{2.4}$$
is an isomorphism (see [6, 11]). The Poisson operator of $D_b$ defines the Calderón projection:
$$P(D_b) = \gamma K_b^1. \tag{2.5}$$
$P(D_b)$ is a pseudodifferential projection on $L^2(Y_b,S_{Y_b})$, which we can take to be orthogonal, with range equal to $K(D_b)$. The construction depends smoothly on the parameter $b\in B$, and so globally we obtain a smooth map $P(D) : B\to\mathrm{End}(H_Y)$ defining, equivalently, a smooth Fréchet subbundle $K(D)$ of $H_Y$ with fibre $K(D_b) = \mathrm{ran}(P(D_b))$. Because of the tubular boundary geometry, $P(D_b)$ in fact differs from the APS spectral projection $\Pi_b$ only by a smoothing operator [19, 11]. Recall that there is a polarization $H_{Y_b} = H_b^+\oplus H_b^-$ into the non-negative and negative energy modes of the elliptic self-adjoint boundary Dirac operator $D_{Y,b}$, and $\Pi_b$ is defined to be the orthogonal projection onto $H_b^+$. Hence $P(D_b)$ is certainly an element of the pseudodifferential Grassmannian $\mathrm{Gr}_b = \mathrm{Gr}(H_{Y_b})$ parameterizing projections on $H_{Y,b}$ which differ from $\Pi_b$ by a pseudodifferential operator of order $<-\dim M/2$, where by projection we mean self-adjoint idempotent. $\mathrm{Gr}_b$ is a dense submanifold of the Hilbert–Schmidt Grassmannian. Associated to $P\in\mathrm{Gr}_b$ we have the elliptic boundary value problem (EBVP) for $D_b$,
$$D_{P,b} = D_b : \mathrm{dom}(D_{P,b})\to L^2(X_b;S^1) \tag{2.6}$$
with domain $\mathrm{dom}\,D_{P,b} = \{s\in H^1(X_b;S_b^0)\mid P(s|_{Y_b}) = 0\}$. The operator $D_{P,b}$ is Fredholm with kernel and cokernel consisting of smooth sections; see [6] for a general account of EBVPs in index theory. The smooth family of EBVPs $D_{\mathrm{Gr}_b} := \{D_{P,b}\mid P\in\mathrm{Gr}_b\}$ defines a smooth admissible family of Fredholm operators and hence an associated
determinant line bundle $\mathrm{DET}(D_{\mathrm{Gr}_b})\to\mathrm{Gr}_b$. On the other hand, for each choice of a basepoint $P_0\in\mathrm{Gr}_b$ we have the smooth family of Fredholm operators $\{P_{W_0,W} := P\circ P_0 : W_0\to W : P\in\mathrm{Gr}_b\}$, where $\mathrm{ran}(P_0) = W_0$, $\mathrm{ran}(P) = W$, and hence a relative (Segal) determinant line bundle $\mathrm{DET}_{W_0}\to\mathrm{Gr}_b$ based at $W_0$. The bundles so defined for different choices of basepoint are all isomorphic, but not quite canonically. More precisely, from [22, 20], given $P_0,P_1\in\mathrm{Gr}_b$ there is a canonical line bundle isomorphism
$$\mathrm{DET}_{W_0}\cong\mathrm{DET}_{W_1}\otimes\mathrm{DET}(W_0,W_1), \tag{2.7}$$
where $\mathrm{DET}(W_0,W_1)$ means the trivial line bundle with fibre the relative determinant line $\mathrm{DET}(W_0,W_1) := \mathrm{Det}(P_{W_0,W_1})$. The identification (2.4) defines a canonical line bundle isomorphism
$$\mathrm{DET}(D_{\mathrm{Gr}_b})\cong\mathrm{DET}_{K_b}, \tag{2.8}$$
preserving the determinant sections $\det(D_{\mathrm{Gr}_b})\longleftrightarrow\det(S_b(P))$, where $S_b(P) := PP(D_b) : K(D_b)\to\mathrm{ran}(P)$ (see [20]). The translation of these facts into global statements follows by observing that the restricted Grassmannians $\mathrm{Gr}_b$, defined for each $b\in B$, fit together to define a fibration $\mathrm{Gr}_Y\to B$. A spectral section (or Grassmann section [20]) $\mathcal P = \{P_b : b\in B\}$ for the family $D$ is defined to be a smooth section of $\mathrm{Gr}_Y$, and we denote the space of such sections by $\mathrm{Gr}(M/B)$. By cobordism, such sections always exist. In particular, the family of Dirac operators $D$ canonically defines the Calderón section $P(D)\in\mathrm{Gr}(M/B)$. In this sense one may think of the parameter space $B$ as a generalized Grassmannian (parameterizing the subspaces $K(D_b)$), and of the usual Grassmannian as a “universal moduli space”. Notice, however, that the map $b\mapsto\Pi_b$ is generically not a smooth spectral section, because of the flow of eigenvalues of the boundary family. Indeed, this elementary fact is the source of gauge anomalies; see [15] and Sect. 4. A spectral section has a number of consequences for determinants.

First. We obtain a smooth family of EBVPs $(D,\mathcal P) = \{D_{P,b} := (D_b)_{P_b}\mid b\in B\}$, which has an associated determinant line bundle $\mathrm{DET}(D,\mathcal P)\to B$ with determinant section $b\mapsto\det D_{P,b}$.

Second. A spectral section $\mathcal P$ defines a smooth infinite-dimensional vector bundle $\mathcal W$ with fibre $\mathcal W_b = \mathrm{ran}(P_b)$, and associated to $\mathcal P$ we have the smooth family of Fredholm operators $S(\mathcal P) : K(D)\to\mathcal W$, parameterizing the operators $S(P_b) := P_{K(D_b),W_b} : K(D_b)\to W_b$. This family also has a determinant line bundle $\mathrm{DET}(S(\mathcal P))$, and, corresponding to (2.8), there is a canonical line bundle isomorphism
$$\mathrm{DET}(D,\mathcal P)\cong\mathrm{DET}(S(\mathcal P)),\qquad\det(D_{P,b})\longleftrightarrow\det(S(P_b)), \tag{2.9}$$
preserving the determinant sections. Given a pair of sections $\mathcal P_1,\mathcal P_2\in\mathrm{Gr}(M/B)$, there is the smooth family of admissible Fredholm operators $(\mathcal P_1,\mathcal P_2) : \mathcal W^1\to\mathcal W^2$, and, corresponding to (2.7) and (2.9), one finds a canonical isomorphism
$$\mathrm{DET}(D,\mathcal P_1)\cong\mathrm{DET}(D,\mathcal P_2)\otimes\mathrm{DET}(\mathcal P_2,\mathcal P_1), \tag{2.10}$$
which does not preserve the determinant sections [20].

Third. We obtain a bundle of Fock spaces $\mathcal F_{\mathcal P}$ over $B$. To see this, return for a moment to the case of a single operator and its Grassmannian $\mathrm{Gr}_b$. By choosing a basepoint $P_0\in\mathrm{Gr}_b$, we obtain the determinant line bundle $\mathrm{DET}_{W_0}\to\mathrm{Gr}_b$. This is a holomorphic line bundle, but it has no global holomorphic sections. The dual bundle $\mathrm{DET}^*_{W_0}$, on the other hand, has an infinite-dimensional space of holomorphic sections, and this, by definition, is the Fock space based at $W_0$:
$$\mathcal F_{W_0,b} := \Gamma_{\rm hol}(\mathrm{Gr}_b;\mathrm{DET}^*_{W_0}). \tag{2.11}$$
This is the natural quantization given by geometric (Kähler) quantization. Actually, a Fock space comes together with a vacuum vector and a representation of the canonical anticommutation relations; we shall return to this at the end of the section. Taking the union $\mathcal F_b := \bigcup_{W\in\mathrm{Gr}_b}\mathcal F_{W,b}$, we obtain the Fock bundle over $\mathrm{Gr}_b$. This bundle is topologically completely determined by “the” determinant bundle $\mathrm{DET}_{W_0}$; in fact, this is the most direct way to define the bundle structure on $\mathcal F_b$. To be precise, if we change the basepoint we find, dropping the $b$ subscript, a canonical isomorphism
$$\mathcal F_{W_1} = \Gamma_{\rm hol}(\mathrm{Gr};\mathrm{DET}^*_{W_1})\cong\Gamma_{\rm hol}\big(\mathrm{Gr};\mathrm{DET}^*_{W_0}\otimes\mathrm{DET}(W_1,W_0)^*\big)\cong\Gamma_{\rm hol}(\mathrm{Gr};\mathrm{DET}^*_{W_0})\otimes\mathrm{DET}(W_1,W_0)^*\cong\mathcal F_{W_0}\otimes\mathrm{DET}(W_0,W_1),$$
where we use (2.7). Hence, relative to a basepoint $W_0\in\mathrm{Gr}_b$, we have a canonical isomorphism
$$\mathcal F_b\cong\mathcal F_{W_0}\otimes\mathrm{DET}_{W_0}, \tag{2.12}$$
where the first factor on the right-hand side is the trivial bundle with fibre $\mathcal F_{W_0}$. Hence the topological type of the Fock bundle $\mathcal F_b$ is determined by that of the determinant line bundle $\mathrm{DET}_{W_0}$ for any basepoint $W_0$. One moves between the isomorphisms for different basepoints via (2.7). As an abstract vector bundle, a Fock bundle is always trivial (but not necessarily canonically); this is because, by Kuiper's theorem, the unitary group of an infinite-dimensional Hilbert space is contractible. However, as already mentioned, the Fock spaces are equipped with additional structure, the vacuum vectors related to a choice of a family of (Dirac) Hamiltonians, which modifies this statement. In the case of (2.12) we have a preferred line bundle (the “vacuum bundle”) inside the Hilbert bundle, and the structure group is reduced, giving a nontrivial Fock bundle; this will be discussed in more detail in Sect. 4. Now, as we let $b$ vary, we obtain a vertical Fock bundle $\mathcal F(M/B)$ over the total space $\mathrm{Gr}_Y$ of the Grassmann fibration which, restricted to the fibre $\mathrm{Gr}_b$ of $\mathrm{Gr}_Y$, coincides with $\mathcal F_b$. The bundle structure is obvious from the local triviality of the fibration $\mathrm{Gr}_Y\to B$.
A spectral section $\mathcal P$ is a smooth cross section of that fibration, and hence by pull-back we get a Fock bundle over $B$ associated to $\mathcal P$:
$$\mathcal F_{\mathcal P} := \mathcal P^*(\mathcal F(M/B))\longrightarrow B, \tag{2.13}$$
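with fibre $\mathcal F_{P_b} = \Gamma_{\rm hol}(\mathrm{Gr}_b;\mathrm{DET}^*_{W_b})$, $W_b = \mathrm{ran}(P_b)$, at $b\in B$. In the following we may at times also write $\mathcal F_{\mathcal P} = \mathcal F_{\mathcal W}$, where $\mathcal W\to B$ is the bundle associated to $\mathcal P$. Moreover, from the equivalences above we see that the various Fock bundles are related in the following way.

Proposition 2.1. For spectral sections $\mathcal P_1,\mathcal P_2\in\mathrm{Gr}(M/B)$, there is a canonical isomorphism of Fock bundles
$$\mathcal F_{\mathcal P_1}\cong\mathcal F_{\mathcal P_2}\otimes\mathrm{DET}(\mathcal P_2,\mathcal P_1). \tag{2.14}$$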
Notice that, in a similar way to $\mathcal F_{\mathcal P}$, we can also identify the determinant line bundle $\mathrm{DET}(\mathcal P_1,\mathcal P_2)$ as a pull-back bundle. Indeed, associated to the section $\mathcal P_1$ we have a vertical determinant line bundle $\mathrm{DET}_{\mathcal P_1}\to\mathrm{Gr}_Y$, which restricts to $\mathrm{DET}_{W_b^1}$ over $\mathrm{Gr}_b$, where $W_b^1 = \mathrm{ran}(P_b^1)$. Then $\mathrm{DET}(\mathcal P_1,\mathcal P_2) = (\mathcal P_2)^*(\mathrm{DET}_{\mathcal P_1})$. In particular, associated to the family $D$ of Dirac operators parameterized by $B$, we have the canonical spectral section $P(D)$, and by (2.9) we have a canonical isomorphism $\mathrm{DET}(D,\mathcal P)\cong\mathcal P^*(\mathrm{DET}_{P(D)})$. At the Fock space level we have a Fock bundle $\mathcal F_D$ canonically associated to the family $D$, independently of an extrinsic choice of spectral section, whose fibre at $b\in B$ is $\mathcal F_{D_b} := \Gamma_{\rm hol}(\mathrm{Gr}_b;\mathrm{DET}(D_{\mathrm{Gr}_b})^*)$. From (2.8) and (2.14) we obtain the Fock space version of (2.9) and (2.10):

Proposition 2.2. There is a canonical isomorphism of Fock bundles
$$\mathcal F_D\cong\mathcal F_{P(D)}. \tag{2.15}$$
For $\mathcal P\in\mathrm{Gr}(M/B)$:
$$\mathcal F_D\cong\mathcal F_{\mathcal P}\otimes\mathrm{DET}(D,\mathcal P). \tag{2.16}$$
Thus the topology of the Fock bundle and that of the determinant line bundle are intimately related. This is the topological reason behind the relation between the Schwinger terms in the Hamiltonian anomaly and the index density. The Fock space $\mathcal F_{W_0}$ based at $W_0\in\mathrm{Gr}_b$ can be thought of more concretely in terms of equivariant functions on the Stiefel frame bundle over $\mathrm{Gr}_b$. To describe this, we fix an orthonormal basis of eigenvectors of $D_{Y,b}$ in $H_{Y_b}$ such that $e_i\in H_b^-$ for $i\le0$ and $e_i\in H_b^+$ for $i>0$. A point in the fibre of the Stiefel bundle $\mathrm{St}_b$ based at $H_b^+$ over $W$ in the index-zero component of $\mathrm{Gr}_b$ is a linear isomorphism $\xi : H_b^+\to W$ such that $\Pi_b\circ\xi : H_b^+\to H_b^+$ has a Fredholm determinant. $\xi$ is also referred to as an “admissible basis” for $W$ (relative to $H_b^+$), in so far as it transforms $e_i$, $i>0$, to a basis for $W$. $\xi$ can be thought of as a matrix $\xi = \begin{pmatrix}\xi_+\\ \xi_-\end{pmatrix}$ with columns labeling the elements of the basis and rows the coordinates in the standard basis $e_i$. If $\xi,\xi'$ are two admissible bases for $W$, then $\xi' = \xi.g$, where $g$ is an element of the restricted general linear group $\mathrm{Gl}_1$ consisting of invertible linear maps $g : H_b^+\to H_b^+$ such that $g-I$ is trace-class. $\mathrm{Gl}_1$ acts freely
on $\mathrm{St}_b\times\mathbb C$ by $(\xi,\lambda).g = (\xi.g,\lambda\det_F(g)^{-1})$, and we obtain the alternative construction
$$\mathrm{DET}_{H_b^+} = \mathrm{St}_b\times_{\mathrm{Gl}_1}\mathbb C. \tag{2.17}$$
Similarly, for $W\in\mathrm{Gr}_b$ we have $\mathrm{DET}_W = \mathrm{St}_W\times_{\mathrm{Gl}_1}\mathbb C$, where $\mathrm{St}_W$ is the corresponding frame bundle based at $W$. In particular, notice that an isomorphism $\mathrm{St}_{W_0}\to\mathrm{St}_{W_1}$ is specified by an invertible operator $A : W_0\to W_1$ such that $P_1P_0-A$ is trace-class, from which we once again obtain the identification (2.7). In this description, an element of the Fock space $\mathcal F_{W_b}$ is a holomorphic function $\psi : \mathrm{St}_{W_b}\to\mathbb C$ transforming equivariantly under the $\mathrm{Gl}_1$ action as $\psi(\xi.g) = \psi(\xi)\det_F(g)$. The distinguished element
$$\nu_{W_b}(\xi) = \det_F(P_{W_b}\xi) \tag{2.18}$$
is the vacuum vector. Equivalently, an element of $\mathcal F_{W_b}$ is a holomorphic function $f : \mathrm{DET}_{W_b}\to\mathbb C$ which is linear on each fibre, and from this viewpoint the vacuum vector is the function
$$\nu_{W_b}([\alpha,\lambda]) = \lambda\det_F(P_{W_b}\alpha), \tag{2.19}$$
for any representative $(\alpha,\lambda)$ of the equivalence class $[\alpha,\lambda]\in\mathrm{Det}(W_b,W)$. A generalization of this leads to the Plücker embedding. First, for $W\in\mathrm{Gr}_b$, fix an orthonormal basis $\{e_i\}_{i\in\mathbb Z}$ of $H_b$ such that $e_i\in W^\perp$ for $i\le0$ and $e_i\in W$ for $i>0$. Let $\mathcal S$ be the set of all increasing sequences of integers $S = (i_1,i_2,\dots)$ with $S\setminus\mathbb N$ and $\mathbb N\setminus S$ finite. For each sequence $S$ we have an admissible basis $\xi(S) = \{e_{i_1},e_{i_2},\dots\}\in\mathrm{St}_W$, and the Fredholm index of the operator $P_SP_W : W\to H_S$, where $H_S$ is the closed subspace spanned by $\xi(S)$ and $P_S$ the corresponding orthogonal projection, defines a bijection $\pi_0(\mathrm{Gr}_b)\to\mathbb Z$. The Plücker coordinates of the basis $\omega\in\mathrm{St}_W$ are the collection of complex numbers $\psi_S(\omega) = \det_F(P_S\omega) = \det_F(\omega_S)$, where $\omega_S$ is the matrix formed from the rows of $\omega = \begin{pmatrix}\omega_+\\ \omega_-\end{pmatrix}$ labeled by $S$. In particular, $\psi_{\mathbb N}(\omega)$ is the coordinate defined by the vacuum vector. If $\omega$ is a basis for $W\in\mathrm{Gr}_b$, then the Plücker coordinates of a second admissible basis $\omega_1$ differ from those of $\omega$ by the Fredholm determinant of the matrix relating the two bases. The Plücker coordinates therefore define a projective embedding $\mathrm{Gr}_b\to\mathcal F_W$. This is prescribed equivalently by the map
$$\varphi : \mathrm{St}_{W_b}\times\mathrm{St}_{W_b}\longrightarrow\mathbb C,\qquad\varphi(\tau,\omega) = \det_F(\tau^*\omega) = \det_F(\tau_+^*\omega_++\tau_-^*\omega_-), \tag{2.20}$$
with respect to which the Plücker coordinates are ψS (ω) = φ(ξ(S), ω). φ is the same thing as the map on the determinant bundle gφ : DETWb × DETWb −→ C,
gφ ([α, λ], [β, µ]) = λµdet F (α ∗ PW β),
(2.21)
where α : Wb → W , (resp. β : Wb → W ), is antiholomorphic (resp. holomorphic) and antilinear (resp. linear) in the first (resp. second) variable. We then have the Plücker embedding map DETWb − {0} −→ FWb
(2.22)
578
J. Mickelsson, S. Scott
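which maps $[\omega,\lambda]\mapsto\bar\lambda\,\phi(\omega,\cdot\,)$. Equivalently, using the Segal definition of the determinant line, it is the map
$$[\alpha,\lambda]\longmapsto\psi_{[\alpha,\lambda]} \qquad (2.23)$$
defined for $[\alpha,\lambda] = [P_W\alpha,\lambda]\in\mathrm{DET}(W_b,W)$ and $\xi: W_b\to W$ by
$$\psi_{[\alpha,\lambda]}(\xi) = \bar\lambda\det_F(\alpha^*\circ\xi) = \bar\lambda\det_F(\alpha^*\circ P_W\circ\xi). \qquad (2.24)$$
Notice that
$$\det(\mathrm{id}_{W_b})\longmapsto\nu_{W_b}, \qquad (2.25)$$
where $\mathrm{id}_{W_b} := P_{W_b,W_b}$. The map (2.22) thus defines a projective embedding $\mathrm{Gr}_b\to\mathcal{F}_W$. Evaluated on the diagonal, the map (2.21) defines a canonical metric on $\mathrm{DET}_{W_b}$ by $\|[\alpha,\lambda]\|^2 = |\lambda|^2\det_F(\alpha^*\alpha)$. Globally, via (2.9), this gives the canonical metric of [20] on $\mathrm{Det}(D,P)$,
$$\|\det D_{P_b}\|^2 := g_\phi(S(P_b),S(P_b)) = \det_F(S(P_b)^*S(P_b)). \qquad (2.26)$$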
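The basis-independence statements above can be checked in finite dimensions. The following Python sketch is an illustration under our own assumptions, not the paper's construction. It replaces admissible bases by $n\times k$ complex matrices, the Fredholm determinant $\det_F$ by the ordinary determinant, and the sequences $S\in\mathcal{S}$ by $k$-element row sets; all function names are hypothetical. It checks three things:

- the Plücker coordinates pick up the common factor $\det g$ under $\omega\mapsto\omega g$;
- $\psi_S(\omega) = \phi(\xi(S),\omega)$, as in (2.20);
- $|\lambda|^2\det(\alpha^*\alpha)$ is unchanged under $(\alpha,\lambda)\mapsto(\alpha g,\lambda\det g^{-1})$.

```python
import itertools
import numpy as np

def plucker_coordinates(omega):
    """Maximal minors of an n x k basis matrix, keyed by increasing row sets S."""
    n, k = omega.shape
    return {S: np.linalg.det(omega[list(S), :])
            for S in itertools.combinations(range(n), k)}

def phi(tau, omega):
    """Finite-dimensional stand-in for (2.20): det(tau^* omega)."""
    return np.linalg.det(tau.conj().T @ omega)

def norm_sq(alpha, lam):
    """Stand-in for ||[alpha, lam]||^2 = |lam|^2 det(alpha^* alpha)."""
    return abs(lam) ** 2 * np.linalg.det(alpha.conj().T @ alpha).real

rng = np.random.default_rng(0)
n, k = 5, 2
omega = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
g = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))  # change of basis
xi = np.eye(n)  # xi[:, S] plays the role of the coordinate basis xi(S)

p_old = plucker_coordinates(omega)
p_new = plucker_coordinates(omega @ g)
```

Every entry of `p_new` equals the matching entry of `p_old` times `det(g)`. This is why only the ray of coordinates, a point in projective space, is attached to the plane.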
On the other hand, we can use the map $\phi$ (or $g_\phi$) to put a unitary structure on $\mathcal{F}_W$ with respect to which (2.22) is an isometry. To do that, we use the fact that any section in $\mathcal{F}_W$ can be written as a linear combination of the $\psi_{[\alpha,\lambda]}$, and set
$$\langle\psi_{[\alpha,\lambda]},\psi_{[\beta,\mu]}\rangle_W = g_\phi([\alpha,\lambda],[\beta,\mu]). \qquad (2.27)$$
In particular, the finite linear combinations of the sections $\psi_S$, $S\in\mathcal{S}$, are dense in $\mathcal{F}_W$ with respect to the topology of uniform convergence on compact subsets, and one has $\langle\psi_S,\psi_{S'}\rangle = \phi(\xi(S),\xi(S')) = \delta_{SS'}$. Notice further the identities
$$\langle\nu_W,\nu_W\rangle_W = 1\qquad\text{and}\qquad\langle\psi_{[\alpha,\lambda]},\psi_{[\alpha,\lambda]}\rangle_W = \|[\alpha,\lambda]\|^2, \qquad (2.28)$$
the latter being the statement that (2.22) is an isometry. For further details see [17] and [14].

There is a different way of thinking about Fock spaces, perhaps more familiar to physicists: as an infinite-dimensional exterior algebra (fermionic Fock space). Recall [14] that a polarization $W$ of the Hilbert space $H_b$ fixes a representation of the canonical anticommutation relations (CAR) in a Fock space $F(H_b,W)$, whose only non-zero anticommutators are
$$a^*(v)a(u) + a(u)a^*(v) = \langle u,v\rangle. \qquad (2.29)$$
The defining property of this irreducible representation is the existence of a vacuum vector $|W\rangle$ with $a(u)|W\rangle = 0 = a^*(v)|W\rangle$ for all $u\in W$, $v\in W^\perp$. One has
$$F(H_b,W) = \Lambda(W)\otimes\Lambda((W^\perp)^*) \qquad (2.30)$$
$$\phantom{F(H_b,W)} = \bigoplus_{d=q-p\in\mathbb{Z}}\Lambda^p(W)\otimes\Lambda^q((W^\perp)^*). \qquad (2.31)$$
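A finite-dimensional model of (2.29) and the vacuum condition can be built with the Jordan–Wigner construction on $(\mathbb{C}^2)^{\otimes n}$. In this sketch $H_b=\mathbb{C}^n$, $W$ is spanned by the first $k$ basis vectors, and $W^\perp$ by the rest. Modes in $W^\perp$ enter through a particle–hole swap, so that the state $|0\dots0\rangle$ plays the role of $|W\rangle$. This is our illustrative choice, not a construction from the text, and the function names are hypothetical. The code checks three properties:

- $\{a(u),a^*(v)\} = \langle u,v\rangle$, with the inner product antilinear in the first slot;
- $\{a(u),a(v)\}=0$;
- the vacuum conditions.

```python
from functools import reduce
import numpy as np

def jordan_wigner(n):
    """Annihilation operators c_0..c_{n-1} on (C^2)^{(x) n} with {c_i, c_j^dagger} = delta_ij."""
    I2 = np.eye(2)
    Z = np.diag([1.0, -1.0])
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps |1> to |0>
    return [reduce(np.kron, [Z] * i + [lower] + [I2] * (n - i - 1)) for i in range(n)]

def car_operators(n, k):
    """a(u), a*(v) for the polarization W = span(e_0..e_{k-1}), W^perp = span(e_k..e_{n-1})."""
    c = jordan_wigner(n)
    cd = [ci.conj().T for ci in c]

    def a(u):
        # antilinear in u; on W^perp the roles of c and c^dagger are swapped (particle-hole)
        return sum(np.conj(u[i]) * (c[i] if i < k else cd[i]) for i in range(n))

    def a_star(v):
        return a(v).conj().T  # linear in v

    return a, a_star

n, k = 3, 2
a, a_star = car_operators(n, k)
vacuum = np.zeros(2 ** n)
vacuum[0] = 1.0  # |0...0>, the unit element of the exterior algebra in this model
```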
Determinant Bundles and FQFT
579
The vacuum $|W\rangle$ is represented as the unit element in the exterior algebra. For $u\in W$, $a(u)$ corresponds to interior multiplication by $u$, and the creation operator $a^*(u)$ is given by exterior multiplication. For $u\in W^\perp$ the operator $a(u)$ (resp. $a^*(u)$) is given by exterior (resp. interior) multiplication by $Ju$; here $J: H\to H^*$ is the canonical antilinear isomorphism from a complex Hilbert space to its dual. The vacuum $|W\rangle$ then has the characteristic property $a(u)|W\rangle = 0$ for $u\in W$, and $a^*(v)|W\rangle = 0$ for $v\in W^\perp$. If we choose a different $W'\in\mathrm{Gr}_b$, then there is a complex vacuum line in $F(H_b,W)$ corresponding to the new polarization $W'$. The different vacuum lines parameterized by the planes $W'$ form another realization of the determinant bundle $\mathrm{DET}_W$ over $\mathrm{Gr}_b$, as a subbundle of the trivial Fock bundle with fibre $F_W$. The Plücker embedding $\mathrm{DET}_W\to F(H_b,W)$ is defined by mapping $(\omega,\lambda)\in\mathrm{St}_W\times\mathbb{C}$ to $\lambda\sum_{S\in\mathcal{S}}\det\omega_S\,\psi_S$, where $\omega_S$ is as before. A Hermitian metric on $F(H_b,W)$ is again defined by $\langle\psi_S,\psi_{S'}\rangle = \delta_{SS'}$. On the other hand, consider the finite-dimensional matrix identity for $\alpha:\mathbb{C}^m\to\mathbb{C}^n$ and $\beta:\mathbb{C}^n\to\mathbb{C}^m$ with $n\le m$:
$$\det(\alpha\beta) = \sum_{(i)}\det(\alpha_{(i)})\det(\beta_{(i)}). \qquad (2.32)$$
The sum runs over all sequences $(i) = \{1\le i_1<i_2<\dots<i_n\le m\}$, and $\alpha_{(i)}$ (resp. $\beta_{(i)}$) is the matrix obtained from $\alpha$ (resp. $\beta$) by selecting the columns of $\alpha$ (resp. rows of $\beta$) labeled by $(i)$. This identity implies the pairing of Fock space vectors
$$\langle\psi_{[\alpha,\lambda]},\psi_{[\beta,\mu]}\rangle = \sum_{S\in\mathcal{S}}\overline{\psi_{[\alpha,\lambda]}(\xi(S))}\,\psi_{[\beta,\mu]}(\xi(S)). \qquad (2.33)$$
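Identity (2.32) is the Cauchy–Binet formula. The short Python sketch below (function names hypothetical) checks it numerically for random complex matrices. It also checks the special case $\det(A^*B) = \sum_S\overline{\det A_S}\,\det B_S$. In that case the minors $\det A_S$, $\det B_S$ are finite-dimensional Plücker coordinates, which is the mechanism behind the pairing (2.33).

```python
import itertools
import numpy as np

def cauchy_binet_rhs(alpha, beta):
    """Sum over increasing index sets (i) of det(alpha[:, (i)]) * det(beta[(i), :])."""
    n, m = alpha.shape
    total = 0.0
    for idx in itertools.combinations(range(m), n):
        cols = list(idx)
        total += np.linalg.det(alpha[:, cols]) * np.linalg.det(beta[cols, :])
    return total

rng = np.random.default_rng(1)
n, m = 2, 4
alpha = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))  # C^m -> C^n
beta = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))   # C^n -> C^m

A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))      # two bases of
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))      # n-planes in C^m
plucker_sum = sum(np.conj(np.linalg.det(A[list(S), :])) * np.linalg.det(B[list(S), :])
                  for S in itertools.combinations(range(m), n))
```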
The metrics so defined on the CAR construction $F(H_b,W)$ and on the geometric construction $\mathcal{F}_W$ of the Fock space then correspond under an algebraic isomorphism. This isomorphism associates to each section $\psi_S\in\mathcal{F}_W$ the vector $a(e_{i_1})\cdots a(e_{i_p})a^*(e_{j_1})\cdots a^*(e_{j_q})|W\rangle\in F(H_b,W)$. Here $i_1<i_2<\dots<i_p\le 0$ are the indices $\le 0$ in the sequence $S$, and $0<j_1<j_2<\dots<j_q$ are the missing positive indices. This gives a dense inclusion $F(H_b,W)\to\mathcal{F}_W$.

Returning to the case of a family $D$ of Dirac-type operators parameterized by a manifold $B$: if we are given a spectral section $P\in\mathrm{Gr}(M/B)$, then we have the global version of the above properties. Associated to $P$ we have a Fock bundle $\mathcal{F}_W\to B$, endowed with a unitary structure $\langle\,,\rangle_P$ given on the fibre $\mathcal{F}_{W_b}$ by (2.27). The bundle $\mathcal{F}_W$ has a distinguished section, the vacuum section $\nu_P = \nu_W$. It assigns to $b\in B$ the vacuum vector $\nu_{W_b}$, which has unit norm in the fibres. The canonical Calderon section $P(D)$ defines the Fock bundle $\mathcal{F}_K$. Associated to it we have a determinant line bundle $\mathrm{DET}(D,P)\cong\mathrm{DET}(S(P))$ and a generalized Plücker embedding
$$\mathrm{DET}(D,P)\cong\mathrm{DET}(S(P))\longrightarrow\mathcal{F}_K, \qquad (2.34)$$
corresponding to the viewpoint on the parameter space $B$ as a generalized Grassmannian. More generally, for any pair of spectral sections $P_1$, $P_2$, there is a generalized Plücker embedding
$$\mathrm{DET}(P_1,P_2)\longrightarrow\mathcal{F}_{P_1}, \qquad (2.35)$$
defined fibrewise by the embeddings $\mathrm{Det}(W_{1,b},W_{2,b})\hookrightarrow\mathrm{DET}_{W_{1,b}}\to\mathcal{F}_{W_{1,b}}$. By (2.22), this is the map $[\alpha,\lambda]\mapsto\psi_{[\alpha,\lambda]}\in\mathcal{F}_{P_1}$. So, a section of the determinant
bundle $\mathrm{DET}(P_1,P_2)$ defines a section of the Fock bundle $\mathcal{F}_{P_1}$. In particular, the vacuum section is the image of the determinant section in the “trivial” case
$$\mathrm{DET}(P_1,P_1)\longrightarrow\mathcal{F}_{P_1}. \qquad (2.36)$$
Associated to the family of Dirac operators $D$, we have a canonical vacuum section $\nu_K\in\mathcal{F}_K$ with $\langle\nu_{K_b},\nu_{K_b}\rangle_{K_b} = 1$. If we choose an external spectral section $P$, then via (2.34) we have a canonical section $\psi_{K,P}$ of $\mathcal{F}_K$. It corresponds to the determinant section $b\mapsto\det(D_{P_b})\leftrightarrow\det(S(P_b))$ of $\mathrm{DET}(D,P)$, and satisfies
$$\langle\psi_{K,P},\psi_{K,P}\rangle_{K_b} = \|\det(S(P_b))\|^2. \qquad (2.37)$$
That is, the generalized Plücker embedding (2.34) is an isometry with respect to the canonical metric on $\mathrm{DET}(D,P)$. This follows by construction from (2.28).

As we already mentioned, as an abstract vector bundle the Fock bundle is trivial. The non-triviality of the construction lies instead in the (locally defined) physical vacuum subbundle defined by the family of Hamiltonians. As an example, assume that we have a family of Dirac Hamiltonians parameterized by the set $\mathcal{A}$ of smooth vector potentials. Given a real number $\lambda$, we can define $W_0(A)$ as the subspace of the boundary Hilbert space corresponding to the spectral restriction $D_{Y,A}>\lambda$ for the boundary Hamiltonian. Then $A\mapsto W_0(A)$ is a smooth Grassmann section over the set $U_\lambda\subset\mathcal{A}$ of Hamiltonians with $\lambda\notin\mathrm{Spec}(D_{Y,A})$. Let $A\mapsto W_1(A)$ be a globally defined Grassmann section. For each $A\in U_\lambda$ we have a well-defined vacuum line $|A\rangle\in\mathcal{F}_{W_1(A)}$. This line is just the image of the determinant line $\mathrm{DET}(W_1(A),W_0(A))$ under the map (2.22). If $\dim Y = 1$, the Grassmannian $\mathrm{Gr}_A$ does not depend on the parameter $A$, and we may take $W_1(A)$ to be a constant section. In any case, the bundle of vacua over $U_\lambda$ can be identified with the relative determinant bundle $\mathrm{DET}(W_1,W_0)$. The twisting of this bundle depends solely on the twisting of the local section $A\mapsto W_0(A)$.
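A Hermitian matrix can stand in for the boundary Hamiltonian $D_{Y,A}$ to show concretely why $W_0(A)$ is only defined on $U_\lambda$. The sketch below uses hypothetical names and is a finite-dimensional stand-in only. It builds the projection onto the eigenspaces with eigenvalue $>\lambda$ and refuses when $\lambda$ is an eigenvalue. On a one-parameter family, it shows that the projection jumps (its rank changes) when an eigenvalue crosses $\lambda$. This is a finite-dimensional shadow of the restriction to $U_\lambda$.

```python
import numpy as np

def spectral_projection(D, lam):
    """Orthogonal projection onto eigenvectors of the Hermitian matrix D with eigenvalue > lam."""
    evals, evecs = np.linalg.eigh(D)
    if np.any(np.isclose(evals, lam)):
        raise ValueError("lam lies in the spectrum; W0 is not defined at this point")
    V = evecs[:, evals > lam]
    return V @ V.conj().T

def family(t):
    """Hypothetical one-parameter family: the second eigenvalue crosses 0 at t = 0.5."""
    return np.diag([-1.0, 1.0 - 2.0 * t])

P = spectral_projection(np.diag([-2.0, -1.0, 1.0, 2.0]), 0.0)
```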
3. Construction of the FQFT

In this section we use the facts presented in the previous section to piece together a FQFT, generalizing the two-dimensional case proposed by Segal [22]. As in Sect. 1.1, the constructions are mathematical and do not refer to any particular physical system. In the next section we explain how the chiral anomaly and the commutator anomaly arise in this context.
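3.1. Strategy. We define a projective functor from a subcategory $\mathcal{C}_d$ of the category of spin manifolds to the category $\mathcal{C}_{vect}$ of $\mathbb{Z}$-graded vector spaces and linear maps. This functor factors through the category $\mathcal{C}_{Gr}$ of linear relations:
$$\begin{array}{ccc}\mathcal{C}_d & \longrightarrow & \mathcal{C}_{Gr}\\ & \searrow & \big\downarrow\\ & & \mathcal{C}_{vect}\end{array}$$
The combination of these functors is the Fock functor defining the FQFT.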
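3.2. Projective representations of categories. By a category $\mathcal{C}$ we mean a set $\mathrm{Ob}(\mathcal{C})$ of elements called the objects of $\mathcal{C}$, together with, for any two elements $a,b\in\mathrm{Ob}(\mathcal{C})$, a set $\mathrm{Mor}_{\mathcal{C}}(a,b)$ of morphisms $a\to b$. For $a,b,c\in\mathrm{Ob}(\mathcal{C})$ there is a multiplication
$$\mathrm{Mor}_{\mathcal{C}}(a,b)\times\mathrm{Mor}_{\mathcal{C}}(b,c)\longrightarrow\mathrm{Mor}_{\mathcal{C}}(a,c),\qquad(f_{a,b},f_{b,c})\longmapsto f_{b,c}\circ f_{a,b}.$$
The product is required to be associative: if $f_{c,d}\in\mathrm{Mor}_{\mathcal{C}}(c,d)$, then $f_{c,d}(f_{b,c}f_{a,b}) = (f_{c,d}f_{b,c})f_{a,b}$. One usually also asserts the existence of an identity morphism $\mathrm{id}_b\in\mathrm{Mor}_{\mathcal{C}}(b,b)$ which satisfies $\mathrm{id}_b\circ f_{a,b} = f_{a,b}$ and $f_{b,c}\circ\mathrm{id}_b = f_{b,c}$. A (covariant) functor $C$ from a category $\mathcal{C}$ to a category $\mathcal{C}'$ consists of a map $C:\mathrm{Ob}(\mathcal{C})\to\mathrm{Ob}(\mathcal{C}')$ and, for each pair $a,b\in\mathrm{Ob}(\mathcal{C})$, a map $C_{a,b}:\mathrm{Mor}_{\mathcal{C}}(a,b)\to\mathrm{Mor}_{\mathcal{C}'}(C(a),C(b))$ such that
$$C_{a,c}(f_{b,c}f_{a,b}) = C_{b,c}(f_{b,c})\,C_{a,b}(f_{a,b}). \qquad (3.1)$$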
If $\mathcal{C}'$ is the category of vector spaces and linear maps, then $C$ is a representation of the category $\mathcal{C}$. Hence, from the viewpoint of FQFT, quantization may be roughly characterized as a projective representation of the category $\mathcal{C}_d$ of manifolds defined by classical geometric data. A classical result of Wigner tells us that in quantum systems we must content ourselves with projective representations of symmetry groups. Similarly, with the Fock functor we have to consider projective category representations. This means that there is essentially a scalar ambiguity in defining the map $C_{a,b}$, so that (3.1) is replaced by
$$C_{a,c}(f_{b,c}f_{a,b}) = c(f_{b,c},f_{a,b})\,C_{b,c}(f_{b,c})\,C_{a,b}(f_{a,b}), \qquad (3.2)$$
where the “cocycle” $c(f_{b,c},f_{a,b})$ takes values in $\mathbb{C}-\{0\}$.

To explain the meaning here of “essentially”, recall that a projective representation of a group $G$ is a true representation of an extension group $\hat G$ of $G$ by $\mathbb{C}^\times$. The group $\hat G$ forms a $\mathbb{C}^\times$ bundle over $G$ whose Lie algebra cocycle is the first Chern class of the associated line bundle. Equivalently, $\hat G$ is defined by assigning to each $g\in G$ a complex line $L_g$ such that $L_{g_1g_2} = L_{g_1}\otimes L_{g_2}$. A well-known instance of this occurs for loop groups (see Example 4.1 below). More generally, in Sect. 4 we shall give a similar description of the gauge group representations for a Yang–Mills action functional. Likewise, a projective representation of a category is a true representation of an extension category $\hat{\mathcal{C}}$. This extension is constructed by assigning to each $f_{a,b}\in\mathrm{Mor}_{\mathcal{C}}(a,b)$ a complex line $L_{f_{a,b}}$, and, given $f_{b,c}\in\mathrm{Mor}_{\mathcal{C}}(b,c)$, an identification
$$L_{f_{a,b}}\otimes L_{f_{b,c}}\longrightarrow L_{f_{b,c}f_{a,b}}, \qquad (3.3)$$
which is associative in the natural sense. One then has
$$\mathrm{Ob}(\hat{\mathcal{C}}) = \mathrm{Ob}(\mathcal{C}),\qquad\mathrm{Mor}_{\hat{\mathcal{C}}}(a,b) = \{(f,\lambda)\mid f\in\mathrm{Mor}_{\mathcal{C}}(a,b),\ \lambda\in L_f\}. \qquad (3.4)$$
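The simplest nontrivial instance of (3.2) is a finite group. The toy example below is not taken from the paper, and its function names are hypothetical. It realizes $\mathbb{Z}_2\times\mathbb{Z}_2$ by the Pauli matrices $U(a,b) = X^aZ^b$ and computes the cocycle from $U(g)U(h) = c(g,h)U(gh)$. It then checks the associativity (cocycle) identity. Finally, it checks that the extension with elements $(g,\lambda)$ and product $(g,\lambda)(h,\mu) = (gh,\lambda\mu\,c(g,h))$, the group analogue of (3.4), is represented honestly by $(g,\lambda)\mapsto\lambda U(g)$. The ordering $U(g)U(h)$ used here differs from the ordering in (3.2) only by relabeling.

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
G = list(itertools.product((0, 1), repeat=2))  # elements (a, b) of Z2 x Z2

def U(g):
    """Projective representation g = (a, b) -> X^a Z^b."""
    a, b = g
    return np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)

def mul(g, h):
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

def cocycle(g, h):
    """c(g, h) defined by U(g) U(h) = c(g, h) U(gh); U(gh) is unitary, so divide out its trace."""
    return np.trace(U(mul(g, h)).conj().T @ U(g) @ U(h)) / 2

def mul_hat(x, y):
    """Product in the extension group: (g, lam)(h, mu) = (gh, lam * mu * c(g, h))."""
    (g, lam), (h, mu) = x, y
    return (mul(g, h), lam * mu * cocycle(g, h))

def U_hat(x):
    """True representation of the extension group."""
    g, lam = x
    return lam * U(g)
```

Since $c(g,h)\neq c(h,g)$ for some pairs and $G$ is abelian, no rescaling $U(g)\mapsto b(g)U(g)$ can remove the cocycle.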
3.3. The category $\mathcal{C}_d$. An element of $\mathrm{Ob}(\mathcal{C}_d)$ is a pair $(\mathcal{Y},W)$, where $\mathcal{Y} = (Y,g_Y,S_Y\otimes\xi_Y)$. Here:

- $Y$ is a closed, smooth and oriented $d$-dimensional spin manifold;
- $g_Y$ is a Riemannian metric on $Y$;
- $S_Y$ is a spinor bundle over $Y$;
- $\xi_Y$ is a Hermitian bundle over $Y$ with compatible gauge connection;
- $W$ is an admissible polarization of the “one-particle” Hilbert space $H_Y = W\oplus W^\perp$ into a pair of closed infinite-dimensional subspaces.

Here $H_Y = L^2(Y,S_Y\otimes\xi_Y)$. Admissible means that $P_W\in\mathrm{Gr}_Y$, where $P_W$ is the orthogonal projection onto $W$ and $\mathrm{Gr}_Y$ is the Hilbert–Schmidt Grassmannian defined with respect to the energy polarization $H_Y = H^+\oplus H^-$ into positive, resp. negative, energies of the Dirac operator $D_Y$.

Let $(\mathcal{Y}_i,W_i)\in\mathrm{Ob}(\mathcal{C}_d)$, $i=1,2$, where $\mathcal{Y}_i = (Y_i,g_{Y_i},S_{Y_i}\otimes\xi_{Y_i})$. An element of $\mathrm{Mor}_{\mathcal{C}_d}((\mathcal{Y}_1,W_1),(\mathcal{Y}_2,W_2))$ is a triple $\mathcal{X} = (X,g_X,S_X\otimes\xi_X)$, where:

- $X$ is a smooth and oriented $(d+1)$-dimensional spin manifold with boundary $\partial X = Y_1\sqcup Y_2$;
- $g_X$ is a Riemannian metric on $X$ with $(g_X)|_{Y_i} = g_{Y_i}$;
- $S_X$ is a spinor bundle and $\xi_X$ a Hermitian bundle over $X$ with compatible gauge connection, such that $(S_X\otimes\xi_X)|_{Y_i}\cong S_{Y_i}\otimes\xi_{Y_i}$, and the connections and metrics correspond under the isomorphism.

We refer to $\mathcal{X}$ as a geometric cobordism from $\mathcal{Y}_1$ to $\mathcal{Y}_2$. We assume that:

• In a collar neighbourhood $U = U_1\sqcup U_2$ of the boundary, where $U_i = [0,1]\times Y_i$, all metrics and connections are of product type. Recall that this means that near the boundary the metric becomes the product of the standard metric on the real axis and the boundary metric. Similarly, the gauge connection smoothly approaches the connection on the boundary, in such a way that at the boundary all the normal derivatives vanish. Thus $\xi_X|_{U_i}$ is a pull-back of the boundary bundle $\xi_{Y_i}$, and similarly all metrics, connections, etc. are pull-backs of their boundary counterparts, so $g_X|_{U_i} = du^2 + g_{Y_i}$, etc.
• The orientation on the “ingoing” boundary $Y_1$ is assumed to be induced by the orientation of $X$ and the inward directed normal vector field on the boundary. For the “outgoing” boundary $Y_2$, the orientation is fixed by the outward directed normal vector field.

For notational brevity we may write $S_i := S_{Y_i}\otimes\xi_{Y_i}$, $g_i := g_{Y_i}$ etc., and $S_{1,2} := S_X\otimes\xi_X$, $g_{1,2} := g_X$ etc. in the following. We augment $\mathcal{C}_d$ by including the empty set $\emptyset\in\mathrm{Ob}(\mathcal{C}_d)$. For each $(\mathcal{Y},W)\in\mathrm{Ob}(\mathcal{C}_d)$ we also allow $\emptyset := \mathrm{id}$ as an element of $\mathrm{Mor}_{\mathcal{C}_d}((\mathcal{Y},W),(\mathcal{Y},W))$. In particular, a geometric cobordism $\mathcal{X}\in\mathrm{Mor}_{\mathcal{C}_d}(\emptyset,(\mathcal{Y},W))$ means a $(d+1)$-dimensional manifold $X$ with boundary $Y$ (plus bundles, connections, etc.). Thus a morphism in $\mathcal{C}_d$ may have disconnected, connected, or empty ($X$ closed) boundary, according to whether $Y$ is disconnected, connected or empty. For $(\mathcal{Y}_i,W_i)\in\mathrm{Ob}(\mathcal{C}_d)$, $i=1,2,3$, there is an associative product map
$$\mathrm{Mor}_{\mathcal{C}}((\mathcal{Y}_1,W_1),(\mathcal{Y}_2,W_2))\times\mathrm{Mor}_{\mathcal{C}}((\mathcal{Y}_2,W_2),(\mathcal{Y}_3,W_3))\longrightarrow\mathrm{Mor}_{\mathcal{C}}((\mathcal{Y}_1,W_1),(\mathcal{Y}_3,W_3)), \qquad (3.5)$$
taking a pair $(\mathcal{X}_{1,2},\mathcal{X}_{2,3})$ to the geometric cobordism
$$\mathcal{X}_{1,2}\cup_{\mathcal{Y}_2}\mathcal{X}_{2,3} = (X_{1,2}\cup_{Y_2}X_{2,3},\ g_{1,2}\cup g_{2,3},\ S_{1,2}\cup_\sigma S_{2,3}). \qquad (3.6)$$
This “sewing together” of bundles is defined in the usual way. The crucial point is that all geometric data in the collar is pulled back from the boundary, and is hence compatible under sewing of manifolds. Briefly, the collar neighbourhood $U_r = [0,1)\times Y_2$ of the boundary component $Y_2$ in $X_{2,3}$ is a copy of the collar neighbourhood $U_l = (-1,0]\times Y_2$ of the boundary component $Y_2$ in $X_{1,2}$, but with orientation reversed. Hence we may glue together
the manifolds $X_{1,2}$ and $X_{2,3}$ along $Y_2$ to get the “doubled” manifold $X_{1,2}\cup_{Y_2}X_{2,3}$. The partition $Y_2$ has a tubular neighbourhood which we may parameterize as $U = (-1,1)\times Y_2$. Associated to the geometric data we have Dirac operators $D_{1,2}$ and $D_{2,3}$, acting respectively on sections of the Clifford bundles $S_{1,2}$ and $S_{2,3}$. Over $U_l$ the operator $D_{1,2}$ takes the product form $\sigma(\partial/\partial u + D_{Y_2})$. Because of the change of orientation, and hence of chirality, $(D_{2,3})|_{U_r} = \sigma^{-1}(\partial/\partial v + \sigma D_{Y_2}\sigma^{-1})$. Over $Y_2$ we construct $S_{1,2}\cup_\sigma S_{2,3}$ by gluing $S_{1,2}$ to $S_{2,3}$ via the unitary isomorphism $\sigma$, identifying $s\in(S_{1,2})|_Y$ with $\sigma s\in(S_{2,3})|_Y$. (Thus $\sigma$ takes positive spinors to negative spinors.) A section of $S_{1,2}\cup_\sigma S_{2,3}$ is a pair $(\psi,\phi)$, with $\psi$ (resp. $\phi$) a smooth section of $S_{1,2}$ (resp. of $S_{2,3}$), such that the normal derivatives of all orders match up:
$$\frac{\partial^k}{\partial u^k}\psi(0,y) = (-1)^k\sigma(y)\frac{\partial^k}{\partial u^k}\phi(0,y).$$
We then have the Dirac-type operator $(D_{1,2}\cup D_{2,3})(\psi,\phi) = (D_{1,2}\psi,D_{2,3}\phi)$ acting on $C^\infty(X_{1,2}\cup_{Y_2}X_{2,3},S_{1,2}\cup_\sigma S_{2,3})$, associated to the induced geometric data on $X_{1,2}\cup_{Y_2}X_{2,3}$.

3.4. The category $\mathcal{C}_{Gr}$. An element of $\mathrm{Ob}(\mathcal{C}_{Gr})$ is a pair $(H,W)$ with $H$ a Hilbert space and $W$ a polarization of $H$ into a pair of closed orthogonal infinite-dimensional subspaces $H = W\oplus W^\perp$. A morphism $(E,\epsilon)\in\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_1,W_1^\perp),(H_2,W_2))$ consists of two pieces of data:

- a closed subspace $E\subset\bar H_1\oplus H_2$ such that $P_E - P_{W_1^\perp\oplus W_2}$ is a Hilbert–Schmidt operator, where $P_E$, $P_{W_1^\perp\oplus W_2}$ are the orthogonal projections with range $E$, $W_1^\perp\oplus W_2$ respectively;
- an element $\epsilon$ of the relative determinant line $\mathrm{Det}(W_1^\perp\oplus W_2,E)$.

(It is convenient here to use the “reverse” polarization $W_1^\perp$ of $H_1$ in order to account for boundary orientations later on; see below.) Thus there is an identification
$$\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_1,W_1^\perp),(H_2,W_2)) = \mathrm{DET}_{W_1^\perp\oplus W_2}, \qquad (3.7)$$
where the right-side is the determinant line bundle based at $W_1^\perp\oplus W_2$ over the trace-class Grassmannian $\mathrm{Gr}(\bar H_1\oplus H_2)$. Here $\bar H_1$ serves to remind us that we are considering the reverse polarization $W_1^\perp$. If we wish to emphasize the polarization, we may write
$$\mathrm{Gr}(\bar H_1\oplus H_2) = \mathrm{Gr}(\bar H_1\oplus H_2,W_1^\perp\oplus W_2),\qquad\mathrm{DET}_{W_1^\perp\oplus W_2} = \mathrm{DET}_{W_1^\perp\oplus W_2}(\bar H_1\oplus H_2).$$
We also allow $\emptyset\in\mathrm{Ob}(\mathcal{C}_{Gr})$ as an object, and define
$$\mathrm{Mor}_{\mathcal{C}_{Gr}}(\emptyset,(H,W)) = \mathrm{DET}_W(H),\quad\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H,W^\perp),\emptyset) = \mathrm{DET}_{W^\perp}(\bar H),\quad\mathrm{Mor}_{\mathcal{C}_{Gr}}(\emptyset,\emptyset) = \mathbb{C}. \qquad (3.8)$$
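To define the product of morphisms in $\mathcal{C}_{Gr}$, first recall, for $H_0\neq\emptyset\neq H_2$, the “join” product rule from the “category of linear relations”:
$$\mathrm{Gr}(\bar H_0\oplus H_1,W_0^\perp\oplus W_1)\times\mathrm{Gr}(\bar H_1\oplus H_2,\widetilde W_1^\perp\oplus W_2)\longrightarrow\mathrm{Gr}(\bar H_0\oplus H_2,W_0^\perp\oplus W_2), \qquad (3.9)$$
$$(E_{01},E_{12})\longmapsto E_{01}*E_{12},$$
where $\widetilde W_1\in\mathrm{Gr}(H_1,W_1)$. The join is defined by
$$E_{01}*E_{12} = \{(u,v)\in H_0\oplus H_2\mid\exists\,w\in H_1\text{ such that }(u,w)\in E_{01},\ (w,v)\in E_{12}\}.$$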
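In finite dimensions the join (3.9) is ordinary composition of linear relations. The following sketch makes several simplifying assumptions:

- the spaces are finite-dimensional;
- relations are given by spanning columns;
- the conjugate structure on $\bar H_1$ and the polarizations are ignored;
- `join` and its helpers are hypothetical names.

It computes $E_{01}*E_{12}$ by solving for pairs of elements whose middle components $w$ agree. It then confirms that for graphs of $A: H_0\to H_1$ and $B: H_1\to H_2$, the join is the graph of $BA$.

```python
import numpy as np

def null_space(m, tol=1e-10):
    """Orthonormal basis (columns) of the kernel of m, via the SVD."""
    _, s, vh = np.linalg.svd(m)
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T

def orth(m, tol=1e-10):
    """Orthonormal basis of the column span of m."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, : int(np.sum(s > tol))]

def join(E01, dim0, E12, dim1):
    """E01 in H0 (+) H1 and E12 in H1 (+) H2 (spanning columns) -> basis of E01 * E12 in H0 (+) H2."""
    U, W = E01[:dim0], E01[dim0:]    # (u, w) components of E01
    W2, V = E12[:dim1], E12[dim1:]   # (w', v) components of E12
    # coefficient vectors (x, y) with W x = W2 y, i.e. matching middle components
    K = null_space(np.hstack([W, -W2]))
    x, y = K[: E01.shape[1]], K[E01.shape[1]:]
    return orth(np.vstack([U @ x, V @ y]))

rng = np.random.default_rng(5)
d0, d1, d2 = 2, 3, 2
A = rng.standard_normal((d1, d0))            # A: H0 -> H1
B = rng.standard_normal((d2, d1))            # B: H1 -> H2
graph_A = np.vstack([np.eye(d0), A])         # {(u, A u)}
graph_B = np.vstack([np.eye(d1), B])         # {(w, B w)}
J = join(graph_A, d0, graph_B, d1)
```

Each column $(u,v)$ of `J` satisfies $v = BAu$, and `J` has exactly $\dim H_0$ columns.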
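The join is a generalized composition law for graphs of linear operators. Here, however, the morphisms $E$ are not in general everywhere defined: $\mathrm{dom}(E) = \mathrm{range}(P_{H_1}P_E: E\to H_1)$, and $E$ may also be “multi-valued”. The composition may therefore be discontinuous. From [22] we recall that continuity requires two conditions:

(i) the map $E_{01}\oplus E_{12}\to H_1$, $((u,w),(w',v))\mapsto w-w'$, is surjective;

(ii) the map $E_{01}\oplus E_{12}\to H_0\oplus H_1\oplus H_2$, $((u,w),(w',v))\mapsto(u,w-w',v)$, is injective.

The crucial fact is the following.

Proposition 3.1. With the above notation, when $(H_0,W_0^\perp) = \emptyset = (H_2,W_2)$, there is a canonical pairing, linear and holomorphic on the fibres in the first and second variables,
$$\kappa:\mathrm{DET}_{W_1}\times\mathrm{DET}_{\widetilde W_1^\perp}\longrightarrow\mathrm{Det}(W_1,\widetilde W_1). \qquad (3.10)$$
If $W_1 = \widetilde W_1$, then
$$\kappa:\mathrm{DET}_{W_1}\times\mathrm{DET}_{W_1^\perp}\longrightarrow\mathbb{C}. \qquad (3.11)$$
More generally, if (i) and (ii) hold, then one has such a pairing
$$\kappa:\mathrm{DET}_{W_0^\perp\oplus W_1}(\bar H_0\oplus H_1)\times\mathrm{DET}_{\widetilde W_1^\perp\oplus W_2}(\bar H_1\oplus H_2)\longrightarrow\mathrm{DET}_{W_0^\perp\oplus W_2}(\bar H_0\oplus H_2)\otimes\mathrm{Det}(W_1,\widetilde W_1), \qquad (3.12)$$
which respects the join multiplication. On each fibre,
$$\kappa:\mathrm{Det}(W_0^\perp\oplus W_1,E_{01})\times\mathrm{Det}(\widetilde W_1^\perp\oplus W_2,E_{12})\longrightarrow\mathrm{Det}(W_0^\perp\oplus W_2,E_{01}*E_{12})\otimes\mathrm{Det}(W_1,\widetilde W_1). \qquad (3.13)$$
(Here the second factor on the right-side of (3.12) denotes the trivial bundle with fibre $\mathrm{Det}(W_1,\widetilde W_1)$.) If $W_1 = \widetilde W_1$, then
$$\kappa:\mathrm{DET}_{W_0^\perp\oplus W_1}(\bar H_0\oplus H_1)\times\mathrm{DET}_{W_1^\perp\oplus W_2}(\bar H_1\oplus H_2)\longrightarrow\mathrm{DET}_{W_0^\perp\oplus W_2}(\bar H_0\oplus H_2). \qquad (3.14)$$

Proof. As before, we denote by $P_{W,W'}$ the orthogonal projection onto $W$ restricted to the subspace $W'$. Let $E\in\mathrm{Gr}(H_1,W_1)$ and $E'\in\mathrm{Gr}(\bar H_1,\widetilde W_1^\perp) = \mathrm{Gr}(\bar H_1,W_1^\perp)$. We can represent elements $\epsilon\in\mathrm{Det}(W_1,E)$ and $\delta\in\mathrm{Det}(\widetilde W_1^\perp,E')$ as the determinant elements of linear operators $a_\epsilon: W_1\to E$ and $b_\delta:\widetilde W_1^\perp\to E'$, with $P_{W_1}a_\epsilon - \mathrm{id}_{W_1}$ and $P_{\widetilde W_1^\perp}b_\delta - \mathrm{id}_{\widetilde W_1^\perp}$ trace-class. We define
$$\kappa:\mathrm{Det}(W_1,E)\times\mathrm{Det}(\widetilde W_1^\perp,E')\longrightarrow\mathrm{Det}(W_1,\widetilde W_1) \qquad (3.15)$$
by
$$\kappa(\epsilon,\delta) = \det(P_1a_\epsilon + \widetilde P_1^\perp b_\delta)\in\mathrm{Det}(W_1\oplus\widetilde W_1^\perp,H_1)\cong\mathrm{Det}(W_1,\widetilde W_1). \qquad (3.16)$$
Here $P_1$, $\widetilde P_1$ are the projections onto $W_1$, $\widetilde W_1$, and $\mathrm{Det}(W_1\oplus\widetilde W_1^\perp,H_1)$ is the determinant line of $P_1 + \widetilde P_1^\perp: W_1\oplus\widetilde W_1^\perp\to H_1$. This operator differs from $P_1a_\epsilon + \widetilde P_1^\perp b_\delta$ by a trace-class operator, because $P_{W_1}a_\epsilon - \mathrm{id}_{W_1}$ and $P_{\widetilde W_1^\perp}b_\delta - \mathrm{id}_{\widetilde W_1^\perp}$ are trace-class. Hence (3.15) is well-defined.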
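The canonical isomorphism on the right-side of (3.16) is expressed via the commutative diagram with exact rows and Fredholm columns
$$\begin{array}{ccccccccc}
0 & \to & W_1 & \to & W_1\oplus\widetilde W_1^\perp & \to & \widetilde W_1^\perp & \to & 0\\
 & & \big\downarrow{\scriptstyle\widetilde P_1} & & \big\downarrow{\scriptstyle P_1+\widetilde P_1^\perp} & & \big\downarrow{\scriptstyle\mathrm{id}} & & \\
0 & \to & \widetilde W_1 & \to & H_1 & \to & \widetilde W_1^\perp & \to & 0
\end{array}$$
where the horizontal maps are the obvious ones. Such a diagram defines an isomorphism between the determinant line of the centre map and the tensor product of the lines defined by the outer columns, mapping the determinant elements to each other [22, 20]. Since $\mathrm{Det}(\mathrm{id}) = \mathbb{C}$ canonically, the isomorphism follows. With $E = W_1$ and $E' = \widetilde W_1^\perp$ we obtain
$$\kappa(\det(\mathrm{id}_{W_1}),\det(\mathrm{id}_{\widetilde W_1^\perp})) = \det(P_{\widetilde W_1,W_1}), \qquad (3.17)$$
where $\mathrm{id}_W = P_{W,W}$. This identity will be relevant later in this section.

For the general case (3.12), suppose initially that $W_1 = \widetilde W_1$. Choose $\epsilon\in\mathrm{Det}(W_0^\perp\oplus W_1,E_{01})$ and $\delta\in\mathrm{Det}(W_1^\perp\oplus W_2,E_{12})$, identified with the determinant elements of linear operators $a_\epsilon: W_0^\perp\oplus W_1\to E_{01}$ and $b_\delta: W_1^\perp\oplus W_2\to E_{12}$. Define
$$\kappa_1:\mathrm{Det}(W_0^\perp\oplus W_1,E_{01})\times\mathrm{Det}(W_1^\perp\oplus W_2,E_{12})\longrightarrow\mathrm{Det}(W_0^\perp\oplus H_1\oplus W_2,E_{01}\oplus E_{12}), \qquad (3.18)$$
$$\kappa_1(\epsilon,\delta) = \det(a_\epsilon\oplus b_\delta).$$
On the other hand, from [22], conditions (i) and (ii) mean that there is an exact sequence
$$0\longrightarrow E_{01}*E_{12}\longrightarrow E_{01}\oplus E_{12}\longrightarrow H_1\longrightarrow 0, \qquad (3.19)$$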
and this fits into the commutative diagram with Fredholm columns 0 −−−−→ E01 ∗ E12 −−−−→ P ⊥ PE ∗E
W0 ⊕W2 01 12
E01 ⊕ E12
G
−−−−→ H1 −−−−→ 0
id
0 −−−−→ W0⊥ ⊕ W2 −−−−→ W0⊥ ⊕ H1 ⊕ W2 −−−−→ H1 −−−−→ 0 where we modify (3.19) by composing the injection E01 ∗ E12 −→ E01 ⊕ E12 with the involution ((u, w), (w , v) → ((u, w), (−w , v), and the following surjection to ((u, w), (w , v) → (u, w + w , v), while the lower maps are again the obvious ones. The central column is G(ξ, η) = (PW ⊥ PH0 ξ, PH1 ξ + PH1 η, PW2 PH1 η). 0
Because PH1 PE01 − PW1 PH1 PE01 = PH1 (PE01 − PW ⊥ ⊕W1 )PE01 and PH1 PE12 − 0 PW ⊥ PH1 PE12 = PH1 (PE12 − PW ⊥ ⊕W2 )PE12 are trace-class, the operators G and GW1 , 1 1 where ⊥ ⊥ GW1 (ξ, η) = (PW P ξ, PW1 PH1 ξ + PW P η, PW2 PH1 η), 0 H0 1 H1
differ only by trace-class operators. Therefore $\mathrm{Det}(G) = \mathrm{Det}(G_{W_1}) = \mathrm{Det}(E_{01}\oplus E_{12},W_0^\perp\oplus H_1\oplus W_2)$, while from the diagram we have $\mathrm{Det}(G)\cong\mathrm{Det}(E_{01}*E_{12},W_0^\perp\oplus W_2)$. Thus by duality (i.e. taking adjoints in the above diagrams, reversing the order of the columns and rows and the direction of the arrows) we have a canonical isomorphism $\mathrm{Det}(W_0^\perp\oplus W_2,E_{01}*E_{12})\cong\mathrm{Det}(W_0^\perp\oplus H_1\oplus W_2,E_{01}\oplus E_{12})$. Composition with $\kappa_1$ then completes the proof of (3.12) in the case $W_1 = \widetilde W_1$. In the general case, replace $H_1$ in (3.18), and in the lower row of the commutative diagram, by $W_1\oplus\widetilde W_1^\perp$, and repeat the argument used in the proof of (3.10). Finally, we note for later reference the “vacuum case” $E_{01} = W_0^\perp\oplus W_1$ and $E_{12} = W_1^\perp\oplus W_2$. There one has $E_{01}*E_{12} = W_0^\perp\oplus W_2$ and
$$\kappa(\det(\mathrm{id}_{E_{01}}),\det(\mathrm{id}_{E_{12}})) = \det(\mathrm{id}_{E_{01}*E_{12}}). \qquad (3.20)$$
□
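The vacuum identity (3.17) has a transparent finite-dimensional analogue. In the sketch below, $W_1$ and $\widetilde W_1$ are $k$-dimensional subspaces of $H_1=\mathbb{C}^n$ with orthonormal bases; the names are hypothetical and the setting is our simplification. In these bases, $P_1+\widetilde P_1^\perp$ on $W_1\oplus\widetilde W_1^\perp$ is the map $(x,y)\mapsto x+y$, and its determinant agrees in modulus with $\det P_{\widetilde W_1,W_1} = \det(\widetilde Q^*Q)$. The phase depends on the choice of bases; this is exactly what the canonical isomorphism of determinant lines keeps track of. For $\widetilde W_1 = W_1$ both moduli equal 1, matching the normalization $(\nu_{W_1},\nu_{W_1^\perp}) = 1$ in (3.31).

```python
import numpy as np

def orthonormal_columns(m):
    """Orthonormal basis of the column span of a full-rank matrix (reduced QR)."""
    q, _ = np.linalg.qr(m)
    return q

def vacuum_pairing_moduli(Q, Qt, Rt):
    """Q: basis of W1; Qt, Rt: bases of W1~ and of its orthogonal complement.

    Returns |det| of (x, y) -> x + y on W1 (+) W1~^perp, and |det(Qt^* Q)| = |det P_{W1~, W1}|.
    """
    M = np.hstack([Q, Rt])
    return abs(np.linalg.det(M)), abs(np.linalg.det(Qt.conj().T @ Q))

rng = np.random.default_rng(4)
n, k = 6, 3

def rand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

Q = orthonormal_columns(rand(n, k))    # W1
U = orthonormal_columns(rand(n, n))    # unitary; first k columns span W1~
lhs, rhs = vacuum_pairing_moduli(Q, U[:, :k], U[:, k:])
```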
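From (3.14) and the identification (3.7) we now have a canonical multiplication
$$\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_0,W_0^\perp),(H_1,W_1))\times\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_1,W_1^\perp),(H_2,W_2))\longrightarrow\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_0,W_0^\perp),(H_2,W_2)), \qquad (3.21)$$
$$((E_{0,1},\epsilon),(E_{1,2},\delta))\longmapsto(E_{0,1}*E_{1,2},\ \epsilon*\delta),$$
where
$$\epsilon*\delta := \begin{cases}\kappa(\epsilon,\delta) & \text{if (i) and (ii) hold},\\ 0 & \text{otherwise}.\end{cases}$$
In particular,
$$\mathrm{Mor}_{\mathcal{C}_{Gr}}(\emptyset,(H_1,W_1))\times\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_1,W_1^\perp),\emptyset)\longrightarrow\mathrm{Mor}_{\mathcal{C}_{Gr}}(\emptyset,\emptyset) \qquad (3.22)$$
is precisely Eq. (3.11).

3.5. The projective functor $\mathcal{C}_d\to\mathcal{C}_{Gr}$. Define
$$C:\mathrm{Ob}(\mathcal{C}_d)\longrightarrow\mathrm{Ob}(\mathcal{C}_{Gr}),\qquad(\mathcal{Y},W)\longmapsto(H_Y,W), \qquad (3.23)$$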
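where as before $H_Y = L^2(Y,S_Y\otimes\xi_Y)$ and $W$ is an admissible polarization. On morphisms, for $(\mathcal{Y}_1,W_1),(\mathcal{Y}_2,W_2)\in\mathrm{Ob}(\mathcal{C}_d)$, define
$$C:\mathrm{Mor}_{\hat{\mathcal{C}}_d}((\mathcal{Y}_1,W_1),(\mathcal{Y}_2,W_2))\longrightarrow\mathrm{Mor}_{\hat{\mathcal{C}}_{Gr}}((\bar H_{Y_1},W_1^\perp),(H_{Y_2},W_2)),\qquad\mathcal{X}\longmapsto(K_{12},\epsilon). \qquad (3.24)$$
Here $K_{12}\subset\bar H_{Y_1}\oplus H_{Y_2}$ is the Calderon subspace of boundary ‘traces’ of solutions to the Dirac operator $D_{1,2}$ over $X$ defined by the geometric data in $\mathcal{X}_{1,2}$, and $\epsilon\in\mathrm{Det}(W_1^\perp\oplus W_2,K_{12})\cong\mathrm{Det}(D^{1,2}_{P^\perp_{W_1},W_2})^*$. Taking into account that $Y_1$ is an incoming boundary, we have $K_{12}\in\mathrm{Gr}(\bar H_{Y_1}\oplus H_{Y_2};W_1^\perp\oplus W_2)$ (in fact, an element of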
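the ‘smooth Grassmannian’). Because the element $\epsilon$ must be chosen, $C$ is a true functor $\hat{\mathcal{C}}_d\to\hat{\mathcal{C}}_{Gr}$. Here $\hat{\mathcal{C}}_d$ is the extension category of $\mathcal{C}_d$ whose objects are the same as those of $\mathcal{C}_d$, and
$$\mathrm{Mor}_{\hat{\mathcal{C}}_d}(\mathcal{Y}_1,\mathcal{Y}_2) = \{(\mathcal{X},\epsilon)\mid\mathcal{X}\in\mathrm{Mor}_{\mathcal{C}_d}(\mathcal{Y}_1,\mathcal{Y}_2),\ \epsilon\in\mathrm{Det}(W_1^\perp\oplus W_2,K_{12})\}. \qquad (3.25)$$
For a closed geometric cobordism $\mathcal{X}\in\mathrm{Mor}_{\mathcal{C}_d}(\emptyset,\emptyset)$ we set
$$\mathrm{Mor}_{\hat{\mathcal{C}}_d}(\emptyset,\emptyset) = \mathrm{Det}(D_X). \qquad (3.26)$$
In this case the projectivity of the functor corresponds to a choice of generator for $\mathrm{Det}(D_X)$, defining $\mathrm{Det}(D_X)\cong\mathbb{C} = \mathrm{Mor}_{\mathcal{C}_{Gr}}(\emptyset,\emptyset)$. To see that the functor respects the product rules in each category, it is enough to show that $K_{01}*K_{12}$ is the Calderon subspace of the operator $D_{0,1}\cup D_{1,2}$ defined by the morphisms $\mathcal{X}_{0,1}$, $\mathcal{X}_{1,2}$, i.e. $K(D_{0,1}\cup D_{1,2}) = K(D_{0,1})*K(D_{1,2})$. This, however, is immediate from the definition of $D_{0,1}\cup D_{1,2}$. Given $\psi\in\mathrm{Ker}\,D_{0,1}$ and $\phi\in\mathrm{Ker}\,D_{1,2}$, it is enough for their boundary values to match up in order to get an element of $\mathrm{Ker}(D_{0,1}\cup D_{1,2})$. That in turn follows because the product geometry in the collar neighbourhood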
$U$ of the outgoing boundary $Y_1$ of $X_{0,1}$ implies that $\psi$ has the form $\psi(u,y) = \sum_k e^{-\lambda_ku}\psi_k(0)e_k(y)$, where $\{\lambda_k,e_k\}$ is a spectral resolution of $H_{Y_1}$ defined by the boundary Dirac operator. (To be quite correct, we should also include the identification by the boundary isomorphism $\sigma(y)$ in the definition of the join $K_{01}*K_{12}$, but this introduces no new phenomena.) Thus the requirement (3.3) for the rule $\mathcal{X}\mapsto\mathrm{Det}(W_1^\perp\oplus W_2,K_{12})$ to define a projective extension of $\mathcal{C}_{Gr}$ is precisely (3.14) of Proposition 3.1.

3.6. The functor $\mathcal{C}_{Gr}\to\mathcal{C}_{vect}$. The functor $M$ from the category $\mathcal{C}_{Gr}$ to the category $\mathcal{C}_{vect}$ of ($\mathbb{Z}$-graded) vector spaces and linear maps is defined on objects of $\mathcal{C}_{Gr}$ by
$$(H,W)\longmapsto\mathcal{F}_W = \mathcal{F}_W(H),\qquad(\bar H,W^\perp)\longmapsto\mathcal{F}_{W^\perp} = \mathcal{F}_{W^\perp}(\bar H),\qquad\emptyset\longmapsto\mathbb{C}. \qquad (3.27)$$
Thus $M$ takes a polarized vector space to the Fock space defined by the polarization, and $\mathcal{F}_\emptyset = \mathbb{C}$ by fiat. Here $\mathcal{F}_{W^\perp} = \mathcal{F}_{W^\perp}(\bar H) := \Gamma_{hol}(\mathrm{Gr}(\bar H),\mathrm{DET}_{W^\perp})$ is the Fock space associated with the reverse polarization.

$M$ is defined on morphisms as follows. By (3.7), a morphism in $\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_1,W_1^\perp),(H_2,W_2))$ is the same thing as an element $\epsilon\in\mathrm{DET}_{W_1^\perp\oplus W_2}$. We may think of it as the pair $(E,\epsilon)$ where $\epsilon\in\mathrm{Det}(W_1^\perp\oplus W_2,E)$. By the Plücker embedding (2.22), this gives us a canonical vector
$$\phi_\epsilon\in\mathcal{F}_{W_1^\perp\oplus W_2}(\bar H_1\oplus H_2)\cong\mathcal{F}_{W_1^\perp}(\bar H_1)\otimes\mathcal{F}_{W_2}(H_2). \qquad (3.28)$$
The isomorphism is immediate from (2.31), since $\mathcal{F}_W(H)$ is the completion of $F(H,W)$. To proceed we need the following facts, generalizing Eq. (8.10) of [22]:
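Proposition 3.2. The determinant bundle pairing $\kappa$ of Proposition 3.1 defines a canonical Fock space pairing
$$(\ ,\ ):\mathcal{F}_{W_1}\times\mathcal{F}_{\widetilde W_1^\perp}\longrightarrow\mathrm{Det}(W_1,\widetilde W_1), \qquad (3.29)$$
with
$$(\nu_{W_1},\nu_{\widetilde W_1^\perp}) = \det(P_{\widetilde W_1,W_1}). \qquad (3.30)$$
If $W_1 = \widetilde W_1$, this becomes
$$(\ ,\ ):\mathcal{F}_{W_1}\times\mathcal{F}_{W_1^\perp}\longrightarrow\mathbb{C},\qquad(\nu_{W_1},\nu_{W_1^\perp}) = 1. \qquad (3.31)$$
More generally, $\kappa$ defines a pairing
$$\mathcal{F}_{W_0^\perp\oplus W_1}(\bar H_0\oplus H_1)\times\mathcal{F}_{\widetilde W_1^\perp\oplus W_2}(\bar H_1\oplus H_2)\longrightarrow\mathcal{F}_{W_0^\perp\oplus W_2}(\bar H_0\oplus H_2)\otimes\mathrm{Det}(W_1,\widetilde W_1), \qquad (3.32)$$
with
$$(\phi_\epsilon,\phi_\delta) = \phi_{\kappa(\epsilon,\delta)}. \qquad (3.33)$$
In particular,
$$(\nu_{W_0^\perp\oplus W_1},\nu_{W_1^\perp\oplus W_2}) = \nu_{W_0^\perp\oplus W_2}\otimes\det(P_{W_2,W_0}). \qquad (3.34)$$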
Proof. First notice that in the finite-dimensional case there is a natural isomorphism between the Fock space (the exterior algebra) and its dual. It is defined by the pairing $\Lambda^kH\times\Lambda^{n-k}H\to\mathrm{Det}(H)$, $(\lambda_1,\lambda_2)\mapsto\lambda_1\wedge\lambda_2$. In the infinite-dimensional case, the pairing using the CAR construction follows directly from the definition $F(H,W) = \Lambda(W)\otimes\Lambda((W^\perp)^*)$. For the geometric Fock space $\mathcal{F}_W$, the construction of the pairing from the determinant bundle pairing $\kappa$ on $\mathrm{DET}_W\times\mathrm{DET}_{W^\perp}$ is entirely analogous to the construction of the inner product $\langle\,,\rangle_W$ on $\mathcal{F}_W$ from the determinant bundle pairing $g_\phi$ on $\mathrm{DET}_W\times\mathrm{DET}_W$ in Eq. (2.21). Indeed, in the case of the vacuum elements the two pairings are canonically identified (see (3.36) below and Sect. 5).

Let us deal first with the case (3.29). We first give the invariant definition, and then the “constructive” definition along the lines of $\langle\,,\rangle_W$ in Sect. 2.

Invariantly, in the case $W_1 = \widetilde W_1$, the pairing $\kappa:\mathrm{DET}_{W_1}\times\mathrm{DET}_{W_1^\perp}\to\mathbb{C}$ defines an embedding $\gamma:\mathrm{DET}_{W_1}-0\to\mathcal{F}^*_{W_1^\perp}$ by $\gamma(a)(\cdot) = \kappa(a,\cdot)$. Hence it defines a map $\rho:\mathcal{F}^{**}_{W_1^\perp}\to\mathcal{F}_{W_1}$, $\rho(f)(\cdot) = f(\gamma(\cdot))$. This gives us a pairing $\mathcal{F}_W\times\mathcal{F}_{W^\perp}\to\mathbb{C}$ with $(f,g) = f(\gamma(g))$, and by duality the asserted pairing, since $\mathcal{F}_W^{**}\cong\mathcal{F}_W$ in the topology of uniform convergence on compact subsets of $\mathrm{Gr}(H)$ ($\psi_S\leftrightarrow$ evaluation at $\xi(S)$, cf. [17], Sect. 10.2; [12], Sect. 6.2). The general case follows in the same way, with $\mathbb{C}$ replaced by $\mathrm{Det}(\widetilde W_1,W_1)$.

Constructively, recall that any section in $\mathcal{F}_W$ can be written as a linear combination of the $\psi_{[\alpha,\lambda]}$, with $[\alpha,\lambda]\in\mathrm{DET}_W$. Hence for $[\alpha,\lambda]\in\mathrm{DET}_{W_1}$ and $[\beta,\mu]\in\mathrm{DET}_{\widetilde W_1^\perp}$ we can define the Fock pairing by setting
$$(\psi_{[\alpha,\lambda]},\psi_{[\beta,\mu]}) = \kappa([\alpha,\lambda],[\beta,\mu])\in\mathrm{Det}(\widetilde W_1,W_1), \qquad (3.35)$$
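and then extending by linearity. In particular, by (2.25) we have $\nu_{W_1} = \psi_{[\mathrm{id}_{W_1},1]}$ and $\nu_{\widetilde W_1^\perp} = \psi_{[\mathrm{id}_{\widetilde W_1^\perp},1]}$, and so
$$(\nu_{W_1},\nu_{\widetilde W_1^\perp}) = \kappa(\det(\mathrm{id}_{W_1}),\det(\mathrm{id}_{\widetilde W_1^\perp})) = \det(P_{\widetilde W_1,W_1}),$$
where the final equality is Eq. (3.17). Notice further that $g_\phi$ in (2.21) extends to a map $g_\phi:\mathrm{DET}_{W_1}\times\mathrm{DET}_{\widetilde W_1}\to\mathrm{Det}(\widetilde W_1,W_1)$ by $g_\phi([\alpha,\lambda],[\beta,\mu]) = \bar\lambda\mu\det(\alpha^*P_W\beta)$. With this extension, the Fock space inner product becomes a Hermitian pairing $\langle\,,\rangle_{W_1}:\mathcal{F}_{W_1}\times\mathcal{F}_{\widetilde W_1}\to\mathrm{Det}(\widetilde W_1,W_1)$. With respect to the identification $\mathrm{Gr}(H,W_1)\leftrightarrow\mathrm{Gr}(\bar H,W_1^\perp)$, $W\leftrightarrow W^\perp$, we then have $\nu_{\widetilde W_1^\perp}\leftrightarrow\nu_{\widetilde W_1}$ and
$$(\nu_{W_1},\nu_{\widetilde W_1^\perp}) = \langle\nu_{W_1},\nu_{\widetilde W_1}\rangle_{W_1} = \det(P_{\widetilde W_1,W_1}). \qquad (3.36)$$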
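The pairing (3.32) now follows from (3.29) and (3.28). Alternatively, we can define it directly as $(\psi_{[\alpha,\lambda]},\psi_{[\beta,\mu]}) = \psi_{\kappa([\alpha,\lambda],[\beta,\mu])}$, where $\kappa$ is the pairing (3.12). Note that if conditions (i) and (ii) do not hold, then $\kappa([\alpha,\lambda],[\beta,\mu]) = 0$. Equation (3.33) now holds by construction, and Eq. (3.34) follows easily from (3.20). □

The Fock space pairing (3.31) defines an isomorphism $\mathcal{F}_{W_1^\perp}\cong\mathcal{F}^*_{W_1}$. Hence the vector $\phi_\epsilon\in\mathcal{F}_{W_1^\perp}(\bar H_1)\otimes\mathcal{F}_{W_2}(H_2)$ defined by $\epsilon\in\mathrm{Mor}_{\mathcal{C}_{Gr}}((\bar H_1,W_1^\perp),(H_2,W_2))$ is canonically an element of $\mathrm{Hom}(\mathcal{F}_{W_1}(H_1),\mathcal{F}_{W_2}(H_2))$, which is a morphism of $\mathcal{C}_{vect}$, as required. In the case $(\bar H_0,W_0^\perp) = \emptyset$, the map $\mathrm{Mor}_{\mathcal{C}_{Gr}}(\emptyset,(H_1,W_1))\to\mathrm{Hom}(\mathbb{C},\mathcal{F}_{W_1})$ is defined by $1\mapsto\nu_W$, and similarly when $(H_2,W_2) = \emptyset$. The functoriality of the composition of linear maps with respect to the multiplication in $\mathcal{C}_{Gr}$ is precisely (3.33). We may state this as follows.

Theorem 3.3. The category multiplication in $\mathcal{C}_{Gr}$ induces, through the Fock space functor, a canonical multiplication in the category $\mathcal{C}_{vect}$. This can be conveniently summarized in the statement that the following diagram commutes:
$$\begin{array}{ccc}
\mathrm{Gr}(\bar H_0\oplus H_1,W_0^\perp\oplus W_1)\times\mathrm{Gr}(\bar H_1\oplus H_2,W_1^\perp\oplus W_2) & \xrightarrow{\ *\ } & \mathrm{Gr}(\bar H_0\oplus H_2,W_0^\perp\oplus W_2)\\
\big\downarrow{\scriptstyle(\epsilon,\delta)} & & \big\downarrow{\scriptstyle\epsilon*\delta}\\
\mathrm{DET}_{W_0^\perp\oplus W_1}\times\mathrm{DET}_{W_1^\perp\oplus W_2} & \xrightarrow{\ \kappa\ } & \mathrm{DET}_{W_0^\perp\oplus W_2}\\
\big\downarrow{\scriptstyle\text{Plücker}} & & \big\downarrow{\scriptstyle\text{Plücker}}\\
\mathcal{F}_{W_0^\perp}\otimes\mathcal{F}_{W_1}\otimes\mathcal{F}_{W_1^\perp}\otimes\mathcal{F}_{W_2} & \xrightarrow{\ (\,,\,)\ } & \mathcal{F}_{W_0^\perp}\otimes\mathcal{F}_{W_2}
\end{array}$$
Here $\epsilon$, $\delta$ are, respectively, a choice of section of the bundles $\mathrm{DET}_{W_0^\perp\oplus W_1}$ and $\mathrm{DET}_{W_1^\perp\oplus W_2}$.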
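The finite-dimensional pairing $\Lambda^kH\times\Lambda^{n-k}H\to\mathrm{Det}(H)$ used at the start of the proof of Proposition 3.2 can be written in Plücker coordinates. For decomposable vectors $\lambda_1 = v_1\wedge\dots\wedge v_k$ and $\lambda_2 = w_1\wedge\dots\wedge w_{n-k}$, the coefficient of $e_1\wedge\dots\wedge e_n$ in $\lambda_1\wedge\lambda_2$ is a signed sum of products of complementary minors (the generalized Laplace expansion). The sketch below (hypothetical names) checks that this sum equals $\det[V\,|\,W]$. It illustrates how a pairing of Fock space vectors reduces to a determinant.

```python
import itertools
import numpy as np

def plucker(basis, rows):
    """Plücker coordinate: the minor of an n x k basis matrix on the given rows."""
    return np.linalg.det(basis[list(rows), :])

def wedge_pairing(V, Wm):
    """Coefficient of e_1 ^ ... ^ e_n in (v_1 ^ ... ^ v_k) ^ (w_1 ^ ... ^ w_{n-k}),
    computed from Plücker coordinates by the generalized Laplace expansion."""
    n, k = V.shape
    total = 0.0
    for S in itertools.combinations(range(n), k):
        comp = [i for i in range(n) if i not in S]
        sign = (-1) ** (sum(S) + k * (k - 1) // 2)  # sign of the shuffle (S, complement)
        total += sign * plucker(V, S) * plucker(Wm, comp)
    return total

rng = np.random.default_rng(3)
V = rng.standard_normal((5, 2))    # basis of a 2-plane: lambda_1 in Lambda^2 H
Wm = rng.standard_normal((5, 3))   # basis of a 3-plane: lambda_2 in Lambda^3 H
```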
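3.7. The Fock Functor. The Fock functor $Z:\mathcal{C}_d\to\mathcal{C}_{vect}$ is the projective representation of $\mathcal{C}_d$ defined by the composition of the functors $C$ and $M$. Thus $Z$ is the functor
$$Z = M\circ C:\mathcal{C}_d\longrightarrow\mathcal{C}_{vect}. \qquad (3.37)$$
$Z$ acts on objects of $\mathcal{C}_d$ by
$$Z:\mathrm{Ob}(\mathcal{C}_d)\longrightarrow\mathrm{Ob}(\mathcal{C}_{vect}),\qquad Z((\mathcal{Y},W)) = \mathcal{F}_W(H_Y),\qquad Z(\emptyset) = \mathbb{C}, \qquad (3.38)$$
and on morphisms by
$$Z:\mathrm{Mor}_{\mathcal{C}_d}((\mathcal{Y}_1,W_1),(\mathcal{Y}_2,W_2))\longrightarrow\mathrm{Mor}_{\mathcal{C}_{vect}}(\mathcal{F}_{W_1}(H_{Y_1}),\mathcal{F}_{W_2}(H_{Y_2})),\qquad Z((\mathcal{X},\epsilon)) = \phi_\epsilon,\quad\epsilon\in\mathrm{Det}(W_1^\perp\oplus W_2,K(D_X)), \qquad (3.39)$$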
where $(\mathcal{Y}_1,W_1)$, $(\mathcal{Y}_2,W_2)$ are not both empty, and $\phi_\epsilon$ is defined as in Sect. 3.6 by the Fock space pairing. If $(\mathcal{Y}_1,W_1) = \emptyset = (\mathcal{Y}_2,W_2)$, so that $X$ is a closed manifold, then $Z(\mathcal{X}) = \det(D_X)\in\mathrm{Det}(D_X)\cong\mathbb{C}$, where the trivialization requires a choice. The “sewing property” of the FQFT is precisely the functorial Fock space pairing of Proposition 3.2. Note that if both $(\mathcal{Y}_i,W_i)\neq\emptyset$, it is not possible to choose $\phi_\epsilon$ to be the vacuum vector $\nu_{W_1^\perp\oplus W_2}\in\mathcal{F}_{W_1^\perp\oplus W_2}$. The reason is that $K(D_X)$, which depends on global data, is always transverse to the pure boundary data $W_1^\perp\oplus W_2$.

Consider, though, the case $(\mathcal{Y}_1,W_1) = \emptyset$. Let $X$ be a closed connected manifold partitioned by an embedded codimension 1 submanifold $Y$, so that $X = X^0\cup_YX^1$. Here $X^0$, $X^1$ are manifolds with boundary $Y$, where $\partial X^0 = Y$ has outgoing orientation and $\partial X^1 = Y$ has incoming orientation. $X^0$ is assumed to be associated to a morphism $\mathcal{X}^0$ in $\mathrm{Mor}_{\mathcal{C}_d}(\emptyset,(Y,W))$ for a choice of admissible polarization $W\in\mathrm{Gr}(H_Y)$. In this case we can choose $W = K(D^0)$ and $\phi_\epsilon = \nu_{K(D^0)}$. Similarly, we have $\mathcal{X}^1\in\mathrm{Mor}_{\mathcal{C}_d}((Y,W^\perp),\emptyset)$, and we may choose $W^\perp = K(D^1)$. As a corollary of the properties of $Z$, we then have the following algebraic sewing law for the determinant with respect to a partitioned closed manifold.

Theorem 3.4. There are functorial bilinear pairings
$$(\ ,\ ):\mathcal{F}_{K(D^0)}(H_Y)\times\mathcal{F}_{W^\perp}(\bar H_Y)\longrightarrow\mathrm{Det}(D^0_{P_W}), \qquad (3.40)$$
where the right-side is the determinant line of the EBVP $D^0_{P_W}$, with
$$(\nu_{K(D^0)},\nu_{W^\perp}) = \det(D^0_{P_W}), \qquad (3.41)$$
and
$$(\ ,\ ):\mathcal{F}_{K(D^0)}(H_Y)\times\mathcal{F}_{K(D^1)}(\bar H_Y)\longrightarrow\mathrm{Det}(D_X), \qquad (3.42)$$
where the right-side is the determinant line of the Dirac operator $D_X$ over the closed manifold $X$, with
$$(\nu_{K(D^0)},\nu_{K(D^1)}) = \det(D_X). \qquad (3.43)$$
Proof. We just need to recall a couple of facts. From Eqs. (3.29) and (3.30) we have a pairing $\mathcal F_{K(D^0)}(H_Y)\times\mathcal F_{W^\perp}(H_{\bar Y})\to\mathrm{Det}(K(D^0), W)$ with $(\nu_{K(D^0)}, \nu_{W^\perp}) = \det(S(P_W))$, where $S(P_W): K(D^0)\to W$ is the operator of Sect. 2. But from (2.9) there is a canonical isomorphism $\mathrm{Det}(S(P_W))\cong\mathrm{Det}(D_{P_W})$, with $\det(S(P_W))\mapsto\det(D_{P_W})$. This proves the first statement. The second statement follows similarly upon recalling from [20] (Theorem 3.2) that there is a canonical isomorphism $\mathrm{DET}(I - P(D^1)\circ P(D^0))\cong\mathrm{Det}(D_X)$, again preserving the determinant elements. □

Thus one may think of the determinant $\det(D_P)$ as an object in the complex line $\mathrm{Det}(K(D), P)$ depending on a choice of boundary condition $P$, or, absolutely, of the "quantum determinant" of $D$ as a ray in the Fock space $\mathcal F_{K(D)}$ defined by the vacuum vector, which does not depend on a choice of $P$. The two viewpoints are related by (3.41).

Finally, we point out that, in particular, the Fock functor naturally defines a map from geometric fibrations to vector bundles. To a geometric fibration $N$ of closed $d$-dimensional manifolds endowed with a spectral section $P$ it assigns the corresponding Fock bundle $\mathcal F_P$. A "projective" morphism between objects $(N_1, P_1)$ and $(N_2, P_2)$ is a geometric fibration $M$ of $(d+1)$-dimensional manifolds with boundary $N_1\sqcup N_2$, along with a section of the determinant bundle $\mathrm{DET}(P_1^\perp\oplus P_2, K(D))$, where $D$ is the family of Dirac operators defined by $M$. This defines a bundle map $\mathcal F_{P_1}\to\mathcal F_{P_2}$ using the generalized Plücker embedding (2.34) and the Fock space pairing. For a partition of a closed geometric fibration $M = M^0\cup_N M^1$ over a parameter manifold $B$ by an embedded fibration of codimension-1 manifolds, the analogue of Theorem 3.4 then states that there are functorial Fock bundle pairings
$$(\,,\,):\mathcal F_{D^0}\times\mathcal F_{P^\perp}\to\mathrm{DET}(D^0, P),\qquad(\nu_{D^0}, \nu_{P^\perp}) = \det(D^0_P), \tag{3.44}$$
where the right side is the determinant line bundle of the family of EBVPs $(D^0, P)$, while $\nu_{D^0}$, $\nu_P$ are the vacuum sections of the Fock bundles $\mathcal F_{D^0}$, $\mathcal F_P$ (see (2.15)), and $\det(D^0_P)$ is the determinant section of $\mathrm{DET}(D^0, P)$; and
$$(\,,\,):\mathcal F_{D^0}\times\mathcal F_{D^1}\to\mathrm{Det}(D_M),\qquad(\nu_{D^0}, \nu_{D^1}) = \det(D_M). \tag{3.45}$$
The proof again requires only the properties of the Fock bundle pairing and the determinant bundle identifications of Sect. 2 and [20]. Notice that no regularization of the determinant is involved here, only a pairing between bundle sections.

4. Gauge Anomalies and the Fock Functor

In this section we give a physical application of these ideas: a Fock functor description of the chiral and commutator anomalies for an even-dimensional manifold with (odd-dimensional) boundary. The Fock functor assigns vector spaces to all odd-dimensional compact oriented spin manifolds $Y$ and polarizations. There is no further restriction on the topology of $Y$. However, in this section we shall restrict to a fixed topological type for $Y$. For our purposes this is no real restriction, since our principal aim is to understand the action of continuous symmetries (diffeomorphisms and gauge transformations) on the family of
Fock spaces and on the morphisms between the Fock spaces; the action of the symmetry group cannot change the topological type of $Y$. To be more concrete, we shall consider the case of the parameter space $B = \mathcal A$ of smooth vector potentials labeling the geometries over $Y$. Thus we are led to consider the action of the group of gauge transformations on the bundle $\mathcal F$ of Fock spaces over the base $\mathcal A$. The gauge transformations act naturally on the base $\mathcal A$, and thus we have a lifting problem: construct a (projective) action of the gauge group in the total space of $\mathcal F$ intertwining with the family of quantized Dirac Hamiltonians in the fibers. We want to stress that we are not going to construct a representation of the gauge group in a single Fock space; rather, we have a linear isometric action between different fibers of the Fock bundle.

First, we recall some known facts about gauge anomalies in even dimensions. Let $M$ be a closed even-dimensional Riemannian spin manifold and let $\mathcal A$ be the space of vector potentials on a trivial complex $G$-bundle $E$ over $M$. For each $A\in\mathcal A$ we have a coupled Dirac operator $D_A: C^\infty(M; S\otimes E)\to C^\infty(M; S\otimes E)$, given locally by
$$D_A = \sum_{i=1}^n\sigma_i(\partial_i + \Gamma_i + A_i),$$
where $\Gamma_i$ and $A_i$ are respectively the components of the local spin connection and of the $G$-connection $A$, and $\sigma_i$ are the Clifford matrices. Since $M$ is even-dimensional, $D_A$ splits into positive and negative chirality components, and the object of interest is the chiral Dirac operator
$$D_A^+ = D_A\,\frac{1+\gamma_{n+1}}{2}: C^\infty(M; S^+\otimes E)\to C^\infty(M; S^-\otimes E).$$
Acting on $\mathcal A$ we have the group of based gauge transformations $\mathcal G$, which acts covariantly on the Dirac operators, $D^\pm_{g.A} = g^{-1}D^\pm_A g$, so that $\mathrm{Ker}\,D^\pm_{g.A} = g^{-1}(\mathrm{Ker}\,D^\pm_A)$. We are interested in the fermionic path integral
$$Z(A) = \int_{C^\infty(M; S^+\otimes E)} e^{\int_M\psi^*D_A^+\psi\,dm}\,\mathcal D\psi\,\mathcal D\psi^*, \tag{4.1}$$
and a formal extension of finite-dimensional functional calculus gives $Z(A) := \det(D_A^+)$.
To obtain an unambiguous regularization of (4.1) we therefore require a gauge-covariant regularized determinant varying smoothly with $A$, so that $Z(A)$ pushes down smoothly to the moduli space $\mathcal A/\mathcal G$. In the case of Dirac fermions (both chirality sectors) this can be done, and there is a gauge-invariant regularized determinant $\det_{reg}(D_A)$. For chiral fermions, on the other hand, there is an obstruction due to the presence of zero modes of the Dirac operator. The covariance of the kernels means that the determinant line bundle descends to $\mathcal A/\mathcal G$, and the obstruction to the existence of a covariant $Z(A)$ varying smoothly with $A\in\mathcal A$ is the first Chern class of the determinant bundle on $\mathcal A/\mathcal G$, which is the topological chiral anomaly. A 2-form representative for the Chern class of $\mathrm{DET}_{D^+}$ can be constructed as the transgression of the 1-form $\omega_1\in\Omega^1(\mathcal G)$,
$$\omega_1(g) = \frac{1}{2\pi i}\,\frac{d\big(\det_r(D^+_{gA})\big)}{\det_r(D^+_{gA})},$$
measuring the obstruction to gauge covariance of a choice of regularized determinant $\det_r(D^+_A)$. For details see [2, 14].

In the case of a manifold $X$ with boundary $Y$ new complications arise. Fixing an elliptic boundary condition (spectral section) $P$ for the family of chiral Dirac operators $D^+ = \{D^+_A : A\in\mathcal A\}$, we obtain a Fock bundle $\mathcal F_P$ over $\mathcal A$ to which we aim to lift the $\mathcal G$ action. It is natural to look first at gauge transformations (or diffeomorphisms) which are trivial on the boundary. In fact, the calculation of the Chern class in [2] can be extended to this case using a version of the families index theory for a manifold with boundary [5, 16]. The gauge variation of the chiral determinant can be written as
$$\det_r(D^+_{g.A}) = \det_r(D^+_A)\,\omega(g; A), \tag{4.2}$$
where $\log\omega$ is an integral over $X$ of a local differential polynomial in $g$, $A$ and the metric on $X$; $\omega$ is the integrated version of the "infinitesimal" anomaly form $\omega_1$. The important point is that the formula applies both to manifolds with and without boundary. In fact, in the latter case it gives a direct way to define the determinant bundle over $\mathcal A/\mathcal G$ [13]. The locality of the anomaly (4.2) is compatible with the formal sewing formula (5.2). Applying a gauge transformation which is trivial on $Y$ to the right-hand side of that equation gives a gauge variation which is a product of gauge variations on the two halves $X_0$, $X_1$ of $M$. By locality of the logarithm, this product is equal to the gauge variation on $M$ of the path integral on the left-hand side. Since the cutting surface $Y$ is arbitrary, one can drop the requirement that $g$ is trivial on $Y$. The gauge transformations (and diffeomorphisms) which are not trivial on the boundary need a different treatment, because they act non-trivially on the boundary Fock spaces $\mathcal F_Y$. We shall concentrate on the case when $Y$ is odd-dimensional. The first question to ask is how the action of the gauge group on the parameter space $B$ of boundary geometries on $Y$ is lifted to the total space of the bundle of Fock spaces $\mathcal F\to B$. This problem has already been analyzed in the literature (leading to Schwinger terms in the Lie algebra of the group $\mathcal G$), but in the present article we want to clarify how the boundary action intertwines with the Fock functor construction.
4.1. Commutator anomaly on the boundary. Let $b\in B$ and $W\in Gr_b$. In the rest of this section $B$ denotes the space of metrics and vector potentials on a fixed manifold $Y$, and $Y_b$ is the manifold $Y$ equipped with the geometric data $b$. The pair $(b, W)$ is mapped to $(g.b, g.W)$ by a gauge transformation (or a diffeomorphism) $g$, acting on potentials, metrics and spinor fields. This induces a unitary map from the Fock space $\mathcal F(H_b, W)$ to $\mathcal F(H_{g.b}, g.W)$, by $a^*(u)\mapsto a^*(g\cdot u)$, and similarly for the annihilation operators. However, sometimes $b$ and $W$ do not appear independently: $W$ is given as a function of the boundary geometry, $b\mapsto W = W_b$, i.e. by a Grassmann section. This leads to the construction of the bundle of Fock spaces $\mathcal F_b$ parameterized by $b\in B$, as already mentioned above. An example of this situation is the following. Suppose the Dirac operators on the boundary do not have zero as an eigenvalue (this happens when massive fermions are coupled to vector potentials). Then it is natural to take $W_b = H_b^+$, the space of positive energy states. This case does not lead to any complications, because of the equivariance property $W_{g.b} = g.W_b$. However, there are cases when no equivariant choice of $W_b$ exists. This happens when we have massless chiral fermions coupled to gauge potentials. For some potentials there are always zero modes, and one
cannot take $W_b$ as the positive energy subspace without introducing discontinuities into the construction. Let us assume that a Grassmann section $W_b$ is given. For each boundary geometry $b$ we have a fermionic Fock space $\mathcal F_b = \mathcal F(H_b, W_b)$ determined by the polarization $H_{Y_b} = W_b\oplus W_b^\perp$. In order to determine the obstruction to lifting the gauge group action on $Y$ to the bundle of Fock spaces such that $g^{-1}D_{Y_b}g = D_{Y_{g.b}}$, we compare the action on $\mathcal F$ to the natural action in the case of polarizations $W'_b$ defined by the positive energy subspaces of the Dirac operators $D_{Y_b} - \lambda$. Here we have fixed a real parameter $\lambda$, and we consider only those boundary geometries $b\in B$ for which $\lambda$ is not an eigenvalue. Since the choice of polarizations $W'$ is equivariant, the gauge action lifts to the (local) Fock bundle $\mathcal F'$. Relative to $W'$, the $\mathcal F$ vacua form a complex line bundle $\mathrm{DET}(W, W')$; again, this is defined only locally in the parameter space.

Example 4.1. Let $Y$ be a unit circle with the standard metric but varying gauge potentials. We can choose $H_Y = W\oplus W^\perp$ as the fixed polarization defined by the decomposition into positive and negative Fourier modes. If the gauge group is $SU(n)$ and the fermions are in the fundamental representation of $SU(n)$, then the mapping $g\mapsto g\cdot W$ defines an embedding of the loop group $LSU(n)$ into the Hilbert–Schmidt Grassmannian $Gr_1(H_Y, W)$. The pullback of the Quillen determinant bundle over the Grassmannian to $LSU(n)$ defines the central extension of the loop group with level $k = 1$ [17].

There is a general method to describe the relative determinant bundle in terms of index theory on $X$ for $\partial X = Y$. We assume that the spin and gauge vector bundles on $Y$ can be smoothly continued to bundles on $X$. This is the case, for example, when $Y = S^{2n-1}$ and $X$ is chosen to have the topology of a solid ball, with a product metric near the boundary.
Any vector potential can be smoothly continued to a potential on $X$, for example as $A(x, r) = f(r)A(x)$, with $f$ increasing smoothly from zero to the value one at $r = 1$ and all derivatives of $f$ vanishing at $r = 0, 1$. We can now define a spectral section $b\mapsto W_b$ as the Calderon subspace associated to the continued metric and vector potential in the bulk; we denote by $D_{X,b}$ the Dirac operator in $X$ defined by these geometric data. By (2.7), the determinant line for the Dirac operator $D_{X,b}$ subject to the boundary condition $W'$ is canonically the tensor product of the line $\mathrm{DET}(W, W')$ and the determinant line of the same operator $D_{X,b}$ subject to the other choice of boundary condition, $W$. Since the spectral section $W_b$ and the Dirac operator $D_{X,b}$ are parameterized by the affine space of geometric data (metrics and potentials) on the boundary, the corresponding Dirac determinant bundle is topologically trivial. Let $U_\lambda$ be the set of $b\in B$ such that the real number $\lambda$ is not in the spectrum of the corresponding Dirac operator $D_{Y_b}$. On $U_\lambda$ we can define the boundary conditions $W'_{b,\lambda}$ as the spectral subspace $D_{Y_b} > \lambda$ of the boundary Dirac operator. The set $U_\lambda$ is in general non-contractible, and the Dirac determinant line bundle defined by the boundary conditions $W'$ can be nontrivial. The curvature of this bundle is given by the families index theorem [5, 16]. It can be written in terms of characteristic classes in the bulk and the so-called $\eta$-form on the boundary; the latter depends on spectral information about the family of Dirac operators. The curvature $\Omega$, when evaluated along gauge and diffeomorphism directions on the boundary data, has a simplified expression: the $\eta$-form drops out since it is a spectral invariant, and the contribution from the characteristic classes in the bulk reduces to a boundary integral involving the (gauge and metric) Chern–Simons forms [7, 8]:
$$\Omega = \frac{1}{2\pi}\int_Y CS_{[2]}(A + v, \Gamma + w) \tag{4.3}$$
with
$$d\,CS(A, \Gamma) = \hat A(R)\,\mathrm{ch}(F),$$
where $[2]$ denotes the part that is a 2-form along parameter directions. The symbol $A + v$ means a connection form on $Y\times B$ such that in the $Y$ directions it is given by a vector potential $A$, and in the gauge directions $L_u$ on $B$ it is equal to the Lie-algebra-valued function $u$. Similarly, $\Gamma + w$ is the sum of the Levi-Civita connection $\Gamma$ (on $Y$) and a metric connection $w$ such that the value of $w$ along a vector field $L_u$ on $B$, generated by a vector field $u$ on $Y$, is the matrix-valued function on $Y$ given by the Jacobian of $u$. The characteristic classes are
$$\hat A(R) = \det\Big(\frac{iR/4\pi}{\sinh(iR/4\pi)}\Big)^{1/2},\qquad\mathrm{ch}(F) = \mathrm{tr}\,\exp(iF/2\pi),$$
where $R$ is the Riemann curvature tensor associated to a metric $g$ in the bulk and $F$ is the curvature of a gauge connection $A$. From the previous discussion it follows that the topological information (the de Rham cohomology class of the curvature) in the relative determinant bundle $\mathrm{DET}(W, W')$ is given by the curvature formula for the Dirac determinant bundle for the boundary polarization $W'$. This leads to the explicit formula for lifting the gauge and diffeomorphism group action from the base $B = \mathcal A\times\mathcal M$ to the Fock bundle $\mathcal F$ [7, 8]. Here $\mathcal M$ is the space of Riemannian metrics on $Y$. Infinitesimally, the lifting leads to an extension of the Lie algebras $\mathrm{Lie}(\mathcal G)$ and $\mathrm{Vect}(Y)$ by an abelian ideal $J$ consisting of complex-valued functions on $\mathcal A\times\mathcal M$. The commutator of two pairs of elements $(u, f)$ and $(v, g)$ (where $f, g$ are in the extension part $J$ and $u, v$ are infinitesimal gauge transformations or vector fields) is given by
$$[(u, f), (v, g)] = \big([u, v],\ L_u\cdot g - L_v\cdot f + c(u, v)\big), \tag{4.4}$$
where $c(u, v)$ is an antisymmetric bilinear function of the arguments $u, v$ taking values in the ideal $J$. It satisfies the cocycle condition
$$c(u_1, [u_2, u_3]) + L_{u_1}\cdot c(u_2, u_3) + \text{cyclic permutations} = 0. \tag{4.5}$$
The cocycle $c$ is just the curvature form evaluated along gauge (or diffeomorphism group) directions, $c(u, v) = \Omega(L_u, L_v)$, where $L_u$ is the vector field on $\mathcal A$ (resp. $\mathcal M$) generated by the gauge (diffeomorphism) group action. When $Y$ is one-dimensional, the cocycle reduces to the central term in an affine Lie algebra or in the Virasoro algebra; in this case the cocycle does not depend on the vector potential or the metric on $Y$. On $S^3$ the cocycle (Schwinger term) is given by [12, 9]
$$c(u, v) = \frac{i}{24\pi^2}\int_Y\mathrm{tr}\,A\,[du, dv] \tag{4.6}$$
when the fermions are in the fundamental representation of the gauge group; here $u, v: Y\to\mathrm{Lie}(G)$ are smooth infinitesimal gauge transformations. On $S^3$ the cocycle is trivial in the case of vector fields and metrics. Explicit expressions can also be worked out in higher dimensions, starting from (4.3) [8].
In the case when the projections $P_b$, $P'_b$ onto the spaces $W_b$, $W'_b$ of the Grassmann sections differ by trace-class operators, we have a general formula for the curvature of the relative determinant bundle:
$$\Omega(L_u, L_v) = \frac{1}{8\pi i}\,\mathrm{tr}\big[F_b(L_uF_b)(L_vF_b) - F'_b(L_uF'_b)(L_vF'_b)\big], \tag{4.7}$$
where $F_b = P_b - P_b^\perp$, $P_b^\perp$ is the projection onto the orthogonal complement $W_b^\perp$, and similarly $F'_b = P'_b - P'^{\perp}_b$. Note that neither of the two terms on the right has a finite trace, but the difference is trace class by the relative trace-class property of $P_b$, $P'_b$. Note also that when all the projections $P$ are in a single restricted Grassmannian, the first term is the standard formula for the curvature of the Grassmannian. The second term can be viewed as a renormalization; it is in fact a background-field-dependent vacuum energy subtraction. The proof of the curvature formula (4.7) is as follows. First, one notices that it gives the curvature of the relative determinant bundle when both $W_b$ and $W'_b$ lie in the same restricted Grassmannian relative to a fixed base point $P_0$. Then one has to show that the difference still makes sense when the assumption of a common base point is dropped. For that purpose one writes
$$\Omega(L_u, L_v) = \frac{1}{8\pi i}\,\mathrm{tr}\big[(F_b - F'_b)(L_uF_b)(L_vF_b) + F'_b(L_uF_b - L_uF'_b)(L_vF_b) + F'_b(L_uF'_b)(L_vF_b - L_vF'_b)\big],$$
which is manifestly the trace of a sum of trace-class operators.
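The rearrangement used in this proof is a purely algebraic telescoping identity: expanding the three terms, all cross terms cancel. The following minimal Python sketch checks it with random finite matrices, where every trace is automatically finite. The matrices `F`, `Fp` (standing for $F_b$, $F'_b$) and `dFu`, `dFv`, `dFpu`, `dFpv` (standing for $L_uF_b$, $L_vF_b$, $L_uF'_b$, $L_vF'_b$) are hypothetical random stand-ins, so the sketch tests only the algebra, not the trace-class estimates that matter in infinite dimensions.

```python
import random


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_matrix(n, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]


def telescoping_defect(n, seed):
    """|original form of (4.7) - rearranged form| for random n x n stand-ins."""
    rng = random.Random(seed)
    F, Fp, dFu, dFv, dFpu, dFpv = (random_matrix(n, rng) for _ in range(6))
    # tr[F (L_u F)(L_v F) - F' (L_u F')(L_v F')]
    original = trace(matmul(matmul(F, dFu), dFv)) - trace(matmul(matmul(Fp, dFpu), dFpv))
    # tr[(F - F')(L_u F)(L_v F) + F'(L_u F - L_u F')(L_v F) + F'(L_u F')(L_v F - L_v F')]
    t1 = matmul(matmul(sub(F, Fp), dFu), dFv)
    t2 = matmul(matmul(Fp, sub(dFu, dFpu)), dFv)
    t3 = matmul(matmul(Fp, dFpu), sub(dFv, dFpv))
    rearranged = trace(add(add(t1, t2), t3))
    return abs(original - rearranged)
```

For any size and seed, `telescoping_defect` returns a number at the level of floating-point rounding; in the infinite-dimensional setting the content of the argument is that each of the three summands is separately trace class.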
4.2. Chiral anomaly in the bulk. In the construction of the Fock functor we took as independent parameter a choice of an element $H\in\mathrm{DET}(K(D_b^+), W_{Y_b})$ in the boundary determinant bundle; recall that $K(D_b^+)$ is the range of the Calderon projection. A choice of this element, as a function of the geometric data in the bulk, is a section of the determinant bundle. In quantum field theory such a choice is provided by a choice of the regularized determinant of the chiral Dirac operator $D_b^+$. The determinant vanishes if and only if the orthogonal projection $\pi: K(D_b^+)\to W_{Y_b}$ is singular, and therefore it makes sense to choose $H = H_b\in\mathrm{DET}(K(D_b^+), W_{Y_b})$ (represented as an admissible linear map $H_b: W_{Y_b}\to K(D_b^+)$) such that $\det_r(D_b^+)$, defined subject to the boundary condition $W_b$, is equal to $\det_F(\pi\circ H_b)$. In the case of chiral fermions the determinant $\det_r(D_b^+)$ is anomalous with respect to diffeomorphisms and gauge transformations on $X$, and the variation of the determinant is given by the factor $\omega(g; b)$ in (4.2). This implies the transformation rule
$$H_{g.b} = H_b\cdot\omega(g; b), \tag{4.8}$$
where $g$ is either a gauge transformation or a diffeomorphism, and $b$ stands for both the metric and the gauge potential on $X$. Here $\omega$ is a non-vanishing complex function satisfying the cocycle condition
$$\omega(g_1g_2; b) = \omega(g_1; g_2\cdot b)\,\omega(g_2; b). \tag{4.9}$$
The boundary conditions should be invariant under $g$, meaning that the gauge transformations (and diffeomorphisms) approach the identity smoothly at the boundary.
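A toy finite model makes the cocycle condition (4.9) concrete. In the sketch below (an illustration only: the finite group, the three-point set standing in for the parameter space and the function `phi` are all hypothetical), the permutations of $\{0, 1, 2\}$ act on the left, $g\cdot b = g[b]$, and we check (4.9) for coboundaries $\omega(g; b) = \varphi(g\cdot b)/\varphi(b)$. These are the cohomologically trivial cocycles: for them $H_b/\varphi(b)$ is invariant under (4.8), so there is no anomaly. The anomalous case of the text is precisely a cocycle that is not of this form.

```python
import cmath
import itertools
import random

B = range(3)                          # stand-in for the parameter space
G = list(itertools.permutations(B))   # group elements g, acting by g.b = g[b]


def compose(g1, g2):
    """Group product with (g1 g2).b = g1.(g2.b), i.e. a left action."""
    return tuple(g1[g2[b]] for b in B)


def satisfies_cocycle(omega, tol=1e-12):
    """Check omega(g1 g2; b) == omega(g1; g2.b) * omega(g2; b) for all g1, g2, b."""
    return all(
        abs(omega(compose(g1, g2), b) - omega(g1, g2[b]) * omega(g2, b)) < tol
        for g1 in G for g2 in G for b in B
    )


def coboundary(phi):
    """omega(g; b) = phi(g.b) / phi(b) for a nowhere-vanishing function phi on B."""
    return lambda g, b: phi[g[b]] / phi[b]


rng = random.Random(1)
phi = [cmath.rect(rng.uniform(0.5, 2.0), rng.uniform(0.0, 2 * cmath.pi)) for _ in B]
print(satisfies_cocycle(coboundary(phi)))                              # True
print(satisfies_cocycle(lambda g, b: 1.0 if g == (0, 1, 2) else 2.0))  # False
```

The second call shows that an arbitrary non-vanishing function of $g$ need not be a cocycle: with two equal transpositions the left side of (4.9) is 1 while the right side is 4.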
If the cocycle $\omega$ is nontrivial in cohomology (and this is the generic case for chiral fermions), then relation (4.2) tells us that the Fock functor is determined by the family of Calderon subspaces $K(D_b^+)$ and a choice of a section (the regularized determinant) of a nontrivial line bundle over the quotient space $\bar B$ of $B$ modulo diffeomorphisms and gauge transformations.

Example 4.2. Let $\mathcal A(D)$ be the space of smooth potentials in a unit disk $D$. Let $\mathcal G(D, \partial D)$ be the group of gauge transformations which are trivial on the boundary $\partial D = S^1$. For each $A\in\mathcal A(D)$ there is a unique $g = g_A: D\to G$ such that $A' = g^{-1}Ag + g^{-1}dg$ is in the radial gauge, $A'_r = 0$, and $g(p) = 1$, where $p\in S^1$ is a fixed point on the boundary. It follows that $\bar B = \mathcal A(D)/\mathcal G(D, \partial D)$ can be identified with $\mathcal A_{rad}(D)\times\mathcal G(D)/\mathcal G(D, \partial D) = \mathcal A_{rad}(D)\times PG$, where $\mathcal A_{rad}(D)$ is the set of potentials in the radial gauge and $PG$ is the group of based loops, i.e., those loops in $G$ which take the value 1 at the point $p$. The first factor $\mathcal A_{rad}(D)$ is topologically trivial, being a vector space. Thus in this case the topology of the Dirac determinant bundle over the moduli space $\bar B$ is given by the pull-back of the canonical line bundle over $PG$ [17] with respect to the map $A\mapsto g_A|_{\partial D}$. The sections of $\mathrm{DET}\to\bar B$ are by definition complex functions $\lambda:\mathcal A(D)\to\mathbb C$ which obey the anomaly condition (4.2) [13].
4.3. Relation of the chiral anomaly to the commutator anomaly. The bulk anomaly and the extension (Schwinger terms) of the gauge group on the boundary are closely related [13]. As we saw above, the Fock functor is determined by a choice of a section $b\mapsto H_b$ of the relative line bundle $\mathrm{DET}(K(D_b^+), W_{Y_b})$. The section transforms according to the chiral transformation law for regularized determinants, $H_{g\cdot b} = \omega(g; b)H_b$, for transformations $g$ which are equal to the identity on the boundary. If now $h$ is a transformation which is not equal to the identity on the boundary, we can define an operator $T(h)$ acting on sections by
$$(T(h)\xi)(b) = \gamma(h; b)\,\xi(h^{-1}\cdot b), \tag{4.10}$$
where $\gamma$ is a complex function of modulus one, which must be chosen in such a way that $\xi'(b) = (T(h)\xi)(b)$ satisfies the condition (4.2). Explicit expressions for $\gamma$ have been worked out in several cases [14]. For example, if $\dim X = 2$ and $h$ is a gauge transformation, then
$$\gamma(h; A) = \exp\Big(\frac{i}{2\pi}\int_X\mathrm{tr}\,A\,dh\,h^{-1}\Big), \tag{4.11}$$
where $\mathrm{tr}$ is the trace in the representation of the gauge group determined by the action on fermions. In general, $\gamma$ must satisfy the consistency condition
$$\gamma(h; g\cdot b)\,\omega(hgh^{-1}; h^{-1}\cdot b) = \gamma(h; b)\,\omega(g; b). \tag{4.12}$$
In the two-dimensional gauge theory example,
$$\omega(g; A) = \exp\Big(\frac{i}{2\pi}\int_X\mathrm{tr}\,A\,dg\,g^{-1} + \frac{i}{24\pi}\int_M\mathrm{tr}\,(dg\,g^{-1})^3\Big). \tag{4.13}$$
The latter integral is evaluated over a 3-manifold $M$ whose boundary is the closed 2-manifold obtained from $X$ by shrinking each of its boundary components to a point. The introduction of the factor $\gamma$ in (4.10) has the consequence that the composition law for the group elements is modified,
$$T(g_1)T(g_2) = \theta(g_1, g_2; b)\,T(g_1g_2), \tag{4.14}$$
where $\theta$ is an $S^1$-valued function, defined by
$$\theta(g_1, g_2; b) = \gamma(g_1g_2; b)\,\gamma(g_1; b)^{-1}\,\gamma(g_2; g_1^{-1}\cdot b). \tag{4.15}$$
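The essential point of (4.14) is that composing two operators of the form (4.10) reproduces $T(g_1g_2)$ up to a product of phases, which is therefore circle-valued. Exactly where the inverses sit in such a formula depends on the conventions chosen for the action and for composing operators. As a toy check of the mechanism, assume a left action (as in the cocycle condition (4.9)) and apply definition (4.10) literally; a direct computation then gives $T(g_1)T(g_2) = \tilde\theta\,T(g_1g_2)$ with $\tilde\theta(g_1, g_2; b) = \gamma(g_1; b)\,\gamma(g_2; g_1^{-1}\cdot b)\,\gamma(g_1g_2; b)^{-1}$. The sketch below verifies this numerically; the finite group and the random phases `gamma` are hypothetical stand-ins, not the physical $\gamma$ of (4.11).

```python
import cmath
import itertools
import random

B = range(3)                          # stand-in for the parameter space
G = list(itertools.permutations(B))   # left action: g.b = g[b]


def compose(g1, g2):
    """(g1 g2).b = g1.(g2.b)."""
    return tuple(g1[g2[b]] for b in B)


def inverse(g):
    inv = [0] * len(g)
    for b in B:
        inv[g[b]] = b
    return tuple(inv)


rng = random.Random(7)
# Hypothetical modulus-one factors gamma(h; b), chosen at random.
gamma = {(g, b): cmath.exp(1j * rng.uniform(0.0, 2 * cmath.pi)) for g in G for b in B}


def T(h, xi):
    """(T(h) xi)(b) = gamma(h; b) * xi(h^{-1}.b); xi is a list indexed by b."""
    h_inv = inverse(h)
    return [gamma[(h, b)] * xi[h_inv[b]] for b in B]


def theta(g1, g2, b):
    """Circle-valued factor with T(g1) T(g2) = theta * T(g1 g2) in this convention."""
    return gamma[(g1, b)] * gamma[(g2, inverse(g1)[b])] / gamma[(compose(g1, g2), b)]


def max_defect(xi):
    """Largest deviation from T(g1) T(g2) xi = theta * T(g1 g2) xi over all g1, g2, b."""
    worst = 0.0
    for g1 in G:
        for g2 in G:
            lhs = T(g1, T(g2, xi))
            rhs = T(compose(g1, g2), xi)
            worst = max(worst, max(abs(lhs[b] - theta(g1, g2, b) * rhs[b]) for b in B))
    return worst
```

For any test section such as `xi = [1 + 2j, -0.5j, 3.0]`, `max_defect(xi)` is at rounding level and every `theta(g1, g2, b)` has modulus one, which is what makes the extension in (4.14) abelian with circle-valued fibers.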
Thus we have extended the original group of gauge transformations (diffeomorphisms) by the abelian group of circle-valued functions on the parameter space $B$. At the Lie algebra level, the relation (4.14) modifies, by the Schwinger terms discussed above, the "naive" commutation relations of the algebra of infinitesimal gauge transformations or of the algebra of vector fields on the manifold $X$. Actually, the modification is "sitting on the boundary": the action of $T(g)$ was defined in such a way that the (normal) subgroup of gauge transformations which are equal to the identity on the boundary acts trivially on the sections $\xi(b)$. There is an additional slight twist to this statement: the normal subgroup is embedded in the extended group as the set of pairs $(g, c(g))$, where $c(g)$ is the circle-valued function defined by
$$c(g) = \gamma(g; b)^{-1}\,\omega(g; b). \tag{4.16}$$
The consistency condition (4.12) guarantees that the multiplication rule $(g_1, c(g_1))(g_2, c(g_2)) = (g_1g_2, c(g_1g_2))$ holds in the extended group with the multiplication law
$$(g_1, \mu_1)(g_2, \mu_2) = \big(g_1g_2,\ \theta(g_1, g_2)\,\mu_1\,\mu_2^{g_1}\big),$$
where $\mu^g(b) = \mu(g^{-1}b)$.

4.4. Summary. Let us summarize the above discussion of Fock functors and group extensions. On the boundary manifold $Y = \partial X$ a choice of boundary conditions $W_b$ (labeled by a parameter space $B_Y$ of boundary geometries) defines a fermionic Fock space $\mathcal F_{Y_b}$. The group of gauge transformations (or diffeomorphisms) on $Y$ acts in the bundle of Fock spaces (parameterized by geometric data on the boundary) through an abelian extension; the Lie algebra of the extension is determined by a 2-cocycle (the Schwinger terms), which is computed via index theory from the curvature of the relative determinant bundles $\mathrm{DET}(W_b, W'_b)$, where $W'_b$ is the positive energy subspace defined by the boundary Dirac operator. If the boundary is written as a union $Y = Y_{in}\cup Y_{out}$ of the ingoing and outgoing components, then the Fock functor assigns to the geometric data on $X$ a linear operator $Z_X:\mathcal F_{in}\to\mathcal F_{out}$. A gauge transformation in the bulk $X$ sends $Z_X$ to $\gamma(g; X)Z_{g^{-1}X}$. This action defines an abelian extension of the gauge group. There is a normal subgroup isomorphic to the group of gauge transformations which are equal to the identity on the boundary. This subgroup acts trivially, therefore giving an action of (the abelian extension of) the quotient group on the boundary. The latter group is isomorphic to the group acting in the Fock bundle over boundary geometries.
5. Path Integral Formulae and a 0+1-Dimensional Example

In this section we outline the fermionic path integral formalism for an EBVP and explain how the Fock functor models it algebraically.

5.1. Path integral formulae. The analogue of (4.1) for an EBVP is
$$Z_X(P) := \det(D_P) = \int_{\mathcal E_P} e^{\int_X\psi^*D\psi\,dm}\,\mathcal D\psi\,\mathcal D\psi^*, \tag{5.1}$$
where $\mathcal E_P = \mathrm{dom}(D_P)$. This is Eq. (1.4) for the case $S(\psi) = \int_X\psi^*D\psi\,dx$, with the local boundary condition $f$ replaced by the global boundary condition $P$. Consider a partition of the closed manifold $M = X_0\cup_Y X_1$. The Dirac operator over $M$ restricts to Dirac operators $D^0$ over $X_0$ and $D^1$ over $X_1$. We assume that the geometry is tubular in a neighbourhood of the splitting manifold $Y$; then we have Grassmannians $Gr_{Y^i}$ of boundary conditions associated to $D^i$, where $Y^0 = Y$ and $Y^1 = \bar Y$ is $Y$ with the reversed orientation. The reversal of orientation means that there is a diffeomorphism $Gr_{Y^0}\cong Gr_{Y^1}$ given by $P\leftrightarrow I - P$, so that each $P\in Gr_{Y^0}$ defines the boundary value problems $D^0_P$ and $D^1_{I-P}$. According to (4.1) and (5.1), the analogue of the sewing formula (1.5) is
$$\int_{\mathcal E(M)} e^{\int_M\psi^*D_A^+\psi\,dm}\,\mathcal D\psi\,\mathcal D\psi^* = \int_{Gr_Y}\mathcal DP\int_{\mathcal E_P(X_0)} e^{\int_{X_0}\psi_0^*D^0\psi_0\,dx_0}\,\mathcal D\psi_0\,\mathcal D\psi_0^*\times\int_{\mathcal E_{I-P}(X_1)} e^{\int_{X_1}\psi_1^*D^1\psi_1\,dx_1}\,\mathcal D\psi_1\,\mathcal D\psi_1^*. \tag{5.2}$$
That is,
$$Z_M = \int_{Gr_Y} Z_{X_0}(P)\,Z_{X_1}(I - P)\,\mathcal DP, \tag{5.3}$$
or
$$\det(D) = \int_{Gr_Y}\det D^0_P\cdot\det D^1_{I-P}\,\mathcal DP. \tag{5.4}$$
Notice that no regularization of the determinant is involved here. Without further choices, the path integral formulae express a relation between sections of the corresponding determinant bundles, and (5.4) is a pairing on the spaces of sections of those line bundles. More precisely, according to the properties of the Fock functor (see Sect. 3), the fermionic integral may be rigorously understood as a linear functional $\wedge(H_Y\oplus H_{\bar Y})\to\mathrm{Det}(D^0_P)$, while (5.4) is replaced by the evaluation of the Fock space bilinear pairing on vacuum elements (3.42):
$$\det(D) = (\nu_{K(D^0)}, \nu_{K(D^1)}). \tag{5.5}$$
However, adopting a slightly different point of view gives a more precise meaning to the integral formulae above. With a given boundary condition $P$, the determinants of the (chiral) Dirac operators on the manifolds $X_0$ and $X_1$ should be interpreted as elements of the determinant line bundle DET over the Grassmannian $Gr_Y$, with base point $H^+$. The actual numerical value of the Dirac determinant depends on the choice of a (local) trivialization. For example, one could define $\det(D)$ as the zeta-function-regularized
determinant $\det_\zeta((D_B)^*D_A)$, where $D_B$ is a background Dirac operator chosen such that $(D_B)^*D_A$ has a spectral cut, i.e., there is a cone in the complex plane with vertex at the origin which contains no eigenvalues of the operator. The value of the zeta determinant depends on the choice of the background field $B$. A choice of an element in the line in DET over $P^0\in Gr_Y$ is given by a choice of a pair $(\alpha, \lambda)$, where $\alpha: H^+\to P^0$ is a unitary map and $\lambda\in\mathbb C$. It can be viewed as a holomorphic section of the dual determinant bundle $\mathrm{DET}^*$ according to (2.24),
$$\psi_{[\alpha,\lambda]}(\xi) = \lambda\,\det_F(\alpha^*\circ\pi\circ\xi),$$
where $\pi:\xi(H^+)\to P^0$ is the orthogonal projection. We can think of the variable $\xi$ as the parameter for different elements $W = \xi(H^+)\in Gr_Y$. We want to replace the (ill-defined) integral $\int_{Gr_Y}\det(D^0_P)\det(D^1_{I-P})\,dP$ by a (so far ill-defined) integral of the form
$$\int_\xi\psi_{[\alpha,\lambda]}(\xi)^*\,\psi_{[\beta,\mu]}(\xi)\,d\xi. \tag{5.6}$$
But this integral looks like the functional integral defining the inner product between a pair of fermionic wave functions (vectors in the Fock space) defined in Eq. (2.33):
$$\det_F(\alpha^*\beta) = \langle\psi_{[\alpha,\lambda]}, \psi_{[\beta,\mu]}\rangle = \sum_{S\in\mathcal S}\psi_{[\alpha,\lambda]}(\xi(S))^*\,\psi_{[\beta,\mu]}(\xi(S)). \tag{5.7}$$
The relation with (5.5) is given by the identity (3.36), which tells us that
$$(\nu_{K(D^0)}, \nu_{K(D^1)}) = \langle\nu_{K(D^0)}, \nu_{K(D^1)^\perp}\rangle, \tag{5.8}$$
so here $[\alpha, \lambda] = \det(P_{K^0}P_{K^0})$ and $[\beta, \mu] = \det(P_{K_1^\perp}P_{K_1^\perp})$. To illustrate this, consider the case of the Dirac operator over a closed odd-dimensional spin manifold $X$ partitioned by $Y$. According to (3.16) and (3.35), the Fock pairing
$$(\,,\,):\mathcal F_{K(D^0)}(H_Y)\otimes\mathcal F_{K(D^1)}(H_{\bar Y})\to\mathrm{Det}(K(D^0), K(D^1)^\perp)\cong\mathrm{Det}(D_X),$$
evaluated on vacuum elements, is the abstract determinant of the operator
$$P(D^0) + P(D^1): K(D^0)\oplus K(D^1)\to H_Y,\qquad(\xi, \eta)\mapsto\xi + \eta. \tag{5.9}$$
To realize this canonically as a number, we can compare it to the Fock pairing coming from the doubled elliptic operator $D_{\mathrm{double}} = D^0\cup -D^0$ over the closed double manifold $X_0\cup_Y\bar X_0$ [6]. To do that we use a canonical trivialization of the determinant lines resulting from the identification $K(D^0) = \mathrm{graph}(h_0: F^+\to F^-)$ proved in [19], where $F^\pm$ are the spaces of positive and negative spinor fields over the even-dimensional boundary $Y$, $h_0$ is a uniquely determined unitary isomorphism differing from $g_+ = (D_Y^-D_Y^+)^{-1/2}D_Y^+$ by a smoothing operator, and $D_Y^\pm$ are the boundary chiral Dirac operators, which we assume to be invertible. In particular, $H^+ = \mathrm{graph}(g_+: F^+\to F^-)$. Similarly, $K(D^1) = \mathrm{graph}(h_1: F^-\to F^+)$, where $h_1$ differs from $-g_+^{-1}: F^-\to F^+$ by a smoothing operator. In this trivialization one has
$$P(D^0) = \frac12\begin{pmatrix} I & h_0^{-1}\\ h_0 & I\end{pmatrix},\qquad P(D^1) = \frac12\begin{pmatrix} I & h_1\\ h_1^{-1} & I\end{pmatrix}.$$
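Hence the composition of (5.9) with the graph trivialization
$$H_Y\to K(D^0)\oplus K(D^1),\qquad(\xi, \eta)\mapsto\big((\xi, h_0\xi), (h_1\eta, \eta)\big),$$
gives us the automorphism of $H_Y = F^+\oplus F^-$
$$\begin{pmatrix}\xi\\ \eta\end{pmatrix}\mapsto\begin{pmatrix} I & h_1\\ h_0 & I\end{pmatrix}\begin{pmatrix}\xi\\ \eta\end{pmatrix}. \tag{5.9b}$$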
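In the case of $D_{\mathrm{double}}$ one has $K(D^1) = K(D^0)^\perp$, and so $h_1$ in (5.9b) is replaced by $h_1 = -h_0^{-1}$. Hence the composition of (5.9b) with the inverse of (5.9b) for $D_{\mathrm{double}}$ is the operator
$$\begin{pmatrix}\xi\\ \eta\end{pmatrix}\mapsto\begin{pmatrix} I & \tfrac12(h_1 + h_0^{-1})\\ 0 & \tfrac12(I - h_0h_1)\end{pmatrix}\begin{pmatrix}\xi\\ \eta\end{pmatrix},$$
which is of (Fredholm) determinant class, since $h_1 + h_0^{-1}$ is a smoothing operator and hence $\tfrac12(I - h_0h_1) = I - \tfrac12h_0(h_1 + h_0^{-1})$. This gives the realization of the left side of (5.8) in the graph trivialization as
$$(\nu_{K(D^0)}, \nu_{K(D^1)})_{\mathrm{graph}} = \det_F\big(\tfrac12(I - h_0h_1)\big).$$
On the other hand, a similar computation of the right side of (5.8) in the graph trivialization, using (5.7), yields
$$\langle\nu_{K(D^0)}, \nu_{K(D^1)^\perp}\rangle_{\mathrm{graph}} = \det_F\big(\tfrac12\,\alpha^*_{-h_1^{-1}}\alpha_{h_0}\big) = \det_F\big(\tfrac12(I - h_1h_0)\big),$$
the $\tfrac12$ arising from the relative term $\alpha^*_{h_0}\alpha_{h_0}$. Here $\alpha_T = \begin{pmatrix}\alpha_+\\ T\alpha_+\end{pmatrix}$, where the column index labels the different vectors of the canonical basis for the graph of $T: F^+\to F^-$, and the row labels of $\alpha_+$ label the different coordinates of a basis for $F^+$.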
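The block-matrix steps above can be checked in a finite-dimensional model. The following sketch is only an illustration under stated assumptions: $F^+ = F^- = \mathbb C^2$, a hypothetical unitary `h0`, and `h1` taken as $-h_0^{-1}$ plus a small random matrix standing in for the smoothing perturbation. It forms (5.9b), multiplies by the explicit inverse $\tfrac12\begin{pmatrix} I & h_0^{-1}\\ -h_0 & I\end{pmatrix}$ of the doubled operator, and compares the determinant with $\det(\tfrac12(I - h_0h_1))$ and with $\det(\tfrac12(I - h_1h_0))$.

```python
import cmath
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[complex(x) for x in row] for row in m]
    n, d = len(a), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0j
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def scale(s, a):
    return [[s * x for x in row] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def adjoint(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def block(p, q, r, s):
    """The block matrix [[p, q], [r, s]]."""
    return [rp + rq for rp, rq in zip(p, q)] + [rr + rs for rr, rs in zip(r, s)]


def unitary2(t, phi1, phi2):
    """A 2x2 unitary: rotation by t times diag(e^{i phi1}, e^{i phi2})."""
    c, s = math.cos(t), math.sin(t)
    e1, e2 = cmath.exp(1j * phi1), cmath.exp(1j * phi2)
    return [[c * e1, -s * e2], [s * e1, c * e2]]


def graph_pairing(h0, h1):
    """det of (doubled operator)^{-1} composed with [[I, h1], [h0, I]] from (5.9b)."""
    I = eye(len(h0))
    M = block(I, h1, h0, I)
    h0_inv = adjoint(h0)  # h0 is unitary
    Md_inv = block(scale(0.5, I), scale(0.5, h0_inv), scale(-0.5, h0), scale(0.5, I))
    return det(matmul(Md_inv, M))


def half_det(a, b):
    """det((I - a b) / 2)."""
    return det(scale(0.5, add(eye(len(a)), scale(-1, matmul(a, b)))))


rng = random.Random(0)
h0 = unitary2(0.7, 0.3, -1.1)
# Hypothetical stand-in for the smoothing perturbation of -h0^{-1}.
E = [[complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for _ in range(2)] for _ in range(2)]
h1 = add(scale(-1, adjoint(h0)), E)
```

Here `graph_pairing(h0, h1)` agrees with `half_det(h0, h1)` and with `half_det(h1, h0)` up to rounding (the last equality is Sylvester's identity $\det(I - AB) = \det(I - BA)$), and for the double itself, `h1 = -h0^{-1}`, the value is 1, as a normalization should be.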
5.2. A (0+1)-dimensional example. The motivation for replacing the integration formula (5.6) by the sum in (5.7) comes from finite dimensions. If $H = H_-\oplus H_+$ is a decomposition of a $2N$-dimensional vector space into a pair of orthogonal $N$-dimensional subspaces, then the maps $\alpha, \beta, \xi$ above become (with respect to the basis $\{e_i\}$ with $i = \pm1, \pm2, \dots, \pm N$) $2N\times N$ matrices, and we have the matrix identity
$$\det(\alpha^*\beta) = \sum_{(i)}\det(\alpha^*\xi(i))\,\det(\xi(i)^*\beta),$$
the sum being over all sequences $-N\le i_1 < i_2 < \dots < i_N\le N$ (with $i_\nu\ne0$), where $\xi(i)$ denotes the $2N\times N$ matrix whose columns are $e_{i_1}, \dots, e_{i_N}$. On the other hand, it follows from Eq. (3.48) in [10] that the following integration formula holds in this situation:
$$\det(\alpha^*\beta) = a_N\int d\xi\,d\xi^*\,\det(\alpha^*\xi)\det(\xi^*\beta)\cdot\det(\xi^*\xi)^{-2N-1}, \tag{5.10}$$
where $a_N$ is a numerical factor, and the last factor under the integral sign can be incorporated into the definition of the integration measure. If we consider the basis elements $\alpha_{-h_1^{-1}}$, $\beta_{h_0}$ for linear maps $h_i: H^+\to H^-$ and integrate over the dense subspace $U_{\mathrm{graph}}$ parameterizing the elements $\xi_T = (\xi^+, T\xi^+)$ with $T\in\mathrm{Hom}(H^+, H^-)$, the integral (5.10) becomes
$$\det(1 - h_2h_1) = a_N\int dT\,dT^*\,\det_{H^+}(1 - h_1T)\,\det_{H^-}(1 + T^*h_2)\cdot\det(1 + T^*T)^{-2N-1}. \tag{5.11}$$
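The finite sum identity preceding (5.10) is the Cauchy–Binet formula: $\alpha^*\xi(i)$ is the adjoint of the $N\times N$ submatrix of $\alpha$ formed by the rows $i_1, \dots, i_N$, and $\xi(i)^*\beta$ is the corresponding submatrix of $\beta$. A minimal Python check with random complex matrices follows; for convenience the basis indices are $0, \dots, 2N-1$ rather than $\pm1, \dots, \pm N$, which only relabels the terms of the sum.

```python
import itertools
import random


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[complex(x) for x in row] for row in m]
    n, d = len(a), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0j
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def cauchy_binet_sides(alpha, beta):
    """Return det(alpha^* beta) and the sum over N-element index sets S of
    det(alpha^* xi(S)) * det(xi(S)^* beta), where xi(S) has columns e_s, s in S."""
    two_n, n = len(alpha), len(alpha[0])
    lhs = det(matmul(adjoint(alpha), beta))
    rhs = 0j
    for S in itertools.combinations(range(two_n), n):
        alpha_S = [alpha[s] for s in S]  # xi(S)^* alpha: the rows of alpha indexed by S
        beta_S = [beta[s] for s in S]    # xi(S)^* beta
        rhs += det(adjoint(alpha_S)) * det(beta_S)
    return lhs, rhs


rng = random.Random(2)
N = 3
alpha = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(2 * N)]
beta = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(2 * N)]
lhs, rhs = cauchy_binet_sides(alpha, beta)
```

The two returned values agree up to rounding; this finite sum over coordinate $N$-planes is what (5.7) generalizes, while (5.10) replaces it by an integral over all $N$-planes.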
This has consequences for determinants in dimension one, where we work with the compact Grassmannian. Let $X = [a_0, a_1]$ and let $E$ be a complex Hermitian bundle over $X$ with unitary connection $\nabla$. Then the associated generalized Dirac operator is simply $D = i\nabla_{d/dx}: C^\infty(X; E)\to C^\infty(X; E)$. Choosing a trivialization of $E$, so that $E_{a_0}\oplus E_{a_1} = \mathbb C^n\oplus\mathbb C^n$, a global boundary condition for $D$ is specified by an element $P\in Gr(\mathbb C^n\oplus\mathbb C^n)$, defining the elliptic boundary value problem
$$D_P = i\nabla_{d/dx}:\mathrm{dom}(D_P)\to L^2([a_0, a_1]; E).$$
The Fock functor here is a topological 0+1-dimensional FQFT from the category $\mathcal C_1$, whose objects are points endowed with a complex finite-dimensional Hermitian vector space $V$ (we do not need to give a polarization in this finite-dimensional situation), and whose morphisms are compact 1-dimensional manifolds with boundary, with a Hermitian bundle with unitary connection. The Fock functor $Z$ takes an object $(p, V)\in\mathcal C_1$ to the Fock space $Z(p, V) := \Gamma_{hol}(Gr(V); \mathrm{Det}(E)^*)^*\cong\wedge V$, where $E$ is the usual canonical vector bundle over the Grassmannian. Consider two compatible morphisms $T_{01} = ([a_0, a_1], E^{01}, \nabla^{01})$ and $T_{12} = ([a_1, a_2], E^{12}, \nabla^{12})$ in $\mathcal C_1$, so that $T_{02} = T_{12}T_{01} = ([a_0, a_2], E^{02}, \nabla^{02})$, with $E^{02}|_{[a_0, a_1]} = E^{01}$, etc. Let $V_i$ be the fibre over $a_i$, and in $[a_i, a_j]$ we take $a_i$ to be "incoming" and $a_j$ to be "outgoing". For incoming boundary components $a_i$ the associated object in $\mathcal C_1$ is $(a_i, \bar V_i)$. Then we define $Z(T_{ij}) = \nu_{K_{ij}}$, where $K_{ij}\in Gr(\bar V_i\oplus V_j)$ is the Calderon subspace of boundary values of solutions to the "Dirac" operator $D^{ij} = i\nabla^{ij}_{d/dx}$. We have
$$Z(T_{ij})\in Z\big((p_i, \bar V_i)\sqcup(p_j, V_j)\big) = Z(p_i, \bar V_i)\otimes Z(p_j, V_j)\cong(\wedge\bar V_i)\otimes(\wedge V_j)\cong\mathrm{Hom}(\wedge V_i, \wedge V_j) =: \mathcal F_{K_{ij}}.$$
Because $K_{ij} = \mathrm{graph}(h_{ij}: V_i\to V_j)$, with $h_{ij}$ the parallel transport of the connection on $E^{ij}$ between $a_i$ and $a_j$, a simple computation gives, under the above identification,
$$Z(T_{ij})\longleftrightarrow\wedge h_{ij}\in\mathrm{Hom}(\wedge V_i, \wedge V_j).$$
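The identification $Z(T_{ij})\leftrightarrow\wedge h_{ij}$ is compatible with gluing because exterior powers are functorial: composing the transports as linear maps $V_0\to V_1\to V_2$ gives $h_{12}h_{01}$, and $\wedge^k(h_{12}h_{01}) = \wedge^k h_{12}\,\wedge^k h_{01}$ for each $k$. The minimal sketch below checks this with random complex matrices (hypothetical stand-ins for parallel transports; the identity does not need unitarity), representing $\wedge^k h$ by its matrix of $k\times k$ minors in the basis $e_{i_1}\wedge\dots\wedge e_{i_k}$, $i_1 < \dots < i_k$.

```python
import itertools
import random


def det(m):
    """Determinant by Gaussian elimination with partial pivoting (det of a 0x0 matrix is 1)."""
    a = [[complex(x) for x in row] for row in m]
    n, d = len(a), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0j
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def wedge_power(a, k):
    """Matrix of the k-th exterior power of a: the (I, J) entry is the minor with rows I, columns J."""
    idx = list(itertools.combinations(range(len(a)), k))
    return [[det([[a[i][j] for j in J] for i in I]) for J in idx] for I in idx]


def functoriality_defect(n, seed):
    """max |wedge^k(h12 h01) - wedge^k(h12) wedge^k(h01)| over k = 0..n."""
    rng = random.Random(seed)

    def rand():
        return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]

    h01, h12 = rand(), rand()
    worst = 0.0
    for k in range(n + 1):
        lhs = wedge_power(matmul(h12, h01), k)
        rhs = matmul(wedge_power(h12, k), wedge_power(h01, k))
        worst = max(worst, max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr)))
    return worst
```

The degree-$k$ statement is again the Cauchy–Binet formula; summing over $k$ gives the multiplicativity of $\wedge$ on the whole exterior algebra $\wedge V$.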
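Next we have a canonical pairing
$$Z(a_0, \overline{V_0}) \otimes Z(a_1, V_1) \otimes Z(a_1, \overline{V_1}) \otimes Z(a_2, V_2) \longrightarrow Z(a_0, \overline{V_0}) \otimes Z(a_2, V_2), \qquad (5.12)$$
induced by subtraction $V_1 \oplus V_1 \to V_1$, with $\wedge h_{01} \otimes \wedge h_{12} \longmapsto \wedge(h_{01}h_{12})$. Take the case $a_2 = a_0$, so that $T_{02} = T_{12}T_{01} = (S^1 = [a_0, a_2], E^{02}, \nabla^{02})$, corresponding to morphisms in $\mathrm{Gr}(V_0 \oplus V_1)$ and $\mathrm{Gr}(V_1 \oplus V_0)$ respectively. Then $Z(T_{01}) \in \mathcal{F}_{K_{01}}$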
Determinant Bundles and FQFT
and $Z(T_{10}) \in \mathcal{F}_{K_{10}^\perp}$, and under the above identifications the induced pairing $\mathcal{F}_{K_{01}} \otimes \mathcal{F}_{K_{10}^\perp} \longrightarrow \mathbb{C}$ is just the supertrace
$$(\ ,\ ) : \mathrm{Hom}(\wedge V_0, \wedge V_1) \otimes \mathrm{Hom}(\wedge V_1, \wedge V_0) \longrightarrow \mathbb{C}, \qquad (a, b) \longmapsto \mathrm{tr}_s(ab) := \sum_k (-1)^k\,\mathrm{tr}\,(ab|_{\wedge^k}).$$
Apply it to the vacuum elements $\nu_{W_{T_{01}}} \leftrightarrow \wedge T_{01} \in \mathrm{Hom}(\wedge V_0, \wedge V_1)$ and $\nu_{W^\perp_{T_{10}}} \leftrightarrow \wedge(-T_{10}^*) \in \mathrm{Hom}(\wedge V_1, \wedge V_0)$. We obtain
$$(\nu_{W_{T_{01}}}, \nu_{W^\perp_{T_{10}}}) = \mathrm{tr}_s\big(\wedge(-T_{10}^*)\wedge T_{01}\big) = \sum_k \mathrm{tr}\big(\wedge^k(T_{10}^* T_{01})\big) = \det(I + T_{10}^* T_{01}). \qquad (5.13)$$
Hence $(Z(T_{01}), Z(T_{10})) = \det(I - h_{10}h_{01})$ (since $h_{ij}$ is unitary), and $(Z(T_{01}), \nu_{W^\perp_T}) = \det(I + T^*h_{01})$. On the other hand, it is well known that $\det(I + T^*h_{01}) = \det_\zeta(D_{P_T})$. So from Eq. (5.11) we have
$$(Z(T_{01}), Z(T_{10})) = a_N \int dT\,dT^*\,\det\nolimits_\zeta(D^{10}_{P_T})\,\det\nolimits_\zeta(D^{01}_{P_{-T^*}})\cdot\det(1 + T^*T)^{-2N-1}, \qquad (5.14)$$
where $P_{-T^*} = I - P_T$. This expresses the relation of the algebraic Fock space pairing to the path integral sewing formula Eq. (5.4).

Notice that the gauge group of a boundary component of $[a_0, a_1]$ is just a copy of the unitary group $U(n)$. Under the embedding $g \mapsto \mathrm{graph}(g) =: W_g$, the Fock functor maps $g$ to $\wedge g$ on $\wedge V$. Thus in 0+1 dimensions the FQFT representation of the boundary gauge group is the fundamental $U(n)$-representation, which is a restatement of the Borel–Weil Theorem for $U(n)$. This means that the invariant output by the FQFT, which here is in fact a TQFT, is the character of the fundamental representation $\pi$ of $U(n)$.

This is what we would expect. We are dealing with a single particle evolving through time, so its only invariants are the representations of its internal symmetry group, which is the symmetry group of the bundle $E$ over $[a, b]$. In this sense we are dealing with quantum mechanics rather than QFT, and because it is a topological field theory the Hilbert space is finite-dimensional.

5.3. Relation to the Berezin integral. The above pairing can also be described by a Fermionic integral. Let $\wedge V$ denote the exterior algebra of the complex vector space $V$ with odd generators $\xi_1, \ldots, \xi_n$. It has as its basis the monomials $\xi_I = \xi_{i_1}\cdots\xi_{i_p}$, $I = \{i_1, \ldots, i_p\}$, $i_1 < \cdots < i_p$, where $I$ runs over subsets of $\{1, \ldots, n\}$, and we set $|I| := p$. The Fermionic (or Berezin) integral is the linear functional
$$\int : \wedge V \longrightarrow \mathbb{C}, \qquad f(\xi) \longmapsto \int f(\xi)\,D\xi,$$
which picks out the top-degree coefficient of $f(\xi)$ (a polynomial in the generators) relative to the generator $\xi = \xi_1\cdots\xi_n$ of $\mathrm{Det}\,V = \wedge^n V$.
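The Berezin integral just defined, together with the Gaussian formula (5.15) stated below, can be checked with a toy Grassmann algebra. This sketch is our own illustration and assumes NumPy only for the determinant. Its conventions are:

- An element of the algebra is a dictionary mapping sorted tuples of generator indices to coefficients.
- We label $\xi_i \mapsto 2i$ and $\bar\xi_i \mapsto 2i+1$ (0-based), so the sorted top monomial is exactly the reference generator $\xi_1\bar\xi_1\cdots\xi_n\bar\xi_n$ used below.
- We read $\xi_I\bar\xi_I$ as the interleaved product $\xi_{i_1}\bar\xi_{i_1}\cdots\xi_{i_p}\bar\xi_{i_p}$. This reading is an assumption.

The helper names `gmul`, `gexp`, `quadratic` and `berezin` are hypothetical.

```python
import numpy as np


def gmul(x, y):
    """Product in a Grassmann algebra; elements are dicts {sorted index tuple: coeff}."""
    out = {}
    for mx, cx in x.items():
        for my, cy in y.items():
            if set(mx) & set(my):
                continue  # a repeated odd generator squares to zero
            word = mx + my
            # sign of the permutation that sorts the word = (-1)^(number of inversions)
            inversions = sum(1 for i in range(len(word))
                             for j in range(i + 1, len(word)) if word[i] > word[j])
            key = tuple(sorted(word))
            out[key] = out.get(key, 0.0) + (-1) ** inversions * cx * cy
    return out


def gexp(x, max_degree):
    """exp(x) = sum_k x^k / k! for an even nilpotent x with x^(max_degree + 1) = 0."""
    result, term = {(): 1.0}, {(): 1.0}
    for k in range(1, max_degree + 1):
        term = {m: c / k for m, c in gmul(term, x).items()}  # term is now x^k / k!
        for m, c in term.items():
            result[m] = result.get(m, 0.0) + c
    return result


def quadratic(T):
    """xi T xibar = sum_ij t_ij xi_i xibar_j, labelling xi_i -> 2i and xibar_j -> 2j + 1."""
    n = T.shape[0]
    q = {}
    for i in range(n):
        for j in range(n):
            for m, c in gmul({(2 * i,): 1.0}, {(2 * j + 1,): 1.0}).items():
                q[m] = q.get(m, 0.0) + T[i, j] * c
    return q


def berezin(f, n):
    """Coefficient of the reference top form xi_1 xibar_1 ... xi_n xibar_n."""
    return f.get(tuple(range(2 * n)), 0.0)


if __name__ == "__main__":
    T = np.array([[2.0, 1.0, 0.0], [0.5, 3.0, -1.0], [1.0, 0.0, 1.0]])
    # the two numbers should agree, illustrating (5.15)
    print(berezin(gexp(quadratic(T), 3), 3), np.linalg.det(T))
```

Since $\xi T\bar\xi$ is even and nilpotent, the exponential series terminates at degree $n$. That is why `gexp` only sums up to `max_degree`.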
This extends to a functional
$$\int : \wedge V \otimes \wedge\overline{V} \longrightarrow \mathbb{C}, \qquad f(\xi, \bar\xi) \longmapsto \int f(\xi, \bar\xi)\,D\xi\,D\bar\xi,$$
J. Mickelsson, S. Scott
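defined relative to the generator $\xi\bar\xi := \xi_1\bar\xi_1\cdots\xi_n\bar\xi_n$ of $\mathrm{Det}\,V \otimes \mathrm{Det}\,\overline{V}$. To an element $T \in \mathrm{End}(V)$ we associate the quadratic element $\xi T\bar\xi := \sum_{i,j} t_{ij}\,\xi_i\bar\xi_j$. We then have $\frac{1}{n!}(\xi T\bar\xi)^n = \det(T)\,\xi\bar\xi$, and more generally the Gaussian expression
$$e^{\xi T\bar\xi} = \sum_I \det(T_I)\,\xi_I\bar\xi_I,$$
where $T_I$ denotes the submatrix $(t_{ij})$ with $i, j \in I$. We can therefore write
$$\int e^{\xi T\bar\xi}\,D\xi\,D\bar\xi = \det(T), \qquad (5.15)$$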
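so determinants are expressible as complex Fermionic Gaussian integrals. Next, we have a bilinear form on $\wedge V \otimes \wedge\overline{V}$ defined by
$$\langle f, g\rangle = \int g(\xi', \bar\xi')^\sigma f(\xi, \bar\xi)\,D\mu[(\xi, \bar\xi), (\xi', \bar\xi')], \qquad (5.16)$$
where $f(\xi)^\sigma$ is $f(\xi)$ with the order of the generators reversed, and
$$\int f(\xi, \bar\xi)\,D\mu[(\xi, \bar\xi), (\xi', \bar\xi')] := \int f(\xi, \bar\xi)\,e^{2\xi\bar\xi'}\,D(\xi, \bar\xi)\,D(\xi', \bar\xi')$$
is the Fermionic integral with respect to a Gaussian measure. The 2 arises in the exponent because we are dealing with $\wedge V \otimes \wedge\overline{V}$ rather than $\wedge V$. Applied to quadratic elements $e^{\xi T\bar\xi}$ and $e^{\xi S\bar\xi}$ defined for $T, S \in \mathrm{End}(V)$, this gives
$$\langle e^{\xi T\bar\xi}, e^{\xi S\bar\xi}\rangle = \int e^{\xi' S\bar\xi' + \xi T\bar\xi + 2\xi\bar\xi'}\,D\xi\,D\bar\xi = \int e^{(\xi'\ \xi)\left(\begin{smallmatrix} S & I\\ I & T\end{smallmatrix}\right)\left(\begin{smallmatrix}\bar\xi'\\ \bar\xi\end{smallmatrix}\right)}\,D\xi\,D\bar\xi = \det\nolimits_{V\oplus V}\begin{pmatrix} S & I\\ I & T\end{pmatrix} = \det\nolimits_V(I - ST).$$
Here we use (5.15) applied to $\begin{pmatrix} S & I\\ I & T\end{pmatrix} : V\oplus V \to V\oplus V$ and the general formula
$$\det\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \det(d)\,\det(a - bd^{-1}c),$$
valid provided $d : V \to V$ is invertible.

We can repeat the process for a pair of complex vector spaces $V_0 \neq V_1$ of the same dimension, with $T \in \mathrm{Hom}(V_0, V_1)$ and $S \in \mathrm{Hom}(V_1, V_0)$. Now define the Fermionic integral to be just the projection onto the form of top degree, $\int : \wedge V_0 \otimes \wedge V_1 \to \mathrm{Det}(V_0, V_1)$. Associated to $T$ we have $e^T \in \wedge\overline{V_0} \otimes \wedge V_1 \cong \mathrm{Hom}(\wedge V_0, \wedge V_1)$, where we regard $T$ as an element of $\overline{V_0} \otimes V_1$ via the Hermitian isomorphism $\overline{V_0} \cong V_0^*$. Then $\int e^T = \det(T) \in \mathrm{Det}(V_0, V_1)$. Here $\det(T)$ is the element $\det(T)(\xi_1\cdots\xi_n) = T\xi_1\cdots T\xi_n$ for a basis $\xi_i$ of $V_0$, which is canonically identified with $\det(T) \in \mathbb{C}$ when $V_0 = V_1$. The Gaussian element is then $e^T = e^{\xi T\bar\xi}$. The bilinear pairing goes through as before, with
$$\langle e^T, e^S\rangle = \det\nolimits_{V_0}(I - ST),$$
which gives an alternative formulation of the Fock pairing $\langle\ ,\ \rangle_{\det} : \mathcal{F}_{W_{T_0}} \times \mathcal{F}_{W^\perp_{T_1}} \to \mathbb{C}$.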
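The block-determinant formula quoted above is the Schur-complement identity. The following NumPy sketch is an illustration, not taken from the paper. It checks the identity numerically. It also checks that the block matrix $\begin{pmatrix} I & S\\ T & I\end{pmatrix}$ has determinant $\det(I - ST)$, which in turn equals $\det(I - TS)$.

This particular block matrix is our choice for illustration. It is not claimed to be the matrix produced by the Fermionic integral above. `block_det_via_schur` is a hypothetical helper name.

```python
import numpy as np


def block_det_via_schur(a, b, c, d):
    """det [[a, b], [c, d]] = det(d) * det(a - b d^{-1} c), valid when d is invertible."""
    # np.linalg.solve(d, c) computes d^{-1} c without forming the inverse explicitly
    return np.linalg.det(d) * np.linalg.det(a - b @ np.linalg.solve(d, c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, T = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
    I = np.eye(3)
    full = np.linalg.det(np.block([[I, S], [T, I]]))
    # all three numbers should agree
    print(full, block_det_via_schur(I, S, T, I), np.linalg.det(I - S @ T))
```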
References
1. Atiyah, M.F.: Topological quantum field theories. Inst. Hautes Etudes Sci. Publ. Math. 68, 175 (1989)
2. Atiyah, M.F. and Singer, I.M.: Dirac operators coupled to vector potentials. Proc. Nat. Acad. Sci. USA 81, 2597 (1984)
3. Atiyah, M.F., Patodi, V.K., and Singer, I.M.: Spectral asymmetry and Riemannian geometry. I. Math. Proc. Cambridge Phil. Soc. 77, 43 (1975)
4. Berline, N., Getzler, E. and Vergne, M.: Heat Kernels and Dirac Operators. Grundlehren der Mathematischen Wissenschaften 298, Berlin: Springer-Verlag, 1992
5. Bismut, J.-M. and Freed, D.S.: The analysis of elliptic families. II. Dirac operators, eta invariants, and the holonomy theorem. Commun. Math. Phys. 107, 103 (1986)
6. Booß-Bavnbek, B., and Wojciechowski, K.P.: Elliptic Boundary Problems for Dirac Operators. Boston: Birkhäuser, 1993
7. Carey, A., Mickelsson, J., and Murray, M.: Index theory, gerbes and Hamiltonian quantization. Commun. Math. Phys. 183, 707 (1997)
8. Ekstrand, C., and Mickelsson, J.: Gravitational anomalies, gerbes, and hamiltonian quantization. Commun. Math. Phys. 212, 613 (2000)
9. Faddeev, L., and Shatashvili, S.: Algebraic and Hamiltonian methods in the theory of nonabelian anomalies. Theoret. Math. Phys. 60, 770 (1985)
10. Fujii, K., Kashiwa, T., and Sakoda, S.: Coherent states over Grassmann manifolds and the WKB exactness in path integral. J. Math. Phys. 37, 567 (1996)
11. Grubb, G.: Trace expansions for pseudodifferential boundary problems for Dirac-type operators and more general systems. Ark. Mat. 37, 45 (1999)
12. Mickelsson, J.: Chiral anomalies in even and odd dimensions. Commun. Math. Phys. 97, 361 (1985)
13. Mickelsson, J.: Kac–Moody groups, topology of the Dirac determinant bundle, and fermionization. Commun. Math. Phys. 110, 173 (1987)
14. Mickelsson, J.: Current algebras and groups. London and New York: Plenum Press, 1989
15. Mickelsson, J.: On the hamiltonian approach to commutator anomalies in 3+1 dimensions. Phys. Lett. B 241, 70 (1990)
16. Piazza, P.: Determinant bundles, manifolds with boundary and surgery. Commun. Math. Phys. 178, 597 (1996)
17. Pressley, A. and Segal, G.B.: Loop Groups. Oxford: Clarendon Press, 1986
18. Quillen, D.G.: Determinants of Cauchy–Riemann operators over a Riemann surface. Funkcionalnyi Analiz i ego Prilozhenya 19, 37 (1985)
19. Scott, S.G.: Determinants of Dirac boundary value problems over odd-dimensional manifolds. Commun. Math. Phys. 173, 43 (1995)
20. Scott, S.G.: Splitting the curvature of the determinant line bundle. Proc. Am. Math. Soc. 128, 2763–2775 (2000)
21. Scott, S.G., and Wojciechowski, K.P.: The ζ-Determinant and Quillen’s determinant for a Dirac operator on a manifold with boundary. Geom. Funct. Anal. 10, 1202–1236 (2000)
22. Segal, G.B.: The definition of conformal field theory. Oxford preprint (1990)
23. Segal, G.B.: Geometric aspects of quantum field theory. Proc. Int. Cong. Math., Tokyo (1990)
24. Segal, G.B., and Wilson, G.: Loop groups and equations of the KdV type. Inst. Hautes Etudes Sci. Publ. Math. 61, 5 (1985)
25. Witten, E.: Topological quantum field theory. Commun. Math. Phys. 117, 353 (1988)
26. Witten, E.: Geometry and physics. Proc. Int. Cong. Math., Tokyo (1990)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 219, 607 – 629 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Isotropic Steady States in Galactic Dynamics
Yan Guo¹, Gerhard Rein²
¹ Lefschetz Center for Dynamical Systems, Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
² Mathematisches Institut der Universität München, Theresienstr. 39, 80333 München, Germany
Received: 24 October 2000 / Accepted: 7 January 2001
Abstract: The present paper completes our earlier results on nonlinear stability of stationary solutions of the Vlasov–Poisson system in the stellar dynamics case. By minimizing the energy under a mass-Casimir constraint we construct a large class of isotropic, spherically symmetric steady states. We prove their nonlinear stability against general, i. e., not necessarily symmetric, perturbations. The class is optimal in a certain sense; in particular, it includes all polytropes of finite mass with decreasing dependence on the particle energy.

1. Introduction

The question of which galaxies or globular clusters are stable has for many years attracted considerable attention in the astrophysics literature, cf. [4,6] and the references therein. If one neglects relativistic effects and collisions among the stars, then from a mathematical point of view the question is which steady states of the Vlasov–Poisson system
$$\partial_t f + v\cdot\partial_x f - \partial_x U\cdot\partial_v f = 0,$$
$$\Delta U = 4\pi\rho, \qquad \lim_{|x|\to\infty} U(t, x) = 0,$$
$$\rho(t, x) = \int f(t, x, v)\,dv,$$
are stable. Here:

- $f = f(t, x, v) \ge 0$ denotes the density of the stars in phase space;
- $t \in \mathbb{R}$ denotes time;
- $x, v \in \mathbb{R}^3$ denote position and velocity respectively;
- $\rho$ is the spatial mass density of the stars;
- $U$ is the gravitational potential which the ensemble induces collectively.

If $U_0$ is a time-independent potential, then the particle energy
$$E = \frac{1}{2}|v|^2 + U_0(x) \qquad (1.1)$$
is conserved along the characteristics of the Vlasov equation. A standard technique to obtain steady states of the Vlasov–Poisson system is therefore to prescribe the particle distribution $f_0$ as a function of the particle energy – this takes care of the Vlasov equation – and to solve the remaining Poisson equation self-consistently. The main problem then is to show that the resulting steady state has finite mass and possibly compact support. A well-known class of steady states for which this approach works is given by the so-called polytropes
$$f_0(x, v) = (E_0 - E)_+^k. \qquad (1.2)$$
Here $(\cdot)_+$ denotes the positive part, $E_0 \in \mathbb{R}$ is a cut-off energy, and $-1/2 < k \le 7/2$. Only for this range of exponents do these steady states have finite mass, and if $k < 7/2$ they have compact support in addition. If $f_0$ depends only on the particle energy, the resulting steady state is isotropic and spherically symmetric. If $U_0$ is assumed to be spherically symmetric to begin with, steady states may also depend on a further conserved quantity, the modulus of angular momentum squared,
$$L := |x|^2|v|^2 - (x\cdot v)^2, \qquad (1.3)$$
in which case they are no longer isotropic. According to Jeans’ Theorem, the distribution function of any spherically symmetric steady state has to be a function of the invariants $E$ and $L$, cf. [2].

In [7, 9, 10, 17] we addressed the stability of steady states by a variational technique. It was shown that an appropriately chosen energy-Casimir functional has a minimizer under the constraint that the mass is prescribed. This minimizer was shown to be a steady state, and its nonlinear stability was derived from its minimizing property. This turned out to be an efficient method, both to assess the stability of known steady states and to construct new ones which automatically have finite mass, compact support, and are stable. However, there were two unwanted restrictions:

- perturbations had to be spherically symmetric;
- only the polytropes with $0 < k < 3/2$ were covered.

A physically realistic perturbation, say by the gravitational pull of some distant galaxy, is hardly spherically symmetric. The restriction $k > 0$ was indispensable, since for $k \le 0$ no corresponding Casimir functional can be defined. It is probably also necessary for stability, since it makes $f_0$ a decreasing function of the particle energy. The restriction $k < 3/2$, by contrast, is less well motivated.

In [8] the first author removed the latter restriction in the case of the polytropes. In [18] the second author removed the restriction to spherically symmetric perturbations for a class of isotropic steady states including the polytropes with $k < 3/2$. The purpose of the present paper is to combine these techniques to obtain a result which we believe is optimal in the following sense: it does not require any symmetry restrictions on the perturbations, and it covers all isotropic polytropes. As a matter of fact, the restrictions we require of the steady states are necessary to guarantee finite mass and to make the distribution function a decreasing function of the energy.
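The following supplementary computation is standard but is not carried out in the paper. It makes visible the exponent $n = k + 3/2$ that also appears in Lemma 1 below.

For a polytrope (1.2) with $U_0$ given, the velocity integral can be done in closed form:
$$\rho_0(x) = \int \big(E_0 - \tfrac12|v|^2 - U_0(x)\big)_+^k\,dv = c_k\,(E_0 - U_0(x))_+^{k+3/2},$$
with $c_k = 4\sqrt{2}\,\pi\,B(3/2, k+1)$ and $B$ the Euler Beta function.

The sketch compares this formula with a direct quadrature, using only the Python standard library. The names `rho_numeric` and `rho_closed_form` are hypothetical, and $a$ stands for $E_0 - U_0(x) > 0$.

```python
import math


def rho_numeric(a, k, n_steps=20_000):
    """int_{R^3} (a - |v|^2/2)_+^k dv by quadrature, for a = E0 - U0(x) > 0.

    Spherical coordinates in v, then r = sqrt(2a) sin(theta) on [0, pi/2],
    which turns the integrand into a^k * 2a * sqrt(2a) * cos^(2k+1) * sin^2.
    """
    h = (math.pi / 2) / n_steps
    s = sum(math.cos((i + 0.5) * h) ** (2 * k + 1) * math.sin((i + 0.5) * h) ** 2
            for i in range(n_steps))  # midpoint rule in theta
    return 4 * math.pi * a ** k * 2 * a * math.sqrt(2 * a) * s * h


def rho_closed_form(a, k):
    """c_k * a^(k + 3/2) with c_k = 4 sqrt(2) pi B(3/2, k + 1)."""
    beta = math.gamma(1.5) * math.gamma(k + 1) / math.gamma(k + 2.5)
    return 4 * math.sqrt(2) * math.pi * beta * a ** (k + 1.5)


if __name__ == "__main__":
    for k in (0.5, 1.5, 3.5):
        print(k, rho_numeric(1.0, k), rho_closed_form(1.0, k))
```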
Presumably, if the latter condition is violated sufficiently strongly, then the steady state is unstable. The new elements in our analysis which allow for the improvements described above are the following: Previously we minimized an energy-Casimir functional under a mass constraint. Now we minimize the total energy of the system under a mass-Casimir constraint. The change of the role of the Casimir functional – from part of the minimized functional into part of the constraint – allows us to remove the restriction k < 3/2 and was introduced in [8]. It also leads to a much cleaner assumption on the steady state or on the Casimir functional respectively. Inspired by the concentration-compactness argument
due to P. L. Lions [14], which was used in [18], we use a refinement of our previous scaling and splitting argument in the compactness analysis of the energy functional. This removes the symmetry assumption on the perturbations. The investigation is restricted to isotropic steady states: if one includes anisotropic ones, then the Casimir functional is not conserved along solutions that are not spherically symmetric, and the method breaks down. Anisotropic steady states are – under the appropriate assumptions – stable against spherically symmetric perturbations. Whether or not they are stable against general perturbations remains an open problem.

The paper proceeds as follows. In the next section we establish some preliminary estimates, which in particular show that the total energy is bounded from below and the kinetic energy is bounded along minimizing sequences. In Sect. 3 the existence of a minimizer of the energy is established. To prevent mass from running off to spatial infinity along a minimizing sequence, we analyze how the total energy behaves under scaling transformations and under splittings of the distribution into different pieces. In Sect. 4 we show that such minimizers are spherically symmetric steady states of the Vlasov–Poisson system with finite mass and compact support. The stability properties of the steady states are then discussed in Sect. 5.

Here we point out one problem. If $f_0$ is a steady state, then $f_0(x + tV, v + V)$ for any given velocity $V \in \mathbb{R}^3$ is a solution of the Vlasov–Poisson system which, for $V$ small, starts close to $f_0$ but travels away from $f_0$ at a linear rate in $t$. This trivial “instability” cannot be present for spherically symmetric perturbations. We handle it by comparing $f_0$ with an appropriate shift in $x$-space of the time-dependent perturbed solution $f(t)$. Technically, the necessity of this shift arises in the application of our compactness argument.
The considerations discussed so far are restricted to Casimir functionals satisfying a growth condition which excludes the polytropic case $k = 7/2$. This limiting case, the so-called Plummer sphere, is investigated in the final section. It poses additional difficulties due to a particular scaling invariance of the various functionals considered. By using rearrangements, however, we are able to reduce it to the same problem with symmetry, which has been investigated in [8]. This shows that it can be essential to understand the symmetric case first, and we hope that such a reduction to symmetry can be applied to other problems as well.

We conclude the introduction with some references, where we also compare our approach with other approaches to the stability problem. The first nonlinear stability result for the Vlasov–Poisson system in the present stellar dynamics case is due to G. Wolansky [22]. It is restricted to spherically symmetric perturbations of the polytropes
$$f_0(x, v) = (E_0 - E)_+^k L^l \qquad (1.4)$$
with exponents $l > -1$, $0 < k < l + 3/2$ with $k = -l - 1/2$. It uses a variational approach for a reduced functional which is not defined on a set of phase space densities $f$, but on a set of mass functions $M(r) := \int_{|y|\le r}\rho(y)\,dy$, with $r \ge 0$ denoting the radial coordinate. In particular, it does not yield a stability estimate directly for the phase space distribution $f$.

In [21] Y.-H. Wan proves stability by a careful investigation of the quadratic and higher-order terms in a Taylor expansion of the energy-Casimir functional about a steady state. He has to assume the existence of the steady state, and he requires a strong condition on $f_0$ which is satisfied by the polytropes only for $k = 1$ and $l = 0$. His arguments, however, do not require spherical symmetry of the admissible perturbations. We also mention [1], where stability for the limiting polytropic case $k = 7/2$ and $l = 0$ is considered.

Global classical solutions to the initial value problem for the Vlasov–Poisson system were first established in [15], cf. also [20]. A rigorous result on linearized stability is given in [3]. For the plasma physics case, where the sign in the Poisson equation is
reversed, the stability problem is better understood; we refer to [5, 11, 12, 16]. Finally, a very general condition which guarantees finite mass and compact support of steady states, but not their stability, is established in [19].

2. Preliminaries

For a measurable function $f = f(x, v)$ we define
$$\rho_f(x) := \int f(x, v)\,dv, \quad x \in \mathbb{R}^3, \qquad U_f := -\rho_f * \frac{1}{|\cdot|}.$$
Next we define
$$E_{\mathrm{kin}}(f) := \frac{1}{2}\iint |v|^2 f(x, v)\,dv\,dx,$$
$$E_{\mathrm{pot}}(f) := -\frac{1}{8\pi}\int |\nabla U_f(x)|^2\,dx = -\frac{1}{2}\iint \frac{\rho_f(x)\rho_f(y)}{|x - y|}\,dx\,dy,$$
$$\mathcal{H}(f) := E_{\mathrm{kin}}(f) + E_{\mathrm{pot}}(f), \qquad \mathcal{C}(f) := \iint Q(f(x, v))\,dv\,dx,$$
where $Q$ is a given function satisfying certain assumptions specified below. We will minimize the total energy or Hamiltonian $\mathcal{H}$ of the system under a mass-Casimir constraint, i. e., over the set
$$\mathcal{F}_M := \big\{ f \in L^1(\mathbb{R}^6) \mid f \ge 0,\ \mathcal{C}(f) = M,\ E_{\mathrm{kin}}(f) < \infty \big\}, \qquad (2.1)$$
where $M > 0$ is prescribed. The function $Q$ has to satisfy the following

Assumptions on $Q$: $Q \in C^1([0, \infty[)$, $Q \ge 0$, $Q(0) = 0$, and
(Q1) $Q(f) \ge C(f + f^{1+1/k})$, $f \ge 0$, with constants $C > 0$ and $0 < k < 7/2$,
(Q2) $Q$ is convex.

Remark. (a) In the last section we consider the limiting case $k = 7/2$, for which
$$Q(f) := f^{9/7}, \qquad f \ge 0.$$
(b) On their support the minimizers obtained later will satisfy the relation $\lambda_0 Q'(f_0) = E$, with some Lagrange multiplier $\lambda_0 < 0$ and $E$ as defined in (1.1). Hence $f_0$ is a function of the particle energy and thus a steady state of the Vlasov–Poisson system, provided this identity can be inverted.
(c) A typical example of a function $Q$ satisfying the assumptions is
$$Q(f) = f + f^{1+1/k}, \qquad f \ge 0, \qquad (2.2)$$
with $0 < k < 7/2$, which leads to a steady state of polytropic form (1.2). More generally, suppose an isotropic steady state $(f_0, U_0)$ is given with $f_0(x, v) = \phi(E)$ for some function $\phi$. Then the above assumptions for the Casimir functional hold if:

- $\phi(E)$ vanishes for $E$ larger than some cut-off energy $E_0$;
- $\phi(E) \le C(E_0 - E)^k$ for $E \le E_0$, where $0 < k < 7/2$;
- $\phi'(E) < 0$ for $E < E_0$.

The existence of a cut-off energy is necessary in order that the steady state has finite mass. The growth condition is essential for the compactness properties of $\mathcal{H}$; cf. the difficulties in the limiting case $k = 7/2$, and note also that the polytropic ansatz with $k > 7/2$ leads to steady states with infinite mass. Finally, it is generally believed that steady states are unstable if the monotonicity condition on $\phi$ is violated sufficiently strongly. In this sense one can say that the assumptions on $Q$ are optimal.

(d) The function
$$Q(f) = \begin{cases} f, & 0 \le f \le 1,\\ \tfrac{1}{2}(f^2 + 1), & f > 1 \end{cases} \qquad (2.3)$$
also satisfies our assumptions and leads to
$$f_0(x, v) = \begin{cases} E/E_0, & E < E_0,\\ 0, & E \ge E_0 \end{cases} \qquad (2.4)$$
with some $E_0 < 0$. Thus, because we do not require $Q \in C^2(]0, \infty[)$ with $Q'' > 0$, we obtain examples where $f_0$ has jump discontinuities. These steady states will turn out to be dynamically stable as well.

We collect some estimates for $\rho_f$ and $U_f$ induced by an element $f \in \mathcal{F}_M$. As in the rest of the paper, constants denoted by $C$ are positive, may depend on $Q$ and $M$, and their value may change from line to line.

Lemma 1. Let $n := k + 3/2$, so that $1 + 1/n > 6/5$. Then for any $f \in \mathcal{F}_M$ the following holds:
(a) $f \in L^{1+1/k}(\mathbb{R}^6)$ with
$$\iint f^{1+1/k}\,dv\,dx + \iint f\,dv\,dx \le C.$$
(b) $\rho_f \in L^{1+1/n}(\mathbb{R}^3)$ with
$$\int \rho_f^{1+1/n}\,dx \le C\left(\iint f^{1+1/k}\,dv\,dx\right)^{k/n}\left(\iint |v|^2 f\,dv\,dx\right)^{(n-k)/n} \le C\,E_{\mathrm{kin}}(f)^{3/(2n)}.$$
(c) $U_f \in L^6(\mathbb{R}^3)$ with $\nabla U_f \in L^2(\mathbb{R}^3)$, the two forms of $E_{\mathrm{pot}}(f)$ stated above are equal, and
$$\int |\nabla U_f|^2\,dx \le C\,\|\rho_f\|_{6/5}^2 \le C\,E_{\mathrm{kin}}(f)^{1/2}.$$
The assertions in (b) and (c) remain valid in the limiting case $k = 7/2$, where $n = 5$, cf. Remark (a) above.

Proof. Part (a) is obvious from assumption (Q1). Splitting the $v$-integral according to $|v| \le R$ and $|v| > R$ and optimizing in $R$ yields
$$\rho_f^{1+1/n} \le C\left(\int f^{1+1/k}\,dv\right)^{k/n}\left(\int |v|^2 f\,dv\right)^{(n-k)/n}.$$
The first estimate in (b) therefore follows from Hölder’s inequality with indices $n/k$ and $n/(n-k)$, and part (a) implies the second estimate in (b). Since $\rho_f \in L^1 \cap L^{1+1/n}(\mathbb{R}^3)$ and $1 + 1/n > 6/5$, we find by interpolation
$$\int \rho_f^{6/5}\,dx \le C\,E_{\mathrm{kin}}(f)^{3/10};$$
in the limiting case this follows directly without interpolation. The estimates for $U_f$ follow from the generalized Young’s inequality. The equality of the two representations for $E_{\mathrm{pot}}(f)$ follows by integration by parts, after regularizing $\rho_f$ if necessary.

As an immediate corollary of the lemma above we note that on $\mathcal{F}_M$ the total energy $\mathcal{H}$ is bounded from below. Moreover, $E_{\mathrm{kin}}$ – and thus certain norms of $f$ and $\rho_f$ – remain bounded along minimizing sequences:

Lemma 2. There exists a constant $C > 0$ such that
$$\mathcal{H}(f) \ge E_{\mathrm{kin}}(f) - C\,E_{\mathrm{kin}}(f)^{1/2}, \qquad f \in \mathcal{F}_M;$$
in particular,
$$h_M := \inf_{\mathcal{F}_M}\mathcal{H} > -\infty,$$
and $E_{\mathrm{kin}}$ is bounded along minimizing sequences of $\mathcal{H}$ in $\mathcal{F}_M$.

The behavior of $\mathcal{H}$ and $\mathcal{C}$ under scaling transformations can be used to show that $h_M$ is negative and to relate the $h_M$’s for different values of $M$:

Lemma 3. (a) Let $M > 0$. Then $-\infty < h_M < 0$.
(b) For all $M, \overline{M} > 0$,
$$h_{\overline{M}} = \left(\overline{M}/M\right)^{7/3} h_M.$$
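The exponents in Lemmas 1 and 3 are pure bookkeeping and can be recomputed with exact rational arithmetic. The sketch below is our illustration, uses only the standard library, and its function names are hypothetical.

It follows the proof of Lemma 1:

1. It obtains $\int\rho_f^{1+1/n} \le C E_{\mathrm{kin}}^{3/(2n)}$.
2. It interpolates $L^{6/5}$ between $L^1$ and $L^{1+1/n}$.
3. It checks that the resulting powers of $E_{\mathrm{kin}}$ are $3/10$ and $1/2$ for every admissible $k$.

It also redoes the scaling from the proof of Lemma 3 with $b = a^{-2}$, which yields the exponent $7/3$.

```python
from fractions import Fraction as F


def lemma1_exponents(k):
    """Powers of E_kin produced in the proof of Lemma 1, for 0 < k <= 7/2 (k a Fraction)."""
    n = k + F(3, 2)                    # n := k + 3/2
    q = 1 + 1 / n                      # rho_f is estimated in L^q
    # (b): int rho^q <= C (iint f^{1+1/k})^{k/n} (iint |v|^2 f)^{(n-k)/n}
    exp_q = (n - k) / n                # power of E_kin in int rho^q, equal to 3/(2n)
    # interpolation: 1/p = theta/1 + (1 - theta)/q with p = 6/5
    p = F(6, 5)
    theta = (1 / p - 1 / q) / (1 - 1 / q)
    # int rho^p <= ||rho||_1^(theta p) ||rho||_q^((1 - theta) p), with ||rho||_1 bounded
    # and ||rho||_q <= C E_kin^(exp_q / q)
    exp_65 = (1 - theta) * p * exp_q / q
    # (c): int |grad U|^2 <= C ||rho||_{6/5}^2 = C (int rho^{6/5})^(2/p)
    exp_grad = exp_65 * 2 / p
    return n, exp_q, theta, exp_65, exp_grad


def lemma3_scaling():
    """Powers of a for f_bar(x, v) = f(a x, b v) once b = a^(-2) is imposed."""
    # (power of a, power of b) for C, E_kin, E_pot, as in the proof of Lemma 3
    casimir, kinetic, potential = (-3, -3), (-3, -5), (-5, -6)
    to_a = lambda e: e[0] - 2 * e[1]
    c, kin, pot = to_a(casimir), to_a(kinetic), to_a(potential)
    # M_bar = a^c M and, since kin == pot, H(f_bar) = a^kin H(f);
    # hence h_{M_bar} = (M_bar / M)^(kin / c) h_M
    return c, kin, pot, F(kin, c)


if __name__ == "__main__":
    for k in (F(1, 2), F(3, 2), F(7, 2)):
        print(k, lemma1_exponents(k))
    print(lemma3_scaling())
```

Exact fractions are used so that the $n$-independence of the exponent $3/10$ is an equality check, not a floating-point approximation.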
Proof. Given any function $f$, we define a rescaled function $\bar f(x, v) = f(ax, bv)$, where $a, b > 0$. Then
$$\mathcal{C}(\bar f) = \iint Q(f(ax, bv))\,dv\,dx = (ab)^{-3}\,\mathcal{C}(f), \qquad (2.5)$$
i. e., $f \in \mathcal{F}_M$ iff $\bar f \in \mathcal{F}_{\overline{M}}$, where $\overline{M} := (ab)^{-3}M$. Next,
$$E_{\mathrm{kin}}(\bar f) = \frac{1}{2}\iint |v|^2 f(ax, bv)\,dv\,dx = a^{-3}b^{-5}E_{\mathrm{kin}}(f),$$
$$E_{\mathrm{pot}}(\bar f) = -\frac{1}{2}\iiiint \frac{f(ax, bv)\,f(ay, bw)}{|x - y|}\,dw\,dv\,dy\,dx = a^{-5}b^{-6}E_{\mathrm{pot}}(f).$$
To prove (a) we fix any $f \in \mathcal{F}_M$ and let $a = b^{-1}$, so that $\bar f \in \mathcal{F}_M$ as well. Then
$$\mathcal{H}(\bar f) = b^{-2}E_{\mathrm{kin}}(f) + b^{-1}E_{\mathrm{pot}}(f) < 0$$
for $b > 0$ sufficiently large, since $E_{\mathrm{pot}}(f) < 0$. To prove (b), choose $a$ and $b$ such that $a^{-3}b^{-5} = a^{-5}b^{-6}$, i. e., $b = a^{-2}$. Then
$$\mathcal{H}(\bar f) = a^7\,\mathcal{H}(f). \qquad (2.6)$$
Since $a = (\overline{M}/M)^{1/3}$ and the mapping $\mathcal{F}_M \to \mathcal{F}_{\overline{M}}$, $f \mapsto \bar f$, is one-to-one and onto, this proves (b). One should note that both Lemma 2 and Lemma 3 remain valid in the limiting case $k = 7/2$.

3. Existence of Minimizers for k < 7/2

It is conceivable that along a minimizing sequence the mass could run off to spatial infinity and/or spread uniformly in space. The main problem in proving the existence of a minimizer is to show that this does not happen, which is done in the next lemma. Combined with a local compactness result for the induced fields and a new version of the splitting technique developed in our previous papers, this will yield the existence of minimizers.

Lemma 4. Let $(f_i) \subset \mathcal{F}_M$ be a minimizing sequence of $\mathcal{H}$. Then there exist a sequence $(a_i) \subset \mathbb{R}^3$ and $\varepsilon_0 > 0$, $R_0 > 0$ such that
$$\int_{a_i + B_{R_0}}\int Q(f_i)\,dv\,dx \ge \varepsilon_0$$
for all sufficiently large $i \in \mathbb{N}$. Here we define $B_R := \{x \in \mathbb{R}^3 \mid |x| \le R\}$.

Proof. For $R > 1$ define
$$K_R(x) := \begin{cases} R, & |x| < 1/R,\\ 1/|x|, & 1/R \le |x| \le R,\\ 0, & |x| > R,\end{cases}$$
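and
$$F_R(x) := \frac{1}{|x|}\mathbf{1}_{\{|x|>R\}}(x), \qquad G_R(x) := \left(\frac{1}{|x|} - R\right)\mathbf{1}_{\{|x|<1/R\}}(x)$$
…
$$\ldots > \mathcal{H}(f_i) \ge -|I_1| - |I_2| - |I_3|, \qquad (3.4)$$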
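provided $i$ is sufficiently large. Therefore,
$$\liminf_{i\to\infty}\sup_{y\in\mathbb{R}^3}\left(\int_{y+B_R}\int f_i^{1+1/k}\,dv\,dx\right)^{k/(n+1)} \ge C\,\liminf_{i\to\infty}|I_1|\,R^{-(n+4)/(n+1)} \ge C\,R^{-(n+4)/(n+1)}\left(-\frac{h_M}{2} - R^{-1} - R^{-(5-n)/(n+1)}\right). \qquad (3.5)$$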
By Lemma 3 (a) the right-hand side of this estimate is positive for R sufficiently large, and the proof is complete.
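Lemma 5. Let $(\rho_i) \subset L^1 \cap L^{1+1/n}(\mathbb{R}^3)$ be bounded with respect to both norms, and let $\rho_0 \in L^1 \cap L^{1+1/n}(\mathbb{R}^3)$ with $\rho_i \rightharpoonup \rho_0$ weakly in $L^{1+1/n}(\mathbb{R}^3)$. Then for any $R > 0$,
$$\nabla U_{\mathbf{1}_{B_R}\rho_i} \to \nabla U_{\mathbf{1}_{B_R}\rho_0} \quad \text{strongly in } L^2(\mathbb{R}^3).$$
Proof. Take any $R' > R$. By assumption on $k$ we have $1 + 1/n \in\, ]6/5, 5/3[$, so the mapping
$$L^{1+1/n}(\mathbb{R}^3) \ni \rho \mapsto \mathbf{1}_{B_{R'}}\nabla U_\rho \in L^2(B_{R'})$$
is compact. Thus the asserted strong convergence holds on $B_{R'}$. On the other hand,
$$\int_{|x| \ge R'} |\nabla U_{\mathbf{1}_{B_R}\rho_i}|^2\,dx \le \frac{C}{R' - R}\,\|\rho_i\|_1^2 \le \frac{C}{R' - R}, \qquad i \in \mathbb{N}\cup\{0\},$$
which is arbitrarily small for $R'$ large.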
We are now ready to show the existence of a minimizer of $\mathcal{H}$.

Theorem 1. Let $M > 0$ and let $(f_i) \subset \mathcal{F}_M$ be a minimizing sequence of $\mathcal{H}$. Then there is a minimizer $f_0 \in \mathcal{F}_M$, a subsequence (still denoted by $(f_i)$), and a sequence of translations $T_i f_i(x, v) = f_i(x + a_i, v)$ with $(a_i) \subset \mathbb{R}^3$, such that
$$\mathcal{H}(f_0) = \inf_{\mathcal{F}_M}\mathcal{H} = h_M$$
and $T_i f_i \rightharpoonup f_0$ weakly in $L^{1+1/k}(\mathbb{R}^6)$. For the induced potentials we have $\nabla U_{T_i f_i} \to \nabla U_0$ strongly in $L^2(\mathbb{R}^3)$.

Remark. Without admitting shifts in $x$-space the assertion of the theorem is wrong. Start from a given minimizer $f_0$ and a sequence of shift vectors $(a_i) \subset \mathbb{R}^3$. The sequence $(T_i f_0)$ is minimizing and lies in $\mathcal{F}_M$, but if $|a_i| \to \infty$ it converges weakly to zero, which is not in $\mathcal{F}_M$.

Proof of Theorem 1. Let $(f_i)$ be a minimizing sequence and $(a_i) \subset \mathbb{R}^3$ such that the assertion of Lemma 4 holds. Since $\mathcal{H}$ is translation invariant, $(T_i f_i)$ is again a minimizing sequence. By Lemma 1 (a), $(T_i f_i)$ is bounded in $L^{1+1/k}(\mathbb{R}^6)$. Thus there exists a weakly convergent subsequence, denoted by $(T_i f_i)$ again: $T_i f_i \rightharpoonup f_0$ weakly in $L^{1+1/k}(\mathbb{R}^6)$. Clearly, $f_0 \ge 0$ a. e.

By Lemma 2, $(E_{\mathrm{kin}}(T_i f_i))$ is bounded. So by Lemma 1, $(\rho_i) = (\rho_{T_i f_i})$ is bounded in $L^{1+1/n}(\mathbb{R}^3)$, and by assumption (Q1) this sequence is also bounded in $L^1(\mathbb{R}^3)$. After extracting a further subsequence, $\rho_i \rightharpoonup \rho_0 := \rho_{f_0}$ weakly in $L^{1+1/n}(\mathbb{R}^3)$. Also, by weak convergence,
$$E_{\mathrm{kin}}(f_0) \le \liminf_{i\to\infty} E_{\mathrm{kin}}(T_i f_i) < \infty.$$
By (Q2) the functional $\mathcal{C}$ is convex. Thus by Mazur’s Lemma and Fatou’s Lemma
$$\mathcal{C}(f_0) \le \limsup_{i\to\infty}\mathcal{C}(T_i f_i) = M;$$
in particular, $\rho_0 \in L^1(\mathbb{R}^3)$ by (Q1). The key step is to show that, up to a subsequence,
$$\|\nabla U_{T_i f_i} - \nabla U_0\|_2 \to 0. \qquad (3.6)$$
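For $R_0 < R$ we denote $B_{R_0,R} := \{x \in \mathbb{R}^3 \mid R_0 \le |x| \le R\}$, and we split $T_i f_i$ as follows:
$$T_i f_i = T_i f_i\,\mathbf{1}_{B_{R_0}\times\mathbb{R}^3} + T_i f_i\,\mathbf{1}_{B_{R_0,R}\times\mathbb{R}^3} + T_i f_i\,\mathbf{1}_{B_{R,\infty}\times\mathbb{R}^3} =: f_i^1 + f_i^2 + f_i^3. \qquad (3.7)$$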
Due to Lemma 5, $\nabla U_{f_i^1 + f_i^2}$ converges strongly in $L^2$ for any fixed $R$. It thus suffices to show that for any $\varepsilon > 0$,
$$\liminf_{i\to\infty}\int |\nabla U_{f_i^3}|^2\,dx < \varepsilon \qquad (3.8)$$
for sufficiently large $R$. By Lemma 1 (b) we only need to show that
$$\liminf_{i\to\infty}\iint Q(f_i^3)\,dv\,dx < \varepsilon \qquad (3.9)$$
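for sufficiently large $R$. We use the method of splitting to verify (3.9). According to (3.7),
$$\mathcal{H}(T_i f_i) = \mathcal{H}(f_i^1) + \mathcal{H}(f_i^2) + \mathcal{H}(f_i^3) - \iint \frac{\rho_i^2(x)(\rho_i^1 + \rho_i^3)(y)}{|x - y|}\,dx\,dy - \iint \frac{\rho_i^1(x)\rho_i^3(y)}{|x - y|}\,dx\,dy =: \mathcal{H}(f_i^1) + \mathcal{H}(f_i^2) + \mathcal{H}(f_i^3) - I_1 - I_2, \qquad (3.10)$$
with obvious definitions for $\rho_i^1$, $\rho_i^2$, $\rho_i^3$. The boundedness of $\|\nabla U_{\rho_i^1 + \rho_i^3}\|_2$ implies that
$$I_1 \le C\,\|\nabla U_{\rho_i^2}\|_2.$$
Since $\rho_i^2$ converges weakly in $L^{1+1/n}$ to $\rho_0^2 := \rho_0\mathbf{1}_{B_{R_0,R}}$, Lemma 5 gives
$$\|\nabla U_{\rho_i^2} - \nabla U_{\rho_0^2}\|_2 \to 0, \qquad i \to \infty.$$
For $R > 2R_0$ we use Hölder’s inequality to estimate $I_2$ as follows:
$$I_2 \le 2\int_{B_{R_0}}\rho_i(x)\,dx\int_{B_{R,\infty}}|y|^{-1}\rho_i(y)\,dy \le C\left(\frac{R_0}{R}\right)^{1/2}\|\rho_i\|_{6/5}.$$
It is a simple calculus exercise to show that
$$\tau^{7/3} + (1 - \tau)^{7/3} \le 1 - \frac{7}{3}\tau(1 - \tau), \qquad \tau \in [0, 1]. \qquad (3.11)$$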
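With Lemma 3 and obvious definitions of $M_i^1$, $M_i^2$, $M_i^3$ this implies that
$$\begin{aligned}
\mathcal{H}(f_i^1) + \mathcal{H}(f_i^2) + \mathcal{H}(f_i^3) &\ge h_{M_i^1} + h_{M_i^2} + h_{M_i^3} = \left[\left(\frac{M_i^1}{M}\right)^{7/3} + \left(\frac{M_i^2}{M}\right)^{7/3} + \left(\frac{M_i^3}{M}\right)^{7/3}\right] h_M\\
&\ge \left[\left(\frac{M_i^1 + M_i^2}{M}\right)^{7/3} + \left(\frac{M_i^3}{M}\right)^{7/3}\right] h_M \ge \left[1 - \frac{7}{3}\,\frac{M_i^1 + M_i^2}{M}\,\frac{M_i^3}{M}\right] h_M
\end{aligned}$$
and thus
$$h_M - \mathcal{H}(T_i f_i) - C_1 h_M M_i^1 M_i^3 \le I_1 + I_2 \le C_2\left(\|\nabla U_{f_0^2}\|_2 + \|\nabla U_{\rho_i^2} - \nabla U_{\rho_0^2}\|_2 + \left(\frac{R_0}{R}\right)^{1/2}\right).$$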
Here $R > 2R_0$ are so far arbitrary, and the constants $C_1$, $C_2$ are independent of $R$ and $R_0$. Now assume (3.9) were false. Then there exists $\varepsilon_1 > 0$ such that for every $R > 0$ and $i$ large we have
$$\iint Q(f_i^3)\,dv\,dx \ge \varepsilon_1. \qquad (3.12)$$
Define $\varepsilon_2 := -C_1 h_M \varepsilon_0 \varepsilon_1 > 0$, where $\varepsilon_0$ is as in Lemma 4. Increase $R_0$ from that lemma such that $C_2\|\nabla U_{f_0^2}\|_2 \le \varepsilon_2/4$. Next choose $R > 2R_0$ such that $C_2(R_0/R)^{1/2} \le \varepsilon_2/4$. Then for $i$ large,
$$h_M - \mathcal{H}(T_i f_i) + \varepsilon_2 \le h_M - \mathcal{H}(T_i f_i) - C_1 h_M M_i^1 M_i^3 \le \frac{1}{2}\varepsilon_2 + C_2\|\nabla U_{\rho_i^2} - \nabla U_{\rho_0^2}\|_2.$$
By (3.11) this contradicts the fact that $(T_i f_i)$ is minimizing. Thus (3.9) holds, and (3.6) follows.

Clearly $\mathcal{H}(f_0) \le \lim_i \mathcal{H}(T_i f_i)$, and it remains to show that $\mathcal{C}(f_0) = M$. Assume that $M_0 := \mathcal{C}(f_0) < M$. Note that $M_0 > 0$, since otherwise $f_0 = 0$, in contradiction to $\mathcal{H}(f_0) < 0$. Let
$$b := \left(\frac{M_0}{M}\right)^{2/3} < 1, \qquad a := b^{-1/2},$$
so that $\bar f_0 \in \mathcal{F}_M$ by (2.5). Then by (2.6),
$$\mathcal{H}(\bar f_0) = a^7\mathcal{H}(f_0) \le b^{-7/2} h_M < h_M,$$
a contradiction; recall that $b < 1$ and $h_M < 0$.
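Inequality (3.11) is the only elementary input to the splitting step above. A quick grid check confirms it on $[0,1]$; this is a check, not a proof, uses only the standard library, and its names are hypothetical.

The last helper redoes the rescaling at the end of the proof. There $b = (M_0/M)^{2/3}$ and $a = b^{-1/2}$, so by (2.5) the rescaled $\bar f_0$ has Casimir value $M$, and by (2.6) its energy picks up the factor $a^7 > 1$.

```python
def gap_3_11(tau):
    """Right-hand side minus left-hand side of (3.11); (3.11) says this is >= 0."""
    lhs = tau ** (7.0 / 3.0) + (1.0 - tau) ** (7.0 / 3.0)
    rhs = 1.0 - (7.0 / 3.0) * tau * (1.0 - tau)
    return rhs - lhs


def min_gap(num_points=100_001):
    """Smallest value of gap_3_11 on a uniform grid of [0, 1] (a check, not a proof)."""
    return min(gap_3_11(i / (num_points - 1)) for i in range(num_points))


def rescaled_mass(M0, M):
    """Casimir value of f0_bar(x, v) = f0(a x, b v) by (2.5), and the factor a^7 from (2.6)."""
    b = (M0 / M) ** (2.0 / 3.0)
    a = b ** -0.5
    return (a * b) ** -3 * M0, a ** 7


if __name__ == "__main__":
    print(min_gap(), gap_3_11(0.5), rescaled_mass(0.3, 1.0))
```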
4. Properties of Minimizers

The purpose of the present section is to show that the minimizers obtained in the previous one are indeed steady states of the Vlasov–Poisson system.

Theorem 2. Let $f_0 \in \mathcal{F}_M$ be a minimizer of $\mathcal{H}$. Then
$$f_0(x, v) = \begin{cases}\phi(E), & E < E_0,\\ 0, & E \ge E_0\end{cases} \qquad \text{a. e.},$$
where
$$E := \frac{1}{2}|v|^2 + U_0(x), \qquad E_0 := \lambda_0 Q'(0), \qquad \lambda_0 := \frac{\iint E f_0\,dv\,dx}{\iint Q'(f_0) f_0\,dv\,dx} < 0,$$
$U_0$ is the potential induced by $f_0$, and
$$\phi(E) := \inf\{f \ge 0 \mid Q'(f) = E/\lambda_0\}, \qquad E \le E_0.$$
In particular, $f_0$ is a steady state of the Vlasov–Poisson system.

Remark. (a) The Euler–Lagrange equation for our constrained minimization problem will give us the relation $\lambda_0 Q'(f_0) = E$ on $f_0^{-1}(]0, \infty[)$, which we want to invert by means of the function $\phi$. Clearly, if $Q'$ is strictly increasing, then $\phi(E) = (Q')^{-1}(E/\lambda_0)$, $E \le E_0$.
(b) Under our general assumption on $Q$, the function $Q' : [0, \infty[ \to [Q'(0), \infty[$ is continuous, increasing, and onto. This implies that for every $\eta \ge Q'(0)$ the set $(Q')^{-1}(\eta)$ is a closed, bounded interval. Moreover, there exists an at most countable set $V_{\mathrm{crit}}$ such that $(Q')^{-1}(\eta)$ consists of one point for $\eta \notin V_{\mathrm{crit}}$. The function $\phi$ is decreasing with $\phi(]-\infty, E_0]) = [0, \infty[$, and for $f \in [0, \infty[$ with $\lambda_0 Q'(f) \notin V_{\mathrm{crit}}$ we have $\phi(\lambda_0 Q'(f)) = f$ as desired.
(c) In the example given by (2.3),
$$Q'(f) = \begin{cases} 1, & 0 \le f \le 1,\\ f, & f > 1.\end{cases}$$
This $Q'$ is not one-to-one on $[0, \infty[$, but the Euler–Lagrange equation can still be inverted to yield (2.4).
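The remarks above suggest a direct way to evaluate $\phi(E) = \inf\{f \ge 0 \mid Q'(f) = E/\lambda_0\}$ even when $Q'$ has flat pieces. Since $Q'$ is continuous, nondecreasing and onto $[Q'(0), \infty[$, this infimum equals the smallest $f \ge 0$ with $Q'(f) \ge E/\lambda_0$, which bisection finds.

The following standard-library sketch is illustrative: the function name, the tolerances and the sample value $\lambda_0 = -2$ are our assumptions. It recovers two profiles:

- for (2.2), the polytropic profile $\phi(E) = \big(\tfrac{k}{k+1}\,\tfrac{E_0 - E}{|\lambda_0|}\big)^k$ with $E_0 = \lambda_0 Q'(0) = \lambda_0$;
- for (2.3), the profile $E/E_0$ of (2.4), including its jump at $E_0$.

```python
def phi(E, lam0, dQ, f_max=1e12, tol=1e-12):
    """phi(E) = inf{f >= 0 : Q'(f) = E / lam0} for E <= E0 = lam0 * Q'(0), lam0 < 0.

    Assumes dQ (the derivative Q') is continuous, nondecreasing and unbounded,
    so the infimum is the smallest f >= 0 with Q'(f) >= E / lam0.
    """
    eta = E / lam0
    if dQ(0.0) >= eta:            # E = E0: the infimum is f = 0
        return 0.0
    lo, hi = 0.0, 1.0
    while dQ(hi) < eta:           # grow the bracket, keeping Q'(lo) < eta
        lo, hi = hi, 2.0 * hi
        if hi > f_max:
            raise ValueError("Q' does not reach E / lam0 below f_max")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if dQ(mid) >= eta:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    lam0, k = -2.0, 1.5                                          # sample values, E0 = lam0
    dQ_poly = lambda f: 1.0 + (1.0 + 1.0 / k) * f ** (1.0 / k)   # Q' for (2.2)
    dQ_jump = lambda f: 1.0 if f <= 1.0 else f                    # Q' for (2.3)
    for E in (-6.0, -3.0, -2.0 - 1e-9, -2.0):
        exact = (k / (k + 1) * (lam0 - E) / abs(lam0)) ** k
        print(E, phi(E, lam0, dQ_poly), exact, phi(E, lam0, dQ_jump))
```

Bisection on the monotone predicate $Q'(f) \ge E/\lambda_0$ automatically selects the left endpoint of any interval on which $Q'$ is constant. This is what produces the jump of (2.4) at $E_0$.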
Proof of Theorem 2. Let f0 and U0 be a pointwise defined representative of a minimizer of H in FM and its induced potential respectively; to derive the Euler–Lagrange relation we will argue first on f0−1 (]0, ∞[) and then on the complement. For $ > 0 small, 1 6 K$ := (x, v) ∈ R | $ ≤ f0 (x, v) ≤ $ defines a set of positive, finite measure. Let w ∈ L∞ (R6 ) be compactly supported and non-negative outside K$ , and define G(σ, τ ) := Q(f0 + σ 1K$ + τ w) dv dx; for τ and σ close to zero, τ ≥ 0, the function f0 + σ 1K$ + τ w is bounded on K$ , and non-negative otherwise. Therefore, G is continuously differentiable for such τ and σ , and G(0, 0) = M. Since ∂σ G(0, 0) = Q (f0 ) dv dx = 0, K$
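there exists by the implicit function theorem a continuously differentiable function $\tau \mapsto \sigma(\tau)$ with $\sigma(0) = 0$, defined for $\tau \ge 0$ small, such that $G(\sigma(\tau),\tau) = M$. Hence $f_0 + \sigma(\tau) 1_{K_\varepsilon} + \tau w \in \mathcal{F}_M$. Furthermore,
$$\sigma'(0) = -\frac{\partial_\tau G(0,0)}{\partial_\sigma G(0,0)} = -\frac{\iint Q'(f_0)\,w\,dv\,dx}{\iint_{K_\varepsilon} Q'(f_0)\,dv\,dx}. \tag{4.1}$$
Since $\mathcal{H}(f_0 + \sigma(\tau)1_{K_\varepsilon} + \tau w)$ attains its minimum at $\tau = 0$, Taylor expansion implies
$$0 \le \mathcal{H}(f_0 + \sigma(\tau)1_{K_\varepsilon} + \tau w) - \mathcal{H}(f_0) = \tau \iint E\,[\sigma'(0)\,1_{K_\varepsilon} + w]\,dv\,dx + o(\tau)$$
for $\tau \ge 0$ small. With (4.1) we get
$$\iint [-\lambda_\varepsilon Q'(f_0) + E]\,w\,dv\,dx \ge 0,$$
where
$$\lambda_\varepsilon := \frac{\iint_{K_\varepsilon} E\,dv\,dx}{\iint_{K_\varepsilon} Q'(f_0)\,dv\,dx}. \tag{4.2}$$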
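By our choice of $w$ this implies that $E = \lambda_\varepsilon Q'(f_0)$ a.e. on $K_\varepsilon$ and $E \ge \lambda_\varepsilon Q'(f_0)$ otherwise. This shows that $\lambda_\varepsilon = \lambda_0$ does not, in fact, depend on $\varepsilon$. Letting $\varepsilon \to 0$, we conclude that
$$E = \lambda_0 Q'(f_0) \quad \text{a.e. on } f_0^{-1}(]0,\infty[), \tag{4.3}$$
$$E \ge \lambda_0 Q'(0) = E_0 \quad \text{a.e. on } f_0^{-1}(0). \tag{4.4}$$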
620
Y. Guo, G. Rein
If we multiply (4.3) by $f_0$ and integrate, we obtain the asserted formula for $\lambda_0$, and $\lambda_0 < 0$ as claimed, since
$$\iint E\,f_0\,dv\,dx = E_{\mathrm{kin}}(f_0) - 2E_{\mathrm{pot}}(f_0) < \mathcal{H}(f_0) < 0.$$
We need to invert (4.3). Let $V_{\mathrm{crit}}$ be the at most countable set of values $\eta$ for which $(Q')^{-1}(\eta)$ contains more than one point, cf. part (b) of the remark above. For any constant $\eta \in \mathbb{R}$ the set $E^{-1}(\eta)$ has measure zero (for fixed $x$ it is a sphere in $v$-space). Hence, as a countable union of such sets,
$$S_{\mathrm{crit}} := \{(x,v) \in \mathbb{R}^6 \mid E(x,v)/\lambda_0 \in V_{\mathrm{crit}}\}$$
is a set of measure zero, and on $f_0^{-1}(]0,\infty[) \setminus S_{\mathrm{crit}}$ the Euler–Lagrange equation (4.3) can be inverted to yield $f_0(x,v) = \varphi(E)$ as claimed, cf. part (a) of the remark above. Together with (4.4) this proves that $f_0$ is a.e. equal to a function of the particle energy $E$.

Next we study the regularity, symmetry, and uniqueness of minimizers. Let $C_c^m$ and $C_b^m$ denote the spaces of $C^m$ functions with compact support and with bounded derivatives up to order $m$, respectively.
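For the reader's convenience, the sign argument for $\lambda_0$ can be written out in one line. This restatement assumes the conventions implicit in (5.1) below: $\mathcal{H} = E_{\mathrm{kin}} - E_{\mathrm{pot}}$ with $E_{\mathrm{pot}}(f) = \frac{1}{8\pi}\|\nabla U_f\|_2^2$ and $\Delta U_f = 4\pi\rho_f$. It also assumes $Q' > 0$ on $]0,\infty[$, so that the integral on the left is positive:

```latex
\lambda_0 \iint Q'(f_0)\, f_0 \,dv\,dx
  = \iint E\, f_0 \,dv\,dx
  = E_{\mathrm{kin}}(f_0) + \int U_0\,\rho_0\,dx
  = E_{\mathrm{kin}}(f_0) - \frac{1}{4\pi}\|\nabla U_0\|_2^2
  = E_{\mathrm{kin}}(f_0) - 2E_{\mathrm{pot}}(f_0)
  < \mathcal{H}(f_0) < 0 .
```

Dividing by the positive integral $\iint Q'(f_0) f_0\,dv\,dx$ expresses $\lambda_0$ as a quotient of two integrals and shows $\lambda_0 < 0$.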
Theorem 3. (a) Let $f_0 \in \mathcal{F}_M$ be a minimizer of $\mathcal{H}$. Then $f_0$ is spherically symmetric with respect to some point in $x$-space.
(b) If $k \ge 1/2$, assume in addition that
$$\varphi(E) \le C(-E)^k, \quad E \to -\infty,$$
where $\varphi$ is defined by $Q$ as in Theorem 2; this condition is compatible with the general assumptions on $Q$. Then $U_0 \in C_b^2(\mathbb{R}^3)$ with $\lim_{|x|\to\infty} U_0(x) = 0$ and $\rho_0 \in C_c^1(\mathbb{R}^3)$.
(c) If in particular $Q(f) = f + f^{1+1/k}$, $f \ge 0$, with $0 < k < 7/2$, then up to a shift in $x$-space there are at most two minimizers of $\mathcal{H}$ in $\mathcal{F}_M$.

Proof. To prove the spherical symmetry of $f_0$, we denote by $f_0^*$ the spherically symmetric rearrangement of $f_0$ with respect to $x$, so that $\rho_{f_0^*}$ is the spherically symmetric rearrangement of $\rho_0$. Clearly, $f_0^* \in \mathcal{F}_M$, so that $\mathcal{H}(f_0^*) \ge \mathcal{H}(f_0)$. Since $E_{\mathrm{kin}}(f_0^*) = E_{\mathrm{kin}}(f_0)$, this implies that $E_{\mathrm{pot}}(f_0^*) \le E_{\mathrm{pot}}(f_0)$. Riesz' rearrangement inequality gives the reverse inequality, so these two terms must be equal, and $\rho_0$ must be spherically symmetric with respect to some point in $\mathbb{R}^3$, cf. [13, Thms. 3.7 and 3.9]. By definition, $U_0$ is symmetric as well, and by Theorem 2 so is $f_0$.

To prove part (b), consider first the case $k < 1/2$, i.e., $n < 2$. Then $\rho_0 \in L^p \cap L^1$ with $p = 1 + 1/n > 3/2$. The usual $L^p$-regularity theory and Sobolev's embedding theorem imply that $U_0 \in W^{2,p}_{\mathrm{loc}}(\mathbb{R}^3) \subset C(\mathbb{R}^3)$. Moreover, for any $R > 0$ and $x \in \mathbb{R}^3$,
$$-U_0(x) = \int \frac{\rho_0(y)}{|x-y|}\,dy + \cdots + \cdots$$
[…] $3/2$. To show this, we use a bootstrap argument based on (4.5). For this to work, we need some control on the growth of the function $h_\varphi$, which is the reason for the extra assumption on $\varphi$. Indeed, under that assumption the following estimate holds:
$$h_\varphi(u) \le C\bigl(1 + (E_0 - u)^n\bigr), \quad u \le E_0.$$
We use this estimate on the set where $\rho_0$ is large (this set has finite measure), and the integrability of $\rho_0$ on the complement. This gives
$$\int \rho_0(x)^p\,dx \le C + \int (-U_0(x))^{np}\,dx. \tag{4.7}$$
If we were to pick the limiting case $n = 5$, i.e., $\rho_0 \in L^{6/5}$, we would find by Young's inequality that $U_0 \in L^6$, and bootstrapping this via (4.7) gives us $\rho_0 \in L^{6/5}$ back. However, for $n < 5$ this works better. Starting with $p_0 = 1 + 1/n$, we apply Young's inequality to find that $U_0$ lies in $L^q$ with $q = (1/p_0 - 2/3)^{-1} > 1$. Substituting this into (4.7), we conclude that $\rho_0 \in L^{p_1}$ with $p_1 = q/n$; note that by assumption $p_0 < 3/2$. If $p_1 > 3/2$ we are done. If $p_1 = 3/2$, we decrease $p_1$ slightly (note that $\rho_0 \in L^1$), so that in the next bootstrap step we find $p_2$ as large as we wish. If $p_1 < 3/2$, we repeat the process. By induction one sees that
$$p_k = \frac{3(1 + 1/n)(n - 1)}{n^k(n - 5) + 2n + 2} > 1$$
as long as $p_{k-1} < 3/2$. But since $2 \le n < 5$, the denominator would eventually become negative, so the process must stop after finitely many steps.

As to part (c), we first observe that up to some shift $U_0$, as a function of the radial variable $r := |x|$, solves the equation
$$\frac{1}{r^2}\,(r^2 U_0')' = c_k\,(E_0 - U_0)_+^{k+3/2}, \quad r > 0, \tag{4.8}$$
with some appropriately defined constant $c_k$. Here $'$ denotes the derivative with respect to $r$. The assertion now follows from the scaling properties of (4.8), and we refer to [8, Thm. 3] for the details.
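The bootstrap in part (b) can be replayed with exact rational arithmetic. The sketch below is our own illustration, and the names `bootstrap_exponents` and `closed_form` are hypothetical. It starts from $p_0 = 1 + 1/n$ and repeats $q = (1/p - 2/3)^{-1}$, $p \mapsto q/n$ while $p < 3/2$. It then compares each exponent with the closed formula for $p_k$.

The sketch does not treat the borderline case $p_k = 3/2$, which the text handles by lowering $p_k$ slightly. The cap `max_steps` only matters for $n = 5$, where the iteration is stuck at $p_k = 6/5$.

```python
from fractions import Fraction


def bootstrap_exponents(n, max_steps=50):
    """Exponents p_0, p_1, ... of the bootstrap in the proof of Theorem 3 (b):
    p_0 = 1 + 1/n, then q = (1/p - 2/3)^(-1) (Young's inequality) and p -> q/n
    (estimate (4.7)), repeated while p < 3/2."""
    p = 1 + Fraction(1) / n
    exps = [p]
    while p < Fraction(3, 2) and len(exps) <= max_steps:
        q = 1 / (1 / p - Fraction(2, 3))   # positive because p < 3/2
        p = q / n
        exps.append(p)
    return exps


def closed_form(n, k):
    """p_k = 3(1 + 1/n)(n - 1) / (n^k (n - 5) + 2n + 2)."""
    return 3 * (1 + Fraction(1) / n) * (n - 1) / (n**k * (n - 5) + 2 * n + 2)


for n in (Fraction(3), Fraction(9, 2), Fraction(49, 10)):
    exps = bootstrap_exponents(n)
    print(n, [float(p) for p in exps],
          [float(closed_form(n, k)) for k in range(len(exps))])
```

The closer $n$ is to $5$, the more steps are needed before some $p_k$ exceeds $3/2$. This matches the remark that the denominator only becomes negative eventually.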
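5. Dynamical Stability

Let $f_0 \in \mathcal{F}_M$ be a minimizer as obtained in Theorem 1. To investigate its dynamical stability, we note first that
$$\mathcal{H}(f) - \mathcal{H}(f_0) = \iint \Bigl(\tfrac12|v|^2 + U_0\Bigr)(f - f_0)\,dv\,dx - \frac{1}{8\pi}\|\nabla U_f - \nabla U_0\|_2^2 =: d(f,f_0) - \frac{1}{8\pi}\|\nabla U_f - \nabla U_0\|_2^2, \quad f \in \mathcal{F}_M. \tag{5.1}$$
Since $\mathcal{C}(f) = \mathcal{C}(f_0)$,
$$d(f,f_0) = \iint \bigl[E\,(f - f_0) - \lambda_0\bigl(Q(f) - Q(f_0)\bigr)\bigr]\,dv\,dx.$$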
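Identity (5.1) amounts to completing the square in the potential energy. The short derivation below assumes $\mathcal{H}(f) = E_{\mathrm{kin}}(f) - \frac{1}{8\pi}\|\nabla U_f\|_2^2$ with $\Delta U_f = 4\pi\rho_f$, which is the normalization consistent with the factor $\frac{1}{8\pi}$ in (5.1):

```latex
\|\nabla U_f\|_2^2 - \|\nabla U_0\|_2^2
  = \|\nabla U_f - \nabla U_0\|_2^2 + 2\int \nabla U_0 \cdot \nabla (U_f - U_0)\,dx
  = \|\nabla U_f - \nabla U_0\|_2^2 - 8\pi \int U_0\,(\rho_f - \rho_0)\,dx,

\mathcal{H}(f) - \mathcal{H}(f_0)
  = E_{\mathrm{kin}}(f) - E_{\mathrm{kin}}(f_0) + \int U_0\,(\rho_f - \rho_0)\,dx
    - \frac{1}{8\pi}\|\nabla U_f - \nabla U_0\|_2^2 .
```

The first three terms on the right combine to $\iint (\frac12|v|^2 + U_0)(f - f_0)\,dv\,dx$, which gives (5.1). Adding $-\lambda_0(\mathcal{C}(f) - \mathcal{C}(f_0)) = 0$ then yields the second expression for $d(f,f_0)$.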
Since $Q$ is convex and the Lagrange multiplier $\lambda_0$ from Theorem 2 is negative, the integrand can be estimated from below by $(E - \lambda_0 Q'(f_0))(f - f_0)$. According to Theorem 2, this quantity is zero on $\operatorname{supp} f_0$. On $\mathbb{R}^6 \setminus \operatorname{supp} f_0$ it equals $(E - \lambda_0 Q'(0))\,f = (E - E_0)\,f \ge 0$. Thus we see that
$$d(f,f_0) \ge 0, \quad f \in \mathcal{F}_M.$$
We are now ready to state our stability result. Note that if we shift a minimizer in space, we obtain another minimizer. Moreover, in general we do not know whether the minimizers are unique up to spatial shifts. This fact is reflected in two versions of our stability result.

Theorem 4. Let $\mathcal{M}_M \subset \mathcal{F}_M$ denote the set of all minimizers of $\mathcal{H}$ in $\mathcal{F}_M$.
(a) For every $\varepsilon > 0$ there is a $\delta > 0$ such that for any solution $t \mapsto f(t)$ of the Vlasov–Poisson system with $f(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$,
$$\inf_{f_0 \in \mathcal{M}_M}\Bigl[d(f(0),f_0) + \frac{1}{8\pi}\|\nabla U_{f(0)} - \nabla U_{f_0}\|_2^2\Bigr] < \delta$$
implies that
$$\inf_{f_0 \in \mathcal{M}_M}\Bigl[d(f(t),f_0) + \frac{1}{8\pi}\|\nabla U_{f(t)} - \nabla U_{f_0}\|_2^2\Bigr] < \varepsilon, \quad t \ge 0.$$
(b) Suppose that $f_0 \in \mathcal{M}_M$ is isolated, i.e.,
$$\inf\bigl\{\|\nabla U_{f_0} - \nabla U_{\tilde f_0}\|_2 \bigm| \tilde f_0 \in \mathcal{M}_M \setminus \{T^a f_0 \mid a \in \mathbb{R}^3\}\bigr\} > 0.$$
Then for every $\varepsilon > 0$ there is a $\delta > 0$ such that for any solution $t \mapsto f(t)$ of the Vlasov–Poisson system with $f(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$,
$$d(f(0),f_0) + \frac{1}{8\pi}\|\nabla U_{f(0)} - \nabla U_0\|_2^2 < \delta$$
implies that for every $t \ge 0$ there exists a shift vector $a \in \mathbb{R}^3$ such that
$$d(f(t),T^a f_0) + \frac{1}{8\pi}\|\nabla U_{f(t)} - \nabla U_{T^a f_0}\|_2^2 < \varepsilon.$$
Here $T^a f(x,v) := f(x + a, v)$ for $a \in \mathbb{R}^3$.

Remark. (a) By Theorem 3 (c), the assumption of part (b) holds for the polytropes.
(b) We only showed that $d(f,f_0) \ge 0$ for $f \in \mathcal{F}_M$, but one may think of this term as a weighted $L^2$-difference of $f$ and $f_0$. For example, suppose $Q \in C^2(]0,\infty[)$ with
$$c_Q := \inf_{0 < f < f_{\max}} Q''(f) > 0,$$
where $f_{\max} \ge \|f_0\|_\infty$. This is the case for the polytropes $Q(f) = f + f^{1+1/k}$ with $1 \le k < 7/2$. Then, by Taylor-expanding $Q$, we find
$$d(f,f_0) \ge \frac12\,|\lambda_0|\,c_Q\,\|f - f_0\|_2^2, \quad f \in \mathcal{F}_M \text{ with } f \le f_{\max};$$
observe that the size restriction on $f$ propagates along solutions of the Vlasov–Poisson system.
(c) The restriction $f(0) \in \mathcal{F}_M$ for the perturbed initial data is acceptable from a physics point of view. A physical perturbation of a given galaxy, say by the gravitational pull of some outside object, would result in a perturbed state which is an equimeasurable rearrangement of the original state. In particular, the value of $\mathcal{C}(f)$ remains unchanged.

Proof of Theorem 4. Assume the assertion of part (a) were false. Then there exist $\varepsilon > 0$, $t_n > 0$, and $f_n(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$ such that for all $n \in \mathbb{N}$,
$$\inf_{f_0 \in \mathcal{M}_M}\Bigl[d(f_n(0),f_0) + \frac{1}{8\pi}\|\nabla U_{f_n(0)} - \nabla U_{f_0}\|_2^2\Bigr] < \frac1n \tag{5.2}$$
but
$$\inf_{f_0 \in \mathcal{M}_M}\Bigl[d(f_n(t_n),f_0) + \frac{1}{8\pi}\|\nabla U_{f_n(t_n)} - \nabla U_{f_0}\|_2^2\Bigr] \ge \varepsilon. \tag{5.3}$$
By (5.2) and (5.1),
$$\lim_{n\to\infty} \mathcal{H}(f_n(0)) = h_M.$$
Since both $\mathcal{H}$ and $\mathcal{C}$ are conserved along classical solutions as launched by $f_n(0)$,
$$\lim_{n\to\infty} \mathcal{H}(f_n(t_n)) = h_M \quad\text{and}\quad f_n(t_n) \in \mathcal{F}_M, \quad n \in \mathbb{N},$$
i.e., $(f_n(t_n))$ is a minimizing sequence for $\mathcal{H}$ in $\mathcal{F}_M$. Up to a subsequence we may therefore assume by Theorem 1 that there exist a minimizer $f_0 \in \mathcal{F}_M$ and a sequence $(a_n) \subset \mathbb{R}^3$ such that
$$\|\nabla U_{f_n(t_n)} - \nabla U_{T^{a_n} f_0}\|_2^2 \to 0; \tag{5.4}$$
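note that for any $f \in \mathcal{F}_M$ and $a \in \mathbb{R}^3$, $\|\nabla U_{T^a f} - \nabla U_{f_0}\|_2 = \|\nabla U_f - \nabla U_{T^{-a} f_0}\|_2$, and also $d(T^a f, f_0) = d(f, T^{-a} f_0)$. Since $\lim_{n\to\infty} \mathcal{H}(f_n(t_n)) = h_M = \mathcal{H}(T^{a_n} f_0)$, we conclude by (5.4) and (5.1) that $d(f_n(t_n), T^{a_n} f_0) \to 0$ as $n \to \infty$. Since $T^{a_n} f_0 \in \mathcal{M}_M$, this contradicts (5.3). Thus part (a) is established.

Now assume that $f_0$ is an isolated minimizer in $\mathcal{F}_M$, and define
$$\delta_0 := \frac{1}{8\pi}\inf\bigl\{\|\nabla U_{f_0} - \nabla U_{\tilde f_0}\|_2^2 \bigm| \tilde f_0 \in \mathcal{M}_M \setminus \{T^a f_0 \mid a \in \mathbb{R}^3\}\bigr\} > 0.$$
Let $\varepsilon > 0$ be arbitrary. In order to find the corresponding $\delta$, we can without loss of generality assume that $\varepsilon < \delta_0/4$. Now choose $\delta > 0$ according to part (a), without loss of generality $\delta < \varepsilon$, and let $f(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$ be such that
$$d(f(0),f_0) + \frac{1}{8\pi}\|\nabla U_{f(0)} - \nabla U_0\|_2^2 < \delta.$$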
The function
$$h(t,a) := d(f(t),T^a f_0) + \frac{1}{8\pi}\|\nabla U_{f(t)} - \nabla U_{T^a f_0}\|_2^2 = \iint \tfrac12|v|^2\bigl(f(t) - f_0\bigr)\,dv\,dx + \frac{1}{8\pi}\|\nabla U_{f(t)}\|_2^2 + \frac{3}{8\pi}\|\nabla U_{f_0}\|_2^2 + 2\int U_{T^a f_0}\,\rho_{f(t)}\,dx$$
is continuous. Since the interaction term goes to zero as $|a| \to \infty$, uniformly on compact time intervals, $\inf_{a\in\mathbb{R}^3} h(t,a)$ is also continuous. Now assume that there exists $t > 0$ such that
$$\inf_{a\in\mathbb{R}^3} h(t,a) \ge \varepsilon.$$
Since at time zero the left-hand side is less than $\varepsilon$, there exists some $t^* > 0$ where
$$\inf_{a\in\mathbb{R}^3} h(t^*,a) = \varepsilon. \tag{5.5}$$
On the other hand, part (a) provides some $f_0^* \in \mathcal{M}_M$ such that
$$d(f(t^*),f_0^*) + \frac{1}{8\pi}\|\nabla U_{f(t^*)} - \nabla U_{f_0^*}\|_2^2 < \varepsilon \le \frac{\delta_0}{4}. \tag{5.6}$$
By (5.5) and (5.6), together with the non-negativity of $d$,
$$\frac{1}{8\pi}\|\nabla U_{f_0} - \nabla U_{f_0^*}\|_2^2 \le \frac{\delta_0}{2},$$
and by the definition of $\delta_0$ there must exist some $a^* \in \mathbb{R}^3$ such that $f_0^* = T^{a^*} f_0$. But this means that (5.5) contradicts (5.6), and the proof of part (b) is complete.
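The expansion of $h(t,a)$ used in the proof can be checked with the same identities. Write $g = T^a f_0$ and assume again $\Delta U = 4\pi\rho$. Then $\int U_g \rho_g\,dx = -\frac{1}{4\pi}\|\nabla U_g\|_2^2$ and $\int \nabla U_f \cdot \nabla U_g\,dx = -4\pi \int U_g \rho_f\,dx$, and one gets:

```latex
d(f,g) + \frac{1}{8\pi}\|\nabla U_f - \nabla U_g\|_2^2
  = \iint \tfrac12 |v|^2 (f - g)\,dv\,dx + \int U_g \rho_f\,dx + \frac{1}{4\pi}\|\nabla U_g\|_2^2
    + \frac{1}{8\pi}\|\nabla U_f\|_2^2 + \frac{1}{8\pi}\|\nabla U_g\|_2^2 + \int U_g \rho_f\,dx
  = \iint \tfrac12 |v|^2 (f - g)\,dv\,dx + \frac{1}{8\pi}\|\nabla U_f\|_2^2
    + \frac{3}{8\pi}\|\nabla U_g\|_2^2 + 2\int U_g\,\rho_f\,dx .
```

Since $E_{\mathrm{kin}}(T^a f_0) = E_{\mathrm{kin}}(f_0)$ and $\|\nabla U_{T^a f_0}\|_2 = \|\nabla U_{f_0}\|_2$, only the interaction term depends on $a$. This is what the continuity argument for $\inf_a h(t,a)$ relies on.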
6. The Case $k = 7/2$; the Plummer Sphere

In this section we study the so-called Plummer sphere, which corresponds to the minimization of $\mathcal{H}$ on the constraint set
$$\mathcal{F}_M := \Bigl\{f \in L^{9/7}(\mathbb{R}^6) \Bigm| f \ge 0,\ \iint f^{9/7}\,dv\,dx = M,\ E_{\mathrm{kin}}(f) < \infty\Bigr\},$$
i.e., we take $Q(f) = f^{9/7}$, which means $k = 7/2$ and $n = 5$. The scaling transformation
$$(S_\lambda f)(x,v) = \lambda^{-7} f(\lambda^{-4}x, \lambda v) \tag{6.1}$$
leaves each term in both $\mathcal{H}$ and $\mathcal{C}$ invariant, and for this reason this case presents additional difficulties. As we noted at the end of Sect. 1, the assertion of Lemma 2 remains valid, so that there exists a minimizing sequence $(f_i)$ of $\mathcal{H}$ in $\mathcal{F}_M$. We shall follow the steps in Sect. 3 to conclude the existence of a minimizer. The key step is to verify that the assertions of both Lemma 4 and Lemma 5 are still valid in the presence of the scaling (6.1). To this end we wish to employ the results in [8]. So we consider $(f_i^*)$, the sequence of spherically symmetric rearrangements with respect to $x$, which is again minimizing and in $\mathcal{F}_M$. By [8, Thm. 2] there exists a symmetric minimizer $g$ such that $S_i f_i^* \rightharpoonup g$ weakly in $L^{9/7}(\mathbb{R}^6)$ and, more importantly,
$$\|\nabla U_{S_i f_i^*} - \nabla U_g\|_2 \to 0,$$
where $S_i = S_{\lambda_i}$ with some $\lambda_i > 0$. The corresponding spatial densities also converge strongly:

Lemma 6. Up to a subsequence, $\lim_{i\to\infty} \|\rho_{S_i f_i^*} - \rho_g\|_{6/5} = 0$.
Proof. Define $\rho_i^* = \rho_{S_i f_i^*}$. By further extracting a subsequence we can assume
$$\rho_i^* \rightharpoonup \rho_g \quad \text{weakly in } L^{6/5}(\mathbb{R}^3), \quad i \to \infty. \tag{6.2}$$
We claim that
$$\rho_i^*(r) \to \rho_g(r) \tag{6.3}$$
for all $r \in ]0,\infty[$ at which $\rho_g$ is continuous. Suppose $r_0$ were a point of continuity of $\rho_g$ at which this fails. Then there would be an $\varepsilon_0 > 0$ and a subsequence such that
$$|\rho_i^*(r_0) - \rho_g(r_0)| \ge \varepsilon_0, \quad i \in \mathbb{N}.$$
Assume that $\rho_i^*(r_0) - \rho_g(r_0) \ge \varepsilon_0$. Since $\rho_g$ is continuous at $r_0 > 0$, there exists $\delta > 0$ such that $|\rho_g(r) - \rho_g(r_0)| < \varepsilon_0/2$ for $r \in [r_0 - \delta, r_0]$. From the monotonicity of $\rho_i^*$ we have
$$\int_{\{r_0-\delta \le r \le r_0\}} \rho_g\,dr \le \int_{\{r_0-\delta \le r \le r_0\}} [\rho_g(r_0) + \varepsilon_0/2]\,dr \le \int_{\{r_0-\delta \le r \le r_0\}} [\rho_i^*(r_0) - \varepsilon_0/2]\,dr \le \int_{\{r_0-\delta \le r \le r_0\}} [\rho_i^*(r) - \varepsilon_0/2]\,dr,$$
in contradiction to (6.2). The assumption $\rho_i^*(r_0) - \rho_g(r_0) \le -\varepsilon_0$ leads to the analogous contradiction on the interval $[r_0, r_0 + \delta]$, so that (6.3) is established. Now we choose two sequences $r_j \to 0$ and $R_j \to \infty$ at which $\rho_g$ is continuous. By the monotonicity and (6.3), the $\rho_i^*$ are uniformly bounded on each region $r_j \le r \le R_j$. By Lebesgue's dominated convergence theorem,
$$\|\rho_i^* - \rho_g\|_{L^{6/5}(r_j \le r \le R_j)} \to 0. \tag{6.4}$$
But by (0.47) of [8] there is no concentration at either $r = 0$ or $r = \infty$, i.e., for any $\varepsilon > 0$ there is $R > 0$ such that
$$\limsup_{i\to\infty} \iint_{\{r \le 1/R\} \cup \{r \ge R\}} (S_i f_i^*)^{9/7}\,dv\,dx < \varepsilon.$$
The first estimate in Lemma 1 (b) therefore implies that, up to a subsequence, $\|\rho_i^*\|_{L^{6/5}(\{r \le r_j\} \cup \{R_j \le r\})} < C\varepsilon$ for $i$ large. Together with (6.4), this completes the proof.
We now establish the key fact that the $S_i f_i$ are equi-integrable, i.e., there is no concentration. For a non-negative function $h$ and a cut-off parameter $N > 0$ we define
$$h|_N(x) := \begin{cases} 0 & \text{if } 1/N \le h(x) \le N, \\ h(x) & \text{otherwise.} \end{cases}$$

Lemma 7. For any $\varepsilon > 0$ there is an $N > 0$ such that, up to a subsequence, $\|[\rho_{S_i f_i}]|_N\|_{6/5} < \varepsilon$, $i \in \mathbb{N}$.

Proof. Let $\rho_i^* = \rho_{S_i f_i^*}$ and $\rho_i = \rho_{S_i f_i}$. We first use Lemma 6 to prove the assertion for $\rho_i^*$. To this end, let $A_i = \{r \mid \rho_i^* < 1/N\} \cup \{r \mid \rho_i^* > N\}$ and $A_g = \{r \mid \rho_g \le 1/N\} \cup \{r \mid \rho_g \ge N\}$. For any $\varepsilon > 0$ there is $N > 0$ large such that $\|\rho_g\|_{L^{6/5}(A_g)} < \varepsilon/2$. Now from Lemma 6 we deduce, for $i$ large, that
$$\|(\rho_i^*)|_N\|_{6/5} = \|\rho_i^*\|_{L^{6/5}(A_i)} \le \|\rho_i^* - \rho_g\|_{L^{6/5}(A_i)} + \|\rho_g\|_{L^{6/5}(A_i)} \le \varepsilon/2 + \|\rho_g\|_{L^{6/5}(A_i)}.$$
Again by Lemma 6 we find that, up to a subsequence, $\rho_i^* \to \rho_g$ pointwise a.e., and thus $\limsup_{i\to\infty} 1_{A_i} \le 1_{A_g}$ a.e. Therefore, applying Fatou's lemma to $\rho_g(1 - 1_{A_i})$, we get
$$\limsup_{i\to\infty} \|\rho_g\|_{L^{6/5}(A_i)} \le \|\rho_g\|_{L^{6/5}(A_g)} < \varepsilon/2.$$
Hence, up to a subsequence,
$$\|(\rho_i^*)|_N\|_{6/5} < \varepsilon, \quad i \in \mathbb{N}. \tag{6.5}$$
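Now, by the equimeasurability of the rearrangements, (6.5) implies the assertion of the lemma. In fact, for any function $h \ge 0$ and $p \ge 1$, writing $h|_N = h1_{\{h<1/N\}} + h1_{\{h>N\}}$,
$$\begin{aligned} \int (h|_N)^p &= p\int_0^\infty s^{p-1}\,\mu\{h1_{\{h<1/N\}} + h1_{\{h>N\}} > s\}\,ds \\ &= p\int_0^\infty s^{p-1}\bigl[\mu\{h1_{\{h<1/N\}} > s\} + \mu\{h1_{\{h>N\}} > s\}\bigr]\,ds \\ &= p\int_0^\infty s^{p-1}\bigl[\mu\{s < h < 1/N\} + \mu\{h > \max[N,s]\}\bigr]\,ds \\ &= p\int_0^\infty s^{p-1}\bigl[\mu\{s < h^* < 1/N\} + \mu\{h^* > \max[N,s]\}\bigr]\,ds = \int (h^*|_N)^p. \end{aligned}$$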
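The layer-cake computation above can be sanity-checked on a simple function, i.e. a function $h$ that takes finitely many values on sets of given measure. The sketch below is an illustration only; the data and function names are made up. It evaluates $\int (h|_N)^p\,d\mu$ directly and through the last layer-cake integral, using a midpoint rule. The data use $N > 1$, as in the lemma, so that the two pieces of $h|_N$ have disjoint supports.

The layer-cake expression depends on $h$ only through its distribution function. It therefore takes the same value for $h$ and for any equimeasurable rearrangement $h^*$.

```python
def cutoff_norm_direct(values, weights, N, p):
    """∫ (h|_N)^p dµ for a simple function h equal to values[j] on a set of
    µ-measure weights[j]; h|_N discards the values lying in [1/N, N]."""
    return sum(w * h ** p for h, w in zip(values, weights)
               if h < 1.0 / N or h > N)


def cutoff_norm_layer_cake(values, weights, N, p, steps=100_000):
    """p ∫_0^∞ s^(p-1) [µ{s < h < 1/N} + µ{h > max(N, s)}] ds, midpoint rule.
    The integrand vanishes for s >= max(values), so we integrate up to there."""
    s_max = max(values)
    ds = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        low = sum(w for h, w in zip(values, weights) if s < h < 1.0 / N)
        high = sum(w for h, w in zip(values, weights) if h > max(N, s))
        total += p * s ** (p - 1) * (low + high) * ds
    return total


# Made-up data for illustration; N > 1.
values = [0.05, 0.3, 2.0, 7.5, 12.0]
weights = [1.0, 2.0, 0.5, 0.25, 0.1]
N, p = 5.0, 6.0 / 5.0
print(cutoff_norm_direct(values, weights, N, p))
print(cutoff_norm_layer_cake(values, weights, N, p))
```

The second term must use $\max[N,s]$. With $\min[N,s]$ it would equal $\mu\{h > N\}$ for every $s \ge N$, and the integral would diverge whenever $\mu\{h > N\} > 0$.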
Taking $h = \rho_i$ and noting that by definition $h^* = \rho_i^* = \rho_{[S_i(f_i)]^*} = \rho_{S_i(f_i^*)}$, we see that the assertion of the lemma follows from (6.5).

We are now ready to prove the analogue of Theorem 1 for the limiting case $k = 7/2$:

Theorem 5. Let $M > 0$ and let $(f_i) \subset \mathcal{F}_M$ be a minimizing sequence of $\mathcal{H}$. Then there is a minimizer $f_0 \in \mathcal{F}_M$, a subsequence (still denoted by $(f_i)$), and a sequence of translations and scalings
$$T_i S_i f_i(x,v) := \lambda_i^{-7} f_i(\lambda_i^{-4}x + a_i, \lambda_i v)$$
with $(a_i) \subset \mathbb{R}^3$ and $\lambda_i > 0$ such that
$$\mathcal{H}(f_0) = \inf_{\mathcal{F}_M} \mathcal{H} = h_M$$
and $T_i S_i f_i \rightharpoonup f_0$ weakly in $L^{1+1/k}(\mathbb{R}^6)$. For the induced potentials we have $\nabla U_{T_i S_i f_i} \to \nabla U_0$ strongly in $L^2(\mathbb{R}^3)$.

Proof. It suffices to verify the analogues of Lemma 4 and Lemma 5 for $\rho_i = \rho_{S_i f_i}$, which satisfies the estimate in Lemma 7. To this end we split $\rho_i =: \rho_i^b + (\rho_i)|_N$. Notice that for a fixed cut-off parameter $N > 0$, $(\rho_i^b)$ is bounded in every $L^p$, $1 \le p \le \infty$, so that both Lemma 4 and Lemma 5 are valid for $\rho_i^b$. But by Lemma 7 and the generalized Young's inequality, $\|\nabla U_{(\rho_i)|_N}\|_2$ can be made arbitrarily small for $N$ large, uniformly in $i$. Therefore, the assertion of Lemma 5 is clearly valid for $(\rho_i)$. To verify Lemma 4 for $\rho_i$, we choose $\varepsilon < -h_M/2$ in Lemma 6. Notice that
$$\|\nabla U_{\rho_i}\|_2 \le \|\nabla U_{\rho_i^b}\|_2 + \|\nabla U_{(\rho_i)|_N}\|_2.$$
Since Lemma 4 is valid for $\rho_i^b$, we deduce as in (3.4) that
$$h_M/2 > \mathcal{H}(f_i) \ge -|I_1^b| - |I_2^b| - |I_3^b| - \varepsilon,$$
where $I_k^b$, $k = 1,2,3$, are induced by $\rho_i^b$. Since
$$\sup_{y\in\mathbb{R}^3} \int_{y+B_R} \rho_i^b\,dx \le \sup_{y\in\mathbb{R}^3} \int_{y+B_R} \rho_i\,dx,$$
we deduce the assertion of Lemma 4 for $\rho_i$ by the same argument as before.
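The invariance claimed for (6.1), and the exponent $-7$ in the scalings of Theorem 5, can be checked by tracking powers of $\lambda$. The sketch below is our own illustration, and the helper name is hypothetical. Consider a general transformation $(S_\lambda f)(x,v) = \lambda^{a} f(\lambda^{b}x, \lambda^{c}v)$ on $\mathbb{R}^3 \times \mathbb{R}^3$. The sketch computes the exponent $e$ with $F(S_\lambda f) = \lambda^{e}F(f)$ for four functionals: the mass, $\mathcal{C}(f) = \iint f^{9/7}$, the kinetic energy, and the Newtonian interaction $\iint \rho(x)\rho(y)/|x-y|\,dx\,dy$. It only records changes of variables and does not evaluate any integral.

```python
from fractions import Fraction


def scaling_exponents(a, b, c, gamma=Fraction(9, 7)):
    """Exponents e with F(S_lam f) = lam**e * F(f) for
    (S_lam f)(x, v) = lam**a * f(lam**b * x, lam**c * v), x, v in R^3."""
    jac = -3 * b - 3 * c                   # dx dv = lam**jac dx' dv' with x' = lam**b x, v' = lam**c v
    rho = a - 3 * c                        # rho_{S f}(x) = lam**rho * rho_f(lam**b x)
    return {
        "mass": a + jac,                   # ∫∫ f
        "casimir": gamma * a + jac,        # ∫∫ f**gamma
        "kinetic": a + jac - 2 * c,        # ∫∫ |v|^2 f, since |v| = lam**(-c) |v'|
        "potential": 2 * rho - 6 * b + b,  # ∫∫ rho(x) rho(y) / |x - y|, |x - y| = lam**(-b) |x' - y'|
    }


print(scaling_exponents(-7, -4, 1))   # the scaling (6.1)
print(scaling_exponents(7, -4, 1))    # the same with lam**(+7) in front
```

With $(a,b,c) = (-7,-4,1)$ the Casimir, kinetic and potential exponents all vanish, while the mass scales like $\lambda^2$. With the prefactor $\lambda^{+7}$ the Casimir exponent is $18$, so that transformation would not preserve $\mathcal{F}_M$.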
Next we show the analogues of the assertions of Theorems 2 and 3:

Theorem 6. Let $f_0 \in \mathcal{F}_M$ be a minimizer of $\mathcal{H}$. Then
$$f_0(x,v) = \begin{cases} (E/\lambda_0)^{7/2}, & E < 0, \\ 0, & E \ge 0, \end{cases}$$
with some constant $\lambda_0 < 0$; in particular, $f_0$ is a steady state of the Vlasov–Poisson system. Moreover, $f_0$ is spherically symmetric with respect to some point in $x$-space, and up to scalings and translations in $x$,
$$U_0(r) = -c_0(1 + r^2)^{-1/2}, \qquad \rho_0(r) = \frac{3c_0}{4\pi}(1 + r^2)^{-5/2}, \qquad r \ge 0,$$
where the positive constant $c_0$ depends on $\lambda_0$.

Proof. The identity for $f_0$ follows exactly as in the proof of Theorem 2. The spherical symmetry follows as in the proof of Theorem 3. By monotonicity, $\lim_{r\to\infty} U_0(r) \in ]-\infty,0]$ exists. Since $U_0 \in L^6(\mathbb{R}^3)$, this limit must be zero, so that $U_0$ is a solution of the corresponding Emden–Fowler equation (4.8) with $k = 7/2$ and $E_0 = 0$. The uniqueness up to scalings follows as in [8, Thm. 3], and the explicit formulas can be checked by direct computation.

Finally, we state the stability theorem for the limiting case $k = 7/2$:

Theorem 7. Let $f_0 \in \mathcal{F}_M$ be a minimizer of $\mathcal{H}$. Then for every $\varepsilon > 0$ there is a $\delta > 0$ such that for any solution $t \mapsto f(t)$ of the Vlasov–Poisson system with $f(0) \in C_c^1 \cap \mathcal{F}_M$,
$$d(f(0),f_0) + \frac{1}{8\pi}\|\nabla U_{f(0)} - \nabla U_{f_0}\|_{L^2}^2 < \delta$$
implies that for every $t \ge 0$ there exist a shift vector $a \in \mathbb{R}^3$ and a scaling parameter $\lambda > 0$ such that
$$d(f(t), T^a S_\lambda f_0) + \frac{1}{8\pi}\|\nabla U_{f(t)} - \nabla U_{T^a S_\lambda f_0}\|_2^2 < \varepsilon, \quad t \ge 0.$$
The only difference from the proof of Theorem 4 is that one now has to take into account not only the spatial shifts but also the scaling transformations which arise in Theorem 5, and this is straightforward. The condition $f(0) \in \mathcal{F}_M$ can also be relaxed by a scaling transformation as in [8, Thm. 4].

Note added in proof. The stability of isotropic steady states is also addressed in Wan, Y.-H.: Nonlinear stability of spherical systems in galactic dynamics. Preprint, 2000.
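The explicit Plummer formulas of Theorem 6 are easy to check numerically. The sketch below is illustrative only; the helper names, the step size `h` and the sample radii are our choices. It evaluates the radial Laplacian $\frac{1}{r^2}(r^2U_0')' = U_0'' + \frac{2}{r}U_0'$ of $U_0(r) = -c_0(1+r^2)^{-1/2}$ by central differences. It then compares the result with $4\pi\rho_0$ and with $\frac{3}{c_0^4}(-U_0)^5$.

The last expression is the Emden–Fowler form (4.8) with $k = 7/2$ and $E_0 = 0$. The identification $c_{7/2} = 3/c_0^4$ is our own direct computation and presumes that (4.8) is normalized as written.

```python
import math


def plummer_U(r, c0):
    """U_0(r) = -c0 (1 + r^2)^(-1/2)."""
    return -c0 / math.sqrt(1.0 + r * r)


def plummer_rho(r, c0):
    """rho_0(r) = 3 c0 / (4 pi) (1 + r^2)^(-5/2)."""
    return 3.0 * c0 / (4.0 * math.pi) * (1.0 + r * r) ** -2.5


def radial_laplacian(u, r, h=1e-3):
    """(1/r^2) d/dr (r^2 u'(r)) = u'' + (2/r) u', via second-order central differences."""
    d1 = (u(r + h) - u(r - h)) / (2.0 * h)
    d2 = (u(r + h) - 2.0 * u(r) + u(r - h)) / (h * h)
    return d2 + 2.0 * d1 / r


c0 = 1.0
for r in (0.25, 1.0, 5.0):
    lap = radial_laplacian(lambda s: plummer_U(s, c0), r)
    print(r, lap, 4.0 * math.pi * plummer_rho(r, c0),
          3.0 / c0**4 * (-plummer_U(r, c0)) ** 5)
```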
References
1. Aly, J. J.: On the lowest energy state of a collisionless selfgravitating system under phase space volume constraints. Monthly Notices Royal Astronomical Soc. 241, 15–27 (1989)
2. Batt, J., Faltenbacher, W., Horst, E.: Stationary spherically symmetric models in stellar dynamics. Arch. Rational Mech. Anal. 93, 159–183 (1986)
3. Batt, J., Morrison, P., Rein, G.: Linear stability of stationary solutions of the Vlasov–Poisson system in three dimensions. Arch. Rational Mech. Anal. 130, 163–182 (1995)
4. Binney, J., Tremaine, S.: Galactic Dynamics. Princeton: Princeton University Press, 1987
5. Braasch, P., Rein, G., Vukadinović, J.: Nonlinear stability of stationary plasmas – An extension of the energy-Casimir method. SIAM J. Applied Math. 59, 831–844 (1999)
6. Fridman, A. M., Polyachenko, V. L.: Physics of Gravitating Systems, I. New York: Springer-Verlag, 1984
7. Guo, Y.: Variational method in polytropic galaxies. Arch. Rational Mech. Anal. 150, 209–224 (1999)
8. Guo, Y.: On the generalized Antonov's stability criterion. Contem. Math. 263, 85–107 (2000)
9. Guo, Y., Rein, G.: Stable steady states in stellar dynamics. Arch. Rational Mech. Anal. 147, 225–243 (1999)
10. Guo, Y., Rein, G.: Existence and stability of Camm type steady states in galactic dynamics. Indiana University Math. J. 48, 1237–1255 (1999)
11. Guo, Y., Strauss, W.: Nonlinear instability of double-humped equilibria. Ann. Inst. Henri Poincaré 12, 339–352 (1995)
12. Guo, Y., Strauss, W.: Instability of periodic BGK equilibria. Comm. Pure Appl. Math. 48, 861–894 (1995)
13. Lieb, E. H., Loss, M.: Analysis. Providence, RI: American Mathematical Society, 1996
14. Lions, P.-L.: The concentration-compactness principle in the calculus of variations. The locally compact case. Part 1. Ann. Inst. H. Poincaré 1, 109–145 (1984)
15. Pfaffelmoser, K.: Global classical solutions of the Vlasov–Poisson system in three dimensions for general initial data. J. Diff. Eqs. 95, 281–303 (1992)
16. Rein, G.: Nonlinear stability for the Vlasov–Poisson system – The energy-Casimir method. Math. Meth. in the Appl. Sci. 17, 1129–1140 (1994)
17. Rein, G.: Flat steady states in stellar dynamics – Existence and stability. Commun. Math. Phys. 205, 229–247 (1999)
18. Rein, G.: Stability of spherically symmetric steady states in galactic dynamics against general perturbations. Preprint, 1999
19. Rein, G., Rendall, A. D.: Compact support of spherically symmetric equilibria in non-relativistic and relativistic galactic dynamics. Math. Proc. Camb. Phil. Soc. 128, 363–380 (2000)
20. Schaeffer, J.: Global existence of smooth solutions to the Vlasov–Poisson system in three dimensions. Commun. Part. Diff. Eqs. 16, 1313–1335 (1991)
21. Wan, Y.-H.: On nonlinear stability of isotropic models in stellar dynamics. Arch. Rational Mech. Anal. 147, 245–268 (1999)
22. Wolansky, G.: On nonlinear stability of polytropic galaxies. Ann. Inst. Henri Poincaré 16, 15–48 (1999)

Communicated by H. Spohn
Commun. Math. Phys. 219, 631 – 669 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Multi-Interval Subfactors and Modularity of Representations in Conformal Field Theory

Yasuyuki Kawahigashi¹, Roberto Longo², Michael Müger²

¹ Department of Mathematical Sciences, University of Tokyo, Komaba, Tokyo, 153-8914, Japan. E-mail: [email protected]
² Dipartimento di Matematica, Università di Roma “Tor Vergata”, Via della Ricerca Scientifica, 00133 Roma, Italy. E-mail: [email protected]; [email protected]

Received: 7 July 1999 / Accepted: 13 January 2001
Dedicated to John E. Roberts on the occasion of his sixtieth birthday

Abstract: We describe the structure of the inclusions of factors $\mathcal{A}(E) \subset \mathcal{A}(E')'$ associated with multi-intervals $E \subset \mathbb{R}$ for a local irreducible net $\mathcal{A}$ of von Neumann algebras on the real line satisfying the split property and Haag duality. In particular, if the net is conformal and the subfactor has finite index, the inclusion associated with two separated intervals is isomorphic to the Longo–Rehren inclusion, which provides a quantum double construction of the tensor category of superselection sectors of $\mathcal{A}$. As a consequence, the index of $\mathcal{A}(E) \subset \mathcal{A}(E')'$ coincides with the global index associated with all irreducible sectors. Moreover, the braiding symmetry associated with all sectors is non-degenerate, namely the representations of $\mathcal{A}$ form a modular tensor category, and every sector is a direct sum of sectors with finite dimension. The superselection structure is generated by local data. The same results hold true if conformal invariance is replaced by strong additivity and there exists a modular PCT symmetry.
1. Introduction

This paper provides the solution to a natural problem in (rational) conformal quantum field theory: the description of the structure of the inclusion of factors associated with two or more separated intervals. This problem has been considered in past years, seemingly with different motivations. The most detailed study of this inclusion so far has been done by Xu [50] for the models given by the loop group construction for $SU(n)_k$ [47]. In this case Xu has computed the index and the dual principal graph of the inclusions. A suggestion to study this inclusion has also been made in [43, Sect. 3]. Our analysis is model independent, and will display new structures and a deeper understanding also in these and other models.

Supported in part by GNAFA and MURST.
Supported by EU TMR Network “Noncommutative Geometry”. Address after June 2001: Korteweg-de Vries Institute, Amsterdam, The Netherlands.
632
Y. Kawahigashi, R. Longo, M. Müger
Let $\mathcal{A}$ be a local irreducible conformal net of von Neumann algebras on $\mathbb{R}$, i.e. an inclusion preserving map $I \mapsto \mathcal{A}(I)$ from the (connected) open intervals of $\mathbb{R}$ to von Neumann algebras $\mathcal{A}(I)$ on a fixed Hilbert space. One may define $\mathcal{A}(E)$ for an arbitrary set $E \subset \mathbb{R}$ as the von Neumann algebra generated by all the $\mathcal{A}(I)$'s as $I$ varies in the intervals contained in $E$. By locality, $\mathcal{A}(E)$ and $\mathcal{A}(E')$ commute, where $E'$ denotes the interior of $\mathbb{R} \setminus E$, and thus one obtains an inclusion
$$\mathcal{A}(E) \subset \hat{\mathcal{A}}(E),$$
where $\hat{\mathcal{A}}(E) \equiv \mathcal{A}(E')'$. If Haag duality holds, as we shall assume¹, this inclusion is trivial if $E$ is an interval, but it is in general non-trivial for a disconnected region $E$. We will explain its structure if $E$ is the union of $n$ separated intervals. This situation can be reduced to the case $n = 2$, namely $E = I_1 \cup I_2$, where $I_1$ and $I_2$ are intervals with disjoint closures, as we assume for the rest of this introduction.

One can easily see that the inclusion $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is related to the superselection structure of $\mathcal{A}$, i.e. to the representation theory of $\mathcal{A}$: charge transporters between endomorphisms localized in $I_1$ and $I_2$ naturally live in $\hat{\mathcal{A}}(E)$, but not in $\mathcal{A}(E)$.

Assume the index $[\hat{\mathcal{A}}(E) : \mathcal{A}(E)]$ is finite and the split property² holds, namely that $\mathcal{A}(I_1) \vee \mathcal{A}(I_2)$ is naturally isomorphic to $\mathcal{A}(I_1) \otimes \mathcal{A}(I_2)$. Under these assumptions we shall show that indeed $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ contains all the information on the superselection rules. We shall prove that in this case $\mathcal{A}$ is rational, namely there exist only finitely many different irreducible sectors $\{[\rho_i]\}$ with finite dimension. We shall also prove that $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is isomorphic to the inclusion considered in [28] (we refer to this as the LR inclusion, cf. Appendix A), which is canonically associated with $\mathcal{A}(I_1)$, $\{[\rho_i]\}$ (with the identification $\mathcal{A}(I_2) \simeq \mathcal{A}(I_1)^{\mathrm{opp}}$). In particular,
$$[\hat{\mathcal{A}}(E) : \mathcal{A}(E)] = \sum_i d(\rho_i)^2,$$
the global index of the superselection sectors. In fact, $\mathcal{A}$ will turn out to be rational in an even stronger sense, namely there exist no sectors with infinite dimension, except the ones that are trivially constructed as direct sums of finite-dimensional sectors. Moreover, we shall exhibit an explicit way to generate the superselection sectors of $\mathcal{A}$ from the local data in $E$. We consider the canonical endomorphism $\gamma_E$ of $\hat{\mathcal{A}}(E)$ into $\mathcal{A}(E)$ and its restriction $\lambda_E = \gamma_E|_{\mathcal{A}(E)}$. Then $\lambda_E$ extends to a localized endomorphism $\lambda$ of $\mathcal{A}$ acting identically on $\mathcal{A}(I)$ for all intervals $I$ disjoint from $E$. We have
$$\lambda = \bigoplus_i \rho_i\bar\rho_i, \tag{1}$$
where the $\rho_i$'s are inequivalent irreducible endomorphisms of $\mathcal{A}$ localized in $I_1$ with conjugates $\bar\rho_i$ localized in $I_2$, and the classes $\{[\rho_i]\}_i$ exhaust all the irreducible sectors. To understand this structure, consider the symmetric case $I_1 = I$, $I_2 = -I$. Then $\mathcal{A}(-I) = j(\mathcal{A}(I))$, where $j$ is the anti-linear PCT automorphism, hence we may identify

¹ As shown in [18], one may always extend $\mathcal{A}$ to the dual net $\mathcal{A}^d$, which is conformal and satisfies Haag duality.
² This general property is satisfied, in particular, if $\mathrm{Tr}(e^{-\beta L_0}) < \infty$ for all $\beta > 0$, where $L_0$ is the conformal Hamiltonian, cf. [5, 8].
Multi-Interval Subfactors and Modularity of Representations in CFT
633
$\mathcal{A}(-I)$ with $\mathcal{A}(I)^{\mathrm{opp}}$. Moreover, the formula $\bar\rho_i = j \cdot \rho_i \cdot j$ holds for the conjugate sector [17]. Thus, by the split property, we may identify $\{\mathcal{A}(E), \rho_i\bar\rho_i|_{\mathcal{A}(E)}\}$ with $\{\mathcal{A}(I) \otimes \mathcal{A}(I)^{\mathrm{opp}}, \rho_i \otimes \rho_i^{\mathrm{opp}}\}$. Now there is an isometry $V_i$ that intertwines the identity and $\rho_i\bar\rho_i$ and belongs to $\hat{\mathcal{A}}(E)$. We then have to show that $\hat{\mathcal{A}}(E)$ is generated by $\mathcal{A}(E)$ and the $V_i$'s, and that the $V_i$'s satisfy the (crossed product) relations characteristic of the LR inclusion. This last point is verified by identifying $V_i$ with the standard implementation isometry as in [17]. The generating property follows from the index computation, which in turn follows from the “transportability” of the canonical endomorphism above. The superselection structure of $\mathcal{A}$ can then be recovered by formula (1) and the split property. Note that the representation tensor category of $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ generated by $\{\rho_i \otimes \rho_i^{\mathrm{opp}}\}_i$ corresponds to the connected component of the identity in the fusion graph for $\mathcal{A}$. Therefore the associated fusion rules and quantum $6j$-symbols are encoded in the isomorphism class of the inclusion $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$, which will be completely determined by a crossed product construction. A further important consequence is that the braiding symmetry associated with all sectors is always non-degenerate; in other words, the localizable representations form a modular tensor category. As shown by Rehren [41], this implies the existence and non-degeneracy of Verlinde's matrices $S$ and $T$. Hence there is a unitary representation of the modular group $SL(2,\mathbb{Z})$, which plays a role in topological quantum field theory. It follows that the net $\mathcal{B} \supset \mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ obtained by the LR construction is a field algebra for $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$. Namely, $\mathcal{B}$ has no superselection sector (localizable in a bounded interval), and there is a generating family of sectors of $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ that are implemented by isometries in $\mathcal{B}$. Indeed, $\mathcal{B}$ is a crossed product of $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ by the tensor category of all its sectors.
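To make the global index $\sum_i d(\rho_i)^2$ concrete, consider the $SU(n)_k$ models mentioned above with $n = 2$. We assume, as standard input from the literature rather than from the text above, that the $k+1$ irreducible sectors have dimensions $d_j = \sin((j+1)\pi/(k+2))/\sin(\pi/(k+2))$, $j = 0,\dots,k$. The sketch below (helper names hypothetical) sums the $d_j^2$. It compares the result with the closed form $(k+2)/(2\sin^2(\pi/(k+2)))$, which follows from $\sum_{m=1}^{N-1}\sin^2(m\pi/N) = N/2$ with $N = k+2$. For $k = 2$ the dimensions are $1, \sqrt2, 1$ and the global index is $4$.

```python
import math


def su2_quantum_dims(k):
    """Dimensions d_j = sin((j+1) pi/(k+2)) / sin(pi/(k+2)), j = 0, ..., k
    (assumed sector dimensions for the SU(2)_k model)."""
    q = math.pi / (k + 2)
    return [math.sin((j + 1) * q) / math.sin(q) for j in range(k + 1)]


def global_index(dims):
    """Sum of squared dimensions over all irreducible sectors."""
    return sum(d * d for d in dims)


for k in range(1, 6):
    dims = su2_quantum_dims(k)
    closed = (k + 2) / (2.0 * math.sin(math.pi / (k + 2)) ** 2)
    print(k, [round(d, 4) for d in dims], global_index(dims), closed)
```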
As shown by Masuda [30], Ocneanu's asymptotic inclusion [35] and the Longo–Rehren inclusion in [28] are, from the categorical viewpoint, essentially the same construction. The asymptotic inclusion construction starts from a hyperfinite II$_1$ subfactor $N \subset M$ with finite index and finite depth. It produces a new subfactor $M \vee (M' \cap M_\infty) \subset M_\infty$, and it is a subfactor analogue of the quantum double construction of Drinfel'd [11], as noted by Ocneanu. That is, the tensor category of the $M_\infty$–$M_\infty$ bimodules arising from the new subfactor is regarded as a “quantum double” of the original category of $M$–$M$ (or $N$–$N$) bimodules. On the other hand, as shown in [33], the Longo–Rehren construction gives the quantum double of the original tensor category of endomorphisms. (See also [12, Chapter 12] for a general theory of asymptotic inclusions and their relations to topological quantum field theory.) Our result thus shows that the inclusion arising from two separated intervals as above gives the quantum double of the tensor category of all localized endomorphisms. However, as the braiding symmetry is non-degenerate, the quantum double will be isomorphic to the subcategory of the trivial doubling of the original tensor category corresponding to the connected component of the identity in the fusion graph. Indeed, in the conformal case, multi-interval inclusions are self-dual.

For our results conformal invariance is not necessary, although conformal nets provide the most interesting situation where they can be applied. We may deal with an arbitrary net on $\mathbb{R}$, provided two conditions hold. First, the net is strongly additive, a property equivalent to Haag duality on $\mathbb{R}$ if conformal invariance is assumed. Second, there exists a cyclic and separating vector for the von Neumann algebras of half-lines (vacuum) whose modular conjugations act geometrically as PCT symmetries, which is automatic in the conformal case. We will work in this more general context.
Our paper is organized as follows. The second section discusses general properties of multi-interval inclusions and, in particular, gives motivations for the strong additivity assumption. The third section enters the core of our analysis and contains a first inequality between the global index of the sectors and the index of the 2-interval subfactor. In Sect. 4 we study the structure of sectors associated with the LR net. This analysis is mostly based on the braiding symmetry, the work of Izumi [22], and the α-induction, which was introduced in [28] and further studied in [49, 2, 3]. Section 5 combines and develops the previous analysis to obtain our main results for the 2-interval inclusion. These results are extended to the case of $n$-interval inclusions in Sect. 6. We then illustrate our results in models and examples in Sect. 7. In Appendix A we collect the results on the universal crossed product description of the LR inclusion and of its multiple iterates occurring in our analysis. A further appendix concerns the disintegration of locally normal or localizable representations into irreducible ones, which is needed in the paper; these results also have their own interest. For basic facts concerning conformal nets of von Neumann algebras on $\mathbb{R}$ or $S^1$, the reader is referred to [17, 28]; see also Appendix B.
2. General Properties

In this section we briefly examine a few elementary properties of nets of von Neumann algebras. This serves partly to motivate our strong additivity assumption in the main body of the paper, and partly to examine relations with dual nets. Readers interested only in the main result may skip this part, except for Proposition 5, and go directly to the next section, where we restrict our study to completely rational nets.

In this section, $\mathcal{A}$ will be a local irreducible net of von Neumann algebras on $S^1$. Namely, $\mathcal{A}$ is an inclusion preserving map
$$I \in \mathcal{I} \mapsto \mathcal{A}(I)$$
from the set $\mathcal{I}$ of intervals (open, non-empty sets with contractible closure) of $S^1$ to von Neumann algebras on a fixed Hilbert space $\mathcal{H}$. It is required that $\mathcal{A}(I_1)$ and $\mathcal{A}(I_2)$ commute if $I_1 \cap I_2 = \emptyset$ and that $\bigvee_{I\in\mathcal{I}} \mathcal{A}(I) = B(\mathcal{H})$, where $\vee$ denotes the von Neumann algebra generated. If $E \subset S^1$ is any set, we put
$$\mathcal{A}(E) \equiv \bigvee\{\mathcal{A}(I) : I \in \mathcal{I},\ I \subset E\}$$
and set $\hat{\mathcal{A}}(E) \equiv \mathcal{A}(E')'$ with $E' \equiv S^1 \setminus E$.³ We shall assume Haag duality on $S^1$, which automatically holds if $\mathcal{A}$ is conformal [4], namely,
$$\mathcal{A}(I')' = \mathcal{A}(I), \quad I \in \mathcal{I},$$

³ The results in this section are also valid for nets of von Neumann algebras on $\mathbb{R}$, if $\mathcal{I}$ denotes the set of non-empty bounded open intervals of $\mathbb{R}$ and $E' = \mathbb{R} \setminus E$ for $E \subset \mathbb{R}$.
Multi-Interval Subfactors and Modularity of Representations in CFT
635
thus $\hat A(I) = A(I)$, $I\in\mathcal{I}$; but for a disconnected set $E\subset S^1$, $A(E)\subset\hat A(E)$ is in general a non-trivial inclusion. We shall say that $E\subset S^1$ is an $n$-interval if both $E$ and $E'$ are unions of $n$ intervals with disjoint closures, namely
$$E = I_1\cup I_2\cup\cdots\cup I_n, \qquad I_i\in\mathcal{I},$$
where $\bar I_i\cap\bar I_j = \emptyset$ if $i\neq j$. The set of all $n$-intervals will be denoted by $\mathcal{I}_n$. Recall that $A$ is $n$-regular if $A(S^1\setminus\{p_1,\dots,p_n\}) = B(\mathcal{H})$ for any $p_1,\dots,p_n\in S^1$. Notice that, since we are assuming Haag duality, $A$ is 2-regular if and only if the $A(I)$'s are factors, and that $A$ is 1-regular if, for each point $p\in S^1$,
$$\bigcap_n A(I_n) = \mathbb{C} \quad\text{if } I_n\in\mathcal{I} \text{ and } \bigcap_n I_n = \{p\}. \qquad (2)$$
Proposition 1. The following are equivalent for a fixed $n\in\mathbb{N}$:
(i) The inclusion $A(E)\subset\hat A(E)$ is irreducible for $E\in\mathcal{I}_n$.
(ii) The net $A$ is $2n$-regular.

Proof. With $E = I_1\cup\cdots\cup I_n$ and $p_1,\dots,p_{2n}$ the $2n$ boundary points of $E$, we have $A(E)'\cap\hat A(E) = \mathbb{C}$ if and only if $A(E)\vee\hat A(E)' = B(\mathcal{H})$, which holds if and only if $A(E)\vee A(E') = B(\mathcal{H})$, thus if and only if $A(S^1\setminus\{p_1,\dots,p_{2n}\}) = B(\mathcal{H})$, namely $A$ is $2n$-regular.

If $A$ is strongly additive, namely $A(I) = A(I\setminus\{p\})$, where $I\in\mathcal{I}$ and $p$ is an interior point of $I$, then $A$ is $n$-regular for all $n\in\mathbb{N}$, thus all $A(E)\subset\hat A(E)$, $E\in\mathcal{I}_n$, are irreducible inclusions of factors. A partial converse holds. If $N\subset M$ are von Neumann algebras, we shall say that $N\subset M$ has finite index if the Pimsner–Popa inequality [38] holds, namely there exist $\lambda>0$ and a conditional expectation $\mathcal{E}: M\to N$ with $\mathcal{E}(x)\ge\lambda x$ for all $x\in M_+$; we denote the index by $[M:N]_{\mathcal{E}} = \lambda^{-1}$, with $\lambda$ the best constant for the inequality to hold, and
$$[M:N] = [M:N]_{\min} = \inf_{\mathcal{E}}[M:N]_{\mathcal{E}}$$
denotes the minimal index (see [20] for an overview). Recall that $A$ is split if there exists an intermediate type I factor between $A(I_1)$ and $A(I_2)$ whenever $I_1, I_2$ are intervals and the closure $\bar I_1$ is contained in the interior of $I_2$. This implies (indeed it is equivalent to it, e.g. if the $A(I)$'s are factors) that $A(I_1)\vee A(I_2)'$ is naturally isomorphic to the tensor product of von Neumann algebras $A(I_1)\otimes A(I_2)'$ (cf. [10]). For a conformal net, the split property holds if ${\rm Tr}(e^{-\beta L_0})<\infty$ for all $\beta>0$, cf. [8]. Notice that if $A$ is split and $A(I)$ is a factor for $I\in\mathcal{I}$, then $A(E)$ is a factor for $E\in\mathcal{I}_n$, for any $n$.
Y. Kawahigashi, R. Longo, M. Müger
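Proposition 2. Let $A$ be split and 1-regular. If there exists a constant $C>0$ such that
$$[\hat A(E):A(E)]<C \qquad \forall E\in\mathcal{I}_2,$$
then
$$[A(I):A(I\setminus\{p\})]<C \qquad \forall I\in\mathcal{I},\ p\in I.$$

Proof. With $I\in\mathcal{I}$ and $p\in I$ an interior point, let $I_1, I_2\in\mathcal{I}$ be the connected components of $I\setminus\{p\}$, and let $I_2^{(n)}\subset I_2$ be an increasing sequence of intervals with one boundary point in common with $I$ such that $p\notin\bar I_2^{(n)}$ and $\bigcup_n I_2^{(n)} = I_2$. Then $E_n\equiv I_1\cup I_2^{(n)}\in\mathcal{I}_2$ and we have
$$A(E_n)\nearrow A(I\setminus\{p\}), \qquad \hat A(E_n)\nearrow A(I),$$
where $N_n\nearrow N$ means $N_1\subset N_2\subset\cdots$ and $N = \bigvee_n N_n$, while $N_n\searrow N$ will mean $N_1\supset N_2\supset\cdots$ and $N = \bigcap_n N_n$. The first relation is clear by definition. The second relation follows because
$$\hat A(E_n) = A(E_n')' = (A(I')\vee A(L_n))',$$
where $E_n'\in\mathcal{I}_2$, $E_n' = I'\cup L_n$, and $\bigcap_n L_n = \{p\}$, therefore $A(L_n)\searrow\mathbb{C}$. By the split property $A(I')\vee A(L_n)\simeq A(I')\otimes A(L_n)$, hence by Eq. (2) $A(E_n')\searrow A(I')$, thus
$$\hat A(E_n)\nearrow A(I).$$
The rest of the proof is a consequence of case a) of the following general proposition.

Proposition 3. a) Let
$$\begin{array}{ccccccc} N_1 & \subset & N_2 & \subset & \cdots & \subset & N \\ \cap & & \cap & & & & \cap \\ M_1 & \subset & M_2 & \subset & \cdots & \subset & M \end{array}$$
be von Neumann algebras, $N = \bigvee_i N_i$, $M = \bigvee_i M_i$, b) or let
$$\begin{array}{ccccccc} N_1 & \supset & N_2 & \supset & \cdots & \supset & N \\ \cap & & \cap & & & & \cap \\ M_1 & \supset & M_2 & \supset & \cdots & \supset & M \end{array}$$
be von Neumann algebras, $N = \bigcap_i N_i$, $M = \bigcap_i M_i$. Then
$$[M:N]\le\liminf_{i\to\infty}[M_i:N_i].$$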
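Proof. It is sufficient to prove the result in situation b), as case a) will follow after taking commutants. We may assume $\liminf_{i\to\infty}[M_i:N_i]<\infty$. Let $\mathcal{E}_i : M_i\to N_i$ be expectations and $\lambda>\liminf_{i\to\infty}[M_i:N_i]_{\mathcal{E}_i}$. Passing to a subsequence if necessary, there exists $i_0$ such that, for all $i\ge i_0$,
$$\mathcal{E}_i(x)\ge\lambda^{-1}x, \qquad x\in M_i^+.$$
Let $\mathcal{E}_i^{(0)} = \mathcal{E}_i|_M$, considered as a map from $M$ to $N_i$, and let $\mathcal{E}$ be a weak limit point of the $\mathcal{E}_i^{(0)}$. Then $\mathcal{E}(x)\ge\lambda^{-1}x$ for $x\in M_+$, and $\mathcal{E}(M)\subset\bigcap_i N_i = N$; moreover $\mathcal{E}|_N = {\rm id}$, because $\mathcal{E}_i|_N = {\rm id}$. Thus $\mathcal{E}$ is an expectation of $M$ onto $N$ and $[M:N]\le[M:N]_{\mathcal{E}}\le\lambda$. As the $\mathcal{E}_i$ and $\lambda$ are arbitrary, we thus have $[M:N]\le\liminf_{i\to\infty}[M_i:N_i]$.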
Recall now that the dual net $A^d$ of $A$ is the net on the intervals of $\mathbb{R}$ defined by
$$A^d(I)\equiv A(\mathbb{R}\setminus I)',$$
where we have chosen a point $\infty\in S^1$ and identified $S^1$ with $\mathbb{R}\cup\{\infty\}$. Note that if $A$ is conformal, then Haag duality automatically holds [18] and the dual net $A^d$ is also a conformal net, which is moreover strongly additive; furthermore $A = A^d$ if and only if $A$ is strongly additive, if and only if Haag duality holds on $\mathbb{R}$.
Corollary 4. In the hypothesis of Proposition 2, let $A^d$ be the dual net on $\mathbb{R}$; then $A(I)\subset A^d(I)$ has finite index for all bounded intervals $I$ of $\mathbb{R}$.

Proof. Denoting by $I_1 = I'$ the complement of $I$ in $S^1$, the commutant of the inclusion $A(I)\subset A^d(I)$ is $A(I_1\setminus\{\infty\})\subset A(I_1)$, and this has finite index by Proposition 2.

We have no example where $A(I)\subset A^d(I)$ is non-trivial with finite index and $A$ is conformal; therefore the equality $A(I) = A^d(I)$, i.e. strong additivity, might follow from the assumptions of Proposition 2 in the conformal case.

Proposition 5. Let $A$ be split and strongly additive. Then
(a) the index $[\hat A(E):A(E)]$ is independent of $E\in\mathcal{I}_2$;
(b) the inclusion $A(E)\subset\hat A(E)$ is irreducible for $E\in\mathcal{I}_2$.

Proof. Statement (b) is immediate by Proposition 1. Concerning (a), let $E = I_1\cup I_2$ and $\tilde E = I_1\cup\tilde I_2$, where $\tilde I_2\supset I_2$ are intervals and $I_0\equiv\tilde I_2\setminus I_2$. Assuming $\lambda^{-1}\equiv[\hat A(\tilde E):A(\tilde E)]<\infty$, let $\mathcal{E}_{\tilde E}$ be the corresponding expectation with $\lambda$-bound. Of course $\mathcal{E}_{\tilde E}$ is the identity on $A(I_0)$; being $A(\tilde E)$-bimodular, it maps $\hat A(E)\subset A(I_0)'$ into $A(I_0)'$, hence
$$\mathcal{E}_{\tilde E}(\hat A(E))\subset A(I_0)'\cap A(\tilde E) = A(E),$$
where the last equality follows at once by the split property and strong additivity, as $A(I_0)'\cap A(\tilde I_2) = A(I_2)$.
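Therefore $\mathcal{E}_{\tilde E}|_{\hat A(E)} = \mathcal{E}_E$, showing
$$[\hat A(E):A(E)]\le[\hat A(\tilde E):A(\tilde E)],$$
where we omit the symbol "min" as the expectation is unique. Thus the index decreases by decreasing the 2-interval. Taking commutants (the commutant of $A(E)\subset\hat A(E)$ is $A(E')\subset\hat A(E')$, and $E'$ grows when $E$ shrinks), it also increases, hence it is constant.

Corollary 6. Let $A$ satisfy the assumptions of Proposition 2 and let $A^d$ be the dual net on $\mathbb{R}$ of $A$. Then
$$[\hat A^d(E):A^d(E)]<\infty \qquad \forall E\in\mathcal{I}_2.$$

Proof. We fix the point $\infty$ and may assume $E = I_1\cup I_2$ with $\infty\in\bar I_2$. Set $E' = I_3\cup I_4$ with $\infty\in\bar I_3$. Then $A^d(I_3) = A(I_3)$, $A^d(I_2) = A(I_2)$ and we have
$$A(E)\subset A^d(I_1)\vee A(I_2) = A^d(E)\subset\hat A^d(E) = (A(I_3)\vee A^d(I_4))'\subset(A(I_3)\vee A(I_4))' = \hat A(E).$$
Since $[\hat A(E):A(E)]<\infty$ by assumption, the intermediate inclusion $A^d(E)\subset\hat A^d(E)$ has finite index as well.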
Anticipating results in the following, we have:

Corollary 7. Let $A$ be a local irreducible conformal split net on $S^1$. If $[\hat A(E):A(E)] = I_{\rm global}<\infty$, $E\in\mathcal{I}_2$, then $A$ is $n$-regular for all $n\in\mathbb{N}$.

Proof. If $\rho$ is an irreducible endomorphism of $A$ localized in an interval $I$, then $\rho|_{A(I)}$ is irreducible [17]. Therefore, by Th. 9 (and the comments thereafter) and Prop. 36, the assumptions imply that if $E\in\mathcal{I}_2$ then $A(E)\subset\hat A(E)$ is the LR inclusion associated with the system of all irreducible sectors, which is irreducible. Then $A(E)\subset\hat A(E)$ is irreducible for all $E\in\mathcal{I}_n$, as we shall see in Sect. 6. By Prop. 1 this implies regularity for all $n$.

In view of the above results, it is natural to deal with strongly additive nets when considering multi-interval inclusions of local algebras, and thus to deal with nets of factors on $\mathbb{R}$, as we shall do in the following.

3. Completely Rational Nets

In this section we introduce the notion of a completely rational net, which will be the main object of our study in this paper, and give a first analysis. In the following, we shall denote by $\mathcal{I}$ the set of bounded open non-empty intervals of $\mathbb{R}$, set $I' = \mathbb{R}\setminus I$ and define $A(E) = \bigvee\{A(I) : I\subset E,\ I\in\mathcal{I}\}$ for $E\subset\mathbb{R}$. We again denote by $\mathcal{I}_n$ the set of unions of $n$ elements of $\mathcal{I}$ with pairwise disjoint closures.⁴

Definition 8. A local irreducible net $A$ of von Neumann algebras on the intervals of $\mathbb{R}$ is called completely rational if the following hold:
(a) Haag duality on $\mathbb{R}$: $A(I') = A(I)'$, $I\in\mathcal{I}$,
(b) $A$ is strongly additive,

4 There will be no conflict with the notations in the previous section, as the point $\infty$ does not contribute to the local algebras and we may extend $A$ to $S^1$ by setting $A(I)\equiv A(I\setminus\{\infty\})$; see Appendix B.
(c) $A$ satisfies the split property,
(d) $[\hat A(E):A(E)]<\infty$ if $E\in\mathcal{I}_2$.

Note that if $A$ is the restriction to $\mathbb{R}$ of a local conformal net on $S^1$ (namely a local net which is Möbius covariant with positive energy and cyclic vacuum vector), then (a) is equivalent to (b), cf. [18].

We shall denote by $\mu_A = [\hat A(E):A(E)]$ the index of the irreducible inclusion of factors $A(E)\subset\hat A(E)$ in case $\mu_A$ is independent of $E\in\mathcal{I}_2$, in particular if $A$ is split and strongly additive, by Proposition 5.

By a sector $[\rho]$ of $A$ we shall mean the equivalence class of a localized endomorphism $\rho$ of $A$, which will always be assumed to be transportable, i.e. localizable in each bounded interval $I$ (see also Appendix B). Unless otherwise specified, a localized endomorphism $\rho$ has finite dimension. If $\rho$ is localized in the interval $I$, its restriction $\rho|_{A(I)}$ is an endomorphism of $A(I)$, thus it gives rise to a sector of the factor $A(I)$ (i.e. a normal unital endomorphism of $A(I)$ modulo inner automorphisms of $A(I)$ [25]), and it will be clear from the context which sense is attributed to the term sector. The reader unfamiliar with the sector structure is referred to [25, 28, 17] and to Appendix B.

Let $E = I_1\cup I_2\in\mathcal{I}_2$ and let $\rho$ and $\sigma$ be irreducible endomorphisms of $A$ localized respectively in $I_1$ and in $I_2$. Then $\rho\sigma$ restricts to an endomorphism of $A(E)$, since both $\rho$ and $\sigma$ restrict. Denote by $\gamma_E$ the canonical endomorphism of $\hat A(E)$ into $A(E)$ and $\lambda_E\equiv\gamma_E|_{A(E)}$.

Theorem 9. Let $A$ be completely rational. With the above notations, $\rho\sigma|_{A(E)}$ is contained in $\lambda_E$ if and only if $\sigma$ is conjugate to $\rho$. In this case $\rho\sigma|_{A(E)}\prec\lambda_E$ with multiplicity one.

Proof. By [28], $\rho\sigma|_{A(E)}\prec\lambda_E$ if and only if there exists an isometry $v\in\hat A(E)$ such that
$$vx = \rho\sigma(x)v \qquad \forall x\in A(E). \qquad (3)$$
If Eq. (3) holds, then it also holds for $x\in A(E')$, since $\rho\sigma$ acts trivially on $A(E')$ and $v\in\hat A(E) = A(E')'$; hence, by strong additivity, it holds for $x\in A(I)$ for all $I\in\mathcal{I}$, so that $v\in(\iota,\rho\sigma)$ and $[\sigma] = [\bar\rho]$. Conversely, if $[\sigma] = [\bar\rho]$, then there exists an isometry $v\in A(I)$ such that $vx = \rho\sigma(x)v$ for all $x\in A(I)$, where $I\supset E$ is the interval given by $I = I_1\cup I_2\cup\bar I_3$, with $I_3$ the bounded connected component of $E'$. Since $\rho$ and $\sigma$ act trivially on $A(I_3)$, we have $v\in A(I_3)'\cap A(I)$, but
$$A(I_3)'\cap A(I) = (A(I_3)\vee A(I'))' = A(E')' = \hat A(E),$$
therefore Eq. (3) holds true. As $\rho$ and $\sigma$ are irreducible, the isometry $v$ in (3) is unique up to a phase, and this is equivalent to $\rho\bar\rho|_{A(E)}\prec\lambda_E$ with multiplicity one.

We remark that in the above theorem strong additivity is not necessary for $\rho\bar\rho\prec\lambda_E$, as it can be replaced by the factoriality of $\hat A(E)$, equivalently of $A(E)$; this holds e.g. in the conformal case. Moreover, the split property is also unnecessary: it has not been used.
We shall say that the net $A$ on $\mathbb{R}$ has a modular PCT symmetry if there exists a cyclic and separating (vacuum) vector $\Omega$ for each $A(I)$ with $I$ a half-line (Reeh–Schlieder property), and the modular conjugation $J$ of $A(a,\infty)$ with respect to $\Omega$ has the geometric property
$$JA(I+a)J = A(-I+a), \qquad I\in\mathcal{I},\ a\in\mathbb{R}. \qquad (4)$$
This is automatic if $A$ is conformal, see [4, 15]. It is easy to see that the modular PCT property implies translation covariance, where the translation unitaries are products of modular conjugations, but positivity of the energy does not necessarily hold. Note that Eq. (4) implies Haag duality for half-lines:
$$A(-\infty,a)' = A(a,\infty), \qquad a\in\mathbb{R}.$$
Setting $j\equiv{\rm Ad}\,J$, the conjugate sector exists and is given by the formula [16] $\bar\rho = j\cdot\rho\cdot j$.
Corollary 10. If $A$ is completely rational with modular PCT, then $A$ is rational, namely there are only finitely many irreducible sectors $[\rho_0],[\rho_1],\dots,[\rho_n]$ with finite dimension, and we have
$$\sum_{i=0}^n d(\rho_i)^2\le\mu_A. \qquad (5)$$
Proof. It is sufficient to show this last inequality. By the split property, the endomorphisms $\rho_i\bar\rho_i|_{A(E)}$ can be identified with the endomorphisms $\rho_i\otimes\bar\rho_i$ on $A(I_1)\otimes A(I_2)$, hence they are mutually inequivalent. By Theorem 9,
$$\bigoplus_{i=0}^n\rho_i\bar\rho_i|_{A(E)}\prec\lambda_E, \qquad (6)$$
hence
$$\mu_A = [\hat A(E):A(E)] = d(\lambda_E)\ge\sum_{i=0}^n d(\rho_i)^2.$$
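Inequality (5) involves only the statistical dimensions $d(\rho_i)$. As a purely illustrative sketch (not taken from the paper), these can be recovered from a finite set of fusion rules as the Perron–Frobenius eigenvector of $S = \sum_i N_i$, where $(N_i)_{ab}$ is the multiplicity of $\rho_b$ in $\rho_i\rho_a$; indeed $N_i d = d(\rho_i)\,d$ for every $i$. The Ising fusion rules below (labels `1`, `s`, `p`) and the helper names are assumptions made for the example, and the code computes only the left-hand side $\sum_i d(\rho_i)^2$ of (5), not the index $\mu_A$.

```python
from math import isclose, sqrt

# Assumed example data: Ising fusion rules with labels "1" (identity), "s", "p".
# ISING[(i, a)] maps b to the multiplicity of rho_b in rho_i rho_a.
ISING = {
    ("1", "1"): {"1": 1}, ("1", "s"): {"s": 1}, ("1", "p"): {"p": 1},
    ("s", "1"): {"s": 1}, ("s", "s"): {"1": 1, "p": 1}, ("s", "p"): {"s": 1},
    ("p", "1"): {"p": 1}, ("p", "s"): {"s": 1}, ("p", "p"): {"1": 1},
}


def statistical_dimensions(fusion, labels, iterations=200):
    """Perron-Frobenius eigenvector of S = sum_i N_i, normalised so d(labels[0]) = 1.

    labels[0] must be the identity sector. Since N_i d = d_i d for every i,
    d is an eigenvector of S; S has positive entries here, so power iteration converges.
    """
    n = len(labels)
    S = [[sum(fusion[(i, a)].get(b, 0) for i in labels) for b in labels] for a in labels]
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(S[a][b] * v[b] for b in range(n)) for a in range(n)]
        v = [x / w[0] for x in w]  # keep the identity component equal to 1
    return dict(zip(labels, v))


def sum_of_squares(dims):
    """Left-hand side of inequality (5): sum_i d(rho_i)^2."""
    return sum(d * d for d in dims.values())


d = statistical_dimensions(ISING, ["1", "s", "p"])
print(d, sum_of_squares(d))
```

For these assumed fusion rules the sum is 4, so for a completely rational net with modular PCT whose sectors include these three, (5) would give $\mu_A\ge 4$.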
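We now give a partial converse to Theorem 9.

Lemma 11. Let $A$ be completely rational and let $\mathcal{E}_E$ be the conditional expectation $\hat A(E)\to A(E)$.
(a) If $E\subset\tilde E$ and $E,\tilde E\in\mathcal{I}_2$, then $\mathcal{E}_{\tilde E}|_{\hat A(E)} = \mathcal{E}_E$.
(b) There exists a canonical endomorphism $\gamma_{\tilde E}$ of $\hat A(\tilde E)$ into $A(\tilde E)$ such that $\gamma_{\tilde E}|_{\hat A(E)}$ is a canonical endomorphism of $\hat A(E)$ into $A(E)$ and satisfies $\gamma_{\tilde E}|_{\hat A(E)'\cap A(\tilde E)} = {\rm id}$.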
Proof. (a) has been shown in the proof of Proposition 5. (b) is an immediate variation of [16, Prop. 2.3] and [28, Theorem 3.2].
Theorem 12. Let $A$ be completely rational. Given $E\in\mathcal{I}_2$, $\lambda_E$ extends to a localized (transportable) endomorphism $\lambda$ of $A$ such that $\lambda|_{A(I)} = {\rm id}$ if $I\subset E'$, $I\in\mathcal{I}$. Moreover, $d(\lambda) = d(\lambda_E) = \mu_A$. In particular, if $A$ is conformal, then $\lambda$ is Möbius covariant with positive energy.

Proof. Let $E = (a,b)\cup(c,d)$, where $a<b<c<d$, and $\tilde E = (a',b)\cup(c,d')$, where $a'<a$ and $d'>d$. By Lemma 11 we have a $\gamma_{\tilde E}$ with $\lambda_{\tilde E}|_{A(I)} = {\rm id}$ if $I\subset\tilde E\setminus E$, $I\in\mathcal{I}$. Analogously there is a canonical endomorphism $\gamma' : \hat A(\tilde E)\to A(\tilde E)$ acting trivially on $A(E)$. We may write $\gamma_{\tilde E} = {\rm Ad}\,u\cdot\gamma'$ with $u\in A(\tilde E)$, hence
$$\lambda_{\tilde E} = {\rm Ad}\,u\cdot\lambda', \qquad \lambda' = \gamma'|_{A(\tilde E)}.$$
Since $\gamma'|_{A(a,b)} = {\rm id}$ and $\gamma'|_{A(c,d)} = {\rm id}$, we have $\lambda_{\tilde E} = {\rm Ad}\,u$ on $A(a,b)$ and on $A(c,d)$. Therefore the formula
$$\tilde\lambda = {\rm Ad}\,u$$
defines an endomorphism of $A(a,d)$ acting trivially on $A(b,c)$, with $\tilde\lambda|_{A((a,b)\cup(c,d))} = \lambda_E$. We may also have chosen $\gamma'$ "localized" in $(a',a'')\cup(d'',d')$ with $a'<a''<a$ and $d<d''<d'$, so that we may assume $\lambda'$ to act trivially on $A((a'',b)\cup(c,d''))$. Letting $a',a''\to-\infty$ and $d',d''\to+\infty$, we construct, by an inductive limit of the $\tilde\lambda$'s, an endomorphism $\lambda$ of the quasi-local C*-algebra $\bigcup_{s>0}A(-s,s)$. Clearly, $\lambda$ is localized in $(a,d)$, acts trivially on $A(b,c)$ and is transportable. Moreover, $\lambda$ has finite index, as the operators $R,\bar R\in(\iota,\lambda^2)$ in the standard solution of the conjugate equations [25, 29]
$$\bar R^*\lambda(R) = 1, \qquad R^*\lambda(\bar R) = 1$$
on $\hat A(E)$ give the same relations on $A(I)$ for any $I\supset E$, $I\in\mathcal{I}$. If $A$ is conformal, then $\lambda$ is covariant with respect to translations and dilations by [17]. As we may vary the point $\infty$, $\lambda$ is also covariant with respect to dilations and translations with respect to a different point at $\infty$; hence $\lambda$ is Möbius covariant.

Lemma 13. Let $A$ be completely rational. Then there are at most $\lfloor\mu_A\rfloor$ mutually different irreducible sectors of $A$ (with finite or infinite dimension).

Proof. Consider the family $\{[\rho_\lambda]\}$ of all irreducible sectors and let $N$ be the cardinality of this family. With $E = I_1\cup I_2\in\mathcal{I}_2$, we may assume that each $\rho_\lambda$ is localized in $I_1$ and choose endomorphisms $\sigma_\lambda$ equivalent to $\rho_\lambda$ and localized in $I_2$. Let then $u_\lambda\in(\rho_\lambda,\sigma_\lambda)\subset\hat A(E)$ be a unitary intertwiner and $\mathcal{E}$ the conditional expectation from $\hat A(E)$ to $A(E)$. Since
$$u_\lambda\rho_\lambda(x) = \sigma_\lambda(x)u_\lambda = xu_\lambda, \qquad \forall x\in A(I_1),$$
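we have
$$u_{\lambda'}^*u_\lambda\,\rho_\lambda(x) = \rho_{\lambda'}(x)\,u_{\lambda'}^*u_\lambda, \qquad \forall x\in A(I_1),$$
hence $T\equiv\mathcal{E}(u_{\lambda'}^*u_\lambda)\in A(E)$ intertwines $\rho_\lambda|_{A(I_1)}$ and $\rho_{\lambda'}|_{A(I_1)}$. The split property allowing us to identify $A(E)$ with $A(I_1)\otimes A(I_2)$, every state $\varphi$ in $A(I_2)_*$ gives rise to a conditional expectation $\mathcal{E}_\varphi : A(E)\to A(I_1)$. Then $\mathcal{E}_\varphi(T)\in(\rho_\lambda,\rho_{\lambda'})$, and the inequivalence of $\rho_\lambda|_{A(I_1)}$ and $\rho_{\lambda'}|_{A(I_1)}$ (see above) entails $\mathcal{E}_\varphi(T) = 0$. Since this holds for every $\varphi\in A(I_2)_*$, we conclude
$$T = \mathcal{E}(u_{\lambda'}^*u_\lambda) = 0, \qquad \lambda\neq\lambda'.$$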
Let $M$ be the Jones extension of $A(E)\subset\hat A(E)$, $e\in M$ the Jones projection implementing $\mathcal{E}$, and let $\mathcal{E}_1 : M\to\hat A(E)$ be the dual conditional expectation. Then $e\,u_{\lambda'}^*u_\lambda\,e = 0$ if $\lambda\neq\lambda'$, and therefore the $e_\lambda\equiv u_\lambda e u_\lambda^*$ are mutually orthogonal projections in $M$ with $\mathcal{E}_1(e_\lambda) = \mu_A^{-1}$. Since their (strong) sum $p = \sum_\lambda e_\lambda$ is again an orthogonal projection, we have $p\le 1$ and thus $\mathcal{E}_1(p)\le\mathcal{E}_1(1) = 1$. This implies the bound $N\mu_A^{-1}\le 1$ and thus our claim.

We shall say that a sector $[\rho]$ is of type I if $\bigvee_{I\in\mathcal{I}}\rho(A(I))$ is a type I von Neumann algebra, namely $\rho$ is a type I representation of the quasi-local C*-algebra $\bigcup_{s>0}A(-s,s)$.

Corollary 14. If $A$ is completely rational on a separable Hilbert space, then all factor representations of $A$ on separable Hilbert spaces are of type I.

Proof. Assuming the contrary, by Corollary 59 we get an infinite family $\{[\rho_\lambda]\}$ of different irreducible sectors. This contradicts Lemma 13.

We end this section with the following variation of a known fact [10].

Proposition 15. Let $A$ be a completely rational net with modular PCT on a Hilbert space $\mathcal{H}$. Then $\mathcal{H}$ is separable.

Proof. We choose a pair $I\subset\tilde I$ of intervals with $\bar I$ contained in the interior of $\tilde I$, and a type I factor $N$ between $A(I)$ and $A(\tilde I)$. The vacuum vector $\Omega$ is separating for $A(\tilde I)$, hence for $N$. Thus $N$ admits a faithful normal state, hence it is countably decomposable. Being of type I, $N$ is countably generated. So $\overline{A(I)\Omega}\subset\overline{N\Omega}$ is a separable subspace of $\mathcal{H}$. But $\bigcup_{n=1}^\infty A(-n,n)\Omega$ is dense in $\mathcal{H}$, thus $\mathcal{H}$ is separable.

4. The Structure of Sectors for the (Time = 0) LR Net

This section contains a study of the sector structure of the net obtained by the LR construction, by means of the braiding symmetry. It will be continued in the next section by a different approach.
Let $N$ be an infinite factor and $\{[\rho_i]\}$ a rational system of sectors of $N$, namely the $[\rho_i]$'s form a finite family of mutually different irreducible finite-dimensional sectors of $N$ which is closed under conjugation and under taking the irreducible components of compositions. The identity sector is usually labeled $\rho_0$. We call $M\supset N\otimes N^{\rm opp}$
the LR inclusion, the canonical inclusion constructed in [28], where $M$ is a factor, $N\otimes N^{\rm opp}\subset M$ is irreducible with finite index and
$$\lambda = \bigoplus_i\rho_i\otimes\rho_i^{\rm opp}$$
for $\lambda\in{\rm End}(N\otimes N^{\rm opp})$ the restriction to $N\otimes N^{\rm opp}$ of the canonical endomorphism $\gamma : M\to N\otimes N^{\rm opp}$. We shall give an alternative characterization of this inclusion in Proposition 45. The same construction works in slightly more generality, by replacing $N^{\rm opp}$ with a factor $N_1$ and $\{\rho_i^{\rm opp}\}_i$ by $\{\rho_i^j\}_i\subset{\rm End}(N_1)$, where $\rho\mapsto\rho^j$ is an anti-linear invertible tensor functor from the tensor category generated by $\{\rho_i\}_i$ to the tensor category generated by $\{\rho_i^j\}_i$. Extensions of our results to this case are obvious, but sometimes useful, and will be considered, possibly implicitly. The following is due to Izumi [22]. Since it is easy to give a proof in our context, we include one here.
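Lemma 16. For every $\rho_i$, the $(N\otimes N^{\rm opp})$–$M$ sector $[\rho_i\otimes{\rm id}][\gamma] = [{\rm id}\otimes\bar\rho_i^{\rm opp}][\gamma]$ is irreducible, and each irreducible $(N\otimes N^{\rm opp})$–$M$ sector arising from $N\otimes N^{\rm opp}\subset M$ is of this form, where $\gamma$ is regarded as an $(N\otimes N^{\rm opp})$–$M$ sector. If $[\rho_i]\neq[\rho_j]$ as $N$–$N$ sectors, then $[\rho_i\otimes{\rm id}][\gamma]\neq[\rho_j\otimes{\rm id}][\gamma]$ as $(N\otimes N^{\rm opp})$–$M$ sectors. We have
$$[\rho_i\otimes\rho_j^{\rm opp}][\gamma] = \sum_k N_{i\bar j}^k\,[\rho_k\otimes{\rm id}][\gamma]$$
as $(N\otimes N^{\rm opp})$–$M$ sectors, where $N_{i\bar j}^k$ is the structure constant for $\{\rho_i\}_i$.

Proof. Set $[\sigma] = [\rho_i\otimes{\rm id}][\gamma]$ and compute $[\sigma][\bar\sigma]$. Since $[\bar\gamma] = [\iota]$, where $\iota$ is the inclusion map of $N\otimes N^{\rm opp}$ into $M$ regarded as an $M$–$(N\otimes N^{\rm opp})$ sector, and $[\gamma][\iota] = [\lambda] = \sum_k[\rho_k\otimes\rho_k^{\rm opp}]$, we have $[\sigma][\bar\sigma] = \sum_k[\rho_i\rho_k\bar\rho_i\otimes\rho_k^{\rm opp}]$, and this contains the identity only once. So $[\rho_i\otimes{\rm id}][\gamma]$ is an irreducible $(N\otimes N^{\rm opp})$–$M$ sector. We can similarly prove that if $[\rho_i]\neq[\rho_j]$, then $[\rho_i\otimes{\rm id}][\gamma]\neq[\rho_j\otimes{\rm id}][\gamma]$.

We next set $[\sigma'] = [{\rm id}\otimes\bar\rho_i^{\rm opp}][\gamma]$ as an $(N\otimes N^{\rm opp})$–$M$ sector, which is also irreducible. We compute
$$[\sigma][\bar\sigma'] = [\rho_i\otimes{\rm id}][\lambda][{\rm id}\otimes\rho_i^{\rm opp}] = \sum_k[\rho_i\rho_k\otimes\rho_k^{\rm opp}\rho_i^{\rm opp}],$$
which contains the identity only once. So we have $[\rho_i\otimes{\rm id}][\gamma] = [{\rm id}\otimes\bar\rho_i^{\rm opp}][\gamma]$. The rest is now easy.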
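The last formula of Lemma 16 is fusion-rule bookkeeping: the multiplicity of $[\rho_k\otimes{\rm id}][\gamma]$ in $[\rho_i\otimes\rho_j^{\rm opp}][\gamma]$ is $N_{i\bar j}^k$. The sketch below tabulates these multiplicities for an assumed fusion table (the Ising rules again, all of whose sectors are self-conjugate) and checks the dimension identity $d(\rho_i)d(\rho_j) = \sum_k N_{i\bar j}^k d(\rho_k)$, which is what the formula reduces to after cancelling the common factor $d(\gamma)$. The names `ISING`, `CONJ`, `DIM` and the helper functions are hypothetical and serve only as an illustration.

```python
from math import isclose, sqrt

# Assumed example data (Ising fusion rules); labels "1" (identity), "s", "p".
# ISING[(i, a)] maps b to the multiplicity of rho_b in rho_i rho_a.
ISING = {
    ("1", "1"): {"1": 1}, ("1", "s"): {"s": 1}, ("1", "p"): {"p": 1},
    ("s", "1"): {"s": 1}, ("s", "s"): {"1": 1, "p": 1}, ("s", "p"): {"s": 1},
    ("p", "1"): {"p": 1}, ("p", "s"): {"s": 1}, ("p", "p"): {"1": 1},
}
CONJ = {"1": "1", "s": "s", "p": "p"}  # every Ising sector is self-conjugate
DIM = {"1": 1.0, "s": sqrt(2.0), "p": 1.0}


def lemma16_multiplicities(i, j, fusion=ISING, conj=CONJ):
    """Multiplicities N_{i jbar}^k of [rho_k x id][gamma] in [rho_i x rho_j^opp][gamma]."""
    return dict(fusion[(i, conj[j])])


def dimensions_match(i, j, dims=DIM):
    """Check d(rho_i) d(rho_j) == sum_k N_{i jbar}^k d(rho_k) (d(gamma) cancelled)."""
    rhs = sum(m * dims[k] for k, m in lemma16_multiplicities(i, j).items())
    return isclose(dims[i] * dims[j], rhs)


print(lemma16_multiplicities("s", "s"))  # {'1': 1, 'p': 1}
```

With these assumed rules, $[\rho_s\otimes\rho_s^{\rm opp}][\gamma]$ splits as $[{\rm id}\otimes{\rm id}][\gamma]+[\rho_p\otimes{\rm id}][\gamma]$.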
Let us now assume we have a strongly additive, Haag dual, irreducible net of factors $A(I)$ on $\mathbb{R}$ with a rational system of irreducible sectors $\{[\rho_i]\}_i$ (with $\rho_0 = {\rm id}$), namely $\{[\rho_i]\}_i$ is a family of finitely many different irreducible sectors of $A$ with finite dimension, stable under conjugation and under taking irreducible components of compositions. One may construct [42, 28] a net of subfactors $A\otimes A^{\rm opp}\subset B$ such that the corresponding canonical endomorphism restricted to $A\otimes A^{\rm opp}$ is given by $\bigoplus_i\rho_i\otimes\rho_i^{\rm opp}$. We call this $B$ the LR net. For $A^{\rm opp}$ we use $\varepsilon^{\rm opp}(\rho_k^{\rm opp},\rho_l^{\rm opp}) = j(\varepsilon(\rho_k,\rho_l))^*$, where $j$ is the anti-isomorphism from $A$ to $A^{\rm opp}$. In order to distinguish the two braidings, we write $\varepsilon^+$ and $\varepsilon^-$. In other words, the LR net here is obtained as the time-zero fields of the canonical two-dimensional net constructed in [28]: it is a local net, but if $A$ is translation covariant with positive energy, $B$ is translation covariant without the spectrum condition (the translations on $B$ are space translations).
Then the net of inclusions $A\otimes A^{\rm opp}(I)\subset B(I)$ is a net of subfactors in the sense of [28, Sect. 3], that is, we have a vacuum vector with the Reeh–Schlieder property and consistent conditional expectations. We denote by $\gamma$ the canonical endomorphism of $B$ into $A\otimes A^{\rm opp}$ and by $\lambda$ its restriction to $A\otimes A^{\rm opp}$. We may suppose that $\lambda$ is also localized in $I$. We shorten our notation by setting $N\equiv A(I)$ and $M\equiv B(I)$. We thus have $\lambda(x) = \sum_i V_i(\rho_i\otimes\rho_i^{\rm opp})(x)V_i^*$, where the $V_i$'s are isometries in $N\otimes N^{\rm opp}$ with $\sum_i V_iV_i^* = 1$. We follow [21] for the terminology of $(N\otimes N^{\rm opp})$–$M$ sectors, and so on, and study the sector structure of the subfactor $N\otimes N^{\rm opp}\subset M$ in this section. In other words, we study the sector structure of a single subfactor, not the structure of superselection sectors of the net, though we will be interested in the latter in the next section. So in this section the term sector refers to a subfactor, not to a net. However, the inclusion $N\otimes N^{\rm opp}\subset M$ has extra structure inherited from the inclusion of nets $A\otimes A^{\rm opp}\subset B$, namely the left and right unitary braid symmetries and the extension and restriction maps.

We first note that $\{[\rho_i\otimes\rho_j^{\rm opp}]\}_{ij}$ gives a system of irreducible $A\otimes A^{\rm opp}$–$A\otimes A^{\rm opp}$ sectors. This gives, as a corollary, the following description of the principal graph of $N\otimes N^{\rm opp}\subset M$, which was first found by Ocneanu in [35] for his asymptotic inclusion. Label even vertices with $(i,j)$ for $[\rho_i\otimes\rho_{\bar j}^{\rm opp}]$ and odd vertices with $k$ for $[\rho_k\otimes{\rm id}][\gamma]$, and draw an edge with multiplicity $N_{ij}^k$ between the even vertex $(i,j)$ and the odd vertex $k$. The connected component of this graph containing the vertex $(0,0)$ is the principal graph of the subfactor $N\otimes N^{\rm opp}\subset M$.

Now we consider the α-induction introduced in [28] and further studied in [49, 2], namely, if $\sigma$ is a localized endomorphism of $A\otimes A^{\rm opp}$, we set
$$\alpha_\sigma^\pm = \gamma^{-1}\cdot{\rm Ad}(\varepsilon^\pm(\sigma,\lambda))\cdot\sigma\cdot\gamma. \qquad (7)$$
(The notation in [28] is $\sigma_{\rm ext}$.) Recall that if $\sigma$ is an endomorphism of $A\otimes A^{\rm opp}$ localized in the interval $I$, then $\alpha_\sigma^\pm$ is an endomorphism of $B$ localized in a positive/negative half-line containing $I$; yet, as shown in [2, I], $\alpha_\sigma^\pm$ restricts to an endomorphism of $M = B(I)$. We will denote this restriction by the same symbol $\alpha_\sigma^\pm$.

Lemma 17. The $M$–$M$ sectors $[\alpha^+_{\rho_i\otimes{\rm id}}]$ are irreducible and mutually different.
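Proof. We compute $\langle\alpha^+_{\rho_i\otimes{\rm id}},\alpha^+_{\rho_j\otimes{\rm id}}\rangle$, the dimension of the intertwiner space between $\alpha^+_{\rho_i\otimes{\rm id}}$ and $\alpha^+_{\rho_j\otimes{\rm id}}$, by using [2, I, Theorem 3.9]. This number is then equal to
$$\sum_k\langle\rho_k\rho_i\otimes\rho_k^{\rm opp},\,\rho_j\otimes{\rm id}\rangle = \delta_{ij}.$$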
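This gives the conclusion.

Lemma 18. As $M$–$M$ sectors, we have $[\alpha^+_{\rho_i\otimes{\rm id}}] = [\alpha^+_{{\rm id}\otimes\bar\rho_i^{\rm opp}}]$.

Proof. By an argument similar to the proof of the above lemma, we know that $[\alpha^+_{{\rm id}\otimes\bar\rho_i^{\rm opp}}]$ is also irreducible. [2, I, Theorem 3.9] gives
$$\langle[\alpha^+_{\rho_i\otimes{\rm id}}],[\alpha^+_{{\rm id}\otimes\bar\rho_i^{\rm opp}}]\rangle = \sum_k\langle\rho_k\rho_i\otimes\rho_k^{\rm opp},\,{\rm id}\otimes\bar\rho_i^{\rm opp}\rangle = 1,$$
which gives the conclusion.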
We then have the following corollary.

Corollary 19. The set of irreducible $M$–$M$ sectors appearing in the decomposition of $\alpha^+_{\rho_i\otimes\rho_j^{\rm opp}}$ for all $i,j$ is $\{[\alpha^+_{\rho_i\otimes{\rm id}}]\}_i$.
The next theorem is useful for studying the subfactors arising from disconnected intervals for a conformal net. For the rest of this section we shall assume the braiding to be non-degenerate.

Theorem 20. Assume the braiding to be non-degenerate and suppose an irreducible $M$–$M$ sector $[\beta]$ appears in the decompositions of both $\alpha^+_{\rho_i\otimes\rho_j^{\rm opp}}$ and $\alpha^-_{\rho_k\otimes\rho_l^{\rm opp}}$ for some $i,j,k,l$. Then $[\beta]$ is the identity of $M$.
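Proof. $\alpha^+$ and $\alpha^-$ map sectors localized in bounded intervals to soliton sectors localized in right unbounded and left unbounded half-lines, respectively. Hence $[\beta]$ is localized in a bounded interval. By the above corollary, we may assume that $[\beta] = [\alpha^+_{\rho_i\otimes{\rm id}}]$ for some $i$; hence $\rho_i\otimes{\rm id}$ must have trivial monodromy with $\lambda$, i.e.,
$$\varepsilon(\rho_i\otimes{\rm id},\lambda)\,\varepsilon(\lambda,\rho_i\otimes{\rm id}) = 1,$$
which in turn gives $\varepsilon(\rho_i,\rho_k)\varepsilon(\rho_k,\rho_i) = 1$ for all $k$. The non-degeneracy assumption gives $[\rho_i] = [{\rm id}]$, as desired.

We now define an endomorphism of $M$ by $\beta_{ij} = \alpha^+_{\rho_i\otimes{\rm id}}\alpha^-_{{\rm id}\otimes\rho_j^{\rm opp}}$. More explicitly, we have $\beta_{ij} = \gamma^{-1}\cdot{\rm Ad}(U_{ij}^{+-})\cdot(\rho_i\otimes\rho_j^{\rm opp})\cdot\gamma$, where
$$U_{ij}^{+-} = \sum_k V_k\big(\varepsilon^+(\rho_i,\rho_k)\otimes\varepsilon^{-,{\rm opp}}(\rho_j^{\rm opp},\rho_k^{\rm opp})\big)(\rho_i\otimes\rho_j^{\rm opp})(V_k^*).$$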
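Note that if we define similarly
$$U_{ij}^{++} = \sum_k V_k\big(\varepsilon^+(\rho_i,\rho_k)\otimes\varepsilon^{+,{\rm opp}}(\rho_j^{\rm opp},\rho_k^{\rm opp})\big)(\rho_i\otimes\rho_j^{\rm opp})(V_k^*),$$
we then have $\alpha^+_{\rho_i\otimes\rho_j^{\rm opp}} = \gamma^{-1}\cdot{\rm Ad}(U_{ij}^{++})\cdot(\rho_i\otimes\rho_j^{\rm opp})\cdot\gamma$. By [2, I, Prop. 18], we have
$$[\beta_{ij}] = [\alpha^+_{\rho_i\otimes{\rm id}}][\alpha^-_{{\rm id}\otimes\rho_j^{\rm opp}}] = [\alpha^-_{\bar\rho_j\otimes{\rm id}}][\alpha^+_{\rho_i\otimes{\rm id}}] = [\alpha^-_{{\rm id}\otimes\rho_j^{\rm opp}}][\alpha^+_{\rho_i\otimes{\rm id}}]$$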
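as $M$–$M$ sectors.

The following proposition is due to Izumi [22] (with a different proof) and, in the setting of the asymptotic inclusion, was first found by Ocneanu [37] (see also [13]).

Proposition 21. Each $[\beta_{ij}]$ is an irreducible $M$–$M$ sector, and these are mutually different for different pairs $(i,j)$. Each irreducible $M$–$M$ sector arising from $N\otimes N^{\rm opp}\subset M$ is of this form.

Proof. We compute
$$\langle\beta_{ij},\beta_{kl}\rangle = \langle\alpha^+_{\rho_i\otimes{\rm id}}\alpha^-_{{\rm id}\otimes\rho_j^{\rm opp}},\,\alpha^+_{\rho_k\otimes{\rm id}}\alpha^-_{{\rm id}\otimes\rho_l^{\rm opp}}\rangle = \langle\alpha^+_{\bar\rho_k\rho_i\otimes{\rm id}},\,\alpha^-_{{\rm id}\otimes\rho_l^{\rm opp}\bar\rho_j^{\rm opp}}\rangle.$$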
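The only sector which can be contained in both $[\alpha^+_{\bar\rho_k\rho_i\otimes{\rm id}}]$ and $[\alpha^-_{{\rm id}\otimes\rho_l^{\rm opp}\bar\rho_j^{\rm opp}}]$ is the identity, by Theorem 20. Since the identity occurs in $\alpha^+_{\bar\rho_k\rho_i\otimes{\rm id}}$ with multiplicity $\delta_{ik}$ by Lemma 17, and analogously in $\alpha^-_{{\rm id}\otimes\rho_l^{\rm opp}\bar\rho_j^{\rm opp}}$ with multiplicity $\delta_{jl}$, the above number is $\delta_{ik}\delta_{jl}$. Since the square sums of the statistical dimensions for $\{\rho_i\otimes\rho_j^{\rm opp}\}_{ij}$ and $\{\beta_{ij}\}_{ij}$ are the same, this completes the proof.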
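The equality of square sums used in the last step can be written out as follows. This sketch assumes the standard facts that α-induction preserves statistical dimensions, $d(\alpha^\pm_\sigma) = d(\sigma)$, that $d(\rho^{\rm opp}) = d(\rho)$, and that dimensions are multiplicative under composition:
$$\sum_{i,j}d(\beta_{ij})^2 = \sum_{i,j}d(\alpha^+_{\rho_i\otimes{\rm id}})^2\,d(\alpha^-_{{\rm id}\otimes\rho_j^{\rm opp}})^2 = \sum_{i,j}d(\rho_i)^2d(\rho_j)^2 = \Big(\sum_i d(\rho_i)^2\Big)^2 = \sum_{i,j}d(\rho_i\otimes\rho_j^{\rm opp})^2.$$
Since the $\beta_{ij}$ are mutually inequivalent irreducible $M$–$M$ sectors, and the two even systems of a finite-depth subfactor have the same global index (a standard fact), this leaves no room for further irreducible $M$–$M$ sectors.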
Note that here we have used the definition in [28] for the map $\rho_i\otimes\rho_j^{\rm opp}\mapsto\beta_{ij}$; a general theory of this map has been developed in [2] under the name α-induction. But in [2] the authors assumed a certain condition, called chiral locality in the terminology of [3], and some results in [2] depend on this assumption, while the definition itself makes sense without it. Our mixed use of the braidings $\varepsilon^+$ and $\varepsilon^-$ here violates this chiral locality condition, so we can use the results of [2] here only when they are independent of the chiral locality assumption. For example, it is easy to see that the analogue of [2, I, Theorem 3.9] does not hold for our map here.

With the above proposition, we have the following description of the dual principal graph of $N\otimes N^{\rm opp}\subset M$ as a corollary, which is originally due to Ocneanu [37] (see also [13]). Label even vertices with $(i,j)$ for $[\beta_{i\bar j}]$ and odd vertices with $k$ for $[\rho_k\otimes{\rm id}][\gamma]$, and draw an edge with multiplicity $N_{ij}^k$ between the even vertex $(i,j)$ and the odd vertex $k$. The connected component of this graph containing the vertex $(0,0)$ is the dual principal graph of the subfactor $N\otimes N^{\rm opp}\subset M$, which is the same as the principal graph.

We next study the tensor category of the $M$–$M$ sectors.

Lemma 22. Let $V$, $W$ be intertwiners from $\rho_i\rho_k$ to $\rho_m$ and from $\rho_j\rho_l$ to $\rho_n$, respectively, in $N$. Then $V\otimes W^{*{\rm opp}}\in N\otimes N^{\rm opp}$ is an intertwiner from $\beta_{ij}\beta_{kl}$ to $\beta_{mn}$.

Proof. By a direct computation.
Then we easily get the following from the above lemma. (The quantum 6j-symbols for subfactors were introduced in [36] as a generalization of classical 6j-symbols. See [12, Chapter 12] for details.)

Theorem 23. In the above setting, the tensor categories of $(N\otimes N^{\rm opp})$–$(N\otimes N^{\rm opp})$ sectors and of $M$–$M$ sectors, with quantum 6j-symbols, are isomorphic.

5. Relations with the Quantum Double

This section contains our main results. Here below we will consider an inclusion $A\subset B$ of nets of factors. We shall say that $A\subset B$ has finite index if there is a consistent family of conditional expectations $\mathcal{E}_I : B(I)\to A(I)$, $I\in\mathcal{I}$, and $[B(I):A(I)]_{\mathcal{E}_I}<\infty$ does not depend on $I\in\mathcal{I}$. The independence of the index from the interval $I$ automatically holds if there is a (vacuum) vector with the Reeh–Schlieder property and $\mathcal{E}_I$ preserves the vacuum state (standard nets, see [28]). The index will simply be denoted by $[B:A]$.

Proposition 24. Let $A\subset B$ be a finite-index inclusion of nets of factors as above. If $A$ and $B$ are completely rational, then $\mu_A = I^2\mu_B$ with $I = [B:A]$.

Proof. If $N_1$, $N_2$ are factors, we shall use the symbol $N_1\overset{\alpha}{\subset}N_2$ to indicate that $N_1\subset N_2$ and $[N_2:N_1] = \alpha$.
Let $E = I_1\cup I_2\in\mathcal{I}_2$; we will show that
$$\begin{array}{ccc} B(E) & \overset{\mu_B}{\subset} & B(E')' \\ \cup\,{\scriptstyle I^2} & & \cap\,{\scriptstyle I^2} \\ A(E) & \overset{I^2\mu_A}{\subset} & A(E')' \end{array}$$
where $A(E)\subset B(E)$ has index $I^2$ because $A(E)\simeq A(I_1)\otimes A(I_2)$, $B(E)\simeq B(I_1)\otimes B(I_2)$ and $[B(I_i):A(I_i)] = I$. In the diagram, the commutants are taken in the Hilbert space $\mathcal{H}_B$ of $B$, hence $B(E)\overset{\mu_B}{\subset}B(E')'$ is obvious. We now show that on $\mathcal{H}_B$ we have $A(E)\overset{I^2\mu_A}{\subset}A(E')'$.

Let $\gamma : B\to A$ be a canonical endomorphism with $\lambda = \gamma|_A$ localized in an interval $I_0$; then the net $I\mapsto A(I)$ on $\mathcal{H}_B$ ($I\supset I_0$) is unitarily equivalent to the net $I\mapsto\lambda(A(I))$ on $\mathcal{H}_A$, and we may assume $I_0\subset I_1$. Then the correspondence associated with $A(E)$–$A(E')$ on $\mathcal{H}_B$, namely $\mathcal{H}_B$ with the natural commuting actions of $A(E)$ and $A(E')$, is unitarily equivalent to the one associated with $\lambda(A(E))$–$\lambda(A(E'))$ on $\mathcal{H}_A$, namely $\mathcal{H}_A$ with the commuting actions of $A(E)$ and $A(E')$ obtained by composing their defining actions with the map $X\mapsto\lambda(X)$. But $\lambda(A(E)) = \lambda(A(I_1)\vee A(I_2)) = \lambda(A(I_1))\vee A(I_2)$ and $\lambda(A(E')) = A(E')$, hence the $A(E)$–$A(E')$ correspondence on $\mathcal{H}_B$ is unitarily equivalent to the $(\lambda(A(I_1))\vee A(I_2))$–$A(E')$ correspondence on $\mathcal{H}_A$, and its index is
$$[\hat A(E):\lambda(A(I_1))\vee A(I_2)] = [\hat A(E):A(E)]\,[A(E):\lambda(A(I_1))\vee A(I_2)] = \mu_A I^2.$$
It follows from the diagram that $I^2\mu_A = \mu_B I^2\cdot I^2$, thus $\mu_A = I^2\mu_B$.
The following proposition may be generalized to the case of a finite-index inclusion A ⊂ B as above.
Proposition 25. Let $A$ be completely rational with modular PCT and let $B\supset A\otimes A^{\rm opp}$ be the LR net. Then $B$ is also completely rational with modular PCT.

Proof. Let $E = I_1\cup I_2$ and let $I_3$ be the bounded connected component of $E'$. Set $C\equiv A\otimes A^{\rm opp}$. Then the conditional expectation $\mathcal{E}_I : B(I)\to C(I)$ associated with the interval $I$, where $I$ is the interior of $\bar I_1\cup\bar I_2\cup\bar I_3$, maps $\hat B(E)$ onto $\hat C(E)$, because
$$\mathcal{E}_I(\hat B(E))\subset C(I_3)'\cap C(I) = \hat C(E);$$
thus
$$\mathcal{E}\equiv\mathcal{E}_0\cdot\mathcal{E}_I|_{\hat B(E)} \qquad (8)$$
is a finite-index expectation of $\hat B(E)$ onto $C(E)$, where $\mathcal{E}_0$ is the expectation of $\hat C(E)$ onto $C(E)$. Therefore $\mu_B<\infty$ follows by a diagram similar to the one in the proof of Proposition 24 (with $A\otimes A^{\rm opp}$ instead of $A$), as we know a priori that the vertical inclusions have finite index, while the bottom horizontal inclusion has finite index by the argument given there. Then the strong additivity of $B$ follows easily, and so does its modular PCT property, but we omit the arguments that are not essential here (if $A$ is conformal, this follows directly because then $B$ is also conformal). We now show the split property of $B$. For notational convenience we treat the case of two separated intervals, rather than that of an interval and the complement of a larger interval. It will be enough to show that the above expectation (8) satisfies
$$\mathcal{E}(b_1b_2) = \mathcal{E}(b_1)\mathcal{E}(b_2), \qquad b_i\in B(I_i),$$
and $\mathcal{E}(B(I_i))\subset C(I_i)$, as we may then compose a normal product state $\varphi_1\otimes\varphi_2$ of $C(I_1)\vee C(I_2)\simeq C(I_1)\otimes C(I_2)$ with $\mathcal{E}$ to get a normal product state of $B(I_1)\vee B(I_2)$.

Let $R_i^{(h)}\in B(I_h)$, $h = 1,2$, be elements satisfying the relations (15), so that $B(I_h)$ is generated by $C(I_h)$ and $\{R_i^{(h)}\}_i$. With $b^{(h)}\in B(I_h)$ we then have
$$b^{(h)} = \sum_i a_i^{(h)}R_i^{(h)}, \qquad a_i^{(h)}\in C(I_h),$$
hence
$$b^{(1)}b^{(2)} = \sum_{i,j}a_i^{(1)}a_j^{(2)}R_i^{(1)}R_j^{(2)},$$
so we have to show that $\mathcal{E}(R_i^{(1)}R_j^{(2)}) = 0$ unless $i = j = 0$. Now $R_i^{(1)} = u_iR_i^{(2)}$ for some unitary $u_i\in\hat C(E)$, and
$$\mathcal{E}_I(R_i^{(2)}R_j^{(2)}) = \mathcal{E}_I\Big(\sum_k C_{ij}^{k\,(2)}R_k^{(2)}\Big) = C_{ij}^{0\,(2)} = \delta_{i\bar j}\,C_{ij}^{0\,(2)}$$
(see Appendix A for the definition of the $C_{ij}^k$), hence
$$\mathcal{E}(R_i^{(1)}R_j^{(2)}) = \mathcal{E}(u_iR_i^{(2)}R_j^{(2)}) = \mathcal{E}_0\big(u_i\,\mathcal{E}_I(R_i^{(2)}R_j^{(2)})\big) = \delta_{i\bar j}\,\mathcal{E}_0(u_iC_{i\bar i}^{0\,(2)}) = \delta_{i\bar j}\,\mathcal{E}_0(u_i)\,C_{i\bar i}^{0\,(2)},$$
which is $0$ if $i\neq 0$, because $\mathcal{E}_0(u_i)\in C(E)$ is an intertwiner between irreducible endomorphisms localized in $I_1$ and $I_2$, while $\mathcal{E}_0(u_0) = \mathcal{E}_0(1) = 1$.
Multi-Interval Subfactors and Modularity of Representations in CFT
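We get the following corollary, where the last part will follow from Proposition 36 below.
Corollary 26. Let $\mathcal{A}$ be completely rational and $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}} \subset \mathcal{B}$ be the LR inclusion. Then
$$\mu_{\mathcal{A}}^2 = I_{\mathrm{global}}^2\, \mu_{\mathcal{B}},$$
where $I_{\mathrm{global}} = \sum_i d(\rho_i)^2$. In particular, $\mu_{\mathcal{B}} = 1$ if and only if $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is isomorphic to the LR inclusion.
Proof. By Propositions 24, 25 and 36.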
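As a numerical illustration of $I_{\mathrm{global}} = \sum_i d(\rho_i)^2$ (which by Theorem 33 is also the 2-interval index $\mu_{\mathcal{A}}$), the following Python sketch evaluates it for the Ising model, whose three sectors have dimensions $1$, $1$, $\sqrt{2}$, and for the $SU(2)_k$ nets. For the latter it assumes the standard quantum-dimension formula $d(\rho_j) = \sin((j+1)\pi/(k+2))/\sin(\pi/(k+2))$, $j = 0, \dots, k$, which is taken as given here and is not derived in this paper. It also compares the result with the closed form $(k+2)/(2\sin^2(\pi/(k+2)))$ and computes the share of $I_{\mathrm{global}}$ carried by the sectors at the even vertices of $A_{k+1}$. All function names are hypothetical.

```python
import math

# Ising model: vacuum, energy and spin sectors with dimensions 1, 1, sqrt(2).
ISING_DIMS = [1.0, 1.0, math.sqrt(2.0)]


def global_index(dims):
    """I_global = sum_i d(rho_i)^2 for a list of sector dimensions."""
    return sum(d * d for d in dims)


def su2_quantum_dims(k):
    """Assumed formula d(rho_j) = sin((j+1) pi/(k+2)) / sin(pi/(k+2)), j = 0..k."""
    q = math.pi / (k + 2)
    return [math.sin((j + 1) * q) / math.sin(q) for j in range(k + 1)]


def su2_global_index_closed_form(k):
    """Closed form (k+2) / (2 sin^2(pi/(k+2))) of sum_j d(rho_j)^2."""
    return (k + 2) / (2.0 * math.sin(math.pi / (k + 2)) ** 2)


if __name__ == "__main__":
    print("Ising:", global_index(ISING_DIMS))
    for k in range(1, 6):
        dims = su2_quantum_dims(k)
        total = global_index(dims)
        even = global_index(dims[0::2])  # sectors at even vertices j = 0, 2, 4, ...
        print(f"k={k}: I_global={total:.6f}, "
              f"closed form={su2_global_index_closed_form(k):.6f}, "
              f"even share={even / total:.6f}")
```

For $k = 2$ the $SU(2)_k$ dimensions are $1, \sqrt{2}, 1$, so the value $4$ coincides with the Ising one, and in every case the even sectors carry exactly half of $I_{\mathrm{global}}$.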
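Lemma 27. Let $\mathcal{A}_1, \mathcal{A}_2$ be irreducible, Haag dual nets on separable Hilbert spaces. Assume that each sector of $\mathcal{A}_1$ is of type I. If $\rho$ is an irreducible localized endomorphism of $\mathcal{A}_1 \otimes \mathcal{A}_2$, then $\rho \simeq \rho_1 \otimes \rho_2$ with $\rho_i$ irreducible localized endomorphisms of $\mathcal{A}_i$.
Proof. Let $\pi$ be the DHR representation of $\mathcal{A}_1 \otimes \mathcal{A}_2$ (see Appendix B) associated with $\rho$, on a separable Hilbert space $\mathcal{H}$. Then $\pi(\mathfrak{A}_1)$ and $\pi(\mathfrak{A}_2)$ generate the von Neumann algebra $B(\mathcal{H})$, where $\mathfrak{A}_i$ denotes the quasi-local C*-algebra associated with $\mathcal{A}_i$. Hence $\pi(\mathfrak{A}_1)''$ and $\pi(\mathfrak{A}_2)''$ are factors. Let $\pi_i \equiv \pi|_{\mathfrak{A}_i}$, where we identify $\mathfrak{A}_1$ with $\mathfrak{A}_1 \otimes \mathbb{C}$ and $\mathfrak{A}_2$ with $\mathbb{C} \otimes \mathfrak{A}_2$; then $\pi_i$ is easily seen to be localizable in bounded intervals (namely, if $I_1 \in \mathcal{I}$, the restriction of $\pi_i$ to the C*-algebra generated by $\{\mathcal{A}_i(I) : I \subset I_1',\ I \in \mathcal{I}\}$ extends to a normal representation of $\mathcal{A}_i(I_1')$). Therefore $\pi_i$ is unitarily equivalent to a localized endomorphism of $\mathcal{A}_i$. As $\pi_1$ is a factor representation, by assumption $\pi(\mathfrak{A}_1)''$ is a type I factor, and so is $\pi(\mathfrak{A}_2)''$. We then have a decomposition $\pi \simeq \pi_1 \otimes \pi_2$. This concludes the proof.
Corollary 28. Let $\mathcal{A}$ be a completely rational net on a separable Hilbert space. The only irreducible finite-dimensional sectors of $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ are $[\rho_i \otimes \rho_j^{\mathrm{opp}}]$ with $[\rho_i]$, $[\rho_j]$ irreducible sectors of $\mathcal{A}$.
Proof. Immediate by Lemma 14 and the above lemma.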
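Lemma 29. Let $\mathcal{A}$ be completely rational and $\mathcal{B} \supset \mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ the LR net. If $\sigma$ is an irreducible localized endomorphism of $\mathcal{B}$ and $\sigma \prec \alpha^+_\rho$, $\sigma \prec \alpha^-_{\rho'}$ for some localized endomorphisms $\rho$, $\rho'$ of $\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$, then $\sigma$ is localized in a bounded interval.
Proof. The claim follows because $\sigma$, being contained in $\alpha^+_\rho$, is localized in a right half-line and, being contained in $\alpha^-_{\rho'}$, is localized in a left half-line.
The following lemma extends Theorem 20.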
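Lemma 30. Let $\mathcal{A}$ be a completely rational net, $\{[\rho_i]\}_i$ the system of all irreducible sectors with finite dimension, and $\mathcal{B} \supset \mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ the LR net. The following are equivalent:
(i) The braiding of the net $\mathcal{A}$ is non-degenerate.
(ii) $\mathcal{B}$ has no non-trivial localized endomorphism (localized in a bounded interval, with finite index).
Proof. We now use an argument from [7]. Let $\sigma$ be a non-trivial irreducible endomorphism of $\mathcal{B}$ localized in an interval, with $d(\sigma) < \infty$. By Frobenius reciprocity
$$\sigma \prec \alpha^+_{\sigma^{\mathrm{rest}}}, \qquad \sigma \prec \alpha^-_{\sigma^{\mathrm{rest}}},$$
where $\sigma^{\mathrm{rest}} = \gamma \cdot \sigma|_{\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}}$ and $\gamma : \mathcal{B} \to \mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}$ is a canonical endomorphism. Hence if $\rho_k \otimes \mathrm{id} \prec \sigma^{\mathrm{rest}}$ is an irreducible sector with $[\alpha^+_{\rho_k \otimes \mathrm{id}}] = [\sigma]$, then by [28, Prop. 3.9] the monodromy of $\rho_k \otimes \mathrm{id}$ with $\gamma|_{\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}} \simeq \bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}}$ must be trivial, namely $\rho_k$ is a non-trivial sector with degenerate braiding. The converse also holds: if $\rho_k$ is a non-trivial degenerate sector, then $\alpha^+_{\rho_k \otimes \mathrm{id}}$ is a non-trivial sector of $\mathcal{B}$ localized in a bounded interval.
Lemma 31. Let $\mathcal{A}$ be a completely rational net with modular PCT and let $\{[\rho_i]\}_i$ be the system of all finite-dimensional sectors of $\mathcal{A}$. If $E = I_1 \cup I_2 \in \mathcal{I}_2$, then
$$\lambda_E \simeq \bigoplus_i \rho_i \bar\rho_i|_{\mathcal{A}(E)},$$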
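where $\lambda_E = \gamma_E|_{\mathcal{A}(E)}$, the $\rho_i$'s are localized in $I_1$ and the $\bar\rho_i$'s are localized in $I_2$.
Proof. Let $j = \mathrm{Ad}\,J$, where $J$ is the modular conjugation of $\mathcal{A}(0, \infty)$. Given $I \in \mathcal{I}$ we may identify $\mathcal{A}(I)^{\mathrm{opp}}$ with $j(\mathcal{A}(I)) = \mathcal{A}(-I)$. We define a net $\tilde{\mathcal{A}}$ on $\mathbb{R}$ by setting
$$\tilde{\mathcal{A}}(I) \equiv \mathcal{A}(I) \otimes \mathcal{A}(I)^{\mathrm{opp}} = \mathcal{A}(I) \otimes \mathcal{A}(-I), \qquad I \in \mathcal{I}.$$
With $I = (a, b)$, $0 < a < b$, and $E = I \cup -I$, let $\gamma_E : \hat{\mathcal{A}}(E) \to \mathcal{A}(E)$ be the canonical endomorphism and $\lambda_E \equiv \gamma_E|_{\mathcal{A}(E)}$. We now identify $\lambda_E$ with an endomorphism $\eta_I$ of $\tilde{\mathcal{A}}(I)$ and want to show that $\eta_I$ extends to a localized endomorphism of $\tilde{\mathcal{A}}$. The proof is similar to the one of Theorem 12. With $d > c > b$, by Lemma 11 there is an extension $\eta$ of $\eta_{(a,b)}$ to $\tilde{\mathcal{A}}(a, d)$ with $\eta|_{\tilde{\mathcal{A}}(b,d)} = \mathrm{id}$, and a canonical endomorphism $\eta_{(a,d)}$ acting trivially on $\tilde{\mathcal{A}}(a, c)$, such that $\eta = \mathrm{Ad}\,u \cdot \eta_{(a,d)}$ with a unitary $u \in \tilde{\mathcal{A}}(a, d)$. Therefore $\mathrm{Ad}\,u|_{\tilde{\mathcal{A}}(-\infty, c)}$ is an extension of $\eta_{(a,b)}$ to $\tilde{\mathcal{A}}(-\infty, c)$ which acts trivially on $\tilde{\mathcal{A}}(-\infty, a)$ and on $\tilde{\mathcal{A}}(b, c)$. Letting $c \to \infty$ we obtain the desired extension of $\eta_{(a,b)}$ to $\tilde{\mathcal{A}}$, which we still denote by $\eta$. Now, by Lemma 27 applied to $\tilde{\mathcal{A}}$, every irreducible subsector of $\eta$ will be equivalent to $\rho_h \otimes (j \cdot \rho_k \cdot j)$ for some $h, k$, hence each irreducible subsector of $\lambda_E$ must be equivalent to $\rho_h \cdot \bar\rho_k|_{\mathcal{A}(E)}$, where $\rho_h$ is localized in $(a, b)$ and $\bar\rho_k$ is localized in $(-b, -a)$. By Theorem 9 this is possible if and only if $h = k$.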
Corollary 32. Let $\mathcal{A}$ be completely rational with modular PCT. The following are equivalent:
(i) The net $\mathcal{A}$ has no non-trivial sector with finite dimension.
(ii) The net $\mathcal{A}$ has no non-trivial sector (with finite or infinite dimension).
(iii) $\mu_{\mathcal{A}} = 1$, namely $\mathcal{A}(E) = \hat{\mathcal{A}}(E)$ for all $E \in \mathcal{I}_2$.
Proof. (i) $\Rightarrow$ (ii): It will be enough to show that every sector $\rho$ of $\mathcal{A}$ (possibly with infinite dimension) contains the identity sector. Given $E = I_1 \cup I_2$ with $I_1, I_2 \in \mathcal{I}$, we may suppose that $\rho$ is localized in $I_1$ and choose a sector $\rho'$ equivalent to $\rho$ and localized in $I_2$. If $u$ is a unitary with $\mathrm{Ad}\,u \cdot \rho = \rho'$, then $u \in \hat{\mathcal{A}}(E)$, hence $u \in \mathcal{A}(E)$ by assumption (by Lemma 31, (i) gives $\lambda_E \simeq \mathrm{id}$, i.e. $\hat{\mathcal{A}}(E) = \mathcal{A}(E)$). Now $\mathcal{A}(E) \simeq \mathcal{A}(I_1) \otimes \mathcal{A}(I_2)$ by the split property, hence there exists a conditional expectation $\mathcal{E} : \mathcal{A}(E) \to \mathcal{A}(I_1)$ with $\mathcal{E}(u) \neq 0$; thus $\mathcal{E}(u)$ is a non-zero intertwiner between $\rho$ and the identity. (ii) $\Rightarrow$ (iii) follows by Lemma 31. (iii) $\Rightarrow$ (i) follows by Th. 9 (or by Lemma 31).
The condition $\mu_{\mathcal{A}} = 1$ is, however, compatible with the existence of soliton sectors. Note also that the condition that $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ has depth $\le 2$ (equivalently, $\hat{\mathcal{A}}(E)$ is the crossed product of $\mathcal{A}(E)$ by a finite-dimensional Hopf algebra) is equivalent to the innerness of the sector $\lambda$ extending $\lambda_E$ (because $\lambda_E$ is implemented by a Hilbert space of isometries in $\hat{\mathcal{A}}(E)$ [26]); hence, by Lemma 31, it is equivalent to the property that all irreducible sectors of $\mathcal{A}$ have dimension 1.
The following is the main result of this paper.
Theorem 33. Let $\mathcal{A}$ be completely rational with modular PCT. Then
$$\mu_{\mathcal{A}} = I_{\mathrm{global}} \equiv \sum_i d(\rho_i)^2$$
and $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is isomorphic to the LR inclusion associated with $\mathcal{A}(I_1) \otimes \mathcal{A}(I_2)$ and all the finite-dimensional irreducible sectors $[\rho_i]$ of $\mathcal{A}$.
Proof. $\hat{\mathcal{A}}(E) \supset \mathcal{A}(E)$ contains the LR inclusion by Proposition 36 below. Since $\mu_{\mathcal{A}} = I_{\mathrm{global}}$ by Lemma 31, it has to coincide with the LR inclusion.
Corollary 34. Let $\mathcal{A}$ be completely rational and conformal. The inclusions $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ are all isomorphic for $E \in \mathcal{I}_2$.
Proof.
If $I \in \mathcal{I}$ and the $\rho_i$'s are localized in $I$, then for any given $I_1 \in \mathcal{I}$ there is a Möbius transformation giving rise to an isomorphism of $\mathcal{A}(I)$ with $\mathcal{A}(I_1)$ carrying the $\rho_i$'s to endomorphisms localized in $I_1$. Therefore the isomorphism class of $\{\mathcal{A}(E), \lambda_E\}$ is independent of $E \in \mathcal{I}_2$, hence the LR inclusions based on it are isomorphic.
Moreover, by using the uniqueness of the injective III$_1$ factor [6, 19] and the classification of its finite-depth subfactors [40], we have the following.
Corollary 35. Let $\mathcal{A}$ be completely rational and conformal. The isomorphism class of the inclusion $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$, $E \in \mathcal{I}_2$, depends only on the tensor category of the sectors of $\mathcal{A}$, not on its model realization.
Proof. If $\mathcal{A}$ is non-trivial and $I$ is an interval, then $\mathcal{A}(I)$ is a III$_1$ factor and, as the split property holds, $\mathcal{A}(I)$ is injective (see e.g. [27]). Thus $\mathcal{A}(I)$ is the unique injective III$_1$ factor [19]. By Popa's theorem [40], if $N$ is an injective III$_1$ factor and $\mathcal{T} \subset \mathrm{End}(N)$ a rational tensor category isomorphic to the tensor category of sectors of $\mathcal{A}$ (as abstract tensor categories), then there exists an isomorphism of $N$ with $\mathcal{A}(I)$ implementing the equivalence between the two tensor categories. Since the LR inclusion $N \otimes N^{\mathrm{opp}} \subset M$ clearly depends, up to isomorphism, only on $N$ and the tensor category $\mathcal{T} \subset \mathrm{End}(N)$, it is then isomorphic to $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$.
We now show that, even in the infinite index case, the two-interval inclusion always contains the LR inclusion associated with any rational system of irreducible sectors.
Proposition 36. Let $\mathcal{A}$ be completely rational with modular PCT $j$, let $E = I \cup -I \in \mathcal{I}_2$ be a symmetric 2-interval and $\{[\rho_i]\}$ a rational system of irreducible sectors of $\mathcal{A}$ with finite dimension, with the $\rho_i$'s localized in $I$. Let $R_i \in (\mathrm{id}, \bar\rho_i \rho_i)$ be non-zero intertwiners, where $\bar\rho_i = j \cdot \rho_i \cdot j$. If $M$ is the von Neumann subalgebra of $\hat{\mathcal{A}}(E)$ generated by $\mathcal{A}(E)$ and $\{R_i\}_i$, then $M \supset \mathcal{A}(E)$ is isomorphic to the LR inclusion associated with $\{[\rho_i]\}_i$; in particular
$$[M : \mathcal{A}(E)] = \sum_i d(\rho_i)^2 .$$
More generally, this holds true if the assumption of complete rationality is relaxed, with possibly $[\hat{\mathcal{A}}(E) : \mathcal{A}(E)] = \infty$.
Proof. Denoting by $N$ the factor $\mathcal{A}(0, \infty)$, we may assume $\bar I \subset (0, \infty)$ and consider the $\rho_i$ as endomorphisms of $N$. Let then $V_i$ be the standard implementation isometry of $\rho_i$ as in [17]. Since $J V_i J = V_i$, we have $\rho_i \bar\rho_i(X) V_i = V_i X$ for all $X \in N \vee N'$, hence for all local operators $X$ by strong additivity. Since $\rho_i$ is irreducible, $(\mathrm{id}, \rho_i \bar\rho_i)$ is one-dimensional, thus $R_i$ is a multiple of $V_i$ and we may assume $R_i = \sqrt{d(\rho_i)}\, V_i$, thus
$$R_i^* R_i = d(\rho_i). \qquad (9)$$
Now $V_i V_j$ is the standard implementation of $\rho_i \rho_j$ on $N$, hence by [17, Prop. A.4] we have
$$R_i R_j = \sum_k C_{ij}^k R_k, \qquad (10)$$
where $C_{ij}^k$ is the canonical intertwiner between $\rho_k \bar\rho_k$ and $\rho_i \rho_j \bar\rho_i \bar\rho_j$ given by
$$C_{ij}^k = \sum_h w_h\, j(w_h) \simeq \sum_h w_h \otimes j(w_h), \qquad (11)$$
where the $w_h$'s form an orthonormal basis of isometries in $(\rho_k, \rho_i \rho_j)$.
Setting $\rho_0 = \mathrm{id}$, we also have
$$R_i^* = d(\rho_i)\, C_{\bar i i}^{0*} R_{\bar i}. \qquad (12)$$
Indeed the above equality holds up to sign by the $j$-invariance of both sides [17, Lemma A.3], but the minus sign does not occur because both sides have positive expectation values on the vacuum vector. Now by the split property $\mathcal{A}(E) = \mathcal{A}(I) \vee \mathcal{A}(-I) \simeq \mathcal{A}(I) \otimes \mathcal{A}(-I)$, and $\mathcal{A}(-I) = j(\mathcal{A}(I))$ can be identified with $\mathcal{A}(I)^{\mathrm{opp}}$; therefore $M$ is isomorphic to the algebra generated by $\mathcal{A}(I) \otimes \mathcal{A}(I)^{\mathrm{opp}}$ and multiples of isometries $R_i$ satisfying the above relations. Moreover, there exists a conditional expectation from $M$ to $\mathcal{A}(I) \otimes \mathcal{A}(I)^{\mathrm{opp}}$. Corollary 46 then gives the desired isomorphism between $\mathcal{A}(E) \subset M$ and the LR inclusion. (The Longo–Rehren inclusion in [31], as well as in [28], is dual to the one in this paper, but this does not matter here. Notice further that, in the conformal case, the 2-interval inclusion $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is manifestly self-dual.) The above proof also works in the case $\mu_{\mathcal{A}} = \infty$ thanks to Prop. 45.
Corollary 37. Let $\mathcal{A}$ be completely rational with modular PCT. Then the braiding of the tensor category of all sectors of $\mathcal{A}$ is non-degenerate.
Proof. With the notation of Corollary 26 we have $\mu_{\mathcal{A}}^2 = I_{\mathrm{global}}^2\, \mu_{\mathcal{B}}$. On the other hand $I_{\mathrm{global}}^2 = I_{\mathrm{global}}(\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}})$, hence, since $\mu_{\mathcal{A}} = I_{\mathrm{global}}$ by Theorem 33,
$$I_{\mathrm{global}}(\mathcal{A} \otimes \mathcal{A}^{\mathrm{opp}}) = \mu_{\mathcal{A}}^2 = I_{\mathrm{global}}^2\, \mu_{\mathcal{B}},$$
therefore $\mu_{\mathcal{B}} = 1$. By Corollary 32 (applied to $\mathcal{B}$, which is completely rational with modular PCT by Proposition 25), $\mathcal{B}$ has no non-trivial sector localized in a bounded interval, and this is equivalent to the non-degeneracy of the braiding by Lemma 30. That $\mu_{\mathcal{A}} = I_{\mathrm{global}}$ implies the non-degeneracy of the braiding has been noticed in [32, Corollary 4.3].
An immediate consequence of Corollary 37 follows from the work [41], where a model-independent construction of Verlinde's matrices $S$ and $T$ has been performed, provided the braiding symmetry is non-degenerate, thus providing a corresponding representation of the modular group $SL(2, \mathbb{Z})$. Hence we have:
Corollary 38. The Verlinde matrices $T$ and $S$ constructed in [41] are non-degenerate, hence there exists an associated representation of the modular group $SL(2, \mathbb{Z})$.
Corollary 39. Let $\mathcal{A}$ be completely rational with modular PCT. Every sector of $\mathcal{A}$ is a direct sum of finite-dimensional sectors.
Proof. Assuming the contrary, by Proposition 59 we have an irreducible sector $[\rho]$ with infinite dimension. Let $E = I_1 \cup I_2 \in \mathcal{I}_2$ with $\rho$ localized in $I_1$, and let $\rho'$ be equivalent to $\rho$ and localized in $I_2$. Let $u$ be a unitary in $(\rho, \rho')$. Then $u \in \hat{\mathcal{A}}(E)$, hence it has a unique expansion
$$u = \sum_i x_i R_i, \qquad x_i \in \mathcal{A}(E),$$
where the $R_i$ are as in Proposition 36. As $xu = u\rho(x)$ for $x \in \mathcal{A}(I_1)$, we have
$$x \sum_i x_i R_i = \sum_i x_i R_i\, \rho(x) = \sum_i x_i\, (\rho_i \cdot \bar\rho_i)(\rho(x))\, R_i = \sum_i x_i\, \rho_i(\rho(x))\, R_i, \qquad \forall x \in \mathcal{A}(I_1),$$
thus $x x_i = x_i\, \rho_i(\rho(x))$ for all $i$. As there is an $x_i \neq 0$, by the split property there is a non-zero intertwiner between $\rho_i \cdot \rho$ and the identity. As $\rho_i$ and $\rho$ are irreducible, this implies that $\rho$ is finite-dimensional, contradicting our assumption.
Corollary 40. Let $\mathcal{A}$ be conformal and completely rational. Then every representation on a separable Hilbert space is Möbius covariant with positive energy.
Proof. By the preceding result every such representation is a direct sum of irreducible sectors with finite dimension. According to [16] every finite-dimensional sector is covariant with positive energy, and hence so is any direct sum of such sectors.

6. n-Interval Inclusions

In this section we extend the results on the 2-interval subfactors to arbitrary multi-interval subfactors. Let $\mathcal{A}$ be a local, irreducible net on $S^1$. We assume $\mathcal{A}$ to be completely rational with modular PCT, so that our previous analysis applies. Alternatively, $\mathcal{A}$ may be assumed to be conformal with $\mu_{\mathcal{A}} = [\hat{\mathcal{A}}(E) : \mathcal{A}(E)]$ finite and independent of the 2-interval $E$; this setting will be needed to derive Cor. 7. If $E \in \mathcal{I}_n$ we set
$$\mu_n = [\hat{\mathcal{A}}(E) : \mathcal{A}(E)].$$
With this notation $\mu_{\mathcal{A}} = \mu_2$. We also consider the situation occurring in representations different from the vacuum representation: if $\rho$ is a localizable representation of $\mathcal{A}$ (i.e. a DHR representation; on $S^1$ these are just the locally normal representations), we set
$$\mu_n^\rho = [\rho(\mathcal{A}(E'))' : \rho(\mathcal{A}(E))].$$
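Lemma 41. $\mu_n^\rho = \mu_1^\rho\, \mu_n$ for all $n \in \mathbb{N}$.
Proof. Let $E = I_1 \cup I_2 \cup \cdots \cup I_n \in \mathcal{I}_n$. We may suppose that $\rho$ is an endomorphism of $\mathcal{A}$ localized in $I_1$. Since $\rho$ acts trivially on $\mathcal{A}(E')$, we have $\rho(\mathcal{A}(E'))' = \mathcal{A}(E')' = \hat{\mathcal{A}}(E)$; thus the inclusion $\rho(\mathcal{A}(E)) \subset \rho(\mathcal{A}(E'))'$ is the composition
$$\rho(\mathcal{A}(E)) \subset \mathcal{A}(E) \subset \rho(\mathcal{A}(E'))' = \hat{\mathcal{A}}(E);$$
by the split property $\rho(\mathcal{A}(E)) \subset \mathcal{A}(E)$ is isomorphic to $\rho(\mathcal{A}(I_1)) \otimes \mathcal{A}(I_2 \cup \cdots \cup I_n) \subset \mathcal{A}(I_1) \otimes \mathcal{A}(I_2 \cup \cdots \cup I_n)$, therefore
$$\mu_n^\rho = [\hat{\mathcal{A}}(E) : \mathcal{A}(E)] \cdot [\mathcal{A}(I_1) : \rho(\mathcal{A}(I_1))] = \mu_n\, \mu_1^\rho .$$
Lemma 42. $\mu_n^\rho = d(\rho)^2\, \mu_2^{n-1}$ for all $n \in \mathbb{N}$.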
Proof. By the index-statistics theorem [25] we have $\mu_1^\rho = d(\rho)^2$, hence, by Lemma 41, we only need to show that $\mu_n = \mu_2^{n-1}$. We proceed inductively. If $n = 1$ the claim is trivially true. Assume the claim for a given $n$ and let $E_n = I_1 \cup \cdots \cup I_n \in \mathcal{I}_n$ and $E_{n+1} = I_1 \cup \cdots \cup I_n \cup I_{n+1} \in \mathcal{I}_{n+1}$. Then
$$\mathcal{A}(E_{n+1}) = \mathcal{A}(E_n) \vee \mathcal{A}(I_{n+1}) \subset \hat{\mathcal{A}}(E_n) \vee \mathcal{A}(I_{n+1}) \subset \hat{\mathcal{A}}(E_{n+1}),$$
thus, by the split property, $\mu_{n+1} = \mu_n \cdot [\hat{\mathcal{A}}(E_{n+1}) : \hat{\mathcal{A}}(E_n) \vee \mathcal{A}(I_{n+1})]$ and, by the inductive assumption, we have to show that the inclusion $\hat{\mathcal{A}}(E_n) \vee \mathcal{A}(I_{n+1}) \subset \hat{\mathcal{A}}(E_{n+1})$ has index $\mu_2$. But the commutant of this latter inclusion, $\mathcal{A}(E_{n+1}') \subset \mathcal{A}(I_{n+1})' \cap \mathcal{A}(E_n')$, has index $\mu_2$ because, by the split property, it turns out to be isomorphic to
$$\mathcal{A}(I_\ell \cup I_r) \otimes \mathcal{A}(L) \subset \hat{\mathcal{A}}(I_\ell \cup I_r) \otimes \mathcal{A}(L),$$
namely to a 2-interval inclusion tensored by a common factor, where $I_\ell$ and $I_r$ are the two intervals of $E_{n+1}'$ contiguous to $I_{n+1}$ and $L$ is the union of the remaining $n - 1$ intervals of $E_{n+1}'$.
Theorem 43. Let $\mathcal{A}$ be a local, irreducible, completely rational net with modular PCT. Let $E = \bigcup_{i=1}^n I_i \in \mathcal{I}_n$ and $\lambda^{(n)} = \gamma^{(n)}|_{\mathcal{A}(E)}$, where $\gamma^{(n)}$ is a canonical endomorphism from $\hat{\mathcal{A}}(E)$ into $\mathcal{A}(E)$. Then
$$\lambda^{(n)} \simeq \bigoplus_{i_1, \dots, i_n} N^0_{i_1 \dots i_n}\, \rho_{i_1} \rho_{i_2} \cdots \rho_{i_n}, \qquad (13)$$
where $\{[\rho_i]\}_i$ are all the irreducible sectors with finite statistics, $\rho_{i_k}$ being localized in $I_k$, and $N^0_{i_1 \dots i_n}$ is the multiplicity of the identity endomorphism in the product $\rho_{i_1} \cdots \rho_{i_n}$. The same results hold true if complete rationality is replaced by conformal invariance together with the assumption $[\hat{\mathcal{A}}(E) : \mathcal{A}(E)] = I_{\mathrm{global}} < \infty$ independently of the 2-interval $E$.
Proof. Let $I$ be an interval which contains $\bigcup_i I_i$ and let $\rho_{i_k}$, $k = 1, \dots, n$, be irreducible endomorphisms localized in $I_k$, respectively. Then the space of intertwiners between $\rho_{i_1} \rho_{i_2} \cdots \rho_{i_n}$, considered as an endomorphism of $\mathcal{A}(I)$, and the identity has dimension $N^0_{i_1 \dots i_n}$. We are using here the equivalence between local and global intertwiners, which holds either by strong additivity or by conformal invariance [17]. These intertwiners are multiples of isometries in $\hat{\mathcal{A}}(E)$. Thus, by the argument leading to Th. 9, $\rho_{i_1} \rho_{i_2} \cdots \rho_{i_n}|_{\mathcal{A}(E)}$ is contained in $\lambda^{(n)}$ with multiplicity $N^0_{i_1 \dots i_n}$. We have thus proved that $\lambda^{(n)}$ contains the right-hand side of (13). The dimension of the endomorphism on the right-hand side of (13) has been computed in [50]; for the sake of self-containedness we repeat the argument:
$$\begin{aligned}
\sum_{i_1, \dots, i_n} N^0_{i_1 \dots i_n}\, d(\rho_{i_1}) \cdots d(\rho_{i_n})
&= \sum_{i_1, \dots, i_{n-1}} \Big( \sum_{i_n} N^{\bar i_n}_{i_1 \dots i_{n-1}}\, d(\rho_{i_n}) \Big)\, d(\rho_{i_1}) \cdots d(\rho_{i_{n-1}}) \\
&= \sum_{i_1, \dots, i_{n-1}} \big( d(\rho_{i_1}) \cdots d(\rho_{i_{n-1}}) \big)^2 \\
&= \Big( \sum_i d(\rho_i)^2 \Big)^{n-1},
\end{aligned} \qquad (14)$$
where we have used Frobenius reciprocity $N^0_{i_1 \dots i_n} = N^{\bar i_n}_{i_1 \dots i_{n-1}}$, the fact $d(\bar\rho) = d(\rho)$ and the identity $\sum_i \langle \rho_i, \rho \rangle\, d(\rho_i) = d(\rho)$. On the other hand, we have
$$d(\lambda^{(n)}) = [\hat{\mathcal{A}}(E) : \mathcal{A}(E)] = \mu_{\mathcal{A}}^{n-1} = I_{\mathrm{global}}^{n-1} = \Big( \sum_i d(\rho_i)^2 \Big)^{n-1},$$
where the first equality is obvious, the second is given by Lemma 42 and the third follows from the results of the preceding section. Thus the endomorphisms on both sides of (13) have the same dimension, hence they are equivalent. The last claim in the statement follows by the same arguments and the equivalence between local and global intertwiners.
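The dimension count (14) lends itself to a direct check on small examples. The sketch below assumes the $SU(2)_k$ fusion rules (the truncated Clebsch–Gordan rule: $\rho_m \prec \rho_a\rho_b$, with multiplicity one, exactly when $|a-b| \le m \le \min(a+b, 2k-a-b)$ and $a+b+m$ is even; all sectors are self-conjugate) together with the quantum dimensions $d(\rho_j) = \sin((j+1)\pi/(k+2))/\sin(\pi/(k+2))$; neither is derived in this paper. It computes $N^0_{i_1 \dots i_n}$ by fusing one sector at a time and compares the left-hand side of (14) with $\big(\sum_i d(\rho_i)^2\big)^{n-1}$, the value of $d(\lambda^{(n)}) = \mu_{\mathcal{A}}^{n-1}$ obtained above. It is a brute-force sum over $(k+1)^n$ labels, so only small $k$ and $n$ are practical; the function names are hypothetical.

```python
import itertools
import math


def su2_quantum_dims(k):
    """Assumed formula d(rho_j) = sin((j+1) pi/(k+2)) / sin(pi/(k+2)), j = 0..k."""
    q = math.pi / (k + 2)
    return [math.sin((j + 1) * q) / math.sin(q) for j in range(k + 1)]


def su2_fusion(k):
    """N[a][b][m] = multiplicity of rho_m in rho_a rho_b (assumed SU(2)_k rules)."""
    labels = range(k + 1)
    N = [[[0] * (k + 1) for _ in labels] for _ in labels]
    for a, b, m in itertools.product(labels, repeat=3):
        if abs(a - b) <= m <= min(a + b, 2 * k - a - b) and (a + b + m) % 2 == 0:
            N[a][b][m] = 1
    return N


def identity_multiplicity(word, N):
    """N^0_{i1...in}: multiplicity of the identity (label 0) in rho_{i1} ... rho_{in}."""
    size = len(N)
    vec = [1] + [0] * (size - 1)  # the empty product is the identity sector
    for i in word:
        new = [0] * size
        for a, mult in enumerate(vec):
            if mult:
                for m in range(size):
                    new[m] += mult * N[a][i][m]  # rho_a rho_i = sum_m N_{ai}^m rho_m
        vec = new
    return vec[0]


def lhs_14(k, n):
    """Left-hand side of (14): sum of N^0_{i1...in} d(rho_{i1}) ... d(rho_{in})."""
    d = su2_quantum_dims(k)
    N = su2_fusion(k)
    total = 0.0
    for word in itertools.product(range(k + 1), repeat=n):
        mult = identity_multiplicity(word, N)
        if mult:
            total += mult * math.prod(d[i] for i in word)
    return total


def rhs_14(k, n):
    """Right-hand side of (14): I_global^(n-1)."""
    return sum(x * x for x in su2_quantum_dims(k)) ** (n - 1)


if __name__ == "__main__":
    for k in (1, 2, 3):
        for n in (1, 2, 3, 4):
            print(k, n, lhs_14(k, n), rhs_14(k, n))
```

The two columns agree up to floating-point rounding. For instance, for $k = 2$ one has $\rho_1^2 = \mathrm{id} \oplus \rho_2$, so `identity_multiplicity((1, 1, 1, 1), su2_fusion(2))` equals 2.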
Corollary 44. Let $\mathcal{A}$ be as in Th. 43. If $E \in \mathcal{I}_n$, then $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is isomorphic to the $n$th iterated LR inclusion associated with $N \equiv \mathcal{A}(I)$, $I \in \mathcal{I}$, and the system of all sectors of $\mathcal{A}$ (considered as sectors of $N$). In particular, for a fixed $n \in \mathbb{N}$, the isomorphism class of $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ depends only on the superselection structure of $\mathcal{A}$ and not on $E \in \mathcal{I}_n$.
Proof. Let $E = I_1 \cup \cdots \cup I_n \in \mathcal{I}_n$ with $\bar E \subset (0, \infty)$ and $n = 2^k$. It follows from Lemma 42 and the split property that
$$[\hat{\mathcal{A}}(E \cup -E) : \hat{\mathcal{A}}(E) \vee \hat{\mathcal{A}}(-E)] = I_{\mathrm{global}} .$$
On the other hand, if the $\rho_i$'s are localized in $I_1$, then the algebra generated by $\hat{\mathcal{A}}(E) \vee \hat{\mathcal{A}}(-E)$ and the standard implementation isometries $V_i$ of $\rho_i|_{\hat{\mathcal{A}}(E)}$ gives the associated LR inclusion, analogously as in Th. 33, and is contained in $\hat{\mathcal{A}}(E \cup -E)$, hence coincides with it by the equality of the indices. The corollary then follows in the case $n = 2^k$ by induction, once we note that at each step the extension $\alpha^+_{\rho_i \otimes \mathrm{id}}$ from $\hat{\mathcal{A}}(E) \vee \hat{\mathcal{A}}(-E)$ to $\hat{\mathcal{A}}(E \cup -E)$ is $\rho_i|_{\hat{\mathcal{A}}(E \cup -E)}$. The same is then true for an arbitrary $n$ by taking relative commutants.

7. Examples and Further Comments

Our results may first be illustrated by considering the case of an inclusion of completely rational, local, conformal, irreducible nets $\mathcal{A} \subset \mathcal{B}$, where $\mathcal{A} = \mathcal{B}^G$ is the fixed-point net of $\mathcal{B}$ with respect to the action of a finite group $G$ and $\mu_{\mathcal{B}} = 1$. Then $[\mathcal{B} : \mathcal{A}] = |G|$, thus by Prop. 24, $I_{\mathrm{global}}(\mathcal{A}) = \mu_{\mathcal{A}} = |G|^2$. Now $\mathcal{A}$ has the DHR [9] irreducible sectors $[\rho_\pi]$ associated with $\pi \in \hat G$, and
$$\sum_{\pi \in \hat G} d(\rho_\pi)^2 = |G|,$$
therefore $\mathcal{A}$ has extra irreducible sectors $[\sigma_i]$ with
$$\sum_i d(\sigma_i)^2 = |G|^2 - |G| .$$
For example, in the case of the Ising model we have $\mathcal{A} = \mathcal{B}^{\mathbb{Z}_2}$ as above (but with $\mathcal{B}$ twisted local, which does not alter our discussion), thus $\mu_{\mathcal{A}} = 4$ and so $\sum_i d(\rho_i)^2 = 4$; since the three standard sectors, with dimensions $1$, $1$ and $\sqrt{2}$, already exhaust this sum, they are the only irreducible sectors. On the other hand, in the situation studied in [34], the superselection category of $\mathcal{A}$ is equivalent to the representation category of a twisted quantum double $D^\omega(G)$ with $\omega \in H^3(G, \mathbb{T})$. Since $D^\omega(G)$ is semisimple, we again have
$$\sum_{\sigma \in \widehat{D^\omega(G)}} d(\sigma)^2 = \dim D^\omega(G) = |G|^2 = \mu_{\mathcal{A}} .$$
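The counting in this paragraph can be verified mechanically in the untwisted case $\omega = 1$. The sketch below assumes the standard labelling of the irreducible representations of $D(G)$ by pairs (conjugacy class $C$, irreducible representation $\pi$ of the centralizer $Z_G(g_C)$ of a representative), of dimension $|C| \dim\pi$; since $\sum_\pi (\dim\pi)^2 = |Z_G(g_C)|$, this gives $\sum_\sigma d(\sigma)^2 = \sum_C |C|^2\, |Z_G(g_C)|$, which the code compares with $|G|^2$. (For non-trivial $\omega$ the centralizer representations become projective, but the total is still $|G|^2$, as stated above.) It also evaluates $|G|^2 - |G|$, the total weight of the extra sectors of $\mathcal{B}^G$ from the beginning of this section; for $G = \mathbb{Z}_2$ this is $2$, the squared dimension of the single Ising spin sector. Groups are represented as lists of permutation tuples, and all names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Composition (p o q)(x) = p(q(x)) of permutations stored as tuples."""
    return tuple(p[q[x]] for x in range(len(q)))


def inverse(p):
    """Inverse of a permutation stored as a tuple."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def conjugacy_classes(group):
    """Partition a finite permutation group into conjugacy classes."""
    remaining = set(group)
    classes = []
    while remaining:
        g = next(iter(remaining))
        cls = {compose(compose(h, g), inverse(h)) for h in group}
        classes.append(cls)
        remaining -= cls
    return classes


def centralizer_order(g, group):
    """|Z_G(g)|: number of group elements commuting with g."""
    return sum(1 for h in group if compose(h, g) == compose(g, h))


def quantum_double_dim(group):
    """sum_sigma d(sigma)^2 for D(G), omega = 1: sum over classes of |C|^2 |Z_G(g_C)|."""
    total = 0
    for cls in conjugacy_classes(group):
        g = next(iter(cls))
        total += len(cls) ** 2 * centralizer_order(g, group)
    return total


def extra_sector_weight(order):
    """sum_i d(sigma_i)^2 = |G|^2 - |G| for the extra sectors of A = B^G."""
    return order ** 2 - order


Z2 = [(0, 1), (1, 0)]
S3 = list(permutations(range(3)))

if __name__ == "__main__":
    for name, G in (("Z2", Z2), ("S3", S3)):
        print(name, quantum_double_dim(G), len(G) ** 2, extra_sector_weight(len(G)))
```

For example, `quantum_double_dim(S3)` returns $36 = 6^2$, and `extra_sector_weight(len(Z2))` returns $2$.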
One may compare this with the situation occurring in higher-dimensional spacetime. There the strong additivity property may be replaced by the requirement that $\mathcal{A}(O' \cap \tilde O)' \cap \mathcal{A}(\tilde O) = \mathcal{A}(O)$ if $O \subset \tilde O$ are double cones. If $E \equiv O_1 \cup O_2$, where $O_1$ and $O_2$ are double cones with spacelike separated closures, the split property gives a natural isomorphism of $\mathcal{A}(O_1) \vee \mathcal{A}(O_2)$ with $\mathcal{A}(O_1) \otimes \mathcal{A}(O_2)$ and
$$[\mathcal{A}(E')' : \mathcal{A}(E)] = I_{\mathrm{global}} = \sum_{\pi \in \hat G} d(\rho_\pi)^2 = |G|,$$
where $G$ is the gauge group and the $\rho_\pi$'s are the DHR sectors [9] (there are no extra sectors). The reason for this difference is that on $S^1$ the complement of a 2-interval is still a 2-interval, thus the inclusion $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ is self-dual, while in Minkowski spacetime the spacelike complement of $O_1 \cup O_2$ is a connected region, producing no charge transfer inclusion. The index $\mu_{\mathcal{A}}$ in the models given by the loop group construction for $SU(n)_k$ has been computed in [50]. Our results apply in particular to these nets, and the 2-interval inclusion is the LR inclusion associated with the corresponding irreducible sectors $\{[\rho_i]\}_i$. We note that in this case the 2-interval inclusion is not the asymptotic inclusion of the corresponding Jones-Wenzl subfactor [24, 48], even up to tensoring by a common injective III$_1$ factor. Consider $SU(2)_k$ as an example. The net has $k + 1$ sectors and, if we choose the standard generator, we get a corresponding subfactor of Jones with principal graph $A_{k+1}$, up to tensoring by a common injective factor of type III$_1$, as in [47]. If we apply the construction of the asymptotic inclusion to this subfactor, we get a “quantum double” of only the sectors corresponding to the even vertices of $A_{k+1}$. We get the same result if we apply the LR construction to the system of $N$–$N$ sectors (or $M$–$M$ sectors). But the construction of a subfactor from 4 intervals gives a “quantum double” of the system of all the sectors, both even and odd. If we want to get this system from the asymptotic inclusion or the Longo–Rehren inclusion, we also have to use bimodules/sectors corresponding to the odd vertices of the (dual) principal graph. In order to get this LR inclusion from the construction of the asymptotic inclusion, we need to proceed as follows. Let $\{[\rho_i]\}_i$ be the set of all the sectors for the net arising from the loop group
construction for $SU(n)_k$ as above. Then, for a fixed interval $I \subset S^1$, we consider $(\bigoplus_i \rho_i)(\mathcal{A}(I)) \subset \mathcal{A}(I)$, which has finite index and finite depth.
Take a hyperfinite II$_1$ subfactor $P \subset Q$ with the same higher relative commutants as $(\bigoplus_i \rho_i)(\mathcal{A}(I)) \subset \mathcal{A}(I)$. Then the tensor categories of the sectors, with quantum $6j$-symbols, of $Q \vee (Q' \cap Q_\infty) \subset Q_\infty$ and of $\mathcal{A}(E) \subset \hat{\mathcal{A}}(E)$ are isomorphic. For this reason, the index of the asymptotic inclusion of the Jones subfactor with principal graph $A_{k+1}$ is half of that of the subfactor arising from 4 intervals and the net for $SU(2)_k$. For $SU(n)_k$, this ratio of the two indices is $n$. Finally we notice that there are models, like the $SO(2N)_1$ WZW models (see [1] or [34]), where all irreducible sectors have dimension one, yet the superselection category $\mathcal{C}$ is modular, in agreement with our results. In these cases the fusion graph is disconnected; therefore the equivalent categories of $M$–$M$ and of $N \otimes N^{\mathrm{opp}}$–$N \otimes N^{\mathrm{opp}}$ sectors are proper subcategories of the category $\mathcal{C} \times \mathcal{C}^{\mathrm{opp}} \simeq D(\mathcal{C})$, where $D(\mathcal{C})$ is the quantum double of $\mathcal{C}$. We close this section with a few questions. Does there exist a net with only trivial sectors and non-trivial 2-interval inclusions (thus $\mu_{\mathcal{A}} = \infty$)? Is strong additivity automatic in the definition of complete rationality? Is the
LR inclusion the only extension of $N \otimes N^{\mathrm{opp}}$ with the given canonical endomorphism $\bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}}$?

A. The Crossed Product Structure of the LR Inclusion

Let $N$ be an infinite factor and $\{[\rho_i]\}_i$ a rational system of irreducible sectors of $N$. The LR inclusion [28] is a canonical inclusion $N \otimes N^{\mathrm{opp}} \subset M$ associated with $N$ and $\{[\rho_i]\}_i$ such that
$$\lambda \simeq \bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}},$$
where $\lambda$ is the restriction to $N \otimes N^{\mathrm{opp}}$ of the canonical endomorphism of $M$ into $N \otimes N^{\mathrm{opp}}$. In [28] such an inclusion is obtained by a canonical choice of the intertwiners $T \in (\mathrm{id}, \lambda)$ and $S \in (\lambda, \lambda^2)$ that characterize the canonical endomorphism [26] (Q-system). We now show the universality property of this inclusion and its crossed product structure, which will provide a different realization of it. By LR inclusion we will mean the upward LR inclusion.
We shall consider the free *-algebra $M_0$ generated by $N \otimes N^{\mathrm{opp}}$ and elements $R_i$ satisfying the relations
$$\begin{cases}
R_i x = (\rho_i \otimes \rho_i^{\mathrm{opp}})(x)\, R_i, & x \in N \otimes N^{\mathrm{opp}}, \\
R_i^* R_i = d(\rho_i), & \\
R_i R_j = \sum_k C_{ij}^k R_k, & \\
R_i^* = d(\rho_i)\, C_{\bar i i}^{0*} R_{\bar i}, &
\end{cases} \qquad (15)$$
where $C_{ij}^k$ is the canonical intertwiner between $\rho_k \otimes \rho_k^{\mathrm{opp}}$ and $\rho_i \rho_j \otimes \rho_i^{\mathrm{opp}} \rho_j^{\mathrm{opp}}$ given by $C_{ij}^k = \sum_h w_h \otimes j(w_h)$, with $j$ the antilinear isomorphism of $N$ with $N^{\mathrm{opp}}$, and the $w_h$'s form an orthonormal basis of isometries in $(\rho_k, \rho_i \rho_j)$. We equip $M_0$ with the maximal C*-seminorm associated with the representations of $M_0$ whose restrictions to $N \otimes N^{\mathrm{opp}}$ are normal, denote by $M$ the quotient of $M_0$ modulo the ideal formed by the elements that are null with respect to this seminorm, and refer to $M$ as the free reduced pre-C*-algebra generated by $N \otimes N^{\mathrm{opp}}$ and the $R_i$'s.
Proposition 45. Let $N$ be an infinite factor with separable predual and $\{[\rho_i]\}_i$ a rational system of finite-dimensional irreducible sectors of $N$. Let $M$ be the free reduced pre-C*-algebra generated by $N \otimes N^{\mathrm{opp}}$ and elements $R_i$ satisfying the relations (15) as above. Then $M$ is a factor and $N \otimes N^{\mathrm{opp}} \subset M$ is isomorphic to the LR inclusion associated with $N$ and $\{[\rho_i]\}_i$. In particular every element $X \in M$ has a unique expansion
$$X = \sum_i x_i R_i, \qquad x_i \in N \otimes N^{\mathrm{opp}}.$$
In other words: if $N \otimes N^{\mathrm{opp}}$ acts normally on a Hilbert space $\mathcal{H}$ and $R_i \in B(\mathcal{H})$ are elements satisfying the relations (15), then the subalgebra $M$ of $B(\mathcal{H})$ generated by $N \otimes N^{\mathrm{opp}}$ and the $R_i$'s is a factor and $N \otimes N^{\mathrm{opp}} \subset M$ is isomorphic to the LR inclusion.
Proof. Clearly all elements of $M$ have the form
$$X = \sum_i x_i R_i, \qquad x_i \in N \otimes N^{\mathrm{opp}}, \qquad (16)$$
and we may suppose that $M$ acts on a Hilbert space so that $N$ and $N^{\mathrm{opp}}$ are weakly closed.
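We now construct a conditional expectation $\mathcal{E} : M \to N \otimes N^{\mathrm{opp}}$. Setting $\rho_0 = \mathrm{id}$, the expectation may be defined by
$$\mathcal{E}(X) = x_0 \qquad (17)$$
for $X$ given by (16), once we show that this is well-defined. To this end we apply the averaging argument of [23]. Let $J$ be the set of all $x_0 \in N \otimes N^{\mathrm{opp}}$ such that there exist $x_i \in N \otimes N^{\mathrm{opp}}$, $i > 0$, with $\sum_{i \ge 0} x_i R_i = 0$. Clearly $J$ is a two-sided ideal of $N \otimes N^{\mathrm{opp}}$, hence $J = 0$ (as we want to show) or $J = N \otimes N^{\mathrm{opp}}$ (we may suppose $N$ to be of type III). Suppose $J \neq 0$ and let $X = 1 + \sum_{i>0} x_i R_i = 0$; thus
$$u X u^* = 1 + \sum_{i>0} u x_i R_i u^* = 1 + \sum_{i>0} u x_i\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(u^*)\, R_i = 0$$
for all unitaries $u \in N \otimes N^{\mathrm{opp}}$. Letting $u$ run in the unitary group of a simple injective subfactor $\mathcal{R}$ of $N \otimes N^{\mathrm{opp}}$ and taking a mean over this group, we obtain
$$1 + \sum_{i>0} y_i R_i = 0,$$
where $y_i \in N \otimes N^{\mathrm{opp}}$ intertwines $\mathrm{id}$ and $\rho_i \otimes \rho_i^{\mathrm{opp}}$ on $\mathcal{R}$, thus on all of $N \otimes N^{\mathrm{opp}}$ by the simplicity of $\mathcal{R}$. Since $\rho_i \otimes \rho_i^{\mathrm{opp}}$ is irreducible and not equivalent to $\mathrm{id}$ for $i > 0$, we have $y_i = 0$ for $i > 0$, and we get $1 = 0$, a contradiction.
Notice now that
$$\begin{aligned}
R_i R_i^* &= d(\rho_i)\, R_i C_{\bar i i}^{0*} R_{\bar i} = d(\rho_i)\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(C_{\bar i i}^{0*})\, R_i R_{\bar i} \\
&= d(\rho_i)\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(C_{\bar i i}^{0*}) \sum_k C_{i \bar i}^k R_k,
\end{aligned}$$
thus, by the conjugate equation in [25], we have
$$\mathcal{E}(R_i R_i^*) = d(\rho_i)\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(C_{\bar i i}^{0*})\, C_{i \bar i}^0 = \frac{1}{d(\rho_i)},$$
so every $X \in M$ has the unique expansion
$$X = \sum_i x_i R_i, \qquad x_i = d(\rho_i)\, \mathcal{E}(X R_i^*). \qquad (18)$$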
Denoting by $M_1 \supset N \otimes N^{\mathrm{opp}}$ the LR inclusion associated with $N$ and $\{[\rho_i]\}_i$, $M_1$ is generated by $N \otimes N^{\mathrm{opp}}$ and elements $R_i'$, with an expectation $\mathcal{E}'$, satisfying the relations as in (15) and (18) [31, Sect. 5]; hence the linear map
$$C : X \equiv \sum_i x_i R_i \in M \mapsto C(X) \equiv \sum_i x_i R_i' \in M_1 \qquad (19)$$
is clearly a homomorphism of $M$ onto $M_1$ which is the identity on $N \otimes N^{\mathrm{opp}}$. $C$ is one-to-one by the uniqueness of the expansion (18) both in $M$ and in $M_1$.
Note that the above proposition gives an alternative construction of the LR inclusion, similar to Popa's construction of the symmetric enveloping algebra [39], as follows. Let $N$ act standardly on $L^2(N)$ and let $V_i$ be the standard isometry implementing $\rho_i$. The *-algebra $\mathfrak{A}$ generated by $N$ and $N'$ is naturally isomorphic to the algebraic tensor product $N \odot N^{\mathrm{opp}}$, and the operators $R_i \equiv \sqrt{d(\rho_i)}\, V_i$ satisfy the relations (15) by [17, App. A]. By the above argument there exists a conditional expectation $\mathcal{E} : \mathfrak{B} \to \mathfrak{A}$, where $\mathfrak{B}$ is the *-algebra generated by $\mathfrak{A}$ and the $V_i$'s. Taking a normal state $\varphi$ of $N$, the state $\tilde\varphi \equiv (\varphi \odot \varphi^{\mathrm{opp}}) \cdot \mathcal{E}$ of $\mathfrak{B}$ gives, by the GNS representation, the LR inclusion $\pi_{\tilde\varphi}(\mathfrak{A}) \subset \pi_{\tilde\varphi}(\mathfrak{B})$ (Prop. 45).
Corollary 46. Let $N$ be an infinite factor with separable predual and $\{[\rho_i]\}_i$ a rational system of finite-dimensional irreducible sectors of $N$. Let $M$ be a von Neumann algebra with $M \supset N \otimes N^{\mathrm{opp}}$ and $R_i \in M$ elements satisfying the relations (15). If $M$ is generated by $N \otimes N^{\mathrm{opp}}$ and the $R_i$'s, then $N \otimes N^{\mathrm{opp}} \subset M$ is isomorphic to the LR inclusion associated with $\{[\rho_i]\}_i$. In particular $(N \otimes N^{\mathrm{opp}})' \cap M = \mathbb{C}$ and there exists a normal conditional expectation from $M$ to $N \otimes N^{\mathrm{opp}}$.
Proof. The proof is immediate; the isomorphism is obtained as in (19):
$$X \in M \mapsto \sum_i d(\rho_i)\, \mathcal{E}(X R_i^*)\, R_i'$$
(notation analogous to that in (19)).
In the following we iterate the LR construction in order to describe the structure of multi-interval subfactors. With $N$ an infinite factor as above and $\{[\rho_i]\}_i$ a system of irreducible sectors with unitary braiding symmetry, let $\alpha^+$ be the induction map from sectors $\rho_i \otimes \rho_j^{\mathrm{opp}}$ of $N \otimes N^{\mathrm{opp}}$ to sectors of the LR extension $M_1 \equiv M$, defined by formula (7). Then $\{\alpha^+_{\rho_i \otimes \mathrm{id}}\}_i$ is a system of irreducible sectors of $M_1$ with braiding symmetry, and we may construct the corresponding LR inclusion $M_1 \otimes M_1^{\mathrm{opp}} \subset M_2$, where the opposite of $\alpha^+_{\rho_i \otimes \mathrm{id}}$ is $\alpha^+_{\bar\rho_i \otimes \mathrm{id}}$. We may then iterate the procedure to obtain a tower $M_1 \subset M_2 \subset \cdots \subset M_{2^k} \subset \cdots$ and thus an inclusion
$$N_n \subset M_n, \qquad n = 2^k,$$
where $N_n \equiv N \otimes N^{\mathrm{opp}} \otimes N \otimes \cdots \otimes N \otimes N^{\mathrm{opp}}$ ($2^k$ tensor factors). By construction this inclusion has index $I_{\mathrm{global}}^{n-1}$, and we refer to it as the $n$th iterated LR inclusion.
Proposition 47. Let $n = 2^k$. The $n$th iterated LR inclusion $N_n \subset M_n$ is irreducible. If $\gamma^{(n)} : M_n \to N_n$ is the canonical endomorphism, its restriction $\lambda^{(n)} = \gamma^{(n)}|_{N_n}$ is given by
$$\lambda^{(n)} \simeq \bigoplus_{i_1, i_2, \dots, i_n} N^0_{i_1 i_2 \dots i_n}\, \rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}}, \qquad (20)$$
where $N^0_{i_1 i_2 \dots i_n} \equiv \langle \mathrm{id}, \rho_{i_1} \bar\rho_{i_2} \cdots \bar\rho_{i_n} \rangle$.
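Proof. By a computation similar to the one in Sect. 6, $\lambda^{(n)}$ defined by formula (20) has dimension
$$d(\lambda^{(n)}) = I_{\mathrm{global}}^{n-1},$$
therefore the formula for $\lambda^{(n)} = \gamma^{(n)}|_{N_n}$ will follow by showing that $\rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}} \prec \gamma^{(n)}|_{N_n}$ with multiplicity $N^0_{i_1 i_2 \dots i_n}$; this will also imply the irreducibility of $N_n \subset M_n$, because then $\lambda^{(n)} \succ \mathrm{id}$ with multiplicity one. But $\rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}}$ is unitarily equivalent to $\rho_{i_1} \bar\rho_{i_2} \cdots \bar\rho_{i_n} \otimes \mathrm{id} \otimes \cdots \otimes \mathrm{id}$ in $M_n$, by applying Lemma 18 iteratively, and the conclusion follows.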
Now let $m < n = 2^k$ be an integer and let $N_m$ be the alternating tensor product of $m$ copies of $N$ and $N^{\mathrm{opp}}$,
$$N_m \equiv N \otimes N^{\mathrm{opp}} \otimes N \otimes \cdots \qquad (m \text{ factors}).$$
We then define the $m$th iterated LR inclusion $N_m \subset M_m$, where $M_m$ is the relative commutant in $M_n$ of the remaining $n - m$ copies of $N$ and $N^{\mathrm{opp}}$, i.e. $M_m = (N_m' \cap N_n)' \cap M_n$. Note that $N_m \subset M_m$ is an irreducible inclusion of factors because $N_m' \cap M_m \subset N_n' \cap M_n = \mathbb{C}$. Arguing similarly as above, we then have:
Proposition 48. Proposition 47 holds true for all positive integers $n$ (in formula (20), $\rho_{i_n}^{\mathrm{opp}}$ is replaced by $\rho_{i_n}$ if $n$ is odd).
Proof. Let $n = 2^k$. Let $\{V^\ell_{i_1 \dots i_n} : \ell = 1, 2, \dots, N^0_{i_1 \dots i_n}\}$ be a basis of isometries in the space of elements of $M_n$ that intertwine the identity and $\rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}}$ on $N_n$. Arguing as in Prop. 45, we see that any element $X \in M_n$ has a unique expansion
$$X = \sum_{i_1 \dots i_n} \sum_\ell x^\ell_{i_1 \dots i_n} V^\ell_{i_1 \dots i_n}, \qquad x^\ell_{i_1 \dots i_n} \in N_n .$$
Using this expansion it is easy to check that, for $m < n$, the factor $M_m$ defined above is generated by $N_m$ and the $V^\ell_{i_1 \dots i_n}$'s with $i_{m+1} = i_{m+2} = \cdots = i_n = 0$. The rest then follows easily.

B. Nets on ℝ and on S¹ and Their Representations

In our paper we deal with nets on $\mathbb{R}$ rather than nets on $S^1$, for various reasons: because this is the natural language for our arguments, because our results are valid for nets that are not necessarily conformal and, finally, because even if our analysis were restricted to conformal nets on $S^1$, our proofs would require the analysis of more general nets on $\mathbb{R}$ (the $t = 0$ LR net is not conformal). In the next Sect. C, however, we will need to deal with nets on $S^1$ and their representations, and then draw consequences for nets on $\mathbb{R}$. Although the relations between nets on $\mathbb{R}$ and on $S^1$ and their representations are straightforward, we describe this point explicitly here for the convenience of the reader. For simplicity, however, we consider only the case of strongly additive, Haag dual nets.
Y. Kawahigashi, R. Longo, M. Müger
Nets on $S^1$. Let $A$ be a net of von Neumann algebras on $S^1$, acting on a separable Hilbert space and satisfying Haag duality. We also assume the local von Neumann algebras $A(I)$ to be properly infinite, which is automatically true if the split property holds or if $A$ is conformal (except, of course, for the trivial net $A(I)\equiv\mathbb{C}$). A representation $\pi$ of $A$ is, by definition, a map $I\in\mathcal{I}\mapsto\pi_I$ that associates to each interval $I\in\mathcal{I}$ of $S^1$ a representation $\pi_I$ of the von Neumann algebra $A(I)$ on a fixed Hilbert space, such that $\pi_{\tilde I}|_{A(I)} = \pi_I$ if $I\subset\tilde I$. We say that $\pi$ is locally normal if $\pi_I$ is normal for all $I\in\mathcal{I}$, and that $\pi$ is localizable if $\pi_I$ is unitarily equivalent to $\mathrm{id}|_{A(I)}$ for all $I\in\mathcal{I}$. As the $A(I)$'s are properly infinite, the two notions coincide if $\pi$ acts on a separable Hilbert space. Moreover, every representation of $A$ on a separable Hilbert space is automatically locally normal [45], thus localizable. Denote by $C^*(A)$ the universal C*-algebra [14] associated with $A$ (see also [16]). For each $I\in\mathcal{I}$ there is a canonical embedding $\iota_I : A(I)\to C^*(A)$, and $\iota_{\tilde I}|_{A(I)} = \iota_I$ if $I\subset\tilde I$; we identify $A(I)$ with $\iota_I(A(I))$ if no confusion arises. There is a one-to-one correspondence between representations of the C*-algebra $C^*(A)$ and representations of the net $A$, given by $\pi\mapsto\{I\mapsto\pi_I\equiv\pi\cdot\iota_I\}$. Locally normal representations of the net $A$ correspond, of course, to locally normal representations of $C^*(A)$. We shall always assume our representations to act on a separable Hilbert space, so local normality is automatic. As Haag duality holds, a localizable representation $\pi$ of $C^*(A)$ is unitarily equivalent to a representation of the form $\sigma_0\cdot\rho$, where $\sigma_0$ is the representation of $C^*(A)$ corresponding to the identity representation of $A$ (we shall, however, not need this result).
Nets on $\mathbb{R}$.
Given a net $A$ of von Neumann algebras on $S^1$ satisfying Haag duality, we may associate a net $A_0$ of von Neumann algebras on $\mathbb{R} = S^1\setminus\{\infty\}$ (identification by the Cayley transform) by setting $A_0(I) = A(I)$ for all bounded intervals $I$ of $\mathbb{R}$. We call $A_0$ the restriction of $A$ to $\mathbb{R}$. Clearly, if $A$ is strongly additive, then $A_0$ is also strongly additive and satisfies Haag duality on $\mathbb{R}$ in the form
$$A_0(I)' = A_0(\mathbb{R}\setminus I),\qquad (21)$$
where $I\subset\mathbb{R}$ is either an interval or a half-line $(a,\infty)$ or $(-\infty,a)$, $a\in\mathbb{R}$. Here, if $E\subset\mathbb{R}$ has non-empty interior, we denote by $\mathfrak{A}_0(E)$ the C*-algebra generated by the von Neumann algebras $A_0(I)$ as $I$ runs through the intervals contained in $E$, and set $A_0(E) = \mathfrak{A}_0(E)''$. Conversely, let now $A_0$ be a strongly additive net of properly infinite von Neumann algebras $A_0(I)$ on the (bounded, non-trivial) intervals of $\mathbb{R}$ satisfying Haag duality (21). We may compactify $\mathbb{R}$ to $S^1 = \mathbb{R}\cup\{\infty\}$ and extend $A_0$ to a net $A$ on the intervals of $S^1$ by defining
$$A(I)\equiv A_0(S^1\setminus I)'\qquad (22)$$
if $I$ is an interval whose closure contains the point $\infty$. Clearly, $A$ is the unique Haag dual net on $S^1$ whose restriction to $\mathbb{R}$ is $A_0$; we thus call $A$ the extension of $A_0$ to $S^1$. We state this one-to-one correspondence explicitly in Lemma 49.
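The passage between intervals of $S^1$ and regions of $\mathbb{R}$ used in (22) can be made concrete. The Python sketch below assumes the Cayley transform $z = (x-i)/(x+i)$, under which $x\to\pm\infty$ goes to the point $z = 1$ (angle $\theta = 0$) playing the role of $\infty$, and $x = -\cot(\theta/2)$. The text does not fix a convention, so this choice, like the function names, is ours. An arc whose closure avoids $\infty$ is a bounded interval of $\mathbb{R}$. An arc through $\infty$ is described by its complementary arc, whose image is the bounded set $S^1\setminus I$ entering $A_0(S^1\setminus I)'$. For simplicity, endpoints at $\infty$ are not handled.

```python
import math

TWO_PI = 2.0 * math.pi


def cayley_inverse(theta: float) -> float:
    """Real point x corresponding to e^{i*theta}, 0 < theta < 2*pi.

    Assumed convention: z = (x - i)/(x + i), so theta = 0 is the point at
    infinity and x = -cot(theta/2), an increasing function of theta.
    """
    return -math.cos(theta / 2.0) / math.sin(theta / 2.0)


def arc_to_real_region(theta1: float, theta2: float):
    """Describe the open counterclockwise arc (theta1, theta2) of S^1 on R.

    Returns ("interval", a, b) if the closure of the arc avoids infinity, so
    the arc is the bounded interval (a, b); returns ("complement", a, b) if the
    arc passes through infinity, so on R it is R minus [a, b], where [a, b] is
    the image of the complementary arc.
    """
    t1, t2 = theta1 % TWO_PI, theta2 % TWO_PI
    if t1 == 0.0 or t2 == 0.0 or t1 == t2:
        raise ValueError("endpoints must be distinct and different from infinity")
    if t1 < t2:
        # the arc does not contain theta = 0
        return ("interval", cayley_inverse(t1), cayley_inverse(t2))
    # the arc wraps through theta = 0; its complement [t2, t1] is bounded in R
    return ("complement", cayley_inverse(t2), cayley_inverse(t1))


print(arc_to_real_region(math.pi / 2, 3 * math.pi / 2))  # roughly ("interval", -1, 1)
print(arc_to_real_region(3 * math.pi / 2, math.pi / 2))  # roughly ("complement", -1, 1)
```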
Multi-Interval Subfactors and Modularity of Representations in CFT
Lemma 49. Let $A$ be a net on $S^1$ satisfying Haag duality and strong additivity. Then its restriction $A_0$ to $\mathbb{R}$ satisfies strong additivity and Haag duality on $\mathbb{R}$. Conversely, if $A_0$ is a Haag dual (21), strongly additive net on $\mathbb{R}$, then its extension $A$ to $S^1$ is strongly additive and Haag dual. Moreover, $A_0$ satisfies the split property on $\mathbb{R}$ if and only if $A$ satisfies the split property on $S^1$.
Proof. The proof is immediate. The statement concerning the split property follows because an inclusion of von Neumann algebras $N\subset M$ is split iff the commutant inclusion $M'\subset N'$ is split.
We now consider the relation between representations of a net $A$ satisfying Haag duality and strong additivity on $S^1$, as in Lemma 49, and those of its restriction $A_0$ to $\mathbb{R}$. A DHR representation $\pi_0$ of $A_0$ is, by definition, a representation $\pi_0$ of $A_0(\mathbb{R})$ such that $\pi_0|_{A_0(\mathbb{R}\setminus I)}$ is unitarily equivalent to $\mathrm{id}|_{A_0(\mathbb{R}\setminus I)}$ for every bounded non-trivial interval $I$ of $\mathbb{R}$, cf. [9]. Clearly a localizable representation $\pi$ of $A$ determines a DHR representation $\pi_0$ of $A_0$; indeed $\pi_0$ is consistently defined on $\bigcup_{a>0}A(-a,a)$ by $\pi_0(X) = \pi_I(X)$, $X\in A(I)$, where $I\equiv(-a,a)$, hence on all of $A(\mathbb{R})$ by continuity. We call $\pi_0$ the restriction of $\pi$ to $A_0$. Conversely, as we shall see, every DHR representation $\pi_0$ of $A_0(\mathbb{R})$ uniquely determines a localizable representation $\pi$ of $A$. A localized endomorphism $\rho$ of $A_0$ is, by definition, an endomorphism of $A_0(\mathbb{R})$ such that $\rho|_{A_0(\mathbb{R}\setminus I)} = \mathrm{id}|_{A_0(\mathbb{R}\setminus I)}$ for some interval $I\subset\mathbb{R}$; one then says that $\rho$ is localized in $I$. $\rho$ is transportable if for each interval $I_1$ there is an endomorphism $\rho_1$ localized in $I_1$ and (unitarily) equivalent to $\rho$ (as representations of $A_0(\mathbb{R})$). By Haag duality, then, $\rho_1 = \mathrm{Ad}\,u\cdot\rho$, where the unitary $u$ belongs to $A_0(\tilde I)$ if $\tilde I$ is any interval containing both $I$ and $I_1$. In this paper (as is often the case) transportability is assumed in the definition of localized endomorphism.
By a classical simple argument [9], a DHR representation $\pi_0$ of $A_0(\mathbb{R})$ is unitarily equivalent to a (transportable) endomorphism $\rho$ of $A_0(\mathbb{R})$ localized in any given interval $I$: it is enough to put $\rho(X)\equiv U\pi_0(X)U^*$, $X\in A_0(\mathbb{R})$, where $U$ is a unitary intertwiner between $\pi_0|_{A_0(\mathbb{R}\setminus I)}$ and $\mathrm{id}|_{A_0(\mathbb{R}\setminus I)}$.
Proposition 50. Let $A$ be a strongly additive, Haag dual net on $S^1$ and $A_0$ its restriction to $\mathbb{R}$, as in Lemma 49. If $\pi$ is a localizable representation of $A$, its restriction $\pi_0$ to $A_0$ is a DHR representation of $A_0$. Conversely, if $\pi_0$ is a DHR representation of $A_0$, there exists a (obviously unique) localizable representation $\pi$ of $A$ whose restriction to $A_0$ is $\pi_0$.
Proof. By the above discussion, we need only show that if $\pi_0$ is a DHR representation of $A_0$, there exists a localizable representation $\pi$ of $A$ such that $\pi_I = \pi_0|_{A(I)}$ if $I$ is a bounded interval of $\mathbb{R}$. Indeed, if the closure of $I$ contains the point $\infty$, we can define $\pi_I$ as the normal extension of $\pi_0|_{A_0(I\setminus\{\infty\})}$, once we show the necessary normality property. Now the normality of $\pi_0|_{A_0(I\setminus\{\infty\})}$ does not depend on the unitary equivalence class of $\pi_0$, so we may replace $\pi_0$ by a DHR endomorphism $\rho$ of $A_0$ localized in an interval $I_1\subset\mathbb{R}$ with $I_1\cap I = \emptyset$. But then $\rho|_{A_0(I\setminus\{\infty\})}$ is the identity, hence normal.
By definition, the sectors of $A$ (resp. of $A_0$) are the unitary equivalence classes of localizable representations of $A$ (resp. of DHR representations of $A_0$). By the above discussion, the two classes are in one-to-one correspondence. On the other hand, localizable representations of $A$ correspond to localizable representations of $C^*(A)$, and DHR representations of $A_0$ are equivalent to DHR localized endomorphisms of $A_0$; hence we have the following.
Corollary 51. Let $A_0$ be a strongly additive net on $\mathbb{R}$, Haag dual as in (21), and let $A$ be its extension to $S^1$. The restriction map $\pi\mapsto\pi_0$ gives rise to a natural one-to-one correspondence between unitary equivalence classes of localizable representations of $C^*(A)$ and unitary equivalence classes of DHR localized endomorphisms of $A_0$. In particular $\pi(C^*(A))'' = \pi_0(A_0(\mathbb{R}))''$, so $\pi$ is of type I iff $\pi_0$ is of type I.
Proof. It remains to check the last part of the statement. As $C^*(A)$ is generated (as a C*-algebra) by the von Neumann algebras $A(I)$ as $I$ runs through the intervals of $S^1$, one has $\pi(C^*(A))'' = \bigvee_I\pi_I(A(I))$, thus clearly $\pi(C^*(A))''\supset\pi_0(A_0(\mathbb{R}))''$. On the other hand, if $I$ is an interval of $S^1$, by local normality and strong additivity we have $\pi_I(A(I)) = \pi_I(A(I\setminus\{\infty\}))''\subset\pi_0(A_0(\mathbb{R}))''$, hence $\pi(C^*(A))''\subset\pi_0(A_0(\mathbb{R}))''$.
The naturality in the above corollary means that the tensor categories of localizable representations of $C^*(A)$ and of DHR localized endomorphisms of $A_0$ are equivalent, but we do not need this form of the statement.
C. Disintegration of Locally Normal Representations and of Sectors
Takesaki and Winnink [44] have shown that a locally normal state decomposes into locally normal states if the split property holds.
Here we show analogous results for localizable representations (sectors). Our arguments, however, also show along the same lines that locally normal representations decompose into locally normal representations, also on higher-dimensional manifolds. We begin with a simple lemma.
Lemma 52. Let $M$ be a von Neumann algebra, $L\subset M$ a σ-weakly dense C*-subalgebra and $J\subset L$ a left ideal of $L$. If $\pi$ is a representation of $L$ on a Hilbert space $H$ such that $\pi|_J$ is σ-weakly continuous and $\pi(J)H$ is dense in $H$, then $\pi$ is σ-weakly continuous, thus it extends uniquely to a normal representation of $M$.
Proof. It is sufficient to show that $\pi$ is σ-weakly continuous on the unit ball of $L$, see e.g. [45]. Let then $\{a_i\}_i$ be a bounded net of elements $a_i\in L$ such that $a_i\to0$ σ-weakly. If $t\in B(H)$ is a σ-weak limit point of $\{\pi(a_i)\}_i$, we have to show that $t = 0$. By passing to a subnet, if necessary, we may assume $\pi(a_i)\to t$. Given $h\in J$, we have $a_ih\in J$ and $a_ih\to0$, thus $\pi(a_ih)\to0$ because $\pi|_J$ is σ-weakly continuous; therefore
$$t\pi(h) = \lim_i\pi(a_i)\pi(h) = \lim_i\pi(a_ih) = 0,$$
and this entails $t = 0$ because $h$ is arbitrary and $\pi(J)H$ is dense in $H$.
We shall use the well-known fact that the C*-algebra of compact operators on a separable Hilbert space $H$ has, up to multiplicity, only one non-degenerate representation (i.e. one not containing the zero representation); hence such a representation has a unique normal extension to $B(H)$.
Corollary 53. Let $N$ be a type I factor with separable predual, $K\subset N$ the ideal of compact operators relative to $N$, and $L$ a C*-algebra with $K\subset L\subset N$. If $\pi$ is a representation of $L$ such that $\pi|_K$ is non-degenerate, then $\pi$ is σ-weakly continuous, thus it extends uniquely to a normal representation of $N$.
Proof. Immediate from Lemma 52 with $J = K$, because any non-degenerate representation of $K$ is σ-weakly continuous and $K$ is σ-weakly dense in $N$.
Let $A$ be a net of von Neumann algebras on $S^1$ over a separable Hilbert space satisfying the split property and Haag duality. If $I,\tilde I$ are intervals, we write $I\subset\subset\tilde I$ if the closure of $I$ is contained in the interior of $\tilde I$. For each pair of intervals $I\subset\subset\tilde I$ we choose an intermediate type I factor $N(I,\tilde I)$ between $A(I)$ and $A(\tilde I)$, and let $K(I,\tilde I)$ be the compact operators of $N(I,\tilde I)$ (there is a canonical choice for $N(I,\tilde I)$ [10], but this plays no role here). We denote by $\mathcal{I}_{\mathbb{Q}}$ the set of intervals with rational endpoints and by $\mathfrak{A}$ the C*-subalgebra of $C^*(A)$ generated by all $K(I,\tilde I)$ as $I\subset\subset\tilde I$ run in $\mathcal{I}_{\mathbb{Q}}$. Clearly $\mathfrak{A}$ is norm separable. If $I_1\subset\subset\tilde I_1\subset I_2\subset\subset\tilde I_2$, then clearly $N(I_1,\tilde I_1)\subset N(I_2,\tilde I_2)$, but $K(I_1,\tilde I_1)$ is not included in $K(I_2,\tilde I_2)$. For this reason we define the C*-algebras associated to pairs of intervals $I\subset\subset\tilde I$,
$$L(I,\tilde I)\equiv N(I,\tilde I)\cap\mathfrak{A}.$$
As $N(I,\tilde I)$ is the multiplier algebra of $K(I,\tilde I)$, $L(I,\tilde I)$ consists of the elements of $\mathfrak{A}$ that are multipliers of $K(I,\tilde I)$. By definition $K(I,\tilde I)\subset L(I,\tilde I)\subset N(I,\tilde I)$, and $\mathfrak{A}$ is the C*-subalgebra of $C^*(A)$ generated by all $L(I,\tilde I)$ as $I\subset\subset\tilde I$ run in $\mathcal{I}_{\mathbb{Q}}$.
Lemma 54. If $I_1\subset\subset\tilde I_1\subset I_2\subset\subset\tilde I_2$ are intervals, then $L(I_1,\tilde I_1)\subset L(I_2,\tilde I_2)$.
Proof.
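$L(I_1,\tilde I_1)\subset N(I_1,\tilde I_1)\subset N(I_2,\tilde I_2)$; since also $L(I_1,\tilde I_1)\subset\mathfrak{A}$, we get $L(I_1,\tilde I_1)\subset N(I_2,\tilde I_2)\cap\mathfrak{A} = L(I_2,\tilde I_2)$.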
Proposition 55. Let $\pi$ be a locally normal representation of $C^*(A)$. Then $\pi|_{\mathfrak{A}}$ is a representation of $\mathfrak{A}$ and $\pi|_{K(I,\tilde I)}$ is non-degenerate for every pair of intervals $I\subset\subset\tilde I$. Conversely, if $\sigma$ is a representation of $\mathfrak{A}$ such that $\sigma|_{K(I,\tilde I)}$ is non-degenerate for all intervals $I,\tilde I\in\mathcal{I}_{\mathbb{Q}}$, $I\subset\subset\tilde I$, there exists a unique locally normal representation $\tilde\sigma$ of $C^*(A)$ that extends $\sigma$. Moreover, equivalent representations of $C^*(A)$ correspond to equivalent representations of $\mathfrak{A}$.
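Proof. The only non-trivial part is that $\sigma$ extends to a locally normal representation $\tilde\sigma$ of $C^*(A)$. If $I\subset\subset\tilde I$ are intervals in $\mathcal{I}_{\mathbb{Q}}$, we denote by $\tilde\sigma_{I,\tilde I}$ the unique normal extension of $\sigma|_{L(I,\tilde I)}$ to $N(I,\tilde I)$ given by Corollary 53. Given an interval $I$, we choose $I_1,\tilde I_1\in\mathcal{I}_{\mathbb{Q}}$, $I_1\subset\subset\tilde I_1$, such that $I\subset\subset I_1$, and set $\tilde\sigma_I\equiv\tilde\sigma_{I_1,\tilde I_1}|_{A(I)}$. We have to show that $\tilde\sigma_I$ is well defined; then $I\mapsto\tilde\sigma_I$ is clearly a representation of $A$. Indeed, let $I_2,\tilde I_2\in\mathcal{I}_{\mathbb{Q}}$ with $I_2\subset\subset\tilde I_2$ be another pair such that $I\subset\subset I_2$. We can choose $I_3,\tilde I_3\in\mathcal{I}_{\mathbb{Q}}$ such that $I\subset\subset I_3\subset\subset\tilde I_3\subset\subset I_1\cap I_2$. Then by Lemma 54 $L(I_3,\tilde I_3)\subset L(I_i,\tilde I_i)$, $i = 1,2$, and therefore, by the uniqueness of the normal extension in Corollary 53,
$$\tilde\sigma_{I_3,\tilde I_3} = \tilde\sigma_{I_1,\tilde I_1}|_{N(I_3,\tilde I_3)} = \tilde\sigma_{I_2,\tilde I_2}|_{N(I_3,\tilde I_3)}.$$
Since $A(I)\subset A(I_3)\subset N(I_3,\tilde I_3)$, the two definitions of $\tilde\sigma_I$ agree. This concludes the proof.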
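The only choice made in this proof is that of rational intervals $I_3\subset\subset\tilde I_3$ squeezed between $I$ and $I_1\cap I_2$. The following Python sketch makes that step explicit for intervals of $\mathbb{R}$. This is a simplification: on $S^1$ the same construction is applied to arcs. The sketch uses exact rational arithmetic from the standard `fractions` module, and the helper names are hypothetical. It splits each of the two gaps between $I$ and $I_1\cap I_2$ into thirds, so all endpoints produced are rational whenever the inputs are.

```python
from fractions import Fraction


def compactly_inside(inner, outer):
    """True if the closure of the open interval `inner` lies inside `outer`."""
    return outer[0] < inner[0] and inner[1] < outer[1]


def intersect(j1, j2):
    """Intersection of two open intervals of R (they must overlap)."""
    lo, hi = max(j1[0], j2[0]), min(j1[1], j2[1])
    if lo >= hi:
        raise ValueError("intervals do not overlap")
    return (lo, hi)


def rational_intermediate_pair(I, I1, I2):
    """Return rational intervals I3, I3t with I << I3 << I3t << I1 ∩ I2,
    where A << B means closure(A) lies inside B. Requires I << I1 and I << I2."""
    if not (compactly_inside(I, I1) and compactly_inside(I, I2)):
        raise ValueError("need closure(I) inside I1 and inside I2")
    c, d = Fraction(I[0]), Fraction(I[1])
    lo, hi = (Fraction(v) for v in intersect(I1, I2))
    # split the gaps (lo, c) and (d, hi) into thirds; all new endpoints are rational
    I3 = (lo + 2 * (c - lo) / 3, d + (hi - d) / 3)
    I3t = (lo + (c - lo) / 3, d + 2 * (hi - d) / 3)
    return I3, I3t


print(rational_intermediate_pair((0, 1), (-1, 3), (-2, 2)))
# ((Fraction(-1, 3), Fraction(4, 3)), (Fraction(-2, 3), Fraction(5, 3)))
```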
Proposition 56. Let $\pi$ be a locally normal representation of $C^*(A)$ on a separable Hilbert space and denote by $\pi_{\mathfrak{A}}$ the restriction of $\pi$ to $\mathfrak{A}$. If
$$\pi_{\mathfrak{A}} = \int_X^{\oplus}\pi_\lambda\,d\mu(\lambda)$$
is a decomposition into irreducible representations $\pi_\lambda$ (which always exists), then $\pi_\lambda$ extends to a locally normal representation $\tilde\pi_\lambda$ of $C^*(A)$ for almost all $\lambda$.
Proof. By Proposition 55, it is sufficient to show that there exists a null set $E\subset X$ such that $\pi_\lambda|_{K(I,\tilde I)}$ is non-degenerate for $\lambda\notin E$ and all $I,\tilde I\in\mathcal{I}_{\mathbb{Q}}$ with $I\subset\subset\tilde I$. This is clear for a fixed pair $I,\tilde I$ of the family, because $\pi|_{K(I,\tilde I)}$ is non-degenerate. The statement then follows since the family of the $K(I,\tilde I)$'s under consideration is countable.
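Proposition 57. With the notation of Proposition 56, if $\pi(C^*(A))''$ is a factor not of type I, then for each $\lambda\in X$ the set $X_\lambda\equiv\{\lambda'\in X : \pi_{\lambda'}\simeq\pi_\lambda\}$ has measure zero.
Proof. The set $X_\lambda$ is measurable by Lemma 60 below. We have $\mu(X\setminus X_\lambda)>0$, as otherwise $\pi$ would be quasi-equivalent to $\pi_\lambda$, hence $\pi(\mathfrak{A})''$ would be a type I factor. If $\mu(X_\lambda)>0$, then $\pi_{\mathfrak{A}}$ would be the direct sum of two inequivalent representations,
$$\pi_{\mathfrak{A}} = \int_{X_\lambda}^{\oplus}\pi_{\lambda'}\,d\mu(\lambda')\ \oplus\ \int_{X\setminus X_\lambda}^{\oplus}\pi_{\lambda'}\,d\mu(\lambda'),$$
which is not possible since $\pi(\mathfrak{A})''$ is a factor.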
Corollary 58. If there exists a localizable representation $\pi$ of $C^*(A)$ with $\pi(C^*(A))''$ a factor not of type I, then there exist uncountably many inequivalent irreducible localizable representations of $C^*(A)$.
Proof. If the representation $\pi$ is factorial and not of type I, then the family of the $\tilde\pi_\lambda$'s in the above propositions contains an uncountable set of mutually inequivalent irreducible localizable representations, as desired. Indeed, by Proposition 57 each equivalence class occupies a $\mu$-null set, and countably many null sets cannot exhaust $X$.
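Corollary 59. Let $A_0$ be a strongly additive, split net of von Neumann algebras on the intervals of $\mathbb{R}$ which is Haag dual as in (21). If there exists a DHR localized endomorphism $\rho$ of $A_0$ with $\rho(A_0(\mathbb{R}))''$ a factor not of type I, then there exist uncountably many inequivalent irreducible DHR localized endomorphisms of $A_0$.
Proof. Immediate by Corollary 58 and Corollary 51.
Before concluding this appendix we have to prove a lemma that has been used above. Let $\mathfrak{A}$ be any separable C*-algebra and $\sigma$ a representation of $\mathfrak{A}$. Choose a sequence of elements $a_\ell\in\mathfrak{A}$ dense in the unit ball $\mathfrak{A}_1$, and a sequence $\varphi_i\in\mathfrak{A}^*$ dense in the Banach space of normal linear functionals $(\sigma(\mathfrak{A})'')_*$ associated with $\sigma$. A linear functional $\varphi\in\mathfrak{A}^*$ is then normal with respect to $\sigma$ if and only if
$$\forall k\in\mathbb{N},\ \exists i\in\mathbb{N} :\ |\varphi(a_\ell)-\varphi_i(a_\ell)|\le\frac1k,\quad\forall\ell\in\mathbb{N}.\qquad (23)$$
Indeed, the supremum over $\ell$ of the left-hand side equals $\|\varphi-\varphi_i\|$, and the functionals that are normal with respect to $\sigma$ form a norm-closed subspace of $\mathfrak{A}^*$.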
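We thus have the following.
Lemma 60. Let $\mathfrak{A}$ be a separable C*-algebra, $\pi$ a representation of $\mathfrak{A}$ on a separable Hilbert space and $\pi = \int_X^{\oplus}\pi_\lambda\,d\mu(\lambda)$ a direct integral decomposition into a.e. irreducible representations $\pi_\lambda$ of $\mathfrak{A}$. For any irreducible representation $\sigma$ of $\mathfrak{A}$, the set $X_\sigma\equiv\{\lambda : \pi_\lambda\simeq\sigma\}$ is measurable.
Proof. Let $\xi = \int_X^{\oplus}\xi(\lambda)\,d\mu(\lambda)$ be a vector with $\xi(\lambda)\neq0$ for all $\lambda\in X$, and consider the functional on $\mathfrak{A}$ given by $\varphi_\lambda = (\pi_\lambda(\cdot)\xi(\lambda),\xi(\lambda))$. As both $\sigma$ and $\pi_\lambda$ are irreducible, we have $\sigma\simeq\pi_\lambda$ if and only if $\varphi_\lambda$ is normal with respect to $\sigma$. With the previous notation, Eq. (23) then gives
$$X_\sigma = \bigcap_k\bigcup_i\bigcap_\ell X_{ik\ell},\qquad X_{ik\ell} = \Big\{\lambda\in X : |\varphi_\lambda(a_\ell)-\varphi_i(a_\ell)|\le\frac1k\Big\}.$$
As each $X_{ik\ell}$ is measurable, $X_\sigma$ is measurable as well.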
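The proof turns the quantifiers in (23) into the set operations $\bigcap_k\bigcup_i\bigcap_\ell$. The Python sketch below mirrors this with nested `all`/`any`/`all`, applied to finitely many sampled values. Truncating $k$, $i$ and $\ell$ to finite ranges is an assumption made purely for illustration, since the actual criterion involves all three infinite sequences. The function names are hypothetical.

```python
def satisfies_criterion_23(phi_vals, approx_vals, k_max):
    """Finite truncation of (23).

    phi_vals[l]       stands for phi(a_l)
    approx_vals[i][l] stands for phi_i(a_l)
    For every k <= k_max there must be an i with |phi(a_l) - phi_i(a_l)| <= 1/k
    for all sampled l. The nesting all/any/all matches the intersection over k,
    union over i and intersection over l in the formula for X_sigma.
    """
    return all(
        any(
            all(abs(p - q) <= 1.0 / k for p, q in zip(phi_vals, row))
            for row in approx_vals
        )
        for k in range(1, k_max + 1)
    )


def x_sigma(phi_by_lambda, approx_vals, k_max):
    """Truncated analogue of X_sigma: the labels lambda whose functional
    phi_lambda passes the truncated test (23)."""
    return {
        lam
        for lam, vals in phi_by_lambda.items()
        if satisfies_criterion_23(vals, approx_vals, k_max)
    }


print(satisfies_criterion_23([1.0, 0.5], [[0.0, 0.0], [1.0, 0.5]], 10))  # True
```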
Acknowledgements. Part of this work was done during visits of the first-named author to Università di Roma “Tor Vergata”. Y.K. acknowledges the hospitality and financial support of CNR (Italy), Università di Roma “Tor Vergata” and the Kanagawa Academy of Science and Technology Research Grants. R.L. wishes to thank the Japan Society for the Promotion of Science for the invitation to the University of Tokyo in June 1997. The authors would like to thank K.-H. Rehren for comments.
References
1. Böckenhauer, J.: An algebraic formulation of level one Wess–Zumino–Witten models. Rev. Math. Phys. 8, 925–947 (1996)
2. Böckenhauer, J., Evans, D.E.: Modular invariants, graphs and α-induction for nets of subfactors I–III. Commun. Math. Phys. 197, 361–386 (1998); 200, 57–103 (1999); 205, 183–228 (1999)
3. Böckenhauer, J., Evans, D.E., Kawahigashi, Y.: On α-induction, chiral projectors and modular invariants for subfactors. Commun. Math. Phys. 208, 429–487 (1999)
4. Brunetti, R., Guido, D., Longo, R.: Modular structure and duality in conformal quantum field theory. Commun. Math. Phys. 156, 201–219 (1993)
5. Buchholz, D., D’Antoni, C., Longo, R.: Nuclear maps and modular structures. II. Commun. Math. Phys. 129, 115–138 (1990)
6. Connes, A.: Classification of injective factors. Ann. Math. 104, 73–115 (1976)
7. Conti, R.: Inclusioni di algebre di von Neumann e teoria algebrica dei campi. Tesi del dottorato di ricerca in matematica, Università di Roma “Tor Vergata”, 1996
8. D’Antoni, C., Longo, R., Radulescu, F.: Conformal nets, maximal temperature and models from free probability. J. Oper. Th. 45, 195–208 (2001)
9. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics, I. Commun. Math. Phys. 23, 199–230 (1971); II. 35, 49–85 (1974)
10. Doplicher, S., Longo, R.: Standard and split inclusions of von Neumann algebras. Invent. Math. 73, 493–536 (1984)
11. Drinfel’d, V.G.: Quantum groups. Proc. ICM-86, Berkeley, 1986, pp. 798–820
12. Evans, D.E., Kawahigashi, Y.: Quantum symmetries on operator algebras. Oxford: Oxford University Press, 1998
13. Evans, D.E., Kawahigashi, Y.: Orbifold subfactors from Hecke algebras II – Quantum doubles and braiding. Commun. Math. Phys. 196, 331–361 (1998)
14. Fredenhagen, K., Rehren, K.-H., Schroer, B.: Superselection sectors with braid group statistics and exchange algebras II. Rev. Math. Phys. Special issue, 113–157 (1992)
15. Fröhlich, J., Gabbiani, F.: Operator algebras and conformal field theory. Commun. Math. Phys. 155, 569–640 (1993)
16. Guido, D., Longo, R.: Relativistic invariance and charge conjugation in quantum field theory. Commun. Math. Phys. 148, 521–551 (1992)
17. Guido, D., Longo, R.: The conformal spin and statistics theorem. Commun. Math. Phys. 181, 11–35 (1996)
18. Guido, D., Longo, R., Wiesbrock, H.-W.: Extensions of conformal nets and superselection structures. Commun. Math. Phys. 192, 217–244 (1998)
19. Haagerup, U.: Connes’ bicentralizer problem and the uniqueness of the injective factor of type III$_1$. Acta Math. 158, 95–148 (1987)
20. Kosaki, H.: Type III Factors and Index Theory. Res. Inst. of Math., Lect. Notes 43, Seoul Nat. Univ., 1998
21. Izumi, M.: Subalgebras of infinite C*-algebras with finite Watatani indices II: Cuntz–Krieger algebras. Duke Math. J. 91, 409–461 (1998)
22. Izumi, M.: The structure of sectors associated with the Longo–Rehren inclusions I. General theory. Commun. Math. Phys. 213, 127–179 (2000)
23. Izumi, M., Longo, R., Popa, S.: A Galois correspondence for compact groups of automorphisms of von Neumann algebras with a generalization to Kac algebras. J. Funct. Anal. 10, 25–63 (1998)
24. Jones, V.F.R.: Index for subfactors. Invent. Math. 72, 1–25 (1983)
25. Longo, R.: Index of subfactors and statistics of quantum fields I–II. Commun. Math. Phys. 126, 217–247 (1989); 130, 285–309 (1990)
26. Longo, R.: A duality for Hopf algebras and for subfactors. Commun. Math. Phys. 159, 133–150 (1994)
27. Longo, R.: Algebraic and modular structure of von Neumann algebras of physics. Proc. Symp. Pure Math. 38, Part 2, 551 (1982)
28. Longo, R., Rehren, K.-H.: Nets of subfactors. Rev. Math. Phys. 7, 567–597 (1995)
29. Longo, R., Roberts, J.E.: A theory of dimension. K-Theory 11, 103–159 (1997)
30. Masuda, T.: An analogue of Longo’s canonical endomorphism for bimodule theory and its application to asymptotic inclusions. Internat. J. Math. 8, 249–265 (1997)
31. Masuda, T.: Generalization of Longo–Rehren construction to subfactors of infinite depth and amenability of fusion algebras. J. Funct. Anal. 171, 53–77 (2000)
32. Müger, M.: On charged fields with group symmetry and degeneracies of Verlinde’s matrix S. Ann. Inst. H. Poincaré (Phys. Théor.) 71, 359–394 (1999)
33. Müger, M.: Categorical approach to paragroup theory I. Ambialgebras in and Morita equivalence of tensor categories & II. The quantum double of tensor categories and subfactors. In preparation
34. Müger, M.: Global symmetries in conformal field theory: Orbifold theories, simple current extensions and beyond. In preparation
35. Ocneanu, A.: Quantum symmetry, differential geometry of finite graphs and classification of subfactors. University of Tokyo Seminary Notes 45 (Notes recorded by Y. Kawahigashi), 1991
36. Ocneanu, A.: An invariant coupling between 3-manifolds and subfactors, with connections to topological and conformal quantum field theory. Preprint 1991
37. Ocneanu, A.: Chirality for operator algebras. (Notes recorded by Y. Kawahigashi), in: Subfactors (ed. H. Araki, et al.), Singapore: World Scientific, 1994, pp. 39–63
38. Pimsner, M., Popa, S.: Entropy and index for subfactors. Ann. Scient. Éco. Norm. Sup. 19, 57–106 (1986)
39. Popa, S.: Symmetric enveloping algebras, amenability and AFD properties for subfactors. Math. Res. Lett. 1, 409–425 (1994)
40. Popa, S.: Classification of Subfactors and their Endomorphisms. CBMS Lecture Notes Series, 86
41. Rehren, K.-H.: Braid group statistics and their superselection rules. In: The Algebraic Theory of Superselection Sectors. D. Kastler ed., Singapore: World Scientific, 1990
42. Rehren, K.-H.: Space-time fields and exchange fields. Commun. Math. Phys. 132, 461–483 (1990)
43. Schroer, B.: Recent developments of algebraic methods in quantum field theories. Int. J. Mod. Phys. B 6, 2041–2059 (1992)
44. Takesaki, M., Winnink, M.: Local normality in quantum statistical mechanics. Commun. Math. Phys. 30, 129–152 (1973)
45. Takesaki, M.: Theory of Operator Algebras I. Berlin–Heidelberg–New York: Springer-Verlag, 1979
46. Turaev, V.G.: Quantum invariants of knots and 3-manifolds. Berlin–New York: Walter de Gruyter, 1994
47. Wassermann, A.: Operator algebras and conformal field theory III: Fusion of positive energy representations of SU(N) using bounded operators. Invent. Math. 133, 467–538 (1998)
48. Wenzl, H.: Hecke algebras of type A$_n$ and subfactors. Invent. Math. 92, 345–383 (1988)
49. Xu, F.: New braided endomorphisms from conformal inclusions. Commun. Math. Phys. 192, 347–403 (1998)
50. Xu, F.: Jones–Wassermann subfactors for disconnected intervals. Commun. Contemp. Math. 2, 307–347 (2000)
Communicated by A. Connes
Commun. Math. Phys. 219, 671 – 702 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Renormalization Group Flow of the Two-Dimensional Hierarchical Coulomb Gas
Leonardo F. Guidi, Domingos H. U. Marchetti
Instituto de Física, Universidade de São Paulo, Caixa Postal 66318, 05315 São Paulo, SP, Brasil
Received: 29 November 1999 / Accepted: 13 January 2001
Abstract: We consider a quasilinear parabolic differential equation associated with the renormalization group transformation of the two-dimensional hierarchical Coulomb system in the limit as the size of the block $L\downarrow1$. We show that the initial value problem is well defined in a suitable function space and that the solution converges, as $t\to\infty$, to one of countably infinitely many equilibrium solutions. The $j$th nontrivial equilibrium solution bifurcates from the trivial one at $\beta_j = 8\pi/j^2$, $j = 1,2,\dots$. These solutions are fully described, and we provide a complete analysis of their local and global stability for all values of the inverse temperature $\beta>0$. Gallavotti and Nicoló’s conjecture on an infinite sequence of “phase transitions” is also addressed. Our results rule out an intermediate phase between the plasma and the Kosterlitz–Thouless phases, at least in the hierarchical model we consider.
1. Introduction
We consider, for each $\beta>0$, the partial differential equation
$$u_t - \frac{\beta}{4\pi}\left(u_{xx} - u_x^2\right) - 2u = 0\qquad (1.1)$$
on $\mathbb{R}_+\times(-\pi,\pi)$ with periodic boundary conditions, $u(t,-\pi) = u(t,\pi)$ and $u_x(t,-\pi) = u_x(t,\pi)$, in the space of even functions satisfying the additional condition $u(t,0) = 0$¹. We show that the initial value problem is well defined in an appropriate function space $B$ and that the solution exists and is unique for all $t>0$. Furthermore, as $t\to\infty$, the solution converges in $B$ to one of the (equilibrium) solutions $\phi$ of
$$\frac{\beta}{4\pi}\left(\phi'' - (\phi')^2\right) + 2\phi = 0,\qquad (1.2)$$
Supported by FAPESP under grant #98/10745-1.
Partially supported by CNPq, FINEP and FAPESP.
¹ This is assured by a Lagrange multiplier (see Remark 3.1).
with $\phi(-\pi) = \phi(\pi)$ and $\phi'(-\pi) = \phi'(\pi)$. For $\beta>8\pi$, $\phi_0\equiv0$ is the (globally) asymptotically stable solution of (1.1). For $\beta<8\pi$ such that $8\pi/(k+1)^2\le\beta<8\pi/k^2$ holds for some $k\in\mathbb{N}_+$, $\phi_0$ is unstable and there exist $2k$ non-trivial equilibrium solutions $\phi_1^\pm,\dots,\phi_k^\pm$ of (1.2), among which $\phi_1^\pm$ are the only asymptotically stable ones.
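Linearizing (1.1) at $u = 0$ drops $u_x^2$ and leaves $u_t = \frac{\beta}{4\pi}u_{xx} + 2u$. Assume, as we do here, that the constraint $u(t,0) = 0$ is enforced by adding a spatially constant term $c(t)$; the precise form of the Lagrange multiplier of footnote 1 is not given in this section. Then $u = a(t)(\cos jx - 1)$ solves the linearized problem with $c = a\beta j^2/(4\pi)$ and $a' = (2-\beta j^2/(4\pi))\,a$. This rate vanishes exactly at $\beta_j = 8\pi/j^2$, and the number of positive rates is the $k$ with $8\pi/(k+1)^2\le\beta<8\pi/k^2$, consistent with the count of $2k$ nontrivial equilibria above. The pure-Python sketch below computes these quantities and runs a crude explicit finite-difference integration of (1.1) from a small initial datum. The scheme, the grid and the function names are our own illustrative choices, not the authors' method.

```python
import math


def bifurcation_point(j: int) -> float:
    """beta_j = 8*pi/j**2: the j-th nontrivial equilibrium branches off phi_0 here."""
    return 8.0 * math.pi / j**2


def linear_growth_rate(j: int, beta: float) -> float:
    """Growth rate 2 - beta*j**2/(4*pi) of the mode cos(jx) - 1 linearized at u = 0."""
    return 2.0 - beta * j**2 / (4.0 * math.pi)


def unstable_mode_count(beta: float) -> int:
    """The k >= 0 with 8*pi/(k+1)**2 <= beta < 8*pi/k**2 (k = 0 when beta >= 8*pi)."""
    k = 0
    while beta < bifurcation_point(k + 1):
        k += 1
    return k


def simulate(beta: float, u0: list[float], t_final: float) -> list[float]:
    """Explicit finite-difference sketch of (1.1) on the periodic grid
    x_m = -pi + m*dx, dx = 2*pi/n, with n even so that x = 0 is a grid point.

    Assumption (ours): u(t, 0) = 0 is enforced by a spatially constant
    multiplier c(t) chosen so that the right-hand side vanishes at x = 0.
    """
    n = len(u0)
    dx = 2.0 * math.pi / n
    diff = beta / (4.0 * math.pi)
    n_steps = math.ceil(t_final / (0.2 * dx * dx / diff))  # explicit-Euler stability margin
    dt = t_final / n_steps
    i0 = n // 2  # index of x = 0
    u = list(u0)
    for _ in range(n_steps):
        rhs = []
        for m in range(n):
            up, um = u[(m + 1) % n], u[m - 1]  # periodic neighbours; u[-1] wraps around
            u_x = (up - um) / (2.0 * dx)
            u_xx = (up - 2.0 * u[m] + um) / (dx * dx)
            rhs.append(diff * (u_xx - u_x * u_x) + 2.0 * u[m])
        c = -rhs[i0]  # multiplier c(t) keeping u(t, 0) = 0
        u = [v + dt * (r + c) for v, r in zip(u, rhs)]
    return u


n = 64
dx = 2.0 * math.pi / n
u0 = [0.01 * (math.cos(-math.pi + m * dx) - 1.0) for m in range(n)]
print(unstable_mode_count(1.0))  # 5, so 2k = 10 nontrivial equilibria
print(max(map(abs, simulate(10 * math.pi, u0, 1.0))) < max(map(abs, u0)))  # True: decay
print(max(map(abs, simulate(4 * math.pi, u0, 1.0))) > max(map(abs, u0)))  # True: growth
```

The explicit scheme is chosen only for brevity. Its stable time step shrinks like $dx^2$, so it is not meant for long-time studies of the flow.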
The aim of the present work is to show that, for $j\ge1$, $\phi_j^\pm$ have a $(j-1)$-dimensional unstable manifold $M_j\subset B$, so $\phi_j^\pm$ are more stable than $\phi_{j'}^\pm$ if $j<j'$. As a consequence, there exists a dense open set of initial conditions in $B$ for which $\phi_1^+$ ($\phi_1^-$ is not physically admissible) is the non-trivial stable solution, for all $\beta<8\pi$. Our study of Eq. (1.1) is motivated by two distinct goals. Firstly, it provides a new example of a nonlinear parabolic differential equation for which a geometric theory can be carried out (see e.g. Henry [H]). According to this theory, the above scenario can be stated as follows: there exists a sufficiently large ball $B_0\subset B$ about the origin such that, if $u(t,B_0)$ denotes the set of points reached at time $t$ starting from any initial function in $B_0$, then the invariant set $\bigcap_{t\ge0}u(t,B_0)$ coincides with the $k$-dimensional unstable manifold $K_k = \bigcup_{0\le j\le k}M_j = \overline{M_0}$, provided $8\pi/(k+1)^2\le\beta<8\pi/k^2$. Secondly, the solution of the initial value problem (1.1) describes the renormalization group (RG) flow of the effective potential in the two-dimensional hierarchical Coulomb system, and the stationary solutions $\phi_j^+$, the fixed points of the RG, contain information on its critical phenomena. The analysis of Eq. (1.1) presented here may shed some light on a question raised by Gallavotti and Nicoló [GN] on the “screening phase transitions” in two-dimensional Coulomb systems. The existence of infinitely many thresholds of “instabilities” found in the Mayer series at the inverse temperatures $\beta_n = 8\pi(1-1/(2n))$, $n\in\mathbb{N}_+$, indicates, according to these authors, a sequence of “intermediate” phase transitions from the “plasma phase” ($\beta\le\beta_1 = 4\pi$) to the multipole phase ($\beta\ge\beta_\infty = 8\pi$).
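The Gallavotti–Nicoló thresholds $\beta_n = 8\pi(1-1/(2n))$ and the integer part $n = \lfloor 1/(2-\beta/4\pi)\rfloor$ used in their conjecture, recalled in the next paragraph, index the same intervals. Indeed, $\beta_n\le\beta<\beta_{n+1}$ is equivalent to $\beta/4\pi\in[2-1/n,\,2-1/(n+1))$, that is, to $1/(2-\beta/4\pi)\in[n,n+1)$. A short Python check of this equivalence follows; the function names are ours. It evaluates midpoints of the intervals, since floating-point rounding exactly at a threshold could shift the integer part by one.

```python
import math


def gn_threshold(n: int) -> float:
    """Gallavotti-Nicolo threshold beta_n = 8*pi*(1 - 1/(2n)), n = 1, 2, ..."""
    return 8.0 * math.pi * (1.0 - 1.0 / (2 * n))


def gn_index(beta: float) -> int:
    """n = integer part of 1/(2 - beta/(4*pi)), meaningful for 4*pi <= beta < 8*pi."""
    if not 4.0 * math.pi <= beta < 8.0 * math.pi:
        raise ValueError("expected 4*pi <= beta < 8*pi")
    return math.floor(1.0 / (2.0 - beta / (4.0 * math.pi)))


for n in range(1, 6):
    beta = 0.5 * (gn_threshold(n) + gn_threshold(n + 1))  # midpoint of [beta_n, beta_{n+1})
    print(n, gn_index(beta))  # the two numbers on each line coincide
```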
They conjectured that some partial screening takes place as the inverse temperature decreases from $8\pi$ to $4\pi$, which prevents the formation of neutral multipoles of order larger than $2n$, where $n$ is the integer part of $1/(2-\beta/4\pi)$ (dipoles are the last to be prevented, at $4\pi$). The Kosterlitz–Thouless phase (multipole phase) was established by Fröhlich and Spencer [FS] and extended up to $8\pi$ by one of the present authors and A. Klein [MK]. Debye screening has only been proved for sufficiently small $\beta$ ($\beta\ll1$), where
$$N(x,y) := \inf\left\{N\in\mathbb{N}_+ : \left[\frac{x}{L^N}\right] = \left[\frac{y}{L^N}\right]\right\}\qquad (2.6)$$
and $[z]\in\mathbb{Z}^2$ is the vector whose components are the integer parts of the components of $z\in\mathbb{R}^2$. Notice that $d_h$ is not translation invariant. Now, given an integer $N>1$, let $\Lambda = \Lambda_N = [-L^N, L^N - L^{N-1}]^2\cap\mathbb{Z}^2$ and define, for each configuration $q\in\mathbb{Z}^{\Lambda}$, the block configuration $q^1 : \Lambda_{N-1}\to\mathbb{Z}$,
$$q^1(x) = \sum_{0\le y_i\,\cdots}q(Lx+y).\qquad (2.7)$$