Commun. Math. Phys. 278, 1–29 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0398-9
Communications in
Mathematical Physics
Rigorous Remarks about Scaling Laws in Turbulent Fluids
F. Flandoli (1), M. Gubinelli (2), M. Hairer (3), M. Romito (4)
(1) Dipartimento di Matematica Applicata, Università di Pisa, via Buonarroti 1, 56127 Pisa, Italia
(2) Equipe de probabilités, statistique et modélisation, Université de Paris-Sud, 91405 Orsay Cedex, France
(3) Department of Mathematics, The University of Warwick, Coventry CV4 7AL, United Kingdom
(4) Dipartimento di Matematica, Università di Firenze, viale Morgagni 67/a, 50134 Firenze, Italia. E-mail: [email protected]
Received: 18 July 2005 / Accepted: 12 September 2007
Published online: 8 December 2007, © Springer-Verlag 2007
Abstract: A definition of scaling law for suitable families of measures is given and investigated. First, a number of necessary conditions are proved. They imply the absence of scaling laws for 2D stochastic Navier-Stokes equations and for the stochastic Stokes (linear) problem in any dimension, while they imply a lower bound on the mean vortex stretching in 3D. Second, for the 3D stochastic Navier-Stokes equations, necessary and sufficient conditions for scaling laws to hold are given, translating the problem into bounds for energy and enstrophy of high and low modes respectively. Unlike in the 2D case, the validity or invalidity of such conditions in 3D remains open.

1. Introduction

The scaling law devised by Kolmogorov and Obukhov for turbulent 3D fluids in 1941 (called K41 in the sequel) says that in the inertial range $S_2(r) \sim \epsilon^{2/3} r^{2/3}$, where $S_2(r)$ is the second order structure function and $\epsilon$ is the mean energy dissipation rate. Since Kolmogorov's work of 1962 (and supported later by experimental evidence), K41 scaling has been believed to be false and has been replaced by $S_2(r) \sim \epsilon^{2/3} r^{2/3} (r/L)^{\kappa}$ (called K62 in the sequel) for some small value of $\kappa > 0$. Here $L$ is the length scale at which energy is injected and the correction $(r/L)^{\kappa}$ accounts for the effects of small scale intermittency. We refer to [16,15,13,19] for further physical details. The exact value of $\kappa$ and the validity itself of the previous prediction are still open problems, although there is a general agreement about the K62 "anomalous" scaling law.

With respect to these difficult open problems the aim of the present work is very limited; in a sentence, our hope is just to fix some rigorous definitions and preliminary results in order to encourage further investigation by the mathematical community, especially the one dealing with stochastic partial differential equations. To be precise, our aims are the following:
1. We give one possible rigorous definition of scaling law inspired by the previous conjectures; this issue is not a priori obvious due to the fact that the scaling should
hold only in a certain range of $r$'s, which does not extend to zero for finite viscosity $\nu > 0$, but it tends to extend to zero as $\nu \to 0$. We provide some mathematical examples to understand this definition (Remark 1.5, Example 2.3).
2. We rigorously prove that the 2D Navier-Stokes equations on the torus perturbed by a large class of additive white noise cannot fulfill such a scaling. This proves rigorously what is believed on the basis of convincing but still heuristic physical arguments, see the classic papers of Onsager [22], von Neumann [27], Batchelor [1] and Fjørtoft [7]. In particular the work of Lee [20] presents a clear (albeit non-rigorous) argument which excludes the K41 scaling in 2D by showing that such a scaling is incompatible with the conservation of enstrophy.
3. The same result as in (2) is true for the 3D Stokes problem, thus only 3D nonlinear effects could produce either K41 or K62. The relevance of non-linear terms for 3D turbulence of course has been conjectured a long time ago, see for instance Taylor [24,23] and Taylor and Green [25]. We rigorously prove a necessary condition: if such scaling laws are true, then one has lower bounds on vortex stretching. The result is proved for a 3D Navier-Stokes equation on the torus perturbed by additive white noise. These results are a rigorous version of an observation made by Batchelor and Townsend [2].
4. Although we cannot prove or disprove the scaling laws for the 3D stochastic Navier-Stokes equations, at least we give a number of necessary and/or sufficient conditions which could help both to understand the meaning of the scaling properties and for further investigation.

As already mentioned, we base our analysis on the stochastic Navier-Stokes equations on the torus $[0, 1]^d$, with $d = 2, 3$,
$$\frac{\partial u}{\partial t} + (u \cdot \nabla) u + \nabla p = \nu \Delta u + \sum_{\alpha} h_\alpha(x)\, \dot\beta_\alpha(t) \qquad (1.1)$$
supplemented with the incompressibility condition $\operatorname{div} u = 0$ and periodic boundary conditions. Here, $h_\alpha(x)$ denote suitable vector fields and $\beta_\alpha(t)$ denote independent Brownian motions (the torus instead of a more realistic framework has been chosen for mathematical simplicity). Let us remark that the theoretical usefulness of the stochastic Navier-Stokes equations has been noted since the early work of Novikov [21] who observed (albeit in a non-rigorous setup) that the Itô formula implies a simple energy balance equation for the model (see Remark 1.1 below). In the limit $\nu \to 0$, Eq. (1.1) is a singular limit problem much like the boundary layer one, and so may be considered as a prototype of a high Reynolds number singular limit problem, with some mathematical simplifications due to the advantages produced by stochastic analysis.

Let us also remark that for 2D Navier-Stokes equations in unbounded domains (or with large-scale dissipation) the theories of Batchelor [3] and Kraichnan [17] predict that $S_2(r) \approx r^2$ with logarithmic corrections. Moreover, bounds on the energy spectrum for the deterministic 2D Navier-Stokes equations in a periodic domain forced on one or two eigenmodes of the Laplacian have been rigorously established by Constantin et al. [5] (see also [11,12]). It should be noted that another possible and interesting approach to the zero-viscosity limit is the one adopted in [18] (for the 2D case), where the amplitude of the forcing noise is proportional to the square-root of the viscosity.
1.1. Notations about function spaces. Let $T$ be the torus $[0, 1]^d$, with $d = 2, 3$, $\mathbf{L}^2(T)$ be the space of vector fields $u : T \to R^d$ with $L^2(T)$-components, $\mathbf{H}^\alpha(T)$ be the analogous Sobolev spaces, $\mathbf{C}(T)$ be the analogous space of continuous fields. Let $H$ be the space of all fields $u \in \mathbf{L}^2(T)$ such that $\operatorname{div} u = 0$, with zero mean, i.e. $\int_T u(x)\,dx = 0$, and the trace of $u \cdot n$ on the boundary is periodic (where $n$ is the outer normal, see [26], Ch. I, Thm 1.2). Let $V$ be the space of divergence free, zero mean, periodic elements of $\mathbf{H}^1(T)$ and $D(A)$ be the space of divergence free, zero mean, periodic elements of $\mathbf{H}^2(T)$. Finally, let $\mathcal{D}$ be the space of infinitely differentiable divergence free, zero mean, periodic fields on $T$. The spaces $V$, $D(A)$ and $\mathcal{D}$ are dense and compactly embedded in $H$. Let $A : D(A) \subset H \to H$ be the (Stokes) operator $Au = -\Delta u$ (componentwise).

Sometimes we shall also need the same framework for the torus $[0, L]^d$, $d = 2, 3$, with any $L > 0$. We set $T_L = [0, L]^d$, $H_L$ equal to the set of all fields $u \in \mathbf{L}^2(T_L)$ such that $\operatorname{div} u = 0$ and $u \cdot n$ on the boundary is periodic, $V_L$, $D(A_L)$ and $A_L : D(A_L) \subset H_L \to H_L$ the analogs of $V$, $D(A)$ and $A$. Notice only that we define the inner product as $|u|^2_{H_L} = L^{-d} \int_{T_L} |u(x)|^2\,dx$ (so that, roughly speaking, $|u|^2_{H_L} \sim |u(0)|^2$ for homogeneous fields).

1.2. The class $\mathcal{P}$ of probability measures. If $\mu$ is a probability measure on a Banach space $X$ and $f$ is a function on $X$, we use the notation $\mu[f(u)] := \int_X f(u)\,d\mu(u)$ whenever the integral is well defined. Let $\mathcal{P}_0$ be the family of all probability measures $\mu$ on $H$ (equipped with the Borel $\sigma$-algebra) such that $\mu(D(A)) = 1$ ($D(A)$ is a Borel set in $H$). Since $\mathbf{H}^2(T) \subset \mathbf{C}(T)$ by the Sobolev embedding theorem, the elements of $D(A)$ are continuous (have a continuous element in their equivalence class). Consequently, given $x_0 \in T$, the mapping $u \mapsto u(x_0)$ is well defined on $D(A)$, with values in $R^d$. In particular, any expression of the form $\mu[f(u(x_1), \ldots, u(x_n))]$ is well defined for given $x_1, \ldots, x_n \in T$, given $\mu \in \mathcal{P}_0$, and suitable $f : R^{nd} \to R$ (for instance measurable non negative). It follows that $S_2^\mu(r)$ is well defined (possibly infinite) for every $\mu \in \mathcal{P}_0$. The same argument does not apply to $Du(x_0)$ and $D^2u(x_0)$, at least in $d = 3$. This is why we use lengthy expressions like $\mu[\int_T \|Du(x)\|^2\,dx]$ which are meaningful (possibly infinite) for every $\mu \in \mathcal{P}_0$. We denote by $\mathcal{P}$ the class of all $\mu \in \mathcal{P}_0$ such that
$$\mu\left[\int_T \|Du(x)\|^2\,dx\right] < \infty$$
and, for every $a \in T$ and every rotation $R$ that transforms the set of coordinate axes in itself,
$$\mu[f(u(\cdot - a))] = \mu[f(u)], \qquad \mu[f(u(R\,\cdot))] = \mu[f(Ru(\cdot))] \qquad (1.2)$$
for all continuous bounded $f : H \to R$. In plain words, we impose space homogeneity and a discrete form of isotropy (compatible with the symmetries of the torus). In the following we will refer to this symmetry as partial or discrete isotropy.

Discrete isotropy is imposed for two reasons. First, it ensures that $S_2^\mu(r)$ is independent of the coordinate unitary vector $e$, since given two such vectors $e, e'$ there is a rotation $R$ as above such that $Re' = e$, so
$$\mu[\|u(re) - u(0)\|^2] = \mu[\|u(Rre') - u(R0)\|^2] = \mu[\|R(u(re') - u(0))\|^2] = \mu[\|u(re') - u(0)\|^2].$$
Furthermore, we use discrete isotropy through Lemma A.3 in Appendix 3.2. Finally, notice that $S_2^\mu(r) < \infty$ for every $r > 0$ and $\mu \in \mathcal{P}$, by Lemma 2.1 below.

1.3. Definition of scaling law. For every $\mu \in \mathcal{P}$ we introduce the second order structure function
$$S_2^\mu(r) = \mu[\|u(r \cdot e) - u(0)\|^2] \qquad (1.3)$$
for some coordinate unitary vector $e$, with $r > 0$ (the results proved below extend to the so called longitudinal structure function; we consider (1.3) to fix the ideas). The measures of $\mathcal{P}$ are supported on continuous vector fields, so the pointwise operations in (1.3) are meaningful. Moreover, the symmetries in $\mathcal{P}$ imply that $S_2^\mu(r)$ is independent of the coordinate unitary vector $e$ (in addition most of the estimates proved in the sequel extend to every unitary vector $e$).

There is not only one way to define a scaling law. Inspired by the K41 and K62 theories we choose the following definition. We prefer to avoid the additional parameter $L$ and work on the given torus of size one; to express the smallness of the inertial range of $r$'s with respect to the integral scale $L$ we shall restrict the range of $r$'s as explained below.

As a preliminary technical remark, notice that we are going to define K41 and K62 scaling law for a set $\mathcal{M} \subset \mathcal{P} \times R_+$ and not for a family of measures $\{\mu^\nu\}_{\nu > 0}$. The reason is that Eq. (1.1) may have (a priori) more than one stationary measure for any given $\nu$ and in certain claims it seems easier to consider a set of measures for a given $\nu$. Given $\nu > 0$ we use the notation $\mathcal{M}_\nu$ for the set section $\{\mu \in \mathcal{P} : (\mu, \nu) \in \mathcal{M}\}$. Here and in the sequel, when we talk about a set $\mathcal{M} \subset \mathcal{P} \times R_+$, we tacitly assume that $\mathcal{M}_\nu \neq \emptyset$ for all sufficiently small $\nu > 0$, since otherwise several definitions and statements would be just empty.

Given $(\mu, \nu) \in \mathcal{P} \times R_+$, we define the mean energy dissipation rate as
$$\epsilon = \epsilon(\mu, \nu) := \nu \cdot \mu\left[\int_{[0,1]^d} \|Du(x)\|^2\,dx\right].$$

Remark 1.1. If $\mu$ is a stationary measure of (1.1) and a mean energy equality (coming from the Itô formula) can be rigorously proved, one can show that $\epsilon$ does not depend on $(\mu, \nu)$.

Given $(\mu, \nu) \in \mathcal{P} \times R_+$, we also define the quantity
$$\eta = \eta(\mu, \nu) := \nu^{3/4}\, \epsilon(\mu, \nu)^{-1/4}. \qquad (1.4)$$

Remark 1.2. In case of Eqs. (1.1), $\eta$ is a length scale: $\nu$ has dimension $[L]^2[T]^{-1}$, $\epsilon$ has dimension $[L]^2[T]^{-3}$, so $\eta$ has dimension $[L]$. The only combination of $\nu$ and $\epsilon$ in powers, having dimension $[L]$, is the $\eta$ above. This is the simplest reason to choose $\eta$ as a length scale involved in Kolmogorov theory. More refined arguments may be found in [13] and related references.
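The uniqueness claim in Remark 1.2 amounts to a 2x2 linear system for the exponents of ν and of the dissipation rate. The following minimal Python sketch solves that system exactly; the function and variable names are hypothetical and introduced here only for illustration, and the dimensions are the ones stated in the remark.

```python
from fractions import Fraction

# Dimensions written as (length exponent, time exponent), as in Remark 1.2.
NU_DIM = (Fraction(2), Fraction(-1))   # viscosity: [L]^2 [T]^-1
EPS_DIM = (Fraction(2), Fraction(-3))  # dissipation rate: [L]^2 [T]^-3


def length_scale_exponents():
    """Solve a*NU_DIM + b*EPS_DIM = (1, 0) for (a, b) by Cramer's rule.

    The target (1, 0) is the dimension [L] of a length.
    """
    (n_l, n_t), (e_l, e_t) = NU_DIM, EPS_DIM
    det = n_l * e_t - e_l * n_t  # nonzero, so the solution is unique
    a = (Fraction(1) * e_t - e_l * Fraction(0)) / det
    b = (n_l * Fraction(0) - Fraction(1) * n_t) / det
    return a, b


def kolmogorov_length(nu, eps):
    """Length scale nu^(3/4) * eps^(-1/4), as in (1.4)."""
    return nu ** 0.75 * eps ** -0.25
```

Since the determinant of the system is nonzero, the exponents 3/4 and -1/4 are the only solution, which is the content of the remark.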
In the following definition $\kappa$ is a non-negative real number.

Definition 1.3. We say that a Kolmogorov type scaling law with exponent $\frac{2}{3} + \kappa$ holds true for a set $\mathcal{M} \subset \mathcal{P} \times R_+$ if there exist $\nu_0 > 0$, $C > c > 0$, $C_0 > 0$, and a monotone function $R_0 : (0, \nu_0] \to R_+$ with $R_0(\nu) > C_0$ and $\lim_{\nu \to 0} R_0(\nu) = +\infty$, such that the bound
$$c \cdot r^{2/3+\kappa} \le S_2^\mu(r) \le C \cdot r^{2/3+\kappa}$$
holds for every pair $(\mu, \nu) \in \mathcal{M}$ and every $r$ such that $\nu \in (0, \nu_0]$ and $C_0 \cdot \eta(\mu, \nu) < r < \eta(\mu, \nu) \cdot R_0(\nu)$, where $\eta(\mu, \nu)$ is defined by (1.4).

This definition corresponds to K41 theory in the case $\kappa = 0$ and to the more plausible K62 theory in the case $\kappa > 0$. In fact we should restrict our next investigation to the case $\kappa > 0$. However, since the validity of such a scaling law is still an open problem, although plausible, we find it of theoretical interest to analyse the necessary and/or sufficient conditions in the general case $\kappa \ge 0$. The previous definition is a particular case of the following notion.

Definition 1.4. We say that a scaling law with exponent $\alpha \in (0, 2)$ and length scale $\eta : \mathcal{M} \to R_+$ holds true for the structure function $S_2^\mu(r)$ on a set $\mathcal{M} \subset \mathcal{P} \times R_+$, if there exist a decreasing function $R_0 : [0, \infty) \to R_+$, with $\lim_{\nu \to 0} R_0(\nu) = +\infty$, and constants $C_2 \ge C_1 > 0$, $C_3 > 0$, $\nu_0 > 0$, such that $R_0(\nu) > C_3$ and
$$C_1 \cdot r^\alpha \le S_2^\mu(r) \le C_2 \cdot r^\alpha \quad \text{for } r \in [C_3\, \eta(\mu, \nu),\ \eta(\mu, \nu) R_0(\nu)] \qquad (1.5)$$
for every $\nu \in (0, \nu_0)$ and every $\mu \in \mathcal{M}_\nu$.

Remark 1.5. The divergent factor $R_0(\nu)$ in the previous definition is essential to have a non trivial notion. If, on the contrary, we simply ask that the scaling law holds on a bounded interval $r \in [C_3 \eta_\nu, C_4 \eta_\nu]$, we have a definition without real interest. Let us explain this fact with a (useless) definition and an example. Let us say that a family $\mathcal{M} \subset \mathcal{P} \times R_+$ satisfies a local $\alpha$ property, $\alpha < 2$, if there is a function $\eta(\mu, \nu)$ and constants $C_2 \ge C_1 > 0$, $C_4 \ge C_3 > 0$, $\nu_0 > 0$, such that
$$C_1 r^\alpha \le S_2^\mu(r) \le C_2 r^\alpha \quad \text{for } r \in [C_3\, \eta(\mu, \nu),\ C_4\, \eta(\mu, \nu)] \qquad (1.6)$$
for every $\nu \in (0, \nu_0)$ and every $\mu \in \mathcal{M}_\nu$. As an example, consider a case with the mapping $\nu \mapsto \mathcal{M}_\nu$ which is single valued and injective and
$$S_2^{\mu^\nu}(r) = \nu^{-1} r^2,$$
where $\mathcal{M}_\nu = \{\mu^\nu\}$. This function $S_2^{\mu^\nu}(r)$ certainly does not have any interesting scaling exponent (different from 2) but satisfies the previous local $\alpha$ property simultaneously for a continuum of values of $\alpha$. Indeed, given any $\alpha \in (0, 2)$ take $\eta(\mu^\nu, \nu) = \nu^{\frac{1}{2-\alpha}}$; then given a choice of $C_4 \ge C_3 > 0$, for every $r \in [C_3\, \eta(\mu^\nu, \nu), C_4\, \eta(\mu^\nu, \nu)]$, namely for $\nu^{-\frac{1}{2-\alpha}} r \in [C_3, C_4]$, we have
$$S_2^{\mu^\nu}(r) = \left(\nu^{-\frac{1}{2-\alpha}} r\right)^{2-\alpha} r^\alpha \in [C_1, C_2] \cdot r^\alpha$$
with $C_1 = C_3^{2-\alpha}$, $C_2 = C_4^{2-\alpha}$. This example shows that the local $\alpha$ property is not a distinguished scaling property.
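The example of Remark 1.5 can be checked numerically. The sketch below, in Python, samples the bounded window around the length scale ν^(1/(2-α)) and returns the extreme values of the ratio between the toy structure function r²/ν and r^α. The function names, the default window constants 0.5 and 2.0 (standing for C3 and C4) and the sampling density are assumptions made only for this illustration.

```python
def structure_function(r, nu):
    """Toy structure function of Remark 1.5: S_2(r) = r^2 / nu."""
    return r * r / nu


def local_alpha_ratio_bounds(alpha, nu, c3=0.5, c4=2.0, samples=50):
    """Return (min, max) of S_2(r) / r^alpha over r in [c3*eta, c4*eta].

    Here eta = nu^(1/(2-alpha)); c3 and c4 play the role of C3 and C4.
    """
    eta = nu ** (1.0 / (2.0 - alpha))
    ratios = []
    for i in range(samples + 1):
        # Equally spaced points covering the whole window, endpoints included.
        r = eta * (c3 + (c4 - c3) * i / samples)
        ratios.append(structure_function(r, nu) / r ** alpha)
    return min(ratios), max(ratios)
```

For any α in (0, 2) and any ν the returned pair should agree, up to rounding, with (c3^(2-α), c4^(2-α)), independently of ν. This is exactly why a bounded window cannot single out an exponent.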
2. Necessary Conditions for K41

2.1. General results. The results of this subsection apply to suitable families of probability measures, without any use of the Navier-Stokes equations. They will then be applied to the stochastic Navier-Stokes equations in the next subsection.

Given a measure $\mu \in \mathcal{P}$, $\mu \neq \delta_0$, we introduce the number $\theta(\mu)$ defined by the identity
$$\theta(\mu)^2 = \frac{\mu\left[\int_{[0,1]^d} \|Du(x)\|^2\,dx\right]}{\mu\left[\int_{[0,1]^d} \|D^2u(x)\|^2\,dx\right]}, \qquad (2.1)$$
letting $\theta(\mu) = 0$ when $\mu[\int_T \|D^2u(x)\|^2\,dx] = \infty$. If $\mu = \delta_0$, the numerator and denominator vanish and we arbitrarily define $\theta(\mu) = 1$. We have $\theta(\mu) \le C$, where the constant is universal and depends only on the Poincaré constant of the torus. By definition, we have
$$\theta(\mu)^2 = \frac{\epsilon(\mu, \nu)}{\nu \cdot \mu\left[\int_T \|D^2u(x)\|^2\,dx\right]}$$
for every pair $(\mu, \nu) \in \mathcal{P} \times R_+$. It follows from trivial dimensional analysis that $\theta$ has the dimension of a length. We interpret it as an estimate of the length scale where dissipation is more relevant. Indeed, very roughly, from
$$\frac{\int_T \|D^2u(x)\|^2\,dx}{\int_T \|Du(x)\|^2\,dx} \sim \frac{\sum_k |k|^2 \left(|k|^2 |\hat u(k)|^2\right)}{\sum_k |k|^2 |\hat u(k)|^2}$$
we see that $\theta(\mu)^{-2}$ has the meaning of typical square wave length of dissipation (looking at $|k|^2 |\hat u(k)|^2$ as a sort of distribution in wave space of the dissipation).

Lemma 2.1. For every $\mu \in \mathcal{P}$ such that $\theta(\mu) > 0$ we have
$$\frac{1}{4d} \cdot r^2 \le \frac{S_2^\mu(r)}{\mu\left[\int_T \|Du(x)\|^2\,dx\right]} \le r^2 \qquad (2.2)$$
for every $r \in (0, \theta(\mu)/4d]$. The upper bound is true for every $r > 0$ even if $\theta(\mu) = 0$.

Proof. Since we want to use the Taylor formula for elements of $D(A)$, we use the mollification described in Appendix A. We denote by $\mu_\varepsilon$ the mollifications of $\mu$. We prove in Appendix A that, for given $r$ and $\mu$,
$$\lim_{\varepsilon \to 0} \mu_\varepsilon[\|Du(0)\|^2] = \mu\left[\int_T \|Du(x)\|^2\,dx\right],$$
$$\lim_{\varepsilon \to 0} \mu_\varepsilon[\|D^2u(0)\|^2] = \mu\left[\int_T \|D^2u(x)\|^2\,dx\right],$$
$$\lim_{\varepsilon \to 0} \mu_\varepsilon[\|u(re) - u(0)\|^2] = \mu[\|u(re) - u(0)\|^2].$$
By space homogeneity of $\mu_\varepsilon$,
$$\mu_\varepsilon[\|u(re) - u(0)\|^2] \le r^2 \int_0^1 \mu_\varepsilon[\|Du(\sigma e)\|^2]\,d\sigma = r^2\, \mu_\varepsilon[\|Du(0)\|^2]$$
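and thus, by the previous convergence results,
$$\mu[\|u(re) - u(0)\|^2] \le r^2\, \mu\left[\int_T \|Du(x)\|^2\,dx\right].$$
This implies the right-hand inequality of (2.2) for every $r > 0$. On the other hand, for smooth vector fields we have
$$u(re) - u(0) = Du(0)\, re + r^2 \int_0^1 D^2u(\sigma e)(e, e)\,d\sigma$$
and thus
$$\mu_\varepsilon[\|Du(0)\, re\|^2] \le 2\mu_\varepsilon[\|u(re) - u(0)\|^2] + 2\mu_\varepsilon\left[\left\|r^2 \int_0^1 D^2u(\sigma e)(e, e)\,d\sigma\right\|^2\right].$$
Again from space homogeneity of $\mu_\varepsilon$,
$$\mu_\varepsilon\left[\left\|r^2 \int_0^1 D^2u(\sigma e)(e, e)\,d\sigma\right\|^2\right] \le r^4\, \mu_\varepsilon[\|D^2u(0)\|^2],$$
and from Lemma A.3 of Appendix A, $\mu_\varepsilon[\|Du(0)e\|^2] = d^{-1} \mu_\varepsilon[\|Du(0)\|^2]$. Therefore
$$\mu_\varepsilon[\|u(re) - u(0)\|^2] \ge \frac{r^2}{2d}\, \mu_\varepsilon[\|Du(0)\|^2] - r^4\, \mu_\varepsilon[\|D^2u(0)\|^2].$$
We thus have in the limit
$$S_2^\mu(r) \ge \frac{r^2}{2d}\, \mu\left[\int_T \|Du(x)\|^2\,dx\right] - r^4\, \mu\left[\int_T \|D^2u(x)\|^2\,dx\right]$$
and therefore, by definition of $\theta(\mu)$,
$$S_2^\mu(r) \ge \left(\frac{1}{2d} - \frac{r^2}{\theta(\mu)^2}\right) \mu\left[\int_T \|Du(x)\|^2\,dx\right] \cdot r^2.$$
This implies the left-hand inequality of (2.2) for $r \in \left(0, \frac{\theta(\mu)}{4d}\right]$. The proof is complete.

Theorem 2.2. Assume a scaling law with exponent $\alpha \in (0, 2)$ and length scale $\eta : \mathcal{M} \to R_+$ holds true for the structure function $S_2^\mu(r)$ on a set $\mathcal{M} \subset \mathcal{P} \times R_+$. Let $\theta(\mu)$ be the dissipation length scale defined above. Then the two length scales $\theta(\mu)$ and $\eta(\mu, \nu)$ are related by the property
$$\limsup_{\nu \to 0}\ \sup_{\mu \in \mathcal{M}_\nu} \frac{\theta(\mu)}{\eta(\mu, \nu)} < \infty. \qquad (2.3)$$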
Proof. It is intuitively rather clear that (2.2) is in contradiction with (1.5) if the ranges of $r$ where the two properties hold overlap, so we need the bound (2.3). The proof below confirms this intuition by ruling out the possibility that the factor $\mu[\int_T \|Du(x)\|^2\,dx]$ may produce a compensation. Moreover, let us notice that one could believe that the proof of (2.3) is trivial. But Remark 1.5 above shows that (2.2) and (1.6) are compatible: thus we feel that a detailed proof of (2.3) is necessary.

We argue by contradiction and assume that there exists a sequence $(\mu_n, \nu_n) \in \mathcal{M}$, with $\nu_n \to 0$, such that
$$\lim_{n \to \infty} \frac{\theta(\mu_n)}{\eta(\mu_n, \nu_n)} = +\infty. \qquad (2.4)$$
Notice that, in such a case, $\theta(\mu_n)$ must be positive, so Lemma 2.1 applies. Let us consider two sequences $r_n'$ and $r_n''$ defined as follows: $r_n' = C_3\, \eta(\mu_n, \nu_n)$, $r_n'' = r_n' a_n$ with $\lim_{n \to \infty} a_n = +\infty$, $r_n'' \le \eta(\mu_n, \nu_n) R_0(\nu_n)$ and $r_n'' \le \theta(\mu_n)/(4d)$, where we ask that the last two inequalities are satisfied at least eventually. Such a sequence $r_n''$ exists because $\lim_{\nu \to 0} R_0(\nu) = +\infty$ and (2.4) is assumed. We have (eventually) that $r_n'$, $r_n''$ are both in $[C_3\, \eta(\mu_n, \nu_n), \eta(\mu_n, \nu_n) R_0(\nu_n)]$ and $(0, \frac{\theta(\mu_n)}{4d}]$, hence for both $r_n := r_n'$ and $r_n := r_n''$ we have
$$\frac{1}{4d} \beta_n r_n^2 \le S_2^{\mu_n}(r_n) \le \beta_n r_n^2,$$
where we have set $\beta_n = \mu_n[\int_T \|Du(x)\|^2\,dx]$. The contradiction will come from the fact that, if it could happen that $\beta_n$ adjusts the factor $r_n^2$ to produce $r_n^\alpha$, this cannot happen simultaneously for the two sequences $r_n = r_n'$ and $r_n = r_n''$. Indeed, from the previous inequalities we must have
$$C_1 r_n^\alpha \le S_2^{\mu_n}(r_n) \le C_2 r_n^\alpha,$$
$$C_1 r_n^\alpha \le \beta_n r_n^2, \qquad \beta_n r_n^2 \le 4d\, C_2 r_n^\alpha,$$
hence $\beta_n \ge C_1 r_n^{\alpha-2}$, $\beta_n \le 4d\, C_2 r_n^{\alpha-2}$ for both $r_n = r_n'$ and $r_n = r_n''$. But the inequalities
$$\beta_n \ge C_1 (r_n')^{\alpha-2}, \qquad \beta_n \le 4d\, C_2 (r_n'')^{\alpha-2}$$
and the assumption $\alpha < 2$ imply $r_n' \ge C r_n''$ eventually, for a suitable constant $C > 0$. This is impossible since $\lim_{n \to \infty} a_n = +\infty$. The proof is complete.

Example 2.3. Let us give an example of a function of $(\nu, r)$ which satisfies the properties of Definition 1.4 and also (2.2) (to see that they are compatible). It may look artificial, but it was devised on the basis of the vortex model of [10]. The function is
$$S_2^{\mu^\nu}(r) = \int_\eta^1 l^{2/3} \left(\frac{l \wedge r}{l}\right)^2 \frac{dl}{l}$$
with $\eta = \nu^{3/4}$. We have
$$r \le \eta \ \Rightarrow\ S_2^{\mu^\nu}(r) = \int_\eta^1 l^{2/3}\, \frac{r^2}{l^2}\, \frac{dl}{l} = \frac{3}{4} r^2 \left(\nu^{-1} - 1\right)$$
which is essentially the behaviour (2.2). On the other hand,
$$r \in [\eta, 1] \ \Rightarrow\ S_2^{\mu^\nu}(r) = \int_\eta^r l^{2/3}\, \frac{dl}{l} + \int_r^1 l^{2/3}\, \frac{r^2}{l^2}\, \frac{dl}{l} = \frac{9}{4} r^{2/3} - \frac{3}{2} \nu^{1/2} - \frac{3}{4} r^2,$$
which is bounded above and below by the order $r^{2/3}$ since $r \in [\nu^{3/4}, 1]$ ($\nu^{1/2} \le r^{2/3}$).

Let us finally state two general consequences of the previous theorem, that we shall apply to the stochastic Navier-Stokes equations.

Corollary 2.4. Given a family $\mathcal{M} \subset \mathcal{P} \times R_+$, if $\inf_{(\mu,\nu) \in \mathcal{M}} \theta(\mu) > 0$, then no scaling law with exponent $\alpha \in (0, 2)$ may hold true with a length scale $\eta(\mu, \nu)$ such that $\liminf_{\nu \to 0} \inf_{\mu \in \mathcal{M}_\nu} \eta(\mu, \nu) = 0$.

We shall see that this simple corollary applies to the 2D stochastic Navier-Stokes equation and the Stokes problem, so a Kolmogorov type scaling law is ruled out for these systems.

Let us apply the theorem to the case of a Kolmogorov type scaling law. We take, in the previous theorem, $\eta(\mu, \nu) = \nu^{3/4} \epsilon(\mu, \nu)^{-1/4}$ as in Definition 1.3. In the following result, $\mu[\int_T \|D^2u(x)\|^2\,dx]$ may be infinite. In fact, in the next corollary we only use the property of $\eta(\mu, \nu)$ and not the scaling exponent $\frac{2}{3} + \kappa$.

Corollary 2.5. Let $\mathcal{M} \subset \mathcal{P} \times R_+$ be a family having a scaling law with the exponent $\alpha \in (0, 2)$ and the length scale $\eta(\mu, \nu)$ of Definition 1.3. Then there exist $\nu_0 > 0$ and $C > 0$ such that
$$\mu\left[\int_T \|D^2u(x)\|^2\,dx\right] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}$$
for every $\nu \in (0, \nu_0)$ and every $\mu \in \mathcal{M}_\nu$.

Proof. From (2.3), the definition of $\eta(\mu, \nu)$ and the definition of $\theta^2(\mu)$ we have
$$\limsup_{\nu \to 0}\ \sup_{\mu \in \mathcal{M}_\nu} \frac{\mu\left[\int_T \|Du(x)\|^2\,dx\right]}{\nu^{3/2}\, \epsilon(\mu, \nu)^{-1/2}\, \mu\left[\int_T \|D^2u(x)\|^2\,dx\right]} < \infty.$$
Thus, from the definition of $\epsilon(\mu, \nu)$,
$$\limsup_{\nu \to 0}\ \sup_{\mu \in \mathcal{M}_\nu} \frac{\nu^{-5/2}\, \epsilon(\mu, \nu)^{3/2}}{\mu\left[\int_T \|D^2u(x)\|^2\,dx\right]} < \infty.$$
This implies the claim of the corollary.
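The only computation in the proof of Corollary 2.5 is the rewriting of the squared ratio of the two length scales in terms of the dissipation rate, the viscosity and the second-derivative moment. The short Python sketch below checks that algebra on concrete numbers; the function names and the argument `d2` (standing for the mean of the integral of the squared second derivative) are hypothetical and used only for this illustration.

```python
def theta_sq(eps, nu, d2):
    """theta^2 = eps / (nu * d2), with d2 standing for mu[int |D^2 u|^2 dx]."""
    return eps / (nu * d2)


def eta_sq(eps, nu):
    """eta^2 = nu^(3/2) * eps^(-1/2), the square of the length scale (1.4)."""
    return nu ** 1.5 * eps ** -0.5


def ratio_identity_gap(eps, nu, d2):
    """Relative gap between theta^2 / eta^2 and eps^(3/2) * nu^(-5/2) / d2."""
    lhs = theta_sq(eps, nu, d2) / eta_sq(eps, nu)
    rhs = eps ** 1.5 * nu ** -2.5 / d2
    return abs(lhs - rhs) / rhs


def d2_lower_bound(eps, nu, k_bound):
    """Lower bound on d2 implied by theta <= k_bound * eta."""
    return eps ** 1.5 * nu ** -2.5 / k_bound ** 2
```

With these definitions, a bound on the ratio of the length scales by a constant K is the same statement as the lower bound of the corollary with C = K^(-2).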
Remark 2.6. Dimensional analysis says that $\nu$ has dimension $[L]^2[T]^{-1}$, $\epsilon$ has dimension $[L]^2[T]^{-3}$, so $\epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}$ has dimension $[L]^{-2}[T]^{-2}$, the correct dimension of $\mu[\int_T \|D^2u(x)\|^2\,dx]$.

2.2. Application to stochastic Navier-Stokes equations. In this section we consider Eq. (1.1) in dimension 2 and 3 and also the corresponding linear equations (Stokes equations).
2.2.1. The noise. Since we are dealing with spaces of translation invariant measures, we wish to consider classes of noises that produce such measures. Every Gaussian translation invariant noise is 'diagonal' with respect to the Stokes operator $A$ in the sense that eigenmodes are all independent. In order to give a rigorous definition for our driving noise, we define $\Lambda^{(\infty)} := \{k \in 2\pi Z^d : |k| > 0\}$ and we assume that the noise of Eq. (1.1) has the form $\sum_{k \in \Lambda^{(\infty)}} \sigma_k \dot\beta_k(t) e^{-ik \cdot x}$, where $(\beta_k)_{k \in \Lambda^{(\infty)}}$ are independent $d$-dimensional Brownian motions and the coefficients $(\sigma_k)_{k \in \Lambda^{(\infty)}}$ are $d \times d$ complex-valued matrices such that $k \cdot \sigma_k = 0$ and $\sum_{k \in \Lambda^{(\infty)}} |\sigma_k|^2 < \infty$. Additional assumptions are: we assume that $\sigma_k = \sigma_{-k}$ for every $k \in \Lambda^{(\infty)}$, and $|\sigma_k| = |\sigma_{Rk}|$ for all $k \in \Lambda^{(\infty)}$ and for every coordinate rotation $R$. Together, they imply that the vector-valued random field $W(t, x) = \sum_{k \in \Lambda^{(\infty)}} \sigma_k \beta_k(t) e^{-ik \cdot x}$ is, for every $t \ge 0$, real and partially isotropic. Finally, in order to have measures with $\mu(D(A)) = 1$ we assume that
$$\sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2 < \infty,$$
since the values $|k|^2$ correspond to the eigenvalues of $A$.

2.2.2. The two-dimensional case. We assume $d = 2$. The following result is well known.

Lemma 2.7. Let $\mu$ be an invariant measure of (1.1) such that
$$\mu\left[\int_T \|Du(x)\|^2\,dx\right] < \infty.$$
Then $\mu \in \mathcal{P}_0$ and
$$\nu \cdot \mu\left[\int_T \|Du(x)\|^2\,dx\right] = \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |\sigma_k|^2,$$
$$\nu \cdot \mu\left[\int_T \|D \operatorname{curl} u(x)\|^2\,dx\right] = \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2.$$

Proof. Given $\mu$, consider the (product) filtered probability space $(\Omega, \mathcal{A}, (\mathcal{A}_t)_{t \ge 0}, P)$ supporting both a family of independent $d$-dimensional Brownian motions $\beta_k(t)$, $k \in \Lambda^{(\infty)}$, and a non anticipating random variable $u_0 \in \mathcal{A}_0$ with law $\mu$. The corresponding strong solution $u(t, x)$ of (1.1) is a stationary process and satisfies, due to the Itô formula, the balance relations
$$\frac{1}{2} E^P \int_T \|u(t, x)\|^2\,dx + \nu E^P \int_0^t \int_T \|Du(s, x)\|^2\,dx\,ds = \frac{1}{2} E^P \int_T \|u_0(x)\|^2\,dx + \frac{1}{2}\, t \sum_{k \in \Lambda^{(\infty)}} |\sigma_k|^2$$
and
$$\frac{1}{2} E^P \int_T \|\operatorname{curl} u(t, x)\|^2\,dx + \nu E^P \int_0^t \int_T \|D \operatorname{curl} u(s, x)\|^2\,dx\,ds = \frac{1}{2} E^P \int_T \|\operatorname{curl} u_0(x)\|^2\,dx + \frac{1}{2}\, t \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2.$$
The result easily follows from stationarity.
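The two balance identities are easiest to see on the linear (Stokes) problem, where each Fourier mode decouples. The Python sketch below is a cartoon under stated assumptions, not the setting of the lemma: every mode is replaced by a real scalar Ornstein-Uhlenbeck process du = -ν|k|²u dt + σ dβ, whose stationary variance σ²/(2ν|k|²) is a standard fact. The function names and the list-of-pairs representation of the modes are hypothetical.

```python
def stationary_mode_energy(k_sq, sigma_sq, nu):
    """Stationary variance of the scalar OU process du = -nu*k_sq*u dt + sigma dB."""
    return sigma_sq / (2.0 * nu * k_sq)


def dissipation_and_theta_sq(modes, nu):
    """Return (nu * sum |k|^2 E|u_k|^2, theta^2) for a list of (k_sq, sigma_sq).

    The first entry is the cartoon analogue of the mean dissipation rate,
    the second the analogue of theta(mu)^2 from (2.1).
    """
    grad = sum(k_sq * stationary_mode_energy(k_sq, s_sq, nu)
               for k_sq, s_sq in modes)
    hess = sum(k_sq ** 2 * stationary_mode_energy(k_sq, s_sq, nu)
               for k_sq, s_sq in modes)
    return nu * grad, grad / hess
```

In this cartoon the first returned value equals half the sum of the σ² for every ν, and the second equals the ratio of the sum of σ² to the sum of |k|²σ², which does not depend on ν. A dissipation length that stays bounded away from zero as ν goes to 0 is the feature exploited in the two-dimensional case below.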
Corollary 2.8. There exists a positive constant $\theta_0$, independent of $\nu$, such that $\theta(\mu) \ge \theta_0$ for every invariant measure $\mu \in \mathcal{P}$ of (1.1).

Proof. The property $\theta(\mu) \ge \theta_0$ follows from the definition of $\theta(\mu)$ and the two identities of the previous lemma, since $\int_T \|D^2u(x)\|^2 \le C \int_T \|D \operatorname{curl} u(x)\|^2$ for a universal constant $C > 0$.

In the next theorem, when we say that $\mathcal{M} \subset \mathcal{P} \times R_+$ is a family of invariant measures of (1.1), we clearly understand that each element $(\mu, \nu) \in \mathcal{M}$ has the property that $\mu$ is an invariant measure for the Markov semigroup associated to Eq. (1.1) with viscosity equal to $\nu$.

Theorem 2.9. In dimension $d = 2$, a family of invariant measures $\mathcal{M} \subset \mathcal{P} \times R_+$ of (1.1) cannot have any scaling law with exponent $\alpha \in (0, 2)$.

Remark 2.10. Under our assumptions on the noise, invariant measures of (1.1) that belong to $\mathcal{P}$ certainly exist. In principle there could exist invariant measures for (1.1) not belonging to $\mathcal{P}$, but this has recently been excluded under very weak conditions on the driving noise (see [14] and the references therein).

Remark 2.11. Consider Eq. (1.1) without the nonlinear term (called Stokes equations):
$$\frac{\partial u}{\partial t} + \nabla p = \nu \Delta u + \sum_{k \in \Lambda^{(\infty)}} \sigma_k \dot\beta_k(t) e^{-ik \cdot x}$$
in dimension $d = 2, 3$. Let $\mathcal{M} \subset \mathcal{P} \times R_+$ be a family of invariant measures for it. Then the same results of the previous theorem hold true. The proof is the same. Alternatively, one may work componentwise in the Fourier modes and prove easily the claims.

2.2.3. The three-dimensional case. The lack of knowledge about the well posedness of the 3D stochastic Navier-Stokes equations has, among its consequences, the absence of the Markov property, and therefore of the usual notion of invariant measure. One may introduce several variants. Here we adopt the following concept.

Consider the usual Galerkin approximations, recalled in Appendix B. The equation with generic index $n$ in this scheme defines a Markov process, with the Feller property, and has invariant measures, by the classical Krylov-Bogoliubov method: if $X_n^x(t)$ is its solution starting from $x$ and $\nu_t^{n,x}$ is the law of $X_n^x(t)$ on $H$, by the Itô formula it is easy to get a bound of the form (see for instance [9])
$$\sup_{T \ge 0} \frac{1}{T} \int_0^T E\|X_n^x(t)\|_V^2\,dt \le C < \infty,$$
which implies ([4] have been the first ones to use this elegant fast method) the necessary tightness in $T$ of the time averaged measures
$$\mu_T^{n,x} := \frac{1}{T} \int_0^T \nu_t^{n,x}\,dt.$$
If we choose the initial condition $x = 0$, then $\mu_T^{n,x} \in \mathcal{P}$ (in particular it is space homogeneous and partially isotropic), so there exist invariant measures in $\mathcal{P}$ for the Galerkin equation. Denote by $S^n$ the set of all such invariant measures (thus $S^n \subset \mathcal{P}$).
The constant $C$ in the estimate above is also independent of $n$; it follows that the invariant measures of the class $S^n$ just constructed fulfill the bound $\mu_n[\|\cdot\|_V^2] \le C$. In fact it is possible to show that every element of $S^n$ has this property, see [8] (if we do not want to use this property, it is sufficient to restrict the definition of $S^n$ in the sequel). These facts imply that $\cup_n S^n$ is relatively compact in the weak topology of probability measures on $H$. We denote by $\mathcal{P}^G_{NS}(\nu)$ (the superscript $G$ will remind us that we use the particular procedure of Galerkin approximations) the set of limit points of $\cup_n S^n$, precisely defined as follows: a probability measure $\mu$ on $H$ belongs to $\mathcal{P}^G_{NS}(\nu)$ if there is a sequence $k_n \to \infty$ and elements $\mu_{k_n} \in S^{k_n}$ such that $\mu_{k_n}$ converges to $\mu$ in the weak topology of probability measures on $H$. The elements of the set $\mathcal{P}^G_{NS}(\nu)$ are space homogeneous and partially isotropic (these relations are stable under weak convergence). Furthermore, they have the other regularity properties required to belong to $\mathcal{P}$: finite second moment in $V$ comes from the previous estimates, $\mu(D(A)) = 1$ from a regularity result of [6], see also [8], summarized in the following lemma. Therefore $\mathcal{P}^G_{NS}(\nu) \subset \mathcal{P}$.

Lemma 2.12. Given $\nu > 0$, there is a constant $C_\nu > 0$ (depending on $\nu$) such that $\mu_n[|A \cdot|_H^{2/3}] \le C_\nu$ for every $n$ and every invariant measure $\mu_n \in S^n$.

Given $u \in V$, let $S_u$ be the tensor with $L^2(T)$ components
$$S_u = \frac{1}{2}\left(Du + Du^T\right)$$
(called the stress tensor). The scalar field $\langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle$ describes the stretching of the vorticity field. If we set $\xi = \operatorname{curl} u$, then formally we have
$$\partial_t \xi + (u \cdot \nabla)\xi = \nu \Delta \xi + S_u \xi + i \sum_{k \in \Lambda^{(\infty)}} k \times \sigma_k \dot\beta_k e^{-ik \cdot x}.$$
A formal application of the Itô formula yields the inequality
$$\nu \cdot \mu\left[\int_T \|D \operatorname{curl} u(x)\|^2\right] \le \mu\left[\int_T \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\right] + \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2$$
for $\mu \in \mathcal{P}^G_{NS}(\nu)$ (in fact formally the identity). Along with the general results of the previous sections we would get
$$\mu\left[\int_T \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\,dx\right] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-3/2}. \qquad (2.5)$$
This would be the final result of this section, having an interesting physical interpretation. However we are not able to prove it in this form. We analyze the status of this inequality by presenting some related rigorous results. They are of two different natures: Corollary 2.14 reformulates it for the coarse graining scheme given by Galerkin approximations; Corollary 2.18 expresses the most natural statement directly for $\mu \in \mathcal{P}^G_{NS}(\nu)$ but it requires an additional unproved regularity assumption.
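Lemma 2.13. Given $\mu \in \mathcal{P}^G_{NS}(\nu)$, and $\mu_{n_k} \in S^{n_k}$ such that $\mu_{n_k}$ converges to $\mu$ in the weak topology of probability measures on $H$, then
$$\mu[|A \cdot|_H^2] \le \liminf_{k \to \infty} \mu_{n_k}[|A \cdot|_H^2].$$
The same is true for $\mu[\int_T \|D \operatorname{curl} u(x)\|^2\,dx]$ in place of $\mu[|A \cdot|_H^2]$.

Proof. Let $\{\varphi_m\}_{m \in N} \subset C_b(H)$ be a sequence that converges monotonically increasing to $|A \cdot|_H^2$ for every $x \in D(A)$ (it is easy to construct it by cut-off and finite dimensional approximations). Since $\mu(D(A)) = 1$, by the Beppo-Levi theorem $\mu[\varphi_m] \to \mu[|A \cdot|_H^2]$. Given $\varepsilon > 0$, let $m_0$ be such that $\mu[\varphi_{m_0}] \ge \mu[|A \cdot|_H^2] - \varepsilon$. Since $\mu_{n_k}[\varphi_{m_0}] \to \mu[\varphi_{m_0}]$ as $k \to \infty$, eventually in $k$ we thus have $\mu_{n_k}[\varphi_{m_0}] \ge \mu[|A \cdot|_H^2] - 2\varepsilon$, and therefore also $\mu_{n_k}[|A \cdot|_H^2] \ge \mu[|A \cdot|_H^2] - 2\varepsilon$. This proves the first part of the lemma; the second one is similar.

Corollary 2.14. Let $\mathcal{M} \subset \mathcal{P} \times R_+$, with $\mathcal{M}_\nu \subset \mathcal{P}^G_{NS}(\nu)$, be a family with the K41 scaling law, in the sense of Definition 1.4. Then there exist $\nu_0 > 0$ and $C > 0$ such that
$$\liminf_{k \to \infty} \mu_{n_k}\left[\int_T \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\,dx\right] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-3/2}$$
for every $\nu \in (0, \nu_0)$, every $\mu \in \mathcal{M}_\nu$ and every sequence $\mu_{n_k} \in S^{n_k}$ such that $\mu_{n_k}$ converges to $\mu$ in the weak topology of probability measures on $H$.

Proof. From the previous section we know that
$$\mu\left[\int_T \|D^2u(x)\|^2\,dx\right] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}.$$
Since
$$\langle Af, g \rangle_H = \langle \operatorname{curl} f, \operatorname{curl} g \rangle_H \qquad (2.6)$$
for every $f, g \in D(A)$, we have
$$\mu\left[\int_T \|D \operatorname{curl} u(x)\|^2\,dx\right] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}$$
for a suitable universal constant $C > 0$. From the previous lemma we have
$$\liminf_{k \to \infty} \mu_{n_k}\left[\int_T \|D \operatorname{curl} u(x)\|^2\,dx\right] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}.$$
Thus the claim of the corollary will follow from the inequality
$$\nu \cdot \mu_{n_k}\left[\int_T \|D \operatorname{curl} u(x)\|^2\,dx\right] \le \mu_{n_k}\left[\int_T \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\,dx\right] + \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2. \qquad (2.7)$$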
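Let us sketch the proof of this inequality (see [8] for more details). Consider the Galerkin approximations
$$du^{(n)} + \left[\nu A u^{(n)} + \pi^{(n)} B(u^{(n)}, u^{(n)})\right] dt = \sum_{k \in \Lambda^{(n)}} \sigma_k\, d\beta_k\, e^{-ik \cdot x}$$
described in Appendix B. From the Itô formula for $\langle A u^{(n)}(t), u^{(n)}(t) \rangle_H$ we get
$$\langle A u^{(n)}(t), u^{(n)}(t) \rangle_H + 2 \int_0^t \langle A u^{(n)}, \nu A u^{(n)} + \pi^{(n)} B(u^{(n)}, u^{(n)}) \rangle_H\,ds = \langle A u^{(n)}(0), u^{(n)}(0) \rangle_H + M_t^n + \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2,$$
where $M_t^n$ is a square integrable martingale. We have
$$\langle A u^{(n)}, \pi^{(n)} B(u^{(n)}, u^{(n)}) \rangle_H = \langle A u^{(n)}, B(u^{(n)}, u^{(n)}) \rangle_H,$$
since $\pi^{(n)}$ is selfadjoint and commutes with $A$. Besides (2.6) we also have
$$\langle Af, B(g, g) \rangle_H = \langle \operatorname{curl} f, (g \cdot \nabla) \operatorname{curl} g + S_g \operatorname{curl} g \rangle_H,$$
hence $\langle Af, B(f, f) \rangle_H = \langle \operatorname{curl} f, S_f \operatorname{curl} f \rangle_H$ for every $f, g \in D(A)$. Therefore we have
$$|\operatorname{curl} u^{(n)}(t)|_H^2 + \int_0^t \left(2\nu |D \operatorname{curl} u^{(n)}|_H^2 + \langle \operatorname{curl} u^{(n)}, S_{u^{(n)}} \operatorname{curl} u^{(n)} \rangle_H\right) ds \le |\operatorname{curl} u^{(n)}(0)|_H^2 + M_t^n + \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2.$$
This implies (2.7) and the proof is complete.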
Remark 2.15. We cannot conclude (2.5) from the previous corollary without further (unproved) assumptions on $\mu$ or $\{\mu_{n_k}\}$. This could be just a technical point due to the present lack of better regularity estimates for the 3D Navier-Stokes equations, or it could be a facet of a deeper phenomenon. Let us explain it with a cartoon argument. First recall that it is easy to construct, say on the torus $T$, a sequence $\{f_n\}$ of functions converging a.s. to zero, but with $\int_T f_n\,dx = 1$ (or even $\int_T f_n\,dx \to \infty$): just take the mollifiers of a Dirac delta distribution; if we like, the example can be modified so that the $f_n$ tend to develop singularities on a dense zero measure set in $T$, while the a.s. limit is still zero. Thus we see that for the limit measure $\mu$ we could have a small value of $\mu\big[\int_T \langle S_u(x)\,\mathrm{curl}\,u(x), \mathrm{curl}\,u(x)\rangle\,dx\big]$ even if some coarse graining procedure, here represented by the Galerkin approximations, could give us a large value of $\mu_{n_k}\big[\int_T \langle S_u(x)\,\mathrm{curl}\,u(x), \mathrm{curl}\,u(x)\rangle\,dx\big]$. Such arguments raise the question of the physical meaning of the true Navier-Stokes equations and possibly of their coarse graining approximations; this is not our aim, but we wanted to say that the previous corollary may perhaps be considered as a result of possible physical interest in itself, even if we cannot rewrite it in the form (2.5).
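The cartoon argument of Remark 2.15 can be illustrated numerically. The following is only a minimal sketch in one dimension: the triangular bump `f` is our own hypothetical stand-in for a mollified Dirac delta (it is not taken from the paper), and `integral` is a plain midpoint rule. It shows that the integral stays equal to 1 while the value at any fixed point away from the centre eventually vanishes.

```python
def f(n, x, center=0.5):
    """Triangular bump of height n and half-width 1/n centred at `center`.

    Hypothetical stand-in for a mollified Dirac delta on the torus [0, 1].
    """
    return max(0.0, n * (1.0 - n * abs(x - center)))


def integral(n, cells=100_000):
    """Midpoint-rule approximation of the integral of f(n, .) over [0, 1]."""
    h = 1.0 / cells
    return sum(f(n, (i + 0.5) * h) for i in range(cells)) * h


if __name__ == "__main__":
    for n in (10, 50, 100):
        # The mass stays (numerically) equal to 1, the value at x = 0.3 is 0.
        print(n, integral(n), f(n, 0.3))
```

The only point where `f(n, x)` does not tend to zero is the centre itself, a set of measure zero, which is the mechanism the remark alludes to.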
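Lemma 2.16. Given $\mu \in \mathcal{P}^G_{NS}(\nu)$ and any sequence $\mu_{n_k} \in \mathcal{S}^{n_k}$ such that $\mu_{n_k}$ converges to $\mu$ in the weak topology of probability measures on $H$, we also have $\mu_{n_k} \to \mu$ weakly on $[W^{1,3}(T)]^3$.

Proof. From the lemma above, $\{\mu_{n_k}\}$ is bounded in probability on $D(A)$:
\[
\mu_{n_k}\big(|Ax|_H > R\big) = \mu_{n_k}\big(|Ax|_H^{2/3} > R^{2/3}\big) \le R^{-2/3}\,\mu_{n_k}\big[|A\cdot|_H^{2/3}\big] \le \frac{C}{R^{2/3}}.
\]
The embedding of $D(A)$ into $[W^{1,3}(T)]^3$ is compact: recall that the Sobolev embedding theorem gives us $W^{2,2} \subset W^{\beta,\frac{6}{2\beta-1}}$ for every $\beta \in (1,2)$, and the embedding of $W^{\beta,\frac{6}{2\beta-1}}$ in $W^{1,\frac{6}{2\beta-1}}$ is compact; choose then $\beta = 3/2$. Therefore $\{\mu_{n_k}\}$ is tight in $[W^{1,3}(T)]^3$ and we deduce that it converges weakly to $\mu$ also in $[W^{1,3}(T)]^3$.

Corollary 2.17. If $\mu \in \mathcal{P}^G_{NS}(\nu)$ is the weak limit (in $H$ and thus in $[W^{1,3}(T)]^3$) of a sequence $\mu_{n_k} \in \mathcal{S}^{n_k}$ such that $\mu_{n_k}\big[\|\cdot\|_V^{2+\varepsilon}\big] \le C$ for some $\varepsilon, C > 0$, then
\[
\nu\cdot\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] = \frac12 \sum_{k\in\Lambda^{(\infty)}} |\sigma_k|^2.
\]
If in addition $\mu_{n_k}\big[\|\cdot\|_V^{3+\varepsilon}\big] \le C$, then
\[
\nu\cdot\mu\Big[\int_T \|D\,\mathrm{curl}\,u(x)\|^2\,dx\Big] \le \mu\Big[\int_T \langle S_u(x)\,\mathrm{curl}\,u(x), \mathrm{curl}\,u(x)\rangle\,dx\Big] + \frac12 \sum_{k\in\Lambda^{(\infty)}} |k|^2|\sigma_k|^2.
\]

Proof. It is sufficient to apply repeatedly the following fact: if $\mu_n \to \mu$ weakly in a Polish space $X$, $\varphi \in C(X)$ and $\mu_n[|\varphi|^{1+\varepsilon}] \le C$, then $\mu_n[\varphi] \to \mu[\varphi]$. This fact is well known but we provide the proof for completeness. Let $Y_n$ and $Y$ be r.v.'s with law $\mu_n$ and $\mu$ resp., with values in $X$, such that $Y_n \to Y$ a.s. in $X$ (such random variables exist by the Skorokhod representation theorem). Then $\mu_n[\varphi] = E[\varphi(Y_n)]$, $\mu[\varphi] = E[\varphi(Y)]$, so by the Vitali convergence theorem it is sufficient to prove that $\varphi(Y_n)$ is uniformly integrable. By the Hölder and Chebyshev inequalities, with $p = 1+\varepsilon$, $\frac1p + \frac1q = 1$ and $\delta = p/q = \varepsilon$, we have
\[
E\big[|\varphi(Y_n)|\,1_{|\varphi(Y_n)|\ge\lambda}\big] \le \big(E[|\varphi(Y_n)|^p]\big)^{1/p}\,P\big(|\varphi(Y_n)| \ge \lambda\big)^{1/q} \le C\lambda^{-\delta}.
\]
Thus the uniform integrability is proved and the proof is complete.

Corollary 2.18. Let $\mathcal{M} \subset \mathcal{P} \times \mathbb{R}_+$, with $\mathcal{M}_\nu \subset \mathcal{P}^G_{NS}(\nu)$, be a family with the K41 scaling law, in the sense of Definition 1.4. Assume that every $\mu$ in $\mathcal{M}$ is the weak limit of a sequence $\mu_{n_k} \in \mathcal{S}^{n_k}$ such that $\mu_{n_k}\big[\|\cdot\|_V^{3+\varepsilon}\big] \le C$ for some $\varepsilon, C > 0$. Then there exist $\nu_0 > 0$ and $C > 0$ such that (2.5) holds for every $\nu \in (0,\nu_0)$ and every $\mu \in \mathcal{M}_\nu$.

Remark 2.19. If the K41 scaling law holds then vortex stretching must be intense. Heuristically, no geometrical depletion of such stretching may occur (in contrast to the 2D case where the stretching term is zero because $\mathrm{curl}\,u(x)$ is aligned with the eigenvector of eigenvalue zero of $S_u(x)$): indeed, if we extrapolate the behaviour $E[|Du|^2] \sim \frac{1}{\nu}$ as $Du \sim \frac{1}{\sqrt\nu}$, $\mathrm{curl}\,u \sim \frac{1}{\sqrt\nu}$, then we get $E[S_u\,\mathrm{curl}\,u\cdot\mathrm{curl}\,u] \sim \frac{1}{\nu\sqrt\nu}$ if there is no help from the geometry. Another way to explain this idea is the following sort of generalised Hölder inequality.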
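Corollary 2.20. Let $\mathcal{M} \subset \mathcal{P} \times \mathbb{R}_+$, with $\mathcal{M}_\nu \subset \mathcal{P}^G_{NS}(\nu)$, be a family with a scaling law in the sense of Definition 1.3, fulfilling the assumptions of Corollary 2.18. Then there exist $\nu_0 > 0$ and $C > 0$ such that
\[
\Big(\mu\Big[\int_T \|Du\|^2\,dx\Big]\Big)^{1/2} \le C\,\Big(\mu\Big[\int_T S_u\,\mathrm{curl}\,u\cdot\mathrm{curl}\,u\,dx\Big]\Big)^{1/3}
\]
for every $\nu \in (0,\nu_0)$ and every $\mu \in \mathcal{M}_\nu$.

Proof. From the previous corollary and the definition of $\epsilon(\mu,\nu)$ we have
\[
\Big(\mu\Big[\int_T S_u\,\mathrm{curl}\,u\cdot\mathrm{curl}\,u\,dx\Big]\Big)^{1/3} \ge \big(C\,\epsilon^{3/2}(\mu,\nu)\cdot\nu^{-3/2}\big)^{1/3} = C\,\epsilon^{1/2}(\mu,\nu)\cdot\nu^{-1/2} = C\,\Big(\mu\Big[\int_T \|Du\|^2\,dx\Big]\Big)^{1/2}.
\]
The proof is complete.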
3. Necessary and Sufficient Conditions for Kolmogorov Type Scaling Laws

As we said in the introduction, we advise the reader that we cannot prove or disprove a form of K62 law in dimension three. We simply restate the scaling laws of Kolmogorov type in various ways, in the hope of shedding some light on them and encouraging further research. We continue with the notations and concepts just introduced in the last section on the 3D case. The result of this section can be formulated for Definition 1.3, but the presence of the factor $\epsilon(\mu,\nu)^{-1/4}$ in the definition of $\eta(\mu,\nu)$ makes some statements much less direct. So, having in mind the exploratory character of these equivalent conditions, we prefer to adopt a simplified form of our definition of the Kolmogorov type scaling law.

Definition 3.1. We say that a scaling law of Kolmogorov type with exponent $\frac23 + \kappa$ holds true for a set $\mathcal{M} \subset \mathcal{P} \times \mathbb{R}_+$ if there exist $\nu_0 > 0$, $C > c > 0$, $C_0 > 0$, and a monotone function $R_0 : (0,\nu_0] \to \mathbb{R}_+$ with $R_0(\nu) > C_0$ and $\lim_{\nu\to 0} R_0(\nu) = +\infty$, such that the bound
\[
c\cdot r^{2/3+\kappa} \le S_2^\mu(r) \le C\cdot r^{2/3+\kappa} \tag{3.1}
\]
holds for every pair $(\mu,\nu) \in \mathcal{M}$ and every $r$ such that $\nu \in (0,\nu_0]$ and $C_0\nu^{3/4} < r < \nu^{3/4}R_0(\nu)$.

Recalling that $\eta(\mu,\nu) = \nu^{3/4}\epsilon(\mu,\nu)^{-1/4}$, we see that this definition is equivalent to Definition 1.3 if there exist $\epsilon_1 > \epsilon_0 > 0$ such that $\epsilon_0 \le \epsilon(\mu,\nu) \le \epsilon_1$ for all $(\mu,\nu) \in \mathcal{M}$. Unfortunately, in 3D only the upper bound can be proven. However, this could be just a technical problem due to the fact that we can only use weak solutions (for slightly more regular solutions Corollary 2.17 implies that $\epsilon(\mu,\nu)$ would be bounded from above and below).

Consider the auxiliary stochastic Navier-Stokes equations
\[
\partial_t \tilde u(t,x) + (\tilde u(t,x)\cdot\nabla)\,\tilde u(t,x) + \nabla\tilde p(t,x) = \tilde\nu\,\Delta\tilde u(t,x) + \sum_{k\in\Lambda_L^{(\infty)}} \sigma_k\,\dot\beta_k(t)\,e^{-ik\cdot x} \tag{3.2}
\]
on the torus $[0,L]^3$ with $\mathrm{div}\,\tilde u = 0$ and periodic boundary conditions (the set $\Lambda_L^{(\infty)}$ is defined in (B.1)). As we shall see below (see the next section and Lemma B.1), we obtain this equation when we perform the following scaling transformation on the solutions $u$ of the original equation (1.1):
\[
\tilde u(t,x) = L^{1/3}\,u(L^{-2/3}t, L^{-1}x)
\]
(and a suitably defined $\tilde p(t,x)$). The value of $\tilde\nu$ under this transformation is $\tilde\nu = \nu L^{4/3}$. This scaling transformation has been introduced in the mathematical-physics literature, see [19]. What makes it special is that no coefficient depending on the scale parameter appears in front of the noise, so the energy input per unit of time and space is the same for every $L$.

Similarly to the case $L = 1$, we may introduce the (non-empty) set $\mathcal{P}^G_{NS}(\tilde\nu, L)$ of limit points of the (homogeneous and isotropic) invariant measures of the corresponding Galerkin approximations. Let us denote by $\mathcal{P}^G_{NS}$ the set of all pairs $(\mu,\nu)$ such that $\mu \in \mathcal{P}^G_{NS}(\nu)$. Similarly, let us denote by $\tilde{\mathcal{P}}^G_{NS}$ the set of all triples $(\mu,\tilde\nu,L)$ such that $\mu \in \mathcal{P}^G_{NS}(\tilde\nu,L)$.

3.1. Basic equivalent condition. Let us introduce the notation $\mathcal{P}_L$ for the set of probability measures analogous to $\mathcal{P}$, but on the torus $[0,L]^3$. Denote by $\mathcal{P}_\cdot \times \mathbb{R}^2_+$ the set of all triples $(\mu,\tilde\nu,L)$ such that $(\tilde\nu,L) \in \mathbb{R}^2_+$ and $\mu \in \mathcal{P}_L$.

Definition 3.2. We call an admissible region a set $D \subset \mathbb{R}^2_+$ of the following form:
\[
D = \{(\tilde\nu,L) \in \mathbb{R}^2_+ \,;\ \tilde\nu \in (0,\tilde\nu_0),\ L > \tilde R_0(\tilde\nu)\},
\]
where $\tilde\nu_0 > 0$ and $\tilde R_0 : (0,\tilde\nu_0] \to [1,\infty)$ is a strictly decreasing function with $\tilde R_0(\tilde\nu) \to \infty$ as $\tilde\nu \to 0$. An admissible region is depicted in the left-hand side of Fig. 3.1 below.

Condition A. A subset $\tilde{\mathcal{M}} \subset \mathcal{P}_\cdot \times \mathbb{R}^2_+$ is said to satisfy Condition A with anomalous exponent $\kappa$ if there exist an admissible region $D \subset \mathbb{R}^2_+$ and two constants $C > c > 0$ such that
\[
cL^{-\kappa} \le \tilde\mu\big[\|\tilde u(e) - \tilde u(0)\|^2\big] \le CL^{-\kappa} \tag{3.3}
\]
for every $(\tilde\mu,\tilde\nu,L) \in \tilde{\mathcal{M}}$ with $(\tilde\nu,L) \in D$. We have denoted by $\tilde u$ the generic element of $H_L$.

Proposition 3.3. The set $\tilde{\mathcal{P}}^G_{NS}$ satisfies Condition A with anomalous exponent $\kappa$ if and only if the set $\mathcal{P}^G_{NS}$ has a scaling law of Kolmogorov type with exponent $\frac23 + \kappa$, in the sense of Definition 3.1.

Proof. Given $L > 0$, consider the mapping $S_L : H_L \to H$ defined by
\[
(S_L\tilde u)(x) = L^{-1/3}\,\tilde u(Lx). \tag{3.4}
\]
This mapping induces a mapping $S$ from $\mathcal{P}_\cdot \times \mathbb{R}^2_+$ to $\mathcal{P} \times \mathbb{R}_+$ by
\[
S(\tilde\mu,\tilde\nu,L) = \big(S_L^*\tilde\mu,\ \tilde\nu L^{-4/3}\big). \tag{3.5}
\]
Fig. 3.1. Effect of $K^{-1}$ on an admissible domain. (a) Parameter domain for Condition A, with axes $\tilde\nu$ and $\tilde r$ and the value $\tilde\nu_0$ marked. (b) Image of the previous domain, with axes $\nu$ and $r$ and the curve $r = \nu^{3/4}$.
It follows immediately from Theorem B.2 that one has
\[
\mathcal{P}^G_{NS} = S(\tilde{\mathcal{P}}^G_{NS}). \tag{3.6}
\]
Furthermore, it follows immediately from the above definitions that if $(\mu,\nu) = S(\tilde\mu,\tilde\nu,\tilde r)$, then
\[
S_2^\mu(r) = r^{2/3}\int_{H_{\tilde r}} \|u(e) - u(0)\|^2\,d\tilde\mu(u). \tag{3.7}
\]
It therefore follows that, in order to prove the equivalence between Condition A and the Kolmogorov scaling law, it suffices to show that the domains of validity of Eq. 3.3 and of Eq. 3.1 are the same (with possibly different constants and functions $R_0$ and $\tilde R_0$), provided that $(\nu,r)$ and $(\tilde\nu,L)$ are related by
\[
\tilde\nu = \nu r^{-4/3}, \qquad L = r^{-1}. \tag{3.8}
\]
We denote by $K : (\nu,r) \mapsto (\tilde\nu,L)$ the above map.

Condition A implies the condition in Definition 3.1. The domain of validity of Eq. 3.3 is given by
\[
\tilde\nu \le \tilde\nu_0, \qquad L \ge \tilde R_0(\tilde\nu). \tag{3.9}
\]
Under the map $K^{-1}$, this becomes
\[
r \ge \Big(\frac{\nu}{\tilde\nu_0}\Big)^{3/4} \equiv C_0\,\nu^{3/4}, \qquad \frac1r \ge \tilde R_0(\nu r^{-4/3}). \tag{3.10}
\]
Both domains are shown in Fig. 3.1.
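The change of variables (3.8) is easy to experiment with. The sketch below is only an illustration: the function names `K` and `K_inv` and the numerical value chosen for the threshold are our own assumptions. It checks that the lower boundary $r = C_0\nu^{3/4}$ of (3.10), with $C_0 = \tilde\nu_0^{-3/4}$, is mapped by $K$ onto the line $\tilde\nu = \tilde\nu_0$.

```python
import math


def K(nu, r):
    """Map (nu, r) -> (nu_tilde, L) of Eq. (3.8)."""
    return nu * r ** (-4.0 / 3.0), 1.0 / r


def K_inv(nu_tilde, L):
    """Inverse map (nu_tilde, L) -> (nu, r)."""
    return nu_tilde * L ** (-4.0 / 3.0), 1.0 / L


if __name__ == "__main__":
    nu_tilde_0 = 0.25                  # hypothetical threshold, for illustration only
    C0 = nu_tilde_0 ** (-3.0 / 4.0)    # constant appearing in Eq. (3.10)
    for nu in (1e-4, 1e-3, 1e-2):
        r = C0 * nu ** 0.75            # point on the boundary r = C0 * nu^(3/4)
        nu_tilde, L = K(nu, r)
        print(nu, r, nu_tilde, L, math.isclose(nu_tilde, nu_tilde_0))
```

In these variables the dissipation-scale boundary becomes a vertical line, which is why admissible regions are convenient to state in terms of $(\tilde\nu, L)$.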
Fig. 3.2. Effect of $K$ on a domain of the type (3.13). (a) Parameter domain for K41, with axes $\nu$ and $r$, the value $\nu_0$ marked and the curve $r = \nu^{3/4}$. (b) Image of the previous domain, with axes $\tilde\nu$ and $\tilde r$ and the value $\tilde\nu_0$ marked.
Defining the strictly decreasing function $F(x) = x^{-3/4}\tilde R_0(x)$, the second condition of Eq. 3.10 is of course equivalent to
\[
\nu^{-3/4} \ge F(\nu r^{-4/3}). \tag{3.11}
\]
This condition (as can be inferred from Fig. 3.1) can only be satisfied simultaneously with the first condition in Eq. 3.10 if $\nu \le \nu_0 \equiv F(\tilde\nu_0)^{-4/3}$. On this domain, Eq. 3.11 is equivalent to
\[
r \le \Big(\frac{\nu}{F^{-1}(\nu^{-3/4})}\Big)^{3/4} \equiv \nu^{3/4}R_0(\nu), \tag{3.12}
\]
where $R_0(x) = \big(F^{-1}(x^{-3/4})\big)^{-3/4}$. Additionally $R_0$ is well-defined on $(0,\nu_0]$ and it is greater than $C_0$ on this domain. Furthermore, since $F$ is decreasing, $R_0$ is strictly decreasing and it is easy to check that $\lim_{x\to 0} R_0(x) = \infty$ because the same property holds for $F$.

The condition in Definition 3.1 implies Condition A. The domain of validity of Eq. 3.1 is given by
\[
\nu \le \nu_0, \qquad r\nu^{-3/4} \in [C_0, R_0(\nu)]. \tag{3.13}
\]
Under the map $K$, this becomes
\[
\tilde\nu L^{-4/3} \le \nu_0, \qquad \tilde\nu^{-3/4} \in [C_0, R_0(\tilde\nu L^{-4/3})]. \tag{3.14}
\]
The second condition can be rewritten as
\[
\tilde\nu \in [G(\tilde\nu L^{-4/3}), \tilde\nu_0], \tag{3.15}
\]
where we defined $\tilde\nu_0 = C_0^{-4/3}$ and $G(x) = R_0(x)^{-4/3}$. Both of these domains are shown in Fig. 3.2.
We can rewrite as above the condition $\tilde\nu \ge G(\tilde\nu L^{-4/3})$ as
\[
L \ge \Big(\frac{\tilde\nu}{G^{-1}(\tilde\nu)}\Big)^{3/4} \equiv \tilde R_0(\tilde\nu). \tag{3.16}
\]
Again, it is an easy exercise to show that $\tilde R_0$ as defined above is monotone and satisfies $\lim_{x\to 0}\tilde R_0(x) = \infty$. The only points that remain to be clarified are: a. We have not taken the first condition in Eq. 3.14 into account. b. The domain of definition of $R_0$ may not extend to $\tilde\nu_0$. Both problems can be solved at once by simply choosing a smaller value for $\tilde\nu_0$.

Remark 3.4. Consider Eq. 3.2 and Condition A. We are in a situation where the energy injection rate per unit volume is independent of $L$ and $\nu$. In 3D there is clearly a cascade of energy from larger to smaller scales due to various instabilities. The Kolmogorov-Obukhov 1941 theory assumes that the cascade is homogeneous and uniform, so that at scales larger than the dissipation scale the flux of energy per unit of volume and time is independent of $L$ and $\nu$. Under this assumption, it would be natural to conjecture that Condition A holds with $\kappa = 0$. However, the homogeneity assumption underlying this theory was not confirmed by later experiments and investigations. On the contrary, the fluid tends to build up localized structures at every scale larger than the dissipation scale; these structures survive for times longer than the average, and energy is confined in them for a while and then released. This produces space-time intermittency of energy distribution, energy flux and dissipation. The consequence is a depletion of the average value of $\|\tilde u(e) - \tilde u(0)\|^2$ as $L$ increases. It is then more natural to expect $\kappa > 0$ in Condition A and then in Eq. 3.1. This is supported by experiments but theoretically it is still unproved.

3.2. Necessary and sufficient conditions in terms of high and low modes. In this section, for notational simplicity, we drop the tildes in our notation. Recall that an admissible region is defined by
\[
D = \{(\nu,L) \in \mathbb{R}^2_+ \,;\ \nu \in (0,\nu_0),\ L > R_0(\nu)\},
\]
and that Condition A requires $cL^{-\kappa} \le \mu\big[\|u(e) - u(0)\|^2\big] \le CL^{-\kappa}$ for every $(\mu,\nu,L)$ with $(\nu,L) \in D$.
We start with a preparatory lemma which depends on the scaling properties of the stochastic Navier-Stokes equations in an essential way. This is the only point in this section where specific information about the measures is being used.

Lemma 3.5. If $\tilde{\mathcal{P}}^G_{NS}$ satisfies Condition A then there exist constants $C' > c' > 0$ and an admissible region $D'$ such that
\[
L^{-\kappa}c' \le \sum_e \int_{1/2}^{3/2} \mu\big[\|u(\lambda e) - u(0)\|^2\big]\,d\lambda \le C'L^{-\kappa}
\]
for every $(\mu,\nu,L) \in \tilde{\mathcal{P}}^G_{NS}$ with $(\nu,L) \in D'$. The sum is extended to all coordinate unitary vectors $e$. We simply have $C' = (1.5^{2/3}d)\cdot C$, $c' = (0.5^{2/3}d)\cdot c$, and $D'$ defined by $0.5^{4/3}\cdot\nu_0$ and $1.5\,R_0(1.5^{-4/3}\nu)$, where $\nu_0$ and $R_0(\nu)$ define $D$.
Proof. Given $\lambda \in [\frac12, \frac32]$ and $(\mu,\nu,L) \in \tilde{\mathcal{P}}^G_{NS}$, namely $\mu \in \mathcal{P}^G_{NS}(\nu,L)$, consider the measure $\mu_\lambda$ that corresponds to $\mu$ under the transformation $u \mapsto \lambda^{-1/3}u(\lambda\,\cdot)$ used in the previous section, having the property
\[
\mu\big[\|u(\lambda e) - u(0)\|^2\big] = \lambda^{2/3}\,\mu_\lambda\big[\|u(e) - u(0)\|^2\big].
\]
By Theorem B.2 we know that $\mu_\lambda \in \mathcal{P}^G_{NS}(\nu\lambda^{-4/3}, L/\lambda)$, hence $(\mu_\lambda, \nu\lambda^{-4/3}, L/\lambda)$ is in $\tilde{\mathcal{P}}^G_{NS}$. Thus Condition A implies
\[
L^{-\kappa}c \le \mu_\lambda\big[\|u(e) - u(0)\|^2\big] \le CL^{-\kappa}
\]
if $\nu\lambda^{-4/3} < \nu_0$ and $L/\lambda > R_0(\nu\lambda^{-4/3})$. The first condition is true if $\nu < 0.5^{4/3}\nu_0$. The second one if $L > 1.5\,R_0(1.5^{-4/3}\nu)$. The proof can now be easily completed.

Let us use some Fourier analysis on the torus $T_L = [0,L]^d$ (see also Appendix 3.2). Every $u \in H_L$ is given by
\[
u(x) = \sum_{k\in\Lambda_L^{(\infty)}} \hat u(k)\,e^{-ik\cdot x} \quad\text{with}\quad \hat u(k) := L^{-3}\int_{T_L} e^{ik\cdot x}\,u(x)\,dx
\]
and we have the Parseval identity
\[
L^{-3}\int_{T_L} \|u(x)\|^2\,dx = \sum_{k\in\Lambda_L^{(\infty)}} \|\hat u(k)\|^2.
\]
We introduce another condition expressed in terms of the sum of the enstrophy of low modes and energy of high modes.

Condition B. A subset $\tilde{\mathcal{M}} \subset \mathcal{P}_\cdot \times \mathbb{R}^2_+$ is said to satisfy Condition B if there exist an admissible region $D \subset \mathbb{R}^2_+$ and two constants $C > c > 0$ such that
\[
L^{-\kappa}c \le \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1} \|k\|^2\,\mu\big[\|\hat u(k)\|^2\big] + \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|>1} \mu\big[\|\hat u(k)\|^2\big] \le CL^{-\kappa}
\]
for every $(\mu,\nu,L) \in \tilde{\mathcal{P}}^G_{NS}$ such that $(\nu,L) \in D$.

Remark 3.6. Note that both the constants and the admissible regions involved in Conditions A and B need not necessarily be the same.

With this definition, we may establish a first basic theorem as a corollary of the previous lemma.

Theorem 3.7. Condition A implies Condition B.

Proof. For every $u \in H_L$ we have
\[
\|u(\lambda e) - u(0)\|^2 = \frac{1}{L^3}\int_{T_L} \|u(x+\lambda e) - u(x)\|^2\,dx = \sum_{k\in\Lambda_L^{(\infty)}} |e^{ik\cdot\lambda e} - 1|^2\,\|\hat u(k)\|^2
\]
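and thus, for every $\mu \in \mathcal{P}^G_{NS}(\nu,L)$ we have
\[
\sum_e \int_{1/2}^{3/2} \mu\big[\|u(\lambda e) - u(0)\|^2\big]\,d\lambda = \sum_{k\in\Lambda_L^{(\infty)}} \Big(\sum_e \int_{1/2}^{3/2} |e^{ik\cdot\lambda e} - 1|^2\,d\lambda\Big)\,\mu\big[\|\hat u(k)\|^2\big].
\]
But there exist universal constants $C' > c' > 0$ such that
\[
c'\,(\|k\|^2 \wedge 1) \le \sum_e \int_{1/2}^{3/2} |e^{ik\cdot\lambda e} - 1|^2\,d\lambda \le C'\,(\|k\|^2 \wedge 1).
\]
Therefore, the quantities
\[
\sum_e \int_{1/2}^{3/2} \mu\big[\|u(\lambda e) - u(0)\|^2\big]\,d\lambda \quad\text{and}\quad \sum_{k\in\Lambda_L^{(\infty)}} (\|k\|^2 \wedge 1)\,\mu\big[\|\hat u(k)\|^2\big]
\]
are “equivalent”, up to universal constants. This proves the claim.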
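The two-sided bound on the Fourier multiplier used in this proof can be probed numerically. The following is a minimal sketch for $d = 3$ that evaluates the $\lambda$-integral in closed form, using $|e^{it} - 1|^2 = 2 - 2\cos t$, and scans a lattice of wave vectors; the torus size and the cut-off of the scan are arbitrary choices of ours, and a finite scan is of course an illustration, not a proof.

```python
import math


def multiplier(k):
    """Sum over coordinate directions e_j of the integral over lam in [1/2, 3/2]
    of |exp(i * lam * k_j) - 1|^2, evaluated in closed form."""
    total = 0.0
    for a in k:
        if a == 0.0:
            continue  # the integrand vanishes identically in this direction
        # integral of (2 - 2 cos(lam * a)) d lam over [1/2, 3/2]
        total += 2.0 - 2.0 * (math.sin(1.5 * a) - math.sin(0.5 * a)) / a
    return total


def ratio(k):
    """Multiplier divided by min(|k|^2, 1)."""
    k2 = sum(a * a for a in k)
    return multiplier(k) / min(k2, 1.0)


def scan(L=40.0, n_max=12):
    """Smallest and largest ratio over nonzero k in (2*pi/L) Z^3, |k_j| <= n_max*2*pi/L."""
    step = 2.0 * math.pi / L
    lo, hi = float("inf"), 0.0
    rng = range(-n_max, n_max + 1)
    for i in rng:
        for j in rng:
            for m in rng:
                if i == j == m == 0:
                    continue
                q = ratio((i * step, j * step, m * step))
                lo, hi = min(lo, q), max(hi, q)
    return lo, hi


if __name__ == "__main__":
    print(scan())
```

On such a scan the ratio stays bounded away from zero and from infinity, in line with the existence of the constants $c'$ and $C'$.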
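We have at least a partial converse of the previous result. Let us introduce the following condition:

Condition C. A subset $\tilde{\mathcal{M}} \subset \mathcal{P}_\cdot \times \mathbb{R}^2_+$ is said to satisfy Condition C if there exist an admissible region $D \subset \mathbb{R}^2_+$ and two constants $C > c > 0$ such that
\[
L^{-\kappa}c \le \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1/2} \|k\|^2\,\mu\big[\|\hat u(k)\|^2\big] \le \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1} \|k\|^2\,\mu\big[\|\hat u(k)\|^2\big] + \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|>1} \mu\big[\|\hat u(k)\|^2\big] \le CL^{-\kappa}
\]
for every $(\mu,\nu,L) \in \tilde{\mathcal{P}}^G_{NS}$ such that $(\nu,L) \in D$.

Note that Condition C implies directly Condition B. What is more interesting is the following:

Proposition 3.8. Condition C implies Condition A.

Proof. We have $\sum_e |e^{ik\cdot e} - 1|^2 \le C(\|k\|^2 \wedge 1)$ for every $k$. Moreover if $\|k\| \le 1/2$ we have $c\|k\|^2 \le \sum_e |e^{ik\cdot e} - 1|^2$ for some constant $c > 0$. The claim then follows from the next lemma and the following inequality:
\[
\sum_{k\in\Lambda_L^{(\infty)}} \sum_e |e^{ik\cdot e} - 1|^2\,\mu\big[\|\hat u(k)\|^2\big] \ge \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1/2} \sum_e |e^{ik\cdot e} - 1|^2\,\mu\big[\|\hat u(k)\|^2\big] \ge c \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1/2} \|k\|^2\,\mu\big[\|\hat u(k)\|^2\big].
\]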
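Lemma 3.9. $\tilde{\mathcal{P}}^G_{NS}$ satisfies Condition A if and only if it satisfies the following Condition A': there exist $C > c > 0$, and an admissible region $D$ such that
\[
L^{-\kappa}c \le \sum_{k\in\Lambda_L^{(\infty)}} \sum_e |e^{ik\cdot e} - 1|^2\,\mu\big[\|\hat u(k)\|^2\big] \le CL^{-\kappa}
\]
for every $(\mu,\nu,L) \in \tilde{\mathcal{P}}^G_{NS}$ such that $(\nu,L) \in D$.

Proof. From previous computations, we know that for every $\mu \in \mathcal{P}^G_{NS}(\nu,L)$ we have
\[
\sum_e \mu\big[\|u(e) - u(0)\|^2\big] = \sum_{k\in\Lambda_L^{(\infty)}} \sum_e |e^{ik\cdot e} - 1|^2\,\mu\big[\|\hat u(k)\|^2\big],
\]
and this proves the claim.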
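The Fourier identity behind Theorem 3.7 and Lemma 3.9 can be checked on a toy example. The sketch below is a one-dimensional, scalar analogue only: the torus length, the mode numbers and the coefficients are hypothetical values chosen for illustration. It compares the space average of $|u(x+s) - u(x)|^2$ with the sum $\sum_k |e^{iks} - 1|^2 |\hat u(k)|^2$ for a real trigonometric polynomial.

```python
import cmath
import math

L = 7.0  # hypothetical torus length
# Hypothetical Fourier coefficients for positive mode numbers m (k = 2*pi*m/L).
_positive = {1: 0.8 - 0.3j, 2: -0.2 + 0.5j, 5: 0.1 + 0.1j}
# Enforce u_hat(-k) = conj(u_hat(k)) so that u is real valued.
COEFFS = {}
for _m, _c in _positive.items():
    COEFFS[_m] = _c
    COEFFS[-_m] = _c.conjugate()


def u(x):
    """Real trigonometric polynomial u(x) = sum_k u_hat(k) exp(-i k x)."""
    return sum(c * cmath.exp(-1j * 2.0 * math.pi * m / L * x)
               for m, c in COEFFS.items()).real


def mean_square_increment(s, n=256):
    """(1/L) * integral over [0, L] of |u(x+s) - u(x)|^2 dx.

    A uniform rectangle rule is exact here because the integrand is a
    trigonometric polynomial of degree far below n.
    """
    h = L / n
    return sum((u(i * h + s) - u(i * h)) ** 2 for i in range(n)) * h / L


def fourier_side(s):
    """sum_k |exp(i k s) - 1|^2 |u_hat(k)|^2."""
    return sum(abs(cmath.exp(1j * 2.0 * math.pi * m / L * s) - 1.0) ** 2 * abs(c) ** 2
               for m, c in COEFFS.items())


if __name__ == "__main__":
    for s in (0.5, 1.0, 1.5):
        print(s, mean_square_increment(s), fourier_side(s))
```

Under a space homogeneous measure the space average on the left can be replaced by the value at a single point after taking expectations, which is how the identity enters the conditions above.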
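Appendix A. Mollification of Measures

Some computations involving Taylor expansion require more regularity than that of typical fields under $\mu \in \mathcal{P}$. For this reason we introduce mollifications of measures $\mu \in \mathcal{P}$. Note that this technical effort is unnecessary if the noise is more regular, since one can then prove more regularity of the typical elements under $\mu \in \mathcal{P}$.

Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a smooth function with compact support, symmetric, non-negative, strictly positive at zero, with $\int_{\mathbb{R}^d} \varphi(\|x\|)\,dx = 1$. Set $\phi_\varepsilon(x) = \varepsilon^{-d}\varphi(\|x/\varepsilon\|)$, so $\int_{\mathbb{R}^d} \phi_\varepsilon(x)\,dx = 1$; $\{\phi_\varepsilon\}_{\varepsilon>0}$ is a family of usual smooth mollifiers. For every $u \in H$ set $u_\varepsilon(x) = \int_{\mathbb{R}^d} \phi_\varepsilon(x-y)\,u(y)\,dy$. Given $\mu \in \mathcal{P}_0$, the mapping $u \mapsto u_\varepsilon$ in $H$ induces an image measure $\mu_\varepsilon \in \mathcal{P}_0$ which is in fact supported on smooth fields.

Lemma A.1. If $\mu \in \mathcal{P}$, then $\mu_\varepsilon \in \mathcal{P}$.

Proof. Using the change of variables $y' = y + a$ we have
\[
u_\varepsilon(x-a) = \int_{\mathbb{R}^d} \phi_\varepsilon(x-y')\,u(y'-a)\,dy' \overset{\mathcal{L}}{=} \int_{\mathbb{R}^d} \phi_\varepsilon(x-y')\,u(y')\,dy',
\]
where the last equality is understood in law under $\mu$, and it holds true as processes in $x$. Hence $u_\varepsilon(\cdot - a) \overset{\mathcal{L}}{=} u_\varepsilon(\cdot)$, which can be written in terms of measures as
\[
\int_H f(u(\cdot - a))\,d\mu_\varepsilon(u) = \int_H f(u)\,d\mu_\varepsilon(u)
\]
for all bounded continuous $f$, so the space homogeneity of $\mu_\varepsilon$ is proved. Similarly, we have
\[
u_\varepsilon(Rx) = \int_{\mathbb{R}^d} \phi_\varepsilon(R(x - R^{-1}y))\,u(y)\,dy = \int_{\mathbb{R}^d} \phi_\varepsilon(x - R^{-1}y)\,u(y)\,dy
\]
by the symmetry of $\phi_\varepsilon$, and so we can conclude that for all bounded continuous $f$,
\[
\int_H f(u(R\,\cdot))\,d\mu_\varepsilon(u) = \int_H f(Ru(\cdot))\,d\mu_\varepsilon(u).
\]
The proof is complete.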
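Lemma A.2. For every $\mu \in \mathcal{P}$, if
\[
\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] < \infty \quad\text{and}\quad \mu\Big[\int_T \|D^2u(x)\|^2\,dx\Big] < \infty,
\]
then $\mu\big[\|u(re) - u(0)\|^2\big] < \infty$ and
\[
\lim_{\varepsilon\to 0} \mu_\varepsilon\big[\|Du(0)\|^2\big] = \mu\Big[\int_T \|Du(x)\|^2\,dx\Big],
\]
\[
\lim_{\varepsilon\to 0} \mu_\varepsilon\big[\|D^2u(0)\|^2\big] = \mu\Big[\int_T \|D^2u(x)\|^2\,dx\Big],
\]
\[
\lim_{\varepsilon\to 0} \mu_\varepsilon\big[\|u(re) - u(0)\|^2\big] = \mu\big[\|u(re) - u(0)\|^2\big].
\]

Proof. Since for every $u \in D(A)$, $\int_T \|Du_\varepsilon(x)\|^2\,dx$ is trivially bounded by a constant depending on $\int_T \|Du(x)\|^2\,dx$ and $\int_T \|Du_\varepsilon(x)\|^2\,dx \to \int_T \|Du(x)\|^2\,dx$ as $\varepsilon \to 0$, by the Lebesgue theorem, $\mu_\varepsilon\big[\int_T \|Du(x)\|^2\,dx\big] \to \mu\big[\int_T \|Du(x)\|^2\,dx\big]$ as $\varepsilon \to 0$. But $\mu_\varepsilon$ is space homogeneous, hence $\mu_\varepsilon\big[\int_T \|Du(x)\|^2\,dx\big] = \mu_\varepsilon\big[\|Du(0)\|^2\big]$. This proves the first claim. The proof of the second one is entirely similar. For the third one, we have
\[
\|u_\varepsilon(x+re) - u_\varepsilon(x)\|^2 = r^2\,\Big\|\int_0^1 Du_\varepsilon(x+\sigma re)\,e\,d\sigma\Big\|^2 \le r^2 \int_0^1 \|Du_\varepsilon(x+\sigma re)\|^2\,d\sigma
\]
for every $u \in D(A)$, hence
\[
\int_T \|u_\varepsilon(x+re) - u_\varepsilon(x)\|^2\,dx \le r^2 \int_0^1 \int_T \|Du_\varepsilon(x+\sigma re)\|^2\,dx\,d\sigma \le Cr^2 \int_T \|Du(x)\|^2\,dx.
\]
Therefore, again by the Lebesgue theorem,
\[
\lim_{\varepsilon\to 0} \mu_\varepsilon\Big[\int_T \|u(x+re) - u(x)\|^2\,dx\Big] = \mu\Big[\int_T \|u(x+re) - u(x)\|^2\,dx\Big]
\]
and the third claim follows now from the space homogeneity of both $\mu_\varepsilon$ and $\mu$.

We are now in the position to prove a quantitative consequence of isotropy, that we shall use in the sequel. In the next statement we understand that both terms in the equality are either finite and equal, or both infinite.

Lemma A.3. For every $\mu \in \mathcal{P}$ and every coordinate unitary vector $e$ we have
\[
\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] = d\,\mu\Big[\int_T \|Du(x)\cdot e\|^2\,dx\Big].
\]
The same identity holds true for $\mu_\varepsilon$, moreover $\mu_\varepsilon\big[\|Du(0)\|^2\big] = d\,\mu_\varepsilon\big[\|Du(0)\cdot e\|^2\big]$.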
Proof. Step 1. Denote by $e_1, \dots, e_d$ the coordinate unitary vectors. For $u \in D(A)$ we have $\|Du(x)\|^2 = \sum_{ij} |\partial_{x_j}u_i(x)|^2$ and $\|Du(x)\cdot e_j\|^2 = \sum_i |\partial_{x_j}u_i(x)|^2$, thus $\|Du(x)\|^2 = \sum_j \|Du(x)\cdot e_j\|^2$ and so $\mu_\varepsilon\big[\|Du(0)\|^2\big] = \sum_j \mu_\varepsilon\big[\|Du(0)\cdot e_j\|^2\big]$. Therefore
\[
\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] = \sum_j \mu\Big[\int_T \|Du(x)\cdot e_j\|^2\,dx\Big].
\]
It is then sufficient to prove that all terms of the sums on the right-hand sides are equal, in order to prove the first and last claim of the lemma. We shall prove this below in Steps 2 and 3. Finally, the first assertion for $\mu_\varepsilon$ is a particular case of the first claim of the lemma, since $\mu_\varepsilon$ is an element of $\mathcal{P}$.

Step 2. Now, given $j = 1, \dots, d$, by applying a rotation $R$ chosen as in the definition of $\mathcal{P}$ such that $Re_1 = e_j$, for any given $N > 0$,
\[
\mu_\varepsilon\big[\|Du(0)\cdot e_j\|^2 \wedge N\big] = \lim_{r\to 0} \mu_\varepsilon\big[(r^{-2}\|u(re_j) - u(0)\|^2) \wedge N\big] = \lim_{r\to 0} \mu_\varepsilon\big[(r^{-2}\|u(re_1) - u(0)\|^2) \wedge N\big] = \mu_\varepsilon\big[\|Du(0)\cdot e_1\|^2 \wedge N\big].
\]
By monotone convergence in $N$, we get that $\mu_\varepsilon\big[\|Du(0)\cdot e_j\|^2\big]$ is independent of $j$. This proves one of the claims.

Step 3. From the previous step and homogeneity we have that the quantity $\mu_\varepsilon\big[\int_T \|Du(x)\cdot e_j\|^2\,dx\big]$ is also independent of $j$. Arguing as in the proof of the previous lemma, this integral converges to $\mu\big[\int_T \|Du(x)\cdot e_j\|^2\,dx\big]$, which is therefore also independent of $j$. The proof is complete.
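Appendix B. Scaling Theorems

We consider again the torus, $T_L = [0,L]^d$, the energy space $H_L$ with norm $|\cdot|_{H_L}$, the spaces $V_L$, $D(A_L)$, $\mathcal{D}_L$ and the Stokes operator $A_L$ on $T_L$ introduced in Sect. 1.1. We define
\[
\Lambda_L^{(\infty)} = \Big\{k \in \frac{2\pi}{L}\mathbb{Z}^d : |k|^2 > 0\Big\}, \tag{B.1}
\]
and, for the purpose of Galerkin approximations, we introduce also
\[
\Lambda_L^{(n)} = \Big\{k \in \frac{2\pi}{L}\mathbb{Z}^d : 0 < |k|^2 \le \Big(\frac{2\pi n}{L}\Big)^2\Big\}
\]
so that $\Lambda_L^{(\infty)} = \cup_n \Lambda_L^{(n)}$. In particular, $\Lambda^{(\infty)} = \Lambda_1^{(\infty)}$.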
B.1. Scaling theorem for Galerkin approximations. Let $V_L'$ be the dual of $V_L$; with proper identifications we have $V_L \subset H_L \subset V_L'$ with continuous injections. Let $B_L(\cdot,\cdot) : V_L \times V_L \to V_L'$ be the bilinear operator defined for all $u, v, w \in \mathcal{D}_L$ as
\[
\langle w, B_L(u,v)\rangle_{H_L} = \frac{1}{L^d}\sum_{i,j=1}^d \int_{T_L} u_i\,\frac{\partial v_j}{\partial x_i}\,w_j\,dx = \sum_{h+l=k} (l\cdot\hat u(h))\,\hat v(l)\cdot\hat w(k). \tag{B.2}
\]
Given $L > 0$, $\nu > 0$ and $\theta > 0$, consider (formally) the equation in $H_L$,
\[
du + [\nu A_L u + B_L(u,u)]\,dt = \theta \sum_{k\in\Lambda_L^{(\infty)}} \sigma_k^L\,d\beta_k^L\,e^{-ik\cdot x},
\]
where $\beta_k^L = \beta_{Lk}$ and $\sigma_k^L = \sigma_{Lk}$, and $(\beta_k)_{k\in\Lambda^{(\infty)}}$ and $(\sigma_k)_{k\in\Lambda^{(\infty)}}$ have been introduced in Sect. 2.2.1 and are subject to the assumptions imposed therein, so that the random fields
\[
W_L^{(n)}(t,x) = \sum_{k\in\Lambda_L^{(n)}} \sigma_k^L\,\beta_k^L(t)\,e^{-ik\cdot x}
\]
and the field $W_L^{(\infty)}(t,x)$ similarly defined, are space-homogeneous and partially (in the sense of the rotations of the torus) isotropic.

Let $H_L^{(n)}$ be the subspace of $H_L$ corresponding to the modes with wave vectors in $\Lambda_L^{(n)}$ and consider the equation in $H_L^{(n)}$,
\[
du^{(n)} + [\nu A_L u^{(n)} + \pi_L^{(n)} B_L(u^{(n)},u^{(n)})]\,dt = \theta \sum_{k\in\Lambda_L^{(n)}} \sigma_k^L\,d\beta_k^L\,e^{-ik\cdot x}, \tag{B.3}
\]
where $\pi_L^{(n)}$ is the orthogonal projection of $H_L$ onto $H_L^{(n)}$.

Lemma B.1. If $u^{(n)}$ is a solution in $H_L^{(n)}$ of (B.3), with initial condition $u^{(n)}(0)$ and parameters $(\nu, L, \theta)$, then $\tilde u^{(n)}(t,x) := \lambda^\beta u^{(n)}(\lambda^{1+\beta}t, \lambda x)$ is a solution in $H_{L/\lambda}^{(n)}$ of Eq. (B.3) with initial condition $\tilde u^{(n)}(0)$ and parameters $(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$ (but with new Brownian motions).

Proof. This statement is not clear a priori, especially because of the scaling transformation of the nonlinear term, so we give all the details. The solutions $u^{(n)}$ and $\tilde u^{(n)}$ (as a process in $H_{L/\lambda}^{(n)}$) are given as Fourier series by
\[
u^{(n)}(t,x) = \sum_{k\in\Lambda_L^{(n)}} \hat u^{(n)}(t,k)\,e^{-ik\cdot x} \quad\text{and}\quad \tilde u^{(n)}(t,x) = \sum_{k\in\Lambda_{L/\lambda}^{(n)}} \widehat{\tilde u}^{(n)}(t,k)\,e^{-ik\cdot x}.
\]
The Fourier coefficients of $u^{(n)}$ and $\tilde u^{(n)}$ are related by the scaling
\[
\widehat{\tilde u}^{(n)}(t,k) = \frac{\lambda^\beta}{L^d}\int_{T_L} u^{(n)}(\lambda^{1+\beta}t, x')\,e^{ik'\cdot x'}\,dx' = \lambda^\beta\,\hat u^{(n)}(\lambda^{1+\beta}t, k') \tag{B.4}
\]
(where $x' = \lambda x$ and $k' = k/\lambda$).
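From Eq. (B.3) in integral form,
\[
u^{(n)}(t) + \int_0^t [\nu A_L u^{(n)} + \pi_L^{(n)} B_L(u^{(n)},u^{(n)})](s)\,ds = u^{(n)}(0) + \theta \sum_{k\in\Lambda_L^{(n)}} \sigma_k^L\,\beta_k^L(t)\,e^{-ik\cdot x},
\]
we have
\[
\lambda^\beta u^{(n)}(\lambda^{1+\beta}t, \lambda x) + \lambda^{1+2\beta}\int_0^t [\nu A_L u^{(n)} + \pi_L^{(n)} B_L(u^{(n)},u^{(n)})](\lambda^{1+\beta}s, \lambda x)\,ds = \lambda^\beta u^{(n)}(0,\lambda x) + \lambda^{\frac{1+3\beta}{2}}\theta \sum_{k\in\Lambda_{L/\lambda}^{(n)}} \sigma_k^{L/\lambda}\,\tilde\beta_k^{L/\lambda}(t)\,e^{-ik\cdot x},
\]
where $\tilde\beta_k^{L/\lambda}(t) := \lambda^{-\frac{1+\beta}{2}}\beta^L_{k/\lambda}(\lambda^{1+\beta}t)$ are new Brownian motions. The first term on the l.h.s. is $\tilde u^{(n)}(t,x)$, and the first term on the r.h.s. is $\tilde u^{(n)}(0,x)$. In addition, we have
\[
A_{L/\lambda}\tilde u^{(n)}(t,x) = \lambda^{2+\beta}(A_L u^{(n)})(\lambda^{1+\beta}t, \lambda x).
\]
The proof of the claim will be complete if we show that
\[
\lambda^{1+2\beta}[\pi_L^{(n)} B_L(u^{(n)},u^{(n)})](\lambda^{1+\beta}t, \lambda x) = [\pi_{L/\lambda}^{(n)} B_{L/\lambda}(\tilde u^{(n)},\tilde u^{(n)})](t,x).
\]
For every $\varphi \in V_{L/\lambda}$, by using the Fourier expression (B.2) of the non-linear term and the scaling of Fourier coefficients (B.4),
\[
\begin{aligned}
\big\langle \pi_{L/\lambda}^{(n)} B_{L/\lambda}(\tilde u^{(n)},\tilde u^{(n)})(t,\cdot), \varphi\big\rangle_{H_{L/\lambda}}
&= \big\langle B_{L/\lambda}(\tilde u^{(n)},\tilde u^{(n)})(t,\cdot), \pi_{L/\lambda}^{(n)}\varphi\big\rangle_{H_{L/\lambda}} \\
&= \sum_{h+l=k} \big(l\cdot\widehat{\tilde u}^{(n)}(t,h)\big)\,\widehat{\tilde u}^{(n)}(t,l)\cdot\hat\varphi(k) \\
&= \lambda^{1+2\beta} \sum_{h+l=k} \Big(\frac{l}{\lambda}\cdot\hat u^{(n)}\big(\lambda^{1+\beta}t, \tfrac{h}{\lambda}\big)\Big)\,\hat u^{(n)}\big(\lambda^{1+\beta}t, \tfrac{l}{\lambda}\big)\cdot\hat\varphi(k) \\
&= \lambda^{1+2\beta}\big\langle \pi_L^{(n)} B_L(u^{(n)},u^{(n)})(\lambda^{1+\beta}t, \lambda\,\cdot), \varphi\big\rangle_{H_{L/\lambda}},
\end{aligned}
\]
where the sums above are extended to all $h$, $l$ and $k \in \Lambda_{L/\lambda}^{(n)}$ such that $h + l = k$.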
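The exponent bookkeeping in Lemma B.1 can be verified with exact rational arithmetic. The sketch below is only an illustration; the helper names `rescaled_exponents` and `balance` are hypothetical. It checks that each deterministic term of (B.3) picks up the same power $\lambda^{1+2\beta}$, and that the choice $\beta = -1/3$ used in Sect. 3 leaves the noise intensity unchanged while giving $\tilde\nu = \nu L^{4/3}$ for $\lambda = 1/L$.

```python
from fractions import Fraction


def rescaled_exponents(beta):
    """Powers of lambda multiplying (nu, L, theta) in Lemma B.1:
    (nu, L, theta) -> (nu * lam**a, L * lam**b, theta * lam**c)."""
    beta = Fraction(beta)
    return beta - 1, Fraction(-1), (1 + 3 * beta) / 2


def balance(beta):
    """Powers of lambda produced by each deterministic term of the equation
    for lam**beta * u(lam**(1+beta) * t, lam * x)."""
    beta = Fraction(beta)
    a, _, _ = rescaled_exponents(beta)
    time_derivative = beta + (1 + beta)   # prefactor times chain rule in t
    nonlinear = 2 * beta + 1              # two factors of u, one space derivative
    viscous = a + (beta + 2)              # new viscosity times Laplacian of the rescaled field
    return time_derivative, nonlinear, viscous


if __name__ == "__main__":
    print(balance(Fraction(-1, 3)))
    print(rescaled_exponents(Fraction(-1, 3)))
```

For $\beta = -1/3$ the third exponent vanishes, which is the property singled out in Sect. 3: no scale-dependent coefficient appears in front of the noise.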
B.2. Scaling theorem for stationary measures. Similarly to Sect. 2.2.3, denote by $\mathcal{P}^G_{NS}(\nu, L, \theta)$ the set of probability measures that are the limit of homogeneous isotropic invariant measures of Eqs. (B.3). Given $\lambda > 0$ and $\beta \in \mathbb{R}$ and $\mu \in \mathcal{P}^G_{NS}(\nu, L, \theta)$, let $u$ be a random field on $T_L$ with law $\mu$, define the random field $\tilde u$ on $T_{L/\lambda}$ as $\tilde u(x) = \lambda^\beta u(\lambda x)$ and let $\tilde\mu$ be the law of $\tilde u$ on $H_{L/\lambda}$. More intrinsically, $\tilde\mu$ is defined by the relation $\tilde\mu[f(u(\cdot))] = \mu[f(\lambda^\beta u(\lambda\,\cdot))]$ for every bounded continuous $f$ on $H_{L/\lambda}$.
Theorem B.2. If $\mu \in \mathcal{P}^G_{NS}(\nu, L, \theta)$ then $\tilde\mu \in \mathcal{P}^G_{NS}(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$.

Proof. The measure $\mu$ of the theorem is the weak limit of a sequence $\{\mu_{n_k}\}$ of invariant measures on $H_L^{(n_k)}$ of the Galerkin problems with indexes $n_k$. For each $n_k$, let $u^{(n_k)}$ be a stationary solution (on some probability space) of (B.3), with parameters $(\nu, L, \theta)$ and marginal $\mu_{n_k}$. Let $\tilde u^{(n_k)}$ be the rescaled process as above, which is a solution of (B.3) with parameters $(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$ (by the lemma above) and is a stationary process. Its marginal $\tilde\mu_{n_k}$ is the scaling of $\mu_{n_k}$, similarly to the relation defined above between $\tilde\mu$ and $\mu$. Moreover $\tilde\mu_{n_k}$ is an invariant measure for Eq. (B.3) with parameters $(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$. From the weak convergence of $\mu_{n_k}$ to $\mu$ it is now easy to deduce the weak convergence of $\tilde\mu_{n_k}$ to $\tilde\mu$. Therefore $\tilde\mu \in \mathcal{P}^G_{NS}(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$. The proof is complete.
Acknowledgement. The authors wish to warmly thank the anonymous referee for several valuable comments and suggestions, that dramatically improved the paper.
References
1. Batchelor, G.K.: The Theory of Homogeneous Turbulence. Cambridge Monographs on Mechanics and Applied Mathematics. Cambridge: Cambridge University Press, 1953
2. Batchelor, G.K., Townsend, A.A.: Decay of vorticity in isotropic turbulence. Proc. R. Soc. Lond. A 190(1023), 534–550 (1947)
3. Batchelor, G.K.: Computation of the energy spectrum in homogeneous, two-dimensional turbulence. Phys. Fluids 12(2), 233–239 (1969)
4. Chow, P.-L., Khasminskii, R.Z.: Stationary solutions of nonlinear stochastic evolution equations. Stochastic Anal. Appl. 15(5), 671–699 (1997)
5. Constantin, P., Foias, C., Manley, O.P.: Effects of the forcing function spectrum on the energy spectrum in 2-D turbulence. Phys. Fluids 6(1), 427–429 (1994)
6. Da Prato, G., Debussche, A.: Ergodicity for the 3D stochastic Navier-Stokes equations. J. Math. Pures Appl. (9), 82(8), 877–947 (2003)
7. Fjørtoft, R.: On the changes in the spectral distribution of kinetic energy for two-dimensional, nondivergent flow. Tellus 5, 225–230 (1953)
8. Flandoli, F.: An introduction to 3D stochastic fluid dynamics. In: CIME Lectures Series, 2005, available at http://web.math.Unifi.it/users/cime//
9. Flandoli, F., Gatarek, D.: Martingale and stationary solutions for stochastic Navier-Stokes equations. Probab. Theory Related Fields 102(3), 367–391 (1995)
10. Flandoli, F., Gubinelli, M.: Statistics of a vortex filament model. Electron. J. Probab. 10(25), 865–900 (electronic) (2005)
11. Foias, C., Jolly, M.S., Manley, O.P.: Kraichnan turbulence via finite time averages. Commun. Math. Phys. 255(2), 329–361 (2005)
12. Foias, C., Jolly, M.S., Manley, O.P., Rosa, R.: Statistical estimates for the Navier-Stokes equations and the Kraichnan theory of 2-D fully developed turbulence. J. Stat. Phys. 108(3–4), 591–645 (2002)
13. Frisch, U.: Turbulence. Cambridge: Cambridge University Press, 1995
14. Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier-Stokes equations with degenerate stochastic forcing. Ann. of Math. (2), 164(3), 993–1032 (2006)
15. Kolmogorov, A.N.: A refinement of previous hypotheses concerning the local structure of turbulence in a viscous incompressible fluid at high Reynolds number. J. Fluid Mech. 13, 82–85 (1962)
16. Kolmogorov, A.N.: The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers. Proc. Roy. Soc. London Ser. A 434(1890), 9–13 (1991) (translated from the Russian by V. Levin)
17. Kraichnan, R.H.: Inertial ranges in two-dimensional turbulence. Phys. of Fluids 10(7), 1417–1423 (1967)
18. Kuksin, S.B.: The Eulerian limit for 2D statistical hydrodynamics. J. Stat. Phys. 115(1-2), 469–492 (2004)
19. Kupiainen, A.: Statistical theories of turbulence. In: Advances in Mathematical Sciences and Applications, Tokyo: Gakkotosho, 2003
20. Lee, T.D.: Difference between turbulence in a two-dimensional fluid and in a three-dimensional fluid. J. Appl. Phys. 22(4), 524–524 (1951)
21. Novikov, E.A.: Functionals and the random-force method in turbulence theory. Sov. Phys. JETP 20, 1290–1294 (1965)
22. Onsager, L.: Statistical hydrodynamics. Nuovo Cimento (9), 6(Supplemento, 2(Convegno Internazionale di Meccanica Statistica)), 279–287 (1949)
23. Taylor, G.I.: Production and dissipation of vorticity in a turbulent fluid. Proc. R. Soc. Lond. A, 164(916), 15–23 (1938)
24. Taylor, G.I.: Observations and speculations on the nature of turbulence motion (1917). In: G.K. Batchelor, editor, Scientific Papers. Cambridge: Cambridge Univ. Press, 1971
25. Taylor, G.I., Green, A.E.: Mechanism of the production of small eddies from large ones. Proc. Roy. Soc. A 158, 499–521 (1937)
26. Temam, R.: Navier-Stokes Equations, Volume 2 of Studies in Mathematics and its Applications. Third ed., Amsterdam: North-Holland Publishing Co., 1984 (with an appendix by F. Thomasset)
27. von Neumann, J.: Recent theories of turbulence (1949). In: edited by A.H. Taub, Collected Works, Volume VI, London: Pergamon Press, 1961, pp. 437–472

Communicated by A. Kupiainen
Commun. Math. Phys. 278, 31–81 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0395-z
Communications in
Mathematical Physics
Random Matrices, Graphical Enumeration and the Continuum Limit of Toda Lattices N. M. Ercolani1, , K. D. T-R McLaughlin1, , V. U. Pierce2, 1 Dept. of Math., Univ. of Arizona, Tucson, AZ 85721, USA.
E-mail:
[email protected];
[email protected] 2 Dept. of Math., The Ohio State University, Columbus, OH 43210, USA.
E-mail:
[email protected] Received: 19 May 2006 / Accepted: 23 July 2007 Published online: 11 December 2007 – © Springer-Verlag 2007
Abstract: In this paper we derive analytic characterizations for and explicit evaluations of the coefficients of the matrix integral genus expansion. The expansion itself arises from the large N asymptotic expansion of the logarithm of the partition function of N × N Hermitian random matrices. Its g th coefficient is a generating function for graphical enumeration on Riemann surfaces of genus g. The case that we particularly consider is for an underlying measure that differs from the Gaussian weight by a single monomial term of degree 2ν. Our results are based on a hierarchy of recursively solvable differential equations, derived through a novel continuum limit, whose solutions are the coefficients we want to characterize. These equations are interesting in their own right in that their form is related to partitions of 2g + 1 and joint probability distributions for conditioned random walks.
1. Motivation and Background
The study of the Unitary Ensembles (UE) of random matrices [25] begins with a family of probability measures on the space of N × N Hermitian matrices. The measures are of the form
\[ d\mu_t = \frac{1}{Z_N} \exp\{-N\,\mathrm{Tr}[V_t(M)]\}\, dM, \]
where the function $V_t$ is a scalar function, referred to as the potential of the external field, or simply the “external field” for short. Typically it is taken to be a polynomial, and written as follows:
\[ V_t = \frac{1}{2}\lambda^2 + \sum_{j=1}^{\upsilon} t_j \lambda^j. \]
K. D. T-R McLaughlin was supported in part by NSF grants DMS-0451495 and DMS-0200749, as well as a NATO Collaborative Linkage Grant “Orthogonal Polynomials: Theory, Applications, and Generalizations” Ref no. PST.CLG.979738. N. M. Ercolani and V. U. Pierce were supported in part by NSF grants DMS-0073087 and DMS-0412310.
The partition function $Z_N$, which appears as a normalization factor in the UE measures, plays a central role in random matrix theory and its applications. It can be reduced to an integration over the eigenvalues which takes a form proportional to the integral (1.1), below, for the particular case when k = N. When all the coefficients $t_k$ in the external field are set equal to zero the associated ensemble, corresponding to $\mu_0$, is called the Gaussian Unitary Ensemble (GUE). Many simplifications occur in the Gaussian case (see [16] for explanations of any unfamiliar terms):
(1) The partition function, when all $t_k$ vanish, is a Gaussian integral, and can be evaluated exactly.
(2) The matrix moments, $\int \{\mathrm{Tr}(M^j)\}^k \, d\mu_0(M)$, can be evaluated, using Wick’s lemma, in terms of pair correlations of the matrix entries of M, which are complex normal random variables.
(3) The terms in these Wick coupling expansions are, in the manner of Feynman diagrams, in 1-1 correspondence with certain labelled, oriented graphs.
These observations led to the conjecture [5,13] that the logarithm of the partition function has an asymptotic expansion of the form
\[ \log \frac{Z_N(t)}{Z_N(0)} = N^2 e_0(t) + e_1(t) + \frac{1}{N^2} e_2(t) + \cdots, \]
where the coefficients $e_g(t)$ should be locally analytic functions of t. The Taylor coefficients of $e_g$ should enumerate topologically distinct labelled, connected oriented graphs that can be embedded into a Riemann surface of genus g in such a way that the complement of the graph in the surface is a disjoint union of contractible cells. Such a construction is referred to as a g-map (see Sect. 1.3 for a precise definition). The $e_g(t)$ are generating functions for counting the number of g-maps with given numbers of vertices of specified valence. This conjecture was proven in [16] for appropriate domains (see below). The present paper builds on these results to present a more detailed description of the coefficients $e_g(t)$ and related generating functions.
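The correspondence in items (2) and (3) can be made concrete in the simplest case of a single vertex. The following Python sketch is an illustration, not the construction used in [16]: it enumerates all Wick pairings of the 2n half-edges of one 2n-valent vertex and computes the genus of the resulting one-vertex map from Euler's formula V − E + F = 2 − 2g. The function names are hypothetical, and the identification of each pairing with a power of N in a matrix moment depends on the normalization of the Gaussian weight, which is not fixed here.

```python
from collections import Counter


def perfect_matchings(items):
    """Yield every way of pairing up the elements of `items` (even length)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def genus(pairs, n_half_edges):
    """Genus of the one-vertex map whose half-edges 0..n-1 (listed in
    cyclic order around the vertex) are glued according to `pairs`."""
    partner = {}
    for a, b in pairs:
        partner[a], partner[b] = b, a
    # Faces are the cycles of: half-edge -> its partner -> next half-edge.
    seen, faces = set(), 0
    for start in range(n_half_edges):
        if start in seen:
            continue
        faces += 1
        h = start
        while h not in seen:
            seen.add(h)
            h = (partner[h] + 1) % n_half_edges
    edges = n_half_edges // 2
    # Euler's formula with one vertex: 1 - E + F = 2 - 2g.
    return (1 + edges - faces) // 2


def genus_distribution(valence):
    """Number of pairings of a single `valence`-valent vertex, by genus."""
    half_edges = list(range(valence))
    return dict(Counter(genus(p, valence) for p in perfect_matchings(half_edges)))


print(genus_distribution(4))
print(genus_distribution(6))
```

For a 4-valent vertex the three pairings split into two planar ones and one of genus 1, which is the kind of count that the Taylor coefficients of the $e_g$ are conjectured to record.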
More precisely, our interest is to develop a systematic, rigorous description of the fine structure for the large N asymptotics of the following family of integrals:
\[ Z_N^{(k)}(t_1, t_2, \ldots, t_\upsilon) = \int\cdots\int \exp\left\{ -N^2 \left[ \frac{1}{N}\sum_{j=1}^{k} V(\lambda_j; t_1, \ldots, t_\upsilon) - \frac{1}{N^2}\sum_{j \neq \ell} \log|\lambda_j - \lambda_\ell| \right] \right\} d^k\lambda, \]
\[ V(\lambda; t_1, \ldots, t_\upsilon) = V_t(\lambda) = V(\lambda) = \frac{1}{2}\lambda^2 + \sum_{j=1}^{\upsilon} t_j \lambda^j, \qquad (1.1) \]
where the parameters $\{t_1, \ldots, t_\upsilon\}$ are assumed to be such that the integral converges. For example, one may suppose that υ is even, and $t_\upsilon > 0$. We will sometimes refer to the following set of $t = (t_1, \ldots, t_\upsilon)$ for which (1.1) converges. For any given T > 0 and γ > 0, define
\[ \mathbb{T}(T, \gamma) = \left\{ t \in \mathbb{R}^\upsilon : |t| \le T,\ t_\upsilon > \gamma \sum_{j=1}^{\upsilon-1} |t_j| \right\}. \]
The parameter k is an integer that grows with N in such a way that $\lim_{N\to\infty} k/N = x$, where x is a finite non-zero value whose role will be specified more precisely later. In this paper we derive a hierarchy of differential equations which uniquely determine the coefficients in the asymptotic expansion of $\log Z_N^{(N)}$ for monic even coupling parameters; i.e., we present the $e_g(t_{2\nu})$, for arbitrary ν, as solutions to a system of ordinary differential equations. From this one can deduce functional analytic characterizations of these coefficients. Moreover, this ODE system can be solved recursively in g to explicitly construct $e_g(t_{2\nu})$. We illustrate this process by constructing closed form expressions for $e_g(t_{2\nu})$, in which ν appears as a parameter, for low values of g. This analysis of the fine structure of the $e_g$ can be extended to multiple coupling parameters and we present a limited illustration of this for the case of two parameters: $e_g(t_{2\nu_1}, t_{2\nu_2})$.
Remark. In [2], the so-called “Loop Equation” method is used to obtain some information about the fine structure of the coefficients. This approach is based on a formal derivation of a hierarchy of equations for the Cauchy transform of the mean density of eigenvalues. This interesting approach is unsatisfactory in that it relies on several interchanges of singular limits whose justification requires analytical considerations beyond the existence of the complete asymptotic expansion of the partition function. These analytical considerations are the subject of a forthcoming paper by Ercolani and McLaughlin [17].
1.1. Leading order asymptotics. The leading order behavior of $Z_N^{(k)}(t_1, t_2, \ldots, t_\upsilon)$ is rather classical, and is known for a very wide class of external fields V (see, for example, [22]). We will require the following result.
Theorem 1.1. There is T0 > 0 and γ0 > 0 so that for all t ∈ T(T, γ), x ∈ [1/2, 1], and k/N → x as k, N → ∞, the following holds true:
(1)
\[ \lim_{N\to\infty} \frac{1}{k^2} \log\{Z_N^{(k)}(t_1, t_2, \ldots, t_\upsilon)\} = -I(x, t_1, \ldots, t_\upsilon), \qquad (1.2) \]
where
\[ I(x, t_1, \ldots, t_\upsilon) = \inf_{\text{Borel measures } \mu,\ \mu \ge 0,\ \int d\mu = 1} \left[ \frac{1}{x}\int V(\lambda)\,d\mu(\lambda) - \iint \log|\lambda - \eta|\, d\mu(\lambda)\, d\mu(\eta) \right]. \qquad (1.3) \]
(2) There is a unique measure $\mu_V$ which achieves the infimum defined on the right-hand side of (1.3). This measure is absolutely continuous with respect to Lebesgue measure, and
\[ d\mu_V = \psi\, d\lambda, \qquad \psi(\lambda) = \frac{1}{2\pi}\,\chi_{(\alpha,\beta)}(\lambda)\,\sqrt{(\lambda - \alpha)(\beta - \lambda)}\; h(\lambda), \]
where h(λ) is a polynomial of degree υ − 2, which is strictly positive on the interval [α, β] (recall that the external field V is a polynomial of degree υ). The polynomial h is defined by
\[ h(z) = \frac{1}{2\pi i x} \oint \frac{V'(s)}{\sqrt{s - \alpha}\,\sqrt{s - \beta}}\, \frac{ds}{s - z}, \]
where the integral is taken on a circle containing (α, β) and z in the interior, oriented counter-clockwise.
(3) There exists a constant l, depending on V, such that the following variational equations are satisfied by $\mu_V$:
\[ 2\int \log|\lambda - \eta|^{-1}\, d\mu_V(\eta) + x^{-1} V(\lambda) \ge l \quad \text{for } \lambda \in \mathbb{R}\setminus \mathrm{supp}(\mu_V), \qquad (1.4) \]
\[ 2\int \log|\lambda - \eta|^{-1}\, d\mu_V(\eta) + x^{-1} V(\lambda) = l \quad \text{for } \lambda \in \mathrm{supp}(\mu_V). \]
(4) The endpoints α and β are determined by the equations
\[ \int_\alpha^\beta \frac{V'(s)}{\sqrt{(s - \alpha)(\beta - s)}}\, ds = 0, \qquad \int_\alpha^\beta \frac{s\,V'(s)}{\sqrt{(s - \alpha)(\beta - s)}}\, ds = 2\pi x. \]
(5) The endpoints α(x, t) and β(x, t) are actually analytic functions of t and x, which possess smooth extensions to the closure of {x, t : x ∈ [1/2, 1], t ∈ T(T, γ)}. They also satisfy −α(1, 0) = β(1, 0) = 2. In addition, the coefficients of the polynomial h(λ) are also analytic functions of t and x, with smooth extensions to the closure of {x, t : x ∈ [1/2, 1], t ∈ T(T, γ)}, with h(λ, x = 1, t = 0) = 1.
Remark. The variational problem appearing in (1.3) is a fundamental component in the theory of random matrices, as well as integrable systems and approximation theory. It is well known (see, for example, [29]) that under general assumptions on V, the infimum is achieved at a unique measure $\mu_V$, called the equilibrium measure.
For external fields V that are analytic in a neighborhood of the real axis, and with sufficient growth at ∞, the equilibrium measure is supported on finitely many intervals, with density that is analytic on the interior of each interval, behaving at worst like a square root at each endpoint, (see [10] and [11]). Remark. We call the reader’s attention to the parameter, x, in the formulation of the variational problem. We will consider the variational problem for x ∈ (0, 1], and we are particularly interested in x near 1. This parameter represents the asymptotic ratio of k to N : x = lim N →∞ k/N .
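The endpoint conditions in part (4) of Theorem 1.1 are easy to test numerically. The sketch below is an illustration under stated assumptions, not part of the paper's argument: it uses Gauss–Chebyshev quadrature (a choice made here because the weight $1/\sqrt{(s-\alpha)(\beta-s)}$ is exactly the Chebyshev weight), takes x = 1, and restricts to even potentials so that α = −β. All function names are hypothetical.

```python
import math


def cheb_integral(f, alpha, beta, n=64):
    """Gauss-Chebyshev approximation of the integral over (alpha, beta)
    of f(s) / sqrt((s - alpha) * (beta - s)); exact for polynomials of
    degree below 2n, up to rounding."""
    mid, half = 0.5 * (alpha + beta), 0.5 * (beta - alpha)
    total = 0.0
    for i in range(1, n + 1):
        s = mid + half * math.cos((2 * i - 1) * math.pi / (2 * n))
        total += f(s)
    return math.pi / n * total


def endpoint_residuals(dV, alpha, beta, x=1.0):
    """Residuals of the two endpoint equations; dV is the derivative V'."""
    first = cheb_integral(dV, alpha, beta)
    second = cheb_integral(lambda s: s * dV(s), alpha, beta) - 2.0 * math.pi * x
    return first, second


def symmetric_endpoint(dV, x=1.0, lo=1e-6, hi=10.0, iterations=200):
    """Bisection for beta when alpha = -beta (even V assumed), using the
    second endpoint equation, which is increasing in beta for the
    potentials tried here."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if endpoint_residuals(dV, -mid, mid, x)[1] < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Gaussian case V = s^2 / 2: the endpoints should be -2 and 2.
print(endpoint_residuals(lambda s: s, -2.0, 2.0))
# Total mass of psi for the Gaussian case (h = 1), expected to be close to 1.
print(cheb_integral(lambda s: (s + 2.0) * (2.0 - s) / (2.0 * math.pi), -2.0, 2.0))
# Quartic perturbation V = s^2 / 2 + t s^4 with the sample value t = 0.1.
t = 0.1
beta = symmetric_endpoint(lambda s: s + 4.0 * t * s ** 3)
print(beta, beta ** 2 / 4.0)
```

For the quartic example the computed value z = β²/4 can be compared with the quadratic relation z + 12 t z² = 1, which follows from evaluating the second endpoint integral in closed form at x = 1.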
Remark. For a proof of (1.2), we refer the reader to [22]; however, this result is commonly known in the approximation theory literature.
Remark. It will prove useful to adopt the following alternative presentation for the function ψ:
\[ \psi(\lambda) = \frac{1}{2\pi i} R_+(\lambda)\, h(\lambda), \qquad \lambda \in (\alpha, \beta), \]
where the function R(λ) is defined via $R(\lambda)^2 = (\lambda - \alpha)(\lambda - \beta)$, with R(λ) analytic in C \ [α, β], and normalized so that R(λ) ∼ λ as λ → ∞. The subscript ± in $R_\pm(\lambda)$ denotes the boundary value obtained from the upper (lower) half plane.
1.2. Complete asymptotic expansion. In [16] it was established that a complete large N asymptotic expansion of (1.1) exists. In this paper we will use a straightforward generalization of this result:
Theorem 1.2. There is T > 0 and γ > 0 so that for t ∈ T(T, γ), and x = k/N in a neighborhood of x = 1, one has the N → ∞ asymptotic expansion
\[ \log \frac{Z_N^{(k)}(t)}{Z_N^{(k)}(0)} = k^2 e_0(x, t) + e_1(x, t) + \frac{1}{k^2} e_2(x, t) + \cdots. \qquad (1.5) \]
The meaning of this expansion is: if you keep terms up to order $k^{-2h}$, the error term is bounded by $C k^{-2h-2}$, where the constant C is independent of x and t for all t ∈ T(T, γ) and for all x in the neighborhood of 1. For each j, the function $e_j(x, t)$ is an analytic function of the (complex) vector (x, t), in a neighborhood of (1, 0). Moreover, the asymptotic expansion of derivatives of $\log Z_N^{(k)}$ may be calculated via term-by-term differentiation of the above series.
Remark. In [16], this result was established in the case where x = 1, and under the assumption that t ∈ T(T, γ), for T small enough, and γ large enough, so that Theorem 1.2 holds true. Under these assumptions, Theorem 1.3 (below) was established. However, as observed in [16] (Remark 2.1, p. 2), the domain so defined is by no means the largest domain where the asymptotic expansion can be rigorously established. All that is required is the existence of a path through the space of parameters (values of x and t) connecting (x, t) to (1, 0) in such a way that all along the path, the associated equilibrium measure is supported on a single interval, with strict variational inequality on the support, strict positivity on the interval of support, and vanishing like a square root at both endpoints of the support. The collection of all such values of (x, t) defines a suitable candidate for a maximal domain, and the proof contained in [16] can easily be extended to show that the asymptotic expansion of the partition function holds on the interior of such a domain. In particular, the above theorem may be easily deduced along these lines.
Remark. Recently, Bleher and Its [8] have carried out a similar asymptotic expansion of the partition function for a 1-parameter family of external fields. A very interesting aspect of their work is that they establish the nature of the asymptotic expansion of the partition function through a critical phase transition.
1.3. Graphical enumeration and the partition function expansion. Our goal in the work we present here is to establish analytical characterizations of the coefficients $e_g$ and, when possible, to derive explicit expressions for these coefficients. This is what we mean by the fine structure of the expansion. In addition to providing the first proof of the asymptotic expansion described in Theorem 1.2, [16] also provides a very detailed explanation of the connection between the asymptotic expansion and enumerative geometry, originally investigated by physicists in the 70s and 80s (see, for example, [5,13], and references contained therein). Equipped with the existence of the asymptotic expansion (and the subsequent result that it may be differentiated term by term), one shows that there is a geometric characterization of each $e_g$ as a generating function for enumerating topologically distinct embeddings of graphs into Riemann surfaces of genus g. A map D on a compact, oriented connected surface X is a pair D = (K(D), [ı]), where
(1) K(D) is a connected 1-complex;
(2) [ı] is an isotopical class of inclusions ı : K(D) → X;
(3) the complement of K(D) in X is a disjoint union of open cells (faces);
(4) the complement of $K_0(D)$ (vertices) in K(D) is a disjoint union of open segments (edges).
The $e_g$ enumerate labelled maps. To be precise we introduce the notion of a g-map which is a map in which the surface X is the closed, oriented Riemann surface of genus g and which in addition carries a labelling (ordering) of the vertices.
Theorem 1.3 [16]. The coefficients in the asymptotic expansion (1.5) satisfy the following relations. Let g be a nonnegative integer. Then
\[ e_g(t_1, \ldots, t_\upsilon) = \sum_{n_j \ge 1} \frac{1}{n_1! \cdots n_\upsilon!}\, (-t_1)^{n_1} \cdots (-t_\upsilon)^{n_\upsilon}\, \kappa_g(n_1, \ldots, n_\upsilon), \]
in which each of the coefficients $\kappa_g(n_1, \ldots, n_\upsilon)$ is the number of g-maps with $n_j$ j-valent vertices for j = 1, . . . , υ.
1.4. Outline. The organization of this paper is as follows: In Sect. 2 we present the new results concerning the fine structure of the $e_g$ and related generating functions that will be proven and further explained in the remainder of the paper. Section 3 is concerned with the leading order term, $e_0$. The results here are fundamental for the characterization of all the higher order terms. We derive closed form expressions for $e_0$ as a function of each of the valence coupling parameters $t_{2\nu}$. We also relate these evaluations directly and explicitly to the enumeration of planar graphs. In Sect. 4 a continuum limit of the Toda Lattice hierarchy is rigorously derived in which the hierarchy of Toda times corresponds to the valence coupling parameters $t_{2\nu}$. This continuum limit is then used to derive another hierarchy of differential equations whose solutions are the $e_g$. Finally in Sect. 5 we show how the differential equations derived in the previous section are used to inductively generate explicit expressions for the $e_g$. From this we characterize the function-theoretic structure of the $e_g$ as well as present explicit formulae for the $e_g$ for low values of g. We also show how our results may be extended to the case of multiple times.
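To make the planar counts of Theorem 1.3 concrete ahead of Sect. 2, the following Python sketch tabulates $\kappa_0(n)$ from the closed formula stated in Theorem 2.1 below, $\kappa_0(n) = c_\nu^n (\nu n - 1)!/((\nu - 1)n + 2)!$ with $c_\nu = 2\nu\binom{2\nu-1}{\nu-1}$, and compares the resulting power series with the closed form $e_0 = \eta(z-1)(z-r) + \frac12\log z$. The comparison assumes x = 1 and ν ≥ 2, solves the algebraic relation for z by bisection, and uses hypothetical function names; it is a numerical sanity check, not a proof.

```python
from fractions import Fraction
from math import comb, factorial, log


def c_nu(nu):
    """The constant c_nu = 2 nu * binom(2 nu - 1, nu - 1)."""
    return 2 * nu * comb(2 * nu - 1, nu - 1)


def kappa0(nu, n):
    """Closed formula for kappa_0(n) quoted from Theorem 2.1."""
    return Fraction(c_nu(nu) ** n * factorial(nu * n - 1),
                    factorial((nu - 1) * n + 2))


def e0_series(nu, t, terms=40):
    """Partial sum of sum_{n>=1} kappa_0(n) (-t)^n / n!, with t a Fraction."""
    total = sum(kappa0(nu, n) * (-t) ** n / factorial(n)
                for n in range(1, terms + 1))
    return float(total)


def z_of_t(nu, t, iterations=200):
    """Root in (0, 1] of 1 = z + c_nu * t * z**nu (x = 1, t >= 0), by bisection."""
    c, lo, hi = c_nu(nu), 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid + c * t * mid ** nu - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def e0_closed(nu, t):
    """e_0 = eta (z - 1)(z - r) + log(z) / 2, valid here for nu >= 2."""
    z = z_of_t(nu, t)
    eta = (nu - 1) ** 2 / (4 * nu * (nu + 1))
    r = 3 * (nu + 1) / (nu - 1)
    return eta * (z - 1) * (z - r) + 0.5 * log(z)


print([int(kappa0(2, n)) for n in (1, 2, 3)])
print(e0_closed(2, 1e-3), e0_series(2, Fraction(1, 1000)))
```

The sample values of t are chosen small so that the truncated series is well inside its region of convergence; larger t would need more care.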
2. Results
For $e_0$ we have explicit formulas for monic even times.
Theorem 2.1. For potentials V of the form $V = \frac{1}{2}\lambda^2 + t_{2\nu}\lambda^{2\nu}$, the asymptotic expansion (1.5) holds true for all $t_{2\nu} \ge 0$, and in addition, we have the explicit formula
\[ e_0 = \eta (z - 1)(z - r) + \frac{1}{2}\log(z), \]
where
\[ \eta = \frac{(\nu - 1)^2}{4\nu(\nu + 1)}, \qquad r = \frac{3(\nu + 1)}{\nu - 1}, \qquad z = \frac{\beta^2}{4}. \]
Here 4z can be interpreted as the global analytic continuation of $\beta^2$, which determines the support (−β, β) of the equilibrium measure. The variable z is locally an analytic function of $t_{2\nu}$, which satisfies the algebraic relation
\[ 1 = z + 2\nu \binom{2\nu - 1}{\nu - 1} x^{\nu - 1}\, t_{2\nu}\, z^{\nu}. \]
The singularities of $e_0$ occur at z = 0 and z = ∞. The time derivative
\[ \frac{\partial e_0}{\partial t_{2\nu}} = \binom{2\nu - 1}{\nu - 1} z^{\nu} \left( (\nu - 1) z - (\nu + 1) \right) \]
is polynomial in z. One also has a local analytical representation (here the index $n_{2\nu}$ is replaced by n, so that $\kappa_0(0, \ldots, 0, n_{2\nu})$ becomes $\kappa_0(n)$),
\[ e_0(t_{2\nu}) = \sum_{n=1}^{\infty} \kappa_0(n)\, \frac{(-t_{2\nu})^n}{n!}, \qquad \kappa_0(n) = (c_\nu)^n\, \frac{(\nu n - 1)!}{((\nu - 1) n + 2)!}, \qquad c_\nu = 2\nu \binom{2\nu - 1}{\nu - 1}, \]
where $\kappa_0(n) = \kappa_0(n_{2\nu})$ is the number of 0-maps with n vertices, all 2ν-valent, so that $e_0$ is the generating function for 2ν-valent 0-maps.
To get a handle on how the higher coefficients $e_g$ depend on the parameters $t = t_{2\nu}$ we exploit a remarkable relation between the partition function $Z_N^{(N)}(t)$ and the solutions to the hierarchy of completely integrable semi-infinite Toda lattice equations. This relation is classically known, coming from several different directions: for Toda lattice equations and Jacobi matrices see [19]; for orthogonal polynomials and Jacobi matrices see, for example, [21]; for Hankel matrices and orthogonal polynomials see, for example, [30]; and for orthogonal polynomials and random matrix theory see [24]. This relation will be
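further explained in Sect. 4. These differential equations may be succinctly expressed through the semi-infinite tri-diagonal matrix
\[ \mathcal{L} = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ b_0^2 & 0 & 1 & 0 & \cdots \\ 0 & b_1^2 & \ddots & \ddots & \ddots \\ 0 & 0 & \ddots & 0 & 1 \\ \vdots & \vdots & \ddots & b_n^2 & \ddots \end{pmatrix}. \]
The Toda Lattice system at level 2ν can then be defined as
\[ \frac{1}{2}\frac{d b_k^2}{d\xi} = (\mathcal{L}^{2\nu})_{k+2,k} - (\mathcal{L}^{2\nu})_{k+1,k-1}, \qquad (2.1) \]
\[ (\mathcal{L}^{2\nu})_{k+1,k-1} = \sum_{\substack{i_1, i_2, \ldots, i_{2\nu+1};\ |i_{j+1} - i_j| = 1; \\ i_1 = k+1,\ i_{2\nu+1} = k-1}} \mathcal{L}_{k+1,i_2}\, \mathcal{L}_{i_2,i_3} \cdots \mathcal{L}_{i_{2\nu},k-1}. \qquad (2.2) \]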
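The right-hand side of (2.2) is a sum over nearest-neighbour walks, and this can be checked by brute force on a finite truncation of the matrix. The Python sketch below is only an illustration: it assumes 0-based indexing with ones on the superdiagonal and the values $b_i^2$ on the subdiagonal, uses arbitrary sample integers for the $b_i^2$, and compares an entry of the matrix power with the explicit walk sum. The helper names are hypothetical.

```python
from itertools import product
from math import comb


def toda_matrix(b2):
    """Truncated tridiagonal matrix: ones above the diagonal and b2[i]
    in position (i + 1, i), with 0-based indices (an assumed convention)."""
    size = len(b2) + 1
    L = [[0] * size for _ in range(size)]
    for i in range(size - 1):
        L[i][i + 1] = 1
        L[i + 1][i] = b2[i]
    return L


def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(A, p):
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(p):
        result = mat_mul(result, A)
    return result


def walk_sum(L, start, end, length):
    """Sum over walks with steps +1 or -1 from `start` to `end` of the
    product of the matrix entries along the walk."""
    size, total = len(L), 0
    for steps in product((-1, 1), repeat=length):
        pos, weight = start, 1
        for step in steps:
            nxt = pos + step
            if not 0 <= nxt < size:
                weight = 0
                break
            weight *= L[pos][nxt]
            pos = nxt
        if pos == end:
            total += weight
    return total


nu, k = 2, 4
L = toda_matrix([1, 2, 3, 4, 5, 6, 7, 8])  # sample values for b_i^2
print(mat_pow(L, 2 * nu)[k + 1][k - 1], walk_sum(L, k + 1, k - 1, 2 * nu))
print(comb(2 * nu, nu - 1))  # number of walks when the boundary is not reached
```

A walk of length 2ν from k + 1 to k − 1 has ν + 1 down-steps and ν − 1 up-steps, which is why the number of walks is a binomial coefficient when the truncation boundary plays no role.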
The sum here is indexed by walks of length 2ν along the 1D integer lattice from k + 1 to k − 1. The solution of this system may be expressed directly in terms of the partition function Z k (t1 , t) = Z k(k) (t1 , t) associated to the potential V = 21 λ2 + t1 λ + tλ2ν : 2 d 1 bk2 (ξ ) = k log Z k (t1 , s)t1 =−k −1/2 ξ1 =0,s=2ξ k ν−1 . 2k 2 dt1 2 As a dynamical system, (2.1) should be considered as an initial value problem, with bk (0)2 = k. We can now state our next main result which characterizes the continuum limit of the Toda lattice hierarchy. Theorem 2.2. For all t ≥ 0, bk2 has a valid asymptotic expansion of the form 1 1 bk2 k z 0 (s) + 2 z 1 (s) + 4 z 2 (s) + · · · , k k where s = −2k ν−1 t. The terms of this expansion are determined by the following partial differential scheme: 1 (ν) F ( f, f w , f ww , f www ) + · · · k2 1 1 + 2g Fg(ν) ( f, f w , f w(2) , · · · , f w(2g+1) ) + · · · k
f s = cν f ν f w +
evaluated at w=1
where 1 1 f 1 (s, w) + · · · + 2g f g (s, w) + · · · , and 2 k k 1 1 f (s, 1) = z 0 (s) + 2 z 1 (s) + 4 z 2 (s) + · · · , k k f g (s, w) = w 1−2g z g (w ν−1 s). f (s, w) = f 0 (s, w) +
;
Note that $b_k^2$ and $k f(s, 1)$ possess the same asymptotic expansion. The forcing term $F_j^{(\nu)}(\cdots)|_{w=1}$ is a homogeneous multinomial of degree ν + 1 in the $f_{w^{(r)}}$ which does not contain any instances of $z_\alpha$ for α ≥ j. These forcing terms have the following form:
\[ F_g^{(\nu)} = \sum_{\substack{V :\ |V| = 2g+1 \\ \rho(V) \le \nu + 1}} d_V^{(\nu,g)}\, f^{\,\nu - \rho(V) + 1} \prod_{j=1}^{2g+1} \left( \frac{f_{w^{(j)}}}{j!} \right)^{r_j(V)}, \]
where $V = \bigcup_{m=1}^{\rho(V)} V_m$ is a partition of 2g + 1; $r_j$ is the number of times a “part”, $V_m$, of cardinality $|V_m| = j$ appears in the partition; $\rho = \rho(V) = \sum_j r_j(V)$; and
\[ d_V^{(\nu,g)} = \frac{1}{\prod_{j=1}^{2g+1} r_j!} \sum_{1 \le i_1 \ldots} \]
… 1 or elements of some complex manifold called the manifold of spectral parameters. Suppose g possesses a non-degenerate invariant scalar product (·, ·). An r-matrix is called unitary if (x, r(u, v)y) = −(r(v, u)x, y).
Remark 2. There are several algebraic interpretations of the Yang-Baxter equation ([7–9]). For our purposes the interpretation from Lemma 1.1 is the most convenient. All definitions lead to the same equation for r(u, v) provided that the r-matrix is unitary. In particular, it is easy to see [8] that Eq. (1.4) is equivalent to the classical Yang-Baxter equation written in the tensor form. The unitary r-matrices were classified in [7]. The case of the non-unitary r-matrix was considered in ([8, 9]). There is no classification of r-matrices in the general case.
It turns out that a theory of (non-unitary) r-matrices can be developed in the special case of associative algebras. Let A be an associative algebra. Let r(u, v) be a meromorphic function in two complex variables with values in End(A). For each u ∈ C we denote by $A_u$ a vector space canonically isomorphic to A. Let $\tilde{A} = \oplus_u A_u$. We define a product on the space $\tilde{A}$ by the formula
\[ x_u y_v = (x(r(u, v)y))_u + ((r(v, u)x)y)_v. \qquad (1.5) \]
86
A. Odesskii, V. Sokolov
Lemma 1.2. The product (1.5) defines a structure of an associative algebra on $\tilde{A}$ iff r(u, v) satisfies the following equation:
\[ (r(u, w)x)(r(u, v)y) - r(u, v)((r(v, w)x)y) - r(u, w)(x(r(w, v)y)) \in \mathrm{Null}(A), \qquad (1.6) \]
where Null(A) is the set of z ∈ A such that zt = tz = 0 for all t ∈ A.
Proof. The proof of the lemma is straightforward.
Definition. The relation
\[ (r(u, w)x)(r(u, v)y) - r(u, v)((r(v, w)x)y) - r(u, w)(x(r(w, v)y)) = 0 \qquad (1.7) \]
is called the associative Yang-Baxter equation.
Lemma 1.3. Let g be a Lie algebra with the brackets [x, y] = xy − yx. Then any solution of (1.7) is a solution of (1.4).
Proof. The proof of the lemma is straightforward.
Let A = $\mathrm{Mat}_n$. It is easy to see that any operator in End(A) has the form $x \to a_1 x b^1 + \cdots + a_p x b^p$ for some matrices $a_1, \ldots, a_p, b^1, \ldots, b^p$. Moreover, p is the smallest possible for such a representation iff the sets of matrices $\{a_1, \ldots, a_p\}$ and $\{b^1, \ldots, b^p\}$ are both linearly independent.
Theorem 1.1. Let
\[ r(u, v)x = a_1(u, v)\, x\, b^1(v, u) + \cdots + a_p(u, v)\, x\, b^p(v, u), \]
where $a_1(u, v), \ldots, b^p(u, v)$ are meromorphic functions with values in $\mathrm{Mat}_n$ such that $\{a_1(u, v), \ldots, a_p(u, v)\}$ are linearly independent over the field of meromorphic functions in u, v, as well as $\{b^1(u, v), \ldots, b^p(u, v)\}$. Then r(u, v) satisfies (1.7) iff there exist meromorphic functions $\phi^k_{i,j}(u, v, w)$ and $\psi^{i,j}_k(u, v, w)$ such that (with summation over repeated indices)
\[ a_i(u, v)\, a_j(v, w) = \phi^k_{i,j}(u, v, w)\, a_k(u, w), \]
\[ b^i(u, v)\, b^j(v, w) = \psi^{i,j}_k(u, v, w)\, b^k(u, w), \qquad (1.8) \]
\[ b^i(u, v)\, a_j(v, w) = \phi^i_{j,k}(v, w, u)\, b^k(u, w) + \psi^{k,i}_j(w, u, v)\, a_k(u, w). \]
The tensors $\phi^k_{i,j}(u, v, w)$ and $\psi^{i,j}_k(u, v, w)$ satisfy the following equations:
\[ \phi^s_{i,j}(u, v, w)\, \phi^l_{s,k}(u, w, t) = \phi^l_{i,s}(u, v, t)\, \phi^s_{j,k}(v, w, t), \]
\[ \psi^{i,j}_s(u, v, w)\, \psi^{s,k}_l(u, w, t) = \psi^{i,s}_l(u, v, t)\, \psi^{j,k}_s(v, w, t), \qquad (1.9) \]
\[ \phi^s_{j,k}(v, w, t)\, \psi^{l,i}_s(t, u, v) = \phi^l_{s,k}(u, w, t)\, \psi^{s,i}_j(w, u, v) + \phi^i_{j,s}(v, w, u)\, \psi^{l,s}_k(t, u, w). \]
Proof. The proof of the theorem is similar to the proof of Theorem 3.1 from [1].
Remark 3. It is easy to give an invariant description of the corresponding algebraic structure. In the case of a constant r-matrix this leads to the infinitesimal bi-algebras [20] described in the Introduction.
Remark 4. A similar statement holds in the case of a semi-simple algebra A.
Classical Yang-Baxter Equation Solution and Quiver Representations
Example 1. Let A = $\mathrm{Mat}_n$ and
\[ r(u, v)x = \frac{1}{u - v}\, e(u, v)\, x\, f(v, u), \]
where
\[ e(u, v)e(v, w) = e(u, w), \qquad f(u, v)f(v, w) = f(u, w), \]
\[ e(u, v) f(v, w) = \frac{u - v}{u - w}\, e(u, w) + \frac{v - w}{u - w}\, f(u, w). \qquad (1.10) \]
Then r(u, v) is an associative r-matrix. These equations hold if we assume, for example, that e(u, v) = 1, $f(u, v) = (u + C)(v + C)^{-1}$, where C is an arbitrary constant matrix.
Example 2. Let A = $\mathbb{C}^p$. The algebra A has a basis $\{e_i,\ i = 1, \ldots, p\}$ such that $e_i e_j = \delta_{i,j} e_i$. The formula
\[ r(u, v)e_i = \sum_{1 \le j \le p} \frac{\psi_i(v)}{\phi_j(u) - \phi_i(v)}\, e_j \]
gives an associative r-matrix for any functions $\phi_1, \ldots, \phi_p, \psi_1, \ldots, \psi_p$ of one variable, where $\phi_1, \ldots, \phi_p$ are not constant. This r-matrix can be written in the form
\[ r(\vec{u}, \vec{v})e_i = \sum_{1 \le j \le p} \frac{\psi_i(\vec{v})}{u_j - v_i}\, e_j, \]
where $\vec{u} = (u_1, \ldots, u_p)$, $\vec{v} = (v_1, \ldots, v_p)$, and $\psi_i(\vec{v})$ are functions of p variables. In this case the manifold of spectral parameters is $\mathbb{C}^p$.
2. Compatible Products and Solutions to the Classical Yang-Baxter Equation
Two Lie brackets [·, ·] and $[\cdot, \cdot]_1$ defined on the same vector space g are said to be compatible if $[\cdot, \cdot]_\lambda = [\cdot, \cdot] + \lambda[\cdot, \cdot]_1$ is a Lie bracket for any λ. In the papers [13–16] different applications of the notion of compatible Lie brackets to the integrability theory have been considered. Suppose that the bracket [·, ·] is rigid, i.e. $H^2(g, g) = 0$ with respect to [·, ·]. In this case the Lie algebras with the brackets $[\cdot, \cdot]_\lambda$ are isomorphic to the Lie algebra with the bracket [·, ·] for almost all values of the parameter λ. This means that there exists a meromorphic function $\lambda \to S_\lambda$ with values in End(g) such that $S_0 = \mathrm{Id}$ and
\[ [S_\lambda(x), S_\lambda(y)] = S_\lambda([x, y] + \lambda[x, y]_1). \qquad (2.11) \]
Theorem 2.1. The formula
\[ r(u, v) = \frac{1}{u - v}\, S_u S_v^{-1} \qquad (2.12) \]
defines a solution to the classical Yang-Baxter equation (1.4).
Proof. For r(u, v) given by (2.12), Eq. (1.4) is equivalent to
\[ \frac{1}{(u - v)(u - w)}\, [S_u S_w^{-1}(x), S_u S_v^{-1}(y)] - \frac{1}{(u - v)(v - w)}\, S_u S_v^{-1}([S_v S_w^{-1}(x), y]) - \frac{1}{(u - w)(w - v)}\, S_u S_w^{-1}([x, S_w S_v^{-1}(y)]) = 0. \qquad (2.13) \]
Using (2.11), we get
\[ [S_u S_w^{-1}(x), S_u S_v^{-1}(y)] = S_u\left([S_w^{-1}(x), S_v^{-1}(y)] + u\,[S_w^{-1}(x), S_v^{-1}(y)]_1\right), \]
\[ S_u S_v^{-1}([S_v S_w^{-1}(x), y]) = S_u\left([S_w^{-1}(x), S_v^{-1}(y)] + v\,[S_w^{-1}(x), S_v^{-1}(y)]_1\right), \]
\[ S_u S_w^{-1}([x, S_w S_v^{-1}(y)]) = S_u\left([S_w^{-1}(x), S_v^{-1}(y)] + w\,[S_w^{-1}(x), S_v^{-1}(y)]_1\right). \]
Substituting these expressions into the left hand side of (2.13), we obtain the statement.
Remark 1. It is clear that the r-matrix (2.12) is unitary with respect to an invariant form (·, ·) if the operator $S_\lambda$ is orthogonal. In this case formula (2.11) implies that the form (·, ·) is invariant with respect to the second bracket.
Two associative algebras with multiplications (x, y) → xy and (x, y) → x ◦ y defined on the same finite dimensional vector space A are said to be compatible if the multiplication (0.1) is associative for any constant λ. Suppose $H^2(A, A) = 0$ with respect to the first multiplication; then there exists a meromorphic function $\lambda \to S_\lambda$ with values in End(A) such that $S_0 = \mathrm{Id}$ and
\[ S_\lambda(x) S_\lambda(y) = S_\lambda(xy + \lambda\, x \circ y). \qquad (2.14) \]
The Taylor decomposition of $S_\lambda$ at λ = 0 has the following form:
\[ S_\lambda = 1 + R\lambda + T\lambda^2 + \cdots, \qquad (2.15) \]
where R, T, . . . are some linear operators on A. Substituting this decomposition into (2.14) and equating the coefficients of λ, we obtain the formula
\[ x \circ y = R(x)y + xR(y) - R(xy), \qquad (2.16) \]
where R is defined by (2.15). It is clear that for any a ∈ A the transformation
\[ R \longrightarrow R + \mathrm{ad}_a, \qquad (2.17) \]
where $\mathrm{ad}_a$ is the linear operator v → av − va, does not change the multiplication ◦.
Definition. Operators R and R′ are said to be equivalent if R − R′ = $\mathrm{ad}_a$ for some a ∈ A.
The following analog of Theorem 2.1 can be proved similarly.
Theorem 2.2. Suppose that $S_\lambda$ satisfies (2.14); then formula (2.12) defines a solution to the associative Yang-Baxter equation (1.7).
Remark 2. In the important particular case $S_\lambda = 1 + \lambda R$ the r-matrix (2.12) is equivalent to
\[ r(u, v) = \frac{1}{u - v} + (v + R)^{-1}. \qquad (2.18) \]
Let A = $\mathrm{Mat}_N$. Consider the following classification problem: describe all possible associative multiplications ◦ compatible with the usual matrix product in A. Since $H^2(A, A) = 0$ for any semi-simple associative algebra A, an operator-valued meromorphic function $S_\lambda$ with the properties $S_0 = \mathrm{Id}$ and (2.14) exists for any such multiplication and the multiplication is given by formula (2.16).
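The invariance of (2.16) under the transformation (2.17) holds because $\mathrm{ad}_a$ is a derivation of the matrix product. A small Python check with exact integer arithmetic is sketched below; the operator R(x) = a x b and all sample matrices are arbitrary choices made for illustration (this R is not claimed to give an associative ◦), and the helper names are hypothetical.

```python
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def mat_sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def circ(R, x, y):
    """Formula (2.16): x o y = R(x) y + x R(y) - R(x y)."""
    return mat_sub(mat_add(mat_mul(R(x), y), mat_mul(x, R(y))),
                   R(mat_mul(x, y)))


def ad(c):
    """The operator ad_c : v -> c v - v c."""
    return lambda v: mat_sub(mat_mul(c, v), mat_mul(v, c))


# Arbitrary sample data (assumptions for the illustration).
a = [[1, 2], [0, 1]]
b = [[0, 1], [3, -1]]
c = [[2, -1], [4, 5]]
x = [[1, 0], [2, 3]]
y = [[-1, 4], [0, 2]]


def R(v):
    return mat_mul(mat_mul(a, v), b)


def R_shifted(v):
    return mat_add(R(v), ad(c)(v))


print(circ(R, x, y) == circ(R_shifted, x, y))
```

The equality is exact here because all entries are integers; the same check works for any square matrices of equal size.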
Example. Let a ∈ $\mathrm{Mat}_N$ be an arbitrary matrix and R be the operator of left multiplication by a. Then (2.16) yields the multiplication x ◦ y = xay, which is associative and compatible with the standard one. It is clear that $S_\lambda$ can be chosen in the form $S_\lambda(x) = (1 + \lambda a)x$. In this case we have
\[ r(u, v) = \frac{1}{u - v} + (v + a)^{-1}. \]
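The claims of this Example can be verified with exact rational arithmetic for 2 × 2 matrices. The Python sketch below checks relation (2.14) for $S_\lambda(x) = (1 + \lambda a)x$ and the associative Yang-Baxter equation (1.7) for the r-matrix built directly from formula (2.12), that is $r(u, v)x = (u - v)^{-1}(1 + ua)(1 + va)^{-1}x$, rather than from the equivalent form displayed above. The matrix a, the spectral values, and the helper names are sample assumptions.

```python
from fractions import Fraction


def mat(rows):
    return [[Fraction(v) for v in row] for row in rows]


def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def scale(c, x):
    return [[c * p for p in row] for row in x]


def inv2(m):
    """Inverse of an invertible 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


I2 = mat([[1, 0], [0, 1]])
a = mat([[1, 2], [3, 5]])  # sample matrix; 1 + v a is invertible for the values used


def circ(x, y):
    """Second product of the Example: x o y = x a y."""
    return mul(mul(x, a), y)


def S(lam, x):
    """S_lambda(x) = (1 + lambda a) x."""
    return mul(add(I2, scale(lam, a)), x)


def r(u, v, x):
    """Formula (2.12) applied to x: S_u S_v^{-1} x / (u - v)."""
    Pu = add(I2, scale(u, a))
    Pv_inv = inv2(add(I2, scale(v, a)))
    return scale(Fraction(1) / (u - v), mul(mul(Pu, Pv_inv), x))


def aybe_residual(u, v, w, x, y):
    """Left-hand side of the associative Yang-Baxter equation (1.7)."""
    t1 = mul(r(u, w, x), r(u, v, y))
    t2 = r(u, v, mul(r(v, w, x), y))
    t3 = r(u, w, mul(x, r(w, v, y)))
    return add(t1, scale(-1, add(t2, t3)))


x = mat([[1, 0], [2, 3]])
y = mat([[-1, 4], [0, 2]])
lam = Fraction(3, 7)
lhs = mul(S(lam, x), S(lam, y))
rhs = S(lam, add(mul(x, y), scale(lam, circ(x, y))))
print(lhs == rhs)
print(aybe_residual(5, 2, 3, x, y))
```

Working over the rationals avoids any tolerance choice: the residual of (1.7) is expected to be the zero matrix exactly.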
Any linear operator R on the space $\mathrm{Mat}_N$ may be written in the form $R(x) = a_1 x b^1 + \cdots + a_l x b^l$ for some matrices $a_1, \ldots, a_l, b^1, \ldots, b^l$. Indeed, the operators $x \to e_{i,j}\, x\, e_{i_1,j_1}$ form a basis in the vector space of linear operators on $\mathrm{Mat}_N$. It is convenient to represent the operator R from formula (2.16) in the form
\[ R(x) = a_1 x b^1 + \cdots + a_p x b^p + c\,x \qquad (2.19) \]
with p being the smallest possible in the class of equivalence of R. This means that the matrices $\{a_1, \ldots, a_p, 1\}$ are linearly independent as well as the matrices $\{b^1, \ldots, b^p, 1\}$. According to (2.16), the second product has the following form:
\[ x \circ y = \sum_i \left( a_i x b^i y + x a_i y b^i - a_i x y b^i \right) + x c y. \qquad (2.20) \]
It turns out that the matrices $\{a_1, \ldots, a_p, b^1, \ldots, b^p, c\}$ form a representation of a certain algebraic structure. We describe this structure in the next section.
3. M-Structures and the Corresponding Associative Algebras
In this section we formulate the results of the paper [1] and their simple consequences we will use below.
Definition. By weak M-structure on a linear space L we mean the following data:
• Two subspaces A and B and a distinguished element 1 ∈ A ∩ B ⊂ L.
• A non-degenerate symmetric scalar product (·, ·) on the space L.
• Associative products A × A → A and B × B → B with unity 1.
• A left action B × L → L of the algebra B and a right action L × A → L of the algebra A on the space L that commute with each other.
These data should satisfy the following properties:
1. dim A ∩ B = dim L/(A + B) = 1.
2. The restriction of the action B × L → L to the subspace B ⊂ L is the product in B. The restriction of the action L × A → L to the subspace A ⊂ L is the product in A.
3. $(a_1, a_2) = (b_1, b_2) = 0$ and $(b_1 b_2, v) = (b_1, b_2 v)$, $(v, a_1 a_2) = (v a_1, a_2)$ for any $a_1, a_2 \in A$, $b_1, b_2 \in B$ and v ∈ L.
It follows from these properties that (·, ·) defines a non-degenerate pairing between A/C1 and B/C1. Therefore dim A = dim B and dim L = 2 dim A. Given a weak M-structure L, we define an associative algebra U(L) generated by L and satisfying natural compatibility and universality conditions.
Definition. By weak M-algebra associated with a weak M-structure L we mean an associative algebra U(L) with a linear mapping j : L → U(L) such that the following conditions are satisfied:
1. j(b)j(x) = j(bx) and j(x)j(a) = j(xa) for a ∈ A, b ∈ B and x ∈ L.
2. For any algebra X with a linear mapping j′ : L → X satisfying property 1 there exists a unique homomorphism of algebras f : U(L) → X such that f ◦ j = j′.
It is easy to see that U(L) exists and is unique for given L.
Definition. A weak M-structure L is called M-structure if there exists a central element K ∈ U(L) of the algebra U(L) quadratic with respect to L.
Theorem 3.1. Let L be an M-structure. Then there exists a basis $\{1, A_1, \ldots, A_p, B^1, \ldots, B^p, C\}$ in L such that $\{1, A_1, \ldots, A_p\}$ is a basis in A, $\{1, B^1, \ldots, B^p\}$ is a basis in B, and $K = A_1 B^1 + \cdots + A_p B^p + C$.
Theorem 3.2. Let R ∈ End(U(L)) be given by the formula $R(x) = A_1 x B^1 + \cdots + A_p x B^p + C x$, and ◦ be defined by (2.16). Then ◦ is associative and compatible with the usual product in U(L). Notice that K = R(1).
Theorem 3.3. Let ◦ be an associative product in the space $\mathrm{Mat}_N$ compatible with the usual one and written in the form (2.16), where R is given by (2.19) with p being smallest possible in the class of equivalence of R. Then there exists an M-structure L with representation U(L) → $\mathrm{Mat}_N$ such that dim A = dim B = p + 1, the image of A has the basis $\{1, a_1, \ldots, a_p\}$, and the image of B has the basis $\{1, b^1, \ldots, b^p\}$.
Definition. A representation of U(L) is called non-degenerate if its restrictions on the algebras A and B are exact.
Theorem 3.4. There is a one-to-one correspondence between N-dimensional non-degenerate representations of algebras U(L) corresponding to M-structures and associative products in $\mathrm{Mat}_N$ compatible with the usual matrix product.
The structure of the algebra U(L) for an M-structure L can be described as follows.
Theorem 3.5. The algebra U(L) is spanned by the elements of the form $a\, b\, K^s$, where a ∈ A, b ∈ B, s ∈ $\mathbb{Z}_+$.
We need also the following
Definition. Let L be a weak M-structure. By the opposite weak M-structure $L^{op}$ we mean the weak M-structure with the same linear space L, the same scalar product and algebras A, B replaced by the opposite algebras $B^{op}$, $A^{op}$, correspondingly.
It is easy to see that if L is an M-structure, then $L^{op}$ is an M-structure as well.
4. M-Structures with Semi-Simple Algebras A and B and Quiver Representations

4.1. Matrix of multiplicities. By $V^l$ we denote the direct sum of $l$ copies of a linear space $V$. By definition, we put $V^0 = \{0\}$. Recall [17] that any semi-simple associative algebra over $\mathbb{C}$ has the form $\oplus_{1\le i\le r} \mathrm{End}(V_i)$, any left $\mathrm{End}(V)$-module has the form $V^l$, and any right $\mathrm{End}(V)$-module has the form $(V^*)^l$ for some $r$ and $l$.

Lemma 4.1. Let $L$ be a weak M-structure. Suppose $A = \oplus_{1\le i\le r} \mathrm{End}(V_i)$, where $\dim V_i = m_i$. Then $L$ as a right $A$-module is isomorphic to $\oplus_{1\le i\le r} (V_i^*)^{2m_i}$.

Proof. Since any right $A$-module has the form $\oplus_{1\le i\le r} (V_i^*)^{l_i}$ for some $l_1, \dots, l_r \ge 0$, we have $L = \oplus_{1\le i\le r} L_i$, where $L_i = (V_i^*)^{l_i}$. Note that $A \subset L$ and, moreover, $\mathrm{End}(V_i) \subset L_i$ for $i = 1, \dots, r$. Besides, $\mathrm{End}(V_i) \perp L_j$ for $i \ne j$. Indeed, we have $(v, a) = (v, Id_i\, a) = (v\, Id_i, a) = 0$ for $v \in L_j$, $a \in \mathrm{End}(V_i)$, where $Id_i$ is the unity of the subalgebra $\mathrm{End}(V_i)$. Since $(\cdot,\cdot)$ is non-degenerate and $\mathrm{End}(V_i) \perp \mathrm{End}(V_i)$ by property 3 of the weak M-structure, we have $\dim L_i \ge 2 \dim \mathrm{End}(V_i)$. But $\sum_i \dim L_i = \dim L = 2 \dim A = \sum_i 2 \dim \mathrm{End}(V_i)$, and we obtain $\dim L_i = 2 \dim \mathrm{End}(V_i)$ for each $i = 1, \dots, r$, which is equivalent to the statement of Lemma 4.1.

Lemma 4.2. Let $A$ and $B$ be semi-simple associative algebras:
$$A = \oplus_{1\le i\le r} \mathrm{End}(V_i), \qquad B = \oplus_{1\le j\le s} \mathrm{End}(W_j), \qquad \dim V_i = m_i, \quad \dim W_j = n_j. \tag{4.21}$$
Then $L$ as an $A^{op} \otimes B$-module is given by the formula
$$L = \oplus_{1\le i\le r,\ 1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}, \tag{4.22}$$
where $a_{i,j} \ge 0$ and
$$\sum_{j=1}^{s} a_{i,j}\, n_j = 2 m_i, \qquad \sum_{i=1}^{r} a_{i,j}\, m_i = 2 n_j. \tag{4.23}$$
Proof. It is known that any $A^{op} \otimes B$-module has the form $\oplus_{1\le i\le r,\ 1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}$, where $a_{i,j} \ge 0$. Applying Lemma 4.1, we obtain $\dim L_i = 2 m_i^2$, where $L_i = \oplus_{1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}$. This gives the first equation from (4.23). The second equation can be obtained similarly.

Definition. The $r \times s$-matrix $(a_{i,j})$ from Lemma 4.2 is called the matrix of multiplicities of the weak M-structure $L$.

Definition. The $r \times s$-matrix $(a_{i,j})$ is called decomposable if there exist partitions $\{1, \dots, r\} = I' \sqcup I''$ and $\{1, \dots, s\} = J' \sqcup J''$ such that $a_{i,j} = 0$ for $(i,j) \in I' \times J'' \sqcup I'' \times J'$.

Lemma 4.3. The matrix of multiplicities is indecomposable.

Proof. Suppose $(a_{i,j})$ is decomposable. We have $A = A' \oplus A''$, $B = B' \oplus B''$ and $L = L' \oplus L''$, where $A' = \oplus_{i\in I'} \mathrm{End}(V_i)$, $A'' = \oplus_{i\in I''} \mathrm{End}(V_i)$, $B' = \oplus_{j\in J'} \mathrm{End}(W_j)$, $B'' = \oplus_{j\in J''} \mathrm{End}(W_j)$, $L' = \oplus_{(i,j)\in I'\times J'} (V_i^* \otimes W_j)^{a_{i,j}}$, $L'' = \oplus_{(i,j)\in I''\times J''} (V_i^* \otimes W_j)^{a_{i,j}}$. Let $1 = e_1 + e_2$, where $e_1 \in L'$ and $e_2 \in L''$. It is clear that $e_1, e_2 \in A \cap B$. Therefore, $\dim A \cap B > 1$, which contradicts property 1 of the weak M-structure.
A. Odesskii, V. Sokolov
Note that if $A$ is the matrix of multiplicities of a weak M-structure with semi-simple algebras $A$ and $B$, then $A^t$ is the matrix of multiplicities for the opposite weak M-structure.

Theorem 4.1. Let $L$ be a weak M-structure with semi-simple algebras $A$ and $B$ given by formula (4.21) and with $L$ given by (4.22). Then there exists a simply laced affine Dynkin diagram [18] with vector spaces from the set $\{V_1, \dots, V_r, W_1, \dots, W_s\}$ assigned to each vertex in such a way that:
1. there is a one-to-one correspondence between this set and the set of vertices,
2. for any $i, j$ the spaces $V_i$, $V_j$ are not connected by edges, and neither are the spaces $W_i$, $W_j$,
3. $a_{i,j}$ is equal to the number of edges between $V_i$ and $W_j$,
4. the vector $(\dim V_1, \dots, \dim V_r, \dim W_1, \dots, \dim W_s)$ is a positive imaginary root of the diagram.

Proof. Consider a linear space with a basis $\{v_1, \dots, v_r, w_1, \dots, w_s\}$ and the symmetric bilinear form $(v_i, v_j) = (w_i, w_j) = 2\delta_{i,j}$, $(v_i, w_j) = -a_{i,j}$. Let $J = m_1 v_1 + \dots + m_r v_r + n_1 w_1 + \dots + n_s w_s$. It is clear that Eqs. (4.23) can be written as $(v_i, J) = (w_j, J) = 0$, which means that $J$ belongs to the kernel of the form $(\cdot,\cdot)$. Therefore (see [19]) the matrix of the form is the Cartan matrix of a simply laced affine Dynkin diagram. It is also clear that $J$ is a positive imaginary root.

On the other hand, consider a simply laced affine Dynkin diagram with a partition of the set of vertices into two subsets such that vertices of the same subset are not connected. It is clear that if such a partition exists, then it is unique up to transposition of the subsets. Let $v_1, \dots, v_r$ be the roots corresponding to vertices of the first subset and $w_1, \dots, w_s$ be the roots corresponding to the second subset. We have $(v_i, v_j) = (w_i, w_j) = 2\delta_{i,j}$. Let $J = m_1 v_1 + \dots + m_r v_r + n_1 w_1 + \dots + n_s w_s$ be an imaginary root and $a_{i,j} = -(v_i, w_j)$. Then it is easy to see that (4.23) holds.

Remark. Interchanging the subsets corresponds to the transposition of the matrix $(a_{i,j})$.

It is easily seen that among simply laced affine Dynkin diagrams only diagrams of the $\tilde A_{2k-1}$, $\tilde D_k$, $\tilde E_6$, $\tilde E_7$, and $\tilde E_8$ type admit a partition of the set of vertices into two subsets such that vertices of the same subset are not connected. A natural question arises: describe all M-structures with the algebras $A$ and $B$ given by (4.21) and $L$ given by (4.22), where the matrix $(a_{i,j})$ is constructed from an affine Dynkin diagram of the $\tilde A_{2k-1}$, $\tilde D_k$, $\tilde E_6$, $\tilde E_7$, or $\tilde E_8$ type. It turns out that these M-structures exist iff $J$ is the minimal positive imaginary root.

4.2. M-structures related to affine Dynkin diagrams and quiver representations. We recall that a quiver is just a directed graph $Q = (Ver, E)$, where $Ver$ is a finite set of vertices and $E$ is a finite set of arrows between them. If $a \in E$ is an arrow, then $t_a$ and $h_a$ denote its tail and its head, respectively. Note that loops and several arrows with the same tail and head are allowed. A representation of the quiver $Q$ is a set of vector spaces $L_x$ attached to each vertex $x \in Ver$ and linear maps $f_a : L_{t_a} \to L_{h_a}$ attached to each arrow $a \in E$. The set of natural numbers $\dim L_x$ attached to each vertex $x \in Ver$ is called the dimension of the representation. By an affine quiver we mean a quiver whose underlying graph is an affine Dynkin diagram of ADE-type.
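The conditions of Theorem 4.1 are easy to test on small diagrams. The following Python sketch is only an illustration: the function names `bipartition` and `satisfies_4_23` and the vertex labels are hypothetical, and every edge is assumed to contribute $a_{i,j} = 1$ (a repeated edge in the list counts twice). It checks that a diagram admits a partition into two subsets with no edges inside a subset, and that a given dimension vector satisfies Eqs. (4.23), which in graph terms say that twice the dimension at each vertex equals the sum of the dimensions at its neighbours.

```python
from collections import deque

def bipartition(vertices, edges):
    """Return (part0, part1) if the graph is bipartite, else None (BFS 2-colouring)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = {}
    for start in vertices:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None  # an edge joins two vertices of the same subset
    part0 = [v for v in vertices if colour[v] == 0]
    part1 = [v for v in vertices if colour[v] == 1]
    return part0, part1

def satisfies_4_23(vertices, edges, dims):
    """Check Eqs. (4.23): the neighbour sum of dims equals 2*dims at every vertex."""
    if bipartition(vertices, edges) is None:
        return False
    neighbour_sum = {v: 0 for v in vertices}
    for u, v in edges:  # each listed edge counts once towards a_{i,j}
        neighbour_sum[u] += dims[v]
        neighbour_sum[v] += dims[u]
    return all(neighbour_sum[v] == 2 * dims[v] for v in vertices)

# D~_4: one central vertex joined to four leaves; minimal imaginary root (2; 1, 1, 1, 1).
d4_vertices = ["c", "l1", "l2", "l3", "l4"]
d4_edges = [("c", leaf) for leaf in d4_vertices[1:]]
d4_delta = {"c": 2, "l1": 1, "l2": 1, "l3": 1, "l4": 1}

# A~_3: a cycle with four vertices; A~_2: a cycle with three vertices (not bipartite).
a3_vertices = [0, 1, 2, 3]
a3_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
a2_vertices = [0, 1, 2]
a2_edges = [(0, 1), (1, 2), (2, 0)]

print(satisfies_4_23(d4_vertices, d4_edges, d4_delta))
print(satisfies_4_23(a3_vertices, a3_edges, {v: 1 for v in a3_vertices}))
print(satisfies_4_23(a2_vertices, a2_edges, {v: 1 for v in a2_vertices}))
```

Since Eqs. (4.23) are linear, any multiple of a solution is again a solution; the sketch therefore cannot distinguish the minimal imaginary root from its multiples, which is a separate condition.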
Theorem 4.2. Let $L$ be an M-structure with semi-simple algebras $A$ and $B$ given by (4.21). Then there exists a representation of an affine Dynkin quiver such that:
1. There is a one-to-one correspondence between the set of vector spaces attached to vertices of the quiver and the set of vector spaces $\{V_1, \dots, V_r, W_1, \dots, W_s\}$. Each vector space from this set is attached to only one vertex.
2. For any $a \in E$ the space attached to its tail $t_a$ is one of the $V_i$ and the space attached to its head $h_a$ is one of the $W_j$.
3. $L$ as an $A^{op} \otimes B$-module is isomorphic to $\oplus_{a\in E} V_{t_a}^* \otimes W_{h_a}$.
4. The vector $(\dim V_1, \dots, \dim V_r, \dim W_1, \dots, \dim W_s)$ is the minimal imaginary positive root of the Dynkin diagram.
5. The element $1 \in L = \oplus_{a\in E} \mathrm{Hom}(V_{t_a}, W_{h_a})$ is just $\sum_{a\in E} f_a$, where $f_a$ is the linear map attached to the arrow $a$.

Proof. In Theorem 4.1 we have already constructed the affine Dynkin diagram corresponding to $L$ with vector spaces $\{V_1, \dots, V_r, W_1, \dots, W_s\}$ attached to the vertices. Note that each edge of this affine Dynkin diagram links some linear spaces $V_i$ and $W_j$. By definition, the direction of this edge is from $V_i$ to $W_j$. The decomposition of the element $1 \in L = \oplus_{1\le i\le r,\ 1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}$ defines an element of $V_i^* \otimes W_j$ for each edge. Since $V_i^* \otimes W_j = \mathrm{Hom}(V_i, W_j)$, we obtain a representation of the quiver. We know already that $J = (\dim V_1, \dots, \dim V_r, \dim W_1, \dots, \dim W_s)$ is an imaginary positive root. It is easy to see that if it is not minimal, then $\dim A \cap B > 1$.

Now we can use the known classification of representations of affine quivers [10–12] to describe the corresponding M-structures. Note that no vertex of our quiver can be a tail of one arrow and a head of another arrow at the same time. Given a representation of such a quiver, it remains to construct the embeddings $A \to L$, $B \to L$ and a scalar product $(\cdot,\cdot)$ on the space $L$. We can construct the embeddings $A \to L$, $B \to L$ by the formula $a \mapsto 1a$, $b \mapsto b1$ for $a \in A$, $b \in B$ whenever we know the element $1 \in L$. After that it is not difficult to construct the scalar product.

Example. Consider the case $\tilde A_{2k-1}$. We have $\dim V_i = \dim W_i = 1$ for $1 \le i \le k$. Let $\{v_i\}$ be a basis of $V_i$ and $\{w_i\}$ be a basis of $W_i$. Let $\{e_i\}$ be a basis of $\mathrm{End}(V_i)$ such that $v_i e_i = v_i$ and $\{f_i\}$ be a basis of $\mathrm{End}(W_i)$ such that $f_i w_i = w_i$. A generic element $1 \in L$ in a suitable basis in $V_i$, $W_i$ can be written in the form $1 = \sum_{1\le i\le k} (v_i \otimes w_i + \lambda v_{i+1} \otimes w_i)$, where the index $i$ is taken modulo $k$ and $\lambda \in \mathbb{C}$ is a generic complex number. The embedding $A \to L$, $B \to L$ is the following: $e_i \mapsto 1 e_i = v_i \otimes w_i + \lambda v_i \otimes w_{i-1}$, $f_i \mapsto f_i 1 = v_i \otimes w_i + \lambda v_{i+1} \otimes w_i$. It is clear that the vector space $A \cap B$ is spanned by the vector $\sum_i (v_i \otimes w_i + \lambda v_i \otimes w_{i-1})$ and that the algebra $A \cap B$ is isomorphic to $\mathbb{C}$.

Let $Q = (Ver, E)$ be an affine quiver and $\rho$ be its representation constructed from a given M-structure $L$ with semi-simple algebras $A$ and $B$. Let $Ver = Ver_t \sqcup Ver_h$, where $Ver_t$ is the set of tails and $Ver_h$ is the set of heads of arrows. We have $\rho : x \mapsto V_x$, $y \mapsto W_y$, $a \mapsto f_a$ for $x \in Ver_t$, $y \in Ver_h$ and $a \in E$. It turns out that representations of the algebra $U(L)$ can also be described in terms of representations of the quiver $Q$.

Theorem 4.3. Suppose we have a representation of the algebra $U(L)$ in a linear space $N$; then there exists a representation $\tau : x \mapsto N_x$, $a \mapsto \phi_a$; $x \in Ver$, $a \in E$ of the quiver $Q$ such that
1. The restriction of the representation of the algebra $U(L)$ to the subalgebra $A \subset U(L)$ is isomorphic to $\oplus_{x\in Ver_t} V_x \otimes N_x$.
2. The restriction of the representation of the algebra $U(L)$ to the subalgebra $B \subset U(L)$ is isomorphic to $\oplus_{x\in Ver_h} W_x \otimes N_x$.
3. The formula $f = \sum_{a\in E} f_a \otimes \phi_a$ defines an isomorphism $f : \oplus_{x\in Ver_t} V_x \otimes N_x \to \oplus_{x\in Ver_h} W_x \otimes N_x$.

Proof. It is known that any representation of the algebra $\mathrm{End}(V)$ has the form $V \otimes S$, where $S$ is a linear space. The action is given by $f(v \otimes s) = (fv) \otimes s$. Therefore $N$ has the form $N_a = \oplus_{x\in Ver_t} V_x \otimes N_x$ with respect to the action of $A = \oplus_{1\le i\le r} \mathrm{End}(V_i)$ and has the form $N_b = \oplus_{x\in Ver_h} W_x \otimes N_x$ with respect to the action of $B = \oplus_{1\le j\le s} \mathrm{End}(W_j)$ for some linear spaces $N_x$. Both linear spaces $N_a$ and $N_b$ are isomorphic to $N$. Thus we have linear spaces $N_x$ attached to each $x \in Ver$ and an isomorphism $f : \oplus_{x\in Ver_t} V_x \otimes N_x \to \oplus_{x\in Ver_h} W_x \otimes N_x$. Let $f = \sum_{x,y\in Ver} f_{x,y}$. It is easy to see that $f_{x,y} = 0$ if $x$ and $y$ are not linked by an arrow and $f_{x,y} = f_a \otimes \phi_a$ for some $\phi_a$ if $x = t_a$, $y = h_a$. Here $f_a$ is defined by Theorem 4.2 (see property 5). This gives us a linear map $\phi_a$ attached to each arrow $a \in E$.

Remark 1. It is clear that all statements of this section are valid for weak M-structures with semi-simple algebras $A$ and $B$. However, it is possible to check that any such weak M-structure has a quadratic central element $K$ and therefore is an M-structure.

Remark 2. It follows from Theorem 4.3 (see property 3) that
$$\dim N = \sum_{x\in Ver_t} m_x \dim N_x = \sum_{x\in Ver_h} n_x \dim N_x. \tag{4.24}$$
Moreover, if the representation $\tau$ is decomposable, then the representation of $U(L)$ is also decomposable. Therefore, if the representation of $U(L)$ is indecomposable, then $\dim \tau$ must be a positive root with the property (4.24). If this root is real, then the representation does not depend on parameters and corresponds to some special value of $K$. If this root is imaginary, then the representation depends on one parameter and the action of $K$ depends on this parameter also. In the Appendix we describe these representations for imaginary roots explicitly.
5. Appendix

In this Appendix we present explicit formulas for M-algebras with semi-simple algebras $A$ and $B$ based on known classification results on affine quiver representations. We also give formulas for the operator $R$ with values in $\mathrm{End}(U(L))$. Note that $K = R(1)$. It turns out that in all cases
$$S_\lambda = 1 + \lambda R. \tag{5.25}$$
Moreover, the operator $R$ satisfies a polynomial equation of degree 3 in the case $\tilde A_{2k-1}$ and degree 4 in the other cases. Using these equations, one can define $(v + R)^{-1}$ with values in the localization $\mathbb{C}(K) \otimes U(L)$, where $\mathbb{C}(K)$ is the field of rational functions in $K$. Formula (2.18) gives us the corresponding universal $r$-matrix with values in $\mathbb{C}(K) \otimes U(L)$. For any representation of $U(L)$ in a vector space $N$ the image of this $r$-matrix is an $r$-matrix with values in $\mathrm{End}(N)$.
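The simplest family treated in this Appendix, the case $\tilde A_{2k-1}$, can be checked by exact computation. The Python sketch below is an illustration under stated assumptions: it takes the representation with bases related by $v_i = w_i - t\,w_{i-1}$ (indices modulo $k$), $e_i$ the projection onto $v_i$ along the other $v_j$, $f_i$ the projection onto $w_i$, and it assumes $R(x) = \sum_{1\le i\le j\le k-1} e_i x f_j + f_k e_k x$, so that $K = R(1)$. The function names `cyclic_representation` and `central_element` are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def scalar_matrix(n, value):
    """The n x n matrix value * identity."""
    return [[value if r == c else Fraction(0) for c in range(n)] for r in range(n)]

def cyclic_representation(k, t):
    """Matrices of e_1..e_k and f_1..f_k in the basis w_1..w_k (label j -> index j-1)."""
    assert k >= 2 and t ** k != 1
    # Column p of P holds v_{p+1} = w_{p+1} - t * w_p, with indices taken modulo k.
    P = [[Fraction(0)] * k for _ in range(k)]
    for p in range(k):
        P[p][p] += 1
        P[(p - 1) % k][p] -= t
    # Closed-form inverse of P = 1 - t*N with N^k = 1 (finite geometric series).
    Pinv = [[t ** ((c - r) % k) / (1 - t ** k) for c in range(k)] for r in range(k)]
    assert matmul(P, Pinv) == scalar_matrix(k, Fraction(1))
    # e_j projects onto v_j along the other v's; f_j projects onto w_j.
    e = [[[P[r][p] * Pinv[p][c] for c in range(k)] for r in range(k)] for p in range(k)]
    f = [[[Fraction(int(r == c == p)) for c in range(k)] for r in range(k)]
         for p in range(k)]
    return e, f

def central_element(k, t):
    """K = R(1) = sum_{1 <= i <= j <= k-1} e_i f_j + f_k e_k (assumed form of R)."""
    e, f = cyclic_representation(k, t)
    terms = [matmul(e[i], f[j]) for j in range(k - 1) for i in range(j + 1)]
    terms.append(matmul(f[k - 1], e[k - 1]))
    K = scalar_matrix(k, Fraction(0))
    for term in terms:
        K = [[K[r][c] + term[r][c] for c in range(k)] for r in range(k)]
    return K

t = Fraction(1, 3)
for k in range(2, 6):
    print(k, central_element(k, t) == scalar_matrix(k, 1 / (1 - t ** k)))
```

Exact rational arithmetic is used so that the comparison with the scalar $1/(1 - t^k)$ is an equality test and not a floating-point approximation.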
The case $\tilde A_{2k-1}$. The algebras $A$ and $B$ have bases $\{e_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ and $\{f_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ correspondingly such that the multiplications are given by
$$e_i e_j = \delta_{i,j} e_i, \qquad f_i f_j = \delta_{i,j} f_i. \tag{5.26}$$
The M-algebra $U(L)$ is generated by $e_1, \dots, e_k, f_1, \dots, f_k$ with defining relations (5.26) and
$$e_1 + \dots + e_k = f_1 + \dots + f_k = 1, \qquad f_i e_j = 0 \quad \text{for } j - i \ne 0, 1.$$
The operator $R$ can be written in the form:
$$R(x) = \sum_{1\le i\le j\le k-1} e_i x f_j + f_k e_k x.$$
This operator satisfies the following equation: $K R(x) - (K + 1) R^2(x) + R^3(x) = 0$. From this equation we obtain
$$(v + R)^{-1}(x) = \frac{1}{v}\, x + \frac{1}{v(v+1)}\, (v + K)^{-1} \bigl(R^2(x) - (1 + v + K) R(x)\bigr).$$
The corresponding $r$-matrix is given by (2.18). For any generic value of $K$ the algebra $U(L)$ has the following irreducible representation $V$. There exist two bases $\{v_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ and $\{w_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ of the space $V$ such that
$$e_i v_j = \delta_{i,j} v_i, \qquad f_i w_j = \delta_{i,j} w_i, \qquad v_i = w_i - t\, w_{i-1}, \qquad i, j \in \mathbb{Z}/k\mathbb{Z}.$$
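Here $t \in \mathbb{C}$ is a parameter of the representation. In this representation $K$ acts as multiplication by $1/(1 - t^k)$.

The case $\tilde D_{2k}$. The algebra $A \cong \mathbb{C} \oplus \mathbb{C} \oplus (\mathrm{Mat}_2)^{k-2} \oplus \mathbb{C} \oplus \mathbb{C}$ has a basis $\{e_1, e_2, e_{2k}, e_{2k+1}, e_{2\alpha,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ with multiplication
$$e_\alpha e_\beta = \delta_{\alpha,\beta} e_\beta, \qquad e_\alpha e_{\beta,i,j} = e_{\beta,i,j} e_\alpha = 0, \qquad e_{\alpha,i,j}\, e_{\beta,i',j'} = \delta_{\alpha,\beta} \delta_{j,i'}\, e_{\alpha,i,j'}. \tag{5.27}$$
The algebra $B \cong (\mathrm{Mat}_2)^{k-1}$ has a basis $\{e_{2\alpha-1,i,j};\ 2 \le \alpha \le k,\ 1 \le i, j \le 2\}$ with multiplication
$$e_{\alpha,i,j}\, e_{\beta,i',j'} = \delta_{\alpha,\beta} \delta_{j,i'}\, e_{\alpha,i,j'}. \tag{5.28}$$
The M-algebra $U(L)$ is generated by $e_1, e_2, e_{2k}, e_{2k+1}, e_{\alpha,i,j}$; $3 \le \alpha \le 2k-1$, $1 \le i, j \le 2$ with defining relations (5.27), (5.28) and
$$e_1 + e_2 + e_{2k} + e_{2k+1} + \sum_{2\le\alpha\le k-1,\ 1\le i\le 2} e_{2\alpha,i,i} = \sum_{2\le\alpha\le k,\ 1\le i\le 2} e_{2\alpha-1,i,i} = 1,$$
$$e_{2\alpha-1,i,j}\, e_\beta = 0, \quad 2 < \alpha < k,\ \beta = 1, 2, 2k, 2k+1,$$
$$e_{2\alpha-1,i,j}\, e_{2\beta,i',j'} = 0, \quad \alpha \ne \beta, \beta + 1,$$
$$e_{3,1,2}\, e_1 = e_{3,2,2}\, e_1 = e_{3,1,1}\, e_2 = e_{3,2,1}\, e_2 = 0,$$
$$e_{2\alpha-1,i,j}\, e_{2\alpha,i',j'} = e_{2\alpha+1,i,j}\, e_{2\alpha,i',j'} = 0, \quad j \ne i',$$
$$e_{2\alpha-1,i,1}\, e_{2\alpha,1,j} = e_{2\alpha-1,i,2}\, e_{2\alpha,2,j}, \qquad e_{2\alpha+1,i,1}\, e_{2\alpha,1,j} = e_{2\alpha+1,i,2}\, e_{2\alpha,2,j},$$
$$e_{2k-1,1,1}\, e_{2k} = e_{2k-1,1,2}\, e_{2k}, \qquad e_{2k-1,2,1}\, e_{2k} = e_{2k-1,2,2}\, e_{2k},$$
$$e_{2k-1,1,2}\, e_{2k+1} = \lambda\, e_{2k-1,1,1}\, e_{2k+1}, \qquad e_{2k-1,2,2}\, e_{2k+1} = \lambda\, e_{2k-1,2,1}\, e_{2k+1}.$$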
The operator R can be written in the form:
R(x) =
(λe1 xe2α+1,2,2 − λe1 xe2α+1,2,1 + e2 xe2α+1,1,1 − e2 xe2α+1,1,2
1≤α≤k−1
+e2k xe2α+1,1,1 + λe2k xe2α+1,2,2 + λe2k+1 xe2α+1,1,1 + λe2k+1 xe2α+1,2,2 ) (λe2α,1,1 xe2β−1,2,2 + e2α,2,2 xe2β−1,1,1 ) + 2≤α≤k−1, 2≤β≤k
−
(λe2α,1,1 xe2β−1,2,1 + e2α,2,2 xe2β−1,1,2 )
2≤α 2, β = 1, 2, = 0, α = β, β + 1, = 0, β < k − 1, α = 2k − 1, 2k, = e3,2,2 e1 = e3,1,1 e2 = e3,2,1 e2 = 0, = e2α+1,i, j e2α,i , j = 0, j = i , = e2α−1,i,2 e2α,2, j , e2α+1,i,1 e2α,1, j = e2α+1,i,2 e2α,2, j , = e2k−1 e2k−2,2,1 , e2k−1 e2k−2,1,2 = e2k−1 e2k−2,2,2 = λe2k e2k−2,1,1 , e2k e2k−2,2,2 = λe2k e2k−2,1,2 .
The operator R can be written in the form: ((λ − 1)e1 xe2α−1,2,2 + (λ − 1)e2α,1,1 xe2k−1 R(x) = (λ − 1)e1 xe2k−1 + 2≤α≤k−1
−λe2 xe2α−1,1,2 − e1 xe2α−1,2,1 + λe2 xe2α−1,2,2 + λe1 xe2α−1,1,1 ) ((λ − 1)e2α,1,1 xe2β−1,2,2 + 2≤α,β≤k−1
+λe2α,1,1 xe2β−1,1,1 + λe2α,2,2 xe2β−1,2,2 ) + (λe2α,1,2 xe2β−1,1,1 + e2α,2,1 xe2β−1,2,2 ) 2≤β≤α≤k−1
−
(λe2α,2,2 xe2β−1,1,2 + e2α,1,1 xe2β−1,2,1 ) + (λ − 1)xe2k e2k−2,2,2 .
2≤α 0 independent of $\hbar$ and an open set $\Omega \subset \mathbb{C}^2 \setminus \mathbb{R}^2$ such that if $|\varepsilon| < \varepsilon^*$ and $\omega \in \Omega$, the quantum normal form near $P_0$ converges uniformly with respect to $\hbar$. This yields an exact quantization formula for the eigenvalues, and for $\hbar = 0$ the classical Cherry theorem on convergence of Birkhoff's normal form for complex frequencies is recovered.

1. Introduction and Statement of the Results

Consider in the phase space $\mathbb{R}^{2l}$ with canonical coordinates denoted $(x, \xi)$ the Hamiltonian system defined by the principal function
$$p_\varepsilon(x, \xi; \omega) := p_0(x, \xi) + \varepsilon f_0(x, \xi), \tag{1.1}$$
$$p_0(x, \xi; \omega) := \frac{1}{2}\bigl(|\xi|^2 + |\omega x|^2\bigr) = \sum_{k=1}^{l} \omega_k I_k(x, \xi), \tag{1.2}$$
$$I_k(x, \xi) := \frac{1}{2\omega_k}\bigl[\xi_k^2 + \omega_k^2 x_k^2\bigr], \qquad k = 1, \dots, l. \tag{1.3}$$
Here $f_0 : \mathbb{R}^{2l} \to \mathbb{R}$ is analytic; $f_0 = O([|\xi|^2 + |\omega x|^2]^{s/2})$, $s \ge 3$, as $|x| + |\xi| \to 0$, and $\varepsilon \in \mathbb{R}$. Any analytic Hamiltonian near a non-degenerate elliptic equilibrium point can be written in the form (1.1). Let the frequencies $\omega := (\omega_1, \dots, \omega_l)$ fulfill a diophantine condition, i.e.
$$|\langle \omega, k \rangle| \ge \gamma |k|^{-\tau}, \quad \forall k \in \mathbb{Z}^l \setminus \{0\}, \qquad |k| := |k_1| + \dots + |k_l|, \quad \gamma > 0,\ \tau > l - 1. \tag{1.4}$$
Partially supported by PAPIIT-UNAM IN106106-2.
S. Graffi, C. Villegas-Blas
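Under these circumstances the Birkhoff theorem holds, namely (see e.g. [SM], Sect. 30): $\forall N \in \mathbb{N}$, $\forall p \in \mathbb{N}$, $\forall \varepsilon \in \mathbb{R}$ one can construct an analytic, canonical bijection $(y, \eta) = \chi_{\varepsilon,N}(x, \xi) : \mathbb{R}^{2l} \leftrightarrow \mathbb{R}^{2l}$ and a sequence of analytic functions $Y_p(I; \omega) : \mathbb{R}^l_+ \to \mathbb{R}$ such that:
$$p_\varepsilon \circ \chi_{\varepsilon,N}^{-1}(y, \eta) = \sum_{k=1}^{l} \omega_k I_k(y, \eta) + \sum_{p=1}^{N-1} Y_p(I(y, \eta); \omega)\, \varepsilon^p + \varepsilon^N R_N(y, \eta; \varepsilon). \tag{1.5}$$
The $l$ functions $I := (I_k(y, \eta) : k = 1, \dots, l)$, the mechanical actions, are thus first integrals of the transformed Hamiltonian up to an error of order $\varepsilon^N$. Hence the system is integrable if the remainder in (1.5) vanishes as $N \to \infty$, namely if the Birkhoff normal form
$$B(I; \omega, \varepsilon) := \langle \omega, I \rangle + \sum_{p=1}^{\infty} Y_p(I; \omega)\, \varepsilon^p, \qquad \langle \omega, I \rangle := \sum_{k=1}^{l} \omega_k I_k \tag{1.6}$$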
converges when the actions belong to some ball |I | < R of Rl+ . However, as proved by Siegel [Si] in 1941, (1.6) is generically divergent (a particular convergence criterion has been later isolated by Rüssmann [Ru]; see also [Ga]. It states that (1.6) converges if Y p (I, ω) = Y p (ω, I )). Already in 1928, on the other hand, Cherry [Ch] (see also [SM], Sect. 30; a more recent proof can be found in [Ot]) remarked that, when l = 2, the normal form is convergent provided the frequencies ω are complex with non-vanishing imaginary part. Under this assumption the small denominator mechanism which generates the divergence becomes instead a large denominator one entailing the convergence. We prove here that under the same assumptions on the frequencies, but much more restrictive conditions on the perturbation, the Cherry theorem holds in quantum mechanics as well, with estimates uniform with respect to the Planck constant h¯ . Namely, the quantum Birkhoff normal form (see [Sj]) converges uniformly with respect to h¯ , and this yields an exact quantization formula for the quantum spectrum. Consider indeed in L 2 (R2 ) the operator H () = P0 (h¯ , ω) + F0 under the assumptions: (A1) P0 (h¯ , ω) is the harmonic-oscillator Schrödinger operator with frequencies ω: 1 1 P0 (h¯ , ω)ψ = − h¯ 2 ψ + [ω12 x12 + ω22 x22 ]ψ, D(P0 ) = H 2 (R2 )∩L 22 (R2 ). (1.7) 2 2 (A2) Let ω1 = a + ib, ω2 = c + id, a = 0, c = 0, ω1 , ω2 := ac + bd. Then ω ∈ ⊂ C2 , where: |ac + bd| |ω1 , ω2 | 2 = ≤δ 0. Here: f σ := | f (s)|eσ s| ds < +∞. R 2 ×R 2
(1.11)
4. A,ρ,σ := { f ∈ L 1 (R2 × R2 ) ∩ C(C2 × C2 ) | f ,ρ,σ < +∞}, ρ > 0, σ > 0. Here: f ,ρ,σ := sup eρ|ν| f ν,ω σ . (1.12) ω∈
ν∈Z2
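We can now state our assumption on the perturbation.

(A3) $F_0$ is a semiclassical pseudodifferential operator of order $\le 0$ with (Weyl) symbol $f_0 \in \mathcal{A}_{\Omega,\rho,\sigma}$ for some $\rho > 0$, $\sigma > 0$. Explicitly (notation as in [Ro]): $F_0 = Op_h^W(f_0)$,
$$(F_0 \psi)(x) = \frac{1}{h^2} \int_{\mathbb{R}^2 \times \mathbb{R}^2} e^{i\langle x - y, \xi \rangle/\hbar}\, f_0((x + y)/2, \xi)\, \psi(y)\, dy\, d\xi, \qquad \psi \in \mathcal{S}(\mathbb{R}^2). \tag{1.13}$$
Remarks. 1. Since ([Ro], Sect. II.4) $\|F\|_{L^2 \to L^2} \le \|\hat f\|_{L^1}$, $F_0$ extends to a continuous operator in $L^2(\mathbb{R}^2)$ because:
$$\|F_0\|_{L^2 \to L^2} \le \|\hat f_0\|_{L^1} \le \|f_0\|_\sigma \le \|f_0\|_{\Omega,\rho,\sigma}. \tag{1.14}$$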
2. Any f ∈ A,ρ,σ admits a holomorphic continuation from u = (x, ξ ) ∈ R2 × R2 to the strip {z = (z 1 , z 2 ) ∈ C2 × C2 | |Im z| < σ }. Obviously this holomorphic continuation can be different from the function f ◦ φ,ω (z 1 , z 2 ) : C2 × C2 → R, as 2 in the example f = e−|z| P(z) : C2 × C2 → C, P any polynomial, discussed in the Appendix. Since F0 is bounded, H () defined on D(P0 ) is closed with pure-point spectrum ∀ ∈ C, and is self-adjoint for ∈ R if ω ∈ R2+ . Moreover, P0 can be considered a semiclassical pseudodifferential operator of order 2 with symbol p0 (x, ξ ; ω).
Theorem 1.1. Let (A1-A3) be verified and let $\hbar^* > 0$. Then there exists $\varepsilon^* > 0$ independent of $\hbar \in [0, \hbar^*]$ such that if $|\varepsilon| < \varepsilon^*$ the spectrum of $H(\varepsilon)$ is given by the quantization formula
$$E_n(\hbar, \varepsilon) = \langle \omega, n \rangle \hbar + \frac{1}{2}(\omega_1 + \omega_2)\hbar + \mathcal{N}(n\hbar, \hbar; \varepsilon), \tag{1.15}$$
$$\mathcal{N}(n\hbar, \hbar; \varepsilon) = \sum_{p=1}^{\infty} \mathcal{N}_p(n\hbar, \hbar)\, \varepsilon^p. \tag{1.16}$$
Here $n = (n_1, n_2)$, $n_i = 0, 1, \dots$, and:
1. $\mathcal{N}_p(I, \hbar) : \mathbb{R}^2_+ \times [0, \hbar^*] \to \mathbb{C}$ is analytic in $I$ and continuous in $\hbar$;
2. The series (1.15) has convergence radius $\varepsilon^*$ uniformly with respect to $\hbar \in [0, \hbar^*]$ and to $I$ in any compact subset of $\mathbb{R}^2_+$;
3. $\mathcal{N}_p(I, \hbar)$, $p = 1, 2, \dots$, admits an asymptotic expansion to all orders in $\hbar$; the order 0 term is the coefficient $Y_p(I)$ of the Birkhoff normal form.

Remarks. 1. The conditions of the Cherry theorem are much less restrictive than the present ones. In particular, the standard Schrödinger operator in which $f_0$ depends only on $x$ is excluded. On the other hand, in the classical case $\hbar = 0$ we obtain an improved version of the theorem: indeed, in our conditions the Birkhoff normal form converges, for $\varepsilon$ small enough, in any compact of $\mathbb{R}^2$. To our knowledge this result is new.
2. Taking $\hbar = 0$ in $\mathcal{N}_p(I, \hbar)$, (1.15) becomes
$$E_n^{BS}(\hbar, \varepsilon) := \langle \omega, n \rangle \hbar + \frac{1}{2}(\omega_1 + \omega_2)\hbar + \sum_{p=1}^{\infty} Y_p(n\hbar)\, \varepsilon^p,$$
namely the Bohr-Sommerfeld quantization of the Birkhoff normal form. Formula (1.15) yields all corrections needed to recover the eigenvalues $E_n(\hbar, \varepsilon)$.
3. For any fixed $n$ and $\hbar$ the series (1.15) coincides with the Rayleigh-Schrödinger perturbation expansion near the simple eigenvalue $\langle \omega, n \rangle \hbar + \frac{1}{2}(\omega_1 + \omega_2)\hbar$ of $P_0$ [GP].
4. Always for $n = 2$, under the same conditions on the frequencies, but under much more general conditions on the perturbation, Melin and Sjöstrand [MS] proved that the KAM iteration scheme applied to the full symbol of the Schrödinger operator converges for $I$ belonging to an open set of $\mathbb{R}^2_+$. This yields an exact quantization formula for the spectrum of $H(\varepsilon)$. Under the present conditions it yields of course the same spectrum as (1.15), and reproduces it after expansion in powers of $\varepsilon$. Always under the present very particular conditions on the frequencies, Theorem 1.1 represents the sharpest version of the quantization formula with exponentially small remainder in $\varepsilon$ of ([BGP], Prop. 3.1), valid for the same class of perturbations; namely, here the remainder vanishes. This is a consequence of the uniform exponential bound $|\mathcal{N}_p(n\hbar, \hbar)| < C^p$ for some $C > 0$ independent of $(n\hbar, \hbar)$ worked out in Proposition 2.2 below, in the same way as Proposition 3.1 of [BGP] follows from the uniform bound $|\mathcal{N}_p(n\hbar, \hbar)| < C^p p^{(2+\tau)p}$ valid in the general case of real diophantine frequencies with diophantine constant $\tau > l - 1$.

2. Proof of the Results

The proof is obtained in four steps.
A Uniform Quantum Version of the Cherry Theorem
1. Perturbation theory: the formal construction. Look for a unitary transformation $U(\omega, \varepsilon, \hbar) = e^{iW(\varepsilon)/\hbar} : L^2 \leftrightarrow L^2$, $W(\varepsilon) = W^*(\varepsilon)$, $\varepsilon \in \mathbb{R}$, such that:
$$S(\varepsilon) := U H(\varepsilon) U^{-1} = P_0(\hbar, \omega) + \varepsilon Z_1 + \varepsilon^2 Z_2 + \dots + \varepsilon^k R_k(\varepsilon), \tag{2.1}$$
where $[Z_p, P_0] = 0$, $p = 1, \dots, k - 1$. Recall the formal commutator expansion:
$$e^{itW(\varepsilon)/\hbar} H e^{-itW(\varepsilon)/\hbar} = \sum_{l=0}^{\infty} t^l H_l, \qquad H_0 := H, \qquad H_l := \frac{[W, H_{l-1}]}{i\hbar\, l}, \quad l \ge 1. \tag{2.2}$$
Looking for $W(\varepsilon)$ under the form of a power series, $W(\varepsilon) = \varepsilon W_1 + \varepsilon^2 W_2 + \dots$, (2.2) becomes:
$$S = \sum_{s=0}^{k} \varepsilon^s P_s + \varepsilon^{k+1} R^{(k+1)}, \tag{2.3}$$
where
$$P_s = \frac{[W_s, P_0]}{i\hbar} + F_s, \quad s \ge 1, \qquad F_1 \equiv F_0,$$
$$F_s = \sum_{r=2}^{s} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s \\ j_l \ge 1}} \frac{[W_{j_1}, [W_{j_2}, \dots, [W_{j_r}, P_0] \dots]}{(i\hbar)^r} + \sum_{r=1}^{s-1} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s-1 \\ j_l \ge 1}} \frac{[W_{j_1}, [W_{j_2}, \dots, [W_{j_r}, F_0] \dots]}{(i\hbar)^r}. \tag{2.4}$$
Since $F_s$ depends on $W_1, \dots, W_{s-1}$, (2.1) yields the recursive homological equations:
$$\frac{[W_s, P_0]}{i\hbar} + F_s = Z_s, \qquad [P_0, Z_s] = 0. \tag{2.5}$$
To solve for $S$, $W_s$, $Z_s$, we can equivalently look for their symbols; from now on, we denote by the same letter, but in lower case, the symbol $\sigma(A)$ of an operator $A$, except for the symbol of $S$, denoted $\Sigma$. Let us now recall the following relevant results (see e.g. [Fo], Sect. 3.4):
1. $\sigma([A, B]/i\hbar) = \{a, b\}_M$, where $\{a, b\}_M$ is the Moyal bracket of $a$ and $b$.
2. Given $(g, g') \in \mathcal{A}_{\omega,\sigma}$, their Moyal bracket $\{g, g'\}_M$ is defined as $\{g, g'\}_M = g \# g' - g' \# g$, where $\#$ is the composition of $g$, $g'$ considered as Weyl symbols.
3. In the Fourier transform representation, used throughout the paper, the Moyal bracket has the expression
$$(\{g, g'\}_M)^\wedge(s) = \frac{2}{\hbar} \int_{\mathbb{R}^{2n}} \hat g(s^1)\, \hat g'(s - s^1)\, \sin\bigl[\hbar (s - s^1) \wedge s^1 / 2\bigr]\, ds^1, \tag{2.6}$$
where, given two vectors $s = (v, w)$ and $s^1 = (v^1, w^1)$, $s \wedge s^1 := \langle w, v^1 \rangle - \langle v, w^1 \rangle$.
4. $\{g, g'\}_M = \{g, g'\}$ if either $g$ or $g'$ is quadratic in $(x, \xi)$.
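Equations (2.2, 2.3, 2.4) then become, once written for the symbols:
$$\sigma\bigl(e^{iW(\varepsilon)/\hbar} H e^{-iW(\varepsilon)/\hbar}\bigr) = \sum_{l=0}^{\infty} \mathcal{H}_l, \qquad \mathcal{H}_0 := p_0 + \varepsilon f_0, \qquad \mathcal{H}_l := \frac{\{w, \mathcal{H}_{l-1}\}_M}{l}, \quad l \ge 1, \tag{2.7}$$
$$\Sigma(\varepsilon) = \sum_{s=0}^{k} \varepsilon^s p_s + \varepsilon^{k+1} r^{(k+1)}, \tag{2.8}$$
where
$$p_s := \{w_s, p_0\}_M + f_s, \quad s \ge 1, \qquad f_1 \equiv f_0, \tag{2.9}$$
$$f_s := \sum_{r=2}^{s} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s \\ j_l \ge 1}} \{w_{j_1}, \{w_{j_2}, \dots, \{w_{j_r}, p_0\}_M \dots\}_M + \sum_{r=1}^{s-1} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s-1 \\ j_l \ge 1}} \{w_{j_1}, \{w_{j_2}, \dots, \{w_{j_r}, f_0\}_M \dots\}_M, \quad s > 1. \tag{2.10}$$
In turn, the recursive homological equations become:
$$\{w_s, p_0\}_M + f_s = \zeta_s, \qquad \{p_0, \zeta_s\}_M = 0. \tag{2.11}$$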
2. Solution of the homological equation and estimates of the solution. f ∈ Aω,ρ,σ clearly entails the existence of the Fourier expansion of f φ,ω (u), and its uniform convergence with respect to φ ∈ T2 , u on compacts of R2 × R2 , and ω ∈ , namely: f φ,ω (u) = f ν,ω (u)eiν,φ =⇒ f (u) = f ν,ω (u). (2.12) ν∈Zl
ν∈Zl
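We further denote, for $\omega \in \Omega$, and $\rho > 0$:
$$\|f\|_{\omega,\sigma} := \sum_{\nu \in \mathbb{Z}^2} \|f_{\nu,\omega}\|_\sigma; \qquad \mathcal{A}_{\omega,\sigma} := \{f(u) \in \mathcal{F}_\sigma \mid \|f(u)\|_{\omega,\sigma} < +\infty\}, \tag{2.13}$$
$$\|f\|_{\omega,\rho,\sigma} := \sum_{\nu \in \mathbb{Z}^2} e^{\rho|\nu|} \|f_{\nu,\omega}\|_\sigma; \qquad \mathcal{A}_{\omega,\rho,\sigma} := \{f(u) \in \mathcal{A}_{\omega,\sigma} \mid \|f(u)\|_{\omega,\rho,\sigma} < +\infty\}, \tag{2.14}$$
$$\|f\|_{\Omega,\sigma} := \sup_{\omega \in \Omega} \|f\|_{\omega,\sigma}; \qquad \mathcal{A}_{\Omega,\sigma} := \{f(u) \in \mathcal{F}_\sigma \mid \|f(u)\|_{\Omega,\sigma} < +\infty\}, \tag{2.15}$$
$$\|f\|_{\Omega,\rho,\sigma} := \sup_{\omega \in \Omega} \|f\|_{\omega,\rho,\sigma}. \tag{2.16}$$
Hence $\mathcal{A}_{\Omega,\rho,\sigma} = \{f(u) \in \mathcal{F}_\sigma \mid \|f(u)\|_{\Omega,\rho,\sigma} < +\infty\}$ and clearly $\mathcal{A}_{\Omega,\rho,\sigma} \subset \mathcal{A}_{\Omega,\sigma} \subset \mathcal{F}_\sigma$. Moreover the following inequalities obviously hold:
$$\sup_{u \in \mathbb{R}^2 \times \mathbb{R}^2} |f_{\nu,\omega}(u)| \le \|\hat f_{\nu,\omega}(s)\|_{L^1} \le \|f_{\nu,\omega}\|_\sigma \le \|f\|_{\Omega,\sigma} \le \|f\|_{\Omega,\rho,\sigma}, \tag{2.17}$$
$$\|\hat f\|_{L^1} \le \|f\|_\sigma \le \|f\|_{\Omega,\sigma} \le \|f\|_{\Omega,\rho,\sigma}. \tag{2.18}$$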
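Now the key remark is that $\{a, p_0\}_M = \{a, p_0\}$ for any symbol $a$ because $p_0$ is quadratic in $(x, \xi)$. The homological equation (2.11) becomes therefore
$$\{w_s, p_0\} + f_s = \zeta_s, \qquad \{p_0, \zeta_s\} = 0. \tag{2.19}$$
We then have:

Proposition 2.1. Let $f \in \mathcal{A}_{\Omega,\rho,\sigma}$. Then the equation
$$\{w, p_0\} + f = \zeta, \qquad \{p_0, \zeta\} = 0 \tag{2.20}$$
admits the solutions $\zeta \in \mathcal{A}_{\Omega,\sigma}$, $w \in \mathcal{A}_{\Omega,\rho,\sigma}$,
$$\zeta := f_{0,\omega}; \qquad w := \sum_{\nu \ne 0} \frac{f_{\nu,\omega}}{i\langle \omega, \nu \rangle}, \tag{2.21}$$
with the property ζ ◦ φ = ζ; i.e., $\zeta$ depends only on $I_1$, $I_2$. Moreover:
$$\|\zeta\|_{\Omega,\sigma} \le \|f\|_{\Omega,\sigma}; \qquad \|w\|_{\Omega,\rho,\sigma} \le \|f\|_{\Omega,\rho,\sigma}; \qquad \|\nabla w\|_{\Omega,\rho,\sigma} \le \frac{4C}{\sigma} \|f\|_{\Omega,\rho,\sigma} \tag{2.22}$$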
for some C(, δ) > 0. To prove the proposition we need a preliminary result. Lemma 2.1. Let w be defined by (2.21), and φ,ω (x, ξ ) by (1.9). Set: φ,ω (x, ξ ) := iφ,iω (x, ξ ),
(2.23)
that is: φ,ω (x, ξ ) := (xk , ξk ), where: ⎧ ⎨
ξk sinh φk ω ⎩ ξ = ξ cosh φ + ω kx sinh φ k k k k k k xk = xk cosh φk +
k = 1, 2.
(2.24)
Then one has, uniformly with respect to (x, ξ ) on compacts of R4 : w ◦ φ,ω (x, ξ ) =
f ν,ω (x, ξ ) eiν,φ , φ ∈ T2 , iω, ν
(2.25)
ν =0
f ν,iω (x, ξ ) e−ν,φ , |φ| ≤ ρ − η, ∀ 0 < η < ρ. (2.26) w ◦ φ,ω (x, ξ ) = ω, ν ν =0
Moreover there is C(δ) > 0 such that: wω,ρ,σ ≤ C f ω,ρ,σ ; wiω,ρ,σ ≤ C f iω,ρ,σ .
(2.27)
Proof. Let us first prove that (2.21), whose convergence is proved below, solves (2.20), and that w ◦ φ,ω (x, ξ ) admits the representation (2.25). Following the argument of ([BGP]), Lemma 3.6, let us write: f ν,ω ◦ ωt,ω (u) d d w ◦
(x, ξ ) = { p0 , w}(x, ξ ) = ωt,ω dt t=0 dt t=0 iω, ν 0 =ν∈Z2 f ν,ω ◦ ωt,ω (u) f ν,ω (u)eiν,ωt d d = = dt t=0 iω, ν dt t=0 iω, ν 0 =ν∈Z2 0 =ν∈Z2 f ν,ω (u). = 0 =ν∈Z2
Clearly, this equality also entails ζ = f 0,ω . Consider now the expansions (2.25, 2.26). First, it is easy to check that ω ∈ if and only if iω ∈ . Now we have: wν,ω =
f ν,ω (x, ξ ) , iω, ν
and therefore, by a straightforward application of Lemma 2.5: wν,ω σ ≤ C f ν,ω σ . Hence: wω,ρ,σ =
eρ|ν| wν,ω σ ≤ C
ν∈Z2
eρ|ν| f ν,ω σ = f ω,ρ,σ ∀ ω ∈ .
ν∈Z2
Therefore q ∈ A,ρ,σ entails w ◦ ω,φ ∈ A,ρ,σ , whence the uniform convergence of the series (2.25). Now iω ∈ if ω ∈ ; hence w ◦ iω,φ ∈ A,ρ,σ . On the other hand, the replacement φ → iφ maps φ,iω (x, ξ ) into φ,ω (x, ξ ), and the series (2.26) is uniformly convergent if |Im φ| < ρ − η, 0 < η < ρ. Formula (2.26) is therefore proved. This concludes the proof of the lemma. Proof of Proposition 2.1. Let us first prove that ζ depends only on I1 , I2 . Consider for the sake of simplicity u = (x, ξ ) ∈ R2 . Since f ∈ A,ρ,σ , we can write: f φ,ω (x, ξ ) =
ξ iφ amn ξ −iφ m (x + )e )e + (x − 2m+n iω iω m.n=0 n
× (−iωx + ξ )eiφ + (iωx + ξ )e−iφ . ∞
The average over φ eliminates all terms but those proportional to [(x +
ξ k ξ )(x − )] [(−iωx + ξ )(iωx + ξ )]l , iω iω
i.e. to I k I l . The estimate ζ ω,σ ≤ f ω,σ is obvious, and entails ζ ,σ ≤ f ,σ . The second estimate in (2.22) has been proved in Lemma 2.1 above. To prove the third
one, consider the function f ◦ φ,ω (z) and compute, for j = 1, 2: d ∂w ∂ x j ∂w ∂ξ j w ◦ φ,ω (z)|φ=0 = + dφ j ∂ x j ∂φ j ∂ξ j ∂φ j ν j f ν,ω ∂w ξ j ∂w . = − ωjxj = ∂x j ωj ∂ξ j iω, ν 2
φ=0
0 =ν∈Z
Therefore, once more by Lemma 2.5, ∂w ξ j |ν j | ∂w f ν,ω ω,σ ≤ eρ|ν| ∂ x ω − ∂ξ ω j x j |ω, ν| j j j ω,ρ,σ 0 =ν∈Z2 ≤C eρ|ν| f ν,ω ω,σ = C f ω,ρ,σ . 0 =ν∈Z2
This yields: ∂w ξ j ∂w − ω x ≤ C f ,ρ,σ . j j ∂x ω ∂ξ j j j ,ρ,σ
(2.28)
In the same way: d ∂w ∂ x j ∂w ∂ξ j w ◦ φ,ω (z)|φ=0 = + dφ j ∂ x j ∂φ j ∂ξ j ∂φ j ν j f ν,iω ∂w ξ j ∂w = + ωjxj = ∂ x j ω j ∂ξ j ω, ν 2
φ=0
0 =ν∈Z
whence, by Lemma 2.5, ∂w ξ j |ν j | ∂w f ν,iω iω,σ + ω x ≤ eρ|ν| j j ∂x ω ∂ξ j |ω, ν| j j iω,ρ,σ 0 =ν∈Z2 ≤C eρ|ν| f ν,iω iω,σ = C f iω,ρ,σ . 0 =ν∈Z2
Recalling that ω ∈ if and only if iω ∈ we get: ∂w ξ j ∂w + ω x ≤ C f ,ρ,σ . j j ∂x ω ∂ξ j j j ,ρ,σ
(2.29)
Denote now s j , t j the Fourier dual variables of (x j , ξ j ), j = 1, 2. Then, by definition (we drop for the sake of simplicity the dependence of ω): ∂w ∂w (s j , t j ) σ (|s|+|t|) dsdt. ∂ x ξ j = 4 s j e ∂t j j R σ
Applying Lemma 2.3 to the integration over t j we get: ∂w s j w = ν,ω (s j , t j ) eσ (|s|+|t|) dsdt ∂x j ω,σ R4 ν∈Z2 ∂w 2 s j ν,ω (s j , t j ) eσ (|s|+|t|) dsdt ≤ 4 σ ∂t ν∈Z2
R
j
∂wν,ω 2 2 = ∂x ξj = σ σ j σ 2 ν∈Z
∂w . ∂x ξj j ω,σ
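Therefore, by (2.28, 2.29),
$$\Bigl\|\frac{\partial w}{\partial x_j}\Bigr\|_{\Omega,\rho,\sigma} \le \frac{2C|\omega_j|}{\sigma} \|f\|_{\Omega,\rho,\sigma}.$$
Analogously, applying this time Lemma 2.3 to the integration over $s_j$:
$$\Bigl\|\frac{\partial w}{\partial \xi_j}\Bigr\|_{\Omega,\rho,\sigma} \le \frac{2C}{\sigma|\omega_j|} \|f\|_{\Omega,\rho,\sigma}.$$
This is enough to prove the proposition.

3. Iterative Lemma.

Proposition 2.2. Set:
$$\mu := \frac{4 \|f_0\|_{\Omega,\rho,\sigma}}{\sigma}.$$
Let $\mu < 1/4$ and consider for $k = 1, 2, \dots$ the function
$$\Sigma_k := p_0 + \mathcal{Z}_k + v_k \tag{2.30}$$
with $\mathcal{Z}_k, v_k \in \mathcal{A}_{\Omega,\rho,\sigma}$, and let $\mathcal{Z}_k$ depend on $(I_1, I_2)$ only. Assume moreover:
$$\|\mathcal{Z}_k\|_{\Omega,\sigma} \le \begin{cases} 0, & \text{if } k = 0, \\ \sum_{s=0}^{k-1} (2\mu)^s, & \text{if } k \ge 1, \end{cases} \tag{2.31}$$
$$\|v_k\|_{\Omega,\rho,\sigma} \le (2\mu)^k \|f_0\|_{\Omega,\rho,\sigma}. \tag{2.32}$$
Let $S_k$ be the Weyl quantization of $\Sigma_k$. Then there exists a unitary map $T_k : L^2 \to L^2$, $T_k := e^{iW/\hbar}$, such that the Weyl symbol of the transformed operator $T_k S_k T_k^* := S_{k+1}$ is given by (2.30) with $k + 1$ in place of $k$ and satisfies (2.31, 2.32) with $k + 1$ in place of $k$.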
Proof. As in [BGP], Proposition 3.2, the homological equation: { p0 , w} + vk = Vk
(2.33)
determines the symbol w of W . Here the second unknown Vk has to depend on (x, ξ ) only through I1 , I2 . Applying Proposition 1 we find that w and Vk exist and fulfill the estimates w,ρ,σ ≤ f 0 ,ρ,σ (2µ)k ; ∇w,ρ,σ ≤ (2µ)k+1 ; Vk ,ρ,σ ≤ f 0 ,ρ,σ (2µ)k . Define now: Zk+1 := Zk + Vk ; vk+1 :=
Zkl +
l≥1
Zk0 := Zk ; Zkl :=
vkl +
l≥1
pl0 ,
l≥1
1 {w, Zkl−1 } M , l
and analogous definitions for vkl and pl0 . Clearly vk+1 ∈ A,ρ,σ by Lemma 2.4 below. Then the symbol of the transformed operator has the form (2.30) with k + 1 in place of k. To get the estimates, for k ≥ 1 we can write, by Proposition 1 and Lemmas 2.2, 2.3, and 2.4:
(2µ)k+1 ≤ (2µ)k+1 , (2µ)l = 1 − 2µ l≥1 µ l ≤ 2µ, ≤ Zk ,σ · pl0 ,ρ,σ ≤ (2µ)k+1 , 1−µ
(vkl ),ρ,σ ≤ (2µ)k
l≥1
l≥1
Zkl ,σ
l≥2
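whence the assertion in a straightforward way.

Proof of Theorem 1.1. By Proposition 2.2 there is $\varepsilon^* > 0$ such that
$$\lim_{k \to \infty} p_0 + \mathcal{Z}_k := \Sigma(\varepsilon)$$
exists in the $\|\cdot\|_{\Omega,\rho,\sigma}$ norm if $|\varepsilon| < \varepsilon^*$. Then $S(\varepsilon) := Op_h^W(\Sigma(\varepsilon))$ is unitarily equivalent to $H(\varepsilon)$. Since $\mathcal{Z}_k$ is a polynomial of order $k - 1$ in $\varepsilon$, we can write $\Sigma_k = p_0 + \sum_{l=1}^{k} \zeta^{(l)} \varepsilon^l + v_k$, where $\zeta^{(l)}(I_1, I_2)$ are solutions of the homological equations (2.11); therefore $S(\varepsilon)$ has the form (2.1). Note that $\lim_{k\to\infty} \|v_k\|_{\Omega,\rho,\sigma} = 0$ entails $\lim_{k\to\infty} \|R_k\|_{L^2 \to L^2} = 0$. To sum up, the Weyl symbol $\Sigma(\varepsilon, \hbar)$ has the convergent (uniform with respect to $\hbar$) normal form
$$\Sigma(\varepsilon, \hbar) = p_0(I) + \sum_{n=1}^{\infty} \mathcal{Z}_n(I, \hbar)\, \varepsilon^n.$$
Then the assertions of Theorem 1.1 follow exactly as in [Sj] (see also [BGP]). This concludes the proof.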
4. Auxiliary results. Lemma 2.2. Let (g, g , ∇g, ∇g ) ∈ Fσ . Then: {g, g } M σ ≤ ∇gσ ∇g σ . If
(g, g , ∇g, ∇g )
(2.34)
∈ Aω,ρ,σ then {g, g } M ω,ρ,σ ≤ ∇gω,ρ,σ ∇g ω,ρ,σ ,
(2.35)
and if (g, g , ∇g, ∇g ) ∈ A,ρ,σ : {g, g } M ,ρ,σ ≤ ∇g,ρ,σ ∇g ,ρ,σ .
(2.36)
Proof. We repeat the argument of [BGP], Lemma 3.1. We have |s ∧ s 1 | ≤ |s| · |s 1 |. Hence by (2.6) and the definition of the σ − norm we get: 2 σ |s| {g, g } M σ = e ds |g(s) ˆ gˆ (s − s 1 )| · |sinh(h¯ (s − s 1 ) ∧ s 1 )/2| ds 1 h¯ R2l R2l 2 1 ≤ ds eσ (|s|+|s |) |g(s) ˆ gˆ (s 1 )| · |sinh(h¯ s ∧ s 1 )/2| ds 1 2l h¯ R2l R 1 σ |s| ≤ e |g(s)| ˆ ds eσ |s | |gˆ (s 1 )| · |s ∧ s 1 | ds 1 = 2l R2l R 1 σ |s| ≤ e |g(s)||s| ˆ ds eσ |s | |gˆ (s 1 )| · |s 1 | ds 1 = ∇gσ ∇g σ . R2l
R2l
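The remaining two inequalities follow from the first one by exactly the same argument as in [BGP], Lemma 3.4. This concludes the proof of the lemma.

Lemma 2.3. Let $g \in F_\sigma$, $u = (x, \xi) \in \mathbb{R}^{2l}$. Then:
\[ \|g\|_\sigma \le \frac{1}{\sigma}\,\|u g\|_\sigma . \tag{2.37} \]

Proof. Setting $f(s) := \hat g(s)$, (2.37) is clearly equivalent to
\[ \int_{\mathbb{R}^{2l}} e^{\sigma|s|}\,|f(s)|\,ds \le \frac{1}{\sigma}\int_{\mathbb{R}^{2l}} e^{\sigma|s|}\,|\nabla f(s)|\,ds . \tag{2.38} \]
We may limit ourselves to proving this inequality in the one-dimensional case, namely to showing that:
\[ \int_{\mathbb{R}} e^{\sigma|s|}\,|f(s)|\,ds \le \frac{1}{\sigma}\int_{\mathbb{R}} e^{\sigma|s|}\,|f'(s)|\,ds . \tag{2.39} \]
To see this, first write, for $s > 0$:
\[ e^{\sigma s} f(s) = -\int_s^{\infty} e^{\sigma t} f'(t)\, e^{\sigma(s-t)}\,dt, \]
whence, for $A > 0$:
\[ \int_A^{\infty} |e^{\sigma s} f(s)|\,ds \le \iint_{A \le s \le t \le \infty} |f'(t)|\, e^{\sigma s}\,ds\,dt = \int_A^{\infty} |f'(t)| \int_A^{t} e^{\sigma s}\,ds\,dt = \sigma^{-1}\int_A^{\infty} |f'(t)|\,(e^{\sigma t} - e^{\sigma A})\,dt \le \sigma^{-1}\int_A^{\infty} |f'(t)|\, e^{\sigma t}\,dt . \]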
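Likewise, for $s < 0$, $A < 0$:
\[ e^{-\sigma s} f(s) = \int_{-\infty}^{s} e^{-\sigma t} f'(t)\, e^{-\sigma(s-t)}\,dt, \]
\[ \int_{-\infty}^{A} |e^{-\sigma s} f(s)|\,ds \le \iint_{-\infty \le t \le s \le A} |f'(t)|\, e^{-\sigma s}\,ds\,dt = \int_{-\infty}^{A} |f'(t)| \int_t^{A} e^{-\sigma s}\,ds\,dt = \sigma^{-1}\int_{-\infty}^{A} |f'(t)|\,(e^{-\sigma t} - e^{-\sigma A})\,dt \le \sigma^{-1}\int_{-\infty}^{A} |f'(t)|\, e^{-\sigma t}\,dt . \]
Performing the limit $A \to 0$ in both inequalities we get (2.39). This concludes the proof of the lemma.

Lemma 2.4. Let g ∈ A,ρ,σ , w ∈ A,ρ,σ . 1. Define gr :=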
1 {w, gr −1 } M , r
r ≥ 1; g0 := g.
Then gr ∈ A,ρ,σ and the following estimate holds: ∇w ,ρ,σ r g,ρ,σ . gr ,ρ,σ ≤ 4 σ
(2.40)
2. Let w solve the homological equation (2.11). Define the sequence pr 0 : r = 0, 1, . . .: p00 := p0 ;
pr 0 :=
1 {w, pr −10 } M , r ≥ 1. r
Then pr 0 ∈ Aω,σ and fulfills the following estimate: r −1 f 0 ,ρ,σ , r ≥ 1. pr 0 ,ρ,σ ≤ 4σ −1 ∇w,ρ,σ
(2.41)
Proof. Both estimates (2.40, 2.41) are straightforward consequences of Lemmas 2.2 and 2.3: as far as (2.41) is concerned, it is indeed enough to note that {w, p0 } = ζ − q, whence p10 ,ρ,σ + ∇ p10 ,ρ,σ ≤
4 f 0 ,ρ,σ . σ
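Lemma 2.5. If (A3) holds there is $C_\delta > 0$ independent of $\omega$ such that
\[ |\omega_1 \nu_1 + \omega_2 \nu_2| \ge C_\delta \sqrt{\nu_1^2 + \nu_2^2} . \tag{2.42} \]

Proof. We have to show the existence of $C_\delta > 0$ such that
\[ f(\nu_1, \nu_2) := \frac{|\omega_1 \nu_1 + \omega_2 \nu_2|^2}{\nu_1^2 + \nu_2^2} \ge C_\delta, \quad \forall\, (\nu_1, \nu_2) \in \mathbb{Z}^2,\ (\nu_1, \nu_2) \ne (0, 0). \tag{2.43} \]
Notice that $f$ is homogeneous of degree 0, namely $f(\mu\nu_1, \mu\nu_2) = f(\nu_1, \nu_2)$ $\forall\, (\nu_1, \nu_2) \in \mathbb{Z}^2$, $(\nu_1, \nu_2) \ne (0, 0)$, $\forall\, \mu \in \mathbb{R}$, $\mu \ne 0$. Hence it is enough to show that
\[ F(x, y) := |\omega_1 x + \omega_2 y|^2 \ge C_\delta, \quad \forall\, (x, y) \in S^1 \tag{2.44} \]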
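or, writing $x = \cos\theta$, $y = \sin\theta$:
\[ F(\theta) := \frac{1}{2}\left[|\omega_1|^2 + |\omega_2|^2\right] + \frac{1}{2}\left[|\omega_1|^2 - |\omega_2|^2\right]\cos 2\theta + \langle \omega_1, \omega_2\rangle \sin 2\theta \ge C . \]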
Note that F(0) = F(2π ) = |ω1 |2 . A simple study of the function F(θ ) : S 1 → R under the assumption (A2) shows the existence of Cδ ↓ 0 as δ ↑ 1 such that |F(θ )| ≥ Cδ ∀ θ ∈ S 1 . We omit the elementary details. Appendix Consider the function f : C4 → R, f (z) := e−|z| Pn (z), z ∈ C4 , |z|2 = 2
|z k |2 .
Here Pn (z) is a polynomial of degree n. Let us verify that f belongs to A,ρ,σ ; namely, there are ρ > 0, σ > 0 such that: eρ|ν| f ν,ω (u)σ < +∞. sup ω∈
ν∈Z2
It is clearly enough to consider the case u = (x, ξ ) ∈ R2 , n = 0. Set: ω := γ eiθ , 0 ≤ θ ≤ 2π , δ1 ≤ γ ≤ δ2 . Then: 2 | φ,ω (u)|2 = xcosφ + ωξ sinφ + |ξ cosφ − ωxsinφ|2 = Ax 2 + Bxξ + Cξ 2 A := cos2 φ + γ 2 sin2 φ;
B := cos θ (γ −1 − γ ) sin 2φ, C := cos2 φ + γ −2 sin2 φ.
Therefore we can write: f φ,ω (u) := f ◦ ω,φ (u) = e−Q(γ ,θ,φ)u,u , det Q = = Tr Q = κ :=
Q(γ , θ, φ) :=
A 21 B 1 2B C
,
cos4 φ + sin4 φ + [(γ −2 + γ 2 ) − cos2 θ (γ −1 − γ )2 ] sin2 φ cos2 φ 1 + κ(1 − cos2 θ )sin2 φcos2 φ, 2 + κsin2 φ γ −2 + γ 2 − 2 ≥ 0,
whence, ∀ (θ, φ) ∈ [0, 2π ] × [0, 2π ], 1 ≤ λ1 λ2 ≤ 1 + κ, 2 ≤ λ1 + λ2 ≤ 2 + κ, where 0 < λ1 (γ , θ, φ) ≤ λ2 (γ , θ, φ) denote the eigenvalues of Q(γ , θ, φ) > 0. This easily yields the uniform estimate: 1
1 ≤ λ1 (γ , θ, φ) ≤ λ2 (γ , θ, φ) ≤ D, D := 2 + κ + (2 + κ)2 − 4 . D 2 Consider now the Fourier coefficients f ν,ω (u) = f ν,γ ,θ (u): 2π 2π 1 1 f ν,γ ,θ (u) := f ◦ ω,φ (u)e−iνφ dφ = e−Q(γ ,θ,φ)u,u e−iνφ dφ, 2π 0 2π 0
and compute their Fourier transform: 2π 1 ˆ e−Q(γ ,θ,φ)u,u e−iνφ e−iu,s dφ du f ν,γ ,θ (s) = 2(π )2 R2 0 2π 2 −1 = e−Q (γ ,θ,φ)s,s/2 e−iνφ dφ, s ∈ R2 , √ 2 (2π ) det Q 0 1 C − 21 B . Q −1 (γ , θ, φ) = det Q − 21 B A Since 2 s, Q −1 (γ , θ, φ)s ≥ λ−1 2 s ≥
s2 D
∀ (θ, φ) ∈ [0, 2π ] × [0, 2π ] we get the (ν, θ, φ)-independent estimate 2π 2 1 2 −|s|2 /D | fˆν,γ ,θ (s)| ≤ e dφ = e−|s| /D . 2 (2π ) π 0 Therefore f ν,ω σ < +∞ ∀ σ > 0, ∀ ν ∈ Z2 . Let now φ ∈ C. Writing: det Q(γ , θ, φ) = 1 +
A(γ , θ ) 2 sin (2φ), 4
A(γ , θ ) := κ(1 − cos2 θ ) ≥ 0
we get (omitting the elementary details): det Q(γ , θ, φ) = 0,
|Im φ|
m(κ) − η independent of (γ , θ, s) ∈ [δ1 , δ2 ] × [0, 2π ] × R2 such that is analytic with respect to φ in the strip |Im φ|
, K 2 (η) > 0 independent of (γ , θ ) such that: |Q −1 (γ , θ, φ)s, s| ≥ K 1 |s|2 ,
1 < K 2 (η), √ | det Q(γ , θ, φ)|
and therefore K 2 (η) −K 1 |s|2 −ρ1 |ν| e | fˆν,γ ,θ (s)| ≤ e . 2π
This in turn entails the existence of K 3 (η) > 0 independent of ν such that, ∀ σ > 0: f ν,ω σ = eσ |s| | fˆν,γ ,θ (s)| ds ≤ K 3 e−ρ1 |ν| . R2
Hence, ∀ 0 < ρ < ρ1 : f ω,ρ,σ =
eρ|ν| f ν,ω σ < K (η)
ν∈Z2
for some K (η) > 0 independent of ω ∈ . We can thus conclude that f ,ρ,σ = sup eρ|ν| f ν,ω σ < K , ω∈
ν∈Z2
i.e., f ∈ A,ρ,σ . Remarks. We have checked that f ∈ A,ρ,σ . This entails f ∈ Fσ . By the Paley-Wiener 2 2 theorem, f φ,ω (u) = e−(Ax +Bxξ +Cξ ) must have, ∀ (φ, ω), a holomorphic continuation gφ,ω (z 1 , z 2 ) from u = (x, ξ ) ∈ R × R to z = (z 1 , z 2 ) = (x + i y, ξ + iη) ∈ C × C. This holomorphic continuation is clearly gφ,ω (z 1 , z 2 ) := e−Az 1 +Bz 1 z 2 +C z 2 . 2
2
gφ,ω (z 1 , z 2 ) of course does not coincide with z2 f ◦ φ,ω ((z 1 , z 2 )) = exp −[|z 1 cosφ + sinφ|2 + |z 2 cosφ − ωz 1 sinφ|2 ] ω when (y, η) = (0, 0). Acknowledgements. We thank Dario Bambusi for a critical reading of the manuscript and André Martinez for providing us a first proof of Lemma 2.3.
References
[BGP] Bambusi, D., Graffi, S., Paul, T.: Normal forms and quantization formulae. Commun. Math. Phys. 207, 173–195 (1999)
[Ch] Cherry, T.W.: On the solution of Hamiltonian systems of differential equations in the neighbourhood of a singular point. Proc. London Math. Soc. 27, 151–170 (1928)
[Fo] Folland, G.: Harmonic analysis in phase space. Princeton, NJ: Princeton University Press, 1988
[Ga] Gallavotti, G.: A criterion of integrability for perturbed harmonic oscillators. Wick ordering in classical mechanics. Commun. Math. Phys. 87, 365–383 (1982)
[GP] Graffi, S., Paul, T.: The Schrödinger equation and canonical perturbation theory. Commun. Math. Phys. 108, 25–41 (1987)
[MS] Melin, A., Sjöstrand, J.: Bohr-Sommerfeld quantization condition for non-selfadjoint operators in dimension 2. Autour de l'Analyse Microlocale. Astérisque No. 284, 181–244 (2003)
[Ot] Ottolenghi, A.: On convergence of normal forms for complex frequencies. J. Math. Phys. 34, 5205–5216 (1991)
[Ro] Robert, D.: Autour de l'approximation semiclassique. Basel: Birkhäuser, 1987
[Ru] Rüssmann, H.: Konvergente Reihenentwicklungen in der Störungstheorie der Himmelsmechanik. Selecta Mathematica, V, 93–60, Heidelberger Taschenbücher, 201. Berlin-New York: Springer, 1979
[Si] Siegel, C.L.: On the integrals of canonical systems. Ann. Math. 42, 806–822 (1941)
[Sj] Sjöstrand, J.: Semi-excited levels in non-degenerate potential wells. Asymptotic Analysis 6, 29–43 (1992)
[SM] Siegel, C.L., Moser, J.: Lectures on Celestial Mechanics. Berlin-Heidelberg-New York: Springer-Verlag, 1971
Communicated by B. Simon
Commun. Math. Phys. 278, 117–132 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0377-1
Communications in
Mathematical Physics
A Variational Analysis of Einstein–Scalar Field Lichnerowicz Equations on Compact Riemannian Manifolds
Emmanuel Hebey1, Frank Pacard2, Daniel Pollack3
1 Université de Cergy-Pontoise, Département de Mathématiques, Site de Saint-Martin, 2 Avenue Adolphe Chauvin, 95302 Cergy-Pontoise Cedex, France. E-mail: [email protected]
2 Université Paris XII, Département de Mathématiques, 61 Avenue du Général de Gaulle, 94010 Créteil Cedex, France. E-mail: [email protected]
3 University of Washington, Department of Mathematics, Box 354350, Seattle, WA 98195-4350, USA. E-mail: [email protected]
Received: 3 February 2007 / Accepted: 18 March 2007
Published online: 7 November 2007 – © Springer-Verlag 2007
Abstract: We establish new existence and non-existence results for positive solutions of the Einstein–scalar field Lichnerowicz equation on compact manifolds. This equation arises from the Hamiltonian constraint equation for the Einstein–scalar field system in general relativity. Our analysis introduces variational techniques, in the form of the mountain pass lemma, to the analysis of the Hamiltonian constraint equation, which has been previously studied by other methods. 1. Introduction One of the foundations in the mathematical analysis of the Einstein field equations of general relativity is the rigorous formulation of the Cauchy problem. The basic local existence result of Foures–Bruhat [10], and the important extension of this due to Choquet-Bruhat and Geroch [5], allows one to approach the study of globally hyperbolic spacetimes via the analysis of initial data sets. The Gauss and Codazzi equations impose constraints on the choices of initial data in general relativity, and these constraints are expressed by the Einstein constraint equations. This perspective, originally studied in the context of vacuum spacetimes, has also been successfully employed in the study of many non-vacuum models obtained by minimally coupling gravity to many of the classical matter and field sources, such as electromagnetism (via the Maxwell equations), Yang-Mills fields, fluids, and others [8,11,12]. One of the simplest non-vacuum systems is the Einstein–scalar field system which arises in coupling gravity to a scalar field satisfying a linear or non-linear wave equation with respect to the Lorentz metric describing the gravitational field. The Einstein–scalar field system, when posed in this generality, includes as special cases the (massless or massive) Einstein–Klein–Gordon equations as well as the vacuum Einstein equations with a (positive or negative) cosmological constant. Einstein–scalar field theories have been the subject of interesting developments in recent years. 
Among these are the recent attempts to use such theories to explain the
observed acceleration of the expansion of the universe [16]–[19]. Using the conformal method, Choquet-Bruhat, Isenberg, and Pollack [6,7] reformulated the constraint equations for the Einstein–scalar field system as a determined system of nonlinear partial differential equations. The equations are semi-decoupled in the constant mean curvature (CMC) setting. One of these equations, the conformally formulated momentum constraint, is a linear elliptic equation and its solvability is easy to address. The other one, the conformally formulated Hamiltonian constraint, is a nonlinear elliptic equation (the Einstein–scalar field Lichnerowicz equation) as in (1.1) below (see [3] for a survey on the constraint equations, and in particular, the conformal method). This nonlinear equation, which contains both a positive critical Sobolev nonlinearity and a negative power nonlinearity, turns out to be of great mathematical interest. In this paper we provide a variational analysis of this equation under certain conditions on its coefficients. The analysis of the Lichnerowicz equations which arise as the conformally formulated Hamiltonian constraint equations in both vacuum and non-vacuum settings has, in the past, been conducted primarily by either the method of sub- and supersolutions (i.e. a barrier method) or by perturbation or fixed point methods. This approach has been sufficient to allow for a complete understanding of solvability in, for example, the case of constant mean curvature vacuum initial data on compact manifolds [11]. In [7] this method was applied to constant mean curvature initial data for the Einstein–scalar field system on compact manifolds. In a number of cases, the method of sub- and supersolutions was shown to be sufficient to completely analyze the solvability of the Einstein–scalar field Lichnerowicz equation. In other cases, the limitations of this method were exposed and only partial results were obtained.
We establish here two general theorems concerning non-existence and existence respectively, of positive solutions to the Einstein–scalar field Lichnerowicz equation (1.1). These results are of interest due both to their application to questions of existence and non-existence of solutions of the Einstein–scalar field constraint equations, as well as, more generally, the introduction of variational techniques to the analysis of the constraint equations. We expect that similar variational techniques will be of use in resolving other open questions concerning initial data for the Cauchy problem in general relativity. In what follows we let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. We let also $H^1(M)$ be the Sobolev space of functions in $L^2(M)$ with one derivative in $L^2(M)$. The $H^1$-norm on $H^1(M)$ is given by
\[ \|u\|_{H^1} = \left( \int_M \left( |\nabla u|^2 + u^2 \right) dv_g \right)^{1/2} . \]
Let $2^\star = \frac{2n}{n-2}$, so that $2^\star$ is the critical Sobolev exponent for the embedding of $H^1$ into Lebesgue's spaces. Let also $h$, $A$, and $B$ be smooth functions on $M$. We consider the following Einstein–scalar field Lichnerowicz type equations:
\[ \Delta_g u + h u = B u^{2^\star - 1} + \frac{A}{u^{2^\star + 1}}, \tag{1.1} \]
where $\Delta_g = -\mathrm{div}_g \nabla$ is the Laplace–Beltrami operator, and $u > 0$. Unless otherwise stated, solutions are always required to be smooth and positive. The relationship between the coefficients in (1.1) and initial data for the Einstein–scalar field system is as follows (see [7] for more details). We first note that the sign convention for the Laplace–Beltrami operator which we use here is the opposite of the one used in [7]. The conformal initial data for the purely gravitational portion of the
Einstein–scalar field system consists of a background Riemannian metric g (indicating a choice of conformal class for the physical metric) together with a symmetric (0, 2)tensor σ which is divergence-free and trace-free with respect to g (so that σ is what is commonly referred to as a transverse-traceless, or TT-tensor) and a scalar function τ representing the mean curvature of the Cauchy surface M in the spacetime development of the initial data set. The initial data for the scalar field consists of two functions, ψ and π on M, representing respectively the initial value for the scalar field and its normalized time derivative. With respect to this set of conformal initial data, the constraint equations for the Einstein–scalar field system can be realized as a determined elliptic system whose unknowns consist of a positive scalar function φ and a vector field W on M. As previously remarked, in the CMC case (when τ is constant) this system becomes semi-decoupled. This means that the portion of it corresponding to the momentum constraint equation is a linear, elliptic, vector equation for W in which the unknown φ does not appear. This equation has a unique solution when (M, g) has no conformal Killing vector fields. The solution, W , of this “conformally formulated momentum constraint equation” then appears in the one of the coefficients of the “conformally formulated Hamiltonian constraint equation” which is what we refer to as the Einstein–scalar field Lichnerowicz equation. A positive solution φ of the Einstein–scalar field Lichnerowicz equation is then used with the vector field W to transform the “conformal” initial data set (g, σ, τ, ψ, π ) into a “physical” initial data set satisfying the Einstein–scalar field constraint equations (see [7]). 
In terms of the conformal initial data set and the vector field $W$ (satisfying the conformally formulated momentum constraint equation) the coefficients of the Einstein–scalar field Lichnerowicz equation (1.1) are
\[ h = c_n \left( R(g) - |\nabla\psi|_g^2 \right), \quad A = c_n \left( |\sigma + DW|_g^2 + \pi^2 \right) \quad \text{and} \quad B = -c_n \left( \frac{n-1}{n}\,\tau^2 - 4V(\psi) \right), \]
where $c_n = \frac{n-2}{4(n-1)}$, $R(g)$ is the scalar curvature, $\nabla$ is the covariant derivative for $g$, $V(\cdot)$ is the potential in the wave equation for the scalar field, and the operator $D$ is the conformal Killing operator relative to $g$, defined by $(DW)_{ab} := \nabla_a W_b + \nabla_b W_a - \frac{2}{n}\, g_{ab} \nabla_m W^m$. The kernel of $D$ consists of the conformal Killing fields on $(M, g)$. Note that relative to the notation of [7], we have $h = R_{g,\psi}$, $B = -B_{\tau,\psi}$ and $A = A_{g,W,\pi}$. We assume in what follows that $A \ge 0$ in $M$. This assumption implies no physical restrictions since we always have that $A \ge 0$ in the original Einstein–scalar field theory. One of the results of [7] is the definition of a conformal invariant, the Yamabe–scalar field conformal invariant, whose sign can be used, through a judicious choice of the background metric $g$, to control the sign of $h$. We prove two types of results in this paper. The first one, in Sect. 2, establishes a set of sufficient conditions to guarantee the nonexistence of positive solutions of (1.1). The second one, in Sect. 3, is concerned with the existence of positive solutions of (1.1). Our existence result corresponds to (but generalizes) the case of initial data with a positive Yamabe–scalar field conformal invariant considered in [7]. More specifically the results presented here should be contrasted with the partial results indicated in the third row of Table 2 of [7], and specifically with Theorems 4 and 5 in Section 5.4–5.5 of [7]. The results presented here apply, for example, when considering initial data for the Einstein–massive–Klein–Gordon system with small (relative to the mass), or zero, values of the mean curvature. The basic variational method employed here is to use the mountain
pass lemma [1,15] to solve a family of ε-approximated equations, and then let $\varepsilon \to 0$ to obtain a solution of (1.1). Finally, Sect. 4 contains a brief discussion of a class of slightly more general equations which arise when considering the Einstein–Maxwell–scalar field theory.

2. Nonexistence of Smooth Positive Solutions

Examples of nonexistence results involving pointwise conditions on $h$, $A$, and $B$ are easy to get. Let $u$ be a smooth positive solution of (1.1), and $x_0$ be a point where $u$ is minimum. Then $\Delta_g u(x_0) \le 0$ and we get that $h(x_0) u(x_0) \ge B(x_0) u(x_0)^{2^\star - 1} + A(x_0) u(x_0)^{-2^\star - 1}$. Let us assume that both $A$ and $B$ are positive functions. We have
\[ h(x_0) \ge B(x_0)\, X + A(x_0)\, X^{1-n}, \tag{2.1} \]
where we have set $X = u(x_0)^{\frac{4}{n-2}}$. Studying the least value of the right hand side of (2.1) (considered as a function of $X$), we get that (1.1) does not possess a smooth positive solution if
\[ \frac{n^n}{(n-1)^{n-1}} > \max_M \frac{(h^+)^n}{A\, B^{n-1}} . \tag{2.2} \]
It also follows from (2.1) that
\[ u(x) \ge u(x_0) \ge \left( \min_M \frac{A}{h^+} \right)^{\frac{n-2}{4(n-1)}} \]
for all $x \in M$. The idea of getting such a bound will be used again in Sect. 3 when proving Theorem 3.1. We now obtain a nonexistence result involving the Lebesgue norm of the functions $A$, $B$ and $h$.

Theorem 2.1. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. Let also $h$, $A$, and $B$ be smooth functions on $M$ with $A \ge 0$ in $M$. If $B > 0$ in $M$, and
\[ \left( \frac{n^n}{(n-1)^{n-1}} \right)^{\frac{n+2}{4n}} \int_M A^{\frac{n+2}{4n}} B^{\frac{3n-2}{4n}}\, dv_g > \int_M (h^+)^{\frac{n+2}{4}} B^{\frac{2-n}{4}}\, dv_g, \tag{2.3} \]
where $h^+ = \max(0, h)$, then the Einstein–scalar field Lichnerowicz equation (1.1) does not possess any smooth positive solution.

Proof. We assume that $B > 0$. Let $u$ be a smooth positive solution of (1.1). Integrating (1.1) over $M$ we get that
\[ \int_M B u^{2^\star - 1}\, dv_g + \int_M \frac{A\, dv_g}{u^{2^\star + 1}} = \int_M h u\, dv_g . \tag{2.4} \]
By Hölder's inequality,
\[ \int_M h u\, dv_g \le \left( \int_M (h^+)^{\frac{n+2}{4}} B^{\frac{2-n}{4}}\, dv_g \right)^{\frac{4}{n+2}} \left( \int_M B u^{2^\star - 1}\, dv_g \right)^{\frac{n-2}{n+2}} . \]
Again by using Hölder's inequality,
\[ \int_M A^{\frac{n+2}{4n}} B^{\frac{3n-2}{4n}}\, dv_g \le \left( \int_M B u^{2^\star - 1}\, dv_g \right)^{\frac{3n-2}{4n}} \left( \int_M \frac{A\, dv_g}{u^{2^\star + 1}} \right)^{\frac{n+2}{4n}} . \]
Collecting these inequalities and using (2.4), we get
\[ X + \left( \int_M A^{\frac{n+2}{4n}} B^{\frac{3n-2}{4n}}\, dv_g \right)^{\frac{4n}{n+2}} X^{1-n} \le \left( \int_M (h^+)^{\frac{n+2}{4}} B^{\frac{2-n}{4}}\, dv_g \right)^{\frac{4}{n+2}}, \tag{2.5} \]
where we have set
\[ X = \left( \int_M B u^{2^\star - 1}\, dv_g \right)^{\frac{4}{n+2}} . \]
The study of the minimal value of the function of $X$ which appears on the left hand side of (2.5) implies that
\[ \frac{n^n}{(n-1)^{n-1}} \left( \int_M A^{\frac{n+2}{4n}} B^{\frac{3n-2}{4n}}\, dv_g \right)^{\frac{4n}{n+2}} \le \left( \int_M (h^+)^{\frac{n+2}{4}} B^{\frac{2-n}{4}}\, dv_g \right)^{\frac{4n}{n+2}} . \]
This contradicts (2.3), which completes the proof of the theorem.

Many more restrictive nonexistence conditions can be obtained easily from (2.3). For example, replacing $B$ by $\min_M B$ in the two integrals in (2.3), we get that if
\[ \left( \frac{n^n}{(n-1)^{n-1}} \right)^{\frac{n+2}{4n}} \int_M A^{\frac{n+2}{4n}}\, dv_g > \frac{\int_M (h^+)^{\frac{n+2}{4}}\, dv_g}{(\min_M B)^{\frac{(n-1)(n+2)}{4n}}} \]
is fulfilled, then (2.3) holds true and the Einstein–scalar field Lichnerowicz equation (1.1) does not possess any smooth positive solution. In the same spirit, note that condition (2.2) is more restrictive than (2.3) since, for any triple of functions satisfying (2.2) we have
\[ \frac{n^n}{(n-1)^{n-1}}\, A\, B^{\frac{3n-2}{n+2}} > (h^+)^n B^{\frac{n(2-n)}{n+2}}, \]
and raising this to the power $\frac{n+2}{4n}$ and integrating the result over $M$ yields (2.3). In what follows we let $S = S(M, g)$, $S > 0$, be the Sobolev constant of $(M, g)$ defined as the smallest $S > 0$ such that
\[ \int_M |u|^{2^\star}\, dv_g \le S \left( \int_M \left( |\nabla u|^2 + u^2 \right) dv_g \right)^{\frac{2^\star}{2}} \tag{2.6} \]
for all $u \in H^1(M)$. Explicit upper bounds for $S$ can be given in special geometries, for instance when the Ricci curvature of the manifold is positive, see Ilias [13]. Concerning lower bounds, it is well-known that $S \ge K_n^{2^\star}$, where $K_n$ is the sharp Sobolev constant in the $n$-dimensional Euclidean space for the Sobolev inequality $\|u\|_{L^{2^\star}} \le K_n \|\nabla u\|_{L^2}$. By letting $u = 1$ in (2.6) we also get that $S \ge V_g^{-2^\star/n}$, where $V_g$ is the volume of $M$ with respect to $g$. Using this, we prove a nonexistence result for solutions with an a priori bound on their $H^1$ energy.
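The elementary minimization behind (2.2) and (2.5) can be checked numerically. The following is a minimal sketch, not part of the paper: the function names and the sample values are assumptions chosen for illustration. For $a, b > 0$, the function $X \mapsto bX + aX^{1-n}$ attains its minimum on $(0, +\infty)$ at $X_\ast = ((n-1)a/b)^{1/n}$, with minimal value $n (n-1)^{(1-n)/n} a^{1/n} b^{(n-1)/n}$; raising this value to the power $n$ produces the constant $n^n/(n-1)^{n-1}$.

```python
import math


def minimizer(a: float, b: float, n: int) -> float:
    """Critical point of X -> b*X + a*X**(1 - n) on (0, inf), for a, b > 0."""
    return ((n - 1) * a / b) ** (1.0 / n)


def objective(x: float, a: float, b: float, n: int) -> float:
    """Right-hand side of (2.1) viewed as a function of X (illustrative helper)."""
    return b * x + a * x ** (1 - n)


def closed_form_minimum(a: float, b: float, n: int) -> float:
    """Closed-form minimum n * (n-1)**((1-n)/n) * a**(1/n) * b**((n-1)/n)."""
    return n * (n - 1) ** ((1.0 - n) / n) * a ** (1.0 / n) * b ** ((n - 1.0) / n)


def pointwise_obstruction(h_plus: float, a: float, b: float, n: int) -> bool:
    """True when n^n/(n-1)^(n-1) > h_plus^n / (a * b^(n-1)), so (2.1) cannot hold."""
    return n ** n / (n - 1) ** (n - 1) > h_plus ** n / (a * b ** (n - 1))


if __name__ == "__main__":
    a, b, n = 2.0, 5.0, 3  # arbitrary sample values
    x_star = minimizer(a, b, n)
    print(objective(x_star, a, b, n), closed_form_minimum(a, b, n))
```

The obstruction test compares $h^+$ with the minimal value itself: it returns `True` exactly when $h^+$ lies below the minimum of the right hand side of (2.1).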
Theorem 2.2. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. Let also $h$, $A$, and $B$ be smooth functions on $M$ with $A \ge 0$ in $M$. If $B$ is arbitrary, not necessarily positive, and
\[ \int_M A^{\frac{1}{2}}\, dv_g > S \Lambda^{2^\star} \left( \max_M B^- + \frac{\max\left(1, \max_M h^+\right)}{S\, \Lambda^{\frac{4}{n-2}}} \right)^{\frac{1}{2}} \tag{2.7} \]
for some $\Lambda > 0$, where $B^- = \max(0, -B)$ and $S$ is as in (2.6), then the Einstein–scalar field Lichnerowicz equation (1.1) does not possess smooth positive solutions of energy $\|u\|_{H^1} \le \Lambda$. Moreover, (2.7) is sharp in the sense that the power $p = \frac{1}{2}$ in the left-hand side of (2.7) cannot be improved, and that the bound on the energy cannot be removed.

Proof. We prove here that (2.7) prohibits the existence of positive solutions of (1.1). The discussion on the sharpness of this condition is postponed until after the proof. Let $u$ be a smooth positive solution of (1.1) such that $\|u\|_{H^1} \le \Lambda$, $\Lambda > 0$. Let $C_h = \max\left(1, \max_M h^+\right)$, where $h^+ = \max(0, h)$. Then,
\[ \int_M \left( |\nabla u|^2 + h u^2 \right) dv_g \le C_h \int_M \left( |\nabla u|^2 + u^2 \right) dv_g . \tag{2.8} \]
Multiplying (1.1) by $u$, and integrating over $M$, we get by (2.8) that
\[ \int_M B u^{2^\star}\, dv_g + \int_M \frac{A\, dv_g}{u^{2^\star}} \le C_h \Lambda^2 . \tag{2.9} \]
By the Sobolev inequality (2.6) we can write that
\[ \int_M B u^{2^\star}\, dv_g \ge -\left( \max_M B^- \right) S \Lambda^{2^\star}, \tag{2.10} \]
where $B^- = \max(0, -B)$. Then, by combining (2.9)–(2.10) we get that
\[ \int_M \frac{A\, dv_g}{u^{2^\star}} \le C_h \Lambda^2 + \left( \max_M B^- \right) S \Lambda^{2^\star} . \tag{2.11} \]
Now, Hölder's inequality yields
\[ \int_M A^{\frac{1}{2}}\, dv_g \le \left( \int_M \frac{A\, dv_g}{u^{2^\star}} \right)^{\frac{1}{2}} \left( \int_M u^{2^\star}\, dv_g \right)^{\frac{1}{2}} . \tag{2.12} \]
By combining this inequality with (2.11), and by the Sobolev inequality (2.6), we get that
\[ \int_M A^{\frac{1}{2}}\, dv_g \le S \Lambda^{2^\star} \left( \max_M B^- + \frac{C_h}{S\, \Lambda^{\frac{4}{n-2}}} \right)^{\frac{1}{2}} . \]
This proves the theorem.
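The last step of the proof combines (2.11), (2.12) and (2.6) and then factors out $S\Lambda^{2^\star}$. As a hedged sanity check (an illustrative sketch; the function names and test values are assumptions, not taken from the paper), the code below evaluates both the unfactored product and the factored form appearing in (2.7), and confirms that for $\max_M B^- = 0$ the bound scales like $\Lambda^{2(n-1)/(n-2)}$.

```python
import math


def critical_exponent(n: int) -> float:
    """Critical Sobolev exponent 2* = 2n/(n-2), for n >= 3."""
    return 2.0 * n / (n - 2)


def bound_unfactored(S: float, lam: float, c_h: float, b_minus: float, n: int) -> float:
    """(C_h*lam^2 + b_minus*S*lam^(2*))^(1/2) * (S*lam^(2*))^(1/2), from (2.11), (2.12), (2.6)."""
    p = critical_exponent(n)
    return math.sqrt(c_h * lam ** 2 + b_minus * S * lam ** p) * math.sqrt(S * lam ** p)


def bound_factored(S: float, lam: float, c_h: float, b_minus: float, n: int) -> float:
    """S*lam^(2*) * (b_minus + C_h / (S*lam^(4/(n-2))))^(1/2), the right-hand side of (2.7)."""
    p = critical_exponent(n)
    return S * lam ** p * math.sqrt(b_minus + c_h / (S * lam ** (4.0 / (n - 2))))


if __name__ == "__main__":
    # Arbitrary sample values for S, Lambda, C_h, max B^- and n.
    print(bound_unfactored(1.3, 2.0, 1.5, 0.7, 4), bound_factored(1.3, 2.0, 1.5, 0.7, 4))
```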
We now discuss the sharpness of (2.7) in Theorem 2.2. The Yamabe equation on a Riemannian manifold $(M, g)$ may be written as
\[ \Delta_g u + \frac{n-2}{4(n-1)}\, R(g)\, u = u^{2^\star - 1}, \tag{2.13} \]
where $R(g)$ is the scalar curvature of $g$. A positive solution $u > 0$ of (2.13) corresponds to a conformally related metric $\tilde g = u^{2^\star - 2} g$ with constant positive scalar curvature $R(\tilde g) = \frac{4(n-1)}{n-2}$. Now, any solution of (2.13) is a solution of (1.1) when we let $h = \frac{n-2}{4(n-1)} R(g)$, $B = \alpha$, and $A = (1 - \alpha) u^{2 \cdot 2^\star}$ for some $\alpha \in \mathbb{R}$. This provides a transformation rule for rewriting equations like (2.13) into equations like (1.1). On the unit sphere $(S^n, g)$, for which $R(g) = n(n-1)$, we know (see, for instance, Aubin [2]) that there exist families $(u_\varepsilon)_\varepsilon$ of solutions of (2.13), $\varepsilon > 0$, such that $\|u_\varepsilon\|_{H^1} = K_n^{-n} + o(1)$ for all $\varepsilon > 0$, and $\|u_\varepsilon\|_{L^p} \to +\infty$ as $\varepsilon \to 0$ for all $p > 2^\star$, where $K_n$ is the sharp Sobolev constant in the $n$-dimensional Euclidean space for the Sobolev inequality $\|u\|_{L^{2^\star}} \le K_n \|\nabla u\|_{L^2}$. Letting $\alpha = \frac{1}{2}$, the above transformation rule (2.13)→(1.1) provides a family of Einstein–scalar field Lichnerowicz type equations indexed by $\varepsilon > 0$, with $h$ and $B$ independent of $\varepsilon$, such that any equation in the family possesses a solution of energy less than or equal to $2K_n^{-n}$, and for which $\int_M A_\varepsilon^p\, dv_g \to +\infty$ as $\varepsilon \to 0$ for all $p > \frac{1}{2}$. This proves that the power $p = \frac{1}{2}$ in the left hand side of (2.7) cannot be improved. This example can be modified in different ways with the constructions given in Brendle [4] and in Druet and Hebey [9].

We prove next that the bound on the energy in Theorem 2.2 cannot be removed. By Druet and Hebey [9] we know that on the unit sphere in dimension $n \ge 6$, or on any quotient $(M, g)$ of the unit sphere in dimension $n \ge 6$, there exist families $(h_\varepsilon)_\varepsilon$ of smooth functions, such that $h_\varepsilon \to \frac{n(n-2)}{4}$ in $C^1(M)$, and families $(u_\varepsilon)_\varepsilon$ of smooth positive functions such that, for any $\varepsilon > 0$, $u_\varepsilon$ solves the Yamabe type equation
\[ \Delta_{g_0} u_\varepsilon + h_\varepsilon u_\varepsilon = u_\varepsilon^{2^\star - 1}, \tag{2.14} \]
and such that $\|u_\varepsilon\|_{H^1} \to +\infty$ as $\varepsilon \to 0$. Rewriting (2.14) with the transformation rule (2.13)→(1.1), we see that the $u_\varepsilon$'s solve (1.1) with $h = h_\varepsilon$, $B = \alpha$, and $A = (1 - \alpha) u_\varepsilon^{2 \cdot 2^\star}$ for some $\alpha \in \mathbb{R}$. Letting $\alpha = \frac{1}{2}$, we get families of Einstein–scalar field Lichnerowicz type equations indexed by $\varepsilon > 0$ such that any equation in the family possesses a solution, $B$ is independent of $\varepsilon$, the $h_\varepsilon$'s converge in the $C^1$-topology to a positive constant function, and $\int_M A_\varepsilon^{1/2}\, dv_g \to +\infty$ as $\varepsilon \to 0$. In particular, we cannot hope to get that there exists $C = C(n, h, B)$, depending on the manifold and continuously on $h$ and $B$ in the $C^0$-topology, as is the case for the constant in (2.7) when $\Lambda$ is fixed, such that if $\int_M A^{1/2}\, dv_g \ge C$, then the Einstein–scalar field Lichnerowicz type equation (1.1) does not possess a smooth positive solution. This proves that the bound on the energy in Theorem 2.2 cannot be removed.

In the same circle of ideas, we mention that if $B > 0$ in $M$, then we can give another form to (2.7) where the constant appears as $C\Lambda^2$. In order to get this dependency in $\Lambda^2$ we may proceed as in the proof of Theorem 2.2, but now getting bounds from the estimate (2.9). By (2.9), since we assumed that $B > 0$ in $M$, we can write that
\[ \int_M u^{2^\star}\, dv_g \le \frac{C_h \Lambda^2}{\min_M B} \quad \text{and} \quad \int_M \frac{A\, dv_g}{u^{2^\star}} \le C_h \Lambda^2 . \tag{2.15} \]
Then, by (2.12) as in the proof of the second part of Theorem 2.2, we get from (2.15) that (1.1) does not possess a smooth positive solution if
\[ \int_M A^{\frac{1}{2}}\, dv_g > \frac{\max\left(1, \max_M h^+\right) \Lambda^2}{(\min_M B)^{\frac{1}{2}}} . \tag{2.16} \]
Condition (2.16) is complementary to the condition in Theorem 2.2. For large $\Lambda$'s, (2.16) is better than (2.7) since it involves the energy $\Lambda^2$ and not $\Lambda^{2(n-1)/(n-2)}$.

3. Existence of a Smooth Positive Solution

In this section we use the mountain pass lemma [1,15] to get existence results that complement the nonexistence results presented in Theorem 2.2. More precisely, we prove that if $\int_M A\, dv_g$ is sufficiently small, and $A > 0$ in $M$, then (1.1) possesses a solution. When $A \equiv 0$, (1.1) is the prescribed scalar curvature equation and we know from Kazdan and Warner [14] that there are situations in which the equation does not possess a solution. In the sequel we assume that the function $h$ is chosen so that $\Delta_g + h$ is coercive. This amounts to saying that there exists a constant $K_h = K(M, g, h) > 0$, such that
\[ \int_M |u|^2\, dv_g \le K_h \int_M \left( |\nabla u|^2 + h u^2 \right) dv_g \]
for all $u \in H^1(M)$. It will be convenient to define
\[ \|u\|_{H^1_h} = \left( \int_M \left( |\nabla u|^2 + h u^2 \right) dv_g \right)^{\frac{1}{2}} . \tag{3.1} \]
We also denote by $S_h = S(M, g, h) > 0$ the Sobolev constant defined to be the smallest constant $S_h > 0$ such that
\[ \int_M |u|^{2^\star}\, dv_g \le S_h \left( \int_M \left( |\nabla u|^2 + h u^2 \right) dv_g \right)^{\frac{2^\star}{2}} \tag{3.2} \]
for all $u \in H^1(M)$. Observe that, if $h > 0$ in $M$, then $\Delta_g + h$ is coercive and conversely coercivity implies that $\int_M h\, dv_g > 0$, and thus that $\max_M h > 0$. Also observe that if $A, B \ge 0$, $A + B > 0$, and if (1.1) possesses a smooth positive solution, then $\Delta_g + h$ is coercive. Indeed, in that case, there exists a function $u > 0$ such that $\Delta_g u + h u > 0$ everywhere in $M$, and the existence of such a $u$ implies the coercivity of $\Delta_g + h$. Finally, as already mentioned, when $h > 0$ in $M$, then $\Delta_g + h$ is coercive and we have the bound
\[ S_h \le \left( \max\left(1, \frac{1}{\min_M h}\right) \right)^{\frac{2^\star}{2}} S, \]
where $S = S(M, g) > 0$ is the Sobolev constant defined in (2.6). We prove here that the following existence result holds true.
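Theorem 3.1. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. Let $h$, $A$, and $B$ be smooth functions on $M$ for which $\Delta_g + h$ is coercive, $A > 0$ in $M$, and $\max_M B > 0$. There exists a constant $C = C(n)$, $C > 0$ depending only on $n$, such that if
\[ \|\varphi\|_{H^1_h}^{2^\star} \int_M \frac{A}{\varphi^{2^\star}}\, dv_g \le \frac{C}{(S_h \max_M |B|)^{n-1}} \tag{3.3} \]
and
\[ \int_M B \varphi^{2^\star}\, dv_g > 0 \]
for some smooth positive function $\varphi > 0$ in $M$, where $\|\cdot\|_{H^1_h}$ is as in (3.1) and $S_h$ is as in (3.2), then the Einstein–scalar field Lichnerowicz equation (1.1) possesses a smooth positive solution.

Proof (Preliminary computations). We define $I^{(1)} : H^1(M) \to \mathbb{R}$ by
\[ I^{(1)}(u) = \frac{1}{2} \int_M \left( |\nabla u|^2 + h u^2 \right) dv_g - \frac{1}{2^\star} \int_M B (u^+)^{2^\star}\, dv_g, \tag{3.4} \]
and if we fix $\varepsilon > 0$ we define $I_\varepsilon^{(2)} : H^1(M) \to \mathbb{R}$ by
\[ I_\varepsilon^{(2)}(u) = \frac{1}{2^\star} \int_M \frac{A\, dv_g}{(\varepsilon + (u^+)^2)^{2^\flat}}, \tag{3.5} \]
where $2^\flat = \frac{2^\star}{2}$. Obviously, for any $u \in H^1(M)$ we can write
\[ \Phi\left(\|u\|_{H^1_h}\right) \le I^{(1)}(u) \le \Psi\left(\|u\|_{H^1_h}\right) \tag{3.6} \]
if the functions $\Phi, \Psi : [0, +\infty) \to \mathbb{R}$ are defined by
\[ \Phi(t) = \frac{1}{2} t^2 - \frac{\max_M |B|}{2^\star}\, S_h\, t^{2^\star} \tag{3.7} \]
and
\[ \Psi(t) = \frac{1}{2} t^2 + \frac{\max_M |B|}{2^\star}\, S_h\, t^{2^\star} \tag{3.8} \]
for $t \in \mathbb{R}$, where $S_h > 0$ and $\|\cdot\|_{H^1_h}$ are as in (3.1) and (3.2). Let $t_0 > 0$ be given by
\[ t_0 = \left( \frac{1}{S_h \max_M |B|} \right)^{\frac{n-2}{4}} \tag{3.9} \]
so that $\Phi$ is increasing in $[0, t_0]$, and decreasing in $[t_0, +\infty)$. We define $\theta > 0$ such that
\[ \theta^2 = \frac{1}{2(n-1)} \]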
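and $t_1 = \theta t_0$ for $t_0$ as in (3.9). It is easy to check that
\[ \Psi(t_1) \le \theta^2\, \frac{2^\star + 2}{2^\star - 2}\, \Phi(t_0) \le \frac{1}{2} \Phi(t_0), \tag{3.10} \]
where $\Phi$ and $\Psi$ are as in (3.7) and (3.8). Finally, we define the functional
\[ I_\varepsilon = I^{(1)} + I_\varepsilon^{(2)}, \tag{3.11} \]
where $I^{(1)}$ and $I_\varepsilon^{(2)}$ are as in (3.4) and (3.5). Let $\varphi \in C^\infty(M)$, $\varphi > 0$ in $M$, be the function in the statement of the theorem. In particular
\[ \int_M B \varphi^{2^\star}\, dv_g > 0, \tag{3.12} \]
and, without loss of generality, we can assume that $\|\varphi\|_{H^1_h} = 1$. Now, provided the constant $C$ in (3.3) is chosen to be
\[ C = \theta^{2^\star}\, \frac{2^\star - 2}{4}, \]
we find that (3.3) precisely translates into
\[ \frac{1}{2^\star} \int_M \frac{A}{(t_1 \varphi)^{2^\star}}\, dv_g \le \frac{1}{2} \Phi(t_0), \tag{3.13} \]
and by (3.6), (3.10), and (3.13) we get that
\[ I_\varepsilon(t_1 \varphi) \le \Phi(t_0) < I_\varepsilon(t_0 \varphi). \tag{3.14} \]
Finally, (3.12) implies that
\[ \lim_{t \to +\infty} I_\varepsilon(t \varphi) = -\infty . \]
Hence we can choose $t_2 > t_0$ such that
\[ I_\varepsilon(t_2 \varphi) < 0, \tag{3.15} \]
where $I_\varepsilon$ is the functional in (3.11).

Application of the Mountain Pass Lemma. By (3.14) and (3.15), we can apply the mountain pass lemma [1,15] to the functional $I_\varepsilon$. Let
\[ c_\varepsilon = \inf_{\gamma \in \Gamma} \max_{u \in \gamma} I_\varepsilon(u), \tag{3.16} \]
where $\Gamma$ stands for the set of continuous paths joining $u_1 = t_1 \varphi$ to $u_2 = t_2 \varphi$. Observe that $c_\varepsilon > \Phi(t_0)$ and, taking the path $\gamma(t) = t \varphi$, for $t \in [t_1, t_2]$, we see that $c_\varepsilon$ is bounded uniformly as $\varepsilon$ tends to 0. We will keep in mind, for further use, that
\[ \Phi(t_0) < c_\varepsilon \le c \tag{3.17} \]
for all $\varepsilon$ small enough, where $c > 0$ is independent of $\varepsilon$.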
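The constants entering the mountain pass geometry can be checked numerically. The sketch below is illustrative only; the function names and the shorthand `m` for the product $S_h \max_M |B|$ are assumptions introduced here. It evaluates the two comparison functions of (3.7) and (3.8) (their difference and sum forms), the radius $t_0$ of (3.9), the ratio $\theta$, and the constant $C = \theta^{2^\star}(2^\star - 2)/4$, so that (3.10) and the passage from (3.3) to (3.13) under the normalization $\|\varphi\|_{H^1_h} = 1$ can be verified for sample values.

```python
import math


def two_star(n: int) -> float:
    """Critical exponent 2* = 2n/(n-2)."""
    return 2.0 * n / (n - 2)


def lower_profile(t: float, n: int, m: float) -> float:
    """t^2/2 - (m/2*) t^(2*), the lower comparison function of (3.7); m = S_h * max|B|."""
    p = two_star(n)
    return 0.5 * t ** 2 - (m / p) * t ** p


def upper_profile(t: float, n: int, m: float) -> float:
    """t^2/2 + (m/2*) t^(2*), the upper comparison function of (3.8)."""
    p = two_star(n)
    return 0.5 * t ** 2 + (m / p) * t ** p


def t_zero(n: int, m: float) -> float:
    """Maximum point of the lower profile, as in (3.9)."""
    return m ** (-(n - 2) / 4.0)


def theta(n: int) -> float:
    """theta > 0 with theta^2 = 1/(2(n-1))."""
    return math.sqrt(1.0 / (2.0 * (n - 1)))


def constant_c(n: int) -> float:
    """C = theta^(2*) * (2* - 2) / 4, the constant chosen in (3.3)."""
    p = two_star(n)
    return theta(n) ** p * (p - 2.0) / 4.0


if __name__ == "__main__":
    n, m = 3, 2.0  # arbitrary sample values
    t0 = t_zero(n, m)
    t1 = theta(n) * t0
    print(upper_profile(t1, n, m), 0.5 * lower_profile(t0, n, m))
```

In the equality case of (3.3), the quantity $\frac{1}{2^\star} t_1^{-2^\star} C\, m^{-(n-1)}$ coincides with half of the lower profile at $t_0$, which is the content of (3.13).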
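By the mountain pass lemma we get that there exists a sequence $(u_k)_k$ in $H^1(M)$ such that
$$I_\varepsilon(u_k) \to c_\varepsilon \quad\text{and}\quad I_\varepsilon'(u_k) \to 0 \tag{3.18}$$
as $k \to +\infty$. By (3.18),
$$\int_M (\nabla u_k\nabla\varphi)\,dv_g + \int_M h u_k\varphi\,dv_g - \int_M B(u_k^+)^{2^*-1}\varphi\,dv_g = \int_M \frac{A u_k^+\varphi\,dv_g}{(\varepsilon + (u_k^+)^2)^{\frac{2^*}{2}+1}} + o\big(\|\varphi\|_{H^1_h}\big) \tag{3.19}$$
for all $\varphi \in H^1(M)$, where $(\nabla u_k\nabla\varphi)$ stands for the pointwise scalar product of $\nabla u_k$ and $\nabla\varphi$ with respect to $g$, and
$$\frac12\int_M \big(|\nabla u_k|^2 + h u_k^2\big)\,dv_g - \frac{1}{2^*}\int_M B(u_k^+)^{2^*}\,dv_g + \frac{1}{2^*}\int_M \frac{A\,dv_g}{(\varepsilon + (u_k^+)^2)^{\frac{2^*}{2}}} = c_\varepsilon + o(1). \tag{3.20}$$
Combining (3.19) with $\varphi = u_k$, and (3.20), we get that
$$\frac1n\int_M B(u_k^+)^{2^*}\,dv_g + \frac{1}{2^*}\int_M \frac{A\,dv_g}{(\varepsilon + (u_k^+)^2)^{\frac{2^*}{2}}} + \frac12\int_M \frac{A(u_k^+)^2\,dv_g}{(\varepsilon + (u_k^+)^2)^{\frac{2^*}{2}+1}} = c_\varepsilon + o\big(\|u_k\|_{H^1_h}\big) + o(1), \tag{3.21}$$
and it follows from (3.21) that for $k$ sufficiently large,
$$\frac1n\int_M B(u_k^+)^{2^*}\,dv_g \le 2c_\varepsilon + o\big(\|u_k\|_{H^1_h}\big). \tag{3.22}$$
By (3.20) and (3.22) we then get that for $k$ sufficiently large,
$$\int_M \big(|\nabla u_k|^2 + h u_k^2\big)\,dv_g \le \frac{n-2}{n}\int_M B(u_k^+)^{2^*}\,dv_g + 4c_\varepsilon \le 2n c_\varepsilon + o\big(\|u_k\|_{H^1_h}\big). \tag{3.23}$$
In particular, by (3.22) and (3.23),
$$\int_M \big(|\nabla u_k|^2 + h u_k^2\big)\,dv_g \le 2n c_\varepsilon + 1, \quad\text{and}\quad -\frac{4n}{n-2}\,c_\varepsilon \le \int_M B(u_k^+)^{2^*}\,dv_g \le 3n c_\varepsilon \tag{3.24}$$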
for $k$ sufficiently large, where $c_\varepsilon$ is as in (3.16). By (3.24), the sequence $(u_k)_k$ is bounded in $H^1(M)$. Up to passing to a subsequence we may then assume that there exists $u_\varepsilon \in H^1(M)$ such that $u_k \rightharpoonup u_\varepsilon$ weakly in $H^1(M)$, $u_k \to u_\varepsilon$ strongly in $L^p(M)$ for some $p > 2$, and $u_k \to u_\varepsilon$ almost everywhere in $M$ as $k \to +\infty$. As a consequence,
$$(u_k^+)^{2^*-1} \rightharpoonup (u_\varepsilon^+)^{2^*-1} \text{ weakly in } L^{2^*/(2^*-1)}(M), \quad\text{and}\quad \frac{u_k^+}{(\varepsilon + (u_k^+)^2)^q} \to \frac{u_\varepsilon^+}{(\varepsilon + (u_\varepsilon^+)^2)^q} \text{ strongly in } L^2(M) \tag{3.25}$$
for all $q > 0$, as $k \to +\infty$. Indeed, by (3.24), the $(u_k^+)^{2^*-1}$'s are bounded in $L^{2^*/(2^*-1)}(M)$. Since they converge almost everywhere to $(u_\varepsilon^+)^{2^*-1}$, the first equation in (3.25) follows from standard integration theory. By Lebesgue's dominated convergence theorem we also have that $(\varepsilon + (u_k^+)^2)^{-q} \to (\varepsilon + (u_\varepsilon^+)^2)^{-q}$ strongly in $L^p(M)$ for all $p \ge 1$ and all $q > 0$, and since $u_k \to u_\varepsilon$ in $L^p(M)$ for some $p > 2$, we easily get that the second equation in (3.25) holds true. By (3.25), letting $k \to +\infty$ in (3.19), it follows that $u_\varepsilon$ satisfies
$$\Delta_g u_\varepsilon + h u_\varepsilon = B(u_\varepsilon^+)^{2^*-1} + \frac{A u_\varepsilon^+}{(\varepsilon + (u_\varepsilon^+)^2)^{\frac{2^*}{2}+1}} \tag{3.26}$$
in the weak sense. The weak maximum principle and (3.26) imply that $u_\varepsilon \ge 0$. As a consequence,
$$\Delta_g u_\varepsilon + h u_\varepsilon = B u_\varepsilon^{2^*-1} + \frac{A u_\varepsilon}{(\varepsilon + u_\varepsilon^2)^{\frac{2^*}{2}+1}} \tag{3.27}$$
in the weak sense.

Regularity and positivity of the solution. We may rewrite (3.27) as
$$\Delta_g u_\varepsilon + \Big(h - \frac{A}{(\varepsilon + u_\varepsilon^2)^{\frac{2^*}{2}+1}}\Big) u_\varepsilon = B u_\varepsilon^{2^*-1},$$
and since
$$h - \frac{A}{(\varepsilon + u_\varepsilon^2)^{\frac{2^*}{2}+1}} \in L^\infty(M),$$
the regularity arguments developed in Trudinger [20] apply to (3.27). It follows that $u_\varepsilon \in L^s(M)$ for some $s > 2^*$. Since we have that $A(\varepsilon + u_\varepsilon^2)^{-(\frac{2^*}{2}+1)} u_\varepsilon \in L^p(M)$ if $u_\varepsilon \in L^p(M)$, and $u_\varepsilon \in L^s(M)$ for some $s > 2^*$, the standard bootstrap procedure, together with regularity theory, gives that $u_\varepsilon \in H^{2,p}(M)$ for all $p \ge 1$, where $H^{2,p}$ is the Sobolev space of functions in $L^p$ with two derivatives in $L^p$. By the Sobolev embedding theorem we then get that the right-hand side in (3.27) is in $C^{0,\alpha}(M)$ for $\alpha \in (0,1)$, and by regularity theory it follows that $u_\varepsilon \in C^{2,\alpha}(M)$ for $\alpha \in (0,1)$. In particular, the strong maximum principle can be applied and we get that either $u_\varepsilon \equiv 0$, or $u_\varepsilon > 0$ in $M$. Then we easily get that $u_\varepsilon \in C^\infty(M)$ is smooth. By (3.24) and (3.25), letting $k \to +\infty$ in (3.21), we get that
$$\frac{1}{2^*}\int_M \frac{A\,dv_g}{(\varepsilon + u_\varepsilon^2)^{\frac{2^*}{2}}} \le (2^* - 1)\,c, \tag{3.28}$$
where $c$ is the upper bound for $c_\varepsilon$. If, for a sequence of $\varepsilon_j$ tending to 0, $u_{\varepsilon_j}$ were to be equal to 0, we would conclude that
$$\frac{1}{2^*(2^*-1)\,\varepsilon_j^{2^*/2}}\int_M A\,dv_g \le c, \tag{3.29}$$
which is clearly impossible since we have assumed that $A > 0$. Therefore, for $\varepsilon$ small enough, $u_\varepsilon \not\equiv 0$. Then, according to the above discussion, $u_\varepsilon$ is a smooth positive solution of (3.27). By (3.24), and standard properties of the weak limit, we also get that
$$\int_M \big(|\nabla u_\varepsilon|^2 + h u_\varepsilon^2\big)\,dv_g \le 2n c_\varepsilon + 1 \tag{3.30}$$
for all $\varepsilon > 0$ small enough.

Passing to the limit as $\varepsilon$ tends to 0. In what follows we let $(\varepsilon_k)_k$ be a sequence of positive real numbers such that $\varepsilon_k \to 0$ as $k \to +\infty$ and (3.29) holds true with $\varepsilon = \varepsilon_k$ for all $k$, and let $u_k = u_{\varepsilon_k}$. Then $u_k$ is a smooth positive function in $M$ such that
$$\Delta_g u_k + h u_k = B u_k^{2^*-1} + \frac{A u_k}{(\varepsilon_k + u_k^2)^{\frac{2^*}{2}+1}} \tag{3.31}$$
in $M$ while, by (3.17) and (3.30), the sequence $(u_k)_k$ is bounded in $H^1(M)$. Let $x_k$ be a point where $u_k$ is minimum. Then $\Delta_g u_k(x_k) \le 0$ and we get with (3.31) that
$$h(x_k) + |B|(x_k)\,u_k(x_k)^{2^*-2} \ge \frac{A(x_k)}{(\varepsilon_k + u_k(x_k)^2)^{\frac{2^*}{2}+1}}. \tag{3.32}$$
Let $\delta_0 > 0$ be such that
$$\delta_0^{2^*+2}\Big(\max_M h + \big(\max_M |B|\big)\,\delta_0^{2^*-2}\Big) = \frac{\min_M A}{2}.$$
By (3.32) we obtain that $u_k(x_k) \ge \delta_0$, and thus that
$$\min_M u_k \ge \delta_0 \tag{3.33}$$
when $k$ is sufficiently large. Since $(u_k)_k$ is bounded in $H^1(M)$ we may assume that there exists $u \in H^1(M)$ such that, up to passing to a subsequence, $u_k \rightharpoonup u$ weakly in $H^1(M)$, $u_k \to u$ strongly in $L^p(M)$ for some $p > 2$, and $u_k \to u$ almost everywhere in $M$ as $k \to +\infty$. By (3.33), $u \ge \delta_0$ almost everywhere in $M$. Still by (3.33), we get with similar arguments to those used to prove (3.25) that
$$u_k^{2^*-1} \rightharpoonup u^{2^*-1} \text{ weakly in } L^{2^*/(2^*-1)}(M), \quad\text{and}\quad \frac{u_k}{(\varepsilon_k + u_k^2)^{\frac{2^*}{2}+1}} \to \frac{1}{u^{2^*+1}} \text{ strongly in } L^2(M) \tag{3.34}$$
as $k \to +\infty$. By (3.31) and (3.34), letting $k \to +\infty$ in (3.31), we get that $u$ is a weak solution of the Einstein–scalar field Lichnerowicz equation (1.1). Rewriting (1.1) as
$$\Delta_g u + \Big(h - \frac{A}{u^{2^*+2}}\Big) u = B u^{2^*-1},$$
and since $h - A u^{-2^*-2} \in L^\infty(M)$, the regularity arguments developed in Trudinger [20] apply to (1.1). It follows that $u \in L^s(M)$ for some $s > 2^*$. Since $u \ge \delta_0$ almost everywhere, and $\delta_0 > 0$, the standard bootstrap procedure, together with regularity theory, gives that $u$ is a smooth positive solution of (1.1). This ends the proof of the theorem.
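The numerical constants in (3.21)–(3.28) come from elementary identities for the critical exponent $2^* = 2n/(n-2)$. The sketch below is not part of the original argument; it only checks these identities with exact rational arithmetic, and the helper names `critical_exponent` and `check_identities` are chosen here for illustration.

```python
from fractions import Fraction


def critical_exponent(n: int) -> Fraction:
    """Critical Sobolev exponent 2* = 2n/(n-2) in dimension n >= 3."""
    if n < 3:
        raise ValueError("the dimension n must be at least 3")
    return Fraction(2 * n, n - 2)


def check_identities(n: int) -> bool:
    """Check the exponent identities behind the constants in (3.21)-(3.28)."""
    two_star = critical_exponent(n)
    return all([
        # 1/2 - 1/2* = 1/n: coefficient of the B-term in (3.21)
        Fraction(1, 2) - 1 / two_star == Fraction(1, n),
        # 2/2* = (n-2)/n: coefficient of the B-term in (3.23)
        2 / two_star == Fraction(n - 2, n),
        # 2 * 2* = 4n/(n-2): constant in the lower bound of (3.24)
        2 * two_star == Fraction(4 * n, n - 2),
        # 2* - 1 = (n+2)/(n-2) = 1 + 4/(n-2): constant in (3.28)
        two_star - 1 == Fraction(n + 2, n - 2),
    ])


# Every dimension in this range should satisfy all four identities.
all_ok = all(check_identities(n) for n in range(3, 30))
```

Exact fractions are used instead of floats so that the equalities are tested literally rather than up to rounding.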
As a remark, the above proof provides an explicit expression for the dimensional constant $C$ in (3.3). As another remark, it can be noted that when $\int_M B\,dv_g > 0$, then we can take $\varphi$ to be constant in (3.12). In particular, our existence result has the following corollary.

Corollary 3.1. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$ and $h$ a smooth function on $M$ for which $\Delta_g + h$ is coercive. There exists a constant $C = C(n, h)$, $C > 0$, such that if $A$ and $B$ are smooth functions on $M$, with $A > 0$ in $M$, $\max_M B > 0$, and $\int_M B\,dv_g > 0$, and if we further assume that
$$\big(\max_M |B|\big)^{n-1}\int_M A\,dv_g \le C(n, h), \tag{3.35}$$
then the Einstein–scalar field Lichnerowicz equation (1.1) possesses a smooth positive solution.

When $A > 0$ and $B > 0$, we can also take $\varphi = A^{\frac{n-2}{4n}}$ in (3.12), and our existence result has the following corollary.
Corollary 3.2. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$ and $h$ a smooth function on $M$ for which $\Delta_g + h$ is coercive. There exists a constant $C = C(n, h)$, $C > 0$, such that if $A$ and $B$ are smooth functions on $M$, with $A > 0$ and $B > 0$ in $M$ and if we further assume that
$$\big(\max_M |B|\big)^{n-1}\,\big\|A^{\frac{n-2}{4n}}\big\|_{H^1}^{2^*}\int_M A^{\frac12}\,dv_g \le C(n, h), \tag{3.36}$$
then the Einstein–scalar field Lichnerowicz equation (1.1) possesses a smooth positive solution.

Interestingly, Sobolev embedding implies that
$$\int_M A^{\frac12}\,dv_g \le S\,\big\|A^{\frac{n-2}{4n}}\big\|_{H^1}^{2^*},$$
and so, if $A$ and $B$ satisfy (3.36), then
$$\big(\max_M |B|\big)^{n-1}\Big(\int_M A^{\frac12}\,dv_g\Big)^2 \le S\,C(n, h),$$
which is reminiscent of the condition (with the opposite inequality) that ensured the non-existence of a solution, which was obtained in Theorem 2.2.

4. Einstein-Maxwell-Scalar Field Theory

The methods employed in Sects. 2 and 3 are strong enough to deal with additional nonlinear negative power terms in the equation of the form $C u^{-p}$ for $C \ge 0$ and $p > 1$. Such terms arise, for example, in the Einstein–Maxwell–scalar field theory. Given $(M, g)$ compact of dimension $n \ge 3$, we let $h$, $A$, $B$, and $C$ be smooth functions in $M$, and we briefly discuss in this section equations of the form
$$\Delta_g u + h u = B u^{2^*-1} + \frac{A}{u^{2^*+1}} + \frac{C}{u^p}, \tag{4.1}$$
where $A, C \ge 0$ and $p > 1$. In the case of the Einstein–Maxwell–scalar field theory in (spatial) dimension $n = 3$ we have $p = 3$ and $C \ge 0$ represents the sum of the squares of the norms of the electric and magnetic fields on $M$. The approach we used to prove Theorem 2.2 deals with inequalities resulting from the signs of the coefficients and the powers of the unknown function $u$ and thus applies to (4.1). Let $\hat p = \frac{2^* + p - 1}{2^* - 1}$. Then, if we concentrate on getting nonexistence results of smooth positive solutions with no a priori bound on the energy, the approach we used to prove Theorem 2.2 gives in particular that (4.1) does not possess a smooth positive solution if $B > 0$ in $M$, $A, C \ge 0$ in $M$, and either (2.3) holds true, or
$$\Big(\frac{(\alpha+1)^{\alpha+1}}{\alpha^\alpha}\Big)^{\frac{1}{\hat p}}\int_M C^{\frac{1}{\hat p}}\,B^{\frac{\hat p - 1}{\hat p}}\,dv_g > \int_M (h^+)^{\frac{n+2}{4}}\,B^{\frac{2-n}{4}}\,dv_g, \tag{4.2}$$
where α = (n − 2)( p + 1)/4. We also do get similar conditions to (4.2) for the nonexistence of solutions of (4.1) of energy bounded by . The method we used to prove Theorem 3.1 applies to (4.1) as well. Assume g + h is coercive, A, C ≥ 0 in M, A + C > 0 in M, and max M B > 0. Following the proof of Theorem 3.1 we get that there exists = (n, p), > 0 depending only on n and p, such that if A C dv ≤ , dvg ≤ (4.3) g 2 n−1 p−1 α ϕ (S max |B|) ϕ (S max h M h M |B|) M M and
$$\int_M B\varphi^{2^*}\,dv_g > 0$$
for some smooth positive function $\varphi > 0$ in $M$ such that $\|\varphi\|_{H^1_h} = 1$, where $\|\cdot\|_{H^1_h}$ is as in (3.1), $S_h$ is as in (3.2), and $\alpha$ is as in (4.2), then (4.1) possesses a smooth positive solution. As for (3.3), the constant in (4.3) can be made explicit.

References

1. Ambrosetti, A., Rabinowitz, P.: Dual variational methods in critical point theory and applications. J. Funct. Anal. 14, 349–381 (1973)
2. Aubin, T.: Nonlinear Analysis on manifolds. Monge-Ampère Equations. Grund. der Math. Wissenschaften, 252. New York: Springer-Verlag, 1982
3. Bartnik, R., Isenberg, J.: The constraint equations. In: The Einstein Equations and the Large Scale Behavior of Gravitational Fields, edited by P.T. Chruściel, H. Friedrich, Basel: Birkhäuser, 2004, pp. 1–39
4. Brendle, S.: Blow-up phenomena for the Yamabe PDE in high dimensions. To appear J. Amer. Math. Soc., DOI:10.1090/S0894-0347-07-00575-9, 2007
5. Choquet-Bruhat, Y., Geroch, R.: Global aspects of the Cauchy problem in general relativity. Commun. Math. Phys. 14, 329–335 (1969)
6. Choquet-Bruhat, Y., Isenberg, J., Pollack, D.: The Einstein–scalar field constraints on asymptotically Euclidean manifolds. Chin. Ann. Math. Ser. B 27(1), 31–52 (2006)
7. Choquet-Bruhat, Y., Isenberg, J., Pollack, D.: The constraint equations for the Einstein–scalar field system on compact manifolds. Class. Quantum Grav. 24, 809–828 (2007)
8. Choquet-Bruhat, Y., York, J.: The Cauchy Problem. In: General Relativity and Gravitation - The Einstein Centenary, edited by A. Held, New York: Plenum, 1980, pp. 99–172
9. Druet, O., Hebey, E.: Blow-up examples for second order elliptic PDEs of critical Sobolev growth. Trans. Amer. Math. Soc. 357, 1915–1929 (2004)
10. Foures-Bruhat, Y.: Théorème d'existence pour certains systèmes d'équations aux dérivées partielles non linéaires. Acta Math. 88, 141–225 (1952)
11. Isenberg, J.: Constant mean curvature solutions of the Einstein constraint equations on closed manifolds. Class. Quantum Grav. 12, 2249–2274 (1995)
12. Isenberg, J., Maxwell, D., Pollack, D.: A gluing construction for non-vacuum solutions of the Einstein constraint equations. Adv. Theor. Math. Phys. 9(1), 129–172 (2005)
13. Ilias, S.: Constantes explicites pour les inégalités de Sobolev sur les variétés riemanniennes compactes. Ann. Inst. Fourier 33, 151–165 (1983)
14. Kazdan, J.L., Warner, F.W.: Scalar curvature and conformal deformation of Riemannian structure. J. Differ. Geom. 10, 113–134 (1975)
15. Rabinowitz, P.: Minimax methods in critical point theory with applications to differential equations, CBMS Regional Conference Series in Mathematics 65, Providence RI: Amer. Math. Soc., 1986
16. Rendall, A.: Accelerated cosmological expansion due to a scalar field whose potential has a positive lower bound. Class. Quantum Grav. 21, 2445–2454 (2004)
17. Rendall, A.: Mathematical properties of cosmological models with accelerated expansion. In: Analytical and numerical approaches to mathematical relativity, Lecture Notes in Phys. 692, Berlin: Springer, 2006, pp. 141–155
18. Rendall, A.: Intermediate inflation and the slow-roll approximation. Class. Quantum Grav. 22, 1655–1666 (2005)
19. Sahni, V.: Dark matter and dark energy. In: Physics of the Early Universe, edited by E. Papantonopoulos, Berlin: Springer, 2005
20. Trudinger, N.S.: Remarks concerning the conformal deformation of Riemannian structures on compact manifolds. Ann. Scuola Norm. Sup. Pisa 22, 265–274 (1968)

Communicated by G.W. Gibbons
Commun. Math. Phys. 278, 133–144 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0382-4
Communications in
Mathematical Physics
Catalytic Majorization and $\ell_p$ Norms
Guillaume Aubrun, Ion Nechita
Université de Lyon, Université Lyon 1, CNRS, UMR 5208 Institut Camille Jordan, Batiment du Doyen Jean Braconnier, 43, boulevard du 11 novembre 1918, 69622 Villeurbanne Cedex, France. E-mail:
[email protected];
[email protected] Received: 15 February 2007 / Accepted: 23 May 2007 Published online: 11 December 2007 – © Springer-Verlag 2007
Abstract: An important problem in quantum information theory is the mathematical characterization of the phenomenon of quantum catalysis: when can the surrounding entanglement be used to perform transformations of a jointly held quantum state under LOCC (local operations and classical communication)? Mathematically, the question amounts to describing, for a fixed vector $y$, the set $T(y)$ of vectors $x$ such that we have $x \otimes z \prec y \otimes z$ for some $z$, where $\prec$ denotes the standard majorization relation. Our main result is that the closure of $T(y)$ in the $\ell_1$ norm can be fully described by inequalities on the $\ell_p$ norms: $\|x\|_p \le \|y\|_p$ for all $p \ge 1$. This is a first step towards a complete description of $T(y)$ itself. It can also be seen as an $\ell_p$-norm analogue of the Ky Fan dominance theorem about unitarily invariant norms. The proof exploits links with another quantum phenomenon: the possibility of multiple-copy transformations ($x^{\otimes n} \prec y^{\otimes n}$ for given $n$). The main new tool is a variant of Cramér's theorem on large deviations for sums of i.i.d. random variables.

1. Introduction

The increasing interest that quantum entanglement has received in the past decade is due, in part, to its use as a resource in quantum information processing. We investigate the problem of entanglement transformation: under which conditions can an entangled state $|\phi\rangle$ be transformed into another entangled state $|\psi\rangle$? We restrict ourselves to LOCC protocols: Alice and Bob share $|\phi\rangle$ and have at their disposal only local operations (such as unitaries $U_A \otimes I_B$ for Alice) and classical communication. Nielsen showed in [15] that such a transformation is possible if and only if $\lambda_\phi \prec \lambda_\psi$, where "$\prec$" is the majorization relation and $\lambda_\phi$, $\lambda_\psi$ are the Schmidt coefficient vectors of $|\phi\rangle$ and $|\psi\rangle$, respectively. At practically the same time, Jonathan and Plenio [9] discovered a striking phenomenon: entanglement can help LOCC communication, without even being consumed. Precisely, they found states $|\phi\rangle$ and $|\psi\rangle$ such that $|\phi\rangle$ cannot be transformed into $|\psi\rangle$, but, with the help of a catalyst state $|\chi\rangle$, the transformation $|\phi\rangle \otimes |\chi\rangle \to |\psi\rangle \otimes |\chi\rangle$ is possible.
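To make the catalysis phenomenon concrete, the following sketch checks majorization through partial sums of decreasing rearrangements and exhibits a catalyst. The vectors `x`, `y`, `z` are a small illustrative choice and are not data from this paper; the function names are hypothetical helpers. Exact fractions are used so that ties between partial sums are not spoiled by rounding.

```python
from fractions import Fraction as F
from itertools import accumulate


def majorized_by(x, y):
    """Return True if x is majorized by y (x ≺ y).

    Both vectors are padded with zeros to a common length; x ≺ y means the
    totals agree and every partial sum of the decreasing rearrangement of x
    is at most the corresponding partial sum for y.
    """
    n = max(len(x), len(y))
    xs = sorted(x, reverse=True) + [F(0)] * (n - len(x))
    ys = sorted(y, reverse=True) + [F(0)] * (n - len(y))
    if sum(xs) != sum(ys):
        return False
    return all(a <= b for a, b in zip(accumulate(xs), accumulate(ys)))


def tensor(x, z):
    """Tensor product of two probability vectors (coordinate order is immaterial)."""
    return [a * b for a in x for b in z]


# Illustrative Schmidt-coefficient vectors and a two-dimensional catalyst.
x = [F(4, 10), F(4, 10), F(1, 10), F(1, 10)]
y = [F(1, 2), F(1, 4), F(1, 4), F(0)]
z = [F(6, 10), F(4, 10)]

direct = majorized_by(x, y)                            # plain majorization
catalytic = majorized_by(tensor(x, z), tensor(y, z))   # majorization with catalyst z
```

Here `direct` is false because the second partial sums are 4/5 for `x` and 3/4 for `y`, while `catalytic` is true, so `z` plays the role of the catalyst state.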
When such a catalyst exists, we say that the state $|\phi\rangle$ is trumped by $|\psi\rangle$ and we write $\lambda_\phi \prec_T \lambda_\psi$. We say then that $|\phi\rangle$ can be transformed into $|\psi\rangle$ by entanglement-assisted LOCC or ELOCC. It turns out that the trumping relation is much more complicated than the majorization relation; one can easily check on two given states $|\phi\rangle$ and $|\psi\rangle$ whether $\lambda_\phi \prec \lambda_\psi$ is satisfied or not, but there is no direct way to determine if $\lambda_\phi \prec_T \lambda_\psi$. Later, Bandyopadhyay et al. [2] discovered that a similar situation occurs when trying to transform by LOCC multiple copies of $|\phi\rangle$ into $|\psi\rangle$. It may happen that the transformation $|\phi\rangle \to |\psi\rangle$ is not possible, but when considering $n$ copies, one can transform $|\phi\rangle^{\otimes n}$ into $|\psi\rangle^{\otimes n}$. The phenomenon of multiple simultaneous LOCC transformations, or MLOCC, has been intensively studied in recent years and many similarities with ELOCC have been found [7,8].

In this note, we make some progress towards a complete characterization of both ELOCC and MLOCC. We show that a set of inequalities involving $\ell_p$ norms (see the remark on Conjecture 1 at the end of the paper) is equivalent to the fact that $|\phi\rangle$ can be approached by a sequence of states $|\phi_n\rangle$ which are MLOCC/ELOCC-dominated by $|\psi\rangle$. An important point is that we allow the dimension of $|\phi_n\rangle$ to exceed the dimension of $|\phi\rangle$. Our proof uses probabilistic tools; we introduce probability measures associated to $|\phi\rangle$ and $|\psi\rangle$ and we use large deviation techniques to show the desired result. Interestingly, the result can be reversed to give a characterization of $\ell_p$ norms that is similar to the Ky Fan characterization of unitarily invariant norms. We refer the interested reader to Sect. 3.

The rest of the paper is organized as follows: in Sect. 2 we introduce the notation and the general framework of entanglement transformation of bipartite states. We also state our main result, Theorem 1. The theorem is proved in Sect. 4. Conclusions and some directions for further study are sketched in Sect. 5.
The Appendix at the end of the paper contains basic results from large deviation theory needed in the proof of the main theorem.
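The $\ell_p$-norm inequalities mentioned above are easy to test numerically. The sketch below only illustrates the necessary direction on one example and is not the paper's proof: `x` and `y` are illustrative vectors (chosen so that `x` is catalytically majorized by `y` with the catalyst (0.6, 0.4)), the helper names are ours, and comparisons use a small floating-point tolerance.

```python
def lp_norm(v, p):
    """The l_p norm (sum_i |v_i|^p)^(1/p) of a finite vector, for p >= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def norms_dominated(x, y, exponents, tol=1e-12):
    """Check ||x||_p <= ||y||_p (up to tol) for every p in `exponents`."""
    return all(lp_norm(x, p) <= lp_norm(y, p) + tol for p in exponents)


# Illustrative probability vectors (an assumption of this sketch).
x = [0.4, 0.4, 0.1, 0.1]
y = [0.5, 0.25, 0.25, 0.0]

exponents = [1, 1.5, 2, 3, 5, 10, 50, 200]
forward = norms_dominated(x, y, exponents)   # inequalities hold on this grid
backward = norms_dominated(y, x, [2])        # fails in the reverse direction
```

A finite grid of exponents can refute the inequalities but never establish them for all $p \ge 1$; the sketch is meant only as a quick sanity check.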
2. Notation and Statement of the Results

For $d \in \mathbb{N}^*$, let $P_d$ be the set of $d$-dimensional probability vectors: $P_d = \{x \in \mathbb{R}^d \text{ s.t. } x_i \ge 0,\ \sum_i x_i = 1\}$. If $x \in P_d$, we write $x^\downarrow$ for the decreasing rearrangement of $x$, i.e. the vector $x^\downarrow \in P_d$ such that $x$ and $x^\downarrow$ have the same coordinates up to permutation, and $x_i^\downarrow \ge x_{i+1}^\downarrow$. We shall also write $x_{\max}$ for $x_1^\downarrow$ and $x_{\min}$ for the smallest nonzero coordinate of $x$. There is an operation on probability vectors that is fundamental in what follows: the tensor product $\otimes$. If $x = (x_1, \dots, x_d) \in P_d$ and $x' = (x'_1, \dots, x'_{d'}) \in P_{d'}$, the tensor product $x \otimes x'$ is the vector $(x_i x'_j)_{ij} \in P_{dd'}$; the way we order the coordinates of $x \otimes x'$ is immaterial for our purposes. We also define the direct sum $x \oplus x'$ as the concatenated vector $(x_1, \dots, x_d, x'_1, \dots, x'_{d'}) \in \mathbb{R}^{d+d'}$. If $x \in P_d$ satisfies $x_d = 0$, it will be useful to identify $x$ with the truncated vector $(x_1, \dots, x_{d-1}) \in P_{d-1}$. This identification induces a canonical inclusion $P_{d-1} \subset P_d$. Thus, every vector $x \in P_d$ can be thought of as a vector of $P_{d'}$ for all $d' \ge d$ by appending $d' - d$ null elements to $x$. We consider thus the set of all probability vectors $P_{<\infty} = \bigcup_{d > 0} P_d$. In other words, P

0, then $Q_{0,\beta}$ has $N - 2$ negative eigenvalues. In particular, for any $\beta \ne 0$, $Q_{0,\beta}$ is indefinite (and $H_{0,\beta}$ is therefore not convex).

Proof. We want to use the decomposition (61) of $Q_{0,\beta}$ to show that $Q_{0,\beta}$ can be deformed continuously to $\frac{\beta}{4N} P$: Consider for $0 \le t \le 1$, $Q_{0,\beta}(t) :=$
β (t + (1 − t) Id) P (t + (1 − t) Id). 4N
As t + (1 − t) Id is positive definite for any $0 \le t \le 1$ and $P$ is regular, $Q_{0,\beta}(t)$ is a symmetric regular $(N-1) \times (N-1)$-matrix for any $0 \le t \le 1$. For $t = 0$, $Q_{0,\beta}(0) = \frac{\beta}{4N} P$, whereas for $t = 1$, $Q_{0,\beta}(1) = Q_{0,\beta}$. Therefore, $\mathrm{index}(Q_{0,\beta})$ (i.e. the number of negative eigenvalues of $Q_{0,\beta}$) coincides with $\mathrm{index}(\frac{\beta}{4N} P)$. The eigenvalues of $P$ are $\mu_1 = 2N - 3$ with multiplicity one and $\mu_2 = -1$ with multiplicity $N - 2$.

We now turn to the case $\alpha \ne 0$.

Proposition 6.4. Assume that $N$ is odd and $\alpha \ne 0$ in (2). Then, for $\alpha$ fixed, $\det Q_{\alpha,\beta}$ is a polynomial in $\beta$ of degree $N - 1$ and has $N - 1$ real zeroes (counted with multiplicities) which we list in increasing order and denote by $\beta_k = \beta_k(\alpha)$ ($1 \le k \le N - 1$). They satisfy $0 < \beta_1 < \alpha^2$, $2\alpha^2 < \beta_2 \le \dots \le \beta_{N-1}$ and contain the $(N-1)/2$ distinct numbers
$$\alpha^2\Big(1 + \sin^{-2}\frac{k\pi}{N}\Big) \qquad \Big(1 \le k \le \frac{N-1}{2}\Big).$$
Moreover
$$\mathrm{index}(Q_{\alpha,\beta}) = \begin{cases} 1 & \text{for } \beta < \beta_1, \\ 0 & \text{for } \beta_1 < \beta < \beta_2, \\ N - 2 & \text{for } \beta > \beta_{N-1}. \end{cases}$$
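The explicit zeroes listed in Proposition 6.4 can be tabulated directly. The following sketch is only an illustration of the statement (it does not compute $\det Q_{\alpha,\beta}$, whose definition lies outside this section); the function name `explicit_zeroes` is ours.

```python
import math


def explicit_zeroes(alpha: float, N: int) -> list:
    """The numbers alpha^2 * (1 + 1/sin^2(k*pi/N)) for 1 <= k <= (N-1)/2, N odd."""
    if N < 3 or N % 2 == 0:
        raise ValueError("N must be an odd integer >= 3")
    return [
        alpha ** 2 * (1.0 + 1.0 / math.sin(k * math.pi / N) ** 2)
        for k in range(1, (N - 1) // 2 + 1)
    ]


# For odd N, sin^2(k*pi/N) < 1, so every listed zero exceeds 2*alpha^2,
# in line with the bound 2*alpha^2 < beta_2 stated in the proposition.
zeroes_N7 = explicit_zeroes(alpha=1.0, N=7)
```

Since $\sin^2(k\pi/N)$ is strictly increasing in $k$ on the range $1 \le k \le (N-1)/2$, the listed numbers are strictly decreasing in $k$, hence distinct.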
166
A. Henrici, T. Kappeler
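Proof. Fix $\alpha \in \mathbb{R} \setminus \{0\}$ and consider the map $\beta \mapsto \det(Q_{\alpha,\beta})$. It follows from (48) that $\det(Q_{\alpha,\beta})$ is a polynomial in $\beta$ of degree at most $N - 1$,
$$\det(Q_{\alpha,\beta}) = \sum_{j=0}^{N-1} q_j \beta^j,$$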
where $q_0 = \det(Q_{\alpha,0})$ and $q_{N-1} = \det(Q_{0,1})$. By Proposition 6.2, $\det(Q_{0,1}) \ne 0$, hence the degree of the polynomial $\det(Q_{\alpha,\beta})$ is $N - 1$. We claim that $\det(Q_{\alpha,\beta})$ has $N - 1$ real zeroes (counted with multiplicities). For $|\beta|$ large enough, $\mathrm{index}(Q_{\alpha,\beta})$ is equal to $\mathrm{index}(Q_{0,\beta})$. By Lemma 6.3, $\mathrm{index}(Q_{0,\beta})$ is $N - 2$ for $\beta > 0$ and 1 for $\beta < 0$. Hence there exists $R > 0$ such that $\mathrm{index}(Q_{\alpha,\beta}) = N - 2$ for any $\beta > R$ and $\mathrm{index}(Q_{\alpha,\beta}) = 1$ for any $\beta < -R$. For $\beta = \alpha^2$, $Q_{\alpha,\alpha^2}$ is a positive multiple of the identity matrix, hence $\mathrm{index}(Q_{\alpha,\alpha^2}) = 0$. It then follows that $\mathrm{index}(Q_{\alpha,\beta})$ must change at least once in the open interval $(-\infty, \alpha^2)$ and at least $N - 2$ times (counted with multiplicities) in $(\alpha^2, \infty)$. Since a change of $\mathrm{index}(Q_{\alpha,\beta})$ induces a zero of $\det(Q_{\alpha,\beta})$ (counted with multiplicities), our consideration shows that $\beta \mapsto \det(Q_{\alpha,\beta})$ has $N - 1$ real zeroes. Further we have $\beta_1(\alpha) < \alpha^2 < \beta_2(\alpha)$. Next we prove that $\beta_1(\alpha) > 0$, i.e. that $Q_{\alpha,\beta}$ is regular for any $\beta \le 0$. Write $Q_{\alpha,\beta}$ as a product, α2 − β Pα,β , 4N
Q α,β =
(63)
where is given by (62) and Pα,β is given by 1 γ (α, β) , Pα,β = −2 E + diag − 1+ 2 sin2 kπ N 1≤k≤N −1
(64)
where $E$ is given by (58) and
$$\gamma(\alpha, \beta) := \frac{\alpha^2}{\alpha^2 - \beta}. \tag{65}$$
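As $-\infty < \beta \le 0$ it follows that $0 < \gamma(\alpha, \beta) \le 1$ and $-\frac12\Big(1 + \frac{\gamma(\alpha,\beta)}{\sin^2\frac{k\pi}{N}}\Big) < 0$ for any $1 \le k \le N - 1$. Lemma 6.1 says that $P_{\alpha,\beta}$ is regular if $f(\gamma(\alpha, \beta)) \ne 0$, where
$$f(\gamma) := 1 - 2\sum_{k=1}^{N-1}\Big(1 + \gamma / \sin^2\frac{k\pi}{N}\Big)^{-1}$$
in the interval $0 < \gamma \le 1$. Note that $f(\gamma)$ is increasing in $0 < \gamma \le 1$ and $f(1)$ can be estimated as follows. Using that $N$ is assumed to be odd one has
$$f(1) = 1 - 4\sum_{k=1}^{\frac{N-1}{2}}\frac{\sin^2\frac{k\pi}{N}}{1 + \sin^2\frac{k\pi}{N}} \le 1 - 4\,\frac{\sin^2\frac{(N-1)\pi}{2N}}{1 + \sin^2\frac{(N-1)\pi}{2N}} = 1 - 4\,\frac{\cos^2\frac{\pi}{2N}}{1 + \cos^2\frac{\pi}{2N}} = -3 + \frac{4}{1 + \cos^2\frac{\pi}{2N}}.$$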
Results on Normal Forms for FPU Chains
167
As for $N \ge 3$,
$$-3 + \frac{4}{1 + \cos^2\frac{\pi}{2N}} \le -3 + \frac{4}{1 + \cos^2\frac{\pi}{6}} = -\frac57,$$
we conclude that $f(1) < 0$. Hence we have shown that $f(\gamma) < 0$ for $0 < \gamma \le 1$, and therefore $P_{\alpha,\beta}$ is regular for $\beta \le 0$ by Lemma 6.1. Hence we have proved that $0 < \beta_1(\alpha)$. By the same method we prove that $\beta_2(\alpha) > 2\alpha^2$, or equivalently, since we have already shown that $\beta_2(\alpha) > \alpha^2$, that $Q_{\alpha,\beta}$ is regular for any $\alpha^2 < \beta \le 2\alpha^2$. We decompose $Q_{\alpha,\beta}$ as in (63) and (64), and according to the definition (65) of $\gamma(\alpha, \beta)$, $\alpha^2 < \beta \le 2\alpha^2$ corresponds to $\gamma(\alpha, \beta) \le -1$. For such $\gamma$'s, we have $1 + \gamma / \sin^2\frac{k\pi}{N} < 0$, and hence $-\frac12\big(1 + \gamma / \sin^2\frac{k\pi}{N}\big) > 0$ for any $1 \le k \le N - 1$. Moreover, it also follows that $\big(1 + \gamma / \sin^2\frac{k\pi}{N}\big)^{-1} < 0$, which allows us to conclude that
$$f(\gamma) = 1 - 2\sum_{k=1}^{N-1}\Big(1 + \gamma / \sin^2\frac{k\pi}{N}\Big)^{-1} > 1 > 0.$$
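The sign claims for $f$ can be spot-checked numerically. The sketch below is an illustration rather than a proof: it evaluates $f$ in floating point on a few sample values of $\gamma$ and odd $N$, and the function name `f` simply mirrors the notation of the text.

```python
import math


def f(gamma: float, N: int) -> float:
    """f(gamma) = 1 - 2 * sum_{k=1}^{N-1} (1 + gamma / sin^2(k*pi/N))^(-1)."""
    return 1.0 - 2.0 * sum(
        1.0 / (1.0 + gamma / math.sin(k * math.pi / N) ** 2)
        for k in range(1, N)
    )


odd_N = range(3, 16, 2)

# f < 0 on sample points of (0, 1], matching the regularity of P for beta <= 0.
negative_on_unit_interval = all(
    f(g, N) < 0 for N in odd_N for g in (0.01, 0.1, 0.5, 1.0)
)

# f > 1 on sample points with gamma <= -1, matching alpha^2 < beta <= 2*alpha^2.
above_one_for_gamma_le_minus_one = all(
    f(g, N) > 1 for N in odd_N for g in (-1.0, -1.5, -10.0)
)
```

For $N = 3$ both sine values equal $\sqrt3/2$, so $f(1) = 1 - 4\cdot\frac{3/4}{7/4} = -\frac57$, which is the extremal value in the estimate above.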
According to Lemma 6.1, this proves the regularity of $P_{\alpha,\beta}$ and hence of $Q_{\alpha,\beta}$ for $\alpha^2 < \beta \le 2\alpha^2$. Finally introduce $\mu_k := -\frac12\big(1 + \gamma(\alpha, \beta) / \sin^2\frac{k\pi}{N}\big)$ and note that for $\beta$ with $\gamma(\alpha, \beta) = -\sin^2\frac{k_0\pi}{N}$ for some $1 \le k_0 \le \frac{N-1}{2}$ one has $\mu_{k_0} = \mu_{N-k_0} = 0$. As $k_0 \ne N - k_0$ if $1 \le k_0 \le (N-1)/2$, it then follows that $P_{\alpha,\beta}$ has two equal rows and is therefore singular. Note that $\gamma(\alpha, \beta) = -\sin^2\frac{k_0\pi}{N}$ corresponds to $\beta = \alpha^2\big(1 + \sin^{-2}\frac{k_0\pi}{N}\big)$ and we have proved that $\beta \mapsto \det(Q_{\alpha,\beta})$ has at least $(N-1)/2$ different zeroes in the interval $(\alpha^2, \infty)$. The statement about $\mathrm{index}(Q_{\alpha,\beta})$ easily follows from the above analysis.

Proof of Theorem 1.3. Part (i) is proved by Proposition 6.4, whereas (ii) follows from Proposition 6.2 and Lemma 6.3.

A. Proof of Lemma 4.3

For the convenience of the reader, we provide a detailed proof of Lemma 4.3 in this appendix. This lemma and its proof are due to Beukers and Rink; see ([13], Appendix A). Recall that $K_4 \setminus K_4^N \subseteq \mathbb{Z}^4$ denotes the subset of quadruples $(k_1, k_2, k_3, k_4)$ satisfying $1 \le |k_i| \le N - 1$ ($1 \le i \le 4$) and $k_1 + k_2 + k_3 + k_4 \equiv 0 \bmod N$ so that there are no integers $l, m$ with $\{l, m, -l, -m\} = \{k_1, k_2, k_3, k_4\}$, and $K_4^{res} := K_{res}^+ \cup K_{res}^- \subseteq K_4$, where
$$K_{res}^\pm := \Big\{(k_1, k_2, k_3, k_4) \in K_4 \;\Big|\; \exists\, l \in \mathbb{N} : 1 \le l \le \frac N4 \text{ so that } \{k_1, k_2, k_3, k_4\} = \Big\{\pm l,\ \pm l \mp N,\ \frac N2 \mp l,\ -\frac N2 \mp l\Big\}\Big\}.$$
Note that $K_4^{res} = \emptyset$ if $N$ is odd. Let us restate Lemma 4.3 as follows:
Lemma A.1. ([13]). Let $(k_1, k_2, k_3, k_4)$ be an element of $K_4 \setminus K_4^N$. Then $(k_1, k_2, k_3, k_4) \in K_4^{res}$ if and only if
$$\sin\frac{k_1\pi}{N} + \sin\frac{k_2\pi}{N} + \sin\frac{k_3\pi}{N} + \sin\frac{k_4\pi}{N} = 0.$$
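For small $N$ the equivalence in Lemma A.1 can be checked by exhaustive enumeration. The sketch below is an illustration under stated assumptions rather than part of the proof: the sets $\{k_1, \dots, k_4\}$ in the definitions of $K_4^N$ and $K_4^{res}$ are read as multisets, vanishing of the sine sum is tested with a floating-point tolerance, and all function names are ours.

```python
import math
from itertools import product


def in_K4N(q):
    """True if the multiset {k1,..,k4} has the form {l, m, -l, -m}.

    For nonzero entries this is the same as invariance under negation.
    """
    return sorted(q) == sorted(-k for k in q)


def resonant_multisets(N):
    """Sorted tuples representing K_res^+ and K_res^- (empty when N is odd)."""
    res = set()
    if N % 2 == 1:
        return res
    half = N // 2
    for l in range(1, N // 4 + 1):
        res.add(tuple(sorted((l, l - N, half - l, -half - l))))    # K_res^+
        res.add(tuple(sorted((-l, -l + N, half + l, -half + l))))  # K_res^-
    return res


def sine_sum(q, N):
    return sum(math.sin(k * math.pi / N) for k in q)


def lemma_a1_holds(N, tol=1e-9):
    """Brute-force check of Lemma A.1 over all quadruples in K_4 minus K_4^N."""
    res = resonant_multisets(N)
    values = [k for k in range(-(N - 1), N) if k != 0]
    for q in product(values, repeat=4):
        if sum(q) % N != 0 or in_K4N(q):
            continue  # q is not in K_4, or it lies in K_4^N
        vanishes = abs(sine_sum(q, N)) < tol
        if vanishes != (tuple(sorted(q)) in res):
            return False
    return True
```

The enumeration visits $(2N-2)^4$ quadruples, so it is only practical for small $N$; it is meant as a consistency check of the definitions, not as evidence for large $N$.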
Let us make a few preparations for the proof of Lemma A.1. By a straightforward computation one sees that the "only if"-part of the claimed equivalence holds:

Lemma A.2. For any $(k_1, k_2, k_3, k_4) \in K_4^{res}$, one has $\sum_{i=1}^4 \sin\frac{k_i\pi}{N} = 0$.
So it remains to prove the converse. First we consider some special cases.

Lemma A.3. Let $(k_1, k_2, k_3, k_4) \in K_4 \setminus (K_4^N \cup K_4^{res})$. If there exist $l, m, n \in \mathbb{Z}$ such that (i) $\{k_1, k_2, k_3, k_4\} = \{l, -l, m, n\}$, or (ii) $\{k_1, k_2, k_3, k_4\} = \{l, N - l, m, n\}$ with $1 \le l \le N - 1$, or (iii) $\{k_1, k_2, k_3, k_4\} = \{l, -N - l, m, n\}$ with $-(N-1) \le l \le -1$, then
$$\sum_{i=1}^4 \sin\frac{k_i\pi}{N} \ne 0.$$

Proof. In case (i), it follows that $m + n = N$ (and thus $1 \le m, n \le N - 1$) or $m + n = -N$ (and thus $-(N-1) \le m, n \le -1$). Hence in both cases, $\sin\frac{m\pi}{N}$ and $\sin\frac{n\pi}{N}$ have the same sign and $\sum_{i=1}^4 \sin\frac{k_i\pi}{N} = \sin\frac{m\pi}{N} + \sin\frac{n\pi}{N} \ne 0$. In the case (ii), by assumption, $m + n \equiv 0 \bmod N$. The case $m + n = 0$ has already been treated under (i). If $m + n = N$, then $\sin\frac{k_i\pi}{N} > 0$ for any $1 \le i \le 4$. If $m + n = -N$, then $m < 0$, and $m \notin \{-l, -N + l\}$. Thus $n = -N - m < 0$, and therefore $\sum_{i=1}^4 \sin\frac{k_i\pi}{N} = 2\sin\frac{l\pi}{N} - 2\sin\frac{(-m)\pi}{N} \ne 0$. The case (iii) is treated similarly as (ii).

Another special case is treated in the following lemma.

Lemma A.4. Assume that $(k_1, k_2, k_3, k_4) \in K_4 \setminus K_4^N$ satisfies
$$k_i + k_j \not\equiv 0 \bmod N \quad \forall\, 1 \le i, j \le 4. \tag{66}$$
If there exist $l, n \in \{k_1, k_2, k_3, k_4\}$ with
$$\sin\frac{l\pi}{N} + \sin\frac{n\pi}{N} = 0, \tag{67}$$
then
$$\sum_{i=1}^4 \sin\frac{k_i\pi}{N} = 0 \tag{68}$$
implies that $(k_1, k_2, k_3, k_4) \in K_4^{res}$.
Proof. From the assumptions (66)–(67) it follows that there exists $1 \le l \le N - 1$ so that $\{k_1, k_2, k_3, k_4\} = \{l, -N + l, m, n\}$ for some $m, n \in \mathbb{Z}$. Then $\sin\frac{l\pi}{N} + \sin\frac{(-N+l)\pi}{N} = 0$, and hence by (68), $\sin\frac{m\pi}{N} + \sin\frac{n\pi}{N} = 0$. W.l.o.g. assume that $1 \le m \le N - 1$. Then either $n = -m$ or $n = -N + m$. If $n = -m$, then $(k_1, k_2, k_3, k_4) \in K_4^{res}$ by Lemma A.3 (i). If $n = -N + m$, then one has
$$\sum_{i=1}^4 k_i = 2l - N + 2m - N = 2(l + m) - 2N.$$
Note that $2(l + m) - 2N$ cannot be an even multiple of $N$, as otherwise $l + m \equiv 0 \bmod N$, violating (66). If, in addition, $N$ is odd, then $2(l + m) - 2N$ cannot be an odd multiple of $N$. Hence in the case $N$ is odd we conclude that $\sum_{i=1}^4 k_i \not\equiv 0 \bmod N$, contradicting the assumption $(k_1, k_2, k_3, k_4) \in K_4$. If $N$ is even, it is however possible that $2(l + m) - 2N$ equals $\pm N$: If $2(l + m) - 2N = N$, i.e. $l + m = \frac32 N$, it follows that $\frac N2 < l, m \le N - 1$, and $(k_1, k_2, k_3, k_4) \in K_{res}^-$ as $\{k_1, k_2, k_3, k_4\} = \{-l', -l' + N, \frac N2 + l', -\frac N2 + l'\}$ with $l' = l - \frac N2$. If $2(l + m) - 2N = -N$, i.e. $l + m = \frac N2$, it follows similarly that $(k_1, k_2, k_3, k_4) \in K_{res}^+$ as $\{k_1, k_2, k_3, k_4\} = \{l, l - N, \frac N2 - l, -\frac N2 - l\}$. So in both cases, we conclude that $(k_1, k_2, k_3, k_4) \in K_4^{res}$.

In view of Lemma A.3 and Lemma A.4, in order to prove Lemma A.1 it remains to show the following

Lemma A.5. Assume that $(k_1, k_2, k_3, k_4) \in K_4$ satisfies (66). If for any $1 \le i, j \le 4$,
$$\sin\frac{k_i\pi}{N} + \sin\frac{k_j\pi}{N} \ne 0, \tag{69}$$
(and thus $(k_1, k_2, k_3, k_4) \notin K_4^N \cup K_4^{res}$), then
$$\sum_{i=1}^4 \sin\frac{k_i\pi}{N} \ne 0.$$
To prove Lemma A.5 let us first rewrite (68), using Euler's formula for the sine function,
$$\sum_{1 \le |j| \le 4} \zeta_j = 0, \tag{70}$$
where $\zeta_{\pm j} = \pm e^{\pm i k_j \pi / N}$ are $2N$-th roots of unity. Note that for any quadruple $(k_1, k_2, k_3, k_4) \in K_4 \setminus K_4^N$ satisfying (69) one has
$$\zeta_j + \zeta_{j'} \ne 0 \quad \forall\, 1 \le |j| \le |j'| \le 4.$$
Indeed for any $1 \le |j| \le |j'| \le 4$ one has $\mathrm{Im}\,\zeta_j + \mathrm{Im}\,\zeta_{j'} = \sin\frac{k_{|j|}\pi}{N} + \sin\frac{k_{|j'|}\pi}{N}$, which does not vanish by assumption (69). Let us first discuss Eq. (70) and its solutions in general, i.e. we consider the equation
$$\zeta_1 + \dots + \zeta_8 = 0 \tag{71}$$
and want to study its solutions, $(\zeta_l)_{1 \le l \le 8}$, on the unit circle $S^1 := \{z \in \mathbb{C} : |z| = 1\}$.
We need an auxiliary result which we discuss first. Let $n \ge 2$ be arbitrary and assume that the sequence $(\zeta_i)_{1 \le i \le n} \subseteq S^1$ has no vanishing subsums (i.e. $\sum_{l \in J} \zeta_l \ne 0$ for any $\emptyset \ne J \subsetneq \{1, \dots, n\}$) and satisfies the equation
$$\sum_{i=1}^n \zeta_i = 0. \tag{72}$$
Let $M \in \mathbb{N}$ be the smallest integer with the property that $(\zeta_i / \zeta_j)^M = 1$ for all $1 \le i, j \le n$. Then there exists $\xi \in S^1$ so that $\zeta_i^M = \xi^M$ for any $1 \le i \le n$. W.l.o.g. we can assume that $\xi = 1$. Furthermore, let $p^k$ be a prime power dividing $M$ so that $M / p^k$ and $p$ are relatively prime and define
$$M' := M / p \quad\text{and}\quad \eta := e^{2\pi i / p^k}. \tag{73}$$
Then for any $1 \le l \le n$ there exists a unique integer $0 \le \mu(l) \le p - 1$ such that $\zeta_l = \tilde\zeta_l \cdot \eta^{\mu(l)}$, where $\tilde\zeta_l$ is an element of the field $K := \mathbb{Q}(e^{2\pi i / M'})$. (As $\zeta_l^M = 1$ there exists $0 \le r_l \le M - 1$ with $\zeta_l = e^{\frac{2\pi i}{M} r_l}$. If $r_l \equiv 0 \bmod p$ choose $\mu(l) = 0$. If $r_l \not\equiv 0 \bmod p$ choose $1 \le \mu(l) \le p - 1$ so that $r_l \equiv \frac{M}{p^k}\mu(l) \bmod p$.) Hence (72) can be written as
$$0 = \sum_{l=1}^n \zeta_l = \sum_{s=0}^{p-1}\Big(\sum_{l \in \mu^{-1}(s)} \zeta_l\Big) = \sum_{s=0}^{p-1}\Big(\sum_{l \in \mu^{-1}(s)} \tilde\zeta_l\Big)\eta^s. \tag{74}$$
We need the following algebraic fact (see e.g. [17], §60–61):

Proposition A.6. The minimal polynomial of $\eta = e^{2\pi i / p^k}$ over the field $K = \mathbb{Q}(e^{2\pi i / M'})$ is given by $X^p - \eta^p$ if $k \ge 2$ and $X^{p-1} + X^{p-2} + \dots + X + 1$ if $k = 1$.

We now claim that $M$ is square-free, or equivalently that for any prime power $p^k$ dividing $M$,
$$k = 1. \tag{75}$$
Indeed, Eq. (74) shows that the minimal polynomial of $\eta$ has degree at most $p - 1$, which by Proposition A.6 is only satisfied in the case $k = 1$. Further we claim that there exists $\sigma \in \mathbb{C} \setminus \{0\}$ so that
$$\sum_{l \in \mu^{-1}(s)} \tilde\zeta_l = \sigma \quad \forall\, 0 \le s \le p - 1. \tag{76}$$
The existence of such a $\sigma$ follows from Proposition A.6: As $k = 1$ by (75), the minimal polynomial of $\eta$ over $K$ is given by $X^{p-1} + X^{p-2} + \dots + X + 1$. Since this is a polynomial of degree $p - 1$ the polynomial on the right-hand side of (74) must be a scalar multiple of the minimal polynomial. Hence all the coefficients $\sum_{l \in \mu^{-1}(s)} \tilde\zeta_l$ have the same value $\sigma \in \mathbb{C}$. As $\sum_{l \in \mu^{-1}(s)} \zeta_l = \sigma\eta^s$, the additional property $\sigma \ne 0$ follows from the assumption that there are no vanishing subsums. Hence we can assume w.l.o.g. that $\sigma = 1$. Next we claim that
$$p \le n. \tag{77}$$
In other words, possible prime factors of $M$ are bounded by the number of summands in (72). To prove (77), note that it follows from (76) that for any $0 \le s \le p - 1$ there exists $1 \le l \le n$ such that $\mu(l) = s$, i.e. the map $\mu : \{1, \dots, n\} \to \{0, \dots, p - 1\}$ is onto. This establishes (77). The map $\mu$ induces the partition $(|\mu^{-1}(s)|)_{0 \le s \le p-1}$ of the positive integer $n$ into $p$ summands,
$$n = \sum_{s=0}^{p-1} |\mu^{-1}(s)|. \tag{78}$$
Lemma A.7. ([13], Appendix A). For any solution $\{\zeta_1, \dots, \zeta_8\}$ of (71) contained in $S^1$ without vanishing subsums there exists $\xi \in S^1$ such that either
$$\{\zeta_1, \dots, \zeta_8\} = \{-\xi\alpha, -\xi\alpha^2\} \cup \{\xi\gamma^j \mid 1 \le j \le 6\} \tag{79}$$
or
$$\{\zeta_1, \dots, \zeta_8\} = \{-\xi\alpha^l, -\xi\alpha^l \cdot \beta^i, -\xi\alpha^l \cdot \beta^j \mid 1 \le l \le 2\} \cup \{\xi\beta^k, \xi\beta^m\}, \tag{80}$$
where the quadruple $(i, j, k, m)$ is a permutation of $(1, 2, 3, 4)$ and
$$\alpha := e^{\frac{2\pi i}{3}}, \quad \beta := e^{\frac{2\pi i}{5}}, \quad \gamma := e^{\frac{2\pi i}{7}}.$$
Proof. By a straightforward computation one verifies that the sets of the form (79) or (80) satisfy (71). It remains to prove that these are the only solutions of (71) of this type. We classify the solutions of (71) according to the possible values of $p$, which we now assume to be the largest prime dividing $M$. Since $n = 8$, by (77), the possible values of $p$ are 2, 3, 5, and 7.

If $p = 2$, then, by (75), $M = 2$ and therefore there exists $\xi \in S^1$ so that $\zeta_j = \pm\xi$ for any $1 \le j \le n$. In this case there exists a solution of (72) without vanishing subsums only if $n = 2$. (In this case, they are given by $\{\zeta_1, \zeta_2\} = \xi\{1, -1\}$ with $\xi \in S^1$.)

If $p = 3$, then $M = 3$ or $M = 3 \cdot 2$, and there exists a solution of (72) without vanishing subsums only if $n = 3$. (In this case, they are given by $\{\zeta_1, \zeta_2, \zeta_3\} = \xi\{1, \alpha, \alpha^2\}$ with $\xi \in S^1$.)

If $p = 5$, then $\eta = \beta$ in (73). Up to permutations, there are the following three partitions of 8 into 5 summands, $(4, 1, 1, 1, 1)$, $(3, 2, 1, 1, 1)$, and $(2, 2, 2, 1, 1)$. In a straightforward way one shows that the partitions $(4, 1, 1, 1, 1)$ and $(3, 2, 1, 1, 1)$ and their permutations give rise to solutions of Eq. (71) with vanishing subsums. E.g. the solutions corresponding to $(4, 1, 1, 1, 1)$ are given by $\xi \cdot (-\beta, -\beta^2, -\beta^3, -\beta^4, \beta, \beta^2, \beta^3, \beta^4)$ with $\xi \in S^1$, whereas the solutions corresponding to $(3, 2, 1, 1, 1)$ are $\xi \cdot (-i, 1, i, -\alpha\beta, -\alpha^2\beta, \beta^2, \beta^3, \beta^4)$ with $\xi \in S^1$. On the other hand the partition $(2, 2, 2, 1, 1)$ leads to the solutions $(\zeta_1, \dots, \zeta_8) = \xi(-\alpha, -\alpha^2, -\alpha\beta, -\alpha^2\beta, -\alpha\beta^2, -\alpha^2\beta^2, \beta^3, \beta^4)$ with $\xi \in S^1$. They are the solutions (80) with $(i, j, k, m) = (1, 2, 3, 4)$. Permutations of the partition $(2, 2, 2, 1, 1)$ again lead to solutions of the type (80), but with $(i, j, k, m)$ given by a permutation of $(1, 2, 3, 4)$.

If $p = 7$, then $\eta = \gamma$ in (73). Then, up to permutations, $(2, 1, 1, 1, 1, 1, 1)$ is the only possible partition of 8 into 7 summands. The partition $(2, 1, 1, 1, 1, 1, 1)$ leads to the solutions $(\zeta_1, \dots, \zeta_8) = \xi(-\alpha, -\alpha^2, \gamma, \dots, \gamma^6)$ with $\xi \in S^1$, where we used that $1 = -\alpha - \alpha^2$. They are of type (79). Any permutation of $(2, 1, 1, 1, 1, 1, 1)$ leads to the same kind of solutions.
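The two families in Lemma A.7 can be verified numerically for the representative choice $\xi = 1$ and $(i, j, k, m) = (1, 2, 3, 4)$. The sketch below is a floating-point illustration, not a replacement for the algebraic argument; the names `sol79`, `sol80`, and `has_vanishing_proper_subsum` are ours, and vanishing is tested up to a tolerance.

```python
import cmath
from itertools import combinations


def root_of_unity(num: int, den: int) -> complex:
    """exp(2*pi*i*num/den)."""
    return cmath.exp(2j * cmath.pi * num / den)


alpha = root_of_unity(1, 3)
beta = root_of_unity(1, 5)
gamma = root_of_unity(1, 7)

# Solution of type (79) with xi = 1.
sol79 = [-alpha, -alpha ** 2] + [gamma ** j for j in range(1, 7)]

# Solution of type (80) with xi = 1 and (i, j, k, m) = (1, 2, 3, 4).
sol80 = [-(alpha ** l) * beta ** e for l in (1, 2) for e in (0, 1, 2)]
sol80 += [beta ** 3, beta ** 4]


def has_vanishing_proper_subsum(zs, tol=1e-9):
    """True if some nonempty proper sub-collection of zs sums to (numerically) zero."""
    return any(
        abs(sum(subset)) < tol
        for size in range(1, len(zs))
        for subset in combinations(zs, size)
    )


total79 = abs(sum(sol79))  # should vanish up to rounding
total80 = abs(sum(sol80))  # should vanish up to rounding
```

Each list has 8 entries, so the subsum test inspects only 254 proper subsets and runs instantly.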
A. Henrici, T. Kappeler
Lemma A.8. ([13], Appendix A). For any solution $\{\zeta_1, \dots, \zeta_8\}$ of (71) contained in $S^1$ without vanishing subsums of length 2 but having a vanishing subsum of length 3, 4, or 5, there exist $\xi, \xi' \in S^1$ such that
$$\{\zeta_1, \dots, \zeta_8\} = \{\xi\alpha^l \mid 0 \le l \le 2\} \cup \{\xi'\beta^m \mid 0 \le m \le 4\}, \tag{81}$$
where again $\alpha = e^{2\pi i/3}$ and $\beta = e^{2\pi i/5}$.

Proof. Again, one verifies by a direct computation that the solutions (81) of (71) have the desired properties. It remains to prove that they are the only ones. First note that under the hypotheses of the lemma, vanishing subsums of length 4 cannot occur, since they would imply the existence of vanishing subsums of length 2, which is excluded by assumption. Hence, in order to find solutions of (72) for $n = 8$ with the desired properties, we have to find all solutions of (72) without vanishing subsums for $n = 3$ and $n = 5$. Note that by (77), $p = n$ for $n = 3$ or $n = 5$. By the considerations in the proof of Lemma A.7, the former ones are given by $(\zeta_1, \zeta_2, \zeta_3) = \xi(1, \alpha, \alpha^2)$ and the latter ones by $(\zeta_1, \dots, \zeta_5) = \xi'(1, \beta, \beta^2, \beta^3, \beta^4)$ with $\xi, \xi' \in S^1$. This proves the lemma.

We are now ready to prove Lemma A.5.

Proof of Lemma A.5. We first select from (79), (80) and (81) all the solutions $(\zeta_1, \dots, \zeta_8)$ of (71) which are of the form (68) (after multiplication by $2i$). This amounts to selecting the solutions $(\zeta_1, \dots, \zeta_8)$ of (71) having the property that $\{\zeta_1, \dots, \zeta_8\}$ is invariant under the map $\zeta \mapsto -\zeta^{-1}$. This requires choosing $\xi$ and $\xi'$ in (79), (80), and (81) appropriately. Let us explain this procedure in detail for the solutions of type (79). First we rewrite the solution (79),
$$(\zeta_1, \dots, \zeta_8) = \xi \cdot (-\alpha, -\alpha^2, \gamma, \gamma^2, \gamma^3, \gamma^4, \gamma^5, \gamma^6) = \left(e^{2\pi i t_k/42}\, e^{2\pi i x/42}\right)_{1 \le k \le 8},$$
where $\xi = e^{2\pi i x/42}$ with $x \in \mathbb{R}/42\mathbb{Z}$ and
$$(t_1, \dots, t_8) = (6, 7, 12, 18, 24, 30, 35, 36). \tag{82}$$
The required invariance of the set of the $\zeta_k$'s under the map $\zeta \mapsto -\zeta^{-1}$ is equivalent to the invariance of the set of the $(t_k + x)$'s under the map $t \mapsto 21 - t \pmod{42}$. Since the set (82) of the $t_k$'s is invariant under the map $t \mapsto -t \pmod{42}$, $\{t_k + x \mid 1 \le k \le 8\}$ is invariant under $t \mapsto 21 - t \pmod{42}$ if we choose $x := \frac{21}{2}$, or $\xi = i$. Then the equation $\sum_{i=1}^{8} \zeta_i = 0$ reads
$$e^{\frac{11\pi i}{14}} + e^{\frac{5\pi i}{6}} + e^{\frac{15\pi i}{14}} + e^{\frac{19\pi i}{14}} + e^{\frac{23\pi i}{14}} + e^{\frac{27\pi i}{14}} + e^{\frac{\pi i}{6}} + e^{\frac{3\pi i}{14}} = 0,$$
or
$$\sin\frac{\pi}{6} + \sin\frac{3\pi}{14} + \sin\frac{15\pi}{14} + \sin\frac{19\pi}{14} = 0.$$
Choosing all arguments in $(0, \pi)$, the latter identity reads
$$\sin\frac{\pi}{6} + \sin\frac{3\pi}{14} - \sin\frac{5\pi}{14} - \sin\frac{\pi}{14} = 0. \tag{83}$$
For the solutions of type (80), one gets
$$\sin\frac{\pi}{6} + \sin\frac{13\pi}{30} - \sin\frac{7\pi}{30} - \sin\frac{3\pi}{10} = 0 \tag{84}$$
and
$$\sin\frac{\pi}{6} + \sin\frac{\pi}{30} - \sin\frac{11\pi}{30} + \sin\frac{\pi}{10} = 0. \tag{85}$$
Let us briefly explain how (84)–(85) can be obtained. Note that of the 24 permutations of $(1, 2, 3, 4)$ in (80), only six lead to different sets of the $\zeta_i$'s, since interchanging $i$ and $j$ or $k$ and $m$ leaves the set on the right-hand side of (80) invariant. In the resulting six different cases, we again write $\{\zeta_1, \dots, \zeta_8\} = \xi \cdot \{e^{2\pi i t_1/30}, \dots, e^{2\pi i t_8/30}\}$ with $t_i$ in $\mathbb{R}/30\mathbb{Z}$. Then, up to translations, there are only two different types of solutions emerging from these six cases. With the appropriate choices of $\xi$, one gets the solutions (84) and (85). Finally, for the solutions of type (81), one gets
$$\sin\frac{\pi}{2} - \sin\frac{\pi}{6} + \sin\frac{\pi}{10} - \sin\frac{3\pi}{10} = 0. \tag{86}$$
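As a sanity check, the four sine identities (83)–(86) can be evaluated in floating point. This is only a numerical sketch, not a proof; the table `IDENTITIES` and the function `residual` are hypothetical names, and each term is encoded as (sign, numerator, denominator) of the argument in units of π.

```python
import math

# Each identity is a list of (sign, numerator, denominator) triples,
# standing for sign * sin(pi * numerator / denominator).
IDENTITIES = {
    "83": [(+1, 1, 6), (+1, 3, 14), (-1, 5, 14), (-1, 1, 14)],
    "84": [(+1, 1, 6), (+1, 13, 30), (-1, 7, 30), (-1, 3, 10)],
    "85": [(+1, 1, 6), (+1, 1, 30), (-1, 11, 30), (+1, 1, 10)],
    "86": [(+1, 1, 2), (-1, 1, 6), (+1, 1, 10), (-1, 3, 10)],
}


def residual(terms):
    """Left-hand side of the identity; should be zero up to rounding."""
    return sum(sign * math.sin(math.pi * num / den)
               for sign, num, den in terms)


if __name__ == "__main__":
    for label, terms in IDENTITIES.items():
        print(f"({label}) residual = {residual(terms):.2e}")
```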
The procedure to obtain (86) is basically the same as in the preceding cases. We write (81) as $\{\zeta_1, \dots, \zeta_8\} = \xi \cdot \{\alpha^l, \lambda \cdot \beta^m \mid 0 \le l \le 2,\ 0 \le m \le 4\}$ and first choose $\lambda \in S^1$ so that the set $\{\alpha^l, \lambda \cdot \beta^m \mid 0 \le l \le 2,\ 0 \le m \le 4\}$ is symmetric with respect to some axis through the origin, and then choose $\xi$ so that this axis is the imaginary axis.

To finish the proof of Lemma A.5 we show that all the solutions $(k_1, k_2, k_3, k_4)$ of $\sum_{i=1}^{4} \sin\frac{k_i\pi}{N} = 0$ obtained in (83)–(86), and the additional ones obtained by replacing $0 < x < \pi$ in $\sin x$ by $\pi - x$, satisfy $\sum_{i=1}^{4} k_i \not\equiv 0 \bmod N$ and hence are not in $K^4$. For the solutions obtained in (83)–(86), $N$ is even. Hence if $N$ is odd, then there is no quadruple $(k_1, k_2, k_3, k_4) \in K^4$ such that (68) and (69) are satisfied. This finishes the proof of Lemma A.5 in this case. For the rest of the proof, we assume that $N$ is even. If $N = 42r$ for some $r \in \mathbb{N}$, (83) becomes
$$\sin\frac{7r\pi}{42r} + \sin\frac{9r\pi}{42r} + \sin\frac{(-3r)\pi}{42r} + \sin\frac{(-15r)\pi}{42r} = 0,$$
and we have $7r + 9r - 3r - 15r = -2r \not\equiv 0 \bmod 42r$. Hence the corresponding quadruple $(k_1, k_2, k_3, k_4)$ is not in $K^4$. For the quadruples obtained by replacing $0 < x < \pi$ in $\sin x$ by $\pi - x$ in some of the summands in (83), the condition $\sum_{i=1}^{4} k_i \not\equiv 0 \bmod 42r$ amounts to
$$\pm 7 \pm 9 \pm 3 \pm 15 \not\equiv 0 \bmod 42 \tag{87}$$
for any combination of plus and minus signs. The relations (87) are easily verified. Similarly, one verifies that the quadruples $(k_1, k_2, k_3, k_4)$ satisfying (84), (85), or (86) are not in $K^4$ by showing that
$$\pm 5 \pm 13 \pm 7 \pm 9 \not\equiv 0, \quad \pm 5 \pm 1 \pm 11 \pm 3 \not\equiv 0, \quad \pm 15 \pm 5 \pm 3 \pm 9 \not\equiv 0 \bmod 30, \tag{88}$$
again for any combination of plus and minus signs. Hence we have shown that none of the solutions $(k_1, k_2, k_3, k_4)$ of (68) is an element of $K^4$. This completes the proof of Lemma A.5.

Proof of Lemma A.1. The claimed statement follows from Lemmas A.2, A.3, A.4, and A.5.
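The sign conditions (87) and (88) involve only sixteen sign patterns each, so they can be enumerated directly. The sketch below does this; the function name `all_signed_sums_nonzero` is a hypothetical helper, not notation from the paper.

```python
from itertools import product


def all_signed_sums_nonzero(ks, modulus):
    """True if no choice of signs makes +-k1 +-k2 ... vanish mod `modulus`."""
    return all(
        sum(s * k for s, k in zip(signs, ks)) % modulus != 0
        for signs in product((1, -1), repeat=len(ks))
    )


# (87): quadruple coming from identity (83), modulus 42.
CHECK_87 = all_signed_sums_nonzero((7, 9, 3, 15), 42)

# (88): quadruples coming from identities (84), (85), (86), modulus 30.
CHECK_88 = [
    all_signed_sums_nonzero((5, 13, 7, 9), 30),
    all_signed_sums_nonzero((5, 1, 11, 3), 30),
    all_signed_sums_nonzero((15, 5, 3, 9), 30),
]

if __name__ == "__main__":
    print("(87) holds:", CHECK_87)
    print("(88) holds:", all(CHECK_88))
```

Each check reduces to the observation that the signed sums have absolute value below the next nonzero multiple of the modulus that could be hit, so only a few target values need to be excluded.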
B. Details of Section 3

We begin by expressing the FPU Hamiltonian $H_V$ in relative coordinates. Introduce $(v = (v_j)_{1 \le j \le N-1}, v_N) \in \mathbb{R}^N$ given by (11). Then $(v, v_N) = Mq$ is the linear change of the coordinates $q_1, \dots, q_N$, where $M$ is given by
$$M = \begin{pmatrix} -1 & 1 & 0 & \dots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \dots & 0 & -1 & 1 \\ N^{-1} & \dots & \dots & \dots & N^{-1} \end{pmatrix}.$$
The variables $(u = (u_j)_{1 \le j \le N-1}, u_N) \in \mathbb{R}^N$ conjugate to $(v, v_N)$ are then given by $(M^T)^{-1} p$. The inverse of $M^T$, $(M^T)^{-1}$, can be computed to be
$$(M^T)^{-1} = \frac{1}{N} \begin{pmatrix} 1 & \dots & \dots & 1 \\ 2 & \dots & \dots & 2 \\ \vdots & & & \vdots \\ N & \dots & \dots & N \end{pmatrix} - \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 1 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 1 & \dots & \dots & 1 & 0 \\ 0 & \dots & \dots & \dots & 0 \end{pmatrix}. \tag{89}$$
Note that by (89), $u_k = kP - \sum_{j=1}^{k} p_j$ for any $1 \le k \le N-1$ and $u_N = NP$. Hence
$$p_1 = -u_1 + P; \qquad p_k = (u_{k-1} - u_k) + P \quad (2 \le k \le N-1); \qquad p_N = u_{N-1} + P,$$
and thus
$$\frac{1}{2} \sum_{j=1}^{N} p_j^2 = \frac{N P^2}{2} + \frac{1}{2}\left(u_1^2 + (u_1 - u_2)^2 + \dots + (u_{N-2} - u_{N-1})^2 + u_{N-1}^2\right).$$
Moreover, using that $q_{N+1} - q_N = q_1 - q_N = -\sum_{k=1}^{N-1} (q_{k+1} - q_k)$, one gets for any $s \in \mathbb{Z}_{\ge 1}$,
$$\sum_{j=1}^{N} (q_{j+1} - q_j)^s = \sum_{k=1}^{N-1} v_k^s + (-1)^s \left(\sum_{k=1}^{N-1} v_k\right)^s.$$
Combining the two expressions displayed above yields $H_V = \frac{N P^2}{2} + \tilde{H}_V$, where $\tilde{H}_V$ only depends on $(v, u)$ and is given by
$$\begin{aligned} \tilde{H}_V = {} & \frac{1}{2}\left(u_1^2 + \sum_{l=1}^{N-2} (u_{l+1} - u_l)^2 + u_{N-1}^2\right) + \frac{1}{2}\left(\sum_{k=1}^{N-1} v_k^2 + \Big(\sum_{k=1}^{N-1} v_k\Big)^2\right) \\ & + \frac{\alpha}{3!}\left(\sum_{k=1}^{N-1} v_k^3 - \Big(\sum_{k=1}^{N-1} v_k\Big)^3\right) + \frac{\beta}{4!}\left(\sum_{k=1}^{N-1} v_k^4 + \Big(\sum_{k=1}^{N-1} v_k\Big)^4\right) + O(v^5). \end{aligned} \tag{90}$$
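Formula (89) and the resulting splitting of the kinetic energy can be verified exactly for small $N$ with rational arithmetic. The following is a minimal sketch under the assumption that $P = \frac{1}{N}\sum_j p_j$ (consistent with $u_N = NP$ above); all function names are hypothetical.

```python
from fractions import Fraction


def build_M(N):
    """Matrix M: rows q_{j+1} - q_j for j < N, last row is the mean."""
    M = [[Fraction(0)] * N for _ in range(N)]
    for j in range(N - 1):
        M[j][j], M[j][j + 1] = Fraction(-1), Fraction(1)
    M[N - 1] = [Fraction(1, N)] * N
    return M


def claimed_inverse_of_MT(N):
    """Right-hand side of (89): entry (k, j) is k/N, minus 1 if j <= k <= N-1."""
    return [[Fraction(k, N) - (1 if (k <= N - 1 and j <= k) else 0)
             for j in range(1, N + 1)] for k in range(1, N + 1)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse_is_correct(N):
    """Check that (89) times M^T is the identity matrix."""
    MT = [list(col) for col in zip(*build_M(N))]
    prod = matmul(claimed_inverse_of_MT(N), MT)
    return all(prod[i][j] == (1 if i == j else 0)
               for i in range(N) for j in range(N))


def kinetic_split(p):
    """Return both sides of the identity for (1/2) * sum of p_j^2."""
    N = len(p)
    p = [Fraction(x) for x in p]
    u = [sum(a * b for a, b in zip(row, p)) for row in claimed_inverse_of_MT(N)]
    P = sum(p) / N
    lhs = sum(x * x for x in p) / 2
    diffs = sum((u[l + 1] - u[l]) ** 2 for l in range(N - 2))
    rhs = N * P * P / 2 + (u[0] ** 2 + diffs + u[N - 2] ** 2) / 2
    return lhs, rhs


if __name__ == "__main__":
    print([inverse_is_correct(N) for N in range(2, 8)])
    print(kinetic_split([3, -1, 4, 1, -5]))
```

Using `Fraction` avoids any rounding, so equality can be tested exactly.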
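Note that for any values of $\alpha$ and $\beta$, the point $(v, u) = (0, 0)$ is a critical point of the Hamiltonian $\tilde{H}_V$. To compute the Birkhoff normal form of $\tilde{H}_V$ up to order 2 near the fixed point $(v, u) = (0, 0)$, we take the expansion (90) as a starting point and use the linearization of the Birkhoff map at $(v, u) = (0, 0)$ (cf. [7]) to define new coordinates $(\xi_k, \eta_k)_{1 \le k \le N-1}$. The following lemma gives an independent proof of the fact that this linear map, defined by (14)–(17), is canonical.

Lemma B.1. The linear transformation $Z \to \mathbb{R}^{2N-2}$, $\zeta \mapsto (v, u)$, as defined by (14)–(17), is a canonical isomorphism.

Proof. First let us show
$$\{v_l(\zeta), u_m(\zeta)\} = i\,\delta_{lm}, \tag{91}$$
$$\{v_l(\zeta), v_m(\zeta)\} = 0, \tag{92}$$
$$\{u_l(\zeta), u_m(\zeta)\} = 0 \tag{93}$$
for any $1 \le l, m \le N-1$. Since $(v, u)$ are canonical coordinates on $\mathbb{R}^{2N-2}$, the proof of (91) amounts to showing that
$$\sum_{k=1}^{N-1} \left(\frac{\partial v_l}{\partial \zeta_k} \frac{\partial u_m}{\partial \zeta_{-k}} - \frac{\partial v_l}{\partial \zeta_{-k}} \frac{\partial u_m}{\partial \zeta_k}\right) = i\,\delta_{lm}$$
for any $1 \le l, m \le N-1$. It follows from (14)–(17) that for any $1 \le k \le N-1$,
$$\frac{\partial v_l}{\partial \zeta_k} = \frac{\lambda_k}{\sqrt{N}}\, e^{\pi i (2l-1)k/N}, \qquad \frac{\partial v_l}{\partial \zeta_{-k}} = \frac{\lambda_k}{\sqrt{N}}\, e^{-\pi i (2l-1)k/N},$$
$$\frac{\partial u_m}{\partial \zeta_k} = \frac{\lambda_k}{\sqrt{N}} \sum_{j=0}^{m-1} e^{2\pi i jk/N}, \qquad \frac{\partial u_m}{\partial \zeta_{-k}} = \frac{\lambda_k}{\sqrt{N}} \sum_{j=0}^{m-1} e^{-2\pi i jk/N}.$$
Hence
$$\begin{aligned} \frac{\partial v_l}{\partial \zeta_k} \frac{\partial u_m}{\partial \zeta_{-k}} - \frac{\partial v_l}{\partial \zeta_{-k}} \frac{\partial u_m}{\partial \zeta_k} &= \frac{\lambda_k^2}{N} \left(e^{\pi i (2l-1)k/N} \sum_{j=0}^{m-1} e^{-2\pi i jk/N} - e^{-\pi i (2l-1)k/N} \sum_{j=0}^{m-1} e^{2\pi i jk/N}\right) \\ &= \frac{\lambda_k^2}{N} \sum_{j=0}^{m-1} \left(e^{\frac{\pi i k}{N}(2l-2j-1)} - e^{\frac{\pi i k}{N}(2j-2l+1)}\right) \\ &= \frac{2i}{N} \sum_{j=0}^{m-1} \sin\frac{k\pi}{N}\, \sin\Big((2(l-j)-1)\frac{k\pi}{N}\Big) \\ &= \frac{i}{N} \sum_{j=0}^{m-1} \left(\cos\frac{2k\pi(1-(l-j))}{N} - \cos\frac{2k\pi(l-j)}{N}\right), \end{aligned}$$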
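where for the latter identity we used that $2 \sin x \sin y = \cos(x - y) - \cos(x + y)$. Taking the sum over $k$ and changing the order of summation then leads to
$$\begin{aligned} \sum_{k=1}^{N-1} \left(\frac{\partial v_l}{\partial \zeta_k} \frac{\partial u_m}{\partial \zeta_{-k}} - \frac{\partial v_l}{\partial \zeta_{-k}} \frac{\partial u_m}{\partial \zeta_k}\right) &= \frac{i}{N} \sum_{j=0}^{m-1} \sum_{k=1}^{N-1} \left(\cos\frac{2k\pi(1-(l-j))}{N} - \cos\frac{2k\pi(l-j)}{N}\right) \\ &= \frac{i}{N} \sum_{j=0}^{m-1} N (\delta_{l-j,1} - \delta_{l-j,0}) \\ &= i \sum_{j=0}^{m-1} (\delta_{l,j+1} - \delta_{l,j}) = i(\delta_{lm} - \delta_{l0}) = i\,\delta_{lm}, \end{aligned}$$
as claimed. To prove (92) and (93) one argues in a similar way. From (91)–(93) it immediately follows that the linear map $\zeta \mapsto (v, u)$ is a canonical isomorphism.

We now compute $\tilde{H}_V$ in terms of the new variables $\zeta$. Write $\tilde{H}_V$ as $\tilde{H}_V = H_u + H_v$, where $H_u$ and $H_v$ denote the $u$- and $v$-dependent parts of (90), respectively. We compute $H_u(\zeta)$ and $H_v(\zeta)$ separately. To obtain $H_u(\zeta)$, we substitute (14)–(16) into the expression $\frac{1}{2}\big(u_1^2 + \sum_{l=1}^{N-2} (u_{l+1} - u_l)^2 + u_{N-1}^2\big)$ and get
$$\begin{aligned} H_u(\zeta) &= \frac{1}{2N} \sum_{l=0}^{N-1} \left(\sum_{1 \le |k| \le N-1} \lambda_k\, e^{2\pi i lk/N} \zeta_k\right)^2 \\ &= \frac{1}{2N} \sum_{1 \le |k|, |k'| \le N-1} \lambda_k \lambda_{k'} \left(\sum_{l=0}^{N-1} e^{2\pi i l(k+k')/N}\right) \zeta_k \zeta_{k'}. \end{aligned}$$
Using again that $\sum_{l=0}^{N-1} e^{2\pi i lk/N} = N \delta_{k0}$ and $\lambda_k = \lambda_{-k}$ for any $1 \le |k| \le N-1$, one obtains
$$H_u(\zeta) = \sum_{k=1}^{N-1} \lambda_k^2\, \zeta_k \zeta_{-k}.$$
Before computing $H_v(\zeta)$, we simplify its expansion in terms of the variables $(v_k)_{1 \le k \le N-1}$. Define $v_0$ by the expression on the right-hand side of (17) evaluated at $l = 0$. Note that
$$\sum_{l=0}^{N-1} v_l = \frac{1}{\sqrt{N}} \sum_{1 \le |k| \le N-1} \lambda_k \zeta_k\, e^{-i\pi k/N} \sum_{l=0}^{N-1} e^{2\pi i lk/N} = 0.$$
Hence $\sum_{l=1}^{N-1} v_l = -v_0$ and therefore
$$H_v = \sum_{l=0}^{N-1} \left(\frac{1}{2} v_l^2 + \frac{\alpha}{3!} v_l^3 + \frac{\beta}{4!} v_l^4\right) + O(|v|^5). \tag{94}$$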
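The trigonometric sum behind the bracket relation (91) can be checked numerically for small $N$. The sketch below assumes $\lambda_k^2 = \sin(k\pi/N)$, which is what the passage from the exponential form to the sine form in the proof of Lemma B.1 uses; the definition of $\lambda_k$ itself is not part of this excerpt, and the function name `bracket_over_i` is hypothetical.

```python
import math


def bracket_over_i(l, m, N):
    """Evaluate {v_l, u_m} / i via the sine form of the sum over k and j.

    Assumes lambda_k**2 = sin(k*pi/N), as used in the proof of Lemma B.1.
    """
    total = 0.0
    for k in range(1, N):
        for j in range(m):
            total += (math.sin(k * math.pi / N)
                      * math.sin((2 * (l - j) - 1) * k * math.pi / N))
    return 2.0 * total / N


def max_deviation(N):
    """Largest deviation from the Kronecker delta over 1 <= l, m <= N-1."""
    return max(abs(bracket_over_i(l, m, N) - (1.0 if l == m else 0.0))
               for l in range(1, N) for m in range(1, N))


if __name__ == "__main__":
    for N in (3, 4, 7, 10):
        print(N, max_deviation(N))
```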
Substituting the expression (17) for $v_l$ in the quadratic term in the expansion (94), we get
$$\begin{aligned} \frac{1}{2} \sum_{l=0}^{N-1} v_l^2 &= \frac{1}{2N} \sum_{1 \le |k|, |k'| \le N-1} \lambda_k \lambda_{k'} \left(\sum_{l=0}^{N-1} e^{2\pi i l(k+k')/N}\right) e^{-i\pi(k+k')/N} \zeta_k \zeta_{k'} \\ &= \sum_{k=1}^{N-1} \lambda_k^2\, \zeta_k \zeta_{-k}, \end{aligned}$$
where we again used that $\lambda_k = \lambda_{-k}$ and $\sum_{l=0}^{N-1} e^{2\pi i lk/N} = N \delta_{k0}$ for any $0 \le |k| \le N-1$. The terms of third and fourth order in $H_v$ are treated similarly. Combining the above computations leads to the claimed formula $\tilde{H}_V(\zeta) = G_2 + \alpha G_3 + \beta G_4 + O(\zeta^5)$ with $G_2$, $G_3$, and $G_4$ given by (18), (19), and (20), respectively.

Acknowledgement. It is a great pleasure to thank Yves Colin de Verdière and Percy Deift for valuable comments. We also would like to thank the referee for his suggestions of how to improve the exposition of our paper.
References
1. Bambusi, D., Ponno, A.: Korteweg-de Vries equation and energy sharing in Fermi-Pasta-Ulam. CHAOS 15, 015107 (2005)
2. Bambusi, D., Ponno, A.: On Metastability in FPU. Commun. Math. Phys. 264, 539–561 (2006)
3. Berman, G.P., Izrailev, F.M.: The Fermi-Pasta-Ulam problem: 50 years of progress. CHAOS 15(1), 015104.1–015104.18 (2005)
4. Broer, H.W.: KAM theory: the legacy of Kolmogorov's 1954 paper. Bull. AMS (New Series) 41(4), 507–521 (2004)
5. Fermi, E., Pasta, J., Ulam, S.: Studies of non linear problems. Los Alamos Rpt. LA-1940 (1955). In: Collected Papers of Enrico Fermi. Chicago, IL: University of Chicago Press, 1965, Volume II, Theory, Methods and Applications, (2nd ed., New York: Marcel Dekker, 2000), pp. 978–988
6. Henrici, A., Kappeler, T.: Global Birkhoff coordinates for the periodic Toda lattice. Preprint, 2006
7. Henrici, A., Kappeler, T.: Birkhoff normal form for the periodic Toda lattice. http://arxiv.org/list/nlin.SI/0609045, 2006, to appear in Contemp. Math.
8. Henrici, A., Kappeler, T.: Resonant normal form for even periodic FPU chains. arXiv: 0709.2624 [nlin.SI]
9. Kappeler, T., Pöschel, J.: KdV & KAM. Ergebnisse der Mathematik, 3. Folge, 45. Berlin: Springer, 2003
10. Nishida, T.: A note on an existence of conditionally periodic oscillation in a one-dimensional lattice. Mem. Fac. Engrg. Kyoto Univ. 33, 27–34 (1971)
11. Pöschel, J.: Integrability of Hamiltonian Systems on Cantor Sets. Comm. Pure Appl. Math. 35, 653–695 (1982)
12. Pöschel, J.: On Nekhoroshev's Estimate at an Elliptic Equilibrium. Int. Math. Res. Not. 4, 203–215 (1999)
13. Rink, B.: Symmetry and resonance in periodic FPU chains. Commun. Math. Phys. 218, 665–685 (2001)
14. Rink, B.: Direction reversing travelling waves in the Fermi-Pasta-Ulam chain. J. Nonlinear Science 12, 479–504 (2002)
15. Rink, B.: Proof of Nishida's conjecture on anharmonic lattices. Commun. Math. Phys. 261, 613–627 (2006)
16. Toda, M.: Theory of Nonlinear Lattices, 2nd enl. ed., Springer Series in Solid-State Sciences 20. Berlin: Springer, 1989
17. Van der Waerden, B.L.: Algebra I. Heidelberger Taschenbücher. Berlin: Springer, 1966
18. Weissert, T.P.: The genesis of simulation in dynamics: pursuing the Fermi-Pasta-Ulam problem. New York: Springer, 1997

Communicated by G. Gallavotti
Commun. Math. Phys. 278, 179–191 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0384-2
Communications in
Mathematical Physics
Global Well-Posedness for a Smoluchowski Equation Coupled with Navier-Stokes Equations in 2D
P. Constantin1, Nader Masmoudi2
1 Department of Mathematics, The University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, USA. E-mail: [email protected]
2 Courant Institute, New York University, 251 Mercer St, New York, NY 10012, USA. E-mail: [email protected]
Received: 22 February 2007 / Accepted: 10 May 2007
Published online: 7 November 2007 – © Springer-Verlag 2007
Abstract: We prove global existence for a nonlinear Smoluchowski equation (a nonlinear Fokker-Planck equation) coupled with Navier-Stokes equations in 2d. The proof uses a deteriorating regularity estimate in the spirit of [5] (see also [1]).
1. Introduction Systems coupling fluids and particles are of great interest in many branches of applied physics and chemistry. The equations attempt to describe the behavior of complex mixtures of particles and fluids, and as such, they present numerous challenges, simultaneously at three levels: at the level of their derivation, the level of their numerical simulation and that of their mathematical treatment. In this paper we concentrate solely on one aspect of the mathematical treatment, the regularity of solutions. The particles in the system are described by a probability distribution f (t, x, m) that depends on time t, macroscopic variable x ∈ Rn , and particle configuration m ∈ M. Here M is a smooth compact Riemannian manifold without boundary. The particles are transported by a fluid, agitated by thermal noise, and interact among themselves. This is reflected in a kinetic equation for the evolution of the probability distribution of the particles ([2,8]). The interaction between particles – a micro-micro interaction – is modeled in a mean-field fashion by a potential that represents the tendency of particles to favor certain coherent configurations. The interaction between particles occurs only when the concentration of particles is sufficiently high. Mathematically, this term is responsible for the nonlinearity of the Smoluchowski (Fokker-Planck) equation, and physically, it is responsible for nematic phase transitions. Because the particles are considerably small, and for smooth flows, the Lagrangian transport of the particles is modeled using a Taylor expansion of the velocity field. This gives rise to a drift term in the Smoluchowski equation that depends on the spatial gradient of velocity. It is a macro-micro term, and it causes mathematical difficulties in the regularity theory.
The fluid is described by the incompressible Navier-Stokes equations. The microscopic particles add stresses to the fluid. This is the micro-macro interaction and it is the most puzzling and important physical aspect of the problem. Indeed, while a macromicro interaction can be derived, in principle, by assuming that the macroscopic entities vary little on the scale of the microscopic ones, the “scaling up” of the effect of microscopic quantities to the macroscopic level is more mysterious. A principle based on an energy dissipation balance, and that recovers familiar results in simple cases was proposed in [6], where the regularity of nonlinear Fokker-Planck systems coupled with Stokes equations in 3D was also proved. The linear Fokker-Planck system coupled with Stokes equations was considered in [22]. The nonlinear Fokker-Planck equation driven by a time averaged Navier-Stokes system in 2D was studied in [7]. An approximate closure of the linear Fokker-Planck equation reduces the description to closed viscoelastic equations for the added stresses themselves. This leads to well-known non-Newtonian fluid models that have been studied extensively. For regularity results we refer to Lions and Masmoudi [19] where the existence of global weak solutions was proved for an Oldroyd-type model. In Guillopé and Saut [13] and [14], the existence of the local strong solution was proved. Also, Fernández-Cara, Guillén and Ortega [11,10] and [12] proved local well posedness in Sobolev spaces. We also mention Lin, Liu and Zhang [16] where a formulation based on the deformation tensor is used to study the Oldroyd-B model. Another model for the polymers is the FENE dumbbell model. From a mathematical point of view, this model was studied by several authors. In particular W. E, Li and Zhang [9], Jourdain, Lelievre and Le Bris [15] and Zhang and Zhang [23] proved local well-posedness. Moreover, Lin, Liu and Zhang [17] proved global existence near equilibrium. 
After the completion of the present work, we learned that Lin, Zhang, and Zhang [18] proved a result similar to ours for the co-rotational FENE model (see also [21]). Existence of global weak solutions was also proved in [20].

1.1. The model. Consider the system
$$\begin{cases} \dfrac{\partial v}{\partial t} + v \cdot \nabla v - \nu \Delta v + \nabla p = \nabla \cdot \tau & \text{in } \Omega \times (0, T), \\[4pt] \dfrac{\partial f}{\partial t} + v \cdot \nabla f + \operatorname{div}_g (G(v, f) f) - \Delta_g f = 0 & \text{in } \Omega \times (0, T), \\[4pt] \operatorname{div} v = 0 & \text{in } \Omega \times (0, T), \end{cases} \tag{1}$$
where
$$\tau_{ij} = \int_M \gamma^{(1)}_{ij}(m) f(t, x, m)\, dm + \int_M \int_M \gamma^{(2)}_{ij}(m_1, m_2) f(t, x, m_1) f(t, x, m_2)\, dm_1\, dm_2.$$
We denote $G(v, f) = \nabla_g U + W$, where $W = c^{ij}_\alpha\, \partial_j v_i$ and $U = K f$ is a potential given by
$$U(t, x, m) = \int_M K(m, q) f(t, x, q)\, dq \tag{2}$$
with a kernel $K$ which is a smooth, time and space independent symmetric function $K : M \times M \to \mathbb{R}$. We also take $\Omega = \mathbb{R}^2$.

1.2. Statement of the result.

Theorem 1.1. Take $v(0) \in W^{1+\varepsilon_0, r} \cap L^2(\mathbb{R}^2)$ and $f(0) \in W^{1,r}(H^{-s})$, for some $r > 2$ and $\varepsilon_0 > 0$ and $f_0 \ge 0$, $\int_M f_0 \in L^1 \cap L^\infty$. Then (1) has a global solution in $v \in L^\infty_{loc}(W^{1,r}) \cap L^2_{loc}(W^{2,r})$ and $f \in L^\infty_{loc}(W^{1,r}(H^{-s}))$. Moreover, for $T > T_0 > 0$, we have $v \in L^\infty((T_0, T); W^{2-\varepsilon, r})$.
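1.3. Preliminaries. We define $\mathcal{C}$ to be the ring of center 0, of small radius 1/2 and great radius 2. There exist two nonnegative radial functions $\chi$ and $\varphi$ belonging respectively to $\mathcal{D}(B(0, 1))$ and to $\mathcal{D}(\mathcal{C})$ so that
$$\chi(\xi) + \sum_{q \ge 0} \varphi(2^{-q} \xi) = 1, \tag{3}$$
$$|p - q| \ge 2 \;\Rightarrow\; \operatorname{Supp} \varphi(2^{-q} \cdot) \cap \operatorname{Supp} \varphi(2^{-p} \cdot) = \emptyset. \tag{4}$$
For instance, one can take $\chi \in \mathcal{D}(B(0, 1))$ such that $\chi \equiv 1$ on $B(0, 1/2)$ and take $\varphi(\xi) = \chi(\xi/2) - \chi(\xi)$. Then, we are able to define the Littlewood-Paley decomposition. Let us denote by $\mathcal{F}$ the Fourier transform on $\mathbb{R}^d$. Let $h$, $\tilde{h}$, $\Delta_q$, $S_q$ ($q \in \mathbb{Z}$) be defined as follows:
$$h = \mathcal{F}^{-1} \varphi \quad \text{and} \quad \tilde{h} = \mathcal{F}^{-1} \chi,$$
$$\Delta_q u = \mathcal{F}^{-1}(\varphi(2^{-q} \xi) \mathcal{F} u) = 2^{qd} \int h(2^q y)\, u(x - y)\, dy,$$
$$S_q u = \mathcal{F}^{-1}(\chi(2^{-q} \xi) \mathcal{F} u) = 2^{qd} \int \tilde{h}(2^q y)\, u(x - y)\, dy.$$
We use the para-product decomposition of Bony ([3]),
$$uv = T_u v + T_v u + R(u, v),$$
where
$$T_u v = \sum_{q \in \mathbb{Z}} S_{q-1} u\, \Delta_q v \quad \text{and} \quad R(u, v) = \sum_{|q - q'| \le 1} \Delta_q u\, \Delta_{q'} v.$$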
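The dyadic partition of unity (3) and the support property (4) follow from the telescoping structure of $\varphi(\xi) = \chi(\xi/2) - \chi(\xi)$, and this can be illustrated with one concrete cutoff. The sketch below works with the radial variable $r = |\xi|$ and uses a particular smooth $\chi$ (equal to 1 for $r \le 1/2$ and 0 for $r \ge 3/4$) chosen purely for illustration; this choice and the function names are assumptions, not taken from the paper.

```python
import math


def _g(y):
    """exp(-1/y) for y > 0, else 0: the standard smooth building block."""
    return math.exp(-1.0 / y) if y > 0 else 0.0


def smooth_step(x):
    """C-infinity transition, equal to 0 for x <= 0 and 1 for x >= 1."""
    return _g(x) / (_g(x) + _g(1.0 - x))


def chi(r):
    """Radial cutoff: 1 for |r| <= 1/2, 0 for |r| >= 3/4 (illustrative choice)."""
    return 1.0 - smooth_step(4.0 * abs(r) - 2.0)


def phi(r):
    """phi(xi) = chi(xi/2) - chi(xi), supported in 1/2 <= |xi| <= 3/2."""
    return chi(r / 2.0) - chi(r)


def partition_sum(r, Q):
    """chi(r) + sum_{q=0}^{Q} phi(2^{-q} r); telescopes to chi(2^{-Q-1} r)."""
    return chi(r) + sum(phi(r / 2.0 ** q) for q in range(Q + 1))


if __name__ == "__main__":
    for r in (0.0, 0.3, 0.9, 5.0, 100.0):
        print(r, partition_sum(r, Q=10))
```

With $Q = 10$ the truncated sum equals 1 for all $|\xi| \le 2^{10}$; larger frequencies need more dyadic blocks.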
We define the inhomogeneous and homogeneous Besov spaces by

Definition 1.2. Let $s$ be a real number, $p$ and $r$ two real numbers greater than 1. Then we define the following norm:
$$\|u\|_{B^s_{p,r}} \overset{\text{def}}{=} \|S_0 u\|_{L^p} + \left\|\left(2^{qs} \|\Delta_q u\|_{L^p}\right)_{q \in \mathbb{N}}\right\|_{\ell^r(\mathbb{N})},$$
and the following semi-norm:
$$\|u\|_{\dot{B}^s_{p,r}} \overset{\text{def}}{=} \left\|\left(2^{qs} \|\Delta_q u\|_{L^p}\right)_{q \in \mathbb{Z}}\right\|_{\ell^r(\mathbb{Z})}.$$

Definition 1.3.
• Let $s$ be a real number, $p$ and $r$ two real numbers greater than 1. We denote by $B^s_{p,r}$ the space of tempered distributions $u$ such that $\|u\|_{B^s_{p,r}}$ is finite.
• If $s < d/p$ or $s = d/p$ and $r = 1$ we define the homogeneous Besov space $\dot{B}^s_{p,r}$ as the closure of compactly supported smooth functions for the norm $\|\cdot\|_{\dot{B}^s_{p,r}}$.

We refer to [4] for the proof of the following results and for the multiplication law in Besov spaces.
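Lemma 1.4.
$$\|\Delta_q u\|_{L^b} \le 2^{d\left(\frac{1}{a} - \frac{1}{b}\right) q} \|\Delta_q u\|_{L^a} \quad \text{for } b \ge a \ge 1, \qquad \|e^{t\Delta} \Delta_q u\|_{L^b} \le C\, 2^{-ct 2^{2q}} \|\Delta_q u\|_{L^b}.$$
The following corollary is straightforward.

Corollary 1.5. If $b \ge a \ge 1$, then we have the following continuous embeddings:
$$B^s_{a,r} \subset B^{s - d\left(\frac{1}{a} - \frac{1}{b}\right)}_{b,r}.$$

Definition 1.6. Let $p$ be in $[1, \infty]$ and $r$ in $\mathbb{R}$; the space $L^p_T(C^r)$ is the space of distributions $u$ such that
$$\|u\|_{L^p(0,T;C^r)} \overset{\text{def}}{=} \sup_q 2^{qr} \|\Delta_q u\|_{L^p_T(L^\infty)} < \infty.$$
We will use the following theorem from [5].

Theorem 1.7. Let $v$ be the solution in $L^2_T(H^1)$ of the two dimensional Navier-Stokes system
$$(NS_\nu) \quad \begin{cases} \dfrac{\partial v}{\partial t} + v \cdot \nabla v - \nu \Delta v = -\nabla p + f, \\[4pt] \operatorname{div} v = 0, \\ v|_{t=0} = v_0 \end{cases}$$
with an initial data in $L^2$ and an external force $f$ in $L^1_T(C^{-1}) \cap L^2_T(H^{-1})$; then, for any $\varepsilon$, a $T_0$ in the interval $]0, T[$ exists such that
$$\|\nabla v\|_{L^1_{[T_0,T]}(C^0)} \le \varepsilon.$$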
2. A Deteriorating Regularity Estimate

The main part of this section is the proof of a deteriorating regularity estimate for transport equations in the spirit of [1] and [5]. After this proof, we will apply this estimate in order to prove Theorem 1.1. We also denote $H = (-\Delta_g + I)^{-s/2}$ with $s > d/2 + 1$.

Theorem 2.1. Let $\sigma$ and $\beta$ be two elements of $]0, 1[$ such that $\sigma + \beta < 1$. A constant $C$ exists that satisfies the following properties. Let $T$ and $\lambda$ be two positive numbers and $v$ a smooth divergence free vector field so that
$$\sigma - \lambda \|\nabla v\|_{L^1_T(C^0)} \ge \beta \quad \text{and} \quad \sigma + \lambda \|\nabla v\|_{L^1_T(C^0)} \le 1 - \beta. \tag{5}$$
Consider two smooth functions $f$ and $v$ so that $f$ is the solution of
$$\partial_t f + v \cdot \nabla f + \operatorname{div}_g (G(v, f) f) - \Delta_g f = 0, \qquad f|_{t=0} = f_0. \tag{6}$$
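Then we have, if $\lambda \ge 3C$,
$$M^\sigma_\lambda(f) \le 3 \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + \frac{3C}{\lambda} M^{\sigma+1}_\lambda(v), \tag{7}$$
where
$$M^\sigma_\lambda(v) \overset{\text{def}}{=} \sup_{t \in [0,T],\, q} 2^{q\sigma - \Phi_{q,\lambda}(t)} \|\Delta_q v(t)\|_{L^p} \tag{8}$$
or
$$M^\sigma_\lambda(f) \overset{\text{def}}{=} \sup_{t \in [0,T],\, q} 2^{q\sigma - \Phi_{q,\lambda}(t)} \|\Delta_q f(t)\|_{L^p(H^{-s})} \tag{9}$$
with
$$\Phi_{q,\lambda}(t, t') \overset{\text{def}}{=} \lambda \int_{t'}^{t} \left(\|S_{q-1} \nabla v(t'')\|_{L^\infty} + 1\right) dt'', \qquad \Phi_{q,\lambda}(t) = \Phi_{q,\lambda}(t, 0). \tag{10}$$
We will use the notation $f_q \overset{\text{def}}{=} \Delta_q f$. Applying the operator $\Delta_q$ to the transport equation (6), we get
$$\partial_t f_q + S_{q-1} v \cdot \nabla f_q + \operatorname{div}_g (G(S_{q-1} v, S_{q-1} f) f_q) - \Delta_g f_q + R_q(v, f) = 0, \qquad f_q|_{t=0} = \Delta_q f_0, \tag{11}$$
where $R_q$ is a rest term. We denote
$$N_q^2(t, x) = \int_M |H f_q|^2\, dm. \tag{12}$$
Applying $H$ to (11) and taking the $L^2$ norm on $M$, we get
$$\frac{1}{2}\left(\partial_t N_q^2 + S_{q-1} v \cdot \nabla N_q^2\right) + V(S_{q-1} v, S_{q-1} U, f_q) + \int_M |\nabla_g H f_q|^2\, dm + \int_M H f_q\, (H R_q(v, f))\, dm = 0, \tag{13}$$
where
$$V(v, h, f) = \partial_j v_i \int_M (H \operatorname{div}_g (c^{ij}_\alpha f))(H f)\, dm + \int_M (H \operatorname{div}_g (\nabla_g h\, f))(H f)\, dm. \tag{14}$$
Hence, arguing as in [7], we have
$$|V(S_{q-1} v, S_{q-1} U, f_q)| \le C \left(|\nabla S_{q-1} v| + \|H S_{q-1} f\|_{L^2(M)}\right) N_q^2.$$
We will use now the following lemma, postponing its proof:

Lemma 2.2. $R_q(v, f)$ satisfies
$$2^{q\sigma - \Phi_{q,\lambda}(t)} \|H R_q(v(t), f(t))\|_{L^p(L^2)} \le C e^{C\lambda \|\nabla v\|_{L^1_T(C^0)}} \left(M^{\sigma+1}_\lambda(v) + \Big(1 + \|S_q \nabla v(t)\|_{L^\infty} + \sum_{|q'-q| \le N} \|\Delta_{q'} \nabla v(t)\|_{L^\infty}\Big) M^\sigma_\lambda(f)\right). \tag{15}$$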
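Taking the $L^p$ norm of $N_q$, we get
$$\|N_q(t)\|_{L^p} \le \|N_q(0)\|_{L^p} + C \int_0^t \left(\|H R_q(v(t'), f(t'))\|_{L^p(L^2)} + \left(1 + \|\nabla S_q v(t')\|_{L^\infty}\right) \|N_q(t')\|_{L^p}\right) dt'.$$
After multiplication by $2^{q\sigma - \Phi_{q,\lambda}(t)}$, we get
$$\begin{aligned} 2^{q\sigma - \Phi_{q,\lambda}(t)} \|N_q(t)\|_{L^p} \le {} & 2^{q\sigma} \|N_q(0)\|_{L^p} + \int_0^t 2^{-\Phi_{q,\lambda}(t,t')}\, 2^{q\sigma - \Phi_{q,\lambda}(t')} \|\nabla S_q v(t')\|_{L^\infty} \|N_q\|_{L^p}\, dt' \\ & + \int_0^t 2^{-\Phi_{q,\lambda}(t,t')}\, 2^{q\sigma - \Phi_{q,\lambda}(t')} \|H R_q(v(t'), f(t'))\|_{L^p(L^2)}\, dt'. \end{aligned} \tag{16}$$
Then, using the inequality (15) and taking the sup over $q$, we get
$$\begin{aligned} M^\sigma_\lambda(f) \le {} & \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + e^{C\lambda \|\nabla v\|_{L^1_T(C^0)}} \sup_{t \in [0,T],\, q} \int_0^t 2^{-\Phi_{q,\lambda}(t,t')} \\ & \times \left(M^{\sigma+1}_\lambda(v) + M^\sigma_\lambda(f) \Big(1 + 2 \|S_q \nabla v(t')\|_{L^\infty} + \sum_{|q'-q| \le N} \|\Delta_{q'} \nabla v(t')\|_{L^\infty}\Big)\right) dt'. \end{aligned} \tag{17}$$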
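As $\lambda \|\nabla v\|_{L^1_T(C^0)}$ is smaller than $(\sigma - \beta)$, we have
$$e^{C\lambda \|\nabla v\|_{L^1_T(C^0)}} \le e^{C(\sigma - \beta)}.$$
Moreover, by definition of $\Phi_{q,\lambda}(t, t')$, it is obvious that
$$\int_0^t 2^{-\Phi_{q,\lambda}(t,t')} \left(\|S_q \nabla v(t')\|_{L^\infty} + 1\right) dt' \le \frac{1}{\lambda \log 2}.$$
Then, we obtain that
$$\begin{aligned} M^\sigma_\lambda(f) &\le \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + \frac{C}{\lambda} M^{\sigma+1}_\lambda(v) + C \|\nabla v\|_{L^1_T(C^0)} M^\sigma_\lambda(f) + \frac{C}{\lambda} M^\sigma_\lambda(f) \\ &\le \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + \frac{C}{\lambda} M^{\sigma+1}_\lambda(v) + \frac{2C}{\lambda} M^\sigma_\lambda(f). \end{aligned}$$
Since $\lambda \ge 3C$ gives $2C/\lambda \le 2/3$, the last term can be absorbed into the left-hand side, which yields (7).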
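The bound $\int_0^t 2^{-\Phi_{q,\lambda}(t,t')} (\|S_{q-1} \nabla v(t')\|_{L^\infty} + 1)\, dt' \le 1/(\lambda \log 2)$ is the integral of an exact derivative, since the integrand equals $\partial_{t'} 2^{-\Phi_{q,\lambda}(t,t')} / (\lambda \log 2)$ when $\Phi_{q,\lambda}$ is defined as in (10). The sketch below illustrates this numerically with a made-up nonnegative function `g` standing in for $t' \mapsto \|S_{q-1} \nabla v(t')\|_{L^\infty}$; the function `g`, the parameter values and the trapezoidal discretisation are all assumptions for illustration.

```python
import math


def g(s):
    """Hypothetical stand-in for the norm of S_{q-1} grad v at time s."""
    return 1.0 + 5.0 * math.sin(3.0 * s) ** 2


def decay_integral(lam=2.0, t=1.0, n=20000):
    """Return (numerical integral, closed form, upper bound 1/(lam*log 2))."""
    h = t / n
    w = [g(k * h) + 1.0 for k in range(n + 1)]        # integrand of Phi / lam
    G = [0.0] * (n + 1)                                # cumulative trapezoid
    for k in range(1, n + 1):
        G[k] = G[k - 1] + 0.5 * h * (w[k - 1] + w[k])
    # Phi(t, t'_k) = lam * (G[n] - G[k])
    vals = [2.0 ** (-lam * (G[n] - G[k])) * w[k] for k in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    closed_form = (1.0 - 2.0 ** (-lam * G[n])) / (lam * math.log(2.0))
    bound = 1.0 / (lam * math.log(2.0))
    return integral, closed_form, bound


if __name__ == "__main__":
    print(decay_integral())
```

The closed form $(1 - 2^{-\Phi_{q,\lambda}(t,0)})/(\lambda \log 2)$ shows that the bound is never attained for finite $t$.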
This proves the theorem, of course, if we prove the estimate (15) of the lemma. First of all, let us decompose the operator Rq . We have
$$R_q(v, f) = \sum_{\ell=1}^{6} R_q^\ell(v, f) \quad \text{with}$$
d
q (T∂ j f v j ),
j=1
Rq2 (v, f ) =
d
[q , Tv j ∂ j ] f,
j=1
Rq3 (v, f ) =
d
q ∂ j R(v j , f ) + q−1 v j ∂ j q+1 f q − q−2 v j ∂ j q−1 f q ,
j=1
Rq4 (v,
f) =
d
divg (cαi j q (T f ∂ j v i )) + divg (q (T f ∇g U )),
i, j=1
Rq5 (v, f ) =
d
divg (cαi j [q , T∂ j vi ] f ) + divg ([q , T∇g U ] f ),
i, j=1
Rq6 (v, f ) =
d
divg cαi j (R(∂ j v i , f ) + q−1 ∂ j v i q+1 f q − q−2 ∂i v j q−1 f q )
i, j=1
+
d
divg R(∇g U, f ) + q−1 ∇g U q+1 f q − q−2 ∇g U q−1 f q .
i, j=1
Indeed,
⎞ ⎛ d T∂ j f v j + Tv j ∂ j f + R(v j , ∂ j f )⎠ q (v · ∇ f ) = q ⎝ j=1
=
2
Rq (v, f ) +
=1
d
Tv j ∂ j q f + q R(v j , ∂ j f ),
j=1
Then, we use that d j=1
Tv j ∂ j f q =
Sq −1 v j ∂ j q f q
|q−q |≤1
= Sq−1 v j ∂ j f q +
(Sq −1 v j − Sq−1 v j )∂ j q f q
|q−q |≤1
= Sq−1 v ∂ j f q + q−1 v j ∂ j q+1 f q − q−2 v j ∂ j q−1 f q . j
Hence, q (v · ∇ f ) =
3 =1
Rq (v, f ) + Sq−1 v · ∇ f q .
In the same way, we have q (divg (G(v, f ) f )) =
6
Rq (v, f ) + divg (G(Sq−1 v, Sq−1 f ) f q ).
=4
Let us estimate the six terms appearing above. Let us begin with Rq1 (v, f ). By definition of the paraproduct, we have Rq1 (v, f ) =
d j=1
q (Sq −1 ∂ j f q v j ).
q
As, if |q − q | > 2 then the above term is equal to 0, we deduce that H Sq −1 ∇ f L ∞ (L 2 ) q v(t) L p . H Rq1 (v(t), f (t)) L p (L 2 ) ≤ C |q−q |≤2
Using the fact that, if |q − q | ≤ 2, then H Sq −1 ∇ f L ∞ (L 2 ) ≤ C2q H f (t) L ∞ (L 2 ) ≤ C2q , we infer that q v(t) L p ≤ C ∇q v(t) L p . H Rq1 (v(t), f (t)) L p (L 2 ) ≤ C2q |q−q |≤2
|q−q |≤2
Hence 2qσ −q,λ (t) H Rq1 (v(t), f (t)) L p (L 2 ) ≤ C Mλσ +1 (v) t t
2−λ 0 Sq ∇v(t ) L ∞ dt +λ 0 Sq ∇v(t ) L ∞ dt . |q−q |≤2
But, it is obvious that t t t Sq ∇v(t ) L ∞ dt − Sq ∇v(t ) L ∞ dt ≤ (Sq − Sq )∇v(t ) L ∞ dt . 0
0
0
Using the fact that |q − q | ≤ 2, we get t t Sq ∇v(t ) L ∞ dt − Sq ∇v(t ) L ∞ dt ≤ C|q − q | ∇v L 1 (C 0 ) . 0
T
0
(18)
So it turns out that 2qσ −q,λ (t) H Rq1 (v(t), f (t)) L p (L 2 ) ≤ 2
Cλ ∇v L 1 (C 0 ) T
Mλσ +1 (v).
Now let us look at Rq2 (v, f ). By definition of the paraproduct, we have Rq2 (v, f ) = −
d
[Sq −1 v j ∂ j q , q ] f
j=1 q
=−
d [Sq −1 v j , q ]∂ j q f. j=1 q
(19)
The terms of the above sum are equal to 0 except if |q − q | ≤ 2. Moreover, by definition of the operators q , we have [Sq −1 v j , q ]∂ j q f (x) = 2qd h(2q (x − y))(Sq −1 v j (x) Rd
− Sq −1 v j (y))∂ j q f (y)dy. So we infer that H [Sq −1 v j , q ]∂ j q f (x) L 2 (M) ≤ 2−q ∇ Sq −1 v L ∞ 2qd
× 2q | · | × |h(2q ·)| H ∂ j q f L 2 (M) (x). Hence, H [Sq −1 v j , q ]∂ j q f (x) L p (L 2 (M)) ≤ 2−q ∇ Sq −1 v L ∞ H ∂ j q f L p (L 2 (M)) . Then, we have, using inequality (18), 2qσ −q,λ (t) H [Sq −1 v j , q ]∂ j q f L p (L 2 (M)) Cλ v 1 1 L T (C ) ≤ C Mλσ ( f ) 2 ( ∇(Sq −1 − Sq )v(t) L ∞ + Sq v(t) L ∞ ). |q−q |≤2
So, we get 2qσ −q,λ (t) H Rq2 (v(t), f (t)) L p (L 2 ) ≤ C Mλσ ( f )2 ⎛ ⎞ × ⎝ Sq ∇v(t) L ∞ + ∇(q v(t) L ∞ ⎠ .
Cλ v L 1 (C 1 )
(20)
|q−q |≤2
For Rq3 , we have H Rq3 (v, f ) L p (L 2 ) ≤ C ≤C
|q −q
|≤1 q ≥q−2
q ≥q−2
Hence, 2qσ −q,λ (t) H Rq3 (v, f ) L p (L 2 ) ≤ C
2q q v L p H q
f L ∞ (L 2 )
2q−q q ∇v L p H f L ∞ (L 2 ) .
q ≥q−2
2(1+σ )(q−q )−q,λ (t)+q ,λ (t) Mλσ +1 (v)
× H f L ∞ (L 2 ) . Then, we see that the sum converges since
|q,λ (t) − q ,λ (t)| ≤ λ ∇v L 1 (C 0 ) |q − q | ≤ (σ − β)|q − q | T
and 1 + σ − (σ − β) = 1 + β > 0. Hence, we get 2qσ −q,λ (t) H Rq3 (v, f ) L p (L 2 ) ≤ C Mλσ +1 (v) H f L ∞ (L 2 ) .
(21)
The estimate for Rq4 (v, f ) = Rq4,1 (v, f ) + Rq4,2 ( f ) is the same as the estimate for Rq1 (v, f ). Indeed, we have H Rq4,1 (v(t), f (t)) L p (L 2 ) ≤ C ∇g H Sq −1 f L ∞ (L 2 ) q ∇v(t) L p |q−q |≤2
≤C
q ∇v(t) L p ,
|q−q |≤2
where we used that ∇g H Sq −1 f L ∞ (L 2 ) ≤ C. Hence, we conclude as for Rq1 (v, f ). Besides, ∇g H Sq −1 f L ∞ (L 2 ) q ∇g U L p H Rq4,2 ( f (t)) L p (L 2 ) ≤ C |q−q |≤2
≤C
q f (t) L p .
|q−q |≤2
Hence, we conclude as for Rq1 (v, f ) and get 2qσ −q,λ (t) H Rq4,2 ( f (t)) L p (L 2 ) ≤ 2
Cλ ∇v L 1 (C 0 ) T
Mλσ ( f ).
(22)
We write Rq5 (v, f ) = Rq5,1 (v, f ) + Rq5,2 ( f ). The estimate for Rq5 (v, f ) is similar to the one for Rq2 (v, f ) with the only difference that we have to use the regularity of ∇v. We have [q , T∂ j vi ] f = −
d [Sq −1 ∂ j v i , q ]∂ j q f. j=1 q
The terms of the above sum are equal to 0 except if |q − q | ≤ 2. Moreover, by definition of the operators q , we have i qd [Sq −1 ∂ j v , q ]q f (x) = 2 h(2q (x − y))(Sq −1 ∂ j v i (x) Rd
− Sq −1 ∂ j v i (y))q f (y)dy. So we infer that H Rq5,1 (v, f ) L 2 (M) ≤ 2−q |∇ 2 Sq −1 v|2qd
q
2 | · |×|h(2q ·)| ∇g H q f L 2 (M) (x).
Hence, H Rq5,1 (v, f ) L p (L 2 (M)) ≤ 2−q ∇ 2 Sq −1 v L p ∇g H q f L ∞ (L 2 (M)) . Then, we have, using Inequality (18), 2qσ −q,λ (t) H Rq5,1 (v, f ) L p (L 2 (M)) (σ −1)(q−q
)− (t)+ (t) q,λ q
,λ ≤C 2 Mλσ +1 (v) ∇g H q f L ∞ (L 2 ) . |q−q |≤2 q
≤q −1
Hence, 2qσ −q,λ (t) H Rq5,1 (v, f ) L p (L 2 (M)) ≤ C
q
≤q+1
2−β(q−q ) Mλσ +1 (v) ∇g H f L ∞ (L 2 ) ,
and the sum is uniformly bounded since σ − 1 + λ ∇v L 1 (C 0 ) ≤ −β. Then, we argue T
in a similar way for H Rq5,2 ( f ) L p (L 2 (M)) and get
H Rq5,2 ( f ) L p (L 2 (M)) ≤ 2−q ∇ Sq −1 f L p (L 2 (M)) ∇g H q f L ∞ (L 2 (M)) , and we conclude as above with Mλσ +1 (v) replaced by Mλσ ( f ). Finally, the estimate for Rq6 (v, f ) is exactly the same as the one for Rq3 (v, f ) since, we also have that ∇g H f L ∞ (L 2 ) ≤ C. 3. Global Existence Now, we turn to the proof of our main theorem. First, we notice that the local existence ∞ ([0, T ); W 1,r ) ∩ L 2 ([0, T ); W 2,r ) and f ∈ L ∞ ([0, T ); W 1,r (H −s )) with v ∈ L loc loc loc can be easily deduced from standard arguments. Moreover, from regularity estimates for ∞ ((T , T ); W 2−ε,r ). the heat equation, we have for all 0 < T0 < T , v ∈ L loc 0 We want to prove that we can extend the solution beyond the time T . It is enough to prove that ∇v ∈ L ∞ ((0, T ) × R2 ). The local existence result tells that, for any T0 in ]0, T [, the solution (v, f ) of (1) ∞ ([T , T [; W 2−ε,r × W 1,r (H −s )) for any ε > 0. Sobolev type belongs to the space L loc 0 embeddings of Corollary 1.5 imply that (v, τ ) ∈
∞ L loc
2−ε−2
B p,∞ [T0 , T [;
1 1 r−p
1−2
× B p,∞
1 1 r−p
.
Choosing ε < 1 − 2/r and p = ∞ in the above assertion implies that (v, τ ) ∈ ∞ (C 1+σ × C σ (H −s )), where σ = 1 − ε − 2/r > 0. So we can apply Theorem L loc 1.7 and we can choose T0 such that, with the notations of Theorem 2.1, we have ∇v L1
[T0 ,T ] (C
0)
≤
min(σ − β, 1 − σ − β) · 3λ
The deteriorating regularity estimate of Theorem 2.1 applied with σ and between T0 and T tells exactly that f satisfies Mλσ ( f ) ≤ 3 f (T0 ) C σ (H −s ) +
3C σ +1 M (v). λ λ
(23)
Now, we have to estimate ∇v. The two dimensional Navier-Stokes equation can be written as ∂t v − νv = P(v · ∇v) + P Dτ,
where P denotes the Leray projector on the divergence free vector field. Exactly along the same lines as in the proof of Theorem 2.1, we have 2q(σ +1)−q,λ (t) P(v · ∇v) − P(Sq v · ∇q v) L ∞ ⎛ ⎞
≤ C Mλσ +1 (v) ⎝ Sq ∇v(t) L ∞ + 2q−q ∇q v(t) L ∞ ⎠ . q ≥q
Moreover, it is obvious that 1
2q(σ − 2 )−q,λ (t) P(Sq v · ∇q v) L ∞ ≤ C v(t)
1
H2
Mλσ +1 (v).
So it turns out that 2q(σ +1)−q,λ (t) q P(v · ∇v) L ∞ ⎛ ≤ C Mλσ +1 (v) ⎝ Sq ∇v(t) L ∞ +
⎞
2
(q−q )
3q 2
∇v(t) L ∞ + 2 v(t)
q ≥q
1
H2
⎠.
(24)
Using well-known estimates on the heat equation (see for instance [4]) and inequalities (23) and (24), we get that
$$M^{\sigma+1}_\lambda(v) \le \|v(T_0)\|_{C^{\sigma+1}} + \Big(\frac{C}{\lambda} + 2^{\frac{3q}{2}}\,F_q(T_0,T)\Big)M^{\sigma+1}_\lambda(v) + \frac{C}{\nu}\,M^{\sigma}_\lambda(\tau)$$
with
$$F_q(T_0,T) \overset{\mathrm{def}}{=} \sup_{t\in[T_0,T]}\int_{T_0}^{t} e^{-c\nu 2^{2q}(t-t')}\,\|v(t')\|_{H^{\frac12}}\,dt'.$$
Hölder's inequality implies immediately that
$$F_q(T_0,T) \le \frac{C}{\nu^{\frac34}}\,2^{-\frac{3q}{2}}\,\|v\|_{L^4_{[T_0,T]}(H^{\frac12})}.$$
Moreover, it is easy to see that $M^{\sigma}_\lambda(\tau) \le M^{\sigma}_\lambda(f)$. So we infer that
$$M^{\sigma+1}_\lambda(v) \le \|v(T_0)\|_{C^{\sigma+1}} + \frac{3C\,\|\tau_0\|_{C^{\sigma}}}{\nu} + \Big(\frac{C}{\lambda} + \frac{C}{\lambda\nu} + \frac{C}{\nu^{\frac34}}\,\|v\|_{L^4_{[T_0,T]}(H^{\frac12})}\Big)M^{\sigma+1}_\lambda(v).$$
Now it is enough to choose $T_0$ such that the quantity
$$\frac{C}{\lambda} + \frac{C}{\lambda\nu} + \frac{C}{\nu^{\frac34}}\,\|v\|_{L^4_{[T_0,T]}(H^{\frac12})}$$
is small enough. Then, as $\sigma$ is greater than 0, the solution $(v,\tau)$ of the system (1) is such that $(\nabla v,\tau)$ belongs to $L^\infty([T_0,T]\times\mathbb{R}^2)$; this concludes the proof of Theorem 1.1.

Acknowledgements. The work of P.C. is partially supported by NSF-DMS grant 0504213. The work of N.M. is partially supported by NSF-DMS grant 0403983.
References
1. Bahouri, H., Chemin, J.-Y.: Équations de transport relatives à des champs de vecteurs non-lipschitziens et mécanique des fluides. Arch. Rational Mech. Anal. 127(2), 159–181 (1994)
2. Bird, R.B., Curtiss, C., Armstrong, R., Hassager, O.: Dynamics of polymeric liquids. Kinetic Theory, Vol. 2. New York: Wiley, 1987
3. Bony, J.-M.: Calcul symbolique et propagation des singularités pour les équations aux dérivées partielles non linéaires. Ann. Sci. École Norm. Sup. (4), 14(2), 209–246 (1981)
4. Chemin, J.-Y.: Théorèmes d’unicité pour le système de Navier-Stokes tridimensionnel. J. d’Anal. Math. 77, 27–50 (1999)
5. Chemin, J.-Y., Masmoudi, N.: About lifespan of regular solutions of equations related to viscoelastic fluids. SIAM J. Math. Anal. 33(1), 84–112 (electronic) (2001)
6. Constantin, P.: Nonlinear Fokker-Planck Navier-Stokes systems. Commun. Math. Sci. 3(4), 531–544 (2005)
7. Constantin, P., Fefferman, C., Titi, E., Zarnescu, A.: Regularity for coupled two-dimensional nonlinear Fokker-Planck and Navier-Stokes systems. Commun. Math. Phys. 270, 789–811 (2007)
8. Doi, M., Edwards, S.F.: The Theory of Polymer Dynamics. Oxford: Oxford University Press, 1986
9. E, W., Li, T., Zhang, P.: Well-posedness for the dumbbell model of polymeric fluids. Commun. Math. Phys. 248(2), 409–427 (2004)
10. Fernández-Cara, E., Guillén, F., Ortega, R.R.: Some theoretical results for viscoplastic and dilatant fluids with variable density. Nonlinear Anal. 28(6), 1079–1100 (1997)
11. Fernández-Cara, E., Guillén, F., Ortega, R.R.: Some theoretical results concerning non-Newtonian fluids of the Oldroyd kind. Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4), 26(1), 1–29 (1998)
12. Fernández-Cara, E., Guillén, F., Ortega, R.R.: The mathematical analysis of viscoelastic fluids of the Oldroyd kind, 2000
13. Guillopé, C., Saut, J.-C.: Existence results for the flow of viscoelastic fluids with a differential constitutive law. Nonlinear Anal. 15(9), 849–869 (1990)
14. Guillopé, C., Saut, J.-C.: Global existence and one-dimensional nonlinear stability of shearing motions of viscoelastic fluids of Oldroyd type. RAIRO Modél. Math. Anal. Numér. 24(3), 369–401 (1990)
15. Jourdain, B., Lelièvre, T., Le Bris, C.: Existence of solution for a micro-macro model of polymeric fluid: the FENE model. J. Funct. Anal. 209(1), 162–193 (2004)
16. Lin, F.-H., Liu, C., Zhang, P.: On hydrodynamics of viscoelastic fluids. Comm. Pure Appl. Math. 58(11), 1437–1471 (2005)
17. Lin, F.-H., Liu, C., Zhang, P.: On a Micro-Macro model for polymeric fluids near equilibrium. Comm. Pure Appl. Math. 60(6), 838–866 (2007)
18. Lin, F.-H., Zhang, P., Zhang, Z.: On the global existence of smooth solution to the 2-d FENE dumbbell model. Commun. Math. Phys., DOI:10.1007/s00220-007-0385-1
19. Lions, P.-L., Masmoudi, N.: Global solutions for some Oldroyd models of non-Newtonian flows. Chinese Ann. Math. Ser. B 21(2), 131–146 (2000)
20. Lions, P.-L., Masmoudi, N.: Global existence of weak solutions to micro-macro models. To appear C. R. Math. Acad. Sci. Paris, 2007
21. Masmoudi, N.: Well posedness for the FENE dumbbell model of polymeric flows. Preprint, 2007
22. Otto, F., Tzavaras, A.E.: Continuity of velocity gradients in suspensions of rod-like molecules. SFB preprint Nr. 141, 2004
23. Zhang, H., Zhang, P.: Local existence for the FENE-dumbbell model of polymeric fluids. Arch. Ration. Mech. Anal. 181(2), 373–400 (2006)

Communicated by A. Kupiainen
Commun. Math. Phys. 278, 193–252 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0386-0
Communications in
Mathematical Physics
Localization for Yang-Mills Theory on the Fuzzy Sphere
Harold Steinacker1, Richard J. Szabo2
1 Institut für Theoretische Physik, Universität Wien, Boltzmanngasse 5, A-1090 Wien, Austria. E-mail: [email protected]
2 Department of Mathematics, and Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Colin Maclaurin Building, Riccarton, Edinburgh EH14 4AS, UK. E-mail: [email protected]
Received: 27 February 2007 / Accepted: 8 May 2007 Published online: 20 November 2007 – © Springer-Verlag 2007
Abstract: We present a new model for Yang-Mills theory on the fuzzy sphere in which the configuration space of gauge fields is given by a coadjoint orbit. In the classical limit it reduces to ordinary Yang-Mills theory on the sphere. We find all classical solutions of the gauge theory and use nonabelian localization techniques to write the partition function entirely as a sum over local contributions from critical points of the action, which are evaluated explicitly. The partition function of ordinary Yang-Mills theory on the sphere is recovered in the classical limit as a sum over instantons. We also apply abelian localization techniques and the geometry of symmetric spaces to derive an explicit combinatorial expression for the partition function, and compare the two approaches. These extend the standard techniques for solving gauge theory on the sphere to the fuzzy case in a rigorous framework.

Contents
1. Introduction and Summary
2. Symplectic Model for Yang-Mills Theory on the Fuzzy Sphere
   2.1 The fuzzy sphere
   2.2 Configuration space of gauge fields
   2.3 The Yang-Mills action
   2.4 Symplectic geometry of the configuration space
3. The Classical Configuration Space
   3.1 Classical solutions
   3.2 The classical action
   3.3 Local symplectic geometry of the configuration space
   3.4 Explicit decomposition at Yang-Mills critical surfaces
   3.5 Fluctuations around the critical surfaces
4. Nonabelian Localization
   4.1 Equivariant cohomology and the localization principle
   4.2 Explicit evaluation of the localization forms
   4.3 Localization at the vacuum moduli space
   4.4 Localization at maximally irreducible saddle points
5. Abelianization
6. Itzykson-Zuber Localization on the Configuration Space
7. Abelian Localization and Radial Coordinates
   7.1 Polar decomposition of the configuration space
   7.2 Evaluation of the abelianized partition function: U(1) gauge theory
   7.3 Evaluation of the abelianized partition function: U(n) gauge theory
8. Yang-Mills Critical Surfaces in Abelianized Localization
   8.1 Itzykson-Zuber localization on the symplectic leaves
   8.2 Radial coordinates for Yang-Mills critical surfaces
   8.3 Action of the gauge group
1. Introduction and Summary Gauge theory on the fuzzy sphere has been of interest for many years as the simplest example of a noncommutative gauge theory with finitely many degrees of freedom which retains all of the classical symmetries of the corresponding undeformed field theory (see for instance [1–12] and references therein). It can be formulated as an N × N matrix model, which provides a natural regularization preserving all symmetries of quantum gauge theory on the classical sphere which is recovered in the large N limit. At the classical level one finds non-trivial gauge field configurations such as monopoles which can be naturally described in terms of the noncommutative topology of projective modules. Besides Yang-Mills gauge theory which is the focus of this paper, certain other gauge theories on the fuzzy sphere naturally emerge in string theory upon quantizing the worldvolume dynamics of spherical D2-branes [14], obtained for instance as expansions about vacua of matrix models with a Chern-Simons term [15,16] describing superstrings in pp-wave backgrounds [17]. These models contain additional scalar degrees of freedom and are not considered here. The formulation of Yang-Mills theory as an N × N matrix model allows a nonperturbative quantization in terms of a finite-dimensional path integral [9]. This can then be evaluated in terms of an N -dimensional integral, and the classical result as a sum over two-dimensional instantons [18–20] is recovered in the commutative limit N → ∞. A different approach to evaluate the path integral was given in [11], which is also restricted to the large N limit. This indicates in particular that the model is void of the usual perturbative ambiguities which plague noncommutative gauge theories in higher dimensions, such as UV/IR mixing (see [21,22] for reviews). In this paper we will formulate a new model for quantum Yang-Mills theory on the fuzzy sphere, and solve it exactly. 
The model reduces to pure Yang-Mills theory on the classical sphere when N → ∞ without any spurious auxiliary scalar fields. The classical theory admits topologically non-trivial solutions as in previous matrix model formulations [9], including some purely noncommutative ones. Its main virtue is that the finite-dimensional configuration space of gauge fields can be described as a compact coadjoint orbit, which is naturally a symplectic manifold with a hamiltonian action of a nonabelian Lie symmetry group. The Yang-Mills action is the square of the corresponding moment map, and therefore our model can be solved exactly using nonabelian localization techniques [18,23–27] to cast the partition function as a sum over local contributions from the classical solutions of the gauge theory. It can also be solved
by abelian localization techniques which exploit the usual Duistermaat-Heckman theorem (see [28,29] for extensive treatments) and which provide an interesting alternative to the semiclassical expansion. Although the model described in this paper is fundamentally different from the fuzzy gauge theories that naturally emerge in string theory, which contain a Chern-Simons term in their action, nonabelian localization bears certain remarkable similarities to the nonabelian localization of Chern-Simons theory on Seifert homology spheres [27]. There are two main motivations behind the present work. Firstly, in the commutative case, two-dimensional gauge theories are exactly solvable and can be solved explicitly, either at strong coupling by exploiting the Migdal formula [30,31] which expresses it in terms of a sum over irreducible representations of its gauge group, or at weak coupling by using Poisson resummation techniques to cast it as a sum over two-dimensional instantons [18–20]. One would therefore like to have a similar picture in the noncommutative case. The instanton expansion can be readily generalized to provide the exact solution for gauge theory on a two-dimensional noncommutative torus [32,33]. However, in previous formulations of gauge theory on the fuzzy sphere this is not possible, either because extra scalar degrees of freedom not normally present in commutative Yang-Mills theory destroy the topological nature of the gauge theory and hence its exact solvability, or else because the exact solution does not decompose neatly into isolated contributions from classical solutions. Our model fills this gap, providing a gauge theory on the fuzzy sphere whose exact solution is on a unified footing with that of gauge theory on the noncommutative torus, in the same way that all two-dimensional gauge theories admit universal solutions. 
This is even apparent from the strong coupling expansions of the two noncommutative gauge theories [33,34], which exhibit the same degrees of complexity. However, the precise implementation of the nonabelian localization principle is rather different in the two cases. In the case of the torus, one starts from a rational noncommutative gauge theory and exploits Morita equivalence with commutative gauge theory to extract the exact instanton expansion, and then uses continuity arguments to extend the expansion to generic values of the noncommutativity parameter. On the fuzzy sphere, Morita equivalence is not available in this manner, and we will have to evaluate the quantum fluctuation integrals required in the semiclassical expansion explicitly. This entails a significantly larger amount of analysis and work than in the case of the torus.

Secondly, our formulation of gauge theory on the fuzzy sphere provides a new finite-dimensional model which can be solved explicitly by nonabelian localization techniques. In particular, we draw heavily on techniques developed recently in [27] to analyse higher critical points in ordinary two-dimensional Yang-Mills theory. In our case, the analysis is intrinsically finite-dimensional and in accord with rigorous results established in [24,26]. The techniques we exploit in this paper involve a beautiful mix of methods from random matrix theory and (both abelian and nonabelian) localization. In particular, we will throughout compare with some analogous results obtained directly from random matrix theory in [9]. Our approach thereby extends the toolkit of methods which can be generally used to treat gauge theories on fuzzy spaces.

The outline of this paper is as follows. In Sect. 2 we introduce our new symplectic model for gauge theory on the fuzzy sphere, showing that it reduces to pure Yang-Mills theory on the classical sphere in the large N limit.
We also describe in detail the standard construction of the symplectic structure on the coadjoint orbit space of gauge fields. In Sect. 3 we classify all classical solutions of the gauge theory, finding fuzzy versions of the usual instantons and monopoles as well as hosts of purely noncommutative solutions
such as fluxons [35]. We then give a detailed description of the local geometry of the configuration space near each Yang-Mills critical point. In Sect. 4 we review some general aspects of nonabelian localization, and apply it to compute precisely the contributions to the path integral from the vacuum and also higher unstable critical points, showing in each case that the standard instanton contributions on the sphere are recovered at $N \to \infty$. In Sects. 5, 6, and 7 we give an alternative description of the exact path integral in terms of abelian localization, which exploits the fact that the configuration space is a hermitian symmetric space to express the gauge field degrees of freedom in a suitable system of coordinates [36]. These coordinates have been previously used to evaluate integrals arising in random matrix theory in [37,38]. Finally, in Sect. 8 we compare the abelian and nonabelian localization approaches, indicating how to map between the Yang-Mills critical points and those of the abelianized localization. This is similar to the abelianized localization at higher critical points of ordinary Yang-Mills theory studied in [28], although in the fuzzy case the mapping is not one-to-one and is thus far more intricate.

2. Symplectic Model for Yang-Mills Theory on the Fuzzy Sphere

In this section we will introduce our new symplectic model for gauge theory on the fuzzy sphere. A similar formulation was given for gauge theory on fuzzy $\mathbb{CP}^2$ in [39]. This formulation will be particularly suitable for the approach that we take later on to computing the path integral using localization techniques.

2.1. The fuzzy sphere. Let $N \in \mathbb{N}$, and let $\xi_i$, $i = 1, 2, 3$ be the $N \times N$ hermitian coordinate generators of the fuzzy sphere $S^2_N \cong \mathrm{Mat}_N$ which satisfy the relations
$$\epsilon^{ij}{}_{k}\,\xi_i\,\xi_j = i\,\xi_k \quad\text{and}\quad \xi_i\,\xi^i = \tfrac14\,(N^2-1)\,\mathbb{1}_N, \tag{2.1}$$
where throughout repeated upper and lower indices are implicitly summed over. The deformation parameter is $\frac1N$ and $S^2_N$ becomes the algebra of functions on the classical unit sphere $S^2$ in the limit $N \to \infty$. The quantum space $S^2_N$ preserves the classical invariance under global rotations as follows. The $\xi_i$ generate an $N$-dimensional representation of the global $SU(2)$ isometry group. Under the adjoint action of $SU(2)$, this representation decomposes covariantly into $p$-dimensional irreducible representations $(p)$ of $SU(2)$ as
$$\mathrm{Mat}_N \cong (1)\oplus(3)\oplus\cdots\oplus(2N-1), \tag{2.2}$$
which are interpreted as fuzzy spherical harmonics. This decomposition defines a natural map from $S^2_N$ to the space of functions on the commutative sphere. The integral of a function $f \in S^2_N$ over the fuzzy sphere is given by the trace of $f$, which coincides with the usual integral on $S^2$,
$$\mathrm{Tr}(f) = \frac{N}{4\pi}\int_{S^2} d\Omega\; f, \tag{2.3}$$
where the above map is understood. Rotational invariance of the integral then corresponds to invariance of the matrix trace under the adjoint action of $SU(2)$.
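Following [9], let us combine the generators $\xi_i$ into a larger hermitian $\mathcal{N}\times\mathcal{N}$ matrix
$$\Xi = \tfrac12\,\mathbb{1}_N\otimes\sigma^0 + \xi_i\otimes\sigma^i, \tag{2.4}$$
where $\mathcal{N} = 2N$, $\sigma^0 = \mathbb{1}_2$, while
$$\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad\text{and}\qquad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{2.5}$$
are the Pauli spin matrices obeying
$$\mathrm{Tr}\,\sigma^i = 0 \quad\text{and}\quad \sigma^i\,\sigma^j = \delta^{ij}\,\mathbb{1}_2 + i\,\epsilon^{ij}{}_{k}\,\sigma^k. \tag{2.6}$$
One easily finds from (2.1) and (2.6) the identities
$$\Xi^2 = \frac{N^2}{4}\,\mathbb{1}_{\mathcal{N}} \quad\text{and}\quad \mathrm{Tr}(\Xi) = N. \tag{2.7}$$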
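The relations (2.1) and the identities (2.7) can be checked numerically for small $N$. The following is a minimal sketch in plain Python, assuming the standard spin-$j$ matrices with $j = (N-1)/2$ as a concrete realization of the $\xi_i$; all function names (`fuzzy_sphere_generators`, `build_Xi`, `check_fuzzy_sphere`) are illustrative and not taken from the paper. Since $\Xi^2 = \frac{N^2}{4}\mathbb{1}$ forces the eigenvalues to be $\pm\frac N2$, the multiplicities are read off from the trace.

```python
import math

def eye(n):
    return [[(1 + 0j) if a == b else 0j for b in range(n)] for a in range(n)]

def matmul(A, B):
    return [[sum(A[a][k] * B[k][b] for k in range(len(B))) for b in range(len(B[0]))]
            for a in range(len(A))]

def add(A, B, s=1):
    """Return A + s*B."""
    return [[A[a][b] + s * B[a][b] for b in range(len(A[0]))] for a in range(len(A))]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def kron(A, B):
    """Kronecker product A (x) B."""
    return [[A[a][b] * B[c][d] for b in range(len(A[0])) for d in range(len(B[0]))]
            for a in range(len(A)) for c in range(len(B))]

def max_abs(A):
    return max(abs(x) for row in A for x in row)

def fuzzy_sphere_generators(N):
    """Spin j=(N-1)/2 matrices xi_1, xi_2, xi_3 (assumed realization of (2.1))."""
    j = (N - 1) / 2
    m = [j - a for a in range(N)]                       # eigenvalues of xi_3
    xi3 = [[(m[a] if a == b else 0) + 0j for b in range(N)] for a in range(N)]
    plus = [[0j] * N for _ in range(N)]                 # raising operator xi_+
    for a in range(1, N):
        plus[a - 1][a] = math.sqrt(j * (j + 1) - m[a] * (m[a] + 1)) + 0j
    minus = [[plus[b][a].conjugate() for b in range(N)] for a in range(N)]
    xi1 = scale(0.5, add(plus, minus))                  # (xi_+ + xi_-)/2
    xi2 = scale(-0.5j, add(plus, minus, -1))            # (xi_+ - xi_-)/(2i)
    return [xi1, xi2, xi3]

# sigma^0 and the Pauli matrices of (2.5)
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def build_Xi(N):
    """Xi = 1/2 1_N (x) sigma^0 + xi_i (x) sigma^i, as in (2.4)."""
    xi = fuzzy_sphere_generators(N)
    Xi = scale(0.5, kron(eye(N), SIGMA[0]))
    for i in range(3):
        Xi = add(Xi, kron(xi[i], SIGMA[i + 1]))
    return Xi, xi

def check_fuzzy_sphere(N):
    Xi, xi = build_Xi(N)
    casimir = [[0j] * N for _ in range(N)]
    for x in xi:
        casimir = add(casimir, matmul(x, x))
    # xi_i xi^i = (N^2 - 1)/4 from (2.1)
    casimir_error = max_abs(add(casimir, scale((N * N - 1) / 4, eye(N)), -1))
    # k = 3 component of (2.1): [xi_1, xi_2] = i xi_3
    comm = add(matmul(xi[0], xi[1]), matmul(xi[1], xi[0]), -1)
    commutator_error = max_abs(add(comm, scale(1j, xi[2]), -1))
    # (2.7): Xi^2 = N^2/4 and Tr(Xi) = N
    square_error = max_abs(add(matmul(Xi, Xi), scale(N * N / 4, eye(2 * N)), -1))
    tr = sum(Xi[a][a] for a in range(2 * N))
    # N_+ + N_- = 2N and (N/2)(N_+ - N_-) = Tr(Xi) fix the multiplicities
    n_plus = round(N + tr.real / N)
    return {
        "casimir_error": casimir_error,
        "commutator_error": commutator_error,
        "square_error": square_error,
        "trace": tr,
        "multiplicities": (n_plus, 2 * N - n_plus),
    }
```

For instance, `check_fuzzy_sphere(3)["multiplicities"]` is the pair $(N_+, N_-)$ for $N = 3$, obtained from the trace rather than from a diagonalization.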
Since $\xi_i\otimes\sigma^i$ is an intertwiner of the Clebsch-Gordan decomposition $(N)\otimes(2) = (N-1)\oplus(N+1)$, this implies that $\Xi$ has eigenvalues $\pm\frac N2$ with respective multiplicities $N_\pm = N\pm1$.

2.2. Configuration space of gauge fields. We will now describe the gauge field degrees of freedom in our formulation. To elucidate the construction in as transparent a way as possible, we begin with the abelian case of $U(1)$ gauge theory. To introduce $u(N)$ gauge fields $A_i$ on $S^2_N$, consider the covariant coordinates [40]
$$C_i = \xi_i + A_i \quad\text{and}\quad C_0 = \tfrac12\,\mathbb{1}_N + A_0, \tag{2.8}$$
which transform under the gauge group $G = U(N)$ as $C_\mu \to U^{-1}\,C_\mu\,U$ for $\mu = 0, 1, 2, 3$ and $U \in U(N)$. We can again assemble them into a larger $\mathcal{N}\times\mathcal{N}$ matrix
$$C = C_\mu\otimes\sigma^\mu. \tag{2.9}$$
Generically, these would consist of four independent fields, and we have to somehow reduce them to two tangential fields on $S^2_N$. There are several ways to do this. For example, one can impose the constraints $A_0 = 0$ and $C_i\,C^i = \frac{N^2-1}{4}\,\mathbb{1}_N$ as in [9], leading to a constrained hermitian multi-matrix model describing quantum gauge theory on the fuzzy sphere which recovers Yang-Mills theory on the classical sphere in the large $N$ limit. Here we will use a different approach and impose the constraints
$$C^2 = \frac{N^2}{4}\,\mathbb{1}_{\mathcal{N}} \quad\text{and}\quad \mathrm{Tr}(C) = N, \tag{2.10}$$
which is equivalent to requiring that $C$ has eigenvalues $\pm\frac N2$ with multiplicities $N_\pm = N\pm1$. In terms of the components of (2.9), this amounts to the constraints
$$C_i\,C^i + C_0^2 = \frac{N^2}{4}\,\mathbb{1}_N \quad\text{and}\quad i\,\epsilon_i{}^{jk}\,C_j\,C_k + \{C_0, C_i\} = 0. \tag{2.11}$$
We checked in Sect. 2.1 above that this is satisfied for $A_\mu = 0$, wherein $C = \Xi$. We can then consider the action of the unitary group $U(2N)$ given by
$$C \longrightarrow U^{-1}\,C\,U \tag{2.12}$$
which generates a coadjoint orbit of $U(2N)$ and preserves the constraint (2.10). The gauge fields $A_\mu$ are in this way interpreted as fluctuations about the coordinates of the quantum space $S^2_N$. The constraint (2.10) ensures that the covariant coordinates (2.9) describe a dynamical fuzzy sphere. The gauge group $G = U(N)$ and the global isometry group $SU(2)$ of the sphere are subgroups of the larger symmetry group $U(2N)$. In particular, the generators of the gauge group are given by elements of the form $\phi = \phi_0\otimes\sigma^0 \in \mathfrak{g} := u(N) \subset u(\mathcal{N})$, which defines the gauge algebra $\mathfrak{g}$. We thus claim that a possible configuration space of gauge fields is given by the single coadjoint orbit
$$\mathcal{O} := \mathcal{O}(\Xi) = \big\{\, C = U^{-1}\,\Xi\,U \;\big|\; U \in U(\mathcal{N}) \,\big\}, \tag{2.13}$$
where $\Xi \in u(2N)$ is given by (2.4). Explicitly, dividing by the stabilizer of $\Xi$ gives a representation of the orbit (2.13) as the symmetric space $\mathcal{O} \cong U(2N)/U(N+1)\times U(N-1)$ of dimension $\dim(\mathcal{O}) = 2(N^2-1)$. A similar construction was given in [39] for the case of $\mathbb{CP}^2$, and applied to $S^2_N$ in a different way in [11].

To justify this claim, we must check that the orbit $\mathcal{O}$ captures the correct number of degrees of freedom at least in the commutative limit $N \to \infty$, i.e. that the gauge fields $A_i$ are essentially tangent vector fields on $S^2_N$. The tangent space to $\mathcal{O}(\Xi)$ at a point $C$ is isomorphic to $T_C\mathcal{O} \cong u(\mathcal{N})/\mathfrak{r}$, where $\mathfrak{r} = u(N_+)\times u(N_-)$ is the stabilizer subalgebra of $\Xi$. This identification is equivariant with respect to the natural adjoint action of the Lie group $U(\mathcal{N})$. Explicitly, tangent vectors to $\mathcal{O}(\Xi)$ at $C$ have the form¹
$$V_\phi = i\,[C, \phi] \tag{2.14}$$
for any hermitian element $\phi \in u(\mathcal{N})/\mathfrak{r}$,² which are just the generators of the unitary group $U(\mathcal{N})$ acting on $\mathcal{O}(\Xi)$ by the adjoint action. These actually describe vector fields on the entire orbit space $\mathcal{O}(\Xi)$. Here and in the following we use the symbol $C$ to denote both elements of $\mathcal{O}(\Xi)$, as well as the matrix of overcomplete coordinate functions on $\mathcal{O}(\Xi)$ defined using the embeddings $\mathcal{O}(\Xi) \hookrightarrow u(\mathcal{N}) \hookrightarrow \mathbb{C}^{\mathcal{N}^2}$.

The map J. Following [39], we can make the description of the tangent space to $\mathcal{O}$, spanned by the vectors $V_\phi$, more explicit as follows. Consider for $C \in \mathcal{O}$ the map
$$J : u(\mathcal{N}) \longrightarrow su(\mathcal{N}) \tag{2.15}$$
defined by
$$J(\phi) = \frac1N\,V_\phi = \frac{i}{N}\,[C, \phi]. \tag{2.16}$$
Using (2.10) one finds that it satisfies
$$J^3 = -J \tag{2.17}$$
and hence amounts to suitable projectors. Moreover, the map $J$ is an antihermitian operator with respect to the invariant Cartan-Killing inner product $\mathrm{Tr}(\phi\,\psi)$ on $u(\mathcal{N})$, since
$$\mathrm{Tr}\big(\phi\,J(\psi)\big) = \frac{i}{N}\,\mathrm{Tr}\big(\phi\,[C,\psi]\big) = -\frac{i}{N}\,\mathrm{Tr}\big([C,\phi]\,\psi\big) = -\mathrm{Tr}\big(J(\phi)\,\psi\big). \tag{2.18}$$
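The two properties (2.17) and (2.18) only use $C^2 = \frac{N^2}{4}\mathbb{1}$ and cyclicity of the trace, so they can be tested numerically at a random point of the orbit. The sketch below is illustrative only: it assumes that a point of $\mathcal{O}(\Xi)$ may be generated by conjugating the diagonal matrix with eigenvalues $\pm\frac N2$ (multiplicities $N\pm1$) by a product of random plane rotations, and the helper names are hypothetical.

```python
import cmath
import math
import random

def matmul(A, B):
    return [[sum(A[a][k] * B[k][b] for k in range(len(B))) for b in range(len(B[0]))]
            for a in range(len(A))]

def dagger(A):
    return [[A[b][a].conjugate() for b in range(len(A))] for a in range(len(A[0]))]

def comm(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(AB, BA)]

def trace_prod(A, B):
    """Tr(A B) without forming the full product."""
    n = len(A)
    return sum(A[a][b] * B[b][a] for a in range(n) for b in range(n))

def random_orbit_point(N, rng, sweeps=2):
    """Hermitian C = U^dagger D U with D = diag(+N/2 (N+1 times), -N/2 (N-1 times))."""
    dim = 2 * N
    C = [[0j] * dim for _ in range(dim)]
    for a in range(dim):
        C[a][a] = (N / 2 if a < N + 1 else -N / 2) + 0j
    for _ in range(sweeps):
        for a in range(dim - 1):
            # unitary rotation with a phase, acting in the (a, a+1) plane
            t, p = rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi)
            G = [[(1 + 0j) if r == c else 0j for c in range(dim)] for r in range(dim)]
            G[a][a] = math.cos(t)
            G[a][a + 1] = cmath.exp(1j * p) * math.sin(t)
            G[a + 1][a] = -cmath.exp(-1j * p) * math.sin(t)
            G[a + 1][a + 1] = math.cos(t)
            C = matmul(dagger(G), matmul(C, G))
    return C

def random_hermitian(dim, rng):
    H = [[0j] * dim for _ in range(dim)]
    for a in range(dim):
        H[a][a] = complex(rng.gauss(0, 1), 0)
        for b in range(a + 1, dim):
            z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
            H[a][b], H[b][a] = z, z.conjugate()
    return H

def J(C, phi, N):
    """J(phi) = (i/N) [C, phi], as in (2.16)."""
    return [[1j / N * x for x in row] for row in comm(C, phi)]

def check_J(N, seed=0):
    rng = random.Random(seed)
    C = random_orbit_point(N, rng)
    phi = random_hermitian(2 * N, rng)
    psi = random_hermitian(2 * N, rng)
    J1 = J(C, phi, N)
    J3 = J(C, J(C, J1, N), N)
    # (2.17): J^3(phi) + J(phi) = 0
    cube_error = max(abs(x + y) for r, s in zip(J3, J1) for x, y in zip(r, s))
    # (2.18): Tr(phi J(psi)) + Tr(J(phi) psi) = 0
    antihermitian_error = abs(trace_prod(phi, J(C, psi, N)) + trace_prod(J1, psi))
    return cube_error, antihermitian_error
```

Both returned numbers should vanish up to floating-point rounding; the choice of plane rotations is only a convenient way to obtain a unitary without external libraries.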
¹ To streamline notation, we will not write explicitly the local dependences of fields and operators defined at points $C \in \mathcal{O}$.
² With our conventions, the vector fields (2.14) are real.

The map $J$ will play an instrumental role in this paper and its geometrical properties will be studied in more detail in the next section. Here we simply note the meaning of $J$ in the commutative limit $N \to \infty$. In component form with $\phi = \phi_\mu\otimes\sigma^\mu$, it acts as³
$$J(\phi) \approx -\frac{i}{N}\,\big[\phi_\mu\otimes\sigma^\mu,\; C_j\otimes\sigma^j\big] \approx -\frac{i}{N}\,[\phi_\mu, C_j]\otimes\sigma^\mu\,\sigma^j + \frac{i}{N}\,\phi_\mu\,C_j\otimes[\sigma^\mu, \sigma^j], \tag{2.19}$$
where we have set $C_0 \approx \frac12\,\mathbb{1}_N$ in the large $N$ limit as will be justified below. Thus at large $N$ this reduces to
$$J(\phi) \approx O\big(\tfrac1N\big) - \epsilon^{ij}{}_{k}\,\phi_i\,x_j\otimes\sigma^k \tag{2.20}$$
for “almost” commutative functions describing the gauge field fluctuations $A_\mu$. Here $\xi_i \approx \frac N2\,x_i$ define homogeneous coordinates $x_i$ on the sphere. This result means that if we interpret $\phi_i$ as a three-component vector field on the fuzzy sphere, including radial components, then the operator $J$ vanishes on the normal component and essentially coincides with the complex structure for tangential fields on the Kähler manifold $S^2$. In particular, the image of $J$, i.e. the space of tangent vectors (2.14) to $\mathcal{O}(\Xi)$ or small variations of the gauge field, indeed admits two independent field degrees of freedom. This implies that the orbit (2.13) describes two tangent vector fields on $S^2_N$. Hence the tangent space to $\mathcal{O}$ can be interpreted precisely as the space of tangent vector fields on the fuzzy sphere. This nicely reflects the affine nature of the space of gauge fields.

Nonabelian gauge theory. The generalization to nonabelian $U(n)$ gauge theory is very simple. One now takes
$$\mathcal{N} = 2\,n\,N \tag{2.21}$$
and enlarges the matrix (2.4) to $\Xi\otimes\mathbb{1}_n$ (which we continue to denote as $\Xi$ for ease of notation). The configuration space is given by the $U(\mathcal{N})$ orbit (2.13) with
$$C^2 = \frac{N^2}{4}\,\mathbb{1}_{\mathcal{N}} \quad\text{and}\quad \mathrm{Tr}(C) = n\,N. \tag{2.22}$$
Then $C$ has eigenvalues $\pm\frac N2$ of respective multiplicities $n\,(N\pm1)$. The configuration space
$$\mathcal{O} = U(2nN)/U(nN_+)\times U(nN_-) \tag{2.23}$$
describes $u(n)$-valued gauge fields on $S^2_N$. Its dimension is given by
$$\dim(\mathcal{O}) = 2\,n^2\,\big(N^2-1\big). \tag{2.24}$$
The gauge group is now given by $G = U(nN)$, and acts on the covariant coordinates $C_i = \xi_i\otimes\mathbb{1}_n + A_i$, $C_0 = \frac12\,\mathbb{1}_{nN} + A_0$ as $C_\mu \to U^{-1}\,C_\mu\,U$. This leads to the expected transformation law for the $u(n)$-valued gauge fields $A_i$. The corresponding gauge algebra is now $\mathfrak{g} := u(nN) \subset u(\mathcal{N})$, consisting of elements of the form $\phi = \phi_0\otimes\sigma^0 \in \mathfrak{g}$.

³ Throughout, the notation $\approx$ will always mean an equality which is valid in the large $N$ commutative limit.
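The dimension formula (2.24) follows from counting generators of the coset (2.23): $\dim U(k) = k^2$, and the dimension of a homogeneous space is the dimension of the group minus that of the stabilizer. A short sketch of this count, with hypothetical function names:

```python
def unitary_dim(k):
    """Real dimension of the unitary group U(k)."""
    return k * k

def orbit_dimension(n, N):
    """dim of U(2nN) / (U(n(N+1)) x U(n(N-1))), the coset in (2.23)."""
    total = unitary_dim(2 * n * N)
    stabilizer = unitary_dim(n * (N + 1)) + unitary_dim(n * (N - 1))
    return total - stabilizer

def closed_form(n, N):
    """Right-hand side of (2.24)."""
    return 2 * n * n * (N * N - 1)
```

Setting `n = 1` reproduces the abelian count $2(N^2-1)$ quoted below (2.13).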
2.3. The Yang-Mills action. Consider the action
$$S = S(C) := \frac{N}{g}\,\mathrm{Tr}\big(C_0 - \tfrac12\,\mathbb{1}_{nN}\big)^2 \tag{2.25}$$
for $C \in \mathcal{O}$, which is invariant under the group of gauge transformations $G$ as well as global $SU(2)$ rotations. We claim that it reduces in the commutative limit $N \to \infty$ to the usual Yang-Mills action on the sphere $S^2$, and can therefore be taken as a definition of the Yang-Mills action on the fuzzy sphere $S^2_N$. We establish this explicitly below in the abelian case $n = 1$, the extension to general $n$ being obvious. Consider the three-component field strength [9]
$$F_i := i\,\epsilon_i{}^{jk}\,C_j\,C_k + C_i = i\,\epsilon_i{}^{jk}\,[\xi_j, A_k] + i\,\epsilon_i{}^{jk}\,A_j\,A_k + A_i, \tag{2.26}$$
where $C_i = \xi_i + A_i$ as in (2.8). To understand its significance, consider the “north pole” of $S^2_N$, where $\xi_3 \approx \frac N2\,x_3 = \frac N2\,\mathbb{1}_N$ (with unit radius), and one can replace the operators
$$i\,\mathrm{ad}_{\xi_i} \longrightarrow -\varepsilon_i{}^{j}\,\partial_j := -\varepsilon_i{}^{j}\,\frac{\partial}{\partial x^j} \tag{2.27}$$
in the commutative limit for $i, j = 1, 2$. Hence upon identifying the commutative gauge fields $A_i^{\rm cl}$ through
$$A_i^{\rm cl} = -\varepsilon_i{}^{j}\,A_j, \tag{2.28}$$
the “radial” component $F_3$ of the field strength (2.26) reduces in the commutative limit to the standard expression
$$F_3 \approx \partial_1 A_2^{\rm cl} - \partial_2 A_1^{\rm cl} + i\,\big[A_1^{\rm cl}, A_2^{\rm cl}\big]. \tag{2.29}$$
The constraint (2.11) now implies
$$F_i + \big\{C_0 - \tfrac12\,\mathbb{1}_N,\, C_i\big\} = F_i + \{A_0, C_i\} = 0, \qquad \{\xi_i, A^i\} + A_0 + A_i\,A^i + A_0\,A_0 = 0. \tag{2.30}$$
Since only configurations with $A_0 = O(\frac1N)$ have finite action (2.25) and $\xi_3$ is of order $N$, this implies that $A_3$, $F_1$ and $F_2$ are of order $\frac1N$ at the north pole, while $A_1$ and $A_2$ can be finite of order 1. In particular, only the radial component $F_3$ survives the $N \to \infty$ limit, with
$$F_3 = -\{A_0, C_3\} \approx -N\,A_0. \tag{2.31}$$
This analysis can be made global by considering the “radial” field strength $F_r = x^i\,F_i$, which reduces to the usual field strength scalar on $S^2$. The action (2.25) thus indeed reduces to the usual Yang-Mills action in the commutative limit with dimensionless gauge coupling $g$, giving
$$S \approx \frac{1}{N g}\,\mathrm{Tr}\,(F_r)^2 \approx \frac{1}{4\pi g}\int_{S^2} d\Omega\;(F_r)^2. \tag{2.32}$$
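The algebraic identity in (2.26) holds for arbitrary hermitian fluctuations $A_i$, and it implies in particular that $F_i = 0$ at $A_i = 0$. The sketch below checks both statements numerically; it assumes the spin-$\frac{N-1}{2}$ matrices as a realization of the $\xi_i$ and draws the $A_i$ at random, and all function names are illustrative rather than taken from the paper.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[a][k] * B[k][b] for k in range(len(B))) for b in range(len(B[0]))]
            for a in range(len(A))]

def add(A, B, s=1):
    """Return A + s*B."""
    return [[A[a][b] + s * B[a][b] for b in range(len(A[0]))] for a in range(len(A))]

def comm(A, B):
    return add(matmul(A, B), matmul(B, A), -1)

def max_abs(A):
    return max(abs(x) for row in A for x in row)

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def fuzzy_sphere_generators(N):
    """Spin j=(N-1)/2 matrices (assumed realization of the xi_i)."""
    j = (N - 1) / 2
    m = [j - a for a in range(N)]
    xi3 = [[(m[a] if a == b else 0) + 0j for b in range(N)] for a in range(N)]
    plus = [[0j] * N for _ in range(N)]
    for a in range(1, N):
        plus[a - 1][a] = math.sqrt(j * (j + 1) - m[a] * (m[a] + 1)) + 0j
    minus = [[plus[b][a].conjugate() for b in range(N)] for a in range(N)]
    xi1 = [[0.5 * x for x in row] for row in add(plus, minus)]
    xi2 = [[-0.5j * x for x in row] for row in add(plus, minus, -1)]
    return [xi1, xi2, xi3]

def random_hermitian(dim, rng):
    H = [[0j] * dim for _ in range(dim)]
    for a in range(dim):
        H[a][a] = complex(rng.gauss(0, 1), 0)
        for b in range(a + 1, dim):
            z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
            H[a][b], H[b][a] = z, z.conjugate()
    return H

def field_strength(C):
    """Left-hand form of (2.26): F_i = i eps_i^{jk} C_j C_k + C_i."""
    F = []
    for i in range(3):
        acc = C[i]
        for j in range(3):
            for k in range(3):
                e = levi_civita(i, j, k)
                if e:
                    acc = add(acc, matmul(C[j], C[k]), 1j * e)
        F.append(acc)
    return F

def field_strength_expanded(xi, A):
    """Right-hand form of (2.26): i eps [xi_j, A_k] + i eps A_j A_k + A_i."""
    F = []
    for i in range(3):
        acc = A[i]
        for j in range(3):
            for k in range(3):
                e = levi_civita(i, j, k)
                if e:
                    acc = add(acc, comm(xi[j], A[k]), 1j * e)
                    acc = add(acc, matmul(A[j], A[k]), 1j * e)
        F.append(acc)
    return F

def check_field_strength(N, seed=0):
    rng = random.Random(seed)
    xi = fuzzy_sphere_generators(N)
    A = [random_hermitian(N, rng) for _ in range(3)]
    C = [add(xi[i], A[i]) for i in range(3)]
    lhs, rhs = field_strength(C), field_strength_expanded(xi, A)
    identity_error = max(max_abs(add(l, r, -1)) for l, r in zip(lhs, rhs))
    vacuum_error = max(max_abs(F) for F in field_strength(xi))   # F_i at A = 0
    return identity_error, vacuum_error
```

The check does not impose the constraint (2.11) on the random $A_i$; it only tests the rewriting in (2.26), which is valid off the orbit as well.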
2.4. Symplectic geometry of the configuration space. The standard Kirillov-Kostant construction makes the orbit space (2.13) into a symplectic manifold [41]. Given two tangent vector fields $V_\phi, V_\psi$ as above with $\phi, \psi \in u(\mathcal{N})$, the symplectic two-form $\omega \in \Omega^2(\mathcal{O})$ is defined locally through its pairing with the bivector $V_\phi\wedge V_\psi$ as
$$\langle\omega,\, V_\phi\wedge V_\psi\rangle = i\,\mathrm{Tr}\big(C\,[\phi,\psi]\big). \tag{2.33}$$
Using trace manipulations it is easy to see that the kernel of this pairing coincides with the stabilizer algebra $\mathfrak{r}$, and hence it is nondegenerate on $\mathcal{O}(\Xi)$. We will derive below an explicit form of $\omega$ (2.47), which allows one to verify directly the well-known fact that $\omega$ is closed,
$$d\omega = 0. \tag{2.34}$$
Thus $\omega$ indeed defines an invariant symplectic structure on $\mathcal{O}(\Xi)$. The tangent vectors $V_\phi$ are hamiltonian vector fields, and we claim that their generator is given by
$$H_\phi = \mathrm{Tr}(\phi\,C) \tag{2.35}$$
for $\phi \in u(\mathcal{N})$. Indeed, then $dH_\phi = \mathrm{Tr}(\phi\,dC)$, and by using the dual evaluation
$$\langle dC,\, V_\phi\rangle = i\,[C,\phi], \tag{2.36}$$
one has
$$\langle dH_\phi,\, V_\psi\rangle = i\,\mathrm{Tr}\big(\phi\,[C,\psi]\big) = -i\,\mathrm{Tr}\big(C\,[\phi,\psi]\big) = -\langle\omega,\, V_\phi\wedge V_\psi\rangle = -\langle\iota_{V_\phi}\omega,\, V_\psi\rangle, \tag{2.37}$$
where $\iota_{V_\phi}$ denotes contraction with the vector field $V_\phi$. Thus
$$dH_\phi = -\iota_{V_\phi}\omega \tag{2.38}$$
as claimed. This means that the hamiltonian function (2.35) defines a periodic flow generated by the action of a one-parameter subgroup $C \to e^{\,i t\phi}\,C\,e^{-i t\phi}$, $t \in \mathbb{R}$. The corresponding equivariant moment map $\mu : \mathcal{O}(\Xi) \to u(\mathcal{N})^\vee$ is the inclusion map which has the pairings
$$\langle\mu(C),\, \phi\rangle = H_\phi, \tag{2.39}$$
and it defines a representation of the Lie algebra $u(\mathcal{N})$ through the Poisson algebra corresponding to $\omega$. For gauge transformations $\phi = \phi_0\otimes\sigma^0$, the moment map $\mu$ reduces to
$$\langle\mu(C),\, \phi\rangle = 2\,\mathrm{Tr}(\phi_0\,C_0) = \mathrm{Tr}\big(\phi_0\,(\mathbb{1}_{nN} + 2A_0)\big). \tag{2.40}$$
In the commutative limit and for abelian gauge fields $n = 1$, this becomes
$$\langle\mu(C),\, \phi\rangle \approx \mathrm{Tr}(\phi_0) - \frac{2}{N}\,\mathrm{Tr}(\phi_0\,F_r) \approx -\frac{1}{2\pi}\int_{S^2} d\Omega\;\phi_0\,F_r \tag{2.41}$$
up to an irrelevant shift, which is just the anticipated moment map for Yang-Mills theory on the classical sphere [18]. Given the appropriate symplectic structure and moment
map on the gauge field configuration space $\mathcal{O}$, the nonabelian localization principle for two-dimensional Yang-Mills theory can be applied for the action constructed as the square of the moment map. This is precisely the Yang-Mills action on $S^2_N$ given in (2.25). The constant term $\frac12\,\mathbb{1}_{nN}$ is just the first Chern number of a background gauge field configuration and is of no significance for this discussion. This procedure will be worked out in detail in Sect. 4.

More about the symplectic form. For later use, we will now derive some properties of the symplectic form introduced in (2.33). Consider the $i\,u(\mathcal{N})$-valued one-form on $\mathcal{O}(\Xi)$ given by
$$\theta := C^{-1}\,dC. \tag{2.42}$$
Given the constraints (2.10) and using $dC^2 = 0$, this can be rewritten as
$$\theta = \frac{4}{N^2}\,C\,dC = \frac{2}{N^2}\,[C, dC]. \tag{2.43}$$
It obeys the constraints
$$d\theta + \theta^2 = 0 \quad\text{and}\quad \mathrm{Tr}(\theta) = 0. \tag{2.44}$$
Thus $\theta \in \Omega^1(\mathcal{O}, i\,u(\mathcal{N}))$ is essentially the canonical invariant Maurer-Cartan one-form, with the additional property
$$[C, \theta] = -2\,J^2(dC) = 2\,dC, \tag{2.45}$$
where we have used the fact that $dC$ is tangent to the orbit space and applied the projection property (2.17). In particular, along with the fact that $C^2$ is constant, this implies that
$$C\,\theta + \theta\,C = 0. \tag{2.46}$$
Using again the constraint (2.10), the symplectic two-form (2.33) can be written as
$$\omega = -\frac{i}{2N^2}\,\mathrm{Tr}\big(C\,[dC, dC]\big) = \frac{i}{4}\,\mathrm{Tr}\big(C\,\theta^2\big). \tag{2.47}$$
To see this, we substitute this expression using (2.18) and (2.17) into
$$\begin{aligned}
\langle\omega,\, V_\phi\wedge V_\psi\rangle &= -\frac{i}{N^2}\,\mathrm{Tr}\big(C\,[\,[C,\phi],\,[C,\psi]\,]\big) = i\,\mathrm{Tr}\big(C\,[J(\phi), J(\psi)]\big) \\
&= i\,\mathrm{Tr}\big([C, J(\phi)]\,J(\psi)\big) = -N\,\mathrm{Tr}\big(J^3(\phi)\,\psi\big) \\
&= N\,\mathrm{Tr}\big(J(\phi)\,\psi\big) = i\,\mathrm{Tr}\big([C,\phi]\,\psi\big) = i\,\mathrm{Tr}\big(C\,[\phi,\psi]\big)
\end{aligned} \tag{2.48}$$
for any $\phi, \psi \in u(\mathcal{N})$, which coincides with the definition (2.33). Using (2.45) and (2.46), this identity gives a simple proof of the closure property (2.34) as
$$d\omega = \frac{i}{4}\,\mathrm{Tr}\big(dC\,\theta^2\big) = -\frac{i}{8}\,\mathrm{Tr}\big([\theta, C]\,\theta^2\big) = 0. \tag{2.49}$$
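The chain of equalities (2.48) relies only on $C^2 = \frac{N^2}{4}\mathbb{1}$, and both of its ends are invariant under simultaneous conjugation of $C$, $\phi$ and $\psi$. It is therefore enough to test it at the diagonal point of the orbit with random hermitian $\phi$ and $\psi$. The following sketch does this, under the assumption that the diagonal matrix with eigenvalues $\pm\frac N2$ (multiplicities $N\pm1$) is an admissible representative of $\mathcal{O}(\Xi)$; the names are hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(A[a][k] * B[k][b] for k in range(len(B))) for b in range(len(B[0]))]
            for a in range(len(A))]

def comm(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(r, s)] for r, s in zip(AB, BA)]

def trace(A):
    return sum(A[a][a] for a in range(len(A)))

def diagonal_orbit_point(N):
    """diag(+N/2 (N+1 times), -N/2 (N-1 times)): same spectrum as Xi."""
    dim = 2 * N
    return [[((N / 2 if a < N + 1 else -N / 2) if a == b else 0) + 0j
             for b in range(dim)] for a in range(dim)]

def random_hermitian(dim, rng):
    H = [[0j] * dim for _ in range(dim)]
    for a in range(dim):
        H[a][a] = complex(rng.gauss(0, 1), 0)
        for b in range(a + 1, dim):
            z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
            H[a][b], H[b][a] = z, z.conjugate()
    return H

def symplectic_pairings(N, seed=0):
    rng = random.Random(seed)
    C = diagonal_orbit_point(N)
    phi = random_hermitian(2 * N, rng)
    psi = random_hermitian(2 * N, rng)
    # first expression in (2.48): -(i/N^2) Tr(C [[C,phi],[C,psi]])
    nested = -1j / N ** 2 * trace(matmul(C, comm(comm(C, phi), comm(C, psi))))
    # definition (2.33): i Tr(C [phi, psi])
    direct = 1j * trace(matmul(C, comm(phi, psi)))
    # left side of (2.37): i Tr(phi [C, psi]), expected to equal -direct
    hamiltonian = 1j * trace(matmul(phi, comm(C, psi)))
    return nested, direct, hamiltonian
```

The three returned numbers should satisfy `nested == direct == -hamiltonian` up to rounding, and `direct` should be real, in line with footnote 2.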
Localization for Yang-Mills Theory on the Fuzzy Sphere
203
3. The Classical Configuration Space In this section we will investigate in detail the space of classical solutions of U (n) gauge theory on the fuzzy sphere S N2 defined by the action (2.25). Understanding this space will be crucial for the exact solution of the quantum gauge theory, which as we will see in the next section is given exactly by its semiclassical expansion. We will first classify the solutions to the classical equations of motion, over which the partition function will be summed. Among these solutions we will find a variety of fluxons and, as in the case of gauge theory on the noncommutative torus, only a very small subset of all two-dimensional noncommutative instantons on S N2 map into the usual instantons of Yang-Mills theory on S 2 in the commutative limit N → ∞. We will then thoroughly describe the local symplectic geometry of the configuration space O near each critical point of the Yang-Mills action, as symplectic integrals over these neighbourhoods will produce the required quantum fluctuation determinants in the semiclassical expansion. 3.1. Classical solutions. The critical points of the Yang-Mills action (2.25) are easy to find. Since the most general variation of a gauge field C ∈ O is given by δC = [C, φ], by varying (2.25) one finds that the critical points satisfy
$$0 = \mathrm{Tr}\big(\delta C_0\,(C_0-\tfrac12\,\mathbb{1}_{nN})\big) = \mathrm{Tr}\big([C,\phi]\,C_0\big) = \mathrm{Tr}\big(\phi\,[C_0,C]\big) \tag{3.1}$$
for arbitrary $\phi\in\mathfrak{u}(\mathcal{N})/\mathfrak{r}$. They are therefore given by solutions of the equation $[C_0,C]=0$, which agrees with the known saddle-points in the matrix model formulation of [9]. This equation is equivalent to
$$[C_0,C_i]=0 \tag{3.2}$$
which together with (2.11) implies that
$$[C_i,C_j]=i\,\epsilon_{ij}{}^{k}\,(2C_0)\,C_k, \qquad C_0^2=\frac{N^2}{4}\,\mathbb{1}_{nN}-C_iC^i. \tag{3.3}$$
For solutions with $C_0\neq0$, we can use (3.2) to define
$$L_i=\frac{1}{2C_0}\,C_i, \tag{3.4}$$
and rewrite (3.3) as
$$[L_i,L_j]=i\,\epsilon_{ij}{}^{k}\,L_k, \qquad L_iL^i=\frac14\Big(\frac{N^2}{4C_0^2}-\mathbb{1}_{nN}\Big). \tag{3.5}$$
These equations mean that the critical points of the Yang-Mills action correspond to (isomorphism classes of) $(nN)\times(nN)$ unitary representations of the isometry group $SU(2)$, i.e. homomorphisms $\pi_{nN}:SU(2)\to U(nN)$. Up to isomorphism, for each integer $p\geq1$ there is a unique irreducible $SU(2)$ representation $(p)$ of dimension $p$. Therefore, there is a one-to-one correspondence between classical solutions and ordered partitions $(n_1,\dots,n_k)$ of the integer $nN=n_1+\cdots+n_k$, with $n_i$ the dimension of the $i$-th irreducible subrepresentation in the representation $\pi_{nN}$ characterizing the given critical point. Each such classical solution breaks the $U(nN)$ gauge symmetry locally to the centralizer $\prod_i U(k_i)$ of the homomorphism $\pi_{nN}$, where $k_i$ denotes the multiplicity of the blocks. They can be seen [9] to give precisely the usual two-dimensional instantons for $U(n)$ Yang-Mills theory on $S^2$. These solutions also agree with those that can be interpreted as configurations of D0-branes inside D2-branes [14], although the ones which will survive the large $N$ limit are different.

Therefore, each critical point is labelled (up to gauge equivalence) by the set of dimensions $n_i$ of the irreducible representations, supplemented with a sign $s_i$ which is defined by $s_i=\mathrm{sgn}(C_0(n_i))=\pm1$ (in that representation) when $C_0(n_i)\neq0$ and $s_i=0$ if $C_0(n_i)=0$. We can thereby label the critical surfaces, i.e. the connected components of the moduli space of classical solutions in $\mathcal{O}$, as
$$\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)} \quad\text{with}\quad n_i\in\mathbb{N} \ \text{ and } \ s_i\in\{\pm1,0\} \tag{3.6}$$
with the constraints
$$1\leq n_1\leq n_2\leq\cdots\leq n_k, \qquad \sum_{i=1}^{k}n_i=nN \quad\text{and}\quad \sum_{i=1}^{k}s_i=n, \tag{3.7}$$
and $s_i=0$ only if $n_i=1$. Any non-trivial irreducible representation with $n_i>1$ and $C_0\neq0$ gives a contribution $\pm N$ to the trace $\mathrm{Tr}(C)$, which must be balanced in order to satisfy the eigenvalue multiplicity constraint (2.22). This is the role of the condition $\sum_i s_i=n$ in (3.7). Note that one can change the sign of any individual irreducible representation. The meaning of the blocks $(n_i,s_i)$ can be described as follows:

• $s_a=\pm1$: In this case $C_0\neq0$, and hence $C_0>\frac12$ due to (3.5). These solutions come with two signs. Note that any irreducible representation with small dimension will be highly suppressed in the large $N$ limit. The most extreme case is a sum of trivial representations, with $n_a=1$, for which
$$C_i=0 \quad\text{and}\quad C_0(n_a=1)=s_a\,\frac{N}{2}. \tag{3.8}$$

• $s_a=0$: In this case $C_0=0$ and $n_a=1$, which implies that $C_i=c_i$ with $c_i\in\mathbb{R}$ and $\frac{N^2}{4}=c_ic^i$. These solutions are also suppressed at large $N$ but less so than those with $C_i=0$ above. They correspond to fluxons [35] whose positions on $S^2$ are determined by the vector $c_i$.

Note that each such saddle-point (or more generally any gauge field configuration $C$) defines a projective module over the fuzzy sphere algebra $S_N^2$, obtained by writing $C$ in $2n\times2n$ block-matrix form. The module then corresponds to a projector $\Pi_{(n_1,s_1),\dots,(n_k,s_k)}\in\mathrm{Mat}_{2n}(S_N^2)$. Let us describe some of these critical points explicitly.

Ground state. The vacuum solution has $k=n$ and is given by the critical surface $\mathcal{C}_{(N,1),\dots,(N,1)}$, which implies that $C_0=\frac12\,\mathbb{1}_{nN}$. It follows that $C_iC^i=\frac{N^2-1}{4}\,\mathbb{1}_{nN}$, which is the quadratic Casimir invariant of the $N$-dimensional irreducible representation of $SU(2)$. Using a suitable $U(nN)$ gauge transformation, it can be written as
$$C_i=\xi_i\otimes\mathbb{1}_n, \tag{3.9}$$
and we recover the original coordinates of the fuzzy sphere $S_N^2$. This is equivalent to the vanishing curvature condition $F=0$. In the abelian case $n=1$, an application of Schur's lemma shows that the only matrix which commutes with $C$ is the constant matrix and so the gauge group $U(N)$ acts freely on the moduli space of vacuum solutions, corresponding simply to a change of basis in this case. For $n>1$ the solution is a direct sum of $n$ identical representations. This commutes with the action of $\mathfrak{u}(n)$, and so now the gauge group $U(nN)$ contains a non-trivial stabilizer. The moduli space of flat connections is therefore isomorphic to the smooth manifold $U(nN)/U(n)$ in the nonabelian case. Note that any configuration near the vacuum, with small but finite action, is given by a small deformation of an irreducible $SU(2)$ representation describing $S_N^2$, and in particular the gauge field fluctuations $A_\mu$ are "small". It is in this sense that the quantum gauge theory will describe a fluctuating theory of noncommutative fuzzy sphere geometries.

Fluxons. At the other extreme, if $C_0$ has several zero eigenvalues, i.e. several fluxons, the situation is much more complicated. For example, when $C_0=0$ and $n=1$ we obtain a fuzzy version of the moduli space of constant curvature connections in genus 0 provided by the critical surface
$$\mu^{-1}(C_0=0)=\Big\{C_i\in\mathfrak{u}(N)\ \Big|\ C_iC^i=\tfrac{N^2}{4}\,\mathbb{1}_N,\ [C_i,C_j]=0\Big\} \tag{3.10}$$
along with the condition (2.22) on the multiplicities of the eigenvalues of $C_i\otimes\sigma^i$. The action of the $U(N)$ gauge group on (3.10) can be used to simultaneously diagonalize the three matrices $C_i$.
The Marsden-Weinstein symplectic reduction of the orbit space $\mathcal{O}$ is then essentially a symmetric product orbifold of the classical sphere $S^2$ given by
$$\mathcal{M}_0:=\mu^{-1}(C_0=0)/\!/\,U(N)\cong\mathrm{Sym}^N(S^2), \tag{3.11}$$
where $\mathrm{Sym}^N(S^2):=(S^2)^N/S_N$ and the quotient by the Weyl group $S_N\subset U(N)$ is the residual gauge symmetry acting by permutations of the real eigenvalues of the hermitian matrices $C_i$ representing the positions of the fluxons on $S^2$, which are indistinguishable. The fluxon moduli space $\mathcal{M}_0$ contains orbifold singularities arising from the fixed points of the $S_N$-action on $(S^2)^N$, which occur whenever two or more fluxon locations coincide. This is analogous to the vacuum solution of two-dimensional $U(N)$ gauge theory on a noncommutative torus wherein the moduli space of constant curvature connections is the symmetric product orbifold $\mathrm{Sym}^N(T^2)$ [32], and there is a natural correspondence between two-dimensional noncommutative instantons and fluxons [42]. In the present case the $U(N)$ action on the fluxon configuration space (3.10) also has additional fixed points. Note that the restriction of the symplectic two-form (2.47) to the moduli space $\mathcal{M}_0$ is given by
$$\omega\big|_{\mathcal{M}_0}=-\frac{4i}{N^2}\sum_{a=1}^{N}\epsilon^{ijk}\,c_i^a\,dc_j^a\wedge dc_k^a, \tag{3.12}$$
where $c_i^a\in\mathbb{R}$ are the eigenvalues of $C_i$ with $\sum_i(c_i^a)^2=\frac{N^2}{4}$ for each $a=1,\dots,N$. With the usual embedding of the two-sphere $S^2\hookrightarrow\mathbb{R}^3$, this is just the standard round symplectic two-form on the Kähler manifold $(S^2)^N$. Each fluxon contributes a suppression factor $e^{-N/4g}$ due to (2.25).
Instantons on $S^2$. The configurations which will dominate the path integral in the large $N$ classical limit are the low-energy solutions with small actions. These are solutions with $n$ partitions and critical surfaces $\mathcal{C}_{(n_1,1),\dots,(n_n,1)}$ with $n_i\approx N$. They correspond to the usual instantons of $U(n)$ gauge theory on $S^2$ with vanishing $U(1)$ flux, as shown in [9]. These solutions may also contain additional fluxons, which behave like localized flux tubes which ensure that the total $U(1)$ flux vanishes. Their contributions are suppressed by factors of at least $e^{-N/4g}$; however, they do contribute in the double scaling, quantum plane limit wherein $S_N^2$ becomes noncommutative $\mathbb{R}^2$ [43,44].

Monopoles. As shown in [9,13], an irreducible representation with $n_i=N-m_i$ corresponds to the gauge field of a monopole with magnetic charge $m_i\in\mathbb{Z}$. Configurations with non-trivial $U(1)$ monopole number can therefore be obtained by relaxing the constraint (2.22) and replacing it by
$$\mathrm{Tr}(C)=nN-c_1, \tag{3.13}$$
where $c_1=\sum_i m_i\in\mathbb{Z}$ is the first Chern number. In order to maintain the constraint $C^2=\frac{N^2}{4}\,\mathbb{1}_{\mathcal{N}}$, the matrix dimension (2.21) must then be replaced with $\mathcal{N}=2(nN-c_1)$. Some of these nontrivial $U(1)$ bundles are realized within the original configuration space (2.23), in the presence of trivial blocks with $n_a=1$, $s_a=\pm1$. For example, in the abelian case $n=1$ the solutions in $\mathcal{C}_{(N-2,1),(1,1),(1,-1)}$ are naturally interpreted as monopoles with charge $m=2$. The blocks $(1,\pm1)$ have vanishing field strength $F_i=0$, and are naturally interpreted as Dirac strings. They are suppressed by factors of at least $e^{-N^3/g}$. Replacing the trivial blocks with fluxons leads to vanishing global $U(1)$ flux as discussed above.
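The block structure of the saddle points described in this subsection can be checked numerically. The following is a minimal sketch, assuming the conventions $C_0=s\,N/(2p)$ and $C_i=2C_0L_i$ on a block built from the $p$-dimensional irreducible representation, and assembling $C=C_0\otimes\sigma^0+C_i\otimes\sigma^i$; the function names and the ordering of the tensor factors are illustrative choices, not taken from the paper.

```python
import numpy as np

# Pauli matrices sigma^1, sigma^2, sigma^3
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def su2_irrep(p):
    """Hermitian generators L_1, L_2, L_3 of the p-dimensional irrep of SU(2)."""
    spin = (p - 1) / 2
    m = spin - np.arange(p)                      # weights spin, spin-1, ..., -spin
    raise_op = np.zeros((p, p), dtype=complex)   # L_+ maps weight m[k] to m[k-1]
    for k in range(1, p):
        raise_op[k - 1, k] = np.sqrt(spin * (spin + 1) - m[k] * (m[k] + 1))
    lower_op = raise_op.conj().T
    return [(raise_op + lower_op) / 2,
            (raise_op - lower_op) / 2j,
            np.diag(m).astype(complex)]

def saddle_block(N, p, sign=1):
    """Block (p, sign): C_0 = sign*N/(2p), C_i = 2 C_0 L_i (assumed conventions)."""
    C0 = sign * N / (2 * p) * np.eye(p, dtype=complex)
    Ci = [2 * C0 @ L for L in su2_irrep(p)]
    return C0, Ci

def covariant_coordinate(C0, Ci):
    """Assemble C = C_0 (x) sigma^0 + C_i (x) sigma^i."""
    C = np.kron(C0, np.eye(2))
    for c, s in zip(Ci, SIGMA):
        C = C + np.kron(c, s)
    return C

N, p = 5, 3
C0, Ci = saddle_block(N, p)
C = covariant_coordinate(C0, Ci)
# [C_1, C_2] = i (2 C_0) C_3, cf. (3.3)
comm_ok = np.allclose(Ci[0] @ Ci[1] - Ci[1] @ Ci[0], 1j * 2 * C0 @ Ci[2])
# C^2 = N^2/4 on the block, and Tr(C) = N for a block with sign +1
square_ok = np.allclose(C @ C, N**2 / 4 * np.eye(2 * p))
trace_C = np.trace(C).real
```

With these conventions a single block with $s=+1$ contributes $N$ to $\mathrm{Tr}(C)$ regardless of its dimension $p$, which is the balancing mechanism behind the condition $\sum_i s_i=n$ in (3.7).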
3.2. The classical action. The values of the Yang-Mills action (2.25) on the classical solutions obtained in Sect. 3.1 above will determine the classical contributions to the path integral in the next section. The action at these critical points can be evaluated as follows. Note that for each $p$-dimensional irreducible representation $L_i$ of the isometry group $SU(2)$, one has $L_iL^i=\frac{p^2-1}{4}\,\mathbb{1}_p$ and hence from (3.5) it follows that
$$\frac{N^2}{p^2}\,\mathbb{1}_p=4\,C_0(p)^2 \tag{3.14}$$
on that representation, so that $C_0(p)=\pm\frac{N}{2p}\,\mathbb{1}_p$. Consider the reduced Yang-Mills action
$$S':=\frac{N}{g}\,\mathrm{Tr}\,C_0^2=S+\frac{N}{g}\,\mathrm{Tr}(C_0)-\frac{N}{4g}\,\mathrm{Tr}(\mathbb{1}_{nN})=S+\frac{nN^2}{4g}, \tag{3.15}$$
which is somewhat easier to manipulate than $S$. For a dominant solution with critical surface $\mathcal{C}_{(n_1,1),\dots,(n_n,1)}$ and $n_i>1$, the action $S'$ is given by
$$S'\big((n_1,1),\dots,(n_n,1)\big)=\frac{N}{g}\sum_{i=1}^{n}n_i\,\frac{N^2}{4n_i^2}=\frac{N^3}{4g}\sum_{i=1}^{n}\frac{1}{n_i}. \tag{3.16}$$
While possible fluxon blocks with $n_i=1$ do not contribute at all to $S'$, they do contribute $\frac{N}{4g}$ to the original action $S$ (2.25). Their total contributions to $S$ are proportional to the fluxon charge, i.e. the total number of blocks with $n_i=1$, and agree with the usual fluxon action [35] in the quantum plane limit of $S_N^2$ [43]. The dominant configurations in the classical limit are therefore those with
$$n_i=N-m_i \quad\text{and}\quad \sum_{i=1}^{n}m_i=0 \tag{3.17}$$
with small $m_i\in\mathbb{Z}$, for which
$$C_0(n_i)=\frac{N}{2(N-m_i)}\,\mathbb{1}_{n_i}\approx\frac12\Big(1+\frac{m_i}{N}\Big)\,\mathbb{1}_{n_i}. \tag{3.18}$$
Note that then
$$\mathrm{Tr}(C_0)=\sum_{i=1}^{n}(N-m_i)\,\frac{N}{2(N-m_i)}=\frac{nN}{2} \tag{3.19}$$
as required. It follows that
$$S\big((n_1,1),\dots,(n_n,1)\big)\approx\frac{N}{g}\sum_{i=1}^{n}(N-m_i)\Big(\frac{m_i}{2N}\Big)^2+O\big(\tfrac1N\big)\approx\frac{1}{4g}\sum_{i=1}^{n}m_i^2, \tag{3.20}$$
which is the usual expression [19,20] for the classical action of $U(n)$ Yang-Mills theory on the sphere $S^2$ with trivial gauge bundle evaluated on the two-dimensional instanton on $S^2$ corresponding to a configuration of $n$ Dirac monopoles of magnetic charges $m_i\in\mathbb{Z}$. Non-trivial gauge bundles over $S^2$ of first Chern class $c_1\in\mathbb{Z}$ are obtained by modifying the trace constraint as in (3.13).
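The large $N$ behaviour of the classical action can be illustrated with a few lines of arithmetic. The sketch below, assuming all blocks carry sign $s_i=+1$ so that $C_0=N/(2n_i)$ on each block, evaluates the action $S=\frac{N}{g}\,\mathrm{Tr}\,(C_0-\frac12)^2$, the reduced action of (3.16) and the limiting value of (3.20); the function names and the sample charges are illustrative.

```python
def exact_action(N, g, dims):
    """S = (N/g) Tr (C_0 - 1/2)^2 with C_0 = N/(2 n_i) on each block (all s_i = +1)."""
    return N / g * sum(n_i * (N / (2 * n_i) - 0.5) ** 2 for n_i in dims)

def reduced_action(N, g, dims):
    """S' = (N/g) Tr C_0^2 = N^3/(4g) * sum_i 1/n_i, cf. (3.16)."""
    return N**3 / (4 * g) * sum(1 / n_i for n_i in dims)

def instanton_action(g, charges):
    """Large-N limit (1/4g) * sum_i m_i^2, cf. (3.20)."""
    return sum(m * m for m in charges) / (4 * g)

N, g = 1000, 1.0
charges = (2, -1, -1)                 # monopole charges m_i with sum zero, so n = 3
dims = [N - m for m in charges]       # block dimensions n_i = N - m_i
S = exact_action(N, g, dims)
S_reduced = reduced_action(N, g, dims)
S_limit = instanton_action(g, charges)
shift = len(dims) * N**2 / (4 * g)    # the constant n N^2 / (4g) relating S' and S in (3.15)
```

For these sample values the exact action differs from $\frac{1}{4g}\sum_i m_i^2$ only at relative order $1/N$, while $S'-S$ reproduces the constant shift of (3.15).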
3.3. Local symplectic geometry of the configuration space. We will now develop the local symplectic geometry of the configuration space of gauge fields near each Yang-Mills critical point. This is done by analysing in more detail the map (2.16), satisfying (2.17). We want to find a useful description of the tangent space $T_C\mathcal{O}\cong\mathrm{im}(J)$, i.e. of the local geometry of the orbit space $\mathcal{O}$. Since $J$ is an anti-hermitian operator with respect to the Cartan-Killing form on $\mathfrak{u}(\mathcal{N})$ (see (2.18)), it follows that the space $\mathfrak{u}(\mathcal{N})$ splits into two orthogonal subspaces as
$$\mathfrak{u}(\mathcal{N})=\ker(J)\oplus\ker\big(J^2+\mathbb{1}_{\mathcal{N}}\big), \tag{3.21}$$
where $\ker(J)=\mathfrak{r}=\mathfrak{u}(nN_+)\oplus\mathfrak{u}(nN_-)$ is the stabilizer subalgebra, while $\ker(J^2+\mathbb{1}_{\mathcal{N}})\cong T_C\mathcal{O}$ is the tangent space to the configuration space at $C\in\mathcal{O}$. In particular, $J$ defines a complex structure on $T_C\mathcal{O}$, and (3.21) is just the Cartan decomposition of $\mathfrak{u}(\mathcal{N})$ corresponding to the symmetric space $\mathcal{O}$. This follows immediately by noticing that the involutive automorphism
$$j:\mathfrak{u}(\mathcal{N})\longrightarrow\mathfrak{u}(\mathcal{N}), \qquad \phi\longmapsto C\,\phi\,C^{-1} \tag{3.22}$$
is $\mathbb{1}_{\mathcal{N}}$ on $\ker(J)$ and $-\mathbb{1}_{\mathcal{N}}$ on $\ker(J^2+\mathbb{1}_{\mathcal{N}})$ upon using the constraints (2.10). Moreover, for any $V_\phi,V_\psi\in T_C\mathcal{O}$, from (2.47) one has
$$\langle\omega,V_\phi\wedge V_\psi\rangle=\frac{i}{N^2}\,\mathrm{Tr}\big([C,V_\phi]\,V_\psi\big)=\frac1N\,\mathrm{Tr}\big(J(V_\phi)\,V_\psi\big) \tag{3.23}$$
and
$$\langle\omega,V_\phi\wedge J(V_\psi)\rangle=\frac1N\,\mathrm{Tr}(V_\phi\,V_\psi), \tag{3.24}$$
expressing the fact that the symplectic two-form $\omega$ makes the configuration space $\mathcal{O}$ into a Kähler manifold with respect to the complex structure (2.16). All of these properties are just standard features of hermitian symmetric spaces [36], as will be exploited at length in this paper.

Consider the restriction of the map $J$ to the gauge algebra $\mathfrak{g}=\mathfrak{u}(nN)\subset\mathfrak{u}(\mathcal{N})$ containing elements of the form $g=\phi\otimes\sigma^0$. Since $J(\phi)$ is the infinitesimal gauge transformation of the gauge field $C$ generated by $\phi$, it describes the orbits of the gauge group $G=U(nN)$ acting on the configuration space $\mathcal{O}$, in $T_C\mathcal{O}$. Generically this action is free (apart from the trivial $\mathfrak{u}(1)$), but not for certain critical points. For example, for the vacuum solution (3.9) the subalgebra $\mathbb{1}_N\otimes\mathfrak{u}(n)$ commutes with $C$. The higher critical points in the nonabelian case generically have a smaller $\mathfrak{u}(1)^n$ centralizer algebra. More precisely, consider the kernel of $J$ at $C$ restricted to the gauge algebra $\mathfrak{g}$,
$$\mathfrak{s}:=\ker(J)\cap\mathfrak{g}, \tag{3.25}$$
which is the Lie algebra of the subgroup of the gauge group that stabilizes $C$. The elements $\phi\in\mathfrak{s}$ are orthogonal to $T_C\mathcal{O}$ due to (3.21). Hence $\mathfrak{g}$ decomposes into orthogonal subspaces
$$\mathfrak{g}=\mathfrak{s}\oplus\mathfrak{g}', \tag{3.26}$$
where $\mathfrak{g}'=\mathfrak{s}^\perp=:\mathfrak{g}\ominus\mathfrak{s}$ contains the "proper" gauge transformations, acting freely near $C$. If $(n_1,\dots,n_n)$ is a partition of the integer $nN$ which does not contain trivial representations of $SU(2)$ (no fluxons), then $\mathfrak{g}'$ is the tangent space to the corresponding critical surface $\mathcal{C}_{(n_1,1),\dots,(n_n,1)}\subset\mathcal{O}$,
$$\mathcal{C}_{(n_1,1),\dots,(n_n,1)}\cong U(nN)/S, \tag{3.27}$$
where $S=\exp(\mathfrak{s})$.

We claim that the subspaces $J(\mathfrak{g})$ and $\mathfrak{g}$ are linearly independent. For this, assume to the contrary that $J(\mathfrak{g})$ and $\mathfrak{g}$ are linearly dependent, i.e. $J(g)\in\mathfrak{g}$ for some $g\in\mathfrak{g}$. This implies that $[C_i,g]=0$, and therefore $[C_0^2,g]=0$ due to (2.11). Restricting attention to critical points $C$ for which the spectrum of $C_0$ is non-negative (the others being strongly suppressed at large $N$), this implies that $g$ commutes with the spectral projectors of $C_0$, and hence also with $C_0$ itself. Together with $[C_i,g]=0$ it follows that $J(g)=0$. However, $J(\mathfrak{g})$ and $\mathfrak{g}$ need not be orthogonal subspaces. Generically one then has
$$J^2(\mathfrak{g})+J(\mathfrak{g})\subset T_C\mathcal{O}. \tag{3.28}$$
The two subspaces are not orthogonal in general, since for $g_1,g_2\in\mathfrak{g}$ one can compute the inner product
$$\mathrm{Tr}\big(J^2(g_1)\,J(g_2)\big)=\mathrm{Tr}\big(g_1\,J(g_2)\big)=-\frac{i}{N}\,\mathrm{Tr}\big(C\,[g_1,g_2]\big)=-\frac1N\,\langle\omega,V_{g_1}\wedge V_{g_2}\rangle=\frac{i}{N}\,\mathrm{Tr}\big(g_1\,[C_0,g_2]\big) \tag{3.29}$$
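which is non-vanishing in general. For the vacuum solution with $C_0=\frac12\,\mathbb{1}_{nN}$, it follows from this expression that the subspaces are indeed orthogonal, and hence $J^2(\mathfrak{g})\oplus J(\mathfrak{g})\subset T_C\mathcal{O}$. In fact, one has
$$J^2(\mathfrak{g})\oplus J(\mathfrak{g})=T_C\mathcal{O} \quad\text{if}\quad C_0=\tfrac12\,\mathbb{1}_{nN}, \tag{3.30}$$
which provides a useful description of the local geometry near the global minimum. To see (3.30), note first that in the abelian case $n=1$ one has $\mathfrak{s}=\mathfrak{u}(1)$, and (3.30) then follows since $\dim(\mathcal{O})=2(N^2-1)=2\dim(\mathfrak{g}')$. In the nonabelian case, for the vacuum state the gauge stabilizer $\mathfrak{s}\cong\mathfrak{u}(n)$ has dimension $n^2$ and hence $\dim\big(J^2(\mathfrak{g}')\oplus J(\mathfrak{g}')\big)=2n^2N^2-2n^2=\dim(\mathcal{O})$.

In general, the subspaces $J(\mathfrak{g})=J(\mathfrak{g}\ominus\mathfrak{s})$ and $J^2(\mathfrak{g})$ are not linearly independent, and we can define
$$E_0:=J(\mathfrak{g})\cap J^2(\mathfrak{g}), \tag{3.31}$$
which is generically a non-trivial subspace. Define also the subspaces $\mathfrak{h},\tilde{\mathfrak{h}}\subset\mathfrak{g}\ominus\mathfrak{s}$ with the properties that
$$J(\mathfrak{h})=E_0=J^2(\tilde{\mathfrak{h}}). \tag{3.32}$$
Since $J^2(\mathfrak{h})=-J(\tilde{\mathfrak{h}})$ implies that $\mathfrak{h}\subset\tilde{\mathfrak{h}}\subset\mathfrak{h}$, we have
$$\mathfrak{h}=\tilde{\mathfrak{h}} \quad\text{and}\quad J(E_0)=E_0. \tag{3.33}$$
We can accordingly decompose the gauge algebra $\mathfrak{g}$ into orthogonal subspaces as
$$\mathfrak{g}=\mathfrak{g}_1\oplus\mathfrak{h}\oplus\mathfrak{s}. \tag{3.34}$$
Since $J:\mathfrak{h}\to E_0$ is a bijection, there is a unique map
$$j:\mathfrak{h}\longrightarrow\mathfrak{h} \quad\text{with}\quad J^2(h)=J\big(j(h)\big) \tag{3.35}$$
for all $h\in\mathfrak{h}$ which satisfies $j^2=-\mathbb{1}_{nN}$. Similarly, in order to span the entire tangent space at $C\in\mathcal{O}$ we generally have to introduce another subspace $E_1$, with $J(E_1)=E_1$, which gives the general decomposition
$$J(\mathfrak{g}\ominus\mathfrak{h})\oplus J^2(\mathfrak{g}\ominus\mathfrak{h})\oplus E_0\oplus E_1=T_C\mathcal{O}. \tag{3.36}$$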
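The decomposition (3.21) and the dimension count behind (3.30) can be checked numerically in the simplest case. The following sketch, assuming the abelian vacuum $C=\frac12\,\mathbb{1}+L_i\otimes\sigma^i$ with $N=3$ and $J(\phi)=\frac{i}{N}[C,\phi]$, represents $J$ as a matrix acting on row-major vectorised $\phi$; this representation and all names are illustrative choices.

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def su2_irrep(p):
    """Hermitian generators L_1, L_2, L_3 of the p-dimensional irrep of SU(2)."""
    spin = (p - 1) / 2
    m = spin - np.arange(p)
    raise_op = np.zeros((p, p), dtype=complex)
    for k in range(1, p):
        raise_op[k - 1, k] = np.sqrt(spin * (spin + 1) - m[k] * (m[k] + 1))
    lower_op = raise_op.conj().T
    return [(raise_op + lower_op) / 2,
            (raise_op - lower_op) / 2j,
            np.diag(m).astype(complex)]

N = 3
# Abelian vacuum (n = 1): C = 1/2 + L_i (x) sigma^i, a matrix of size 2N
C = 0.5 * np.eye(2 * N, dtype=complex)
for L, s in zip(su2_irrep(N), SIGMA):
    C = C + np.kron(L, s)

I = np.eye(2 * N)
# phi -> [C, phi] on row-major vectorised phi: vec(A X B) = (A (x) B^T) vec(X)
H = np.kron(C, I) - np.kron(I, C.T)
J = 1j / N * H

cubic_ok = np.allclose(J @ J @ J, -J)           # J^3 = -J, cf. (2.17)
spectrum = np.linalg.eigvalsh(H) / N            # J has eigenvalues i * spectrum
kernel_dim = int(np.sum(np.abs(spectrum) < 1e-8))
tangent_dim = int(np.sum(np.abs(spectrum) > 1e-8))
```

Here `kernel_dim` counts the stabilizer $\mathfrak{u}(N+1)\oplus\mathfrak{u}(N-1)$ and `tangent_dim` equals $2(N^2-1)$, in line with the dimension of $\mathcal{O}$ quoted for the abelian case.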
3.4. Explicit decomposition at Yang-Mills critical surfaces. We will now provide an explicit description of the various subspaces appearing in the decomposition of the tangent space (3.36). Consider the Yang-Mills critical surfaces $\mathcal{C}_{(n_1,1),\dots,(n_n,1)}$ and suppose first that $n_1\neq\cdots\neq n_n$ are all distinct integers, corresponding to a completely nondegenerate solution. The elements $\phi$ of the subspace (3.25) satisfy $[C,\phi]=0$. This implies that $\phi$ respects the block decomposition described by the given partition $(n_1,\dots,n_n)$, and is therefore proportional to $\mathbb{1}_{n_i}$ on each block. These are thus $\mathfrak{u}(1)^n$ degrees of freedom. If some $n_i$ are degenerate, this space is enhanced to
$$\mathfrak{s}=\mathfrak{u}(k_1)\times\cdots\times\mathfrak{u}(k_l) \tag{3.37}$$
for a critical surface with $C=\bigoplus_i C(n_i)\otimes\mathbb{1}_{k_i}$ and $n_i$ all distinct. For the vacuum this is $\mathfrak{u}(n)$, corresponding to the maximally degenerate solution, as in Sect. 3.3 above.
We wish to work out the map $J$ explicitly. For this, we decompose
$$\phi=\begin{pmatrix}\phi_{11}&\phi_{12}&\cdots\\ \phi_{21}&\phi_{22}&\cdots\\ \vdots&\vdots&\ddots\end{pmatrix}, \tag{3.38}$$
where $\phi_{ij}\in(n_i)\otimes(n_j)$ and as before $(p)$ denotes the $p$-dimensional irreducible representation of $SU(2)$. In the degenerate case, there is another factor corresponding to $\mathfrak{u}(k_j)$. The non-orthogonality of $J(\mathfrak{g})$ and $J^2(\mathfrak{g})$ in (3.29) is now easily understood as being simply due to the different $\mathfrak{u}(1)$ charges between the $SU(2)$ sectors of $\mathfrak{s}$. Since $[C,C_0]=0$ at the Yang-Mills critical surfaces, one has $J([C_0,\phi])=[C_0,J(\phi)]$. Thus the hermitian operator
$$\big(\mathrm{ad}_{iC_0}\big)_{ij}=i\,C_0(n_i)-i\,C_0(n_j)=i\,\frac{N}{2}\,\frac{n_j-n_i}{n_i\,n_j}=:i\,c_{ij} \tag{3.39}$$
acting on $\phi_{ij}\in(n_i)\otimes(n_j)$ commutes with $J$. This implies that we can decompose the subspaces in (3.36) such as $J(\mathfrak{h})=J^2(\mathfrak{h})=E_0$ into irreducible representations of the operator $\mathrm{ad}_{iC_0}$, i.e. into the various $\mathfrak{u}(1)$ blocks. Restricted to the diagonal blocks, $C_0(n_i)$ is proportional to the unit matrix $\mathbb{1}_{n_i}$, so that $\mathrm{Tr}\big(J(g_1)\,J^2(g_2)\big)=0$ there as for the vacuum.

Global $SU(2)$ symmetry. To proceed further, we need to exploit an additional symmetry that we have neglected so far, the global rotation group $SU(2)$. Recall from Sect. 3.1 above that each saddle-point defines a representation of $SU(2)$ acting on the representation space $V\cong\mathbb{C}^{nN}$ as (3.4), and trivially on potential fluxon components. In the abelian case $n=1$, this induces via the adjoint action the rotations of functions $f\mapsto J_if=[L_i,f]$ in $S_N^2\cong V\otimes V^*$, but it is a somewhat different symmetry for the nonabelian instantons. Let us decompose $V$ into irreducible representations as
$$V=\bigoplus_{i=1}^{n}(n_i). \tag{3.40}$$
This representation can be extended to the module $V\otimes\mathbb{C}^2$ for the action of the operators
$$J_i=L_i+\tfrac12\,\sigma_i, \tag{3.41}$$
which by construction commute with $C$,
$$[J_i,C]=0, \tag{3.42}$$
on the critical surfaces. This follows from the fact that $C_i\otimes\sigma^i$ is an intertwiner for the action of $J_i$ on
$$V\otimes\mathbb{C}^2=\bigoplus_{i=1}^{n}(n_i+1)\ \oplus\ \bigoplus_{i=1}^{n}(n_i-1)=:V^+\oplus V^- \tag{3.43}$$
and $C$ has eigenvalues $\pm\frac N2$ on the component subspaces $V^\pm$. This enables one to decompose $C$ further using the projectors $\Pi_i^\pm$ onto the irreducible representations $(n_i\pm1)$ with
$$\big[C,\Pi_i^\pm\big]=0, \tag{3.44}$$
and the constrained covariant coordinates take the simple form
$$C=\frac N2\begin{pmatrix}\sum_{i=1}^{n}\Pi_i^+&0\\ 0&-\sum_{i=1}^{n}\Pi_i^-\end{pmatrix}. \tag{3.45}$$
In particular, since $C_0\otimes\sigma^0$ is two-fold degenerate it follows that
$$C_0\otimes\sigma^0=\begin{pmatrix}\sum_{i=1}^{n}C_0(n_i)\,\Pi_i^+&0\\ 0&\sum_{i=1}^{n}C_0(n_i)\,\Pi_i^-\end{pmatrix} \tag{3.46}$$
separates the explicit blocks according to (3.39). The complex structure map $J$ respects this $SU(2)$ symmetry,
$$[J_i,J]=0, \tag{3.47}$$
which enables one to decompose the tangent space $T_C\mathcal{O}$ into irreducible representations of the $SU(2)$ isometry group. With respect to the block decomposition (3.43), the subspace $\ker(J)\subset\mathfrak{u}(\mathcal{N})$ consists of block diagonal operators while $T_C\mathcal{O}$ consists of block off-diagonal operators, and the action of $J$ on tangent vectors is given explicitly by
$$J\begin{pmatrix}0&X\\ X^\dagger&0\end{pmatrix}=\begin{pmatrix}0&i\,X\\ -i\,X^\dagger&0\end{pmatrix}. \tag{3.48}$$
This is the obvious complex structure on $T_C\mathcal{O}$ compatible with the action of the isometry group. The decomposition of the tangent space $T_C\mathcal{O}$ into irreducible representations of $SU(2)$ is now provided by
$$T_C^-\mathcal{O}\cong\Big(\bigoplus_{i=1}^{n}(n_i+1)\Big)\otimes\Big(\bigoplus_{j=1}^{n}(n_j-1)\Big)=\bigoplus_{i,j=1}^{n}(n_i+1)\otimes(n_j-1), \tag{3.49}$$
where $T_C^\pm\mathcal{O}:=T_C\mathcal{O}\big|_{V^\pm}$ corresponds to the upper-right, respectively lower-left, blocks in (3.48), and the different sectors $(i,j)$ are separated by the eigenvalues of the operator $\mathrm{ad}_{iC_0}$ in the irreducible case. Note in particular that the lowest spin component in the Clebsch-Gordan decomposition of $(n_i+1)\otimes(n_i-1)$ is a spin one field as appropriate for gauge fields. This implies $J(\mathfrak{g}_0)=0$, where $\mathfrak{g}_0$ is the subspace of $SU(2)$ singlet components of $\mathfrak{g}$, and in fact $\mathfrak{g}_0=\mathfrak{s}$ by Schur's lemma.

Global minimum. Consider first the vacuum surface $\mathcal{C}_{(N,1),\dots,(N,1)}$. Compare the $SU(2)$-invariant decomposition of the gauge algebra $\mathfrak{g}$, given by
$$\mathfrak{g}\cong(N)\otimes(N)\otimes\mathfrak{u}(n)=\big((1)\oplus(3)\oplus\cdots\oplus(2N-1)\big)\otimes\mathfrak{u}(n)=\big((1)\oplus(N+1)\otimes(N-1)\big)\otimes\mathfrak{u}(n), \tag{3.50}$$
with (3.49) in the degenerate case $C_0=\frac12\,\mathbb{1}_{nN}$. It follows that the image of $J(\mathfrak{g})$ indeed covers all modes of $T_C\mathcal{O}$, and the complexification is achieved by adding $J^2(\mathfrak{g})$. This gives another proof of the decomposition (3.30). The singlet subspace of (3.50) is $\mathfrak{g}_0=(1)\otimes\mathfrak{u}(n)\cong\mathfrak{u}(n)=\mathfrak{s}$.

Maximally irreducible saddle points. Now consider a generic, completely nondegenerate critical surface $\mathcal{C}_{(n_1,1),\dots,(n_n,1)}$, and the corresponding decomposition of $T_C\mathcal{O}=T_C^-\mathcal{O}\oplus T_C^+\mathcal{O}$ given by (3.49). The different sectors $(i,j)$ are distinguished by the eigenvalues of the operator $\mathrm{ad}_{iC_0}$. Hence we can pick some fixed pair $n_i>n_j$, and decompose
$$(n_i+1)\otimes(n_j-1)\cong\big(|n_i-n_j|+3\big)_{i\,c_{ij}}\oplus\big(|n_i-n_j|+5\big)_{i\,c_{ij}}\oplus\cdots\oplus\big(n_i+n_j-1\big)_{i\,c_{ij}}\subset T_C\mathcal{O}, \tag{3.51}$$
which has eigenvalue given by (3.39) as indicated by the subscripts. Similarly, one has
$$(n_j+1)\otimes(n_i-1)\cong\big(|n_i-n_j|-1\big)_{i\,c_{ji}}\oplus\big(|n_i-n_j|+1\big)_{i\,c_{ji}}\oplus\cdots\oplus\big(n_i+n_j-1\big)_{i\,c_{ji}}\subset T_C\mathcal{O} \tag{3.52}$$
(where $(0)$ is omitted) with $\mathrm{ad}_{iC_0}$ eigenvalue $i\,c_{ji}=-i\,c_{ij}$. The corresponding conjugate matrix decompositions $(n_j-1)\otimes(n_i+1)$ and $(n_i-1)\otimes(n_j+1)$ are determined by hermiticity. They are given respectively by (3.51) with eigenvalue $i\,c_{ji}=-i\,c_{ij}$ and by (3.52) with eigenvalue $i\,c_{ij}$. We denote the tangent space decomposition (3.49) determined by (3.51) and (3.52) as
$$T_C\mathcal{O}:=\bigoplus_{i,j=1}^{n}\mathbb{C}\,\big|n;\,n_i+1,\,n_j-1;\,i\,c_{ij},\,l\big\rangle_{T_C\mathcal{O}}, \tag{3.53}$$
where $n$ denotes the dimension of $(n)$, and we will drop its magnetic quantum number $l$ from now on. This defines a natural basis for $T_C\mathcal{O}$, in which the action of $J$ is given by block-wise multiplication with
$$J=\sigma^2=\begin{pmatrix}0&i\\ -i&0\end{pmatrix} \tag{3.54}$$
as in (3.48), and the action of $\mathrm{ad}_{iC_0}$ by
$$\big(\mathrm{ad}_{iC_0}\big)_{ij}=|c_{ij}|\begin{pmatrix}0&\sigma^2\\ \sigma^2&0\end{pmatrix} \tag{3.55}$$
since its sign depends on $n_i\gtrless n_j$. In particular, by virtue of (3.23) the tangent space $T_C\mathcal{O}$ is naturally a symplectic vector space with symplectic form of type $(1,1)$ with respect to the complex structure $J$. This construction thereby defines a local symplectic model for the neighbourhood of the Yang-Mills critical point $C$ in the Kähler manifold $\mathcal{O}$. In the next section this model space will be used to evaluate fluctuation integrals over tubular neighbourhoods of the critical surfaces. In particular, all pertinent one-forms can
be explicitly evaluated on $T_C\mathcal{O}$ by using the explicit expressions for $C$ and $C_0$ in (3.45) and (3.46).

Let us now look at the $SU(2)$-invariant decomposition of the gauge algebra $\mathfrak{g}$ given by
$$\mathfrak{g}\cong\bigoplus_{i,j=1}^{n}(n_i)\otimes(n_j)=\bigoplus_{i,j=1}^{n}\Big(\big(|n_i-n_j|+1\big)\oplus\big(|n_i-n_j|+3\big)\oplus\cdots\oplus\big(n_i+n_j-1\big)\Big)=:\bigoplus_{i,j=1}^{n}\mathbb{C}\,\big|n;\,n_i,\,n_j;\,i\,c_{ij}\big\rangle_{\mathfrak{g}}. \tag{3.56}$$
This can be compared with the $SU(2)$-invariant decomposition of the tangent space $T_C\mathcal{O}$ in (3.53) above, whose higher modes match perfectly with those of $\mathfrak{g}$ except for a doubling due to the complex structure $J$. There is, however, some mismatch in the low lying modes. In particular, $T_C\mathcal{O}$ contains the extra subspace
$$E_1:=\bigoplus_{i>j}\mathbb{C}\,\big|n_i-n_j-1;\,n_j+1,\,n_i-1;\,-i\,c_{ij}\big\rangle_{T_C\mathcal{O}}, \tag{3.57}$$
which is not contained in $J(\mathfrak{g})$. On the other hand, the modes in the subspace
$$E_0:=\bigoplus_{i>j}\mathbb{C}\,\big|n_i-n_j+1;\,n_j+1,\,n_i-1;\,-i\,c_{ij}\big\rangle_{T_C\mathcal{O}} \tag{3.58}$$
occur only once in $T_C\mathcal{O}$, which means that they are already spanned by the image $J(\mathfrak{g})$ since $J\neq0$ on the non-trivial modes. This implies that $E_0=J(E_0)=J(\mathfrak{h})$, where
$$\mathfrak{h}=\bigoplus_{i\neq j}\mathbb{C}\,\big|\,|n_i-n_j|+1;\,n_i,\,n_j;\,i\,c_{ij}\big\rangle_{\mathfrak{g}}. \tag{3.59}$$
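The mode counting behind (3.57)–(3.59) only uses the Clebsch-Gordan rule for $SU(2)$, so it can be reproduced with elementary bookkeeping. The sketch below, with illustrative function names, compares for a pair $n_i>n_j$ the irreducible components of the tangent space sectors in (3.51) and (3.52) with those of the gauge algebra sector $(n_i)\otimes(n_j)$ in (3.56).

```python
from collections import Counter

def clebsch_gordan(p, q):
    """Dimensions of the SU(2) irreps in (p) x (q): |p-q|+1, |p-q|+3, ..., p+q-1."""
    if p <= 0 or q <= 0:
        return []
    return list(range(abs(p - q) + 1, p + q, 2))

def mode_count(n_i, n_j):
    """Sort the tangent-space modes of a pair n_i > n_j by how they relate to the gauge algebra."""
    tangent = (Counter(clebsch_gordan(n_i + 1, n_j - 1))
               + Counter(clebsch_gordan(n_j + 1, n_i - 1)))
    gauge = Counter(clebsch_gordan(n_i, n_j))
    e1 = [d for d in tangent if d not in gauge]                    # not reachable from J(g)
    e0 = [d for d in tangent if d in gauge and tangent[d] == 1]    # occur only once
    doubled = sorted(d for d in tangent if tangent[d] == 2)        # doubled by J
    return e1, e0, doubled

e1, e0, doubled = mode_count(7, 4)
# Vacuum check of (3.50): (N) x (N) = (1) + (N+1) x (N-1)
vacuum_ok = clebsch_gordan(5, 5) == [1] + clebsch_gordan(6, 4)
```

For the sample pair $(n_i,n_j)=(7,4)$ the unmatched mode has dimension $n_i-n_j-1$, the singly occurring mode has dimension $n_i-n_j+1$, and all higher modes appear twice, as stated in the text.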
The linear independence of the subspaces $J(\mathfrak{g}\ominus\mathfrak{h})$ and $J^2(\mathfrak{g}\ominus\mathfrak{h})$ follows from the explicit embedding $T_C\mathcal{O}\hookrightarrow\mathfrak{u}(\mathcal{N})$ given below. Therefore $J(\mathfrak{g}\ominus\mathfrak{h})\oplus J^2(\mathfrak{g}\ominus\mathfrak{h})$ spans the entire tangent space $T_C\mathcal{O}$ except for the subspace $E_1$, which gives the decomposition (3.36) with the various subspaces now explicitly identified. We have $J(E_0)=E_0$ and $J(E_1)=E_1$, with the action of $J$ given by diagonal eigenvalues $\pm i$ on the two components in (3.58) and (3.57). On the remaining space $T_C\mathcal{O}\ominus E_0\ominus E_1$ the action of $J$ is obtained by exchanging the two components in (3.36).

To complete this analysis, we need to explicitly embed $T_C\mathcal{O}$ into the space $\mathfrak{u}(\mathcal{N})$, which admits the $SU(2)$-invariant decomposition
$$\mathfrak{u}(\mathcal{N})\cong\mathfrak{g}\otimes\big((2)\otimes(2)\big)=\bigoplus_{i,j=1}^{n}\Big((n_i+1)\otimes(n_j+1)\oplus(n_i-1)\otimes(n_j-1)\oplus(n_i+1)\otimes(n_j-1)\oplus(n_i-1)\otimes(n_j+1)\Big), \tag{3.60}$$
corresponding to (3.43). Since we know the action of $J$ on the rhs, we can determine the map
$$J:\mathfrak{g}\longrightarrow T_C\mathcal{O}\hookrightarrow\mathfrak{g}\otimes\big((2)\otimes(2)\big) \tag{3.61}$$
using
$$J\,\big|n;\,n_i,\,n_j;\,i\,c_{ij}\big\rangle_{\mathfrak{g}}=\sum_{k,l=1}^{n}J\,\big|n;\,n_k+1,\,n_l-1;\,i\,c_{kl}\big\rangle_{T_C\mathcal{O}}\ {}_{T_C\mathcal{O}}\big\langle n;\,n_k+1,\,n_l-1;\,i\,c_{kl}\,\big|\,n;\,n_i,\,n_j;\,i\,c_{ij}\big\rangle_{\mathfrak{g}}+\text{h.c.} \tag{3.62}$$
The non-vanishing inner products in this expression can be written in terms of Wigner $6j$-symbols for the group $SU(2)$, which are known explicitly. This also enables one to compute the projection
$$\Pi_0:T_C\mathcal{O}\longrightarrow\mathfrak{g}, \qquad V_0\otimes\sigma^0+V_i\otimes\sigma^i\longmapsto V_0 \tag{3.63}$$
as
$$\Pi_0\,\big|n;\,n_i+1,\,n_j-1;\,i\,c_{ij}\big\rangle_{T_C\mathcal{O}}=\sum_{k,l=1}^{n}\big|n;\,n_k,\,n_l;\,i\,c_{kl}\big\rangle_{\mathfrak{g}}\ {}_{\mathfrak{g}}\big\langle n;\,n_k,\,n_l;\,i\,c_{kl}\,\big|\,n;\,n_i+1,\,n_j-1;\,i\,c_{ij}\big\rangle_{T_C\mathcal{O}}. \tag{3.64}$$
In the basis (3.36), one has the useful explicit formula
$$\Pi_0\,J(g)=\mathrm{ad}_{iC_0}(g) \tag{3.65}$$
which is of order $\frac1N$ and can also be used for $E_0$, while $\Pi_0\,J^2(g)$ is of order $\frac{1}{N^2}$ and $\Pi_0(E_1)=\{0\}$.

General solutions. The case where some of the irreducible representations $(n_i)$ have multiplicity $k_i>1$ is a combination of the structures above for the vacuum state and for the nondegenerate case. Now the basis (3.53) acquires additional labelling reflecting the $\mathfrak{u}(k_i)$ degrees of freedom, and it takes the symbolic form
$$T_C\mathcal{O}=\bigoplus_{i,j=1}^{l}\mathbb{C}\,\big|n;\,(n_i+1,a_i),\,(n_j-1,a_j);\,i\,c_{ij}\big\rangle_{T_C\mathcal{O}}. \tag{3.66}$$
In particular, one can now easily compute the symplectic form on $T_C\mathcal{O}$ using (3.23). It is essentially given by the complex structure $J$.

3.5. Fluctuations around the critical surfaces. We conclude this section with a summary of the salient features of the decompositions in Sects. 3.3 and 3.4 above, as pertaining to how they will be exploited in the next section to evaluate fluctuation integrals over the local neighbourhoods of Yang-Mills critical points. Recall that globally the critical surface (with no fluxons) through some critical point $C$ is given by the space of gauge transformations acting on $C$, as in (3.27). Its tangent space is embedded locally as
$$T_C\,\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}=J(\mathfrak{g}\ominus\mathfrak{s})\subset T_C\mathcal{O}, \tag{3.67}$$
which can be determined explicitly using (3.62). Recall also that the gauge stabilizer $\mathfrak{s}$ of $C$ consists of the $SU(2)$ singlets in $\mathfrak{g}$. It is given by $\mathfrak{s}\cong\mathfrak{u}(n)$ for the vacuum, and $\mathfrak{s}\cong\mathfrak{u}(1)^n$ for completely irreducible saddle-points. In particular, $\mathfrak{s}$ is never trivial, quite unlike the situation in ordinary two-dimensional Yang-Mills theory [27]. The global symmetry cannot be disentangled in the noncommutative case, and the nonabelian localization even at the global minimum is akin to that at higher critical points of two-dimensional Yang-Mills theory or more precisely at the flat connections of Chern-Simons gauge theory on a Seifert fibration [27].

The non-trivial part of the localization at higher critical points will therefore be given by fluctuation integrals over the spaces $E_0$, $E_1$ and $\mathfrak{s}$. The only effect of the remaining part $J(\mathfrak{g}\ominus\mathfrak{h})\oplus J^2(\mathfrak{g}\ominus\mathfrak{h})$ will be to induce normalization terms as for the vacuum critical point. In particular, the subspaces $J(\mathfrak{g}\ominus\mathfrak{s})$ and $J^2(\mathfrak{g}\ominus\mathfrak{s})$ locally model the tangent space $T_C\mathcal{O}$ near the vacuum.

To understand the physical meaning of the subspace $E_1$, note that the gauge field strength remains constant for variations along $\phi=X\in E_1$, since $\delta C_0\big|_{E_1}=i\,[C,\phi]_0\in\Pi_0(E_1)=\{0\}$. Let us compute the second order variation of the Yang-Mills action, given by
$$\mathrm{Tr}\big(C_0\,\delta^2C_0\big)=-\mathrm{Tr}\big(C_0\,[\,[C,\phi],\phi]\big)=\mathrm{Tr}\big([C_0,\phi]\,[C,\phi]\big)=-N\,\mathrm{Tr}\big(\mathrm{ad}_{iC_0}(\phi)\,J(\phi)\big). \tag{3.68}$$
Restricting to fluctuations $\phi=X\in E_1$ with respect to the decomposition (3.57) one has
$$\mathrm{Tr}\big(C_0\,\delta^2C_0\big)\Big|_{E_1}=-N\sum_{i>j}\mathrm{Tr}\big(\mathrm{ad}_{iC_0}(X_{ji}^\dagger)\,J(X_{ji})\big)=-2N\sum_{i>j}|c_{ij}|\,\mathrm{Tr}\big(X_{ji}^\dagger\,X_{ji}\big) \tag{3.69}$$
by using the actions (3.54) and (3.55), cf. (4.85). For the maximally nondegenerate saddle-points, this fluctuation is thus negative, demonstrating that the two-dimensional instantons on the fuzzy sphere $S_N^2$ are generically unstable. On the other hand, since the subspace $E_0=J(\mathfrak{h})$ is obtained through gauge transformations, it produces flat directions for the Yang-Mills action.

4. Nonabelian Localization

This section is the crux of the present paper, wherein we shall derive the semiclassical expansion of the partition function for Yang-Mills theory on the fuzzy sphere $S_N^2$ and show that it agrees with the known instanton expansion of quantum gauge theory on $S^2$ in the classical limit $N\to\infty$. We will begin by describing the nonabelian localization principle, adapted to our specific gauge theory. We will then explicitly evaluate the contributions from two extreme classes of Yang-Mills critical points, the vacuum and the maximally irreducible solutions, and show that they give the expected contributions to the path integral at large $N$. The intermediate contributions from degenerate solutions, which we do not treat in detail here, are somewhat more involved but can in principle be evaluated using our techniques. The contribution from the vacuum to the partition function could be expressed in terms of the abstract cohomological formula of [18] given
by intersection pairings on the vacuum moduli space, or by using the more explicit residue formula of [23]. The contributions from some higher unstable critical points to the nonabelian localization formula are formally described in [24,26,27], but the general cases that we need (including reducible saddle points) are not explicitly treated in full generality. Here we will directly evaluate, following [27], the explicit quantum fluctuation integrals near the critical points using the local symplectic geometry of the previous section.

4.1. Equivariant cohomology and the localization principle. The goal of this section is to compute the partition function of quantum Yang-Mills theory on the fuzzy sphere defined by the action (2.25) on the configuration space (2.13) of gauge fields. After an irrelevant shift of the covariant coordinates (2.8) which is equivalent to working with the reduced Yang-Mills action (3.15), it is defined by

$$
Z := \frac{1}{\mathrm{vol}(G)} \left(\frac{g}{4\pi N}\right)^{\dim(G)/2} \int_{\mathcal{O}} dC\, \exp\left(-\frac{N}{g}\,\mathrm{Tr}\, C_0^2\right)
= \frac{1}{\mathrm{vol}(G)} \left(\frac{g'}{2\pi}\right)^{\dim(G)/2} \int_{\mathcal{O}} \exp\left(\omega - \frac{1}{2g'}\,\mathrm{Tr}\, C_0^2\right), \tag{4.1}
$$

where we have used the fact that the symplectic volume form $\omega^d/d!$, with $d := \dim_{\mathbb{C}}(\mathcal{O})$, defines the natural gauge invariant measure on $\mathcal{O}$ provided by the Cartan-Killing riemannian volume form (up to some irrelevant normalization). This follows from the fact that the natural invariant metric on $\mathcal{O}$ is a Kähler form. We have divided by the volume of the gauge group $G = U(nN)$ with respect to the invariant Cartan-Killing form and by another normalization factor for later convenience, and also introduced the rescaled gauge coupling

$$
g' = \frac{g}{2N}. \tag{4.2}
$$
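As a quick consistency check on the rescaling (4.2), the following Python sketch verifies with exact rational arithmetic that the two forms of (4.1) carry the same Boltzmann weight and the same prefactor. The function names are illustrative and are not taken from the paper. The value $\mathrm{Tr}\, C_0^2 = nN/4$ used in `vacuum_weight_exponent` assumes the vacuum configuration $C_0 = \tfrac12 \mathbb{1}_{nN}$ quoted in Sect. 4.3.

```python
from fractions import Fraction


def rescaled_coupling(g, N):
    """Rescaled coupling g' = g / (2N) of (4.2)."""
    return Fraction(g) / (2 * N)


def forms_of_4_1_agree(g, N):
    """Check N/g = 1/(2g') and g/(4 pi N) = g'/(2 pi).

    The common factor 1/pi is dropped so that everything stays rational.
    """
    g = Fraction(g)
    gp = rescaled_coupling(g, N)
    same_weight = Fraction(N) / g == 1 / (2 * gp)
    same_prefactor = g / (4 * N) == gp / 2
    return same_weight and same_prefactor


def vacuum_weight_exponent(g, N, n):
    """Tr(C_0^2) / (2g') for C_0 = (1/2) * identity of size nN (assumed vacuum value)."""
    trace_c0_squared = Fraction(n * N, 4)
    return trace_c0_squared / (2 * rescaled_coupling(g, N))
```

For example, `vacuum_weight_exponent(g, N, n)` evaluates to $nN^2/(4g)$, which is the exponent appearing in the vacuum contribution (4.62).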
We will now describe, following [18,27], how the technique of nonabelian localization can be applied to evaluate the symplectic integral (4.1) exactly. We begin by using a gaussian integration to rewrite (4.1) as

$$
Z = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{O}} \left[\frac{d\phi}{2\pi}\right] \exp\left(\omega - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}\,\phi^2\right), \tag{4.3}
$$

where the euclidean measure for integration over the gauge algebra $\phi \in \mathfrak{g} = u(nN)$ is determined by the invariant Cartan-Killing form. Since the moment map for the $G$-action on $\mathcal{O}$ is given by (2.40), by (2.38) we have

$$
d\,\mathrm{Tr}(C_0\,\phi) = -\iota_{V_\phi}\,\omega. \tag{4.4}
$$
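The gaussian integration that relates (4.1) and (4.3) can be illustrated in a one-dimensional toy version, in which the matrix $C_0$ is replaced by a real number `c` and the trace is dropped. The sketch below is only a numerical sanity check under these assumptions; the function names and the midpoint quadrature are illustrative choices, and the overall normalization of the measure, which the text treats as irrelevant, is not meant to match the prefactor in (4.1).

```python
import cmath
import math


def gaussian_fourier_numeric(c, g_prime, half_width=12.0, steps=6000):
    """Midpoint-rule estimate of  int dphi/(2 pi) exp(-i c phi - g' phi^2 / 2)."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        phi = -half_width + (k + 0.5) * h
        total += cmath.exp(-1j * c * phi - 0.5 * g_prime * phi * phi)
    return total * h / (2.0 * math.pi)


def gaussian_fourier_exact(c, g_prime):
    """Closed form exp(-c^2 / (2 g')) / sqrt(2 pi g') of the same integral."""
    return math.exp(-c * c / (2.0 * g_prime)) / math.sqrt(2.0 * math.pi * g_prime)
```

The exponent $-c^2/(2g')$ of the closed form is the one-dimensional counterpart of the weight $-\frac{1}{2g'}\mathrm{Tr}\, C_0^2$ in (4.1).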
Introduce the BRST operator

$$
Q = d - i\,\iota_{V_\phi}, \tag{4.5}
$$

where $d$ is the exterior derivative on $\Omega(\mathcal{O})$ and the contraction $\iota_{V_\phi}$ acts trivially on $\phi$. It preserves the gradation if one assigns charge $+2$ to the elements $\phi$ of $\mathfrak{g}$, and it satisfies

$$
Q^2 = -i\,\{d, \iota_{V_\phi}\} = -i\,\mathcal{L}_{V_\phi}, \tag{4.6}
$$

where $\mathcal{L}_{V_\phi}$ is the Lie derivative along the vector field $V_\phi$. Thus $Q^2 = 0$ exactly on the space

$$
\Omega_G(\mathcal{O}) := \big(\mathbb{C}[[\mathfrak{g}]] \otimes \Omega(\mathcal{O})\big)^G \tag{4.7}
$$

consisting of gauge invariant differential forms on $\mathcal{O}$ which take values in the ring of symmetric functions on the Lie algebra $\mathfrak{g}$. By construction one has

$$
Q\big(\omega - i\,\mathrm{Tr}(C_0\,\phi)\big) = 0 \tag{4.8}
$$

using (2.34) and (4.4), and

$$
Q\,\mathrm{Tr}\,\phi^2 = 0. \tag{4.9}
$$
Therefore, the integrand of the partition function (4.3) defines a $G$-equivariant cohomology class in $H_G(\mathcal{O})$, and the value of $Z$ depends only on this class. The integral of any $Q$-exact equivariant differential form in $\Omega_G(\mathcal{O})$ over $\mathfrak{g} \times \mathcal{O}$ is clearly 0, as is the integral of any $\iota_{V_\phi}$-exact form even if its argument is not gauge invariant. Thus $Z$ is unchanged by adding any $Q$-exact form to the action, which will fix a gauge for the localization. Hence we can replace it by

$$
Z = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{O}} \left[\frac{d\phi}{2\pi}\right] \exp\left(\omega - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}\,\phi^2 + t\,Q\alpha\right), \tag{4.10}
$$

which is independent of $t \in \mathbb{R}$ for any $G$-invariant one-form $\alpha$ on $\mathcal{O}$, where

$$
Q\alpha = d\alpha - i\,\langle \alpha, V_\phi \rangle. \tag{4.11}
$$

The independence of (4.10) on the particular representative $\alpha \in \Omega(\mathcal{O})^G$ of its equivariant cohomology class will play a crucial role in our evaluation of the partition function. Expanding the integrand of (4.10) by writing $\exp(t\, d\alpha)$ as a polynomial in $t$ and using the fact that the configuration space $\mathcal{O}$ is compact, it follows that for $t \to \infty$ the integral localizes at the stationary points of $\langle \alpha, V_\phi \rangle$ in $\mathfrak{g} \times \mathcal{O}$. By writing $V_\phi = V_a\, \phi^a$, where $\phi^a$ is an orthonormal basis of $\mathfrak{g}^\vee$, we have $\langle \alpha, V_\phi \rangle = \langle \alpha, V_a \rangle\, \phi^a$ and the critical points are thus determined by the equations

$$
\langle \alpha, V_a \rangle = 0, \tag{4.12}
$$

$$
\phi^a\, d\langle \alpha, V_a \rangle = 0. \tag{4.13}
$$

Since (4.13) is invariant under rescaling of $\phi$ and the Lie algebra $\mathfrak{g}$ is contractible, the homotopy type of the space of solutions in $\mathfrak{g} \times \mathcal{O}$ is unchanged by restricting to $\phi = 0$ and the saddle-points reduce to the zeroes of $\langle \alpha, V_a \rangle$ in $\mathcal{O}$. Given the reduced Yang-Mills function (3.15), let us consider explicitly the invariant one-form $\alpha$ given by [27,32]
$$
\alpha = -i\,\mathrm{Tr}\big(C_0\,[C, dC]_0\big) = g\, J\, dS. \tag{4.14}
$$

We claim that the vanishing locus of $\langle \alpha, V_a \rangle$ in this case coincides with the critical surfaces of the original Yang-Mills action (2.25) as found in Sect. 3.1. To see this, we note that the condition

$$
0 = \langle \alpha, V_a \rangle = \mathrm{Tr}\big(C_0\,[C, [C, \phi^a]]_0\big) = -\mathrm{Tr}\big([C, C_0]\,[C, \phi^a]\big) \tag{4.15}
$$

certainly holds whenever $[C, C_0] = 0$. On the other hand, by setting $\phi = C_0$ it implies

$$
0 = \langle \alpha, V_\phi \rangle = -\mathrm{Tr}\,[C, C_0]^2 \tag{4.16}
$$

which by nondegeneracy of the inner product implies that $[C, C_0] = 0$. Therefore the action in (4.10) has indeed the same critical points as the Yang-Mills action (2.25). Let us now explicitly establish, following [32], the localization of the partition function onto the classical solutions of the gauge theory. Plugging (4.14) and (4.11) into (4.10) and carrying out the integration over $\phi \in \mathfrak{g}$ gives

$$
\begin{aligned}
Z &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{O}} \left[\frac{d\phi}{2\pi}\right] \exp(t\, d\alpha + \omega) \\
&\qquad \times \exp\left(-i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}\,\phi^2 - i\,t\,\mathrm{Tr}\big([C, [C, C_0]]\,\phi\big)\right) \\
&= \frac{1}{\mathrm{vol}(G)} \left(\frac{g'}{2\pi}\right)^{\dim(G)/2} \int_{\mathcal{O}} \exp(t\, d\alpha + \omega) \\
&\qquad \times \exp\left(-\frac{1}{2g'}\,\mathrm{Tr}\, C_0^2 + \frac{t}{g'}\,\mathrm{Tr}\big(C_0\,[C, [C, C_0]]\big) - \frac{t^2}{2g'}\,\mathrm{Tr}\big([C, [C, C_0]]\big)^2\right),
\end{aligned} \tag{4.17}
$$

where we have used $\mathrm{Tr}(C\,[C, -]) = 0$. The only configurations which contribute to (4.17) in the large $t$ limit are therefore solutions of the equation

$$
[C, [C, C_0]] = 0 \tag{4.18}
$$

which implies as in [32] that

$$
0 = \mathrm{Tr}\big(C_0\,[C, [C, C_0]]\big) = -\mathrm{Tr}\,[C, C_0]^2, \tag{4.19}
$$
giving $[C, C_0] = 0$ as desired. Therefore the integral (4.17) receives contributions only from the solutions of the Yang-Mills equations (3.2), which establishes the claimed localization. The local geometry in $\mathfrak{g} \times \mathcal{O}$ about each critical point, as analysed in detail in the last section, determines the partition function as a sum of local contributions involving the values of the Yang-Mills action evaluated on the classical solutions as in Sect. 3.2. Consider an equivariant tubular neighbourhood $\mathcal{N}_{(n_1,s_1),\dots,(n_k,s_k)}$ of a critical surface $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$ in $\mathfrak{g} \times \mathcal{O}$. Since the partition function (4.10) is independent of $t$, we can consider its large $t$ limit as above, and this limit will always be implicitly assumed from now on. Let $W$ be a compact subset of $\mathcal{O}$ with $W \cap \mathcal{C} = \emptyset$, where $\mathcal{C} := \bigcup_{(n_i,s_i)} \mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$. Then the integral over $W$ in (4.17) has a gaussian decay in $t \to \infty$. This means that in expanding $\exp(t\, d\alpha + \omega)$ into a finite sum of terms of the form $\omega^p \wedge (t\, d\alpha)^m$, we can disregard all terms which contain $\omega$ since they will be suppressed by factors of $1/t$ and vanish in the large $t$ limit. The only terms which survive the $t \to \infty$ limit are those with $p = 0$, $m = d$, and the integral therefore vanishes unless $\omega$ is replaced by $d\alpha$, except at the saddle point where $d\alpha = 0$. Then one has

$$
Z = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{O}} \left[\frac{d\phi}{2\pi}\right] \exp\Big(t\,\big(d\alpha - i\,\langle \alpha, V_\phi \rangle\big)\Big)\, \exp\left(-i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}\,\phi^2\right) \tag{4.20}
$$

in the vicinity of any critical point in which $d\alpha$ is nondegenerate. The integral $Z_{(n_1,s_1),\dots,(n_k,s_k)}$ in (4.20) over the neighbourhood $\mathcal{N}_{(n_1,s_1),\dots,(n_k,s_k)}$ is determined by the local behaviour of $\alpha$ and the $G$-action near $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$. Then

$$
Z = \sum_{\substack{(n_1,s_1),\dots,(n_k,s_k) \\ \sum_i n_i = nN,\ \sum_i s_i = n}} Z_{(n_1,s_1),\dots,(n_k,s_k)}. \tag{4.21}
$$
As expected [24], the sum over critical surfaces in (4.21) contains the sum over weights $1 \le n_1 \le n_2 \le \cdots \le n_k$ of the gauge group $G = U(nN)$. Our explicit computations will confirm the local behaviour of the partition function given by [24]

$$
Z_{(n_1,s_1),\dots,(n_k,s_k)} = (g')^{-\dim(G)}\, e^{-\frac{1}{2g'} \sum_i n_i^2}\, H_{(n_1,s_1),\dots,(n_k,s_k)}\big(\sqrt{g'}\,\big). \tag{4.22}
$$

The smooth function $H_{(n_1,s_1),\dots,(n_k,s_k)} : \mathbb{R} \to \mathbb{C}$, which is bounded by a polynomial at infinity, is determined by the equivariant Euler class of the fixed point locus corresponding to the weight $(n_1, \dots, n_k)$ after reducing the integral over $\mathfrak{g}$ to its Cartan subalgebra, as we do explicitly in the next section.

4.2. Explicit evaluation of the localization forms. The explicit computation of the local contributions $Z_{(n_1,s_1),\dots,(n_k,s_k)}$ to the Yang-Mills partition function on $S_N^2$ will rely on the local behaviour of the invariant one-form $\alpha$ introduced in (4.14) near the Yang-Mills critical points. We will now pause to derive explicit expressions for the BRST transformations (4.11) on the subspaces appearing in the tangent space decomposition (3.36). Given the invariant Maurer-Cartan one-form (2.42) and the projector (3.63), consider the $u(nN)$-valued one-form

$$
\theta_0 := \Pi_0(\theta) = \tfrac{1}{2}\, \mathrm{tr}_\sigma(\theta), \tag{4.23}
$$

where $\mathrm{tr}_\sigma$ denotes the partial trace over the spin matrices $\sigma^\mu$. It is given explicitly by

$$
\theta_0 = \frac{4}{N^2}\,(C\, dC)_0 = \frac{4}{N^2}\,\big(C_i\, dC^i + C_0\, dC_0\big) \tag{4.24}
$$

and satisfies

$$
d\theta_0 = -\tfrac{1}{2}\, \mathrm{tr}_\sigma\, \theta^2 = -\Pi_0\big(\theta^2\big). \tag{4.25}
$$
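One has

$$
\langle \theta, V_\phi \rangle = \frac{2}{N^2}\,[C, V_\phi] = -\frac{2i}{N}\, J(V_\phi)
\qquad\text{and}\qquad
\langle \theta_0, V_\phi \rangle = -\frac{2i}{N}\, \Pi_0\, J(V_\phi) \tag{4.26}
$$

for any tangent vector $V_\phi = i\,[C, \phi]$. Using the identity $C\, dC = -dC\, C$, the localization one-form (4.14) can now be written as

$$
\alpha = -i\,\frac{N^2}{2}\,\mathrm{Tr}(C_0\, \theta) = -i\,\frac{N^2}{2}\,\mathrm{Tr}(C\, \theta_0). \tag{4.27}
$$

Hence the pairing in (4.11) is given by

$$
\langle \alpha, V_\phi \rangle = -N\,\mathrm{Tr}\big(C_0\, J(V_\phi)\big) = N\,\mathrm{Tr}\big(J(C_0)\, V_\phi\big) = -N^2\,\mathrm{Tr}\big(J^2(C_0)\, \phi\big). \tag{4.28}
$$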
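This vanishes on the critical surfaces, where $J(C_0) = 0$. Furthermore, for any $g \in \mathfrak{g}$ one has

$$
\langle \alpha, J^2(g) \rangle = -N\,\mathrm{Tr}\big(C_0\, J^3(g)\big) = N\,\mathrm{Tr}\big(C_0\, J(g)\big) = i\,\mathrm{Tr}\big(C_0\,[C_0, g]\big) = 0, \tag{4.29}
$$

while for $e_0 \in E_0$ one has

$$
\langle \alpha, e_0 \rangle = \langle \alpha, J(e_0) \rangle = \langle \alpha, J^2(h) \rangle = 0 \tag{4.30}
$$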
for some h ∈ h. Both identities (4.29) and (4.30) hold even off-shell. We also note the on-shell relations α, J (g) = −N Tr C0 J 2 (g) = 0 and α, e1 = −N Tr (C0 J (e1 )) = 0 (4.31) for e1 ∈ E 1 . To evaluate the integral (4.20) using the stationary phase method, we must understand how it behaves near the Yang-Mills critical points. For this, we will study the local behaviour of the BRST variation (4.11), beginning with the pairing α, Vφ . Let us write a generic gauge field of O as C = C + ε i [ C, ] + 21 ε2 i [ C, i [ C, ] ] + O(ε3 ), where C is the given critical point, ∈ su(N ) are the fluctuations around C and ε is a small real parameter. Then J 2 (C0 ) = 0 + ε J 2 i [ C, ]0 + Ni i [ C, ], J ( C 0 )
+ Ni J [ i [ C, ], C 0 ] + O ε2
= ε J 2 ((V )0 ) + Ni J [V , C 0 ] + O ε2 , (4.32) which for φ ∈ g gives α, Vφ = −ε N 2 Tr J 2 ((V )0 ) φ + Ni J (V ), C 0 , φ + O ε2 = −ε N 2 Tr (V )0 J 2 (φ) + J (V ) Ni C 0 , φ + O ε2 = −ε N 2 Tr V J 2 (φ)0 − J (J (φ)0 ) + O ε2 , (4.33) using (3.65). This is non-degenerate for φ ∈ g s h, i.e. non-vanishing for some V ∈ TC O. To see this, it is sufficient to show that J (J 2 (φ)0 − J (J (φ)0 )) = 0. Indeed, assuming the contrary J (J 2 (φ)0 ) = J 2 (J (φ)0 ) would imply that either φ ∈ s, or J (φ)0 ∈ h which is amounts to φ ∈ h ⊕ s. On the other hand, this pairing is indeed degenerate for any V ∈ E 1 . For φ ∈ s, the second-order contribution to the form (4.33) can be obtained from (4.34) Vφ = i [C, φ] = i ε[V , φ] + O ε2
and

$$
J(C_0) = \frac{i}{N}\,[C, C_0] = \frac{i}{N}\,\varepsilon\,\big([V, C_0] + [C, (V)_0]\big) + O(\varepsilon^2). \tag{4.35}
$$
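It follows that

$$
\langle \alpha, V_\phi \rangle = -\varepsilon^2\,\mathrm{Tr}\Big(\mathrm{ad}_\phi(V)\,\big(\mathrm{ad}_{C_0}(V) + i\,N\, J((V)_0)\big)\Big) + O(\varepsilon^3). \tag{4.36}
$$

In particular, for $V \in E_1$ this pairing simplifies to

$$
\langle \alpha, V_\phi \rangle = -\varepsilon^2\,\mathrm{Tr}\big(\mathrm{ad}_\phi(V)\,\mathrm{ad}_{C_0}(V)\big) + O(\varepsilon^3). \tag{4.37}
$$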
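We now turn to the exact part $d\alpha$ of (4.11). Using (2.44)–(2.46), one finds

$$
d\alpha = -i\,\frac{N^2}{2}\,\mathrm{Tr}\big(dC\, \theta_0 - C_0\, \theta^2\big) = -i\,\frac{N^2}{2}\,\mathrm{Tr}\big(C\, \theta\, \theta_0 + C_0\, d\theta\big). \tag{4.38}
$$

For flat connections with $F = 0$, the second term in the first equality of (4.38) vanishes and one has

$$
d\alpha = -i\,\frac{N^2}{2}\,\mathrm{Tr}(dC\, \theta_0) = -i\,\frac{N^2}{2}\,\mathrm{Tr}(C\, \theta\, \theta_0)
\qquad\text{if}\quad C_0 = \tfrac{1}{2}\,\mathbb{1}_{nN}. \tag{4.39}
$$

From (2.43) and (2.44) one generally has $\theta^2 = -\frac{4}{N^2}\,(dC)^2$, and hence

$$
\big\langle \mathrm{Tr}(C_0\, \theta^2), V_\phi \wedge V_\psi \big\rangle = \frac{4}{N^2}\,\mathrm{Tr}\big(C_0\,[[C, \phi], [C, \psi]]\big) = \frac{4}{N^2}\,\mathrm{Tr}\big([C_0, [C, \phi]]\,[C, \psi]\big) = -\frac{4}{N^2}\,\mathrm{Tr}\big(\mathrm{ad}_{C_0}(V_\phi)\, V_\psi\big) \tag{4.40}
$$

for any pair of tangent vectors $V_\phi = i\,[C, \phi]$ and $V_\psi = i\,[C, \psi]$. Similarly, one has

$$
\big\langle \mathrm{Tr}(C\, \theta\, \theta_0), V_\phi \wedge V_\psi \big\rangle = \big\langle \mathrm{Tr}(dC\, \theta_0), V_\phi \wedge V_\psi \big\rangle = -\frac{2i}{N}\,\mathrm{Tr}\big(V_\phi\, J(V_\psi)_0 - V_\psi\, J(V_\phi)_0\big), \tag{4.41}
$$

which vanishes if any of the arguments belongs to the subspace $E_1$. If $V_\psi = J(h) \in E_0$ for some $h \in \mathfrak{h}$, then by using the map (3.35) along with (4.41) one computes the on-shell pairing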
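$$
\begin{aligned}
\big\langle \mathrm{Tr}(C\, \theta\, \theta_0), V_\phi \wedge V_\psi \big\rangle
&= -\frac{2i}{N}\,\mathrm{Tr}\big(V_\phi\, J(j(h))_0 - J(h)_0\, J(V_\phi)\big) \\
&= -\frac{2}{N^2}\,\mathrm{Tr}\big(\mathrm{ad}_{C_0}(V_\phi)\, j(h) + \mathrm{ad}_{C_0}(h)\, J(V_\phi)\big) \\
&= -\frac{2}{N^2}\,\mathrm{Tr}\big(N\,\mathrm{ad}_{C_0}(J(\phi))\, j(h) + \mathrm{ad}_{C_0}(h)\, J(V_\phi)\big) \\
&= -\frac{2}{N^2}\,\mathrm{Tr}\big(-N\,\mathrm{ad}_{C_0}(\phi)\, J^2(h) + \mathrm{ad}_{C_0}(h)\, J(V_\phi)\big) \\
&= -\frac{2}{N^2}\,\mathrm{Tr}\big(N\,\mathrm{ad}_{C_0}(J(\phi))\, J(h) - \mathrm{ad}_{C_0}(J(h))\, V_\phi\big) \\
&= -\frac{2}{N^2}\,\mathrm{Tr}\big(\mathrm{ad}_{C_0}(V_\phi)\, V_\psi - \mathrm{ad}_{C_0}(V_\psi)\, V_\phi\big).
\end{aligned} \tag{4.42}
$$

This coincides with (4.40), and in particular it vanishes unless the vector field $V_\phi$ also belongs to the subspace $E_0$. In summary, we have the on-shell evaluations

$$
\langle d\alpha, V_\phi \wedge V_\psi \rangle = 2i\,\mathrm{Tr}\big(V_\phi\,\mathrm{ad}_{C_0}(V_\psi)\big) \qquad\text{if}\quad V_\psi \in E_1 \tag{4.43}
$$

and

$$
\langle d\alpha, V_\phi \wedge V_\psi \rangle = 0 \qquad\text{if}\quad V_\psi \in E_0. \tag{4.44}
$$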
4.3. Localization at the vacuum moduli space. We will now compute the localized partition function $Z_0 := Z_{(N,1),\dots,(N,1)}$ at the vacuum critical surface. We denote this gauge orbit as

$$
\mathcal{O}_0 := \mathcal{C}_{(N,1),\dots,(N,1)} = \big\{ g\, C\, g^{-1} \;\big|\; g \in U(nN) \big\} \cong U(nN)/U(n). \tag{4.45}
$$

In this case the subspaces $E_0$ and $E_1$ in (3.36) are trivial. Localization implies that we can restrict ourselves to a $G$-equivariant tubular neighbourhood $\mathcal{N}_0 = \mathcal{N}_{(N,1),\dots,(N,1)}$ of the critical surface, under the action of the gauge group $G = U(nN)$. The neighbourhood $\mathcal{N}_0$ has an equivariant retraction [45, Chap. 27] by a local equivariant symplectomorphism onto the local symplectic model $\mathcal{F}_0$, defined to be an equivariant symplectic vector bundle over $\mathcal{O}_0$ with fibre $J^2(\mathfrak{g} \ominus \mathfrak{s})$ which is a sub-bundle of the tangent bundle $T\mathcal{O}$ restricted to $\mathcal{O}_0$. This means that the tangent space to $\mathcal{F}_0$ at the vacuum critical point $C$ in (3.9) is given by $T_C \mathcal{O}_0 \oplus J^2(\mathfrak{g} \ominus \mathfrak{s}) \cong J(\mathfrak{g} \ominus \mathfrak{s}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{s}) = T_C \mathcal{O}$, the symplectic two-form on $\mathcal{F}_0$ is simply $\omega$, and the hamiltonian $G$-action on $\mathcal{F}_0$ descends from the moment map $\mu$. In physical terms, the gauge fields are decomposed along the vacuum moduli space $\mathcal{O}_0$ plus infinitesimal non-gauge variations in the subspace $J^2(\mathfrak{g} \ominus \mathfrak{s})$. Due to the presence of the localization form $\alpha$ in the path integral, we can restrict ourselves to this model $\mathcal{F}_0$ and use it to replace the open neighbourhood $\mathcal{N}_0$ [27]. Indeed, because $\mathcal{F}_0$ is an equivariant retraction from $\mathcal{N}_0$, the $G$-equivariant cohomology of $\mathcal{N}_0$ is the same as that of $\mathcal{F}_0$. Furthermore, since the fibres of the bundle $\mathcal{F}_0$ are contractible, its $G$-equivariant cohomology is identified under pullback with the $S$-equivariant cohomology of its base space $\mathcal{O}_0$, so that $H_G(\mathcal{N}_0) \cong H_S(\mathcal{O}_0)$. Since $S$ acts trivially on $\mathcal{O}_0$, one has $H_S(\mathcal{O}_0) \cong \mathbb{C}[[\mathfrak{s}]]^S \otimes H(\mathcal{O}_0)$ and the $S$-equivariant cohomology classes of $\mathcal{O}_0$ coincide with ordinary cohomology classes of $\mathcal{O}_0$ valued in the ring of invariant functions on the stabilizer $\mathfrak{s}$.
Putting everything together gives an isomorphism HG (N0 ) ∼ = C[[s]] S ⊗ H (O0 ) which reduces the equivariant integral over g × N0 in (4.20) to an ordinary integral over s × O0 . This is precisely the nonabelian localization that is formally carried out in [26], and will turn out to be very much like the localization at the trivial connection of Chern-Simons theory on a Seifert homology sphere [27]. In the present case, the integral over φ ∈ s will then give the interesting non-trivial quantum fluctuation determinants about the classical solution. We will now carry out this reduction explicitly. Let gi be an orthonormal basis of g = g s, and consider the corresponding basis Ji = J (gi )
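and

$$
\tilde{J}_j = J^2(g_j) \tag{4.46}
$$

of $T_C \mathcal{O} = J(\mathfrak{g} \ominus \mathfrak{s}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{s})$, with the dual basis $\lambda^i$, $\tilde{\lambda}^j$ defined by

$$
\langle \lambda^i, J_j \rangle = \delta^i{}_j, \qquad \langle \tilde{\lambda}^i, \tilde{J}_j \rangle = \delta^i{}_j \qquad\text{and}\qquad \langle \lambda^i, \tilde{J}_j \rangle = \langle \tilde{\lambda}^i, J_j \rangle = 0. \tag{4.47}
$$

Introduce the functions

$$
f_i = \langle \alpha, J_i \rangle \tag{4.48}
$$

which vanish on-shell but have non-degenerate derivatives $d f_i$ due to (4.33). Then by expanding $\phi = \phi^i\, g_i + \phi^a\, s_a$ into components $\phi^i$ along $\mathfrak{g} \ominus \mathfrak{s}$ and $\phi^a$ along $\mathfrak{s}$, we have

$$
\langle \alpha, V_\phi \rangle = N\, \langle \alpha, J(\phi) \rangle = N\, f_i\, \phi^i. \tag{4.49}
$$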
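It follows that the localization one-form can be expanded as

$$
\alpha = f_i\, \lambda^i \tag{4.50}
$$

with

$$
d\alpha = d f_i \wedge \lambda^i + f_i\, d\lambda^i. \tag{4.51}
$$

In particular, one has

$$
\frac{(d\alpha)^d}{d!} = \bigwedge_{i=1}^{d} \big(d f_i \wedge \lambda^i\big) + f_j\, \Upsilon^j, \tag{4.52}
$$

where $d = \dim_{\mathbb{C}}(\mathcal{O}) = n^2\,(N^2 - 1)$ is the (real) dimension of the vacuum orbit $\mathcal{O}_0$. The forms $f_j\, \Upsilon^j$ vanish on-shell, and are killed by localization in the large $t$ limit. For example, inner products of the form $\langle \alpha, J(s) \rangle$, $s \in \mathfrak{s}$ are non-vanishing off-shell at second order due to (4.36), but these higher-order terms do not contribute because of the localization in the large $t$ limit. This can be seen explicitly by rescaling $f_i \to \sqrt{t}\, f_i$. The corresponding local contribution to the partition function (4.20) for $t \to \infty$ is then given by

$$
\begin{aligned}
Z_0 &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{F}_0} \left[\frac{d\phi}{2\pi}\right] \frac{t^d}{d!}\,(d\alpha)^d\; e^{-i\,t\,\langle \alpha, V_\phi \rangle - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)} \\
&= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{F}_0} \left[\frac{d\phi}{2\pi}\right] t^d \bigwedge_{i=1}^{d} \big(d f_i \wedge \lambda^i\big)\; e^{-i\,N\,t\, f_i\,\phi^i - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)} \\
&= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{s}} \left[\frac{d\phi}{2\pi}\right] e^{-i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)}\; \frac{1}{N^d} \int_{\mathcal{O}_0} \bigwedge_{i=1}^{d} \lambda^i.
\end{aligned} \tag{4.53}
$$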
Here the $f_i$ integrals over the fibre $J^2(\mathfrak{g} \ominus \mathfrak{s})$ have produced delta-functions setting $\phi^i = 0$ in $\mathfrak{g} \ominus \mathfrak{s}$. We can carry out the integral over the moduli space $\mathcal{O}_0$ in (4.53) by observing that

$$
\frac{1}{N^d} \int_{\mathcal{O}_0} \bigwedge_{i=1}^{d} \lambda^i = \int_{G/S} \bigwedge_{i=1}^{d} \eta^i = \frac{\mathrm{vol}(G)}{\mathrm{vol}(S)}, \tag{4.54}
$$

where the pullbacks $J^*(\lambda^i) = \eta^i$ define left-invariant one-forms on the gauge group $G$ dual to $g_i$, with the map $N J$ regarded as the derivative of the diffeomorphism

$$
G/S \longrightarrow \mathcal{O}_0, \qquad g \longmapsto g\, C\, g^{-1}. \tag{4.55}
$$

To evaluate the remaining integral over the gauge stabilizer algebra $\mathfrak{s} \cong u(n)$ in (4.53), we note that, for the vacuum critical point with $C_0 = \tfrac{1}{2}\,\mathbb{1}_{nN}$, the integrand defines a gauge invariant function $f : u(n) \to \mathbb{R}$. We may thus apply to it the Weyl integration formula which reduces its integral over $u(n)$ to an integral over the Lie algebra $u(1)^n$ of the maximal torus $U(1)^n$ of $U(n)$. It is given by

$$
\int_{u(n)} [d\phi]\; f(\phi) = \frac{\mathrm{vol}(U(n))}{n!\,(2\pi)^n} \int_{\mathbb{R}^n} [ds]\; \Delta(s)^2\, f(s), \tag{4.56}
$$
where we have identified $u(1)^n \cong \mathbb{R}^n$ in a basis where the Cartan subalgebra of $U(n)$ is represented by diagonal $n \times n$ matrices $s = \mathrm{diag}(s_1, \dots, s_n)$ by mapping them onto $n$-vectors $s = (s_1, \dots, s_n) \in \mathbb{R}^n$. Here

$$
\Delta(s) = \prod_{i<j} (s_i - s_j) = \det_{1 \le i,j \le n} \big(s_i^{\,j-1}\big) \tag{4.57}
$$
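is the Vandermonde determinant, which is the Weyl determinant for $U(n)$ arising as the jacobian for the diagonalization of hermitian matrices on the left-hand side of (4.56). The factor $n!$ is the order of the Weyl group $S_n$ of $U(n)$ acting by permutations of the components $s_i$ of $s \in \mathbb{R}^n$, while $(2\pi)^n$ is the volume of the maximal torus $U(1)^n$ with respect to the chosen invariant Haar measure.

An integral identity. We will make use here and in Sect. 4.4 below of the integral identity

$$
\int_{\mathbb{R}^n} [ds]\; \Delta(s)^2\; e^{-i\,\frac{N}{2} \sum_i s_i + \frac{i}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}
= e^{-\frac{n N^2 - m N}{4g}} \int_{\mathbb{R}^n} [ds]\; \Delta(s)^2\; e^{\frac{i}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}, \tag{4.58}
$$

where $m = \sum_i m_i$. To derive (4.58), we set $s = \sum_i s_i$ and $t_i = s_i - \frac{1}{n}\, s$ so that $\sum_i t_i = 0$. Then

$$
\begin{aligned}
\int_{\mathbb{R}^n} [ds]\; \Delta(s)^2\; e^{-i\,\frac{N}{2} \sum_i s_i + \frac{i}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}
&= \int_{\mathbb{R}} ds\; e^{-i\,\frac{N}{2}\, s + i\,\frac{m}{4n}\, s} \int_{\mathbb{R}^n} [dt]\; \Delta(t)^2\; e^{\frac{i}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i (t_i + \frac{1}{n} s)^2} \\
&= \int_{\mathbb{R}} ds\; e^{-i\,\frac{N}{2}\, s + i\,\frac{m}{4n}\, s - \frac{g}{4n}\, s^2} \int_{\mathbb{R}^n} [dt]\; \Delta(t)^2\; e^{\frac{i}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2} \\
&= 2 \sqrt{\frac{\pi n}{g}}\; e^{-\frac{(2nN - m)^2}{16\, n\, g}} \int_{\mathbb{R}^n} [dt]\; \Delta(t)^2\; e^{\frac{i}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2}.
\end{aligned} \tag{4.59}
$$

On the other hand

$$
\begin{aligned}
\int_{\mathbb{R}^n} [ds]\; \Delta(s)^2\; e^{\frac{i}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}
&= \int_{\mathbb{R}} ds\; e^{i\,\frac{m}{4n}\, s} \int_{\mathbb{R}^n} [dt]\; \Delta(t)^2\; e^{\frac{i}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i (t_i + \frac{1}{n} s)^2} \\
&= \int_{\mathbb{R}} ds\; e^{i\,\frac{m}{4n}\, s - \frac{g}{4n}\, s^2} \int_{\mathbb{R}^n} [dt]\; \Delta(t)^2\; e^{\frac{i}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2} \\
&= 2 \sqrt{\frac{\pi n}{g}}\; e^{-\frac{m^2}{16\, n\, g}} \int_{\mathbb{R}^n} [dt]\; \Delta(t)^2\; e^{\frac{i}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2}.
\end{aligned} \tag{4.60}
$$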
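The identity (4.58) can be checked numerically in the smallest case with a nontrivial Vandermonde factor, $n = 2$. The following Python sketch is only an illustration under stated assumptions: the function names, the midpoint quadrature on a truncated square, and the toy parameter values are not taken from the paper, and the measure $[ds]$ is taken to be the flat measure $ds_1\, ds_2$.

```python
import cmath
import math


def weyl_integral_n2(N, m, g, half_width=12.0, steps=240):
    """Midpoint-rule estimate, for n = 2, of
    int ds1 ds2 (s1 - s2)^2 exp(-i (N/2)(s1 + s2) + (i/4)(m1 s1 + m2 s2) - (g/4)(s1^2 + s2^2))."""
    h = 2.0 * half_width / steps
    pts = [-half_width + (k + 0.5) * h for k in range(steps)]
    # The exponential factorizes; only the Vandermonde factor couples s1 and s2.
    f1 = [cmath.exp(1j * (0.25 * m[0] - 0.5 * N) * s - 0.25 * g * s * s) for s in pts]
    f2 = [cmath.exp(1j * (0.25 * m[1] - 0.5 * N) * s - 0.25 * g * s * s) for s in pts]
    total = 0j
    for s1, a in zip(pts, f1):
        for s2, b in zip(pts, f2):
            total += (s1 - s2) ** 2 * a * b
    return total * h * h


def identity_4_58_sides(N, m, g):
    """Return (left-hand side, right-hand side) of (4.58) for n = 2."""
    n = 2
    lhs = weyl_integral_n2(N, m, g)
    prefactor = math.exp(-(n * N * N - sum(m) * N) / (4.0 * g))
    rhs = prefactor * weyl_integral_n2(0, m, g)
    return lhs, rhs
```

Calling `identity_4_58_sides(1, (1.0, -0.5), 2.0)` returns two complex numbers that should agree to quadrature accuracy. The agreement reflects the fact that the left-hand integrand is obtained from the right-hand one by the complex shift $s_i \to s_i + i N / g$, which leaves $\Delta(s)$ unchanged.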
Final reduction. From (4.53), (4.54) and (4.56) we obtain

$$
Z_0 = \frac{1}{\mathrm{vol}(S)} \int_{\mathfrak{s}} \left[\frac{d\phi}{2\pi}\right] e^{-i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)}
= \frac{1}{n!}\, \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \left[\frac{ds}{2\pi}\right] \Delta(s)^2\; e^{-i\,\frac{N}{2} \sum_i s_i - \frac{g}{4} \sum_i s_i^2}, \tag{4.61}
$$

where we have substituted (4.2) and used $\mathrm{vol}(S) = N^{n^2/2}\, \mathrm{vol}(U(n))$ with respect to the Cartan-Killing metric on $\mathfrak{s}$, since $S = U(n) \otimes \mathbb{1}_N$. Applying the integral identity (4.58) therefore allows us to finally write the partition function as

$$
Z_0 = \frac{1}{n!\,(2\pi)^{n^2+n}}\; e^{-\frac{n N^2}{4g}} \int_{\mathbb{R}^n} [ds]\; \Delta(s)^2\; e^{-\frac{g}{4} \sum_i s_i^2}. \tag{4.62}
$$
The exponential prefactor in the above expression is the Boltzmann weight of the action (3.15) evaluated on the vacuum solution. The remaining quantum fluctuation integral is the standard expression [19] for the contribution from the global minimum of the Yang-Mills action on $S^2$ to the $U(n)$ sphere partition function. It arises from the trivial instanton configuration with vanishing monopole charges $m_i = 0$ in (3.17).

4.4. Localization at maximally irreducible saddle points. We now turn to the opposite extreme and look at the local contribution to the partition function (4.20) from a generic maximally non-degenerate critical surface. We denote this gauge orbit by

$$
\mathcal{O}_{\max} := \mathcal{C}_{(n_1,1),\dots,(n_n,1)} = \big\{ g\, C\, g^{-1} \;\big|\; g \in U(nN - c_1) \big\} \cong U(nN - c_1)/U(1)^n \tag{4.63}
$$

and assume that the integers $n_1 > n_2 > \cdots > n_n$ are explicitly specified. Here we allow also $c_1 \neq 0$ which describes sectors with non-vanishing $U(1)$ monopole number (3.13). We want to compute the integral $Z_{\max}$ in (4.20) over a local neighbourhood $\mathcal{N}_{\max}$ of $\mathcal{O}_{\max}$, which is independent of $t$ in the large $t$ limit. We first need to find a suitable basis for the tangent space $T_C \mathcal{O}$ at the irreducible critical point $C$. The definition of the basis $J_i$, $\tilde{J}_i$ introduced in (4.46) naturally extends to include the non-trivial subspaces $E_0$, $E_1$ in this case with

$$
J_i = J(g_i), \qquad \tilde{J}_j = J^2(g_j), \qquad H_i = J(h_i) \in J(\mathfrak{h}) = E_0 \qquad\text{and}\qquad K_i \in E_1 \tag{4.64}
$$

for $g_i$ and $h_i$ an orthonormal basis of $\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}$ and of $\mathfrak{h} \ominus \mathfrak{s}$, respectively. The elements $K_i$ are assumed to form an orthonormal basis of $E_1$, orthogonal to $J(\mathfrak{g}) \oplus J^2(\mathfrak{g})$. Recall from Sect. 3.4 that $E_0$ and $E_1$ are naturally complex vector spaces, whose generators are embedded into the tangent space decomposition (3.36) as

$$
K_i = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & X_i & 0 \\ 0 & X_i^\dagger & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{4.65}
$$
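and similarly for $H_i$. The complex structure is given by the map $J$, which amounts to multiplying $X_i$ by $i$. We accordingly take the real basis $K_i$ to be ordered as $\{K_i\} = \{(\tilde{K}_i, J(\tilde{K}_i))\}$, and similarly for $H_i$. As matrices, all of the generators $H_i$, $K_j$ are hermitian. The corresponding dual one-forms $\beta^i$, $\gamma^i$ are defined as usual by

$$
\langle \beta^i, H_j \rangle = \delta^i{}_j \qquad\text{and}\qquad \langle \gamma^i, K_j \rangle = \delta^i{}_j \tag{4.66}
$$

with all other pairings equal to 0. We need to evaluate the pairing $\langle \alpha, V_\phi \rangle$. It vanishes on-shell, and identically on $J^2(\mathfrak{g})$. Its evaluation on $J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ has the form $\langle \alpha, J(g_i) \rangle = f_i$, and as before this implies (4.49). Together with (4.30) and (4.31), it follows that the localization one-form $\alpha$ admits an expansion

$$
\alpha = f_i\, \lambda^i + g_i\, \beta^i + k_i\, \gamma^i, \tag{4.67}
$$

where $f_i$, $g_i$, $k_i$ vanish on-shell. We can evaluate

$$
d\alpha = d f_i \wedge \lambda^i + f_i\, d\lambda^i + d g_i \wedge \beta^i + g_i\, d\beta^i + d k_i \wedge \gamma^i + k_i\, d\gamma^i \tag{4.68}
$$

using (4.44) and (4.43) to get

$$
\langle d\alpha, H_i \wedge H_j \rangle = 0 \qquad\text{and}\qquad \langle d\alpha, K_i \wedge K_j \rangle = A_{ij}, \tag{4.69}
$$

where

$$
A_{ij} = 2i\,\mathrm{Tr}\big(K_i\,\mathrm{ad}_{C_0}(K_j)\big) \tag{4.70}
$$

is an antisymmetric matrix. Furthermore, $d\alpha$ vanishes when evaluated on mixed terms of the form $K_i \wedge J(g)$, $K_i \wedge J^2(g)$, $H_i \wedge J(g')$ and $H_i \wedge J^2(g')$ with $g \in \mathfrak{g}$, $g' \in \mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}$. Therefore

$$
d\alpha = d f_i \wedge \lambda^i + \tfrac{1}{2}\, A_{ij}\, \gamma^i \wedge \gamma^j + O_f, \tag{4.71}
$$

where $O_f$ denotes contributions which vanish on-shell such as $f_i\, d\lambda^i$. One then has

$$
\frac{(d\alpha)^{d-d_0}}{(d-d_0)!} = \mathrm{pfaff}(A) \bigwedge_{i=1}^{2d_1} \gamma^i \wedge \left( \bigwedge_{j=1}^{d-d_0-d_1} d f_j \wedge \lambda^j \right) + O_f, \tag{4.72}
$$

where $d_0$ (resp. $d_1$) is the complex dimension of the vector space $E_0$ (resp. $E_1$), and

$$
\mathrm{pfaff}(A) = \epsilon^{i_1 \cdots i_{2d_1}}\, A_{i_1 i_2} \cdots A_{i_{2d_1 - 1}\, i_{2d_1}} \tag{4.73}
$$

is the pfaffian of the antisymmetric matrix $A = (A_{ij})$. Let us now recall the local geometry and define its symplectic model. The $G$-equivariant tubular neighbourhood $\mathcal{N}_{\max}$ of $\mathcal{O}_{\max}$ has an equivariant retraction [45] by a local equivariant symplectomorphism onto the local symplectic model $\mathcal{F}_{\max}$, defined to be an equivariant symplectic vector bundle over $\mathcal{O}_{\max}$ with fibre $J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_1$ which is a sub-bundle of the tangent bundle $T\mathcal{O}$ restricted to $\mathcal{O}_{\max}$. This means that the tangent space to $\mathcal{F}_{\max}$ is given by

$$
T_C \mathcal{O}_{\max} \oplus J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_1 \cong E_0 \oplus J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_1 = T_C \mathcal{O}, \tag{4.74}
$$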
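the symplectic form on $\mathcal{F}_{\max}$ is simply $\omega$, and the hamiltonian $G$-action on $\mathcal{F}_{\max}$ descends from the moment map $\mu$. In physical terms, the gauge fields are split along the moduli space $\mathcal{O}_{\max}$, plus infinitesimal non-gauge variations belonging to $J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ and unstable modes in the subspace $E_1$. Due to the presence of the localization form $\alpha$ in the action, we can restrict ourselves to this model $\mathcal{F}_{\max}$ replacing $\mathcal{N}_{\max}$. Identically to the case of Sect. 4.3 above, the canonical symplectic integral over $\mathfrak{g} \times \mathcal{N}_{\max}$ will in this way reduce to an integral over $\mathfrak{s} \times \mathcal{O}_{\max}$ and the localization now resembles that at an irreducible flat connection of Chern-Simons theory [27]. We may now proceed to calculate

$$
\begin{aligned}
Z_{\max} &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{N}_{\max}} \left[\frac{d\phi}{2\pi}\right] \exp\Big(\omega + t\,\big(d\alpha - i\,\langle \alpha, V_\phi \rangle\big) - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}\,\phi^2\Big) \\
&= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{O}_{\max}\times J^2(\mathfrak{g}\ominus\mathfrak{h}\ominus\mathfrak{s})\times E_1} \left[\frac{d\phi}{2\pi}\right] \frac{(t\, d\alpha)^{d-d_0}}{(d-d_0)!} \wedge \frac{\omega^{d_0}}{d_0!}\; e^{-i\,t\,\langle \alpha, V_\phi \rangle - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)} \\
&= \frac{1}{\mathrm{vol}(G)} \int_{(\mathfrak{g}\ominus\mathfrak{h}\ominus\mathfrak{s})\oplus\mathfrak{h}\oplus\mathfrak{s}} \left[\frac{d\phi}{2\pi}\right] \mathrm{pfaff}(A) \int_{\mathcal{O}_{\max}\times J^2(\mathfrak{g}\ominus\mathfrak{h}\ominus\mathfrak{s})\times E_1} t^{d-d_0} \bigwedge_{i=1}^{2d_1} \gamma^i \wedge \left( \bigwedge_{j=1}^{d-d_0-d_1} d f_j \wedge \lambda^j \right) \wedge \frac{\omega^{d_0}}{d_0!} \\
&\qquad \times\; e^{-i\,t\,\left(N\, f_i\,\phi^i + \langle \alpha, V_{\phi'} \rangle\right) - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)}
\end{aligned} \tag{4.75}
$$

with $\phi' \in \mathfrak{h} \oplus \mathfrak{s}$. In the second line we have used the fact that $d\alpha$ vanishes when evaluated on the subspace $E_0$, and therefore we need $d_0$ powers of $\omega$ to yield a non-trivial volume form. Then $(t\, d\alpha)^{d-d_0} \wedge \omega^{d_0}$ is the only term which survives in the large $t$ limit. We will modify this below by adding a second localization form $\alpha'$ in order to write the localization integral in the generic form (4.20) without the symplectic two-form $\omega$. We can now evaluate the integrals in (4.75) over $f_i$ in the fibre $J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ and $\phi^i \in \mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}$ as in Sect. 4.3 above, which localizes for $t \to \infty$ to an integral over the subspace $E_1$ and the gauge orbit $\mathcal{O}_{\max}$ given by

$$
\begin{aligned}
Z_{\max} &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{h}\oplus\mathfrak{s}} \left[\frac{d\phi}{2\pi}\right] \frac{\mathrm{pfaff}(A)}{N^{d-d_0-d_1}} \int_{\mathcal{O}_{\max}\times E_1} t^{d_1} \bigwedge_{i=1}^{2d_1} \gamma^i \wedge \left( \bigwedge_{j=1}^{d-d_0-d_1} \lambda^j \right) \wedge \frac{\omega^{d_0}}{d_0!} \\
&\qquad \times\; e^{-i\,t\,\langle \alpha, V_\phi \rangle - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}(\phi^2)}.
\end{aligned} \tag{4.76}
$$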
The gauge invariant volume form for the integration domain whose tangent space is $E_0$ is given by the symplectic volume form $\omega^{d_0}/d_0!$, since $d\alpha$ vanishes on $E_0$, but this will be modified below. It remains to compute the integral over $E_1$. Upon evaluating $\langle \alpha, V_\phi \rangle$ at second order on $E_1$, i.e. away from the critical surface, we will find below that
this pairing becomes a quadratic form which leads to a localization through a gaussian integral. However, to evaluate it explicitly it is easier to first localize the integral over $E_0$, which presently is a complicated non-gaussian integral which does not admit a gaussian approximation at $t \to \infty$ and is difficult to evaluate in a closed analytic form. But this can be done by adapting a trick taken from [27], which amounts to adding a further suitable localization one-form $\alpha'$, or equivalently a cohomologically trivial form $Q\alpha'$, to the action in (4.20). Indeed, we may compute $Z_{\max}$ using any other invariant form $\alpha'$ which is homotopic to $\alpha$ on the open neighbourhood $\mathcal{N}_{\max}$. The one-form $\alpha'$ need only be non-vanishing on $E_0 \subset \mathcal{N}_{\max}$, as the other integrals can be directly carried out.

The localization form $\alpha'$. In order to evaluate the integrals over $E_0$ and $\mathfrak{h}$, following [27] we introduce an additional localization term $\exp(t\, Q\alpha')$ in the partition function with

$$
\alpha' := -i\,\mathrm{Tr}(\theta\, \phi)\big|_{E_0} = -\frac{2}{N}\, J\, d\,\mathrm{Tr}(C\, \phi)\big|_{E_0}. \tag{4.77}
$$

The projection onto $E_0$ is equivalent to projecting $\phi \in \mathfrak{g}$ onto $\mathfrak{h}$. This one-form is equivariant on-shell, and it can be extended to the $G$-equivariant tubular neighbourhood $\mathcal{N}_{\max}$ of the critical surface $\mathcal{O}_{\max}$ as follows. On the tangent space $J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_0$ of $T\mathcal{O}_{\max}$ (4.74) there is an equivariant projection onto the subspace $E_0$. In this way $\alpha'$ is properly defined on the local model, and can hence be extended to $\mathcal{N}_{\max}$. One could also define $\alpha' = -i\,\chi\,\mathrm{Tr}(\theta\, \phi)\big|_{E_0}$ using a smooth $G$-invariant cutoff function $\chi$ with support near the given saddle-point and $\chi = 1$ in the tubular neighbourhood, which is globally well-defined over $\mathcal{N}_{\max}$ as an equivariant differential form. Note that $t_1\, \alpha + t_2\, \alpha'$ vanishes only on the original critical points for any $t_1, t_2 \in \mathbb{R}$ with $t_1 \neq 0$, and no new ones are introduced. Then our previous computation (4.17) would essentially go through, since $\alpha'$ vanishes on $J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ and there are no critical points where $d\chi \neq 0$. It is therefore just as good a localization form to use as $\alpha$ is. It follows that the modification of the canonical symplectic integral over $\mathcal{N}_{\max}$ given by

$$
Z_{\max} = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g}\times\mathcal{N}_{\max}} \left[\frac{d\phi}{2\pi}\right] \exp\Big(\omega + t_1\, Q\alpha + t_2\, Q\alpha' - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g'}{2}\,\mathrm{Tr}\,\phi^2\Big) \tag{4.78}
$$

is independent of both $t_1, t_2 \in \mathbb{R}$. Then $\alpha'$ will localize the integral over $\mathfrak{h} \subset \mathfrak{g}$ as well as the integral over the unstable modes in $E_1$, without the need to expand $\langle \alpha, V_\phi \rangle$ to higher order.

Integration over $\mathfrak{h}$. The new localization form $\alpha'$ satisfies

$$
d\alpha' = i\,\mathrm{Tr}\big(\theta^2\, \phi\big)\big|_{E_0} = -\frac{i}{2}\,\mathrm{Tr}\big(\theta\,[\phi, \theta]\big)\big|_{E_0} \tag{4.79}
$$

and

$$
\langle \alpha', V_{h_i} \rangle = -\frac{2}{N}\,\mathrm{Tr}\big(J(V_{h_i})\, \phi\big) = \frac{2}{N}\,\mathrm{Tr}\big(V_{h_i}\, J(\phi)\big) = 2\,\mathrm{Tr}\big(H_i\, J(\phi)\big), \tag{4.80}
$$

where $H_i = J(h_i)$ with $h_i$ a basis of $\mathfrak{h}$. This produces a gaussian integral localizing $\mathfrak{h}$ to the gauge stabilizer algebra $\mathfrak{s} \cong u(1)^n$. To evaluate it, we will need the matrix

$$
M_{ij} := \mathrm{Tr}(H_i\, H_j) \tag{4.81}
$$
which is hermitian since we take $H_i$ and $h_i$ to be hermitian. Similarly, one has

$$
\langle d\alpha', H_i \wedge H_j \rangle = \frac{4i}{N^2}\,\mathrm{Tr}\big(J(H_i)\,[s, J(H_j)]\big) = -\frac{4i}{N^2}\,\mathrm{Tr}\big(H_i\,[s, J^2(H_j)]\big) = \frac{4i}{N^2}\,\mathrm{Tr}\big(H_i\,[s, H_j]\big) =: \frac{4i}{N^2}\,\tilde{A}_{ij}, \tag{4.82}
$$

where we have restricted to $\phi = s \in \mathfrak{s}$ using the localization. This implies that

$$
d\alpha' = \frac{2i}{N^2}\,\tilde{A}_{ij}\, \beta^i \wedge \beta^j
\qquad\text{and}\qquad
\frac{(d\alpha')^{d_0}}{d_0!} = \left(\frac{4i}{N^2}\right)^{d_0} \mathrm{pfaff}\big(\tilde{A}\big) \bigwedge_{i=1}^{2d_0} \beta^i. \tag{4.83}
$$

To evaluate the matrices $M = (M_{ij})$ and $\tilde{A} = (\tilde{A}_{ij})$ above explicitly, we recall that the basis $H_i := H_{kl;i}$ (where $k, l$ are block indices) of $E_0$ takes the block form

$$
H_{kl;i} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & Y_{lk;i} & 0 \\ 0 & Y_{lk;i}^\dagger & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = J(h_{kl;i}), \tag{4.84}
$$

where $h_{kl;i} \in \mathfrak{h}$ is a hermitian block matrix with a similar block decomposition. They are orthogonal for different $k, l$, and we will often omit the indices $k, l$. Note that the complex structure on $E_0$ defined by the map $J$ is compatible with the natural complex structure on $\mathfrak{h}$. This basis is particularly useful for evaluating the pfaffian which appears in (4.83), because $\mathrm{ad}_s(H_{kl;i})$ for $s \in \mathfrak{s}$ acts as multiplication by $(s_k - s_l)$ in the upper-right blocks of (4.84). It follows that

$$
i\,\mathrm{ad}_{C_0}(H_{kl;i}) = c_{lk}\, J(H_{kl;i})
\qquad\text{and}\qquad
i\,\mathrm{ad}_s(H_{kl;i}) = (s_l - s_k)\, J(H_{kl;i}), \tag{4.85}
$$

where the eigenvalues $c_{lk} > 0$ are defined in (3.39). These formulas hold only for $k > l$, and analogous statements are true for the subspace $E_1$. We can choose an orthogonal basis $Y_i$ such that $G_{ij} = 2\,\mathrm{Tr}\, Y_i\, Y_j^\dagger$ is diagonal, as $G_{ij}$ is a hermitian matrix. Then

$$
\mathrm{Tr}(H_i\, H_j) = \mathrm{Tr}\big(Y_i\, Y_j^\dagger + Y_i^\dagger\, Y_j\big) = G_{ij}, \qquad
\mathrm{Tr}\big(H_i\, J(H_j)\big) = \mathrm{Tr}\big(i\, Y_i\, Y_j^\dagger - i\, Y_i^\dagger\, Y_j\big) = 0. \tag{4.86}
$$

This means that the symmetric matrix $M = (M_{ij})$ in (4.81) has the block decomposition

$$
M = \begin{pmatrix} G & 0 \\ 0 & G \end{pmatrix} \tag{4.87}
$$

in the basis $(\tilde{H}_i, J(\tilde{H}_i))$, and similarly the matrix $\tilde{A}$ in (4.82) is given by

$$
\tilde{A}_{ij} = \mathrm{Tr}\big(H_i\,\mathrm{ad}_s(H_j)\big) = -i\,(s_l - s_k)\,\mathrm{Tr}\big(H_i\, J(H_j)\big) = -i\,(s_k - s_l) \begin{pmatrix} 0 & G \\ -G & 0 \end{pmatrix}_{ij}. \tag{4.88}
$$
We can read off the pfaffian from this expression and use (4.87) to write it as

$$
\mathrm{pfaff}\big(\tilde{A}\big) = (-i)^{d_0}\, \sqrt{\det(M)}\, \prod_{k>l} (s_k - s_l)^{|n_k - n_l| + 1}. \tag{4.89}
$$
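The step from the block form (4.88) to the pfaffian (4.89) rests on the pfaffian of a matrix of the shape $c \begin{pmatrix} 0 & G \\ -G & 0 \end{pmatrix}$ with diagonal $G$. The Python sketch below illustrates this for a single block; it is not the paper's computation, the function names are illustrative, and the sign $(-1)^{m(m-1)/2}$ depends on the ordering of the basis, which the text absorbs into the choice of orientation.

```python
def pfaffian(a):
    """Pfaffian of an antisymmetric matrix (list of lists) by expansion along the first row."""
    n = len(a)
    if n == 0:
        return 1
    if n % 2 == 1:
        return 0
    total = 0
    for j in range(1, n):
        if a[0][j] == 0:
            continue
        keep = [k for k in range(1, n) if k != j]
        minor = [[a[r][c] for c in keep] for r in keep]
        total += (-1) ** (j - 1) * a[0][j] * pfaffian(minor)
    return total


def block_antisymmetric(c, g_diag):
    """Return c * [[0, G], [-G, 0]] for G = diag(g_diag)."""
    m = len(g_diag)
    a = [[0] * (2 * m) for _ in range(2 * m)]
    for i, g in enumerate(g_diag):
        a[i][i + m] = c * g
        a[i + m][i] = -c * g
    return a


def expected_block_pfaffian(c, g_diag):
    """Closed form (-1)^{m(m-1)/2} * c^m * det(G) for the block matrix above."""
    m = len(g_diag)
    det_g = 1
    for g in g_diag:
        det_g *= g
    return (-1) ** (m * (m - 1) // 2) * c ** m * det_g
```

With $c = -i\,(s_k - s_l)$ and $\det(M) = \det(G)^2$ for the block matrix (4.87), the closed form reproduces, up to the orientation sign, the factor $(-i)^m\, (s_k - s_l)^m\, \sqrt{\det(M)}$ contributed by one block to (4.89).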
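We can now evaluate the localization integral

$$
\int_{\mathfrak{h}} \left[\frac{d\phi}{2\pi}\right] t_2^{d_0}\, \frac{(d\alpha')^{d_0}}{d_0!}\; e^{-i\,t_2\,\langle \alpha', V_\phi \rangle}
= \left(\frac{4i}{N^2}\right)^{d_0} \int_{\mathfrak{h}} \left[\frac{d\phi}{2\pi}\right] t_2^{d_0}\, \mathrm{pfaff}\big(\tilde{A}\big)\; e^{-2i\,t_2\, \phi^i M_{ij} \phi^j} \bigwedge_{i=1}^{2d_0} \beta^i, \tag{4.90}
$$

where $\phi = \phi^i\, h_i = \phi^{kl;i}\, h_{kl;i}$. The oscillatory gaussian integral is defined by analytic continuation $t_2 \to t_2 - i\,\varepsilon$ for a small positive parameter $\varepsilon$, which we are free to do as the partition function is formally independent of $t_2$. With this continuation understood and a suitable orientation of the vector space $\mathfrak{h}$, we readily compute

$$
\begin{aligned}
\int_{\mathfrak{h}} \left[\frac{d\phi}{2\pi}\right] t_2^{d_0}\, \frac{(d\alpha')^{d_0}}{d_0!}\; e^{-i\,t_2\,\langle \alpha', V_\phi \rangle}
&= \left(\frac{4i}{N^2}\right)^{d_0} \left(\frac{1}{2\pi}\right)^{2d_0} \left(-\frac{\pi}{2i}\right)^{d_0} \frac{\mathrm{pfaff}\big(\tilde{A}\big)}{\sqrt{\det(M)}} \bigwedge_{i=1}^{2d_0} \beta^i \\
&= \frac{i^{d_0}}{(2\pi N^2)^{d_0}} \prod_{k>l} (s_k - s_l)^{|n_k - n_l| + 1} \bigwedge_{i=1}^{2d_0} \beta^i.
\end{aligned} \tag{4.91}
$$

This integral thus produces a measure on $\mathfrak{s}$ which we will use below to perform the remaining integral over the stabilizer.

Integration over $E_1$. Now that the $\phi$-integration in (4.76) is localized onto $\mathfrak{s}$, we can proceed to evaluate the integral over $E_1$. This space has a basis $K_i$ with block decomposition $K_{kl;i}$ similar to (4.84) for $n \ge k > l \ge 1$ (for $k < l$ the $K_{kl;i}$ do not exist), which are non-vanishing if $n_k > n_l + 1$. We need to evaluate $\langle \alpha, V_s \rangle$ for $s \in \mathfrak{s}$ up to second order in the fluctuations about the critical point in $E_1$, which is non-tangential to the gauge orbit $\mathcal{O}_{\max}$. For this, we introduce real linear coordinates $x^i, y^i$, $i = 1, \dots, d_1$ on $E_1$ such that a generic vector $V \in E_1$ is parametrized as

$$
V = x^i\, K_i + y^i\, J(K_i) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & z^i X_i & 0 \\ 0 & \bar{z}^i X_i^\dagger & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \tag{4.92}
$$

where we have introduced complex coordinates $z^i = x^i + i\, y^i$. Then $\gamma^i = dx^i$ and $\gamma^{i+d_1} = dy^i$ for $i = 1, \dots, d_1$. As above, we can choose coordinates such that $G_{ij} = 2\,\mathrm{Tr}\, X_i\, X_j^\dagger$ is diagonal. Then (4.37) gives

$$
\langle \alpha, V_s \rangle = -\mathrm{Tr}\big(\mathrm{ad}_s(V)\,\mathrm{ad}_{C_0}(V)\big) = \big(x^i, y^i\big)\, \tilde{M}_{ij}(s) \begin{pmatrix} x^j \\ y^j \end{pmatrix} \tag{4.93}
$$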
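to second order, where

$$
\tilde{M}_{ij}(s) = \mathrm{Tr}\big(K_i\,\mathrm{ad}_s\,\mathrm{ad}_{C_0}(K_j)\big) = (s_k - s_l)\, c_{kl}\,\mathrm{Tr}(K_i\, K_j) = (s_k - s_l)\, c_{kl} \begin{pmatrix} G & 0 \\ 0 & G \end{pmatrix}_{ij} \tag{4.94}
$$

is a symmetric matrix and we have used the obvious analog of (4.85) for the basis $K_i$. Similarly, the antisymmetric matrix $A$ in (4.70) can be expressed as

$$
A_{ij} = 2i\,\mathrm{Tr}\big(K_{kl;i}\,\mathrm{ad}_{C_0}(K_{kl;j})\big) = 2\, c_{lk}\,\mathrm{Tr}\big(K_{kl;i}\, J(K_{kl;j})\big) = 2\, c_{kl} \begin{pmatrix} 0 & G \\ -G & 0 \end{pmatrix}_{ij}, \tag{4.95}
$$

and using (4.94) its pfaffian is therefore given by

$$
\mathrm{pfaff}(A) = 2^{d_1}\, \sqrt{\det \tilde{M}(s)}\, \prod_{k>l} (s_k - s_l)^{1 - |n_k - n_l|}. \tag{4.96}
$$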
The pfaffians $\mathrm{pfaff}(\tilde A)$ and $\mathrm{pfaff}(A)$ represent the S-equivariant Euler classes in $H_S(\mathcal{O}_{\max})$ of equivariant bundles over $\mathcal{O}_{\max}$ with fibres $E_0$ and $E_1$, respectively, in terms of the weights $s_k$ for the (trivial) S-action on $\mathcal{O}_{\max}$. They are the typical representatives of fluctuations in equivariant localization [29,41], and they also appear in the nonabelian localization formulas of [27] and of [26]. Using the analytic continuation $t_1 \to t_1 - i\,\varepsilon$ and a suitable orientation of $E_1$ as before, we can now evaluate the oscillatory gaussian integral

$$ \int_{E_1} \prod_{i=1}^{d_1} dx^i\,dy^i\; t_1^{d_1}\, e^{-i\,t_1 \langle\alpha, V_s\rangle} = \frac{\pi^{d_1}}{i^{d_1}}\, \frac{1}{\sqrt{\det \tilde M(s)}}. \tag{4.97} $$

Symplectic integral over $F_{\max}$. Putting the results (4.76), (4.91), (4.96) and (4.97) together, we may evaluate the large $t_1, t_2$ limit of the symplectic integral (4.78) to obtain
dφ 1 exp d(t1 α + t2 α ) − i t1 α + t2 α , Vφ Z max = vol(G) g×Fmax 2π g
=
× e − i Tr (C0 φ)− 2 π d1 1 vol(G)
×
=
i d0 (2π N 2 )d0
i
1 N d−d0 −d1
Tr (φ 2 )
⎛
Omax
⎝
s
d−d 0 −d1 & j=1
ds 2π
! ' k>l
⎞ λ
j⎠
(sk − sl )|n k −nl |+1 )
∧
2d &0
pfaff(A) ˜ det M(s)
β
i
e −i
i=1
! n g ds i d0 −d1 ' √ 1 (s)2 e − i Tr (C0 s)− 2 n k d −d 0 1 vol(G) (2π ) 2π Rn k=1 ⎛ ⎞ 2d d−d 0 −d1 & &0 1 ⎝ × d+d −d λj⎠ ∧ βi , N 0 1 Omax j=1
Tr (C0 s)− g2 Tr (s 2 )
i=1
Tr (s 2 )
(4.98)
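where we have transformed the integration over $\phi = s = \mathrm{diag}(s_1 \mathbb{1}_{n_1}, \dots, s_n \mathbb{1}_{n_n}) \in \mathfrak{s}$ to an integral over $s = (s_1, \dots, s_n) \in \mathbb{R}^n$. We can carry out the integral over the moduli space $\mathcal{O}_{\max}$ by observing again that

$$ \frac{1}{N^{d + d_0 - d_1}} \int_{\mathcal{O}_{\max}} \Big( \bigwedge_{j=1}^{d - d_0 - d_1} \lambda_j \Big) \wedge \bigwedge_{i=1}^{2d_0} \beta_i = \int_{G/S}\, \bigwedge_{j=1}^{d + d_0 - d_1} \eta_j = \frac{\mathrm{vol}(G)}{\mathrm{vol}(S)}, \tag{4.99} $$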
where J ∗ (λi ) = ηi are left-invariant one-forms on the gauge group G. Note that (4.99) includes the integral over E 0 , and dimR (g s) = d + d0 − d1 . We also have vol(S) = √ N 2π n in our metric on s, since S = k k k U (1) ⊗ 1ln k , and C 0 (n i ) = 2n i 1ln i . Using furthermore d0 − d1 = n 2 − n which is an even integer, we may then bring (4.98) into the form Z max = = =
in
2 −n
(2π )n i
2 +n
n 2 −n
(2π ) in
n 2 +n
2 −n
(2π )n
2 +n
[ds] (s)2 e − i
Rn
i
[ds] (s)2 e − 2
Rn
N n/2 n √ nk
Rn
[d˜s ]
Tr (C0 s)− g2 Tr (s 2 )
N
i
si − g4
' (
ni i N
(
s˜k −
N nk
si2
(4.100) 2
N nl s˜l
e
− 2i
)
g N3 n i s˜i − 4
i
i
s˜i2
,
k>l
k=1
√ where s˜i := n i /N si . Completing the square of the gaussian function of s˜i in (4.100) identifies the Boltzmann weight of the action (3.15) on the solution in
non-degenerate mi (3.16). In the large N limit, we substitute (3.17) with s˜i ≈ 1 + 2N si . Neglecting terms of order N1 then reduces (4.100) to Z max ≈ ±
1 (2π )
n 2 +n
i
Rn
[ds] (s)2 e − 2
N
i
si
e
i 4
i
m i si − g4
i
si2
, (4.101)
and an application of the integral identity (4.58) leads to our final result Z max ≈ ±
1 2 (2π )n +n
e
−n N
2 −m N 4g
Rn
[ds] (s)2 e
i 4
i
m i si − g4
i
si2
. (4.102)
The exponential prefactor in this formula exhibits the shift of the vacuum action, corresponding to the modification of the trace constraint (2.22) to (3.13), by the Chern class $c_1 = m = \sum_i m_i$. The remaining contributions coincide with the classical result [19] for the contribution to the U(n) sphere partition function from the Yang-Mills instanton on $S^2$ specified by the configuration of magnetic monopole charges $m_1, \dots, m_n \in \mathbb{Z}$. In particular, using the standard manipulation of [19] one can change integration variables in (4.102) to identify the anticipated Boltzmann weight of the action (3.20).
5. Abelianization

In the following sections we will describe an alternative technique of evaluating the partition function of U(n) Yang-Mills theory on the fuzzy sphere $S_N^2$, within the framework of our symplectic model. This method can be regarded as a finite-dimensional version of the technique of abelianization for ordinary Yang-Mills theory in two dimensions [28], which can be used to derive the strong-coupling expansion of the gauge theory and agrees with the nonabelian localization. The advantage of this formalism is that it captures all classical contributions to the partition function in a single go and for any N, in contrast to nonabelian localization which requires analysis of each type of critical point individually and only yields tractable expressions in the large N classical limit. Its drawback is that it leads to somewhat cumbersome expressions for the partition function which arise from a rather different sort of localization. This is analogous to the case of gauge theory on the two-dimensional noncommutative torus whose strong-coupling expansion involves the addition of infinitely many higher Casimir operators to the usual Migdal formula [34], or its matrix model regularization which is given by a complicated combinatorial formula [33]. This complexity makes it difficult to explicitly extract the contributions from fuzzy sphere instantons, and we will examine this problem more thoroughly in the next section. Here we shall derive in detail our alternative abelianized formula for the partition function (4.1), representing yet another new solution for quantum gauge theory on the fuzzy sphere.

Let us start from the partition function in the form (4.3). The crucial observation is that the function $f : \mathfrak{g} \to \mathbb{R}$ defined by the symplectic integral

$$ f(\phi) := \frac{1}{\mathrm{vol}(G)} \int_{\mathcal{O}} \exp\Big( \omega - i\,\mathrm{Tr}(C_0\,\phi) - \frac{g}{2}\,\mathrm{Tr}\,\phi^2 \Big) \tag{5.1} $$

is gauge invariant. Analogously to what we did in Sect. 4.3, we may therefore apply the Weyl integration formula (4.56) which reduces its integral over the gauge algebra $\mathfrak{g} = \mathfrak{u}(nN)$ to an integral over the Lie algebra $\mathfrak{u}(1)^{nN}$ of the maximal torus $T = U(1)^{nN}$ of $G = U(nN)$. This rewriting of the φ-integral in (4.3) is called diagonalization or abelianization, and it can be thought of as the eigenvalue representation of the gauge theory regarded as a matrix model. In this way we may bring the partition function into the form

$$ Z = \frac{1}{(nN)!} \int_{\mathbb{R}^{nN}} \Big[ \frac{dp}{2\pi} \Big]\, e^{-\frac{g}{4}\,\mathrm{Tr}(p^2)}\, \Delta(p)^2\, Z_{\mathcal{O}}(p), \tag{5.2} $$

where

$$ Z_{\mathcal{O}}(p) = \int_{\mathcal{O}} \exp\Big( \omega - \frac{i}{2}\,\mathrm{Tr}(p\,C) \Big) \tag{5.3} $$

is the Fourier transform of the orbit O and we have identified (nN)-vectors with diagonal matrices $p = \mathrm{diag}(p_1, \dots, p_{nN}) \otimes \sigma^0$. Localization can then be applied to the symplectic integral (5.3) in three different ways, by:

1. Considering p ∈ u(N) and observing that $Z_{\mathcal{O}}(p)$ can be considered as being invariant under $p \to U^{-1}\,p\,U$ for U ∈ U(N). One can then evaluate the integral over the orbit space O directly using the Itzykson-Zuber formula (6.1) for the unitary group U(N). This is essentially the calculation that was carried out in [9], which is adapted to the present formulation in Sect. 6. It amounts to an abelian localization of the original orbit integral via the Duistermaat-Heckman theorem.
2. Considering $p \in \mathfrak{u}(nN) \otimes \sigma^0$ and applying abelian localization to the maximal torus T of the gauge group G = U(nN). This will be elaborated in detail in Sect. 7, taking advantage of a suitable polar decomposition of the orbit space. This in turn will involve a localization onto the radial $U(N_+) \times U(N_-)$-foliation, accompanied by a fluctuation integral over the moduli space of symplectic leaves.

3. Adding a localization form Qα as in Sect. 4, and applying nonabelian localization techniques to write the partition function as a sum over local contributions from Yang-Mills critical points.

Technique 3 here was of course dealt with at length in Sect. 4, and will be compared in some detail to the other two approaches below. Comparison with Technique 1 first is interesting in its own right as a comparison between the matrix model approach of [9] to gauge theory on $S_N^2$ and the results of the present paper. It is also a useful warm-up to the abelianization approach of Technique 2 which shares some of its qualitative features. We will find that the abelianization technique through the polar decomposition of the configuration space exploits the radial coordinates in a rather explicit way to describe the local geometry of Yang-Mills critical surfaces, and it may also find useful applications in related considerations.

6. Itzykson-Zuber Localization on the Configuration Space

The integral (5.3) can be evaluated immediately using the Itzykson-Zuber formula [46], which we briefly recall. If X, Y are m × m hermitian matrices with nondegenerate eigenvalues $x_i, y_i \in \mathbb{R}$, $i = 1, \dots, m$, then one has

$$ \int_{U(m)} [dU]\, \exp\Big( \frac{i N}{s}\,\mathrm{Tr}\,X\,U\,Y\,U^\dagger \Big) = c_N(m, s)\, \frac{\det_{1 \le i,j \le m} \big( e^{\frac{i N}{s}\,x_i y_j} \big)}{\Delta(x)\,\Delta(y)}, \tag{6.1} $$

where for $m \in \mathbb{N}$ and $s \in \mathbb{C}$ we have defined

$$ c_N(m, s) := \mathrm{vol}\big(U(m)\big)\, (i N / s)^{-m(m-1)/2} \prod_{k=1}^{m-1} k!. \tag{6.2} $$
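As a sanity check of the Itzykson-Zuber formula (6.1)-(6.2), the smallest nontrivial case m = 2 can be verified numerically. The sketch below is not part of the original derivation. It assumes the Haar measure is normalised so that vol(U(2)) = 1, writes t for the combination iN/s, and uses the standard fact that for a Haar-distributed U in U(2) the quantity c = |U_11|^2 is uniform on [0, 1], so the group integral collapses to a one-dimensional integral. All function names are illustrative.

```python
import cmath


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule for a complex-valued integrand (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def iz_lhs_u2(x, y, t):
    """Haar average over U(2) of exp(t Tr(X U Y U^dagger)) for diagonal X, Y.

    Tr(X U Y U^dagger) = sum_ij x_i y_j |U_ij|^2, and for U(2) the four moduli
    are fixed by c = |U_11|^2, assumed uniform on [0, 1] under Haar measure.
    """
    same = x[0] * y[0] + x[1] * y[1]    # multiplies c
    cross = x[0] * y[1] + x[1] * y[0]   # multiplies 1 - c
    return simpson(lambda c: cmath.exp(t * (same * c + cross * (1 - c))), 0.0, 1.0)


def iz_rhs_u2(x, y, t):
    """Right-hand side of (6.1) for m = 2 with vol(U(2)) = 1, i.e. c_N(2, s) = 1/t."""
    det = (cmath.exp(t * (x[0] * y[0] + x[1] * y[1]))
           - cmath.exp(t * (x[0] * y[1] + x[1] * y[0])))
    vandermonde = (x[1] - x[0]) * (y[1] - y[0])   # Delta(x) * Delta(y)
    return det / (t * vandermonde)


if __name__ == "__main__":
    x, y = (0.3, 1.1), (-0.7, 0.5)
    t = 1j * 4 / 2          # iN/s with the illustrative values N = 4, s = 2
    print(iz_lhs_u2(x, y, t), iz_rhs_u2(x, y, t))
```

The two printed numbers should agree to quadrature accuracy. The ratio becomes 0/0 when eigenvalues coincide, which is the degeneracy that has to be regularized when the formula is applied to the orbit integral.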
Applied to the present situation for U (N ), this yields 1 Z O ( p) = [dU ] exp − 2i Tr U −1 U vol (U (N+ )) vol (U (N− )) U (N ) i e − 2 i j det 1≤i, j≤N , (6.3) = c1 (N , 2) () () where = diag( p1 , . . . , pn N )⊗σ 0 and c1 (N , 2) := c1 (N , 2)/vol(U (N+ )) vol(U (N− )). This formula can be understood as an abelian localization with respect to the action of the maximal torus group U (1)N on the flag manifold U (N )/U (1)N [29]. The corresponding fixed points are the solutions of the equation [C, ] = 0,
(6.4)
which are the saddle-points of the Itzykson-Zuber integral, and the expansion of the determinant in (6.3) into a sum over permutations π ∈ SN gives the sum over critical points in the localization formula. This is completely analogous to the abelianized localization of Sect. 7. However, the expression (6.3) is formal as it stands because both sets of eigenvalues i and i are degenerate, and correspondingly the critical surfaces are in fact nontrivial spaces. Therefore (6.3) has to be defined using an appropriate limiting procedure which removes the degeneracy. The partition function (4.3) is then given by i ! e − 2 i j det g c (N , 2) dp 2 1≤i, j≤N e − 4 Tr ( p ) ( p)2 Z= 1 , (6.5) (n N )! 2π () () Rn N where the set of eigenvalues i of consists of two copies of ( p1 , . . . , pn N ) and is therefore highly degenerate. While this explicit formula in terms of an n N -dimensional integral is very appealing, the ratio of degenerate determinants in (6.5) makes it difficult to evaluate explicitly [9], and its combinatorial expansion is even more intricate than that of Sect. 7.3. Thus far only an asymptotic analysis (of a slightly modified integral) has been made possible in [9]. The reason for this complexity is the fact that, without the addition of a suitable localization form Qα to the path integral (4.3), the localization is onto the solutions of Eq. (6.4) in O which are not related to the critical surfaces of the Yang-Mills action in any simple way. This will be explored in more detail below. 7. Abelian Localization and Radial Coordinates We now return to the symplectic orbit integral (5.3), and observe that it fulfills the conditions of the Duistermaat-Heckman theorem, or equivalently the abelian version of the localization theorem of Sect. 4.1. Therefore, we have mapped the original nonabelian localization problem to the simpler problem of abelian localization. 
Indeed, µT (C), p = Tr ( p C) is just the restriction of the moment map µ : O() → u(N )∨ to the maximal torus T of the gauge group G. The torus action on the orbit space O() is the restriction of the adjoint G-action given by C −→ P C P −1
(7.1)
for C = Cµ ⊗ σ µ = U U −1 ∈ O(), U ∈ U (N ) and P ∈ T . To compute the corresponding localization formula we need the fixed points of this T -action. They are given by those C ∈ O() which commute with the T -action generated by the element p ∈ u(1)n N , so that [C, p] = 0.
(7.2)
This equation will be studied in detail in Sect. 8. It is solved by those U ∈ U (N ) for which U −1 P U lies in the stabilizer subgroup U (n N+ ) × U (n N− ) ⊂ U (N ) of the element (with N± := N ± 1 as before). The saddle points U are generically also labelled by permutation matrices ∈ U (N ) representing elements π ∈ Sn N . On the configuration space O, the saddle point equation (7.2) means that C commutes with the characteristic projectors of p, i.e. C has the same block decomposition as p. The Fourier transform (5.3) will thus generically localize onto a subspace of U (n N+ )× U (n N− ) in O. It may be evaluated with the help of the degenerate version of the Duistermaat-Heckman theorem [29], which expresses it in terms of an integral over the
critical submanifold U (n N+ ) × U (n N− ) with the quantum fluctuation determinants determined by the T -equivariant Euler class of the normal bundle to the stabilizer [41]. While this can be worked out in principle, it is rather cumbersome to do in practise. Instead we will proceed in a more direct fashion by exploiting some further geometrical properties of the configuration space O, which in the next section will be related to the local symplectic geometry near each Yang-Mills critical point as analysed at length in Sect. 3. This explicit calculation will justify the abelianized localization a priori, with the quantum fluctuation determinants given by integrals over symplectic leaves of a foliation of the configuration space parametrized by abelian subspaces of the tangent spaces to O. The symplectic integral (5.3) could also be analysed using Fourier transform techniques along with the Guillemin-Lerman-Sternberg theorem [47], as in [23–25], but this leads to much more complicated combinatorial expressions than the ones we derive. 7.1. Polar decomposition of the configuration space. The key step in the evaluation of (5.3) is the introduction of radial coordinates on the orbit space (see [36–38] for details). Let us go back to the Cartan decomposition (3.21) at a given point C ∈ O. Let t be a maximal abelian subalgebra in the tangent space TC O ∼ = ker(J 2 + 1lN ). Then the radial coordinates on the orbit space O are given by (7.3) U = V R V −1 = V R j V −1 , where V ∈ U (n N+ ) × U (n N− ), modulo elements of the centralizer of t, and R ∈ exp(t) up to the adjoint action of the Weyl group of the restricted root system of the irreducible symmetric space O. By definition, they satisfy the respective commutation and anticommutation relations V = V
R = R −1 .
and
(7.4)
The corresponding covariant coordinate C ∈ O() is then given by C = U U −1 = V R R −1 V −1 = 21 V R 2 + R −2 V −1 =
1 2
V R 2 V −1 + V R −2 V −1 .
(7.5)
The jacobian for the change of invariant integration measure on O can be computed by standard techniques with the result

$$ dC = r(n, N)\, [dV] \prod_{i=1}^{\dim(\mathfrak{t})} dr_i \prod_{\alpha > 0} \sin(\alpha, \log R)^{m_\alpha}, \tag{7.6} $$
where4 r (n, N ) =
vol (U (N ))
2n
2 (N 2 −1)/2
vol (U (n N+ ))2 vol (U (n N− ))2 2n (N −2n N −3)/2
.
(7.7)
4 The normalization constant r(n, N) is determined by the requirement $\int_{\mathcal{O}} dC = \mathrm{vol}(\mathcal{O})$.

The radial coordinates $r_i \in [0, \frac{\pi}{2}]$ are the eigenvalues of U, while V are the angular coordinates with [dV] denoting the standard invariant Haar measure. The second product runs over positive roots of the restricted root lattice on O, and $m_\alpha$ is the multiplicity of the root α in the Cartan decomposition (3.21). The pairing is defined by choosing an orthonormal basis $e_i$ in weight space and identifying a root vector α with the dual element $\alpha^\vee = \sum_i \alpha_i e_i$. Then $(\alpha, \log R) = \sum_i \alpha_i r_i$. This polar decomposition defines a foliation of the configuration space O by conjugacy classes under the adjoint action of the stabilizer subgroup. The radial symplectic leaves L(R) of this foliation are parametrized by the abelian Lie group exp(t).

Let us make this decomposition more explicit using the known data for the symmetric space (2.23) [37]. The restricted root lattice is given by the root system $BC_{nN_-} = B_{nN_-} \cup C_{nN_-}$, which has positive weights $e_i \pm e_j$, $2e_i$ and $e_i$ with $i, j = 1, \dots, nN_-$, $i < j$. The corresponding multiplicities are $m_{e_i \pm e_j} = 2$, $m_{2e_i} = 1$ and $m_{e_i} = 4n$. The gauge invariant volume form on O thereby becomes

$$ dC = r(n, N)\, [dV] \prod_{i=1}^{nN_-} dr_i\, \sin 2r_i\, \sin^{4n} r_i \prod_{i<j} \sin^2(r_i - r_j)\, \sin^2(r_i + r_j). \tag{7.8} $$

Using the trigonometric identities

$$ \sin(r_i - r_j)\,\sin(r_i + r_j) = \tfrac{1}{2}\,(\cos 2r_j - \cos 2r_i) \quad \text{and} \quad \sin^2 r_i = \tfrac{1}{2}\,(1 - \cos 2r_i) \tag{7.9} $$

and defining $\lambda_i := \cos 2r_i \in [-1, 1]$, we may bring the measure to the form

$$ dC = \frac{r(n, N)}{2^{\,n^2 (N^2 - 1)}}\, [dV]\, \Delta(\lambda)^2 \prod_{i=1}^{nN_-} d\lambda_i\, (1 - \lambda_i)^{2n}. \tag{7.10} $$
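The passage from the trigonometric measure (7.8) to the polynomial measure (7.10) rests on the identities (7.9), the Jacobian $|d\lambda_i / dr_i| = 2 \sin 2r_i$, and the counting of powers of 2, which combine to $2^{-n^2(N^2-1)}$ for $nN_-$ radial coordinates with $N_- = N - 1$. The following sketch, with illustrative function names and arbitrary test angles, checks pointwise that the two radial weights agree once the Jacobian is included. It omits the common factor r(n, N) [dV].

```python
import math
from itertools import combinations


def measure_r(r, n):
    """Radial weight of (7.8) in the angles r_i, without r(n, N) [dV]."""
    w = 1.0
    for ri in r:
        w *= math.sin(2 * ri) * math.sin(ri) ** (4 * n)
    for ri, rj in combinations(r, 2):
        w *= math.sin(ri - rj) ** 2 * math.sin(ri + rj) ** 2
    return w


def measure_lambda(r, n, N):
    """Radial weight of (7.10) in lambda_i = cos(2 r_i), times |d lambda / d r|.

    The list r must contain n * (N - 1) angles, one per radial coordinate.
    """
    lam = [math.cos(2 * ri) for ri in r]
    w = 2.0 ** (-(n * n) * (N * N - 1))
    for li in lam:
        w *= (1 - li) ** (2 * n)
    for li, lj in combinations(lam, 2):
        w *= (li - lj) ** 2            # squared Vandermonde determinant
    for ri in r:
        w *= 2 * math.sin(2 * ri)      # Jacobian |d lambda_i / d r_i|
    return w


if __name__ == "__main__":
    # n = 1, N = 3: two radial coordinates
    print(measure_r([0.3, 1.1], 1), measure_lambda([0.3, 1.1], 1, 3))
    # n = 2, N = 2: two radial coordinates
    print(measure_r([0.4, 0.9], 2), measure_lambda([0.4, 0.9], 2, 2))
```

Each printed pair should coincide up to floating-point rounding.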
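A convenient choice for the radial coordinates is provided by setting

$$ \rho := \mathrm{diag}(r_1, \dots, r_{nN_-}) \tag{7.11} $$

and defining

$$ R = \mathrm{diag}\big( \sigma^0 \otimes \mathbb{1}_n,\ \exp(i\,\sigma^1 \otimes \rho) \big) = \mathrm{diag}\big( \sigma^0 \otimes \mathbb{1}_n,\ \sigma^0 \otimes \cos(\rho) + i\,\sigma^1 \otimes \sin(\rho) \big). \tag{7.12} $$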
We also choose a basis in which =
N 2
diag 1ln N+ , −1ln N− =
N 2
diag σ 0 ⊗ 1ln , σ 3 ⊗ 1ln N−
(7.13)
and V ∈ U (n N+ ) × U (n N− ) is given by V = diag(V+ , V− ),
(7.14)
with V± ∈ U (n N± ) and [dV ] = [dV+ ] [dV− ]. The relations (7.4) are then automatically satisfied.
7.2. Evaluation of the abelianized partition function: U (1) gauge theory. We will now explicitly evaluate the Fourier transform (5.3), beginning with the abelian case n = 1. Using (7.5) and (7.11)–(7.14), it is straightforward to work out the abelian moment map in (5.3) with the result µT (C), p = Tr p U U −1 = 21 Tr p V (R 2 + R −2 ) V −1 = Tr p V diag σ 0 , cos(2σ 1 ⊗ ρ) V −1 = N2 Tr diag( p1 σ 0 , p2 , . . . , p N ) V+ diag(σ 0 , λ1 , . . . , λ N− ) V+−1 − N2 Tr diag( p2 , . . . , p N ) V− diag(λ1 , . . . , λ N− ) V−−1 , (7.15) where we have used an inconsequential redefinition of the unitary matrix V+ by multiplication with an appropriate permutation matrix. Upon substitution into (5.3), we see that the two angular integrals decouple from each other. The integral over V− ∈ U (N− ) is now easily evaluated with the help of (6.1) with the result c N (N− , 4) ( p2 , . . . , p N ) (λ)
sgn(π− )
π− ∈SN−
N− '
e
iN 4
pi+1 λπ− (i)
.
(7.16)
i=1
The integral over V+ ∈ U (N+ ) is more delicate since the Itzykson-Zuber formula will involve a ratio of degenerate determinants. Since both numerator and denominator of (6.1) are completely antisymmetric functions of the eigenvalues xi and yi independently, the limit where some eigenvalues coalesce gives a well-defined analytic function in (xi , yi ) because all poles are cancelled by zeroes in the determinant. We will regularize the V+ -integral by replacing the first p1 entry in the last line of (7.15) with an auxiliary momentum variable p0 ∈ R, the second entry of 1 with an auxiliary radial variable λ0 ∈ [−1, 1], and then afterwards take the limits p0 → p1 , λ0 → 1. Defining λ N := 1, the Itzykson-Zuber formula (6.1) applied to the regularized V+ -integral yields c N (N+ , −4) ( p0 , p1 , . . . , p N ) (λ0 , λ1 , . . . , λ N ) ×
sgn(π+ ) e
− i 4N p0 λπ+ (N )
π+ ∈SN+
N− '
e−
iN 4
pi+1 λπ+ (i)
.
(7.17)
i=0
Taking the limit p0 → p1 first using l’Hôpital’s rule gives iN 4
p1
N
c N (N+ , −4)
( p1 − pi ) ( p) (λ0 , λ1 , . . . , λ N )
i=2
×
π+ ∈SN+
sgn(π+ ) λπ+ (N ) e −
iN 4
p1 λπ+ (N )
N− ' i=0
e−
iN 4
pi+1 λπ+ (i)
.
(7.18)
Finally, taking the limit λ0 → 1 again using l’Hôpital’s rule yields − i4N c N (N+ , −4) p1
N
( p1 − pi ) ( p)
i=2
N −
(7.19)
(1 − λi )2 (λ)
i=1
− i 4N p1 λ N− 1 − i N e − i4N p1 λ e − i4N p1 λ1 . . . λ − i4N p1 e e 1 N − iN iN 4 − i 4N p1 λ N− − i 4N p1 λ1 − i4N p1 − 4 p 1 e − 4 p1 e ... e e . × .. .. .. .. . . . . − i N p e − i4N p N e − i4N p N λ1 . . . e − i4N p N λ N− e − i4N p N N 4 Substituting the above into (5.3) gives us the expression
4 vol(O) Z O ( p) = − p1 ( p)2
×
N ! (N − 1)! √
sgn(π− )
π− ∈SN−
8N
N− '
N −2
(k!)2
k=1 N 2 −N
e
iN 4
N− ' l=1
1
−1
dλl
pi+1 λπ− (i)
i=1
− i 4N p1 λ N− 1 − i N e − i4N p1 λ e − i4N p1 λ1 . . . λ − i4N p1 e e 1 N − iN iN 4 − i 4N p1 λ N− − i 4N p1 λ1 − i4N p1 − 4 p 1 e − 4 p1 e ... e e . × .. .. .. .. . . . . − i N p e − i4N p N e − i4N p N λ1 . . . e − i4N p N λ N− e − i4N p N N 4 (7.20) We will now write the product of determinants in (7.20) as a single sum over the Weyl group S N of the original gauge symmetry group U (N ). For this, we embed S N− in the Weyl group S N as the subgroup of permutations π− of {1, . . . , N− , N } with π− (N ) = N . We perform a Laplace expansion of the second determinant in (7.20) into minors along the first row to write − i 4N p1 λ N− 1 − i N e − i4N p1 λ e − i4N p1 λ1 . . . λ − i4N p1 e e 1 N − iN iN 4 − i 4N p1 λ N− − i 4N p1 λ1 − i4N p1 − 4 p 1 e − 4 p1 e ... e e .. .. .. .. . . . . − i N p e − i4N p N e − i4N p N λ1 . . . e − i4N p N λ N− e − i4N p N N 4 + N '
iN iN = sgn(π+ ) 1 − i4N p1 e − 4 p1 e − 4 λi pπ+ (i) π+ ∈SN
−
i=1
N iN iN λi e − 4 4 i=1
λi p1
pπ+ (i) e −
iN 4
pπ+ (i)
⎤ N ' k=1 k=i
e−
iN 4
⎥ ⎦ . (7.21)
λk pπ+ (k) ⎥
When inserted into the expression (7.20), we can use the invariance of the radial integration measure and domain under permutations of the λi ’s to reduce the double sum over the Weyl groups to a single sum over the relative permutation π := π+ π−−1 ∈ S N with π(N ) = π+ (N ). The sum over π+ can be replaced by a sum over π , while the remaining sum over π− simply produces the order N ! of the Weyl group of U (N ). In this way we may bring the Fourier transform of the orbit into the form N
4 vol(O) Z O ( p) = − p1 ( p)2 ×
(k!)2
k=1
(N − 1)!
sgn(π )
√ N 2 −N 8N
1−
iN 4
p1
e−
iN 4
( p1 + pπ(N ) )
π ∈SN
×
N− ' i=1
−
1 −1
dλi e −
iN 4
λi ( pπ(i) − pi+1 )
N− iN iN pπ( j) e − 4 4
( pπ( j) + pπ(N ) )
j=1
×
−
1 −1
iN 4
dλ j λ j e −
iN 4
λ j ( p1 − p j+1 )
N− ' i=1 i= j
pπ(N ) e −
iN 4
( pπ(N ) + p1 )
N− ' i=1
1 −1
1 −1
dλi e −
iN 4
λi ( pπ(i) − pi+1 )
⎤ dλi e −
iN 4
λi ( pπ(i) − pi+1 ) ⎦
.
(7.22)
Finally, the radial integrations can be expressed in terms of the spectral sine-kernel of the unitary ensemble of random matrix theory and its derivative given by 1 1 sin x = dλ e − i λ x and K(x) := x 2 −1 1 sin x i 1 cos x − =− dλ λ e − i λ x . (7.23) K (x) = x x 2 −1 Then the abelianized partition function (5.2) is written as an exact expansion in gaussian momentum transforms given by 8 vol(O) N ! (N − 1)! Z =− 2 N (2π ) N 2
⎡
× ⎣ 1−
iN 4
p1
√ 2N
e
N −2
(k!)2
k=1 N 2 −N
− i 4N ( p1 + pπ(N ) )
g
e − 4N sgn(π ) [d p] p1 RN π ∈SN N− ' i=1
K
N 4
( pπ(i) − pi+1 )
i
pi2
Localization for Yang-Mills Theory on the Fuzzy Sphere N− iN N pπ( j) e − 4 4
+
( pπ( j) + pπ(N ) )
K
N 4
j=1
−
iN 4
pπ(N ) e −
iN 4
( p1 + pπ(N ) )
241
( p1 − p j+1 )
N− '
K
N 4
( pπ(i) − pi+1 )
i=1 i= j
N− '
K
N 4
⎤ ( pπ(i) − pi+1 ) ⎦ .
(7.24)
i=1
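The two integral representations in (7.23) that turn the radial integrations into sine-kernels can be checked by direct quadrature. The sketch below reads (7.23) as $K(x) = \sin x / x = \frac{1}{2} \int_{-1}^{1} d\lambda\, e^{-i \lambda x}$ and $K'(x) = \frac{1}{x} (\cos x - \frac{\sin x}{x}) = -\frac{i}{2} \int_{-1}^{1} d\lambda\, \lambda\, e^{-i \lambda x}$. The function names and the Simpson rule are illustrative choices, not taken from the paper.

```python
import cmath
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule for a complex-valued integrand (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def sine_kernel(x):
    """Closed form K(x) = sin(x) / x for x != 0."""
    return math.sin(x) / x


def sine_kernel_prime(x):
    """Closed form K'(x) = (cos(x) - sin(x) / x) / x for x != 0."""
    return (math.cos(x) - math.sin(x) / x) / x


def sine_kernel_integral(x):
    """(1/2) * integral over [-1, 1] of exp(-i lambda x)."""
    return 0.5 * simpson(lambda lam: cmath.exp(-1j * lam * x), -1.0, 1.0)


def sine_kernel_prime_integral(x):
    """-(i/2) * integral over [-1, 1] of lambda * exp(-i lambda x)."""
    return -0.5j * simpson(lambda lam: lam * cmath.exp(-1j * lam * x), -1.0, 1.0)


if __name__ == "__main__":
    for x in (0.5, 2.0, 7.3):
        print(x, sine_kernel(x), sine_kernel_integral(x))
        print(x, sine_kernel_prime(x), sine_kernel_prime_integral(x))
```

Since the integral of $\sin(ax)/(ax)$ over the real line equals $\pi / a$, the rescaled kernel $K(\frac{N}{4} x)$ carries total weight $4\pi / N$, which is the normalization of the delta-function it approaches at large N.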
For low values of N, the momentum integrals in this formula can be computed in terms of transcendental error functions, which are the typical contributions in nonabelian localization [18] and reflect the occurrence of non-gaussian quantum fluctuation integrals. Note that there is a single momentum $p_1$ singled out in the formula (7.24). In the U(n) case of Sect. 7.3 below there will be n momenta singled out, which is where the sum over sets of n integers required by the nonabelian localization formula in the large N limit will come from. At N → ∞, the spectral kernels $K\big( \frac{N}{4} (p_{\pi(i)} - p_{i+1}) \big) \approx \frac{4\pi}{N}\, \delta(p_{\pi(i)} - p_{i+1})$ provide the necessary groupings of variables into partitions of N arising from the sum over the residual gauge symmetry group $S_N$. The conjugacy class of a given permutation $\pi \in S_N$ is characterized entirely by its cycle decomposition, which contains $n_k \ge 0$ cycles of length k for $k = 1, \dots, N$ with $N = \sum_k k\, n_k$ and $\mathrm{sgn}(\pi) = (-1)^{\sum_k (k-1)\, n_k}$. However, the saddle-point partitions here do not correspond to the cycles themselves, but rather to the numbers $N_{n_1, \dots, n_N}$ of cycles $(n_1, \dots, n_N)$. For instance, the vacuum state now corresponds to the instanton configuration with N fluxons, i.e. only trivial representations due to the abelianization, with moduli space (3.11) as described in Sect. 3.1. The higher critical points consist of an even number of irreducible representations which are suppressed roughly as $e^{-N^3 / 2g\, n_i}$. This indicates that the radial coordinates on the configuration space O are not so nicely adapted to the local symplectic geometry of the Yang-Mills critical surfaces. We will return to these issues in the next section.
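The statement that a permutation's conjugacy class is fixed by its cycle type, with $N = \sum_k k\, n_k$ and $\mathrm{sgn}(\pi) = (-1)^{\sum_k (k-1) n_k}$, can be illustrated with a few lines of code. The sketch below uses illustrative helper names and represents permutations as tuples on 0, ..., N-1. It computes the cycle type and compares the resulting sign with the parity of the number of inversions.

```python
from collections import Counter
from itertools import permutations


def cycle_type(perm):
    """Return {k: n_k}, the number n_k of cycles of length k in perm."""
    seen = [False] * len(perm)
    counts = Counter()
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:          # walk around the cycle containing start
            seen[j] = True
            j = perm[j]
            length += 1
        counts[length] += 1
    return dict(counts)


def sign_from_cycles(perm):
    """sgn(pi) = (-1)^(sum_k (k - 1) n_k)."""
    exponent = sum((k - 1) * nk for k, nk in cycle_type(perm).items())
    return -1 if exponent % 2 else 1


def sign_from_inversions(perm):
    """sgn(pi) from the parity of the number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


if __name__ == "__main__":
    pi = (1, 0, 2, 4, 3)            # two 2-cycles and one fixed point
    print(cycle_type(pi), sign_from_cycles(pi))
    print(all(sign_from_cycles(p) == sign_from_inversions(p)
              for p in permutations(range(5))))
```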
7.3. Evaluation of the abelianized partition function: U (n) gauge theory. The nonabelian case n > 1 becomes very complicated due to the increasing complexity of the combinatorics involved in regulating the Itzykson-Zuber integral (6.1) over V+ ∈ U (n N+ ). We will therefore only briefly sketch the essential features, deferring the explicit evaluation in favour of a more formal, regulated combinatorial expansion. Consider the radial coordinates λi , i = 1, . . . , n N− on O and add 2n new real variables 1 + εi . We assemble them into the ordered set defined by
λ1 , . . . , λn N+ := 1 + ε1 , . . . , 1 + ε2n , λ1 , . . . , λn N− .
(7.25)
Similarly, we double the first n entries of the momentum vector p = ( p1 , . . . , pn N ) and gather them into the ordered set defined by
p 1 , . . . , p n N+ := ( p1 + κ, . . . , pn + κ, p1 , . . . , pn , pn+1 , . . . , pn N ) . (7.26)
At the end we will take the limits εi , κ → 0. The evaluation of the Fourier transform (5.3) now proceeds exactly as in Sect. 7.2 above. To organize the combinatorics, we use the identity
det
1≤i, j≤n N+
lim
e−
iN 4
pi λ j
(ε)
iN vol (U (2n)) = sgn Q → { pi } e − 4 c N (2n, −4) Q⊂{ pi } − i 4N pˆ i λ j , × e
(7.27)
εi →0
qi
i
(q)
det
1≤i, j≤n N−
where { pˆ 1 , . . . , pˆ n N− } = { p 1 , . . . , p n N+ } \ Q with Q = {q1 , . . . , q2n } a subset of { p 1 , . . . , p n N+ } which is ordered according to (7.26), and the sign is determined by the parity of the embedding. The identity (7.27) can be derived by performing a Laplace expansion of the determinant on the left-hand side into the 2n rows containing the variables 1 + εi , and using the limit formula iN e − 4 qi ε j det vol (U (2n)) 1≤i, j≤2n lim = (q), (7.28) εi →0 (ε) c N (2n, −4) which follows from the Itzykson-Zuber formula (6.1). The Vandermonde determinants can also be factorized as '
n N−
( λ ) = (λ) (ε)
(1 − λi )2n
(7.29)
i=1
up to higher order terms in εi → 0, along with ( p ) ( pn+1 , . . . , pn N ) = κ n ( p)2 ( p1 , . . . , pn )2
(7.30)
in the limit κ → 0. In this way the partition function (5.2) can be expanded as Z = ζn,N lim
κ→0
1 κn
Q⊂{ pi } g
sgn Q → { pi }
e − 4N i pi − i4N i qi (q) × [d p] e ( p1 , . . . , pn )2 Rn N n N− 1 iN ' e 4 pi+n λ j det dλl det × l=1
−1
2
1≤i, j≤n N−
1≤i, j≤n N−
e−
iN 4
pˆi λ j
,
(7.31)
where ( i N )2n +n N (1−n N+ ) vol(O) (n N )! (2π )n N 2n N− (2−n N+ ) 2
ζn,N :=
n N− −1
'
k=1
(k!)2
2n ' (m + n N− − 1)! . m!
m=1
(7.32) We now expand the two determinants in (7.31) into a double sum over the Weyl group Sn N− , and use permutation symmetry of the radial integration to rewrite it as a sum
over a single relative permutation exactly as in Sect. 7.2 above. Using (7.23) we arrive finally at the exact combinatorial expansion Z = 2n N− (n N− )! ζn,N lim
κ→0
×
Rn N
g
1 κn
Q⊂{ pi }
iN e − 4N i pi [d p] e− 4 2 ( p1 , . . . , pn ) 2
sgn Q → { pi }
sgn(π )
π ∈Sn N− i
'
n N− qi
(q)
K
N 4
( pˆ π(i) − pi+n ) .
i=1
(7.33) The combinatorics of the large function (7.33) can be described
N limit of the partition as follows. The sine-kernels K N4 ( pˆ π(i) − pi+n ) ≈ 4π N δ( pˆ π(i) − pi+n ) define a link from pˆ π(i) to pi+n . Following these, we obtain a set of open or closed links determined by π ∈ Sn N− . The open links must start at { p1 +κ, . . . , pn +κ, p1 , . . . , pn } (since those are not contained in the pi+n ) and end at {q1 , . . . , q2n } (since those are not contained in the pˆ i ). The closed links correspond to cycles in the conjugacy class of the permutation π . In iN particular, there are no factors e − 4 pi , i = 1, . . . , n or ( p1 , . . . , pn )2 for the internal variables, and hence we can explicitly evaluate the internal integrals. The difficulty lies in evaluating the sum over all possible distinct cycles for the internal variables in a closed form. Comparison with the constrained matrix model. In [9], quantum gauge theory on the fuzzy sphere S N2 was formulated as a multi-matrix model with action Smm =
1 Ng
Tr C 2 −
N2 4
1lN
2
(7.34)
and the constraint $C_0 = \frac{1}{2}\,\mathbb{1}_N$. It was shown that this matrix model also reproduces Yang-Mills theory on $S^2$ in the large N limit. This differs from the formulation of the present paper essentially by replacing the pair (action, constraint) given by $\big( (C^2 - \frac{N^2}{4}\,\mathbb{1}_{\mathcal{N}})^2,\ (C_0 - \frac{1}{2}\,\mathbb{1}_N) \big)$ with the permuted pair $\big( (C_0 - \frac{1}{2}\,\mathbb{1}_N)^2,\ (C^2 - \frac{N^2}{4}\,\mathbb{1}_{\mathcal{N}}) \big)$. This can be understood by imposing the respective constraints using gaussian terms in the actions, as then the tangential degrees of freedom are essentially the same in both cases. The symplectic formulation of the present paper has not only the advantage of applying the equivariant localization principle to systematically construct the instanton expansion of gauge theory on the fuzzy sphere, but it also somewhat simplifies the evaluation of the matrix integral. It also enables one in principle to keep control of the 1/N corrections to Yang-Mills theory on $S^2$, and the approximate delta-functions at N → ∞ responsible for the groupings of variables are more transparent along the lines explained in Sects. 7.2 and 7.3.
8. Yang-Mills Critical Surfaces in Abelianized Localization

In this final section we will elucidate the relationship between the nonabelian and abelianized localization approaches to the exact instanton expansion of Yang-Mills theory
on the fuzzy sphere S N2 . As discussed above, the critical surfaces for abelian localization are determined by the saddle-point equation (6.4), (7.2), [C, ] = 0
(8.1)
φ ⊗ σ0
for = with φ ∈ u(N ), which can be assumed to be diagonal by using a gauge transformation. Its distinct eigenvalues ν are arranged into degenerate blocks as = with
ν
k
ν 1ln ν ⊗ σ 0
(8.2)
ν=1
n ν = N . Then [C, ] = 0 implies that the covariant coordinate C = U −1 U =
k
Cν
(8.3)
ν=1
has the same block decomposition as . Thus it can be diagonalized as Cν = Vν−1 ν Vν ,
(8.4)
where Vν is a 2n ν × 2n ν unitary matrix on the block defined by 1ln ν ⊗ σ 0 in (8.2), and ν has eigenvalues ± N2 . Then comparing (8.3) and (8.4) implies k k k −1 −1 Vν U U Vν ν = −1 (8.5) = ν=1
ν=1
ν=1
for some permutation matrix ∈ U (N ) representing an element π ∈ SN /S N+ ×S N− , since both and ν ν are diagonal N × N matrices with the same set of degenerate eigenvalues. It follows that k −1 U Vν (8.6) −1 ∈ U (N+ ) × U (N− ), ν=1
and therefore U ∈ U (N ) is equal to ν Vν times an element of the stabilizer subgroup U (N+ ) × U (N− ) ⊂ U (N ) of the element . We conclude that the gauge equivalence classes of solutions of the saddle point equation [C, ] = 0 in the configuration space O are described by the following data:
• A quotient permutation π ∈ SN /S N+ × S N− ;
• A unitary matrix in the stabilizer group U (N+ ) × U (N− ); and
• A unitary block transformation ν Vν adapted to the block decomposition (8.2) of .

It is evident that these critical surfaces are much larger than the critical surfaces of the original Yang-Mills action, and they are not even in any one-to-one correspondence with the Yang-Mills saddle points. Any such block configuration is degenerate for the action in (4.3), and contains some Yang-Mills blocks of Sect. 3.4 (with the irreducible low-energy critical surface C(N ,1) and possibly fluxons or other purely noncommutative solutions). The reason is the absence of any localization form Qα, without which there is no way to separate the desired Yang-Mills blocks of Sect. 3.4 from these abelianized critical surfaces.
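To get a feeling for the size of the first item in the data above, one can count the quotient permutations. The sketch below assumes that the symmetric group being quotiented acts on $N_+ + N_-$ diagonal entries, of which $N_+$ carry the eigenvalue $+\frac{N}{2}$ and $N_-$ carry $-\frac{N}{2}$. The cosets are then in bijection with the distinct sign arrangements, and their number is the binomial coefficient $(N_+ + N_-)! / (N_+!\, N_-!)$. The function name and the brute-force enumeration are purely illustrative.

```python
from itertools import permutations
from math import comb, factorial


def coset_representatives(n_plus, n_minus):
    """Distinct arrangements of n_plus '+' and n_minus '-' signs on the diagonal.

    Two permutations give the same arrangement exactly when they differ by an
    element that only shuffles equal signs among themselves.
    """
    signs = ("+",) * n_plus + ("-",) * n_minus
    return {tuple(signs[i] for i in perm)
            for perm in permutations(range(len(signs)))}


if __name__ == "__main__":
    for N in (2, 3):
        n_plus, n_minus = N + 1, N - 1
        count = len(coset_representatives(n_plus, n_minus))
        print(N, count, comb(n_plus + n_minus, n_plus))
```

The count grows rapidly with N, in line with the remark that these abelianized critical surfaces are much larger than those of the Yang-Mills action.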
8.1. Itzykson-Zuber localization on the symplectic leaves. We now consider the foliation of the orbit O ≅ U(2N)/R by conjugacy classes under the adjoint action of the stabilizer group $R = U(N_+) \times U(N_-)$. The corresponding symplectic leaves L(λ) are parametrized by the radial coordinates $\lambda_i \in [-1, 1]$, $i = 1, \dots, N_-$. For a given leaf L(λ), the integral $\int_R [dV]\, e^{-\frac{i}{2} \langle \mu_T(C), p \rangle}$ is obtained by using the Itzykson-Zuber formula for the unitary groups $U(N_+)$ and $U(N_-)$, as we did in Sects. 7.2 and 7.3. As in Sect. 6 above, the Itzykson-Zuber formula can itself be regarded as a consequence of abelian localization, and the expansion of the resulting determinants in Sect. 7.2 is precisely the sum over the saddle-points on each leaf L(λ). Let us identify these saddle-points explicitly. Choosing the basis as in (7.13), the critical points of the moment map (7.15) with respect to arbitrary variations of $(V_+, V_-) \in R$ are given by the solutions of the equations
\[
\big[\mathrm{diag}(p_2, \dots, p_N),\; V_-\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-})\, V_-^{-1}\big] = 0,
\]
\[
\big[\mathrm{diag}(p_1 \sigma^0, p_2, \dots, p_N),\; V_+\, \mathrm{diag}(\sigma^0, \lambda_1, \dots, \lambda_{N_-})\, V_+^{-1}\big] = 0. \tag{8.7}
\]
As in Sect. 7.3, we consider for convenience the extended sets of radial coordinates (7.25) and momentum variables (7.26) for n = 1. Then the first equation in (8.7) means that the matrix $V_-\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-})\, V_-^{-1}$ commutes with the spectral projectors of $(p_2, \dots, p_N)$, i.e. it has the same block decomposition, and similarly the second equation in (8.7) implies that the matrix $V_+\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_+})\, V_+^{-1}$ commutes with the spectral projectors of p. Using unitary transformations on each of these blocks, the matrix $V_-\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-})\, V_-^{-1}$ can then be diagonalized with the same eigenvalues $\lambda_i$. It follows that
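\[
\Big(\prod_{\nu=1}^{k} U_\nu\Big)\, V_-\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-})\, V_-^{-1}\, \Big(\prod_{\nu=1}^{k} U_\nu^{-1}\Big) = \mathrm{diag}(\lambda_{\pi_-(1)}, \dots, \lambda_{\pi_-(N_-)}) = \Pi_-\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-})\, \Pi_-^{-1} \tag{8.8}
\]
for some $U_\nu \in SU(n_\nu)$, where $n_\nu$ labels the degenerate blocks of $(p_2, \dots, p_N)$ with $\sum_\nu n_\nu = N_-$, and $\Pi_- \in SU(N_-)$ is a permutation matrix corresponding to an element $\pi_- \in S_{N_-}$. If the $\lambda_i$ are nondegenerate, this implies that $\prod_\nu U_\nu\, V_- = \Pi_-$ and hence
\[
V_- = \Big(\prod_{\nu=1}^{k} U_\nu^{-1}\Big)\, \Pi_- . \tag{8.9}
\]
If some $\lambda_i$ are degenerate, it only follows that $\Pi_-^{-1} \prod_\nu U_\nu\, V_-$ commutes with the spectral projectors of $\lambda$, so that $\Pi_-^{-1} \prod_\nu U_\nu\, V_- = \prod_\nu \tilde U_\nu$ for some $\tilde U_\nu \in SU(n_\nu)$. It follows that the angular saddle-point $V_- \in U(N_-)$ is given by
\[
V_- = \Big(\prod_{\nu=1}^{k} U_\nu^{-1}\Big)\, \Pi_-\, \Big(\prod_{\nu=1}^{k} \tilde U_\nu\Big). \tag{8.10}
\]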
Similar statements hold for the angular saddle-point V+ ∈ U (N+ ), with the additional feature that the first two entries of p and λ are degenerate by definition.
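The right-hand equality in (8.8) rests on an elementary fact: conjugating a diagonal matrix by a permutation matrix only permutes its diagonal entries. The following minimal sketch illustrates this in plain Python. The helper names and the convention that the permutation matrix has a unit entry in row i, column π(i) are assumptions made for the illustration, not notation taken from the paper.

```python
def perm_matrix(pi):
    """Permutation matrix with a 1 in row i, column pi[i] (0-based; assumed convention)."""
    n = len(pi)
    return [[1 if j == pi[i] else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def conjugate_diag(pi, lam):
    """Return Pi * diag(lam) * Pi^{-1}; for a permutation matrix Pi^{-1} is the transpose."""
    n = len(lam)
    big_pi = perm_matrix(pi)
    d = [[lam[i] if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(big_pi, d), transpose(big_pi))

# The result is diag(lam[pi[0]], lam[pi[1]], ...), i.e. the permuted eigenvalues.
result = conjugate_diag([2, 0, 1], [10, 20, 30])
```

With this convention the diagonal of `result` lists the entries of `lam` in the order given by `pi`, which is the pattern $\mathrm{diag}(\lambda_{\pi_-(1)}, \dots, \lambda_{\pi_-(N_-)})$ appearing in (8.8).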
H. Steinacker, R. J. Szabo
In each case, the value of the action (7.15) is given by
\[
\langle \mu_T(C),\, p \rangle = \frac{N}{4} \sum_{i=1}^{N_+} p_i\, \lambda_{\pi_+(i)} - \frac{N}{4} \sum_{i=1}^{N_-} p_{i+1}\, \lambda_{\pi_-(i)} . \tag{8.11}
\]
Therefore, each saddle-point is characterized by two permutation matrices $\Pi_\pm$ corresponding to $\pi_\pm \in S_{N_\pm}$, which may or may not generate non-trivial fibers on the homogeneous spaces of the group $\prod_\nu U(n_\nu)$ depending on the degeneracies of p and λ. The integral over these $V_\pm$ orbits can then be evaluated using the Itzykson-Zuber formula leading to (7.16) and (7.17), which gives precisely the sum over the saddle points. The regularization required in (7.17) reflects the fact that the critical surfaces are no longer isolated points, due to the degeneracies of $\lambda_i$ and $p_i$.
The main point of this analysis is that these critical surfaces are again not in any one-to-one correspondence with those of the original Yang-Mills action. In fact, the abelian critical surfaces above contain as subspaces those of the Itzykson-Zuber localization on $O(\Xi)$ discussed in Sect. 6 above, which are stationary not only on the symplectic leaves $L(\lambda)$ but also with respect to variations of the radial coordinates $\lambda_i$. However, even the critical surfaces for the Itzykson-Zuber localization on the configuration space $O(\Xi)$ are not simply related to those of the Yang-Mills action. In particular, the variational problem for the action (8.11) does not determine the $\lambda_i$. A given radial saddle-point $\pi_\pm$ can thus correspond to various types of Yang-Mills solutions by appropriately choosing some $\lambda_i$, as we show explicitly in Sect. 8.2 below. This arbitrariness in the radial coordinates $\lambda_i$ is lifted by the addition of the localization one-form α of Sect. 4, which serves to single out the Yang-Mills saddle points from the new critical points. Nevertheless, it is instructive to work out the radial coordinates of some Yang-Mills saddle-points to illustrate the workings of the polar decomposition.
8.2. Radial coordinates for Yang-Mills critical surfaces.
We will now work out the radial coordinates for the solutions of the Yang-Mills equations $[C_0, C_i] = 0$, which will identify precisely the appropriate localization values of $\lambda_i$ for each critical surface of Sect. 3.1. Given $\Xi$ as in (7.13), we now consider the fuzzy sphere coordinates $\Pi_0\, \Xi\, \Pi_0^{-1}$ and correspondingly modify the radial coordinates (7.12) to
\[
R = \Pi_0 \begin{pmatrix} \sigma^0 & 0 \\ 0 & \exp(i\, \sigma^1 \otimes \rho) \end{pmatrix} \Pi_0^{-1}, \tag{8.12}
\]
where $\Pi_0 \in U(N)$ is a permutation matrix representing the cyclic permutation
\[
\pi_{(N_+)} = (1\; 2\; \cdots\; N_+). \tag{8.13}
\]
As we will see, the modification by $\Pi_0$, although irrelevant from the point of view of the path integral, will greatly simplify the explicit parametrization. Using this parametrization and (7.14), we can write the covariant coordinates (7.5) in the explicit form
\[
C = \frac{N}{2}\, V\, \Pi_0 \begin{pmatrix} \sigma^0 & 0 \\ 0 & \sigma^3 \otimes \cos(2\rho) + \sigma^2 \otimes \sin(2\rho) \end{pmatrix} \Pi_0^{-1}\, V^{-1}
= \frac{N}{2} \begin{pmatrix}
V_+ \begin{pmatrix} 1 & & \\ & \cos(2\rho) & \\ & & 1 \end{pmatrix} V_+^{-1} & -\, i\, V_+ \begin{pmatrix} 0 \\ \sin(2\rho) \\ 0 \end{pmatrix} V_-^{-1} \\[2ex]
i\, V_- \begin{pmatrix} 0\,, & \sin(2\rho)\,, & 0 \end{pmatrix} V_+^{-1} & -\, V_- \cos(2\rho)\, V_-^{-1}
\end{pmatrix}, \tag{8.14}
\]
where we have applied the commutation relation $i\,[\sigma^1, \sigma^3] = 2\sigma^2$. The role of the cyclic permutation matrix $\Pi_0$ is to move the unit entries of $\sigma^0$ symmetrically around the matrix $\cos(2\rho)$. We note for later use that if the unitary matrices $V_\pm \in U(N_\pm)$ are block-diagonal, then so is C. We will now use this parametrization to illustrate the use of the radial coordinates by working out (8.14) explicitly for various classical gauge field configurations of Sect. 3.1.
The vacuum solution. The generators of the irreducible N-dimensional representation of the su(2) Lie algebra (2.1) are given explicitly by
\[
(\xi_3)_{ij} = -\,\delta_{ij}\, \frac{N+1-2i}{2} \quad \text{and} \quad (\xi_+)_{ij} = \delta_{i+1,j}\, \sqrt{(N-i)\, i}, \tag{8.15}
\]
where $i, j = 1, \dots, N$ and $\xi_\pm = \xi_1 \pm i\, \xi_2$ with $\xi_- = \xi_+^\dagger$. The vacuum solution (2.4) in the abelian case n = 1 thus has the explicit form
\[
C = \begin{pmatrix} \frac12 \mathbf{1}_N + \xi_3 & \xi_+ \\ \xi_- & \frac12 \mathbf{1}_N - \xi_3 \end{pmatrix}
= \begin{pmatrix} \frac12\, \mathrm{diag}(-N+2, \dots, N-2, N) & \xi_+ \\ \xi_- & \frac12\, \mathrm{diag}(N, \dots, -N+4, -N+2) \end{pmatrix} \tag{8.16}
\]
using the splitting into equal blocks of size N. This should be identified with (8.14), which splits into blocks of sizes $N_\pm$. Noting the explicit form of $\xi_\pm$ in (8.15) as raising and lowering operators, it follows that one can consistently take both $V_+\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-}, 1, 1)\, V_+^{-1}$ and $V_-\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-})\, V_-^{-1}$ to be diagonal matrices. We can then consistently match the eigenvalues as
\[
N\, (\lambda_1, \dots, \lambda_{N_-}, 1, 1) = (-N+2, \dots, N-2, N, N), \tag{8.17}
\]
which gives
\[
\lambda_i = -\,\frac{N-2i}{N} \quad \text{for } i = 1, \dots, N_- \tag{8.18}
\]
and provides the eigenvalues of the radial matrix R for the vacuum critical surface $C_{(N,1)}$. Note that the eigenvalue $\frac{N}{2}$ from the second diagonal block $\frac12 \mathbf{1}_N - \xi_3$ of C in (8.16) is contained in the matrix $\frac{N}{2}\, V_+\, \mathrm{diag}(\lambda_1, \dots, \lambda_{N_-}, 1, 1)\, V_+^{-1}$. It follows that $V_- = \Pi_-$ is a permutation matrix in $U(N_-)$, while $V_+ = \Pi_+ U_2$ is a permutation matrix up to a possible conjugation with a unitary matrix $U_2 \in SU(2) \subset U(N_+)$ acting on the two marked indices labelling the unit entries. We can absorb $\Pi_-$ by a redefinition of the $\lambda_i$, and hence take
\[
V_- = \mathbf{1}_{N_-} \tag{8.19}
\]
without loss of generality. It is also enough to consider the case $U_2 = \mathbf{1}_{N_+}$. Comparing (8.14) with (8.16), it follows that
\[
V_+ = \Pi_+ \tag{8.20}
\]
is a permutation matrix representing the irreducible cycle (8.13) of length $N_+$. Furthermore, one has
\[
\sin(2\rho_i) = \sqrt{1 - \lambda_i^2} = \sqrt{\frac{4i}{N} - \frac{4i^2}{N^2}} = \frac{2}{N} \sqrt{i\,(N-i)} = \frac{2}{N}\, (\xi_+)_{i,i+1} \tag{8.21}
\]
for $i = 1, \dots, N_-$, which is indeed the correct representation of $\xi_\pm$ in (8.15), embedded in the correct off-diagonal way in (8.14) due to the block decomposition into sizes $N_\pm$. Let us point out one interesting feature of the covariant coordinate (8.16). The two diagonal entries of $\frac{N}{2}$ in the center of the matrix constitute a trivial 2 × 2 unit matrix $\sigma^0$ which completely decouples from the rest of C. This block can be traced to the $\sigma^0$ in the upper-left corner of the first line in (8.14), whose position is determined by the permutation matrix $\Pi_0$, or equivalently to the auxiliary radial coordinates $\lambda_i = 1 + \varepsilon_i$, i = 1, 2. In fact, any explicit entry of $\pm\frac{N}{2}$ in C necessarily decouples from the rest of C, for otherwise C would have eigenvalues of modulus larger than $\frac{N}{2}$. This means, in particular, that we can permute these two entries using a suitable permutation matrix $V_+ = \Pi_+$ without any effect on C (but it will have an effect on the momenta $p_i$ if they are included). This observation will be useful below. This construction clearly generalizes to give the blocks $C(n_a)$ of size $2 n_a$ of the critical surfaces $C_{(n_1,s_1),\dots,(n_k,s_k)}$ corresponding to irreducible SU(2) representations of dimensions $n_a < N$. The most extreme case $n_a = 1$ consists of the one-dimensional representation with $C_0(n_a = 1) = \frac{N}{2}$ and $C_i(n_a = 1) = 0$, whereby
\[
C(n_a = 1) = \frac{N}{2}\, \sigma^0, \tag{8.22}
\]
and hence only the explicit $\sigma^0$ block survives.
Nonabelian generalization. For n ≥ 2, the vacuum critical surface $C_{(N,1),\dots,(N,1)}$ is associated with the solution (3.9) which is a direct sum of n irreducible SU(2) representations of dimension N. This can clearly be obtained by repeating the above construction n times. In particular, $V_+ = (\Pi_+)^{\oplus n}$ is a product of n “marked cycles” as above. Notice, however, that the same saddle point is obtained if one acts with an additional permutation of the 2n auxiliary radial coordinates $\lambda_i = 1$, $i = 1, \dots, 2n$ (recall that the explicit entries $\pm\frac{N}{2}$ of C are always isolated). In doing this, the decomposition of $V_+$ into irreducible cycles gets modified. It can nonetheless be made into one irreducible cycle with 2n marked points which come in groups of two at equal distance, for example. This demonstrates that the mapping between the Yang-Mills saddle-points and those of the abelianization approach in Sect. 7 is complicated. In particular, it is not injective. Again, this construction generalizes to blocks of the critical surfaces $C_{(n_1,s_1),\dots,(n_k,s_k)}$ corresponding to irreducible SU(2) representations of various dimensionalities.
Fluxons. Fix an integer 1 ≤ n ≤ N and consider the block gauge field configuration of size 2n given by
\[
C = \frac{N}{2}\, V\, \big(\sigma^3 \otimes \cos(2\rho) + \sigma^2 \otimes \sin(2\rho)\big)\, V^{-1}
= \frac{N}{2} \begin{pmatrix} V_+ \cos(2\rho)\, V_+^{-1} & -\, i\, V_+ \sin(2\rho)\, V_-^{-1} \\ i\, V_- \sin(2\rho)\, V_+^{-1} & -\, V_- \cos(2\rho)\, V_-^{-1} \end{pmatrix}, \tag{8.23}
\]
which is almost the same as (8.14) above but without the $\sigma^0$ block. We choose
\[
\lambda_i = -\,\frac{n-2i}{n} \quad \text{for } i = 1, \dots, n-1, \tag{8.24}
\]
along with
\[
V_+ = \Pi_{(n)} \quad \text{and} \quad V_- = \mathbf{1}_{n-1}, \tag{8.25}
\]
where $\Pi_{(n)} \in U(n+1)$ is a cyclic permutation matrix representing $\pi_{(n)} := (1\; 2\; \cdots\; n)$. Then we get explicitly
\[
C = \frac{N}{2n} \begin{pmatrix} \mathrm{diag}(-n+2, \dots, n-2) & \tilde\xi_+ \\ \tilde\xi_- & \mathrm{diag}(n-2, \dots, -n+2) \end{pmatrix}, \tag{8.26}
\]
where $\tilde\xi_\pm$ are cyclic operators (rather than raising/lowering operators as before). In this case $C_0 = 0$, and hence this solution is part of the orbifold singularities for n coincident fluxons in the moduli space (3.11) of Sect. 3.1, rather than an irreducible representation of the isometry group SU(2). This construction is further used below. In particular, the special case n = 1 gives a single fluxon $C(n = 1) = \frac{N}{2}\, \sigma^3$. Then there exists a unitary transformation $U \in SU(2)$ such that
\[
U\, C(n = 1)\, U^{-1} = \frac{N}{2}\, U \sigma^3 U^{-1} = c_i\, \sigma^i, \tag{8.27}
\]
which gives the position $c_i$ of the fluxon on the sphere $S^2$.
Multi-block solutions. Let us modify the previous radial solution by setting $\lambda_1 = \pm 1$ and taking $\lambda_{i+1}$ to be given by (8.24), while keeping the angular variables (8.25) in $U(n+2)$ and $U(n)$ the same. Then the block covariant coordinates (8.23) of size 2(n + 1) are given explicitly as
\[
C = \frac{N}{2n} \begin{pmatrix} \mathrm{diag}(-n+2, \dots, n-2, \pm n) & \xi_+ \\ \xi_- & \mathrm{diag}(\mp n, n-2, \dots, -n+2) \end{pmatrix}, \tag{8.28}
\]
which is almost the same as the vacuum configuration (8.16) for an n-dimensional irreducible representation except that there are two explicit diagonal entries $\frac{N}{2}, -\frac{N}{2}$ instead of $\frac{N}{2}, \frac{N}{2}$. In particular, $C_0$ is no longer constant and hence the gauge fields (8.28) are not solutions of the Yang-Mills equations of motion. This can be cured by the addition of extra irreducible representations as follows. One can now construct solutions of the Yang-Mills equations with several blocks and arbitrary parameters, i.e. the generic critical surfaces $C_{(n_1,s_1),\dots,(n_k,s_k)}$, by joining an even number of copies of (8.28) in a suitable way. Fix another integer m ≥ 1 such that n + m ≤ N, and consider again the block covariant coordinate (8.23) of size 2(n + m) with
\[
\lambda_1 = 1, \quad \lambda_i = -\,\frac{n-2(i-1)}{n} \ \text{ for } i = 2, \dots, n \quad \text{and} \quad \lambda_{j+n-1} = -\,\frac{m-2(j-1)}{m} \ \text{ for } j = 1, \dots, m. \tag{8.29}
\]
The angular degrees of freedom are given by
\[
V_+ = \Pi_{(n+m)} \quad \text{and} \quad V_- = \mathbf{1}_{n+m-1} \tag{8.30}
\]
in $U(n+m\pm1)$, corresponding to the cyclic permutation $\pi_{(n+m)}$ decomposed as
\[
\pi_{(n+m)} = (\pi_{(n)})_{1,\dots,n} \circ (\pi_{(m)})_{n+1,\dots,n+m} \circ (1\;\, n{+}1), \tag{8.31}
\]
where the subscripts indicate the indices that the permutations act on. The role of the transposition $(1\;\, n{+}1)$ is to first interchange the explicit 1 and −1 in (8.29) for the upper block in (8.23), which then takes the form of two copies of the matrix (8.28) but with the correct explicit diagonal entries $\pm\frac{N}{2}$. Since $V_+ = \Pi_{(n+m)}$ corresponds to an irreducible cycle, C is a direct sum of two irreducible representations with opposite sign and hence lives on the critical surface block $C_{(n,1),(m,-1)}$ with vanishing overall trace. This construction clearly generalizes to an arbitrary number of irreducible representations of the SU(2) isometry group.
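The combinatorial content of the decomposition (8.31) can be checked directly: composing two disjoint cycles, of lengths n and m, with a transposition that links them produces a single cycle of length n + m. The sketch below verifies this in plain Python. The function names and the 0-based indexing are assumptions made for the illustration, and the composition convention (rightmost factor applied first) is likewise assumed, although the resulting cycle type does not depend on it.

```python
def cycle(points, size):
    """Permutation of range(size) sending points[0] -> points[1] -> ... -> points[0]."""
    perm = list(range(size))
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return perm

def compose(f, g):
    """Composition (f o g)(x) = f(g(x)), so g acts first."""
    return [f[g[x]] for x in range(len(g))]

def cycle_lengths(perm):
    """Sorted list of the cycle lengths of a permutation."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        length, x = 0, start
        while not seen[x]:
            seen[x] = True
            x = perm[x]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths)

def joined_cycle(n, m):
    """pi_(n) on the first n points, pi_(m) on the next m, then the transposition (1 n+1)."""
    size = n + m
    pi_n = cycle(list(range(n)), size)
    pi_m = cycle(list(range(n, n + m)), size)
    swap = cycle([0, n], size)          # the transposition (1 n+1), written 0-based
    return compose(compose(pi_n, pi_m), swap)
```

For instance, `cycle_lengths(joined_cycle(3, 2))` returns `[5]`, a single irreducible cycle, whereas without the transposition the cycle type would be `[2, 3]`.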
8.3. Action of the gauge group. Finally, let us describe how the gauge symmetry acts on the radially foliated solutions. Recall that the gauge group $G \cong SU(nN)$ is embedded in the symmetry group of the orbit space O as $\phi = \phi_0 \otimes \sigma^0$ in the Lie algebra of $G \subset SU(2nN)$. This embedding is well adapted to the modification of the radial coordinates in (8.12) by the permutation matrix $\Pi_0$. Indeed, there is an embedding of the “diagonal” subgroup $U(nN_-) \subset U(nN_+) \times U(nN_-)$ into G given by taking $V_-$ into $\mathrm{diag}(\mathbf{1}_n, V_-) \otimes \sigma^0$ as
\[
V_- \longmapsto \begin{pmatrix} \begin{pmatrix} \mathbf{1}_n & \\ & V_- \end{pmatrix} & 0 \\ 0 & \begin{pmatrix} \mathbf{1}_n & \\ & V_- \end{pmatrix} \end{pmatrix}. \tag{8.32}
\]
This shows explicitly that a large part of the gauge group is part of the stabilizer group $R = U(nN_+) \times U(nN_-)$ which defines the foliation of the radial coordinates. Furthermore, there is an additional symmetry $SU(n) \subset U(nN_+)$ embedded into G by taking U into $\mathrm{diag}(U, \mathbf{1}_{nN_-}) \otimes \sigma^0$ as
\[
U \longmapsto \begin{pmatrix} \begin{pmatrix} U & \\ & \mathbf{1}_{nN_-} \end{pmatrix} & 0 \\ 0 & \begin{pmatrix} U & \\ & \mathbf{1}_{nN_-} \end{pmatrix} \end{pmatrix}. \tag{8.33}
\]
This extra SU(n) symmetry acts on the marked momenta $p_1, \dots, p_n$ of Sect. 7.3, and together with the degenerate Itzykson-Zuber localization it is thus responsible for the emergence of the nonabelian gauge symmetry in the commutative limit. The remainder $SU(nN)/SU(nN_-) \times SU(n)$ of the gauge group mixes the symplectic leaves, so that the radial foliation is not G-equivariant.
Acknowledgement. We thank C.-S. Chu, B. Dolan, H. Grosse, X. Martin and D. O’Connor for helpful discussions. The work of H.S. was supported in part by the FWF Project P16779-N02 and in part by the FWF Project P18657. The work of R.J.S. was supported in part by the EU-RTN Network Grant MRTN-CT-2004-005104.
Communicated by A. Connes
Commun. Math. Phys. 278, 253–288 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0390-4
Communications in
Mathematical Physics
Lingering Random Walks in Random Environment on a Strip
Erwin Bolthausen1 , Ilya Goldsheid2
1 Universität Zürich, Institut für Mathematik, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. E-mail: [email protected]
2 School of Mathematical Sciences, Queen Mary and Westfield College, University of London, London E1 4NS, UK. E-mail: [email protected]
Received: 30 July 2007 / Accepted: 1 November 2007
Published online: 8 December 2007 – © Springer-Verlag 2007
Abstract: We consider a recurrent random walk (RW) in random environment (RE) on a strip. We prove that if the RE is i.i.d. and its distribution is not supported by an algebraic subsurface in the space of parameters defining the RE, then the RW exhibits the (log t)² asymptotic behaviour. The exceptional algebraic subsurface is described by an explicit system of algebraic equations. One-dimensional walks with bounded jumps in a RE are treated as a particular case of the strip model. If the one-dimensional RE is i.i.d., then our approach leads to a complete and constructive classification of the possible types of asymptotic behaviour of recurrent random walks. Namely, the RW exhibits the (log t)² asymptotic behaviour if the distribution of the RE is not supported by a hyperplane in the space of parameters, which shall be explicitly described. If the support of the RE belongs to this hyperplane, then the corresponding RW is a martingale and its asymptotic behaviour is governed by the Central Limit Theorem.
1. Introduction
The aim of this work is to describe conditions under which a recurrent random walk in a random environment (RWRE) on a strip exhibits the log² t asymptotic behaviour. This slow, lingering movement of a walk was discovered by Sinai in 1982 [18]. At the time, this work brought to a logical conclusion the study of the so-called simple RWs (SRW) started by Solomon in [19] and by Kesten, Kozlov, and Spitzer in [14]. The somewhat misleading term “simple” is often used as an abbreviation describing a walk on a one-dimensional lattice with jumps to nearest neighbours. Our work was motivated by a question asked by Sinai in [18] about the validity of his (and related) results for other models. Perhaps the simplest extension of the SRW is presented by a class of one-dimensional walks whose jumps (say) to the left are bounded and to the right are of length at most one. These models were successfully studied by a
E. Bolthausen, I. Goldsheid
number of authors, and the relevant references can be found in [2]. We would like to quote one result concerning this special case since it is perhaps closest to our results stated below in Theorems 2 and 3. Namely, Bremont proved in [3] that if the environment is defined by a Gibbs measure on a sub-shift of finite type, then the asymptotic behaviour of a recurrent RW is either as in Sinai’s theorem, or it is governed by the Central Limit Law. General one-dimensional walks with bounded jumps (1DWBJ) were also studied by different authors. Key in [15] found conditions for recurrence of a wide class of 1DWBJ. Certain sufficient conditions for the Sinai behaviour of 1DWBJ were obtained by Letchikov in [17]. The results from [17] will be discussed in more detail in Sect. 1.1 after the precise definition of the one-dimensional model is given. We refer the reader to [20] for further historical comments as well as for a review of other recent developments. The main object of this paper is the RWRE on a strip. We prove (and this is the main result of this paper) that recurrent walks in independent identically distributed (i.i.d.) random environments on a strip exhibit the log² t asymptotic behaviour if the support of the distribution of the parameters defining the random environment does not belong to a certain algebraic subsurface in the space of parameters. This subsurface is defined by an explicit system of algebraic equations. The one-dimensional RW with bounded jumps can be viewed as a particular case of a RWRE on a strip. This fact was explained in [1] and we shall repeat this explanation here. Due to this reduction, our main result implies a complete classification of recurrent 1DWBJ in i.i.d. environments. Namely, the corresponding system of algebraic equations reduces in this case to one linear equation which defines a hyperplane in the space of parameters.
If the support of the distribution of parameters does not belong to this hyperplane, then the RW exhibits the Sinai behaviour (see Theorem 2 below). But if it does, then (Theorem 3 below) the corresponding random walk is a martingale and its asymptotic behaviour is governed by the Central Limit Law. In brief, recurrent 1DWBJ are either of the Sinai type, or they are martingales. In the case of a strip, a complete classification can also be obtained and it turns out that once again the asymptotic behaviour is either the Sinai, or is governed by the Invariance Principle. However, this case is less transparent and more technical even to describe in exact terms and we shall leave it for a future work. The paper is organized as follows. We state Sinai’s result and define a more general one-dimensional model in Sect. 1.1. Section 1.2 contains the definition of the strip model and the explanation of the reduction of the one-dimensional model to the strip case. Main results are stated in Sect. 1.3. Section 2 contains several statements which are then used in the proof of the main result, Theorem 1. In particular, we introduce random transformations associated with random environments in Sect. 2.2. It turns out to be natural to recall and to extend slightly, in the same Sect. 2.2, those results from [1] which are used in this paper. An important Lemma 5 is proved in Sect. 2.3; this lemma allows us to present the main algebraic statement of this work in a constructive form. In Sect. 2.4 we prove the invariance principle for the log of a norm of a product of certain matrices. This function plays the role of the so-called potential of the environment and is responsible for the Sinai behaviour of the random walk. It is used in the proof of our main result in Sect. 3. Finally the Appendix contains results of which many (if not all) are not new but it is convenient to have them in a form directly suited for our purposes. 
Among these, the most important for our applications is the Invariance Principle (IP) for “contracting” Markov chains (Sect. 4.1.3). Its proof is derived from a well known IP for general Markov chains which, in turn, is based on the IP for martingales.
Conventions. The following notations and terminology shall be used throughout the paper. R is the set of real numbers, Z is the set of integer numbers, and N is the set of positive integers. For a vector $x = (x_i)$ and a matrix $A = (a(i, j))$ we put
\[
\|x\| \overset{\mathrm{def}}{=} \max_i |x_i|, \qquad \|A\| \overset{\mathrm{def}}{=} \max_i \sum_j |a(i, j)| .
\]
Note that $\|A\| = \sup_{\|x\|=1} \|Ax\|$. We say that A is strictly positive (and write A > 0) if all its matrix elements satisfy a(i, j) > 0. A is called non-negative (and we write A ≥ 0) if all a(i, j) are non-negative. A similar convention applies to vectors.
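The two norms above, and the remark that the matrix norm is the one induced by the vector norm, can be illustrated with a few lines of plain Python. This is a minimal sketch under the conventions just stated; the function names are chosen here for illustration and do not come from the paper.

```python
def vec_norm(x):
    """Max norm of a vector: the largest absolute component."""
    return max(abs(v) for v in x)

def mat_norm(a):
    """Matrix norm from the Conventions: the largest absolute row sum."""
    return max(sum(abs(entry) for entry in row) for row in a)

def apply(a, x):
    """Matrix-vector product A x."""
    return [sum(entry * v for entry, v in zip(row, x)) for row in a]

def extremal_vector(a):
    """A sign vector x with max norm 1 at which the norm of A x equals mat_norm(A)."""
    row = max(a, key=lambda r: sum(abs(entry) for entry in r))
    return [1 if entry >= 0 else -1 for entry in row]

A = [[1, -2], [3, 4]]
x = extremal_vector(A)   # signs of the row with the largest absolute row sum
```

Here `mat_norm(A)` and `vec_norm(apply(A, x))` are both 7, so the supremum in the identity is attained at a sign vector. A stochastic matrix, whose non-negative rows each sum to one, has norm exactly 1 in this convention.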
1.1. Sinai’s result and some of its extensions to 1DWBJ. Let ω = ( pn )−∞ 0 and ξ(·) is recurrent then there is a weakly converging sequence of random variables $b_t(\omega)$, t = 1, 2, . . ., such that
\[
(\log t)^{-2}\, \xi(t) - b_t \to 0 \quad \text{as } t \to \infty. \tag{1.1}
\]
The convergence in (1.1) is in probability with respect to the so-called annealed probability measure $P(d\omega)\,\mathrm{Pr}_\omega$ (for precise statements see Sect. 1.3). The limiting distribution of $b_t$ was later found, independently, by Golosov [7,8] and Kesten [13].
The one-dimensional walk with bounded jumps on Z is defined similarly to the simple RW. Namely, let $\omega \overset{\mathrm{def}}{=} (p(n, \cdot))$, n ∈ Z, be a sequence of non-negative vectors with $\sum_{k=-m}^{m} p(n, k) = 1$ and m > 1. Put ξ(0) = 0 and
\[
\mathrm{Pr}_\omega\big(\xi(t+1) = n + k \mid \xi(t) = n\big) \overset{\mathrm{def}}{=} p(n, k), \quad n \in \mathbb{Z}. \tag{1.2}
\]
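To make definition (1.2) concrete, the following sketch simulates a walk with jumps bounded by m in a frozen random environment. It is only an illustration: the way the jump law p(n, ·) is sampled (normalised uniform weights) is an assumption made here, and no recurrence condition is imposed, so the sample paths need not show the Sinai behaviour discussed above.

```python
import random

def random_jump_law(m, rng):
    """Sample a probability vector (p(n,-m), ..., p(n,m)); the law used is an illustrative assumption."""
    weights = [rng.random() for _ in range(2 * m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate(steps, m, seed=0):
    """Run the walk (1.2) from xi(0) = 0; return the path and the part of the environment visited."""
    rng = random.Random(seed)
    env = {}                      # site n -> jump law p(n, .), sampled once and then kept fixed
    pos, path = 0, [0]
    for _ in range(steps):
        if pos not in env:
            env[pos] = random_jump_law(m, rng)
        # Draw the jump k in {-m, ..., m} with probability p(pos, k).
        k = rng.choices(range(-m, m + 1), weights=env[pos])[0]
        pos += k
        path.append(pos)
    return path, env

path, env = simulate(steps=200, m=2, seed=1)
```

The environment is sampled lazily, site by site, and never resampled, which mirrors the quenched setting: once ω is fixed, the walk is a Markov chain with transition probabilities p(n, k).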
Suppose next that p(n, ·) is a random sequence of vectors that is stationary in n (in particular, it can be i.i.d.). Sinai’s question can be put as follows: given that a RW is recurrent, what kind of asymptotic behaviour would one observe, and under what conditions? There were several attempts to extend Sinai’s result to the model (1.2). In particular, Letchikov [17] proved that if for some ε > 0, with P-probability 1,
\[
p(n, 1) \ge \sum_{k=-m}^{-2} p(n, k) + \varepsilon \quad \text{and} \quad p(n, -1) \ge \sum_{k=2}^{m} p(n, k) + \varepsilon
\]
and the distribution of the i.i.d. random vectors p(n, ·) is absolutely continuous with respect to the Lebesgue measure (on the relevant simplex), then the analogue of Sinai’s theorem holds. (In [17], there are also other restrictions on the distribution of the RE but they are much less important than the ones listed above.) The technique we use in this work is completely different from that used in [2,3,15,17]. It is based on the methods from [1] and [6], and this work presents further development of the approach to the analysis of the RWRE on a strip started there.
1.2. Definition of the strip model. The description of the strip model presented here is the same as in [1]. Let $(P_n, Q_n, R_n)$, −∞ < n < ∞, be a strictly stationary ergodic sequence of triples of m × m matrices with non-negative elements such that for all n ∈ Z the sum $P_n + Q_n + R_n$ is a stochastic matrix,
\[
(P_n + Q_n + R_n)\,\mathbf{1} = \mathbf{1}, \tag{1.3}
\]
where $\mathbf{1}$ is a column vector whose components are all equal to 1. We write the components of $P_n$ as $P_n(i, j)$, 1 ≤ i, j ≤ m, and similarly for $Q_n$ and $R_n$. Let $(\Omega, \mathcal{F}, P, T)$ be the corresponding dynamical system, with $\Omega$ denoting the space of all sequences $\omega = (\omega_n) = ((P_n, Q_n, R_n))$ of triples described above, $\mathcal{F}$ being the corresponding natural σ-algebra, P denoting the probability measure on $(\Omega, \mathcal{F})$, and T being a shift operator on $\Omega$ defined by $(T\omega)_n = \omega_{n+1}$. For fixed ω we define a random walk ξ(t), t ∈ N, on the strip S = Z × {1, . . . , m} by its transition probabilities $Q_\omega(z, z_1)$ given by
\[
Q_\omega(z, z_1) \overset{\mathrm{def}}{=}
\begin{cases}
P_n(i, j) & \text{if } z = (n, i),\ z_1 = (n+1, j), \\
R_n(i, j) & \text{if } z = (n, i),\ z_1 = (n, j), \\
Q_n(i, j) & \text{if } z = (n, i),\ z_1 = (n-1, j), \\
0 & \text{otherwise.}
\end{cases} \tag{1.4}
\]
This defines, for any starting point z = (n, i) ∈ S and any ω, a law $\mathrm{Pr}_{\omega,z}$ for the Markov chain ξ(·) by
\[
\mathrm{Pr}_{\omega,z}\big(\xi(1) = z_1, \dots, \xi(t) = z_t\big) \overset{\mathrm{def}}{=} Q_\omega(z, z_1)\, Q_\omega(z_1, z_2) \cdots Q_\omega(z_{t-1}, z_t). \tag{1.5}
\]
We call ω the environment or the random environment on a strip S. Denote by $\Xi_z$ the set of trajectories ξ(·) starting at z. $\mathrm{Pr}_{\omega,z}$ is the so-called quenched probability measure on $\Xi_z$. The semi-direct product $P(d\omega)\,\mathrm{Pr}_{\omega,z}(d\xi)$ of P and $\mathrm{Pr}_{\omega,z}$ is defined on the direct product $\Omega \times \Xi_z$ and is called the annealed measure. None of our main results depends on the choice of the starting point z. We therefore write $\mathrm{Pr}_\omega$ instead of $\mathrm{Pr}_{\omega,z}$ when there is no danger of confusion. The one-dimensional model (1.2) reduces to a RW on a strip due to the following geometric construction. Note first that it is natural to assume (and we shall do so) that at least one of the following inequalities holds:
\[
P\{\omega : p(x, m) > 0\} > 0 \quad \text{or} \quad P\{\omega : p(x, -m) > 0\} > 0. \tag{1.6}
\]
Consider the one-dimensional lattice as a subset of the X-axis in a two-dimensional plane. Cut this axis into equal intervals of length m so that each of them contains exactly m consecutive integer points. Turn each of these intervals around its leftmost integer point anti-clockwise by π/2. The image of Z obtained in this way is a part of a strip with distances between layers equal to m. Re-scaling the X-axis of the plane by $m^{-1}$ makes the distance between the layers equal to one. The random walk on the line is thus transformed into a random walk on a strip with jumps to nearest layers. The formulae for matrix elements of the corresponding matrices $P_n$, $Q_n$, $R_n$ result now from a formal description of this construction. Namely, present x ∈ Z as x = nm + i, where 1 ≤ i ≤ m. This defines a bijection x ↔ (n, i) between the one-dimensional lattice Z and the strip S = Z × {1, . . . , m}. This bijection naturally transforms the
ξ-process on Z into a walk on Z × {1, . . . , m}. The latter is clearly a random walk of type (1.5) and the corresponding matrix elements are given by
\[ P_n(i, j) = p(nm + i,\ m + j - i), \quad R_n(i, j) = p(nm + i,\ j - i), \quad Q_n(i, j) = p(nm + i,\ -m + j - i). \tag{1.7} \]
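The construction (1.7) is mechanical enough to be checked numerically. The following Python sketch is an illustration only: the names `strip_matrices`, `make_example_env` and `row_sums`, as well as the arbitrary weights used to build a test environment, are assumptions introduced here and do not come from the paper. It builds the triple $(P_n, Q_n, R_n)$ from jump distributions p(x, ·) supported on {−m, . . . , m} and lets one confirm that $P_n + Q_n + R_n$ is stochastic, as required by (1.3).

```python
def strip_matrices(p, n, m):
    """Build (P_n, Q_n, R_n) from jump probabilities p(x, k) via formulae (1.7).

    p(x, k) is the probability of the jump x -> x + k; jumps with |k| > m
    are treated as impossible. The matrix entry (i, j), 1 <= i, j <= m, is
    stored at Python index [i - 1][j - 1].
    """
    def prob(x, k):
        return p(x, k) if -m <= k <= m else 0.0

    idx = range(1, m + 1)
    P = [[prob(n * m + i, m + j - i) for j in idx] for i in idx]
    R = [[prob(n * m + i, j - i) for j in idx] for i in idx]
    Q = [[prob(n * m + i, -m + j - i) for j in idx] for i in idx]
    return P, Q, R


def make_example_env(m):
    """Hypothetical site-dependent environment, used only for illustration."""
    def weight(x, k):
        return 1.0 + ((7 * x + 3 * k) % 5)  # arbitrary positive weights

    def p(x, k):
        total = sum(weight(x, j) for j in range(-m, m + 1))
        return weight(x, k) / total

    return p


def row_sums(P, Q, R):
    """Row sums of P + Q + R; each should equal 1 by (1.3)."""
    return [sum(P[i]) + sum(Q[i]) + sum(R[i]) for i in range(len(P))]


if __name__ == "__main__":
    m = 3
    env = make_example_env(m)
    for n in (-2, 0, 5):
        print(n, row_sums(*strip_matrices(env, n, m)))
```

Each row of $P_n + Q_n + R_n$ collects the jumps from x = nm + i to the three neighbouring layers, and together these cover every jump size in {−m, . . . , m} exactly once, which is why the row sums equal 1.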
1.3. Main results. Denote by $J$ the following set of triples of m × m matrices:
\[ J \overset{\mathrm{def}}{=} \{(P, Q, R) : P \ge 0,\ Q \ge 0,\ R \ge 0 \text{ and } (P + Q + R)\mathbf{1} = \mathbf{1}\}. \]
Let $J_0 \subset J$ be the support of the probability distribution of the random triple $(P_n, Q_n, R_n)$ defined above (obviously, this support does not depend on n). The two assumptions C1 and C2 listed below will be referred to as Condition C.

Condition C
C1 $(P_n, Q_n, R_n)$, $-\infty < n < \infty$, is a sequence of independent identically distributed random variables.
C2 There is an ε > 0 and a positive integer l < ∞ such that for any $(P, Q, R) \in J_0$ and all i, j ∈ [1, m],
\[ \|R^l\| \le 1 - \varepsilon, \quad ((I - R)^{-1}P)(i, j) \ge \varepsilon, \quad ((I - R)^{-1}Q)(i, j) \ge \varepsilon. \]

Remarks. 1. We note that, say, $((I - R_n)^{-1}P_n)(i, j)$ is the probability for a RW starting from (n, i) to reach (n + 1, j) at its first exit from layer n. The inequality $\|R_n^l\| \le 1 - \varepsilon$ is satisfied in essentially all interesting cases and, roughly speaking, means that the probability for a random walk to remain in layer n after a certain time l is small uniformly with respect to n and ω.
2. If the strip model is obtained from the one-dimensional model, then C2 may not be satisfied by matrices (1.7). This difficulty can be overcome if we replace C2 by a much milder condition, namely:
C3 For P-almost all ω: (a) the strip S is the (only) communication class of the walk, (b) there is an ε > 0 and a triple $(P, Q, R) \in J_0$ such that at least one of the following two inequalities holds: $((I - R)^{-1}P)(i, j) \ge \varepsilon$ for all i, j ∈ [1, m], or $((I - R)^{-1}Q)(i, j) \ge \varepsilon$ for all i, j ∈ [1, m].
Our proofs will be carried out under Condition C2. They can be modified to make them work also under Condition C3. Lemma 6 which is used in the proof of Theorem 1 is the main statement requiring a more careful treatment under Condition C3 and the corresponding adjustments are not difficult. However, the proofs become more technical in this case, and we shall not do this in the present paper.
If now the vectors p(x, ·) defining the matrices (1.7) are P-almost surely such that p(x, 1) ≥ ε and p(x, −1) ≥ ε for some ε > 0, then it is easy to see that Condition C3 is satisfied. We note also that if in addition the inequalities p(x, m) ≥ ε and p(x, −m) ≥ ε hold P-almost surely, then also C2 is satisfied. For a triple of matrices $(P, Q, R) \in J_0$ denote by $\pi = \pi_{(P,Q,R)} = (\pi_1, \ldots, \pi_m)$ a row vector with non-negative components such that
\[ \pi(P + Q + R) = \pi \quad \text{and} \quad \sum_{j=1}^{m} \pi_j = 1. \]
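Since π is the stationary row vector of the stochastic matrix P + Q + R, it can be approximated by straightforward iteration. The sketch below is only one possible way of doing this and is not taken from the paper: the function name `stationary_vector`, the "lazy" averaging step, the iteration count and the example triple are all assumptions made for illustration.

```python
def stationary_vector(P, Q, R, iterations=200):
    """Approximate the row vector pi with pi (P + Q + R) = pi and sum(pi) = 1.

    Uses the 'lazy' iteration pi <- pi (I + M) / 2 with M = P + Q + R. It has
    the same fixed point as pi <- pi M but cannot oscillate when M is
    periodic. Convergence is assumed here (it holds when M is irreducible).
    """
    m = len(P)
    M = [[P[i][j] + Q[i][j] + R[i][j] for j in range(m)] for i in range(m)]
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi_M = [sum(pi[i] * M[i][j] for i in range(m)) for j in range(m)]
        pi = [0.5 * (pi[j] + pi_M[j]) for j in range(m)]
    total = sum(pi)  # renormalise to remove accumulated rounding error
    return [v / total for v in pi]


# Hypothetical triple with m = 2; every row of P + Q + R sums to 1.
P = [[0.2, 0.1], [0.1, 0.3]]
Q = [[0.3, 0.1], [0.2, 0.1]]
R = [[0.2, 0.1], [0.1, 0.2]]

pi = stationary_vector(P, Q, R)
```

For this particular triple P + Q + R has rows (0.7, 0.3) and (0.4, 0.6), so the balance equation $0.3\,\pi_1 = 0.4\,\pi_2$ gives π = (4/7, 3/7) by hand, which the iteration should reproduce.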
Note that the vector π is uniquely defined. Indeed, the equation for π can be rewritten as
\[ \pi(I - R)\left[(I - R)^{-1}P + (I - R)^{-1}Q\right] = \pi(I - R). \]
According to Condition C2, the stochastic matrix $(I - R)^{-1}P + (I - R)^{-1}Q$ has strictly positive elements (in fact they are ≥ 2ε). Hence π(I − R) is uniquely (up to a multiplication by a number) defined by the last equation and this implies the uniqueness of π. Consider the following subset of $J$:
\[ J_{al} \overset{\mathrm{def}}{=} \{(P, Q, R) \in J : \pi(P - Q)\mathbf{1} = 0, \text{ where } \pi(P + Q + R) = \pi\}, \tag{1.8} \]
where obviously $\pi(P - Q)\mathbf{1} \equiv \sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i, j) - Q(i, j))$. Note that $J_{al}$ is an algebraic subsurface in $J$. We are now in a position to state the main result of this work:

Theorem 1. Suppose that Condition C is satisfied, the random walk ξ(·) = (X(·), Y(·)) is recurrent, and $J_0 \not\subset J_{al}$. Then there is a sequence of random variables $b_t(\omega)$, t = 1, 2, . . ., which converges weakly as t → ∞ and such that for any $\epsilon > 0$,
\[ P\left\{\omega : \mathrm{Pr}_\omega\left(\left|\frac{X(t)}{(\log t)^2} - b_t\right| \le \epsilon\right) \ge 1 - \epsilon\right\} \to 1 \text{ as } t \to \infty. \tag{1.9} \]

Remarks. The algebraic condition in this theorem requires a certain degree of non-degeneracy of the support $J_0$ of the distribution of $(P_n, Q_n, R_n)$. It may happen that relations (1.9) hold even when $J_0 \subset J_{al}$. However Theorem 3 shows that there are important classes of environments where relations (1.9) (or (1.11)) hold if and only if this non-degeneracy condition is satisfied.

We now turn to the one-dimensional model. It should be mentioned right away that Theorem 2 is essentially a corollary of Theorem 1. Denote by $\tilde{J}$ the set of all (2m + 1)-dimensional probability vectors:
\[ \tilde{J} \overset{\mathrm{def}}{=} \Big\{(p(j))_{-m \le j \le m} : p(\cdot) \ge 0 \text{ and } \sum_{j=-m}^{m} p(j) = 1\Big\}. \]
Remember that in this model the environment is a sequence of vectors: $\omega = (p(x, \cdot))_{-\infty < x < \infty}$ [...] (b) there is an ε > 0 such that p(0, 1) ≥ ε, p(0, −1) ≥ ε, p(0, m) ≥ ε, and p(0, −m) ≥ ε for any $p(0, \cdot) \in \tilde{J}_0$, (c) for P-almost all environments ω the corresponding one-dimensional random walk ξ(·) is recurrent, (d) $\tilde{J}_0 \not\subset \tilde{J}_{al}$.
Then there is a weakly converging sequence of random variables $b_t(\omega)$, t = 1, 2, . . . such that for any $\epsilon > 0$,
\[ P\left\{\omega : \mathrm{Pr}_\omega\left(\left|\frac{\xi(t)}{(\log t)^2} - b_t\right| \le \epsilon\right) \ge 1 - \epsilon\right\} \to 1 \text{ as } t \to \infty. \tag{1.11} \]

Proof. Since the one-dimensional model reduces to a model on a strip, the result follows once we check that all conditions of Theorem 1 follow from those of Theorem 2. It is obvious from formulae (1.7) that the i. i. d. requirement (Condition C1) follows from condition (a) of Theorem 2. We have already mentioned above that Condition C2 follows from condition (b). The recurrence of the corresponding walk on a strip is also obvious. Finally, condition (d) implies the algebraic condition of Theorem 1. Indeed, formulae (1.7) show that matrices $P_n$, $Q_n$, $R_n$ are defined by probability vectors $p(nm + i, \cdot) \in \tilde{J}_0$, where 1 ≤ i ≤ m. Put n = 0 and choose all these vectors to be equal to each other, say $p(i, \cdot) = p(\cdot) \in \tilde{J}_0$, where 1 ≤ i ≤ m. A direct check shows that the triple of matrices (P, Q, R) built from this vector has the property that P + Q + R is doubly stochastic and irreducible (irreducibility follows from the conditions p(1) ≥ ε and p(−1) ≥ ε). Hence the only probability vector π satisfying π(P + Q + R) = π is given by $\pi = (m^{-1}, \ldots, m^{-1})$. One more direct calculation shows that in this case
\[ m\,\pi(P - Q)\mathbf{1} = \sum_{j=-m}^{m} j\,p(j). \]
Hence the condition $J_0 \not\subset J_{al}$ of Theorem 1 is satisfied if there is at least one vector $p(\cdot) \in \tilde{J}_0$ such that $\sum_{j=-m}^{m} j\,p(j) \ne 0$.

We conclude this section with a theorem which shows, among other things, that the algebraic condition of Theorem 2 is also necessary for having (1.11). This theorem does not require independence as such but in a natural sense it finalizes the classification of the one-dimensional recurrent RWs with bounded jumps in the i. i. d. environments.

Theorem 3. Consider a one-dimensional RW and suppose that (a) p(x, ·), x ∈ Z, is a strictly stationary ergodic sequence of vectors, (b) there is an ε > 0 such that p(0, 1) ≥ ε and p(0, −1) ≥ ε for any $p(0, \cdot) \in \tilde{J}_0$, (c) $\tilde{J}_0 \subset \tilde{J}_{al}$, that is
\[ \sum_{j=-m}^{m} j\,p(j) = 0 \text{ for any } p(\cdot) \in \tilde{J}_0. \]
Then: (i) The random walk ξ(·) is asymptotically normal in every(!) environment $\omega = (p(x, \cdot))_{-\infty < x < \infty}$ [...] (ii) There is a σ > 0 such that for P-a. e. ω,
\[ \lim_{t \to \infty} \mathrm{Pr}_\omega\left\{\frac{\xi(t)}{\sqrt{t}} \le x\right\} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{u^2}{2\sigma^2}}\, du, \tag{1.12} \]
where x is any real number and the convergence in (1.12) is uniform in x.
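The identity $m\,\pi(P - Q)\mathbf{1} = \sum_j j\,p(j)$ from the proof of Theorem 2, and with it the meaning of condition (c) of Theorem 3, can be verified numerically for a site-independent jump distribution. The following Python sketch is illustrative only; the function names and the two example distributions `biased` and `centred` are assumptions introduced here, not objects from the paper.

```python
def strip_matrices_constant(p, m):
    """(P, Q, R) from formulae (1.7) when p(x, .) = p(.) for every site x.

    p is a dict mapping each jump k in {-m, ..., m} to its probability.
    """
    def prob(k):
        return p.get(k, 0.0) if -m <= k <= m else 0.0

    idx = range(1, m + 1)
    P = [[prob(m + j - i) for j in idx] for i in idx]
    R = [[prob(j - i) for j in idx] for i in idx]
    Q = [[prob(-m + j - i) for j in idx] for i in idx]
    return P, Q, R


def is_doubly_stochastic(P, Q, R, tol=1e-12):
    """Check that all row sums and column sums of P + Q + R equal 1."""
    m = len(P)
    M = [[P[i][j] + Q[i][j] + R[i][j] for j in range(m)] for i in range(m)]
    rows_ok = all(abs(sum(M[i]) - 1.0) < tol for i in range(m))
    cols_ok = all(abs(sum(M[i][j] for i in range(m)) - 1.0) < tol
                  for j in range(m))
    return rows_ok and cols_ok


def m_times_drift(P, Q):
    """m * pi (P - Q) 1 for the uniform vector pi = (1/m, ..., 1/m)."""
    m = len(P)
    return sum(P[i][j] - Q[i][j] for i in range(m) for j in range(m))


def mean_jump(p):
    """Mean displacement sum_j j p(j) of one step of the walk."""
    return sum(k * prob for k, prob in p.items())


# Two hypothetical jump laws with m = 2.
biased = {-2: 0.1, -1: 0.3, 0: 0.2, 1: 0.2, 2: 0.2}   # mean jump 0.1
centred = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}  # mean jump 0
```

For `biased` the triple lies outside $J_{al}$, whereas for `centred` the drift vanishes and the triple belongs to $J_{al}$, which is the martingale situation of Theorem 3.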
Remarks about the proof of Theorem 3. The condition of this theorem implies that ξ(t) is a martingale:
\[ E_\omega(\xi(t) - \xi(t - 1) \mid \xi(t - 1) = k) = \sum_{j=-m}^{m} j\,p(k, j) = 0, \]
where $E_\omega$ denotes the expectation with respect to the probability measure $\mathrm{Pr}_\omega$ on the space of trajectories of the random walk (we assume that ξ(0) = 0). Let $U_n = \xi(n) - \xi(n - 1)$ and put
\[ \sigma_n^2 \overset{\mathrm{def}}{=} E_\omega(U_n^2 \mid \xi(n - 1)) = \sum_{j=-m}^{m} j^2\,p(\xi(n - 1), j). \]
Obviously $\varepsilon \le \sigma_n^2 \le m^2$, where ε is the same as in Theorem 3. Next put
\[ V_n^2 \overset{\mathrm{def}}{=} \sum_{j=1}^{n} \sigma_j^2 \quad \text{and} \quad s_n^2 \overset{\mathrm{def}}{=} E_\omega(V_n^2) = E_\omega(\xi(n)^2). \]
It is useful to note that $n\varepsilon \le V_n^2,\ s_n^2 \le nm^2$. Let $T_t = \inf\{n : V_n^2 \ge t\}$. Statement (i) of Theorem 3 is a particular case of a much more general theorem of Drogin who in particular proves that $t^{-1/2}\xi(T_t)$ converges weakly to a standard normal random variable. We refer to [12], p. 98 for more detailed explanations. Statement (ii) of Theorem 3 is similar to a well known result by Lawler [16]. The main ingredient needed for proving (ii) is the following claim:
\[ \text{The limit } \lim_{n \to \infty} n^{-1}V_n^2 = \lim_{n \to \infty} n^{-1}s_n^2 \text{ exists for } P\text{-almost all } \omega. \tag{1.13} \]
Once this property of the variance of ξ(·) is established, (ii) becomes a corollary of Brown’s Theorem (see Theorems 9 and 10 in Appendix or Theorem 4.1 in [12]). However proving (1.13) is not an entirely straightforward matter. The proof we are aware of uses the approach known under the name “environment viewed from the particle”. This approach was used in [16] for proving properties of variances similar to (1.13); unfortunately, the conditions used in [16], formally speaking, are not satisfied in our case. Fortunately, Zeitouni in [20] found the way in which Lawler’s result can be extended to more general martingale-type random walks in random environments which include our case.

2. Preparatory Results

2.1. Elementary corollaries of Condition C. We start with several elementary observations following from C2. Lemma 3 and a stronger version of Lemma 1 can be found in [1]. Lemmas 2 and 4 are borrowed from [6].

Lemma 1. If Condition C2 is satisfied then for P-almost every environment ω the whole phase space S of the Markov chain ξ(t) constitutes the (only) communication class of this chain.

Proof. Fix an environment ω and consider matrices
\[ \tilde{P}_n \overset{\mathrm{def}}{=} (I - R_n)^{-1}P_n, \quad \tilde{Q}_n \overset{\mathrm{def}}{=} (I - R_n)^{-1}Q_n. \]
Note that $\tilde{P}_n(i, j)$ is the probability that the random walk ξ starting at (n, i) reaches (n + 1, j) at the time of its first exit from layer n; the probabilistic meaning of $\tilde{Q}_n(i, j)$ is defined similarly. $\tilde{P}_n(i, j) \ge \varepsilon > 0$ and $\tilde{Q}_n(i, j) \ge \varepsilon > 0$ because of Condition C2. It is now obvious that a random walk ξ(·) starting from any z ∈ S would reach any $z_1 \in S$ with a positive probability.

Matrices of the form $(I - R - Q\psi)^{-1}$, $(I - R - Q\psi)^{-1}P$, and $(I - R - Q\psi)^{-1}Q$ arise in the proofs of many statements below. We shall list several elementary properties of these matrices.

Lemma 2. If Condition C2 is satisfied, $(P, Q, R) \in J_0$ and ψ is any stochastic matrix, then there is a constant C depending only on ε and m such that
\[ \|(I - R - Q\psi)^{-1}\| \le C. \tag{2.1} \]

Proof. Note first that $\|R^l\| \le 1 - \varepsilon$ implies that for some $C_1$ uniformly in R,
\[ \|(I - R)^{-1}\| \le \sum_{k=0}^{\infty} \|R^k\| \le C_1. \]
Next, it follows from $(P + Q + R)\mathbf{1} = \mathbf{1}$ that $(I - R)^{-1}P\mathbf{1} + (I - R)^{-1}Q\mathbf{1} = \mathbf{1}$ and $(I - R)^{-1}Q\mathbf{1} = \mathbf{1} - (I - R)^{-1}P\mathbf{1}$. Condition C2 implies that $(I - R)^{-1}P\mathbf{1} \ge m\varepsilon\mathbf{1}$. Hence
\[ \|(I - R)^{-1}Q\| = \|(I - R)^{-1}Q\mathbf{1}\| = \|\mathbf{1} - (I - R)^{-1}P\mathbf{1}\| \le 1 - m\varepsilon. \]
Similarly, $\|(I - R)^{-1}P\| \le 1 - m\varepsilon$. Hence
\[ \|(I - R - Q\psi)^{-1}\| = \|(I - (I - R)^{-1}Q\psi)^{-1}(I - R)^{-1}\| \le \sum_{r=0}^{\infty} \|(I - R)^{-1}Q\psi\|^{r}\, \|(I - R)^{-1}\| \le C_1 m^{-1}\varepsilon^{-1} \equiv C. \]
The lemma is proved.
Lemma 3. ([1]). If Condition C2 is satisfied, $(P, Q, R) \in J$, and ψ is a stochastic matrix, then $(I - R - Q\psi)^{-1}P$ is also stochastic.

Proof. We have to check that $(I - R - Q\psi)^{-1}P\mathbf{1} = \mathbf{1}$ which is equivalent to $P\mathbf{1} = (I - Q\psi - R)\mathbf{1} \Leftrightarrow (P + Q\psi + R)\mathbf{1} = \mathbf{1}$. Since $\psi\mathbf{1} = \mathbf{1}$ and P + Q + R is stochastic, the result follows.

Lemma 4. Suppose that Condition C2 is satisfied and $(P, Q, R) \in J_0$ and let a matrix $\phi \ge 0$ be such that $\phi\mathbf{1} \le \mathbf{1}$. Then
\[ ((I - R - Q\phi)^{-1}P)(i, j) \ge \varepsilon \quad \text{and} \quad ((I - R - Q\phi)^{-1}Q)(i, j) \ge \varepsilon. \tag{2.2} \]

Proof. $(I - R - Q\phi)^{-1}P \ge (I - R)^{-1}P$ and $(I - R - Q\phi)^{-1}Q \ge (I - R)^{-1}Q$.
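Lemmas 3 and 4 lend themselves to a quick numerical sanity check. The Python sketch below is an illustration under stated assumptions: the helper names `matmul`, `solve` and `exit_matrix`, the Gauss-Jordan solver, and the example triple (P, Q, R) with the stochastic matrix `psi` are all hypothetical choices made here and are not part of the paper.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def solve(A, B):
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def exit_matrix(P, Q, R, psi):
    """Compute (I - R - Q psi)^{-1} P."""
    m = len(P)
    Qpsi = matmul(Q, psi)
    A = [[(1.0 if i == j else 0.0) - R[i][j] - Qpsi[i][j]
          for j in range(m)] for i in range(m)]
    return solve(A, P)


# Hypothetical triple with m = 2 (rows of P + Q + R sum to 1).
P = [[0.2, 0.1], [0.1, 0.3]]
Q = [[0.3, 0.1], [0.2, 0.1]]
R = [[0.2, 0.1], [0.1, 0.2]]
psi = [[0.5, 0.5], [0.25, 0.75]]   # an arbitrary stochastic matrix
zero = [[0.0, 0.0], [0.0, 0.0]]

with_psi = exit_matrix(P, Q, R, psi)   # (I - R - Q psi)^{-1} P
base = exit_matrix(P, Q, R, zero)      # (I - R)^{-1} P
```

In this example the rows of `with_psi` sum to 1, as Lemma 3 asserts, and every entry of `with_psi` dominates the corresponding entry of `base`, which is the inequality used in the proof of Lemma 4. The domination reflects the expansion $(I - R - Q\psi)^{-1} = \sum_k (R + Q\psi)^k \ge \sum_k R^k$ with non-negative terms.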
2.2. Random transformations, related Markov chains, Lyapunov exponents, and recurrence criteria. The purpose of this section is to introduce objects listed in its title. These objects shall play a major role in the proofs of our main results. They shall also allow us to state the main results from [1] in the form which is suitable for our purposes.
Random transformations and related Markov chains. Let $\Psi$ be the set of stochastic m × m matrices, X be the set of unit vectors with non-negative components, and $M \overset{\mathrm{def}}{=} \Psi \times X$ the direct product of these two sets. Define a distance ρ(·, ·) on M by
\[ \rho((\psi, x), (\psi', x')) \overset{\mathrm{def}}{=} \|\psi - \psi'\| + \|x - x'\|. \tag{2.3} \]
For any triple $(P, Q, R) \in J_0$ denote by $g \equiv g_{(P,Q,R)}$ a transformation g : M → M, where
\[ g.(\psi, x) \overset{\mathrm{def}}{=} \left((I - R - Q\psi)^{-1}P,\ \|Bx\|^{-1}Bx\right), \tag{2.4} \]
and
\[ B \equiv B_{(P,Q,R)}(\psi) \overset{\mathrm{def}}{=} (I - R - Q\psi)^{-1}Q. \tag{2.5} \]
The fact that g maps M into itself follows from Lemma 3.

Remarks. Here and in the sequel the notation g.(ψ, x) is used instead of g((ψ, x)) and the dot is meant to replace the brackets and to emphasize the fact that g maps (ψ, x) into another pair from M. In fact this notation is often used in the theory of products of random matrices, e. g. $B.x \overset{\mathrm{def}}{=} \|Bx\|^{-1}Bx$; we thus have extended this tradition to another component of g.
If $\omega \in \Omega$ is an environment, $\omega = (\omega_n)_{-\infty < n < \infty}$ [...] $> 0$. But the fact that the random environment does not depend on n allows one to analyse the recurrence and transience properties of the random walk in a way which is much more straightforward than the one offered by Theorems 4 and 5. Namely, suppose that ξ(t) = (X(t), Y(t)) = (k, i). Then the conditional probability Pr{Y(t) = j | ξ(t − 1) = (k, i)} = P(i, j) + Q(i, j) + R(i, j) does not depend on X(t − 1) and thus the second coordinate of this walk is a Markov chain with state space {1, . . . , m} and transition matrix P + Q + R. Hence, if $\pi = (\pi_1, \ldots, \pi_m)$ is a probability vector such that π(P + Q + R) = π then $\pi_i$ is the frequency of visits by the RW to the sites (·, i) of the strip.

Consider next the displacement $\eta(t) \overset{\mathrm{def}}{=} X(t) - X(t - 1)$ of the coordinate X of the walk which occurs between times t − 1 and t. The random variable η(t) takes values 1, −1, or 0 and the conditional distribution of the pair (η(t), Y(t)) is given by Pr{(η(t), Y(t)) = (1, j) | ξ(t − 1) = (k, i)} = P(i, j), Pr{(η(t), Y(t)) = (−1, j) | ξ(t − 1) = (k, i)} = Q(i, j), and Pr{(η(t), Y(t)) = (0, j) | ξ(t − 1) = (k, i)} = R(i, j). It is essential that this distribution depends only on i (and not on k) and thus this pair forms a time-stationary Markov chain. Let us denote by $E_{(k,i)}$ the corresponding conditional expectation with conditioning on (η(t − 1), Y(t − 1)) = (k, i), −1 ≤ k ≤ 1, 1 ≤ i ≤ m. We then have
\[ E_{(k,i)}(\eta(t)) = \sum_{j=1}^{m} P(i, j) - \sum_{j=1}^{m} Q(i, j), \]
and the expectation of the same random variable with respect to the stationary distribution is thus given by $\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i, j) - Q(i, j))$. Applying the law of large
numbers for Markov chains to the sequence η(t) we obtain that with Pr-probability 1,
\[ \lim_{t \to \infty} t^{-1}X(t) = \lim_{t \to \infty} t^{-1} \sum_{k=1}^{t} \eta(k) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i, j) - Q(i, j)), \]
and this limit is independent of ξ(0). Since this result is equivalent to the statements of Theorems 4 and 5, we obtain the following

Lemma 5. Suppose that (P, Q, R) satisfies Condition C2. Then (ζ, x) ∈ M satisfies Eq. (2.14) with λ = 0 if and only if
\[ \sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i, j) - Q(i, j)) = 0. \tag{2.15} \]
Moreover λ > 0 if and only if $\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i, j) - Q(i, j)) < 0$ (and thus λ < 0 if and only if $\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i, j) - Q(i, j)) > 0$).

2.4. The CLT and the invariance principle for $S_n$’s. The main goal of this section is to prove an invariance principle (IP) (and a CLT) for the sequence
\[ S_n \overset{\mathrm{def}}{=} \log \|B_n \ldots B_1 x_1\| - n\lambda, \tag{2.16} \]
where matrices $B_n$ are defined by (2.7) and λ is given by (2.13). Obviously, $S_n$ depends on $(\psi_1, x_1) \in M$. We shall prove that in fact the IP (and the CLT) are satisfied uniformly in $(\psi_1, x_1) \in M$. Moreover, exactly one of the two things takes place if the random walk is recurrent: either the asymptotic behaviour of $S_n$ is described by a non-degenerate Wiener process, or the support of the distribution of matrices (P, Q, R) belongs to an algebraic manifold defined by Eq. (1.8). To make these statements precise we first recall one of the definitions of the invariance principle associated with a general random sequence $S_n = \sum_{k=1}^{n} f_k$, with the convention $S_0 = 0$. Let $\{C[0, 1], \mathcal{B}, P_W\}$ be the probability space where C[0, 1] is the space of continuous functions with the sup norm topology, $\mathcal{B}$ being the Borel σ-algebra generated by open sets in C[0, 1], and $P_W$ the Wiener measure. Define for t ∈ [0, 1] a sequence of random functions $v_n(t)$ associated with the sequence $S_n$. Namely, put
\[ v_n(t) \overset{\mathrm{def}}{=} n^{-\frac{1}{2}}\left(S_k + f_{k+1}(tn - k)\right) \quad \text{if } k \le tn \le k + 1,\ k = 0, 1, \ldots, n - 1. \tag{2.17} \]
For a σ > 0 let $\{P_n^{\sigma}\}$ be the sequence of probability measures on $\{C[0, 1], \mathcal{B}\}$ determined by the distribution of $\{\sigma^{-1}v_n(t),\ 0 \le t \le 1\}$.

Definition. A random sequence $S_n$ satisfies the invariance principle with parameter σ > 0 if $P_n^{\sigma} \to P_W$ weakly as n → ∞. If the sequence $S_n$ depends on (another) parameter, e.g. $z_1$, then we say that $S_n$ satisfies the invariance principle with parameter σ > 0 uniformly in $z_1$ if for any continuous functional $\mathfrak{f} : C[0, 1] \to \mathbb{R}$ one has: $E_n^{\sigma}(\mathfrak{f}) \to E_W(\mathfrak{f})$ uniformly in $z_1$ as n → ∞. Here $E_n^{\sigma}$ and $E_W$ are expectations with respect to the relevant probabilities.
Let us state the invariance principle for the sequence $S_n$ given by (2.16). Note that in this case
\[ S_n = \sum_{k=1}^{n} (\log \|B_k x_k\| - \lambda), \quad \text{where } x_k = \|B_{k-1}x_{k-1}\|^{-1}B_{k-1}x_{k-1},\ k \ge 2. \tag{2.18} \]
Put $z_n = (\psi_n, x_n)$ and $f_n = f(g_n, z_n)$, where the function f is defined on the set of pairs (g, z) ≡ ((P, Q, R), (ψ, x)) by
\[ f(g, z) \overset{\mathrm{def}}{=} \log \|(I - R - Q\psi)^{-1}Qx\| - \lambda. \tag{2.19} \]
Obviously in these notations $S_n = \sum_{k=1}^{n} f_k$. Denote by A the Markov operator associated with the Markov chain $z_{n+1} = g_n.z_n$ defined by (2.6): if F is a function defined on the state space $J_0 \times M$ of this chain then
\[ (AF)(g, z) \overset{\mathrm{def}}{=} \int_{J_0} F(g', g.z)\,\mu(dg'). \]
Using these notations we write ν(dz) (rather than ν(d(ψ, x))) for the invariant measure of the chain $z_n$ and we denote by $M_0 \subset M$ the support of ν(dz).

Theorem 6. Suppose that Condition C is satisfied and the function f is defined by (2.19). Then:
(i) The equation
\[ F(g, z) - (AF)(g, z) = f(g, z) \tag{2.20} \]
has a unique solution F(g, z) which is continuous on $J_0 \times M_0$ and
\[ \int_{J_0 \times M} F(g, z)\,\mu(dg)\,\nu(dz) = 0. \]
Denote by
\[ \sigma^2 = \int_{J_0 \times M_0} (AF^2 - (AF)^2)(g, y)\,\mu(dg)\,\nu(dy). \]
(ii) If σ > 0 then $\frac{S_n}{\sigma\sqrt{n}}$ converges in law towards the standard Gaussian distribution N(0, 1) and the sequence $S_n$ satisfies the invariance principle with parameter σ uniformly in $(\psi_1, x_1) \in M$.
(iii) If σ = 0, then the function F(g, y) depends only on y and for every $(g, y) \in J_0 \times M_0$ one has
\[ f(g, y) = F(y) - F(g.y). \tag{2.21} \]
(iv) If σ = 0 and λ = 0 then
\[ J_0 \subset J_{al}, \tag{2.22} \]
with $J_{al}$ given by (1.8).

Proof. Statements (i), (ii), and (iii) of our theorem follow from Theorem 12. In order to be able to apply Theorem 12 we have to show that the sequence of random transformations $g_n$ has the so-called contraction property. Lemma 6 establishes this property. Relation (2.22) is then derived from (2.21) and one more general property of Markov chains generated by products of contracting transformations (Lemma 8).
Lemma 6. Suppose that Condition C is satisfied and let
\[ (\psi_{n+1}, x_{n+1}) = g_n.(\psi_n, x_n), \quad (\psi'_{n+1}, x'_{n+1}) = g_n.(\psi'_n, x'_n), \quad n \ge 1, \]
be two sequences from M. Then there is a c, 0 ≤ c < 1, such that for any $(\psi_1, x_1), (\psi'_1, x'_1) \in M$,
\[ \rho\left((\psi_n, x_n), (\psi'_n, x'_n)\right) \le \mathrm{const}\ c^n, \tag{2.23} \]
where ρ(·, ·) is defined by (2.3).

Proof of Lemma 6. We shall first prove that there is a $c_0 < 1$ such that $\|\psi_n - \psi'_n\| \le \mathrm{const}\ c_0^n$. The control of the x-component would then follow from this result. Let us introduce a sequence of m × m matrices $\phi_n$, n ≥ 1, which we define recursively: $\phi_1 = 0$ and
\[ \phi_{n+1} = (I - R_n - Q_n\phi_n)^{-1}P_n, \quad \text{if } n \ge 1. \tag{2.24} \]
Remarks. Matrices $\phi_n$ and $\psi_n$ were defined in a purely analytic way. Their probabilistic meaning is well known (see [1]) and shall also be discussed in Sect. 3.

Put $\Delta_k \overset{\mathrm{def}}{=} \psi_k - \phi_k$. To control the ψ-part of the sequence $(\psi_n, x_n)$ we need the following

Lemma 7. Suppose that Condition C is satisfied. Then there is a $c_0$, $0 \le c_0 < 1$, such that for any stochastic matrix $\psi_1 \in \Psi$ the matrix elements of the corresponding $\Delta_{n+1}$ are of the following form:
\[ \Delta_{n+1}(i, j) = \alpha_n(i)c_n(j) + \tilde{\epsilon}_n(i, j). \tag{2.25} \]
Here $\alpha_n(i)$ and $c_n(j)$ depend only on the sequence $(P_j, Q_j, R_j)$, 1 ≤ j ≤ n; the matrix $\tilde{\epsilon}_n = (\tilde{\epsilon}_n(i, j))$ is a function of $\psi_1$ and of the sequence $(P_j, Q_j, R_j)$, 1 ≤ j ≤ n, satisfying $\|\tilde{\epsilon}_n\| \le C_1 c_0^n$ for some constant $C_1$.

Corollary. If Condition C holds then
\[ \|\psi_{n+1} - \psi'_{n+1}\| \le 2C_1 c_0^n. \tag{2.26} \]

Proof of Corollary. Consider a sequence $\psi'_n$ which differs from $\psi_n$ in that the starting value for recursion (2.6) is $\psi'_1$. Put $\Delta'_k \overset{\mathrm{def}}{=} \psi'_k - \phi_k$. Applying the result of Lemma 7 to $\Delta'_{n+1}$ we obtain:
\[ \Delta'_{n+1}(i, j) = \alpha_n(i)c_n(j) + \tilde{\epsilon}'_n(i, j). \tag{2.27} \]
It follows from (2.25), (2.27), and the definition of $\Delta_{n+1}$ and $\Delta'_{n+1}$ that $\|\psi_{n+1} - \psi'_{n+1}\| = \|\Delta_{n+1} - \Delta'_{n+1}\| \le \|\tilde{\epsilon}_n\| + \|\tilde{\epsilon}'_n\| \le 2C_1 c_0^n$.
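The exponential forgetting of the initial matrix expressed by (2.26) is easy to observe numerically. The following Python sketch is only an illustration under simplifying assumptions: the environment is taken constant in n (a degenerate i. i. d. sequence), the gap is measured by the largest entry-wise difference rather than by the operator norm, and all names (`next_psi`, `psi_gap_history`, the example triple) are hypothetical and introduced here.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def solve(A, B):
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def next_psi(P, Q, R, psi):
    """One step of the psi-recursion: psi -> (I - R - Q psi)^{-1} P."""
    m = len(P)
    Qpsi = matmul(Q, psi)
    A = [[(1.0 if i == j else 0.0) - R[i][j] - Qpsi[i][j]
          for j in range(m)] for i in range(m)]
    return solve(A, P)


def psi_gap_history(P, Q, R, psi_a, psi_b, steps):
    """Largest entry-wise gap between two psi-sequences after each step."""
    m = len(P)
    gaps = []
    for _ in range(steps):
        psi_a = next_psi(P, Q, R, psi_a)
        psi_b = next_psi(P, Q, R, psi_b)
        gaps.append(max(abs(psi_a[i][j] - psi_b[i][j])
                        for i in range(m) for j in range(m)))
    return gaps


# Hypothetical constant environment with m = 2.
P = [[0.2, 0.1], [0.1, 0.3]]
Q = [[0.3, 0.1], [0.2, 0.1]]
R = [[0.2, 0.1], [0.1, 0.2]]

identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
gaps = psi_gap_history(P, Q, R, identity, swap, 30)
```

Starting from two very different stochastic matrices, the recorded gaps shrink roughly geometrically and reach the level of rounding error well within the thirty steps computed here.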
Proof of Lemma 7. The main idea of this proof is the same as that of the proof of Theorem 1 from [1]. A very minor difference is that here we have to control the behaviour of ψn when n is growing while ψ1 is fixed; in [1] n was fixed while the starting point of the chain was tending to − ∞. A more important difference is that here we state the exponential speed of convergence of certain sequences and present the corresponding quantities in a relatively explicit way while in [1] the speed of convergence was not very essential (even though the exponential character of convergence had been clear already then).
To start, note that it follows from (2.6) and (2.24) that
\[ \Delta_{n+1} = \left((I - R_n - Q_n\psi_n)^{-1} - (I - R_n - Q_n\phi_n)^{-1}\right)P_n = (I - R_n - Q_n\psi_n)^{-1}Q_n\Delta_n(I - R_n - Q_n\phi_n)^{-1}P_n = B_n\Delta_n\phi_{n+1}. \tag{2.28} \]
Iterating (2.28), we obtain
\[ \Delta_{n+1} = B_n \ldots B_1\Delta_1\phi_2 \ldots \phi_{n+1} \equiv B_n \ldots B_1\psi_1\phi_2 \ldots \phi_{n+1}. \tag{2.29} \]
It follows from Lemma 4 that $\phi_n\mathbf{1} \le \mathbf{1}$. The matrix elements of the matrices $\phi_n$, n ≥ 2, are strictly positive and, moreover, according to estimates (2.2) we have: $\phi_n(i, j) \ge \varepsilon$ (and hence also $\phi_n(i, j) \le 1 - (m - 1)\varepsilon$). We are in a position to apply to the product of matrices $\phi_n$ the presentation derived in Lemma 15 (with $a_n$’s replaced by $\phi_n$’s). By the first formula in (4.16), we have:
\[ \phi_2 \ldots \phi_{n+1} = D_n\left[(c_n(1)\mathbf{1}, \ldots, c_n(m)\mathbf{1}) + \varphi_n\right], \]
where $D_n$ is a diagonal matrix, $c_n(j) \ge \delta$ with $\sum_{j=1}^{m} c_n(j) = 1$, and $\|\varphi_n\| \le (1 - m\delta)^{n-1}$ with δ > 0 (and of course mδ < 1). One can easily see that $\delta \ge m^{-1}\varepsilon^2$ (this follows from (4.15) and the above estimates for $\phi_n(i, j)$). We note also that the estimate for $c_n(j)$ follows from (4.17) and (4.18).

Put $c_0 \overset{\mathrm{def}}{=} 1 - m\delta$ and let $\mathcal{B}_n \overset{\mathrm{def}}{=} B_n \ldots B_1\Delta_1 D_n$. We then have $\Delta_{n+1} = \mathcal{B}_n\left[(c_n(1)\mathbf{1}, \ldots, c_n(m)\mathbf{1}) + \varphi_n\right]$, and thus
\[ \Delta_{n+1}(i, j) = c_n(j) \sum_{k=1}^{m} \mathcal{B}_n(i, k)\left(1 + \frac{\varphi_n(k, j)}{c_n(j)}\right). \tag{2.30} \]
But all $\mathcal{B}_n(i, k) > 0$ and $\max_{k,j} |\varphi_n(k, j)|\,c_n^{-1}(j) \le \mathrm{const}\ c_0^n$. Hence
\[ \frac{\Delta_{n+1}(i, l)}{\Delta_{n+1}(i, j)} = \frac{c_n(l)}{c_n(j)} + \epsilon_n(i, j, l), \tag{2.31} \]
where $|\epsilon_n(i, j, l)| < C c_0^n$ with C being some constant. It follows from (2.31) that
\[ (\Delta_{n+1}(i, j))^{-1} \sum_{l=1}^{m} \Delta_{n+1}(i, l) = \frac{1}{c_n(j)} + \epsilon_n(i, j). \]
On the other hand remember that
\[ \sum_{l=1}^{m} \Delta_{n+1}(i, l) = \sum_{l=1}^{m} \psi_{n+1}(i, l) - \sum_{l=1}^{m} \phi_{n+1}(i, l) = 1 - \sum_{l=1}^{m} \phi_{n+1}(i, l) \overset{\mathrm{def}}{=} \alpha_n(i). \]
Comparing these two expressions we obtain that
\[ \Delta_{n+1}(i, j) = \alpha_n(i)c_n(j) + \tilde{\epsilon}_n(i, j), \tag{2.32} \]
where $|\tilde{\epsilon}_n(i, j)| \le C_1 c_0^n$. Lemma 7 is proved.
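The key structural fact used in the proof above is that a long product of strictly positive matrices is, up to a row scaling (the diagonal factor $D_n$), close to a matrix whose rows all coincide. The Python sketch below illustrates this on a hypothetical example; the matrices `A1`, `A2` and the helper names are assumptions made here, and normalising each row to sum 1 is used as a simple stand-in for factoring out $D_n$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def normalised_rows(A):
    """Divide every row by its sum, removing any diagonal row scaling."""
    return [[v / sum(row) for v in row] for row in A]


def row_spread(A):
    """Largest entry-wise difference between normalised rows of A."""
    rows = normalised_rows(A)
    return max(abs(rows[i][j] - rows[0][j])
               for i in range(len(rows)) for j in range(len(rows[0])))


def spreads(factors):
    """Row spread of the partial products factors[0] ... factors[k]."""
    product = factors[0]
    out = [row_spread(product)]
    for A in factors[1:]:
        product = matmul(product, A)
        out.append(row_spread(product))
    return out


# Two hypothetical strictly positive matrices with row sums at most 1.
A1 = [[0.5, 0.3], [0.2, 0.6]]
A2 = [[0.3, 0.4], [0.5, 0.2]]

history = spreads([A1, A2] * 10)  # product of 20 factors
```

The spread of the normalised rows decays geometrically with the number of factors, which is the numerical counterpart of the bound $\|\varphi_n\| \le (1 - m\delta)^{n-1}$; the common limit row plays the role of $(c_n(1), \ldots, c_n(m))$.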
We now turn to the difference $\|x_{n+1} - x'_{n+1}\|$. Let us denote by $b_n$ the transformation of the set X of unit non-negative vectors defined by
\[ b_n(x) = \|B_n x\|^{-1}B_n x, \quad \text{where } B_n = (I - R_n - Q_n\psi_n)^{-1}Q_n, \tag{2.33} \]
and $\psi_n$ are the same as above. The sequence $b'_n$ is defined in a similar way with the only difference that $\psi_n$ is replaced by $\psi'_n$. Inequality (2.26) implies that for some $C_2$,
\[ \bar{\rho}(b_n, b'_n) \overset{\mathrm{def}}{=} \sup_{x \in X} \|b_n(x) - b'_n(x)\| \le C_2 c_0^n. \]
A very general and simple Lemma 16 from the Appendix now implies that
\[ \|x_{n+1} - x'_{n+1}\| \le C(\epsilon)(c_0 + \epsilon)^n (1 + \|x_1 - x'_1\|) \]
and this proves Lemma 6.

We can now easily prove the existence of the limit in (2.12) as well as Furstenberg’s formula (2.13) for λ. To this end note that
\[ \bar{S}_n(\zeta_1, \mathbf{1}) \overset{\mathrm{def}}{=} \log \|A_n \ldots A_1\| = \log \|A_n \ldots A_1\mathbf{1}\| = \sum_{k=1}^{n} f(g_k, z_k), \tag{2.34} \]
where the notation is chosen so as to emphasize the dependence of the sum $\bar{S}_n(\zeta_1, \mathbf{1})$ on initial values $x_1 = \mathbf{1}$ and $\psi_1 = \zeta_1$ of the Markov chain. (Note the difference between $\bar{S}_n(\zeta_1, \mathbf{1})$ and the sum $S_n$ in (2.16).) Lemma 6 implies that
\[ |\bar{S}_n(\zeta_1, \mathbf{1}) - \bar{S}_n(\psi_1, x_1)| \le C_3, \tag{2.35} \]
where the constant $C_3$ depends only on the parameter ε from Condition C. But then, according to the law of large numbers applied to the Markov chain $(\omega_n, \zeta_n, y_n) \equiv (g_n, \zeta_n, y_n)$ defined in Theorem 4 we have that the following limit exists with probability 1:
\[ \lim_{n \to \infty} \frac{1}{n} \log \|A_n \ldots A_1\| = \lim_{n \to \infty} \frac{1}{n} \bar{S}_n(\zeta_1, y_1) = \lambda, \]
where λ is given by (2.13). Formula (2.13) implies that the mean value of the function f(g, z) defined by (2.19) is 0. Also, it is obvious that this function is Lipschitz on $J_0 \times M$ in all variables. Hence, Theorem 12 applies to the sequence $S_n$ and statements (i), (ii), and (iii) of Theorem 6 are thus proved.

The case σ = 0 and λ = 0. Derivation of the algebraic condition for (P, Q, R). We start with a statement which is a corollary of a very general property proved in Lemma 13 from the Appendix.

Lemma 8. Suppose that Condition C is satisfied and let $g \in J_0$, $z_g \in M$ be such that $g.z_g = z_g$. Then $z_g \in M_0 \equiv \mathrm{supp}\,\nu$.

Proof. According to Lemma 6, Condition C implies that every $g \in J_0$ is contracting. Hence, by Lemma 13, $z_g \in M_0$.
Derivation of the algebraic condition. According to Theorem 12 (see formula (4.10)), the equality σ = 0 implies that f(g, z) = F(z) − F(g.z). Hence, if z can be chosen to be equal to $z_g$, then it follows that $f(g, z_g) = 0$. In the context of the present theorem the function f is given by $f(g, z) = \log \|(I - R - Q\psi)^{-1}Qx\|$, where $g = (P, Q, R) \in J_0$ and $z = (\psi, x) \in M_0 \subset \Psi \times X$. The equation $g.z_g = z_g$ is equivalent to saying that $z_g = (\psi, x)$ satisfies $(I - R - Q\psi)^{-1}P = \psi$ and $\|(I - R - Q\psi)^{-1}Qx\|^{-1}(I - R - Q\psi)^{-1}Qx = x$. The equation $f(g, z_g) = 0$ now reads $\log \|(I - R - Q\psi)^{-1}Qx\| = 0$ or, equivalently, $\|(I - R - Q\psi)^{-1}Qx\| = 1$. Hence the conditions σ = 0 and λ = 0 imply that all pairs $(g, z_g) \in J_0 \times M_0$ satisfy $(I - R - Q\psi)^{-1}P = \psi$ and $(I - R - Q\psi)^{-1}Qx = x$. But, by Lemma 5, this implies that $J_0 \subset J_{al}$, where $J_{al}$ is defined by (1.8).
3. Proof of Theorem 1

As we are in the recurrent situation, we have that the Lyapunov exponent λ = 0. Throughout this section we denote by C a generic positive constant which depends on nothing but ε and m and which may vary from place to place. If f, g > 0 are two functions, depending on n ∈ Z, i ∈ {1, . . . , m}, and maybe on other parameters, we write $f \asymp g$ if there exists a C > 1 such that $C^{-1}f \le g \le Cf$.

Potential and its properties. As before, $S_n$ is defined by (2.16). We put
\[ \Phi_n(\omega) \equiv \Phi_n \overset{\mathrm{def}}{=} \begin{cases} \log \|A_n \ldots A_1\| & \text{if } n \ge 1, \\ 0 & \text{if } n = 0, \\ -\log \|A_0 \ldots A_{n+1}\| & \text{if } n \le -1, \end{cases} \tag{3.1} \]
where the matrices $A_n$ are defined in (2.10). If n ≥ 1, then obviously $\Phi_n \equiv \bar{S}_n(\zeta_1, \mathbf{1})$ defined in (2.34). The random function $\Phi_n$ is the analog of the potential considered first in [18]. For n ≥ a, a ∈ Z, put
\[ S_{a,n}(\omega; \psi_a, x_a) \equiv S_{a,n}(\omega) \overset{\mathrm{def}}{=} \log \|B_n \ldots B_a x_a\|, \tag{3.2} \]
where the matrices $B_n$ are defined by (2.7). Similarly to (2.35), one has that
\[ |S_{a,n}(\omega; \zeta_a, \mathbf{1}) - S_{a,n}(\omega; \psi_a, x_a)| \le C, \tag{3.3} \]
which implies:
\[ |S_{a,n}(\omega) - (\Phi_n(\omega) - \Phi_a(\omega))| \le C. \tag{3.4} \]
Since one of the conditions of Theorem 1 is $J_0 \not\subset J_{al}$, it follows from Theorem 6, part (iv) that $\Phi_n$ satisfies the invariance principle with a strictly positive parameter σ: σ > 0. The importance of the potential $\{\Phi_n\}_{n \in \mathbb{Z}}$ is due to the fact that it governs the stationary measure of our Markov chain; in fact it defines this stationary measure up to
a multiplication by a bounded function (see (3.7)). Namely, if a < b, we consider the Markov chain $\{\xi_t^{a,b}\}_{t \in \mathbb{N}}$ on
\[ S_{a,b} \overset{\mathrm{def}}{=} \{a, \ldots, b\} \times \{1, \ldots, m\} \tag{3.5} \]
with transition probabilities (1.4) and reflecting boundary conditions at $L_a$ and $L_b$. This means that we replace $(P_a, Q_a, R_a)$ by (I, 0, 0) and $(P_b, Q_b, R_b)$ by (0, I, 0). This reflecting chain has a unique stationary probability measure which we denote by $\pi_{a,b} = (\pi_{a,b}(k, i))_{(k,i) \in S_{a,b}}$. A description of this measure was given in [1]. We repeat it here for the convenience of the reader. To this end introduce row vectors $\nu_k \overset{\mathrm{def}}{=} Z\,(\pi_{a,b}(k, i))_{1 \le i \le m}$, a ≤ k ≤ b, where Z is a (normalizing) factor. In terms of these vectors the invariant measure equation reads
\[ \nu_k = \nu_{k-1}P_{k-1} + \nu_k R_k + \nu_{k+1}Q_{k+1} \ \text{ if } a < k < b, \quad \nu_a = \nu_{a+1}Q_{a+1}, \quad \nu_b = \nu_{b-1}P_{b-1}. \tag{3.6} \]
To solve Eq. (3.6), define for a ≤ k < b matrices $\alpha_k$ by
\[ \alpha_a \overset{\mathrm{def}}{=} Q_{a+1}, \quad \text{and} \quad \alpha_k \overset{\mathrm{def}}{=} Q_{k+1}(I - R_k - Q_k\psi_k)^{-1} \ \text{ when } a < k < b, \]
where $\{\psi_k\}_{k \ge a+1}$ are given by (2.6) with the initial condition $\psi_{a+1} = I$ (we take into account that $R_a = Q_a = 0$ in our case). We shall now check that $\nu_k$ can be found recursively as follows: $\nu_k = \nu_{k+1}\alpha_k$, a ≤ k < b, where $\nu_b$ satisfies $\nu_b\psi_b = \nu_b$. Indeed, the boundary condition at b in (3.6) reduces to $\nu_b = \nu_b\alpha_{b-1}P_{b-1} = \nu_b\psi_b$, where we use the fact that $\alpha_{b-1}P_{b-1} = \psi_b$ because $Q_b = I$ (and also due to (2.6)). But $\psi_b$ is an irreducible stochastic matrix and therefore $\nu_b > 0$ exists and is uniquely defined up to a multiplication by a constant. We now have for a < k < b that
\[ \nu_{k-1}P_{k-1} + \nu_k R_k + \nu_{k+1}Q_{k+1} = \nu_{k+1}(\alpha_k\alpha_{k-1}P_{k-1} + \alpha_k R_k + Q_{k+1}) = \nu_{k+1}\alpha_k(Q_k\psi_k + R_k + (I - R_k - Q_k\psi_k)) = \nu_{k+1}\alpha_k = \nu_k. \]
Finally $\nu_a = \nu_{a+1}Q_{a+1}$ with $\alpha_a = Q_{a+1}$ and this finishes the proof of our statement. We now have that
\[ \pi_{a,b}(k, \cdot) = \pi_{a,b}(b, \cdot)\,\alpha_{b-1}\alpha_{b-2} \cdots \alpha_k, \]
where as before $\pi_{a,b}(k, \cdot)$ is a row vector. Note next that
\[ \alpha_{b-1}\alpha_{b-2} \cdots \alpha_k = B_{b-1} \cdots B_{k+1}(I - R_k - Q_k\psi_k)^{-1}. \]
From this, we get $\|\pi_{a,b}(k, \cdot)\| \asymp \|B_{b-1} \cdots B_{k+1}\|\,\|\pi_{a,b}(b, \cdot)\|$, and using (3.2), (3.4), we obtain for a ≤ k, l ≤ b,
\[ \frac{\|\pi_{a,b}(k, \cdot)\|}{\|\pi_{a,b}(l, \cdot)\|} \asymp \exp[\Phi_k - \Phi_l]. \tag{3.7} \]
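We also consider the “mirror situation” by defining for $n \le a$ the matrices $\psi_n^-$ in a similar way as in (2.6) by setting
\[ \psi_{n-1}^- \overset{\mathrm{def}}{=} (I - R_n - P_n\psi_n^-)^{-1}Q_n, \quad n \le a, \]
and a boundary condition $\psi_a^-$. Then, as in Theorem 4 a), one has that $\zeta_n^- = \lim_{a\to\infty}\psi_n^-$ exists almost surely, and does not depend on the boundary condition $\psi_a^-$. We then put
\[ A_n^- \overset{\mathrm{def}}{=} (I - R_n - P_n\zeta_n^-)^{-1}P_n, \]
and define the potential $\Phi_n^-$ as in (3.1):
\[ \Phi_n^- \overset{\mathrm{def}}{=} \begin{cases} \log\|A_0^-\cdots A_{n-1}^-\| & \text{if } n \ge 1, \\ 0 & \text{if } n = 0, \\ -\log\|A_n^-\cdots A_{-1}^-\| & \text{if } n \le -1. \end{cases} \]
We could as well have worked with this potential, and therefore we obtain
\[ \frac{\|\pi_{a,b}(k,\cdot)\|}{\|\pi_{a,b}(l,\cdot)\|} \asymp \exp[\Phi_k^- - \Phi_l^-]. \]
As $\Phi_0 = \Phi_0^- = 0$, we get
\[ |\Phi_n - \Phi_n^-| \le C \tag{3.8} \]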
uniformly in $n$. It is convenient to slightly reformulate the invariance principle for the potential. For that consider $C_0(-\infty,\infty)$, the space of continuous functions $f : (-\infty,\infty) \to \mathbb{R}$ satisfying $f(0) = 0$. We equip $C_0(-\infty,\infty)$ with a metric for uniform convergence on compacta, e.g.
\[ d(f,g) \overset{\mathrm{def}}{=} \sum_{k=1}^{\infty} 2^{-k}\min\Bigl[1,\ \sup_{x\in[-k,k]}|f(x)-g(x)|\Bigr], \tag{3.9} \]
and write $\mathcal{B}$ for the Borel $\sigma$-field, which is also the $\sigma$-field generated by the evaluation mappings $C_0(-\infty,\infty) \to \mathbb{R}$. We also write $P_W$ for the law of the double-sided Wiener measure on $C_0(-\infty,\infty)$. For $n \in \mathbb{N}$, we define
\[ W_n\Bigl(\frac{k\sigma^2}{n}\Bigr) \overset{\mathrm{def}}{=} \frac{\Phi_k}{\sqrt{n}}, \quad k \in \mathbb{Z}, \]
and define $W_n(t)$, $t \in \mathbb{R}$, by linear interpolation. $W_n$ is a random variable taking values in $C_0(-\infty,\infty)$. Weak convergence of $\{W_n(t)\}_{t\in\mathbb{R}}$ on $C_0(-\infty,\infty)$ is the same as weak convergence of $\{W_n(t)\}_{t\in[-N,N]}$ for any $N \in \mathbb{N}$, and therefore we immediately get

Proposition 7. $W_n$ converges in law to $P_W$.

Let $V$ be the subset of functions $f \in C_0(-\infty,\infty)$ for which there exist real numbers $a < b < c$ satisfying
Lingering Random Walks in Random Environment on a Strip
1. $0 \in (a,c)$.
2. $f(a) - f(b) = f(c) - f(b) = 1$.
3. $f(a) > f(x) > f(b)$ for all $x \in (a,b)$, and $f(c) > f(x) > f(b)$ for all $x \in (b,c)$.
4. For any $\gamma > 0$,
\[ \sup_{x\in(a-\gamma,\,a)} f(x) > f(a), \qquad \sup_{x\in(c,\,c+\gamma)} f(x) > f(c). \]
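It is clear that for $f \in V$, $a, b, c$ are uniquely defined by $f$, and we occasionally write $a(f), b(f), c(f)$. $f(b)$ is the unique minimum of $f$ in $[a,c]$. It is easy to prove that $V \in \mathcal{B}$ and $P_W(V) = 1$. If $\delta > 0$ and $f \in V$, we define
\[ c_\delta(f) \overset{\mathrm{def}}{=} \inf\{x > c : f(x) = f(c) + \delta\}, \qquad a_\delta(f) \overset{\mathrm{def}}{=} \sup\{x < a : f(x) = f(a) + \delta\}. \]
If $\gamma > 0$, we set $V_{\delta,\gamma}$ to be the set of functions $f \in V$ such that
1. \[ c_\delta(f) \le 1/\delta, \qquad a_\delta(f) \ge -1/\delta. \tag{3.10} \]
2. \[ \sup_{b\le x<y\le c_\delta}[f(x) - f(y)] \le 1 - \delta, \tag{3.11} \]
\[ \sup_{a_\delta\le y<x\le b}[f(x) - f(y)] \le 1 - \delta. \tag{3.12} \]
3. \[ \inf_{x\in[a_\delta,c_\delta]\setminus(b-\gamma,\,b+\gamma)} f(x) \ge f(b) + \delta. \tag{3.13} \]
For any fixed $\gamma > 0$, we have $V_{\delta,\gamma} \uparrow V$ for $\delta \downarrow 0$, and therefore, for any $\gamma, \eta > 0$ we can find $\delta_0(\gamma,\eta)$ such that for $\delta \le \delta_0$, $P_W(V_{\delta,\gamma}) \ge 1 - \eta$. It is easy to see that $P_W(\partial V_{\delta,\gamma}) = 0$, where $\partial$ refers to the boundary in $C_0(-\infty,\infty)$. Therefore, given $\gamma, \eta > 0$, we can find $N_0(\gamma,\eta)$ such that for $n \ge N_0$, $\delta \le \delta_0$, we have
\[ P(W_n \in V_{\delta,\gamma}) \ge 1 - 2\eta. \tag{3.14} \]
For $t \in \mathbb{N}$, we set $n = n(t) \overset{\mathrm{def}}{=} \log^2 t$. If $W_{n(t)} \in V_{\delta,\gamma}$, then we put
\[ b_t \overset{\mathrm{def}}{=} \frac{b(W_{n(t)})\log^2 t}{\sigma^2}, \qquad a_t \overset{\mathrm{def}}{=} \frac{a_\delta(W_{n(t)})\log^2 t}{\sigma^2}, \qquad c_t \overset{\mathrm{def}}{=} \frac{c_\delta(W_{n(t)})\log^2 t}{\sigma^2}. \]
Remark that on $\{W_{n(t)} \in V_{\delta,\gamma}\}$, we have the following properties, translated from (3.10)–(3.13):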
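\[ c_t \le \frac{\log^2 t}{\sigma^2\delta}, \qquad a_t \ge -\frac{\log^2 t}{\sigma^2\delta}, \tag{3.15} \]
\[ \Phi_s - \Phi_{s'} \le (1-\delta)\log t, \quad b_t \le s < s' \le c_t, \tag{3.16} \]
\[ \Phi_s - \Phi_{s'} \le (1-\delta)\log t, \quad a_t \le s' < s \le b_t, \tag{3.17} \]
\[ \Phi_s \ge \Phi_{b_t} + \delta\log t, \quad s \in [a_t,c_t]\setminus(b_t - \gamma\log^2 t,\ b_t + \gamma\log^2 t), \tag{3.18} \]
\[ \min(\Phi_{a_t},\Phi_{c_t}) - \Phi_{b_t} \ge (1+\delta)\log t. \tag{3.19} \]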
Furthermore, if $0 \in [a_t,b_t]$, then
\[ \sup_{0\le s\le b_t}\Phi_s - \Phi_{b_t} \le \log t, \tag{3.20} \]
and similarly if $0 \in [b_t,c_t]$. (We neglect the trivial issue that $a_t, b_t, c_t$ may not be in $\mathbb{Z}$.) The main result is

Proposition 8. For $\omega \in \{W_{n(t)} \in V_{\delta,\gamma}\}$, we have for any $i \in \{1,\dots,m\}$,
\[ \Pr\nolimits_{\omega,(0,i)}\bigl(X(t) \notin (b_t - \gamma\log^2 t,\ b_t + \gamma\log^2 t)\bigr) \le 4t^{-\delta/2}, \]
if $t$ is large enough.

Together with (3.14), this proves our main result, Theorem 1. In all that follows, we keep $\gamma, \delta$ fixed, and assume that $\omega \in \{W_{n(t)} \in V_{\delta,\gamma}\}$. We will also suppress $\omega$ in the notation, and will take $t$ large enough, according to ensuing necessities. We first prove several estimates of probabilities characterizing the behaviour of a RW in a finite box in terms of the properties of the function $S_n$.
Lemma 9. Consider a random walk on $S_{a,b}$ with reflecting boundary conditions (see the discussion around (3.5)), and let $a < k < b$. Then
\[ \Pr\nolimits_{(k,i)}(\tau_a < \tau_b) \le C\sum_{y=k}^{b}\exp(\Phi_y - \Phi_a), \tag{3.21} \]
\[ \Pr\nolimits_{(k,i)}(\tau_b < \tau_a) \le C\sum_{y=a}^{k}\exp(\Phi_y - \Phi_a). \tag{3.22} \]
Here $\tau_a, \tau_b$ are the hitting times of the layers $L_a, L_b$.

Proof. We only have to prove (3.21). Equation (3.22) then follows in the mirrored situation and using (3.8).
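Put $h_k(i) \overset{\mathrm{def}}{=} \Pr_{(k,i)}(\tau_b < \tau_a)$ and consider column vectors $h_k = (h_k(i))_{1\le i\le m}$. In order to find $h_k$ we introduce the matrices $\varphi_{k+1} \overset{\mathrm{def}}{=} (\varphi_{k+1}(i,j))_{1\le i,j\le m}$, where
\[ \varphi_{k+1}(i,j) \overset{\mathrm{def}}{=} \Pr\nolimits_{\omega,(k,i)}\bigl(\tau_{k+1} < \tau_a,\ \xi(\tau_{k+1}) = (k+1,j)\bigr). \tag{3.23} \]
These matrices satisfy (2.24) (with $a = 0$) with the modified boundary condition $\varphi_{a+1} = 0$. Equation (2.29) with $\psi_k$'s defined by (2.6) now yields $\Delta_{k+1} = B_k\cdots B_{a+1}\psi_{a+1}\varphi_{a+2}\cdots\varphi_{k+1}$, and hence
\[ \|\Delta_{k+1}\| \le \|B_k\cdots B_a\| \le C\exp(\Phi_k - \Phi_a). \tag{3.24} \]
The Markov property also implies that $h_k = \varphi_{k+1}h_{k+1}$, and hence
\[ h_k = \varphi_{k+1}\varphi_{k+2}\cdots\varphi_b\mathbf{1} \quad \text{since } h_b = \mathbf{1}. \tag{3.25} \]
We view the probabilities $\Pr_{(k,\cdot)}(\tau_a < \tau_b)$ as the column vector $\mathbf{1} - h_k$. Then, writing $\varphi_b = \psi_b - \Delta_b$, we have
\[ \Pr\nolimits_{(k,\cdot)}(\tau_a < \tau_b) = \mathbf{1} - \varphi_{k+1}\cdots\varphi_b\mathbf{1} = \mathbf{1} - \varphi_{k+1}\cdots\varphi_{b-1}(\psi_b - \Delta_b)\mathbf{1} = \mathbf{1} - \varphi_{k+1}\cdots\varphi_{b-1}\mathbf{1} + \varphi_{k+1}\cdots\varphi_{b-1}\Delta_b\mathbf{1} \le \mathbf{1} - \varphi_{k+1}\cdots\varphi_{b-1}\mathbf{1} + \|\Delta_b\|\mathbf{1}. \]
Iterating this inequality, we obtain that
\[ \Pr\nolimits_{(k,\cdot)}(\tau_a < \tau_b) \le \sum_{y=k+1}^{b}\|\Delta_y\|\mathbf{1}, \]
and (3.21) follows from (3.24).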
Lemma 10. Let a < b, and τ be the hitting time of L a ∪ L b – the union of two layers. Then if a ≤ k ≤ b, we have E (k,i) (τ ) ≤ C(b − a)2 exp min
sup a≤s τb ). First we see that from Lemma 9, and (3.15), (3.19), (3.20), Pr(0,i) (τb > τa ) ≤ C (b − a) exp ≤
sup x − a
(3.31)
0≤x≤b
C log2 t exp −δ log t ≤ t −δ/2 , 2 σ δ
if t is large enough, and from Lemma 11 and (3.17), C log4 t exp supa≤s t, τa > τb ) ≤
(3.32)
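By the Markov property, we get
\[ \Pr\nolimits_{(0,i)}\bigl(X(t) \notin J_t,\ \tau_b < \min(\tau_a,t)\bigr) \le \max_{s\le t,\ 1\le j\le m}\Pr\nolimits_{(b,j)}(X(s) \notin J_t). \tag{3.33} \]
Now
\[ \Pr\nolimits_{(b,j)}(X(s) \notin J_t) \le \Pr\nolimits_{(b,j)}(\min(\tau_a,\tau_c) \le t) + \Pr\nolimits_{(b,j)}\bigl(X^{(a,c)}(s) \notin J_t\bigr), \tag{3.34} \]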
where X (a,c) is the chain with reflecting boundary conditions at L a and L c . The second summand is estimated by Lemma 12 and (3.18), which give Pr(b, j) X (a,c) (s) ∈ (3.35) / Jt ≤ C exp sup l − b ≤ Ct −δ ≤ t −δ/2 . l ∈J / t
To estimate the first summand in (3.34) we observe that by (3.19),
\[ \Pr\nolimits_{(b-1,i)}(\tau_a < \tau_b) \le C\exp[-\Phi_a]\bigl(\exp\Phi_{b-1} + \exp[\Phi_b]\bigr) \le C\exp[-(1+\delta)\log t] \le t^{-1-2\delta/3}, \]
and similarly $\Pr_{(b+1,i)}(\tau_c < \tau_b) \le t^{-1-2\delta/3}$. If, starting in $(b,j)$, the chain reaches $L_a$ or $L_c$ in time $t$, there is at least one among the first $t/2$ of the excursions from $L_b$ which reaches $L_a \cup L_c$. By the above estimates, each such excursion has at most probability $t^{-1-2\delta/3}$ to be “successful”, and therefore
\[ \Pr\nolimits_{(b,j)}(\min(\tau_a,\tau_c) \le t) \le 1 - \bigl(1 - t^{-1-2\delta/3}\bigr)^{t/2} \le t^{-\delta/2}. \tag{3.36} \]
Combining (3.30)–(3.36), we get
\[ \Pr\nolimits_{(0,i)}(X(t) \notin J_t) \le 4t^{-\delta/2}. \]
This proves the claim.

4. Appendix

Most (if not all) of the results in this appendix are not new. The main reason for including them is that we want to present them in the form which is needed for our purpose; this is particularly relevant in the case of Markov chains generated by contracting transformations. We also hope that a more self-contained paper makes for easier reading.

4.1. The CLT and the invariance principle (IP) for stationary Markov chains. We first recall, in Subsect. 4.1.1, the classical results of B. M. Brown [4] about the CLT and the IP for martingales. We then explain in Subsect. 4.1.2 that the reduction of the proof of the CLT for Markov chains to the martingale case invented by Gordin and Lifshits [10] can be easily extended to obtain the IP for Markov chains. Finally, in Subsect. 4.1.3, we prove that the Gordin-Lifshits conditions are satisfied for a class of Markov chains generated by contracting transformations.
4.1.1. The CLT and the IP for martingales (by B. M. Brown [4]). Let $\{S_n, \mathcal{F}_n\}$, $n = 1, 2, \dots$, be a martingale on the probability space $(\Omega, \mathcal{F}, P)$. Put $U_n = S_n - S_{n-1}$ with $S_0 = 0$. The expectation with respect to $P$ is denoted by $E$, and $E_{j-1}$ stands for the conditional expectation $E(\cdot \mid \mathcal{F}_{j-1})$. Let $\sigma_n^2 = E_{n-1}(U_n^2)$, $V_n^2 = \sum_{j=1}^{n}\sigma_j^2$, and $s_n^2 = E(V_n^2) = E(S_n^2)$. The main assumption in [4] concerning martingales is:
\[ V_n^2 s_n^{-2} \to 1 \ \text{ in probability as } n \to \infty. \tag{4.1} \]
We say that the Lindeberg condition holds for the class of martingales satisfying (4.1) if for any $\varepsilon > 0$,
\[ s_n^{-2}\sum_{j=1}^{n} E\,U_j^2\, I(|U_j| \ge \varepsilon s_n) \to 0 \ \text{ as } n \to \infty, \tag{4.2} \]
where $I(\cdot)$ is the indicator function of a set. For $t \in [0,1]$ define a sequence of piecewise linear random functions
\[ u_n(t) = s_n^{-1}\bigl\{S_k + U_{k+1}(ts_n^2 - s_k^2)(s_{k+1}^2 - s_k^2)^{-1}\bigr\} \quad \text{if } s_k^2 \le ts_n^2 \le s_{k+1}^2,\ k = 0, 1, \dots, n-1. \tag{4.3} \]
The following two theorems from [4] describe the asymptotic behaviour of the sequences $S_n$ and $u_n(\cdot)$.

Theorem 9. If (4.1) and (4.2) hold, then $S_n$ is asymptotically normal:
\[ \lim_{n\to\infty} P\{s_n^{-1}S_n \le x\} = (2\pi)^{-\frac12}\int_{-\infty}^{x} e^{-\frac12 y^2}\,dy \tag{4.4} \]
for all $x$. Furthermore, all finite dimensional distributions of $u_n(t)$ converge weakly, as $n \to \infty$, to those of a standard Wiener process $W(t)$ on $0 \le t \le 1$ (that is, $W(0) = 0$ and $EW^2(1) = 1$).

Theorem 10. Let $\{C[0,1], \mathcal{B}, P_W\}$ be the probability space where $C[0,1]$ is the space of continuous functions with the sup norm topology, $\mathcal{B}$ being the Borel $\sigma$-algebra generated by open sets in $C[0,1]$, and $P_W$ the Wiener measure. Let $\{P_n\}$ be the sequence of probability measures on $\{C[0,1], \mathcal{B}\}$ determined by the distribution of $\{u_n(t),\ 0 \le t \le 1\}$. Then if (4.1) and (4.2) hold, $P_n \to P_W$ weakly as $n \to \infty$.

4.1.2. The CLT and the IP for general Markov chains. In their famous work [10], Gordin and Lifshits reduced the proof of the CLT for Markov chains to that of martingales. They then applied the same approach to the proof of the invariance principle for Markov chains in [11]. We shall explain their method here for the sake of completeness.

Let $z_k$, $k = 1, 2, \dots$, be a stationary ergodic Markov chain with a phase space $(X, \mathcal{A})$, transition kernel $K(z, dy)$, and initial distribution $\kappa$. Let $f : X \to \mathbb{R}$ be a real valued function on $X$ such that $E f(z) = 0$ and $\mathrm{Var} f(z) < \infty$ (all expectations are taken with respect to the measure $\kappa$). Let $L_2(X, \mathcal{A}, \kappa)$ be the natural Hilbert space associated with $X, \mathcal{A}, \kappa$. By $I$ we denote the identity operator in this space, and by $A$ the transition operator of the Markov chain: $AF(z) \overset{\mathrm{def}}{=} \int_X F(y)K(z, dy)$. Put
\[ S_n = f(z_1) + \cdots + f(z_n) \ \text{ with the convention } S_0 = 0. \tag{4.5} \]
Theorem 11. Let $z_k$ be a Markov chain as described above and suppose that the function $f$ with $Ef = 0$ can be represented as $f = (I - A)F$, where $F \in L_2(X, \mathcal{A}, \kappa)$ and $EF = 0$. Put $\sigma^2 = \|F\|^2 - \|AF\|^2 \equiv EF^2 - E(AF)^2$ and suppose that $\sigma > 0$. Then $S_n/(\sigma\sqrt{n})$ converges in law towards the standard Gaussian distribution $N(0,1)$ and the sequence $S_n$ satisfies the invariance principle with parameter $\sigma$ in the sense of the definition given in Sect. 2.4.

Proof. Consider the identity which is due to Gordin ([9]) and was used by Gordin and Lifshits in [10]:
\[ f(z_k) = U(z_k, z_{k+1}) + F(z_k) - F(z_{k+1}), \quad \text{where } U(z_k, z_{k+1}) = F(z_{k+1}) - (AF)(z_k). \]
This identity holds true because of the conditions imposed on $f$. Obviously, $E\{U(z_k, z_{k+1}) \mid z_k, \dots, z_1\} = 0$. Denote $U_{k+1} = U(z_k, z_{k+1})$. In this notation we can write $S_n = \hat{S}_n + F(z_1) - F(z_{n+1})$, where $\hat{S}_n = \sum_{k=1}^{n} U_k$. It is clear that if $\mathcal{F}_n$ is the $\sigma$-algebra generated by the variables $z_1, \dots, z_n$, then the sequence $\hat{S}_n$, $n = 1, 2, \dots$, is a martingale with respect to the filtration $\mathcal{F}_n$, $n = 1, 2, \dots$. Let us check that all conditions required by Theorems 9 and 10 are satisfied. Indeed, $\sigma_j^2 = E\{U_j^2 \mid z_j\} = (AF^2)(z_j) - [(AF)(z_j)]^2$ is a stationary sequence with $E\sigma_j^2 = \|F\|^2 - \|AF\|^2 = \sigma^2$. Relation (4.1) takes the form
\[ (n\sigma^2)^{-1}\sum_{j=1}^{n}\sigma_j^2 \to 1 \]
and is satisfied with probability 1 because of the Birkhoff Ergodic Theorem. The Lindeberg condition (4.2) takes the form $E\,U_1^2\, I(|U_1| \ge \varepsilon\sqrt{n\sigma^2}) \to 0$ as $n \to \infty$, and is obviously satisfied. Finally, functions (4.3) are now given by
\[ u_n(t) = n^{-\frac12}\sigma^{-1}\bigl(S_k + (tn - k)U_{k+1}\bigr) \quad \text{if } k \le tn \le k+1,\ k = 0, 1, \dots, n-1, \]
and hence for $k \le tn \le k+1$,
\[ v_n(t) = u_n(t) + n^{-\frac12}\sigma^{-1}\bigl(F(z_1) - F(z_{k+1}) + (tn - k)(F(z_k) - F(z_{k+1}))\bigr), \]
where $v_n(t)$ is as in (2.17). Since $F$ is square integrable and $z_n$ is a stationary sequence, it follows that $n^{-\frac12}\max_{1\le k\le n}|F(z_k)| \to 0$ with probability 1 as $n \to \infty$. Hence also $\sup_{0\le t\le 1}|v_n(t) - u_n(t)| \to 0$ as $n \to \infty$ with probability 1. All statements of our theorem now follow from Theorems 9 and 10.

4.1.3. The CLT and the IP for Markov chains generated by contracting transformations. Consider the following setup: $(\Omega, \mathcal{F}, P)$ is a probability space; the related expectation is denoted $E$. $M$ is a compact metric space equipped with a distance $\rho(\cdot,\cdot)$. $\mathcal{B}$ is a semigroup of continuous Lipschitz transformations of $M$: for any $g \in \mathcal{B}$ there is a constant $l_g$ such that $\rho(g.y, g.y') \le l_g\rho(y, y')$ for any $y, y' \in M$. Here and in the
sequel $g.y$ denotes the result of the action of $g \in \mathcal{B}$ on $y \in M$; this notation will be used most of the time but in some cases we may write $g(y)$ rather than $g.y$.

For any $g_1, g_2 \in \mathcal{B}$ put $\bar\rho(g_1, g_2) \overset{\mathrm{def}}{=} \sup_{y\in M}\rho(g_1.y, g_2.y)$. Obviously, $\bar\rho(\cdot,\cdot)$ defines a distance on $\mathcal{B}$. We can now consider a Borel sigma-algebra generated by the corresponding open subsets of $\mathcal{B}$; this sigma-algebra will be denoted by $\mathcal{S}$.

Consider a measurable mapping $g : \Omega \to \mathcal{B}$, $\omega \mapsto g^\omega$, and for a $B \in \mathcal{S}$ put $\mu(B) \overset{\mathrm{def}}{=} P\{\omega : g^\omega \in B\}$. We say that $g$ is a random transformation of $M$. Let $g_k \in \mathcal{B}$, $k \ge 1$, be a sequence of independent copies of $g$. Without loss of generality we can assume that the $g_k$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$.

Denote by $g^{(j)} \overset{\mathrm{def}}{=} g_j \cdots g_1$ the product of random transformations $g_1, \dots, g_j$ and let $\mu^{(j)}$ be the probability distribution of the product $g^{(j)}$. This measure on $\mathcal{B}$ is often called the $j$th convolution power of the measure $\mu$ and is denoted by $\mu^{(j)} = \mu^{*j} = \mu * \cdots * \mu$ ($j$ times). A sequence of random transformations $g_k$ is said to be contracting if there are constants $C > 0$ and $c$, $0 \le c < 1$, such that for any $y, y' \in M$ and any $n \ge 1$,
\[ \int_{\mathcal{B}}\rho(g.y, g.y')\,\mu^{(n)}(dg) \equiv E\rho(g_n\cdots g_1.y,\ g_n\cdots g_1.y') \le Cc^n. \tag{4.6} \]
Remarks. Perhaps it would be more natural to say that the contraction property holds if $\int_{\mathcal{B}}\rho(g.y, g.y')\,\mu^{(n)}(dg) \le Cc^n\rho(y, y')$. However, (4.6) is sufficient for our purposes and is what we check in our applications.

As usual, products of random transformations generate a Markov chain with a state space $M$. Namely, let $\nu \equiv \nu(dy)$ be a probability measure on $M$ and let $y_1 \in M$ be chosen randomly according to the distribution $\nu$ and independent of all $g_j$'s. For $k \ge 1$ define $y_{k+1} \in M$ by $y_{k+1} \overset{\mathrm{def}}{=} g_k.y_k \equiv g^{(k)}.y_1$. The sequence of pairs $(g_k, y_k)$, $k \ge 1$, forms a Markov chain with a phase space $\mathcal{B} \times M$; this chain will be denoted $(g, y)$. Note that the $(y)$-component of this chain, the sequence $y_k$, $k \ge 1$, is itself a Markov chain with the phase space $M$. Since $M$ is a compact space the chain $(y)$ has an invariant measure; we shall suppose from now on that $\nu$ is such a measure which, in turn, implies that $\mu(dg)\nu(dy)$ is an invariant measure of the chain $(g, y)$. It is well known (and easy to see) that if $g_k$ is a contracting sequence of random transformations then the Markov chain $(y)$ has a unique invariant measure.

Let $L_2(\mathcal{B} \times M)$ be the Hilbert space of $\mu \times \nu$ square integrable real valued functions and $C(\mathcal{B} \times M)$ be its subset of continuous functions. Given an $f \in C(\mathcal{B} \times M)$ let $S_n$ denote the related Birkhoff sums along a trajectory of the Markov chain $(g, y)$:
\[ S_n = \sum_{k=1}^{n} f(g_k, y_k). \]
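By $A$ we denote the following Markov operator acting in $L_2(\mathcal{B} \times M)$ and preserving $C(\mathcal{B} \times M)$:
\[ (Af)(g, y) \overset{\mathrm{def}}{=} \int_{\mathcal{B}} f(g', g.y)\,\mu(dg'). \tag{4.7} \]
It follows from (4.7) that
\[ (A^k f)(g, y) = \int_{\mathcal{B}\times\mathcal{B}} f(g', \tilde{g}g.y)\,\mu(dg')\,\mu^{(k-1)}(d\tilde{g}). \tag{4.8} \]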
Theorem 12. Suppose that the sequence of random transformations $g_k$ is contracting and $f$ is a continuous bounded function on $\mathcal{B} \times M$ such that
(i) $\int_{\mathcal{B}} f(g, y)\,\mu(dg)$ is Lipschitz on $M$, that is, for some $C_f$,
\[ \Bigl|\int_{\mathcal{B}}\bigl(f(g, y) - f(g, y')\bigr)\mu(dg)\Bigr| \le C_f\,\rho(y, y'). \]
(ii) $\int_{\mathcal{B}\times M} f(g, y)\,\mu(dg)\nu(dy) = 0$.
Then the equation
\[ (I - A)F = f \tag{4.9} \]
has a solution $F(g, y)$ which is continuous on $\mathcal{B} \times M$ and
\[ \int_{\mathcal{B}\times M} F(g, y)\,\mu(dg)\nu(dy) = 0. \]
Besides, this solution is unique in $L_2(\mathcal{B} \times M)$. Denote
\[ \sigma^2 = \int_{\mathcal{B}\times M}\bigl(AF^2 - (AF)^2\bigr)(g, y)\,\mu(dg)\nu(dy). \]
If $\sigma > 0$ then $S_n/(\sigma\sqrt{n})$ converges in law towards the standard Gaussian distribution $N(0,1)$ and the sequence $S_n$ satisfies the invariance principle with parameter $\sigma$. If $\sigma > 0$ and, in addition to (i), $|f(g, y) - f(g, y')| \le C_f(g)\rho(y, y')$ with $\int \log(1 + C_f(g))\,\mu(dg) < \infty$, then the invariance principle for the sequence $S_n$ is satisfied uniformly in $y_1 \in M$. If $\sigma = 0$, then the function $F(g, y)$ depends only on $y$ and for every $(g, y)$ in the support of $\mu \times \nu$ one has
\[ f(g, y) = F(y) - F(g.y). \tag{4.10} \]
Proof. The existence of $F$. Equation (4.9) can be rewritten as $F = AF + f$ and, iterating this relation, one obtains a formal series:
\[ F = \sum_{k=0}^{\infty} A^k f. \tag{4.11} \]
Condition (ii) of the theorem and the invariance of the measure $\mu(dg)\nu(dy)$ imply that
\[ \int_{\mathcal{B}\times M}(A^k f)(g, y)\,\mu(dg)\nu(dy) = \int_{\mathcal{B}\times M} f(g, y)\,\mu(dg)\nu(dy) = 0. \]
Hence, the convergence in (4.11) would follow if we prove that k
¯ y¯ )| ≤ const c n0 for any (g, y), (g, ¯ y¯ ) ∈ support of µ × ν. |(Ak f )(g, y) − (Ak f )(g, (4.12) But it follows from (4.8) and condition (i) of the theorem that |(Ak f )(g, y) − (Ak f )(g, ¯ y¯ )|
f (g , gg.y) ˜ − f (g , g˜ g. ¯ y¯ ) µ(dg ) µ(k−1) (d g) ˜ = B
B ≤ Cf ρ(gg.y, ˜ g˜ g. ¯ y¯ )µ(k−1) (d g) ˜ ≤ C cn , B
where the last inequality is due to the contraction property (4.6). The existence and continuity of $F(g, y)$ is proved.

Uniqueness. As usual, to prove the uniqueness we have to show that the homogeneous equation $F = AF$ has only the trivial solution $F \equiv 0$ in the class of functions satisfying the condition $\int_{\mathcal{B}\times M} F(g, y)\,\mu(dg)\nu(dy) = 0$. To check that this is the case assume that, to the contrary, there is an $F \in L_2(\mathcal{B} \times M)$ such that $F \not\equiv 0$, satisfies the homogeneous equation, and has a zero mean value. For a given $\varepsilon > 0$ find a function $\tilde{F}$ which is Lipschitz on $\mathcal{B} \times M$ and approximates $F$ in the sense that $\|F - \tilde{F}\| \le \varepsilon$, where $\|\cdot\|$ denotes the $L_2(\mathcal{B} \times M)$ norm. The $\tilde{F}$ can always be chosen so that $\int_{\mathcal{B}\times M}\tilde{F}(g, y)\,\mu(dg)\nu(dy) = 0$. Next, for any $n \ge 1$,
\[ F = A^n F = A^n(F - \tilde{F}) + A^n\tilde{F}. \]
But then $A^n\tilde{F} \to 0$ uniformly in $(g, y)$ and $\|A^n(F - \tilde{F})\| \le \varepsilon$. Since $\varepsilon$ can be made arbitrarily small, we conclude that $F \equiv 0$.

Proof of the CLT and the IP in the case $\sigma > 0$. According to Theorem 11 the existence of $F \in L_2(\mathcal{B} \times M)$ satisfying Eq. (4.9) is the main condition under which both the Central Limit Theorem and the Invariance Principle hold for Birkhoff sums picked up along a realization of a trajectory of a Markov chain. The ergodicity of the Markov chain is the other condition which is needed and which in our case follows from the contraction property. The CLT and the IP are thus proved.

Proof of the uniform IP in the case $\sigma > 0$. We write $S_n(y_1)$ for $S_n$ in order to emphasize the dependence of this sequence on $y_1$. Clearly,
\[ |S_n(y_1) - S_n(y_1')| \le \sum_{k=1}^{n}|f(g_k, y_k) - f(g_k, y_k')| \le \sum_{k=1}^{\infty} C_f(g_k)\,\rho(y_k, y_k'). \tag{4.13} \]
It follows from (4.6) (due to the Chebyshev inequality) that $P$-almost surely $\rho(y_k, y_k') \le e^{-\varepsilon k}$ for some $\varepsilon > 0$ and $k \ge k(\varepsilon, \omega)$. It is essential that $k(\varepsilon, \omega)$ does not depend on $y_1, y_1'$. Next, due to the condition imposed on the function $f$, the sequence $k^{-1}\log(1 + C_f(g_k)) \to 0$ as $k \to \infty$ $P$-almost surely. Hence the right-hand side of (4.13) is $P$-almost surely bounded and the corresponding estimate does not depend on $y_1, y_1'$. Let us now consider the dependence on $y_1$ of the relevant $v_n(t) = v_n(t; y_1)$ (see (2.17)). For $t \in [0,1]$ and $k \le tn \le k+1$, $k = 0, 1, \dots, n-1$, we have:
\[ v_n(t; y_1) - v_n(t; y_1') = n^{-\frac12}\bigl(S_k(y_1) - S_k(y_1') + (f_{k+1}(y_1) - f_{k+1}(y_1'))(tn - k)\bigr) \]
with the obvious meaning of $f_{k+1}(y_1)$ and $f_{k+1}(y_1')$. It is now clear that $P$-almost surely $v_n(t; y_1) - v_n(t; y_1') \to 0$ as $n \to \infty$ uniformly in $y_1, y_1'$. This proves the uniformity of the invariance principle.

The case $\sigma = 0$. Note that
\[ \bigl(AF^2 - (AF)^2\bigr)(g, y) = \int_{\mathcal{B}}\Bigl(F(g', g.y) - \int_{\mathcal{B}} F(\tilde{g}, g.y)\,\mu(d\tilde{g})\Bigr)^2\mu(dg'). \]
Hence $\sigma = 0$ implies that for $\mu \times \nu$-almost all $(g, y)$ and $\mu$-almost all $g'$,
\[ F(g', g.y) = \int_{\mathcal{B}} F(\tilde{g}, g.y)\,\mu(d\tilde{g}). \tag{4.14} \]
But $F(\cdot,\cdot)$ is a continuous function of both variables and hence (4.14) holds for any $(g, y)$ from the support of $\mu \times \nu$. This proves that $F$ depends only on the second variable: $F(g', g.y) \equiv F(g.y)$ (we note that $g.y$ runs over the whole of the support of $\nu$ when $(g, y)$ runs over the support of $\mu \times \nu$). Finally, one obtains (4.10) by substituting $F(y)$ (rather than $F(g, y)$) into (4.9).

4.1.4. Markov chains generated by contracting transformations: characterization of the support of the invariant measure. The aim of this section is to give a characterization of the support of an invariant measure of a Markov chain generated by contracting transformations in terms of fixed points of these transformations. We work here within the same setup as in Sect. 4.1.3. This applies to the sequence $g_j$, $j \ge 1$, the metric space $(M, \rho)$, the semigroup $\mathcal{B}$ of transformations of $M$, and the Markov chain $y_j$ defined by $y_{j+1} = g_j.y_j$, $j \ge 1$ (with $y_1$ being a random element independent of all $g_j$'s). However, we shall suppose that $\mathcal{B}$ is generated by the transformations belonging to the support $J_0$ of the distribution $\mu$ of the $g_j$'s. This difference is important for Lemma 14. Let $\nu$ be the stationary measure of our chain and $M_0$ be the support of $\nu$. As usual, we say that a transformation $g \in \mathcal{B}$ is a contraction on a subset $M_0 \subset M$ if there is an $n \ge 1$ and a $c \in [0,1)$ (both $n$ and $c$ may depend on $g$) such that $\rho(g^n.x', g^n.x'') \le c\,\rho(x', x'')$ for any $x', x'' \in M_0$. If $g \in \mathcal{B}$, then by $x_g$ we denote a fixed point of the transformation $g$: $g.x_g = x_g$.

Lemma 13. If $g \in \mathcal{B}$ is a contraction on $M$ then its fixed point $x_g \in M$ belongs to the support $M_0$ of the invariant measure $\nu$ of the Markov chain $y_j$.

Proof. Consider a random infinite sequence $g_1, g_2, \dots$. Since $g \in J_0$, almost every such sequence has the property that for any $k \ge 1$ and any $\delta > 0$ there are infinitely many $i$'s such that each element of the part $g_i, \dots, g_{i+nk-1}$ of the sequence approximates $g$ so closely that
\[ \bar\rho\bigl(g^{nk}, g_i^{(nk)}\bigr) \le \delta, \quad \text{where } g_i^{(nk)} \overset{\mathrm{def}}{=} g_{i+nk-1}\cdots g_i. \]
Moreover, by the law of large numbers these $i$'s have a positive frequency. Since $\rho(x_g, g^{nk}.x) = \rho(g^{nk}.x_g, g^{nk}.x) \le c^k\rho(x_g, x)$ for any $x \in M$, we have that
\[ \rho\bigl(x_g, g_i^{(nk)}.x\bigr) \le c^k\rho(x_g, x) + \rho\bigl(g^{nk}.x, g_i^{(nk)}.x\bigr) \le c^k\rho(x_g, x) + \delta. \]
Hence any (small) neighbourhood of $x_g$ is visited by the sequence $g_1^{(j)}.x$, $j \ge 1$, infinitely many times and, moreover, this happens with a positive frequency for almost every sequence $g_j$, $j \ge 1$. This implies that $x_g \in M_0$ and $(g, x_g) \in J_0 \times M_0$.

Note that if the invariant measure $\nu$ of our Markov chain is ergodic, then the support $M_0$ of this measure is a minimal set of $\mathcal{B}$. The latter by definition means that the orbit $\{g.x : g \in \mathcal{B}\}$ of any $x \in M_0$ is everywhere dense in $M_0$.

Lemma 14. Let $M_0 \subset M$ be a minimal set of $\mathcal{B}$. Suppose that there exists a $\hat{g} \in \mathcal{B}$ which is a contraction on $M_0$. Consider the set of all fixed points of $\mathcal{B}$ belonging to $M_0$:
\[ \mathrm{Fix}_{M_0}(\mathcal{B}) \overset{\mathrm{def}}{=} \{x : x \in M_0 \text{ and there is a } g \in \mathcal{B} \text{ such that } g.x = x\}. \]
Then $\mathrm{Fix}_{M_0}(\mathcal{B})$ is everywhere dense in $M_0$.
Proof. The contraction $\hat{g}$ given to us by the condition of the lemma has a fixed point $\hat{x} \in M_0$ (it may have other fixed points too, but we are interested only in this one). Since $M_0$ is minimal it coincides with the closure of the orbit $\{g.\hat{x} : g \in \mathcal{B}\}$. For a given $g \in \mathcal{B}$ let us consider the point $g.\hat{x}$. We shall now show that for a sufficiently large $n$ the transformation $g\hat{g}^n$ has a fixed point which we shall denote $x_{g\hat{g}^n}$. Indeed, for any $x', x'' \in M_0$,
\[ \rho(g\hat{g}^n.x', g\hat{g}^n.x'') \le l_g\,\rho(\hat{g}^n.x', \hat{g}^n.x'') \le l_g c^n\rho(x', x''). \]
If $n$ is such that $l_g c^n < 1$, then there is a fixed point $x_{g\hat{g}^n}$ of $g\hat{g}^n$. On the other hand, it is obvious that $g\hat{g}^n.x \to g.\hat{x}$ as $n \to \infty$ uniformly in $x \in M_0$ because $\hat{g}^n.x \to \hat{x}$ uniformly in $x \in M_0$. It follows that in particular $x_{g\hat{g}^n} \to g.\hat{x}$ and this proves the lemma.

4.2. Products of positive matrices. Lemma 15 below explains two versions of a well known contraction property of products of positive matrices (see, e.g. [5]). The first version of this property has already been explained and proved in the Appendix to [1] and we therefore prove here only the second version. There is a slight difference in the notations used in this paper and those we have introduced in [1] and no difference in the proof; we emphasize once again that this is done for the purposes of completeness and convenience of references in the proofs of other theorems.

Lemma 15. Let $a_n = (a_n(i,j))$, $n = 1, 2, \dots$, be a sequence of positive $m \times m$ matrices, $a_n > 0$. Put $\tilde{H}_n \overset{\mathrm{def}}{=} a_n a_{n-1}\cdots a_1$, $H_n \overset{\mathrm{def}}{=} a_1 a_2\cdots a_n$ and denote
\[ \tilde\delta_r = \min_{i,j,k}\ a_r(i,j)a_{r-1}(j,k)\Bigl(\sum_j a_r(i,j)a_{r-1}(j,k)\Bigr)^{-1}, \quad 2 \le r \le n, \]
\[ \delta_r = \min_{i,j,k}\ a_r(i,j)a_{r+1}(j,k)\Bigl(\sum_j a_r(i,j)a_{r+1}(j,k)\Bigr)^{-1}, \quad 1 \le r \le n-1. \tag{4.15} \]
Suppose that
\[ \sum_{r=2}^{\infty}\tilde\delta_r = \infty. \]
Then the products $H_n$ and $\tilde{H}_n$ can be represented as follows:
\[ H_n = D_n[(c_n(1)\mathbf{1}, \dots, c_n(m)\mathbf{1}) + \varphi_n], \qquad \tilde{H}_n = \tilde{D}_n[(\tilde{c}(1)\mathbf{1}, \dots, \tilde{c}(m)\mathbf{1}) + \tilde\varphi_n], \tag{4.16} \]
where: $D_n$ and $\tilde{D}_n$ are diagonal matrices with positive diagonal elements; $\|\varphi_n\| \le \prod_{r=1}^{n-1}(1 - m\delta_r)$, $\|\tilde\varphi_n\| \le \prod_{r=2}^{n}(1 - m\tilde\delta_r)$; $\tilde{c}(j)$ are strictly positive numbers which are uniquely defined by the sequence $\{a_k\}_{k\ge 1}$, do not depend on $n$, and are such that $\sum_j \tilde{c}(j) = 1$; $c_n(j)$ are strictly positive numbers with $\sum_j c_n(j) = 1$ (note that $c_n(j)$, unlike the $\tilde{c}(j)$, do depend on $n$ and, generally, do not have a limit).
Proof. Represent $H_n$ as follows:
\[ H_n = D_n D_n^{-1} a_1 D_{n-1} D_{n-1}^{-1} a_2 \cdots D_1^{-1} a_n = D_n\tilde{a}_1\tilde{a}_2\cdots\tilde{a}_n, \]
where $\tilde{a}_r \equiv D_{n-r+1}^{-1} a_r D_{n-r}$, $D_0 = I$, and $D_{n-r} = \mathrm{diag}(D_{n-r}(1), \dots, D_{n-r}(m))$ are diagonal matrices, with $D_{n-r}(i)$ chosen so as to make the matrices $\tilde{a}_r$ stochastic. It is very easy to see that the only such choice is given by
\[ D_{n-r}(i) \overset{\mathrm{def}}{=} \sum_{i_{r+1},\dots,i_n} a_{r+1}(i, i_{r+1})\,a_{r+2}(i_{r+1}, i_{r+2})\cdots a_n(i_{n-1}, i_n) \]
and
\[ \tilde{a}_r(i,j) = \frac{a_r(i,j)\sum_{i_{r+1},\dots,i_n} a_{r+1}(j, i_{r+1})\cdots a_n(i_{n-1}, i_n)}{\sum_{i_r,i_{r+1},\dots,i_n} a_r(i, i_r)\,a_{r+1}(i_r, i_{r+1})\cdots a_n(i_{n-1}, i_n)} \ge \delta_r. \tag{4.17} \]
It is well known that the last estimate implies the following representation of the product of stochastic matrices $\tilde{a}_n$:
\[ \tilde{a}_1\tilde{a}_2\cdots\tilde{a}_n = (c_n(1)\mathbf{1}, \dots, c_n(m)\mathbf{1}) + \varphi_n, \]
where
\[ \min_i \tilde{a}_n(i,j) \le c_n(j) \le \max_i \tilde{a}_n(i,j) \tag{4.18} \]
and the matrices $\varphi_n$ are such that
\[ \|\varphi_n\| \le \prod_{r=1}^{n-1}(1 - m\delta_r). \]
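The contraction mechanism behind the bound on $\varphi_n$ can be seen numerically. The sketch below is an illustration under simplifying assumptions: it works directly with randomly generated positive stochastic matrices and takes $\delta_r$ to be the smallest entry of the $r$-th factor, which is not the quantity (4.15) of the lemma. All function names are hypothetical. It checks that the rows of the product become nearly equal, with the spread in each column bounded by $\prod_{r=1}^{n-1}(1 - m\delta_r)$.

```python
import random


def random_positive_stochastic(m, rng):
    """Return an m x m row-stochastic matrix with strictly positive entries."""
    rows = []
    for _ in range(m):
        weights = [rng.uniform(0.2, 1.0) for _ in range(m)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    m = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]


def column_spread(h):
    """Largest difference between two entries lying in the same column."""
    m = len(h)
    return max(max(h[i][j] for i in range(m)) - min(h[i][j] for i in range(m))
               for j in range(m))


def demo(m=3, n=12, seed=0):
    """Compare the column spread of a_1 ... a_n with prod_{r<n} (1 - m*delta_r)."""
    rng = random.Random(seed)
    mats = [random_positive_stochastic(m, rng) for _ in range(n)]
    prod = mats[0]
    for a in mats[1:]:
        prod = matmul(prod, a)
    bound = 1.0
    for a in mats[:-1]:
        # Assumption: delta_r is the minimal entry of the r-th stochastic factor.
        bound *= 1.0 - m * min(min(row) for row in a)
    return column_spread(prod), bound


if __name__ == "__main__":
    print(demo())
```

The inequality checked here is the standard oscillation estimate: a stochastic matrix with all entries at least $\delta$ shrinks the oscillation of any vector by the factor $1 - m\delta$.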
4.3. A stability estimate. The stability property which we explain below is definitely well known to specialists in the relevant field. Given that the proof is very short, it seems that it is easier for us to prove it than to find a relevant reference.

Let $b_n$ and $b_n'$ be two sequences of transformations of a metric space $(X, r)$ and $x_{n+1} \overset{\mathrm{def}}{=} b_n(x_n)$, $x_{n+1}' \overset{\mathrm{def}}{=} b_n'(x_n')$, $n \ge 1$, with given initial values $x_1, x_1' \in X$. For any two transformations $b$ and $b'$ put $\bar\rho(b, b') \overset{\mathrm{def}}{=} \sup_{x\in X} r(b(x), b'(x))$.

Lemma 16. Suppose that
(a) $b_n$ are uniformly contracting, that is, there is a $c$, $0 \le c < 1$, such that for any $x, y \in X$ we have $r(b_n(x), b_n(y)) \le c\,r(x, y)$;
(b) $\bar\rho(b_n, b_n') \to 0$ as $n \to \infty$.
Then $r(x_n, x_n') \to 0$ as $n \to \infty$. If, instead of (b), a stronger property holds, namely $\bar\rho(b_n, b_n') \le C_2 c_0^n\,\bar\rho(b_1, b_1')$ for some $C_2$ and $c_0 < 1$, then for any $\varepsilon > 0$ there is a constant $C_3$ such that
\[ r(x_n, x_n') \le C_3\tilde{c}^{\,n}\bigl(\bar\rho(b_1, b_1') + r(x_1, x_1')\bigr), \quad \text{where } \tilde{c} = \max(c, c_0) + \varepsilon. \tag{4.19} \]
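A small numerical illustration of Lemma 16 may help fix ideas. The sketch below assumes a hypothetical pair of affine maps on the real line, $b_n(x) = cx + 0.3$ and $b_n'(x) = cx + 0.3 + d_n$ with $d_n = 0.1\,c_0^n$, so that $\bar\rho(b_n, b_n') = d_n$ decays geometrically. It records the gaps $r_n = |x_n - x_n'|$ so that the one-step inequality $r_{n+1} \le c\,r_n + d_n$ and the decay of $r_n$ can be inspected. The maps and constants are illustrative choices, not taken from the paper.

```python
def run(c=0.5, c0=0.7, x1=0.0, x1_prime=1.0, steps=60):
    """Iterate x_{n+1} = b_n(x_n) and x'_{n+1} = b'_n(x'_n) side by side.

    Returns (gaps, dists), where gaps[i] = r_{i+1} = |x_{i+1} - x'_{i+1}|
    and dists[i] = d_{i+1} is the sup-distance between b_{i+1} and b'_{i+1}.
    """
    x, xp = x1, x1_prime
    gaps = [abs(x - xp)]
    dists = []
    for n in range(1, steps + 1):
        d_n = 0.1 * c0 ** n      # sup-distance between b_n and b'_n
        x = c * x + 0.3          # b_n: a contraction with constant c
        xp = c * xp + 0.3 + d_n  # b'_n: the perturbed map
        gaps.append(abs(x - xp))
        dists.append(d_n)
    return gaps, dists


if __name__ == "__main__":
    gaps, dists = run()
    print(gaps[0], gaps[10], gaps[-1])
```

With these choices the gap is driven by the slower of the two geometric rates, here $c_0$, which is the role played by $\tilde{c} = \max(c, c_0) + \varepsilon$ in (4.19).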
Proof. Put $d_n \overset{\mathrm{def}}{=} \bar\rho(b_n, b_n')$ and $r_n \overset{\mathrm{def}}{=} r(x_n, x_n')$. Since
\[ r(x_{n+1}, x_{n+1}') = r(b_n(x_n), b_n'(x_n')) \le r(b_n(x_n), b_n(x_n')) + r(b_n(x_n'), b_n'(x_n')) \le c\,r(x_n, x_n') + \bar\rho(b_n, b_n'), \]
we have that
\[ r_{n+1} \le cr_n + d_n \le d_n + cd_{n-1} + \cdots + c^k d_{n-k} + c^{k+1}r_{n-k}. \tag{4.20} \]
For a given $\varepsilon > 0$ choose $k$ so that $c^k r_{n-k} \le \varepsilon$ (which is possible because $X$ is a compact space and thus $r_{n-k}$ is a uniformly bounded sequence). Next choose $N(\varepsilon, k)$ so that $d_{n-j} \le \varepsilon$ when $n - j \ge N(\varepsilon, k) - k$. It follows now from (4.20) that $r_n \le \varepsilon(2 - c)(1 - c)^{-1}$ when $n > N(\varepsilon, k)$. This proves the first statement of the lemma. To prove the second statement substitute $k = n$ into (4.20) and take into account the stronger estimates for $d_n$. Estimate (4.19) follows with an evident choice of $C_3$.

Remarks. The second statement of this lemma does not use the fact that $X$ is a compact space.

Acknowledgement. This work was supported by the following grants of the Swiss National Foundation: 200020-107739/1 and 200020-116348. We are grateful to the Isaac Newton Institute for its hospitality during the program Interaction and Growth in Complex Stochastic Systems held in Cambridge, UK in 2003. We also thank the European Science Foundation Research Networking Programme on Phase-Transitions and Fluctuation Phenomena for Random Dynamics in Spatially Extended Systems (RDSES) for its financial support.
References
1. Bolthausen, E., Goldsheid, I.: Recurrence and transience of random walks in random environments on a strip. Commun. Math. Phys. 214, 429–447 (2000)
2. Brémont, J.: On some random walks on Z in random medium. Ann. Probab. 30, 1266–1312 (2002)
3. Brémont, J.: Behavior of random walks on Z in Gibbsian medium. C. R. Acad. Sci. Série 1 Math. 338(11), 895–898 (2004)
4. Brown, B.M.: Martingale Central Limit Theorems. Ann. Math. Statist. 42, 59–66 (1971)
5. Furstenberg, H., Kesten, H.: Products of random matrices. Ann. Math. Statist. 31, 457–469 (1960)
6. Goldsheid, I.: Linear and Sub-linear Growth and the CLT for Hitting Times of a Random Walk in Random Environment on a Strip. Probability Theory and Related Fields, appeared on line in August, 2007, DOI:10.1007/s00440-007-0091-0
7. Golosov, A.: Localization of random walks in one-dimensional random environments. Commun. Math. Phys. 92, 491–506 (1984)
8. Golosov, A.: On the limit distributions for a random walk in a critical one-dimensional random environment. Usp. Mat. Nauk 41(2), 189–190 (1986)
9. Gordin, M.I.: The Central Limit Theorem for stationary processes. Soviet Math. Dokl. 10, 1174–1176 (1969)
10. Gordin, M.I., Lifshits, B.A.: The Central Limit Theorem for stationary Markov processes. Sov. Math. Dokl. 19(2), 392–394 (1978)
11. Gordin, M.I., Lifshits, B.A.: The Invariance principle for stationary Markov processes. “Teorija verojatnostej i ejo primenenija” 1978, issue 4, pp. 865–866 (in Russian)
12. Hall, P., Heyde, C.C.: Martingale limit theory and its application. New York: Academic Press, 1980
13. Kesten, H.: The limit distribution of Sinai’s random walk in a random environment. Physica A 138, 299–309 (1986)
14. Kesten, H., Kozlov, M.V., Spitzer, F.: Limit law for random walk in a random environment. Comp. Math. 30, 145–168 (1975)
15. Key, E.: Recurrence and transience criteria for random walk in a random environment. Ann. Prob. 12, 529–560 (1984)
16. Lawler, G.: Weak convergence of a random walk in a random environment. Commun. Math. Phys. 87, 81–87 (1982)
17. Letchikov, A.V.: Localization of one-dimensional random walks in random environment. Soviet Scientific Reviews Section C: Mathematical Physics Reviews. Chur, Switzerland: Harwood Academic Publishers, 1989, pp. 173–220
18. Sinai, Ya.G.: The limiting behavior of a one-dimensional random walk in a random medium. Theory Prob. Appl. 27, 256–268 (1982)
19. Solomon, F.: Random walks in a random environment. Ann. Prob. 3, 1–31 (1975)
20. Zeitouni, O.: Random walks in random environment, XXXI Summer school in Probability, St. Flour (2001). Lecture notes in Math. 1837, Berlin: Springer, 2004, pp. 193–312

Communicated by M. Aizenman
Commun. Math. Phys. 278, 289–306 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0402-4
Communications in
Mathematical Physics
Symplectic Fibrations and the Abelian Vortex Equations T. Perutz DPMMS, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, United Kingdom. E-mail:
[email protected] Received: 5 June 2006 / Accepted: 7 August 2007 Published online: 13 December 2007 – © Springer-Verlag 2007
Abstract: The $n$th symmetric product of a Riemann surface carries a natural family of Kähler forms, arising from its interpretation as a moduli space of abelian vortices. We give a new proof of a formula of Manton–Nasir [10] for the cohomology classes of these forms. Further, we show how these ideas generalise to families of Riemann surfaces. These results help to clarify a conjecture of D. Salamon [13] on the relationship between Seiberg–Witten theory on 3–manifolds fibred over the circle and symplectic Floer homology.

1. Introduction

1.1. Relative symmetric products. Consider a pair of smooth, oriented manifolds $X$ and $S$ with $\dim(X) - \dim(S) = 2$, and a proper submersion $\pi\colon X \to S$. Thus $\pi$ is a smooth fibre bundle, and its typical fibre is a compact orientable surface $\Sigma$.

Definition 1.1. The $r$th symmetric product bundle, or relative symmetric product, $\pi_r\colon \mathrm{Sym}^r_S(X) \to S$, is defined to be the quotient by the symmetric group $S_r$ of the fibre product
$$X^{\times r}_S = \{(x_1,\dots,x_r) \in X^{\times r} : \pi(x_1) = \dots = \pi(x_r)\}$$
with its natural projection to $S$.

1.1.1. Smooth structures. $\mathrm{Sym}^r_S(X)$ is a topological manifold, but it does not inherit a smooth structure from $X$. To make $\mathrm{Sym}^r_S(X)$ a smooth manifold one should choose a complex structure $j$ on the vertical tangent bundle $T^{\mathrm v}X = \ker(D\pi) \subset TX$, compatibly with the orientations. Then the fibres become complex manifolds. The smooth atlas on the relative symmetric product is generated by charts which are obtained by fibrewise application of the elementary symmetric functions to ‘restricted charts’ $\Phi\colon D^2 \times U \to X$.
This means that there is a chart $\psi\colon U \to S$ such that (i) $\pi \circ \Phi = \psi \circ \mathrm{pr}_2$, and (ii) $\Phi\colon D^2 \times \{s\} \to X_{\psi(s)}$ is a holomorphic embedding, for each $s \in U$. As observed by Donaldson and Smith [4], the existence of such charts is a consequence of the parametrised Riemann mapping theorem. We will write $\mathrm{Sym}^r_S(X; j)$ when we want to emphasise that this is the smooth structure being considered. Different choices, say $j_0$, $j_1$, give distinct smooth structures. However, $\mathrm{Sym}^r_S(X; j_0)$ is diffeomorphic to $\mathrm{Sym}^r_S(X; j_1)$, as one can see by considering the relative symmetric product of $X \times [0,1] \to S \times [0,1]$, equipped with an interpolating family $j_t$.

1.1.2. Kähler forms. The symmetric product $\mathrm{Sym}^r(\Sigma)$ of a Riemann surface $\Sigma$ equipped with a Kähler form $\omega$ is itself a Kähler manifold. To be precise, a Kähler form is determined by a hermitian line bundle $(L, |\cdot|)$ of degree $r$ over $\Sigma$, together with a real parameter $\tau > 2\pi r/\int_\Sigma \omega$. The reason is that the symmetric product can be identified canonically with a moduli space of abelian vortices, and this has a natural quotient symplectic structure. There is a generalisation of this to the case of relative symmetric products. We first fix our conventions concerning families of symplectic manifolds:

Definition 1.2. (a) A symplectic fibration with typical fibre $(M, \omega)$ is a smooth fibre bundle $p\colon X \to S$ together with a vertical two-form $\omega$, i.e. a section of $\Lambda^2 (T^{\mathrm v}X)^*$, such that each fibre $(X_s, \omega|X_s)$ is a symplectic manifold isomorphic to $(M, \omega)$.
(b) A locally Hamiltonian fibration (LHF) is a triple $(X, p, \Omega)$, where $p\colon X \to S$ is a smooth fibre bundle and $\Omega$ a closed two-form on $X$ such that $(X_s, \Omega|X_s)$ is a symplectic manifold for each $s \in S$. (See footnote 1.)

Relative symmetric products of symplectic surface-fibrations are again symplectic fibrations: if $(p\colon X \to S, \omega)$ is a symplectic fibration with typical fibre $(\Sigma, \omega)$, and one specifies a hermitian line bundle over $X$ of fibrewise degree $r$ and a real parameter, then $\mathrm{Sym}^r_S(X) \to S$ becomes a symplectic fibration. In this paper we show how to promote this functor to locally Hamiltonian fibrations, using the abelian vortex equations. In doing so we extend Salamon’s work [13] which applies to bundles over $S^1$. Our method enables one to determine the cohomology classes of the closed forms which arise, in terms of natural operations relating the cohomologies of $X$ and $\mathrm{Sym}^r_S(X)$.

1.2. Statement of results. There is a sequence of natural operations sending cohomology classes on $X$ to cohomology classes on the relative symmetric product $\mathrm{Sym}^r_S(X)$ of $X \to S$. These come about via the universal (or tautological) divisor $\Delta^{\mathrm{univ}} \subset \mathrm{Sym}^r_S(X) \times_S X$, i.e., the locus of pairs $(D, x)$, where $x \in \mathrm{Supp}(D)$. This carries a codimension-two homology class relative to boundary, and dually, a cohomology class $\delta \in H^2(\mathrm{Sym}^r_S(X) \times_S X; \mathbb Z)$.

Footnote 1: The term ‘locally Hamiltonian fibration’ is used in [9] in a slightly more restrictive way than here; there it is assumed that the base is 2-dimensional and that the form satisfies the normalisation condition introduced by Guillemin and Sternberg.
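For example, when $X \to S$ is a holomorphic fibration, $\delta = c_1(\mathcal O(\Delta^{\mathrm{univ}}))$. Using the projection maps
$$\mathrm{Sym}^r_S(X) \xleftarrow{\ p_1\ } \mathrm{Sym}^r_S(X) \times_S X \xrightarrow{\ p_2\ } X,$$
and cup products in cohomology, define, for each $k \ge 0$, the map
$$H^*(X; \mathbb Z) \to H^{*+2k-2}(\mathrm{Sym}^r_S(X); \mathbb Z); \quad c \mapsto c^{[k]} := p_{1!}\bigl(p_2^* c \smile \delta^k\bigr). \tag{1}$$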
These operations evidently behave in a natural way under base-change (i.e. pulling back by $S' \to S$). It is known [11, Lemma 2.1.1] that
$$c_1(T^{\mathrm v}\mathrm{Sym}^r_S(X)) = \tfrac12\bigl(c_1(T^{\mathrm v}X)^{[1]} + 1^{[2]}\bigr).$$

Theorem 1. Let $(X, \pi, \Omega)$ be a proper, locally Hamiltonian surface-fibration over a manifold $S$, and $r$ a positive integer. Choose
• an $\Omega$-positive complex structure $j$ on $T^{\mathrm v}X$;
• a hermitian line bundle $(L, |\cdot|)$ over $X$ such that $L|X_s$ has degree $r$ for each $s \in S$, and a unitary connection $A_{\mathrm{ref}}$ on $L$;
• a real parameter $\tau$. We require $\tau > 2\pi r a^{-1}$, where $a$ is the symplectic area of a fibre.
There is a procedure which associates to these data a closed two-form $v(\Omega, \tau, L)$ on the relative symmetric product $\mathrm{Sym}^r_S(X; j)$ which makes it a locally Hamiltonian fibration. This procedure is compatible with restriction of the base $S$. The form $v(\Omega, \tau, L)$ restricts on each fibre $\mathrm{Sym}^r(X_s)$ to the canonical Kähler form arising from the abelian vortex equations with parameter $\tau$. Its cohomology class is
$$[v(\Omega, \tau, L)] = 2\pi\bigl(\tau[\Omega]^{[1]} - \pi\, 1^{[2]}\bigr) \in H^2(\mathrm{Sym}^r_S(X); \mathbb R).$$
In particular, the class $[v(\Omega, \tau, L)]$ does not depend on the line bundle $L$.

By applying the theorem to fibrations $X \times U \to S \times U$, one sees that there is smooth dependence on parameters. It can be verified without difficulty that, when the base $S$ is the circle, the form $v(\Omega, \tau, L)$ coincides with the one found by Salamon in [13].

Remark 1.3. In the case where the base $S$ is a point, the result specialises to a formula for the cohomology class of the canonical Kähler form on the vortex moduli space (Theorem 3). This formula is due to Manton and Nasir [10]. Note, though, that their work relies on a local expansion of the Kähler form [14] whose derivation has not received the thoroughgoing analytic treatment a pure mathematician would ask for.

Some further remarks on the nature of the theorem are in order. The interesting thing is not the existence of closed, fibrewise-Kähler two-forms in the specified cohomology class.
Indeed, a patching procedure due to Thurston, standard in symplectic geometry, gives an easy construction of such forms. The point is rather that, among such forms, there are some which have a definite geometric (specifically, gauge-theoretic) origin. This geometric construction is closely related to the Seiberg–Witten equations on fibred 3– and 4–manifolds: see [13] and our discussion of Floer homology below.

Let us say that locally Hamiltonian structures $\Omega_0$, $\Omega_1$ on the same fibre bundle $\pi\colon X \to S$ are isotopic if there exists a locally Hamiltonian structure $\Omega \in \Omega^2_{[0,1]\times X}$ on $\pi \times \mathrm{id}\colon [0,1] \times X \to [0,1] \times S$ with $\Omega|\{i\} \times X = \Omega_i$ for $i = 0, 1$. We call LHFs equivalent if they are related under the equivalence relation generated by isotopy and two-form-preserving bundle isomorphism.
Corollary 2. Fix a proper surface-bundle $\pi\colon X \to S$. Choose two sets of data $(\Omega_0, j_0, L_0, |\cdot|_0, A_{\mathrm{ref},0}, \tau_0)$, $(\Omega_1, j_1, L_1, |\cdot|_1, A_{\mathrm{ref},1}, \tau_1)$ as above, and suppose that $[\tau_0\Omega_0] = [\tau_1\Omega_1] \in H^2(X; \mathbb R)$. Then the LHFs $(\mathrm{Sym}^r_S(X; j_0), \pi_r, v(\Omega_0, L_0, \tau_0))$ and $(\mathrm{Sym}^r_S(X; j_1), \pi_r, v(\Omega_1, L_1, \tau_1))$ are equivalent.

Proof. Because $\tau_0\Omega_0$ and $\tau_1\Omega_1$ represent the same cohomology class, the locally Hamiltonian fibrations $(X, \pi, \tau_0\Omega_0)$ and $(X, \pi, \tau_1\Omega_1)$ are isotopic: an isotopy is given by the form $\tau_0\Omega_0 + d(t\beta) \in Z^2(X \times [0,1])$, where $\tau_1\Omega_1 - \tau_0\Omega_0 = d\beta$. This restricts to the slice $X \times \{t\}$ as $(1-t)\tau_0\Omega_0 + t\tau_1\Omega_1$, and hence is positive on the fibres $X_{s,t}$ of $X \times [0,1] \to S \times [0,1]$. We can give $X \times [0,1]$ a vertical complex structure $J$ by choosing a path $j_t$ between the given ones. In the case that $L_1 \cong L_0$, there is a hermitian line bundle $(L, |\cdot|)$ with connection over $X \times [0,1]$ which restricts to $(L_i, |\cdot|_i)$ on the ends. The form $v(\tau_0\Omega_0 + d(t\beta), L, 1)$ on $\mathrm{Sym}^r_{S\times[0,1]}(X \times [0,1]; J)$ restricts on the ends to $v(\tau_i\Omega_i, L_i, 1)$. Hence $(\mathrm{Sym}^r_S(X; j_0), \pi_r, v(\Omega_0, L_0, \tau_0))$ is equivalent to $(\mathrm{Sym}^r_S(X; j_1), \pi_r, v(\Omega_1, L_1, \tau_1))$. It remains to show that changing the line bundle does not affect things, and for this we may assume that $\Omega_0 = \Omega_1$ (write $\Omega$ for this single form) and $j_0 = j_1$. By the theorem, we can write $v(\Omega, L_0, \tau) - v(\Omega, L_1, \tau) = d\gamma$. Then, since $v(\Omega, L_0, \tau)$ and $v(\Omega, L_1, \tau)$ are both fibrewise Kähler, the form $v(\Omega, L_0, \tau) + d(t\gamma)$ on $\mathrm{Sym}^r_S(X) \times [0,1]$ gives an isotopy between them.
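The restriction to a slice used in the proof can be checked by hand. The following is only a sketch: it treats $t$ as the coordinate on $[0,1]$ and uses nothing beyond the relation $d\beta = \tau_1\Omega_1 - \tau_0\Omega_0$ from the proof.

```latex
\begin{align*}
\tau_0\Omega_0 + d(t\beta)
  &= \tau_0\Omega_0 + dt\wedge\beta + t\,d\beta, \\
\bigl(\tau_0\Omega_0 + d(t\beta)\bigr)\big|_{X\times\{t\}}
  &= \tau_0\Omega_0 + t\,(\tau_1\Omega_1 - \tau_0\Omega_0)
   = (1-t)\,\tau_0\Omega_0 + t\,\tau_1\Omega_1 .
\end{align*}
```

The term $dt\wedge\beta$ vanishes on each slice because $dt$ does, and a convex combination of two forms that are positive on the fibres is again positive on the fibres.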
1.3. Floer homology for fibred three-manifolds. Floer homology for symplectic automorphisms works as follows. Let $\Lambda$ be the universal $\mathbb Z/2$–Novikov ring: the ring of formal ‘series’ $\sum_{\lambda\in\mathbb R} a(\lambda)t^\lambda$, where $a\colon \mathbb R \to \mathbb Z/2$ is a function such that $(-\infty, c] \cap \mathrm{Supp}(a)$ is finite for any $c \in \mathbb R$. Let $(X, p, \Omega)$ be a LHF over a compact one-manifold $S$, and suppose that its fibres are compact, ‘weakly monotone’ symplectic manifolds (i.e. $c_1(X_s)$ is positively proportional to the symplectic class, or else $c_1(X_s)$ vanishes on $\pi_2(X_s)$, or else every $S \in \pi_2(X_s)$ has absolute Chern number $|\langle c_1(X_s), [S]\rangle| \ge \dim(X_s)/2 - 2$). One can then associate with $(X, p, \Omega)$ a $\Lambda$-module $HF_*(X, p; \Omega)$. The underlying chain group is freely generated by the set of sections of $X$ which are horizontal for the natural connection determined by $\Omega$ (more precisely, by some generic perturbation of $\Omega$). The differential involves moduli spaces of pseudo-holomorphic sections of $X \times \mathbb R \to S \times \mathbb R$. Isotopic LHFs have isomorphic Floer homologies. Two-form-preserving bundle isomorphisms also give isomorphisms in Floer homology. This theory is in a sense too rich: different local Hamiltonian structures may give different modules. In the case where the fibres $X_s$ are complex manifolds, one way to make the theory more manageable is to consider closed two-forms which are not just fibrewise-symplectic, but actually fibrewise-Kähler. If one also fixes the cohomology
class of these forms then the set of possible choices is a convex set, and $HF_*(X, p; \Omega)$ is independent of the specific choice of $\Omega$. An example of this method of making Floer homology manageable occurs in work of Seidel [15], who applies it to mapping tori of automorphisms of a surface $\Sigma$ of genus $g \ge 2$. He thereby constructs invariants of mapping classes, $\pi_0\mathrm{Diff}^+(\Sigma) \ni [\phi] \mapsto HF_*([\phi])$. Though not well understood, these invariants are far from trivial: Seidel shows that the identity mapping class $[\mathrm{id}]$ is characterised by the property that, under a natural action by the homology of the surface on Floer homology, $H^2(\Sigma; \mathbb Z/2)$ does not annihilate the whole module.

One way to generalise Seidel’s set-up is as follows. Let $\pi\colon Y \to S^1$ be a three-manifold fibred over $S^1$, and consider its relative symmetric product $\mathrm{Sym}^r_{S^1}(Y; j)$. We make it an LHF using a closed, fibrewise-Kähler two-form drawn from a particular cohomology class. The output will then depend only on the cohomology class chosen: for each class $W$ which restricts to a Kähler class on the fibre $\mathrm{Sym}^r(\Sigma)$, we get a module $HF_*(\mathrm{Sym}^r_{S^1}(Y), \pi_r; W)$. The requirement that the fibres should be weakly monotone forces us to exclude the range $g/2 \le r < g - 1$, where $g$ is the genus of the fibre. Let us take $W$ to be one of the classes occurring in our theorem: $W = W(w) = 2\pi\bigl(w^{[1]} - \pi\, 1^{[2]}\bigr)$, where $w = [\tau\Omega]$. In this way we obtain a Floer homology module
$$HF_*(Y, \pi, r; w) := HF_*(\mathrm{Sym}^r_{S^1}(Y), \pi_r; W)$$
by giving only $(Y, \pi, r)$ together with a class $w \in H^2(Y; \mathbb R)$ which integrates positively over the fibres of $\pi$. Corollary 2 implies that these modules are well-defined, up to canonical isomorphism. For discussion of the dependence on $w$ we refer to [16]. Now, we can of course represent $W$ by one of the forms $v(\Omega, \tau, L)$ supplied by the theorem.
Doing so is not of any great help in computing Floer homology, but it is highly relevant when we try to understand the relation between the symplectic Floer theory just discussed and the monopole Floer homology of the three-manifold $Y$. Monopole Floer homology is the Floer theory arising from the Chern–Simons–Dirac functional over a 3–manifold with $\mathrm{Spin}^c$–structure, a functional whose critical points are precisely the Seiberg–Witten monopoles. Specifically, the name refers to the theory constructed by Kronheimer–Mrowka in their authoritative forthcoming book [7]. It is a ‘perturbed’ version of monopole Floer homology which is of interest and for this we can again use $\Lambda$ as coefficient ring.

Salamon’s proposal from [13], based on an adiabatic limit computation, is that there should be an isomorphism between symplectic and monopole Floer homologies. Expressing the conjecture in terms of Kronheimer–Mrowka’s conventions (and in terms of the notions of this paper) requires a little care because Salamon’s conventions differ in various (inessential) ways. If I have accounted correctly for these discrepancies, the statement is that there is an isomorphism between symplectic Floer homology for $\mathrm{Sym}^r_{S^1}(Y)$ with the form $v(\Omega, \tau, L)$ (i.e., $HF_*(Y, \pi, r; w)$, where $w = [\tau\Omega]$) and a certain summand in the $\Lambda$-module $HM_*(Y; -4\pi w - 32\pi^2 c_1(T^{\mathrm v}Y))$, the monopole Floer homology with perturbation class $-4\pi w - 32\pi^2 c_1(T^{\mathrm v}Y)$. (The perturbation class is non-zero, providing we assume $g \ge 0$ or $\tau \gg 0$, so that there are no reducible monopoles and only one version of monopole Floer theory.) The summand
in question is the direct sum of submodules $HM_*(Y, \mathfrak t; w)$, where $\mathfrak t$ ranges over those $\mathrm{Spin}^c$-structures on $Y$ for which $\langle c_1(\mathfrak t), [\Sigma]\rangle = \chi(\Sigma) + 2r$.

Let us tie up this discussion. On one hand, we can use pure symplectic geometry to build a group $HF_*(Y, \pi, r; w)$. Specifying the cohomology class on a relative symmetric product (namely, $w^{[1]} - \pi\, 1^{[2]}$ or a multiple of it) is an essential part of the construction. On the other hand, the existence of the special forms $v(\Omega, \tau, L)$, and Salamon’s adiabatic limit, suggest that these modules should have a gauge-theoretic interpretation. We note finally that the modules $HF_*(Y, \pi, r; w)$ fit into a field theory for Lefschetz fibrations over surfaces with boundary, which has been studied by M. Usher [16] and the author [12] (the latter extends the framework to a larger class of singular fibrations). This too is thought to be intimately related to Seiberg–Witten theory.
2. The Vortex Equations

2.1. Review of moduli spaces of vortices. Fix a closed Riemann surface $(\Sigma, j)$, a Kähler form $\omega \in \Omega^{1,1}_\Sigma$, and a hermitian line bundle $(L, |\cdot|)$ over $\Sigma$, of degree $r > 0$. Let $\mathcal A(L, |\cdot|)$, or $\mathcal A(L)$, denote the space of U(1)-connections (an affine space modelled on the imaginary one-forms $i\Omega^1_\Sigma$). The gauge group, of smooth maps from $\Sigma$ to U(1), is denoted by $\mathcal G$. Its Lie algebra is $i\Omega^0_\Sigma$. The pairing
$$\Omega^0_\Sigma \otimes i\Omega^0_\Sigma \to \mathbb R, \quad f \otimes ig \mapsto \int_\Sigma fg\,\omega$$
embeds $\Omega^0_\Sigma$ into the dual of $i\Omega^0_\Sigma$. We consider moment maps for Hamiltonian $\mathcal G$-actions as maps into $\Omega^0_\Sigma$. Connections, sections and gauge transformations are by default $C^\infty$, and the spaces are given their $C^\infty$ topologies. We also need $\mathcal A^2_1$, the space of U(1)-connections of Sobolev class $L^2_1$ (i.e. differing from a smooth one by an $L^2_1$ form); the space of sections $L^2_1(L)$; and the Sobolev gauge group $\mathcal G^2_2 = L^2_2(\Sigma, \mathrm U(1))$. Note that a map $\Sigma \to \mathbb C$ of class $L^2_2$ is continuous, by the Sobolev embedding theorem, and hence has a pointwise norm.

2.1.1. Action of the gauge group. The conformal structure $j$ induces a Kähler structure on the space of connections $\mathcal A(L)$. Its two-form is
$$(a_1, a_2) \mapsto \int_\Sigma ia_1 \wedge ia_2, \quad a_1, a_2 \in i\Omega^1_\Sigma. \tag{2}$$
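The complex structure is the Hodge star $a \mapsto *_j a$. The action of the gauge group $\mathcal G$ on $\mathcal A(L)$ is Hamiltonian, with (equivariant) moment map
$$\mathcal A(L) \to \Omega^0_\Sigma; \quad A \mapsto *iF_A. \tag{3}$$
The symplectic form $\omega$ induces a Kähler structure on $\Omega^0_\Sigma(L)$, with two-form
$$(\phi_1, \phi_2) \mapsto \int_\Sigma \mathrm{Im}\langle\phi_1, \phi_2\rangle\,\omega, \quad \phi_1, \phi_2 \in \Omega^0_\Sigma(L) \tag{4}$$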
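and complex structure $\phi \mapsto i\phi$. The gauge-action on $\Omega^0_\Sigma(L)$ is Hamiltonian with moment map
$$\Omega^0_\Sigma(L) \to \Omega^0_\Sigma; \quad \psi \mapsto \tfrac12|\psi|^2. \tag{5}$$
The manifold $\mathcal C(L) := \mathcal A(L) \times \Omega^0_\Sigma(L)$ carries the product Kähler structure $\sigma$, which depends on both $j$ and $\omega$. The moment map $m$ for the diagonal $\mathcal G$-action is the sum of the moment maps of the factors,
$$m\colon \mathcal C(L) \to \Omega^0_\Sigma, \quad m(A, \psi) = *iF_A + \tfrac12|\psi|^2. \tag{6}$$
The Chern–Weil formula gives some basic information about this moment map:
$$\frac{1}{2\pi}\int_\Sigma \tau\omega \ \begin{cases} < r: & m^{-1}(\tau) = \emptyset; \\ = r: & m(A, \psi) = \tau \Rightarrow \psi \equiv 0; \\ > r: & m(A, \psi) = \tau \Rightarrow \psi \not\equiv 0. \end{cases}$$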
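The three cases can be read off by integrating the moment map over the surface. The computation below is a sketch under two stated assumptions: the degree is computed as $\frac{1}{2\pi}\int_\Sigma iF_A = r$ (the Chern–Weil convention), and the Hodge star is normalised so that $(*iF_A)\,\omega = iF_A$. The symbol $\|\psi\|_{L^2}^2 = \int_\Sigma |\psi|^2\omega$ is introduced here for illustration.

```latex
\begin{align*}
\int_\Sigma m(A,\psi)\,\omega
  &= \int_\Sigma iF_A + \frac12\int_\Sigma |\psi|^2\,\omega
   = 2\pi r + \frac12\,\|\psi\|_{L^2}^2 , \\
m(A,\psi) = \tau
  \ &\Longrightarrow\
  \frac{1}{2\pi}\int_\Sigma \tau\omega - r
   = \frac{1}{4\pi}\,\|\psi\|_{L^2}^2 \ \ge\ 0 .
\end{align*}
```

So the left-hand side cannot be negative, it vanishes exactly when $\psi \equiv 0$, and it is strictly positive exactly when $\psi$ is not identically zero.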
In fact, $m$ is submersive at $(A, \psi)$ precisely when $\psi \not\equiv 0$, which is also the locus on which the gauge-action is free. When $\int_\Sigma \tau\omega > 2\pi r$, the free gauge-action on $m^{-1}(\tau)$ admits local slices (see below), so the Kähler quotient $m^{-1}(\tau)/\mathcal G$ is a Kähler manifold.

2.1.2. The vortex equations. The vortex equations with parameter $\tau$ are the following coupled equations for a pair $(A, \psi) \in \mathcal C(L)$:
$$\bar\partial_A\psi = 0 \quad \text{in } \Omega^{0,1}_\Sigma(L), \tag{7}$$
$$m(A, \psi) = \tau \quad \text{in } \Omega^0_\Sigma. \tag{8}$$
Individually, we will refer to them as the Cauchy–Riemann equation and the moment map equation. The space of solutions $\widetilde{\mathcal V}(L, \tau)$ is invariant under $\mathcal G$, and the quotient space $\mathcal V(L, \tau) := \widetilde{\mathcal V}(L, \tau)/\mathcal G$ is called the vortex moduli space. The fundamental results about $\mathcal V(L, \tau)$ are as follows.

Proposition 2.1. Assume that $\int_\Sigma \tau\omega > 2\pi r$.
(a) The space $\mathcal V(L, \tau)$ is a finite-dimensional, complex (therefore smooth and Kähler) submanifold of $m^{-1}(\tau)/\mathcal G$.
(b) The map $Z\colon \mathcal V(L, \tau) \to \mathrm{Sym}^r(\Sigma)$, $[A, \psi] \mapsto \psi^{-1}(0)$ is an isomorphism of complex manifolds.
The unitary connection $A$ induces a holomorphic structure on $L$: a local section is holomorphic if and only if it lies in $\ker\bar\partial_A$. By means of the holomorphic structure, one attaches multiplicities to points of $\psi^{-1}(0)$, so that $\psi$ has $r$ zeros in all. This makes sense of $Z$. We write $L_A$ for $L$ with this holomorphic structure. Item (a) is proved by an elliptic regularity argument, and we shall say a little more about it. As for (b), the statement that $Z$ is bijective is an existence and uniqueness theorem for solutions to the vortex equations. This is the heart of the theorem, and various proofs are known, see e.g. Jaffe and Taubes [6], García-Prada [5]. The ‘degenerate’ case, where $\int_\Sigma \tau\omega = 2\pi r$, is also interesting:

Addendum 2.2. When $\int_\Sigma \tau\omega = 2\pi r$, the moduli space $\mathcal V(L, \tau) = \{(A, 0) : iF_A = \tau\omega\}/\mathcal G$ is a finite-dimensional, complex (therefore smooth and Kähler) submanifold of $m^{-1}(\tau)/\mathcal G$. The map $\mathcal V(L, \tau) \to \mathrm{Pic}^L(\Sigma)$; $[A, 0] \mapsto L_A$ is an isomorphism of complex manifolds. Here $\mathrm{Pic}^L(\Sigma)$ is the Picard torus of holomorphic structures on $L$.

2.1.3. Smoothness of the moduli space. This is a standard application of elliptic theory. We run through it briefly in preparation for the family version considered later; see [13] for some more details. The tangent space to the affine space $\mathcal C^2_1$ is the space of pairs $(a, \phi)$, where $a$ is an imaginary one-form, $\phi$ a section, both of class $L^2_1$. One obtains local slices for the action of $\mathcal G$ by imposing the Coulomb gauge condition
$$d^*(ia) + \mathrm{Im}\langle\psi, \phi\rangle = 0, \tag{9}$$
which says that $(a, \phi)$ is orthogonal to the gauge-orbit of $(A, \psi)$. Note that the left-hand side is gauge-equivariant. The linearisations of the two vortex equations at the solution $(A, \psi)$ are
$$\bar\partial_A\phi + a^{0,1}\psi = 0, \qquad *i\,da + \mathrm{Re}\langle\psi, \phi\rangle = 0. \tag{10}$$
The second of these and (9) are real and imaginary parts of the single equation
$$\bar\partial^*(a^{0,1}) - \tfrac12\langle\psi, \phi\rangle = 0. \tag{11}$$
Hence the space of solutions to Eqs. (10, 9) is the kernel of the $\mathbb C$-linear differential operator
$$D_{(A,\psi)}\colon (a, \phi) \mapsto \bigl(\bar\partial_A\phi + a^{0,1}\psi,\ \bar\partial^*(a^{0,1}) - \tfrac12\langle\psi, \phi\rangle\bigr). \tag{12}$$
Now, $D_{(A,\psi)}$ is a compact perturbation of the Fredholm operator $(a, \phi) \mapsto (\bar\partial_A\phi, \bar\partial^*(a^{0,1}))$, which has index $(r + 1 - g) - (1 - g) = r$ (over $\mathbb C$). Hence $D_{(A,\psi)}$ is also Fredholm of index $r$. It is surjective (this can be seen by computing $D_{(A,\psi)}D^*_{(A,\psi)}$, see [13]), so its kernel has constant rank $r$. From this point it is straightforward to check, using the implicit function theorem, that $\mathcal V(L, \tau)$ is a differentiable submanifold of $m^{-1}(\tau)/\mathcal G$. Since its tangent spaces $\ker(D_{A,\psi})$ are complex linear, it is a complex submanifold.
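The index count quoted above can be unpacked with Riemann–Roch. This is a sketch only; it assumes that the relevant function spaces are complexified and that the index of an adjoint is minus the index of the operator.

```latex
\begin{align*}
\operatorname{ind}_{\mathbb C}\bigl(\bar\partial_A\colon \Omega^0_\Sigma(L)\to\Omega^{0,1}_\Sigma(L)\bigr)
  &= \deg L + 1 - g = r + 1 - g, \\
\operatorname{ind}_{\mathbb C}\bigl(\bar\partial^*\colon \Omega^{0,1}_\Sigma\to\Omega^{0}_\Sigma\otimes\mathbb C\bigr)
  &= -\operatorname{ind}_{\mathbb C}\bigl(\bar\partial\colon \Omega^{0}_\Sigma\otimes\mathbb C\to\Omega^{0,1}_\Sigma\bigr)
   = -(1-g), \\
\operatorname{ind}_{\mathbb C}\bigl((a,\phi)\mapsto(\bar\partial_A\phi,\ \bar\partial^*(a^{0,1}))\bigr)
  &= (r+1-g) - (1-g) = r .
\end{align*}
```

Since the zeroth-order terms $a^{0,1}\psi$ and $\tfrac12\langle\psi,\phi\rangle$ are compact perturbations, $D_{(A,\psi)}$ has the same index $r$, which matches the complex dimension of $\mathrm{Sym}^r(\Sigma)$.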
2.2. The Kähler class on the vortex moduli space. As we have seen, the moduli space $\mathcal V(L, \tau)$ is a complex manifold equipped with a canonical Kähler form $\sigma_\tau$. We write $\sigma_\tau$ also for its pullback by $Z^{-1}$, a Kähler form on $\mathrm{Sym}^r(\Sigma)$. The target of this section is to determine its cohomology class. A $(2-p)$-cycle $\zeta$ in $\Sigma$ gives rise to a closed subset $\delta_\zeta \subset \mathrm{Sym}^r(\Sigma)$ representing a $(2r-p)$-cycle: $\delta_\zeta$ consists of divisors $D \in \mathrm{Sym}^r(\Sigma)$ such that $\mathrm{mult}_x(D) = \mathrm{mult}_x(\zeta)$ for all $x \in \Sigma$. Using this map followed by Poincaré duality on $\mathrm{Sym}^r(\Sigma)$, we obtain a map $\nu_p\colon H_p(\Sigma; \mathbb Z) \to H^{2-p}(\mathrm{Sym}^r(\Sigma); \mathbb Z)$. It is well-known that $\nu_1$ is an isomorphism. When $p = 2$, an isomorphism
$$H_0(\Sigma; \mathbb Z) \oplus \Lambda^2 H_1(\Sigma; \mathbb Z) \xrightarrow{\ \cong\ } H^2(\mathrm{Sym}^r(\Sigma); \mathbb Z)$$
is given by $(a, b \wedge c) \mapsto \nu_0(a) + \nu_1(b) \cup \nu_1(c)$. We define
• $\eta \in H^2(\mathrm{Sym}^r(\Sigma); \mathbb Z)$ to be the class corresponding to the point class in $H_0(\Sigma; \mathbb Z)$;
• $\theta \in H^2(\mathrm{Sym}^r(\Sigma); \mathbb Z)$ to be the class corresponding to the cup-product form on $H^1(\Sigma; \mathbb Z)$ (here we think of the cup-product form as an element of $\mathrm{Hom}(\Lambda^2 H^1(\Sigma; \mathbb Z), \mathbb Z) = \Lambda^2 H_1(\Sigma; \mathbb Z)$).
Often we conflate these integral classes with their images in real cohomology.

Theorem 3. The equation
$$\frac{1}{2\pi}[\sigma_\tau] = \Bigl(\int_\Sigma \tau\omega\Bigr)\eta + 2\pi(\theta - r\eta)$$
holds in $H^2(\mathrm{Sym}^r(\Sigma); \mathbb R)$.

As already mentioned, this formula was found by Manton–Nasir [10]. Our (quite different) method of proof is to exhibit connections on two line bundles over the orbit space of irreducible pairs, $\mathcal C^*/\mathcal G$. The Chern classes of these line bundles restrict to $\eta$ and $\theta - r\eta$ on $\mathcal V(L, \tau)$, while the appropriate linear combination of their curvature forms restricts exactly to the form $\sigma_\tau$.

2.2.1. Cohomology of the orbit space. We write $\mathcal C^* = \mathcal C^*(L)$ for the space of pairs $(A, \psi) \in \mathcal C(L)$ with $\psi$ not identically zero, $\mathcal B^*$ for the orbit space $\mathcal C^*/\mathcal G$, and $i\colon \mathcal V(L, \tau) \to \mathcal B^*$ for the inclusion.

Lemma 2.3. $i$ induces a surjection on cohomology, and an isomorphism on $H^{\le 2}$.

Proof. Using the cohomology slant product operation, define
$$\mu_{\mathcal B}\colon H_*(\Sigma; \mathbb Z) \to H^{2-*}(\mathcal B^*; \mathbb Z), \quad h \mapsto c_1(\mathbf L_{\mathcal B})/h,$$
$$\mu_{\mathrm{Sym}}\colon H_*(\Sigma; \mathbb Z) \to H^{2-*}(\mathrm{Sym}^r(\Sigma); \mathbb Z), \quad h \mapsto c_1(\mathbf L_{\mathrm{Sym}})/h.$$
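Here the line bundle $\mathbf L_{\mathcal B} \to \mathcal B^* \times \Sigma$ is $\mathbf L_{\mathcal B} = \widetilde{\mathbf L}_{\mathcal B}/\mathcal G$, where the equivariant line bundle $\widetilde{\mathbf L}_{\mathcal B} \to \mathcal C^* \times \Sigma$ is the pullback of $L \to \Sigma$; and $\mathbf L_{\mathrm{Sym}} \to \mathrm{Sym}^r(\Sigma) \times \Sigma$ is the topological line bundle corresponding to the universal divisor $\Delta^{\mathrm{univ}} \subset \mathrm{Sym}^r(\Sigma) \times \Sigma$. These maps extend uniquely to ring homomorphisms
$$\Lambda^* H_1(\Sigma; \mathbb Z) \otimes_{\mathbb Z} \mathbb Z[H_0(\Sigma)] \to H^*(\mathcal B^*; \mathbb Z),$$
$$\Lambda^* H_1(\Sigma; \mathbb Z) \otimes_{\mathbb Z} \mathbb Z[H_0(\Sigma)] \to H^*(\mathrm{Sym}^r(\Sigma); \mathbb Z),$$
since the ring on the left is freely generated by $H_0(\Sigma; \mathbb Z) \oplus H_1(\Sigma; \mathbb Z)$. These are homomorphisms of graded rings where the grading on the left is characterised by the property that $H_i(\Sigma; \mathbb Z)$ has degree $2 - i$. The first of these two maps is an isomorphism [1, pp. 539–545]. The second is surjective, since the image of $\mu_{\mathrm{Sym}}$ contains $H^1(\mathrm{Sym}^r(\Sigma); \mathbb Z)$ and the class $\eta$, and these generate the cohomology ring. To prove the lemma it suffices to show that $i^* \circ \mu_{\mathcal B} = Z^* \circ \mu_{\mathrm{Sym}}$. This follows from the fact that $(i \times 1)^*\mathbf L_{\mathcal B}$ is isomorphic to $(Z \times 1)^*\mathbf L_{\mathrm{Sym}}$. To see that these bundles are isomorphic, observe that the former has a tautological section which vanishes precisely along $\Delta^{\mathrm{univ}}$.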
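It is convenient to have some notation to hand for integral (co)homology classes on $\Sigma$. Let $e_0 \in H_0(\Sigma)$ be the class of a point, $e_2 \in H_2(\Sigma)$ the orientation class. Let $e^0 \in H^0(\Sigma)$, $e^2 \in H^2(\Sigma)$ be their duals. Let $\{\alpha_i, \beta_j\}_{1\le i,j\le g}$ be a symplectic basis for $H_1(\Sigma)$, and $\{\alpha^i, \beta^j\}_{1\le i,j\le g}$ the dual basis for $H^1(\Sigma)$. Now put
$$\tilde\eta = \mu_{\mathcal B}(e_0), \qquad \tilde\theta = \sum_{i=1}^g \mu_{\mathcal B}(\alpha_i) \cup \mu_{\mathcal B}(\beta_i). \tag{13}$$

Lemma 2.4. $c_1(\mathbf L_{\mathcal B})^2/e_2 = 2r\tilde\eta - 2\tilde\theta$ in $H^2(\mathcal B^*; \mathbb Z)$.

Proof. The group $H^2(\mathcal B^* \times \Sigma; \mathbb Z)$ is the direct sum of its Künneth components $H^0(\mathcal B^*; \mathbb Z) \otimes H^2(\Sigma; \mathbb Z)$, $H^1(\mathcal B^*; \mathbb Z) \otimes H^1(\Sigma; \mathbb Z)$ and $H^2(\mathcal B^*; \mathbb Z) \otimes H^0(\Sigma; \mathbb Z)$. The Chern class $c_1(\mathbf L_{\mathcal B})$ is tautologically the sum of
$$\mu_{\mathcal B}(e_2) \otimes e^2 \in H^0(\mathcal B^*; \mathbb Z) \otimes H^2(\Sigma; \mathbb Z),$$
$$\sum_{i=1}^g \bigl(\mu_{\mathcal B}(\alpha_i) \otimes \alpha^i + \mu_{\mathcal B}(\beta_i) \otimes \beta^i\bigr) \in H^1(\mathcal B^*; \mathbb Z) \otimes H^1(\Sigma; \mathbb Z),$$
$$\mu_{\mathcal B}(e_0) \otimes e^0 \in H^2(\mathcal B^*; \mathbb Z) \otimes H^0(\Sigma; \mathbb Z).$$
Let us call these terms $A$, $B$ and $C$ respectively. Note that $A = r\cdot 1 \otimes e^2$ (by definition of $\mathbf L_{\mathcal B}$) and $C = \tilde\eta \otimes e^0$. The Künneth isomorphism is compatible with cup products, providing that one uses the graded tensor product of graded rings. Thus $A \cup C = r\tilde\eta \otimes e^2 = C \cup A$, and
$$B^2 = -\sum_{i=1}^g \bigl(\mu_{\mathcal B}(\alpha_i) \cup \mu_{\mathcal B}(\beta_i) - \mu_{\mathcal B}(\beta_i) \cup \mu_{\mathcal B}(\alpha_i)\bigr) \otimes e^2 = -2\tilde\theta \otimes e^2.$$
Hence $c_1(\mathbf L_{\mathcal B})^2/e_2 = 2(r\tilde\eta - \tilde\theta)$.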
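The final step of the proof collects only those products that land in the Künneth component paired with $e_2$. The bookkeeping can be spelled out as follows. This is a sketch, with $A$, $B$, $C$ as in the proof and $\tilde\eta$, $\tilde\theta$ denoting the classes of (13) on the orbit space.

```latex
\begin{align*}
c_1(\mathbf L_{\mathcal B})^2 &= (A + B + C)^2, \\
\text{component in } H^2(\mathcal B^*)\otimes H^2(\Sigma):\quad
  & A\cup C + C\cup A + B^2
    = 2r\,\tilde\eta\otimes e^2 - 2\,\tilde\theta\otimes e^2, \\
c_1(\mathbf L_{\mathcal B})^2 / e_2 &= 2r\,\tilde\eta - 2\,\tilde\theta .
\end{align*}
```

The remaining products do not contribute: $A^2 = 0$ because $(e^2)^2 = 0$ on a surface, $A\cup B$ and $B\cup A$ vanish because $H^3(\Sigma) = 0$, while $B\cup C$, $C\cup B$ and $C^2$ lie in components whose $\Sigma$-factor has degree less than two, so their slant product with $e_2$ is zero.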
2.2.2. A connection on $\mathbf L_{\mathcal B}$. We now write down a canonical connection $\nabla$ on $\mathbf L_{\mathcal B}$, and compute its curvature. This calculation is modelled on that of Donaldson and Kronheimer [3, p. 195]. We will use the curvature form, together with its wedge-square, to construct a closed two-form on $\mathcal B^*$, representing a known cohomology class, whose restriction to $\mathcal V(L, \tau)$ is $\sigma_\tau$. The connection $\nabla$ is concocted from two ingredients:
• a certain unitary, $\mathcal G$-invariant connection $\widetilde\nabla$ on the line bundle $\mathrm{pr}_2^* L \to \mathcal C^* \times \Sigma$;
• a certain connection $\Theta$ on the principal $\mathcal G$-bundle $\mathcal C^* \to \mathcal B^*$, pulled back to $\mathcal B^* \times \Sigma$.
As explained in [3], such data determine a connection $\nabla$ on the quotient line bundle $\mathbf L_{\mathcal B} \to \mathcal B^* \times \Sigma$, characterised by the condition
$$(\nabla_v s)^{\wedge} = \widetilde\nabla_{\hat v}(\hat s)$$
for local sections $s$ and vector fields $v$, where $\hat{\ \cdot\ }$ denotes $\Theta$-horizontal lifting.

The connection $\widetilde\nabla$ is trivial in the $\mathcal C^*$-directions and tautological in the $\Sigma$-directions. To amplify: a section of $\mathrm{pr}_2^* L$ is a map $s\colon \mathcal C^* \times \Sigma \to L$ with $s(A, \psi, x) \in L_x$, and at the point $(A, \psi, x)$,
$$\widetilde\nabla_{(a,\phi,v)} s = d_{A,v}\bigl(s|\{A, \psi\} \times \Sigma\bigr)(x) + \frac{d}{dt}\Big|_{t=0} s(A + ta, \psi + t\phi, x). \tag{14}$$
The curvature of $\widetilde\nabla$ is given by
$$F_{\widetilde\nabla}((0, 0, u), (0, 0, v)) = F_A(u, v), \quad F_{\widetilde\nabla}((a, \phi, 0), (0, 0, v)) = \langle a, v\rangle, \quad F_{\widetilde\nabla}((a, \phi, 0), (a', \phi', 0)) = 0. \tag{15}$$
We can obtain a connection on $\mathcal C^* \to \mathcal B^*$ from our gauge-fixing condition: the horizontal space over $[A, \psi]$ is the kernel of the linear operator $(a, \phi) \mapsto d^*(ia) - \mathrm{Im}\langle\psi, \phi\rangle$. To write down the connection one-form $\Theta$, we need the Green’s operator $G_\psi$ associated to the Laplacian
$$\Delta_\psi = d^*d + |\psi|^2\colon \Omega^0_\Sigma \to \Omega^0_\Sigma.$$
$\Delta_\psi$ is surjective (since $d^*d$ maps onto the functions of mean-value zero), inducing an isomorphism of $\ker(\Delta_\psi)^\perp$ with $\Omega^0_\Sigma$; its inverse is $G_\psi$.

Lemma 2.5. The connection one-form is given by $\Theta_{(A,\psi)}(a, \phi) = iG_\psi(d^*ia - \mathrm{Im}\langle\psi, \phi\rangle) \in i\Omega^0_\Sigma$.

Proof. This form has the correct kernel, so to justify the assertion one simply observes that it is invariant under $\mathcal G$:
$$\Theta_{(A,\psi,x)}(-df, f\psi, 0) = f, \quad f \in i\Omega^0_\Sigma.$$
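In accordance with the general pattern explained in [3], the curvature of the quotient connection $\nabla$ on $\mathbf L_{\mathcal B} \to \mathcal B^* \times \Sigma$ is given by
$$F_\nabla((0, 0, u), (0, 0, v)) = F_A(u, v), \quad F_\nabla((a, \phi, 0), (0, 0, v)) = \langle a, v\rangle, \quad F_\nabla((a_1, \phi_1, 0), (a_2, \phi_2, 0)) = 2iG_\psi(d^*ib - \mathrm{Im}\langle\psi, \chi\rangle). \tag{16}$$
Here $(a_1, \phi_1)$ and $(a_2, \phi_2)$ are vector fields on $\mathcal B^*$ which are horizontal with respect to $\Theta$; their Lie bracket is $(b, \chi)$.

Lemma 2.6. Suppose $(a_1, \phi_1)$ and $(a_2, \phi_2)$ are horizontal. Then $d^*(ib) - \mathrm{Im}\langle\psi, \chi\rangle = -\mathrm{Im}\langle\phi_1, \phi_2\rangle$.

Proof. Denote the pair $(A + ta_1, \psi + t\phi_1)$ by $c_t$. Then, at $(A, \psi)$,
$$b = \frac1t\bigl(a_2(c_0) - a_2(c_t)\bigr) + o(t), \qquad \chi = \frac1t\bigl(\phi_2(c_0) - \phi_2(c_t)\bigr) + o(t)$$
as $t \to 0$. But at $c_t$, $d^*ia_2 = \mathrm{Im}\langle\psi + t\phi_1, \phi_2(c_t)\rangle$, and from this one obtains
$$d^*(ib) - \mathrm{Im}\langle\psi, \chi\rangle = -\lim_{t\to 0}\mathrm{Im}\langle\phi_1(c_0), \phi_2(c_t)\rangle = -\mathrm{Im}\langle\phi_1(c_0), \phi_2(c_0)\rangle.$$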
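2.2.3. Two-forms as curvature integrals. We are now in a position to write down closed two-forms representing $c_1(\mathbf L_{\mathcal B})/e_0$ and $c_1(\mathbf L_{\mathcal B})^2/e_2$ in de Rham cohomology.

Note. In this paragraph we insist that the tangent vectors $(a_j, \phi_j)$ are horizontal.

The class $c_1(\mathbf L_{\mathcal B})$ has the Chern–Weil representative $iF_\nabla/2\pi$, so
$$c_1(\mathbf L_{\mathcal B})/e_0 = \frac{1}{2\pi}\int_\Sigma iF_\nabla \wedge \omega_0, \quad \text{where } \int_\Sigma \omega_0 = 1. \tag{17}$$
Explicitly, this representative for $c_1(\mathbf L_{\mathcal B})/e_0$ is the two-form
$$((a_1, \phi_1), (a_2, \phi_2)) \mapsto \frac{1}{\pi[\omega]}\int_\Sigma G_\psi(\mathrm{Im}\langle\phi_1, \phi_2\rangle)\,\omega. \tag{18}$$
Similarly,
$$c_1(\mathbf L_{\mathcal B})^2/e_2 = \frac{1}{4\pi^2}\int_\Sigma iF_\nabla \wedge iF_\nabla. \tag{19}$$
This integral involves the product of the first and third curvature terms, and the square of the second term. So $c_1(\mathbf L_{\mathcal B})^2/e_2$ has the representative
$$((a_1, \phi_1), (a_2, \phi_2)) \mapsto \frac{1}{\pi^2}\int_\Sigma G_\psi(\mathrm{Im}\langle\phi_1, \phi_2\rangle)\,iF_A - \frac{1}{2\pi^2}\int_\Sigma ia_1 \wedge ia_2. \tag{20}$$
Notice the appearance of an expression familiar from (2) as the second term.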
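At this point we impose the moment map equation, restricting these forms and classes to the locus where $m(A, \psi) = \tau$. On that locus, the class
$$4\pi^2(\tilde\theta - r\tilde\eta) + 2\pi[\tau\omega]\tilde\eta = 2\pi\bigl(-\pi c_1(\mathbf L_{\mathcal B})^2/e_2 + [\tau\omega]c_1(\mathbf L_{\mathcal B})/e_0\bigr) \tag{21}$$
is represented by the form
$$\begin{aligned} \int_\Sigma ia_1 \wedge ia_2 + 2\int_\Sigma G_\psi(\mathrm{Im}\langle\phi_1, \phi_2\rangle)(\tau\omega - iF_A) &= \int_\Sigma ia_1 \wedge ia_2 + \int_\Sigma G_\psi(\mathrm{Im}\langle\phi_1, \phi_2\rangle)|\psi|^2\omega \\ &= \int_\Sigma ia_1 \wedge ia_2 + \int_\Sigma \mathrm{Im}\langle\phi_1, \phi_2\rangle\,\omega \\ &= \sigma((a_1, \phi_1), (a_2, \phi_2)). \end{aligned} \tag{22}$$
(Recall that $\sigma$ is our standard Kähler form on $\mathcal C^*$.) The penultimate equality uses the observation that, because the Laplacian of a function $f$ has mean value zero,
$$\int_\Sigma f\omega = \int_\Sigma \Delta_\psi G_\psi(f)\,\omega = \int_\Sigma |\psi|^2 G_\psi(f)\,\omega.$$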
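The first line of (22) is the stated linear combination of the representatives (18) and (20). A sketch of the arithmetic follows, under the assumptions that $\tau$ is constant (so that $[\tau\omega]/[\omega] = \tau$, with $[\omega]$ read as $\int_\Sigma\omega$), and with the shorthand $u = G_\psi(\mathrm{Im}\langle\phi_1,\phi_2\rangle)$, $R_0$ for the form (18) and $R_2$ for the form (20); these three names are introduced here for illustration only.

```latex
\begin{align*}
R_0 &= \frac{1}{\pi[\omega]}\int_\Sigma u\,\omega,
\qquad
R_2 = \frac{1}{\pi^2}\int_\Sigma u\, iF_A - \frac{1}{2\pi^2}\int_\Sigma ia_1\wedge ia_2, \\
2\pi\bigl(-\pi R_2 + [\tau\omega]\,R_0\bigr)
  &= -2\int_\Sigma u\, iF_A + \int_\Sigma ia_1\wedge ia_2 + 2\tau\int_\Sigma u\,\omega \\
  &= \int_\Sigma ia_1\wedge ia_2 + 2\int_\Sigma u\,(\tau\omega - iF_A), \\
\tau\omega - iF_A &= \tfrac12\,|\psi|^2\,\omega
  \quad\text{on the locus } m(A,\psi) = \tau .
\end{align*}
```

Substituting the last line into the one before it gives the second expression in (22).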
Proof of Theorem 3. What we have just found is that the class $2\pi\bigl([\tau\omega]\tilde\eta + 2\pi(\tilde\theta - r\tilde\eta)\bigr)$ on $\mathcal B^*$, restricted to $m^{-1}(\tau)/\mathcal G$, is equal to $[\sigma_\tau]$. Restricting further to the vortex moduli space, we find that the class of our preferred Kähler form is
$$2\pi\bigl([\tau\omega]\tilde\eta + 2\pi(\tilde\theta - r\tilde\eta)\bigr)|\mathcal V(L, \tau) \in H^2(\mathcal V(L, \tau); \mathbb R).$$
Hence, pulling back by $Z$, we find that the class of our Kähler form on $\mathrm{Sym}^r(\Sigma)$ is $2\pi\bigl([\tau\omega]\eta + 2\pi(\theta - r\eta)\bigr)$, which is the formula we have been working towards.
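As a consistency check, one can compare this with Theorem 1 when the base $S$ is a point. The following is a sketch under stated assumptions: the class in Theorem 1 is read as $2\pi\bigl(\tau[\Omega]^{[1]} - \pi\,1^{[2]}\bigr)$, $\tau$ is constant, $[\Omega] = [\omega] = \bigl(\int_\Sigma\omega\bigr)e^2$, fibre integration against $e^2$ is identified with the slant product by $e_0$ (up to sign conventions), and Lemma 2.4 is transported to $\mathrm{Sym}^r(\Sigma)$ through the identification in the proof of Lemma 2.3.

```latex
\begin{align*}
[\omega]^{[1]}
  &= p_{1!}\bigl(p_2^*[\omega]\smile\delta\bigr)
   = \Bigl(\int_\Sigma\omega\Bigr)\,\delta/e_0
   = \Bigl(\int_\Sigma\omega\Bigr)\,\eta, \\
1^{[2]}
  &= p_{1!}\bigl(\delta^2\bigr) = \delta^2/e_2 = 2\,(r\eta - \theta), \\
2\pi\bigl(\tau[\omega]^{[1]} - \pi\,1^{[2]}\bigr)
  &= 2\pi\Bigl(\Bigl(\int_\Sigma\tau\omega\Bigr)\eta + 2\pi\,(\theta - r\eta)\Bigr).
\end{align*}
```

Under these assumptions the right-hand side agrees with the class found in the proof of Theorem 3.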
2.3. The Duistermaat–Heckman formula. The Duistermaat–Heckman formula [2] for the variation of cohomology of symplectic quotients gives another proof that the cohomology class $[\sigma_\tau]$ varies linearly with $\tau$ (provided that $\tau$ is a constant function) and computes the slope. Suppose that one has a Hamiltonian $S^1$-action on $(M, \omega)$, with moment map $\mu\colon M \to \mathfrak t^*$. Here $\mathfrak t = \mathrm{Lie}(S^1)$. Identify $\mathfrak t^*$ with $\mathbb R$ so that the lattice dual to $\exp^{-1}(1) \subset \mathfrak t$ corresponds to $\mathbb Z \subset \mathbb R$. Suppose that $\mu$ is proper, and that its restriction to $\mu^{-1}(I)$ is submersive, for some open interval $I \subset \mathbb R$. The family of symplectic quotients $(M_t, \omega_t)_{t\in I}$ is then a trivial fibre bundle, and a trivialisation gives an identification of the cohomology of $M_t$ with that of a fixed fibre $M_s$. The identification is canonical, hence $\{[\omega_t]\}_{t\in I}$ can be considered as a family of classes on $M_s$. Suppose that $S^1$ acts freely on $\mu^{-1}(s)$, so that $\mu^{-1}(s) \to M_s$ is a principal circle-bundle, with Chern class $c \in H^2(M_s; \mathbb R)$. The Duistermaat–Heckman formula says that
$$\frac{d}{dt}[\omega_t] = 2\pi c. \tag{23}$$
We apply this with $M = \bigcup_{\tau\in\mathbb R}\widetilde{\mathcal V}(L, \tau)/\mathcal G_0$, where $\mathcal G_0$ is the based gauge group $\{u\colon \Sigma \to \mathrm U(1) : u(x) = 1\}$, $x \in \Sigma$ an arbitrary basepoint, and $\tau \in \mathbb R$ stands for a constant function
on $\Sigma$. The circle acts by constant gauge transformations. We take $I = (2\pi r/\int_\Sigma\omega, \infty)$; the Chern class $c$ of $\widetilde{\mathcal V}(L, \tau)/\mathcal G_0 \to \widetilde{\mathcal V}(L, \tau)/\mathcal G$ is $\eta$. Formula (23) gives
$$\frac{d}{d\tau}[\sigma_\tau] = 2\pi\Bigl(\int_\Sigma\omega\Bigr)\eta,$$
which is consistent with our result. One can formally recover the constant term $4\pi^2(\theta - r\eta)$ by specialising to the degenerate parameter $\tau = 2\pi r/\int_\Sigma\omega$ (for which the formula $[\sigma_\tau] = 4\pi^2\theta$ is easily verified); however, justifying this formal manipulation would need further thought. Since Duistermaat and Heckman’s proof identifies the variation in the symplectic forms with the curvature of a connection on $\mu^{-1}(s) \to M_s$, the two methods are perhaps not so different as they appear.

3. Families of Vortex Moduli Spaces

3.1. Construction of the vortex fibration. (a) Suppose that $X \to S$ is a smooth fibre bundle, where $X$ and $S$ are connected and oriented, and that the typical fibre is a compact surface $\Sigma$. Let $L \to X$ be a principal U(1)-bundle, and assume that $L|X_s \to X_s$ has degree $r > 0$. Consider $X \to S$ as a fibration with structure group $\mathrm{Diff}^+(\Sigma)$. Putting $P = L|X_s$, we can consider the composite map $L \to X \to S$ as a fibration with typical fibre $P$ and structure group $\mathrm{Diff}^+_P(\Sigma)$. The latter is the group of pairs $(\tilde g, g)$, where $\tilde g \in \mathrm{Aut}(P)$ is an automorphism covering $g \in \mathrm{Diff}^+(\Sigma)$, so it is an extension of $\mathrm{Diff}^+(\Sigma)$ by the gauge group. There are natural left actions of $\mathrm{Diff}^+_P(\Sigma)$ on the space of connections $\mathcal A(P)$ and on the space of sections $\Omega^0_\Sigma(P)$. These arise through the covariance of connections and of sections; representing a connection by its one-form $A \in \Omega^1_P$, we have
$$\tilde g.A = (\tilde g^{-1})^*A; \qquad \tilde g.\psi = \tilde g \circ \psi \circ g^{-1}.$$
One can then form the associated fibrations
$$L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal A(P) \to S, \qquad L \times_{\mathrm{Diff}^+_P(\Sigma)} \Omega^0_\Sigma(P) \to S,$$
with structure group $\mathrm{Diff}^+_P(\Sigma)$. These may be thought of as the bundles of connections (resp. sections) along the fibres of $X \to S$:
$$L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal A(P) \cong \{(s, A) : s \in S,\ A \in \mathcal A_{X_s}(L_s)\},$$
$$L \times_{\mathrm{Diff}^+_P(\Sigma)} \Omega^0_\Sigma(P) \cong \{(s, \psi) : s \in S,\ \psi \in \Omega^0_{X_s}(L_s)\}.$$
The first of these has the special property that it is a symplectic fibration: its structure group is reduced to the symplectic automorphism group of $\mathcal A(P)$. Other fibrations can be derived from these basic ones. The space $\mathcal C(P) = \mathcal A(P) \times \Omega^0_\Sigma(P \times_{\mathrm U(1)} \mathbb C)$, comprising pairs $(A, \psi)$, where $\psi$ is a section of the line bundle associated with $P$, is also a $\mathrm{Diff}^+_P(\Sigma)$-space (the action is the diagonal one), and so is $\mathcal B(P) = \mathcal C(P)/\mathcal G$,
because $\mathcal{G}$ acts on $\mathcal{C}(P)$ as a subgroup of $\mathrm{Diff}^+_P(\Sigma)$. The associated fibrations are
$$\mathcal{C}_{X/S}(L) := L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal{C}(P), \qquad \mathcal{B}_{X/S}(L) := L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal{B}(P).$$
(b) Suppose now that $X \to S$ is itself a symplectic fibration, i.e. that its structure group is reduced to $\mathrm{Aut}(\Sigma, \omega)$ for some area form $\omega$. Then the structure group of $L \to S$ is reduced to $\mathrm{Aut}_P(\Sigma, \omega)$, the group of pairs $(\tilde f, f)$ with $f^*\omega = \omega$, and $\mathcal{C}_{X/S}(L) \to S$ is again a symplectic fibration. Note that $P \times_{U(1)} \mathbb{C}$ is a hermitian line bundle, so our formula for the symplectic form on $\mathcal{C}(P)$ makes sense. Let $\{j_s \in J(X_s, \omega_s)\}_{s \in S}$ be a smooth family of complex structures, compatible with the symplectic forms. The moment map
$$m : \mathcal{C}(P) \to i\Omega^2_\Sigma, \qquad (A, \psi) \mapsto {*}iF_A + |\psi|^2/2,$$
generalises to a bundle map over $S$,
$$m_{X/S} : \mathcal{C}_{X/S}(L) \to L \times_{\mathrm{Diff}^+_P(\Sigma)} \Omega^0_\Sigma.$$
We now take $\tau$ to be a constant. Then we have a sub-bundle $m_{X/S}^{-1}(\tau) \subset \mathcal{C}_{X/S}(L)$, projecting to a sub-bundle $\pi(m_{X/S}^{-1}(\tau)) \subset \mathcal{B}_{X/S}(L)$ under the quotient map $\pi : \mathcal{C}_{X/S}(L) \to \mathcal{B}_{X/S}(L)$, and $\pi(m_{X/S}^{-1}(\tau\omega)) \to S$ has structure group $\mathrm{Aut}_P(\Sigma, \omega)$.

(c) We now impose a fibred version of the Cauchy–Riemann equation. This differs from what we have done so far in that it cannot be expressed in terms of associated bundles. The total space of the vortex fibration $V_{X/S}(L, \tau) \to S$ is the space of triples $[s, A, \psi] \in \pi(m_{X/S}^{-1}(\tau))$ satisfying $\bar\partial_{j_s, A}\psi = 0$. It maps to $S$ in the obvious way. The fibre over $s$ can be identified with the vortex moduli space $V_{X_s}(L|_{X_s}, \tau)$, and so with $\mathrm{Sym}^r(X_s)$.

Lemma 3.1. The space $V_{X/S}(L, \tau)$ has a structure of smooth manifold which makes the projection $p : V_{X/S}(L, \tau) \to S$ a smooth submersion, hence a fibre bundle.

Proof. The linearisation of the defining equations for $V_{X/S}(L, \tau)$, and the fibrewise gauge-fixing condition, define an $\mathbb{R}$-linear operator $D_{(s,A,\psi)}$:
$$D_{(s,A,\psi)}(v, a, \phi) = D_{A,\psi}(a, \phi) + P(v), \qquad v \in T_s S. \tag{24}$$
Here $P$ is the $0$th-order operator $P(v) = \tfrac{1}{2} i (d_A\psi) \circ \frac{\partial j}{\partial v}$. The operator $D_{(s,A,\psi)}$ is thus Fredholm, of real index $2r + \dim(S)$, and surjective (since $D_{A,\psi}$ is). The kernel of $D_{(s,A,\psi)}$ is the putative tangent space to $V_{X/S}(L, \tau)$ at $(s, A, \psi)$, and the projection $\pi : \ker D_{(s,A,\psi)} \to T_s S$ is putatively the derivative of $p$. Note that $\pi$ is surjective, because its kernel is exactly $\ker D_{A,\psi}$, which we know has dimension $2r$. Now the standard elliptic theory sketched above gives smoothness of the vortex fibration and of the map $p$.
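3.2. Line bundles and cohomology operations. Let $L_{\mathcal{B}} \to \mathcal{C}^*_{X/S} \times_S X$ be the pullback of the line bundle $L \to X$. It is an equivariant line bundle under the fibrewise gauge-action, and so descends to a line bundle $L_{\mathcal{B}} \to \mathcal{B}^*_{X/S} \times_S X$. The universal divisor $\Delta^{\mathrm{univ}} \subset \mathrm{Sym}^r_S(X) \times_S X$ corresponds to a unique line bundle $L_{\mathrm{Sym}} \to \mathrm{Sym}^r_S(X) \times_S X$.

Lemma 3.2. There is a natural isomorphism $(i \times 1)^* L_{\mathcal{B}} \to Z^* L_{\mathrm{Sym}}$, where $i$ is the inclusion of $V_{X/S}(L, \tau)$ in $\mathcal{B}^*_{X/S}$, and $Z$ the natural isomorphism of $V_{X/S}(L, \tau)$ with $\mathrm{Sym}^r_S X$.

Proof. The section $([A, \psi], x) \mapsto [\psi(x)]$ of $(i \times 1)^* L_{\mathcal{B}}$ vanishes precisely along $Z^{-1}(\Delta^{\mathrm{univ}})$.

Using these two line bundles one can construct operations
$$H^*(X) \to H^{*+2k-2}(\mathcal{B}^*_{X/S}(L)), \quad c \mapsto \tilde c^{[k]}, \qquad H^*(X) \to H^{*+2k-2}(\mathrm{Sym}^r_S(X)), \quad c \mapsto c^{[k]},$$
defined for arbitrary coefficient rings. The second of these was discussed earlier (Eq. 1). Introduce the projections
$$\mathcal{B}^*_{X/S} \xleftarrow{\ p_1\ } \mathcal{B}^*_{X/S} \times_S X \xrightarrow{\ p_2\ } X, \qquad \mathrm{Sym}^r_S(X) \xleftarrow{\ p_1\ } \mathrm{Sym}^r_S(X) \times_S X \xrightarrow{\ p_2\ } X,$$
and set
$$\tilde c^{[k]} = p_{1!}\bigl(c_1(L_{\mathcal{B}})^k \cup p_2^* c\bigr), \qquad c^{[k]} = p_{1!}\bigl(c_1(L_{\mathrm{Sym}})^k \cup p_2^* c\bigr). \tag{25}$$
Because of the relation between $L_{\mathcal{B}}$ and $L_{\mathrm{Sym}}$, we have
$$i^* \tilde c^{[k]} = Z^* c^{[k]}. \tag{26}$$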
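As a quick sanity check on the degree shift in these operations, the following sketch tracks degrees through the definition. The symbol $j$ for the degree of $c$ is introduced here only for illustration. The fibres of $p_1$ are the surfaces $X_s$, of real dimension 2, so the pushforward $p_{1!}$ lowers degree by 2.

```latex
c \in H^{j}(X)
\;\Longrightarrow\;
c_1(L_{\mathcal{B}})^k \cup p_2^* c \in H^{j+2k}\bigl(\mathcal{B}^*_{X/S} \times_S X\bigr)
\;\Longrightarrow\;
\tilde c^{[k]} = p_{1!}\bigl(c_1(L_{\mathcal{B}})^k \cup p_2^* c\bigr)
\in H^{j+2k-2}\bigl(\mathcal{B}^*_{X/S}\bigr).
```

In particular, taking $c = 1 \in H^0(X)$ with $k = 2$, or $c \in H^2(X)$ with $k = 1$, produces classes of degree 2, which is the degree needed to describe the class of a two-form.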
3.3. Associated fibrations as locally Hamiltonian fibrations. In Sect. 3.1, we constructed various associated fibrations within the category of symplectic fibrations, that is, fibre bundles with symplectic forms on the fibres. Our next task is to refine these constructions to the category of locally Hamiltonian fibrations. The vortex fibration will then become a LHF by restricting a closed two-form defined on a larger space. The cleanest way that I have found to do this is to 'reverse-engineer' our cohomology calculation for the vortex moduli space. This goes as follows. We need a fibrewise-equivariant connection $\tilde\nabla$ on the bundle $L_{\mathcal{B}} \to \mathcal{C}^*_{X/S} \times_S X$. To obtain one, choose a connection $A_{\mathrm{ref}}$ on $L \to X$. We define $\tilde\nabla$ to be the unique connection which restricts to the natural one (14) on each fibre over $S$, and which is given by $A_{\mathrm{ref}}$ on $T^h X$. In conjunction with the fibrewise gauge-fixing condition, $\tilde\nabla$ defines a quotient connection $\nabla$ on $L_{\mathcal{B}} \to \mathcal{B}^*_{X/S} \times_S X$.

Definition 3.3. We define the closed two-form $\tilde v(\tau, L)$ on $\mathcal{B}^*_{X/S}$ by π v(τ ˜ , L) = 2π iF∇ ∧ τ − iF∇ . 2 X/S
(27)
We define v(, τ, L) to be the restriction of v(τ ˜ , L) to the vortex fibration V X/S (L , τ ). Let us clarify the integration symbol here. Projection on the first factor makes B∗X/S × S X a fibre bundle over B∗X/S . The fibre over a point of B∗X/S × S X which lies over s ∈ S is X s . It therefore makes sense to integrate down the fibres of B∗X/S × S X → B∗X/S . In particular, a closed four-form α on B∗X/S × S X gives rise to a closed two-form α. ∗ × X )/(B∗ ) (B X/S S X/S
We write this more compactly as X/S α. Bearing in mind that integration along the fibre corresponds to the cohomology pushforward, we can read off the cohomology class of v(τ ˜ , L): ˜ [1] − π 1˜ [2] ). [v(τ ˜ , L)] = 2π([τ ] Forming v(τ ˜ , L) is obviously compatible with restricting the base S. By our earlier calculations, the form i ∗ v(τ ˜ , L) on the vortex bundle restricts to the preferred Kähler form on each fibre. Thus [v(τ , L)] = 2π([τ ][1] − π 1[2] ). Theorem 1 is now an immediate consequence of what we have done. Acknowledgements. The work presented here formed a part of my doctoral thesis. I am grateful to my Ph.D. supervisor, Simon Donaldson, for his ideas and advice. Thanks also to Michael Thaddeus for pointing out the Duistermaat–Heckman method, and to Michael Usher for telling me about his related work [16]. I acknowledge support from EPSRC Research Grant EP/C535995/1.
References
1. Atiyah, M., Bott, R.: The Yang–Mills equations over Riemann surfaces. Philos. Trans. Roy. Soc. London Ser. A 308(1505), 523–615 (1983)
2. Duistermaat, J., Heckman, G.: On the variation in the cohomology of the symplectic form of the reduced phase space. Invent. Math. 69(2), 259–268 (1982)
3. Donaldson, S., Kronheimer, P.: The geometry of four-manifolds. Oxford Mathematical Monographs, Oxford: Oxford University Press, 1990
4. Donaldson, S., Smith, I.: Lefschetz pencils and the canonical class for symplectic four-manifolds. Topology 42(4), 743–785 (2003)
5. García-Prada, O.: A direct existence proof for the vortex equations over a compact Riemann surface. Bull. London Math. Soc. 26(1), 88–96 (1994)
6. Jaffe, A., Taubes, C.: Vortices and monopoles. Progress in Physics 2, Boston, MA: Birkhäuser, 1980
7. Kronheimer, P., Mrowka, T.: Monopoles and three-manifolds. New Mathematical Monographs, Vol. 10. Cambridge Univ. Press, (in press)
8. MacDonald, I.: Symmetric products of an algebraic curve. Topology 1, 319–343 (1962)
9. McDuff, D., Salamon, D.: J-holomorphic curves and symplectic topology. Amer. Math. Soc. Colloquium Publications 52. Providence, RI: Amer. Math. Soc., 2004
10. Manton, N., Nasir, S.: Volume of vortex moduli spaces. Commun. Math. Phys. 199(3), 591–604 (1999)
11. Perutz, T.: Surface-fibrations, four-manifolds, and symplectic Floer homology. Ph.D. thesis, Imperial College London, 2005
12. Perutz, T.: Lagrangian matching invariants for fibred four-manifolds: I. Geom. Topol. 11, 759–828 (2007)
13. Salamon, D.: Seiberg–Witten invariants of mapping tori, symplectic fixed points, and Lefschetz numbers. Turkish J. Math. 23(1), 17–143 (1999)
14. Samols, T.: Vortex scattering. Commun. Math. Phys. 145(1), 149–179 (1992)
15. Seidel, P.: Symplectic Floer homology and the mapping class group. Pacific J. Math. 206(1), 219–229 (2002)
16. Usher, M.: Vortices and a TQFT for Lefschetz fibrations on 4-manifolds. Algebr. Geom. Topol. 6, 1677–1743 (2006)

Communicated by N.A. Nekrasov
Commun. Math. Phys. 278, 307–327 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0401-5
Communications in
Mathematical Physics
Yang-Mills Detour Complexes and Conformal Geometry
A. Rod Gover1, Petr Somberg2, Vladimír Souček2
1 Department of Mathematics, The University of Auckland, Private Bag 92019, Auckland 1, New Zealand. E-mail: [email protected]
2 Mathematical Institute, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Praha, Czech Republic. E-mail: [email protected]; [email protected]
Received: 7 July 2006 / Accepted: 2 July 2007
Published online: 8 January 2008 – © Springer-Verlag 2007
Abstract: Working over a pseudo-Riemannian manifold, for each vector bundle with connection we construct a sequence of three differential operators which is a complex (termed a Yang-Mills detour complex) if and only if the connection satisfies the full Yang-Mills equations. A special case is a complex controlling the deformation theory of Yang-Mills connections. In the case of Riemannian signature the complex is elliptic. If the connection respects a metric on the bundle then the complex is formally self-adjoint. In dimension 4 the complex is conformally invariant and generalises, to the full Yang-Mills setting, the composition of (two operator) Yang-Mills complexes for (anti-)self-dual Yang-Mills connections. Via a prolonged system and tractor connection a diagram of differential operators is constructed which, when commutative, generates differential complexes of natural operators from the Yang-Mills detour complex. In dimension 4 this construction is conformally invariant and is used to yield two new sequences of conformal operators which are complexes if and only if the Bach tensor vanishes everywhere. In Riemannian signature these complexes are elliptic. In one case the first operator is the twistor operator and in the other sequence it is the operator for Einstein scales. The sequences are detour sequences associated to certain Bernstein-Gelfand-Gelfand sequences.

1. Introduction

In the study of Riemannian and pseudo-Riemannian geometry it is often valuable to use differential operators with good conformal behaviour. In the Riemannian setting, elliptic differential operators are particularly important. For example the conformal Laplacian controls the conformal variation of the scalar curvature. This was exploited heavily in the solution by Schoen, Aubin, Trudinger, and Yamabe (see [40]) of the "Yamabe Problem" of finding, via conformal rescaling, constant scalar curvature metrics on compact manifolds.
Related curvature prescription problems and techniques have exploited the higher order conformal Laplacians of Paneitz, Graham et al. [8,18,35]. These operators
on functions (or really densities) also find a natural place in the recent developments [24,36] concerning the asymptotics and scattering theory of the conformally compact Poincaré-Einstein metric of Fefferman-Graham [23]. On many tensor and spinor fields there is no conformally invariant elliptic operator (taking values in an irreducible bundle); this follows from the classification of conformally invariant differential operators on the sphere [7,22]. This classification is based on the structure of generalised Verma modules and from this it follows that often the analogue, or replacement, for a conformal elliptic operator on the sphere is an elliptic complex of conformally invariant differential operators. However the situation is complicated for conformally curved structures. The requirement that a sequence of differential operators be both conformally invariant and form a complex is severe. On the other hand when such complexes exist they can be expected to play a serious role in treating the underlying structure. This idea is already well-established in the setting of self-dual 4-manifolds [1,19]. On fully conformally curved $n$-manifolds, with $n$ even, there is a class of elliptic conformal complexes on differential forms [11]. Each of these is different to the de Rham complex, and these complexes generalise the conformally invariant operator of [35], with leading term $\Delta^{n/2}$. Another class of complexes is based around the (Fefferman-Graham) obstruction tensor [23]. This is a natural conformal 2-tensor that generalises, to higher even dimensions, the Bach tensor in dimension 4. It turns out that the formal deformations of obstruction-flat manifolds are controlled by a sequence of conformal operators, which form an elliptic complex if and only if the structure is obstruction-flat [12]. Unfortunately there is no obvious way to generalise either the construction in [11], or that in [12].
For 4-manifolds we construct here two conformal differential sequences which are (formally self-adjoint) complexes if and only if the (conformally invariant) Bach tensor [2] vanishes everywhere. This condition is weaker than self-duality. In fact conformally Einstein manifolds are also Bach-flat and there are structures which are Bach-flat and neither conformally Einstein nor half-flat [30]. Writing $T : S \to \mathrm{Tw}$ for the usual twistor operator on Dirac spinors (as in e.g. [5]), in Theorem 4.5 we obtain a differential complex
$$S \xrightarrow{\ T\ } \mathrm{Tw} \xrightarrow{\ M\ } \mathrm{Tw} \xrightarrow{\ T^*\ } S,$$
where $M$ is a third order Rarita-Schwinger type operator. On the other hand in Theorem 4.3 we construct
$$\mathcal{E}^0 \xrightarrow{\ P\ } \mathcal{E}^{1,1} \xrightarrow{\ M^T\ } \mathcal{E}_{1,1} \xrightarrow{\ P^*\ } \mathcal{E}_0,$$
where $M^T$ is a second order conformal operator, similar in form to the operator which controls deformations of Einstein structures (see [6] and references therein), while $P$ is a curvature modification of the trace-free covariant Hessian. Non-vanishing solutions of $P$ give conformal factors $\sigma$ so that $\sigma^{-2} g$ is Einstein (see [3]); we show via the second sequence that the Bach tensor obstructs solutions. If the manifold is Riemannian then both of the complexes are elliptic. We have been intentionally explicit in treating these constructions, as it seems these complexes should play a fundamental role in conformal and Riemannian geometry. In the compact and Riemannian-signature setting the ellipticity implies that the complexes have finite dimensional cohomology spaces. In both cases the interpretation of the $0$th cohomology is well-known but as far as we know the first cohomology is a new global conformal invariant of Bach-flat structures.
Such conformal elliptic complexes have the scope to yield further geometric information through their detour torsion invariants [10]. In fact Theorems 4.3 and 4.5 construct short detour complexes in all dimensions $n \geq 3$ and $n \geq 4$ respectively. These complexes are conformally invariant only in dimension 4, but by construction have a simple conformal behaviour and may well be of interest for physics in the Lorentzian setting. The route to the constructions and results mentioned above is really one of the main points of the article. We believe that it lays foundations for an eventual general treatment of a large class of related complexes, and also many of the results should be of independent interest. The simplest example of a detour complex is the Maxwell detour complex
$$\mathcal{E}^0 \xrightarrow{\ d\ } \mathcal{E}^1 \xrightarrow{\ \delta d\ } \mathcal{E}_1 \xrightarrow{\ \delta\ } \mathcal{E}_0. \tag{1}$$
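That the Maxwell detour sequence is a complex, and is elliptic in Riemannian signature, can be checked in a few lines. The sketch below is illustrative only: the notation $\sigma_\xi$ for the principal symbol at a nonzero covector $\xi$ is introduced here, and the symbols are written up to overall nonzero constants and sign conventions.

```latex
% Complex property: both compositions vanish because d^2 = 0 and \delta^2 = 0.
(\delta d)\circ d = \delta \circ (d \circ d) = 0,
\qquad
\delta \circ (\delta d) = (\delta\circ\delta)\circ d = 0.

% Symbol sequence at a covector \xi \neq 0 (up to constants), for a function f and a 1-form \varphi:
\sigma_\xi(d) f = f\,\xi,
\qquad
\sigma_\xi(\delta d)\,\varphi = |\xi|^2 \varphi - \langle \xi, \varphi\rangle\, \xi,
\qquad
\sigma_\xi(\delta)\,\varphi = \langle \xi, \varphi \rangle.

% In Riemannian signature |\xi|^2 \neq 0, so:
\ker \sigma_\xi(\delta d) = \mathbb{R}\,\xi = \operatorname{im} \sigma_\xi(d),
\qquad
\operatorname{im} \sigma_\xi(\delta d) = \xi^{\perp} = \ker \sigma_\xi(\delta).
```

Together with injectivity of $\sigma_\xi(d)$ and surjectivity of $\sigma_\xi(\delta)$, this is the exactness of the symbol sequence, which is the orthogonal decomposition of 1-forms into the line spanned by $\xi$ and its complement $\xi^\perp$.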
For each vector bundle $V$ and connection $D$ we construct, in Sect. 3.1, a curvature adjusted twisting of this complex with the property that it is again a complex if and only if the connection $D$ is a (pure) Yang-Mills connection, see Theorem 3.2. In dimension 4 the resulting complexes are conformal. In Sect. 3.2 (in particular Theorem 3.4) we recover a class of these complexes by considering deformations of Yang-Mills connections. We show in Proposition 3.6 of Sect. 3.4 that, in dimension 4, the Yang-Mills detour complex generalises the composition of subcomplexes of Yang-Mills complexes arising from (anti-)self-dual connections. The next main item is a rather general construction, see diagram [D] in Sect. 4. This enables the Yang-Mills detour complex to be "translated" to yield new complexes. Broadly the motivational idea is this. If one has an overdetermined differential operator (of finite type) $B^0 \to B^1$ then one may sometimes obtain a corresponding invariant connection on a prolonged system [33]. If the latter satisfies the Yang-Mills equations and, say, preserves a metric on the prolonged system, then the Yang-Mills detour complex on the prolonged system descends and extends $B^0 \to B^1$ to a complex. In reality this is an over-simplification, but it contains the germ of the main idea.

2. Background: Conformal Geometry

Recall that a conformal structure of signature $(p, q)$ on $M$ is a smooth ray subbundle $\mathcal{Q} \subset S^2T^*M$ whose fibre over $x$ consists of conformally related signature-$(p, q)$ metrics at the point $x$ (and $S^2T^*M$ is the symmetric part of $\otimes^2 T^*M$). Sections of $\mathcal{Q}$ are metrics $g$ on $M$. So we may equivalently view the conformal structure as the equivalence class $[g]$ of these conformally related metrics. The principal bundle $\pi : \mathcal{Q} \to M$ has structure group $\mathbb{R}_+$, and so each representation $\mathbb{R}_+ \ni x \mapsto x^{-w/2} \in \mathrm{End}(\mathbb{R})$ induces a natural line bundle on $(M, [g])$ that we term the conformal density bundle $E[w]$.

We shall write $\mathcal{E}[w]$ for the space of sections of this bundle and $\boldsymbol{g}$ denotes the conformal metric, that is the tautological section of $S^2T^*M \otimes E[2]$ determined by the conformal structure. On conformal manifolds this will be used to identify $TM$ with $T^*M[2]$. Note $E[w]$ is trivialised by a choice of metric $g$ from the conformal class, and we write $\nabla$ for the connection corresponding to this trivialisation (and term this the Levi-Civita connection on $E[w]$). It follows that (the coupled) $\nabla_a$ preserves the conformal metric. In dimensions $n \geq 3$ the Riemannian curvature can be decomposed into the totally trace-free Weyl curvature $C_{abcd}$ and a remaining part described by the symmetric Schouten tensor $P_{ab}$, according to $R_{abcd} = C_{abcd} + 2\boldsymbol{g}_{c[a}P_{b]d} + 2\boldsymbol{g}_{d[b}P_{a]c}$, where $[\cdots]$ indicates antisymmetrisation over the enclosed indices. The Schouten tensor is a trace modification of the Ricci tensor $\mathrm{Ric}_{ab}$ and vice versa: $\mathrm{Ric}_{ab} = (n-2)P_{ab} + J\boldsymbol{g}_{ab}$, where we write $J$ for the trace $P_a{}^a$ of $P$. The Cotton tensor and Bach tensor are defined by, respectively,
$$A_{abc} := 2\nabla_{[b}P_{c]a} \quad\text{and}\quad B_{ab} := \nabla^c A_{acb} + P^{dc}C_{dacb}. \tag{2}$$
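The relation between the Ricci and Schouten tensors stated above can be inverted by taking a trace. The following short computation is a sketch using only that identity; the symbol $\mathrm{Sc}$ for the scalar curvature (the metric trace of $\mathrm{Ric}_{ab}$) is introduced here for illustration and is not notation from the text.

```latex
\mathrm{Sc} := \boldsymbol{g}^{ab}\,\mathrm{Ric}_{ab} = (n-2)J + nJ = 2(n-1)J
\quad\Longrightarrow\quad
J = \frac{\mathrm{Sc}}{2(n-1)},
\qquad
P_{ab} = \frac{1}{n-2}\Bigl(\mathrm{Ric}_{ab} - \frac{\mathrm{Sc}}{2(n-1)}\,\boldsymbol{g}_{ab}\Bigr).
```

For instance, if $\mathrm{Ric}_{ab} = \tfrac{1}{n}\mathrm{Sc}\,\boldsymbol{g}_{ab}$ (the Einstein condition), the formula gives $P_{ab} = \tfrac{\mathrm{Sc}}{2n(n-1)}\,\boldsymbol{g}_{ab}$, so the Schouten tensor is then pure trace.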
Under a conformal transformation we replace a choice of metric $g$ by the metric $\hat g = e^{2\omega} g$, where $\omega$ is a smooth function. Explicit formulae for the corresponding transformation of the Levi-Civita connection and its curvatures are given in e.g. [3,32]. We recall that, in particular, the Weyl curvature is conformally invariant, $\widehat{C}_{abcd} = C_{abcd}$. In dimension 4 $B_{ab}$ is conformally invariant. We will write $\mathcal{E}^k[w]$ for the sections of the tensor product $E^k[w] := \wedge^k T^*M \otimes E[w]$. On conformal manifolds we use the notation $\mathcal{E}_k$ to mean the space of sections of $E_k := \wedge^k T^*M \otimes E[2k - n]$. This notation (following [11]) is suggested by the duality between the section spaces $\mathcal{E}^k$ and $\mathcal{E}_k$; compactly supported sections pair globally by contraction and integration. For any vector bundle $V$, $\mathcal{E}^k(V)$ is the space of smooth sections of $E^k(V) := \wedge^k T^*M \otimes V$, while $\mathcal{E}_k(V)$ means the space of sections of $E_k(V) := \wedge^k T^*M \otimes E[2k - n] \otimes V$. When a metric from the conformal class is fixed, these spaces will be identified. In conformal geometry the de Rham complex is a prototype for a class of sequences of bundles and conformally invariant differential operators, each of the form
$$B^0 \to B^1 \to \cdots \to B^n,$$
where the vector bundles $B^i$ are irreducible tensor-spinor bundles. On the $n$-sphere there is one such complex for each irreducible module $\mathbb{V}$ for the group $G = SO(n+1, 1)$ of conformal motions, the space of solutions of the first (overdetermined) conformal operator $B^0 \to B^1$ is isomorphic to $\mathbb{V}$, and the sequence gives a resolution of this space viewed as a sheaf. These are the conformal cases of the (generalised) Bernstein-Gelfand-Gelfand (BGG) sequences, a class of sequences of differential operators that exist on any parabolic geometry [7,15]. As well as the operators $D_i : B^i \to B^{i+1}$ of the BGG sequence, in even dimensions there are conformally invariant "long operators" $L_k : B^k \to B^{n-k}$ for $k = 1, \ldots, n/2 - 1$ [7]. Thus there are sequences of the form
$$B^0 \xrightarrow{D_0} B^1 \xrightarrow{D_1} \cdots \xrightarrow{D_{k-1}} B^k \xrightarrow{L_k} B^{n-k} \xrightarrow{D_{n-k}} \cdots \xrightarrow{D_{n-1}} B^n,$$
and, following [10,11], we term these detour sequences since, in comparison to the BGG sequence, the long operator here bypasses the middle of the BGG sequence. Once again from the classification it follows that these detour sequences are in fact complexes in the case that the structure is conformally flat. The dimension 4 conformal complexes, constructed in Theorems 4.3 and 4.5 below, are detour sequences of this form with $k = 1$.

3. Yang-Mills Detour Complexes

3.1. The general construction. We work over a pseudo-Riemannian $n$-manifold $(M, g)$ of signature $(p, q)$ ($n \geq 2$). Let $V$ denote a vector bundle with a connection $D$. We denote by $F$ the curvature of $D$. We also write $D$ for the induced connection on the dual bundle $V^*$. We write $d^D$ for the connection-coupled exterior derivative operator $d^D : \mathcal{E}^k(V) \to \mathcal{E}^{k+1}(V)$. Of course we could equally consider $d^D : \mathcal{E}^k(V^*) \to \mathcal{E}^{k+1}(V^*)$, and for the formal adjoint of this we write $\delta^D : \mathcal{E}_{k+1}(V) \to \mathcal{E}_k(V)$.
Let us write $F\cdot$ for the action of the curvature on the twisted 1-forms, $F\cdot : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ given by
$$(F\cdot\varphi)_a := F_a{}^b \varphi_b,$$
where we have indicated the abstract form indices explicitly, whereas the standard $\mathrm{End}(V)$ action of the curvature on the $V$-valued 1-form is implicit. Using this we construct a differential operator $M^D : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ by
$$M^D \varphi = \delta^D d^D \varphi - F\cdot\varphi.$$
The operator $M^D$ has the property that its composition with $d^D$ is given simply by an algebraic action of the "Yang-Mills current" $\delta^D F$ on the bundle $V$, as follows.

Lemma 3.1. The composition of $M^D : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ with $d^D : \mathcal{E}^0(V) \to \mathcal{E}^1(V)$ is given by the exterior action of $\delta^D F$, as an $\mathrm{End}(V)$-valued 1-form:
$$M^D d^D = \varepsilon(\delta^D F).$$
The composition of $\delta^D : \mathcal{E}_1(V) \to \mathcal{E}_0(V)$ with $M^D : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ is given by the interior action of $-\delta^D F$, as an $E_1$-valued endomorphism of $E^1(V)$:
$$\delta^D M^D = -\iota(\delta^D F).$$
In these expressions the interior multiplication (indicated by $\iota(\cdot)$) and the exterior multiplication (indicated by $\varepsilon(\cdot)$) refer to the form index of $\delta^D F$.

Proof. For the connection $D$ coupled with the Levi-Civita connection $\nabla$, let us also write $D$. Then, again using the notation where we exhibit abstract tensor indices but suppress indices for the bundle $V$, a formula for $M^D$ on a twisted 1-form $\Phi_a$ is
$$(M^D \Phi)_b = -D^a D_a \Phi_b + D^a D_b \Phi_a - F_b{}^a \Phi_a,$$
since the Levi-Civita connection is torsion-free. On the other hand for $\Psi \in \mathcal{E}^0(V)$, $(d^D \Psi)_a = D_a \Psi$. Thus
$$(M^D d^D \Psi)_b = D^a (D_b D_a - D_a D_b)\Psi - F_b{}^a D_a \Psi = D^a F_{ba} \Psi - F_{ba} D^a \Psi = \bigl(\varepsilon(\delta^D F)\Psi\bigr)_b.$$
By a similar calculation (or using the above on $(V^*, D)$ and taking formal adjoints) we obtain $\delta^D M^D \Phi = -\iota(\delta^D F)\Phi$, for $\Phi \in \mathcal{E}^1(V)$. (Note that $\iota(\delta^D F)\Phi = -(D^b F_b{}^a)\Phi_a$.)
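The "similar calculation" for the second identity of Lemma 3.1 can be written out as follows. This is a sketch under stated assumptions: $\Phi_a$ denotes a section of $E^1(V)$ (a symbol chosen here for illustration), $\delta^D$ on twisted 1-forms is taken to be $-D^b$ contracted into the form index, consistent with the index formula for $M^D$ in the proof, and $G_{ab} := D_a \Phi_b - D_b \Phi_a$ is shorthand introduced here for $d^D \Phi$.

```latex
\begin{aligned}
\delta^D M^D \Phi
 &= -D^b (M^D\Phi)_b
  = D^b D^a G_{ab} + D^b\bigl(F_b{}^a \Phi_a\bigr), \\[2pt]
D^b D^a G_{ab}
 &= \tfrac12\,[D^b, D^a]\,G_{ab}
  = \tfrac12\,F^{ba} G_{ab}
  = F^{ba} D_a \Phi_b
  \quad\text{(the Riemannian curvature terms are Ricci contractions with the skew } G_{ab}\text{, hence vanish)}, \\[2pt]
D^b\bigl(F_b{}^a \Phi_a\bigr)
 &= (D^b F_b{}^a)\,\Phi_a + F^{ba} D_b \Phi_a, \\[2pt]
\delta^D M^D \Phi
 &= F^{ba}\bigl(D_a\Phi_b + D_b\Phi_a\bigr) + (D^b F_b{}^a)\,\Phi_a
  = (D^b F_b{}^a)\,\Phi_a
  = -\iota(\delta^D F)\,\Phi .
\end{aligned}
```

The term $F^{ba}(D_a\Phi_b + D_b\Phi_a)$ drops out because $F^{ba}$ is skew while the bracket is symmetric in $a$ and $b$; the final equality uses the expression for $\iota(\delta^D F)$ noted at the end of the proof.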
Remark. Note that to simplify the punctuation in calculations, we often view sections of vector bundles as order 0 operators. Thus for example $D^a F_{ba}$ has the same meaning as $D^a (F_{ba})$. If the connection $D$ is orthogonal or unitary for some inner product or Hermitian form on $V$ (then $V$ may be identified with $V^*$ and) the algebraic action $F\cdot : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ is easily verified to be formally self-adjoint and so, in this case, $M^D$ is formally self-adjoint. From these observations, and Lemma 3.1, we have the following.
Theorem 3.2. The sequence of operators
$$\mathcal{E}^0(V) \xrightarrow{\ d^D\ } \mathcal{E}^1(V) \xrightarrow{\ M^D\ } \mathcal{E}_1(V) \xrightarrow{\ \delta^D\ } \mathcal{E}_0(V) \tag{3}$$
is a complex if and only if the curvature $F$ of the connection $D$ satisfies the (pure) Yang-Mills equation $\delta^D F = 0$. In addition:
(i) If $D$ is an orthogonal or unitary connection then the sequence is formally self-adjoint.
(ii) In Riemannian signature the sequence is elliptic.
(iii) In dimension 4 the sequence (3) is conformally invariant.

Proof. It remains to show (ii) and (iii). For (ii) we need that the symbol sequence is exact. This sequence is simply a tensor product twisting by $V$ of the symbol sequence of the Maxwell detour complex (1) and so it is sufficient to check that case. But that case is an easy consequence of the algebraic Hodge decomposition on an inner product space. The conformally well-defined formal adjoint of the exterior derivative $d : \mathcal{E}^k \to \mathcal{E}^{k+1}$ acts $\delta : \mathcal{E}_{k+1} \to \mathcal{E}_k$ (cf. e.g. [11]). Note that in even dimensions on middle order forms we have $\mathcal{E}^{n/2} = \mathcal{E}_{n/2}$ and so $\delta : \mathcal{E}^{n/2} \to \mathcal{E}_{n/2-1}$ is conformally invariant. The invariance persists if we twist by a connection $D$, and so from the definition of $M^D$ we have the result.

For a given connection $D$ on a vector bundle $V$, such that $\delta^D F = 0$, we will term the complex (3) of Theorem 3.2 the (corresponding) Yang-Mills detour complex. If $D$ is a Yang-Mills connection on a vector bundle $V$, then the dual connection on $V^*$ and the tensor product connection on any tensor power of these are also Yang-Mills. One might alternatively work with principal connections. If $\omega$ is a Yang-Mills connection on a principal bundle $P$ with structure group $G$, then we obtain a complex (3) for every finite dimensional representation of $G$.

3.2. A variational construction of the deformation detour. Returning to the general situation that began Sect. 3.1, let $V$ denote a vector bundle with a connection $D$ and denote by $F$ the curvature of $D$. Consider now a smoothly parametrised family of connections $D^t$ (on $V$) given, on a section $v \in \mathcal{E}^0(V)$, by
$$D^t_a v = D_a v + A^t_a v, \tag{4}$$
where for each $t \in \mathbb{R}$, $A^t \in \mathcal{E}^1(\mathrm{End}\,V)$ and $A^0 = 0$. With $F^t$ denoting the curvature of $D^t$, we have
$$F^t_{ab} = F_{ab} + D_a A^t_b - D_b A^t_a + [A^t_a, A^t_b],$$
where, once again, we write $D$ also to mean the connection on $V$ coupled with the Levi-Civita connection. It follows that the derivative of $F^t$ at $D = D^0$ is
$$\dot F_{ab} = D_a \dot A_b - D_b \dot A_a, \qquad\text{where}\quad \dot A_a := \frac{d}{dt} A^t_a \Big|_{t=0},$$
that is $\dot F = d^D \dot A$. Now we calculate the derivative, at $D$, of $\delta^{D^t} F^t$. We have
$$\frac{d}{dt}\, g^{ab} D^t_a F^t_{bc} \Big|_{t=0} = D^b \dot F_{bc} + [\dot A^b, F_{bc}] = D^b (D_b \dot A_c - D_c \dot A_b) + [F_c{}^b, \dot A_b],$$
where $\dot A$ acts on $F$ and vice versa by the obvious composition of bundle endomorphisms. Note that, since the 1-form $\dot A$ has values in $\mathrm{End}\,V$, the last term here is $F\cdot\dot A$. Multiplying the display by $-1$ gives
$$\frac{d}{dt}\, \delta^{D^t} F^t \Big|_{t=0} = M^D \dot A. \tag{5}$$
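The linearisation of the curvature used in this computation can be made explicit by expanding in $t$. The following is a minimal sketch: it assumes only that $A^t$ depends smoothly on $t$ with $A^0 = 0$, as in (4), and writes $\dot A$ for the $t$-derivative of $A^t$ at $t = 0$.

```latex
A^t_a = t\,\dot A_a + O(t^2)
\quad\Longrightarrow\quad
[A^t_a, A^t_b] = O(t^2),
\qquad
F^t_{ab} = F_{ab} + t\,\bigl(D_a \dot A_b - D_b \dot A_a\bigr) + O(t^2).
```

So the commutator term in the curvature of $D^t$ does not contribute at first order, and the coefficient of $t$ is the twisted exterior derivative of $\dot A$.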
So we have, in particular, the following outcome.

Lemma 3.3. If $D$ is a Yang-Mills connection then the infinitesimal deformation $\dot A$ of $D$ is through Yang-Mills connections if and only if $M^D \dot A = 0$.

In the vector bundle picture, a so-called gauge transformation arises locally by acting on $V$ by a section $u$ of the fibre bundle $\mathrm{Aut}(V)$ of invertible elements in $\mathrm{End}(V)$. From the Leibniz rule for $D$ (viewed as a connection on the tensor powers of $V$ and $V^*$) it follows immediately that this pulls back to a transformation $D_a \mapsto D_a + u^{-1} D_a u$ of the connection, and whence
$$F_{ab} \mapsto u^{-1} F_{ab} u, \quad\text{and}\quad D^a F_{ab} \mapsto u^{-1} (D^a F_{ab}) u. \tag{6}$$
Thus if $u_s$ is a smoothly parametrised family of such transformations with $u_0 = \mathrm{id}_V$ and derivative
$$\frac{d}{ds} u_s \Big|_{s=0} = \dot u \in \mathcal{E}^0(\mathrm{End}(V)),$$
then we obtain that the infinitesimal variation of $D^s$ is exactly $d^D \dot u$:
$$\dot D_a = D_a \dot u. \tag{7}$$
So from this and (5) we have
$$\frac{d}{ds}\, \delta^{D^s} F^s \Big|_{s=0} = M^D d^D \dot u.$$
On the other hand from (6) and (7) we get
$$\frac{d}{ds}\, \delta^{D^s} F^s \Big|_{s=0} = (\delta^D F)\dot u - \dot u\, \delta^D F.$$
Putting the last two results together brings us to
$$M^D d^D \dot u = \varepsilon(\delta^D F^{\mathrm{End}(V)})\, \dot u,$$
where $F^{\mathrm{End}(V)}$ is the curvature of $D$ viewed as a connection on $\mathrm{End}(V)$ (so e.g. $F^{\mathrm{End}(V)} \dot u = [F, \dot u]$). This agrees precisely with the specialisation of Lemma 3.1 to
$\mathrm{End}(V)$ equipped with the connection induced from $D$ on $V$. In particular if $D$ is a Yang-Mills connection then so is the connection on $\mathrm{End}(V)$. Since $\mathrm{End}(V)$ carries the non-degenerate symmetric pairing $(U, W) = \mathrm{Tr}(UW)$ and this is preserved by $D$, it follows from (5) that $M^D$ is formally self-adjoint with respect to the global pairing obtained by integrating $(\cdot\,, \cdot)$. (The point is that the Yang-Mills equations are the Euler-Lagrange equations for the Lagrangian density $\mathrm{Tr}(F^{ab} F_{ab})$. So (5) shows that $M^D$ is the second variation of an action. By interchanging orders of variation one obtains the symmetry.) Thus from $M^D d^D = 0$ we also have $\delta^D M^D = 0$ and the following result.

Theorem 3.4. For a vector bundle $V$, with Yang-Mills connection $D$, the (formal) deformation detour complex
$$\mathcal{E}^0(\mathrm{End}(V)) \xrightarrow{\ d^D\ } \mathcal{E}^1(\mathrm{End}(V)) \xrightarrow{\ M^D\ } \mathcal{E}_1(\mathrm{End}(V)) \xrightarrow{\ \delta^D\ } \mathcal{E}_0(\mathrm{End}(V)) \tag{8}$$
is formally self-adjoint. Its first cohomology $H^1(\mathrm{End}(V), D)$ is the formal tangent space at $D$ to the moduli space of Yang-Mills connections on $V$. It follows from a general deformation theory that the complex (8) controls the full formal deformation theory of the Yang-Mills equations.

3.3. Examples: (pseudo-)Riemannian manifolds with harmonic curvature. On a pseudo-Riemannian (spin) manifold we write $\nabla$ for the Levi-Civita connection and $R$ for its curvature, the Riemannian curvature tensor. Riemannian structures satisfying $\delta^\nabla R = 0$ are said to have harmonic curvature. Einstein manifolds, for example, are harmonic in this sense. There is a rich theory of harmonic manifolds, see [6] and references therein. If $\delta^\nabla R = 0$ then, from Theorem 3.2, we get a detour complex (3) for $V$ any tensor (spin) bundle. For example if $TM$ is the tangent bundle then we have $M^\nabla : \mathcal{E}^1(TM) \to \mathcal{E}_1(TM)$ by
$$S_b{}^c \mapsto -2\nabla^a \nabla_{[a} S_{b]}{}^c - R_{ba}{}^c{}_d S^{ad}.$$
This annihilates the covariant derivative of any tangent vector field.

3.4. Half-flat connections. In the setting of conformal (or pseudo-Riemannian) 4-manifolds, we observe here that when a vector bundle connection $D$ is half-flat then there is a very simple interpretation of the Yang-Mills detour complex. First we review, in our current notation, some relevant (well-known) background. Recall that on a conformal 4-manifold $M$ of signature $(p, q)$ we have $\star\star = (-1)^{k(4-k)+q}$ on $k$-forms. In the case of Minkowskian signature let us write $\mathcal{E}^2_\pm$ for the $\pm i$-eigenspaces of $\star$. In the other signatures $\mathcal{E}^2_\pm$ means the $\pm 1$ eigenspaces of $\star$. In any case, since $\star$ is a symmetric endomorphism of $\mathcal{E}^2$, the decomposition of $\mathcal{E}^2$ into $\mathcal{E}^2_+ \oplus \mathcal{E}^2_-$ is orthogonal. Viewing the curvature $F$ (of $D$ on $V$) as a twisted 2-form, recall that the curvature, or the connection, is said to be self-dual (respectively anti-self-dual) if the component of $F$ in $\mathcal{E}^2_-(\mathrm{End}(V))$ (respectively in $\mathcal{E}^2_+(\mathrm{End}(V))$) is zero, $F_- = 0$ (respectively $F_+ = 0$).

So if a connection $D$ is half-flat, in this sense, then $\delta^D F$ is a multiple of $\star d^D F$. But this vanishes by the differential Bianchi identity for $F$. So $\delta^D F = 0$ for connections which are either self-dual or anti-self-dual and each case gives a special setting where the sequence (3) is a complex. Let us write $d^D_\pm$ for the compositions given by $d^D : \mathcal{E}^1(U) \to \mathcal{E}^2(U)$ followed by the projections $\mathcal{E}^2(U) \to \mathcal{E}^2_\pm(U)$, where $U$ means either the bundle $V$ or its dual $V^*$. Thus by construction the operators $d^D_\pm : \mathcal{E}^1(U) \to \mathcal{E}^2_\pm(U)$ are conformally invariant. We write $\delta^D_\pm : \mathcal{E}^2_\pm(V) \to \mathcal{E}_1(V)$ for the operators formally adjoint to $d^D_\pm : \mathcal{E}^1(V^*) \to \mathcal{E}^2_\pm(V^*)$. By construction these also are conformally invariant. Now on $\Psi \in \mathcal{E}^0(V)$ we have $d^D d^D \Psi = F\Psi$. The projection of this into $\mathcal{E}^2_\pm$ vanishes for all $\Psi$ if and only if $F_\pm = 0$. By a similar observation for the composition $d^D d^D$ on $\mathcal{E}^0(V^*)$, and then taking formal adjoints, we see that we have the situation in the following proposition. These results are well-known.
Proposition 3.5. The sequences
$$E^0(V) \xrightarrow{\;d^D\;} E^1(V) \xrightarrow{\;d^D_+\;} E^2_+(V) \quad\text{and}\quad E^2_+(V) \xrightarrow{\;\delta^D_+\;} E_1(V) \xrightarrow{\;\delta^D\;} E_0(V)$$
are complexes if and only if $F_+ = 0$. Similarly the sequences
$$E^0(V) \xrightarrow{\;d^D\;} E^1(V) \xrightarrow{\;d^D_-\;} E^2_-(V) \quad\text{and}\quad E^2_-(V) \xrightarrow{\;\delta^D_-\;} E_1(V) \xrightarrow{\;\delta^D\;} E_0(V)$$
are complexes if and only if $F_- = 0$. In Riemannian signature each of these is an elliptic complex.

Evidently then we obtain detour complexes by composing the twisted de Rham subcomplexes in the proposition. For example if the connection $D$ is anti-self-dual then there is a detour complex
$$E^0(V) \xrightarrow{\;d^D\;} E^1(V) \xrightarrow{\;2\delta^D_+ d^D_+\;} E_1(V) \xrightarrow{\;\delta^D\;} E_0(V). \qquad (9)$$
Similarly if $D$ is instead self-dual then there is a detour complex
$$E^0(V) \xrightarrow{\;d^D\;} E^1(V) \xrightarrow{\;2\delta^D_- d^D_-\;} E_1(V) \xrightarrow{\;\delta^D\;} E_0(V). \qquad (10)$$
The following result is a straightforward calculation.

Proposition 3.6. The complexes (9) and (10) are special cases of the twisted de Rham detour complex (3) of Theorem 3.2.

4. Translation via the Yang-Mills Detour Complex

We may use Theorem 3.2 to construct more exotic differential complexes. The ideas here are partly inspired by Eastwood's curved translation principle [21,20] which in turn is a geometric adaptation of the Jantzen-Zuckerman translation functor from representation theory. Consider the following general situation. Suppose that there are vector bundles (or rather section spaces thereof) $B^0$, $B^1$, $B_1$ and $B_0$ and differential operators $L_0$, $L_1$, $L^1$, $L^0$, $\mathcal{D}$ and $\overline{\mathcal{D}}$ which act as indicated in the following diagram:
$$\begin{array}{ccccccc}
E^0(V) & \xrightarrow{\;d^D\;} & E^1(V) & \xrightarrow{\;M^D\;} & E_1(V) & \xrightarrow{\;\delta^D\;} & E_0(V)\\
{\scriptstyle L_0}\big\uparrow & & {\scriptstyle L_1}\big\uparrow & & \big\downarrow{\scriptstyle L^1} & & \big\downarrow{\scriptstyle L^0}\\
B^0 & \xrightarrow{\;\mathcal{D}\;} & B^1 & \xrightarrow{\;M^B\;} & B_1 & \xrightarrow{\;\overline{\mathcal{D}}\;} & B_0
\end{array} \qquad [D]$$
The top sequence is (3) for a connection $D$ with curvature $F$ and the operator $M^B : B^1 \to B_1$ is defined to be the composition $L^1 M^D L_1$. Suppose that the squares at each end commute, in the sense that as operators $B^0 \to E^1(V)$ we have $d^D L_0 = L_1 \mathcal{D}$ and as operators $E_1(V) \to B_0$ we have $L^0 \delta^D = \overline{\mathcal{D}} L^1$. Then on $B^0$ we have
$$M^B \mathcal{D} = L^1 M^D L_1 \mathcal{D} = L^1 M^D d^D L_0 = L^1 \varepsilon(\delta^D F) L_0,$$
and similarly, on $B^1$, $\overline{\mathcal{D}} M^B = L^0 \delta^D M^D L_1 = -L^0 \iota(\delta^D F) L_1$. Thus if $D$ is Yang-Mills then the lower sequence, viz.
$$B^0 \xrightarrow{\;\mathcal{D}\;} B^1 \xrightarrow{\;M^B\;} B_1 \xrightarrow{\;\overline{\mathcal{D}}\;} B_0, \qquad (11)$$
is a complex.

Remarks. Note that if the connection $D$ preserves a Hermitian or metric structure on $V$ then we need only the single commuting square $d^D L_0 = L_1 \mathcal{D}$ on $B^0$ to obtain such a complex; by taking formal adjoints we obtain a second commuting square $(L^0 \delta^D = \overline{\mathcal{D}} L^1) : B_1 \to B_0$, where $B_0$ and $B_1$ are appropriate density twistings of the bundles dual to $B^0$ and $B^1$ respectively. Obviously for (11) to be a complex, it is sufficient (and necessary) for $L^1 M^D (d^D L_0 - L_1 \mathcal{D})$ to vanish on $B^0$ and for $(\overline{\mathcal{D}} L^1 - L^0 \delta^D) M^D L_1$ to vanish on $B^1$.

4.1. The complex for (almost) Einstein scales. We work in the setting of conformal $n$-manifolds, $n \geq 3$. We will construct here a diagram of the form [D] via the normal conformal tractor connection. The standard tractor bundle is a vector bundle with a conformally invariant connection that we may view as arising as an induced structure from the Cartan bundle and connection of [17]. In fact the Cartan connection is readily recovered from the tractor connection, see [14] where such connections and related calculus are described for the class of parabolic geometries (which also includes, for example, CR geometry, quaternionic structures and projective geometry). For our current construction it is not the normality of the tractor connection, in the sense of [14,17], that is important. Rather the key point is that it arises from a prolongation (as observed in [3]) of a certain (finite type) partial differential operator $P$ that we may take as the operator $\mathcal{D}$ for the diagram [D]. In terms of a metric $g$, this operator $P$ is given by
$$P\sigma = \mathrm{TF}(\nabla_a \nabla_b \sigma + P_{ab}\sigma), \qquad (12)$$
where $\sigma \in E[1]$. Modulo the trace part, this is the differential operator which controls the conformal transformation of the Schouten tensor. In particular a metric $\sigma^{-2}g$ is Einstein if and only if the scale $\sigma \in E[1]$ is non-vanishing and satisfies $P\sigma = 0$.

In order to be explicit we give a construction of the tractor connection here, as it is the key to obtaining the required commutative diagram. For further details see [13]. We write $J^k E[1]$ for the bundle of $k$-jets of germs of sections of $E[1]$. Considering, at each point of the manifold, sections which vanish to first order at the given point reveals a canonical sequence,
$$0 \to S^2T^*M \otimes E[1] \to J^2E[1] \to J^1E[1] \to 0.$$
This is the jet exact sequence at 2-jets. Via the conformal metric $g$, on a conformal manifold the bundle of symmetric covariant 2-tensors $S^2T^*M$ decomposes directly into the trace-free part, which we will denote $E^{1,1}$, and a pure trace part isomorphic to $E[-2]$, that is $S^2T^*M[1] = E^{1,1}[1] \oplus E[-1]$. The standard tractor bundle $T$ may be defined as the quotient of $J^2E[1]$ by the image of $E^{1,1}[1]$ in $J^2E[1]$. By construction this is invariant: it depends only on the conformal structure. Also by construction, it is an extension of the 1-jet bundle
$$0 \to E[-1] \to T \to J^1E[1] \to 0.$$
Note that there is a tautological operator $D : E[1] \to E^0(T)$ which is simply the composition of the universal 2-jet differential operator $j^2 : E[1] \to E^0(J^2E[1])$ followed by the canonical projection $E^0(J^2E[1]) \to E^0(T)$. By construction this is invariant. Via a choice of metric $g$, and the Levi-Civita connection it determines, we obtain a differential operator $E[1] \to E[1] \oplus E^1[1] \oplus E[-1]$ by $\sigma \mapsto (\sigma, \nabla_a\sigma, -\tfrac{1}{n}(\Delta + J)\sigma)$ and this obviously determines an isomorphism
$$E^0(T) \stackrel{g}{\cong} E[1] \oplus E^1[1] \oplus E[-1]. \qquad (13)$$
Changing to a conformally related metric $\hat{g} = e^{2\omega}g$ ($\omega$ a smooth function) gives a different isomorphism, which is related to the previous by the transformation formula
$$\widehat{(\sigma, \mu_b, \tau)} = (\sigma,\; \mu_b + \sigma\Upsilon_b,\; \tau - g^{bc}\Upsilon_b\mu_c - \tfrac{1}{2}\sigma g^{bc}\Upsilon_b\Upsilon_c), \qquad (14)$$
where $\Upsilon := d\omega$. Now we define a connection on $E[1] \oplus E^1[1] \oplus E[-1]$ by the formula
$$\nabla_a \begin{pmatrix} \sigma \\ \mu_b \\ \rho \end{pmatrix} := \begin{pmatrix} \nabla_a\sigma - \mu_a \\ \nabla_a\mu_b + g_{ab}\rho + P_{ab}\sigma \\ \nabla_a\rho - P_{ab}\mu^b \end{pmatrix} \qquad (15)$$
where, on the right-hand-side $\nabla$ is the Levi-Civita connection for $g$. Obviously this determines a connection on $T$ via the isomorphism (13). What is more surprising is that if we repeat this using the metric $\hat{g}$, conformally related to $g$, in (13) and (15) we obtain the same connection on $T$. This may easily be verified by, for example, directly calculating that under such a conformal change the right-hand side of (15) transforms in exactly the same way as a (1-form valued) invariant section of $T$. That is it transforms according to (14). The canonical connection on $T$, so constructed, depends only on the conformal structure and is known as the (normal standard) tractor connection. In what follows we will use (13) without further explicit comment.

There is also a conformally invariant tractor metric $h$ on $T$ given (as a quadratic form) by $(\sigma, \mu, \rho) \mapsto g^{-1}(\mu, \mu) + 2\sigma\rho$. This is preserved by the connection and has signature $(p+1, q+1)$ (corresponding to $g$ of signature $(p, q)$). Note that, given a metric $g$, through (13) the tautological invariant operator $D$ from above is given by the explicit formula
$$D : E^0[1] \to E^0(T), \qquad \sigma \mapsto \big(\sigma,\; \nabla_a\sigma,\; -\tfrac{1}{n}(\Delta\sigma + J\sigma)\big).$$
This is called a differential splitting operator since through the jet projections there is a conformally invariant surjection $X : E^0(T) \to E[1]$ which inverts $D$. There is also a differential splitting operator
$$E : E^{1,1}[1] \to E^1(T), \qquad \psi_{ab} \mapsto \big(0,\; \psi_{ab},\; -(n-1)^{-1}\nabla^b\psi_{ab}\big)$$
(cf. [20]). An easy calculation verifies that this also is conformally invariant. We have the following.
Proposition 4.1. With $\nabla$ denoting the tractor connection on $E^0(T)$ we have $\nabla D = E P$ as differential operators on $E[1]$. For $\sigma \in E[1]$, $D\sigma$ is parallel if and only if $P\sigma = 0$.

Proof. The second statement is immediate from the first. A straightforward calculation verifies that either composition applied to $\sigma \in E[1]$ yields
$$\begin{pmatrix} 0 \\ \mathrm{TF}(\nabla_a\nabla_b\sigma + P_{ab}\sigma) \\ -\tfrac{1}{n}\nabla_a(\Delta\sigma + J\sigma) - P_a{}^c\nabla_c\sigma \end{pmatrix}.$$

In fact if a section $I \in E^0(T)$ is parallel then $I = D\sigma$ for some $\sigma \in E[1]$ so a conformal manifold with a parallel tractor is almost Einstein in the sense that it has a section of $E[1]$ that gives an Einstein scale on an open dense subset (see [28] for further details).

Since the tractor connection is orthogonal (for the conformally invariant tractor metric $h$ given above) the formal adjoints of the operators above give another commutative square of operators. That is with
$$\begin{aligned}
P^* &: E_{1,1}[-1] \to E_0[-1], & \varphi_{ab} &\mapsto \nabla^a\nabla^b\varphi_{ab} + P^{ab}\varphi_{ab},\\
E^* &: E_1(T) \to E_{1,1}[-1], & (\alpha_a, \nu_{ab}, \tau_a) &\mapsto \nu_{(ab)_0} + \tfrac{1}{n-1}\nabla_{(a}\alpha_{b)_0},\\
D^* &: E_0(T) \to E_0[-1], & (\sigma, \mu_b, \rho) &\mapsto \rho - \nabla^a\mu_a - \tfrac{1}{n}(\Delta\sigma + J\sigma),\\
\delta^\nabla &: E_1(T) \to E_0(T), & \Phi_{aB} &\mapsto -\nabla^a\Phi_{aB},
\end{aligned}$$
where $E_{1,1}$ denotes the space of sections of $E^{1,1} \otimes E[4-n]$, we have $D^*\delta^\nabla = P^*E^*$ on $E_1(T)$.

Finally observe that the curvature of the tractor connection, as calculated directly from (15), is
$$\Omega_{ab}{}^C{}_D = \begin{pmatrix} 0 & 0 & 0 \\ A^c{}_{ab} & C_{ab}{}^c{}_d & 0 \\ 0 & -A_{dab} & 0 \end{pmatrix}$$
and hence (see e.g. [31] for further details),
$$\nabla^a\Omega_{ab}{}^C{}_D = \begin{pmatrix} 0 & 0 & 0 \\ B^c{}_b & (n-4)A_b{}^c{}_d & 0 \\ 0 & -B_{db} & 0 \end{pmatrix} \qquad (16)$$
where, on the left-hand side, $\nabla$ is the Levi-Civita connection coupled with the tractor connection on $\mathrm{End}(T)$ induced from (15). Let us say that a pseudo-Riemannian manifold is semi-harmonic if its tractor curvature is Yang-Mills, that is $\nabla^a\Omega_{ab}{}^C{}_D = 0$. Note that in dimensions $n \neq 4$ this is not a conformally invariant condition and a semi-harmonic space is a Cotton space that is also Bach-flat. From our observations above, the semi-harmonic condition is conformally invariant in dimension 4 and according to the last display we have the following result.
Lemma 4.2. In dimension 4 the tractor connection (15) is a Yang-Mills connection if and only if the structure is Bach-flat.

This result is not new and equivalent observations have been known in the literature for some time [4,37,39]. It brings us to the following. Let us write $M^T$ for the composition $E^* M^\nabla E$. On $h \in E^{1,1}[1]$ we have
$$(M^T h)_{ab} = -\mathrm{TFS}\Big(\nabla^c(\nabla_c h_{ab} - \nabla_a h_{cb}) - \tfrac{1}{n-1}\nabla_a\nabla^c h_{bc} + C_a{}^c{}_b{}^d h_{cd}\Big),$$
and the following results.

Theorem 4.3. The sequence
$$E^0[1] \xrightarrow{\;P\;} E^{1,1}[1] \xrightarrow{\;M^T\;} E_{1,1}[-1] \xrightarrow{\;P^*\;} E_0[-1] \qquad (17)$$
has the following properties:
(i) It is a formally self-adjoint sequence of differential operators and, for $\sigma \in E^0[1]$,
$$(M^T P\sigma)_{ab} = -\mathrm{TFS}\big(B_{ab}\sigma - (n-4)A_{abc}\nabla^c\sigma\big), \qquad (18)$$
where $\mathrm{TFS}(\cdots)$ indicates the trace-free symmetric part of the tensor concerned. In particular it is a complex on semi-harmonic manifolds.
(ii) In the case of Riemannian signature the complex is elliptic.
(iii) In dimension 4, (17) is a sequence of conformally invariant operators and it is a complex if and only if the conformal structure is Bach-flat.

Proof. Setting $\mathcal{D} = P$, $L_0 = D$, $L_1 = E$
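we have the situation of the translation diagram [D] above, with the right square given by formal adjoints of these operators, and the tractor bundle connection pair $(T, \nabla)$ used for $(V, D)$ in the top row. That is:
$$\begin{array}{ccccccc}
E^0(T) & \xrightarrow{\;d^\nabla\;} & E^1(T) & \xrightarrow{\;M^\nabla\;} & E_1(T) & \xrightarrow{\;\delta^\nabla\;} & E_0(T)\\
{\scriptstyle D}\big\uparrow & & {\scriptstyle E}\big\uparrow & & \big\downarrow{\scriptstyle E^*} & & \big\downarrow{\scriptstyle D^*}\\
E^0[1] & \xrightarrow{\;P\;} & E^{1,1}[1] & \xrightarrow{\;M^T\;} & E_{1,1}[-1] & \xrightarrow{\;P^*\;} & E_0[-1]
\end{array}$$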
By construction the lower sequence (17) is formally self-adjoint, and in dimension 4 conformally invariant. If the structure is semi-harmonic then the upper sequence is a complex and hence, from the commutativity of the diagram, (17) is a complex. In particular on Bach-flat 4-manifolds we obtain a complex. On the other hand, from (18) it follows that in dimension 4 we obtain a complex only if the structure is Bach-flat.
From (15) we calculate $d^\nabla$ on the range of $E$ to obtain
$$d^\nabla \begin{pmatrix} 0 \\ \nu \\ \tau \end{pmatrix} = \begin{pmatrix} 0 \\ Q\nu \\ * \end{pmatrix} \qquad (19)$$
where, for $\nu \in E^{1,1}[1]$, we have $\tau = -\tfrac{1}{n-1}\nabla^b\nu_{ab}$, $Q$ is given by
$$(Q\nu)_{abc} = 2\nabla_{[a}\nu_{b]c} + 2g_{c[a}\tau_{b]},$$
and we do not need the details of the term indicated by $*$. It follows immediately from (19), and the formulae for the tractor metric, that we have
$$M^T := E^* M^\nabla E = Q^*Q + \mathrm{LOT},$$
where $Q^*$ denotes the formal adjoint of the operator $Q$. In Riemannian signature the leading symbol of $Q^*Q$ has the same kernel as the leading symbol of $Q$, and it follows easily that the complex is elliptic. The "lower order terms" (indicated by LOT) in $M^T$ arise simply from the tractor curvature in the formula for $M^D$ and amount to an action by the Weyl curvature. Including this yields the explicit formula for $M^T$ given above the theorem. The expression (18) for the composition $M^T P$ follows from this by a short direct calculation. (The calculation is even simpler if the result of Lemma 3.1 is imported.)

Remark. Note that the formula for $M^T$ is closely related to, but not the same as, the operator which arises from deformations of Einstein structures (for the latter see e.g. [6] and references therein). It should be valuable to expose the geometric meaning of the first cohomology of the sequence (17).

Corollary 4.4. Einstein 4-manifolds are Bach-flat.

Proof. If a non-vanishing density $\sigma$ is an Einstein scale then, calculating in that scale, we have $M^T P\sigma = -B\sigma$, where $B$ is the Bach tensor. On the other hand if $\sigma$ is an Einstein scale then $P\sigma = 0$ (see (12)).

Remarks. In fact, more generally, almost Einstein manifolds are also necessarily Bach-flat. Since an almost Einstein manifold has an Einstein scale on an open dense subspace, this follows by continuity of the Bach tensor. (The higher dimensional extension of this result is that even dimensional almost Einstein manifolds have vanishing Fefferman-Graham obstruction tensor, see e.g. [28] and references therein.) The result that Einstein metrics are Bach-flat is well-known by other means (see e.g. [38,31]). Nevertheless we feel the detour complex gives an interesting route to this.
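The step from (18) to $M^T P\sigma = -B\sigma$ in the proof of Corollary 4.4 can be spelled out as follows. This is a sketch under two assumptions that the text uses implicitly: in the Einstein scale $g = \sigma^{-2}g$ the density $\sigma$ is parallel for the Levi-Civita connection, and the Bach tensor is trace-free and symmetric.

```latex
\begin{align*}
(M^{T}P\sigma)_{ab}
  &= -\mathrm{TFS}\big(B_{ab}\sigma-(n-4)A_{abc}\nabla^{c}\sigma\big)
     && \text{by (18)}\\
  &= -\mathrm{TFS}\big(B_{ab}\sigma\big)
     && \text{since } \nabla\sigma=0 \text{ in the scale } \sigma
        \text{ (and } n-4=0\text{)}\\
  &= -B_{ab}\,\sigma
     && \text{since } B_{ab} \text{ is trace-free and symmetric.}
\end{align*}
% Combining with P\sigma = 0:
\[
  0 \;=\; M^{T}(P\sigma) \;=\; -B_{ab}\,\sigma
  \quad\Longrightarrow\quad
  B_{ab}=0 \ \text{ wherever } \sigma\neq 0 .
\]
```

Since an Einstein scale is non-vanishing by definition, this gives $B_{ab} = 0$ on the whole manifold.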
In any dimension Einstein metrics satisfy $P_{ab} = \tfrac{1}{n}J g_{ab}$ with $J$ constant, so it follows from the definitions of the Cotton tensor and the Bach tensor (2) that Einstein metrics are semi-harmonic. Thus there are many examples of semi-harmonic manifolds.

4.2. The twistor spinor complex. We assume here that we have a conformal spin structure. This is no restriction locally. For the purpose of being self-contained and having the results in a uniform notation we derive the basic spinor identities we require. An alternative treatment for many of these may be found in, e.g. [26]. We will use the spin-tractor connection below. This is often termed the local twistor connection [5,41]. The notation we use (and the basic tractor tools) follows [9] which presents a spin-tractor calculus developed by the first author and Branson. Following that source we write $S$ for the basic spinor bundle and $\overline{S} = S[-n]$ (i.e. the bundle that pairs globally in an invariant way with $S$ on conformal $n$-manifolds). Evidently the weight conventions here give $S$ a "neutral weight". In terms of, for example, the Penrose weight conventions $S = E^\lambda[-\tfrac{1}{2}] = E_\lambda[\tfrac{1}{2}]$, where $E^\lambda$ denotes the basic contravariant spinor bundle in [41]. We write $Tw$ for the so-called twistor bundle, that is the subbundle of $T^*M \otimes S[1/2]$ consisting of form spinors $u_a$ such that $\gamma^a u_a = 0$, where $\gamma_a$ is the usual Clifford symbol. We use $S$ and $Tw$ also for the section spaces of these bundles. The twistor operator is the conformally invariant Stein-Weiss gradient $T : S[1/2] \to Tw$ given explicitly by
$$\psi \mapsto \nabla_a\psi + \tfrac{1}{n}\gamma_a\gamma^b\nabla_b\psi.$$
The main result of this section is that this completes to a differential complex as follows.

Theorem 4.5. On semi-harmonic pseudo-Riemannian $n$-manifolds, $n \geq 4$, we have a differential complex
$$S[1/2] \xrightarrow{\;T\;} Tw \xrightarrow{\;M^\Sigma\;} \overline{Tw} \xrightarrow{\;T^*\;} \overline{S}[-1/2], \qquad (20)$$
where $T$ is the usual twistor operator, $T^*$ its formal adjoint, and $M^\Sigma$ is a third order operator and given by the formula (26) below. The sequence is formally self-adjoint and in the case of Riemannian signature the complex is elliptic. In dimension 4 the sequence (20) is conformally invariant and it is a complex if and only if the conformal structure is Bach-flat.

Remarks. Of course on a fixed pseudo-Riemannian manifold we may ignore the conformal weights. Note also that, for example in dimension 4, under the chirality decomposition of this sequence, we get the two complexes
$$S_\pm[1/2] \xrightarrow{\;T\;} Tw_\pm \xrightarrow{\;M^\Sigma\;} \overline{Tw}_\mp \xrightarrow{\;T^*\;} \overline{S}_\mp[-1/2],$$
by the restriction of the operators $T$, $T^*$, and $M^\Sigma$. If we were to apply the construction below in dimension 3 then we would obtain a trivial operator $M^\Sigma$. In this dimension the BGG sequence (see Sect. 2) takes the form (20), where the middle operator is of second order.

In the calculations which follow it will often be convenient to use abstract indices for the form bundles while at the same time not using any indices for the spinor bundles. We have already done this implicitly above, for example in the formula for the twistor operator which, in this notation, acts $T_a : S[1/2] \to Tw_a$. From the usual gamma matrices $\gamma^a$ satisfying $\gamma^a\gamma^b + \gamma^b\gamma^a = -2g^{ab}\,\mathrm{Id}$ we switch to the symbols $\beta := \gamma/\sqrt{2}$, so that
$$\beta^a\beta^b + \beta^b\beta^a = -g^{ab}\,\mathrm{Id}; \qquad (21)$$
this simplifies certain formulae in the following discussion. We denote the corresponding Dirac operator by $D := \beta^a\nabla_a$. Given a metric $g$ from the conformal class the spin-tractor bundle $\Sigma$ is given by
$$\Sigma \stackrel{g}{\cong} S[1/2] \oplus S[-1/2].$$
In the conformally related metric $\hat{g} = e^{2\omega}g$ we have a similar isomorphism and
$$\widehat{\begin{pmatrix} \psi \\ \varphi \end{pmatrix}} = \begin{pmatrix} \psi \\ \varphi + \Upsilon_c\beta^c\psi \end{pmatrix}, \qquad (22)$$
where $\Upsilon = d\omega$. In terms of the $g$-splitting the normal conformal spin-tractor connection is given by
$$\nabla_a \begin{pmatrix} \psi \\ \varphi \end{pmatrix} = \begin{pmatrix} \nabla_a\psi + \beta_a\varphi \\ \nabla_a\varphi + P_{ab}\beta^b\psi \end{pmatrix}. \qquad (23)$$
On the right side $\nabla$ means the usual Levi-Civita (spin) connection, while on the left the same notation is used for the spin-tractor connection. It is an easy exercise to verify directly that this is a conformally invariant connection. The normality follows from the characterisation of normal tractor connections (for irreducible parabolic geometries) given in Theorem 1.3 of [13]. The invariant pseudo-Hermitian form on spin-tractors is given by
$$\langle\varphi, \psi'\rangle + \langle\psi, \varphi'\rangle$$
for a pair of spin-tractors $(\psi, \varphi)$, $(\psi', \varphi')$, and where $\langle\cdot,\cdot\rangle$ is the usual Hermitian form on spinors (which is compatible with Clifford multiplication and is preserved by the Levi-Civita spin connection). It is readily verified that this is invariant under the transformations (22) and that it is preserved by the spin-tractor connection (23). We subsequently calculate in a metric scale $g$ without further comment.

We construct two differential splitting operators:
$$L_0 : S[1/2] \to E^0(\Sigma) \text{ is given by } \psi \mapsto \begin{pmatrix} \psi \\ \tfrac{2}{n}D\psi \end{pmatrix}; \qquad (24)$$
$$L_1 : Tw \to E^1(\Sigma) \text{ is given by } \psi_a \mapsto \begin{pmatrix} \psi_a \\ \tfrac{2}{n-2}\big(D\psi_a - \tfrac{1}{n-1}\beta_a\nabla^b\psi_b\big) \end{pmatrix}. \qquad (25)$$
It is a straightforward exercise to verify that these transform according to (22) and so are expressions, in the metric scale $g$, for conformally invariant operators. An essential feature of these operators is the following commutativity result.
Proposition 4.6. With $\nabla$ denoting the spin-tractor connection on $E^0(\Sigma)$ we have
$$d^\nabla L_0 = L_1 T,$$
as differential operators on $S[1/2]$. For $\psi \in S[1/2]$, $L_0\psi$ is parallel if and only if $T\psi = 0$.

The last comment follows immediately from the commutativity of the square given that $L_1$ is a differential splitting operator (and so, in particular, $L_1\psi_a = 0 \Rightarrow \psi_a = 0$). A correspondence between parallel spin-tractors and twistor spinors dates back to [25]. The extra information in the proposition is that the operator $L_1$ is a conformally invariant tractor splitting operator. We shall postpone the proof of Proposition 4.6, as we prefer to first complete the proof of Theorem 4.5.

Suppose that for a metric $g$, from the conformal class, the tractor connection is semi-harmonic. Recall this is exactly the condition that the normal tractor connection is Yang-Mills. It follows immediately that the spin-tractor connection is also Yang-Mills, since this is induced from the same principal connection simply pulled back to the 2-1 covering $\mathrm{Spin}(p+1, q+1)$-principal bundle. (Equivalently they arise from the same Cartan connection; this is the usual picture [14] and sufficient to see this result. From the Cartan picture one may easily extend to a principal bundle and connection from which the tractors are induced, and this is simply an alternative framework.) As observed above, in dimension 4 the Yang-Mills condition is exactly the condition that the metric (or conformal structure) is Bach-flat. Thus, from the proposition, the first part of Theorem 4.5 follows immediately from the commutative diagram below where $M^\Sigma = L^1 M^\nabla L_1$,
$$\begin{array}{ccccccc}
E^0(\Sigma) & \xrightarrow{\;d^\nabla\;} & E^1(\Sigma) & \xrightarrow{\;M^\nabla\;} & E_1(\Sigma) & \xrightarrow{\;\delta^\nabla\;} & E_0(\Sigma)\\
{\scriptstyle L_0}\big\uparrow & & {\scriptstyle L_1}\big\uparrow & & \big\downarrow{\scriptstyle L^1} & & \big\downarrow{\scriptstyle L^0}\\
S[1/2] & \xrightarrow{\;T\;} & Tw & \xrightarrow{\;M^\Sigma\;} & \overline{Tw} & \xrightarrow{\;T^*\;} & \overline{S}[-1/2]
\end{array} \qquad (26)$$
Using that $\Sigma$ is a self-dual bundle, the operators in the square at the right end of the diagram are defined as the formal adjoints of the operators in the first square. So all squares commute and both horizontal sequences are formally self-adjoint.

To establish ellipticity we require the leading term of the operator $M^\Sigma$. Applying the spin-tractor twisted exterior derivative to $L_1 t$, for $t \in Tw$, we obtain a result of the form
$$\begin{pmatrix} Kt \\ \ell\, DKt + m\beta\delta^\nabla Kt + \text{curvature} \end{pmatrix} \qquad (27)$$
with $Kt$ given by
$$2\nabla_{a_1}t_{a_2} + \tfrac{4}{n-2}\beta_{a_1}\big(Dt_{a_2} - \tfrac{1}{n-1}\beta_{a_2}\nabla^c t_c\big),$$
where the indices $a_1 a_2$ are implicitly skewed over, $\ell$ and $m$ are constants, and $\delta^\nabla$ is the spin-Levi-Civita connection twisted interior derivative. By construction $K$ is an invariant operator $K : Tw \to Tw^2$, where $Tw^2$ is the subbundle of $E^2 \otimes S[1/2]$ consisting of spin-forms annihilated by interior multiplication by $\beta$. It is a straightforward exercise (or one may use the BGG machinery of [15]) to construct a differential splitting operator $L_2 : Tw^2 \to E^2 \otimes \Sigma$. This has the form $s \mapsto (s, \ell\, Ds + m\beta\delta^\nabla s)$ (cf. $L_1$) where $\ell \neq 0$. On the other hand in the conformally flat case it follows easily, from the uniqueness of conformal differential operators, that $L_2 K = d^\nabla L_1$. Thus we obtain the form of the bottom slot of (27).

Since the leading term of $M^\Sigma$ is obtained by composing $d^\nabla L_1$ with its formal adjoint, and $D$ is formally self-adjoint, it follows that the leading term of the operator $M^\Sigma$ is of the form $K^*\pi_2 DK$. Here $K^*$ denotes the formal adjoint of $K$ and the projection $\pi_2$ is the projection of spinor-valued two-forms to the space $Tw^2$. The operator $R_2 = \pi_2 D$ on $Tw^2$ is an elliptic and self-adjoint operator; it is a higher spin analogue of the Dirac operator. Similarly, if $\pi_1$ denotes the projection of spinor-valued one-forms to $Tw$, the operator $R_1 = \pi_1 D$ is an elliptic self-adjoint operator on $Tw$ (usually called the Rarita-Schwinger operator). Moreover, $R_1 K^*$ is a multiple of $K^* R_2$. Hence the leading term of the operator $M^\Sigma$ is a multiple of $R_1 K^* K$. Since elements of $Tw^2$ are annihilated by interior Clifford multiplication, it follows from the formula for $K$ that the symbol $\sigma_\xi(K^*)$ is simply interior multiplication by $\xi$, $\iota(\xi)$. Without loss of generality we may suppose that $\xi$ is a unit vector. It is well known that the Rarita-Schwinger operator (on the flat space) can be composed with a third order constant coefficient operator to give the square of the Laplace operator. Hence the symbol of $R_1$ can be multiplied on the left to get $|\xi|^4 = 1$. By this left multiplication, $\sigma_\xi(K^* R_2 K)t = \sigma_\xi(R_1 K^* K)t = 0$ implies $\iota(\xi)\sigma_\xi(K)t = 0$. Now the explicit formula for $\sigma_\xi(K)t$ is
$$\varepsilon(\xi)t + \tfrac{2}{n-2}\varepsilon(\beta)\big(\beta_\xi t - \tfrac{1}{n-1}\varepsilon(\beta)\iota(\xi)t\big),$$
where $\varepsilon(\cdot)$ indicates exterior multiplication. Contracting with $\xi$ and setting to zero we obtain ($(n-3)$ times)
$$(n-1)t = n\big(\varepsilon(\xi)\iota(\xi)t + \tfrac{2}{n}\varepsilon(\beta)\beta_\xi\iota(\xi)t\big).$$
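The contraction step can be checked in index notation. The following is a sketch under stated assumptions: $\xi$ is a unit covector, overall factors of $i$ in the symbol are suppressed, $\beta_\xi := \xi^c\beta_c$ and $\iota_\xi t := \xi^c t_c$ are shorthand introduced here, and square brackets denote skewing with weight $1/2$.

```latex
% Symbol of K, read off from the formula for Kt with \nabla replaced by \xi:
\[
(\sigma_\xi(K)t)_{ab}
  = 2\,\xi_{[a}t_{b]}
  + \frac{4}{n-2}\,\beta_{[a}\Big(\beta_\xi t_{b]}
  - \frac{1}{n-1}\,\beta_{b]}\,\iota_\xi t\Big).
\]
% Contract with \xi^a, using |\xi|^2 = 1 and the consequences of (21):
%   \beta_\xi\beta_\xi = -1/2,   \beta_\xi\beta_b = -\beta_b\beta_\xi - \xi_b.
\begin{align*}
2\,\xi^a\xi_{[a}t_{b]}
  &= t_b-\xi_b\,\iota_\xi t,\\
\frac{4}{n-2}\,\xi^a\beta_{[a}\beta_\xi t_{b]}
  &= \frac{2}{n-2}\Big(-\tfrac12\,t_b-\beta_b\beta_\xi\,\iota_\xi t\Big),\\
-\frac{4}{(n-2)(n-1)}\,\xi^a\beta_{[a}\beta_{b]}\,\iota_\xi t
  &= \frac{2}{(n-2)(n-1)}\big(2\beta_b\beta_\xi+\xi_b\big)\iota_\xi t.
\end{align*}
% Adding the three lines:
\[
\xi^a(\sigma_\xi(K)t)_{ab}
  = \frac{n-3}{(n-2)(n-1)}
    \Big((n-1)\,t_b-n\,\xi_b\,\iota_\xi t-2\,\beta_b\beta_\xi\,\iota_\xi t\Big).
\]
% Setting this to zero (n \neq 3) recovers the identity in the text:
\[
(n-1)\,t_b
  = n\Big(\xi_b\,\iota_\xi t+\frac{2}{n}\,\beta_b\beta_\xi\,\iota_\xi t\Big).
\]
```

In the notation of the text, $\xi_b\,\iota_\xi t = \varepsilon(\xi)\iota(\xi)t$ and $\beta_b\beta_\xi\,\iota_\xi t = \varepsilon(\beta)\beta_\xi\iota(\xi)t$, so the last line reads $(n-1)t = n\big(\varepsilon(\xi)\iota(\xi)t + \tfrac{2}{n}\varepsilon(\beta)\beta_\xi\iota(\xi)t\big)$; the overall factor $(n-3)$ is the one mentioned parenthetically in the text.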
The right hand side here is a multiple of $\sigma_\xi(T)\iota(\xi)t$. Thus $t$ is in the range of $\sigma_\xi(T)$, as required.

Completing the proof and remarks. From (16), (24), and Lemma 3.1 it follows easily that in dimension 4 the composition $M^\Sigma T$ on $\varphi$, a section of $S[1/2]$, is, up to a non-zero multiple, a Clifford multiplication of the Bach tensor $B_{ab}\beta^b\varphi$. Thus the formally self-adjoint sequence (20) is a complex if and only if the structure is Bach-flat, as claimed in the theorem.

If $\varphi$ is a twistor spinor (i.e. $T\varphi = 0$) then this Clifford action of the Bach tensor on $\varphi$ obviously vanishes. In Riemannian signature it is in fact straightforward to recover a parallel standard tractor from the parallel spin-tractor corresponding to a twistor spinor. Thus Riemannian manifolds admitting a twistor spinor are almost Einstein and so Bach-flat. In fact the last conclusion here is well-known [5].

Proof of Proposition 4.6. Let $\psi$ be a section of $S[1/2]$. Using the formula (24) for the splitting operator and the expression (23) for the spin-tractor connection we have
$$\nabla_a L_0\psi = \nabla_a \begin{pmatrix} \psi \\ \tfrac{2}{n}D\psi \end{pmatrix} = \begin{pmatrix} \nabla_a\psi + \tfrac{2}{n}\beta_a D\psi \\ \tfrac{2}{n}\nabla_a D\psi + P_{ab}\beta^b\psi \end{pmatrix}.$$
Recalling that $\gamma = \sqrt{2}\beta$ and that $D = \beta^a\nabla_a$, we thus have
$$\nabla_a L_0\psi = \begin{pmatrix} T_a\psi \\ \tfrac{2}{n}\nabla_a D\psi + P_{ab}\beta^b\psi \end{pmatrix}.$$
From (25) it is clear that it remains to show that
$$\tfrac{2}{n}\nabla_a D\psi + P_{ab}\beta^b\psi = \tfrac{2}{n-2}\big(DT_a\psi - \tfrac{1}{n-1}\beta_a\nabla^b T_b\psi\big). \qquad (28)$$
Let us note some simpler identities. First for the Levi-Civita spin-connection the curvature on a spinor $\varphi$ is given by
$$[\nabla_a, \nabla_b]\varphi = -\tfrac{1}{2}R_{abcd}\beta^c\beta^d\varphi,$$
where $R$ is the usual Riemannian curvature. Then from the Bianchi identities and the Clifford relation (21) we get
$$R_{abcd}\beta^b\beta^c\beta^d = \mathrm{Ric}_{ab}\beta^b, \qquad R_{abcd}\beta^a\beta^b\beta^c\beta^d = -\mathrm{Sc}/2. \qquad (29)$$
Next an elementary calculation shows that $T^*$, the formal adjoint of $T$, is given on $Tw$ by $\varphi_a \mapsto -\nabla^a\varphi_a$. Thus, for $\psi$ in $S[1/2]$,
$$-T^*T\psi = \Delta\psi + \tfrac{2}{n}D^2\psi,$$
(where $\Delta := \nabla^a\nabla_a$) since the spin-connection preserves the Clifford symbols. On the other hand since $D^2 = \beta^a\nabla_a\beta^b\nabla_b = \beta^a\beta^b\nabla_a\nabla_b$ and the $\beta$s anti-commute up to a trace, as in (21), while the $\nabla$s commute up to curvature we obtain
$$D^2\psi = -\tfrac{1}{2}\Delta\psi - \tfrac{1}{4}R_{abcd}\beta^a\beta^b\beta^c\beta^d\psi,$$
and so using (29) we come to
$$\Delta\psi = -2D^2\psi + \tfrac{1}{4}\mathrm{Sc}\cdot\psi.$$
This with the expression above for $T^*T$ gives
$$-T^*T\psi = 2\,\tfrac{1-n}{n}D^2\psi + \tfrac{1}{4}\mathrm{Sc}\cdot\psi. \qquad (30)$$
We are now ready to calculate the left-hand side of (28). Applying $\beta^b\nabla_b$ to the defining identity $\nabla_a\psi = -\tfrac{2}{n}\beta_a D\psi + T_a\psi$ we get
$$\beta^b\nabla_b\nabla_a\psi = -\tfrac{2}{n}\beta^b\beta_a\nabla_b D\psi + \beta^b\nabla_b T_a\psi.$$
Commuting the derivatives on the left and writing $D$ as a shorthand for $\beta^b\nabla_b$ (applied to e.g. $T\psi$), we obtain
$$\nabla_a D\psi + \tfrac{1}{2}R_{abcd}\beta^b\beta^c\beta^d\psi = \tfrac{2}{n}\nabla_a D\psi + \tfrac{2}{n}\beta_a D^2\psi + DT_a\psi.$$
Next rearranging the terms and using (29) gives
$$\tfrac{n-2}{n}\nabla_a D\psi = \tfrac{2}{n}\beta_a D^2\psi - \tfrac{1}{2}\mathrm{Ric}_{ab}\beta^b\psi + DT_a\psi.$$
Using now the identity (30) from above to substitute for $D^2\psi$ yields
$$\tfrac{n-2}{n}\nabla_a D\psi = -\tfrac{1}{2}\Big(\mathrm{Ric}_{ab} - \tfrac{1}{2(n-1)}\mathrm{Sc}\,g_{ab}\Big)\beta^b\psi + DT_a\psi + \tfrac{1}{n-1}\beta_a T^*T\psi.$$
But multiplying this through with $2/(n-2)$ and using that the Schouten tensor $P_{ab} = \tfrac{1}{n-2}\big(\mathrm{Ric}_{ab} - \tfrac{1}{2(n-1)}\mathrm{Sc}\,g_{ab}\big)$, and once again that $T^*\varphi = -\nabla^a\varphi_a$, this gives exactly the expression (28), which is thus seen to be an identity.

Acknowledgements. ARG would like to thank the Royal Society of New Zealand for support via Marsden Grants no. 02-UOA-108 and 06-UOA-029. Part of the work was prepared during a visit of ARG supported by the E. Čech Center. PS and VS acknowledge the support of the grant GA ČR 201/05/2117 and the grant MSM 021620839. The first author would like to thank Helga Baum, Kengo Hirachi, Paul-Andi Nagy and Andrew Waldron for illuminating discussions.
References
1. Atiyah, M.F., Hitchin, N., Singer, I.M.: Self-duality in four-dimensional Riemannian geometry. Proc. Roy. Soc. London Ser. A 362, 425–461 (1978)
2. Bach, R.: Zur Weylschen Relativitätstheorie und der Weylschen Erweiterung des Krümmungstensorbegriffs. Math. Z. 9, 110–135 (1921)
3. Bailey, T.N., Eastwood, M.G., Gover, A.R.: Thomas's structure bundle for conformal, projective and related structures. Rocky Mountain J. Math. 24, 1191–1217 (1994)
4. Baston, R.J., Mason, L.J.: The conformal Einstein equations. In: Further advances in twistor theory: Volume II: Integrable systems, conformal geometry and gravitation, edited by L.J. Mason, L.P. Hughston, P.Z. Kobak, Essex: Longman, 1995
5. Baum, H., Friedrich, T., Grunewald, R., Kath, I.: Twistors and Killing spinors on Riemannian manifolds. Teubner-Texte zur Mathematik, 124. Stuttgart: B.G. Teubner Verlagsgesellschaft mbH, 1991
6. Besse, A.L.: Einstein manifolds. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), 10. Berlin: Springer-Verlag, 1987
7. Boe, B.D., Collingwood, D.H.: A comparison theory for the structure of induced representations, II. Math. Z. 190, 1–11 (1985)
8. Branson, T.: Sharp inequalities, the functional determinant, and the complementary series. Trans. Amer. Math. Soc. 347, 3671–3742 (1995)
9. Branson, T.: Conformal Structure and spin geometry. In: Dirac operators: yesterday and today. Proceedings of the Summer School and Workshop held in Beirut, August 27–September 7, 2001, edited by J.-P. Bourguignon, T. Branson, A. Chamseddine, O. Hijazi, R.J. Stanton. Somerville, MA: International Press, 2005
10. Branson, T.: Q-curvature and spectral invariants. Rend. Circ. Mat. Palermo (2) Suppl. No. 75, 11–55 (2005)
11. Branson, T., Gover, A.R.: Conformally invariant operators, differential forms, cohomology and a generalisation of Q-curvature. Comm. Part. Differ. Equs. 30, 1611–1669 (2005)
12. Branson, T., Gover, A.R.: The conformal deformation detour complex for the obstruction tensor. Proc. Amer. Math. Soc. 135, 2961–2965 (2007)
13. Čap, A., Gover, A.R.: Tractor bundles for irreducible parabolic geometries. In: Global analysis and harmonic analysis (Marseille-Luminy, 1999), Sémin. Congr. 4, Paris: Soc. Math. France, 2000, pp. 129–154
14. Čap, A., Gover, A.R.: Tractor calculi for parabolic geometries. Trans. Amer. Math. Soc. 354, 1511–1548 (2002)
15. Čap, A., Slovák, J., Souček, V.: Bernstein-Gelfand-Gelfand sequences. Ann. Math. 154, 97–113 (2001)
16. Čap, A., Souček, V.: Subcomplexes in Curved BGG-Sequences. http://arxiv.org/list/math.DG/0508534, 2005
17. Cartan, E.: Les espaces à connexion conforme. Ann. Soc. Pol. Math. 2, 171–202 (1923)
18. Chang, S.-Y.A.: Non-linear elliptic equations in conformal geometry. Zurich Lectures in Advanced Mathematics, Zürich: European Mathematical Society, 2004
19. Donaldson, S.K.: Floer homology groups in Yang-Mills theory. Cambridge Tracts in Mathematics 147. Cambridge: Cambridge University Press, 2002
20. Eastwood, M.: Notes on conformal differential geometry. Rend. Circ. Mat. Palermo (2) Suppl. No. 43, 57–76 (1996)
21. Eastwood, M.G., Rice, J.W.: Conformally invariant differential operators on Minkowski space and their curved analogues. Commun. Math. Phys. 109, 207–228 (1987); Erratum: Commun. Math. Phys. 144, 213 (1992)
22. Eastwood, M.G., Slovák, J.: Semiholonomic Verma modules. J. Alg. 197, 424–448 (1997)
23. Fefferman, C., Graham, C.R.: Conformal invariants. In: Elie Cartan et les mathématiques d'aujourd'hui, Astérisque 95–116, hors série (Paris: SMF, 1985)
24. Fefferman, C., Graham, C.R.: Q-curvature and Poincaré metrics. Math. Res. Lett. 9, 139–151 (2002)
25. Friedrich, T.: On the conformal relation between twistors and Killing spinors. Rend. Circ. Mat. Palermo (2) Suppl. No. 2, 59–75 (1989)
26. Friedrich, T.: Dirac-Operatoren in der Riemannschen Geometrie, Mit einem Ausblick auf die Seiberg-Witten-Theorie. Advanced Lectures in Mathematics. Braunschweig: Friedr. Vieweg & Sohn, 1997
27. Gover, A.R.: Aspects of parabolic invariant theory. Rend. Circ. Mat. Palermo (2) Suppl. No. 59, 25–47 (1999)
28. Gover, A.R.: Almost conformally Einstein manifolds and obstructions. In: Differential geometry and its applications, Prague: Matfyzpress, 2005, pp. 247–260
29. Gover, A.R.: Laplacian operators and Q-curvature on conformally Einstein manifolds. Math. Ann. 336, 311–334 (2006)
30. Gover, A.R., Leitner, F.: A sub-product construction of Poincare-Einstein metrics. http://arxiv.org/list/math.DG/0608044, 2006
31. Gover, A.R., Nurowski, P.: Obstructions to conformally Einstein metrics in n dimensions. J. Geom. Phys. 56, 450–484 (2006)
32. Gover, A.R., Peterson, L.J.: Conformally invariant powers of the Laplacian, Q-curvature, and tractor calculus. Commun. Math. Phys. 235, 339–378 (2003)
33. Gover, A.R., Šilhan, J.: The conformal Killing equation on forms – prolongations and applications. Diff. Geom. Applic., to appear. http://arxiv.org/list/math.DG/0601751, 2006
34. Graham, C.R., Hirachi, K.: The ambient obstruction tensor and Q-curvature. In: AdS/CFT correspondence: Einstein metrics and their conformal boundaries, IRMA Lect. Math. Theor. Phys. 8, Zürich: Eur. Math. Soc., 2005, pp. 59–71
35. Graham, C.R., Jenne, R., Mason, L., Sparling, G.: Conformally invariant powers of the Laplacian, I: existence. J. London Math. Soc. 46, 557–565 (1992)
36. Graham, C.R., Zworski, M.: Scattering matrix in conformal geometry. Invent. Math. 152, 89–118 (2003)
37. Korzynski, M., Lewandowski, J.: The normal Cartan connection and the Bach tensor. Class. Quant. Grav. 20, 3745–3764 (2003)
38. Kozameh, C., Newman, E.T., Tod, K.P.: Conformal Einstein Spaces. GRG 17, 343–352 (1985)
39. Merkulov, S.: A conformally invariant theory of gravitation and electromagnetism. Class. Quant. Grav. 1, 349–354 (1984)
40. Lee, J.M., Parker, T.H.: The Yamabe problem. Bull. Amer. Math. Soc. 17, 37–91 (1987)
41. Penrose, R., Rindler, W.: Spinors and space-time. Vol. 1 and Vol. 2. Spinor and twistor methods in space-time geometry, Cambridge Monographs on Mathematical Physics, Cambridge: Cambridge University Press, 1987, 1988

Communicated by A. Connes
Commun. Math. Phys. 278, 329–384 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0406-0
Communications in
Mathematical Physics
Wavepacket Preservation Under Nonlinear Evolution
A. Babin, A. Figotin
Department of Mathematics, University of California at Irvine, Irvine, CA 92697, USA. E-mail: [email protected]; [email protected]
Received: 27 July 2006 / Accepted: 24 August 2007
Published online: 8 January 2008 – © Springer-Verlag 2007
Abstract: We study nonlinear systems of hyperbolic PDEs in $\mathbb{R}^d$, where hyperbolicity is understood in a wider sense: multiple roots of the characteristic equation are allowed and dispersive equations are permitted. Such systems describe wave propagation in dispersive nonlinear media, for example electromagnetic waves in nonlinear photonic crystals. The initial data is assumed to be a finite sum of wavepackets, referred to as a multi-wavepacket. The wavepackets and the medium nonlinearity are characterized by two principal small parameters $\beta$ and $\varrho$, where: (i) $1/\beta$ is a factor describing the spatial extension of the involved wavepackets; (ii) $1/\varrho$ is a factor describing the relative magnitude of the linear part of the evolution equation compared to its nonlinearity. A key element in our approach is a proper definition of a wavepacket. Remarkably, the introduced definition is flexible enough for a wavepacket to preserve its defining properties under a general nonlinear evolution for long times. In particular, the corresponding wave vectors and the band numbers of the involved wavepackets are “conserved quantities”. We also prove that the evolution of a multi-wavepacket is described with high accuracy by a properly constructed system of envelope equations with a universal nonlinearity. The universal nonlinearity is obtained by time averaging applied to the original nonlinearity; in simpler cases the averaged system turns into a system of Nonlinear Schrödinger equations.
1. Introduction
The underlying physical subject of this work is the propagation of a multi-wavepacket (a finite system of wavepackets) in a spatially dispersive and nonlinear medium. We are particularly interested in electromagnetic wave propagation in nonlinear photonic crystals, see [4–7,55,56,58] and references therein, with the nonlinear optics constitutive relations, [12,15, Sects. 1,2, 42,48]. The mathematical subject of interest is the following general nonlinear evolutionary system
$$\partial_\tau U = -\frac{i}{\varrho} L(-i\nabla) U + F(U), \qquad U(r,\tau)|_{\tau=0} = h(r), \qquad r \in \mathbb{R}^d, \tag{1}$$
where (i) $U = U(r,\tau)$, $r \in \mathbb{R}^d$, $U \in \mathbb{C}^{2J}$ is a $2J$-dimensional vector; (ii) $L(-i\nabla)$ is a linear self-adjoint differential (pseudodifferential) operator with constant coefficients and symbol $L(k)$, which is a Hermitian $2J \times 2J$ matrix; (iii) $F$ is a polynomial nonlinearity such that $F(0) = 0$, $F'(0) = 0$ and $F(U)$ is translation-invariant, i.e. if $T_a U(r) = U(r+a)$ for $a \in \mathbb{R}^d$ then $F(T_a U) = T_a F(U)$; (iv) $h = h(r)$ is assumed to be the sum of a finite number of wavepackets $h_l$, $l = 1, \ldots, N$; (v) $\varrho > 0$ is a small parameter.
In the case of nonlinear photonic crystals the components of the vector field $U(r)$ are the modal amplitudes of the electromagnetic field and the nonlinearity $F(U)$ is constructed from the nonlinear medium polarization in the adiabatic approximation, [15, Sect. 2.4.2]. Systems of the form (1) also describe, as particular cases, well-known equations, namely: the complexification of the Nonlinear Schrödinger equation; coupled envelope equations which arise in nonlinear birefringent optical media, [41, Sect. 2i]; nonlinear Klein-Gordon and Sine-Gordon equations [61, Sect. 14.1], [43, Sect. 5.8.3], [44, Sect. 9.6]. Such equations appear in a number of physical problems: elementary particles, dislocations in crystals, propagation of Bloch’s domain walls in the theory of ferromagnetism, self-induced transparency in nonlinear optics, and the propagation of magnetic flux quanta in long Josephson transmission lines. The significance of wavepacket solutions from both physical and mathematical points of view is discussed in [4–7], [41, Sect. 2], [55,58]. There are numerous problems involving small parameters only in the initial data which can be reduced to the form (1), for instance, problems with high frequency initial data or small initial data with consequent evolution on long time intervals (see Sect. 3 for details). We study the nonlinear evolution equation (1) on a finite time interval
$$0 \le \tau \le \tau_*, \quad \text{where } \tau_* > 0 \text{ is a fixed number.} \tag{2}$$
The time $\tau_*$ may depend on the $L^\infty$ norm of the initial data $h$ but, importantly, $\tau_*$ does not depend on $\varrho$. We consider classes of initial data such that wave evolution governed by (1) is significantly nonlinear on the time interval $[0, \tau_*]$ and the effect of the nonlinearity $F(U)$ does not vanish as $\varrho \to 0$. Since both the linear operator $L(-i\nabla)$ and the nonlinearity $F(U)$ are translation invariant, it is natural and convenient to recast the evolution equation (1) by applying to it the Fourier transform with respect to the space variables $r$, namely
$$\partial_\tau \hat U(k) = -\frac{i}{\varrho} L(k) \hat U(k) + \hat F(\hat U)(k), \qquad \hat U(k)|_{\tau=0} = \hat h(k), \tag{3}$$
where $\hat U(k)$ is the Fourier transform of $U(r)$, i.e.
$$\hat U(k) = \int_{\mathbb{R}^d} U(r)\, e^{-i r \cdot k}\, dr, \qquad U(r) = (2\pi)^{-d} \int_{\mathbb{R}^d} \hat U(k)\, e^{i r \cdot k}\, dk, \qquad r, k \in \mathbb{R}^d, \tag{4}$$
and $\hat F$ is the Fourier form of the nonlinear operator $F(U)$ involving convolutions. The nonlinear evolution equations (1), (3) are commonly interpreted as describing wave propagation in a nonlinear medium. We assume that the linear part $L(k)$ is a $2J \times 2J$ Hermitian matrix with eigenvalues $\omega_{n,\zeta}(k)$ and eigenvectors $g_{n,\zeta}(k)$ satisfying
$$L(k)\, g_{n,\zeta}(k) = \omega_{n,\zeta}(k)\, g_{n,\zeta}(k), \quad \zeta = \pm, \quad \omega_{n,+}(k) \ge 0, \quad \omega_{n,-}(k) \le 0, \quad n = 1, \ldots, J, \tag{5}$$
where $\omega_{n,\zeta}(k)$ are real-valued functions, continuous for all non-singular $k$, and the vectors $g_{n,\zeta}(k) \in \mathbb{C}^{2J}$ have unit length in the standard Euclidean norm. The functions $\omega_{n,\zeta}(k)$, $n = 1, \ldots, J$, are called dispersion relations between the frequency $\omega$ and the wavevector $k$, with $n$ being the band number. We assume that the eigenvalues are naturally ordered by
$$\omega_{J,+}(k) \ge \ldots \ge \omega_{1,+}(k) \ge 0 \ge \omega_{1,-}(k) \ge \ldots \ge \omega_{J,-}(k), \tag{6}$$
and for almost every $k$ (with respect to the standard Lebesgue measure) the eigenvalues are distinct and, consequently, the above inequalities become strict. Importantly, we also assume the following diagonal symmetry condition
$$\omega_{n,-\zeta}(-k) = -\omega_{n,\zeta}(k), \quad \zeta = \pm, \quad n = 1, \ldots, J, \tag{7}$$
which is naturally present in many physical problems (see also Remark 14 below), and is a fundamental condition imposed on the matrix $L(k)$. In addition, in many examples we also have
$$g_{n,\zeta}^*(k) = g_{n,-\zeta}(-k), \quad \text{where } z^* \text{ is complex conjugate to } z. \tag{8}$$
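As a quick sanity check on the symmetry condition (7), consider a hypothetical single-band model with $J = 1$ and a Klein-Gordon-type dispersion relation. This model is an illustrative assumption made here, not one of the systems treated in the paper. The worked equation below verifies (7) directly:

```latex
% Illustrative assumption: J = 1 and \omega_{1,\pm}(k) = \pm\sqrt{1+|k|^2}.
\begin{aligned}
\omega_{1,\zeta}(k) &= \zeta\sqrt{1+|k|^{2}}, \qquad \zeta=\pm,\\
\omega_{1,-\zeta}(-k) &= -\zeta\sqrt{1+|-k|^{2}}
                       = -\zeta\sqrt{1+|k|^{2}}
                       = -\omega_{1,\zeta}(k).
\end{aligned}
```

In this toy case the ordering (6) also holds trivially, since $\omega_{1,+}(k) \ge 0 \ge \omega_{1,-}(k)$ for every $k$.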
Very often we will use the following abbreviation:
$$\omega_{n,+}(k) = \omega_n(k). \tag{9}$$
From (7) we obtain
$$\omega_{n,-}(k) = -\omega_n(-k), \qquad \omega_{n,\zeta}(k) = \zeta\, \omega_n(\zeta k), \qquad \zeta = \pm. \tag{10}$$
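The step from (7) and (9) to (10) is a one-line substitution. Written out as a worked equation, with no assumptions beyond (7) and (9), it reads:

```latex
\begin{aligned}
&\text{(7) with } \zeta=+ :\quad
  \omega_{n,-}(-k) = -\omega_{n,+}(k) = -\omega_{n}(k) \quad\text{by (9)};\\
&\text{replacing } k \text{ by } -k :\quad
  \omega_{n,-}(k) = -\omega_{n}(-k);\\
&\zeta=+ :\ \ \omega_{n,+}(k)=\omega_{n}(k)=\zeta\,\omega_{n}(\zeta k),
 \qquad
 \zeta=- :\ \ \omega_{n,-}(k)=-\omega_{n}(-k)=\zeta\,\omega_{n}(\zeta k).
\end{aligned}
```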
We will also often use the orthogonal projection $\Pi_{n,\zeta}(k)$ in $\mathbb{C}^{2J}$ onto the complex line defined by the eigenvector $g_{n,\zeta}(k)$, namely
$$\Pi_{n,\zeta}(k)\, \hat u(k) = \tilde u_{n,\zeta}(k)\, g_{n,\zeta}(k) = \hat u_{n,\zeta}(k), \quad n = 1, \ldots, J, \quad \zeta = \pm. \tag{11}$$
As indicated by the title of this paper, we study the nonlinear problem (1) for initial data $\hat h$ in the form of a properly defined wavepacket or, more generally, a sum of wavepackets which we refer to as a multi-wavepacket. The simplest example of a wavepacket $w$ is provided by the following formula:
$$w(r, \beta) = \Phi_+(\beta r)\, e^{i k_* \cdot r}\, g_{n,+}(k_*), \quad r \in \mathbb{R}^d, \tag{12}$$
where $k_* \in \mathbb{R}^d$ is a wavepacket wave vector, $n$ is the band number, and $\beta > 0$ is a small parameter. We refer to the pair $(n, k_*)$ in (12) as the wavepacket nk-pair. Observe that the space extension of the wavepacket $w(r, \beta)$ is proportional to $\beta^{-1}$ and it is large for small $\beta$. Notice also that if $\beta \to 0$ the wavepacket $w(r, \beta)$ as in (12) tends, up to a constant factor, to the elementary eigenmode $e^{i k_* \cdot r} g_{n,\zeta}(k_*)$ of the operator $L(-i\nabla)$ with the corresponding eigenvalue $\omega_{n,\zeta}(k_*)$. We refer to wavepackets of the simple form (12) as simple wavepackets to underline the very special way the parameter $\beta$ enters their representation. The function $\Phi_\zeta(r)$, which we call the wavepacket envelope, describes its shape and it can be any sufficiently regular scalar complex-valued function, for example a function from Schwartz space. Importantly, as $\beta \to 0$ the $L^\infty$ norm of a wavepacket (12) remains constant, and, hence, nonlinear effects in (1) remain strong. The evolution of wavepackets in problems which can be reduced to the form (1) was studied for a variety of equations in numerous physical and mathematical papers, mostly
by asymptotic expansions with respect to a single small parameter similar to $\beta$, see [10,13,18,20,23,29,30,38,47,50,51] and references therein. We are interested in general properties of evolutionary systems of the form (1) with wavepacket initial data which hold for a wide class of nonlinearities and all values of the space dimension $d$ and of the number $2J$ of the system components. Our approach is not based on asymptotic expansions but involves the two small parameters $\beta$ and $\varrho$ with mild constraints on their relative smallness. The constraints can be expressed either in the form of certain inequalities or equalities, and a possible simple form of such a constraint can be a power law
$$\beta = C \varrho^{\kappa}, \quad \text{where } C > 0 \text{ and } \kappa > 0 \text{ are arbitrary constants.} \tag{13}$$
Of course, general features of wavepacket evolution are independent of particular values of the constant $C$. In addition to that, some fundamental properties, such as wavepacket invariance, are also totally independent of the particular choice of the values of $\kappa$ in (13), whereas other properties are independent of $\kappa$ as it varies in certain intervals. For instance, dispersion effects are dominant for $\kappa < 1/2$, whereas the wavepacket superposition principle of [7] holds for $\kappa < 1$.
The qualitative picture of the dependence of wavepacket evolution on small $\beta$ and $\varrho$ is as follows. The parameter $\beta$ enters problem (1) through the multi-wavepacket initial data $h(r, \beta)$, whereas $\varrho$ enters it through the factor $1/\varrho$ before the linear part. Evidently the factor $1/\varrho$ determines the relative magnitude of the linear part compared to the nonlinearity, and since $1/\varrho$ is large, one expects the linear part to provide an important input into the properties of solutions. This input includes, in particular, a key role of eigenmodes and eigenfrequencies (dispersion relations) in expressing the nonlinear evolution. Importantly, in many cases of interest, though $1/\varrho$ is large, nonlinear phenomena are significant, and this is the case when $\beta \le C \varrho^{1/2}$. More precisely, if $\beta \le C \varrho^{1/2}$ then, as in the case of finite-dimensional nonlinear ODE evolutionary systems, the large values of $1/\varrho$ lead to a well defined factorization of the solution into the fast (high frequency) and the slow (low frequency) components. The interplay between the fast and slow components is also similar to the ODE case, namely, the nonlinear evolution is associated primarily with the slow component governed by a nonlinear equation obtained from the original one by a certain canonical time averaging procedure. Our further analysis of the above mentioned interplay shows the following. Firstly, the linear superposition principle holds, [7], that is if $\kappa < 1$ is as in (13) and the initial data is a sum of generic wavepackets then the solution is the sum of the solutions for the single involved wavepackets with precision $\varrho / \beta^{1+\epsilon}$ with arbitrarily small $\epsilon$. Secondly, properly defined wavepackets and their linear combinations are preserved under the nonlinear evolution (1), which is the subject of this paper. In the light of the above discussion we introduce the slow variable $\hat u(k, \tau)$ by the formula
$$\hat U(k, \tau) = e^{-\frac{i\tau}{\varrho} L(k)}\, \hat u(k, \tau), \tag{14}$$
and recast Eq. (3) for it as follows:
$$\partial_\tau \hat u = e^{\frac{i\tau}{\varrho} L}\, \hat F\!\left( e^{-\frac{i\tau}{\varrho} L}\, \hat u \right), \qquad \hat u|_{\tau=0} = \hat h. \tag{15}$$
Then we obtain an integral form of (15) by integrating it with respect to $\tau$:
$$\hat u = \mathcal{F}(\hat u) + \hat h, \qquad \mathcal{F} = \mathcal{F}(\varrho), \qquad \mathcal{F}(\hat u)(\tau) = \int_0^\tau e^{\frac{i\tau'}{\varrho} L}\, \hat F\!\left( e^{-\frac{i\tau'}{\varrho} L}\, \hat u(\tau') \right) d\tau', \tag{16}$$
with an explicitly defined nonlinear polynomial integral operator $\mathcal{F} = \mathcal{F}(\varrho)$, which depends on the parameter $\varrho$. This operator is bounded uniformly with respect to $\varrho$ in the Banach space $E = C([0, \tau_*], L^1)$ of functions $\hat v(k, \tau)$, $0 \le \tau \le \tau_*$, with the norm
$$\|\hat v(k, \tau)\|_E = \|\hat v(k, \tau)\|_{C([0,\tau_*], L^1)} = \sup_{0 \le \tau \le \tau_*} \int_{\mathbb{R}^d} |\hat v(k, \tau)|\, dk, \tag{17}$$
where $L^1$ is the Lebesgue space of functions $\hat v(k)$ with the standard norm
$$\|\hat v(\cdot)\|_{L^1} = \int_{\mathbb{R}^d} |\hat v(k)|\, dk. \tag{18}$$
Sometimes we use more general weighted spaces $L^{1,a}$ with the norm
$$\|\hat v\|_{L^{1,a}} = \int_{\mathbb{R}^d} (1 + |k|)^a\, |\hat v(k)|\, dk, \quad a \ge 0. \tag{19}$$
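For readers who want the intermediate step, the passage from (3) to (15) and (16) can be written out as follows. This is a sketch of the standard variation-of-constants computation, using only (3) and (14), with $\varrho$ denoting the small parameter in front of the linear part of (1). The dummy variable $\tau'$ is a notational choice made here.

```latex
\begin{aligned}
\partial_\tau \hat U
 &= \partial_\tau\!\left(e^{-\frac{i\tau}{\varrho}L(k)}\hat u\right)
  = -\frac{i}{\varrho}L(k)\,e^{-\frac{i\tau}{\varrho}L(k)}\hat u
    + e^{-\frac{i\tau}{\varrho}L(k)}\,\partial_\tau\hat u ,\\[2pt]
\text{so (3) reduces to}\quad
e^{-\frac{i\tau}{\varrho}L(k)}\,\partial_\tau\hat u
 &= \hat F\!\left(e^{-\frac{i\tau}{\varrho}L}\hat u\right),
 \quad\text{i.e.}\quad
\partial_\tau\hat u
  = e^{\frac{i\tau}{\varrho}L}\,
    \hat F\!\left(e^{-\frac{i\tau}{\varrho}L}\hat u\right),
 \qquad \hat u|_{\tau=0}=\hat h,\\[2pt]
\hat u(\tau)
 &= \hat h + \int_0^\tau e^{\frac{i\tau'}{\varrho}L}\,
    \hat F\!\left(e^{-\frac{i\tau'}{\varrho}L}\hat u(\tau')\right)d\tau' .
\end{aligned}
```

The linear term carrying the factor $1/\varrho$ cancels, so in the equation for $\hat u$ the parameter $\varrho$ appears only through the oscillating exponentials.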
A rather elementary existence and uniqueness theorem (Theorem 29) implies that for a small and, importantly, independent of $\varrho$ constant $\tau_* > 0$ this equation has a unique solution
$$\hat u(\tau) = \mathcal{G}\big(\mathcal{F}(\varrho), \hat h\big)(\tau), \quad \tau \in [0, \tau_*], \quad \hat u \in C^1([0, \tau_*], L^1), \tag{20}$$
where $\mathcal{G}$ denotes the solution operator for Eq. (16); it depends on the operator $\mathcal{F}(\varrho)$, which itself depends on the parameter $\varrho$. If $\hat u(k, \tau)$ is a solution to Eq. (16) we call the function $U(r, \tau)$ defined by (14), (4) an $\mathcal{F}$-solution to Eq. (1). We denote by $\hat L^1$ the space of functions $V(r)$ such that their Fourier transform $\hat V(k)$ belongs to $L^1$, and define $\|V\|_{\hat L^1} = \|\hat V\|_{L^1}$. Since
$$\|V\|_{L^\infty} \le (2\pi)^{-d} \|\hat V\|_{L^1} \quad \text{and} \quad \hat L^1 \subset L^\infty, \tag{21}$$
$\mathcal{F}$-solutions to (1) belong to $C^1([0, \tau_*], \hat L^1) \subset C^1([0, \tau_*], L^\infty)$.
We would like to define wavepackets in a form which explicitly allows them to be real valued. This is accomplished, based on the symmetry (7) of the dispersion relations, by introducing a doublet wavepacket
$$w(r, \beta) = \Phi_+(\beta r)\, e^{i k_* \cdot r}\, g_{n,+}(k_*) + \Phi_-(\beta r)\, e^{-i k_* \cdot r}\, g_{n,-}(-k_*). \tag{22}$$
Such a wavepacket is real if $\Phi_-(r)$, $g_{n,-}(-k_*)$ are complex conjugate to $\Phi_+(r)$, $g_{n,+}(k_*)$, i.e. if
$$\Phi_-(r) = \Phi_+^*(r), \qquad g_{n,+}(k_*) = g_{n,-}(-k_*)^*. \tag{23}$$
Considering wavepackets with nk-pair $(n, k_*)$ we usually mean doublet ones as in (22), but sometimes $\Phi_+$ or $\Phi_-$ may be zero, producing (12). To identify characteristic properties of a wavepacket suitable for our needs, let us look at the Fourier transform $\hat w(k, \beta)$ of an elementary wavepacket $w(r, \beta)$ defined by (12), that is
$$\hat w(k, \beta) = \beta^{-d}\, \hat\Phi_\zeta\big(\beta^{-1}(k - k_*)\big)\, g_{n,\zeta}(k_*). \tag{24}$$
We call such $\hat w(k, \beta)$ a wavepacket too; obviously it possesses the following properties: (i) its $L^1$ norm is bounded (in fact, constant), uniformly in $\beta \to 0$; (ii) for every $\epsilon > 0$
the value $\hat w(k, \beta) \to 0$ for every $k$ outside a $\beta^{1-\epsilon}$-neighborhood of $k_*$, and the convergence is faster than any power of $\beta$ if $\Phi_\zeta$ is a Schwartz function. To explicitly interpret the last property we introduce a cutoff function $\Psi(\eta)$,
$$\Psi(\eta) = 1 \text{ for } |\eta| \le 1, \qquad \Psi(\eta) = 0 \text{ for } |\eta| > 1, \tag{25}$$
together with its shifted/rescaled modification
$$\Psi(k; k_*) = \Psi\big(k; k_*, \beta^{1-\epsilon}\big) = \Psi\big(\beta^{-(1-\epsilon)}(k - k_*)\big). \tag{26}$$
If in an elementary wavepacket $w(r, \beta)$ defined by (24) $\Phi_\zeta(r)$ is a Schwartz function then
$$\big\| \big(1 - \Psi(\cdot, k_*, \beta^{1-\epsilon})\big)\, \hat w(\cdot, \beta) \big\|_{L^1} \le C_{\epsilon,s}\, \beta^s, \quad 0 < \beta \le 1,$$
which holds for arbitrarily small $\epsilon > 0$ and arbitrarily large $s > 0$. Based on the above discussion we give the following definition of a wavepacket which is a minor variation of [7, Def. 8].
Definition 1 (Single-band wavepacket). Let $0 < \epsilon < 1$ be a fixed number. For a given band number $n \in \{1, \ldots, J\}$ and a wavevector $k_* \in \mathbb{R}^d$, a function $\hat h(\beta, k)$ is called a wavepacket with nk-pair $(n, k_*)$ and the degree of regularity $s > 0$ if there exists $\beta_0 > 0$ such that for $\beta < \beta_0$ the following conditions are satisfied:
(i) $\hat h(\beta, k)$ is $L^1$-bounded uniformly in $\beta$, i.e.
$$\|\hat h(\beta, \cdot)\|_{L^1} \le C, \quad 0 < \beta < \beta_0 \text{ for some } C > 0; \tag{27}$$
(ii) $\hat h(\beta, k)$ has the following structure:
$$\hat h(\beta, k) = \hat h_-(\beta, k) + \hat h_+(\beta, k) + \hat D_h, \quad 0 < \beta < \beta_0, \tag{28}$$
where
$$\hat h_\zeta(\beta, k) = \Psi\big(k, \zeta k_*, \beta^{1-\epsilon}\big)\, \Pi_{n,\zeta}(k)\, \hat h_\zeta(\beta, k), \quad \zeta = \pm, \tag{29}$$
with $\Psi(\cdot, \zeta k_*, \beta^{1-\epsilon})$ defined by (26) and $\hat D_h$ satisfying the following tail estimate:
$$\|\hat D_h\|_{L^1} \le C \beta^s, \quad 0 < \beta < \beta_0 \text{ for some } C > 0. \tag{30}$$
The inverse Fourier transform $h(\beta, r)$ of a wavepacket $\hat h(\beta, k)$ is also called a wavepacket.
Point (ii) of the above definition means that the wavepacket $\hat h(\beta, k)$ is composed of two functions $\hat h_\zeta(\beta, k)$, $\zeta = \pm$, which take values in the $n$-th band eigenspace of $L(k)$ and are localized near $\zeta k_*$, where $(n, k_*)$ is the nk-pair of the wavepacket. The number $\beta_0$ usually is small and may depend on a wavepacket. Evidently, if a wavepacket has the degree of regularity $s$, it also has a smaller degree of regularity $s' \le s$ with the same $\epsilon$. Observe that the degree of regularity $s$ is related to the smoothness of $\Phi_\zeta(r)$ in (12), so that the higher the smoothness is the higher $s$ can be taken. Namely, if $\hat\Phi_\zeta \in L^{1,a}$ then one can take any $s < a$, see Lemma 52 below. For example, if in the elementary wavepacket $w(r, \beta)$ defined by (12) $\Phi_\zeta(r)$ is a Schwartz function then it has arbitrarily large degree of regularity.
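The two characteristic properties of an elementary wavepacket in Fourier space (an $L^1$ norm that does not depend on $\beta$, and a tail outside the $\beta^{1-\epsilon}$-neighborhood of $k_*$ that decays faster than any power of $\beta$) can be checked numerically in a toy setting. The sketch below assumes $d = 1$, a Gaussian envelope $\exp(-r^2/2)$, and $k_* = 1$; these choices, and all function names, are illustrative assumptions and not part of the paper.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def phi_hat(eta: float) -> float:
    """Fourier transform, in the convention of (4), of the assumed Gaussian
    envelope exp(-r**2 / 2) in dimension d = 1."""
    return SQRT_2PI * math.exp(-0.5 * eta * eta)


def w_hat(k: float, beta: float, k_star: float) -> float:
    """Scalar amplitude of the wavepacket in Fourier space, as in (24) with d = 1."""
    return phi_hat((k - k_star) / beta) / beta


def l1_norm(beta: float, k_star: float = 1.0,
            cells: int = 20000, span: float = 10.0) -> float:
    """Midpoint-rule approximation of the L1 norm of w_hat,
    integrating over |k - k_star| <= span * beta."""
    h = 2.0 * span * beta / cells
    left = k_star - span * beta
    return h * sum(abs(w_hat(left + (j + 0.5) * h, beta, k_star))
                   for j in range(cells))


def tail_mass(beta: float, eps: float) -> float:
    """L1 mass of w_hat outside the beta**(1 - eps) neighborhood of k_star.
    For the Gaussian envelope this is 2*pi*erfc(beta**(-eps) / sqrt(2))."""
    return 2.0 * math.pi * math.erfc(beta ** (-eps) / math.sqrt(2.0))


if __name__ == "__main__":
    for beta in (0.1, 0.01, 0.001):
        print(beta, l1_norm(beta), tail_mass(beta, eps=0.5))
```

In this toy case the $L^1$ norm stays at $2\pi$ for every $\beta$, while the tail mass collapses super-polynomially as $\beta \to 0$, which is the behavior that the tail term in Definition 1 is designed to absorb.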
Remarkably, it turns out that wavepackets satisfying Definition 1 preserve their defining properties under nonlinear evolution. This is remarkable, in particular, since it is well known that determination of classes of solutions which preserve their form under generic nonlinear evolution usually leads to infinite expansions, such as multi-scale expansions, power expansions, modal expansions, etc., with serious difficulties in establishing convergence. Such expansions often are formally invariant, but they involve infinitely many rather complex terms, and establishing convergence is a very hard problem, if there is any convergence at all. Our Definition 1 of a wavepacket involves only a finite number of terms and its invariance is provided by the flexible tail term $\hat D_h$. We also find remarkable the very simplicity of the definition, which nevertheless allows for a sufficiently detailed analysis of the dynamics, including, in particular, rigorously justified NLS-type approximations of wavepacket dynamics presented in the following sections. Our special interest is in waves that are finite sums of wavepackets and we refer to them as multi-wavepackets.
Definition 2 (Multi-wavepacket). Let $S$ be a set of nk-pairs:
$$S = \{(n_l, k_{*l}),\ l = 1, \ldots, N\} \subset \{1, \ldots, J\} \times \mathbb{R}^d, \qquad (n_l, k_{*l}) \ne (n_{l'}, k_{*l'}) \text{ for } l \ne l', \tag{31}$$
and $N = |S|$ be their number. Let $K_S$ be a set consisting of all different wavevectors $k_{*l}$ involved in $S$ with $|K_S| \le N$ being the number of its elements. $K_S$ is called the wavepacket k-spectrum and without loss of generality we assume the indexing of elements in $S$ to be such that
$$K_S = \{k_{*i},\ i = 1, \ldots, |K_S|\}, \quad \text{i.e. } l_i = i \text{ for } 1 \le i \le |K_S|. \tag{32}$$
A function $\hat h(\beta) = \hat h(\beta, k)$ is called a multi-wavepacket with nk-spectrum $S$ if it is a finite sum of wavepackets, namely
$$\hat h(\beta, k) = \sum_{l=1}^{N} \hat h_l(\beta, k), \quad 0 < \beta < \beta_0 \text{ for some } \beta_0 > 0, \tag{33}$$
where $\hat h_l$, $l = 1, \ldots, N$, is a wavepacket with nk-pair $(n_l, k_{*l}) \in S$ as in Definition 1.
Note that if $\hat h(\beta, k)$ is a wavepacket then $\hat h(\beta, k) + O(\beta^s)$ is a wavepacket as well with the same nk-spectrum, and the same is true for multi-wavepackets. Hence, we can introduce a multi-wavepacket equivalence relation “$\simeq$” of degree $s$ by
$$\hat h_1(\beta, k) \simeq \hat h_2(\beta, k) \quad \text{if} \quad \|\hat h_1(\beta, k) - \hat h_2(\beta, k)\|_{L^1} \le C \beta^s \text{ for some constant } C > 0. \tag{34}$$
Observe also that zero functions are (trivial) wavepackets for any given $(n, k)$-spectrum. A wavepacket with any pair $(n, k)$ is equivalent to zero if its $L^1$ norm is bounded by $\beta^s$, and such trivial components of two equivalent wavepackets are excluded; the remaining sets of elements $(n_l, k_{*l})$ of spectra of two equivalent wavepackets must coincide. Let us turn now to the abstract nonlinear problem (16) where (i) $\mathcal{F} = \mathcal{F}(\varrho)$ depends on $\varrho$ and (ii) the initial data $\hat h = \hat h(\beta)$ is a multi-wavepacket depending on $\beta$. We would like to state our first theorem on multi-wavepacket preservation under the evolution (16) for $\beta, \varrho \to 0$, which holds, as it turns out, provided its nk-spectrum $S$
satisfies a certain natural condition called resonance invariance. This condition is intimately related to the so-called phase and frequency matching conditions for stronger nonlinear interactions, and its concise formulation is as follows. We define for given dispersion relations $\{\omega_n(k)\}$ and any finite set $S \subset \{1, \ldots, J\} \times \mathbb{R}^d$ another finite set $R(S) \subset \{1, \ldots, J\} \times \mathbb{R}^d$, where $R$ is a certain algebraic operation described in Definition 18 below. It turns out that for any $S$ always $S \subseteq R(S)$, but if, in fact, $R(S) = S$ we call $S$ resonance invariant. The condition of resonance invariance is instrumental for the multi-wavepacket preservation, and there are examples showing that if it fails, i.e. $R(S) \ne S$, the wavepacket preservation does not hold. Importantly, the resonance invariance $R(S) = S$ allows resonances inside the multi-wavepacket; that includes, in particular, resonances associated with second and third harmonic generation, resonant four-wave interaction, etc.
Theorem 3 (Multi-wavepacket preservation). Suppose that the nonlinear evolution is governed by (16) and the initial data $\hat h = \hat h(\beta, k)$ is a multi-wavepacket with nk-spectrum $S$ and the regularity degree $s$, and assume $S$ to be resonance invariant (see Definition 18 below). Let the dependence between the parameters $\varrho$ and $\beta$ be any function $\varrho = \rho(\beta)$ satisfying
$$0 < \rho(\beta) \le C \beta^s, \quad \text{for some constant } C > 0, \tag{35}$$
and let us set $\varrho = \rho(\beta)$. Then the solution $\hat u(\tau, \beta) = \mathcal{G}\big(\mathcal{F}(\rho(\beta)), \hat h(\beta)\big)(\tau)$ to (16) for any $\tau \in [0, \tau_*]$ is a multi-wavepacket with nk-spectrum $S$ and the regularity degree $s$, i.e.
$$\hat u(\tau, \beta; k) = \sum_{l=1}^{N} \hat u_l(\tau, \beta; k), \quad \text{where } \hat u_l \text{ is a wavepacket with nk-pair } (n_l, k_{*l}) \in S. \tag{36}$$
The time interval length $\tau_* > 0$ depends only on the $L^1$-norms of $\hat h_l(\beta, k)$ and $N$. The presentation (36) is unique up to the equivalence (34).
The above statement can be interpreted as follows. Modes in nk-spectrum $S$ are always resonance coupled with modes in $R(S)$ through the nonlinear interactions, but if $R(S) = S$ then (i) all resonance interactions occur inside $S$ and (ii) only a small vicinity of $S$ is involved in nonlinear interactions, leading to the multi-wavepacket preservation.
Many nonlinear evolution problems with small initial data can be readily reduced by elementary rescaling to the system (1) with a large parameter $1/\varrho$ before its linear part. For example, suppose that $F(V)$ is a homogeneous nonlinearity of degree $m$ ($m = 3$ for a cubic one) and that the nonlinear evolution is governed by
$$\partial_t V = -i L(-i\nabla) V + F(V), \qquad V(r, t)|_{t=0} = \varrho^{1/(m-1)} h(r), \qquad r \in \mathbb{R}^d, \tag{37}$$
considered for small $\varrho$ on the large time interval $0 \le t \le \tau_*/\varrho$ with a fixed $\tau_* > 0$. Then the following simple change of variables:
$$V(t) = \varrho^{1/(m-1)}\, U(\tau), \qquad \tau = t \varrho, \tag{38}$$
transforms the problem (37) into the equivalent problem (1). Indeed, by (38) $\partial_t V = \varrho^{m/(m-1)} \partial_\tau U$ and, by homogeneity, $F(V) = \varrho^{m/(m-1)} F(U)$, so dividing (37) by $\varrho^{m/(m-1)}$ yields (1). In this case the inequality (35) describes a constraint between the spatial extension $1/\beta$ and the amplitude factor $\varrho^{1/(m-1)} = \rho(\beta)^{1/(m-1)}$ of the initial data. Observe that Eq. (37) does not have any small parameters and both small parameters $\varrho$ and $\beta$ enter the problem through its initial data. Theorem 3 can be restated for problem (37) as follows:
Corollary 4 (Multi-wavepacket preservation). Let $V(r, t)$ be a solution to the nonlinear system (37), let $\rho(\beta)$ be as in (35), and set $\varrho = \rho(\beta)$. If the initial data is such that $\varrho^{-1/(m-1)} \hat V(k, 0) = \hat h(k)$ is a multi-wavepacket, then $\varrho^{-1/(m-1)} \hat V(k, t)$ remains a multi-wavepacket with the same nk-spectrum and the same degree of regularity for all times $t \in [0, \tau_*/\varrho]$.
The statements of Theorem 3 and Corollary 4 directly follow from the following general theorem, which makes no assumptions on the relations between $\beta, \varrho \to 0$.
Theorem 5 (Multi-wavepacket approximation). Let the initial data $\hat h$ in the integral equation (16) be a multi-wavepacket $\hat h(\beta, k)$ with nk-spectrum $S$ as in (31), the regularity degree $s$ and with the parameter $\epsilon > 0$ as in Definition 1. Assume that $S$ is resonance invariant in the sense of Definition 18 below. Let the cutoff function $\Psi(k, k_*)$ and the eigenvector projectors $\Pi_{n,\pm}(k)$ be defined by (26) and (11) respectively. For a solution $\hat u$ of (16) we set
$$\hat u_l(\tau, \beta; k) = \Big[ \sum_{\zeta = \pm} \Psi(k, \zeta k_{*l})\, \Pi_{n_l,\zeta}(k) \Big]\, \hat u(\tau, \beta; k), \quad l = 1, \ldots, N. \tag{39}$$
Then every such $\hat u_l(k; \tau, \beta)$ is a wavepacket and
$$\sup_{0 \le \tau \le \tau_*} \Big\| \hat u(\tau, \beta; k) - \sum_{l=1}^{N} \hat u_l(\tau, \beta; k) \Big\|_{L^1} \le C_1 \varrho + C_2 \beta^s, \tag{40}$$
where the constant $C_1$ does not depend on $\epsilon$, $s$ and $\beta$, and the constant $C_2$ does not depend on $\beta$.
It is interesting to note that the statement of Theorem 5 can be extended to the special limit case $\beta = 0$, $k_{*l} = 0$. In this case the initial data of (1) are constants in $r$ and we can consider solutions $U$ of (1) which do not depend on $r$. Then $\nabla U = 0$, the linear operator $L(-i\nabla)$ reduces to the multiplication by a matrix $L_0 = L(0)$ and the system (1) turns into a system of ordinary differential equations (ODE). Notice that (i) the structure of the eigenvalues (7) implies that the linear part is time-reversible; (ii) the nonlinear part can be an arbitrary polynomial. The extension of Theorem 5 to this case (see Theorem 11) states that in a generic, non-resonant situation, if the initial data are bounded and a set of eigenmodes of the matrix $L_0$ is excited at $\tau = 0$, then in the course of evolution on a time interval $[0, \tau_*]$, where $\tau_*$ depends on the magnitude of the initial data: (i) all remaining modes remain unexcited with accuracy proportional to $\varrho$, and (ii) only the originally excited modes can significantly evolve with this level of accuracy. For finite-dimensional systems governed by ODEs such a statement can be derived from the classical time-averaging principle and the time-averaged equations remain nonlinear. For infinite-dimensional systems governed by PDEs and with the linear operator having a continuous spectrum, as in Theorem 5, the analysis is more complex but the time-averaging still plays an important role, yielding an accurate approximation governed by a certain universal nonlinear PDE.
We would like to point out also that though Theorem 3 is a simple corollary of the more general Theorem 5, it is important that the statement (40) can be formulated as multi-wavepacket invariance. That, in particular, allows one to take the values $\hat u(\tau_*)$ as new
wavepacket initial data for (1) and extend the wavepacket invariance of a solution to the next time interval $\tau_* \le \tau \le \tau_{*1}$. This observation allows one to extend the wavepacket invariance to larger values of $\tau$ (up to blow-up time or infinity) if some additional information about solutions with wavepacket initial data is available. In particular, the following theorem holds.
Theorem 6. Assume that all conditions of Theorem 3 are satisfied and, in addition to that, solutions $\hat u(\tau, \beta)$ of (16) with the multi-wavepacket initial data $\hat h(\beta)$ and $\varrho = \rho(\beta)$ exist on an interval $0 \le \tau < \tau_0$, $\tau_0 \le \infty$, and the estimate $\|\hat u(\cdot, \beta)\|_{C([0,\tau_1], L^1)} \le R(\tau_1)$ holds for any $\tau_1 < \tau_0$, where $R(\tau_1)$ does not depend on $\beta \le \beta_0$. Then the solution $\hat u(\tau, \beta) = \mathcal{G}\big(\mathcal{F}(\rho(\beta)), \hat h(\beta)\big)(\tau)$ to (16) for any $\tau < \tau_0$ is a multi-wavepacket with nk-spectrum $S$ and the regularity degree $s$, that is (36) holds.
The derivation of the above statement from Theorem 3 is straightforward with the following key points. The interval $\tau_*$ in Theorem 3 depends only on the $L^1$-norm of the initial data and the solution $\hat u(\tau, \beta)$ is assumed to be bounded in $L^1$ by $R(\tau) \le R(T)$ for $0 \le \tau \le T$ for any $T < \tau_0$. Therefore, we can apply Theorem 3 consecutively on intervals $[n\tau_*, (n+1)\tau_*]$ for all integers $n$ such that $0 \le n\tau_* \le T$ and conclude that if $\hat u(\tau, \beta)$ is a wavepacket for $\tau = n\tau_*$ it remains a wavepacket for $\tau \in [n\tau_*, (n+1)\tau_*]$. Note that the parameters $\beta_0$ and $C$ in Definition 1 may depend on a wavepacket and be different for different wavepackets. Importantly, $\tau_*$ in the statement of Theorem 5 does not depend on $\beta_0$ and $C$. Since for any fixed $T < \tau_0$ we can apply Theorem 3 a finite number of times, the solution $\hat u(\tau)$ is a wavepacket on the interval $[0, T]$ if $T < \tau_0$ (with some parameters $\beta_0(T) > 0$ and $C(T) < \infty$).
Note that the wavepacket form of solutions can be used to obtain long-time estimates of solutions. Namely, very often the behavior of every single wavepacket is well approximated by its own nonlinear Schrödinger equation (NLS), see [17,34,18,23,30,31,47,50,51,53] and references therein, see also Sect. 6. Many features of the dynamics governed by NLS-type equations are well understood, see [14,16,32,49,57,59] and references therein. These results can be used to obtain long-time estimates for every single wavepacket (as, for example, in [31]) and, with the help of the superposition principle, for the multi-wavepacket solution.
The wavepacket representation (36) from Theorem 3 can be used for a more detailed analysis of the dynamics of the wavepackets $\hat u_l(\tau, \beta)$ and the interaction between them. The following theorem illustrates that by describing wavepacket interaction based on a system with a weakly universal nonlinearity similar to so-called coupled modes systems or NLS.
Theorem 7 (NLS-type approximation). Let the conditions of Theorem 5 hold and, in addition to that, let the initial data $\hat h_l(k)$ be of the form $\hat h_l = \hat h_{l,+} + \hat h_{l,-} + \hat D_l$, where $\hat h_{l,\zeta}(k) = \beta^{-d} \hat H_{l,\zeta}\big(\beta^{-1}(k - \zeta k_{*l})\big)\, g_{n_l,\zeta}(k)$ for $|k - k_{*l}| \le \beta^{1-\epsilon}$, $\zeta = \pm$, $\hat D_l$ satisfies (30), and every function $\hat H_{l,\zeta}(\eta)$, which may depend on $\beta$, is defined for all $\eta$ and is bounded in $L^{1,a}$ with $a > s$ uniformly in $\beta$. Then one can write a nonlinear system of differential equations for $2N$ scalar envelope functions $z_{l,\zeta}(\tau, r)$ with the initial data $H_{l,\zeta}$; the linear part of the system has order $\mu \le 3$ and the nonlinearity is weakly universal as in (238) and has order $\nu \le 1$. Let $\hat z_{l,\zeta}(\tau, k)$, $l = 1, \ldots, N$, be the Fourier transform of a solution to this system. Then there exist $\beta_0 > 0$ and a constant
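$C$ which does not depend on $\beta$, such that for $\beta \le \beta_0$ the solution $\hat u$ of (16) with initial data $\hat h$ can be approximated as follows:
$$\sum_{l=1}^{N} \Big\| \hat u_l(\tau, \beta) - \beta^{-d}\, \hat z_{l,\zeta}\big(\tau, \beta^{-1}(\cdot - k_{*l})\big)\, g_{n_l,\zeta} \Big\|_E \le C \Big[ \varrho + \frac{\beta^{(\mu+1)(1-\epsilon)}}{\varrho} + \beta^{(\nu+1)(1-\epsilon)} + \beta^s \Big]. \tag{41}$$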
The above-mentioned system with a weakly universal nonlinearity is constructed based on Eq. (1) and nk-spectrum $S$ with the help of time averaging (70) described below. Note that in the simplest case when $\mu = 2$, $\nu = 0$, $N = 1$ (and $J$ is arbitrary) the resulting system with a universal nonlinearity is equivalent to the classical Nonlinear Schrödinger equation (NLS). If $N = 2$ and $k_{*1} = -k_{*2}$ we obtain the well-known coupled modes system for counterpropagating waves. This theorem applied to particular systems implies approximation theorems similar to results of (i) [30,53,6,23] on NLS approximation; (ii) [6,24,47,52] on coupled mode approximation; (iii) [54] on three-wave approximations. Note also that (41) implies that if $\varrho = \beta^{\kappa}$ with $1 < \kappa < 2$, then both the first order hyperbolic equations ($\mu = 1$, $\nu = 0$) and the second-order NLS ($\mu = 2$, $\nu = 0$) provide an approximation for a solution $\hat u$ of (16), but NLS provides a better approximation $O\big(\beta^{(1-\epsilon)}\big)$ compared with $O\big(\beta^{2(1-\epsilon)-\kappa}\big)$ for first order hyperbolic equations.
Observe that in the form (22) for a simple wavepacket we require $g_{n,\pm}(k_*)$ to be an eigenvector of the Hermitian matrix $L(k_*)$, and one can wonder if $g_{n,\pm}(k_*)$ can be replaced with an arbitrary pair of vectors $g_\pm$ in the case $J > 1$. The answer is affirmative, since one can always expand any $g$ with respect to the basis $g_{n,\pm}(k)$ using $\Pi_{n,\pm}(k)$, but the result will be a multi-wavepacket with up to $2J$ components rather than a single wavepacket.
The rest of the paper is organized as follows. In the next section we illustrate important points of parameter dependence and wavepacket preservation based on examples. In Sect. 3 we formulate conditions of wavepacket preservation including the key resonance invariance condition. In Sect. 4 we provide examples of different forms of equations and systems which involve small or large parameters and can be written in the form of (1) after a rescaling. In Sect. 5 we introduce and discuss integrated modal forms of the evolution equation. In Sect. 6 we introduce and study the wavepacket interaction system in its relation to the original system. In Sect. 7 we approximate the wavepacket interaction system by a certain minimal wavepacket interaction system, which in the simplest cases turns into the NLS or the coupled modes system.
2. Preliminary Discussion and Examples
Observe that the multi-wavepacket preservation as described in Theorems 3-7 states in different forms that (i) its modal composition is essentially preserved; (ii) its nk-spectrum (the set of nk-pairs $\{k_{*l}, n_l\}$) remains the same at all times; (iii) no new modes are excited with good accuracy as a result of the nonlinear evolution. The preservation of multi-wavepackets as they evolve shows also that only the nonlinear interactions between small neighborhoods of points $(k_{*l}, n_l)$ are essential and contribute constructively to the nonlinear dynamics, whereas the amplitudes of modes with wavevectors $k$ outside
those neighborhoods are vanishingly small as $\beta, \varrho \to 0$. The latter is quite remarkable since the coupling term $\hat F(\hat U)(k)$ in (3) for such $k$ is not small. A qualitative explanation for that, confirmed by rigorous analysis, is based on the fact that the contribution of this term to the solution is a time integral involving highly oscillatory functions, and this integral becomes vanishingly small as $\beta, \varrho \to 0$. This mechanism is similar to the classical averaging mechanism for systems of ordinary differential equations described, for instance, in [11]; the relevance of the averaging mechanism for long-wave asymptotics for hyperbolic systems of PDE is well known, see [30]. We would like to relate now the multi-wavepacket preservation property to the linear superposition principle for wavepackets established in [7]. According to that principle, if the initial state is $h = \sum h_l$, with $h_l$, $l = 1, \ldots, N$, being "generic" wavepackets, then the solution $\hat u(\tau) = G(h)(\tau)$ to the evolution equation (15) equals with high accuracy the sum of individual solutions $u_l$ of $N$ equations with respective initial data $h_l$. Namely, if $\beta, \varrho > 0$ satisfy the following relation:
\[ \beta, \varrho \to 0, \quad \beta \ge C_1 \varrho \ \text{ with some } C_1 > 0, \tag{42} \]
then for all times $0 \le \tau \le \tau_*$ we have
\[ G\Big(\sum_{l=1}^{N} w_l\Big)(\tau) = \sum_{l=1}^{N} G(w_l)(\tau) + D(\tau), \tag{43} \]
\[ \|D(\tau)\|_{E} = \sup_{0 \le \tau \le \tau_*} \|D(\tau)\|_{L^\infty} \le C\,\frac{\varrho}{\beta^{1+\epsilon}} + C\beta \ \text{ for any } \epsilon > 0. \tag{44} \]
The linear superposition principle is formulated in [7] for $\beta = C_2 \varrho^{1/2}$, but, in fact, the provided proofs of (43), (44) remain valid as long as (42) holds. Obviously, the bound $\beta \ge C_1 \varrho$ in (42) determines when (44) becomes trivial. This bound is sharp, and the examples below show that when $\beta \sim \varrho$ the remainder $D(\tau)$ in (43) does not tend to zero as $\beta \to 0$. Both the multi-wavepacket preservation and the linear superposition apply to sums of generic wavepackets. It is important to notice though that the multi-wavepacket preservation holds for any dependence between $\varrho$ and $\beta$ which satisfies (35), that is $\varrho(\beta) \le C\beta^{q}$ with arbitrarily small $q$, whereas the linear superposition holds if $\varrho(\beta) \le C\beta$. Thus, the bounds (42) on $\beta$ determine the range of its values for which both multi-wavepacket preservation and linear superposition hold simultaneously (provided some genericity conditions are satisfied). In this range wavepacket preservation provides additional information on the behavior of solutions with single wavepacket initial data, namely that the solution remains a single wavepacket. Obviously, the linear superposition principle does not follow from multi-wavepacket invariance. Below we use simple examples and models to discuss different ranges of the parameters $\varrho$ and $\beta$ where wavepacket preservation is valid but the solutions of equations exhibit different behavior.
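To see how (42) and (44) interact, it may help to substitute a power-law relation between the two parameters. The exponent $\kappa$ below is introduced only for this illustration, following the parametrization $\varrho = \beta^{\kappa}$ used in the examples of this section; it is a sketch of the arithmetic, not an additional result.

```latex
\varrho = \beta^{\kappa}
\quad\Longrightarrow\quad
C\,\frac{\varrho}{\beta^{1+\epsilon}} + C\beta
  \;=\; C\,\beta^{\kappa-1-\epsilon} + C\beta .
```

For $\kappa > 1$ and $0 < \epsilon < \kappa - 1$ both terms tend to zero as $\beta \to 0$. For $\kappa = 1$, that is $\beta \sim \varrho$, the first term equals $C\beta^{-\epsilon}$ and gives no smallness, which is consistent with the statement that the bound $\beta \ge C_1\varrho$ is sharp.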
2.1. A model with explicit solutions and the effect of large group velocity. Here we introduce a simple model for our general system (1) with elementary solutions which makes explicit that in the limit $\varrho \to 0$ nonlinear effects do not vanish; in particular, the blow-up time does not tend to infinity. This example also shows that on the time scale where $\tau$ is of order 1 solutions undergo significant nonlinear evolution. The influence of $\varrho$ on solutions through the group velocity in this example can be seen explicitly. The model is the following system of two coupled nonlinear first order hyperbolic equations for variables $u_1(x,\tau)$, $u_2(x,\tau)$ with one-dimensional spatial variable $x$:
\[ \partial_\tau u_1 = -\frac{c_1}{\varrho}\,\partial_x u_1 + F_1(u_1,u_2), \qquad u_1|_{\tau=0} = h_1(x), \tag{45} \]
\[ \partial_\tau u_2 = -\frac{c_2}{\varrho}\,\partial_x u_2 + F_2(u_1,u_2), \qquad u_2|_{\tau=0} = h_2(x), \qquad c_1 \ne c_2, \tag{46} \]
where the initial data $h_1$, $h_2$ in (46) are of wavepacket form:
\[ h_1(x) = \Phi_1(\beta x)\cos(k_1^{*}x), \qquad h_2(x) = \Phi_2(\beta x)\cos(k_2^{*}x), \qquad |k_1^{*}| \ne |k_2^{*}|. \tag{47} \]
We take the nonlinearity to be quadratic and of the following simple form:
\[ F_1(u_1,u_2) = u_1^2 + a_1 u_1 u_2, \qquad F_2(u_1,u_2) = u_2^2 + a_2 u_1 u_2. \tag{48} \]
The system (45)–(47) allows for an explicit form of solutions with one-wavepacket initial data, describing a wave propagating with a constant speed controlled by the linear part and with a shape evolution controlled by the nonlinearity. This simplest case is then compared with the case of two-wavepacket initial data, for which an explicit solution is not available. In the case when $h_2 = 0$ the second equation has the trivial solution $u_2 = 0$ and the system (45)–(46) reduces to the single equation (45). The solution to this equation has the form of a traveling wave $v_1\big(x - \frac{c_1}{\varrho}\tau, \tau\big)$, where $v_1(y,\tau)$ is a solution of the ordinary differential equation
\[ \partial_\tau v_1 = F_1(v_1, 0), \qquad v_1(y,0) = h_1(y). \tag{49} \]
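For the quadratic nonlinearity (48) one has $F_1(v,0) = v^2$, so for each fixed $y$ the problem (49) is a scalar ODE that can be checked numerically. The following is a minimal sketch, assuming a real scalar initial value `h` standing for $h_1(y)$ at one point; the function names are hypothetical and chosen only for this illustration. It compares a Runge-Kutta integration with the closed form $h/(1-\tau h)$ and evaluates the blow-up time for a given positive maximum of the data.

```python
def rk4_quadratic(h, tau_end, steps=1000):
    """Integrate dv/dtau = v**2, v(0) = h, with the classical RK4 scheme."""
    v = h
    dt = tau_end / steps
    f = lambda w: w * w  # F1(v, 0) = v^2, taken from (48)
    for _ in range(steps):
        k1 = f(v)
        k2 = f(v + 0.5 * dt * k1)
        k3 = f(v + 0.5 * dt * k2)
        k4 = f(v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return v


def closed_form(h, tau):
    """Closed-form solution h / (1 - tau*h), valid while tau*h < 1."""
    return h / (1.0 - tau * h)


def blow_up_time(h_max):
    """Time at which 1 - tau*h vanishes for the largest positive value h_max."""
    return 1.0 / h_max
```

Note that the parameter $\varrho$ does not enter this computation at all: it only affects the argument $x - c_1\tau/\varrho$ at which the profile is evaluated.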
The explicit formula in the case (49) yields
\[ u_1(x,\tau) = v_1\Big(x - \frac{c_1}{\varrho}\tau, \tau\Big) = \frac{h_1\big(x - \frac{c_1}{\varrho}\tau\big)}{1 - \tau h_1\big(x - \frac{c_1}{\varrho}\tau\big)} = \frac{\Phi_1\big(\beta(x - \frac{c_1}{\varrho}\tau)\big)\cos\big(k_1^{*}(x - \frac{c_1}{\varrho}\tau)\big)}{1 - \tau\,\Phi_1\big(\beta(x - \frac{c_1}{\varrho}\tau)\big)\cos\big(k_1^{*}(x - \frac{c_1}{\varrho}\tau)\big)} \tag{50} \]
for a time interval $0 \le \tau < \tau_0$, where $\tau_0 = 1/\sup_y |h_1(y)|$ is the blow-up time. Obviously, the blow-up time does not depend on $\varrho$. Consequently, the wave propagates with the velocity $c_1/\varrho$ with its shape evolution being controlled by the nonlinearity. Similarly, when $h_1 = 0$ the first equation has the trivial solution $u_1 = 0$ and the system (45)–(46) reduces to the single equation (46), which has a solution in the form of a traveling wave $v_2\big(x - \frac{c_2}{\varrho}\tau, \tau\big)$ propagating with the velocity $c_2/\varrho$. Observe that for the simple model (45)–(47) the group velocity coincides with the velocity of a traveling wave. The above model is not exactly solvable if both initial conditions $h_1$ and $h_2$ do not vanish. But one can still see the way $\varrho$ influences the nonlinear dynamics quite explicitly by applying the superposition principle from [6]. Indeed, let us assume that $h_1$ and $h_2$ are two nonzero initial wavepackets. Then the approximate superposition principle is applicable (in order to put the system in the framework of [6] we use the 4-component extension (115) and set $\varrho = \beta^{\kappa}$, $\kappa > 1$). According to the principle the exact solution $(u_1, u_2)$ is approximated by $\big(v_1(x - \frac{c_1}{\varrho}\tau, \tau),\ v_2(x - \frac{c_2}{\varrho}\tau, \tau)\big)$, which is explicitly given by (50), with the accuracy $O\big(\varrho/\beta^{1+\epsilon}\big) = O\big(\beta^{\kappa-1-\epsilon}\big)$ with arbitrarily small $\epsilon$ if $c_1 \ne c_2$. As is shown in [6], the validity of such an approximate presentation is due to the large difference $\frac{c_1 - c_2}{\varrho}$ of the group velocities of the two wavepackets.
2.2. Dispersive effects and nonlinearity. Based on an elementary example of the Nonlinear Schrödinger equation (NLS),
\[ \partial_\tau u = -\frac{i}{\varrho}\big(\gamma_0 u + i\gamma_1 \partial_x u + \gamma_2 \partial_x^2 u\big) + b|u|^2 u, \qquad u = u(x,\tau), \ x \in \mathbb{R}, \tag{51} \]
with the initial data in the form of a wavepacket $u|_{\tau=0} = \Phi(\beta x)e^{ik_* x}$, we would like to explain here why we are interested mostly in the case
\[ \frac{\varrho}{\beta^2} \ge C > 0, \tag{52} \]
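when the dispersion is not dominant. To make the dependence of $u$ on $\beta$ and $\varrho$ explicit we change the variables
\[ u(x) = v(\beta x)e^{ik_* x}, \qquad \beta x = z, \tag{53} \]
and obtain the equation
\[ \partial_\tau v = -\frac{i}{\varrho}\big(\gamma_0' v + i\beta\gamma_1' \partial_z v + \gamma_2\beta^2 \partial_z^2 v\big) + b|v|^2 v, \qquad v|_{\tau=0} = \Phi(z), \tag{54} \]
where $\gamma_0' = \gamma_0 - \gamma_1 k_* - \gamma_2 k_*^2$ and $\gamma_1' = \gamma_1 + 2\gamma_2 k_*$. Changing variables once more,
\[ v(z,\tau) = e^{-i\tau\gamma_0'/\varrho}\, w\Big(z + \frac{\beta}{\varrho}\gamma_1'\tau, \tau\Big), \qquad z + \frac{\beta}{\varrho}\gamma_1'\tau = y, \tag{55} \]
we obtain for the envelope $w$ the following standard NLS equation:
\[ \partial_\tau w = -\frac{i\beta^2}{\varrho}\gamma_2 \partial_y^2 w + b|w|^2 w, \qquad w|_{\tau=0} = \Phi(y), \qquad 0 \le \tau \le \tau_*, \tag{56} \]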
with initial data independent of the parameters $\beta, \varrho$. The behavior of the solution $w$ to Eq. (56) on the time interval $0 \le \tau \le \tau_*$ is determined by the dispersion parameter $\beta^2/\varrho$, and evidently linear dispersive effects become significant when $\varrho/\beta^2$ is not too large. If $\beta^2/\varrho \to \infty$ and $\beta \to 0$, the solution tends to zero at every fixed $\tau = \tau_0 > 0$. Indeed, if we take $\varrho = \beta^{\kappa}$, $\kappa > 2$, and make another change of variables $\tau = t\beta^{\kappa-2}$, $w = \beta^{1-\kappa/2}W$, Eq. (56) reduces to the following problem with small initial data:
\[ \partial_t W = -i\gamma_2 \partial_y^2 W + b|W|^2 W, \qquad W|_{t=0} = \beta^{\kappa/2-1}\Phi(y). \tag{57} \]
For small enough $\beta$ the solution $W$ to this problem exists for all $t$ and $W(t) \to 0$ as $t \to \infty$ (see [16]). In particular, for $t = \tau_0\beta^{2-\kappa}$ we have $w(\tau_0) \to 0$ when $\beta \to 0$. In the general case, the dependence of the solution on small $\beta, \varrho$ is as follows. The dependence on large $1/\varrho$ in (51) is completely described by the change of variables (55), yielding a
wave which (i) moves as a whole with a large group velocity $-\gamma_1'/\varrho$; (ii) has a slowly evolving shape as described by $v$ and $w$ in (53), (55), (56). The above observations show that for small $\varrho/\beta^2$ the dispersive effects dominate and control the nonlinear ones. Keeping that in mind and being interested in stronger nonlinear effects we focus primarily on the case (52), i.e. $\varrho/\beta^2 \ge C > 0$, for which there are two scenarios of the nonlinear evolution. In the first scenario, when $\beta^2/\varrho \to 0$, the linear dispersion produces only a small correction to the solution of the equation $\partial_\tau w = b|w|^2 w$, with that nonlinear equation governing the nonlinear dynamics of the envelope $w$ for $\tau_*$ smaller than the blow-up time. In the second scenario, when $\beta^2 \sim \varrho$, Eq. (56) becomes independent of $\beta, \varrho$ and describes the evolution of the envelope $w$ governed by an interplay between the dispersion and the nonlinearity. The case $\beta^2 \sim \varrho$ can also be characterized as one where dispersive effects do occur but do not dominate nonlinear effects, and, as is well known, the dispersion can exactly balance the nonlinearity, yielding solitons.
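The regimes discussed in this subsection can be summarized by the size of the coefficient $\beta^2/\varrho$ of the second-derivative term in (56). The sketch below assumes the power-law parametrization $\varrho = \beta^{\kappa}$ used above; the function names and the three regime labels are hypothetical and serve only to illustrate the classification.

```python
def dispersion_parameter(beta, kappa):
    """Return beta**2 / rho for rho = beta**kappa, i.e. the dispersion coefficient in (56)."""
    rho = beta ** kappa
    return beta ** 2 / rho


def regime(kappa):
    """Classify the small-beta behavior of beta**(2 - kappa)."""
    if kappa < 2:
        return "weak dispersion"        # beta^2/rho -> 0: first scenario
    if kappa == 2:
        return "balanced"               # beta^2 ~ rho: second scenario
    return "dispersion dominated"       # beta^2/rho -> infinity: outside (52)
```

For example, with `beta = 0.1` the coefficient is about `0.1` for `kappa = 1`, about `1` for `kappa = 2`, and about `10` for `kappa = 3`, and the gaps widen as `beta` decreases.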
2.3. A coupled modes system. Here we illustrate statements of the general theorem on the wavepacket preservation and the approximate superposition principle by a simple but still nontrivial example. Let us consider a system of two coupled NLS type equations for variables $u_1(x,\tau)$, $u_2(x,\tau)$ with one-dimensional spatial variable $x$,
\[ \partial_\tau u_1 = -\frac{i}{\varrho}\big(\gamma_{01} + i\gamma_{11}\partial_x + \gamma_{21}\partial_x^2\big)u_1 + \big(b_{11}|u_1|^2 + b_{12}|u_2|^2\big)u_1 + c_{12}|u_2|^2 u_2, \tag{58} \]
\[ \partial_\tau u_2 = -\frac{i}{\varrho}\big(\gamma_{02} + i\gamma_{12}\partial_x + \gamma_{22}\partial_x^2\big)u_2 + \big(b_{21}|u_1|^2 + b_{22}|u_2|^2\big)u_2 + c_{22}|u_1|^2 u_1, \tag{59} \]
\[ u_1|_{\tau=0} = h_1(x) = \Phi_1(\beta x)e^{ik_{*1}x}, \qquad u_2|_{\tau=0} = h_2(x) = \Phi_2(\beta x)e^{ik_{*2}x}, \tag{60} \]
where $\gamma_{ij}$ are real and $b_{ij}$ are complex coefficients and the initial data in (60) are in the form of wavepackets with $\Phi_j(y)$ being Schwartz functions. Notice that if in the coupled modes system (58)–(60) $h_2 = 0$ and $c_{12} = c_{22} = 0$, then it has the trivial solution $u_2 = 0$ and reduces to a single NLS equation of the form (51). The dependence of the solution $\{u_1, u_2\}$ on the large $1/\varrho$ is captured by the change of variables (55). Namely, $u_1$ is a wave with a slowly varying envelope described by $v_1$ which moves with the large velocity $-\gamma_{11}'/\varrho$ (with $\gamma_{11}'$ defined as in (54)). The dependence on $\beta$ is of the form $v_1(y,\tau) = w_1(\beta y,\tau)$ (see the following subsection for details). Similarly we can consider the case when $h_1 = 0$, for which the first equation has the trivial solution $u_1 = 0$, so the system (58)–(59) reduces to the single equation (59) with the solution represented by a wave having a large spatial extension proportional to $1/\beta$ and moving with the large velocity $-\gamma_{12}'/\varrho$.
2.3.1. The superposition principle. Let us assume here that $h_1 \ne 0$, $h_2 \ne 0$, $c_{12} \ne 0$, $c_{22} \ne 0$ and $\beta = \varrho^{\kappa}$, $0 < \kappa < 1$. Applying the superposition principle we obtain for generic $k_{*1}$, $k_{*2}$ the following representation of the exact solution:
\[ u_1(x,\tau) = v_1(x,\tau)e^{ik_{*1}x} + D_1, \qquad u_2(x,\tau) = v_2(x,\tau)e^{ik_{*2}x} + D_2, \]
where $v_1(x,\tau)$ is a solution of the NLS equation (58) with $b_{12} = c_{12} = 0$, with $v_2(x,\tau)$ being a solution to the similar decoupled NLS equation (59) with $b_{21} = c_{22} = 0$, and $D_1$ and $D_2$ are small terms satisfying
\[ \sup_{0 \le \tau \le \tau_*}\|D_1(\cdot,\tau)\|_{L^\infty} + \sup_{0 \le \tau \le \tau_*}\|D_2(\cdot,\tau)\|_{L^\infty} \le C\beta^{\kappa'-1-\epsilon} + C\beta, \qquad \kappa' = \kappa^{-1}. \tag{61} \]
We would like to emphasize here that the coupling terms $b_{12}|u_2|^2 u_1 + c_{12}|u_2|^2 u_2$ and $b_{21}|u_1|^2 u_2 + c_{22}|u_1|^2 u_1$ in Eqs. (58)–(59) are not small, whereas their ultimate contributions to the solutions are small. One can interpret that phenomenon as being due to destructive wave interference and the mismatch of group velocities.
2.3.2. Wavepacket preservation. Here we assume that $h_1 \ne 0$, $h_2 = 0$, $c_{12} \ne 0$, $c_{22} \ne 0$ and $\varrho = \beta^{\kappa}$, $0 < \kappa \le 2$. According to the wavepacket preservation we have
\[ u_1(x,\tau) = v_1(x,\tau)e^{ik_{*1}x} + D_1, \qquad u_2(x,\tau) = D_2, \]
where $v_1(x,\tau)$ is a solution of (58) with $b_{12} = 0$, $c_{12} = 0$, and $D_1$ and $D_2$ are small terms satisfying
\[ \sup_{0 \le \tau \le \tau_*}\|D_1(\cdot,\tau)\|_{L^\infty} + \sup_{0 \le \tau \le \tau_*}\|D_2(\cdot,\tau)\|_{L^\infty} \le C\varrho. \]
Notice once more (see the above section) an interesting phenomenon: Eq. (59) for $u_2(x,\tau)$ has a coupling term $b_{21}|u_1|^2 u_2 + c_{22}|u_1|^2 u_1$ which does not become small as $\beta, \varrho \to 0$, but, remarkably, its ultimate contribution to the solution is small.
2.3.3. Limitations of the superposition principle. Now we provide an example based on the system (58)–(60) with $c_{12} = c_{22} = 0$ showing that the above estimate (61) in the superposition principle is sharp in the sense that $\beta^{\kappa'-1-\epsilon}$ cannot be replaced by $\beta^{\kappa'-1+\epsilon}$
with $\kappa' \ge 1$. We set here $\kappa' = 1$ and $\varrho = \beta$. After the change of variables (53) for $u_1$, $u_2$ followed by yet another change of variables $\beta x = z$, $v_1 = e^{-i\tau\gamma_{01}/\beta}w_1$, $v_2 = e^{-i\tau\gamma_{02}/\beta}w_2$, we obtain from (58)–(60) the following system:
\[ \partial_\tau w_1 = -i\big(i\gamma_{11}\partial_z w_1 + \beta\gamma_{21}\partial_z^2 w_1\big) + \big(b_{11}|w_1|^2 + b_{12}|w_2|^2\big)w_1, \]
\[ \partial_\tau w_2 = -i\big(i\gamma_{12}\partial_z w_2 + \beta\gamma_{22}\partial_z^2 w_2\big) + \big(b_{21}|w_1|^2 + b_{22}|w_2|^2\big)w_2, \]
\[ w_1|_{\tau=0} = \Phi_1(z), \qquad w_2|_{\tau=0} = \Phi_2(z). \]
This system has a regular dependence on $\beta$ as $\beta \to 0$, with the solution converging in $L^\infty$ to the solution of the system with $\beta = 0$. If we now set $b_{12} = b_{21} = 0$ in the last system, it turns into a system of two decoupled equations. Notice then that the difference between the solutions of the decoupled system and the original one does not tend to zero as $\beta \to 0$, implying that the superposition principle does not hold when $\varrho = \beta$.
2.4. Wavepacket interaction system with a universal nonlinearity. We will prove in the following sections that the dynamics of a multi-wavepacket with a universally resonance invariant nk-spectrum for a general system can be approximated with the accuracy $O(\varrho)$ by substituting the nonlinearity with a properly constructed universal or weakly universal one. Here we provide an example of a system, called wavepacket interaction
system, with a universal nonlinearity and show that its dynamics preserves simple wavepackets as in (12). It is shown later that universal nonlinearities are related to universally invariant multi-wavepackets in the sense of Definition 18. The wavepacket interaction system with a universal nonlinearity has a form similar to NLS, namely
\[ \partial_\tau u_{j,\zeta} = \frac{1}{\varrho}\Big[\big(-i\zeta\gamma_{0,j} + \gamma_{1,j}\cdot\nabla_r\big)u_{j,\zeta} - i\zeta\nabla_r\cdot\gamma_{2,j}\nabla_r u_{j,\zeta}\Big] + F_{j,\zeta}(\vec u), \qquad r \in \mathbb{R}^d, \tag{62} \]
\[ \vec u = (u_{1,+}, u_{1,-}, \ldots, u_{N,+}, u_{N,-}), \qquad j = 1, \ldots, N, \ \zeta = \pm, \tag{63} \]
\[ u_{j,\zeta}|_{\tau=0} = h_{j,\zeta}, \qquad h_{j,\zeta}(r) = \Phi_{j,\zeta}(\beta r)e^{i\zeta k_{*j}\cdot r}, \tag{64} \]
where for every $j$ the coefficient $\gamma_{0,j} \in \mathbb{R}$, $\gamma_{1,j} \in \mathbb{R}^d$ is a vector, $\gamma_{2,j}$ is a symmetric $d \times d$ matrix, $\gamma_{1,j}\cdot\nabla_r$ is a first order scalar differential operator, $\nabla_r\cdot\gamma_{2,j}\nabla_r$ is a second order scalar differential operator, and the universal polynomial nonlinearities $F_{j,\zeta}$ have the following form:
\[ F_{j,\zeta}(\vec u) = \sum_{\nu=1}^{\nu_F}\ \sum_{|\vec\nu| = \nu} b_{\vec\nu,j,\zeta}\ \prod_{l=1}^{N}\big(u_{l,+}u_{l,-}\big)^{\nu_l}\, u_{j,\zeta}, \qquad \text{where } \vec\nu = (\nu_1, \ldots, \nu_N), \quad j = 1, \ldots, N, \ \zeta = \pm. \tag{65} \]
Remark 8. Notice that if we set $h_{j,-} = h_{j,+}^{*}$, $b_{\vec\nu,j,+} = b_{\vec\nu,j,-}^{*} = b_{\vec\nu,j}$ and $u_{j,+} = u_{j,-}^{*} = u_j$, then $u_{l,+}u_{l,-} = |u_{l,+}|^2$ and $F_{j,+}(\vec u)$ turns into
\[ F_j(u_1, \ldots, u_N) = \sum_{\nu=1}^{\nu_F}\ \sum_{|\vec\nu| = \nu} b_{\vec\nu,j}\ \prod_{l=1}^{N}|u_l|^{2\nu_l}\, u_j, \tag{66} \]
and the equations of (62) with $\zeta = +$ turn into
\[ \partial_\tau u_j = \frac{1}{\varrho}\Big[\big(-i\gamma_{0,j} + \gamma_{1,j}\cdot\nabla_r\big)u_j - i\nabla_r\cdot\gamma_{2,j}\nabla_r u_j\Big] + F_j(u_1, \ldots, u_N), \qquad u_j|_{\tau=0} = h_{j,+}, \quad j = 1, \ldots, N. \tag{67} \]
Obviously, a solution of (67) defines a solution $u_{j,+} = u_j$, $u_{j,-} = u_j^{*}$ of the system (62). In the simplest case $N = 1$, $d = 1$, (67) takes the form of the classical NLS: $\partial_\tau u = \frac{\gamma_1}{\varrho}\partial_x u - i\frac{\gamma_2}{\varrho}\partial_x^2 u + b|u|^2 u$. Note that the universal nonlinearity $F_{j,\zeta}$ has the characteristic property
\[ F_{j,\zeta}\big(e^{i\phi_1 t}u_{1,+}, e^{-i\phi_1 t}u_{1,-}, \ldots, e^{i\phi_N t}u_{N,+}, e^{-i\phi_N t}u_{N,-}\big) = e^{i\zeta\phi_j t}F_{j,\zeta}(u_{1,+}, u_{1,-}, \ldots, u_{N,+}, u_{N,-}), \tag{68} \]
holding for an arbitrary set of values $\phi_i$. We also consider more general nonlinearities $F$ for which (68) holds for a fixed set of frequencies $\phi_l = \omega_{n_l}(k_{*l})$, and call them weakly
universal. We introduce now the averaging operator $A_T$ acting on polynomial functions $F : (\mathbb{C}^2)^N \to (\mathbb{C}^2)^N$ by
\[ (A_T F)_{j,\zeta} = (A_{T,\phi}F)_{j,\zeta} = \frac{1}{T}\int_0^T e^{-i\zeta\phi_j t}F_{j,\zeta}\big(e^{i\phi_1 t}u_{1,+}, e^{-i\phi_1 t}u_{1,-}, \ldots, e^{i\phi_N t}u_{N,+}, e^{-i\phi_N t}u_{N,-}\big)\,dt, \tag{69} \]
where $\phi = (\phi_1, \ldots, \phi_N)$. The operator $A_{T,\phi}$ depends on the frequency vector $\phi = (\phi_1, \ldots, \phi_N)$. If $F$ is a universal polynomial nonlinearity, then $(A_{T,\phi}F)_{j,\zeta} = F_{j,\zeta}$ for any choice of frequencies $\phi_1, \ldots, \phi_N$. Note that the averaging
\[ G_{\mathrm{av},j,\zeta}(\vec u) = \lim_{T\to\infty}(A_{T,\phi}G)_{j,\zeta}(\vec u) \tag{70} \]
is defined for any polynomial nonlinearity $G : (\mathbb{C}^2)^N \to (\mathbb{C}^2)^N$. If $\phi$ is generic, then $G_{\mathrm{av},j,\zeta}(\vec u)$ is always a universal nonlinearity. In the general case, for given frequencies $\phi$, $G_{\mathrm{av},j,\zeta}$ is a weakly universal nonlinearity which might not be universal. Systems with universal nonlinearities have interesting properties which we describe in the following proposition and remark.
Proposition 9. Let $\varrho = \beta$ and $\gamma_{2,j} = 0$. Then the evolution governed by the first order system with a universal nonlinearity (62) preserves simple wavepackets as defined by (12).
Proof. Let $\vec u(\tau)$ be a solution of (62) for $0 \le \tau \le \tau_*$. Using the property (68) we change variables
\[ u_{j,\zeta} = e^{i\zeta k_{*j}\cdot r}\, e^{-i\zeta\gamma_{0,j}\tau/\varrho}\, e^{-i\gamma_{0j,\zeta}\tau/\beta}\, v_{j,\zeta}, \qquad \gamma_{0j,\zeta} = -\zeta\gamma_{1j}\cdot k_{*j}, \tag{71} \]
and obtain from (62)
\[ \partial_\tau v_{j,\zeta} = \frac{1}{\beta}\gamma_{1j}\cdot\nabla_r v_{j,\zeta} + F_{j,\zeta}(\vec v), \qquad v_{j,\zeta}|_{\tau=0} = \Phi_{j,\zeta}(\beta r). \tag{72} \]
Changing variables
\[ v_{j,\zeta}(r,\tau) = w_{j,\zeta}(\beta r,\tau), \qquad \beta r = z, \tag{73} \]
we obtain from (72) that $w_j$ is a solution of the following system of differential equations:
\[ \partial_\tau w_{j,\zeta} = \gamma_{1j}\cdot\nabla_z w_{j,\zeta} + F_{j,\zeta}(\vec w), \qquad w_{j,\zeta}|_{\tau=0} = \Phi_{j,\zeta}(z), \tag{74} \]
which does not depend on $\beta$. Then using (73) and (71) we observe that every component $u_l$ of the solution to (62) has the form of a simple wavepacket for every $\tau \in [0, \tau_*]$, with an envelope $\hat w_j(\tau)$.
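The proof above relies on the gauge property (68). As a quick numerical sanity check, the sketch below evaluates both sides of (68) for the simplest universal nonlinearity, the cubic case $\nu_F = 1$ of (65), in which $F_{j,\zeta} = \sum_l b_l\, u_{l,+}u_{l,-}u_{j,\zeta}$. The function names, the list-based representation of $\vec u$, and the choice of coefficients `b` that depend only on `l` are assumptions made for this illustration.

```python
import cmath


def universal_F(u_plus, u_minus, b, j, zeta):
    """Cubic universal nonlinearity: sum_l b[l] * u_{l,+} * u_{l,-} * u_{j,zeta}."""
    u_j = u_plus[j] if zeta == +1 else u_minus[j]
    return sum(b[l] * u_plus[l] * u_minus[l] for l in range(len(b))) * u_j


def rotate(u_plus, u_minus, phi, t):
    """Apply u_{l,+} -> exp(i*phi_l*t) u_{l,+} and u_{l,-} -> exp(-i*phi_l*t) u_{l,-}."""
    rp = [cmath.exp(1j * p * t) * u for p, u in zip(phi, u_plus)]
    rm = [cmath.exp(-1j * p * t) * u for p, u in zip(phi, u_minus)]
    return rp, rm


def gauge_defect(u_plus, u_minus, b, phi, t, j, zeta):
    """|F(rotated u) - exp(i*zeta*phi_j*t) * F(u)|, which (68) predicts to vanish."""
    rp, rm = rotate(u_plus, u_minus, phi, t)
    lhs = universal_F(rp, rm, b, j, zeta)
    rhs = cmath.exp(1j * zeta * phi[j] * t) * universal_F(u_plus, u_minus, b, j, zeta)
    return abs(lhs - rhs)
```

The defect vanishes up to rounding for any frequencies `phi`, because each product $u_{l,+}u_{l,-}$ is invariant under the rotation and only the last factor $u_{j,\zeta}$ picks up a phase.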
Remark 10. Equations (62) with universal nonlinearities allow special solutions in the form of $u_{j,\zeta} = e^{ik_{*j}\cdot r}e^{-i\gamma_{0j}\tau/\beta}v_{j,\zeta}(\tau)$, where $v_{j,\zeta}(\tau)$ do not depend on $r$. If the initial data in (72) are constants, $\Phi_{j,\zeta}(\beta r) = \Phi_{j,\zeta}(0)$, then (72) turns into a system of ODE. This implies that every linear subspace of pure modal functions with the basis $\{v_{j,+}e^{ik_{*j}\cdot r}, v_{j,-}e^{-ik_{*j}\cdot r}\}$, $j = 1, \ldots, N$, is invariant with respect to the nonlinear equations (62). Another class of special solutions of (62) are time-harmonic solutions of the form $u_{j,\zeta}(r,\tau) = e^{-i\zeta\omega_j\tau}v_{j,\zeta}(r)$, where $v_{j,\zeta}$ solve a nonlinear eigenvalue problem; for universal nonlinearities $\omega_j$ can be considered as an unknown nonlinear eigenvalue. Existence of such special solutions is a special property of universal and weakly universal nonlinearities. It is remarkable that the original nonlinear equations might not have time-harmonic solutions whereas equations with universal nonlinearities which approximate the evolution of wavepackets (see Theorem 7) admit such solutions.
2.5. Invariance of excited modes for finite-dimensional ODE's. Here we discuss the resonance invariance conditions imposed in Theorem 5 in the simpler case of finite-dimensional ODE's. In this case one can also see the rise of universal nonlinearities in the process of time averaging. As we already discussed in the introduction, a PDE system (1) when restricted to constant functions turns into the following system of ODE's:
\[ \partial_\tau U = -\frac{i}{\varrho}L_0 U + F(U), \qquad U(\tau)|_{\tau=0} = h, \quad h \in \mathbb{C}^{2J}, \ U \in \mathbb{C}^{2J}, \tag{75} \]
where $F(U)$ is a polynomial, $U = (U_{1,+}, U_{1,-}, \ldots, U_{J,+}, U_{J,-}) \in \mathbb{C}^{2J}$. We assume that the eigenvalues $\omega_{n,\zeta}(0) = \omega^0_{n,\zeta}$ of the Hermitian matrix $L_0 = L(k)|_{k=0}$ are distinct, $\omega^0_{j,+} \ne \omega^0_{i,+}$ for $j \ne i$, and the symmetry conditions (7) take the form $\omega^0_{n,-\zeta} = -\omega^0_{n,\zeta}$. We also assume that the eigenvectors of $L_0$ coincide with the coordinate orts in $\mathbb{C}^{2J}$. The following limit case of Theorem 5 with $\beta = 0$ shows that solutions to this system have the property of preserving the set of initially excited modes.
Theorem 11. Let the initial data $h = (h_{1,+}, h_{1,-}, \ldots, h_{J,+}, h_{J,-}) \in \mathbb{C}^{2J}$ in (75) have non-zero components $h_{j,\zeta}$ only for a subset $B$ of indices $j \in \{1, \ldots, J\}$, and let $B' = \{1, \ldots, J\} \setminus B$ be its complementary set. Assume that $B$ is resonance invariant in the sense that the resonance equation
\[ \omega^0_{n',\zeta} - \sum_{j=1}^{m}\omega^0_{n_j,\zeta^{(j)}} = 0, \qquad \text{where } n_j \in B, \ \zeta^{(j)} \in \{+,-\}, \tag{76} \]
does not have solutions if $n' \in B'$ (compare with Definition 18 in the special case when all $k_{*l} = 0$). Then under the nonlinear evolution of (75) modes with indices $n' \in B'$ remain essentially unexcited in the following sense:
\[ \sup_{0 \le \tau \le \tau_*}|U_{n'}(\tau)| \le C\varrho \quad \text{for all } n' \in B'. \tag{77} \]
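The statement of Theorem 11 can be observed on a toy example. The sketch below is not the system of the paper: it assumes a hypothetical two-mode model $\partial_\tau U_1 = -\frac{i}{\varrho}\omega_1 U_1 - i|U_1|^2U_1$, $\partial_\tau U_2 = -\frac{i}{\varrho}\omega_2 U_2 + U_1^2$ with $U_1(0) = 1$, $U_2(0) = 0$, so that $B = \{1\}$ and the only candidate resonance is $\omega_2 = 2\omega_1$. It integrates the equations in the slow variables $u_n = e^{i\omega_n\tau/\varrho}U_n$ with a classical RK4 scheme and records the largest value of $|U_2| = |u_2|$.

```python
import cmath


def max_unexcited_amplitude(rho, omega1=1.0, omega2=3.0, tau_end=1.0, steps=4000):
    """Largest |U_2(tau)| on [0, tau_end] for the toy two-mode system (an assumption)."""

    def rhs(tau, u1, u2):
        # Slow variables: the fast linear rotation exp(-i*omega_n*tau/rho) is factored out.
        du1 = -1j * abs(u1) ** 2 * u1  # NLS-like self-interaction, gauge invariant
        # Coupling U_1^2 seen from mode 2 carries the interaction phase (omega2 - 2*omega1)/rho.
        du2 = cmath.exp(1j * (omega2 - 2.0 * omega1) * tau / rho) * u1 * u1
        return du1, du2

    u1, u2 = 1.0 + 0.0j, 0.0 + 0.0j
    dt = tau_end / steps
    peak = 0.0
    for n in range(steps):
        tau = n * dt
        a1, a2 = rhs(tau, u1, u2)
        b1, b2 = rhs(tau + 0.5 * dt, u1 + 0.5 * dt * a1, u2 + 0.5 * dt * a2)
        c1, c2 = rhs(tau + 0.5 * dt, u1 + 0.5 * dt * b1, u2 + 0.5 * dt * b2)
        d1, d2 = rhs(tau + dt, u1 + dt * c1, u2 + dt * c2)
        u1 += dt * (a1 + 2 * b1 + 2 * c1 + d1) / 6.0
        u2 += dt * (a2 + 2 * b2 + 2 * c2 + d2) / 6.0
        peak = max(peak, abs(u2))
    return peak
```

With the non-resonant default `omega2 = 3.0` the recorded peak scales roughly linearly with `rho`, in the spirit of (77), whereas for the resonant choice `omega2 = 2.0` the second mode grows to order one on the same time interval.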
Note that $F(U)$ provides a nonlinear coupling between modes $U_{n_j,\zeta^{(j)}}$ with $n_j \in B$ and $U_{n',\zeta}$ with $n' \in B'$, but the resulting interaction is not $O(1)$ on a fixed time interval $[0, \tau_*]$ as one might expect, but rather of order $O(\varrho)$ as (77) shows. One way to prove Theorem 11 is to follow the proofs of Theorems 35 and 37 with obvious modifications and simplifications. In particular, instead of (15) one has to consider the following system with oscillatory coefficients:
\[ \partial_\tau u = e^{\frac{i\tau}{\varrho}L_0}F\big(e^{-\frac{i\tau}{\varrho}L_0}u\big), \qquad u(\tau)|_{\tau=0} = h. \tag{78} \]
Alternatively, Theorem 11 can be derived directly from the classical time averaging principle. Indeed, the time averaging of (78) yields the following averaged system:
\[ \partial_\tau v = F_{\mathrm{av}}(v), \qquad v(\tau)|_{\tau=0} = h, \]
where $F_{\mathrm{av}}$ is defined as in (69), (70) with the frequencies $\phi_j = \omega^0_{j,+}$. From the Krylov-Bogolyubov averaging theorem (see [11,37]) one obtains
\[ |v(\tau) - u(\tau)| \le C\varrho, \qquad 0 \le \tau \le \tau_*. \]
A straightforward examination shows that if $B$ is resonance invariant and $j \in B'$ then the polynomial components $F_{\mathrm{av},j,\zeta}(v)$ factorize into $F_{\mathrm{av},j,\zeta}(v) = \sum_{j' \in B',\,\zeta'} F^{1}_{\mathrm{av},j',\zeta'}(v)\,v_{j',\zeta'}$, implying (77) since $v_{j,\zeta}(0) = 0$ for $j \in B'$. The stronger universal resonance invariance condition in Definition 18 also takes a simpler form in the ODE case. Indeed, let us collect the terms in (76) at different $\omega^0_{j,+}$ as in (101), namely
\[ \omega^0_{n',\zeta} - \sum_{j=1}^{m}\omega^0_{n_j,\zeta^{(j)}} = \sum_{i=1}^{J}\delta_i\,\omega^0_{i,+}, \qquad \text{where } \delta_i \text{ are integers}. \tag{79} \]
Similarly to Definition 18 we call $B$ universally resonance invariant if every solution to the resonance equation (76) must have $n' \in B$ and every coefficient $\delta_i$ in (79) for the solution is zero, i.e. $\delta_i = 0$, $i = 1, \ldots, J$. Obviously, if all $\omega^0_{n,+}$ are rationally independent then $B$ is universally resonance invariant. Now let us look at how universal nonlinearities arise under time averaging. Observe that if the entire set $\{1, \ldots, J\}$ is universally resonance invariant and $F_{j,\zeta}(v)$ are arbitrary polynomials, then the polynomials $F_{\mathrm{av},j,\zeta}(v)$ are obtained by discarding the oscillating (non-resonant) terms in $e^{\frac{i\tau}{\varrho}L_0}F\big(e^{-\frac{i\tau}{\varrho}L_0}u\big)$, yielding the universal form (65), (66). For example, if $F$ is an arbitrary cubic nonlinearity in $\mathbb{C}^{2N}$ then the time averaging yields an NLS-like nonlinearity $F_{\mathrm{av}}$ with components
\[ F_{\mathrm{av},j,\zeta}\big(u_{1,+}, u_{1,-}, \ldots, u_{N,+}, u_{N,-}\big) = \sum_{l=1}^{N} b_{l,j,\zeta}\,u_{l,+}u_{l,-}u_{j,\zeta}. \]
When $B$ is resonance invariant but not universally resonance invariant the averaging produces a weakly universal nonlinearity. A nonlinearity which is weakly universal but not universal may include additional terms; for example, the cubic nonlinearity in the classical four-wave interaction system, where it is assumed that $\omega^0_{2,-} + \omega^0_{3,+} + \omega^0_{4,+} = \omega^0_{1,+}$ (see [46] p. 201), involves in the equation for $u_{1,+}$ the product $u_{2,-}u_{3,+}u_{4,+}$ in addition to the NLS-like terms.
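The selection rule behind the averaging (69), (70) can be made concrete for monomials. Under the rotation in (69), a monomial $\prod u_{l,s}$ in the $(j,\zeta)$ equation is multiplied by $e^{i(\sum s\phi_l - \zeta\phi_j)t}$, and its time average survives only if that phase vanishes. The sketch below enumerates the cubic monomials that survive; the encoding of a variable $u_{l,s}$ as a pair `(l, s)` with zero-based `l`, the tolerance `tol`, and the sample frequencies are assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement


def surviving_monomials(phi, j, zeta, degree=3, tol=1e-9):
    """Monomials in the variables u_{l,+}, u_{l,-} whose average in the (j, zeta) equation is nonzero."""
    variables = [(l, s) for l in range(len(phi)) for s in (+1, -1)]
    kept = []
    for mono in combinations_with_replacement(variables, degree):
        # Rotating u_{l,s} by exp(i*s*phi_l*t) and multiplying by exp(-i*zeta*phi_j*t)
        # gives the monomial the total phase exp(i*phase*t).
        phase = sum(s * phi[l] for l, s in mono) - zeta * phi[j]
        if abs(phase) < tol:
            kept.append(mono)
    return kept


# Generic frequencies: only the NLS-like terms u_{l,+} u_{l,-} u_{0,+} survive.
generic = (1.0, math.sqrt(2), math.sqrt(3), math.sqrt(5))
# Four-wave resonance -phi_1 + phi_2 + phi_3 = phi_0 (zero-based indices).
resonant = (math.sqrt(3) + math.sqrt(5) - math.sqrt(2), math.sqrt(2), math.sqrt(3), math.sqrt(5))
```

For `generic`, `surviving_monomials(generic, 0, +1)` returns only monomials containing a pair `(l, +1), (l, -1)` together with `(0, +1)`, matching the NLS-like form above. For `resonant` the list additionally contains `((1, -1), (2, 1), (3, 1))`, the analogue of the product $u_{2,-}u_{3,+}u_{4,+}$ in the four-wave example.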
3. Conditions and Definitions
In this section we formulate and discuss definitions and conditions under which we study the nonlinear evolutionary system (1) through its modal, Fourier form (3). Most of the conditions and definitions are naturally formulated for the modal form (3), and this is one of the reasons we use it as the basic form.
3.1. Linear part. The basic properties of the linear part $L(k)$ of the system (3), which is a $2J \times 2J$ Hermitian matrix with eigenvalues $\omega_{n,\zeta}(k)$, have already been discussed in the Introduction. To account for all needed properties of $L(k)$ we define the singular set of points $k$.
Definition 12 (Band-crossing points). We call $k_0$ a band-crossing point for $L(k)$ if $\omega_{n+1,\zeta}(k_0) = \omega_{n,\zeta}(k_0)$ for some $n$, $\zeta$, or $L(k)$ is not continuous at $k_0$, or if $\omega_{1,\pm}(k_0) = 0$; we denote the set of such points by $\sigma_{bc}$.
In the next condition we collect all constraints imposed on the linear operator $L(k)$.
Condition 13 (Linear part). The linear part $L(k)$ of the system (3) is a $2J \times 2J$ Hermitian matrix with eigenvalues $\omega_{n,\zeta}(k)$ and corresponding eigenvectors $g_{n,\zeta}(k)$ satisfying for $k \notin \sigma_{bc}$ the basic relations (5)–(7). In addition to that we assume:
(i) the set of band-crossing points $\sigma_{bc}$ is a closed, nowhere dense set in $\mathbb{R}^d$ and has zero Lebesgue measure;
(ii) the entries of the Hermitian matrix $L(k)$ are infinitely differentiable in $k$ for all $k \notin \sigma_{bc}$, which readily implies via the spectral theory, [35], infinite differentiability of all eigenvalues $\omega_n(k)$ in $k$ for all $k \notin \sigma_{bc}$;
(iii) $L(k)$ satisfies the polynomial bound
\[ \|L(k)\| \le C\big(1 + |k|^p\big), \qquad k \in \mathbb{R}^d, \ \text{for some } C > 0 \text{ and } p > 0. \tag{80} \]
Remark 14 (Dispersion relations symmetry). The symmetry condition (7) on the dispersion relations naturally arises in many physical problems, for example Maxwell equations in periodic media, see [1–3,5], or when $L(k)$ originates from a Hamiltonian. We would like to stress that these symmetry conditions are not imposed to simplify studies but rather to take into account fundamental symmetries of physical media. In fact, the opposite case, when (7) is assumed not to hold, is much simpler. The symmetry creates resonant nonlinear interactions, which makes studies more intricate. Interestingly, many problems without symmetries can be put into the framework with symmetry by an extension of the relevant system (see Sect. 4).
Remark 15 (Band-crossing points). Band-crossing points are discussed in more detail in [1, Sect. 5.4], [2, Sects. 4.1, 4.2]. In particular, generically the set $\sigma_{bc}$ of band-crossing points is a manifold of dimension $d - 2$. Notice that there is a natural ambiguity in the definition of the normalized eigenvectors $g_{n,\zeta}(k)$ of $L(k)$, which are defined up to a complex number $\xi$ with $|\xi| = 1$. This ambiguity may not allow an eigenvector $g_{n,\zeta}(k)$, which can be a locally smooth function in $k$, to be a uniquely defined continuous function in $k$ globally for all $k \notin \sigma_{bc}$ because of a possibility of branching. But, importantly, the orthogonal projector $\Pi_{n,\zeta}(k)$ on $g_{n,\zeta}(k)$ as defined by (11) is uniquely defined and, consequently, infinitely differentiable in $k$ via the spectral theory, [35], for all $k \notin \sigma_{bc}$. Since we consider $\hat U(k)$ as an element of the space $L^1$ and $\sigma_{bc}$ is of zero Lebesgue measure, considering $k \notin \sigma_{bc}$ is sufficient for us.
We introduce for vectors $\hat u \in \mathbb{C}^{2J}$ their expansion with respect to the orthonormal basis $\{g_{n,\zeta}(k)\}$:
\[ \hat u(k) = \sum_{n=1}^{J}\sum_{\zeta=\pm}\hat u_{n,\zeta}(k)\,g_{n,\zeta}(k) = \sum_{n=1}^{J}\sum_{\zeta=\pm}\hat u_{n,\zeta}(k), \qquad \hat u_{n,\zeta}(k) = \Pi_{n,\zeta}(k)\hat u(k), \tag{81} \]
and we refer to it as the modal decomposition of $\hat u(k)$ and to $\hat u_{n,\zeta}(k)$ as the modal coefficients of $\hat u(k)$. Evidently
\[ \sum_{n=1}^{J}\sum_{\zeta=\pm}\Pi_{n,\zeta}(k) = I_{2J}, \qquad \text{where } I_{2J} \text{ is the } 2J \times 2J \text{ identity matrix.} \tag{82} \]
Notice that in view of the polynomial bound (80) we can define the action of the operator $L(-i\nabla_r)$ on any Schwartz function $Y(r)$ by the formula
\[ \widehat{L(-i\nabla_r)Y}(k) = L(k)\hat Y(k), \qquad \text{where the order of } L \text{ does not exceed } p. \tag{83} \]
In the special case when all the entries of $L(k)$ are polynomials, (83) turns into the action of a differential operator with constant coefficients of order not exceeding $p$.
3.2. Nonlinear part. The nonlinear term $\hat F$ in (3) is assumed to be a general functional polynomial of the form
\[ \hat F(\hat U) = \sum_{m \in M_F}\hat F^{(m)}\big(\hat U^m\big), \qquad \text{where } \hat F^{(m)} \text{ is an } m\text{-homogeneous polylinear operator}, \tag{84} \]
\[ M_F = \{m_1, \ldots, m_p\} \subset \{2, 3, \ldots\} \text{ is a finite set, and } m_F = \max\{m : m \in M_F\}. \tag{85} \]
The integer $m_F$ in (85) is called the degree of the functional polynomial $\hat F$. For instance, if $M_F = \{2\}$ or $M_F = \{3\}$ the polynomial $\hat F$ is respectively homogeneous quadratic or cubic. Every $m$-linear operator $\hat F^{(m)}$ in (84) is assumed to be of the form of a convolution
\[ \hat F^{(m)}\big(\hat U_1, \ldots, \hat U_m\big)(k,\tau) = \int_{D_m}\chi^{(m)}(k, \vec k)\,\hat U_1(k')\cdots\hat U_m\big(k^{(m)}(k,\vec k)\big)\,\tilde d^{(m-1)d}\vec k, \tag{86} \]
\[ \text{where } D_m = \mathbb{R}^{(m-1)d}, \quad \tilde d^{(m-1)d}\vec k = \frac{dk'\cdots dk^{(m-1)}}{(2\pi)^{(m-1)d}}, \quad k^{(m)}(k,\vec k) = k - k' - \ldots - k^{(m-1)}, \quad \vec k = \big(k', \ldots, k^{(m)}\big), \tag{87} \]
indicating that the nonlinear operator $F^{(m)}(U_1, \ldots, U_m)$ is translation invariant (it may be local or non-local). The quantities $\chi^{(m)}$ in (86) are called susceptibilities. For numerous examples of nonlinearities of a form similar to (84), (86) see [1–7] and references therein. In what follows the nonlinear term $\hat F$ in (3) will satisfy the following conditions.
Condition 16 (Nonlinearity). The nonlinearity $\hat F(\hat U)$ is assumed to be of the form (84)–(86). The susceptibility $\chi^{(m)}\big(k, k', \ldots, k^{(m)}\big)$ is infinitely differentiable for all $k$ and $k^{(j)}$ which are not band-crossing points, and is bounded, namely
\[ \big\|\chi^{(m)}\big\| = (2\pi)^{-(m-1)d}\sup_{k,k',\ldots,k^{(m)} \in \mathbb{R}^d\setminus\sigma_{bc}}\big|\chi^{(m)}\big(k, k', \ldots, k^{(m)}\big)\big| \le C_\chi, \qquad m \in M_F, \tag{88} \]
where the norm $\big|\chi^{(m)}(k,\vec k)\big|$ of the $m$-linear tensor $\chi^{(m)} : (\mathbb{C}^{2J})^m \to \mathbb{C}^{2J}$ for fixed $k$, $\vec k$ is defined by
\[ \big|\chi^{(m)}(k,\vec k)\big| = \sup_{|x_j| \le 1}\big|\chi^{(m)}(k,\vec k)(x_1, \ldots, x_m)\big|, \qquad \text{where } |x| \text{ is the Euclidean norm.} \tag{89} \]
When $\chi^{(m)}(k,\vec k)$ depend on small $\varrho$ or, more generally, on $\varrho^q$, $q > 0$, we similarly have $\chi^{(m)} = \chi^{(m)}(k,\vec k,\varrho^q)$. Many results of this paper extend to this case; in particular, if $\big\|\chi^{(m)}(k,\vec k,\varrho^q) - \chi^{(m)}(k,\vec k,0)\big\| \le C_\chi\varrho^q$ for $\varrho \le 1$ then the conditions of Corollary 38 are fulfilled. Note that since the tensors $\chi^{(m)}(k,\vec k)$ are bounded, the dependence on $k$, $\vec k$ cannot be polynomial; therefore the original equation (1) does not include spatial derivatives but rather includes bounded "pseudodifferential" operators. Note that this type of susceptibilities with spatial dispersion is common in nonlinear optics, see [15,41,55].
3.3. Resonance invariant nk-spectrum. In this section, relying on given dispersion relations $\omega_n(k) \ge 0$, $n \in \{1, \ldots, J\}$, we consider resonance properties of nk-spectra $S$ and the corresponding k-spectra $K_S$ as defined in Definition 2, i.e.
\[ S = \{(n_l, k_{*l}),\ l = 1, \ldots, N\} \subset \Sigma = \{1, \ldots, J\} \times \mathbb{R}^d, \qquad K_S = \{k_{*l_i},\ i = 1, \ldots, |K_S|\}. \tag{90} \]
We precede the formal description of the resonance invariance (see Definition 18) with the following guiding physical picture. Initially at $\tau = 0$ the wave is a multi-wavepacket composed of modes from a small vicinity of the nk-spectrum $S$. As the wave evolves according to (3) the polynomial nonlinearity inevitably involves a larger set of modes $[S]_{\mathrm{out}} \supseteq S$, but not all modes in $[S]_{\mathrm{out}}$ are "equal" in developing significant amplitudes. The qualitative picture is that whenever a certain interaction phase function (see (134) below) is not zero, the fast time oscillations weaken effective nonlinear mode interaction and the energy transfer from the original modes in $S$ to relevant modes from $[S]_{\mathrm{out}}$, keeping their magnitudes vanishingly small as $\beta, \varrho \to 0$. There is a smaller set of modes $[S]^{\mathrm{res}}_{\mathrm{out}}$ which can interact with modes from $S$ rather effectively and develop significant amplitudes. Now,
\[ \text{if } [S]^{\mathrm{res}}_{\mathrm{out}} \subseteq S \text{ then } S \text{ is called resonance invariant.} \tag{91} \]
In simpler situations the resonance invariance conditions turn into the well-known in nonlinear optics phase and frequency matching conditions. For instance, if S contains n 0 , k∗l0 and the dispersion relations allow for the second harmonic generation in
352
A. Babin, A. Figotin
another band $n_1$ so that $2\omega_{n_0}(k_{*l_0}) = \omega_{n_1}(2k_{*l_0})$, then for $S$ to be resonance invariant it must contain $(n_1, 2k_{*l_0})$ too.

Let us turn now to the rigorous constructions. First we introduce necessary notations. Let $m \ge 2$ be an integer, $\vec l = (l_1,\dots,l_m)$, $l_j \in \{1,\dots,N\}$, be an integer vector from $\{1,\dots,N\}^m$ and $\vec\zeta = (\zeta^{(1)},\dots,\zeta^{(m)})$, $\zeta^{(j)} \in \{+1,-1\}$, be a binary vector from $\{+1,-1\}^m$. Note that a pair $(\vec\zeta,\vec l)$ naturally labels a sample string of length $m$ composed of elements $(\zeta^{(j)}, n_{l_j}, k_{*l_j})$ from the set $\{+1,-1\} \times S$. Let us introduce the sets
\[ \Lambda = \{(\zeta, l) : l \in \{1,\dots,N\},\ \zeta \in \{+1,-1\}\}, \qquad \Lambda^m = \{\vec\lambda = (\lambda_1,\dots,\lambda_m),\ \lambda_j \in \Lambda,\ j = 1,\dots,m\}. \tag{92} \]
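There is a natural one-to-one correspondence between $\Lambda^m$ and $\{-1,1\}^m \times \{1,\dots,N\}^m$ and we will write, exploiting this correspondence,
\[ \vec\lambda = ((\zeta^{(1)}, l_1),\dots,(\zeta^{(m)}, l_m)) = (\vec\zeta, \vec l), \quad \vec\zeta \in \{-1,1\}^m,\ \vec l \in \{1,\dots,N\}^m \quad \text{for } \vec\lambda \in \Lambda^m. \tag{93} \]
Let us introduce the following linear combination:
\[ \kappa_m(\vec\lambda) = \kappa_m(\vec\zeta,\vec l) = \sum_{j=1}^{m} \zeta^{(j)} k_{*l_j} \quad \text{with } \zeta^{(j)} \in \{+1,-1\}, \tag{94} \]
and let $[S]_{K,out}$ be the set of all its values as $k_{*l_j} \in K_S$, $\vec\lambda \in \Lambda^m$, namely
\[ [S]_{K,out} = \bigcup_{m \in M_F} \bigcup_{\vec\lambda \in \Lambda^m} \{\kappa_m(\vec\lambda)\}. \tag{95} \]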
We call $[S]_{K,out}$ the output k-spectrum of $K_S$. Everywhere in this paper we consider nk-spectra $S$ which satisfy the following condition:
\[ [S]_{K,out} \cap \sigma_{bc} = \emptyset. \tag{96} \]
We also define the output nk-spectrum of $S$ by
\[ [S]_{out} = \{(n,k) \in \{1,\dots,J\} \times \mathbb{R}^d : n \in \{1,\dots,J\},\ k \in [S]_{K,out}\}. \tag{97} \]
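We introduce the following functions:
\[ \Omega_{1,m}(\vec\lambda)(\vec k_*) = \sum_{j=1}^{m} \zeta^{(j)} \omega_{l_j}(k_{*l_j}), \quad \vec k_* = (k_{*1},\dots,k_{*|K_S|}), \quad \text{where } k_{*l_j} \in K_S, \tag{98} \]
\[ \Omega(\zeta, n, \vec\lambda)(k_{**}, \vec k_*) = -\zeta\,\omega_n(k_{**}) + \Omega_{1,m}(\vec\lambda)(\vec k_*), \tag{99} \]
where $\zeta = \pm 1$, $m \in M_F$ as in (84). We introduce these functions to apply later to phase functions (134). Now we introduce the resonance equation
\[ \Omega(\zeta, n, \vec\lambda)(\zeta\kappa_m(\vec\lambda), \vec k_*) = 0, \quad \vec l \in \{1,\dots,N\}^m,\ \vec\zeta \in \{-1,1\}^m, \tag{100} \]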
Wavepacket Preservation Under Nonlinear Evolution
353
denoting by $P(S)$ the set of its solutions $(m, \zeta, n, \vec\lambda)$. Such a solution is called S-internal if $(n, \zeta\kappa_m(\vec\lambda)) \in S$, that is $n = n_{l_0}$, $\zeta\kappa_m(\vec\lambda) = k_{*l_0}$, $l_0 \in \{1,\dots,N\}$, and we denote the corresponding $l_0 = I(\vec\lambda)$. We also denote by $P_{int}(S) \subset P(S)$ the set of all S-internal solutions to (100).

Now we consider the simplest solutions to (100), which play an important role. Keeping in mind that the string $\vec l$ can contain several copies of a single value $l$, we can recast the sum in (98) as follows:
\[ \Omega_{1,m}(\vec\lambda) = \Omega_{1,m}(\vec\zeta,\vec l) = \sum_{l=1}^{N} \delta_l\,\omega_l(k_{*l}), \quad \text{where } \delta_l = \begin{cases} \sum_{j \in \vec l^{-1}(l)} \zeta^{(j)} & \text{if } \vec l^{-1}(l) \neq \emptyset, \\ 0 & \text{if } \vec l^{-1}(l) = \emptyset, \end{cases} \tag{101} \]
\[ \vec l^{-1}(l) = \{j : l_j = l,\ 1 \le j \le m\}, \quad \vec l = (l_1,\dots,l_m), \quad 1 \le l \le N. \]
Let us call a solution $(m, \zeta, n, \vec\lambda) \in P(S)$ of (100) universal if it has the following properties: (i) only a single coefficient out of all $\delta_l$ in (101) is nonzero, namely for some $I_0$ we have $\delta_{I_0} = \pm 1$ and $\delta_l = 0$ for $l \neq I_0$; (ii) $n = n_{I_0}$ and $\zeta = \delta_{I_0}$. A justification for calling such a solution universal comes from the fact that if it is a solution for one $\vec k_*$ it is a solution for any other $\vec k_*$. We denote the set of universal solutions to (100) by $P_{univ}(S)$, and note that a universal solution is an S-internal solution with $I(\vec\lambda) = I_0$, implying
\[ P_{univ}(S) \subseteq P_{int}(S). \tag{102} \]
Indeed, observe that for $\delta_l$ as in (101),
\[ \kappa_m(\vec\lambda) = \kappa_m(\vec\zeta,\vec l) = \sum_{j=1}^{m} \zeta^{(j)} k_{*l_j} = \sum_{l=1}^{N} \delta_l k_{*l}, \tag{103} \]
implying $\kappa_m(\vec\lambda) = \delta_{I_0} k_{*I_0}$ and $\zeta\kappa_m(\vec\lambda) = \delta_{I_0}^2 k_{*I_0} = k_{*I_0}$. Then Eq. (100) is obviously satisfied and $(n, \zeta\kappa_m(\vec\lambda)) = (n_{I_0}, k_{*I_0}) \in S$.

Example 17 (Universal solutions). Suppose there is just a single band, i.e. $J = 1$, a symmetric dispersion relation $\omega_1(-k) = \omega_1(k)$, and a cubic nonlinearity $F$ with $M_F = \{3\}$. First let us take the simplest nk-spectrum $S_1 = \{(1, k_*)\}$, that is $N = 1$. Then $\Omega_{1,3}(\vec\lambda)(\vec k_*) = \delta_1\omega_1(k_*)$ and $\kappa_m(\vec\lambda) = \delta_1 k_*$, where we use notation (101). The universal solution set has the form $P_{univ}(S_1) = \{(3, \zeta, 1, \vec\lambda) : \vec\lambda \in \Lambda_\zeta,\ \zeta = \pm\}$, where $\Lambda_+$ consists of vectors $(\lambda_1,\lambda_2,\lambda_3)$ of the form $((-,1),(+,1),(+,1))$, $((+,1),(-,1),(+,1))$ and $((+,1),(+,1),(-,1))$. Obviously, $P_{univ}(S_1) = P_{int}(S_1)$.

In the next example we take the nk-spectrum $S = \{(1,k_*),(1,-k_*)\}$, that is $N = 2$ and $k_{*1} = k_*$, $k_{*2} = -k_*$. This example is typical for two counterpropagating waves. Then
\[ \Omega_{1,3}(\vec\lambda)(\vec k_*) = \sum_{j=1}^{3} \zeta^{(j)} \omega_{l_j}(k_{*l_j}) = (\delta_1 + \delta_2)\,\omega_1(k_*) \quad\text{and}\quad \kappa_m(\vec\lambda) = \sum_{j=1}^{3} \zeta^{(j)} k_{*l_j} = \delta_1 k_{*1} + \delta_2 k_{*2} = (\delta_1 - \delta_2) k_*, \]
where we use notation (101). The universal solution set has the form $P_{univ}(S) = \{(3, \zeta, 1, \vec\lambda) : \vec\lambda \in \Lambda_\zeta,\ \zeta = \pm\}$, where $\Lambda_+$ consists of vectors $(\lambda_1,\lambda_2,\lambda_3)$ of the form $((+,1),(-,1),(+,1))$, $((+,1),(-,1),(+,2))$, $((+,2),(-,2),(+,1))$, $((+,2),(-,2),(+,2))$, and vectors obtained from the listed ones by permutations of coordinates $\lambda_1, \lambda_2, \lambda_3$. The solutions from $P_{int}(S)$ have to satisfy $|\delta_1 - \delta_2| = 1$ and $|\delta_1 + \delta_2| = 1$, which is possible only if $\delta_1\delta_2 = 0$. Since $\zeta = \delta_1 + \delta_2$ we have $\zeta\kappa_m(\vec\lambda) = (\delta_1^2 - \delta_2^2) k_*$ and $\zeta\kappa_m(\vec\lambda) = k_{*1}$ if $|\delta_1| = 1$ or $\zeta\kappa_m(\vec\lambda) = k_{*2}$ if $|\delta_2| = 1$. Hence $P_{int}(S) = P_{univ}(S)$ in this case. Note that if we set $S_2 = \{(1,-k_*)\}$ then $S = S_1 \cup S_2$ but $P_{int}(S)$ is larger than $P_{int}(S_1) \cup P_{int}(S_2)$. This can be interpreted as follows. When only modes from $S_1$ are excited, the modes from $S_2$ remain non-excited. But when both $S_1$ and $S_2$ are excited, there is a resonance effect of $S_1$ onto $S_2$, represented, for example, by $\vec\lambda = ((+,1),(-,1),(+,2))$, which involves the mode $\zeta\kappa_m(\vec\lambda) = k_{*2}$.

Now we are ready to define resonance invariant spectra. First, we introduce a subset $[S]^{res}_{out}$ of $[S]_{out}$ by the formula
\[ [S]^{res}_{out} = \{(n, k_{**}) \in [S]_{out} : k_{**} = \zeta\kappa_m(\vec\lambda),\ m \in M_F, \text{ where } (m,\zeta,n,\vec\lambda) \text{ is a solution of (100)}\}, \tag{104} \]
calling it the resonant output spectrum of $S$, and then we define the resonance selection operation
\[ R(S) = S \cup [S]^{res}_{out}. \tag{105} \]
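The counting in Example 17 can be checked by brute force. The following is a minimal sketch, not the authors' procedure: it assumes one spatial dimension, a single band, and a hypothetical symmetric dispersion relation $\omega(k) = \sqrt{1+k^2}$ with $k_* = 1$ (for which $3\omega(k_*) \neq \omega(3k_*)$). It enumerates all $\vec\lambda \in \Lambda^3$ and both signs $\zeta$, keeps the solutions of the resonance equation (100), and tests the universality conditions (i), (ii).

```python
from itertools import product
from math import sqrt

def omega(k):
    # Hypothetical symmetric single-band dispersion relation (an assumption).
    return sqrt(1.0 + k * k)

def resonance_solutions(k_spectrum, m=3, tol=1e-12):
    """Enumerate solutions of the resonance equation (100) for one band.

    Returns tuples (zeta, signs, indices, deltas), where deltas[p] is the
    coefficient delta_l of (101) for the p-th point of the k-spectrum.
    """
    n_pts = len(k_spectrum)
    found = []
    for signs in product((+1, -1), repeat=m):
        for idx in product(range(n_pts), repeat=m):
            kappa = sum(s * k_spectrum[l] for s, l in zip(signs, idx))
            omega_1m = sum(s * omega(k_spectrum[l]) for s, l in zip(signs, idx))
            deltas = tuple(
                sum(s for s, l in zip(signs, idx) if l == p) for p in range(n_pts)
            )
            for zeta in (+1, -1):
                # Omega(zeta, n, lambda)(zeta*kappa, k_*) = 0, cf. (99), (100)
                if abs(-zeta * omega(zeta * kappa) + omega_1m) < tol:
                    found.append((zeta, signs, idx, deltas))
    return found

def is_universal(zeta, deltas):
    """Conditions (i), (ii): one nonzero delta, equal to +-1 and to zeta."""
    nonzero = [d for d in deltas if d != 0]
    return len(nonzero) == 1 and nonzero[0] == zeta

if __name__ == "__main__":
    for spectrum in ([1.0], [1.0, -1.0]):
        sols = resonance_solutions(spectrum)
        print(spectrum, len(sols), all(is_universal(z, d) for z, _, _, d in sols))
```

With these assumed data every solution found is universal, in agreement with $P_{int} = P_{univ}$ in both cases of the example; for the two-point spectrum the solutions with $\zeta = +$ are exactly the permutations of the four listed strings.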
Definition 18 (Resonance invariant nk-spectrum). The nk-spectrum $S$ is called resonance invariant if $R(S) = S$ or, equivalently, $[S]^{res}_{out} \subseteq S$. The nk-spectrum $S$ is called universally resonance invariant if $R(S) = S$ and $P_{univ}(S) = P_{int}(S)$.

It is worth noticing that even when an nk-spectrum is not resonance invariant, often it can be easily extended to a resonance invariant one. Namely, if $R^j(S) \cap \sigma_{bc} = \emptyset$ for all $j$ then the set
\[ R^\infty(S) = \bigcup_{j=1}^{\infty} R^j(S) \subset \Sigma = \{1,\dots,J\} \times \mathbb{R}^d \]
is resonance invariant. In addition to that, $R^\infty(S)$ is always at most countable. Usually it is finite, i.e. $R^\infty(S) = R^p(S)$ for a finite $p$, see examples below, and we also show below that $R^\infty(S) = S$ for generic $K_S$.

Example 19 (Resonance invariant nk-spectra for quadratic nonlinearity). Suppose there is a single band, i.e. $J = 1$, with a symmetric dispersion relation, and a quadratic nonlinearity $F$, that is $M_F = \{2\}$. Let us assume that $k_* \neq 0$ and that $k_*$, $2k_*$, $0$ are not band-crossing points, and look at two examples. First, suppose that $2\omega_1(k_*) \neq \omega_1(2k_*)$ (no second harmonic generation) and $\omega_1(0) \neq 0$. Let us set the nk-spectrum to be the set $S_1 = \{(1,k_*)\}$; then $S_1$ is resonance invariant. Indeed, $K_{S_1} = \{k_*\}$, $[S_1]_{K,out} = \{0, 2k_*, -2k_*\}$, $[S_1]_{out} = \{(1,0),(1,2k_*),(1,-2k_*)\}$ and an elementary examination shows that $[S_1]^{res}_{out} = \emptyset \subset S_1$, implying $R(S_1) = S_1$. For the second example let us
assume $\omega_1(0) \neq 0$ and $2\omega_1(k_*) = \omega_1(2k_*)$, that is the second harmonic generation is allowed. Here $[S_1]^{res}_{out} = \{(1,2k_*)\}$ and $R(S_1) = \{(1,k_*),(1,2k_*)\}$, implying $R(S_1) \neq S_1$ and, hence, $S_1$ is not resonance invariant. Suppose now that $4k_*, 3k_* \notin \sigma_{bc}$ and $\omega_1(0) \neq 0$, $\omega_1(4k_*) \neq 2\omega_1(2k_*)$, $\omega_1(3k_*) \neq \omega_1(k_*) + \omega_1(2k_*)$, and let us set $S_2 = \{(1,k_*),(1,2k_*)\}$. An elementary examination shows that $S_2$ is resonance invariant. Note that $S_2$ can be obtained by iterating the resonance selection operator, namely $S_2 = R(R(S_1))$. Note also that $P_{univ}(S_2) \neq P_{int}(S_2)$. Notice that $\omega_1(0) = 0$ is a special case since $k = 0$ is a band-crossing point, and it requires a special treatment.

Example 20 (Resonance invariant nk-spectra for cubic nonlinearity). Let us consider the one-band case with a symmetric dispersion relation and a cubic nonlinearity, that is $M_F = \{3\}$. First we take $S_1 = \{(1,k_*)\}$ and assume that $k_*$, $3k_*$ are not band-crossing points, implying $[S_1]_{K,out} = \{k_*, -k_*, 3k_*, -3k_*\}$. We have $\Omega_{1,3}(\vec\lambda)(\vec k_*) = \sum_{j=1}^{3} \zeta^{(j)}\omega_1(k_*) = \delta_1\omega_1(k_*)$ and $\kappa_m(\vec\lambda) = \delta_1 k_*$, where we use notation (101) and $\delta_1$ takes values $1, -1, 3, -3$. If $3\omega_1(k_*) \neq \omega_1(3k_*)$, then (100) has a solution only if $|\delta_1| = 1$ and $\delta_1 = \zeta$, hence $\zeta\kappa_m(\vec\lambda) = k_*$ and every solution is internal. Therefore, $[S_1]^{res}_{out} = \emptyset$ and $R(S_1) = S_1$. Now consider the case associated with the third harmonic generation, namely $3\omega_1(k_*) = \omega_1(3k_*)$, and assume that $\omega_1(3k_*) + 2\omega_1(k_*) \neq \omega_1(5k_*)$, $3\omega_1(3k_*) \neq \omega_1(9k_*)$, $2\omega_1(3k_*) + \omega_1(k_*) \neq \omega_1(7k_*)$, $2\omega_1(3k_*) - \omega_1(k_*) \neq \omega_1(5k_*)$. An elementary examination shows that the set $S_4 = \{(1,3k_*),(1,k_*),(1,-k_*),(1,-3k_*)\}$ satisfies $R(S_4) = S_4$. Consequently, a multi-wavepacket having $S_4$ as its resonance invariant nk-spectrum involves the third harmonic generation and, according to Theorem 3, it is preserved under nonlinear evolution.
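The second-harmonic case of Example 19 can be reproduced with a short computation. The sketch below is only illustrative: it assumes one spatial dimension, a single band and the hypothetical dispersion relation $\omega(k) = 1 + k^2/2$, which is symmetric, has $\omega(0) \neq 0$ and satisfies $2\omega(1) = \omega(2)$, so that second harmonic generation occurs at $k_* = 1$ but not at $k_* = 1/2$. The function `resonance_selection` applies the operation $R$ of (105) once; exact rational arithmetic avoids tolerance issues.

```python
from fractions import Fraction
from itertools import product

def omega(k):
    # Hypothetical symmetric dispersion relation (an assumption), chosen so
    # that 2*omega(1) == omega(2), i.e. second harmonic generation at k = 1.
    return 1 + k * k / 2

def resonance_selection(k_spectrum, m=2):
    """One application of R(S) = S united with [S]^res_out, single band, 1D."""
    points = sorted(k_spectrum)
    result = set(points)
    for signs in product((1, -1), repeat=m):
        for ks in product(points, repeat=m):
            kappa = sum(s * k for s, k in zip(signs, ks))          # (94)
            omega_1m = sum(s * omega(k) for s, k in zip(signs, ks))  # (98)
            for zeta in (1, -1):
                k_out = zeta * kappa
                if -zeta * omega(k_out) + omega_1m == 0:            # (100)
                    result.add(k_out)
    return result

if __name__ == "__main__":
    s1 = {Fraction(1)}
    r1 = resonance_selection(s1)
    print(sorted(r1))                       # spectrum after one step
    print(resonance_selection(r1) == r1)    # is the extended spectrum invariant?
```

In this toy setting one application of $R$ extends $\{k_*\}$ by the second harmonic $2k_*$, and the extended set is left unchanged by a further application, matching the behaviour described for $S_1$ and $S_2$.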
The above examples indicate that in simple cases the conditions on $k_*$ which can make $S$ non-invariant with respect to $R$ have the form of several algebraic equations; therefore, for almost all $k_*$ such spectra $S$ are resonance invariant. The examples also show that if we fix $S$ and the dispersion relations then we can include $S$ in the larger spectrum $S' = R^p(S)$ using repeated application of the operation $R$ to $S$, and often the resulting extended nk-spectrum $S'$ is resonance invariant. We show in the following section that an nk-spectrum $S$ with generic $K_S$ is universally resonance invariant. Note that the concept of resonance invariant nk-spectrum gives a mathematical description of such fundamental concepts of nonlinear optics as phase matching, frequency matching, four wave interaction in cubic media and three wave interaction in quadratic media. If a multi-wavepacket has a resonance invariant spectrum, all these phenomena may take place in the internal dynamics of the multi-wavepacket, but do not lead to resonant interactions with the continuum of all remaining modes.

3.4. Genericity of the nk-spectrum invariance condition. In simpler situations, when the number of bands $J$ and wavepackets $N$ are not too large, the resonance invariance of an nk-spectrum can be easily verified as above in Examples 19, 20, but what can one say if $J$ or $N$ are large, or if the dispersion relations are not explicitly given? We show below that in properly defined non-degenerate cases a small variation of $K_S$ makes $S$ universally resonance invariant, i.e. the resonance invariance is a generic phenomenon. Assume that the dispersion relations $\omega_n(k) \ge 0$, $n \in \{1,\dots,J\}$, are given. Observe then that $\Omega_m(\zeta,n,\vec\lambda) = \Omega_m(\zeta,n,\vec\lambda)(k_{*1},\dots,k_{*|K_S|})$ defined by (99) is a continuous function of $k_{*l} \notin \sigma_{bc}$ for every $m, \zeta, n, \vec\lambda$.
Definition 21 (ω-degenerate dispersion relations). We call dispersion relations $\omega_n(k)$, $n = 1,\dots,J$, ω-degenerate if there exists a point $k_* \in \mathbb{R}^d \setminus \sigma_{bc}$ such that for all $k$ in a neighborhood of $k_*$ at least one of the following four conditions holds: (i) the relations are linearly dependent, namely $\sum_{n=1}^{J} C_n\omega_n(k) = c_0$, where all $C_n$ are integers, one of which is nonzero, and $c_0$ is a constant; (ii) at least one of $\omega_n(k)$ is a linear function; (iii) at least one of $\omega_n(k)$ satisfies the equation $C\omega_n(k) = \omega_n(Ck)$ with some $n$ and integer $C \neq \pm 1$; (iv) at least one of $\omega_n(k)$ satisfies the equation $\omega_n(k) = \omega_{n'}(-k)$, where $n' \neq n$.

Note that fulfillment of any of the four conditions in Definition 21 makes it impossible to turn some non-resonance invariant sets into resonance invariant ones by a variation of $k_{*l}$. For instance, if $M_F = \{2\}$ as in Example 19 and $2\omega_1(k) = \omega_1(2k)$ for all $k$ in an open set $G$, then the set $\{(1,k_*)\}$ with $k_* \in G$ cannot be made resonance invariant by a small variation of $k_*$. Below we show that if the dispersion relations are not ω-degenerate, then a small variation of $k_{*l}$ turns non-resonance invariant sets into resonance invariant ones.

Theorem 22. If $\Omega_m(\zeta,n_0,\vec\lambda)(k_{*1},\dots,k_{*|K_S|}) = 0$ on a cylinder $G$ in $(\mathbb{R}^d \setminus \sigma_{bc})^{|K_S|}$ which is a product of small balls $G_i \subset \mathbb{R}^d \setminus \sigma_{bc}$, then either $(m,\zeta,n_0,\vec\lambda) \in P_{univ}(S)$ or the dispersion relations $\omega_n(k)$ are ω-degenerate as in Definition 21.

Proof. Collecting similar terms in (100) we obtain the following equation for $k_i$ from $G_i$:
\[ \sum_{n=1}^{J}\sum_{i=1}^{|K_S|} \delta_{in}\,\omega_n(k_i) = \zeta\,\omega_{n_0}\Bigl(\sum_{i=1}^{|K_S|} \delta_i k_i\Bigr), \quad \text{where } \delta_{in}, \delta_i \text{ are integers.} \tag{106} \]
Comparing (106) with (101) we see that $\delta_{in}$ may be non-zero only if $(n,k_i) \in S$, that is $(n,k_i) = (n_l,k_l)$ with $l \in \{1,\dots,N\}$, where $l = l(i,n)$ is uniquely determined and $\delta_{in} = \delta_l$ with $\delta_l$ as in (101). If there are two nonzero coefficients $\delta_i$ in (106) we use an elementary Proposition 24 below, noticing that we are in case (ii) of Definition 21. If we do not have two nonzero $\delta_i$ then either all $\delta_i = 0$ or only one $\delta_i = \delta_{i_0} \neq 0$. If all $\delta_i = 0$ then the right-hand side of (106) turns into $\zeta\omega_{n_0}(0)$ and, since $G_i \subset \mathbb{R}^d \setminus \sigma_{bc}$, $\omega_{n_0}(0) \neq 0$. Hence, for every $i$ the sum $\sum_{n=1}^{J} \delta_{in}\omega_n(k_i)$ is constant, one of $\delta_{in}$ is non-zero and we are in case (i) of Definition 21. If only one $\delta_i \neq 0$ with $i = i_0$ we have
\[ \sum_{n=1}^{J}\sum_{i=1}^{|K_S|} \delta_{in}\,\omega_n(k_i) = \zeta\,\omega_{n_0}(\delta_{i_0} k_{i_0}) \quad \text{for all } k_i \in G_i,\ k_{i_0} \in G_{i_0}, \tag{107} \]
implying linear dependence of the dispersion relations, namely
\[ \sum_{n=1}^{J} \delta_{in}\,\omega_n(k_i) = C_i, \quad i \neq i_0, \quad \text{where } C_i \text{ are constant.} \]
The above equations would not imply linear dependence as in case (i) of Definition 21 only if
\[ \delta_{in} = 0, \quad i \neq i_0,\ n = 1,\dots,J, \tag{108} \]
and in this case the equality (107) takes the form
\[ \sum_{n=1}^{J} \delta_{i_0 n}\,\omega_n(k_{i_0}) = \zeta\,\omega_{n_0}(\delta_{i_0} k_{i_0}) \quad \text{for all } k_{i_0} \in G_{i_0}. \tag{109} \]
Note that in this case we deduce from (94) and (98) that $\sum_{n=1}^{J} \delta_{i_0 n} = \delta_{i_0}$. If $\delta_{i_0} \neq \pm 1$ we are in case (iii) of Definition 21, whereas if $\delta_{i_0} = \pm 1$ and $n \neq n_0$ we are in case (iv) of Definition 21. If $\delta_{i_0} = \pm 1$ and $n = n_0$, (109) turns into $\delta_{i_0}\omega_{n_0}(k_{i_0}) = \zeta\omega_{n_0}(\delta_{i_0} k_{i_0})$. Since $\omega_{n_0} > 0$ it implies $\delta_{i_0} = \zeta$ and $\omega_{n_0}(k_{i_0}) = \omega_{n_0}(\zeta\delta_{i_0} k_{i_0})$. Hence, in this case $(m,\zeta,n_0,\vec\lambda) \in P_{univ}(S)$, and since all possibilities are exhausted the proof is complete.

Theorem 23 (Genericity of resonance invariance). Assume that the dispersion relations $\omega_n(k)$ are continuous and not ω-degenerate as in Definition 21. Let $K_{rinv}$ be the set of points $(k_{*1},\dots,k_{*|K_S|})$ such that there exists a universally resonance invariant nk-spectrum $S$ for which its k-spectrum is $K_S = \{k_{*1},\dots,k_{*|K_S|}\}$. Then $K_{rinv}$ is an open and everywhere dense set in $(\mathbb{R}^d \setminus \sigma_{bc})^{|K_S|}$.

Proof. The fact that $K_{rinv}$ is open follows from Definition 18 and the continuity in $k$ of the dispersion relations $\omega_n(k)$. Let $G$ be a small open ball such that its closure $\bar G \subset (\mathbb{R}^d \setminus \sigma_{bc})^{|K_S|}$. It suffices to prove that $\bar G \cap K_{rinv}$ contains at least one point $(k_{*1},\dots,k_{*|K_S|})$. For a given finite set $M_F$ let us consider all possible
\[ (m,\zeta,n_0,\vec\lambda) \in \bigcup_{m \in M_F} \{m\} \times \{-1,1\} \times \{1,\dots,J\} \times \Lambda^m \]
which are not universal solutions to (100), and for a given $(m,\zeta,n_0,\vec\lambda)$ let $G_0(m,\zeta,n_0,\vec\lambda)$ be the set of solutions $(k_1,\dots,k_{|K_S|})$ to (100) in $\bar G$, and notice that it is a closed set. Let now $G_0(S) \subset \bar G$ be the union of the sets $G_0(m,\zeta,n_0,\vec\lambda)$ over all $(m,\zeta,n_0,\vec\lambda) \in P(S) \setminus P_{univ}(S)$ and let us show that $G_0(S) \neq \bar G$. Indeed, suppose that $G_0(S) = \bar G$ and hence $\bar G$ is a finite union of closed sets. According to Baire's theorem one of the sets $G_0(m,\zeta,n_0,\vec\lambda)$ with $(m,\zeta,n_0,\vec\lambda) \in P(S) \setminus P_{univ}(S)$ must have a nonempty interior. Then, according to Theorem 22, the dispersion relations $\omega_n(k)$ are ω-degenerate as in Definition 21, contradicting the conditions of the theorem. Hence, there is always a point $(k_{*1},\dots,k_{*|K_S|}) \in P(S) \setminus P_{univ}(S)$, that completes the proof.

The proof of the next statement is elementary and we skip it.

Proposition 24. Let $f_1(k)$, $f_2(k)$, $f_3(k)$ be real-valued and continuous functions respectively in neighborhoods of $k_{*1}$, $k_{*2}$, $k_{*1} + k_{*2}$ in $\mathbb{R}^d$. Assume that the equation
\[ f_1(k_1) + f_2(k_2) = f_3(\delta_1 k_1 + \delta_2 k_2) + C_0 \]
holds in these neighborhoods, where $C_0$, $\delta_1$, $\delta_2$ are constants and $\delta_1\delta_2 \neq 0$. Then all three functions $f_1(k)$, $f_2(k)$, $f_3(k)$ are linear in neighborhoods of $k_{*1}$, $k_{*2}$, $k_{*1} + k_{*2}$ respectively.
4. Reduction to a Standard Framework

Many well known nonlinear evolutionary equations and systems can be easily reduced to the framework of (1), (3) involving two small parameters $\varrho$ and $\beta$ and characterized by the following properties: (i) the linear part $L$ has a large factor $\frac{1}{\varrho}$ before it; (ii) the nonlinearity $F(U)$ is independent of $\varrho$, $\beta$ or depends on $\varrho$ regularly; (iii) the initial data depend on $\beta$ so that they do not vanish as $\beta \to 0$; (iv) the solutions are considered on the time interval $0 \le \tau \le \tau_*$, where $\tau_* > 0$ does not depend on $\varrho$, $\beta$. Notice that solutions to (1), (3) under the above conditions exhibit nonlinear effects uniformly with respect to small $\varrho$, $\beta$ on the time interval $0 \le \tau \le \tau_*$. There are important classes of problems which can be readily reduced to the framework of (1), (3) by a simple rescaling.

Systems with a small factor before the nonlinearity. Consider a problem of the form
\[ \partial_t v = -iLv + \alpha f(v), \quad v|_{t=0} = h, \quad 0 < \alpha \ll 1, \tag{110} \]
where the initial data are bounded uniformly in $\alpha$. Such problems are reduced to (1) by the time rescaling $\tau = t\alpha$. Note that now $\varrho = \alpha$ and the finite time interval $0 \le \tau \le \tau_*$ corresponds to the long time interval $0 \le t \le \tau_*/\alpha$.

Systems with small initial data on long time intervals. The equation here is
\[ \partial_t v = -iLv + f_0(v), \quad v|_{t=0} = \alpha_0 h, \quad 0 < \alpha_0 \ll 1, \quad \text{where } f_0(v) = f_0^{(m)}(v) + f_0^{(m+1)}(v) + \dots, \tag{111} \]
and $f^{(m)}(v)$ is a homogeneous polynomial of degree $m \ge 2$. After the rescaling $v = \alpha_0 V$, we obtain the following equation with a small nonlinearity:
\[ \partial_t V = -iLV + \alpha_0^{m-1}\bigl[f_0^{(m)}(V) + \alpha_0 f_0^{(m+1)}(V) + \dots\bigr], \quad V|_{t=0} = h, \tag{112} \]
which is of the form of (110) with $\alpha = \alpha_0^{m-1}$. Note that nonlinearities $f$ in (110) which are obtained from problems with small initial data and regular nonlinearities $f_0(v)$ have a special form. Namely, they are almost homogeneous, $f(V) = f_0^{(m)}(V) + \alpha[\dots]$ with leading term $f_0^{(m)}(V)$. Introducing the slow time variable $\tau = t\alpha_0^{m-1}$ we get from the above an equation of the form (1), namely
\[ \partial_\tau V = -\frac{i}{\alpha_0^{m-1}} LV + f^{(m)}(V) + \alpha_0 f^{(m+1)}(V) + \dots, \quad V|_{\tau=0} = h, \tag{113} \]
where the nonlinearity does not vanish as $\alpha_0 \to 0$. In this case $\varrho = \alpha_0^{m-1}$ and the finite time interval $0 \le \tau \le \tau_*$ corresponds to the long time interval $0 \le t \le \tau_*/\alpha_0^{m-1}$ with small $\alpha_0 \ll 1$. Note that Corollary 38 for $\varrho$-dependent nonlinearities can be applied to this case. This allows, in particular, to apply the results of this paper to the Sine-Gordon equation where $f_0(v) = \sin v$. Note that a different rescaling $\tau = t\alpha_0^m$ with $\varrho = \alpha_0^m$ would produce a large term $\varrho^{-1/m} f^{(m)}(V)$. If the term $f^{(m)}(V)$ is non-resonant for the initial data $h$, such a term still produces a small contribution to the solution on the interval $t \le \tau_*/\alpha_0^m$ with small $\tau_*$. The approach of this paper can be applied to this moderately singular case as well, but it would require more technical efforts and for the sake of simplicity we restrict ourselves to the regular case. The interaction of quadratic ($m = 2$) nonlinearity with the cubic term of the 1D model equation of the form (111) was studied by Schneider [51].
High-frequency carrier waves. Sometimes a high spatial frequency of carrier waves in the initial wavepackets after a rescaling creates a large parameter $\frac{1}{\varrho}$ at the linear part. For example, the Nonlinear Schrödinger equation
\[ \partial_\tau U = -i\partial_x^2 U + i\alpha|U|^2 U, \quad U|_{\tau=0} = h_1(\beta x)e^{iMk_{*1}x} + h_2(\beta x)e^{iMk_{*2}x} + \text{c.c.}, \tag{114} \]
where c.c. stands for the complex conjugate of the prior term and $M \gg 1$ is a large parameter, can be recast in the form (1). Indeed, changing variables $y = Mx$ in the above equation we obtain
\[ \partial_\tau U = -i\frac{1}{\varrho}\partial_y^2 U + i\alpha|U|^2 U, \quad U|_{\tau=0} = h_1(\beta_1 y)e^{ik_{*1}y} + h_2(\beta_1 y)e^{ik_{*2}y} + \text{c.c.}, \]
where $\beta_1 = \frac{\beta}{M} \ll 1$, $\varrho = \frac{1}{M^2} \ll 1$. Note that though the nonlinearity $|U|^2 U$ in (114) is not complex homogeneous, it can be considered as a restriction of a system with a complex homogeneous nonlinearity, as (67) is a restriction of (62).
First order hyperbolic equations and systems. Consider now the system (45), (46) for which the symmetry (7) does not hold. The system can be put into the standard framework by formally adding two more equations
\[ \partial_\tau w_1 = \frac{c_1}{\varrho}\partial_x w_1 + F_1(w_1,w_2), \quad \partial_\tau w_2 = \frac{c_2}{\varrho}\partial_x w_2 + F_2(w_1,w_2), \quad w_1|_{\tau=0} = 0,\ w_2|_{\tau=0} = 0, \tag{115} \]
which have only the trivial solution $w_1 = w_2 = 0$, not affecting the solutions to the original system (45), (46). The extended system has the linear part with two-band dispersion relations $\omega_{1,\zeta}(k) = c_1\zeta|k|$, $\omega_{2,\zeta}(k) = c_2\zeta|k|$, $\zeta = \pm$, satisfying evidently (7).

5. Integrated Evolution Equation

Using the variation of constants formula we recast the modal evolution equation (3) into the following equivalent integral form:
\[ \hat U(k,\tau) = \int_0^\tau e^{-\frac{i(\tau-\tau')}{\varrho}L(k)}\,\hat F(\hat U)(k,\tau')\,d\tau' + e^{-\frac{i\tau}{\varrho}L(k)}\,\hat h(k), \quad \tau \ge 0. \tag{116} \]
Then we factor $\hat U(k,\tau)$ into the slow variable $\hat u(k,\tau)$ and the fast oscillatory term as in (14), namely
\[ \hat U(k,\tau) = e^{-\frac{i\tau}{\varrho}L(k)}\,\hat u(k,\tau), \quad \hat U_{n,\zeta}(k,\tau) = \hat u_{n,\zeta}(k,\tau)\,e^{-\frac{i\tau}{\varrho}\zeta\omega_n(k)}, \tag{117} \]
where $\hat u_{n,\zeta}(k,\tau)$ are the modal coefficients of $\hat u(k,\tau)$ as in (81). Notice that $\hat u_{n,\zeta}(k,\tau)$ in (117) may depend on $\varrho$, and (117) is just a change of variables and not an assumption. Consequently we obtain the following integrated evolution equation for $\hat u = \hat u(k,\tau)$, $\tau \ge 0$:
\[ \hat u(k,\tau) = F(\hat u)(k,\tau) + \hat h(k), \quad F(\hat u) = \sum_{m \in M_F} F^{(m)}(\hat u^m)(k,\tau), \tag{118} \]
\[ F^{(m)}(\hat u^m)(k,\tau) = \int_0^\tau e^{\frac{i\tau'}{\varrho}L(k)}\,\hat F_m\Bigl(\bigl(e^{-\frac{i\tau'}{\varrho}L(\cdot)}\hat u\bigr)^m\Bigr)(k,\tau')\,d\tau', \tag{119} \]
where $\hat F_m$ are defined by (84) and (86) in terms of the susceptibilities $\chi^{(m)}$, and $F^{(m)}$ are bounded as in the following lemma.

Lemma 25 (Boundedness of multilinear operators). $F^{(m)}$ defined by (86), (119) is a bounded operator from $E = C([0,\tau_*], L^1)$ into $C^1([0,\tau_*], L^1)$ satisfying
\[ \|F^{(m)}(\hat u_1 \dots \hat u_m)\|_E \le \tau_* \|\chi^{(m)}\| \prod_{j=1}^{m} \|\hat u_j\|_E, \tag{120} \]
\[ \|\partial_\tau F^{(m)}(\hat u_1 \dots \hat u_m)\|_E \le \|\chi^{(m)}\| \prod_{j} \|\hat u_j\|_E. \tag{121} \]
Proof. Notice that since $L(k)$ is Hermitian, $\|\exp\{-iL(k)\tau_1/\varrho\}\| = 1$. Using the Young inequality
\[ \|\hat u * \hat v\|_{L^1} \le \|\hat u\|_{L^1}\|\hat v\|_{L^1}, \tag{122} \]
together with (86), (119) we obtain
\[ \|F^{(m)}(\hat u_1 \dots \hat u_m)(\cdot,\tau)\|_{L^1} \le \sup_{k,\vec k}\|\chi^{(m)}(k,\vec k)\| \int_0^\tau \int_{\mathbb{R}^d}\int_{D_m} |\hat u_1(k')| \dots |\hat u_m(k^{(m)}(k,\vec k))|\,dk' \dots dk^{(m-1)}\,dk\,d\tau_1 \]
\[ \le \|\chi^{(m)}\| \int_0^\tau \|\hat u_1(\tau_1)\|_{L^1} \dots \|\hat u_m(\tau_1)\|_{L^1}\,d\tau_1 \le \tau_* \|\chi^{(m)}\|\,\|\hat u_1\|_E \dots \|\hat u_m\|_E, \]
proving (120). Similarly we prove (121) by
\[ \|\partial_\tau F^{(m)}(\hat u_1 \dots \hat u_m)(\cdot,\tau)\|_{L^1} \le \|\chi^{(m)}\| \int_{\mathbb{R}^d}\int_{D_m} |\hat u_1(k')| \dots |\hat u_m(k^{(m)}(k,\vec k))|\,dk' \dots dk^{(m-1)}\,dk \le \|\chi^{(m)}\|\,\|\hat u_1\|_E \dots \|\hat u_m\|_E. \]
Equation (118) can be recast as the following abstract equation in a Banach space:
\[ \hat u = F(\hat u) + \hat h, \quad \hat u, \hat h \in E, \tag{123} \]
and it readily follows from Lemma 25 that $F(\hat u)$ has the following properties.

Lemma 26. The operator $F(\hat u)$ defined by (118)–(119) satisfies the Lipschitz condition
\[ \|F(\hat u_1) - F(\hat u_2)\|_E \le \tau_* C_F \|\hat u_1 - \hat u_2\|_E, \tag{124} \]
where $C_F \le C_\chi m_F^2 (4R)^{m_F - 1}$ if $\|\hat u_1\|_E, \|\hat u_2\|_E \le 2R$, with $C_\chi$ as in (88).

We also will use the following form of the contraction principle.
Lemma 27 (Contraction principle). Consider the equation
\[ x = F(x) + h, \quad x, h \in B, \tag{125} \]
where $B$ is a Banach space and $F$ is an operator in $B$. Suppose that for some constants $R_0 > 0$ and $0 < q < 1$ we have
\[ \|h\| \le R_0, \quad \|F(x)\| \le R_0 \ \text{ if } \|x\| \le 2R_0, \tag{126} \]
\[ \|F(x_1) - F(x_2)\| \le q\|x_1 - x_2\| \ \text{ if } \|x_1\|, \|x_2\| \le 2R_0. \tag{127} \]
Then there exists a unique solution $x$ to Eq. (125) such that $\|x\| \le 2R_0$. Let $\|h_1\|, \|h_2\| \le R_0$; then the two corresponding solutions $x_1$, $x_2$ satisfy
\[ \|x_1\|, \|x_2\| \le 2R_0, \quad \|x_1 - x_2\| \le (1-q)^{-1}\|h_1 - h_2\|. \tag{128} \]
Let $x_1$, $x_2$ be the solutions of two equations of the form (125) with $F_1$, $h_1$ and $F_2$, $h_2$, respectively. Assume that $F_1$ satisfies (126), (127) with a Lipschitz constant $q < 1$ and that $\|F_1(x) - F_2(x)\| \le \delta$ for $\|x\| \le 2R_0$. Then
\[ \|x_1 - x_2\| \le (1-q)^{-1}(\delta + \|h_1 - h_2\|). \tag{129} \]
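The statement of Lemma 27 can be made concrete with a scalar toy problem. The sketch below is only an illustration under stated assumptions: the Banach space is taken to be $B = \mathbb{R}$, the operator is the hypothetical $F(x) = 0.2x^2$, and $R_0 = 1$, so that (126) and (127) hold with $q = 0.8$. The solution is computed by Picard iteration, which is the standard constructive proof of the contraction principle rather than anything specific to the paper.

```python
def solve_fixed_point(F, h, tol=1e-13, max_iter=10_000):
    """Picard iteration x_{k+1} = F(x_k) + h started from x_0 = 0."""
    x = 0.0
    for _ in range(max_iter):
        x_next = F(x) + h
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical scalar model (B = R, R0 = 1): for |x| <= 2 we have
# |F(x)| <= 0.8 <= R0 and |F(x1) - F(x2)| <= 0.8 * |x1 - x2|, so q = 0.8.
R0, q = 1.0, 0.8

def F(x):
    return 0.2 * x * x

x1 = solve_fixed_point(F, 1.0)   # data h1 = 1.0
x2 = solve_fixed_point(F, 0.5)   # data h2 = 0.5

if __name__ == "__main__":
    print(x1, x2)
    print(abs(x1 - x2), abs(1.0 - 0.5) / (1.0 - q))  # left and right sides of (128)
```

Both computed solutions stay in the ball of radius $2R_0$, and their distance is controlled by the distance of the data as in (128).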
Lemma 26 and the contraction principle as in Lemma 27 imply the following existence and uniqueness theorem.

Theorem 28. Let $\|\hat h\|_E \le R$, and let $\tau_* < 1/C_F$, where $C_F$ is a constant from Lemma 26. Then Eq. (118) has a solution $\hat u \in E = C([0,\tau_*], L^1)$ which satisfies $\|\hat u\|_E \le 2R$, and such a solution is unique.

The following existence and uniqueness theorem follows from Theorem 28.

Theorem 29. Let (3) satisfy (88) and $\hat h \in L^1(\mathbb{R}^d)$, $\|\hat h\|_{L^1} \le R$. Then there exists a unique solution to the modal evolution equation (3) in the functional space $C^1([0,\tau_*], L^1)$. The number $\tau_*$ depends on $R$ and $C_\chi$.

Using the inequality (21) and applying the inverse Fourier transform we readily obtain the existence of an F-solution of (1) in $C^1([0,\tau_*], L^\infty(\mathbb{R}^d))$ from the existence of the solution of Eq. (3) in $C^1([0,\tau_*], L^1)$. The existence of F-solutions in spaces of spatially smooth functions can be derived by replacing Lemma 25 with an estimate similar to the one in Lemma 50.

Let us recast now the system (118)–(119) into modal components using the projections $\Pi_{n,\zeta}(k)$ as in (11). The first step is to introduce elementary modal susceptibilities $\chi^{(m)}_{n,\zeta,\vec\xi}$ having one-dimensional range in $\mathbb{C}^{2J}$ and vanishing if one of its arguments $\hat u_j$ belongs to a $(2J-1)$-dimensional linear subspace in $\mathbb{C}^{2J}$ (the $j$th null-space of $\chi^{(m)}_{n,\zeta,\vec\xi}$). For example, in the linear case $m = 1$, when $\chi^{(1)}$ acts in $\mathbb{C}^{2J}$ and is presented in the standard orthonormal basis $e_{n,\zeta}$ in $\mathbb{C}^{2J}$ by a $2J \times 2J$ matrix with elements $a^{(1)}_{\xi,\xi'} = a^{(1)}_{n,\zeta,n',\zeta'}$, where the index $\xi = (n,\zeta)$ takes $2J$ values, the action of the elementary susceptibility $\chi^{(1)}_{n,\zeta,n',\zeta'}$ on a vector $v \in \mathbb{C}^{2J}$ is given by the formula $\chi^{(1)}_{n,\zeta,n',\zeta'} v = a^{(1)}_{n,\zeta,n',\zeta'}\,(v \cdot e_{n',\zeta'})\,e_{n,\zeta}$. Obviously $\chi^{(1)}_{n,\zeta,n',\zeta'} v = \Pi_{n,\zeta}\chi^{(1)}\Pi_{n',\zeta'} v$ and $\chi^{(1)} v = \sum_{n,\zeta,n',\zeta'} \chi^{(1)}_{n,\zeta,n',\zeta'} v$. The general definition follows.
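Definition 30 (Elementary susceptibilities). Let
\[ \vec\xi = (\vec n, \vec\zeta) \in \{1,\dots,J\}^m \times \{-1,1\}^m = \Xi^m, \quad (n,\zeta) \in \Xi, \tag{130} \]
and $\chi^{(m)}(k,\vec k)[\hat u_1(k'),\dots,\hat u_m(k^{(m)})]$ be the $m$-linear symmetric tensor (susceptibility) as in (86). We introduce elementary susceptibilities $\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k) : (\mathbb{C}^{2J})^m \to \mathbb{C}^{2J}$ as $m$-linear tensors defined for almost all $k$, $\vec k$ by the following formula:
\[ \chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)[\hat u_1(k'),\dots,\hat u_m(k^{(m)})] = \chi^{(m)}_{n,\zeta,\vec n,\vec\zeta}(k,\vec k)[\hat u_1(k'),\dots,\hat u_m(k^{(m)})] \]
\[ = \Pi_{n,\zeta}(k)\,\chi^{(m)}(k,\vec k)\bigl[\Pi_{n_1,\zeta'}(k')\hat u_1(k'),\dots,\Pi_{n_m,\zeta^{(m)}}(k^{(m)}(k,\vec k))\,\hat u_m(k^{(m)}(k,\vec k))\bigr]. \tag{131} \]
Then using (82) and the elementary susceptibilities (131) we get
\[ \chi^{(m)}(k,\vec k)[\hat u_1(k'),\dots,\hat u_m(k^{(m)})] = \sum_{n,\zeta}\sum_{\vec\xi} \chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)[\hat u_1(k'),\dots,\hat u_m(k^{(m)})]. \tag{132} \]
Consequently the modal components $F^{(m)}_{n,\zeta,\vec\xi}$ of the operators $F^{(m)}$ in (119) are $m$-linear oscillatory integral operators defined in terms of the elementary susceptibilities (132) as follows.

Definition 31 (Interaction phase). Using notations from (86) we introduce for $\vec\xi = (\vec n,\vec\zeta) \in \Xi^m$ the operator
\[ F^{(m)}_{n,\zeta,\vec\xi}(\tilde u_1 \dots \tilde u_m)(k,\tau) = \int_0^\tau \int_{D_m} \exp\Bigl\{i\phi_{n,\zeta,\vec\xi}(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\}\,\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)\bigl[\tilde u_1(k',\tau_1),\dots,\tilde u_m(k^{(m)}(k,\vec k),\tau_1)\bigr]\,\tilde d^{(m-1)d}\vec k\,d\tau_1 \tag{133} \]
with the interaction phase function $\phi$ defined by
\[ \phi_{n,\zeta,\vec\xi}(k,\vec k) = \phi_{n,\zeta,\vec n,\vec\zeta}(k,\vec k) = \zeta\omega_n(\zeta k) - \zeta'\omega_{n_1}(\zeta' k') - \dots - \zeta^{(m)}\omega_{n_m}(\zeta^{(m)} k^{(m)}), \quad k^{(m)} = k^{(m)}(k,\vec k). \tag{134} \]
Using $F^{(m)}_{n,\zeta,\vec\xi}$ in (133) we recast $F^{(m)}(\hat u^m)$ in the system (118)–(119) as
\[ F^{(m)}[\hat u_1 \dots \hat u_m](k,\tau) = \sum_{n,\zeta,\vec\xi} F^{(m)}_{n,\zeta,\vec\xi}[\hat u_1 \dots \hat u_m](k,\tau), \tag{135} \]
yielding the following system for the modal components $\hat u_{n,\zeta}(k,\tau)$ as in (11):
\[ \hat u_{n,\zeta}(k,\tau) = \sum_{m \in M_F}\sum_{\vec\xi \in \Xi^m} F^{(m)}_{n,\zeta,\vec\xi}(\hat u^m)(k,\tau) + \hat h_{n,\zeta}(k), \quad (n,\zeta) \in \Xi. \tag{136} \]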
6. Wavepacket Interaction System The wavepacket preservation property of the nonlinear evolutionary system in any of its forms (1), (3), (118), (123), (136) is not easy to see directly. It turns out though that dynamics of wavepackets is well described by a system in a larger space E 2N based on the original equation (118) in the space E. We call it a wavepacket interaction system, which is useful in three ways: (i) the wavepacket preservation is quite easy to see and verify; (ii) it can be used to prove the wavepacket preservation for the original nonlinear problem; (iii) it can be used to study more subtle properties of the original problem, such as NLS approximation. We start with the system (118) where hˆ (k) is a multiwavepacket with a given nk-spectrum S = {(k∗l , nl ), l = 1, . . . , N } as in (31) and k-spectrum K S = {k∗i , i = 1, . . . , |K S |} as in (32). When constructing the wavepacket interaction system it is convenient to have relevant functions to be explicitly localized about the k-spectrum K S of the initial data. We implement that by making up the following cutoff functions based on (25), (26), i,ϑ (k) = (k, ϑk∗i ) = β −(1− ) (k − ϑk∗i ) , k∗i ∈ K S , i = 1, . . . , |K S |, ϑ = ± (137) with as in Definition 1 and β > 0 small enough to satisfy β 1/2 ≤ π 0 , where π 0 = π 0 (S)
0 and τ ∗ > 0 such that E
∈ E 2N which satisfies w E 2N ≤ R1 and such a solution is Eq. (141) has a solution w unique. ˆ l,ζ (k, τ ) corresponding to the solution of (142) from Lemma 34. Every function w E 2N is a wavepacket with nk-pair (k∗l , nl ) with the degree of regularity which can be any s > 0. Proof. Note that according to (137) and (142) the function ˆ l,ϑ (k, τ ) = k, ϑk∗il nl ,ϑ F (k, τ ), F (τ ) L 1 ≤ C, 0 ≤ τ ≤ τ ∗ w involves the factor l,ϑ (k) = β −(1− ) (k − ϑk∗l ) where is as in Definition 1. Hence, ˆ l,ϑ (k, τ ) = 0 if n = nl or ϑ = ϑ, (144) n,ϑ w 1−
ˆ l,ϑ (k, τ ), w ˆ l,ϑ (k, τ ) = 0 if |k − ϑk∗l | ≥ β , ˆ l,ϑ (k, τ ) = k, ϑk∗il w w (145) ˆ l,ϑ is satisfied with Dˆ h = 0 for any s > 0 and and, consequently, Definition 1 for w C = 0 in (30).
Now we would like to show that if hˆ is a multiwavepacket, then the function
ˆ (k, τ ) = ˆ l,ϑ (k, τ ) = ˆ λ (k, τ ) w w w (146) λ∈
(l,ϑ)∈
is an approximate solution of Eq. (123) (see notation (92)). To do that we introduce ∞ (k) = 1 −
S|
|K
(k, ϑk∗i ) = 1 −
ϑ=± i=1
ϑ=± k∗i ∈K S
k − ϑk∗i . β 1−
(147)
m ˆ l,ϑ Expanding the m-linear operator F (m) and using notations (92), (93) l,ϑ w we get ⎛⎛ ⎞m ⎞
λ , where ˆ l,ϑ ⎠ ⎠ = F (m) ⎝⎝ (148) w F (m) w λ ∈m
l,ϑ
ˆ λ1 . . . w ˆ λm , λ = (λ1 , . . . , λm ) ∈ m . λ = w w
(149)
The next statement shows that (146) defines an approximate solution to the integrated evolution equation (118).

Theorem 35. Let $\hat h$ be a multi-wavepacket with resonance invariant nk-spectrum $S$ with regularity degree $s$, let $\vec w$ be a solution of (142) and let $\hat w(k,\tau)$ be defined by (146). Let
\[ D(\hat w) = \hat w - F(\hat w) - \hat h. \tag{150} \]
Then there exists $\beta_0 > 0$ such that we have the estimate
\[ \|D(\hat w)\|_E \le C\varrho + C\beta^s, \quad \text{if } 0 < \varrho \le 1,\ \beta \le \beta_0. \tag{151} \]
Proof. Let ⎛ ⎞
ˆ ˆ = ⎝1 − ˆ , hˆ − = hˆ − F− w il ,ϑ nl ,ϑ ⎠ F w il ,ϑ nl ,ϑ h. l,ϑ
(152)
l,ϑ
Summation of (141) with respect to l, ϑ yields
ˆ ˆ = ˆ + w il ,ϑ nl ,ϑ F w il ,ϑ nl ,ϑ h. l,ϑ
l,ϑ
Hence, from (141) and (150) we obtain ˆ w ˆ = hˆ − − F − w ˆ . D Using (28) and (30) we consequently obtain nl ,ϑ hˆ i 1 ≤ Cβ s if nl = n i ; il ,ϑ hˆ i 1 ≤ Cβ s L L ˆ − s h ≤ C 1 β . E
(153)
if k∗il = k∗i , (154)
Now, to show (151) it is sufficient to prove that − F w ˆ E ≤ C2 . Obviously,
(155)
⎛ ⎞
m ˆ = ⎝1 − ˆ . F− w il ,ϑ nl ,ϑ ⎠ F (m) w
Note that
il ,ϑ nl ,ϑ =
(156)
m
l,ϑ
(·, ϑk∗ ) n,ϑ .
(157)
ϑ=± (n,k∗ )∈S
l,ϑ
Using (82) and (147) we consequently obtain
(·, ϑk∗ ) n,ϑ + ∞ = 1,
(158)
ϑ=± (n,k∗ )∈
⎛ ⎝1 −
⎞
il ,ϑ nl ,ϑ ⎠ = ∞ +
(·, ϑk∗ ) n,ϑ ,
(159)
ϑ=± (n,k∗ )∈ \S
l,ϑ
m ˆ using (148). According to (156) with defined in (90). Let us expand now F (m) w and (159) to prove (155) it is sufficient to prove that for every string λ ∈ m the following inequalities hold: λ ≤ C3 for (n, ϑ) ∈ , and (160) ∞ n,ϑ F (m) w λ ≤ C3 , if (n, k∗ ) ∈ \ S. (161) (·, ϑk∗ ) n,ϑ F (m) w We will use (144) and (145) to obtain the above estimates. According to (135)
(m) λ (k, τ ) = ˆ λ1 . . . w ˆ λm (k, τ ). F (m) w F w n,ζ
n,ζ ,ξ
ξ
(162)
Note that according to (144) if λi = l, ϑ ˆ λi , if n = nl and ϑ = ϑ. ˆ λi = n,ϑ w w Let us introduce the notation n l = nl1 , . . . , nlm , ξ λ = n l , ϑ , Since
n ,ϑ n,ϑ = 0,
ϑ ∈ m . for λ = l,
if n = n or ϑ = ϑ,
(163)
(164)
(165)
then (163) implies ˆ λ1 . . . w ˆ λm = 0 if ξ = n, ζ = ξ λ , and, hence, w
(m) w λ (k, τ ) = ˆ λ1 . . . w ˆ λm (k, τ ), F F (m) w
(m) n,ζ ,ξ
F
n,ζ
n,ζ ,ξ λ
(166)
where we use notation (93), (164). Note also that n ,ϑ F
(m) n,ζ ,ξ
if n = n or ϑ = ζ ,
=0
(m) λ only if and, hence, we have nonzero n ,ϑ F w n,ζ ,ξ ξ = ξ λ , n = n, ϑ = ζ . By (133) (m) λ (k, τ ) F w n,ζ ,ξ λ (m) χ n,ζ ,ξ λ
τ
= 0
Dm
(168)
τ % 1 (169) k, k 1, k(m) k, k , τ 1 d˜ (m−1)d kdτ
$ exp iφ
ˆ λ1 k , τ 1 , . . . , w ˆ λm k, k w
(167)
n,ζ ,ξ λ
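Now we use (145) and notice that, according to the convolution identity in (86),
$$\hat{\mathbf w}_{\lambda_1}(k',\tau_1)\cdot\ldots\cdot\hat{\mathbf w}_{\lambda_m}(k^{(m)}(k,\vec k),\tau_1) = 0 \quad \text{if } \Bigl|k - \sum_i \vartheta_i k_{*l_i}\Bigr| \ge m\beta^{1-\epsilon}. \tag{170}$$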
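Hence the integral (169) is nonzero only if $(k,\vec k)$ belongs to the set
$$B_\beta = \Bigl\{ (k,\vec k) : \bigl|k^{(i)} - \vartheta_i k_{*l_i}\bigr| \le \beta^{1-\epsilon},\ i = 1,\ldots,m,\ \Bigl|k - \sum_i \vartheta_i k_{*l_i}\Bigr| \le m\beta^{1-\epsilon} \Bigr\}. \tag{171}$$
We will prove now that if $(n',k_{*i}) \notin S$, then for small $\beta$ one of the following alternatives holds:
$$\text{either } \Psi(\cdot,\vartheta k_{*i})\Pi_{n',\vartheta}F^{(m)}_{n,\zeta,\vec\xi}(\hat{\mathbf w}_{\vec\lambda}) = 0, \tag{172}$$
$$\text{or (168) holds and } \bigl|\phi_{n,\zeta,\vec\xi}(k,\vec k)\bigr| \ge c > 0 \text{ for } (k,\vec k)\in B_\beta. \tag{173}$$
Note that since $\phi_{n,\zeta,\vec\xi}(k,\vec k)$ is smooth, then using notation (94) we get
$$\bigl|\phi_{n,\zeta,\vec\xi}(k,\vec k) - \phi_{n,\zeta,\vec\xi}(k_{**},\vec k_*)\bigr| \le C\beta^{1-\epsilon} \quad \text{for } (k,\vec k)\in B_\beta, \tag{174}$$
$$k_{**} = \zeta\sum_i \vartheta_i k_{*l_i} = \zeta\kappa_m(\vec\vartheta,\vec l), \qquad \vec\vartheta = (\vartheta_1,\ldots,\vartheta_m).$$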
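Hence the alternative (173) holds if
$$\phi_{n,\zeta,\vec\xi}(k_{**},\vec k_*) \ne 0, \tag{175}$$
and, consequently, it suffices to prove that either (172) or (175) holds. Combining (171) with $\Psi(k,\vartheta k_{*i}) = 0$ for $|k - \vartheta k_{*i}| \ge \beta^{1-\epsilon}$ we find that $\Psi_{i,\vartheta}F^{(m)}(\hat{\mathbf w}_{\vec\lambda})$ can be non-zero for small $\beta$ only in a small neighborhood of a point $\zeta\kappa_m(\vec\vartheta,\vec l)\in[S]_{K,\mathrm{out}}$, and that is possible only if
$$k_{**} = \zeta\kappa_m(\vec\vartheta,\vec l) = \vartheta k_{*i}, \qquad k_{*i}\in K_S. \tag{176}$$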
Let us show that the equality
$$\phi_{n,\zeta,\vec\xi}(k_{**},\vec k_*) = 0 \tag{177}$$
is impossible for $k_{**}$ as in (176) and $n = n'$ as in (167), keeping in mind that $(n',k_{*i})\notin S$. It follows from (99) and (134) that Eq. (177) has the form of the resonance equation (100). Since the nk-spectrum $S$ is resonance invariant, in view of Definition 18 the resonance equation (177) may have a solution only if $k_{**} = k_{*i}$, $i = i_l$, $n = n_l$, with $(n_l,k_{*i_l})\in S$. Since $(n',k_{*i})\notin S$, this implies that (177) does not have a solution and, hence, (175) holds when $(n',k_{*i})\notin S$. Notice that Theorem 33 and (121) yield the bounds
$$\|\hat{\mathbf w}_{\lambda_i}\|_E \le R_1, \qquad \|\partial_\tau\hat{\mathbf w}_{\lambda_i}\|_E \le C.$$
These bounds combined with Lemma 36, proven below, imply that if (175) holds then (161) holds.

Now let us turn to (160). According to (147) and (170) the term $\Psi_\infty\Pi_{n',\vartheta}F^{(m)}(\hat{\mathbf w}_{\vec\lambda})$ can be non-zero only if $\zeta\kappa_m(\vec\lambda) = k_{**}\notin K_S$. Since the nk-spectrum $S$ is resonance invariant, we conclude as above that inequality (175) holds in this case as well. The fact that the set of all $\kappa_m(\vec\lambda)$ is finite, combined with inequality (175), implies (173) for sufficiently small $\beta$. Using Lemma 36 as above we derive (160). Hence, all terms in the expansion (156) are either zero or satisfy (160) or (161), implying consequently (155) and (151).

Here is the lemma used in the above proof.

Lemma 36. Assume that
$$\Psi_{i,\vartheta}\Pi_{n',\zeta}\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)\,\hat{\mathbf w}_{\lambda_1}(k',\tau_1)\ldots\hat{\mathbf w}_{\lambda_m}(k^{(m)}(k,\vec k),\tau_1) = 0 \quad \text{for } (k,\vec k)\notin B_\beta,$$
$$\text{and } \bigl|\phi_{n,\zeta,\vec\xi}(k,\vec k)\bigr| \ge \omega_* > 0 \quad \text{for } (k,\vec k)\in B_\beta, \text{ with } B_\beta \text{ as in (171)}. \tag{178}$$
Then
$$\bigl\|\Psi(\cdot,\vartheta k_{*i})\Pi_{n',\zeta}F^{(m)}_{n,\zeta,\vec\xi}(\hat{\mathbf w}_{\vec\lambda})\bigr\|_E \le \frac{4\varrho}{\omega_*}\|\chi^{(m)}\|\prod_j\|\hat{\mathbf w}_{\lambda_j}\|_E + \frac{2\varrho\tau_*}{\omega_*}\|\chi^{(m)}\|\sum_i\|\partial_\tau\hat{\mathbf w}_{\lambda_i}\|_E\prod_{j\ne i}\|\hat{\mathbf w}_{\lambda_j}\|_E. \tag{179}$$
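Proof. Notice that the oscillatory factor in (133) equals
$$\exp\Bigl\{ i\phi(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\} = \frac{\varrho}{i\phi(k,\vec k)}\,\partial_{\tau_1}\exp\Bigl\{ i\phi(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\}.$$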
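The identity above is what turns a non-vanishing phase into a factor of order $\varrho$. As a hedged numerical illustration (a scalar toy model, not the operator $F^{(m)}$ itself), the sketch below integrates $\int_0^\tau e^{i\phi\tau_1/\varrho}a(\tau_1)\,d\tau_1$ for a hypothetical amplitude $a(t) = 1 + t$ and a constant phase $\phi$, and compares it with the elementary integration-by-parts bound $(\varrho/|\phi|)\bigl(|a(\tau)| + |a(0)| + \tau\sup|a'|\bigr)$. The names `oscillatory_integral` and `parts_bound` are illustrative assumptions.

```python
import numpy as np

def oscillatory_integral(phi, rho, tau, amplitude, n=400_001):
    """Trapezoid rule for the integral of exp(i*phi*t/rho) * amplitude(t) on [0, tau]."""
    t = np.linspace(0.0, tau, n)
    f = np.exp(1j * phi * t / rho) * amplitude(t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

def parts_bound(phi, rho, tau, a0, a_tau, sup_da):
    """Bound obtained from one integration by parts in t."""
    return rho / abs(phi) * (abs(a_tau) + abs(a0) + tau * sup_da)

amplitude = lambda t: 1.0 + t      # hypothetical smooth amplitude, a'(t) = 1
phi, tau = 2.0, 1.0                # hypothetical phase value and time horizon

results = []
for rho in (0.1, 0.01, 0.001):
    value = abs(oscillatory_integral(phi, rho, tau, amplitude))
    bound = parts_bound(phi, rho, tau, amplitude(0.0), amplitude(tau), 1.0)
    results.append((rho, value, bound))
```

In this toy setting the integral shrinks proportionally to $\varrho$, which mirrors the role of the lower bound $\omega_*$ on the phase in (179).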
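Denoting $\phi_{n,\zeta,\vec\xi} = \phi$, $\Psi_{i,\vartheta}\Pi_{n',\zeta}\chi^{(m)}_{n,\zeta,\vec\xi} = \chi^{(m)}_\eta$ and integrating (133) by parts with respect to $\tau_1$ we obtain
$$\begin{aligned}
\Psi(k,\vartheta k_{*i})\Pi_{n',\zeta}F^{(m)}_{n,\zeta,\vec\xi}(\hat{\mathbf w}_{\vec\lambda})(k,\tau)
&= \int_B \frac{\varrho\, e^{i\phi(k,\vec k)\tau/\varrho}}{i\phi(k,\vec k)}\,\chi^{(m)}_\eta(k,\vec k)\,\hat{\mathbf w}_{\lambda_1}(k',\tau)\ldots\hat{\mathbf w}_{\lambda_m}(k^{(m)}(k,\vec k),\tau)\,\tilde{\mathrm d}^{(m-1)d}\vec k \\
&\quad - \int_B \frac{\varrho}{i\phi(k,\vec k)}\,\chi^{(m)}_\eta(k,\vec k)\,\hat{\mathbf w}_{\lambda_1}(k',0)\ldots\hat{\mathbf w}_{\lambda_m}(k^{(m)}(k,\vec k),0)\,\tilde{\mathrm d}^{(m-1)d}\vec k \\
&\quad - \int_0^\tau\!\!\int_B \frac{\varrho\, e^{i\phi(k,\vec k)\tau_1/\varrho}}{i\phi(k,\vec k)}\,\chi^{(m)}_\eta(k,\vec k)\,\partial_{\tau_1}\bigl[\hat{\mathbf w}_{\lambda_1}(k')\ldots\hat{\mathbf w}_{\lambda_m}(k^{(m)}(k,\vec k))\bigr]\,\tilde{\mathrm d}^{(m-1)d}\vec k\,d\tau_1,
\end{aligned} \tag{180}$$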
where $B$ is the set of $k^{(i)}$ for which (171) holds. The relations (88) and (25) imply $|\chi^{(m)}_\eta(k,\vec k)| \le \|\chi^{(m)}\|$. Using then (178), the Leibniz formula and (122) we obtain (179).

The main result of this subsection is the next theorem which, when combined with Lemma 34, implies the wavepacket preservation, namely that the solution $\hat{\mathbf u}_{n,\vartheta}(k,\tau)$ of (136) is a multi-wavepacket for all $\tau\in[0,\tau_*]$.

Theorem 37. Assume that the conditions of Theorem 35 are fulfilled. Let $\hat{\mathbf u}_{n,\vartheta}(k,\tau)$ for $n = n_l$ and $\hat{\mathbf w}_{l,\vartheta}(k,\tau)$ be the solutions to the respective systems (136), (141), and let $\hat{\mathbf w}$ be defined by (146). Then there exists $\beta_0 > 0$ such that
$$\|\hat{\mathbf u}_{n_l,\vartheta} - \Pi_{n_l,\vartheta}\hat{\mathbf w}\|_E \le C\varrho + C'\beta^s \quad \text{for } 0 < \beta \le \beta_0. \tag{181}$$

Proof. Note that $\hat{\mathbf u}_{n,\vartheta} = \Pi_{n,\vartheta}\hat{\mathbf u}$ where $\hat{\mathbf u}$ is a solution of (118) and, according to Theorem 28, $\|\hat{\mathbf u}\|_E \le 2R$. Comparing Eqs. (118) and (150), which are $\hat{\mathbf u} = F(\hat{\mathbf u}) + \hat{\mathbf h}$ and $\hat{\mathbf w} = F(\hat{\mathbf w}) + \hat{\mathbf h} + D(\hat{\mathbf w})$, we find that Lemma 27 can be applied. Then we notice that by Lemma 26 $F$ has the Lipschitz constant $C_F\tau_*$ for such $\hat{\mathbf u}$. Taking $C_F\tau_* < 1$ as in Theorem 28 we obtain (181) from (128).

Notice that Theorem 5 is a direct corollary of Theorem 37 and Lemma 34. The following corollary shows that inequality (181) and, therefore, Theorems 5 and 3 on preservation of wavepackets hold in the case when the coefficients of the operator $\hat F(\hat U)$ in (3), (86) depend regularly on small $\varrho$, $\hat F(\hat U) = \hat F(\hat U,\varrho)$.

Corollary 38 (Parameter dependent nonlinearity). Assume that the conditions of Theorem 35 are fulfilled. Consider a perturbed Eq. (118)
$$\hat{\mathbf u}(k,\tau) = F(\hat{\mathbf u})(k,\tau) + F_1(\hat{\mathbf u},\varrho)(k,\tau) + \hat{\mathbf h}(k),$$
where the operator $F_1(\hat{\mathbf u},\varrho)$ satisfies the inequality $\|F_1(\hat{\mathbf u},\varrho)\|_E \le C\varrho^q$ for $\|\hat{\mathbf u}\|_E \le 2R$ with some $q$, $0 < q \le 1$. Let $\hat{\mathbf w}_{l,\vartheta}(k,\tau)$ be the solution of (141). Then
$$\|\Pi_{n_l,\vartheta}\hat{\mathbf u} - \hat{\mathbf w}_{l,\vartheta}\|_E \le C\varrho^q + C'\beta^s.$$

Proof. The statement follows from (181) and Lemma 27.
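The following theorem shows that any multi-wavepacket solution to (118) yields a solution to the wavepacket interaction system (141).

Theorem 39. Let $\hat{\mathbf u}(k,\tau)$ be a solution of (118) and assume that $\hat{\mathbf u}(k,\tau)$ and $\hat{\mathbf h}(k)$ are multi-wavepackets with nk-spectrum $S = \{(n_l,k_{*l}),\ l = 1,\ldots,N\}$ and regularity degree $s$. Let also $\Psi_{i_l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})$ be defined by (137). Then the functions $\hat{\mathbf w}'_{l,\vartheta}(k,\tau) = \Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}\hat{\mathbf u}(k,\tau)$ are a solution to the system (141) with $\hat{\mathbf h}(k)$ replaced by $\hat{\mathbf h}'(k,\tau)$ satisfying
$$\|\hat{\mathbf h}(k) - \hat{\mathbf h}'(k,\tau)\|_{L^1} \le C\beta^s, \qquad 0 \le \tau \le \tau_*. \tag{182}$$

Proof. Multiplying (118) by $\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}$ we get
$$\hat{\mathbf w}'_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}F(\hat{\mathbf u})(k,\tau) + \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}\hat{\mathbf h}(k), \qquad \hat{\mathbf w}'_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}\hat{\mathbf u}. \tag{183}$$
Since $\hat{\mathbf u}(k,\tau)$ is a multiwavepacket with regularity $s$ we have
$$\|\hat{\mathbf u}(\cdot,\tau) - \hat{\mathbf w}'(\cdot,\tau)\|_{L^1} \le C'\beta^s \quad \text{where } \hat{\mathbf w}'(\cdot,\tau) = \sum_{l,\vartheta}\Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}\hat{\mathbf u}(\cdot,\tau). \tag{184}$$
Let us recast (183) in the form
$$\hat{\mathbf w}'_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}F(\hat{\mathbf w}')(k,\tau) + \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}\bigl[\hat{\mathbf h}(k) + \hat{\mathbf h}''(k,\tau)\bigr], \qquad \hat{\mathbf h}''(k,\tau) = \bigl[F(\hat{\mathbf u}) - F(\hat{\mathbf w}')\bigr](k,\tau). \tag{185}$$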
Denoting $\hat{\mathbf h}(k) + \hat{\mathbf h}''(k,\tau) = \hat{\mathbf h}'(k,\tau)$ we observe that (185) has the form of (141) with $\hat{\mathbf h}(k)$ replaced by $\hat{\mathbf h}'(k,\tau)$. Inequality (182) follows then from (184) and (124).

7. Reduction of Wavepacket Interaction System to a Minimal Interaction System

Our goal in this section is to substitute the wavepacket interaction system (141) with a simpler (minimal) interaction system which describes the evolution of wavepackets with the same accuracy. We fix the nk-spectrum $S = \{(n_l,k_{*l}),\ l = 1,\ldots,N\}$ of the initial multiwavepacket and assume everywhere below that it is resonance invariant. The minimal interaction system is built based on the operators $L$ and $\hat F(\hat U)$ and on $S$. We want the minimal interaction system to satisfy the following requirements. Firstly, the approximation of solutions of (141) by solutions of the minimal interaction system of order $(\mu,\nu)$ has to be of the order $\varrho$ in a suitable region of the parameters $(\varrho,\beta)$ (which is larger for larger $\mu$, $\nu$). Secondly, the minimal interaction system of order $(\mu,\nu)$ should be defined by $S$, by the values of $L(k)$ and its derivatives of order up to $\mu$, and by the values of $\chi^{(m)}(k,\vec k)$ and its derivatives of order up to $\nu$ at $k_{*l}\in K_S$.

The construction of the minimal interaction system consists of the following consecutive steps:
(i) introduction of a time averaged wavepacket interaction system obtained by discarding non-resonant terms in the nonlinearity;
(ii) reduction of the system for vector components $\hat{\mathbf v}_{l,\vartheta}$ to an equivalent one for scalar amplitudes $\hat v_{l,\vartheta}$;
(iii) change of variables $k = \vartheta k_{*l} + \beta\eta$ in the equation for $\hat v_{l,\vartheta}$, resulting in a regular dependence of the coefficients on small $\beta\eta$;
(iv) substitution of the general dependence on $\beta\eta$ in the linear part with a certain polynomial one of order $\mu$, and of the general dependence on $\beta\eta$ of the coefficients of the nonlinearity with a certain trigonometric polynomial of order $\nu$;
(v) substitution of the cutoff functions $\Psi(\cdot,\vartheta k_{*i_l})$ from (141), which were preserved up to this step, with 1.

As a result we obtain a minimal interaction system with weakly universal nonlinearity, which in the simplest case, where $S$ is just a single element $(k_*,n)$, is equivalent to the classical NLS equation, and in the case when $S$ consists of only two elements $(k_*,n)$, $(-k_*,n)$, is equivalent to the classical coupled modes system.
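7.1. Time averaged wavepacket interaction system. Here we modify the wavepacket interaction system (141), substituting its nonlinearity with a certain universal or conditionally universal one obtained by the time averaging, and prove that this substitution produces a small error of order $\varrho$. As the first step we recast (141) in a slightly different form by using expansions (148), (162) together with (166) and (167) and writing the nonlinearity in Eq. (141) in the form
$$\Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}F(\hat{\mathbf w})(\cdot,\tau) = \Psi(\cdot,\vartheta k_{*i_l})\sum_{m\in M_F}\sum_{\vec\lambda\in\Lambda^m}F^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\hat{\mathbf w}_{\vec\lambda}),$$
$$F^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\hat{\mathbf w}_{\vec\lambda})(k,\tau) = F^{(m)}_{n,\zeta,\vec n,\vec\zeta}(\hat{\mathbf w}_{\lambda_1}\ldots\hat{\mathbf w}_{\lambda_m})(k,\tau), \qquad \vec n = \vec n(\vec l),\quad (n,\zeta) = (n_l,\vartheta),\quad \vec\lambda = (\vec l,\vec\zeta),$$
with $F^{(m)}_{n,\zeta,\vec n,\vec\zeta}$ as in (133) and $\vec n(\vec l)$ as in (164). Consequently, the wavepacket interaction system (141) can be written in an equivalent form
$$\hat{\mathbf w}_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})\sum_{m\in M_F}\sum_{\vec\lambda\in\Lambda^m}F^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\hat{\mathbf w}_{\vec\lambda}) + \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}\hat{\mathbf h}, \qquad l = 1,\ldots,N,\ \vartheta = \pm. \tag{186}$$
The construction of the above mentioned time averaged equation reduces to discarding certain terms in the original system (186). First we introduce the following sets of indices related to the resonance equation (100) and $\Omega_m$ defined by (99):
$$\Lambda^m_{n_l,\vartheta} = \bigl\{\vec\lambda = (\vec l,\vec\zeta)\in\Lambda^m : \Omega_m(\vartheta,n_l,\vec\lambda) = 0\bigr\}, \tag{187}$$
and then the time-averaged nonlinearity by
$$F_{av,n_l,\vartheta}(\hat{\mathbf w}) = \sum_{m\in M_F}F^{(m)}_{n_l,\vartheta}, \qquad F^{(m)}_{n_l,\vartheta} = \sum_{\vec\lambda\in\Lambda^m_{n_l,\vartheta}}F^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\hat{\mathbf w}_{\vec\lambda}). \tag{188}$$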
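Note that the nonlinearity $F^{(m)}_{av,n_l,\vartheta}(\hat{\mathbf w})$ can be obtained from $F^{(m)}_{n_l,\vartheta}$ by the averaging formula (70) where $A_T$ is defined by formula (69) with frequencies $\phi_j = \omega_{n_j}(k_{*i_j})$. Consequently, the desired equation with time-averaged nonlinearity is
$$\hat{\mathbf v}_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})F_{av,n_l,\vartheta}(\hat{\mathbf v}) + \Psi(\cdot,\vartheta k_{*i_l})\Pi_{n_l,\vartheta}\hat{\mathbf h}, \qquad l = 1,\ldots,N,\ \vartheta = \pm, \tag{189}$$
which similarly to (142) we recast concisely as
$$\hat{\mathbf v} = F_{av,\Psi}(\hat{\mathbf v}) + \hat{\mathbf h}_\Psi. \tag{190}$$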
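The following lemma is analogous to Lemmas 32, 26.

Lemma 40. The operator $F_{av,\Psi}(\hat{\mathbf v})$ is bounded for bounded $\hat{\mathbf v}\in E^{2N}$, $F_{av,\Psi}(0) = 0$. The polynomial operator $F_{av,\Psi}(\hat{\mathbf v})$ satisfies the Lipschitz condition
$$\|F_{av,\Psi}(\hat{\mathbf v}_1) - F_{av,\Psi}(\hat{\mathbf v}_2)\|_{E^{2N}} \le C\tau_*\|\hat{\mathbf v}_1 - \hat{\mathbf v}_2\|_{E^{2N}}, \tag{191}$$
where $C$ depends only on $C_\chi$ as in (88), on the power of $F$ and on $\|\hat{\mathbf v}_1\|_{E^{2N}} + \|\hat{\mathbf v}_2\|_{E^{2N}}$, and, in particular, it does not depend on $\beta$.

From Lemma 40 and the contraction principle we obtain the following theorem similarly to Theorem 33.

Theorem 41. Let $\|\hat{\mathbf h}_\Psi\|_{E^{2N}} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (190) has a solution $\hat{\mathbf v}\in E^{2N}$ satisfying $\|\hat{\mathbf v}\|_{E^{2N}} \le R_1$, and such a solution is unique.

Theorem 42. Let $\hat{\mathbf v}_{l,\vartheta}(k,\tau)$ be the solution of (189) and $\hat{\mathbf w}_{l,\vartheta}(k,\tau)$ be the solution of (141). Then $\hat{\mathbf v}_{l,\vartheta}(k,\tau)$ is a wavepacket satisfying (144), (145) with $\hat{\mathbf w}$ replaced by $\hat{\mathbf v}$. In addition, there exists $\beta_0 > 0$ such that
$$\|\hat{\mathbf v}_{l,\vartheta} - \hat{\mathbf w}_{l,\vartheta}\|_E \le C\varrho, \qquad l = 1,\ldots,N;\ \vartheta = \pm, \quad \text{for } 0 < \varrho \le 1,\ 0 < \beta \le \beta_0. \tag{192}$$

Proof. Formulas (144), (145) for $\hat{\mathbf v}_{l,\vartheta}(k,\tau)$ follow from (189). We note that $\hat{\mathbf w}$ is an approximate solution of (189), namely we have an estimate for $D_{av}(\hat{\mathbf w}) = \hat{\mathbf w} - F_{av,\Psi}(\hat{\mathbf w}) - \hat{\mathbf h}_\Psi$ which is similar to (150), (151):
$$\|D_{av}(\hat{\mathbf w})\|_E = \|\hat{\mathbf w} - F_{av,\Psi}(\hat{\mathbf w}) - \hat{\mathbf h}_\Psi\|_E \le C\varrho, \quad \text{if } 0 < \varrho \le 1,\ \beta \le \beta_0. \tag{193}$$
The proof of (193) is similar to the proof of (155) with minor simplifications thanks to the absence of terms with $\Psi_\infty$. Using (193) we apply Lemma 27 and obtain (192).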
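7.2. Averaged system for scalar amplitudes. Now we recast (189) in the form of an equivalent system of scalar equations for the amplitudes $\hat v_{l,\vartheta} = \hat v_\lambda$ of the solutions $\hat{\mathbf v}_{\lambda_l}$ defined based on (11), namely
$$\hat{\mathbf v}_{\lambda_l}(k) = \Psi(k,\zeta^{(l)}k_{*i_l})\Pi_{n_l,\zeta^{(l)}}(k)\hat{\mathbf v}_{\lambda_l}(k) = \hat v_{l,\zeta^{(l)}}(k)\,\mathbf g_{n_l,\zeta^{(l)}}(k). \tag{194}$$
Note that according to (145) the support of $\hat v_{l,\zeta^{(l)}}$ is localized near $\zeta k_{*i_l}$, and we can assume that $\mathbf g_{n_l,\zeta^{(l)}}(k)$ depends smoothly on $k$ near this point. Multiplying (189) by $\mathbf g_{n_l,\zeta^{(l)}}(k)$ (with the standard scalar product in $\mathbb C^{2J}$) and using (194) we obtain the following system of scalar amplitude equations:
$$\hat v_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})f_{av,n_l,\vartheta}(\vec v) + \Psi(\cdot,\vartheta k_{*i_l})\hat h_{n_l,\vartheta}, \qquad l = 1,\ldots,N,\ \vartheta = \pm, \tag{195}$$
where
$$\hat h_{n_l,\vartheta} = \mathbf g_{n_l,\vartheta}\cdot\Pi_{n_l,\vartheta}\hat{\mathbf h}, \qquad f_{av,n_l,\vartheta}(\vec v) = \sum_{m\in M_F}\sum_{\vec\lambda\in\Lambda^m_{n_l,\vartheta}}f^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\vec v_{\vec\lambda}). \tag{196}$$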
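According to (169) the $m$-linear operators in the above equation are given by
$$f^{(m)}_{n,\vartheta,\vec\xi}(\vec v_{\vec\lambda})(k,\tau) = \int_0^\tau\!\!\int_{D_m}\exp\Bigl\{ i\phi_{n,\vartheta,\vec\xi}(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\}\,Q^{(m)}_{n,\vartheta,\vec\xi}(k,\vec k)\prod_{i=1}^m\hat v_{\lambda_i}(k^{(i)},\tau_1)\,\tilde{\mathrm d}^{(m-1)d}\vec k\,d\tau_1, \tag{197}$$
$$Q^{(m)}_{n,\vartheta,\vec\xi}(k,\vec k) = \mathbf g_{n,\vartheta}(k)\cdot\chi^{(m)}_{n,\vartheta,\vec\xi}(k,\vec k)\bigl[\mathbf g_{\lambda_1}(k'),\ldots,\mathbf g_{\lambda_m}(k^{(m)}(k,\vec k))\bigr]. \tag{198}$$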
The concise form for the system (195) of scalar equations for amplitudes is
$$\vec v = f_{av,\Psi}(\vec v) + \vec h_\Psi, \qquad \vec v\in E^{2N}_{sc}, \tag{199}$$
where the components $\hat v_{l,\vartheta}$ of $\vec v$ belong to the space $E_{sc}$ of scalar functions with the norm defined by (17), (18) applied to scalar functions. Note that $Q^{(m)}_{n,\vartheta,\vec\xi}(k,\vec k)$ can be extended in an arbitrary way as bounded functions for arguments $k,\vec k$ where (171) is not satisfied; for example, the extension can be zero. The extension does not affect solutions of (195) because this equation involves the factors $\Psi(\cdot,\vartheta k_{*i_l})$ and (145) holds.

Lemma 43. The operator $f_{av,\Psi}(\vec v)$ is bounded for bounded $\vec v\in E^{2N}_{sc}$ and $f_{av,\Psi}(0) = 0$. The polynomial operator $f_{av,\Psi}(\vec v)$ satisfies the Lipschitz condition
$$\|f_{av,\Psi}(\vec v_1) - f_{av,\Psi}(\vec v_2)\|_{E^{2N}_{sc}} \le C\tau_*\|\vec v_1 - \vec v_2\|_{E^{2N}_{sc}},$$
where $C$ depends only on $C_\chi$ as in (88), on the order of $F$ as a polynomial and on $\|\vec v_1\|_{E^{2N}_{sc}} + \|\vec v_2\|_{E^{2N}_{sc}}$, and it does not depend on $\beta$.

From Lemma 40 and the contraction principle we obtain the following theorem similarly to Theorem 33.

Theorem 44. Let $\|\vec h_\Psi\|_{E^{2N}_{sc}} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that (199) has a solution $\vec v\in E^{2N}_{sc}$ satisfying $\|\vec v\|_{E^{2N}_{sc}} \le R_1$, and such a solution is unique.
7.3. Rescaled amplitude equations. According to (145) the amplitudes $\hat v_{l,\vartheta}(\zeta k_{*l} + \eta)$ are localized about the point $\eta = 0$, and to study their behavior in a vicinity of $\eta = 0$ we introduce a group of dilation operators
$$B_\beta\hat v(\eta) = \beta^d\hat v(\beta\eta), \qquad \beta > 0, \tag{200}$$
which preserve the $L^1$-norm and commute with the convolution, i.e.
$$\|B_\beta\hat v\|_{L^1} = \|\hat v\|_{L^1}, \qquad (B_\beta\hat v)*(B_\beta\hat w) = B_\beta(\hat v*\hat w). \tag{201}$$
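Both properties in (201) follow from the change of variables $\eta\mapsto\beta\eta$. As a hedged numerical sanity check in dimension $d = 1$, the sketch below verifies them for two hypothetical Gaussian profiles on a uniform grid; the helper names `dilate`, `l1_norm` and `convolve_at` are illustrative assumptions, and plain Riemann sums stand in for the integrals.

```python
import numpy as np

D = 1                                      # dimension d
ETA = np.linspace(-60.0, 60.0, 24001)      # uniform grid with step 0.005
STEP = ETA[1] - ETA[0]

def dilate(f, beta):
    """Dilation operator (B_beta f)(eta) = beta**d * f(beta * eta)."""
    return lambda eta: beta**D * f(beta * eta)

def l1_norm(f):
    """Riemann-sum approximation of the L^1 norm on the grid."""
    return np.sum(np.abs(f(ETA))) * STEP

def convolve_at(f, g, points):
    """Riemann-sum approximation of (f * g)(p) at the given points."""
    return np.array([np.sum(f(p - ETA) * g(ETA)) * STEP for p in points])

v = lambda x: np.exp(-x**2)                    # hypothetical test profile
w = lambda x: np.exp(-0.5 * (x - 1.0)**2)      # hypothetical test profile
beta = 0.5
points = np.array([-2.0, 0.0, 1.5, 4.0])

norm_before = l1_norm(v)
norm_after = l1_norm(dilate(v, beta))
lhs = convolve_at(dilate(v, beta), dilate(w, beta), points)   # (B v) * (B w)
rhs = beta**D * convolve_at(v, w, beta * points)              # B (v * w)
```

The two norms agree and `lhs` matches `rhs` up to quadrature error, which is why rescaling by $\beta$ leaves the convolution structure of the nonlinearity intact.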
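We introduce then a rescaled and shifted version of the initial data $\hat h_{n_l,\vartheta}$ in (196) by the formula
$$\hat H_{n_l,\vartheta}(k) = B_\beta\hat h_{n_l,\vartheta}(k + \vartheta k_{*l}), \qquad \hat h_{n_l,\vartheta}(k) = \beta^{-d}\hat H_{n_l,\vartheta}\bigl(\beta^{-1}(k - \vartheta k_{*l})\bigr), \tag{202}$$
where $B_\beta$ is defined by (200), $|k - \vartheta k_{*l}| \le \beta^{1-\epsilon}$, and new variables
$$\eta_l = \beta^{-1}(k - \vartheta k_{*l}), \quad l = 1,\ldots,N, \qquad \vec\eta = (\eta_1,\ldots,\eta_N). \tag{203}$$
In this and the following sections we assume that $\hat H_{n_l,\vartheta}(\beta,\eta)$ are defined for all $\eta\in\mathbb R^d$, including $|\eta| \ge \beta^{-\epsilon}$. Though (195) involves $\hat h_{n_l,\vartheta}$ with a cutoff factor, namely $\Psi(k,\vartheta k_{*i_l})\hat h_{n_l,\vartheta}(k) = \Psi(k,\vartheta k_{*i_l},\beta^{1-\epsilon})\hat h_{n_l,\vartheta}(k)$ as in (26), we will later use $\hat H_{n_l,\vartheta}(\beta,\eta)$ defined for all $\eta$, and assume that
$$\bigl\|\bigl(1 - \Psi(\beta^\epsilon\eta)\bigr)\hat H_{n_l,\vartheta}(\beta,\eta)\bigr\|_{L^1} \le C\beta^s, \tag{204}$$
where (i) $\Psi(\beta^\epsilon\eta) = \Psi(\eta,0,\beta^{-\epsilon})$ is as in (25), (26); (ii) $\epsilon$ and $s$ are the same as in Definition 1; (iii) condition (204) is consistent with (29) and (30).

For a solution $\hat v_{l,\vartheta}(k,\tau)$ of (195), using (145) we introduce the following functions:
$$\hat z_{l,\vartheta}(\eta,\tau) = \beta^d\hat v_{l,\vartheta}(\vartheta k_{*l} + \beta\eta,\tau), \qquad \hat z_{l,\vartheta}(\eta,\tau) = \Psi(\beta^\epsilon\eta)\hat z_{l,\vartheta}(\eta,\tau), \quad \eta\in\mathbb R^d, \tag{205}$$
which satisfy a rescaled version of (195) provided below. Note that since $\vec\lambda = (\vec l,\vec\zeta)\in\Lambda^m_{n_l,\vartheta}$ and the nk-spectrum $S$ is resonance invariant we have $\kappa_m(\vec\lambda) = \sum_i\zeta^{(i)}k_{*l_i} = \vartheta k_{*l}$. Since $k,\vec k$ satisfy the convolution identity (87), the variables $\eta,\vec\eta$ defined by (203) satisfy a similar identity as well, namely
$$\eta = \sum_{i=1}^m\eta^{(i)}, \qquad \eta^{(m)}(\eta,\vec\eta) = \eta - \sum_{i=1}^{m-1}\eta^{(i)}. \tag{206}$$
The change of variables (203) in the integral operator $f_{av,n_l,\vartheta}$ defined by (197) yields the following amplitude system for $\hat z_{l,\vartheta}$ which is equivalent to (195):
$$\hat z_{l,\vartheta}(\eta) = \Psi(\beta^\epsilon\eta)f_{av,n_l,\vartheta,\beta}(\vec z)(\eta) + \Psi(\beta^\epsilon\eta)\hat H_{n_l,\vartheta}(\eta), \qquad l = 1,\ldots,N,\ \vartheta = \pm. \tag{207}$$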
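According to (137), (196) and (197), $\Psi(k,\vartheta k_{*i_l},\beta^{1-\epsilon}) = \Psi(\beta^\epsilon\eta)$,
$$f_{av,n_l,\vartheta,\beta}(\vec z) = \sum_{m\in M_F}f^{(m)}_{av,n_l,\vartheta,\beta}(\vec z), \qquad f^{(m)}_{av,n_l,\vartheta,\beta}(\vec z) = \sum_{\vec\lambda\in\Lambda^m_{n_l,\vartheta}}f^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda),\beta}(\vec z_{\vec\lambda}), \tag{208}$$
$$f^{(m)}_{n,\vartheta,\vec\xi(\vec\lambda),\beta}(\vec z_{\vec\lambda})(\eta,\tau) = \int_0^\tau\!\!\int_{\eta'+\cdots+\eta^{(m)}=\eta}\exp\Bigl\{ i\phi_{n,\vartheta,\vec\xi(\vec\lambda)}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta)\frac{\tau_1}{\varrho}\Bigr\}\,Q^{(m)}_{n,\vartheta,\vec\xi(\vec\lambda)}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta)\prod_{i=1}^m\hat z_{\lambda_i}(\eta^{(i)})\,\tilde{\mathrm d}^{(m-1)d}\vec\eta\,d\tau_1. \tag{209}$$
Note that the condition (171) on the domain of integration takes in the new variables the form
$$|\eta^{(i)}| \le \beta^{-\epsilon},\ i = 1,\ldots,m, \quad \text{and} \quad |\eta| \le m\beta^{-\epsilon}. \tag{210}$$
Finally, we rewrite the amplitude system (207) in the concise form
$$\vec z = \Psi_\beta\cdot f_{av,\beta}(\vec z) + \Psi_\beta\cdot\hat H_\beta, \qquad \vec z\in E^{2N}_{sc}. \tag{211}$$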
Let us show now that (211) is of the form of (118) with the $2J$-component vector $\hat{\mathbf u}$ substituted with the $2N$-component vector $\vec z$, and the matrix $L(k)$ substituted with a diagonal matrix with entries $\vartheta\omega_{n_l}(\vartheta k_{*l} + \beta\eta)$. For that we introduce the $S$-averaged tensor $Q^{(m)}_{av}$ defined on $\vec z\in(\mathbb C^{2N})^m$ by the formula
$$Q^{(m)}_{av,n,\vartheta}(\beta\eta,\beta\vec\eta,\vec z) = \sum_{\vec\lambda\in\Lambda^m_{n,\vartheta}}Q^{(m)}_{n,\vartheta,\vec\xi(\vec\lambda)}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta)\prod_{i=1}^m\hat z_{\lambda_i}, \tag{212}$$
which depends on $S$ through $\Lambda^m_{n,\vartheta}$ and acts from $(\mathbb C^{2N})^m$ into $\mathbb C^{2N}$. Note that $\hat z_{\lambda_i}$ and $Q^{(m)}_{n,\vartheta,\vec\xi}$ are scalar factors, and $\hat z_{\lambda_i}$ is a scalar projection in $\mathbb C^{2N}$ onto a line along the $\lambda_i$-th eigenvector of this diagonal matrix. Hence, the right-hand side of (212) is a sum of elementary susceptibilities obtained from $Q^{(m)}_{av}$ as in (132), and (207) has the form of (136). Note that non-zero terms in (212) contain products of $\hat z_{\lambda_i}$ which satisfy (100). Therefore, if $\beta = 0$ and $S$ is resonance invariant, $Q^{(m)}_{av}$ has the form of a weakly universal nonlinearity; if $S$ is universally resonance invariant then $Q^{(m)}_{av}$ has the form of a universal nonlinearity as in (65).
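7.4. Amplitude system with polynomial dispersion relations. Now we introduce an amplitude system with polynomial dispersion which is similar to (207) and provides (i) a sufficiently accurate approximation to (207); (ii) standard polynomial dependence of coefficients on $\eta,\vec\eta$ in the sense clarified below. The amplitude system has the form
$$\hat u_{l,\vartheta} = \Psi(\beta^\epsilon\eta)f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u) + \Psi(\beta^\epsilon\eta)\hat H_{n_l,\vartheta}, \qquad l = 1,\ldots,N,\ \vartheta = \pm, \tag{213}$$
$$f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u) = \sum_{m\in M_F}\sum_{\vec\lambda\in\Lambda^m_{n_l,\vartheta}}f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\vec u_{\vec\lambda}), \tag{214}$$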
where $\Psi(\beta^\epsilon\eta)$ are the cutoff factors defined in (208), (137) and the approximations $f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}$ for $f^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}$ are defined below. The indices $\mu = 1,2$, $\nu = 0,1$ determine the order of approximation: (i) $\mu$ determines the order of approximation of the dispersion relation by a polynomial of degree $\mu$; (ii) $\nu$ determines the order of approximation of the susceptibility coefficients (198) by a trigonometric polynomial of degree $\nu$. As before, we recast (213) in a concise form,
$$\vec u = \Psi_\beta f^{(\mu,\nu)}(\vec u) + \Psi_\beta\hat H, \tag{215}$$
where $\Psi_\beta(\eta) = \Psi(\beta^\epsilon\eta)$. Finally, we eliminate in (213) the cutoff factor $\Psi(\beta^\epsilon\eta)$ by setting $\Psi(\beta^\epsilon\eta) = \Psi(0) = 1$, and introduce the amplitude system with weakly universal nonlinearity and polynomial dispersion without cutoff
$$\hat u_{l,\vartheta}(\eta) = f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u)(\eta) + \hat H_{n_l,\vartheta}(\eta), \qquad l = 1,\ldots,N,\ \vartheta = \pm, \tag{216}$$
which can be written in the form of (215) with $\Psi_\beta = 1$.

Let us turn now to the construction of the approximations. For every nk-pair $(k_{*l},n_l)$ we introduce the Taylor polynomials of order $\mu$ of the dispersion relation $\omega_{n_l}(k_{*l} + \beta\eta)$:
$$\gamma_1(k_{*l},n_l,\beta\eta) = \omega_{n_l}(k_{*l}) + \beta\,\omega'_{n_l}(k_{*l})\,\eta,$$
$$\gamma_2(k_{*l},n_l,\beta\eta) = \gamma_1(k_{*l},n_l,\beta\eta) + \frac{\beta^2}{2}\bigl(\eta,\omega''_{n_l}(k_{*l})\,\eta\bigr),$$
and similarly $\gamma_3$ for $\mu = 3$. Obviously we have the inequality (see (171))
$$\bigl|\omega_{n_l}(k_{*l} + \beta\eta) - \gamma_\mu(k_{*l},n_l,\beta\eta)\bigr| \le C\beta^{(\mu+1)(1-\epsilon)}, \qquad (k,\vec k)\in B_\beta. \tag{217}$$
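Inequality (217) is the Taylor remainder estimate evaluated on the region $|\beta\eta| \le \beta^{1-\epsilon}$. As a hedged one-dimensional illustration, the sketch below uses a hypothetical dispersion relation $\omega(k) = \sqrt{1+k^2}$ (an assumption made only for this example, for which $\sup|\omega''| = 1$ and $\sup|\omega'''| < 1$) and measures the error of $\gamma_1$ and $\gamma_2$ over that region.

```python
import numpy as np

def omega(k):
    """Hypothetical dispersion relation used only for illustration."""
    return np.sqrt(1.0 + k**2)

def d_omega(k):
    return k / omega(k)

def d2_omega(k):
    return 1.0 / omega(k)**3

def gamma(mu, k_star, x):
    """Taylor polynomial of order mu (1 or 2) of omega at k_star; x plays the role of beta*eta."""
    value = omega(k_star) + d_omega(k_star) * x
    if mu == 2:
        value = value + 0.5 * d2_omega(k_star) * x**2
    return value

k_star, eps = 1.0, 0.25        # hypothetical centre and exponent epsilon
errors = {}
for beta in (0.1, 0.01, 0.001):
    eta = np.linspace(-beta**(-eps), beta**(-eps), 2001)   # |eta| <= beta**(-eps)
    x = beta * eta
    for mu in (1, 2):
        errors[(mu, beta)] = np.max(np.abs(omega(k_star + x) - gamma(mu, k_star, x)))
```

For this choice the Lagrange remainder gives the explicit constants $1/2$ for $\mu = 1$ and $1/6$ for $\mu = 2$ in front of $\beta^{(\mu+1)(1-\epsilon)}$.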
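The phase function $\phi_{n,\zeta,\vec\xi}(k,\vec k)$, $\vec\xi = (\vec n,\vec\zeta)$, defined by (134), is approximated then by a polynomial phase function
$$\phi^{(\mu)}_{n_l,\zeta,\vec\xi}(\zeta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta) = \zeta\gamma_\mu(k_{*l},n_l,\beta\eta) - \zeta'\gamma_\mu(k_{*l_1},n',\beta\eta') - \ldots - \zeta^{(m)}\gamma_\mu(k_{*l_m},n^{(m)},\beta\eta^{(m)}). \tag{218}$$
Note that since $\vec\xi = \vec\xi(\vec\lambda)$ with $\vec\lambda\in\Lambda^m_{n_l,\vartheta}$ defined by (187), Eq. (100) is fulfilled. Hence, $\phi^{(\mu)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,0,0) = 0$, the function $\phi^{(1)}_{n_l,\vartheta,\vec\xi}$ depends linearly on $\eta,\vec\eta$, and $\phi^{(2)}_{n_l,\vartheta,\vec\xi}$ is quadratic, namely
$$\phi^{(1)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta) = \beta\,\phi^{(1)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta), \tag{219}$$
$$\phi^{(2)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta) = \beta\,\phi^{(1)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta) + \beta^2\phi^{(2,2)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta). \tag{220}$$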
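In the case $\mu = 2$ the polynomial phase function involves two parameters $\varrho_1$, $\varrho_2$:
$$i\frac{\tau_1}{\varrho}\phi^{(2)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta) = i\phi^{(1)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta)\frac{\tau_1}{\varrho_1} + i\phi^{(2,2)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta)\frac{\tau_1}{\varrho_2}, \tag{221}$$
$$\varrho_1 = \frac{\varrho}{\beta}, \qquad \varrho_2 = \frac{\varrho}{\beta^2}; \qquad 0 < \varrho_1 < \infty,\ 0 < \varrho_2 \le \infty, \tag{222}$$
where $\varrho_1$ and $\varrho_2$ may be large or small depending on the relation between $\varrho$ and $\beta$. Sometimes it is convenient to consider $\varrho_1$ and $\varrho_2$ as independent parameters. If $\mu = 1$ we formally set $\varrho_2 = \infty$, $\tau_1/\varrho_2 = 0$. If (171) holds we have the estimate
$$\Bigl|\exp\Bigl\{ i\phi^{(\mu)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta)\frac{\tau_1}{\varrho}\Bigr\} - \exp\Bigl\{ i\phi_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta)\frac{\tau_1}{\varrho}\Bigr\}\Bigr| \le C\tau_*\frac{\beta^{(\mu+1)(1-\epsilon)}}{\varrho}, \qquad \mu = 1,2. \tag{223}$$
To ensure that the approximation error is small for given $\mu$ we assume that $\varrho$ and $\beta$ satisfy
$$\frac{\beta^{(\mu+1)(1-\epsilon)}}{\varrho}\to 0, \qquad \beta\to 0, \qquad \varrho\to 0. \tag{224}$$
Now we approximate the dependence of $Q^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta)$ on $\eta,\vec\eta$ given by (198) by trigonometric polynomials. The zero order approximation with $\nu = 0$ is given by
$$Q^{(m,0)}_{n,\zeta,\vec\xi}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta) = Q^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\vec k_*). \tag{225}$$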
To define the first order approximation we modify the standard Taylor expansion using trigonometric polynomials instead of algebraic ones. Taking the first derivative with respect to $\beta$ at $\beta = 0$,
$$Q'^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\eta,\vec k_*,\vec\eta) = \frac{d}{d\beta}Q^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta)\Bigr|_{\beta=0},$$
which obviously is a linear function with respect to $\eta,\vec\eta$, we express then $\eta$ in terms of $\vec\eta$ using (206):
$$Q'^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\eta,\vec k_*,\vec\eta) = \sum_{j=1}^m q^{(m),j}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\vec k_*)\cdot\eta^{(j)}, \qquad \eta^{(j)} = \bigl(\eta^{(j)}_1,\ldots,\eta^{(j)}_d\bigr).$$
Then the first order approximation is
$$Q^{(m,1)}_{n,\zeta,\vec\xi}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta) = Q^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\vec k_*) + \sum_{j=1}^m q^{(m),j}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\vec k_*)\cdot\sin\bigl(\beta\eta^{(j)}\bigr),$$
where $\sin(\eta^{(j)}) = \bigl(\sin\eta^{(j)}_1,\ldots,\sin\eta^{(j)}_d\bigr)$. An advantage of this approximation is that the multiplication by $\sin(\eta^{(j)}_1)$ is a bounded operator which equals the Fourier transform of a finite-difference operator, whereas the multiplication by $\eta^{(j)}_1$ corresponds to the partial derivative and is unbounded. Since the original nonlinearity does not involve unbounded operators, the use of bounded operators is natural and convenient. In fact, it is well known that the presence of derivatives in the nonlinearity of NLS-type equations causes technical difficulties, see [14]. In our approach the approximating equation provides the same accuracy and its nonlinearity involves only bounded finite-difference operators, bypassing those difficulties altogether. According to Condition 16 the susceptibility is smooth and if (210) holds we have the following inequality:
$$\bigl|Q^{(m)}_{n,\zeta,\vec\xi}(\vartheta k_{*l}+\beta\eta,\vec k_*+\beta\vec\eta) - Q^{(m,\nu)}_{n,\zeta,\vec\xi}(\vartheta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta)\bigr| \le C\beta^{(\nu+1)(1-\epsilon)}. \tag{226}$$
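The boundedness argument above rests on the fact that a central difference has a sine as its Fourier symbol. A hedged discrete illustration on a periodic grid (an assumption made so that the discrete Fourier transform can be used; the random sample `u` is purely illustrative) is the following sketch, which checks that the transform of $u_{j+1} - u_{j-1}$ equals $2i\sin\theta$ times the transform of $u$.

```python
import numpy as np

n_points = 64
rng = np.random.default_rng(0)
u = rng.standard_normal(n_points)          # arbitrary periodic sample

# Central difference u[j+1] - u[j-1] with periodic wrap-around.
central_difference = np.roll(u, -1) - np.roll(u, 1)

theta = 2.0 * np.pi * np.fft.fftfreq(n_points)   # discrete wavenumbers
symbol = 2j * np.sin(theta)                      # bounded multiplier, |symbol| <= 2

lhs = np.fft.fft(central_difference)
rhs = symbol * np.fft.fft(u)

# For small arguments the sine agrees with its argument up to a cubic error.
x = np.linspace(-0.5, 0.5, 101)
cubic_gap = np.max(np.abs(np.sin(x) - x) - np.abs(x)**3 / 6.0)
```

The multiplier stays bounded by 2 for all wavenumbers, while the symbol of a derivative grows linearly, which is the distinction the text draws between the two approximations.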
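We introduce the components $f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\lambda}$ of the weakly universal nonlinearity $f^{(\mu,\nu)}$ by the formula
$$f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\lambda}(\vec z_{\vec\lambda})(\eta,\tau) = \int_0^\tau\!\!\int_{\eta'+\cdots+\eta^{(m)}=\eta}\exp\Bigl\{ i\phi^{(1)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta)\frac{\tau_1}{\varrho_1} + i\phi^{(2,2)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\eta,\vec\eta)\frac{\tau_1}{\varrho_2}\Bigr\}\,Q^{(m,\nu)}_{n_l,\vartheta,\vec\xi}(\vartheta k_{*l},\vec k_*,\beta\eta,\beta\vec\eta)\prod_{i=1}^m\hat z_{\lambda_i}(\eta^{(i)})\,\tilde{\mathrm d}^{(m-1)d}\vec\eta\,d\tau_1. \tag{227}$$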
As before, we establish standard properties of the operator $f^{(\mu,\nu)}$ defined by the above formula.

Lemma 45. The operator $\Psi_\beta f^{(\mu,\nu)}(\vec u)$ is bounded for bounded $\vec u\in E^{2N}_{sc}$, $\Psi_\beta f^{(\mu,\nu)}(0) = 0$. The polynomial operator $\Psi_\beta f^{(\mu,\nu)}$ satisfies the Lipschitz condition
$$\|\Psi_\beta f^{(\mu,\nu)}(\vec u_1) - \Psi_\beta f^{(\mu,\nu)}(\vec u_2)\|_{E^{2N}_{sc}} \le C\tau_*\|\vec u_1 - \vec u_2\|_{E^{2N}_{sc}}, \tag{228}$$
where $C$ depends only on $C_\chi$ as in (88), on the power of $F$ and on $\|\vec u_1\|_{E^{2N}_{sc}} + \|\vec u_2\|_{E^{2N}_{sc}}$. In particular, it does not depend on $\beta \ge 0$ and on $0 < \varrho_1 < \infty$, $0 < \varrho_2 \le \infty$.

From Lemma 40 and the contraction principle we obtain the following theorem completely similar to Theorem 33.

Theorem 46. Let $\|\hat h\|_{E^{2N}_{sc}} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (190) has a solution $\vec z\in E^{2N}_{sc}$ satisfying $\|\vec z\|_{E^{2N}_{sc}} \le R_1$. Such a solution is unique and $\hat z_{l,\vartheta}(k,\tau) = 0$ if $|k| \ge \beta^{-\epsilon}$.

Theorem 47. Let $\hat u_{l,\vartheta}(k,\tau)$ be a solution to (213) and $\hat z_{l,\vartheta}(k,\tau)$ be the solution of (211). Then the following inequality holds:
$$\|\hat u_{l,\vartheta} - \hat z_{l,\vartheta}\|_{E_{sc}} \le C\beta^{(\nu+1)(1-\epsilon)} + C\varrho^{-1}\beta^{(\mu+1)(1-\epsilon)}, \qquad l = 1,\ldots,N;\ \vartheta = \pm, \tag{229}$$
for all $0 < \varrho \le 1$ and $0 < \beta \le \beta_0$, where $\epsilon$ is the same as in Definition 1 and $\beta_0$ is sufficiently small.
Proof. To obtain (229) we note that $\vec u$ is an approximate solution of (211), namely $\vec u - \Psi_\beta f_{av,\beta}(\vec u) - \Psi_\beta\hat H_\beta = \hat D$ where $\hat D$ is small. To estimate $\hat D$ observe that the integrals involving $\vec u$ have the integration domain as in (171). Hence, using (226) and (223) we obtain
$$\|\hat D\|_{E^{2N}_{sc}} \le C\beta^{(\nu+1)(1-\epsilon)} + C\varrho^{-1}\beta^{(\mu+1)(1-\epsilon)},$$
and applying Lemma 27 we get (229).
7.5. Decay of solutions and elimination of cutoff factors. In this subsection we show how to remove the cutoff function in (213) and to obtain the averaged interaction system with a weakly universal nonlinearity. If $\mu = 1$, $\nu = 0$ and the nk-spectrum $S$ is resonance-invariant, the amplitude system coincides with the system (62) with a weakly universal nonlinearity. For $\mu > 1$ or $\nu > 0$ the amplitude system involves additional terms. In particular, if $\mu = 2$, $\nu = 0$ and $S = \{(k_*,n)\}$ is just a single element, then the linear part has the second order and the nonlinearity is universal, and the amplitude system turns into the classical NLS system:
$$\partial_\tau u_\zeta = -\frac{i}{\varrho}\,\zeta\gamma_2(k_*,n,-i\zeta\beta\nabla_r)\,u_\zeta + b_\zeta u_\zeta^2u_{-\zeta}, \qquad u_\zeta(0) = \hat H_\zeta, \quad \zeta = \pm.$$
This system is equivalent to (51) when $\hat H_- = \hat H_+^*$, $b_- = b_+^*$, $u_- = u_+^*$. When $\nu > 0$ the nonlinearity involves additional terms with finite difference operators.

The possibility to remove cutoff functions is based on the fast decay of $\hat u(k)$ as $|k|\to\infty$, which is equivalent to high smoothness of $u(r)$. The factor $\Psi_\beta$ can be replaced by 1 with a small error when the data $\hat H(k)$ decay sufficiently fast. To describe the decay we introduce weighted Banach spaces of scalar functions $\hat H(k)$ described as follows.

Definition 48 (Weight function). For $a \ge 0$ we call a positive function $\psi(r)$, $r \ge 0$, a weight function from class $W(a)$ if it satisfies the following conditions:
(i) $\psi(0) > 0$, $\psi(r_1) \ge \psi(r_2)$ for $r_1 \ge r_2 \ge 0$;
(ii) $\psi(r_1 + r_2) \le \psi(r_1) + \psi(r_2) + C$, where $C$ does not depend on $r_1$, $r_2$ ($\psi$ is sublinear);
(iii) $\psi(r) - a\ln r \ge C > 0$ for all $r > 0$ ($\psi(r)$ is superlogarithmic).

We introduce $L^1(\psi)$ as a space of scalar functions $\hat H(k)$, $k\in\mathbb R^d$, with the norm
$$\|\hat H\|_{L^1(\psi)} = \int_{\mathbb R^d}e^{\psi(|k|)}|\hat H(k)|\,dk. \tag{230}$$
For vector-functions we use the same formula with the Euclidean norm $|\cdot|$. In the simplest case of $\psi(r) = a\ln(1 + r)$ we have $\psi\in W(a)$ and obtain $L^1(\psi) = L^{1,a}$ with the norm (19). If the weight function belongs to $W(a)$ for all $a$, the space $L^1(\psi)$ consists of the Fourier transforms of infinitely smooth functions. The following lemma shows that $L^1(\psi)$ is closed with respect to the convolution.
Lemma 49. Let $\hat H_1,\hat H_2\in L^1(\psi)$ and
$$\hat H_3(k) = \int_{\mathbb R^d}\hat H_1(k - k')\hat H_2(k')\,dk'.$$
Then
$$\|\hat H_3\|_{L^1(\psi)} \le C\|\hat H_1\|_{L^1(\psi)}\|\hat H_2\|_{L^1(\psi)}. \tag{231}$$
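Proof. Using Definition 48 (ii) we obtain
$$e^{\psi(|k|)}|\hat H_3(k)| \le \int_{\mathbb R^d}e^{\psi(|k|)}|\hat H_1(k - k')||\hat H_2(k')|\,dk' \le e^C\int_{\mathbb R^d}e^{\psi(|k'|)}e^{\psi(|k - k'|)}|\hat H_1(k - k')||\hat H_2(k')|\,dk'.$$
Applying Young's inequality (122) we obtain
$$\int_{\mathbb R^d}e^{\psi(|k|)}|\hat H_3(k)|\,dk \le e^C\int_{\mathbb R^d}e^{\psi(|k|)}|\hat H_1(k)|\,dk\int_{\mathbb R^d}e^{\psi(|k|)}|\hat H_2(k)|\,dk,$$
implying (231).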
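The estimate (231) can be checked numerically in a simple case. The following hedged sketch takes $d = 1$ and the logarithmic weight $\psi(r) = a\ln(1+r)$ mentioned in the text, for which $\psi(r_1+r_2)\le\psi(r_1)+\psi(r_2)$, so the constant $e^C$ can be taken equal to 1 in this example. The two profiles `h1`, `h2` and the grid are hypothetical choices, and the convolution is replaced by its discrete analogue on the grid.

```python
import numpy as np

a = 2.0
K = np.linspace(-40.0, 40.0, 4001)     # symmetric grid with an odd number of points
DK = K[1] - K[0]

def psi(r):
    """Logarithmic weight psi(r) = a*ln(1+r)."""
    return a * np.log1p(r)

def weighted_norm(values):
    """Discrete analogue of the L^1(psi) norm (230) on the grid."""
    return np.sum(np.exp(psi(np.abs(K))) * np.abs(values)) * DK

h1 = np.exp(-np.abs(K))                        # hypothetical data
h2 = 1.0 / (1.0 + K**2) ** 3                   # hypothetical data
h3 = np.convolve(h1, h2, mode="same") * DK     # discrete convolution centred on the grid

norm_product = weighted_norm(h1) * weighted_norm(h2)
norm_convolution = weighted_norm(h3)
```

Here `norm_convolution` does not exceed `norm_product`, in line with (231) for this weight.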
Let us introduce the norm in the space $E_{sc}(\psi)$ by the formula (17):
$$\|\hat H(\cdot,\cdot)\|_{E(\psi)} = \|\hat H(\cdot,\cdot)\|_{C([0,\tau_*],L^1(\psi))} = \sup_{0\le\tau\le\tau_*}\int_{\mathbb R^d}e^{\psi(|k|)}|\hat H(k,\tau)|\,dk. \tag{232}$$
Using (231) instead of (18) we obtain as in Lemma 25 the following statement.

Lemma 50. The operator $\Psi_\beta f^{(\mu,\nu)}(\vec u)$ in (215) is bounded for bounded $\vec u\in E^{2N}_{sc}(\psi)$, $\Psi_\beta f^{(\mu,\nu)}(0) = 0$, and satisfies the Lipschitz condition
$$\|\Psi_\beta f^{(\mu,\nu)}(\vec u_1) - \Psi_\beta f^{(\mu,\nu)}(\vec u_2)\|_{E^{2N}_{sc}(\psi)} \le C\tau_*\|\vec u_1 - \vec u_2\|_{E^{2N}_{sc}(\psi)}, \tag{233}$$
where $C$ depends only on $C_\chi$ as in (88), on the power of the polynomial $f^{(\mu,\nu)}$ and on $\|\vec u_1\|_{E^{2N}_{sc}(\psi)} + \|\vec u_2\|_{E^{2N}_{sc}(\psi)}$, and does not depend on $\beta \ge 0$ and on $0 < \varrho_1 < \infty$, $0 < \varrho_2 \le \infty$.

From Lemma 40 and the contraction principle we obtain the following theorem completely similar to Theorem 33.

Theorem 51. Let $\|\hat H\|_{E^{2N}_{sc}(\psi)} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (215) has a solution $\vec u\in E^{2N}_{sc}(\psi)$ which satisfies $\|\vec u\|_{E^{2N}_{sc}(\psi)} \le R_1$, and such a solution is unique.
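The existence theorems of this section all follow the same pattern: a Lipschitz constant $C\tau_* < 1$ makes the map $u\mapsto F(u) + h$ a contraction, and an approximate solution with residual $D$ then lies within $\|D\|/(1-q)$ of the exact one. The sketch below is a hedged finite-dimensional toy version of that pattern; the cubic map `nonlinearity`, the data `h` and the perturbation `d` are hypothetical stand-ins, and the stability estimate is the standard contraction-mapping bound rather than a restatement of any specific lemma.

```python
import numpy as np

def nonlinearity(u, tau):
    """Toy cubic operator F(u) = tau * u**3, a hypothetical stand-in for the integral operator."""
    return tau * u**3

def solve_fixed_point(h, tau, iterations=200):
    """Picard iteration for u = F(u) + h, started from zero."""
    u = np.zeros_like(h)
    for _ in range(iterations):
        u = nonlinearity(u, tau) + h
    return u

tau = 0.05
h = np.array([0.5, -0.3, 0.8])
u = solve_fixed_point(h, tau)
residual = np.max(np.abs(u - nonlinearity(u, tau) - h))

# Perturbed problem w = F(w) + h + d. On the ball of radius 1 in the sup norm
# the Lipschitz constant of F is q = 3 * tau < 1.
d = np.array([1e-3, -2e-3, 5e-4])
w = solve_fixed_point(h + d, tau)
q = 3.0 * tau
gap = np.max(np.abs(u - w))
gap_bound = np.max(np.abs(d)) / (1.0 - q)
```

Both iterates stay inside the unit ball, so the bound `gap <= gap_bound` applies; this is the toy counterpart of passing from a small discrepancy to a small difference of solutions.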
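The following lemma shows that $\Psi_\beta$ can be replaced by one with a small error.

Lemma 52. Let $\|\hat H\|_{L^1(\psi)} \le C$, $\psi\in W(a)$, $\Psi$ as in (25). If $s > 0$, $\epsilon > 0$ and $s < a\epsilon$, then (204) holds.

Proof. We have
$$\int\bigl|1 - \Psi(\beta^\epsilon\eta)\bigr||\hat H(\eta)|\,d\eta \le \int_{|\eta|\ge\beta^{-\epsilon}}|\hat H(\eta)|\,d\eta = \int_{|\eta|\ge\beta^{-\epsilon}}e^{-\psi(|\eta|)}e^{\psi(|\eta|)}|\hat H(\eta)|\,d\eta \le e^{-\psi(\beta^{-\epsilon})}\int_{|\eta|\ge\beta^{-\epsilon}}e^{\psi(|\eta|)}|\hat H(\eta)|\,d\eta \le \beta^s e^{\ln(\beta^{-\epsilon})s/\epsilon - \psi(\beta^{-\epsilon})}\|\hat H\|_{L^1(\psi)}. \tag{234}$$
According to Definition 48 (iii),
$$\ln(\beta^{-\epsilon})\,s/\epsilon - \psi(\beta^{-\epsilon}) \le a\ln\beta^{-\epsilon} - \psi(\beta^{-\epsilon}) \le C,$$
and we obtain (204) from (234).

Theorem 53. Let $\|\hat H\|_{E^{2N}_{sc}(\psi)} \le R$, where the weight function $\psi$ belongs to $W(a)$, and let $s < a\epsilon$. Let $\vec u$ and $\vec u_0$ be solutions to the minimal equation with cutoff factor and without cutoff factor, respectively. Then there exist $C_s$ and $\beta_0$ such that
$$\|\vec u - \vec u_0\|_{E^{2N}_{sc}(\psi)} \le C_s\beta^s, \qquad 0 < \beta \le \beta_0. \tag{235}$$

Proof. We show that $\vec u$ is an approximate solution to $\vec u_0 = f^{(\mu,\nu)}(\vec u_0) + \hat H$. Namely,
$$\vec u = \Psi_\beta f^{(\mu,\nu)}(\vec u) + \Psi_\beta\hat H = f^{(\mu,\nu)}(\vec u) + \hat H + \hat D, \qquad \hat D = (\Psi_\beta - 1)f^{(\mu,\nu)}(\vec u) + (\Psi_\beta - 1)\hat H.$$
According to Lemma 49, if $\vec u\in E^{2N}_{sc}(\psi)$ then $f^{(\mu,\nu)}(\vec u)\in E^{2N}_{sc}(\psi)$. Applying Lemma 52 we obtain
$$\|\hat D\|_{E^{2N}_{sc}(\psi)} \le C\beta^s, \qquad 0 < \beta \le \beta_0. \tag{236}$$
Lemma 27 combined with (236) yields (235).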
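The mechanism behind removing the cutoff is the tail estimate used in Lemma 52: the mass of $\hat H$ outside the ball of radius $\beta^{-\epsilon}$ is at most $e^{-\psi(\beta^{-\epsilon})}$ times the weighted norm. The hedged sketch below illustrates this in $d = 1$ with the logarithmic weight $\psi(r) = a\ln(1+r)$ and the hypothetical profile $\hat H(\eta) = (1+|\eta|)^{-(a+2)}$, for which both the tail mass and the weighted norm have closed forms; the parameter values are assumptions chosen so that $s < a\epsilon$, which is how the smallness condition is read here.

```python
import math

a, eps, s = 3.0, 0.5, 1.0      # hypothetical values with s < a * eps

def tail_mass(radius):
    """Integral of (1+|eta|)**-(a+2) over |eta| >= radius in one dimension (closed form)."""
    return 2.0 * (1.0 + radius) ** (-(a + 1.0)) / (a + 1.0)

# Weighted norm: integral of (1+|eta|)**a * (1+|eta|)**-(a+2) over the real line.
weighted_norm = 2.0

checks = []
for beta in (0.5, 0.1, 0.01):
    radius = beta ** (-eps)
    weight_bound = math.exp(-a * math.log1p(radius)) * weighted_norm   # e^{-psi(R)} * norm
    checks.append((beta, tail_mass(radius), weight_bound, weighted_norm * beta ** s))
```

In each row the tail mass is below the weighted bound, which in turn is below a constant times $\beta^s$, so the discarded part of the data is of the order required in (204).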
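Now we give the theorem on approximation by solutions of a minimal system without cutoff.

Theorem 54. Let $\hat H_{l,\zeta}(k)$, $l = 1,\ldots,N$, be functions bounded in $L^1(\psi)$, where $\psi$ belongs to $W(a)$, and let $s < a\epsilon$. Let $\hat h_{l,\zeta}(k)$ be defined by (202) and $\hat{\mathbf h}_{l,\zeta}(k) = \hat h_{l,\zeta}(k)\,\mathbf g_{n_l,\zeta}(k)$. Let $\hat{\mathbf u}(k,\tau)$ be a solution of Eq. (118) with multiwavepacket initial data of the form (33). Let $\hat u_{l,\vartheta}(k,\tau)$ be a solution to the system with a weakly universal nonlinearity (216) with initial data $\hat u_{l,\vartheta}(k,0) = \hat H_{l,\vartheta}(k)$ and
$$\hat{\mathbf u}_{\min}(k,\tau) = \sum_\vartheta\sum_{l=1}^N\beta^{-d}\hat u_{l,\vartheta}\bigl(\beta^{-1}(k - \vartheta k_{*i_l}),\tau\bigr)\,\mathbf g_{n_l,\vartheta}(k).$$
Then
$$\|\hat{\mathbf u} - \hat{\mathbf u}_{\min}\|_E \le C_{\epsilon,s}\beta^s + C\beta^{(\nu+1)(1-\epsilon)} + C\varrho^{-1}\beta^{(\mu+1)(1-\epsilon)} + C\varrho. \tag{237}$$

Proof. We take $\hat{\mathbf u} = \sum_\vartheta\sum_{l=1}^N\hat{\mathbf u}_{l,\vartheta}$ and estimate $\|\hat{\mathbf u}(k,\tau) - \hat{\mathbf u}_{\min}(k,\tau)\|_E$ applying subsequently Theorems 37, 42, formulas (194) and (205), Theorem 47 and finally Theorem 53 to obtain inequality (237).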
Note that Theorem 7 is a direct corollary of Theorem 54.

Remark 55. Note that (216) is the Fourier integral version of the following system of equations, which is based on a weakly universal nonlinearity and is slightly more general than (62):
$$\partial_\tau u_{l,\vartheta} = \frac{1}{\varrho_1}\omega'_{n_l}(k_{*l})\cdot\nabla_r u_{l,\vartheta} + \frac{i}{2\varrho_2}\nabla_r\cdot\omega''_{n_l}(k_{*l})\nabla_r u_{l,\vartheta} + f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u,\delta\vec u), \qquad u_{l,\vartheta}\bigr|_{\tau=0} = \hat H_{l,\vartheta}, \tag{238}$$
where $\delta_iu_l(r) = u_l(r + e_i) - u_l(r - e_i)$, $\varrho_1$, $\varrho_2$ are as in (222) and $e_i$ is the $i$-th standard unit vector in $\mathbb R^d$. In the case when (52) holds, $1/\varrho_2$ is bounded or small, the dependence on the coefficient $1/\varrho_2$ is regular for small $\varrho$ and $\beta$, and $u_{\vartheta,j}(k,\tau)$ may be looked at as a shape function. When $\varrho_1 = \varrho$ and $1/\varrho_2$ is substituted by zero we obtain an equation exactly of the form (62). When $\nu = 0$, $\mu = 1$ and the nk-spectrum $S$ is universally resonance invariant as in Definition 18, the nonlinearities $f^{(1,0)}_{n_l,\vartheta,0}$ are universal of the form (65). When the nk-spectrum $S$ is resonance invariant but not universally resonance invariant, the nonlinearities are weakly universal but may fail to be universal; this allows, in particular, for second and third harmonic generation.

Acknowledgement. Effort of A. Babin and A. Figotin is sponsored by the Air Force Office of Scientific Research, Air Force Materials Command, USAF, under grant number FA9550-04-1-0359. We also would like to express our deep gratitude to the reviewer for the thorough analysis of our work and valuable suggestions which helped to improve the presentation of our results.
References 1. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: I. Quadratic nonlinearity. Waves in Random Media 11, R31–R102 (2001) 2. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: II. Interaction classification for quadratic nonlinearities. Waves in Random Media 12, R25–R52 (2002) 3. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: III. Cubic Nonlinearity. Waves in Random Media 13, R41–R69 (2003) 4. Babin, A., Figotin, A.: Nonlinear Maxwell Equations in Inhomogenious Media. Commun. Math. Phys. 241, 519–581 (2003) 5. Babin, A., Figotin, A.: Polylinear spectral decomposition for nonlinear Maxwell equations. In: Agranovich, M.S., Shubin, M.A. (eds.) Partial Differential Equations, Advances in Mathematical Sciences, American Mathematical Society Translations-Series 2, Vol. 206, Providence, RI: Amer. Math. Soc., 2002, pp. 1–28 6. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: IV Nonlinear Schrodinger Equation Regime. Waves in Random and Complex Media, 15(2), 145–228 (2005) 7. Babin, A., Figotin, A.: Linear Superposition In Nonlinear Wave Dynamics. Rev. Math. Phys. 18(9), 971–1053 (2006) 8. Babin, A., Mahalov, A., Nicolaenko, B.: Global regularity of 3D rotating Navier-Stokes equations for resonant domains. Indiana Univ. Math. J. 48(3), 1133–1176 (1999) 9. Babin, A., Mahalov, A., Nicolaenko, B.: Fast Singular Oscillating Limits and Global Regularity for the 3D Primitive Equations of Geophysics. M2AN 34(2), 201–222 (2000) 10. Ben Youssef, W., Lannes, D.: The long wave limit for a general class of 2D quasilinear hyperbolic problems. Comm. Par. Differ. Eqs. 27(5–6), 979–1020 (2002) 11. Bogoliubov, N.N., Mitropolsky, Y.A.: Asymptotic Methods In The Theory Of Non-Linear Oscillations. Delhi: Hindustan Pub. Corp., 1961 12. Boyd, R.: Nonlinear Optics. London:Academic Press, 1992 13. Bona, J.L., Colin, T., Lannes, D.: Long wave approximations for water waves. Arch. Rat. Mech. Anal. 178(3), 373–410 (2005) 14. 
Bourgain, J.: Global solutions of nonlinear Schrödinger equations. American Mathematical Society Colloquium Publications 46. Providence, RI: Amer. Math. Soc., 1999
15. Butcher, P., Cotter, D.: The Elements of Nonlinear Optics. Cambridge: Cambridge Univ. Press, 1993 16. Cazenave, T.: Semilinear Schrödinger equations. Courant Lecture Notes in Mathematics 10, New York:New York University, Courant Institute of Mathematical Sciences, Providence, RI: Amer. Math. Soc. 2003 17. Colin, T.: Rigorous derivation of the nonlinear Schrödinger equation and Davey-Stewartson systems from quadratic hyperbolic systems. Asymptot. Anal. 31(1), 69–91 (2002) 18. Colin, T., Lannes, D.: Justification of and long-wave correction to Davey-Stewartson systems from quadratic hyperbolic systems. Discrete Contin. Dyn. Syst. 11(1), 83–100 (2004) 19. Craig, W., Groves, M.D.: Normal forms for wave motion in fluid interfaces. Wave Motion 31(1), 21–41 (2000) 20. Craig, W., Sulem, C., Sulem, P.-L.: Nonlinear modulation of gravity waves: a rigorous approach. Nonlinearity 5(2), 497–522 (1992) 21. Dobrokhotov, S.Yu., Maslov, V.P., Omelyanov, G.A.: Multiwave interaction in weakly nonlinear media with dispersion. In: Mathematical mechanisms of turbulence, i, Kiev: Akad. Nauk Ukrain. SSR, Inst. Mat., 1986, pp. 25–45 22. Dineen, S.: Complex Analysis on Infinite Dimensional Spaces. Berlin-Heidelberg-New york: Springer, 1999 23. Giannoulis, J., Mielke, A.: The nonlinear Schrödinger equation as a macroscopic limit for an oscillator chain with cubic nonlinearities. Nonlinearity 17(2), 551–565 (2004) 24. Goodman, R.H., Weinstein, M.I., Holmes, P.J.: Nonlinear propagation of light in one-dimensional periodic structures. J. Nonlinear Sci. 11(2), 123–168 (2001) 25. Groves, M.D., Schneider, G.: Modulating pulse solutions for quasilinear wave equations. J. Differ. Eq. 219(1), 221–258 (2005) 26. Hayashi, N., Naumkin, P.: Asymptotics of small solutions to nonlinear Schrödinger equations with cubic nonlinearities. Int. J. Pure Appl. Math. 3(3), 255–273 (2002) 27. Hille, E., Phillips, R.S.: Functional Analysis and Semigroups. Providence RI:AMS, 1991 28. 
Infeld, E., Rowlands, G.: Nonlinear Waves, Solitons, and Chaos. 2nd ed., Cambridge: Cambridge University Press, 2000 29. Joly, J.-L., Metivier, G., Rauch, J.: Diffractive nonlinear geometric optics with rectification. Indiana Univ. Math. J. 47(4), 1167–1241 (1998) 30. Kalyakin, L.A.: Long-wave asymptotics. Integrable equations as the asymptotic limit of nonlinear systems. Usp. Mat. Nauk 44(1)(265), 5–34, 247 (1989); translation in Russ. Math. Surv. 44(1), 3–42 (1989) 31. Kalyakin, L.A.: Asymptotic decay of a one-dimensional wave packet in a nonlinear dispersive medium. Math. USSR Sb. 60(2), 457–483 (1988) 32. Krieger, J., Schlag, W.: Stable manifolds for all monic supercritical focusing nonlinear Schrödinger equations in one dimension. J. Amer. Math. Soc. (Electronic) 19(4), 815–920 (2006) 33. Kuksin, S.B.: Fifteen years of KAM for PDE. Geometry, topology, and mathematical physics, Amer. Math. Soc. Transl. Ser. 2, 212, Providence, RI: Amer. Math. Soc., 2004, pp. 237–258 34. Kirrmann, P., Schneider, G., Mielke, A.: The validity of modulation equations for extended systems with cubic nonlinearities. Proc. Roy. Soc. Edinburgh Sect. A 122(1–2), 85–91 (1992) 35. Kato, T.: Perturbation Theory for Linear Operators. Berlin-Heidelberg-New York: Springer, 1980 36. Lax, P.D.: Integrals of nonlinear equations of evolution and solitary waves. Comm. Pure Appl. Math. 21, 467–490 (1968) 37. Mitropolskii, Yu.A., Nguyen, V.D.: Applied asymptotic methods in nonlinear oscillations. Solid Mechanics and its Applications 55. Dordrecht: Kluwer Academic Publishers Group, 1997 38. Maslov. V.P.: Non-standard characteristics in asymptotic problems. Usp. Mat. Nauk 38:6, 3–36 (1983), translation in Russ. Math. Surv. 38:6, 1–42 (1983) 39. Maslov, V.P.: Mathematical aspects of integral optics. Russ. J. Math. Phys. 8(1), 83–105 (2001) 40. Mielke, A., Schneider, G., Ziegra, A.: Comparison of inertial manifolds and application to modulated systems. Math. Nachr. 214, 53–69 (2000) 41. 
Moloney, J., Newell, A.: Nonlinear Optics. Advanced Book Program, Boulder, CO: Westview Press, 2004 42. Mills, D.: Nonlinear Optics. Berlin-Heidelberg-New York: Springer-Verlag, 1991 43. Nayfeh, A.H.: Perturbation Methods. New York: Wiley, 1973 44. Ostrovsky, L., Potapov, A.: Modulated Waves. Baltimore MD: The John Hopkins Univ. Press, 1999 45. Pankov, A.: Travelling Waves And Periodic Oscillations In Fermi-Pasta-Ulam Lattices. London: Imperial College Press, 2005 46. Phillips, O.M.: Wave Interactions. In: Leibovich, S., Seebass, A.R. (eds.) Nonlinear Waves. Ithaca and London: Cornell Univ. Press, 1974 47. Pierce, R.D., Wayne, C.E.: On the validity of mean-field amplitude equations for counterpropagating wavetrains. Nonlinearity 8(5), 769–779 (1995) 48. Sauter, E.G.: Nonlinear Optics. New york: Wiley-Interscience, 1996
49. Schlag, W.: Spectral theory and nonlinear partial differential equations: a survey. Discrete Contin. Dyn. Syst. 15(3), 703–723 (2006) 50. Schneider, G.: Justification of modulation equations for hyperbolic systems via normal forms. NoDEA Nonlinear Differential Equations Appl. 5(1), 69–82 (1998) 51. Schneider, G.: Justification and failure of the nonlinear Schrödinger equation in case of non-trivial quadratic resonances. J. Differ. Eq. 216(2), 354–386 (2005) 52. Schneider, G., Uecker, H.: Nonlinear coupled mode dynamics in hyperbolic and parabolic periodically structured spatially extended systems. Asymptot. Anal. 28(2), 163–180 (2001) 53. Schneider, G., Uecker, H.: Existence and stability of modulating pulse solutions in Maxwell’s equations describing nonlinear optics. Z. Angew. Math. Phys. 54(4), 677–712 (2003) 54. Schneider, G., Wayne, C.E.: Estimates for the three-wave interaction of surface water waves. European J. Appl. Math. 14(5), 547–570 (2003) 55. Sipe, J.E., Bhat, N., Chak, P., Pereira, S.: Effective field theory for the nonlinear optical properties of photonic crystals. Phys. Rev. E 69, 016604 (2004) 56. Slusher, R.E., Eggleton, B.J.: Nonlinear Photonic Crystals. Berlin-Heidelberg-New York: Springer-Verlag, 2003 57. Sulem, C., Sulem, P.-L.: The Nonlinear Schrodinger Equation. Berlin-Heidelberg-New York: Springer, 1999 58. Volkov, S.N., Sipe, J.E.: Nonlinear optical interactions of wave packets in photonic crystals: Hamiltonian dynamics of effective fields. Phys. Rev. E 70, 066621 (2004) 59. Soffer, A., Weinstein, M.I.: Resonances, radiation damping and instability in Hamiltonian nonlinear wave equations. Invent. Math. 136(1), 9–74 (1999) 60. Weissert, T.P.: The Genesis of Simulation in Dynamics: pursuing the Fermi-Pasta-Ulam problem. New York: Springer-Verlag, 1997 61. Whitham, G.: Linear and Nonlinear Waves. New York: John Wiley & Sons, 1974 Communicated by P. Constantin
Commun. Math. Phys. 278, 385–431 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0410-4
Communications in
Mathematical Physics
Random Walk on the Incipient Infinite Cluster for Oriented Percolation in High Dimensions
Martin T. Barlow (1), Antal A. Járai (2), Takashi Kumagai (3), Gordon Slade (1)
(1) Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z2, Canada. E-mail: [email protected]; [email protected]
(2) Carleton University, School of Mathematics and Statistics, 1125 Colonel By Drive, Ottawa, ON K1S 5B6, Canada. E-mail: [email protected]
(3) Department of Mathematics, Faculty of Science, Kyoto University, Kyoto 606-8502, Japan. E-mail: [email protected]
Received: 7 August 2006 / Accepted: 13 September 2007 / Published online: 8 January 2008 – © Springer-Verlag 2008
Abstract: We consider simple random walk on the incipient infinite cluster for the spread-out model of oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$. In dimensions $d > 6$, we obtain bounds on exit times, transition probabilities, and the range of the random walk, which establish that the spectral dimension of the incipient infinite cluster is $\frac{4}{3}$, and thereby prove a version of the Alexander–Orbach conjecture in this setting. The proof divides into two parts. One part establishes general estimates for simple random walk on an arbitrary infinite random graph, given suitable bounds on volume and effective resistance for the random graph. A second part then provides these bounds on volume and effective resistance for the incipient infinite cluster in dimensions $d > 6$, by extending results about critical oriented percolation obtained previously via the lace expansion.

1. Introduction and Main Results

1.1. Introduction. The problem of random walk on a percolation cluster (the ‘ant in the labyrinth’ [17]) has received much attention both in the physics and the mathematics literature. Recently, several papers have considered random walk on a supercritical percolation cluster [5,9,34,35]. Roughly speaking, supercritical percolation clusters on $\mathbb{Z}^d$ are $d$-dimensional, and these papers prove, in various ways, that a random walk on a supercritical percolation cluster behaves in a diffusive fashion similar to a random walk on the entire lattice $\mathbb{Z}^d$. Although a mathematically rigorous understanding of critical percolation clusters is restricted to examples in dimensions $d = 2$ and $d > 6$, or $d > 4$ in the case of oriented percolation, it is generally believed that critical percolation clusters in dimension $d$ have dimension less than $d$, and that random walk on a large critical cluster behaves subdiffusively. Critical percolation clusters are believed to be finite in all dimensions, and are known to be finite in the oriented setting [11].
To avoid finite-size issues associated with random walk on a finite cluster, it is convenient to consider random walk on the incipient infinite cluster (IIC), which can be understood as a critical percolation cluster
conditioned to be infinite. The IIC has been constructed so far only when $d = 2$ [29], when $d > 6$ (in the spread-out case) [24], and when $d > 4$ for oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ (again in the spread-out case) [21]. See [36] for a summary of the high-dimensional results. Also, it is not difficult to construct the IIC on a tree [7,30]. Random walk on the IIC has been proved to be subdiffusive on $\mathbb{Z}^2$ [30] and on a tree [7,30]. See also [13,14] for related results in the continuum limit.

In this paper, we prove several estimates for random walk on the IIC for spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ in dimensions $d > 6$. These estimates, which show subdiffusive behaviour, establish that the spectral dimension of the IIC is $\frac{4}{3}$, thereby proving the Alexander–Orbach [3] conjecture in this setting. For random walk on ordinary (unoriented) percolation for $d < 6$ the Alexander–Orbach conjecture is generally believed to be false [27, Sect. 7.4].

The upper critical dimension for oriented percolation is 4. Because of this, we initially expected that the spectral dimension of the IIC would be equal to $\frac{4}{3}$ for oriented percolation in all dimensions $d > 4$, but not for $d < 4$. However, our methods require that we take $d > 6$. The random walk is allowed to travel backwards in ‘time’ (as measured by the oriented percolation process), and this allows the walk to move between vertices that are not connected to each other in the oriented sense. It may be that this effect raises the upper critical dimension for the random walk in the oriented setting to $d = 6$. Or it may be that our conclusions for the random walk remain true for all dimensions $d > 4$, despite the fact that our methods force us to assume $d > 6$. This leads to the open question: Do our results actually apply in all dimensions $d > 4$, or does different behaviour apply for $4 < d \le 6$?

1.2. Random walk on graphs and in random environments.
Our results on the IIC will be consequences of more general results on random walks on a family of random graphs. We now set up our notation for this. Let $\Gamma = (G, E)$ be an infinite graph, with vertex set $G$ and edge set $E$. The edges $e \in E$ are not oriented. We assume that $\Gamma$ is connected. We write $x \sim y$ if $\{x, y\} \in E$, and assume that $(G, E)$ is locally finite, i.e., $\mu_y < \infty$ for each $y \in G$, where $\mu_y$ is the number of bonds that contain $y$. We extend $\mu$ to a measure on $G$. Let $X = (X_n, n \in \mathbb{Z}_+, P^x, x \in G)$ be the discrete-time simple random walk on $\Gamma$, i.e., the Markov chain with transition probabilities
$$P^x(X_1 = y) = \frac{1}{\mu_x}, \quad y \sim x. \tag{1.1}$$
We define the transition density (or discrete-time heat kernel) of $X$ by
$$p_n(x, y) = \frac{P^x(X_n = y)}{\mu_y}; \tag{1.2}$$
we have $p_n(x, y) = p_n(y, x)$. The natural metric on $\Gamma$, obtained by counting the number of steps in the shortest path between points, is written $d(x, y)$ for $x, y \in G$. We write
$$B(x, r) = \{y : d(x, y) < r\}, \quad V(x, r) = \mu(B(x, r)), \quad r \in (0, \infty). \tag{1.3}$$
Following terminology used for manifolds, we call $V(x, r)$ the volume of the ball $B(x, r)$. We will assume $G$ contains a marked vertex, which we denote 0, and we write
$$B(R) = B(0, R), \quad V(R) = V(0, R). \tag{1.4}$$
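The definitions (1.1)–(1.3) can be tried out on a small finite graph. The following is a minimal sketch, not taken from the paper: the adjacency-list representation, the function names and the toy graph `TOY` are all assumptions made for illustration, and the graph is finite whereas the text works with infinite graphs. It computes $p_n(x, y)$ exactly with rational arithmetic, so the symmetry $p_n(x, y) = p_n(y, x)$ can be checked directly.

```python
from fractions import Fraction


def step_distribution(adj, dist):
    """One step of simple random walk: jump to a uniformly chosen neighbour."""
    new = {}
    for x, mass in dist.items():
        share = mass / len(adj[x])  # each neighbour gets mass / mu_x, as in (1.1)
        for y in adj[x]:
            new[y] = new.get(y, 0) + share
    return new


def transition_density(adj, x, y, n):
    """p_n(x, y) = P^x(X_n = y) / mu_y, as in (1.2), computed exactly."""
    dist = {x: Fraction(1)}
    for _ in range(n):
        dist = step_distribution(adj, dist)
    return dist.get(y, Fraction(0)) / len(adj[y])


def ball(adj, x, r):
    """B(x, r) = {y : d(x, y) < r}, found by breadth-first search."""
    dist = {x: 0}
    frontier = [x]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return {y for y, d in dist.items() if d < r}


def volume(adj, x, r):
    """V(x, r) = mu(B(x, r)), the sum of the degrees over the ball."""
    return sum(len(adj[y]) for y in ball(adj, x, r))


# Hypothetical toy graph: a triangle {0, 1, 2} with a pendant path 2 - 3 - 4.
TOY = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
```

For instance, `transition_density(TOY, 0, 3, 5)` and `transition_density(TOY, 3, 0, 5)` agree, even though the two vertices have different neighbourhoods; dividing by $\mu_y$ in (1.2) is what makes the kernel symmetric.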
For $A \subset G$, we write
$$T_A = \inf\{n \ge 0 : X_n \in A\}, \quad \tau_A = T_{A^c}, \tag{1.5}$$
and let
$$\tau_R = \tau_{B(0,R)} = \min\{n \ge 0 : X_n \notin B(0, R)\}. \tag{1.6}$$
Let $W_n = \{X_0, X_1, \ldots, X_n\}$ be the set of vertices hit by $X$ up to time $n$, and let
$$S_n = \mu(W_n) = \sum_{x \in W_n} \mu_x. \tag{1.7}$$
We write $R_{\mathrm{eff}}(0, B(R)^c)$ for the effective resistance between 0 and $B(R)^c$ in the electric network obtained by making each edge of $\Gamma$ a unit resistor (see [15]). A precise mathematical definition of $R_{\mathrm{eff}}(\cdot, \cdot)$ will be given in Sect. 2.

We now consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ carrying a family of random graphs $\Gamma(\omega) = (G(\omega), E(\omega), \omega \in \Omega)$. We assume that, for each $\omega \in \Omega$, the graph $\Gamma(\omega)$ is infinite, locally finite and connected, and contains a marked vertex $0 \in G$. We denote balls in $\Gamma(\omega)$ by $B_\omega(x, r)$, their volume by $V_\omega(x, r)$, and write
$$B(R) = B_\omega(R) = B_\omega(0, R), \quad V(R) = V_\omega(R) = V_\omega(0, R). \tag{1.8}$$
We write $X = (X_n, n \ge 0, P^x_\omega, x \in G(\omega))$ for the simple random walk on $\Gamma(\omega)$, and denote by $p^\omega_n(x, y)$ its transition density with respect to $\mu(\omega)$. Formally, we introduce a second measure space $(\bar\Omega, \bar{\mathcal{F}})$, and define $X$ on the product $\Omega \times \bar\Omega$. We write $\bar\omega$ to denote elements of $\bar\Omega$.

The key ingredients in our analysis of the simple random walk are volume and resistance bounds. The following defines a set $J(\lambda)$ of values of $R$ for which we have ‘good’ volume and effective resistance estimates. The set $J(\lambda)$ depends on the graph $\Gamma$, and thus is a random set under $\mathbb{P}$.

Definition 1.1. Let $\Gamma = (G, E)$ be as above. For $\lambda > 1$, let $J(\lambda)$ be the set of those $R \in [1, \infty]$ such that the following all hold:
(1) $V(R) \le \lambda R^2$,
(2) $V(R) \ge \lambda^{-1} R^2$,
(3) $R_{\mathrm{eff}}(0, B(R)^c) \ge \lambda^{-1} R$.

Note that $R_{\mathrm{eff}}(0, B(R)^c) \le R$ (see Lemma 2.2(c) in Sect. 2.1), so there is no need for an upper bound complementary to Definition 1.1(3). We now make the following important assumption concerning the graphs $(\Gamma(\omega))$. This involves upper and lower bounds on the volume, as well as an estimate which says that $R$ is likely to be in $J(\lambda)$ for large enough $\lambda$.

Assumption 1.2. There exists $R^* \ge 1$ such that the following hold:
(1) There exists $p(\lambda) \ge 0$, with $p(\lambda) \le c_1 \lambda^{-q_0}$ for some $q_0, c_1 > 0$, such that for each $R \ge R^*$,
$$\mathbb{P}(R \in J(\lambda)) \ge 1 - p(\lambda), \tag{1.9}$$
(2) $\mathbb{E}[V(R)] \le c_2 R^2$ for $R \in [R^*, \infty)$,
(3) $\mathbb{E}[1/V(R)] \le c_3 R^{-2}$ for $R \in [R^*, \infty)$.
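Membership of a given $R$ in $J(\lambda)$ can be checked numerically on a finite piece of a graph. The sketch below is illustrative only and rests on stated assumptions: it uses the standard characterisation of effective resistance through the harmonic potential that equals 1 at the root and 0 on $B(R)^c$ (the paper's own definition is deferred to Sect. 2), it assumes a finite connected graph with integer-labelled vertices and $B(R)^c$ nonempty, and all function names are hypothetical. The test graph is a segment of the half-line $\mathbb{Z}_+$, where $R_{\mathrm{eff}}(0, B(R)^c) = R$ for integer $R$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination over fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def ball(adj, root, r):
    """B(root, r) = {y : d(root, y) < r}."""
    dist = {root: 0}
    frontier = [root]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return {y for y, d in dist.items() if d < r}


def effective_resistance(adj, root, r):
    """R_eff(root, B(root, r)^c) with every edge a unit resistor."""
    inside = sorted(ball(adj, root, r) - {root})
    index = {v: i for i, v in enumerate(inside)}
    # Unknowns: potentials h(v) for v in B minus the root; h(root) = 1, h = 0 outside B.
    a = [[Fraction(0)] * len(inside) for _ in inside]
    b = [Fraction(0)] * len(inside)
    for v in inside:
        i = index[v]
        a[i][i] = Fraction(len(adj[v]))
        for w in adj[v]:
            if w == root:
                b[i] += 1
            elif w in index:
                a[i][index[w]] -= 1
    h = solve(a, b) if inside else []
    # Total current leaving the root at unit voltage; the resistance is its inverse.
    current = sum(1 - (h[index[w]] if w in index else 0) for w in adj[root])
    return 1 / Fraction(current)


def in_J(adj, root, r, lam):
    """Check the three conditions of Definition 1.1 for this r and lambda."""
    vol = sum(len(adj[y]) for y in ball(adj, root, r))
    reff = effective_resistance(adj, root, r)
    return vol <= lam * r * r and vol * lam >= r * r and reff * lam >= r


def half_line(n_max):
    """The segment {0, ..., n_max} of Z_+ as an adjacency list."""
    return {k: [j for j in (k - 1, k + 1) if 0 <= j <= n_max] for k in range(n_max + 1)}
```

On `half_line(10)` the ball $B(5)$ has volume 9, so `in_J(half_line(10), 0, 5, 3)` holds while `in_J(half_line(10), 0, 5, 2)` fails on the lower volume bound. Since the volume of the half-line grows only linearly, no single $\lambda$ works for all $R$, which is consistent with the half-line not being an example for Assumption 1.2.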
Remark. Assumption 1.2(2,3), together with Markov’s inequality, provides upper bounds of the form $c\lambda^{-1}$ for the probability of the complements of the events in Definition 1.1(1,2). This creates some redundancy in our formulation, but we state things this way because some of our conclusions for the random walk rely only on Assumption 1.2(1) and do not require the stronger volume bounds given by Assumption 1.2(2,3).

Note that Assumption 1.2 only involves statements about the volume and resistance from one point 0 in the graph. In general, this kind of information would not be enough to give much control of the random walk. However, the graphs considered here have strong recurrence properties, and are therefore simpler to handle than general graphs. We use techniques developed in [6,7,37–39].

We will prove in Theorem 1.7 that Assumption 1.2 holds for the IIC for sufficiently spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ when $d > 6$. As the reader of Sects. 4–5 will see, obtaining volume and (especially) resistance bounds on the IIC from one base point is already difficult; it is fortunate that we do not need to assume more.

We have the following four consequences of Assumption 1.2 for random graphs. They give control, in different ways, of the quantities $E^0_\omega \tau_R$, $p^\omega_{2n}(0, 0)$, $d(0, X_n)$, and $S_n$, which measure the rate of dispersion of the random walk $X$ from the base point 0. Some statements in the first proposition involve the averaged law defined by the semi-direct product $P^* = \mathbb{P} \times P^0_\omega$.

Theorem 1.3. Suppose Assumption 1.2(1) holds. Then, uniformly with respect to $n \ge 1$ and $R \ge 1$,
$$\mathbb{P}(\theta^{-1} \le R^{-3} E^0_\omega \tau_R \le \theta) \to 1 \quad \text{as } \theta \to \infty, \tag{1.10}$$
$$\mathbb{P}(\theta^{-1} \le n^{2/3} p^\omega_{2n}(0, 0) \le \theta) \to 1 \quad \text{as } \theta \to \infty, \tag{1.11}$$
$$P^*(d(0, X_n) n^{-1/3} < \theta) \to 1 \quad \text{as } \theta \to \infty, \tag{1.12}$$
$$P^*(\theta^{-1} < (1 + d(0, X_n)) n^{-1/3}) \to 1 \quad \text{as } \theta \to \infty. \tag{1.13}$$
Since $P^0_\omega(X_{2n} = 0) \approx n^{-2/3}$, we cannot replace $1 + d(0, X_n)$ by $d(0, X_n)$ in (1.13).

Theorem 1.4. Suppose Assumption 1.2(1,2,3) hold. Then there exists $n^* \ge 1$ (depending only on $R^*$ and the function $p(\cdot)$ in Assumption 1.2), and constants $c_i$ such that
$$c_1 R^3 \le \mathbb{E}(E^0_\omega \tau_R) \le c_2 R^3 \quad \text{for all } R \ge 1, \tag{1.14}$$
$$c_3 n^{-2/3} \le \mathbb{E}(p^\omega_{2n}(0, 0)) \le c_4 n^{-2/3} \quad \text{for all } n \ge n^*, \tag{1.15}$$
$$c_5 n^{1/3} \le \mathbb{E}(E^0_\omega d(0, X_n)) \quad \text{for all } n \ge n^*. \tag{1.16}$$
We do not have an upper bound in (1.16); this is discussed further in Example 2.6 below.

Remark. The above two theorems in fact do not require the polynomial decay of $p(\lambda)$; it is enough to have $p(\lambda) \to 0$ as $\lambda \to \infty$.

Let $d_s(G)$ be the spectral dimension of $G$, defined by
$$d_s(G) = -2 \lim_{n \to \infty} \frac{\log p_{2n}(x, x)}{\log n}, \tag{1.17}$$
if this limit exists. Here $x \in G$; it is easy to see that the limit is independent of the base point $x$. Note that $d_s(\mathbb{Z}^d) = d$. In (c) below, recall that $\bar\Omega$ is the second probability space, on which the random walk $X$ is defined.
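The definition (1.17) and the remark that $d_s(\mathbb{Z}^d) = d$ can be illustrated in the simplest case $d = 1$, where $P^0(X_{2n} = 0) = \binom{2n}{n} 4^{-n}$ and $\mu_0 = 2$. The sketch below is an illustration under assumptions rather than anything from the paper: the function names are hypothetical, and instead of the ratio $\log p_{2n} / \log n$, which converges slowly because of the constant prefactor, it uses a difference quotient between $n$ and $2n$, which has the same limit whenever $p_{2n}(0,0)$ behaves like a constant times a power of $n$.

```python
import math


def log_return_density_on_Z(n):
    """log p_{2n}(0, 0) for simple random walk on Z, where mu_0 = 2.

    P^0(X_{2n} = 0) = binom(2n, n) / 4^n is evaluated through lgamma so that
    large n does not overflow.
    """
    log_prob = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1) - 2 * n * math.log(2)
    return log_prob - math.log(2)  # divide by mu_0 = 2, as in (1.2)


def spectral_dimension_estimate(n):
    """Finite-n proxy for (1.17): -2 times the log-log slope between n and 2n."""
    slope = (log_return_density_on_Z(2 * n) - log_return_density_on_Z(n)) / math.log(2)
    return -2 * slope
```

With $n = 10^4$ the estimate is within $10^{-3}$ of 1. For a graph with $p_{2n}(0,0)$ of order $n^{-2/3}$, as in (1.11), the same slope would be close to $\frac{4}{3}$.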
Theorem 1.5. Suppose Assumption 1.2(1) holds. Then there exist $\alpha_1, \alpha_2, \alpha_3, \alpha_4 < \infty$, and a subset $\Omega_0$ with $\mathbb{P}(\Omega_0) = 1$ such that the following statements hold:
(a) For each $\omega \in \Omega_0$ and $x \in G(\omega)$ there exists $N_x(\omega) < \infty$ such that
$$(\log n)^{-\alpha_1} n^{-2/3} \le p^\omega_{2n}(x, x) \le (\log n)^{\alpha_1} n^{-2/3}, \quad n \ge N_x(\omega). \tag{1.18}$$
In particular, $d_s(G) = \frac{4}{3}$, $\mathbb{P}$-a.s., and the random walk is recurrent.
(b) For each $\omega \in \Omega_0$ and $x \in G(\omega)$ there exists $R_x(\omega) < \infty$ such that
$$(\log R)^{-\alpha_2} R^3 \le E^x_\omega \tau_R \le (\log R)^{\alpha_2} R^3, \quad R \ge R_x(\omega). \tag{1.19}$$
Hence
$$\lim_{R \to \infty} \frac{\log E^x_\omega \tau_R}{\log R} = 3.$$
(c) Let $Y_n = \max_{0 \le k \le n} d(0, X_k)$. For each $\omega \in \Omega_0$ and $x \in G(\omega)$ there exist $N_x(\omega, \bar\omega)$, $R_x(\omega, \bar\omega)$ such that $P^x_\omega(N_x < \infty) = P^x_\omega(R_x < \infty) = 1$, and such that
$$(\log n)^{-\alpha_3} n^{1/3} \le Y_n(\omega, \bar\omega) \le (\log n)^{\alpha_3} n^{1/3}, \quad n \ge N_x(\omega, \bar\omega), \tag{1.20}$$
$$(\log R)^{-\alpha_4} R^3 \le \tau_R(\omega, \bar\omega) \le (\log R)^{\alpha_4} R^3, \quad R \ge R_x(\omega, \bar\omega). \tag{1.21}$$

Remark. One cannot expect (1.18) or (1.19) to hold with $\alpha_1 = 0$ or $\alpha_2 = 0$, since it is known that log log fluctuations occur in the analogous limits for the IIC on regular trees [7]. (This example is discussed further in Example 1.8(i) below.)

Let $W_n = \{X_0, X_1, \ldots, X_n\}$ as before and let $|W_n|$ denote its cardinality. For a sufficiently regular recurrent graph one expects that $|W_n| \approx n^{d_s/2}$. The original formulation of the Alexander–Orbach conjecture [3] was that, in all dimensions, for the IIC,
$$|W_n| \approx n^{2/3}, \tag{1.22}$$
so that $d_s = \frac{4}{3}$ in all dimensions. As noted already above, the conjecture is now not believed to hold in low dimensions. The following theorem shows that a version of the Alexander–Orbach conjecture does hold for random graphs that satisfy Assumption 1.2(1). As we will see in Theorem 1.7, this is the case for the IIC for sufficiently spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ for $d > 6$.

Theorem 1.6. (a) Suppose Assumption 1.2(1) holds. Then there exists a subset $\Omega_0$ with $\mathbb{P}(\Omega_0) = 1$ such that for each $\omega \in \Omega_0$ and $x \in G(\omega)$,
$$\lim_{n \to \infty} \frac{\log S_n}{\log n} = \frac{2}{3}, \quad P^x_\omega\text{-a.s.} \tag{1.23}$$
(b) Suppose in addition there exists a constant $c_0$ such that all vertices in $G$ have degree less than $c_0$. Then
$$\lim_{n \to \infty} \frac{\log |W_n|}{\log n} = \frac{2}{3}, \quad P^x_\omega\text{-a.s.} \tag{1.24}$$
See Example 1.8 for a graph with unbounded degree which satisfies Assumption 1.2, but for which (1.24) fails.

Remark. See [32] for results which generalise the above theorems to the situation where there exist indices $\alpha < \beta$ such that $V(R)$ is comparable to $R^\alpha$ and $R_{\mathrm{eff}}(0, B(R)^c)$ is comparable to $R^{\beta - \alpha}$. Our case is $\alpha = 2$, $\beta = 3$.
1.3. The IIC. In this section, we define the oriented percolation model and recall the construction of the IIC for spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ in dimensions $d > 4$ [21]. For simplicity, we will consider only the most basic example of a spread-out model. (In the physics literature, oriented percolation is usually called directed percolation; see [28].)

The spread-out oriented percolation model is defined as follows. Consider the graph with vertices $\mathbb{Z}^d \times \mathbb{Z}_+$ and directed bonds $((x, n), (y, n + 1))$, for $n \ge 0$ and $x, y \in \mathbb{Z}^d$ with $0 \le \|x - y\|_\infty \le L$. Here $L$ is a fixed positive integer and $\|x\|_\infty = \max_{i=1,\ldots,d} |x_i|$ for $x = (x_1, \ldots, x_d) \in \mathbb{Z}^d$. Let $p \in [0, 1]$. We associate to each directed bond $((x, n), (y, n + 1))$ an independent random variable taking the value 1 with probability $p$ and 0 with probability $1 - p$. We say a bond is occupied when the corresponding random variable is 1, and vacant when the random variable is 0. Given a configuration of occupied bonds, we say that $(x, n)$ is connected to $(y, m)$, and write $(x, n) \longrightarrow (y, m)$, if there is an oriented path from $(x, n)$ to $(y, m)$ consisting of occupied bonds, or if $(x, n) = (y, m)$. Let $C(x, n)$ denote the forward cluster of $(x, n)$, i.e., $C(x, n) = \{(y, m) : (x, n) \longrightarrow (y, m)\}$, and let $|C(x, n)|$ denote its cardinality.

The joint probability distribution of the bond variables will be denoted $\mathbb{P}$, with corresponding expectation denoted $\mathbb{E}$; these depend on $p$ and are defined on a probability space $(\Omega, \tilde{\mathcal{F}}, \mathbb{P})$. Let $\theta(p) = \mathbb{P}(|C(0, 0)| = \infty)$. For all dimensions $d \ge 1$ and for all $L \ge 1$, there is a critical value $p_c = p_c(d, L) \in (0, 1)$ such that $\theta(p) = 0$ for $p \le p_c$ and $\theta(p) > 0$ for $p > p_c$. In particular, there is no infinite cluster when $p = p_c$ [11,19]. For the remainder of this paper, we fix $p = p_c$, so that $\mathbb{P} = \mathbb{P}_{p_c}$.

To define the IIC, some terminology is required. A cylinder event is an event that is determined by the occupation status of a finite set of bonds.
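The spread-out model defined earlier in this section can be sampled level by level, which may help in picturing the forward cluster $C(0, 0)$. The following is a minimal sketch under stated assumptions: the function name, the argument order and the use of a seeded `random.Random` are illustrative choices, and the value of $p$ is supplied by the caller, since the critical value $p_c(d, L)$ is not computed here.

```python
import itertools
import random


def forward_cluster_levels(d, L, p, n_max, rng):
    """Levels 0..n_max of the forward cluster C(0, 0).

    The m-th entry of the returned list is the set of x in Z^d with
    (0, 0) --> (x, m). Each directed bond ((x, n), (y, n + 1)) with
    ||x - y||_inf <= L is occupied independently with probability p.
    """
    offsets = list(itertools.product(range(-L, L + 1), repeat=d))
    levels = [{(0,) * d}]
    for _ in range(n_max):
        nxt = set()
        for x in levels[-1]:
            for off in offsets:
                # Each bond has a unique lower endpoint, so it is examined at
                # most once and can be sampled on the fly.
                if rng.random() < p:
                    nxt.add(tuple(a + b for a, b in zip(x, off)))
        levels.append(nxt)
    return levels
```

A call such as `forward_cluster_levels(1, 2, 0.2, 50, random.Random(0))` returns the successive levels of one sample; at $p = 1$ every bond is occupied and level $m$ is the full box of side $2mL + 1$.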
We denote the algebra of cylinder events by $\mathcal{F}_0$, and define $\mathcal{F}$ to be the $\sigma$-algebra generated by $\mathcal{F}_0$. The most natural definition of the IIC is as follows. Let $\{(x, m) \longrightarrow n\}$ denote the event that there exists $(y, n)$ such that $(x, m) \longrightarrow (y, n)$. Let
$$\mathbb{Q}_n(E) = \mathbb{P}(E \mid (0, 0) \longrightarrow n) \quad (E \in \mathcal{F}_0), \tag{1.25}$$
and define the IIC by
$$\mathbb{Q}_\infty(E) = \lim_{n \to \infty} \mathbb{Q}_n(E) \quad (E \in \mathcal{F}_0), \tag{1.26}$$
assuming the limit exists. A possible alternate definition of the IIC is to define
$$\mathbb{P}_n(E) = \frac{1}{\tau_n} \sum_{x \in \mathbb{Z}^d} \mathbb{P}(E \cap \{(0, 0) \longrightarrow (x, n)\}) \quad (E \in \mathcal{F}_0) \tag{1.27}$$
with $\tau_n = \sum_{x \in \mathbb{Z}^d} \mathbb{P}((0, 0) \longrightarrow (x, n))$, and to let
$$\mathbb{P}_\infty(E) = \lim_{n \to \infty} \mathbb{P}_n(E) \quad (E \in \mathcal{F}_0), \tag{1.28}$$
assuming the limit exists.

Let $d + 1 > 4 + 1$ and $p = p_c$. It was proved in [21] that there is an $L_0 = L_0(d)$ such that for $L \ge L_0$ the limit (1.28) exists for every cylinder event $E \in \mathcal{F}_0$. Moreover, $\mathbb{P}_\infty$ extends to a probability measure on the $\sigma$-algebra $\mathcal{F}$, and, writing $C = C(0, 0)$, $C$ is $\mathbb{P}_\infty$-a.s. an infinite cluster. It was also proved in [21] that if the critical survival probability $\mathbb{P}((0, 0) \longrightarrow n)$ is asymptotic to a multiple of $n^{-1}$ as $n \to \infty$, then for $L_0 = L_0(d)$ the limit (1.26) exists and defines a probability measure on $\mathcal{F}$, and moreover $\mathbb{Q}_\infty = \mathbb{P}_\infty$ so both constructions yield the same measure. Subsequently, it was shown in [22,23] that
Fig. 1. Although the vertex $(x, m)$ is not connected to $(y, n)$, or vice versa, in the sense of oriented percolation (oriented upwards), it is nevertheless possible for a random walk to move from one of these vertices to the other.
the survival probability is indeed asymptotic to a multiple of $n^{-1}$ when $d + 1 > 4 + 1$ and $L \ge L_0(d)$. We will find both of the equivalent definitions (1.26) and (1.28) to be useful. We call $(C, \mathbb{Q}_\infty) = (C, \mathbb{P}_\infty)$ the IIC, and this provides the random environment for our random walk. We write $\mathbb{E}_\infty$ for expectation with respect to $\mathbb{Q}_\infty$. It will be convenient to remove a $\mathbb{Q}_\infty$-null set $N$ from the configuration space $\Omega$, so that for all $\omega \in \Omega_0 = \Omega - N$ the cluster $C(\omega)$ is infinite (and connected).

The IIC $C(\omega)$, $\omega \in \Omega$ under the law $\mathbb{Q}_\infty$ gives a family of random graphs, with marked vertex $0 = (0, 0)$, so as in Sect. 1.2 we can define a random walk $X = (X_j, j \in \mathbb{Z}_+, P^{(x,n)}_\omega, (x, n) \in C(\omega))$. Note that although the orientation is used to construct the cluster $C$, once $C$ has been determined the random walk on $C$ can move in any direction (see Fig. 1).

Theorem 1.7. For $d > 6$, there is an $L_1 = L_1(d) \ge L_0(d)$ such that for all $L \ge L_1$, Assumption 1.2(1)–(3) hold with $q_0 = 1$ and constants $c_1, c_2, c_3$ independent of $d$ and $L$. Consequently, the conclusions of Theorems 1.3, 1.4, 1.5 and 1.6 all hold for the random walk on the IIC. In particular, the Alexander–Orbach conjecture holds in the form of (1.24).

As we will see later, the restriction to $d > 6$ is required only for our estimate of the effective resistance.

Remark. Since the constants in Assumption 1.2 are independent of $d$, $L$ for the IIC (provided $d > 6$ and $L \ge L_1(d)$), the constants $\alpha_1, \ldots, \alpha_4$ in Theorem 1.5 are also independent of $d$ and $L$ when applied to the IIC.

The proofs of our main results are performed in two principal steps, corresponding to the results in Sect. 1.2 and Theorem 1.7 respectively. The results in Sect. 1.2 are proved in Sect. 2. The first step is to obtain estimates for a fixed (non-random) graph $\Gamma$. In Sect. 2.1, using arguments based on those in [6] and [7], we show that volume and resistance bounds on $\Gamma$ lead to bounds on transition probabilities and hitting times. Then, in Sect. 2.2 we translate these results into the random graph context, and prove Theorems 1.3–1.6. The second step is the proof of Theorem 1.7. Section 3 states three properties of the IIC for critical spread-out oriented percolation in dimensions $d > 6$, and shows that these imply Theorem 1.7. These properties are proved in Sects. 4–5, using an extension of results of [21,22,26] that were obtained using the lace expansion.
1.4. Further examples. We have some other examples of random graphs which satisfy Assumption 1.2.

Example 1.8.
(i) Assumption 1.2 holds for random walk on the IIC for the binomial tree; see [7, Corollary 2.12]. Therefore the conclusions of Theorems 1.3–1.6 hold for a random walk on this IIC. The results of [7] go beyond Theorem 1.5(a) and (b) in this context, but Theorem 1.5(c) and Theorem 1.6 here are new.
(ii) It is shown in [4] that the invasion percolation cluster on a regular tree is stochastically dominated by the IIC for the binomial tree. Consequently, upper bounds on the volume and lower bounds on the effective resistance of the invasion percolation cluster follow from the corresponding bounds for the IIC (using Lemma 2.2(e) in Sect. 2.1). Assumption 1.2(1,2) for the invasion percolation cluster therefore follows from its counterpart for the IIC for the binomial tree. In addition, the lower bound on the volume in Assumption 1.2(3) is proved for the invasion percolation cluster in [4]. Therefore Assumption 1.2 holds for the invasion percolation cluster on a regular tree, and hence simple random walk on the invasion percolation cluster also obeys the conclusions of Theorems 1.3–1.6. See [4] for further details about this example.
(iii) Consider the incipient infinite branching random walk (IIBRW), obtained as the limit as $n \to \infty$ of critical branching random walk (say with binomial offspring distribution) conditioned to survive to at least $n$ generations [20, Sect. 2]. We interpret the IIBRW as a random infinite subgraph of $\mathbb{Z}^d \times \mathbb{Z}_+$. There is the option of considering either one edge per particle jump, leading to the occurrence of multiple edges between vertices, or identifying any such multiple edges as a single edge; we believe both options will behave similarly in dimensions $d > 4$. Consider simple random walk on the IIBRW. Our volume estimates for the IIC for oriented percolation for $d > 4$ will adapt to give similar estimates for the IIBRW for $d > 4$. The effective resistance $R_{\mathrm{eff}}(0, B(R)^c)$ for the IIBRW is lower than it is for the IIC on a tree, due to cycles in the IIBRW. It is an interesting open problem to obtain a lower bound on $R_{\mathrm{eff}}(0, B(R)^c)$ for the IIBRW, to establish Assumption 1.2 and hence its consequences Theorems 1.3–1.6 for random walk on the IIBRW. Our main interest is the question: Does random walk on the IIBRW have the same behaviour in all dimensions $d > 4$, or is there different behaviour for $4 < d \le 6$ and $d > 6$? An answer would shed light on the question raised at the end of Sect. 1.1. It would also be of interest to consider this question in the continuum limit: Brownian motion on the canonical measure of super-Brownian motion conditioned to survive for all time (see [20]).
(iv) A non-random graph $\Gamma$ satisfies Assumption 1.2 if and only if there exists $\lambda$ such that $J(\lambda) = [1, \infty)$. If $\Gamma_i$, $1 \le i \le n$, are graphs satisfying Assumption 1.2 then the graph obtained by joining the $\Gamma_i$ at their marked vertices also satisfies Assumption 1.2.
(v) Consider the non-random graph consisting of $\mathbb{Z}_+$ with for each $n$ a finite subgraph $G_n$ connected by one point in $G_n$ to the vertex $n$. If $\mu(G_n) \asymp n$ and the diameter of $G_n$ is $o(n)$ then Assumption 1.2 holds. In particular, if we take $G_n$ to be the complete graph with $r_n = n^{1/2}$ vertices, then while $V(R) \asymp R^2$, we have $|B(R)| \asymp R^{3/2}$. In this case (1.23) holds, whereas
$$\lim_{n \to \infty} \frac{\log |W_n|}{\log n} = \frac{1}{2}, \quad P^x_\omega\text{-a.s.} \tag{1.29}$$
The rough idea behind (1.29) is as follows. By (1.20), the distance travelled up to time $n$ is approximately $n^{1/3}$. The proof of Theorem 1.6 shows that the random walk will visit a positive fraction of the vertices within this distance, and there are of order $(n^{1/3})^{3/2} = n^{1/2}$ such vertices, leading to (1.29). This shows that some bound on vertex degree is necessary before one can pass from (1.23) to (1.24).
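The contrast in Example 1.8(v) between the volume $V(R)$ and the cardinality $|B(R)|$ can be seen numerically on a finite truncation of the graph. The sketch below is only illustrative and makes several assumptions that are not in the text: $r_n$ is taken to be $\lceil n^{1/2} \rceil$ so that it is an integer, one vertex of $G_n$ is joined to the spine vertex $n$ by a single edge, the vertex labels are hypothetical, and the truncation level must be at least $R$ for the ball to be computed correctly.

```python
import math


def build_example_graph(n_max):
    """Z_+ truncated at n_max, with a complete graph G_n hung off each n >= 1.

    Spine vertices are ("s", n); the vertices of G_n are ("g", n, i) for
    0 <= i < r_n with r_n = ceil(sqrt(n)), and ("g", n, 0) is joined to
    ("s", n) by one edge (an assumed reading of the construction).
    """
    adj = {("s", n): set() for n in range(n_max + 1)}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for n in range(n_max):
        add_edge(("s", n), ("s", n + 1))
    for n in range(1, n_max + 1):
        r_n = math.isqrt(n - 1) + 1  # equals ceil(sqrt(n)) for n >= 1
        blob = [("g", n, i) for i in range(r_n)]
        add_edge(("s", n), blob[0])
        for i in range(r_n):
            for j in range(i + 1, r_n):
                add_edge(blob[i], blob[j])
    return adj


def ball_stats(adj, root, r):
    """Return (V(r), |B(r)|) for the ball B(root, r) = {y : d(root, y) < r}."""
    dist = {root: 0}
    frontier = [root]
    while frontier:
        nxt = []
        for u in frontier:
            if dist[u] + 1 >= r:
                continue  # neighbours of u would lie outside the ball
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return sum(len(adj[y]) for y in dist), len(dist)
```

For $R = 50, 100, 200$ the ratio $V(R)/R^2$ stays close to $\frac{1}{2}$ while $|B(R)|/R^{3/2}$ stays of order one, in line with the two growth rates quoted in the example.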
Throughout the paper, we use $c, c'$ to denote strictly positive finite constants whose values are not significant and may change from line to line. We write $c_i$ for positive constants whose values are fixed within theorems and lemmas.

2. Random Walk on a Random Graph

In this section we prove Theorems 1.3–1.6. First, in Sect. 2.1, we study the random walk on a fixed graph; then, in Sect. 2.2 we apply these results to a family of random graphs satisfying Assumption 1.2.

2.1. Random walk on a fixed graph. In this section, we fix an infinite locally-finite connected graph $\Gamma = (G, E)$, and will show that bounds on the quantities $V(R)$ and $R_{\mathrm{eff}}(0, B(R)^c)$ lead to control of $E^0 \tau_R$, $p_n(0, 0)$ and $E^0 d(0, X_n)$. The results in [6] (see [6, Theorem 1.3, Lemma 2.2]) cover the case where, for all $x \in G$ and $R \ge 1$,
$$c_1 R^2 \le V(x, R) \le c_2 R^2, \quad c_3 R \le R_{\mathrm{eff}}(x, B(x, R)^c) \le c_4 R. \tag{2.1}$$
Here, we treat the case where we only have information available on the volume and effective resistance from one fixed point 0 in the graph, and only for certain values of $R$. Our methods are very close to those of [6], but the need to keep track of the values of $R$ for which we make use of the bounds makes the details of the proofs more complicated.

The following proposition gives the majority of the bounds on $\tau_R$, $p_n(0, 0)$ and $d(0, X_n)$ that will be used in Sect. 2.2. Recall the definition of $J(\lambda)$ from Definition 1.1. In the following proposition, we will take $\lambda \ge 1$ and assume that $R$, and certain multiples of $R$, are in $J(\lambda)$. We then obtain (for example) bounds on $E^0 \tau_R$; these bounds will involve constants depending on $\lambda$. For the limit Theorems 1.5 and 1.6 we need to know that the dependence of these constants on $\lambda$ is polynomial in $\lambda$. To indicate this, we write $C_i(\lambda)$ to denote positive constants of the form $C_i(\lambda) = C_i \lambda^{\pm q_i}$, which will be fixed throughout this section. The sign accompanying $q_i > 0$ is such that statements become weaker as $\lambda$ increases.

Proposition 2.1. Let $\lambda \ge 1$. There exist $C_1(\lambda), \ldots, C_9(\lambda)$ such that the following hold:
(a) Suppose that $R \in J(\lambda)$. Then
$$E^x \tau_R \le 2\lambda R^3 \quad \text{for } x \in B(R). \tag{2.2}$$
Suppose that $R, R/(4\lambda) \in J(\lambda)$. Then
$$E^x \tau_R \ge C_1(\lambda) R^3 \quad \text{for } x \in B(0, R/(4\lambda)). \tag{2.3}$$
Let $\varepsilon < 1/(4\lambda)$ and $R, \varepsilon R, \varepsilon R/(4\lambda) \in J(\lambda)$. Then
$$P^y(\tau_R \le C_2(\lambda)(\varepsilon R)^3) \le C_3(\lambda)\varepsilon \quad \text{for } y \in B(\varepsilon R). \tag{2.4}$$
(b) Suppose that $R \in J(\lambda)$. Then
$$p_n(0, y) + p_{n+1}(0, y) \le C_4(\lambda) n^{-2/3} \quad \text{for } y \in B(R) \text{ if } n = 2R^3. \tag{2.5}$$
Suppose that $R, R/(4\lambda) \in J(\lambda)$. Then
$$p_{2n}(x, x) \ge C_5(\lambda) n^{-2/3} \quad \text{for } \tfrac{1}{4} C_1(\lambda) R^3 \le n \le \tfrac{1}{2} C_1(\lambda) R^3, \ x \in B(0, R/(4\lambda)). \tag{2.6}$$
M. T. Barlow, A. A. Járai, T. Kumagai, G. Slade
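(c) Let $n \ge 1$, $M \ge 1$, and set $R = Mn^{1/3}$. If $R$, $C_6(\lambda)R/M$, $C_6(\lambda)R/(4\lambda M) \in J(\lambda)$, then
\[ P^0\big(n^{-1/3}d(0,X_n) > M\big) \le \frac{C_7(\lambda)}{M}. \tag{2.7} \]
We have $C_7(\lambda) \le c\lambda^{22/3}$.
(d) Let $R = (n/2)^{1/3}$ and $M \ge 1$. If $R, R/M \in J(\lambda)$ then
\[ P^0\big(d(0,X_n) < R/M\big) \le \frac{\lambda C_4(\lambda)}{M^2}. \tag{2.8} \]
Also, if $R, C_8(\lambda)R \in J(\lambda)$ then
\[ E^0 d(0,X_n) \ge C_9(\lambda)n^{1/3}. \tag{2.9} \]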
The overall strategy for the proof of these various inequalities is as follows. We begin with obtaining bounds on the mean exit time $E^0\tau_R$. Using the Green function (see (2.17) below for the definition) we can write
\[ E^z\tau_B = \sum_{y\in B} g_B(z,y)\mu_y. \tag{2.10} \]
Since $g_B(x,x) = R_{\rm eff}(x, B^c)$ (see (2.20)), this leads to the upper and lower bounds on $E^x\tau_R$ for $x$ sufficiently close to 0 given in (2.2) and (2.3). The final inequality concerning $\tau_R$ is (2.4), which bounds from above the lower tail of $\tau_R$. (This is equivalent to bounding from above the speed at which $X$ can move from its starting point 0.) The proof for this takes the bounds in (2.2) and (2.3) as its starting point, but also uses a simple inequality relating effective resistance and hitting probabilities (see Lemma 2.3 below). The next set of inequalities we prove are those for the heat kernel $p_n(x,y)$. In the continuous time setting these are proved using differential inequalities which relate the derivative of the heat kernel to its energy. Unfortunately in discrete time the differential inequalities are replaced by rather less intuitive difference equations, which in addition take a slightly more complicated form. The estimate (2.5) is proved from an inequality which bounds the heat kernel just in terms of the volume of balls (see (2.31)). Adding information on $\tau_R$ then enables one to obtain the lower bound (2.6). The final bounds on $d(0,X_n)$ then follow easily from the bounds on $\tau_R$ and $p_n(0,x)$.
2.1.1. Bounds on $\tau_R$. We begin by giving a precise definition of effective resistance. Let $\mathcal{E}$ be the quadratic form given by
\[ \mathcal{E}(f,g) = \tfrac12 \sum_{\substack{x,y\in G \\ x\sim y}} (f(x)-f(y))(g(x)-g(y)), \tag{2.11} \]
where $x \sim y$ means $\{x,y\} \in E$. If we regard $\Gamma$ as an electrical network with a unit resistor on each edge in $E$, then $\mathcal{E}(f,f)$ is the energy dissipation when the vertices of $G$ are at a potential $f$. Set $H^2 = \{f \in \mathbb{R}^G : \mathcal{E}(f,f) < \infty\}$. Let $A$, $B$ be disjoint subsets of $G$. The effective resistance between $A$ and $B$ is defined by:
\[ R_{\rm eff}(A,B)^{-1} = \inf\{\mathcal{E}(f,f) : f \in H^2,\ f|_A = 1,\ f|_B = 0\}. \tag{2.12} \]
Let $R_{\rm eff}(x,y) = R_{\rm eff}(\{x\},\{y\})$, and $R_{\rm eff}(x,x) = 0$. For general facts on effective resistance and its connection with random walks see [2,15,33]. We recall some basic properties of $R_{\rm eff}(\cdot,\cdot)$.
Random Walk on the Incipient Infinite Cluster for Oriented Percolation
Lemma 2.2. Let $\Gamma = (G, E)$ be an infinite connected graph.
(a) $R_{\rm eff}$ is a metric on $G$.
(b) If $A' \subset A$, $B' \subset B$, then $R_{\rm eff}(A',B') \ge R_{\rm eff}(A,B)$.
(c) $R_{\rm eff}(x,y) \le d(x,y)$.
(d) If $x, y \in G \setminus A$, then $R_{\rm eff}(x,A) \le R_{\rm eff}(x,y) + R_{\rm eff}(y,A)$.
(e) If $\Gamma' = (G', E')$ is a subgraph of $\Gamma$, with effective resistance $R'_{\rm eff}$, and if $A' = A \cap G'$ and $B' = B \cap G'$, then $R'_{\rm eff}(A',B') \ge R_{\rm eff}(A,B)$.
(f) For all $f \in \mathbb{R}^G$ and $x, y \in G$,
\[ |f(x)-f(y)|^2 \le R_{\rm eff}(x,y)\,\mathcal{E}(f,f). \tag{2.13} \]
Proof. For (a) see [31, Sect. 2.3]. The monotonicity in (b) and (e) is immediate from the variational definition of Reff . (c) is easy, and there is a proof in [6, Lemma 2.1]. (d) follows from (a) by considering the graph in which all vertices in A are connected by short circuits, which reduces A to a single vertex a. (f) If f (x) = f (y) then (2.13) is immediate. If not, then set u(z) = ( f (z) − f (y))/ ( f (x) − f (y)), so that u(x) = 1 and u(y) = 0. Then by (2.12), Reff (x, y)−1 ≤ E(u, u) = E( f, f )| f (x) − f (y)|−2 , which gives (2.13).
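As a sanity check on the definitions, the variational formula (2.12) and the properties in Lemma 2.2 can be evaluated exactly on a small finite graph. The following Python sketch is illustrative only: the paper works with infinite graphs, and the helper names (`solve`, `energy`, `effective_resistance`, `graph_distance`) and the 4-cycle example are assumptions introduced here. It computes $R_{\rm eff}(a,b)$ as the reciprocal of the energy of the harmonic function equal to 1 at $a$ and 0 at $b$, which attains the infimum in (2.12) on a finite connected graph.

```python
from collections import deque
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def energy(adj, f):
    """E(f, f) as in (2.11): each undirected edge contributes (f(x) - f(y))^2."""
    total = sum((Fraction(f[x]) - Fraction(f[y])) ** 2 for x in adj for y in adj[x])
    return total / 2


def effective_resistance(adj, a, b):
    """R_eff(a, b) via (2.12), using the harmonic function with h(a)=1, h(b)=0."""
    if a == b:
        return Fraction(0)
    interior = [v for v in adj if v not in (a, b)]
    idx = {v: i for i, v in enumerate(interior)}
    A = [[0] * len(interior) for _ in interior]
    rhs = [0] * len(interior)
    for v in interior:
        A[idx[v]][idx[v]] = len(adj[v])
        for w in adj[v]:
            if w in idx:
                A[idx[v]][idx[w]] -= 1
            elif w == a:
                rhs[idx[v]] += 1  # boundary value h(a) = 1
    h = dict(zip(interior, solve(A, rhs)))
    h[a], h[b] = Fraction(1), Fraction(0)
    return 1 / energy(adj, h)


def graph_distance(adj, a, b):
    """Graph distance d(a, b) by breadth-first search."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[b]


# Hypothetical example: the cycle on four vertices.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
r_adjacent = effective_resistance(cycle, 0, 1)
r_opposite = effective_resistance(cycle, 0, 2)
```

On the 4-cycle the two values agree with the series/parallel rules for unit resistors, and both are bounded by the graph distance, as Lemma 2.2(c) requires.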
The inequality (2.13) will play an important role in obtaining pointwise information on functions from resistance or energy estimates. Recall that $T_A$ was defined in (1.5) to be the hitting time of $A \subset G$. If $A$ and $B$ are disjoint subsets of $G$ and $x \notin A \cup B$, then (see [10, Fact 2, p. 226])
\[ P^x(T_A < T_B) \le \frac{R_{\rm eff}(x,B)}{R_{\rm eff}(x,A)}. \tag{2.14} \]
Lemma 2.3. Let $\lambda \ge 1$ and suppose $R \in J(\lambda)$. Let $0 < \varepsilon \le 1/(2\lambda)$, and $y \in B(\varepsilon R)$. Then
\[ P^y(T_0 < \tau_R) \ge 1 - \frac{\lambda\varepsilon}{1-\varepsilon\lambda} \ge 1 - 2\varepsilon\lambda, \tag{2.15} \]
\[ P^0(T_y < \tau_R) \ge 1 - \varepsilon\lambda. \tag{2.16} \]
Proof. By Lemma 2.2(c) $R_{\rm eff}(y,0) \le d(y,0)$, while by Lemma 2.2(d) and the definition of $J(\lambda)$,
\[ R_{\rm eff}(y, B(R)^c) \ge R_{\rm eff}(0, B(R)^c) - R_{\rm eff}(0,y) \ge \frac{R}{\lambda} - \varepsilon R. \]
So by (2.14),
\[ P^y(\tau_R < T_0) \le \frac{R_{\rm eff}(y,0)}{R_{\rm eff}(y, B(R)^c)} \le \frac{\varepsilon\lambda}{1-\varepsilon\lambda}. \]
Similarly, $P^0(\tau_R < T_y) \le R_{\rm eff}(0,y)/R_{\rm eff}(0, B(R)^c) \le \varepsilon\lambda$.
The initial steps in bounding $\tau_R$ use the Green kernel for the random walk $X$, so we now recall its definition. (These facts about Green functions will only be used in this subsubsection.) Let $B \subset G$,
\[ L(y,n) = \sum_{k=0}^{n-1} 1_{(X_k = y)}, \]
and set
\[ g_B(x,y) = \mu_y^{-1} E^x L(y,\tau_B) = \mu_y^{-1} \sum_{k=0}^{\infty} P^x(X_k = y,\ k < \tau_B). \tag{2.17} \]
Then $g_B(x,y) = g_B(y,x)$ and $g_B(x,\cdot)$ is harmonic on $B \setminus \{x\}$, and zero outside $B$. Using the Markov property at $T_y$ gives
\[ g_B(x,y) = P^x(T_y < \tau_B)\, g_B(y,y). \tag{2.18} \]
Summing (2.17) over $y \in B$ gives
\[ E^z\tau_B = \sum_{y\in B} g_B(z,y)\mu_y. \tag{2.19} \]
The final property of $g_B(\cdot,\cdot)$ we will need is that
\[ R_{\rm eff}(x, B^c) = g_B(x,x). \tag{2.20} \]
One way to see this is to note that $g_B(x,\cdot)$ is the potential due to a unit current flow from $x$ to $B^c$, so that $g_B(x,x)$ is the effective resistance from $x$ to $B^c$. Alternatively, writing $p^x_B(y) = g_B(x,y)/g_B(x,x)$, one can verify that $p^x_B$ attains the minimum in (2.12), and that $\mathcal{E}(p^x_B, p^x_B) = g_B(x,x)^{-1}$.
Proof of Proposition 2.1(a), (2.2). It is easy to use (2.19) to obtain an upper bound for the exit time from a ball. By Lemma 2.2(d) we have $R_{\rm eff}(z, B^c) \le 2R$ for any $z \in B = B(R)$. So,
\[ E^z\tau_B = \sum_{y\in B} g_B(z,y)\mu_y \le \sum_{y\in B} g_B(z,z)\mu_y = R_{\rm eff}(z,B^c)V(R) \le 2\lambda R^3, \tag{2.21} \]
which gives (2.2).
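The identities (2.18)–(2.20) can be checked exactly on a finite example. The sketch below is an illustration under assumptions not made in the paper: it takes the path graph on $\{0,\dots,N\}$ with $B = \{1,\dots,N-1\}$, and uses the standard fact that, for a finite set $B$, the matrix $(g_B(x,y))_{x,y\in B}$ is the inverse of the graph Laplacian restricted to $B$. The names `solve`, `green_function`, `exit_time` and `path` are hypothetical helpers.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def green_function(adj, B):
    """g_B(x, y) for x, y in B, obtained by inverting the Laplacian restricted to B."""
    B = list(B)
    idx = {v: i for i, v in enumerate(B)}
    lap = [[0] * len(B) for _ in B]
    for v in B:
        lap[idx[v]][idx[v]] = len(adj[v])  # mu_v = degree in the full graph
        for w in adj[v]:
            if w in idx:
                lap[idx[v]][idx[w]] -= 1
    g = {x: {} for x in B}
    for y in B:
        column = solve(lap, [int(v == y) for v in B])
        for x in B:
            g[x][y] = column[idx[x]]
    return g


def exit_time(adj, g, z):
    """E^z tau_B computed from (2.19): sum over y in B of g_B(z, y) * mu_y."""
    return sum(g[z][y] * len(adj[y]) for y in g[z])


# Hypothetical example: path 0 - 1 - ... - N, killed on leaving B = {1, ..., N-1}.
N = 6
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= N] for i in range(N + 1)}
g = green_function(path, range(1, N))
```

For this path the classical formulas give $E^z\tau_B = z(N-z)$, $R_{\rm eff}(x,B^c) = x(N-x)/N$ and $P^x(T_y < \tau_B) = x/y$ for $x < y$; the computed `g` reproduces all three through (2.19), (2.20) and (2.18).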
Proof of Proposition 2.1(a), (2.3). Write $B = B(R)$. To obtain a lower bound for $E^0\tau_B$ we restrict the sum in (2.19) to a smaller ball $B' = B(R/(4\lambda))$, and use Lemma 2.3 to bound $g_B(0,y)$ from below on $B'$. If $y \in B'$ then Lemma 2.3 gives $P^y(T_0 < \tau_B) \ge \tfrac12$, so by (2.18) and (2.20),
\[ g_B(0,y) = g_B(0,0)P^y(T_0 < \tau_B) \ge \tfrac12 g_B(0,0) = \tfrac12 R_{\rm eff}(0,B^c) \ge \tfrac12 R/\lambda. \]
As $R/(4\lambda) \in J(\lambda)$ we have $\mu(B') \ge \lambda^{-1}(R/(4\lambda))^2$, and therefore we obtain
\[ E^0\tau_B \ge \sum_{y\in B'} g_B(0,y)\mu_y \ge \tfrac12 g_B(0,0)\mu(B') \ge c\lambda^{-4}R^3. \tag{2.22} \]
Then for $x \in B'$ we have $E^x\tau_B \ge P^x(T_0 < \tau_B)E^0\tau_B$, which gives (2.3).
The upper and lower bounds on $E^x\tau_R$ lead to a preliminary inequality on the distribution of $\tau_R$.
Lemma 2.4. Suppose that $R, R/(4\lambda) \in J(\lambda)$. Let $x \in B(0, R/(4\lambda))$ and $n \ge 1$. Then
\[ P^x(\tau_R > n) \ge \frac{C_1(\lambda)R^3 - n}{2\lambda R^3} \quad \text{for } n \ge 0. \tag{2.23} \]
Proof. By the Markov property, (2.2) and (2.3),
\[ C_1(\lambda)R^3 \le E^x\tau_R \le n + E^x\big[1_{\{\tau_R > n\}}E^{X_n}(\tau_R)\big] \le n + 2\lambda R^3 P^x(\tau_R > n). \]
Rearranging this gives (2.23).
Setting $n = \delta R^3$ in (2.23) gives
\[ P^x(\tau_R \le \delta R^3) \le 1 - \frac{C_1(\lambda) - \delta}{2\lambda}. \tag{2.24} \]
This inequality has the defect that the right-hand side of (2.24) does not converge to 0 as $\delta \to 0$. We will need a better bound in order to control $d(0,X_n)$, and this is given in (2.4).
Proof of Proposition 2.1(a), (2.4). This proof takes a little more work; we obtain it by a kind of bootstrap from (2.23) and Lemma 2.3. The basic point is that, starting at $y \in B(\varepsilon R)$, $X$ is very likely to visit 0 before escaping from $B(R)$. So $X$ will with high probability have made many excursions from 0 to $\partial B(\varepsilon R)$ before time $\tau_B$. Thus $\tau_B$ is stochastically larger than a sum of independent random variables, each of which, by (2.23), has a probability at least $p > 0$ of being greater than $cR^3$. Rather than following this intuition directly and using stochastic inequalities, it is simpler to obtain a pair of inequalities (2.25) and (2.26) which contain the same information. Let $t_0 > 0$, and set
\[ q(y) = P^y(\tau_R \le T_0), \qquad a(y) = P^y(\tau_R \le t_0). \]
Then
\begin{align*}
a(y) = P^y(\tau_R \le t_0) &= P^y(\tau_R \le t_0,\ \tau_R \le T_0) + P^y(\tau_R \le t_0,\ \tau_R > T_0) \\
&\le P^y(\tau_R \le T_0) + P^y(T_0 < \tau_R,\ \tau_R - T_0 \le t_0) \\
&\le q(y) + (1-q(y))a(0) \le q(y) + a(0), \tag{2.25}
\end{align*}
using the strong Markov property for the second inequality. Starting $X$ at 0 we have
\[ a(0) = P^0(\tau_R \le t_0) \le E^0\big[1_{\{\tau_{\varepsilon R} \le t_0\}}P^{X_{\tau_{\varepsilon R}}}(\tau_R \le t_0)\big] \le P^0(\tau_{\varepsilon R} \le t_0)\max_{y\in\partial B(\varepsilon R)} a(y). \tag{2.26} \]
Combining (2.25) and (2.26) gives
\[ a(0) \le \frac{\max_{y\in\partial B(\varepsilon R)} q(y)}{P^0(\tau_{\varepsilon R} > t_0)}. \tag{2.27} \]
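Note that as $J(\lambda)$ is defined to be a subset of $[1,\infty)$, the condition that $\varepsilon R/(4\lambda) \in J(\lambda)$ implies that $R \ge 4\lambda/\varepsilon$. Since $\varepsilon < 1/(4\lambda)$, $\varepsilon R + 1 \le 2\varepsilon R$ [...] $P^0(\tau_{\varepsilon R} > t_0) \ge \frac{C_1(\lambda)}{4\lambda}$; combining this with (2.28), (2.27) and (2.25) completes the proof of (2.4).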
2.1.2. Heat kernel bounds. We now turn to the heat kernel bounds in Proposition 2.1(b). Our first result Proposition 2.5 follows from [6, Lemmas 1.1, 1.2 and 3.10], but as the proof is short we give it here. To deal with issues related to the possible bipartite structure of the graph it proves helpful to consider $p_n(x,y) + p_{n+1}(x,y)$. The main result of the proposition below is the inequality (2.31), which gives an upper bound for $p_n(x,x)$ just in terms of the volume. The proof of the analogous inequality in continuous time is a bit easier; see [7, Theorem 4.1].
Proposition 2.5. Let $x_0 \in G$ and $f_n(y) = p_n(x_0,y) + p_{n+1}(x_0,y)$.
(a) We have
\[ \mathcal{E}(f_n, f_n) \le \frac{2}{n} f_{2\lfloor n/2\rfloor}(x_0). \tag{2.29} \]
(b) We have
\[ |f_n(y) - f_n(x_0)|^2 \le \frac{2}{n}\, d(x_0,y)\, f_{2\lfloor n/2\rfloor}(x_0). \tag{2.30} \]
(c) Let $r \in [1,\infty)$ and $n = 2\lfloor r\rfloor^3$. Then
\[ f_n(x_0) \le c_1 n^{-2/3}\big(1 \vee (r^2/V(x_0,r))\big). \tag{2.31} \]
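Proof. (a) It is easy to check that $\mathcal{E}(f_n, f_n) = f_{2n}(x_0) - f_{2n+2}(x_0)$. The spectral decomposition (see for example, Chapter 3 (32) of [2]) gives that $k \mapsto f_{2k}(x_0) - f_{2k+2}(x_0)$ is non-increasing. Thus
\begin{align*}
n\big(f_{2n}(x_0) - f_{2n+2}(x_0)\big) &\le (2\lfloor n/2\rfloor + 1)\big(f_{4\lfloor n/2\rfloor}(x_0) - f_{4\lfloor n/2\rfloor+2}(x_0)\big) \\
&\le 2\sum_{i=\lfloor n/2\rfloor}^{2\lfloor n/2\rfloor}\big(f_{2i}(x_0) - f_{2i+2}(x_0)\big) \le 2 f_{2\lfloor n/2\rfloor}(x_0),
\end{align*}
and (2.29) is obtained.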
(b) Using Lemma 2.2(c),(f),
\[ |f_n(y) - f_n(x_0)|^2 \le R_{\rm eff}(x_0,y)\,\mathcal{E}(f_n,f_n) \le d(x_0,y)\,\mathcal{E}(f_n,f_n). \]
We then use (2.29) to bound $\mathcal{E}(f_n,f_n)$.
(c) Choose $x_* \in B(x_0,r)$ such that $f_n(x_*) = \min_{x\in B(x_0,r)} f_n(x)$. Then
\[ f_n(x_*)V(x_0,r) \le \sum_{x\in B(x_0,r)} f_n(x)\mu_x \le \sum_{x\in G} p_n(x_0,x)\mu_x + \sum_{x\in G} p_{n+1}(x_0,x)\mu_x \le 2, \]
so that $f_n(x_*) \le 2/V(x_0,r)$. Since $n$ is even, by (2.30) we have
\[ f_n(x_0)^2 \le 2f_n(x_*)^2 + 2|f_n(x_0) - f_n(x_*)|^2 \le \frac{8}{V(x_0,r)^2} + \frac{c\,r f_n(x_0)}{n}. \]
Using $a + b \le 2(a \vee b)$, we see that $f_n(x_0) \le (c'/V(x_0,r)) \vee (cr/n)$. Since $n = 2\lfloor r\rfloor^3$, both terms are bounded by a constant multiple of $n^{-2/3}(1 \vee r^2/V(x_0,r))$, which is (2.31).
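The identity $\mathcal{E}(f_n,f_n) = f_{2n}(x_0) - f_{2n+2}(x_0)$ used in part (a), and the monotonicity of $k \mapsto f_{2k}(x_0) - f_{2k+2}(x_0)$, can be verified exactly on a small finite graph. The Python sketch below is only an illustration: the example graph (a triangle with a pendant vertex, chosen to be non-bipartite) and the helper names `step`, `heat_kernels`, `f` and `energy` are assumptions introduced here, and the heat kernel is taken to be $p_n(x,y) = P^x(X_n = y)/\mu_y$ with $\mu_y$ the degree of $y$.

```python
from fractions import Fraction


def step(adj, dist):
    """One step of simple random walk applied to a probability distribution."""
    new = {v: Fraction(0) for v in adj}
    for v, mass in dist.items():
        for w in adj[v]:
            new[w] += mass / len(adj[v])
    return new


def heat_kernels(adj, x0, n_max):
    """Return [p_0(x0, .), ..., p_{n_max}(x0, .)] with p_n(x0, y) = P^{x0}(X_n = y) / mu_y."""
    dist = {v: Fraction(int(v == x0)) for v in adj}
    kernels = []
    for _ in range(n_max + 1):
        kernels.append({v: dist[v] / len(adj[v]) for v in adj})
        dist = step(adj, dist)
    return kernels


def f(kernels, n):
    """f_n(y) = p_n(x0, y) + p_{n+1}(x0, y), as in Proposition 2.5."""
    return {y: kernels[n][y] + kernels[n + 1][y] for y in kernels[n]}


def energy(adj, func):
    """E(func, func) as in (2.11): each undirected edge is counted once."""
    total = sum((func[x] - func[y]) ** 2 for x in adj for y in adj[x])
    return total / 2


# Hypothetical example: triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
x0 = 0
kernels = heat_kernels(graph, x0, 12)

# Left and right sides of the identity E(f_n, f_n) = f_{2n}(x0) - f_{2n+2}(x0).
lhs = [energy(graph, f(kernels, n)) for n in range(5)]
rhs = [f(kernels, 2 * n)[x0] - f(kernels, 2 * n + 2)[x0] for n in range(5)]
```

Because the computation uses exact rational arithmetic, `lhs` and `rhs` agree entry by entry rather than only up to rounding.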
Remark. In fact, (2.29) can be sharpened to give $\mathcal{E}(f_n,f_n) \le c_1 n^{-1} p_{2\lfloor n/2\rfloor}(x_0,x_0)$; see [6, Lemma 3.10], but we do not need this.
Proof of Proposition 2.1(b). Let $f_n(y) = p_n(0,y) + p_{n+1}(0,y)$. As $R \in J(\lambda)$, $R^2/V(R) \le \lambda$, so by Proposition 2.5(c),
\[ f_n(0) \le c_1\lambda n^{-2/3}. \tag{2.32} \]
By Proposition 2.5(b), if $n$ is even
\[ f_n(y) \le f_n(0) + |f_n(y) - f_n(0)| \le f_n(0) + \big(2d(0,y)n^{-1}f_n(0)\big)^{1/2} \le c\lambda n^{-2/3}, \tag{2.33} \]
which proves (2.5). To prove the lower bound (2.6) we use Lemma 2.4. For sufficiently small $n$ this bounds from above the probability that $X$ has left $B$ by time $n$, and so bounds from below $P^0(X_n \in B)$. This leads easily to a lower bound on $p_{2n}(x,x)$. Here are the details. Let $n \le \tfrac12 C_1(\lambda)R^3$. Then using (2.23),
\[ P^x(X_n \in B) \ge P^x(\tau_B > n) \ge \tfrac14\lambda^{-1}C_1(\lambda). \tag{2.34} \]
By Chapman–Kolmogorov and Cauchy–Schwarz
\[ P^x(X_n \in B)^2 = \Big(\sum_{y\in B} p_n(x,y)\mu_y\Big)^2 \le \mu(B)\sum_{y\in B} p_n(x,y)^2\mu_y \le p_{2n}(x,x)\lambda R^2, \]
and using (2.34) gives (2.6).
2.1.3. Bounds on $d(0,X_n)$. The main work for these bounds has already been done in the proofs of Proposition 2.1(a) and (b), and in particular the proof of (2.4).
Proof of Proposition 2.1. (c) The proof of (2.7) follows from (2.4) after suitable checking, since
\[ P^0\big(d(0,X_n)n^{-1/3} > M\big) = P^0\big(d(0,X_n) > R\big) \le P^0(\tau_R \le n). \tag{2.35} \]
We now fill in the details. Define $\varepsilon$ by the relation $n = C_2(\lambda)(\varepsilon R)^3$; so that $\varepsilon = C_6(\lambda)/M$. Let $C_7(\lambda) = C_3(\lambda)C_6(\lambda)$. The desired inequality is trivial when $C_7(\lambda)/M \ge 1$, so assume that $C_7(\lambda)/M < 1$. This means $\varepsilon = C_6(\lambda)/M < C_3(\lambda)^{-1}$. Since we may take $C_3(\lambda) > 4\lambda$, we obtain $\varepsilon < (4\lambda)^{-1}$, so we can apply (2.4). Using (2.35) and (2.4),
\[ P^0\big(d(0,X_n)n^{-1/3} > M\big) \le P^0\big(\tau_R \le C_2(\lambda)(\varepsilon R)^3\big) \le C_3(\lambda)\varepsilon = \frac{C_7(\lambda)}{M}, \tag{2.36} \]
which proves (2.7). Tracking the powers of $\lambda$ gives that $C_7(\lambda) \le c\lambda^{22/3}$.
(d) We can bound the probability that $X$ is in a ball $B'$ by the volume of the ball and the maximum of the heat kernel on the ball. By (2.5), writing $B' = B(0,R/M) \subset B(0,R)$ and $f_n(0,y) = p_n(0,y) + p_{n+1}(0,y)$,
\[ P^0(X_n \in B') = \sum_{y\in B'} p_n(0,y)\mu_y \le \sum_{y\in B'} f_n(0,y)\mu_y \le V(R/M)C_4(\lambda)R^{-2} \le \lambda C_4(\lambda)/M^2, \tag{2.37} \]
proving (2.8). The final inequality in (d) now follows easily, since all we need is that $d(0,X_n)$ is greater than $cn^{1/3}$ with positive probability. Let $M = C_8(\lambda)$ satisfy $M^2 = 2\lambda C_4(\lambda)$. Then using (2.8), $P^0(d(0,X_n) < R/M) \le \tfrac12$, so $E^0 d(0,X_n) \ge \tfrac12 R/M$.
We do not have an upper bound on $E^0 d(0,X_n)$ to complement the lower bound of Proposition 2.1(d), which uses volume and resistance bounds from a single base point, i.e., bounds on $V(0,R)$ and $R_{\rm eff}(0,B(R)^c)$. Suppose that $J(\lambda) = [1,\infty)$ for some $\lambda \ge 1$, and let $Z_n = n^{-1/3}d(0,X_n)$. Then we are able to bound $E^0 Z_n^p$ for $p < 1$, since (2.7) gives
\begin{align*}
E^0[Z_n^p] &\le \sum_{m=1}^{\infty}(2^{m+1})^p P^0\big(2^m \le n^{-1/3}d(0,X_n) < 2^{m+1}\big) \\
&\le \sum_{m=1}^{\infty}(2^{m+1})^p P^0\big(n^{-1/3}d(0,X_n) \ge 2^m\big) \le c_1\sum_{m=1}^{\infty}2^{m(p-1)} = c_2 < \infty.
\end{align*}
On the other hand the following example indicates that, under our hypotheses, we cannot expect to have a uniform bound on $E^0(Z_n^p)$ when $p > 1$. We sketch this argument below.
Example 2.6. Let $\Gamma$ be the subgraph of $\mathbb{Z}^2$ with vertex set $G = G_0 \cup G_1$, where $G_0 = \{(n,0),\ n \in \mathbb{Z}\}$, and $G_1 = \{(n,m) : 0 \le m \le n\}$. Let the edges be $\{(n,0),(n+1,0)\}$, for $n \in \mathbb{Z}$, and $\{(n,m),(n,m+1)\}$ if $n \ge 1$ and $0 \le m \le n-1$. Thus $\Gamma$ consists of $\mathbb{Z}_-$ and a comb-type graph of vertical branches with base $\mathbb{Z}_+$. Write 0 for $(0,0)$. It is easily checked that $V(0,R) \asymp R^2$, and $R_{\rm eff}(0,B(0,R)^c) \ge R/4$. Thus there exists $\lambda_0 < \infty$ such that $J(\lambda_0) = [1,\infty)$. Let $H(a,b) = \{(n,m) \in G : a \le n \le b\}$.
Let $X_n$ be the simple random walk on $\Gamma$. If we time-change out the excursions of $X$ away from $\mathbb{Z}$ then we obtain a simple random walk $Y_n$ on $\mathbb{Z}$. Now let $R \ge 1$, and $r = R^{2/3} \in \mathbb{Z}$. Let $A = H(-r,r)$. Since $B(0,r/2) \subset A \subset B(0,2r)$, Proposition 2.1(a) implies that $E^0\tau_A \approx r^3 \approx R^2$. Since $X$ only moves horizontally when it is on the $x$-axis, $P^0(X_{\tau_A} = (-r,0)) = 1/2$. If $X_{\tau_A} = (-r,0)$ then the probability that $X$ reaches $H(-\infty,-R)$ before returning to 0 is $r/R \approx R^{-1/3}$; also, if $X$ does this then the time taken to do so will be of order $R^2$. These arguments lead us to expect that if $n = R^2$ then
\[ P^0\big(X_n \in H(-\infty,-R/2)\big) \ge cR^{-1/3}. \tag{2.38} \]
Given (2.38), it follows from Markov's inequality that
\[ E^0 Z_n^p \ge n^{-p/3}(R/2)^p P^0\big(X_n \in H(-\infty,-R/2)\big) \ge cn^{(p-1)/6}, \]
and the lower bound diverges if $p > 1$. This concludes Example 2.6.
2.2. Results for random graphs. We now consider a family of random graphs, as described in Sect. 1.2, and prove Theorems 1.3–1.6. Most of the hard work has been done in the previous section, where we obtained bounds for a fixed graph $\Gamma$. We begin by obtaining tightness of the quantities $R^{-3}E^0\tau_R$, $n^{2/3}p_{2n}(0,0)$, and $n^{-1/3}d(0,X_n)$. We recall the definition of the function $p(\lambda)$ in Assumption 1.2(1), and that $p(\lambda) \le c_0\lambda^{-q_0}$.
Proof of Theorem 1.3. The basic idea here is straightforward. For each of the quantities we are interested in, the estimates in Proposition 2.1 tell us that provided the environment is 'good' at the scale $R$ (that is, more precisely, that $c_iR \in J(\lambda)$ for suitable $c_i$) then the quantity takes the value we want. The bounds we get will only hold if $R$ or $n$ is large enough, but it is easy to handle the small values of $R$ or $n$. We begin with (1.10). Let $\varepsilon > 0$. Choose $\lambda \ge 1$ such that $2p(\lambda) < \varepsilon$. Let $R/(4\lambda) \ge R^*$, and set $F_1 = \{R, R/(4\lambda) \in J(\lambda)\}$. Then, by Assumption 1.2(1), $P(F_1) \ge 1 - 2p(\lambda)$. For $\omega \in F_1$, by Proposition 2.1(a), there exists $c_1 < \infty$, $q_1 \ge 0$ such that
\[ (c_1\lambda^{q_1})^{-1} \le R^{-3}E^x_\omega\tau_R \le c_1\lambda^{q_1} \quad \text{for } x \in B(R/(4\lambda)). \tag{2.39} \]
So, if $\theta \ge c_1\lambda^{q_1}$ then for $R \in [4\lambda R^*, \infty)$,
\[ P\big(\theta^{-1} \le R^{-3}E^0_\omega\tau_R \le \theta\big) \ge P(F_1) \ge 1 - 2p(\lambda) \ge 1 - \varepsilon. \tag{2.40} \]
Let $R_0 \ge 1$. Since $0 < \sup_{1\le r\le R_0} r^{-3}E^0_\omega\tau_r < \infty$, we have
\[ \lim_{\theta\to\infty} P\big(\theta^{-1} \le r^{-3}E^0_\omega\tau_r \le \theta\big) = 1 \quad \text{uniformly for } r \in [1,R_0]. \]
Combining this with (2.40) gives (1.10). A similar argument enables us to handle the cases of small $n$ in (1.11)–(1.13), and we do not provide further details on this point below. For (1.11) let $n \ge 1$, $\lambda \ge 1$, and let $R_0$, $R_1$ be defined by $n = \tfrac12 C_1(\lambda)R_1^3 = 2R_0^3$. Let $F_2 = \{R_0, R_1, R_1/(4\lambda) \in J(\lambda)\}$. Suppose that $R_0$ and $R_1/(4\lambda)$ are both greater than $R^*$; then $P(F_2) \ge 1 - 3p(\lambda)$. If $\omega \in F_2$ then by Proposition 2.1(b),
\[ (c_2\lambda^{q_2})^{-1} \le n^{2/3}p^\omega_{2n}(0,0) \le c_2\lambda^{q_2}. \]
So,
\[ P\big((c_2\lambda^{q_2})^{-1} \le n^{2/3}p^\omega_{2n}(0,0) \le c_2\lambda^{q_2}\big) \ge P(F_2) \ge 1 - 3p(\lambda), \tag{2.41} \]
proving (1.11). We now prove (1.12). Let $n \ge 1$ and $\lambda \ge 1$. Let $M = \lambda^8$ and set
\[ R_0 = Mn^{1/3}, \qquad R_1 = C_6(\lambda)n^{1/3}, \qquad R_2 = C_6(\lambda)n^{1/3}/(4\lambda), \]
$F_3 = \{R_0, R_1, R_2 \in J(\lambda)\}$. If $n$ is large enough so that $R_i \ge R^*$ for $0 \le i \le 2$, then by (2.7), if $\omega \in F_3$ then
\[ P^0_\omega\big(n^{-1/3}d(0,X_n) > \lambda^8\big) \le \frac{C_7(\lambda)}{\lambda^8} \le \frac{c\lambda^{22/3}}{\lambda^8} = \frac{c}{\lambda^{2/3}}. \]
Taking $\theta = \lambda^8$, we have
\[ P^*\big(n^{-1/3}d(0,X_n) > \theta\big) \le P(F_3^c) + E\big[P^0_\omega\big(n^{-1/3}d(0,X_n) > \lambda^8\big)1_{F_3}\big] \le 3p(\theta^{1/8}) + c_3\theta^{-1/12}, \tag{2.42} \]
and (1.12) follows. Finally, we prove (1.13). Let $R = (n/2)^{1/3}$, $M \ge 1$. If $R, R/M \in J(\lambda)$ then by (2.8),
\[ P^0_\omega\big(n^{-1/3}d(0,X_n) < 2^{-1/3}M^{-1}\big) \le \frac{\lambda C_4(\lambda)}{M^2}. \tag{2.43} \]
Given $\varepsilon > 0$ choose $\lambda$ so that $p(\lambda) < \varepsilon$ and $M$ so that $\lambda C_4(\lambda)/M^2 < \varepsilon$. Let $F_4 = \{R, R/M \in J(\lambda)\}$. Then (2.43) holds for $\omega \in F_4$, so taking expectations with respect to $P$,
\begin{align*}
P^*\big(n^{-1/3}(1 + d(0,X_n)) < 2^{-1/3}M^{-1}\big) &\le P^*\big(n^{-1/3}d(0,X_n) < 2^{-1/3}M^{-1}\big) \\
&= E\,P^0_\omega\big(n^{-1/3}d(0,X_n) < 2^{-1/3}M^{-1}\big) \le P(F_4^c) + \varepsilon < 3\varepsilon.
\end{align*}
This deals with the case of large $n$; for small $n$ we just use $1 + d(0,X_n) \ge 1$.
Proof of Theorem 1.4. We begin with the upper bounds in (1.14)–(1.15). Here all we need do is to use the bounds on $EV(R)$ and $E(1/V(R))$ given by Assumption 1.2(2), together with the bounds on $E^0\tau_R$ and $p_{2n}(0,0)$ obtained above. By (2.21) and Assumption 1.2(2),
\[ E(E^0_\omega\tau_R) \le E(2RV(R)) \le cR^3, \]
provided $R \ge R^*$. If $R \le R^*$ then since $\tau_R \le \tau_{R^*}$, we obtain the upper bound in (1.14) by adjusting the constant $c_2$. Also, by Proposition 2.5(c), if $r = (n/2)^{1/3}$ then using Assumption 1.2(3),
\[ E\,p^\omega_{2n}(0,0) \le cn^{-2/3}E(1 + r^2/V(r)) \le c'n^{-2/3}, \]
again provided $r \ge R^*$. For each of the lower bounds, it is sufficient to find a set $F \subset \Omega$ of 'good' graphs with $P(F) \ge c > 0$ such that, for all $\omega \in F$ we have suitable lower bounds on $E^0_\omega\tau_R$, $p^\omega_{2n}(0,0)$ or $E^0_\omega d(0,X_n)$. We assume that $R \ge 1$ is large enough so that $R/(4\lambda_0) \ge R^*$, where $\lambda_0$ is chosen large enough that $p(\lambda_0) < 1/8$. Again, we obtain the lower bound in (1.14) for small $R$ using the fact that $E(E^0_\omega\tau_R) \ge 1$ and adjusting the constant $c_1$. Let $F = \{R, R/(4\lambda_0) \in J(\lambda_0)\}$. Then $P(F) \ge \tfrac34$, and for $\omega \in F$, by (2.3), $E^0_\omega\tau_R \ge c_1(\lambda_0)R^3$. So,
\[ E(E^0_\omega\tau_R) \ge E(E^0_\omega\tau_R 1_F) \ge c_1(\lambda_0)R^3P(F) \ge c_2(\lambda_0)R^3. \]
Given $n \in \mathbb{N}$, choose $R$ so that $n = \tfrac12 C_1(\lambda_0)R^3$. Then there exists $n^*$ (depending on $\lambda_0$ and $R^*$) such that $n \ge n^*$ implies that $R/(4\lambda_0) \ge R^*$. Let $F$ be as above. Then using (2.6) to bound $p_{2n}(0,0)$ from below,
\[ E\,p^\omega_{2n}(0,0) \ge P(F)c_3(\lambda_0)n^{-2/3} \ge c_4(\lambda_0)n^{-2/3}, \]
giving the lower bound in (1.15). A similar argument uses (2.9) to conclude (1.16).
Proof of Theorem 1.5. These results will follow from the bounds already obtained in Proposition 2.1 and in the proof of Theorem 1.3 by a straightforward Borel–Cantelli argument. We will take $\Omega_0 = \Omega_a \cap \Omega_b \cap \Omega_c$, where the sets $\Omega_*$ are defined in the proofs of (a), (b) and (c). Recall that by Assumption 1.2(1), $p(\lambda) = P(R \notin J(\lambda)) \le c_0\lambda^{-q_0}$.
(a) We begin with the case $x = 0$, and write $w(n) = p^\omega_{2n}(0,0)$. By (2.41) we have
\[ P\big((c_1\lambda^{q_1})^{-1} < n^{2/3}w(n) \le c_1\lambda^{q_1}\big) \ge 1 - 3p(\lambda). \]
Let $n_k = \lfloor e^k\rfloor$ and $\lambda_k = k^{2/q_0}$. Then, since $\sum_k p(\lambda_k) < \infty$, by Borel–Cantelli there exists $K_0(\omega)$ with $P(K_0 < \infty) = 1$ such that $c_1^{-1}k^{-2q_1/q_0} \le n_k^{2/3}w(n_k) \le c_1k^{2q_1/q_0}$ for all $k \ge K_0(\omega)$. Let $\Omega_a = \{K_0 < \infty\}$. For $k \ge K_0$ we therefore have
\[ c_2^{-1}(\log n_k)^{-2q_1/q_0}n_k^{-2/3} \le w(n_k) \le c_2(\log n_k)^{2q_1/q_0}n_k^{-2/3}, \]
so that (1.18) holds for the subsequence $n_k$. The spectral decomposition (see for example [2]) gives that $p^\omega_{2n}(0,0)$ is monotone decreasing in $n$. So, if $n > N_0 = e^{K_0} + 1$, let $k \ge K_0$ be such that $n_k \le n < n_{k+1}$. Then
\[ w(n) \le w(n_k) \le c_2(\log n_k)^{2q_1/q_0}n_k^{-2/3} \le 2e^{2/3}c_2(\log n)^{2q_1/q_0}n^{-2/3}. \]
Similarly $w(n) \ge w(n_{k+1}) \ge c_3n^{-2/3}(\log n)^{-2q_1/q_0}$. Taking $q_2 > 2q_1/q_0$, so that the constants $c_2$, $c_3$ can be absorbed into the $\log n$ term, we obtain
\[ (\log n)^{-q_2}n^{-2/3} \le p^\omega_{2n}(0,0) \le (\log n)^{q_2}n^{-2/3} \quad \text{for all } n \ge N_0(\omega). \tag{2.44} \]
That $\lim_n \log p^\omega_{2n}(0,0)/\log n = -2/3$, $P$-a.s. is then immediate. Since $\sum_n p^\omega_{2n}(0,0) = \infty$, $X$ is recurrent. If $x, y \in C(\omega)$ and $k = d_\omega(x,y)$, then the Chapman–Kolmogorov equations give that
\[ p^\omega_{2n}(x,x)\big(p^\omega_k(x,y)\mu_x(\omega)\big)^2 \le p^\omega_{2n+2k}(y,y), \]
and using this it is easy to obtain (1.18) from (2.44).
(b) Let $R_n = e^n$ and $\lambda_n = n^{2/q_0}$. Let $F_n = \{R_n, R_n/(4\lambda_n) \in J(\lambda_n)\}$. Then (provided $R_n/(4\lambda_n) \ge 1$) we have $P(F_n^c) \le 2p(\lambda_n) \le 2n^{-2}$. So, by Borel–Cantelli, if $\Omega_b = \liminf F_n$, then $P(\Omega_b) = 1$. Hence there exists $M_0$ with $M_0(\omega) < \infty$ on $\Omega_b$, and such that $\omega \in F_n$ for all $n \ge M_0(\omega)$. Now fix $\omega \in \Omega_b$, and let $x \in C(\omega)$. Write $F(R) = E^x_\omega\tau_R$. By (2.39) there exist constants $c_4$, $q_4$ such that
\[ (c_4\lambda_n^{q_4})^{-1} \le R_n^{-3}F(R_n) \le c_4\lambda_n^{q_4}, \tag{2.45} \]
provided $n \ge M_0(\omega)$ and $n$ is also large enough so that $x \in B(R_n/(4\lambda_n))$. Writing $M_x(\omega)$ for the smallest such $n$,
\[ c_4^{-1}(\log R_n)^{-2q_4/q_0}R_n^3 \le F(R_n) \le c_4(\log R_n)^{2q_4/q_0}R_n^3, \quad \text{for all } n \ge M_x(\omega). \]
As $F(R)$ is monotonic, the same argument as in (a) enables us to replace $F(R_n)$ by $F(R)$, for all $R \ge R_x = 1 + e^{M_x}$. Taking $\alpha_2 > 2q_4/q_0$ we obtain (1.19).
(c) Recall that $Y_n = \max_{0\le k\le n} d(0,X_k)$. We begin by noting that
\[ \{Y_n \ge R\} = \{\tau_R \le n\}. \tag{2.46} \]
Using this, (1.20) follows easily from (1.21). It remains to prove (1.21). Since $\tau_R$ is monotone in $R$, as in (b) it is enough to prove the result for the subsequence $R_n = e^n$. The estimates in (b) give the upper bound. In fact, if $\omega \in \Omega_b$, and $n \ge M_x(\omega)$, then by (2.45),
\[ P^x_\omega\big(\tau_{R_n} \ge n^2c_4\lambda_n^{q_4}R_n^3\big) \le \frac{F(R_n)}{n^2c_4\lambda_n^{q_4}R_n^3} \le n^{-2}. \]
So, by Borel–Cantelli (with respect to the law $P^x_\omega$), there exists $N_x(\omega,\bar\omega)$ with $P^x_\omega(N_x < \infty) = P^x_\omega(\{\bar\omega : N_x(\omega,\bar\omega) < \infty\}) = 1$ such that $\tau_{R_n} \le c_5(\log R_n)^{q_5}R_n^3$, for all $n \ge N_x$. For the lower bound, write $C_2(\lambda) = c_6\lambda^{-q_6}$, $C_3(\lambda) = c_7\lambda^{q_7}$. Let $\lambda_n = n^{2/q_0}$, and $\varepsilon_n = n^{-2}\lambda_n^{-q_6-q_7}$. Set $G_n = \{R_n, \varepsilon_nR_n, \varepsilon_nR_n/(4\lambda_n) \in J(\lambda_n)\}$. Then, for $n$ sufficiently large so that $\varepsilon_nR_n/(4\lambda_n) \ge 1$, we have $P(G_n^c) \le 3p(\lambda_n) \le 3c_0n^{-2}$. Let $\Omega_c = \Omega_b \cap (\liminf G_n)$; then by Borel–Cantelli $P(\Omega_c) = 1$ and there exists $M_1$ with $M_1(\omega) < \infty$ for $\omega \in \Omega_c$ such that $\omega \in G_n$ whenever $n \ge M_1(\omega)$. By (2.4), if $n \ge M_1$ and $x \in B(\varepsilon_nR_n)$ then
\[ P^x_\omega\big(\tau_{R_n} \le c_6\lambda_n^{-q_6}\varepsilon_n^3R_n^3\big) \le c_7\lambda_n^{q_7}\varepsilon_n \le c_7n^{-2}. \]
So, using Borel–Cantelli, we deduce that (for some $q_8$)
\[ \tau_{R_n} \ge c_6\lambda_n^{-q_6}\varepsilon_n^3R_n^3 \ge n^{-q_8}R_n^3 = (\log R_n)^{-q_8}R_n^3, \quad \text{for all } n \ge N_x(\omega,\bar\omega). \tag{2.47} \]
This completes the proof of (1.21).
Proof of Theorem 1.6. (a) We first consider the case $x = 0$. The upper bound on $\log S_n/\log n$ follows easily from the bounds on $\tau_R$ and $V(R)$, as follows. A Borel–Cantelli argument similar to those above implies that
\[ V(R) \le R^2(\log R)^c \tag{2.48} \]
for all sufficiently large $R$. Recall that $Y_n = \max_{0\le k\le n} d(0,X_k)$. We have $W_n \subset B(Y_n)$, so $S_n \le V(Y_n)$. So, for sufficiently large $n$, using (1.20),
\[ S_n \le V\big((\log n)^{\alpha_3}n^{1/3}\big) \le n^{2/3}(\log n)^{c'}, \tag{2.49} \]
proving the upper bound in (1.23). For the lower bound, we need to show that a positive proportion of the points in $B(Y_n)$ have been hit by time $n$, and for this we use Lemma 2.3. Choose $q_1 \ge 1$, $q_2 \ge 1$ so that we can write $C_2(\lambda) = c_1\lambda^{-q_1}$ and $C_3(\lambda) = c_2\lambda^{q_2}$. Let $R_k = e^k$, and $\lambda_k = k^{q_3}$, where $q_3 \ge 2$ is chosen large enough so that $\sum_k p(\lambda_k) < \infty$. Let $\varepsilon_k = c_2^{-1}\lambda_k^{-q_2}k^{-q_3}$. Set $F_k = \{R_k, \varepsilon_kR_k, \varepsilon_kR_k/(4\lambda_k) \in J(\lambda_k)\}$. Write $\xi(x,R) = 1_{\{T_x > \tau_R\}}$. If $R \in J(\lambda)$ and $\varepsilon < 1/(2\lambda)$ then by Lemma 2.3, $P^0_\omega(\xi(x,R) = 1) \le \varepsilon\lambda$, for $x \in B(\varepsilon R)$. Set
\[ Z_k = V(\varepsilon_kR_k)^{-1}\sum_{x\in B(\varepsilon_kR_k)} \xi(x,R_k)\mu_x; \]
this is the proportion of points in $B(\varepsilon_kR_k)$ which are not hit by time $\tau_{R_k}$. Then if $\omega \in F_k$,
\[ P^0_\omega(Z_k \ge \tfrac12) \le 2E^0_\omega Z_k \le 2\varepsilon_k\lambda_k \le k^{-q_3}. \]
Let $m(k) = k^{q_3}\lambda_kR_k^3$. Then if $\omega \in F_k$, by (2.2),
\[ P^0_\omega(\tau_{R_k} \ge m(k)) \le 2\lambda_kR_k^3m(k)^{-1} = 2k^{-q_3}. \]
Thus
\[ P^*\big(F_k^c \cup \{Z_k \ge \tfrac12\} \cup \{\tau_{R_k} \ge m(k)\}\big) \le 3p(\lambda_k) + 3k^{-q_3}, \]
so by Borel–Cantelli, $P^*$-a.s. there exists a $k_0(\omega,\bar\omega) < \infty$ such that, for all $k \ge k_0$, $F_k$ holds, $\tau_{R_k} \le m(k)$, and $Z_k \le 1/2$. So, for $k \ge k_0$,
\[ S_{m(k)} \ge S_{\tau_{R_k}} \ge \sum_{x\in B(\varepsilon_kR_k)} (1 - \xi(x,R_k))\mu_x = V(\varepsilon_kR_k)(1 - Z_k) \ge \tfrac12\lambda_k^{-1}(\varepsilon_kR_k)^2. \]
Let $n$ be large enough so that $m(k) \le n < m(k+1)$ for some $k \ge k_0$. Then
\[ \frac{\log S_n}{\log n} \ge \frac{\log S_{m(k)}}{\log m(k+1)} \ge \frac{2k - c\log k}{3(k+1) + c'\log(k+1)}, \]
and the lower bound in (1.23) follows. This proves (1.23) when $x = 0$.
Now let
\[ \Omega_0 = \Big\{\omega : G(\omega) \text{ is recurrent and } P^0_\omega\Big(\lim_n (\log S_n/\log n) = \tfrac23\Big) = 1\Big\}. \]
We have $P(\Omega_0) = 1$. If $\omega \in \Omega_0$, and $x \in G(\omega)$ then $X$ hits 0 with $P^x_\omega$-probability 1. Since the limit does not depend on the initial segment $X_0, \dots, X_{T_0}$, we obtain (1.23).
(b) We have $|W_n| \le S_n \le c_0|W_n|$, so (1.24) is immediate from (1.23).
Remark. Note that the constants $c_i$ in Theorem 1.4 and $\alpha_i$ in Theorem 1.5 depend only on the constants $c_1, c_2, c_3, q_0$ in Assumption 1.2.
3. Verification of Assumption 1.2 for the IIC
In Sect. 3.1, we state three propositions which give estimates for the volume and effective resistance for the IIC. Propositions 3.1–3.2, which pertain to the volume growth of $C$, are proved in Sect. 4. Proposition 3.3, which will be used to estimate the effective resistance, is proved in Sect. 5. In Sect. 3.2, we use the three propositions to verify Assumption 1.2 for the IIC, and complete the proof of our main result Theorem 1.7.
3.1. Three propositions. We will use the following notation for the IIC. Let $U(R) = \{(x,n) : n \ge R\}$, $B(R) = \{(x,n) \in C : 0 \le n < R\}$, and $\partial B(R) = \{(x,R) : (x,R) \in C\}$. We note that, using the graph distance $d$ on $C$, $B(R)$ is just the ball $B(0,R)$, and $\partial B(R)$ is its exterior boundary. Let
\[ Z_R = b_0R^{-2}V(R), \tag{3.1} \]
where $b_0$ is a constant that will be specified below (4.25). The constant $b_0$ has limit $\tfrac12$ as $L \to \infty$.
Proposition 3.1. Let $d > 4$ and $L \ge L_0$. Under the IIC measure, the random variables $Z_R$ converge in distribution to a strictly positive limit $Z$, whose distribution is independent of $d$ and $L$. Also, all moments converge, i.e., $E_\infty Z_R^l \to EZ^l$ for each $l \in \mathbb{N}$. In particular, $c_1(d)R^2 \le E_\infty V(R) \le c_2(d)R^2$, $R \ge 1$. Moreover, $c_1$ and $c_2$ do not depend on $d$, if we further require that $L \ge L_1$, for some $L_1 = L_1(d)$.
Remark. We do not need the full strength of Proposition 3.1 to establish Assumption 1.2 for the IIC. However, since the scaling limit of $V(R)$ is also of independent interest, we will prove the stronger result, and, moreover, identify the limiting random variable $Z$ in terms of super-Brownian motion.
Proposition 3.2. Let $d > 4$ and $L \ge L_0$. Then
\[ Q_\infty\big(V(R)R^{-2} < \lambda\big) \le c_1(d)\exp\{-c_2(d)\lambda^{-1/2}\}, \quad R \ge 1. \tag{3.2} \]
Moreover, $c_1$ and $c_2$ do not depend on $d$, if we further require that $L \ge L_1$, for some $L_1 = L_1(d)$.
The third proposition gives an estimate on the expected number of edges at level $n-1$ that need to be cut in order to disconnect 0 from level $R$. We say that $(x,n), (x',n') \in C$ are RW-connected, if there is a path, not necessarily oriented, in $C$ from $(x,n)$ to $(x',n')$. We reserve the term connected to mean oriented connection, that is $(x,n) \longrightarrow (x',n')$. Let
(x, n) is RW-connected to , 0 6. There exists L 1 = L 1 (d) ≥ L 0 (d) such that for L ≥ L 1 , R ≥ 1 and 0 < a < 1, E∞ (|D(n)|) ≤ c1 (a),
0 < n ≤ a R.
(3.4)
The constant $c_1(a)$ is independent of the dimension $d$ and also of $L$.
Remark. Proposition 3.3 is the only place where we need $d > 6$ rather than $d > 4$.
3.2. Verification of Assumption 1.2 for the IIC. We begin with a lemma that relates $|D(n)|$ and the effective resistance.
Lemma 3.4. For oriented percolation in any dimension $d \ge 1$,
\[ R_{\rm eff}(0,\partial B(R)) \ge \sum_{n=1}^{R} \frac{1}{|D(n)|}. \tag{3.5} \]
Proof. We have that $R_{\rm eff}(0,\partial B(R))$ is the minimum energy dissipation of a unit current from 0 to $\partial B(R)$ (see [15, p. 63]). Let $I$ be such a unit current. Fix $1 \le n \le R$, let $k = |D(n)|$, and let $J_1, \dots, J_k$ be the currents in the bonds in $D(n)$. Then since all the current must flow through the edges in $D(n)$, we have $\sum_{i=1}^k |J_i| \ge 1$. Hence, by the Cauchy–Schwarz inequality, the energy dissipation for $I$ in the bonds in $D(n)$, which is $\sum_{i=1}^k |J_i|^2$, is at least $1/k = |D(n)|^{-1}$. Summing then gives (3.5).
Now we combine Proposition 3.3 and Lemma 3.4 to show that it is unlikely that the effective resistance $R_{\rm eff}(0,\partial B(R))$ is less than a small multiple of $R$.
Proposition 3.5. There is a constant $c$ such that for $d > 6$, $L \ge L_1$, $R \ge 2$ and $\varepsilon > 0$,
\[ Q_\infty\big(R_{\rm eff}(0,\partial B(R)) \le \varepsilon R\big) \le c\varepsilon. \tag{3.6} \]
Proof. Let $R \ge 2$. Fix $\tfrac12 < a < 1$ and let $r = \lfloor aR\rfloor$; note that $r \ge 1$. By Lemma 3.4 and the Cauchy–Schwarz inequality,
\[ R_{\rm eff}(0,\partial B(R))^{-1} \le \Big(\sum_{n=1}^{r} |D(n)|^{-1}\Big)^{-1} \le r^{-2}\sum_{n=1}^{r} |D(n)|. \tag{3.7} \]
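Therefore, by Proposition 3.3, Markov's inequality and (3.7),
\begin{align*}
Q_\infty\big(R_{\rm eff}(0,\partial B(R)) \le \varepsilon R\big) &= Q_\infty\big(R_{\rm eff}(0,\partial B(R))^{-1} \ge \varepsilon^{-1}R^{-1}\big) \le \varepsilon R\,E_\infty R_{\rm eff}(0,\partial B(R))^{-1} \\
&\le \varepsilon R\,r^{-2}E_\infty\sum_{n=1}^{r} |D(n)| \le \varepsilon R\,r^{-1}c_1(a) \le 2a^{-1}c_1(a)\varepsilon.
\end{align*}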
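The cut-set bound of Lemma 3.4 is in the spirit of the Nash-Williams inequality, and it can be illustrated on a small layered graph. The following Python sketch is a simplified stand-in rather than the setting of the paper: the three-level graph, the use of all edges between consecutive levels in place of $D(n)$, and the helper names (`solve`, `energy`, `resistance_to_set`, `cut_set_bound`) are assumptions made for the example.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def energy(adj, f):
    """Energy of f with a unit resistor on each edge (each edge counted once)."""
    return sum((f[x] - f[y]) ** 2 for x in adj for y in adj[x]) / 2


def resistance_to_set(adj, source, sink):
    """Effective resistance between a vertex and a set, via the harmonic potential."""
    sink = set(sink)
    interior = [v for v in adj if v != source and v not in sink]
    idx = {v: i for i, v in enumerate(interior)}
    A = [[0] * len(interior) for _ in interior]
    rhs = [0] * len(interior)
    for v in interior:
        A[idx[v]][idx[v]] = len(adj[v])
        for w in adj[v]:
            if w in idx:
                A[idx[v]][idx[w]] -= 1
            elif w == source:
                rhs[idx[v]] += 1  # potential 1 at the source, 0 on the sink
    h = dict(zip(interior, solve(A, rhs)))
    h[source] = Fraction(1)
    h.update({v: Fraction(0) for v in sink})
    return 1 / energy(adj, h)


def cut_set_bound(levels, edges):
    """Sum over n of 1 / (number of edges joining level n-1 to level n)."""
    level_of = {v: n for n, layer in enumerate(levels) for v in layer}
    sizes = [0] * len(levels)
    for u, v in edges:
        sizes[max(level_of[u], level_of[v])] += 1
    return sum(Fraction(1, s) for s in sizes[1:])


# Hypothetical layered graph: every edge joins consecutive levels.
levels = [["a"], ["b1", "b2"], ["c1", "c2"]]
edges = [("a", "b1"), ("a", "b2"), ("b1", "c1"), ("b1", "c2"), ("b2", "c2")]
adj = {v: [] for layer in levels for v in layer}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

resistance = resistance_to_set(adj, "a", levels[-1])
bound = cut_set_bound(levels, edges)
```

Here the exact resistance from the root to the top level is 6/7, while the cut-set sum is 1/2 + 1/3 = 5/6, so the lower bound holds and is close to sharp in this example.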
Proof of Theorem 1.7. Let $W_R = V(R)/R^2$. By Proposition 3.1 we have (2) and
\[ Q_\infty(W_R \ge \lambda) \le \lambda^{-1}E_\infty W_R \le c\lambda^{-1}. \tag{3.8} \]
Also, Proposition 3.2 gives
\[ Q_\infty(W_R < \lambda^{-1}) \le c\exp(-c'\lambda^{1/2}), \tag{3.9} \]
and (3) is then immediate after integration. The combination of (3.8)–(3.9) and (3.6) (with $\varepsilon = \lambda^{-1}$), together with the fact that each of the bounds is less than $c\lambda^{-1}$ for large $\lambda$, gives (1) with $q_0 = 1$ and $R^* = 2$. The fact that all constants here are independent of $d$, $L$ implies that the constants in Assumption 1.2 share this independence.
4. IIC Volume Estimates: Proof of Propositions 3.1–3.2
In Sect. 4.2 we prove Proposition 3.1, and in Sect. 4.3 we prove Proposition 3.2. The proofs make use of results from several previous papers involving the lace expansion; these results are gathered together and slightly extended in Sect. 4.1. We assume throughout that $d > 4$ and that $L$ is large; these assumptions will often not be mentioned explicitly in the following. Throughout: $\beta = L^{-d}$, $K$ denotes a constant that only depends on $d$, and $\bar K$ denotes an absolute constant. The values of the constants $K$ and $\bar K$ may change from one occurrence to the next.
4.1. Preliminaries. In this section, we recall and slightly extend various results from [20,21,25,26]. These results isolate the necessary ingredients from other papers that will be used in the proof of Propositions 3.1–3.2.
4.1.1. Critical oriented percolation $r$-point functions. The critical oriented percolation two-point function $\tau_n(x)$ is defined by
\[ \tau_n(x) = P_{p_c}\big((0,0) \longrightarrow (x,n)\big). \tag{4.1} \]
Let $\tau_n = \sum_{x\in\mathbb{Z}^d}\tau_n(x)$. By [26, Theorem 1.1],
\[ \sup_{x\in\mathbb{Z}^d}\tau_n(x) \le K\beta(n+1)^{-d/2}, \quad n \ge 1, \tag{4.2} \]
\[ \tau_n = A\big(1 + O(n^{(4-d)/2})\big), \quad \text{as } n \to \infty, \tag{4.3} \]
Random Walk on the Incipient Infinite Cluster for Oriented Percolation
where $|A - 1| \le K\beta$. The estimate [25, (4.2)] shows that the error term in (4.3) is bounded by $K\beta n^{(4-d)/2}$ (note that $f_n(0, z_c)$ of [25] corresponds to our $\tau_n$). Hence for $L \ge L_1 = L_1(d)$, we have
\[
\bar K^{-1} \le A \le \bar K, \qquad
|\tau_n - A| \le \bar K n^{(4-d)/2}, \quad n \ge 1, \qquad
\bar K^{-1} \le \tau_n \le \bar K, \quad n \ge 0. \tag{4.4}
\]
Also, noting that $\tau_1$ is called $p_c$ in [26], we see from [26, Eq. (1.12)] that $|\tau_1 - 1| \le K\beta \le \bar K$ for $L \ge L_1(d)$ sufficiently large.

For all $r \ge 2$, the critical oriented percolation $r$-point function $\tau^{(r)}_{\vec n}(\vec x)$ is defined by
\[
\tau^{(r)}_{n_1,\dots,n_{r-1}}(x_1,\dots,x_{r-1}) = P_{p_c}\big((0,0) \longrightarrow (x_i,n_i) \text{ for all } i = 1,\dots,r-1\big), \tag{4.5}
\]
with $x_i \in \mathbb{Z}^d$, $n_i \in \mathbb{Z}_+$. The asymptotic behaviour of the Fourier transforms of the $r$-point functions is given in [26, Theorem 1.2]. A very special case of [26, Theorem 1.2] is that there is a $\delta > 0$ such that for $t_1, t_2 > 0$,
\[
\sum_{x_1,x_2\in\mathbb{Z}^d} \tau^{(3)}_{\lfloor nt_1\rfloor,\lfloor nt_2\rfloor}(x_1,x_2) = nV^*A^3\big[t_1\wedge t_2 + O(n^{-\delta})\big] \tag{4.6}
\]
as $n \to \infty$ (see [26, (1.22)]). The vertex factor $V^*$ is written $V$ in [26] but written $V^*$ here to avoid confusion with the volume. The vertex factor is a constant with $|V^* - 1| \le K\beta$, and we assume that $L_1$ has been chosen so that $\bar K^{-1} \le V^* \le \bar K$.

4.1.2. The IIC $r$-point functions. Let $\vec y = (y_1,\dots,y_{r-1})$ and $\vec m = (m_1,\dots,m_{r-1})$ with $y_i \in \mathbb{Z}^d$, $m_i \in \mathbb{Z}_+$. For $r \ge 2$, the IIC $r$-point function is defined by
\[
\rho^{(r)}_{\vec m}(\vec y) = Q_\infty\big((0,0) \longrightarrow (y_i,m_i) \text{ for all } i = 1,\dots,r-1\big). \tag{4.7}
\]
Let
\[
\hat\rho^{(r)}_{\vec m} = \sum_{y_1,\dots,y_{r-1}\in\mathbb{Z}^d} \rho^{(r)}_{\vec m}(\vec y). \tag{4.8}
\]
Let $A$ be the constant of (4.3), and let $V^*$ be the vertex factor of (4.6). Let $r \ge 2$, $\vec t = (t_1,\dots,t_{r-1}) \in (0,1]^{r-1}$, and for a positive integer $m$, let $m\vec t$ be the vector with components $\lfloor mt_i\rfloor$. It is immediate from [21, (5.15)] (with $k = 0$) that for $r \ge 2$,
\[
\lim_{m\to\infty} \frac{1}{(mA^2V^*)^{r-1}}\, \hat\rho^{(r)}_{m\vec t} = \hat M^{(r)}_{1,\vec t}, \tag{4.9}
\]
where the limit $\hat M^{(r)}_{1,\vec t}$ is defined recursively as follows (see [21, Sect. 4.2]). For $r = 1$, we have simply
\[
\hat M^{(1)}_s = 1. \tag{4.10}
\]
For $r \ge 2$ and $\bar s = (s_1,\dots,s_r)$ with each $s_i > 0$, the $\hat M^{(r)}_{\bar s}$ are given recursively by
\[
\hat M^{(r)}_{\bar s} = \int_0^{\underline{s}} ds\, \hat M^{(1)}_s \sum_{I\subset J_1 : |I|\ge 1} \hat M^{(i)}_{\bar s_I - s}\, \hat M^{(r-i)}_{\bar s_{J\setminus I} - s}, \tag{4.11}
\]
where $i = |I|$, $J = \{1,\dots,r\}$, $J_1 = J\setminus\{1\}$, $\underline{s} = \min_i s_i$, $\bar s_I$ denotes the vector consisting of the components $s_i$ of $\bar s$ with $i \in I$, and $\bar s_I - s$ denotes subtraction of $s$ from each component of $\bar s_I$. The explicit solution to the recursive formula (4.11) can be found, e.g.,
in [26, (1.25)]. In particular, $\hat M^{(2)}_{s_1,s_2} = s_1 \wedge s_2$. It is shown in [21, Lemma 4.2] that for $r \ge 1$ and $t > 0$,
\[
\hat M^{(r)}_{t,\dots,t} = t^{r-1}\, 2^{-(r-1)}\, r!. \tag{4.12}
\]
To this we add the following elementary fact.

Lemma 4.1. For $r \ge 1$, $\hat M^{(r)}_{s_1,\dots,s_r}$ is nondecreasing in each $s_i$.

Proof. The proof is by induction on $r$. For $r = 1$, $\hat M^{(1)}_{s_1} = 1$ by (4.10), which is nondecreasing. Assume the result holds for all $j \le r$. Then it holds also for $r + 1$ by (4.11), since increasing an $s_i$ can only increase the integrand (by the induction hypothesis) or the domain of integration in (4.11).

4.1.3. Super-Brownian motion. As discussed in [21, Sect. 4], the quantity $\hat M^{(r)}_{\bar s}$ appearing in (4.9) is the $r$th moment of the canonical measure $N$ of super-Brownian motion $X_t$, namely
\[
\hat M^{(r)}_{s_1,\dots,s_r} = N\big(X_{s_1}(\mathbb{R}^d)\cdots X_{s_r}(\mathbb{R}^d)\big). \tag{4.13}
\]
For an introduction to the canonical measure, see [36, Chap. 17]. Let $Y_t$ denote the canonical measure of super-Brownian motion conditioned to survive for all time (see [20]). Let
\[
Z = \int_0^1 dt\, Y_t(\mathbb{R}^d), \tag{4.14}
\]
so that $Z$ is a positive random variable. It is clear that the distribution of $Z$ does not depend on $L$. It also does not depend on $d$, since it is equal to the mass up to time 1 of the continuum random tree conditioned to survive forever. The moments of $Z$ are given, for integers $l \ge 1$, by
\[
EZ^l = \int_0^1 dt_1 \cdots \int_0^1 dt_l\, \hat M^{(l+1)}_{1,\vec t} \tag{4.15}
\]
(see [20, Sect. 3.4]). We will use the fact that $Z$ has an exponential moment. This follows from
\[
EZ^l \le \int_0^1 dt_1 \cdots \int_0^1 dt_l\, \hat M^{(l+1)}_{1,1,\dots,1} = 2^{-l}(l+1)!, \tag{4.16}
\]
where we have used (4.15), Lemma 4.1 and (4.12).

4.1.4. Rate of convergence to the IIC. For the proof of Proposition 3.2, we will need an estimate for the rate of convergence of $P_n$ to $P_\infty$ (recall the definitions from (1.27)–(1.28)). Let $\mathcal{E}_m$ denote the set of cylinder events measurable with respect to the set of edges up to level $m - 1$. In [21, Eq. (2.19)], the following representation was obtained for $P_n(E)$, $E \in \mathcal{E}_m$:
\[
P_n(E) = \frac{1}{\tau_n}\Big[\sum_{l=m}^{n-1} \varphi_l(E)\,\tau_1\,\tau_{n-l-1} + \varphi_n(E)\Big], \tag{4.17}
\]
where $\varphi_l(E)$ is a function arising in the lace expansion. The factor $\tau_1$ was called $p_c$ in [21]. By [21, Lemma 2.2], $\varphi_l$ satisfies
\[
|\varphi_l(E)| \le K\beta m (l-m+1)^{-d/2}, \quad l \ge m+1. \tag{4.18}
\]
However, a very slight modification of the proof of [21, Lemma 2.2] actually shows that
\[
|\varphi_l(E)| \le K\beta (l-m+1)^{(2-d)/2}, \quad l \ge m \ge 1 \tag{4.19}
\]
(replace the upper bound $Km(l-m+1)^{-d/2}$ on $\sum_{a=0}^{m-1}(l-a)^{-d/2}$ used in [21, (2.33),(2.35)] by the more careful upper bound $K(l-m+1)^{(2-d)/2}$), and we will use this variant. The IIC measure is given in [21, Eq. (2.29)] as
\[
P_\infty(E) = \sum_{l=m}^{\infty} \tau_1 \varphi_l(E), \quad E \in \mathcal{E}_m. \tag{4.20}
\]
The following lemma bounds the rate at which the measure $P_{2m}$ converges to $P_\infty$.

Lemma 4.2. Let $d > 4$. For $E \in \mathcal{E}_m$,
\[
|P_{2m}(E) - P_\infty(E)| = O\big((m+1)^{(4-d)/2}\big), \tag{4.21}
\]
where the constant in the error term is uniform in $E$ and $L \ge L_0$. The error term can be guaranteed to be uniform in $d$ as well, by further requiring that $L \ge L_1$ for some $L_1 = L_1(d)$.

Proof. By the triangle inequality,
\[
|P_{2m}(E) - P_\infty(E)| \le \Big|P_{2m}(E) - \sum_{l=m}^{2m}\tau_1\varphi_l(E)\Big| + \Big|P_\infty(E) - \sum_{l=m}^{2m}\tau_1\varphi_l(E)\Big|. \tag{4.22}
\]
For the second term on the right-hand side, we use (4.20) and (4.19) to obtain
\[
\Big|P_\infty(E) - \sum_{l=m}^{2m}\tau_1\varphi_l(E)\Big| \le \sum_{l=2m+1}^{\infty}\tau_1|\varphi_l(E)| \le K\beta\sum_{l=2m+1}^{\infty}(l-m+1)^{(2-d)/2} \le K\beta m^{(4-d)/2}. \tag{4.23}
\]
For the first term on the right-hand side of (4.22), we use (4.17) to obtain
\[
\Big|P_{2m}(E) - \sum_{l=m}^{2m}\tau_1\varphi_l(E)\Big| \le \sum_{l=m}^{2m-1}\tau_1|\varphi_l(E)|\,\Big|\frac{\tau_{2m-l-1}}{\tau_{2m}} - 1\Big| + |\varphi_{2m}(E)|\,\Big|\frac{1}{\tau_{2m}} - \tau_1\Big|. \tag{4.24}
\]
By (4.19), the last term is bounded by $K\beta m^{(2-d)/2}$. To bound the sum, we split it into the cases $m \le l < 3m/2$ and $3m/2 \le l \le 2m-1$. In the first case, we use (4.3) to obtain $|(\tau_{2m-l-1}/\tau_{2m}) - 1| \le K\beta m^{(4-d)/2}$. Then inserting the bound (4.19) and summing over $l$, we obtain a bound $K\beta m^{(4-d)/2}$ for the first case. In the second case, we bound $|\tau_{2m-l-1}/\tau_{2m} - 1| \le K$. Inserting the bound on $\varphi_l$, and summing over $l$, we obtain a bound $K\beta m^{(4-d)/2}$ for the second case. Thus, in either case, (4.24) is bounded by $K\beta m^{(4-d)/2}$. For $L \ge L_1$ this bound is at most $\bar K m^{(4-d)/2}$. With (4.22)–(4.23), this proves (4.21).
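As a numerical sanity check of the tail estimate used in (4.23), the following sketch compares a truncated version of the sum of $(l-m+1)^{(2-d)/2}$ over $l \ge 2m+1$ with the integral comparison bound $\frac{2}{d-4}(m+1)^{(4-d)/2}$, which has the order $m^{(4-d)/2}$ appearing in the proof. The function names, the truncation length and the explicit constant $2/(d-4)$ are choices made for this illustration only and are not taken from the paper.

```python
def tail_sum(m: int, d: int, terms: int = 200_000) -> float:
    """Truncated sum of (l - m + 1)^((2 - d)/2) over l = 2m+1, ..., 2m+terms."""
    exponent = (2 - d) / 2
    return sum((l - m + 1) ** exponent for l in range(2 * m + 1, 2 * m + 1 + terms))


def integral_bound(m: int, d: int) -> float:
    """Integral comparison for d > 4:
    sum_{j >= m+2} j^((2-d)/2) <= (2/(d-4)) * (m+1)^((4-d)/2)."""
    return 2.0 / (d - 4) * (m + 1) ** ((4 - d) / 2)


if __name__ == "__main__":
    # Print the truncated tail next to the bound for a few dimensions and block sizes.
    for d in (5, 6, 7):
        for m in (1, 10, 100):
            print(d, m, tail_sum(m, d), integral_bound(m, d))
```

Since every term is positive, the truncated sum is a lower bound for the full tail, so the comparison only illustrates the inequality and does not prove it.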
4.2. Volume convergence: Proof of Proposition 3.1. In this section, we prove Proposition 3.1. We now choose $b_0 = (2\tau_1A^2V^*R^2)^{-1}$ in (3.1), so that $Z_R$ is defined by
\[
Z_R = (2\tau_1A^2V^*R^2)^{-1}V(R). \tag{4.25}
\]
As pointed out in Sect. 4.1, the constants $\tau_1$, $A$, $V^*$ all have limit 1 as $L \to \infty$. Let
\[
\tilde Z_R = (A^2V^*R^2)^{-1}|B(R)|. \tag{4.26}
\]
Thus $\tilde Z_R$ is defined in terms of the vertices in $B(R)$, whereas $Z_R$ is defined in terms of the edges. Recall the random variable $Z$ defined in (4.14). We use (4.9) to prove that $\lim_{R\to\infty} E\tilde Z_R^l = EZ^l$ for all $l \ge 1$, and then adapt this to $Z_R$. Let $l \ge 1$. By definition,
\[
E\tilde Z_R^l = \frac{1}{(A^2V^*R^2)^l}\sum_{n_1=0}^{R-1}\cdots\sum_{n_l=0}^{R-1}\ \sum_{x_1\in\mathbb{Z}^d}\cdots\sum_{x_l\in\mathbb{Z}^d}\rho^{(l+1)}_{n_1,\dots,n_l}(x_1,\dots,x_l)
= \frac{1}{R}\sum_{n_1=0}^{R-1}\cdots\frac{1}{R}\sum_{n_l=0}^{R-1}\frac{1}{(A^2V^*R)^l}\,\hat\rho^{(l+1)}_{\vec tR}, \tag{4.27}
\]
where $\vec t = (n_1R^{-1},\dots,n_lR^{-1})$. The summand on the right-hand side is bounded by a constant, by standard tree-graph inequalities [1] (see [21, Sect. 5.1] for the details when $l = 1$). Therefore, by (4.9), the dominated convergence theorem, and (4.15),
\[
\lim_{R\to\infty} E\tilde Z_R^l = \int_0^1 dt_1\cdots\int_0^1 dt_l\, \hat M^{(l+1)}_{1,\vec t} = EZ^l. \tag{4.28}
\]
The next lemma implies that it is also the case that $\lim_{R\to\infty} EZ_R^l = EZ^l$ for all $l \ge 1$.

Lemma 4.3. For all $l \ge 1$ and $R \ge 3$,
\[
(1 - 2/R)^{2l}\, E\tilde Z_{R-2}^l \le EZ_R^l \le E\tilde Z_{R-1}^l + c(d,L,l)R^{-1}. \tag{4.29}
\]
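The role of the moment bound (4.16) in passing from convergence of moments to weak convergence can be made explicit: inserting $EZ^l \le 2^{-l}(l+1)!$ into the exponential series gives $\sum_{l\ge 0}\theta^l EZ^l/l! \le \sum_{l\ge 0}(l+1)(\theta/2)^l = (1-\theta/2)^{-2}$ for $0 \le \theta < 2$. The sketch below checks this closed form numerically; the function names and the number of terms are assumptions of this illustration, not part of the paper.

```python
from math import factorial


def moment_bound(l: int) -> float:
    """Right-hand side of (4.16): 2^(-l) * (l + 1)!  (use only for moderate l)."""
    return factorial(l + 1) / 2 ** l


def mgf_upper_partial_sum(theta: float, terms: int = 400) -> float:
    """Partial sum of sum_l theta^l * moment_bound(l) / l!, written as sum_l (l+1)(theta/2)^l
    to avoid forming very large factorials."""
    return sum((l + 1) * (theta / 2) ** l for l in range(terms))


def closed_form(theta: float) -> float:
    """Value of the full series for |theta| < 2."""
    return (1 - theta / 2) ** -2


if __name__ == "__main__":
    for theta in (0.5, 1.0, 1.5):
        print(theta, mgf_upper_partial_sum(theta), closed_form(theta))
```

The series diverges at $\theta = 2$, which matches the statement that the bound yields a radius of convergence of at least 2.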
Since $Z$ was shown in (4.16) to have a moment generating function with radius of convergence at least 2, the convergence of moments established in Lemma 4.3 implies that $Z_R$ converges weakly to $Z$ (see [12, Theorem 30.2]). Note that for $L \ge L_1$, the constants $A$, $V^*$ and $\tau_1$ satisfy bounds independent of $d$, hence $c_1$ and $c_2$ in Proposition 3.1 do not depend on $d$. This completes the proof of Proposition 3.1, subject to Lemma 4.3.

Proof of Lemma 4.3. For $l \ge 1$, we define
\[
\sigma^{(l+1)}_{\vec m}(\vec x,\vec y) = Q_\infty\big((0,0) \longrightarrow (x_i,m_i) \longrightarrow (y_i,m_i+1) \text{ for all } i = 1,\dots,l\big).
\]
Note that
\[
2\,|\text{edges in } B(R-1)| \le \sum_{(x,m)\in B(R)} \mu_{(x,m)} = V(R) \le 2\,|\text{edges in } B(R)|, \tag{4.30}
\]
since edges on the boundary of $B(R)$ are counted once in $V(R)$, while other edges are counted twice. Therefore
\[
EZ_R^l \ge \frac{1}{(\tau_1A^2V^*R^2)^l}\sum_{n_1=0}^{R-2}\cdots\sum_{n_l=0}^{R-2}\ \sum_{x_1,y_1\in\mathbb{Z}^d}\cdots\sum_{x_l,y_l\in\mathbb{Z}^d}\sigma^{(l+1)}_{n_1,\dots,n_l}(x_1,\dots,x_l,y_1,\dots,y_l), \tag{4.31}
\]
with a corresponding upper bound if the summations over the $n_i$'s extend to $R - 1$.

Lower bound. The Harris–FKG inequality [16,18] implies that for increasing events $A$ and $B$ we have $Q_n(A\cap B) \ge Q_n(A)P(B)$. If $A$ and $B$ are cylinder events, then by passing to the limit, we have $Q_\infty(A\cap B) \ge Q_\infty(A)P(B)$. Hence
\[
\sigma^{(l+1)}_{\vec n}(\vec x,\vec y) \ge \rho^{(l+1)}_{\vec n}(\vec x)\prod_{i=1}^{l}\tau_1(y_i - x_i). \tag{4.32}
\]
With (4.27), this gives $EZ_R^l \ge [(R-2)/R]^{2l}\, E\tilde Z_{R-2}^l$.

Upper bound. Let
\[
A_{\vec m}(\vec x) = \{(0,0) \longrightarrow \infty,\ (0,0) \longrightarrow (x_i,m_i),\ i = 1,\dots,l\}.
\]
Let $F_{\vec m}(\vec x,\vec y)$ denote the event that the following $l + 1$ events occur on disjoint sets of edges:
\[
A_{\vec m}(\vec x),\ \{(x_1,m_1) \longrightarrow (y_1,m_1+1)\},\ \dots,\ \{(x_l,m_l) \longrightarrow (y_l,m_l+1)\}. \tag{4.33}
\]
Then
\[
\sigma^{(l+1)}_{\vec m}(\vec x,\vec y) \le Q_\infty\big(F_{\vec m}(\vec x,\vec y)\big) + Q_\infty\Big(A_{\vec m}(\vec x)\cap\bigcap_{i=1}^{l}\{(x_i,m_i) \longrightarrow (y_i,m_i+1)\}\setminus F_{\vec m}(\vec x,\vec y)\Big). \tag{4.34}
\]
The BK inequality implies that for increasing events $A$ and $B$ that depend on only finitely many edges we have $P(A\circ B) \le P(A)P(B)$, where $A\circ B$ denotes disjoint occurrence [8,18]. We will bound the first term by passing to the limit in the BK inequality. Let
\[
A_{\vec m,n}(\vec x) = \{(0,0) \longrightarrow n,\ (0,0) \longrightarrow (x_i,m_i),\ i = 1,\dots,l\},
\]
and define $F_{\vec m,n}(\vec x,\vec y)$ analogously, by replacing $A_{\vec m}(\vec x)$ in (4.33) by $A_{\vec m,n}(\vec x)$. Then each event in the definition of $F_{\vec m,n}(\vec x,\vec y)$ only depends on finitely many edges, hence by BK,
\[
P\big(F_{\vec m,n}(\vec x,\vec y)\big) \le P\big(A_{\vec m,n}(\vec x)\big)\prod_{i=1}^{l}\tau_1(y_i - x_i).
\]
Dividing both sides by $P((0,0) \longrightarrow n)$ and letting $n \to \infty$, we get
\[
Q_\infty\big(F_{\vec m}(\vec x,\vec y)\big) \le Q_\infty\big(A_{\vec m}(\vec x)\big)\prod_{i=1}^{l}\tau_1(y_i - x_i) = \rho^{(l+1)}_{\vec m}(\vec x)\prod_{i=1}^{l}\tau_1(y_i - x_i). \tag{4.35}
\]
The sum of this bound over $\vec x$ and $\vec y$ is $\hat\rho^{(l+1)}_{\vec m}\tau_1^l$. With (4.27), this gives a contribution $E\tilde Z_{R-1}^l$ to the upper bound version of (4.31).

We claim that on the event $A_{\vec m}(\vec x)\cap\bigcap_{i=1}^{l}\{(x_i,m_i) \longrightarrow (y_i,m_i+1)\}\setminus F_{\vec m}(\vec x,\vec y)$, there exists $1 \le i \le l$ such that either $(x_i,m_i) \longrightarrow (x_j,m_j)$ for some $j \ne i$, or $(x_i,m_i) \longrightarrow \infty$. To see this, we may assume that all the $(x_i,m_i)$'s are different, otherwise there is nothing to prove. Under this assumption, the last $l$ events in (4.33) occur disjointly. As in a tree-graph bound [1], choose a set of disjoint paths showing that $A_{\vec m}(\vec x)$ occurs. Then at least one of the paths uses an edge $((x_i,m_i),(y_i,m_i+1))$, otherwise $F_{\vec m}(\vec x,\vec y)$ would occur. This path includes a connection $(x_i,m_i) \longrightarrow (x_j,m_j)$ or $(x_i,m_i) \longrightarrow \infty$, proving the claim. By the claim, the second term on the right-hand side of (4.34) is at most
\[
\sum_{1\le i\le l}\Big[\sum_{j\ne i} Q_\infty\big(A_{\vec m}(\vec x),\ (x_i,m_i) \longrightarrow (x_j,m_j)\big) + Q_\infty\big(A_{\vec m}(\vec x),\ (x_i,m_i) \longrightarrow \infty\big)\Big]. \tag{4.36}
\]
Each term in (4.36) can be bounded using a tree-graph inequality where the number of internal vertices in the tree-graph bound is $l - 1$, one less than it would be for $\rho^{(l+1)}$. This implies that the sum of (4.36) over $\vec x$ and $\vec y$ inside $B(R)$ is bounded by $c(d,L,l)R^{l-1}$. It follows that $EZ_R^l \le E\tilde Z_{R-1}^l + c(d,L,l)R^{-1}$, which gives the desired upper bound and completes the proof of (4.29).
4.3. Volume estimate: Proof of Proposition 3.2. In this section, we prove Proposition 3.2. Recall the definitions of $P_n$ and $P_\infty$ from (1.27)–(1.28). It is enough to show that we can find constants $R_0(d)$, $c_1(d)$, $c_2(d)$, $c_3(d)$ such that for $R \ge R_0$ and $\lambda \le c_3$ we have
\[
P_\infty\big(V(R)R^{-2} < \lambda\big) \le c_1\exp\{-c_2\lambda^{-1/2}\}. \tag{4.37}
\]
Indeed, the restrictions on $\lambda$ and $R$ can be removed by adjusting the constant $c_1$ as follows. First, for $\lambda > c_3$, if $c_1 > \exp\{c_2(c_3)^{-1/2}\}$, the right-hand side of (4.37) is larger than 1. As for $R < R_0$, due to the (deterministic) inequality $V(R) \ge R$, we have $V(R)R^{-2} \ge RR^{-2} > R_0^{-1}$. Therefore, if $\lambda < R_0^{-1}$, the left-hand side of (4.37) is 0. For $\lambda \ge R_0^{-1}$, it is enough to require that $c_1 > \exp\{c_2R_0^{1/2}\}$. Finally, note that if initially $R_0$, $c_1$, $c_2$, $c_3$ are independent of $d$, then so is the adjusted $c_1$.

We begin with a simple consequence of Proposition 3.1.

Corollary 4.4. Given $\varepsilon > 0$, there exists $\lambda_0 = \lambda_0(\varepsilon,d)$, such that
\[
Q_\infty\big(V(R)R^{-2} < \lambda_0\big) < \varepsilon, \quad R \ge 1. \tag{4.38}
\]
For $L \ge L_1$, $\lambda_0$ can be chosen independent of $d$.

Proof. This follows from Proposition 3.1 and the fact that $Z$ is strictly positive.

Let $c = c(d) = \sup_{m\ge 1}\tau_m$. According to (4.38), there is a constant $c_3 = c_3(d)$ such that
\[
P_\infty\big(V(R) < 4c_3(R+1)^2\big) < \frac{1}{3c}, \quad R \ge 1. \tag{4.39}
\]
We fix $m_0 = m_0(d)$ such that for $m \ge m_0$ the error term on the right-hand side of (4.21) is at most $(3c)^{-1}$. Let $R_0 = 16c_3m_0^2$. Fix $\lambda \le c_3$ and $R \ge R_0$. We will prove that (4.37) holds for $\lambda$ and $R$ with the choice of $c_3$ made and with $c_1 = 1$ and $c_2 = \tfrac12\log(3/2)\,c_3^{1/2}$.
There is nothing to prove if $\lambda < R_0/R^2$, since, in this case
\[
P_\infty\big(V(R)R^{-2} < \lambda\big) \le P_\infty\big(V(R) < R_0\big) \le P_\infty\big(V(R) < R\big) = 0 \tag{4.40}
\]
and (4.37) holds trivially. Hence, without loss of generality, we assume that
\[
\frac{R_0}{R^2} = \frac{16c_3m_0^2}{R^2} \le \lambda \le c_3. \tag{4.41}
\]
To estimate $P_\infty(V(R) < \lambda R^2)$, we subdivide the time interval $[0,R]$ into blocks that provide roughly independent contributions to the volume, and apply (4.39) in each block. The number of blocks is $S = \lfloor (c_3/\lambda)^{1/2}\rfloor$, which is at least 1 by (4.41). The length of a block is $2m$, with $m = \lfloor R/2S\rfloor$. Note that $m \ge m_0$, since
\[
\frac{R}{2S} \ge \frac{R}{2(c_3/\lambda)^{1/2}} \ge \frac{R_0^{1/2}}{2c_3^{1/2}} = 2m_0 > 1, \tag{4.42}
\]
and hence
\[
m = \Big\lfloor\frac{R}{2S}\Big\rfloor \ge \frac{R}{4S} \ge \frac{R}{4(c_3/\lambda)^{1/2}} \ge m_0. \tag{4.43}
\]
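The arithmetic behind the choice of the number of blocks and the block length can be checked on concrete numbers. The sketch below assumes that $S$ and $m$ are the integer parts of $(c_3/\lambda)^{1/2}$ and $R/(2S)$, and it uses made-up values of $c_3$ and $m_0$, since the paper only asserts that such constants exist; the function names are likewise hypothetical.

```python
from math import exp, floor, log, sqrt


def block_parameters(R, lam, c3, m0):
    """Return (S, m) with S = floor((c3/lam)^(1/2)) and m = floor(R/(2S))."""
    R0 = 16 * c3 * m0 ** 2
    if R < R0 or not (R0 / R ** 2 <= lam <= c3):
        raise ValueError("need R >= R0 and R0/R^2 <= lam <= c3, as in (4.41)")
    S = floor(sqrt(c3 / lam))
    m = floor(R / (2 * S))
    return S, m


def check_block_inequalities(R, lam, c3, m0):
    """Check the consequences of the choice of S and m that the proof relies on."""
    S, m = block_parameters(R, lam, c3, m0)
    root = sqrt(c3 / lam)
    return (
        S >= 1
        and S >= 0.5 * root                          # lower bound on the number of blocks
        and m >= m0                                  # conclusion of (4.43)
        and 2 * m * S <= R                           # S blocks of length 2m fit below level R
        and lam * R ** 2 <= 4 * c3 * (m + 1) ** 2    # volume threshold per block
        and (2 / 3) ** S <= exp(-0.5 * log(1.5) * root)  # resulting exponential decay
    )


if __name__ == "__main__":
    # Illustrative constants only: c3 = 0.25 and m0 = 3 give R0 = 36.
    print(block_parameters(100, 0.02, 0.25, 3))       # (3, 16)
    print(check_block_inequalities(100, 0.02, 0.25, 3))
```

With these illustrative values, three blocks of length 32 fit below level 100, and the per-block threshold $4c_3(m+1)^2 = 289$ exceeds $\lambda R^2 = 200$.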
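Set $n_i = i(2m)$, $i = 0,\dots,S$, so that the $i$th block starts at level $n_{i-1}$ and ends at level $n_i$. By (1.28),
\[
P_\infty\big(V(R) < \lambda R^2\big) = \lim_{N\to\infty}\frac{1}{\tau_N}\sum_{x\in\mathbb{Z}^d} P_{p_c}\big(V(R) < \lambda R^2,\ (0,0) \longrightarrow (x,N)\big). \tag{4.44}
\]
The path $(0,0) \longrightarrow (x,N)$ on the right-hand side passes through the levels $n_1,\dots,n_S$, and hence there exist $0 = x_0, x_1,\dots,x_S \in \mathbb{Z}^d$ such that $(0,0) \longrightarrow (x_1,n_1) \longrightarrow \cdots \longrightarrow (x_S,n_S) \longrightarrow (x,N)$. We write $\mathbf{x}_i = (x_i,n_i)$ for $i = 0,\dots,S$, and write $\mathbf{x} = (x,N)$. It follows that
\[
P_{p_c}\big(V(R) < \lambda R^2,\ (0,0) \longrightarrow (x,N)\big)
= P_{p_c}\Big(\bigcup_{x_1,\dots,x_S\in\mathbb{Z}^d}\{V(R) < \lambda R^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i,\ i = 1,\dots,S\}\cap\{\mathbf{x}_S \longrightarrow \mathbf{x}\}\Big)
\le \sum_{x_1,\dots,x_S\in\mathbb{Z}^d} P_{p_c}\big(V(R) < \lambda R^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i,\ i = 1,\dots,S,\ \mathbf{x}_S \longrightarrow \mathbf{x}\big). \tag{4.45}
\]
Let
\[
C(\mathbf{y};n) = C(\mathbf{y})\cap\big(\mathbb{Z}^d\times\{0,1,\dots,n\}\big). \tag{4.46}
\]
On the event on the right-hand side of (4.45), $\mathbf{x}_{i-1}$ is contained in $B(R)$, and hence $C(\mathbf{x}_{i-1};n_{i-1}+m) \subset B(R)$. Denote $V_i = \mu(C(\mathbf{x}_{i-1};n_{i-1}+m))$. Then on the event in the right-hand side of (4.45), since $\lambda \le c_3/S^2$ by the choice of $S$, we have
\[
V_i \le V(R) < \lambda R^2 \le \frac{c_3}{S^2}R^2 = 4c_3\Big(\frac{R}{2S}\Big)^2 \le 4c_3(m+1)^2. \tag{4.47}
\]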
Hence, the right-hand side of (4.45) is at most
\[
\sum_{x_1,\dots,x_S\in\mathbb{Z}^d} P_{p_c}\Big(\bigcap_{i=1}^{S}\{V_i < 4c_3(m+1)^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i\}\cap\{\mathbf{x}_S \longrightarrow \mathbf{x}\}\Big). \tag{4.48}
\]
The $S + 1$ events in (4.48) depend on disjoint sets of bonds, so the probability factors as
\[
\sum_{x_1,\dots,x_S\in\mathbb{Z}^d} P_{p_c}(\mathbf{x}_S \longrightarrow \mathbf{x})\prod_{i=1}^{S} P_{p_c}\big(V_i < 4c_3(m+1)^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i\big). \tag{4.49}
\]
We insert this into (4.45), and use (4.44), (4.3) and (1.27) to obtain
\[
P_\infty\big(V(R) < \lambda R^2\big) \le \limsup_{N\to\infty}\frac{\tau_{N-n_S}}{\tau_N}\prod_{i=1}^{S}\Big(\sum_{x_i\in\mathbb{Z}^d} P_{p_c}\big(V_i < 4c_3(m+1)^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i\big)\Big)
= \Big[\tau_{2m}P_{2m}\big(V(m) < 4c_3(m+1)^2\big)\Big]^S. \tag{4.50}
\]
By Lemma 4.2, the right-hand side equals
\[
\tau_{2m}^S\Big[P_\infty\big(V(m) < 4c_3(m+1)^2\big) + O\big((m+1)^{(4-d)/2}\big)\Big]^S. \tag{4.51}
\]
By the choice of $m_0$ and (4.39), both terms inside the square brackets are at most $(3c)^{-1}$. Since
\[
S = \lfloor (c_3/\lambda)^{1/2}\rfloor \ge \tfrac12 (c_3/\lambda)^{1/2},
\]
it follows from our choice of $c$ that
\[
P_\infty\big(V(R) < \lambda R^2\big) \le \tau_{2m}^S\Big(\frac{2}{3c}\Big)^S \le \Big(\frac{2}{3}\Big)^S \le \exp\big\{-\tfrac12\log(3/2)\,c_3^{1/2}\lambda^{-1/2}\big\}. \tag{4.52}
\]
The choice $c_2 = \tfrac12\log(3/2)\,c_3^{1/2}$ gives (4.37). Noting that for $L \ge L_1$, $c$, $c_3$ and $m_0$ (and hence all further constants chosen) are independent of $d$, this completes the proof of Proposition 3.2.

5. IIC Resistance Estimates: Proof of Proposition 3.3

In this section we prove Proposition 3.3. Throughout, we use $\mathbf{x}, \mathbf{y}, \dots$ to denote space-time vertices in $\mathbb{Z}^d\times\mathbb{Z}_+$, we denote the spatial component of a vertex $\mathbf{x}$ by $x$, and we write $|\mathbf{x}| = n$ when $\mathbf{x} = (x,n)$. According to (3.3),
\[
D(n) = \big\{e = (\mathbf{w},\mathbf{x}) \subset C : |\mathbf{x}| = n,\ \mathbf{x} \text{ is RW-connected to level } R \text{ by a path in } C\cap U(n)\big\}, \quad 0 < n \le R. \tag{5.1}
\]
Our goal is to prove that for $d > 6$, $L$ sufficiently large and $0 < a < 1$,
\[
E_\infty(|D(n)|) \le c_1(a), \quad 0 < n \le aR. \tag{5.2}
\]
Fig. 2. The configuration bounded in (5.4). The vertices $\mathbf{w} = (w,n-1)$, $\mathbf{x} = (x,n)$, $\mathbf{y} = (y,N)$ are summed over $w, x, y \in \mathbb{Z}^d$, and the three unlabelled vertices are summed over space and time
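Writing $\mathbf{y} = (y,N)$, by (1.28) and (4.3) we have
\[
E_\infty|D(n)| = \sum_{w,x\in\mathbb{Z}^d} P_\infty\big[(\mathbf{w},\mathbf{x}) \in D(n)\big]
= \frac{1}{A}\lim_{N\to\infty}\sum_{w,x,y\in\mathbb{Z}^d} P_{p_c}\big((\mathbf{w},\mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\big). \tag{5.3}
\]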
Hence we will focus on the event $\{(\mathbf{w},\mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$, for fixed $n$, $\mathbf{w} = (w,n-1)$, $\mathbf{x} = (x,n)$ and $\mathbf{y} = (y,N)$.

Remark. For a quick indication of why we need to assume $d > 6$, consider the configuration in Fig. 2, which contributes to the right-hand side of (5.3). Using the fact that $\tau_n$ is bounded by a constant by (4.4), and using (4.2) (see also (5.32) below), the configuration in Fig. 2 can be bounded above using the BK inequality by
\[
c\sum_{l=n}^{\infty}\sum_{k=n}^{l}\sum_{j=0}^{n}(l-j+1)^{-d/2}
\le c\sum_{l=n}^{\infty}\sum_{k=n}^{l}(l-n+1)^{(2-d)/2}
\le c\sum_{l=n}^{\infty}(l-n+1)^{(4-d)/2}
= c\sum_{m=1}^{\infty}m^{(4-d)/2}, \tag{5.4}
\]
where $j, k, l$ are the time coordinates of the unlabelled vertices, from bottom to top. Here, the connection from the lower unlabelled vertex to the upper unlabelled vertex via $\mathbf{w}$ and $\mathbf{x}$ contributes $K(l-j+1)^{-d/2}$, and the other connections all contribute constants. The right-hand side is bounded only for $d > 6$. Our complete proof of (5.2) is more involved since we must estimate the contributions to (5.3) due also to more complex zigzag random walk paths.

In Sect. 5.1, we prove Lemma 5.1, which explores the geometry of the event $\{(\mathbf{w},\mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. Then, in Sect. 5.2, we apply Lemma 5.1 to construct
Fig. 3. Illustration of the setup in Lemma 5.1
events $A_J(n,\mathbf{w},\mathbf{x},\mathbf{y})$, $J \ge 0$, such that
\[
\{(\mathbf{w},\mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\} \subset \bigcup_{J=0}^{\infty} A_J(n,\mathbf{w},\mathbf{x},\mathbf{y}). \tag{5.5}
\]
In Sect. 5.3, the BK inequality [8] is used to obtain a diagrammatic bound for the probability of the event $A_J(n,\mathbf{w},\mathbf{x},\mathbf{y})$. Finally, in Sect. 5.4, we estimate the diagrams in this diagrammatic bound, to prove (5.2) and hence Proposition 3.3. The need to restrict to $d > 6$, rather than $d > 4$, occurs only in our last lemma, Lemma 5.6.

5.1. An intersection lemma. We will need the existence of certain intersections within the cluster $C$ that are implied by the presence of a random walk path from $\mathbf{x}$ to $R$. These intersections are isolated in the following lemma. The following notation will be convenient:
\[
\tilde C^{(\mathbf{p},\mathbf{q})} = \{\mathbf{v} : \mathbf{0} \longrightarrow \mathbf{v} \text{ disjointly from the edge } (\mathbf{p},\mathbf{q})\}, \quad (\mathbf{p},\mathbf{q}) \subset C.
\]
Also, we write $\mathbf{y}_1\mathbf{y}_2$ for an occupied oriented path $\mathbf{y}_1 \longrightarrow \mathbf{y}_2$. Such paths are in general not unique, but context will often identify a unique path for consideration.

We first describe informally the statement of the lemma, whose setup is illustrated in Fig. 3. Suppose that $(\mathbf{w},\mathbf{x}) \in D(n)$, and $\mathbf{0} \longrightarrow \mathbf{y}$. Let $(\mathbf{p},\mathbf{q})$ be an edge on an occupied path that starts at $\mathbf{0}$ and ends with the edge $(\mathbf{w},\mathbf{x})$. Assume that $\mathbf{q} \not\longrightarrow R$. Then $C(\mathbf{q})$ must intersect $\tilde C^{(\mathbf{p},\mathbf{q})}$, otherwise a RW-connection from $\mathbf{x}$ to $R$ in $C\cap U(n)$ could not occur. The lemma gives a more sophisticated version of the intersection requirements, which allows us to have some control over the way the intersection occurs.
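To make the two vertex sets in this informal description concrete, the following sketch computes them on a small hand-made configuration of occupied oriented edges and returns a common vertex of lowest level, which is the choice of $\mathbf{z}$ made later in the proof of the lemma. Everything here is an illustrative assumption: the function names, the encoding of a vertex as a pair (position, level), and the toy edge set. The random walk connection and the region $U(n)$ are not modelled.

```python
from collections import deque
from typing import Optional, Set, Tuple

Vertex = Tuple[int, int]       # (spatial position, time level)
Edge = Tuple[Vertex, Vertex]   # occupied edge, oriented from level n to level n + 1


def forward_cluster(edges: Set[Edge], start: Vertex, removed: Optional[Edge] = None) -> Set[Vertex]:
    """Vertices reachable from `start` along oriented occupied edges, never using `removed`."""
    reached = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for (a, b) in edges:
            if a == v and (a, b) != removed and b not in reached:
                reached.add(b)
                queue.append(b)
    return reached


def lowest_intersection(edges: Set[Edge], origin: Vertex, p: Vertex, q: Vertex) -> Optional[Vertex]:
    """A lowest-level vertex common to C(q) and the set reached from the origin
    without the edge (p, q); None if the two sets are disjoint."""
    c_q = forward_cluster(edges, q)
    c_tilde = forward_cluster(edges, origin, removed=(p, q))
    common = c_q & c_tilde
    return min(common, key=lambda v: (v[1], v[0])) if common else None


if __name__ == "__main__":
    toy_edges = {
        ((0, 0), (0, 1)), ((0, 1), (0, 2)), ((0, 2), (0, 3)),   # path through the edge (p, q)
        ((0, 0), (1, 1)), ((1, 1), (1, 2)), ((1, 2), (0, 3)),   # second branch avoiding (p, q)
    }
    print(lowest_intersection(toy_edges, (0, 0), (0, 1), (0, 2)))  # (0, 3)
```

In the toy configuration the two branches from the origin meet again at the vertex (0, 3), which lies above $\mathbf{q} = (0,2)$ and can be reached from the origin without the edge $(\mathbf{p},\mathbf{q})$.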
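This is needed, because we will use the lemma recursively to construct a set of paths realizing the intersections. Assume that we are given a subgraph $A\cup B$ of $\tilde C^{(\mathbf{p},\mathbf{q})}$, that will represent a set of paths already constructed, where $A$ will be a certain 'preferred region.' Assume that $A\cup B$ is disjoint from $C(\mathbf{q})$, and $\mathbf{0} \in A\cup B$. Then there will be upwards occupied paths from some vertex $\mathbf{r} \in A\cup B$ and some vertex $\mathbf{p}' \in \mathbf{q}\mathbf{x}$ to an intersection point $\mathbf{z}$. It will be convenient if we can also conclude that $\mathbf{r}$ is in the preferred region $A$. For this reason, we will also assume that any occupied path from $B$ to $C(\mathbf{q})$ passes through $A$. Now we state the lemma precisely.

Lemma 5.1. Assume the event $\{(\mathbf{w},\mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. In addition, assume the following:
(i) $(\mathbf{p},\mathbf{q}) \subset C$ and either $\mathbf{q} \longrightarrow \mathbf{w}$ or $(\mathbf{p},\mathbf{q}) = (\mathbf{w},\mathbf{x})$;
(ii) $\mathbf{q} \not\longrightarrow R$;
(iii) $A$ and $B$ are subgraphs of $\tilde C^{(\mathbf{p},\mathbf{q})}$ with $\mathbf{0} \in A\cup B$, and such that $(A\cup B)\cap C(\mathbf{q}) = \emptyset$;
(iv) every occupied oriented path from $B$ to $C(\mathbf{q})$ passes through a vertex of $A$.
Then there exist $\mathbf{p}' \in \mathbf{q}\mathbf{x}$, $\mathbf{r} \in A$ and $\mathbf{z}$ with $|\mathbf{p}| < |\mathbf{z}| < R$, such that $\mathbf{p}' \longrightarrow \mathbf{z}$ and $\mathbf{r} \longrightarrow \mathbf{z}$ edge-disjointly, and edge-disjointly from $\mathbf{p}\mathbf{x}\cup A\cup B$. Here $\mathbf{z}$ may coincide with $\mathbf{p}'$ or $\mathbf{r}$.

Proof. We first show that $C(\mathbf{q})$ and $\tilde C^{(\mathbf{p},\mathbf{q})}$ must have a common vertex $\mathbf{v}$. Fix a random walk path $\Gamma$ from $\mathbf{x}$ to $R$ in $U(n)$, showing that $(\mathbf{w},\mathbf{x}) \in D(n)$. Note that $C$ (as a set of vertices) is the union $\tilde C^{(\mathbf{p},\mathbf{q})}\cup C(\mathbf{q})$. Since $\Gamma$ starts at $\mathbf{x} \in C(\mathbf{q})$, but $\mathbf{q} \not\longrightarrow R$, there is an edge $(\mathbf{v},\mathbf{v}') \subset \Gamma$ such that $\mathbf{v} \in C(\mathbf{q})$ but $\mathbf{v}' \notin C(\mathbf{q})$, and therefore $\mathbf{v}' \in \tilde C^{(\mathbf{p},\mathbf{q})}$. We need to have $|\mathbf{v}'| = |\mathbf{v}| - 1$ (otherwise $\mathbf{v}' \in C(\mathbf{q})$). We can rule out $(\mathbf{v}',\mathbf{v}) = (\mathbf{p},\mathbf{q})$, since $\Gamma$ stays in $U(n)$, and $|\mathbf{p}| \le n - 1$. It follows that $\mathbf{v} \in \tilde C^{(\mathbf{p},\mathbf{q})}$, and hence $\mathbf{v}$ is in the intersection $C(\mathbf{q})\cap\tilde C^{(\mathbf{p},\mathbf{q})}$.

Choose $\mathbf{z} \in C(\mathbf{q})\cap\tilde C^{(\mathbf{p},\mathbf{q})}$ with $|\mathbf{z}|$ minimal. Since $\mathbf{q} \not\longrightarrow R$, $|\mathbf{p}| < |\mathbf{q}| \le |\mathbf{z}| < R$. We can find occupied oriented paths $\mathbf{q}\mathbf{z} \subset C(\mathbf{q})$ and $\mathbf{0}\mathbf{z} \subset \tilde C^{(\mathbf{p},\mathbf{q})}$. These two paths must be edge-disjoint by minimality of $|\mathbf{z}|$. Let $\mathbf{p}'$ be the last visit of $\mathbf{q}\mathbf{z}$ to $\mathbf{q}\mathbf{x}$, and let $\mathbf{r}$ be the last visit of $\mathbf{0}\mathbf{z}$ to $A\cup B$. Such a last visit exists, since we assumed $\mathbf{0} \in A\cup B$. Since $\mathbf{z} \notin A\cup B$, due to $(A\cup B)\cap C(\mathbf{q}) = \emptyset$, the last visit has to be in $A$ by assumption (iv). The path $\mathbf{p}'\mathbf{z}$ is edge-disjoint from $\mathbf{p}\mathbf{x}$, by the definition of $\mathbf{p}'$. It is also edge-disjoint from $A\cup B$, by minimality of $|\mathbf{z}|$. Likewise, the path $\mathbf{r}\mathbf{z}$ is edge-disjoint from $A\cup B$ by definition of $\mathbf{r}$. It is also edge-disjoint from $\mathbf{p}'\mathbf{z}$, by minimality of $|\mathbf{z}|$.

Remark. Note that in the proof, we have first found a vertex $\mathbf{r} \in A\cup B$, and assumption (iv) was only used to show that we must have $\mathbf{r} \in A$. In fact, without assumption (iv), we would get the statement of the lemma with $\mathbf{r} \in A\cup B$. The significance of being able to ensure that $\mathbf{r}$ is in the smaller set $A$, as well as the roles played by $A$ and $B$, will become apparent in Sect. 5.2.

5.2. The event $A_J(n,\mathbf{w},\mathbf{x},\mathbf{y})$. In this section, we define the event $A_J(n,\mathbf{w},\mathbf{x},\mathbf{y})$ and prove (5.5). The following lemma is key.

Lemma 5.2. Let $e = (\mathbf{w},\mathbf{x})$, and assume the event $\{e \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. Then there exists $J \ge 0$, such that the following vertices and paths (all edge-disjoint) exist: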
Fig. 4. The vertices and disjoint paths of $A_J(n,\mathbf{w},\mathbf{x},\mathbf{y})$ for $J = 3$. Here $\mathbf{x} = \mathbf{v}_3$ and $\mathbf{w} = \mathbf{u}_3$
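(i) vertices $\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_J = \mathbf{w}$ such that $0 \le |\mathbf{u}_0| \le |\mathbf{u}_1| \le \cdots \le |\mathbf{u}_J| = n - 1$;
(ii) vertices $\mathbf{v}_0, \mathbf{v}_1, \dots, \mathbf{v}_J = \mathbf{x}$, and, if $J \ge 1$, vertices $\mathbf{z}_1, \dots, \mathbf{z}_J$ such that
\[
|\mathbf{u}_{i-1}| \le |\mathbf{v}_{i-1}| \le |\mathbf{z}_i|, \quad 1 \le i \le J; \tag{5.6}
\]
\[
|\mathbf{u}_{i-1}| < |\mathbf{z}_i| < R, \quad 1 \le i \le J; \tag{5.7}
\]
(iii) $\mathbf{0} \longrightarrow \mathbf{u}_0$ and $\mathbf{u}_{i-1} \longrightarrow \mathbf{u}_i$, $1 \le i \le J$;
(iv) $\mathbf{u}_{i-1} \longrightarrow \mathbf{z}_i$, $1 \le i \le J$;
(v) $\mathbf{v}_{i-1}$ lies either on $\mathbf{u}_{i-1}\mathbf{u}_i$ or $\mathbf{u}_{i-1}\mathbf{z}_i$, and $\mathbf{v}_i \longrightarrow \mathbf{z}_i$, $1 \le i \le J$.
In addition, at least one of the following holds:
Case (a) $\mathbf{v}_0 \longrightarrow \mathbf{y}$;
Case (b) $\mathbf{v}_0 \longrightarrow R$ and there exists $\mathbf{v}^*$ on $\mathbf{0}\mathbf{u}_0$ such that $\mathbf{v}^* \longrightarrow \mathbf{y}$.

Definition 5.3. We denote by $A_J = A_J(n,\mathbf{w},\mathbf{x},\mathbf{y})$ the event that the vertices and disjoint paths listed in Lemma 5.2 exist, and $(\mathbf{w},\mathbf{x})$ is occupied. See Fig. 4.

The inclusion (5.5) then follows immediately from Lemma 5.2.

Proof of Lemma 5.2. Throughout the proof, we assume the event $\{e = (\mathbf{w},\mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. We first show that if $\mathbf{x} \longrightarrow R$ then the lemma holds with $J = 0$. Indeed, take $\mathbf{u}_0 = \mathbf{w}$ and $\mathbf{v}_0 = \mathbf{x}$. Then $\mathbf{0} \longrightarrow \mathbf{u}_0$, since $\mathbf{u}_0 \in C$. Hence it is left to show that at least one of Cases (a) and (b) holds. If $\mathbf{v}_0 = \mathbf{x} \longrightarrow \mathbf{y}$, then Case (a) holds. If not, then since $\mathbf{0} \longrightarrow \mathbf{y}$ we can find $\mathbf{v}^* \in \mathbf{0}\mathbf{u}_0$ such that $\mathbf{v}^* \longrightarrow \mathbf{y}$ edge-disjointly from $\mathbf{0}\mathbf{u}_0$. The connection $\mathbf{v}^*\mathbf{y}$ has to be edge-disjoint from $\mathbf{w}\mathbf{x}R$, otherwise we are in Case (a). Hence Case (b) holds.

For the rest of the proof, we assume $\mathbf{x} \not\longrightarrow R$. We construct the paths claimed in the lemma recursively. Hence our proof will be based on a recursion hypothesis whose statement involves an integer $I \ge 0$, and which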
says that a subset of the paths claimed in the lemma (depending on $I$) have already been constructed. In order to advance the recursion, the hypothesis also specifies graphs $A_I$ and $B_I$ such that Lemma 5.1 can be applied with $A = A_I$ and $B = B_I$.

The outline of the proof is the following. Since the statement of the hypothesis for $I = 0$ is slightly different than for $I \ge 1$, we state and verify the hypothesis for $I = 0$ separately. This will show that the recursion can be started. Since the general step of the recursion is complex, we explain the first two steps of the recursion ($I = 1$ and $I = 2$) in some detail, before formulating the recursion hypothesis precisely in the general case $I \ge 1$. The recursion will lead to the proof of the lemma by the following steps. We prove that if the hypothesis holds for some value of $I \ge 0$, then either the conclusion of Lemma 5.2 follows with $J = I + 1$, or else the hypothesis also holds for $I + 1$. If, for some $i > 0$, the hypothesis holds for $I = 0, 1, \dots, i$, then its statement will guarantee the existence of vertices $\mathbf{p}_0, \mathbf{p}_1, \dots, \mathbf{p}_i$ with
\[
|\mathbf{p}_0| < |\mathbf{p}_1| < \cdots < |\mathbf{p}_i| < n. \tag{5.8}
\]
Consequently the hypothesis cannot hold for all $I = 0, 1, \dots, n$, and the implications just mentioned provide a proof of Lemma 5.2. We now carry out the details.

(R) Recursion hypothesis for $I = 0$. There exists $\mathbf{p}_0$, $\mathbf{q}_0$ such that
\[
\mathbf{0} \longrightarrow \mathbf{p}_0, \quad \mathbf{p}_0 \longrightarrow R, \tag{5.9}
\]
\[
\mathbf{p}_0 \longrightarrow \mathbf{w} \longrightarrow \mathbf{x}, \quad \mathbf{q}_0 \not\longrightarrow R, \tag{5.10}
\]
where $(\mathbf{p}_0,\mathbf{q}_0)$ is the first edge in the path $\mathbf{p}_0\mathbf{x}$. All paths stated are edge-disjoint. Letting $A_0 = \{\mathbf{0}\mathbf{p}_0, \mathbf{p}_0R\} = \{\text{paths in (5.9)}\}$, $B_0 = \emptyset$, the hypotheses of Lemma 5.1 are satisfied with $\mathbf{p} = \mathbf{p}_0$, $\mathbf{q} = \mathbf{q}_0$, $A = A_0$ and $B = B_0$.

Verification of (R) for $I = 0$. Since $\mathbf{0} \longrightarrow \mathbf{w}$ and $\mathbf{0} \longrightarrow R$, there exists $\mathbf{p}_0$ such that $\mathbf{0} \longrightarrow \mathbf{p}_0$, $\mathbf{p}_0 \longrightarrow \mathbf{w}$ and $\mathbf{p}_0 \longrightarrow R$ disjointly. Fix the paths $\mathbf{0}\mathbf{p}_0$, $\mathbf{p}_0\mathbf{w}$ and $\mathbf{p}_0R$, and let $(\mathbf{p}_0,\mathbf{q}_0)$ be the first step of the path $\mathbf{p}_0\mathbf{x}$. If we select $\mathbf{p}_0$ so that $|\mathbf{p}_0|$ is maximal, then we have $\mathbf{q}_0 \not\longrightarrow R$. We verify the hypotheses of Lemma 5.1 with these choices. First, (i), (ii) and $\mathbf{0} \in A_0\cup B_0$ are immediate. Also, $C(\mathbf{q}_0)\cap(A_0\cup B_0) = C(\mathbf{q}_0)\cap A_0 = \emptyset$, since otherwise $\mathbf{q}_0 \longrightarrow R$. Finally, (iv) is vacuous, since $B_0$ is empty.

Next, to illustrate the main idea of the proof, we explain the first two steps of the recursion. Since we have verified (R) in the case $I = 0$, we can apply Lemma 5.1 with $\mathbf{p} = \mathbf{p}_0$, $\mathbf{q} = \mathbf{q}_0$, $A = A_0$ and $B = B_0$. Lemma 5.1 shows that there exist $\mathbf{p}' \in \mathbf{q}_0\mathbf{x}$ and $\mathbf{r} \in A_0 = \mathbf{0}\mathbf{p}_0\cup\mathbf{p}_0R$ and a vertex $\mathbf{z}$ such that $\mathbf{p}' \longrightarrow \mathbf{z}$ and $\mathbf{r} \longrightarrow \mathbf{z}$. For reasons that will be explained in the third paragraph below, we select $\mathbf{p}'$ with $|\mathbf{p}'|$ maximal such that the conclusions of Lemma 5.1 hold. With this choice of $\mathbf{p}'$, we set $\mathbf{p}_1 = \mathbf{p}'$, $\mathbf{z}_1 = \mathbf{z}$ and $\mathbf{r}_0 = \mathbf{r}$. Note that $|\mathbf{p}_1| > |\mathbf{p}_0|$.

We define the vertices $\mathbf{u}_0$ and $\mathbf{v}_0$ as follows. Note that $\mathbf{r}_0 \in A_0$, which is the union of the paths $\mathbf{0}\mathbf{p}_0$ and $\mathbf{p}_0R$. If $\mathbf{r}_0 \in \mathbf{p}_0R$ then we set $\mathbf{v}_0 = \mathbf{r}_0$ and $\mathbf{u}_0 = \mathbf{p}_0$, and if $\mathbf{r}_0 \in \mathbf{0}\mathbf{p}_0$ then we set $\mathbf{v}_0 = \mathbf{p}_0$, $\mathbf{u}_0 = \mathbf{r}_0$. In either case, we have $|\mathbf{u}_0| \le |\mathbf{p}_0| < |\mathbf{z}_1| < R$, and hence (5.7) holds for $i = 1$.
Fig. 5. Assumptions of the recursion hypothesis for (a) $I = 1$; (b) $I = 2$. The thick solid lines indicate the sets (a) $B_1$ and (b) $B_2$, and the thick dashed lines the sets (a) $A_1$ and (b) $A_2$. The intersection lemma is used to produce paths that join the thick dashed lines to the thin solid lines
The paths constructed so far are depicted in Fig. 5(a). For the moment, the reader should disregard $\mathbf{q}_1$, and the distinction between thin, thick and dashed paths in the figure. We either have $|\mathbf{p}_1| < |\mathbf{x}| = n$, as depicted in Fig. 5(a), or $\mathbf{p}_1 = \mathbf{x}$. We first argue that in the case $\mathbf{p}_1 = \mathbf{x}$, Lemma 5.2 holds with $J = 1$. Indeed, if $\mathbf{p}_1 = \mathbf{x}$, we set $\mathbf{u}_1 = \mathbf{w}$ and $\mathbf{v}_1 = \mathbf{x}$. Then apart from the claim regarding Cases (a) and (b), the vertices and paths required by Lemma 5.2 for $J = 1$ have been constructed. (Note that the conclusion of Lemma 5.1 guarantees that the newly constructed paths are edge-disjoint from the old ones.) It is not difficult to also show that either Case (a) or (b) holds, and we leave the details of this to when we deal with the general recursion step.

Next we explain how to continue the construction if $|\mathbf{p}_1| < |\mathbf{x}| = n$. Let $\mathbf{q}_1$ denote the first vertex on the path $\mathbf{p}_1\mathbf{x}$ following $\mathbf{p}_1$. Let $B_1$ denote the union of the thick solid lines in Fig. 5(a), that is, $B_1 = \mathbf{0}\mathbf{p}_0\cup\mathbf{p}_0R\cup\mathbf{r}_0\mathbf{z}_1 = A_0\cup\mathbf{r}_0\mathbf{z}_1$. Let $A_1$ denote the union of the dashed lines in Fig. 5(a), that is, $A_1 = \mathbf{p}_0\mathbf{p}_1\cup\mathbf{p}_1\mathbf{z}_1$. We want to apply Lemma 5.1 with $A = A_1$, $B = B_1$, etc. It is easy to verify conditions (i)–(iii) of the lemma. The crucial condition here is (iv), which allows us to conclude that $\mathbf{r} \in A_1$, and hence the two new paths produced by Lemma 5.1 will connect the dashed lines to the thin solid lines in Fig. 5(a). The reason condition (iv) is satisfied is that we chose $|\mathbf{p}_1|$ to be maximal. Indeed, a glance at Fig. 5(a) suggests that if we had paths from $\mathbf{q}_1\mathbf{x}$ and $B_1\setminus A_1$ to a vertex $\mathbf{z}$ that are edge-disjoint from $A_1\cup B_1$, then that would contradict the maximality of $|\mathbf{p}_1|$. (Recall the earlier application of Lemma 5.1 with $A = A_0$, $B = B_0$, etc., and the choice of $\mathbf{p}_1$.) We will verify the details of this when we deal with the general case $I \ge 1$.

We can summarize the above discussion by saying that Hypothesis (R) for $I = 0$ should imply that in the case $\mathbf{p}_1 \ne \mathbf{x}$ the following statement holds:

(R) Recursion hypothesis for $I = 1$. Vertices and paths (all edge-disjoint) with the following properties exist:
(i) $\mathbf{p}_1$ and $\mathbf{q}_1$ such that
\[
\mathbf{p}_1 \longrightarrow \mathbf{w} \longrightarrow \mathbf{x}, \quad \mathbf{q}_1 \not\longrightarrow R, \tag{5.11}
\]
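where $(\mathbf{p}_1,\mathbf{q}_1)$ is the first edge of the path $\mathbf{p}_1\mathbf{x}$, and $|\mathbf{p}_1| > |\mathbf{p}_0|$;
(ii) $\mathbf{u}_0$, $\mathbf{v}_0$, $\mathbf{z}_1$, such that
\[
\mathbf{0} \longrightarrow \mathbf{u}_0, \quad \mathbf{u}_0 \longrightarrow \mathbf{z}_1, \quad \mathbf{v}_0 \longrightarrow R; \tag{5.12}
\]
(iii) $\mathbf{u}_0 \longrightarrow \mathbf{p}_1$;
(iv) $\mathbf{v}_0$ lies either on $\mathbf{u}_0\mathbf{p}_1$, in which case $\mathbf{p}_0 = \mathbf{v}_0$, or on $\mathbf{u}_0\mathbf{z}_1$, in which case $\mathbf{p}_0 = \mathbf{u}_0$;
(v) $\mathbf{p}_0 \longrightarrow \mathbf{p}_1 \longrightarrow \mathbf{z}_1$.
Letting $A_1 = \{\mathbf{p}_0\mathbf{p}_1, \mathbf{p}_1\mathbf{z}_1\}$, $B_1 = A_0\cup\{\mathbf{r}_0\mathbf{z}_1\} = \{\text{paths in (5.12)}\}\cup\{\mathbf{u}_0\mathbf{p}_0\}$, the hypotheses of Lemma 5.1 are satisfied with $\mathbf{p} = \mathbf{p}_1$, $\mathbf{q} = \mathbf{q}_1$, $A = A_1$ and $B = B_1$.

The next step of the construction is carried out similarly. An application of Lemma 5.1 gives the paths shown in Fig. 5(b). Again, we choose $\mathbf{p}'$ so that $|\mathbf{p}'|$ is maximal, and set $\mathbf{p}_2 = \mathbf{p}'$, $\mathbf{z}_2 = \mathbf{z}$ and $\mathbf{r}_1 = \mathbf{r}$ for this choice of $\mathbf{p}'$. We define $\mathbf{u}_1$ and $\mathbf{v}_1$ depending on the location of $\mathbf{r}_1$, similarly to the previous step. If $\mathbf{p}_2 = \mathbf{x}$, we can conclude similarly to the previous step that the lemma holds with $J = 2$. If $\mathbf{p}_2 \ne \mathbf{x}$, as in Fig. 5(b), we advance the induction similarly to the previous step. This time, we use both the choice of $\mathbf{p}_1$ and $\mathbf{p}_2$ to conclude the necessary statement about $A_2$ and $B_2$. Now we state the recursion hypothesis in general for $I \ge 1$.

(R) Recursion hypothesis for $I \ge 1$. Vertices and paths (all edge-disjoint) with the following properties exist:
(i) $\mathbf{p}_I$ and $\mathbf{q}_I$ such that
\[
\mathbf{p}_I \longrightarrow \mathbf{w} \longrightarrow \mathbf{x}, \quad \mathbf{q}_I \not\longrightarrow R, \tag{5.13}
\]
where $(\mathbf{p}_I,\mathbf{q}_I)$ is the first edge of the path $\mathbf{p}_I\mathbf{x}$, and $|\mathbf{p}_I| > |\mathbf{p}_{I-1}|$;
(ii) $\mathbf{u}_i$, $0 \le i < I$; $\mathbf{v}_i$, $0 \le i < I$; $\mathbf{z}_i$, $1 \le i \le I$, such that
\[
\text{Lemma 5.2 (iii) holds with } i \text{ restricted to } 1 \le i < I, \tag{5.14}
\]
\[
\text{Lemma 5.2 (iv) holds with } i \text{ restricted to } 1 \le i \le I, \tag{5.15}
\]
\[
\text{Lemma 5.2 (v) holds with } i \text{ restricted to } 1 \le i < I, \tag{5.16}
\]
\[
\mathbf{v}_0 \longrightarrow R; \tag{5.17}
\]
(iii) $\mathbf{u}_{I-1} \longrightarrow \mathbf{p}_I$;
(iv) $\mathbf{v}_{I-1}$ lies either on $\mathbf{u}_{I-1}\mathbf{p}_I$, in which case $\mathbf{p}_{I-1} = \mathbf{v}_{I-1}$, or on $\mathbf{u}_{I-1}\mathbf{z}_I$, in which case $\mathbf{p}_{I-1} = \mathbf{u}_{I-1}$;
(v) $\mathbf{p}_{I-1} \longrightarrow \mathbf{p}_I \longrightarrow \mathbf{z}_I$.
Letting $A_I = \{\mathbf{p}_{I-1}\mathbf{p}_I, \mathbf{p}_I\mathbf{z}_I\}$, $B_I = B_{I-1}\cup A_{I-1}\cup\{\mathbf{r}_{I-1}\mathbf{z}_I\} = \{\text{paths in (5.14)–(5.17)}\}\cup\{\mathbf{u}_{I-1}\mathbf{p}_{I-1}\}$, the hypotheses of Lemma 5.1 are satisfied with $\mathbf{p} = \mathbf{p}_I$, $\mathbf{q} = \mathbf{q}_I$, $A = A_I$ and $B = B_I$.

Figure 5 illustrates those paths of Fig. 4 that have been constructed at the stages $I = 1$ and $I = 2$. Note that $\mathbf{p}_I$ receives either the label $\mathbf{u}_I$ or $\mathbf{v}_I$. Hence $\mathbf{p}_i$ will always equal either $\mathbf{u}_i$ or $\mathbf{v}_i$, depending on the location of $\mathbf{v}_i$ (by part (iv) of the hypothesis). Note also that (5.8) holds if (R) holds for all $I = 0, 1, \dots, i$.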
M. T. Barlow, A. A. Járai, T. Kumagai, G. Slade
Consequence of (R): Definition of p I +1 , u I , v I and z I +1 . We now assume that (R) holds for some I ≥ 0. An application of Lemma 5.1 with the data given in the hypothesis shows the existence of vertices p , r and z with certain properties. We now choose p so that |p | is maximal, and such that the properties claimed in Lemma 5.1 hold. We set p I +1 = p , z I +1 = z and r I = r for this choice. Note that r I ∈ A I , which is a union of two paths in both cases I = 0 and I ≥ 1. In the case I = 0, if r 0 ∈ p0 R then we set v 0 = r 0 and u0 = p 0 , and if r 0 ∈ 0p 0 then we set v 0 = p0 , u0 = r 0 . Similarly, in the case I ≥ 1, we set v I = r I and u I = p I if r I ∈ p I z I , and we set v I = p I , u I = r I if r I ∈ p I −1 p I . In both cases, it is clear that |u I | ≤ |p I | < |z I +1 | < R, and hence (5.7) holds for i = I + 1. It follows immediately from these definitions, and from the disjointness properties ensured by Lemma 5.1, that assumptions (ii)–(v) of (R) now hold with I replaced by I + 1. Verification of Lemma 5.2 if p I +1 = x. We show that if p I +1 = x, then Lemma 5.2 holds with J = I + 1. For this, we define u I +1 = w and v I +1 = x. It is immediate from these definitions, from the disjointness properties ensured by Lemma 5.1, and from the already established properties (ii)–(v) of hypothesis (R) for I + 1 = J , that (i)–(v) of Lemma 5.2 hold. It remains to show that either Case (a) or Case (b) holds. Since 0 −→ y, there exists v ∗ ∈ 0u0 , such that v ∗ −→ y disjointly from 0u0 . If v ∗ y is not disjoint from v 0 R, we are in Case (a), and we can ignore v ∗ . If v ∗ y intersects u0 p0 or u0 z1 , let v 0 be the last such intersection. Note that v ∗ y must be disjoint from all other paths constructed, since those are subsets of C(q 0 ), and q 0 −→ / R. Hence if the intersection v 0 exists, we can replace v 0 by v 0 and we are in Case (a). If the intersection v 0 does not exist, we are in Case (b). This verifies the claims of Lemma 5.2. 
We are left to show that if $p_{I+1} \neq x$, then (R) must hold for I + 1.

Advancing the recursion I ⟹ I + 1 if $p_{I+1} \neq x$. Since $p_{I+1} \in q_I x$, but $p_{I+1} \neq x$, we have $|p_{I+1}| > |p_I|$, and $p_{I+1} \longrightarrow w$, showing (i) of hypothesis (R). We have already seen that (ii)–(v) are guaranteed to hold. We are left to show that the hypotheses of Lemma 5.1 hold with the data given. (i), (ii) and $0 \in A_{I+1} \cup B_{I+1}$ are clear from the definitions. By the definition of $q_{I+1}$, $A_{I+1} \cup B_{I+1}$ is a subgraph of $\tilde C(q_{I+1})$. Assume, for a contradiction, that we have $z^* \in C(q_{I+1}) \cap (A_{I+1} \cup B_{I+1})$. Without loss of generality, assume that $z^*$ is the first visit of an occupied path $q_{I+1} z^*$ to $A_{I+1} \cup B_{I+1}$. In particular, $q_{I+1} z^*$ is edge-disjoint from $A_{I+1} \cup B_{I+1}$. Observe that $A_{I+1} \cup B_{I+1} = A_{I+1} \cup A_I \cup B_I \cup \{r_I z_{I+1}\}$. If we had $z^* \in A_{I+1}$, then the disjoint paths $q_{I+1} z^* z_{I+1}$ and $r_I z_{I+1}$ would satisfy the conclusions of Lemma 5.1 for $p = p_I$, $q = q_I$, etc. This contradicts the choice of $p_{I+1}$ (the maximality of $|p_{I+1}|$), since $|q_{I+1}| > |p_{I+1}|$. If we had $z^* \in r_I z_{I+1}$, we get a similar contradiction due to the paths $q_{I+1} z^*$ and $r_I z^*$. Finally, we can rule out $z^* \in A_I \cup B_I$, since $C(q_{I+1}) \subset C(q_I)$, and the latter is disjoint from $A_I \cup B_I$.

We are left to show that every occupied path from $B_{I+1}$ to $C(q_{I+1})$ has to pass through $A_{I+1}$. Assume, for a contradiction, that there exist $z^* \in C(q_{I+1})$ and $z_* \in B_{I+1}$ such that $z_* \longrightarrow z^*$ disjointly from $A_{I+1}$. By considering the last visit, we may also assume that $z_*$ is the only vertex of $z_* z^*$ in $A_{I+1} \cup B_{I+1}$. We may also assume that $q_{I+1} z^*$ and $z_* z^*$ are edge-disjoint. We already saw $C(q_{I+1}) \cap (A_{I+1} \cup B_{I+1}) = \emptyset$, in particular,
$q_{I+1} z^*$ is edge-disjoint from $A_{I+1} \cup B_{I+1}$. Observe that
\[ B_{I+1} = A_I \cup B_I \cup \{r_I z_{I+1}\} = \bigcup_{i=0}^{I} \big(A_i \cup \{r_i z_{i+1}\}\big). \tag{5.18} \]
If we had $z_* \in r_i z_{i+1}$, then the paths $q_{I+1} z^*$ and $r_i z_* z^*$ would contradict the choice of $p_{i+1}$. Finally, if we had $z_* \in A_i$, then the paths $q_{I+1} z^*$ and $z_* z^*$ would contradict the choice of $p_{i+1}$. This completes the verification of hypothesis (R) for I + 1. This completes the proof of Lemma 5.2.

5.3. A diagrammatic bound. In this section, we use Lemma 5.2 and the BK inequality [8] to bound $P_{p_c}[A_J(n, w, x, y)]$. For this, we need the following preliminaries. The critical survival probability is defined by
\[ \theta_N = P_{p_c}(0 \longrightarrow N). \tag{5.19} \]
The two papers [22,23] show that for d > 4 and $L \ge L_0(d)$, we have $\theta_N \sim cN^{-1}$ as $N \to \infty$, for some $c = c(d, L) = 2 + O(L^{-d})$. Moreover,
\[ \theta_N \le \frac{K}{N}, \qquad N \ge 0, \; L \ge L_0, \tag{5.20} \]
with the constant K = 5, which is of course independent of both d and L (see [22, Eq. (1.11)]). To abbreviate the notation, when $\mathbf{y}_1 = (y_1, m_1)$ and $\mathbf{y}_2 = (y_2, m_2)$ we write $\tau(\mathbf{y}_1, \mathbf{y}_2) = \tau_{m_2 - m_1}(y_2 - y_1)$. We also introduce
\[
\begin{aligned}
U_1(u_0, v_0, u_1, v_1, z_1) &= \tau(v_0, u_1)\, \tau(u_1, v_1)\, \tau(v_1, z_1)\, \tau(u_0, z_1), \\
U_2(u_0, v_0, u_1, v_1, z_1) &= \tau(u_0, u_1)\, \tau(u_1, v_1)\, \tau(v_1, z_1)\, \tau(v_0, z_1), \\
U &= U_1 + U_2.
\end{aligned}
\tag{5.21}
\]
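For $0 \le |u_0| < n$ and $|u_0| \le |v_0| < R$ and $\mathbf{y} = (y, N)$, let
\[
\begin{aligned}
\varphi(u_0, v_0) &= \sum_{y \in \mathbb{Z}^d} \tau(0, u_0)\, \tau(u_0, v_0)\, \tau(v_0, \mathbf{y}), \\
\varphi_R(u_0, v_0) &= \sum_{y \in \mathbb{Z}^d} \; \sum_{\mathbf{v}_* \in \mathbb{Z}^d \times \mathbb{Z}_+} \tau(0, \mathbf{v}_*)\, \tau(\mathbf{v}_*, u_0)\, \tau(u_0, v_0)\, \theta_{R - |v_0|}\, \tau(\mathbf{v}_*, \mathbf{y}),
\end{aligned}
\tag{5.22}
\]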
ψ (0) (u0 , v 0 ) = ϕ(u0 , v 0 ) + ϕ R (u0 , v 0 ). For I ≥ 1, 0 ≤ |u I | < n and |u I | ≤ |v 0 | < R, let ψ (I ) (u I , v I ) =
U (u I −1 , v I −1 , u I , v I , z I )
u I −1 ∈Zd ×Z + z I ∈Zd ×Z + v I −1 ∈Zd ×Z + 0≤|u I −1 |≤|u I | |v I | 6 rather than d > 4. The first lemma gives a bound on ψ (0) .
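Lemma 5.5. Let d > 4, R ≥ 1, 0 < a < 1, 0 < n ≤ aR, $\mathbf{w} = (w, n - 1)$ and $\mathbf{x} = (x, n)$. Then
\[ \limsup_{N \to \infty} \sum_{w, x \in \mathbb{Z}^d} \psi^{(0)}(\mathbf{w}, \mathbf{x}) \le \big(\bar K^3 + \bar K^4 K a/(1 - a)\big). \tag{5.28} \]
Proof. By definition and (5.27),
\[ \sum_{w, x \in \mathbb{Z}^d} \varphi(\mathbf{w}, \mathbf{x}) = \|\tau_{n-1}\|_1\, \|\tau_1\|_1\, \|\tau_{N-n}\|_1 \le \bar K^3. \]
Similarly, writing $\mathbf{v}_* = (v_*, l_*)$,
\[ \sum_{w, x \in \mathbb{Z}^d} \varphi_R(\mathbf{w}, \mathbf{x}) = \sum_{l_* = 0}^{n-1} \|\tau_{l_*}\|_1\, \|\tau_{n - l_* - 1}\|_1\, \|\tau_1\|_1\, \theta_{R-n}\, \|\tau_{N - l_*}\|_1 \le \bar K^4 K\, \frac{n}{R - n}. \]
Since $n/(R - n) \le a/(1 - a)$ because $n \le aR$, this gives (5.28).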
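The last step of the proof uses only that $n \mapsto n/(R - n)$ is increasing on $0 < n < R$, so that $n \le aR$ gives $n/(R-n) \le aR/(R - aR) = a/(1-a)$. The following minimal sketch checks this in exact rational arithmetic; the helper name `ratio_bound_holds` and the tested ranges are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def ratio_bound_holds(R, a):
    """Check n/(R-n) <= a/(1-a) for every integer 0 < n <= a*R (hypothetical helper)."""
    bound = a / (1 - a)
    # n < R is automatic because a < 1, so R - n is always positive here.
    return all(Fraction(n, R - n) <= bound
               for n in range(1, R) if n <= a * R)

# Exhaustive check over a small range of R and a = 1/10, ..., 9/10.
all_ok = all(ratio_bound_holds(R, Fraction(p, 10))
             for R in range(1, 60) for p in range(1, 10))
```

Exact fractions are used so that the boundary case $n = aR$, where equality holds, is not affected by rounding.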
For J ≥ 1, we use a somewhat stronger formulation of the bound, in which $|\mathbf{u}_J|$ and $|\mathbf{v}_J|$ are not restricted to the values n − 1 and n. This will allow us to prove a bound on $\psi^{(J)}$ by induction.

Lemma 5.6. Let d > 6, R ≥ 1, 0 < a < 1, 0 < n ≤ aR. Suppose that $0 \le k_J < n$, $k_J \le l_J < R$, $\mathbf{u}_J = (u_J, k_J)$ and $\mathbf{v}_J = (v_J, l_J)$. Then
\[ \limsup_{N \to \infty} \sum_{u_J, v_J \in \mathbb{Z}^d} \psi^{(J)}(\mathbf{u}_J, \mathbf{v}_J) \le (2 \bar K^3 K^3 \beta)^J \big(\bar K^3 + 3 \bar K^5 K a/(1 - a)\big), \qquad J \ge 1. \tag{5.29} \]
Proof. We start by inserting the definition of $\psi^{(J)}$ into the left-hand side of (5.29). With $\mathbf{z}_J = (z_J, s_J)$, $\mathbf{u}_{J-1} = (u_{J-1}, k_{J-1})$ and $\mathbf{v}_{J-1} = (v_{J-1}, l_{J-1})$, the left-hand side of (5.29) equals
\[ \limsup_{N \to \infty} \sum_{u_J, v_J \in \mathbb{Z}^d} \; \sum_{z_J, u_{J-1}, v_{J-1} \in \mathbb{Z}^d} \; \sum_{k_{J-1} = 0}^{k_J} \; \sum_{s_J = l_J}^{R-1} \; \sum_{l_{J-1} = k_{J-1}}^{s_J} U(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}, \mathbf{u}_J, \mathbf{v}_J, \mathbf{z}_J)\, \psi^{(J-1)}(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}). \tag{5.30} \]
The vertices $\mathbf{u}_J$, $\mathbf{v}_J$ and $\mathbf{z}_J$ only appear in the factor U. We claim that
\[ \sum_{u_J, v_J, z_J \in \mathbb{Z}^d} U(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}, \mathbf{u}_J, \mathbf{v}_J, \mathbf{z}_J) \le 2 \bar K^3 K \beta\, (s_J - k_{J-1} + 1)^{-d/2}. \tag{5.31} \]
To see this, note that $s_J = |\mathbf{z}_J| > |\mathbf{u}_{J-1}| = k_{J-1}$, by (5.7). For the $U_1$ term, we use (4.2) to bound $\tau(\mathbf{u}_{J-1}, \mathbf{z}_J)$ by $K \beta (s_J - k_{J-1} + 1)^{-d/2}$. Then the sums over $z_J$, $v_J$ and $u_J$ contribute the factor $\bar K^3$, by using (5.27) for the other three factors in $U_1$. For the $U_2$ term, we apply (4.2) and $\|\tau_n\|_1 \le \bar K$ to see that
\[ \sup_{x \in \mathbb{Z}^d} \sum_{y \in \mathbb{Z}^d} \tau_n(y)\, \tau_m(x - y) \le K \beta\, (n + m + 1)^{-d/2}, \qquad n + m \ge 1. \tag{5.32} \]
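An application of (5.32) to the convolution of $\tau(\mathbf{u}_{J-1}, \mathbf{u}_J)$, $\tau(\mathbf{u}_J, \mathbf{v}_J)$ and $\tau(\mathbf{v}_J, \mathbf{z}_J)$, together with (5.27), yields an upper bound of the same form. This proves (5.31). Inserting (5.31) into (5.30) and rearranging, we get
\[ (5.30) \le 2 \bar K^3 K \beta \sum_{k_{J-1} = 0}^{k_J} \; \sum_{s_J = l_J}^{R-1} (s_J - k_{J-1} + 1)^{-d/2} \sum_{l_{J-1} = k_{J-1}}^{s_J} \limsup_{N \to \infty} \sum_{u_{J-1}, v_{J-1} \in \mathbb{Z}^d} \psi^{(J-1)}(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}). \tag{5.33} \]
Now we prove (5.29) by induction on J. To start the induction, we verify (5.29) for J = 1. This is most of the work; advancing the induction is easy. When J = 1, the lim sup in (5.33) consists of two terms, corresponding to $\varphi$ and $\varphi_R$. The $\varphi$-term is bounded by
\[ \limsup_{N \to \infty} \sum_{u_0, v_0 \in \mathbb{Z}^d} \sum_{y \in \mathbb{Z}^d} \tau(0, \mathbf{u}_0)\, \tau(\mathbf{u}_0, \mathbf{v}_0)\, \tau(\mathbf{v}_0, \mathbf{y}) = \limsup_{N \to \infty} \|\tau_{k_0}\|_1\, \|\tau_{l_0 - k_0}\|_1\, \|\tau_{N - l_0}\|_1 \le \bar K^3. \tag{5.34} \]
Inserting this into (5.33), and assuming d > 6, we see that the $\varphi$ contribution to (5.33) is bounded by
\[ 2 \bar K^3 K \beta\, \bar K^3 \sum_{k_0 = 0}^{k_1} \sum_{s_1 = l_1}^{R-1} (s_1 - k_0 + 1)^{(2-d)/2} \le (2 \bar K^3 K^2 \beta)(\bar K^3). \tag{5.35} \]
The $\varphi_R$ term is bounded as follows. First, the lim sup is bounded by
\[
\begin{aligned}
\limsup_{N \to \infty} \sum_{u_0, v_0 \in \mathbb{Z}^d} \sum_{l_* = 0}^{k_0} \sum_{y, v_* \in \mathbb{Z}^d} & \tau(0, \mathbf{v}_*)\, \tau(\mathbf{v}_*, \mathbf{u}_0)\, \tau(\mathbf{u}_0, \mathbf{v}_0)\, \theta_{R - l_0}\, \tau(\mathbf{v}_*, \mathbf{y}) \\
& \le \bar K^4 \sum_{l_* = 0}^{k_0} \theta_{R - l_0} \le \bar K^4 (k_0 + 1) \frac{K}{R - l_0} \le \bar K^4 K n\, \frac{1}{R - l_0}.
\end{aligned}
\tag{5.36}
\]
We insert this bound into (5.33) to obtain
\[ (2 \bar K^3 K \beta)(\bar K^4 K)\, n \sum_{k_0 = 0}^{k_1} \sum_{s_1 = l_1}^{R-1} (s_1 - k_0 + 1)^{-d/2} \sum_{l_0 = k_0}^{s_1} \frac{1}{R - l_0}. \tag{5.37} \]
We split the sum over $s_1$ into the cases: (1) $s_1 < n + (R - n)/2$; (2) $s_1 \ge n + (R - n)/2$. In case (1), we have
\[ \frac{1}{R - l_0} \le \frac{1}{R - s_1} \le \frac{2}{R - n}. \]
Inserting this into (5.37), the contribution of case (1) to the expression in (5.37) is bounded by
\[
\begin{aligned}
(2 \bar K^3 K \beta)(2 \bar K^4 K)\, \frac{n}{R - n} \sum_{k_0 = 0}^{k_1} \sum_{s_1 = l_1}^{n + (R - n)/2} (s_1 - k_0 + 1)^{(2-d)/2}
& \le (2 \bar K^3 K^2 \beta)(2 \bar K^4 K)\, \frac{n}{R - n} \\
& \le (2 \bar K^3 K^2 \beta)(2 \bar K^4 K)\, \frac{a}{1 - a}.
\end{aligned}
\tag{5.38}
\]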
In case (2), since $n \ge k_1 \ge k_0$ we have $(s_1 - k_0 + 1)^{-d/2} \le K (R - k_0 + 1)^{-d/2}$, and the sum over $l_0$ in (5.37) is bounded by $\log(R - k_0 + 1) \le \bar K (R - k_0 + 1)^{\delta}$ for some fixed exponent $\delta$ (e.g., $\delta = 1/4$ suffices). Therefore the contribution of case (2) to the expression in (5.37) is bounded by
\[
\begin{aligned}
(2 \bar K^3 K^2 \beta)(\bar K^5 K)\, n \sum_{k_0 = 0}^{k_1} \frac{R - n}{2}\, (R - k_0 + 1)^{(2\delta - d)/2}
& \le (2 \bar K^3 K^3 \beta)(\bar K^5 K)\, n\, \frac{R - n}{2}\, (R - n)^{(2\delta + 2 - d)/2} \\
& \le (2 \bar K^3 K^3 \beta)(\bar K^5 K)\, \frac{n}{R - n}\, (R - n)^{(2\delta + 6 - d)/2} \\
& \le (2 \bar K^3 K^3 \beta)(\bar K^5 K)\, \frac{a}{1 - a}.
\end{aligned}
\tag{5.39}
\]
Putting (5.38) and (5.39) together, we get that (5.37) is bounded by $(2 \bar K^3 K^3 \beta)(3 \bar K^5 K a/(1 - a))$. Together with (5.35) this proves the J = 1 case of (5.29). To advance the induction, we assume now that (5.29) holds for an integer J = M − 1 ≥ 1, and prove that it holds for J = M. Using d > 6, we insert the bound (5.29) into (5.33) to get that the right-hand side of (5.33) is bounded by
\[
\begin{aligned}
(2 \bar K^3 K \beta)(2 \bar K^3 K^3 \beta)^{M-1} \big(\bar K^3 + 3 \bar K^5 K a/(1 - a)\big) & \sum_{k_{M-1} = 0}^{k_M} \sum_{s_M = l_M}^{R-1} (s_M - k_{M-1} + 1)^{(2-d)/2} \\
& \le (2 \bar K^3 K^3 \beta)^M \big(\bar K^3 + 3 \bar K^5 K a/(1 - a)\big).
\end{aligned}
\tag{5.40}
\]
This completes the proof of (5.29).

Proof of (5.26). It follows immediately from Lemmas 5.5–5.6 that (5.26) holds with $c_2 = (\bar K^3 + 3 \bar K^5 K a/(1 - a))$ and $c_3 = 2 \bar K^3 K^3 \beta$. Recall that the constant K = 5 of (5.20) is independent of d and L. Choosing $\beta$ small ensures that $0 < c_3 < \frac{1}{2}$. This proves (5.26), and thus completes the proof of Proposition 3.3.

Acknowledgements. The work of MTB, AAJ and GS was supported in part by NSERC of Canada. The work of TK was supported in part by the Ministry of Education, Culture, Sports, Science and Technology of Japan, Grant-in-Aid 18654018 (Houga). We thank an anonymous referee for suggesting several improvements to the exposition.
References 1. Aizenman, M., Newman, C.M.: Tree graph inequalities and critical behavior in percolation models. J. Statist. Phys. 36, 107–143 (1984) 2. Aldous, D., Fill, J.: Reversible Markov Chains and Random Walks on Graphs. Book in preparation, available at http://www.stat.berkeley.edu/~aldous/RWG/book.html, 2003 3. Alexander, S., Orbach, R.: Density of states on fractals: “fractons”. J. Physique (Paris) Lett. 43, L625–L631 (1982) 4. Angel, O., Goodman, J., den Hollander, F., Slade, G.: Invasion percolation on regular trees. Ann. Probab., to appear
5. Barlow, M.T.: Random walks on supercritical percolation clusters. Ann. Probab. 32, 3024–3084 (2004) 6. Barlow, M.T., Coulhon, T., Kumagai, T.: Characterization of sub-Gaussian heat kernel estimates on strongly recurrent graphs. Comm. Pure Appl. Math. 58, 1642–1677 (2005) 7. Barlow, M.T., Kumagai, T.: Random walk on the incipient infinite cluster on trees. Illinois J. Math. 50, 33–65 (2006) 8. van den Berg, J., Kesten, H.: Inequalities with applications to percolation and reliability. J. Appl. Prob. 22, 556–569 (1985) 9. Berger, N., Biskup, M.: Quenched invariance principle for simple random walk on percolation clusters. Prob. Theory Related Fields 137, 83–120 (2007) 10. Berger, N., Gantert, N., Peres, Y.: The speed of biased random walk on percolation clusters. Probab. Theory Related Fields 126, 221–242 (2003) 11. Bezuidenhout, C., Grimmett, G.: The critical contact process dies out. Ann. Probab. 18, 1462–1482 (1990) 12. Billingsley, P.: Probability and Measure. 3rd edition, New York: John Wiley and Sons, 1995 13. Croydon, D.: Volume growth and heat kernel estimates for the continuum random tree. Probab. Theory Related Fields. 140(1–2), 207–238 (2008) 14. Croydon, D.: Convergence of simple random walks on random discrete trees to Brownian motion on the continuum random tree. Ann. Inst. H. Poincaré Probab. Statist., to appear 15. Doyle, P.G., Snell, J.L.: Random Walks and Electric Networks. Washington DC: Mathematical Association of America, 1984; avilable at http://arxiv.org/abs/math/0001057v1, 2000 16. Fortuin, G., Kastelyn, P., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22, 89–103 (1971) 17. de Gennes, P.G.: La percolation: un concept unificateur. La Recherche 7, 919–927 (1976) 18. Grimmett, G.: Percolation. 2nd ed., Berlin: Springer, 1999 19. Grimmett, G., Hiemer, P.: Directed percolation and random walk. In: V. Sidoravicius, editor, In and Out of Equilibrium, Boston: Birkhäuser, pp. 273–297, 2002 20. 
van der Hofstad, R.: Infinite canonical super-Brownian motion and scaling limits. Commun. Math. Phys. 265, 547–583 (2006) 21. van der Hofstad, R., den Hollander, F., Slade, G.: Construction of the incipient infinite cluster for spreadout oriented percolation above 4 + 1 dimensions. Commun. Math. Phys. 231, 435–461 (2002) 22. van der Hofstad, R., den Hollander, F., Slade, G.: The survival probability for critical spread-out oriented percolation above 4 + 1 dimensions. I. Induction. Probab. Theory Related Fields 138, 363–389 (2007) 23. van der Hofstad, R., den Hollander, F., Slade, G.: The survival probability for critical spread-out oriented percolation above 4 + 1 dimensions. II. Expansion. Ann. Inst. H. Poincaré Probab. Statist. 43, 509–570 (2007) 24. van der Hofstad, R., Járai, A.A.: The incipient infinite cluster for high-dimensional unoriented percolation. J. Statist. Phys. 114, 625–663 (2004) 25. van der Hofstad, R., Slade, G.: A generalised inductive approach to the lace expansion. Probab. Theory Related Fields 122, 389–430 (2002) 26. van der Hofstad, R., Slade, G.: Convergence of critical oriented percolation to super-Brownian motion above 4 + 1 dimensions. Ann. Inst. H. Poincaré Probab. Statist. 39, 415–485 (2003) 27. Hughes, B.D.: Random Walks and Random Environments. Volume 2: Random Environments. Oxford: Oxford University Press, 1996 28. Janssen, H.-K., Täuber, U.C.: The field theory approach to percolation processes. Ann. Phys. 315, 147–192 (2005) 29. Kesten, H.: The incipient infinite cluster in two-dimensional percolation. Probab. Theory Related Fields 73, 369–394 (1986) 30. Kesten, H.: Subdiffusive behavior of random walk on a random cluster. Ann. Inst. H. Poincaré Probab. Statist. 22, 425–487 (1986) 31. Kigami, J.: Analysis on Fractals. Cambridge: Cambridge University Press, 2001 32. Kumagai, T., Misumi, J.: Heat kernel estimates for strongly recurrent random walk on random media, preprint, 2007 33. 
Lyons, R., Peres, Y.: Probability on Trees and Networks. Book in preparation, available at http://mypage.iu.edu/~rdlyons/prbtree/prbtree.html 34. Mathieu, P., Piatnitski, A.: Quenched invariance principles for random walks on percolation clusters. Proc. Roy. Soc. A 463, 2287–2307 (2007) 35. Sidoravicius, V., Sznitman, A.-S.: Quenched invariance principles for walks on clusters of percolation or among random conductances. Probab. Theory Related Fields 129, 219–244 (2004) 36. Slade, G.: The Lace Expansion and its Applications. Lecture Notes in Mathematics Vol. 1879. Ecole d’Eté de Probabilités de Saint–Flour XXXIV–2004, Berlin: Springer, 2006 37. Telcs, A.: Volume and time doubling of graphs and random walks: the strongly recurrent case. Comm. Pure Appl. Math. 54, 975–1018 (2001)
38. Telcs, A.: Local sub-Gaussian estimates on graphs: the strongly recurrent case. Electron. J. Probab. 6, paper 22 (2001) 39. Telcs, A.: A note on rough isometry invariance of resistance. Combin. Probab. Comput. 11, 427–432 (2002) Communicated by M. Aizenman
Commun. Math. Phys. 278, 433–451 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0404-2
Communications in
Mathematical Physics
Exponential Decay Towards Equilibrium for the Inhomogeneous Aizenman-Bak Model
J. A. Carrillo1 , L. Desvillettes2 , K. Fellner3
1 ICREA (Institució Catalana de Recerca i Estudis Avançats) and Departament de Matemàtiques, Universitat Autònoma de Barcelona, E-08193 Bellaterra, Spain. E-mail: [email protected]
2 CMLA, ENS Cachan, CNRS, PRES UniverSud, 61 Av. du Pdt. Wilson, 94235 Cachan Cedex, France. E-mail: [email protected]
3 Faculty of Mathematics, University of Vienna, Nordbergstr. 15, 1090 Wien, Austria. E-mail: [email protected]
Received: 18 August 2006 / Accepted: 25 July 2007 Published online: 8 January 2008 – © Springer-Verlag 2007
Abstract: The Aizenman-Bak model for reacting polymers is considered for spatially inhomogeneous situations in which they diffuse in space with a non-degenerate size-dependent coefficient. Both the break-up and the coalescence of polymers are taken into account with fragmentation and coagulation constant kernels. We demonstrate that the entropy-entropy dissipation method applies directly in this inhomogeneous setting giving not only the necessary basic a priori estimates to start the smoothness and size decay analysis in one dimension, but also the exponential convergence towards global equilibria for constant diffusion coefficient in any spatial dimension or for non-degenerate diffusion in dimension one. We finally conclude by showing that solutions in the one-dimensional case are immediately smooth in time and space while in size distribution solutions are decaying faster than any polynomial. To our knowledge, this is the first result of explicit equilibration rates for spatially inhomogeneous coagulation-fragmentation models.

1. Introduction

We analyze the spatially inhomogeneous version of a size-continuous model for reacting polymers or clusters of aggregates:
\[ \partial_t f - a(y)\, \Delta_x f = Q(f, f). \tag{1.1} \]
Here, $f = f(t, x, y)$ is the concentration of polymers/clusters with length/size $y \ge 0$ at time $t \ge 0$ and point $x \in \Omega \subset \mathbb{R}^d$, $d \ge 1$. These polymers/clusters diffuse in the environment $\Omega$. This set is assumed to be a smooth bounded domain with normalized volume, i.e., $|\Omega| = 1$. In the one-dimensional case, we will set $\Omega = (0, 1)$. Equation (1.1) is to be considered with homogeneous Neumann boundary condition
\[ \nabla_x f(t, x, y) \cdot \nu(x) = 0 \quad \text{on } \partial\Omega \tag{1.2} \]
with $\nu$ the outward unit normal to $\Omega$, so that there is no polymer flux through the physical boundary. We assume the diffusion coefficient $a(y)$ to be non-degenerate in the sense that there exist $a_*, a^* \in \mathbb{R}_+$ such that
\[ 0 < a_* \le a(y) \le a^*. \tag{1.3} \]
On the other hand, the reaction term $Q(f, f)$ of (1.1) models chemical degradation (break-up or fragmentation) and polymerization (coalescence or coagulation) of polymers/clusters. More precisely, the full collision operator reads as
\[
\begin{aligned}
Q(f, f) &= Q_c(f, f) + Q_b(f, f) = Q^+(f, f) - Q^-(f, f) \\
&= Q_c^+(f, f) - Q_c^-(f, f) + Q_b^+(f, f) - Q_b^-(f, f)
\end{aligned}
\tag{1.4}
\]
with obvious definitions of the coagulation $Q_c(f, f)$, fragmentation or break-up $Q_b(f, f)$, loss $Q^-(f, f)$ and gain $Q^+(f, f)$ operators, which are determined from the four basic terms in (1.4):

1. Coalescence of clusters of size $y' \le y$ and $y - y'$ results in clusters of size y:
\[ Q_c^+(f, f) := \int_0^y f(t, x, y - y')\, f(t, x, y')\, dy'. \tag{1.5} \]
2. Polymerization of clusters of size y with other clusters of size $y'$ produces a loss in its concentration:
\[ Q_c^-(f, f) := 2 f(t, x, y) \int_0^\infty f(t, x, y')\, dy'. \tag{1.6} \]
3. Break-up of clusters of size $y'$ larger than y contributes to create clusters of size y:
\[ Q_b^+(f, f) := 2 \int_y^\infty f(t, x, y')\, dy'. \tag{1.7} \]
4. Break-up of polymers of size y reduces its concentration:
\[ Q_b^-(f, f) := y\, f(t, x, y). \tag{1.8} \]
This kind of model finds its application not only in polymers and cluster aggregation in aerosols [S16,S17,AB,Al,Dr] but also in cell physiology [PS], population dynamics [Ok] and astrophysics [Sa]. Here, fragmentation and coagulation kernels are all set to constants as in the original Aizenman-Bak model [AB]. This will be of paramount importance in the basic a priori estimates. The conservation of the total number of monomers at time $t \ge 0$, quantified by
\[ \int_\Omega N(t, x)\, dx, \quad \text{where} \quad N(t, x) := \int_0^\infty y\, f(t, x, y)\, dy, \]
is the basic conservation law satisfied by Eq. (1.1), since the reaction term (1.4) satisfies
\[ \int_\Omega \int_0^\infty y\, Q(f, f)\, dy\, dx = 0, \]
and thus, assuming initially a positive total number of monomers, we formally conclude
\[ \int_\Omega \int_0^\infty y\, f(t, x, y)\, dy\, dx = \int_\Omega N(t, x)\, dx = \int_\Omega N_0(x)\, dx := N_\infty > 0. \tag{1.9} \]
Exponential Equilibration Rate for the Inhomogeneous Aizenman-Bak Model
Another macroscopic quantity of interest is the number density of polymers,
\[ M(t, x) := \int_0^\infty f(t, x, y)\, dy, \tag{1.10} \]
that together with the total number of monomers $N(t, x)$ satisfies the reaction-diffusion system
\[ \partial_t N - \Delta_x \int_0^\infty y\, a(y)\, f(t, x, y)\, dy = 0, \tag{1.11} \]
\[ \partial_t M - \Delta_x \int_0^\infty a(y)\, f(t, x, y)\, dy = N - M^2, \tag{1.12} \]
becoming a closed decoupled system in the constant diffusion case ($a(y) := a$):
\[ \partial_t N - a\, \Delta_x N = 0, \tag{1.13} \]
\[ \partial_t M - a\, \Delta_x M = N - M^2. \tag{1.14} \]
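To see what relaxation the closed system predicts, one can drop the diffusion terms, which is legitimate for spatially constant data and is an assumption of this sketch rather than a setting treated in the text. Then N is constant and M solves $\dot M = N - M^2$, whose solution for $M_0 < \sqrt{N}$ is $M(t) = \sqrt{N} \tanh(\sqrt{N}\, t + \operatorname{artanh}(M_0/\sqrt{N}))$. The function name `relax_number_density` and the parameter values are illustrative.

```python
import math

def relax_number_density(M0, N, t_end=10.0, steps=2000):
    """Integrate dM/dt = N - M**2 with the classical RK4 scheme.

    Hypothetical helper for the spatially homogeneous reduction of
    (1.13)-(1.14): N is conserved, so only M evolves.
    """
    dt = t_end / steps

    def rhs(m):
        return N - m * m

    M = M0
    for _ in range(steps):
        k1 = rhs(M)
        k2 = rhs(M + 0.5 * dt * k1)
        k3 = rhs(M + 0.5 * dt * k2)
        k4 = rhs(M + dt * k3)
        M += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return M

N_const = 4.0
M_from_below = relax_number_density(0.5, N_const)   # starts below sqrt(N)
M_from_above = relax_number_density(5.0, N_const)   # starts above sqrt(N)
M_at_one = relax_number_density(0.5, N_const, t_end=1.0, steps=1000)
M_exact_at_one = 2.0 * math.tanh(2.0 + math.atanh(0.25))
```

Both runs approach $\sqrt{N} = 2$, and linearizing around that value gives the exponential rate $2\sqrt{N}$ for this homogeneous reduction.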
The definition of the full collision operator has to be understood in the weak sense as
\[ \langle Q(f, f), \varphi \rangle = \int_0^\infty \int_0^\infty \big( f(y'') - f(y)\, f(y') \big) \big( \varphi(y) + \varphi(y') - \varphi(y'') \big)\, dy\, dy' \tag{1.15} \]
for any smooth function $\varphi(y)$, where $y'' = y + y'$ and the dependence on (t, x) of the density function has been dropped for notational convenience. An alternative weak formulation that can be useful in several arguments below is obtained by integrating by parts in the $Q_b^+$ part, giving
\[
\begin{aligned}
\langle Q(f, f), \varphi \rangle = {} & -2 \int_0^\infty \varphi(y)\, f(y)\, dy \int_0^\infty f(y')\, dy' + \int_0^\infty \int_0^\infty f(y)\, f(y')\, \varphi(y'')\, dy\, dy' \\
& + 2 \int_0^\infty f(y)\, \Phi(y)\, dy - \int_0^\infty y\, f(y)\, \varphi(y)\, dy
\end{aligned}
\tag{1.16}
\]
for any smooth function $\varphi$, the function $\Phi$ being the primitive of $\varphi$ ($\partial_y \Phi = \varphi$) such that $\Phi(0) = 0$. Let us consider the (free-energy) entropy functional associated to any positive density f as
\[ H(f)(t, x) = \int_0^\infty (f \ln f - f)\, dy, \tag{1.17} \]
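and the relative entropy $H(f|g) = H(f) - H(g)$ of two states f and g not necessarily with the same $L^1_y$-norm. Then, the entropy formally dissipates as
\[
\begin{aligned}
\frac{d}{dt} \int_\Omega H(f)\, dx = {} & - \int_\Omega \int_0^\infty a(y)\, \frac{|\nabla_x f|^2}{f}\, dy\, dx \\
& - \int_\Omega \int_0^\infty \int_0^\infty (f'' - f f') \ln\!\Big( \frac{f''}{f f'} \Big)\, dy\, dy'\, dx := -D_H(f)
\end{aligned}
\tag{1.18}
\]
with obvious notations, where $f$, $f'$ and $f''$ abbreviate $f(y)$, $f(y')$ and $f(y'')$.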
Global existence and uniqueness of classical solutions has been studied in [Am,AW] for some particular cases, namely, for constant diffusion coefficient or dimension one with additional restrictions for the coagulation and fragmentation kernel not including the AB model. The initial boundary-value problem to (1.1)–(1.2) was then analyzed in [LM02-1], for much more general coagulation and fragmentation kernels including the AB model (1.5)–(1.8), proving the global existence of weak solutions satisfying the entropy dissipation inequality
\[ \int_\Omega H(f(t))\, dx + \int_0^t D_H(f(s))\, ds \le \int_\Omega H(f_0)\, dx \]
for all $t \ge 0$. The equilibrium states for which the entropy dissipation vanishes are better understood after applying a remarkable inequality proven in [AB, Props. 4.2 and 4.3]. A modified version of this inequality (reviewed in Sect. 2) reads:
\[ \int_0^\infty \int_0^\infty (f'' - f f') \ln\!\Big( \frac{f''}{f f'} \Big)\, dy\, dy' \ge M\, H(f | f_{\sqrt{N}, N}) + 2 (M - \sqrt{N})^2. \tag{1.19} \]
Herein, $f_{\sqrt{N}, N}$ denotes a distinguished, exponential-in-size distribution with the very moments $M = \sqrt{N}$ and $N$:
\[ f_{\sqrt{N}, N}(t, x, y) = e^{-y/\sqrt{N}}. \]
These distributions $f_{\sqrt{N}, N}$ appear as analogues to the so-called intermediate or local equilibria in the study of inhomogeneous kinetic equations (e.g. [DV01,CCG,FNS,DV05,FMS,NS]). Finally, the conservation of mass (1.9) identifies (at least formally) the global equilibrium $f_\infty$ with constant moments $M_\infty^2 = N_\infty = \overline{N}$:
\[ f_\infty = e^{-y/\sqrt{N_\infty}}. \tag{1.20} \]
The analogy to intermediate equilibria carries over to the following additivity of relative entropies:
\[ H(f | f_\infty) = H(f | f_{\sqrt{N}, N}) + H(f_{\sqrt{N}, N} | f_\infty). \tag{1.21} \]
It is worth pointing out that even if $f_{\sqrt{N}, N}$ and $f_\infty$ do not have the same $L^1_y$-norm, their global relative entropy
\[ \int_\Omega H(f_{\sqrt{N}, N} | f_\infty)\, dx = 2 \Big( \sqrt{\int_\Omega N\, dx} - \int_\Omega \sqrt{N}\, dx \Big) \ge 0 \]
is a nonnegative quantity, as easily checked via Jensen’s inequality. In [LM02-1], it is proved that $f_\infty$ attracts all global weak solutions in $L^1(\Omega \times (0, \infty))$ of (1.1)–(1.2), but no time decay rate is obtained. This result is the analogue to convergence results along subsequences for the classical Boltzmann equation in [De]. Other existence and uniqueness results for inhomogeneous coagulation-fragmentation models were given in [CD] and the references therein. Finally, let us mention that the conservation law (1.9) is known not to hold for certain coagulation-fragmentation kernels, a phenomenon known as gelation [ELMP], and the convergence or not towards typical self-similar profiles for the pure coagulation models is a related issue; we refer
to [Le,LM05,MP]. We refer finally to [LM02-1,LM03,LM04] for an extensive list of related literature.

Let us now discuss some works on the study of the long time asymptotics for related models. Qualitative results concerning a discrete version of a coagulation-fragmentation system, the Becker-Döring system, have been obtained in [CP,LW,LM02-2] and the references therein. We emphasize that global explicit decay estimates towards equilibrium were obtained for the Becker-Döring system without diffusion in [JN] by entropy-entropy dissipation methods. Other techniques have recently been developed for inhomogeneous kinetic equations. We refer to [MN] for a spectral approach and to [V06] for a general description of the concept of hypocoercivity. Note that the presence of diffusion instead of advection makes it possible, in our present context, not to use the concept of hypocoercivity.

In this work we prove exponential decay towards equilibrium with explicit rates and constants. Our key result, Lemma 2 in Sect. 2, establishes a functional inequality between entropy and entropy dissipation, provided that lower and upper bounds on the moment M and (1.3) hold. We are able to apply this functional inequality to solutions of (1.1)–(1.2) in the next two situations. In the special case of size-independent diffusion coefficients $a(y) = a$, we show, as the first application of Lemma 2, the exponential decay towards equilibrium in all space dimensions $d \ge 1$ by exploiting the closed system (1.13)–(1.14) for N and M in Sect. 2. In the case of general diffusion coefficients $a(y)$ satisfying (1.3) we prove in Sect. 3 a priori estimates in the one-dimensional case d = 1, which entail an entropy-entropy dissipation estimate with a constant sufficient to conclude exponential decay via a suitable Gronwall argument (see Sect. 4). These two cases are summarized in the following theorem:

Theorem 1. Let $\Omega$ be a smooth bounded connected open set of $\mathbb{R}^d$, $d \ge 1$, and assume a constant diffusion coefficient $a(y) = a > 0$, or let $\Omega$ be the interval (0, 1) and consider a diffusion coefficient satisfying (1.3). Let us also assume that $f_0 \neq 0$ is a nonnegative initial datum such that $(1 + y + \ln f_0) f_0 \in L^1((0, 1) \times (0, \infty))$. In the case $a(y) = a > 0$ assume further that initial moments $M_0(x)$ and $N_0(x)$ are $L^\infty(\Omega)$-functions. Then, the global weak solutions $f(t, x, y)$ of (1.1)–(1.2) decay exponentially to the global equilibrium state (1.20) with explicitly computable constants $C_1$, $C_2$ and rate $\alpha$, both in global relative entropy:
\[ \int_\Omega H(f(t) | f_\infty)\, dx \le C_1 \int_\Omega H(f_0 | f_\infty)\, dx \; e^{-\alpha t}, \tag{1.22} \]
and in the $L^1_{x,y}$ sense:
\[ \| f(t, \cdot, \cdot) - f_\infty \|_{L^1_{x,y}} \le C_2 \int_\Omega H(f_0 | f_\infty)\, dx \; e^{-\frac{\alpha}{2} t} \tag{1.23} \]
for all $t \ge 0$, where $f_\infty$ is defined by (1.20) and $N_\infty > 0$ is determined by the conservation of mass (1.9).

In the one-dimensional case, it is further possible to interpolate the exponential decay in a “weak” norm like $L^1$ with polynomially growing bounds in “strong” norms like (weighted) $L^1_y(H^1_x)$ in order to get an exponential decay in a “medium” norm like $L^1_y(L^\infty_x)$. Thus, the decay toward equilibrium can be extended to these stronger norms. The following proposition is proved at the end of Sect. 4:
Proposition 1. Under the assumptions of Theorem 1 for the case d = 1, for all $t_* > 0$ and $q \ge 0$, there are explicitly computable constants $C_3$, $\alpha > 0$ such that whenever $t \ge t_*$,
\[ \int_0^\infty (1 + y)^q\, \| f(t, \cdot, y) - f_\infty(y) \|_{L^\infty_x}\, dy \le C_3\, e^{-\alpha t}. \tag{1.24} \]
A bootstrap argument in the spirit of the proof of Proposition 1 allows one to replace the $L^\infty_x$ norm by any Sobolev norm in (1.24).

2. Entropy-Entropy Dissipation Estimate

Please note that in this section we will systematically use the shortcuts:
\[ \overline{M} = \int_\Omega \int_0^\infty f(x, y)\, dy\, dx, \qquad \overline{N} = \int_\Omega \int_0^\infty y\, f(x, y)\, dy\, dx. \]
We start by reminding the reader of the following functional inequality:

Lemma 1 ([AB, Prop. 4.3]). Let $g := g(y)$ be a function of $L^1_+((0, \infty))$ with finite entropy $g \ln g \in L^1((0, \infty))$, then
\[ \int_0^\infty \int_0^\infty g(y)\, g(y') \ln g(y + y')\, dy\, dy' \le \int_0^\infty g(y)\, dy \int_0^\infty g(y') \ln g(y')\, dy' - \Big( \int_0^\infty g(y)\, dy \Big)^2. \tag{2.1} \]
This inequality allows one to show the dissipation inequality (1.19). Following the original paper [AB] or the survey [LM04], one finds that
\[
\begin{aligned}
\int_0^\infty \int_0^\infty (f'' - f f') \ln\!\Big( \frac{f''}{f f'} \Big)\, dy\, dy' \ge {} & M\, H(f | f_{\sqrt{N}, N}) + (M - \sqrt{N})^2 \\
& + M^2 \Big( \frac{N}{M^2} \ln \frac{N}{M^2} + 1 - \frac{N}{M^2} \Big).
\end{aligned}
\tag{2.2}
\]
In fact, after expanding the left-hand side of (2.2), one applies Lemma 1 to the term
\[ - \int_0^\infty \int_0^\infty f f' \ln f''\, dy\, dy', \]
while for the term
\[ \int_0^\infty \int_0^\infty f f'\, \frac{f''}{f f'} \ln\!\Big( \frac{f''}{f f'} \Big)\, dy\, dy' \]
one uses Jensen’s inequality for the convex function $x \ln x$ and further that
\[ \int_0^\infty \int_0^\infty f''\, dy\, dy' = N. \]
Then, after directly calculating the remaining terms one obtains (2.2), as in [LM04], and moreover the inequality (1.19) when applying the elementary inequality $x \ln(x) + 1 - x \ge (\sqrt{x} - 1)^2$ for $x \ge 0$ to the last term on the right-hand side of (2.2).
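The elementary inequality used in the last step follows from the substitution $s = \sqrt{x}$, under which $x \ln x + 1 - x - (\sqrt{x} - 1)^2 = 2s\,(s \ln s - s + 1) \ge 0$. As a quick sanity check, the sketch below evaluates the gap on a grid; the function name `gap`, the grid, and the convention $0 \ln 0 = 0$ are choices made for this illustration.

```python
import math

def gap(x):
    """x ln x + 1 - x - (sqrt(x) - 1)**2, with the convention 0 ln 0 = 0."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0
    return xlogx + 1.0 - x - (math.sqrt(x) - 1.0) ** 2

# Evaluate the gap on a uniform grid of [0, 50]; it should never be negative.
grid = [i / 100.0 for i in range(0, 5001)]
worst = min(gap(x) for x in grid)
```

Applied with $x = N/M^2$ and multiplied by $M^2$, the inequality bounds the last term of (2.2) from below by $(\sqrt{N} - M)^2$, which is where the factor 2 in (1.19) comes from.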
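For the subsequent large-time analysis, we will rather study the relative entropy with respect to the global equilibrium, which dissipates according to (1.18) and (1.19) as
\[
\begin{aligned}
\frac{d}{dt} \int_\Omega H(f | f_\infty)\, dx \le {} & - \int_\Omega \int_0^\infty a(y)\, \frac{|\nabla_x f|^2}{f}\, dy\, dx \\
& - \int_\Omega \Big( M\, H(f | f_{\sqrt{N}, N}) + 2 (M - \sqrt{N})^2 \Big)\, dx := -D(f).
\end{aligned}
\tag{2.3}
\]
We introduce a lemma that allows us to estimate the entropy of f by means of its entropy dissipation. This is a functional estimate, that is, the function f in this lemma does not depend on t and need not be related to the solution of our equation.

Lemma 2. Assume (1.3). Let $f := f(x, y) \ge 0$ be a measurable function with moments satisfying $0 < M_* \le M(x) \le \|M\|_{L^\infty_x}$ and $0 < N_\infty = \overline{N}$. Then, the following entropy-entropy dissipation estimate holds:
\[ D(f) \ge \frac{C(M_*, N_\infty, a_*, P(\Omega))}{\|M\|_{L^\infty_x}} \int_\Omega H(f | f_\infty)\, dx, \tag{2.4} \]
with a constant $C(M_*, N_\infty, a_*, P(\Omega))$ depending only on $M_*$, $N_\infty$, $a_*$ and the Poincaré constant $P(\Omega)$.

Proof. Step 1. We start with the right-hand side of (2.4) by using the additivity (1.21) and calculating
\[ \int_\Omega H(f | f_\infty)\, dx = \int_\Omega H(f | f_{\sqrt{N}, N})\, dx + 2 \Big( \sqrt{\overline{N}} - \overline{\sqrt{N}} \Big). \tag{2.5} \]
Step 2. The second term of (2.5) is bounded as:
\[ \sqrt{\overline{N}} - \overline{\sqrt{N}} \le \frac{2}{\sqrt{N_\infty}} \Big( \| M - \sqrt{N} \|_{L^2_x}^2 + \| M - \overline{M} \|_{L^2_x}^2 \Big). \tag{2.6} \]
Indeed, since $\sqrt{N} - \overline{\sqrt{N}}$ is orthogonal to $\overline{\sqrt{N}} - \overline{M}$ in $L^2_x$, we have
\[ \sqrt{\overline{N}} - \overline{\sqrt{N}} = \frac{\| \sqrt{N} - \overline{\sqrt{N}} \|_{L^2_x}^2}{\sqrt{\overline{N}} + \overline{\sqrt{N}}} \le \frac{1}{\sqrt{N_\infty}} \| \sqrt{N} - \overline{\sqrt{N}} \|_{L^2_x}^2 \le \frac{1}{\sqrt{N_\infty}} \| \sqrt{N} - \overline{M} \|_{L^2_x}^2, \]
and further, we obtain (2.6) by expanding $\| \sqrt{N} - \overline{M} \|_{L^2_x}^2$ and Young’s inequality
\[ \frac{1}{2} \| \sqrt{N} - \overline{M} \|_{L^2_x}^2 - \| M - \overline{M} \|_{L^2_x}^2 \le \| \sqrt{N} - \overline{M} - M + \overline{M} \|_{L^2_x}^2. \]
Thus, we obtain (using $0 < M_* < M$)
\[
\begin{aligned}
\int_\Omega H(f | f_\infty)\, dx \le {} & \max\{ M_*^{-1}, 2 N_\infty^{-1/2} \} \Big( \int_\Omega M\, H(f | f_{\sqrt{N}, N})\, dx + 2 \| M - \sqrt{N} \|_{L^2_x}^2 \Big) + \frac{4}{\sqrt{N_\infty}} \| M - \overline{M} \|_{L^2_x}^2 \\
\le {} & \max\{ M_*^{-1}, 2 N_\infty^{-1/2} \} \int_\Omega \int_0^\infty \int_0^\infty (f'' - f f') \ln\!\Big( \frac{f''}{f f'} \Big)\, dy\, dy'\, dx + \frac{4}{\sqrt{N_\infty}} \| M - \overline{M} \|_{L^2_x}^2,
\end{aligned}
\tag{2.7}
\]
by the inequality (1.19).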
440
J. A. Carrillo, L. Desvillettes, K. Fellner
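Step 3. Next, the variance of $M$, i.e. the last term on the right-hand side of (2.7), is controlled by the first, "Fisher"-type term of (2.3). Denoting by $P(\Omega)$ the constant of Poincaré's inequality, we estimate, using the Cauchy-Schwarz inequality,
\[
\int_\Omega\int_0^\infty a(y)\,\frac{|\nabla_x f|^2}{f}\,dy\,dx \ge \frac{a_*}{\|M\|_{L^\infty_x}} \int_\Omega \Bigl(\int_0^\infty \frac{|\nabla_x f|^2}{f}\,dy\Bigr)\Bigl(\int_0^\infty f\,dy\Bigr) dx
\ge \frac{a_*}{\|M\|_{L^\infty_x}} \int_\Omega \Bigl|\int_0^\infty \nabla_x f\,dy\Bigr|^2 dx
\]
\[
= \frac{a_*}{\|M\|_{L^\infty_x}} \int_\Omega |\nabla_x M|^2\,dx \ge \frac{a_*}{P(\Omega)\,\|M(t,\cdot)\|_{L^\infty_x}}\,\|M - \overline{M}\|^2_{L^2_x}. \tag{2.8}
\]
We remark that the seemingly more natural estimate
\[
\int_\Omega\int_0^\infty a(y)\,\frac{|\nabla_x f|^2}{f}\,dy\,dx \ge \frac{4}{P(\Omega)}\,\|\sqrt{M} - \overline{\sqrt{M}}\|^2_{L^2_x}
\]
provides a bound which does not seem sufficient to conclude as in Step 2.

Step 4. Finally, combining (2.7) and (2.8), we have
\[
\int_\Omega H(f|f_\infty)\,dx \le \max\Bigl\{ M_*^{-1},\, 2N_\infty^{-1/2},\, \frac{4 P(\Omega)\,\|M(t,\cdot)\|_{L^\infty_x}}{a_* \sqrt{N_\infty}} \Bigr\}\, D(f),
\]
which yields the proof of Lemma 2.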
Now, let us directly apply this entropy-entropy dissipation estimate to prove the constant diffusion part of Theorem 1. In the constant diffusion case, the equations for the first two moments $M(t,x)$ and $N(t,x)$ become the closed system (1.13)-(1.14). The existence and uniqueness of global, classical solutions with global $L^\infty$ bounds from below and above are standard thanks to the maximum principle applied to the equations for $N$ and further for $M$. We refer, for instance, to [Ro,Ki] for details, to conclude with:

Lemma 3. Let $\Omega$ be a smooth bounded connected open set of $\mathbb{R}^d$, $d \ge 1$, and let us assume that the initial data $M_0(x)$ and $N_0(x) \ne 0$ are nonnegative $L^\infty(\Omega)$-functions. Then, there exist increasing functions $t \mapsto M_*(t), N_*(t)$ and decreasing functions $t \mapsto M^*(t), N^*(t)$ such that the unique global bounded solutions of the system (1.13)-(1.14) satisfy
\[
0 < M_*(t) \le M(t,x) \le M^*(t) < \infty, \tag{2.9}
\]
\[
0 < N_*(t) \le N(t,x) \le N^*(t) < \infty, \tag{2.10}
\]
for all $t > 0$.

Proof of Theorem 1. Case $a(y) = a$ constant, $d \ge 1$. Let us fix $t_* > 0$. From (2.9)-(2.10), we have $0 < M_* \le M(t,x) \le M^* < \infty$ and $0 < N_* \le N(t,x) \le N^* < \infty$ for all $t \ge t_*$, and thus
\[
D(f) \ge \frac{C(M_*, N_\infty, a_*, P(\Omega))}{M^*} \int_\Omega H(f|f_\infty)\,dx
\]
for all $t \ge t_*$ due to (2.4) with the constant $C(M_*, N_\infty, a_*, P(\Omega))$ given in Lemma 2. As a direct consequence, we get
\[
\frac{d}{dt}\int_\Omega H(f|f_\infty)\,dx \le -\frac{C(M_*, N_\infty, a_*, P(\Omega))}{M^*} \int_\Omega H(f|f_\infty)\,dx
\]
for all $t \ge t_*$. Gronwall's lemma implies estimate (1.22).
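The Gronwall step can be spelled out as follows. This is only a sketch added for the reader: the names $E$ and $\kappa$ are introduced here for illustration, and the precise form of (1.22) is the one stated with Theorem 1, not necessarily the display below.

```latex
\[
E(t) := \int_\Omega H(f(t)|f_\infty)\,dx, \qquad
\kappa := \frac{C(M_*, N_\infty, a_*, P(\Omega))}{M^*},
\]
\[
\frac{d}{dt}\Bigl(e^{\kappa t}E(t)\Bigr) = e^{\kappa t}\bigl(E'(t) + \kappa E(t)\bigr) \le 0
\quad\Longrightarrow\quad
E(t) \le E(t_*)\,e^{-\kappa (t - t_*)} \quad \text{for all } t \ge t_* .
\]
```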
Next, convergence in $L^1$ as stated in Theorem 1 follows from the functional inequality of Csiszár-Kullback type [Cs,Ku]:
\[
\|f(t,\cdot,\cdot) - f_\infty\|^2_{L^1_{x,y}} \le 2\Bigl(\int_\Omega M(t,x)\,dx + \sqrt{N_\infty}\Bigr) \int_\Omega H(f(t)|f_\infty)\,dx. \tag{2.11}
\]
The proof is standard, see [CCD] for related inequalities, and it is shown via a Taylor expansion of the function $\varphi(f) = f \ln(f) - f$ up to second order around $f_\infty$. Indeed, for a function $\zeta(x,y) \in (\inf\{f(x,y), f_\infty(y)\}, \sup\{f(x,y), f_\infty(y)\})$, we get
\[
\int_\Omega H(f|f_\infty)\,dx = \int_\Omega\int_0^\infty \Bigl[ -\frac{y}{\sqrt{N_\infty}}\,(f - f_\infty) + \frac{1}{2\zeta}\,(f - f_\infty)^2 \Bigr] dy\,dx,
\]
and the first term vanishes due to the conservation law (1.9). For the second term, we apply Hölder's inequality
\[
\|f - f_\infty\|^2_{L^1_{x,y}} \le \|\zeta\|_{L^1_{x,y}} \int_\Omega\int_0^\infty \frac{1}{\zeta}\,(f - f_\infty)^2\,dy\,dx \quad\text{with}\quad \|\zeta\|_{L^1_{x,y}} \le \int_\Omega M\,dx + \sqrt{N_\infty}.
\]
Noticing that $t \in [0, t_*] \mapsto \|f(t,\cdot,\cdot)\|_{L^1_{x,y}}$ is bounded, we finally get (1.23), which concludes the proof of Theorem 1.

Remark 1. As a consequence of the previous result, we also showed that the unique global bounded solutions of the system (1.13)-(1.14) satisfy $M(t,x) \to M_\infty = \sqrt{N_\infty}$ and $N(t,x) \to N_\infty$ as $t \to \infty$ in $L^1(\Omega)$ exponentially fast with explicit constants. In fact, we first remark that $H(f|f_\infty) = H(f|f_{M,N}) + H(f_{M,N}|f_\infty)$ with
\[
E_S := \int_\Omega H(f_{M,N}|f_\infty)\,dx = \int_\Omega \Bigl[ 2\sqrt{N}\,(\xi \ln \xi - \xi + 1) + \frac{1}{\sqrt{N_\infty}}\bigl(\sqrt{N} - \sqrt{N_\infty}\bigr)^2 \Bigr] dx,
\]
where $\xi = M/\sqrt{N}$ and
\[
f_{M,N}(t,x,y) = \frac{M^2}{N}\,e^{-\frac{M}{N} y}.
\]
It is obvious that $E_S$ is nonnegative since the minimum of $\xi \ln \xi - \xi + 1$ is zero, and it can be written as
\[
\int_\Omega H(f_{M,N}|f_\infty)\,dx = \int_\Omega \Bigl[ M \ln\frac{M^2}{N} - 2(M - \sqrt{N}) + 2\bigl(\sqrt{N_\infty} - \sqrt{N}\bigr) \Bigr] dx,
\]
by using the conservation of mass (1.9). Since $H(f|f_{M,N}) \ge 0$, (1.22) implies the exponential convergence to zero of $E_S$ by the above additivity property. Finally, a simple Taylor expansion shows that, for all $t \ge t_*$,
\[
\|\sqrt{N(t)} - M(t)\|^2_{L^2_x} + \|\sqrt{N(t)} - \sqrt{N_\infty}\|^2_{L^2_x} \le L \int_\Omega H(f_{M,N}(t)|f_\infty)\,dx,
\]
with
\[
L = \max\Bigl\{ 1,\ \frac{M^*}{\sqrt{N_*}},\ \sqrt{N^*} \Bigr\},
\]
which implies by trivial arguments the exponential convergence in $L^1(\Omega)$ towards equilibrium for $M$ and $N$. In fact, the system (1.13)-(1.14) might have been studied by a direct application of the techniques in [DF05,DF06].
3. A-priori Estimates

In the sequel, we shall discuss the general diffusion coefficient, i.e., size dependent and verifying (1.3), but we restrict ourselves to the one-dimensional case, $d = 1$ (we shall not recall this fact in the various lemmas). We begin the proof of Theorem 1.

Lemma 4. Assume that $f_0 \ne 0$ is a non-negative initial datum such that $(1 + y) f_0 \in L^1((0,1) \times (0,\infty))$. Then, there exists $M^*_0 > 0$ such that solutions of (1.1)–(1.8) satisfy
\[
\sup_{t \ge 0} \int_\Omega\int_0^\infty f(t,x,y)\,dy\,dx = \sup_{t \ge 0} \int_\Omega M(t,x)\,dx \le M^*_0. \tag{3.1}
\]
Proof. We estimate the $L^1(\Omega)$-norm of $M(t,x)$ by integrating equality (1.14), obtaining
\[
\frac{d}{dt}\int_\Omega M(t,x)\,dx = \int_\Omega N(t,x)\,dx - \int_\Omega M(t,x)^2\,dx \le \int_\Omega N_0(x)\,dx - \Bigl(\int_\Omega M(t,x)\,dx\Bigr)^2
\]
by Hölder's inequality and the conservation of mass (1.9). Therefore, for all $t \ge 0$,
\[
\int_\Omega M(t,x)\,dx \le \max\Bigl\{ \int_\Omega M_0(x)\,dx,\ \Bigl(\int_\Omega N_0(x)\,dx\Bigr)^{1/2} \Bigr\} := M^*_0,
\]
showing (3.1).
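The comparison argument behind (3.1) can be illustrated numerically. The sketch below is an assumption-laden illustration added here, not the authors' method: it integrates the extremal scalar ODE $u' = \overline{N} - u^2$ (the equality case of the differential inequality above) with an explicit Euler scheme and records that the trajectory never exceeds $\max\{u(0), \sqrt{\overline{N}}\}$. The function name, the step size and the sample values are hypothetical.

```python
import math

def euler_upper_envelope(u0: float, n_bar: float, t_end: float = 20.0, dt: float = 1e-3):
    """Explicit Euler for the extremal ODE u' = n_bar - u**2 (equality case)."""
    u, traj = u0, [u0]
    for _ in range(int(t_end / dt)):
        u += dt * (n_bar - u * u)
        traj.append(u)
    return traj

n_bar = 1.0                      # hypothetical total mass, the integral of N_0
results = {}
for u0 in (0.2, 3.0):            # hypothetical initial values of the integral of M_0
    traj = euler_upper_envelope(u0, n_bar)
    # Store (largest value along the trajectory, bound predicted by Lemma 4).
    results[u0] = (max(traj), max(u0, math.sqrt(n_bar)))
```

In both runs the trajectory relaxes monotonically towards $\sqrt{\overline{N}}$, from below when $u(0) < \sqrt{\overline{N}}$ and from above otherwise, which is exactly why the maximum of the two quantities serves as the bound $M^*_0$.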
We now turn to a control of the $L^1_y(L^\infty_x)$-norm of $f$:

Lemma 5. Assume that the nonnegative initial datum $f_0 \ne 0$ satisfies $(1 + y + \ln f_0) f_0 \in L^1((0,1) \times (0,\infty))$. Then, the number density of polymers $M \in L^1 + L^\infty(0,\infty; L^\infty(0,1))$. More precisely, there exist $m_\infty > 0$ and an $L^1_+(0,\infty)$-function $m_1(t)$ such that the solution of (1.1)–(1.8) satisfies
\[
\sup_{0<x<1} \int_0^\infty f(t,x,y)\,dy \le m_\infty + m_1(t), \tag{3.2}
\]
[...] and for any $p > 1$, i.e., there exist explicit constants $M^*_p(f_0, m_\infty, m_1, p)$ such that
\[
M_p(f)(t) \le M^*_p, \quad \text{for a.e. } t > t_* > 0. \tag{3.5}
\]
Proof. We proceed in two steps:

Step 1. We first assume that $M_p(f)(t_*) < \infty$ for a certain $p > 1$ and $t_* > 0$. Using the weak formulation (1.16), it is easy to check that
\[
\langle Q(f,f), y^p \rangle = -2 \int_0^\infty y^p f(y)\,dy\ M(t,x) + \int_0^\infty\!\!\int_0^\infty f(y) f(z) (y+z)^p\,dy\,dz - \frac{p-1}{p+1} \int_0^\infty f(y)\,y^{p+1}\,dy.
\]
Taking into account Lemma 5 and $(y+z)^p \le C_p (y^p + z^p)$, we deduce
\[
\langle Q(f,f), y^p \rangle \le 2(C_p - 1) \int_0^\infty y^p f(y)\,dy\ [m_\infty + m_1(t)] - \frac{p-1}{p+1} \int_0^\infty f(y)\,y^{p+1}\,dy
\]
for all $p > 1$. Integrating in space, we find that the evolution of the moment of order $p > 1$ is given by
\[
\frac{d}{dt} M_p(f)(t) \le 2(C_p - 1)\,M_p(f)(t)\,[m_\infty + m_1(t)] - \frac{p-1}{p+1}\,M_{p+1}(f)(t). \tag{3.6}
\]
Trivial interpolation of the $(p+1)$-order moment with the moment of order one implies
\[
M_p(f)(t) \le \frac{1}{\varepsilon^{p-1}} \int_0^1 N_0(x)\,dx + \varepsilon\,M_{p+1}(f)(t)
\]
for all $\varepsilon > 0$, and thus
\[
\frac{d}{dt} M_p(f)(t) \le 2(C_p - 1)\,M_p(f)(t)\,[m_\infty + m_1(t)] - \frac{p-1}{p+1}\,\frac{1}{\varepsilon}\,M_p(f)(t) + D_\varepsilon
\]
for a certain constant $D_\varepsilon$. Choosing $\varepsilon > 0$ such that
\[
2(C_p - 1)\,m_\infty - \frac{p-1}{p+1}\,\frac{1}{\varepsilon} \le -\frac{1}{2},
\]
we obtain
\[
\frac{d}{dt} M_p(f)(t) \le -\frac{1}{2}\,M_p(f)(t) + 2(C_p - 1)\,m_1(t)\,M_p(f)(t) + D_\varepsilon
\]
for a.e. $t > t_*$. According to Duhamel's formula,
\[
M_p(f)(t) \le M_p(f)(t_*) \exp\Bigl( 2(C_p - 1) \int_{t_*}^t m_1(s)\,ds - \frac{t - t_*}{2} \Bigr) + D_\varepsilon \int_{t_*}^t \exp\Bigl( 2(C_p - 1) \int_s^t m_1(\tau)\,d\tau - \frac{t - s}{2} \Bigr) ds, \tag{3.7}
\]
which shows that the moment $M_p(f)(t)$ is bounded by a constant $M^*_p$ for a.e. $t > t_*$ since $m_1(t) \in L^1((0,\infty))$ by Lemma 5. Moreover, it follows from (3.6) that the boundedness of $M_p(t_*)$ immediately implies that
\[
\int_{t_*}^T M_{p+1}(f)(t)\,dt < \infty
\]
for all $T > 0$, and thus the finiteness of $M_{p+1}(f)(t)$ for a.e. $t > t_*$. A simple induction argument then yields the bounds on all higher moments.

Step 2. It remains to show that for given nontrivial initial data $y f_0 \in L^1_{x,y}$ there are a $p > 1$ and a time $t_* > 0$ such that $M_p(t_*) < \infty$. We start with the following observation [MW, Appendix A]: For a nonnegative integrable function $g(y) \ne 0$ on $(0,\infty)$, there exists a concave function $\Phi(y)$, depending on $g$, smoothly increasing from $\Phi(0) > 0$ to $\Phi(\infty) = \infty$ such that
\[
\int_0^\infty \Phi(y)\,g(y)\,dy < \infty.
\]
Moreover, the function $\Phi$ can be constructed to satisfy
\[
\Phi(y) - \Phi(y') \ge C\,\frac{y - y'}{y \ln^2(e + y)} \tag{3.8}
\]
for $0 < y' < y$ with $C$ not depending on $g$. We refer to [MW, Appendix A] for all the details of this "by-now standard" construction.

To show now that $M_p(t_*) < \infty$ for a $p > 1$ and a time $t_* > 0$, we take functions $\Phi(x,y)$ constructed for nontrivial $y f_0(x,y) \in L^1_y(0,\infty)$ for a.e. $x \in (0,1)$ and calculate, similarly to Step 1, the moment
\[
M_{1,\Phi}(f)(t) = \int_0^1\!\!\int_0^\infty y\,\Phi(x,y)\,f(x,y)\,dy\,dx.
\]
For the fragmentation part, we use (3.8) for $0 < y' < y$ and estimate
\[
y\,\Phi(y)\,Q_f(f) = 2 \int_0^y y' \bigl( \Phi(y') - \Phi(y) \bigr)\,dy'\ f(y)
\le -C \ln^{-2}(e + y)\,y^{-1} \int_0^y y' (y - y')\,dy'\ f(y)
\]
\[
= -C \ln^{-2}(e + y)\,\frac{y^2}{6}\,f(y) \le -C_\delta\,y^{2-\delta} f(y),
\]
for all $\delta > 0$ and a positive constant $C_\delta$, where the $(t,x)$-dependence has been dropped for notational convenience. Hence, by estimating the coagulation part similarly to Step 1, making use of the concavity of $\Phi$, we obtain that
\[
\frac{d}{dt} M_{1,\Phi}(f)(t) \le 3\,(m_\infty + m_1(t))\,M_{1,\Phi}(f)(t) - C_\delta\,M_{2-\delta}(f)(t),
\]
and boundedness of the moment $M_{1,\Phi}$ follows by interpolation as well as the finiteness of $M_{2-\delta}(f)(t_*)$ analogously to Step 1.

Next, we show that $M$ and $N$ are bounded below uniformly (with respect to $t$ and $x$) for all $t \ge t_* > 0$.

Proposition 2. Under the assumptions of Theorem 1, let $t_* > 0$ be given. Then, there are strictly positive constants $M_*$ and $N_*$ such that for all $t \ge t_* > 0$,
\[
M(t,x) \ge M_* \quad\text{and}\quad N(t,x) \ge N_*.
\]
Proof. We write the equation satisfied by $f$ in this way:
\[
\partial_t f - a(y)\,\partial_{xx} f = g_1 - y f - \|M(t,\cdot)\|_{L^\infty_x}\,f,
\]
where $g_1$ is nonnegative. Then
\[
\bigl( \partial_t - a(y)\,\partial_{xx} \bigr) \Bigl( f\,e^{t y + \int_0^t \|M(s,\cdot)\|_{L^\infty_x} ds} \Bigr) = g_2,
\]
where $g_2$ is nonnegative. Now, we recall that the solution $h := h(t,x)$ of the heat equation
\[
\partial_t h - a\,\partial_{xx} h = G,
\]
with homogeneous Neumann boundary condition on the interval $(0,1)$, where $a > 0$ is a constant and $G := G(t,x) \in L^1$, is given by the formula
\[
h(t,x) = \frac{1}{2\sqrt{\pi}} \int_{-1}^1 \tilde{h}(0,z) \sum_{k=-\infty}^{\infty} \frac{1}{\sqrt{a t}}\,e^{-\frac{(2k + x - z)^2}{4 a t}}\,dz
+ \frac{1}{2\sqrt{\pi}} \int_0^t\!\!\int_{-1}^1 \tilde{G}(s,z) \sum_{k=-\infty}^{\infty} \frac{1}{\sqrt{a (t-s)}}\,e^{-\frac{(2k + x - z)^2}{4 a (t-s)}}\,dz\,ds,
\]
with $\tilde{h}$ and $\tilde{G}$ denoting the "evenly mirrored around 0 in the $x$ variable" functions $h$ and $G$. Therefore, for all $t_1, t \ge 0$, and $x \in (0,1)$, $y \in \mathbb{R}_+$,
\[
f(t_1 + t, x, y)\,e^{(t_1 + t) y + \int_0^{t_1 + t} \|M(s,\cdot)\|_{L^\infty_x} ds}
\ge \frac{1}{2\sqrt{\pi}} \int_{-1}^1 \tilde{f}(t_1, z, y)\,\frac{1}{\sqrt{a(y)\,t}}\,e^{-\frac{(x - z)^2}{4 a(y) t}}\,e^{t_1 y + \int_0^{t_1} \|M(s,\cdot)\|_{L^\infty_x} ds}\,dz,
\]
so that when $t \in [t_*, 2t_*]$ (and since $|x - z| < 2$):
\[
f(t_1 + t, x, y) \ge \frac{1}{\sqrt{2\pi a^* t_*}} \int_0^1 f(t_1, z, y)\,e^{-\frac{1}{a_* t_*}}\,e^{-2 t_* y - 2 t_* m_\infty - \mu_1}\,dz \ge C \int_0^1 f(t_1, z, y)\,e^{-2 t_* y}\,dz,
\]
where $C > 0$ depends on the constants $a_*$, $a^*$, $m_\infty$, $\mu_1$ and $t_* > 0$. We recall that for all $t \ge t_*$,
\[
\int_0^1\!\!\int_0^\infty y^2 f(t,x,y)\,dy\,dx \le M^*_2,
\]
and thus, for any $A > 0$, we deduce
\[
N(t_1 + t, x) \ge C\,e^{-2 t_* A} \int_0^1\!\!\int_0^A f(t_1, z, y)\,y\,dy\,dz
\ge C\,e^{-2 t_* A} \Bigl( \int_0^1 N(t_1, x)\,dx - M^*_2 / A \Bigr)
= C\,e^{-2 t_* A} \Bigl( \int_0^1 N_0(x)\,dx - M^*_2 / A \Bigr),
\]
due to the conservation law (1.9) and
\[
\int_0^1\!\!\int_A^\infty y\,f(t,x,y)\,dy\,dx \le M^*_2 / A.
\]
Choosing now $A$, we get that $N(t_1 + t, x) \ge N_*$ for some $N_* > 0$ which does not depend on $t_1$. Using Lemma 6,
\[
M(t_1 + t, x) \ge C \int_0^1\!\!\int_0^\infty f(t_1, z, y)\,e^{-2 t_* y}\,dy\,dz
\ge C\,e^{-2 t_* A} \int_0^1\!\!\int_0^A f(t_1, z, y)\,dy\,dz
\ge C\,e^{-2 t_* A} \bigl( M_{0*} - M^*_2 / A^2 \bigr).
\]
Once again choosing $A$, we get that $M(t_1 + t, x) \ge M_*$. Since $M_*$ does not depend on $t_1$, we get Proposition 2.

4. Proofs of Theorem 1 and Proposition 1 if $\Omega = (0,1)$

With Proposition 2 and Lemma 5 providing the moment bounds required by the entropy-entropy dissipation Lemma 2 in the one-dimensional case $\Omega = (0,1)$, we turn now to the

Proof of Theorem 1. Case $\Omega = (0,1)$. According to Lemma 2,
\[
\frac{d}{dt} \int_0^1 H(f|f_\infty)\,dx \le -D(f) \le -\frac{C}{\|M\|_{L^\infty_x}} \int_0^1 H(f|f_\infty)\,dx,
\]
where $\|M\|_{L^\infty_x}(t) \le m_\infty + m_1(t)$ is in $L^1_t + L^\infty_t$ by Lemma 5. Hence, for $t_* > 0$,
\[
\int_0^1 H(f(t)|f_\infty)\,dx \le \int_0^1 H(f(t_*)|f_\infty)\,dx\ \exp\Bigl( -\int_{t_*}^t \frac{C}{\|M\|_{L^\infty_x}}\,ds \Bigr).
\]
Knowing that $m_1(t) \in L^1_t$ with $\int_0^\infty m_1(t)\,dt \le \mu_1$, we consider the sets $A := \{ s > 0 : m_1(s) \ge 1 \}$ and $B_t := \{ s \in [0,t] : m_1(s) < 1 \}$. We readily find that
\[
|A| = \int_A ds \le \int_0^\infty m_1(t)\,dt \le \mu_1 \quad\text{and}\quad |B_t| = t - \int_{A \cap [0,t]} ds \ge t - \mu_1.
\]
Moreover,
\[
-\int_{t_*}^t \frac{C}{\|M\|_{L^\infty_x}}\,ds \le -\int_{B_t} \frac{C}{\|M\|_{L^\infty_x}}\,ds \le -\frac{C}{(1 + m_\infty)}\,(t - \mu_1),
\]
finishing the proof of (1.22). The proof of the $L^1$-decay estimate (1.23) follows the same arguments as in the case of constant diffusion done in Sect. 2 using Csiszár-Kullback type inequalities.

Finally, we show Proposition 1. Let us denote by $C_T$ any constant of the form $C(t)\,(1 + T)^s$, where $s \in \mathbb{R}$ and $C(t)$ is bounded on any interval $[t_*, +\infty)$ with $t_* > 0$.
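Returning to the measure argument for the sets $A$ and $B_t$ used in the proof of Theorem 1 above, the following Python sketch illustrates it on a concrete example. Everything specific in it is an assumption made for illustration only: the profile $m_1(s) = 3e^{-s}$ (so that $\mu_1 = 3$), the value $m_\infty = 2$, the final time and the midpoint grid; for simplicity the integral is taken over $[0,t]$.

```python
import math

def m1(s: float) -> float:
    """Hypothetical integrable profile; its integral over (0, inf) equals mu1 = 3."""
    return 3.0 * math.exp(-s)

mu1 = 3.0        # exact value of the integral of m1 over (0, infinity)
m_inf = 2.0      # hypothetical constant part of the bound on ||M||_{L^infty_x}
t, n = 10.0, 100_000
ds = t / n
grid = [(k + 0.5) * ds for k in range(n)]   # midpoints of a uniform partition of [0, t]

# Lebesgue measure of A ∩ [0, t] and of B_t, approximated on the grid.
meas_A = sum(ds for s in grid if m1(s) >= 1.0)
meas_B = sum(ds for s in grid if m1(s) < 1.0)

# Integral of 1 / (m_inf + m1(s)) over [0, t] by the midpoint rule,
# compared with the lower bound (t - mu1) / (1 + m_inf) from the text.
integral = sum(ds / (m_inf + m1(s)) for s in grid)
lower_bound = (t - mu1) / (1.0 + m_inf)
```

For this profile $A = [0, \ln 3]$, so $|A| = \ln 3 \approx 1.10$, well below $\mu_1 = 3$; the bound $|A| \le \mu_1$ is a Chebyshev-type estimate and is not expected to be sharp.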
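Proof of Proposition 1. We observe using the bounds (3.5) and (3.1) that for all $q \ge 0$,
\[
\int_0^T\!\!\int_0^1\!\!\int_0^\infty (1 + y)^q\,Q_+(f,f)\,dy\,dx\,dt
\le \int_0^T\!\!\int_0^1\!\!\int_0^\infty \frac{(1 + y)^{q+1}}{q + 1}\,f(t,x,y)\,dy\,dx\,dt
+ \int_0^T\!\!\int_0^1\!\!\int_0^\infty\!\!\int_0^\infty (1 + y + z)^q f(t,x,y)\,f(t,x,z)\,dz\,dy\,dx\,dt
\]
\[
\le 2^{q+1} (M^*_0 + M^*_{q+1})\,T + 2^q \int_0^T \|M(t,\cdot)\|_{L^\infty_x}\,(M^*_0 + M^*_q)\,dt \le C_T.
\]
According to the properties of the heat kernel (cf. [DF06] for example), we know that for any $\varepsilon > 0$ and $t_* > 0$,
\[
\|f(\cdot,\cdot,y)\|_{L^{3-\varepsilon}([t_*,T] \times \Omega)} \le C_T \Bigl( \|f(0,\cdot,y)\|_{L^1_x} + \|Q_+(f,f)(\cdot,\cdot,y)\|_{L^1([0,T] \times \Omega)} \Bigr).
\]
As a consequence,
\[
\int_0^\infty (1 + y)^q\,\|f(\cdot,\cdot,y)\|_{L^{3-\varepsilon}([t_*,T] \times \Omega)}\,dy \le C_T.
\]
Then, for all $r \in [2, 3[$,
\[
\int_0^\infty (1 + y)^q\,\|Q_+(f,f)(\cdot,\cdot,y)\|_{L^{r/2}([t_*,T] \times \Omega)}\,dy
\le \int_0^\infty \frac{(1 + y)^{q+1}}{q + 1}\,\|f(\cdot,\cdot,y)\|_{L^r([t_*,T] \times \Omega)}\,dy
+ \int_0^\infty (1 + y)^q\,\Bigl\| \int_0^y f(\cdot,\cdot,y')\,f(\cdot,\cdot,y - y')\,dy' \Bigr\|_{L^{r/2}([t_*,T] \times \Omega)} dy
\]
\[
\le C_T + \int_0^\infty\!\!\int_0^\infty (1 + y + z)^q\,\|f(\cdot,\cdot,y)\,f(\cdot,\cdot,z)\|_{L^{r/2}([t_*,T] \times \Omega)}\,dy\,dz
\le C_T + 2^{q-1} \Bigl( \int_0^\infty (1 + y)^q\,\|f(\cdot,\cdot,y)\|_{L^r([t_*,T] \times \Omega)}\,dy \Bigr)^2 \le C_T.
\]
Using again the properties of the heat kernel (still described in [DF06]), we see that for any $s \in [1, \infty)$ and $t_* > 0$,
\[
\int_0^\infty (1 + y)^q\,\|f(\cdot,\cdot,y)\|_{L^s([t_*,T] \times \Omega)}\,dy \le C_T.
\]
The above argument can now be used with $r = 4$ and shows that
\[
\int_0^\infty (1 + y)^q\,\|Q_+(f,f)(\cdot,\cdot,y)\|_{L^2([t_*,T] \times \Omega)}\,dy \le C_T.
\]
As a consequence, the standard energy estimate on the heat kernel implies that
\[
\int_0^\infty (1 + y)^q\,\|f(T,\cdot,y)\|_{H^1_x}\,dy \le C_T.
\]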
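Then, using a Gagliardo-Nirenberg type interpolation and Theorem 1, we obtain
\[
\int_0^\infty (1 + y)^q\,\|f(T,\cdot,y) - f_\infty(y)\|_{L^\infty_x}\,dy
\le \int_0^\infty (1 + y)^q\,\|f(T,\cdot,y) - f_\infty(y)\|^{3/4}_{H^1_x}\,\|f(T,\cdot,y) - f_\infty(y)\|^{1/4}_{L^1_x}\,dy
\]
\[
\le \Bigl( \int_0^\infty (1 + y)^{4q/3}\,\|f(T,\cdot,y) - f_\infty(y)\|_{H^1_x}\,dy \Bigr)^{3/4} \times \Bigl( \int_0^\infty \|f(T,\cdot,y) - f_\infty(y)\|_{L^1_x}\,dy \Bigr)^{1/4}
\le C_T^{3/4} \exp(-\mathrm{Cst}\,T) \le \mathrm{Cst}\,\exp(-\mathrm{Cst}\,T),
\]
which concludes the proof of Proposition 1.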
Acknowledgements. JAC acknowledges the support from DGI-MEC (Spain) project MTM2005-08024. KF is partially supported by the WWTF (Vienna) project “How do cells move?” and the Wittgenstein Award 2000 of Peter A. Markowich. JAC and KF appreciate the kind hospitality of the ENS de Cachan. The authors want to express their gratitude to the reviewer who helped us to improve this work.
References

[AB] Aizenman, M., Bak, T.: Convergence to equilibrium in a system of reacting polymers. Commun. Math. Phys. 65, 203–230 (1979)
[Al] Aldous, D.J.: Deterministic and stochastic models for coalescence (aggregation, coagulation): a review of the mean-field theory for probabilists. Bernoulli 5, 3–48 (1999)
[Am] Amann, H.: Coagulation-fragmentation processes. Arch. Rat. Mech. Anal. 151, 339–366 (2000)
[AW] Amann, H., Walker, C.: Local and global strong solutions to continuous coagulation-fragmentation equations with diffusion. J. Differ. Eqs. 218, 159–186 (2005)
[CCD] Cáceres, M.J., Carrillo, J.A., Dolbeault, J.: Nonlinear stability in L^p for solutions of the Vlasov-Poisson system for charged particles. SIAM J. Math. Anal. 34, 478–494 (2002)
[CCG] Cáceres, M.J., Carrillo, J.A., Goudon, T.: Equilibration rate for the linear inhomogeneous relaxation-time Boltzmann equation for charged particles. Comm. Partial Differ. Eqs. 28, 969–989 (2003)
[CD] Chae, D., Dubovskii, P.: Existence and uniqueness for spatially inhomogeneous coagulation-condensation equation with unbounded kernels. J. Integ. Eqs. Appl. 9, 219–236 (1997)
[Cs] Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar 2, 299–318 (1967)
[CP] Collet, J.F., Poupaud, F.: Asymptotic behaviour of solutions to the diffusive fragmentation-coagulation system. Phys. D 114, 123–146 (1998)
[De] Desvillettes, L.: Convergence to equilibrium in large time for Boltzmann and B.G.K. equations. Arch. Rat. Mech. Anal. 110, 73–91 (1990)
[DF05] Desvillettes, L., Fellner, K.: Exponential decay toward equilibrium via entropy methods for reaction-diffusion equations. J. Math. Anal. Appl. 319, 157–176 (2006)
[DF06] Desvillettes, L., Fellner, K.: Entropy methods for Reaction-Diffusion Equations: Degenerate Diffusion and Slowly Growing A-priori bounds. To appear in Rev. Matem. Iber.
[DV01] Desvillettes, L., Villani, C.: On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems: the linear Fokker-Planck equation. Comm. Pure Appl. Math. 54, 1–42 (2001)
[DV05] Desvillettes, L., Villani, C.: On the trend to global equilibrium for spatially inhomogeneous kinetic systems: the Boltzmann equation. Invent. Math. 159, 245–316 (2005)
[Dr] Drake, R.L.: "A general mathematical survey of the coagulation equation". Topics in Current Aerosol Research (part 2), International Reviews in Aerosol Physics and Chemistry, Oxford: Pergamon Press, 1972, pp. 203–376
[ELMP] Escobedo, M., Laurençot, Ph., Mischler, S., Perthame, B.: Gelation and mass conservation in coagulation-fragmentation models. J. Differ. Eqs. 195, 143–174 (2003)
[FMS] Fellner, K., Miljanovic, V., Schmeiser, C.: Convergence to equilibrium for the linearised cometary flow equation. Trans. Theory Stat. Phys. 35, 109–136 (2006)
[FNS] Fellner, K., Neumann, L., Schmeiser, C.: Convergence to global equilibrium for spatially inhomogeneous kinetic models of non-micro-reversible processes. Monatsh. Math. 141, 289–299 (2004)
[JN] Jabin, P.E., Niethammer, B.: On the rate of convergence to equilibrium in the Becker-Döring equations. J. Differ. Eqs. 191, 518–543 (2003)
[Ki] Kirane, M.: On stabilization of solutions of the system of parabolic differential equations describing the kinetics of an autocatalytic reversible chemical reaction. Bull. Inst. Mat. Acad. Sin. 18(4), 369–377 (1990)
[Ku] Kullback, S.: A lower bound for discrimination information in terms of variation. IEEE Trans. Information Theory 4, 126–127 (1967)
[LM02-1] Laurençot, Ph., Mischler, S.: The continuous coagulation-fragmentation equation with diffusion. Arch. Rat. Mech. Anal. 162, 45–99 (2002)
[LM02-2] Laurençot, Ph., Mischler, S.: From the discrete to the continuous coagulation-fragmentation equations. Proc. Roy. Soc. Edinburgh Sect. A 132, 1219–1248 (2002)
[LM03] Laurençot, Ph., Mischler, S.: Convergence to equilibrium for the continuous coagulation-fragmentation equation. Bull. Sci. Math. 127, 179–190 (2003)
[LM04] Laurençot, Ph., Mischler, S.: On coalescence equations and related models. Modeling and Computational Methods for Kinetic Equations, P. Degond, L. Pareschi, G. Russo eds., Boston: Birkhäuser, 2004, pp. 321–356
[LM05] Laurençot, Ph., Mischler, S.: Liapunov functionals for Smoluchowski's coagulation equation and convergence to self-similarity. Monatsh. Math. 146, 127–142 (2005)
[LW] Laurençot, Ph., Wrzosek, D.: The Becker-Döring model with diffusion. II. Long-time behaviour. J. Differ. Eqs. 148, 268–291 (1998)
[Le] Leyvraz, F.: Scaling theory and exactly solved models in the kinetics of irreversible aggregation. Phys. Rep. 383, 95–212 (2003)
[MP] Menon, G., Pego, R.L.: Approach to self-similarity in Smoluchowski's coagulation equations. Comm. Pure Appl. Math. 57, 1197–1232 (2004)
[MW] Mischler, S., Wennberg, B.: On the spatially homogeneous Boltzmann equation. Ann. Inst. H. Poincaré Anal. Non Linéaire 16(4), 467–501 (1999)
[MN] Mouhot, C., Neumann, L.: Quantitative study of convergence to equilibrium for linear collisional kinetic models in the torus. Nonlinearity 19, 969–998 (2006)
[NS] Neumann, L., Schmeiser, C.: Convergence to global equilibrium for a kinetic fermion model. SIAM J. Math. Anal. 36, 1652–1663 (2005)
[Ok] Okubo, A.: Dynamical aspects of animal grouping: swarms, schools, flocks and herds. Adv. Biophys. 22, 1–94 (1986)
[PS] Perelson, A.S., Samsel, R.W.: Kinetics of red blood cell aggregation: an example of geometric polymerization. Kinetics of aggregation and gelation, F. Family, D.P. Landau, eds., London: Elsevier, 1984
[Ro] Rothe, F.: Global Solutions of Reaction-Diffusion Systems. Lecture Notes in Mathematics, Berlin: Springer, 1984
[Sa] Safronov, V.S.: Evolution of the ProtoPlanetary cloud and Formation of the earth and the planets. Jerusalem: Israel Program for Scientific Translations Ltd., 1972
[S16] Smoluchowski, M.: Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen. Physik Zeitschr 17, 557–599 (1916)
[S17] Smoluchowski, M.: Versuch einer mathematischen Theorie der Koagulationskinetik kolloider Lösungen. Z Phys Chem 92, 129–168 (1917)
[V06] Villani, C.: Hypocoercive diffusion operators. International Congress of Mathematicians, Vol. III, Zürich, Eur. Math. Soc., 2006, pp. 473–498
Communicated by A. Kupiainen
Commun. Math. Phys. 278, 453–486 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0405-1
Communications in
Mathematical Physics
Two-parameter Quantum Affine Algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$, Drinfel'd Realization and Quantum Affine Lyndon Basis
Naihong Hu1, Marc Rosso2, Honglian Zhang1,3
1 Department of Mathematics, East China Normal University, Min Hang Campus, Dong Chuan Road 500, Shanghai 200241, PR China. E-mail: [email protected]
2 Département Mathématiques et Applications, Ecole Normale Superieure, 45 Rue de Ulm, 75230 Paris Cedex 05, France. E-mail: [email protected]
3 Department of Mathematics, Shanghai University, Shanghai 200444, PR China. E-mail: [email protected]
Received: 22 August 2006 / Accepted: 23 July 2007 Published online: 8 January 2008 – © Springer-Verlag 2008
Abstract: We further define the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$ after the work on the finite cases (see [BW1,BGH1,HS,BH]), which turns out to be a Drinfel'd double. Of importance for the quantum affine cases is that we can work out the compatible two-parameter version of the Drinfel'd realization as a quantum affinization of $U_{r,s}(\mathfrak{sl}_n)$ and establish the Drinfel'd Isomorphism Theorem in the two-parameter setting, via developing a new combinatorial approach (quantum calculation) to the quantum affine Lyndon basis we present (with an explicit valid algorithm based on the use of Drinfel'd generators).

N.H., supported in part by the NNSF (Grants 10431040, 10728102), the PCSIRT, the TRAPOYT and the FUDP from the MOE of China, the SRSTP from the STCSM, the Deutsche Forschungsgemeinschaft (DFG), as well as an ICTP long-term visiting scholarship. H.Z., supported by a Ph.D. Program Scholarship Fund of ECNU 2006.

1. Introduction

1.1. In 2001, Benkart-Witherspoon investigated the structures of two-parameter quantum groups $U_{r,s}(\mathfrak{g})$ for $\mathfrak{g} = \mathfrak{gl}_n$ or $\mathfrak{sl}_n$ in [BW1], originally obtained by Takeuchi [T], and the finite-dimensional weight representation theory in [BW2], and further obtained some new finite-dimensional pointed Hopf algebras in [BW3] when $r s^{-1}$ is a root of unity, which possess new ribbon elements under some conditions (and will yield new invariants of knots and links). These show that two-parameter quantum groups are well worth further study.

1.2. In 2004, Bergeron-Gao-Hu [BGH1] gave the structures of two-parameter quantum groups $U_{r,s}(\mathfrak{g})$ for $\mathfrak{g} = \mathfrak{so}_{2n+1}, \mathfrak{sp}_{2n}, \mathfrak{so}_{2n}$, and developed in [BGH2] the highest weight representation theory when $r s^{-1}$ is not a root of unity. In particular, [BGH1] explored the condition under which Lusztig's symmetries exist for the classical simple Lie algebras $\mathfrak{g}$: they exist as $\mathbb{Q}$-isomorphisms between $U_{r,s}(\mathfrak{g})$ and the associated object $U_{s^{-1},r^{-1}}(\mathfrak{g})$ only when rank$(\mathfrak{g}) = 2$, and in the case when rank$(\mathfrak{g}) > 2$, the sufficient and necessary condition for the existence of Lusztig's symmetries between $U_{r,s}(\mathfrak{g})$ and its associated object forces $U_{r,s}(\mathfrak{g})$ to take the "one-parameter" form $U_{q,q^{-1}}(\mathfrak{g})$, where $r = s^{-1} = q$. In other words, when rank$(\mathfrak{g}) > 2$, the Lusztig symmetries exist only for the one-parameter quantum groups $U_{q,q^{-1}}(\mathfrak{g})$ as $\mathbb{Q}(q)$-automorphisms (rather than merely as $\mathbb{Q}$-isomorphisms). In this case, these symmetries give rise, modulo some identification of group-like elements, to the usual Lusztig symmetries on quantum groups $U_q(\mathfrak{g})$ of Drinfel'd-Jimbo type. The Lusztig symmetry property indicates that there do exist remarkable differences between the two-parameter quantum groups in question and the one-parameter quantum groups of Drinfel'd-Jimbo type. Afterwards, Hu-Shi [HS] and Bai-Hu [BH] studied the two-parameter quantum groups for type $G_2$ and $E$ cases. Through these works, we found that the treatments in the two-parameter cases are frequently more subtle and can follow combinatorial approaches only; for instance, the description of the convex PBW-type basis (cf. [BH]) has to appeal to the use of Lyndon words (see [R2] and references therein) because there is no braid group available in this setting. Thereby, it seems desirable to extend this kind of two-parameter quantum groups in Benkart-Witherspoon's sense from the finite cases to the affine cases. The present paper is aimed at this purpose for the affine type $A^{(1)}_n$ $(n > 1)$ case. To this end, we first give the defining structure of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$ (whereas $U_{r,s}(\widehat{\mathfrak{sl}_2})$ is essentially isomorphic to $U_{q,q^{-1}}(\widehat{\mathfrak{sl}_2})$ if we set $r s^{-1} = q^2$, which is not considered in the paper).

1.3. As is well-known, the importance of the Drinfel'd generators (in the Drinfel'd realization) for quantum affine algebras is just like that of the loop generators (in the loop realization) for affine Kac-Moody algebras (see [Ga,K]).
Early in 1987, Drinfel’d [Dr2] put forward his famous new (conjectural) realization of quantum affine algebras Uq ( g) with g semisimple, because he recognized that the study of finite dimensional representations of Uq ( g) is made easier by the use of this realization on the set of Drinfel’d generators, which is called the Drinfel’d realization of Uq ( g) or the Drinfel’d quantum affinization of Uq (g). Besides this, the Drinfel’d realization also finds its main contribution to the construction of vertex representations for quantum affine algebras Uq ( g) (see [FJ,J1,DI2], etc.), as does the loop realization in the vertex representation theory of affine Kac-Moody algebras (see [K]). In 1993, Khoroshkin-Tolstoy [KT] constructed the Drinfel’d realization for the untwisted types using a Cartan-Weyl generators system with no proof. The first perfect proof of the Drinfel’d isomorphsim only for the untwisted types was given by Beck [B2] in 1994, making use of his extended braid group actions, based on the work of Damiani [Da], Levendorskii-Soibel’man-Stukopin [LSS] for the 2 ). In 1998, Jing [J2] basically adopted the inverse map suggested by Beck case Uq (sl for the untwisted types (see the final remark in [B2, Sect. 4]) and gave a combinatorial proof for the Drinfel’d isomorphism for the untwisted types. 1.4. In order to further explore and enrich the structure and representation theory of the two-parameter quantum affine algebras later on, another main result of this paper is to n ) (n > 2). Its definition depends on the selfgive the Drinfel’d realization of Ur,s (sl compatible defining system (Definition 3.1), which in the two-parameter setting, varies dramatically in comparison with the one-parameter cases (see [Dr2], or [B2, Theorem 4.7]) and is nontrivial to match up here and there the whole relations together. 
Indeed, to invent the two-parameter version of Drinfel’d realization needs some insights, e.g., from the antisymmetric point of view via the Q-algebra antiautomorphism τ , based on
Two-Parameter Quantum Affine Algebra, Drinfel’d Realization
455
some information from the combinatorial description of the convex PBW-type basis via the Lyndon words (see [R2,BH], etc.), and also, the proof of the Drinfel’d isomorphism in our case depends completely on the combinatorial approach with specific techniques to design those defining relations in order to fit the compatibilities in the whole system. If the readers follow the details, they will find how our quantum calculations (somehow a bit tedious) work well and necessarily for exactly verifying the compatibilities of the defining system. The reason is that the method we expanded, to some extent, essentially follows an approach to a kind of description of the quantum “affine” Lyndon basis. Actually, we can construct explicitly all quantum real and imaginary root vectors using this method (see Lemmas 4.7 & 4.8, together with Definition 3.9). 1.5. The paper is organized as follows. We first give the structure of two-parameter n ) (n > 2) as a Hopf algebra in Sect. 2. We prove that the quantum affine algebra Ur,s (sl n ) is characterized as a Drinfel’d double two-parameter quantum affine algebra Ur,s (sl D(B, B ) of Hopf subalgebras B, B with respect to a skew-dual pairing. In Sect. 3, we explicitly describe the two-parameter Drinfel’d quantum affinization of Ur,s (sln ) (n > 2), that is, the Drinfel’d realization in the two-parameter case which is antisymmetric with respect to the Q-algebra antiautomorphism τ . In the case when r s = 1, i.e., r = s −1 = q, our result modulo some identification yields the usual Drinfel’d realizan ) of Drinfel’d-Jimbo type (see [Dr2,B2,DI1,J2], tion of a quantum affine algebra Uq (sl etc.). 
Since Beck’s extended braid group actions approach is invalid for our case, we combine the Lyndon words description ([R2]) with the quantum Lie bracket operation ([J2]) to develop a combinatorial technique in the quantum affine case (we call it quantum calculations), which can be used to construct all the quantum root vectors (including real and imaginary ones). In this way we can formulate and prove the quantum “affine” Lyndon basis for the first time (in a more explicit form than that of [B1]) for $\mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm})$ based on the Drinfel’d realization in Sect. 3, and further prove the Drinfel’d isomorphism using our combinatorial algorithm in Sect. 4. In fact, our proof also provides a concrete process of how to construct the Drinfel’d generators using the Chevalley-Kac-Lusztig generators.

2. Quantum Affine Algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and Drinfel’d Double

2.1. Let $\mathbb K = \mathbb Q(r, s)$ denote a field of rational functions with two parameters $r$, $s$ ($r \neq \pm s$). Assume $\Phi$ is a finite root system of type $A_{n-1}$ with $\Pi$ a base of simple roots. Regard $\Phi$ as a subset of a Euclidean space $E = \mathbb R^n$ with an inner product $(\,,)$. Set $I = \{1, \cdots, n-1\}$, $I_0 = \{0\} \cup I$. Let $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_n$ denote an orthonormal basis of $E$; then we can take $\Pi = \{\alpha_i = \varepsilon_i - \varepsilon_{i+1} \mid i \in I\}$ and $\Phi = \{\varepsilon_i - \varepsilon_j \mid 1 \le i \neq j \le n\}$. Let $\delta$ denote the primitive imaginary root of $\widehat{\mathfrak{sl}_n}$. Take $\alpha_0 = \delta - (\varepsilon_1 - \varepsilon_n)$; then $\widehat\Pi = \{\alpha_i \mid i \in I_0\}$ is a base of simple roots of the affine Lie algebra $\widehat{\mathfrak{sl}_n}$.

Let $A = (a_{ij})$ ($i, j \in I_0$) be the generalized Cartan matrix associated to the affine Lie algebra $\widehat{\mathfrak{sl}_n}$. Let $\mathfrak h$ be a vector space over $\mathbb K$ with a basis $\{h_0, h_1, \cdots, h_{n-1}, d\}$ and define the linear action of $\alpha_i$ ($i \in I_0$) on $\mathfrak h$ by
\[ \alpha_i(h_j) = a_{ji}, \qquad \alpha_i(d) = \delta_{i,0}, \qquad \text{for } j \in I_0. \]
Let $Q = \mathbb Z\alpha_0 + \cdots + \mathbb Z\alpha_{n-1}$ denote the root lattice of $\widehat{\mathfrak{sl}_n}$. The standard nondegenerate symmetric bilinear form $(\cdot\,, \cdot)$ on $\mathfrak h^*$ satisfies
\[ (\alpha_i, \alpha_j) = a_{ij}, \qquad (\delta, \alpha_i) = (\delta, \delta) = 0, \qquad \forall\, i, j \in I_0. \]
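As a small illustration of the conventions just fixed, the following sketch computes the matrix $(\alpha_i, \alpha_j)$ for $i, j \in I_0$ from the realization $\alpha_i = \varepsilon_i - \varepsilon_{i+1}$, $\alpha_0 = \delta - (\varepsilon_1 - \varepsilon_n)$, using $(\delta, \cdot) = 0$. The function name `affine_cartan_matrix` and the list-based representation are illustrative assumptions, not notation from the paper.

```python
def affine_cartan_matrix(n):
    """Gram matrix (alpha_i, alpha_j), i, j in I_0 = {0, ..., n-1}, for n > 2."""
    if n < 3:
        raise ValueError("this sketch assumes n > 2, as in the text")

    def finite_part(i):
        # Coordinates in the orthonormal basis eps_1, ..., eps_n.
        # The delta component is dropped because (delta, alpha_i) = (delta, delta) = 0.
        v = [0] * n
        if i == 0:              # alpha_0 = delta - (eps_1 - eps_n)
            v[0], v[n - 1] = -1, 1
        else:                   # alpha_i = eps_i - eps_{i+1}
            v[i - 1], v[i] = 1, -1
        return v

    roots = [finite_part(i) for i in range(n)]
    return [[sum(x * y for x, y in zip(a, b)) for b in roots] for a in roots]
```

For instance, `affine_cartan_matrix(4)` gives the cyclic pattern with 2 on the diagonal and -1 between neighbouring indices modulo 4; every row sums to zero, which reflects $(\delta, \alpha_i) = 0$ for $\delta = \alpha_0 + \cdots + \alpha_{n-1}$.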
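Definition 2.1. Let $U = U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) be the unital associative algebra over $\mathbb K$ generated by the elements $e_j$, $f_j$, $\omega_j^{\pm1}$, $\omega_j'^{\,\pm1}$ ($j \in I_0$), $\gamma^{\pm\frac12}$, $\gamma'^{\,\pm\frac12}$, $D^{\pm1}$, $D'^{\,\pm1}$ (called the Chevalley-Kac-Lusztig generators), satisfying the following relations:

(A1) $\gamma^{\pm\frac12}$, $\gamma'^{\,\pm\frac12}$ are central with $\gamma = \omega_\delta$, $\gamma' = \omega'_\delta$, $\gamma\gamma' = rs$, such that $\omega_i\,\omega_i^{-1} = \omega_i'\,\omega_i'^{\,-1} = 1 = D D^{-1} = D' D'^{\,-1}$, and
\[ [\,\omega_i^{\pm1}, \omega_j^{\pm1}\,] = [\,\omega_i^{\pm1}, D^{\pm1}\,] = [\,\omega_j'^{\,\pm1}, D^{\pm1}\,] = [\,\omega_i^{\pm1}, D'^{\,\pm1}\,] = 0 = [\,\omega_i^{\pm1}, \omega_j'^{\,\pm1}\,] = [\,\omega_j'^{\,\pm1}, D'^{\,\pm1}\,] = [\,D'^{\,\pm1}, D^{\pm1}\,] = [\,\omega_i'^{\,\pm1}, \omega_j'^{\,\pm1}\,]. \]

(A2) For $i \in I_0$ and $j \in I$,
\[ D\, e_i\, D^{-1} = r^{\delta_{0i}} e_i, \qquad D\, f_i\, D^{-1} = r^{-\delta_{0i}} f_i, \]
\[ \omega_j\, e_i\, \omega_j^{-1} = r^{(\varepsilon_j, \alpha_i)} s^{(\varepsilon_{j+1}, \alpha_i)} e_i, \qquad \omega_j\, f_i\, \omega_j^{-1} = r^{-(\varepsilon_j, \alpha_i)} s^{-(\varepsilon_{j+1}, \alpha_i)} f_i, \]
\[ \omega_0\, e_i\, \omega_0^{-1} = r^{-(\varepsilon_{i+1}, \alpha_0)} s^{(\varepsilon_1, \alpha_i)} e_i, \qquad \omega_0\, f_i\, \omega_0^{-1} = r^{(\varepsilon_{i+1}, \alpha_0)} s^{-(\varepsilon_1, \alpha_i)} f_i. \]

(A3) For $i \in I_0$ and $j \in I$,
\[ D'\, e_i\, D'^{\,-1} = s^{\delta_{0i}} e_i, \qquad D'\, f_i\, D'^{\,-1} = s^{-\delta_{0i}} f_i, \]
\[ \omega_j'\, e_i\, \omega_j'^{\,-1} = s^{(\varepsilon_j, \alpha_i)} r^{(\varepsilon_{j+1}, \alpha_i)} e_i, \qquad \omega_j'\, f_i\, \omega_j'^{\,-1} = s^{-(\varepsilon_j, \alpha_i)} r^{-(\varepsilon_{j+1}, \alpha_i)} f_i, \]
\[ \omega_0'\, e_i\, \omega_0'^{\,-1} = s^{-(\varepsilon_{i+1}, \alpha_0)} r^{(\varepsilon_1, \alpha_i)} e_i, \qquad \omega_0'\, f_i\, \omega_0'^{\,-1} = s^{(\varepsilon_{i+1}, \alpha_0)} r^{-(\varepsilon_1, \alpha_i)} f_i. \]

(A4) For $i, j \in I_0$, we have
\[ [\,e_i, f_j\,] = \frac{\delta_{ij}}{r - s}\,(\omega_i - \omega_i'). \]

(A5) For $i, j \in I_0$, but $(i, j) \notin \{(0, n-1), (n-1, 0)\}$ with $a_{ij} = 0$, we have
\[ [\,e_i, e_j\,] = 0 = [\,f_i, f_j\,]. \]

(A6) For $i \in I_0$, we have the $(r, s)$-Serre relations:
\[ e_i^2 e_{i+1} - (r + s)\, e_i e_{i+1} e_i + (rs)\, e_{i+1} e_i^2 = 0, \]
\[ e_i e_{i+1}^2 - (r + s)\, e_{i+1} e_i e_{i+1} + (rs)\, e_{i+1}^2 e_i = 0, \]
\[ e_{n-1}^2 e_0 - (r + s)\, e_{n-1} e_0 e_{n-1} + (rs)\, e_0 e_{n-1}^2 = 0, \]
\[ e_{n-1} e_0^2 - (r + s)\, e_0 e_{n-1} e_0 + (rs)\, e_0^2 e_{n-1} = 0. \]

(A7) For $i \in I_0$, we have the $(r, s)$-Serre relations:
\[ f_i^2 f_{i+1} - (r^{-1} + s^{-1})\, f_i f_{i+1} f_i + (r^{-1}s^{-1})\, f_{i+1} f_i^2 = 0, \]
\[ f_i f_{i+1}^2 - (r^{-1} + s^{-1})\, f_{i+1} f_i f_{i+1} + (r^{-1}s^{-1})\, f_{i+1}^2 f_i = 0, \]
\[ f_{n-1}^2 f_0 - (r^{-1} + s^{-1})\, f_{n-1} f_0 f_{n-1} + (r^{-1}s^{-1})\, f_0 f_{n-1}^2 = 0, \]
\[ f_{n-1} f_0^2 - (r^{-1} + s^{-1})\, f_0 f_{n-1} f_0 + (r^{-1}s^{-1})\, f_0^2 f_{n-1} = 0. \]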
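$U_{r,s}(\widehat{\mathfrak{sl}_n})$ is a Hopf algebra with the coproduct $\Delta$, the counit $\varepsilon$ and the antipode $S$ defined below: for $i \in I_0$, we have
\[ \Delta(\gamma^{\pm\frac12}) = \gamma^{\pm\frac12} \otimes \gamma^{\pm\frac12}, \qquad \Delta(\gamma'^{\,\pm\frac12}) = \gamma'^{\,\pm\frac12} \otimes \gamma'^{\,\pm\frac12}, \]
\[ \Delta(D^{\pm1}) = D^{\pm1} \otimes D^{\pm1}, \qquad \Delta(D'^{\,\pm1}) = D'^{\,\pm1} \otimes D'^{\,\pm1}, \]
\[ \Delta(\omega_i) = \omega_i \otimes \omega_i, \qquad \Delta(\omega_i') = \omega_i' \otimes \omega_i', \]
\[ \Delta(e_i) = e_i \otimes 1 + \omega_i \otimes e_i, \qquad \Delta(f_i) = f_i \otimes \omega_i' + 1 \otimes f_i, \]
\[ \varepsilon(e_i) = \varepsilon(f_i) = 0, \qquad \varepsilon(\gamma^{\pm\frac12}) = \varepsilon(\gamma'^{\,\pm\frac12}) = \varepsilon(D^{\pm1}) = \varepsilon(D'^{\,\pm1}) = \varepsilon(\omega_i) = \varepsilon(\omega_i') = 1, \]
\[ S(\gamma^{\pm\frac12}) = \gamma^{\mp\frac12}, \qquad S(\gamma'^{\,\pm\frac12}) = \gamma'^{\,\mp\frac12}, \qquad S(D^{\pm1}) = D^{\mp1}, \qquad S(D'^{\,\pm1}) = D'^{\,\mp1}, \]
\[ S(e_i) = -\omega_i^{-1} e_i, \qquad S(f_i) = -f_i\, \omega_i'^{\,-1}, \qquad S(\omega_i) = \omega_i^{-1}, \qquad S(\omega_i') = \omega_i'^{\,-1}. \]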
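2.2. In what follows, we give the skew-pairing and the Drinfel’d double structure.

Definition 2.2. A bilinear form $\langle\,,\rangle : \mathcal B \times \mathcal A \longrightarrow \mathbb K$ is called a skew-dual pairing of two Hopf algebras $\mathcal A$ and $\mathcal B$ (see [KS, 8.2.1]), if it satisfies
\[ \langle b, 1_{\mathcal A}\rangle = \varepsilon_{\mathcal B}(b), \qquad \langle 1_{\mathcal B}, a\rangle = \varepsilon_{\mathcal A}(a), \]
\[ \langle b, a_1 a_2\rangle = \langle \Delta^{\mathrm{op}}_{\mathcal B}(b), a_1 \otimes a_2\rangle, \qquad \langle b_1 b_2, a\rangle = \langle b_1 \otimes b_2, \Delta_{\mathcal A}(a)\rangle, \]
for all $a, a_1, a_2 \in \mathcal A$ and $b, b_1, b_2 \in \mathcal B$, where $\varepsilon_{\mathcal A}$, $\varepsilon_{\mathcal B}$ denote the counits of $\mathcal A$, $\mathcal B$, respectively, and $\Delta_{\mathcal A}$, $\Delta_{\mathcal B}$ are the respective coproducts.

Definition 2.3. For any two Hopf algebras $\mathcal A$ and $\mathcal B$ skew-paired by $\langle\,,\rangle$, there exists a Drinfel’d quantum double $\mathcal D(\mathcal A, \mathcal B)$ which is a Hopf algebra whose underlying coalgebra is $\mathcal A \otimes \mathcal B$ with the tensor product coalgebra structure, whose algebra structure is defined by
\[ (a \otimes b)(a' \otimes b') = \sum \langle S_{\mathcal B}(b_{(1)}), a'_{(1)}\rangle\, \langle b_{(3)}, a'_{(3)}\rangle\; a\, a'_{(2)} \otimes b_{(2)}\, b', \]
for $a, a' \in \mathcal A$ and $b, b' \in \mathcal B$, and whose antipode $S$ is given by
\[ S(a \otimes b) = (1 \otimes S_{\mathcal B}(b))(S_{\mathcal A}(a) \otimes 1). \]

Let $\widehat{\mathcal B}$ (resp. $\widehat{\mathcal B}'$) denote the Hopf (Borel-type) subalgebra of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ generated by $e_j$, $\omega_j^{\pm1}$, $\gamma^{\pm\frac12}$, $D^{\pm1}$ (resp. $f_j$, $\omega_j'^{\,\pm1}$, $\gamma'^{\,\pm\frac12}$, $D'^{\,\pm1}$) with $j \in I_0$.

Proposition 2.4. There exists a unique skew-dual pairing $\langle\,,\rangle : \widehat{\mathcal B}' \times \widehat{\mathcal B} \longrightarrow \mathbb K$ of the Hopf subalgebras $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$ such that:

(1) $\langle f_i, e_j\rangle = \delta_{ij}\,\dfrac{1}{s - r}$, $\quad (i, j \in I_0)$,

(2) $\langle \omega_i', \omega_j\rangle = \begin{cases} r^{(\varepsilon_j, \alpha_i)} s^{(\varepsilon_{j+1}, \alpha_i)}, & (i \in I_0,\ j \in I), \\ r^{-(\varepsilon_{i+1}, \alpha_0)} s^{(\varepsilon_1, \alpha_i)}, & (i \in I_0,\ j = 0), \end{cases}$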
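(3) $\langle \omega_i'^{\,\pm1}, \omega_j^{-1}\rangle = \langle \omega_i'^{\,\pm1}, \omega_j\rangle^{-1} = \langle \omega_i', \omega_j\rangle^{\mp1}$, $\quad (i, j \in I_0)$,

(4) $\langle \gamma'^{\,\pm\frac12}, \gamma\rangle = \langle \gamma', \gamma^{\pm\frac12}\rangle = 1$,

(5) $\langle D', D^{\pm1}\rangle = \langle D'^{\,\pm1}, D\rangle = 1$,

(6) $\langle \gamma'^{\,\pm\frac12}, \omega_i^{\pm1}\rangle = 1 = \langle \omega_i'^{\,\pm1}, \gamma^{\pm\frac12}\rangle$, $\quad (i \in I_0)$,

(7) $\langle D'^{\,\pm1}, \omega_i\rangle = \langle D', \omega_i^{\pm1}\rangle = s^{\mp\delta_{0i}}$, $\quad \langle \omega_i'^{\,\pm1}, D\rangle = \langle \omega_i', D^{\pm1}\rangle = r^{\pm\delta_{0i}}$, $\quad (i \in I_0)$,

(8) $\langle D', \gamma^{\pm\frac12}\rangle = \langle D'^{\,\pm1}, \gamma^{\frac12}\rangle = s^{\mp\frac12}$, $\quad \langle \gamma'^{\,\pm\frac12}, D\rangle = \langle \gamma'^{\,\frac12}, D^{\pm1}\rangle = r^{\pm\frac12}$,

and all other pairs of generators are 0. Moreover, we have $\langle S(b'), S(b)\rangle = \langle b', b\rangle$ for $b' \in \widehat{\mathcal B}'$, $b \in \widehat{\mathcal B}$.

Proof. The uniqueness assertion is clear, as any skew-dual pairing of bialgebras is determined by the values on the generators. We proceed to prove the existence of the pairing. The pairing defined on generators as (1)–(8) may be extended to a bilinear form on $\widehat{\mathcal B}' \times \widehat{\mathcal B}$ in a way such that the defining properties in Definition 2.2 hold. We will verify that the relations in $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$ are preserved, ensuring that the form is well-defined and is a skew-dual pairing of $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$.

First, it is straightforward to check that the bilinear form preserves all the relations among the $\omega_i^{\pm1}$, $\gamma^{\pm\frac12}$, $D^{\pm1}$ in $\widehat{\mathcal B}$ and the $\omega_i'^{\,\pm1}$, $\gamma'^{\,\pm\frac12}$, $D'^{\,\pm1}$ in $\widehat{\mathcal B}'$. Next, we observe that the following identities hold: for $i, j \in I$,
\[ (\varepsilon_j, \alpha_i) = -(\varepsilon_{i+1}, \alpha_j), \qquad (\varepsilon_j, \alpha_0) = -(\varepsilon_1, \alpha_j), \tag{2.1} \]
which ensure the compatibility of the form defined above with the relations of (A2) and (A3) in $\widehat{\mathcal B}$ or $\widehat{\mathcal B}'$ respectively. This fact is easily checked by definition (see (1)–(8)). So we are left to verify that the form preserves the $(r, s)$-Serre relations in $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$.

For $1 \le i < n$, the $(r, s)$-Serre relations in $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$ have been checked in [BW1]. Here we need only verify the relations involving the index $i = 0$ in $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$. It suffices to consider the following case (the remaining case is similar):
\[ \big\langle X,\ e_0^2 e_{n-1} - (r^{-1} + s^{-1})\, e_0 e_{n-1} e_0 + (rs)^{-1} e_{n-1} e_0^2 \big\rangle, \]
where $X$ is any word in the generators of $\widehat{\mathcal B}'$. By definition, this equals
\[ \big\langle \Delta^{(2)}(X),\ e_0 \otimes e_0 \otimes e_{n-1} - (r^{-1} + s^{-1})\, e_0 \otimes e_{n-1} \otimes e_0 + (rs)^{-1} e_{n-1} \otimes e_0 \otimes e_0 \big\rangle, \tag{2.2} \]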
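where $\Delta$ stands for $\Delta^{\mathrm{op}}_{\widehat{\mathcal B}'}$. In order for any one of these terms to be nonzero, $X$ must involve exactly two $f_0$ factors, one $f_{n-1}$ factor, and arbitrarily many $\omega_j'^{\,\pm1}$ ($j \in I_0$), $\gamma'^{\,\pm\frac12}$, or $D'^{\,\pm1}$ factors. For simplicity, we first consider three key cases:

(i) If $X = f_0^2 f_{n-1}$, then $\Delta^{(2)}(X)$ is equal to
\[ (\omega_0' \otimes \omega_0' \otimes f_0 + \omega_0' \otimes f_0 \otimes 1 + f_0 \otimes 1 \otimes 1)^2\, (\omega_{n-1}' \otimes \omega_{n-1}' \otimes f_{n-1} + \omega_{n-1}' \otimes f_{n-1} \otimes 1 + f_{n-1} \otimes 1 \otimes 1). \]
The relevant terms of $\Delta^{(2)}(X)$ are
\[ f_0\omega_0'\omega_{n-1}' \otimes f_0\omega_{n-1}' \otimes f_{n-1} + \omega_0' f_0\omega_{n-1}' \otimes f_0\omega_{n-1}' \otimes f_{n-1} + f_0\omega_0'\omega_{n-1}' \otimes \omega_0' f_{n-1} \otimes f_0 \]
\[ {} + \omega_0' f_0\omega_{n-1}' \otimes \omega_0' f_{n-1} \otimes f_0 + \omega_0'^{\,2} f_{n-1} \otimes f_0\omega_0' \otimes f_0 + \omega_0'^{\,2} f_{n-1} \otimes \omega_0' f_0 \otimes f_0. \]
Therefore, (2.2) becomes
\[ \langle f_0\omega_0'\omega_{n-1}', e_0\rangle \langle f_0\omega_{n-1}', e_0\rangle \langle f_{n-1}, e_{n-1}\rangle + \langle \omega_0' f_0\omega_{n-1}', e_0\rangle \langle f_0\omega_{n-1}', e_0\rangle \langle f_{n-1}, e_{n-1}\rangle \]
\[ {} - (r^{-1} + s^{-1}) \Big\{ \langle f_0\omega_0'\omega_{n-1}', e_0\rangle \langle \omega_0' f_{n-1}, e_{n-1}\rangle \langle f_0, e_0\rangle + \langle \omega_0' f_0\omega_{n-1}', e_0\rangle \langle \omega_0' f_{n-1}, e_{n-1}\rangle \langle f_0, e_0\rangle \Big\} \]
\[ {} + (rs)^{-1} \Big\{ \langle \omega_0'^{\,2} f_{n-1}, e_{n-1}\rangle \langle f_0\omega_0', e_0\rangle \langle f_0, e_0\rangle + \langle \omega_0'^{\,2} f_{n-1}, e_{n-1}\rangle \langle \omega_0' f_0, e_0\rangle \langle f_0, e_0\rangle \Big\} \]
\[ = \frac{1}{(s - r)^3} \Big\{ \big(1 + \langle \omega_0', \omega_0\rangle\big) - (r^{-1} + s^{-1}) \big(\langle \omega_0', \omega_{n-1}\rangle + \langle \omega_0', \omega_0\rangle \langle \omega_0', \omega_{n-1}\rangle\big) + (rs)^{-1} \big(\langle \omega_0', \omega_{n-1}\rangle^2 + \langle \omega_0', \omega_{n-1}\rangle^2 \langle \omega_0', \omega_0\rangle\big) \Big\} \]
\[ = \frac{1}{(s - r)^3} \Big\{ 1 + r s^{-1} - (r^{-1} + s^{-1})(s + r s^{-1} s) + (rs)^{-1}(s^2 + s^2\, r s^{-1}) \Big\} = 0. \]

(ii) When $X = f_0 f_{n-1} f_0$, it is easy to get the relevant terms of $\Delta^{(2)}(X)$:
\[ \omega_0'\omega_{n-1}' f_0 \otimes f_0\omega_{n-1}' \otimes f_{n-1} + f_0\omega_{n-1}'\omega_0' \otimes \omega_{n-1}' f_0 \otimes f_{n-1} + \omega_0'\omega_{n-1}' f_0 \otimes \omega_0' f_{n-1} \otimes f_0 \]
\[ {} + f_0\omega_{n-1}'\omega_0' \otimes f_{n-1}\omega_0' \otimes f_0 + \omega_0' f_{n-1}\omega_0' \otimes \omega_0' f_0 \otimes f_0 + \omega_0' f_{n-1}\omega_0' \otimes f_0\omega_0' \otimes f_0. \]
Thus, (2.2) becomes
\[ \frac{1}{(s - r)^3} \Big\{ \langle \omega_0', \omega_0\rangle \langle \omega_{n-1}', \omega_0\rangle + \langle \omega_{n-1}', \omega_0\rangle - (r^{-1} + s^{-1}) \big(\langle \omega_0', \omega_0\rangle \langle \omega_{n-1}', \omega_0\rangle \langle \omega_0', \omega_{n-1}\rangle + 1\big) + (rs)^{-1} \big(\langle \omega_0', \omega_{n-1}\rangle \langle \omega_0', \omega_0\rangle + \langle \omega_0', \omega_{n-1}\rangle\big) \Big\} \]
\[ = \frac{1}{(s - r)^3} \Big\{ r s^{-1} \cdot r^{-1} + r^{-1} - (r^{-1} + s^{-1})(r s^{-1} \cdot r^{-1} s + 1) + (rs)^{-1}(s \cdot r s^{-1} + s) \Big\} = 0. \]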
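(iii) If $X = f_{n-1} f_0^2$, one can similarly get that (2.2) vanishes.

Finally, if $X$ is any word involving exactly two $f_0$ factors, one $f_{n-1}$ factor, and arbitrarily many factors $\omega_j'^{\,\pm1}$ ($j \in I_0$), $\gamma'^{\,\pm\frac12}$ and $D'^{\,\pm1}$, then (2.2) will just be a scalar multiple of one of the quantities we have already calculated, and hence will be 0. Analogous calculations show that the relations in $\widehat{\mathcal B}'$ are preserved.

Theorem 2.5. $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}')$ is isomorphic to $U_{r,s}(\widehat{\mathfrak{sl}_n})$ as Hopf algebras.

Proof. We denote the image $e_i \otimes 1$ of $e_i$ in $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}')$ by $\hat e_i$, and similarly for $\omega_i^{\pm1}$, $\gamma^{\pm\frac12}$, $D^{\pm1}$; we denote the image $1 \otimes f_i$ of $f_i$ in $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}')$ by $\hat f_i$, and similarly for $\omega_i'^{\,\pm1}$, $\gamma'^{\,\pm\frac12}$, $D'^{\,\pm1}$. Define a map $\varphi : \mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}') \longrightarrow U_{r,s}(\widehat{\mathfrak{sl}_n})$ by
\[ \varphi(\hat e_i) = e_i, \quad \varphi(\hat f_i) = f_i, \quad \varphi(\hat\omega_i^{\pm1}) = \omega_i^{\pm1}, \quad \varphi(\hat\omega_i'^{\,\pm1}) = \omega_i'^{\,\pm1}, \]
\[ \varphi(\hat\gamma^{\pm\frac12}) = \gamma^{\pm\frac12}, \quad \varphi(\hat\gamma'^{\,\pm\frac12}) = \gamma'^{\,\pm\frac12}, \quad \varphi(\hat D^{\pm1}) = D^{\pm1}, \quad \varphi(\hat D'^{\,\pm1}) = D'^{\,\pm1}. \]
The remaining argument is analogous to that of [BGH1, Theorem 2.5].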
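The two scalar cancellations in cases (i) and (ii) of the proof of Proposition 2.4, as well as the identities (2.1), can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: it evaluates $\langle \omega_i', \omega_j\rangle$ from item (2) of Proposition 2.4 at sample rational values of $r$ and $s$ (exact arithmetic via `fractions.Fraction`), and the helper names `eps_dot_alpha`, `pairing` and `serre_brackets` are hypothetical, not from the paper.

```python
from fractions import Fraction


def eps_dot_alpha(j, i, n):
    """(eps_j, alpha_i) for 1 <= j <= n, i in I_0, with alpha_0 = delta - (eps_1 - eps_n)."""
    if i == 0:
        return (j == n) - (j == 1)
    return (j == i) - (j == i + 1)


def pairing(i, j, n, r, s):
    """<omega_i', omega_j> read off from item (2) of Proposition 2.4."""
    if j == 0:
        return r ** (-eps_dot_alpha(i + 1, 0, n)) * s ** eps_dot_alpha(1, i, n)
    return r ** eps_dot_alpha(j, i, n) * s ** eps_dot_alpha(j + 1, i, n)


def serre_brackets(n, r, s):
    """The bracketed scalars of cases (i) and (ii), without the factor 1/(s-r)^3."""
    p00 = pairing(0, 0, n, r, s)        # <omega_0', omega_0>
    p0n = pairing(0, n - 1, n, r, s)    # <omega_0', omega_{n-1}>
    pn0 = pairing(n - 1, 0, n, r, s)    # <omega_{n-1}', omega_0>
    c = 1 / r + 1 / s
    case_i = (1 + p00) - c * (p0n + p00 * p0n) + (p0n ** 2 + p0n ** 2 * p00) / (r * s)
    case_ii = (p00 * pn0 + pn0) - c * (p00 * pn0 * p0n + 1) + (p0n * p00 + p0n) / (r * s)
    return case_i, case_ii
```

With, say, `serre_brackets(4, Fraction(2), Fraction(3))` both entries are zero. A check at sample values is of course weaker than the symbolic identity; it only guards against transcription slips in the exponents.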
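Remark 2.6. (1) Up to now, we have completely solved the compatibility problem on the defining relations of our two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$). This is done in two steps: the proof of Theorem 2.5 indicates that the cross relations between $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$ are half of the relations (A1)–(A4), and the proof of Proposition 2.4 shows the remaining relations, including the remaining half of relations (A1)–(A4) and the $(r, s)$-Serre relations (A5)–(A7).

(2) When $r = s^{-1} = q$, the Hopf algebra $U_{q,q^{-1}}(\widehat{\mathfrak{sl}_n})$ modulo the Hopf ideal generated by the set $\{\omega_i' - \omega_i^{-1}\ (i \in I_0),\ \gamma'^{\,\frac12} - \gamma^{-\frac12},\ D' - D^{-1}\}$ is the usual quantum affine algebra $U_q(\widehat{\mathfrak{sl}_n})$ of Drinfel’d-Jimbo type.

Let $U^0 = \mathbb K[\omega_0^{\pm1}, \cdots, \omega_{n-1}^{\pm1}, \omega_0'^{\,\pm1}, \cdots, \omega_{n-1}'^{\,\pm1}]$, $U_0 = \mathbb K[\omega_0^{\pm1}, \cdots, \omega_{n-1}^{\pm1}]$, and $U_0' = \mathbb K[\omega_0'^{\,\pm1}, \cdots, \omega_{n-1}'^{\,\pm1}]$ denote the Laurent polynomial subalgebras of $U_{r,s}(\widehat{\mathfrak{sl}_n})$, $\widehat{\mathcal B}$, and $\widehat{\mathcal B}'$ respectively. Clearly, $U^0 = U_0 U_0' = U_0' U_0$. Furthermore, let us denote by $U_{r,s}(\widehat{\mathfrak n})$ (resp. $U_{r,s}(\widehat{\mathfrak n}^-)$) the subalgebra of $\widehat{\mathcal B}$ (resp. $\widehat{\mathcal B}'$) generated by $e_i$ (resp. $f_i$) for all $i \in I_0$. Thus, by definition, we have $\widehat{\mathcal B} = U_{r,s}(\widehat{\mathfrak n}) \rtimes U_0$, and $\widehat{\mathcal B}' = U_0' \ltimes U_{r,s}(\widehat{\mathfrak n}^-)$, so that the double $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}') \cong U_{r,s}(\widehat{\mathfrak n}) \otimes U^0 \otimes U_{r,s}(\widehat{\mathfrak n}^-)$, as vector spaces. On the other hand, if we consider $\langle\,,\rangle^- : \widehat{\mathcal B}' \times \widehat{\mathcal B} \longrightarrow \mathbb K$ by $\langle b', b\rangle^- := \langle S(b'), b\rangle$, the convolution inverse of the skew-dual pairing $\langle\,,\rangle$ in Proposition 2.4, the composition with the flip mapping $\sigma$ then gives rise to a new skew-dual pairing $\langle\,|\,\rangle := \langle\,,\rangle^- \circ \sigma : \widehat{\mathcal B} \times \widehat{\mathcal B}' \longrightarrow \mathbb K$, given by $\langle b \,|\, b'\rangle = \langle S(b'), b\rangle$. As a byproduct of Theorem 2.5, similar to [BGH1, Cor. 2.6], we get the standard triangular decomposition of $U_{r,s}(\widehat{\mathfrak{sl}_n})$.

Corollary 2.7. $U_{r,s}(\widehat{\mathfrak{sl}_n}) \cong U_{r,s}(\widehat{\mathfrak n}^-) \otimes U^0 \otimes U_{r,s}(\widehat{\mathfrak n})$, as vector spaces.

Corollary 2.8. For any $\zeta = \sum_{i=0}^{n-1} \zeta_i \alpha_i \in Q$ (the root lattice of $\widehat{\mathfrak{sl}_n}$), the defining relations (A2) and (A3) in $U_{r,s}(\widehat{\mathfrak{sl}_n})$ take the form:
\[ \omega_\zeta\, e_i\, \omega_\zeta^{-1} = \langle \omega_i', \omega_\zeta\rangle\, e_i, \qquad \omega_\zeta\, f_i\, \omega_\zeta^{-1} = \langle \omega_i', \omega_\zeta\rangle^{-1} f_i, \]
\[ \omega_\zeta'\, e_i\, \omega_\zeta'^{\,-1} = \langle \omega_\zeta', \omega_i\rangle^{-1} e_i, \qquad \omega_\zeta'\, f_i\, \omega_\zeta'^{\,-1} = \langle \omega_\zeta', \omega_i\rangle\, f_i. \]
$U_{r,s}(\widehat{\mathfrak n}^{\pm}) = \bigoplus_{\eta \in Q^+} U^{\pm\eta}_{r,s}(\widehat{\mathfrak n}^{\pm})$ is then $Q^{\pm}$-graded with
\[ U^{\eta}_{r,s}(\widehat{\mathfrak n}^{\pm}) = \big\{ a \in U_{r,s}(\widehat{\mathfrak n}^{\pm}) \ \big|\ \omega_\zeta\, a\, \omega_\zeta^{-1} = \langle \omega_\eta', \omega_\zeta\rangle\, a,\ \ \omega_\zeta'\, a\, \omega_\zeta'^{\,-1} = \langle \omega_\zeta', \omega_\eta\rangle^{-1} a \big\}, \]
for $\eta \in Q^+ \cup Q^-$. Furthermore, $U = \bigoplus_{\eta \in Q} U^{\eta}_{r,s}(\widehat{\mathfrak{sl}_n})$ is $Q$-graded with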
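\[ U^{\eta}_{r,s}(\widehat{\mathfrak{sl}_n}) = \Big\{ \sum F_\alpha\, \omega_\mu'\, \omega_\nu\, E_\beta \in U \ \Big|\ \omega_\zeta (F_\alpha \omega_\mu' \omega_\nu E_\beta)\, \omega_\zeta^{-1} = \langle \omega_{\beta-\alpha}', \omega_\zeta\rangle\, F_\alpha \omega_\mu' \omega_\nu E_\beta,\ \ \omega_\zeta' (F_\alpha \omega_\mu' \omega_\nu E_\beta)\, \omega_\zeta'^{\,-1} = \langle \omega_\zeta', \omega_{\beta-\alpha}\rangle^{-1} F_\alpha \omega_\mu' \omega_\nu E_\beta, \text{ with } \beta - \alpha = \eta \Big\}, \]
where $F_\alpha$ (resp. $E_\beta$) runs over monomials $f_{i_1} \cdots f_{i_l}$ (resp. $e_{j_1} \cdots e_{j_m}$) such that $\alpha_{i_1} + \cdots + \alpha_{i_l} = \alpha$ (resp. $\alpha_{j_1} + \cdots + \alpha_{j_m} = \beta$).

Definition 2.9. Let $\tau$ be the $\mathbb Q$-algebra anti-automorphism of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ such that $\tau(r) = s$, $\tau(s) = r$, $\tau(\langle \omega_i', \omega_j\rangle^{\pm1}) = \langle \omega_j', \omega_i\rangle^{\mp1}$, and
\[ \tau(e_i) = f_i, \quad \tau(f_i) = e_i, \quad \tau(\omega_i) = \omega_i', \quad \tau(\omega_i') = \omega_i, \quad \tau(\gamma) = \gamma', \quad \tau(\gamma') = \gamma, \quad \tau(D) = D', \quad \tau(D') = D. \]
Then $\widehat{\mathcal B}' = \tau(\widehat{\mathcal B})$ with those induced defining relations from $\widehat{\mathcal B}$, and those cross relations in (A2)–(A4) are antisymmetric with respect to $\tau$.

3. Drinfel’d Realization of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and Quantum Affine Lyndon Basis

3.1. For the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) we defined in Sect. 2, we give the following definition of its Drinfel’d realization. In the two-parameter case, the defining relations (D2), (D6), (D7) and (D8) below differ markedly from their one-parameter counterparts (see (d2), (d6), (d7) and (d8) in Remark 3.3); the compatibilities of the whole system rest on some intrinsic considerations, as indicated in the sequel. We briefly write $\langle i, j\rangle := \langle \omega_i', \omega_j\rangle$.

Definition 3.1. Let $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) be the unital associative algebra over $\mathbb K$ generated by the elements $x_i^{\pm}(k)$, $a_i(\ell)$, $\omega_i^{\pm1}$, $\omega_i'^{\,\pm1}$, $\gamma^{\pm\frac12}$, $\gamma'^{\,\pm\frac12}$, $D^{\pm1}$, $D'^{\,\pm1}$ ($i \in I$, $k, k' \in \mathbb Z$, $\ell, \ell' \in \mathbb Z \setminus \{0\}$), subject to the following defining relations:

(D1) $\gamma^{\pm\frac12}$, $\gamma'^{\,\pm\frac12}$ are central with $\gamma\gamma' = rs$, $\omega_i\,\omega_i^{-1} = \omega_j'\,\omega_j'^{\,-1} = 1 = D D^{-1} = D' D'^{\,-1}$ ($i, j \in I$), and
\[ [\,\omega_i^{\pm1}, \omega_j^{\pm1}\,] = [\,\omega_i^{\pm1}, D^{\pm1}\,] = [\,\omega_j'^{\,\pm1}, D^{\pm1}\,] = [\,\omega_i^{\pm1}, D'^{\,\pm1}\,] = 0 = [\,\omega_i^{\pm1}, \omega_j'^{\,\pm1}\,] = [\,\omega_j'^{\,\pm1}, D'^{\,\pm1}\,] = [\,D'^{\,\pm1}, D^{\pm1}\,] = [\,\omega_i'^{\,\pm1}, \omega_j'^{\,\pm1}\,]. \]

(D2)
\[ [\,a_i(\ell), a_j(\ell')\,] = \delta_{\ell+\ell',0}\, \frac{(rs)^{\frac{|\ell|}{2}} \big(\langle i, i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i, i\rangle^{-\frac{\ell a_{ij}}{2}}\big)}{|\ell|\,(r - s)} \cdot \frac{\gamma^{|\ell|} - \gamma'^{\,|\ell|}}{r - s}. \]

(D3)
\[ [\,a_i(\ell), \omega_j^{\pm1}\,] = [\,a_i(\ell), \omega_j'^{\,\pm1}\,] = 0. \]

(D4)
\[ D\, x_i^{\pm}(k)\, D^{-1} = r^k x_i^{\pm}(k), \qquad D'\, x_i^{\pm}(k)\, D'^{\,-1} = s^k x_i^{\pm}(k), \]
\[ D\, a_i(\ell)\, D^{-1} = r^{\ell} a_i(\ell), \qquad D'\, a_i(\ell)\, D'^{\,-1} = s^{\ell} a_i(\ell). \]

(D5)
\[ \omega_i\, x_j^{\pm}(k)\, \omega_i^{-1} = \langle j, i\rangle^{\pm1} x_j^{\pm}(k), \qquad \omega_i'\, x_j^{\pm}(k)\, \omega_i'^{\,-1} = \langle i, j\rangle^{\mp1} x_j^{\pm}(k). \]

(D6$_1$)
\[ [\,a_i(\ell), x_j^{\pm}(k)\,] = \pm \frac{(rs)^{\frac{|\ell|}{2}} \big(\langle i, i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i, i\rangle^{-\frac{\ell a_{ij}}{2}}\big)}{\ell\,(r - s)}\, \gamma^{\pm\frac{\ell}{2}}\, x_j^{\pm}(\ell + k), \qquad \text{for } \ell < 0. \]

(D6$_2$)
\[ [\,a_i(\ell), x_j^{\pm}(k)\,] = \pm \frac{(rs)^{\frac{|\ell|}{2}} \big(\langle i, i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i, i\rangle^{-\frac{\ell a_{ij}}{2}}\big)}{\ell\,(r - s)}\, \gamma'^{\,\pm\frac{\ell}{2}}\, x_j^{\pm}(\ell + k), \qquad \text{for } \ell > 0. \]

(D7)
\[ x_i^{\pm}(k+1)\, x_j^{\pm}(k') - \langle j, i\rangle^{\pm1} x_j^{\pm}(k')\, x_i^{\pm}(k+1) = -\big(\langle j, i\rangle \langle i, j\rangle^{-1}\big)^{\pm\frac12} \Big( x_j^{\pm}(k'+1)\, x_i^{\pm}(k) - \langle i, j\rangle^{\pm1} x_i^{\pm}(k)\, x_j^{\pm}(k'+1) \Big). \]

(D8)
\[ [\,x_i^{+}(k), x_j^{-}(k')\,] = \frac{\delta_{ij}}{r - s} \Big( \gamma'^{\,-k}\, \gamma^{-\frac{k+k'}{2}}\, \omega_i(k+k') - \gamma^{k'}\, \gamma'^{\,\frac{k+k'}{2}}\, \omega_i'(k+k') \Big), \]
where $\omega_i(m)$, $\omega_i'(-m)$ ($m \in \mathbb Z_{\ge 0}$) with $\omega_i(0) = \omega_i$ and $\omega_i'(0) = \omega_i'$ are defined by:
\[ \sum_{m=0}^{\infty} \omega_i(m)\, z^{-m} = \omega_i \exp\Big( (r - s) \sum_{\ell=1}^{\infty} a_i(\ell)\, z^{-\ell} \Big); \qquad \sum_{m=0}^{\infty} \omega_i'(-m)\, z^{m} = \omega_i' \exp\Big( -(r - s) \sum_{\ell=1}^{\infty} a_i(-\ell)\, z^{\ell} \Big), \]
with $\omega_i(-m) = 0$ and $\omega_i'(m) = 0$, $\forall\, m > 0$.

(D9$_1$)
\[ x_i^{\pm}(m)\, x_j^{\pm}(k) = x_j^{\pm}(k)\, x_i^{\pm}(m), \qquad \text{for } a_{ij} = 0, \]

(D9$_2$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (r^{\pm1} + s^{\pm1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + (rs)^{\pm1} x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \qquad \text{for } a_{ij} = -1,\ 1 \le i < j < n, \]

(D9$_3$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (r^{\mp1} + s^{\mp1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + (rs)^{\mp1} x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \qquad \text{for } a_{ij} = -1,\ 1 \le j < i < n, \]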
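Sym denotes symmetrization with respect to the indices $(m_1, m_2)$.

As one of the crucial observations on the compatibilities of the defining system above, we have

Proposition 3.2. There exists the $\mathbb Q$-algebra antiautomorphism $\tau$ of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) such that $\tau(r) = s$, $\tau(s) = r$, $\tau(\langle \omega_i', \omega_j\rangle^{\pm1}) = \langle \omega_j', \omega_i\rangle^{\mp1}$ and
\[ \tau(\omega_i) = \omega_i', \quad \tau(\omega_i') = \omega_i, \quad \tau(\gamma) = \gamma', \quad \tau(\gamma') = \gamma, \quad \tau(D) = D', \quad \tau(D') = D, \]
\[ \tau(a_i(\ell)) = a_i(-\ell), \quad \tau(x_i^{\pm}(m)) = x_i^{\mp}(-m), \quad \tau(\omega_i(m)) = \omega_i'(-m), \quad \tau(\omega_i'(-m)) = \omega_i(m), \]
and $\tau$ preserves each defining relation (Dn) in Definition 3.1 for $n = 1, \cdots, 9$.

Remark 3.3. (1) Note that the defining relations (D1)–(D5), (D7), (D8), and (D9$_1$)–(D9$_3$) are each self-compatible under the $\mathbb Q$-algebra antiautomorphism $\tau$, while the two relations (D6$_1$) and (D6$_2$) are compatible with each other with respect to $\tau$. Using such a $\tau$, it is sufficient to consider the compatibility for half of the relations, e.g., those relations involving the $+$-parts of $x_i^{\pm}(m)$, or positive $\ell$'s for $a_i(\ell)$ (for instance, see (D6$_2$)).

(2) The constraint condition $\gamma\gamma' = rs$ in (D1) is required intrinsically by the compatibilities among (D1), (D3), (D5), (D6), (D7) & (D8). For instance, by (D7), we have
\[ [\,x_i^{-}(0), x_j^{-}(1)\,]_{\langle i, j\rangle} = (\langle j, i\rangle \langle i, j\rangle)^{\frac12}\, [\,x_i^{-}(1), x_j^{-}(0)\,]_{\langle j, i\rangle^{-1}}. \]
Thus, using the property (3.5) in Definition 3.4 below and (D8) & (D5), we get
\[ [\,x_j^{+}(0), [\,x_i^{-}(0), x_j^{-}(1)\,]_{\langle i, j\rangle}\,] = (\langle j, i\rangle \langle i, j\rangle)^{\frac12}\, [\,x_i^{-}(1), [\,x_j^{+}(0), x_j^{-}(0)\,]\,]_{\langle j, i\rangle^{-1}} \]
\[ = (\langle j, i\rangle \langle i, j\rangle)^{\frac12} \Big[\, x_i^{-}(1), \frac{\omega_j - \omega_j'}{r - s} \,\Big]_{\langle j, i\rangle^{-1}} = \frac{(\langle j, i\rangle \langle i, j\rangle)^{\frac12} - (\langle j, i\rangle \langle i, j\rangle)^{-\frac12}}{r - s}\, x_i^{-}(1)\, \omega_j. \]
However, using (3.5), (D8), (D3), (D5) & (D6$_2$), we can follow another way to expand $[\,x_j^{+}(0), [\,x_i^{-}(0), x_j^{-}(1)\,]_{\langle i, j\rangle}\,]$ directly as
\[ [\,x_j^{+}(0), [\,x_i^{-}(0), x_j^{-}(1)\,]_{\langle i, j\rangle}\,] = [\,x_i^{-}(0), [\,x_j^{+}(0), x_j^{-}(1)\,]\,]_{\langle i, j\rangle} = \gamma^{-\frac12}\, [\,x_i^{-}(0), a_j(1)\,]\, \omega_j \]
\[ = (rs)^{\frac12} (\gamma\gamma')^{-\frac12}\, \frac{\langle j, j\rangle^{\frac{a_{ji}}{2}} - \langle j, j\rangle^{-\frac{a_{ji}}{2}}}{r - s}\, x_i^{-}(1)\, \omega_j. \]
Therefore, we obtain that $\gamma\gamma' = rs$ and $\langle i, j\rangle \langle j, i\rangle = \langle i, i\rangle^{a_{ij}}$, for any $i, j \in I$.

(3) As a glimpse of the compatibility of (D2) with (D6$_1$), (D6$_2$) and (D8), we have the following: By (D8), we get $a_i(1) = \omega_i^{-1} \gamma^{\frac12}\, [\,x_i^{+}(0), x_i^{-}(1)\,]$ and $a_i(-1) = \omega_i'^{\,-1} \gamma'^{\,\frac12}\, [\,x_i^{+}(-1), x_i^{-}(0)\,]$. Then using one of these expressions of $a_i(\pm1)$ and using (D6$_1$) (or (D6$_2$)) and (D8) again, we may expand the Lie bracket $[\,a_i(1), a_j(-1)\,]$ in two ways and arrive at the same formula as (D2). One is to expand $a_i(1)$ first, and then to use (D6$_1$) & (D8) as follows:
\[ [\,a_i(1), a_j(-1)\,] = \omega_i^{-1} \gamma^{\frac12}\, [\,[\,x_i^{+}(0), x_i^{-}(1)\,], a_j(-1)\,] \]
\[ = \omega_i^{-1} \gamma^{\frac12} \Big( [\,[\,x_i^{+}(0), a_j(-1)\,], x_i^{-}(1)\,] + [\,x_i^{+}(0), [\,x_i^{-}(1), a_j(-1)\,]\,] \Big) \]
\[ = \omega_i^{-1} \gamma^{\frac12}\, \{-a_{ij}\}_{\langle i, i\rangle} \Big( \gamma^{-\frac12}\, [\,x_i^{+}(-1), x_i^{-}(1)\,] - \gamma^{\frac12}\, [\,x_i^{+}(0), x_i^{-}(0)\,] \Big) \]
\[ = \{-a_{ij}\}_{\langle i, i\rangle}\, \omega_i^{-1} \Big( \frac{\gamma' \omega_i - \gamma\, \omega_i'}{r - s} - \gamma\, \frac{\omega_i - \omega_i'}{r - s} \Big) = \{-a_{ij}\}_{\langle i, i\rangle}\, \frac{\gamma' - \gamma}{r - s} = \{a_{ij}\}_{\langle i, i\rangle}\, \frac{\gamma - \gamma'}{r - s}, \]
where, for $|\ell| = 1$, $\{a_{ij}\}_{\langle i, i\rangle} := \dfrac{(rs)^{\frac12} \big(\langle i, i\rangle^{\frac{a_{ij}}{2}} - \langle i, i\rangle^{-\frac{a_{ij}}{2}}\big)}{r - s} = \{a_{ji}\}_{\langle j, j\rangle}$ and $\{-a_{ij}\}_{\langle i, i\rangle} = -\{a_{ij}\}_{\langle i, i\rangle}$. Expanding $a_j(-1)$ instead and using (D6$_2$) & (D8), we get the same result. More compatibilities will be clearer in the proof of the Drinfel’d isomorphism theorem.

(4) Another observation is the following: When $r = s^{-1} = q$, the algebra $\mathcal U_{q,q^{-1}}(\widehat{\mathfrak{sl}_n})$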
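modulo the ideal generated by the set $\{\omega_i' - \omega_i^{-1}\ (i \in I),\ \gamma'^{\,\frac12} - \gamma^{-\frac12},\ D' - D^{-1}\}$ is exactly the usual Drinfel’d realization $\mathcal U_q(\widehat{\mathfrak{sl}_n})$ defined below (cf. [B2]).

The unital associative algebra $\mathcal U_q(\widehat{\mathfrak{sl}_n})$ over $\mathbb Q(q)$ is generated by the elements $x_i^{\pm}(k)$, $a_i(\ell)$, $\omega_i^{\pm1}$, $\gamma^{\pm\frac12}$, $D^{\pm1}$ ($i \in I$, $k \in \mathbb Z$, $\ell \in \mathbb Z \setminus \{0\}$) subject to the following defining relations:

(d1) $\gamma^{\pm\frac12}$ are central, $\omega_i\,\omega_i^{-1} = 1 = D D^{-1}$ ($i \in I$), and for $i, j \in I$, one has
\[ [\,\omega_i^{\pm1}, \omega_j^{\pm1}\,] = [\,\omega_i^{\pm1}, D^{\pm1}\,] = 0. \]

(d2)
\[ [\,a_i(\ell), a_j(\ell')\,] = \delta_{\ell+\ell',0}\, \frac{[\,\ell a_{ij}\,]}{\ell} \cdot \frac{\gamma^{\ell} - \gamma^{-\ell}}{q - q^{-1}}, \qquad [n] = \frac{q^n - q^{-n}}{q - q^{-1}}. \]

(d3)
\[ [\,a_i(\ell), \omega_j^{\pm1}\,] = 0. \]

(d4)
\[ D\, x_i^{\pm}(k)\, D^{-1} = q^k x_i^{\pm}(k), \qquad D\, a_i(\ell)\, D^{-1} = q^{\ell} a_i(\ell). \]

(d5)
\[ \omega_i\, x_j^{\pm}(k)\, \omega_i^{-1} = q^{\pm a_{ij}} x_j^{\pm}(k). \]

(d6)
\[ [\,a_i(\ell), x_j^{\pm}(k)\,] = \pm \frac{[\,\ell a_{ij}\,]}{\ell}\, \gamma^{\mp\frac{|\ell|}{2}}\, x_j^{\pm}(\ell + k). \]

(d7)
\[ x_i^{\pm}(k+1)\, x_j^{\pm}(k') - q^{\pm a_{ij}} x_j^{\pm}(k')\, x_i^{\pm}(k+1) = q^{\pm a_{ij}} x_i^{\pm}(k)\, x_j^{\pm}(k'+1) - x_j^{\pm}(k'+1)\, x_i^{\pm}(k). \]

(d8)
\[ [\,x_i^{+}(k), x_j^{-}(k')\,] = \frac{\delta_{ij}}{q - q^{-1}} \Big( \gamma^{\frac{k-k'}{2}}\, \omega_i(k+k') - \gamma^{\frac{k'-k}{2}}\, \omega_i^{-1}(k+k') \Big), \]
where $\omega_i(m)$ and $\omega_i^{-1}(-m)$ ($m \in \mathbb Z_{\ge 0}$) with $\omega_i(0) = \omega_i$ and $\omega_i^{-1}(0) = \omega_i^{-1}$ are defined by:
\[ \sum_{m=0}^{\infty} \omega_i(m)\, z^{-m} = \omega_i \exp\Big( (q - q^{-1}) \sum_{\ell=1}^{\infty} a_i(\ell)\, z^{-\ell} \Big), \qquad \big(\omega_i(-m) = 0,\ \forall\, m > 0\big); \]
\[ \sum_{m=0}^{\infty} \omega_i^{-1}(-m)\, z^{m} = \omega_i^{-1} \exp\Big( -(q - q^{-1}) \sum_{\ell=1}^{\infty} a_i(-\ell)\, z^{\ell} \Big), \qquad \big(\omega_i^{-1}(m) = 0,\ \forall\, m > 0\big). \]

(d9$_1$)
\[ x_i^{\pm}(m)\, x_j^{\pm}(k) = x_j^{\pm}(k)\, x_i^{\pm}(m), \qquad \text{for } a_{ij} = 0, \]

(d9$_2$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (q^{\pm1} + q^{\mp1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \qquad \text{for } a_{ij} = -1,\ 1 \le i < j < n, \]

(d9$_3$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (q^{\mp1} + q^{\pm1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \qquad \text{for } a_{ij} = -1,\ 1 \le j < i < n. \]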
3.2. Before putting forward the Drinfel’d isomorphism theorem, that is, showing that the $\mathbb Q(r, s)$-algebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) in Definition 3.1 is exactly the Drinfel’d realization of the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) defined in Definition 2.1, we need some preliminaries on Lyndon words. We also adapt the definition of the quantum Lie bracket from [J2] to define an “affine” quantum Lie bracket (see Definition 3.6), which enables us to derive, for the first time, a description of the quantum affine Lyndon basis. Note that the (affine) quantum Lie bracket has some advantages in calculations, such as depending less on the degrees of the elements (see the properties (3.3) & (3.4) below). This generalized quantum Lie bracket, like the one used in the usual construction of the quantum Lyndon basis (for the definition, see [R2]), is consistent with the bracketing of the corresponding Lyndon words. This is crucial to the quantum calculations we develop later on.
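Definition 3.4 ([J2]). The quantum Lie bracket $[\,a_1, a_2, \cdots, a_s\,]_{(q_1, q_2, \cdots, q_{s-1})}$ is defined inductively by
\[ [\,a_1, a_2\,]_q = a_1 a_2 - q\, a_2 a_1, \qquad \text{for } q \in \mathbb K \setminus \{0\}, \]
\[ [\,a_1, a_2, \cdots, a_s\,]_{(q_1, q_2, \cdots, q_{s-1})} = [\,a_1, [\,a_2, \cdots, a_s\,]_{(q_1, \cdots, q_{s-2})}\,]_{q_{s-1}}, \qquad \text{for } q_i \in \mathbb K \setminus \{0\}. \]
The following identities follow from the definition:
\[ [\,a, bc\,]_v = [\,a, b\,]_x\, c + x\, b\, [\,a, c\,]_{\frac{v}{x}}, \qquad x \neq 0, \tag{3.1} \]
\[ [\,ab, c\,]_v = a\, [\,b, c\,]_x + x\, [\,a, c\,]_{\frac{v}{x}}\, b, \qquad x \neq 0, \tag{3.2} \]
\[ [\,a, [\,b, c\,]_u\,]_v = [\,[\,a, b\,]_x, c\,]_{\frac{uv}{x}} + x\, [\,b, [\,a, c\,]_{\frac{v}{x}}\,]_{\frac{u}{x}}, \qquad x \neq 0, \tag{3.3} \]
\[ [\,[\,a, b\,]_u, c\,]_v = [\,a, [\,b, c\,]_x\,]_{\frac{uv}{x}} + x\, [\,[\,a, c\,]_{\frac{v}{x}}, b\,]_{\frac{u}{x}}, \qquad x \neq 0, \tag{3.4} \]
\[ [\,a, [\,b_1, \cdots, b_s\,]_{(v_1, \cdots, v_{s-1})}\,] = \sum_i\, [\,b_1, \cdots, [\,a, b_i\,], \cdots, b_s\,]_{(v_1, \cdots, v_{s-1})}, \tag{3.5} \]
\[ [\,a, a, b\,]_{(u, v)} = [\,a, a, b\,]_{(v, u)} = a^2 b - (u + v)\, aba + (uv)\, ba^2. \tag{3.6} \]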
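The identities (3.1)–(3.4) and (3.6) are formal consequences of $[\,a, b\,]_q = ab - q\,ba$ in a free associative algebra, so they can be spot-checked mechanically. The following minimal sketch does this with noncommutative polynomials stored as dictionaries from words to coefficients; the helper names (`gen`, `add`, `mul`, `bracket`, `check_identities`) are illustrative assumptions, and the scalars $u, v, x$ are replaced by sample rational numbers rather than treated symbolically.

```python
from collections import defaultdict
from fractions import Fraction


def gen(letter):
    """A free generator, stored as {word: coefficient}."""
    return {letter: Fraction(1)}


def add(p, q, scale=1):
    """Return p + scale * q, dropping zero coefficients."""
    out = defaultdict(Fraction)
    for w, c in p.items():
        out[w] += c
    for w, c in q.items():
        out[w] += scale * c
    return {w: c for w, c in out.items() if c != 0}


def mul(p, q):
    """Product in the free algebra: words are concatenated."""
    out = defaultdict(Fraction)
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            out[w1 + w2] += c1 * c2
    return {w: c for w, c in out.items() if c != 0}


def bracket(x, y, q):
    """Quantum Lie bracket [x, y]_q = x y - q y x."""
    return add(mul(x, y), mul(y, x), -q)


def check_identities(u, v, x):
    """Compare both sides of (3.1)-(3.4) and (3.6) at the given nonzero scalars."""
    a, b, c = gen("a"), gen("b"), gen("c")
    aab = bracket(a, bracket(a, b, u), v)          # [a, a, b]_(u, v)
    expanded = add(add(mul(mul(a, a), b), mul(mul(a, b), a), -(u + v)),
                   mul(b, mul(a, a)), u * v)
    return {
        "(3.1)": bracket(a, mul(b, c), v)
                 == add(mul(bracket(a, b, x), c), mul(b, bracket(a, c, v / x)), x),
        "(3.2)": bracket(mul(a, b), c, v)
                 == add(mul(a, bracket(b, c, x)), mul(bracket(a, c, v / x), b), x),
        "(3.3)": bracket(a, bracket(b, c, u), v)
                 == add(bracket(bracket(a, b, x), c, u * v / x),
                        bracket(b, bracket(a, c, v / x), u / x), x),
        "(3.4)": bracket(bracket(a, b, u), c, v)
                 == add(bracket(a, bracket(b, c, x), u * v / x),
                        bracket(bracket(a, c, v / x), b, u / x), x),
        "(3.6)": aab == expanded and aab == bracket(a, bracket(a, b, v), u),
    }
```

Calling `check_identities(Fraction(2), Fraction(3), Fraction(5, 7))` returns a dictionary whose values are all `True`. Identity (3.5) is omitted here only to keep the sketch short.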
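Definition 3.5. For the system of generators of the algebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$, we define the $\dot Q$-gradation (where $\dot Q$ is the root lattice of $\mathfrak{sl}_n$) as follows:
\[ \deg(\omega_i^{\pm1}) = \deg(\omega_i'^{\,\pm1}) = \deg(\gamma^{\pm\frac12}) = \deg(\gamma'^{\,\pm\frac12}) = \deg(D^{\pm1}) = \deg(D'^{\,\pm1}) = 0, \]
\[ \deg(a_i(\pm\ell)) = 0, \qquad \deg(x_i^{\pm}(k)) = \pm\alpha_i. \]
Hence, the defining relations (D1)–(D9) ensure that $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ has a triangular decomposition:
\[ \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n}) = \mathcal U_{r,s}(\widetilde{\mathfrak n}^-) \otimes \mathcal U^0_{r,s}(\widehat{\mathfrak{sl}_n}) \otimes \mathcal U_{r,s}(\widetilde{\mathfrak n}), \]
where $\mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm}) = \bigoplus_{\alpha \in \dot Q^{\pm}} \mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm})_\alpha$ is generated respectively by $x_i^{\pm}(k)$ ($i \in I$), and $\mathcal U^0_{r,s}(\widehat{\mathfrak{sl}_n})$ is the subalgebra generated by $\omega_i^{\pm1}$, $\omega_i'^{\,\pm1}$, $\gamma^{\pm\frac12}$, $\gamma'^{\,\pm\frac12}$, $D^{\pm1}$, $D'^{\,\pm1}$ and $a_i(\pm\ell)$ for $i \in I$, $\ell \in \mathbb N$. Namely, $\mathcal U^0_{r,s}(\widehat{\mathfrak{sl}_n})$ is generated by the toral subalgebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})^0$ and the quantum Heisenberg subalgebra $\mathcal H_{r,s}(\widehat{\mathfrak{sl}_n})$ generated by those quantum imaginary root vectors $a_i(\pm\ell)$ ($i \in I$, $\ell \in \mathbb N$).

Definition 3.6. For $\alpha, \beta \in \dot Q^+$ (a positive root lattice of $\mathfrak{sl}_n$), $x_\alpha^{\pm}(k)$, $x_\beta^{\pm}(k') \in \mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm})$, we define their “affine” quantum Lie bracket as follows:
\[ [\,x_\alpha^{\pm}(k), x_\beta^{\pm}(k')\,]_{\langle \omega_\alpha', \omega_\beta\rangle^{\mp1}} := x_\alpha^{\pm}(k)\, x_\beta^{\pm}(k') - \langle \omega_\alpha', \omega_\beta\rangle^{\mp1}\, x_\beta^{\pm}(k')\, x_\alpha^{\pm}(k). \tag{3.7} \]
By Definition 3.6, the formula (D7) will take the convenient form
\[ [\,x_i^{\pm}(k), x_j^{\pm}(k'+1)\,]_{\langle i, j\rangle^{\mp1}} = -\big(\langle j, i\rangle \langle i, j\rangle^{-1}\big)^{\pm\frac12}\, [\,x_j^{\pm}(k'), x_i^{\pm}(k+1)\,]_{\langle j, i\rangle^{\mp1}}. \tag{3.8} \]
By (3.6), the $(r, s)$-Serre relations (D9$_2$) & (D9$_3$) for $m_1 = m_2$ in the case of $a_{ij} = -1$ can be reformulated as:
\[ [\,x_i^{\pm}(m), x_i^{\pm}(m), x_j^{\pm}(k)\,]_{(r^{\pm1},\, s^{\pm1})} = 0, \qquad \text{for } 1 \le i < j < n, \]
\[ [\,x_i^{\pm}(m), x_i^{\pm}(m), x_j^{\pm}(k)\,]_{(s^{\mp1},\, r^{\mp1})} = 0, \qquad \text{for } 1 \le j < i < n. \tag{3.9} \]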
Remark 3.7. (1) For any nonsimple root $\alpha$ ($\neq \alpha_i$) ($i \in I$), the meaning of the notation $x_\alpha^{+}(k)$ (resp. $x_\alpha^{-}(k)$) in Definition 3.6 is somewhat ambiguous; as is well known, even the quantum “classical” root vectors $x_\alpha^{+}(0)$ admit different linearly independent choices. However, the combinatorial approach to Lyndon words, together with the “affine” quantum Lie bracket, will give us a valid and specific choice for $x_\alpha^{+}(k)$ which leads to a construction of the quantum “affine” Lyndon basis for $\mathcal U_{r,s}(\widetilde{\mathfrak n})$; applying $\tau$ to it yields a corresponding construction of the quantum “affine” Lyndon basis for $\mathcal U_{r,s}(\widetilde{\mathfrak n}^-)$ (see Proposition 3.10 & Theorem 3.11 below).

(2) In fact, (3.8) describes a kind of consistency constraint on quantum affine root vectors defined by Lyndon words of different levels (say, $x_j^{\pm}(k)$ has level $k$), which obeys the defining rule of the Lyndon basis (see below) via Lyndon words as in the classical types, since from (3.8) we get the level-shifting formula
\[ [\,x_i^{\pm}(k), x_j^{\pm}(k'+1)\,]_{\langle i, j\rangle^{\mp1}} = \langle i, i\rangle^{\mp\frac{a_{ij}}{2}} \Big( [\,x_i^{\pm}(k+1), x_j^{\pm}(k')\,]_{\langle i, j\rangle^{\mp1}} + \big(\langle i, j\rangle^{\mp1} - \langle j, i\rangle^{\pm1}\big)\, x_j^{\pm}(k')\, x_i^{\pm}(k+1) \Big). \tag{3.10} \]
Based on this formula, it is reasonable to define the quantum affine root vector $x_\alpha^{\pm}(k)$ as in (3.14) & (3.15) below, so that the level $k$ is completely concentrated on the component of the lowest index among the ordered constituents of the Lyndon basis. This will be clear from the proof of Proposition 3.10.

(3) Let $\mathcal U_{r,s}(\mathfrak n)$ denote the subalgebra of $\mathcal U_{r,s}(\widetilde{\mathfrak n})$ generated by $x_i^{+}(0)$ ($i \in I$). By definition, it is clear that $\mathcal U_{r,s}(\mathfrak n) \cong U_{r,s}(\mathfrak n)$, the subalgebra of $U_{r,s}(\mathfrak{sl}_n)$ generated by $e_i$ ($i \in I$) (see [BGH1, Remarks (2), p. 391]).

Now let us recall the construction of a Lyndon basis. The natural ordering $<$ in $I$ gives a total ordering of the alphabet $A = \{x_1^{+}(0), \cdots, x_{n-1}^{+}(0)\}$. Let $A^*$ be the set of all words in the alphabet $A$ (including the vacuum 1) and let $u < v$ denote that the word $u$ is lexicographically smaller than the word $v$. Recall that a word $\ell \in A^*$ is a Lyndon word if it is lexicographically smaller than all its proper right factors (cf. [LR,R2,BH]). Let $\mathbb K[A^*]$ be the associative algebra of $\mathbb K$-linear combinations of words in $A^*$ whose product is juxtaposition, namely, a free $\mathbb K$-algebra. Let $J$ be the $(r, s)$-Serre ideal of $\mathbb K[A^*]$ generated by the elements $\{(\mathrm{ad}\, x_i^{+}(0))^{1 - a_{ij}}(x_j^{+}(0)) \mid 1 \le i \neq j \le n - 1\}$. Clearly, $\mathcal U_{r,s}(\mathfrak n) = \mathbb K[A^*]/J$. Now introduce another ordering $\preceq$ in $A^*$, using the usual length function $|\cdot|$ for each word $u \in A^*$: we say $u \preceq w$ if $|u| < |w|$, or $|u| = |w|$ and $u \ge w$. Then we call a (Lyndon) word good with respect to the $(r, s)$-Serre ideal $J$ if it cannot be written as a sum of strictly smaller words modulo $J$ with respect to the ordering $\preceq$. From [R2], the set of quantum Lie brackets (or say, $q$-bracketings) of all good Lyndon words forms a system of quantum root vectors of $\mathcal U_{r,s}(\mathfrak n)$. More precisely, we have a construction for any quantum root vector $x_\alpha^{+}(0)$ with $\alpha \in \dot\Phi^+$ (a positive root system of $\mathfrak{sl}_n$) in the following.
Take a corresponding ordering (compatible with the natural ordering $<$ on $I$) of $\dot\Phi^+ = \{\alpha_{ij} := \alpha_i + \alpha_{i+1} + \cdots + \alpha_{j-1} = \varepsilon_i - \varepsilon_j \mid 1 \le i < j \le n\}$ with $\alpha_{i,i+1} = \alpha_i$ as follows (see [H, p. 533]):
\[ \alpha_{12},\ \alpha_{13},\ \alpha_{14},\ \cdots,\ \alpha_{1n},\ \alpha_{23},\ \alpha_{24},\ \cdots,\ \alpha_{2n},\ \cdots,\ \alpha_{n-1,n}, \tag{3.11} \]
which is a convex ordering on $\dot\Phi^+$ (for the definition, see [R2, Sect. 6]). Hence, for each $\alpha = \alpha_{ij} \in \dot\Phi^+$, by [R2], we can construct the quantum root vector $x_\alpha^{+}(0)$ as an $(r, s)$-bracketing of a good Lyndon word in the inductive fashion:
\[ x_{\alpha_{ij}}^{+}(0) := [\,x_{\alpha_{i,j-1}}^{+}(0), x_{j-1}^{+}(0)\,]_{\langle \omega_{\alpha_{i,j-1}}', \omega_{j-1}\rangle^{-1}} = [\,\cdots [\,x_i^{+}(0), x_{i+1}^{+}(0)\,]_{\langle i, i+1\rangle^{-1}}, \cdots, x_{j-1}^{+}(0)\,]_{\langle \omega_{\alpha_{i,j-1}}', \omega_{j-1}\rangle^{-1}} = [\,\cdots [\,x_i^{+}(0), x_{i+1}^{+}(0)\,]_r, \cdots, x_{j-1}^{+}(0)\,]_r. \tag{3.12} \]
Applying $\tau$ to (3.12), we can obtain the definition of the quantum root vector $x_{\alpha_{ij}}^{-}(0)$ as below:
\[ x_{\alpha_{ij}}^{-}(0) = \tau\big(x_{\alpha_{ij}}^{+}(0)\big) = [\,x_{j-1}^{-}(0), \cdots, [\,x_{i+1}^{-}(0), x_i^{-}(0)\,]_s \cdots\,]_s. \tag{3.13} \]
Theorem 3.8. (i) The set
\[ \Big\{ x_{\alpha_{n-1,n}}^{+}(0)^{\ell_{n-1,n}} \cdots x_{\alpha_{23}}^{+}(0)^{\ell_{23}}\, x_{\alpha_{1n}}^{+}(0)^{\ell_{1n}} \cdots x_{\alpha_{13}}^{+}(0)^{\ell_{13}}\, x_{\alpha_{12}}^{+}(0)^{\ell_{12}} \ \Big|\ \ell_{ij} \ge 0 \Big\} \]
is a Lyndon basis of $\mathcal U_{r,s}(\mathfrak n)$.

(ii) The set
\[ \Big\{ x_{\alpha_{12}}^{-}(0)^{\ell_{12}}\, x_{\alpha_{13}}^{-}(0)^{\ell_{13}} \cdots x_{\alpha_{1n}}^{-}(0)^{\ell_{1n}}\, x_{\alpha_{23}}^{-}(0)^{\ell_{23}} \cdots x_{\alpha_{n-1,n}}^{-}(0)^{\ell_{n-1,n}} \ \Big|\ \ell_{ij} \ge 0 \Big\} \]
where τ xα±i j (±k) = xα∓i j (∓k). (k) For each fixed α ∈ Q˙ + , let us denote by Ur,s ( n ) the subspace of Ur,s ( n )α , consistα (k) ˙+ n )α = k∈Z Ur,s ( n )α . When α = αi ∈ ing of elements of level k. Hence, Ur,s ( (k) is a simple root, by definition, dim Ur,s ( n )αi = 1 for any level k. However, for any (k) nonsimple root α = αi (i ∈ I ), dim Ur,s ( n )α = ∞ for any level k. In this case, given a ˙ + , we call a tuple (β j1 , · · · , β jν ) (ν ≥ 1) a partition of root positive root α = αi j ∈ αi j if β j1 < · · · < β jν in the ordering given in (3.11) such that β j1 + · · · + β jν = αi j . If ν > 1, we say this partition is proper. Denote by P◦ (α) the set of all proper partitions of (k ) (k ) (k) root α. Obviously, we have Ur,sν ( n )β jν · · · Ur,s1 ( n )β j1 ⊆ Ur,s ( n )α if k1 + · · · + kν = k. Now we write (kν ) (k1 ) (k) (k) n ) := Ur,s ( n )β jν · · · Ur,s ( n )β j1 ⊆ Ur,s ( n )α α ( (β j ,··· ,β jν )∈P ◦ (α) 1 k1 +···kν =k
(k)
for the subspace of Ur,s ( n )α spanned by basis elements’ products of level k from those proper partitions pertaining to α. Using the Q-antiautomorphism τ on (−k) ( n ), we get α n− ) := τ (−k) ( n) . (k) α ( α (k) ± ˙ + , whose proof shows Then we have the following description on Ur,s ( n )α for α ∈ that Definition 3.9 makes sense.
˙ + (a positive root system of sln ), we Proposition 3.10. For 1 ≤ i < j ≤ n and αi j ∈ have (k) (k) (i) Ur,s ( n )αi j = Kxα+i j (k) αi j ( n ), (k) − (k) (ii) Ur,s ( n )αi j = Kxα−i j (k) αi j ( n− ). Proof. (i) We will use an induction on rank n, where n ≥ 2. Assume that i < j and k > 0, then by (3.10), we have a + + − 2i j + + x = i, i
(k+1), x (k −1) xi (k), x j (k ) i j i, j −1 i, j −1 + i, j −1 − j, i x +j (k −1) xi+ (k+1) , (3.16)
xi+ (k), x +j (−k )
ai j = i, i 2 xi+ (k−1), x +j (−k +1) i, j −1 + j, i − i, j −1 x +j (−k ) xi+ (k).
i, j −1
(3.17)
When n = 2, for any k ∈ N, repeatedly using (3.16) & (3.17), we get + k x1 (k), x2+ (k ) r = 1, 1 2 xα+13 (k+k )
+
k
1, 1
k −t+1 2
(r −s) x2+ (t−1) x1+ (k+k −t+1)
t=1 k
x1+ (k), x2+ (−k )
r
) ( n ), ≡ 1, 1 2 xα+13 (k+k ) mod α(k+k 13 k
= 1, 1 − 2 xα+13 (k−k )
+
k
1, 1 −
k −t 2
(s−r ) x2+ (−t) x1+ (k−k +t)
t=1 k
) ( n ), ≡ 1, 1 − 2 xα+13 (k−k ) mod (k−k α13
which means that in both cases, we have + k ) ( n ), for any k ∈ Z. x1 (k), x2+ (k ) r ≡ 1, 1 2 xα+13 (k+k ) mod α(k+k 13 Therefore, in rank 2 case, any elements for x2+(k )x1+ (k)) of degree α13 generated (except + + + by x1 (k) and x2 (k ) are of the form: x1 (k), x2+ (k ) a for any a ∈ K; however, + x1 (k), x2+ (k ) a = x1+ (k), x2+ (k ) r + (r − a) x2+ (k ) x1+ (k) k
) ( n ). ≡ (r s −1 ) 2 xα+13 (k+k ) mod α(k+k 13
(3.18)
This fact shows that (k) Ur,s ( n )α13 = Kxα+13 (k)
(k) n) α13 (
(k) − as vector spaces. Dually, we also have Ur,s ( n )α13 = Kxα−13 (k) (k) n− ) as vector α13 ( spaces. Now we assume that we have proved the results for rank < n, that is, for those αi j with 1 ≤ i < j < n. For the rank n case, owing to the ordering given in (3.11), we are (k) ± left to prove the remaining cases: Ur,s ( n )αin with 1 ≤ i < j = n. In view of the same observation as (3.18), we need only consider the following ele+ (k ) for 1 ≤ i < ments of degree αin and level k + k generated by xα+i,n−1 (k) and xn−1 + (k ) + (k ) . By definition (see = xα+i,n−1 (k), xn−1 n: xα+i,n−1 (k), xn−1 −1 ωα
i,n−1
, ωn−1
r
(3.14)) and using (3.4), (3.5) & (3.1), we have
+ + + xα+i,n−1 (k), xn−1 (k ) = xα+i,n−2 (k), xn−2 (0) , xn−1 (k ) (using (3.4)) r r r + + = xα+i,n−2 (k), xn−2 (0), xn−1 (k ) r r + + + xαi,n−2 (k), xn−1 (k ) , xn−2 (0) +r (2nd term = 0 by (3.5) & (D91 )) + + = xα+i,n−2 (k), xn−2 (0), xn−1 (k ) r (using (3.18): rank 2 case) r −1 k2 + + + xαi,n−2 (k), xn−2 (k ) , xn−1 (0) = (r s ) r r +
(using the inductive hypothesis) + + ∗t (r − s) xα+i,n−2 (k), xn−1 (t) xn−2 (k −t)
r
t
(using (3.1))
) + + mod α(k+k xα+i,n−1 (k+k ), xn−1 (0) ( n ), x (0) n−1 i,n−1 r r (by definition) + + ∗t (r − s) xn−1 (t) xα+i,n−2 (k), xn−2 (k −t) + r t (using the inductive hypothesis)
≡ (r s −1 )
k (n−1−i) 2
k (n−1−i)
) ( n) ≡ (r s −1 ) 2 xα+in (k+k ) mod α(k+k in −t) + + + ∗t (r − s) xn−1 (t) xα+i,n−1 (k+k −t) mod xn−1 (t) α(k+k ( n) i,n−1 t
≡ (r s −1 )
k (n−1−i) 2
) xα+in (k+k ) mod α(k+k ( n ), in
where in the 1st “≡”, we used the following fact: + + + + xα+i,n−2 (k), xn−1 (t) xn−2 (k −t) = xn−1 (t) xα+i,n−2 (k), xn−2 (k −t) r r + + + + xαi,n−2 (k), xn−1 (t) xn−2 (k −t) (2nd term = 0 by (3.5) & (D91 )) + + (k −t) ; = xn−1 (t) xα+i,n−2 (k), xn−2 r
while in the 2nd “≡”, we used the facts: ) ) + α(k+k ( n ), xn−1 (0) ⊆ α(k+k ( n ), i,n−1 in −t) + xn−1 (t) α(k+k ( n) i,n−1
r
) ⊆ α(k+k ( n ). in (k+k )
The latter is clear, due to the definition of αin ( n ). As for the first inclusion, we have the following argument provided that we notice the basis elements’ constituents of (k+k ) αi,n−1 ( n ). Indeed, for any basis element xα+ν ,n−1 (kν ) xα+
ν−1 ,ν
) (kν−1 ) · · · xα+i, (k1 ) ∈ (k+k n) αi,n−1 ( 1
of level k + k pertaining to a partition of αi,n−1 , using (3.2), we have + xα+ν ,n−1 (kν ) xα+ ,ν (kν−1 ) · · · xα+i, (k1 ), xn−1 (0) ν−1 1 r + + + + = xαν ,n−1 (kν ), xn−1 (0) xα ,ν (kν−1 ) · · · xαi, (k1 ) (by definition) ν−1 1 r + (0) + xα+ν ,n−1 (kν ) xα+ ,ν (kν−1 ) · · · xα+i, (k1 ), xn−1 1 ν−1 (2nd term = 0 by (3.5) & (D91 ) since 1 < · · · < ν < n−1) = xα+ν ,n (kν ) xα+
ν−1 ,ν
(kν−1 ) · · · xα+i, (k1 ) 1
) ( n ), ∈ α(k+k in
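here $k_1 + \cdots + k_\nu = k + k'$. Up to now, we have finished the proof of (i). Applying $\tau$ to (i), we get the second statement (ii).

The argument above (which in fact used the so-called quantum calculations) implies the important conclusions about the quantum affine Lyndon basis that we present below.

Theorem 3.11. (i) The set
\[
\Bigl\{\, \prod_{i\in\mathbb Z} x^+_{\alpha_{n-1,n}}(i)^{\ell^{(i)}_{n-1,n}} \cdots \prod_{i\in\mathbb Z} x^+_{\alpha_{1n}}(i)^{\ell^{(i)}_{1n}} \cdots \prod_{i\in\mathbb Z} x^+_{\alpha_{12}}(i)^{\ell^{(i)}_{12}} \;\Bigm|\; \ell^{(i)}_{st} \ge 0 \,\Bigr\}
\]
is an “affine” Lyndon basis of $\mathcal U_{r,s}(\widetilde{\mathfrak n}^+)$, where each index set $I_{\alpha_{st}} = \{\, i \in \mathbb Z \mid \ell^{(i)}_{st} \neq 0 \,\}$ is finite.

(ii) The set
\[
\Bigl\{\, \prod_{i\in\mathbb Z} x^-_{\alpha_{12}}(i)^{\ell^{(i)}_{12}} \cdots \prod_{i\in\mathbb Z} x^-_{\alpha_{1n}}(i)^{\ell^{(i)}_{1n}} \cdots \prod_{i\in\mathbb Z} x^-_{\alpha_{n-1,n}}(i)^{\ell^{(i)}_{n-1,n}} \;\Bigm|\; \ell^{(i)}_{st} \ge 0 \,\Bigr\}
\]
is an “affine” Lyndon basis of $\mathcal U_{r,s}(\widetilde{\mathfrak n}^-)$, where each index set $I_{\alpha_{st}} = \{\, i \in \mathbb Z \mid \ell^{(i)}_{st} \neq 0 \,\}$ is finite.

3.3. The following main theorem establishes the Drinfel’d isomorphism between the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ (in Definition 2.1) and the $(r,s)$-analogue $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ of the Drinfel’d quantum affinization of $U_{r,s}(\mathfrak{sl}_n)$ (in Definition 3.1), which affords the two-parameter Drinfel’d realization of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ as required.

Theorem 3.12 (Drinfel’d Isomorphism). For the Lie algebra $\mathfrak{sl}_n$ with $n > 2$, let $\theta = \alpha_{1n}$ be the maximal positive root. Then there exists an algebra isomorphism $\Psi : U_{r,s}(\widehat{\mathfrak{sl}_n}) \longrightarrow \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ defined by: for each $i \in I$,
\[
\begin{array}{rcl}
\omega_i &\longmapsto& \omega_i, \\
\omega'_i &\longmapsto& \omega'_i, \\
\omega_0 &\longmapsto& {\gamma'}^{-1}\,\omega_\theta^{-1}, \\
\omega'_0 &\longmapsto& \gamma^{-1}\,{\omega'_\theta}^{-1}, \\
\gamma^{\pm\frac12} &\longmapsto& \gamma^{\pm\frac12}, \\
{\gamma'}^{\pm\frac12} &\longmapsto& {\gamma'}^{\pm\frac12}, \\
D^{\pm1} &\longmapsto& D^{\pm1}, \\
{D'}^{\pm1} &\longmapsto& {D'}^{\pm1}, \\
e_i &\longmapsto& x_i^+(0), \\
f_i &\longmapsto& x_i^-(0), \\
e_0 &\longmapsto& x^-_{\alpha_{1n}}(1)\cdot({\gamma'}^{-1}\omega_\theta^{-1}) = x^-_\theta(1)\cdot({\gamma'}^{-1}\omega_\theta^{-1}), \\
f_0 &\longmapsto& (\gamma^{-1}{\omega'_\theta}^{-1})\cdot x^+_{\alpha_{1n}}(-1) = \tau\bigl(x^-_{\alpha_{1n}}(1)\cdot({\gamma'}^{-1}\omega_\theta^{-1})\bigr),
\end{array}
\]
where $\omega_\theta = \omega_1 \cdots \omega_{n-1}$ and $\omega'_\theta = \omega'_1 \cdots \omega'_{n-1}$.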
Since Lusztig’s symmetry of the braid group is no longer available in the two-parameter case when the rank of g is bigger than 2 (see [BGH1, Sect. 3]), Beck’s approach (using the extended braid group actions (see [B2]) to prove the Drinfel’d Isomorphism Theorem) is not valid for the two-parameter case treated here. Our treatment in the next section develops a valid and interesting algorithm based on the quantum calculations, which, as the reader has seen, was also applied successfully in the combinatorial approach to the quantum “affine” Lyndon basis (based on the Drinfel’d generators) introduced above. In some sense, our method also provides a new combinatorial proof via the quantum “affine” Lyndon basis even in the one-parameter setting.
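4. Proof of the Drinfel’d Isomorphism Theorem

4.1. Let $E_i, F_i, \omega_i, \omega'_i$ denote the images of $e_i, f_i, \omega_i, \omega'_i$ ($i \in I_0$) in the algebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ under the mapping $\Psi$, respectively.

Denote by $\mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n})$ the subalgebra of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ generated by $E_i, F_i, \omega_i^{\pm1}, {\omega'_i}^{\pm1}$ ($i \in I_0$), $\gamma^{\pm\frac12}, {\gamma'}^{\pm\frac12}, D^{\pm1}$ and ${D'}^{\pm1}$, that is,
\[
\mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n}) := \bigl\langle\, E_i,\ F_i,\ \omega_i^{\pm1},\ {\omega'_i}^{\pm1},\ \gamma^{\pm\frac12},\ {\gamma'}^{\pm\frac12},\ D^{\pm1},\ {D'}^{\pm1} \;\bigm|\; i \in I_0 \,\bigr\rangle .
\]

Thereby, proving the Drinfel’d Isomorphism Theorem (Theorem 3.12) is equivalent to proving the following three theorems:

Theorem 4.1. $\Psi : U_{r,s}(\widehat{\mathfrak{sl}_n}) \longrightarrow \mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n})$ is an epimorphism.

Theorem 4.2. $\mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n}) = \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$.

Theorem 4.3. $\Psi : U_{r,s}(\widehat{\mathfrak{sl}_n}) \longrightarrow \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ is injective.

4.2. Proof of Theorem 4.1. We shall check that the elements $E_i, F_i, \omega_i, \omega'_i$ ($i \in I_0$), $\gamma^{\pm\frac12}, {\gamma'}^{\pm\frac12}, D^{\pm1}, {D'}^{\pm1}$ satisfy the defining relations (A1)–(A7) of $U_{r,s}(\widehat{\mathfrak{sl}_n})$. First of all, the defining relations of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ imply that $E_i, F_i, \omega_i, \omega'_i$ ($i \in I$) generate a subalgebra $\mathcal U_{r,s}(\mathfrak{sl}_n)$ of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$, which is isomorphic to $U_{r,s}(\mathfrak{sl}_n)$. So we are left to check the relations involving the index $i = 0$.

Obviously, the relations of (A1) hold, according to the defining relations of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$.

For (A2): we check only the following three relations involving $i = 0$; the remaining relations in (A2) are checked in the same way. Using (D4), we get
\[
D\,E_0\,D^{-1} = D\,x^-_\theta(1)\,D^{-1}\cdot({\gamma'}^{-1}\omega_\theta^{-1}) = r\,E_0 .
\]
For $0 \le j < n$, noting that $\langle {\omega'_\theta}^{-1}, \omega_j\rangle = \langle \gamma^{-1}{\omega'_\theta}^{-1}, \omega_j\rangle = \langle \omega'_0, \omega_j\rangle$ (by Proposition 2.4), we have
\[
\omega_j\,E_0\,\omega_j^{-1} = \omega_j\,x^-_\theta(1)\,({\gamma'}^{-1}\omega_\theta^{-1})\,\omega_j^{-1}
= \langle \omega'_{n-1}, \omega_j\rangle^{-1} \cdots \langle \omega'_1, \omega_j\rangle^{-1}\,E_0 = \langle \omega'_0, \omega_j\rangle\,E_0 .
\]
For $0 \le i < n$, $\omega_0\,E_i\,\omega_0^{-1} = ({\gamma'}^{-1}\omega_\theta^{-1})\,E_i\,(\gamma'\omega_\theta) = \omega_\theta^{-1}\,E_i\,\omega_\theta$. When $i \neq 0$, since $\langle \omega'_i, \omega_\theta\rangle^{-1} = \langle \omega'_i, \omega_0\rangle$ (by Proposition 2.4), we obtain
\[
\omega_\theta^{-1}\,E_i\,\omega_\theta = \omega_\theta^{-1}\,x_i^+(0)\,\omega_\theta = \langle \omega'_i, \omega_\theta\rangle^{-1}\,x_i^+(0) = \langle \omega'_i, \omega_0\rangle\,E_i ;
\]
and when $i = 0$, since $\langle \omega'_\theta, \omega_\theta^{-1}\rangle^{-1} = \langle \omega'_0, \omega_0\rangle$ (by Proposition 2.4), we have
\[
\omega_\theta^{-1}\,E_0\,\omega_\theta = \omega_\theta^{-1}\,x^-_\theta(1)\,({\gamma'}^{-1}\omega_\theta^{-1})\,\omega_\theta = \langle \omega'_0, \omega_0\rangle\,E_0 .
\]
Similarly, one can verify the relations in (A3).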
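For (A4): first of all, when $i \neq 0$, we see that
\[
[\,E_0, F_i\,] = \bigl[\,x^-_\theta(1)\cdot({\gamma'}^{-1}\omega_\theta^{-1}),\ x_i^-(0)\,\bigr]
= -\bigl[\,x_i^-(0),\ x^-_\theta(1)\,\bigr]_{\langle \omega'_i,\,\omega_\theta\rangle}\,({\gamma'}^{-1}\omega_\theta^{-1}) .
\]
According to the corresponding cross relations that hold in $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$, we claim the following crucial lemma, whose proof, using the typical quantum calculations, is technical.

Lemma 4.4. $\bigl[\,x_i^-(0),\ x^-_\theta(1)\,\bigr]_{\langle \omega'_i,\,\omega_\theta\rangle} = 0$, for $i \in I$.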
Proof. (I) When i = 1, ω1 , ω0 = ω2 , ω1 = s, and ω1 , ωθ = s −1 . By (3.8) & (3.9), we have − − − − − (by (3.8)) x1 (0), xα13 (1) s −1 = x1 (0), x2 (0), x1 (1) 2,1
s −1 1 = −( 1, 2 −1 2, 1 ) 2 x1− (0), x1− (0), x2− (1) 1,2 −1 s 1 − − − 2 x1 (0), x1 (0), x2 (1) (r −1 ,s −1 ) = 0. = −(r s) (by (3.9)) Hence, repeatedly using (3.3), we have − − − − − x1 (0), xα1n (1) s −1 = x (0), xn−1 (0) , xα1,n−1 (1) (= 0 by (D91 )) 1 − (0), x1− (0), xα−1,n−1 (1) −1 (by (3.3)) + xn−1 s s − = xn−1 (0), x1− (0), xα−1,n−1 (1) −1 s s = ··· =
(inductively using (3.3) & (D91 ))
− xn−1 (0),
− xn−2 (0), · · ·
,
x1− (0),
xα−13 (1) s −1
··· (s,··· ,s)
= 0. , ω0 = r −1 , that is, ωn−1 , ωθ = r . By (3.3), (3.9) & (II) When i = n − 1, ωn−1 (D91 ), we have − xn−1 (0), xα−1n (1) r (by definition) ⎤ ⎡ − − − ⎦ (0), xn−1 (0), xn−2 (0), xα−1,n−2 (1) (using (3.3)) = ⎣ xn−1 s s r − − − (this term using (3.3)) = xn−1 (0), xn−1 (0), xn−2 (0) s , xα−1,n−2 (1) s r − − − − + s xn−1 (0), xn−2 (0), xn−1 (0), xα1,n−2 (1) (= 0 by (3.5), (D91 )) r
=
− − − xn−1 (0), xn−1 (0), xn−2 (0) (s,r ) ,
+r
− − xn−1 (0), xn−2 (0)
xα−1,n−2 (1)
(this term= 0 by (3.9)) s
− , xn−1 (0), xα−1,n−2 (1)
s
= 0.
(= 0 by (3.5), (D91 )) r −1 s
(III) When 1 < i < n − 1, ωi , ω0 = 1, that is, ωi , ωθ = 1. In order to derive the required result, we first need tomake two claims below: Claim (A). xi− (0), xα−1,i+1 (1) = xi− (0), xα−1,i+1 (1) = 0, for i ≥ 2. ωi ,ωα1,i+1
r
In fact, by (3.3), (3.9) & (D91 ), we have
xi− (0), xα−1,i+1 (1) ⎡ =
⎣ x − (0), i
= xi− (0),
r
xi− (0),
(by definition)
− xi−1 (0),
xα−1,i−1 (1)
⎤
⎦ r
s s
(using (3.3))
− (this term using (3.3)) xi− (0), xi−1 (0) s , xα−1,i−1 (1) s r − − − − + s xi (0), xi−1 (0), xi (0), xα1,i−1 (1) (= 0 by (3.5), (D91 )) r − − = xi (0), xi− (0), xi−1 (0) (s,r ) , xα−1,i−1 (1) (= 0 by (3.9)) s − − − − (= 0 by (3.5), (D91 )) +r xi (0), xi−1 (0) s , xi (0), xα1,i−1 (1) r −1 s
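= 0.

Claim (B). $\bigl[\,x_i^-(0),\ x^-_{\alpha_{1,i+2}}(1)\,\bigr]_{\langle \omega'_i,\,\omega_{\alpha_{1,i+2}}\rangle} = \bigl[\,x_i^-(0),\ x^-_{\alpha_{1,i+2}}(1)\,\bigr] = 0$ $(i \ge 2)$, if $r \neq -s$.

By definition, we note that $[\,b, a\,]_u = -u\,[\,a, b\,]_{u^{-1}}$. So, we get
\[
\begin{aligned}
\bigl[\,x_i^-(0),\ x_{i+1}^-(0),\ x_i^-(0)\,\bigr]_{(s,\,r^{-1})}
&= -s\,\bigl[\,x_i^-(0),\ x_i^-(0),\ x_{i+1}^-(0)\,\bigr]_{(s^{-1},\,r^{-1})} && \text{(by (3.6))} \\
&= -s\,\bigl[\,x_i^-(0),\ x_i^-(0),\ x_{i+1}^-(0)\,\bigr]_{(r^{-1},\,s^{-1})} && \text{(by (3.9))} \\
&= 0 .
\end{aligned}
\]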
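The swap rule quoted above is a one-line consequence of the definition. The following check is only a sketch: it assumes the twisted commutator is $[a,b]_u = ab - u\,ba$ with $u$ an invertible scalar, and that the nested bracket means $[a,b,c]_{(u,v)} = [a,[b,c]_u]_v$. Both conventions are inferred from how the rule is used here, not quoted from the definitions in Section 3.

```latex
% Assumed convention: [a,b]_u = ab - u ba, with u an invertible scalar.
\begin{align*}
[\,b,a\,]_u &= ba - u\,ab \\
            &= -u\,\bigl(ab - u^{-1}\,ba\bigr) \\
            &= -u\,[\,a,b\,]_{u^{-1}} .
\end{align*}
% Applied to the inner bracket with a = x_i^-(0), b = x_{i+1}^-(0), u = s:
\begin{align*}
\bigl[\,x_{i+1}^-(0),\,x_i^-(0)\,\bigr]_s
  = -s\,\bigl[\,x_i^-(0),\,x_{i+1}^-(0)\,\bigr]_{s^{-1}} .
\end{align*}
```

Under these assumptions, this is the step that turns the subscript pair $(s, r^{-1})$ into $(s^{-1}, r^{-1})$ at the cost of the scalar factor $-s$.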
We then consider the following deduction: − − − − − − xi (0), xα1,i+2 (1) −1 = xi (0), xi+1 (0), xi (0), xα1i (1) s (by (3.3)) r s s r −1 s − = xi− (0), xi+1 (0), xi− (0) s , xα−1i (1) s −1 (using (3.3)) r s − − − − (this term= 0 by (3.5), (D91 )) + s xi (0), xi (0), xi+1 (0), xα1i (1) r −1 s − − − − = xi (0), xi+1 (0), xi (0) (s, r −1 ) , xα1i (1) (this term= 0 by the above) 2 s − + r −1 xi+1 (0), xi− (0) s , xα−1,i+1 (1) (using (3.4)) rs ⎡ − − −1 ⎣ − + xα−1,i+2 (1), xi− (0) −1 xi+1 (0), xi (0), xα1,i+1 (1) =r r s r s 2 = xα−1,i+2 (1), xi− (0) −1 (1st term= 0 by Claim (A)). r
s
Expanding both sides of the above equation according to definition, we easily get (1 + r −1 s) xi− (0), xα−1,i+2 (1) = 0. Thus the required result is obtained under the assumption. Now applying (3.5), we can get − − [ xi− (0), xθ− (1) ] = xn−1 (0), · · · , xi+2 (0), xi− (0), xα−1,i+2 (1) (s,··· ,s) =0
(by Claim (B)).
This completes the proof of Lemma 4.4.
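The three pairing values used in cases (I)–(III) can be recomputed by hand. The sketch below assumes the usual type A convention $\langle \omega'_i, \omega_j\rangle = r^{(\varepsilon_j,\alpha_i)}\, s^{(\varepsilon_{j+1},\alpha_i)}$ with $\alpha_i = \varepsilon_i - \varepsilon_{i+1}$. This is an assumption, since the pairing is defined outside this section (see Proposition 2.4); it does reproduce the value $\langle \omega'_2, \omega_1\rangle = s$ quoted in case (I). It also uses $\omega_\theta = \omega_1 \cdots \omega_{n-1}$ and $n > 2$.

```latex
% Nonzero pairings for 1 <= i, j <= n-1 under the assumed convention
\[
\langle\omega_i',\omega_i\rangle = r s^{-1},\qquad
\langle\omega_i',\omega_{i-1}\rangle = s,\qquad
\langle\omega_i',\omega_{i+1}\rangle = r^{-1},\qquad
\langle\omega_i',\omega_j\rangle = 1 \quad (|i-j|>1).
\]
% Pairing with \omega_\theta, using multiplicativity in the second slot
\begin{align*}
i=1:\quad & \langle\omega_1',\omega_\theta\rangle
   = \langle\omega_1',\omega_1\rangle\,\langle\omega_1',\omega_2\rangle
   = r s^{-1}\cdot r^{-1} = s^{-1},\\
i=n-1:\quad & \langle\omega_{n-1}',\omega_\theta\rangle
   = \langle\omega_{n-1}',\omega_{n-2}\rangle\,\langle\omega_{n-1}',\omega_{n-1}\rangle
   = s\cdot r s^{-1} = r,\\
1<i<n-1:\quad & \langle\omega_i',\omega_\theta\rangle
   = \langle\omega_i',\omega_{i-1}\rangle\,\langle\omega_i',\omega_i\rangle\,\langle\omega_i',\omega_{i+1}\rangle
   = s\cdot r s^{-1}\cdot r^{-1} = 1 .
\end{align*}
```

These values agree with the subscripts $s^{-1}$, $r$ and $1$ carried by the twisted commutators in (I), (II) and (III). As a sanity check on the assumed convention, setting $r = q$, $s = q^{-1}$ turns the pairing into $q^{a_{ij}}$, with $(a_{ij})$ the Cartan matrix of type A.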
Next, we turn to checking the relation below, whose argument (using the quantum calculations) is crucial to our verification of the compatibility of the system of defining relations of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ mentioned in Remark 3.3.

Proposition 4.5. $[\,E_0, F_0\,] = \dfrac{\omega_0 - \omega'_0}{r - s}$.
Proof. Using (D1) & (D5), we have −1 [ E 0 , F0 ] = xα−1n (1) γ −1 ωθ −1 , γ −1 ωθ xα+1n (−1) −1 = xα−1n (1), xα+1n (−1) · (γ −1 γ −1 ωθ −1 ωθ ). Note that for j ≥ 1, we have − x− j+1 (0), ω j s = (r − s) ω j x j+1 (0), ωj X, x +j+1 (k) = ωj X, x +j+1 (k) , r + x j (k), Y ω j+1 = x +j (k), Y ω j+1 . r
x− j+1 (0), ω j
s
= 0,
(4.1)
(4.2) (4.3) (4.4)
So (4.2) implies that there hold − + (0), x (0), x (0) = ωj x − x− j j+1 j j+1 (0), s − − x2 (0), x1 (1), x1+ (−1) s = γ ω1 x2− (0), − + (0), x (0) , x (0) = −x − x− j+1 j+1 j j (0) ω j+1 . s
Now let us write briefly
+ (0) x1+ (−1), x2+ (0), · · · , xi−1
r,··· ,r
:=
(4.5) (4.6) (4.7)
+ · · · x1+ (−1), x2+ (0) r , · · · r , xi−1 (0) . r
Thus, by (3.5), we have − + xα1i (1), xα+1i (−1) = xα−1i (1), x1+ (−1), x2+ (0), · · · , xi−1 (0) r,··· ,r
− + + + = xα (1), x1 (−1) , x2 (0), · · · , xi−1 (0) 1i r,··· ,r
i−1 + . x1+ (−1), x2+ (0), · · · , xα−1i (1), x +j (0) , · · · j=2 r,··· ,r
(4.8) (i) For j = 1, by (3.5), (D8) & (4.6), we have − − xα1i (1), x1+ (−1) = xi−1 (0), · · · , x2− (0), x1− (1), x1+ (−1) s (by (4.6)) (s,··· ,s) = γ ω1 xα−2i (0),
so that
M(i) : =
xα−1i (1),
(i > 2),
x1+ (−1)
, x2+ (0), · · ·
+ xi−1 (0)
r,··· ,r
+ ω1 xα−2i (0), x2+ (0) r , · · · , xi−1 (0) r,··· ,r
+ = γ ω1 (0) xα− (0), x2+ (0) , · · · , xi−1 2i r,··· ,r
− + + = γ ω1 ω2 xα3i (0), x3 (0) r , · · · , xi−1 (0) r,··· ,r
− + = γ ω1 ω2 (0) xα3i (0), x3+ (0) , · · · , xi−1 r,··· ,r
=γ
= ···
− + = γ ω1 · · · ωi−2 xi−1 (0), xi−1 (0) = γ ω1 · · · ωi−2
ωi−1 − ωi−1
r −s
,
(i > 2),
where we used the following identities, respectively: ωj−1 xα−ji (0), x +j (0) = ωj−1 xα−ji (0), x +j (0) , (by (4.3)), r xα−ji (0), x +j (0) = ωj xα−j+1,i (0), (by (3.13) & (4.5)). (ii) For j = i − 1, again by (3.5), (3.3) & (4.7), we get − − − + + xα1i (1), xi−1 (0) = xi−1 (0), xi−1 (0) , xi−2 (0), xα−1,i−2 (1) (by (3.3)) s s − − + − (by (4.7)) = xi−1 (0), xi−1 (0) , xi−2 (0) s , xα1,i−2 (1) s − − − + xi−1 (0), xi−1 (0) , xα1,i−2 (1) + s xi−2 (0), (= 0) − = − xi−2 (0) ωi−1 , xα−1,i−2 (1) s − − = − xi−2 (0), xα1,i−2 (1) ωi−1 s
= − xα−1,i−1 (1) ωi−1 , where we notice that (*):
− + (0) , x − xi−1 (0), xi−1 (1) = 0. α1,i−2
Thereby, we further obtain N (i) : =
x1+ (−1),
=−
x2+ (0), · · ·
,
+ xα−1i (1), xi−1 (0)
xα+1,i−1 (−1), xα−1,i−1 (1) ωi−1
r
,··· r,··· ,r
(by (4.4))
= − xα+1,i−1 (−1), xα−1,i−1 (1) ωi−1 = xα−1,i−1 (1), xα+1,i−1 (−1) ωi−1 .
(iii) For 1 < j < i − 1, by (3.5), (3.3), (4.7) & (D91 ), we obtain ⎤ ⎡ − − + − ⎦ xα−1i (1), x +j (0) = ⎣ xi−1 (0), · · · x − j (0), x j (0) , x j−1 (0), x α1, j−1 (1) s s (s,··· ,s) =
(by (3.3)) − − − + − xi−1 (0), · · · x j (0), x j (0) , x j−1 (0) , xα1, j−1 (1) s (s,··· ,s)
(by (4.7))
+s
− xi−1 (0), · · ·
x− j−1 (0),
+ x− j (0), x j (0)
, xα−1, j−1 (1)
(s,··· ,s)
(= 0 by (*)) − − − − = − xi (0), · · · , x j+1 (0), x j−1 (0) ω j , xα1, j−1 (1) s (s,··· ,s) − = − xi− (0), · · · , x − j+1 (0), x α1 j (1) ω j s (s,··· ,s) − = − xi− (0), · · · x − (0), x (1) (by (3.5), (D91 )) ωj α j+1 1j (s,··· ,s) = 0, where in the fourth and fifth equality “=” we used the following identities, respectively: − − − − x− j−1 (0) ω j , x α1, j−1 (1) s = x j−1 (0), x α1, j−1 (1) s ω j = x α1 j (1) ω j , − − − x− (0), x (1) ω = x (0), x (1) ωj. j α1 j α1 j j+1 j+1 s
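As a result of (i), (ii) & (iii), (4.8) becomes
\[
\begin{aligned}
\bigl[\,x^-_{\alpha_{1i}}(1),\ x^+_{\alpha_{1i}}(-1)\,\bigr]
&= M(i) + N(i) \\
&= M(i) + \bigl[\,x^-_{\alpha_{1,i-1}}(1),\ x^+_{\alpha_{1,i-1}}(-1)\,\bigr]\,\omega_{i-1} \\
&= M(i) + M(i-1)\,\omega_{i-1} + \bigl[\,x^-_{\alpha_{1,i-2}}(1),\ x^+_{\alpha_{1,i-2}}(-1)\,\bigr]\,\omega_{i-2}\,\omega_{i-1} \\
&= \cdots \\
&= M(i) + M(i-1)\,\omega_{i-1} + M(i-2)\,\omega_{i-2}\,\omega_{i-1} + \cdots + M(3)\,\omega_3 \cdots \omega_{i-1} \\
&\qquad + \bigl[\,x^-_{\alpha_{12}}(1),\ x^+_{\alpha_{12}}(-1)\,\bigr]\,\omega_2 \cdots \omega_{i-1} \\
&= \frac{\gamma\,\omega'_{\alpha_{1i}} - \gamma'\,\omega_{\alpha_{1i}}}{r - s}, \qquad (i > 1), 
\end{aligned}
\tag{4.9}
\]
where we used (D8) to get
\[
\bigl[\,x^-_{\alpha_{12}}(1),\ x^+_{\alpha_{12}}(-1)\,\bigr] = \frac{\gamma\,\omega'_1 - \gamma'\,\omega_1}{r - s} .
\]
Therefore, by (4.9), (4.1) takes the required form:
\[
[\,E_0, F_0\,] = \frac{{\gamma'}^{-1}\,\omega_\theta^{-1} - \gamma^{-1}\,{\omega'_\theta}^{-1}}{r - s} .
\]
The proof of Proposition 4.5 is complete.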
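The telescoping behind (4.9) can be spelled out as a single induction step. This is a sketch under stated assumptions, not a quotation: it takes $M(i) = \gamma\,\omega'_1 \cdots \omega'_{i-2}\,(\omega'_{i-1} - \omega_{i-1})/(r-s)$ as suggested by case (i), assumes that $\gamma$, $\gamma'$ and all $\omega_j$, $\omega'_j$ commute with one another, and writes $\omega_{\alpha_{1i}} = \omega_1 \cdots \omega_{i-1}$, $\omega'_{\alpha_{1i}} = \omega'_1 \cdots \omega'_{i-1}$. The placement of primes is reconstructed so as to agree with Proposition 4.5.

```latex
% Induction hypothesis (the case i-1 of (4.9)):
%   [x^-_{\alpha_{1,i-1}}(1), x^+_{\alpha_{1,i-1}}(-1)]
%     = (\gamma\,\omega'_{\alpha_{1,i-1}} - \gamma'\,\omega_{\alpha_{1,i-1}}) / (r-s)
\begin{align*}
M(i) + N(i)
 &= \frac{\gamma\,\omega'_{\alpha_{1,i-1}}\,(\omega'_{i-1}-\omega_{i-1})}{r-s}
  + \frac{\bigl(\gamma\,\omega'_{\alpha_{1,i-1}}-\gamma'\,\omega_{\alpha_{1,i-1}}\bigr)\,\omega_{i-1}}{r-s} \\
 &= \frac{\gamma\,\omega'_{\alpha_{1,i-1}}\,\omega'_{i-1}
        - \gamma'\,\omega_{\alpha_{1,i-1}}\,\omega_{i-1}}{r-s}
    \qquad\text{(the mixed terms cancel)} \\
 &= \frac{\gamma\,\omega'_{\alpha_{1i}}-\gamma'\,\omega_{\alpha_{1i}}}{r-s} .
\end{align*}
```

With $i = n$, so that $\alpha_{1n} = \theta$, multiplying by $\gamma^{-1}{\gamma'}^{-1}\,\omega_\theta^{-1}\,{\omega'_\theta}^{-1}$ as in (4.1) gives $({\gamma'}^{-1}\omega_\theta^{-1} - \gamma^{-1}{\omega'_\theta}^{-1})/(r-s)$, which is the right-hand side of Proposition 4.5 under the images of $\omega_0$ and $\omega'_0$ in Theorem 3.12.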
For (A5): We need only to verify that [ E 0 , E j ] = 0 and [ F0 , F j ] = 0 for1 < j
2 and r = −s. The argument for Claim (C) is technical. Indeed, by induction on n, we have: when n = 3, by (3.8), one gets − x2 (1), xα−13 (1) s = x2− (1), x2− (0), x1− (1) s (by (3.8)) s − 1 − − = −(r s) 2 x2 (1), x1 (0), x2 (1) r −1 s 1 = (r s −1 )− 2 x2− (1), x2− (1), x1− (0) (r,s) 1 = (r s −1 )− 2 x2− (1), x2− (1), x1− (0) (s,r ) (by (3.6)) = 0,
(by (3.9)),
which is exactly the (r, s)-Serre relation (see (3.9)). For n > 3, we first notice the fact: − − xn−1 (0), xα−2n (1) ω ,ω = xn−1 (0), xα−2n (1) r = 0, n−1
α2n
for n > 3,
which can be proved using the same method of the proof of (II) in Lemma 4.4. We thus have − − xα2n (1), xθ− (1) r = xn−1 (0), xα−2,n−1 (1) , xθ− (1) (by (3.4)) s r − = xn−1 (0), xα−2,n−1 (1), xθ− (1) 1 rs − − − + xn−1 (0), xθ (1) r , xα2,n−1 (1) (= 0 by Claim (A)) s ⎡ ⎤ − − ⎦ = ⎣ xn−1 (0), xα−2,n−1 (1), xn−1 (0), xα−1,n−1 (1) (by (3.3)) s 1 rs ⎡ ⎤ − − ⎦ xα−2,n−1 (1), xn−1 = ⎣ xn−1 (0), (0) −1 , xα−1,n−1 (1) s s2 r s
(4.10)
⎡
− + s −1 ⎣ xn−1 (0),
− xn−1 (0),
xα−2,n−1 (1), xα−1,n−1 (1) s
⎤ ⎦ s2
rs
(2nd sumand= 0 using induction hypothesis) − −1 = −s xn−1 (0), xα−2n (1), xα−1,n−1 (1) 2 (by (3.3)) s rs − = −s −1 (= 0 by (4.10)) xn−1 (0), xα−2n (1) r , xα−1,n−1 (1) 3 s − − r s −1 xα−2n (1), xn−1 (0), xα−1,n−1 (1) (by definition) s r −1 s 2 = −r s −1 xα−2n (1), xθ− (1) r −1 s 2 . By definition, expanding both sides of the above identity gives us (1 + r s −1 ) xα−2n (1) xα−1n (1) = (r + s) xα−1n (1) xα−2n (1), which means xα−2n (1), xα−1n (1) s = 0, under the assumption r = −s. For (A7): The verification is analogous to that of (A6). n ) is generated by 4.3. Proof of Theorem 4.2. We shall show that the algebra Ur,s (sl ±1 ±1 ± 21 ± 21 ±1 ±1 E i , Fi , ωi , ωi , γ , γ , D , D (i ∈ I0 ). To this end, we need to prove the following results. n ), then Lemma 4.7. (1) x1− (1) = E 2 , E 3 , · · · , E n−1 , E 0 (r,··· ,r ) γ ω1 ∈ U r,s (sl n ). for any i ∈ I , xi− (1) ∈ U r,s (sl (2) x1+ (−1) = τ E 2 , E 3 , · · · , E n−1 , E 0 (r,··· ,r ) γ ω1 = γ ω1 [F0 , Fn−1 , · · · , F3 , n ), then for any i ∈ I , x + (−1) ∈ U r,s (sl n ). F2 ] s,··· ,s ∈ U r,s (sl i
= xα− (1) ωi+1 · · · ωn−1 γ −1 ω−1 for i ≥ 1, where E(n−1) Proof. (1) Set E(i) = θ 1,i+1 E 0 . Observing that xi+ (0), xα−1,i+1 (1) = xα−1i (1) ωi in the proof (see case (ii)) of Proposition 4.5, we get an important recursive relation: + − −1 −1 = x (0), x (1) ω · · · ω γ ω E i , E(i) i+1 n−1 α1,i+1 i θ r r + − −1 −1 = xi (0), xα1,i+1 (1) ωi+1 · · · ωn−1 γ ωθ = E(i−1).
(4.11)
Recursively using the above relations, we obtain x1− (1) = E(1)γ ω1 = E 2 , E(2) γ ω1 = · · · r γ ω1 = E 2 , · · · , E n−1 , E(n−1) (r,··· ,r ) = E 2 , · · · , E n−1 , E 0 (r,··· ,r ) γ ω1 n ). ∈ U r,s (sl
(4.12)
n ) for i ≥ 1. Notice Now suppose that we already have obtained xi− (1) ∈ U r,s (sl that − xi+ (0), xi− (0) , xi+1 (1) r −1 ωi−1 ( ' − (1) ω−1 (r s) xi+ (0), xi− (0), xi+1 (r −1 ,1) i 1 − (0), xi− (1) (s,1) ωi−1 −(r s) 2 xi+ (0), xi+1 1 Fi+1 , xi− (1) s , E i ωi−1 (r s) 2 n ), U r,s (sl
− xi+1 (1) = (r s)
= = = ∈
(by (3.4)), (by (3.8)),
(4.13)
which gives rise to the recursive construction of some basic quantum real root vectors of level 1. Hence, we obtain the required result. x+ · · · ωi+1 (−1) for i ≥ 1, where (2) Set F(i) = τ ( E(i)) = γ −1 ωθ−1 ωn−1 α1,i+1 = F(n−1) = F0 . Applying τ to (4.11), we see that F(i), Fi s = F(i−1) and F(1) −1 + −1 γ ω1 x1 (−1), which implies the first claim. The remaining claim follows from − + (−1) = τ (xi+1 (1)) = (r s) 2 ωi−1 xi+1 1
n ). Fi , xi+ (−1), E i+1 r ∈ U r,s (sl
This completes the proof of Lemma 4.7.
(4.14)
We observe that Lemma 4.7, together with (4.12), (4.13) & (4.14), gives the construction of the Drinfel’d generators of level 1. Furthermore, the first conclusion of the following lemma gives the construction of the quantum imaginary root vectors of any level (= 0), while the second gives the construction of some basic quantum real root vectors of any level. Actually, as a result of Definition 3.9 and Lemma 4.8 below, this approach also gives the construction of all quantum real root vectors of any level. n ), for ∈ Z\{0}. Lemma 4.8. (1) ai () ∈ U r,s (sl n ), for k ∈ Z. (2) xi± (k) ∈ U r,s (sl Proof. (1) At first, it follows from (D8) that n ), ai (1) = ωi−1 γ 1/2 xi+ (0), xi− (1) ∈ U r,s (sl −1 1/2 + − n ). xi (−1), xi (0) = τ (ai (1)) ∈ U r,s (sl ai (−1) = ωi γ
(4.15) (4.16)
n ) for all ≤ and Suppose that we have already obtained ai (± ) ∈ U r,s (sl some ≥ 1. Now using (D6n ) & (D8), we have the following expansion (in fact, the expansions of both sides are the same which also show the compatibility between (D6n ) and (D8)
for n = 1, 2.): n ) x + (0), ai (), x − (1) U r,s (sl i i = xi+ (0), ai () , xi− (1) + ai (), xi+ (0), xi− (1) = ∗γ 2 xi+ (), xi− (1) 1 + ai (), γ − 2 ωi ai (1) (this term= 0 by (D2)) ⎡ ⎢ 1 = ∗(γ γ )− 2 γ − 2 ωi ⎢ ⎣ai (+1) +
1< p≤+1
k k =+1
⎤
⎥ ∗ (r −s) p−1 ai ( j1 ) · · · ai ( j p )⎥ ⎦, (4.17)
where the scalars denoted by $*$ lie in $\mathbb K\setminus\{0\}$. So $a_i(\ell+1) \in \mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n})$. Applying $\tau$ to the above formula, we get $a_i(-(\ell+1)) \in \mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n})$. Thereby, $a_i(\ell) \in \mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n})$ for any $\ell \in \mathbb Z\setminus\{0\}$.

(2) follows from (D6) (setting $i = j$ and $k = 0$), together with (1).

Therefore, we have proved $\mathcal U'_{r,s}(\widehat{\mathfrak{sl}_n}) = \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$, that is to say, the latter is indeed generated by $E_i, F_i, \omega_i^{\pm1}, {\omega'_i}^{\pm1}, \gamma^{\pm\frac12}, {\gamma'}^{\pm\frac12}, D^{\pm1}, {D'}^{\pm1}$ ($i \in I_0$).

4.4. Proof of Theorem 4.3. From Sects. 4.2 & 4.3, we actually get an algebra epimorphism $\Psi : U_{r,s}(\widehat{\mathfrak{sl}_n}) \longrightarrow \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$, since both algebras have essentially the same system of generators satisfying the defining relations of the former.

Notice that both algebras $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ carry a natural Q-gradation structure (see Corollary 2.8), which by definition is evidently preserved under $\Psi$. On the other hand, both toral subalgebras $U_{r,s}(\widehat{\mathfrak{sl}_n})^0$ and $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})^0$, generated by the same system of group-like elements $\omega_i^{\pm1}, {\omega'_i}^{\pm1}$ ($i \in I_0$), $\gamma^{\pm\frac12}, {\gamma'}^{\pm\frac12}, D^{\pm1}, {D'}^{\pm1}$, are obviously isomorphic with respect to $\Psi^0 := \Psi|_{U_{r,s}(\widehat{\mathfrak{sl}_n})^0}$.

Assigned to the positive or negative nilpotent Lie subalgebra $\widetilde{\mathfrak n}^\pm$ of $\widehat{\mathfrak{sl}_n}$ are two subalgebras $U_{r,s}(\widetilde{\mathfrak n}^\pm)$ and $\mathcal U_{r,s}(\widetilde{\mathfrak n}^\pm)$. Both are generated by $\widetilde{\mathfrak n}^\pm$ in $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ respectively. Denote $\Psi^\pm := \Psi|_{U_{r,s}(\widetilde{\mathfrak n}^\pm)}$. By Corollary 2.7, the double structure of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ in Theorem 2.5 implies its triangular decomposition structure $U_{r,s}(\widetilde{\mathfrak n}^-) \otimes U_{r,s}(\widehat{\mathfrak{sl}_n})^0 \otimes U_{r,s}(\widetilde{\mathfrak n}^+)$. This fact likewise indicates that $\Psi$ has a corresponding decomposition $\Psi^- \otimes \Psi^0 \otimes \Psi^+$. So, we are left to show that $\Psi^\pm$ are isomorphisms. It suffices to consider the epimorphism $\Psi^+ : U_{r,s}(\widetilde{\mathfrak n}^+) \longrightarrow \mathcal U_{r,s}(\widetilde{\mathfrak n}^+)$.

Observe that $U_{r,s}(\widetilde{\mathfrak n}^+)$ (resp. $\mathcal U_{r,s}(\widetilde{\mathfrak n}^+)$) is generated by the elements $e_i$ (resp. $E_i$) for $i \in I_0$, subject to the $(r,s)$-Serre relations (A5) & (A6). To check that $\Psi^+$ is an isomorphism, we now fix $r = q$ and specialize $s$ at $q^{-1}$ as follows.

Note that $U_{r,s}(\widetilde{\mathfrak n}^+)$ can be viewed as defined over the Laurent polynomial ring $\mathbb Q[r^{\pm1}, s^{\pm1}]$. Let $\mathcal A \subset \mathbb Q(r,s)$ be the localization of the ring $\mathbb Q[r^{\pm1}, s^{\pm1}]$ at the maximal ideal $(rs - 1)$. Let $U^+_{\mathcal A}$ be the $\mathcal A$-subalgebra of $U_{r,s}(\widetilde{\mathfrak n}^+)$ generated by $e_i$ ($i \in I_0$). Let $(rs-1)\,U^+_{\mathcal A}$ be the ideal generated by $(rs-1)$ in $U^+_{\mathcal A}$. Define the algebra $U^+_q$, the specialization of $U_{r,s}(\widetilde{\mathfrak n}^+)$ at $s = q^{-1}$, by $U^+_q = U^+_{\mathcal A}/(rs-1)\,U^+_{\mathcal A}$. Obviously, $U^+_q \cong U_q(\widetilde{\mathfrak n}^+)$, the usual one-parameter quantum subalgebra of $U_q(\widehat{\mathfrak{sl}_n})$. However, in this case, $\Psi^+$ induces the isomorphism $\Psi^+ : U_q(\widetilde{\mathfrak n}^+) \longrightarrow \mathcal U_q(\widetilde{\mathfrak n}^+)$ given by the Drinfel’d isomorphism in the one-parameter case (see [B2] or [J2]). Since specialization does not change the root multiplicities, $\Psi^+ : U_{r,s}(\widetilde{\mathfrak n}^+) \longrightarrow \mathcal U_{r,s}(\widetilde{\mathfrak n}^+)$ is an isomorphism.

Up to now, from Subsects. 4.2–4.4, we have finally established the Drinfel’d isomorphism in the two-parameter case.

Acknowledgement. Part of this work was done when Hu visited l’DMA, l’Ecole Normale Supérieure de Paris from October to November, 2004, the Fachbereich Mathematik der Universität Hamburg from November 2004 to February 2005, as well as the ICTP (Trieste, Italy) from March to August, 2006. He would like to express his deep thanks to ENS de Paris, H. Strade and ICTP for their hospitality, and for the support from ENS, DFG and ICTP. The authors are indebted to the referee for useful comments.
References

[B1] Beck, J.: Convex bases of PBW type for quantum affine algebras. Commun. Math. Phys. 165, 193–199 (1994)
[B2] Beck, J.: Braid group action and quantum affine algebras. Commun. Math. Phys. 165, 555–568 (1994)
[BGH1] Bergeron, N., Gao, Y., Hu, N.: Drinfel’d doubles and Lusztig’s symmetries of two-parameter quantum groups. J. Algebra 301, 378–405 (2006)
[BGH2] Bergeron, N., Gao, Y., Hu, N.: Representations of two-parameter quantum orthogonal and symplectic groups. In: Proceedings of the International Conference on Complex Geometry and Related Fields, AMS/IP Studies in Adv. Math. 39, Providence, RI: Amer. Math. Soc., 2007, pp. 1–21
[BH] Bai, X., Hu, N.: Two-parameter quantum groups of exceptional type E-series and convex PBW-type basis. Algebra Colloquium, to appear, available at http://arXiv.org/list/Math.QA/0605179, 2006
[BW1] Benkart, G., Witherspoon, S.: Two-parameter quantum groups and Drinfel’d doubles. Alg. Rep. Theory 7, 261–286 (2004)
[BW2] Benkart, G., Witherspoon, S.: Representations of two-parameter quantum groups and Schur-Weyl duality. In: Hopf algebras, Lecture Notes in Pure and Appl. Math. 237, New York: Dekker, 2004, pp. 62–92
[BW3] Benkart, G., Witherspoon, S.: Restricted two-parameter quantum groups. In: Representations of Finite Dimensional Algebras and Related Topics in Lie Theory and Geometry, Fields Institute Communications, Vol. 40, Providence, RI: Amer. Math. Soc., 2004, pp. 293–318
[Da] Damiani, I.: A basis of type Poincaré-Birkhoff-Witt for the quantum algebra of sl(2). J. Algebra 161, 291–310 (1993)
[DI1] Ding, J.T., Iohara, K.: Generalization of Drinfel’d quantum affine algebras. Lett. Math. Phys. 41(2), 181–193 (1997)
[DI2] Ding, J.T., Iohara, K.: Drinfel’d comultiplication and vertex operators. J. Geom. Phys. 23, 1–13 (1997)
[Dr1] Drinfel’d, V.G.: Quantum groups. ICM Proceedings (New York, Berkeley, 1986), Providence, RI: Amer. Math. Soc., 1987, pp. 798–820
[Dr2] Drinfel’d, V.G.: A new realization of Yangians and quantized affine algebras. Soviet Math. Dokl. 36, 212–216 (1988)
[FJ] Frenkel, I., Jing, N.: Vertex representations of quantum affine algebras. Proc. Nat’l. Acad. Sci. USA 85, 9373–9377 (1988)
[G] Grossé, P.: On quantum shuffle and quantum affine algebras. J. Algebra 318, 495–519 (2007)
[Ga] Garland, H.: The arithmetic theory of loop algebras. J. Algebra 53, 480–551 (1978)
[H] Hu, N.: Quantum divided power algebra, q-derivatives, and some new quantum groups. J. Algebra 232, 507–540 (2000)
[HS] Hu, N., Shi, Q.: The two-parameter quantum group of exceptional type G_2 and Lusztig’s symmetries. Pacific J. Math. 230, 327–345 (2007)
[J1] Jing, N.: Twisted vertex representations of quantum affine algebras. Invent. Math. 102, 663–690 (1990)
[J2] Jing, N.: On Drinfel’d realization of quantum affine algebras. Ohio State Univ. Math. Res. Inst. Publ. 7, Berlin: de Gruyter, 1998, pp. 195–206
[K] Kac, V.: Infinite Dimensional Lie Algebras. 3rd edition, Cambridge: Cambridge Univ. Press, 1990
[KS] Klimyk, A., Schmüdgen, K.: Quantum Groups and Their Representations. Berlin: Springer, 1997
[KT] Khoroshkin, S.M., Tolstoy, V.N.: On Drinfel’d realization of quantum affine algebras. J. Geom. Phys. 11, 445–452 (1993)
[LR] Lalonde, M., Ram, A.: Standard Lyndon bases of Lie algebras and enveloping algebras. Trans. Amer. Math. Soc. 347(5), 1821–1830 (1995)
[LSS] Levendorskii, S., Soibel’man, Y., Stukopin, V.: Quantum Weyl group and universal quantum R-matrix for affine Lie algebra $A_1^{(1)}$. Lett. Math. Phys. 27(4), 253–264 (1993)
[R1] Rosso, M.: Quantum groups and quantum shuffles. Invent. Math. 133(2), 399–416 (1998)
[R2] Rosso, M.: Lyndon bases and the multiplicative formula for R-matrices. Preprint, 2002
[T] Takeuchi, M.: A two-parameter quantization of GL(n). Proc. Japan Acad. 66 (Ser. A), 112–114 (1990)
Communicated by A. Connes
Commun. Math. Phys. 278, 487–548 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0403-3
Communications in
Mathematical Physics
Lagrangian Approach to Sheaves of Vertex Algebras Fyodor Malikov Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA. E-mail:
[email protected] Received: 29 September 2006 / Accepted: 8 June 2007 Published online: 8 January 2008 – © Springer-Verlag 2007
Abstract: We explain how sheaves of vertex algebras are related to mathematical structures encoded by a class of Lagrangians. The exposition is focused on two examples: the WZW model and the (1,1)-supersymmetric σ-model. We conclude by showing how to construct a family of vertex algebras with base the Barannikov-Kontsevich moduli space, thus furnishing the B-model moduli for Witten’s half-twisted model.

Partially supported by the National Science Foundation.

Contents

Introduction
1. Diffieties and Functional Pre-Symplectic Structures
1.1 The jets
1.2 Local formulas
1.3 De Rham complex
1.4 Differential equations
1.5 Functional pre-symplectic structure
1.6 Calculus of variations and integrals of motion. Bosonic σ-model
2. Vertex Poisson Algebras
2.1 Definition
2.2 Lemma
2.3 Tensor products
2.4 From vertex Poisson algebras to Courant algebroids
2.5 Symbols of vertex differential operators
2.6 A sheaf-theoretic version
2.7 A natural sheaf of SVDOs
2.8 The Lagrangian interpretation
2.9 An example: WZW model
3. Supersymmetric Analogues
3.1 Bits of supergeometry
3.2 Functional pre-symplectic structure
3.3 Calculus of variations
3.4 An example: (1,1)-supersymmetric σ-model
3.5 Vertex Poisson algebra interpretation. Witten’s models
3.6 Quantization. B-model moduli
Introduction

More than anything else, the present notes are a report on what we have been able to make of the recent papers by Kapustin and Witten [Kap,W4]. Even after the tremendous effort [QFS], much of the mathematical literature treating various aspects of string theory and related topics is conspicuously lacking any mention of the Lagrangian,$^1$ an object that is at the heart of a physical theory. We would like to make precise the relation of sheaves of vertex algebras [MSV] to a Lagrangian field theory.

What vertex algebras and Lagrangians have in common is that both produce infinite dimensional Lie algebras. If $V$ is a vertex algebra, then, in particular, it is a vector space with multiplications $_{(n)}$, $n \in \mathbb Z$, and a derivation $T$. The corresponding Lie algebra is
\[
\mathrm{Lie}(V) = \bigl(V/T(V),\ {}_{(0)}\bigr). \tag{0.1}
\]
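Why (0.1) is a Lie algebra can be sketched from three standard vertex algebra identities. They are recalled here from the general theory rather than from this text, with $a_{(n)}b$ denoting the $n$-th product and $T$ the translation operator; signs are those of the purely even case.

```latex
% (1) T acts trivially through the 0-th product and commutes with a_{(0)},
%     so _{(0)} descends to a well-defined operation on V/T(V):
\[
(Ta)_{(0)}b = 0, \qquad a_{(0)}(Tb) = T\bigl(a_{(0)}b\bigr).
\]
% (2) Skew-symmetry holds modulo T(V):
\[
a_{(0)}b = -\,b_{(0)}a + \sum_{j\ge 1}\frac{(-1)^{j+1}}{j!}\,T^{j}\bigl(b_{(j)}a\bigr)
\;\equiv\; -\,b_{(0)}a \pmod{T(V)} .
\]
% (3) Jacobi identity: the commutator formula
%     [a_{(m)}, b_{(n)}] = \sum_{j\ge 0}\binom{m}{j}\,(a_{(j)}b)_{(m+n-j)}
%     specialised to m = n = 0:
\[
a_{(0)}\bigl(b_{(0)}c\bigr) - b_{(0)}\bigl(a_{(0)}c\bigr) = \bigl(a_{(0)}b\bigr)_{(0)}c .
\]
```

Taken together, (1) shows that the bracket $[\bar a, \bar b] := \overline{a_{(0)}b}$ is well defined on classes modulo $T(V)$, (2) gives antisymmetry, and (3) is the Jacobi identity in derivation form.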
We will be concerned with the class of vertex algebras, or rather of sheaves thereof, introduced in [MSV]. To make our life easier, we will, first, mostly consider their quasiclassical limits, i.e., the corresponding vertex Poisson algebras and, second, work in the C ∞ -setting. This class comprises vertex algebra analogues of sheaves of symbols of differential operators, and their natural habitat is different versions of ∞-jet spaces. To give an example, let be a 1-dimensional real manifold and consider J ∞ (T ∗ M ), the space of ∞-jets of sections of the trivial bundle T ∗ M = T ∗ M × → . We will find it convenient to work with families of such jet-spaces, to be denoted J ∞ (T ∗ M/ ), that are are naturally and similarly attached to a “time” fibration τ: → with fiber . The push-forward of the structure sheaf O J ∞ (T ∗ M/ ) on M := M × carries a structure of a sheaf of vertex Poisson algebras, a vertex analogue of the Poisson algebra of functions on T ∗ M. This sheaf is natural in that the assignment M → O J ∞ (T ∗ M/ ) is functorial in M. Such sheaves of symbols of vertex differential operators, SVDO, can be defined axiomatically and classified [GMS1]. A simple quasiclassical version of the classification obtained in [GMS1] shows that locally (on M) all SVDOs are isomorphic and the set of isomorphism classes is identified with H 3 (M, R). In particular, an SVDO . O J ∞ (T ∗ M/ ) + H is defined for each closed 3-form H . To each such SVDO construction (0.1) attaches a sheaf of Lie algebras on M, .
.
.
Lie(O J ∞ (T ∗ M/ ) + H ) = (O J ∞ (T ∗ M/ ) + H )/T (O J ∞ (T ∗ M/ ) + H ). (0.2) 1 [AKSZ,FL] are notable exceptions.
One of the more interesting examples arises when M is a compact simple Lie group. In this case the set of isomorphism classes of SVDOs is a 1-dimensional vector space, and we denote the representatives by S DG,k , k ∈ C. If we let g be the corresponding Lie algebra and V (g)k the corresponding vertex Poisson algebra, then there arise two Poisson-commuting subalgebras [F,FP,AG,GMS2] jl
jr
V (g)k → (G, S DG,k ) ← V (g)−k
(0.3)
engendered by the left/right translations of G by itself. If is a circle, this implies the existence of 2 copies of the affine Lie algebra jl
jr
gˆ k → (G, Lie(S DG,k )) ← gˆ −k . The case of k = 0 is exceptional; in this case the gˆ k × gˆ −k -module structure of the SVDO has a form reminiscent of the WZW (G, S DG,k ) = Vλ,k ⊗ Vλ∗ ,−k , (0.4) λ
where $V_{\lambda,k}$ is the $\hat{\mathfrak{g}}_k$-module induced from the finite dimensional simple $\mathfrak{g}$-module with highest weight $\lambda$; $\lambda^*$ is the dual weight.
On the Lagrangian side one deals with a somewhat different jet-space, $J^\infty(M_{\hat\Sigma})$, where, recall, $\hat\Sigma$ is 2-dimensional. Defined on it there is a sheaf of variational bi-complexes, $(\Omega^{\bullet,\bullet}_{J^\infty(M_{\hat\Sigma})}, \delta, d)$. An action, S, is defined to be
\[ S \in \Gamma\bigl(J^\infty(M_{\hat\Sigma}),\ \Omega^{2,0}_{J^\infty(M_{\hat\Sigma})} / d\,\Omega^{1,0}_{J^\infty(M_{\hat\Sigma})}\bigr), \tag{0.5} \]
and can be represented by a collection
\[ \{ L_j \in \Omega^{2,0}_{J^\infty(M_{\hat\Sigma})} \} \tag{0.6} \]
of locally defined Lagrangians equal to each other modulo d-exact terms on double intersections. Action S also produces a Lie algebra $I_S$ of integrals of motion by virtue of the Noether theorem. Let us relate this algebra to (0.2). Each action (0.5) defines a space, $Sol_S$, often referred to as the solution space, which for our purposes had better be chosen to be the infinite dimensional submanifold of $J^\infty(M_{\hat\Sigma})$ defined by the Euler-Lagrange equations. $Sol_S$ and the jet-spaces considered are examples of a diffiety, a notion introduced by A.M. Vinogradov [V]. Attached to each action there is what is usually called a variational 2-form on $Sol_S$. Generally speaking, it is not a form but a global section
\[ \omega_S \in \Gamma\bigl(Sol_S,\ \Omega^{1,2}_{Sol_S} / d\,\Omega^{0,2}_{Sol_S}\bigr) \tag{0.7} \]
annihilated by $\delta$. Special diffiety properties allow one to attach to each such form, not necessarily coming from a Lagrangian, a Lie algebra structure on a certain subsheaf of “functions” on $Sol_S$.
We call it a functional pre-symplectic structure and denote the corresponding sheaf of Lie algebras by $\mathcal{H}^{\omega_S}_{Sol_S}$. One has
\[ I_S \subset \Gamma(Sol_S, \mathcal{H}^{\omega_S}_{Sol_S}). \tag{0.8} \]
We have come to the point. For a class of Lagrangians, comprising those that are convex and of order 1, $Sol_S$ is diffeomorphic to $J^\infty(TM_{\hat\Sigma/\Upsilon})$ and there is a version of the Legendre transform $g : J^\infty(TM_{\hat\Sigma/\Upsilon}) \to J^\infty(T^*M_{\hat\Sigma/\Upsilon})$ that gives a Lie algebra sheaf isomorphism
\[ g^\# : \mathcal{H}^{\omega_S}_{Sol_S} \xrightarrow{\;\sim\;} g^{-1}\,\mathrm{Lie}\bigl(\mathcal{O}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}\bigr) \tag{0.9} \]
in the case of a single globally defined Lagrangian, cf. (0.1–0.2). Classification of SVDOs is also reflected in the Lagrangian world: given a globally defined Lagrangian L and a closed 3-form H, known as an H-flux, one defines, following e.g. [GHR,W1], a collection of Lagrangians $L^H$, such as in (0.6), so that there is an isomorphism
\[ g^\# : \mathcal{H}^{\omega_S}_{Sol_S} \xrightarrow{\;\sim\;} g^{-1}\,\mathrm{Lie}\bigl(\mathcal{O}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})} \mathbin{\dot{+}} H\bigr), \tag{0.10} \]
hence an embedding
\[ I_S \subset \Gamma\bigl(M, \mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})})\bigr). \tag{0.11} \]
As an illustration, let us relate the WZW model [W1,GW] to $SD_{G,k}$. It is conformally invariant, meaning that 2 copies, left and right moving, of the Virasoro algebra are among the integrals of motion $I_{WZW}$. Both are embedded into $\mathrm{Lie}(SD_{G,k})$ by virtue of (0.11). Precisely when the level, k, is non-zero, they coincide with the Virasoro algebra defined via the Sugawara construction inside $j_l(V(\mathfrak{g})_k)$ or $j_r(V(\mathfrak{g})_{-k})$ resp., see (0.3). The existence of the left/right moving Virasoro algebra allows one to define the left/right moving subalgebra of $\mathrm{Lie}(SD_{G,k})$, or indeed, the left/right moving subalgebra of $SD_{G,k}$. Again, precisely when $k \neq 0$, the spaces of global sections of these equal $j_l(V(\mathfrak{g})_k)$ or $j_r(V(\mathfrak{g})_{-k})$ resp. This is an easy consequence of decomposition (0.4).
A disclaimer is in order: no attempt at originality has been made. But it is also true that we failed to find an exposition of this material suitable for our purposes. Our main source of fact and inspiration was [Di], Ch. 19; see also [DF,S,V,Z]. Needless to say, much of what has just been discussed is contained in one form or another in [BD], e.g. Sects. 2.3.20, 3.9, but note that the meaning is somewhat different: we work in the $C^\infty$ setting and our constructions are not necessarily chiral. In fact, one of our wishes was to understand “left and right movers”, whose ubiquity in physics literature bedevils some of us in the mathematics community.
All of the above has a more or less straightforward superanalogue. As an example, we analyze the (1,1)-supersymmetric σ-model arising on a Riemannian manifold M. It is similarly governed by a super SVDO, $\Omega^{poiss}_M$, which is a quasiclassical limit of the $C^\infty$-version of the chiral de Rham complex [MSV]. The Lie superalgebra of integrals of motion contains two copies of the N=1 superconformal algebra, and we write explicit formulas for their embeddings into $\Gamma(M, \mathrm{Lie}(\Omega^{poiss}_M))$.
If, in addition, M is a Kähler manifold, then this symmetry algebra is enlarged to include 2 copies of the N = 2 superconformal algebra, a remarkable fact known since
[Z,A-GF]. If so, a quadruple of operators, $Q^{\bullet,\bullet}$, $\bullet = \pm$, arises, appropriate combinations of which are differentials on $\Omega^{poiss}_M$. The 3 cohomology sheaves,
\[ H_{Q^{--}+Q^{++}}(\Omega^{poiss}_M),\quad H_{Q^{--}+Q^{+-}}(\Omega^{poiss}_M),\quad H_{Q^{--}}(\Omega^{poiss}_M) \tag{0.12} \]
are versions of the quasiclassical limit of Witten’s A-, B- and half-twisted models [W2]. Their cohomology can be computed using versions of the de Rham complex and $\bar\partial$-resolution; the result is $H^*(M, \mathbb{C})$, $H^*(M, \Lambda^* TM)$, and $H^*(M, \Omega^{poiss,an}_M)$ resp., where the latter is a purely holomorphic version of $\Omega^{poiss}_M$, the quasiclassical limit of the chiral de Rham complex [MSV].
Note that in the present situation the left/right moving subalgebra can also be defined by analogy with the WZW case considered above. Of the 3 cohomology vertex Poisson algebras, $H^*(M, \mathbb{C})$, $H^*(M, \Lambda^* TM)$, and $H^*(M, \Omega^{poiss,an}_M)$, the last one is an infinite dimensional subquotient of the left moving algebra and is often referred to as Witten’s chiral algebra [W3,W4]. The first two are also appropriate subquotients but are finite dimensional and of topological nature.
The construction of the sheaf complexes used, such as $(\Omega^{poiss}_M, Q^{--}+Q^{++})$, $(\Omega^{poiss}_M, Q^{--}+Q^{+-})$, $(\Omega^{poiss}_M, Q^{--})$, is easily quantized to produce the complexes of vertex algebra sheaves, $(\Omega^{vert}_M, Q^{--}+Q^{++})$, $(\Omega^{vert}_M, Q^{--}+Q^{+-})$, $(\Omega^{vert}_M, Q^{--})$. The sheaf cohomology of the first two remains the same, that of the 3rd is quite different and equals the cohomology of the chiral de Rham complex. When formulated in the physics language, the relation of this quantization to the genuine supersymmetric model is that the latter equals the former “perturbatively”; this is the main result of Kapustin [Kap]. It seems, however, that the emphasis made in [Kap] on the “infinite metric limit” is somewhat disingenuous: the above supersymmetries depend on a genuine metric and its Kähler form. By skillfully applying the techniques of SUSY vertex algebras [HK], Ben-Zvi, Heluani, and Szczesny have recently solved [B-ZHS] a harder problem of finding quantum versions of the above mentioned embeddings of the various superconformal algebras in the chiral de Rham complex.
Our discussion seems to indicate that their quantization is related to physics (apart from non-perturbative effects) in about the same way the known quantization of (0.3) [FP,F,AG,GMS2] is related to the quantum WZW: it mixes the chiral and anti-chiral sectors. At this point we can only ask, following [FP], if there is a physical model of interest whose chiral algebra is as in [B-ZHS].
The differential $Q^{--}$ of the (quantum) complex $(\Omega^{vert}_M, Q^{--})$ can be deformed. One of the ways to think about the complex $(\Omega^{vert}_M, Q^{--})$ is that it is a vertex algebra version of the $\bar\partial$-resolution of the algebra of polyvector fields. We conclude by showing that the Barannikov-Kontsevich construction [BK] has a vertex algebra analogue: we define a family of vertex algebras with base the Barannikov-Kontsevich moduli space $M_{BK}$ by assigning to each $t \in M_{BK}$ an element $Q^{--}_t \in \Gamma(M, \Omega^{vert}_M)$ and a vertex algebra $H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M))$. The conformal weight zero subspace of the family $\{t \mapsto H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M))\}$ encodes precisely the Frobenius manifold introduced in [BK]. Our construction amounts to defining a morphism of the deformation functor of the Lie algebra of polyvector fields to that of $\Gamma(M, \mathrm{Lie}(\Omega^{vert}_M))$, which is yet another application of construction (0.1). This furnishes the B-model moduli for Witten’s half-twisted model. The instanton effects seem to be out of reach, to us; see, however, an intriguing sentence in [W4], p.1, and [FL] for an interesting new approach based on marginal operators.
1. Diffieties and Functional Pre-Symplectic Structures
The geometry of jet-spaces is a huge and familiar topic, see sources such as [BD,Di,Ol,V]. The purpose of this section, see Lemma 1.5.4.1, is to introduce a “universal” sheaf of Lie algebras $\mathcal{H}^{can}$, which contains the algebra of symmetries of a class of Lagrangians, see Lemma 1.6.8.1.
1.1. The jets. Assume given a d-dimensional $C^\infty$-manifold $\hat\Sigma$, the “world sheet”, and a smooth fiber bundle
\[ \tau : \hat\Sigma \to \Upsilon \tag{1.1.1} \]
with base $\Upsilon$, an open subset of $\mathbb{R}$, and fiber $\Sigma$, a (d − 1)-dimensional manifold. This is the minimal requirement; most of the time, we will have the Cartesian product
\[ \Upsilon \times \Sigma \tag{1.1.2} \]
with a fixed coordinate (“intrinsic time”) on $\Upsilon$,
\[ \tau = \sigma^0 : \Upsilon \to \mathbb{R}, \tag{1.1.3} \]
and étale coordinates
\[ \sigma = (\sigma^1, \dots, \sigma^{d-1}) : \mathbb{R}^{d-1} \to \Sigma, \tag{1.1.4} \]
on $\Sigma$ so that (1.1.4) is a universal cover (étale because we would like to include the case of a torus); furthermore, we will be mostly interested in d = 1.
Let M, “space-time”, be an n-dimensional $C^\infty$-manifold and $M_{\hat\Sigma} = M \times \hat\Sigma$. There arises a fiber bundle $M_{\hat\Sigma} \to \hat\Sigma$, and we denote by $J^k(M_{\hat\Sigma})$ the space of k-jets of its sections. Each $J^k(M_{\hat\Sigma})$ is a finite dimensional $C^\infty$-manifold, and the natural projections $\pi_{k,l} : J^k(M_{\hat\Sigma}) \to J^l(M_{\hat\Sigma})$, $k \ge l$, organize the collection $\{J^k(M_{\hat\Sigma}), k \ge 0\}$ in a projective system. The space of ∞-jets, $J^\infty(M_{\hat\Sigma})$, is the projective limit, $\varprojlim J^k(M_{\hat\Sigma})$. Its sheaf of smooth functions is the direct limit of those on $J^k(M_{\hat\Sigma})$, $k \ge 0$. Let $\mathcal{O}_{J^k(M_{\hat\Sigma})}$ be the sheaf of all smooth functions on $J^k(M_{\hat\Sigma})$ that are polynomials in positive order jets and define
\[ \mathcal{O}_{J^\infty(M_{\hat\Sigma})} = \varinjlim \mathcal{O}_{J^k(M_{\hat\Sigma})}. \]
The fiber bundle $J^\infty(M_{\hat\Sigma}) \to \hat\Sigma$ carries a well-known flat connection, or equivalently, $\mathcal{O}_{J^\infty(M_{\hat\Sigma})}$ is a sheaf of $\mathcal{D}_{\hat\Sigma}$-algebras, where $\mathcal{D}_{\hat\Sigma}$ is the sheaf of differential operators on $\hat\Sigma$. Denote by
\[ \rho : \mathcal{T}_{\hat\Sigma} \to \mathcal{T}_{J^\infty(M_{\hat\Sigma})} \tag{1.1.5} \]
the corresponding morphism of the sheaves of vector fields. We will often refer to this situation by calling $J^\infty(M_{\hat\Sigma}) \to \hat\Sigma$ a $\mathcal{D}_{\hat\Sigma}$-manifold, thus mimicking [BD]. In particular, attached to any tangent vector $\xi \in T_t\hat\Sigma$, there is a tangent vector $\rho(\xi) \in T_{(x,t)}(J^\infty(M_{\hat\Sigma}))$. Hence there arises an integrable d-dimensional distribution
\[ (x, t) \mapsto \mathrm{span}\{\hat\xi,\ \xi \in T_t\hat\Sigma\} \subset T_{(x,t)} J^\infty(M_{\hat\Sigma}) \]
known as the Cartan distribution.
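$J^\infty(M_{\hat\Sigma})$ is a simple example of what A.M. Vinogradov calls a diffiety. Since the Cartan distribution is an important structure ingredient, by an infinitesimal automorphism of $J^\infty(M_{\hat\Sigma})$ one means a contact vector field [V], i.e., a vector field that preserves the Cartan distribution. Locally defined contact vector fields form a Lie algebra subsheaf $\mathcal{C}_{J^\infty(M_{\hat\Sigma})} \subset \mathcal{T}_{J^\infty(M_{\hat\Sigma})}$. Call a contact vector field evolutionary if it is tangent to the fibers of the projection $J^\infty(M_{\hat\Sigma}) \to \hat\Sigma$ and let $\mathrm{Evol}(J^\infty(M_{\hat\Sigma}))$ denote the sheaf of all evolutionary vector fields. Of course,
\[ \mathrm{Evol}(J^\infty(M_{\hat\Sigma})) \subset \mathcal{C}_{J^\infty(M_{\hat\Sigma})} \subset \mathcal{T}_{J^\infty(M_{\hat\Sigma})} \]
are embeddings of Lie algebra sheaves.
All of this admits a relative version: if one defines $J^\infty(M_{\hat\Sigma/\Upsilon})$ to be the space of jets of sections $\hat\Sigma \to M$ in the direction of fibers of the bundle $\hat\Sigma \to \Upsilon$, then the definitions of the connection
\[ \rho_{/\Upsilon} : \mathcal{T}_{\hat\Sigma/\Upsilon} \to \mathcal{T}_{J^\infty(M_{\hat\Sigma/\Upsilon})}, \tag{1.1.6} \]
(where $\mathcal{T}_{\hat\Sigma/\Upsilon}$ is the sheaf of vector fields on $\hat\Sigma$ tangent to the fibers of the projection $\hat\Sigma \to \Upsilon$), Cartan distribution, sheaf of evolutionary vector fields $\mathrm{Evol}(J^\infty(M_{\hat\Sigma/\Upsilon}))$, etc., are immediate. From now on, unless otherwise stated, we will be working over a base S, S being either $\Upsilon$ or a point.
If (1.1.2-4) are valid, then the fibers of the projection $J^\infty(M_{\hat\Sigma}) \to \hat\Sigma$ are canonically identified, and
\[ J^\infty(M_{\hat\Sigma}) \xrightarrow{\;\sim\;} J^\infty_{\hat\Sigma}(M) \times \hat\Sigma, \qquad J^\infty(M_{\hat\Sigma/\Upsilon}) \xrightarrow{\;\sim\;} J^\infty_{\hat\Sigma/\Upsilon}(M) \times \hat\Sigma \tag{1.1.7} \]
for some infinite dimensional manifolds, $J^\infty_{\hat\Sigma}(M)$ and $J^\infty_{\hat\Sigma/\Upsilon}(M)$, whose definition is easy to reconstruct from 1.2 below.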
1.2. Local formulas. As an illustration, and for future use, let us show what all of this means in terms of local coordinates. Let $x^1, \dots, x^n$ be local coordinates on M, those on $\hat\Sigma$ being defined by (1.1.2–4). For a multi-index $(m) = (m_0, \dots, m_{d-1})$, let
\[ x^j_{(m)} = \partial_{\sigma^0}^{m_0} \cdots \partial_{\sigma^{d-1}}^{m_{d-1}} x^j, \]
where $\partial_{\sigma^i} = \partial/\partial\sigma^i$ and $x^j$ is regarded, formally, as a function of $\sigma^0, \dots, \sigma^{d-1}$. Then
\[ \{\sigma^i, x^j_{(m)} : 0 \le i \le d-1,\ 1 \le j \le n,\ (m) \in \mathbb{Z}^d_+\} \]
are local coordinates on $J^\infty(M_{\hat\Sigma})$,
\[ \{\sigma^i, x^j_{(m)} : 0 \le i \le d-1,\ 1 \le j \le n,\ (m) \in \mathbb{Z}^d_+,\ m_0 = 0\} \]
are local coordinates on $J^\infty(M_{\hat\Sigma/\Upsilon})$, and sections of $\mathcal{O}_{J^\infty(M_{\hat\Sigma/S})}$ are smooth functions in $\sigma^i$, $x^j$ and polynomials in $x^j_{(m)}$, $(m) \neq 0$, with $m_0 = 0$ if $S = \Upsilon$.
Let $\delta/\delta x^j_{(m)}$ denote the vertical vector field $\partial/\partial x^j_{(m)} \in \mathcal{T}_{J^\infty(M_{\hat\Sigma})}$. Morphism (1.1.5) is defined by
\[ \rho(\partial_{\sigma^i}) = \partial_{\sigma^i} + \sum_{j=1}^{n} \sum_{(m) \in \mathbb{Z}^d_+} x^j_{(m+e_i)} \frac{\delta}{\delta x^j_{(m)}}, \tag{1.2.1} \]
where $e_i = (0, \dots, 0, 1, 0, \dots, 0)$, 1 appearing at the i-th position.
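As a quick check of (1.2.1), here is a minimal worked instance. It assumes d = 2 and n = 1 (so the index j is dropped) and writes $x_{(a,b)}$ for $x_{(m)}$ with $(m) = (a,b)$; the test function F is a hypothetical choice made only for illustration.

```latex
% Assumptions (not from the text): d = 2, n = 1, and F is an illustrative function
% on the jet space that does not depend on \sigma^0, \sigma^1 explicitly.
F = x_{(0,0)}\, x_{(0,1)}, \qquad
\frac{\partial F}{\partial x_{(0,0)}} = x_{(0,1)}, \qquad
\frac{\partial F}{\partial x_{(0,1)}} = x_{(0,0)}.

% Apply (1.2.1) with i = 1, e_1 = (0,1): each x_{(m)} is replaced by x_{(m+e_1)}.
\rho(\partial_{\sigma^1}) F
  = x_{(0,1)}\, \frac{\partial F}{\partial x_{(0,0)}}
  + x_{(0,2)}\, \frac{\partial F}{\partial x_{(0,1)}}
  = x_{(0,1)}^{2} + x_{(0,0)}\, x_{(0,2)}.
```

This is the Leibniz rule for $\partial_{\sigma^1}(x\,\partial_{\sigma^1}x)$, as one expects of a total derivative.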
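Evolutionary vector fields are in 1–1 correspondence with n-tuples of functions $F^1, \dots, F^n \in \mathcal{O}_{J^\infty(M_{\hat\Sigma})}$ (called the characteristic of a vector field) and are defined via the following prolongation formula:
\[ \xi = \sum_{j=1}^{n} \Bigl( F^j \frac{\delta}{\delta x^j} + \sum_{(m) \neq 0} \bigl(\rho(\partial_{\sigma^0})^{m_0} \cdots \rho(\partial_{\sigma^{d-1}})^{m_{d-1}} F^j\bigr) \frac{\delta}{\delta x^j_{(m)}} \Bigr). \tag{1.2.2} \]
The relative analogs of (1.2.1,2) are obviously obtained by demanding that $m_0 = 0$.
1.3. De Rham complex. Reflecting the product structure of $M_{\hat\Sigma}$, the de Rham complex on $J^\infty(M_{\hat\Sigma/S})$ is bi-graded:
\[ \Omega^{p}_{J^\infty(M_{\hat\Sigma/S})} = \bigoplus_{i+j=p} \Omega^{i,j}_{J^\infty(M_{\hat\Sigma/S})}. \]
It carries 2 anti-commuting differentials:
\[ \delta : \Omega^{i,j}_{J^\infty(M_{\hat\Sigma/S})} \to \Omega^{i,j+1}_{J^\infty(M_{\hat\Sigma/S})}, \qquad d_{\rho/S} : \Omega^{i,j}_{J^\infty(M_{\hat\Sigma/S})} \to \Omega^{i+1,j}_{J^\infty(M_{\hat\Sigma/S})}, \qquad [\delta, d_{\rho/S}] = 0, \]
defined as follows. The space, or rather the sheaf of spaces, $\Omega^{*,*}_{J^\infty(M_{\hat\Sigma/S})}$ is naturally an $\mathcal{O}_{\hat\Sigma}$-module, and $\delta$ is the vertical de Rham differential, i.e., the one that is $\mathcal{O}_{\hat\Sigma}$-linear. The flat connection $\rho_{/S}$ gives rise to a differential, $d_{\rho/S}$, on $\Omega^{*,0}_{J^\infty(M_{\hat\Sigma/S})}$ in the standard manner. For example, in terms of local coordinates,
\[ d_{\rho/S}\, x^j_{(m)} = \sum_{i=\epsilon}^{d-1} x^j_{(m)+e_i}\, d\sigma^i, \tag{1.3.1} \]
where $\epsilon = 0$ if $S = \emptyset$ and $\epsilon = 1$ if $S = \Upsilon$. Then the condition $[\delta, d_{\rho/S}] = 0$ allows one to extend $d_{\rho/S}$ to the entire $\Omega^{*,*}_{J^\infty(M_{\hat\Sigma/S})}$ unambiguously. Thus
\[ \delta F(\sigma, x_{(m)}) = \frac{\partial F(\sigma, x_{(m)})}{\partial x_{(m)}}\, \delta x_{(m)}; \qquad \delta\, d\sigma = 0, \tag{1.3.2} \]
\[ d_{\rho/S}\, \delta x^j_{(m)} = -\delta\, d_{\rho/S}\, x^j_{(m)} = -\sum_{i=\epsilon}^{d-1} \delta x^j_{(m)+e_i} \wedge d\sigma^i. \tag{1.3.3} \]
There is a mapping of bi-complexes
\[ \Omega^{*,*}_{J^\infty(M_{\hat\Sigma})} \to \Omega^{*,*}_{J^\infty(M_{\hat\Sigma/\Upsilon})}, \tag{1.3.4} \]
that sends a form to its restriction to the fibers of the composite projection
\[ J^\infty(M_{\hat\Sigma}) \to \hat\Sigma \xrightarrow{\;\tau\;} \Upsilon. \tag{1.3.5} \]
As a practical matter, (1.3.4) amounts to
\[ d\tau \mapsto 0. \tag{1.3.6} \]
Let $\iota_\xi$ be the operator of contraction with a vector field $\xi$. A straightforward computation proves the following.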
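For orientation, the following sketch checks directly from (1.3.1) that the horizontal differential squares to zero on a coordinate function. It assumes d = 2 and the absolute case $S = \emptyset$, so that the sum in (1.3.1) runs over i = 0, 1; it is an illustration, not part of the original argument.

```latex
% Assumptions (not from the text): d = 2, S = \emptyset, d_\rho := d_{\rho/S}.
d_\rho\, x^j_{(m)} = x^j_{(m)+e_0}\, d\sigma^0 + x^j_{(m)+e_1}\, d\sigma^1 .

% Apply d_\rho again; the terms with d\sigma^0 \wedge d\sigma^0 and
% d\sigma^1 \wedge d\sigma^1 vanish, leaving the two mixed terms.
d_\rho\bigl(d_\rho\, x^j_{(m)}\bigr)
  = x^j_{(m)+e_0+e_1}\, d\sigma^1 \wedge d\sigma^0
  + x^j_{(m)+e_1+e_0}\, d\sigma^0 \wedge d\sigma^1
  = 0 .
```

The cancellation uses only that the multi-indices $(m)+e_0+e_1$ and $(m)+e_1+e_0$ coincide, which is the flatness of the connection $\rho$ in coordinates.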
Lemma 1.3.1. A vertical vector field ξ on J ∞ (M/S ) is evolutionary iff [ιξ , dρ/S ] = 0.
(1.3.7)
Corollary 1.3.2. If ξ is evolutionary, then [dρ/S , Lieξ ] = 0.
(1.3.8)
Indeed, $[d_{\rho/S}, \mathrm{Lie}_\xi] = [d_{\rho/S}, [\delta, \iota_\xi]] = -[\delta, [d_{\rho/S}, \iota_\xi]] = 0$.
1.4. Differential equations. Let $\mathcal{J} \subset \mathcal{O}_{J^\infty(M_{\hat\Sigma})}$ be a sheaf of ideals preserved by the connection $\rho$. Let $Sol \subset J^\infty(M_{\hat\Sigma})$ be the zero locus of this ideal. If some regularity conditions hold, then this submanifold delivers another example of a diffiety. For example, one can, and we will, assume that $\mathcal{J}$ is locally pseudo-Cauchy-Kovalevskaya, i.e., there is a distinguished coordinate on $\hat\Sigma$, to be denoted $\tau$, and for any point in M there is a coordinate system $x^1, \dots, x^n$ s.t. the ideal is generated, around the pre-image of this point on the jet space, by the functions $E_1, \dots, E_n$ satisfying
\[ E_j = \rho(\partial_\tau)^{l_j} x^j + \cdots, \tag{1.4.1} \]
where the dots stand for the terms that do not involve jets of $x^i$ of degree $\ge l_i$, $0 \le i \le d-1$, in the direction of $\partial_\tau$. Here are some of the structure properties Sol shares with the ambient jet space:
• Sol is fibered over $\hat\Sigma$, hence over $\Upsilon$ via $\hat\Sigma \xrightarrow{\tau} \Upsilon$;
• the algebra of functions $\mathcal{O}_{Sol}$ is a $\mathcal{D}_{\hat\Sigma}$-algebra (because the flat connection preserves Sol), hence a $\mathcal{D}_{\hat\Sigma/\Upsilon}$-algebra, where $\mathcal{D}_{\hat\Sigma/\Upsilon}$ is the subalgebra of $\mathcal{D}_{\hat\Sigma}$ that commutes with $\tau^{-1}\mathcal{O}_\Upsilon$; we will write $Sol_{/\Upsilon}$ if we wish to emphasize the $\mathcal{D}_{\hat\Sigma/\Upsilon}$-algebra structure;
• the de Rham complex $\Omega^{*}_{Sol/S}$ is bi-graded and carries two commuting differentials, $\delta$, the vertical differential, and $d_{\rho/S}$, the $\mathcal{D}_{\hat\Sigma/S}$-module differential.
If (1.4.1) is valid, then solving the equation $E_j = 0$ for $\rho(\partial_\tau)^{l_j} x^j$, one sees that Sol looks like ∞-jets in the direction of the fiber of the bundle $\hat\Sigma \xrightarrow{\tau} \Upsilon$ to something finite dimensional. In particular, if $l_1 = l_2 = \cdots = l_n = 2$, then
\[ Sol \xrightarrow{\;\sim\;} J^\infty(TM_{\hat\Sigma/\Upsilon}), \tag{1.4.2} \]
as $\mathcal{D}_{\hat\Sigma/\Upsilon}$-manifolds. Any evolutionary vector field on the ambient jet space that preserves the ideal $\mathcal{J}$ descends to a vector field on Sol, which still satisfies (1.3.7). We emulate this situation by making the following definition.
Definition 1.4.1. A vertical vector field $\xi$ on Sol is called evolutionary (relative to S) if
\[ [\iota_\xi, d_{\rho/S}] = 0. \tag{1.4.3} \]
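Returning to (1.4.1) and (1.4.2), a minimal illustration may help. It assumes d = 2 with coordinates $(\tau, \sigma)$, $M = \mathbb{R}$ (so n = 1), and takes the wave equation as a hypothetical example of a pseudo-Cauchy-Kovalevskaya ideal; the shorthand $x_{(a,b)}$ is introduced only here.

```latex
% Assumptions (not from the text): d = 2, M = \mathbb{R}, and
% x_{(a,b)} := \rho(\partial_\tau)^a \rho(\partial_\sigma)^b x.
E = \rho(\partial_\tau)^2 x - \rho(\partial_\sigma)^2 x = x_{(2,0)} - x_{(0,2)},
\qquad l_1 = 2 .

% The ideal is preserved by \rho, so it contains all
% \rho(\partial_\tau)^a \rho(\partial_\sigma)^b E; on Sol this gives
x_{(a+2,\,b)} = x_{(a,\,b+2)} \qquad (a, b \ge 0).

% Every jet of \tau-order \ge 2 is thereby eliminated, and the remaining
% coordinates on Sol are
\tau,\ \sigma,\ x_{(0,b)},\ x_{(1,b)} \qquad (b \ge 0).
```

These are the $\sigma$-jets of the pair $(x, \partial_\tau x)$, that is, of a point of $T\mathbb{R}$, in agreement with (1.4.2).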
Lemma 1.4.2. If ξ is evolutionary, then [dρ/S , Lieξ ] = 0.
(1.4.4)
1.4.3. Let $\mathrm{Evol}(Sol)_S$ denote the sheaf of all evolutionary vector fields on Sol relative to S. The identity $[\mathrm{Lie}_\xi, \iota_\eta] = \iota_{[\xi,\eta]}$ combined with (1.4.3,4) implies that $\mathrm{Evol}(Sol)_S$ is a Lie algebra. It is an obvious consequence of Definition 1.4.1 that
\[ \mathrm{Evol}(Sol) \subset \mathrm{Evol}(Sol)_\Upsilon. \tag{1.4.5} \]
1.5. Functional pre-symplectic structure. A symplectic structure, that is, a nondegenerate closed 2-form, gives rise to a Poisson algebra structure on the structure sheaf of a manifold. A pre-symplectic structure, i.e., a degenerate closed 2-form, similarly gives rise to a Poisson algebra structure on a certain, admissible, subalgebra of the structure sheaf. This subalgebra consists of functions constant along the leaves of the foliation tangent to the kernel of the form [Fad]. In the case of a diffiety, such as Sol, an analogue of the structure sheaf is supplied by the sheaf of local functionals, and we would like to explain, in the spirit of [DF,Di], that in this case a pre-symplectic structure gives rise to a Poisson bracket on the sheaf of local functionals, which may be just as good for all practical purposes as the symplectic one. The notion of a functional pre-symplectic structure arising in this way is a rather straightforward geometric version of Dorfman’s symplectic operator [D].
1.5.1. The following is a list of standard symplectic geometry notions adjusted to the case where vector fields are replaced with evolutionary vector fields and equalities are valid up to $d_{\rho/S}$-exact terms. From now on $\mathcal{M}$ is a diffiety, such as $Sol \subset J^\infty(M_{\hat\Sigma})$ or $J^\infty(M_{\hat\Sigma/S})$. The relation $[\delta, d_{\rho/S}] = 0$, see Sect. 1.3, implies that $\delta$ descends to a differential
\[ \bar\delta : \bar\Omega^{\bullet,\bullet}_{\mathcal{M}/S} \to \bar\Omega^{\bullet,\bullet+1}_{\mathcal{M}/S}, \quad \text{where } \bar\Omega^{\bullet,\bullet}_{\mathcal{M}/S} = \Omega^{\bullet,\bullet}_{\mathcal{M}/S} / d_{\rho/S}\, \Omega^{\bullet-1,\bullet}_{\mathcal{M}/S}. \tag{1.5.1} \]
Elements of the quotient complex $\bar\Omega^{\bullet,\bullet}_{\mathcal{M}/S}$ are often referred to as functional forms [Ol]. We will call $\bar\omega \in H^0(\mathcal{M}, \bar\Omega^{d-1,2}_{\mathcal{M}/S})$ a functional pre-symplectic form if
\[ \bar\delta \bar\omega = 0. \tag{1.5.2} \]
Note that if $\xi$ is an evolutionary vector field, then $\mathrm{Lie}_\xi$ and $\iota_\xi$ are well-defined operators acting on the quotient complex $\bar\Omega^{\bullet,\bullet}_{\mathcal{M}/S}$ thanks to (1.4.3–4). An evolutionary vector field $\xi \in \mathrm{Evol}(\mathcal{M})_{/S}$ is called Hamiltonian if
\[ \mathrm{Lie}_\xi \bar\omega = 0. \tag{1.5.3} \]
Call a functional form $\bar F \in \bar\Omega^{d-1,0}_{\mathcal{M}/S}$ admissible relative to $\bar\omega$ if there is an evolutionary vector field $\xi_{\bar F}$ such that
\[ \bar\delta \bar F = \iota_{\xi_{\bar F}} \bar\omega. \tag{1.5.4} \]
Note that (1.5.4) implies that $\xi_{\bar F}$ is Hamiltonian, which prompts one to think of $\bar F$ as a Hamiltonian associated to $\xi_{\bar F}$. Hence the following bit of notation:
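Definition 1.5.1.1. Let $\mathcal{H}^{\bar\omega}_{\mathcal{M}/S}$ be the sheaf of all functional (d − 1, 0)-forms that are admissible relative to $\bar\omega$.
Note that although an admissible $\bar F$ does not determine $\xi_{\bar F}$ uniquely it does so up to the kernel of $\bar\omega$: any two such vector fields $\xi_{\bar F}$, $\eta_{\bar F}$ satisfy
\[ \iota_{\xi_{\bar F} - \eta_{\bar F}} \bar\omega = 0. \tag{1.5.5} \]
For any admissible $\bar F$, $\bar G$ define the bracket
\[ \{\bar F, \bar G\} = \xi_{\bar F} \bar G. \tag{1.5.6} \]
Equivalently,
\[ \{\bar F, \bar G\} = \iota_{\xi_{\bar F}} \bar\delta \bar G = \iota_{\xi_{\bar F}} \iota_{\xi_{\bar G}} \bar\omega, \tag{1.5.7} \]
which shows, by virtue of (1.5.5), that (1.5.6) is independent of the choice of $\xi_{\bar F}$. Next,
\[ \{\bar F, \bar G\} \in \mathcal{H}^{\bar\omega}_{\mathcal{M}/S}. \tag{1.5.8} \]
In fact,
\[ \bar\delta \{\bar F, \bar G\} = \iota_{[\xi_{\bar F}, \xi_{\bar G}]} \bar\omega, \tag{1.5.9} \]
as the following computation (based on a repeated use of (1.5.2–3)) shows:
\[ \bar\delta \{\bar F, \bar G\} = \bar\delta(\xi_{\bar F} \bar G) = \bar\delta\, \iota_{\xi_{\bar F}} \iota_{\xi_{\bar G}} \bar\omega = \mathrm{Lie}_{\xi_{\bar F}} \iota_{\xi_{\bar G}} \bar\omega - \iota_{\xi_{\bar F}} \mathrm{Lie}_{\xi_{\bar G}} \bar\omega + \iota_{\xi_{\bar F}} \iota_{\xi_{\bar G}} \bar\delta \bar\omega = \iota_{[\xi_{\bar F}, \xi_{\bar G}]} \bar\omega + \iota_{\xi_{\bar G}} \mathrm{Lie}_{\xi_{\bar F}} \bar\omega = \iota_{[\xi_{\bar F}, \xi_{\bar G}]} \bar\omega. \]
Therefore, we obtain the map
\[ \{\cdot, \cdot\} : \mathcal{H}^{\bar\omega}_{\mathcal{M}/S} \times \mathcal{H}^{\bar\omega}_{\mathcal{M}/S} \to \mathcal{H}^{\bar\omega}_{\mathcal{M}/S}, \qquad (\bar F, \bar G) \mapsto \{\bar F, \bar G\} = \xi_{\bar F} \bar G. \tag{1.5.10} \]
Proposition 1.5.2. Map (1.5.10) makes $\mathcal{H}^{\bar\omega}_{\mathcal{M}/S}$ into a sheaf of Lie algebras.
Proof. The antisymmetry of (1.5.10) is an immediate consequence of (1.5.7). The Jacobi identity is proved as follows, with $\xi = \xi_F$ and $\eta = \xi_G$: $\{\{F, G\}, H\} = [\xi, \eta]H = \xi\eta H - \eta\xi H = \{F, \{G, H\}\} - \{G, \{F, H\}\}$, where the first equality is a consequence of (1.5.9).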
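Two steps used above are short enough to spell out. The sketch below assumes, as in the proof of Corollary 1.3.2, that the Lie derivative on functional forms is $\mathrm{Lie}_\xi = [\delta, \iota_\xi] = \bar\delta \iota_\xi + \iota_\xi \bar\delta$ and that contractions with two vector fields anti-commute; both are standard conventions but are stated here as assumptions.

```latex
% Assumptions (not stated explicitly in the text):
% Lie_\xi = \bar\delta \iota_\xi + \iota_\xi \bar\delta, and
% \iota_\xi \iota_\eta = - \iota_\eta \iota_\xi.

% (a) An admissible \bar F has a Hamiltonian vector field, by (1.5.4) and (1.5.2):
\mathrm{Lie}_{\xi_{\bar F}} \bar\omega
  = \bar\delta\, \iota_{\xi_{\bar F}} \bar\omega + \iota_{\xi_{\bar F}} \bar\delta \bar\omega
  = \bar\delta\, \bar\delta \bar F + 0
  = 0 .

% (b) Antisymmetry of the bracket, by (1.5.7):
\{\bar F, \bar G\}
  = \iota_{\xi_{\bar F}} \iota_{\xi_{\bar G}} \bar\omega
  = - \iota_{\xi_{\bar G}} \iota_{\xi_{\bar F}} \bar\omega
  = - \{\bar G, \bar F\} .
```

Step (a) is the remark following (1.5.4), and step (b) is the antisymmetry claimed at the start of the proof.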
1.5.3. The absolute and relative versions of this construction can be compared. Indeed, by virtue of (1.4.5), morphism of bi-complexes (1.3.4) induces a morphism of the Lie algebra sheaves
\[ \mathcal{H}^{\bar\omega}_{\mathcal{M}} \to \mathcal{H}^{\bar\omega}_{\mathcal{M}/\Upsilon}, \tag{1.5.11} \]
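which in terms of local coordinates amounts to
\[ d\tau \mapsto 0, \tag{1.5.12} \]
cf. (1.3.6).
1.5.4. Example 1. Canonical commutation relations. Replace M as the target space with $T^*M$ and consider $J^\infty(T^*M_{\hat\Sigma/\Upsilon})$ with $d = \dim \hat\Sigma = 2$. There arises the projection
\[ \pi : J^\infty(T^*M_{\hat\Sigma/\Upsilon}) \to T^*M_{\hat\Sigma/\Upsilon}. \tag{1.5.13} \]
Let $\omega^o$ be the canonical symplectic form on $T^*M$ and $\omega = \omega^o \wedge d\sigma$; the latter is a (1, 2)-form on $T^*M_{\hat\Sigma/\Upsilon}$ (we are taking advantage of coordinates (1.1.2–4)). There arises then $\pi^*\omega$, the pull-back of $\omega$ to $J^\infty(T^*M_{\hat\Sigma/\Upsilon})$ under (1.5.13). Let us now compute $\mathcal{H}^{\pi^*\omega}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}$ defined in 1.5.1.1.
Lemma 1.5.4.1.
\[ \mathcal{H}^{\pi^*\omega}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})} = \bar\Omega^{1,0}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}. \tag{1.5.14} \]
Informally speaking, this lemma says that in this case any function is admissible, hence $\pi^*\omega$ is as good as symplectic, the fact that we alluded to in the beginning of Sect. 1.5.
Proof. Note that, $\omega^o$ being non-degenerate, any section of $\Omega^{1,1}_{T^*M_{\hat\Sigma/\Upsilon}}$ can be written as $\iota_{\xi^o}\omega$ for some vector field $\xi^o$ on $T^*M_{\hat\Sigma}$. Pulling back on $J^\infty(T^*M_{\hat\Sigma/\Upsilon})$, one sees that likewise any section of $\pi^*\Omega^{1,1}_{T^*M_{\hat\Sigma/\Upsilon}}$ can be written as $\iota_{\xi^o}\pi^*\omega$, where $\xi^o$ is now a vector field on $J^\infty(T^*M_{\hat\Sigma/\Upsilon})$, which is locally a linear combination of $\delta/\delta x^i$, $x^i$ being local coordinates on $T^*M$. Thinking of $\xi^o$ as a characteristic, one can prolong it to an evolutionary vector field $\xi$, as in (1.2.2), and thus obtain
\[ \pi^*\Omega^{1,1}_{T^*M_{\hat\Sigma/\Upsilon}} = \{\iota_\xi \pi^*\omega,\ \xi \in \mathrm{Evol}(J^\infty(T^*M_{\hat\Sigma/\Upsilon}))\}. \tag{1.5.15} \]
Now observe that $\Omega^{1,1}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}$ is generated, as a $\mathcal{D}_{\hat\Sigma/\Upsilon}$-module, by $\pi^*\Omega^{1,1}_{T^*M_{\hat\Sigma/\Upsilon}}$. Hence (1.5.15) holds true for the entire $\Omega^{1,1}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}$ modulo $d_{\rho/\Upsilon}$-exact terms, i.e.,
\[ \Omega^{1,1}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})} = \{\iota_\xi \pi^*\omega + d_{\rho/\Upsilon}\beta,\ \xi \in \mathrm{Evol}(J^\infty(T^*M_{\hat\Sigma/\Upsilon})),\ \beta \in \Omega^{0,1}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}\} \tag{1.5.16} \]
and (1.5.14) follows.
Because of its importance, the sheaf of Lie algebras arising in this way will be denoted thus
\[ \mathcal{H}^{can} \overset{\mathrm{def}}{=} \mathcal{H}^{\pi^*\omega}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})} = \bar\Omega^{1,0}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}. \tag{1.5.17} \]
Computationally, the gist of our discussion is as follows. The algebra of functions on the cotangent bundle with the canonical Poisson bracket is a Lie subalgebra of $\mathcal{H}^{can}$:
\[ \pi^\# : \bigl(\pi^{-1}\mathcal{O}_{T^*M_{\hat\Sigma/\Upsilon}}\, d\sigma,\ \{\cdot,\cdot\}_{T^*M}\bigr) \hookrightarrow \mathcal{H}^{can}, \tag{1.5.18} \]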
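and the rest of the Lie algebra structure is determined by
\[ \{F d\sigma, GH\, d\sigma\} = \{F d\sigma, G d\sigma\} H + G \{F d\sigma, H d\sigma\}, \tag{1.5.19a} \]
\[ \{F d\sigma, G \rho(\partial_\sigma) H\, d\sigma\} = \{F d\sigma, G d\sigma\} \rho(\partial_\sigma) H + G \rho(\partial_\sigma) \{F d\sigma, H d\sigma\}, \tag{1.5.19b} \]
because an evolutionary vector field is a derivation commuting with $\rho(\partial_\sigma)$, see e.g. (1.4.4). To see what all of this means, let us compute some brackets. Let F, G be functions on $M_{\hat\Sigma/\Upsilon}$, and $\xi$, $\eta$ vector fields on $M_{\hat\Sigma}$ vertical w.r.t. $M_{\hat\Sigma} \to \hat\Sigma$, which we regard as fiberwise linear functions on $T^*M_{\hat\Sigma}$. Then,
\[ \{F d\sigma, G d\sigma\} = 0, \tag{1.5.20a} \]
\[ \{\xi d\sigma, G d\sigma\} = \xi G\, d\sigma, \tag{1.5.20b} \]
\[ \{\xi d\sigma, \eta d\sigma\} = [\xi, \eta]\, d\sigma. \tag{1.5.20c} \]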
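In local coordinates, (1.5.20a–c) reduce to the canonical Poisson bracket on $T^*M$, through (1.5.18). The sketch below assumes that F, G, $\xi$, $\eta$ do not depend on $\sigma$, writes $x_i$ for the fiberwise linear coordinate dual to $x^i$, and fixes the sign of the bracket so that $\{x_i, x^j\} = \delta_i^j$; this normalization is an assumption made for the illustration.

```latex
% Assumption (not from the text): the canonical bracket on T^*M is normalized as
\{A, B\}_{T^*M}
  = \frac{\partial A}{\partial x_i}\frac{\partial B}{\partial x^i}
  - \frac{\partial A}{\partial x^i}\frac{\partial B}{\partial x_i},
% and vector fields are regarded as fiberwise linear functions:
\xi = \xi^i(x)\, x_i, \qquad \eta = \eta^j(x)\, x_j .

% Functions on M do not depend on the x_i:
\{F, G\}_{T^*M} = 0 .

% A vector field acts on a function by differentiation:
\{\xi, G\}_{T^*M} = \xi^i \frac{\partial G}{\partial x^i} = \xi G .

% Two vector fields give the commutator:
\{\xi, \eta\}_{T^*M}
  = \xi^i \frac{\partial \eta^j}{\partial x^i}\, x_j
  - \eta^j \frac{\partial \xi^i}{\partial x^j}\, x_i
  = [\xi, \eta]^j\, x_j .
```

With the opposite sign convention for the bracket all three right-hand sides change sign together, so the structure of (1.5.20a–c) does not depend on this choice.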
The first instance of the bracket jet nature manifesting itself is as follows. If $F_i\, dx^i$ is a 1-form on M, then $\alpha = F_i(x) \rho(\partial_\sigma) x^i$ is a well-defined (0, 0)-form on $J^\infty(T^*M_{\hat\Sigma})$. Having thus embedded $\Omega^1_M$ into $\Omega^{0,0}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}$, one uses (1.5.19a–b) to obtain
\[ \{\xi d\sigma, \alpha d\sigma\} = (\mathrm{Lie}_\xi \alpha)\, d\sigma, \tag{1.5.20d} \]
if $\xi$ does not depend on $\sigma$ explicitly, and
\[ \{\xi d\sigma, \alpha d\sigma\} = (\mathrm{Lie}_\xi \alpha + \iota_{\partial_\sigma \xi} \alpha)\, d\sigma \tag{1.5.20e} \]
in general, where $\mathrm{Lie}_\xi \alpha$ is the Lie derivative of $\alpha$ along $\xi$. Formulas (1.5.20a–d), without functions explicitly depending on $\tau$, $\sigma$, are a familiar definition of the Lie algebra associated with the Courant algebroid on $TM \oplus T^*M$. The idea that the Courant algebroid has infinite dimensional nature apparently goes back to I. Dorfman [Dor]. It was revived recently, in a slightly different context, by P. Bressler [Bre].
Note that identities (1.5.19a,b) seem to incorporate the Leibnitz identity, which they do not, because $\mathcal{H}^{can}$ is not an associative algebra. It is, however, a quotient of $\Omega^{1,0}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}$, and the latter is. In fact, $\Omega^{1,0}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}$ is a sheaf of vertex Poisson algebras, and its quotient $\mathcal{H}^{can}$ is canonically associated to its sheaf of Lie algebras, see Proposition 2.7.3 below.
1.5.5. Example 2. The solution space of an order 2 system in the pseudo-Cauchy-Kovalevskaya form. Let us place ourselves in the situation of 1.4 and let Sol satisfy (1.4.2), i.e.,
\[ Sol \xrightarrow{\;\sim\;} J^\infty(TM_{\hat\Sigma/\Upsilon}). \]
The latter does not carry any canonical 2-form, but let us fix a diffeomorphism
\[ g : TM \xrightarrow{\;\sim\;} T^*M, \]
which in practice is most often defined by a metric on M. It is lifted, uniquely, to a diffeomorphism of $\mathcal{D}_{\hat\Sigma/S}$-manifolds (for any base S)
\[ g : J^\infty(TM_{\hat\Sigma/S}) \xrightarrow{\;\sim\;} J^\infty(T^*M_{\hat\Sigma/S}), \tag{1.5.21} \]
hence a diffeomorphism
\[ g : Sol \xrightarrow{\;\sim\;} J^\infty(T^*M_{\hat\Sigma/\Upsilon}), \]
and a sheaf isomorphism
\[ g^\# : \Omega^{\bullet,\bullet}_{Sol/\Upsilon} \xrightarrow{\;\sim\;} g^{-1}\, \Omega^{\bullet,\bullet}_{J^\infty(T^*M_{\hat\Sigma/\Upsilon})}, \tag{1.5.22} \]
where $g^{-1}$ stands for the inverse image in the category of sheaves of vector spaces. Let $g^*\omega$ be the symplectic form on TM obtained by pulling back the canonical symplectic form $\omega$ on $T^*M$. We have arrived at
Lemma 1.5.5.1. The mapping (1.5.22) descends to an isomorphism of Lie algebra sheaves
\[ g^\# : \mathcal{H}^{g^*\omega}_{Sol/\Upsilon} \xrightarrow{\;\sim\;} g^{-1} \mathcal{H}^{can}. \]
1.6. Calculus of variations and integrals of motion. Bosonic σ-model. Calculus of variations is the principal source of the brackets discussed in 1.4–5.
1.6.1. An action $\mathcal{A}$ is a global section, cf. (1.5.1),
\[ \mathcal{A} \in \Gamma\bigl(J^\infty(M_{\hat\Sigma}),\ \bar\Omega^{d,0}_{J^\infty(M_{\hat\Sigma})}\bigr). \tag{1.6.0} \]
It can be represented by a Lagrangian which is a collection of sections
\[ L = \{L^{(i)} \in \Gamma(U_i, \Omega^{d,0}_{J^\infty(M_{\hat\Sigma})}) \ \text{s.t.}\ L^{(j)} - L^{(i)} \in \mathrm{Im}\, d_\rho \text{ on } U_j \cap U_i\}, \tag{1.6.1a} \]
determined up to a transformation
\[ L^{(i)} \to L^{(i)} + d_\rho \beta^{(i)}, \tag{1.6.1b} \]
where $\{U_i\}$ is an open covering of $J^\infty(M_{\hat\Sigma})$. Choosing local coordinates one observes that
\[ \delta L^{(i)} = -d_\rho \gamma^{(i)} + E^{(i)}_j \delta x^j, \tag{1.6.2} \]
for some $\gamma^{(i)} \in \Omega^{d-1,1}_{J^\infty(M_{\hat\Sigma})}$, known as a variational 1-form, and some $E^{(i)}_j \in \Omega^{d,0}_{J^\infty(M_{\hat\Sigma})}$.
Since representation (1.6.2) is unique [Di,T], and transformation (1.6.1b) leaves $E^{(i)}_j$ unaffected (because $[\delta, d_\rho] = 0$, see 1.3), associated to the action $\mathcal{A}$ there arises the sheaf of Euler-Lagrange ideals
\[ \mathcal{J}_L = \langle \mathcal{D}_{\hat\Sigma} E_1, \mathcal{D}_{\hat\Sigma} E_2, \dots, \mathcal{D}_{\hat\Sigma} E_n \rangle \subset \mathcal{O}_{J^\infty(M_{\hat\Sigma})}. \tag{1.6.3} \]
Let $Sol_L \subset J^\infty(M_{\hat\Sigma})$ be the corresponding zero locus. We will assume that $\mathcal{J}_L$ is of pseudo-Cauchy-Kovalevskaya type, and usually (1.4.2) will hold.
The variational 1-form $\gamma^{(i)}$ is not quite uniquely defined, but due to the well-known acyclicity theorem [T,Di], locally it is determined up to a $d_\rho$-exact term. Therefore, the variational 2-form $\omega^{(i)} = \delta\gamma^{(i)}$ unambiguously defines a section of the quotient sheaf $\bar\Omega^{d-1,2}_{J^\infty(M_{\hat\Sigma})}$ over $U_i$. Since transformation (1.6.1b) leaves it invariant, there arises
\[ \bar\omega_L \overset{\mathrm{def}}{=} \{U_i \mapsto \omega^{(i)}\} \in \Gamma\bigl(J^\infty(M_{\hat\Sigma}),\ \bar\Omega^{d-1,2}_{J^\infty(M_{\hat\Sigma})}\bigr). \tag{1.6.4} \]
By construction, $\bar\omega_L$ satisfies (1.5.2); hence on $Sol_L$ there arises the sheaf of Lie algebras $\mathcal{H}^{\bar\omega_L}_{Sol_L}$, see Proposition 1.5.2. Our task now is to detect inside it a subalgebra of integrals of motion. As we have seen already, the nature of the argument tends to be purely local, and until further notice it will be assumed that $L \in \Gamma(J^\infty(M_{\hat\Sigma}), \Omega^{d,0}_{J^\infty(M_{\hat\Sigma})})$.
1.6.2. A symmetry of L is an evolutionary vector field $\xi$ s.t.
\[ \mathrm{Lie}_\xi L = d_\rho \alpha_\xi, \tag{1.6.5} \]
for some $\alpha_\xi \in \Omega^{d-1,0}_{J^\infty(M_{\hat\Sigma})}$. Denote by $\mathrm{Sym}_L$ the set of all symmetries of L; it is naturally a Lie algebra. It is easy to derive from (1.6.5) that any $\xi \in \mathrm{Sym}_L$ preserves $\mathcal{J}_L$, see [Di], hence defines a vector field on $Sol_L$, to be denoted $\bar\xi$. Let $\overline{\mathrm{Sym}}_L$ be the Lie algebra of all such vector fields. An integral of motion of L is an $\bar F \in \Gamma(J^\infty(M_{\hat\Sigma}), \bar\Omega^{d-1,0}_{J^\infty(M_{\hat\Sigma})})$ s.t.
\[ d_\rho \bar F|_{Sol_L} = 0. \tag{1.6.6} \]
Let
\[ I_L = \{\bar F \in \Gamma(J^\infty(M_{\hat\Sigma}), \bar\Omega^{d-1,0}_{J^\infty(M_{\hat\Sigma})}) \ \text{s.t.}\ d_\rho \bar F|_{Sol_L} = 0\} \tag{1.6.7a} \]
and
\[ \tilde I_L = I_L|_{Sol_L} \subset \Gamma(Sol_L, \bar\Omega^{d-1,0}_{Sol_L}). \tag{1.6.7b} \]
If $\xi$ is a symmetry of L with characteristic $\{Q^j\}$, see (1.2.2), then the computation
\[ d_\rho \alpha_\xi = \iota_\xi \delta L \overset{(1.6.2)}{=} -\iota_\xi d_\rho \gamma + E_j Q^j \overset{(1.3.7)}{=} d_\rho \iota_\xi \gamma + E_j Q^j \tag{1.6.8} \]
shows that $\alpha_\xi - \iota_\xi \gamma$ is an integral of motion. The form $\alpha_\xi$ being determined by $\xi$ up to a $d_\rho$-exact term, (1.6.8) defines maps
\[ \mathrm{Sym}_L \to I_L, \qquad \xi \mapsto \alpha_\xi - \iota_\xi \gamma \tag{1.6.9} \]
and, by restriction,
\[ \overline{\mathrm{Sym}}_L \to \tilde I_L. \tag{1.6.10} \]
Noether’s Theorem 1.6.3. ([Di,Ol]). Map (1.6.9) is a surjection, map (1.6.10) is an isomorphism.
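The simplest instance of the map (1.6.9) can be worked out by hand. The sketch below assumes d = 1 (so the world sheet is a line with coordinate $\tau$), $M = \mathbb{R}$, and the free-particle Lagrangian; these choices and the shorthand $x_\tau = \rho(\partial_\tau)x$, $x_{\tau\tau} = \rho(\partial_\tau)^2 x$ are illustrative assumptions, not taken from the text.

```latex
% Assumptions (not from the text): d = 1, M = \mathbb{R}, free particle.
L = \tfrac{1}{2}\, x_\tau^{2}\, d\tau, \qquad
\delta L = x_\tau\, \delta x_\tau \wedge d\tau .

% Variational 1-form and Euler-Lagrange term, using (1.3.1) and (1.3.3):
\gamma = x_\tau\, \delta x, \qquad
d_\rho \gamma = -x_{\tau\tau}\, \delta x \wedge d\tau - x_\tau\, \delta x_\tau \wedge d\tau ,
\qquad
\delta L = -d_\rho \gamma - x_{\tau\tau}\, \delta x \wedge d\tau ,
% so Sol_L is cut out by x_{\tau\tau} = 0 and its total derivatives.

% Translation x -> x + const: characteristic Q = 1, prolonged by (1.2.2) to
\xi = \frac{\delta}{\delta x}, \qquad
\mathrm{Lie}_\xi L = \iota_\xi\, \delta L = 0, \qquad \alpha_\xi = 0 .

% The integral of motion assigned by (1.6.9):
F_\xi = \alpha_\xi - \iota_\xi \gamma = -x_\tau , \qquad
d_\rho F_\xi = -x_{\tau\tau}\, d\tau , \qquad
d_\rho F_\xi \big|_{Sol_L} = 0 .
```

Up to sign this is the momentum of the particle, conserved on solutions as (1.6.6) requires.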
Therefore, $\tilde I_L$ inherits a Lie algebra structure from $\overline{\mathrm{Sym}}_L$. Let us now show that this Lie algebra structure is consistent with that on the sheaf $\mathcal{H}^{\bar\omega_L}$.
Lemma 1.6.4. $\tilde I_L$ is a Lie subalgebra of $\Gamma(Sol_L, \mathcal{H}^{\bar\omega_L}_{Sol_L})$ such that (1.6.10) is a Lie algebra isomorphism.
It is this Lie algebra that is often referred to as the algebra of integrals of motion or current algebra.
Proof. It is known, see e.g. [Di] 19.6.17 or [DF] Proposition 2.76, that if $\xi$ is a symmetry of L such that (1.6.5) holds, then, upon restricting to $Sol_L$,
\[ \mathrm{Lie}_\xi \gamma = \delta\alpha_\xi + d_\rho \beta \]
for some $\beta$. An application of $\delta$ to both sides of this equality shows that $\xi$ is Hamiltonian, see (1.5.3). The corresponding integral of motion $F_\xi = \alpha_\xi - \iota_\xi \gamma$ is admissible because
\[ \delta F_\xi = \delta\alpha_\xi - \delta\iota_\xi\gamma = \delta\alpha_\xi - \mathrm{Lie}_\xi\gamma + \iota_\xi\delta\gamma = -d_\rho\beta + \iota_\xi\omega_L. \]
Hence $\tilde I_L \subset \Gamma(Sol_L, \mathcal{H}^{\bar\omega_L}_{Sol_L})$. Furthermore, the line above shows that modulo $d_\rho$,
\[ \bar\delta \bar F_\xi = \iota_\xi \bar\omega_L. \]
Hence the bracket of two integrals of motion induced by the Lie algebra structure on $\mathcal{H}^{\bar\omega_L}_{Sol_L}$, see (1.5.10), is as follows:
\[ \{\bar F_\xi, \bar F_\eta\} = \xi \bar F_\eta, \]
which is also an integral of motion, because due to (1.4.4), $d_\rho \xi G = \xi d_\rho G = 0$. The corresponding symmetry of L is, of course, $[\xi, \eta]$, which completes the proof.
1.6.5. Let us now drop the requirement that L be globally defined. The exposition above has to be altered a little. An evolutionary vector field is a symmetry of L, see (1.6.1), if it is a symmetry of each $L^{(i)}$:
\[ \mathrm{Lie}_\xi L^{(i)} = d_\rho \alpha^{(i)}_\xi. \]
There may arise discrepancies $\alpha^{(i)}_\xi - \alpha^{(j)}_\xi$ on double intersections $U_i \cap U_j$, but (1.6.1a) and (1.4.4) ensure that they are $d_\rho$-exact. Therefore, while the collection
\[ \{F^{(i)}_\xi = \alpha^{(i)}_\xi - \iota_\xi \gamma^{(i)}\} \]
does not define a global section of $\Omega^{d-1,0}_{J^\infty(M_{\hat\Sigma})}$, taken modulo $d_\rho$ it defines a global section of $\bar\Omega^{d-1,0}_{J^\infty(M_{\hat\Sigma})}$. The rest of the discussion in 1.6.2–4 goes through unchanged, and we obtain
Corollary 1.6.6. Lemma 1.6.4 holds true for any Lagrangian (1.6.1a,b).
Along with $\mathcal{H}^{\bar\omega_L}_{Sol}$, there is its relative version, $\mathcal{H}^{\bar\omega_L}_{Sol/\Upsilon}$, and the Lie algebra sheaf morphism
\[ \mathcal{H}^{\bar\omega_L}_{Sol} \to \mathcal{H}^{\bar\omega_L}_{Sol/\Upsilon} \]
defined in (1.5.11), which seems to be neither surjection nor injection, generally speaking.
Lemma 1.6.7. If $Sol_L$ satisfies (1.4.1), then the composition
\[ \tilde I_L \hookrightarrow \Gamma(Sol, \mathcal{H}^{\bar\omega_L}_{Sol}) \to \Gamma(Sol, \mathcal{H}^{\bar\omega_L}_{Sol/\Upsilon}) \]
is an injection.
Proof. Assume that $\tilde F \in \tilde I_L$ is annihilated by the composite map. This means that if $F \in \Omega^{d-1,0}_{Sol_L}$ is a representative of $\tilde F$, then $F = F^o \wedge d\tau$, and $d_{\rho/\Upsilon} F^o = 0$. Due to the Takens acyclicity theorem [T] (applicable thanks to (1.4.1)), $F^o = d_{\rho/\Upsilon} G$ for some G. Therefore, $F = \pm d_\rho(G \wedge d\tau)$ and $\tilde F = 0$, as desired.
Now we would like to explain that for an important class of Lagrangians, the sheaf $\mathcal{H}^{\bar\omega_L}_{Sol/\Upsilon}$ is isomorphic to the canonical $\mathcal{H}^{can}$ defined in (1.5.17) and exhibit some concrete Lie algebras of integrals of motion.
1.6.8. Order 1 Lagrangians and the Legendre transform. Let us assume that the Lagrangian L depends only on 1-jets of the coordinates $x^j$. If we let
\[ L = \tilde L\, d\sigma^0 \wedge \cdots \wedge d\sigma^{d-1}, \tag{1.6.11} \]
then (1.6.2) becomes
\[ \delta L = -d_\rho\Bigl( (-1)^{p}\, \frac{\partial \tilde L}{\partial(\partial_{\sigma^p} x^j)}\, \delta x^j \wedge d\sigma^0 \wedge \cdots \wedge \widehat{d\sigma^p} \wedge \cdots \wedge d\sigma^{d-1} \Bigr) + \Bigl( \frac{\partial \tilde L}{\partial x^j} - \partial_{\sigma^p} \frac{\partial \tilde L}{\partial(\partial_{\sigma^p} x^j)} \Bigr)\, \delta x^j \wedge d\sigma^0 \wedge \cdots \wedge d\sigma^{d-1}, \tag{1.6.12} \]
where $\widehat{\ \cdot\ }$ means that the term is omitted and summation w.r.t. repeated indices is assumed.
Assume now that on $\hat\Sigma$ there is a distinguished coordinate, say $\tau = \sigma^0$, such that L is a convex function of jets of coordinates in the $\tau$-direction. It follows then that $Sol_L$ satisfies (1.4.2). Applying (1.3.4) to $\gamma$ we obtain
\[ \gamma := \gamma|_{d\tau = 0} = \frac{\partial \tilde L}{\partial(\partial_\tau x^j)}\, \delta x^j \wedge d\sigma^1 \wedge \cdots \wedge d\sigma^{d-1}. \]
Note that, as a function of $\partial_\tau x^j$, $\tilde L$ is canonically a function on the tangent space TM. It follows that $\gamma$ is unambiguously a 1-form on TM. The convexity of L implies that the Legendre transform
\[ d_{TM}\tilde L : TM \to T^*M \tag{1.6.13} \]
is a diffeomorphism. A moment’s thought shows that $\gamma$ is the pull-back of the canonical 1-form on $T^*M$ w.r.t. $d_{TM}\tilde L$, which places us in the situation of Lemma 1.5.5.1. In a coordinate form, we have: if $x^j$ are coordinates on M, and $x_j = \partial/\partial x^j$ are fiberwise linear functions on $T^*M$, then
\[ (d_{TM}\tilde L)^\#(x^j) = x^j, \qquad (d_{TM}\tilde L)^\#(x_j) = \frac{\partial \tilde L}{\partial(\partial_\tau x^j)}, \]
and
\[ \gamma = (d_{TM}\tilde L)^\#\bigl( x_j\, \delta x^j \wedge d\sigma^1 \wedge \cdots \wedge d\sigma^{d-1} \bigr), \qquad \omega_L = \delta\gamma = (d_{TM}\tilde L)^\#\bigl( \delta x_j \wedge \delta x^j \wedge d\sigma^1 \wedge \cdots \wedge d\sigma^{d-1} \bigr), \tag{1.6.14} \]
are the pull-backs of the canonical degenerate symplectic form. Hence Lemmas 1.5.5.1 and 1.6.7 specialized to the present situation read as follows.
Lemma 1.6.8.1. If L depends only on the 1-jets of coordinates and is convex, then in the case where d = 2, there are the following Lie algebra (sheaf) morphisms:
\[ \mathcal{H}^{\bar\omega_L}_{Sol/\Upsilon} \xrightarrow{\;\sim\;} (d_{TM}\tilde L)^{-1} \mathcal{H}^{can}, \qquad \tilde I_L \hookrightarrow \Gamma(M, \mathcal{H}^{\bar\omega_L}_{Sol/\Upsilon}) \xrightarrow{\;\sim\;} \Gamma(M, \mathcal{H}^{can}). \]
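To see the Legendre transform (1.6.13) and formula (1.6.14) in a familiar case, one may take a Lagrangian density that is quadratic in the $\tau$-derivatives. The sketch below is an illustration under stated assumptions: $g_{ij}$ is a positive definite metric on M, V is an arbitrary function of $x$ and its $\sigma$-jets, and neither symbol is fixed by the text at this point.

```latex
% Assumptions (not from the text): g_{ij}(x) positive definite, V arbitrary.
\tilde L = \tfrac{1}{2}\, g_{ij}(x)\, \partial_\tau x^i\, \partial_\tau x^j - V(x, \partial_\sigma x) .

% Legendre transform in coordinates: the fiberwise linear function x_j pulls back to
(d_{TM}\tilde L)^{\#}(x_j)
  = \frac{\partial \tilde L}{\partial(\partial_\tau x^j)}
  = g_{jk}\, \partial_\tau x^k ,
\qquad
(d_{TM}\tilde L)^{\#}(x^j) = x^j .

% Convexity: the Hessian in the \tau-velocities is g_{ij}, which is invertible,
% so the map TM -> T^*M has the inverse
\partial_\tau x^k = g^{kj}\, x_j .

% The restricted variational 1-form, as in (1.6.14):
\gamma = g_{jk}\, \partial_\tau x^k\, \delta x^j \wedge d\sigma^1 \wedge \cdots \wedge d\sigma^{d-1} .
```

Here the diffeomorphism $TM \to T^*M$ is the one defined by the metric, which is the situation described before (1.5.21).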
1.6.8.2. This lemma explains the universality of $\mathcal{H}_{can}$. One can argue, therefore, that the Lie algebra content of the “theory” is independent of the Lagrangian. What captures the properties of an individual Lagrangian is the subalgebra of integrals of motion. For example, if $L$ is independent of $\tau$, the intrinsic time, then $\rho(\partial_\tau)$ is a symmetry of $L$, and (1.6.9) produces the corresponding integral of motion as follows: since
$$\rho(\partial_\tau)L = d_\rho\left(\tilde L\, d\sigma^1 \wedge \cdots \wedge d\sigma^{d-1}\right),$$
the corresponding integral of motion, upon restriction to the fibers of $Sol_L \to \Sigma$, becomes
$$H_{\rho(\partial_\tau)} = \alpha_{\rho(\partial_\tau)} - \iota_{\rho(\partial_\tau)}\gamma = \left(\tilde L - \frac{\partial \tilde L}{\partial(\partial_\tau x^j)}\,\partial_\tau x^j\right) d\sigma^1 \wedge \cdots \wedge d\sigma^{d-1}, \tag{1.6.15}$$
which is the familiar energy function, of course.

1.6.9. Bosonic string, left/right movers, and a rudiment of generalized geometry. Let $M$ be a Riemannian manifold with metric $(.,.)$, and let $\Sigma$ be 2-dimensional with coordinates $\tau$ and $\sigma$. By definition, a point in $J^1(M_\Sigma)$ is a triple $(t, x, \partial x)$, where $t \in \Sigma$, $x \in M$, and $\partial x$ is a linear map
$$\partial x : T_t\Sigma \to T_x M, \quad \xi \mapsto \partial_\xi x.$$
This makes sense out of the symbol $(\partial_\xi x, \partial_\eta x)$ as a function on $J^1(M_\Sigma)$. The following
$$L = \frac{1}{2}\left((\partial_\sigma - \partial_\tau)x, (\partial_\sigma + \partial_\tau)x\right) d\sigma \wedge d\tau \tag{1.6.16}$$
Lagrangian Approach to Sheaves of Vertex Algebras
505
is then a well-defined Lagrangian, the celebrated $\sigma$-model Lagrangian. In terms of local coordinates $x^1, \dots, x^n$ s.t. $(.,.) = g_{ij}\,dx^i dx^j$, it looks as follows:
$$L = \frac{1}{2}\left(g_{ij}\,\partial_\sigma x^i \partial_\sigma x^j - g_{ij}\,\partial_\tau x^i \partial_\tau x^j\right) d\sigma \wedge d\tau.$$
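A direct computation shows that
$$\delta L = -d_\rho\left((\partial_\tau x, \delta x)\,d\sigma - (\partial_\sigma x, \delta x)\,d\tau\right) + \left(\nabla_{\partial_\tau x}\partial_\tau x - \nabla_{\partial_\sigma x}\partial_\sigma x\right) d\sigma \wedge d\tau, \tag{1.6.17}$$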
where $\nabla_{\partial_\bullet x}$ is the value of the Levi-Civita connection on $\partial_\bullet x$. It is clear that $L$ satisfies all the conditions of Lemma 1.6.8.1. The Lagrangian being independent of $\tau$ and $\sigma$, associated to $\rho(\partial_\tau)$ and $\rho(\partial_\sigma)$ there arise two integrals of motion, energy and momentum, and any linear combination thereof. But much more is true. In fact, any vector field of the type either
$$\xi^- = \frac{1}{2} f(\sigma - \tau)\,\rho(\partial_\sigma - \partial_\tau) \quad\text{or}\quad \xi^+ = \frac{1}{2} f(\sigma + \tau)\,\rho(\partial_\sigma + \partial_\tau) \tag{1.6.18}$$
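is a symmetry of $L$. Indeed, precisely because $(\partial_\sigma \pm \partial_\tau)(\sigma \mp \tau) = 0$, one has
$$\begin{aligned}
\xi^- L &= d_\rho\left(\frac{1}{4} f(\sigma - \tau)\left((\partial_\sigma - \partial_\tau)x, (\partial_\sigma + \partial_\tau)x\right)(d\sigma + d\tau)\right), \\
\xi^+ L &= -d_\rho\left(\frac{1}{4} f(\sigma + \tau)\left((\partial_\sigma - \partial_\tau)x, (\partial_\sigma + \partial_\tau)x\right)(d\sigma - d\tau)\right).
\end{aligned} \tag{1.6.19}$$
Using (1.6.9) and Lemma 1.6.7 one obtains the corresponding integrals of motion, inside $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$,
$$\begin{aligned}
F_{\xi^-} &= \frac{1}{4} f(\sigma - \tau)\left((\partial_\sigma - \partial_\tau)x, (\partial_\sigma - \partial_\tau)x\right) d\sigma, \\
F_{\xi^+} &= -\frac{1}{4} f(\sigma + \tau)\left((\partial_\sigma + \partial_\tau)x, (\partial_\sigma + \partial_\tau)x\right) d\sigma.
\end{aligned} \tag{1.6.20}$$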
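Upon applying the Legendre transform (1.6.13), which in terms of local coordinates is given by $x_i = g_{i\alpha}\partial_\tau x^\alpha$, $\partial_\tau x^i = g^{i\alpha}x_\alpha$, formulas (1.6.20) become
$$\begin{aligned}
F_{\xi^-} &= f(\sigma - \tau)\left(\frac{1}{4} g^{ij} x_i x_j + \frac{1}{4} g_{ij}\,\partial_\sigma x^i \partial_\sigma x^j - \frac{1}{2} x_j \partial_\sigma x^j\right) d\sigma, \\
F_{\xi^+} &= f(\sigma + \tau)\left(-\frac{1}{4} g^{ij} x_i x_j - \frac{1}{4} g_{ij}\,\partial_\sigma x^i \partial_\sigma x^j - \frac{1}{2} x_j \partial_\sigma x^j\right) d\sigma,
\end{aligned} \tag{1.6.21}$$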
and this computes the image of $F_{\xi^\pm}$ under the composite map of Lemma 1.6.8.1. Let
$$Vir^\pm = \mathrm{span}\{F_{\xi^\pm}\} \subset \Gamma(M, \mathcal{H}_{can}). \tag{1.6.22}$$
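The passage from (1.6.20) to (1.6.21) is a direct expansion of the two squares. The following sketch spells it out, using only the local form $x_i = g_{i\alpha}\partial_\tau x^\alpha$ of the Legendre transform and the symmetry of the metric; nothing beyond the formulas already stated is assumed.

```latex
% Expanding the quadratic forms in (1.6.20) after substituting
% \partial_\tau x^i = g^{i\alpha} x_\alpha.
\begin{align*}
(\partial_\tau x,\partial_\tau x)
  &= g_{ij}\,g^{i\alpha}x_\alpha\,g^{j\beta}x_\beta = g^{\alpha\beta}x_\alpha x_\beta,\\
(\partial_\sigma x,\partial_\tau x)
  &= g_{ij}\,\partial_\sigma x^i\,g^{j\beta}x_\beta = x_j\,\partial_\sigma x^j,\\
\tfrac14\big((\partial_\sigma\mp\partial_\tau)x,(\partial_\sigma\mp\partial_\tau)x\big)
  &= \tfrac14\,g_{ij}\,\partial_\sigma x^i\partial_\sigma x^j
     \mp\tfrac12\,x_j\,\partial_\sigma x^j
     +\tfrac14\,g^{ij}x_i x_j .
\end{align*}
```

Taking the upper sign and multiplying by $f(\sigma-\tau)\,d\sigma$ gives $F_{\xi^-}$; taking the lower sign and multiplying by $-f(\sigma+\tau)\,d\sigma$ gives $F_{\xi^+}$, in agreement with (1.6.21).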
All of this means that the space of global sections of the sheaf of Lie algebras $\mathcal{H}_{can}$ contains 2 commuting copies of the Lie algebra of vector fields on . In the case where = $S^1 \times$ , each is the centerless Virasoro algebra, hence the notation. In view of the canonical commutation relations discussed in 1.5.5, formulas (1.6.21) are 2 bosonizations of the Virasoro algebra in the quasiclassical limit. This prompts the following definitions:

Definition 1.6.9.1. (i) Denote by $\mathcal{H}^{\bar\omega_L,+}_{Sol_L/\Sigma}$ the centralizer of $Vir^-$ in $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$ and call it the right moving algebra.
(ii) Denote by $\mathcal{H}^{\bar\omega_L,-}_{Sol_L/\Sigma}$ the centralizer of $Vir^+$ in $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$ and call it the left moving algebra.
We will present a computation of the left/right moving algebra in the context of the WZW model in Sect. 2.9.2. Let us also note that $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$ contains yet another Virasoro algebra, the sum of the first two, which upon restricting to $\{\tau = 0\}$ becomes
$$Vir^o = \mathrm{span}\{F_{\xi^+} + F_{\xi^-}\} = \mathrm{span}\{f(\sigma)\,x_j \partial_\sigma x^j\, d\sigma\}. \tag{1.6.23}$$
Bosonization (1.6.23) is much simpler than (1.6.21) and was thoroughly investigated in [MSV,GMS1], but the corresponding Virasoro algebra is neither right nor left moving.

1.6.9.2. Generalized geometry interpretation. Formulas (1.6.18) admit a nice, Lagrangian free, interpretation in the spirit of Hitchin's “generalized geometry”, [G]. The idea of generalized geometry is that the tangent bundle of a manifold must be consistently replaced with the direct sum of the tangent and cotangent bundles. From this point of view, a metric on $M$ is a reduction of the structure group of $TM \oplus T^*M$ from $SO(n,n)$ to $SO(n,0) \times SO(0,n)$. Letting $\{e_i\}, \{e^j\}$ be a pair of relatively dual bases of the $SO(n,0)$-subbundle and letting $\{f_i\}, \{f^j\}$ be the same for the $SO(0,n)$-subbundle, one can form 2 invariantly defined tensors, $e_i e^i$ and $f_i f^i$. Noticing that $x_i$, in (1.6.18), is naturally identified with $\partial_{x^i}$, and $\partial_\sigma x^i$ with $dx^i$, one concludes that $Vir^+$ is generated by $e_i e^i$ and $Vir^-$ by $-f_i f^i$. To talk about these and other issues coherently, one must change gears and introduce vertex Poisson algebras.

2. Vertex Poisson Algebras

Our presentation of this well-known topic, see e.g. [FB-Z], will be a little different in the following respects. First of all, we will fix an associative commutative $\mathbb{C}$-algebra $B$ to be the ground ring for all linear algebra constructions of this section. Second of all, we will let $\mathfrak{g} = \mathrm{Der}\,B$ and demand that all the structures be $\mathfrak{g}$-equivariant. These assumptions are intended to handle functions of $\tau$ and $\sigma$ should they appear. Therefore, two examples to be kept in mind are these:
$$B = C^\infty(\Sigma),\ \mathfrak{g} = \mathcal{T}_\Sigma(\Sigma) \quad\text{or}\quad B = \mathbb{C},\ \mathfrak{g} = 0. \tag{2.1}$$
The case at hand, where $M_\Sigma = M \times \Sigma$, is rather special, and we could have avoided including $B$ and $\mathfrak{g}$ as part of data (which is customary in works on vertex algebras), but we decided against it. That the natural setting for what follows is equivariant was pointed out by Beilinson and Drinfeld [BD, 3.9].
Definition 2.1. A $\mathfrak{g}$-equivariant vertex Poisson algebra is a collection $(V, T, {}_{(n)}, \mathfrak{g};\ n \ge -1)$, where $V$ is a $B$-module, $T : V \to V$ is a $B$-linear map, and
$${}_{(n)} : V \otimes V \to V, \quad a_{(n)}b = 0 \ \text{if}\ n \gg 0$$
is a family of $B$-bilinear multiplications, such that the following axioms hold:

I. The triple $(V, T, {}_{(-1)})$ is a commutative associative algebra with derivation $T$.

II. The collection $(V, T, {}_{(n)};\ n \ge 0)$ is a vertex Lie algebra, i.e., the following holds:

II.1. skew-commutativity
$$a_{(n)}b = (-1)^{n+1}\sum_{j=0}^{\infty}\frac{(-1)^j}{j!}\,T^j(b_{(n+j)}a),$$

II.2. Jacobi identity
$$a_{(m)}b_{(n)}c - b_{(n)}a_{(m)}c = \sum_{j=0}^{\infty}\binom{m}{j}(a_{(j)}b)_{(n+m-j)}c,$$

II.3. properties of $T$: $(Ta)_{(n)} = [T, a_{(n)}] = -n\,a_{(n-1)}$ if $n \ge 0$.

III. Leibniz identity: for any $n \ge 0$, $a_{(n)}$ is a derivation of ${}_{(-1)}$.

IV. $\mathfrak{g}$-equivariance: $V$ is a $\mathfrak{g}$-module, and the maps ${}_{(n)}$ and $T$ are $\mathfrak{g}$-module morphisms.

In addition, we will always be assuming that a vertex Poisson algebra $(V, T, {}_{(n)};\ n \ge -1)$ is $\mathbb{Z}_+$-graded, i.e.,
$$V = \bigoplus_{n=0}^{\infty} V_n, \quad T(V_n) \subset V_{n+1}, \quad \mathfrak{g}(V_n) \subset V_n, \quad V_{m(j)}V_n \subset V_{m+n-j-1}. \tag{2.1.1}$$
We will unburden the notation by letting $V$ stand for $(V, T, {}_{(n)}, \mathfrak{g};\ n \ge -1)$ when this does not lead to confusion and by suppressing ${}_{(-1)}$ so that $ab$ stands for $a_{(-1)}b$. We will also tend to drop the adjective “equivariant” whenever doing so seems appropriate. Note that if $m = n = 0$, then II.2 becomes
$$a_{(0)}b_{(0)}c - b_{(0)}a_{(0)}c = (a_{(0)}b)_{(0)}c, \tag{2.1.2}$$
which is the usual Jacobi identity for $(V, {}_{(0)})$. Anticommutativity fails, but II.1 ensures that it holds up to $T(...)$. This proves the following important and well-known

Lemma 2.2. If $V$ is a vertex Poisson algebra, then $T(V) \subset V$ is a 2-sided ideal w.r.t. ${}_{(0)}$, and $(V/T(V), {}_{(0)})$ is a Lie algebra.
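For the reader who wants the argument behind Lemma 2.2 written out, here is a short sketch. It uses only axioms II.1 and II.3 of Definition 2.1 as stated above; no additional hypotheses are introduced.

```latex
% Why T(V) is a two-sided ideal for _{(0)} and why _{(0)} is
% skew-symmetric modulo T(V).
\begin{align*}
(Ta)_{(0)}b &= -0\cdot a_{(-1)}b = 0
  &&\text{(II.3 with } n=0\text{)},\\
a_{(0)}(Tb) &= T\big(a_{(0)}b\big)
  &&\text{(II.3: } [T,a_{(0)}]=0\text{)},\\
a_{(0)}b+b_{(0)}a &= -\sum_{j\ge 1}\frac{(-1)^j}{j!}\,T^j\big(b_{(j)}a\big)\ \in\ T(V)
  &&\text{(II.1 with } n=0\text{)}.
\end{align*}
```

The first two lines show $T(V)_{(0)}V = 0$ and $V_{(0)}T(V) \subset T(V)$, so ${}_{(0)}$ descends to $V/T(V)$; the third line gives skew-symmetry on the quotient, and (2.1.2) supplies the Jacobi identity.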
2.3. Tensor products. The simplest example of a vertex Poisson algebra is a commutative associative algebra $V$ with derivation $T$. Defining $a_{(-1)}$ to be the multiplication by $a$ and letting $a_{(n)} = 0$ if $n \ge 0$ makes $V$ into a vertex Poisson algebra. If $(V_1, {}_{(n)}, T_1)$ and $(V_2, {}_{(n)}, T_2)$ are two vertex Poisson algebras, then $V_1 \otimes V_2$ carries at least two vertex Poisson algebra structures. First of all, one can simply regard $V_1 \otimes V_2$ as an extension of scalars whereby $V_1 \otimes V_2$ becomes a vertex Poisson algebra over $V_1$ with derivation $T_2$ and multiplications coming from $V_2$. Second of all, one can define $T = T_1 + T_2$ and
$$(a \otimes b)_{(n)} = \begin{cases} a_{(-1)}b_{(-1)} & \text{if } n = -1, \\[4pt] \displaystyle\sum_{i=0}^{\infty}\frac{1}{i!}\left((T_1^i a)_{(-1)}b_{(n+i)} + a_{(n+i)}(T_1^i b)_{(-1)}\right) & \text{if } n \ge 0. \end{cases} \tag{2.3.1}$$
If, in addition, $V_1$ is of the type we started with, i.e., if $(V_1)_{(n)}(V_1) = 0$ for all $n \ge 0$, then (2.3.1) is simplified as follows:
$$(a \otimes b)_{(n)} = \begin{cases} a_{(-1)}b_{(-1)} & \text{if } n = -1, \\[4pt] \displaystyle\sum_{i=0}^{\infty}\frac{1}{i!}(T_1^i a)_{(-1)}b_{(n+i)} & \text{if } n \ge 0. \end{cases} \tag{2.3.2}$$
In a sense, the second version is a twist of the first by the derivation $T_1 \in \mathrm{Der}(V_1)$. In the context of equivariant vertex Poisson algebras this can be generalized as follows. If $(V, {}_{(n)}, T)$ is an equivariant vertex Poisson algebra over $B$ and $\xi \in \mathfrak{g}$, then letting
$$a_{(n)\xi} = \sum_{i=0}^{\infty}\frac{1}{i!}(\xi^i a)_{(n+i)} \tag{2.3.3}$$
defines a vertex Poisson algebra $(V, {}_{(n)\xi}, T + \xi)$. We will refer to this construction as the $\xi$-twist. Note that the $\xi$-twist reduces the constants from $B$ to the algebra of $\xi$-invariants, $B^\xi$.
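As a sanity check on the $\xi$-twist, one can verify axiom II.3 for the twisted products directly. The sketch below reads $\xi^i a$ in (2.3.3) as $\xi$ applied $i$ times to $a$ (an assumption about the notation), and uses only that $T$ commutes with $\xi$ (axiom IV) and II.3 for the untwisted products.

```latex
% Axiom II.3 for (V, _{(n)\xi}, T+\xi), for n >= 1.
\begin{align*}
\big((T+\xi)a\big)_{(n)\xi}
 &= \sum_{i\ge 0}\frac{1}{i!}\,\big(T\xi^i a\big)_{(n+i)}
  + \sum_{i\ge 0}\frac{1}{i!}\,\big(\xi^{i+1}a\big)_{(n+i)} \\
 &= -\sum_{i\ge 0}\frac{n+i}{i!}\,\big(\xi^i a\big)_{(n+i-1)}
  + \sum_{i\ge 1}\frac{i}{i!}\,\big(\xi^{i}a\big)_{(n+i-1)} \\
 &= -n\sum_{i\ge 0}\frac{1}{i!}\,\big(\xi^i a\big)_{(n-1+i)}
  \;=\; -n\,a_{(n-1)\xi}.
\end{align*}
```

For $n = 0$ the same two sums cancel term by term, so $\big((T+\xi)a\big)_{(0)\xi} = 0$, as II.3 requires. This is why the derivation of the twisted algebra has to be $T + \xi$ rather than $T$.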
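2.4. From vertex Poisson algebras to Courant algebroids. The Poisson vertex algebra structure on $V = \oplus_{n=0}^{\infty} V_n$ defines on the subspace $V_0 + V_1$ the following operations:
$$\begin{array}{lll}
{}_{(-1)} : & V_0 \otimes V_0 \to V_0, & (2.4.1a) \\
{}_{(-1)} : & V_0 \otimes V_1 \to V_1, \quad V_1 \otimes V_0 \to V_1, & (2.4.1b) \\
{}_{(0)} : & V_1 \otimes V_0 \to V_0, \quad V_0 \otimes V_1 \to V_0, & (2.4.1c) \\
{}_{(0)} : & V_1 \otimes V_1 \to V_1, & (2.4.1d) \\
{}_{(1)} : & V_1 \otimes V_1 \to V_0, & (2.4.1e) \\
T : & V_0 \to V_1, & (2.4.1f)
\end{array}$$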
all the other operations either not preserving the subspace $V_0 + V_1$ or being zero due to condition (2.1.1). Vertex Poisson algebra axioms imply that (2.4.1a–f) satisfy certain conditions; e.g., (2.4.1a) is such that $(V_0, {}_{(-1)})$ is an associative commutative $B$-algebra, and (2.4.1b) is such that $V_1$ is a $V_0$-module. In [GMS1], these conditions were written down explicitly and made into an axiomatic definition of a vertex algebroid, in a more complicated, quantum, situation. It is a nice observation due to Bressler [Bre] that under some nondegeneracy assumptions a quasiclassical limit of a vertex algebroid is an exact Courant $V_0$-algebroid; e.g. (2.4.1d) is the Dorfman bracket [Dor,G] on $V_1$. Therefore, the assignment $V \mapsto (V_0 \oplus V_1, T, {}_{(-1)}, {}_{(0)}, {}_{(1)})$ defines a functor from a subcategory of vertex Poisson algebras to the category of exact Courant $V_0$-algebroids. This functor is actually an equivalence of categories, and a classification of exact Courant algebroids furnishes that of a subclass of vertex Poisson algebras. For future use, and for the reader's convenience (after all, the present situation is somewhat different), let us now reproduce the essence of this argument.

2.4.1. We have seen already that the pair $(V_0, {}_{(-1)})$ is an associative commutative $B$-algebra. Let $A = V_0$. The entire $V$, hence $V_1$, is an $A$-module and $A_{(n)}A = 0$ if $n \ge 0$. By virtue of Axiom I, the map
$$T : A \to AT(A) \subset V_1 \ \text{ is a } B\text{-derivation}, \tag{2.4.2}$$
i.e., $T(ab) = aT(b) + bT(a)$ and $T(B) = 0$. Therefore, $AT(A)$ is a quotient of the module of relative Kähler differentials, $\Omega_{A/B}$.

Assumption 1. Let $(A;\ T : A \to AT(A))$ be isomorphic to $(A;\ d : A \to \Omega_{A/B})$.

There arises an exact sequence of $A$-modules
$$0 \to \Omega_{A/B} \to V_1 \to V_1/\Omega_{A/B} \to 0. \tag{2.4.3}$$
Let $\mathcal{T} = V_1/\Omega_{A/B}$. It is an $A$-module and a Lie algebra w.r.t. the operation ${}_{(0)}$, by virtue of Lemma 2.2. Furthermore, the map
$${}_{(0)} : \mathcal{T} \otimes A \to A \tag{2.4.4}$$
is well defined because
$$(AT(A))_{(n)}A = (AT(A))_{(n)}(AT(A)) = 0, \quad n \ge 0. \tag{2.4.5}$$
This map gives $A$ a $\mathcal{T}$-module structure compatible with the $A$-module structure in that $(a\xi)_{(0)}b = a(\xi_{(0)}b)$. For each $\tau \in \mathcal{T}$, $\tau_{(0)} \in \mathrm{End}(A)$ is actually a $B$-derivation of $A$, and this defines a Lie algebra homomorphism over $A$,
$$\mathcal{T} \to \mathrm{Der}_B(A). \tag{2.4.6}$$
All of this can be summarized by saying that $\mathcal{T}$ is an $A$-Lie algebroid.

Assumption 2. Morphism (2.4.6) is an isomorphism.
The map
$${}_{(0)} : \mathcal{T} \otimes \Omega_{A/B} \to \Omega_{A/B}, \tag{2.4.7}$$
also arising by virtue of (2.4.5), equals the Lie derivative:
$$\xi_{(0)}\omega = \mathrm{Lie}_\xi\,\omega, \tag{2.4.8}$$
cf. (1.5.20d). (Indeed, $\xi_{(0)}(aTb) = (\xi_{(0)}a)Tb + a(T\xi_{(0)}b)$.) Next, again thanks to (2.4.5), there arises the map
$${}_{(1)} : \mathcal{T} \otimes \Omega_{A/B} \to A. \tag{2.4.9}$$
It is the natural pairing of vector fields and forms:
$$\xi_{(1)}\omega = \iota_\xi\omega. \tag{2.4.10}$$
(Indeed, $\xi_{(1)}(aTb) = (\xi_{(1)}a)Tb + a(\xi_{(1)}Tb) = a(\xi_{(0)}b)$, where axioms II.3 and III are used.) This determines all of (2.4.1a–f) that makes sense on the graded object $A \oplus (\mathcal{T} \oplus \Omega_{A/B})$. To continue our analysis we need to make the following

Assumption 3. Let sequence (2.4.3) be split.

Let us fix a splitting
$$s : \mathcal{T} \to V_1. \tag{2.4.11}$$
Then there arise the following two maps:
$${}_{(1)s} : \mathcal{T} \otimes \mathcal{T} \to A, \tag{2.4.12}$$
$${}_{(0)s} : \mathcal{T} \otimes \mathcal{T} \to \Omega_{A/B}, \tag{2.4.13}$$
where (2.4.12) is the restriction of ${}_{(1)}$ to $s(\mathcal{T})$, and (2.4.13) is the composition of the restriction of ${}_{(0)}$ to $s(\mathcal{T})$ with the projection $V_1 \to \Omega_{A/B} = V_1/s(\mathcal{T})$. These two maps determine all of (2.4.1a–f). The map ${}_{(1)s}$ is, in fact, a symmetric $A$-bilinear form on $\mathcal{T}$. By varying the splitting $s$ it can be killed. Indeed, letting $h(.,.) = {}_{(1)s}$, we obtain, for any $\xi \in \mathcal{T}$, an $A$-linear form $h(\xi, .) \in \Omega_{A/B}$. Replacing $s$ with $s_h$ defined to be
$$s_h(\xi) = s(\xi) - \frac{1}{2}h(\xi, .)$$
we get ${}_{(1)s_h} = 0$. Therefore, we can, and usually will, assume that
$$V_1 = \mathcal{T} \oplus \Omega_{A/B} \tag{2.4.14}$$
and
$${}_{(1)} : (\mathcal{T} \oplus \Omega_{A/B}) \otimes (\mathcal{T} \oplus \Omega_{A/B}) \to A \tag{2.4.15}$$
is the canonical pairing $(\xi + \omega)_{(1)}(\xi' + \omega') = \iota_\xi\omega' + \iota_{\xi'}\omega$, cf. (2.4.10).

2.4.2. Therefore, all moduli, if any, come from ${}_{(0)s}$. A short computation shows that it is $A$-linear. Furthermore, Axiom IV implies that
$${}_{(0)s} \in \mathrm{Hom}_{\mathfrak{g}}(\mathcal{T} \otimes \mathcal{T}, \Omega_{A/B}). \tag{2.4.16}$$
Hence ${}_{(0)s}$ can be considered as an $A$-trilinear $\mathfrak{g}$-invariant functional on $\mathcal{T}$, and as such it will be denoted by $H$:
$${}_{(0)s} \approx H \in \left(\Omega_{A/B}^{\otimes 3}\right)^{\mathfrak{g}}. \tag{2.4.17}$$
Skew-commutativity II.1 implies that it is anti-commutative in the first 2 variables: $H(\xi, \eta, .) = -H(\eta, \xi, .)$. Jacobi identity II.2 applied to $[\xi_{(1)}, \eta_{(0)}](\zeta)$, $\xi, \eta, \zeta \in s(\mathcal{T}_A)$, shows that, in fact, $H(., ., .)$ is totally anti-commutative, hence belongs to $\left(\Omega^3_{A/B}\right)^{\mathfrak{g}}$. Jacobi identity II.2 applied to $[\xi_{(0)}, \eta_{(0)}](\zeta)$, $\xi, \eta, \zeta \in s(\mathcal{T}_A)$, shows that $H$ is closed, i.e.,
$$H \in \left(\Omega^{3,cl}_{A/B}\right)^{\mathfrak{g}}. \tag{2.4.18}$$
Conditions (2.4.14-15 or 15h) do not determine the splitting $s$; they are respected by the shearing transformation
$$\mathcal{T} \ni \xi \mapsto \xi + \iota_\xi\alpha \ \text{ for a fixed } \alpha \in \left(\Omega^2_{A/B}\right)^{\mathfrak{g}}. \tag{2.4.19}$$
The effect of this transformation on $H$ is
$$H \mapsto H + d_{DR}\alpha. \tag{2.4.20}$$
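The shift (2.4.20) can be checked by hand in the simplest case. The sketch below assumes $H = 0$ before the shear, takes the bracket on $\mathcal{T} \oplus \Omega_{A/B}$ in the Dorfman form written out in (2.7.6), with $T$ acting on $A$ as $d_{DR}$, and uses the Cartan identities $\mathrm{Lie}_\eta = d_{DR}\iota_\eta + \iota_\eta d_{DR}$ and $[\mathrm{Lie}_\xi, \iota_\eta] = \iota_{[\xi,\eta]}$. The sign convention $\iota_\eta\iota_\xi\, d_{DR}\alpha = d_{DR}\alpha(\xi, \eta, \cdot)$ is an assumption of the sketch.

```latex
% Bracket of two sheared sections s'(\xi) = \xi + \iota_\xi\alpha.
\begin{align*}
(\xi+\iota_\xi\alpha)_{(0)}(\eta+\iota_\eta\alpha)
 &= [\xi,\eta] + \mathrm{Lie}_\xi\,\iota_\eta\alpha
    - \mathrm{Lie}_\eta\,\iota_\xi\alpha
    + d_{DR}\,\iota_\eta\iota_\xi\alpha \\
 &= [\xi,\eta] + \mathrm{Lie}_\xi\,\iota_\eta\alpha
    - \iota_\eta\, d_{DR}\,\iota_\xi\alpha \\
 &= [\xi,\eta] + \mathrm{Lie}_\xi\,\iota_\eta\alpha
    - \iota_\eta\,\mathrm{Lie}_\xi\,\alpha
    + \iota_\eta\iota_\xi\, d_{DR}\alpha \\
 &= \big([\xi,\eta] + \iota_{[\xi,\eta]}\alpha\big)
    + \iota_\eta\iota_\xi\, d_{DR}\alpha .
\end{align*}
```

The first summand is the sheared image of $[\xi, \eta]$, so in the new splitting the $\Omega_{A/B}$-component of the bracket is $\iota_\eta\iota_\xi\, d_{DR}\alpha$; that is, the 3-form is shifted by $d_{DR}\alpha$. In particular a closed $\alpha$ leaves the structure unchanged, which is the source of (2.4.25).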
2.4.3. Checking the various properties of maps (2.4.1a–f) derived in 2.4.1 against the definition of an exact Courant $A$-algebroid [LWX] (especially in the form proposed in [Bre]) shows the following. If Assumptions 1–3 hold, then the equivariant Poisson vertex algebra structure on $V$ defines an equivariant exact Courant $A$-algebroid structure on $\mathcal{T} \oplus \Omega_A$ such that
$${}_{(0)} : (\mathcal{T} \oplus \Omega_A) \otimes (\mathcal{T} \oplus \Omega_A) \to \mathcal{T} \oplus \Omega_A$$
is the Dorfman [Dor,G] bracket,
$${}_{(1)} : (\mathcal{T} \oplus \Omega_A) \otimes (\mathcal{T} \oplus \Omega_A) \to A$$
is the symmetric pairing, and (2.4.6) is the anchor. The discussion in 2.4.2 practically proves (see [Bre,GMS1] for a complete analysis) that the category of exact equivariant Courant $A$-algebroids is an $\left(\Omega^{3,cl}_{A/B}\right)^{\mathfrak{g}}$-space.
Indeed, if $C$ is one such algebroid and $H \in \left(\Omega^{3,cl}_{A/B}\right)^{\mathfrak{g}}$, then the $H$-twisted Courant algebroid
$$C \dot{+} H \ \text{ is defined by replacing } {}_{(0)s} \text{ with } {}_{(0)s} + H. \tag{2.4.21}$$
A “canonical” Courant algebroid $C_0$ can be chosen by letting the only “unknown” operation ${}_{(0)s}$ be zero:
$$\text{define } C_0 \text{ to be s.t. (2.4.14–15) hold and } {}_{(0)s} = 0. \tag{2.4.22}$$
This identifies the category of equivariant exact Courant $A$-algebroids with $\left(\Omega^{3,cl}_{A/B}\right)^{\mathfrak{g}}$ s.t.
$$\left(\Omega^{3,cl}_{A/B}\right)^{\mathfrak{g}} \ni H \mapsto C_H = C_0 \dot{+} H. \tag{2.4.23}$$
The effect of shear (2.4.19) on $H$ recorded in (2.4.20) implies the following description of morphisms:
$$\mathrm{Mor}(C, C \dot{+} H) = \left\{\alpha \in \left(\Omega^2_{A/B}\right)^{\mathfrak{g}} \ \text{s.t.}\ d_{DR}\alpha = H\right\}, \tag{2.4.24}$$
and automorphisms
$$\mathrm{Aut}(C) = \left(\Omega^{2,cl}_{A/B}\right)^{\mathfrak{g}}. \tag{2.4.25}$$
In particular, the set of isomorphism classes of exact Courant $A$-algebroids is identified with the $\mathfrak{g}$-invariant de Rham cohomology group,
$$\left(\Omega^{3,cl}_{A/B}\right)^{\mathfrak{g}} \big/ d_{DR}\left(\Omega^2_{A/B}\right)^{\mathfrak{g}}. \tag{2.4.26}$$

2.5. Symbols of vertex differential operators. Let $\Sigma$ be an open subset of $\mathbb{R}^d$, $U$ of $\mathbb{R}^n$, and $U_\Sigma = U \times \Sigma$. Define $B = C^\infty(\Sigma)$, $\mathfrak{g} = \mathcal{T}_{\mathbb{R}^d}(\Sigma)$. Identify $\mathfrak{g}$ with the subalgebra of horizontal vector fields on $U_\Sigma$, thereby making $C^\infty(U_\Sigma)$ into a $\mathfrak{g}$-module. These are the prerequisites to the definition of a $\mathcal{T}_{\mathbb{R}^d}(\Sigma)$-equivariant vertex Poisson algebra over $B$.

Definition 2.5.1. Call $V$ an algebra of symbols of vertex differential operators, SVDO for short, if
(i) $V_0 = C^\infty(U_\Sigma)$, $V_1$ is a $\mathcal{T}_{\mathbb{R}^d}(\Sigma)$-equivariant exact Courant $C^\infty(U_\Sigma)$-algebroid over $B = C^\infty(\Sigma)$,
(ii) $V$ is generated as an associative commutative algebra with derivation $T$ by $V_0 \oplus V_1$.

The discussion in 2.4.3 means that we have obtained a functor, say $F$, from the category of SVDOs to the category of equivariant exact Courant $C^\infty(U_\Sigma)$-algebroids:
$$F : \{\text{SVDOs}\} \to \{\text{Courant algebroids}\}. \tag{2.5.1}$$
Theorem 2.5.2. ([GMS1,Bre]). This functor is an equivalence of categories. To be precise, [GMS1,Bre] only construct F ∗ , the left adjoint to F, but a simple representation-theoretic argument shows that the “vertex envelope”, F ∗ (C), is simple. (Indeed, by construction any element of F ∗ (C) can be moved to A = F ∗ (C)0 by a sequence of operations a(n) , τ(n) , where a ∈ A and τ ∈ T , and thus generate the entire F ∗ (C).)
2.6. A sheaf-theoretic version. All of this can be spread over manifolds. The geometric prerequisite is a fiber bundle
$$\pi : M_\Sigma \to \Sigma \tag{2.6.1a}$$
with a flat connection
$$\nabla : \mathcal{T}_\Sigma \to \mathcal{T}_{M_\Sigma}. \tag{2.6.1b}$$
A sheaf of SVDOs, $\mathcal{V}$, over $M_\Sigma$ is a sheaf of vector spaces s.t. the space of sections $\mathcal{V}(U)$ is an SVDO for each open $U \subset M_\Sigma$ with
$$\mathcal{V}(U)_0 = \mathcal{O}_{M_\Sigma}(U), \quad B(U) = \pi^*\mathcal{O}_\Sigma(\pi U), \quad \mathfrak{g} = \mathcal{T}_\Sigma(\pi U), \tag{2.6.2}$$
and equivariant structure determined by $\nabla$. The condition that $\mathcal{V}(U)_0 = \mathcal{O}_{M_\Sigma}(U)$ implies that $\mathcal{V}$ is automatically a sheaf of $\mathcal{O}_{M_\Sigma}(U)$-modules. It follows from (2.4.6) that the next homogeneous component, $\mathcal{V}_1$, is an extension of vertical vector fields by relative 1-forms:
$$0 \to \Omega_{M_\Sigma/\Sigma} \to \mathcal{V}_1 \to \mathcal{T}_{M_\Sigma/\Sigma} \to 0. \tag{2.6.3}$$
As to the existence of such sheaves, they are plentiful locally: for any sufficiently small open $U \subset M_\Sigma$, the category of such sheaves over $U$ is an $\Omega^{3,cl}_{M_\Sigma/\Sigma}(U)^\nabla$-space, as follows from (2.4.23). If $\mathcal{V}_U$ is one such sheaf and $H \in \Omega^{3,cl}_{M_\Sigma/\Sigma}(U)^\nabla$, then
$$\mathrm{Mor}(\mathcal{V}_U, \mathcal{V}_U \dot{+} H) = \left\{\alpha \in \Omega^2_{M_\Sigma/\Sigma}(U)^\nabla \ \text{s.t.}\ d_{DR}\alpha = H\right\}, \tag{2.6.4}$$
cf. (2.4.24). Technically, (2.6.4) means that there is a gerbe, in particular, a sheaf of categories, of SVDOs bound by the sheaf complex
$$0 \to \Omega^2_{M_\Sigma/\Sigma}(U)^\nabla \xrightarrow{\ d_{DR}\ } \Omega^{3,cl}_{M_\Sigma/\Sigma}(U)^\nabla \to 0,$$
so that the categories over sufficiently small $U$ are equivalent to that of SVDOs with $V_0 = \mathcal{O}_{M_\Sigma}(U)$. A priori there may be no single sheaf of SVDOs on the entire $M_\Sigma$; an obstruction to its existence is a certain canonical characteristic class lying in
$$H^2\left(M_\Sigma,\ (\Omega^2_{M_\Sigma/\Sigma})^\nabla \to (\Omega^{3,cl}_{M_\Sigma/\Sigma})^\nabla\right).$$
At this point let us return to the concrete situation of interest to us, where $M_\Sigma = M \times \Sigma$ and $\nabla$ is the horizontal connection. If so, the above discussion is simplified in that the sheaves $(\Omega^\bullet_{M_\Sigma/\Sigma})^\nabla$ can be replaced with $\Omega^\bullet_M$. For example, the obstruction becomes a class lying in
$$H^2(M, \Omega^2_M \to \Omega^{3,cl}_M).$$
This class vanishes; the obstruction (equal to the 1st Pontryagin class) computed in [GMS1], see also [Bre], is a purely quantum phenomenon, and in any case, an example of such a sheaf will be exhibited shortly. Furthermore, (2.4.24–26) imply that the set of isomorphism classes of such sheaves is an $H^1(M, \Omega^2_M \to \Omega^{3,cl}_M)$-torsor, and the group of automorphisms of any such sheaf is isomorphic to $H^0(M, \Omega^2_M \to \Omega^{3,cl}_M) \xrightarrow{\sim} H^0(M, \Omega^{2,cl}_M)$. Note that since the sequence
$$0 \to \Omega^{2,cl}_M \to \Omega^2_M \xrightarrow{\ d_{DR}\ } \Omega^{3,cl}_M \to 0 \tag{2.6.5}$$
is exact, we obtain isomorphisms
$$\begin{aligned}
H^1(M, \Omega^2_M \to \Omega^{3,cl}_M) &\xrightarrow{\ \sim\ } H^1(M, \Omega^{2,cl}_M), \\
H^0(M, \Omega^2_M \to \Omega^{3,cl}_M) &\xrightarrow{\ \sim\ } H^0(M, \Omega^{2,cl}_M).
\end{aligned} \tag{2.6.6}$$
The long exact cohomology sequence associated with (2.6.5) implies, in addition, that
$$H^1(M, \Omega^{2,cl}_M) \xrightarrow{\ \sim\ } H^0(M, \Omega^3_M{}^{,cl})/dH^0(M, \Omega^2_M) \xrightarrow{\ \sim\ } H^3(M, \mathbb{R}), \tag{2.6.7}$$
where the last isomorphism is the de Rham theorem. This proves

Proposition 2.6.1. a) The set of isomorphism classes of sheaves of SVDOs on $M_\Sigma = M \times \Sigma$ with horizontal connection is identified with either of the isomorphic groups $H^1(M, \Omega^{2,cl}_M)$ and $H^3(M, \mathbb{R})$.
b) If $\mathcal{V}$ is a sheaf of SVDOs over $M$, then
$$\mathrm{Aut}\,\mathcal{V} \xrightarrow{\ \sim\ } H^0(M, \Omega^{2,cl}_M).$$

2.6.2. Here is an explicit construction of identifications a) and b) of Proposition 2.6.1. The presentation of the set of isomorphism classes as $H^1(M, \Omega^{2,cl}_M)$ emphasizes the fact that locally all such sheaves are isomorphic (this is an immediate consequence of (2.4.26)). Indeed, let $\{U_i\}$ be a covering by balls. Let $\mathcal{V}_i$ be the restriction of $\mathcal{V}$ to $U_i$. Then there arise canonical identifications,
$$\phi_{ij} : \mathcal{V}_i|_{U_i \cap U_j} \xrightarrow{\ \sim\ } \mathcal{V}_j|_{U_i \cap U_j}, \tag{2.6.8}$$
to be thought of as gluing functions. Let now $\alpha_{ij} \in \Omega^{2,cl}_M(U_i \cap U_j)$ be a Čech cocycle representing $\alpha \in H^1(M, \Omega^{2,cl}_M)$. Regarding $\alpha_{ij}$ as an automorphism of $\mathcal{V}_j|_{U_i \cap U_j}$, define
$$\hat\phi_{ij} \overset{\mathrm{def}}{=} \phi_{ij} \dot{+} \alpha_{ij} : \mathcal{V}_i|_{U_i \cap U_j} \xrightarrow{\ \sim\ } \mathcal{V}_j|_{U_i \cap U_j}, \tag{2.6.9}$$
to be the composition of $\phi_{ij}$ and the shear by $\alpha_{ij}$ defined in (2.4.19). The Čech cocycle condition satisfied by $\{\alpha_{ij}\}$ implies that $\hat\phi_{ik} \circ \hat\phi_{kj} \circ \hat\phi_{ji} = \mathrm{id}$ on the triple intersection $U_i \cap U_j \cap U_k$ for any $i, j, k$. Thus $\hat\phi_{ij}$ are gluing functions of a new sheaf of SVDOs, to be denoted $\mathcal{V} \dot{+} \alpha$.
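The cocycle condition for the twisted gluing functions can be made explicit. The sketch below assumes two things that the text uses implicitly: shears by closed 2-forms compose additively (the map $\xi \mapsto \xi + \iota_\xi\alpha$ acts as the identity on forms), and they commute with the canonical identifications $\phi_{ij}$. The composite is written going once around the triangle $i \to j \to k \to i$, which is the same condition as the one stated above up to relabeling of indices.

```latex
% Triple overlap U_i \cap U_j \cap U_k, with \phi_{ij} : V_i -> V_j.
\begin{align*}
\hat\phi_{ki}\circ\hat\phi_{jk}\circ\hat\phi_{ij}
 &= \big(\phi_{ki}\circ\phi_{jk}\circ\phi_{ij}\big)
    \,\dot{+}\,\big(\alpha_{ij}+\alpha_{jk}+\alpha_{ki}\big) \\
 &= \mathrm{id}\,\dot{+}\,0 \;=\; \mathrm{id}.
\end{align*}
```

Replacing $\{\alpha_{ij}\}$ by a cohomologous cocycle $\alpha_{ij} + \gamma_j - \gamma_i$, with $\gamma_i$ closed 2-forms on $U_i$, changes the glued sheaf by the local automorphisms given by the shears $\gamma_i$, so the isomorphism class depends only on the class of $\alpha$ in $H^1(M, \Omega^{2,cl}_M)$.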
Contrary to this, the presentation of the set of isomorphism classes as $H^3(M, \mathbb{R})$ has nothing to do with gluing functions or even the $\mathcal{O}_M$-module structure. Indeed, for an element of $H^3(M, \mathbb{R})$, pick a global closed 3-form $H$ representing it. By definition (2.4.21), the sheaf $\mathcal{V} \dot{+} H$ is different from $\mathcal{V}$ only in that the operation
$${}_{(0)} : \mathcal{V}_1 \otimes \mathcal{V}_1 \to \mathcal{V}_1$$
is replaced with ${}_{(0)} + H$ (and the sheaf $\mathcal{V} \dot{+} d\beta$, $\beta$ a global 2-form, is canonically isomorphic to $\mathcal{V}$).

The relation of one point of view to another is as follows. For example, given $\mathcal{V} \dot{+} H$, find a collection $\beta = \{\beta_i \in \Omega^2_M(U_i)\}$ so that $d\beta_i = H|_{U_i}$. Then $d_{\check{C}}(\beta)$ is de Rham-closed and hence is a Čech 1-cocycle with coefficients in $\Omega^{2,cl}_M$. The map
$$H^0(M, \Omega^{3,cl}_M) \ni H \mapsto \beta \mapsto \text{class of } d_{\check{C}}(\beta) \in H^1(M, \Omega^{2,cl}_M) \tag{2.6.10}$$
descends to the inverse of (2.6.7).

Now, $(\mathcal{V} \dot{+} H)|_{U_i} = \mathcal{V}_i$ as vector spaces but not as SVDOs; to obtain an SVDO isomorphism, the shear by $\beta_i$ is needed:
$$\beta_i : \mathcal{V}_i \to (\mathcal{V} \dot{+} H)|_{U_i}. \tag{2.6.11}$$
The effect of this transformation on the gluing functions is as follows:
$$\phi_{ij} \mapsto \phi_{ij} \dot{+} d_{\check{C}}\beta, \tag{2.6.12}$$
cf. (2.4.20), and this delivers the desired isomorphism
$$\mathcal{V} \dot{+} H \xrightarrow{\ \sim\ } \mathcal{V} \dot{+} (\text{class of } d_{\check{C}}\beta). \tag{2.6.13}$$
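The claims surrounding (2.6.10) can be verified in a few lines. The sketch below uses the sign convention $(d_{\check{C}}\beta)_{ij} = \beta_j - \beta_i$, which is the one appearing in (2.8.7); the symbols $\gamma_i$ and $\beta_0$ are auxiliary names introduced only for this check.

```latex
% d_{\check C}\beta is a closed-form-valued Cech 1-cocycle, and its class
% depends only on the de Rham class of H.
\begin{align*}
d_{DR}(\beta_j-\beta_i) &= H|_{U_i\cap U_j}-H|_{U_i\cap U_j} = 0,\\
(\beta_j-\beta_i)+(\beta_k-\beta_j)+(\beta_i-\beta_k) &= 0
   \quad\text{on } U_i\cap U_j\cap U_k,\\
\beta_i' = \beta_i+\gamma_i,\ d_{DR}\gamma_i = 0
   \ &\Longrightarrow\
   (d_{\check C}\beta')_{ij} = (d_{\check C}\beta)_{ij}+(\gamma_j-\gamma_i),\\
H' = H+d_{DR}\beta_0,\ \beta_0\in H^0(M,\Omega^2_M)
   \ &\Longrightarrow\
   \beta_i' = \beta_i+\beta_0|_{U_i},\quad d_{\check C}\beta' = d_{\check C}\beta .
\end{align*}
```

The third line shows that a different choice of local primitives changes the cocycle by a coboundary, and the fourth that an exact change of $H$ does not change it at all. The resulting map is the connecting homomorphism of the long exact sequence of (2.6.5); it is onto $H^1(M, \Omega^{2,cl}_M)$ because $\Omega^2_M$ is a fine sheaf on a smooth manifold, so $H^1(M, \Omega^2_M) = 0$.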
2.7. A natural sheaf of SVDOs. Let us attach to any smooth $M$ a sheaf of SVDOs which depends on $M$ functorially. In order to do so, let us place ourselves in the situation where $T^*M_\Sigma = T^*M \times \Sigma$ satisfies (1.1.2-4) and carries, in particular, a distinguished coordinate system, $\sigma$ and $\tau$. Taking advantage of (1.1.7), we note that the operator of the jet connection, (1.1.5), splits in the vertical and horizontal components, e.g.,
$$\rho(\partial_\sigma) = \partial^v_\sigma + \partial^h_\sigma, \tag{2.7.1}$$
where the latter stands for the operator of differentiation w.r.t. $\sigma$ “appearing explicitly”. Let
$$\pi : J^\infty(T^*M_\Sigma/\Sigma) \to M_\Sigma \tag{2.7.2}$$
be the natural projection. There arises the direct image of the structure sheaf $\pi_*\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ which we will take the liberty to denote also by $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ because this is unlikely to cause confusion. Thus, for example, if $U \subset M_\Sigma$ is open, then $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}(U)$ will stand for the space of functions on the jet-space regular over $\pi^{-1}(U)$.
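Being a structure sheaf, $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ carries a canonical multiplication. Let us define a grading
$$\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} = \bigoplus_{i=0}^{\infty}\mathcal{O}^i_{J^\infty(T^*M_\Sigma/\Sigma)} \quad\text{s.t.}\quad \mathcal{O}^i_{J^\infty(T^*M_\Sigma/\Sigma)} \cdot \mathcal{O}^j_{J^\infty(T^*M_\Sigma/\Sigma)} \subset \mathcal{O}^{i+j}_{J^\infty(T^*M_\Sigma/\Sigma)} \tag{2.7.3}$$
by requiring that the pull-back of functions on $M_\Sigma$ have degree 0, the pull-back of fiberwise linear functions on $T^*M$ have degree 1, and the operator $\partial^v_\sigma$, defined in (2.7.1), have degree 1, i.e., that $\partial^v_\sigma\left(\mathcal{O}^i_{J^\infty(T^*M_\Sigma/\Sigma)}\right) \subset \mathcal{O}^{i+1}_{J^\infty(T^*M_\Sigma/\Sigma)}$. Thus, for example,
$$\mathcal{O}^0_{J^\infty(T^*M_\Sigma/\Sigma)} = \mathcal{O}_{M_\Sigma}, \qquad \mathcal{O}^1_{J^\infty(T^*M_\Sigma/\Sigma)} = \mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}, \tag{2.7.4}$$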
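cf. (2.6.3), where $\mathcal{T}_{M_\Sigma/\Sigma}$ is realized inside $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ as the pull-back of fiberwise linear functions on $T^*M$, and $\Omega_{M_\Sigma/\Sigma}$ is realized as $\mathcal{O}_{M_\Sigma}\partial^v_\sigma\mathcal{O}_{M_\Sigma}$, cf. Sect. 2.4.1, Assumption 1.

Proposition 2.7.1. The sheaf $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ carries a unique structure of a sheaf of SVDOs over $B = \mathcal{O}_\Sigma$ such that ${}_{(-1)}$ is the canonical multiplication, $T = \partial^v_\sigma$ (which furnishes (2.4.1a,b,f) in this case), and (2.4.1c–e) take the following form: if $\xi, \xi' \in \mathcal{T}_{M_\Sigma}$, $\omega, \omega' \in \Omega_{M_\Sigma}$, then
$$\begin{gathered}
{}_{(0)} : \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \otimes \mathcal{O}_{M_\Sigma} \to \mathcal{O}_{M_\Sigma}, \quad \mathcal{O}_{M_\Sigma} \otimes \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \to \mathcal{O}_{M_\Sigma}, \\
(\xi + \omega)_{(0)}F = -F_{(0)}(\xi + \omega) = \xi F,
\end{gathered} \tag{2.7.5}$$
$$\begin{gathered}
{}_{(0)} : \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \otimes \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \to \mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}, \\
(\xi + \omega)_{(0)}(\xi' + \omega') = [\xi, \xi'] + \mathrm{Lie}_\xi\,\omega' - \mathrm{Lie}_{\xi'}\,\omega + \partial^v_\sigma\left(\iota_{\xi'}\omega\right),
\end{gathered} \tag{2.7.6}$$
$$\begin{gathered}
{}_{(1)} : \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \otimes \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \to \mathcal{O}_{M_\Sigma}, \\
(\xi + \omega)_{(1)}(\xi' + \omega') = \iota_\xi\omega' + \iota_{\xi'}\omega.
\end{gathered} \tag{2.7.7}$$
Note that (2.7.5–7) restricted to some $U \subset M_\Sigma$ are nothing but the definition of the canonical Courant $C^\infty(U)$-algebroid $C_0$ of (2.4.21); therefore $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}(U)$ is nothing but $F^*(C_0)$, where $F$ is the equivalence of categories (2.5.1). The vertex Poisson algebra structure of Proposition 2.7.1 is not quite what we need. Being $\mathcal{T}_\Sigma$-equivariant, it is subject to the $\xi$-twist, see (2.3.3), for any $\xi \in H^0(\Sigma, \mathcal{T}_\Sigma)$.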
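It may help to see that the bracket (2.7.6) is consistent with what was derived abstractly in 2.4.1. The check below specializes (2.7.6) to a vector field against a form and compares the result with (2.4.8) and with skew-commutativity II.1; it uses only formulas already stated, with $T = \partial^v_\sigma$.

```latex
% Two special cases of (2.7.6) and the skew-commutativity axiom.
\begin{align*}
\xi_{(0)}\omega' &= \mathrm{Lie}_\xi\,\omega'
  &&\text{((2.7.6) with } \omega=0,\ \xi'=0\text{; this is (2.4.8))},\\
\omega_{(0)}\xi' &= -\mathrm{Lie}_{\xi'}\,\omega+\partial^v_\sigma\big(\iota_{\xi'}\omega\big)
  &&\text{((2.7.6) with } \xi=0,\ \omega'=0\text{)},\\
\omega_{(0)}\xi' &= -\xi'_{(0)}\omega+T\big(\xi'_{(1)}\omega\big)
  &&\text{(II.1 with } n=0\text{; terms with } j\ge 2 \text{ vanish by (2.1.1))}.
\end{align*}
```

Since $\xi'_{(0)}\omega = \mathrm{Lie}_{\xi'}\omega$ and $\xi'_{(1)}\omega = \iota_{\xi'}\omega$ by (2.7.7), the second and third lines agree. The term $\partial^v_\sigma(\iota_{\xi'}\omega)$ in (2.7.6) is therefore exactly the correction that II.1 forces.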
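Definition 2.7.2. Let $\widetilde{\mathcal{O}}_{J^\infty(T^*M_\Sigma/\Sigma)}$ denote the sheaf $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ with the vertex Poisson algebra structure defined in Proposition 2.7.1 and let $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ denote the latter's $\partial^h_\sigma$-twist, see (2.7.1).

Note that in the case of $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$, the derivation $T$ becomes
$$T = \rho(\partial_\sigma). \tag{2.7.8}$$
In particular, (2.7.6) is changed as follows:
$$\begin{gathered}
{}_{(0)} : \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \otimes \left(\mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}\right) \to \mathcal{T}_{M_\Sigma/\Sigma} \oplus \Omega_{M_\Sigma/\Sigma}, \\
(\xi + \omega)_{(0)}(\xi' + \omega') = [\xi, \xi'] + \mathrm{Lie}_\xi\,\omega' - \mathrm{Lie}_{\xi'}\,\omega + \rho(\partial_\sigma)\left(\iota_{\xi'}\omega\right),
\end{gathered} \tag{2.7.9}$$
and the operations on $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ are no longer linear over $\mathcal{O}_\Sigma$, only over its subalgebra of $\partial^h_\sigma$-invariants, cf. the remark following (2.3.3). Let us now relate $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ to the canonical Lie algebra sheaf $\mathcal{H}_{can}$ defined in (1.5.17). Lemma 2.2 associates with $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ the sheaf of Lie algebras
$$\mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}) = \mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}/\rho(\partial_\sigma)\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}.$$

Proposition 2.7.3. The Lie algebra sheaves $\mathcal{H}_{can}$ and $\mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)})$ are canonically isomorphic.

Proof. The sheaf isomorphism
$$\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \xrightarrow{\ \sim\ } \Omega^{1,0}_{J^\infty(T^*M_\Sigma/\Sigma)}, \quad F \mapsto F\,d\sigma$$
descends to
$$\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}/\rho(\partial_\sigma)\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \xrightarrow{\ \sim\ } \Omega^{1,0}_{J^\infty(T^*M_\Sigma/\Sigma)}/d_{\rho/\Sigma}\Omega^{0,0}_{J^\infty(T^*M_\Sigma/\Sigma)}.$$
Lemma 1.5.4.1 (and (1.5.17)) identifies the range of this map with $\mathcal{H}_{can}$, and thanks to (2.7.8), the domain of this map is $\mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)})$ (it is at this point that we need the $\partial^h_\sigma$-twist); hence a sheaf isomorphism
$$\mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}) \xrightarrow{\ \sim\ } \mathcal{H}_{can}. \tag{2.7.10}$$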
Map (2.7.10) respects all defining relations (1.5.17–18a,b): (1.5.18) is (part of) (2.7.5,6), (1.5.19a) is Sect. 2.1, Axiom III, and (1.5.19b) is Sect. 2.1, Axiom II.3 (another point where the $\partial^h_\sigma$-twist is necessary). Hence (2.7.10) is a Lie algebra sheaf isomorphism.

Terminology 2.7.4. We have obtained two families of sheaves of vertex Poisson algebras. First, those provided by the combination of Propositions 2.6.1a) and 2.7.1. They can be realized as either $\widetilde{\mathcal{O}}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} H$, where $H \in H^0(M, \Omega^{3,cl}_M)$ represents a 3-dimensional cohomology class, or $\widetilde{\mathcal{O}}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} \{\alpha_{ij}\}$, where $\{\alpha_{ij}\}$ is a cocycle representing an element of $H^1(M, \Omega^{2,cl}_M)$.

Second, their $\partial^h_\sigma$-twisted versions, to be denoted by $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} H$ and $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} \{\alpha_{ij}\}$. As Proposition 2.7.3 indicates, it is the latter that will be of importance. Note, however, that these choices have arisen only because we have included functions of $\tau$ and $\sigma$. In fact, both $\widetilde{\mathcal{O}}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} \{\alpha_{ij}\}$ and $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} \{\alpha_{ij}\}$ induce the same vertex Poisson algebra structure on the fiber at any point $(\sigma, \tau) \in \Sigma$. For this reason sheaves such as $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} \{\alpha_{ij}\}$ will also be referred to as sheaves of SVDOs.
2.8. The Lagrangian interpretation. Let us place ourselves in the situation of 1.6.8 and assume that the Lagrangian $L \in H^0(J^\infty(M_\Sigma), \Omega^{2,0}_{J^\infty(M_\Sigma)})$ is of order 1, globally defined, and convex. A combination of Proposition 2.7.3 and Lemma 1.6.8.1 gives
$$\mathcal{H}^{\bar\omega_L}_{Sol/\Sigma} \xrightarrow{\ \sim\ } \mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}), \qquad \tilde I_L \subset \Gamma(M_\Sigma, \mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)})). \tag{2.8.1}$$
In this sense the universal sheaf of SVDO's $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ governs the theory associated to $L$. In order to interpret similarly all the other, twisted, sheaves of SVDO's provided by Proposition 2.6.1a), one needs to consider Lagrangians (1.6.1a,b) that do not glue in a global section of $\Omega^{2,0}_{J^\infty(M_\Sigma)}$. One possibility to construct such a Lagrangian is to add what a physicist might call a Wess-Zumino term or an $H$-flux, cf. [GHR,W1]. Fix a global closed 3-form $H$ on $M$ and let $\{U_i\}$ be an open covering of $M$ fine enough to ensure the existence of a collection of 2-forms
$$\{\beta^{(i)} \in \Omega^2_M(U_i) \ \text{s.t.}\ d\beta^{(i)} = H \ \text{on}\ U_i\}. \tag{2.8.2}$$
Define
$$L_H = \{L^{(i)} = L + \beta^{(i)}(\rho(\partial_\tau), \rho(\partial_\sigma))\,d\tau \wedge d\sigma\}. \tag{2.8.3}$$
It follows from (2.8.2) that on double intersections $\beta^{(i)} - \beta^{(j)}$ are closed and, provided $\{U_i\}$ is fine enough, are exact, i.e., there is a collection of 1-forms $\{\alpha^{(ij)}\}$ such that $\beta^{(j)} - \beta^{(i)} = d\alpha^{(ij)}$. Then a quick computation shows that
$$L^{(j)} - L^{(i)} = d_\rho\left(\iota_{\rho(\partial_\tau)}\alpha^{(ij)}\,d\tau + \iota_{\rho(\partial_\sigma)}\alpha^{(ij)}\,d\sigma\right).$$
Therefore, collection (2.8.3) is a new Lagrangian in the sense of (1.6.1a,b). $L_H$ is a collection of locally defined Lagrangians, which are still order 1 and convex, hence $Sol_{L_H}$ can still be identified with the universal $J^\infty(T^*M_\Sigma/\Sigma)$. One way to define such an identification is to use $L$, as in (2.8.1):
$$d_{TM}L : (Sol_{L_H}) \xrightarrow{\ \sim\ } J^\infty(T^*M_\Sigma/\Sigma), \tag{2.8.4}$$
but the obvious counterpart of (2.8.1) fails in this case. Instead, (2.8.4) gives an isomorphism of the twisted sheaf, see (2.4.21),
$$\mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} H) \xrightarrow{\ \sim\ } \mathcal{H}^{\bar\omega_{L_H}}_{/\Sigma}. \tag{2.8.5}$$
This attaches the twisted sheaf $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} H$ to the Lagrangian $L_H$. To see how the twist comes about note that the Legendre transform $d_{TM}L$ used in (2.8.4) does not respect the canonical variational 2-form $\omega_{L_H}$, see (1.6.4). This can be straightened out locally. According to (1.6.13), one way to proceed is to choose, over $U_i$, the mapping to be $d_{TM}L^{(i)}$. Since $L^{(i)} = L + \beta^{(i)}$,
$$d_{TM}L^{(i)}(\xi) = d_{TM}L(\xi) + \frac{1}{2}\iota_\xi\beta^{(i)}, \tag{2.8.6}$$
as follows, e.g., from local formulas (1.6.14). But mappings (2.8.6) are incompatible on double intersections $U_i \cap U_j$, the obstruction being the Čech cocycle
$$d_{\check{C}}\{\beta^{(i)}\} = \{\beta^{(j)} - \beta^{(i)}\} \in Z^1_{\check{C}ech}(M, \Omega^{2,cl}_M). \tag{2.8.7}$$
In order to restore the compatibility, let us introduce the twisted sheaf $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} d_{\check{C}}\{\beta^{(i)}\}$ obtained by twisting the gluing functions of $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ over $U_i \cap U_j$ by the 2-form $\beta^{(j)} - \beta^{(i)}$, as we did in (2.6.9). Then the collection of mappings
$$\left\{(d_{TM}L^{(i)})^* : \mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}(U_i) \xrightarrow{\ \sim\ } \mathcal{O}_{Sol_{L_H}}(U_i)\right\} \tag{2.8.8a}$$
delivers a map of the twisted sheaf
$$\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} d_{\check{C}}\{\beta^{(i)}\} \to \mathcal{O}_{Sol_{L_H}}, \tag{2.8.8b}$$
so that the arising map
$$\mathrm{Lie}\left(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} d_{\check{C}}\{\beta^{(i)}\}\right) \xrightarrow{\ \sim\ } \mathcal{H}^{\bar\omega_{L_H}}_{/\Sigma} \tag{2.8.9}$$
is a Lie algebra sheaf isomorphism. It is explained in some detail in 2.6.2 that this sheaf is the same as $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} \dot{+} H$, see (2.6.13); hence (2.8.9) is equivalent to (2.8.5).

Incidentally, the classification of automorphisms of SVDO's, Proposition 2.6.1b), is also accurately reflected in the Lagrangian approach. Given a globally defined Lagrangian and a closed 2-form $\beta$, a $B$-field, let $L_\beta = L + \beta(\rho(\partial_\tau), \rho(\partial_\sigma))\,d\tau \wedge d\sigma$, cf. (2.8.3). This does nothing to either the corresponding equations of motion or the corresponding variational 2-form. Hence $Sol_L = Sol_{L_\beta}$, literally, as pre-symplectic manifolds, but there arise two competing Legendre transforms, $d_{TM}L$ and $d_{TM}L_\beta$. A moment's thought shows that the latter is the composition of the former with the $B$-field transform, $\xi \mapsto \xi + \iota_\xi\beta$, and this provides the Lagrangian realization of the automorphism of the SVDO $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ associated to $\beta$ in Proposition 2.6.1b).

The subalgebras of integrals of motion $\tilde I_L \to \Gamma(Sol_L, \mathcal{H}^{\bar\omega_L}_{/\Sigma})$, arising by virtue of Lemma 1.6.8.1, also tend to come from vertex Poisson subalgebras of $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$. For example, the three Virasoro algebras, left, right, and “half-twisted”, see (1.6.22,23), are the Lie-functor evaluated on the three subalgebras of $\Gamma(M, \mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)})$ generated by
$$\begin{gathered}
\frac{1}{4}g^{ij}x_i x_j + \frac{1}{4}g_{ij}\,\partial_\sigma x^i \partial_\sigma x^j - \frac{1}{2}x_j\partial_\sigma x^j, \\
-\frac{1}{4}g^{ij}x_i x_j - \frac{1}{4}g_{ij}\,\partial_\sigma x^i \partial_\sigma x^j - \frac{1}{2}x_j\partial_\sigma x^j, \\
-x_j\partial_\sigma x^j,
\end{gathered} \tag{2.8.10}$$
respectively. The global nature of these local formulas was unraveled in 1.6.9.2.

2.9. An example: WZW model. Let us see how all of this plays out in the case where the target manifold is a real Lie group $G$, either compact and simple or $GL(n, \mathbb{R})$.
F. Malikov
2.9.1. Classification. Let $\mathfrak{g} = \mathrm{Lie}\,G$ be the corresponding Lie algebra. Fix an invariant bilinear form $g \in S^2(\mathfrak{g}^*)^{\mathfrak{g}}$ and an invariant trilinear form
$$ H(x, y, z) = g([x, y], z). \tag{2.9.1} $$
The left translates of these generate the invariant metric and 3-form (resp.) on $G$, which we will take the liberty of denoting by the same letters
$$ g \in H^0(G, T_G^{\otimes 2}), \quad H \in H^0(G, \Omega^3_G). \tag{2.9.2} $$
Note that the latter is closed:
$$ H \in H^0(G, \Omega^{3,cl}_G). \tag{2.9.3} $$
It is well known that
$$ H^3(G, \mathbb{R}) = \mathbb{R} \cdot (\text{class of } H). \tag{2.9.4} $$
Therefore, Proposition 2.6.1a) implies that the set of isomorphism classes of SVDO’s on G form a 1-parameter family: def
.
S DG,k = O J ∞ (G / ) +
−k H. 2
(2.9.5)
As it was explained in Sect. 2.6.2, the structure of S DG,k is determined by the following: there is a fixed splitting
S DG,k
1
= TG / ⊕ G / ,
(2.9.6)
poiss and the vertex Poisson algebra structure makes DG,k into the Courant algebroid 1 that satisfies k (2.4.14, 15)hold true, and(0)s = − H, 2
(2.9.7)
cf. (2.4.21–23). Induced by the action on the left and on the right, there are the corresponding Lie algebra g = Lie G embeddings in the space of global vector fields jl0 : g → H 0 (G , TG / ), jr0 : g → H 0 (G , TG / ) s.t. [ jl0 (g), jr0 (g)] = 0. (2.9.8)
These embeddings respect the SVDO structure on S DG,0 1 in that jl0 ([x, y]) = jl0 (x)(0) jl0 (y) , jl0 (x)(n) jl0 (y) if n > 0,
(2.9.9a)
jr0 ([x, y]) = jr0 (x)(0) jr0 (y) , jr0 (x)(n) jr0 (y) if n > 0,
(2.9.9b)
Lagrangian Approach to Sheaves of Vertex Algebras
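and
$$ j^0_l(x)_{(n)}\, j^0_r(y) = 0 \ \text{ if } n \ge 0, \tag{2.9.9c} $$
as it follows from either (2.7.6) or (2.4.6). Technically, (2.9.9a–c) mean the following. Associated to $\mathfrak{g}$ there is a $\mathbb{Z}_+$-graded vertex Poisson algebra, $V(\mathfrak{g})_k$, see e.g. [FB-Z]. It is the universal vertex Poisson algebra generated by
$$ (V(\mathfrak{g})_k)_0 = \mathbb{R}, \quad (V(\mathfrak{g})_k)_1 = \mathfrak{g}, \tag{2.9.10} $$
such that for any $x, y \in \mathfrak{g}$,
$$ x_{(n)} y = \begin{cases} k\,g(x, y) & \text{if } n = 1, \\ [x, y] & \text{if } n = 0, \\ 0 & \text{if } n > 1. \end{cases} \tag{2.9.11} $$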
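To connect (2.9.11) with the more familiar affine Lie algebra relations, one may apply the standard commutator formula for modes, $[a_{(m)}, b_{(n)}] = \sum_{j \ge 0} \binom{m}{j} (a_{(j)} b)_{(m+n-j)}$. Both this formula and the mode notation $x_{(m)}$ are assumptions imported from the general theory (see e.g. [FB-Z]); they are not derived in this section. Under these assumptions, a sketch of the computation reads:

```latex
\begin{align*}
[x_{(m)}, y_{(n)}]
  &= (x_{(0)} y)_{(m+n)} + m\,(x_{(1)} y)_{(m+n-1)} \\
  &= [x, y]_{(m+n)} + m\,k\,g(x, y)\,\mathbf{1}_{(m+n-1)} \\
  &= [x, y]_{(m+n)} + m\,k\,g(x, y)\,\delta_{m+n,0},
\end{align*}
% The sum stops at j = 1 because x_{(j)} y = 0 for j > 1 by (2.9.11);
% the last step uses \mathbf{1}_{(p)} = \delta_{p,-1} for the unit element.
```

In this reading, $k$ plays the role of the level, and at $k = 0$ only the bracket term survives.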
By definition, (2.9.9a–c) imply that maps (2.9.8) can be extended to vertex Poisson algebra maps jl0 : V (g)0 → H 0 (G , S DG,0 ), jr0 : V (g)0 → H 0 (G , S DG,0 )
(2.9.12)
such that
jl0 (V (g)0 )
(n)
jr0 (V (g)0 ) = 0 if n ≥ 0.
(2.9.13)
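In order to carry this over to $k \neq 0$, the maps $j^0_{l/r}$ must be deformed. Let
$$ j^k_l : \mathfrak{g} \to (\mathcal{D}^{poiss}_{G,k})_1, \quad j^k_l(x) = j^0_l(x) + \frac{k}{2}\, g(j^0_l(x), \cdot), \tag{2.9.14} $$
$$ j^k_r : \mathfrak{g} \to (\mathcal{D}^{poiss}_{G,k})_1, \quad j^k_r(x) = j^0_r(x) - \frac{k}{2}\, g(j^0_r(x), \cdot). \tag{2.9.15} $$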
Theorem 2.9.1.1. [FP,F,AG,GMS2]. Maps (2.9.14,15) extend to vertex Poisson algebra embeddings jlk
jrk
V (g)k → H 0 (G , S DG,k ) ← V (g)−k
(2.9.16)
such that
jlk (V (g)k )
(n)
jrk (V (g)−k ) = 0 if n ≥ 0.
(2.9.17)
Remark 2.9.1.2. This appealing result has a long and somewhat unhappy history. A version of it first appeared in [FP] (in a more complicated, quantum, situation) but apparently had been known even earlier to E. Frenkel, [F] – all of this before the introduction of sheaves of vertex algebras – and then was thoroughly forgotten. Arkhipov and Gaitsgory [AG] gave a proof in the language of chiral algebras. Our presentation is close to [GMS2].
The algebra $V(\mathfrak{g})_k$ has a well-known family of modules, $V_{\lambda,k}$, induced from $V_\lambda$, the simple finite dimensional $\mathfrak{g}$-module with highest weight $\lambda$, see e.g. [FBZ]. According to Theorem 2.9.1.1, H 0 (G , S DG,k ) is a $V(\mathfrak{g})_k \otimes V(\mathfrak{g})_{-k}$-module, see Sect. 2.3 for the definition of the tensor product of vertex Poisson algebras.
Proposition 2.9.1.3. If $k \neq 0$, then there is an isomorphism of $V(\mathfrak{g})_k \otimes V(\mathfrak{g})_{-k}$-modules
∼ ˆ ⊕λ Vλ,k ⊗ Vλ∗ ,−k , H 0 (G , S DG,k ) −→ C ∞ ()⊗
(2.9.18)
where $\lambda^*$ stands for the highest weight of the $\mathfrak{g}$-module dual to $V_\lambda$.
Sketch of Proof. The validity of decomposition (2.9.18) for the subspace H 0 (G , (S DG,k )0 ) is the content of the Peter-Weyl theorem. It is not hard to deduce from (2.9.14,15) that any $\mathbb{R}$-basis, $B$, of $j^k_l(\mathfrak{g}) \oplus j^k_r(\mathfrak{g})$ is a basis of H 0 (G , (S DG,k )1 ) over functions if and only if $k \neq 0$. Hence, the entire H 0 (G , $\mathcal{D}^{poiss}_{G,k}$ ) is the space of differential polynomials in $B$ over functions. Decomposition (2.9.18) follows at once from the induced nature of modules $V_{\lambda,k}$.
Remark 2.9.1.4. A proof – in the quantum case – of (2.9.18) for a generic $k$ first appeared in [FS]. Our proof goes through in the quantum case as well, again for a generic $k$. It is shown in a recent preprint [Zh] what may happen at special values of $k$. Decomposition (2.9.18) is tantalizingly similar to the space of states of the WZW model to which S DG,k is indeed intimately related.
2.9.2. WZW. Consider the standard σ-model Lagrangian with target $G$:
$$ L_\kappa = \frac{\kappa}{2}\, g\big((\partial_\tau - \partial_\sigma)x, (\partial_\tau + \partial_\sigma)x\big)\, d\tau \wedge d\sigma, \tag{2.9.19} $$
cf. (1.6.16), where $g(., .)$ is the invariant metric (2.9.2) and $\kappa$ is an arbitrary constant. Next use the 3-form $H$ of (2.9.2) to obtain $L_\kappa^{-k/2H}$ as explained in (2.8.4–5). The WZW Lagrangian [W1] is
$$ L_{WZW} = L_{k/2}^{-\frac{k}{2}H}. \tag{2.9.20} $$
As follows from (2.8.5) and normalization (2.9.5), the sheaf S DG,k governs the theory associated to $L_\kappa^{-k/2H}$ for any $\kappa$. It is clear why the $H$-twist of (2.9.19) is needed: the pleasing decomposition (2.9.18) is valid only if $k \neq 0$. Let us now explain the choice of $\kappa$ made in (2.9.20). Recall that Lagrangian (2.9.19) is conformally invariant, i.e., the corresponding algebra of integrals of motion contains two Virasoro subalgebras Vir±, see (1.6.17). It is easy to see that the twisted version, $L_\kappa^{-k/2H}$, is also conformally invariant, and Vir± are still the corresponding integrals of motion. By virtue of (2.8.5) the Legendre transform delivers the embeddings
Vir ± → G, Lie S DG,k .
(2.9.21)
On the other hand, each V (g)k carries its own Virasoro element – a well-known fact. By virtue of Theorem 2.9.1.1, there arise then two more Virasoro subalgebras
Vir l → G, Lie S DG,k ← Vir r . (2.9.22) Lemma 2.9.2.1. Upon taking the images of (2.9.21–22) Vir + = Vir l , Vir − = Vir r
(2.9.23)
if and only if κ = k/2. This allows to compute the left/right moving subalgebra, see Definition 1.6.9.1. Corollary 2.9.2.2. The right moving subalgebra of WZW is Lie(C ∞ () ⊗ V (g)k ) and the left moving is Lie(C ∞ () ⊗ V (g)−k ). The Lie-functor appearing in 2.9.2.1–2 only obscures the matter, of course. Armed with the notion of a vertex Poisson algebra we can easily refine both Definition 1.6.9.1 and 2.9.2.1–2. The Lie algebra Vir itself is the Lie-functor applied to a certain vertex Poisson algebra, Vir. Embeddings (2.9.21–22) are engendered by vertex Poisson algebra embeddings of 4 copies of Vir: Vir ± → (G, S DG,k ),
(2.9.24)
Vir → (G, S DG,k ) ← Vir .
(2.9.25)
l
r
Lemma 2.9.2.1 can be refined as follows: upon taking the images of (2.9.24,25) Vir + = Virl , Vir − = Virr iff κ =
k . 2
(2.9.26)
Definition 1.6.9.1 can be similarly refined: Definition 2.9.2.3. Let the left/right moving subalgebras of S DG,k be + S DG,k = {v ∈ S DG,k s.t. v(n) Vir − = 0 ifn ≥ 0},
(2.9.27)
− S DG,k = {v ∈ S DG,k s.t. v(n) Vir + = 0 ifn ≥ 0}.
(2.9.28)
The refined form of Corollary 2.9.2.2 is this: − + S DG,k = C ∞ () ⊗ V (g)k , S DG,k = C ∞ () ⊗ V (g)−k .
(2.9.29)
Proof 2.9.2.4. We will prove (2.9.26) and (2.9.29) from which Lemma 2.9.2.1 and Corollary 2.9.2.2 follow immediately. Proving (2.9.26) amounts to painstakingly translating from Sect. 2.9.1 to Sect. 2.9.2, the Legendre transform being the main tool. To facilitate bookkeeping, we will assume that $G = GL(n, \mathbb{R})$; an extension via a faithful representation to compact Lie groups is immediate. Let then $x^{ij}$ be coordinates, $\partial_{ij} = \partial/\partial x^{ij}$, and $\{E_{ij}\}$ the standard basis of $\mathfrak{gl}(n, \mathbb{R})$. The invariant metric is
$$ g = x_{t\alpha}\, dx^{\alpha j}\, x_{j\beta}\, dx^{\beta t}, \tag{2.9.30} $$
where $x_{t\alpha}$ are defined so that $x_{t\alpha} x^{\alpha j} = \delta_t^j$, and the summation w.r.t. repeated indices is always assumed.
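In matrix notation, and assuming (as the normalization $x_{t\alpha} x^{\alpha j} = \delta_t^j$ suggests) that the lower-index $x_{t\alpha}$ are the entries of the inverse matrix $x^{-1}$, formula (2.9.30) is the trace form. The short computation below is an illustrative sketch of why it is invariant under translations by a constant matrix $a$:

```latex
\begin{align*}
g &= x_{t\alpha}\, dx^{\alpha j}\, x_{j\beta}\, dx^{\beta t}
   = \operatorname{tr}\!\left( x^{-1} dx \cdot x^{-1} dx \right), \\
(ax)^{-1}\, d(ax) &= x^{-1} a^{-1} a\, dx = x^{-1} dx, \\
(xa)^{-1}\, d(xa) &= a^{-1} \left( x^{-1} dx \right) a .
\end{align*}
% Left translations leave x^{-1}dx itself unchanged; right translations
% conjugate it, and the trace of the square is invariant under conjugation.
```

Hence $g$ is unchanged by both $x \mapsto ax$ and $x \mapsto xa$, in line with its being called the invariant metric.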
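Embeddings (2.9.8) take the form
$$ j^0_l(E_{ij}) = x^{\alpha i} \partial_{\alpha j}, \tag{2.9.31} $$
$$ j^0_r(E_{ij}) = -x^{j\alpha} \partial_{i\alpha}. \tag{2.9.32} $$
By virtue of (2.9.30), definitions (2.9.14,15) read
$$ j^k_l(E_{ij}) = x^{\alpha i} \partial_{\alpha j} + \frac{k}{2}\, x_{j\gamma}\, \partial_\sigma x^{\gamma i}, \tag{2.9.33} $$
$$ j^k_r(E_{ij}) = -x^{j\alpha} \partial_{i\alpha} + \frac{k}{2}\, x_{\gamma i}\, \partial_\sigma x^{j\gamma}. \tag{2.9.34} $$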
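Finally, the elements that generate the two corresponding Virasoro vertex Poisson algebras inside S DG,k , cf. (2.9.25), are
$$ Vir_l = \left\langle \frac{1}{k}\, j^k_l(E_{ij})\, j^k_l(E_{ji}) \right\rangle, \quad Vir_r = \left\langle \frac{1}{k}\, j^k_r(E_{ij})\, j^k_r(E_{ji}) \right\rangle. \tag{2.9.35} $$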
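To recapitulate all of this in terms intrinsic to the Lagrangian $L_\kappa^{-k/2H}$, one needs to use the twisted version of the Legendre transform, see (2.8.4), i.e., apply (1.6.13–14) not to $L_\kappa^{-k/2H}$ but to $L^0_\kappa$. This amounts to letting
$$ \partial_{ij} = \frac{\partial L^0_\kappa}{\partial(\partial_\tau x^{ij})}; \quad \text{thus } \partial_{ij} = \kappa\, x_{\alpha i}\, \partial_\tau x^{\beta\alpha}\, x_{j\beta}. \tag{2.9.36} $$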
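Plugging this in (2.9.33–34) gives
$$ j^k_l(E_{ij}) = \left( \kappa\, \partial_\tau x^{\alpha i} + \frac{k}{2}\, \partial_\sigma x^{\alpha i} \right) x_{j\alpha}, \tag{2.9.37} $$
$$ j^k_r(E_{ij}) = \left( -\kappa\, \partial_\tau x^{j\alpha} + \frac{k}{2}\, \partial_\sigma x^{j\alpha} \right) x_{\alpha i}. \tag{2.9.38} $$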
It is pleasing to notice that precisely when $\kappa = k/2$, the latter formulas become the WZW currents, see [W1], (15) or [GW], (2.3),
$$ j^k_l(E_{ij}) = k\, \partial_+ x^{\alpha i}\, x_{j\alpha}, \tag{2.9.39} $$
$$ j^k_r(E_{ij}) = k\, \partial_- x^{j\alpha}\, x_{\alpha i}, \tag{2.9.40} $$
where $\partial_\pm = (\partial_\sigma \pm \partial_\tau)/2$. Now to the Virasoro subalgebras. Plugging (2.9.37,38) in (2.9.35) one finds similarly that precisely when $\kappa = k/2$ the corresponding Virasoro elements are
$$ Vir_l = \langle k\, g(\partial_+ x, \partial_+ x) \rangle, \tag{2.9.41} $$
$$ Vir_r = \langle k\, g(\partial_- x, \partial_- x) \rangle, \tag{2.9.42} $$
i.e., defined by the familiar, see (1.6.20), formulas for Vir ± . This concludes our proof of (2.9.26).
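As a quick check of the step from (2.9.37–38) to (2.9.39–40), one can substitute $\kappa = k/2$ and use $\partial_\pm = (\partial_\sigma \pm \partial_\tau)/2$. This merely restates the formulas above and adds no new result:

```latex
\begin{align*}
j^k_l(E_{ij})\big|_{\kappa = k/2}
  &= \tfrac{k}{2} \left( \partial_\tau x^{\alpha i} + \partial_\sigma x^{\alpha i} \right) x_{j\alpha}
   = k\, \partial_+ x^{\alpha i}\, x_{j\alpha}, \\
j^k_r(E_{ij})\big|_{\kappa = k/2}
  &= \tfrac{k}{2} \left( \partial_\sigma x^{j\alpha} - \partial_\tau x^{j\alpha} \right) x_{\alpha i}
   = k\, \partial_- x^{j\alpha}\, x_{\alpha i}.
\end{align*}
```

For $k \neq 0$ and $\kappa \neq k/2$ the combination $\kappa \partial_\tau + \frac{k}{2} \partial_\sigma$ is not proportional to $\partial_+$ (and $-\kappa \partial_\tau + \frac{k}{2} \partial_\sigma$ is not proportional to $\partial_-$), which is why the identification singles out $\kappa = k/2$.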
Now to (2.9.29). Having at our disposal (2.9.26), we infer from Theorem 2.9.1.1 that − + C ∞ () ⊗ V (g)k ⊂ S DG,k , C ∞ () ⊗ V (g)−k ⊂ S DG,k .
(2.9.43)
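To prove the reverse inclusions, let
$$ L^l = \frac{1}{k}\, j^k_l(E_{ij})\, j^k_l(E_{ji}), \quad L^r = \frac{1}{k}\, j^k_r(E_{ij})\, j^k_r(E_{ji}). $$
It follows easily from the definition of the modules $V_{\lambda,k}$ that
$$ \mathrm{Ker}\, L^l_{(0)} = V_{0,-k} \stackrel{\mathrm{def}}{=} V(\mathfrak{g})_{-k}, \quad \mathrm{Ker}\, L^r_{(0)} = V_{0,k} \stackrel{\mathrm{def}}{=} V(\mathfrak{g})_k. \tag{2.9.44} $$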
By definition then − + C ∞ () ⊗ V (g)k ⊃ S DG,k , C ∞ () ⊗ V (g)−k ⊃ S DG,k ,
(2.9.45)
which concludes the proof of (2.9.29). 3. Supersymmetric Analogues 3.1. Bits of supergeometry. All of the geometric background of Sect. 1 allows more or less straightforward super-generalization. We will explain this very briefly, and in less generality, because our exposition will be more example-oriented. Such sources as [DM,L,M1] provide an introduction to supermathematics. 3.1.1. Super world-sheet. The world-sheet is now a 2|2-dimensional real C ∞ -manifold either with a fixed coordinate system ˆ → R2|2 (u, v, θ + , θ − ) :
(3.1.1a)
or a fixed étale coordinate system ˆ (u, v, θ + , θ − ) : R2|2 → ,
(3.1.1b)
θ±
where (u, v) are even and are odd. We have the underlying even manifold ˆ = {θ + = θ − = 0} →
(3.1.2)
ˆ (u,v) → .
(3.1.3)
and the bundle
The time-fibration will be defined to be the composition τ ˆ (u,v) → → ⊂ R
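(3.1.4)
ˆ for some fibration τ , where is an even manifold underlying . ˆ The Lie algebra of vector fields on contains two remarkable elements
$$ D_+ = \frac{\partial}{\partial\theta^+} - \theta^+ \frac{\partial}{\partial u}, \quad D_- = \frac{\partial}{\partial\theta^-} - \theta^- \frac{\partial}{\partial v}. \tag{3.1.5} $$
The following relations hold true:
$$ [D_+, D_+] = -2\frac{\partial}{\partial u}, \quad [D_-, D_-] = -2\frac{\partial}{\partial v}, \quad [D_+, D_-] = 0, \quad \left[\frac{\partial}{\partial v}, D_\pm\right] = \left[\frac{\partial}{\partial u}, D_\pm\right] = 0. \tag{3.1.6} $$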
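The first of the relations for $D_\pm$ can be verified by a short computation. The sketch below assumes the usual conventions for odd variables, which the text does not spell out at this point: the bracket of two odd vector fields is their anticommutator, $\theta^+ \theta^+ = 0$, and $\partial_{\theta^+}$ is a left derivative, so that $\partial_{\theta^+} \theta^+ + \theta^+ \partial_{\theta^+} = 1$ as operators.

```latex
\begin{align*}
[D_+, D_+] &= 2 D_+^2, \\
D_+^2 &= \left( \partial_{\theta^+} - \theta^+ \partial_u \right)^2 \\
      &= \partial_{\theta^+}^2
         - \left( \partial_{\theta^+} \theta^+ + \theta^+ \partial_{\theta^+} \right) \partial_u
         + \theta^+ \theta^+\, \partial_u^2 \\
      &= -\partial_u ,
\end{align*}
% using \partial_{\theta^+}^2 = 0 and \theta^+\theta^+ = 0.
```

Hence $[D_+, D_+] = -2\partial_u$; the computation for $D_-$ is identical with $(\theta^+, u)$ replaced by $(\theta^-, v)$.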
3.1.2. Super-jets. Let M be a C ∞ -supermanifold with underlying even manifold M even . Define ˆ Mˆ = M × .
(3.1.7)
ˆ Mˆ → .
(3.1.8)
ˆ It is fibered over : The manifold of ∞-jets of sections of this bundle, J ∞ (Mˆ ), is defined in a straightforward manner as follows (cf. [BD, p.80]). Definition 3.1.2.1. J ∞ (Mˆ ) is a supermanifold with underlying even manifold J ∞ (Meven ) and the structure sheaf O J ∞ (Mˆ ) defined to be the symmetric algebra on Dˆ ⊗Oˆ O M modulo the relations 1 ⊗ f · 1 ⊗ g = 1 ⊗ f g, 1 ⊗ 1 = 1, ˜ ˜
ξ ⊗ f g = (ξ ⊗ f ) · (1 ⊗ g) + (−1)ξ f (1 ⊗ f ) · (ξ ⊗ g)
(3.1.9)
for any ξ ∈ Tˆ , f, g ∈ O M , where˜stands for the parity. There arises a fiber bundle
ˆ J ∞ Mˆ →
(3.1.10)
ρ : Tˆ → T J ∞ (Mˆ ) s.t. ρ(η)(ξ ⊗ f ) = (ηξ ) ⊗ f
(3.1.11)
with connection
in complete analogy with (1.1.5). The relative versions, such as J ∞ (M/ ), are immediate. ˆ Note that connection (3.1.11) is constant in the direction of (θ + , θ − ), i.e., if we let ∞ J (Mˆ )o = {θ + = θ − = 0} → J ∞ (Mˆ ), then there is a diffeomorphism
∞ (3.1.12) J (Mˆ ), ρ(∂θ ± ) → J ∞ (Mˆ )o × R0|2 , ρ o (∂θ ± ) = ∂θ ± of R0|2 -manifolds with connection. Indeed, given a local coordinate system X i on M, the collection i {X (m),() , u, v, θ + , θ − ; (m) ∈ Z2+ , () ∈ Z22 }
(3.1.13a)
constitutes a local coordinate system on J ∞ (Mˆ ), where i X (m = (∂um 1 ∂vm 2 ∂θ+1 ∂θ−2 ) ⊗ X. 1 ,m 2 ),(1 ,2 )
(3.1.13b)
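Letting
$$ \tilde F^i = (\partial_{\theta^-} \partial_{\theta^+}) \otimes X^i, \quad \psi^i_+ = (\partial_{\theta^+}) \otimes X^i - \theta^- \tilde F^i, \quad \psi^i_- = (\partial_{\theta^-}) \otimes X^i + \theta^+ \tilde F^i, $$
$$ x^i = X^i - \theta^+ \psi^i_+ - \theta^- \psi^i_- - \theta^+ \theta^- \tilde F^i, \tag{3.1.14} $$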
we obtain another local coordinate system i i i x(m) , ψ±,(m) , F˜(m) ; u, v, θ + , θ − ; (m) ∈ Z2+ , () ∈ Z22
(3.1.15)
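such that
$$ \partial_{\theta^\pm} x^i_{(m)} = \partial_{\theta^\pm} \psi^i_{\pm,(m)} = \partial_{\theta^\pm} \tilde F^i_{(m)} = 0, \tag{3.1.16} $$
and (3.1.12) follows. Note that change of variables (3.1.14) is nothing but the formal Taylor series expansion at J ∞ (Mˆ )o :
$$ X^i = x^i + \theta^+ \psi^i_+ + \theta^- \psi^i_- + \theta^+ \theta^- \tilde F^i. \tag{3.1.17} $$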
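The $\theta^+$-independence of the new coordinates can be checked directly from (3.1.14). The following is a sketch under assumed sign conventions: $\rho(\partial_{\theta^+})$ is taken to be an odd derivation acting by $\rho(\partial_{\theta^+})(\xi \otimes f) = (\partial_{\theta^+} \xi) \otimes f$, with $\partial_{\theta^+}^2 = 0$, $\partial_{\theta^+} \partial_{\theta^-} = -\partial_{\theta^-} \partial_{\theta^+}$, and $X^i$ standing for $1 \otimes X^i$.

```latex
\begin{align*}
\rho(\partial_{\theta^+}) \tilde F^i
  &= (\partial_{\theta^+} \partial_{\theta^-} \partial_{\theta^+}) \otimes X^i = 0, \\
\rho(\partial_{\theta^+}) \psi^i_+
  &= \partial_{\theta^+}^2 \otimes X^i + \theta^- \rho(\partial_{\theta^+}) \tilde F^i = 0, \\
\rho(\partial_{\theta^+}) \psi^i_-
  &= (\partial_{\theta^+} \partial_{\theta^-}) \otimes X^i + \tilde F^i
     - \theta^+ \rho(\partial_{\theta^+}) \tilde F^i
   = -\tilde F^i + \tilde F^i = 0, \\
\rho(\partial_{\theta^+}) x^i
  &= (\partial_{\theta^+}) \otimes X^i - \psi^i_+ - \theta^- \tilde F^i = 0 .
\end{align*}
% The last equality is the definition of \psi^i_+ in (3.1.14).
```

The case of $\partial_{\theta^-}$ is analogous, with the sign in front of $\theta^+ \tilde F^i$ in $\psi^i_-$ playing the role that the sign in front of $\theta^- \tilde F^i$ plays above.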
$$ \text{Along } M,\ \{x^i\} \text{ are coordinates and } \psi^i_\pm \text{ transform as (even or odd) } dx^i. \tag{3.1.18} $$
3.1.3. Differential equations. The definition and discussion of a submanifold Sol ⊂ J ∞ (Mˆ ) as the zero locus of a differential ideal J is quite parallel to Sect. 1.4. Since our exposition is strongly focused on one particular example, that of the (2,2)-supersymmetric σ -model, we will restrict ourselves to the case where J is locally generated by 4n functions, E αi , 1 ≤ i ≤ n, 1 ≤ α ≤ 4, such that (cf. (1.4.1)) E 1i = F˜ i + · · · , i E 2i = ∂τ ψ− + ··· ,
E 3i = ∂τ ψ+i + · · · ,
(3.1.19)
E 4i = ∂τ2 x i + · · · , · in the where the omitted terms are independent of F˜ · , of non-zero order jets of ψ± · direction of τ , and of order > 1 jets of x also in the direction of τ . (τ is time-function (3.1.4) tacitly assumed to have been included in a coordinate system.) Letting
Sol o = {θ + = θ − = 0} → Sol,
(3.1.20)
one obtains a diffeomorphism of R0|2 -manifolds with connection ∼
(Sol, ρ(∂θ ± )) −→ (Sol o × R0|2 , ρ o (∂θ ± ) = ∂θ ± ),
(3.1.21)
by restricting (3.1.12). Note that Sol o is a D −, hence a D/ −, supermanifold; to emphasize the latter o . structure we will often write Sol If (3.1.19) holds, then (3.1.18) implies a diffeomorphism of D/ -manifolds ∼
o ∞ Sol (T (T M) ) , −→ J
where is the familiar parity change functor. Similarly,
∼ Sol −→ J ∞ T (T M)/ × R0|2 as D/ -manifolds. Both (3.1.22,23) are analogous to (1.4.2). ˆ
(3.1.22)
(3.1.23)
3.2. Functional pre-symplectic structure. The right framework for super-generalization ˆ [L,M1,DM]. of 1.5 is provided by integral, rather than differential, forms on 3.2.1. Recall that the sheaf of integral forms is defined to be I∗ˆ =
4 i=−∞
I iˆ s.t. I 4−i = i Tˆ ⊗Oˆ Ber( ˆ ), ˆ
(3.2.1)
where Ber( ˆ ) is the Berezinian of ˆ . By definition, I ∗ˆ is a locally free ∗ Tˆ -module defined by
Tˆ → EndOˆ (I∗ˆ ), ξ → ιξ , where ιξ β = ξ ∧ β. def
(3.2.2)
Next, I ∗ˆ carries a unique structure of a module over the Clifford algebra, Cl(Tˆ ⊕ ˆ ), such that j
i+ j
iˆ ⊗Oˆ I ˆ → I ˆ , α ⊗ β → α ∧ β, [ιξ , α∧] = α(ξ ).
(3.2.3)
The Berezinian, Ber( ˆ ), carries the Lie derivative operation Tˆ ⊗R Ber( ˆ ) → Ber( ˆ ), ξ ⊗ β → Lieξ β,
(3.2.4)
which is naturally extended to Tˆ ⊗R I iˆ → I iˆ , ξ ⊗ β → Lieξ β.
(3.2.5)
The sheaf of integral forms is a complex with differential d : I iˆ → I i+1 ˆ
(3.2.6a)
[d, ιξ ] = Lieξ , ξ ∈ Tˆ .
(3.2.6b)
determined by
Many other differential-geometric identities, such as [Lieξ , ιη ] = ι[ξ,η] ,
[Lieξ , β∧] = (Lieξ β)∧, ξ, η ∈ Tˆ , β ∈ ˆ ,
(3.2.7)
keep on holding true. ˆ carries a fixed (étale) coordinate system (u, v, θ + , θ − ), there Since our particular is an integral form [dθ + dθ − ] such that du ∧ dv ∧ [dθ + dθ − ] trivializes the Berezinian Ber( ˆ ). Letting [dθ ± ] = ι∂θ ∓ [dθ + dθ − ] = [dθ ± ], one discovers a part of I ∗ˆ pleasingly – and deceptively – similar to the de Rham complex; e.g., Lie∂θ ± [dθ + dθ − ] = 0, d([dθ + dθ − ]) = d([dθ ± ]) = 0, d(θ ± [dθ ∓ ] = [dθ + dθ − ]. (3.2.8)
Once a projection ˆ → is given, integration over fibers delivers a morphism I4ˆ → 2 , α → α o
(3.2.9)
which, in the case where the projection is (3.1.3), means that f (u, v, θ + , θ − )du ∧ dv ∧ [dθ + dθ − ] → ∂θ − ∂θ + f (u, v, θ + , θ − )du ∧ dv, (3.2.10) cf. (3.2.8). This is often referred to as integrating out θ + and θ − . 3.2.2. Back to super-presymplectic forms. Let M be either Sol or any version of an ˆ Let ∞-jet space considered in 3.1.2 that is fibered over . ˜ ∗,∗ = ∗ ⊗Oˆ I∗ˆ . M M/ˆ
(3.2.11)
If we wish to work in a relative situation determined by τ , see (3.1.4), then we write ∗ ˜ ∗,∗ = ∗ ⊗Oˆ I/ . ˆ M/ M/ˆ
(3.2.12)
In any case, we get a bi-complex with an obvious vertical differential ˜ ∗,i+1 ˜ ∗,i → δ: M/S M/S
(3.2.13)
˜ i,∗ → ˜ i+1,∗ , dρ/S : M/S M/S
(3.2.14)
and a horizontal differential
which owes its existence to connection (3.1.11) and is defined in exactly the same way as its counterpart in Sect. 3.1; here and elsewhere S is either or a point. ˜ ∗,∗ taken as a replacement of ∗,∗ , the discussion of Sect. 1.5.1–3 With M/ M/ carries over to the super-case practically word for word. For example, cf. (1.5.2), a 3,2 2,2 0 ˜ ˜ functional pre-symplectic form is ω ∈ H M, M/S /dρ M/S such that, ˜ 2,3 δω ∈ H 0 M, dρ/S . M/S
(3.2.15)
The outcome is the Lie superalgebra sheaf over M, ω HM /S .
(3.2.16)
Here is an operation that does not have an adequate purely even analogue. In all our examples, ∼
M −→ Mo × R0|2 ,
˜ 3,2 / in a way respecting the connection, cf. (3.1.12, 21, 23). Given ω ∈ H 0 M, M/S ˜ 2,2 , operation (3.2.9) produces ωo ∈ H 0 Mo , ˜ 2,2o . Integrat˜ 3,2o /dρ o dρ M/S M /S M /S ing over fibers one obtains a Lie algebra sheaf morphism, an isomorphism in fact, ∼
ω ω HM /S −→ HMo /S . o
(3.2.17)
As a practical matter, (3.2.17) amounts to carrying out a Taylor expansion as in (3.1.17) and then extracting the coefficient of θ + θ − [dθ + dθ − ] as in (3.2.10). Similar in spirit is a morphism ω ω → HM HM /
(3.2.18)
that relates the relative and absolute versions and amounts to letting dτ = 0, cf. (1.5.11). 3.2.3. Example: canonical commutation relations. Let M be an n-dimensional purely even C ∞ -manifold. The 2n|2n-dimensional supermanifold T ∗ (T M) carries a wellknown closed 2-form ωo . If we let {x i } be coordinates on M, then {x i , xi = ∂x i } along with their superpartners {φ i , φi } form a system of local coordinates on T ∗ (T M), and ωo = δxi ∧ δx i + δφi ∧ δφ i .
(3.2.19)
Now use the projection π : J ∞ (T ∗ (T M)/ ) → T ∗ (T M) to introduce π ∗ ωo , a closed 2-form on J ∞ (T ∗ (T M)/ ). A suitable analogue of Hcan , see 1.5.4, is provided by fixing a suitable σ so that σ , τ is a coordinate system on , see (3.1.4), letting ω = π ∗ ωo ∧ dσ,
(3.2.20)
and defining H˜ can = HωJ ∞ (T ∗ (T M)
/ )
.
(3.2.21)
The rest of the discussion in 1.5.4 carries over to the present situation practically word for word; we will not dwell upon this any longer. Note that in this example integral forms do not appear. The reader interested in an ω example of a full-fledged Lie superalgebra sheaf HM /S will have to wait for the discussion of the calculus of variations in Sect. 3.3. 3.2.4. Legendre transform? In practice, the manifold J ∞ (T (T M)/ ) may be more important than J ∞ (T ∗ (T M)/ ) because of (3.1.22). The possibility to apply H˜ can then rests on the existence of the diffeomorphism, cf. (1.5.21), g : J ∞ (T (T M)/ ) → J ∞ (T ∗ (T M)/ ),
(3.2.22)
because given (3.2.21) there arises at once a Lie algebra sheaf isomorphism, cf. Lemma 1.5.5.1, g # : Hg
∗ω
∼
−→ g −1 H˜ can .
(3.2.23)
Isomorphism (3.2.22), however, is a more subtle matter in the present situation than the usual Legendre transform. While the purely even manifolds underlying both the manifolds in (3.2.22) are the familiar J ∞ (T M/ ) and J ∞ (T ∗ M/ ), and they are easy to identify via a metric, the structure sheaves are more substantially different. The essence of this difference is that while M → OT (T M)
(3.2.24)
as a direct summand, its T ∗ (T M)-counterpart, T M , appears via the extension 0 → End M → A M → T M → 0,
(3.2.25)
where A M is the Atiyah algebra, i.e., the algebra of order 1 differential operators acting on the sections of M . One way to construct (3.2.22) seems to be this: split (3.2.25) by means of a connection ∇
←
0 → End M → A M → T M → 0,
(3.2.26)
and then identify ∼
M −→ T M
(3.2.27)
by means of a metric. This is exactly what the Lagrangian of a (1,1)-supersymmetric σ-model allows one to do, see Sect. 3.4.3–4.
3.3. Calculus of variations.
3.3.1. The discussion of Sect. 1.6 carries over in a straightforward manner. Here are a few highlights. An action is
4,0 ˜ 3,0 ˜ ∞ /dρ
. A ∈ J ∞ Mˆ , (3.3.1) J M J∞ M ˆ
It is represented by a collection of Lagrangians ˜ 4,0 L = L ( j) ∈ ∞ J
ˆ
Mˆ
(U j )
(3.3.2)
determined up to dρ -exact terms and equal to each other on intersections Ui ∩ U j up to dρ -exact terms, cf. (1.6.0–1ab). An analogue of (1.6.2) is immediate, the outcome is a Dˆ -supermanifold Sol L with variational 1-form γ L and 2-form ω L = δγ L , cf. (1.6.4). The definition of a symmetry of L is also an obvious modification of (1.6.5). Noether’s Theorem 1.6.3 establishes a bijection between symmetries and integrals of motion as follows: ˜
ξ ↔ αξ + (−1)ξ +1 ιξ γ L ; the change of sign occurs when swapping ιξ and dρ as in (1.6.8).
(3.3.3)
Thus there arise the Lie algebra sheaf HωSolL L , containing the algebra of integrals of motion I˜L , its relative version, HωSolL L / , and morphisms HωSolL L → HωSolL L / , I˜L → (Sol L , HωSolL L ) → (Sol L , HωSolL L / ),
(3.3.4)
whose composition is an injection provided (3.1.19) holds. A familiar novelty is that in all of this θ ± can be integrated out. The result is this: the action
o 2,0
o , A ∈ J ∞ Mˆ (3.3.5) , ∞
o /dρ o 1,0 ∞ J
Mˆ
J
Mˆ
the Lagrangian L
( j)
∈
o (U j ) 2,0 J ∞ Mˆ
,
(3.3.6)
and, since nothing is gained or lost, the integrated version of (3.3.4) as follows: ωo ωo ωo ωo H SolL o → H SolL o / , I˜L → Sol Lo , H SolL o → (Sol Lo , H SolL o / ), (3.3.7) L
L
L
L
where Sol Lo is defined in (3.1.20) In view of (3.1.22), this means J ∞ (T (T M)/ ) equipped with ω L and an embedding L I˜L → (M, HωJ ∞ (T (T M)
/ )
).
(3.3.8)
We will now exhibit an example where L HωJ ∞ (T (T M)
/ )
∼
−→ H˜ can .
(3.3.9)
3.4. An example: (1,1)-supersymmetric σ -model. 3.4.1. Let M be an n-dimensional purely even Riemannian manifold with metric (., .). Analogously to Sect. 1.6.9, we observe that a point in J 1 (Mˆ ) is a triple (tˆ, X, ∂ X ), a ˆ a point in M, and a map point in , ˆ → TX M, ξ → ∂ξ X, ∂ X : Ttˆ
(3.4.1)
cf. (3.1.13a,b). Hence for fixed vector fields ξ , η, (ξ X, ηX ) is a global section of O J 1 (M ˆ ) . (Of course, to be precise, we should have used B-points.) We will unburden the notation by letting ξ X stand for ∂ξ X . Here is a coordinate expression for this function gi j (X )ξ X i ηX j .
(3.4.2)
The (1,1)-supersymmetric σ -model Lagrangian is defined to be L = (D+ X, D− X )du ∧ dv ∧ [dθ + dθ − ],
(3.4.3)
where the vector fields $D_\pm$ are from (3.1.5), cf. (1.6.13). Integrating out $\theta^+$ and $\theta^-$ gives (an exercise in differential geometry, see e.g. [QFS, p. 666])
$$ L_{11} = \big( -(\partial_u x, \partial_v x) + (\nabla_{\partial_v x} \psi_+, \psi_+) + (\nabla_{\partial_u x} \psi_-, \psi_-) + (R(\psi_+, \psi_-)\psi_+, \psi_-) - (F, F) \big)\, du \wedge dv. \tag{3.4.4} $$
In this formula
$$ \partial_u x = \partial_u X|_{\theta^+ = \theta^- = 0}, \quad \partial_v x = \partial_v X|_{\theta^+ = \theta^- = 0}, \quad \psi_\pm = \partial_{\theta^\pm} X|_{\theta^+ = \theta^- = 0} \tag{3.4.5a} $$
and coincide with their namesakes from (3.1.14), $\nabla$ is the Levi-Civita connection associated to the metric $(., .)$, $R$ is the curvature tensor, and
$$ F = \nabla_{D_+ X} D_- X|_{\theta^+ = \theta^- = 0}, \tag{3.4.5b} $$
which is somewhat different from its counterpart $\tilde F$ of (3.1.14). In fact, with a little extra effort the entire Taylor series expansion of $L$ in $\theta^+, \theta^-$, cf. (3.1.17), can be computed to the effect that
$$ L = \big( (\psi_+, \psi_-) - \theta^+ ((\partial_u x, \psi_-) + (\psi_+, F)) + \theta^- ((\partial_v x, \psi_+) - (\psi_-, F)) + \theta^+ \theta^- L_{11} \big)\, du \wedge dv \wedge [d\theta^+ d\theta^-]. \tag{3.4.6} $$
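To see better what all of this means, let us write down the first three terms of (3.4.4) in local coordinates (3.1.14,17); the result is
$$ \begin{aligned} L_{11} = \big( & -g_{ij}(x)\, \partial_u x^i \partial_v x^j + \partial_v x^\alpha \psi^i_+ \psi^j_+ \Gamma^s_{\alpha i}\, g_{sj}(x) + \partial_u x^\alpha \psi^i_- \psi^j_- \Gamma^s_{\alpha i}\, g_{sj}(x) \\ & + g_{ij}(x)\, \partial_v \psi^i_+ \psi^j_+ + g_{ij}(x)\, \partial_u \psi^i_- \psi^j_- \cdots \big)\, du \wedge dv. \end{aligned} \tag{3.4.7} $$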
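Computation of $\delta L_{11}$, cf. (1.6.2), yields the Euler-Lagrange equations and a variational 1-form. The former are as follows, see also [QFS, p. 666]:
$$ \begin{aligned} F &= 0, \\ \nabla_{\partial_u x} \psi_- &= -R(\psi_+, \psi_-)\psi_+, \\ \nabla_{\partial_v x} \psi_+ &= -R(\psi_+, \psi_-)\psi_-, \\ \nabla_{\partial_u x} \partial_v x &= \frac{1}{2} \big( R(\psi_-, \psi_-)\partial_v x + R(\psi_+, \psi_+)\partial_u x \big) - (\nabla_{\psi_+} R)(\psi_-, \psi_-)\psi_+. \end{aligned} \tag{3.4.8} $$
The latter is
$$ \begin{aligned} \gamma^o_L = {} & \big( -g_{ij}(x)\, \partial_v x^j \delta x^i + \psi^i_- \psi^j_- \Gamma^s_{\alpha i}\, g_{sj}(x)\, \delta x^\alpha \big)\, dv \\ & + \big( g_{ij}(x)\, \partial_u x^i \delta x^j - \psi^i_+ \psi^j_+ \Gamma^s_{\alpha i}\, g_{sj}(x)\, \delta x^\alpha \big)\, du \\ & - g_{ij}(x)\, \psi^j_+ \delta\psi^i_+\, du + g_{ij}(x)\, \psi^j_- \delta\psi^i_-\, dv. \end{aligned} \tag{3.4.9} $$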
This, unlike more challenging (3.4.8), is a straightforward consequence of (3.4.7). Note that we have computed after projection (3.2.9), i.e., with θ ± integrated out, (3.2.10); nothing is gained or lost because γ L matters only modulo dρ -exact terms.
ˆ carries another pair of distin3.4.2. (1,1)-supersymmetry. In addition to D± , (3.1.5), guished vector fields, + =
∂ ∂ ∂ ∂ + θ + , − = − + θ − . + ∂θ ∂u ∂θ ∂v
(3.4.10)
They enjoy similar properties ∂ ∂ [+ , + ] = 2 , [− , − ] = 2 , [+ , − ] = 0, ∂u ∂v ∂ ∂ , ± = , ξ± = 0, ∂v ∂u
(3.4.11)
[• , D• ] = 0.
(3.4.12)
and
Relations (3.4.11) imply that def ˆ N 1+ = span{ f (u)+ } ⊂ Tˆ (),
(3.4.13)
def ˆ N 1− = span{ f (v)− } ⊂ Tˆ ()
are two commuting copies of the N=1- supersymmetric superalgebra Lie realized in ˆ note that each contains a copy, Vir ± , of the algebra of vector fields vector fields on ; on . In fact, both are subalgebras of the algebra of symmetries of L: N 1± → I˜L .
(3.4.14)
Indeed, using (3.4.6), one computes easily that Lieρ f (u)+ L 11 = dρ ( f (u)L 01 )dv, Lieρ f (v)− L 11 = dρ ( f (v)L 10 )du. (3.4.15) It is then rather straightforward, and pleasing, to use (3.3.3, 3.4.9) in order to compute the corresponding integrals of motion def
j
Q +f = Q ρ f (u)+ = 2 f (u)gi j (x)ψ+i ∂u x j du − gi j (x)F i ψ− dv, i Q −f = Q ρ f (v)− = −2 f (v)gi j (x)ψ− ∂v x j dv + gi j (x)F i ψ+ , def
j
(3.4.16)
which, upon imposing the Euler-Lagrange equation $F = 0$, become
$$ Q^+_f = 2 f(u)\, g_{ij}(x)\, \psi^i_+ \partial_u x^j\, du, \quad Q^-_f = -2 f(v)\, g_{ij}(x)\, \psi^i_- \partial_v x^j\, dv. \tag{3.4.17} $$
This furnishes the embeddings Vir ± → N 1± → (Sol Lo , HωSolL o ),
(3.4.18a)
L
and the definitions (cf. 1.6.11.1) of right/left moving subalgebras ωo ,±
H SolL o = {F ∈ HωSolL 0 : [F, Vir ∓ ] = 0}. L
(3.4.18b)
L
Next, we will see that all of this unfolds within the canonical Lie algebra sheaf H˜ can of Sect. 3.2.3.
Proposition 3.4.3. There is a diffeomorphism ∼
gˆ : Sol Lo −→ J ∞ (T ∗ (T M)/ ) of D/ -manifolds, which delivers the Lie algebra sheaf isomorphism ∼ g # : HωSolL ,± ˆ −1 H˜ can , o / −→ g L
cf. Lemmas 1.5.5.1 and 1.6.8.1.
3.4.4. Proof. Super-Legendre transform. In order to proceed, we need to make sure that $Sol_L$ as defined by (3.4.8) satisfies the Cauchy-Kovalevskaya condition (3.1.19). Apparently neither $u$ nor $v$ can play the role of time, but the following change of variables:
$$ u = \sigma + \tau, \quad v = \sigma - \tau \tag{3.4.19} $$
so that
$$ \partial_u = \frac{1}{2}(\partial_\sigma + \partial_\tau), \quad \partial_v = \frac{1}{2}(\partial_\sigma - \partial_\tau) \tag{3.4.20} $$
does the job. Therefore, cf. (3.1.22), ∼
Sol Lo −→ J ∞ (T (T M)/ ),
(3.4.21)
and our task is to find ∼
gˆ : J ∞ (T (T M)/ ) −→ J ∞ (T ∗ (T M)/ )
(3.4.22)
that identifies, modulo dρ/ -exact terms, ω L on the L.H.S. with the pull-back of the canonical ωo , (3.2.19), on the R.H.S. (g) ˆ ∗ ω|dτ =0 = ω L |dτ =0 + dρ/ (· · · ).
(3.4.23)
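The variational 1-form $\gamma^o_L$, computed in (3.4.9), is not well suited for this purpose. In addition to (3.4.19), let us introduce variables
$$ \rho^j = \frac{1}{2}(\psi^j_- + \psi^j_+), \quad \phi^j = \frac{1}{2}(\psi^j_- - \psi^j_+). \tag{3.4.24} $$
(Since $\psi^j_\pm$ are sections of 2 copies of the bundle of 1-forms, see (3.4.5a) and (3.1.17,18), this change of variables makes sense globally.) Plugging these variables in Lagrangian (3.4.4) gives
$$ L_{11} = \left( \frac{1}{2}(\partial_\tau x, \partial_\tau x) + 2(\nabla_{\partial_\tau x} \rho, \phi) + 2(\nabla_{\partial_\tau x} \phi, \rho) + \cdots \right) d\tau \wedge d\sigma, $$
where $\cdots$ stand for the terms not containing $\partial_\tau$. Since
$$ (\nabla_{\partial_\tau x} \phi, \rho) = -(\phi, \nabla_{\partial_\tau x} \rho) + \partial_\tau(\phi, \rho) = (\nabla_{\partial_\tau x} \rho, \phi) + \partial_\tau(\phi, \rho), \tag{3.4.25} $$
we obtain
$$ \tilde L_{11} = \left( \frac{1}{2}(\partial_\tau x, \partial_\tau x) + 4(\nabla_{\partial_\tau x} \rho, \phi) + \cdots \right) d\tau \wedge d\sigma = L_{11} \bmod d_\rho(\ldots). \tag{3.4.26} $$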
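For orientation, here is a sketch of how the $\partial_\tau$-dependent part of (3.4.4) turns into the expression in the variables $\rho$, $\phi$ under (3.4.19) and (3.4.24). It assumes only the chain rule, bilinearity and symmetry of $(., .)$ on even vectors, and the inverse relations $\psi_- = \rho + \phi$, $\psi_+ = \rho - \phi$; terms without $\partial_\tau$ are absorbed into the dots.

```latex
\begin{align*}
du \wedge dv &= (d\sigma + d\tau) \wedge (d\sigma - d\tau) = 2\, d\tau \wedge d\sigma, \\
-(\partial_u x, \partial_v x)
  &= -\tfrac{1}{4} \big( (\partial_\sigma x, \partial_\sigma x) - (\partial_\tau x, \partial_\tau x) \big)
   = \tfrac{1}{4} (\partial_\tau x, \partial_\tau x) + \cdots, \\
(\nabla_{\partial_v x} \psi_+, \psi_+) + (\nabla_{\partial_u x} \psi_-, \psi_-)
  &= \tfrac{1}{2} \big( (\nabla_{\partial_\tau x} \psi_-, \psi_-)
     - (\nabla_{\partial_\tau x} \psi_+, \psi_+) \big) + \cdots \\
  &= (\nabla_{\partial_\tau x} \rho, \phi) + (\nabla_{\partial_\tau x} \phi, \rho) + \cdots .
\end{align*}
```

Multiplying by the factor 2 coming from $du \wedge dv$ reproduces $\frac{1}{2}(\partial_\tau x, \partial_\tau x) + 2(\nabla_{\partial_\tau x} \rho, \phi) + 2(\nabla_{\partial_\tau x} \phi, \rho)$, and the identity for $(\nabla_{\partial_\tau x} \phi, \rho)$ then merges the last two terms into $4(\nabla_{\partial_\tau x} \rho, \phi)$ up to a total $\partial_\tau$-derivative.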
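It is immediate to derive from (3.4.26) that the corresponding variational 1-form
$$ \gamma_L = \big( g_{ij}\, \partial_\tau x^j + 4\, \Gamma^s_{i\alpha}\, \rho^\alpha \phi^j g_{sj}(x) \big)\, \delta x^i \wedge d\sigma + 4\, g_{ij}(x)\, \delta\rho^i \phi^j \wedge d\sigma, \tag{3.4.27} $$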
equals γ L |dτ =0 modulo dρ -exact terms. If we let ρi =
1 gi j (x)φ j , 4
(3.4.28)
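then
$$ \gamma_L = \big( g_{ij}\, \partial_\tau x^j + \Gamma^s_{i\alpha}\, \rho^\alpha \rho_s \big)\, \delta x^i \wedge d\sigma + \delta\rho^i \rho_i \wedge d\sigma. \tag{3.4.29} $$
The substitution
$$ x_i \to g_{ij}\, \partial_\tau x^j + \Gamma^s_{i\alpha}\, \rho^\alpha \rho_s, \quad \phi^i \to \rho^i, \quad \phi_i \to \rho_i \tag{3.4.30} $$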
makes sense as a globally defined map dT M L˜ 11 : T (T M) → T ∗ (T M).
(3.4.31)
It is a super-analogue of the Legendre transform, (1.6.13–14), which was envisaged in Sect. 3.2.4; indeed, if xi = ∂x i , then the first of assignments (3.4.30) is exactly splitting (3.2.26). The D/ -manifold property allows to extend this map unambiguously to the jet-spaces, and it is clear that such map identifies δγ L with ωo from (3.2.19). 3.4.5. Therefore, H˜ can is to the (1,1)-supersymmetric σ -model what Hcan is to the ordinary σ -model. In particular, writing integrals of motion (3.4.16) in terms of the new variables introduced in Sect. 3.4.4 provides a free field realization of N 1± . The result, which we will discuss in the context of the Kähler geometry, see the next section, is presumably the quasiclassical limit of the formulas obtained in [B-ZHS]. 3.4.6. The Kähler case: (2,2)-supersymmetry and the Witten Lie algebra. It is an exciting discovery going back to [Zu,A-GF] that in the Kähler case the supersymmetry algebra becomes twice as large. 3.4.6.1. Let then M be a complex manifold and (., .) a Kähler metric on it. To handle this case, we will change the notation somewhat: the natural vector bundles, such as T M, will be assumed to be complexified, and decompositions, such as T M = T 10 M ⊕T 01 M, will arise. What has been treated as a vector field, e.g. ∂τ x, ∂τ ψ+ , will become a section of T 10 M, and ∂τ x, ¯ ∂τ ψ¯ + will stand for the complex conjugate sections. We will also let, sloppily but customarily, ¯
j
j¯
∂τ x j = ∂τ x j , ψ± = ψ± .
(3.4.32)
The defining property of the Kähler metric ∇(T 10 ) ⊂ T 10 , ∇(T 01 ) ⊂ T 01 is crucial for what follows.
(3.4.33)
Computing as in 3.4.4 (and using (3.4.33)) one obtains L 11 = (−(∂u x, ∂u x) ¯ − (∂v x, ∂u x) ¯ + (∇∂v ψ+ , ψ¯ + ) +(∇∂u ψ− , ψ¯ − ) + · · · )du ∧ dv mod dρ , (3.4.34) where the terms not containing ∂u , ∂v are omitted. Property (3.4.33) implies (and (3.4.34) supports) that w.r.t. the grading on O J ∞ (Mˆ )o defined by ψ± → 1, ψ¯ ± → −1,
(3.4.35)
L 11 is homogeneous of degree 0. Therefore, any homogeneous component of a symmetry of L 11 is also a symmetry. Integrals of motion (3.4.16) afford decomposition +− Q +f = Q ++ f + Qf ,
−− Q −f = Q −+ f + Qf
(3.4.36)
into the sum of degree ±1 components, which implies that the entire quadruple +− −+ −− ˜ {Q ++ f , Q f , Q f , Q f } ⊂ IL ,
(3.4.37)
and this extends (3.4.18a) to an embedding of a pair of N=2-superconformal Lie algebras Vir ± → N 1± → N 2± → I˜ L → H˜ can .
(3.4.38)
In particular, (and this follows from the consideration of the degree) −− ++ −− [Q ++ f , Q g ] = [Q f , Q g ] = 0.
(3.4.39)
Witten has used these relations [W2,W3] to define what in the present context becomes Witten Lie algebra sheaves: {X ∈ H˜ can : [Q 1∓,∓ , X ] = 0} . {[Q ∓,∓ , X ] all X ∈ H˜ can }
W± =
def
(3.4.40)
1
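(There are, of course, two more versions of these sheaves.)
3.4.6.2. Some formulas. For the purpose of writing embeddings such as (3.4.38) explicitly, rewrite (3.4.34) using $\sigma$ and $\tau$ which were defined in (3.4.19),
\[ L_{11} = \big(g_{i\bar j}\partial_\tau x^i \partial_\tau x^{\bar j} - 2\partial_\tau x^i\, \Gamma^s_{ij} g_{s\bar t}\psi_+^j\psi_+^{\bar t} + 2\partial_\tau x^{\bar i}\, \Gamma^{\bar s}_{\bar i\bar j} g_{\bar s t}\psi_-^{\bar j}\psi_-^{t}\big)\, d\tau \wedge d\sigma + \big(2 g_{i\bar j}\psi_+^{\bar j}\partial_\tau\psi_+^i - 2 g_{\bar i j}\psi_-^{j}\partial_\tau\psi_-^{\bar i}\big)\, d\tau \wedge d\sigma + \cdots, \tag{3.4.41} \]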
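where the terms not containing $\partial_\tau$ are omitted. It follows that, cf. (3.4.27),
\[ \gamma_L = \big(g_{i\bar j}\partial_\tau x^{\bar j} - 2\Gamma^s_{ij} g_{s\bar t}\psi_+^j\psi_+^{\bar t}\big)\delta x^i \wedge d\sigma + \big(g_{i\bar j}\partial_\tau x^{i} + 2\Gamma^{\bar s}_{\bar j\bar i} g_{\bar s t}\psi_-^{\bar i}\psi_-^{t}\big)\delta x^{\bar j} \wedge d\sigma + \big(-2 g_{i\bar j}\psi_+^{\bar j}\delta\psi_+^i + 2 g_{\bar i j}\psi_-^{j}\delta\psi_-^{\bar i}\big) \wedge d\sigma. \tag{3.4.42} \]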
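If we let
\[ \psi = \psi_+, \qquad \bar\psi = \bar\psi_-, \tag{3.4.43a} \]
and
\[ \psi_i = -2 g_{i\bar j}\psi_+^{\bar j}, \qquad \psi_{\bar i} = 2 g_{\bar i j}\psi_-^{j}, \tag{3.4.43b} \]
then
\[ \gamma_L = \big(g_{i\bar j}\partial_\tau x^{\bar j} + \Gamma^s_{ij}\psi^j\psi_s\big)\delta x^i \wedge d\sigma + \big(g_{i\bar j}\partial_\tau x^{i} + \Gamma^{\bar s}_{\bar j\bar i}\psi^{\bar i}\psi_{\bar s}\big)\delta x^{\bar j} \wedge d\sigma + \big(\psi_i\delta\psi^i + \psi_{\bar i}\delta\psi^{\bar i}\big) \wedge d\sigma. \tag{3.4.44} \]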
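Therefore, the coordinate form of the super-Legendre transform (3.4.31) is
\[ x_i \mapsto g_{i\bar j}\partial_\tau x^{\bar j} + \Gamma^s_{ij}\psi^j\psi_s, \quad x^\bullet \mapsto x^\bullet, \quad x_{\bar j} \mapsto g_{i\bar j}\partial_\tau x^{i} + \Gamma^{\bar s}_{\bar j\bar i}\psi^{\bar i}\psi_{\bar s}, \quad \phi^\bullet \mapsto \psi^\bullet, \quad \phi_\bullet \mapsto \psi_\bullet. \tag{3.4.45} \]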
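Plugging these in (3.4.17) and extracting homogeneous components as in (3.4.36) one obtains, upon letting $d\tau = 0$,
\[ Q^{--}_f = f(\sigma - \tau)\big(-x_{\bar j}\phi^{\bar j} + g_{i\bar j}\partial_\sigma x^i\phi^{\bar j}\big)\, d\sigma, \qquad Q^{-+}_f = 2f(\sigma - \tau)\big(\partial_\sigma x^{\bar j}\phi_{\bar j} - g^{\bar j i} x_i\phi_{\bar j} + g^{\bar j i}\Gamma^s_{i\alpha}\phi^\alpha\phi_s\phi_{\bar j}\big)\, d\sigma, \tag{3.4.46$^-$} \]
\[ Q^{++}_f = f(\sigma + \tau)\big(x_j\phi^j + g_{j\bar i}\partial_\sigma x^{\bar i}\phi^j\big)\, d\sigma, \qquad Q^{+-}_f = -2f(\sigma + \tau)\big(\partial_\sigma x^{i}\phi_i + g^{i\bar j} x_{\bar j}\phi_i - g^{i\bar j}\Gamma^{\bar s}_{\bar j\bar\alpha}\phi^{\bar\alpha}\phi_{\bar s}\phi_i\big)\, d\sigma. \tag{3.4.46$^+$} \]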
One may wish at this point to use these formulas to compute Witten’s Lie algebra sheaf (3.4.40). Two things transpire immediately: first, the role played by $f$ in all of this is rather superficial and, second, if one removes from the first of (3.4.46$^-$) the annoying $g_{i\bar j}\partial_\sigma x^i\phi^{\bar j}$ (and $g_{j\bar i}\partial_\sigma x^{\bar i}\phi^{j}$ from the first of (3.4.46$^+$) resp.), then it becomes exactly the $\bar\partial$- ($\partial$- resp.) differential; and so, perhaps, $W^\pm$ should be of completely holomorphic (antiholomorphic resp.) nature. This is all true, but the language suited to analysis of such issues is that of vertex Poisson algebras.
3.5. Vertex Poisson algebra interpretation. Witten’s models. The sheaf $\tilde H_{can}$ is the tip of an iceberg. It is, just as its purely even counterpart $H_{can}$ was, Sect. 1.5.4, a Lie algebra sheaf attached to a certain sheaf of vertex Poisson superalgebras.
3.5.1. The notion of a super-SVDO is quite analogous to the one we discussed in Sect. 2. It is a Z+ -graded vertex Poisson superalgebra V = V0 ⊕ V1 ⊕ · · · such that V0 = C ∞ (T U ), U ⊂ Rn ,
(3.5.1)
V1 = TU / (U ) + U / (U ) + TU / (U ) + U / (U ) .
(3.5.2)
and, non-canonically,
Classification of such algebras [GMS3], under some obvious non-degeneracy assumptions, is obtained in a way similar to Sect. 2.4.3, 2.5. They form an $\Omega^{3,cl}(U)$-torsor, i.e., given a super-SVDO $V$ and a closed 3-form $H \in \Omega^{3,cl}(U)$, an operation
\[ (V, H) \mapsto V \dot{+} H \tag{3.5.3} \]
is defined, where V + H is a super-SVDO different from V only in that the operation (0)
: TU / ⊗ TU / → V1
is replaced with (0) H
=(0) +H,
(3.5.4)
cf. (2.4.21). (This involves only even components of $V_1$.) One has, cf. (2.4.24),
\[ \mathrm{Mor}(V, V \dot{+} H) = \{\alpha \in \Omega^{2}(U) \ \text{s.t.}\ d\alpha = H\}. \tag{3.5.5} \]
In particular, cf. (2.4.25),
\[ \Omega^{2,cl}(U) \xrightarrow{\sim} \mathrm{Aut}(V), \tag{3.5.6a} \]
where the automorphism corresponding to α is the one determined by the shear, cf. (2.4.19), TU / (U ) ξ → ξ + ιξ α.
(3.5.6b)
All of this can be defined over manifolds. There is a distinguished such sheaf poiss of super-SVDOs, the vertex Poisson de Rham complex [MVS], M . As an O M module, poiss
M
= π∗ O J ∞ (T ∗ (T M)/ ) ,
(3.5.7)
where π is the projection J ∞ (T ∗ (T M)/ ) → M . The operations are determined by the requirement that they all be of classical origin – as in Proposition 2.7.1. Here are some examples written down in local coordinates: j
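\[ (x_i)_{(0)} f(x) = \partial_{x^i} f(x), \quad (x_{\bar i})_{(0)} f(x) = \partial_{x^{\bar i}} f(x), \quad (\phi_i)_{(0)}\phi^j = \delta_i^j, \quad (\phi_{\bar i})_{(0)}\phi^{\bar j} = \delta_{\bar i}^{\bar j}, \quad \xi_{(0)}\eta = [\xi, \eta], \quad \xi_{(0)}\alpha = \mathrm{Lie}_\xi\alpha, \tag{3.5.8a} \]
where $\xi = f^i(x)x_i + f^{\bar i}(x)x_{\bar i}$, $\eta = g^i(x)x_i + g^{\bar i}(x)x_{\bar i}$, $\alpha = h_i(x)\rho(\partial_\sigma)x^i + h_{\bar i}(x)\rho(\partial_\sigma)x^{\bar i}$, the vertex algebra derivation being
\[ T = \rho(\partial_\sigma). \tag{3.5.8b} \]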
(The twist that takes care of functions explicitly depending on $\sigma$ and can be imposed as in the even case, see Definition 2.7.2, has been tacitly assumed throughout.) One has, analogously to Proposition 2.6.1,
Proposition 3.5.1.1. a) The set of isomorphism classes of sheaves of super-SVDOs on $M$ is identified with $H^3(M, \mathbb{R})$. b) If $\mathcal{V}$ is a sheaf of super-SVDOs, then
\[ \mathrm{Aut}\,\mathcal{V} \xrightarrow{\sim} H^0(M, \Omega^{2,cl}_M). \tag{3.5.9} \]
Let $\mathrm{Lie}(\mathcal{V}) = \mathcal{V}/T(\mathcal{V})$. Operation $_{(0)}$ makes $\mathrm{Lie}\,\mathcal{V}$ into a sheaf of Lie superalgebras. One has, cf. Proposition 2.7.3,
Proposition 3.5.2. The algebra sheaves $\tilde H_{can}$ and $\mathrm{Lie}(\Omega^{poiss}_M)$ are isomorphic.
3.5.3. Some of the constructions above are simplified when performed in the framework of vertex Poisson superalgebras because some of the Lie algebras considered are the value of the Lie-functor. For example, there are N=1,2 supersymmetric vertex Poisson algebras [K], N 1 and N 2, such that the N=1,2 supersymmetric Lie superalgebras, which appeared in (3.4.13), are N 1 = Lie(C ∞ () ⊗ N 1), N 2 = Lie(C ∞ () ⊗ N 2).
(3.5.10)
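The elements, see (3.4.46),
\[ Q^{--} = -x_{\bar j}\phi^{\bar j} + g_{i\bar j}\partial_\sigma x^i\phi^{\bar j}, \qquad Q^{-+} = 2\big(\partial_\sigma x^{\bar j}\phi_{\bar j} - g^{\bar j i} x_i\phi_{\bar j} + g^{\bar j i}\Gamma^s_{i\alpha}\phi^\alpha\phi_s\phi_{\bar j}\big), \tag{3.5.11$^-$} \]
\[ Q^{++} = x_j\phi^j + g_{j\bar i}\partial_\sigma x^{\bar i}\phi^j, \qquad Q^{+-} = -2\big(\partial_\sigma x^{i}\phi_i + g^{i\bar j} x_{\bar j}\phi_i - g^{i\bar j}\Gamma^{\bar s}_{\bar j\bar\alpha}\phi^{\bar\alpha}\phi_{\bar s}\phi_i\big), \tag{3.5.11$^+$} \]
define global sections of $\Omega^{poiss}_M$. By definition, the following analogue of (2.8.12) and (2.9.25) holds true.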
Lemma 3.5.3.1. The two pairs of global sections $(Q^{--}, Q^{-+})$ and $(Q^{++}, Q^{+-})$ generate, inside $H^0(M, \Omega^{poiss}_M)$, two pairwise Poisson-commuting copies of the vertex Poisson N=2 superalgebra:
\[ N2^+ \to H^0(M, \Omega^{poiss}_M) \leftarrow N2^-, \qquad (N2^+)_{(n)}(N2^-) = 0 \ \text{if } n \ge 0. \tag{3.5.12} \]
A streamlined version of Witten’s Lie algebra sheaf (3.4.40) is Witten’s vertex Poisson algebra sheaf defined as follows. Relations (3.4.39) in the vertex algebra context imply that each element of the quadruple $\{Q^{\bullet,\bullet}_{(0)},\ \bullet = \pm\}$ and various linear combinations thereof are differentials of the sheaf $\Omega^{poiss}_M$. Letting $Q_{(0)}$ be one such differential, we obtain a cohomology sheaf
\[ H_Q(\Omega^{poiss}_M) \overset{\mathrm{def}}{=} \frac{\mathrm{Ker}\, Q_{(0)}}{\mathrm{Im}\, Q_{(0)}}. \tag{3.5.13} \]
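As a purely finite-dimensional toy illustration of a quotient of the form Ker Q / Im Q, and not of the sheaf-level construction itself, the following sketch computes the dimension of the cohomology of a square matrix that squares to zero. The helper names `matmul`, `rank`, `cohomology_dim` and the matrix `Q` are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(matrix):
    """Rank over the rationals, computed by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0  # number of pivots found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found


def cohomology_dim(q):
    """Return dim(Ker q / Im q) for a square matrix q with q*q = 0."""
    if any(x != 0 for row in matmul(q, q) for x in row):
        raise ValueError("q is not a differential: q*q != 0")
    r = rank(q)
    return (len(q) - r) - r  # dim Ker q minus dim Im q


# Toy differential on span(e1, e2, e3, e4), acting on column vectors:
# e1 -> e2, every other basis vector -> 0.
Q = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(cohomology_dim(Q))  # the pair (e1, e2) cancels; e3 and e4 survive
```

In this toy model the pair (e1, e2) drops out of the cohomology because e1 is not a cocycle and e2 is a coboundary.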
It is a vertex Poisson algebra sheaf; this is a well-known fact and an immediate consequence of $_{(0)}$ being a derivation of all $_{(n)}$-products (super-analogue of Jacobi identity, Sect. 2.1, II.2). Of sheaves (3.5.13) the following 3 will be of interest to us:
Definition 3.5.3.2 (cf. [W2]).
\[ A\text{-model sheaf}: \quad W_A = H_{Q^{--}+Q^{++}}(\Omega^{poiss}_M), \tag{3.5.14a} \]
\[ B\text{-model sheaf}: \quad W_B = H_{Q^{--}+Q^{+-}}(\Omega^{poiss}_M), \tag{3.5.14b} \]
\[ \text{half-twisted model sheaf}: \quad W_{1/2} = H_{Q^{--}}(\Omega^{poiss}_M). \tag{3.5.14c} \]
The relation of (3.5.13–14) to (3.4.40) is that
\[ W^- = \mathrm{Lie}(W_{1/2}), \tag{3.5.15} \]
to give but one example. The cohomology, $H^*(M, \mathcal{V})$, of a sheaf of vertex Poisson algebras $\mathcal{V}$ is a vertex Poisson algebra, of course. We are led then, following [W2], to
Definition 3.5.3.3.
\[ A\text{-model vertex Poisson algebra}: \quad H^*(M, W_A), \tag{3.5.16a} \]
\[ B\text{-model vertex Poisson algebra}: \quad H^*(M, W_B), \tag{3.5.16b} \]
\[ \text{half-twisted model vertex Poisson algebra}: \quad H^*(M, W_{1/2}). \tag{3.5.16c} \]
Theorem 3.5.4. Let $M$ be Kähler. Then
1) the following isomorphisms are valid:
\[ H^*(M, W_A) \xrightarrow{\sim} H^*(M, \mathbb{C}), \tag{3.5.17a} \]
\[ H^*(M, W_B) \xrightarrow{\sim} H^*(M, \Lambda^* T_M), \tag{3.5.17b} \]
\[ H^*(M, W_{1/2}) \xrightarrow{\sim} H^*(M, \Omega^{poiss,an}_M), \tag{3.5.17c} \]
where $\Omega^{poiss,an}_M$ is a purely holomorphic version of the sheaf $\Omega^{poiss}_M$ [MSV];
2) embedding $N2^+ \to \Omega^{poiss}_M$, (3.5.12), descends to an embedding $N2^+ \to \Omega^{poiss,an}_M$ whose image coincides with N=2 superconformal structure introduced on $\Omega^{poiss,an}_M$.
Remark 3.5.4.1. 1) Of these, the first two are finite dimensional supercommutative algebras and as such are trivial examples of a vertex Poisson algebra with zero derivation $T$ as noted in Sect. 2.3. Contrary to this, the last one is a full-fledged infinite dimensional vertex Poisson algebra. Being infinite dimensional, it is characterized by its character (q-dimension), which is closely related to the elliptic genus of $M$. The algebra can be quantized, and the character of the quantum version has provided some insights into the elliptic genus [BL,MS,GM1,GM2]. 2) This theorem, especially (3.5.17c), is a refined version of [Kap]. In fact, Kapustin deals with the quantum version of this result; we will discuss quantization in the next section.
3.5.4.2. Sketch of proof. Apply to $\Omega^{poiss}_M$ automorphism (3.5.6a-b) determined by the Kähler 2-form $g_{i\bar j}\, dx^i \wedge dx^{\bar j}$. As a result, $(Q^{--})_{(0)}$ will be replaced with a vertex analogue of the $\bar\partial$-differential:
\[ \bar\partial_{vert} = (x_{\bar j}\phi^{\bar j})_{(0)}. \tag{3.5.18} \]
Essentially by definition,
\[ (\Omega^{poiss,an}_M, 0) \to (\Omega^{poiss}_M, \bar\partial_{vert}) \]
is a quasiisomorphism [MSV]. Indeed, a glance at (3.5.8a) convinces one that $x^{\bar j}$, $\phi_{\bar j}$ are not $\bar\partial_{vert}$-cocycles, and $x_{\bar j}$, $\phi^{\bar j}$ are $\bar\partial_{vert}$-cohomologous to 0. Therefore $\bar\partial_{vert}$ effectively kills all antiholomorphic variables, leaving holomorphic ones intact. This defines a purely holomorphic analogue of $\Omega^{poiss}_M$, that is, $\Omega^{poiss,an}_M$. Hence a quasiisomorphism
\[ (\Omega^{poiss,an}_M, 0) \to (\Omega^{poiss}_M, (Q^{--})_{(0)}), \]
which proves (3.5.17c). In (3.5.17a-b) one more differential is turned on. Definition (3.5.11) implies that upon the same shear by the Kähler form,
\[ Q^{++} = x_j\phi^j. \tag{3.5.19} \]
Therefore, $(Q^{--})_{(0)} + (Q^{++})_{(0)}$ is a vertex analogue of total de Rham differential, and (3.5.17a) becomes essentially [MSV], Theorem 2.4. Similarly, in the $(Q^{--})_{(0)}$-cohomology,
\[ Q^{+-} = -4\partial_\sigma x^i\phi_i, \tag{3.5.20} \]
and a simple analysis along the lines of [MSV], Sect. 2.3–2.4, shows that
\[ H_{\partial_\sigma x^j\phi_j}(\Omega^{poiss,an}_M) \xrightarrow{\sim} H^*(M, \Lambda^* T_M), \]
as desired. Item 2) is a result of checking (3.5.19, 20) against [MSV], (2.3b). Next, we establish concrete complexes which compute vertex Poisson algebras of the A-, B-, and half-twisted models.
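Corollary 3.5.5.
\[ H^*(M, W_A) \xrightarrow{\sim} H_{Q^{--}+Q^{++}}(\Gamma(M, \Omega^{poiss}_M)), \tag{3.5.21a} \]
\[ H^*(M, W_B) \xrightarrow{\sim} H_{Q^{--}+Q^{+-}}(\Gamma(M, \Omega^{poiss}_M)), \tag{3.5.21b} \]
\[ H^*(M, W_{1/2}) \xrightarrow{\sim} H_{Q^{--}}(\Gamma(M, \Omega^{poiss}_M)). \tag{3.5.21c} \]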
Proof. The sheaf $\Omega^{poiss}_M$ is a complex w.r.t. the 3 differentials used above. Hence there arise 3 different hypercohomology groups, $H_A(\Omega^{poiss}_M)$, $H_B(\Omega^{poiss}_M)$, $H_{1/2}(\Omega^{poiss}_M)$. Each can be computed by any of the two spectral sequences. The computation using one of them is the content of Theorem 3.5.4. It says that the result is the vertex Poisson algebra of A-, B-, and half-twisted models resp. The computation using another will then prove the corollary, because the sheaf $\Omega^{poiss}_M$ being flabby, $H^j(M, \Omega^{poiss}_M) = 0$ if $j > 0$.
Remark. In view of Theorem 3.5.4, isomorphisms (3.5.21a,b) are vertex Poisson algebra versions of the de Rham complex and $\bar\partial$-resolution of the algebra of polyvector fields resp., while (3.5.21c) is the $\bar\partial$-resolution of the vertex Poisson de Rham complex.
3.5.6. H-flux. Let us now give, along the lines of Sect. 2.8, a Lagrangian interpretation of twisted sheaves of super-SVDOs which arise via (3.5.3) and are parametrized by $H^3(M, \mathbb{R})$, see Proposition 3.5.1.1. Fix $H \in \Gamma(M, \Omega^{3,cl}_M)$, a closed 3-form; a cover $\{U_i\}$ of $M$; and a collection of 2-forms $\beta^{(i)} \in \Gamma(U_i, \Omega^{2}_M)$ s.t. $d\beta^{(i)} = H|_{U_i}$. Having noticed that $\beta^{(j)}(D_+X, D_-X)$ is naturally a section of the structure sheaf of the jet space over $U_j$, introduce the $H$-twist of Lagrangian (3.4.3) as follows, cf. (2.8.5):
\[ L_H = \{L + \beta^{(j)}(D_+X, D_-X)\, du \wedge dv \wedge [d\theta^+ d\theta^-]\}. \tag{3.5.22} \]
The argument parallel to that leading to (2.8.5) proves the following. Lemma 3.5.6.1. ωo
H H˜ SolL o
LH
poiss .
∼
−→ Lie( M
+ H ),
poiss .
where M
+ H is defined as in (3.5.3).
Therefore, all the constructions originating in [GHR] and further explored in papers such as [BLPZ,KL] translate into different vertex Poisson subalgebras of $\Omega^{poiss}_M \dot{+} H$, which depend on a choice of a generalized Kähler structure.
3.6. Quantization. B-model moduli. This section is an announcement. It will be assumed throughout that the automorphism by the Kähler form has been performed so that
\[ Q^{--} = \phi^{\bar j}x_{\bar j}, \qquad Q^{++} = \phi^{i}x_{i}, \tag{3.6.0} \]
cf. Sect. 3.5.4.2.
3.6.1. The differential graded sheaves of vertex Poisson algebras, $(\Gamma(M, \Omega^{poiss}_M), Q_{(0)})$, where $Q$ is any of the differentials appearing in (3.5.21a,b,c), can be quantized. What we mean by this is that, first, there is a sheaf of vertex algebras $\Omega^{vert}_M$ [MSV] whose quasiclassical limit is $\Omega^{poiss}_M$ and, second, this sheaf carries quantum analogues of each of the 3 differentials. In fact, quantum versions of $(Q^{--})_{(0)}$ and $(Q^{++})_{(0)}$ are in [MSV], and $(Q^{+-})_{(0)}$ has been recently proposed in [B-ZHS]; in what follows the use of the latter is easy to avoid. Thus there arise 3 vertex algebra versions of A-, B-, and half-twisted models resp.:
\[ H^*(M, W_A^{quant}) \xrightarrow{\sim} H_{Q^{--}+Q^{++}}(\Gamma(M, \Omega^{vert}_M)), \tag{3.6.1a} \]
\[ H^*(M, W_B^{quant}) \xrightarrow{\sim} H_{Q^{--}+Q^{+-}}(\Gamma(M, \Omega^{vert}_M)), \tag{3.6.1b} \]
\[ H^*(M, W_{1/2}^{quant}) \xrightarrow{\sim} H_{Q^{--}}(\Gamma(M, \Omega^{vert}_M)). \tag{3.6.1c} \]
The first two, $H^*(M, W_A^{quant})$ and $H^*(M, W_B^{quant})$, coincide with their quasiclassical limits (3.5.21a,b). The 3rd is quite different from its quasiclassical limit and equals the cohomology of the chiral de Rham complex, $H^*(M, \Omega^{ch}_M)$ [MSV]. Relation of this naive quantization to the genuine quantum string theory is expressed by saying, in physics language, that the latter equals the former “perturbatively”, [Kap]. But let us show that both (3.6.1b,c) can be further deformed along the Barannikov-Kontsevich moduli space [BK]. We will focus on the half-twisted model (3.6.1c).
3.6.2. Recall that associated (by Deligne, see [GoM,Kon,BK]) to any differential Lie superalgebra $(g = g^0 \oplus g^1, d)$ there is a deformation functor, $\mathrm{Def}_g$, with domain the category of Artin algebras and range the category of sets. In order to define it, introduce the space of solutions to the Maurer-Cartan equation with values in an Artin algebra $A$:
\[ MC_g(A) = \Big\{\gamma : d\gamma + \tfrac12[\gamma, \gamma] = 0,\ \gamma \in (g \otimes A)^1\Big\}. \tag{3.6.2} \]
The operation
\[ (g \otimes A)^1 \ni \gamma \mapsto d\beta + [\gamma, \beta] \quad \text{if } \beta \in (g \otimes A)^0 \tag{3.6.3} \]
does not preserve the set $MC_g(A)$, but it does so infinitesimally, see a lucid explanation in [M2], Ch. 2, Sect. 9. Exponentiating (3.6.3) gives a group action
\[ G(A)^0 \times MC_g(A) \to MC_g(A). \tag{3.6.4} \]
Define
\[ \mathrm{Def}_g(A) = MC_g(A)/G(A)^0. \tag{3.6.5} \]
The motivation behind this ([M2], Ch. 2, sect. 9) is that (i) if γ is a solution of the Maurer-Cartan equation, then d +[γ , .] is also a differential, and (ii) the adjoint action of g0 results in the action on solutions of the Maurer-Cartan equation defined in (3.6.3).
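Point (i) can be checked by a short computation. The following is a sketch under the usual graded conventions, which are an assumption here since the text does not spell them out: $\gamma$ is odd, $d$ is an odd derivation of the bracket with $d^2 = 0$, and the bracket satisfies the graded Jacobi identity.

```latex
% Assumed conventions: \gamma is odd, d is an odd derivation of [.,.], d^2 = 0.
\begin{align*}
(d + [\gamma,\cdot\,])^2 x
 &= d^2 x + d[\gamma,x] + [\gamma,dx] + [\gamma,[\gamma,x]] \\
 &= [d\gamma,x] - [\gamma,dx] + [\gamma,dx] + \tfrac12[[\gamma,\gamma],x] \\
 &= \big[\,d\gamma + \tfrac12[\gamma,\gamma],\; x\,\big].
\end{align*}
```

Here $[\gamma,[\gamma,x]] = \tfrac12[[\gamma,\gamma],x]$ is the graded Jacobi identity for odd $\gamma$, so the square vanishes whenever $\gamma$ solves the Maurer-Cartan equation.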
Barannikov and Kontsevich apply this functor in the case where
\[ g_{BK} = \Gamma(M, \Omega^{0,*}_M \otimes T_M^{*,0}), \quad d = \bar\partial, \quad [., .] \ \text{is the Schouten-Nijenhuis bracket.} \tag{3.6.6} \]
Our task is similar but somewhat different. We need, see (3.6.1c), to deform $(Q^{--})_{(0)}$ within the class of differentials on the vertex algebra $\Gamma(M, \Omega^{vert}_M)$. Even though the latter is not a Lie algebra, this deformation problem is governed by the differential Lie superalgebra
\[ (\hat g, d, [., .]) \overset{\mathrm{def}}{=} \big(\Gamma(M, \mathrm{Lie}(\Omega^{vert}_M)),\ (Q^{--})_{(0)},\ {}_{(0)}\big), \tag{3.6.7a} \]
where
\[ \Gamma(M, \mathrm{Lie}(\Omega^{vert}_M)) = \Gamma(M, \Omega^{vert}_M / T(\Omega^{vert}_M)). \tag{3.6.7b} \]
Indeed,
\[ {}_{(0)} : \hat g \otimes \Gamma(M, \Omega^{vert}_M) \to \Gamma(M, \Omega^{vert}_M) \tag{3.6.8} \]
makes $\Gamma(M, \Omega^{vert}_M)$ a $\hat g$-module, on which $\hat g$ operates by derivations. Furthermore,
\[ \big((Q^{--})_{(0)} + \gamma_{(0)}\big)^2 = (Q^{--}_{(0)}\gamma)_{(0)} + \tfrac12(\gamma_{(0)}\gamma)_{(0)}. \tag{3.6.9} \]
Hence, if $\gamma$ satisfies the Maurer-Cartan equation, then $(Q^{--})_{(0)} + \gamma_{(0)}$ is a differential. Let us define then
\[ \mathrm{Def}_{\Gamma(M, \Omega^{vert}_M)} = \mathrm{Def}_{\hat g}. \tag{3.6.10} \]
By definition,
\[ g_{BK} \subset \Gamma(M, \Omega^{vert}_M), \tag{3.6.11} \]
which, by virtue of (3.6.7b), gives a map, an injection in fact,
\[ \iota : g_{BK} \to \hat g. \tag{3.6.12} \]
It is not a differential Lie algebra homomorphism, but its twisted version
\[ \iota_{Q^{++}} : g_{BK} \to \hat g, \quad a \mapsto Q^{++}_{(0)}\iota(a) \tag{3.6.13} \]
is; here $Q^{++}$ is a vertex analogue of the $\partial$-differential; it has appeared in (3.6.1) and is defined by the same formula as its quasiclassical limit (3.6.0). Indeed, it is a pleasing exercise to check that the Schouten-Nijenhuis bracket can be expressed in purely vertex algebra terms, cf. Proposition 1.1 in [Get],
\[ \iota([a, b]) = \iota(a)_{(0)}\big(Q^{++}_{(0)}\iota(b)\big). \tag{3.6.14} \]
Therefore
\[ \iota_{Q^{++}}([a, b]) = Q^{++}_{(0)}\big(\iota(a)_{(0)} Q^{++}_{(0)}\iota(b)\big) = \big(Q^{++}_{(0)}\iota(a)\big)_{(0)} Q^{++}_{(0)}\iota(b) = \iota_{Q^{++}}(a)_{(0)}\, \iota_{Q^{++}}(b). \]
Note that morphism (3.6.13) changes the parity, as it should, because $g_{BK}$ is an odd Lie superalgebra. This proves
Lemma 3.6.2.1. Map (3.6.13) defines a morphism of functors
\[ \mathrm{Def}_{g_{BK}} \to \mathrm{Def}_{\Gamma(M, \Omega^{vert}_M)}. \tag{3.6.15} \]
If $M$ is a Calabi-Yau manifold, then $\mathrm{Def}_{g_{BK}}$ is represented by a formal scheme that is the formal neighborhood of 0 of the superspace $H^*(M, \Lambda^* T_M)$ [BK]. In particular, there exists a generic formal solution of the Maurer-Cartan equation in variables chosen to be any basis of the dual space $(H^*(M, \Lambda^* T_M))^*$. Therefore,
Corollary 3.6.2.2. If $M$ is a Calabi-Yau manifold, then there is a family of vertex algebras
\[ H^*(M, W_{1/2}^{quant})_t \xrightarrow{\sim} H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M)), \tag{3.6.16} \]
with base the formal neighborhood of 0 in the superspace $H^*(M, \Lambda^* T_M)$. Some of these deformations are not so formal; for example, $(Q^{--})_{(0)}$ itself depends quite explicitly on the choice of a complex structure, see (3.6.0); this can be extended by including generalized complex structures [G]; and considerable work has been done in order to interpret other points of the Barannikov-Kontsevich moduli space.
3.6.3. Vertex Frobenius manifolds? It appears that there is more than just that to this story. What unfolds in the conformal weight zero component of $H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M))$ is precisely the Barannikov-Kontsevich construction of the Frobenius manifold structure on $\mathrm{Def}_{g_{BK}}$. Furthermore, it is plausible that each line of [BK] has a vertex algebra analogue valid up to homotopy. For example, operation $_{(-1)}$ makes each vertex algebra into a homotopy associative commutative algebra [LZ]. Furthermore, the order 2 differential operator defined on $g$, which is essential for [BK], has a vertex analogue; this analogue is $(Q^{++})_{(1)}$, which is well defined precisely when $M$ is a Calabi-Yau manifold [MSV]. It is also an order 2 differential operator of sorts in that
\[ [(Q^{++})_{(1)}, a_{(-1)}] - (Q^{++}_{(1)}a)_{(-1)} = (Q^{++}_{(0)}a)_{(0)}, \tag{3.6.17} \]
which is a derivation of all $_{(n)}$-multiplications, a remark of Lian and Zuckerman, [LZ], Lemma 2.1. What all of this seems to indicate is that there is a reasonable definition of a vertex Frobenius manifold of which $H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M))$ is an important example.
Acknowledgements. The author thanks V.Gorbounov, A.Kapustin, and B.Khesin for illuminating discussions. Parts of this work were done while the author was visiting the Fields Institute, IHES, Max-Planck-Institut in Bonn, and Erwin Schrödinger Institut in Vienna. It is a pleasure to acknowledge the support, hospitality, and stimulating atmosphere of these institutions.
References
[AKSZ] Aleksandrov, M., Kontsevich, M., Schwarz, A., Zaboronsky, O.: The geometry of the master equation and topological quantum field theory. Internat. J. Mod. Phys. A12, 1405–1430 (1997)
[AG] Arkhipov, S., Gaitsgory, D.: Differential operators on the loop group via chiral algebras. Int. Math. Res. Not. 2002(4), 165–210 (2002)
[A-GF] Alvarez-Gaumé, L., Freedman, D.Z.: Geometrical structure and ultraviolet finiteness in the supersymmetric σ-model. Commun. Math. Phys. 80(3), 443–451 (1981)
[BK] Barannikov, S., Kontsevich, M.: Frobenius manifolds and formality of Lie algebras of polyvector fields. Int. Math. Res. Not. 1998(4), 201–215 (1998)
[BD] Beilinson, A., Drinfeld, V.: Chiral algebras. American Mathematical Society Colloquium Publications 51, Providence, RI: Amer. Math. Soc., 2004
[B-ZHS] Ben-Zvi, D., Heluani, R., Szczesny, M.: Supersymmetry of the chiral de Rham complex. http://arXiv.org/list/math.QA/0601532, 2006
[BL] Borisov, L., Libgober, A.: Elliptic genera of toric varieties and applications to mirror symmetry. Inv. Math. 140(2), 453–485 (2000)
[BLPZ] Bredthauer, A., Lindström, U., Persson, J., Zabzine, M.: Generalized Kähler geometry from supersymmetric sigma models. Lett. Math. Phys. 77, 291–308 (2006)
[Bre] Bressler, P.: The first Pontryagin class. http://arxiv.org/math.AT/0509563, 2005
[DF] Deligne, P., Freed, D.: Classical field theory. In: Quantum fields and strings: A course for mathematicians, v.1, P. Deligne et al, editors, Providence, RI: Amer. Math. Soc., 2000
[DM] Deligne, P., Morgan, J.: Notes on supersymmetry (following J. Bernstein). In: Quantum fields and strings: A course for mathematicians, v.1, P. Deligne et al, editors, Providence, RI: Amer. Math. Soc., 2000
[Di] Dickey, L.A.: Soliton equations and Hamiltonian systems, Second edition. Advanced Series in Mathematical Physics 26, River Edge, NJ: World Scientific Publishing Co., Inc., 2003
[Dor] Dorfman, I.Ya.: Dirac structures of integrable evolution equations. Phys. Lett. A 125(5), 240–246 (1987)
[Fad] Faddeev, L.D.: The Feynman integral for singular Lagrangians (in Russian). Teoret. Mat. Fiz. 1(1), 3–18 (1969)
[FP] Feigin, B., Parkhomenko, S.: Regular representation of affine Kac-Moody algebras. In: Algebraic and geometric methods in mathematical physics (Kaciveli, 1993), Math. Phys. Stud. 19, Dordrecht: Kluwer Acad. Publ., 1996, pp. 415–424
[F] Frenkel, E.: Private communication
[FL] Frenkel, E., Losev, A.: Mirror symmetry in two steps: a-i-b. Commun. Math. Phys. 269, 39–86 (2007)
[FB-Z] Frenkel, E., Ben-Zvi, D.: Vertex algebras and algebraic curves. Mathematical Surveys and Monographs 88, Providence, RI: Amer. Math. Soc., 2001
[FS] Frenkel, I., Styrkas, K.: Modified regular representations of affine and Virasoro algebras, VOA structure and semi-infinite cohomology. http://arXiv.org/math.QA/0409117, 2004
[GHR] Gates, S.J. Jr., Hull, C.M., Roček, M.: Twisted multiplets and new supersymmetric nonlinear σ-models. Nucl. Phys. B 248(1), 157–186 (1984)
[GW] Gepner, D., Witten, E.: String theory on group manifolds. Nucl. Phys. B 278(3), 493–549 (1986)
[Get] Getzler, E.: A Darboux theorem for Hamiltonian operators in the formal calculus of variations. Duke Math. J. 111(3), 535–560 (2002)
[GoM] Goldman, W.M., Millson, J.J.: The deformation theory of representations of fundamental groups of compact Kähler manifolds. Inst. Hautes Études Sci. Publ. Math. 67, 43–96 (1988)
[GM1] Gorbounov, V., Malikov, F.: Vertex algebras and the Landau-Ginzburg/Calabi-Yau correspondence. Moscow Math. J. 4(3), 729–779 (2004)
[GM2] Gorbounov, V., Malikov, F.: The chiral de Rham complex and the positivity of the equivariant signature of the loop space. http://arXiv.org/math.AT/0205132, 2002
[GMS1] Gorbounov, V., Malikov, F., Schechtman, V.: Gerbes of chiral differential operators. II. Vertex algebroids. Inv. Math. 155, 605–680 (2004)
[GMS2] Gorbounov, V., Malikov, F., Schechtman, V.: On chiral differential operators over homogeneous spaces. Int. J. Math. Math. Sci. 26(2), 83–106 (2001)
[GMS3] Gorbounov, V., Malikov, F., Schechtman, V.: Gerbes of chiral differential operators. III. http://arXiv.org/list/math.AG/0005201, 2000
[G] Gualtieri, M.: Generalized complex geometry. http://arXiv.org/list/math.DG/0401221, 2004
[HK] Heluani, R., Kac, V.G.: Supersymmetric vertex algebras. Commun. Math. Phys. 271, 103–178 (2007)
[K] Kac, V.: Vertex algebras for beginners. 2nd edition, Providence, RI: Amer. Math. Soc., 1998
[Kap] Kapustin, A.: Chiral de Rham complex and the half-twisted sigma-model. http://arXiv.org/list/hep-th/0504074, 2005
[KL] Kapustin, A., Li, Yi.: Topological sigma-models with H-flux and twisted generalized complex manifolds. http://arXiv.org/list/hep-th/0407249, 2004
[Kon] Kontsevich, M.: Deformation quantization of Poisson manifolds. Lett. Math. Phys. 66(3), 157–216 (2003)
[L] Leites, D.: Introduction to the theory of supermanifolds. Russ. Math. Surv. 35(1), 1–64 (1980)
[LZ] Lian, B.H., Zuckerman, G.J.: New perspectives on the BRST-algebraic structure of string theory. Commun. Math. Phys. 154(3), 613–646 (1993)
[LWX] Liu, Z.-J., Weinstein, A., Xu, P.: Manin triples for Lie bialgebroids. J. Diff. Geom. 45, 547–574 (1997)
[M1] Manin, Yu.I.: Gauge field theory and complex geometry. Grundlehren 289, Berlin-Heidelberg-New York: Springer-Verlag, 1988
[M2] Manin, Yu.I.: Frobenius manifolds, quantum cohomology, and moduli spaces. Colloquium Publications 47, Providence, RI: Amer. Math. Soc., 1999
[MS] Malikov, F., Schechtman, V.: Deformations of vertex algebras, quantum cohomology of toric varieties, and elliptic genus. Commun. Math. Phys. 234(1), 77–100 (2003)
[MSV] Malikov, F., Schechtman, V., Vaintrob, A.: Chiral de Rham complex. Commun. Math. Phys. 204, 439–473 (1999)
[Ol] Olver, P.J.: Applications of Lie groups to differential equations. Graduate Texts in Mathematics 107, New York: Springer-Verlag, 1986
[QFS] Quantum fields and strings: A course for mathematicians. v.1, 2, P. Deligne et al, eds., Providence, RI: Amer. Math. Soc., 2000
[S] Schwarz, A.: Symplectic formalism in conformal field theory. In: Symétries Quantiques, Les Houches, Session LXIV, 1995, A. Connes, K. Gawedzki, Zinn-Justin, eds., Elsevier Science B.V., 1998
[T] Takens, F.: A global version of the inverse problem of the calculus of variations. J. Differ. Geom. 14(4), 543–562 (1979)
[V] Vinogradov, A.M.: Cohomological analysis of partial differential equations and secondary calculus. Translations of Mathematical Monographs 204, Providence, RI: Amer. Math. Soc., 2001
[W1] Witten, E.: Nonabelian bosonization in two dimensions. Commun. Math. Phys. 92(4), 455–472 (1984)
[W2] Witten, E.: Mirror manifolds and topological field theories. In: Essays on mirror symmetry, S.T. Yau, ed., Hong Kong: International Press, 1992
[W3] Witten, E.: On the Landau-Ginzburg description of N = 2 minimal models. Int. J. Mod. Phys. A9, 4783–4800 (1994)
[W4] Witten, E.: Two-Dimensional Models With (0,2) Supersymmetry: Perturbative Aspects. http://arXiv.org/list/hep-th/0504078, 2005
[Z] Zuckerman, G.J.: Action principles and global geometry. In: Mathematical aspects of string theory (San Diego, Calif., 1986), Adv. Ser. Math. Phys. 1, Singapore: World Sci. Publishing, 1987, pp. 259–284
[Zh] Zhu, M.: Vertex operator algebras associated to modified regular representations of affine Lie algebras. http://arXiv.org/list/math/0611517, 2006
[Zu] Zumino, B.: Supersymmetry and Kähler manifolds. Phys. Lett. 27B, 203 (1979)
Communicated by L. Takhtajan
Commun. Math. Phys. 278, 549–566 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0397-x
Communications in
Mathematical Physics
The Ground State Energy of Heavy Atoms: Relativistic Lowering of the Leading Energy Correction
Rupert L. Frank1, Heinz Siedentop2, Simone Warzel3
1 Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden. E-mail: [email protected]
2 Mathematisches Institut, Ludwig-Maximilians-Universität München, Theresienstraße 39, 80333 München, Germany. E-mail: [email protected]
3 Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA. E-mail: [email protected]
Received: 16 February 2007 / Accepted: 2 May 2007
Published online: 21 December 2007 – © R.L. Frank, H. Siedentop and S. Warzel 2007
Abstract: We describe atoms by a pseudo-relativistic model that has its origin in the work of Chandrasekhar. We prove that the leading energy correction for heavy atoms, the Scott correction, exists. It turns out to be lower than in the non-relativistic description of atoms. Our proof is valid up to and including the critical coupling constant. It is based on a renormalization of the energy whose zero level we adjust to be the ground-state energy of the corresponding non-relativistic problem. This allows us to roll the proof back to results for the Schrödinger operator.
© 2007 The authors. Reproduction of this article for non-commercial purposes by any means is permitted.
Current address: Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA
1. Introduction
The energy of heavy atoms has attracted considerable interest that dates back to the advent of quantum mechanics. As in classical mechanics it soon became clear that the exact solution of problems involving more than two particles interacting through Coulomb forces is not possible. Thomas [61] and Fermi [22,23] introduced their description of such atoms by the particle density, and Lenz [31], who wrote down the corresponding energy functional which we will use here (see (7)), addressed this question and derived that the ground state energy of atoms should decrease with the atomic number $Z$ as $Z^{7/3}$. Scott predicted that this could be refined by an additive $Z^2$-correction. Considerably later Schwinger [47] argued also for Scott’s prediction. Schwinger [48] and Englert and Schwinger [10–12] even refined these considerations by adding more lower order terms (see also Englert [9]). The challenging conjecture whether the predicted formulae by Thomas and Fermi would yield asymptotically correct results in leading order when compared with the $N$-particle Schrödinger theory was settled by Lieb and Simon in their seminal paper [37]. Alternative proofs were given by Thirring [60] (lower bound), Lieb [34], and Balodis and Solovej [41]. The Scott correction was established by Hughes [26,27] (lower bound), and Siedentop and Weikard [49–53] (lower and upper bound). In fact, even the existence of the $Z^{5/3}$-correction conjectured by Schwinger was proven (Fefferman and Seco [18–20,13,21,16,14,15,17]). Later these results were extended in various ways, e.g., the Scott correction to ions (Bach [1,2]), to molecules (Ivrii and Sigal [29], Solovej and Spitzer [59,58], Balodis [4]), and to molecules in the presence of magnetic fields (Sobolev [56] and Ivrii [30]). Ivrii [28] extended the validity of Schwinger’s correction to the molecular case.
Nevertheless, from a physical point of view, these considerations are questionable, since large atoms force the bulk of the electrons on orbits that are close to the nucleus (of order $Z^{-1/3}$) where the electrons move with high speed, which requires a relativistic treatment. Schwinger [48] has estimated these effects, concluding that they should contribute to the Scott correction whereas the leading term should be unaffected by the change of model. Sørensen [45] was the first who proved that the Thomas-Fermi term is indeed left unaffected when the non-relativistic Hamiltonian is replaced by the Chandrasekhar operator in the limit of large $Z$ and large velocity of light $c$ with $\kappa := Z/c$ fixed. Cassanas and Siedentop [5] showed that, similarly to the Chandrasekhar case, the leading energy is not affected for the Brown-Ravenhall operator. Recently, Solovej, Sørensen, and Spitzer [57] announced a proof that a correction is at most of the order $Z^2$, although no claim on the actual value of the coefficient was made. (See also Sørensen [44] for the non-interacting case.)
In the present paper, we give an alternate proof of the Scott correction of the Chandrasekhar operator, which we present, for simplicity, in the atomic case. Our proof relies heavily on a semi-classical approximation for electrons that are far enough from the nucleus. However, we use them only indirectly relying on known results about the non-relativistic Scott correction.
In addition we use only relatively standard technical means such as Lieb-Thirring and Hardy inequalities. Our basic strategy is a renormalization of the energy, setting the energy of the Schrödinger atom as zero. Moreover, we are able to extend the result of [57] to the case of the critical coupling constant. However, the question of whether the Schwinger correction, which lives on the scale $Z^{-2/3}$, also exists in this relativistic model cannot be answered with our techniques and is, therefore, left open.
The energy of a heavy atom is described by a quadratic form
\[ \mathcal{E}^{\#} : Q_N \to \mathbb{R}, \qquad \psi \mapsto \Big\langle \psi, \Big[\sum_{\nu=1}^{N}\big(T - Z|x|^{-1}\big)_\nu + \sum_{1\le\mu<\nu\le N}|x_\mu - x_\nu|^{-1}\Big]\psi\Big\rangle \tag{1} \]
with $Q_N :=$ […] 0. Indeed, by scaling $x \to x/\kappa$,
\[ s(\kappa) = \mathrm{tr}\Big(\big[\sqrt{\kappa^{-2}p^2 + \kappa^{-4}} - \kappa^{-2} - |x|^{-1}\big]_- - \big[\tfrac12 p^2 - |x|^{-1}\big]_-\Big), \]
and $\sqrt{\kappa^{-2}p^2 + \kappa^{-4}} - \kappa^{-2}$ is monotone decreasing with respect to $\kappa$.
3. It is part of our assertion that the operator in brackets in (6) belongs to the trace class. In the subcritical case $\kappa < 2/\pi$ this was already proved by Sørensen [44].
Since neither the Schrödinger nor the Chandrasekhar operator depend explicitly on spin, we shall assume henceforth $q = 1$; the general case follows along the same line. We prove Theorem 1 in Sect. 3 after having established a precise bound on the spectral shift for one-particle operators in the next section.
2. Bound on the Spectral Shift
For any real-valued potential $v$ for which the following operators can be defined according to Friedrichs, we set
\[ S(v) := \tfrac12 p^2 - v, \tag{11} \]
\[ C(v) := \sqrt{p^2 + 1} - 1 - v, \tag{12} \]
the Schrödinger, respectively Chandrasekhar, operator in L 2 (R3 ). We assume c = 1 throughout this section. If the potential v is radially symmetric, both the Schrödinger and the Chandrasekhar operator commute with the angular momentum operators allowing for a decomposition into the corresponding invariant subspaces. For each l ∈ N0 the subspace Hl spanned by the spherical harmonics Yl,m with m = −l, . . . , l, is an invariant subspace of S(v) and ∞ H = L 2 (R3 ). We write for the orthogonal projection onto H and C(v), and ⊕l=0 l l l trl (A) := tr(l A) for the corresponding reduced trace. Our main result in this section concerns the decay of the spectral shift trl [C(v)]− − [S(v)]− as the angular momentum l increases. We shall prove
(13)
Relativistic Scott Correction
Theorem 2. There exists a constant $M$ such that for all $\mu \ge 0$, all $l \in \mathbb{N}_0$, and all $v : [0, \infty) \to [0, \infty)$ satisfying
$$v(r) \le \frac{2}{\pi}\, r^{-1} \tag{14}$$
the sum of eigenvalue differences for angular momentum $l$ is bounded according to
$$0 \le \operatorname{tr}_l\bigl([C(v) + \mu]_- - [S(v) + \mu]_-\bigr) \le M (l + 1)^{-2}. \tag{15}$$
This theorem shows that there is an effective cancellation in the difference in (15). Indeed, if $v(r) = \kappa r^{-1}$, then
$$\operatorname{tr}_l\,[S(\kappa r^{-1})]_- = (2l + 1) \sum_{n=1}^{\infty} \frac{1}{2}\, \frac{\kappa^2}{(n + l)^2},$$
and this does not decay at all as $l \to \infty$. We note also that (15) implies that the operator $\bigl[\sqrt{p^2 + 1} - 1 - \kappa|x|^{-1}\bigr]_- - \bigl[\tfrac12 p^2 - \kappa|x|^{-1}\bigr]_-$ appearing in Theorem 1 is trace class for any $\kappa \in (0, \tfrac{2}{\pi}]$.
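The lack of decay of the Schrödinger partial trace can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name, the truncation parameter `n_terms` and the midpoint tail estimate are illustrative choices. It evaluates $(2l+1)\sum_{n\ge 1}\kappa^2/(2(n+l)^2)$ and suggests that the values approach $\kappa^2$ instead of tending to zero.

```python
import math

def reduced_trace_schrodinger(l: int, kappa: float, n_terms: int = 20_000) -> float:
    """Evaluate (2l+1) * sum_{n>=1} kappa^2 / (2 (n+l)^2).

    The series is truncated after n_terms terms; the remainder is
    approximated by the integral of 1/(x+l)^2 from n_terms + 1/2 to infinity
    (an illustrative midpoint-rule estimate).
    """
    partial = sum(1.0 / (n + l) ** 2 for n in range(1, n_terms + 1))
    tail = 1.0 / (n_terms + l + 0.5)
    return (2 * l + 1) * kappa ** 2 / 2 * (partial + tail)

if __name__ == "__main__":
    kappa = 2 / math.pi  # critical coupling of Theorem 2
    for l in (0, 1, 5, 50, 500):
        print(l, reduced_trace_schrodinger(l, kappa), kappa ** 2)
```

For $l = 0$ the sum is $\kappa^2 \pi^2/12$, and for growing $l$ the printed values increase towards $\kappa^2$, which illustrates why the bound $M(l+1)^{-2}$ in (15) reflects a cancellation between the two traces.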
2.1. Reminder on Lieb-Thirring estimates. In the proof of Theorem 2 we use the following relativistic Lieb-Thirring inequalities due to Daubechies [7].
Proposition 1. For any $\gamma > \tfrac12$ there exists a constant $L_\gamma$ such that for all $l \ge 0$,
$$\operatorname{tr}_l\,[C(v)]_-^{\gamma} \le L_\gamma (2l + 1) \int_0^{\infty} \Bigl( [v(r)]_+^{1+\gamma} + [v(r)]_+^{\frac12+\gamma} \Bigr)\, dr. \tag{16}$$
Proposition 1 is also valid for $\gamma = \tfrac12$, but we will not need this fact.
Proof. Since $\operatorname{tr}_l\,[C(v)]_-^{\gamma} \le (2l + 1) \operatorname{tr}_0\,[C(v)]_-^{\gamma}$, it suffices to verify the claim for $l = 0$. If we extend $v$ to an even function $\tilde v$ on $\mathbb{R}$, then $C(v)$ is unitarily equivalent to the part of the whole-line operator $\sqrt{p^2 + 1} - 1 - \tilde v$ on antisymmetric functions. In the whole-line case, the result follows by evaluating the integral in [7, Eq. (2.14)].
Our treatment of the critical case $\kappa = \tfrac{2}{\pi}$ is based on the following inequality [38, Theorem 11] of Lieb and Yau.
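Proposition 2. Let $I$ be a function with support in $\{x \in \mathbb{R}^3 : |x| \le 1\}$. Then for all $\mu > 0$,
$$\operatorname{tr}\bigl[ I \bigl(|p| - \tfrac{2}{\pi}|x|^{-1} - \mu\bigr) I \bigr]_- \le \mathrm{const}\; \mu^4 \int |I(x)|^2\, dx.$$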
R. L. Frank, H. Siedentop, S. Warzel
2.2. Finiteness of partial traces. In (15) the trace of the difference of the operators $[C(v) + \mu]_-$ and $[S(v) + \mu]_-$ appears. We begin by proving that both operators separately have finite traces. Since $S(v) \ge C(v)$ (see also (25) below) it suffices to prove this in the relativistic case.
Lemma 1. For all $\mu \ge 0$ and all $l \in \mathbb{N}_0$ one has $\operatorname{tr}_l \bigl[ C\bigl(\tfrac{2}{\pi}|x|^{-1}\bigr) + \mu \bigr]_- < \infty$.
Proof. Obviously it suffices to prove the lemma for $\mu = 0$. Pick a Lipschitz function $\varphi : \mathbb{R}_+ \to [0, \pi/2]$ with Lipschitz constant $\phi_0$ which vanishes for $r \le 1/2$ and which is $\pi/2$ for $r \ge 1$. Then $I := \cos(\varphi)$ has compact support around the origin and, furthermore, it constitutes together with $A := \sin(\varphi)$ a quadratic partition of unity, i.e., $I^2 + A^2 = 1$. According to Lieb and Yau [38, Theorem 9] we have the localization formula
$$\langle \psi, (p^2 + 1)^{1/2} \psi \rangle = \langle I\psi, (p^2 + 1)^{1/2} I\psi \rangle + \langle A\psi, (p^2 + 1)^{1/2} A\psi \rangle - \langle \psi, L\psi \rangle \tag{17}$$
for $\psi \in L^2(\mathbb{R}^3)$. Here $L$ is the bounded integral operator on $L^2(\mathbb{R}^3)$ with non-negative kernel given in terms of a Bessel function,
$$L(x, y) := K_2(|x - y|)\, \frac{\sin^2[(\varphi(|x|) - \varphi(|y|))/2]}{\pi^2 |x - y|^2}. \tag{18}$$
We shall estimate this localization error by a multiplication operator. More precisely, we shall show that there exists a constant $M > 0$ such that
$$\langle \psi, L\psi \rangle \le M \langle \psi, e^{-|x|} \psi \rangle. \tag{19}$$
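To prove this, we note that by the Schwarz inequality we have
$$\langle \psi, L\psi \rangle \le \int_{\mathbb{R}^3} dx\, |\psi(x)|^2 \int_{\mathbb{R}^3} dy\, K_2(|x - y|)\, \frac{\sin^2((\varphi(|x|) - \varphi(|y|))/2)}{\pi^2 |x - y|^2} \le \Bigl(\frac{\phi_0}{2\pi}\Bigr)^2 \int dx\, |\psi(x)|^2 \int dy\, K_2(|x - y|)$$
|x|R Z (x) |x − y|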
(39)
and the corresponding one-particle operators by
$$S_{\mathrm{TF}} = S(Z|x|^{-1} - \chi_{\mathrm{TF}}), \tag{40}$$
$$C_{\mathrm{TF}} = C_c(Z|x|^{-1} - \chi_{\mathrm{TF}}), \tag{41}$$
both self-adjointly realized in $L^2(\mathbb{R}^3)$. Here we use a notation similar to that in Sect. 2,
$$C_c(v) := \sqrt{p^2 c^2 + c^4} - c^2 - v. \tag{42}$$
We remark that we slightly deviate from the usual choice $Z|x|^{-1} - (\rho_Z * |\cdot|^{-1})(x)$ for the screened potential. This is motivated by the correlation inequality (44) below. The concept of an exchange hole can be traced back to Slater [55]. It has also been used to estimate the exchange-correlation energy (Lieb [32], Lieb and Oxford [36]).
We shall express the many-particle ground-state energy in terms of quantities involving the above one-particle operators. In the relativistic case we use the correlation inequality of [40] to obtain a lower bound on the many-particle ground-state energy.
Lemma 5. For all $L \in \mathbb{N}$,
$$E^C_\kappa(Z) \ge - \sum_{l=0}^{L-1} \operatorname{tr}_l \bigl[C_c(Z|x|^{-1})\bigr]_- - \sum_{l=L}^{\infty} \operatorname{tr}_l\,[C_{\mathrm{TF}}]_- - D(\rho_Z, \rho_Z). \tag{43}$$
Proof. We use the correlation inequality [40, Eq. (14)]
$$\sum_{1\le\mu<\nu\le N} |x_\mu - x_\nu|^{-1} \ge \sum_{\nu=1}^{N} \chi_{\mathrm{TF}}(x_\nu) - D(\rho_Z, \rho_Z), \tag{44}$$
K − l,
(58)
where $K := [d Z^{1/3}]$ with $d$ some positive constant independent of $Z$.
Case $l \ge L$. We choose $\psi^{\#}_{n,l,m}(x) = \varphi_{n,l}(|x|)\, Y_{l,m}(x/|x|)$, where the functions $\varphi_{n,l}$, as well as the weights $w_{n,l}$, are defined exactly as in [49, Sect. 2] independently of #. (The exact form of the functions and the values of the weights for $l \ge L$ are irrelevant in our context.)
Note that the above construction guarantees $d^{\#}$ to be density matrices, i.e., $0 \le d^{\#} \le 1$. Moreover, by the choice of $L$, $K$, and $w_{n,l}$ one can ensure that $\operatorname{tr} d^{\#} \le Z$. (For # = S this is proved in [49, Corollary 4.1], and hence follows also for # = C.) Since $d^{\#}_l$ is independent of # for $l \ge L$ we drop the superscript in this case. Moreover, we shall use the notations
$$d^{\#}_< := \sum_{l=0}^{L-1} d^{\#}_l, \qquad d_> := \sum_{l=L}^{\infty} d_l,$$
and
$$\rho^{\#}_l(x) := d^{\#}_l(x, x), \qquad \rho^{\#}_<(x) := d^{\#}_<(x, x), \qquad \rho_>(x) := d_>(x, x).$$
We recall now that the density matrix $d^S$ gives an energy which is correct up to the order we are interested in. More precisely, one has
Proposition 4. Let $L := Z^{1/12}$. Then, for sufficiently large $Z$,
$$E^S(Z) = \operatorname{tr}[S(Z|x|^{-1})\, d^S] + D(\rho^S, \rho^S) + O(Z^{47/24}). \tag{59}$$
Proof. It is shown in [49] that for sufficiently large $Z$,
$$E^S(Z) \le \operatorname{tr}[S(Z|x|^{-1})\, d^S] + D(\rho^S, \rho^S) \le E^{\mathrm{TF}}(Z) + \tfrac14 Z^2 + \mathrm{const}\; Z^{47/24}.$$
Combining this with the lower bound on $E^S(Z)$ which was recalled in (47) and (48), we obtain the assertion.
We decrease the ground state energy further by dropping a part of the Coulomb energy,
$$E^S(Z) \ge \operatorname{tr}[S(Z|x|^{-1})\, d^S_<] + \operatorname{tr}[S(Z|x|^{-1})\, d_>] + D(\rho_>, \rho_>) - \mathrm{const}\; Z^{47/24}. \tag{60}$$
For an upper bound in the relativistic case we employ a variational principle to obtain
Lemma 6. For sufficiently large $Z$,
$$E^C_\kappa(Z) \le \operatorname{tr}[C_c(Z/|x|)\, d^C_<] + \operatorname{tr}[S(Z/|x|)\, d_>] + D(\rho_>, \rho_>) + 2D(\rho^C_<, \rho_>) + D(\rho^C_<, \rho^C_<).$$
Proof. As noted above, $d^C$ satisfies $0 \le d^C \le 1$ and $\operatorname{tr} d^C \le Z$ for sufficiently large $Z$ [49, Corollary 4.1]. The Hartree-Fock functional bounds the ground state energy from above even if non-idempotent density matrices are inserted, a fact that was proven by Lieb [33] (see also Bach [3]). Using this and estimating the indirect part of the Coulomb energy by zero we obtain
$$E^C_\kappa(Z) \le \operatorname{tr}[C_c(Z|x|^{-1})\, d^C] + D(\rho^C, \rho^C). \tag{61}$$
Both terms on the right-hand side are split according to $d^C = d^C_< + d_>$. To obtain the desired upper bound we use the inequality $\tfrac12 p^2 \ge \sqrt{c^2 p^2 + c^4} - c^2$ for large angular momenta.
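The kinetic-energy inequality used at the end of this proof follows from $\sqrt{c^2p^2 + c^4} \le c^2 + \tfrac12 p^2$, which one sees by squaring both sides. The sketch below is only a numerical sanity check for scalar momenta; the function names are illustrative and do not appear in the paper.

```python
import math

def chandrasekhar_kinetic(p: float, c: float) -> float:
    """sqrt(c^2 p^2 + c^4) - c^2 for a scalar momentum p and c > 0.

    Written as c^2 p^2 / (sqrt(c^2 p^2 + c^4) + c^2) to avoid the
    cancellation of two large numbers when c is large.
    """
    root = math.sqrt(c * c * p * p + c ** 4)
    return c * c * p * p / (root + c * c)

def schrodinger_kinetic(p: float) -> float:
    """Non-relativistic kinetic energy p^2 / 2."""
    return 0.5 * p * p

def kinetic_gap(p: float, c: float) -> float:
    """Difference p^2/2 - (sqrt(c^2 p^2 + c^4) - c^2); expected to be >= 0."""
    return schrodinger_kinetic(p) - chandrasekhar_kinetic(p, c)

if __name__ == "__main__":
    for c in (1.0, 10.0, 137.0):
        print(c, [kinetic_gap(p, c) for p in (0.0, 0.5, 1.0, 5.0, 50.0)])
```

The gap is non-negative for every sampled pair and shrinks as $c$ grows at fixed $p$, consistent with the Chandrasekhar kinetic energy increasing to $\tfrac12 p^2$ as $c \to \infty$.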
The following lemma shows the irrelevance of the interaction energy of the low lying states with all other electrons (including themselves). The proof follows the strategy pursued in [49], namely to estimate it by the lowest Coulomb energy of a particle in the field of an external point charge $Z$, and then simply to multiply by the particle number. There is, however, one important change in the channel $l = 0$. Because of the singular nature of the lowest eigenfunctions in the critical case, their expectations in potentials with Coulomb singularities do not exist. To circumvent this problem we use the Hardy-Littlewood-Sobolev inequality followed by a recent Sobolev-type inequality [24].
Lemma 7. One has $D(\rho^C_<, \rho^C) \le \mathrm{const}\; Z^{11/6} \log Z$.
Proof. We treat the terms $D(\rho^C_<, \rho^C_<)$ and $D(\rho^C_<, \rho_>)$ separately. For the latter one we recall that
$$\int \rho^C_l(x)\, dx = (2l + 1)(K - l), \qquad 0 \le l < L, \tag{62}$$
where $K = O(Z^{1/3})$ and that by Proposition 3.4 in [49],
$$\sum_{l=L}^{\infty} \int \frac{\rho_l(x)}{|x|}\, dx \le \int \frac{\rho^S(x)}{|x|}\, dx \le \mathrm{const}\; Z^{4/3}. \tag{63}$$
The densities $\rho^{\#}_l$ are spherically symmetric because of the addition formula for the spherical harmonics. Hence, using Newton's theorem [42], we have
$$D(\rho^C_<, \rho_>) \le \frac12 \int \rho^C_<(x)\, dx \int \frac{\rho_>(y)}{|y|}\, dy \le \mathrm{const} \sum_{l=0}^{L-1} (2l + 1)(K - l)\, Z^{4/3} \le O(L^2 K Z^{4/3}) = O(Z^{11/6}). \tag{64}$$
We set $\tilde\rho^C := \rho^C_< - \rho^C_0$ and estimate
$$D(\rho^C_<, \rho^C_<) \le 2D(\rho^C_0, \rho^C_0) + 2D(\tilde\rho^C, \tilde\rho^C). \tag{65}$$
This allows us to treat the contributions from $l = 0$ and $1 \le l < L$ separately. Using a scaled version of Lemma 3 with $R_l := \bigl((l + \tfrac12)^2 - 4\kappa^2\bigr)/4\kappa$ we obtain for $1 \le l < L$,
$$\operatorname{tr}\bigl(|x|^{-1} d^C_l\bigr) \le \frac{1}{2Z} \operatorname{tr}\bigl(C_c(0)\, d^C_l\bigr) + \operatorname{tr}\bigl(\chi_{\{|x| > R_l/c\}} |x|^{-1} d^C_l\bigr) \le \frac12 \operatorname{tr}\bigl(|x|^{-1} d^C_l\bigr) + \frac{c}{R_l} \operatorname{tr} d^C_l,$$
where the last inequality used the fact that eigenfunctions of $d^C_l$ are also eigenfunctions of $C_c(Z|x|^{-1})$ with negative eigenvalue. Hence, summing over $l$ and noting that $R_l^{-1} \le \mathrm{const}\; l^{-2}$,
$$\int \frac{\tilde\rho^C(y)}{|y|}\, dy = \sum_{l=1}^{L-1} \operatorname{tr}\bigl(|x|^{-1} d^C_l\bigr) \le \mathrm{const}\; Z \sum_{l=1}^{L} l^{-2} \int \rho^C_l(x)\, dx.$$
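Thus by (62) and again by Newton's theorem
$$D(\tilde\rho^C, \tilde\rho^C) \le \frac12 \int \tilde\rho^C(x)\, dx \int \frac{\tilde\rho^C(y)}{|y|}\, dy \le \mathrm{const}\; K L^2 \cdot K Z \log L \le \mathrm{const}\; Z^{11/6} \log Z.$$
Finally, we treat the term corresponding to $l = 0$. By the Hardy-Littlewood-Sobolev inequality (cf. [35]) and by Hölder's inequality
$$D(\rho^C_0, \rho^C_0) \le \mathrm{const}\; \|\rho^C_0\|^2_{6/5} = \mathrm{const} \Bigl( \int \Bigl( \sum_{n=1}^{K} |\psi^C_{n,0,0}(x)|^2 \Bigr)^{6/5} dx \Bigr)^{5/3} \le \mathrm{const}\; K^{1/3} \Bigl( \sum_{n=1}^{K} \int |\psi^C_{n,0,0}(x)|^{12/5}\, dx \Bigr)^{5/3}.$$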
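The Hölder step above rests on the elementary bound $(\sum_{n=1}^{K} a_n)^{6/5} \le K^{1/5} \sum_{n=1}^{K} a_n^{6/5}$ for non-negative $a_n$, applied pointwise with $a_n = |\psi^C_{n,0,0}(x)|^2$; raising the integrated inequality to the power $5/3$ turns $K^{1/5}$ into $K^{1/3}$. A minimal numerical sketch of the pointwise bound follows; the helper name and sample values are illustrative only.

```python
def holder_gap(values, exponent=6 / 5):
    """Return K^(exponent-1) * sum(a^exponent) - (sum a)^exponent.

    For non-negative values and exponent >= 1 this difference is
    non-negative (Hoelder / power-mean inequality), with equality when
    all entries coincide.
    """
    k = len(values)
    lhs = sum(values) ** exponent
    rhs = k ** (exponent - 1) * sum(v ** exponent for v in values)
    return rhs - lhs

if __name__ == "__main__":
    print(holder_gap([1.0, 2.0, 3.0]))        # strictly positive
    print(holder_gap([2.0, 2.0, 2.0, 2.0]))   # zero up to rounding
```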
Now we use the Sobolev-type inequality [24, Eq. (2.8)]
$$\|u\|^2_{12/5} \le \mathrm{const}\; \bigl\langle u, \bigl(|p| - \tfrac{2}{\pi}|x|^{-1}\bigr) u \bigr\rangle^{1/2}\, \|u\|, \tag{66}$$
where the first factor on the right-hand side is to be understood in form sense. Using that $|p| - \tfrac{2}{\pi}|x|^{-1} \le c^{-1} C_c(Z|x|^{-1}) + c$ and that $\psi^C_{n,0,0}$ is a normalized eigenfunction of $C_c(Z|x|^{-1})$, we deduce
$$\|\psi^C_{n,0,0}\|_{12/5} \le \mathrm{const}\; c^{1/4}. \tag{67}$$
Combining the previous relations we arrive at
$$D(\rho^C_0, \rho^C_0) \le \mathrm{const}\; K^{1/3} (K c^{3/5})^{5/3} \le \mathrm{const}\; Z^{5/3}. \tag{68}$$
This completes the proof of the lemma.
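The powers of $Z$ appearing in the proof of Lemma 7 can be tallied mechanically. The sketch below assumes the scalings $L \sim Z^{1/12}$, $K \sim Z^{1/3}$ and $c = Z/\kappa \sim Z$ with $\kappa$ fixed, and ignores logarithmic factors; the dictionary and helper are illustrative bookkeeping, not part of the paper.

```python
from fractions import Fraction as F

# Assumed scalings in powers of Z (kappa fixed, logarithms ignored).
EXPONENT_OF = {"L": F(1, 12), "K": F(1, 3), "c": F(1), "Z": F(1)}

def z_exponent(powers):
    """Total power of Z of a monomial given as {symbol: power}."""
    return sum((EXPONENT_OF[s] * F(p) for s, p in powers.items()), F(0))

# (64): L^2 K Z^{4/3}
bound_64 = z_exponent({"L": 2, "K": 1, "Z": F(4, 3)})
# Bound on D(rho~, rho~): (K L^2) * (K Z), up to log L
bound_tilde = z_exponent({"K": 2, "L": 2, "Z": 1})
# (68): K^{1/3} (K c^{3/5})^{5/3} = K^2 c
bound_68 = z_exponent({"K": F(1, 3) + F(5, 3), "c": F(3, 5) * F(5, 3)})

if __name__ == "__main__":
    print(bound_64, bound_tilde, bound_68)
```

Under these assumptions the first two exponents equal $11/6$ and the third equals $5/3$, all strictly below the error exponent $47/24$ of Proposition 4.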
Proof (of Theorem 1 – second part). It follows from Lemma 7 that
$$2D(\rho^C_<, \rho_>) + D(\rho^C_<, \rho^C_<) = O(Z^{11/6} \log Z).$$