Commun. Math. Phys. 220, 1 – 12 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
On the Definition of SRB-Measures for Coupled Map Lattices
Esa Järvenpää, Maarit Järvenpää
University of Jyväskylä, Department of Mathematics, P.O. Box 35, 40351 Jyväskylä, Finland. E-mail: [email protected]; [email protected]
Received: 23 June 2000 / Accepted: 4 January 2001
Abstract: We consider SRB-measures of coupled map lattices. The emphasis is given to a definition according to which a SRB-measure is an invariant probability measure whose projections onto finite-dimensional subsystems are absolutely continuous with respect to the Lebesgue measure. We show that coupled map lattices which are close to an uncoupled expanding map have typically an infinite number of SRB-measures. In particular, we give a counterexample to the Bricmont–Kupiainen conjecture.
1. Introduction

The SRB-measure (Sinai, Ruelle, Bowen) is by definition a “natural” invariant probability measure of a dynamical system (X, T), where X is a manifold and T : X → X is a differentiable mapping. The word “natural” comes from the interpretation that the dynamical system is a model of some physical system. The natural measure should tell how typical points behave asymptotically, that is, what the long-time behaviour of the system is for typical initial values. Typical points are determined by the set-up of the actual experiment. If the phase space of the system is a manifold, then one may argue that the Lebesgue measure or some smooth modification of it is the right distribution for the initial values. Having found an invariant measure µ and a set A ⊂ X with positive Lebesgue measure such that the Birkhoff average $\frac{1}{n}\sum_{i=1}^{n}\delta_{T^i(x)}$ tends to µ in the weak∗-topology as n → ∞ for all x ∈ A, it is reasonable to say that µ is a SRB-measure. Here $\delta_x$ is the probability measure concentrated at the point x. The existence of several other definitions for the SRB-measure found in the literature stems from the fact that this is a difficult condition to test. One definition is that the SRB-measure is an invariant probability measure whose conditional distributions on unstable leaves are absolutely continuous with respect to the corresponding Lebesgue measure. According to another definition it is an equilibrium state for a certain potential function obtained from the derivative of the map. A third definition states that the SRB-measure is a limit of the
Lebesgue measure under the iteration of the dynamics. For nice finite-dimensional systems like expanding maps on compact manifolds or axiom A systems all these definitions agree and give the same unique SRB-measure. When adopting the aforementioned definitions into the infinite-dimensional setting of coupled map lattices, one should take into consideration that in an experiment it is possible to measure only a finite number of quantities, in particular, a finite number of coordinates. Thus it seems quite natural to demand that the finite dimensional projections of a SRB-measure are absolutely continuous with respect to the corresponding Lebesgue measure. The extension of the equilibrium state definition to the infinite dimensional setting is not trivial because of the difficulties caused by infinite determinants and matrices. The third definition is obtained by studying finite dimensional approximations of the whole system, taking the limit of the (finite) Lebesgue measure under these approximations, and letting the subsystem size tend to infinity. Even for expanding maps one possibility is to demand that finite dimensional conditional distributions are absolutely continuous. All of the above definitions have been used in the literature. Bunimovich and Sinai [BS] studied expanding maps of the unit interval with a special diffusive coupling over one-dimensional lattice Z. They showed that the system has an invariant Gibbs state whose projections onto finite-dimensional subsystems are absolutely continuous with respect to the Lebesgue measure. In [BK1] Bricmont and Kupiainen used the first mentioned definition, proved the existence of a SRB-measure for analytic expanding circle maps in the regime of small analytic coupling over d-dimensional lattice Zd , and conjectured the uniqueness of this SRB-measure. They extended the existence result for special Hölder continuous functions in [BK2]. 
They also verified that the SRB-measure is unique in the class of measures for which the logarithm of the density is Hölder continuous. In [J] it was shown that all these results remain true if one replaces the circle by any compact Riemannian manifold. Jiang and Pesin [JP] considered weakly coupled Anosov maps. They managed to extend the equilibrium state definition to this setting and proved the existence and uniqueness of the SRB-measure. Recently, Keller and Zweimüller [KZ] studied piecewise expanding interval maps with a special unidirectional coupling using the last mentioned definition. They established the existence and uniqueness of the SRB-measure in this setting. Finally, the proofs of [BK2, JP] give the uniqueness of the SRB-measure given as in the third definition above.

The purpose of this paper is to show that the first mentioned definition is not equivalent to the second and third ones in an infinite dimensional setting. We will construct a coupled map lattice which has an infinite number of SRB-measures according to the first mentioned definition (see Theorem 3.4). (Three of these are also (space) translation invariant.) We also argue that our example is not just a curious artificial system but that it manifests a typical behaviour. Thus, although being perhaps the most natural of the above definitions at the heuristic level, this definition has the drawback of being non-unique. Our results also imply that for each finite subsystem X one can find a set A of positive Lebesgue measure such that for each x ∈ A there are boundary conditions $y_1(x)$ and $y_2(x)$ such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\delta_{T^j(x\vee y_i)} = \mu_i, \qquad i = 1, 2,$$
where $\mu_1 \neq \mu_2$ and x ∨ y is the natural element of the phase space X. Hence the boundary conditions do have an effect. Note that one cannot draw the conclusion that there is a physical phase transition since for each x ∈ A one has to choose the boundary
Non-Uniqueness of SRB-Measures for Coupled Map Lattices
condition in a very special way in order to see another SRB-measure than the one whose existence was proved in [BK2].
2. Preliminaries

Our main motivation comes from the well-known projection results in $\mathbb{R}^n$ stating that the projections of a Radon measure µ onto almost all m-planes are absolutely continuous with respect to the m-dimensional Lebesgue measure provided that the m-energy of µ is finite [M, Theorem 9.7]. Our strategy is to use the fact that expanding maps have small invariant sets (and measures) in the sense that their dimensions are less than the dimension of the ambient manifold. For example, the $\frac{1}{3}$-Cantor set is invariant under the map $x \mapsto 3x \bmod 1$. If one takes a finite n-fold product of these Cantor sets, one will obtain a set which is invariant under the corresponding n-fold product map. Of course, the dimension of this product set is less than n, and so the natural Hausdorff measure living on the set, although being invariant, is not a SRB-measure since it is not absolutely continuous with respect to the n-dimensional Lebesgue measure. However, as n grows, the dimension of the product Cantor set grows. In particular, for each integer m one can find n such that the dimension of the n-fold Cantor set is greater than m. By the above-mentioned projection result, typical projections of the n-fold Hausdorff measure onto m-dimensional subspaces are absolutely continuous with respect to the m-dimensional Lebesgue measure. Of course, for this system the m-dimensional subsystems are atypical and the projections onto them are not absolutely continuous. Our idea is that a small coupling will make these coordinate planes typical ones. However, one has to be careful since in [HK] Hunt and Kaloshin proved that these projection results are not valid in infinite dimensional spaces. The projection theorems also have reversed statements according to which the set of exceptional directions may have positive dimension although having zero measure (see [F]). Thus one cannot expect anything more than “almost all”-results.
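The dimension count in the paragraph above can be made concrete with a small sketch. It assumes, as for products of self-similar sets of this kind, that the n-fold product of 1/3-Cantor sets has dimension n·s with s = log 2 / log 3; the function names are illustrative and not taken from the paper.

```python
import math

# Hausdorff dimension s = log 2 / log 3 of the 1/3-Cantor set.
S = math.log(2) / math.log(3)


def product_dimension(n: int) -> float:
    """Dimension n*s of the n-fold product of 1/3-Cantor sets (assumed formula)."""
    return n * S


def min_factors(m: int) -> int:
    """Smallest n with n*s > m, i.e. the smallest integer strictly above m/s."""
    return math.floor(m / S) + 1


if __name__ == "__main__":
    for m in (1, 2, 3, 10):
        n = min_factors(m)
        print(f"m = {m}: n = {n}, dimension of product = {product_dimension(n):.4f}")
```

For instance, projecting onto a single coordinate (m = 1) already requires at least two Cantor factors, since one factor has dimension s < 1.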
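We adopt the very general formulation of the projection theorem due to Peres and Schlag [PS]. We begin by recalling the notation from [PS] which we will use later.

Definition 2.1. Let (X, d) be a compact metric space, $Q \subset \mathbb{R}^n$ an open connected set, and $\Pi : Q \times X \to \mathbb{R}^m$ a continuous map with n ≥ m. For any multi-index $\eta = (\eta_1, \dots, \eta_n) \in \mathbb{N}^n$, let $|\eta| = \sum_{i=1}^{n} \eta_i$ be the length of it, and
$$\partial^{\eta} = \frac{\partial^{|\eta|}}{(\partial\varepsilon_1)^{\eta_1} \cdots (\partial\varepsilon_n)^{\eta_n}},$$
where $\boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_n) \in Q$. Let L be a positive integer and δ ∈ [0, 1). We say that $\Pi \in C^{L,\delta}(Q)$ if for any compact set $Q' \subset Q$ and for any multi-index η with |η| ≤ L there exist constants $C_{\eta,Q'}$ and $C_{\delta,Q'}$ such that
$$|\partial^{\eta}\Pi(\boldsymbol{\varepsilon}, x)| \le C_{\eta,Q'} \quad\text{and}\quad \sup_{|\eta'|=L} |\partial^{\eta'}\Pi(\boldsymbol{\varepsilon}, x) - \partial^{\eta'}\Pi(\boldsymbol{\varepsilon}', x)| \le C_{\delta,Q'}\,|\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}'|^{\delta}$$
for all $\boldsymbol{\varepsilon}, \boldsymbol{\varepsilon}' \in Q'$ and x ∈ X.

Next we will give a definition of a subclass of $C^{L,\delta}(Q)$ from [PS].

Definition 2.2. Let $\Pi \in C^{L,\delta}(Q)$ for some L and δ. Define for all x ≠ y ∈ X,
$$\Phi_{x,y}(\boldsymbol{\varepsilon}) = \frac{\Pi(\boldsymbol{\varepsilon}, x) - \Pi(\boldsymbol{\varepsilon}, y)}{d(x, y)}.$$
Let β ∈ [0, 1). The set Q is a region of transversality of order β for Π if there exists a constant $C_\beta$ such that for all $\boldsymbol{\varepsilon} \in Q$ and for all x ≠ y ∈ X the condition $|\Phi_{x,y}(\boldsymbol{\varepsilon})| \le C_\beta\, d(x, y)^{\beta}$ implies
$$\det\bigl(D\Phi_{x,y}(\boldsymbol{\varepsilon})\,(D\Phi_{x,y}(\boldsymbol{\varepsilon}))^{T}\bigr) \ge C_\beta^{2}\, d(x, y)^{2\beta}.$$
Here the derivative with respect to $\boldsymbol{\varepsilon}$ is denoted by D and $A^{T}$ is the transpose of a matrix A. Further, Π is (L, δ)-regular on Q if there exists a constant $C_{\beta,L,\delta}$ and for all multi-indices η with |η| ≤ L there exists a constant $C_{\beta,\eta}$ such that for all $\boldsymbol{\varepsilon}, \boldsymbol{\varepsilon}' \in Q$ and for all distinct x, y ∈ X,
$$|\partial^{\eta}\Phi_{x,y}(\boldsymbol{\varepsilon})| \le C_{\beta,\eta}\, d(x, y)^{-\beta|\eta|}$$
and
$$\sup_{|\eta'|=L} |\partial^{\eta'}\Phi_{x,y}(\boldsymbol{\varepsilon}) - \partial^{\eta'}\Phi_{x,y}(\boldsymbol{\varepsilon}')| \le C_{\beta,L,\delta}\,|\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}'|^{\delta}\, d(x, y)^{-\beta(L+\delta)}.$$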
Remark 2.3. Note that if the determinant in Definition 2.2 is bounded away from zero then Q is a region of transversality of order β for all β ∈ [0, 1).

Definition 2.4. Let µ be a Borel measure on X and $\alpha \in \mathbb{R}$. The α-energy of µ is
$$E_\alpha(\mu) = \int_X \int_X d(x, y)^{-\alpha}\, d\mu(x)\, d\mu(y).$$
We denote the image of a measure µ under a map f : X → Y by $f_*\mu$, that is, $f_*\mu(A) = \mu(f^{-1}(A))$ for all A ⊂ Y. The following theorem from [PS] gives a relation between Sobolev norms of images of measures under $C^{L,\delta}(Q)$-mappings and energies of the original measures.

Theorem 2.5. Let $Q \subset \mathbb{R}^n$ and $\Pi \in C^{L,\delta}(Q)$ such that L + δ > 1. Let β ∈ [0, 1). Assume that Q is a region of transversality of order β for Π and that Π is (L, δ)-regular on Q. Let µ be a finite Borel measure on X such that $E_\alpha(\mu) < \infty$ for some α > 0. Then there exists a constant $a_0$ depending only on m, n, and δ such that for any compact $Q' \subset Q$,
$$\int_{Q'} \|(\Pi_{\boldsymbol{\varepsilon}})_*\mu\|_{2,\gamma}^{2}\, d\mathcal{L}^n(\boldsymbol{\varepsilon}) \le C_\gamma\, E_\alpha(\mu)$$
for some constant $C_\gamma$ provided that $0 < (m + 2\gamma)(1 + a_0\beta) \le \alpha$ and $2\gamma < L + \delta - 1$. Here $\Pi_{\boldsymbol{\varepsilon}} = \Pi(\boldsymbol{\varepsilon}, \cdot)$ and $\|\cdot\|_{2,\gamma}$ is the Sobolev norm, that is,
$$\|\nu\|_{2,\gamma}^{2} = \int_{\mathbb{R}^m} |\hat{\nu}(\xi)|^{2}\, |\xi|^{2\gamma}\, d\mathcal{L}^m(\xi)$$
for any finite compactly supported Borel measure ν on $\mathbb{R}^m$, where
$$\hat{\nu}(\xi) = \int_{\mathbb{R}^m} e^{-i\xi\cdot x}\, d\nu(x)$$
is the Fourier transform of ν.

Proof. [PS, Theorem 7.3].
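As a worked special case, the following sketch spells out what Theorem 2.5 gives for β = 0 and γ = 0, which are the values used later in Lemma 3.1 and Proposition 3.2. Here Π denotes the map of the theorem, $\Pi_{\varepsilon} = \Pi(\varepsilon, \cdot)$, and $C_0$ is shorthand (an assumption of notation) for the constant $C_\gamma$ at γ = 0.

```latex
% Specialisation of Theorem 2.5 to \beta = 0 and \gamma = 0.
\begin{align*}
\beta = \gamma = 0:\quad
  & 0 < (m + 2\gamma)(1 + a_0\beta) \le \alpha
    \ \text{ becomes }\ 0 < m \le \alpha, \\
  & 2\gamma < L + \delta - 1
    \ \text{ becomes }\ L + \delta > 1, \\
\text{so}\quad
  & \int_{Q'} \|(\Pi_{\varepsilon})_*\mu\|_{2,0}^{2}\, d\mathcal{L}^n(\varepsilon)
    \le C_0\, E_\alpha(\mu) < \infty, \\
\text{hence}\quad
  & \|(\Pi_{\varepsilon})_*\mu\|_{2,0} < \infty
    \quad\text{for } \mathcal{L}^n\text{-almost all } \varepsilon \in Q'.
\end{align*}
```

In words: a measure with finite α-energy for some α ≥ m has, for almost every parameter, an image whose Fourier transform is square integrable.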
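Remark 2.6. Let ν be a finite compactly supported Borel measure on $\mathbb{R}^n$. If $\|\nu\|_{2,0} < \infty$ then ν is absolutely continuous with respect to the Lebesgue measure $\mathcal{L}^n$ and its Radon-Nikodym derivative is $L^2$-integrable, that is, $D(\nu, \mathcal{L}^n) \in L^2(\mathbb{R}^n)$ (see (3.5)). Indeed, if $\hat{\nu} \in L^2(\mathbb{R}^n)$ then by the surjectivity of the Fourier transform [SW, Theorem 2.3, p. 17] there exists $f \in L^2(\mathbb{R}^n)$ such that $\hat{f} = \hat{\nu}$. Thus by [T, Definition 1.7, p. 262] f = ν as a distribution, meaning that $f = D(\nu, \mathcal{L}^n)$. Note also that $\|\nu\|_{2,\gamma} < \infty$ for γ ≥ n + 2 implies that $D(\nu, \mathcal{L}^n)$ has $L^2$-integrable derivatives of order γ, that is, $D(\nu, \mathcal{L}^n) \in W_2^{\gamma}(\mathbb{R}^n)$. So by [SW, Lemma 3.17, p. 26] $D(\nu, \mathcal{L}^n)$ is continuously differentiable.

3. Results

Let $M = \prod_{\mathbb{Z}^d} S^1$, where d ≥ 1 is an integer and $S^1 \subset \mathbb{C}$ is the unit circle. We use the notation $M_\Lambda = \prod_{\Lambda} S^1$ for all $\Lambda \subset \mathbb{Z}^d$. For $\Lambda \subset \tilde{\Lambda} \subset \mathbb{Z}^d$ let $\pi_\Lambda : M \to M_\Lambda$ and $\pi_{\tilde{\Lambda},\Lambda} : M_{\tilde{\Lambda}} \to M_\Lambda$ be the natural projections. Let $\varepsilon_0 > 0$ and let $A_{\boldsymbol{\varepsilon}} : M \to M$ be such that its lift $\tilde{A}_{\boldsymbol{\varepsilon}} : \tilde{M} \to \tilde{M}$, where $\tilde{M} = \prod_{\mathbb{Z}^d} \mathbb{R}$, is
$$\tilde{A}_{\boldsymbol{\varepsilon}}(x)_i = x_i + \sum_{l \in \mathbb{Z}^d} \varepsilon_{il}\, 2^{-|i-l|}\, g(x_l) \tag{3.1}$$
for all $i \in \mathbb{Z}^d$, where |·| is a metric on $\mathbb{Z}^d$, $\varepsilon_{il} \in (-\varepsilon_0, \varepsilon_0)$ for all $i, l \in \mathbb{Z}^d$ and g is continuously differentiable and 1-periodic. (We use the covering map $p : \tilde{M} \to M$ such that $\prod_{\mathbb{Z}^d} [0, 1]$ is a covering domain. Then $A_{\boldsymbol{\varepsilon}} = p \circ \tilde{A}_{\boldsymbol{\varepsilon}} \circ p^{-1}$.) For the discussion of the explicit form of the conjugacy $A_{\boldsymbol{\varepsilon}}$, see Remarks 3.5. Set $E = \prod_{\mathbb{Z}^d \times \mathbb{Z}^d} (-\varepsilon_0, \varepsilon_0)$ and denote by $\mathcal{L}$ the product over $\mathbb{Z}^d \times \mathbb{Z}^d$ of normalized Lebesgue measures on $(-\varepsilon_0, \varepsilon_0)$. It is not difficult to see that $A_{\boldsymbol{\varepsilon}}$ is invertible for all $\boldsymbol{\varepsilon} \in E$ provided $\varepsilon_0$ is small enough (depending on |g′|). We fix such $\varepsilon_0$ and set $T_{\boldsymbol{\varepsilon}} = A_{\boldsymbol{\varepsilon}} \circ F \circ A_{\boldsymbol{\varepsilon}}^{-1}$, where F : M → M is the product over $\mathbb{Z}^d$ of maps $z \mapsto z^3$ (or $t \mapsto 3t \bmod 1$ if $S^1$ is viewed as [0, 1]). Let $\mathbf{K} = \prod_{\mathbb{Z}^d} K$ and $\mu = \prod_{\mathbb{Z}^d} H^s|_K$, where K is the $\frac{1}{3}$-Cantor set on $S^1$ (or [0, 1]) and $H^s|_K$ is the restriction of the s-dimensional Hausdorff measure to K with $s = \frac{\log 2}{\log 3}$. (Note that s is the Hausdorff dimension of K.) Now $(A_{\boldsymbol{\varepsilon}})_*\mu$ is clearly $T_{\boldsymbol{\varepsilon}}$-invariant, that is, $(T_{\boldsymbol{\varepsilon}})_*(A_{\boldsymbol{\varepsilon}})_*\mu = (A_{\boldsymbol{\varepsilon}})_*\mu$. Our aim is to show that for $\mathcal{L}$-almost all $\boldsymbol{\varepsilon}$ the projection $(\pi_\Lambda)_*(A_{\boldsymbol{\varepsilon}})_*\mu$ is absolutely continuous with respect to the Lebesgue measure on $M_\Lambda$ for all finite $\Lambda \subset \mathbb{Z}^d$.

Let $\Lambda \subset \mathbb{Z}^d$. We denote the restriction of $A_{\boldsymbol{\varepsilon}}$ to $M_\Lambda$ by $A_{\boldsymbol{\varepsilon},\Lambda}$, that is,
$$A_{\boldsymbol{\varepsilon},\Lambda}(x)_i = x_i + \sum_{l \in \Lambda} \varepsilon_{il}\, 2^{-|i-l|}\, g(x_l)$$
for all i ∈ Λ. Set $\mu_\Lambda = \prod_{\Lambda} H^s|_K$ and $\mathbf{K}_\Lambda = \prod_{\Lambda} K$. Let $\Lambda \subset \tilde{\Lambda} \subset \mathbb{Z}^d$ be finite such that $|\tilde{\Lambda}|s > |\Lambda|$, where the number of elements in Λ is denoted by |Λ|. Let $E_{\Lambda \times \tilde{\Lambda}} = \prod_{\Lambda \times \tilde{\Lambda}} (-\varepsilon_0, \varepsilon_0)$ and let $\mathcal{L}^{\Lambda \times \tilde{\Lambda}}$ be the restriction of $\mathcal{L}$ to $E_{\Lambda \times \tilde{\Lambda}}$. We will first show that for $\mathcal{L}^{\Lambda \times \tilde{\Lambda}}$-almost all $\boldsymbol{\varepsilon} \in E_{\Lambda \times \tilde{\Lambda}}$ the measure $(\pi_{\tilde{\Lambda},\Lambda} \circ A_{\boldsymbol{\varepsilon},\tilde{\Lambda}})_*\mu_{\tilde{\Lambda}}$ is absolutely continuous with respect to the Lebesgue measure on $M_\Lambda$. As will be indicated in the proof of Proposition 3.2, this claim follows from Theorem 2.5. In order to apply Theorem 2.5 we have to give some conditions on g. Since g is 1-periodic and continuously differentiable there necessarily exists $t_0 \in [0, 1]$ such that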
$g'(t_0) = 0$. In order to satisfy the transversality assumption in Theorem 2.5, we demand that g′ ≠ 0 on K. More precisely, let b > 0 and let g be increasing on [0, 1/6] such that g(0) = 0 and g′(t) ≥ b for all $t \in [0, t_1]$ for some $1/9 < t_1 < 1/6$. Define g(t + 1/6) = g(1/6 − t) for t ∈ [0, 1/6] and g(1 − t) = −g(t) for t ∈ [0, 1/3]. We extend g to the interval [1/3, 2/3] such that g is continuously differentiable, g([0, 1]) ⊂ [−1, 1], for some B ≥ b we have |g′(t)| ≤ B for all t ∈ [0, 1], and |g′(t)| ≥ b for all $t \in [1/3, 1/3 + t_2] \cup [2/3 - t_2, 2/3]$, where $0 < t_2 < 1/9$.

Consider the second step in the construction of the Cantor set K. Call the chosen intervals $I_i$, i = 1, . . . , 4, that is, $I_1 = [0, 1/9]$, $I_2 = [2/9, 1/3]$, $I_3 = [2/3, 7/9]$, and $I_4 = [8/9, 1]$. Let $x \in \mathbf{K}$ and $\Lambda \subset \mathbb{Z}^d$. Define $\tilde{x} \in \mathbf{K}$ in the following way: For all i ∈ Λ, let $\tilde{x}_i = x_i$. For $j \in \Lambda^c = \mathbb{Z}^d \setminus \Lambda$ set $\tilde{x}_j = x_j$ if $x_j \in I_1 \cup I_4$, $\tilde{x}_j = 1/6 - (x_j - 1/6)$ if $x_j \in I_2$, and $\tilde{x}_j = 5/6 + 5/6 - x_j$ if $x_j \in I_3$. Note that with these definitions $g(\tilde{x}_j) = g(x_j)$ for all $j \in \mathbb{Z}^d$, implying that $\pi_\Lambda \circ A_{\boldsymbol{\varepsilon}}(\tilde{x}) = \pi_\Lambda \circ A_{\boldsymbol{\varepsilon}}(x)$. Further, if $x_j \notin [-t_1, t_1]$ for some $j \in \Lambda^c$ then $\tilde{x}_j \in [-t_1, t_1]$.

Let $x, y \in \mathbf{K}$ such that $x_i \in I_1$ and $y_i \in I_2$ for some i ∈ Λ. Then
$$A_{\boldsymbol{\varepsilon}}(y)_i - A_{\boldsymbol{\varepsilon}}(x)_i \ge y_i - x_i - \sum_{l \in \mathbb{Z}^d} |\varepsilon_{il}|\, 2^{-|i-l|}\, |g(y_l) - g(x_l)| \ge y_i - x_i - \sum_{l \in \mathbb{Z}^d} |\varepsilon_{il}|\, 2^{-|i-l|}\, B\,|y_l - x_l| \ge y_i - x_i - C\varepsilon_0 \ge \frac{1}{18} \tag{3.2}$$
for $\varepsilon_0$ small enough since $y_i - x_i \ge 1/9$. Thus the cubes at the second stage of the construction of $\mathbf{K}$ with i-th side $I_1$ will not overlap with cubes with i-th side $I_2$ under the projection $\pi_\Lambda \circ A_{\boldsymbol{\varepsilon}}$ provided that i ∈ Λ. (The same argument works in other cases as well, see (3.3) below.) More precisely, there exists a constant c > 0 such that
$$|\pi_\Lambda \circ A_{\boldsymbol{\varepsilon}}(x) - \pi_\Lambda \circ A_{\boldsymbol{\varepsilon}}(y)| \ge c \tag{3.3}$$
for all $x, y \in \mathbf{K}$ with $x_i \in I_1 \cup I_4$ and $y_i \in I_2 \cup I_3$ (or $x_i \in I_2$ and $y_i \in I_3$) for some i ∈ Λ. Further, as in (3.2) we see that there exists $\tilde{c} > 0$ such that $|A_{\boldsymbol{\varepsilon}}(x)_i - 1/6| \ge \tilde{c}$ for all i ∈ Λ and $x \in \mathbf{K}$, giving the existence of δ > 0 such that
$$\pi_{\{i\}} \circ A_{\boldsymbol{\varepsilon}}(\mathbf{K}) \cap \left(\tfrac{1}{6} - \delta,\ \tfrac{1}{6} + \delta\right) = \emptyset \tag{3.4}$$
for all i ∈ Λ. We fix $\varepsilon_0$ and δ such that the above results hold.
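The reflection trick above (replacing a coordinate in I2 or I3 by its mirror image without changing g) can be checked numerically. The sketch below uses a hypothetical coupling function g, chosen only because it has the symmetries required in the text (g(t + 1/6) = g(1/6 − t), g(1 − t) = −g(t), continuously differentiable and 1-periodic); it is not the function used in the paper, and the helper names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product


def g(t: float) -> float:
    """Hypothetical 1-periodic C^1 coupling function with the stated symmetries."""
    t = t % 1.0
    if t <= 1 / 3:
        return math.sin(3 * math.pi * t)          # increasing on [0, 1/6], g(0) = 0
    if t < 2 / 3:
        return -0.5 * math.sin(6 * math.pi * (t - 1 / 3))  # C^1 extension to the gap
    return -math.sin(3 * math.pi * t)             # forced by g(1 - t) = -g(t)


def reflect(x: Fraction) -> Fraction:
    """Map a coordinate in I1, I2, I3 or I4 to x-tilde as defined in the text."""
    if Fraction(2, 9) <= x <= Fraction(1, 3):     # I2: 1/6 - (x - 1/6)
        return Fraction(1, 3) - x
    if Fraction(2, 3) <= x <= Fraction(7, 9):     # I3: 5/6 + 5/6 - x
        return Fraction(5, 3) - x
    return x                                      # I1 or I4: unchanged


def cantor_left_endpoints(depth: int):
    """Left endpoints of the intervals at the given stage of the Cantor construction."""
    for digits in product((0, 2), repeat=depth):
        yield sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))


if __name__ == "__main__":
    gap = max(abs(g(float(reflect(x))) - g(float(x)))
              for x in cantor_left_endpoints(6))
    print("largest |g(x~) - g(x)| over sampled Cantor points:", gap)
```

The reflected points all land in I1 ∪ I4, that is, within distance 1/9 < t1 of 0 on the circle, which is why the proofs may restrict to coordinates in [−t1, t1].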
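Lemma 3.1. Let $\Lambda \subset \tilde{\Lambda} \subset \mathbb{Z}^d$ be finite such that $|\tilde{\Lambda}|s > |\Lambda|$. Set $X_{\tilde{\Lambda}} = \prod_{\tilde{\Lambda}} [-t_1, t_1]$. Define $\Pi : E_{\Lambda \times \tilde{\Lambda}} \times X_{\tilde{\Lambda}} \to M_\Lambda$ by $\Pi(\boldsymbol{\varepsilon}, x) = \pi_{\tilde{\Lambda},\Lambda} \circ A_{\boldsymbol{\varepsilon},\tilde{\Lambda}}(x)$. Then the assumptions of Theorem 2.5 are valid for δ = 0, β = 0, and for all integers L > 1. Further, $E_\alpha(\mu_{\tilde{\Lambda}}) < \infty$ for any $|\Lambda| < \alpha < |\tilde{\Lambda}|s$.

Proof. We may replace $M_\Lambda$ by $\mathbb{R}^m$, where m = |Λ|. Let $i_0 \in \Lambda$. Note that $X_{\tilde{\Lambda}}$ is a compact metric space equipped with the metric
$$d(x, y)^2 = \sum_{l \in \tilde{\Lambda}} 2^{-2|i_0 - l|}\, |x_l - y_l|^2.$$
Clearly $\Pi \in C^{L,0}(E_{\Lambda \times \tilde{\Lambda}})$ for all positive integers L since all the first order partial derivatives are constants. Note that Q′ in Definition 2.1 will not play any role here since all the estimates are independent of Q′.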
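To check the transversality assumption in Definition 2.2, define for all $x \neq y \in X_{\tilde{\Lambda}}$,
$$\Phi_{x,y}(\boldsymbol{\varepsilon}) = \frac{\Pi(\boldsymbol{\varepsilon}, x) - \Pi(\boldsymbol{\varepsilon}, y)}{d(x, y)}.$$
Fix i ∈ Λ, $k = (k_1, k_2) \in \Lambda \times \tilde{\Lambda}$, and $x, y \in X_{\tilde{\Lambda}}$ such that x ≠ y. Then
$$D\Phi_{x,y}(\boldsymbol{\varepsilon})_{i,k} = \delta_{i,k_1}\, 2^{-|i - k_2|}\, \frac{g(x_{k_2}) - g(y_{k_2})}{d(x, y)},$$
where $\delta_{i,j}$ is the Kronecker delta. Thus for i, j ∈ Λ,
$$\bigl(D\Phi_{x,y}(\boldsymbol{\varepsilon})\, D\Phi_{x,y}(\boldsymbol{\varepsilon})^{T}\bigr)_{i,j} = \frac{\delta_{i,j}}{d(x, y)^2} \sum_{l \in \tilde{\Lambda}} 2^{-|i-l|-|j-l|}\, (g(x_l) - g(y_l))^2 \ge \delta_{i,j}\, b^2\, 2^{-|i-i_0|-|j-i_0|}.$$
By Remark 2.3 the transversality assumption is valid for β = 0 with the constant $C_0 = b^m\, 2^{-\sum_{i \in \Lambda} |i - i_0|}$. Finally, Π is obviously (L, 0)-regular (in fact (L, δ)-regular for all δ ∈ [0, 1)) on $E_{\Lambda \times \tilde{\Lambda}}$ for all positive integers L. The last assertion follows from the well-known properties of the Hausdorff measure $H^s|_K$ (see [M, Chapter 8]).

The following absolute continuity result follows from Theorem 2.5 and Lemma 3.1.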
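Proposition 3.2. Let $\Lambda \subset \tilde{\Lambda} \subset \mathbb{Z}^d$ be finite such that $|\tilde{\Lambda}|s > |\Lambda|$. Then for $\mathcal{L}^{\Lambda \times \tilde{\Lambda}}$-almost all $\boldsymbol{\varepsilon} \in E_{\Lambda \times \tilde{\Lambda}}$ the measure $(\pi_{\tilde{\Lambda},\Lambda} \circ A_{\boldsymbol{\varepsilon},\tilde{\Lambda}})_*\mu_{\tilde{\Lambda}}$ is absolutely continuous with respect to the Lebesgue measure on $M_\Lambda$.

Proof. By the arguments given before stating Lemma 3.1 we may replace $M_{\tilde{\Lambda}}$ by $X_{\tilde{\Lambda}} = \prod_{\tilde{\Lambda}} [-t_1, t_1]$. Lemma 3.1 and Theorem 2.5 give $\|(\pi_{\tilde{\Lambda},\Lambda} \circ A_{\boldsymbol{\varepsilon},\tilde{\Lambda}})_*\mu_{\tilde{\Lambda}}\|_{2,0} < \infty$ for $\mathcal{L}^{\Lambda \times \tilde{\Lambda}}$-almost all $\boldsymbol{\varepsilon} \in E_{\Lambda \times \tilde{\Lambda}}$, which by Remark 2.6 implies the claim.

In Proposition 3.3 we will prove that one may replace $A_{\boldsymbol{\varepsilon},\tilde{\Lambda}}$ by $A_{\boldsymbol{\varepsilon}}$ and $\mu_{\tilde{\Lambda}}$ by µ in Proposition 3.2. For this purpose we use differentiation theory of measures. Let ν and λ be Radon measures on $\mathbb{R}^n$. Recall that the lower derivative of ν with respect to λ at a point $x \in \mathbb{R}^n$ is defined by
$$\underline{D}(\nu, \lambda, x) = \liminf_{r \to 0} \frac{\nu(B(x, r))}{\lambda(B(x, r))}, \tag{3.5}$$
where B(x, r) is the closed ball with centre at x and with radius r. If the limit exists it is called the Radon-Nikodym derivative of ν with respect to λ and is denoted by D(ν, λ, x). Further, ν is absolutely continuous with respect to λ if and only if $\underline{D}(\nu, \lambda, x) < \infty$ for ν-almost all $x \in \mathbb{R}^n$ [M, Theorem 2.12].

Proposition 3.3. Let $\Lambda \subset \tilde{\Lambda} \subset \mathbb{Z}^d$ be finite such that $|\tilde{\Lambda}|s > |\Lambda|$ and let $\boldsymbol{\varepsilon}^1 \in E_{\Lambda \times \tilde{\Lambda}}$ such that the conclusion of Proposition 3.2 is valid. Then for all $\boldsymbol{\varepsilon} \in E$ with $\boldsymbol{\varepsilon}_{\Lambda \times \tilde{\Lambda}} = \boldsymbol{\varepsilon}^1$ we have $\underline{D}((\pi_\Lambda \circ A_{\boldsymbol{\varepsilon}})_*\mu, \mathcal{L}^\Lambda, x) < \infty$ for $(\pi_\Lambda \circ A_{\boldsymbol{\varepsilon}})_*\mu$-almost all $x \in M_\Lambda$. Here $\mathcal{L}^\Lambda$ is the Lebesgue measure on $M_\Lambda$ and $\boldsymbol{\varepsilon}_{\Lambda \times \tilde{\Lambda}} = (\varepsilon_{ij})_{(i,j) \in \Lambda \times \tilde{\Lambda}}$.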
Proof. Let , 0 ∈ E such that × ˜ = 1 , × ˜c = ˜ ˜ = (0 ) × ˜ ˜ , and (0 )Zd × (0 ) ˜ c ×Zd = 0. Set ν = (π ◦ A )∗ µ and ν0 = (π , ◦ A ) µ . Then ν and ν 0 are ˜ ∗ ˜ ˜ 0 , Radon measures with compact supports [M, Theorem 1.18]. It follows directly from (3.1) that (A0 , ˜ )∗ µ ˜ = (π ˜ ◦ A0 )∗ µ, meaning that ν0 = (π ◦ A0 )∗ µ. By Proposition 3.2 the measure ν0 is absolutely continuous with respect to L . Set m = | |. We will first show that there exists a constant C > 0 such that for all r > 0, √ ν (B(x, r))dν (x) ≤ C ν0 (B(x, mr))dν0 (x). (3.6) '
'
By [FO, Lemma 2.6] it is enough to prove that ν (Q)2 ≤ C Q∈D (r, )
ν0 (Q)2 ,
(3.7)
Q∈D (r, )
where D(r, ) is the family of r-mesh cubes in R , that is, cubes of the form [l1 r, (l1 + 1)r) × · · · × [lm r, (lm + 1)r), where li ∈ Z for all i = 1, . . . , m. Let r > 0. Consider the cubes at the nth stage of the construction of K, where 3−n < r. Call this nth stage approximation K(n). Setting V0 = A0 , ˜ (K ˜ (n)) × K ˜ c (n) = A, ˜ (K ˜ (n)) × K ˜ c (n), we get A0 (spt µ) ⊂ V0 implying that spt ν0 ⊂ π (V0 ). Here the support of a measure λ is denoted by spt λ. ˜ and x, y ∈ X = Zd [−t1 , t1 ] such that xk = yk for all k ∈ , ˜ then If i ∈ A (x)i − A (y)i = εil 2−|i−l| (g(xl ) − g(yl )). (3.8) ˜c l∈
(Recall the discussion before Lemma 3.1 according to which we can assume that xi ∈ ˜ c. [−t1 , t1 ] for all i ∈ Zd ). Note that the difference in (3.8) depends only on xj for j ∈ Defining V = A (K(n)), we have spt ν ⊂ π (V ). Further, A (x)i = A, ˜ (x)i for ˜ c meaning that the restriction of V to the subspace ˜ if xj = 0 for all j ∈ all i ∈ ' ˜ ⊂ ' equals A, ˜ (K ˜ (n)) = A0 , ˜ (K ˜ (n)). So by (3.8) V is obtained from V0 by tilting the rows of “cubes” above each “cube” in A, ˜ (K ˜ (n)) in such a way that the ˜ Thus ν is obtained from ν0 by amount of translation does not depend on xi for i ∈ . spreading around the “cubes” defining ν0 . Let Q ∈ D(r, ). If there is Q ∈ D(r, ) such that a part of the “cubes” above it in V0 are tilted above Q then the corresponding “cubes” above Q (in V0 ) are removed away by (3.8). Define AQ = {Q ∈ D(r, ) | π (A (A−1˜ (Q × X \ ) × X ˜ c )) ∩ Q = ∅}. ˜ ,
Then for all Q ∈ AQ with π (V ) ∩ π (A (A−1˜ (Q × X \ ) × X ˜ c )) ∩ Q = ∅ we ˜ , c have V0 ∩ (Q × X ) = ∅. Further, Q × X c = PQ (Q ), (3.9) Q ∈D (r, ) Q∈AQ
where
PQ (Q ) = {x ∈ Q × X c | π (A (A−1˜ (x ˜ ) × x ˜ c )) ∈ Q }. ,
Observe that (A0 )∗ µ(PQ (Q )) = (A )∗ µ(A (A−1 0 (PQ (Q )))).
(3.10)
Note that by (3.8) the geometric shape of this partition is independent of Q, that is, if Q1 ∈ D(r, ) with Q1 × X c =
PQ1 (Q ),
Q ∈D (r, ) Q1 ∈AQ
then for all Q2 = τ (Q1 ) ∈ D(r, ) (τ is a translation) we have
Q2 × X c =
τ (PQ1 (Q )).
Q ∈D (r, ) Q1 ∈AQ
Naturally, this partition can be restricted to V0 . Hence for all Q ∈ D(r, ) there are 1 non-negative numbers pQ (Q ) = ν0 (Q) (A0 )∗ µ(PQ (Q )) adding to 1 such that
ν0 (Q) = (A0 )∗ µ(Q × X c ) =
(A0 )∗ µ(PQ (Q ))
Q ∈D (r, ) Q∈AQ
=
(3.11)
pQ (Q )ν0 (Q).
Q ∈D (r, ) Q∈AQ
This gives by (3.10) that ν (Q) =
Q ∈AQ
(A0 )∗ µ(PQ (Q)) =
pQ (Q)ν0 (Q ).
(3.12)
Q ∈AQ
The numbers pQ (Q ) depend on both Q and PQ (Q ). Enumerating the partition of Q×X c given in (3.9) we get Q×X c = ∪i PQ (i), where the geometric shape of PQ (i) ∈ D(r, ) we have PQ may vary as i varies. However, for all i and Q, Q (i) = τ (PQ (i)), = τ (Q). Hence the differences in PQ (i) as Q varies where τ is the translation with Q and i is kept fixed are due to the fact that the measure is not evenly distributed inside horizontal | |-dimensional slices of Q × X c . Note that if such a horizontal slice intersects an element PQ (Q ) of the partition (3.9), then, by (3.8), it may intersect only the elements PQ (Q ), where Q is a neighbour of Q in D(r, ). Let N = 3| | be the number of neighbours. We say that Q and Q are related (Q ∼ Q ) if there exists Q
such that Q , Q ∈ AQ . Then by (3.11) and (3.12) N
ν0 (Q)2 −
Q∈D (r, )
=N
ν (Q)2
Q∈D (r, )
pQ (Q )pQ (Q )ν0 (Q)2
Q∈D (r, ) Q ∈D (r, ) Q ∈D (r, ) Q∈AQ Q∈AQ
−
pQ (Q)pQ (Q)ν0 (Q )ν0 (Q )
Q∈D (r, ) Q ∈AQ Q ∈AQ
=
pQ (Q)pQ (Q)(ν0 (Q ) − ν0 (Q ))2 + P ≥ 0
Q ,Q ∈D (r, ) Q∈D (r, ) Q ,Q ∈AQ Q ∼Q
since the remainder P (which is due to the occasionally very generous compensation factor N ) is non-negative. This concludes the proof of (3.7). Let α be the L -measure of the m-dimensional unit ball. By [M, Theorem 2.12] D(ν0 , L , x) exists and is finite for L -almost all x. By Proposition 3.2 the same is true for ν0 -almost all x. By Remark 2.6 we can choose D(ν0 , L ) as smooth as we like by ˜ In particular, it can be chosen to be uniformly continuous so that one can increasing . find r0 > 0 such that ν0 (B(x, r))α −1 r −m ≤ max{2D(ν0 , L , x), 1} for all 0 < r < r0 and x ∈ ' . Thus using Fatou’s lemma, inequality 3.6, the theorem of dominated convergence, and Theorem 2.5 together with Plancharel’s formula [SW, Theorem 2.1, p. 16], we have D(ν , L , x)dν (x) = lim inf ν (B(x, r))α −1 r −m dν (x) r→0 ≤ lim inf ν (B(x, r))α −1 r −m dν (x) r→0 √ ≤ lim inf C ν0 (B(x, mr))α −1 r −m dν0 (x) r→0 √ m = C( m) D(ν0 , L , x)dν0 (x) = C D(ν0 , L , x)2 dL (x) < ∞. Thus D(ν , L , x) is finite for ν -almost all x.
Theorem 3.4. For $\mathcal{L}$-almost all $\boldsymbol{\varepsilon}$ the map $T_{\boldsymbol{\varepsilon}}$ has infinitely many SRB-measures.

Proof. For all finite $\Lambda \subset \mathbb{Z}^d$, let $E_g(\Lambda) = \{\boldsymbol{\varepsilon} \in E \mid (\pi_\Lambda \circ A_{\boldsymbol{\varepsilon}})_*\mu$ is absolutely continuous with respect to $\mathcal{L}^\Lambda\}$. By Propositions 3.2 and 3.3 and [M, Theorem 2.12] we get for all finite $\Lambda \subset \mathbb{Z}^d$, $\mathcal{L}(E_g(\Lambda)) = 1$. Defining
$$E_g = \bigcap_{\substack{\Lambda \subset \mathbb{Z}^d \\ |\Lambda| < \infty}} E_g(\Lambda)$$

… 0), and has a phase transition, if 1 < c < 2 (d > 2) [4]. It has been widely believed without proof that the hierarchical Ising model in d ≥ 4 dimensions has a critical trajectory converging to the Gaussian fixed point and that the “continuum limit” of the hierarchical Ising model in d ≥ 4 dimensions will be trivial. In this paper, we prove this fact. In the present analysis, it is crucial that the critical Ising model is mapped into a weak coupling regime after a small number of renormalization group transformations (in fact, 70 iterations for d = 4). Moreover, using a framework essentially different from that of [16, 7], we see in the weak coupling regime that the “effective coupling constant” of a critical model decays as $c_1/(N + c_2)$ after N iterations in d = 4 dimensions (exponentially for d > 4). Our framework in the weak coupling regime is designed especially for a critical trajectory starting at the strong coupling regime so that the criterion of convergence to the Gaussian fixed point can be checked numerically with mathematical rigor. Corresponding results, triviality of the $\phi_4^4$ spin model on a regular lattice (“full model”), are far harder, and a proof of triviality of the Ising model on the 4-dimensional regular lattice is, though widely believed, still open. We should here note the excellent and hard work of [9, 10] where the existence of a critical trajectory in the weak coupling regime (near the Gaussian fixed point; “weak triviality”) is solved by rigorous block spin renormalization group transformation. Our main theorem is the following:

Theorem 1.1. If d ≥ 4 (i.e. $c \ge \sqrt{2}$), there exists a “critical trajectory” converging to the Gaussian fixed point starting from the hierarchical Ising models. Namely, there exists a positive real number $s_c$ such that if $h_N$, N = 0, 1, 2, · · · , are defined by (1.5) with $h_0 = h_{I,s_c}$, then the sequence of measures $h_N(x)\,dx$, N = 0, 1, 2, · · · , converges weakly to the massless Gaussian measure $h_G(x)\,dx$.

Remark.
Our proof is partially computer-aided and shows for d = 4 that sc ∈ [1.7925671170092624, 1.7925671170092625]. In the following sections, we give a proof of Theorem 1.1. We will concentrate on the case d = 4, since the cases d > 4 can be proved along similar lines (with weaker bounds).
T. Hara, T. Hattori, H. Watanabe
2. Strategy

The proof of Theorem 1.1 is decomposed into two parts: Theorem 2.1 (analysis in the weak coupling regime) and Theorem 2.2 (analysis in the strong coupling regime). They are stated in Sect. 2.3, and their proofs are given in Sect. 4 and Sect. 5, respectively. Theorem 1.1 is proved at the end of this section assuming them.

(1) In Theorem 2.1, we control the renormalization group flow in a weak coupling regime by means of a finite number of truncated correlations (Taylor coefficients of the logarithm of characteristic functions), and, in terms of the truncated correlations, we give a criterion, a set of sufficient conditions, for the measure to be in a domain of attraction of the Gaussian fixed point.

(2) In Theorem 2.2, we prove, by rigorous computer-aided calculations, that there is a trajectory whose initial point is an Ising measure and for which the criterion in Theorem 2.1 is satisfied after a small number of iterations.

The first part (Theorem 2.1) is essentially the Bleher–Sinai argument [1, 2, 16]. However, the criteria introduced in the references [16, 7] seem to be difficult to handle when “strong coupling constants” are present in the model, as in the Ising models. In order to overcome this difficulty, we use characteristic functions of single spin distributions and Newman’s inequalities for truncated correlations. The second part (Theorem 2.2) consists basically of simple numerical calculations of truncated correlations up to 8 points to ensure the criterion. The results are double checked by Mathematica and C++ programs, and furthermore they are made mathematically rigorous by means of Newman’s inequalities. It should be noted that rigorous computer-aided proofs are applied in [14] to Dyson’s hierarchical model in d = 3 dimensions, to prove, together with [13], the existence of a non-Gaussian fixed point. (The “physics” are of course different between d = 3 and d = 4.)
We also focus on a complete mathematical proof, by combining rigorous computer-aided bounds with mathematical methods such as Newman’s inequalities and the Bleher–Sinai arguments.

2.1. Characteristic function. Denote the characteristic function of the single spin distribution $h_N$ as
$$\hat{h}_N(\xi) = \mathcal{F}h_N(\xi) = \int_{\mathbb{R}} e^{\sqrt{-1}\,\xi x}\, h_N(x)\, dx. \tag{2.1}$$
The renormalization group transformation for $\hat{h}_N$ is
$$\hat{h}_{N+1} = \mathcal{F}R\mathcal{F}^{-1}\hat{h}_N, \tag{2.2}$$
which has a decomposition
$$\mathcal{F}R\mathcal{F}^{-1} = T S, \tag{2.3}$$
where
$$S g(\xi) = g\!\left(\frac{\sqrt{c}}{2}\,\xi\right)^{2}, \tag{2.4}$$
$$T g(\xi) = \text{const.}\, \exp\!\left(-\frac{\beta}{2}\,\frac{d^2}{d\xi^2}\right) g(\xi), \tag{2.5}$$
Triviality of Hierarchical Ising Model in Four Dimensions
and the constant is so defined that T g(0) = 1. The transformation (2.2) has the same form as the N = 2 case of the Gallavotti hierarchical model [5, 11, 12]. Note that only for N = 2 the Gallavotti model is equivalent (by Fourier transform) to Dyson’s hierarchical model. We introduce a “potential” $V_N$ for the characteristic function $\hat{h}_N$ and its Taylor coefficients $\mu_{n,N}$ by
$$\hat{h}_N(\xi) = e^{-V_N(\xi)}, \tag{2.6}$$
$$V_N(\xi) = \sum_{n=1}^{\infty} \mu_{n,N}\, \xi^{n}. \tag{2.7}$$
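(Note that $\hat{h}_N(0) = 1$.) The coefficient $\mu_{n,N}$ is called a truncated n point correlation. They are functions of the Ising parameter s in $h_0 = h_{I,s}$, but to simplify expressions, we will always suppress the dependence on s in the following. In particular, for the initial condition $h_0 = h_{I,s}$, we have
$$\hat{h}_0(\xi) = \hat{h}_{I,s}(\xi) = \mathcal{F}h_{I,s}(\xi) = \cos(s\xi),$$
$$\mu_{2,0} = \frac{1}{2}s^2, \quad \mu_{4,0} = \frac{1}{12}s^4, \quad \mu_{6,0} = \frac{1}{45}s^6, \quad \mu_{8,0} = \frac{17}{2520}s^8, \quad\text{etc.},$$
and
$$h_1(x) = R h_{I,s}(x) = \text{const.}\left\{ e^{\beta c s^2/2}\left(\delta(x - s\sqrt{c}) + \delta(x + s\sqrt{c})\right) + 2\delta(x) \right\},$$
$$\hat{h}_1(\xi) = \frac{1}{1+k}\left(1 + k\cos(\sqrt{c}\,s\,\xi)\right), \quad\text{with } k = e^{\beta c s^2/2},$$
$$\mu_{2,1} = k\zeta, \quad \mu_{4,1} = \frac{k}{6}(2k - 1)\zeta^2, \quad \mu_{6,1} = \frac{k}{90}(16k^2 - 13k + 1)\zeta^3,$$
$$\mu_{8,1} = \frac{k}{2520}(272k^3 - 297k^2 + 60k - 1)\zeta^4, \quad\text{etc., with } \zeta = \frac{c s^2}{2(k+1)}.$$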
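The coefficients listed above can be re-derived with exact rational arithmetic. The sketch below is not the authors' Mathematica or C++ code; it expands V = −log ĥ as a power series using the standard recurrence for the logarithm of a series. For N = 1 it works in the rescaled variable θ = √c·s·ξ (equivalently, it assumes c·s² = 1) and treats k as a free rational sample value, so that everything stays in exact fractions. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def cos_series(order: int) -> list:
    """Taylor coefficients of cos(x) up to x**order."""
    return [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
            for n in range(order + 1)]


def neg_log_series(a: list) -> list:
    """Taylor coefficients of V = -log f, given those of f with f(0) = 1.

    Uses f' = f * (log f)', i.e. n*a_n = sum_{k=1..n} k*l_k*a_{n-k}.
    """
    assert a[0] == 1
    l = [Fraction(0)] * len(a)
    for n in range(1, len(a)):
        conv = sum((k * l[k] * a[n - k] for k in range(1, n)), Fraction(0))
        l[n] = a[n] - conv / n
    return [-c for c in l]


def mu_step0(order: int = 8) -> list:
    """mu_{n,0} for s = 1, from h0-hat = cos(xi)."""
    return neg_log_series(cos_series(order))


def mu_step1_series(k: Fraction, order: int = 8) -> list:
    """mu_{n,1} in the variable theta, from h1-hat = (1 + k*cos(theta)) / (1 + k)."""
    cosc = cos_series(order)
    f = [Fraction(1)] + [k * cn / (1 + k) for cn in cosc[1:]]
    return neg_log_series(f)


def mu_step1_closed_form(k: Fraction) -> dict:
    """Closed forms quoted in the text, with zeta = 1 / (2(k+1)) since c*s^2 = 1 here."""
    z = Fraction(1) / (2 * (k + 1))
    return {
        2: k * z,
        4: k * (2 * k - 1) * z ** 2 / 6,
        6: k * (16 * k ** 2 - 13 * k + 1) * z ** 3 / 90,
        8: k * (272 * k ** 3 - 297 * k ** 2 + 60 * k - 1) * z ** 4 / 2520,
    }


if __name__ == "__main__":
    print([str(c) for c in mu_step0()[2::2]])
    k = Fraction(3, 2)  # arbitrary sample value of k
    series = mu_step1_series(k)
    print({n: str(series[n]) for n in (2, 4, 6, 8)})
    print({n: str(v) for n, v in mu_step1_closed_form(k).items()})
```

Restoring a general s and c only rescales the n-th coefficient by (c·s²)^{n/2}, which is how the factor ζ in the displayed formulas arises.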
2.2. Newman’s inequalities. The function $V_N$ has a remarkable positivity property and its Taylor coefficients obey Newman’s inequalities (for a brief review of the relevant part, see Appendix A):
$$0 \le \mu_{2n,N} \le \frac{1}{n}\,(2\mu_{4,N})^{n/2}, \qquad n = 3, 4, 5, \cdots. \tag{2.8}$$
These inequalities follow from [15, Theorem 3, 6], since we have chosen the Ising spin distribution $h_0 = h_{I,s}$ and the function of η defined by
$$\int e^{\eta x}\, h_N(x)\, dx = \left\langle \exp\!\left( \eta \left(\frac{\sqrt{c}}{2}\right)^{N} \sum_{\theta} \phi_\theta \right) \right\rangle_{N,h_{I,s}} \tag{2.9}$$
has only pure imaginary zeros as is shown in [15, Theorem 1]. Note also that (1.2) and (1.6) imply
$$\mu_{2n+1,N} = 0, \qquad n = 0, 1, 2, \cdots. \tag{2.10}$$
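As a quick sanity check of (2.8), and only for the initial Ising distribution (N = 0), the sketch below compares the truncated correlations obtained from −log cos(sξ) with the Newman upper bound. The sample values of s and the function names are arbitrary illustrative choices.

```python
def newman_upper_bound(mu4: float, n: int) -> float:
    """Right-hand side of (2.8): (2*mu4)**(n/2) / n."""
    return (2.0 * mu4) ** (n / 2) / n


def ising_truncated_correlations(s: float) -> dict:
    """mu_{2n,0} for h0 = h_{I,s}, i.e. Taylor coefficients of -log cos(s*xi)."""
    return {2: s ** 2 / 2, 4: s ** 4 / 12, 6: s ** 6 / 45, 8: 17 * s ** 8 / 2520}


def newman_table(s_values=(0.5, 1.0, 1.7925671170092624)) -> list:
    """Rows (s, n, mu_{2n,0}, bound) for n = 3, 4."""
    rows = []
    for s in s_values:
        mu = ising_truncated_correlations(s)
        for n in (3, 4):
            rows.append((s, n, mu[2 * n], newman_upper_bound(mu[4], n)))
    return rows


if __name__ == "__main__":
    for s, n, lhs, rhs in newman_table():
        print(f"s = {s}: mu_{2 * n},0 = {lhs:.6g} <= {rhs:.6g}")
```

Both sides scale like s^{2n}, so at N = 0 the ratio of the two sides does not depend on s.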
The bounds (2.8) are extensively used in this paper. We here note the following facts:

(1) The right-hand side of (2.7) has a nonzero radius of convergence.

(2) It suffices to prove $\lim_{N\to\infty} \mu_{4,N} = 0$ in order to ensure that $\mu_{2n,N}$, n ≥ 3, converges to zero, hence the trajectory converges to the Gaussian fixed point.

2.3. Proof of Theorem 1.1. Let $h_0 = h_{I,s}$ and d = 4. Note the following simple observations on the “mass term” $\mu_{2,N}$, which is the variance of $h_N(x)\,dx$.

(1) $\mu_{2,N}$ is continuous in the Ising parameter s, because $h_N(x)\,dx$ is a result of a finite number of renormalization group transformations (1.2).

(2) $\mu_{2,N}$ is increasing in s, vanishes at s = 0, and diverges as s → ∞.

We then put, for N = 0, 1, 2, · · · ,
$$\underline{s}_N = \inf\left\{ s > 0 \mid \mu_{2,N} \ge 1 \right\}, \tag{2.11}$$
$$\overline{s}_N = \inf\left\{ s > 0 \;\Big|\; \mu_{2,N} \ge \min\left\{ 1 + \frac{3}{\sqrt{2}}\,\mu_{4,N},\ 2 + \sqrt{2} \right\} \right\}. \tag{2.12}$$
Obviously, we have $0 < \underline{s}_N \le \overline{s}_N < \infty$. Note also that
$$1 \le \mu_{2,N} \le 1 + \frac{3}{\sqrt{2}}\,\mu_{4,N} \tag{2.13}$$
holds for s ∈ [s N , s N ]. As is seen in Sect. 4, (2.13) is necessary for the model to be critical. We call this a critical mass condition. The following theorem states our result in the weak coupling regime and is proved in Sect. 4. Theorem 2.1. Let h0 = hI,s and d = 4. Assume that there exist integers N0 and N1 , satisfying N0 ≤ N1 , such that, for s ∈ [s N1 , s N1 ], the bounds 0 ≤ µ4,N0 ≤ 0.0045, 1.6µ24,N0
≤ µ6,N0 ≤
(2.14)
6.07µ24,N0 , 48.469µ34,N0 ,
(2.15)
N0 ≤ N < N1 ,
(2.17)
0 ≤ µ8,N0 ≤
(2.16)
and µ2,N < 2 +
√
2,
hold. Then there exists an sc ∈ [s N1 , s N1 ] such that if s = sc then lim µ4,N = 0,
N→∞
lim µ2,N = 1.
N→∞
Triviality of Hierarchical Ising Model in Four Dimensions
19
Fig. 2.1. A schematic view of trajectories on the $(\mu_2, \mu_4)$-plane in Theorem 2.1. Trajectories for $s = \underline{s}_{N_1}$ and for $s = \overline{s}_{N_1}$ (solid lines) and the critical trajectory for $s = s_c$ (broken line) are shown. The Gaussian fixed point corresponds to the point (1.0, 0). The region defined by inequalities for $(\mu_2, \mu_4)$ analogous to (2.13) and (2.14) (and (2.17)) is shaded.
Remark. The original Bleher–Sinai argument takes $N_0 = N_1$. We include the $N_0 < N_1$ case, which makes it possible to complete our proof by evaluating various quantities only at the 2 endpoints of the interval under consideration for the Ising parameter $s$, instead of at all values in the interval, as is implicit in the assumptions of Theorem 2.1. This point will be clarified at the end of Sect. 5.3.

The following theorem states our result in the strong coupling regime and is proved in Sect. 5.

Theorem 2.2. The assumptions of Theorem 2.1 are satisfied for $N_0 = 70$ and $N_1 = 100$, where $\underline{s}_{N_1}$ and $\overline{s}_{N_1}$ satisfy
\[ 1.7925671170092624 \le \underline{s}_{N_1}, \qquad \overline{s}_{N_1} \le 1.7925671170092625. \]

Proof of Theorem 1.1 for d = 4 assuming Theorem 2.1 and Theorem 2.2. Theorem 2.1 and Theorem 2.2 imply that there exists $s_c \in [\underline{s}_{N_1}, \overline{s}_{N_1}]$ such that, for $s = s_c$, $\lim_{N\to\infty}\mu_{4,N} = 0$ and $\lim_{N\to\infty}\mu_{2,N} = 1$ hold. Then (2.6), (2.7), and (2.8) imply
\[ \lim_{N\to\infty}\hat h_N(\xi) = e^{-\xi^2}, \]
uniformly in $\xi$ on any closed interval in $\mathbb{R}$. It is easy to see that $e^{-\xi^2}$ is the characteristic function of the massless Gaussian measure $h_G$, hence Theorem 1.1 holds for $d = 4$. The bounds on $\underline{s}_{N_1}$ and $\overline{s}_{N_1}$ in Theorem 2.2 imply
\[ 1.7925671170092624 \le s_c \le 1.7925671170092625. \]
3. Truncated Correlations

In this section, we prepare basic (recursive) bounds on the truncated correlations that will be used in Sect. 4. The renormalization group transformation is decomposed as (2.3). Since the mapping $S$ is simple, the essential part of our work is an analysis of $T$. The main result of this section is Proposition 3.1.

3.1. Recursions. Note first that in terms of $V_N$ the mapping $S$ can be expressed as
\[ S e^{-V_N}(\xi) = e^{-2V_N\left(\frac{\sqrt{c}}{2}\xi\right)}. \tag{3.1} \]
Using (2.7), (2.10), (1.4) we also have
\[ 2V_N\left(\frac{\sqrt{c}}{2}\xi\right) = \sum_{n=1}^{\infty} 2^{1-(1+2/d)n}\,\mu_{2n,N}\,\xi^{2n}. \tag{3.2} \]
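Next, write (2.5) as $Tg = \text{const.}\, g_{\beta/2}$, where
\[ g_t = \exp(-t\Delta)g, \tag{3.3} \]
$\Delta g(\xi) = \dfrac{d^2 g}{d\xi^2}(\xi)$, and $\beta = \dfrac{1}{2}(\sqrt{2} - 1)$ for $d = 4$. $g_t$ is a solution to
\[ \frac{\partial g_t}{\partial t} = -\Delta g_t, \qquad g_0 = g. \]
Hence, if we put $g_t(\xi) = \exp(-V_t(\xi))$, then $V_t$ satisfies
\[ \frac{d}{dt}V_t = (\nabla V_t)^2 - \Delta V_t, \tag{3.4} \]
where $\nabla V_t(\xi) = \dfrac{\partial V_t}{\partial \xi}(\xi)$. In other words, $V_{N+1}$ is given as a solution of (3.4) at $t = \beta/2$ (modulo constant term), with the initial condition (3.2) at $t = 0$. If we write
\[ V_t(\xi) = \sum_{n=0}^{\infty}\mu_{2n}(t)\,\xi^{2n}, \]
then (3.4) implies
\[ \frac{d}{dt}\mu_{2n}(t) = -(2n+2)(2n+1)\mu_{2n+2}(t) + \sum_{\ell=1}^{n}(2\ell)(2n-2\ell+2)\,\mu_{2\ell}(t)\,\mu_{2n-2\ell+2}(t). \tag{3.5} \]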
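As a quick consistency check, the general recursion (3.5) can be expanded for small $n$ and compared with the special cases (3.6)–(3.9). The sketch below only collects the integer coefficients of (3.5); the function name `rhs_terms` and the dictionary encoding are illustrative choices.

```python
def rhs_terms(n):
    """Right-hand side of (3.5) for d/dt mu_{2n}(t).

    Returns {indices: coefficient}, where indices is a sorted tuple of the
    subscripts of the mu factors, e.g. (2, 4) stands for mu_2(t)*mu_4(t).
    """
    terms = {(2 * n + 2,): -(2 * n + 2) * (2 * n + 1)}
    for l in range(1, n + 1):
        key = tuple(sorted((2 * l, 2 * n - 2 * l + 2)))
        terms[key] = terms.get(key, 0) + (2 * l) * (2 * n - 2 * l + 2)
    return terms

for n in range(1, 5):
    print(f"d/dt mu_{2 * n}:", rhs_terms(n))
```

For $n = 1$ this gives the coefficients $4$ for $\mu_2^2$ and $-12$ for $\mu_4$, and similarly for $n = 2, 3, 4$.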
In particular, we have
\[ \frac{d}{dt}\mu_2(t) = 4\mu_2(t)^2 - 12\mu_4(t), \tag{3.6} \]
\[ \frac{d}{dt}\mu_4(t) = 16\mu_2(t)\mu_4(t) - 30\mu_6(t), \tag{3.7} \]
\[ \frac{d}{dt}\mu_6(t) = 24\mu_2(t)\mu_6(t) + 16\mu_4(t)^2 - 56\mu_8(t), \tag{3.8} \]
\[ \frac{d}{dt}\mu_8(t) = 32\mu_2(t)\mu_8(t) + 48\mu_4(t)\mu_6(t) - 90\mu_{10}(t). \tag{3.9} \]
Thus, $\mu_{2n,N}$ and $\mu_{2n,N+1}$ are related for $d = 4$ by, e.g.,
\[ \mu_2(0) = \frac{1}{\sqrt{2}}\mu_{2,N}, \quad \mu_4(0) = \frac{1}{4}\mu_{4,N}, \quad \mu_6(0) = \frac{1}{8\sqrt{2}}\mu_{6,N}, \quad \mu_8(0) = \frac{1}{32}\mu_{8,N}, \]
\[ \mu_{2,N+1} = \mu_2\left(\frac{\beta}{2}\right), \quad \mu_{4,N+1} = \mu_4\left(\frac{\beta}{2}\right), \quad \mu_{6,N+1} = \mu_6\left(\frac{\beta}{2}\right), \quad \mu_{8,N+1} = \mu_8\left(\frac{\beta}{2}\right). \]

3.2. Bounds. We first note that the quantities $\mu_n(t)$ obey Newman’s inequalities: by comparing (2.5) and (3.3) we see that the correspondence $V_N \to V(t)$ is obtained by a replacement $\beta \to 2t$ in (1.2). Therefore $\mu_n(t)$ also is a truncated $n$ point correlation of a measure to which the arguments in [15] apply, hence an analogue of (2.8) holds:
\[ 0 \le \mu_{2n}(t) \le \frac{1}{n}(2\mu_4(t))^{n/2}, \quad n = 3, 4, 5, \cdots. \tag{3.10} \]
We have to show decay of $\mu_{4,N}$ as $N \to \infty$. In case $d > 4$, the decay follows from (3.6) and (3.7) with $d$-dependent coefficients, namely, if we throw out the negative contributions $-\mu_4(t)$ and $-\mu_6(t)$ to the right-hand sides of (3.6) and (3.7), respectively, then we have upper bounds on $\mu_2(t)$ and $\mu_4(t)$. This argument eventually yields exponential decay of $\mu_{4,N}$. In case $d = 4$, the situation is more subtle, since the decay of $\mu_{4,N}$ is weak, i.e., power-like instead of exponential. In order to derive the delicate bound on $\mu_4(t)$, a lower bound for $\mu_6(t)$ must be incorporated, which in turn needs an upper bound on $\mu_8(t)$. Thus, we have to deal with Eqs. (3.6)–(3.9). This is the principle of our estimation. The result is the following:

Proposition 3.1. Let $d = 4$ and $N$ be a positive integer, and put
\[ r_N = \frac{1}{1 - (\sqrt{2}-1)(\mu_{2,N}-1)} = \frac{1}{\sqrt{2} - (\sqrt{2}-1)\mu_{2,N}}, \tag{3.11} \]
\[ \zeta_N = \frac{\sqrt{2}\,r_N - 1}{\sqrt{2}\,\mu_{2,N}} = \frac{r_N}{\mu_{2,N}} - \frac{1}{\sqrt{2}\,\mu_{2,N}}. \tag{3.12} \]
(i) If
\[ \mu_{2,N} < 2 + \sqrt{2}, \tag{3.13} \]
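then
\[ \mu_{2,N+1} \le r_N\mu_{2,N}, \tag{3.14} \]
\[ \mu_{2,N+1} \ge r_N\mu_{2,N} - 3r_N^2\zeta_N\mu_{4,N}. \tag{3.15} \]
(ii) If, furthermore,
\[ \frac{\mu_{4,N}}{4} \ge \frac{15}{8\sqrt{2}}\zeta_N\mu_{6,N} + \frac{21}{4}\zeta_N^2\mu_{4,N}^2, \tag{3.16} \]
\[ \frac{\mu_{6,N}}{8\sqrt{2}} + \frac{1}{2}\zeta_N\mu_{4,N}^2 \ge 24\zeta_N^3\mu_{4,N}^3 + \frac{123}{8\sqrt{2}}\zeta_N^2\mu_{4,N}\mu_{6,N} + \frac{7}{8}\zeta_N\mu_{8,N}, \tag{3.17} \]
\[ \frac{3}{2}\zeta_N\mu_{4,N} \ge 12\zeta_N^3\mu_{4,N}^2 + \frac{45}{8\sqrt{2}}\zeta_N^2\mu_{6,N}, \tag{3.18} \]
then
\[ \mu_{2,N+1} \le r_N\mu_{2,N} - 3r_N^2\left( \zeta_N\mu_{4,N} - 8\zeta_N^3\mu_{4,N}^2 - \frac{15}{4\sqrt{2}}\zeta_N^2\mu_{6,N} \right), \tag{3.19} \]
\[ \mu_{4,N+1} \ge r_N^4\left( \mu_{4,N} - \frac{15}{2\sqrt{2}}\zeta_N\mu_{6,N} - 21\zeta_N^2\mu_{4,N}^2 \right), \tag{3.20} \]
\[ \mu_{4,N+1} \le r_N^4\left( \mu_{4,N} - \frac{15}{2\sqrt{2}}\zeta_N\mu_{6,N} - 21\zeta_N^2\mu_{4,N}^2 + \frac{705}{2\sqrt{2}}\zeta_N^3\mu_{4,N}\mu_{6,N} + 447\zeta_N^4\mu_{4,N}^3 + \frac{105}{4}\zeta_N^2\mu_{8,N} \right), \tag{3.21} \]
\[ \mu_{6,N+1} \le r_N^6\left( \frac{\mu_{6,N}}{\sqrt{2}} + 4\zeta_N\mu_{4,N}^2 \right), \tag{3.22} \]
\[ \mu_{6,N+1} \ge r_N^6\left( \frac{\mu_{6,N}}{\sqrt{2}} + 4\zeta_N\mu_{4,N}^2 - 192\zeta_N^3\mu_{4,N}^3 - \frac{123}{\sqrt{2}}\zeta_N^2\mu_{4,N}\mu_{6,N} - 7\zeta_N\mu_{8,N} \right), \tag{3.23} \]
\[ \mu_{8,N+1} \le r_N^8\left( \frac{\mu_{8,N}}{2} + \frac{12}{\sqrt{2}}\zeta_N\mu_{4,N}\mu_{6,N} + 24\zeta_N^2\mu_{4,N}^3 \right). \tag{3.24} \]
The rest of this section is devoted to a proof of Proposition 3.1.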
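Proof. Now, observe that $\bar\mu_2(t)$ defined by
\[ \frac{d}{dt}\bar\mu_2(t) = 4\bar\mu_2(t)^2, \qquad \bar\mu_2(0) = \frac{1}{\sqrt{2}}\mu_{2,N}, \tag{3.25} \]
is an upper bound of $\mu_2(t)$:
\[ \mu_2(t) \le \bar\mu_2(t) = \frac{\mu_{2,N}}{\sqrt{2}}\,\frac{1}{1 - 2\sqrt{2}\,\mu_{2,N}t}. \tag{3.26} \]
This, at $t = \dfrac{\beta}{2} = \dfrac{\sqrt{2}-1}{4}$ for $d = 4$, implies (3.14).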
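Put
\[ M(t) = \frac{1}{1 - 2\sqrt{2}\,\mu_{2,N}t}, \qquad m(t) = \bar\mu_2(t) - \mu_2(t). \]
We have $m(t) \ge 0$, and (3.13) implies that $M(t)$ is increasing in $t \in [0, \beta/2]$. By a change of variable $z = M(t) - 1$ ($dz = 2\sqrt{2}\,\mu_{2,N}M(t)^2\,dt$) and by putting
\[ \hat m(z) = m(t)/M(t)^2, \quad \hat\mu_4(z) = \mu_4(t)/M(t)^4, \quad \hat\mu_6(z) = \mu_6(t)/M(t)^6, \quad \hat\mu_8(z) = \mu_8(t)/M(t)^8, \]
we have, from (3.6)–(3.9),
\[ \hat\mu_4(z) = \frac{\mu_{4,N}}{4} + \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z \left( -8\hat m(z)\hat\mu_4(z) - 15\hat\mu_6(z) \right) dz, \tag{3.27} \]
\[ \hat\mu_6(z) = \frac{\mu_{6,N}}{8\sqrt{2}} + \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z \left( 8\hat\mu_4(z)^2 - 12\hat m(z)\hat\mu_6(z) - 28\hat\mu_8(z) \right) dz, \tag{3.28} \]
\[ \hat\mu_8(z) = \frac{\mu_{8,N}}{32} + \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z \left( 24\hat\mu_4(z)\hat\mu_6(z) - 16\hat m(z)\hat\mu_8(z) - 45\hat\mu_{10}(z) \right) dz, \tag{3.29} \]
\[ \hat m(z) = \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z \left( 6\hat\mu_4(z) - 2\hat m(z)^2 \right) dz. \tag{3.30} \]
Eqs. (3.27)–(3.30) with positivity of $\mu_{2n}(t)$ imply
\[ \hat\mu_4(z) \le \frac{\mu_{4,N}}{4}, \tag{3.31} \]
\[ \hat\mu_6(z) \le \frac{\mu_{6,N}}{8\sqrt{2}} + \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z 8\hat\mu_4(z)^2\,dz \le \frac{\mu_{6,N}}{8\sqrt{2}} + \frac{\mu_{4,N}^2}{2\sqrt{2}\,\mu_{2,N}}z, \tag{3.32} \]
\[ \hat\mu_8(z) \le \frac{\mu_{8,N}}{32} + \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z 24\hat\mu_4(z)\hat\mu_6(z)\,dz \le \frac{\mu_{8,N}}{32} + \frac{3}{8}\frac{\mu_{4,N}\mu_{6,N}}{\mu_{2,N}}z + \frac{3}{4}\frac{\mu_{4,N}^3}{\mu_{2,N}^2}z^2, \tag{3.33} \]
\[ \hat m(z) \le \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z 6\hat\mu_4(z)\,dz \le \frac{3\mu_{4,N}}{2\sqrt{2}\,\mu_{2,N}}z. \tag{3.34} \]
In particular, (3.34) at $t = \dfrac{\beta}{2}$ ($z = M(\frac{\beta}{2}) - 1 = \sqrt{2}\,r_N - 1$ for $d = 4$) implies (3.15).

Using (3.31), (3.32), (3.34) in (3.27), we have
\[ \hat\mu_4(z) \ge \frac{\mu_{4,N}}{4} - \frac{15\mu_{6,N}}{16\mu_{2,N}}z - \frac{21\mu_{4,N}^2}{8\mu_{2,N}^2}z^2. \tag{3.35} \]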
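Using (3.32), (3.33), (3.34), (3.35) in (3.28) and (3.30) we further have
\[ \hat\mu_6(z) \ge \frac{\mu_{6,N}}{8\sqrt{2}} + \frac{\mu_{4,N}^2}{2\sqrt{2}\,\mu_{2,N}}z - \frac{12\mu_{4,N}^3}{\sqrt{2}\,\mu_{2,N}^3}z^3 - \frac{123\mu_{4,N}\mu_{6,N}}{16\sqrt{2}\,\mu_{2,N}^2}z^2 - \frac{7\mu_{8,N}}{8\sqrt{2}\,\mu_{2,N}}z, \tag{3.36} \]
\[ \hat m(z) \ge \frac{3\mu_{4,N}}{2\sqrt{2}\,\mu_{2,N}}z - \frac{6\mu_{4,N}^2}{\sqrt{2}\,\mu_{2,N}^3}z^3 - \frac{45\mu_{6,N}}{16\sqrt{2}\,\mu_{2,N}^2}z^2. \tag{3.37} \]
When $d = 4$, $\dfrac{\beta}{2} = \dfrac{\sqrt{2}-1}{4}$ and $z = M(\frac{\beta}{2}) - 1 = \sqrt{2}\,r_N - 1$ (that is, $M(\frac{\beta}{2}) = \sqrt{2}\,r_N$). Then the assumptions (3.16)–(3.18) of Proposition 3.1 imply that the right-hand sides of (3.35), (3.36), and (3.37) are non-negative at $t = \dfrac{\beta}{2}$. On the other hand, they are concave in $z$ for $z \ge 0$. Recall also that $z = M(t) - 1$ is increasing in $t \in [0, \beta/2]$. Therefore, they are non-negative for all $t \in [0, \beta/2]$. Using (3.35), (3.36), and (3.37) in (3.27), we therefore have
\[ \hat\mu_4(z) \le \frac{\mu_{4,N}}{4} - \frac{1}{\sqrt{2}\,\mu_{2,N}}\int_0^z \Bigg\{ 8\left( \frac{3\mu_{4,N}}{2\sqrt{2}\,\mu_{2,N}}z - \frac{6\mu_{4,N}^2}{\sqrt{2}\,\mu_{2,N}^3}z^3 - \frac{45\mu_{6,N}}{16\sqrt{2}\,\mu_{2,N}^2}z^2 \right) \times \left( \frac{\mu_{4,N}}{4} - \frac{15\mu_{6,N}}{16\mu_{2,N}}z - \frac{21\mu_{4,N}^2}{8\mu_{2,N}^2}z^2 \right) \]
\[ \qquad + 15\left( \frac{\mu_{6,N}}{8\sqrt{2}} + \frac{\mu_{4,N}^2}{2\sqrt{2}\,\mu_{2,N}}z - \frac{12\mu_{4,N}^3}{\sqrt{2}\,\mu_{2,N}^3}z^3 - \frac{123\mu_{4,N}\mu_{6,N}}{16\sqrt{2}\,\mu_{2,N}^2}z^2 - \frac{7\mu_{8,N}}{8\sqrt{2}\,\mu_{2,N}}z \right) \Bigg\}\,dz \]
\[ \le \frac{\mu_{4,N}}{4} - \frac{15\mu_{6,N}}{16\mu_{2,N}}z - \frac{21\mu_{4,N}^2}{8\mu_{2,N}^2}z^2 + \frac{705\mu_{4,N}\mu_{6,N}}{32\mu_{2,N}^3}z^3 + \frac{447\mu_{4,N}^3}{16\mu_{2,N}^4}z^4 + \frac{105\mu_{8,N}}{32\mu_{2,N}^2}z^2. \tag{3.38} \]
Recalling that at $t = \beta/2$ ($z = M(\frac{\beta}{2}) - 1 = \sqrt{2}\,r_N - 1$) we have
\[ \bar\mu_2\left(\frac{\beta}{2}\right) = r_N\mu_{2,N}, \]
\[ \mu_{2,N+1} = r_N\mu_{2,N} - \hat m(\sqrt{2}\,r_N - 1)\,M\left(\frac{\beta}{2}\right)^2, \]
\[ \mu_{4,N+1} = \hat\mu_4(\sqrt{2}\,r_N - 1)\,M\left(\frac{\beta}{2}\right)^4, \]
\[ \mu_{6,N+1} = \hat\mu_6(\sqrt{2}\,r_N - 1)\,M\left(\frac{\beta}{2}\right)^6, \]
\[ \mu_{8,N+1} = \hat\mu_8(\sqrt{2}\,r_N - 1)\,M\left(\frac{\beta}{2}\right)^8, \]
we see that (3.37), (3.35), (3.38), (3.32), (3.36), (3.33) imply (3.19)–(3.24), respectively. This completes the proof of Proposition 3.1.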
4. Bleher–Sinai Argument

In order to show Theorem 2.1, we confirm the existence of a critical parameter $s = s_c$ by means of the Bleher–Sinai argument, and, at the same time, we derive the expected decay of $\mu_{4,N}$. In the Bleher–Sinai argument, monotonicity of $\underline{s}_N$ and $\overline{s}_N$ with respect to $N$ is essential.

Proposition 4.1. Let $d = 4$. Then the following hold:
(1) If $\mu_{2,N} - 1 < 0$ then $\mu_{2,N+1} < \mu_{2,N}$.
(2) If $\dfrac{1}{4} > \mu_{2,N} - 1 \ge \dfrac{3}{\sqrt{2}}\mu_{4,N}$ then $\mu_{2,N+1} \ge \mu_{2,N}$.

Proof. Note that for both cases in the statement, the assumption (3.13) in Proposition 3.1 holds. Hence, (3.14), with (3.11) and monotonicity of $\mu_{2,N}$, implies
\[ \mu_{2,N} - 1 < 0 \ \Rightarrow\ r_N < 1 \ \Rightarrow\ \mu_{2,N+1} < \mu_{2,N}. \tag{4.1} \]
Next we see that (3.15), with (3.11) and (3.12), implies
\[ \frac{\mu_{2,N} - 1}{\mu_{4,N}} \ge \frac{3r_N(\sqrt{2}\,r_N - 1)}{(2 - \sqrt{2})\,\mu_{2,N}^2} \ \Rightarrow\ \mu_{2,N+1} \ge \mu_{2,N}. \tag{4.2} \]
Put
\[ L_1(x) = \frac{3}{\sqrt{2}\,x\left(\sqrt{2} - (\sqrt{2}-1)x\right)^2}. \]
Then by straightforward calculation we see
\[ 1 \le x \le \frac{5}{4} \ \Rightarrow\ L_1(x) \le L_1(1) = \frac{3}{\sqrt{2}}, \]
and (3.11) implies
\[ L_1(\mu_{2,N}) = \frac{3r_N(\sqrt{2}\,r_N - 1)}{(2 - \sqrt{2})\,\mu_{2,N}^2}. \]
Therefore (4.2) implies that
\[ \frac{1}{4} > \mu_{2,N} - 1 \ge \frac{3}{\sqrt{2}}\mu_{4,N} \ \Rightarrow\ \mu_{2,N+1} \ge \mu_{2,N}. \tag{4.3} \]

Corollary 4.2. Let $d = 4$. Then, for the $\underline{s}_N$ defined in (2.11), it holds that $\underline{s}_N \le \underline{s}_{N+1}$.

Proof. Since $\mu_{2,N}$ is increasing in $s$, if $s < \underline{s}_N$ then $\mu_{2,N} < 1$, hence Proposition 4.1 implies $\mu_{2,N+1} < \mu_{2,N} < 1$, further implying $s < \underline{s}_{N+1}$. Hence the statement holds.
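The “straightforward calculation” in the proof of Proposition 4.1 can be spot-checked numerically. The sketch below samples $L_1$ on a grid over $[1, 5/4]$ and also compares $L_1(\mu_{2,N})$ with the right-hand side of (4.2) computed from (3.11); the grid size and function names are illustrative choices, and a grid check is of course not a proof.

```python
import math

SQRT2 = math.sqrt(2.0)

def r(mu2):
    """r_N as a function of mu_{2,N}, from (3.11)."""
    return 1.0 / (SQRT2 - (SQRT2 - 1.0) * mu2)

def L1(x):
    """The auxiliary function L_1 from the proof of Proposition 4.1."""
    return 3.0 / (SQRT2 * x * (SQRT2 - (SQRT2 - 1.0) * x) ** 2)

def rhs_42(mu2):
    """Right-hand side of the hypothesis in (4.2), written in terms of r_N."""
    rN = r(mu2)
    return 3.0 * rN * (SQRT2 * rN - 1.0) / ((2.0 - SQRT2) * mu2 ** 2)

# Sample the interval 1 <= x <= 5/4.
grid = [1.0 + 0.25 * i / 1000 for i in range(1001)]
max_L1 = max(L1(x) for x in grid)
max_gap = max(abs(L1(x) - rhs_42(x)) for x in grid)

print("max of L1 on the grid:", max_L1)
print("L1(1) = 3/sqrt(2):    ", L1(1.0))
print("max |L1 - rhs of (4.2)|:", max_gap)
```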
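For later convenience, define
\[ r_N^* = \frac{1}{1 - (\sqrt{2}-1)\dfrac{3}{\sqrt{2}}\mu_{4,N}}, \tag{4.4} \]
\[ \zeta_{*N} = 1 - \frac{1}{\sqrt{2}}, \tag{4.5} \]
\[ \zeta_N^* = \frac{\sqrt{2}\,r_N^* - 1}{\sqrt{2}\left(1 + \dfrac{3}{\sqrt{2}}\mu_{4,N}\right)}. \tag{4.6} \]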
Then we see that if (2.13) holds, then we have, from (3.11) and (3.12), 1 2M, n n c n c n n n n aM,N a",N an−",N ≤ a1,N × M bn,N = " " 4 4 a1,N "=0 "=0 c a n a 1,N M,N = . M 2 a1,N
(5.26)
Therefore 2a¯ ",N ≤
aM,N c a1,N " M 2 a1,N
≤
aM,N M a1,N
=
aM,N M a1,N
=
aM,N M a1,N
∞
m (2m + 2" − 1)!! (2m)!! (2" − 1)!! m=2M+1−"
∞ c a " m m + " 1,N βc a1,N " 2 m=2M+1−"
∞ c a " 2M+1−" k 2M + 1 + k 1,N βc a1,N βc a1,N " 2 k=0
"
∞ 2M+1 k 2M + 1 + k 1 . (5.27) βc a1,N βc a1,N " 2β βc a1,N
k=0
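Here,
\[ T_{2M+1,\ell}(r) = \sum_{k=0}^{\infty}\left(\beta c\,a_{1,N}\right)^k\binom{2M+1+k}{\ell} = \sum_{k=0}^{\infty} r^k\binom{2M+1+k}{\ell} = \frac{1}{1-r}\sum_{m=0}^{\ell}\binom{2M+1}{\ell-m}q^m, \tag{5.28} \]
where $r = \beta c\,a_{1,N}$, and $q = \dfrac{r}{1-r}$. By assumption $r < \frac{1}{2}$. The binomial coefficient in the summand is largest when $m = 0$, because $2M + 1 > 2M \ge 2\ell$. Therefore,
\[ T_{2M+1,\ell}(r) \le \frac{1}{1-r}\binom{2M+1}{\ell}\sum_{m=0}^{\ell}q^m \le \frac{1}{1-r}\,\frac{1}{1-q}\binom{2M+1}{\ell} = \frac{1}{1-2r}\binom{2M+1}{\ell}. \tag{5.29} \]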
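The resummation (5.28) and the bound (5.29) can be checked with exact rational arithmetic. The following sketch compares a long partial sum of the series with the closed form; the sample values $M = 3$, $r = 1/4$ and the truncation at 200 terms are hypothetical test choices.

```python
from fractions import Fraction
from math import comb

def T_closed(n, l, r):
    """Closed form on the right of (5.28), with n standing for 2M+1."""
    q = r / (1 - r)
    return sum(comb(n, l - m) * q ** m for m in range(l + 1)) / (1 - r)

def T_partial(n, l, r, terms):
    """Partial sum of the series sum_k r^k * C(n+k, l) in (5.28)."""
    return sum(r ** k * comb(n + k, l) for k in range(terms))

M = 3
n = 2 * M + 1
r = Fraction(1, 4)  # must satisfy r < 1/2

for l in range(M + 1):
    closed = T_closed(n, l, r)
    partial = T_partial(n, l, r, 200)
    bound = Fraction(comb(n, l)) / (1 - 2 * r)  # right-hand side of (5.29)
    print(l, float(closed), float(closed - partial), float(bound))
```

The series has only positive terms, so each partial sum lies below the closed form and approaches it geometrically.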
This proves
2a¯ ",N ≤
1 2β
"
2M+1
βc a1,N aM,N 2M + 1 × M ≤ 2a¯ ",N , " 1 − 2βc a1,N a1,N
where 2a¯ ",N is defined in (5.14). This proves a˜ n,N ≤ a˜¯ n,N .
(5.30)
Remark. We can “improve” Proposition 5.1 by employing (correct) bounds, in a similar way as the term proportional to $\left(\dfrac{c\,\bar a_{1,N}}{2}\right)^n$ in (5.9). In actual calculations, we improve $\bar a_{n,N+1}$, $n = 1, 2, \cdots, M$, in (5.12), the upper bounds for the $a_{n,N+1}$’s, using (A.6) (as well as its special case (5.5)). To be more specific, we compare $\bar a_{4,N+1}$ in (5.12) with $\bar a_{2,N+1}^2$ and replace the definition if the latter is smaller. Then we go on to “improve” $\bar a_{6,N+1}$ by comparing with $\bar a_{2,N+1}\bar a_{4,N+1}$, and so on. Conceptually there is nothing really new here, but this procedure improves the actual value of the bounds in Proposition 5.1.
5.3. Computer results. In this subsection we prove Theorem 2.2 on computers using Proposition 5.1. We double checked the results with Mathematica and C++ programs based on interval arithmetic. Here we will give results from the C++ programs.

Our program employs interval arithmetic, which gives rigorous bounds numerically. The idea is to express a number by a pair of “vectors”, each of which consists of an array of length $M$ of “digits”, taking values in $\{0, 1, 2, \cdots, 9\}$, and an integer corresponding to the “exponent”. To give a simple example, let $M = 2$. One can view that 0.0523 is expressed in the program, for example, as $I_1 = [5.2 \times 10^{-2}, 5.3 \times 10^{-2}]$, and 3 is expressed as $I_2 = [3.0 \times 10^{0}, 3.0 \times 10^{0}]$. When the division $I_1/I_2$ is performed, our program routines are so designed that they give correct bounds as an output. Namely, the computer output of $I_1/I_2$ will be $[1.7 \times 10^{-2}, 1.8 \times 10^{-2}]$. We may occasionally lose the best possible bounds, but the program is so designed that we never lose the correctness of the bounds. Thus all the outputs are rigorous bounds of the corresponding quantities. In the actual calculation we took $M = 70$ digits, which turned out to be sufficient.

We also note that interval arithmetic is employed in [14] for the hierarchical model in $d = 3$ dimensions. We took an independent approach in programming (we focused on ease in implementing the interval arithmetic in main programs developed for standard floating point calculations), so that the structure and details of the programs are quite different. However, our numerical calculations are not so heavy as to require anything special. For the program which we used for our proof, see the supplement to [17].

As will be explained below, we only need to consider 2 values for the initial Ising parameter $s$: $s_- = 1.7925671170092624$, and $s_+ = 1.7925671170092625$. We perform explicit recursion on computers for each $s = s_\pm$ using Proposition 5.1. We summarize what is left to be proved:

(1) $\bar a_{1,N} < \dfrac{1}{2\beta c}$, $0 \le s \le \overline{s}_{N_1}$, $0 \le N \le N_1$, where $N_1 = 100$. This condition is from (5.15), imposed because we are going to do the evaluation using Proposition 5.1. Note that this condition is stronger than (2.17) in the assumptions of Theorem 2.1, because $\dfrac{1}{2\beta c} = \dfrac{1}{2}(2 + \sqrt{2}) = 1.707\cdots$ for $d = 4$.

(2) $s_- \le \underline{s}_{N_1}$ and $\overline{s}_{N_1} \le s_+$. To prove this, it is sufficient (as seen from the definitions (2.11) and (2.12)) to prove
\[ \mu_{2,N_1} < 1 \ \text{ when } s = s_-, \quad \text{and} \quad \mu_{2,N_1} \ge 1 + \frac{3}{\sqrt{2}}\mu_{4,N_1} \ \text{ when } s = s_+. \tag{5.31} \]

(3) For any $s$ satisfying $s_- \le s \le s_+$, the bounds
\[ (0 \le)\ \mu_{4,N_0} \le 0.0045, \tag{5.32} \]
\[ 1.6\,\mu_{4,N_0}^2 \le \mu_{6,N_0} \le 6.07\,\mu_{4,N_0}^2, \tag{5.33} \]
\[ (0 \le)\ \mu_{8,N_0} \le 48.469\,\mu_{4,N_0}^3, \tag{5.34} \]
hold for $N_0 = 70$. This condition comes from the assumptions in Theorem 2.1 (sufficient, if $s_- \le \underline{s}_{N_1}$ and $\overline{s}_{N_1} \le s_+$).

We now summarize our results from explicit calculations.
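Before the results are listed, the outward-rounding convention of the $M = 2$ example given at the start of this subsection can be illustrated by a minimal sketch. It uses Python's standard `decimal` module rather than the authors' C++ routines (which are in the supplement to [17]); the names `enclose` and `divide` are hypothetical, and only division of a non-negative interval by a positive one is handled.

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING

DIGITS = 2  # plays the role of M in the example of the text
DOWN = Context(prec=DIGITS, rounding=ROUND_FLOOR)    # rounds toward -infinity
UP = Context(prec=DIGITS, rounding=ROUND_CEILING)    # rounds toward +infinity

def enclose(text):
    """Smallest DIGITS-digit interval [lo, hi] containing the decimal literal."""
    return (DOWN.create_decimal(text), UP.create_decimal(text))

def divide(a, b):
    """Interval enclosing {x / y : x in a, y in b}, for a >= 0 and b > 0 only."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo < 0 or b_lo <= 0:
        raise ValueError("this sketch handles only a >= 0 and b > 0")
    # Lower end: smallest numerator over largest denominator, rounded down.
    # Upper end: largest numerator over smallest denominator, rounded up.
    return (DOWN.divide(a_lo, b_hi), UP.divide(a_hi, b_lo))

I1 = enclose("0.0523")
I2 = enclose("3")
print(I1)
print(divide(I1, I2))
```

Rounding the lower endpoint down and the upper endpoint up at every operation is what keeps the enclosure valid, at the cost of occasionally widening it.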
(1) We have $\bar a_{1,N} \le \dfrac{1}{2}s_+^2 = 1.6066\cdots$, $0 \le s \le s_+$, $0 \le N \le N_1$. The largest value for $\bar a_{1,N}$ in the range of parameters is actually obtained at $s = s_+$ and $N = 0$.

(2) Our calculations turned out to be accurate enough to obtain more than 40 digits below the decimal point correctly for $\mu_{2,100}$ and $\mu_{4,100}$ at $s = s_\pm$, which is more than enough to prove (5.31). In fact, we have
\[ 0.99609586499804791366176669341357334889503943 \le \underline{a}_{1,100} \le \mu_{2,100} \le \bar a_{1,100} \le 0.99609586499804791366176669341357334889503972, \]
at $s = s_-$, and
\[ 1.0131857903720691722396611098376636943838027 \le \underline{a}_{1,100} \le \mu_{2,100} \le \bar a_{1,100} \le 1.0131857903720691722396611098376636943838031, \]
\[ 0.00281027097809098768088795100753480139767915 \le \tfrac{1}{2}\left(-\bar a_{2,100} + \underline{a}_{1,100}^2\right) \le \mu_{4,100} \le \tfrac{1}{2}\left(-\underline{a}_{2,100} + \bar a_{1,100}^2\right) \le 0.00281027097809098768088795100753480139767969, \]
at $s = s_+$.

(3) To prove (5.32)–(5.34), we note the following. Let us write the $s$ dependences of $a_{n,N}$ and $\mu_{n,N}$ explicitly like $a_{n,N}(s)$ and $\mu_{n,N}(s)$. For any integer $N$ and for any $s$ satisfying $s_- \le s \le s_+$, the monotonicity of $a_{n,N}(s)$ with respect to $s$ implies
\[ \mu_{4,N}(s) = \frac{1}{2}\left(-a_{2,N}(s) + a_{1,N}(s)^2\right) \le \frac{1}{2}\left(-a_{2,N}(s_-) + a_{1,N}(s_+)^2\right) =: \bar\mu_{4,N}. \tag{5.35} \]
Hence if we can prove $\bar\mu_{4,70} \le 0.0045$, then we have proved (5.32). In a similar way, sufficient conditions for (5.33) and (5.34) are
\[ 1.6 \le \frac{\underline{\mu}_{6,70}}{\bar\mu_{4,70}^2}, \qquad \frac{\bar\mu_{6,70}}{\underline{\mu}_{4,70}^2} \le 6.07, \qquad \frac{\bar\mu_{8,70}}{\underline{\mu}_{4,70}^3} \le 48.469, \]
with obvious definitions (as in (5.35) for $\bar\mu_{4,N}$) for $\underline{\mu}_{n,70}$ and $\bar\mu_{n,70}$. The bounds we have for these quantities are (we shall not waste space by writing too many digits):
\[ \bar\mu_{4,70} \le 0.004144, \qquad 3.6459 \le \frac{\underline{\mu}_{6,70}}{\bar\mu_{4,70}^2}, \qquad \frac{\bar\mu_{6,70}}{\underline{\mu}_{4,70}^2} \le 3.7542, \qquad \frac{\bar\mu_{8,70}}{\underline{\mu}_{4,70}^3} \le 38.488. \]
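The final comparison of the reported bounds with the hypotheses is elementary arithmetic and can be reproduced directly. The sketch below uses only the numbers quoted in this subsection; the variable names are illustrative, and the form of (5.31) used here (comparison of $\mu_{2,100}$ with $1$ at $s_-$ and with $1 + \frac{3}{\sqrt{2}}\mu_{4,100}$ at $s_+$) is read off from the definitions (2.11) and (2.12).

```python
from decimal import Decimal as D, getcontext

getcontext().prec = 60
SQRT2 = D(2).sqrt()

# Condition (1): bar a_{1,N} <= s_+^2 / 2 must stay below 1/(2 beta c) = (2 + sqrt 2)/2.
threshold = (2 + SQRT2) / 2
s_plus = D("1.7925671170092625")
a1_max = s_plus ** 2 / 2

# Condition (2), i.e. (5.31), from the quoted enclosures at N = 100.
mu2_upper_at_s_minus = D("0.99609586499804791366176669341357334889503972")
mu2_lower_at_s_plus = D("1.0131857903720691722396611098376636943838027")
mu4_upper_at_s_plus = D("0.00281027097809098768088795100753480139767969")
cond_lower = mu2_upper_at_s_minus < 1
cond_upper = mu2_lower_at_s_plus > 1 + 3 / SQRT2 * mu4_upper_at_s_plus

# Condition (3), i.e. (5.32)-(5.34), from the quoted bounds at N_0 = 70.
checks_N0 = [
    D("0.004144") <= D("0.0045"),   # (5.32)
    D("1.6") <= D("3.6459"),        # lower half of (5.33)
    D("3.7542") <= D("6.07"),       # upper half of (5.33)
    D("38.488") <= D("48.469"),     # (5.34)
]

print(a1_max, threshold)
print(cond_lower, cond_upper, all(checks_N0))
```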
This completes the proof of Theorem 2.2, and therefore Theorem 1.1 is proved.

Acknowledgement. The authors would like to thank Yoichiro Takahashi for his interest in the present work and for discussions. Part of this work was done while T. Hara was at the Department of Mathematics, Tokyo Institute of Technology. The research of T. Hara and T. Hattori is partially supported by a Grant-in-Aid for Scientific Research (C) of the Ministry of Education, Science, Sports and Culture.
A. Newman’s Inequalities

Let $X$ be a stochastic variable which is in class $\mathcal{L}$ of [15]. $X \in \mathcal{L}$ has the Lee–Yang property, which states that the zeros of the moment generating function $E\left[e^{HX}\right]$ are pure imaginary. In fact, it is shown in [15, Prop. 2] using Hadamard’s Theorem that $E\left[e^{HX}\right]$ has the following expression:
\[ E\left[e^{HX}\right] = e^{bH^2}\prod_j\left(1 + \frac{H^2}{\alpha_j^2}\right), \tag{A.1} \]
where $b$ is a non-negative constant and $\alpha_j$, $j = 1, 2, 3, \cdots$, is a positive nondecreasing sequence satisfying $\sum_{j=1}^{\infty}\alpha_j^{-2} < \infty$.

Consequences of (A.1) in terms of inequalities among moments ($n$ point functions) are given in [15], among which we note the following:

1. Positivity [15, Theorem 3]. Put
\[ \mu_{2n} = -\frac{1}{(2n)!}\,\frac{d^{2n}}{d\xi^{2n}}\log E\left[e^{\sqrt{-1}\,\xi X}\right]\bigg|_{\xi=0}. \tag{A.2} \]
Then,
\[ \mu_{2n} \ge 0, \quad n = 0, 1, 2, \cdots. \tag{A.3} \]
(Note that (A.1) implies $\mu_{2n+1} = 0$.)

2. Newman’s bound [15, Theorem 6]. Put $v_{2n} = n\mu_{2n}$. Then,
\[ v_{4n} \le v_4^{\,n}, \qquad v_6 \le \sqrt{v_4 v_8}, \qquad v_{4n+2} \le v_6\,v_4^{\,n-1}, \tag{A.4} \]
where the first and third inequalities follow from (2.10) of [15], while the second one is (2.12) of [15]. These imply $v_{2n} \le v_4^{\,n/2}$, $n \ge 2$, and therefore
\[ \mu_{2n} \le \frac{(2\mu_4)^{n/2}}{n}, \quad n = 2, 3, 4, \cdots. \tag{A.5} \]

Furthermore, we will prove the following.

Proposition A.1. Put $a_N = \dfrac{N!}{(2N)!}E\left[X^{2N}\right]$, $N \in \mathbb{Z}_+$. Then,
\[ a_{M+N} \le a_M a_N, \quad N, M = 0, 1, 2, \cdots. \tag{A.6} \]

Proof. Put $y_j = \alpha_j^{-2} > 0$. Then
\[ E\left[e^{HX}\right] = e^{bH^2}\prod_j\left(1 + H^2 y_j\right). \tag{A.7} \]
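Expand the infinite product to obtain
\[ \prod_j\left(1 + H^2 y_j\right) = 1 + H^2\sum_j y_j + \frac{H^4}{2!}\sum_{i,j}{}^{\prime}\,y_i y_j + \frac{H^6}{3!}\sum_{i,j,k}{}^{\prime}\,y_i y_j y_k + \cdots = \sum_{n=0}^{\infty}\frac{H^{2n}}{n!}c_n, \tag{A.8} \]
with
\[ c_n = \sum_{i_1, i_2, \ldots, i_n}{}^{\prime}\,y_{i_1}y_{i_2}y_{i_3}\cdots y_{i_n}, \tag{A.9} \]
where primed summations denote summations over non-coinciding indices. Hence we have
\[ E\left[e^{HX}\right] = \sum_{N=0}^{\infty}H^{2N}\sum_{m,n:\,m+n=N}\frac{b^m}{m!}\frac{c_n}{n!} = \sum_{N=0}^{\infty}H^{2N}\sum_{n=0}^{N}\frac{b^{N-n}c_n}{(N-n)!\,n!}. \tag{A.10} \]
Comparing with $E\left[e^{HX}\right] = \sum_{N=0}^{\infty}\dfrac{a_N}{N!}H^{2N}$, we obtain
\[ a_N = \sum_{n=0}^{N}\binom{N}{n}b^{N-n}c_n. \]
Note that (A.9) implies
\[ c_{n+m} \le c_m c_n, \tag{A.11} \]
because the conditions of primed summations are weaker for the left-hand side. This with $b \ge 0$ implies
\[ a_M a_N = \sum_{m=0}^{M}\sum_{n=0}^{N}\binom{M}{m}\binom{N}{n}b^{M+N-m-n}c_m c_n \ge \sum_{m=0}^{M}\sum_{n=0}^{N}\binom{M}{m}\binom{N}{n}b^{M+N-m-n}c_{m+n} \]
\[ = \sum_{\ell=0}^{M+N}b^{M+N-\ell}c_\ell\sum_{m:\,0\le m\le M,\ 0\le\ell-m\le N}\binom{M}{m}\binom{N}{\ell-m} = \sum_{\ell=0}^{M+N}b^{M+N-\ell}c_\ell\binom{M+N}{\ell} = a_{M+N}, \]
where, in the last line, we also used
\[ \sum_{m:\,0\le m\le M,\ 0\le\ell-m\le N}\binom{M}{m}\binom{N}{\ell-m} = \binom{M+N}{\ell}, \tag{A.12} \]
which is seen to hold if we compare the coefficients of $x^\ell$ in the identity $(1+x)^{M+N} = (1+x)^M(1+x)^N$.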
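The algebra in the proof of Proposition A.1 can be exercised on a toy case. The sketch below assumes a finite product in (A.7), that is, finitely many $y_j$, so that $c_n = n!\,e_n(y)$ with $e_n$ the elementary symmetric polynomial (the primed sum runs over ordered tuples of distinct indices); the sample values of $b$ and $y_j$ are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def elementary_symmetric(ys):
    """Return [e_0, e_1, ..., e_len(ys)] for the numbers in ys."""
    e = [Fraction(1)]
    for y in ys:
        # Multiply the generating polynomial by (1 + y*x).
        e = [(e[i] if i < len(e) else 0) + (e[i - 1] * y if i > 0 else 0)
             for i in range(len(e) + 1)]
    return e

def moments(b, ys, n_max):
    """c_n from (A.9) and a_N = sum_n C(N, n) b^(N-n) c_n, for 0 <= n, N <= n_max."""
    e = elementary_symmetric(ys)
    c = [factorial(n) * e[n] if n < len(e) else Fraction(0) for n in range(n_max + 1)]
    a = [sum(comb(N, n) * b ** (N - n) * c[n] for n in range(N + 1))
         for N in range(n_max + 1)]
    return c, a

b = Fraction(1, 4)
ys = [Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
N_MAX = 8
c, a = moments(b, ys, N_MAX)

submult_c = all(c[m + n] <= c[m] * c[n]
                for m in range(N_MAX + 1) for n in range(N_MAX + 1 - m))
submult_a = all(a[M + N] <= a[M] * a[N]
                for M in range(N_MAX + 1) for N in range(N_MAX + 1 - M))
print(submult_c, submult_a)
```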
References
1. Bleher, P.M. and Sinai, Ya.G.: Investigation of the critical point in models of the type of Dyson’s hierarchical model. Commun. Math. Phys. 33, 23–42 (1973)
2. Bleher, P.M. and Sinai, Ya.G.: Critical indices for Dyson’s asymptotically hierarchical models. Commun. Math. Phys. 45, 247–278 (1975)
3. Collet, P. and Eckmann, J.-P.: A renormalization group analysis of the hierarchical model in statistical physics. Springer Lecture Note in Physics 74, 1978
4. Dyson, F.J.: Existence of a phase-transition in a one-dimensional Ising ferromagnet. Commun. Math. Phys. 12, 91–107 (1969)
5. Gallavotti, G.: Some aspects of the renormalization problems in statistical mechanics. Memorie dell’ Accademia dei Lincei 15, 23–59 (1978)
6. Gawędzki, K. and Kupiainen, A.: Triviality of $\phi^4_4$ and all that in a hierarchical model approximation. J. Stat. Phys. 29, 683–699 (1982)
7. Gawędzki, K. and Kupiainen, A.: Non-Gaussian fixed points of the block spin transformation. Hierarchical model approximation. Commun. Math. Phys. 89, 191–220 (1983)
8. Gawędzki, K. and Kupiainen, A.: Nongaussian scaling limits. Hierarchical model approximation. J. Stat. Phys. 35, 267–284 (1984)
9. Gawędzki, K. and Kupiainen, A.: Asymptotic freedom beyond perturbation theory. In: K. Osterwalder and R. Stora, eds., Critical Phenomena, Random Systems, Gauge Theories. Les Houches 1984, Amsterdam: North-Holland, 1986
10. Gawędzki, K. and Kupiainen, A.: Massless lattice $\phi^4_4$ theory: Rigorous control of a renormalizable asymptotically free model. Commun. Math. Phys. 99, 199–252 (1985)
11. Koch, H. and Wittwer, P.: A non-Gaussian renormalization group fixed point for hierarchical scalar lattice field theories. Commun. Math. Phys. 106, 495–532 (1986)
12. Koch, H. and Wittwer, P.: On the renormalization group transformation for scalar hierarchical models. Commun. Math. Phys. 138, 537–568 (1991)
13. Koch, H. and Wittwer, P.: A nontrivial renormalization group fixed point for the Dyson–Baker hierarchical model. Commun. Math. Phys. 164, 627–647 (1994)
14. Koch, H. and Wittwer, P.: Bounds on the zeros of a renormalization group fixed point. Mathematical Physics Electronic Journal 1, No. 6 (24pp.) (1995)
15. Newman, C.M.: Inequalities for Ising models and field theories which obey the Lee–Yang theorem. Commun. Math. Phys. 41, 1–9 (1975)
16. Sinai, Ya.G.: Theory of phase transition: Rigorous results. New York: Pergamon Press, 1982
17. Hara, T., Hattori, T., and Watanabe, H.: Triviality of hierarchical Ising model in four dimensions. Archived in mp_arc (Mathematical Physics Preprint Archive, http://www.ma.utexas.edu/mp_arc/) 00-397

Communicated by D. C. Brydges
Commun. Math. Phys. 220, 41 – 67 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Geometric Optics and Long Range Scattering for One-Dimensional Nonlinear Schrödinger Equations Rémi Carles Antenne de Bretagne de l’ENS Cachan and IRMAR, Campus de Ker Lann, 35 170 Bruz, France. E-mail:
[email protected] Received: 23 May 2000 / Accepted: 8 January 2001
Abstract: With the methods of geometric optics used in [2], we provide a new proof of some results of [11], to construct modified wave operators for the one-dimensional cubic Schrödinger equation. We improve the rate of convergence of the nonlinear solution towards the simplified evolution, and get better control of the loss of regularity in Sobolev spaces. In particular, using the results of [9], we deduce the existence of a modified scattering operator with small data in some Sobolev spaces. We show that in terms of geometric optics, this gives rise to a “random phase shift” at a caustic.

Contents
1. Introduction
2. Formal Computations
3. Estimates on Some Oscillatory Integrals
4. Energy Estimates
5. Justification of Nonlinear Geometric Optics Before the Caustic
6. Interpretation
7. Construction of the Modified Scattering Operator and Application
1. Introduction

In this article, we consider the nonlinear Schrödinger equation in one space dimension,
\[ i\partial_t\psi + \frac{1}{2}\partial_x^2\psi = \lambda|\psi|^p\psi, \quad \lambda \in \mathbb{R}, \tag{1.1} \]
in the particular case where $p = 2$. Define the Fourier transform by
\[ \mathcal{F}v(\xi) = \hat v(\xi) = \int e^{-ix.\xi}v(x)\,dx. \]
42
R. Carles
For $p > 2$, it is well known that to any asymptotic state $\psi_- \in H^1 \cap \mathcal{F}(H^1) =: \Sigma$, one can associate a solution $\psi$ of (1.1) that behaves asymptotically as the free evolution of $\psi_-$, that is,
\[ \left\| U_0(-t)\psi(t) - \psi_- \right\|_{\Sigma} \underset{t\to-\infty}{\longrightarrow} 0, \]
where $U_0(t) := e^{i\frac{t}{2}\partial_x^2}$ denotes the unitary group of the free Schrödinger equation. The operator $W_- : \psi_- \mapsto \psi_{|t=0}$ is called a wave operator.

The case $p = 2$ is different (long range case). It is proved (see [1,12,13,5]) that if $\psi_- \in L^2$ and $U_0(-t)\psi(t) - \psi_- \underset{t\to-\infty}{\longrightarrow} 0$ in $L^2$, where $\psi$ solves
\[ i\partial_t\psi + \frac{1}{2}\partial_x^2\psi = |\psi|^2\psi, \]
then $\psi = \psi_- = 0$. One cannot compare the nonlinear dynamics with the free dynamics.

In [11], the author constructs modified wave operators that allow one to compare the nonlinear dynamics of (1.1) when $p = 2$ with a simpler one, yet more complicated than the free dynamics. Assuming the asymptotic state $\psi_-$ is sufficiently smooth and small in a certain Hilbert space, Ozawa defines a new operator (that depends on $\psi_-$) such that the evolution of $\psi_-$ under this dynamics can be compared to the asymptotic behavior of a certain solution of
\[ i\partial_t\psi + \frac{1}{2}\partial_x^2\psi = \lambda|\psi|^2\psi. \tag{1.2} \]
Using the methods of geometric optics as in [2], we rediscover these modified operators, and improve some convergence estimates (Corollary 1). Moreover, we have better control of the (possible) loss of regularity, which, along with the results of [9], makes it possible to define a modified scattering operator ($S = W_+^{-1}W_-$) for small data in $\Sigma$ (Corollary 2). This enables us to describe the validity of nonlinear geometric optics with focusing initial data. In particular, we show that the caustic crossing is described in terms of the scattering operator (as in [2]), plus a “random phase shift” (Corollary 3).

In [8], Ginibre and Velo construct modified wave operators in Gevrey spaces. They make no size restriction on the data, but require analyticity for the asymptotic states. In the present article, we cannot leave out the smallness assumption, but our asymptotic states are less regular. Denote
\[ \mathcal{H} := \{f \in H^3(\mathbb{R});\ xf \in H^2(\mathbb{R})\} = \{f \in \mathcal{S}'(\mathbb{R});\ \|f\|_{\mathcal{H}} := \|(1+x^2)^{1/2}(1-\partial_x^2)f\|_{L^2} + \|(1-\partial_x^2)^{3/2}f\|_{L^2} < \infty\}. \]
Recall one of the results in [11].

Theorem 1 ([11], Theorem 2). There exists $\gamma > 0$ with the following properties. For any $\psi_- \in \mathcal{F}(\mathcal{H})$ with $\|\hat\psi_-\|_{L^\infty} < \gamma$, (1.2) has a unique solution $\psi \in C(\mathbb{R}; H^1) \cap L^4_{\mathrm{loc}}(\mathbb{R}; W^{1,\infty})$ such that for any $\alpha$ with $1/2 < \alpha < 1$,
\[ \left\|\psi(t) - e^{iS^-(t)}U_0(t)\psi_-\right\|_{H^1} = O(|t|^{-\alpha}), \tag{1.3} \]
\[ \left( \int_{-\infty}^{t}\left\|\psi(\tau) - e^{iS^-(\tau)}U_0(\tau)\psi_-\right\|_{W^{1,\infty}}^4 d\tau \right)^{1/4} = O(|t|^{-\alpha}) \quad \text{as } t \to -\infty, \tag{1.4} \]
Geometric Optics and Long Range Scattering for NLS
43
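where the phase shift $S^-$ is defined by
\[ S^-(t,x) := \frac{\lambda}{2\pi}\left|\hat\psi_-\left(\frac{x}{t}\right)\right|^2\log|t|. \tag{1.5} \]
Remark 1. Theorem 1 in [11] gives an asymptotic in $L^2$ instead of $H^1$, and requires less regularity on the asymptotic state $\psi_-$. Yet, it is still required to be small in the same space as in Theorem 2.

Now we recall why the method of geometric optics can be closely related to scattering theory in the case of the nonlinear Schrödinger equation. In [2], we consider the initial value problem
\[ i\varepsilon\partial_t u^\varepsilon + \frac{1}{2}\varepsilon^2\partial_x^2 u^\varepsilon = \lambda\varepsilon^\alpha|u^\varepsilon|^\beta u^\varepsilon, \quad (t,x) \in \mathbb{R}_+ \times \mathbb{R}, \qquad u^\varepsilon_{|t=0} = e^{-i\frac{x^2}{2\varepsilon}}f(x), \tag{1.6} \]
where $\alpha \ge 1$, $\beta > 0$ and $0 < \varepsilon \le 1$ is a parameter going to zero. With the initial phase $-x^2/2$, rays of geometric optics (which are the projection on the $(t,x)$ space of the bicharacteristics) focus at the point $(t,x) = (1,0)$. We proved in [2] that in the case where $\beta = 2\alpha > 2$ (“nonlinear caustic”), the asymptotic behavior, as $\varepsilon$ goes to zero, of the solution near $t = 1$ is easily expressed in terms of $f$ and the wave operator $W_-$. To see that point, we introduced the scaling
\[ u^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}}\,\psi^\varepsilon\left(\frac{t-1}{\varepsilon}, \frac{x}{\varepsilon}\right), \tag{1.7} \]
that satisfies
\[ U_0\left(\frac{1}{\varepsilon}\right)\psi^\varepsilon\left(-\frac{1}{\varepsilon}\right) \underset{\varepsilon\to0}{\longrightarrow} \psi_- := \frac{1}{\sqrt{2i\pi}}\hat f(x). \tag{1.8} \]
Define the function $\psi$ by
\[ i\partial_t\psi + \frac{1}{2}\partial_x^2\psi = \lambda|\psi|^\beta\psi, \qquad \psi_{|t=0} = W_-\psi_-. \]
Then $\psi$ is a concentrating profile for $u^\varepsilon$, that is
\[ u^\varepsilon(t,x) \underset{\varepsilon\to0}{\sim} \frac{1}{\sqrt{\varepsilon}}\,\psi\left(\frac{t-1}{\varepsilon}, \frac{x}{\varepsilon}\right). \]
In this paper, we treat the limiting case of (1.6), that is $\alpha = 1$, $\beta = 2$. We study the validity of nonlinear geometric optics, for positive times, for the solutions of the following initial value problem,
\[ i\varepsilon\partial_t u^\varepsilon + \frac{1}{2}\varepsilon^2\partial_x^2 u^\varepsilon = \lambda\varepsilon|u^\varepsilon|^2 u^\varepsilon, \quad (t,x) \in \mathbb{R}_+ \times \mathbb{R}, \qquad u^\varepsilon_{|t=0} = e^{-i\frac{x^2}{2\varepsilon} + i\lambda|f(x)|^2\log\frac{1}{\varepsilon}}f(x). \tag{1.9} \]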
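We altered the initial data by adding the term $e^{i\lambda|f(x)|^2\log\frac{1}{\varepsilon}}$ in order to recover the same modified wave operator as in [11]. Nonlinear geometric optics could be justified as well without this term, by the same methods as those which follow in this article, but would not make it possible to deduce the existence of modified wave operators for (1.2).

From now on, the function $f$ in the initial data is supposed to belong to $\mathcal{H}$, and to be nonzero. Then for every (fixed) $\varepsilon > 0$, (1.9) has a unique global solution, which belongs to $C(\mathbb{R}_t, \Sigma)$ (see for instance [5,6]). The following definition, which follows the spirit of [2], will be motivated in Sect. 2.

Definition 1. Let $g^\varepsilon$ be defined for $t < 1$ by
\[ g^\varepsilon(t,\xi) := \lambda|f(-\xi)|^2\log\frac{1-t}{\varepsilon}. \tag{1.10} \]
The approximate solution $u^\varepsilon_{\mathrm{app}}$ is defined for $t < 1$ by
\[ u^\varepsilon_{\mathrm{app}}(t,x) := \frac{1}{\sqrt{\varepsilon}}\int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x.\xi}{\varepsilon} + ig^\varepsilon(t,\xi)}\,a_0(\xi)\,\frac{d\xi}{2\pi}, \tag{1.11} \]
with
\[ a_0(\xi) := \sqrt{\frac{2\pi}{i}}\,f(-\xi). \tag{1.12} \]
We define the symbol $a^\varepsilon(t,\xi)$ by
\[ u^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}}\int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x.\xi}{\varepsilon} + ig^\varepsilon(t,\xi)}\,a^\varepsilon(t,\xi)\,\frac{d\xi}{2\pi}, \tag{1.13} \]
which makes sense since $u^\varepsilon \in L^2$ and
\[ a^\varepsilon(t,\xi) = \frac{1}{\sqrt{\varepsilon}}\,e^{i\frac{t-1}{2\varepsilon}\xi^2 - ig^\varepsilon(t,\xi)}\,\widehat{u^\varepsilon}\left(t, \frac{\xi}{\varepsilon}\right). \tag{1.14} \]
We can now state the main result.

Theorem 2. Let $f \in \mathcal{H}$. There exist $C^* = C^*(f)$ and $\varepsilon^* = \varepsilon^*(f) > 0$ such that for $0 < \varepsilon \le \varepsilon^*$, nonlinear geometric optics is valid before the focus, with the following distinctions.
– If $\|f\|_{L^\infty} < |2\lambda|^{-1/2}$, then in $\{1 - t \ge C^*\varepsilon\}$ and for any $0 \le s \le 1$,
\[ \left\|a^\varepsilon(t,.) - a_0\right\|_{H^s \cap \mathcal{F}(H^s)} = O\left( \frac{\varepsilon}{1-t}\left(\log\frac{1-t}{\varepsilon}\right)^{2+s} \right). \]
– If $\|f\|_{L^\infty} \ge |2\lambda|^{-1/2}$, then denote $C_0 := 2|\lambda|\|f\|_{L^\infty}^2$. For any $\alpha > 0$, there exists $C_\alpha^*$ such that in $\left\{1 - t \ge C_\alpha^*\left(\varepsilon|\log\varepsilon|^{5/2}\right)^{1/(C_0+\alpha)}\right\}$, and for any $0 \le s \le 1$,
\[ \left\|a^\varepsilon(t,.) - a_0\right\|_{H^s \cap \mathcal{F}(H^s)} = O\left( \frac{\varepsilon|\log\varepsilon|^{2+s}}{(1-t)^{C_0+2s\alpha}} \right). \]
The above estimates are uniform on the time intervals we consider.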
Define the Galilean operator J (see for instance [5,6]) by J (t) := x + it∂x . 1/2 . Then there exists a unique Corollary 1. Let ψ− ∈ F(H) with ψ − L∞ < (π/|λ|) ψ ∈ C(R, ) solution of (1.2) such that for any 0 ≤ s ≤ 1, as t → −∞, (log |t|)2+s − ψ(t) − eiS (t) U0 (t)ψ− H s = O , (1.15) |t| (log |t|)3 iS − (t) J (t)ψ − J (t)e U0 (t)ψ− L2 = O . (1.16) |t| In particular, we have (log |t|)5/2 iS − (t) ψ(t) − e . (1.17) U0 (t)ψ− L∞ = O |t|3/2
Actually, we will prove uniqueness under weaker conditions, as stated in the following proposition. Recall that $f$ and $\psi_-$ are related by (1.8).

Proposition 1. Let $f \in H^2(\mathbb{R})$. Suppose $\|f\|_{L^\infty} < |2\lambda|^{-1/2}$. Then there exists at most one function $\psi \in C(\mathbb{R}_t, L^2 \cap L^\infty)$ solution of (1.2) satisfying the following property: there exists $1/2 < \alpha < 1$, with $\alpha > 2|\lambda|\,\|f\|_{L^\infty}^2$, such that, as $t \to -\infty$,
\[ \left\| \psi(t) - e^{iS^-(t)} U_0(t)\psi_- \right\|_{L^2 \cap L^\infty} = O\left( \frac{1}{|t|^{\alpha}} \right). \]

Remark 2. Our method does not recover the convergence in $L^4_t(L^\infty_x)$ of the derivatives, stated in Theorem 1. However, we recover all the others, with a better convergence rate.

Remark 3. The other improvement involves the regularity of the function $\psi$ thus constructed. We get some regularity of the momenta of $\psi$, namely $x\psi \in L^2$, which did not appear in [11]. Thanks to this regularity, we can use the results of asymptotic completeness stated in [9], in order to define a long range scattering operator for small data.

Corollary 2. We can define a modified scattering operator for (1.2), for small data in $\mathcal{H}$. There exists $\delta > 0$ such that to any $\psi_- \in \mathcal{F}(\mathcal{H})$ satisfying $\|\psi_-\| \le \delta$, we can associate a unique $\psi \in C(\mathbb{R}_t, \Sigma)$ solution of (1.2) and $\psi_+ \in L^2$ such that
\[ \psi(t) \mathop{\sim}_{t \to \pm\infty} e^{iS^\pm(t)} U_0(t)\psi_\pm \quad \text{in } L^2, \tag{1.18} \]
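where $S^\pm$ are defined by (1.5). The map $S : \psi_- \mapsto \psi_+$ is the modified scattering operator.

Corollary 3. Let $f \in \mathcal{H}$. Assume $f$ is sufficiently small. Then nonlinear geometric optics is valid in $L^2$ for the problem (1.9), before and after the caustic. The caustic crossing is described by the modified scattering operator $S$ and a “random phase shift”. One has the following asymptotics in $L^2$,
– if $t < 1$, then
\[ u^\varepsilon(t,x) \mathop{\sim}_{\varepsilon \to 0} \frac{e^{i\pi/4}}{\sqrt{2\pi(1-t)}} \exp\left( i\frac{x^2}{2\varepsilon(t-1)} + i\frac{\lambda}{2\pi} \left| \widehat{\psi}_-\left(\frac{x}{t-1}\right) \right|^2 \log\frac{1-t}{\varepsilon} \right) \widehat{\psi}_-\left(\frac{x}{t-1}\right), \]
– if $t > 1$, then
\[ u^\varepsilon(t,x) \mathop{\sim}_{\varepsilon \to 0} \frac{e^{-i\pi/4}}{\sqrt{2\pi(t-1)}} \exp\left( i\frac{x^2}{2\varepsilon(t-1)} + i\frac{\lambda}{2\pi} \left| \widehat{\psi}_+\left(\frac{x}{t-1}\right) \right|^2 \log\frac{t-1}{\varepsilon} \right) \widehat{\psi}_+\left(\frac{x}{t-1}\right), \]
where $\psi_-$ is defined by (1.8) and $\psi_+ = S\psi_-$.

Remark 4. The phase shift of $-\pi/2$ between the two asymptotics is classical, and appears even in the linear case ([4]). The change in the profile, measured by a scattering operator, was proved in [2]. The new phenomenon here is the phase shift
\[ \frac{\lambda}{2\pi} \left| \widehat{\psi}_+\left(\frac{x}{t-1}\right) \right|^2 \log\frac{|t-1|}{\varepsilon} - \frac{\lambda}{2\pi} \left| \widehat{\psi}_-\left(\frac{x}{t-1}\right) \right|^2 \log\frac{|t-1|}{\varepsilon}, \]
which is “very nonlinear”, and depends on $\varepsilon$, hence can be called “random”.

Remark 5. From a physical point of view, the nonlinearity $\lambda|\psi|^2\psi$ appears as the first term of a Taylor expansion of a more general nonlinearity $h(|\psi|^2)\psi$. For instance, $h$ may be bounded (to model the phenomenon of saturation). For large times, $\psi$ is small and we can write
\[ h(|\psi|^2)\psi = \lambda|\psi|^2\psi + R(|\psi|^2)\psi, \tag{1.19} \]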
with $R(|\psi|^2) = O(|\psi|^4)$. One can check that replacing $\lambda|\psi|^2\psi$ with the right-hand side of (1.19), Corollary 1 still holds, as well as Corollary 2, since the results in [9] still hold with (1.19).

Notations. We will denote
\[ đ\xi := \frac{d\xi}{2\pi}, \]
so that the Fourier inverse formula writes
\[ \mathcal{F}^{-1} f(x) = \int e^{ix\xi} f(\xi)\, đ\xi. \]
For $x \in \mathbb{R}$, we denote $\langle x \rangle := (1 + x^2)^{1/2}$.

2. Formal Computations

In this section, we recall how the oscillatory integrals were introduced in the nonlinear short range case ([2]), and give a formal argument that leads to Definition 1 before the focus, that is for $t < 1$. Suppose $u^\varepsilon$ solves the initial value problem
\[ \begin{cases} i\varepsilon\partial_t u^\varepsilon + \frac{1}{2}\varepsilon^2\partial_x^2 u^\varepsilon = 0, & (t,x) \in \mathbb{R}_+ \times \mathbb{R}, \\ u^\varepsilon_{|t=0} = e^{-i\frac{x^2}{2\varepsilon}} f(x). \end{cases} \tag{2.1} \]
For $t < 1$, the asymptotics when $\varepsilon$ goes to zero is given by WKB methods,
\[ u^\varepsilon(t,x) \mathop{\sim}_{\varepsilon \to 0} \frac{1}{\sqrt{1-t}}\, f\left(\frac{x}{1-t}\right) e^{i\frac{x^2}{2\varepsilon(t-1)}}. \tag{2.2} \]
Near the focus, this description fails to be valid. Neither the profile nor the phase in (2.2) are defined for $t = 1$. For much more general cases, Duistermaat showed that a uniform description can be obtained in terms of oscillatory integrals ([4]), that is, in this case,
\[ u^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon}}\, a^\varepsilon(\xi)\, đ\xi. \tag{2.3} \]
It is easy to check that $a^\varepsilon$ has an asymptotic expansion in powers of $\varepsilon$, and in particular, $a^\varepsilon \to a_0$ as $\varepsilon \to 0$, with $a_0$ defined by (1.12). For $t < 1$ the usual stationary phase formula applied to the above integral with $a^\varepsilon$ replaced by $a_0$ gives the asymptotics (2.2). For $t > 1$, one has almost the same asymptotics, the main difference is a phase shift of $-\pi/2$ due to the caustic crossing.

For the nonlinear case (1.6), we generalized the previous representation as follows ([2]),
\[ u^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon}}\, a^\varepsilon(t,\xi)\, đ\xi. \tag{2.4} \]
This formula makes sense as soon as $u^\varepsilon \in L^2$, since
\[ a^\varepsilon(t,\xi) = \frac{1}{\sqrt{\varepsilon}}\, e^{i\frac{t-1}{2\varepsilon}\xi^2}\, \widehat{u^\varepsilon}\left(t, \frac{\xi}{\varepsilon}\right). \]
The nonlinear term $\varepsilon^\alpha|u^\varepsilon|^\beta u^\varepsilon$ is negligible when $\partial_t a^\varepsilon$ goes to zero. With this natural definition, we proved that the nonlinear term can have different influences away from the caustic, and near $t = 1$, which led us to use the same vocabulary as in [10], linear/nonlinear propagation, linear/nonlinear caustic. We also proved that the four cases can be encountered. When the propagation is nonlinear ($\alpha = 1$), a formal computation based on the stationary phase formula suggests as a limit transport equation for the symbol $a^\varepsilon$,
\[ i\partial_t a(t,\xi) = \frac{\lambda}{|2\pi(1-t)|^{\beta/2}}\, |a|^\beta a(t,\xi), \tag{2.5} \]
at least away from the caustic, with initial data $a_{|t=0} = a_0(\xi)$. Multiplying (2.5) by $\bar{a}$, one notices that the modulus of $a$ is constant. If we write $a = a_0 e^{ig(t,\xi)}$, the equation for $g$ is:
\[ \partial_t g(t,\xi) = -\frac{\lambda}{|1-t|^{\beta/2}}\, |f(-\xi)|^\beta. \tag{2.6} \]
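If we wish to get as a limit transport equation the relation $\partial_t \tilde{a} = 0$, it seems natural to define a modified symbol $\tilde{a}^\varepsilon$ as
\[ u^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon} + ig(t,\xi)}\, \tilde{a}^\varepsilon(t,\xi)\, đ\xi, \tag{2.7} \]
with $g_{|t=0} = 0$. In the case of a linear caustic ($\beta < 2$), we proved that indeed, $\tilde{a}^\varepsilon(t,\xi) \to a_0(\xi)$ as $\varepsilon \to 0$, in $L^\infty_{t,\mathrm{loc}}$ (x ).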
In the case we want to study now, $\beta = 2$, the integration of (2.6) is possible only for $t < 1$. With the initial data $g_{|t=0} = \lambda|f(-\xi)|^2 \log\frac{1}{\varepsilon}$, it gives the result introduced in Definition 1. As in the cases recalled above, the transport equation for the modified symbol $\tilde{a}^\varepsilon$ must be, for $t < 1$, $\partial_t \tilde{a}^\varepsilon \to 0$ as $\varepsilon \to 0$, which leads us to the definition of the approximate solution (1.11). From now on, we will leave out the tilde symbol for $a$, and adopt the notation (1.13).

Remark 6. The function $g$ is defined only for $t < 1$, not near $t = 1$. One must remember that the formal computations that lead to the definition of $g$ are based on the application of the stationary phase formula. When the phase $\frac{1-t}{2}\xi^2 + x\xi$ does not have non-degenerate critical points, one must not expect this formal argument to be valid in the general case. On the other hand, recall that the case we study ($\alpha = 1$ and $\beta = 2$) corresponds to a nonlinear propagation and a nonlinear caustic. The phase $g$ takes the nonlinear effects of the propagation before the caustic into account. To take the nonlinear effects of the caustic into account, one has to define a (long range) scattering operator for the cubic Schrödinger equation (see Sect. 7).

For $t < 1$, the function $u^\varepsilon_{app}$ satisfies the equation
\[ \begin{aligned} i\varepsilon\partial_t u^\varepsilon_{app} + \frac{1}{2}\varepsilon^2\partial_x^2 u^\varepsilon_{app} &= -\sqrt{\varepsilon} \int \partial_t g^\varepsilon(t,\xi)\, e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon} + ig^\varepsilon(t,\xi)}\, a_0(\xi)\, đ\xi \\ &= \lambda\varepsilon\, \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon} + ig^\varepsilon(t,\xi)}\, \frac{|f(-\xi)|^2}{1-t}\, a_0(\xi)\, đ\xi. \end{aligned} \tag{2.8} \]
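As a check on the formal computation above, the following short derivation integrates (2.6) in the cubic case $\beta = 2$, assuming $0 \le t < 1$ and the initial datum quoted in the text. Definition 1 is not reproduced in this section, so identifying the closed form below with it is an assumption.

```latex
% Assumptions: \beta = 2, 0 \le t < 1,
% g^\varepsilon|_{t=0} = \lambda |f(-\xi)|^2 \log(1/\varepsilon).
\begin{align*}
\partial_t g^\varepsilon(t,\xi) &= -\frac{\lambda}{1-t}\,|f(-\xi)|^2, \\
g^\varepsilon(t,\xi)
  &= \lambda|f(-\xi)|^2\log\frac{1}{\varepsilon}
     - \lambda|f(-\xi)|^2\int_0^t\frac{ds}{1-s} \\
  &= \lambda|f(-\xi)|^2\Big(\log\frac{1}{\varepsilon} + \log(1-t)\Big)
   = \lambda|f(-\xi)|^2\log\frac{1-t}{\varepsilon}.
\end{align*}
% The integral \int_0^t ds/(1-s) = -\log(1-t) diverges as t \to 1,
% which is why the phase is only available before the focus.
% Inserting \partial_t g^\varepsilon into the first line of (2.8)
% turns -\sqrt{\varepsilon}\,\partial_t g^\varepsilon into
% \lambda\varepsilon\,\varepsilon^{-1/2}\,|f(-\xi)|^2/(1-t),
% which is the second line of (2.8).
```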
For $t < 1$, one can formally apply the stationary phase formula to the integral defining $u^\varepsilon_{app}$,
\[ u^\varepsilon_{app}(t,x) \mathop{\sim}_{\varepsilon \to 0} \frac{1}{(1-t)^{1/2}}\, f\left(\frac{x}{1-t}\right) e^{i\frac{x^2}{2\varepsilon(t-1)} + ig^\varepsilon\left(t, \frac{x}{t-1}\right)} =: u^\varepsilon_1(t,x). \tag{2.9} \]
On the other hand, if one applies the stationary phase formula to the right-hand side of (2.8), one obtains $\lambda\varepsilon|u^\varepsilon_1|^2 u^\varepsilon_1(t,x)$, so formally, $u^\varepsilon_{app}$ is an approximate solution of (1.9). In the following section, we estimate precisely the remainders when one applies the stationary phase formula as above.

3. Estimates on Some Oscillatory Integrals

3.1. The fundamental estimate. We first estimate precisely the remainder of the usual stationary phase formula applied to the first order, in $L^2$.

Lemma 1. Let $\sigma(t,\xi)$ be locally bounded in time with values in $L^2(\mathbb{R})$. Denote
\[ H^\varepsilon(t,x) := \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon}}\, \sigma(t,\xi)\, đ\xi, \]
and $\Theta^\varepsilon$ the first term given by the stationary phase formula,
\[ \Theta^\varepsilon(t,x) := \sqrt{\frac{i}{2\pi(1-t)}}\; e^{-i\frac{x^2}{2\varepsilon(1-t)}}\, \sigma\left(t, \frac{x}{t-1}\right). \]
1. There exists a continuous function $h$, with $h(0) = 0$, such that
\[ \|H^\varepsilon(t,.) - \Theta^\varepsilon(t,.)\|_{L^2} = h\left(\frac{\varepsilon}{1-t}\right). \]
2. If $\sigma(t,.) \in H^2(\mathbb{R})$, the rate of continuity of $h$ can be estimated,
\[ \|H^\varepsilon(t,.) - \Theta^\varepsilon(t,.)\|_{L^2} \le C\, \frac{\varepsilon}{|1-t|}\, \|\sigma(t,.)\|_{H^2_\xi}. \]

Proof. From the definition of $H^\varepsilon$,
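\[ \begin{aligned} H^\varepsilon(t,x) &= e^{-i\frac{x^2}{2\varepsilon(1-t)}}\, \frac{1}{\sqrt{\varepsilon}} \int e^{i\frac{1-t}{2\varepsilon}\left(\xi + \frac{x}{1-t}\right)^2} \sigma(t,\xi)\, đ\xi \\ &= e^{-i\frac{x^2}{2\varepsilon(1-t)}}\, \frac{1}{\sqrt{\varepsilon}} \int e^{i\frac{1-t}{2\varepsilon}\xi^2}\, \sigma\left(t, \xi - \frac{x}{1-t}\right) đ\xi, \end{aligned} \]
hence from Parseval formula,
\[ \begin{aligned} H^\varepsilon(t,x) &= e^{-i\frac{x^2}{2\varepsilon(1-t)}} \sqrt{\frac{i}{2\pi(1-t)}} \int e^{i\frac{\varepsilon}{2(1-t)}y^2}\, e^{i\frac{xy}{1-t}}\, \mathcal{F}^{-1}_{\xi\to y}\sigma(t,y)\, dy \\ &= e^{-i\frac{x^2}{2\varepsilon(1-t)}} \sqrt{\frac{i}{2\pi(1-t)}}\; \sigma\left(t, \frac{x}{t-1}\right) \\ &\quad + e^{-i\frac{x^2}{2\varepsilon(1-t)}} \sqrt{\frac{i}{2\pi(1-t)}} \int \left( e^{i\frac{\varepsilon}{2(1-t)}y^2} - 1 \right) e^{i\frac{xy}{1-t}}\, \mathcal{F}^{-1}\sigma(t,y)\, dy, \end{aligned} \]
and the last term can also be written as
\[ \sqrt{\frac{i}{2\pi(1-t)}}\; e^{-i\frac{x^2}{2\varepsilon(1-t)}}\, \mathcal{F}\left( \left( e^{i\frac{\varepsilon}{2(1-t)}y^2} - 1 \right) \mathcal{F}^{-1}\sigma \right)\left(t, \frac{x}{t-1}\right). \]
Now from the Plancherel formula,
\[ \|H^\varepsilon(t,.) - \Theta^\varepsilon(t,.)\|_{L^2_x} = h\left(t, \frac{\varepsilon}{1-t}\right), \]
with
\[ h(t,z) = \left\| \left( e^{i\frac{z}{2}y^2} - 1 \right) \mathcal{F}^{-1}\sigma(t,.) \right\|_{L^2_y}. \]
Then the first point follows from the dominated convergence theorem. When $\sigma(t,.) \in H^2$, we have
\[ \begin{aligned} h(t,z) &= 2\left\| \sin\left(\frac{z}{4}y^2\right) \mathcal{F}^{-1}\sigma(t,.) \right\|_{L^2_y} \le 2\left\| \frac{z}{4}\, y^2\, \mathcal{F}^{-1}\sigma(t,.) \right\|_{L^2_y} \\ &\le |z|\, \left\| y^2 \mathcal{F}^{-1}\sigma(t,.) \right\|_{L^2_y} = C|z|\, \|\sigma(t,.)\|_{H^2}. \end{aligned} \]
This inequality completes the proof of Lemma 1.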
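The two steps used at the end of the proof rest on an elementary pointwise bound, recorded here as a worked note. The variable $\theta$ is only a placeholder for the real quantity $z y^2/2$.

```latex
% For real \theta:
%   |e^{i\theta} - 1|^2 = (\cos\theta - 1)^2 + \sin^2\theta
%                       = 2 - 2\cos\theta = 4\sin^2(\theta/2).
\[
  |e^{i\theta} - 1| \;=\; 2\,|\sin(\theta/2)| \;\le\; \min\big(2,\ |\theta|\big).
\]
% Bound by 2: the integrand defining h(t,z)^2 is dominated by
% 4\,|F^{-1}\sigma(t,y)|^2, which is integrable and independent of z,
% and it tends to 0 pointwise as z \to 0; dominated convergence then
% gives h(t,z) \to 0, i.e. the first point of the lemma.
% Bound by |\theta| with \theta = z y^2 / 2:
\[
  h(t,z) \;\le\; \frac{|z|}{2}\,\big\| y^2\, F^{-1}\sigma(t,\cdot) \big\|_{L^2_y},
\]
% which is slightly sharper than, and therefore implies, the bound
% with |z| used in the proof.
```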
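3.2. Convergence of the initial data. To obtain asymptotics in $\Sigma$ for the symbols as stated in Theorem 2, we have to notice the following properties. If
\[ v^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon}}\, b^\varepsilon(t,\xi)\, đ\xi, \]
then
\[ \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon}}\, \xi\, b^\varepsilon(t,\xi)\, đ\xi = \varepsilon\partial_x v^\varepsilon(t,x), \]
and
\[ \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon}}\, \partial_\xi b^\varepsilon(t,\xi)\, đ\xi = J^\varepsilon(t) v^\varepsilon(t,x), \]
where we denoted
\[ J^\varepsilon(t) := \frac{x}{\varepsilon} + i(t-1)\partial_x. \tag{3.1} \]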
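The two identities above can be checked by differentiating under the integral and integrating by parts. The sketch below assumes that $b^\varepsilon$ decays fast enough for the boundary terms to vanish; the name $\Phi$ for the phase is introduced here only for the computation. Carried out with the conventions of the Notations paragraph, the computation produces an extra unimodular constant $-i$ in both identities, which has no effect on the $L^2$ norms used in the sequel.

```latex
% Phase (name \Phi is ours):
%   \Phi(t,x,\xi) := -\frac{t-1}{2\varepsilon}\xi^2 + \frac{x\xi}{\varepsilon},
%   \partial_x\Phi = \xi/\varepsilon, \qquad
%   \partial_\xi\Phi = \big(x - (t-1)\xi\big)/\varepsilon .
\begin{align*}
\varepsilon\partial_x v^\varepsilon
  &= \frac{1}{\sqrt\varepsilon}\int i\xi\, e^{i\Phi}\, b^\varepsilon\, d\xi/2\pi
  \quad\Longrightarrow\quad
  \frac{1}{\sqrt\varepsilon}\int e^{i\Phi}\,\xi\, b^\varepsilon\, d\xi/2\pi
  = -i\,\varepsilon\partial_x v^\varepsilon, \\
\frac{1}{\sqrt\varepsilon}\int e^{i\Phi}\,\partial_\xi b^\varepsilon\, d\xi/2\pi
  &= -\frac{1}{\sqrt\varepsilon}\int i\,\partial_\xi\Phi\; e^{i\Phi}\, b^\varepsilon\, d\xi/2\pi \\
  &= -i\,\frac{x}{\varepsilon}\, v^\varepsilon
     + i\,\frac{t-1}{\varepsilon}\,\big(-i\,\varepsilon\partial_x v^\varepsilon\big) \\
  &= -i\Big(\frac{x}{\varepsilon} + i(t-1)\partial_x\Big) v^\varepsilon
   = -i\, J^\varepsilon(t)\, v^\varepsilon .
\end{align*}
```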
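The operator $J^\varepsilon$ is nothing else than the usual Galilean operator, rescaled according to our problem.

Lemma 2. The operator $J^\varepsilon$ satisfies the following properties.
– The commutation relation,
\[ \left[ J^\varepsilon(t),\ i\varepsilon\partial_t + \frac{1}{2}\varepsilon^2\partial_x^2 \right] = 0. \tag{3.2} \]
– Denote $M^\varepsilon(t) = e^{i\frac{x^2}{2\varepsilon(t-1)}}$, then $J^\varepsilon(t)$ writes
\[ J^\varepsilon(t) = i(t-1)\, M^\varepsilon(t)\, \partial_x\, M^\varepsilon(2-t). \tag{3.3} \]
– The modified Sobolev inequality,
\[ \|w(t)\|_{L^\infty} \le \frac{C}{\sqrt{|1-t|}}\, \|w(t)\|_{L^2}^{1/2}\, \|J^\varepsilon(t)w(t)\|_{L^2}^{1/2}. \tag{3.4} \]
– For any function $F \in C^1(\mathbb{C}, \mathbb{C})$ satisfying the gauge invariance condition
\[ \exists G \in C^1(\mathbb{R}_+, \mathbb{R}), \quad F(z) = zG(|z|^2), \]
one has
\[ J^\varepsilon(t)F(w) = \partial_z F(w)\, J^\varepsilon(t)w - \partial_{\bar z} F(w)\, \overline{J^\varepsilon(t)w}. \tag{3.5} \]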
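The factorization (3.3) and the modified Sobolev inequality (3.4) are closely related. The following sketch checks (3.3) directly and then derives (3.4) from the one-dimensional inequality $\|v\|_{L^\infty}^2 \le \|v\|_{L^2}\|\partial_x v\|_{L^2}$, which is assumed here for $v \in H^1(\mathbb{R})$; under that assumption the argument gives (3.4) with $C = 1$.

```latex
% Step 1: check (3.3). Note M^\varepsilon(2-t) = e^{-i x^2/(2\varepsilon(t-1))}
% = M^\varepsilon(t)^{-1}, so
\begin{align*}
i(t-1)\, M^\varepsilon(t)\,\partial_x\big(M^\varepsilon(2-t)\, w\big)
  &= i(t-1)\Big(\partial_x w - \frac{i x}{\varepsilon(t-1)}\, w\Big)
   = \frac{x}{\varepsilon}\, w + i(t-1)\,\partial_x w
   = J^\varepsilon(t)\, w .
\end{align*}
% Step 2: set v := M^\varepsilon(2-t)\, w. Since |M^\varepsilon| = 1,
%   |v| = |w|  and  \|\partial_x v\|_{L^2} = \|J^\varepsilon(t) w\|_{L^2} / |t-1| .
% Step 3: apply \|v\|_{L^\infty}^2 \le \|v\|_{L^2}\,\|\partial_x v\|_{L^2}:
\[
  \|w\|_{L^\infty}^2 = \|v\|_{L^\infty}^2
  \le \frac{1}{|1-t|}\,\|w\|_{L^2}\,\|J^\varepsilon(t) w\|_{L^2},
\]
% and taking square roots gives (3.4).
```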
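The first step to prove Theorem 2 is to study the convergence of the initial value of the symbol $a^\varepsilon$.

Lemma 3. The following convergence holds in $\Sigma$,
\[ a^\varepsilon(0,\xi) \mathop{\longrightarrow}_{\varepsilon \to 0} a_0(\xi). \]
More precisely, there exists $C = C(\|f\|_{\mathcal{H}})$ such that
\[ \begin{aligned} \|a^\varepsilon(0,.) - a_0\|_{L^2} &\le C\varepsilon \left\langle \log\frac{1}{\varepsilon} \right\rangle^{2}, \\ \|\xi(a^\varepsilon(0,\xi) - a_0(\xi))\|_{L^2},\ \|\partial_\xi(a^\varepsilon(0,\xi) - a_0(\xi))\|_{L^2} &\le C\varepsilon \left\langle \log\frac{1}{\varepsilon} \right\rangle^{3}. \end{aligned} \tag{3.6} \]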
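Moreover, the same estimates hold with $a^\varepsilon(0,\xi) - a_0(\xi)$ replaced with $(a^\varepsilon(0,\xi) - a_0(\xi))\, e^{-i\lambda|f(-\xi)|^2\log\varepsilon}$.

Proof. From (1.14) and the initial value of $u^\varepsilon$, one has
\[ a^\varepsilon(0,\xi) = e^{i\lambda|f(-\xi)|^2\log\varepsilon}\, \frac{1}{\sqrt{\varepsilon}} \int e^{-\frac{i}{2\varepsilon}(x+\xi)^2 + i\lambda|f(x)|^2\log\frac{1}{\varepsilon}} f(x)\, dx. \]
Denote $h^\varepsilon(x) := e^{i\lambda|f(x)|^2\log\frac{1}{\varepsilon}} f(x)$. From Parseval formula, one also has
\[ a^\varepsilon(0,\xi)\, e^{-i\lambda|f(-\xi)|^2\log\varepsilon} = \frac{1}{\sqrt{2i\pi}} \int e^{-iy\xi - i\varepsilon\frac{y^2}{2}}\, \widehat{h^\varepsilon}(y)\, dy, \]
hence
\[ (a^\varepsilon(0,\xi) - a_0(\xi))\, e^{-i\lambda|f(-\xi)|^2\log\varepsilon} = \frac{1}{\sqrt{2i\pi}} \int e^{-iy\xi} \left( e^{-i\varepsilon\frac{y^2}{2}} - 1 \right) \widehat{h^\varepsilon}(y)\, dy. \]
Following the proof of Lemma 1, one then proves that the $L^2$-norm of the above quantity is $O(\varepsilon|\log\varepsilon|^2)$, and its $\Sigma$-norm is $O(\varepsilon|\log\varepsilon|^3)$. The estimates of Lemma 3 are then straightforward.

3.3. Estimating the approximate solution. To estimate the remainder $u^\varepsilon - u^\varepsilon_{app}$, we will need some information about the $L^\infty$-norm of the approximate solution. The following lemma provides some.

Lemma 4. Let $\beta > 0$. There exists $C_* = C_*(\beta, \|f\|_{H^2})$ such that in the region $\{1 - t \ge C_*\varepsilon\}$, $u^\varepsilon_{app}(t)$ satisfies almost the same estimate as $u^\varepsilon_1(t)$ in $L^\infty$, that is,
\[ \|u^\varepsilon_{app}(t)\|_{L^\infty} \le \frac{\|f\|_{L^\infty} + \beta}{\sqrt{1-t}}. \]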
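Proof. Write $\|u^\varepsilon_{app}(t)\|_{L^\infty} \le \|u^\varepsilon_1(t)\|_{L^\infty} + \|u^\varepsilon_{app}(t) - u^\varepsilon_1(t)\|_{L^\infty}$, and denote $d^\varepsilon(t,x) := u^\varepsilon_{app}(t,x) - u^\varepsilon_1(t,x)$. From the modified Sobolev inequality,
\[ \|d^\varepsilon(t)\|_{L^\infty} \le \frac{C}{\sqrt{1-t}}\, \|d^\varepsilon(t)\|_{L^2}^{1/2}\, \|J^\varepsilon(t)d^\varepsilon\|_{L^2}^{1/2}. \tag{3.7} \]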
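Now the $L^2$-norms can be estimated thanks to Lemma 1, with
\[ \sigma^\varepsilon(t,\xi) := e^{ig^\varepsilon(t,\xi)} a_0(\xi). \]
Thus,
\[ \|d^\varepsilon(t)\|_{L^2} \le C\, \frac{\varepsilon}{1-t}\, \|\sigma^\varepsilon(t,.)\|_{H^2_\xi}. \]
It is a straightforward computation to see that since $H^1(\mathbb{R}) \subset L^\infty(\mathbb{R})$, there are some constants such that
\[ \|d^\varepsilon(t)\|_{L^2} \le C(\|f\|_{H^2})\, \frac{\varepsilon}{1-t} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{2}. \]
Since $J^\varepsilon(t)$ acts as the differentiation with respect to $\xi$ on the symbols, the first part of Lemma 1 gives
\[ \|J^\varepsilon(t)d^\varepsilon\|_{L^2} = \left\langle \log\frac{1-t}{\varepsilon} \right\rangle h\left(\frac{\varepsilon}{1-t}\right), \]
where $h \in C(\mathbb{R})$ satisfies $h(0) = 0$. Then from (3.7),
\[ \|d^\varepsilon(t)\|_{L^\infty} \le \frac{C(\|f\|_{H^2})}{\sqrt{1-t}} \left( \frac{\varepsilon}{1-t} \right)^{1/2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3/2} h\left(\frac{\varepsilon}{1-t}\right)^{1/2}. \]
Hence, for $1 - t \gg \varepsilon$, $d^\varepsilon(t)$ is negligible compared to $u^\varepsilon_1(t)$ in $L^\infty$. This completes the proof of Lemma 4.

The proof of the next lemma is similar, and uses the regularity $f \in \mathcal{H}$.

Lemma 5. There exists $C_* = C_*(\|f\|_{\mathcal{H}})$ such that in the region $\{1 - t \ge C_*\varepsilon\}$, the derivatives of $u^\varepsilon_{app}$ satisfy almost the same estimates as the derivatives of $u^\varepsilon_1$ in $L^\infty$, that is, there exists $C = C(\|f\|_{\mathcal{H}})$ such that
\[ \|\varepsilon\partial_x u^\varepsilon_{app}(t)\|_{L^\infty} \le \frac{C}{\sqrt{1-t}}, \tag{3.8} \]
\[ \|J^\varepsilon(t)u^\varepsilon_{app}\|_{L^\infty} \le \frac{C}{\sqrt{1-t}} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle. \tag{3.9} \]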
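3.4. The equation satisfied by the approximate solution. From Sect. 2 and more precisely from Eq. (2.8), the approximate solution $u^\varepsilon_{app}$ solves the cubic nonlinear Schrödinger equation up to the error term
\[ \Delta^\varepsilon(t,x) := |u^\varepsilon_{app}|^2 u^\varepsilon_{app}(t,x) - \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon} + ig^\varepsilon(t,\xi)}\, \frac{|f(-\xi)|^2}{1-t}\, a_0(\xi)\, đ\xi. \tag{3.10} \]

Lemma 6. There exist $C = C(\|f\|_{H^2})$ and $C_* = C_*(\|f\|_{H^2})$ such that uniformly in the region $\{1 - t \ge C_*\varepsilon\}$,
\[ \|\Delta^\varepsilon(t)\|_{L^2_x} \le C\, \frac{\varepsilon}{(1-t)^2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{2}. \tag{3.11} \]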
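Proof. Write $\Delta^\varepsilon(t,x) = \Delta^\varepsilon(t,x) + |u^\varepsilon_1|^2 u^\varepsilon_1(t,x) - |u^\varepsilon_1|^2 u^\varepsilon_1(t,x)$, and introduce
\[ \widetilde{\Delta}^\varepsilon(t,x) := |u^\varepsilon_{app}|^2 u^\varepsilon_{app}(t,x) - |u^\varepsilon_1|^2 u^\varepsilon_1(t,x). \]
We prove that $\widetilde{\Delta}^\varepsilon$ satisfies the estimate stated in Lemma 6. The other estimate needed to complete the proof of Lemma 6 is easier and is left out. First remark that
\[ \|\widetilde{\Delta}^\varepsilon(t,.)\|_{L^2_x} \le C \left( \|u^\varepsilon_{app}(t)\|_{L^\infty_x}^2 + \|u^\varepsilon_1(t)\|_{L^\infty_x}^2 \right) \|(u^\varepsilon_{app} - u^\varepsilon_1)(t)\|_{L^2_x}. \]
One has obviously
\[ \|u^\varepsilon_1(t)\|_{L^\infty_x}^2 \le \frac{1}{1-t}\, \|f\|_{L^\infty}^2. \]
From Lemma 4, $u^\varepsilon_{app}$ satisfies the same estimate in the region we are considering. Hence,
\[ \|\widetilde{\Delta}^\varepsilon(t,.)\|_{L^2} \le C(\|f\|_{H^2})\, \frac{1}{1-t}\, \|(u^\varepsilon_{app} - u^\varepsilon_1)(t)\|_{L^2_x}. \tag{3.12} \]
From Lemma 1 with $\sigma^\varepsilon = e^{ig^\varepsilon(t,\xi)} a_0(\xi)$, we finally have
\[ \|(u^\varepsilon_{app} - u^\varepsilon_1)(t)\|_{L^2_x} \le C\, \frac{\varepsilon}{1-t}\, \left\| e^{ig^\varepsilon(t,\xi)} a_0(\xi) \right\|_{H^2_\xi}, \]
and it is easy to check that $\widetilde{\Delta}^\varepsilon$ satisfies the estimate announced in Lemma 6.

The following lemma is the extension of Lemma 6 we will need for the proof of Theorem 2, and its proof is similar.

Lemma 7. There exist $C = C(\|f\|_{\mathcal{H}})$ and $C_* = C_*(\|f\|_{\mathcal{H}})$ such that uniformly in the region $\{1 - t \ge C_*\varepsilon\}$,
\[ \begin{aligned} \|J^\varepsilon(t)\Delta^\varepsilon(t)\|_{L^2_x} &\le C\, \frac{\varepsilon}{(1-t)^2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3}, \\ \|\varepsilon\partial_x \Delta^\varepsilon(t)\|_{L^2_x} &\le C\, \frac{\varepsilon}{(1-t)^2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3}. \end{aligned} \tag{3.13} \]

4. Energy Estimates

In this section, we derive the three energy estimates we will use to justify nonlinear geometric optics. Recall that the exact solution $u^\varepsilon$ and the approximate solution $u^\varepsilon_{app}$ satisfy
\[ \begin{aligned} i\varepsilon\partial_t u^\varepsilon + \frac{1}{2}\varepsilon^2\partial_x^2 u^\varepsilon &= \lambda\varepsilon|u^\varepsilon|^2 u^\varepsilon, \\ i\varepsilon\partial_t u^\varepsilon_{app} + \frac{1}{2}\varepsilon^2\partial_x^2 u^\varepsilon_{app} &= \lambda\varepsilon|u^\varepsilon_{app}|^2 u^\varepsilon_{app} - \varepsilon\Delta^\varepsilon, \end{aligned} \]
where $\Delta^\varepsilon$ is defined by (3.10) and is estimated in Lemmas 6 and 7. Introduce the remainder $w^\varepsilon := u^\varepsilon - u^\varepsilon_{app}$. Subtracting the previous two equations, one has
\[ i\varepsilon\partial_t w^\varepsilon + \frac{1}{2}\varepsilon^2\partial_x^2 w^\varepsilon = \lambda\varepsilon \left( |u^\varepsilon|^2 u^\varepsilon - |u^\varepsilon_{app}|^2 u^\varepsilon_{app} \right) + \varepsilon\Delta^\varepsilon. \tag{4.1} \]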
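Multiplying the previous equation by $\overline{w^\varepsilon}$ and taking the imaginary part of the result integrated in $x$, it follows
\[ \begin{aligned} \partial_t \|w^\varepsilon(t)\|_{L^2} &\le C \left( \|u^\varepsilon(t)\|_{L^\infty}^2 + \|u^\varepsilon_{app}(t)\|_{L^\infty}^2 \right) \|w^\varepsilon(t)\|_{L^2} + C\|\Delta^\varepsilon(t)\|_{L^2} \\ &\le C \left( \|w^\varepsilon(t)\|_{L^\infty}^2 + \|u^\varepsilon_{app}(t)\|_{L^\infty}^2 \right) \|w^\varepsilon(t)\|_{L^2} + C\|\Delta^\varepsilon(t)\|_{L^2}. \end{aligned} \tag{4.2} \]
Differentiating (4.1) with respect to $x$ and multiplying by $\varepsilon\partial_x \overline{w^\varepsilon}$, one has similarly
\[ \begin{aligned} \partial_t \|\varepsilon\partial_x w^\varepsilon(t)\|_{L^2} &\le C \left( \|w^\varepsilon(t)\|_{L^\infty}^2 + \|u^\varepsilon_{app}(t)\|_{L^\infty}^2 \right) \|\varepsilon\partial_x w^\varepsilon(t)\|_{L^2} \\ &\quad + C\|w^\varepsilon(t)\|_{L^2}\, \|u^\varepsilon_{app}(t)\|_{L^\infty}\, \|\varepsilon\partial_x u^\varepsilon_{app}(t)\|_{L^\infty} \\ &\quad + C\|w^\varepsilon(t)\|_{L^\infty}^2\, \|\varepsilon\partial_x u^\varepsilon_{app}(t)\|_{L^2} + C\|\varepsilon\partial_x \Delta^\varepsilon(t)\|_{L^2}. \end{aligned} \tag{4.3} \]
Finally, since from Lemma 2 $J^\varepsilon$ commutes with the Schrödinger operator and acts on the nonlinearity we are considering as a differentiation, we also have
\[ \begin{aligned} \partial_t \|J^\varepsilon(t)w^\varepsilon\|_{L^2} &\le C \left( \|w^\varepsilon(t)\|_{L^\infty}^2 + \|u^\varepsilon_{app}(t)\|_{L^\infty}^2 \right) \|J^\varepsilon(t)w^\varepsilon\|_{L^2} \\ &\quad + C\|w^\varepsilon(t)\|_{L^2}\, \|u^\varepsilon_{app}(t)\|_{L^\infty}\, \|J^\varepsilon(t)u^\varepsilon_{app}\|_{L^\infty} \\ &\quad + C\|w^\varepsilon(t)\|_{L^\infty}^2\, \|J^\varepsilon(t)u^\varepsilon_{app}\|_{L^2} + C\|J^\varepsilon(t)\Delta^\varepsilon\|_{L^2}. \end{aligned} \tag{4.4} \]
The main idea to justify nonlinear geometric optics is to integrate those three energy estimates so long as $\|w^\varepsilon(t)\|_{L^\infty}$ is not greater than $\|u^\varepsilon_{app}(t)\|_{L^\infty}$. Since $w^\varepsilon$ is expected to be a remainder, this case actually occurs in “sufficiently” large regions, as we will see in the next section.

5. Justification of Nonlinear Geometric Optics Before the Caustic

We now illustrate the method announced above. From Lemma 4, the “so long” condition writes for instance
\[ \|w^\varepsilon(t)\|_{L^\infty} \le \frac{4\|f\|_{L^\infty}}{\sqrt{1-t}}. \tag{5.1} \]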
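From Inequality (3.4) and Lemma 3,
\[ \|w^\varepsilon(0)\|_{L^\infty} \le C\|w^\varepsilon(0)\|_{L^2}^{1/2}\, \|J^\varepsilon(0)w^\varepsilon\|_{L^2}^{1/2} \le C\varepsilon|\log\varepsilon|^{5/2}. \]
Hence there exists $\varepsilon^* = \varepsilon^*(\|f\|_{\mathcal{H}}) > 0$ such that for $0 < \varepsilon \le \varepsilon^*$, $\|w^\varepsilon(0)\|_{L^\infty} \le 2\|f\|_{L^\infty}$. By continuity, Condition (5.1) is satisfied for $0 \le t \le T_\varepsilon$ for some $T_\varepsilon > 0$. Then so long as (5.1) holds, we can integrate the three energy estimates using Gronwall lemma. Since (4.3) and (4.4) are very similar, introduce the following norm,
\[ \|w^\varepsilon(t)\|_Y := \max\left( \|\varepsilon\partial_x w^\varepsilon(t)\|_{L^2},\ \|J^\varepsilon(t)w^\varepsilon\|_{L^2} \right). \tag{5.2} \]
Now we can write (3.4) as
\[ \|w^\varepsilon(t)\|_{L^\infty} \le \frac{C}{\sqrt{1-t}}\, \|w^\varepsilon(t)\|_{L^2}^{1/2}\, \|w^\varepsilon(t)\|_Y^{1/2}. \]
So long as (5.1) holds, estimate (4.2) can be written as follows,
\[ \partial_t \|w^\varepsilon(t)\|_{L^2} \le C_1\, \frac{\|f\|_{L^\infty}^2}{1-t}\, \|w^\varepsilon(t)\|_{L^2} + C\|\Delta^\varepsilon(t)\|_{L^2}, \tag{5.3} \]
where $C_1$ is a universal constant that does not depend on $f$. Denote $C_0 := C_1\|f\|_{L^\infty}^2$. From the Gronwall lemma, we can integrate the previous inequality as follows,
\[ \|w^\varepsilon(t)\|_{L^2} \le \frac{\|w^\varepsilon(0)\|_{L^2}}{(1-t)^{C_0}} + C \int_0^t \left( \frac{1-s}{1-t} \right)^{C_0} \|\Delta^\varepsilon(s)\|_{L^2}\, ds. \tag{5.4} \]
From Lemma 5, and from Lemma 7, Inequalities (4.3) and (4.4) can also be written
\[ \begin{aligned} \partial_t \|w^\varepsilon(t)\|_Y &\le \frac{C_0}{1-t}\, \|w^\varepsilon(t)\|_Y + \frac{C}{1-t}\, \|w^\varepsilon(t)\|_{L^2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle \\ &\quad + \frac{C}{1-t}\, \|w^\varepsilon(t)\|_{L^2}\, \|w^\varepsilon(t)\|_Y \left\langle \log\frac{1-t}{\varepsilon} \right\rangle \\ &\quad + C\, \frac{\varepsilon}{(1-t)^2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3}, \end{aligned} \tag{5.5} \]
where we possibly increased the value of $C_1$ (this question will be addressed more precisely in Sect. 6). To estimate the integral of the right-hand side of (5.4), we use Lemmas 6 and 7. The integral is not greater than
\[ \frac{C\varepsilon}{(1-t)^{C_0}} \int_0^t \frac{1}{(1-s)^{2-C_0}} \left\langle \log\frac{1-s}{\varepsilon} \right\rangle^{2} ds. \tag{5.6} \]
For $j > 0$, we are thus led to study
\[ \int_0^t \frac{1}{(1-s)^{2-C_0}} \left\langle \log\frac{1-s}{\varepsilon} \right\rangle^{j} ds. \]
We take $j > 0$ and not only $j = 2$ because to estimate $\|w^\varepsilon(t)\|_Y$, we will have to deal with similar integrals with $j = 3$. With the substitution $\sigma = \frac{1-s}{\varepsilon}$, it becomes
\[ \varepsilon^{C_0-1} \int_{\frac{1-t}{\varepsilon}}^{\frac{1}{\varepsilon}} \frac{\langle \log\sigma \rangle^{j}}{\sigma^{2-C_0}}\, d\sigma. \tag{5.7} \]
Since in Lemmas 4, 6 and 7, we had to restrict our attention to the region $1 - t \gg \varepsilon$, we can replace $\langle \log\sigma \rangle$ with $|\log\sigma|$ in the previous integral with no change in the asymptotics, and we have to study
\[ \int_{\frac{1-t}{\varepsilon}}^{\frac{1}{\varepsilon}} \frac{(\log\sigma)^{j}}{\sigma^{2-C_0}}\, d\sigma, \quad j > 0. \tag{5.8} \]
To estimate these integrals, we have to distinguish two cases, namely $C_0 < 1$ and $C_0 \ge 1$.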
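The passage from (5.3) to (5.4) and the change of variables leading to (5.7) can be sketched as follows. The shorthand $y$, $F$ and $\ell$ is introduced here only for the computation and does not appear in the original text.

```latex
% Shorthand (ours): y(t) := \|w^\varepsilon(t)\|_{L^2}, F(t) := source term in (5.3),
% so that (5.3) reads  y'(t) \le \frac{C_0}{1-t}\, y(t) + F(t)  for 0 \le t < 1.
% Integrating factor (1-t)^{C_0}:
\begin{align*}
\frac{d}{dt}\Big[(1-t)^{C_0}\, y(t)\Big]
  &= (1-t)^{C_0}\Big(y'(t) - \frac{C_0}{1-t}\, y(t)\Big)
   \le (1-t)^{C_0} F(t), \\
y(t) &\le \frac{y(0)}{(1-t)^{C_0}}
      + \int_0^t \Big(\frac{1-s}{1-t}\Big)^{C_0} F(s)\, ds ,
\end{align*}
% which is (5.4). For the integral in (5.6), write \ell(\sigma) for the
% logarithmic weight and substitute \sigma = (1-s)/\varepsilon,
% ds = -\varepsilon\, d\sigma, (1-s)^{C_0-2} = \varepsilon^{C_0-2}\sigma^{C_0-2}:
\[
  \int_0^t \frac{\ell\big((1-s)/\varepsilon\big)^{j}}{(1-s)^{2-C_0}}\, ds
  = \varepsilon^{C_0-1}\int_{(1-t)/\varepsilon}^{1/\varepsilon}
      \frac{\ell(\sigma)^{j}}{\sigma^{2-C_0}}\, d\sigma ,
\]
% which is (5.7).
```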
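5.1. Case $C_0 < 1$. In this case, one has obviously $2 - C_0 > 1$, hence the integral (5.8) is convergent. More precisely, we have to estimate the remainder of a converging integral. Integration by parts shows that for $b > a \gg 1$,
\[ \int_a^b \frac{(\log\sigma)^j}{\sigma^{2-C_0}}\, d\sigma = O\left( \frac{(\log a)^j}{a^{1-C_0}} \right). \]
With $a = \frac{1-t}{\varepsilon}$, it follows that the energy estimate (5.4) becomes
\[ \|w^\varepsilon(t)\|_{L^2} \le C_2\, \frac{\varepsilon}{1-t} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{2}. \tag{5.9} \]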
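The integration by parts invoked above can be made explicit. The following sketch uses the shorthand $\gamma := 1 - C_0 \in (0,1)$ and $I(a,b)$, both introduced here; it assumes $a > 1$ is large enough that $\log a \ge 2j/\gamma$.

```latex
% Shorthand (ours): \gamma := 1 - C_0 > 0,
%   I(a,b) := \int_a^b (\log\sigma)^j\, \sigma^{-1-\gamma}\, d\sigma, \quad b > a > 1.
\begin{align*}
I(a,b)
  &= \Big[-\frac{(\log\sigma)^j}{\gamma\,\sigma^{\gamma}}\Big]_a^b
     + \frac{j}{\gamma}\int_a^b (\log\sigma)^{j-1}\,\sigma^{-1-\gamma}\, d\sigma \\
  &\le \frac{(\log a)^j}{\gamma\, a^{\gamma}}
     + \frac{j}{\gamma\,\log a}\, I(a,b),
\end{align*}
% using (\log\sigma)^{j-1} \le (\log\sigma)^{j} / \log a for \sigma \ge a > 1
% and dropping the non-positive boundary term at \sigma = b.
% If \log a \ge 2j/\gamma, the last term is at most I(a,b)/2, hence
\[
  I(a,b) \le \frac{2}{\gamma}\,\frac{(\log a)^j}{a^{\gamma}} .
\]
% With a = (1-t)/\varepsilon and the prefactors of (5.6) and (5.7):
\[
  \frac{C\varepsilon}{(1-t)^{C_0}}\;\varepsilon^{C_0-1}\;
  \frac{2}{\gamma}\Big(\frac{\varepsilon}{1-t}\Big)^{1-C_0}
  \Big(\log\frac{1-t}{\varepsilon}\Big)^{j}
  = \frac{2C}{\gamma}\,\frac{\varepsilon}{1-t}\,
    \Big(\log\frac{1-t}{\varepsilon}\Big)^{j},
\]
% which has the form of (5.9) for j = 2 and of (5.11) for j = 3.
```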
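Let $\alpha > 0$. Then if $1 - t \ge C_*\varepsilon$ where $C_*$ is such that
\[ C_2\, \frac{(\log C_*)^3}{C_*} = \alpha, \]
where $C_2$ is the constant in (5.9), Inequality (5.5) becomes
\[ \partial_t \|w^\varepsilon(t)\|_Y \le \frac{C_0 + \alpha}{1-t}\, \|w^\varepsilon(t)\|_Y + \frac{C\varepsilon}{(1-t)^2} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3}. \tag{5.10} \]
Taking $\alpha > 0$ sufficiently small, we have $C_0 + \alpha < 1$, hence we can replace $C_0 + \alpha$ with $C_0$ with no change in the result. Now we can apply Gronwall lemma to (5.10),
\[ \|w^\varepsilon(t)\|_Y \le \frac{\|w^\varepsilon(0)\|_Y}{(1-t)^{C_0}} + \frac{C\varepsilon}{(1-t)^{C_0}} \int_0^t \frac{1}{(1-s)^{2-C_0}} \left\langle \log\frac{1-s}{\varepsilon} \right\rangle^{3} ds. \]
The previous estimate with $j = 3$ yields
\[ \|w^\varepsilon(t)\|_Y \le C\, \frac{\varepsilon}{1-t} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3}. \tag{5.11} \]
From (3.4),
\[ \|w^\varepsilon(t)\|_{L^\infty} \le \frac{C}{\sqrt{1-t}}\, \frac{\varepsilon}{1-t} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{5/2}. \]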
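Hence, there exists $C_* = C_*(f)$ such that for $0 < \varepsilon \le \varepsilon^*$ and in the region $\{1 - t \ge C_*\varepsilon\}$, condition (5.1) is always satisfied, and estimates (5.9) and (5.11) hold, which we can summarize in the following proposition.

Proposition 2. Define $\delta := C_1^{-1/2}$. If $f \in \mathcal{H}$ satisfies $\|f\|_{L^\infty} < \delta$, then nonlinear geometric optics is uniformly valid in the region $\{1 - t \ge C_*\varepsilon\}$ for some (large) $C_* = C_*(f)$, with the estimates,
\[ \begin{aligned} \|a^\varepsilon(t,.) - a_0\|_{L^2} &\le C\, \frac{\varepsilon}{1-t} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{2}, \\ \|\partial_\xi(a^\varepsilon(t,.) - a_0)\|_{L^2},\ \|\xi(a^\varepsilon(t,\xi) - a_0(\xi))\|_{L^2} &\le C\, \frac{\varepsilon}{1-t} \left\langle \log\frac{1-t}{\varepsilon} \right\rangle^{3}. \end{aligned} \]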
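5.2. Case $C_0 \ge 1$. Now the integral (5.8) is divergent. Integration by parts shows that for $b > a \gg 1$,
\[ \int_a^b \frac{(\log\sigma)^j}{\sigma^{2-C_0}}\, d\sigma = O\left( b^{C_0-1} (\log b)^j \right). \]
Then the energy estimate (5.4) becomes
\[ \|w^\varepsilon(t)\|_{L^2} \le C\, \frac{\varepsilon}{(1-t)^{C_0}}\, |\log\varepsilon|^2. \tag{5.12} \]
Let $\alpha > 0$. First, we restrict our study to the region
\[ \frac{\varepsilon}{(1-t)^{C_0}}\, |\log\varepsilon|^2 \left\langle \log\frac{1-t}{\varepsilon} \right\rangle \le 2\alpha. \tag{5.13} \]
Then the energy estimate (5.5) becomes, when we take only the “worst” terms into account,
\[ \partial_t \|w^\varepsilon(t)\|_Y \le \frac{C_0 + 2\alpha}{1-t}\, \|w^\varepsilon(t)\|_Y + \frac{C\varepsilon}{(1-t)^{1+C_0}}\, |\log\varepsilon|^2 \left\langle \log\frac{1-t}{\varepsilon} \right\rangle. \tag{5.14} \]
Applying the Gronwall lemma and proceeding as for the $L^2$-norm yields
\[ \|w^\varepsilon(t)\|_Y \le C\, \frac{\varepsilon|\log\varepsilon|^3}{(1-t)^{C_0+2\alpha}}. \tag{5.15} \]
From (3.4),
\[ \|w^\varepsilon(t)\|_{L^\infty} \le \frac{C}{\sqrt{1-t}}\, \frac{\varepsilon}{(1-t)^{C_0+\alpha}}\, |\log\varepsilon|^{5/2}. \]
Hence, for $1 - t \gg (\varepsilon|\log\varepsilon|^{5/2})^{1/(C_0+\alpha)}$ and $\varepsilon$ sufficiently small, condition (5.1) is always satisfied, and estimates (5.12) and (5.15) hold. Notice that in this region, for $\varepsilon$ sufficiently small, (5.13) is automatically satisfied.

Proposition 3. Take $\delta$ as in Proposition 2. Assume $f \in \mathcal{H}$ satisfies $\|f\|_{L^\infty} \ge \delta$. Let $\alpha > 0$. Then there exists $C_\alpha^*$ such that nonlinear geometric optics is uniformly valid in the region $\{1 - t \ge C_\alpha^* (\varepsilon|\log\varepsilon|^{5/2})^{1/(C_0+\alpha)}\}$, where $C_0 = \|f\|_{L^\infty}^2/\delta^2$, with the estimates,
\[ \begin{aligned} \|a^\varepsilon(t,.) - a_0\|_{L^2} &\le C\, \frac{\varepsilon}{(1-t)^{C_0}}\, |\log\varepsilon|^2, \\ \|\partial_\xi(a^\varepsilon(t,.) - a_0)\|_{L^2},\ \|\xi(a^\varepsilon(t,\xi) - a_0(\xi))\|_{L^2} &\le C\, \frac{\varepsilon}{(1-t)^{C_0+2\alpha}}\, |\log\varepsilon|^3. \end{aligned} \]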
Propositions 2 and 3 imply Theorem 2, up to the computation of the smallness constant we find with this method, which we shall perform in the next section.
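One way to see how the $L^2$ and first-order estimates of Propositions 2 and 3 combine into the $H^s \cap \mathcal{F}(H^s)$ rates of Theorem 2 is by interpolation. The sketch below assumes the standard interpolation inequality stated in its first comment; the shorthand $r$ and $\ell$ is introduced here only for the computation.

```latex
% Assumption: for 0 \le s \le 1,
%   \|r\|_{H^s \cap F(H^s)} \le C\, \|r\|_{L^2}^{1-s}\, \|r\|_{H^1 \cap F(H^1)}^{s}.
% Shorthand (ours): r := a^\varepsilon(t,\cdot) - a_0, and \ell \ge 1 denotes the
% logarithmic factor of Proposition 2.
% Case \|f\|_{L^\infty} < \delta (Proposition 2):
\[
  \|r\|_{H^s \cap F(H^s)}
  \lesssim \Big(\frac{\varepsilon}{1-t}\,\ell^{2}\Big)^{1-s}
           \Big(\frac{\varepsilon}{1-t}\,\ell^{3}\Big)^{s}
  = \frac{\varepsilon}{1-t}\,\ell^{\,2+s}.
\]
% Case \|f\|_{L^\infty} \ge \delta (Proposition 3), with 0 < 1-t \le 1:
\[
  \|r\|_{H^s \cap F(H^s)}
  \lesssim \Big(\frac{\varepsilon|\log\varepsilon|^{2}}{(1-t)^{C_0}}\Big)^{1-s}
           \Big(\frac{\varepsilon|\log\varepsilon|^{3}}{(1-t)^{C_0+2\alpha}}\Big)^{s}
  = \frac{\varepsilon\,|\log\varepsilon|^{2+s}}{(1-t)^{C_0+2s\alpha}} .
\]
% Exponent bookkeeping: 2(1-s) + 3s = 2 + s and
% C_0(1-s) + (C_0 + 2\alpha)s = C_0 + 2s\alpha.
```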
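6. Interpretation

6.1. Computation of $\delta$. We now focus on the case $\|f\|_{L^\infty} < \delta$, and compute the best constant given by our method. From Sect. 5.1, we have to compute the coefficient in the factor of $\|w^\varepsilon(t)\|_{L^2}$ in Inequality (4.2), and the constant that appears in the first line of the right-hand side of (4.3) and (4.4). Indeed, for these last two inequalities, we proved that the other terms can be either absorbed (provided we remain sufficiently “far” from the caustic), or considered as a small source term.

For Inequality (4.2), we multiplied (4.1) by $\overline{w^\varepsilon}$, then took the imaginary part of the result integrated in space. Write
\[ |u^\varepsilon|^2 u^\varepsilon - |u^\varepsilon_{app}|^2 u^\varepsilon_{app} = |u^\varepsilon|^2 w^\varepsilon + (|u^\varepsilon|^2 - |u^\varepsilon_{app}|^2)\, u^\varepsilon_{app}. \]
With the method of energy estimates, the first term will vanish, and the second is written
\[ \left( |w^\varepsilon|^2 + 2\,\mathrm{Re}(\overline{w^\varepsilon} u^\varepsilon_{app}) \right) u^\varepsilon_{app}. \]
Hence, we can rewrite (4.2) more precisely as
\[ \partial_t \|w^\varepsilon(t)\|_{L^2} \le 2|\lambda| \left( 2\|w^\varepsilon(t)\|_{L^\infty} + \|u^\varepsilon_{app}(t)\|_{L^\infty} \right) \|u^\varepsilon_{app}(t)\|_{L^\infty}\, \|w^\varepsilon(t)\|_{L^2} + \text{source term}. \tag{6.1} \]
For Inequality (4.3), we differentiate $|u^\varepsilon|^2 u^\varepsilon - |u^\varepsilon_{app}|^2 u^\varepsilon_{app}$, with the result
\[ (u^\varepsilon)^2\, \varepsilon\partial_x \overline{u^\varepsilon} - (u^\varepsilon_{app})^2\, \varepsilon\partial_x \overline{u^\varepsilon_{app}} + 2\left( |u^\varepsilon|^2\, \varepsilon\partial_x u^\varepsilon - |u^\varepsilon_{app}|^2\, \varepsilon\partial_x u^\varepsilon_{app} \right). \]
The very last term will be considered as a source term. The term before is written
\[ |u^\varepsilon|^2\, \varepsilon\partial_x u^\varepsilon = |u^\varepsilon|^2\, \varepsilon\partial_x w^\varepsilon + |u^\varepsilon|^2\, \varepsilon\partial_x u^\varepsilon_{app}. \]
When we take the imaginary part, the term $|u^\varepsilon|^2 \varepsilon\partial_x w^\varepsilon$ vanishes, and the other term is made of source terms and of “absorbed” terms. Finally, the only relevant term will be $(u^\varepsilon)^2 \varepsilon\partial_x \overline{w^\varepsilon}$, and we can rewrite (4.3) as
\[ \partial_t \|\varepsilon\partial_x w^\varepsilon(t)\|_{L^2} \le 2|\lambda| \left( \|w^\varepsilon(t)\|_{L^\infty} + \|u^\varepsilon_{app}(t)\|_{L^\infty} \right)^2 \|\varepsilon\partial_x w^\varepsilon(t)\|_{L^2} + \text{absorbed terms} + \text{source terms}. \tag{6.2} \]
Since from (3.5), $J^\varepsilon$ acts on the nonlinearity as a differentiation, the computation is exactly the same as with $\varepsilon\partial_x$, and we have
\[ \partial_t \|J^\varepsilon(t)w^\varepsilon\|_{L^2} \le 2|\lambda| \left( \|w^\varepsilon(t)\|_{L^\infty} + \|u^\varepsilon_{app}(t)\|_{L^\infty} \right)^2 \|J^\varepsilon(t)w^\varepsilon\|_{L^2} + \text{absorbed terms} + \text{source terms}. \tag{6.3} \]
Now notice that in Lemma 4, we could have obtained the estimate
\[ \|u^\varepsilon_{app}(t)\|_{L^\infty} \le \frac{1+\beta}{\sqrt{1-t}}\, \|f\|_{L^\infty}, \]
for any $\beta > 0$, provided that we take $C_*$ sufficiently large. Similarly, for Condition (5.1), we could have taken
\[ \|w^\varepsilon(t)\|_{L^\infty} \le \frac{\beta}{\sqrt{1-t}}\, \|f\|_{L^\infty}. \]
Obviously, the smaller $\beta$ is, the smaller $\varepsilon^*$ is, and the larger $C^*$ in Theorem 2. We see that for any $\beta > 0$, we can take $C_0 = 2(1+2\beta)^2|\lambda|\,\|f\|_{L^\infty}^2$, which proves that we can take $\delta = |2\lambda|^{-1/2}$, and completes the proof of Theorem 2.

6.2. Proof of Corollary 1. Existence. Recall the scaling (1.7). For every $\varepsilon > 0$, $\psi^\varepsilon$ is the unique solution in $C(\mathbb{R}, \Sigma)$ of the initial value problem
\[ \begin{cases} i\partial_t \psi^\varepsilon + \frac{1}{2}\partial_x^2 \psi^\varepsilon = \lambda|\psi^\varepsilon|^2 \psi^\varepsilon, \\ \psi^\varepsilon_{|t=-1/\varepsilon} = \sqrt{\varepsilon}\, f(\varepsilon x)\, e^{-i\varepsilon\frac{x^2}{2} - i\lambda|f(\varepsilon x)|^2\log\varepsilon}. \end{cases} \tag{6.4} \]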
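For $t < 0$, define $\psi_{app}$ by
\[ \psi_{app}(t,x) := \int e^{-i\frac{t}{2}\xi^2 + ix\xi + i\lambda|f(-\xi)|^2\log|t|}\, a_0(\xi)\, đ\xi. \]
From Theorem 2, if $\|f\|_{L^\infty} < |\lambda|^{-1/2}$, then there exist $\varepsilon^*$ and $C^*$ such that for $0 < \varepsilon \le \varepsilon^*$ and $0 \le s \le 1$,
\[ \|\psi^\varepsilon(t) - \psi_{app}(t)\|_{H^s} \le C\, \frac{(\log|t|)^{2+s}}{|t|}, \tag{6.5} \]
uniformly for $-1/\varepsilon \le t \le -C^*$. Moreover, since the operator $J^\varepsilon$ is nothing but the classical Galilean operator $J(t)$ up to the scaling (1.7), we also have
\[ \|J(t)\psi^\varepsilon - J(t)\psi_{app}\|_{L^2} \le C\, \frac{(\log|t|)^{3}}{|t|}, \tag{6.6} \]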
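uniformly for $-1/\varepsilon \le t \le -C^*$.

Proposition 4. Assume $\|f\|_{L^\infty} < |\lambda|^{-1/2}$. Then there exists $C^* > 0$ such that (ψ ε (−C ∗ ))0 0 such that for any $t \le -C^*$,
\[ \|\psi_{app}(t)\|_{L^\infty} \le \frac{\|f\|_{L^\infty} + \beta}{|t|^{1/2}}. \]
Then for $t \le -C^*$ and from (6.17), (6.18) becomes
\[ \partial_t \|\phi(t)\|_{L^2} \le \left( \frac{\widetilde{C}}{|t|} + \frac{C}{|t|^{\alpha+1/2}} \right) \|\phi(t)\|_{L^2}, \]
with $\widetilde{C} := |\lambda|\,(\|f\|_{L^\infty} + \beta)^2 < 1$. For $t_0 \le t \le -C^*$, the Gronwall lemma gives
\[ \|\phi(t)\|_{L^2} \le C\|\phi(t_0)\|_{L^2} \left( \frac{t_0}{t} \right)^{\widetilde{C}}. \]
Using the assumptions again, we have
\[ \|\phi(t)\|_{L^2} \le C\, \frac{1}{|t_0|^{\alpha}} \left( \frac{t_0}{t} \right)^{\widetilde{C}}. \]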
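Given our choice for $\beta$, $\alpha > \widetilde{C}$. Fix $t = -C^*$. The right-hand side goes to zero when $t_0$ goes to $-\infty$. Hence $\phi(-C^*) = 0$, and $\phi \equiv 0$ from the uniqueness for (1.2) in $C(\mathbb{R}_t, L^2 \cap L^\infty)$ (see [7]). This proves Proposition 1 and completes the proof of Corollary 1.

Remark 9. For Proposition 1, we need the assumption $f \in H^2(\mathbb{R})$ because it is the minimum regularity we assumed for Lemma 4.

7. Construction of the Modified Scattering Operator and Application

7.1. Proof of Corollary 2. We first recall the main result in [9] for nonlinear Schrödinger equation, which corresponds to the notion of asymptotic completeness of the modified wave operators introduced in [11].

Theorem 3 ([9], Theorem 1.2, case $n = 1$). Let $\varphi \in \Sigma$, with $\|\varphi\|_\Sigma = \delta' \le \delta$, where $\delta$ is sufficiently small. Let $\psi \in C(\mathbb{R}_t, \Sigma)$ be the solution of the initial value problem (6.14), with $C^* = 0$. Then there exist unique functions $W \in L^2 \cap L^\infty$ and $\phi \in L^\infty$ such that for $t \ge 1$,
\[ \left\| \left( \mathcal{F} U_0(-t)\psi \right)(t)\, \exp\left( -i\frac{\lambda}{2\pi} \int_1^t |\hat{\psi}(\tau)|^2\, \frac{d\tau}{\tau} \right) - W \right\|_{L^2 \cap L^\infty} \le C\delta'\, t^{-\alpha + C(\delta')^2}, \tag{7.1} \]
\[ \left\| \frac{\lambda}{2\pi} \int_1^t |\hat{\psi}(\tau)|^2\, \frac{d\tau}{\tau} - \lambda|W|^2\log t - \phi \right\|_{L^\infty} \le C\delta'\, t^{-\alpha + C(\delta')^2}, \tag{7.2} \]
where $C\delta' < \alpha < 1/4$, and $\phi$ is a real valued function. Furthermore we have the asymptotic formula for large time $t$,
\[ \psi(t,x) = \frac{1}{(it)^{1/2}}\, W\left(\frac{x}{t}\right) \exp\left( i\frac{x^2}{2t} + i\lambda\left| W\left(\frac{x}{t}\right) \right|^2 \log t + i\phi\left(\frac{x}{t}\right) \right) + O\left( \delta'\, t^{-1/2 - \alpha + C(\delta')^2} \right) \tag{7.3} \]
and the estimate
\[ \left\| \mathcal{F}\left( U_0(-t)\psi \right)(t) - W \exp\left( i\lambda|W|^2\log t + i\phi \right) \right\|_{L^2 \cap L^\infty} \le C\delta'\, t^{-\alpha + C(\delta')^2}. \tag{7.4} \]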
Remark 10. Uniqueness follows from (7.1) and (7.2), which make it possible to define $W$ and $\phi$. The asymptotics (7.3) and (7.4) are immediate consequences of (7.1) and (7.2). In particular, if we replace $(W, \phi)$ with $(W e^{i\tilde{\phi}}, \phi - \tilde{\phi})$, where $\tilde{\phi} \in L^\infty$, then (7.3) and (7.4) still hold.

Remark 11. This theorem states “almost” asymptotic completeness for small data for the modified wave operators introduced in [11]. Indeed, no regularity for the momenta of $\psi$ is proved in [11]. In Corollary 1, we limit the loss of regularity, and in particular obtain for $\psi$ that required in Theorem 3.

Proof of Corollary 2. Let $\psi_- \in \mathcal{F}(\mathcal{H})$, with $\|\widehat{\psi}_-\|_{L^\infty} < (\pi/|\lambda|)^{1/2}$. From Corollary 1, there exists a unique $\psi \in C(\mathbb{R}, \Sigma)$ solution of (1.2) satisfying (1.15), (1.16). The first step is then to check that for $\psi_-$ sufficiently small, $\|\psi(0)\|_\Sigma < \delta$, so that we can use the results of Theorem 3. The second step consists in defining $\psi_+$. From Duhamel’s formula, one has
\[ \psi(0) = U_0(C^*)\psi(-C^*) - i\lambda \int_{-C^*}^{0} U_0(-s)\, |\psi|^2\psi(s)\, ds. \]
On the other hand, we saw that for $C^* \gg 1$,
\[ \begin{aligned} \|U_0(C^*)\psi(-C^*)\|_\Sigma &\le \|U_0(C^*)\psi_{app}(-C^*)\|_\Sigma + \|U_0(C^*)(\psi - \psi_{app})(-C^*)\|_\Sigma \\ &\le C\|\psi_-\| \langle \log C^* \rangle + C\, \frac{\langle \log C^* \rangle^{4}}{C^*}. \end{aligned} \]
From local estimates for (1.2), we see that there exist functions $h_j$, $j = 1, 2, 3$, with $h_1(x) \to 0$ as $x \to +\infty$, $h_2$ increasing, and $h_3(x) \to 0$ as $x \to 0$, such that
\[ \|\psi(0)\|_\Sigma \le h_1(C^*) + h_2(C^*)\, h_3(\|\psi_-\|). \tag{7.5} \]
Taking first $C^*$ sufficiently large, then $\psi_-$ sufficiently small, we see that we can have $\|\psi(0)\|_\Sigma < \delta$. Then Theorem 3 provides (unique) functions $W$ and $\phi$. Define $\psi_+$ by
\[ \psi_+ := \mathcal{F}^{-1}\left( W e^{i\phi} \right) \in L^2(\mathbb{R}). \tag{7.6} \]
From (7.3) and (7.4), we have, in $L^2$,
\[ \psi(t) \mathop{\sim}_{t \to +\infty} e^{i\lambda\left| \mathcal{F}\psi_+\left(\frac{x}{t}\right) \right|^2 \log t}\, U_0(t)\psi_+, \]
which, along with Corollary 1, yields Corollary 2.
7.2. Consequences for nonlinear geometric optics. In Sect. 2, we mentioned the fact that to describe the asymptotics of $u^\varepsilon$ after the caustic, one needs a modified scattering operator. Now that we have one, we can describe $u^\varepsilon$ globally. We first give a heuristic approach, then prove Corollary 3.

We already noticed that the phase $g^\varepsilon$ (hence the symbol $a^\varepsilon$) is defined only for $t < 1$. If we want a global description, we have to replace $g^\varepsilon$ with a phase $\phi^\varepsilon$ which is defined for all $t$, and coincides asymptotically with $g^\varepsilon$ for $t < 1$. To guess which possible $\phi^\varepsilon$ we can choose, recall Scaling (1.7). The function $\psi^\varepsilon$ solves (1.2), and we saw that, for $t \in\, ]-\infty, T]$, where $T$ is finite,
\[ \psi^\varepsilon(t) \mathop{\longrightarrow}_{\varepsilon \to 0} \psi(t) \quad \text{in } L^2 \cap L^\infty. \]
Hence we have
\[ i\partial_t \psi^\varepsilon + \frac{1}{2}\partial_x^2 \psi^\varepsilon = \lambda|\psi|^2\psi^\varepsilon + \text{small}. \tag{7.7} \]
Forget the “small” term. We now have to study a linear Schrödinger equation, with a time-dependent potential $\lambda|\psi|^2$. According to the vocabulary used in [3], this is not a short range potential, for it does not belong to $L^1_t(L^\infty_x)$. A scattering theory for long range potentials is available (see for instance [3]). The first idea is due to Dollard and consists in studying
\[ \psi^\varepsilon(t,x) \exp\left( -i\lambda \int_0^t |\psi|^2(s, s\xi)\, ds \right) \]
in order to get rid of the long range part. In our context, this means that we can replace $g^\varepsilon$ with
\[ \phi^\varepsilon(t,\xi) := -\lambda \int_{-1/\varepsilon}^{\frac{t-1}{\varepsilon}} |\psi|^2(s, s\xi)\, ds + \lambda|f(-\xi)|^2 \log\frac{1}{\varepsilon}. \tag{7.8} \]
The symbol $a^\varepsilon$ is now defined (globally in time) by
\[ u^\varepsilon(t,x) = \frac{1}{\sqrt{\varepsilon}} \int e^{-i\frac{t-1}{2\varepsilon}\xi^2 + i\frac{x\xi}{\varepsilon} + i\phi^\varepsilon(t,\xi)}\, a^\varepsilon(t,\xi)\, đ\xi. \tag{7.9} \]
Now from Corollary 1, one has, for $t < 1$,
\[ |\psi(s, s\xi)| = \frac{1}{|2\pi s|^{1/2}}\, |\widehat{\psi}_-(\xi)| + o(1) \quad \text{in } L^1_t(L^\infty_x), \]
hence, in $L^\infty_{t,\mathrm{loc}}(0, 1; L^\infty_x)$,
\[ \phi^\varepsilon(t,\xi) = g^\varepsilon(t,\xi) + o(1). \tag{7.10} \]
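At leading order, (7.10) can be checked by a direct integration. The sketch below assumes that relation (1.8), which is not reproduced in this section, gives $|\widehat{\psi}_-(\xi)|^2 = 2\pi|f(-\xi)|^2$; this normalization is consistent with the two smallness thresholds $(\pi/|\lambda|)^{1/2}$ and $|2\lambda|^{-1/2}$ quoted in the text, but it remains an assumption. Error terms are ignored.

```latex
% Assumption: |\hat\psi_-(\xi)|^2 = 2\pi\,|f(-\xi)|^2  (from (1.8), not shown here).
% For 0 \le t < 1 the integration variable s in (7.8) stays negative,
% and at leading order |\psi(s,s\xi)|^2 \approx |\hat\psi_-(\xi)|^2 / (2\pi|s|)
% = |f(-\xi)|^2 / |s|.
\begin{align*}
\int_{-1/\varepsilon}^{(t-1)/\varepsilon}\frac{ds}{|s|}
  &= \int_{(1-t)/\varepsilon}^{1/\varepsilon}\frac{dr}{r}
   = \log\frac{1}{\varepsilon} - \log\frac{1-t}{\varepsilon}
   = -\log(1-t), \\
\phi^\varepsilon(t,\xi)
  &\approx \lambda|f(-\xi)|^2\log(1-t) + \lambda|f(-\xi)|^2\log\frac{1}{\varepsilon}
   = \lambda|f(-\xi)|^2\log\frac{1-t}{\varepsilon},
\end{align*}
% which is the phase obtained by integrating (2.6) with \beta = 2 and the
% initial datum g|_{t=0} = \lambda|f(-\xi)|^2\log(1/\varepsilon).
```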
Therefore, even with this new definition of $a^\varepsilon$, we have, for $t < 1$,
\[ a^\varepsilon(t,\xi) \mathop{\longrightarrow}_{\varepsilon \to 0} a_0(\xi) \quad \text{in } L^2. \]
Similarly, for $t > 1$ and from Theorem 3, there exists a function $H$ (that depends on $\psi$) such that in $L^\infty_{t,\mathrm{loc}}(1, 2; L^\infty_x)$,
\[ \phi^\varepsilon(t,\xi) = -\lambda|W(\xi)|^2 \log\frac{t-1}{\varepsilon} + H(\xi) + o(1). \tag{7.11} \]
In particular, since
\[ a^\varepsilon(t,\xi) = e^{-i\phi^\varepsilon(t,\xi)}\, \mathcal{F}\left( U_0\left(\frac{1-t}{\varepsilon}\right) \psi^\varepsilon\left(\frac{t-1}{\varepsilon}\right) \right) \]
and the map $\varphi \mapsto (W, \phi)$ in Theorem 3 is continuous,
\[ a^\varepsilon(t,\xi) \mathop{\longrightarrow}_{\varepsilon \to 0} e^{-iH(\xi) + i\phi(\xi)}\, W(\xi) = e^{-iH(\xi)}\, \widehat{\psi}_+, \quad \text{in } L^2. \tag{7.12} \]
Apparently, the limit of $a^\varepsilon$ depends on this function $H$. One must bear in mind that this function $H$ is closely related to our choice in the definition of the new phase $\phi^\varepsilon$. For instance, one can check that replacing $\phi^\varepsilon$ with
\[ \phi^\varepsilon(t,\xi) + h_1(\xi) \int_{-1/\varepsilon}^{\frac{t-1}{\varepsilon}} h_2(s)\, ds, \]
where $h_1 \in L^\infty$, $h_2 \in L^1$, would just alter the definition of $H$. Thus this function appears as a parameter in the definition of $a^\varepsilon$. Nevertheless, the asymptotics for $u^\varepsilon$ is independent of $H$. It is given, in $L^2$, by (7.12), (7.11) and the first part of Lemma 1. This leads to the asymptotics given in Corollary 3 for $t > 1$. The asymptotics for $t < 1$ is a simple consequence of Theorem 2 and (7.10). This completes the proof of Corollary 3.

Acknowledgements. I would like to thank Professor A. Bressan for his invitation to SISSA, where this work was carried out. This research was supported by the European TMR ERBFMRXCT960033.
References
1. Barab, J. E.: Nonexistence of asymptotically free solutions for nonlinear Schrödinger equation. J. Math. Phys. 25, 3270–3273 (1984)
2. Carles, R.: Geometric optics with caustic crossing for some nonlinear Schrödinger equations. Indiana Univ. Math. J. 49, 475–551 (2000)
3. Dereziński, J., and Gérard, C.: Scattering theory of quantum and classical N-particle systems. Texts and Monographs in Physics, Berlin–Heidelberg: Springer Verlag, 1997
4. Duistermaat, J. J.: Oscillatory integrals, Lagrangian immersions and unfolding of singularities. Comm. Pure Appl. Math. 27, 207–281 (1974)
5. Ginibre, J.: Introduction aux équations de Schrödinger non linéaires. Cours de DEA, Paris Onze Édition (1995)
6. Ginibre, J.: An introduction to nonlinear Schrödinger equations. In: Nonlinear waves (Sapporo, 1995). Gakkōtosho, R. Agemi and Y. Giga and T. Ozawa (eds.), GAKUTO International Series, Math. Sciences and Appl., 1997, pp. 85–133
7. Ginibre, J., and Velo, G.: On a class of nonlinear Schrödinger equations. III. Special theories in dimensions 1, 2 and 3. Annales de l’Institut Henri Poincaré. Section A. Physique Théorique. Nouvelle Série 28, 287–316 (1978)
8. Ginibre, J., and Velo, G.: Long Range Scattering and Modified Wave Operators for some Hartree Type Equations III. Gevrey spaces and low dimensions. J. Diff. Eq., to appear
9. Hayashi, N., and Naumkin, P.: Asymptotics for large time of solutions to the nonlinear Schrödinger and Hartree equations. Am. J. Math. 120, 369–389 (1998)
10. Hunter, J., and Keller, J.: Caustics of nonlinear waves. Wave Motion 9, 429–443 (1987)
11. Ozawa, T.: Long range scattering for nonlinear Schrödinger equations in one space dimension. Commun. Math. Phys. 139, 479–493 (1991)
12. Strauss, W.: Nonlinear scattering theory. In: Scattering theory in mathematical physics, J. Lavita and J. P. Marchands (eds.), Dordrecht: Reidel, 1974
13. Strauss, W.: Nonlinear scattering theory at low energy. J. Funct. Anal. 41, 110–133 (1981)

Communicated by A. Kupiainen
Commun. Math. Phys. 220, 69 – 94 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Regularized Products and Determinants
Georg Illies
IHES, Le Bois-Marie, 35, Route de Chartres, 91440 Bures-sur-Yvette, France. E-mail: [email protected]
Received: 4 April 2000 / Accepted: 15 January 2001
Present address: Algebra und Zahlentheorie, Fachbereich Mathematik, Universität – Gesamthochschule Siegen, Walter-Flex-Str. 3, 57068 Siegen, Germany. E-mail: [email protected]

Abstract: Zeta-regularized products are used to define determinants of operators in infinite dimensional spaces. This article provides a general theory of regularized products and determinants which delivers a better approach to their existence and explicit determination.

1. Introduction

The zeta-regularized product of a sequence $a_k \in \mathbb{C}^*$ is defined by
$$\prod_{k=1}^{\infty} a_k := \exp\left(-\frac{d}{ds}\sum_{k=1}^{\infty} a_k^{-s}\,\Big|_{s=0}\right), \tag{1.1}$$
provided that the Dirichlet series converges absolutely in a half plane and can be meromorphically continued to the left of $\Re(s) = 0$; the evaluation at $s = 0$ means the constant term of the Laurent expansion. This obviously generalizes the ordinary finite product. Zeta-regularization was first used to define analytic torsion [RS] and since then has played a role in global analysis, the theory of dynamical zeta functions, and Arakelov theory. Theoretical physicists use zeta-regularization as a method for renormalization in quantum field theories [EORBZ] and various papers (e.g. [Ef, Ko1, Ko2, Ko3, Sa]) have calculated the regularized determinant of Laplacians. Zeta-regularization also appeared in a conjectural cohomological approach to motivic L-functions ([De1, De2, De3, Ma]). In that context the question appeared as to which meromorphic functions of finite order (e.g. motivic L-functions) are zeta-regularized, i.e. can be represented as
$$f(z) = \prod_{\rho} (z - \rho)^{\pm 1}, \tag{1.2}$$
where the product is over all zeroes and poles $\rho$ of $f(z)$ with multiplicities, the sign of the exponent being positive for zeroes and negative for poles. This turns out to be the basic problem of zeta-regularized determinants and it was the starting point of the following investigation which, we hope, gives a satisfying answer to the question.

Regularization entails several technical problems because of the meromorphic continuation of the Dirichlet series. For example, the regularized product of all primes does not exist as $\sum_p p^{-s}$ has the natural boundary $\Re(s) = 0$ ([LW]). The aim of this paper is to give a better approach to regularized products, improving the formalism in [Vo, CV, QHS] and [JL1] (compare Sect. 6 below) which is based on the representation of the Dirichlet series as the Mellin transform of the series
$$\theta(t) := \sum_{k=1}^{\infty} e^{a_k t}. \tag{1.3}$$
In many applications $\arg(a_k)$ varies in such a way that this series does not converge for any $t$, for example in the product (1.2). This problem can be solved by instead using a kind of Hankel integral of the Dirichlet series (see Sect. 7). To treat the product (1.2) one has by definition to regard the function $\zeta(s, z) := \sum_{\rho} \pm (z - \rho)^{-s}$. This paper is also thought of as an examination of the analytic and asymptotic properties of this generalized Hurwitz zeta function, which should be interesting for its own sake.

Before giving a short overview we introduce the notion of a divisor, which is basic for all that follows. A divisor $D$ is given by a function $m_D : \mathbb{C} \to \mathbb{Z}$ such that there is a $\beta > 0$ with
$$\sum_{\rho \in \mathbb{C}} \frac{|m_D(\rho)|}{|\rho|^{\beta}} < \infty. \tag{1.4}$$
Condition (1.4) reflects that the Dirichlet series in Definition (1.1) must converge absolutely in a half plane. We recall a fundamental fact from the theory of entire functions of finite order (compare [Ti] for the proof): A function $m_D : \mathbb{C} \to \mathbb{Z}$ gives rise to a divisor if and only if there is a meromorphic function $f(z)$ of finite order (i.e. $f(z)$ is the quotient of two entire functions of finite order) such that
$$m_D(\rho) = \operatorname{ord}_{z=\rho} f(z), \qquad \rho \in \mathbb{C},$$
thus $D$ is the divisor of $f(z)$ in the usual sense. This function $f(z)$ is determined by $D$ up to an exponential polynomial, i.e. a function $g(z)$ is meromorphic of finite order with divisor $D$ if and only if there is a polynomial $P(z)$ with $g(z) = e^{P(z)} f(z)$.

After introducing some notation (Sect. 2) we define a general class of regularized products in Sect. 3, zeta-regularization being just an example; the rhs of (1.2) with the multiplicities $m_D(\rho)$, in the case of its existence, is called regularized determinant and denoted by
$$\Delta_D(z) := \prod_{\rho} (z - \rho)^{\pm 1}. \tag{1.5}$$
We prove that $\Delta_D(z)$ is a meromorphic function of finite order with divisor $D$, thus equals $e^{P(z)} f(z)$ for a certain polynomial $P(z)$. Regularization means finding this polynomial. Section 4, a sort of theoretical excursion, discusses axiomatic generalizations of the regularization process and shows that a theory of regularization should deal with quasi-directed divisors (defined in Sect. 2).
If the Dirichlet series in the definition of regularized products also satisfies certain exponential estimates and does not have too many poles, we speak of bounded regularizability (Sect. 5). In that case one can apply certain integral transformations, especially the Mellin transform, the mentioned Hankel integral and the Laplace transform, to get Theorems 3, 4 and 6 of Sects. 6, 7 and 9. They give the equivalence of bounded regularizability with certain asymptotics for $\theta_D(t)$, $\zeta_D(s, z)$ and for the function $\theta_D(t, s)$ which is defined as the Laplace transform of $\zeta_D(s, z)$. As a corollary of Theorem 4 one gets Theorem 5 in Sect. 7, which is the fundamental theorem of the theory of regularization. It states that $D$ is bounded regularizable if and only if for some $0 < \psi_i < \pi$, $i = 1, 2$, and $\varepsilon > 0$ an asymptotic
$$\log f(z) = \sum_{i} z^{\alpha_i} \log^{n_i} z + o(|z|^{-\varepsilon}), \qquad |z| \to \infty, \tag{1.6}$$
with finite sum, $\alpha_i \in \mathbb{C}$ and $n_i \in \mathbb{N}_0$, is valid for $-\psi_2 < \arg(z) < \psi_1$. $\Delta_D(z)$ exists in that case, and the polynomial $P(z)$ can also be determined in terms of this asymptotic of $\log f(z)$, which is very intrinsic.

These results deliver a satisfying theory of regularization and apply to a large class of examples. In Sect. 8.1, for instance, it is shown that every meromorphic function of finite order representable by a Dirichlet series is regularized; this improves results of Jorgenson and Lang ([JL1, JL2]) who had to assume that it also satisfies a functional equation. In 8.2 we regularize higher $\Gamma$-functions. Thus Sect. 8 is applicable to various kinds of zeta and L-functions.

The function $\theta_D(t, s)$, introduced in Sect. 9, is a type of multivalued theta function and plays a central role in [Il2]. There is also an alternative approach to regularization via renormalizing certain divergent integrals (Sect. 10). The following three sections contain technical proofs which were postponed.

The article reproduces the main results of Chapter 2 of my thesis [Il1] in a more special context. In some cases we only give sketches of the proofs; for complete proofs, generalizations and further results the reader is referred to [Il1]. In [Il2] it is shown how to apply the theory of regularization to generalize results of Cramér ([Cr]) and Guinand ([Gui]), thus improving results of [JL2].
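The remark that definition (1.1) generalizes the ordinary finite product can be checked numerically. The following is only a minimal sketch, not taken from the paper: the function names and the sample sequence are illustrative assumptions, the powers use the principal branch, and the derivative at $s = 0$ is approximated by a central difference.

```python
import cmath

def dirichlet_sum(a, s):
    """Finite Dirichlet series sum_k a_k^{-s}, principal branch of the power."""
    return sum(cmath.exp(-s * cmath.log(ak)) for ak in a)

def zeta_regularized_product(a, h=1e-5):
    """exp(-d/ds sum_k a_k^{-s} at s = 0), derivative by a central difference."""
    derivative = (dirichlet_sum(a, h) - dirichlet_sum(a, -h)) / (2 * h)
    return cmath.exp(-derivative)

# Hypothetical sample sequence of nonzero complex numbers.
a = [2.0, 3.0 + 1.0j, -1.5j, 0.25]

ordinary = 1
for ak in a:
    ordinary *= ak

regularized = zeta_regularized_product(a)
print(abs(regularized - ordinary))  # small, limited by the finite difference
```

For a finite sequence the derivative at $s = 0$ is $-\sum_k \log a_k$, so the exponential reproduces the ordinary product regardless of the branch chosen for the logarithm.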
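2. Notation

In the sequel $f(z)$ denotes a meromorphic function of finite order and $D$ its divisor. We define two important parameters: the exponent $r$ of $f$ and $D$ is the infimum of all $\beta > 0$ satisfying (1.4); the genus $g$ of $f$ and $D$ is the smallest $n \in \mathbb{N}_0$ such that (1.4) is satisfied for $\beta = n + 1$; note $g + 1 \ge r \ge g$. We will say that $D$ lies in a set $M \subset \mathbb{C}$ if $m_D(\rho) \neq 0$ implies that $\rho \in M$. Let $0 < \varphi_i < \pi$, $i = 1, 2$; then we define open connected sets
$$Wr_{\varphi_1,\varphi_2} := \{ z \in \mathbb{C}^* \mid -\varphi_2 < \arg(z) < \varphi_1 \}, \qquad Wl_{\varphi_1,\varphi_2} := \mathbb{C}^* \setminus \overline{Wr_{\varphi_1,\varphi_2}},$$
and a contour $C_{\varphi_1,\varphi_2}$ consisting of the ray from $e^{-\varphi_2 i}\infty$ to $0$ and the ray from $0$ to $e^{\varphi_1 i}\infty$; thus $\mathbb{C} = Wl_{\varphi_1,\varphi_2} \cup C_{\varphi_1,\varphi_2} \cup Wr_{\varphi_1,\varphi_2}$ is a disjoint union.

[Figure: the $z$-plane with axes $\Re(z)$ and $\Im(z)$, showing the contour $C_{\varphi_1,\varphi_2}$, the angles $\varphi_1$ and $\varphi_2$, the sector $Wr_{\varphi_1,\varphi_2}$ and the region $Wl_{\varphi_1,\varphi_2}$.]

A divisor $D$ is called directed if it lies in a $Wl_{\varphi_1,\varphi_2}$. It is called quasi-directed if it is directed with the exception of finitely many $\rho$, and it is called strictly directed if it lies in a $Wl_{\varphi_1,\varphi_2}$ with $\varphi_1 > \frac{\pi}{2}$ and $\varphi_2 > \frac{\pi}{2}$. We will also write $\rho \in D$ instead of $m_D(\rho) \neq 0$ and use the following notation:
$$\sum_{\rho \in D} \varphi(\rho) := \sum_{\rho \in \mathbb{C}} m_D(\rho)\,\varphi(\rho).$$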
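For a finite divisor the conditions "directed" and "strictly directed" reduce to simple tests on arguments. The sketch below is an illustration only: the function names, the representation of a divisor as a dictionary from points to multiplicities, and the sample divisor are assumptions, and the principal argument in $(-\pi, \pi]$ is used.

```python
import cmath
import math

def in_Wr(z, phi1, phi2):
    """True if z != 0 and -phi2 < arg(z) < phi1 (the open right sector)."""
    return z != 0 and -phi2 < cmath.phase(z) < phi1

def in_Wl(z, phi1, phi2):
    """True if z != 0 lies strictly outside the closed right sector."""
    if z == 0:
        return False
    arg = cmath.phase(z)  # principal argument
    return arg > phi1 or arg < -phi2

def is_directed_in(divisor, phi1, phi2):
    """Check that every rho with nonzero multiplicity lies in Wl_{phi1,phi2}."""
    return all(in_Wl(rho, phi1, phi2) for rho, m in divisor.items() if m != 0)

# Hypothetical finite divisor: rho -> multiplicity m_D(rho).
D = {-1.0: 1, -2.0 + 1.0j: 2, 3.0j: -1}

print(is_directed_in(D, math.pi / 3, math.pi / 3))          # directed in this sector
print(is_directed_in(D, 2 * math.pi / 3, 2 * math.pi / 3))  # fails: 3i lies in Wr
```

The second call fails because the point $3i$ has argument $\pi/2$, so this sample divisor does not lie in this particular $Wl_{\varphi_1,\varphi_2}$ with both angles above $\pi/2$.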
3. Xi Functions and Regularization

Definition 3.1. If $D$ is a directed divisor, $U_D := \{ z \in \mathbb{C} \mid |z| < |\rho|,\ \rho \in D \}$ and the argument is chosen so that $-\pi < \arg(z - \rho) < \pi$, then
$$\xi_D(s, z) := \sum_{\rho \in D} \frac{\Gamma(s)}{(z - \rho)^{s}}, \qquad \Re(s) > r, \quad z \in U_D, \tag{3.1}$$
is called the Hurwitz xi function of $D$; $\xi_D(s) := \xi_D(s, 0)$ is called the xi function of $D$. Convergence is absolute and $\xi_D(s, z)$ is holomorphic in both variables.

Proposition 3.2. $\xi_D(s, z)$ satisfies the following differential equation:
$$\frac{d}{dz}\,\xi_D(s, z) = -\xi_D(s + 1, z). \tag{3.2}$$
A function $f(z)$ is meromorphic of finite order with divisor $D$ if and only if for some $l \ge g$:
$$\left(\frac{d}{dz}\right)^{l+1} \log f(z) = (-1)^{l}\,\xi_D(l + 1, z). \tag{3.3}$$

Proof. Equation (3.2) follows by taking the term by term derivative. For (3.3) check that $\log \Delta^{\mathrm{Wei},l}_{D,a}(z)$ defined in (3.6) below satisfies (3.3) and observe that the operation $\left(\frac{d}{dz}\right)^{l+1}$ exactly kills exponential polynomials of degree $\le l$.
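For a finite divisor the differential equation (3.2) can be verified numerically. This is a rough sketch under stated assumptions: the divisor is a hypothetical finite example stored as a dictionary, $s$ is taken real and positive so that `math.gamma` applies, and Python's complex power supplies the principal branch $-\pi < \arg(z - \rho) < \pi$.

```python
import math

def xi(divisor, s, z):
    """Hurwitz xi function of a finite divisor for real s > 0.

    Each term is m * Gamma(s) * (z - rho)^(-s), principal branch of the power.
    """
    gamma_s = math.gamma(s)
    return sum(m * gamma_s * complex(z - rho) ** (-s) for rho, m in divisor.items())

# Hypothetical finite divisor and sample point with |z| < |rho| for all rho.
D = {-1.0: 1, -2.0 + 1.0j: 2, 3.0j: -1}
s, z, h = 2.5, 0.3 + 0.2j, 1e-5

# Left-hand side of (3.2) by a central difference in z.
lhs = (xi(D, s, z + h) - xi(D, s, z - h)) / (2 * h)
# Right-hand side of (3.2).
rhs = -xi(D, s + 1.0, z)

print(abs(lhs - rhs))  # small, limited by the finite difference
```

The agreement reflects the termwise identity $\frac{d}{dz}\,\Gamma(s)(z-\rho)^{-s} = -\Gamma(s+1)(z-\rho)^{-(s+1)}$.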
Proposition 3.3. For $z \in U_D$ the following absolutely convergent Taylor series expansion is valid:
$$\xi_D(s, z) = \sum_{m=0}^{\infty} (-1)^{m}\,\xi_D(s + m)\,\frac{z^{m}}{m!}. \tag{3.4}$$
If $\xi_D(s)$ is meromorphic for $\Re(s) > -p$ then $\xi_D(s, z)$ is also meromorphic for $\Re(s) > -p$ and holomorphic for $z \in U$ for any simply connected $U \subset \mathbb{C}$ with $U_D \subset U$ and $\rho \notin U$ for all $\rho \in D$.

Proof. The Taylor series follows from (3.2); for the meromorphy in $s$ observe that shifting the coefficients does not change the convergence radius. The continuation in $z$ is obtained by treating finitely many $\rho \in D$ separately.

Definition 3.4. A regularization sequence $\delta$ is a sequence of complex numbers $\delta_0, \delta_1, \ldots$ with $\delta_0 = 1$. Formally let $\delta(s) := \delta_0 + \delta_1 s + \delta_2 s^{2} + \ldots$. A directed divisor $D$ is called regularizable if $\xi_D(s)$ is meromorphic in a half plane $\Re(s) > -\varepsilon$ with $\varepsilon > 0$. For $z \in U$,
$$\Delta_D(z) := \exp\big(-\mathrm{CT}_{s=0}(\delta(s)\,\xi_D(s, z))\big) \tag{3.5}$$
is called the $\delta$-regularized determinant of $D$. One calls $\Delta_D(0)$ the $\delta$-regularized product of $D$.

Note that $\mathrm{CT}_{s=0}$ means the constant term in the Laurent expansion at $s = 0$. If $\delta(s)$ is a divergent series, then one has to develop $\xi_D(s, z)$ in a Laurent series and multiply it formally with the formal series for $\delta(s)$. In the sequel there will often appear formulas which must be interpreted in this formal sense.

Examples.
1) xi-regularized determinant (Jorgenson, Lang): $\delta(s) = 1$,
2) zeta-regularized determinant: $\delta(s) = \Gamma^{-1}(s + 1)$,
3) zero-renormalized determinant: $\delta(s) = \Gamma(1 - s)$.

Remark. The factor $\delta(s)$ is introduced for several reasons. First, one wants to handle "scaled" products $\prod a(z - \rho)$ (compare [De1, De2, De3]). It also turned out that the canonical way of renormalization (see Theorem 7 in Sect. 10) differs from zeta-regularization. A further reason is that in [JL1] xi-regularization was used, which is technically the simplest regularization. While zeta-regularization as well as zero-renormalization generalize the ordinary finite product (as every regularization with $\delta_1 = \gamma$ does, $\gamma$ the Euler-Mascheroni constant), xi-regularization does not. Zeta-regularization satisfies the product rule $\prod \rho^{n} = (\prod \rho)^{n}$ and so comes closest to what one would expect for a product.

Fix $a \in \mathbb{C}$ with $m_D(a) = 0$ and $\ell \ge g$; then we define the absolutely convergent Weierstrass product
$$\Delta^{\mathrm{Wei},\ell}_{D,a}(z) := \prod_{\rho \in \mathbb{C}} \left[ \left(1 - \frac{z - a}{\rho - a}\right) \exp\left( \sum_{k=1}^{\ell} \frac{1}{k} \left(\frac{z - a}{\rho - a}\right)^{k} \right) \right]^{m_D(\rho)}, \tag{3.6}$$
which is a meromorphic function of finite order with divisor $D$. For $a = 0$ and $g = \ell$ one has the usual canonical Weierstrass product (compare [Ti]).
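The statement that zeta-regularization and zero-renormalization both have $\delta_1 = \gamma$, while xi-regularization has $\delta_1 = 0$, is easy to check numerically. The sketch below is illustrative only: the function names are assumptions, the value of the Euler-Mascheroni constant is hard-coded, and $\delta_1$ is estimated by a central difference of $\delta(s)$ at $s = 0$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def delta1(delta, h=1e-5):
    """Coefficient delta_1 of delta(s) = 1 + delta_1 s + ..., by a central difference."""
    return (delta(h) - delta(-h)) / (2 * h)

def xi_regularization(s):
    return 1.0

def zeta_regularization(s):
    return 1.0 / math.gamma(s + 1.0)

def zero_renormalization(s):
    return math.gamma(1.0 - s)

for name, delta in [("xi", xi_regularization),
                    ("zeta", zeta_regularization),
                    ("zero", zero_renormalization)]:
    print(name, delta1(delta))
```

Both $1/\Gamma(1+s)$ and $\Gamma(1-s)$ expand as $1 + \gamma s + \ldots$ at $s = 0$, which is what the finite differences reproduce.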
Theorem 1. $\Delta_D(z)$ is a meromorphic function of finite order with divisor $D$. The explicit relation to Weierstrass products is given by
$$\Delta_D(z) = e^{P(z)}\,\Delta^{\mathrm{Wei},\ell}_{D,a}(z), \tag{3.7}$$
where with a suitable branch of the logarithm
$$P(z) = \sum_{m=0}^{\ell} \frac{(z - a)^{m}}{m!}\,\log^{(m)} \Delta_D(a), \tag{3.8}$$
$$\log^{(m)} \Delta_D(z) = (-1)^{m+1}\,\mathrm{CT}_{s=0}(\delta(s)\,\xi_D(s + m, z)), \qquad m = 0, 1, \ldots.$$

In the sequel a meromorphic function of finite order $f(z)$ with divisor $D$ will be called $\delta$-regularized if it equals $\Delta_D(z)$.

Proof of Theorem 1. We have
$$\left(\frac{d}{dz}\right)^{m+1} \log \Delta_D(z) = (-1)^{m}\,\mathrm{CT}_{s=0}(\delta(s)\,\xi_D(s + m + 1, z)) \tag{3.9}$$
$$= (-1)^{m}\,\xi_D(m + 1, z) \quad \text{for } m \ge g; \tag{3.10}$$
the first equation holds because of (3.2) and is valid for all $m \in \mathbb{N}_0$. It is also true that $\xi_D(s, z)$ is holomorphic for $s = m + 1$ if $m \ge g$ by the definition of $g$. Comparison with (3.3) proves the first assertion. One also easily checks
$$\left(\frac{d}{dz}\right)^{m+1} \log \Delta^{\mathrm{Wei},\ell}_{D,a}(z)\Big|_{z=a} = \begin{cases} 0 & \text{for } m = -1, 0, \ldots, \ell - 1, \\ (-1)^{m}\,\xi_D(m + 1, a) & \text{for } m \ge \ell. \end{cases}$$
Using this as well as (3.9) and (3.10), the explicit relation to Weierstrass products follows by subtracting the Taylor series expansion around $z = a$ for $\log \Delta^{\mathrm{Wei},\ell}_{D,a}(z)$ from that for $\log \Delta_D(z)$.

4. Determinant Systems

We will call a function $f(z)$ associated to a divisor $D$ if there is a polynomial $P(z)$ such that
$$f(z) = e^{P(z)}\,\Delta^{\mathrm{Wei},g}_{D,a}(z) \quad \text{and} \quad \deg P \le g,$$
or equivalently that (3.3) is satisfied for $l = g$ (compare the proof of (3.3)). Observe that in this definition we have set $\ell = g$ in (3.6). Note also that if $\Delta^{\mathrm{Wei},g}_{D,a}(z)$ is, in addition, entire then its order is exactly $r$ and no entire function with divisor $D$ can have a smaller order (compare [Ti]). So associated functions have minimal order because of $g \le r$. By Theorem 1 regularization means picking out a certain associated function to a divisor. Now we ask for extensions of this process to non-regularizable divisors.

For $\alpha \in \mathbb{C}$ we define the translated divisor $D|_{+\alpha}$ by $m_{D|_{+\alpha}}(z + \alpha) := m_D(z)$ and the sum $D_1 + D_2$ by $m_{D_1 + D_2}(z) := m_{D_1}(z) + m_{D_2}(z)$. Let $\mathcal{D}_{\mathrm{fin}}$ be the abelian group of all finite divisors, $\mathcal{D}_{\mathrm{breg}}$ that of all bounded regularizable quasi-directed divisors (compare Sect. 5), $\mathcal{D}_{\mathrm{reg}}$ that of all regularizable quasi-directed divisors, i.e. those which are regularizable directed after eliminating finitely many $\rho$, $\mathcal{D}_{\mathrm{qd}}$ that of all quasi-directed divisors, and $\mathcal{D}$ that of all divisors. These are all translation-invariant with proper inclusions $\mathcal{D}_{\mathrm{fin}} \subset \mathcal{D}_{\mathrm{breg}} \subset \mathcal{D}_{\mathrm{reg}} \subset \mathcal{D}_{\mathrm{qd}} \subset \mathcal{D}$. The following definition arises from the demand for generalizations of the characteristic polynomial to the infinite dimensional case.
Definition 4.1. Let $\mathcal{D}' \subseteq \mathcal{D}$ be a translation-invariant subgroup. A determinant $\Delta$ on $\mathcal{D}'$ attaches to every $D \in \mathcal{D}'$ an associated function $\Delta_D(z)$, such that:
i) $\Delta_{D|_{+\alpha}}(z + \alpha) = \Delta_D(z)$ (translation-invariance)
ii) $\Delta_{D_1 + D_2}(z) = \Delta_{D_1}(z)\,\Delta_{D_2}(z)$ (linearity)
for $D, D_1, D_2 \in \mathcal{D}'$, $\alpha \in \mathbb{C}$. $(\mathcal{D}', \Delta)$ is called a determinant system.

Examples.
1) $(\mathcal{D}_{\mathrm{fin}}, \Delta)$ with the characteristic "polynomial", which is a rational function defined by $\Delta_D(z) := \prod_{\rho \in \mathbb{C}} (z - \rho)^{m_D(\rho)}$.
2) $(\mathcal{D}_{\mathrm{reg}}, \Delta)$ with the $\delta$-regularized determinant $\Delta$. For a $D \in \mathcal{D}_{\mathrm{reg}}$ which is not directed, $\Delta_D(z)$ can be defined by translation-invariance.

Theorem 2 answers the question of how large determinant systems can be.

Theorem 2. a) There is no determinant system $(\mathcal{D}, \Delta)$. b) For every regularization sequence $\delta$ there is a determinant system $(\mathcal{D}_{\mathrm{qd}}, \Delta)$ which is an extension of the $\delta$-regularized determinant system $(\mathcal{D}_{\mathrm{reg}}, \Delta)$.

Proof. a) Let $D$ be the divisor that consists only of zeroes of order one lying at the lattice points $\rho = m + ni$, $m, n \in \mathbb{Z}$. From translation invariance one gets $\Delta_D(z + 1) = \Delta_D(z)$ and $\Delta_D(z + i) = \Delta_D(z)$. Hence $\Delta_D(z)$ must be a doubly-periodic entire non-constant function, which is impossible by Liouville's theorem. b) (Idea) One has to choose the exponential polynomials consistently with linearity and translation-invariance. This leads to a system of infinitely many linear equations with infinitely many variables which can be reduced to finite systems by Zorn's Lemma. (See Sect. 11 for a complete proof.)

Remark. Not every determinant system is extendable, so b) is an aesthetic property of regularization. The proof is non-constructive and its extensions are not uniquely determined. The meaning of regularization is that it gives large constructively defined determinant systems.

5. Bounded Regularizability, Singularities and Asymptotics

In this section we introduce the special case of bounded regularizability of divisors and give all the necessary technical definitions to formulate the results of Sects. 6, 7 and 9, which state its equivalence to various asymptotics.

Definition 5.1. Let $D$ be a directed divisor, $0 < \sigma_i < \pi$ for $i = 1, 2$ and $p \in \mathbb{R} \cup \{\infty\}$; then $D$ resp. $\xi_D(s)$ are called $(\sigma_1, \sigma_2)$-bounded $p$-regular if:
i) $\xi_D(s)$ is meromorphic for $\Re(s) > -p$.
ii) $\xi_D(s)$ has only finitely many poles in the strip $\alpha_1 < \Re(s) < \alpha_2$ for any $-p < \alpha_1 < \alpha_2$.
iii) For all $-p < \alpha_1 < \alpha_2$ and $\sigma_1' < \sigma_1$, $\sigma_2' < \sigma_2$,
$$\xi_D(s) = \begin{cases} O\big(e^{(\frac{\pi}{2} - \sigma_2')\,\Im(s)}\big) & \text{for } \Im(s) \to \infty, \\ O\big(e^{-(\frac{\pi}{2} - \sigma_1')\,\Im(s)}\big) & \text{for } \Im(s) \to -\infty \end{cases}$$
in the strip $\alpha_1 < \Re(s) < \alpha_2$.
We simply say bounded $p$-regular if there are $0 < \sigma_i < \pi$ such that $(\sigma_1, \sigma_2)$-bounded $p$-regularity is valid. We have bounded regularizability if $p > 0$. Note that every directed divisor $D$ in $Wl_{\varphi_1,\varphi_2}$ is $(\varphi_1, \varphi_2)$-bounded $(-r)$-regular, as follows from Stirling's formula.

Definition 5.2. A pB-system consists of:
1. A finite or infinite sequence of pairs $(p_n, B_n(z))_{n=0,1,2,\ldots}$ with complex numbers $p_n$ satisfying $\Re(p_0) \le \Re(p_1) \le \ldots \le \Re(p_n) \le \ldots$ and polynomials $B_n(z) \in \mathbb{C}[z]$, $B_n(z) = \sum_k b_{n,k} z^{k}$.
2. An abscissa $p \in \mathbb{R} \cup \{\infty\}$ such that $p > \Re(p_n)$ for all $n$, and in addition for infinite sequences: $p = \lim_{n \to \infty} \Re(p_n)$.

pB-systems capture the simultaneous information about the occurring singular part distributions and the occurring asymptotics.

Example. If the divisor $D$ is $(\sigma_1, \sigma_2)$-bounded $p$-regular, then there is a pB-system $(p_n, B_n(z))_{n=0,1,2,\ldots}$ with abscissa $p$ such that the poles of $\xi_D(s)$ in the half plane $\Re(s) > -p$ lie exactly at the values $s = -p_n$ and the Laurent expansions have the singular parts
$$B_n(\partial_s)\left[\frac{1}{s + p_n}\right] = \sum_{k} b_{n,k}\,\frac{(-1)^{k} k!}{(s + p_n)^{k+1}}.$$
$\big(-p_n, B_n(\partial_s)\big[\frac{1}{s + p_n}\big]\big)_{n=0,1,\ldots}$ is called the singular part distribution of $\xi_D(s)$ in that case.

Definition 5.3. Let $(p_n, B_n(z))_{n=0,1,\ldots}$ be a pB-system with abscissa $p$ as above and $0 < \sigma_i \le \varphi_i < \pi$, $i = 1, 2$. A function $\theta : Wr_{\varphi_1,\varphi_2} \to \mathbb{C}$ satisfies the Cramér asymptotic with abscissa $p$ in $Wr_{\sigma_1,\sigma_2}$,
$$\theta(t) \sim \sum_{n=0}^{\infty} t^{p_n} B_n(\log t) \quad \text{for } |t| \to 0,$$
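if the estimate for $t \in Wr_{\sigma_1,\sigma_2}$
$$\theta(t) - \sum_{\Re(p_n) < q} t^{p_n} B_n(\log t) = O(|t|^{q}) \quad \text{for } |t| \to 0$$
[…] $\Re(s) > r$,
$$\xi_D(s) = \int_{0}^{\infty} \theta_D(t)\, t^{s-1}\, dt, \tag{6.2}$$
and its inverse for $t \in Wr_{(\varphi_2 - \frac{\pi}{2}),(\varphi_1 - \frac{\pi}{2})}$ and $c > r$
$$\theta_D(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \xi_D(s)\, t^{-s}\, ds \tag{6.3}$$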
with absolute convergence of the integrals.

Proof. Because of the theorem about Mellin inversion it suffices to prove (6.3), and by majorized convergence this is reduced to the case of a one-point-divisor. In that case (6.3) is the inverse formula for Euler's Mellin integral for $\Gamma(s)$.

This approach is only possible for strictly directed divisors, as otherwise the defining series for $\theta_D(t)$ does not converge for any $t$.

Theorem 3. Let $\frac{\pi}{2} < \sigma_i \le \varphi_i < \pi$ for $i = 1, 2$ and $D$ be strictly directed in $Wl_{\varphi_1,\varphi_2}$, and let $(p_n, B_n(z))_{n=0,1,\ldots}$ be a pB-system with abscissa $p$. Then the following statements are equivalent:
A) $\xi_D(s)$ is $(\sigma_1, \sigma_2)$-bounded $p$-regular with singular part distribution $\big(-p_n, B_n(\partial_s)\big[\frac{1}{s + p_n}\big]\big)_{n=0,1,\ldots}$.
C) $\theta_D(t)$ satisfies a Cramér asymptotic with abscissa $p$ of the form
$$\theta_D(t) \sim \sum_{n=0}^{\infty} t^{p_n} B_n(\log t) \quad \text{for } |t| \to 0$$
in $Wr_{(\sigma_2 - \frac{\pi}{2}),(\sigma_1 - \frac{\pi}{2})}$.

Proof (sketch). C) ⇒ A) is shown by (6.2): The poles and singular parts of $\xi_D(s)$ arise by integrating the terms of the Cramér asymptotic, and the exponential estimation for $\xi_D(s)$ in vertical strips can be shown by rotating the ray of integration in (6.2) in $Wr_{(\sigma_2 - \frac{\pi}{2}),(\sigma_1 - \frac{\pi}{2})}$. A) ⇒ C) follows from (6.3) by replacing the abscissa $c$ of the line of integration by a smaller $c > -p$ and applying the residue theorem. The residues of the integrand produce the terms of the Cramér asymptotic.
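The Mellin integral (6.2) can be illustrated for a one-point divisor, the case to which the proof above reduces. The sketch below makes several assumptions that are not in the paper: the divisor consists of a single hypothetical point $\rho = -2 + i$ with multiplicity one, so that $\theta_D(t) = e^{\rho t}$ and $\xi_D(s) = \Gamma(s)(-\rho)^{-s}$; $s$ is real; and the integral is truncated at $t = 40$ and approximated by a composite Simpson rule.

```python
import cmath
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

rho = complex(-2.0, 1.0)  # hypothetical one-point divisor, multiplicity 1
s = 3.0

def integrand(t):
    """theta_D(t) * t^(s-1) with theta_D(t) = exp(rho * t)."""
    return cmath.exp(rho * t) * t ** (s - 1.0)

# Truncated Mellin integral; the tail beyond t = 40 is negligible here.
mellin = simpson(integrand, 0.0, 40.0, 8000)
# xi_D(s) = Gamma(s) * (-rho)^(-s), principal branch.
closed_form = math.gamma(s) * (-rho) ** (-s)

print(abs(mellin - closed_form))
```

This is Euler's integral for $\Gamma(s)$ after the substitution $t \mapsto (-\rho)t$, which is legitimate because $\Re(\rho) < 0$.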
Remark 1. This theorem is well known in the context of the Mellin and Laplace transform (e.g. [Do, II, Chap. 5], where a complete proof can be found). Using it one can decide whether a strictly directed divisor is bounded regularizable or not, by checking for the existence of a suitable Cramér asymptotic for the partition function. For example, the regularized product of the eigenvalues of Laplacians on manifolds exists because of Cramér asymptotics arising from heat kernel expansions (comp. [Ef, Ko1, Ko2, Ko3, Sa]).

Remark 2. In the case of strictly directed divisors also the implication A) ⇒ B') of Theorem 4 and, in particular, the asymptotic (7.8) can be obtained by the Mellin integral
$$\xi_D(s, z) = \int_{0}^{\infty} \theta_D(t)\, e^{-zt}\, t^{s-1}\, dt$$
using
$$\int_{0}^{\infty} t^{p_n} B_n(\log t)\, e^{-zt}\, t^{s-1}\, dt = B_n(\partial_s)\left[\frac{\Gamma(s + p_n)}{z^{s + p_n}}\right].$$
The Mellin transform approach to regularized products and determinants (the details can be found in Sect. 2.4 in [Il1]) was also extensively studied by Jorgenson and Lang ([JL1]).
7. Hankel Integrals and Stirling Asymptotics

The Mellin integral method has two shortcomings: It is possible only for strictly directed divisors, and the partition function is a non-intrinsic construction; one wants criteria in terms of associated functions. In this section we solve these problems, postponing the technical proofs until Sect. 12. For powers $a^{s}$ and $\log(a)$ we always use $-\pi < \arg(a) < \pi$.

Proposition 7.1. Let $D$ be a directed divisor in $Wl_{\varphi_1,\varphi_2}$ ($0 < \varphi_i < \pi$).
a)
$$\xi_D(s, z) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\Gamma(s - s')}{z^{s - s'}}\, \xi_D(s')\, ds' \tag{7.1}$$
for $z \in Wr_{\varphi_1,\varphi_2}$ and $\Re(s) > c > r$ with absolute convergence of the integral.
b) Let $0 < \sigma_i < \varphi_i$ for $i = 1, 2$, $C = C_{\sigma_1,\sigma_2}$ and $z \in Wr_{\sigma_1,\sigma_2}$. Then for $\Re(s) > r$ and $\Re(s_0) > r$ one has the absolutely convergent integral representation
$$\xi_D(s, z) = \frac{1}{2\pi i} \int_{C} \frac{\Gamma(s - s_0 + 1)}{(z - w)^{s - s_0 + 1}}\, \xi_D(s_0, w)\, dw. \tag{7.2}$$

In the sequel the representations (7.1) and (7.2) play a similar role as (6.2) and (6.3) in Sect. 6. For the explicit description of the Stirling asymptotics we need the following definition.
Definition 7.2. Let $\delta$ be a fixed regularization sequence. Then for any $q \in \mathbb{C}$ we define the linear map
$$[q] : \mathbb{C}[z] \to \mathbb{C}[z], \qquad B(z) \mapsto B^{[q]}(z),$$
by
$$\mathrm{CT}_{s=0}\left( \delta(s)\, B(\partial_s)\left[\frac{\Gamma(s + q)}{z^{s + q}}\right] \right) = z^{-q}\, B^{[q]}(\log z). \tag{7.3}$$
For $P_k(z) := z^{k}$ we get
$$P_k^{[q]}(z) = \sum_{j=0}^{k} (-1)^{j} \binom{k}{j}\, \Gamma^{(k-j)}(q)\, z^{j} \tag{7.4}$$
in the case that $q \neq -n$ for all $n \in \mathbb{N}_0$, while for $q = -n$,
$$P_k^{[q]}(z) = \sum_{j=0}^{k} (-1)^{j} \binom{k}{j}\, \mathrm{CT}_{s=q}\big(\Gamma^{(k-j)}(s)\big)\, z^{j} + \frac{(-1)^{n}}{n!} \left( \frac{(-1)^{k+1}}{k+1}\, z^{k+1} + (-1)^{k}\, k!\, \delta_{k+1} \right). \tag{7.5}$$

Special case $B(z) = b_0$. Easy calculations using the fact that $\Gamma(z)$ is holomorphic for $(-z) \notin \mathbb{N}_0$ as well as the expansion $\Gamma(s) = \frac{1}{s} - \gamma + \ldots$ ($\gamma$ the Euler-Mascheroni constant) and $\Gamma(s - n) = \Gamma(s)\big((s - 1)(s - 2) \cdots (s - n)\big)^{-1}$ deliver
$$B^{[q]}(z) = \begin{cases} b_0\, \Gamma(q) & \text{for } q \neq -n, \\ b_0\, \dfrac{(-1)^{n+1}}{n!} \left( z + \gamma - \delta_1 - \displaystyle\sum_{j=1}^{n} \frac{1}{j} \right) & \text{for } q = -n \end{cases} \tag{7.6}$$
(for zeta-regularization as well as for zero-renormalization one has $\delta_1 = \gamma$).

The following basic properties of $[q]$ are clear by (7.4), (7.5) and the definition.

Proposition 7.3.
a) $[q]$ is bijective for $q \neq 0, -1, -2, \ldots$ with $\deg B^{[q]} = \deg B$.
b) $[q]$ is injective for $q = 0, -1, -2, \ldots$ with $\deg B^{[q]} = \deg B + 1$ and with $\dim \mathrm{Coker}([q]) = 1$.
c)
$$\frac{d}{dz}\left( z^{-q}\, B^{[q]}(\log z) \right) = -z^{-(q+1)}\, B^{[q+1]}(\log z). \tag{7.7}$$

Remark 3. In particular, every Stirling asymptotic with abscissa $p$ can be represented as a linear combination of terms of the form $z^{-q} B^{[q]}$ up to a polynomial $P(z)$ with¹ $P(z) z^{p} \to 0$ for $|z| \to 0$, and this polynomial is uniquely determined. This shows that the asymptotics in B') of Theorem 4 and B) of Theorem 5 are general Stirling asymptotics which are written in a special manner. This also means that the Stirling asymptotic (7.8) is an effective method to determine the regularized determinant among all associated functions for $D$.

¹ Observe: Terms $z^{-q}$ with $q \ge p$ make no sense in Stirling asymptotics with abscissa $p$.
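The case $q = -n$ of (7.6) can be cross-checked against (7.5) with $k = 0$, which reads $\mathrm{CT}_{s=-n}\Gamma(s) + \frac{(-1)^n}{n!}(-z + \delta_1)$. The sketch below is an illustration with assumed names and sample values; it estimates the constant term of $\Gamma$ at the pole $s = -n$ by a symmetric average, in which the simple-pole contributions cancel.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def ct_gamma(n, h=1e-3):
    """Constant term of the Laurent expansion of Gamma at s = -n.

    The 1/(s + n) terms cancel in the symmetric average; the error is O(h^2).
    """
    return 0.5 * (math.gamma(-n + h) + math.gamma(-n - h))

def b_q_from_7_5(n, z, delta_1):
    """Right-hand side of (7.5) for k = 0 and q = -n (so b_0 = 1)."""
    return ct_gamma(n) + (-1) ** n / math.factorial(n) * (-z + delta_1)

def b_q_from_7_6(n, z, delta_1):
    """Right-hand side of (7.6) for b_0 = 1 and q = -n."""
    harmonic = sum(1.0 / j for j in range(1, n + 1))
    return (-1) ** (n + 1) / math.factorial(n) * (z + EULER_GAMMA - delta_1 - harmonic)

# Hypothetical sample values for the polynomial variable z and for delta_1.
for n in range(4):
    print(n, b_q_from_7_5(n, 1.7, 0.3), b_q_from_7_6(n, 1.7, 0.3))
```

The two columns agree because $\mathrm{CT}_{s=-n}\Gamma(s) = \frac{(-1)^n}{n!}\big(\sum_{j=1}^{n} \frac{1}{j} - \gamma\big)$.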
Remark 4. Part c) of the proposition together with (3.2) shows that the Stirling asymptotics in B') and B) can be differentiated term by term.

Theorem 4. Let $0 < \sigma_i \le \varphi_i < \pi$ for $i = 1, 2$ and $D$ be a directed divisor in $Wl_{\varphi_1,\varphi_2}$, let $p \in \mathbb{R} \cup \{\infty\}$ and $\xi_D(s)$ be meromorphic for $\Re(s) > -p$ (compare Prop. 3.3). Then for any regularization sequence $\delta$, $s_0$ with $\Re(s_0) > -p$ and a pB-system $(p_n, B_n(z))_{n=0,1,\ldots}$ with abscissa $p \in \mathbb{R} \cup \{\infty\}$ the following statements are equivalent:
A) $\xi_D(s)$ is $(\sigma_1, \sigma_2)$-bounded $p$-regular with the singular part distribution $\big(-p_n, B_n(\partial_s)\big[\frac{1}{s + p_n}\big]\big)_{n=0,1,\ldots}$.
B') There is a polynomial $P_{s_0}(z)$ with $P_{s_0}(z) z^{p + s_0} \to 0$ for $|z| \to 0$ and such that the Stirling asymptotic with abscissa $p + \Re(s_0)$
$$\mathrm{CT}_{s=0}(\delta(s)\, \xi_D(s + s_0, z)) \sim P_{s_0}(z) + \sum_{n=0}^{\infty} z^{-(p_n + s_0)}\, B_n^{[p_n + s_0]}(\log z) \quad \text{for } |z| \to \infty$$
is valid in $Wr_{\sigma_1,\sigma_2}$.
The polynomial in B') is then uniquely determined: $P_{s_0}(z) = 0$.

The idea of the proof given in Sect. 12 is rather similar to the proof of Theorem 3. To get the Stirling asymptotic B') from the singular part distribution A) one uses (7.1), shifts the line of integration and applies the residue theorem. The other direction is a little bit more difficult, but the basic idea is of course to use (7.2) and integrate the Stirling asymptotic term by term. Some technical difficulties arise because (7.2) is not valid for $z = 0$.

Using Eqs. (3.2), (3.3) and (3.5) one obtains the following theorem as an easy corollary of Theorem 4.

Theorem 5. Let $0 < \sigma_i \le \varphi_i < \pi$ for $i = 1, 2$ and $D$ be a directed divisor in $Wl_{\varphi_1,\varphi_2}$; let $f(z)$ be a meromorphic function of finite order with divisor $D$. Then for a regularization sequence $\delta$, $m \in \mathbb{N}_0$ and a pB-system $(p_n, B_n(z))_{n=0,1,\ldots}$ with abscissa $p \in \mathbb{R} \cup \{\infty\}$ the following statements are equivalent:
A) $\xi_D(s)$ is $(\sigma_1, \sigma_2)$-bounded $p$-regular with singular part distribution $\big(-p_n, B_n(\partial_s)\big[\frac{1}{s + p_n}\big]\big)_{n=0,1,\ldots}$.
B) There is a polynomial $P_f(z)$ with $P_f(z) z^{p} \to 0$ for $|z| \to 0$, such that the Stirling asymptotic with abscissa $(p + m)$
$$\log^{(m)} f(z) \sim P_f^{(m)}(z) + (-1)^{m-1} \sum_{n=0}^{\infty} z^{-(p_n + m)}\, B_n^{[p_n + m]}(\log z) \quad \text{for } |z| \to \infty$$
is valid in $Wr_{\sigma_1,\sigma_2}$.
$P_f(z)$ in B) can then be chosen independent of $m$; it is (up to the choice of the logarithm) uniquely determined. If, in addition, $p > 0$, so that $D$ is bounded regularizable, then for the $\delta$-regularized determinant one has $P_{\Delta_D} = 0$, i.e. the following Stirling asymptotic with abscissa $p$ in $Wr_{\sigma_1,\sigma_2}$ is valid:
$$\log \Delta_D(z) \sim -\sum_{n=0}^{\infty} z^{-p_n}\, B_n^{[p_n]}(\log z) \quad \text{for } |z| \to +\infty. \tag{7.8}$$
Theorem 5 can be regarded as the fundamental theorem about bounded regularizability. By Remark 3 it states that whenever a $\log^{(m)} f(z)$ satisfies any Stirling asymptotic with abscissa greater than zero, then $f(z)$ and its divisor $D$ are bounded regularizable, and (7.8) allows one to determine its regularized determinant, i.e. the polynomial $P(z)$ mentioned in the introduction.

The triple equivalence A) ⇔ B) (⇔ C)) given by Theorems 3 and 5, where the latter equivalence is valid only for strictly directed divisors, will be generalized in Sect. 9 (Theorems 4 and 6) to an equivalence A) ⇔ B') ⇔ C') valid for all directed divisors, which summarizes all information about singular part distributions and asymptotics of $\xi_D(s, z)$ and $\theta_D(t, s)$.

8. Examples

8.1. Dirichlet series.

Corollary 8.1. If $f(z)$ is meromorphic of finite order and has an absolutely convergent Dirichlet series representation
$$f(z) = 1 + \sum_{n=0}^{\infty} \frac{\beta_n}{\alpha_n^{z}}, \qquad \Re(z) > \sigma_0,$$
with $\beta_n \in \mathbb{C}$ and $\alpha_n \in \mathbb{R}_{>1}$ with $\lim_{n \to \infty} \alpha_n = \infty$, then $f(z)$ is $\delta$-regularized for every regularization sequence $\delta$, i.e. $f(z) = \Delta_D(z)$; in particular, $f(z)$ is associated to its divisor (compare Sect. 4). $\xi_D(s, z)$ is holomorphic for $s \in \mathbb{C}$.

Proof. By the Taylor series expansion for $\log(1 + x)$ it is clear that the trivial Stirling asymptotic $\log f(z) \sim 0$ as $|z| \to \infty$ with abscissa $+\infty$ is valid in $Wr_{(\frac{\pi}{2} - \varepsilon),(\frac{\pi}{2} - \varepsilon)}$, so by Theorem 5 and Proposition 3.3 the assertion is clear.

Remark 5. Using only the Mellin integral method one needs to assume that $f(z)$ also satisfies a functional equation and examines
$$\theta_{D^+}(t) := \sum_{\rho \in D,\ \Im(\rho) > 0} e^{\rho t}, \qquad \theta_{D^-}(t) := \sum_{\rho \in D,\ \Im(\rho) \le 0} e^{\rho t}$$
separately. For $f(z) = \zeta(z)$, the Riemann zeta function, a classical result of Cramér ([Cr]) delivers the Cramér asymptotics for $\theta_{D^+}(t)$ and $\theta_{D^-}(t)$ (with logarithmic terms, in contrast to the examples from the spectra of Laplacians mentioned in Sect. 6) and thus regularizability of $\zeta(z)$ ([So, ScSo]). In [JL2] Cramér's result was generalized to a class of $f(z)$ as in the above corollary which in addition satisfies a functional equation, and their result implies regularizability
of all these functions. Corollary 8.1 shows regularizability for a much larger class and moreover one no longer needs Cramér's result. The methods of this section of [Il2] also apply to the "polynomial Bessel fundamental class" introduced in [JL3]. Nevertheless a functional equation is necessary if one wants to get information about $\theta_{D^+}(t)$ (compare [Il2]).

Remark 6. Theorem 5 gives satisfying criteria for deciding whether a function is bounded regularizable or not. For example, consider the function
$$f(z) = \frac{1}{\sqrt{2\pi}}\,(z^{2} + 1)(1 + e^{-z})\,\Gamma(z) + e^{-z^{2}}\,\Gamma'(z).$$
It can be immediately seen that it is zeta-regularized: $(\sqrt{2\pi})^{-1}\,\Gamma(z)$ is zeta-regularized because of (8.5) below, and this is true for $(z^{2} + 1)$ because it is a characteristic polynomial, and this holds for $(1 + e^{-z})$ because it is a Dirichlet series; the second summand is small (in an angular domain) compared to the first and does not change the Stirling asymptotic of $\log f(z)$.

8.2. $\Gamma$-functions.

In §2.8 of [Vi] the functions $\Gamma_n(z)$ were defined which appear in the functional equations of Selberg zeta functions and which are special cases of the general higher $\Gamma$-functions introduced by Barnes ([Bar]). They are simple examples for regularization with non-trivial Stirling asymptotics, and their zeta-regularization can already be found in [Va, Ku] and [Ma, §3.3]. We give the following definition, which is equivalent to that of Vigneras.

Definition 8.2. The sequence $(\Gamma_n(z))_{n=0,1,\ldots}$ of $\Gamma$-functions of order $n$ is defined by the following conditions:
0) $\Gamma_0(z) = \frac{1}{z}$.
1) $\Gamma_n^{-1}(z)$, $n \in \mathbb{N}$, is an entire function of finite order and the divisor $D_n$ of $\Gamma_n(z)$ consists exactly of the $\rho = -k$, $k \in \mathbb{N}_0$, with multiplicity $-\binom{n + k - 1}{n - 1}$.
2) $\Gamma_n(1) = 1$ for all $n \in \mathbb{N}_0$.
3) For all $n \in \mathbb{N}_0$ the following functional equation is valid:
$$\Gamma_{n+1}(z + 1) = \frac{\Gamma_{n+1}(z)}{\Gamma_n(z)}.$$

Using higher Bernoulli polynomials ([No]) one has for $n \in \mathbb{N}_0$,
$$\theta_{D_n}(t) = -(-\theta_{D_1}(t))^{n} = -\frac{1}{(1 - e^{-t})^{n}} = -\sum_{\nu=0}^{\infty} \frac{(-1)^{\nu} B_{\nu}^{n}(0)}{\nu!}\, t^{\nu - n} \quad \text{for } 0 < |t| < 2\pi, \tag{8.1}$$
thus by Theorem 3 the $D_n$ are bounded regularizable. Applying (7.6) and (7.8) one gets the Stirling asymptotic with abscissa $+\infty$ for the $\delta$-regularized determinant
$$\log \Delta_{D_n}(z) \sim (-1)^{n+1} \sum_{k=0}^{n} \frac{B_{n-k}^{n}(0)}{(n - k)!\, k!} \left( \log z + \gamma - \delta_1 - \sum_{j=1}^{k} \frac{1}{j} \right) z^{k} + \sum_{k=1}^{\infty} (-1)^{n+k}\, \frac{(k - 1)!\, B_{n+k}^{n}(0)}{(n + k)!}\, z^{-k} \quad \text{for } |z| \to \infty \tag{8.2}$$
in $Wr_{(\pi - \varepsilon),(\pi - \varepsilon)}$.

Proposition 8.3. The functions $\Gamma_n(z)$ are well defined; one has $\Gamma_n(z) = e^{-P_n(z)}\,\Delta_{D_n}(z)$ with polynomials $P_n(z)$ of degree $\le n$ which are determined (e.g. using Lagrange interpolation) by the relations
$$P_n(j) = \sum_{i=0}^{j-1} (-1)^{i} \binom{j - 1}{i} \log \Delta_{D_{n-i}}(1), \qquad j = 1, \ldots, n + 1. \tag{8.3}$$
The values log Dn (1) := −CTs=0 (δ(s)ξDn (s, 1)) can be expressed in terms of the Riemann zeta function: log Dn (1) =
n−1
τn,l ζ (−l) + (δ1 − γ )
l=0
n−1
τn,l ζ (−l)
(8.4)
l=0
for n ≥ 1 and log D0 (1) = δ1 − γ , with the Euler-Mascheroni constant γ and τn,l from the development n−1 n+x−1 = τn,l (x + 1)l . n−1 l=0
Proof. One shows that there exists exactly one choice of polynomials $P_n(z)$ with
\[
P_{n+1}(z+1) + P_n(z) - P_{n+1}(z) = 0, \qquad P_n(1) = \log D_n(1)
\]
for $n \in \mathbb{N}_0$, with $\deg P_0 = 0$. (Because of $\deg P_0 = 0$ and the first equation one gets $\deg P_n \le n$, by induction it is easy to prove that the $P_n(j)$ are given as in the proposition, and in the other direction, that the uniquely determined $P_n(z)$ with $\deg P_n \le n$ and with these $P_n(1)$ satisfy the two equations.) The expression for $\log D_n(1)$ follows from $\delta(s)\Gamma(s) = \frac{1}{s} + (\delta_1 - \gamma) + \dots$ and
\[
\xi_{D_n}(s,1) = -\Gamma(s) \sum_{k=0}^{\infty} \binom{n+k-1}{n-1} (k+1)^{-s} = -\Gamma(s) \sum_{l=0}^{n-1} \tau_{n,l}\,\zeta(s-l).
\]

$\Gamma_1(z)$ is the usual $\Gamma$-function, $\Gamma_2^{-1}(z) = G(z)$ is known as Barnes' G-function. For these two functions we will give the result more explicitly. It is well known that $\zeta(0) = -\frac{1}{2}$, $\zeta'(0) = -\log\sqrt{2\pi}$ and $\zeta(-1) = -\frac{1}{12}$. With the Kinkelin-Glaisher constant $A$ one can express $\zeta'(-1) = \frac{1}{12} - \log A$ (compare [Vo, pp. 461–464], [Al, p. 357]), but we use just $\zeta'(-1)$.

Corollary 8.4. For $n = 1, 2$ one has
\[
D_1(z) = \frac{\Gamma_1(z)}{\sqrt{2\pi}}\, e^{-(\delta_1-\gamma)(z-\frac{1}{2})}, \tag{8.5}
\]
\[
D_2(z) = \frac{\Gamma_2(z)}{\sqrt{2\pi}}\, e^{\zeta'(-1) + z\log\sqrt{2\pi} + \frac{\delta_1-\gamma}{2}\left((z-1)^2 - \frac{1}{6}\right)}. \tag{8.6}
\]
By combining (8.5) and (8.2) one gets the usual Stirling formula for $\Gamma(z)$.
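The last remark can be illustrated numerically for $n = 1$. The sketch below assumes that (8.5) reads $\log D_1(z) = \log\Gamma(z) - \frac12\log 2\pi - (\delta_1-\gamma)(z-\frac12)$ and that the $n = 1$ case of (8.2) involves the ordinary Bernoulli numbers $B_m = B_m(0)$ (with $B_1 = -\frac12$); the value of $\delta_1$ is a free parameter chosen arbitrarily, and all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

# Ordinary Bernoulli numbers B_2, ..., B_6 (B_1 = -1/2 and B_0 = 1 are used inline).
BERNOULLI = {2: 1 / 6, 3: 0.0, 4: -1 / 30, 5: 0.0, 6: 1 / 42}


def log_d1_exact(z, delta1):
    """log D_1(z) read off from (8.5), using the standard library log-Gamma."""
    return (math.lgamma(z) - 0.5 * math.log(2 * math.pi)
            - (delta1 - EULER_GAMMA) * (z - 0.5))


def log_d1_asymptotic(z, delta1, terms=5):
    """Truncated right-hand side of (8.2) for n = 1 (assumed form)."""
    c = EULER_GAMMA - delta1
    # k = 0 term uses B_1 = -1/2, k = 1 term uses B_0 = 1 and the harmonic number H_1 = 1.
    head = -0.5 * (math.log(z) + c) + z * (math.log(z) + c - 1.0)
    tail = sum((-1) ** (k + 1) * BERNOULLI[k + 1] / (k * (k + 1)) * z ** (-k)
               for k in range(1, terms + 1))
    return head + tail


if __name__ == "__main__":
    z, delta1 = 20.0, 0.3  # arbitrary sample point and arbitrary delta_1
    print(log_d1_exact(z, delta1) - log_d1_asymptotic(z, delta1))
```

The terms involving $\delta_1 - \gamma$ cancel exactly between the two sides, so what remains is the classical expansion $\log\Gamma(z) \sim (z-\frac12)\log z - z + \frac12\log 2\pi + \frac{1}{12z} - \frac{1}{360z^3} + \dots$.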
9. The Function $\theta_D(t,s)$

We now define and examine the function $\theta_D(t,s)$ for a directed divisor. This function is a Laplace transform of $\xi_D(s,z)$ for the variable $z$ which turns out to be a "mixture" of $\theta_D(t)$ and $\xi_D(s)$ and is an essential tool in [Il2]. We give without proof a sort of generalization of Theorem 3 to directed divisors in terms of this function.

In the sequel $Wl_{\varphi_1,\varphi_2}$ is regarded as the subset of all those $t \in \widetilde{\mathbb{C}}^{*}$ with $\varphi_1 < \arg(t) < -\varphi_2 + 2\pi$ (and we use these arguments for $\log t$). Then $Wl_{\varphi_1,\varphi_2}$ is also defined for $\varphi_i \le 0$, which is needed in what follows.

Definition 9.1. For $\rho \in \mathbb{C}\setminus\mathbb{R}_{\ge 0}$, $s \in \mathbb{C}$ with $(-s) \notin \mathbb{N}_0$ and $t \in \widetilde{\mathbb{C}}^{*}$ we define
\[
\widetilde{\exp}(\rho,t,s) := \frac{e^{-\pi i(s-1)}}{2\pi i}\,\Gamma(s)\,\Gamma(1-s,\rho t)\cdot e^{\rho t}. \tag{9.1}
\]
For a directed divisor $D$ in $Wl_{\varphi_1,\varphi_2}$ (with $0 < \varphi_i < \pi$) we define
\[
\theta_D(t,s) := \sum_{\rho \in D} \widetilde{\exp}(\rho,t,s) \quad \text{for } t \in Wl_{(\frac{\pi}{2}-\varphi_1),(\frac{\pi}{2}-\varphi_2)},\ \Re(s) > r. \tag{9.2}
\]
In the definition of $\widetilde{\exp}(\rho,t,s)$, a type of multivalued exponential function, the incomplete Gamma function (obviously holomorphic in $\alpha$ and $z$)
\[
\Gamma(\alpha,z) := \int_z^{\infty} e^{-\tau}\tau^{\alpha-1}\,d\tau, \qquad \alpha \in \mathbb{C},\ z \in \widetilde{\mathbb{C}}^{*}
\]
is used. Properties of $\Gamma(\alpha,z)$ are well known (e.g. [EMOT, II, Chap. 9]). We state the needed properties of $\widetilde{\exp}(\rho,t,s)$ in Lemma 13.1 and give a selfcontained proof. In particular, by the lemma one can see that the defining sum for $\theta_D(t,s)$ converges absolutely and is holomorphic in the given domains.

Proposition 9.2. With $D$ as in the above definition, $t \in Wl_{(\frac{\pi}{2}-\varphi_1),(\frac{\pi}{2}-\varphi_2)}$ and $\Re(s) > r$ one has
\[
\theta_D(t,s) = \frac{t^{-(s-1)}}{2\pi i} \int_0^{e^{i\alpha}\infty} e^{wt}\,\xi_D(s,w)\,dw \tag{9.3}
\]
for every $\alpha \in\, ]-\varphi_2, \varphi_1[$ satisfying $\Re(e^{i\alpha}t) < 0$, and the integral converges absolutely. $\theta_D(t,s)$ satisfies the following functional equations:
\[
\theta_D(t,s+1) - \theta_D(t,s) = \frac{t^{-s}}{2\pi i}\cdot\xi_D(s) \tag{9.4}
\]
and if $D$ is strictly directed in $Wl_{\varphi_1,\varphi_2}$ with $\frac{\pi}{2} < \varphi_i < \pi$,
\[
\theta_D(t,s) - e^{2\pi i(s-1)}\,\theta_D(\exp(2\pi i)t,s) = \theta_D(t), \qquad t \in Wr_{(\varphi_2-\frac{\pi}{2}),(\varphi_1-\frac{\pi}{2})}, \tag{9.5}
\]
where the overlap in $Wl_{(\frac{\pi}{2}-\varphi_1),(\frac{\pi}{2}-\varphi_2)}$, the domain of definition for $\theta_D(t,s)$, is identified with $Wr_{(\varphi_2-\frac{\pi}{2}),(\varphi_1-\frac{\pi}{2})}$ which is the domain of definition for the partition function $\theta_D(t)$ (compare Definition 6.1).

Proof. By majorized convergence using Lemma 12.1 the Laplace integral representation is obtained from (13.4). The functional equations follow from the corresponding ones for $\widetilde{\exp}(\rho,t,s)$ given in Lemma 13.1.
Remark 7. The proposition shows that θD (t, s) behaves like ξD (s) in the variable s and like θD (t) in the variable t. In particular, θD (t, s) is meromorphic for (s) > −p if and only if this is true for ξD (s). For q ∈ C, a regularization sequence δ and B(z) ∈ C[z] we define the polynomial B [[q]] (z) ∈ C[z] by π(−z)s+q 1 CTs=0 δ(s)B(∂s ) = B [[q]] (log z)zq , − 2πi sin(π(s + q)) with arg(−z) := arg(z) − π. Theorem 6. Let 0 < σi ≤ ϕi < π for i = 1, 2 and D be a directed divisor in Wlϕ1 ,ϕ2 such that ξD (s) is meromorphic for (s0 ) > −p (compare Remark 1). For s0 ∈ C with (s0 ) > −p , a regularization sequence δ and a pB-system (pn , Bn (z))n=0,1,... with abscissa p < ∞, the following statements are equivalent: A) ξD (s) is (σ1 , σ2 )-bounded p-regular with singular part distribution 1 . − pn , Bn (∂s ) s + pn n=0,1,... s0 (t, t −1 ) with P s0 (t, t −1 )t −(p+s0 −1) → 0 for |t| → ∞ C’) There exists a polynomial P and such that the Cramér asymptotic with abscissa p + (s0 ) − 1, s0 (t, t −1 ) θD (t, s + s0 ) ∼ P CTs=0 δ(s)t s+s0 −1 +
∞
[[pn +s0 −1]]
t pn +s0 −1 Bn
(log t)
(9.6)
n=0
for |t| → 0 is valid in Wl( π2 −σ1 ),( π2 −σ2 ) . The polynomial in C’) is then uniquely determined: n−1 s0 (t, t −1 ) = 1 P CTs=0 (δ(s)ξD (s + s0 − k − 1))t k 2π i k=0
with $n$ such that $n - 1 < p + \Re(s_0) - 1 \le n$. As this theorem has no direct application to regularization we omit the analogue of Proposition 7.3 for $[[q]]$ which shows that the Cramér asymptotic in C') is a general one written in a special manner, and we give only the idea of the proof of Theorem 6.

Proof. (idea) By Theorem 4 it suffices to prove B') ⇔ C'). This equivalence can be shown using (9.3) and its inversion by a Hankel integral ((2.6.11) in [Il2]), integrating the asymptotics term by term. For details see Sect. 2.6.1 in [Il2].

Remark 8. $B^{[[q]]}(\log t)\,t^{q} - B^{[[q]]}(\log(\exp(2\pi i)t))\,(\exp(2\pi i)t)^{q} = B(\log t)\,t^{q}$ and (9.5) lead one to rediscover the implication A) ⇒ C) in Theorem 3, but now one has with Theorem 4 the general equivalence A) ⇔ B') ⇔ C') for all directed divisors already mentioned in Sect. 7. Because of (9.3), C') is an explicit determination of the Cramér asymptotic of the Laplace transform of $\xi_D(s,z)$, which is the basic meaning of Theorem 6.
10. Renormalized Determinants

The following is a generalization of ideas from §5 of [Vo]. Because of (3.2) and (3.3) every meromorphic function $D(z)$ of finite order with divisor $D$ has representations of the form
\[
\log D(z) = \int_{a_{\ell+1}}^{z} \int_{a_\ell}^{\lambda_\ell} \cdots \int_{a_1}^{\lambda_1} (-1)^{\ell}\,\xi_D(\ell+1,\lambda_0)\,d\lambda_0\,d\lambda_1 \dots d\lambda_\ell \tag{10.1}
\]
for certain $\ell \ge g$ and $a_i \in \mathbb{C}$, e.g. Wei,D,a(z) defined by Eq. (3.6) for $a_i = a$. Easy considerations show that one must have $|a_i| = \infty$ in order to get a determinant (compare Sect. 4) by this. But then (10.1) is divergent, so one has to renormalize the divergent integral. If $D$ is quasi-directed and bounded regularizable according to Theorem 4 one has a Stirling asymptotic for $\xi_D(\ell+1,z)$, and (10.1) with $a_i = \infty$ can be renormalized if one has a renormalization for every integral of the form $\int_{\infty}^{z} \lambda^{-q}B(\log\lambda)\,d\lambda$, $q \in \mathbb{C}$, $B(z) \in \mathbb{C}[z]$ (taking of course the value of the integral in case of absolute convergence). In the sequel for $B(z) \in \mathbb{C}[z]$ and $q \in \mathbb{C}$ we define $B^{\{q-1\}}(z) \in \mathbb{C}[z]$ by
\[
\frac{d}{dz}\Big( z^{-(q-1)} B^{\{q-1\}}(\log z) \Big) = z^{-q}B(\log z), \qquad B^{\{0\}}(0) = 0.
\]
Thus the $z^{-(q-1)}B^{\{q-1\}}(\log z)$ are just those primitives of the $z^{-q}B(\log z)$ whose constant terms are zero.

Definition 10.1. A renormalization sequence $\omega$ is a sequence $(\omega_n)_{n=0,1,\dots}$ of complex numbers. For such a renormalization sequence $\omega$ and $D \in \mathcal{D}_{breg}$, i.e. $D$ is a quasi-directed bounded regularizable divisor, the $\omega$-renormalized determinant $D(z)$ of $D$ is defined by
\[
\log D(z) = \int_{\infty}^{z} \int_{\infty}^{\lambda_\ell} \cdots \int_{\infty}^{\lambda_1} (-1)^{\ell}\,\xi_D(\ell+1,\lambda_0)\,d\lambda_0\,d\lambda_1 \dots d\lambda_\ell
\]
for $\ell \ge g$, integrating $(\ell+1)$ times using the Stirling asymptotic for $\xi_D(\ell+1,z)$ from Theorem 4 and following the renormalization rule:
\[
\int_{\infty}^{z} \lambda^{-q}B(\log\lambda)\,d\lambda :=
\begin{cases}
z^{-(q-1)}B^{\{q-1\}}(\log z) & \text{for } q \ne 1,\\
B^{\{0\}}(\log z) + \omega(B) & \text{for } q = 1
\end{cases} \tag{10.2}
\]
with $\omega(B) := \sum_k \omega_k b_k$ for $B(z) = \sum_k b_k z^k$.
One can easily prove that the definition of $D(z)$ is independent of $\ell \ge g$ and of the lines of integration and that it delivers indeed a determinant system on $\mathcal{D}_{breg}$. The Stirling asymptotic that determines $\log D(z)$ in the same way as (7.8) for the δ-regularized determinant is derived by integrating the Stirling asymptotic for $\xi_D(\ell+1,z)$ term by term following (10.2). The next theorem shows that renormalization and regularization in fact are essentially the same.
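The primitives used in the renormalization rule can be computed explicitly. Writing $u = \log z$ and $C = B^{\{q-1\}}$, the defining relation $\frac{d}{dz}\big(z^{-(q-1)}C(\log z)\big) = z^{-q}B(\log z)$ becomes $C'(u) - (q-1)\,C(u) = B(u)$. The sketch below solves this with exact rational arithmetic; it assumes rational $q$ and rational coefficients, represents polynomials as coefficient lists (lowest degree first), and uses hypothetical function names. The renormalization constant $\omega(B)$ is not modelled.

```python
from fractions import Fraction


def derivative(p):
    """Derivative of a polynomial given as a coefficient list (lowest degree first)."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]


def primitive_poly(b, q):
    """Coefficients of C with C'(u) - (q-1) C(u) = B(u); for q = 1 also C(0) = 0."""
    b = [Fraction(c) for c in b]
    q = Fraction(q)
    if q == 1:
        # C' = B with vanishing constant term.
        return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(b)]
    # For q != 1 the unique polynomial solution is C = -sum_j B^(j) / (q-1)^(j+1).
    result = [Fraction(0)] * len(b)
    term, power = b, q - 1
    for _ in range(len(b)):
        for k, c in enumerate(term):
            result[k] -= c / power
        term = derivative(term)
        power *= q - 1
    return result


def residual(b, q):
    """Coefficients of C' - (q-1) C - B, which should all vanish."""
    c = primitive_poly(b, q)
    dc = derivative(c)
    n = max(len(b), len(c))

    def pad(p):
        return list(p) + [Fraction(0)] * (n - len(p))

    return [x - (Fraction(q) - 1) * y - z for x, y, z in zip(pad(dc), pad(c), pad(b))]


if __name__ == "__main__":
    print(primitive_poly([0, 1], 2))            # B(u) = u, q = 2
    print(residual([2, 0, 3], Fraction(5, 2)))  # B(u) = 2 + 3u^2, q = 5/2
```

For $q \ne 1$ the polynomial solution is unique, which is why no normalization is needed there; only for $q = 1$ is a constant free, and (10.2) fixes it by $B^{\{0\}}(0) = 0$ together with the extra term $\omega(B)$.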
Theorem 7. There is a bijection between the set of regularization sequences δ and the set of renormalization sequences ω such that the δ-regularized determinant and the ω-renormalized determinant deliver the same determinant system on Dbreg . The ω0 -renormalized determinant with ωn0 := 0 for all n ∈ N0 delivers the zerorenormalization as defined in Example 3 in Sect. 3. Proof. By Theorem 5 and the properties of the map [q], in particular, (7.7) and the fact that Stirling asymptotics for log(m) f (z) can be differentiated term by term (Remark 2 in Sect. 7) one easily sees that it is sufficient to observe the following: 1. δ(s) = "(1 − s) is a regularization sequence with B [0] (0) = 0 for all B(z) ∈ C[z]. This follows from (7.5) as then one has CTs=0 (" (k) (s)) = (−1)k+1 k!δk for all k ∈ N0 . 2. Let >1 be the C-vector space of all renormalization sequences and >2 that of all regularization sequences. Define ? to be the C-vector space of all C-linear maps from C[z] to C. Then regard the maps α1 : >1 −→ ?, ω −→ (B → ω(B)), α2 : >2 −→ ?, δ −→ (B → (B [0]δ − B [0]0 )), where in the latter definition [q]δ means the map [q] for the regularization sequence δ while [q]0 means [q] for the special regularization sequence of zero renormalization (δ(s) = "(1 − s)). These maps are obviously isomorphisms and α1−1 ◦ α2 is the demanded isomorphism between >2 and >1 . 11. Proof of Theorem 2b) Given a system of relations
(i)
|+α1,k
D1
+ ... +
k
D (i)
(i)
|+αn,k
Dn
= D (i) ,
i ∈ I,
(11.1)
k
Dreg ,
Dqd \Dreg ,
(i)
with ∈ D1 , . . . , Dn ∈ αm,k ∈ C for i ∈ I , m = 1, . . . , n and with finite sums over the index k. We first regard logarithms of associated functions for large real z. We choose the logarithms of the regularized determinants log D (i) (z) := −CTs=0 (δ(s)ξD (i) (s, z)) and logarithms log inDm (z) of certain associated functions inDm (z). We search for polynomials Pm (z) =
gm l=0
(−1)l+1
xm,l l z, l!
m = 1, . . . , n
such that log Dm (z) = Pm (z) + log inDm (z) is consistent with i) and ii) of Definition 4.1 under (11.1). With the polynomials (i) (i) P(i) (z) := log D (i) (z) − log inD1 z − α1,k − . . . − log inDn z − αn,k k
k
(11.2)
this is equivalent to (i) (i) P1 z − α1,k + . . . + Pn z − αn,k , P(i) (z) = k
i ∈ I,
(11.3)
k
and this is equivalent to a system of linear equations for the xm,l . Now by Zorn’s lemma a system of linear equations has a solution if every finite subsystem has one. Thus it suffices to prove that there is always a solution if |I | < ∞. So wlog we may assume that δ(s) is a polynomial (as the finitely many log D (i) (z) depend only on finitely many δn ). And after a trivial translation we also assume that all (i) divisors are directed and that there is an α > 0 such that |z| < α implies |z − αm,k | < |ρ| for all ρ that occur in the D (i) , Dm ; we always assume |z| < α. If we choose log inm (z) = Wei,g log Dm ,0 m (z) (compare Eq. (3.6)), then we have with the coefficients given in the proof of Theorem 1 b) and using Proposition 3.3: (i) log inDm z − αm,k (i) l gm z − αm,k (i) − CTs=0 δ(s) ξDm s, z − αm,k − ξDm (s + l) (−1)l l! l=0
and thus by Eq. (11.2)
(i) l g1 z − α1,k (−1)l ξD1 (s + l) P(i) (z) = − CTs=0 δ(s) l! l=0
... +
gn l=0
k
(−1)l
(i) l
z − αn,k l!
k
(11.4)
ξDn (s + l) , i ∈ I
(i) (i) (as k ξD1 (s, z −α1,k )+. . .+ k ξDn (s, z −αn,k ) = ξD (i) (s, z)). Comparison of (11.4) and (11.3) leads one to introduce the functions xm,l (s) := δ(s)ξDm (s + l), Pm (s, z) :=
gm l=0
P(i) (s, z) :=
(−1)l+1
xm,l (s) l z, l! (i)
P1 (s, z − α1,k ) + . . . +
k
(11.5) k
(i)
Pn (s, z − αn,k ),
where the xm,l (s) and the Pm (s, z) are all holomorphic for (s) > rmax := max rm while P(i) (s, z) is meromorphic for (s) > 0 with CTs=0 (P(i) (s, z)) = P(i) (z),
i ∈ I,
(11.6)
for $|z| < \alpha$ as is seen from Eq. (11.4). We now expand (11.5) and (11.3) by powers of $z$:
\[
P_{(i)}(s,z) = \sum_{l=0}^{g_{\max}} p_{(i),l}(s)\,z^{l}
\]
and $P_{(i)}(z) = \sum_{l=0}^{g_{\max}} p_{(i),l}\,z^{l}$. Regard the $p_{(i),l}(s)$ and correspondingly the $p_{(i),l}$ as the components of vectors $p(s), p \in \mathbb{C}^{M}$, and regard the $x_{m,l}(s)$ and $x_{m,l}$ as the components of vectors $x(s), x \in \mathbb{C}^{N}$. The expansion of (11.3) and (11.5) by powers of $z$ delivers a matrix $B \in \mathrm{Mat}(N \times M, \mathbb{C})$ such that
\[
p(s) = B \cdot x(s) \quad \text{for } \Re(s) > r_{\max}, \tag{11.7}
\]
and it has to be shown that there is an $x \in \mathbb{C}^{N}$ such that $p = B \cdot x$. But there is a matrix $\hat{B} \in \mathrm{Mat}(M \times N, \mathbb{C})$ such that a solution exists if and only if $\hat{B} \cdot p = 0$. With this $\hat{B}$ one has
\[
\hat{B} \cdot p = \hat{B} \cdot \mathrm{CT}_{s=0}(p(s)) = \mathrm{CT}_{s=0}(\hat{B} \cdot p(s)) = 0,
\]
where the first equality is obtained from (11.6) and the last from (11.7).
Remark. Observe that in the proof the operation $\mathrm{CT}_{s=0}$ is applied to functions $f(s) = f_1(s) + f_2(s)$ with $f_1(s)$ being meromorphic around $s = 0$ and $f_2(s)$ defined only for $\Re(s) > 0$ but continuous at $s = 0$.
12. Proof of Theorem 4 The following estimate is needed to apply majorized convergence to integrals over ξD (s, z). Lemma 12.1. Let 0 < ϕi < ϕi < π , i + 1, 2 and let D be a directed divisor in Wlϕ1 ,ϕ2 and given r ≥ r such that c := ρ∈D |mD (ρ)ρk−r | < ∞. Then for (s0 ) > r, mD (ρ) r −(s )
0 (z − ρ)s0 = O |z|
(12.1)
ρ∈D
for z ∈ Wrϕ1 ,ϕ2 and |z| → ∞. Proof. We split the series in
|ρ|< 21 |z| and
|ρ|≥ 21 |z| and treat these two series separately. 1 r |ρ|<x |mD (ρ)| ≤ cx for any 2 |z| and use
For the first series we estimate |z − ρ| > x > 0 which follows immediately from the definition of c. This last inequality on the other hand implies x1 ≤|ρ|<x2
x2 mD (ρ) 1 ≤ cx r 1 + cr y r −1 α dy 1 α ρα x1 y x1
(12.2)
x for all α ∈ R>0 and 0 < x1 < x2 (as the rhs obviously maximizes x12 y −α dµ(y) under x the condition x1 dµ(y) ≤ cx r for all x ∈ [x1 , x2 ]). Observing that there is a β > 0 such that |z − ρ| > β|ρ| for all ρ ∈ D and z ∈ Wrϕ1 ,ϕ2 and using (12.2) we get the estimate also for the second series.
In the sequel we often tacitly use the following estimate: For $0 \le \varphi < \frac{\pi}{2}$, $\alpha_1 < \alpha_2$ and $m \in \mathbb{N}_0$,
\[
\Gamma^{(m)}(s) = O\big(e^{-\varphi|\Im(s)|}\big), \qquad \Im(s) \to \pm\infty \tag{12.3}
\]
for $\alpha_1 < \Re(s) < \alpha_2$. For $m = 0$ this is part of the Stirling formula, for $m > 0$ it follows by applying Cauchy's inequalities.

Proof of Proposition 7.1. By majorized convergence (for b) apply the above Lemma) the two integral representations have to be proved only for one-point-divisors. In the sequel for expressions $a^{s}$ we always use $\arg(a) \in\, ]-\pi, \pi[$.

a) Let $\Re(s) > c > 0$ and $\Re(\rho) < 0$, $\Re(z) > 0$, then by Euler's Mellin integral for $\Gamma(s)$ and its inversion one has
\[
\frac{\Gamma(s)}{(z-\rho)^{s}} = \int_0^{\infty} e^{-zt}\,t^{s-1}e^{\rho t}\,dt
= \int_0^{\infty} e^{-zt}\,t^{s-1}\,\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{\Gamma(s')}{(-\rho)^{s'}}\,t^{-s'}\,ds'\,dt
= \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{\Gamma(s-s')}{z^{s-s'}}\,\frac{\Gamma(s')}{(-\rho)^{s'}}\,ds',
\]
the last equation by interchanging the integrations (Fubini). Using the identity theorem one gets this formula as needed for $\rho \in Wl_{\varphi_1,\varphi_2}$ and $z \in Wr_{\varphi_1,\varphi_2}$ because both sides are holomorphic in the variables $\rho$ and $z$.

b) For $\rho \in Wl_{\varphi_1,\varphi_2}$, $z \in Wr_{\varphi_1,\varphi_2}$ and $\Re(s) > 0$ we will prove
\[
\frac{\Gamma(s)}{(z-\rho)^{s}} = \frac{1}{2\pi i}\int_C \frac{\Gamma(s-s_0+1)}{(z-w)^{s-s_0+1}}\,\frac{\Gamma(s_0)}{(w-\rho)^{s_0}}\,dw. \tag{12.4}
\]
For $z_0 \in Wr_{\varphi_1,\varphi_2}$ and $\Re(s_0) < 1$ one has
\[
\frac{1}{2\pi i}\int_C \frac{1}{(z_0-w)^{s-s_0+1}}\,\frac{1}{w^{s_0}}\,dw
= z_0^{-s}\,\frac{e^{i\pi s_0} - e^{-i\pi s_0}}{2\pi i}\int_0^{\infty} \frac{v^{-s_0}}{(1+v)^{s-s_0+1}}\,dv
= z_0^{-s}\,\frac{\sin \pi s_0}{\pi}\,\frac{\Gamma(1-s_0)\Gamma(s)}{\Gamma(s-s_0+1)}
= \frac{1}{z_0^{s}}\,\frac{\Gamma(s)}{\Gamma(s_0)\Gamma(s-s_0+1)},
\]
the first equation by substituting $w \to -z_0 v$ and deforming the contour (residue theorem), the second because of the representation 1.5 (2) in [EMOT] for Euler's beta function $B(u,v) = \Gamma(u)\Gamma(v)\Gamma^{-1}(u+v)$ and the third because of the equation $\Gamma(1-s_0)\Gamma(s_0) = \pi \sin^{-1} \pi s_0$. Now for $\rho \in Wl_{\varphi_1,\varphi_2}$ such that $z = z_0 + \rho \in Wr_{\varphi_1,\varphi_2}$, replace the contour $C$ by the shifted contour $C - \rho$. The value of this integral is independent of $\rho$ (residue theorem) and (by majorized convergence for $\rho \to 0$) equals the value of the above integral. Applying the substitution $w \to (w - \rho)$ and the identity theorem yields (12.4) in the demanded generality.
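The starting point of part a), Euler's integral $\Gamma(s)/(z-\rho)^{s} = \int_0^\infty e^{-zt}t^{s-1}e^{\rho t}\,dt$, can be sanity-checked numerically for real parameters. The sketch below is only an illustration under simplifying assumptions: $s$, $z$, $\rho$ are real with $s \ge 2$, $z > 0 > \rho$, the integral is truncated at a finite cutoff, and a plain composite Simpson rule is used; the function names are hypothetical.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def mellin_closed_form(s, z, rho):
    """Left-hand side Gamma(s) / (z - rho)^s for real z > rho."""
    return math.gamma(s) / (z - rho) ** s


def mellin_integral(s, z, rho, cutoff=60.0):
    """Right-hand side: integral of exp(-z t) t^(s-1) exp(rho t) over [0, cutoff]."""
    return simpson(lambda t: math.exp(-(z - rho) * t) * t ** (s - 1), 0.0, cutoff)


if __name__ == "__main__":
    s, z, rho = 2.5, 1.5, -0.5  # arbitrary real sample values
    print(mellin_closed_form(s, z, rho), mellin_integral(s, z, rho))
```

The complex case used in the proof follows from this real case by the identity theorem, exactly as stated in part a).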
Proof of Theorem 4. A) ⇒ B’). Let −p < −q < r < c with (pn ) = q for all n. Then by the residue theorem (7.1) for (s) > c and z ∈ Wrϕ1 ,ϕ2 becomes −q+i∞ "(s − s ) 1 ξD (s, z) = ξD (s )ds 2π i −q−i∞ zs−s "(s − s ) + Ress =−pn ξD (s ) zs−s (pn ) −q as is seen by the identity theorem. From this B’) easily follows for (s0 ) > −p. If −p < (s0 ) ≤ −p then first take the Stirling asymptotic for s0 = s0 + k with k ∈ N such that −p < (s0 ) ≤ −p + 1 and integrate k times. B’) ⇒ A). First note that we just need to show that ξD (s) is (σ1 , σ2 )-bounded pregular, but we do not need to determine the singular part distribution as then because of A) ⇒ B’) and the properties of the map [q] it must be the demanded one. Let now 0 < σi < σi < σ . We have for (s) > r and C = Cσ1 ,σ2 , 1 "(s − s0 + 1) CTs1 =0 (δ(s1 )ξD (s1 + s0 , w)) dw, (12.5) ξD (s, z) = 2πi C (z − w)s−s0 +1 which is obtained by applying partial integration to (7.2) using (3.2) where the necessary estimates for |w| → ∞ are derived by integrating (12.1). Now as (12.5) is not valid for z = 0 one has to use a little trick: One deforms the contour C and uses a "shifted" Stirling asymptotics. Let ε > 0 then by Taylor series expansion one obtains a pB-systems n ) with abscissa p such that ( pn , B q (z) := CTs1 =0 (δ(s1 )ξD (s1 + s0 , z)) R B n (log(z + ε)) s0 (z) − −P (z + ε)pn +s0 ( pn ) max(r, −(p0 ), deg P 1 Bn (log(w + ε)) "(s − s0 + 1) ξD (s) = dw 2π i C (w + ε)pn +s0 (−w)s−s0 +1 ( pn ) 0, e xp(ρ, t, s) − e2πi(s−1) e xp(ρ, exp(2π i)t, s) i(α+δ) ei(α−δ) ∞ wt e ∞ t −(s−1) e = − dw "(s)eρt 2πi ws 0 0 ! " (e−π i(s−1) −eπ i(s−1) )t s−1 "(1−s)
=
1 "(s)"(1 − s)2i sin(π s)eρt = eρt , 2πi
thus (13.1), by the identity theorem also in general. It remains to prove (13.3). We assume arg(ρt) ∈] − ε , ε [ for 0 < ε < π2 , the general case follows then by rotating the ray of integration, ∞ (−ρt)w "(s) e e xp(ρ, t, s) = dw, (−ρt)−(s−1) 2π i (w + 1)s 0
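which immediately gives the estimate $|\widetilde{\exp}(\rho,t,s)| < c_1|\rho t|^{-(\Re(s)-1)}$ for $|\rho t| \ge 1$ for a suitable $c_1 > 0$ and thus (13.3) by (13.2). For $|\rho t| \le 1$ on the other hand with $0 < \alpha := \Re(\rho t) \le 1$ and the trivial estimate $e^{-x} \le \frac{1}{x+\alpha}$ for $x \in \mathbb{R}_{\ge 0}$ one has
\[
\int_0^{\infty} \frac{e^{-\Re(\rho t)w}}{|(w+1)^{s}|}\,dw
= \alpha^{\Re(s)-1}\int_0^{\infty} \frac{e^{-x}}{(x+\alpha)^{\Re(s)}}\,dx
\le \alpha^{\Re(s)-1}\int_0^{\infty} \frac{dx}{(x+\alpha)^{\Re(s)+1}},
\]
and the assertion easily follows also for $|\rho t| \le 1$.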
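The "trivial estimate" used in the last step, $e^{-x} \le 1/(x+\alpha)$ for $x \ge 0$ and $0 < \alpha \le 1$, follows from $(x+\alpha)e^{-x} \le (x+1)e^{-x} \le 1$. A small grid check, given here only as an illustrative sketch with hypothetical names and an arbitrary grid:

```python
import math


def bound_holds(alpha, x):
    """Check exp(-x) <= 1 / (x + alpha) at one point (0 < alpha <= 1, x >= 0)."""
    return math.exp(-x) <= 1.0 / (x + alpha)


def bound_holds_on_grid(steps=200, x_max=50.0):
    """Check the bound on a grid of alpha in (0, 1] and x in [0, x_max]."""
    alphas = [(i + 1) / 20 for i in range(20)]
    xs = [x_max * k / steps for k in range(steps + 1)]
    return all(bound_holds(a, x) for a in alphas for x in xs)


if __name__ == "__main__":
    print(bound_holds_on_grid())
```

Equality occurs only at $\alpha = 1$, $x = 0$; for $\alpha > 1$ the bound fails near $x = 0$, which is why the case $|\rho t| \le 1$ is treated separately.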
14. Miscellanea

In Chapter 2 of [Il1] the formalism of regularized determinants was developed more generally: Following Jorgenson and Lang ([JL1]) divisors with non-integer multiplicities $m_D : \mathbb{C} \to \mathbb{C}$ (instead of $\mathbb{C} \to \mathbb{Z}$) were regarded, then everything can be carried out with almost no difficulties, except that the associated functions become multivalued with the $\rho \in D$ as branch points. Also essential singularities for $\xi_D(s)$ were allowed. In that case it is necessary that the formal power series $\delta(s)$ is convergent near zero. With this assumption almost everything can be done in general although some not completely trivial convergence problems occur.

The maps $[q]$ and $[[q]]$ defined in Sects. 7 and 8 are special cases of the following construction: For $q \in \mathbb{C}$, a regularization sequence $\delta$ and a function $h(s)$, which is meromorphic in a neighborhood of $q$, we define a linear map $[h,q] : \mathbb{C}[z] \to \mathbb{C}[z]$ (notation: $B(z) \mapsto B^{[h,q]}(z)$) by
\[
\mathrm{CT}_{s=0}\big(\delta(s)B(\partial_s)[h(s)z^{s+q}]\big) = B^{[h,q]}(\log z)\,z^{q}.
\]
If $h_1(s)$ and $h_2(s)$ are two such functions, then if $h_1(s)$ is, in addition, holomorphic at $s = q$ the composition law $[h_2,q] \circ [h_1,q] = [h_1 \cdot h_2, q]$ is easily checked. For example this implies $[[q]] = -\frac{1}{2\pi i}[1-q] \circ [q]$ for $q \ne -n$, $n \in \mathbb{N}_0$. Also a sort of inverse of $[q]$ can be defined (compare Satz 2.3.6 in [Il1]).

Acknowledgements. I would like to thank C. Deninger for supervising my Ph.D. thesis as well as M. Schröter, I. Vardi, C. Bree, C. Soulé, A. Voros, J. B. Bost and J. Jorgenson for helpful discussions and improvements. Parts of the article were written during a visit at the IHES.
References

[Al] Almquist, G.: Asymptotic Formulas and Generalized Dedekind Sums. Exp. Math. 7, 343–359 (1998)
[Bar] Barnes, E.W.: On the Theory of the Multiple Gamma Function. Phil. Trans. of the Royal Soc. (A) 19, 374–439 (1904)
[Cr] Cramér, H.: Studien über die Nullstellen der Riemannschen Zetafunktion. Math. Zeitschrift 4, 104–130 (1919)
[CV] Cartier, P., Voros, A.: Une nouvelle interpretation de la formule des traces de Selberg. In: The Grothendieck Festschrift, Vol. 2, Basel–Boston: Birkhäuser, 1991, pp. 1–67
[De1] Deninger, C.: Motivic L-functions and regularized determinants. In: Motives, Proc. of Symp. Pure Math. 55/1, Providence, RI: AMS, 1994, pp. 707–743
[De2] Deninger, C.: Motivic L-functions and regularized determinants II. In: F. Catanese (Hrsg.) Proc. Arithmetic Geometry, Cortona, 1994
[De3] Deninger, C.: Some Analogies between Number Theory and dynamical Systems on foliated Spaces. Documenta Mathematica, extra vol. ICM 1998, I, Plenary Talks, pp. 23–46
[Do] Doetsch, G.: Handbuch der Laplacetransformation I/II, Basel: Birkhäuser, 1950/1955
[Ef] Efrat, L.: Determinants of Laplacians on surfaces of finite volume. Commun. Math. Phys. 119, 443–451 (1988); Erratum. Commun. Math. Phys. 138, 607 (1991)
[EMOT] Erdelyi, A., Magnus, W., Oberhettinger, F., Tricomi, F.G.: Higher transcendental functions I, II, III. New York: McGraw-Hill, 1953
[EORBZ] Elizalde, E., Odintsov, S.D., Romeo, A., Bytsenko, A.A., Zerbini, S.: Zeta regularization techniques with applications. Singapore: World Scientific, 1994
[Gui] Guinand, A.D.: Fourier reciprocities and the Riemann zeta function. Proc. London Math. Soc. (2) 51, 401–414 (1950)
[Il1] Illies, G.: Regularized products, trace formulas and Cramér functions. Ph.D.-thesis (in German), Schriftenreihe des mathematischen Instituts der Universität Münster, 3. Serie, Heft 22, 1998
[Il2] Illies, G.: Cramér functions and Guinand equations. IHES-preprint 1999
[JL1] Jorgenson, J., Lang, S.: Basic Analysis of regularized series and products. LNM 1564, Berlin: Springer, 1994
[JL2] Jorgenson, J., Lang, S.: On Cramér's theorem for general Euler products with functional equation. Math. Ann. 297/3, 383–416 (1993)
[JL3] Jorgenson, J., Lang, S.: Extension of analytic number theory and the theory of regularized harmonic series from Dirichlet series to Bessel series. Math. Ann. 306, 75–124 (1996)
[Ko1] Koyama, S.Y.: Determinant expressions of Selberg zeta functions I. Trans. AMS 324, 149–168 (1991)
[Ko2] Koyama, S.Y.: Determinant expressions of Selberg zeta functions II. Trans. AMS 329, 755–772 (1992)
[Ko3] Koyama, S.Y.: Determinant expressions of Selberg zeta functions III. Proc. AMS 113, 303–311 (1991)
[Ku] Kurokawa, N.: Multiple sine functions and Selberg zeta functions. Proc. Japan Acad. 67A, 61–64 (1991)
[LW] Landau, E., Walfisz, A.: Über die Nichtfortsetzbarkeit einiger durch Dirichletsche Reihen definierter Funktionen. Rend. di Palermo 44, 82–86 (1919)
[Ma] Manin, Y.I.: Lectures on zeta functions and motives. Preprint MPI Bonn, 1992
[No] Norlund, N.E.: Memoire sur les polynomes de Bernoulli. Acta Mathematica 43, 121–196 (1920)
[QHS] Quine, J.R., Heydari, S.H., Song, R.Y.: Zeta-regularized products. Trans. of the AMS 338, 1, 213–231 (1993)
[RS] Ray, D., Singer, I.: Analytic torsion for analytic manifolds. Ann. Math. 98, 154–177 (1973)
[Sa] Sarnak, P.: Determinants of Laplacians. Commun. Math. Phys. 110, 113–120 (1987)
[ScSo] Schröter, M., Soulé, C.: On a Result of Deninger Concerning Riemann's Zeta Function. In: Motives, Proc. of Symp. Pure Math. 55/1, Providence, RI: AMS, 1994, pp. 745–747
[So] Soulé, C.: Letter to C. Deninger, 13.2.1991, as: M. Schröter, S. Soulé: On a result of Deninger concerning Riemann's zeta function. In: Motives, Proc. of Symp. Pure Math. 55/1, Providence, RI: AMS, 1994, pp. 745–747
[Ti] Titchmarsh, E.C.: The Theory of Functions. 2nd ed., Oxford: Oxford University Press, 1939
[Va] Vardi, I.: Determinants of Laplacians and multiple Gamma Functions. Siam J. Math. Anal. 19, 1, 493–507 (1988)
[Vi] Vigneras, M.F.: L'equation fonctionelle de la fonction zeta de Selberg du groupe modulaire SL(2, Z). Asterisque 61, 235–249 (1979)
[Vo] Voros, A.: Spectral Functions, Special Functions and the Selberg Zeta Function. Commun. Math. Phys. 110, 439–465 (1987)

Communicated by P. Sarnak
Commun. Math. Phys. 220, 95 – 104 (2001)
Super Brockett Equations: A Graded Gradient Integrable System R. Felipe1 , F. Ongay2 1 ICIMAF, Havana, Cuba, and Universidad de Antioquia, Medellín, Colombia 2 CIMAT, Guanajuato, Mexico. E-mail:
[email protected] Received: 9 February 2000 / Accepted: 18 January 2001
Abstract: Rather recently equations of Lax type defined by a double commutator, the so-called Brockett equations, have received considerable attention. In this paper we prove that a supersymmetric version of a Brockett hierarchy is an infinite dimensional integrable gradient system. As far as we know, this is the only graded system of this type existing in the literature.

0. Introduction

Ever since the discovery in 1968 by Gardner, Green, Kruskal and Miura of the inverse scattering method to solve the KdV equations, the theory of infinite dimensional integrable systems, sometimes also known as the theory of soliton equations, has been the subject of a great deal of work, and many results and applications have stemmed from this newfound attention to the subject. As is well known, one of the first major developments came with the realization that these systems can be put in the so-called Lax form, $\dot{L} = [L, N]$, since this description is particularly well suited to stress some of the geometrical interpretations of the equations, in particular allowing one to place them into a Hamiltonian framework.

On the other hand, some ten years ago, ODEs of Lax type defined by more than one Lie bracket were introduced by R. Brockett (see [B1] and [B2]), in connection with some least squares matching and sorting problems. Surprisingly enough, these so-called Brockett systems exhibit many remarkable features besides the original intended ones: to name one, it was discovered by A. Bloch, R. Brockett and T. Ratiu (see e.g. [B-B-R]) that the equations corresponding to the celebrated Toda lattice can be cast into this mold. But moreover, another property of these equations, still more relevant to our purposes, was also proved in [B-B-R], where it was shown that these finite dimensional systems are completely integrable, but of gradient type (the existence of a suitable Hamiltonian structure remaining an open question).

Quite recently, the theory of Brockett equations was adapted by one of us for PDEs (reference [F]), and it was proved that many important properties, such as the complete integrability and the property of being a gradient system, were still valid in this infinite dimensional context, but also that this analog of the Brockett equation belongs to a hierarchy, similar to the well known KdV or KP hierarchies.

In this work we consider yet another extension of the Brockett system: Following the approach to supersymmetric (i.e., $\mathbb{Z}_2$-graded) versions of the KP hierarchy, studied for example by Manin and Radul ([M-R]), Mulase ([Mu2]), or Rabin ([R]), we define and study a supersymmetric extension of the Brockett hierarchy introduced in [F]. In particular, our main results will show that the properties of being completely integrable and a gradient flow also extend to this graded hierarchy; to the best of our knowledge, this is the first example of a graded system possessing these properties. Furthermore, the flows associated to this new hierarchy naturally "live" on a flag in the space of gauge operators, and we conjecture that this geometric feature of our construction might be of some use in the algebro-geometric study of deformations of line bundles over algebraic curves, both in the classical and graded case.

Partially supported by CONACYT, Mexico, project 28-492E and CODI project "Complete integrability of Brockett type equations", University of Antioquia, Colombia.

1. A $\mathbb{Z}_2$-Graded Brockett Hierarchy

We will consider in this work a rather standard (1, 1) dimensional setting, namely, the one studied by Manin and Radul, which we now briefly recall, referring the reader to the basic reference [M-R] for more details (see also [Mu2]): First of all, let $x$ denote an even variable, $\xi$ an odd one (the parity of an object will be denoted by a tilde, so that for instance $\tilde{x} = 0$; $\tilde{\xi} = 1$), and fix some ring $B$ of "superfunctions" in these variables (for instance, we may take the ring of formal power series in $x$ and $\xi$), where the operator $\theta = \partial_\xi + \xi\partial_x$ acts as an odd derivation (recall that $\theta^2 = \partial_x$). Then one considers the ring of (formal) super pseudo-differential operators, $B((\theta^{-1}))$, with coefficients in $B$. To avoid confusion with the action of the derivations on the operators, the product in this ring will be denoted by $\circ$, and by $\theta^{-1}$ we will denote the (formal) inverse of $\theta$. Thus, every operator $L \in B((\theta^{-1}))$ can be written as a formal series
\[
L = \sum_{i \le m} b_i\,\theta^{i},
\]
and, as usual, we will write
\[
L_{+} = \sum_{0 \le i \le m} b_i\,\theta^{i}; \qquad L_{-} = \sum_{i < 0} b_i\,\theta^{i}, \tag{1}
\]
i 0, Eq. (7) gives the flow of the gradient of the graded Adler functional Fk (S) on the affine subspace 1 + E (−k−2) . Proof. Indeed, to end the proof of our claim, it remains only to observe that, from Lemma 2, we have θk S −1 = −S −1 ◦ θk S ◦ S −1 = (−1)k+1 S −1 ◦ [, k+1 − ]. Therefore, modulo an inessential sign, the right-hand side of (15) is in fact equivalent to the right-hand side of (7), which we have already shown to be equivalent to the super Brockett system. Remark. The graded hierarchy that we have constructed in this paper preserves, and in a definite sense generalizes, several of the remarkable features of the standard Brockett equation. But moreover, we have also seen that these super Brockett equations will induce a flow on an infinite Grassmannian, of a different type to that given by the known super KP flows. We conjecture, therefore, that this hierarchy might also be of value, for instance, for the algebro-geometric study of deformations of superline bundles over supercurves, etc. (and it is clear that this remark also applies to the non-graded case; see also [F]). We hope to clarify some of these questions in a future work. Acknowledgements. Both authors wish to express their indebtedness to Prof. J. Rabin, who patiently listened to our expositions of a preliminary version of this work, and made several valuable comments. The bulk of this paper was done during reciprocal visits by each author to his coauthor’s respective institution; both of us thankfully acknowledge their hospitality during these stays. Finally, we are grateful to one of the referees, who pointed out an error in the original manuscript.
References

[B-B-R] Bloch, A.M., Brockett, R.W., and Ratiu, T.S.: Completely integrable gradient flows. Commun. Math. Phys. 147, 57–74 (1992)
[B1] Brockett, R.W.: Least squares matching problems. Linear Algebra Appl. 122, 761–777 (1989)
[B2] Brockett, R.W.: Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems. Linear Algebra Appl. 146, 79–91 (1991)
[D] Dickey, L.A.: Soliton equations and Hamiltonian systems. Advanced Series in Math. Phys. 12, Singapore: World Scientific, 1991
[F] Felipe, R.: Algebraic aspects of Brockett type equations. Physica D 132, 287–297 (1999)
[M-R] Manin, Yu.I., and Radul, O.A.: A supersymmetric extension of the Kadomtsev–Petviashvili hierarchy. Commun. Math. Phys. 98, 65–77 (1985)
[Mu1] Mulase, M.: Complete integrability of the Kadomtsev–Petviashvili equation. Adv. Math. 54, 57–66 (1984)
[Mu2] Mulase, M.: A new super KP system and a characterization of the Jacobians of arbitrary algebraic supercurves. J. Diff. Geom. 34, 651–680 (1991)
[R] Rabin, J. M.: The geometry of super KP flows. Commun. Math. Phys. 137, 533–552 (1991)
Communicated by T. Miwa
Commun. Math. Phys. 220, 105 – 164 (2001)
Fermionic Formulas for Level-Restricted Generalized Kostka Polynomials and Coset Branching Functions Anne Schilling1, , Mark Shimozono2, 1 Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge,
MA 02139, USA. E-mail:
[email protected] 2 Department of Mathematics, Virginia Tech, Blacksburg, VA 24061-0123, USA.
E-mail:
[email protected] Received: 9 April 2000 / Accepted: 26 January 2001
Abstract: Level-restricted paths play an important rôle in crystal theory. They correspond to certain highest weight vectors of modules of quantum affine algebras. We show that the recently established bijection between Littlewood–Richardson tableaux and rigged configurations is well-behaved with respect to level-restriction and give an explicit characterization of level-restricted rigged configurations. As a consequence a new general fermionic formula for the level-restricted generalized Kostka polynomial is obtained. Some coset branching functions of type A are computed by taking limits of these fermionic formulas. 1. Introduction Generalized Kostka polynomials [26, 33, 35–38] are q-analogues of the tensor product multiplicity λ cR = dim Homsln (V λ , V R1 ⊗ · · · ⊗ V RL ),
(1.1)
where $\lambda$ is a partition, $R = (R_1, \ldots, R_L)$ is a sequence of rectangles and $V^{\lambda}$ is the irreducible integrable highest weight module of highest weight $\lambda$ over the quantized enveloping algebra $U_q(sl_n)$. The generalized Kostka polynomials can be expressed as generating functions of classically restricted paths [30, 33, 37]. In terms of the theory of $U_q(sl_n)$-crystals [16, 17] these paths correspond to the highest weight vectors of tensor products of perfect crystals. The statistic is given by the energy function on paths.

Footnotes to the author list: New address as of July 2001: Department of Mathematics, University of California, One Shields Ave., Davis, CA 95616-8633, USA. E-mail: [email protected]. Partially supported by NSF grant DMS-9800941.

The $U_q(sl_n)$-crystal structure on paths can be extended to a $U_q(\widehat{sl}_n)$-crystal structure [18]. The level-restricted paths are the subset of classically restricted paths which, after tensoring with the crystal graph of a suitable integrable highest weight $U_q(\widehat{sl}_n)$-module, are affine highest weight vectors. Hence it is natural to consider the generating functions of level-restricted paths, giving rise to level-restricted generalized Kostka polynomials which will take a lead rôle in this paper. The notion of level-restriction is also very important in the context of restricted-solid-on-solid (RSOS) models in statistical mechanics [3] and fusion models in conformal field theory [39]. The one-dimensional configuration sums of RSOS models are generating functions of level-restricted paths (see for example [2, 9, 14]). The structure constants of the fusion algebras of Wess–Zumino–Witten conformal field theories are exactly the level-restricted analogues of the Littlewood–Richardson coefficients in (1.1) as shown by Kac [15, Exercise 13.35] and Walton [40, 41]. q-Analogues of these level-restricted Littlewood–Richardson coefficients in terms of ribbon tableaux were proposed in ref. [10].

The generalized Kostka polynomial admits a fermionic (or quasi-particle) formula [25]. Fermionic formulas originate from the Bethe Ansatz [4], which is a technique to construct eigenvectors and eigenvalues of row-to-row transfer matrices of statistical mechanical models. Under certain assumptions (the string hypothesis) it is possible to count the solutions of the Bethe equations, resulting in fermionic expressions which look like sums of products of binomial coefficients. The Kostka numbers arise in the study of the XXX model in this way [22–24]. Fermionic formulas are of interest in physics since they reflect the particle structure of the underlying model [20, 21] and also reveal information about the exclusion statistics of the particles [5–7]. The fermionic formula of the Kostka polynomial can be combinatorialized by taking a weighted sum over sets of rigged configurations [22–24]. In ref.
[25] the fermionic formula for the generalized Kostka polynomial was proven by establishing a statistic-preserving bijection between Littlewood–Richardson tableaux and rigged configurations. In this paper we show that this bijection is well-behaved with respect to level-restriction and we give an explicit characterization of level-restricted rigged configurations (see Definition 5.5 and Theorem 8.2). This enables us to obtain a combinatorial formula for the level-restricted generalized Kostka polynomials as the generating function of level-restricted rigged configurations (see Theorem 5.7). As an immediate consequence this proves a new general fermionic formula for the level-restricted generalized Kostka polynomial (see Theorem 6.2 and Eq. (6.7)). Special cases of this formula were conjectured in refs. [8, 12, 13, 27, 33, 42]. As opposed to some definitions of “fermionic formulas”, the expression of Theorem 6.2 involves in general explicit negative signs. However, we would like to point out that because of the equivalent combinatorial formulation in terms of rigged configurations as given in Theorem 5.7 the fermionic sum is manifestly positive (i.e., a polynomial with positive coefficients).

The branching functions of type A can be described in terms of crystal graphs of irreducible integrable highest weight $U_q(\widehat{sl}_n)$-modules. For certain triples of weights they can be expressed as limits of level-restricted generalized Kostka polynomials. The structure of the rigged configurations allows one to take this limit, thereby yielding a fermionic formula for the corresponding branching functions (see Eq. (7.10)). The derivation of this formula requires the knowledge of the ground state energy, which is obtained from the explicit construction of certain local isomorphisms of perfect crystals (see Theorem 7.3). A more complete set of branching functions can be obtained by considering “skew” level-restricted generalized Kostka polynomials. We conjecture that rigged configurations are also well-behaved with respect to skew shapes (see Conjecture 8.3).
The paper is structured as follows. Section 2 sets out notation used in the paper. In Sect. 3 we review some crystal theory, in particular the definition of level-restricted paths, which are used to define the level-restricted generalized Kostka polynomials. Littlewood–Richardson tableaux and their level-restricted counterparts are defined in Sect. 4. The formulation of the generalized Kostka polynomials in terms of Littlewood–Richardson tableaux with charge statistic is necessary for the proof of the fermionic formula, which makes use of the bijection between Littlewood–Richardson tableaux and rigged configurations. The latter are the subject of Sect. 5, which also contains the new definition of level-restricted rigged configurations and our main Theorem 5.7. The proof of this theorem is reserved for Sect. 8. The fermionic formulas for the level-restricted Kostka polynomial and the type A branching functions are given in Sects. 6 and 7, respectively.

2. Notation

All partitions are assumed to have n parts, some of which may be zero. Let $R = (R_1, R_2, \ldots, R_L)$ be a sequence of partitions whose Ferrers diagrams are rectangles. Let $R_j$ have $\mu_j$ columns and $\eta_j$ rows for $1 \le j \le L$. We adopt the English notation for partitions and tableaux. Unless otherwise specified, all tableaux are assumed to be column-strict (that is, the entries in each row weakly increase from left to right and in each column strictly increase from top to bottom).

3. Paths

The main goal of this section is to define the level-restricted generalized Kostka polynomials. These polynomials are defined in terms of certain finite $U_q(\widehat{sl}_n)$-crystal graphs whose elements are called paths. The theory of crystal graphs was invented by Kashiwara [16], who showed that the quantized universal enveloping algebras of Kac–Moody algebras and their integrable highest weight modules admit special bases whose structure at q = 0 is specified by a colored graph known as the crystal graph.
The crystal graphs for the finite-dimensional irreducible modules for the classical Lie algebras were computed explicitly by Kashiwara and Nakashima [17]. The theory of perfect crystals gave a realization of the crystal graphs of the irreducible integrable highest weight modules for affine Kac–Moody algebras, as certain eventually periodic sequences of elements taken from finite crystal graphs [19]. This realization is used for the main application, some new explicit formulas for coset branching functions of type A.

3.1. Crystal graphs. Let $U_q(\mathfrak{g})$ be the quantized universal enveloping algebra for the Kac–Moody algebra $\mathfrak{g}$. Let $I$ be an indexing set for the Dynkin diagram of $\mathfrak{g}$, $P$ the weight lattice of $\mathfrak{g}$, $P^*$ the dual lattice, $\{\alpha_i \mid i \in I\}$ the (not necessarily linearly independent) simple roots, $\{h_i \mid i \in I\}$ the simple coroots, and $\{\Lambda_i \mid i \in I\}$ the fundamental weights. Let $\langle \cdot\,, \cdot \rangle$ denote the natural pairing of $P^*$ and $P$.

Suppose $V$ is a $U_q(\mathfrak{g})$-module with crystal graph $B$. Then $B$ is a directed graph whose vertex set (also denoted $B$) indexes a basis of weight vectors of $V$, and has directed edges colored by the elements of the set $I$. The edges may be viewed as a combinatorial version of the action of Chevalley generators. This graph has the property that for every $b \in B$ and $i \in I$, there is at most one edge colored $i$ entering (resp. leaving) $b$. If there is an edge $b \to b'$ colored $i$, denote this by $f_i(b) = b'$ and $e_i(b') = b$. If there is no edge colored $i$ leaving $b$ (resp. entering $b'$) then say that $f_i(b)$ (resp. $e_i(b')$) is undefined. The $f_i$ and $e_i$ are called Kashiwara lowering and raising operators. Define $\varphi_i(b)$ (resp. $\varepsilon_i(b)$) to be the maximum $m \in \mathbb{N}$ such that $f_i^m(b)$ (resp. $e_i^m(b)$) is defined. There is a weight function $\mathrm{wt}: B \to P$ that satisfies the following properties:
\[ \mathrm{wt}(f_i(b)) = \mathrm{wt}(b) - \alpha_i, \qquad \mathrm{wt}(e_i(b)) = \mathrm{wt}(b) + \alpha_i, \qquad \langle h_i, \mathrm{wt}(b) \rangle = \varphi_i(b) - \varepsilon_i(b). \tag{3.1} \]
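$B$ is called a $P$-weighted $I$-crystal. Let $P^+ = \{\Lambda \in P \mid \langle h_i, \Lambda \rangle \ge 0 \ \forall i \in I\}$ be the set of dominant integral weights. For $\Lambda \in P^+$ denote by $V(\Lambda)$ the irreducible integrable highest weight $U_q(\mathfrak{g})$-module of highest weight $\Lambda$. Let $B(\Lambda)$ be its crystal graph. Say that an element $b \in B$ of the $P$-weighted $I$-crystal $B$ is a highest weight vector if $\varepsilon_i(b) = 0$ for all $i \in I$. Let $u_\Lambda$ be the highest weight vector in $B(\Lambda)$. By (3.1), for all $i \in I$,
\[ \varepsilon_i(u_\Lambda) = 0, \qquad \varphi_i(u_\Lambda) = \langle h_i, \Lambda \rangle. \tag{3.2} \]
Let $B'$ be the crystal graph of a $U_q(\mathfrak{g})$-module $V'$. A morphism of $P$-weighted $I$-crystals is a map $\tau: B \to B'$ such that $\mathrm{wt}(\tau(b)) = \mathrm{wt}(b)$ and $\tau(f_i(b)) = f_i(\tau(b))$ for all $b \in B$ and $i \in I$. In particular $f_i(b)$ is defined if and only if $f_i(\tau(b))$ is.

Suppose $V$ and $V'$ are $U_q(\mathfrak{g})$-modules with crystal graphs $B$ and $B'$ respectively. Then $V \otimes V'$ admits a crystal graph denoted $B \otimes B'$ which is equal to the direct product $B \times B'$ as a set. We use the opposite of the convention used in the literature. Define
\[ f_i(b \otimes b') = \begin{cases} b \otimes f_i(b') & \text{if } \varphi_i(b') > \varepsilon_i(b), \\ f_i(b) \otimes b' & \text{if } \varphi_i(b') \le \varepsilon_i(b) \text{ and } \varphi_i(b) > 0, \\ \text{undefined} & \text{otherwise.} \end{cases} \tag{3.3} \]
Equivalently,
\[ e_i(b \otimes b') = \begin{cases} e_i(b) \otimes b' & \text{if } \varphi_i(b') < \varepsilon_i(b), \\ b \otimes e_i(b') & \text{if } \varphi_i(b') \ge \varepsilon_i(b) \text{ and } \varepsilon_i(b') > 0, \\ \text{undefined} & \text{otherwise.} \end{cases} \tag{3.4} \]
One has
\[ \varphi_i(b \otimes b') = \varphi_i(b) + \max\{0, \varphi_i(b') - \varepsilon_i(b)\}, \qquad \varepsilon_i(b \otimes b') = \max\{0, \varepsilon_i(b) - \varphi_i(b')\} + \varepsilon_i(b'). \tag{3.5} \]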
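The tensor product rule can be tried out on the smallest possible case. The following is a minimal sketch, not part of the paper: it assumes the rank-one situation (a single color $i$), models each crystal as a chain whose element `(k, m)` is the highest weight vector lowered `k` times in a chain of length `m`, and checks (3.5) by counting how often the operators of (3.3) and (3.4) can be applied. All function names are hypothetical.

```python
from itertools import product

# Assumption: one color only. An element of the chain crystal B(m) is a
# pair (k, m) with 0 <= k <= m, where k counts applications of f to the
# highest weight vector (0, m).
def phi(b):
    k, m = b
    return m - k          # number of times f can still be applied

def eps(b):
    k, m = b
    return k              # number of times e can be applied

def f(b):
    k, m = b
    return (k + 1, m) if k < m else None

def e(b):
    k, m = b
    return (k - 1, m) if k > 0 else None

def f_tensor(b, bp):
    """Lowering operator on b (x) b' following rule (3.3)."""
    if phi(bp) > eps(b):
        return (b, f(bp))
    if phi(b) > 0:
        return (f(b), bp)
    return None           # undefined

def e_tensor(b, bp):
    """Raising operator on b (x) b' following rule (3.4)."""
    if phi(bp) < eps(b):
        return (e(b), bp)
    if eps(bp) > 0:
        return (b, e(bp))
    return None           # undefined

def count_steps(step, b, bp):
    """How many times `step` can be applied starting from b (x) b'."""
    n = 0
    pair = step(b, bp)
    while pair is not None:
        n += 1
        pair = step(*pair)
    return n

def check_rule_35(m, mp):
    """Verify (3.5) on every element of B(m) (x) B(mp)."""
    for k, kp in product(range(m + 1), range(mp + 1)):
        b, bp = (k, m), (kp, mp)
        assert count_steps(f_tensor, b, bp) == phi(b) + max(0, phi(bp) - eps(b))
        assert count_steps(e_tensor, b, bp) == max(0, eps(b) - phi(bp)) + eps(bp)
        lowered = f_tensor(b, bp)
        if lowered is not None:
            assert e_tensor(*lowered) == (b, bp)   # e undoes f
    return True
```

Calling `check_rule_35(3, 2)` runs through all twelve elements of the product. With this convention the lowering operator acts on the right factor first, which is the sense in which the rule is opposite to the usual one.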
Finally $\mathrm{wt}: B \otimes B' \to P$ is defined by $\mathrm{wt}(b \otimes b') = \mathrm{wt}_B(b) + \mathrm{wt}_{B'}(b')$, where $\mathrm{wt}_B: B \to P$ and $\mathrm{wt}_{B'}: B' \to P$ are the weight functions for $B$ and $B'$. This construction is “associative”, that is, the $P$-weighted $I$-crystals form a tensor category.

Remark 3.1. It follows from (3.4) that if $b = b_L \otimes \cdots \otimes b_1$ and $e_i(b)$ is defined, then $e_i(b) = b_L \otimes \cdots \otimes b_{j+1} \otimes e_i(b_j) \otimes b_{j-1} \otimes \cdots \otimes b_1$ for some $1 \le j \le L$.
3.2. $U_q(sl_n)$-crystal graphs on tableaux. Let $J = \{1, 2, \ldots, n-1\}$ be the indexing set for the Dynkin diagram of type $A_{n-1}$, with weight lattice $P_{\mathrm{fin}}$, simple roots $\{\bar\alpha_i \mid i \in J\}$, fundamental weights $\{\bar\Lambda_i \mid i \in J\}$, and simple coroots $\{h_i \mid i \in J\}$. Let $\lambda = (\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n) \in \mathbb{N}^n$ be a partition. There is a natural projection $\mathbb{Z}^n \to P_{\mathrm{fin}}$ denoted $\lambda \mapsto \bar\lambda = \sum_{i=1}^{n-1} (\lambda_i - \lambda_{i+1}) \bar\Lambda_i$. Let $V(\bar\lambda)$ be the irreducible integrable highest weight module of highest weight $\bar\lambda$ over the quantized universal enveloping algebra $U_q(sl_n)$ [17]. By abuse of notation we shall write $V^\lambda = V(\bar\lambda)$ and denote the crystal graph of $V^\lambda$ by $B_\lambda$. As a set $B_\lambda$ may be realized as the set of tableaux of shape $\lambda$ over the alphabet $\{1, 2, \ldots, n\}$. Define the content of $b \in B_\lambda$ by $\mathrm{content}(b) = (c_1, \ldots, c_n) \in \mathbb{N}^n$, where $c_j$ is the number of times the letter $j$ appears in $b$. The weight function $\mathrm{wt}: B_\lambda \to P_{\mathrm{fin}}$ is given by sending $b$ to the image of $\mathrm{content}(b)$ under the projection $\mathbb{Z}^n \to P_{\mathrm{fin}}$. The row-reading word of $b$ is defined by $\mathrm{word}(b) = \cdots w_2 w_1$, where $w_r$ is the word obtained by reading the $r$-th row of $b$ from left to right. This definition is useful even in the context that $b$ is a skew tableau.

The edges of $B_\lambda$ are given as follows. First let $v$ be a word in the alphabet $\{1, 2, \ldots, n\}$. View each letter $i$ (resp. $i+1$) of $v$ as a closing (resp. opening) parenthesis, ignoring other letters. Now iterate the following step: declare each adjacent pair of matched parentheses to be invisible. Repeat this until there are no matching pairs of visible parentheses. At the end the result must be a sequence of closing parentheses (say $p$ of them) followed by a sequence of opening parentheses (say $q$ of them). The unmatched (visible) subword is of the form $i^p (i+1)^q$. If $p > 0$ (resp. $q > 0$) then $f_i(v)$ (resp. $e_i(v)$) is obtained from $v$ by replacing the unmatched subword $i^p (i+1)^q$ by $i^{p-1} (i+1)^{q+1}$ (resp. $i^{p+1} (i+1)^{q-1}$). Then $\varphi_i(v) = p$, $\varepsilon_i(v) = q$, and $f_i(v)$ (resp. $e_i(v)$) is defined if and only if $p > 0$ (resp. $q > 0$). For the tableau $b \in B_\lambda$, let $f_i(b)$ be undefined if $f_i(\mathrm{word}(b))$ is; otherwise define $f_i(b)$ to be the unique (not necessarily column-strict) tableau of shape $\lambda$ such that $\mathrm{word}(f_i(b)) = f_i(\mathrm{word}(b))$. It is easy to verify that when defined, $f_i(b)$ is a column-strict tableau. Consequently $\varphi_i(b) = \varphi_i(\mathrm{word}(b))$. The operator $e_i$ and the quantity $\varepsilon_i(b)$ are defined similarly.

3.3. $U_q(\widehat{sl}_n')$-crystal structure on rectangular tableaux. There is an inclusion of algebras $U_q(sl_n) \subset U_q(\widehat{sl}_n')$, where $U_q(\widehat{sl}_n')$ is the quantized universal enveloping algebra corresponding to the derived subalgebra $\widehat{sl}_n'$ of the affine Kac–Moody algebra $\widehat{sl}_n$ [15]. Let $I = \{0, 1, 2, \ldots, n-1\}$ be the index set for the Dynkin diagram of $A^{(1)}_{n-1}$. Let $P_{\mathrm{cl}}$ be the weight lattice of $\widehat{sl}_n'$, with (linearly dependent) simple roots $\{\alpha_i^{\mathrm{cl}} \mid i \in I\}$, simple coroots $\{h_i \mid i \in I\}$, and fundamental weights $\{\Lambda_i^{\mathrm{cl}} \mid i \in I\}$. The simple roots satisfy the relation $\alpha_0^{\mathrm{cl}} = -\sum_{i \in J} \alpha_i^{\mathrm{cl}}$. There is a natural projection $P_{\mathrm{cl}} \to P_{\mathrm{fin}}$ with kernel $\mathbb{Z}\Lambda_0^{\mathrm{cl}}$ such that $\Lambda_i^{\mathrm{cl}} \mapsto \bar\Lambda_i$ for $i \in J$ and $\Lambda_0^{\mathrm{cl}} \mapsto 0$. Let $\mathrm{cl}: P_{\mathrm{fin}} \to P_{\mathrm{cl}}$ be the section of the above projection defined by $\mathrm{cl}(\bar\Lambda_i) = \Lambda_i^{\mathrm{cl}} - \Lambda_0^{\mathrm{cl}}$ for $i \in J$. Let $c \in \widehat{sl}_n'$ be the canonical central element. The level of a weight $\Lambda \in P_{\mathrm{cl}}$ is defined by $\langle c, \Lambda \rangle$. Let $(P_{\mathrm{cl}}^+)_\ell = \{\Lambda \in P_{\mathrm{cl}}^+ \mid \langle c, \Lambda \rangle = \ell\}$.

Suppose $V$ is a finite-dimensional $U_q(\widehat{sl}_n')$-module that has a crystal graph $B$ (not all do); $B$ is a $P_{\mathrm{cl}}$-weighted $I$-crystal. A weight function $\mathrm{wt}_{\mathrm{cl}}: B \to P_{\mathrm{cl}}$ may be given by $\mathrm{wt}_{\mathrm{cl}}(b) = \mathrm{cl}(\mathrm{wt}(b))$, where $\mathrm{wt}: B \to P_{\mathrm{fin}}$ is the weight function on the set $B$ viewed as a $U_q(sl_n)$-crystal graph. In addition to being a $U_q(sl_n)$-crystal graph, $B$ also has some edges colored 0. The action of $U_q(sl_n)$ on $V^\lambda$ extends to an action of $U_q(\widehat{sl}_n')$ which admits a crystal structure, if and only if the partition $\lambda$ is a rectangle [18, 30]. If $\lambda$ is the rectangle with $k$ rows and $m$ columns, then write $V^{k,m}$ for the $U_q(\widehat{sl}_n')$-module with $U_q(sl_n)$-structure $V^\lambda$ and denote its crystal graph by $B^{k,m}$. If one of $m$ or $k$ is 1, then it is easy to give $e_0$ and $f_0$ explicitly on $B^{k,m}$, for in this case the weight spaces of $V^{k,m}$ are one-dimensional, and the zero edges can be deduced from (3.1) [18].

The general case is given as follows [37]. We shall first define a content-rotating bijection $\psi^{-1}: B^{k,m} \to B^{k,m}$. Let $b \in B^{k,m}$ be a tableau, say of content $(c_1, c_2, \ldots, c_n)$. $\psi^{-1}(b)$ will have content $(c_2, c_3, \ldots, c_n, c_1)$. Remove all the letters 1 from $b$, leaving a vacant horizontal strip of size $c_1$ in the northwest corner of $b$. Compute Schensted’s P tableau [34] of the row-reading word of this skew subtableau. It can be shown that this yields a tableau of the shape obtained by removing $c_1$ cells from the last row of the rectangle $(m^k)$. Subtract one from the value of each entry of this tableau, and then fill in the $c_1$ vacant cells in the last row of the rectangle $(m^k)$ with the letter $n$. It can be shown that $\psi^{-1}$ is a well-defined bijection, whose inverse $\psi$ can be given by a similar algorithm. Then
\[ f_i = \psi^{-1} \circ f_{i+1} \circ \psi, \qquad e_i = \psi^{-1} \circ e_{i+1} \circ \psi \tag{3.6} \]
for all $i$, where indices are taken modulo $n$; in particular for $i = 0$ this defines explicitly the operators $e_0$ and $f_0$.

3.4. Sequences of rectangular tableaux. For a sequence of rectangles $R$, consider the tensor product $V^{R_L} \otimes \cdots \otimes V^{R_1}$. Its $U_q(\widehat{sl}_n')$-crystal graph has underlying set $P_R = B_{R_L} \otimes \cdots \otimes B_{R_1}$, where the tensor symbols denote the Cartesian product of sets. A typical element of $P_R$ is called a path and is written $b = b_L \otimes \cdots \otimes b_2 \otimes b_1$, where $b_j \in B_{R_j}$ is a tableau of shape $R_j$. The edges of the crystal graph $P_R$ are given explicitly as follows. Define the word of a path $b$ by $\mathrm{word}(b) = \mathrm{word}(b_L) \cdots \mathrm{word}(b_2)\,\mathrm{word}(b_1)$. Then for $i = 1, 2, \ldots, n-1$ (as in the definition of $f_i$ for $b \in B_\lambda$), if $f_i(\mathrm{word}(b))$ is undefined, let $f_i(b)$ be undefined; otherwise it is not hard to see that there is a unique path $f_i(b) \in P_R$ such that $\mathrm{word}(f_i(b)) = f_i(\mathrm{word}(b))$. To define $f_0$, let $\psi(b) = \psi(b_L) \otimes \cdots \otimes \psi(b_1)$ and $f_0 = \psi^{-1} \circ f_1 \circ \psi$. This definition is equivalent to that given by taking the above definition of $f_i$ on the crystals $B_{R_j}$ and then applying the rule for lowering operators on tensor products (3.3). The action of $e_i$ for $i \in I$ is defined analogously.

3.5. Integrable affine crystals. Consider the affine Kac–Moody algebra $\widehat{sl}_n$, with weight lattice $P_{\mathrm{af}}$, independent simple roots $\{\alpha_i \mid i \in I\}$, simple coroots $\{h_i \mid i \in I\}$, and fundamental weights $\{\Lambda_i \mid i \in I\}$. Let $\delta \in P_{\mathrm{af}}$ be the null root. There is a natural projection which we shall by abuse of notation also call $\mathrm{cl}: P_{\mathrm{af}} \to P_{\mathrm{cl}}$ such that $\mathrm{cl}(\delta) = 0$ and $\mathrm{cl}(\Lambda_i) = \Lambda_i^{\mathrm{cl}}$ for $i \in I$. Write $\mathrm{af}: P_{\mathrm{cl}} \to P_{\mathrm{af}}$ for the section of $\mathrm{cl}$ given by $\mathrm{af}(\Lambda_i^{\mathrm{cl}}) = \Lambda_i$ for $i \in I$.
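The parenthesis-matching rule of Sect. 3.2, which Sect. 3.4 applies unchanged to the word of a path, is short enough to write out. The sketch below is an illustration under stated assumptions rather than the authors' code: words are Python lists of integers, tableaux are lists of rows, the function names are hypothetical, and only the colors $i \in J$ are covered (the color 0 would additionally need the map $\psi$, which is not implemented here).

```python
def signature(word, i):
    """Positions of the unmatched letters i (closing) and i+1 (opening)."""
    opening = []          # positions of currently unmatched letters i+1
    closing = []          # positions of unmatched letters i
    for pos, letter in enumerate(word):
        if letter == i + 1:
            opening.append(pos)
        elif letter == i:
            if opening:
                opening.pop()         # pair it with the nearest opening parenthesis
            else:
                closing.append(pos)
    return closing, opening

def phi(word, i):
    """phi_i(v) = p, the number of unmatched letters i."""
    return len(signature(word, i)[0])

def eps(word, i):
    """eps_i(v) = q, the number of unmatched letters i+1."""
    return len(signature(word, i)[1])

def f(word, i):
    """Replace i^p (i+1)^q by i^(p-1) (i+1)^(q+1); None if undefined."""
    closing, _ = signature(word, i)
    if not closing:
        return None
    new_word = list(word)
    new_word[closing[-1]] = i + 1     # rightmost unmatched i becomes i+1
    return new_word

def e(word, i):
    """Replace i^p (i+1)^q by i^(p+1) (i+1)^(q-1); None if undefined."""
    _, opening = signature(word, i)
    if not opening:
        return None
    new_word = list(word)
    new_word[opening[0]] = i          # leftmost unmatched i+1 becomes i
    return new_word

def row_word(tableau):
    """Row-reading word ... w_2 w_1: bottom row first, each row left to right."""
    return [x for row in reversed(tableau) for x in row]

def path_word(path):
    """Word of a path b_L (x) ... (x) b_1, given as the list [b_L, ..., b_1]."""
    return [x for tableau in path for x in row_word(tableau)]

# The two tableaux of Example 3.3 (n = 6, k = 3, level 5).
b = [[1, 1, 2, 4, 4], [2, 3, 5, 5, 5], [4, 6, 6, 6, 6]]
b_prime = [[1, 1, 1, 2, 3], [2, 2, 3, 4, 5], [3, 3, 4, 5, 6]]
```

As a usage check, `[phi(row_word(b), i) for i in range(1, 6)]` and `[eps(row_word(b_prime), i) for i in range(1, 6)]` both give `[1, 1, 0, 1, 0]`, which agrees with the coefficients of $\Lambda_1, \ldots, \Lambda_5$ in the weight $\Lambda = 2\Lambda_0 + \Lambda_1 + \Lambda_2 + \Lambda_4$ of Example 3.3.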
Let $\Lambda \in P_{\mathrm{cl}}^+$ be a dominant integral weight and $B(\Lambda)$ the crystal graph of the irreducible integrable highest weight $U_q(\widehat{sl}_n')$-module of highest weight $\Lambda$. If $\Lambda \neq 0$ then $B(\Lambda)$ is infinite. The set of weights in $P_{\mathrm{af}}$ that project by $\mathrm{cl}$ to $\Lambda$ is given by $\mathrm{cl}^{-1}(\Lambda) = \{\mathrm{af}(\Lambda) + j\delta \mid j \in \mathbb{Z}\}$. Now fix $j$. The irreducible integrable highest weight $U_q(\widehat{sl}_n)$-crystal graph $B(\mathrm{af}(\Lambda) + j\delta)$ may be identified with $B(\Lambda)$ as sets and as $I$-crystals (independent of $j$). The weight functions for $B(\mathrm{af}(\Lambda) + j\delta)$ and $B(\mathrm{af}(\Lambda))$ differ by the global constant $j\delta$. The weight function $B(\Lambda) \to P_{\mathrm{cl}}$ is obtained by composing the weight function for $B(\mathrm{af}(\Lambda) + j\delta)$ with the projection $\mathrm{cl}: P_{\mathrm{af}} \to P_{\mathrm{cl}}$. The set $B(\Lambda)$ is then endowed with an induced $\mathbb{Z}$-grading $E: B(\Lambda) \to \mathbb{N}$ defined by $E(b) = -\langle d, \mathrm{wt}(b) \rangle$, where $B(\Lambda)$ is identified with $B(\mathrm{af}(\Lambda))$, $\mathrm{wt}: B(\mathrm{af}(\Lambda)) \to P_{\mathrm{af}}$ is the weight function and $d \in P_{\mathrm{af}}^*$ is the degree generator. The map $\langle d, \cdot \rangle$ takes the coefficient of the element $\delta$ of an element in $P_{\mathrm{af}}$ when written in the basis $\{\Lambda_i \mid i \in I\} \cup \{\delta\}$.

3.6. Energy function on finite paths. The set of paths $P_R$ has a natural statistic called the energy function. The definitions here follow [30]. Consider first the case that $R = (R_1, R_2)$ is a sequence of two rectangles. Let $B_j = B_{R_j}$ for $1 \le j \le 2$. Since $B_2 \otimes B_1$ is a connected crystal graph, there is a unique $U_q(\widehat{sl}_n')$-crystal graph isomorphism
\[ \sigma: B_2 \otimes B_1 \cong B_1 \otimes B_2. \tag{3.7} \]
This is called the local isomorphism (see Sect. 4.4 for an explicit construction). Write $\sigma(b_2 \otimes b_1) = b_1' \otimes b_2'$. Then there is a unique (up to a global additive constant) map $H: B_2 \otimes B_1 \to \mathbb{Z}$ such that
\[ H(e_i(b_2 \otimes b_1)) = H(b_2 \otimes b_1) + \begin{cases} -1 & \text{if } i = 0,\ e_0(b_2 \otimes b_1) = e_0 b_2 \otimes b_1 \text{ and } e_0(b_1' \otimes b_2') = e_0 b_1' \otimes b_2', \\ 1 & \text{if } i = 0,\ e_0(b_2 \otimes b_1) = b_2 \otimes e_0 b_1 \text{ and } e_0(b_1' \otimes b_2') = b_1' \otimes e_0 b_2', \\ 0 & \text{otherwise.} \end{cases} \tag{3.8} \]
This map is called the local energy function. By definition it is invariant under the local isomorphism and under $f_i$ and $e_i$ for $i \in J$. Let us normalize it by the condition that $H(u_2 \otimes u_1) = |R_1 \cap R_2|$, where $u_j$ is the $U_q(sl_n)$ highest weight vector of $B_j$ for $1 \le j \le 2$, $R_1 \cap R_2$ is the intersection of the Ferrers diagrams of $R_1$ and $R_2$, and $|R_1 \cap R_2|$ is the number of cells in this intersection. Explicitly $|R_1 \cap R_2| = \min\{\eta_1, \eta_2\} \min\{\mu_1, \mu_2\}$. If $\eta_1 + \eta_2 \le n$ then the local energy function attains precisely the values from 0 to $|R_1 \cap R_2|$.

Now let $R = (R_1, \ldots, R_L)$ be a sequence of rectangles and $b = b_L \otimes \cdots \otimes b_1 \in P_R$. For $1 \le p \le L-1$ let $\sigma_p$ denote the local isomorphism that exchanges the tensor factors in the $p$-th and $(p+1)$-th positions. For $1 \le i < j \le L$, let $b_j^{(i+1)}$ be the $(i+1)$-th tensor factor in $\sigma_{i+1} \sigma_{i+2} \cdots \sigma_{j-1}(b)$. Then define the energy function
\[ E(b) = \sum_{1 \le i < j \le L} H(b_j^{(i+1)} \otimes b_i). \tag{3.9} \]
The value of the energy function is unchanged under local isomorphisms and under ei and fi for i ∈ J , since the local energy function has this property. The next lemma follows from the definition of the local energy function.
Lemma 3.2. Suppose $b = b_L \otimes \cdots \otimes b_1 \in P_R$ is such that $e_0(b)$ is defined and for any image $b' = b_L' \otimes \cdots \otimes b_1'$ of $b$ under a composition of local isomorphisms, $e_0(b') = b_L' \otimes \cdots \otimes b_{j+1}' \otimes e_0(b_j') \otimes b_{j-1}' \otimes \cdots \otimes b_1'$, where $j \neq 1$. Then $E(e_0(b)) = E(b) - 1$.

If all rectangles $R_j$ are the same then each of the local isomorphisms is the identity and
\[ E(b) = \sum_{1 \le i \le L-1} (L - i)\, H(b_{i+1} \otimes b_i). \tag{3.10} \]
Say that $b \in P_R$ is classically restricted if it is an $sl_n$-highest weight vector, that is, $\varepsilon_i(b) = 0$ for all $i \in J$. Equivalently, $\mathrm{word}(b)$ is a (reverse) lattice permutation (every final subword has partition content). Let $P_{\Lambda R}$ be the set of classically restricted paths in $P_R$ of weight $\Lambda \in P_{\mathrm{cl}}$. It was shown in [37] that the generalized Kostka polynomial (which was originally defined in terms of Littlewood–Richardson tableaux; see (4.3)) can be expressed as
\[ K_{\lambda R}(q) = \sum_{b \in P_{\mathrm{cl}(\lambda) R}} q^{E(b)}. \tag{3.11} \]
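This extends the path formulation of the Kostka polynomial by Nakayashiki and Yamada [30].

3.7. Level-restricted paths. Let $B$ be any $P_{\mathrm{cl}}$-weighted $I$-crystal and $\Lambda \in P_{\mathrm{cl}}^+$. Say that $b \in B$ is $\Lambda$-restricted if $b \otimes u_\Lambda$ is a highest weight vector in the $P_{\mathrm{cl}}$-weighted $I$-crystal $B \otimes B(\Lambda)$, that is, $\varepsilon_i(b \otimes u_\Lambda) = 0$ for all $i \in I$. Equivalently $\varepsilon_i(b) \le \langle h_i, \Lambda \rangle$ for all $i \in I$ by (3.5) and (3.2). Denote by $\mathcal{H}(\Lambda, B)$ the set of elements $b \in B$ that are $\Lambda$-restricted. If $\Lambda' \in P_{\mathrm{cl}}^+$ has the same level as $\Lambda$, define $\mathcal{H}(\Lambda, B, \Lambda')$ to be the set of $b \in \mathcal{H}(\Lambda, B)$ such that $\mathrm{wt}(b) = \Lambda' - \Lambda \in P_{\mathrm{cl}}$, that is, the set of $b \in B$ such that $b \otimes u_\Lambda$ is a highest weight vector of weight $\Lambda'$. Say that the element $b$ is restricted of level $\ell$ if it is $(\ell\Lambda_0)$-restricted. Such paths are also classically restricted since $\langle h_i, \ell\Lambda_0 \rangle = 0$ for $i \in J$. Let $P^{\ell}_{\Lambda R}$ denote the set of paths in $P_{\Lambda R}$ that are restricted of level $\ell$. Letting $B = P_R$, this is the same as saying $P^{\ell}_{\Lambda R} = \mathcal{H}(\ell\Lambda_0, B, \Lambda + \ell\Lambda_0)$. Define the level-restricted generalized Kostka polynomial by
\[ K^{\ell}_{\lambda R}(q) = \sum_{b \in P^{\ell}_{\mathrm{cl}(\lambda) R}} q^{E(b)}. \tag{3.12} \]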
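The classical part of these restriction conditions, $\varepsilon_i(b) = 0$ for all $i \in J$, is the reverse lattice permutation property of Sect. 3.6 and can be tested without building any crystal operators. The following is a minimal sketch with hypothetical names, assuming a word is a list of letters in $\{1, \ldots, n\}$; it does not test the extra condition $\varepsilon_0(b) \le \ell$ required for level restriction, since that needs the map $\psi$.

```python
def is_classically_restricted(word, n):
    """True if every final subword of `word` has partition content.

    Scanning from the right, the content (c_1, ..., c_n) of the current
    suffix must satisfy c_1 >= c_2 >= ... >= c_n. Adding a letter j can
    only break the inequality c_(j-1) >= c_j, so one comparison suffices.
    """
    counts = [0] * (n + 1)            # counts[j] = occurrences of j so far
    for letter in reversed(word):
        counts[letter] += 1
        if letter > 1 and counts[letter] > counts[letter - 1]:
            return False
    return True
```

For instance the word `[3, 2, 1, 2, 1]` passes, while `[1, 2]` fails because its final subword `[2]` has content $(0, 1)$.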
3.8. Perfect crystals. This section is needed to compute the coset branching functions in Sect. 7. We follow [19], stating the definitions in the case of $\widehat{sl}_n$. For any $U_q(\widehat{sl}_n')$-crystal $B$, define $\varepsilon, \varphi: B \to P_{\mathrm{cl}}$ by $\varepsilon(b) = \sum_{i \in I} \varepsilon_i(b) \Lambda_i$ and $\varphi(b) = \sum_{i \in I} \varphi_i(b) \Lambda_i$. Now let $\ell$ be a positive integer and $B$ the crystal graph of a finite dimensional irreducible $U_q(\widehat{sl}_n')$-module $V$. Say that $B$ is perfect of level $\ell$ if
(1) $B \otimes B$ is connected.
(2) There is a weight $\Lambda' \in P_{\mathrm{cl}}$ such that $B$ has a unique vector of weight $\Lambda'$ and all other vectors in $B$ have lower weight in the Chevalley order, that is, $\mathrm{wt}(B) \subset \Lambda' - \sum_{i \in J} \mathbb{N}\alpha_i$.
(3) $\ell = \min_{b \in B} \langle c, \varepsilon(b) \rangle$.
(4) The maps $\varepsilon$ and $\varphi$ restrict to bijections $B_{\min} \to (P_{\mathrm{cl}}^+)_\ell$, where $B_{\min} \subset B$ is the set of $b \in B$ achieving the minimum in (3).
For $\widehat{sl}_n$ the perfect crystals of level $\ell$ are precisely those of the form $B^{k,\ell}$ for $1 \le k \le n-1$ [18, 30]. Let $B = B^{k,\ell}$. The weight $\Lambda'$ can be taken to be $\ell(\Lambda_k^{\mathrm{cl}} - \Lambda_0^{\mathrm{cl}})$.

Example 3.3. We describe the bijections $\varepsilon, \varphi: B_{\min} \to (P_{\mathrm{cl}}^+)_\ell$ in this example. Let $B = B^{k,\ell}$. For this example let $n = 6$, $k = 3$, $\ell = 5$, and consider the weight $\Lambda = 2\Lambda_0 + \Lambda_1 + \Lambda_2 + \Lambda_4$. As usual subscripts are identified modulo $n$. The unique tableau $b \in B^{k,\ell}$ such that $\varphi(b) = \Lambda$ is constructed as follows. First let $T$ be the following tableau of shape $(\ell^k)$. Its bottom row contains $\langle h_i, \Lambda \rangle$ copies of the letter $i$ for $1 \le i \le n$ (here it is 12466 since the sequence of $\langle h_i, \Lambda \rangle$ for $1 \le i \le 6$ is $(1, 1, 0, 1, 0, 2)$). Let every letter in $T$ have value one smaller than the letter directly below it. Here we have

T =
  -1  0  2  4  4
   0  1  3  5  5
   1  2  4  6  6

Let $T_-$ be the subtableau of $T$ consisting of the entries that are nonpositive and $T_+$ the rest. Say $T_-$ has shape $\nu$ (here $\nu = (2, 1)$). Let $\tilde\nu = (\ell^k) - (\nu_k, \nu_{k-1}, \ldots, \nu_1)$ (here $\tilde\nu = (5, 4, 3)$). The desired tableau $b$ is defined as follows. The restriction of $b$ to the shape $\tilde\nu$ is $P(T_+)$, or equivalently, the tableau obtained by taking the skew tableau $T_+$ and first pushing all letters straight upwards to the top of the bounding rectangle $(\ell^k)$, and then pushing all letters straight to the left inside $(\ell^k)$. The restriction of $b$ to $(\ell^k)/\tilde\nu$ is the tableau of that skew shape in the alphabet $\{1, 2, \ldots, n\}$ with maximal entries, that is, its bottom row is filled with the letter $n$, the next-to-bottom row is filled with the letter $n-1$, etc. In the example,

b =
   1  1  2  4  4
   2  3  5  5  5
   4  6  6  6  6

To construct the unique element $b' \in B^{k,\ell}$ such that $\varepsilon(b') = \Lambda$, let $U$ be the tableau whose first row has $\langle h_i, \Lambda \rangle$ copies of the letter $i+1$ for $1 \le i \le n$, again identifying subscripts modulo $n$; here $U$ has first row 11235. Now let the rest of $U$ be defined by letting each entry have value one greater than the entry above it. So

U =
   1  1  2  3  5
   2  2  3  4  6
   3  3  4  5  7

Let $U_-$ be the subtableau of $U$ consisting of the values that are at most $n$. Let $\mu$ be the shape of $U_-$ and $\tilde\mu = (\ell^k) - (\mu_k, \mu_{k-1}, \ldots, \mu_1)$. Here $\mu = (5, 5, 4)$ and $\tilde\mu = (1, 0, 0)$. The element $b'$ is defined as follows. Its restriction to the skew shape $(\ell^k)/\tilde\mu$ is the unique skew tableau $V$ of that shape such that $P(V) = U_-$, or equivalently, this restriction is obtained by taking the tableau $U_-$, pushing all letters directly down within the rectangle $(\ell^k)$ and then pushing all letters to the right within $(\ell^k)$. The restriction of $b'$ to the shape $\tilde\mu$ is filled with the smallest letters possible, so that the first row of this subtableau consists of ones, the second row consists of twos, etc. Here

b' =
   1  1  1  2  3
   2  2  3  4  5
   3  3  4  5  6

The main theorem for perfect crystals is:

Theorem 3.4 ([19]). Let $B$ be a perfect crystal of level $\ell'$ and $\Lambda \in (P_{\mathrm{cl}}^+)_\ell$ with $\ell \ge \ell'$. Then there is an isomorphism of $U_q(\widehat{sl}_n')$-crystals
\[ B \otimes B(\Lambda) \cong \bigoplus_{b \in \mathcal{H}(\Lambda, B)} B(\Lambda + \mathrm{wt}(b)). \tag{3.13} \]
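Suppose now that $B$ is perfect of level $\ell$ and $\Lambda \in (P_{\mathrm{cl}}^+)_\ell$. Write $b(\Lambda)$ for the unique element of $B$ such that $\varphi(b(\Lambda)) = \Lambda$. Theorem 3.4 (with $\Lambda$ therein replaced by $\Lambda' = \varepsilon(b(\Lambda))$) says that $B \otimes B(\varepsilon(b(\Lambda))) \cong B(\Lambda)$ with corresponding highest weight vectors $b(\Lambda) \otimes u_{\varepsilon(b(\Lambda))} \mapsto u_\Lambda$. This isomorphism can be iterated. Let $\sigma: B_{\min} \to B_{\min}$ be the unique bijection defined by $\varphi \circ \sigma = \varepsilon$. Then there are isomorphisms $B^{\otimes N} \otimes B(\varphi(\sigma^N(b(\Lambda)))) \cong B(\Lambda)$ such that the highest weight vector of the left-hand side is given by $b(\Lambda) \otimes \sigma(b(\Lambda)) \otimes \sigma^2(b(\Lambda)) \otimes \cdots \otimes \sigma^{N-1}(b(\Lambda)) \otimes u_{\varphi(\sigma^N(b(\Lambda)))}$. For the $U_q(\widehat{sl}_n')$ perfect crystals $B^{k,\ell}$, it can be shown that the map $\sigma$ is none other than the power $\psi^{-k}$ of the content rotating map $\psi$. Moreover if $\sigma$ is extended to a bijection $\sigma: B^{k,\ell} \to B^{k,\ell}$ by defining $\sigma = \psi^{-k}$, then the extended function also satisfies $\varphi(\sigma(b)) = \varepsilon(b)$ for all $b \in B^{k,\ell}$, not just for $b \in B_{\min}$. Since the bijection $\psi$ on $B^{k,\ell}$ has order $n$, the bijection $\sigma$ has order $n/\gcd(n, k)$.

The ground state path for the pair $(\Lambda, B)$ is by definition the infinite periodic sequence $\bar b = \bar b_1 \otimes \bar b_2 \otimes \cdots$, where $\bar b_i = \sigma^{i-1}(b(\Lambda))$. Let $\mathcal{P}(\Lambda, B)$ be the set of all semi-infinite sequences $b = b_1 \otimes b_2 \otimes \cdots$ of elements in $B$ such that $b$ eventually agrees with the ground state path $\bar b$ for $(\Lambda, B)$. Then the set $\mathcal{P}(\Lambda, B)$ has the structure of the crystal $B(\Lambda)$ with highest weight vector $u_\Lambda = \bar b$ and weight function $\mathrm{wt}(b) = \sum_{i \ge 1} (\mathrm{wt}(b_i) - \mathrm{wt}(\bar b_i))$. To recover the weight function of the $U_q(\widehat{sl}_n)$-crystal $B(\mathrm{af}(\Lambda))$, define the energy function on $\mathcal{P}(\Lambda, B)$ by
\[ E(b) = \sum_{i \ge 1} i \left( H(b_i \otimes b_{i+1}) - H(\bar b_i \otimes \bar b_{i+1}) \right) \tag{3.14} \]
and define the map $B(\mathrm{af}(\Lambda)) \to P_{\mathrm{af}}$ by $b \mapsto \mathrm{wt}(b) - E(b)\delta$, where $\mathrm{wt}: B(\Lambda) \to P_{\mathrm{cl}}$.

$\mathcal{P}(\Lambda, B)$ can be regarded as a direct limit of the finite crystals $B^{\otimes N}$. Define the embedding $i_N: B^{\otimes N} \to \mathcal{P}(\Lambda, B)$ by $b_1 \otimes \cdots \otimes b_N \mapsto b_1 \otimes \cdots \otimes b_N \otimes \bar b_{N+1} \otimes \bar b_{N+2} \otimes \cdots$. Define $E_N: B^{\otimes N} \to \mathbb{Z}$ by $E_N(b_1 \otimes \cdots \otimes b_N) = E(b_1 \otimes \cdots \otimes b_N \otimes \bar b_{N+1})$, where the $E$ on the right-hand side is the energy function for the finite path space $B^{\otimes N+1}$. By definition for all $p = b_1 \otimes \cdots \otimes b_N \in B^{\otimes N}$, $E(i_N(p)) = E_N(p) - E_N(\bar b_1 \otimes \cdots \otimes \bar b_N)$. Note that the last fixed step $\bar b_{N+1}$ is necessary to make the energy function on the finite paths stable under the embeddings into $\mathcal{P}(\Lambda, B)$.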
3.9. Standardization embeddings. We require certain embeddings of finite path spaces. Given a sequence of rectangles $R$, let $r(R)$ denote the sequence of rectangles given by splitting the rectangles of $R$ into their constituent rows. For example, if $R = ((1), (2, 2))$, then $r(R) = ((1), (2), (2))$. There is a unique embedding
\[ i_R: P_R \hookrightarrow P_{r(R)} \tag{3.15} \]
defined as follows. Its explicit computation is based on transforming $R$ into $r(R)$ using two kinds of steps.
(1) Suppose $R_1$ has more than one row ($\eta_1 > 1$). Then use the transformation $R \to R^< = ((\mu_1), (\mu_1^{\eta_1 - 1}), R_2, R_3, \ldots, R_L)$. Informally, $R^<$ is obtained from $R$ by splitting off the first row of $R_1$. There is an associated embedding of $U_q(\widehat{sl}_n')$-crystal graphs $i_R^<: P_R \to P_{R^<}$ defined by the property that $\mathrm{word}(i_R^<(b)) = \mathrm{word}(b)$ for all $b \in P_R$. Here it is crucial that the rectangle being split horizontally is the first one, for otherwise the embedding does not preserve the edges labeled by 0.
(2) If $\eta_1 = 1$, then use a transformation of the form $R \to s_p R$ for some $p$. Here $s_p R$ denotes the sequence of rectangles obtained by exchanging the $p$-th and $(p+1)$-th rectangles in $R$. The associated isomorphism of $U_q(\widehat{sl}_n')$-crystal graphs is the local isomorphism $\sigma_p: P_R \to P_{s_p R}$ defined before.
It is clear that one can transform $R$ into $r(R)$ using these two kinds of steps. Now fix one such sequence of steps leading from $R$ to $r(R)$, say $R = R^{(0)} \to R^{(1)} \to \cdots \to R^{(N)} = r(R)$, where each $R^{(m)}$ is a sequence of rectangles and each step $R^{(m-1)} \to R^{(m)}$ is one of the two types defined above. Define the map $i^{(m)}: P_{R^{(m-1)}} \hookrightarrow P_{R^{(m)}}$ by $i^{(m)} = i_R$
[...] $\sum_{a=1}^{L} \mu_a \max\{\eta_a - k, 0\}$ (5.1)

for $k \ge 0$, where by convention $\nu^{(0)}$ is the empty partition. If $\lambda$ has at most $n$ parts all partitions $\nu^{(k)}$ for $k \ge n$ are empty. For a partition $\rho$, define $m_i(\rho)$ to be the number of parts equal to $i$ and
\[ Q_i(\rho) = \rho^t_1 + \rho^t_2 + \cdots + \rho^t_i = \sum_{j \ge 1} \min\{i, \rho_j\}, \]
the size of the first $i$ columns of $\rho$. Let $\xi^{(k)}(R)$ be the partition whose parts are the widths of the rectangles in $R$ of height $k$. The vacancy numbers for the $(\lambda; R)$-configuration $\nu$ are the numbers (indexed by $k \ge 1$ and $i \ge 0$) defined by
\[ P_i^{(k)}(\nu) = Q_i(\nu^{(k-1)}) - 2 Q_i(\nu^{(k)}) + Q_i(\nu^{(k+1)}) + Q_i(\xi^{(k)}(R)). \tag{5.2} \]
In particular $P_0^{(k)}(\nu) = 0$ for all $k \ge 1$. The $(\lambda; R)$-configuration $\nu$ is said to be admissible if $P_i^{(k)}(\nu) \ge 0$ for all $k, i \ge 1$, and the set of admissible $(\lambda; R)$-configurations is denoted by $C(\lambda; R)$. Following [26, (3.2)], set
\[ cc(\nu) = \sum_{k, i \ge 1} \alpha_i^{(k)} \left( \alpha_i^{(k)} - \alpha_i^{(k+1)} \right), \]
where $\alpha_i^{(k)}$ is the size of the $i$-th column in $\nu^{(k)}$. Define the charge $c(\nu)$ of a configuration $\nu \in C(\lambda; R)$ by $c(\nu) = \|R\| - cc(\nu) - |P|$ with
\[ \|R\| = \sum_{1 \le i < j \le L} |R_i \cap R_j| \qquad \text{and} \qquad |P| = \sum_{k, i \ge 1} m_i(\nu^{(k)})\, P_i^{(k)}(\nu). \]
Observe that $c(\nu)$ depends on both $\nu$ and $R$ but $cc(\nu)$ depends only on $\nu$.

Example 5.1. Let $\lambda = (3, 2, 2, 1)$ and $R = ((2), (2, 2), (1, 1))$. Then $\nu = ((2), (2, 1), (1))$ is a $(\lambda; R)$-configuration with $\xi^{(1)}(R) = (2)$ and $\xi^{(2)}(R) = (2, 1)$. The configuration $\nu$ may be represented as

  ν^(1):  1 [ ][ ]      ν^(2):  0 [ ][ ]      ν^(3):  0 [ ]
                                0 [ ]

where the vacancy numbers are indicated to the left of each part. In addition $cc(\nu) = 3$, $\|R\| = 5$, $|P| = 1$ and $c(\nu) = 1$.

Define the q-binomial by
\[ \begin{bmatrix} m + p \\ m \end{bmatrix} = \frac{(q)_{m+p}}{(q)_m (q)_p} \]
for $m, p \in \mathbb{N}$ and zero otherwise, where $(q)_m = (1 - q)(1 - q^2) \cdots (1 - q^m)$. The following fermionic or quasi-particle expression of the generalized Kostka polynomials is a variant of [25, Theorem 2.10].

Theorem 5.2. For $\lambda$ a partition and $R$ a sequence of rectangles
\[ K_{\lambda R}(q) = \sum_{\nu \in C(\lambda; R)} q^{c(\nu)} \prod_{k, i \ge 1} \begin{bmatrix} P_i^{(k)}(\nu) + m_i(\nu^{(k)}) \\ m_i(\nu^{(k)}) \end{bmatrix}. \tag{5.3} \]
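The ingredients of (5.3) are all elementary to compute. The sketch below recomputes the data of Example 5.1 from (5.2) and the definitions of $cc(\nu)$, $\|R\|$, $|P|$ and $c(\nu)$. It is an illustration only: the function names are hypothetical, a rectangle is encoded as a pair `(mu, eta)` of width and height, and a configuration is a list of partitions `[nu1, nu2, ...]` with every omitted partition taken to be empty.

```python
def Q(i, rho):
    """Q_i(rho): number of cells in the first i columns of rho."""
    return sum(min(i, part) for part in rho)

def column(rho, i):
    """Size of the i-th column of rho (i >= 1)."""
    return sum(1 for part in rho if part >= i)

def xi(R, k):
    """Widths of the rectangles of height k."""
    return [mu for (mu, eta) in R if eta == k]

def nu_at(nu, k):
    """nu^(k), with nu^(0) and all partitions beyond the list empty."""
    return nu[k - 1] if 1 <= k <= len(nu) else []

def vacancy(nu, R, k, i):
    """Vacancy number P_i^(k)(nu) of (5.2)."""
    return (Q(i, nu_at(nu, k - 1)) - 2 * Q(i, nu_at(nu, k))
            + Q(i, nu_at(nu, k + 1)) + Q(i, xi(R, k)))

def cocharge(nu):
    """cc(nu): sum over k, i of alpha_i^(k) (alpha_i^(k) - alpha_i^(k+1))."""
    total = 0
    for k in range(1, len(nu) + 1):
        width = max(nu_at(nu, k), default=0)
        for i in range(1, width + 1):       # columns beyond `width` contribute 0
            a = column(nu_at(nu, k), i)
            total += a * (a - column(nu_at(nu, k + 1), i))
    return total

def norm_R(R):
    """||R||: sum of |R_i ∩ R_j| over pairs i < j."""
    return sum(min(R[i][0], R[j][0]) * min(R[i][1], R[j][1])
               for i in range(len(R)) for j in range(i + 1, len(R)))

def size_P(nu, R):
    """|P|: the vacancy number of every part, summed over all parts."""
    return sum(vacancy(nu, R, k, part)
               for k in range(1, len(nu) + 1) for part in nu_at(nu, k))

def charge(nu, R):
    """c(nu) = ||R|| - cc(nu) - |P|."""
    return norm_R(R) - cocharge(nu) - size_P(nu, R)

# Example 5.1: R = ((2), (2,2), (1,1)) and nu = ((2), (2,1), (1)).
R = [(2, 1), (2, 2), (1, 2)]
nu = [[2], [2, 1], [1]]
```

With these inputs `cocharge(nu)`, `norm_R(R)`, `size_P(nu, R)` and `charge(nu, R)` evaluate to 3, 5, 1 and 1, the values quoted in Example 5.1, and the only nonzero vacancy number is `vacancy(nu, R, 1, 2) == 1`.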
Expression (5.3) can be reformulated as the generating function over rigged configurations. To this end we need to define certain labelings of the rows of the partitions in a configuration. For this purpose one should view a partition as a multiset of positive integers. A rigged partition is by definition a finite multiset of pairs $(i,x)$, where $i$ is a positive integer and $x$ is a nonnegative integer. The pairs $(i,x)$ are referred to as strings; $i$ is referred to as the length of the string and $x$ as the label or quantum number of the string. A rigged partition is said to be a rigging of the partition $\rho$ if the multiset consisting of the lengths of the strings is the partition $\rho$. So a rigging of $\rho$ is a labeling of the parts of $\rho$ by nonnegative integers, where one identifies labelings that differ only by permuting labels among equal-sized parts of $\rho$. A rigging $J$ of the $(\lambda;R)$-configuration $\nu$ is a sequence of riggings of the partitions $\nu^{(k)}$ such that for every part of $\nu^{(k)}$ of length $i$ and label $x$,
$$0 \le x \le P_i^{(k)}(\nu). \tag{5.4}$$
The pair $(\nu,J)$ is called a rigged configuration. The set of riggings of admissible $(\lambda;R)$-configurations is denoted by $RC(\lambda;R)$. Let $(\nu,J)^{(k)}$ be the $k$th rigged partition of $(\nu,J)$. A string $(i,x) \in (\nu,J)^{(k)}$ is said to be singular if $x = P_i^{(k)}(\nu)$, that is, its label takes on the maximum value.

Observe that the definition of the set $RC(\lambda;R)$ is completely insensitive to the order of the rectangles in the sequence $R$. However the notation involving the sequence $R$ is useful when discussing the bijection between LR tableaux and rigged configurations, since the ordering on $R$ is essential in the definition of LR tableaux.

Define the cocharge and charge of $(\nu,J) \in RC(\lambda;R)$ by
$$cc(\nu,J) = cc(\nu) + |J|, \qquad c(\nu,J) = c(\nu) + |J|, \qquad |J| = \sum_{k,i\ge 1} |J_i^{(k)}|,$$
where $J_i^{(k)}$ is the partition inside the rectangle of height $m_i(\nu^{(k)})$ and width $P_i^{(k)}(\nu)$ given by the labels of the parts of $\nu^{(k)}$ of size $i$.

Since the $q$-binomial $\begin{bmatrix} m+p \\ m \end{bmatrix}$ is the generating function of partitions with at most $m$ parts each not exceeding $p$ [1, Theorem 3.1], Theorem 5.2 is equivalent to the following theorem.

Theorem 5.3. For $\lambda$ a partition and $R$ a sequence of rectangles
$$K_{\lambda R}(q) = \sum_{(\nu,J)\in RC(\lambda;R)} q^{c(\nu,J)}. \tag{5.5}$$
5.2. Switching between quantum and coquantum numbers. Let $\theta_R : RC(\lambda;R) \to RC(\lambda;R)$ be the involution that complements quantum numbers. More precisely, for $(\nu,J) \in RC(\lambda;R)$, replace every string $(i,x) \in (\nu,J)^{(k)}$ by $(i, P_i^{(k)}(\nu) - x)$. The notation here differs from that in [25], in which $\theta_R$ is an involution on $RC(\lambda^t;R^t)$.

Lemma 5.4. $c(\theta_R(\nu,J)) = \|R\| - cc(\nu,J)$ for all $(\nu,J) \in RC(\lambda;R)$.

Proof. Let $\theta_R(\nu,J) = (\nu',J')$. It follows immediately from the definitions that $\nu' = \nu$. In particular $\nu'$ and $\nu$ have the same vacancy numbers and $|J'| = |P| - |J|$. Then
$$c(\theta_R(\nu,J)) = c(\nu',J') = \|R\| - cc(\nu') - |P| + |J'| = \|R\| - cc(\nu) - |J| = \|R\| - cc(\nu,J). \qquad\square$$

There is a bijection $\mathrm{tr}_{RC} : RC(\lambda;R) \to RC(\lambda^t;R^t)$ that has the property
$$cc(\mathrm{tr}_{RC}(\nu,J)) = \|R\| - cc(\nu,J) \tag{5.6}$$
for all $(\nu,J) \in RC(\lambda;R)$; see the proof of [26, Prop. 11].
5.3. RC's and level-restriction. Here we introduce the most important new definition in this paper, namely, that of a level-restricted rigged configuration. Say that a partition $\lambda$ is restricted of level $\ell$ if $\lambda_1 - \lambda_n \le \ell$, recalling that it is assumed that all partitions have at most $n$ parts, some of which may be zero. Fix a shape $\lambda$ and a sequence of rectangles $R$ that are all restricted of level $\ell$. Define $\ell' = \ell - (\lambda_1 - \lambda_n)$, which is nonnegative by assumption. Set $\lambda' = (\lambda_1 - \lambda_n, \ldots, \lambda_{n-1} - \lambda_n)^t$ and denote the set of all column-strict tableaux of shape $\lambda'$ over the alphabet $\{1, 2, \ldots, \lambda_1 - \lambda_n\}$ by $CST(\lambda')$. Define a table of modified vacancy numbers depending on $\nu \in C(\lambda;R)$ and $t \in CST(\lambda')$ by
$$P_i^{(k)}(\nu,t) = P_i^{(k)}(\nu) - \sum_{j=1}^{\lambda_k-\lambda_n} \chi(i \ge \ell' + t_{j,k}) + \sum_{j=1}^{\lambda_{k+1}-\lambda_n} \chi(i \ge \ell' + t_{j,k+1}) \tag{5.7}$$
for all $i, k \ge 1$, where $\chi(S) = 1$ if the statement $S$ is true and $\chi(S) = 0$ otherwise, and $t_{j,k}$ is the $(j,k)$th entry of $t$. Finally let $x_i^{(k)}$ be the largest part of the partition $J_i^{(k)}$; if $J_i^{(k)}$ is the empty set, $x_i^{(k)} = 0$.

Definition 5.5. Say that $(\nu,J) \in RC(\lambda;R)$ is restricted of level $\ell$ provided that
(1) $\nu_1^{(k)} \le \ell$ for all $k$.
(2) There exists a tableau $t \in CST(\lambda')$ such that for every $i, k \ge 1$,
$$x_i^{(k)} \le P_i^{(k)}(\nu,t).$$
Let $C^\ell(\lambda;R)$ be the set of all $\nu \in C(\lambda;R)$ such that the first condition holds, and denote by $RC^\ell(\lambda;R)$ the set of $(\nu,J) \in RC(\lambda;R)$ that are restricted of level $\ell$.
Note in particular that the second condition requires that $P_i^{(k)}(\nu,t) \ge 0$ for all $i, k \ge 1$.

Example 5.6. Let us consider Definition 5.5 for two classes of shapes $\lambda$ more closely:
(1) Vacuum case: Let $\lambda = (a^n)$ be rectangular with $n$ rows. Then $\lambda' = \emptyset$ and $P_i^{(k)}(\nu,\emptyset) = P_i^{(k)}(\nu)$ for all $i, k \ge 1$, so that the modified vacancy numbers are equal to the vacancy numbers.
(2) Two-corner case: Let $\lambda = (a^\alpha, b^\beta)$ with $\alpha + \beta = n$ and $a > b$. Then $\lambda' = (\alpha^{a-b})$ and there is only one tableau $t$ in $CST(\lambda')$, namely the Yamanouchi tableau of shape $\lambda'$. Since $t_{j,k} = j$ for $1 \le k \le \alpha$ we find that
$$P_i^{(k)}(\nu,t) = P_i^{(k)}(\nu) - \delta_{k,\alpha} \max\{i - \ell', 0\}$$
for $1 \le i \le \ell$ and $1 \le k < n$. We wish to thank Anatol Kirillov for communicating this formula to us [27].

Our main result is the following formula for the level-restricted generalized Kostka polynomial:

Theorem 5.7. Let $\ell$ be a positive integer. For $\lambda$ a partition and $R$ a sequence of rectangles both restricted of level $\ell$,
$$K^\ell_{\lambda R}(q) = \sum_{(\nu,J)\in RC^\ell(\lambda;R)} q^{c(\nu,J)}.$$
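The two-corner formula of Example 5.6 can be verified directly from (5.7). The sketch below is an illustration under stated assumptions: a tableau is passed as a list of its columns, and the function names (`correction`, `yamanouchi_columns`, `check_two_corner`) are hypothetical. It evaluates the two indicator sums in (5.7) for the Yamanouchi tableau and compares the result with $-\delta_{k,\alpha}\max\{i-\ell',0\}$.

```python
def correction(i, k, t_columns, lprime):
    """P_i^(k)(nu,t) - P_i^(k)(nu) as in (5.7); t_columns[c-1] lists column c of t."""
    def col(c):
        return t_columns[c - 1] if 1 <= c <= len(t_columns) else []
    minus = sum(1 for entry in col(k) if i >= lprime + entry)
    plus = sum(1 for entry in col(k + 1) if i >= lprime + entry)
    return plus - minus

def yamanouchi_columns(a, b, alpha):
    """Columns of the Yamanouchi tableau of shape (alpha^(a-b)): row j is filled with j."""
    return [list(range(1, a - b + 1)) for _ in range(alpha)]

def check_two_corner(a, b, alpha, beta, level):
    """Compare (5.7) with -delta_{k,alpha} max(i - l', 0) for 1 <= i <= level, 1 <= k < n."""
    lprime = level - (a - b)      # l' = l - (lambda_1 - lambda_n)
    n = alpha + beta
    t = yamanouchi_columns(a, b, alpha)
    for i in range(1, level + 1):
        for k in range(1, n):
            expected = -max(i - lprime, 0) if k == alpha else 0
            if correction(i, k, t, lprime) != expected:
                return False
    return True

print(check_two_corner(a=5, b=2, alpha=2, beta=3, level=4))
```

The check requires `level >= a - b`, which is the condition that $\lambda$ is restricted of level $\ell$.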
The proof of this theorem is given in Sect. 8.

Example 5.8. Consider $n = 3$, $\ell = 2$, $\lambda = (3,2,1)$ and $R = ((2),(1)^4)$. Then the two configurations
$$\nu = ((1,1,1),(1)) \qquad\text{and}\qquad \nu = ((2,1),(1)) \tag{5.8}$$
are in $C^\ell(\lambda;R)$. For $\nu = ((1,1,1),(1))$ the vacancy numbers are $0, 0, 0$ for the three parts of $\nu^{(1)}$ and $1$ for the part of $\nu^{(2)}$; for $\nu = ((2,1),(1))$ they are $1$ and $2$ for the parts of size 2 and 1 of $\nu^{(1)}$, and $0$ for the part of $\nu^{(2)}$. The set $CST(\lambda')$ consists of the two elements
$$\begin{array}{cc} 1 & 1 \\ 2 & \end{array} \qquad\text{and}\qquad \begin{array}{cc} 1 & 2 \\ 2 & \end{array}.$$
Since $\ell' = 0$, the three rigged configurations

- $\nu = ((1,1,1),(1))$ with all riggings equal to 0,
- $\nu = ((2,1),(1))$ with all riggings equal to 0,
- $\nu = ((2,1),(1))$ with rigging 1 on the part of size 1 of $\nu^{(1)}$ and rigging 0 on the other parts

are restricted of level 2 with charges 2, 3, 4, respectively. Hence $K^2_{\lambda R}(q) = q^2 + q^3 + q^4$.

In contrast to this, the Kostka polynomial $K_{\lambda\mu}(q)$ is obtained by summing over both configurations in (5.8) with all possible riggings below the vacancy numbers. This amounts to $K_{\lambda\mu}(q) = q^2 + 2q^3 + 2q^4 + 2q^5 + q^6$.

In Sect. 7 we will use Theorem 5.7 to obtain explicit expressions for type A branching functions. The results suggest that it is also useful to consider the following sets of rigged configurations with imposed minima on the set of riggings. Let $\rho \subset \lambda$ be a partition and $R_\rho = ((1^{\rho^t_1}), (1^{\rho^t_2}), \ldots, (1^{\rho^t_n}))$, the sequence of single columns of height $\rho^t_i$. Set $\rho' = (\rho_1 - \rho_n, \ldots, \rho_{n-1} - \rho_n)^t$ and
$$M_i^{(k)}(t) = \sum_{j=1}^{\rho_k-\rho_n} \chi(i \le \rho_1 - \rho_n - t_{j,k}) - \sum_{j=1}^{\rho_{k+1}-\rho_n} \chi(i \le \rho_1 - \rho_n - t_{j,k+1})$$
for all $t \in CST(\rho')$. Then define $RC^\ell(\lambda,\rho;R)$ to be the set of all $(\nu,J) \in RC^\ell(\lambda; R_\rho \cup R)$ such that there exists a $t \in CST(\rho')$ such that $M_i^{(k)}(t) \le x$ for $(i,x) \in (\nu,J)^{(k)}$ and $M_i^{(k)}(t) \le P_i^{(k)}(\nu)$ for all $i, k \ge 1$. Note that the second condition is redundant if $i$ occurs as a part in $\nu^{(k)}$, since by definition $M_i^{(k)}(t) \le x \le P_i^{(k)}(\nu)$ for all $(i,x) \in (\nu,J)^{(k)}$. Conjecture 8.3 asserts that the set $RC^\ell(\lambda,\rho;R)$ corresponds to the set of all level-$\ell$ restricted Littlewood–Richardson tableaux with a fixed subtableau of shape $\rho$.
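The unrestricted Kostka polynomial quoted in Example 5.8 can be reproduced from the fermionic formula (5.3). The following is a sketch under explicit assumptions, since the relevant definitions precede this section: the size constraint is taken as $|\nu^{(k)}| = \sum_{j>k}\lambda_j - \sum_a \mu_a \max\{\eta_a - k, 0\}$, the vacancy numbers as $P_i^{(k)}(\nu) = Q_i(\nu^{(k-1)}) - 2Q_i(\nu^{(k)}) + Q_i(\nu^{(k+1)}) + Q_i(\xi^{(k)}(R))$ with $Q_i(\rho)$ the number of cells in the first $i$ columns of $\rho$, and the cocharge as $cc(\nu) = \sum_{k,i} \alpha_i^{(k)}(\alpha_i^{(k)} - \alpha_i^{(k+1)})$. Rectangles are passed as hypothetical `(height, width)` pairs, and all function names are illustrative.

```python
from itertools import product

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def Q(i, rho):
    """Number of cells in the first i columns of the partition rho."""
    return sum(min(i, part) for part in rho)

def poly_add(a, b):
    size = max(len(a), len(b))
    return [(a[e] if e < len(a) else 0) + (b[e] if e < len(b) else 0) for e in range(size)]

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def q_binomial(m, p):
    """Coefficients of [m+p choose m]; the empty list stands for zero."""
    if m < 0 or p < 0:
        return []
    if m == 0 or p == 0:
        return [1]
    a = q_binomial(m, p - 1)
    b = [0] * p + q_binomial(m - 1, p)
    return [(a[e] if e < len(a) else 0) + b[e] for e in range(len(b))]

def kostka_fermionic(lam, rects):
    """Right-hand side of (5.3); rects is a list of (height, width) pairs."""
    n = len(lam)
    norm_R = sum(min(h1, h2) * min(w1, w2)
                 for a, (h1, w1) in enumerate(rects) for (h2, w2) in rects[a + 1:])
    sizes = [sum(lam[k:]) - sum(w * max(h - k, 0) for h, w in rects) for k in range(1, n)]
    xi = {k: tuple(w for h, w in rects if h == k) for k in range(1, n)}
    total = []
    for middle in product(*(list(partitions(s)) for s in sizes)):
        nu = ((),) + middle + ((),)           # nu[0] and nu[n] are empty
        imax = max([1] + [part for rho in nu for part in rho] + [w for _, w in rects])

        def P(i, k):                          # assumed vacancy-number formula
            return Q(i, nu[k - 1]) - 2 * Q(i, nu[k]) + Q(i, nu[k + 1]) + Q(i, xi[k])

        cells = [(i, k) for k in range(1, n) for i in range(1, imax + 1)]
        if any(P(i, k) < 0 for i, k in cells):
            continue                          # configuration is not admissible
        alpha = lambda i, k: sum(1 for part in nu[k] if part >= i)
        mult = lambda i, k: nu[k].count(i)
        cc = sum(alpha(i, k) * (alpha(i, k) - alpha(i, k + 1)) for i, k in cells)
        abs_P = sum(mult(i, k) * P(i, k) for i, k in cells)
        term = [0] * (norm_R - cc - abs_P) + [1]      # q^{c(nu)}
        for i, k in cells:
            term = poly_mul(term, q_binomial(mult(i, k), P(i, k)))
        total = poly_add(total, term)
    return total

# Example 5.8: lambda = (3,2,1), R = ((2),(1)^4)
print(kostka_fermionic((3, 2, 1), [(1, 2)] + [(1, 1)] * 4))
```

For the data of Example 5.8 the list of coefficients corresponds to $q^2 + 2q^3 + 2q^4 + 2q^5 + q^6$; setting $q = 1$ gives 8, the number of semistandard tableaux of shape $(3,2,1)$ and content $(2,1,1,1,1)$.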
6. Fermionic Expression of Level-Restricted Generalized Kostka Polynomials

6.1. Fermionic expression. Similarly to the Kostka polynomial case, one can rewrite the expression of the level-restricted generalized Kostka polynomials of Theorem 5.7 in fermionic form.

Lemma 6.1. For all $\nu \in C^\ell(\lambda;R)$, $t \in CST(\lambda')$ and $1 \le k < n$, we have $P_i^{(k)}(\nu,t) = 0$ for $i \ge \ell$.

Proof. Since $\nu_1^{(k)} \le \ell$ it follows from [26, (11.2)] that $P_i^{(k)}(\nu) = \lambda_k - \lambda_{k+1}$ for $i \ge \ell$. Since $t$ is over the alphabet $\{1, 2, \ldots, \lambda_1 - \lambda_n\}$, every entry satisfies $\ell' + t_{j,k} \le \ell$, so each indicator below equals 1 and for $i \ge \ell$,
$$P_i^{(k)}(\nu,t) = P_i^{(k)}(\nu) - \sum_{j=1}^{\lambda_k-\lambda_n} \chi(i \ge \ell' + t_{j,k}) + \sum_{j=1}^{\lambda_{k+1}-\lambda_n} \chi(i \ge \ell' + t_{j,k+1}) = \lambda_k - \lambda_{k+1} - (\lambda_k - \lambda_n) + (\lambda_{k+1} - \lambda_n) = 0. \qquad\square$$
Let $SCST(\lambda')$ be the set of all nonempty subsets of $CST(\lambda')$. Furthermore set $P_i^{(k)}(\nu,S) = \min\{P_i^{(k)}(\nu,t) \mid t \in S\}$ for $S \in SCST(\lambda')$. Then by inclusion-exclusion the set of allowed riggings for a given configuration $\nu \in C^\ell(\lambda;R)$ is given by
$$\sum_{S\in SCST(\lambda')} (-1)^{|S|+1} \{J \mid x_i^{(k)} \le P_i^{(k)}(\nu,S)\}.$$
Since the $q$-binomial $\begin{bmatrix} m+p \\ m \end{bmatrix}$ is the generating function of partitions with at most $m$ parts each not exceeding $p$ and since $P_\ell^{(k)}(\nu,S) = 0$ by Lemma 6.1, the level-$\ell$ restricted generalized Kostka polynomial has the following fermionic form.

Theorem 6.2.
$$K^\ell_{\lambda R}(q) = \sum_{S\in SCST(\lambda')} (-1)^{|S|+1} \sum_{\nu\in C^\ell(\lambda;R)} q^{c(\nu)} \prod_{i=1}^{\ell-1} \prod_{k=1}^{n-1} \begin{bmatrix} m_i(\nu^{(k)}) + P_i^{(k)}(\nu,S) \\ m_i(\nu^{(k)}) \end{bmatrix}.$$
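As an illustration of the inclusion-exclusion in Theorem 6.2, the sketch below evaluates the formula for the data of Example 5.8 ($n = 3$, $\ell = 2$, $\lambda = (3,2,1)$). The charges, multiplicities and vacancy numbers of the two configurations are entered by hand; they were recomputed from the definitions for this sketch and should be read as assumptions, as should the function names and the encoding of tableaux as lists of columns.

```python
from itertools import combinations

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def q_binomial(m, p):
    """Coefficients of [m+p choose m]; the empty list stands for zero."""
    if m < 0 or p < 0:
        return []
    if m == 0 or p == 0:
        return [1]
    a = q_binomial(m, p - 1)
    b = [0] * p + q_binomial(m - 1, p)
    return [(a[e] if e < len(a) else 0) + b[e] for e in range(len(b))]

def modified_vacancy(P, i, k, t_columns, lprime):
    """P_i^(k)(nu,t) from (5.7); t_columns[c-1] lists column c of the tableau t."""
    def col(c):
        return t_columns[c - 1] if 1 <= c <= len(t_columns) else []
    minus = sum(1 for entry in col(k) if i >= lprime + entry)
    plus = sum(1 for entry in col(k + 1) if i >= lprime + entry)
    return P[(i, k)] - minus + plus

def restricted_kostka(configs, tableaux, lam, level):
    """Theorem 6.2: signed sum over nonempty subsets S of the tableaux."""
    n = len(lam)
    lprime = level - (lam[0] - lam[-1])
    total = {}
    for r in range(1, len(tableaux) + 1):
        for S in combinations(tableaux, r):
            sign = (-1) ** (r + 1)
            for charge, mult, P in configs:
                term = [0] * charge + [1]
                for i in range(1, level):
                    for k in range(1, n):
                        p_S = min(modified_vacancy(P, i, k, t, lprime) for t in S)
                        term = poly_mul(term, q_binomial(mult.get((i, k), 0), p_S))
                for e, coeff in enumerate(term):
                    total[e] = total.get(e, 0) + sign * coeff
    return {e: v for e, v in sorted(total.items()) if v}

# Data for Example 5.8, keyed by (i, k); only i < level is needed.
configs = [
    (2, {(1, 1): 3, (1, 2): 1}, {(1, 1): 0, (1, 2): 1}),   # nu = ((1,1,1),(1))
    (3, {(1, 1): 1, (1, 2): 1}, {(1, 1): 2, (1, 2): 0}),   # nu = ((2,1),(1))
]
tableaux = [[[1, 2], [1]], [[1, 2], [2]]]                  # columns of the two tableaux
print(restricted_kostka(configs, tableaux, (3, 2, 1), 2))
```

The result encodes $q^2 + q^3 + q^4$: the subset containing both tableaux contributes nothing here, because for each configuration one of the minima is negative and the corresponding $q$-binomial vanishes.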
In Sect. 7 we will derive new expressions for branching functions of type A as limits of the level-restricted generalized Kostka polynomials. To this end we need to reformulate the fermionic formula of Theorem 6.2 in terms of a so-called $(m,n)$-system. Set
$$m_i^{(a)} = P_i^{(a)}(\nu,S) = P_i^{(a)}(\nu) + f_i^{(a)}(S), \qquad n_i^{(a)} = m_i(\nu^{(a)}),$$
and $L_i^{(a)} = \sum_{j=1}^{L} \chi(i = \mu_j)\chi(a = \eta_j)$ for $1 \le i \le \ell$ and $1 \le a \le n$, which is the number of rectangles in $R$ of shape $(i^a)$. Then
$$\begin{aligned}
&-m_{i-1}^{(a)} + 2m_i^{(a)} - m_{i+1}^{(a)} - n_i^{(a-1)} + 2n_i^{(a)} - n_i^{(a+1)} \\
&\quad= \bigl(\alpha_i^{(a-1)} - 2\alpha_i^{(a)} + \alpha_i^{(a+1)}\bigr) - \bigl(\alpha_{i+1}^{(a-1)} - 2\alpha_{i+1}^{(a)} + \alpha_{i+1}^{(a+1)}\bigr) \\
&\qquad+ \sum_{k=1}^{L} \delta_{a,\eta_k}\bigl(-\min\{i-1,\mu_k\} + 2\min\{i,\mu_k\} - \min\{i+1,\mu_k\}\bigr) \\
&\qquad- f_{i-1}^{(a)}(S) + 2f_i^{(a)}(S) - f_{i+1}^{(a)}(S) \\
&\qquad- \bigl(\alpha_i^{(a-1)} - \alpha_{i+1}^{(a-1)}\bigr) + 2\bigl(\alpha_i^{(a)} - \alpha_{i+1}^{(a)}\bigr) - \bigl(\alpha_i^{(a+1)} - \alpha_{i+1}^{(a+1)}\bigr) \\
&\quad= L_i^{(a)} - f_{i-1}^{(a)}(S) + 2f_i^{(a)}(S) - f_{i+1}^{(a)}(S).
\end{aligned}$$
At this stage it is convenient to introduce vector notation. For a matrix $v_i^{(a)}$ with indices $1 \le i \le \ell - 1$ and $1 \le a \le n - 1$ define
$$v = \sum_{i=1}^{\ell-1} \sum_{a=1}^{n-1} v_i^{(a)}\, e_i \otimes e_a,$$
where $e_i$ and $e_a$ are the canonical basis vectors of $\mathbb{Z}^{\ell-1}$ and $\mathbb{Z}^{n-1}$, respectively. Define
$$u_i^{(a)}(S) = -f_{i-1}^{(a)}(S) + 2f_i^{(a)}(S) - f_{i+1}^{(a)}(S),$$
which in vector notation reads
$$u(S) = (C \otimes I) f(S) + \sum_{a=1}^{n-1} (\lambda_a - \lambda_{a+1})\, e_{\ell-1} \otimes e_a, \tag{6.1}$$
where $C$ is the Cartan matrix of type A and $I$ is the identity matrix. Since $n_i^{(0)} = n_i^{(n)} = m_0^{(k)} = 0$ and $m_\ell^{(k)} = 0$ by Lemma 6.1 it follows that
$$(C \otimes I)m + (I \otimes C)n = L + u(S). \tag{6.2}$$
In terms of the new variables the condition (5.1) on $|\nu^{(a)}|$ becomes
$$n_\ell^{(a)} = -e_{\ell-1} \otimes e_a\,(C^{-1} \otimes I)\,n - \frac{1}{\ell} \sum_{j=1}^{a} \lambda_j + \frac{1}{\ell} \sum_{i=1}^{\ell} \sum_{b=1}^{n} i \min\{a,b\}\, L_i^{(b)}, \tag{6.3}$$
where we used $C^{-1}_{ij} = \min\{i,j\} - ij/\ell$ if $C$ is $(\ell-1)\times(\ell-1)$-dimensional and $\sum_{b=1}^{n} \sum_{i=1}^{\ell} i b L_i^{(b)} = |\lambda|$.

Lemma 6.3. In terms of the above $(m,n)$-system
$$c(\nu) = \frac{1}{2}\, m(C \otimes C^{-1})m - m(I \otimes C^{-1})u(S) + \frac{1}{2}\, u(S)(C^{-1} \otimes C^{-1})u(S) + g(R,\lambda), \tag{6.4}$$
where
$$g(R,\lambda) = \|R\| - \frac{1}{2} \sum_{a,b=1}^{n-1} \sum_{j=1}^{\ell} C^{-1}_{ab}\, L_j^{(a)} \tilde L_j^{(b)} + \frac{1}{2\ell} \sum_{j=1}^{n} \Bigl(\lambda_j - \frac{1}{n}|\lambda|\Bigr)^2$$
and $\tilde L_i^{(a)} = \sum_{j=1}^{\ell} \min\{i,j\}\, L_j^{(a)}$.
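The closed form $C^{-1}_{ij} = \min\{i,j\} - ij/\ell$ used in (6.3) is easy to confirm with exact rational arithmetic. The following is a small sketch; the function names are hypothetical and the matrix helpers are written out to keep the block self-contained.

```python
from fractions import Fraction

def cartan_A(r):
    """Cartan matrix of type A_r: 2 on the diagonal, -1 next to it."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(r)]
            for i in range(r)]

def claimed_inverse(level):
    """Matrix with entries min(i,j) - i*j/level for 1 <= i, j <= level - 1."""
    r = level - 1
    return [[Fraction(min(i, j)) - Fraction(i * j, level) for j in range(1, r + 1)]
            for i in range(1, r + 1)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_identity(M):
    return all(M[i][j] == (1 if i == j else 0)
               for i in range(len(M)) for j in range(len(M)))

for level in range(2, 9):
    assert is_identity(mat_mul(cartan_A(level - 1), claimed_inverse(level)))

# The last row of the inverse is (1/level, 2/level, ...), which is how (6.3) arises.
print(claimed_inverse(5)[-1])
```

The last row has entries $i/\ell$, so $e_{\ell-1} \otimes e_a\,(C^{-1} \otimes I)\,n = \frac{1}{\ell}\sum_{i=1}^{\ell-1} i\, n_i^{(a)}$, which is the term that appears when (5.1) is solved for $n_\ell^{(a)}$.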
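Proof. By definition $c(\nu) = \|R\| - cc(\nu) - |P|$. Note that
$$\begin{aligned}
|P| &= \sum_{i=1}^{\ell} \sum_{k=1}^{n-1} m_i(\nu^{(k)})\, P_i^{(k)}(\nu) \\
&= \sum_{i=1}^{\ell} \sum_{k=1}^{n-1} \bigl(\alpha_i^{(k)} - \alpha_{i+1}^{(k)}\bigr) \Bigl( \sum_{j=1}^{i} \bigl(\alpha_j^{(k-1)} - 2\alpha_j^{(k)} + \alpha_j^{(k+1)}\bigr) + \tilde L_i^{(k)} \Bigr) \\
&= -2cc(\nu) + \sum_{i=1}^{\ell} \sum_{k=1}^{n-1} n_i^{(k)} \tilde L_i^{(k)}.
\end{aligned}$$
Hence eliminating $cc(\nu)$ in favor of $|P|$ yields
$$c(\nu) = \|R\| - \frac{1}{2}|P| - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{k=1}^{n-1} n_i^{(k)} \tilde L_i^{(k)}.$$
On the other hand, using $n_i^{(k)} = m_i(\nu^{(k)})$ and $P_\ell^{(k)}(\nu) = \lambda_k - \lambda_{k+1}$,
$$|P| = n(I \otimes I)P(\nu) + \sum_{k=1}^{n-1} n_\ell^{(k)} (\lambda_k - \lambda_{k+1})$$
so that
$$c(\nu) = \|R\| - \frac{1}{2}\, n(I \otimes I)\bigl(P(\nu) + \tilde L\bigr) - \frac{1}{2} \sum_{k=1}^{n-1} n_\ell^{(k)} \bigl(\lambda_k - \lambda_{k+1} + \tilde L_\ell^{(k)}\bigr). \tag{6.5}$$
Eliminating $n$ in favor of $m$ using (6.2) and substituting $P(\nu) = m - f(S)$ yields
$$-\frac{1}{2}\, n(I \otimes I)\bigl(P(\nu) + \tilde L\bigr) = \frac{1}{2}\, m\bigl\{C \otimes C^{-1}(m + \tilde L - f(S)) - I \otimes C^{-1}(L + u(S))\bigr\} - \frac{1}{2}\,(L + u(S))(I \otimes C^{-1})(\tilde L - f(S)).$$
Similarly, replacing $n$ by $m$ in (6.3) we obtain
$$n_\ell^{(a)} = e_{\ell-1} \otimes e_a \bigl(I \otimes C^{-1} m - C^{-1} \otimes C^{-1} u(S)\bigr) - \frac{1}{\ell} \sum_{j=1}^{a} \Bigl(\lambda_j - \frac{1}{n}|\lambda|\Bigr) + \sum_{b=1}^{n-1} C^{-1}_{ab} L_\ell^{(b)}. \tag{6.6}$$
Inserting these equations into (6.5), trading $f(S)$ for $u(S)$ by (6.1) and using
$$(C \otimes I)\tilde L - L - \sum_{a=1}^{n-1} e_{\ell-1} \otimes e_a\, \tilde L_\ell^{(a)} = 0$$
results in the claim of the lemma. $\square$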
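As a corollary of Lemma 6.3 and Theorem 6.2 we obtain the following expression for the level-restricted generalized Kostka polynomial
$$K^\ell_{\lambda R}(q) = q^{g(R,\lambda)} \sum_{S\in SCST(\lambda')} (-1)^{|S|+1} q^{\frac{1}{2} u(S) C^{-1}\otimes C^{-1} u(S)} \sum_{m} q^{\frac{1}{2} m C\otimes C^{-1} m - m I\otimes C^{-1} u(S)} \begin{bmatrix} m+n \\ m \end{bmatrix}, \tag{6.7}$$
where $n$ is determined by (6.2), the sum over $m$ is such that
$$e_{\ell-1} \otimes e_a \bigl(I \otimes C^{-1} m - C^{-1} \otimes C^{-1} u(S)\bigr) - \frac{1}{\ell} \sum_{j=1}^{a} \Bigl(\lambda_j - \frac{1}{n}|\lambda|\Bigr) + \sum_{b=1}^{n-1} C^{-1}_{ab} L_\ell^{(b)} \in \mathbb{Z}$$
for all $1 \le a \le n-1$, and
$$\begin{bmatrix} m+n \\ m \end{bmatrix} = \prod_{i=1}^{\ell-1} \prod_{k=1}^{n-1} \begin{bmatrix} m_i^{(k)} + n_i^{(k)} \\ m_i^{(k)} \end{bmatrix}.$$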
Now consider the second case of Example 5.6, namely $\lambda = (a^\alpha, b^\beta)$ with $a > b$ and $\alpha + \beta = n$. Then $SCST(\lambda')$ only contains the element $S = \{t\}$, where $t$ is the Yamanouchi tableau of shape $\lambda'$, and $u(S) = e_{\ell'} \otimes e_\alpha$. In the vacuum case, that is, when $\lambda = ((|\lambda|/n)^n)$, the set $SCST(\lambda')$ only contains $S = \{\emptyset\}$ and $u(S) = f(S) = 0$. In this case (6.7) simplifies to
$$K^\ell_{\lambda R}(q) = q^{g(R,\lambda)} \sum_{m} q^{\frac{1}{2} m C\otimes C^{-1} m} \begin{bmatrix} m+n \\ m \end{bmatrix}.$$
When $R$ is a sequence of single boxes this proves [8, Theorem 1].¹ When $R$ is a sequence of single rows or single columns this settles [12, Conjecture 5.7].

6.2. Polynomial Rogers–Ramanujan-type identities. Let $W$ be the Weyl group of $sl_n$, $M = \{\beta \in \mathbb{Z}^n \mid \sum_{i=1}^{n} \beta_i = 0\}$ be the root lattice, $\rho$ the half-sum of the positive roots, and $(\cdot|\cdot)$ the standard symmetric bilinear form. Recall the energy function (3.9). It was shown in [31] that
$$K^\ell_{\lambda R}(q) = \sum_{\tau\in W} \sum_{\beta\in M} \sum_{\substack{b\in P_R \\ \mathrm{wt}(b) = -\rho + \tau^{-1}(\lambda - (\ell+n)\beta + \rho)}} (-1)^{\tau} q^{-\frac{1}{2}(\ell+n)(\beta|\beta) + (\lambda+\rho|\beta) + E(b)}. \tag{6.8}$$
Equating (6.7) and (6.8) gives rise to polynomial Rogers–Ramanujan-type identities. For the vacuum case, that is, when the partition $\lambda$ is rectangular with $n$ rows, this proves [33, Eq. (9.2)].²

¹ We believe that the proof given in [8] is incomplete.
² The definition of level-restricted path as given in [33, p. 394] only works when $R$ (or $\mu$ therein) consists of single rows; otherwise the description of Sect. 3.7 should be used.
7. New Expressions for Type A Branching Functions

The coset branching functions $b^{\Lambda}_{\Lambda'\Lambda''}$ labeled by the three weights $\Lambda$, $\Lambda'$, $\Lambda''$ have a natural finitization in terms of $(\Lambda' + \Lambda'')$-restricted crystals. For certain triples of weights these can be reformulated in terms of level-restricted paths, which in turn yield an expression of the type A branching functions as a limit of the level-restricted generalized Kostka polynomials. Together with the results of the last section this implies new fermionic expressions for type A branching functions at certain triples of weights.
7.1. Branching function in terms of paths. Let , , ∈ Pcl be dominant integral weights of levels *, * , and * respectively, where * = * + * . The branching function b (z) is the formal power series defined by af()−mδ b zm caf( ),af( ) , (z) = m≥0
af()−mδ
where caf( ),af( ) is the multiplicity of the irreducible integrable highest weight n )-module V(af() − mδ) in the tensor product V(af( )) ⊗ V(af( )). Uq (sl n -highest weight vectors of weight The desired multiplicity is equal to the number of sl af()−mδ in the tensor product B(af( ))⊗B(af( )), that is, the number of elements b ⊗b ∈ B(af( ))⊗B(af( )) such that wt(b ⊗b ) = af()−mδ and i (b ⊗b ) = 0 for all i ∈ I . By (3.5), b = u , b is -restricted, and wt(b ) = af( − ) − mδ. Let B be a perfect crystal of level * . Using the isomorphism B( ) ∼ = P( , B) let b = b1 ⊗ b2 ⊗ · · · and b ∈ P( , B) be the ground state path. Suppose N is such that . In type A(1) the period of the ground for all j > N, bj = bj . Write b = b1 ⊗ · · · ⊗ bN n−1 state path b always divides n. Choose N to be a multiple of n, so that b = b ⊗ b and bN+1 = b1 . Then the above desired highest weight vectors have the form b ⊗ b = (b ⊗ u ) ⊗ u ∈ B ⊗N ⊗ B(af( )) ⊗ B(af( )). But there is an embedding B(af( + )) 7→ B(af( )) ⊗ B(af( )) defined by u + → u ⊗ u . With this rephrasing of the conditions on b and taking limits, we have −EN (b1 ⊗···⊗bN ) b zEN (b) , (7.1) (z) = lim z N→∞ N∈nZ
b∈H( + ,B⊗N ,)
where EN : B ⊗N → Z is given by EN (b) = E(b ⊗ bN+1 ) = E(b ⊗ b1 ) and E is the energy function on finite paths. Our goal is to express (7.1) in terms of level-restricted generalized Kostka polynomials. We find that this is possible for certain triples of weights. Using the results of Sect. 6 this provides explicit formulas for the branching functions. 7.2. Reduction to level-restricted paths. The first step in the transformation of (7.1) is to replace the condition of ( + )-restrictedness by level * restrictedness. This is achieved at the cost of appending a fixed inhomogeneous path. Consider any tensor product B of perfect crystals each of which has level at most * (the level of ), such that there is an element y ∈ H(* 0 , B , ). We indicate how such a B and y can be constructed explicitly. Let λ be the partition with strictly
less than n rows with hi , columns of length i for 1 ≤ i ≤ n − 1. Let Yλ be the Yamanouchi tableau of shape λ. Then any factorization (in the plactic monoid) of Yλ into a sequence of rectangular tableaux, yields such a B and y . Example 7.1. Let n = 6, * = 5, = 0 + 22 + 3 + 4 . Then λ = (4, 4, 2, 1) (its transpose is λt = (4, 3, 2, 2)) and 1 1 1 1 Yλ =
2 2 2 2 3 3
.
4 One way is to factorize into single columns: B = B 2,1 ⊗ B 2,1 ⊗ B 3,1 ⊗ B 4,1 and y = y4 ⊗ y3 ⊗ y2 ⊗ y1 , where each yj is an sln highest weight vector, namely, the j th column of Yλ . Another way is to factorize into the minimum number of rectangles by slicing Yλ vertically. This yields B = B 2,2 ⊗ B 3,1 ⊗ B 4,1 ; again the factors of y = y3 ⊗ y2 ⊗ y1 are the sln highest weight vectors, namely,
y3 =
1 1 2 2
1 ,
y2 = 2 , 3
1 2 y1 = . 3 4
Consider also a tensor product B of perfect crystals such that there is an element ∈ H(* 0 , B , ). Then y = y ⊗ y ∈ H(*0 , B ⊗ B , + ). Instead of b ∈ H( + , B ⊗N , ), we work with b ⊗ y, where b ⊗ y is restricted of level *. This trick doesn’t help unless one can recover the correct energy function directly from b ⊗ y. Let p be the first N steps of the ground state path b ∈ P( , B). Define the normalized energy function on B ⊗N by E(b) = E(b ⊗ y ) − E(p ⊗ y ). A priori it depends on , B, and y . The energy function occurring in the branching function is E (b) = E(b ⊗ b1 ) − E(p ⊗ b1 ). y
Lemma 7.2. E = E . Proof. It suffices to show that the function B ⊗N → Z given by b → E(b ⊗ y ) − E(b ⊗ b1 ) is constant. Using the definition (3.9) and the fact that b is homogeneous of length N, we have E(b ⊗ y ) = E(b) + N E(bN ⊗ y ) − (N − 1)E(y ). Similarly E(b ⊗ b1 ) = E(b) + N E(bN ⊗ b1 ). Therefore E(b ⊗ y ) − E(b ⊗ b1 ) = N(E(bN ⊗ y ) − E(bN ⊗ b1 )) − (N − 1)E(y ). Thus it suffices to show that the function B → Z given by b → E(b ⊗ y ) − E(b ⊗ b1 ) is a constant function. Suppose first that i (b ) > hi , for some 1 ≤ i ≤ n − 1. By the construction of y and b1 , φi (y ) = hi , = φi (b1 ) for 1 ≤ i ≤ n − 1, since φ(b1 ) = . Then ei (b ⊗ y ) = ei (b ) ⊗ y and ei (b ⊗ b1 ) = ei (b ) ⊗ b1 by (3.4). Passing from b to ei (b ) repeatedly, the values of the energy functions are constant, so it may be assumed that b ⊗ y is a sln highest weight vector; in particular, i (b ) ≤ hi , for all 1 ≤ i ≤ n − 1.
Next suppose that 0 (b ) > h0 , . Now φ0 (y ) = 0 and φ0 (b1 ) = h0 , . By (3.4) e0 (b ⊗ b1 ) = e0 (b ) ⊗ b1 and e0 (b ⊗ y ) = e0 (b ) ⊗ y . By (3.8) and the fact that the local isomorphism on B ⊗B is the identity, we have E(e0 (b ⊗b1 )) = E(b ⊗b1 )−1. To show that E(e0 (b ⊗y )) = E(b ⊗y )−1 we check the conditions of Lemma 3.2. By (3.1) 0 (y ) = φ0 (y ) − h0 , wt(y ) = 0 − h0 , − * 0 = * − h0 , . Also by (3.5), since φ0 (y ) = 0, we have 0 (b ⊗ y ) = 0 (b ) + 0 (y ) > h0 , + * − h0 , = * . Let z ⊗ x be the image of b ⊗ y under an arbitrary composition of local isomorphisms. Since b ⊗ y is an sln highest weight vector, so is z ⊗ x and x. Now x is the sln -highest weight vector in a perfect crystal of level at most * , so φ0 (x) = 0 and 0 (x) ≤ * . But * < 0 (b ⊗ y ) = 0 (z ⊗ x) = 0 (z) + 0 (x) so that 0 (z) > 0. By (3.4) e0 (z ⊗ x) = e0 (z) ⊗ x. So E(e0 (b ⊗ y )) = E(b ⊗ y ) − 1 by Lemma 3.2. ) ≤ h , . But then ) ≤ (b (b By induction we may now assume that 0 0 i i i hi , , or c , (b ) ≤ c , = * . Since b ∈ B and B is a perfect crystal of level * , b must be the unique element of B such that (b ) = . Thus the function B → Z given by b → E(b ⊗ y ) − E(b ⊗ b1 ) is constant on B if it is constant on the singleton set { −1 ( )}, which it obviously is. " # 7.3. Explicit ground state energy. To go further, an explicit formula for the value E(p ⊗ y ) is required. This is achieved in (7.2). The derivation makes use of the following explicit construction of the local isomorphism. Theorem 7.3. Let B = B k,* be a perfect crystal of level *, , ∈ (Pcl+ )* , B a perfect crystal of level * ≤ *, and b ∈ H( , B , ). Let x ∈ B (resp. y ∈ B) be the unique element such that (x) = (resp. (y) = ). Then under the local isomorphism B ⊗ B ∼ = ψ k (b) ⊗ y. = B ⊗ B, we have x ⊗ b ∼ The proof requires several technical lemmas and is given in the next section. Example 7.4. Let n = 5, * = 4, k = 2, = 0 +1 +3 +4 , = 0 +1 +2 + 4 , * = 2, B = B 2,2 . 
Here the set H( , B , ) consists of two elements, namely, 1 2 4 5
and
1 4 2 5.
Let b be the second tableau. The theorem says that 1 1 2 3 2 3 4 5
⊗
1 1 2 4 1 4 ∼ 1 3 ⊗ = 2 3 5 5. 2 4 2 5
Proposition 7.5. Let ∈ (Pcl+ )* , B = B k,* a perfect crystal of level *, b ∈ P(, B) the ground state path, p a finite path (say of length N , where N is a multiple of n) such that p ⊗ b = b, B the tensor product of perfect crystals each of level at most *, and y ∈ H(*0 , B , ). Let p be the path of length N such that p ⊗ b = b , where b ∈ P(*0 , B) is the ground state path. Then under the composition of local isomorphisms B ⊗N ⊗ B ∼ = y ⊗ p . = B ⊗ B ⊗N we have p ⊗ y ∼ Proof. Induct on the length of the path y. Suppose B = B1 ⊗ B2 and y = y1 ⊗ y2 , where yj ∈ Bj and Bj is a perfect crystal. Let = − wt(y1 ). By the definitions y2 ∈ H(*0 , B2 , ). By induction the first N steps p of the ground state path of
∼ y2 ⊗ p under the composition of local isomorphisms P( , B) satisfy p ⊗ y2 = ⊗N ⊗N ∼ B ⊗ B2 = B2 ⊗ B . Tensoring on the left with y1 , it remains to show that p ⊗ y1 ∼ = y1 ⊗ p under the composition of local isomorphisms B ⊗N ⊗ B1 ∼ = B1 ⊗ B ⊗N . Now ∈ B are the unique elements such that (p ) = and (p ) = . pN ∈ B and pN N N . Now p ⊗ y ∈ H( , B ⊗ Applying Theorem 7.3 we obtain pN ⊗ y1 ∼ = ψ k (y1 ) ⊗ pN N 1 ∈ H( , B ⊗ B, φ(p )). This implies that ψ k (y ) ∈ B1 , φ(pN )) so that ψ k (y1 ) ⊗ pN 1 N 1 ) and (p H(φ(pN ), B1 , φ(pN )). Now by definition (pN−1 ) = φ(pN N−1 ) = φ(pN ). . Continuing in Applying Theorem 7.3 we obtain pN−1 ⊗ ψ k (y1 ) ∼ = ψ 2k (y1 ) ⊗ pN−1 j k (j +1)k ∼ (y1 ) ⊗ pN−j for 0 ≤ j ≤ N − 1. this manner it follows that pN−j ⊗ ψ (y1 ) = ψ Composing these local isomorphisms it follows that p ⊗ y1 ∼ = ψ Nk (y1 ) ⊗ p . But ψ N is the identity since the order of ψ divides n which divides N . Therefore p ⊗ y1 ∼ = y1 ⊗ p under the composition of local isomorphisms and we are done. " # In the notation in the previous section, E(p ⊗ y ) = E(y ⊗ p ), where p is the first N steps of the ground state path of P(* 0 , B). Write N = nM and B = B k,* . Then using the generalized cocyclage one may calculate explicitly the generalized charge of the LR tableau corresponding to the level * restricted (and hence classically restricted) path y ⊗ p . Let |y | denote the total number of cells in the tableaux comprising y . Then kM . (7.2) E(y ⊗ p ) = E(y ) + |y |kM + n* 2 Example 7.6. Let n = 5, * = 3, = 0 + 3 + 4 , k = 2 and M = 1. Then p is the path 4 4 4 5 5 5
⊗
2 2 2 3 3 3
⊗
1 1 1 5 5 5
⊗
3 3 3 4 4 4
⊗
1 1 1 2 2 2.
The element y can be taken to be the tensor product 1
1
2
2⊗ 3
3 4.
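Let $\lambda = (8, 8, 8, 7, 6)$. Then the tableau $Q \in LR(\lambda;R)$ (resp. $Y$) that records the path y ⊗ p (resp. y) is given by
$$Q = \begin{array}{cccccccc}
1 & 1 & 1 & 5 & 5 & 5 & 11 & 15 \\
2 & 2 & 2 & 7 & 7 & 7 & 12 & 16 \\
3 & 3 & 3 & 8 & 8 & 8 & 13 & 17 \\
4 & 4 & 4 & 9 & 9 & 9 & 14 & \\
6 & 6 & 6 & 10 & 10 & 10 & &
\end{array}, \qquad
Y = \begin{array}{cc}
1 & 5 \\
2 & 6 \\
3 & 7 \\
4 &
\end{array}$$
with $R = ((3,3), (3,3), (3,3), (3,3), (3,3), (1,1,1,1), (1,1,1))$ and subalphabets $\{1,2\}$, $\{3,4\}$, $\{5,6\}$, $\{7,8\}$, $\{9,10\}$, $\{11,12,13,14\}$, $\{15,16,17\}$. The generalized charge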
cR (Q) is equal to the energy E(y ⊗ p ) [37, Theorem 23]. Here the widest rectangle in the path is of width * . For any tableau T ∈ LR(ρ; R) for some partition ρ, define V (T ) = P ((w0R Te )(w0R Tw )), where P is the Schensted P tableau, w0R is the automorphism of conjugation that reverses each of the subalphabets, and Tw and Te are the west and east subtableaux obtained by slicing T between the * th and (* + 1)th columns. It can be shown that there is a composition of |Te | generalized R-cocyclages leading from T to V (T ), where |Te | denotes the number of cells in Te . It follows from the ideas in [35, Sect. 3] and the intrinsic characterization of cR in [35, Theorem 21] that cR (T ) = cR (V (T )) + |Te |.
(7.3)
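For the above tableau $Q$ we have
$$Q_w = \begin{array}{ccc}
1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \\ 4 & 4 & 4 \\ 6 & 6 & 6
\end{array}, \qquad
w_0^R Q_w = \begin{array}{ccc}
1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \\ 4 & 4 & 4 \\ 5 & 5 & 5
\end{array}$$
and
$$Q_e = \begin{array}{ccccc}
5 & 5 & 5 & 11 & 15 \\ 7 & 7 & 7 & 12 & 16 \\ 8 & 8 & 8 & 13 & 17 \\ 9 & 9 & 9 & 14 & \\ 10 & 10 & 10 & &
\end{array}, \qquad
w_0^R Q_e = \begin{array}{ccccc}
6 & 6 & 6 & 11 & 15 \\ 7 & 7 & 7 & 12 & 16 \\ 8 & 8 & 8 & 13 & 17 \\ 9 & 9 & 9 & 14 & \\ 10 & 10 & 10 & &
\end{array}.$$
Then
$$V(Q) = \begin{array}{ccccc}
1 & 1 & 1 & 11 & 15 \\ 2 & 2 & 2 & 12 & 16 \\ 3 & 3 & 3 & 13 & 17 \\ 4 & 4 & 4 & 14 & \\ 5 & 5 & 5 & & \\ 6 & 6 & 6 & & \\ 7 & 7 & 7 & & \\ 8 & 8 & 8 & & \\ 9 & 9 & 9 & & \\ 10 & 10 & 10 & &
\end{array}
\qquad\text{and}\qquad
V(V(Q)) = \begin{array}{ccc}
1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \\ 4 & 4 & 4 \\ 5 & 5 & 5 \\ 6 & 6 & 6 \\ 7 & 7 & 7 \\ 8 & 8 & 8 \\ 9 & 9 & 9 \\ 10 & 10 & 10 \\ 11 & 15 & \\ 12 & 16 & \\ 13 & 17 & \\ 14 & &
\end{array}.$$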
We have $c_R(V(V(Q))) = c_R(Y) = E(y')$ by [35, Theorem 21] and $c_R(Q) = c_R(V(Q)) + |Q_e| = c_R(V(Q)) + \ell' n + |Y|$, and $c_R(V(Q)) = c_R(V(V(Q))) + |Y|$ by (7.3). This implies $c_R(Q) = \ell' n + E(y') + 2|Y|$.

7.4. Proof of Theorem 7.3. The proof of Theorem 7.3 requires several lemmas. Words of length $L$ in the alphabet $\{1, 2, \ldots, n\}$ are identified with the elements of the crystal basis of the $L$-fold tensor product $(B^{1,1})^{\otimes L}$.

Lemma 7.7. Let $u$ and $v$ be words such that $uv$ is an $A_{n-1}$ highest weight vector. Then $v$ is an $A_{n-1}$ highest weight vector and $\epsilon_j(u) \le \varphi_j(v)$ for all $1 \le j \le n-1$.

Proof. Let $uv$ be an $A_{n-1}$ highest weight vector and $1 \le j \le n-1$. By (3.5)
$$0 = \epsilon_j(uv) = \epsilon_j(v) + \max\{0, \epsilon_j(u) - \varphi_j(v)\}.$$
Since both summands on the right-hand side are nonnegative and sum to zero they must both be zero. $\square$

Lemma 7.8. Let $w$ be a word in the alphabet $\{1,2\}$ and $\hat w$ a word obtained by removing a letter $i$ of $w$. Then
(1) $\epsilon_1(\hat w) \le \epsilon_1(w) + 1$ with equality only if $i = 1$.
(2) $\epsilon_1(w) \le \epsilon_1(\hat w) + 1$ with equality only if $i = 2$.

Proof. Write $w = uiv$ and $\hat w = uv$. By (3.5)
$$\epsilon_1(ui) = \epsilon_1(i) + \max\{0, \epsilon_1(u) - \varphi_1(i)\} = \begin{cases} \max\{0, \epsilon_1(u) - 1\} & \text{if } i = 1, \\ 1 + \epsilon_1(u) & \text{if } i = 2. \end{cases} \tag{7.4}$$
In particular $\epsilon_1(ui) \ge \epsilon_1(u) - 1$. Applying (3.5) to both $\epsilon_1(uv)$ and $\epsilon_1(uiv)$ and subtracting, we obtain
$$\begin{aligned}
\epsilon_1(uv) - \epsilon_1(uiv) &= \max\{0, \epsilon_1(u) - \varphi_1(v)\} - \max\{0, \epsilon_1(ui) - \varphi_1(v)\} \\
&\le \max\{0, \epsilon_1(u) - \varphi_1(v)\} - \max\{0, \epsilon_1(u) - 1 - \varphi_1(v)\} \le 1.
\end{aligned}$$
Moreover if $\epsilon_1(uv) - \epsilon_1(uiv) = 1$ then all of the inequalities are equalities. In particular it must be the case that $\epsilon_1(ui) = \epsilon_1(u) - 1$, which by (7.4) implies that $i = 1$, proving the first assertion. On the other hand, (7.4) also implies $\epsilon_1(ui) \le 1 + \epsilon_1(u)$. Subtracting $\epsilon_1(uv)$ from $\epsilon_1(uiv)$ and computing as before, the second part follows. $\square$

Say that $w$ is an almost highest weight vector with defect $i$ if there is an index $1 \le i \le n-1$ such that $\epsilon_j(w) = \delta_{ij}$ for $1 \le j \le n-1$, and also $\epsilon_{i-1}(e_i(w)) = 0$ if $i > 1$.

Lemma 7.9. Let $w$ be an almost highest weight vector with defect $i$ for $1 \le i \le n-1$. Then $e_i(w)$ is either an $A_{n-1}$ highest weight vector or an almost highest weight vector of defect $i+1$.
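Lemma 7.8 can be tested by brute force on short words. The sketch below assumes only the recursion (7.4), read from left to right with $\epsilon_1$ of the empty word equal to 0: a letter 1 replaces the running value $e$ by $\max\{0, e-1\}$ and a letter 2 replaces it by $e+1$. The function names are hypothetical.

```python
from itertools import product

def eps1(word):
    """epsilon_1 of a word in {1, 2}, computed letter by letter via (7.4)."""
    e = 0
    for letter in word:
        e = max(0, e - 1) if letter == 1 else e + 1
    return e

def check_lemma_7_8(max_len):
    """Check both parts of Lemma 7.8 for every word up to max_len and every removed letter."""
    for length in range(1, max_len + 1):
        for word in product((1, 2), repeat=length):
            for pos, letter in enumerate(word):
                hat = word[:pos] + word[pos + 1:]      # word with one letter removed
                a, b = eps1(hat), eps1(word)
                if a > b + 1 or (a == b + 1 and letter != 1):
                    return False                       # part (1) violated
                if b > a + 1 or (b == a + 1 and letter != 2):
                    return False                       # part (2) violated
    return True

print(check_lemma_7_8(10))
```

For instance, removing the letter 1 from the word 21 raises $\epsilon_1$ from 0 to 1, which is the equality case of part (1).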
Proof. For $j \notin \{i-1, i, i+1\}$, the restrictions of the words $w$ and $e_i(w)$ to the alphabet $\{j, j+1\}$ are identical, so that $\epsilon_j(e_i(w)) = \epsilon_j(w) = 0$ by the definition of an almost highest weight vector. Also $\epsilon_i(w) = 1$ implies that $\epsilon_i(e_i(w)) = 0$. Again by the definition of an almost highest weight vector, $\epsilon_{i-1}(e_i(w)) = 0$. If $i = n-1$ we have shown that $e_i(w)$ is an $A_{n-1}$ highest weight vector. So it may be assumed that $i < n-1$. It is enough to show that one of the two following possibilities occurs.
(1) $\epsilon_{i+1}(e_i(w)) = 0$.
(2) $\epsilon_{i+1}(e_i(w)) = 1$ and $\epsilon_i(e_{i+1}e_i(w)) = 0$.
Recall that $e_i(w)$ is obtained from $w$ by changing an $i+1$ into an $i$. Write $w = u(i+1)v$ such that $e_i(w) = uiv$. In this notation we have $\varphi_i(v) = 0$ and $\epsilon_i(u) = 0$. By Lemma 7.8 (1) with $\{1,2\}$ replaced by $\{i+1, i+2\}$ and using that $w$ is an almost highest weight vector of defect $i$, we have $\epsilon_{i+1}(e_i(w)) \le \epsilon_{i+1}(w) + 1 = 1$. It is now enough to assume that $\epsilon_{i+1}(e_i(w)) = 1$ and to show that $\epsilon_i(e_{i+1}e_i(w)) = 0$. By (3.5)
$$0 = \epsilon_{i+1}(w) = \epsilon_{i+1}(u(i+1)v) = \epsilon_{i+1}(v) + \max\{0, \epsilon_{i+1}(u) - \varphi_{i+1}((i+1)v)\}.$$
In particular $\epsilon_{i+1}(v) = 0$. Hence $e_{i+1}(e_i(w)) = e_{i+1}(uiv) = e_{i+1}(u)iv$. Similar computations starting with $\epsilon_i(w) = 1$ and which use the fact that $\epsilon_i(u) = \varphi_i(v) = 0$, yield $\epsilon_i(v) = 0$. We have
$$\epsilon_i(e_{i+1}e_i(w)) = \epsilon_i(e_{i+1}(u)iv) = \epsilon_i(iv) + \max\{0, \epsilon_i(e_{i+1}(u)) - \varphi_i(iv)\} = 0 + \max\{0, \epsilon_i(e_{i+1}(u)) - 1\}.$$
But $\epsilon_i(u) = 0$ and in passing from $u$ to $e_{i+1}(u)$ an $i+2$ is changed into an $i+1$. By Lemma 7.8 (2) applied to the restriction of $u$ to the alphabet $\{i, i+1\}$, we have $\epsilon_i(e_{i+1}(u)) \le \epsilon_i(u) + 1 = 1$. It follows that $\epsilon_i(e_{i+1}e_i(w)) = 0$, and that $e_i(w)$ is an almost highest weight vector of defect $i+1$. $\square$

Lemma 7.10. Suppose $w$ is an $A_{n-1}$ highest weight vector and $\hat w$ is a word obtained by removing a letter (say $i$) from $w$. Then there is an index $r$ such that $i \le r \le n$ and $e_{r-1}e_{r-2}\cdots e_i(\hat w)$ is an $A_{n-1}$ highest weight vector.

Proof.
By Lemma 7.9 it suffices to show that w is either an An−1 highest weight vector or an almost highest weight vector of defect i. w ) = 0 for j = i. For j ∈ {i − 1, i}, the restrictions of w and First it is shown that j ( w to the alphabet {j, j + 1} are the same, so that j ( w ) = j (w) = 0. For j = i − 1, by Lemma 7.8 point 7.8 and the assumption that w is an An−1 highest weight vector, it follows that i−1 ( w ) ≤ i−1 (w) + 1 = 1. But equality cannot hold since the removed letter is i as opposed to i − 1. Thus i−1 ( w ) = 0. w ) ≤ i (w) + 1 = 1 by Lemma 7.8 point 7.8 and the fact Next we observe that i ( that w is an An−1 highest weight vector. w ) = 0 then w is an An−1 highest weight vector. So it may be assumed that If i ( i ( w ) = 1. It suffices to show that i−1 (ei ( w )) = 0. Write w = uiv and w = uv. Now
j (v) = 0 for all 1 ≤ j ≤ n − 1 by Lemma 7.7 since w is an An−1 highest weight vector. In particular i (v) = 0 so that ei ( w ) = ei (uv) = ei (u)v. We have i−1 (ei ( w )) = i−1 (ei (u)v) = i−1 (v) + max{0, i−1 (ei (u)) − φi−1 (v)} = max{0, i−1 (ei (u)) − φi−1 (v)}, since i−1 (v) = 0 by Lemma 7.7. It is enough to show that i−1 (ei (u)) ≤ φi−1 (v). But i−1 (ei (u)) ≤ i−1 (u) + 1 = i−1 (ui) ≤ φi−1 (v). The first inequality holds by an application of Lemma 7.8 point 7.8 since the restrictions of u and ei (u) to the alphabet {i − 1, i} differ by inserting a letter i. The last inequality holds by Lemma 7.7 since w = uiv is an An−1 highest weight vector. " #
Lemma 7.11. Let B = B k,* be a perfect crystal of level * ≤ *, ∈ (Pcl+ )* , B a finite (possibly empty) tensor product of perfect crystals of level at most *, x ∈ B and b ∈ B such that x ⊗ b ∈ H(, B ⊗ B). Let i ∈ J such that hi , > 0 and set = − i + i−1 . Then there is an index 0 ≤ s ≤ k such that ei+s−1 · · · ei+1 ei (x ⊗ b) = x ⊗ ei+s−1 · · · ei+1 ei (b)
(7.5)
and ei+s−1 · · · ei (b) ∈ H( , B), where the subscripts are taken modulo n. Moreover if * = * then s = k. (1)
Proof. Since the Dynkin diagram $A^{(1)}_{n-1}$ has an automorphism given by rotation, it may be assumed that $i = 1$. Let $\lambda$ be the partition of length less than $n$, given by $\langle h_j, \Lambda \rangle = \lambda_j - \lambda_{j+1}$ for $1 \le j \le n-1$ and $\lambda_n = 0$. Since $\langle h_1, \Lambda \rangle > 0$ it follows that $\lambda$ has a column of size 1. Let $m = \lambda_1$ and $y_j$ be the $A_{n-1}$-highest weight vector in $B^{\lambda^t_j,1}$ for $1 \le j \le m$. Write $y = y_m \otimes \cdots \otimes y_1$ and $\hat{y} = y_{m-1} \otimes \cdots \otimes y_1$. Observe that $y \otimes u_{\ell\Lambda_0}$ is an affine highest weight vector in $B^{\lambda^t_m,1} \otimes \cdots \otimes B^{\lambda^t_1,1} \otimes B(\ell\Lambda_0)$ and has weight $\Lambda$, so its connected component is isomorphic to $B(\Lambda)$. A similar statement holds for $\hat{y} \otimes u_{\ell\Lambda_0}$ and $B(\Lambda')$. In particular, $b \otimes y$ is an $A_{n-1}$ highest weight vector. The map $x \otimes b \otimes y \mapsto \mathrm{word}(x)\,\mathrm{word}(b)\,\mathrm{word}(y)$ gives an embedding of $A_{n-1}$-crystals into a tensor product of crystals $B^{1,1}$. By Lemma 7.10, there exists an index $1 \le r \le n$ such that $e_{r-1} e_{r-2} \cdots e_1(\mathrm{word}(x)\,\mathrm{word}(b)\,\mathrm{word}(\hat{y}))$ is an $A_{n-1}$ highest weight vector. Since $\hat{y}$ is an $A_{n-1}$ highest weight vector it follows that $e_{r-1} \cdots e_1(\mathrm{word}(x)\,\mathrm{word}(b)\,\mathrm{word}(\hat{y})) = e_{r-1} \cdots e_1(\mathrm{word}(x)\,\mathrm{word}(b))\,\mathrm{word}(\hat{y})$. Let $p_j$ be the position of the letter in $e_{j-1} \cdots e_1(\mathrm{word}(x)\,\mathrm{word}(b))$ that changes from a $j+1$ to $j$ upon the application of $e_j$, for $1 \le j \le r-1$. It follows from the proof of Lemma 7.9 that
$$p_{r-1} < p_{r-2} < \cdots < p_2 < p_1. \tag{7.6}$$
Let $s$ be the maximal index such that $p_s$ is located in $\mathrm{word}(b)$. Write $\hat{b} = e_s \cdots e_1(b)$. It follows that $e_s e_{s-1} \cdots e_1(x \otimes b) = x \otimes \hat{b}$ and that $\hat{b} \otimes \hat{y}$ is an $A_{n-1}$ highest weight vector. It remains to show that
$$\epsilon_0(\hat{b} \otimes \hat{y} \otimes u_{\ell\Lambda_0}) = 0 \text{ and that } s \le k \text{ with equality if } \ell' = \ell. \tag{7.7}$$
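The raising operators $e_j$ used in this proof act on words by changing a single letter $j+1$ into $j$. The following is a minimal sketch of that action, assuming the common bracketing (signature) rule in which each letter $i+1$ opens a bracket, each letter $i$ closes the most recent open one, and $e_i$ changes the leftmost unmatched $i+1$. The convention is an assumption, since this section does not restate the one used in the paper, and the function names are illustrative only.

```python
def unmatched_positions(i, word):
    """Positions of the letters i+1 left unmatched by the bracketing rule.

    Reading left to right, each letter i+1 opens a bracket and each
    letter i closes the most recently opened one, if there is one.
    """
    open_positions = []
    for pos, letter in enumerate(word):
        if letter == i + 1:
            open_positions.append(pos)
        elif letter == i and open_positions:
            open_positions.pop()
    return open_positions


def epsilon(i, word):
    """Number of times e_i can be applied to the word (assumed convention)."""
    return len(unmatched_positions(i, word))


def e(i, word):
    """Apply e_i: change the leftmost unmatched i+1 into i.

    Returns a new list, or None when the word is killed by e_i.
    """
    unmatched = unmatched_positions(i, word)
    if not unmatched:
        return None
    raised = list(word)
    raised[unmatched[0]] = i
    return raised
```

Under this convention a word is an $A_{n-1}$ highest weight vector exactly when `e(i, word)` is `None` for all $1 \le i \le n-1$, and the position changed by `e` plays the role of $p_j$ in (7.6).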
Consider the corresponding positions in the tableau $b$. Since $b \mapsto \mathrm{word}(b)$ is an $A_{n-1}$-crystal morphism, $e_s \cdots e_1(\mathrm{word}(b)) = \mathrm{word}(e_s \cdots e_1(b))$. Let $(i_1, j_1)$ be the position in the tableau $b$ corresponding to the position $p_1$ in $\mathrm{word}(b)$, and analogously define $(i_2, j_2)$, $(i_3, j_3)$, and so on. Since the rows of all tableaux (and in particular $b$, $e_1(b)$, $e_2 e_1(b)$, etc.) are weakly increasing and (7.6) holds, it follows that $i_1 < i_2 < i_3 < \cdots < i_s$. But $b$ has $k$ rows, so $s \le k$.

The next goal is to prove (7.7). Suppose first that $s < n-1$. In this case the letters 1 and $n$ are undisturbed in passing from $e_1(b)$ to $e_s \cdots e_1(b)$. Using this and the Dynkin diagram rotation it follows that
$$\epsilon_0(e_s \cdots e_2 e_1(b) \otimes \hat{y} \otimes u_{\ell\Lambda_0}) = \epsilon_0(e_1(b) \otimes u_{\Lambda'}) = \max\{0, \epsilon_0(e_1(b)) - \varphi_0(u_{\Lambda'})\} = \max\{0, \epsilon_0(e_1(b)) - \varphi_0(u_\Lambda) - 1\}. \tag{7.8}$$
But $\varphi_0(u_\Lambda) \ge \epsilon_0(b) \ge \epsilon_0(e_1(b)) - 1$ by the fact that $\epsilon_0(b \otimes u_\Lambda) = 0$ and Lemma 7.8 point 7.8 applied after rotation of the Dynkin diagram. By (7.8) the desired result (7.7) follows.

Otherwise assume $s = n-1$. Here $k = n-1$ since $s \le k < n$, with the inequality holding by the perfectness of $B$. By (7.6) and the fact that $b$ is a tableau, it must be the case that $e_1$ acting on $b$ changes a 2 in the first row of $b$ into a 1, $e_2$ acting on $e_1(b)$ changes a 3 in the second row of $e_1(b)$ into a 2, etc. Since $b$ is a tableau with $n-1$ rows with entries between 1 and $n$, there are integers $0 \le \nu_{n-1} \le \nu_{n-2} \le \cdots \le \nu_1 < \ell'$ such that the $i$-th row of $b$ consists of $\nu_i$ copies of the letter $i$ and $\ell' - \nu_i$ copies of the letter $i+1$. For tableaux $b$ of this very special form, the explicit formula for $e_0$ in [37, (3.11)] yields $\epsilon_0(b) = \ell' - m_n(b)$, where $m_n(b)$ is the number of occurrences of the letter $n$ in $b$. Since $\hat{b} = e_{n-1} \cdots e_1(b)$ also has the same form (with $\nu_i$ replaced by $\nu_i + 1$ for $1 \le i \le n-1$) and $m_n(\hat{b}) = m_n(b) - 1$, it follows that $\epsilon_0(\hat{b}) = \epsilon_0(b) + 1$. We have
$$\epsilon_0(\hat{b} \otimes \hat{y} \otimes u_{\ell\Lambda_0}) = \epsilon_0(\hat{b} \otimes u_{\Lambda'}) = \max\{0, \epsilon_0(\hat{b}) - \varphi_0(u_{\Lambda'})\} = \max\{0, \epsilon_0(b) + 1 - (\varphi_0(u_\Lambda) + 1)\} = 0$$
since $b \in H(\Lambda, B)$.

Finally, assuming $\ell' = \ell$, it must be shown that $s = k$. Since the level of $B$ is the same as that of the weights $\Lambda$ and $\Lambda'$, it follows from the perfectness of $B$ that both $b$ and $\hat{b}$ are uniquely defined by the property that $\epsilon(b) = \Lambda$ and $\epsilon(\hat{b}) = \Lambda'$. Let $\Lambda = \sum_{i=0}^{n-1} z_i \Lambda_i$. By the explicit construction of $b$ in Example 3.3, wt(b) =
$$\sum_{j=1}^{k} \sum_{i=0}^{n-1} z_i (\Lambda_{i+j} - \Lambda_{i+j-1}) = \sum_{i=0}^{n-1} z_i (\Lambda_{i+k} - \Lambda_i)$$
with indices taken modulo n. Subtracting the analogous formula for wt(b ), wt(b) − wt(b ) = − kj =1 αj . Using (3.1) it follows that k = s. " # Proof of Theorem 7.3. First observe that x ⊗ b ∈ H( , B ⊗ B , φ(x)) by (3.1), b ∈ H( , B , ), and (x) = . Let c ∈ B and z ∈ B be such that x ⊗ b ∼ = c⊗z under the local isomorphism. Then c ⊗ z ∈ H( , B ⊗ B, φ(x)) which means that z is -restricted. Hence z ∈ H( , B, φ(z)) and c ∈ H(φ(z), B , φ(x)). The former together with the perfectness of B implies that y = z. From the latter it follows that
ψ −k (c) ∈ H( , B , ). However the set H( , B , ) might have multiplicities so it is not obvious why b = ψ −k (c) or equivalently c = ψ k (b). The proof proceeds by an induction that changes the weight to a weight that is “closer to" *0 . Suppose first that there is a root direction i = 0 such that = − i + i−1 . By Lemma 7.11 applied for the weight hi , > 0 and , simple root αi , and element x ⊗ b ∈ H( , B ⊗ B ), there is an 0 ≤ s < n such , B , ), where = − s+i + s+i−1 and that b = ei+s−1 · · · ei+1 ei (b) ∈ H( ei+s−1 · · · ei (x ⊗ b) = x ⊗ b. Applying Lemma 7.11 with , αs+i , and x ∈ H(, B), , B). it follows that x = ek+s+i−1 · · · es+i (x) ∈ H( , B ⊗ B ). The above computations imply ek+s+i−1 · · · ei (x ⊗ b) = x ⊗ b ∈ H( , B ⊗ B) since x ⊗ b → c ⊗ y under We have ek+s+i−1 · · · ei+1 ei (c ⊗ y) ∈ H( the local isomorphism. It must be seen which of these raising operators act on the tensor factor in B and which act in B. By Lemma 7.11 applied with , αi , and c ⊗ y ∈ , B) and that ek+i−1 · · · ei (c⊗ H( , B ⊗B), it follows that y = ek+i−1 · · · ei (y) ∈ H( (1) y) = c⊗ y . Since y ⊗u is an An−1 highest weight vector, the rest of the raising operators es+k−1 · · · ek+i must act on the first tensor factor. Let c = ek+s+i−1 · · · ek+i (c). Then ek+s+i−1 · · · ei (c ⊗ y) = c ⊗ y . But the local isomorphism is a crystal morphism so it sends x ⊗ b → c ⊗ y . By induction c = ψ k ( b). By (3.6) it follows that c = ψ k (b). Otherwise there is no index i = 0 such that hi , > 0. This means = *0 . But the sets H(*0 , B, ) and H(*0 , B , φ(y)) are singletons whose lone elements are given by the An−1 highest weight vectors in B and B respectively. Since B ⊗ B is An−1 multiplicity-free it follows that the sets H(φ(y), B , φ(x)) and H(, B, φ(x)) are singletons. In this case it follows directly that c = ψ k (b) since both c and ψ k (b) are elements of the set H(φ(y), B , φ(x)). " # 7.5. Branching function by restricted generalized Kostka polynomials. 
The appropriate map from LR tableaux to rigged configurations, sends the generalized charge of the LR tableau to the charge of the rigged configuration. Unfortunately in general it is not clear what happens when one uses the statistic coming from the energy function E(b ⊗ y ) but using the path b ⊗ y ⊗ y . It is only known that the statistic E(b ⊗ y ⊗ y ) on the path b ⊗ y ⊗ y is well-behaved. So to continue the computation we require that y = ∅. This is achieved when = * 0 . So let us assume this. The other problem is that we do not consider all paths in H(*0 , B ⊗N ⊗ B , ), but only those of the form b ⊗ y , where y ∈ B is a fixed path. Passing to LR tableaux, this is equivalent to imposing an additional condition that the subtableaux corresponding to the first several rectangles must be in fixed positions. Conjecture 8.3 asserts that the corresponding sets of rigged configurations are well-behaved. The special case that requires no extra work is when B consists of a single perfect crystal. This is achievable when has the form = rs + (* − r)0 ; in this case B = B s,r and y is the sln -highest weight element of B s,r . This is the same as requiring that the first subtableau of the LR tableau be fixed. But this is always the case. Let R (M) consist of the single rectangle (r s ) followed by N = Mn copies of the rectangle (* k ), where B = B k,* . Let λ(M) be the partition of the same size as the total size of R (M) , (M) such that λ projects to − *0 . Then the set of paths H(*0 , B ⊗N ⊗ B s,r , ) is * equal to P−* ,R (M) . This is summarized by 0
kM
−rskM−n* ( 2 ) * b Kλ(M) ,R (M) (q), (q) = lim q M→∞
where is arbitrary, = rs + (* − r)0 , and = * 0 .
(7.9)
Inserting expression (6.7) for the generalized Kostka polynomial in (7.9) and taking the limit yields the following fermionic expression for the branching function: b (q) = q
×
rs(s−n) 1 2n + 2*
n
|λ| 2 j =1 (λj − n )
(−1)|S|+1 q 2 u(S)C 1
−1 ⊗C −1 u(S)
S∈SCST(λ )
q 2 mC⊗C 1
−1 m−mI ⊗C −1 u(S)
m
*−1 n−1 m(a) +n(a) n−1 i
i=1 a=1 i=*
(a) mi
i
a=1
1 , (q)m(a)
(7.10)
*
where λ is any partition which projects to − *0 and u(S) as defined in (6.1). The n−1 (a) (a) sum over m runs over all m = *−1 a=1 mi ei ⊗ ea such that mi ∈ Z and i=1 e*−1 ⊗ ea (I ⊗ C −1 m − C −1 ⊗ C −1 u(S)) −
1 1 λj − |λ| ∈ Z * n a
j =1
(a)
for all 1 ≤ a ≤ n − 1. The variables ni (a)
ni
are given by
= ei ⊗ ea −C ⊗ C −1 m + I ⊗ C −1 (u(s) + er ⊗ es )
for all 1 ≤ a < n and 1 ≤ i < *, i = * . 8. Proof of Theorem 5.7 To prove Theorem 5.7 it clearly suffices to show that there is a bijection ψ R : RLR* (λ; R) → RC* (λ; R) that is charge-preserving, that is, cR (T ) = c(ψ R (T )) for all T ∈ RLR* (λ; R). Here we identify LR(λ; R) with RLR(λ; R) via the standardization bijec : CLR(λ; R) → N by c = c ◦ γ , where c : RLR(λ; R) → tion std. Also define cR R R R R N. It will be shown that one of the standard bijections ψ R : RLR(λ; R) → RC(λ; R) is charge-preserving, and that it restricts to a bijection RLR* (λ; R) → RC* (λ; R). With this in mind let us review the bijections from LR tableaux to rigged configurations. 8.1. Bijections from LR tableaux to rigged configurations. A bijection φ R : CLR(λ; R) → RC(λt ; R t ) was defined recursively in [25, Definition-Proposition 4.1]. It is one of four natural bijections from LR tableaux to rigged configurations: (1) Column index quantum: φ R : CLR(λ; R) → RC(λt ; R t ), R : CLR(λ; R) → RC(λt ; R t ), defined by φ R = (2) Column index coquantum: φ θR t ◦ φ R , (3) Row index quantum: ψ R : RLR(λ; R) → RC(λ; R), defined by ψ R = φ R t ◦ tr, and R : RLR(λ; R) → RC(λ; R), defined by ψ R = θR ◦ ψ R . (4) Row index coquantum: ψ Of these four, the one that is compatible with level-restriction is ψ. First we show that it is charge-preserving. This fact is a corollary of the difficult result [25, Theorem 9.1]. Proposition 8.1. c(ψ R (T )) = cR (T ) for all T ∈ RLR(λ; R).
Proof. Consider the following diagram, which commutes by the definitions and [25, Theorem 7.1]:

[Commutative diagram relating RLR(λ; R), CLR(λ; R), CLR(λ^t; R^t), RC(λ; R) and RC(λ^t; R^t) through the maps γ_R^{-1}, tr, tr_LR, ψ_R, φ_R, φ_{R^t} and tr_RC.]
In particular ψ R = tr RC ◦ φ R ◦ γR−1 . Let T ∈ RLR(λ; R) and Q = γR−1 (T ). Then, using tr RC ◦ θR t = θR ◦ tr RC , R (Q))). ψ R (T ) = θR (tr RC (φ R (Q)). Then Let (ν, J ) = tr RC (φ c(ψ R (T )) = c(θR (ν, J )) = ||R|| − cc(ν, J ) R (Q))) = cc(φ R (Q)) = cR (Q) = cR (T ) = ||R|| − cc(tr RC (φ . by Lemma 5.4, (5.6) and [25, Theorem 9.1] to pass from cc to cR
□
In light of Proposition 8.1, to prove Theorem 5.7 it suffices to establish the following result. Theorem 8.2. The bijection ψ R : RLR(λ; R) → RC(λ; R) restricts to a well-defined bijection ψ R : RLR* (λ; R) → RC* (λ; R). Computer data suggests that the bijection ψ R is not only well-behaved with respect to level-restriction, but also with respect to fixing certain subtableaux. It was argued in Sect. 7.5 that the branching functions can be expressed in terms of generating functions of tableaux with certain fixed subtableaux.t t Let ρ ⊂ λ be partitions, Rρ = ((1ρ1 ), . . . , (1ρn )) and Tρ the unique tableau in RLR(ρ; Rρ ). Define RLR* (λ, ρ; R) to be the set of tableaux T ∈ RLR* (λ; Rρ ∪ R) such that T restricted to shape ρ equals Tρ . Recall the set of rigged configurations RC* (λ, ρ; R) defined in Sect. 5.3. Conjecture 8.3. The bijection ψ R : RLR(λ; R) → RC(λ; R) restricts to a well-defined bijection ψ R : RLR* (λ, ρ; R) → RC* (λ, ρ; R). 8.2. Reduction to single rows. In this section it is shown that to prove Theorem 8.2 it suffices to consider the case where R consists of single rows. Recall the nontrivial embedding iR : LR(λ; R) 7→ LR(λ; r(R)). We identify LR(λ; R) and RLR(λ; R) via std, and therefore have an embedding iR : RLR(λ; R) 7→ RLR(λ; r(R)). Define a map jR : RC(λ; R) → RC(λ; r(R)) as follows. Let (ν, J ) ∈ RC(λ; R). For each rectangle of R having k rows and m columns, add k − j strings (m, 0) of length m and label zero to the rigged partition (ν, J )(j ) for 1 ≤ j ≤ k − 1. The resulting rigged configuration is jR (ν, J ).
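The definition of $j_R$ above is a purely combinatorial recipe, so it can be sketched in a few lines. The following is a minimal illustration under assumed conventions that are not taken from the paper: a rigged configuration is stored as a list whose entry `j - 1` holds the strings of the rigged partition $(\nu, J)^{(j)}$ as `(length, label)` pairs, and each rectangle of $R$ is given as a `(rows, columns)` pair.

```python
def j_R(rigged_configuration, rectangles):
    """Sketch of the map j_R : RC(lambda; R) -> RC(lambda; r(R)).

    For each rectangle with k rows and m columns, add k - j strings of
    length m and label zero to the j-th rigged partition, 1 <= j <= k - 1.
    The input is left unmodified.
    """
    result = [list(strings) for strings in rigged_configuration]
    for rows, cols in rectangles:
        if rows - 1 > len(result):
            raise ValueError("rectangle has too many rows for this configuration")
        for j in range(1, rows):
            result[j - 1].extend([(cols, 0)] * (rows - j))
    # Sort strings by length, then label, in decreasing order for readability.
    return [sorted(strings, reverse=True) for strings in result]


# Hypothetical example with three rigged partitions and one 3 x 2 rectangle.
example = [[(2, 1)], [], []]
print(j_R(example, [(3, 2)]))
```

A rectangle with a single row contributes nothing, which matches the fact that $j_R$ is the identity when $R$ already consists of single rows.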
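Proposition 8.4. The following diagram commutes:
$$\begin{array}{ccc}
\mathrm{RLR}(\lambda; R) & \xrightarrow{\; i_R \;} & \mathrm{RLR}(\lambda; r(R)) \\
\downarrow{\scriptstyle \psi_R} & & \downarrow{\scriptstyle \psi_{r(R)}} \\
\mathrm{RC}(\lambda; R) & \xrightarrow{\; j_R \;} & \mathrm{RC}(\lambda; r(R)).
\end{array}$$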
It must be shown that similar diagrams commute in which iR is replaced by either iR< or sp , the maps that occur in the definition of iR . Let jR< : RC(λ; R) → RC(λ; R < ) be defined by adding a string (µ1 , 0) to each of the first η1 − 1 rigged partitions in (ν, J ) ∈ RC(λ; R). Lemma 8.5. jR< is well-defined and the following diagram commutes: iR
, ∂pj ∂p/ 4 4 1 2 E0 p − E0 0 > 1 − √ , so that, since |Eλ (p) − E0 (p)| ≤ constλ , we also have
2
Eλ p − Eλ (0) + Eλ (0) − E0 (0) ≤ 2d + constλ2 . As Eλ (p) is real analytic in p, the ∂ 2 E0 (p) analytic implicit function theorem and Cauchy estimates are used to control ∂pj ∂p/ and the remainder. & ' Proof. For λ = 0, we have
4.2. The ladder approximation. The first part of this subsection is devoted to showing the existence or absence of two-particle bound states in the ladder approximation and follows [15]. We use the mixed coordinates of Eq. (2.7) to analyze the kernels in the BS equation. The kernel of D˜ λ0 is given by 0 0
(2) k (2) k 0 0 0 0 0 ˜ ˜ ˜ Dλ (p, q, k) = δ p + q Sλ − p , p Sλ + p , q δ p + q − k 2 2 0 0 k k (2) (2) + S˜λ + p 0 , p S˜λ − p 0 , k − p δ (p − q) . 2 2 (4.3) The Recall that D˜ λ (k 0 ) means D˜ λ taken at zero spatial momentum, i.e., D˜ λ ((k 0 , 0)). action of D˜ λ0 (k 0 ) on energy independent functions f (p), which depend only on p, is 0 0 (2) k (2) k (D˜ λ0 (k 0 )f )(p) = (2π)d+1 S˜λ + f (−p)]. + p 0 , p S˜λ − p 0 , p [f (p) 2 2 (4.4) In the ladder approximation, K˜ λ is replaced by its first order term λL˜ of Eq. (3.2), which is local in time and so 3 ˜ + E0 ( L(p, q, k) = − a2 [E0 (p) + E0 ( q ) + E0 (p − k) q − k)], 4 i.e., its Fourier transform does not depend on p0 , q 0 and k 0 . Hence, at zero total spatial momentum k, ˜ = − 3 a2 [E0 (p) L(p, q, (k 0 , 0)) + E0 ( q )], 2 ˜ 0 , 0) has rank two (in a scalar local field theory the which shows that the operator L(k rank is one). Solving the Bethe–Salpeter equation (2.9) for D˜ λ , in the ladder approximation, yields −1 ˜ 0) D˜ λ0 (k 0 ) D˜ λ (k 0 ) = 1 − (2π )−2(d+1) λD˜ λ0 (k 0 )L(k (4.5) −1 = D˜ λ0 (k 0 ) 1 − (2π )−2(d+1) λL˜ λ (k 0 )D˜ λ0 (k 0 ) with all quantities taken at zero spatial momentum as in (4.4). The action of L˜ λ D˜ λ0 is given by
L˜ λ (k 0 )D˜ λ0 (k 0 )f (p) = − 3a2 (2π )d+1 E0 (p) + E0 ( q) 0 0 k k (4.6) × S˜λ − q 0 , q S˜λ + q 0 , q 2 2 × f (−q) + f (−q 0 , q ) dq. Hence, if the test function f depends only on p, we have (L˜ λ (k 0 )D˜ λ0 (k 0 )f )(p) = −3a2 (2π )d+1 ρ0 (f ) + ρ1 (f )E0 (p) ,
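where
$$\rho_n(f) = \frac{1}{2} \int_{T^d} G(\vec{q}, k^0)\, E_0(\vec{q})^{\delta_{0n}} \left[ f(\vec{q}) + f(-\vec{q}) \right] d\vec{q}; \qquad n = 0, 1,$$
$$G(\vec{q}, k^0) = \int_{-\infty}^{\infty} \tilde{S}_\lambda^{(2)}(q)\, \tilde{S}_\lambda^{(2)}(k^0 - q^0, \vec{q})\, dq^0.$$
It follows from (4.1) and from a simple analytic continuation argument that $G(\vec{q}, k^0)$ is analytic on $\mathrm{Im}\, k^0 < 2E_\lambda(\vec{0})$. This result depends on the fact that $E_\lambda(\vec{0}) \le E_\lambda(\vec{p})$ for any $\vec{p} \in T^d$, proven in Proposition 4.2. Recall, from (2.8), that the basic object we want to analyze is $(f, \tilde{D}_\lambda(k^0) f)$, which has the form
$$(f, \tilde{D}_\lambda(k^0) f) = 2(2\pi)^{d+1} \int_{T^d} f(\vec{p})\, G(\vec{p}, k^0)\, g(\vec{p}, k^0)\, d\vec{p}, \tag{4.7}$$
where
$$g(\cdot, k^0) = \left[ 1 - (2\pi)^{-2(d+1)} \lambda \tilde{L}(k^0) \tilde{D}_\lambda^0(k^0) \right]^{-1} f(\cdot).$$
The only singularities of (4.7) on $\mathrm{Im}\, k^0 < 2E_\lambda(\vec{0})$ must come from those of $g(\cdot, k^0)$. But, in turn, these come from the zeroes of $1 - \mu_\pm(k^0)$, where $\mu_\pm(k^0)$ are the eigenvalues of $(2\pi)^{-2(d+1)} \lambda \tilde{L}(k^0) \tilde{D}_\lambda^0(k^0)$ on the space generated by the functions $1$ and $E_0(\vec{p})$. We find
$$\mu_\pm(k^0) = -3a_2 (2\pi)^{-(d+1)} \lambda \left[ \alpha(k^0) \pm \left( \beta(k^0) \gamma(k^0) \right)^{1/2} \right] \tag{4.8}$$
with the eigenfunction corresponding to $\mu_+$ given by
$$\psi_+(\vec{p}) = 1 + \sqrt{\frac{\beta}{\gamma}}\, E_0(\vec{p}),$$
where
$$\alpha(k^0) = \int_{T^d} E_0(\vec{q})\, G(\vec{q}, k^0)\, d\vec{q}, \qquad \beta(k^0) = \int_{T^d} G(\vec{q}, k^0)\, d\vec{q}, \qquad \gamma(k^0) = \int_{T^d} E_0(\vec{q})^2\, G(\vec{q}, k^0)\, d\vec{q}. \tag{4.9}$$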
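The eigenvalue formula (4.8) can be checked by hand: on the basis $\{1, E_0\}$, the map $f \mapsto \rho_0(f) + \rho_1(f) E_0$ sends $a + b E_0$ to $(a\alpha + b\gamma) + (a\beta + b\alpha) E_0$, so it is represented by a $2 \times 2$ matrix with eigenvalues $\alpha \pm (\beta\gamma)^{1/2}$. The sketch below verifies this numerically. It replaces the integrals over $T^d$ by finite sums with made-up positive weights, so the sample values of $E_0$ and $G$ are purely hypothetical and the function names are illustrative.

```python
import math


def ladder_coefficients(energies, weights):
    """Discrete stand-ins for alpha, beta, gamma of (4.9).

    energies[i] plays the role of E_0 at a sample momentum and weights[i]
    the role of G times the integration measure (hypothetical numbers).
    """
    alpha = sum(e * g for e, g in zip(energies, weights))
    beta = sum(weights)
    gamma = sum(e * e * g for e, g in zip(energies, weights))
    return alpha, beta, gamma


def apply_reduced_operator(a, b, alpha, beta, gamma):
    """Image of a + b*E_0 under f -> rho_0(f) + rho_1(f)*E_0, as (a', b')."""
    return a * alpha + b * gamma, a * beta + b * alpha


def reduced_eigenvalues(alpha, beta, gamma):
    """Eigenvalues of [[alpha, gamma], [beta, alpha]] via trace and determinant."""
    half_trace = alpha
    det = alpha * alpha - beta * gamma
    disc = math.sqrt(half_trace * half_trace - det)  # equals sqrt(beta*gamma)
    return half_trace + disc, half_trace - disc


def mu_pm(a2, lam, d, alpha, beta, gamma):
    """mu_+ and mu_- of (4.8), including the prefactor -3*a2*(2*pi)^-(d+1)*lam."""
    prefactor = -3.0 * a2 * lam * (2.0 * math.pi) ** (-(d + 1))
    top, bottom = reduced_eigenvalues(alpha, beta, gamma)
    return prefactor * top, prefactor * bottom


alpha, beta, gamma = ladder_coefficients([1.0, 1.5, 2.0], [0.2, 0.3, 0.5])
print(reduced_eigenvalues(alpha, beta, gamma))
```

For positive weights the Cauchy-Schwarz inequality gives $\alpha \le (\beta\gamma)^{1/2}$, so the smaller reduced eigenvalue is never positive.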
Now, from (4.1), $G(\vec{q}, k^0)$ can be written as
$$G(\vec{q}, k^0) = \frac{\pi}{2}\, \frac{c_\lambda(\vec{q})^2}{E_\lambda(\vec{q}) \left[ E_\lambda(\vec{q})^2 + \frac{1}{4} (k^0)^2 \right]} + G_1(\vec{q}, k^0),$$
where $G_1(\vec{q}, k^0)$ is analytic on $\mathrm{Im}\, k^0 < E_\lambda(\vec{0}) + 2M_0$. From general principles, the singularities of (4.7) can only be located on the imaginary $k^0$ axis. Writing $k^0 = i\kappa$ with $\kappa \ge 0$ and using (4.1), one can show that $G(\vec{q}, i\kappa) > 0$
It follows then that α(iκ), β(iκ) and γ (iκ) are positive and, by for 0 ≤ κ < 2Eλ (0). Cauchy-Schwarz’s inequality, α ≤ [βγ ]1/2 on 0 ≤ κ < 2Eλ (0). For space dimension d ≥ 3, then α(iκ), β(iκ) and γ (iκ) increase to a finite limit as because the singularity generated by G( κ → 2Eλ (0) q , iκ) is quadratic and therefore integrable. Thus, if λ is small enough, 1 − µ± (iκ) cannot be zero on 0 < κ < 2Eλ (0) so that, in the ladder approximation, there are no bound states. but α − [βγ ]1/2 remains finite. This If d < 3, α, β and γ diverge as κ → 2Eλ (0), yields the nonvanishing of 1 − µ− (iκ) . Finally, 1 − µ+ (iκ) is nonzero if a2 > 0, if a2 < 0. This implies the and has a unique zero on the interval 0 < κ < 2Eλ (0), existence of a single bound state for the later case. be the mass for a single quasiparticle in the interacting theLet Mλ = Eλ (0) ory. The mass ML of the bound state, in the ladder approximation, is the solution of (assuming a2 < 0) F (λ, iML ) = −(2π )d+1 /3a2 λ, where F (λ, k 0 ) = α(λ, k 0 ) + [β(λ, k 0 )γ (λ, k 0 )]1/2 , and we have made explicit the λ dependence of α, β and γ . Let E = 2Mλ − ML . Performing an asymptotic analysis of the coefficients α, β, and γ we find 9 λ2 2 a [1 + O(λ)] ; if d = 1 4 m4 2 (4.10) E(λ) = 4π m2 exp − [1 + O(λ)] ; if d = 2. 3 |a2 | λ To go beyond the ladder approximation, let us introduce some function spaces. We define a weighted Hardy space Hδ (see [3, 21]) as functions f analytic in the strip | Imp j |< δ1 such that f (p) = f (−p), with norm given by, with α = α 0 , α , | w(p + iα)f (p + iα) |2 dp, sup f 2δ = | Imp 0 |< δ0 ;
|α0 | M0 . For | q 0 |≤ M0 , w(q)−1 Bδ (q −q )w(q )−1 is clearly bounded so that we have the required bound c$(κ). For | q 0 |> M0 , write q 0 = (q 0 − q 0 ) + q 0 , so that
2α
α α w(q)−1 = (q 0 )2 + 16Mλ2 ≤ 2 | q 0 − q 0 |2α +2 q 0 + 16Mλ2 , using the /p triangle inequality with p = α −1 . As r00 κ, q is O (q 0 )−4 , the result follows. & ' Let H∗ be the dual space to H, determined by the L2 inner product. We have Lemma 4.6. R0λ : H → H∗ is analytic in 0 < Reκ < 2Mλ , |Imκ| < Mλ and with norm bounded by c$(κ)2 , c > 0. Proof. From Eq. (4.3),
(g, R0λ f )2 ≤ sup S˜λ p 0 + iκ S˜λ p 0 − iκ w(p)−2 w(p) |g(p)f (p)| dp, 2 2 p and using (4.1) the result follows.
□
4.4. Complete model: Existence of bound states. For the complete model, following [2], here we show the existence of mass spectrum in the interval $\kappa \in (0, 2M_\lambda)$ when $d < 3$ and $a_2 < 0$. We will prove there is a unique bound state near the ladder bound state $M_L$. In the next subsection, absence of bound states in $(0, 2M_\lambda)$ will be proven both for $d \ge 3$ and $a_2 < 0$ and for $a_2 > 0$. Essentially, this is done by showing the existence or absence of an eigenvalue 1 of $K_\lambda(\kappa) R_\lambda^0(\kappa)$. Multiplicity one is checked for the former case.

Before we go to the technical details, we give a description of the strategy employed in both cases. For the repulsive case $a_2 > 0$ and for the attractive case $a_2 < 0$ and $d \ge 3$, with $K_\lambda = \lambda L + \lambda^2 K_\lambda^{(2)}$, we write
$$D_\lambda = D_L + D_\lambda \lambda^2 K_\lambda^{(2)} D_L = D_L \left[ 1 - \lambda^2 K_\lambda^{(2)} D_L \right]^{-1},$$
where
$$D_L = D_\lambda^0 + \lambda D_L L D_\lambda^0 = D_\lambda^0 \left[ 1 - \lambda L D_\lambda^0 \right]^{-1}.$$
Using an explicit representation for $D_L$, we show that $D_L$ has no singularities in $(0, 2M_\lambda)$, and also that $K_\lambda^{(2)} D_L$ has norm less than one in $(0, 2M_\lambda)$. Hence, the resolvent $\left[ 1 - \lambda^2 K_\lambda^{(2)} D_L \right]^{-1}$ is well defined by its Neumann series and $D_\lambda$ does not have singularities in $(0, 2M_\lambda)$.

For the attractive case $a_2 < 0$ and $d < 3$, in order to show existence of a bound state we write
$$D_\lambda = D_\lambda^0 \left[ 1 - K_\lambda D_\lambda^0 \right]^{-1}$$
and consider the family of compact operators, $\mu \in \mathbb{C}$, defined by $T_\lambda(\mu, \kappa) = -\lambda T_1(\kappa) + \mu T_2(\kappa)$, where $T_1$ and $T_2$ are defined in (4.16). We remark that $\mu = \lambda^2$ corresponds to the value of interest (the physical one), that is [see (4.15)] $T_\lambda(\lambda^2, \kappa) = K_\lambda(\kappa) R_\lambda^0(\kappa)$. This family is shown to be compact and jointly analytic in $\kappa$ and $\mu$, for $0 < \mathrm{Re}\, \kappa < 2M_\lambda$ and $|\mu| < 2\lambda^2$. Without further analysis, the analytic Fredholm theory implies that $\left[ 1 - K_\lambda D_\lambda^0 \right]^{-1}$ exists, except for $\kappa$ in a discrete set. As $D_\lambda^0$ is not singular in the same domain, it follows that the mass spectrum is discrete in $(0, 2M_\lambda)$.

However, we show more. The point $\mu = 0$ is the ladder approximation, which was solved explicitly in Subsect. 4.2, and leads to a bound state at some $\kappa = \kappa_L \in (0, 2M_\lambda)$. This is the only mass spectral point in $(0, 2M_\lambda)$. As $\mu T_2$ is an analytic perturbation, it is shown that there is an isolated bound state of multiplicity one at $\kappa_b \in (0, 2M_\lambda)$, where $\kappa_b$ lies in the interval $|\kappa_b - \kappa_L| \le \frac{1}{2} b \lambda^2$, for $b$ sufficiently small, uniform in $\lambda$, such that $\kappa_b$ is the unique mass spectral point in the interval. For $\kappa$ in the intervals $\left(0, \kappa_L - \frac{1}{2} b \lambda^2\right)$ or $\left(\kappa_L + \frac{1}{2} b \lambda^2, 2M_\lambda - \lambda^{5/2}\right)$, the mass spectrum is excluded by showing that $\left[ 1 - K_\lambda D_\lambda^0 \right]^{-1}$ exists. Thus, as $D_\lambda^0$ is not singular, the same holds for $D = D_\lambda^0 \left[ 1 - K_\lambda D_\lambda^0 \right]^{-1}$.

For $\kappa$ near $M_L$, the resolvent $(-\lambda T_1(\kappa) - w)^{-1}$ of $-\lambda T_1(\kappa)$ is constructed explicitly and $\mu T_2(\kappa)$ is shown to be an analytic perturbation to this ladder operator. The resolvent $(T_\lambda(\mu, \kappa) - w)^{-1}$ is defined through its Neumann series and is shown to exist for $w$ in
the complement of | w |−1 , with | w |−1 < 4. This means that the spectrum of Tλ (µ, κ) is contained in | w |≤ 1/4, | w − 1 |≤ 1/4. Consequently, by analytic perturbation theory, there is a unique multiplicity one eigenvalue αλ (µ, κ) of Tλ (µ, κ) which is analytic both in κ and µ, and satisfies αλ (0, κ) = 1. However, we do not know that for real µ > 0 and small the eigenvalue takes the value one. To show that indeed it does, we compute the derivative [∂αλ /∂κ] (0, κ) (see Lemma 4.11), which is large positive for small λ. This is shown to be the dominating contribution to [∂αλ /∂κ] (µ, κ). Thus, for small real µ, αλ (µ, κ) is strictly monotone increasing in µ. In this way, we show: Lemma 4.7. Let µ and κ be real. For | µ |< 2λ2 and c sufficiently small, there is a unique κ = κλ (µ) in | κ − ML |≤ 21 cλ2 such that αλ (µ, κλ (µ)) = 1. Remark 4.8. Recall that µ = λ2 is the physical value of interest so that αλ λ2 , κλ (λ2 ) = 1 is the eigenvalue of Tλ (λ2 , κ) = Kλ (κ)R0λ (κ), where κ = κλ (λ2 ) ≡ Mb is the bound state mass given in (1.5). In order for the analysis of [2, Lemmas 2.7–2.11] to go through, it suffices to show the two lemmas below Lemma 4.9. Let µ± be defined as in (4.8). Then, for some positive c, 1 1 1 1 1 −1 ≤ c max , , Tλ (0, ML ) . [w − Tλ (0, ML )] w w − 1 w w − 1 w − µ− (ML ) Remark 4.10. We recall, for κ = ML , that µ+ (ML ) = 1 and | µ− (ML ) |≤ c | λ |. Note that the ladder bound state satisfies αλ (0, ML ) = 1. Lemma 4.11. For κ such that | κ − ML |≤ 21 cλ2 , with a sufficiently small c > 0, set αλ (0, κ) = ρ (α + βγ )1/2 . Then, there exist positive constants c1 and c2 such that ∂αλ (0, κ) ≥ λc1 $(κ)3 ≥ c2 λ−2 , for $(κ) as defined in (4.13). ∂κ Proof of Lemma 4.9. Using the representation (4.12), the resolvent [w − Tλ (0, ML )]−1 is bounded using Lemma 4.5. & ' Proof of Lemma 4.11. From the representations for α, β and γ [see (4.9)], we see that they are all strictly positive as well as their κ derivatives. 
From the bounds of Lemma 4.4, ∂αλ it follows that ' (0, κ) ≥ λc1 $(κ)3 . & ∂κ 4.5. Complete model: Absence of bound states. Here, considering the complete model and using the strategy described in Sect. 4.4, we show the absence of mass spectrum in (0, 2Mλ ) in the two-particle sector for the repulsive case a2 > 0 and d < 3, as well as for d ≥ 3. A variant of the method is used to complete the proof that excludes spectrum between the bound state Mb and the two-particle threshold 2Mλ . We treat the repulsive case following the method of [19] and the attractive one following [2]. As before, the λ dependence is omitted unless deemed necessary. To control the spectrum, we treat D = D 0 + DKD 0 as a perturbation about the ladder approximation. For this, we set DL for the (λ dependent) D solution of the ladder BS equation, that is, DL = D 0 + λDL LD 0 .
(4.18)
Then, D is given by
$$D = D_L + D \lambda^2 K^{(2)} D_L = D_L \left[ 1 - \lambda^2 K^{(2)} D_L \right]^{-1}.$$
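The step from the implicit equation for $D$ to the closed form rests on the Neumann series $\left[1 - T\right]^{-1} = \sum_{n \ge 0} T^n$ with $T = \lambda^2 K^{(2)} D_L$, which converges when $T$ has norm less than one. The sketch below illustrates this in a finite-dimensional toy setting. The $2 \times 2$ matrices standing in for $D_L$ and $K^{(2)}$ are hypothetical numbers chosen only so that the series converges, and are not derived from the model.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(t, terms=200):
    """Partial sum 1 + T + ... + T^terms approximating (1 - T)^(-1)."""
    total = identity(len(t))
    power = identity(len(t))
    for _ in range(terms):
        power = matmul(power, t)
        total = matadd(total, power)
    return total


def full_from_ladder(d_ladder, k2, lam, terms=200):
    """D = D_L [1 - lam^2 K2 D_L]^(-1), summed as a Neumann series."""
    t = [[lam ** 2 * x for x in row] for row in matmul(k2, d_ladder)]
    return matmul(d_ladder, neumann_inverse(t, terms))


# Hypothetical finite-dimensional stand-ins for D_L and K^(2).
D_L = [[1.0, 0.2], [0.1, 0.5]]
K2 = [[0.3, -0.1], [0.2, 0.4]]
D = full_from_ladder(D_L, K2, lam=0.5)
print(D)
```

The matrix returned by `full_from_ladder` satisfies $D = D_L + D \lambda^2 K^{(2)} D_L$ up to rounding, which is the finite-dimensional analogue of the identity displayed above.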
In the repulsive case a2 > 0, we show that DL has no singularity in 0 < κ < 2Mλ , and that the bound state of mass Mb is isolated with isolation radius rb . We now show that there is no spectrum in (Mb + rb , 2Mb ) again by showing that K (2) DL has norm less than one in this interval. The starting point of the analysis is an explicit representation for DL . Using (4.6) and (4.17) in (4.18), and suppressing the κ dependence, gives q )X(p), DL (p, q) = r0 (p)δ(p + q) − 3λa2 r0 (q)Y (p) − 3λa2 r(q)E0 (
where X(p) =
(4.19)
DL (p, q)dq
,
Y (p) =
DL (p, q)E0 ( q )dq.
Multiplying (4.19) by the function 1 and E0 ( q ), integrating over q and solving for X(p) and Y (p), leads to 3λa2 DL (p, q) = r0 (p)δ(p + q) − + E0 ( q ) + 3λa2 E0 (p) D × (E0 (p) + E0 ( q )) α − γ − βE0 (p)E 0 ( q) ≡ r0 (p)δ(p + q) + c(p, q)r0 (p)r0 (q), where D = D(w = 1) [see (4.11)], that is
D = (1 − µ+ ) (1 − µ− ) = 1 + 3λa2 α + (3λa2 )2 α 2 − βγ .
(4.20)
To establish our result, it is sufficient to use the bound of Lemma 4.3 for the Hilbert– (2) Schmidt norm of λ2 Kλ R, with, following (4.14), R(κ) ≡ D˜ L (κ), and the bound (uniformly in p )
I ≡ R(p, q)f (p)g(q)dpdq ≤ O λ−1 w(q ) . (4.21)
In (4.21), suppressing the p and q dependence, (2)
f (p) = Kλ (κ, p + iδ, p); As the κ behavior of
J ≡
g(q) = w(q)−1 Bδ (q − q).
D˜ L (p, q)dpdq = β/D
(4.22)
(4.23)
is easily controlled (see Lemma 4.14 below), it is convenient to write I of (4.21) as I= r0 (p) [f (p)g(q) − f (0)g(0)] dpdq + [f (p) − f (0)] g(0)c(p, q)r0 (p)r0 (q)dpdq + f (p) [g(p) − g(0)] c(p, q)r0 (p)r0 (q)dpdq + Jf (0)g(0) ≡ X1 + X2 + X3 + X4 .
The terms X2 and X3 are bounded by a combination of the methods used for bounding X1 and X4 (see [2]). We now bound X4 . Following [19], we write [h(p) ≡ f (p)g(p)] − h(0) as h(p) − h(0) = h(p) − h(0, p) + h(0, p) − h(0) ≡ δh1 (p) + δh2 (p). (4.24) The δh1 (p) and δh2 (p) terms are bounded in the lemmas below. Lemma 4.12. Recalling the definitions given in (4.22) and (4.24), the bound
δh1 (p)r00 (κ, p)dp ≤ O λ−1 w
is satisfied. Proof. Write [see (2.2) and (4.3)] o −1
r00 (κ, p) = (iκp )
−1 p/2 m2 κ2 + + (p ) − iκp − 4 2 2 −1 2 p/2 m2 κ 0 2 0 . + + − (p ) + iκp − 4 2 2
0 2
0
The singularity in p0 at zero is cancelled and for the first (second) terms we make the contour shift p 0 → p 0 ± iδ0 , with δ0 < δ0 . Thus, the denominators become 2 2 2 2 p2 (p0 )2 ± 2i δ0 p 0 ∓ κp 0 + κδ0 − δ02 + κ4 + m2 + 2/ , which is zero for κ4 = m2 + 2 √ m2 p/ 0 2 + δ0 κ − δ0 > 2 for δ0 < κ. Thus, κ > 2m. Hence, we have no p singularity 0 3 and the rest of the bound is carried out using the 1/(p ) falloff of the term of r00 (κ, p) as in the proof of Lemma 4.5. & '
Lemma 4.13. The bound δh2 (p)r00 (κ, p)dp
≤ O λ−1 w(q ) holds. Proof. Writing h(0, p) − h(0) = p.∇ u h(0, u ) |u=0 +
1
t
dt
0
dt
0
∂2 h(0, t p) ∂t 2
and doing the p0 integration, we get
cλ (p) 2 pj pk Eλ (p) 4Eλ (p )2 − 4Eλ (0 )2 + 4Mλ2 − κ 2
1 0
t
dt 0
dt
∂2 h(0, u = t p)d p, ∂uj uk
where the p terms integrate to zero by parity. The integral over p is finite for 0 < κ < 2Mλ . Concerning the derivatives of h(0, p), with respect to p j , we see that they are (2) (2) bounded by Bδ and those of Kλ . Using the analyticity of Kλ , the derivatives are uniformly bounded. Proceeding as in the proof of Lemma 4.5, the bound is completed. ' &
Lemma 4.14. There exist positive constants c1 , c2 , c3 and c4 such that i) For a2 > 0, and uniformly for 0 < κ < 2Mλ , −1 J ≤ c1 $(κ) 1 + 3λa2 c2 $(κ) − c3 λ2 . ii) For a2 < 0, and uniformly for 2Mλ − λ5/2 κ < 2Mλ , λJ ≤ c4 . Proof. i) From (4.20) and (4.23), we get 2 −1 J = β 1 − µ+ 1 − µ− = 1 + 3λa2 α + 3λa2 α 2 − βγ . The first bound follows from α, β, γ < c$(κ) but α 2 −βγ < 0, by the Cauchy-Schwarz inequality. However, by separating out the constant term in the numerators of α, β and γ , the p/2 singularity in the denominator is cancelled and α 2 − βγ < c uniformly in 0 < κ < 2Mλ . For ii) see Sect. 3 of [2]. & ' 5. Concluding Remarks We have determined the low-lying e − m spectrum for dynamic stochastic lattice Landau–Ginzburg models with small polynomial interaction and such that the equilibrium state is in the single phase region. The determination of the spectrum for models with equilibrium states in the multi-phase region is of interest. Also the question of the effect of large noise on the spectrum is relevant and is currently being investigated [13]. References 1. Dimock, J.: A Cluster Expansion for Stochastic Lattice Fields. J. Stat. Phys. (1990)
58, 1181–1207
2. Dimock, J., Eckmann, J.-P.: On the Bound State in Weakly Coupled $\lambda(\varphi^6 - \varphi^4)_2$ Models. Commun. Math. Phys. 51, 41–54 (1976)
3. Duren, P.: Theory of $H^p$ Spaces. Pure and Applied Mathematics Vol. 38, New York: Academic Press, 1970
4. Glimm, J., Jaffe, A.: Quantum Physics: A Functional Integral Point of View. New York: Springer Verlag, 1986
5. Gammaitoni, L., Hanggi, P., Jung, P., Marchesoni, F.: Stochastic Resonance. Rev. Mod. Phys. 70, 223–287 (1998)
6. Hohenberg, P. C., Halperin, B. I.: Theory of Dynamic Critical Phenomena. Rev. Mod. Phys. 49, 435–479 (1977)
7. Horsthemke, W., Lefever, R.: Noise-induced Transitions. Berlin: Springer Verlag, 1984
8. Itzykson, C., Zuber, J.-B.: Quantum Field Theory. New York: McGraw-Hill, 1980
9. Jona-Lasinio, G., Mitter, P. K.: On the Stochastic Quantization of Field Theory. Commun. Math. Phys. 101, 409–436 (1985)
10. Jona-Lasinio, G., Sénéor, R.: Study of Stochastic Differential Equations by Constructive Methods I. J. Stat. Phys. 83, 1109–1148 (1996)
11. Kondratiev, Yu. G., Minlos, R. A.: One-Particle Subspaces in the Stochastic XY Model. J. Stat. Phys. 87, 613–642 (1997)
12. Minlos, R. A., Suhov, Y. M.: On the Spectrum of the Generator of an Infinite System of Interacting Diffusions. Commun. Math. Phys. 206, 463–489 (1999)
13. Pereira, E.: Noise Induced Bound States. Phys. Lett. A 282, 169–174 (2001)
14. Reed, M., Simon, B.: Analysis of Operators. Modern Methods of Mathematical Physics Vol. IV, New York: Academic Press, 1978
15. Schor, R., Barata, J. C. A., Faria da Veiga, P. A., Pereira, E.: Spectral Properties of Weakly Coupled Landau-Ginzburg Stochastic Models. Phys. Rev. E 59, Issue 3, 2689–2694 (1999)
16. Schor, R., O’Carroll, M.: Decay of the Bethe–Salpeter Kernel and Absence of Bound States for Lattice Classical Ferromagnetic Spin Systems at High Temperature. J. Stat. Phys. 99, 1207–1223 (2000); Transfer Matrix Spectrum and Bound States for Lattice Classical Ferromagnetic Spin Systems at High Temperature. J. Stat. Phys. 99, 1265–1279 (2000)
17. Simon, B.: Statistical Mechanics of Lattice Models. Princeton, NJ: Princeton University Press, 1994
18. Spencer, T.: The Decay of the Bethe–Salpeter Kernel in P(ϕ)2 Quantum Field Models. Commun. Math. Phys. 44, 143–164 (1975)
19. Spencer, T., Zirilli, F.: Scattering States and Bound States in λP(φ)2 Models. Commun. Math. Phys. 49, 1–16 (1976)
20. Spohn, H.: Large Scale Dynamics of Interacting Particles. Berlin: Springer Verlag, 1991
21. Stein, E.M.: Harmonic Analysis. Princeton, NJ: Princeton University Press, 1993
22. Zhizhina, E.A.: Two-Particle Spectrum of the Generator for Stochastic Model of Planar Rotators at High Temperature. J. Stat. Phys. 91, 343–366 (1998)
23. Zinn-Justin, J.: Quantum Field Theory and Critical Phenomena. Oxford: Oxford University Press, 1993

Communicated by Ya. G. Sinai
Commun. Math. Phys. 220, 403 – 428 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Global Properties of Gravitational Lens Maps in a Lorentzian Manifold Setting
Volker Perlick
Albert Einstein Institute, 14476 Golm, Germany. E-mail: [email protected]
Received: 16 October 2000 / Accepted: 18 January 2001
Abstract: In a general-relativistic spacetime (Lorentzian manifold), gravitational lensing can be characterized by a lens map, in analogy to the lens map of the quasi-Newtonian approximation formalism. The lens map is defined on the celestial sphere of the observer (or on part of it) and it takes values in a two-dimensional manifold representing a two-parameter family of worldlines. In this article we use methods from differential topology to characterize global properties of the lens map. Among other things, we use the mapping degree (also known as Brouwer degree) of the lens map as a tool for characterizing the number of images in gravitational lensing situations. Finally, we illustrate the general results with gravitational lensing (a) by a static string, (b) by a spherically symmetric body, (c) in asymptotically simple and empty spacetimes, and (d) in weakly perturbed Robertson–Walker spacetimes.
Permanent address: TU Berlin, Sekr. PN 7-1, 10623 Berlin, Germany. E-mail: [email protected]
1. Introduction Gravitational lensing is usually studied in a quasi-Newtonian approximation formalism which is essentially based on the assumptions that the gravitational fields are weak and that the bending angles are small, see Schneider, Ehlers and Falco [1] for a comprehensive discussion. This formalism has proven to be very powerful for the calculation of special models. In addition it has also been used for proving general theorems on the qualitative features of gravitational lensing such as the possible number of images in a multiple imaging situation. As to the latter point, it is interesting to inquire whether the results can be reformulated in a Lorentzian manifold setting, i.e., to inquire to what extent the results depend on the approximations involved. In the quasi-Newtonian approximation formalism one considers light rays in Euclidean 3-space that go from a fixed point (observer) to a point that is allowed to vary
over a 2-dimensional plane (source plane). The rays are assumed to be straight lines with the only exception that they may have a sharp bend at a 2-dimensional plane (deflector plane) that is parallel to the source plane. (There is also a variant with several deflector planes to model deflectors which are not “thin”.) For each concrete mass distribution, the deflecting angles are to be calculated with the help of Einstein’s field equation, or rather of those remnants of Einstein’s field equation that survive the approximations involved. Hence, at each point of the deflector plane the deflection angle is uniquely determined by the mass distribution. As a consequence, following light rays from the observer into the past always gives a unique “lens map” from the deflector plane to the source plane. There is “multiple imaging” whenever this lens map fails to be injective. In this article we want to inquire whether an analogous lens map can be introduced in a spacetime setting, without using quasi-Newtonian approximations. According to the rules of general relativity, a spacetime is to be modeled by a Lorentzian manifold (M, g) and the light rays are to be modeled by the lightlike geodesics in M. We shall assume that (M, g) is time-oriented, i.e., that the timelike and lightlike vectors can be distinguished into future-pointing and past-pointing in a globally consistent way. To define a general lens map, we have to fix a point p ∈ M as the event where the observation takes place and we have to look for an analogue of the deflector plane and for an analogue of the source plane. As to the deflector plane, there is an obvious candidate, namely the celestial sphere Sp at p. This can be defined as the set of all one-dimensional lightlike subspaces of the tangent space Tp M or, equivalently, as the totality of all light rays issuing from p into the past. As to the source plane, however, there is no natural candidate. 
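As a reminder of how multiple imaging shows up in the quasi-Newtonian picture referred to above, the following minimal sketch evaluates a one-dimensional point-mass lens map. The lens equation β = θ − θ_E²/θ in scaled angular units, and the names `lens_map`, `point_lens_images` and `theta_E`, are assumptions made for illustration; they are the standard textbook model and are not taken from this article.

```python
import math


def lens_map(theta, theta_E=1.0):
    """Assumed point-mass lens map: image position theta -> source position beta."""
    return theta - theta_E ** 2 / theta


def point_lens_images(beta, theta_E=1.0):
    """Both solutions theta of beta = theta - theta_E**2 / theta."""
    disc = math.sqrt(beta * beta + 4.0 * theta_E * theta_E)
    return (0.5 * (beta + disc), 0.5 * (beta - disc))


# Two different image positions are mapped to the same source position,
# so this lens map is not injective: a multiple imaging situation.
images = point_lens_images(0.5)
for theta in images:
    print(theta, lens_map(theta))
```

In this toy model every source position off the axis has exactly two images, one on either side of the deflector.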
Following Frittelli, Newman and Ehlers [2–4], one might consider any timelike 3-dimensional submanifold T of the spacetime manifold as a substitute for the source plane. The idea is to view such a submanifold as ruled by worldlines of light sources. To make this more explicit, one could restrict to the case that T is a fiber bundle over a two-dimensional manifold N, with fibers timelike and diffeomorphic to R. Each fiber is to be interpreted as the worldline of a light source, and the set N may be identified with the set of all those worldlines. In this situation we wish to define a lens map fp : Sp −→ N by extending each light ray from p into the past until it meets T and then projecting onto N. In general, this prescription does not give a well-defined map since neither existence nor uniqueness of the target value is guaranteed. As to existence, there might be some past-pointing lightlike geodesics from p that never reach T. As to uniqueness, one and the same light ray might intersect T several times. The uniqueness problem could be circumvented by considering, on each past-pointing lightlike geodesic from p, only the first intersection with T, thereby willfully excluding some light rays from the discussion. This amounts to ignoring every image that is hidden behind some other image of a light source with a worldline ξ ∈ N. For the existence problem, however, there is no general solution. Unless one restricts to special situations, the lens map will be defined only on some subset Dp of Sp (which may even be empty). Also, one would like the lens map to be differentiable or at least continuous. This is guaranteed if one further restricts the domain Dp of the lens map by considering only light rays that meet T transversely. Following this line of thought, we give a precise definition of lens maps in Sect. 2.
We will be a little bit more general than outlined above insofar as the source surface need not be timelike; we also allow for the limiting case of a lightlike source surface. This has the advantage that we may choose the source surface “at infinity” in the case of an asymptotically simple and empty spacetime. In Sect. 3 we briefly discuss some general properties of the caustic of the lens map. In Sect. 4 we introduce the mapping degree (Brouwer degree) of the lens map as an important tool from differential topology.
This will then give us some theorems on the possible number of images in gravitational lensing situations, in particular in the case that we have a “simple lensing neighborhood”. The latter notion will be introduced and discussed in Sect. 5. We conclude by applying the general results to some examples in Sect. 6. Our investigation will be purely geometrical in the sense that we discuss the influence of the spacetime geometry on the propagation of light rays but not the influence of the matter distribution on the spacetime geometry. In other words, we use only the geometrical background of general relativity but not Einstein’s field equation. For this reason the “deflector”, i.e., the matter distribution that is the cause of gravitational lensing, never explicitly appears in our investigation. However, information on whether the deflectors are transparent or non-transparent will implicitly enter into our considerations. 2. Definition of the Lens Map As a preparation for precisely introducing the lens map in a spacetime setting, we first specify some terminology. By a manifold we shall always mean what is more fully called a “real, finite-dimensional, Hausdorff, second countable (and thus paracompact) C∞-manifold without boundary”. Whenever we have a C∞ vector field X on a manifold M, we may consider two points in M as equivalent if they lie on the same integral curve of X. We shall denote the resultant quotient space, which may be identified with the set of all integral curves of X, by M/X. We call X a regular vector field if M/X can be given the structure of a manifold in such a way that the natural projection πX : M −→ M/X becomes a C∞-submersion. It is easy to construct examples of non-regular vector fields. E.g., if X has no zeros and is defined on Rⁿ \ {0}, then M/X cannot satisfy the Hausdorff property, so it cannot be a manifold according to our terminology.
Palais [5] has proven a useful result which, in our terminology, can be phrased in the following way. If none of X’s integral curves is closed or almost closed, and if M/X satisfies the Hausdorff property, then X is regular. We are going to use the following terminology. A Lorentzian manifold is a manifold M together with a C∞ metric tensor field g of Lorentzian signature (+ · · · + −). A Lorentzian manifold is time-orientable if the set of all timelike vectors {Z ∈ TM | g(Z, Z) < 0} has exactly two connected components. Choosing one of those connected components as future-pointing defines a time-orientation for (M, g). A spacetime is a connected 4-dimensional time-orientable Lorentzian manifold together with a time-orientation. We are now ready to define what we will call a “source surface” in a spacetime. This will provide us with the target space for lens maps.
Definition 1. (T, W) is called a source surface in a spacetime (M, g) if
(a) T is a 3-dimensional C∞ submanifold of M;
(b) W is a nowhere vanishing regular C∞ vector field on T which is everywhere causal, g(W, W) ≤ 0, and future-pointing;
(c) πW : T −→ N = T/W is a fiber bundle with fiber diffeomorphic to R and the quotient manifold N = T/W is connected and orientable.
We want to interpret the integral curves of W as the worldlines of light sources. Thus, one should assume that they are not only causal but even timelike, g(W, W) < 0, since a light source should move at subluminal velocity. For technical reasons, however, we
allow for the possibility that an integral curve of W is lightlike (everywhere or at some points), because such curves may appear as (C¹-)limits of timelike curves. This will give us the possibility to apply the resulting formalism to asymptotically simple and empty spacetimes in a convenient way, see Subsect. 6.2 below. Actually, the causal character of W will have little influence upon the results we want to establish. What really matters is a transversality condition that enters into the definition of the lens map below. Please note that, in the situation of Def. 1, the bundle πW : T −→ N is necessarily trivializable, i.e., T ≅ N × R. To prove this, let us assume that the flow of W is defined on all of R × T, so it makes πW : T −→ N into a principal fiber bundle. (This is no restriction of generality since it can always be achieved by multiplying W with an appropriate function. This function can be determined in the following way. Owing to a famous theorem of Whitney [6], also see Hirsch [7], p. 55, paracompactness guarantees that T can be embedded as a closed submanifold into Rⁿ for some n. Pulling back the Euclidean metric gives a complete Riemannian metric h on T and the flow of the vector field h(W, W)^(−1/2) W is defined on all of R × T, cf. Abraham and Marsden [8], Prop. 2.1.21.) Then the result follows from the well known facts that any fiber bundle whose typical fiber is diffeomorphic to Rⁿ admits a global section (see, e.g., Kobayashi and Nomizu [9], p. 58), and that a principal fiber bundle is trivializable if and only if it admits a global section (see again [9], p. 57). Also, it is interesting to note the following. If T is any 3-dimensional submanifold of M that is foliated into timelike curves, then time orientability guarantees that these are the integral curves of a timelike vector field W.
If we assume, in addition, that T contains no closed timelike curves, then it can be shown that πW : T −→ N is necessarily a fiber bundle with fiber diffeomorphic to R, provided N satisfies the Hausdorff property, see Harris [10], Theorem 2. This shows that there is little room for relaxing the conditions of Def. 1. Choosing a source surface in a spacetime will give us the target space N = T/W for the lens map. To specify the domain of the lens map, we consider, at any point p ∈ M, the set Sp of all lightlike directions at p, i.e., the set of all one-dimensional lightlike subspaces of TpM. We shall refer to Sp as the celestial sphere at p. This is justified since, obviously, Sp is in natural one-to-one relation with the set of all light rays arriving at p. As it is more convenient to work with vectors rather than with directions, we shall usually represent Sp as a submanifold of TpM. To that end we fix a future-pointing timelike vector Vp in the tangent space TpM. The vector Vp may be interpreted as the 4-velocity of an observer at p. We now consider the set
\[ S_p = \{\, Y_p \in T_pM \mid g(Y_p, Y_p) = 0 \ \text{and} \ g(Y_p, V_p) = 1 \,\} . \tag{1} \]
It is an elementary fact that (1) defines an embedded submanifold of TpM which is diffeomorphic to the standard 2-sphere S². As indicated by our notation, the set (1) can be identified with the celestial sphere at p, just by relating each vector to the direction spanned by it. Representation (1) of the celestial sphere gives a convenient way of representing the light rays through p. We only have to assign to each Yp ∈ Sp the lightlike geodesic s −→ expp(sYp), where expp : Wp ⊆ TpM −→ M denotes the exponential map at the point p of the Levi-Civita connection of the metric g. Please note that this geodesic is past-pointing, because Vp was chosen future-pointing, and that it passes through p at the parameter value s = 0. The lens map is defined in the following way.
After fixing a source surface (T, W) and choosing a point p ∈ M, we denote by Dp ⊆ Sp the subset of all lightlike directions at p such that the geodesic to which this direction is tangent meets T (at least once) if sufficiently extended to the past, and if at the first intersection point q with T this geodesic is transverse to T. By projecting q to N = T/W we get the lens map fp : Dp −→ N = T/W, see Fig. 1.
[Figure omitted; labels in the figure: p, Yp, q, T, W, πW, fp(Yp), N]
Fig. 1. Illustration of the lens map
If we use the representation (1) for Sp, the definition of the lens map can be given in more formal terms in the following way.
Definition 2. Let (T, W) be a source surface in a spacetime (M, g). Then, for each p ∈ M, the lens map fp : Dp −→ N = T/W is defined in the following way. In the notation of Eq. (1), let Dp be the set of all Yp ∈ Sp such that there is a real number wp(Yp) > 0 with the properties
(a) sYp is in the maximal domain of the exponential map for all s ∈ [0, wp(Yp)];
(b) the curve s −→ expp(sYp) intersects T at the value s = wp(Yp) transversely;
(c) expp(sYp) ∉ T for all s ∈ [0, wp(Yp)[.
This defines a map wp : Dp −→ R. The lens map at p is then, by definition, the map
\[ f_p : D_p \to N = T/W , \qquad f_p(Y_p) = \pi_W\big( \exp_p( w_p(Y_p)\, Y_p ) \big) . \tag{2} \]
Here πW : T −→ N denotes the natural projection. The transversality condition in part (b) of Def. 2 guarantees that the domain Dp of the lens map is an open subset of Sp . The case Dp = ∅ is, of course, not excluded. In particular, Dp = ∅ whenever p ∈ T , owing to part (c) of Def. 2.
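Definitions 1 and 2 can be made concrete in the simplest possible setting. The sketch below is an illustration under assumptions that are not part of this article: the spacetime is Minkowski space in coordinates (x, y, z, t), the source surface is the timelike cylinder T = {|x| = radius} × R with W = ∂/∂t, so that N is a 2-sphere and πW forgets the time coordinate, and the observer sits inside the cylinder. The names `celestial_vector`, `lens_map`, `ETA` and `V_P` are hypothetical.

```python
import numpy as np

# Minkowski metric in coordinates (x, y, z, t), signature (+ + + -).
ETA = np.diag([1.0, 1.0, 1.0, -1.0])

# Observer 4-velocity V_p: future-pointing and timelike, g(V_p, V_p) = -1.
V_P = np.array([0.0, 0.0, 0.0, 1.0])


def g(a, b):
    """Minkowski inner product g(a, b)."""
    return float(a @ ETA @ b)


def celestial_vector(theta, phi):
    """Element Y_p of the set (1): g(Y, Y) = 0 and g(Y, V_P) = 1."""
    n = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    return np.append(n, -1.0)  # past-pointing lightlike vector


def lens_map(p, Y, radius=10.0):
    """Return (w_p(Y), f_p(Y)) for the assumed source surface |x| = radius."""
    x0, n = p[:3], Y[:3]
    b = float(x0 @ n)
    c = float(x0 @ x0) - radius ** 2
    if c >= 0.0:
        raise ValueError("this sketch only covers observers inside the cylinder")
    w = -b + np.sqrt(b * b - c)  # unique positive root of |x0 + s n| = radius
    q = p + w * Y                # q = exp_p(w Y); geodesics are straight lines here
    return w, q[:3]              # pi_W drops the time coordinate


Y = celestial_vector(0.7, 1.3)
w0, xi0 = lens_map(np.zeros(4), Y)
w1, xi1 = lens_map(np.array([3.0, -2.0, 1.0, 0.0]), Y)
print(g(Y, Y), g(Y, V_P), w0, np.linalg.norm(xi1))
```

For an observer inside the cylinder every past-pointing light ray leaves the ball |x| < radius exactly once and non-tangentially, so in this toy example Dp = Sp.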
Moreover, the transversality condition in part (b) of Definition 2, in combination with the implicit function theorem, makes sure that the map wp : Dp −→ R is a C ∞ map. As the exponential map of a C ∞ metric is again C ∞ , and πW is a C ∞ submersion by assumption, this proves the following. Proposition 1. The lens map is a C ∞ map. Please note that without the transversality condition the lens map need not even be continuous. Although our Def. 2 made use of the representation (1), which refers to a timelike vector Vp , the lens map is, of course, independent of which future-pointing Vp has been chosen. We decided to index the lens map only with p although, strictly speaking, it depends on T , on W , and on p. Our philosophy is to keep a source surface (T , W ) fixed, and then to consider the lens map for all points p ∈ M. In view of gravitational lensing, the lens map admits the following interpretation. For ξ ∈ N , each point Yp ∈ Dp with fp (Yp ) = ξ corresponds to a past-pointing lightlike geodesic from p to the worldline ξ in M, i.e., it corresponds to an image at the celestial sphere of p of the light source with worldline ξ . If fp is not injective, we are in a multiple imaging situation. The converse need not be true as the lens map does not necessarily cover all images. There might be a past-pointing lightlike geodesic from p reaching ξ after having met T before, or being tangential to T on its arrival at ξ . In either case, the corresponding image is ignored by the lens map. The reader might be inclined to view this as a disadvantage. However, in Sect. 6 below we discuss some situations where the existence of such additional light rays can be excluded (e.g., asymptotically simple and empty spacetimes) and situations where it is desirable, on physical grounds, to disregard such additional light rays (e.g., weakly perturbed Robertson–Walker spacetimes with compact spatial sections). 
It was already mentioned that the domain Dp of the lens map might be empty; this is, of course, the worst case that could happen. The best case is that the domain is all of the celestial sphere, Dp = Sp . We shall see in the following sections that many interesting results are true just in this case. However, there are several cases of interest where Dp is a proper subset of Sp . If the domain of the lens map fp is the whole celestial sphere, none of the light rays issuing from p into the past is blocked or trapped before it reaches T . In view of applications to gravitational lensing, this excludes the possibility that these light rays meet a non-transparent deflector. In other words, it is a typical feature of gravitational lensing situations with non-transparent deflectors that Dp is not all of Sp . Two simple examples, viz., a non-transparent string and a non-transparent spherical body, will be considered in Subsect. 6.1 below. 3. Regular and Critical Values of the Lens Map Please recall that, for a differentiable map F : M1 −→ M2 between two manifolds, Y ∈ M1 is called a regular point of F if the differential TY F : TY M1 −→ TF (Y ) M2 has maximal rank, otherwise Y is called a critical point. Moreover, ξ ∈ M2 is called a regular value of F if all Y ∈ F −1 (ξ ) are regular points, otherwise ξ is called a critical value. Please note that, according to this definition, any ξ ∈ M2 that is not in the image of F is regular. The well-known (Morse-)Sard theorem (see, e.g., Hirsch [7], p. 69) says that the set of regular values of F is residual (i.e., it contains the intersection of countably many sets that are open and dense in M2 ) and thus dense in M2 and the critical values of F make up a set of measure zero in M2 .
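The notions of regular point, critical point and critical value can be checked by hand on a model map. The following sketch uses the planar map F(x, y) = (x³ + xy, y), which is chosen here purely for illustration and does not appear in this article. Its critical points form the parabola y = −3x², and the corresponding critical values lie on the cusp curve 27u² + 4v³ = 0, a set of measure zero in the plane, in line with the Sard theorem.

```python
import numpy as np


def F(x, y):
    """Illustrative model map F(x, y) = (x^3 + x*y, y) of the plane to itself."""
    return x ** 3 + x * y, y


def jacobian_det(x, y):
    """det DF, where DF = [[3x^2 + y, x], [0, 1]]."""
    return 3.0 * x ** 2 + y


def is_regular_point(x, y, tol=1e-12):
    """A point is regular if DF has maximal rank 2, i.e. det DF != 0."""
    return abs(jacobian_det(x, y)) > tol


# Critical points: the parabola y = -3 x^2, sampled at 401 points.
xs = np.linspace(-2.0, 2.0, 401)
critical_values = [F(x, -3.0 * x ** 2) for x in xs]

# Every sampled critical value (u, v) satisfies 27 u^2 + 4 v^3 = 0
# up to rounding errors.
residual = max(abs(27.0 * u ** 2 + 4.0 * v ** 3) for u, v in critical_values)
print(is_regular_point(1.0, 0.0), is_regular_point(1.0, -3.0), residual)
```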
For the lens map fp : Dp −→ N, we call the set
\[ \mathrm{Caust}(f_p) = \{\, \xi \in N \mid \xi \ \text{is a critical value of} \ f_p \,\} \tag{3} \]
the caustic of fp . The Sard theorem then implies the following result. Proposition 2. The caustic Caust(fp ) is a set of measure zero in N and its complement N \ Caust(fp ) is residual and thus dense in N . Please note that Caust(fp ) need not be closed in N . Counter-examples can be constructed easily by starting with situations where the caustic is closed and then excising points from spacetime. For lens maps defined on the whole celestial sphere, however, we have the following result. Proposition 3. If Dp = Sp , the caustic Caust(fp ) is compact in N . This is an obvious consequence of the fact that Sp is compact and that fp and its first derivative are continuous. As the domain and the target space of fp have the same dimension, Yp ∈ Dp is a regular point of fp if and only if the differential TYp fp : TYp Sp −→ Tfp (Yp ) N is an isomorphism. In this case fp maps a neighborhood of Yp diffeomorphically onto a neighborhood of fp (Yp ). The differential TYp fp may be either orientation-preserving or orientation-reversing. To make this notion precise we have to choose an orientation for Sp and an orientation for N . For the celestial sphere Sp it is natural to choose the orientation according to which the origin of the tangent space Tp M is to the inner side of Sp . The target manifold N is orientable by assumption, but in general there is no natural choice for the orientation. Clearly, choosing an orientation for N fixes an orientation for T , because the vector field W gives us an orientation for the fibers. We shall say that the orientation of N is adapted to some point Yp ∈ Dp if the geodesic with initial vector Yp meets T at the inner side. If Dp is connected, the orientation of N that is adapted to some Yp ∈ Dp is automatically adapted to all other elements of Dp . Using this terminology, we may now introduce the following definition. Definition 3. 
A regular point Yp ∈ Dp of the lens map fp is said to have even parity (or odd parity, respectively) if TYp fp is orientation-preserving (or orientation-reversing, respectively) with respect to the natural orientation on Sp and the orientation adapted to Yp on N. For a regular value ξ ∈ N of the lens map, we denote by n+(ξ) (or n−(ξ), respectively) the number of elements in fp⁻¹(ξ) with even parity (or odd parity, respectively). Please note that n+(ξ) and n−(ξ) may be infinite, see the Schwarzschild example in Subsect. 6.1 below. A criterion for n±(ξ) to be finite will be given in Prop. 8 below. Definition 3 is relevant for gravitational lensing in the following sense. The assumption that Yp is a regular point of fp implies that an observer at p sees a neighborhood of ξ = fp(Yp) in N as a neighborhood of Yp at his or her celestial sphere. If we compare the case that Yp has odd parity with the case that Yp has even parity, then the appearance of the neighborhood in the first case is the mirror image of its appearance in the second case. This difference is observable for a light source that is surrounded by some irregularly shaped structure, e.g. a galaxy with curved jets or with lobes. If ξ is a regular value of fp, it is obvious that the points in fp⁻¹(ξ) are isolated, i.e., any Yp in fp⁻¹(ξ) has a neighborhood in Dp that contains no other point in fp⁻¹(ξ). This follows immediately from the fact that fp maps a neighborhood of Yp diffeomorphically
onto its image. In the next section we shall formulate additional assumptions such that the set fp⁻¹(ξ) is finite, i.e., such that the numbers n±(ξ) introduced in Def. 3 are finite. It is the main purpose of the next section to demonstrate that then the difference n+(ξ) − n−(ξ) has some topological invariance properties. As a preparation for that we notice the following result which is an immediate consequence of the fact that the lens map is a local diffeomorphism near each regular point. Proposition 4. n+ and n− are constant on each connected component of fp(Dp) \ Caust(fp). Hence, along any continuous curve in fp(Dp) that does not meet the caustic of the lens map, the numbers n+ and n− remain constant, i.e., the observer at p sees the same number of images for all light sources on this curve. If a curve intersects the caustic, the number of images will jump. In the next section we shall prove that n+ and n− always jump by the same amount (under conditions making sure that these numbers are finite), i.e., the total number of images always jumps by an even number. This is well known in the quasi-Newtonian approximation formalism, see, e.g., Schneider, Ehlers and Falco [1], Sect. 6. If Caust(fp) is empty, transversality guarantees that fp(Dp) is open in N and thus a manifold. Proposition 4 implies that, in this case, fp gives a C∞ covering map from Dp onto fp(Dp). As a C∞ covering map onto a simply connected manifold must be a global diffeomorphism, this implies the following result. Proposition 5. Assume that Caust(fp) is empty and that fp(Dp) is simply connected. Then fp gives a global diffeomorphism from Dp onto fp(Dp). In other words, the formation of a caustic is necessary for multiple imaging provided that fp(Dp) is simply connected. In Subsect. 6.1 below we shall consider the spacetime of a non-transparent string. This will demonstrate that the conclusion of Prop. 5 is not true without the assumption of fp(Dp) being simply connected. In the rest of this section we want to relate the caustic of the lens map to the caustic of the past light cone of p. The past light cone of p can be defined as the image set in M of the map
\[ F_p : (s, Y_p) \longmapsto \exp_p(s Y_p) \tag{4} \]
considered on its maximal domain in ] 0 , ∞ [ × Sp , and its caustic can be defined as the set of critical values of Fp . In other words, q ∈ M is in the caustic of the past light cone of p if and only if there is an s0 ∈ ] 0 , ∞ [ and a Yp ∈ Sp such that the differential T(s0 ,Yp ) Fp has rank k < 3. In that case one says that the point q = expp (s0 Yp ) is conjugate to p along the geodesic s −→ expp (sYp ), and one calls the number m = 3 − k the multiplicity of this conjugate point. As Fp ( · , Yp ) is always an immersion, the multiplicity can take the values 1 and 2 only. (This formulation is equivalent to the definition of conjugate points and their multiplicities in terms of Jacobi vector fields which may be more familiar to the reader.) It is well known, but far from trivial, that along every lightlike geodesic conjugate points are isolated. Hence, in a compact parameter interval there are only finitely many points that are conjugate to a fixed point p. A proof can be found, e.g., in Beem, Ehrlich and Easley [11], Theorem 10.77. After these preparations we are now ready to establish the following proposition. We use the notation introduced in Def. 2.
Proposition 6. An element Yp ∈ Dp is a regular point of the lens map if and only if the point expp(wp(Yp)Yp) is not conjugate to p along the geodesic s −→ expp(sYp). A regular point Yp ∈ Dp has even parity (or odd parity, respectively) if and only if the number of points conjugate to p along the geodesic [0, wp(Yp)] −→ M, s −→ expp(sYp) is even (or odd, respectively). Here each conjugate point is to be counted with its multiplicity.
Proof. In terms of the function (4), the lens map can be written in the form
\[ f_p(Y_p) = \pi_W\big( F_p( w_p(Y_p), Y_p ) \big) . \tag{5} \]
As s −→ Fp(s, Yp) is an immersion transverse to T at s = wp(Yp) and πW is a submersion, the differential of fp at Yp has rank 2 if and only if the differential of Fp at (wp(Yp), Yp) has rank 3. This proves the first claim. For proving the second claim define, for each s ∈ [0, wp(Yp)], a map
\[ \Phi_s : T_{Y_p} S_p \to T_{f_p(Y_p)} N \tag{6} \]
by applying to each vector in TYp Sp the differential T(s,Yp)Fp, parallel-transporting the result along the geodesic Fp( · , Yp) to the point q = Fp(wp(Yp), Yp) and then projecting down to Tfp(Yp)N. In the last step one uses the fact that, by transversality, any vector in TqM can be uniquely decomposed into a vector tangent to T and a vector tangent to the geodesic Fp( · , Yp). For s = wp(Yp), this map Φs gives the differential of the lens map. We now choose a basis in TYp Sp and a basis in Tfp(Yp)N, thereby representing the map Φs as a (2 × 2)-matrix. We choose the first basis right-handed with respect to the natural orientation on Sp and the second basis right-handed with respect to the orientation on N that is adapted to Yp. Then det(Φ0) is positive as the parallel transport gives an orientation-preserving isomorphism. The function s −→ det(Φs) has a simple zero whenever Fp(s, Yp) is a conjugate point of multiplicity one and it has a double zero whenever Fp(s, Yp) is a conjugate point of multiplicity two. Hence, the sign of det(Φs) at s = wp(Yp) can be determined by counting the conjugate points.
This result implies that ξ ∈ N is a regular value of the lens map fp whenever the worldline ξ does not pass through the caustic of the past light cone of p. The relation between parity and the number of conjugate points is geometrically rather evident because each conjugate point is associated with a “crossover” of infinitesimally neighboring light rays. 4. The Mapping Degree of the Lens Map The mapping degree (also known as Brouwer degree) is one of the most powerful tools in differential topology. In this section we want to investigate what kind of information could be gained from the mapping degree of the lens map, provided it can be defined. For the reader’s convenience we briefly summarize the definition and main properties of the mapping degree, following closely Choquet-Bruhat, DeWitt-Morette, and Dillard-Bleick [12], pp. 477.
For a more abstract approach, using homology theory, the reader may consult Dold [13], Spanier [14] or Bredon [15]. In this article we shall not use homology theory with the exception of the proof of Prop. 11. The definition of the mapping degree is based on the following observation.
Proposition 7. Let F : D̄ ⊆ M1 −→ M2 be a continuous map, where M1 and M2 are oriented connected manifolds of the same dimension, D is an open subset of M1 with compact closure D̄, and F|D is a C∞ map. (Actually, C¹ would do.) Then for every ξ ∈ M2 \ F(∂D) which is a regular value of F|D, the set F⁻¹(ξ) is finite.
Proof. By contradiction, let us assume that there is a sequence (yi)i∈N with pairwise different elements in F⁻¹(ξ). By compactness of D̄, we can choose an infinite subsequence of (yi)i∈N that converges towards some point y∞ ∈ D̄. By continuity of F, F(y∞) = ξ, so the hypotheses of the proposition imply that y∞ ∉ ∂D. As a consequence, y∞ is a regular point of F|D, so it must have an open neighborhood in D that does not contain any other element of F⁻¹(ξ). This contradicts the fact that a subsequence of (yi)i∈N converges towards y∞.
If we have a map F that satisfies the hypotheses of Prop. 7, we can thus define, for every ξ ∈ M2 \ F(∂D) which is a regular value of F|D,
\[ \deg(F, \xi) = \sum_{y \in F^{-1}(\xi)} \mathrm{sgn}(y) , \tag{7} \]
where sgn(y) is defined to be +1 if the differential TyF preserves orientation and −1 if TyF reverses orientation. If F⁻¹(ξ) is the empty set, the right-hand side of (7) is set equal to zero. The number deg(F, ξ) is called the mapping degree of F at ξ. Roughly speaking, deg(F, ξ) tells how often the image of F covers the point ξ, counting each “layer” positive or negative depending on orientation. The mapping degree has the following properties (for proofs see Choquet-Bruhat, DeWitt-Morette, and Dillard-Bleick [12], pp. 477).
Property A. deg(F, ξ) = deg(F, ξ′) whenever ξ and ξ′ are in the same connected component of M2 \ F(∂D).
Property B. deg(F, ξ) = deg(F′, ξ) whenever F and F′ are homotopic, i.e., whenever there is a continuous map Φ : [0, 1] × D̄ −→ M2, (s, y) −→ Φs(y) with Φ0 = F and Φ1 = F′ such that deg(Φs, ξ) is defined for all s ∈ [0, 1].
Property A can be used to extend the definition of deg(F, ξ) to the non-regular values ξ ∈ M2 \ F(∂D). Given the fact that, by the Sard theorem, the regular values are dense in M2, this can be done just by continuous extension. Property B can be used to extend the definition of deg(F, ξ) to continuous maps F : D̄ −→ M2 which are not necessarily differentiable on D. Given the fact that the C∞ maps are dense in the continuous maps with respect to the C⁰-topology, this can be done again just by continuous extension. We now apply these general results to the lens map fp : Dp −→ N. In the case Dp ≠ Sp it is necessary to extend the domain of the lens map onto a compact set to define the degree of the lens map. We introduce the following definition.
Definition 4. A map f̄p : D̄p ⊆ M1 −→ M2 is called an extension of the lens map fp : Dp −→ N if
(a) M1 is an orientable manifold that contains Dp as an open submanifold;
(b) M2 is an orientable manifold that contains N as an open submanifold;
(c) the closure D̄p of Dp in M1 is compact;
(d) f̄p is continuous and the restriction of f̄p to Dp is equal to fp.
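The signed count in Eq. (7) and the invariance stated in Property A can be tried out on a one-dimensional toy map. The sketch below takes F(x) = x³ − 3x on D = ]−3, 3[, a choice made here only for illustration and not taken from this article. Its critical values are ±2 and F(∂D) = {−18, 18}, so all ξ with |ξ| < 18 lie in one connected component of the complement of F(∂D). The function names are hypothetical.

```python
import numpy as np


def signed_preimages(xi, a=-3.0, b=3.0):
    """Preimages of xi under F(x) = x^3 - 3x in ]a, b[, each with sgn F'(x)."""
    roots = np.roots([1.0, 0.0, -3.0, -xi])
    points = sorted(r.real for r in roots
                    if abs(r.imag) < 1e-9 and a < r.real < b)
    # F'(x) = 3x^2 - 3; its sign plays the role of sgn(y) in Eq. (7).
    return [(x, 1 if 3.0 * x * x - 3.0 > 0.0 else -1) for x in points]


def degree(xi):
    """Mapping degree deg(F, xi) as the signed sum of Eq. (7)."""
    return sum(sign for _, sign in signed_preimages(xi))


def counts(xi):
    """Return (n_plus, n_minus) for a regular value xi."""
    signs = [sign for _, sign in signed_preimages(xi)]
    return signs.count(1), signs.count(-1)


# xi = 0.5 lies between the critical values -2 and 2, xi = 5 lies outside.
for xi in (0.5, 5.0):
    print(xi, counts(xi), degree(xi))
```

Crossing the critical value 2 changes the number of preimages from three to one, but one preimage of each sign is lost, so the signed sum stays the same.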
If the lens map is defined on the whole celestial sphere, $\mathcal{D}_p = \mathcal{S}_p$, then the lens map is an extension of itself, $\overline{f_p} = f_p$, with $M_1 = \mathcal{S}_p$ and $M_2 = \mathcal{N}$. If $\mathcal{D}_p \neq \mathcal{S}_p$, one may try to continuously extend $f_p$ onto the closure of $\mathcal{D}_p$ in $\mathcal{S}_p$, thereby getting an extension with $M_1 = \mathcal{S}_p$ and $M_2 = \mathcal{N}$. If this does not work, one may try to find some other extension. The string spacetime in Subsect. 6.1 below will provide us with an example where an extension exists although $f_p$ cannot be continuously extended from $\mathcal{D}_p$ onto its closure in $\mathcal{S}_p$. The spacetime around a spherically symmetric body with $R_o < 3m$ will provide us with an example where the lens map admits no extension at all, see Subsect. 6.1 below. Applying Prop. 7 to the case $F = \overline{f_p}$ immediately gives the following result.

Proposition 8. If the lens map $f_p : \mathcal{D}_p \longrightarrow \mathcal{N}$ admits an extension $\overline{f_p} : \overline{\mathcal{D}_p} \subseteq M_1 \longrightarrow M_2$, then for all regular values $\xi \in \mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$ the set $f_p^{-1}(\xi)$ is finite, so the numbers $n_+(\xi)$ and $n_-(\xi)$ introduced in Def. 3 are finite.

If $\overline{f_p}$ is an extension of the lens map $f_p$, the number $\deg(\overline{f_p}, \xi)$ is a well-defined integer for all $\xi \in \mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$, provided that we have chosen an orientation on $M_1$ and on $M_2$. The number $\deg(\overline{f_p}, \xi)$ changes sign if we change the orientation on $M_1$ or on $M_2$. This sign ambiguity can be removed if $\mathcal{D}_p$ is connected. Then we know from the preceding section that $\mathcal{N}$ admits an orientation that is adapted to all $Y_p \in \mathcal{D}_p$. As $\mathcal{N}$ is connected, this determines an orientation for $M_2$. Moreover, the natural orientation on $\mathcal{S}_p$ induces an orientation on $\mathcal{D}_p$ which, for $\mathcal{D}_p$ connected, gives an orientation for $M_1$. In the rest of this paper we shall only be concerned with the situation that $\mathcal{D}_p$ is connected, and we shall always tacitly assume that the orientations have been chosen as indicated above, thereby fixing the sign of $\deg(\overline{f_p}, \xi)$. Now comparison of (7) with Def. 3 shows that
$$\deg(\overline{f_p}, \xi) = n_+(\xi) - n_-(\xi) \tag{8}$$
for all regular values in $\mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$. Owing to Property A, this has the following consequence.

Proposition 9. Assume that $\mathcal{D}_p$ is connected and that the lens map admits an extension $\overline{f_p} : \overline{\mathcal{D}_p} \subseteq M_1 \longrightarrow M_2$. Then $n_+(\xi) - n_-(\xi) = n_+(\xi') - n_-(\xi')$ for any two regular values $\xi$ and $\xi'$ which are in the same connected component of $\mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$. In particular, $n_+(\xi) + n_-(\xi)$ is odd if and only if $n_+(\xi') + n_-(\xi')$ is odd.

We know already from Prop. 4 that the numbers $n_+$ and $n_-$ remain constant along each continuous curve in $f_p(\mathcal{D}_p)$ that does not meet the caustic of $f_p$. Now let us consider a continuous curve $\alpha : \,]-\varepsilon_0, \varepsilon_0[\, \longrightarrow f_p(\mathcal{D}_p)$ that meets the caustic at $\alpha(0)$ whereas $\alpha(\varepsilon)$ is a regular value of $f_p$ for all $\varepsilon \neq 0$. Under the additional assumptions that $\mathcal{D}_p$ is connected, that $f_p$ admits an extension, and that $\alpha(0) \notin \overline{f_p}(\partial \mathcal{D}_p)$, Prop. 9 tells us that $n_+(\alpha(\varepsilon)) - n_-(\alpha(\varepsilon))$ remains constant when $\varepsilon$ passes through zero. In other words, $n_+$ and $n_-$ are allowed to jump only by the same amount. As a consequence, the total number of images $n_+ + n_-$ is allowed to jump only by an even number.

We now specialize to the case that the lens map is defined on the whole celestial sphere, $\mathcal{D}_p = \mathcal{S}_p$. Then the assumption of $f_p$ admitting an extension is trivially satisfied, with $\overline{f_p} = f_p$, and the degree $\deg(f_p, \xi)$ is a well-defined integer for all $\xi \in \mathcal{N}$. Moreover,
$\deg(f_p, \xi)$ is a constant with respect to $\xi$, owing to Property A. It is then usual to write simply $\deg(f_p)$ instead of $\deg(f_p, \xi)$. Using this notation, (8) simplifies to
$$\deg(f_p) = n_+(\xi) - n_-(\xi) \tag{9}$$
for all regular values $\xi$ of $f_p$. Thus, the total number of images
$$n_+(\xi) + n_-(\xi) = \deg(f_p) + 2 n_-(\xi) \tag{10}$$
is either even for all regular values $\xi$ or odd for all regular values $\xi$, depending on whether $\deg(f_p)$ is even or odd. In some gravitational lensing situations it might be possible to show that there is one light source $\xi \in \mathcal{N}$ for which $f_p^{-1}(\xi)$ consists of exactly one point, i.e., $\xi$ is not multiply imaged. This situation is characterized by the following proposition.

Proposition 10. Assume that $\mathcal{D}_p = \mathcal{S}_p$ and that there is a regular value $\xi$ of $f_p$ such that $f_p^{-1}(\xi)$ is a single point. Then $|\deg(f_p)| = 1$. In particular, $f_p$ must be surjective and $\mathcal{N}$ must be diffeomorphic to the sphere $S^2$.

Proof. The result $|\deg(f_p)| = 1$ can be read directly from (9), choosing the regular value $\xi$ which has exactly one pre-image point under $f_p$. This implies that $f_p$ must be surjective since a non-surjective map has degree zero. So $\mathcal{N}$, being the continuous image of the compact set $\mathcal{S}_p$ under the continuous map $f_p$, must be compact. It is well known (see, e.g., Hirsch [7], p. 130, Exercise 5) that for $n \geq 2$ the existence of a continuous map $F : S^n \longrightarrow M_2$ with $\deg(F) = 1$ onto a compact oriented $n$-manifold $M_2$ implies that $M_2$ must be simply connected. As the lens map gives us such a map onto $\mathcal{N}$ (after changing the orientation of $\mathcal{N}$, if necessary), we have thus found that $\mathcal{N}$ must be simply connected. Owing to the well-known classification theorem of compact orientable two-dimensional manifolds (see, e.g., Hirsch [7], Chapter 9), this implies that $\mathcal{N}$ must be diffeomorphic to the sphere $S^2$.

In the situation of Prop. 10 we have $n_+(\xi) + n_-(\xi) = 2 n_-(\xi) \pm 1$ for all $\xi \in \mathcal{N} \setminus \mathrm{Caust}(f_p)$, i.e., the total number of images is odd for all light sources $\xi \in \mathcal{N} \simeq S^2$ that do not lie on the caustic of $f_p$. The idea to use the mapping degree for proving an odd number theorem in this way was published apparently for the first time in the introduction of McKenzie [16]. In Prop. 10 one would, of course, like to drop the rather restrictive assumption that $f_p^{-1}(\xi)$ is a single point for some $\xi$.
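For reference, the parity statement attached to (10) follows from (9) by elementary arithmetic. The display below only spells out that one-line computation in the notation of (9) and (10) and adds no new assumption.

```latex
\begin{align*}
n_+(\xi) + n_-(\xi)
  &= \bigl( n_+(\xi) - n_-(\xi) \bigr) + 2\, n_-(\xi) \\
  &= \deg(f_p) + 2\, n_-(\xi)
   \;\equiv\; \deg(f_p) \pmod{2} .
\end{align*}
```

In particular, for $|\deg(f_p)| = 1$ the total number of images is odd at every regular value.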
In the next section we consider a special situation where the result $|\deg(f_p)| = 1$ can be derived without this assumption.

5. Simple Lensing Neighborhoods

In this section we investigate a special class of spacetime regions that will be called “simple lensing neighborhoods”. Although the assumption of having a simple lensing neighborhood is certainly rather special, we shall demonstrate in Sect. 6 below that sufficiently many examples of physical interest exist. We define simple lensing neighborhoods in the following way.

Definition 5. $(\mathcal{U}, \mathcal{T}, W)$ is called a simple lensing neighborhood in a spacetime $(M, g)$ if
(a) $\mathcal{U}$ is an open connected subset of $M$ and $\mathcal{T}$ is the boundary of $\mathcal{U}$ in $M$;
(b) $(\mathcal{T} = \partial\mathcal{U}, W)$ is a source surface in the sense of Def. 1;
(c) for all $p \in \mathcal{U}$, the lens map $f_p : \mathcal{D}_p \longrightarrow \mathcal{N} = \partial\mathcal{U}/W$ is defined on the whole celestial sphere, $\mathcal{D}_p = \mathcal{S}_p$;
(d) $\mathcal{U}$ does not contain an almost periodic lightlike geodesic.

Here the notion of being “almost periodic” is defined in the following way. Any immersed curve $\lambda : I \longrightarrow \mathcal{U}$, defined on a real interval $I$, induces a curve $\hat{\lambda} : I \longrightarrow P\mathcal{U}$ in the projective tangent bundle $P\mathcal{U}$ over $\mathcal{U}$ which is defined by $\hat{\lambda}(s) = \{\, c\,\dot{\lambda}(s) \mid c \in \mathbb{R} \,\}$. The curve $\lambda$ is called almost periodic if there is a strictly monotonic sequence of parameter values $(s_i)_{i \in \mathbb{N}}$ such that the sequence $\bigl(\hat{\lambda}(s_i)\bigr)_{i \in \mathbb{N}}$ has an accumulation point in $P\mathcal{U}$. Please note that Condition (d) of Def. 5 is certainly true if the strong causality condition holds everywhere on $\mathcal{U}$, i.e., if there are no closed or almost closed causal curves in $\mathcal{U}$. Also, Condition (d) is certainly true if every future-inextendible lightlike geodesic in $\mathcal{U}$ has a future end-point in $M$.

Condition (d) should be viewed as adding a fairly mild assumption on the future-behavior of lightlike geodesics to the fairly strong assumptions on their past-behavior that are contained in Condition (c). In particular, Condition (c) excludes the possibility that past-oriented lightlike geodesics are blocked or trapped inside $\mathcal{U}$, i.e., it excludes the case that $\mathcal{U}$ contains non-transparent deflectors. Condition (c) requires, in addition, that the past-pointing lightlike geodesics are transverse to $\partial\mathcal{U}$ when leaving $\mathcal{U}$.

In the situation of a simple lensing neighborhood, we have for each $p \in \mathcal{U}$ a lens map that is defined on the whole celestial sphere, $f_p : \mathcal{S}_p \longrightarrow \mathcal{N} = \partial\mathcal{U}/W$. We have, thus, Eq. (9) at our disposal which relates the numbers $n_+(\xi)$ and $n_-(\xi)$, for any regular value $\xi \in \mathcal{N}$, to the mapping degree of $f_p$. (Please recall that, by Prop. 8, $n_+(\xi)$ and $n_-(\xi)$ are finite.) It is our main goal to prove that, in a simple lensing neighborhood, the mapping degree of the lens map equals $\pm 1$, so $n(\xi) = n_+(\xi) + n_-(\xi)$ is odd for all regular values $\xi$.
Also, we shall prove that a simple lensing neighborhood must be contractible and that its boundary must be diffeomorphic to $S^2 \times \mathbb{R}$. The latter result reflects the fact that the notion of simple lensing neighborhoods generalizes the notion of asymptotically simple and empty spacetimes, with $\partial\mathcal{U}$ corresponding to past lightlike infinity $\mathcal{J}^-$, as will be detailed in Subsect. 6.2 below. When proving the desired properties of simple lensing neighborhoods we may therefore use several techniques that have been successfully applied to asymptotically simple and empty spacetimes before. As a preparation we need the following lemma.

Lemma 1. Let $(\mathcal{U}, \mathcal{T}, W)$ be a simple lensing neighborhood in a spacetime $(M, g)$. Then there is a diffeomorphism $\Psi$ from the sphere bundle $\mathcal{S} = \{\, Y_p \in \mathcal{S}_p \mid p \in \mathcal{U} \,\}$ of lightlike directions over $\mathcal{U}$ onto the space $T\mathcal{N} \times \mathbb{R}^2$ such that the following diagram commutes.
$$\begin{array}{ccc}
\mathcal{S} & \stackrel{\Psi}{\longrightarrow} & T\mathcal{N} \times \mathbb{R}^2 \\
i_p \uparrow & & \downarrow \mathrm{pr} \\
\mathcal{S}_p & \stackrel{f_p}{\longrightarrow} & \mathcal{N}
\end{array} \tag{11}$$
Here $i_p$ denotes the inclusion map and $\mathrm{pr}$ is defined by dropping the second factor and projecting to the foot-point.

Proof. We fix a trivialization for the bundle $\pi_W : \mathcal{T} \longrightarrow \mathcal{N}$ and identify $\mathcal{T}$ with $\mathcal{N} \times \mathbb{R}$. Then we consider the bundle $\mathcal{B} = \{\, X_q \in \mathcal{B}_q \mid q \in \mathcal{T} \,\}$ over $\mathcal{T}$, where $\mathcal{B}_q \subset \mathcal{S}_q$ is, by definition, the subspace of all lightlike directions that are tangent to past-oriented
lightlike geodesics that leave $\mathcal{U}$ transversely at $q$. Now we choose for each $q \in \mathcal{T}$ a vector $Q_q \in T_q M$, smoothly depending on $q$, which is non-tangent to $\mathcal{T}$ and outward pointing. With the help of this vector field $Q$ we may identify $\mathcal{B}$ and $T\mathcal{N} \times \mathbb{R}$ as bundles over $\mathcal{T} \simeq \mathcal{N} \times \mathbb{R}$ in the following way. Fix $\xi \in \mathcal{N}$, $X_\xi \in T_\xi \mathcal{N}$ and $s \in \mathbb{R}$ and view the tangent space $T_\xi \mathcal{N}$ as a natural subspace of $T_q(\mathcal{N} \times \mathbb{R})$, where $q = (\xi, s)$. Then the desired identification is given by associating the pair $(X_\xi, s)$ with the direction spanned by $Z_q = X_\xi + Q_q - \alpha\, W(q)$, where the number $\alpha$ is uniquely determined by the requirement that $Z_q$ should be lightlike and past-pointing.

Now we consider the map
$$\pi : \mathcal{S} \longrightarrow \mathcal{B} \simeq T\mathcal{N} \times \mathbb{R} \tag{12}$$
given by following each lightlike geodesic from a point $p \in \mathcal{U}$ into the past until it reaches $\mathcal{T}$, and assigning the tangent direction at the end-point to the tangent direction at the initial point. As a matter of fact, (12) gives a principal fiber bundle with structure group $\mathbb{R}$. To prove this, we first observe that the geodesic spray induces a vector field without zeros on $\mathcal{S}$. By multiplying this vector field with an appropriate function we get a vector field whose flow is defined on all of $\mathbb{R} \times \mathcal{S}$ (see the second paragraph after Def. 1 for how to find such a function). The flow of this rescaled vector field defines an $\mathbb{R}$-action on $\mathcal{S}$ such that (12) can be identified with the projection onto the space of orbits. Conditions (c) and (d) of Def. 5 guarantee that no orbit is closed or almost closed. Owing to a general result of Palais [5], this is sufficient to prove that this action makes (12) into a principal fiber bundle with structure group $\mathbb{R}$. However, any such bundle is trivializable, see, e.g., Kobayashi and Nomizu [9], pp. 57/58. Choosing a trivialization for (12) gives us the desired diffeomorphism $\Psi$ from $\mathcal{S}$ to $\mathcal{B} \times \mathbb{R} \simeq T\mathcal{N} \times \mathbb{R}^2$. The commutativity of the diagram (11) follows directly from the definition of the lens map $f_p$.

With the help of this lemma we will now prove the following proposition which is at the center of this section.

Proposition 11. Let $(\mathcal{U}, \mathcal{T}, W)$ be a simple lensing neighborhood in a spacetime $(M, g)$. Then
(a) $\mathcal{N} = \mathcal{T}/W$ is diffeomorphic to the standard 2-sphere $S^2$;
(b) $\mathcal{U}$ is contractible;
(c) for all $p \in \mathcal{U}$, the lens map $f_p : \mathcal{S}_p \simeq S^2 \longrightarrow \mathcal{N} \simeq S^2$ has $|\deg(f_p)| = 1$; in particular, $f_p$ is surjective.

Proof. In the proof of parts (a) and (b) we shall adapt techniques used by Newman and Clarke [17, 18] in their study of asymptotically simple and empty spacetimes. To that end it will be necessary to assume that the reader is familiar with homology theory. With the sphere bundle $\mathcal{S}$, introduced in Lemma 1, we may associate the Gysin homology sequence
$$\dots \longrightarrow H_m(\mathcal{S}) \longrightarrow H_m(\mathcal{U}) \longrightarrow H_{m-3}(\mathcal{U}) \longrightarrow H_{m-1}(\mathcal{S}) \longrightarrow \dots, \tag{13}$$
where $H_m(\mathcal{X})$ denotes the $m$th homology group of the space $\mathcal{X}$ with coefficients in a field $\mathbb{F}$. For any choice of $\mathbb{F}$, the Gysin sequence is an exact sequence of abelian groups, see, e.g., Spanier [14], p. 260 or, for the analogous sequence of cohomology groups, Bredon [15], p. 390. By Lemma 1, $\mathcal{S}$ and $\mathcal{N}$ have the same homotopy type, so $H_m(\mathcal{S})$ and $H_m(\mathcal{N})$ are isomorphic. Upon inserting this into (13), we use the fact
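that $H_m(\mathcal{U}) = 1$ (= trivial group consisting of the unit element only) for $m > 4$ and $H_m(\mathcal{N}) = 1$ for $m > 2$ because $\dim(\mathcal{U}) = 4$ and $\dim(\mathcal{N}) = 2$. Also, we know that $H_0(\mathcal{U}) = \mathbb{F}$ and $H_0(\mathcal{N}) = \mathbb{F}$ since $\mathcal{U}$ and $\mathcal{N}$ are connected. Then the exactness of the Gysin sequence implies that
$$H_m(\mathcal{U}) = 1 \quad \text{for } m > 0 \tag{14}$$
and
$$H_1(\mathcal{N}) = 1, \qquad H_2(\mathcal{N}) = \mathbb{F}. \tag{15}$$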
From (15) we read that $\mathcal{N}$ is compact since otherwise $H_2(\mathcal{N}) = 1$. Moreover, we observe that $\mathcal{N}$ has the same homology groups and thus, in particular, the same Euler characteristic as the 2-sphere. It is well known that any two compact and orientable 2-manifolds are diffeomorphic if and only if they have the same Euler characteristic (or, equivalently, the same genus), see, e.g., Hirsch [7], Chapter 9. We have thus proven part (a) of the proposition.

To prove part (b) we consider the end of the exact homotopy sequence of the fiber bundle $\mathcal{S}$ over $\mathcal{U}$, see, e.g., Frankel [19], p. 600,
$$\dots \longrightarrow \pi_1(\mathcal{S}) \longrightarrow \pi_1(\mathcal{U}) \longrightarrow 1. \tag{16}$$
As $\mathcal{S}$ has the same homotopy type as $\mathcal{N} \simeq S^2$, we may replace $\pi_1(\mathcal{S})$ with $\pi_1(S^2) = 1$, so the exactness of (16) implies that $\pi_1(\mathcal{U}) = 1$, i.e., that $\mathcal{U}$ is simply connected. If, for some $m > 1$, the homotopy group $\pi_m(\mathcal{U})$ were different from 1, the Hurewicz isomorphism theorem (see, e.g., Spanier [14], p. 394 or Bredon [15], p. 479, Corollary 10.10.) would give a contradiction to (14). Thus, $\pi_m(\mathcal{U}) = 1$ for all $m \in \mathbb{N}$, i.e., $\mathcal{U}$ is contractible.

We now prove part (c). Since $\mathcal{U}$ is contractible, the tangent bundle $T\mathcal{U}$ and thus the sphere bundle $\mathcal{S}$ over $\mathcal{U}$ admits a global trivialization, $\mathcal{S} \simeq \mathcal{U} \times S^2$. Fixing such a trivialization and choosing a contraction that collapses $\mathcal{U}$ onto some point $p \in \mathcal{U}$ gives a contraction $\tilde{i}_p : \mathcal{S} \longrightarrow \mathcal{S}_p$. Together with the inclusion map $i_p : \mathcal{S}_p \longrightarrow \mathcal{S}$ this gives us a homotopy equivalence between $\mathcal{S}_p$ and $\mathcal{S}$. (Please recall that a homotopy equivalence between two topological spaces $\mathcal{X}$ and $\mathcal{Y}$ is a pair of continuous maps $\varphi : \mathcal{X} \longrightarrow \mathcal{Y}$ and $\tilde{\varphi} : \mathcal{Y} \longrightarrow \mathcal{X}$ such that $\varphi \circ \tilde{\varphi}$ can be continuously deformed into the identity on $\mathcal{Y}$ and $\tilde{\varphi} \circ \varphi$ can be continuously deformed into the identity on $\mathcal{X}$.) On the other hand, the projection $\mathrm{pr}$ from (11), together with the zero section $\widetilde{\mathrm{pr}} : \mathcal{N} \longrightarrow T\mathcal{N} \times \mathbb{R}^2$, gives a homotopy equivalence between $T\mathcal{N} \times \mathbb{R}^2$ and $\mathcal{N}$. As a consequence, the diagram (11) tells us that the lens map $f_p = \mathrm{pr} \circ \Psi \circ i_p$ together with the map $\tilde{f}_p = \tilde{i}_p \circ \Psi^{-1} \circ \widetilde{\mathrm{pr}}$ gives a homotopy equivalence between $\mathcal{S}_p \simeq S^2$ and $\mathcal{N} \simeq S^2$, so $f_p \circ \tilde{f}_p$ is homotopic to the identity. Since the mapping degree is a homotopy invariant (please recall Property B of the mapping degree from Sect. 4), this implies that $\deg(f_p \circ \tilde{f}_p) = 1$. Now the product theorem for the mapping degree (see, e.g., Choquet-Bruhat, DeWitt-Morette, and Dillard-Bleick [12], p. 483) yields $\deg(f_p)\deg(\tilde{f}_p) = 1$. As the mapping degree is an integer, this can be true only if $\deg(f_p) = \deg(\tilde{f}_p) = \pm 1$. In particular, $f_p$ must be surjective since otherwise $\deg(f_p) = 0$.
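The product theorem used in the last step can be checked numerically in a lower-dimensional analogue. The sketch below is only an illustration under stated assumptions: it replaces maps between 2-spheres by maps of the unit circle into the punctured complex plane, where the mapping degree is the winding number, and all function names are hypothetical.

```python
import cmath
import math
from typing import Callable

CircleMap = Callable[[complex], complex]


def winding_degree(f: CircleMap, samples: int = 4096) -> int:
    """Degree of a nowhere-vanishing map f on the unit circle.

    Accumulates the phase increments of f along the circle; the sampling
    is assumed fine enough that each increment stays below pi in modulus.
    """
    total = 0.0
    prev = f(cmath.exp(0j))
    for k in range(1, samples + 1):
        z = cmath.exp(2j * math.pi * k / samples)
        cur = f(z)
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2.0 * math.pi))


def compose(f: CircleMap, g: CircleMap) -> CircleMap:
    """Return the composition z -> f(g(z))."""
    return lambda z: f(g(z))


def square(z: complex) -> complex:
    return z * z          # wraps the circle twice, degree +2


def mirror(z: complex) -> complex:
    return z.conjugate()  # reverses orientation, degree -1
```

Here the degree of the composition of `square` and `mirror` comes out as the product of the two individual degrees. For a pair of maps whose composition has degree 1, the two integer factors must multiply to 1, which leaves only the possibilities +1 and −1, as used in the proof above.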
In all simple examples to which this proposition applies the degree of fp is, actually, equal to +1, and it is hard to see whether examples with deg(fp ) = −1 do exist. The following consideration is quite instructive. If we start with a simple lensing neighborhood in a flat spacetime (or, more generally, in a conformally flat spacetime), then
conjugate points cannot occur, so it is clear that the case $\deg(f_p) = -1$ is impossible. If we now perturb the metric in such a way that the simple-lensing-neighborhood property is maintained during the perturbation, then, by Property B of the degree, the equation $\deg(f_p) = +1$ is preserved. This demonstrates that the case $\deg(f_p) = -1$ cannot occur for weak gravitational fields (or for small perturbations of conformally flat spacetimes such as Robertson–Walker spacetimes).

Among other things, Proposition 11 gives a good physical motivation for studying degree-one maps from $S^2$ to $S^2$. In particular, it is an interesting problem to characterize the caustics of such maps. Please note that, by parts (a) and (c) of Proposition 11, $f_p(\mathcal{D}_p)$ is simply connected for all $p \in \mathcal{U}$. Hence, Proposition 5 applies which says that the formation of a caustic is necessary for multiple imaging. Owing to (10), part (c) of Proposition 11 implies in particular that $n(\xi) = n_+(\xi) + n_-(\xi)$ is odd for all worldlines of light sources $\xi \in \mathcal{N}$ that do not pass through the caustic of the past light cone of $p$, i.e., if only light rays within $\mathcal{U}$ are taken into account the observer at $p$ sees an odd number of images of such a worldline.

It is now our goal to prove a similar “odd number theorem” for a light source with worldline inside $\mathcal{U}$. As a preparation we establish the following lemma.

Lemma 2. Let $(\mathcal{U}, \mathcal{T}, W)$ be a simple lensing neighborhood in a spacetime $(M, g)$ and $p \in \mathcal{U}$. Let $J^-(p, \mathcal{U})$ denote, as usual, the causal past of $p$ in $\mathcal{U}$, i.e., the set of all points in $M$ that can be reached from $p$ along a past-pointing causal curve in $\mathcal{U}$. Let $\partial_{\mathcal{U}} J^-(p, \mathcal{U})$ denote the boundary of $J^-(p, \mathcal{U})$ in $\mathcal{U}$. Then
(a) every point $q \in \partial_{\mathcal{U}} J^-(p, \mathcal{U})$ can be reached from $p$ along a past-pointing lightlike geodesic in $\mathcal{U}$;
(b) $\partial_{\mathcal{U}} J^-(p, \mathcal{U})$ is relatively compact in $M$.

Proof.
As usual, let $I^-(p, \mathcal{U})$ denote the chronological past of $p$ in $\mathcal{U}$, i.e., the set of all points that can be reached from $p$ along a past-pointing timelike curve in $\mathcal{U}$. To prove part (a), fix a point $q \in \partial_{\mathcal{U}} J^-(p, \mathcal{U})$. Choose a sequence $(p_i)_{i \in \mathbb{N}}$ of points in $\mathcal{U}$ that converge towards $p$ in such a way that $p \in I^-(p_i, \mathcal{U})$ for all $i \in \mathbb{N}$. This implies that we can find for each $i \in \mathbb{N}$ a past-pointing timelike curve $\lambda_i$ from $p_i$ to $q$. Then the $\lambda_i$ are past-inextendible in $\mathcal{U} \setminus \{q\}$. Owing to a standard lemma (see, e.g., Wald [20], Lemma 8.1.5) this implies that the $\lambda_i$ have a causal limit curve $\lambda$ through $p$ that is past-inextendible in $\mathcal{U} \setminus \{q\}$. We want to show that $\lambda$ is the desired lightlike geodesic. Assume that $\lambda$ is not a lightlike geodesic. Then $\lambda$ enters into the open set $I^-(p, \mathcal{U})$ (see Hawking and Ellis [21], Prop. 4.5.10), so $\lambda_i$ enters into $I^-(p, \mathcal{U})$ for $i$ sufficiently large. This, however, is impossible since all $\lambda_i$ have past end-point on $\partial_{\mathcal{U}} J^-(p, \mathcal{U})$, so $\lambda$ must be a lightlike geodesic. It remains to show that $\lambda$ has past end-point at $q$. Assume that this is not true. Since $\lambda$ is past-inextendible in $\mathcal{U} \setminus \{q\}$ this assumption implies that $\lambda$ is past-inextendible in $\mathcal{U}$, so by condition (c) of Def. 5 $\lambda$ has past end-point on $\partial\mathcal{U}$ and meets $\partial\mathcal{U}$ transversely. As a consequence, for $i$ sufficiently large $\lambda_i$ has to meet $\partial\mathcal{U}$ which gives a contradiction to the fact that all $\lambda_i$ are within $\mathcal{U}$.

To prove part (b), we have to show that any sequence $(q_i)_{i \in \mathbb{N}}$ in $\partial_{\mathcal{U}} J^-(p, \mathcal{U})$ has an accumulation point in $M$. So let us choose such a sequence. From part (a) we know that there is a past-pointing lightlike geodesic $\mu_i$ from $p$ to $q_i$ in $\mathcal{U}$ for all $i \in \mathbb{N}$. By compactness of $\mathcal{S}_p \simeq S^2$, the tangent directions to these geodesics at $p$ have an accumulation point in $\mathcal{S}_p$. Let $\mu$ be the past-pointing lightlike geodesic from $p$ which is determined by this direction. By condition (c) of Definition 5, this geodesic $\mu$ and each of the geodesics $\mu_i$ must have a past end-point on $\partial\mathcal{U}$ if maximally extended inside $\mathcal{U}$.
We may choose an affine parametrization for each of those geodesics with the parameter ranging from the value 0 at p to the value 1 at ∂U.
Then our sequence $(q_i)_{i \in \mathbb{N}}$ in $\mathcal{U}$ determines a sequence $(s_i)_{i \in \mathbb{N}}$ in the interval $[0, 1]$ by setting $q_i = \mu_i(s_i)$. By compactness of $[0, 1]$, this sequence must have an accumulation point $s \in [0, 1]$. This demonstrates that the $q_i$ must have an accumulation point in $M$, namely the point $\mu(s)$.

We are now ready to prove the desired odd-number theorem for light sources with worldline in $\mathcal{U}$.

Proposition 12. Let $(\mathcal{U}, \mathcal{T}, W)$ be a simple lensing neighborhood in a spacetime $(M, g)$ and assume that $\mathcal{U}$ does not contain a closed timelike curve. Fix a point $p \in \mathcal{U}$ and a timelike embedded $C^\infty$ curve $\gamma$ in $\mathcal{U}$ whose image is a closed topological subset of $M$. (The latter condition excludes the case that $\gamma$ has an end-point on $\partial\mathcal{U}$.) Then the following is true.
(a) If $\gamma$ does not meet the point $p$, then there is a past-pointing lightlike geodesic from $p$ to $\gamma$ that lies completely within $\mathcal{U}$ and contains no conjugate points in its interior. (The end-point may be conjugate to the initial-point.) If this geodesic meets $\gamma$ at the point $q$, say, then all points on $\gamma$ that lie to the future of $q$ cannot be reached from $p$ along a past-pointing lightlike geodesic in $\mathcal{U}$.
(b) If $\gamma$ meets neither the point $p$ nor the caustic of the past light cone of $p$, then the number of past-pointing lightlike geodesics from $p$ to $\gamma$ that are completely contained in $\mathcal{U}$ is finite and odd.

Proof. In the first step we construct a $C^\infty$ vector field $V$ on $M$ that is timelike on $\mathcal{U}$, has $\gamma$ as an integral curve, and coincides with $W$ on $\mathcal{T} = \partial\mathcal{U}$. To that end we first choose any future-pointing timelike $C^\infty$ vector field $V_1$ on $M$. (Existence is guaranteed by our assumption of time-orientability.) Then we extend the vector field $W$ to a $C^\infty$ vector field $V_2$ onto some neighborhood $\mathcal{V}$ of $\mathcal{T}$. Since $W$ is causal and future-pointing, $V_2$ may be chosen timelike and future-pointing on $\mathcal{V} \setminus \mathcal{T}$. (Here we make use of the fact that $\mathcal{T} = \partial\mathcal{U}$ is a closed subset of $M$.) Finally we choose a timelike and future-pointing vector field $V_3$ on some neighborhood $\mathcal{W}$ of $\gamma$ that is tangent to $\gamma$ at all points of $\gamma$. (Here we make use of the fact that the image of $\gamma$ is a closed subset of $M$.) We choose the neighborhoods $\mathcal{V}$ and $\mathcal{W}$ disjoint which is possible since $\gamma$ is completely contained in $\mathcal{U}$ and closed in $M$. With the help of a partition of unity we may now combine the three vector fields $V_1$, $V_2$, $V_3$ into a vector field $V$ with the desired properties.

In the second step we consider the quotient space $M/V$. This space contains the open subset $\mathcal{U}/V$ whose boundary $\mathcal{T}/V = \mathcal{N}$ is, by Prop. 11, a manifold diffeomorphic to $S^2$. We want to show that $\mathcal{U}/V$ is a manifold (which, according to our terminology, in particular requires that $\mathcal{U}/V$ is a Hausdorff space). To that end we consider the map $j_p : \overline{\partial_{\mathcal{U}} J^-(p, \mathcal{U})} \longrightarrow \overline{\mathcal{U}}/V$ which assigns to each point $q \in \overline{\partial_{\mathcal{U}} J^-(p, \mathcal{U})}$ the integral curve of $V$ passing through that point. (In this proof overlining always means closure in $M$.) Clearly, $j_p$ is continuous with respect to the topology $\overline{\partial_{\mathcal{U}} J^-(p, \mathcal{U})}$ inherits as a subspace of $M$ and the quotient topology on $\overline{\mathcal{U}}/V$. Moreover, $\overline{\partial_{\mathcal{U}} J^-(p, \mathcal{U})}$ intersects each integral curve of $V$ at most once, and if it intersects one integral curve then it also intersects all neighboring integral curves in $\overline{\mathcal{U}}$; this follows from Wald [20], Theorem 8.1.3. Hence, $j_p$ is injective and its image is open in $\overline{\mathcal{U}}/V$. On the other hand, part (b) of Lemma 2 implies that the image of $j_p$ is closed. Since the image of $j_p$ is non-empty and connected, it must be all of $\overline{\mathcal{U}}/V$. (The domain of $j_p$ and, thus, the image of $j_p$ is non-empty because $\mathcal{U}$ does not contain a closed timelike curve. The domain and, thus, the image of $j_p$ is connected since $\mathcal{U}$ is connected.) We have, thus, proven that $j_p$
is a homeomorphism. This implies that the Hausdorff condition is satisfied on $\overline{\mathcal{U}}/V$ and, in particular, on $\mathcal{U}/V$. Since $V$ is timelike and $\mathcal{U}$ contains no closed timelike curves, this makes sure that $\mathcal{U}/V$ is a manifold according to our terminology, see Harris [10], Theorem 2.

In the third step we use these results to prove part (a) of the proposition. Our result that $j_p$ is a homeomorphism implies, in particular, that $\gamma$ has an intersection with $\partial_{\mathcal{U}} J^-(p, \mathcal{U})$ at some point $q$. Now part (a) of Lemma 2 shows that there is a past-pointing lightlike geodesic from $p$ to $q$ in $\mathcal{U}$. This geodesic cannot contain conjugate points in its interior since otherwise a small variation would give a timelike curve from $p$ to $q$, see Hawking and Ellis [21], Prop. 5.4.12, thereby contradicting $q \in \partial_{\mathcal{U}} J^-(p, \mathcal{U})$. The rest of part (a) is clear since all past-pointing lightlike geodesics in $\mathcal{U}$ that start at $p$ are confined to $J^-(p, \mathcal{U})$.

In the last step we prove part (b). To that end we choose on the tangent space $T_p M$ a Lorentz basis $(E_p^1, E_p^2, E_p^3, E_p^4)$ with $E_p^4$ future-pointing, and we identify each $x = (x^1, x^2, x^3) \in \mathbb{R}^3$ with the past-pointing lightlike vector $Y_p = x^1 E_p^1 + x^2 E_p^2 + x^3 E_p^3 - |x| E_p^4$. With this identification, the lens map takes the form $f_p : S^2 \longrightarrow \mathcal{N} = \partial\mathcal{U}/V$, $x \longmapsto \pi_V\bigl(\exp_p(w_p(x)\,x)\bigr)$. We now define a continuous map $F : B \longrightarrow M/V$ on the closed ball $B = \{\, x \in \mathbb{R}^3 \mid |x| \leq 1 \,\}$ by setting $F(x) = \pi_V\bigl(\exp_p\bigl(w_p(\tfrac{x}{|x|})\,x\bigr)\bigr)$ for $x \neq 0$ and $F(0) = \pi_V(p)$. The restriction of $F$ to the interior of $B$ is a $C^\infty$ map onto the manifold $\mathcal{U}/V$, with the exception of the origin where $F$ is not differentiable. The latter problem can be circumvented by approximating $F$ in the $C^0$-sense, on an arbitrarily small neighborhood of the origin, by a $C^\infty$ map. Then the mapping degree $\deg(F)$ can be calculated (see, e.g., Choquet-Bruhat, DeWitt-Morette, and Dillard-Bleick [12], pp. 477) with the help of the integral formula
$$\int_B F^* \omega = \deg(F) \int_{\mathcal{U}/V} \omega, \tag{17}$$
where $\omega$ is any 3-form on $\mathcal{U}/V$ and the star denotes the pull-back of forms. For any 2-form $\psi$ on $\mathcal{U}/V$, we may apply this formula to the form $\omega = d\psi$. With the help of the Stokes theorem we then find
$$\int_{S^2} F^* \psi = \deg(F) \int_{\mathcal{N}} \psi. \tag{18}$$
However, the restriction of F to ∂B = S 2 gives the lens map, so on the left-hand side of (18) we may replace F ∗ ψ by fp∗ ψ. Then comparison with the integral formula for the degree of fp shows that deg(F ) = deg(fp ) which, according to Prop. 11, is equal to ±1. For every ζ ∈ U/V that is a regular value of F , the result deg(F ) = ±1 implies that the number of elements in F −1 (ζ ) is finite and odd. By assumption, the worldline γ ∈ U/V meets neither the point p nor the caustic of the past light cone of p. The first condition makes sure that our perturbation of F near the origin can be done without influencing the set F −1 (γ ); the second condition implies that γ is a regular value of F , please recall our discussion at the end of Sect. 3. This completes the proof. If only light rays within U are taken into account, then Prop. 12 can be summarized by saying that, for light sources in a simple lensing neighborhood, the “youngest image” has always even parity and the total number of images is finite and odd. In the quasi-Newtonian approximation formalism it is a standard result that a transparent gravitational lens produces an odd number of images, see Schneider, Ehlers and
Falco [1], Section 5.4, for a detailed discussion. Proposition 12 may be viewed as a reformulation of this result in a Lorentzian geometry setting. It is quite likely that an alternative proof of Prop. 12 can be given by using the Morse theoretical results of Giannoni, Masiello and Piccione [22, 23]. Also, the reader should compare our results with the work of McKenzie [16] who used Morse theory for proving an odd-number theorem in certain globally hyperbolic spacetimes. Contrary to McKenzie’s theorem, our Prop. 12 requires mathematical assumptions which can be physically interpreted rather easily.

6. Examples

6.1. Two simple examples with non-transparent deflectors.

6.1.1. Non-transparent string. As a simple example, we consider gravitational lensing in the spacetime $(M, g)$ where $M = \mathbb{R}^2 \times (\mathbb{R}^2 \setminus \{0\})$ and
$$g = -dt^2 + dz^2 + dr^2 + k^2 r^2\, d\varphi^2 \tag{19}$$
with some constant $0 < k < 1$. Here $(t, z)$ denote Cartesian coordinates on $\mathbb{R}^2$ and $(r, \varphi)$ denote polar coordinates on $\mathbb{R}^2 \setminus \{0\}$. This can be interpreted as the spacetime around a static non-transparent string, see Vilenkin [24], Hiscock [25] and Gott [26]. One should think of the string as being situated at the $z$-axis. Since the latter is not part of the spacetime, it is indeed justified to speak of a non-transparent string. As $\partial/\partial t$ is a Killing vector field normalized to $-1$, the lightlike geodesics in $(M, g)$ correspond to the geodesics of the space part. The latter is a metrical product of a real line with coordinate $z$ and a cone with polar coordinates $(r, \varphi)$. So the geodesics are straight lines if we cut the cone open along some radius $\varphi = \text{const.}$ and flatten it out in a plane. Owing to this simple form of the lightlike geodesics, the investigation of lens maps in this string spacetime is quite easy.

To work this out, choose some constant $R > 0$ and let $\mathcal{T}$ denote the hypercylinder $r = R$ in $M$. Let $W$ denote the restriction of the vector field $\partial/\partial t$ to $\mathcal{T}$. Then $(\mathcal{T}, W)$ is a source surface, with $\mathcal{N} = \mathcal{T}/W \simeq S^1 \times \mathbb{R}$. Henceforth we discuss the lens map $f_p$ for any point $p \in M$ at a radius $r < R$. There are no past-pointing lightlike geodesics from $p$ that intersect $\mathcal{T}$ more than once or touch $\mathcal{T}$ tangentially, so the lens map $f_p$ gives full information about all images at $p$ of each light source $\xi \in \mathcal{N}$. The domain $\mathcal{D}_p$ of the lens map is given by excising a curve segment, namely a meridian including both end-points at the “poles”, from the celestial sphere $\mathcal{S}_p$, so $\mathcal{D}_p \simeq \mathbb{R}^2$ is connected. The boundary of $\mathcal{D}_p$ in $\mathcal{S}_p$ corresponds to light rays that are blocked by the string before reaching $\mathcal{T}$. It is easy to see that the lens map cannot be continuously extended onto $\mathcal{S}_p$ (= closure of $\mathcal{D}_p$ in $\mathcal{S}_p$). Nonetheless, the lens map admits an extension in the sense of Def. 4. We may choose $M_1 = S^2$ and $M_2 = S^2$.
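The flattening construction lends itself to a small numerical experiment. The sketch below rests on an assumption that is not spelled out in the text: on the unrolled cone, the lift of the source with index j has flat angular separation k(Δφ + 2πj) from the observer, and it is joined to the observer by a straight segment that avoids the apex exactly when this separation is smaller than π in modulus. Each such lift is counted as one image. The function name and parameters are hypothetical.

```python
import math


def count_images(k: float, delta_phi: float) -> int:
    """Count straight segments on the unrolled cone from observer to source.

    k          -- cone parameter of the metric (19), with 0 < k < 1
    delta_phi  -- difference of the phi coordinates of source and observer,
                  assumed to lie in [-pi, pi]
    A lift with index j contributes when |k * (delta_phi + 2*pi*j)| < pi;
    equality corresponds to a ray that hits the string and is excluded.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k must lie strictly between 0 and 1")
    # For |delta_phi| <= pi no lift with |j| > j_max can satisfy the bound.
    j_max = math.ceil(1.0 / k) + 1
    return sum(
        1
        for j in range(-j_max, j_max + 1)
        if abs(k * (delta_phi + 2.0 * math.pi * j)) < math.pi
    )
```

The admissible indices j fill an open interval of length 1/k, so under the stated assumption the count can only take the values allowed by the number of integers in such an interval; for instance `count_images(0.4, 0.1)` and `count_images(0.4, math.pi)` give different counts for the same string.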
In this extension, $\mathcal{D}_p$ is embedded into the sphere $M_1 = S^2$ in such a way that it covers a region $(\theta, \varphi) \in \,]0, \pi[\, \times \,]\varepsilon, 2\pi - \varepsilon[\,$, i.e., in comparison with the embedding into $\mathcal{S}_p$ the curve segment excised from the sphere has been “widened” a bit. The embedding of $\mathcal{N} \simeq S^1 \times \mathbb{R}$ into $S^2$ is made via Mercator projection. As the string spacetime has vanishing curvature, the light cones in $M$ have no caustics. Owing to our general results of Sect. 3, this implies that the caustic of the lens map is empty and that all images have even parity, so (8) gives $\deg(\overline{f_p}, \xi) = n_+(\xi) = n(\xi)$ for all $\xi \in \mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$. The actual value of $n(\xi)$ depends on the parameter $k$ that enters into the metric (19). If $i = 1/k$ is an integer, $\mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$ is connected and $n(\xi) = i$ everywhere on this set. If
$i < 1/k < i + 1$ for some integer $i$, $\mathcal{N} \setminus \overline{f_p}(\partial \mathcal{D}_p)$ has two connected components, with $n(\xi) = i$ on one of them and $n(\xi) = i + 1$ on the other. Thus, the string produces multiple imaging and the number of images is (finite but) arbitrarily large if $k$ is sufficiently small. For all $k \in \,]0, 1[\,$, the lens map is surjective, $f_p(\mathcal{D}_p) = \mathcal{N} \simeq S^1 \times \mathbb{R}$. So this example shows that the assumption of $f_p(\mathcal{D}_p)$ being simply connected was essential in Prop. 5.

6.1.2. Non-transparent spherical body. We consider the Schwarzschild metric
$$g = \Bigl(1 - \frac{2m}{r}\Bigr)^{-1} dr^2 + r^2 \bigl( d\theta^2 + \sin^2\theta \, d\varphi^2 \bigr) - \Bigl(1 - \frac{2m}{r}\Bigr) dt^2 \tag{20}$$
on the manifold M = ]Ro , ∞[ × S 2 × R. In (20), r is the coordinate ranging over ]Ro , ∞[ , t is the coordinate ranging over R, and θ and ϕ are spherical coordinates on S 2 . This gives the static vacuum spacetime around a spherically symmetric body of mass m and radius Ro . Restricting the spacetime manifold to the region r > Ro is a way of treating the central body as non-transparent. In the following we keep a value Ro > 0 fixed and we allow m to vary between m = 0 (flat space) and m = Ro /2 (black hole). For discussing lens maps in this spacetime we fix a constant R > 3Ro /2. We denote by T the set of all points in M with coordinate r = R and we denote by W the restriction of ∂/∂t to W . Then (T , W ) is a source surface, with N = T /W S 2 . It is our goal to discuss the properties of the lens map fp : Dp −→ N for a point p ∈ M with a radius coordinate r < R in dependence of the mass parameter m. To that end we make use of well-known properties of the lightlike geodesics in the Schwarzschild metric, see, e.g., Chandrasekhar [28], Sect. 20, for a comprehensive discussion. For determining the relevant features of the lens map it will be sufficient to concentrate on qualitative aspects of image positions. For quantitative aspects the reader may consult Virbhadra and Ellis [27]. We first observe that, for any m ∈ [0, Ro /2], there is no past-pointing lightlike geodesic from p that intersects T more than once or touches T tangentially. This follows from the fact that in the region r > 3m the radius coordinate has no local maximum along any light ray. So the lens map fp gives full information about all images at p of light sources ξ ∈ N . For m = 0, the light rays are straight lines. The domain Dp of the lens map is given by excising a disc, including the boundary, from the celestial sphere Sp , i.e., Dp R2 . 
The boundary of D_p corresponds to light rays grazing the surface of the central body, so f_p can be continuously extended onto the closure of D_p in S_p, thereby giving an extension of f_p, in the sense of Def. 4, f̄_p : D̄_p ⊆ S_p → N. In Fig. 2, f̄_p(∂D_p) can be represented as a “circle of equal latitude” on the sphere r = R, with the image of f_p “to the north” of this circle. With increasing m, f̄_p(∂D_p) moves “south” until, at some value m = m_1, it has reached the “south pole” ξ_S. This is the situation depicted in Fig. 2. From now on the lens map is surjective and ξ_S is seen as an Einstein ring, thereby indicating that a caustic has formed. Now f̄_p(∂D_p) moves north until, at some value m = m_1′, it has reached the “north pole” ξ_N. From now on ξ_N is seen as an Einstein ring, in addition to the regular image that exists from the beginning. With further increasing m, we find an infinite sequence of values 0 < m_1 < m_1′ < m_2 < m_2′ < ··· < m_i < m_i′ < … such that at m = m_i a new Einstein ring of ξ_S and at m = m_i′ a new Einstein ring of ξ_N comes into existence. For all intermediate values of m, f̄_p(∂D_p) divides N into two connected components. All points ξ in the southern component, with the exception of the south pole ξ_S, are regular values of the lens map. f_p^{-1}(ξ) consists of exactly 2i points where i is the largest integer with m_i < m. There are i images of even parity, n_+(ξ) = i, and i images
Fig. 2. At m = m_1, the extended lens map f̄_p maps the boundary of D_p onto the south pole ξ_S
of odd parity, n_−(ξ) = i, hence deg(f_p, ξ) = n_+(ξ) − n_−(ξ) = 0. Similarly, all points ξ in the northern component, with the exception of the north pole ξ_N, are regular values of the lens map. f_p^{-1}(ξ) consists of exactly 2i + 1 points, where i is the largest integer with m_i′ < m. There are i + 1 images of even parity, n_+(ξ) = i + 1, and i images of odd parity, n_−(ξ) = i, hence deg(f_p, ξ) = n_+(ξ) − n_−(ξ) = 1. Both sequences (m_i)_{i∈N} and (m_i′)_{i∈N} converge towards m = R_o/3. For m ≥ R_o/3, the boundary of D_p corresponds to light rays that approach the sphere r = 3m asymptotically in a never-ending spiral motion, cf. Chandrasekhar [28], Fig. 9 and Fig. 10. The lens map no longer admits an extension in the sense of Def. 4, so we cannot assign a mapping degree to it. There are infinitely many concentric Einstein rings for both poles, and infinitely many isolated images for all other ξ ∈ N, with both n_+(ξ) and n_−(ξ) being infinite. These features remain unchanged until the black-hole case m = R_o/2 is reached. The fact that in this case the caustic of the lens map consists of just two points is rather exceptional. After a small perturbation of the spherical symmetry the caustic would show a completely different behavior. For regular ξ ∈ N, however, the statements about n_±(ξ) are stable against small perturbations. Having studied Schwarzschild spacetimes around non-transparent bodies, the reader might ask about transparent bodies, i.e., about matching an interior solution to the exterior Schwarzschild solution at radius R_o, with R_o > 2m, and allowing for light rays passing through the interior region. If R_o > 3m, and if there are no light rays trapped within the interior region, the resulting spacetime will be asymptotically simple and empty. Qualitative features of lens maps in this class of spacetimes are
discussed in the following subsection. For a more explicit discussion of lens maps in the Schwarzschild spacetime of a transparent body, choosing a perfect fluid with constant density for the interior region, the reader is referred to Kling and Newman [29].
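The observation used in Sect. 6.1.2, that the radius coordinate has no local maximum along a light ray in the region r > 3m, can be illustrated numerically. The following is a minimal sketch, assuming the standard radial equation for lightlike geodesics of the metric (20), (dr/dλ)² = E² − L² V(r) with V(r) = (1 − 2m/r)/r² (cf. Chandrasekhar [28]); this equation is not written out in the text, and the function names, the grid and the sample values of m are illustrative choices.

```python
def potential_slope(r, m):
    """dV/dr for V(r) = (1 - 2m/r) / r**2, which equals 2 (3m - r) / r**4."""
    return 2.0 * (3.0 * m - r) / r**4


def radial_acceleration_at_turning_point(r, m, L=1.0):
    """d^2 r / d lambda^2 at a point where dr/d lambda = 0.

    Differentiating (dr/dlambda)^2 = E^2 - L^2 V(r) with respect to lambda
    gives d^2 r / d lambda^2 = -(L^2 / 2) dV/dr (assumed radial equation).
    """
    return -0.5 * L**2 * potential_slope(r, m)


def only_minima_outside_3m(m, r_max=50.0, steps=2000):
    """Check on a grid that every turning point with r > 3m is a local minimum."""
    r_min = 3.0 * m
    for i in range(1, steps + 1):
        r = r_min + (r_max - r_min) * i / steps
        if radial_acceleration_at_turning_point(r, m) <= 0.0:
            return False
    return True


if __name__ == "__main__":
    for m in (0.0, 0.1, 0.5, 1.0):
        print(m, only_minima_outside_3m(m))
```

A positive second derivative at every turning point with r > 3m means that such a point can only be a local minimum of r along the ray, which is the property the argument relies on; inside r < 3m the sign is reversed.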
6.2. Asymptotically simple and empty spacetimes. Asymptotically simple and empty spacetimes are considered to be good models for the gravitational fields of transparent gravitating bodies that can be viewed as isolated from all other masses in the universe. The formal definition, which is essentially due to Penrose [30], cf., e.g., Hawking and Ellis [21], p. 222, reads as follows.

Definition 6. A spacetime (M, g) is called asymptotically simple if there is a strongly causal spacetime (M̃, g̃) with the following properties:
(a) M is an open submanifold of M̃ with a non-empty boundary ∂M.
(b) There is a C^∞ function Ω : M̃ → R such that M = { p ∈ M̃ | Ω(p) > 0 }, ∂M = { p ∈ M̃ | Ω(p) = 0 }, dΩ ≠ 0 everywhere on ∂M and g̃ = Ω² g on M.
(c) Every inextendible lightlike geodesic in M has past and future end-point on ∂M.
(M, g) is called asymptotically simple and empty if, in addition,
(d) there is a neighborhood V of ∂M in M̃ such that the Ricci tensor of g vanishes on V ∩ M.

Condition (d) of Def. 6 is a way of saying that, sufficiently far away from the gravitating body under consideration, Einstein’s vacuum field equation is satisfied. This assumption is reasonable for the spacetime around an isolated body producing gravitational lensing as long as cosmological aspects can be ignored. The assumptions (a)–(d) of Def. 6 imply that ∂M is a g̃-lightlike hypersurface in M̃ that has two connected components, usually denoted by J⁺ and J⁻ (cf., e.g., Hawking and Ellis [21], p. 222). Every inextendible lightlike geodesic in M has future end-point on J⁺ and past end-point on J⁻. In the following we concentrate on J⁻, which is the relevant quantity in view of gravitational lensing. By construction, J⁻ is ruled by the integral curves of the g̃-gradient Z of Ω. (In coordinate notation, the vector field Z is defined by Z^a = g̃^{ab} ∂_b Ω on J⁻.)
It is well known that Z is regular, with J⁻/Z being diffeomorphic to S², and that the natural projection π_Z : J⁻ → J⁻/Z ≃ S² makes J⁻ into a trivializable fiber bundle with typical fiber diffeomorphic to R. For a full proof we refer to Newman and Clarke [17, 18]. (The argument given in Hawking and Ellis [21], Prop. 6.9.4, which is due to Geroch [31], is incomplete.) This result can be translated into our terminology in the following way.

Proposition 13. In the case of an asymptotically simple and empty spacetime, (J⁻, Z) is a source surface in the spacetime (M̃, g̃), with N = J⁻/Z diffeomorphic to S².

Each integral curve of Z can be written as the C¹-limit of a sequence (γ_i)_{i∈N} of timelike curves in M. We may interpret the γ_i as a sequence of worldlines of light sources approaching infinity. From the viewpoint of the physical spacetime (M, g), it is thus justified to interpret the integral curves of Z as “light sources at infinity”. With respect to the unphysical metric g̃, these worldlines are lightlike. With respect to the physical metric, however, they have no causal character at all, because the metric g is
not defined on J⁻. It is, thus, a misinterpretation to say that the “light sources at infinity” move at the speed of light. We shall now show that the formalism of “simple lensing neighborhoods” applies to the situation at hand. To that end, we observe that J⁻ is the boundary of M in the manifold M̃ \ J⁺. This gives rise to the following result.

Proposition 14. In the case of an asymptotically simple and empty spacetime, (M, J⁻, Z) is a simple lensing neighborhood in the spacetime (M̃ \ J⁺, g̃|_{M̃ \ J⁺}).

Proof. Condition (a) of Def. 5 is obvious from Def. 6 and Condition (b) was just established. The proof of the remaining two conditions is based on the fact that on M the g-lightlike geodesics coincide with the g̃-lightlike geodesics (up to affine parametrization). Condition (d) of Def. 5 is satisfied since every lightlike geodesic in M has past end-point on J⁻ and future end-point on J⁺. Moreover, the arrival on J^± must be transverse since J^± is g̃-lightlike. This shows that Condition (c) of Def. 5 is satisfied as well.

We can, thus, apply our results on simple lensing neighborhoods to asymptotically simple and empty spacetimes. As a first result, Prop. 11 tells us that every asymptotically simple and empty spacetime M must be contractible. This result is not new. It is well known that every asymptotically simple and empty spacetime is globally hyperbolic and, thus, homeomorphic to a product of a Cauchy surface C with the real line, M ≃ C × R, and that C is contractible. For a full proof we refer again to Newman and Clarke [17, 18]. The stronger result that C must be homeomorphic to R³ requires the assumption that the Poincaré conjecture is true (i.e., that every simply connected and compact 3-manifold is homeomorphic to S³). In addition, Prop. 11 gives us the following result.

Proposition 15. In the case of an asymptotically simple and empty spacetime, for all p ∈ M the lens map f_p : S_p → J⁻/Z ≃ S² has |deg(f_p)| = 1.
The lens map f_p for “light sources at infinity” in an asymptotically simple and empty spacetime was already discussed in Perlick [32, 33]. In particular, a proof of the result deg(f_p) = 1 was given in Theorem 6 of [32]. An equivalent statement, using a different terminology, can be found as Lemma 1 in Kozameh, Lamberti and Reula [34], together with a short proof. However, both these earlier proofs are incomplete. The proof in [32] is based on the idea of homotopically deforming f_p into the identity, but it is not shown that the construction can be made in such a way that the dependence on the deformation parameter is, indeed, continuous. In [34], the authors write the future light cone (or, equivalently, the past light cone) of a point p ∈ M as the image of a map from ]0, ∞[ × S² to M, and they assign a winding number to each of the maps from S² to M obtained by fixing the first argument s. Since a winding number has to refer to a “center”, the authors in [34] apparently take for granted that there is a timelike curve through p that has no further intersection with the light cone of p. The existence of such a curve, however, is an open question. With our Prop. 11 we have filled these gaps insofar as we have established the result deg(f_p) = ±1. However, we have not shown whether, with our choice of orientations, the occurrence of the minus sign can be ruled out. Proposition 15 implies that every observer at p sees an odd number of images of each light source at infinity that does not pass through the caustic of the past light cone of p. (Here one has to refer to the g̃-cone, which is an extension of the g-cone.) As an immediate consequence of Prop. 12, we find that a similar statement is true for light sources inside M, see Fig. 3.
Fig. 3. Illustration of Proposition 16
Proposition 16. Fix a point p and a timelike embedded C^∞ curve γ in an asymptotically simple and empty spacetime (M, g). Assume that the image of γ is a closed subset of M̃ \ J⁺ and that γ meets neither the point p nor the caustic of the past light cone of p. Then the number of past-pointing lightlike geodesics from p to γ in M is finite and odd.

Let us conclude this subsection with a few remarks on spacetimes that are asymptotically simple but not empty. For any asymptotically simple spacetime it is easy to verify that ∂M has either one or two connected components, and that all lightlike geodesics in M have their past end-point in the same connected component of ∂M. Let us denote this component by J⁻ henceforth. In order to apply our formalism of simple lensing neighborhoods, the additional assumptions needed are that J⁻ is a fiber bundle with g̃-causal fibers diffeomorphic to R over an orientable basis manifold, and that all past-inextendible lightlike geodesics in M meet J⁻ transversely. If these assumptions are satisfied, our results on simple lensing neighborhoods apply. In particular, J⁻ must be diffeomorphic to S² × R and M must be contractible. As an interesting special case, we might modify Condition (d) of Def. 6 by requiring the Ricci tensor of g to be equal to Λ g near ∂M with a positive or negative cosmological constant Λ. The resulting spacetimes are called asymptotically deSitter for Λ > 0 and asymptotically anti-deSitter for Λ < 0. It was verified already by Penrose [30] that then ∂M is g̃-spacelike for Λ > 0 and g̃-timelike for Λ < 0. Thus, the formalism of simple lensing neighborhoods is inappropriate for investigating asymptotically deSitter spacetimes, but it may be used for the investigation of asymptotically anti-deSitter spacetimes.

6.3. Weakly perturbed Robertson–Walker spacetimes.
It is a characteristic feature of the lens map, as defined in this paper, that it is constructed by following each past-pointing lightlike geodesic up to its first intersection with the source surface only. Further
intersections are ignored, i.e., some images are willfully excluded from the gravitational lensing discussion. In the preceding examples no such further intersections occurred. We shall now discuss an example where they do occur but where it is physically well motivated to disregard them. To that end we start out with a spacetime (M, g) with M = S³ × R and

\[
g = R(t)^2 \Big( - dt^2 + d\chi^2 + \sin^2\chi \, \big( d\theta^2 + \sin^2\theta \, d\phi^2 \big) \Big).
\tag{21}
\]

Here χ ∈ [0, π], θ ∈ [0, π] and φ ∈ [0, 2π] denote standard coordinates on S³ (with the usual coordinate singularities), t denotes the projection from M = S³ × R onto R, and R : R → R is a strictly positive but otherwise arbitrary C^∞ function. This is the general form of a Robertson–Walker spacetime with positive spatial curvature and natural topology which has no particle horizons. (Particle horizons are excluded by the assumption that the “conformal time” t runs over all of R.) Now fix a coordinate value χ_o ∈ ]0, π/2[ and let U denote the set of all points in M whose χ-coordinate is smaller than χ_o. Let W denote the restriction of the vector field ∂/∂t to the boundary ∂U. Then (U, ∂U, W) is a simple lensing neighborhood. This is easily verified using the fact that the lightlike geodesics in M project to the geodesics of the standard metric on S³. Our assumptions that t ranges over all of R and that χ_o < π/2 are essential to make sure that, for all p ∈ U, the lens map is defined on all of S_p. In the case at hand, the lens map f_p : S_p → ∂U/W is a global diffeomorphism for all points p ∈ U. Actually, there are infinitely many past-pointing lightlike geodesics from any fixed p ∈ U to any fixed ξ ∈ ∂U/W, but only one of them reaches ξ without having left U. All the other ones make at least a half circle around the whole universe, so they will give rise to rather faint images as a consequence of absorption in the intergalactic medium.
It is, thus, reasonable to assume that only the one image which enters into the lens map is actually visible. In this sense, disregarding all the other light rays is physically well motivated. Please note that all the infinitely many images of ξ are situated at just two points of the celestial sphere at p; the two brightest images cover all the other ones. Now this example is boring in view of gravitational lensing because the lens map is a global diffeomorphism. However, we can switch to a more interesting situation by choosing a compact subset K ⊂ S³ and modifying the metric on the set K × R. In view of Einstein’s field equation, this can be interpreted as introducing local mass concentrations that act as gravitational lens deflectors. If K × R is completely contained in U, and if the modification of the metric is sufficiently small to make sure that, even after the modification, no light rays are past- or future-trapped inside U, then U remains a simple lensing neighborhood. We have, thus, Prop. 11 at our disposal. Under the (very mild) additional assumption that, even after the perturbation, there are no closed timelike curves in U, we may also use Prop. 12. This is a line of argument to the effect that, in a Robertson–Walker spacetime of the kind considered here, any transparent gravitational lens deflector produces an odd number of visible images. The assumption that there are no particle horizons was essential since otherwise the lens map would not be defined on the whole celestial sphere for all p ∈ U. A similar argument applies, of course, to Robertson–Walker spacetimes with noncompact spatial sections. Then we do not have to care about light rays traveling around the whole universe, so there are no additional images which are ignored by the lens map.
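The statement that all but one of the light rays from p to ξ make at least a half circle around the universe, and that the images sit at just two points of the celestial sphere, can be made concrete in a special case. The sketch below assumes, purely for illustration, that the observer sits at the pole χ = 0 of S³; the geodesics of S³ through p are then great circles, and for the metric (21) the conformal time elapsed along a lightlike geodesic equals the arc length s travelled on S³. The function name and the labels "+n" and "-n" for the two initial directions are hypothetical.

```python
import math


def arrivals_at_source(chi_o, windings=3):
    """Arc lengths s at which great circles from the pole chi = 0 meet the
    point (chi_o, n) of the sphere chi = chi_o, tagged by initial direction.

    A ray starting in direction n reaches the point after s = chi_o + 2*pi*j,
    a ray starting in the opposite direction after s = 2*pi - chi_o + 2*pi*j.
    """
    if not 0.0 < chi_o < math.pi / 2:
        raise ValueError("chi_o must lie in ]0, pi/2[")
    hits = []
    for j in range(windings):
        hits.append((chi_o + 2 * math.pi * j, "+n"))
        hits.append((2 * math.pi - chi_o + 2 * math.pi * j, "-n"))
    return sorted(hits)


if __name__ == "__main__":
    for s, direction in arrivals_at_source(0.3):
        print(f"{s:8.4f}  {direction}")
```

Only the first entry corresponds to a ray that has not left U; every later arrival has arc length at least 2π − χ_o > π, in line with the "half circle" remark, and only two initial directions occur.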
References
1. Schneider, P., Ehlers, J., Falco, E.: Gravitational lenses. New York: Springer, 1992
2. Frittelli, S., Newman, E.: Phys. Rev. D 59, 124001 (1999)
3. Ehlers, J., Frittelli, S., Newman, E.: In: J. Renn (ed.), Festschrift in honor of John Stachel. Kluwer Academic Publishers, to appear 2001
4. Ehlers, J.: Annalen der Physik (Leipzig) 9, 307 (2000)
5. Palais, R.: Ann. Math. 73, 295 (1961)
6. Whitney, H.: Ann. Math. 37, 645 (1936)
7. Hirsch, M.W.: Differential topology. New York: Springer, 1976
8. Abraham, R., Marsden, J.: Foundations of mechanics. Reading, MA: Benjamin-Cummings, 1978
9. Kobayashi, S., Nomizu, K.: Foundations of differential geometry. Vol. I. New York: Wiley-Interscience, 1963
10. Harris, S.: Class. Quantum Grav. 9, 1823 (1992)
11. Beem, J., Ehrlich, P., Easley, K.: Global Lorentzian geometry. New York: Dekker, 1996
12. Choquet-Bruhat, Y., Dewitt-Morette, C., Dillard-Bleick, M.: Analysis, manifolds and physics. Amsterdam: North-Holland, 1977
13. Dold, A.: Lectures on algebraic topology. Berlin: Springer, 1980
14. Spanier, E.: Algebraic topology. New York: McGraw Hill, 1966
15. Bredon, G.E.: Topology and geometry. New York: Springer, 1993
16. McKenzie, R.H.: J. Math. Phys. 26, 1592 (1985)
17. Newman, R.P.C., Clarke, C.J.S.: Class. Quantum Grav. 4, 53 (1987)
18. Newman, R.P.C.: Commun. Math. Phys. 123, 17 (1989)
19. Frankel, T.: The geometry of physics. Cambridge: Cambridge UP, 1997
20. Wald, R.: General relativity. Chicago: University of Chicago Press, 1984
21. Hawking, S., Ellis, G.: The large scale structure of space-time. Cambridge: Cambridge UP, 1973
22. Giannoni, F., Masiello, A., Piccione, P.: Commun. Math. Phys. 187, 375 (1997)
23. Giannoni, F., Masiello, A., Piccione, P.: Ann. Inst. H. Poincaré, Physique Theoretique 69, 359 (1998)
24. Vilenkin, A.: Phys. Rev. D 23, 852 (1981)
25. Hiscock, W.: Phys. Rev. D 31, 3288 (1985)
26. Gott, J.R.: Astrophys. J. 288, 422 (1985)
27. Virbhadra, K.S., Ellis, G.F.R.: Phys. Rev. D 62, 084003 (2000)
28. Chandrasekhar, S.: The mathematical theory of black holes. Oxford: Oxford UP, 1983
29. Kling, T., Newman, E.T.: Phys. Rev. D 59, 124002 (1999)
30. Penrose, R.: In: deWitt, C. M., deWitt, B. (eds.): Relativity, groups and topology. Les Houches Summer School 1963. New York: Gordon and Breach, 1964, p. 565
31. Geroch, R.: In: Sachs, R. K. (ed.): General relativity and cosmology. Enrico Fermi School, Course XLVII. New York: Academic Press, 1971, pp. 71–103
32. Perlick, V.: In: Schmidt, B. (ed.): Einstein’s field equations and their physical implications. Heidelberg: Springer, 2000
33. Perlick, V.: Ann. Physik (Leipzig) 9, SI–139 (2000)
34. Kozameh, C., Lamberti, P.W., Reula, O.: J. Math. Phys. 32, 3423 (1991)

Communicated by H. Nicolai
Commun. Math. Phys. 220, 429 – 451 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
On the Characteristic Polynomial of a Random Unitary Matrix
C. P. Hughes¹,², J. P. Keating¹,², Neil O’Connell¹
¹ BRIMS, Hewlett-Packard Labs, Bristol, BS34 8QZ, UK
² School of Mathematics, University of Bristol, University Walk, Bristol, BS8 1TW, UK
Received: 27 June 2000 / Accepted: 30 January 2001
Abstract: We present a range of fluctuation and large deviations results for the logarithm of the characteristic polynomial Z of a random N × N unitary matrix, as N → ∞. First we show that ln Z/√((1/2) ln N), evaluated at a finite set of distinct points, is asymptotically a collection of i.i.d. complex normal random variables. This leads to a refinement of a recent central limit theorem due to Keating and Snaith, and also explains the covariance structure of the eigenvalue counting function. Next we obtain a central limit theorem for ln Z in a Sobolev space of generalised functions on the unit circle. In this limiting regime, lower-order terms which reflect the global covariance structure are no longer negligible and feature in the covariance structure of the limiting Gaussian measure. Large deviations results for ln Z/A, evaluated at a finite set of distinct points, can be obtained for √(ln N) ≪ A ≪ ln N. For higher-order scalings we obtain large deviations results for ln Z/A evaluated at a single point. There is a phase transition at A = ln N (which only applies to negative deviations of the real part) reflecting a switch from global to local conspiracy.

1. Introduction and Summary
Let U be an N × N unitary matrix, chosen uniformly at random from the unitary group U(N), and denote its eigenvalues by exp(iθ_1), …, exp(iθ_N). In order to develop a heuristic understanding of the value distribution and moments of the Riemann zeta function, Keating and Snaith [21] considered the characteristic polynomial (normalised so that its logarithm has zero mean)

\[
Z(\theta) = \det\big(I - U e^{-i\theta}\big) = \prod_{n=1}^{N} \Big(1 - e^{i(\theta_n - \theta)}\Big).
\tag{1.1}
\]

This is believed to be a good statistical model for the zeta function at (large but finite) height T up the critical line when the mean density of the non-trivial zeros (which equals
(1/2π) ln(T/2π)) is set equal to the mean density of eigenangles (which is N/2π). (For additional evidence of this, concerning other statistics, see [9].) Note that the law of Z(θ) is independent of θ ∈ T (the unit circle). In [21] it is shown that as N → ∞, ln Z(0)/σ converges in distribution to a standard complex normal random variable, where 2σ² = ln N. That is,

\[
\frac{\ln Z(0)}{\sqrt{\tfrac{1}{2} \ln N}} \Rightarrow X + iY,
\tag{1.2}
\]
where X and Y are independent normal random variables with mean zero and variance one¹, and ⇒ denotes convergence in distribution. (A similar result can be found in [2], but there the real and imaginary parts of ln Z/σ are treated separately.) In order to make the imaginary part of the logarithm well-defined, the branch is chosen so that

\[
\ln Z(\theta) = \sum_{n=1}^{N} \ln\Big(1 - e^{i(\theta_n - \theta)}\Big)
\tag{1.3}
\]

and

\[
-\tfrac{1}{2}\pi < \operatorname{Im} \ln\Big(1 - e^{i(\theta_n - \theta)}\Big) \le \tfrac{1}{2}\pi.
\tag{1.4}
\]
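The branch convention (1.3)–(1.4) is the one a principal-value complex logarithm delivers automatically, because 1 − e^{ix} always has non-negative real part. The following minimal sketch illustrates this; the eigenangles are arbitrary illustrative numbers rather than a sample from U(N), the function names are hypothetical, and the closed form Im ln(1 − e^{ix}) = (x − π)/2 for 0 < x < 2π is an elementary fact that is not stated in the text.

```python
import cmath
import math


def log_factor(x):
    """Principal-branch ln(1 - e^{ix}); x must not be a multiple of 2*pi."""
    return cmath.log(1 - cmath.exp(1j * x))


def log_Z(eigenangles, theta):
    """ln Z(theta) as the sum (1.3) of principal-branch logarithms."""
    return sum(log_factor(t - theta) for t in eigenangles)


if __name__ == "__main__":
    eigenangles = [0.4, 1.7, 3.0, 5.1]  # illustrative values only
    theta = 0.9
    value = log_Z(eigenangles, theta)
    product = 1
    for t in eigenangles:
        product *= 1 - cmath.exp(1j * (t - theta))
    # exp of the sum of logarithms reproduces the product in (1.1)
    print(abs(cmath.exp(value) - product))
    # each term has imaginary part within [-pi/2, pi/2], as required by (1.4)
    print(max(abs(log_factor(t - theta).imag) for t in eigenangles) <= math.pi / 2)
```

Note that Im ln Z(θ) defined in this way is a sum of N terms and is therefore not confined to a single interval of length π.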
Compare the above central limit theorem with a central limit theorem, due to Selberg, for the value distribution of the log of the Riemann zeta function along the critical line. Selberg proved (see, for example, §2.11 of [24] or §4 of [22]) that, for rectangles B ⊆ C,

\[
\lim_{T \to \infty} \frac{1}{T} \left| \left\{ T \le t \le 2T : \frac{\ln \zeta(\tfrac{1}{2} + it)}{\sqrt{\tfrac{1}{2} \ln\ln T}} \in B \right\} \right| = \frac{1}{2\pi} \iint_{B} e^{-(x^2 + y^2)/2} \, dx \, dy.
\tag{1.5}
\]
Equating the mean density of the Riemann zeros at height T with the mean density of eigenangles of an N × N unitary matrix, we have N = ln(T/2π) and thus we see that these two central limit theorems are consistent.

In this paper we obtain more detailed fluctuation theorems for ln Z as N → ∞, and a range of large and moderate deviations results. First we show that ln Z/σ, evaluated at a finite set of distinct points, is asymptotically a collection of i.i.d. complex normal random variables. This leads to a refinement of the above central limit theorem, and also explains the mysterious covariance structure which has been observed, by Costin and Lebowitz [10] and Wieand [32, 33], in the eigenvalue counting function. We also obtain a central limit theorem for ln Z in a Sobolev space of generalised functions on the unit circle. In this limiting regime, lower-order terms which reflect the global covariance structure are no longer negligible and feature in the covariance structure of the limiting Gaussian measure. The limiting process is not in L²(T). It is, however, when integrated, Hölder continuous with parameter 1 − δ, for any δ > 0.

Large deviations results for ln Z/A, evaluated at a finite set of distinct points, are obtained for √(ln N) ≪ A ≪ ln N. For higher-order scalings we obtain large deviations results for ln Z/A evaluated at a single point. For the imaginary part, all scalings A ≪ N lead to quadratic rate functions. At A = N, the speed is N², and the rate function is a convex function for which we give an explicit formula. For the real part, only scalings up to A = ln N lead to quadratic rate functions. At this critical scaling one observes a phase transition, and beyond it deviations to the left and right occur at different speeds. For deviations to the left, the rate function becomes linear; for deviations to the right, the rate function remains quadratic up to but not including the scaling A = N. At the scaling A = N, deviations to the left occur at speed N, while deviations to the right occur at speed N², and the rate function is again a convex function for which we give an explicit formula. The phase transition reflects a switch from global to local conspiracy.

Related fluctuation theorems for random matrices can be found in [10, 13, 12, 19, 14, 27] and references therein. In particular, Diaconis and Evans [12] give an alternative proof of Theorem 2.2 below. The large deviation results at speed N² are partially consistent with (but do not follow from) a higher-level large deviation principle due to Hiai and Petz [16]. High-level large deviations results and concentration inequalities for other ensembles can be found in [5, 6, 15].

¹ Perhaps we should warn the reader at this point that some authors use the term “standard complex normal” to refer to the case where the variance of each component is 1/2.
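The consistency between (1.2) and Selberg's theorem (1.5) under the matching N = ln(T/2π) comes down to comparing the two variances (1/2) ln N and (1/2) ln ln T. The short sketch below does this arithmetic for a given height T; the function names and the sample height are illustrative, and N is treated as a real number rather than rounded to an integer matrix size.

```python
import math


def matched_matrix_size(T):
    """N such that N / (2 pi) equals the zero density (1 / (2 pi)) ln(T / (2 pi))."""
    return math.log(T / (2 * math.pi))


def variances(T):
    """Return ((1/2) ln N from (1.2), (1/2) ln ln T from (1.5)) at height T."""
    N = matched_matrix_size(T)
    return 0.5 * math.log(N), 0.5 * math.log(math.log(T))


if __name__ == "__main__":
    T = 1e20  # illustrative height on the critical line
    print(matched_matrix_size(T))
    print(variances(T))
```

Since ln N = ln(ln T − ln 2π), the two variances differ by a quantity that tends to zero as T → ∞, which is the sense in which the two central limit theorems agree.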
2. Fluctuation Results

Our first main result is that the law of ln Z(0) obtained by averaging over the unitary group is asymptotically the same as the value distribution of ln Z(θ) obtained by averaging over θ for a typical realisation of U:

Theorem 2.1. Set W_N(θ) = ln Z(θ)/σ, and denote by m the uniform probability measure on T (so that m(dθ) = dθ/2π). As N → ∞, the sequence of laws m ∘ W_N^{-1} converges weakly in probability to a standard complex normal variable.

This will follow from Theorem 2.2 below, so we defer the proof. Theorem 2.1 hints at the possibility that the t-range in (1.5) can be significantly reduced.

The characteristic polynomial can also be used to explain the mysterious “white noise” process which appears in recent work of Wieand [32, 33] on the counting function (and less explicitly in earlier work of Costin and Lebowitz [10]). A Gaussian process is defined to be a collection of real (complex) random variables {X(α), α ∈ I}, with the property that, for any α_1, …, α_m, the joint distribution of X(α_1), …, X(α_m) is multivariate (complex) normal. For −π < s < t ≤ π, let C_N(s, t) denote the number of eigenangles of U that lie in the interval (s, t). Wieand proves that the finite dimensional distributions of the process C̃_N defined by

\[
\tilde{C}_N(s, t) = \frac{C_N(s, t) - (t - s) N / 2\pi}{\frac{1}{\pi} \sqrt{\tfrac{1}{2} \ln N}}
\tag{2.1}
\]

converge to those of a Gaussian process C which can be realised in the following way: let Y be a centered Gaussian process indexed by T with covariance function E Y(s)Y(t) = 1_{s=t} (where 1 is the indicator function) and set C(s, t) = Y(t) − Y(s). What is the origin of this process Y? The answer is as follows. First, it is not hard to show that for each N,

\[
\tilde{C}_N(s, t) = Y_N(t) - Y_N(s),
\tag{2.2}
\]
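where Y_N(θ) = Im ln Z(θ)/σ. This follows from the identity

\[
\mathbf{1}_{\{\theta \in (s, t)\}} = \frac{t - s}{2\pi} + \frac{1}{\pi} \operatorname{Im} \ln\big(1 - e^{i(\theta - t)}\big) - \frac{1}{\pi} \operatorname{Im} \ln\big(1 - e^{i(\theta - s)}\big),
\tag{2.3}
\]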
where, as always, the principal branch of the logarithm is chosen as in (1.4). Moreover:

Theorem 2.2. Set W_N(θ) = ln Z(θ)/σ. If r_1, …, r_k ∈ T are distinct, the joint law of (W_N(r_1), …, W_N(r_k)) converges as N → ∞ to that of k i.i.d. standard complex normal random variables.

In particular, the finite dimensional distributions of Y_N converge to those of Y. This suggests that the analogous extension of Selberg’s theorem (1.5) might hold for the zeta function.

Proof. Let f be a real-valued function in L¹(T), and denote by

\[
\hat{f}_k = \int_0^{2\pi} f(\theta) e^{-ik\theta} \, m(d\theta)
\tag{2.4}
\]

its Fourier coefficients. The N-th order Toeplitz determinant with symbol f is defined by

\[
D_N[f] = \det\big(\hat{f}_{j-k}\big)_{1 \le j, k \le N}.
\tag{2.5}
\]

Heine’s identity (see, for example, [28]) states that

\[
D_N[f] = E \prod_{n=1}^{N} f(\theta_n).
\tag{2.6}
\]
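Heine's identity (2.6) can be checked by hand for small N. The sketch below does so numerically for N = 2, under the assumption (standard, but not stated in the text) that the eigenangles of a Haar-distributed matrix in U(N) have joint density proportional to the squared Vandermonde factor ∏_{j<k} |e^{iθ_j} − e^{iθ_k}|², normalised by (2π)^N N!. The symbol f(θ) = 1 + cos(θ)/2 and all function names are illustrative choices.

```python
import cmath
import math


def fourier_coefficient(f, k, grid=64):
    """hat f_k = int f(theta) e^{-ik theta} m(d theta), by an equispaced rule."""
    total = 0j
    for j in range(grid):
        theta = 2 * math.pi * j / grid
        total += f(theta) * cmath.exp(-1j * k * theta)
    return total / grid


def toeplitz_det_2(f):
    """D_2[f] = det(hat f_{j-k}) for 1 <= j, k <= 2, as in (2.5)."""
    f0 = fourier_coefficient(f, 0)
    f1 = fourier_coefficient(f, 1)
    fm1 = fourier_coefficient(f, -1)
    return (f0 * f0 - fm1 * f1).real


def haar_average_2(f, grid=64):
    """E f(theta_1) f(theta_2) over U(2), via the assumed eigenangle density."""
    total = 0.0
    for j in range(grid):
        t1 = 2 * math.pi * j / grid
        for k in range(grid):
            t2 = 2 * math.pi * k / grid
            vandermonde_sq = abs(cmath.exp(1j * t1) - cmath.exp(1j * t2)) ** 2
            total += f(t1) * f(t2) * vandermonde_sq
    # (1 / ((2 pi)^2 2!)) times the grid cell area (2 pi / grid)^2
    return total / (2 * grid * grid)


if __name__ == "__main__":
    f = lambda theta: 1.0 + 0.5 * math.cos(theta)
    print(toeplitz_det_2(f), haar_average_2(f))
```

For a trigonometric polynomial such as this f, the equispaced rule is exact up to rounding, so the two printed numbers should agree to machine precision; here both sides equal 1 − (1/4)² = 15/16.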
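The following lemma is more general than we need here, but we record it for later reference.

Lemma 2.3. For any d(N) ≫ 1 as N → ∞, s, t ∈ R^k with N sufficiently large such that s_j > −d(N) for all j, and r_j distinct in T,

\begin{align}
& E \exp\Big( \sum_{j=1}^{k} \big( s_j \operatorname{Re} \ln Z(r_j)/d + t_j \operatorname{Im} \ln Z(r_j)/d \big) \Big) \tag{2.7} \\
& \qquad \sim \prod_{j=1}^{k} E \exp\big( s_j \operatorname{Re} \ln Z(r_j)/d + t_j \operatorname{Im} \ln Z(r_j)/d \big) \tag{2.8} \\
& \qquad \sim \exp\Big( \frac{\ln N}{4 d^2} \sum_{j=1}^{k} \big( s_j^2 + t_j^2 \big) \Big). \tag{2.9}
\end{align}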
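To see why Lemma 2.3 is the right tool for Theorem 2.2, it helps to evaluate the right-hand side (2.9) at d = σ = √((1/2) ln N): the factor ln N/(4d²) becomes 1/2, and (2.9) turns into the moment generating function of 2k independent standard normal variables. The sketch below checks this arithmetic; the function names and the sample values of N, s and t are illustrative.

```python
import math


def lemma_rhs(N, d, s, t):
    """Right-hand side of (2.9): exp((ln N / (4 d^2)) * sum_j (s_j^2 + t_j^2))."""
    quad = sum(sj * sj + tj * tj for sj, tj in zip(s, t))
    return math.exp(math.log(N) * quad / (4.0 * d * d))


def gaussian_mgf(s, t):
    """Moment generating function of 2k independent standard normals,
    evaluated at (s_1, t_1, ..., s_k, t_k)."""
    return math.exp(0.5 * sum(sj * sj + tj * tj for sj, tj in zip(s, t)))


if __name__ == "__main__":
    N = 1000
    sigma = math.sqrt(0.5 * math.log(N))
    s, t = [0.3, -0.7], [1.1, 0.2]
    print(lemma_rhs(N, sigma, s, t), gaussian_mgf(s, t))
```

For larger scalings d the same expression gives the quadratic behaviour that underlies the moderate deviations regime mentioned in the introduction.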
Proof. This follows from Heine’s identity and a result of Basor [4] on the asymptotic behaviour of Toeplitz determinants with Fisher–Hartwig symbols. The Fisher–Hartwig symbol we require has the form

\[
f(\theta) = \prod_{j=1}^{k} \big(1 - e^{i(\theta - r_j)}\big)^{\alpha_j + \beta_j} \big(1 - e^{i(r_j - \theta)}\big)^{\alpha_j - \beta_j}.
\tag{2.10}
\]
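Taking α_j = s_j/2d and β_j = −i t_j/2d, we have, by Heine’s identity,

\[
E \exp\Big( \sum_{j=1}^{k} \big( s_j \operatorname{Re} \ln Z(r_j)/d + t_j \operatorname{Im} \ln Z(r_j)/d \big) \Big) = D_N[f].
\tag{2.11}
\]

Note that the α_j’s are real and the β_j’s purely imaginary. Basor [4] proves that, as N → ∞, for r_j distinct,

\[
D_N[f] \sim E(\alpha_1, \beta_1, r_1, \ldots, \alpha_k, \beta_k, r_k) \prod_{j=1}^{k} N^{\alpha_j^2 - \beta_j^2},
\tag{2.12}
\]

for α_j > −1/2, where

\[
E(\alpha_1, \beta_1, r_1, \ldots, \alpha_k, \beta_k, r_k) = \prod_{\substack{1 \le m, n \le k \\ m \ne n}} \big(1 - e^{i(r_m - r_n)}\big)^{-(\alpha_m - \beta_m)(\alpha_n + \beta_n)} \times \prod_{j=1}^{k} \frac{G(1 + \alpha_j + \beta_j) \, G(1 + \alpha_j - \beta_j)}{G(1 + 2\alpha_j)},
\tag{2.13}
\]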
where $G$ is the Barnes $G$-function, and $\bigl|\arg\bigl(1-e^{i(r_m-r_n)}\bigr)\bigr| \le \pi/2$. By closer inspection of the proof given in [4] it can be seen that (2.12) holds uniformly for $|\alpha_j| < 1/2-\delta$ and $|\beta_j| < \gamma$, for any fixed $\delta,\gamma > 0$.$^2$ We remark that uniformity in $\beta$ is worked out carefully in [32] for the case $\alpha_j = 0$ for each $j$, and uniformity in $\alpha$ is discussed in [4]. The statement of the lemma follows from noting that $E(0,0,r_1,\dots,0,0,r_k) = 1$. Setting $d = \sqrt{\tfrac12\ln N} = \sigma$ completes the proof of Theorem 2.2.

$^2$ This was pointed out to us by Harold Widom.

Proof of Theorem 2.1. Set $X_N(\theta) = \operatorname{Re}\ln Z(\theta)/\sigma$, $Y_N(\theta) = \operatorname{Im}\ln Z(\theta)/\sigma$ and
$$\phi_N(s,t) = \int_{\mathbb{T}}\exp\bigl(sX_N(\theta)+tY_N(\theta)\bigr)\,m(d\theta). \tag{2.14}$$
By the central limit theorem derived in [21] (which we note, in passing, also follows from Theorem 2.2),
$$E\phi_N(s,t) = E\exp\bigl(sX_N(0)+tY_N(0)\bigr) \to e^{(s^2+t^2)/2}. \tag{2.15}$$
We also have
$$E\phi_N(s,t)^2 = \int_{\mathbb{T}} E\exp\bigl(sX_N(\theta)+tY_N(\theta)+sX_N(0)+tY_N(0)\bigr)\,m(d\theta). \tag{2.16}$$
By Cauchy–Schwarz, the integrand is bounded above by
$$\sup_{N\ge N_0} E\exp\bigl(2sX_N(0)+2tY_N(0)\bigr), \tag{2.17}$$
where $N_0$ is chosen such that $2s > -\sigma(N_0)$. Thus, by Theorem 2.2 and the bounded convergence theorem,
$$E\phi_N(s,t)^2 \to e^{s^2+t^2}, \tag{2.18}$$
and hence
$$P\bigl(|\phi_N(s,t) - e^{(s^2+t^2)/2}| > \epsilon\bigr) \le \operatorname{Var}\phi_N(s,t)/\epsilon^2 \to 0, \tag{2.19}$$
for any $\epsilon > 0$, by Chebyshev's inequality. Thus, for each $s,t$, the sequence $\phi_N(s,t)$ converges in probability to $e^{(s^2+t^2)/2}$. The result now follows from the fact that moment generating functions are convergence-determining.

We note that Szegö's asymptotic formula for Toeplitz determinants does not apply in the above context. Szegö's theorem for real-valued functions states that if
$$A(h) = \sum_{k=1}^{\infty} k|\hat h_k|^2 < \infty, \tag{2.20}$$
then
$$D_N[e^h] = \exp\bigl(N\hat h_0 + A(h) + o(1)\bigr) \tag{2.21}$$
as $N\to\infty$. Combining this with Heine's identity, we see that if $\hat h_0 = 0$ and $A(h) < \infty$, then $\operatorname{Tr}h(U)$ is asymptotically normal with zero mean and variance $2A(h)$. Now, we can write $\operatorname{Re}\ln Z(\theta) = \operatorname{Tr}h(U)$, where $h(t) = \operatorname{Re}\ln(1-e^{i(t-\theta)})$, but the Fourier coefficients $\hat h_k$ are of order $1/k$ in this case and $A(h) = +\infty$. We can, however, apply Szegö's theorem to obtain a functional central limit theorem for $\ln Z$. Actually, we will use the following fact, due to Diaconis and Shahshahani [13], which can be deduced from Szegö's theorem.

Lemma 2.4. For each $l$, the collection of random variables
$$\sqrt{\frac{2}{j}}\operatorname{Tr}U^{-j},\qquad j = 1,\dots,l \tag{2.22}$$
converges in distribution to a collection of i.i.d. standard complex normal random variables.

(In fact, it is shown in [13] that there is exact agreement of moments up to high order for each $N$. See also [18], where superexponential rates of convergence are established.)

Denote by $H_0^a$ the space of generalised real-valued functions $f$ on $\mathbb{T}$ with $\hat f_0 = 0$ and
$$\|f\|_a^2 = \sum_{k=-\infty}^{\infty}|k|^{2a}|\hat f_k|^2 = 2\sum_{k=1}^{\infty}k^{2a}|\hat f_k|^2 < \infty. \tag{2.23}$$
This is a Hilbert space with the inner product
$$\langle f,g\rangle_a = \sum_{k=-\infty}^{\infty}|k|^{2a}\hat f_k\hat g_k^*. \tag{2.24}$$
It is also a closed subspace of the Sobolev space $H^a$, which is defined similarly but without the restriction $\hat f_0 = 0$. Sobolev spaces have the following useful property: the unit ball in $H^a$ is compact in $H^b$, whenever $a > b$. It follows that the unit ball in $H_0^a$ is compact in $H_0^b$ for $a > b$. We shall make use of this fact later. Note that, when $a = 0$, $\langle\cdot,\cdot\rangle_a$ is just the usual inner product on $L^2(\mathbb{T})$; in this case we will drop the subscript.

Fix $a < 0$, and define a Gaussian measure $\mu$ on $H_0^a\times H_0^a$ as follows. First, let $X_1, X_2,\dots$ be a sequence of i.i.d. standard complex normal random variables, $X_0 = 0$ and $X_{-k} = X_k^*$, and define a random element $F\in H_0^a$ by
$$F(\theta) = \sum_{k=-\infty}^{\infty}\frac{X_k}{2\sqrt{2|k|}}\,e^{ik\theta}. \tag{2.25}$$
(To see that $F\in H_0^a$, note that in fact $E\|F\|_a^2 < \infty$ for $a < 0$.) Now define $\mu$ to be the law of $(F, AF)$, where $A$ is the Hilbert transform on $H_0^{-1/2}$:
$$\widehat{(Af)}_k = \begin{cases} i\hat f_k & k > 0\\ -i\hat f_k & k < 0.\end{cases} \tag{2.26}$$
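We will describe some properties of $\mu$ later. First we will prove:

Theorem 2.5. The law of $(\operatorname{Re}\ln Z, \operatorname{Im}\ln Z)$ converges weakly to $\mu$.

Proof. First note that $\operatorname{Im}\ln Z = A(\operatorname{Re}\ln Z)$ and, for $k\ne 0$,
$$\widehat{(\operatorname{Re}\ln Z)}_k = \frac{-\operatorname{Tr}U^{-k}}{2|k|}. \tag{2.27}$$
Convergence on cylinder sets (in the Fourier representation) therefore follows from Lemma 2.4. To prove tightness in $H_0^a\times H_0^a$ we will use the fact that the unit ball in $H_0^b$ is compact in $H_0^a$ for $a < b < 0$, and the uniform bound
$$E\|\operatorname{Re}\ln Z\|_b^2 = 2E\sum_{k=1}^{\infty}k^{2b}\bigl|\widehat{(\operatorname{Re}\ln Z)}_k\bigr|^2 \tag{2.28}$$
$$= \frac12\sum_{k=1}^{\infty}k^{2b-2}E|\operatorname{Tr}U^{-k}|^2 \tag{2.29}$$
$$= \frac12\sum_{k=1}^{\infty}k^{2b-2}\min(k,N) \tag{2.30}$$
$$\le \frac12\sum_{k=1}^{\infty}k^{2b-1}; \tag{2.31}$$
a similar bound holds for $\operatorname{Im}\ln Z$. We have used the fact that $E|\operatorname{Tr}U^k|^2 = \min(|k|,N)$ for $k\ne 0$ (see, for example, [25]). Thus,
$$\sup_N P\bigl(\max\{\|\operatorname{Re}\ln Z\|_b, \|\operatorname{Im}\ln Z\|_b\} > q\bigr) \tag{2.32}$$
$$\le \sup_N\bigl\{P(\|\operatorname{Re}\ln Z\|_b > q) + P(\|\operatorname{Im}\ln Z\|_b > q)\bigr\} \tag{2.33}$$
$$\le \sup_N\bigl(E\|\operatorname{Re}\ln Z\|_b^2 + E\|\operatorname{Im}\ln Z\|_b^2\bigr)/q^2 \tag{2.34}$$
$$\to 0 \tag{2.35}$$
as $q\to\infty$, so we are done.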
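The trace identity $E|\operatorname{Tr}U^k|^2 = \min(|k|,N)$ used in the tightness bound can be checked exactly for small $N$. The sketch below is an illustration only: it assumes the standard Weyl density $\frac{1}{N!}\prod_{j<l}|z_j-z_l|^2$ for the eigenvalues $z_j = e^{i\theta_j}$ with respect to normalised Lebesgue measure, and the helper names are hypothetical. Because the integrand is a trigonometric polynomial of degree at most $N-1+k$ in each angle, an equally spaced grid with $N+k+1$ nodes integrates it without discretisation error.

```python
import cmath
import math
from itertools import combinations, product


def haar_eigenvalue_average(func, n, grid):
    """Average func(z_1, ..., z_n) against the assumed Weyl density on an equally spaced grid."""
    points = [cmath.exp(2j * math.pi * j / grid) for j in range(grid)]
    total = 0.0
    for zs in product(points, repeat=n):
        weight = 1.0
        for a, b in combinations(zs, 2):
            weight *= abs(a - b) ** 2  # squared Vandermonde factor
        total += weight * func(zs)
    return total / (math.factorial(n) * grid ** n)


def trace_power_second_moment(k, n):
    """E |Tr U^k|^2 for an n x n unitary matrix, via exact grid quadrature."""
    grid = n + k + 1  # more nodes than the largest frequency n - 1 + k
    return haar_eigenvalue_average(lambda zs: abs(sum(z ** k for z in zs)) ** 2, n, grid)


for k in range(1, 6):
    print(k, round(trace_power_second_moment(k, 3), 10), min(k, 3))
```

The brute-force sum has `grid ** n` terms, so this is only practical for very small `n`; it is meant as a check of the formula, not as a way to simulate large matrices.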
We will now discuss some properties of the limiting measure $\mu$. Let $(F, AF)$ be a realisation of $\mu$. First note that $F$ and $AF$ have the same law. Recalling the construction of $F$, we note that for $k > 0$ the random variables $|\hat F_k|^2$ are independent and $|\hat F_k|^2$ is exponentially distributed with mean $1/4k$. It follows that $\|F\|_a < \infty$ if, and only if, $a < 0$. In particular, $F$ is almost surely not in $L^2(\mathbb{T})$. Nevertheless, we can characterise the law of $F$ by stating that, for $f\in H_0^{-1/2}$,
$$2\langle f,F\rangle/\|f\|_{-1/2} \tag{2.36}$$
is a standard normal random variable. The covariance is given by
$$E\langle f,F\rangle\langle g,F\rangle = \frac14\langle f,g\rangle_{-1/2}. \tag{2.37}$$
We note that
$$\langle f,g\rangle_{-1/2} = -2\int_{\mathbb{T}^2}\ln|e^{i\theta}-e^{i\phi}|\,f(\phi)g(\theta)\,m(d\phi)\,m(d\theta). \tag{2.38}$$
In the language of potential theory, if $f$ is a charge distribution, then $\|f\|_{-1/2}^2$ is the logarithmic energy of $f$. The logarithmic energy functional also shows up as a large deviation rate function for the sequence of eigenvalue distributions: see Sect. 3.5 below.

We can also write down a stochastic integral representation for the process $F$. If we set
$$S(\phi) = \int_0^{\phi}F(\theta)\,d\theta, \tag{2.39}$$
then $S$ has the same law as the process
$$\phi\mapsto\frac{1}{2\pi}\int_0^{2\pi}b(\phi-\theta)\,dB(\theta), \tag{2.40}$$
where $B$ is a standard Brownian motion and
$$b(\theta) = \frac{1}{\sqrt{8\pi}}\sum_{k=1}^{\infty}k^{-3/2}\cos(k\theta). \tag{2.41}$$
To see this, compare covariances using the identity
$$\frac14\Bigl\|\mathbf{1}_{[0,t]}-\frac{t}{2\pi}\Bigr\|_{-1/2}^2 = \frac{1}{4\pi^2}\sum_{k=1}^{\infty}\frac{1-\cos(kt)}{k^3}. \tag{2.42}$$
Finally, we observe:
Lemma 2.6. Let $\delta > 0$. The process $S$ has a modification which is almost surely Hölder continuous with parameter $1-\delta$.

Proof. This follows from Kolmogorov's criterion (see, for example, [26, Theorem 2.1]) and the fact that
$$E|S(t)-S(0)|^2 = \frac14\Bigl\|\mathbf{1}_{[0,t]}-\frac{t}{2\pi}\Bigr\|_{-1/2}^2 \tag{2.43}$$
$$= \frac{1}{4\pi^2}\sum_{k=1}^{\infty}\frac{1-\cos(kt)}{k^3} \tag{2.44}$$
$$\sim -\frac{1}{8\pi^2}\,t^2\ln t, \tag{2.45}$$
as $t\to 0^+$. To see that this asymptotic formula is valid, one can use the fact that the expression (2.44) is related to Clausen's integral (see, for example, [1, §27.8]).

We conclude this section with two remarks on Theorem 2.5. First, Rains [25] showed that, for each $\theta\ne 0$,
$$\operatorname{Var}C_N(0,\theta) = \frac{1}{\pi^2}\bigl(\ln N+\gamma+1+\ln|2\sin(\theta/2)|\bigr)+o(1), \tag{2.46}$$
where $C_N(0,\theta)$ is the number of eigenangles lying in the interval $(0,\theta)$. Comparing this with (2.2) we see that
$$E\operatorname{Im}\ln Z(\theta)\operatorname{Im}\ln Z(0) = -\frac12\ln|2\sin(\theta/2)|+o(1). \tag{2.47}$$
This is consistent with the fact that (formally)
$$EF(\theta)F(0) = -\frac12\ln|2\sin(\theta/2)|. \tag{2.48}$$
The formal identity (2.48) in fact contains all of the information needed to determine the covariance structure of the process $F$. The fluctuation theorem (Theorem 2.5) is therefore a statement which contains information about the global covariance structure of $\ln Z$. The covariance (2.47) is too small to feature in the scaling of Theorem 2.2.

Finally, the following observation arose in discussions with Marc Yor. The process $F$ also appears in the following context. Let $B$ be a standard complex Brownian motion, and $f:\mathbb{C}\to\mathbb{R}$ defined by $f(z) = h(\arg z)\,\delta(|z|=1)$ for some $h:\mathbb{T}\to\mathbb{R}$ with $\hat h_0 = 0$. Then, as $t\to\infty$,
$$\frac{1}{\sqrt{\pi\ln t}}\int_0^t f(B_s)\,ds \Rightarrow \langle h,F\rangle. \tag{2.49}$$
This can be deduced from a result of Kasahara and Kotani given in [20].
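The formal covariance (2.48) can be compared with the Fourier construction of $F$. Since $E|\hat F_k|^2 = 1/4k$ for $k>0$ and $F$ is real, one formally gets $EF(\theta)F(0) = \sum_{k\ge1}\cos(k\theta)/2k$, and the classical series $\sum_{k\ge1}\cos(k\theta)/k = -\ln|2\sin(\theta/2)|$ gives (2.48). The sketch below checks this numerically; the function names and the truncation level are illustrative assumptions, and the series converges only conditionally, so the agreement is approximate.

```python
import math


def covariance_partial_sum(theta, terms=200_000):
    """Truncation of sum_{k>=1} cos(k theta) / (2k), the formal Fourier series of E F(theta) F(0)."""
    return math.fsum(math.cos(k * theta) / (2 * k) for k in range(1, terms + 1))


def covariance_closed_form(theta):
    """Right-hand side of (2.48)."""
    return -0.5 * math.log(abs(2 * math.sin(theta / 2)))


for theta in (0.5, 1.0, 2.0, 3.0):
    print(theta, covariance_partial_sum(theta), covariance_closed_form(theta))
```

The truncation error is of order $1/(K|\sin(\theta/2)|)$ for $K$ terms, so the check degrades as $\theta\to 0$, where the covariance has its logarithmic singularity.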
3. Large Deviations

In this section we present large and moderate deviations results for $\ln Z(0)$. We begin with a quick review of one-dimensional large deviation theory (see, for example, [8, 11]). We are concerned with the log-asymptotics of the probability distribution of $R_N/A(N)$, where $R_N$ is some one-dimensional real random variable and $A(N)$ is a scaling that is much greater than the square root of the variance of $R_N$ (so we are outside the regime of the central limit theorem). Suppose that there exists a function $B(N)$ (which tends to infinity as $N\to\infty$), such that
$$\Lambda(\lambda) := \lim_{N\to\infty}\frac{1}{B(N)}\ln E\exp\Bigl(\lambda\,\frac{B(N)}{A(N)}\,R_N\Bigr) \tag{3.1}$$
exists as an extended real number, for each $\lambda$ (i.e. the pointwise limit exists in the extended reals). The effective domain of $\Lambda(\cdot)$ is the set
$$\mathcal{D} = \{\lambda\in\mathbb{R} : \Lambda(\lambda) < \infty\} \tag{3.2}$$
and its interior is denoted by $\mathcal{D}^\circ$. The convex dual of $\Lambda(\cdot)$ is given by
$$\Lambda^*(x) = \sup_{\lambda\in\mathbb{R}}\{\lambda x-\Lambda(\lambda)\}. \tag{3.3}$$

Theorem 3.1. For $a < b$, if $\Lambda(\cdot)$ is differentiable in $\mathcal{D}^\circ$ and if
$$(a,b)\subseteq\{\Lambda'(\lambda) : \lambda\in\mathcal{D}^\circ\}, \tag{3.4}$$
then
$$\lim_{N\to\infty}\frac{1}{B(N)}\ln P\Bigl(\frac{R_N}{A(N)}\in(a,b)\Bigr) = -\inf_{x\in(a,b)}\Lambda^*(x). \tag{3.5}$$
If (3.5) holds we say that $R_N/A(N)$ satisfies the large deviation principle (LDP) with speed $B(N)$ and rate function $\Lambda^*$.

Some partial moderate deviations results can be obtained using Lemma 2.3; however, for many of the results presented here we will need more detailed information. In particular, we will make use of the following explicit formula (see, for example, [2, 7, 21]):
$$E\exp\bigl(s\operatorname{Re}\ln Z(\theta)+t\operatorname{Im}\ln Z(\theta)\bigr) = \frac{G(1+s/2+it/2)\,G(1+s/2-it/2)\,G(1+N)\,G(1+N+s)}{G(1+N+s/2+it/2)\,G(1+N+s/2-it/2)\,G(1+s)}, \tag{3.6}$$
valid for $\operatorname{Re}(s\pm it) > -1$, where $G(\cdot)$ is the Barnes $G$-function, described in Appendix A. We will find the single moment generating functions useful, which we record here as
$$M_N(s) := E\exp\bigl(s\operatorname{Re}\ln Z(0)\bigr) \tag{3.7}$$
$$= \frac{G^2\bigl(1+\tfrac12 s\bigr)\,G(N+1)\,G(N+1+s)}{G(1+s)\,G^2\bigl(N+1+\tfrac12 s\bigr)}, \tag{3.8}$$
and
$$L_N(t) := E\exp\bigl(it\operatorname{Im}\ln Z(0)\bigr) \tag{3.9}$$
$$= \frac{G\bigl(1+\tfrac12 t\bigr)\,G\bigl(1-\tfrac12 t\bigr)\,G^2(N+1)}{G\bigl(N+1+\tfrac12 t\bigr)\,G\bigl(N+1-\tfrac12 t\bigr)}. \tag{3.10}$$

Theorem 3.2. For any $A(N)\gg\ln N$, and $a < b < 0$,
$$\lim_{N\to\infty}\frac{1}{A}\ln P\Bigl(\frac{\operatorname{Re}\ln Z(0)}{A}\in(a,b)\Bigr) = b. \tag{3.11}$$
Also, for any $a < b < -1/2$,
$$\lim_{N\to\infty}\frac{1}{\ln N}\ln P\Bigl(\frac{\operatorname{Re}\ln Z(0)}{\ln N}\in(a,b)\Bigr) = b+1/4. \tag{3.12}$$

Proof. From Theorem 3.9 we have that if $\limsup_{N\to\infty}x/\ln N < -1/2$, then
$$p(x)\sim e^x\exp\bigl(3\zeta'(-1)+\tfrac{1}{12}\ln 2-\tfrac12\ln\pi\bigr)N^{1/4}, \tag{3.13}$$
where $p(x)$ is the probability density function of $\operatorname{Re}\ln Z(0)$. Therefore, for $a < b < -1/2$,
$$P\Bigl(\frac{\operatorname{Re}\ln Z(0)}{\ln N}\in(a,b)\Bigr) = \int_{a\ln N}^{b\ln N}p(x)\,dx \sim \exp\bigl(3\zeta'(-1)+\tfrac{1}{12}\ln 2-\tfrac12\ln\pi\bigr)N^{1/4}\bigl(N^b-N^a\bigr) \tag{3.14}$$
and the result follows from taking logarithms of both sides. Similarly for $A(N)\gg\ln N$ with $a < b < 0$.

3.1. Large deviations at the scaling $A = N$. Since $\operatorname{Re}\ln Z(0)\le N\ln 2$ and $|\operatorname{Im}\ln Z(0)|\le N\pi/2$, the scaling $A = N$ is the maximal non-trivial scaling.

Theorem 3.3. The sequence $\operatorname{Re}\ln Z(0)/N$ satisfies the LDP with speed $N^2$ and rate function given by the convex dual of
$$\Lambda(s) = \begin{cases}\tfrac12(1+s)^2\ln(1+s)-\bigl(1+\tfrac12 s\bigr)^2\ln\bigl(1+\tfrac12 s\bigr)-\tfrac14 s^2\ln 2s & \text{for } s\ge 0\\ \infty & \text{for } s<0.\end{cases} \tag{3.15}$$

Proof. $\ln E\exp\bigl(sN\operatorname{Re}\ln Z(0)\bigr) = \ln M_N(Ns)$, the asymptotics of which are given in Appendix C, and so
$$\Lambda(s) = \lim_{N\to\infty}\frac{1}{N^2}\ln M_N(Ns) \tag{3.16}$$
$$= \tfrac12(1+s)^2\ln(1+s)-\bigl(1+\tfrac12 s\bigr)^2\ln\bigl(1+\tfrac12 s\bigr)-\tfrac14 s^2\ln 2s \tag{3.17}$$
for $s\ge 0$, and $\Lambda(s) = \infty$ for $s < 0$. If $x > 0$, then Theorem 3.1 implies that the rate function, $I(x)$, is given by the convex dual of $\Lambda(s)$. If $x < 0$, then Theorem 3.2 implies that $I(x) = 0$. Thus for $x\in\mathbb{R}$, $I(x)$ is given by the convex dual of $\Lambda(s)$, and this completes the proof of Theorem 3.3.
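To make the rate function of Theorem 3.3 concrete, the convex dual (3.3) of the function $\Lambda$ in (3.15) can be approximated by a brute-force supremum over a grid. This is only a rough numerical sketch: the grid range and resolution are arbitrary assumptions, the function names are hypothetical, and $\Lambda(0)$ is set to its limiting value $0$.

```python
import math


def cumulant_limit(s):
    """Lambda(s) from (3.15); infinite for s < 0, and 0 at s = 0 by continuity."""
    if s < 0:
        return math.inf
    if s == 0:
        return 0.0
    return (0.5 * (1 + s) ** 2 * math.log(1 + s)
            - (1 + 0.5 * s) ** 2 * math.log(1 + 0.5 * s)
            - 0.25 * s * s * math.log(2 * s))


def convex_dual(x, s_max=20.0, steps=20_000):
    """Grid approximation of sup over s >= 0 of s*x - Lambda(s)."""
    best = -math.inf
    for i in range(steps + 1):
        s = s_max * i / steps
        best = max(best, s * x - cumulant_limit(s))
    return best


for x in (-0.5, 0.1, 0.3, 0.5):
    print(x, convex_dual(x))
```

For $x<0$ the supremum is attained at $s=0$, which reproduces $I(x)=0$ as in the proof; for $0<x<\ln 2$ the values are positive and increasing. The grid is not adequate for $x$ close to $\ln 2$, where the maximiser moves off to infinity.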
One can also obtain an LDP for the imaginary part:

Theorem 3.4. The sequence $\operatorname{Im}\ln Z(0)/N$ satisfies the LDP with speed $N^2$ and rate function given by the convex dual of
$$\Lambda(t) = \tfrac18 t^2\ln\Bigl(1+\frac{4}{t^2}\Bigr)-\tfrac12\ln\bigl(1+\tfrac14 t^2\bigr)+t\arctan\bigl(\tfrac12 t\bigr). \tag{3.18}$$

Proof. $\ln E\exp\bigl(tN\operatorname{Im}\ln Z(0)\bigr) = \ln L_N(-iNt)$, and the asymptotics (given in Appendix D) imply that
$$\Lambda(t) = \lim_{N\to\infty}\frac{1}{N^2}\ln L_N(-iNt) \tag{3.19}$$
$$= \tfrac18 t^2\ln\Bigl(1+\frac{4}{t^2}\Bigr)-\tfrac12\ln\bigl(1+\tfrac14 t^2\bigr)+t\arctan\bigl(\tfrac12 t\bigr). \tag{3.20}$$
Theorem 3.1 implies that $J(y)$, the rate function, is given by the convex dual of $\Lambda(t)$, for all $y\in\mathbb{R}$.

3.2. Moderate Deviations. At other scalings, one finds that the rate function is either quadratic or linear.

Theorem 3.5. For scalings $\sqrt{\ln N}\ll A\ll N$, the sequence $\operatorname{Re}\ln Z(0)/A$ satisfies the LDP with speed $B = -A^2/W_{-1}(-A/N)$ (where $W_{-1}$ is Lambert's $W$-function, described in Appendix B) and rate function given by
$$I(x) = \begin{cases} x^2 & \text{if } \sqrt{\ln N}\ll A\ll\ln N\\[4pt] \begin{cases} x^2 & x\ge -1/2\\ -x-1/4 & x<-1/2\end{cases} & \text{if } A = \ln N\\[4pt] \begin{cases} x^2 & x\ge 0\\ 0 & x<0\end{cases} & \text{if } \ln N\ll A\ll N.\end{cases} \tag{3.21}$$

[…] $-1$ if $Ns/\chi\le -1$, (3.23)

which follows from results summarized in Appendix C. Therefore a non-trivial limit of (3.22) occurs if $B = N^2\ln\chi/\chi^2$, where $\chi = NA/B$, that is, if
$$B = \frac{A^2}{-W_{-1}\bigl(-\frac{A}{N}\bigr)}. \tag{3.24}$$
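The speed (3.24) is easy to evaluate without a special-function library. Writing $L = \ln\chi$, the relations $B = N^2\ln\chi/\chi^2$ and $\chi = NA/B$ reduce to the fixed-point equation $L = \ln(NL/A)$, whose larger root is $L = -W_{-1}(-A/N)$. The sketch below solves it by direct iteration; the function names, the tolerance and the sample values of $A$ and $N$ are illustrative assumptions, and the iteration is only intended for $N/A$ comfortably larger than $e$.

```python
import math


def log_chi(a, n, tol=1e-14, max_iter=10_000):
    """Larger root L of L = ln(n * L / a), i.e. L = -W_{-1}(-a / n)."""
    ratio = n / a
    if ratio <= math.e:
        raise ValueError("need n / a > e for the W_{-1} branch")
    value = math.log(ratio)  # starts above 1 and increases to the root
    for _ in range(max_iter):
        updated = math.log(ratio) + math.log(value)
        if abs(updated - value) <= tol * updated:
            return updated
        value = updated
    return value


def speed(a, n):
    """B = A^2 / (-W_{-1}(-A / N)), as in (3.24)."""
    return a * a / log_chi(a, n)


n, a = 1e6, 50.0
ell = log_chi(a, n)
chi = n * ell / a
print(ell, speed(a, n), n ** 2 * math.log(chi) / chi ** 2)
```

The last two printed numbers should agree, since at the fixed point $\ln\chi = L$ and hence $N^2\ln\chi/\chi^2 = A^2/L$.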
Note that the restriction $\chi\to\infty$ implies $A\ll N$, and that the restriction that $B\to\infty$ implies $A\gg\sqrt{\ln N}$. If we set $\delta = \liminf_{N\to\infty}\chi/N$, then we have
$$\Lambda(s) = \lim_{N\to\infty}\frac{1}{B}\ln M_N(sB/A) \tag{3.25}$$
$$= \begin{cases}\tfrac14 s^2 & \text{for } s>-\delta\\ \infty & \text{for } s<-\delta.\end{cases} \tag{3.26}$$
If $\sqrt{\ln N}\ll A\ll\ln N$ then $\delta = +\infty$ and Theorem 3.1 implies that $I(x) = x^2$ for all $x\in\mathbb{R}$. If $A = \ln N$, then $\delta = 1/2$, and Theorem 3.1 applies only for $x > -1/2$, where we have $I(x) = x^2$. However, since $B\sim\ln N$ at this scaling, Theorem 3.2 implies that, for $x < -1/2$, $I(x) = |x|-1/4$. Finally, if $\ln N\ll A\ll N$, then $\delta = 0$, and $I(x) = x^2$ for $x > 0$ by Theorem 3.1 and $I(x) = 0$ for $x < 0$ by Theorem 3.2 (since $B\gg A$ for $A\gg\ln N$). This completes the proof of Theorem 3.5.

Remark. For all $\sqrt{\ln N}\ll A\ll N$ it turns out that $I(x)$ is the convex dual of $\Lambda(s)$.

Once again, a similar result is true for the imaginary part, but this time the rate function is always quadratic.

Theorem 3.6. For scalings $\sqrt{\ln N}\ll A\ll N$, the sequence $\operatorname{Im}\ln Z(0)/A$ satisfies the LDP with speed $B = -A^2/W_{-1}(-A/N)$ and rate function $J(y) = y^2$.

Proof. For a given scaling sequence $A(N)$ we wish to find $B(N)$ such that
$$\lim_{N\to\infty}\frac{1}{B}\ln L_N(-itB/A) \tag{3.27}$$
exists as a non-trivial pointwise limit. Applying results from Appendix D we have
$$\ln L_N(-itB/A) = \tfrac14 t^2N^2\,\frac{\ln\chi}{\chi^2}+O_t\Bigl(\frac{N^2}{\chi^2}\Bigr) \tag{3.28}$$
for all $t\in\mathbb{R}$. So, as in the proof of Theorem 3.5, we need $B$ to be as in (3.24) (which will be valid for $\sqrt{\ln N}\ll A\ll N$), and the rate function will be given by the convex dual of $\tfrac14 t^2$, i.e. $J(y) = y^2$.

3.3. Large deviations of $\ln Z(\theta)$ evaluated at distinct points.

Theorem 3.7. For $\sqrt{\ln N}\ll A\ll\ln N$, and for any $r_1,\dots,r_k$ (distinct), the sequence
$$\bigl(\operatorname{Re}\ln Z(r_1)/A,\ \operatorname{Im}\ln Z(r_1)/A,\ \dots,\ \operatorname{Re}\ln Z(r_k)/A,\ \operatorname{Im}\ln Z(r_k)/A\bigr) \tag{3.29}$$
satisfies the LDP in $(\mathbb{R}^2)^k$ with speed $B = A^2/\ln N$ and rate function
$$I(x_1,y_1,\dots,x_k,y_k) = \sum_{j=1}^{k}x_j^2+y_j^2. \tag{3.30}$$
Proof. By Lemma 2.3, if $B/A\ll 1$,
$$\ln E\exp\Bigl(\sum_{j=1}^{k}s_j\operatorname{Re}\ln Z(r_j)B/A+t_j\operatorname{Im}\ln Z(r_j)B/A\Bigr)\sim\frac{B^2\ln N}{A^2}\sum_{j=1}^{k}(s_j^2+t_j^2)/4, \tag{3.31}$$
so choosing the speed $B = A^2/\ln N$, the stated result follows from a multidimensional analogue of Theorem 3.1 (see, for example, [11]).

Remark. If $B$ is given by (3.24), then for $\sqrt{\ln N}\ll A\ll\ln N$, $B\sim A^2/\ln N$. So for $A$ in this restricted range, this theorem generalizes Theorems 3.5 and 3.6.

From this we can deduce large deviations results for the counting function, using the identity (2.2). For example:

Theorem 3.8. For $\sqrt{\ln N}\ll A\ll\ln N$, and $-\pi < s < t\le\pi$, the sequence $(C_N(s,t)-(t-s)N/2\pi)/A$ satisfies the LDP in $\mathbb{R}$ with speed $B = A^2/\ln N$ and rate function $L(x) = \pi^2x^2/2$.

3.4. Refined large deviations estimates. By Fourier inversion, the probability density of $\operatorname{Re}\ln Z(0)$ is given by
$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}M_N(iy)\,dy, \tag{3.32}$$
where $M_N(iy) = Ee^{iy\operatorname{Re}\ln Z(0)}$ is given by (3.8).

Theorem 3.9. If $\limsup_{N\to\infty}x/\ln N < -1/2$, then
$$p(x)\sim e^x\exp\bigl(3\zeta'(-1)+\tfrac{1}{12}\ln 2-\tfrac12\ln\pi\bigr)N^{1/4}. \tag{3.33}$$

Proof. We evaluate
$$\frac{1}{2\pi}\int_C e^{-iyx}M_N(iy)\,dy, \tag{3.34}$$
where $C$ is the rectangle with vertices $-R$, $R$, $R+i+\epsilon i$, $-R+i+\epsilon i$, for $\epsilon$ a fixed real number subject to $0 < \epsilon < 1$, and let $R\to\infty$. Note that the contour encloses only the simple pole at $y = i$. The asymptotics for $G(x)$ show that the integral on the sides of the contour vanishes as $R\to\infty$, which means
$$p(x) = i\operatorname*{Res}_{y=i}\bigl\{e^{-iyx}M_N(iy)\bigr\}+E, \tag{3.35}$$
where
$$E = \frac{e^{x+\epsilon x}}{2\pi}\int_{-\infty}^{\infty}e^{-itx}M_N(it-1-\epsilon)\,dt. \tag{3.36}$$
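It is not hard to show that
$$i\operatorname*{Res}_{y=i}\bigl\{e^{-iyx}M_N(iy)\bigr\}\sim e^x\exp\bigl(3\zeta'(-1)+\tfrac{1}{12}\ln 2-\tfrac12\ln\pi\bigr)N^{1/4}, \tag{3.37}$$
and
$$|E|\le\frac{e^{x+\epsilon x}}{2\pi}\int_{-\infty}^{\infty}|M_N(it-1-\epsilon)|\,dt \tag{3.38}$$
$$\sim\frac{e^{x+\epsilon x}}{\sqrt{\pi}}\,\frac{G^2\bigl(\tfrac12-\tfrac12\epsilon\bigr)}{G(-\epsilon)}\,N^{1/4+\epsilon/2+\epsilon^2/4}(\ln N)^{-1/2}. \tag{3.39}$$
Thus $|E|\ll e^xN^{1/4}$ when
$$e^{x\epsilon}N^{\epsilon/2+\epsilon^2/4}(\ln N)^{-1/2}\ll 1. \tag{3.40}$$
Thus the error term can be made subdominant if $\limsup_{N\to\infty}x/\ln N <$ […]

[…] $0$, then
$$\ln M_N(x) = N^2\Bigl[\tfrac12(1+\lambda)^2\ln(1+\lambda)-\bigl(1+\tfrac12\lambda\bigr)^2\ln\bigl(1+\tfrac12\lambda\bigr)-\tfrac14\lambda^2\ln(2\lambda)\Bigr]-\tfrac{1}{12}\ln N-\tfrac{1}{12}\ln\lambda+\zeta'(-1)+\tfrac16\ln(2+\lambda)-\tfrac{1}{12}\ln(1+\lambda)+O\Bigl(\frac1N\Bigr). \qquad\text{(C.4), (C.5)}$$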
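For even integer arguments the moment generating function (3.8) can be evaluated exactly, because the Barnes function satisfies $G(n) = \prod_{k=0}^{n-2}k!$ at positive integers. This gives an independent check of the large-argument expansion of $\ln M_N(x)$ displayed above, reading it with $x = \lambda N$. The sketch below is an illustration under that reading; the function names are hypothetical, and the numerical value used for $\zeta'(-1)$ is the standard one, $\tfrac{1}{12}-\ln A$ with $A$ the Glaisher–Kinkelin constant.

```python
import math

ZETA_PRIME_MINUS_ONE = -0.16542114370045092  # zeta'(-1)


def log_barnes_g(n):
    """ln G(n) for an integer n >= 1, using G(n) = 0! * 1! * ... * (n - 2)!."""
    return math.fsum(math.lgamma(k + 1) for k in range(n - 1))


def log_moment_exact(n, m):
    """ln M_N(s) from (3.8) with N = n and even integer s = 2m."""
    s = 2 * m
    return (2 * log_barnes_g(1 + m) + log_barnes_g(n + 1) + log_barnes_g(n + 1 + s)
            - log_barnes_g(1 + s) - 2 * log_barnes_g(n + 1 + m))


def log_moment_asymptotic(n, lam):
    """Leading terms of ln M_N(lam * N) as displayed above, without the O(1/N) remainder."""
    lead = (0.5 * (1 + lam) ** 2 * math.log(1 + lam)
            - (1 + 0.5 * lam) ** 2 * math.log(1 + 0.5 * lam)
            - 0.25 * lam ** 2 * math.log(2 * lam))
    corrections = (-math.log(n) / 12 - math.log(lam) / 12 + ZETA_PRIME_MINUS_ONE
                   + math.log(2 + lam) / 6 - math.log(1 + lam) / 12)
    return n * n * lead + corrections


print(math.exp(log_moment_exact(5, 1)))  # second moment E|Z|^2 for N = 5
print(log_moment_exact(50, 50), log_moment_asymptotic(50, 2.0))
```

As a consistency check, (3.8) with $s=2$ telescopes to $E|Z(0)|^2 = N+1$, and the two numbers printed for $N=50$, $\lambda=2$ should differ only by the remainder term.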
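D. Asymptotics of $\ln L_N(ix)$

We consider $x\in\mathbb{R}$. From the asymptotics for the $G$-function, (A.2), we have
$$\ln L_N(ix) = \ln G\bigl(1+\tfrac12 ix\bigr)+\ln G\bigl(1-\tfrac12 ix\bigr)-\tfrac38 x^2+N^2\ln N-\tfrac12\bigl(N+\tfrac12 ix\bigr)^2\ln\bigl(N+\tfrac12 ix\bigr)-\tfrac12\bigl(N-\tfrac12 ix\bigr)^2\ln\bigl(N-\tfrac12 ix\bigr)-\tfrac16\ln N+\tfrac{1}{12}\ln\bigl(N+\tfrac12 ix\bigr)+\tfrac{1}{12}\ln\bigl(N-\tfrac12 ix\bigr)+O\Bigl(\frac1N\Bigr). \tag{D.1}$$
Constraining $x(N)$ to lie in various regimes simplifies the above considerably:

• If $|x|\ll 1$, then
$$\ln L_N(ix) = \tfrac14 x^2(\ln N+1+\gamma)+O(x^4)+O\Bigl(\frac1N\Bigr). \tag{D.2}$$

• If $x = O(1)$, then
$$\ln L_N(ix) = \ln G\bigl(1+\tfrac12 ix\bigr)+\ln G\bigl(1-\tfrac12 ix\bigr)+\tfrac14 x^2\ln N+O\Bigl(\frac1N\Bigr). \tag{D.3}$$

• If $1\ll|x|\ll\sqrt{N}$, then
$$\ln L_N(ix) = \tfrac14 x^2\bigl(\ln N-\ln x+\ln 2+\tfrac32\bigr)-\tfrac16\ln x+\tfrac16\ln 2+2\zeta'(-1)+O\Bigl(\frac{x^4}{N^2}\Bigr)+O\Bigl(\frac{1}{x^2}\Bigr). \tag{D.4}$$

• If $x = \lambda N$ with $\lambda = O(1)$, then
$$\ln L_N(ix) = N^2\Bigl[\tfrac18\lambda^2\ln\bigl(1+4\lambda^{-2}\bigr)-\tfrac12\ln\bigl(1+\tfrac14\lambda^2\bigr)+\lambda\tan^{-1}\bigl(\tfrac12\lambda\bigr)\Bigr]-\tfrac16\ln N+\tfrac{1}{12}\ln\bigl(1+4\lambda^{-2}\bigr)+2\zeta'(-1)+O\Bigl(\frac1N\Bigr). \tag{D.5}$$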
Acknowledgements. We are grateful to Persi Diaconis and Steve Evans for their suggestions and for making the preprint [12] available to us. Thanks also to Harold Widom for helpful correspondence and Marc Yor for fascinating discussions on the connection between Theorem 2.5 and planar Brownian motion.
References
1. Abramowitz, M. and Stegun, I.A.: Handbook of Mathematical Functions. Dover, 1965
2. Baker, T.H. and Forrester, P.J.: Finite-N fluctuations formulas for random matrices. J. Stat. Phys. 88, nos. 5/6, 1371–1386 (1997)
3. Barnes, E.W.: The theory of the G-function. Quart. J. Pure Appl. Math. 31, 264–314 (1899)
4. Basor, E.: Asymptotic formulas for Toeplitz determinants. Trans. Am. Math. Soc. 239, 33–65 (1978)
5. Ben Arous, G. and Guionnet, A.: Large deviations for Wigner's law and Voiculescu's non-commutative entropy. Probab. Theor. Rel. Fields 108, no. 4, 517–542 (1997)
6. Ben Arous, G. and Zeitouni, O.: Large deviations from the circular law. ESAIM: Probability and Statistics 2, 123–134 (1998)
7. Böttcher, A. and Silbermann, B.: Introduction to Large Truncated Toeplitz Matrices. Berlin–Heidelberg–New York: Springer-Verlag, 1999
8. Bucklew, J.: Large Deviation Techniques in Decision, Simulation, and Estimation. New York: Wiley Interscience, 1990
9. Coram, M. and Diaconis, P.: New tests of the correspondence between unitary eigenvalues and the zeros of Riemann's zeta function. Preprint (2000)
10. Costin, O. and Lebowitz, J.L.: Gaussian fluctuation in random matrices. Phys. Rev. Lett. 75, 69–72 (1995)
11. Dembo, A. and Zeitouni, O.: Large Deviations Techniques and Applications, 2nd Ed. Berlin–Heidelberg–New York: Springer-Verlag, 1998
12. Diaconis, P. and Evans, S.N.: Linear functionals of eigenvalues of random matrices. Preprint (2000)
13. Diaconis, P. and Shahshahani, M.: On the eigenvalues of random matrices. J. Appl. Probab. A 31, 49–62 (1994)
14. Guionnet, A.: Fluctuations for strongly interacting random variables and Wigner's law. Preprint (1999)
15. Guionnet, A. and Zeitouni, O.: Concentration of the spectral measure for large matrices. Preprint (2000)
16. Hiai, F. and Petz, D.: A large deviation theorem for the empirical eigenvalue distribution of random unitary matrices. Preprint (2000)
17. Hida, T.: Brownian Motion. Berlin–Heidelberg–New York: Springer, 1980
18. Johansson, K.: On random matrices from the classical compact groups. Ann. Math. 145, 519–545 (1997)
19. Johansson, K.: On fluctuations of eigenvalues of random Hermitian matrices. Duke J. Math. 91, 151–204 (1998)
20. Kasahara, Y. and Kotani, S.: On limit processes for a class of additive functionals of recurrent diffusion processes. Z. Wahrsch. verw. Gebiete. 49, 133–153 (1979)
21. Keating, J.P. and Snaith, N.C.: Random matrix theory and ζ(1/2 + it). Commun. Math. Phys. 214, 57–89 (2000)
22. Laurinčikas, A.: Limit Theorems for the Riemann Zeta Function. Dordrecht: Kluwer Academic Publishers, 1996
23. Mehta, M.L.: Random Matrices. New York: Academic Press, 1991
24. Odlyzko, A.M.: The $10^{20}$-th zero of the Riemann zeta function and 175 million of its neighbors. Unpublished
25. Rains, E.: High powers of random elements of compact Lie groups. Prob. Th. Rel. Fields 107, 219–241 (1997)
26. Revuz, D. and Yor, M.: Continuous Martingales and Brownian Motion. Berlin–Heidelberg–New York: Springer-Verlag, 1990
27. Soshnikov, A.B.: Gaussian fluctuation for the number of particles in Airy, Bessel, sine, and other determinantal random point fields. Preprint (1999)
28. Szegö, G.: Orthogonal Polynomials. AMS Colloquium Publications XXII, 1939
29. Titchmarsh, E.C.: The Theory of the Riemann Zeta-Function. Oxford: Oxford Science Publications, 1986
30. Voros, A.: Spectral functions, special functions and the Selberg zeta function. Commun. Math. Phys. 110, 439–465 (1987)
31. Weyl, H.: Classical Groups. Princeton, NJ: Princeton University Press, 1946
32. Wieand, K.: Eigenvalue Distributions of Random Matrices in the Permutation Group and Compact Lie Groups. PhD Thesis, Harvard University, 1998
33. Wieand, K.: Eigenvalue distributions of random unitary matrices. Preprint (2000)

Communicated by P. Sarnak
Commun. Math. Phys. 220, 453 – 454 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Erratum
Differential Graded Cohomology and Lie Algebras of Holomorphic Vector Fields Friedrich Wagemann Laboratoire de Mathématiques, Faculté des Sciences et des Techniques, Université de Nantes, 2, rue de la Houssinière, 44322 Nantes Cedex 3, France. E-mail:
[email protected] Received: 31 January 2001 / Accepted: 25 April 2001 Commun. Math. Phys. 208, 521–540 (1999)
The author apologizes for introducing an inadequate new object in the above stated article. The sheaf $\mathcal{H}om_{cont}(C_{*,dg}(\mathfrak{g}),\mathbb{C})$ of continuous sheaf homomorphisms between the sheaf of differential graded chains $C_{*,dg}(\mathfrak{g})$ of a sheaf of Lie algebras $\mathfrak{g}$ and the constant sheaf $\mathbb{C}$ was intended to generalize continuous cochains of Lie algebras to a sheaf setting, but this is probably not possible for cochains with trivial coefficients. The space of global sections $Hom_{cont}(C_{*,dg}(\mathfrak{g}),\mathbb{C})$ of $\mathcal{H}om_{cont}(C_{*,dg}(\mathfrak{g}),\mathbb{C})$ is smaller than the space of continuous cochains $C^*(\mathfrak{g}(X))$ on the Lie algebra $\mathfrak{g}(X)$. It does not contain evaluations of differential expressions at a point or integrals over differential expressions, but these are important examples of continuous Lie algebra cochains; for example, the Virasoro cocycle is of this type.

Let us explain the foregoing for the sheaf of differentiable vector fields $Vect$ on a finite dimensional compact manifold $X$. Indeed, examples of cochains (i.e. elements of $C^*(\mathfrak{g}(X))$) are evaluations $D_{x_0}(\xi_1,\dots,\xi_r) = D(\xi_1,\dots,\xi_r)(x_0)$ at a point $x_0\in X$ of some differential expression $D(\xi_1,\dots,\xi_r)$ in the coefficient functions of $\xi_1,\dots,\xi_r\in Vect(X)$. In order to check whether $D_{x_0}$ is a morphism of sheaves, we write down the diagram of restrictions to the open set $U^* = U\setminus\{x_0\}$, where $U$ is an open neighborhood of $x_0$:
$$\begin{array}{ccc}\Lambda^r(Vect(X)) & \xrightarrow{\ D_{x_0}\ } & \mathbb{C}\\ \Big\downarrow{\scriptstyle res_{\Lambda^r(Vect)}} & & \Big\downarrow{\scriptstyle res_{\mathbb{C}}}\\ \Lambda^r(Vect(U^*)) & \xrightarrow{\ \ \phi\ \ } & \mathbb{C}\end{array}$$
There cannot exist $\phi$ rendering the diagram commutative, since $res_{\mathbb{C}} = id_{\mathbb{C}}$. Thus, the cochain $D_{x_0}$ is not an element of $Hom_{cont}(C_{*,dg}(\mathfrak{g}),\mathbb{C})$.
A similar argument shows that cochains which are integrals over differential expressions are not elements of $Hom_{cont}(C_{*,dg}(\mathfrak{g}),\mathbb{C})$. These two examples constitute the most important classes of cochains with values in the trivial module $\mathbb{C}$. Clearly, sheaf approaches are possible (and carried out) in the case of cochains with values in a module which is itself a non-constant sheaf, for example the sheaf of sections of a vector bundle.

In conclusion, one has to ban the object $\mathcal{H}om_{cont}(C_{*,dg}(\mathfrak{g}),\mathbb{C})$ from the setting of the article (in particular, 1.1.8). The complementary Čech approach still remains valid. The spectral sequence for continuous differential graded cohomology (Lemmas 1 to 4) and the cosimplicial version (Sect. 2.3, Theorem 4) still work well, and the main result of the article (Theorem 7) is unaffected.

Communicated by M. Aizenman
Commun. Math. Phys. 220, 455 – 488 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Poisson–Lie T-Duality for Quasitriangular Lie Bialgebras
E. J. Beggs$^1$, Shahn Majid$^2$
$^1$ Department of Mathematics, University of Wales Swansea SA2 8PP, UK
$^2$ School of Mathematical Sciences, Queen Mary, University of London, Mile End Rd, London E1 4NS, UK
Received: 22 August 1999 / Accepted: 4 February 2000
Abstract: We introduce a new 2-parameter family of sigma models exhibiting Poisson–Lie T-duality on a quasitriangular Poisson–Lie group $G$. The models contain previously known models as well as a new 1-parameter line of models having the novel feature that the Lagrangian takes the simple form $L = E(u^{-1}u_+, u^{-1}u_-)$, where the generalised metric $E$ is constant (not dependent on the field $u$ as in previous models). We characterise these models in terms of a global conserved $G$-invariance. The models on $G = SU_2$ and its dual $G^\star$ are computed explicitly. The general theory of Poisson–Lie T-duality is also extended, notably the reduction of the Hamiltonian formulation to constant loops as integrable motion on the group manifold. The approach also points in principle to the extension of T-duality in the Hamiltonian formulation to group factorisations $D = G\bowtie M$, where the subgroups need not be dual or connected to the Drinfeld double.

1. Introduction

Poisson–Lie T-duality has been introduced in [1–3] and other works as a non-Abelian version of T-duality in string theory, based on duality of Lie bialgebras. A motivation (stated in [2]) is quantum group or Hopf algebra duality; this had been introduced as a duality for quantum physics several years previously [4–7], as an "observable-state" duality for certain quantum systems based on group factorisations $D = G\bowtie M$. In one system a particle moves in $G$ under the action of $M$ and its quantum algebra of observables is the bicrossproduct Hopf algebra $U(\mathfrak{m}){\triangleright\!\!\!\blacktriangleleft}C(G)$; in the dual system the roles of $G$, $M$ are interchanged but its quantum algebra of observables $C(M){\blacktriangleright\!\!\!\triangleleft}U(\mathfrak{g})$ has the same physical content with the roles of observables/states and position/momentum interchanged (here $\mathfrak{g}$, $\mathfrak{m}$ are the Lie algebras of $G$, $M$ respectively). Indeed, being mutually dual Hopf algebras the two quantum systems are related to each other by quantum Fourier transform
$$\mathcal{F} : U(\mathfrak{m}){\triangleright\!\!\!\blacktriangleleft}C(G)\to C(M){\blacktriangleright\!\!\!\triangleleft}U(\mathfrak{g}), \tag{1}$$

(Author footnote, Shahn Majid: Reader and Royal Society University Research Fellow at QM and Senior Research Fellow at the Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK.)
see [8] where this was recently studied in detail for the simplest example (the so-called Planck-scale Hopf algebra $C[p]{\triangleright\!\!\!\blacktriangleleft}C[x]$ in [4]). Under this observable-state duality it was shown in [4] that one had inversion of coupling constants as well as connections with Planck-scale physics.

At about the same time, Abelian T-duality was introduced in [9] and elsewhere as a momentum-winding mode symmetry in string theory with some similar features. The observable-state duality (1) was not, by contrast, limited in any way to the Abelian case and indeed there was a natural model for every compact simple group $G$ with $M = G^\star$, the Yang–Baxter dual. Here a Lie bialgebra $\mathfrak{g}$ is an infinitesimal version of a Hopf algebra and has a dual $\mathfrak{g}^\star$, and $G^\star$ is its associated Lie group. It is also the group of dressing transformations [10] in the theory of classical inverse scattering and the solvable group in the Iwasawa decomposition $D = G_{\mathbb{C}} = G\bowtie G^\star$ of the complexification of the compact Lie group $G$, see [7]. Moreover, $D = G\bowtie G^\star$ is the Lie group associated to the Drinfeld double $d(\mathfrak{g})$ of $\mathfrak{g}$ as a Lie bialgebra [11]. The Lie bialgebra structure of $\mathfrak{g}$ also implies a natural Poisson bracket on $G$ [11]. Further details are in the Preliminaries; see also [12] for an introduction to these topics. These quantum systems $U(\mathfrak{g}){\triangleright\!\!\!\blacktriangleleft}C(G^\star)$ with observable-state duality were constructed in [5–7] as one of the two main sources of quantum groups canonically associated to a simple Lie algebra (the other is the more well-known q-deformation of $U(\mathfrak{g})$ to quantum groups $U_q(\mathfrak{g})$).

The subsequent theory of Poisson–Lie T-duality [1, 3] indeed has many of the same features. One system consists of a sigma model on the group $G$ with a Lagrangian of the form
$$L = E_u(u^{-1}u_+, u^{-1}u_-),\qquad u : \mathbb{R}^{1,1}\to G,$$
where $u$ is the field, $u_\pm$ are derivatives in light-cone coordinates and $E_u$ a bilinear form on $\mathfrak{g}$ but depending on the value of $u$ (a "generalised metric" since $E_u$ need not be symmetric). The dual theory is a sigma-model on $G^\star$ with
$$\hat L = \hat E_t(t^{-1}t_+, t^{-1}t_-),\qquad t : \mathbb{R}^{1,1}\to G^\star.$$
The physical content of the two theories is established to be the same due to the existence of the larger group $D = G\bowtie G^\star$ associated to the Drinfeld double $d(\mathfrak{g})$.

In the present paper we extend Poisson–Lie T-duality in several directions, motivated in part by the above connections with quantum groups and observable-state duality. From a physical point of view the main result is as follows: the previously-known models exhibiting Poisson–Lie T-duality require a very special form of the generalised metric $E_u$ depending on $u$ in a rather complicated way (related to the Poisson bracket on $G$). This is in sharp contrast to the usual principal sigma model [13] where the metric is a constant, the Killing form $K$. As a result, Poisson–Lie T-duality would appear to be somewhat artificial and to apply to only certain highly non-linear models where the "metric" in the target group is far from constant. Even the explicit form of $E_u$ is known only in some simple cases such as $\mathfrak{g} = b_+$, the Borel subalgebra of $su_2$ [2], and the $\mathfrak{g} = su_2$ case as discussed in [18] and (not very explicitly) in [14]. Our main result is the introduction of a new 2-parameter class of models within the existing general framework for Poisson–Lie T-duality but with much nicer properties. We also provide new computational tools using the theory of Lie bialgebras to compute the models explicitly. We obtain, for
example, the explicit Lagrangians in the SU2 case and its dual in a compact form that we have not found elsewhere (in the SU2 case directly in terms of the matrix-valued fields). These new models require that g is a quasitriangular Lie bialgebra, i.e. defined by an element r ∈ g ⊗ g obeying the so-called modified classical Yang–Baxter equations, [11]. This includes all complex semisimple Lie algebras equipped, for example, with their standard Drinfeld–Sklyanin quasitriangular structure as used in the theory of classical inverse scattering. The quantisations of the associated Poisson bracket on G in these cases include coordinate algebras of the quantum groups Uq (g). This is therefore an important class of models, and we will find quite tractable formulae in this case. We use r not only in the Lie bialgebra structure (which is usual) but again in certain boundary conditions for the graph coordinates in order to cancel their natural u-dependence for the choice of certain parameters. This greater generality allows for a two-parameter family of models associated to this data. Moreover, in this extended parameter space there is a novel line of “nice” models in which Eu = Ee is a constant not dependent at all on u. This line includes at ∞ the standard principal sigma model where Ee = K the Killing form, but at other points has an antisymmetric part built from r itself. In this way one may approach the principal sigma model itself along a line of sigma models exhibiting Poisson–Lie T-duality and of a simple form without additional non-linearities due to a non-constant generalised metric. The dual models are more complicated but at ∞, for example, one obtains an Abelian model as the Poisson–Lie T-dual of the principal sigma model approached in this way (the latter lies on the boundary of the space of models exhibiting T-duality). These results are presented in Sect. 6. 
Also in the paper we consider the Hamiltonian picture of Poisson–Lie T-duality in more detail than we have found elsewhere, but see [15] where the Hamiltonian theory was introduced. This is done in Sect. 3 after the preliminary Sect. 2. Among the modest new results is a more regular expression for the Hamiltonian that covers both the model and the dual model simultaneously. Also new is a study of the symmetries of the theory induced by the left action of $D$ on itself. These are not usually considered because they are not conserved, but we show that they do respect the symplectic structure. Moreover, when $E_u$ is constant we show that the action of $G \subset D$ is conserved and we compute the conserved charges. A second development in this context, in Sect. 4, is a study of the classical mechanical system on $G$ (say) in the limit of point-like strings (i.e. $x$-independent solutions). We show that this constraint commutes with the dynamics and we provide the resulting Lagrangian and Hamiltonian systems and the phase space. The left action of $D$ descends to the classical mechanical system and we show that it has a moment map. The conserved charges are computed in the case of constant $E_u$. The solutions of the dual model on $G^\star$ equivalent to these point solutions are not point solutions but extended solutions of a certain special form. We also discuss the quantisation of this classical mechanical system both conventionally and in a manner relevant to the conserved charges. Although these systems appear to be different from the systems $U(\mathfrak g^\star)\,{\triangleright\!\!\!\blacktriangleleft}\,C[G]$ exhibiting observable-state duality at the Planck scale [4], we do establish some points of comparison, such as a common phase space. Section 5 contains some further algebraic preliminaries needed for the explicit construction of $E_u$. We emphasise the matched pair of groups point of view and recover in particular the known formula [3] for the general case
$$\mathrm{Ad}^*_u(E_u) = \big(E_e^{-1} + \Pi(u)\big)^{-1},$$
where $\Pi$ is the $\mathfrak g \otimes \mathfrak g$-valued function defining the Poisson structure on $G$ and $E_e^{-1}$ need not be the standard choice obtained from the Killing form. Although the freedom to choose $E_e^{-1}$ has been known from the start [3], we provide the first general constructions for models based on a different and nonstandard choice for $E_e^{-1}$. In particular, this allows us in Sect. 6 to present our main result: the class of “nice” Poisson–Lie T-dual models based on quasitriangular Lie bialgebras. Finally, Sect. 7 introduces new “double-Neumann” boundary conditions for the open string and proceeds for these (as well as more trivially for closed strings) to extend the Poisson–Lie T-duality in the Hamiltonian form to general group factorisations $D = G \bowtie M$, where $D$ need no longer be the Lie group of the Drinfeld double $d(\mathfrak g)$ and indeed $\mathfrak m$ need not be $\mathfrak g^\star$ but could be some quite different Lie algebra, possibly of different dimension. This is directly motivated by the observable-state duality models which exist [5, 12] for any factorisation. It is also motivated by the Adler–Kostant–Symes theorem in classical inverse scattering which works for a general factorisation equipped with an inner product, see [12]. The dynamics are determined, similarly to the conventional bialgebra theory, by the splitting of the Lie algebra of $D$ into orthogonal subspaces but these need no longer be of the same dimension (although only in this case is there a sigma-model interpretation). We also have an action of $D$ by left multiplication on the phase space with the double-Neumann boundary conditions which is useful even for standard Poisson–Lie T-duality based on Lie bialgebras. In particular, it extends to an action of the affine Kac–Moody Lie algebra $\tilde{\mathfrak d}$.

Several directions remain for further work. First of all, only some first steps are taken (in Sect. 4) to relate T-duality to observable-state duality (1) in the quantum theory; our long term motivation here is to extend these ideas from particles to loops and hence to formulate T-duality for the full quantum systems as a duality operation on a more general algebraic structure (no doubt more general than Hopf algebras but in the same spirit). This in turn would give insight into the correct algebraic structure for the conjectured “M-theory” about which little is known beyond dualities visible in the Lagrangians at various classical limits. Let us mention only that Poisson–Lie T-duality is connected also with mirror symmetry [16] and indirectly with several other relevant dualities in the theory of strings and branes. Secondly, there are some interesting examples of the generalisation of Poisson–Lie T-duality in Sect. 7 which exist in principle and should be developed further. Thus, the conformal group on $\mathbb R^n$ ($n > 2$) has, locally, a factorisation into the Poincaré group and an $\mathbb R^n$ of special conformal translations. The global structure of the factorisation is singular in a similar manner to the “black-hole event-horizon”-like features of the Planck-scale Hopf algebra $C[p]\,{\triangleright\!\!\!\blacktriangleleft}\,C[x]$ in [4]. There is also the possibility in our more general setting of a many-sided T-duality (i.e. not only two equivalent theories) associated to more than one factorisation of the same group. Finally, the emergence of generalised metrics which have both symmetric and antisymmetric parts is a natural feature of noncommutative Riemannian geometry [17] (where symmetry is natural only in the commutative limit). This is a further direction that remains to be explored. Also to be considered is the addition of WZNW terms to render our 2-parameter class of sigma-models conformally invariant, as well as the computation of 1-loop or higher quantum effects, cf. [18, 19].
Preliminaries. We recall, see e.g. [12] that a Lie bialgebra is a Lie algebra equipped with δ : g → g ⊗ g, where δ is antisymmetric and obeys the coJacobi identity (so that
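$\mathfrak g^*$ is a Lie algebra) and $\delta[\xi,\eta] = \mathrm{ad}_\xi(\delta\eta) - \mathrm{ad}_\eta(\delta\xi)$ for all $\xi,\eta \in \mathfrak g$, where ad extends as a derivation. Next, associated to any Lie bialgebra $\mathfrak g$ there is a double Lie algebra $\mathfrak d = \mathfrak g \bowtie \mathfrak g^{*\mathrm{op}}$. This is a double semidirect sum with cross relations $[\phi,\xi] = \phi\triangleright\xi - \phi\triangleleft\xi$, where the actions are mutually coadjoint ones
$$\phi\triangleright\xi = \langle\xi_{[2]},\phi\rangle\,\xi_{[1]}, \qquad \phi\triangleleft\xi = \langle\xi,\phi_{[2]}\rangle\,\phi_{[1]},$$
where the angle brackets are the dual pairing of $\mathfrak g^*$ with $\mathfrak g$ and $\delta(\xi) = \xi_{[1]}\otimes\xi_{[2]}$. Here $\mathfrak d$ is quasitriangular and factorisable (see later) and as a result there is an adjoint-invariant inner product on $\mathfrak d$,
$$\langle \xi\oplus\phi,\ \eta\oplus\psi\rangle = \langle\phi,\eta\rangle + \langle\psi,\xi\rangle. \tag{2}$$
Here $\mathfrak g^\star = \mathfrak g^{*\mathrm{op}}$ and $\mathfrak g$ are maximal isotropic subspaces. We will need this description from [6], which is somewhat more explicit than the usual description in terms of the “Manin triple” in Drinfeld’s work [20].

Given a double cross sum of Lie algebras $\mathfrak g \bowtie \mathfrak m$, we may at least locally exponentiate to a double cross product of Lie groups $G \bowtie M$. This is given explicitly in [7]. We view the Lie algebra actions as cocycles, exponentiate to Lie group cocycles, view these as flat connections and take the parallel transport operation. The actions can be described by $b(u) \in \mathfrak g\otimes\mathfrak m^*$ given by $b(u)(\phi) = b_\phi(u) = (\phi\triangleright u)u^{-1}$ and $a(s) \in \mathfrak g^*\otimes\mathfrak m$ given by $a(s)(\xi) = a_\xi(s) = s^{-1}(s\triangleleft\xi)$. It can be shown that $b \in Z^1_{\mathrm{Ad}\otimes\triangleleft^*}(G, \mathfrak g\otimes\mathfrak m^*)$ is a cocycle, where the action $\triangleleft^*$ is a left action of $G$ on $\mathfrak m^*$ given by dualising the right action $\triangleleft : \mathfrak m\times G \to \mathfrak m$. Also $a \in Z^1_{\triangleright^*\otimes\mathrm{Ad}_R}(M, \mathfrak g^*\otimes\mathfrak m)$, where $\mathrm{Ad}_R$ is the right adjoint action of $M$ on $\mathfrak m$ and $\triangleright^*$ is the right action of $M$ on $\mathfrak g^*$ given by dualising its action on $\mathfrak g$. These Lie-algebra-valued functions $a, b$ generate the vector fields for the action of $\mathfrak g$ on $M$ and $\mathfrak m$ on $G$ respectively. Thus, $\phi\triangleright u = b_\phi(u)u$, where $\xi u = \tilde\xi$ denotes the right-invariant vector field on $G$ generated by $\xi\in\mathfrak g$. Similarly, $s\triangleleft\xi = s\,a_\xi(s)$. Once the global actions of $G$ on $M$ and vice versa are known, the structure of $G\bowtie M$ is such that
$$su = (s\triangleright u)(s\triangleleft u), \qquad \forall u\in G,\ s\in M. \tag{3}$$
This allows every element of the double cross product group $G\bowtie M$ to be uniquely factorised either as $GM$ or as $MG$, and relates the two factorisations.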
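The unique factorisation in both orders can be illustrated numerically. The sketch below assumes, purely as an example, $D = SL_2(\mathbb C)$ with $G = SU_2$ and $M$ the group of upper triangular matrices with positive real diagonal (the Iwasawa factorisation, computed here through a QR decomposition). The example and the helper names `factor_GM`, `factor_MG` and `act` are illustrative assumptions, not constructions taken from the text.

```python
import numpy as np

def factor_GM(k):
    """Factor k = u s with u unitary and s upper triangular with positive diagonal."""
    q, r = np.linalg.qr(k)
    phase = np.diag(r) / np.abs(np.diag(r))   # unit complex numbers on the diagonal of r
    u = q * phase                             # same as q @ diag(phase)
    s = np.conj(phase)[:, None] * r           # same as diag(phase)^{-1} @ r
    return u, s

def factor_MG(k):
    """Factor k = t v with t upper triangular (positive diagonal) and v unitary."""
    v_inv, t_inv = factor_GM(np.linalg.inv(k))   # k^{-1} = v^{-1} t^{-1}
    return np.linalg.inv(t_inv), np.linalg.inv(v_inv)

def act(s, u):
    """Return the pair (s |> u, s <| u) defined by s u = (s |> u)(s <| u)."""
    return factor_GM(s @ u)

rng = np.random.default_rng(0)
k = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
k = k / np.sqrt(np.linalg.det(k))             # normalise so that det k = 1
u, s = factor_GM(k)                           # k = u s  (order G M)
t, v = factor_MG(k)                           # k = t v  (order M G)
```

Because each factorisation is unique, `act(t, v)` returns the pair `(u, s)` again, which is the relation between the two orderings expressed by (3).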
2. T-Duality Based on Lie Bialgebras

We begin by giving a version of the standard T-duality based on the Drinfeld double of a Lie bialgebra [2, 3]. We will phrase it slightly differently in terms of double cross products with a view to later generalisation. Thus, there is a double cross product group $D = G\bowtie M$ with Lie algebra $\mathfrak d = \mathfrak g + \mathfrak m$, and an adjoint-invariant bilinear form on $\mathfrak d$ which is zero on restriction to $\mathfrak g$ and $\mathfrak m$. The Lie algebra $\mathfrak d$ is the direct sum of two perpendicular subspaces $E_-$ and $E_+$. This means that $\mathfrak m = \mathfrak g^{*\mathrm{op}}$, that the factorisation is a coadjoint matched pair and that $\mathfrak d = D(\mathfrak g)$, the Drinfeld double of $\mathfrak g$, which is the setting that Klimčík et al. assume.

On $\mathbb R^2$ we use light cone coordinates $x_+ = t + x$ and $x_- = t - x$, where $t$ and $x$ are the standard time-space coordinates. Now let us suppose that there is a function $k : \mathbb R^2 \to G\bowtie M$, with the properties that $k_+k^{-1}(x_+,x_-) \in E_-$ and $k_-k^{-1}(x_+,x_-) \in E_+$ for all $(x_+,x_-)\in\mathbb R^2$. Then we see that, if we factor $k = us$ for $u\in G$ and $s\in M$,
$$u^{-1}u_\pm + s_\pm s^{-1} \in u^{-1}E_\mp u.$$
If the projection $\pi_{\mathfrak g} : \mathfrak d\to\mathfrak g$ (with kernel $\mathfrak m$) is 1-1 and onto when restricted to $u^{-1}E_-u$ and $u^{-1}E_+u$, we can find graph coordinates $E_u : \mathfrak g\to\mathfrak m$ and $T_u : \mathfrak g\to\mathfrak m$ so that
$$\{\xi + E_u(\xi) : \xi\in\mathfrak g\} = u^{-1}E_+u \quad\text{and}\quad \{\xi + T_u(\xi) : \xi\in\mathfrak g\} = u^{-1}E_-u.$$
It follows that $s_-s^{-1} = E_u(u^{-1}u_-)$ and $s_+s^{-1} = T_u(u^{-1}u_+)$. From the identity $(s_+s^{-1})_- - (s_-s^{-1})_+ = [s_-s^{-1}, s_+s^{-1}]$ we deduce that $u(x_+,x_-)$ satisfies the equation
$$\big(T_u(u^{-1}u_+)\big)_- - \big(E_u(u^{-1}u_-)\big)_+ = \big[E_u(u^{-1}u_-),\ T_u(u^{-1}u_+)\big]. \tag{4}$$
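The notion of graph coordinates used above can be made concrete in finite dimensions. The following sketch assumes that bases have been chosen so that $\mathfrak g$ and $\mathfrak m$ are both identified with $\mathbb R^n$ and a subspace of $\mathfrak g\oplus\mathfrak m$ is given by the columns of a pair of matrices; the function names and the random test data are hypothetical and serve only to illustrate how the matrix of a graph coordinate is read off from an arbitrary basis of the subspace.

```python
import numpy as np

def graph_over_g(basis_g, basis_m):
    """Matrix E such that span{(basis_g[:, j], basis_m[:, j])} = {(xi, E xi)}.

    Requires the projection of the subspace to g to be invertible.
    """
    return basis_m @ np.linalg.inv(basis_g)

def graph_over_m(basis_g, basis_m):
    """Matrix E_hat such that the same subspace equals {(E_hat phi, phi)}."""
    return basis_g @ np.linalg.inv(basis_m)

rng = np.random.default_rng(1)
n = 3
E_true = rng.normal(size=(n, n)) + 3 * np.eye(n)   # a chosen graph coordinate
C = rng.normal(size=(n, n)) + 3 * np.eye(n)        # an arbitrary change of basis
basis_g, basis_m = C, E_true @ C                   # columns span {(xi, E_true xi)}

E = graph_over_g(basis_g, basis_m)       # recovers E_true, independently of C
E_hat = graph_over_m(basis_g, basis_m)   # graph of the same subspace over m
```

When both projections are invertible, the graph of the subspace over $\mathfrak m$ is given by the inverse matrix, which is the finite-dimensional content of describing the same subspace from the side of the dual model.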
Klimčík shows that the Lagrangian density
$$L = \langle E_u(u^{-1}u_-),\ u^{-1}u_+\rangle \tag{5}$$
gives rise to these equations of motion. The dual theory is given by the factorisation $k = tv$, where $t\in M$ and $v\in G$. If we let $\hat E_t : \mathfrak m\to\mathfrak g$ and $\hat T_t : \mathfrak m\to\mathfrak g$ be the graph coordinates of $t^{-1}E_+t$ and $t^{-1}E_-t$ respectively, then $t(x_+,x_-)$ obeys the dual equation
$$\big(\hat T_t(t^{-1}t_+)\big)_- - \big(\hat E_t(t^{-1}t_-)\big)_+ = \big[\hat E_t(t^{-1}t_-),\ \hat T_t(t^{-1}t_+)\big]. \tag{6}$$
These are the equations of motion for a sigma model with Lagrangian
$$\hat L = \langle \hat E_t(t^{-1}t_-),\ t^{-1}t_+\rangle. \tag{7}$$
These two models are different but equivalent descriptions of the model defined by $k$. The $(u,s)$ and $(t,v)$ coordinates are related by the actions of the double cross product group structure:
$$tv = (t\triangleright v)(t\triangleleft v) = us. \tag{8}$$
3. Hamiltonian Formulation of T-Duality

There are two models considered in the last section, the first order equations of motion for $k : \mathbb R^2\to G\bowtie M$ and the second order equations of motion for $u : \mathbb R^2\to G$. The equations of motion for $k : \mathbb R^2\to G\bowtie M$ are the natural way to introduce duality into the system, and are very nearly equivalent to the equations of motion for $u : \mathbb R^2\to G$. There is not a 1–1 correspondence between the systems, as multiplying $k$ on the right by a constant element of $M$ gives rise to exactly the same $u$. We have a Lagrangian and Hamiltonian for the $u$ equations of motion, and can work out the corresponding Hamiltonian mechanics. However the reader must remember that this will not give the Hamiltonian mechanics for $k$, but rather for $k$ quotiented on the right by constant elements of $M$. As pointed out by Klimčík, we can take the phase space of the system to be the set of smooth functions $C^\infty(\mathbb R, D)$ (or more strictly $C^\infty(\mathbb R, D)/M$), where we regard $\mathbb R$ to be a constant time line in $\mathbb R^{1+1}$, or $C^\infty((0,\pi), D)/M$ for a finite space. We will compute the symplectic structure more explicitly than we have found elsewhere and then obtain a new and more symmetric formulation of the Hamiltonian density that covers both the model and the dual model simultaneously. We will need this in later sections when we generalise to arbitrary factorisations, as well as for the point-like limit.

3.1. The symplectic form. We begin by showing that this is the correct phase space, i.e. that such a function encodes both $u$ and $\dot u$ on a constant time line. Thus, take $k\in C^\infty(\mathbb R, D)$ or $C^\infty((0,\pi), D)$. As $k(x)\in D$ we can factor it as $k(x) = u(x)s(x)$, so $u(x)$ is specified on the constant time line. But we also know that
$$s_xs^{-1} = T_u(u^{-1}u_+) - E_u(u^{-1}u_-) = \tfrac12\Big(T_u(u^{-1}\dot u) - E_u(u^{-1}\dot u) + T_u(u^{-1}u_x) + E_u(u^{-1}u_x)\Big), \tag{9}$$
and as we know $s_xs^{-1}$ and $(T_u + E_u)(u^{-1}u_x)$, we can find $(T_u - E_u)(u^{-1}u_t)$. From this we can in principle find $u^{-1}\dot u$ as the function $\xi\mapsto T_u(\xi) - E_u(\xi)$ is 1-1 (if $\eta$ lay in the kernel of this operator then $\eta + T_u(\eta) = \eta + E_u(\eta) \in u^{-1}(E_+\cap E_-)u = \{0\}$).

If we have a system with coordinates for configuration space $q_i$, and Lagrangian $L(q_i,\dot q_i)$, then the canonical momenta are $p_i = \partial L/\partial\dot q_i$, and we define a symplectic form on the phase space by $\omega = dp_i\wedge dq_i$. With a little thought, it can be seen that this corresponds to the directional derivative formula (where we have taken a Lagrangian density $L$)
$$\omega(u,\dot u; a,b; c,d) = \int_{x=0}^{\pi}\Big(L''(u,\dot u; 0,c; a,b) - L''(u,\dot u; 0,a; c,d)\Big)\,dx.$$
If we write a change in $k$ as labelled by $y$ we get $k_y = u_ys + us_y$, and likewise for $k_z = u_zs + us_z$. From the last section, we can write the Lagrangian density for our system as $4L(u,\dot u) = \langle E_u(u^{-1}\dot u - u^{-1}u_x),\ u^{-1}\dot u + u^{-1}u_x\rangle$, so we can calculate a partial derivative
$$\begin{aligned}4L'(u,\dot u; 0,c) &= \langle E_u(u^{-1}c),\ u^{-1}\dot u + u^{-1}u_x\rangle + \langle E_u(u^{-1}\dot u - u^{-1}u_x),\ u^{-1}c\rangle\\ &= \langle E_u(u^{-1}\dot u) - T_u(u^{-1}\dot u) - E_u(u^{-1}u_x) - T_u(u^{-1}u_x),\ u^{-1}c\rangle,\end{aligned}$$
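so $2L'(u,\dot u; 0,u_y) = -\langle s_xs^{-1},\ u^{-1}u_y\rangle$, which results in
$$\begin{aligned}2L''(u,\dot u; 0,u_y; u_z,\dot u_z) &= -\langle (s_xs^{-1})_z,\ u^{-1}u_y\rangle + \langle s_xs^{-1},\ u^{-1}u_zu^{-1}u_y\rangle\\ &= -\langle (s_zs^{-1})_x,\ u^{-1}u_y\rangle + \langle [s_xs^{-1}, s_zs^{-1}],\ u^{-1}u_y\rangle + \langle s_xs^{-1},\ u^{-1}u_zu^{-1}u_y\rangle.\end{aligned}$$
Now compare this with the standard 2-form on the loop group of $D$. Consider
$$\begin{aligned}\langle (k^{-1}k_y)_x,\ k^{-1}k_z\rangle &= \langle (s^{-1}s_y)_x + [s^{-1}u^{-1}u_ys,\ s^{-1}s_x] + s^{-1}(u^{-1}u_y)_xs,\ \ s^{-1}s_z + s^{-1}u^{-1}u_zs\rangle\\ &= \langle (s_ys^{-1})_x - [s_xs^{-1}, s_ys^{-1}],\ u^{-1}u_z\rangle + \langle [s_xs^{-1}, s_zs^{-1}],\ u^{-1}u_y\rangle\\ &\quad + \langle s_xs^{-1},\ [u^{-1}u_z, u^{-1}u_y]\rangle + \langle s_zs^{-1},\ (u^{-1}u_y)_x\rangle.\end{aligned}$$
On integration we find
$$\Big[\langle s_zs^{-1},\ u^{-1}u_y\rangle\Big]_{x=0}^{\pi} = \int_{x=0}^{\pi}\Big(\langle (s_zs^{-1})_x,\ u^{-1}u_y\rangle + \langle s_zs^{-1},\ (u^{-1}u_y)_x\rangle\Big)\,dx,$$
so we have the following symplectic form on the phase space:
$$2\omega(k; k_z, k_y) = \int_{x=0}^{\pi}\langle (k^{-1}k_y)_x,\ k^{-1}k_z\rangle\,dx - \Big[\langle s_zs^{-1},\ u^{-1}u_y\rangle\Big]_{x=0}^{\pi}. \tag{10}$$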
Now we come to the complication, the fact that this form is degenerate on $C^\infty((0,\pi), D)$. If we take a change in $k\in C^\infty((0,\pi), D)$ given by $k\phi$ for $\phi\in\mathfrak m$, then $\omega(k; k_z, k\phi) = 0$ for all $k_z$. To remedy this we could remove the null direction by declaring that the phase space would actually be $C^\infty((0,\pi), D)/M$. Equivalently we could consider the phase space to consist of those $k = us\in C^\infty((0,\pi), D)$ for which $s(0)$ is the identity in $M$.

3.2. The Hamiltonian density. The Hamiltonian density generating the time evolution can be calculated by $4H = 4L'(u,\dot u; 0,\dot u) - 4L(u,\dot u)$, and using our previous result we can write this as
$$\begin{aligned}4H &= -\langle E_u(u^{-1}\dot u - u^{-1}u_x),\ u^{-1}u_x\rangle - \langle 2s_xs^{-1} + E_u(u^{-1}\dot u - u^{-1}u_x),\ u^{-1}\dot u\rangle\\ &= -\langle E_u(u^{-1}\dot u - u^{-1}u_x),\ u^{-1}u_x\rangle - \langle T_u(u^{-1}\dot u) + T_u(u^{-1}u_x),\ u^{-1}\dot u\rangle\\ &= \langle E_u(u^{-1}u_x),\ u^{-1}u_x\rangle - \langle T_u(u^{-1}\dot u),\ u^{-1}\dot u\rangle\\ &= \langle E_u(u^{-1}u_x),\ u^{-1}u_x\rangle + \langle E_u(u^{-1}\dot u),\ u^{-1}\dot u\rangle,\end{aligned}$$
or equivalently
$$8H = \langle (E_u - T_u)(u^{-1}u_x),\ u^{-1}u_x\rangle + \langle (E_u - T_u)(u^{-1}\dot u),\ u^{-1}\dot u\rangle. \tag{11}$$
Using the equation we derived for $s_xs^{-1}$, we can rewrite $\langle (E_u - T_u)(u^{-1}\dot u),\ u^{-1}\dot u\rangle$ as
$$\begin{aligned}&\big\langle (T_u + E_u)(u^{-1}u_x) - 2s_xs^{-1},\ (E_u - T_u)^{-1}\big((T_u + E_u)(u^{-1}u_x) - 2s_xs^{-1}\big)\big\rangle\\ &\quad= -\langle (T_u + E_u)(E_u - T_u)^{-1}(T_u + E_u)(u^{-1}u_x),\ u^{-1}u_x\rangle - 4\langle s_xs^{-1},\ (E_u - T_u)^{-1}(T_u + E_u)(u^{-1}u_x)\rangle\\ &\qquad + 4\langle s_xs^{-1},\ (E_u - T_u)^{-1}(s_xs^{-1})\rangle.\end{aligned}$$
If we observe that
$$\langle (E_u - T_u)(u^{-1}u_x),\ u^{-1}u_x\rangle = \langle (E_u - T_u)(E_u - T_u)^{-1}(E_u - T_u)(u^{-1}u_x),\ u^{-1}u_x\rangle,$$
then we can write
$$\begin{aligned}4H &= -\langle T_u(E_u - T_u)^{-1}E_u(u^{-1}u_x),\ u^{-1}u_x\rangle - \langle E_u(E_u - T_u)^{-1}T_u(u^{-1}u_x),\ u^{-1}u_x\rangle\\ &\quad - 2\langle s_xs^{-1},\ (E_u - T_u)^{-1}(T_u + E_u)(u^{-1}u_x)\rangle + 2\langle s_xs^{-1},\ (E_u - T_u)^{-1}(s_xs^{-1})\rangle.\end{aligned} \tag{12}$$
To simplify this equation we shall first look at the form of the projections to the subspaces $u^{-1}E_+u$ and $u^{-1}E_-u$ in terms of the graph coordinates. If we take $\xi\in\mathfrak g$ and $\phi\in\mathfrak m$, we can write $\xi + \phi = (w + E_u(w)) + (y + T_u(y))$, where
$$w = (E_u - T_u)^{-1}\phi - (E_u - T_u)^{-1}T_u(\xi) \quad\text{and}\quad y = (E_u - T_u)^{-1}E_u(\xi) - (E_u - T_u)^{-1}\phi.$$
Then we can define projections $\pi_{u+}$ and $\pi_{u-}$ to $u^{-1}E_+u$ and $u^{-1}E_-u$ as
$$\pi_{u+}(\xi + \phi) = w + E_u(w) \quad\text{and}\quad \pi_{u-}(\xi + \phi) = y + T_u(y).$$
It follows that
$$(\pi_{u+} - \pi_{u-})\xi = -2E_u(E_u - T_u)^{-1}T_u\xi - (E_u - T_u)^{-1}(T_u + E_u)\xi, \tag{13}$$
$$(\pi_{u+} - \pi_{u-})\phi = 2(E_u - T_u)^{-1}\phi + (T_u + E_u)(E_u - T_u)^{-1}\phi. \tag{14}$$
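The decomposition into $w$ and $y$ and the resulting formulae (13) and (14) can be checked numerically. The sketch below assumes that $\mathfrak g$ and $\mathfrak m$ are both identified with $\mathbb R^n$, so that $E_u$ and $T_u$ become $n\times n$ matrices `E` and `T`; the random matrices are placeholders for illustration and are not graph coordinates of any particular model.

```python
import numpy as np

def split(E, T, xi, phi):
    """Return (w, y) with xi + phi = (w + E w) + (y + T y)."""
    X = np.linalg.inv(E - T)
    w = X @ phi - X @ T @ xi
    y = X @ E @ xi - X @ phi
    return w, y

rng = np.random.default_rng(2)
n = 3
E, T = rng.normal(size=(n, n)), rng.normal(size=(n, n))
xi, phi = rng.normal(size=n), rng.normal(size=n)
w, y = split(E, T, xi, phi)

X = np.linalg.inv(E - T)
# g- and m-components of (pi_{u+} - pi_{u-})(xi + phi), computed from w and y
diff_g = w - y
diff_m = E @ w - T @ y
# the same components read off from (13) and (14)
rhs_g = -X @ (T + E) @ xi + 2 * X @ phi
rhs_m = -2 * E @ X @ T @ xi + (T + E) @ X @ phi
```

The agreement of the $\mathfrak m$-components relies on the identity $E_u(E_u - T_u)^{-1}T_u = T_u(E_u - T_u)^{-1}E_u$, which holds for any pair of matrices with $E_u - T_u$ invertible.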
From this we can rewrite the last equation for the Hamiltonian as
$$4H = \langle (\pi_{u+} - \pi_{u-})(u^{-1}u_x + s_xs^{-1}),\ u^{-1}u_x + s_xs^{-1}\rangle.$$
This can be further simplified by removing the $u$ dependence from the projections. If $\pi_+$ is the projection to $E_+$ with kernel $E_-$, then $\pi_{u+} = \mathrm{Ad}_{u^{-1}}\circ\pi_+\circ\mathrm{Ad}_u$, and since the inner product is adjoint invariant we find
$$4H = \langle (\pi_+ - \pi_-)(u_xu^{-1} + us_xs^{-1}u^{-1}),\ u_xu^{-1} + us_xs^{-1}u^{-1}\rangle \tag{15}$$
or in terms of a combined variable on $D$,
$$4H = \langle (\pi_+ - \pi_-)(k_xk^{-1}),\ k_xk^{-1}\rangle. \tag{16}$$
The equations of motion can similarly be written in terms of $k$ as
$$\dot kk^{-1} = (\pi_- - \pi_+)(k_xk^{-1}). \tag{17}$$
3.3. Symmetries of the models. Returning to the equations of motion in the form $k_\pm k^{-1}\in E_\mp$, it is clear that
$$k\mapsto kd, \qquad d\in D \tag{18}$$
is a global symmetry of the model. This has been discussed in [3]. In addition to this known symmetry we now consider
$$k\mapsto dk, \qquad E_\pm\mapsto dE_\pm d^{-1}, \qquad d\in D \tag{19}$$
which alters the subspaces $E_\pm$ and hence the model. On our phase space picture, where the different subspaces appear as different Hamiltonians, this left translation in $D$ may not preserve the Hamiltonian for a particular model, but rather takes us from one model to another. To have a dynamical symmetry of a particular model we can proceed to restrict to left multiplication by those $d\in D$ such that $dE_\pm d^{-1} = E_\pm$. We distinguish two special cases: (1) the subspaces $E_\pm$ are $G$-invariant, and (2) the subspaces $E_\pm$ are $M$-invariant. In Case (1) we say that the models are $G$-invariant. Then $T_u = T_e$ and $E_u = E_e$ are independent of $u\in G$, and the models themselves are simpler to work with. The actions of $d\in G$ by left translation in terms of the variables of the model and the dual model are
$$(u,s)\mapsto(du,s), \qquad (t,v)\mapsto\big((t^{-1}\triangleleft d^{-1})^{-1},\ (t^{-1}\triangleright d^{-1})^{-1}v\big)$$
respectively. To see if the left translation has a moment map, we consider $k_z = \delta k$ for $\delta\in\mathfrak d$ in the equation for the symplectic form:
$$2\omega(k; \delta k, k_y) = \int_{x=0}^{\pi}\langle k(k^{-1}k_y)_xk^{-1},\ \delta\rangle\,dx - \Big[\langle s_zs^{-1},\ u^{-1}u_y\rangle\Big]_{x=0}^{\pi}.$$
If $\delta\in\mathfrak g$, then $s_z = 0$, so we have the moment map
$$I_\delta(k) = -\frac12\int\langle k_xk^{-1},\ \delta\rangle\,dx, \qquad \delta\in\mathfrak g.$$
In terms of the sigma-model on $G$, this is
$$-4I_\delta(u) = \int\langle 2u^{-1}u_x + (T_u - E_u)(u^{-1}\dot u) + (T_u + E_u)(u^{-1}u_x),\ u^{-1}\delta u\rangle\,dx, \qquad \delta\in\mathfrak g,$$
which is a conserved charge in the $G$-invariant case. The left translations for $\delta\in\mathfrak m$ are not in general given by moment maps. There are analogous formulae for the dual model and the $M$-invariant case. We shall return to these symmetries when we have discussed boundary conditions for the models. We shall also study the particular properties of $G$-invariant models in some detail in later sections.
4. Solutions Independent of x

In this section we show that the systems above in the Hamiltonian form have “pointlike” limits where the solutions are restricted so that the field $u$, say, is independent of $x$. This then becomes a system of a classical particle moving on the group manifold of $G$. In the dual picture, i.e. in terms of the variable $t$, the model is far from point-like and instead describes some form of extended object in the manifold $M$. We obtain the Poisson brackets and the Hamiltonian and we study the symmetries, in particular the $G$-invariant case. The dual case where $t$ is pointlike and $u$ extended is identical with the roles of $G$ and $M$ interchanged and is therefore omitted except with regard to the study of this case when the model is $G$-invariant.

4.1. The point-particle Poisson structure. The solutions which have $u(x)$ independent of $x$ are parameterised by initial values of $u\in G$ and $p = s_xs^{-1}\in\mathfrak m$. This is because the equation $s_xs^{-1} = (T_u - E_u)(u^{-1}\dot u)/2$ shows that $p$ is also independent of $x$. Therefore the effective phase space coordinates are $(u,p)$ rather than the fields $(u(x), s(x))$ in the general case. The symplectic form per unit length is then
$$2\omega(u,p; u_z,p_z; u_y,p_y) = \langle p_y,\ u^{-1}u_z\rangle - \langle p_z,\ u^{-1}u_y\rangle + \langle p,\ [u^{-1}u_z, u^{-1}u_y]\rangle,$$
which is closed independently of the pairing used. This can also be written as
$$2\omega(u,p; u_z,p_z; u_y,p_y) = \langle (upu^{-1})_y,\ u_zu^{-1}\rangle - \langle (upu^{-1})_z,\ u_yu^{-1}\rangle - \langle p,\ [u^{-1}u_z, u^{-1}u_y]\rangle. \tag{20}$$
We now invert the symplectic form on the phase space $\mathfrak m\times G$ to find the Poisson structure. Define $\omega_0 : (\mathfrak m\oplus\mathfrak g)\otimes(\mathfrak m\oplus\mathfrak g)\to\mathbb R$ by
$$2\omega_0(p_y\oplus\xi_y,\ p_z\oplus\xi_z) = \langle p_y,\xi_z\rangle - \langle p_z,\xi_y\rangle + \langle p,\ [\xi_z,\xi_y]\rangle, \qquad \forall p_y,p_z\in\mathfrak m,\ \xi_y,\xi_z\in\mathfrak g.$$
Take a basis $e_i$ of $\mathfrak g$ and a dual basis $e^i$ of $\mathfrak m = \mathfrak g^*$ (for $1\le i\le n$). Then we can take a basis of $\mathfrak m\oplus\mathfrak g$ as $f_i = e^i$ for $1\le i\le n$ and $f_i = e_{i-n}$ for $n+1\le i\le 2n$. Then in this basis,
$$2\omega_0 = \begin{pmatrix}0 & \mathrm{id}\\ -\mathrm{id} & A\end{pmatrix} \quad\text{and}\quad (2\omega_0)^{-1} = \begin{pmatrix}A & -\mathrm{id}\\ \mathrm{id} & 0\end{pmatrix},$$
where $A_{ij} = \langle p,\ [e_i,e_j]\rangle$. The corresponding tensor is
$$\frac12\,\omega_0^{-1} = \sum_{1\le i\le n}\big(e_i\otimes e^i - e^i\otimes e_i\big) + \sum_{1\le i,j\le n}\langle p,\ [e_i,e_j]\rangle\,e^i\otimes e^j.$$
Now, $\omega(u,p; u\xi_z,p_z; u\xi_y,p_y) = \omega_0(p_z\oplus\xi_z,\ p_y\oplus\xi_y)$ so its inverse, the corresponding Poisson bivector, is given by left translation from $\omega_0^{-1}$,
$$\gamma(p,u) = 2\sum_i\big(\tilde e_i\otimes e^i - e^i\otimes\tilde e_i\big) + 2\delta p, \tag{21}$$
where $\tilde\xi = u\xi$ is the left-invariant vector field generated by $\xi\in\mathfrak g$.
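The block-matrix inversion used above is easy to confirm numerically. The sketch below builds $2\omega_0$ and its claimed inverse in the basis $f_i$ and multiplies them; the choice of the structure constants of $so_3$ for $A_{ij} = \langle p,[e_i,e_j]\rangle$ and the particular value of `p` are assumptions made only to have a concrete antisymmetric matrix.

```python
import numpy as np

def two_omega0(A):
    """Matrix of 2*omega_0 in the basis (e^1..e^n, e_1..e_n)."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[np.zeros((n, n)), I], [-I, A]])

def two_omega0_inverse(A):
    """Claimed inverse of the matrix of 2*omega_0."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[A, -I], [I, np.zeros((n, n))]])

# Illustrative assumption: [e_i, e_j] = eps_ijk e_k (structure constants of so(3))
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

p = np.array([0.3, -1.2, 2.0])            # components of p in the dual basis
A = np.einsum('ijk,k->ij', eps, p)        # A_ij = <p, [e_i, e_j]>
product = two_omega0(A) @ two_omega0_inverse(A)
```

The product is the identity for any square matrix $A$, so the inversion does not depend on the Lie algebra chosen for the illustration.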
The Poisson bracket itself then can be described simply for functions $f, g$ on $G$ and $\xi,\eta\in\mathfrak g = \mathfrak m^*$ by
$$\{f,g\} = 0, \qquad \{\xi,f\} = -2\tilde\xi(f), \qquad \{\xi,\eta\} = 2[\xi,\eta]. \tag{22}$$
From this it is clear that we can quantise the system with the Weyl algebra $C[G]\,{>\!\!\!\triangleleft}\,U(\mathfrak g)$ or at the $C^*$-algebra level $C(G)\,{>\!\!\!\triangleleft}\,C^*(G)$, where $G$ acts on $G$ by left multiplication.

4.2. The point-particle Hamiltonian. We have shown that $p = s_xs^{-1}$ is independent of $x$, so $s$ is of the form $s = e^{px}a$, where $a\in M$ is also independent of $x$. To find the equations of motion we write $k = ue^{px}a$, where $u\in G$ depends only on time, not on $x$. Then the equation of motion $\dot kk^{-1} = (\pi_- - \pi_+)k_xk^{-1}$ gives
$$\dot uu^{-1} + u\,\frac{d}{dt}(e^{px})\,e^{-px}u^{-1} + ue^{px}\dot aa^{-1}e^{-px}u^{-1} = (\pi_- - \pi_+)(upu^{-1}),$$
which yields, for the case $x = 0$, $u^{-1}\dot u + \dot aa^{-1} = (\pi_{u-} - \pi_{u+})p$, and taking the first order terms in $x$ gives $\dot p = [\dot aa^{-1}, p]$. We can now get rid of the variable $a$ and write the equations of motion in terms of $u$ and $p$ only,
$$u^{-1}\dot u = \pi_{\mathfrak g}(\pi_{u-} - \pi_{u+})p, \qquad \dot p = [\pi_{\mathfrak m}(\pi_{u-} - \pi_{u+})p,\ p].$$
In the constant case, the Hamiltonian per unit length (15) restricts to
$$4H = \langle (\pi_+ - \pi_-)(upu^{-1}),\ upu^{-1}\rangle. \tag{23}$$
We have to check that the restricted Hamiltonian and the restricted symplectic form indeed correspond to these equations of motion, i.e. that the constraint of $x$-independence commutes with the original Hamiltonian. To do this, it will be convenient to first calculate from the equations of motion
$$\frac{d}{dt}(upu^{-1}) = u[(\pi_{u-} - \pi_{u+})p,\ p]u^{-1} = [(\pi_- - \pi_+)(upu^{-1}),\ upu^{-1}],$$
and now we can write
$$\begin{aligned}2\omega(u,p; u_z,p_z; \dot u,\dot p) &= \langle [upu^{-1},\ (\pi_+ - \pi_-)upu^{-1}],\ u_zu^{-1}\rangle - \langle (upu^{-1})_z,\ \dot uu^{-1}\rangle - \langle upu^{-1},\ [u_zu^{-1}, \dot uu^{-1}]\rangle\\ &= \langle [u_zu^{-1}, upu^{-1}],\ (\pi_+ - \pi_-)upu^{-1}\rangle - \langle (upu^{-1})_z - [u_zu^{-1}, upu^{-1}],\ \dot uu^{-1}\rangle\\ &= \langle (upu^{-1})_z,\ (\pi_+ - \pi_-)upu^{-1}\rangle - \langle up_zu^{-1},\ (\pi_+ - \pi_-)upu^{-1}\rangle - \langle up_zu^{-1},\ \dot uu^{-1}\rangle\\ &= \langle (upu^{-1})_z,\ (\pi_+ - \pi_-)upu^{-1}\rangle - \langle p_z,\ u^{-1}\dot u - (\pi_{u-} - \pi_{u+})p\rangle\\ &= \langle (upu^{-1})_z,\ (\pi_+ - \pi_-)upu^{-1}\rangle + \langle p_z,\ \dot aa^{-1}\rangle = 2H_z,\end{aligned}$$
where we used at the end the equations of motion again, and then that $\langle p_z,\ \dot aa^{-1}\rangle = 0$ as $\mathfrak m$ is isotropic. In terms of graph coordinates, we can write the equations of motion as
$$u^{-1}\dot u = -2(E_u - T_u)^{-1}p = 2T_u^{-1}(E_u^{-1} - T_u^{-1})^{-1}E_u^{-1}p, \tag{24}$$
$$\dot p = -[(E_u + T_u)(E_u - T_u)^{-1}p,\ p] = [(E_u^{-1} - T_u^{-1})^{-1}(E_u^{-1} + T_u^{-1})p,\ p], \tag{25}$$
and the Hamiltonian as
$$4H = \langle (\pi_{u+} - \pi_{u-})p,\ p\rangle = 2\langle (E_u - T_u)^{-1}p,\ p\rangle = 2\langle (E_u^{-1} - T_u^{-1})^{-1}E_u^{-1}p,\ E_u^{-1}p\rangle. \tag{26}$$
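The alternative forms in (24)-(26) rest on elementary operator identities, which the following sketch checks numerically. It assumes a finite-dimensional model in which $\mathfrak g$ and $\mathfrak m$ are identified with $\mathbb R^n$ through dual bases, so that the pairing is the dot product and the orthogonality of the two graphs becomes $T = -E^{\mathsf T}$; these identifications and the random test data are assumptions made for illustration only.

```python
import numpy as np

inv = np.linalg.inv
rng = np.random.default_rng(3)
n = 3

B = rng.normal(size=(n, n))
S = B @ B.T + n * np.eye(n)          # symmetric positive definite part
N = rng.normal(size=(n, n))
N = N - N.T                          # antisymmetric part
E = S + N                            # graph coordinate E: g -> m
T = -E.T                             # orthogonality <E xi, eta> + <T eta, xi> = 0

p = rng.normal(size=n)
Y = inv(inv(E) - inv(T))             # (E^{-1} - T^{-1})^{-1}

# two forms of u^{-1} u-dot in (24)
lhs24 = -2 * inv(E - T) @ p
rhs24 = 2 * inv(T) @ Y @ inv(E) @ p

# two forms of the first entry of the bracket in (25)
lhs25 = -(E + T) @ inv(E - T) @ p
rhs25 = Y @ (inv(E) + inv(T)) @ p

# two forms of 4H in (26)
H_a = 2 * (inv(E - T) @ p) @ p
H_b = 2 * (Y @ inv(E) @ p) @ (inv(E) @ p)
```

In this model $E - T = 2S$, so $4H = p\cdot S^{-1}p$ is positive whenever the symmetric part $S$ is positive definite.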
There is also a “conjugate” description of the system which we mention briefly here. Although only $s_xs^{-1} = p$ is directly needed for solving the $x$-independent equations of motion for the $u$ variable, the rest of the degrees of freedom in $s$ are also an auxiliary part of the system from the point of view of the group $D$. It turns out that one could equally regard $(p,a)$ as phase space variables and solve the system in terms of them, with $u$ regarded as auxiliary. Then the equations of motion would be
$$\dot aa^{-1} = \pi_{\mathfrak m}(\pi_{u-} - \pi_{u+})p = -(E_u + T_u)(E_u - T_u)^{-1}p, \qquad \dot p = [\pi_{\mathfrak m}(\pi_{u-} - \pi_{u+})p,\ p]. \tag{27}$$
If we work with the phase space $\mathfrak m\times M = \mathfrak g^\star\times G^\star$, we can more easily compare the system with the classical phase space of the bicrossproduct Hopf algebra $U(\mathfrak g)\,{\triangleright\!\!\!\blacktriangleleft}\,C[G^\star]$ associated to the same factorisation of $D$ in [4]. In fact both the Poisson structures and the natural Hamiltonians look somewhat different, but the general interpretation as a particle on $M = G^\star$ with momentum given by $p\in\mathfrak g^\star$ is the same.

4.3. Symmetries of the point-particle system. We now consider which of the translation symmetries of the general theory restrict to the $x$-independent solutions. First of all, the right translation symmetries are not interesting in this case: the right action by $M$ is the identity on our $(u,p)$ coordinates, while the right action by $G$ does not preserve that $u$ is $x$-independent. On the other hand, the left translation symmetries by $d\in D$ do preserve that $u$ is $x$-independent. We compute the Hamiltonian functions for these actions. First of all, for an infinitesimal transformation by $\phi\in\mathfrak m$ the variations of $u$, $upu^{-1}$ are
$$u_\phi = \phi\triangleright u, \qquad (upu^{-1})_\phi = [\phi,\ upu^{-1}],$$
and hence (20) yields $2\omega(u,p; u_z,p_z; u_\phi,p_\phi) = -\langle (upu^{-1})_z,\ \phi\rangle$ for any variation $u_z, p_z$. Hence the Hamiltonian function generating this flow is
$$I_\phi(u,p) = -\frac12\langle upu^{-1},\ \phi\rangle = -\frac12\langle u(p\triangleright u^{-1}),\ \phi\rangle = -\frac12\langle u\,b_p(u^{-1})\,u^{-1},\ \phi\rangle, \qquad \forall\phi\in\mathfrak m.$$
Similarly, for an infinitesimal left translation generated by $\xi\in\mathfrak g$ we have $u_\xi = \xi u$ (the right-invariant vector field generated by $\xi$) and $p_\xi = 0$. In this case we obtain more simply $2\omega(u,p; u_z,p_z; u_\xi,p_\xi) = -\langle (upu^{-1})_z,\ \xi\rangle$ or the generating function
$$I_\xi(u,p) = -\frac12\langle upu^{-1},\ \xi\rangle = -\frac12\langle p\triangleleft u^{-1},\ \xi\rangle, \qquad \forall\xi\in\mathfrak g.$$
The two cases can be combined into a single generating function or moment map
$$I_\delta(u,p) = -\frac12\langle upu^{-1},\ \delta\rangle, \qquad \forall\delta\in\mathfrak d. \tag{28}$$
In particular, we see that if the model is $G$-invariant, so that $G$ is a dynamical symmetry, then the projection of $upu^{-1}$ to $\mathfrak m$,
$$Q_G = p\triangleleft u^{-1} \tag{29}$$
is a constant of motion, the conserved charge for the symmetry. Likewise, if the model is $M$-invariant then the projection of $upu^{-1}$ to $\mathfrak g$,
$$Q_M = u\,b_p(u^{-1})\,u^{-1} \tag{30}$$
is a constant of motion. The Hamiltonian and the equations of motion also simplify in the $G$-invariant case, namely (24)-(26) with $E_u = E_e$ and $T_u = T_e$. Writing $U = 2(T_e - E_e)^{-1}$, $V = \frac12(E_e + T_e)$, we have
$$u^{-1}\dot u = Up, \qquad \dot p = [VUp,\ p], \qquad 4H = -\langle Up,\ p\rangle. \tag{31}$$
Thus, the equations of motion decouple in this case; $\dot p$ is a quadratic function of $p$ and $u^{-1}\dot u$ is a linear function of $p$, so that $u$ can then be obtained (in principle) by integration once $p(t)$ is known.

4.4. The extended system dual to the point-particle limit. The dual model when $u$ is $x$-independent is described by variables $t, v$ both far from $x$-independent. The dual constraint is one where $t$ is fixed to be $x$-independent, in which case the model in our original $u, s$ description is far from $x$-independent. Rather, it is some form of “extended solution”. We can reverse the order of factorisation $k = ue^{px}a = tv$ to get $t^{-1} = (e^{px}a)^{-1}\triangleleft u^{-1}$ and $v^{-1} = (e^{px}a)^{-1}\triangleright u^{-1}$. Here $u$, $p$ and $a$ are functions of the time $t$ only. It can be seen that $t$ has a modified exponential behaviour in $x$, and that $v$ is a constant acted on by an exponential as a function of $x$. In particular $t$ will not satisfy the Neumann boundary conditions. The Hamiltonian can be written as
$$4H = \langle (\pi_{t+} - \pi_{t-})(t^{-1}t_x + v_xv^{-1}),\ t^{-1}t_x + v_xv^{-1}\rangle,$$
where $\pi_{t\pm}$ are the projections to $t^{-1}E_\pm t$. The constraints on the dual system corresponding to the constant $u$ are that $t\triangleright v$ and $t_xt^{-1} + tv_xv^{-1}t^{-1}$ are independent of $x$.
5. More About Graph Coordinates

In this section we provide some preliminary results on the explicit construction of the graph coordinates of the subspaces $\mathrm{Ad}_{u^{-1}}E_\pm$ in terms of the actions of the groups on the Lie algebras. This is needed, in particular, for the explicit computations for the quasitriangular case in the next section. In fact it will be convenient to consider the inverses of the graph coordinates rather than the graph coordinates themselves, as the formulae are considerably simpler. Thus, given generic $E_\pm$, the subspace $\mathrm{Ad}_{u^{-1}}E_+$ contains elements of the form
$$\mathrm{Ad}_{u^{-1}}\big(E_e^{-1}(\phi)\oplus\phi\big) = \mathrm{Ad}_{u^{-1}}\big(E_e^{-1}(\phi) + b_\phi(u)\big)\oplus\phi\triangleleft u = E_u^{-1}(\phi\triangleleft u)\oplus\phi\triangleleft u,$$
so we deduce that $E_u^{-1}(\phi) = \mathrm{Ad}_{u^{-1}}\big(E_e^{-1}(\phi\triangleleft u^{-1}) + b_{\phi\triangleleft u^{-1}}(u)\big)$. We can write this as
$$\bar E_u^{-1} \equiv \mathrm{Ad}_u\circ E_u^{-1}\circ((\ )\triangleleft u), \qquad \bar E_u^{-1} = E_e^{-1} + b(u). \tag{32}$$
Also observe that $(s\triangleleft u^{-1})\triangleright u = (s\triangleright u^{-1})^{-1}$ for any double cross product group, which implies that $\mathrm{Ad}_{u^{-1}}b_{\phi\triangleleft u^{-1}}(u) = -b_\phi(u^{-1})$ (this is part of the cocycle property for $b$). Hence we can write equivalently
$$E_u^{-1}(\phi) = \mathrm{Ad}_{u^{-1}}\big(E_e^{-1}(\phi\triangleleft u^{-1})\big) - b_\phi(u^{-1}). \tag{33}$$
The same formulae hold for $T$ replacing $E$. If we consider the dual model the subspace $\mathrm{Ad}_{t^{-1}}E_+$ contains elements of the form
$$\mathrm{Ad}_{t^{-1}}\big(\xi\oplus\hat E_e^{-1}(\xi)\big) = t^{-1}\triangleright\xi\oplus\mathrm{Ad}_{t^{-1}}\big(\hat E_e^{-1}(\xi) + a_\xi(t^{-1})\big) = t^{-1}\triangleright\xi\oplus\hat E_t^{-1}(t^{-1}\triangleright\xi),$$
from which we deduce
$$\hat{\bar E}_t^{-1} \equiv \mathrm{Ad}_t\circ\hat E_t^{-1}\circ(t^{-1}\triangleright(\ )), \qquad \hat{\bar E}_t^{-1} = \hat E_e^{-1} + a(t^{-1}) \tag{34}$$
or equivalently that
$$\hat E_t^{-1}(\xi) = \mathrm{Ad}_{t^{-1}}\big(\hat E_e^{-1}(t\triangleright\xi)\big) - a_\xi(t), \tag{35}$$
similarly for $\hat T$. Note also that $E_e^{-1}(\phi) + \phi\in E_+$ for all $\phi\in\mathfrak m$ and since this also characterises $\hat E_e$ (and similarly for $\hat T_e$), we conclude that
$$\hat E_e = E_e^{-1}, \qquad \hat T_e = T_e^{-1}. \tag{36}$$
Finally, we specialise to the case of a coadjoint matched pair, i.e. where $\mathfrak g$ is a Lie bialgebra and $\mathfrak m = \mathfrak g^\star$, with $\mathfrak d = \mathfrak g\bowtie\mathfrak g^\star$ the Drinfeld double. This recovers the formulae of [3]. Now, associated to the Lie bialgebra structure is a Poisson–Lie group structure on $G$ defined by the bivector
$$\gamma_G(u) = \tilde\Pi(u),$$
where $\tilde{\ } = R_*$ denotes extension as a left-invariant vector field and $\Pi : G\to\mathfrak g\otimes\mathfrak g$ is the cocycle $\Pi\in Z^1_{\mathrm{Ad}}(G, \mathfrak g\otimes\mathfrak g)$ extending the Lie cobracket $\delta\in Z^1_{\mathrm{ad}}(\mathfrak g, \mathfrak g\otimes\mathfrak g)$ (which is the derivative of $\Pi$ at the group identity). Since the action of $\mathfrak g^\star$ on $\mathfrak g$ in the coadjoint matched pair is just $\delta$ viewed by evaluation against the second factor of its output, the cocycle generator $b$ of its corresponding vector fields on $G$ is just $b = \Pi$ in this case. Also observe that we could equally well have defined $\gamma$ as generated by right-invariant vector fields from some $\Pi^R$, say. Here $\Pi^R(u) = \mathrm{Ad}_{u^{-1}}(\Pi(u)) = -\Pi(u^{-1})$, the last equation by the cocycle condition obeyed by $\Pi$. To apply these observations to the above we write the operator $E_u^{-1} : \mathfrak m\to\mathfrak g$ as an evaluation against the second factor of elements $E_u^{-1}\in\mathfrak g\otimes\mathfrak g$ (we use the same symbols when the meaning is clear). Similarly for $\hat E_t^{-1}$. Then
$$E_u^{-1} = \mathrm{Ad}_{u^{-1}}(E_e^{-1}) - \Pi(u^{-1}) = \mathrm{Ad}_{u^{-1}}(E_e^{-1}) + \Pi^R(u) \tag{37}$$
as elements of $\mathfrak g\otimes\mathfrak g$. Inverting this defines the Lagrangian for the models,
$$L = \langle E_u(u^{-1}u_-),\ u^{-1}u_+\rangle = E_u(u^{-1}u_+,\ u^{-1}u_-), \tag{38}$$
where in the second expression we view $E_u : \mathfrak g\to\mathfrak m$ as an evaluation against the second factor of $E_u\in\mathfrak m\otimes\mathfrak m$. Or in terms of $\bar E_u^{-1} = \mathrm{Ad}_u(E_u^{-1})\in\mathfrak g\otimes\mathfrak g$, we have [3]
$$\bar E_u^{-1} = E_e^{-1} + \Pi(u), \tag{39}$$
and the Lagrangian is written equally as
$$L = \langle \bar E_u(u_-u^{-1}),\ u_+u^{-1}\rangle = \bar E_u(u_+u^{-1},\ u_-u^{-1}). \tag{40}$$
One or the other of these two forms is usually easier to compute. Similarly, for the dual model we identify $a(t) : \mathfrak g\to\mathfrak m$ with evaluation against the first component of $\hat\Pi^R$, i.e. $a = -\hat\Pi^R$ when the latter is considered as an operator by evaluation against its second factor (a convention that we adopt unless stated otherwise). Then
$$\hat E_t^{-1} = \mathrm{Ad}_{t^{-1}}(E_e) + \hat\Pi^R(t), \qquad \hat{\bar E}_t^{-1} = E_e + \hat\Pi(t) \tag{41}$$
and
$$\hat L = \hat E_t(t^{-1}t_+,\ t^{-1}t_-) = \hat{\bar E}_t(t_+t^{-1},\ t_-t^{-1}) \tag{42}$$
is the Lagrangian for the dual model. These results allow us to explicitly construct the graph coordinates and the Lagrangians given a generic splitting of $\mathfrak d$ into subspaces $E_\pm$. The latter are equivalent to specifying $E_e^{-1}$, $T_e^{-1}$ and these allow us to obtain the general $E_u^{-1}$, etc., from (33) or from (37), etc., in the coadjoint case.
6. Models Based on $\mathfrak g$ Quasitriangular

In this section we define a class of Poisson–Lie dual models based on the double of $\mathfrak g$ (the usual setting) but in the special case where $\mathfrak g$ is quasitriangular and factorisable. In this case we are able to obtain much more explicit formulae for the model and the dual model than in the general case. A Lie bialgebra is quasitriangular if there is an element $r\in\mathfrak g\otimes\mathfrak g$ such that $\delta\xi = \mathrm{ad}_\xi(r)$ and $r$ obeys the classical Yang–Baxter equations
$$[r_{12}, r_{13}] + [r_{12}, r_{23}] + [r_{13}, r_{23}] = 0 \tag{43}$$
and has $2r_+ = r + r_{21}$ ad-invariant. A factorisable quasitriangular Lie bialgebra is one where $2r_+$ viewed as a map $\mathfrak g^*\to\mathfrak g$ is invertible. We denote its inverse by $K$. In standard examples where $\mathfrak g$ is simple, $K$ is a multiple of the Killing form viewed as a map. In this case there is an isomorphism [21, 12]
$$\mathfrak d = \mathfrak g\bowtie\mathfrak g^* \cong \mathfrak g_L\oplus\mathfrak g_R, \qquad \xi\oplus\phi\mapsto\big(\xi + r_1(\phi),\ \xi - r_2(\phi)\big)$$
which also sends the bilinear form $\langle\ ,\ \rangle$ on $\mathfrak d$ to $K_L - K_R$ on the two copies $\mathfrak g_L$, $\mathfrak g_R$ of $\mathfrak g$. Here $K_L$, $K_R$ are two copies of $K$. Therefore the inverse image of $\mathfrak g_L$, $\mathfrak g_R$ defines a splitting of $\mathfrak d$ into mutually orthogonal subspaces. From the explicit form of the isomorphism in [12] one finds
$$E_+ = \{\xi - r_1(K(\xi)) + K(\xi)\}, \qquad E_- = \{\xi - r_2(K(\xi)) - K(\xi)\}. \tag{44}$$
These subspaces are not generic, however (the graphs blow up) but they are the model for the construction which follows. In fact one has a two parameter family of models by varying the coefficients of r1 , K in E + , etc., with graph coordinates in the general case. In another degenerate limit of these parameters one has the principal sigma model as well. 6.1. Construction of the quasitriangular models on G. The subspaces E ± defining our model will be constructed by introducing parameters into (44) in such a way as to preserve orthogonality. Equivalently, one may define suitable Ee−1 , Te−1 . We then obtain the general graph coordinates by the method of Sect. 5. In fact we consider the second problem first as it leads to the most elegant choice of ansatz for the Ee−1 , etc. Thus, in the case of a quasitriangular Lie bialgebra one has simply (u) = Adu (r) − r
(45)
for the cocycle defining its Poisson structure. This defines the Drinfeld–Sklyanin bracket on $G$ when $g$ carries the standard quasitriangular structure [20] for a simple Lie algebra $g$. These are also the Poisson brackets of which the associated quantum groups in this case are the quantisations. We refer to [12] for further discussion of these preliminaries. In view of (45) and the results of Sect. 5, it is then immediate that the graph coordinates for the model on $G$ in the quasitriangular case obey
\[ E_u^{-1} = \mathrm{Ad}_{u^{-1}}(E_e^{-1} - r) + r \tag{46} \]
as an element of $g \otimes g$. This equation, together with a little linear algebra, allows the explicit computation of the graph coordinates for any model based on a quasitriangular Lie bialgebra, given suitable $E_e^{-1}$.
E. J. Beggs, S. Majid
Motivated by (44) we now let $E_e^{-1} = (\lambda + 1)r + \mu K^{-1}$, where $\lambda, \mu$ are two complex parameters. For generic values we will indeed be able to invert to obtain graph coordinates $E_u, T_u$ and hence will obtain a model of the type studied in Sects. 2, 3. Clearly, from (46) and the ad-invariance of $K^{-1}$, we have
\[ E_u^{-1} = \lambda\,\mathrm{Ad}_{u^{-1}}(r) + r + \mu K^{-1} \tag{47} \]
as solving Eq. (46) for all $\lambda, \mu$. If we denote by $r_2 : g^* \to g$ the evaluation against the second factor of $r \in g \otimes g$ and similarly by $r_1$ for evaluation against the first factor, we have equivalently, as maps $m \to g$,
\[ E_e^{-1} = (\lambda + 1)r_2 + \mu K^{-1} = (\lambda + \mu + 1)r_2 + \mu r_1 \tag{48} \]
for our class of models. Similarly,
\[ T_e^{-1} = -(\lambda + 1)r_1 - \mu K^{-1} = -(\lambda + \mu + 1)r_1 - \mu r_2. \tag{49} \]
These imply
\[ E_e^{-1} - T_e^{-1} = (\lambda + 1 + 2\mu)K^{-1}, \qquad E_e^{-1} + T_e^{-1} = (\lambda + 1)(r_2 - r_1). \tag{50} \]
For further computations in the Hamiltonian formulation we need the difference of the associated projectors $\pi_\pm$. Rearranging (13)–(14), we have
\[ (\pi_{u+} - \pi_{u-})\xi = 2(E_u^{-1} - T_u^{-1})^{-1}\xi + (E_u^{-1} + T_u^{-1})(E_u^{-1} - T_u^{-1})^{-1}\xi, \qquad \forall \xi \in g, \tag{51} \]
\[ (\pi_{u+} - \pi_{u-})\phi = -2E_u^{-1}(E_u^{-1} - T_u^{-1})^{-1}T_u^{-1}\phi - (E_u^{-1} - T_u^{-1})^{-1}(E_u^{-1} + T_u^{-1})\phi, \qquad \forall \phi \in m. \tag{52} \]
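Evaluating at the identity and inserting the above results for $E_e^{-1}$, etc., we obtain:
\[ \langle(\pi_+ - \pi_-)\xi, \xi\rangle = \frac{2}{\lambda + 1 + 2\mu}\,K(\xi, \xi), \tag{53} \]
\[ \langle(\pi_+ - \pi_-)\xi, \phi\rangle = \frac{\lambda + 1}{\lambda + 1 + 2\mu}\,K(\xi, (r_1 - r_2)\phi), \tag{54} \]
\[ \langle(\pi_+ - \pi_-)\phi, \phi\rangle = \frac{2}{\lambda + 1 + 2\mu}\,K(T_e^{-1}\phi, T_e^{-1}\phi), \]
\[ K(T_e^{-1}\phi, T_e^{-1}\phi) = \frac{(\lambda + 1)^2}{4}\,K((r_1 - r_2)\phi, (r_1 - r_2)\phi) + \frac{(\lambda + 1 + 2\mu)^2}{4}\,K^{-1}(\phi, \phi). \tag{55} \]
These results provide for the computation of the Hamiltonian from (15) in Sect. 3.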
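As an illustration, the identities (53)–(55) can be checked numerically in a small example. The sketch below is not part of the construction above; it assumes the $sl_2$ data $r = X_+ \otimes X_- + \frac14 H \otimes H$ used later in Sect. 6.2, the basis orderings $(H, X_+, X_-)$ and $(\phi, \psi_+, \psi_-)$, and the pairing $\langle \xi \oplus \phi, \eta \oplus \psi\rangle = \phi(\eta) + \psi(\xi)$ on $d$. The names `P`, `S`, `J` and `pair` are hypothetical helpers.

```python
import numpy as np

# Assumed bases: g = span(H, X+, X-), m = g* = span(phi, psi+, psi-).
# r = X+ (x) X- + (1/4) H (x) H, stored as r[a, b] = coefficient of e_a (x) e_b.
r = np.zeros((3, 3))
r[1, 2] = 1.0
r[0, 0] = 0.25
r2, r1 = r, r.T            # evaluation against the second / first factor
Kinv = r1 + r2             # 2 r_+ viewed as a map g* -> g
K = np.linalg.inv(Kinv)

lam, mu = 0.3, 0.7         # arbitrary test values with lam + 1 + 2*mu != 0
c = lam + 1 + 2 * mu
Einv = (lam + 1) * r2 + mu * Kinv       # (48)
Tinv = -(lam + 1) * r1 - mu * Kinv      # (49)

I3, Z3 = np.eye(3), np.zeros((3, 3))
P = np.block([[Einv, Tinv], [I3, I3]])  # columns span E_+ first, then E_-
S = P @ np.diag([1.0] * 3 + [-1.0] * 3) @ np.linalg.inv(P)   # pi_+ - pi_-
J = np.block([[Z3, I3], [I3, Z3]])      # matrix of the pairing on d = g + g*

def pair(a, b):
    """Inner product <a, b> on d in the chosen coordinates."""
    return a @ J @ b

rng = np.random.default_rng(0)
xi, phi = rng.normal(size=3), rng.normal(size=3)
v = np.concatenate([xi, np.zeros(3)])   # xi regarded as an element of d
w = np.concatenate([np.zeros(3), phi])  # phi regarded as an element of d
Tphi = Tinv @ phi
d12 = (r1 - r2) @ phi

check53 = np.isclose(pair(S @ v, v), 2 / c * (xi @ K @ xi))
check54 = np.isclose(pair(S @ v, w), (lam + 1) / c * (xi @ K @ d12))
check55a = np.isclose(pair(S @ w, w), 2 / c * (Tphi @ K @ Tphi))
check55b = np.isclose(
    Tphi @ K @ Tphi,
    (lam + 1) ** 2 / 4 * (d12 @ K @ d12) + c ** 2 / 4 * (phi @ Kinv @ phi),
)
print(check53, check54, check55a, check55b)
```

The projector difference is built directly from the splitting rather than from (51)–(52), so the comparison is an independent check of the displayed formulae for this example only.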
It remains to show that the above $E_e^{-1}, T_e^{-1}$ indeed define an orthogonal splitting of $d$ into subspaces $E_\pm$ and to give these explicitly. First of all the corresponding subspaces defined by our choice of $E_e^{-1}, T_e^{-1}$ are
\[ E_+ = \{E_e^{-1}\phi \oplus \phi\} = \Big\{\xi - \frac{(\lambda + 1)\,r_1(K(\xi)) - K(\xi)}{\lambda + 1 + \mu} : \xi \in g\Big\}, \tag{56} \]
\[ E_- = \{T_e^{-1}\phi \oplus \phi\} = \Big\{\xi - \frac{(\lambda + 1)\,r_2(K(\xi)) + K(\xi)}{\lambda + 1 + \mu} : \xi \in g\Big\}. \tag{57} \]
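To show that these form an orthogonal decomposition of $d$, we calculate the inner products
\[ \langle E_e^{-1}\phi \oplus \phi,\ T_e^{-1}\phi \oplus \phi\rangle = \langle E_e^{-1}\phi, \phi\rangle + \langle\phi, T_e^{-1}\phi\rangle = \langle(\lambda + 1)(r_2 - r_1)(\phi), \phi\rangle = 0, \]
\[ \langle E_e^{-1}\phi \oplus \phi,\ E_e^{-1}\phi \oplus \phi\rangle = \langle E_e^{-1}\phi, \phi\rangle + \langle\phi, E_e^{-1}\phi\rangle = (\lambda + 1 + 2\mu)K^{-1}(\phi, \phi), \]
\[ \langle T_e^{-1}\phi \oplus \phi,\ T_e^{-1}\phi \oplus \phi\rangle = \langle T_e^{-1}\phi, \phi\rangle + \langle\phi, T_e^{-1}\phi\rangle = -(\lambda + 1 + 2\mu)K^{-1}(\phi, \phi). \]
In particular, $E^\pm$ are mutually orthogonal as required (the latter two equations show further that the inner product is nondegenerate on each subspace). To show that the subspaces span $d$ we need to show that $\xi \oplus \phi = E_e^{-1}(\psi) + \psi + T_e^{-1}(\chi) + \chi$ has a (unique) solution for $\psi, \chi \in m$ for all $\xi \in g$ and $\phi \in m$. Clearly $\psi + \chi = \phi$. Meanwhile, putting in the form of $E_e^{-1}, T_e^{-1}$ we have $\xi = \mu K^{-1}(\psi - \chi) + (\lambda + 1)(r_2(\psi) - r_1(\chi))$ which can be rearranged as
\[ \xi + (\lambda + 1)\big(-r_2 + \tfrac12 K^{-1}\big)(\phi) = \tfrac12(\lambda + 1 + 2\mu)K^{-1}(\psi - \chi). \]
Thus we have an orthogonal splitting if and only if
\[ \lambda + 1 + 2\mu \neq 0. \tag{58} \]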
We assume this throughout. Moreover, the splitting has the inverse-graph coordinates $E_e^{-1}, T_e^{-1}$ computed above. This completes the construction of our model at least in the Hamiltonian formulation. Indeed, this can be defined entirely in terms of $E_u^{-1}, T_u^{-1}$ without recourse to $E_u, T_u$ themselves. It is clear from our construction that:
(1) The model is $G$-invariant if and only if
\[ \lambda = 0 \tag{59} \]
(or the Lie bialgebra structure on $g$ is identically zero).
(2) The standard Lagrangian for the model (which requires $E_u$) exists if and only if (47) are nondegenerate, in particular when $\mu K$ dominates, i.e.
\[ |\mu| \gg |\lambda + 1| \tag{60} \]
and $g$ is semisimple.
We describe several special cases.
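The orthogonality and spanning statements leading to condition (58) can also be illustrated numerically. The following is a minimal sketch under the same assumptions as the $sl_2$ example of Sect. 6.2 (basis orderings and the pairing matrix `J` are choices made here, and `frame`, `gram_ok` are hypothetical helper names). It checks that the Gram matrix of the frame $\{E_e^{-1}f \oplus f\} \cup \{T_e^{-1}f \oplus f\}$ equals $(\lambda + 1 + 2\mu)\,\mathrm{diag}(K^{-1}, -K^{-1})$ and that the frame degenerates when $\lambda + 1 + 2\mu = 0$.

```python
import numpy as np

# r = X+ (x) X- + (1/4) H (x) H in the assumed basis (H, X+, X-).
r = np.zeros((3, 3))
r[1, 2] = 1.0
r[0, 0] = 0.25
r2, r1 = r, r.T
Kinv = r1 + r2
I3, Z3 = np.eye(3), np.zeros((3, 3))
J = np.block([[Z3, I3], [I3, Z3]])   # pairing <xi+phi, eta+psi> = phi(eta) + psi(xi)

def frame(lam, mu):
    """Columns: E_e^{-1}(f) + f and T_e^{-1}(f) + f for f in the basis of m."""
    Einv = (lam + 1) * r2 + mu * Kinv
    Tinv = -(lam + 1) * r1 - mu * Kinv
    return np.block([[Einv, Tinv], [I3, I3]])

def gram_ok(lam, mu):
    """True if E_+ and E_- are orthogonal with the Gram blocks stated in the text."""
    P = frame(lam, mu)
    c = lam + 1 + 2 * mu
    expected = c * np.block([[Kinv, Z3], [Z3, -Kinv]])
    return np.allclose(P.T @ J @ P, expected)

generic = [(0.0, 0.0), (-1.0, 1.0), (0.3, 0.7), (2.0, -0.5)]
all_orthogonal = all(gram_ok(l, m) for l, m in generic)
spans = [abs(np.linalg.det(frame(l, m))) > 1e-9 for l, m in generic]
degenerate_det = np.linalg.det(frame(1.0, -1.0))   # here lam + 1 + 2*mu = 0
print(all_orthogonal, spans, degenerate_det)
```

The parameter pairs include the pure-quasitriangular point $(\lambda, \mu) = (0, 0)$ and the modified principal point $(-1, 1)$ discussed below.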
Modified principal sigma model. This is obtained by $\lambda = -1$, $\mu = 1$. Then
\[ E^\pm = \{\xi \pm K(\xi) : \xi \in g\}, \qquad E_e^{-1} = K^{-1} = -T_e^{-1}. \tag{61} \]
Here Eu is obtained by inverting Fu = K −1 − (u) and is not independent of u ∈ G. Considering K, as maps K, 2 by evaluation against the second component, we have Eu−1 − Tu−1 = 2K −1 ,
Eu−1 + Tu−1 = 2R (u)
for this model. Here R (u) defines the Poisson-bracket associated to the Lie bialgebra structure of G and is viewed as a map m → g by evaluation (as usual) against its second factor. In particular, the Lagrangian is L = (K −1 + R (u))−1 u−1 u− , u−1 u+ = (K −1 + (u))−1 (u− u−1 ), u+ u−1 .
(62)
This recovers the setting of [3], for example, as a special case of our class of models. Note that the formulae for general µ but λ = −1 are strictly similar, with Eu = (µK −1 + R (u))−1 in the Lagrangian instead. Pure-quasitriangular and principal sigma model. The G-invariant models are obtained by λ = 0, µ = 0. In this case Eu−1 = Ee−1 = r2 + µK −1 ,
Tu−1 = Te−1 = −r1 − µK −1 .
For the equations of motion we can use the equations $u^{-1}u_- = E_e^{-1}(s_-s^{-1})$ and $u^{-1}u_+ = T_e^{-1}(s_+s^{-1})$ since the operators $E_e^{-1}$ and $T_e^{-1}$ are defined as above, even though $E_e$ and $T_e$ may not be. Then the equations of motion are most conveniently described as a sigma model for $s$, with equation
\[ (T_e^{-1}(s_+s^{-1}))_- - (E_e^{-1}(s_-s^{-1}))_+ = -[E_e^{-1}(s_-s^{-1}),\ T_e^{-1}(s_+s^{-1})]. \]
We see that this case contains another sigma model on the dual group which makes sense in the $G$-invariant case. Indeed, in the general $G$-invariant case the variable $s$ may be considered to have a complex parameter $\mu$, which makes this look very much like inverse scattering for the sigma model. Moreover, for generic $\mu$, the operators $E_e$ and $T_e$ do exist, and both $u$ and $s$ are described by sigma models.

The pure-quasitriangular model is the special case with $\mu = 0$ as well. In this case the subspaces $E^\pm$ are the ones in (44) corresponding to the Drinfeld double as $g \oplus g$. This new class of models has the Hamiltonian defined by
\[ \tfrac12\langle(\pi_+ - \pi_-)(\xi \oplus \phi),\ \xi \oplus \phi\rangle = K(\xi, \xi) + K(\xi, (r_1 - r_2)\phi) + K(r_1(\phi), r_1(\phi)). \]
The principal sigma model is the limit with $\mu \to \infty$ and a suitable rescaling. It is on the boundary of our moduli space of quasitriangular models. Then
\[ E^+ = \{\xi + \mu^{-1}((\xi - r_1 \circ K(\xi)) + K(\xi))\}, \qquad E^- = \{\xi + \mu^{-1}((\xi - r_2 \circ K(\xi)) - K(\xi))\} \tag{63} \]
and $E_e = \mu^{-1}(K^{-1} + \mu^{-1}r_2)^{-1} = \mu^{-1}K(1 - \mu^{-1}r_2 \circ K + \cdots)$. Hence the Lagrangian is
\[ L(u) = \langle(\mu K^{-1} + r_2)^{-1}(u^{-1}u_-),\ u^{-1}u_+\rangle = \mu^{-1}K(u^{-1}u_-, u^{-1}u_+) - \mu^{-2}K(r_2 \circ K(u^{-1}u_-), u^{-1}u_+) + \cdots \tag{64} \]
which after an infinite renormalisation has as leading term the usual principal sigma model. The equation of motion, to lowest order in $\mu^{-1}$, is
\[ K((u^{-1}u_+)_- + (u^{-1}u_-)_+) = \mu^{-1}K(r_1K(u^{-1}u_+)_- + r_2K(u^{-1}u_-)_+) - [K(u^{-1}u_-),\ K(u^{-1}u_+)] + \cdots. \]
This is the usual principal sigma model equations of motion to lowest order in $\mu^{-1}$, namely $(u^{-1}u_+)_- + (u^{-1}u_-)_+ = 0$.

6.2. Quasitriangular models on $SU_2$. We now compute these models for the group $G = SU_2$ and for its other real form $G = SL_2(\mathbb{R})$. Actually, only the second of these is strictly real and quasitriangular. Thus, with a basis $\{H, X_\pm\}$ for its Lie algebra (with the usual relations), we take the Drinfeld–Sklyanin quasitriangular structure
\[ r = X_+ \otimes X_- + \tfrac14 H \otimes H. \]
Let $sl_2(\mathbb{R})^*$ have the dual basis $\{\phi, \psi_\pm\}$, then its Lie algebra structure is
\[ [\phi, \psi_\pm] = \tfrac12\psi_\pm, \qquad [\psi_+, \psi_-] = 0 \]
and the other required maps are
\[ r_2\begin{pmatrix}\phi\\ \psi_+\\ \psi_-\end{pmatrix} = \begin{pmatrix}\tfrac14 H\\ 0\\ X_+\end{pmatrix}, \qquad K\begin{pmatrix}H\\ X_+\\ X_-\end{pmatrix} = \begin{pmatrix}2\phi\\ \psi_-\\ \psi_+\end{pmatrix}. \]
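As a quick sanity check on this choice of $r$, one can verify the classical Yang–Baxter equation (43) and the ad-invariance of $r + r_{21}$ in the two-dimensional representation. The sketch below assumes the usual relations $[H, X_\pm] = \pm 2X_\pm$, $[X_+, X_-] = H$ realised by $2 \times 2$ matrices; the helper names `kron3` and `comm` are hypothetical.

```python
import numpy as np

# Fundamental representation of sl_2 (assumed normalisation of the generators).
Xp = np.array([[0.0, 1.0], [0.0, 0.0]])
Xm = np.array([[0.0, 0.0], [1.0, 0.0]])
H = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

# r = X+ (x) X- + (1/4) H (x) H as a list of (coefficient, first leg, second leg).
terms = [(1.0, Xp, Xm), (0.25, H, H)]

def kron3(a, b, c):
    """Tensor product a (x) b (x) c as an 8 x 8 matrix."""
    return np.kron(a, np.kron(b, c))

def comm(a, b):
    return a @ b - b @ a

r12 = sum(c * kron3(a, b, I2) for c, a, b in terms)
r13 = sum(c * kron3(a, I2, b) for c, a, b in terms)
r23 = sum(c * kron3(I2, a, b) for c, a, b in terms)

# Left-hand side of the classical Yang-Baxter equation (43).
cybe = comm(r12, r13) + comm(r12, r23) + comm(r13, r23)

# 2 r_+ = r + r21 should commute with Z (x) 1 + 1 (x) Z for every generator Z.
t = sum(c * (np.kron(a, b) + np.kron(b, a)) for c, a, b in terms)
ad_invariant = all(
    np.allclose(comm(np.kron(Z, I2) + np.kron(I2, Z), t), 0.0) for Z in (H, Xp, Xm)
)
print(np.allclose(cybe, 0.0), ad_invariant)
```

A check in one faithful representation is only an illustration, not a proof at the level of the abstract Lie algebra.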
Note that if we take a different real form
\[ e_1 = -\frac{\imath}{2}(X_+ + X_-), \qquad e_2 = -\frac12(X_+ - X_-), \qquad e_3 = -\frac{\imath}{2}H, \]
then $[e_i, e_j] = \epsilon_{ijk}e_k$ (the real form $su_2$) but
\[ r = -\sum_i e_i \otimes e_i + \imath(e_1 \otimes e_2 - e_2 \otimes e_1) \]
is not real in this basis. If $\{f_i\}$ is a dual basis then
\[ r_2(f_j) = -e_j + \imath e_i\epsilon_{ij3}, \qquad K = -\tfrac12\,\mathrm{id}. \]
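The change of basis can be confirmed with explicit matrices. The following sketch assumes the $2 \times 2$ realisation of $H, X_\pm$ (so that $e_i = -\frac{\imath}{2}\sigma_i$) and checks the $su_2$ relations together with the two expressions for $r$; it is only an illustration in this representation.

```python
import numpy as np

Xp = np.array([[0.0, 1.0], [0.0, 0.0]])
Xm = np.array([[0.0, 0.0], [1.0, 0.0]])
H = np.array([[1.0, 0.0], [0.0, -1.0]])

e1 = -0.5j * (Xp + Xm)
e2 = -0.5 * (Xp - Xm)
e3 = -0.5j * H
e = [e1, e2, e3]

# Totally antisymmetric symbol eps_{ijk} with eps_{123} = 1.
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

# [e_i, e_j] = eps_{ijk} e_k
su2_relations = all(
    np.allclose(e[i] @ e[j] - e[j] @ e[i], sum(eps[i, j, k] * e[k] for k in range(3)))
    for i in range(3)
    for j in range(3)
)

# r written in the sl_2 basis and in the su_2 basis, as 4 x 4 matrices.
r_sl2 = np.kron(Xp, Xm) + 0.25 * np.kron(H, H)
r_su2 = -sum(np.kron(x, x) for x in e) + 1j * (np.kron(e1, e2) - np.kron(e2, e1))
same_r = np.allclose(r_sl2, r_su2)

# Symmetric part: r + r21 = -2 sum_i e_i (x) e_i, consistent with K = -(1/2) id.
r21_su2 = -sum(np.kron(x, x) for x in e) + 1j * (np.kron(e2, e1) - np.kron(e1, e2))
symmetric_part_ok = np.allclose(r_su2 + r21_su2, -2 * sum(np.kron(x, x) for x in e))
print(su2_relations, same_r, symmetric_part_ok)
```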
This means that although we can arrange for a completely real Lie bialgebra su2 in this basis (here the Lie coalgebra is purely imaginary but we can rescale r to make it real) it is not a quasitriangular one over R; the required r if we want to obey (43) lives in the complexification. In the above conventions the Lie algebra sl2 in the dual basis is imaginary, [fi , fj ] = ı(δik δj 3 − δj k δi3 )fk . The choice of basis
ei
= −ıfi is its real form su2 .
Modified principal sigma model on SU2 . To construct the model we will need (u) = Adu (r− ) − r− quite explicitly, where r− = ıe1 ∧ e2 is the antisymmetric part of r. For our purposes we write SU2 as elements
a b u= , |a|2 + |b|2 = 1. −b¯ a¯ Then working with the matrix representation ei = easy to find
−ı 2 σi
given by the Pauli matrices it is
Adu−1 (e1 ) = ((a 2 − b2 )e1 + )(a 2 + b2 )e2 − 2((ab)e3 , Adu−1 (e2 ) = −)(a 2 − b2 )e1 + ((a 2 + b2 )e2 + 2)(ab)e3 , and hence R (u) = 2ıe1 ∧ e2 |b|2 − e3 ∧ e1 (a b¯ − ab) ¯ − ıe2 ∧ e3 (a b¯ + ab) ¯
(65)
after a short computation, which is purely imaginary (as expected). Evaluating against the second factor and regarding as a matrix we have ¯ 1 −ı|b|2 −ı)(a b) ¯ . Eu−1 = K −1 + R (u) = −2 ı|b|2 1 ı((a b) ¯ −ı((a b) ¯ ı)(a b) 1 Here Eu−1 (fj ) = Eij−1 ei , where (Eij−1 ) is the matrix shown. Note that we can write ¯ ((a b) ¯ , Eij−1 = −2(δij + ıBij k πk ), π = )(a b) −|b|2 and any matrix of this form has inverse Eij = −
1 (δij − ıBij k πk − πi πj ). 2(1 − π 2 )
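The inverse formula just quoted follows from $A^2 = \pi\pi^T - \pi^2\,\mathrm{id}$ and $A\pi = 0$ for $A_{ij} = \epsilon_{ijk}\pi_k$, and it is easy to test numerically. In the sketch below the components of $\pi$ are taken to be $(\Im(a\bar b), \Re(a\bar b), -|b|^2)$, which is our reading of the text; the parametrisation of $a, b$ by angles and the function names are assumptions made for the example.

```python
import numpy as np

def eps_contract(p):
    """Matrix A with A[i, j] = eps_{ijk} p_k."""
    return np.array([[0.0, p[2], -p[1]],
                     [-p[2], 0.0, p[0]],
                     [p[1], -p[0], 0.0]])

def E_inverse(p):
    """The matrix -2 (delta_ij + i eps_{ijk} p_k)."""
    return -2.0 * (np.eye(3) + 1j * eps_contract(p))

def E_formula(p):
    """Claimed inverse -(delta_ij - i eps_{ijk} p_k - p_i p_j) / (2 (1 - p.p))."""
    p = np.asarray(p, dtype=float)
    return -(np.eye(3) - 1j * eps_contract(p) - np.outer(p, p)) / (2.0 * (1.0 - p @ p))

# A point of SU_2: |a|^2 + |b|^2 = 1 (angles chosen arbitrarily, a != 0).
theta, alpha, beta = 0.4, 0.9, -1.3
a = np.cos(theta) * np.exp(1j * alpha)
b = np.sin(theta) * np.exp(1j * beta)
w = a * np.conj(b)
pi_vec = np.array([w.imag, w.real, -abs(b) ** 2])

product = E_inverse(pi_vec) @ E_formula(pi_vec)
print(np.allclose(product, np.eye(3)), np.isclose(pi_vec @ pi_vec, abs(b) ** 2))
```

The formula breaks down when $\pi^2 = 1$, i.e. when $a = 0$ in this example.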
Here π 2 = π · π = |b|2 in our case. The corresponding operator is Eu (ej ) = Eij fi . To cast the resulting Lagrangian in a useful form let us note that Tr (id − π /)σi σj = Tr (id − π · σ )(δij id + ıBij k σk ) = 2(δij − ıBij k πk ),
where σi are the Pauli matrices and π / = π · σ . Hence in our representation of su2 in the i basis ei = −ıσ we have 2
1 1 −1 −1 −1 −1 Tr [(id − π /)u u+ u u− ] − Tr [π /u u+ ]Tr [π /u u− ] , (66) L= |a|2 2 where
¯ −|b|2 ab π /= a b¯ |b|2
0 b −1 0 b = ¯ u=u . b 0 b¯ 0
The matrix Eij here is complex since R in our conventions is imaginary. For a completely real version of this model on SU2 one should keep the freedom of general µ in this class of models so that Eu−1 = µK −1 + R (u) and then set µ = ı. Taking the real normalisation of su2 as a Lie bialgebra (i.e. multiplying r by −ı so that r− = e1 ∧ e2 and K −1 = 2ıid) gives the same Eij−1 as above but times −ı off the diagonal. One may also work of course on G = SL2 (R) with real r, K for a completely real model with µ = 1. This type of model has been considered specifically for SU2 in [18] although not in the matrix/trace form above, and more recently in [14] but without explicit formulae for the Lagrangian.

Pure-quasitriangular and principal sigma models on $SU_2$. Here we take $\lambda = 0$ and can write down immediately
\[ E_u^{-1} = E_e^{-1} = -\begin{pmatrix} 1 + 2\mu & -\imath & 0\\ \imath & 1 + 2\mu & 0\\ 0 & 0 & 1 + 2\mu \end{pmatrix} \]
which has inverse
\[ E_u = E_e = -\frac{1}{4\mu}\begin{pmatrix} \frac{1 + 2\mu}{1 + \mu} & \frac{\imath}{1 + \mu} & 0\\ -\frac{\imath}{1 + \mu} & \frac{1 + 2\mu}{1 + \mu} & 0\\ 0 & 0 & \frac{4\mu}{1 + 2\mu} \end{pmatrix} \]
for $\mu \neq 0, -\tfrac12, -1$. The Lagrangian defined by this can be conveniently obtained by writing
\[ E_{ij}^{-1} = -(1 + 2\mu)(\delta_{ij} - \imath\epsilon_{ijk}\pi_k), \qquad \pi = \begin{pmatrix} 0\\ 0\\ \frac{1}{1 + 2\mu} \end{pmatrix}, \]
which implies (by similar computations to those above),
\[ L = \frac{1}{\mu(1 + \mu)}\left( \mathrm{Tr}\left[\begin{pmatrix} 1 + \mu & 0\\ 0 & \mu \end{pmatrix} u^{-1}u_+\, u^{-1}u_-\right] - \frac{1}{4(1 + 2\mu)}\,\mathrm{Tr}[\sigma_3 u^{-1}u_+]\,\mathrm{Tr}[\sigma_3 u^{-1}u_-] \right). \]
This is singular for the pure quasitriangular model where $\mu = 0$, and also does not have a good limit at $\mu = \infty$ for the principal sigma model. Rather, we have well-defined
equations of motion conveniently described as a sigma model for $s \in M$ as explained above, using $E_e^{-1}$ and a similar matrix for $T_e^{-1}$. On the other hand, by changing the normalisation of the Lie bialgebra structure (namely, dividing $r$ by $\mu$) we have $E_u$ with the same matrix as above but without the $\mu^{-1}$ factor in front. This rescaled Lagrangian is well defined both for $\mu = 0$ and $\mu = \infty$, with
\[ \mu L \to \begin{cases} \tfrac12\mathrm{Tr}[(1 + \sigma_3)u^{-1}u_+\, u^{-1}u_-] - \tfrac14\mathrm{Tr}[\sigma_3 u^{-1}u_+]\,\mathrm{Tr}[\sigma_3 u^{-1}u_-] & \text{as } \mu \to 0,\\ \mathrm{Tr}[u^{-1}u_+\, u^{-1}u_-] & \text{as } \mu \to \infty. \end{cases} \]
The first limit is the Lagrangian for the rescaled pure-quasitriangular model on $SU_2$, while the second is the standard Lagrangian for the principal sigma model on $SU_2$ based on the Killing form of $su_2$. Notice that in this rescaled model the Lie cobracket of $su_2$ is infinite at $\mu = 0$, i.e. the Lie algebra $m$ has infinite commutators, and zero at $\mu = \infty$, i.e. the Lie algebra $m$ is Abelian. The geometrical pictures behind these two models are therefore very different but interpolated by general $\mu$. Also note that the $\mu = 0$ limit here is again defined by a complex Lagrangian.

For a real version one may look at the pure-quasitriangular model on $G = SL_2(\mathbb{R})$ instead. Here we have, clearly,
\[ E_e^{-1}\begin{pmatrix}\phi\\ \psi_+\\ \psi_-\end{pmatrix} = \begin{pmatrix}\frac{1 + 2\mu}{4}H\\ \mu X_-\\ (1 + \mu)X_+\end{pmatrix}, \qquad E_e\begin{pmatrix}H\\ X_+\\ X_-\end{pmatrix} = \mu^{-1}\begin{pmatrix}\frac{4\mu}{1 + 2\mu}\phi\\ \frac{\mu}{1 + \mu}\psi_-\\ \psi_+\end{pmatrix}. \]
As before, we take out a factor $\mu$ by rescaling in order to obtain well-defined operators $E_e$ at $\mu = 0, \infty$, this time with all coefficients being real in our choice of bases. The corresponding Lagrangian can easily be written out explicitly upon fixing a description of $u \in SL_2(\mathbb{R})$. For example, if we write $u = e^{xX_+}e^{hH}e^{yX_-}$, so that
\[ u^{-1}u_\pm = x_\pm X_+e^{-2h} + (h_\pm + yx_\pm e^{-2h})H + (y_\pm - 2yh_\pm - y^2x_\pm e^{-2h})X_-, \]
using the relations of $sl_2$ (in particular $e^{-yX_-}X_+e^{yX_-} = X_+ + yH - y^2X_-$), then the rescaled $\mu = 0$ limit gives the Lagrangian
\[ L = e^{-2h}x_+(y_- - 2yh_- - y^2e^{-2h}x_-) \]
as the pure-quasitriangular model on $SL_2(\mathbb{R})$. The $\mu = \infty$ limit is the standard principal sigma model on $SL_2(\mathbb{R})$ and the general case interpolates the two.

6.3. Dual of the quasitriangular models on G.
The quasitriangular models are examples of the case where the factorisation is based on the Drinfeld double associated to a Lie bialgebra, so that Eu−1 is related to the Poisson–Lie group G. Hence the dual models are of the same form but based on the Poisson–Lie group G rather than G, i.e. with ˆ (t) ∈ m ⊗ m in place of . As explained in Sect. 5 we can then construct them from the initial data Eˆ e−1 = Ee ,
Tˆe−1 = Te
ˆ¯ −1 = E + (t) ˆ as given above for our quasitriangular models. We compute E and invert e t to obtain the Lagrangian −1 ˆ Lˆ = (Ee + (t)) t− t −1 , t+ t −1
(67)
for the dual model. For the models below, where there is no special Adt -invariance of Ee , this is easier than computing the Lagrangian via Eˆ t . We outline the results for SU2 and SL2 (R) . First of all we describe these groups explicitly. The former is generated by the basis {−ıfi }, i.e. we write φ = φi (−ıfi ) ∈ m for real φi , which we regard as a vector φ. One standard representation of the resulting group is as matrices of the form
\[ \begin{pmatrix} x & z\\ 0 & x^{-1} \end{pmatrix}, \qquad x > 0, \quad z \in \mathbb{C}. \]
This is the group occurring in the Iwasawa decomposition $SL_2(\mathbb{C}) = SU_2 \bowtie SU_2^*$, see [12]. Another description useful for very explicit computations is as the semidirect product $\mathbb{R}^2 {>\!\!\triangleleft}\, \mathbb{R}$ [12], which can be viewed as a modified product on $\mathbb{R}^3$. Elements are $s \in \mathbb{R}^3$ with $s_3 > -1$ and the product law and inversion are
\[ st = s + (s_3 + 1)t, \qquad s^{-1} = -\frac{s}{s_3 + 1}. \]
The exponentiation from the Lie algebra to a group is explicitly
\[ s = \phi\,\frac{e^{\phi_3} - 1}{\phi_3} \]
for s = eφ in the natural 3-dimensional coadjoint representation. See [12]. The real form SL2 (R) has a similar description as C>R, i.e. where s2 is imaginary and s1 , s3 real with s3 > −1 according to the conventions in [12]. Note that x = s3 +1 is multiplicative under the group law if one wants a more standard notation. The Lie bracket on su2 determines the Lie cobracket and Poisson structure on SU2 (and similarly on SL2 (R) ). It is given by [12] 1 ˆ (s) = −ı(Bij a sa + s 2 Bij 3 )fi ⊗ fj . 2 Explicitly, ˆ ı (s) =
1 2 (s + s22 + (s3 + 1)2 − 1)f1 ∧ f2 + s2 f3 ∧ f1 + s1 f2 ∧ f3 . 2 1
Note also that the notation $s_\pm s^{-1}$ means more precisely $R_{s^{-1}*}s_\pm$. Similarly for $s^{-1}s_\pm$. In our present group coordinates, from the product law, it is easy to see that
\[ L_{s*}\phi = (s_3 + 1)\phi, \qquad R_{s*}\phi = \phi + \phi_3 s. \]
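The group law in these coordinates is simple enough to test directly. The sketch below implements the product, inversion and exponential map as stated above and checks associativity, the one-parameter subgroup property, and the two translation formulae by a finite difference; the function names and the sample points are our own choices.

```python
import numpy as np

def mul(s, t):
    """Product law st = s + (s_3 + 1) t on {s in R^3 : s_3 > -1}; identity is 0."""
    return s + (s[2] + 1.0) * t

def inv(s):
    """Inverse s^{-1} = -s / (s_3 + 1)."""
    return -s / (s[2] + 1.0)

def exp_map(phi):
    """s = phi (e^{phi_3} - 1) / phi_3, with the limit s = phi when phi_3 = 0."""
    phi = np.asarray(phi, dtype=float)
    if abs(phi[2]) < 1e-12:
        return phi.copy()
    return phi * np.expm1(phi[2]) / phi[2]

s = np.array([0.3, -1.2, 0.5])
t = np.array([-0.7, 0.4, -0.2])
q = np.array([1.1, 0.2, 2.0])
phi = np.array([0.6, -0.3, 0.8])

assoc = np.allclose(mul(mul(s, t), q), mul(s, mul(t, q)))
inverse = np.allclose(mul(s, inv(s)), 0.0) and np.allclose(mul(inv(s), s), 0.0)
one_param = np.allclose(mul(exp_map(0.4 * phi), exp_map(0.9 * phi)), exp_map(1.3 * phi))

# Differentiate left and right translation along a curve through the identity.
eps = 1e-6
curve = exp_map(eps * phi)
L_push = (mul(s, curve) - s) / eps
R_push = (mul(curve, s) - s) / eps
left_ok = np.allclose(L_push, (s[2] + 1.0) * phi, atol=1e-5)
right_ok = np.allclose(R_push, phi + phi[2] * s, atol=1e-5)
print(assoc, inverse, one_param, left_ok, right_ok)
```

Note that $(st)_3 + 1 = (s_3 + 1)(t_3 + 1)$, which is the multiplicativity of $x = s_3 + 1$ mentioned in the text.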
Dual of the modified principal sigma model. We set $\lambda = -1$ and $\mu = 1$. Then
\[ E_e = K = -\frac12\sum_i f_i \otimes f_i. \]
Hence
\[ \hat{\bar E}^{-1}_{ij} = -\frac12(\delta_{ij} + \imath\epsilon_{ijk}\hat\pi_k), \qquad \hat\pi = 2t + \begin{pmatrix} 0\\ 0\\ t^2 \end{pmatrix} \]
and
\[ \hat{\bar E}_{ij} = -\frac{2}{1 - t^2(t^2 + 4(t_3 + 1))}\,(\delta_{ij} - \imath\epsilon_{ijk}\hat\pi_k - \hat\pi_i\hat\pi_j). \]
This defines the Lagrangian
\[ \hat L = \frac{2}{1 - t^2(t^2 + 4(t_3 + 1))}\Big( \nabla_+t \cdot \nabla_-t - \imath\hat\pi \cdot (\nabla_+t \times \nabla_-t) - (\hat\pi \cdot \nabla_+t)(\hat\pi \cdot \nabla_-t) \Big), \]
where $R_{t^{-1}*}t_\pm$ is computed as
\[ \nabla_\pm t = t_\pm - t_{\pm 3}\,\frac{t}{t_3 + 1}. \]
As before, the model in the form stated is complex but with a different choice $\mu = \imath$ and different normalisation of $r$ we can obtain a real model as well.

Dual of the pure-quasitriangular and principal sigma models. Here we set $\lambda = 0$. Then rearranging $E_e$ above as an element of $m \otimes m$ we have
1 1 + 2µ 4µ ı (f1 ⊗ f1 + f2 ⊗ f2 ) + f3 ⊗ f3 + f 1 ∧ f2 . Ee = − 4µ 1 + µ 1 + 2µ 1+µ One may then compute ˆ¯ = (E + ) ˆ −1 , E t e and hence the Lagrangian. The result does not have any particular simplifying features over the λ = −1 case above, so we omit its detailed form. Both limits of µ are singular, and require rescaling. The µ → ∞ case makes sense after a rescaling of r to r/µ. This in turn scales the Lie cobracket of g by µ−1 and hence also changes the Lie algebra structure of m to an Abelian one plus corrections of order µ−1 . The effect of this is to change the exponential map and the group law of G , making the latter Abelian. This can be expressed conveniently by working in new coordinates with t scaled by µ−1 . In this new coordinate system we have ˆ¯ = −2id + O(µ−1 ) E t ˆ is linear in t to lowest order. The Lagrangian is since Lˆ = 2t + · t − + O(µ−1 ). Thus the dual model to the principal sigma model on SU2 is an Abelian one based on the group R3 with the usual linear wave equation. The similar limit for the pure-quasitriangular case is ill-defined since the Lie bracket of m becomes singular as µ → 0. Other scaling limits of both the original model and its dual are possible in this case.
6.4. Point-particle limit of the quasitriangular models. We have seen that the pointparticle limit where u is independent of x reduces to a classical mechanical dynamical system on the group G. For our quasitriangular models we have the following special cases. Point-particle modified principal model. From the expressions for Eu−1 etc. above, the Hamiltonian is H=
1 −1 K ((K ◦ R (u) + 1)p, (K ◦ 2 (u−1 ) + 1)p) 4
(68)
and the equations of motion are u−1 u˙ = K −1 ◦ ((K ◦ R (u))2 − 1)p,
p˙ = [K ◦ R (u)p, p].
(69)
In this limit both the case entirely over R or the case where r and hence the cobracket are imaginary lead to well-defined real equations of motion. In this case is imaginary but so is the Lie bracket of m in the dual basis to the real basis of g. For example, we can either work on G = SL2 (R) or, as more usual, on G = SU2 . In the latter case (see above) we have ¯ 0 −|b|2 −)(a b) ¯ = ıBij k πk fi ⊗ ej . K ◦ R (u) = ı |b|2 0 ((a b) ¯ ¯ )(a b) −((a b) 0 Using the complexified Lie bracket on su2 we have the equations of motion for p = pi fi (with pi real) as p˙ = p(p × π )3 − p3 p × π in terms of the vector cross product. This can be written explicitly as p˙ 3 = 0,
ı ı ¯ 2 ρ˙ = − abρ + 2p32 ) + ı|b|2 ρp3 , ¯ 2 + a b(|ρ| 2 2
ρ ≡ p1 + ıp2
after a short computation. On the other hand, ((K ◦ R )2 − 1)ij = (π 2 − 1)δij − πi πj , hence the equation for u in our basis ei =
−ıσi 2
of su2 is
u−1 u˙ = ı(π 2 − 1)p/ − ıπ /π · p. In our case π 2 = |b|2 and π · p = ((ρ ab) ¯ − |b|2 p3 , hence
ı 0 b p3 ρ¯ 2 2 ¯ u˙ = − . (ρ ab ¯ + ρa ¯ b − 2|b| p3 ) − ı|a| u ρ −p3 2 b¯ 0 Explicitly, this is a˙ = −ı|a|2 (ap3 + bρ),
ı ı b˙ = ıbp3 − (1 + |a|2 )a ρ¯ − ρ ab ¯ 2. 2 2
One may verify that this preserves |a|2 + |b|2 = 1 as it must.
Point-particle pure-quasitriangular model. We set $\lambda = 0$ and $E_u = E_e$ etc. (the models are $G$-invariant). The Hamiltonian and equations of motion are then
\[ 2H = \frac{1}{1 + 2\mu}\,K((r_2 + \mu K^{-1})p,\ (r_2 + \mu K^{-1})p), \tag{70} \]
\[ u^{-1}\dot u = -\frac{2}{1 + 2\mu}\,(r_2 + \mu K^{-1}) \circ K \circ (r_1 + \mu K^{-1})p, \qquad \dot p = \frac{2}{1 + 2\mu}\,[K \circ r_2 p,\ p]. \tag{71} \]
Since these models are invariant, we know that $pu^{-1}$ is conserved. This means that we can let $Q = p(0)u(0)^{-1} \in m$ be fixed and substitute $p(t) = Qu(t)$ into the equation for $\dot u$. We then solve a first order non-linear differential equation for $u(t)$. In particular, in the limit $\mu = 0$ we obtain the $x$-independent limit of the pure-quasitriangular model. Thus
\[ H = \tfrac12 K(r_2p, r_2p), \qquad u^{-1}\dot u = 2(r_2 \circ K - 1)r_2p, \qquad \dot p = 2[K \circ r_2p,\ p] \tag{72} \]
using $r_1 + r_2 = K^{-1}$ to rearrange. In this case it makes sense to consider the reduced variable $\xi = r_2p$ and write the equations of motion as
\[ u^{-1}\dot u = 2(r_2 \circ K - 1)\xi, \qquad \dot\xi = 2[r_2 \circ K\xi,\ \xi], \tag{73} \]
where we use that $r_2 : g \to g$ is a Lie algebra homomorphism in view of the classical Yang–Baxter equation (43) [12]. We only need to solve this for $\xi$ in the image of $r_2$ but it is interesting that the equation makes sense for any $\xi$ as an interesting integrable system on the group manifold. We can solve this for our strictly real form $g = sl_2(\mathbb{R})$. We will solve it here for the general (73); the special case of interest is similar but more elementary. Thus,
\[ r_2 \circ K\begin{pmatrix} H\\ X_+\\ X_- \end{pmatrix} = \begin{pmatrix} \tfrac12 H\\ X_+\\ 0 \end{pmatrix} \]
so, writing $\xi(t) = h(t)H + x(t)X_+ + y(t)X_-$, we need to solve $u^{-1}\dot u = -h(t)H - 2y(t)X_-$ and
\[ \dot hH + \dot xX_+ + \dot yX_- = [hH + 2xX_+,\ hH + xX_+ + yX_-] = 2xyH - 2hxX_+ - 2hyX_-, \]
which is the system of equations
\[ \dot h = 2xy, \qquad \dot x = -2hx, \qquad \dot y = -2hy. \]
Note first of all that
\[ \frac{d}{dt}\Big(h^2 + \frac12\dot h\Big) = 0 \]
so
\[ h^2 + xy = \frac{\omega^2}{4} \]
(say) is a constant. Inserting this into the equation for $h$ yields the Riccati equation
\[ \dot h - \frac12\omega^2 + 2h^2 = 0 \]
which has the general solution
\[ h(t) = \frac12\,\omega\,\frac{\sinh(\omega t) + \frac{2h(0)}{\omega}\cosh(\omega t)}{\cosh(\omega t) + \frac{2h(0)}{\omega}\sinh(\omega t)}. \]
We can then compute $y$ as
\[ y(t) = e^{-2\int_0^t h(\tau)\,d\tau}\,y(0) \]
and similarly for $x(t)$. Since we only need $h, y$ to obtain $u(t)$ we can consider the choice of $x(0)$ to be equivalent to the choice of $\omega$ (at least in a certain range). The initial values of $h, y$ then determine their general values as above, and these then determine $u(t)$ given $u(0)$. The latter can be expressed explicitly in terms of integrals on fixing a coordinate system for $SL_2(\mathbb{R})$. For the point-particle limit of the pure quasitriangular models we are only interested in $\xi \in b_+$ (the image of $r_2$), i.e. we specialise to solutions of the form $y(0) = 0$, which clearly implies $y(t) = 0$ and $\dot h = 0$. In this case the solution is clearly
\[ \xi(t) = \frac{\omega}{2}H + e^{-\omega t}x(0)X_+, \qquad u(t) = u(0)e^{-\frac12\omega tH} \]
for initial data $\omega, x(0), u(0)$. For the full physical momentum $p(t)$ we go back to (72). If we write $p = 2\omega\phi + x\psi_- + \bar x\psi_+$ say, then a similar computation using the Lie algebra of $sl_2(\mathbb{R})$ gives $\omega$ constant, $\dot x = -\omega x$ as before, and additionally $\dot{\bar x} = \omega\bar x$. Hence the solution is
\[ p(t) = 2\omega\phi + e^{-\omega t}x(0)\psi_- + e^{\omega t}\bar x(0)\psi_+, \qquad u(t) = u(0)e^{-\frac12\omega tH} \]
for constants ω, x(0), x(0). ¯ As a check, it is easy to verify that QG = pu−1 = (pe 2 ωtH )u(0)−1 1
is conserved. Here ψ± H = ∓2ψ± and φH = 0 is the relevant coadjoint action. Point-particle principal model. In the limit µ → ∞ of (70)–(71), we obtain the xindependent limit of the principal sigma model. Here 4H = K −1 (p, p),
$u^{-1}\dot u = -K^{-1}\bar p$, $\dot{\bar p} = 0$, where $\bar p = \mu p$ is the renormalised momentum variable. This has the general solution
\[ u(t) = u(0)e^{-tK^{-1}\bar p}, \qquad \bar p(t) = \bar p(0). \]
It is easy to see that Q = pu ¯ −1 is constant as well, using K ad-invariant.
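The reduced $sl_2(\mathbb{R})$ system $\dot h = 2xy$, $\dot x = -2hx$, $\dot y = -2hy$ of the pure-quasitriangular point-particle model lends itself to a direct numerical check of the Riccati solution. The sketch below integrates the system with a hand-written fourth-order Runge–Kutta step and compares with the closed form for $h(t)$. The closed form used for $y(t)$, namely $y(0)/(\cosh\omega t + \frac{2h(0)}{\omega}\sinh\omega t)$, is our own evaluation of the integral $e^{-2\int_0^t h}$ and is not taken from the text; the initial data are arbitrary values with $h(0)^2 + x(0)y(0) > 0$.

```python
import numpy as np

def rhs(z):
    """Right-hand side of h' = 2xy, x' = -2hx, y' = -2hy for z = (h, x, y)."""
    h, x, y = z
    return np.array([2.0 * x * y, -2.0 * h * x, -2.0 * h * y])

def rk4(z0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    z = np.array(z0, dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z

h0, x0, y0 = 0.3, 0.5, 0.8
omega = 2.0 * np.sqrt(h0 ** 2 + x0 * y0)     # from h^2 + xy = omega^2 / 4
t_end = 2.0

def denom(t):
    return np.cosh(omega * t) + (2.0 * h0 / omega) * np.sinh(omega * t)

def h_exact(t):
    num = np.sinh(omega * t) + (2.0 * h0 / omega) * np.cosh(omega * t)
    return 0.5 * omega * num / denom(t)

def y_exact(t):
    # Assumed closed form: exp(-2 * integral of h) = 1 / denom(t).
    return y0 / denom(t)

h_num, x_num, y_num = rk4([h0, x0, y0], t_end, 4000)
conserved_num = h_num ** 2 + x_num * y_num
print(h_num, h_exact(t_end), y_num, y_exact(t_end), conserved_num, omega ** 2 / 4.0)
```

The conserved quantity $h^2 + xy$ provides a convenient check on the integrator that is independent of the closed-form solution.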
7. Generalised T-Duality with Double Neumann Boundary Conditions

So far we have worked on providing a special class of Poisson–Lie T-dual models within the established general framework. We now return to our Hamiltonian formulation of the general framework and observe that in this form the main ideas can be extended to a much more general setting. Thus, from the symplectic form and the Hamiltonian we have just calculated, we can see how the definition of T-duality could be generalised.

Begin with a Lie group $D$, with Lie algebra $d$, and suppose that $d$ is the direct sum of two subspaces $E_-$ and $E_+$. We take $\pi_+$ to be the projection to $E_+$ with kernel $E_-$, and $\pi_-$ to be the projection to $E_-$ with kernel $E_+$. Suppose that there is a function $k : \mathbb{R}^2 \to D$, with the properties that $k_+k^{-1}(x_+, x_-) \in E_-$ and $k_-k^{-1}(x_+, x_-) \in E_+$ for all $(x_+, x_-) \in \mathbb{R}^2$. Then the relation $k_+k^{-1}(x_+, x_-) \in E_-$ can be summarised by $\pi_+(k_+k^{-1}) = 0$, and similarly we get $\pi_-(k_-k^{-1}) = 0$. This gives the equations of motion
\[ \dot kk^{-1} = (\pi_- - \pi_+)(k_xk^{-1}). \]
Now we look at the symplectic form on the phase space. Suppose that $d$ has an adjoint invariant inner product $\langle\ ,\ \rangle$. If we imposed boundary conditions that $k(0)$ and $k(\pi)$ were fixed, then the symplectic form we computed earlier becomes
\[ 2\omega(k; k_z, k_y) = \int_{x=0}^{\pi} \langle(k^{-1}k_y)_x,\ k^{-1}k_z\rangle\,dx. \]
If we substitute $k_z = \dot k$, then we get
\[ 2\omega(k; \dot k, k_y) = \int_{x=0}^{\pi} \langle k(k^{-1}k_y)_xk^{-1},\ \dot kk^{-1}\rangle\,dx = \int_{x=0}^{\pi} \langle(k_xk^{-1})_y,\ (\pi_- - \pi_+)(k_xk^{-1})\rangle\,dx, \]
and so
\[ 4\omega(k; \dot k, k_y) = -D_{(k;k_y)}\int_{x=0}^{\pi} \langle k_xk^{-1},\ (\pi_+ - \pi_-)(k_xk^{-1})\rangle\,dx, \]
on the assumption that $\pi_+ - \pi_-$ is Hermitian. This will be true if the subspaces $E_-$ and $E_+$ are perpendicular with respect to the inner product. Then we see that $\omega(k; k_y, \dot k) = D_{(k;k_y)}H(k)$, where
\[ 4H = \langle(\pi_{u+} - \pi_{u-})(u^{-1}u_x + s_xs^{-1}),\ u^{-1}u_x + s_xs^{-1}\rangle \]
gives the Hamiltonian generating the time evolution.

The form of the boundary conditions we have imposed here should not come as too much of a surprise. Normally the string has boundary conditions (for $k = us$ with $u \in G$ and $s \in M$) $u_x = 0$ at $x = 0$ or $x = \pi$. This Neumann condition is designed to prevent momentum transfer out of the string at the edges. But if the system is to be completely dual, we also need to impose a corresponding Neumann condition on the dual theory, which leads to the boundary condition $k_x = 0$, the "double Neumann" condition. But then the equation of motion states $\dot k = 0$ on the boundary. Alternatively, if the reader prefers to work over $x \in \mathbb{R}$, we just deal with rapidly decreasing solutions. In either of these cases, the symplectic form really is non-degenerate.
Now we have a phase space and Hamiltonian for the equations of motion just based on an invariant inner product on $D$ and an orthogonal decomposition $E_-$ and $E_+$ of $d$. If we take $D$ to be a double cross product $D = G \bowtie M$, and assume that the subspaces $\mathrm{Ad}_{u^{-1}}E_\pm$ have graph coordinates $T_u$ and $E_u$ as before, we again recover the previous equations of motion for $u \in G$ in the factorisation $k = us$,
\[ (T_u(u^{-1}u_+))_- - (E_u(u^{-1}u_-))_+ = [E_u(u^{-1}u_-),\ T_u(u^{-1}u_+)]. \]
Importantly, we do not need to assume that the inner product has any special properties with respect to the decomposition $d = g + m$ (such as being zero on $g$). We can also give the form of the Hamiltonian for this general case:
\[ 4H = \langle(E_u + I)(u^{-1}u_-),\ (E_u + I)(u^{-1}u_-)\rangle - \langle(T_u + I)(u^{-1}u_+),\ (T_u + I)(u^{-1}u_+)\rangle. \]
The corresponding dual formula would produce exactly the same value.

7.1. Poisson brackets and the central extension. In this section we continue with the generalised T-duality and boundary conditions of the last section. The phase space for our system is infinite dimensional, so it is rather hard to describe the functions on it directly. We shall describe a "nice" set of functions, and hope that more general functions are expressible as a product of these nice functions. If $v \in C^\infty((0, \pi), d)$, we can look at the vector field $k_z = vk$ for $k \in C^\infty((0, \pi), D)$. To preserve the boundary conditions we consider only those $v \in C^\infty((0, \pi), d)$ which tend to zero at the end points. Consider
\[ \omega(k; k_y, k_z) = -\frac12\int \langle(k^{-1}k_y)_x,\ k^{-1}k_z\rangle\,dx = -\frac12\int \langle(k_xk^{-1})_y,\ v\rangle\,dx = -\frac12 D_{(k;y)}\int \langle k_xk^{-1},\ v\rangle\,dx. \]
It follows that the function which acts as a Hamiltonian generating this flow is
\[ f_v(k) = -\frac12\int \langle k_xk^{-1},\ v\rangle\,dx. \]
We can calculate the Poisson brackets between these nice functions quite easily:
\[ \{f_v, f_w\} = f_v(k, wk) = f_{[v,w]} - \frac12\int \langle w_x,\ v\rangle\,dx. \]
We now see the appearance of a central extension term in the Lie algebra. The Poisson brackets can be written as $\{f_v, f_w\} = f_{[v,w]} + \vartheta(v, w)f_c$, where $f_c(k) = 1$ and the cocycle $\vartheta(v, w) = -\frac12\int \langle w_x, v\rangle\,dx$. We can also manufacture a derivation term, which corresponds to the momentum (the operation of incrementing the $x$ coordinate). Consider
\[ \omega(k; k_y, k_x) = -\frac12\int \langle(k^{-1}k_y)_x,\ k^{-1}k_x\rangle\,dx = -\frac12\int \langle(k_xk^{-1})_y,\ k_xk^{-1}\rangle\,dx = -\frac14 D_{(k;y)}\int \langle k_xk^{-1},\ k_xk^{-1}\rangle\,dx. \]
Thus the momentum is given by
\[ f_d(k) = -\frac14\int \langle k_xk^{-1},\ k_xk^{-1}\rangle\,dx. \]
A brief calculation shows that $\{f_d, f_v\} = f_v$ and $\{f_d, f_c\} = 0$.

7.2. Adjoint symmetries of the model and dual model. In this section we consider the left multiplication symmetry again, however this time we can simultaneously describe the action on the dual models. This requires some care with the boundary conditions, and we shall take the double Neumann condition on loops, i.e. $k = e$ and $k_x = 0$ at both boundaries. The operation of left multiplication by constants does not preserve these conditions, but we can use our freedom to introduce a right multiplication to work with the adjoint action instead. Take the action on the phase space given by $\mathrm{Ad}_d$ for $d \in D$. This preserves the boundary conditions, and preserves the models in the case where $\mathrm{Ad}_dE_\pm = E_\pm$. The corresponding infinitesimal motions are generated by the moment map
\[ I_\delta(k) = -\frac12\int \langle k_xk^{-1},\ \delta\rangle\,dx, \qquad \delta \in d. \]
If the map $\mathrm{ad}_\delta$ preserves the subspaces $E_\pm$ then this formula gives conserved charges for the system.

7.3. Automorphism symmetries of the model and dual model. Here we consider symmetries of the phase space arising from group automorphisms $\theta : D = G \bowtie M \to D$. This is really a generalisation of the previous subsection, where we just considered automorphisms given by the adjoint action, i.e. inner automorphisms. We consider the same boundary conditions as in the last subsection. For convenience we also assume that the two subspaces $\theta E_\pm$ of the Lie algebra $d$ are perpendicular for the given inner product. This is not really needed, as we can always manufacture a new Ad-invariant inner product from the old one using the automorphism in order to make this true. Given these conditions, any automorphism $\theta : D \to D$ will induce a map $\tilde\theta$ on the phase space given by $(\tilde\theta k)(x) = \theta(k(x))$. This map will be symplectic if $\theta$ preserves the given inner product on $d$, and if $\theta E_\pm = E_\pm$, then the map will preserve the given models. In general $\tilde\theta k$ will factor to give $G$-models and dual $M$-models which are a mixture of the original $G$-models and dual $M$-models given by factoring $k$. However there are two special cases worthy of mention.

1) The automorphism $\theta : D \to D$ is called subgroup preserving if $\theta G \subset G$ and $\theta M \subset M$. In this case a factorisation $k = us$ for $u \in G$ and $s \in M$ is sent to $\theta(k) = \theta(u)\theta(s)$, and $\theta(u)$ is a solution of the sigma model on $G$. In the same manner, if $t$ is a solution of the sigma model on $M$, then $\theta(t)$ is also a solution of the sigma model on $M$.

2) The automorphism $\theta : D \to D$ is called subgroup reversing if $\theta G \subset M$ and $\theta M \subset G$. If such an automorphism exists, the double $D = G \bowtie M$ is called self-dual [23]. In this case a factorisation $k = us$ for $u \in G$ and $s \in M$ is sent to $\theta(k) = \theta(u)\theta(s)$, and $\theta(u)$ is a solution of the sigma model on $M$. In the same manner, if $t$ is a solution of the dual sigma model on $M$, then $\theta(t)$ is also a solution
of the sigma model on $G$. In this manner the solutions of the sigma model on $G$ and the dual sigma model on $M$ are related by a group homomorphism from $G$ to $M$, and in that sense the models are self-dual.

Other symmetries may be constructed. For example if we have $\theta E_+ = E_-$ and $\theta E_- = E_+$ then the map $\hat\theta(k)(t, x) = \theta(k(t, \pi - x))$ sends a solution $k$ of the model into another solution.

The explicit computation of examples of our generalised T-duality along the above lines is a topic for further work. However, the data required for the construction do exist in abundance. For example, given any two Lie algebras $\mathfrak{g}_0 \subset \mathfrak{d}$ whose Dynkin diagrams differ by the deletion of some nodes, one has an inductive construction $\mathfrak{d} = (\mathfrak{n} \rtimes \mathfrak{g}_0) \bowtie \mathfrak{n}^*$, where $\mathfrak{n}$ are braided-Lie bialgebras [22]. For a concrete example, one has, locally, $D = SO(1, n+1) = (\mathbb{R}^n \rtimes SO(n)) \bowtie \mathbb{R}^n$ as the decomposition of conformal transformations into Poincaré and special conformal translations. The group $D$ has a non-degenerate bilinear form as required (although not positive-definite). The explicit construction of the required factorisation and the associated bicrossproduct quantum groups and T-dual models will be attempted elsewhere.

References
1. Klimcik, C. and Severa, P.: Dual non-Abelian duality and the Drinfeld double. Phys. Lett. B 351, 455–462 (1995)
2. Klimcik, C.: Poisson–Lie T-duality. Nucl. Phys. B (Proc. Suppl.) 46, 116–121 (1996)
3. Klimcik, C. and Severa, P.: Poisson–Lie T-duality and loop groups of Drinfeld doubles. Phys. Lett. B 372, 65–71 (1996)
4. Majid, S.: Hopf algebras for physics at the Planck scale. J. Classical and Quantum Gravity 5, 1587–1606 (1988)
5. Majid, S.: Non-commutative-geometric Groups by a Bicrossproduct Construction. PhD thesis, Harvard mathematical physics, 1988
6. Majid, S.: Physics for algebraists: Non-commutative and non-cocommutative Hopf algebras by a bicrossproduct construction. J. Algebra 130, 17–64 (1990)
7. Majid, S.: Matched pairs of Lie groups associated to solutions of the Yang–Baxter equations. Pac. J. Math. 141, 311–332 (1990)
8. Majid, S. and Oeckl, R.: Twisting of quantum differentials and the Planck-scale Hopf algebra. Commun. Math. Phys. 205, 617–655 (1999)
9. Tseytlin, A.A.: Duality symmetrical closed string theory and interacting chiral scalars. Nucl. Phys. B 350, 395–440 (1991)
10. Semenov-Tian-Shansky, M.A.: Dressing transformations and Poisson group actions. Publ. RIMS (Kyoto) 21, 1237–1260 (1985)
11. Drinfeld, V.G.: Hamiltonian structures on Lie groups, Lie bialgebras and the geometric meaning of the classical Yang–Baxter equations. Sov. Math. Dokl. 27, 68 (1983)
12. Majid, S.: Foundations of Quantum Group Theory. Cambridge: Cambridge University Press, 1995
13. Novikov, S., Manakov, S.V., Pitaevskii, L.P. and Zakharov, V.E.: Theory of solitons. New York: Consultants Bureau, 1984
14. Lledo, M.A. and Varadarajan, V.S.: su(2)-Poisson–Lie T-duality. UCLA Preprint, 1998
15. Sfetsos, K.: Canonical equivalence of non-isometric sigma-models and Poisson–Lie T-duality. Nucl. Phys. B 517, 549–566 (1998)
16. Parkhomenko, S.E.: Mirror symmetry as a Poisson–Lie T-duality. Landau Inst. Preprint, 1997
17. Majid, S.: Quantum and braided group Riemannian geometry. J. Geom. Phys. 30, 113–146 (1999)
18. Sfetsos, K.: Poisson–Lie T-duality beyond the classical level and the renormalisation group. Phys. Lett. B 432, 365–375 (1998)
19. Alekseev, A.Yu., Klimcik, C. and Tseytlin, A.A.: Quantum Poisson–Lie T-duality and the WZNW model. Nucl. Phys. B 458, 430–444 (1996)
20. Drinfeld, V.G.: Quantum groups. In: A. Gleason, editor, Proceedings of the ICM, Providence, Rhode Island: AMS, 1987, pp. 798–820
21. Semenov-Tian-Shansky, M.A.: What is a classical R-matrix. Func. Anal. Appl. 17, 17 (1983)
22. Majid, S.: Braided-Lie bialgebras. Pac. J. Math. 192, 329–356 (2000)
23. Beggs, E.J. and Majid, S.: Quasitriangular and differential structures on bicrossproduct Hopf algebras. J. Algebra 219, 682–727 (1999)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 220, 489 – 535 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Realizing Holonomic Constraints in Classical and Quantum Mechanics
Richard Froese¹, Ira Herbst²
¹ Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada
² Department of Mathematics, University of Virginia, Charlottesville, Virginia, USA
Received: 16 November 2000 / Accepted: 2 February 2001
Abstract: We consider the problem of constraining a particle to a smooth compact submanifold $\Sigma$ of configuration space using a sequence of increasing potentials. We compare the classical and quantum versions of this procedure. This leads to new results in both cases: an unbounded energy theorem in the classical case, and a quantum averaging theorem. Our two step approach, consisting of an expansion in a dilation parameter, followed by averaging in normal directions, emphasizes the role of the normal bundle of $\Sigma$, and shows when the limiting phase space will be larger (or different) than expected.

1. Introduction and Table of Contents

Consider a system of non-relativistic particles in a Euclidean configuration space $\mathbb{R}^{n+m}$ whose motion is governed by the Hamiltonian
$$H = \tfrac{1}{2}\langle p, p\rangle + V(x). \tag{1.1}$$
We are interested in the motion of these particles when their positions are constrained to lie on some $n$-dimensional smooth compact submanifold $\Sigma \subset \mathbb{R}^{n+m}$. In both classical and quantum mechanics there are accepted notions about what the constrained motion should be.

In classical mechanics, the Hamiltonian for the constrained motion is assumed to have the form (1.1), but whereas $p$ and $x$ originally denoted variables on the phase space $T^*\mathbb{R}^{n+m} = \mathbb{R}^{n+m} \times \mathbb{R}^{n+m}$, they now are variables on the cotangent bundle $T^*\Sigma$. The inner product $\langle p, p\rangle$ is now computed using the metric that $\Sigma$ inherits from $\mathbb{R}^{n+m}$, and $V$ now denotes the restriction of $V$ to $\Sigma$.

In quantum mechanics, $\langle p, p\rangle$ is interpreted to mean $-\Delta$, where $\Delta$ is the Laplace operator, and $V(x)$ is the operator of multiplication by $V$. For unconstrained motion $\Delta$ is the Euclidean Laplacian on $\mathbb{R}^{n+m}$, and the Hamiltonian acts in $L^2(\mathbb{R}^{n+m})$. For constrained motion, the Laplace operator for $\Sigma$ with the inherited metric is used, and the Hilbert space is $L^2(\Sigma, d\mathrm{vol})$.
In both cases the description of the constrained motion is intrinsic: it depends only on the Riemannian structure that $\Sigma$ inherits from $\mathbb{R}^{n+m}$, but not on other details of the imbedding.

Of course, a constrained system of particles is an idealization. Instead of particles moving exactly on $\Sigma$, one might imagine there is a strong force pushing the particles onto the submanifold. The motion of the particles would then be governed by the Hamiltonian
$$H_\lambda = \tfrac{1}{2}\langle p, p\rangle + V(x) + \lambda^4 W(x), \tag{1.2}$$
where $W$ is a positive potential vanishing exactly on $\Sigma$ and $\lambda$ is large. (The fourth power is just for notational convenience later on.) Does the motion described by $H_\lambda$ converge to the intrinsic constrained motion as $\lambda$ tends to infinity? Surprisingly, the answer to this question depends on exactly how it is asked, and is often no.

A situation in classical mechanics where the answer is yes is described by Rubin and Ungar [RU]. An initial position on $\Sigma$ and an initial velocity tangent to $\Sigma$ are fixed. Then, for a sequence of $\lambda$'s tending to infinity, the subsequent motions under $H_\lambda$ are computed. As $\lambda$ becomes large, these motions converge to the intrinsic constrained motion on $\Sigma$. This result is widely known, since it appears in Arnold's book [A1] on classical mechanics.

However, from the physical point of view, it is neither completely natural to require that the initial position lies exactly on $\Sigma$, nor that the initial velocity be exactly tangent. Rubin and Ungar also consider what happens if the initial velocity has a component in the direction normal to $\Sigma$. In this case, the motion in the normal direction is highly oscillatory, and there is an extra potential term, depending on the initial condition, in the Hamiltonian for the limiting motion on $\Sigma$. In their proof, $\Sigma$ is assumed to have co-dimension one.

A more complete result is given by Takens [T]. Here the initial conditions are allowed to depend on $\lambda$ in such a way that the initial position converges to a point on $\Sigma$ and the initial energy remains bounded. (We will give precise assumptions below.) Once again, the limiting motion on $\Sigma$ is governed by a Hamiltonian with an additional potential. Takens noticed that a non-resonance condition on the eigenvalues of the Hessian of the constraining potential $W$ along $\Sigma$ is required to prove convergence. He also gave an example showing that if the Hessian of $W$ has an eigenvalue crossing, so that the non-resonance condition is violated, then there may not be a good notion of limiting motion on $\Sigma$. In his example, he constructs two sequences of orbits, each one converging to an orbit on $\Sigma$. These limiting orbits are identical until they hit the point on $\Sigma$ where the eigenvalues cross. After that, they are different. This means there is no differential equation on $\Sigma$ governing the limiting motion. For other discussions of the question of realizing constraints see [A2] and [G]. A modern survey of the classical mechanical results that emphasizes the systematic use of weak convergence is given by Bornemann and Schütte [BS].

The quantum case was considered previously by Tolar [T], da Costa [dC1, dC2] and in the path integral literature (see Anderson and Driver [AD]). Related work can also be found in Helffer and Sjöstrand [HS1, HS2], who obtained WKB expansions for the ground state, and in Duclos and Exner [DE], Figotin and Kuchment [FK], Schatzman [S] and Kuchment and Zeng [KZ]. The most general formal expansions appear in the preprint of Mitchell [M]. (We thank the referee for this reference.)

There are really two aspects to the problem of realizing constraints: a large $\lambda$ expansion followed by an averaging procedure to deal with highly oscillatory normal motion. Previous work in quantum mechanics concentrated on the first aspect (although a related averaging procedure for classical paths with a vanishingly small random perturbation can be found
in [F] and [FW]). Already a formal large $\lambda$ expansion reveals the interesting feature that the limiting Hamiltonian has an extra potential term depending on the scalar and mean curvatures. Since the mean curvature is not intrinsic, this potential does depend on the imbedding of $\Sigma$ in $\mathbb{R}^{n+m}$.

It is not completely straightforward to formulate a theorem in the quantum case. We have chosen a formulation, modeled on the classical mechanical theorems, tracking a sequence of orbits with initial positions concentrating on $\Sigma$ via dilations in the normal direction. Actually we consider the equivalent problem of tracking the evolution of a fixed vector governed by the Hamiltonian $H_\lambda$ conjugated by unitary dilations. In order to obtain simple limiting asymptotics for the orbit we must assume that all the eigenvalues of the Hessian of the constraining potential $W$ are constant on $\Sigma$. In fact we will assume that $W$ is exactly quadratic.

Our theorems show that for large $\lambda$ the motion is approximated by the motion generated by an averaged limiting Hamiltonian $\overline{H}_B$, with superimposed normal oscillations generated by $\lambda^2 H_O$, where $H_O$ is the normal harmonic oscillator Hamiltonian. The Hamiltonians $\overline{H}_B$ and $H_O$ commute, so the motions are independent. These theorems do not require any non-resonance conditions on the eigenvalues of the Hessian of $W$. However, the limiting Hamiltonian $\overline{H}_B$ does not act in $L^2(\Sigma)$, but in $L^2(N\Sigma)$, where $N\Sigma$ is the normal bundle of $\Sigma$. It is only in certain situations where one can effectively ignore the motion in the normal directions and obtain a unitary group on $L^2(\Sigma)$ implementing the dynamics of the tangential motion. This occurs, for example, if (a) the eigenvalues of the Hessian of $W$ are all distinct and non-resonant, (b) the normal bundle is trivial, and (c) we confine our attention to a simultaneous eigenspace of all the number operators for the normal motion. In the general situation, the dynamics of the additional degrees of freedom in $N\Sigma$ cannot be factored out, and we must be content with analysis on $L^2(N\Sigma)$.

Our formulation of the quantum theorems invites comparison with the classical mechanical results of Rubin and Ungar [RU] and Takens [T]. It turns out that the extra potentials that appear in the two cases are quite different, and there is no obvious connection. Upon reflection, the reason for this difference is clear. If we have a sequence of initial quantum states whose position distribution is being squeezed to lie close to $\Sigma$, then by the uncertainty principle, the distribution of initial momenta will be spreading out, and thus the initial energy will be unbounded. However, the classical mechanical convergence theorems above all deal with bounded energies.

The danger in considering unbounded energies is that even if the initial energy in the tangential mode is bounded, the coupling between tangential and normal modes may result in unbounded tangential energy in finite time. Our assumptions, which allow us to obtain a classical theorem despite the unbounded energy, are motivated by quantum mechanics. Our results for classical mechanics with unbounded initial energies are quite similar to our results in quantum mechanics.

Table of Contents
1. Introduction and Table of Contents
2. Classical Mechanics: Bounded Energy
3. Classical Mechanics: Unbounded Energy
4. Quantum Mechanics
5. Co-ordinate Expressions
6. Proofs of Theorems in Classical Mechanics
7. More Co-ordinate Expressions
8. Proofs of Theorems in Quantum Mechanics
Section 2 contains a statement of the theorem of Rubin, Ungar and Takens on limiting orbits when the initial energies remain bounded. In Sect. 3 we state our expansion and averaging theorems in classical mechanics when the initial energies scale as they do in quantum mechanics. We also describe when the limiting motion can be thought of as a motion on $\Sigma$. These classical results are motivated by the parallel results in quantum mechanics, which we present in Sect. 4. The proofs of the theorems in Sects. 3 and 4 are found in Sects. 6 and 8 respectively, while Sects. 5 and 7 contain background material needed in the proofs. This paper is an expanded and improved version of the announcement [FH].

2. Classical Mechanics: Bounded Energy

To give a precise statement of our results we must introduce some notation. Let $\Sigma$ be a smooth compact $n$-dimensional submanifold of $\mathbb{R}^{n+m}$. The normal bundle to $\Sigma$ is the submanifold of $\mathbb{R}^{n+m} \times \mathbb{R}^{n+m}$ given by
$$N\Sigma = \{(\sigma, n) : \sigma \in \Sigma,\ n \in N_\sigma\Sigma\}.$$
Here $N_\sigma\Sigma$ denotes the normal space to $\Sigma$ at $\sigma$, identified with a subspace of $\mathbb{R}^{n+m}$. There is a natural map from $N\Sigma$ into $\mathbb{R}^{n+m}$ given by $\iota : (\sigma, n) \mapsto \sigma + n$. We now fix a sufficiently small $\delta$ so that this map is a diffeomorphism of $N\Sigma_\delta = \{(\sigma, n) : \|n\| < \delta\}$ onto a tubular neighbourhood of $\Sigma$ in $\mathbb{R}^{n+m}$. Then we can pull back the Euclidean metric from $\mathbb{R}^{n+m}$ to $N\Sigma_\delta$. Since we are interested in the motion close to $\Sigma$ we may use $N\Sigma_\delta$ as the classical configuration space. This will be convenient in what follows, and is justified below.

We will want to decompose vectors in the cotangent spaces of $N\Sigma_\delta$ into horizontal and vertical vectors, so we now explain this decomposition. Let $\pi : N\Sigma \to \Sigma$ denote the projection of the normal bundle onto the base given by $\pi : (\sigma, n) \mapsto \sigma$. The vertical subspace of $T_{(\sigma,n)}N\Sigma$ is defined to be the kernel of $d\pi : T_{(\sigma,n)}N\Sigma \to T_\sigma\Sigma$. The horizontal subspace is then defined to be the orthogonal complement (in the pulled back metric) of the vertical subspace. Using the identification of $T_{(\sigma,n)}N\Sigma$ with $T^*_{(\sigma,n)}N\Sigma$ given by the metric we obtain a decomposition of cotangent vectors into horizontal and vertical components as well. We will denote by $(\xi, \eta)$ the horizontal and vertical components of a vector in $T^*_{(\sigma,n)}N\Sigma$.

The decomposition can be explained more concretely as follows. For each point $\sigma \in \Sigma$, we may decompose $T_\sigma\mathbb{R}^{n+m} = T_\sigma\Sigma \oplus N_\sigma\Sigma$ into the tangent and normal space. Using the natural identification of all tangent spaces with $\mathbb{R}^{n+m}$, we may regard this as a decomposition of $\mathbb{R}^{n+m}$. Let $P^T_\sigma$ and $P^N_\sigma$ be the corresponding orthogonal projections. Since we are thinking of $N\Sigma$ as an $(n+m)$-dimensional submanifold of $\mathbb{R}^{n+m} \times \mathbb{R}^{n+m}$, we can identify $T_{(\sigma,n)}N\Sigma$ with the $(n+m)$-dimensional subspace of $\mathbb{R}^{n+m} \times \mathbb{R}^{n+m}$ given by all vectors of the form $(X, Y) = (\dot\sigma(0), \dot n(0))$, where $(\sigma(t), n(t))$ is a curve in $N\Sigma$ passing through $(\sigma, n)$ at time $t = 0$. The inner product of two such tangent vectors is
$$\langle (X_1, Y_1), (X_2, Y_2)\rangle = \langle X_1 + Y_1, X_2 + Y_2\rangle, \tag{2.1}$$
where the inner product on the right is the usual Euclidean inner product. For a tangent vector $(X, Y)$, the decomposition into horizontal and vertical vectors is given by
$$(X, Y) = (X, P^T_\sigma Y) + (0, P^N_\sigma Y).$$
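The horizontal/vertical splitting can be checked numerically in the simplest case. The following is a minimal sketch, assuming the submanifold is the unit circle in the plane (so n = m = 1); the helper names `frame`, `tangent_vector`, `split` and `inner` are illustrative and not taken from the paper. It builds a genuine tangent vector to the normal bundle from a curve, splits it as above, and evaluates the inner product (2.1) of the two pieces, which should vanish up to rounding.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def frame(theta):
    """Unit tangent and unit normal of the unit circle at angle theta."""
    tangent = (-math.sin(theta), math.cos(theta))
    normal = (math.cos(theta), math.sin(theta))
    return tangent, normal

def tangent_vector(theta, theta_dot, r, r_dot):
    """Velocity (X, Y) of a curve (sigma(t), n(t)) with n(t) = r(t) * normal."""
    tangent, normal = frame(theta)
    X = scale(theta_dot, tangent)
    # d/dt [r * normal] = r_dot * normal + r * theta_dot * tangent
    Y = add(scale(r_dot, normal), scale(r * theta_dot, tangent))
    return X, Y

def split(theta, X, Y):
    """Return the horizontal part (X, P^T Y) and the vertical part (0, P^N Y)."""
    tangent, normal = frame(theta)
    horizontal = (X, scale(dot(Y, tangent), tangent))
    vertical = ((0.0, 0.0), scale(dot(Y, normal), normal))
    return horizontal, vertical

def inner(v1, v2):
    """Inner product (2.1): <(X1, Y1), (X2, Y2)> = <X1 + Y1, X2 + Y2>."""
    (X1, Y1), (X2, Y2) = v1, v2
    return dot(add(X1, Y1), add(X2, Y2))

X, Y = tangent_vector(theta=0.7, theta_dot=2.0, r=0.1, r_dot=-0.5)
horizontal, vertical = split(0.7, X, Y)
print(inner(horizontal, vertical))  # zero up to floating-point rounding
```

The orthogonality holds because $X + P^T_\sigma Y$ is tangent to the circle while $P^N_\sigma Y$ is normal to it.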
In the statements of our theorems we will want to express the fact that two cotangent vectors, for example $\xi_\lambda(t)$ and $\xi(t)$ in Theorem 2.1, are close, even though they belong to two different cotangent spaces. To do this we may use the imbedding to think of the vectors as elements of $\mathbb{R}^{2(n+m)}$. Then it makes sense to use the (Euclidean) norm of their difference, $\|\xi_\lambda(t) - \xi(t)\|$, to measure how close they are. We will use the symbol $\|\cdot\|$ in this situation, while $|\xi|$ will denote the norm of $\xi$ as a cotangent vector.

We will assume that the constraining potential is a $C^\infty$ function of the form
$$W(\sigma, n) = \tfrac{1}{2}\langle n, A(\sigma) n\rangle, \tag{2.2}$$
where for each $\sigma$, $A(\sigma)$ is a positive definite linear transformation on $N_\sigma\Sigma$. The Hamiltonian (1.2) can then be written
$$H_\lambda(\sigma, n, \xi, \eta) = \tfrac{1}{2}\langle \xi, \xi\rangle + \tfrac{1}{2}\langle \eta, \eta\rangle + V(\sigma + n) + \lambda^4 W(\sigma, n). \tag{2.3}$$
Notice that on the boundary of $N\Sigma_{\delta_1}$, for $0 < \delta_1 < \delta$, $H_\lambda(\sigma, n, \xi, \eta) \ge c_1\lambda^4 - c_2$ with
$$c_1 = \inf_{(\sigma,n):\ \sigma \in \Sigma,\ \|n\| = \delta_1} W(\sigma + n) > 0, \qquad c_2 = \sup_{(\sigma,n):\ \sigma \in \Sigma,\ \|n\| = \delta_1} |V(\sigma + n)|.$$
By conservation of energy, this implies that an orbit under $H_\lambda$ that starts out in $N\Sigma_{\delta_1}$ with initial energy less than $c_1\lambda^4 - c_2$ can never cross the boundary, and therefore stays in $N\Sigma_{\delta_1}$. We will only consider such orbits in this paper, and therefore are justified in taking our phase space to be $T^*N\Sigma_\delta$, or even $T^*N\Sigma$ if we extend $H_\lambda$ in some arbitrary way.

Since we expect the motion in the normal directions to consist of rapid harmonic oscillations, it is natural to introduce action variables for this motion. There is one for each distinct eigenvalue $\omega_\alpha^2(\sigma)$ of $A(\sigma)$. Let $P_\alpha(\sigma)$ be the projection onto the eigenspace of $\omega_\alpha^2(\sigma)$. This projection is defined on $N_\sigma\Sigma$, which we may think of as the range of $P^N_\sigma$ in $\mathbb{R}^{n+m}$. Thus the projection is defined on vertical vectors in $T_{(\sigma,n)}N\Sigma$ and, via the natural identification, on vertical vectors in $T^*_{(\sigma,n)}N\Sigma$. With this notation, the corresponding action variable, multiplied by $\lambda^2$ for notational convenience, is given by
$$I^\lambda_\alpha(\sigma, n, \xi, \eta) = \frac{1}{2\omega_\alpha(\sigma)}\langle \eta, P_\alpha \eta\rangle + \frac{\lambda^4 \omega_\alpha(\sigma)}{2}\langle n, P_\alpha n\rangle. \tag{2.4}$$
Notice that the total normal energy is given by $\sum_\alpha \omega_\alpha I^\lambda_\alpha$. The following is a version of the theorem of Takens and Rubin, Ungar.
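The remark that the total normal energy equals $\sum_\alpha \omega_\alpha I^\lambda_\alpha$ can be verified directly. The sketch below assumes, for illustration only, that we work at a fixed $\sigma$ in an eigenbasis of $A(\sigma)$ with one-dimensional eigenspaces, so each projection $P_\alpha$ just picks out one component; the function names and the sample numbers are hypothetical.

```python
import math

def action_variables(eta, n, omega, lam):
    """I_alpha^lambda from (2.4), component by component in an eigenbasis of A(sigma)."""
    return [e * e / (2.0 * w) + lam**4 * w * x * x / 2.0
            for e, x, w in zip(eta, n, omega)]

def normal_energy(eta, n, omega, lam):
    """(1/2)<eta, eta> + (lambda^4 / 2)<n, A n>, with A = diag(omega_alpha^2)."""
    kinetic = 0.5 * sum(e * e for e in eta)
    potential = 0.5 * lam**4 * sum((w * x) ** 2 for x, w in zip(n, omega))
    return kinetic + potential

# Illustrative values (assumptions, not data from the paper).
eta = [0.3, -1.2]      # vertical momentum components
n = [0.01, 0.02]       # normal displacement components
omega = [1.0, 2.5]     # frequencies, square roots of the eigenvalues of A
lam = 10.0

actions = action_variables(eta, n, omega, lam)
total = sum(w * I for w, I in zip(omega, actions))
print(math.isclose(total, normal_energy(eta, n, omega, lam), rel_tol=1e-12))
```

Multiplying (2.4) by $\omega_\alpha$ gives $\tfrac12\langle\eta, P_\alpha\eta\rangle + \tfrac12\lambda^4\omega_\alpha^2\langle n, P_\alpha n\rangle$, and summing over $\alpha$ recovers the normal part of (2.3).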
Theorem 2.1. Let $\Sigma$ be a smooth compact $n$-dimensional submanifold of $\mathbb{R}^{n+m}$. Let the Hamiltonian $H_\lambda$ on $T^*N\Sigma$ be given by (2.3), where $V, W \in C^\infty$, $W$ has the form (2.2) and satisfies
(i) The eigenvalues $\omega_\alpha^2(\sigma)$ of $A(\sigma)$ have constant multiplicity.
Suppose that $(\sigma_\lambda, n_\lambda, \xi_\lambda, \eta_\lambda)$ are initial conditions in $T^*N\Sigma_\delta$ satisfying
(a) $\|\sigma_\lambda - \sigma_0\| + \|\xi_\lambda - \xi_0\| \to 0$,
(b) $I^\lambda_\alpha(\sigma_\lambda, n_\lambda, \xi_\lambda, \eta_\lambda) \to I^0_\alpha > 0$,
as $\lambda \to \infty$. Let $(\sigma_\lambda(t), n_\lambda(t), \xi_\lambda(t), \eta_\lambda(t))$ denote the subsequent orbit in $T^*N\Sigma_\delta$ under the Hamiltonian $H_\lambda$. Suppose that $(\sigma(t), \xi(t))$ is the orbit in $T^*\Sigma$ with initial conditions $(\sigma_0, \xi_0)$ governed by the Hamiltonian
$$h(\sigma, \xi) = \tfrac{1}{2}\langle \xi, \xi\rangle_\sigma + V(\sigma) + \sum_\alpha I^0_\alpha\, \omega_\alpha(\sigma).$$
Then for any $T \ge 0$,
$$\sup_{0 \le t \le T} \|\sigma_\lambda(t) - \sigma(t)\| + \|\xi_\lambda(t) - \xi(t)\| \to 0$$
as $\lambda \to \infty$.

Implicit in this statement is the fact that the approximating orbit stays in the tubular neighbourhood for $0 \le t \le T$, provided $\lambda$ is sufficiently large. This theorem is actually true in greater generality. We can consider smooth constraining potentials $W$, where $\tfrac{1}{2}\langle n, A(\sigma)n\rangle$ is the first term in an expansion. If we choose our tubular neighbourhood so that $W(\sigma + n) \ge c|n|^2$ and impose the non-resonance condition $\omega_\alpha(\sigma) \ne \omega_\beta(\sigma) + \omega_\gamma(\sigma)$ for every choice of $\alpha$, $\beta$ and $\gamma$ and for every $\sigma$, then the same conclusion holds. This theorem is also really a local theorem: if we impose the conditions on $W$ and the non-resonance condition locally, and take $T$ to be a number less than the time where $\sigma(t)$ leaves the set where condition (i) is true, then the same conclusion holds as well.

Actually, Takens [T] only treats the case where all the eigenvalues $\omega_\alpha$ are distinct and the normal bundle is trivial. On the other hand, he does not require that $I^0_\alpha > 0$. This positivity is a technical requirement of our proof and arises because action angle co-ordinates are singular on the surface $I^0_\alpha = 0$. Since Theorem 2.1 is a minor variation of known results, we will not give a proof here.

3. Classical Mechanics: Unbounded Energy

We now describe our theorems in classical mechanics where the initial energies are diverging as they do in the quantum case. In quantum mechanics, the ground state energy of a harmonic oscillator $-\tfrac{1}{2}(d/dx)^2 + \tfrac{1}{2}\lambda^4\omega^2 x^2$ is $\lambda^2\omega/2$. Thus we will assume that the initial values of the action variables $I^\lambda_\alpha$ scale like $\lambda^2 I^0_\alpha$, and therefore that the initial normal energy diverges like $\lambda^2$. Examining the effective Hamiltonian $h(\sigma, \xi)$ in Theorem 2.1, one would expect there to be a diverging potential term $\lambda^2 \sum_\alpha I^0_\alpha\, \omega_\alpha(\sigma)$, similar to the constraining potential but with strength $\lambda^2$. If this potential is not constant, and thus has a local minimum (called a mini-well in [HS1, HS2]), no limiting orbit could be expected in general unless the initial positions were chosen to converge to such a minimum. For simplicity, we will assume that there are no mini-wells, i.e., the frequencies $\omega_\alpha$ are constant.

The first step in our analysis is a large $\lambda$ expansion. It is convenient to implement this expansion using dilations in the fibre of the normal bundle. It is also convenient to assume that our configuration space is all of $N\Sigma$. This makes no difference, since the orbits we are considering never leave $N\Sigma_\delta$.
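The $\lambda^2\omega/2$ ground state energy quoted at the start of Sect. 3 follows from a one-line scaling argument, which also previews why dilations by $\lambda$ in the normal variable are natural. The following is a sketch in units with $\hbar = 1$; the rescaled variable $y$ and the trial function $\psi_0$ are notation introduced here for illustration.

```latex
% Substitute y = \lambda x, so that d/dx = \lambda\, d/dy and x^2 = y^2/\lambda^2:
\[
  -\tfrac12 \frac{d^2}{dx^2} + \tfrac12 \lambda^4 \omega^2 x^2
  \;=\; \lambda^2 \Bigl( -\tfrac12 \frac{d^2}{dy^2} + \tfrac12 \omega^2 y^2 \Bigr).
\]
% The operator in parentheses is the standard oscillator with ground state energy \omega/2,
% so the original operator has ground state energy \lambda^2 \omega / 2.
% Direct check with \psi_0(x) = \exp(-\lambda^2 \omega x^2 / 2):
\[
  \psi_0'' = \bigl(\lambda^4 \omega^2 x^2 - \lambda^2 \omega\bigr)\psi_0
  \quad\Longrightarrow\quad
  \Bigl(-\tfrac12 \frac{d^2}{dx^2} + \tfrac12 \lambda^4 \omega^2 x^2\Bigr)\psi_0
  = \tfrac12 \lambda^2 \omega\, \psi_0 .
\]
```

The ground state has width of order $\lambda^{-1}$ in $x$, which matches the scaling of the normal variable under the dilations used in the large $\lambda$ expansion.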
The dilation $d_\lambda : N\Sigma \to N\Sigma$ is defined by $d_\lambda(\sigma, n) = (\sigma, \lambda n)$. As with any diffeomorphism of the configuration space, $d_\lambda$ has a symplectic lift $D_\lambda$ to the cotangent bundle given by $D_\lambda = d_{\lambda^{-1}}^{\,*} = (d_\lambda^{\,*})^{-1}$. The expression for $D_\lambda$ in local co-ordinates is given by (5.1). Instead of the original Hamiltonian $H_\lambda$ we may now consider the equivalent pulled back Hamiltonian
$$L_\lambda = H_\lambda \circ D_\lambda^{-1}.$$
Since $D_\lambda$ is a symplectic transformation, orbits under $H_\lambda$ and orbits under $L_\lambda$ are mapped to each other by $D_\lambda$ and its inverse. Therefore, it suffices to study the dynamics of the scaled Hamiltonian $L_\lambda$. A formal large $\lambda$ expansion yields
$$L_\lambda = H_B + \lambda^2 H_O + O(\lambda^{-1}),$$
where $H_O$ is the harmonic oscillator Hamiltonian
$$H_O(\sigma, n, \xi, \eta) = \tfrac{1}{2}\langle \eta, \eta\rangle + \tfrac{1}{2}\langle n, A(\sigma) n\rangle \tag{3.1}$$
and $H_B$ is the bundle Hamiltonian given by
$$H_B(\sigma, n, \xi, \eta) = \tfrac{1}{2}\langle J\xi, J\xi\rangle_\sigma + V(\sigma). \tag{3.2}$$
The inner product $\langle \cdot, \cdot\rangle_\sigma$ is the inner product on $T^*\Sigma$ defined by the imbedding. Here $J$ denotes the identification of the horizontal subspace of $T^*_{(\sigma,n)}N\Sigma$ with $T^*_\sigma\Sigma$ given in terms of the bundle projection map $\pi$ by $J = (d\pi^*_{\sigma,n})^{-1}$. This map is well defined on the horizontal subspace, since $d\pi_{\sigma,n} : T_{(\sigma,n)}N\Sigma \to T_\sigma\Sigma$ is an isomorphism when restricted to the horizontal subspace of $T_{(\sigma,n)}N\Sigma$. Thus, its adjoint $d\pi^*_{\sigma,n}$ is an isomorphism of $T^*_\sigma\Sigma$ onto the horizontal subspace of $T^*_{(\sigma,n)}N\Sigma$. In local co-ordinates $x_i$, $y_i$ defined in Sect. 5 below, where $x_i$ are co-ordinates for $\Sigma$, the map $J$ simply identifies $dx_i \in T^*_{(\sigma,n)}N\Sigma$ with $dx_i \in T^*_\sigma\Sigma$.

Additional understanding of the Hamiltonians $H_B$ and $H_O$ can be obtained if we introduce another metric on $N\Sigma$. If $(X, Y) \in T_{(\sigma,n)}N\Sigma$, let
$$\langle (X, Y), (X, Y)\rangle_\lambda = \|X\|^2 + \lambda^{-2}\|P^N_\sigma Y\|^2. \tag{3.3}$$
(In Sect. 7 we describe in what sense this is a limiting form of the pulled-back, scaled, Euclidean metric.) If $\langle \cdot, \cdot\rangle_\lambda$ also denotes the corresponding metric on the cotangent space, then
$$H_B + \lambda^2 H_O = \tfrac{1}{2}\langle (\xi, \eta), (\xi, \eta)\rangle_\lambda + \frac{\lambda^2}{2}\langle n, A(\sigma) n\rangle + V(\sigma).$$
The local co-ordinate expressions for $H_B$ and $H_O$ are given in (5.9) and (5.10). We will use the notation $\phi^H_t$ to denote the Hamiltonian flow governed by the Hamiltonian $H$.
Theorem 3.1. Let $\Sigma$ be a smooth compact $n$-dimensional submanifold of $\mathbb{R}^{n+m}$. Let $L_\lambda = H_\lambda \circ D_\lambda^{-1}$, where the Hamiltonian $H_\lambda$ on $T^*N\Sigma$ is given by (2.3). Assume that $V, W \in C^\infty$, $W$ has the form (2.2), and that the eigenvalues $\omega_\alpha^2$ of $A(\sigma)$ do not depend on $\sigma$. Suppose that $\gamma_\lambda$ are initial conditions in $T^*N\Sigma$ with $\gamma_\lambda \to \gamma_0$ as $\lambda \to \infty$. Then for any $T \ge 0$,
$$\sup_{0 \le t \le T} \bigl\| \phi_t^{L_\lambda}(\gamma_\lambda) - \phi_t^{H_B + \lambda^2 H_O}(\gamma_0) \bigr\| \to 0$$
as $\lambda \to \infty$.

In this theorem the normal energy of the initial conditions, $\lambda^2 H_O(\gamma_\lambda)$, grows like $\lambda^2$, since $H_O(\gamma_\lambda)$ is converging to $H_O(\gamma_0)$. This leads to increasingly rapid normal oscillations for both orbits $\phi_t^{L_\lambda}(\gamma_\lambda)$ and $\phi_t^{H_B + \lambda^2 H_O}(\gamma_0)$. Neither orbit converges as $\lambda$ becomes large. It is only their difference that converges.

The convergence of the initial conditions is stated for the scaled variables $\gamma_\lambda$. To find out what this implies for the original variables $(\tilde\sigma_\lambda, \tilde n_\lambda, \tilde\xi_\lambda, \tilde\eta_\lambda) = D_\lambda^{-1}\gamma_\lambda$ we must determine the action of $D_\lambda$ on horizontal and vertical vectors. This results in the following conditions:
(a) $\tilde\sigma_\lambda \to \sigma_0$,
(b) $\lambda \tilde n_\lambda \to n_0$,
(c) $\tilde\xi_\lambda \to J\xi_0$, and
(d) $\lambda^{-1}\tilde\eta_\lambda \to \eta_0$,
where $(\sigma_0, n_0, \xi_0, \eta_0) = \gamma_0$. Here we are thinking of $\sigma$, $n$ as vectors in $\mathbb{R}^{n+m}$ and $\xi$, $\eta$ as vectors in $\mathbb{R}^{2(n+m)}$. We may also compute what these conditions mean for the initial velocities $(X_\lambda, Y_\lambda) \in T_{(\tilde\sigma_\lambda, \tilde n_\lambda)}N\Sigma$, again thought of as vectors in $\mathbb{R}^{2(n+m)}$. It turns out that
(c′) $X_\lambda \to X_0$, and
(d′) $\lambda^{-1} Y_\lambda \to Y_0$.

This theorem gives a satisfactory description of the limiting motion if the Poisson bracket of $H_B$ and $H_O$ vanishes. Then the flows generated by $H_B$ and $H_O$ commute and the motion is given by the rapid oscillations generated by $\lambda^2 H_O$ superimposed on the flow generated by $H_B$. In this situation we can perform averaging by simply ignoring the oscillations.

An example where $\{H_B, H_O\}$ is zero is when $\Sigma$ has codimension one, or, more generally, if the connection form, given by (3.10) below, vanishes. Then $H_B$ only involves variables on $T^*\Sigma$, so the motion for large $\lambda$ is a motion on $\Sigma$ with independent oscillations in the normal variables.

The Poisson bracket $\{H_B, H_O\}$ also vanishes if all the frequencies $\omega_\alpha$ are equal, but in this case the motion generated by $H_B$ need not only involve the variables on $T^*\Sigma$. The motion generated by $H_B$ can be thought of as a generalized minimal coupling type flow. (See [GS] for a description of the geometry of this sort of flow.) The flow has the property that the trajectories in $N\Sigma$ are parallel along their projections onto $\Sigma$. In particular, $|n|^2$ is preserved by this motion.

In general, when the frequencies are not all equal, the flows generated by $H_B$ and $\lambda^2 H_O$ interact, and $H_B + \lambda^2 H_O$ generates a more complicated flow which need not be
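simply related to the flows generated by $H_B$ and $H_O$. Let $\overline{H}_B$ be defined by
$$\overline{H}_B(\gamma) = \lim_{T \to \infty} T^{-1} \int_0^T H_B \circ \phi_t^{H_O}(\gamma)\, dt. \tag{3.4}$$
The existence of this limit follows from the Fourier expansion discussed below. This averaged Hamiltonian Poisson commutes with $H_O$. It turns out that the flow for large $\lambda$ is the one generated by this Hamiltonian, with superimposed normal oscillations.

Theorem 3.2. Assume that the assumptions of Theorem 3.1 hold. Let $H_O$, $H_B$ and $\overline{H}_B$ be the Hamiltonians given by (3.1), (3.2) and (3.4) respectively. Let $\gamma_0 \in T^*N\Sigma$ and $T > 0$. Then
$$\sup_{0 \le t \le T} \bigl\| \phi_t^{H_B + \lambda^2 H_O}(\gamma_0) - \phi_t^{\lambda^2 H_O} \circ \phi_t^{\overline{H}_B}(\gamma_0) \bigr\| \to 0 \tag{3.5}$$
as $\lambda \to \infty$.

In this theorem we do not impose a non-resonance condition. However, the form of the averaged Hamiltonian $\overline{H}_B$ depends crucially on whether or not resonances are present. To explain this further we introduce scaled action variables. Recall that the scaled Hamiltonian was defined by $L_\lambda = H_\lambda \circ D_\lambda^{-1}$. We perform a similar scaling on the action variables and define $I_\alpha$ by $I^\lambda_\alpha \circ D_\lambda^{-1} = \lambda^2 I_\alpha$. Then
$$I_\alpha(\sigma, n, \xi, \eta) = \frac{1}{2\omega_\alpha}\langle \eta, P_\alpha \eta\rangle + \frac{\omega_\alpha}{2}\langle n, P_\alpha n\rangle.$$
Suppose that there are $m_0$ distinct eigenvalues $\omega_\alpha^2$. Then the flows $\phi_t^{I_\alpha}$ are commuting harmonic oscillations in the normal variables. They are periodic, satisfying $\phi_{t+2\pi}^{I_\alpha} = \phi_t^{I_\alpha}$. We therefore obtain a group action $\Phi$ of the $m_0$ torus $T^{m_0}$ on $T^*N\Sigma$ defined by
$$\Phi_\tau = \phi_{\tau_1}^{I_1} \circ \cdots \circ \phi_{\tau_{m_0}}^{I_{m_0}},$$
for $\tau = (\tau_1, \ldots, \tau_{m_0}) \in T^{m_0}$. Notice that $\phi_t^{H_O} = \Phi_{t\omega}$, where $\omega = (\omega_1, \ldots, \omega_{m_0})$. Now we may perform a Fourier expansion of $H_B \circ \Phi_\tau$ yielding
$$H_B \circ \Phi_\tau = \sum_{\nu \in \mathbb{Z}^{m_0}} e^{i\langle \nu, \tau\rangle} F_\nu$$
so that
$$H_B \circ \phi_t^{H_O} = \sum_{\nu \in \mathbb{Z}^{m_0}} e^{it\langle \nu, \omega\rangle} F_\nu.$$
It turns out that only finitely many $F_\nu$'s are non-zero. Thus we may exchange the integral and limit in the definition of $\overline{H}_B$ with the Fourier sum to obtain
$$\overline{H}_B = \sum_{\nu \in \mathbb{Z}^{m_0}} \Bigl( \lim_{T \to \infty} T^{-1} \int_0^T e^{it\langle \nu, \omega\rangle}\, dt \Bigr) F_\nu = \sum_{\nu \in \mathbb{Z}^{m_0} :\ \langle \nu, \omega\rangle = 0} F_\nu.$$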
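The last step, that time averaging keeps exactly the Fourier modes with $\langle \nu, \omega\rangle = 0$, can be illustrated numerically. The sketch below uses the closed form of the finite-time average of $e^{iat}$; the Fourier coefficients, the frequencies and the function names are made-up illustrative assumptions, with scalar stand-ins for the functions $F_\nu$.

```python
import cmath

def time_average_phase(a, T):
    """(1/T) * integral over [0, T] of exp(i*a*t) dt, evaluated in closed form."""
    if a == 0:
        return 1.0 + 0.0j
    return (cmath.exp(1j * a * T) - 1.0) / (1j * a * T)

def averaged_coefficients(fourier, omega, T):
    """Weight each coefficient F_nu by the finite-time average of exp(i*t*<nu, omega>)."""
    result = {}
    for nu, F in fourier.items():
        a = sum(k * w for k, w in zip(nu, omega))  # a = <nu, omega>
        result[nu] = time_average_phase(a, T) * F
    return result

# Hypothetical data: frequencies in 1:2 resonance, so nu = (2, -1) satisfies <nu, omega> = 0.
omega = (1.0, 2.0)
fourier = {(0, 0): 3.0, (2, -1): 0.5, (1, 0): 0.25}

averaged = averaged_coefficients(fourier, omega, T=1.0e6)
for nu, value in averaged.items():
    print(nu, abs(value))
```

For large `T` the weight on a mode with $\langle\nu,\omega\rangle = a \ne 0$ is bounded by $2/(|a|T)$, so only the resonant modes, here `(0, 0)` and `(2, -1)`, keep their full size.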
The non-resonance condition on the eigenvalues $\omega = (\omega_1, \ldots, \omega_{m_0})$ in this situation would be
$$\text{If } \nu \ne 0 \text{ and } F_\nu \ne 0 \text{ then } \langle \nu, \omega\rangle \ne 0. \tag{3.6}$$
If this condition holds, we find that $\overline{H}_B = F_0$.

We now examine the case $m_0 = m$, where there are $m$ distinct frequencies $\omega_\alpha$. We wish to describe how the limiting motion generated by $\overline{H}_B$ can be thought of as taking place on $\Sigma$. To begin, since $\{\overline{H}_B, I_\alpha\} = 0$ for each $\alpha$, each $I_\alpha$ is a constant of the motion, so the motion takes place on the level sets of $I_1, \ldots, I_m$. Furthermore, we want to disregard the normal oscillations. Technically, we may do this by replacing the original phase space $T^*N\Sigma$ with its quotient by the group action $\Phi$. This amounts to ignoring the angle variables in local action angle co-ordinates. It turns out that
$$T^*N\Sigma/\Phi = T^*\Sigma \times \mathbb{R}^m, \tag{3.7}$$
where the variables in $\mathbb{R}^m$ are the action variables. Since these are constant, we may think of the motion as taking place on $T^*\Sigma$.

To describe the identification (3.7) we first make a new direct sum decomposition of each cotangent space $T^*_{(\sigma,n)}N\Sigma$. Since there are $m$ distinct eigenvalues $\omega_1, \ldots, \omega_m$, the corresponding eigenvectors, defined globally up to sign, give an orthonormal frame for the normal bundle. In this situation the co-ordinates $y_i = \langle n, n_i(\sigma)\rangle$ are also globally defined up to sign. Thus the subspace of $T^*_{(\sigma,n)}N\Sigma$ spanned by $dy_1, \ldots, dy_m$ is globally defined. This subspace is complementary to the horizontal subspace, but is not necessarily orthogonal. Given horizontal and vertical components $(\xi, \eta)$ of a vector in $T^*_{(\sigma,n)}N\Sigma$, we may write $\xi + \eta = \xi_1 + \eta_1$, where $\xi_1$ is horizontal and $\eta_1$ is in the span of $dy_1, \ldots, dy_m$. The map from $T^*N\Sigma \to T^*\Sigma \times \mathbb{R}^m$ given by
$$(\sigma, n, \xi, \eta) \mapsto (\sigma, J\xi_1, I_1(\sigma, n, \xi, \eta), \ldots, I_m(\sigma, n, \xi, \eta))$$
is invariant under $\Phi$ and gives rise to the identification (3.7).

Now suppose that the values of $I_1, \ldots, I_m$ have been fixed by the initial condition. Then the Hamiltonian governing the motion on $T^*\Sigma$ depends on these "hidden" variables, and is given by
$$h_B(\sigma, \xi; I_1, \ldots, I_m) = \tfrac{1}{2}\langle \xi, \xi\rangle_\sigma + V(\sigma) + V_1(\sigma; I_1, \ldots, I_m), \tag{3.8}$$
provided the non-resonance condition holds. Given that the eigenvalues are distinct, the following implies (3.6):
$$\begin{aligned} &\text{If } j, k, l \text{ and } m \text{ are all distinct then } \omega_j + \omega_k \pm \omega_l - \omega_m \ne 0. \\ &\text{If } j, k \text{ and } l \text{ are all distinct then } 2\omega_j \pm \omega_k - \omega_l \ne 0. \end{aligned} \tag{3.9}$$
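Condition (3.9) is a finite list of inequalities, so it can be checked mechanically for a given set of frequencies. The following is a minimal sketch; the function name `satisfies_condition_39` and the tolerance `tol` (used to decide when a floating-point combination counts as zero) are assumptions introduced here, not part of the paper.

```python
from itertools import permutations

def satisfies_condition_39(omega, tol=1e-12):
    """Return True if the frequencies pass both non-resonance tests in (3.9)."""
    indices = range(len(omega))
    # Four distinct indices: omega_j + omega_k +/- omega_l - omega_m must be non-zero.
    for j, k, l, m in permutations(indices, 4):
        for sign in (1, -1):
            if abs(omega[j] + omega[k] + sign * omega[l] - omega[m]) <= tol:
                return False
    # Three distinct indices: 2*omega_j +/- omega_k - omega_l must be non-zero.
    for j, k, l in permutations(indices, 3):
        for sign in (1, -1):
            if abs(2 * omega[j] + sign * omega[k] - omega[l]) <= tol:
                return False
    return True

print(satisfies_condition_39([1.0, 2.0, 3.0]))             # resonant: 2*2 - 1 - 3 = 0
print(satisfies_condition_39([1.0, 2 ** 0.5, 3 ** 0.5]))   # no combination in (3.9) vanishes
```

With fewer than three frequencies both loops are empty and the check passes trivially, which is consistent with (3.9) only constraining triples and quadruples of distinct indices.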
The extra potential $V_1$ is defined in terms of the frame for the normal bundle, $n_1(\sigma),\dots,n_m(\sigma)$, consisting of normalized eigenvectors of $A(\sigma)$. Let $b_{k,l}$ be the associated connection one-form given by
\[ b_{k,l}[\cdot] = \langle n_k, dn_l[\cdot]\rangle. \tag{3.10} \]
Then
\[ V_1(\sigma;I_1,\dots,I_m) = \sum_{k,l} \frac{I_k I_l\,\omega_l}{\omega_k}\,|b_{k,l}|^2. \tag{3.11} \]
Notice that the norm $|b_{k,l}|$ is insensitive to the choice of signs for the frame.
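For concreteness, the sum in (3.11) can be evaluated at a single point $\sigma$ once the actions, the frequencies and the squared norms $|b_{k,l}|^2$ are known. The following is a minimal sketch under that assumption; the function name `extra_potential` and the sample numbers are hypothetical and serve only to show how the double sum is organised.

```python
def extra_potential(actions, omega, b_norm_sq):
    """Evaluate V_1 = sum_{k,l} I_k I_l (omega_l / omega_k) |b_{k,l}|^2 at one point.

    actions   : list of action values I_1..I_m
    omega     : list of frequencies omega_1..omega_m
    b_norm_sq : m x m nested list with entries |b_{k,l}|^2 (zero on the diagonal,
                since the connection form is antisymmetric)
    """
    m = len(actions)
    return sum(
        actions[k] * actions[l] * omega[l] / omega[k] * b_norm_sq[k][l]
        for k in range(m)
        for l in range(m)
    )


# Hypothetical sample data for m = 2.
v1 = extra_potential([1.0, 2.0], [1.0, 3.0], [[0.0, 0.5], [0.5, 0.0]])
```

Because only $|b_{k,l}|^2$ enters, flipping the sign of any frame vector leaves the value unchanged, in line with the remark above.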
Realizing Holonomic Constraints in Classical and Quantum Mechanics
4. Quantum Mechanics

In quantum mechanics, we wish to understand the time evolution generated by $H_\lambda$ for large $\lambda$, where $H_\lambda$ is the Hamiltonian given by (1.2) with $\langle p,p\rangle = -\Delta$. As in the classical case, it is convenient to replace the original configuration space $\mathbb{R}^{n+m}$ with the normal bundle $N\Sigma$. We will show that if the initial conditions in $L^2(\mathbb{R}^{n+m})$ are supported near $\Sigma$ then, to a good approximation for large $\lambda$, the time evolution stays near $\Sigma$. Thus we lose nothing by inserting Dirichlet boundary conditions on the boundary of the tubular neighbourhood of $\Sigma$, and may transfer our considerations to $L^2(N\Sigma_\delta, dvol)$, where $dvol$ is computed using the pulled back metric. If we extend the pulled back metric, and make a suitable definition of $H_\lambda$ in the complement of $N\Sigma_\delta$, we may remove the boundary condition. Thus we may assume that the Hamiltonian $H_\lambda$ acts in $L^2(N\Sigma, dvol)$. More precisely, we let $g_N$ be any complete smooth Riemannian metric on $N\Sigma$ that equals the metric induced from the imbedding in the region $\{(\sigma,n) : \|n\| < \varepsilon\}$, for some $\varepsilon < \delta$. For example, such a $g_N$ could be obtained by smoothly joining the induced metric for small $\|n\|$ with the metric $\langle\cdot,\cdot\rangle_1$ given by (3.3) for large $\|n\|$. Let $dvol$ denote the Riemannian density for $g_N$. Let $V(\sigma,n)$ be a smooth bounded function on $N\Sigma$ such that $V(\sigma,n) = V(\sigma+n)$ when $\|n\| < \varepsilon$. Our goal in this section is to analyze the time evolution generated by
\[ H_\lambda = -\frac12\Delta + V(\sigma,n) + \frac{\lambda^4}{2}\langle n, A(\sigma)n\rangle \tag{4.1} \]
acting in $L^2(N\Sigma, dvol)$. Here $\Delta$ denotes the Laplace–Beltrami operator for $g_N$. We now introduce the group of dilations in the normal directions by defining
\[ (D_\lambda\psi)(\sigma,n) = \lambda^{m/2}\psi(\sigma,\lambda n). \]
This is a unitary operator from $L^2(N\Sigma, dvol_\lambda)$ to $L^2(N\Sigma, dvol)$, where $dvol_\lambda$ denotes the pulled back density $dvol_\lambda(\sigma,n) = dvol(\sigma,\lambda^{-1}n)$. Since the spaces $L^2(N\Sigma, dvol_\lambda)$ depend on $\lambda$, and we want to deal with a fixed Hilbert space as $\lambda\to\infty$, we perform an additional unitary transformation. Let
\[ dvol_N = \lim_{\lambda\to\infty} dvol_\lambda = dvol_\Sigma \otimes dvol_{\mathbb{R}^m}. \]
Then the quotient of densities $dvol_N/dvol_\lambda$ is a function on $N\Sigma$ and we may define $M_\lambda$ to be the operator of multiplication by $\sqrt{dvol_N/dvol_\lambda}$. The operator $M_\lambda$ is unitary from $L^2(N\Sigma, dvol_N)$ to $L^2(N\Sigma, dvol_\lambda)$. Let
\[ U_\lambda = D_\lambda M_\lambda. \tag{4.2} \]
Notice that the support of a family of initial conditions of the form $U_\lambda\psi$ is being squeezed close to $\Sigma$ as $\lambda\to\infty$. We want to consider such a sequence of initial conditions. Therefore it is natural to consider the conjugated Hamiltonian $L_\lambda = U_\lambda^* H_\lambda U_\lambda$, since the evolution generated by $L_\lambda$ acting on $\psi$ is unitarily equivalent to the evolution generated by $H_\lambda$ acting on $U_\lambda\psi$. As a first step we perform a large $\lambda$ expansion. Formally, this yields
\[ L_\lambda = H_B + \lambda^2 H_O + O(\lambda^{-1}), \]
R. Froese, I. Herbst
where $H_O$ is the quantum harmonic oscillator Hamiltonian in the normal variables, and $H_B$ is the quantum version of the corresponding classical Hamiltonian, except with an additional potential
\[ K = \frac{n(n-1)}{4}\,s - \frac{n^2}{8}\,\|h\|^2. \]
Here $s$ is the scalar curvature and $h$ is the mean curvature vector (see Eqs. (7.2) and (7.1)). Notice that this extra potential does depend on the imbedding of $\Sigma$ in $\mathbb{R}^{n+m}$, since the mean curvature does. The quadratic forms for $H_O$ and $H_B$ are
\[ \langle\psi, H_O\psi\rangle = \int_{N\Sigma} \Big( \frac12\langle P^V d\psi, P^V d\psi\rangle_{\sigma,n} + \frac12\langle n, A(\sigma)n\rangle|\psi|^2 \Big)\, dvol_N \tag{4.3} \]
and
\[ \langle\psi, H_B\psi\rangle = \int_{N\Sigma} \Big( \frac12\langle J P^H d\psi, J P^H d\psi\rangle_\sigma + (V(\sigma,0) + K(\sigma))|\psi|^2 \Big)\, dvol_N. \tag{4.4} \]
Local co-ordinate expressions for these operators are given by (7.7) and (7.6) below. As in the classical case, we can gain additional understanding of these operators by introducing the metric (3.3). Then
\[ H_B + \lambda^2 H_O = -\frac12\Delta_\lambda + \frac{\lambda^2}{2}\langle n, A(\sigma)n\rangle + V(\sigma,0) + K(\sigma), \]
where $\Delta_\lambda$ is the Laplace–Beltrami operator on $N\Sigma$ with the metric (3.3). Note that the volume element $dvol_N$ is actually $\lambda^m$ times the usual volume element associated to this metric (see Sect. 7). The operator $H_O$ is explicitly given on $C^2$ functions in its domain by the formula
\[ (H_O\psi)(\sigma,n) = \Big( -\frac12\sum_{k=1}^m \frac{\partial^2}{\partial y_k^2} + \frac12\langle n, A(\sigma)n\rangle \Big)\, \psi\Big(\sigma, \sum_{k=1}^m y_k n_k(\sigma)\Big), \]
where $\{n_k(\sigma) : k = 1,\dots,m\}$ is any orthonormal basis for $N_\sigma\Sigma$ and $n = \sum_{k=1}^m y_k n_k(\sigma)$. It is easy to show that with the metric (3.3), $N\Sigma$ is complete so that any positive integer power of $H_B + \lambda^2 H_O$ is essentially self-adjoint on $C_0^\infty$ for $\lambda > 0$ [C]. Similarly, because $H_O$ is basically a harmonic oscillator Hamiltonian, it is straightforward to show that any positive integer power of $H_O$ is essentially self-adjoint on $C_0^\infty$. The operator $H_B$ is more complicated, but can also be shown to be essentially self-adjoint on $C_0^\infty$. The argument is not difficult and will be omitted.

Theorem 4.1. Let $\Sigma$ be a smooth compact $n$-dimensional submanifold of $\mathbb{R}^{n+m}$. Let $g_N$ be a complete smooth Riemannian metric on $N\Sigma$ that coincides with the induced metric when $\|n\| < \varepsilon$, for some $\varepsilon < \delta$, and suppose $V(\sigma,n)$ is a bounded smooth extension of $V(\sigma+n)$. Let $H_\lambda$ be the Hamiltonian given by (4.1), acting in $L^2(N\Sigma, dvol)$. Assume that $A(\sigma)$ varies smoothly, and that the eigenvalues $\omega_\alpha^2$ of $A(\sigma)$ do not depend on $\sigma$. Let $L_\lambda = U_\lambda^* H_\lambda U_\lambda$ acting in $L^2(N\Sigma, dvol_N)$. Then, for every $\psi\in L^2(N\Sigma, dvol_N)$ and every $T > 0$,
\[ \lim_{\lambda\to\infty}\ \sup_{0\le t\le T} \Big\| \Big( e^{-itL_\lambda} - e^{-it(H_B+\lambda^2 H_O)} \Big)\psi \Big\| = 0. \]
Just as in the classical case, this theorem provides a satisfactory description of the motion if $[H_B, H_O] = 0$, so that $\exp(-it(H_B + \lambda^2 H_O)) = \exp(-itH_B)\exp(-it\lambda^2 H_O)$. As before, this will happen, for example, if $\Sigma$ has co-dimension one, or if all the frequencies $\omega_\alpha$ are equal. If $\Sigma$ has co-dimension one, then the normal bundle is trivial. (We are assuming that $\Sigma$ is compact.) Then we have $L^2(N\Sigma, dvol_N) = L^2(\Sigma, dvol_\Sigma)\otimes L^2(\mathbb{R}, dy)$ and $H_B = h_B\otimes I$ for a Schrödinger operator $h_B$ acting in $L^2(\Sigma, dvol_\Sigma)$. Since $H_O = I\otimes h_O$ we have that $\exp(-it(H_B + \lambda^2 H_O)) = \exp(-ith_B)\otimes\exp(-it\lambda^2 h_O)$. This can be interpreted as a motion in $L^2(\Sigma, dvol_\Sigma)$ with superimposed normal oscillations.

In the case where the frequencies $\omega_\alpha$ are all equal, the normal bundle may be non-trivial, and there is not such a simple tensor product decomposition of $L^2(N\Sigma, dvol_N)$. However, for some initial conditions $\psi$ the limiting motion may again be thought of as taking place in $L^2(\Sigma, dvol_\Sigma)$ with superimposed oscillations. For example, consider the subspace of functions in $L^2(N\Sigma, dvol_N)$ that are radially symmetric in the fibre variable $n$. This subspace does have a tensor product decomposition $L^2(\Sigma, dvol_\Sigma)\otimes L^2_{\rm radial}(\mathbb{R}^m, d^m y)$. It is an invariant subspace for $H_B$. Furthermore, the restriction of $H_B$ to this subspace has the form $h_B\otimes I$. Thus, if $\psi_0$ is a radial function in $n$, then $\exp(-itL_\lambda)\psi_0 = \exp(-ith_B)\otimes\exp(-it\lambda^2 h_O)\psi_0$. As above, we interpret this as motion in $L^2(\Sigma, dvol_\Sigma)$ with superimposed normal oscillations.

On the other hand, if the normal bundle is non-trivial, it may happen that the limiting motion takes place on a space of sections of a vector bundle over $\Sigma$. Instead of giving more details about the general case, we offer the following illustrative example. Instead of a normal bundle, consider the Möbius band $B$ defined by $\mathbb{R}\times\mathbb{R}/\sim$, where $(x,y)\sim(x+1,-y)$. This is an $O(1)$ bundle over $S^1$ with fibre $\mathbb{R}$.
An $L^2$ function $\psi$ on $B$ can be thought of as a function on $\mathbb{R}\times\mathbb{R}$ satisfying $\psi(x+1,-y) = \psi(x,y)$. If we decompose $\psi(x,y)$, for fixed $x$, into odd and even functions of $y$, $\psi(x,y) = \psi_{\rm even}(x,y) + \psi_{\rm odd}(x,y)$, then $\psi_{\rm even}(x+1,y) = \psi_{\rm even}(x,y)$ and $\psi_{\rm odd}(x+1,y) = -\psi_{\rm odd}(x,y)$. (Notice that these are eigenfunctions for the left regular representation of $O(1)$ on $L^2(\mathbb{R})$.) Thus $\psi_{\rm even}$ can be thought of as an $L^2(\mathbb{R}, dy)$ valued function on $S^1$, while $\psi_{\rm odd}$ can be thought of as an $L^2(\mathbb{R}, dy)$ valued section of a line bundle over $S^1$ (which happens to be $B$ itself). In this way we obtain the decomposition
\[ L^2(B) = L^2(S^1, dx)\otimes L^2_{\rm even}(\mathbb{R}, dy)\ \oplus\ \mathcal{B}(S^1, dx)\otimes L^2_{\rm odd}(\mathbb{R}, dy), \]
where $\mathcal{B}$ is the space of $L^2$ sections of $B$. In this example, the bundle is flat, so $H_B = -D_x^2 + V(x)$ and $H_O = -D_y^2 + y^2/2$ acting in $L^2(B, dx\,dy)$. Let $h_+ = -D_x^2 + V(x)$ acting in $L^2(S^1, dx)$ and $h_- = -D_x^2 + V(x)$ acting in $\mathcal{B}(S^1, dx)$. Let $h_O = -D_y^2 + y^2/2$ acting in $L^2(\mathbb{R}, dy)$, with $L^2_{\rm even}(\mathbb{R}, dy)$ and $L^2_{\rm odd}(\mathbb{R}, dy)$ as invariant subspaces. Then
\[ e^{-it(H_B+\lambda^2 H_O)} = e^{-ith_+}\otimes e^{-it\lambda^2 h_O}\ \oplus\ e^{-ith_-}\otimes e^{-it\lambda^2 h_O}. \]
So if the initial condition happens to lie in $\mathcal{B}\otimes L^2_{\rm odd}$, then we would think of the limiting motion as taking place in $\mathcal{B}$, with superimposed oscillations in $L^2_{\rm odd}$.
When $H_B$ and $H_O$ do not commute, we perform a quantum version of averaging. Define $\bar H_B$ on $C_0^\infty$ by
\[ \bar H_B\psi = \lim_{T\to\infty} T^{-1}\int_0^T e^{itH_O} H_B e^{-itH_O}\psi\, dt. \tag{4.5} \]
It can be shown that $\bar H_B$ is essentially self-adjoint.

Theorem 4.2. Assume that the hypotheses of Theorem 4.1 hold. Let $H_O$, $H_B$, and $\bar H_B$ be the Hamiltonians defined by (4.3), (4.4) and (4.5). Then, for every $\psi\in L^2(N\Sigma, dvol_N)$ and every $T > 0$,
\[ \lim_{\lambda\to\infty}\ \sup_{0\le t\le T} \Big\| \Big( e^{-it(H_B+\lambda^2 H_O)} - e^{-it\lambda^2 H_O} e^{-it\bar H_B} \Big)\psi \Big\| = 0. \]
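The proof that this limit defining $\bar H_B$ exists parallels the discussion in classical mechanics. Suppose that there are $m_0$ distinct eigenvalues $\omega_1^2,\dots,\omega_{m_0}^2$. For each $\alpha = 1,\dots,m_0$ define the operators $I_\alpha$ via the quadratic forms
\[ \langle\psi, I_\alpha\psi\rangle = \int_{N\Sigma} \Big( \frac{1}{2\omega_\alpha}\langle P^V d\psi, P_\alpha P^V d\psi\rangle + \frac{\omega_\alpha}{2}\langle n, P_\alpha n\rangle|\psi|^2 \Big)\, dvol_N. \]
These operators all commute and satisfy
\[ \sum_\alpha \omega_\alpha I_\alpha = H_O. \]
An expression for $I_\alpha$ in terms of local creation and annihilation operators will be given near the end of Sect. 7. In that section we will show that $e^{i\tau I_\alpha} H_B e^{-i\tau I_\alpha}$ is periodic in $\tau$ with period $2\pi$. Thus if we conjugate $H_B$ with $e^{i\sum_\alpha\tau_\alpha I_\alpha}$, the resulting operator is defined on the torus $T^{m_0}$ and has a Fourier expansion
\[ e^{i\sum_\alpha\tau_\alpha I_\alpha} H_B e^{-i\sum_\alpha\tau_\alpha I_\alpha} = \sum_{\nu\in\mathbb{Z}^{m_0}} e^{i\langle\nu,\tau\rangle} F_\nu. \]
Here $\tau = (\tau_1,\dots,\tau_{m_0})$ and the coefficients $F_\nu$ are differential operators. As in the classical case, the sum is finite. Thus
\[ e^{itH_O} H_B e^{-itH_O} = \sum_{\nu\in\mathbb{Z}^{m_0}} e^{it\langle\nu,\omega\rangle} F_\nu. \]
This shows that the limit defining $\bar H_B$ exists, and is given by
\[ \bar H_B = \sum_{\nu\in\mathbb{Z}^{m_0}:\ \langle\nu,\omega\rangle = 0} F_\nu. \]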
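The mechanism behind this formula can be seen in a finite-dimensional toy model, which is an illustration only and not the operator setting of the paper. If $H_O$ is a diagonal matrix, then conjugating a matrix $H_B$ by $e^{itH_O}$ multiplies each entry by a phase, and the time average keeps exactly the entries that connect equal eigenvalues of $H_O$. The function names and the sample matrices below are assumptions made for the example.

```python
import cmath


def resonant_part(HB, energies, tol=1e-9):
    """Keep the entries of HB that connect equal eigenvalues of a diagonal H_O."""
    n = len(energies)
    return [[HB[a][b] if abs(energies[a] - energies[b]) < tol else 0.0
             for b in range(n)] for a in range(n)]


def time_average(HB, energies, T, steps=20000):
    """Midpoint-rule approximation of (1/T) * int_0^T e^{itH_O} HB e^{-itH_O} dt."""
    n = len(energies)
    avg = [[0j] * n for _ in range(n)]
    dt = T / steps
    for s in range(steps):
        t = (s + 0.5) * dt
        for a in range(n):
            for b in range(n):
                # Entry (a, b) of the conjugated matrix picks up the phase e^{it(E_a - E_b)}.
                phase = cmath.exp(1j * t * (energies[a] - energies[b]))
                avg[a][b] += phase * HB[a][b] * dt / T
    return avg


# Hypothetical data: H_O has a doubly degenerate eigenvalue 1.0 and a simple one 2.5.
energies = [1.0, 1.0, 2.5]
HB = [[0.3, 0.7, 0.2], [0.7, -0.1, 0.5], [0.2, 0.5, 0.9]]

limit = resonant_part(HB, energies)
approx = time_average(HB, energies, T=200.0)
max_error = max(abs(approx[a][b] - limit[a][b]) for a in range(3) for b in range(3))
```

In this toy model the non-resonant entries of the average decay like $1/T$, which is the matrix analogue of discarding the terms with $\langle\nu,\omega\rangle \neq 0$.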
As in the classical case, we may look for conditions under which the limiting motion can be considered to take place on $\Sigma$. Suppose that the eigenvalues $\omega_1,\dots,\omega_m$ are all distinct, and, in addition, that the eigenvectors $n_k(\sigma)$ can be chosen to be smooth functions on all of $\Sigma$. Then the normal bundle is trivial, $N\Sigma = \Sigma\times\mathbb{R}^m$ and $L^2(N\Sigma, dvol_N) = L^2(\Sigma, dvol_\Sigma)\otimes L^2(\mathbb{R}^m, d^m y)$. If the non-resonance condition (3.9) holds, then
\[ \bar H_B = \Big( -\frac12\Delta_\Sigma + V(\sigma) + K(\sigma) \Big)\otimes 1 + V_1. \]
The term $V_1$ is slightly different from (3.11), because terms arising in its computation do not all commute. It is given by
\[ V_1 = \sum_{k,l} \Big( I_k I_l\,\frac{\omega_l}{\omega_k} - \frac14 \Big) |b_{k,l}|^2. \]
The joint eigenspaces of $I_1,\dots,I_m$ are invariant subspaces for $\bar H_B$. The restriction of $\bar H_B$ to such a joint eigenspace is the Schrödinger operator $-\frac12\Delta_\Sigma + V(\sigma) + K(\sigma) + \tilde V_1$, acting in $L^2(\Sigma, dvol_\Sigma)$, where $\tilde V_1$ is obtained from $V_1$ by replacing the operators $I_k$ by their respective eigenvalues. Thus $\bar H_B$ is a direct sum of Schrödinger operators acting in $L^2(\Sigma, dvol_\Sigma)$.

5. Co-ordinate Expressions

Our proofs will rely on local co-ordinate expressions for the quantities introduced above. Suppose $x(\sigma)$ is a local co-ordinate map for $\Sigma$. Its inverse $\sigma(x)$ is a local imbedding of $\mathbb{R}^n$ into $\Sigma\subset\mathbb{R}^{n+m}$. Given a local orthonormal frame $n_1(\sigma),\dots,n_m(\sigma)$ for the normal bundle, we obtain local co-ordinates for $N\Sigma$ by setting
\[ x_i(\sigma,n) = x_i(\sigma),\quad i = 1,\dots,n, \qquad y_i(\sigma,n) = \langle n_i(\sigma), n\rangle,\quad i = 1,\dots,m. \]
We then may form the standard bases $\partial/\partial x_1,\dots,\partial/\partial x_n,\partial/\partial y_1,\dots,\partial/\partial y_m$ for the tangent spaces of $N\Sigma$ and $dx_1,\dots,dx_n,dy_1,\dots,dy_m$ for the cotangent spaces. This gives rise to local co-ordinates for $TN\Sigma$ and $T^*N\Sigma$ in the standard way. For the cotangent bundle, we will denote these by $(x,y,p,r)\in\mathbb{R}^{2(n+m)}$. Thus $(x,y,p,r)$ denotes the cotangent vector $\sum_i p_i\,dx_i + \sum_j r_j\,dy_j$ in the cotangent space over $(\sigma(x), \sum_j y_j n_j(\sigma))$. The standard symplectic form for $T^*N\Sigma$ is the two form given by
\[ \omega = \sum_{i=1}^n dp_i\wedge dx_i + \sum_{j=1}^m dr_j\wedge dy_j. \]
The dilation map Dλ is given in local co-ordinates by Dλ (x, y, p, r) = (x, λy, p, λ−1 r).
(5.1)
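A quick numerical sanity check of how the dilation (5.1) interacts with $\omega$ can be done in the simplest case $n = m = 1$. This is a sketch only: the helper names `dilate` and `symplectic_form` are assumptions, and since the map is linear its derivative acts on tangent vectors by the same formula.

```python
def dilate(z, lam):
    """Apply D_lambda(x, y, p, r) = (x, lam*y, p, r/lam) for n = m = 1."""
    x, y, p, r = z
    return (x, lam * y, p, r / lam)


def symplectic_form(u, v):
    """Evaluate dp^dx + dr^dy on two tangent vectors written as (x, y, p, r)."""
    ux, uy, up, ur = u
    vx, vy, vp, vr = v
    return (up * vx - ux * vp) + (ur * vy - uy * vr)


u = (1.0, 2.0, 3.0, 4.0)
v = (-1.0, 0.5, 2.0, -3.0)
before = symplectic_form(u, v)
after = symplectic_form(dilate(u, 7.0), dilate(v, 7.0))
```

The factors $\lambda$ and $\lambda^{-1}$ cancel in the $dr\wedge dy$ term, so `before` and `after` agree up to rounding.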
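Clearly this map preserves the symplectic form $\omega$. We now compute the local expression for the metric. Let $\sigma_i(x)\in\mathbb{R}^{n+m}$ denote the vector $\partial\sigma(x)/\partial x_i$. The tangent vector $\partial/\partial x_i\in T_{(\sigma,n)}N\Sigma$ corresponds to the vector in $\mathbb{R}^{2(n+m)}$ given by $(\sigma_i, \sum_j y_j\,dn_j(\sigma)[\sigma_i])$. The tangent vector $\partial/\partial y_j$ corresponds to $(0, n_j(\sigma))$. Here $\sigma = \sigma(x)$, $\sigma_i = \sigma_i(x)$ and $n = \sum_j y_j n_j(\sigma(x))$. Using (2.1) for the inner product, we find that the local expression for the metric has block form
\[ G(x,y) = \begin{pmatrix} G_\Sigma + C + BB^T & B \\ B^T & I \end{pmatrix} = \begin{pmatrix} I & B \\ 0 & I \end{pmatrix} \begin{pmatrix} G_\Sigma + C & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} I & B \\ 0 & I \end{pmatrix}^T, \tag{5.2} \]
where $G_\Sigma = G_\Sigma(x)$ is the metric for $\Sigma$ with matrix entries $\langle\sigma_i(x),\sigma_j(x)\rangle$, $B = B(x,y)$ is the matrix with entries
\[ B_{i,j}(x,y) = \sum_k y_k\langle dn_k[\sigma_i], n_j\rangle, \tag{5.3} \]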
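and where $C = C(x,y)$ is the matrix with entries
\[ \begin{aligned} C_{i,j}(x,y) &= \sum_k y_k\big(\langle dn_k[\sigma_i],\sigma_j\rangle + \langle\sigma_i, dn_k[\sigma_j]\rangle\big) + \sum_{k,l} y_k y_l\langle dn_k[\sigma_i], dn_l[\sigma_j]\rangle - (BB^T)_{i,j} \\ &= \sum_k y_k\big(\langle dn_k[\sigma_i],\sigma_j\rangle + \langle\sigma_i, dn_k[\sigma_j]\rangle\big) + \sum_{k,l} y_k y_l\langle dn_k[\sigma_i], P^T_\sigma\, dn_l[\sigma_j]\rangle. \end{aligned} \tag{5.4} \]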
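The geometrical meaning of the term $G_\Sigma + C$ is given in (7.12) below. The inverse can be written
\[ G^{-1}(x,y) = \begin{pmatrix} I & -B \\ 0 & I \end{pmatrix}^T \begin{pmatrix} (G_\Sigma + C)^{-1} & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} I & -B \\ 0 & I \end{pmatrix}. \tag{5.5} \]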
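The block factorizations of the metric and of its inverse can be checked numerically on a small example. The sketch below uses $n = 2$, $m = 1$ and made-up numbers: `S` stands in for the symmetric block $G_\Sigma + C$ and `b` for the block $B$; both are hypothetical sample data, and the helper names are assumptions.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Hypothetical sample blocks: S plays the role of G_Sigma + C, b the role of B.
S = [[2.0, 0.5], [0.5, 1.0]]
b = [0.3, -0.4]

det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

U = [[1.0, 0.0, b[0]], [0.0, 1.0, b[1]], [0.0, 0.0, 1.0]]          # (I B; 0 I)
M = [[1.0, 0.0, -b[0]], [0.0, 1.0, -b[1]], [0.0, 0.0, 1.0]]        # (I -B; 0 I)
D = [[S[0][0], S[0][1], 0.0], [S[1][0], S[1][1], 0.0], [0.0, 0.0, 1.0]]
D_inv = [[S_inv[0][0], S_inv[0][1], 0.0], [S_inv[1][0], S_inv[1][1], 0.0], [0.0, 0.0, 1.0]]

G = matmul(matmul(U, D), transpose(U))          # factorized form of the metric
G_inv = matmul(matmul(transpose(M), D_inv), M)  # factorized form of the inverse
product = matmul(G, G_inv)                      # should be the 3x3 identity
```

The top-left block of `G` equals `S` plus $BB^T$ and the off-diagonal block equals `b`, which matches the block form of the metric.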
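The local expressions for the projections onto the vertical and horizontal subspaces can now be computed. Let $P_V$ and $P_H$ denote the projections for the tangent space and $P^V$ and $P^H$ the projections for the cotangent spaces. Then
\[ P_V = \begin{pmatrix} 0 & 0 \\ B^T & I \end{pmatrix}, \qquad P_H = \begin{pmatrix} I & 0 \\ -B^T & 0 \end{pmatrix}, \]
and
\[ P^V = G P_V G^{-1} = \begin{pmatrix} 0 & B \\ 0 & I \end{pmatrix}, \qquad P^H = G P_H G^{-1} = \begin{pmatrix} I & -B \\ 0 & 0 \end{pmatrix}. \]
Notice that the vertical subspace of $T_{(\sigma,n)}N\Sigma$ is the span of $\partial/\partial y_1,\dots,\partial/\partial y_m$ and the horizontal subspace of $T^*_{\sigma,n}N\Sigma$ is the span of $dx_1,\dots,dx_n$. The map $d\pi_{\sigma,n} : T_{(\sigma,n)}N\Sigma\to T_\sigma\Sigma$ sends $\partial/\partial x_i\in T_{(\sigma,n)}N\Sigma$ to $\partial/\partial x_i\in T_\sigma\Sigma$ and sends $\partial/\partial y_i\in T_{(\sigma,n)}N\Sigma$ to $0$. From this it follows that $J = d\pi^{*-1}_{\sigma,n}$, defined on the horizontal subspace of $T^*_{\sigma,n}N\Sigma$, sends $dx_i\in T^*_{\sigma,n}N\Sigma$ to $dx_i\in T^*_\sigma\Sigma$. If $(\sigma,n,\xi,\eta)$ has co-ordinates $(x,y,p,r)$ then $\xi$ has co-ordinates
\[ P^H(x,y)\begin{pmatrix} p \\ r \end{pmatrix} = \begin{pmatrix} p - B(x,y)r \\ 0 \end{pmatrix}, \]
so that $J\xi$ has co-ordinates
\[ p - B(x,y)r. \]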
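We now compute the expressions for $H_\lambda$, $H_B$ and $H_O$ in local co-ordinates. We will abuse notation and use the same letters to denote functions on $T^*N\Sigma$ and their co-ordinate expressions. Suppose that the co-ordinates of $(\sigma,n,\xi,\eta)$ are $(x,y,p,r)$. Since
\[ G^{-1}P^V = (P^V)^T G^{-1} P^V = \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix} \tag{5.6} \]
we have that
\[ \langle\eta,\eta\rangle = \Big\langle P^V\begin{pmatrix} p \\ r \end{pmatrix}, G^{-1}P^V\begin{pmatrix} p \\ r \end{pmatrix} \Big\rangle = \langle r, r\rangle. \tag{5.7} \]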
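Here, and in what follows, inner products involving co-ordinate vectors always refer to Euclidean inner products. For example, $\langle r,r\rangle = \sum_{i=1}^m r_i^2$. For the horizontal vectors, we have
\[ \begin{pmatrix} I & -B \\ 0 & I \end{pmatrix} P^H = P^H \]
so that
\[ \langle\xi,\xi\rangle = \Big\langle P^H\begin{pmatrix} p \\ r \end{pmatrix}, G^{-1}P^H\begin{pmatrix} p \\ r \end{pmatrix} \Big\rangle = \langle (p - Br), (G_\Sigma + C)^{-1}(p - Br)\rangle. \tag{5.8} \]
Therefore the local co-ordinate expression for $H_\lambda$ is
\[ H_\lambda(x,y,p,r) = \frac12\langle (p - Br), (G_\Sigma + C)^{-1}(p - Br)\rangle + \frac12\langle r,r\rangle + \frac{\lambda^4}{2}\langle y, A(x)y\rangle + V(x,y). \]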
Here $C = C(x,y)$ and $B = B(x,y)$ are the matrices appearing in the expression for the metric $G$, $A(x)$ is the matrix for $A(\sigma)$ in the basis given by the orthonormal frame $n_1,\dots,n_m$ used to define the co-ordinate system and $V(x,y) = V\big(\sigma(x) + \sum_k y_k n_k(\sigma(x))\big)$. Similarly
\[ H_B(x,y,p,r) = \frac12\langle (p - Br), G_\Sigma^{-1}(p - Br)\rangle + V(x,0), \tag{5.9} \]
where $B = B(x,y)$ and $G_\Sigma = G_\Sigma(x)$. Finally
\[ H_O(x,y,p,r) = \frac12\langle r,r\rangle + \frac12\langle y, A(x)y\rangle. \tag{5.10} \]
The expressions for $H_O$ and $I_\alpha$ simplify if we can choose the vectors in the local orthonormal frame to be eigenvectors of $A(\sigma)$. This is always possible if there are no eigenvalue crossings. When, in addition, the eigenvalues $\omega_\alpha^2(\sigma)$ do not depend on $\sigma$ there are further simplifications. In what follows we will assume that there are $m_0$ distinct constant eigenvalues $\omega_\alpha^2$ for $\alpha = 1,\dots,m_0$, where $\omega_\alpha^2$ has multiplicity $k_\alpha$. We will assume that the local orthonormal frame used to define the co-ordinate system consists of eigenvectors for $A(\sigma)$. We label them $n_{\alpha,j}$, where $\alpha = 1,\dots,m_0$ and $j = 1,\dots,k_\alpha$, where for each $\alpha$, $n_{\alpha,j}$ is an eigenvector with eigenvalue $\omega_\alpha^2$. This means that the co-ordinates $y$ and $r$ now also acquire a double labelling. First of all we have
\[ H_O(x,y,p,r) = \frac12\langle r,r\rangle + \frac12\sum_\alpha \omega_\alpha^2\sum_j y_{\alpha,j}^2. \]
If the co-ordinates of $(\sigma,n,\xi,\eta)$ are $(x,y,p,r)$, then
\[ \langle n, P_\alpha n\rangle = \sum_j y_{\alpha,j}^2. \]
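The vertical cotangent vector $\eta$ has co-ordinates $P^V\begin{pmatrix} p \\ r \end{pmatrix}$. The corresponding tangent vector has co-ordinates $G^{-1}P^V\begin{pmatrix} p \\ r \end{pmatrix}$ which equals $\begin{pmatrix} 0 \\ r \end{pmatrix}$, by (5.6). Now the projection $P_\alpha$, acting on tangent vectors, just picks off the basis vectors $\partial/\partial y_{\alpha,j}$, i.e., $P_\alpha\,\partial/\partial y_{\beta,j} = \delta_{\beta,\alpha}\,\partial/\partial y_{\beta,j}$. Thus
\[ \langle\eta, P_\alpha\eta\rangle = \sum_j r_{\alpha,j}^2. \]
Therefore
\[ I_\alpha(x,y,p,r) = \frac{1}{2\omega_\alpha}\sum_j r_{\alpha,j}^2 + \frac{\omega_\alpha}{2}\sum_j y_{\alpha,j}^2. \]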
Notice that in this situation, where the vectors in the local orthonormal frame are eigenvectors of $A(\sigma)$, neither $H_O$ nor $I_\alpha$ depend on $x$ or $p$. Now we introduce local action-angle co-ordinates. In analogy with creation and destruction operators in quantum mechanics, we define the complex quantities
\[ a_{\alpha,j} = \frac{y_{\alpha,j}\,\omega_\alpha + i r_{\alpha,j}}{\sqrt{2\omega_\alpha}}, \]
so that
\[ y_{\alpha,j} = \frac{1}{\sqrt{2\omega_\alpha}}\,(a_{\alpha,j} + a^*_{\alpha,j}), \qquad r_{\alpha,j} = -i\sqrt{\frac{\omega_\alpha}{2}}\,(a_{\alpha,j} - a^*_{\alpha,j}). \]
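The action variables $I_{\alpha,j}\in\mathbb{R}$ and angle variables $\varphi_{\alpha,j}\in S^1$ are then defined by $a_{\alpha,j} = \sqrt{I_{\alpha,j}}\,e^{i\varphi_{\alpha,j}}$. Notice that $\sum_j I_{\alpha,j} = I_\alpha$. The change of co-ordinates from $(x,y,p,r)$ to $(x,\varphi,p,I)$ is symplectic, since $dr_{\alpha,j}\wedge dy_{\alpha,j} = dI_{\alpha,j}\wedge d\varphi_{\alpha,j}$. This makes it easy to compute the flow $\phi_t^{I_\beta}$ in these co-ordinates. Hamilton's equations for the flow are
\[ \dot x_i = 0, \qquad \dot p_i = 0, \qquad \dot I_{\alpha,j} = 0, \qquad \dot\varphi_{\alpha,j} = \delta_{\beta,\alpha}. \]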
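The change of variables for a single normal mode can be spelled out in a few lines. This is a sketch for one pair $(y, r)$ with frequency $\omega$, reading the action as $I = |a|^2$ and the angle as the phase of $a$; the function names are assumptions made for the example.

```python
import cmath
import math


def to_action_angle(y, r, omega):
    """Map (y, r) to (I, phi) via a = (omega*y + i*r) / sqrt(2*omega), I = |a|^2."""
    a = (omega * y + 1j * r) / math.sqrt(2.0 * omega)
    return abs(a) ** 2, cmath.phase(a)


def from_action_angle(action, phi, omega):
    """Invert the map using y = (a + a*)/sqrt(2*omega), r = -i*sqrt(omega/2)*(a - a*)."""
    a = math.sqrt(action) * cmath.exp(1j * phi)
    y = (a + a.conjugate()).real / math.sqrt(2.0 * omega)
    r = (-1j * math.sqrt(omega / 2.0) * (a - a.conjugate())).real
    return y, r


y0, r0, w = 0.7, -1.3, 2.0
action, angle = to_action_angle(y0, r0, w)
y1, r1 = from_action_angle(action, angle, w)
```

The action agrees with the single-mode expression $r^2/(2\omega) + \omega y^2/2$, which is how the formula for $I_\alpha$ above arises after summing over $j$.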
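Thus, under the flow $\phi_t^{I_\alpha}$ each $\varphi_{\alpha,j}$ is translated by $t$ and all the other variables remain unchanged. This implies that under the group action $\Gamma(\tau)$, with $\tau = (\tau_1,\dots,\tau_{m_0})$, the quantities $a_{\alpha,j}$ evolve as $e^{-i\tau_\alpha}a_{\alpha,j}$. We now compute the expression for $H_B$ in action-angle co-ordinates. We find
\[ \begin{aligned} (Br)_i &= \sum_{\alpha,j} B_{i,(\alpha,j)}(x,y)\,r_{\alpha,j} \\ &= \sum_{\beta,k,\alpha,j} b^i_{(\alpha,j),(\beta,k)}(x)\,r_{\alpha,j}\,y_{\beta,k} \\ &= \sum_{\beta,k,\alpha,j} \frac{b^i_{(\alpha,j),(\beta,k)}(x)}{2i}\,(a_{\alpha,j} - a^*_{\alpha,j})(a_{\beta,k} + a^*_{\beta,k})\sqrt{\frac{\omega_\alpha}{\omega_\beta}}. \end{aligned} \]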
Here $b^i_{(\alpha,j),(\beta,k)}(x) = b_{(\alpha,j),(\beta,k)}[\sigma_i(x)]$ is the antisymmetric matrix given by (3.10). The expression for $H_B$ is now obtained by substituting this formula for $Br$ into (5.9), which we may rewrite as
\[ H_B(x,p,\varphi,I) = \frac12\sum_{i,l} p_i g^{i,l} p_l - \sum_{i,l} (Br)_i g^{i,l} p_l + \frac12\sum_{i,l} (Br)_i g^{i,l} (Br)_l + V(x,0). \]
Here $g^{i,l} = g^{i,l}(x)$ are the matrix elements of $G_\Sigma^{-1}(x)$. To obtain the expression for $H_B\circ\Gamma(\tau)$ we simply replace each occurrence of $a_{\alpha,j}$ in the formula above with $e^{i\tau_\alpha}a_{\alpha,j}$. Since $H_B$ contains only constant, quadratic and quartic terms in $a_{\alpha,j}$, $a^*_{\alpha,j}$, we see that the Fourier expansion of $H_B\circ\Gamma(\tau)$ has finitely many terms, since the $\nu = (\nu_1,\dots,\nu_{m_0})$'s that appear have $\sum_\alpha|\nu_\alpha|\in\{0,2,4\}$.

6. Proofs of Theorems in Classical Mechanics

Proof of Theorem 3.1. We begin with some remarks about the co-ordinate charts for $T^*N\Sigma$. We will assume that the frames used to define the co-ordinates consist of eigenvectors of $A(\sigma)$. We assume that each chart has the form $\{(\sigma,n,\xi,\eta) : \sigma\in U,\ n\in N_\sigma\Sigma,\ \xi\in T^*_{\sigma,n}N\Sigma \text{ is horizontal},\ \eta\in T^*_{\sigma,n}N\Sigma \text{ is vertical}\}$, where $U$ is a co-ordinate chart for $\Sigma$. Since $\Sigma$ is compact, there is an atlas with finitely many charts, and there exists a positive number $\varepsilon_1$ so that two points in $T^*N\Sigma$ both lie in a single chart if their projections onto $\Sigma$ are a distance less than $\varepsilon_1$ apart. We use the notation
\[ \gamma_\lambda(t) = \phi_t^{L_\lambda}(\gamma_\lambda), \qquad \bar\gamma_\lambda(t) = \phi_t^{H_B+\lambda^2 H_O}(\gamma_0). \]
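Our first estimates are large $\lambda$ bounds on the components of $\gamma_\lambda(t) = (\sigma_\lambda(t), n_\lambda(t), \xi_\lambda(t), \eta_\lambda(t))$ that follow from the conservation of energy. These bounds are
\[ |n_\lambda(t)|,\ |\eta_\lambda(t)| \le C \tag{6.1} \]
and
\[ |\xi_\lambda(t)| \le C\lambda. \tag{6.2} \]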
The analogous bounds also hold for γ λ (t) = (σ λ (t), nλ (t), ξ λ (t), ηλ (t)). Clearly |nλ (t)| = |yλ (t)| and, by (5.7), |ηλ (t)| = |rλ (t)|. Thus, (6.1) implies that |yλ (t)| and |rλ (t)| remain bounded. To prove these we first consider the action of Dλ−1 on ξλ . Let γλ = (σλ , nλ , ξλ , ηλ ) have co-ordinates (xλ , yλ , pλ , rλ ). Then ξλ ∈ Tσ∗λ ,nλ N has co-ordinates
pλ − B(xλ , yλ )rλ H pλ = . P rλ 0 We now wish to apply Dλ−1 . Since B(x, y) is linear in y, the scaling in yλ and in rλ cancel. In other words B(xλ , λ−1 yλ )λrλ = B(xλ , yλ )rλ .
Thus Dλ−1 ξλ ∈ Tσ∗ ,λ−1 n N has the same co-ordinates as ξλ ∈ Tσ∗λ ,nλ N . This implies λ λ that as λ → ∞,
pλ − B(xλ , yλ )rλ p − B(xλ , yλ )rλ , G−1 (xλ , λ−1 yλ ) λ |Dλ−1 ξλ |2 = 0 0 −1 pλ − B(xλ , yλ )rλ = pλ − B(xλ , yλ )rλ , G (xλ ) + C(xλ , λ−1 yλ ) → p0 − B(x0 , y0 )r0 , G (x0 )−1 p0 − B(x0 , y0 )r0 = |dπ ∗−1 ξ0 |2 .
(6.3)
Thus, for large λ, the initial energy satisfies 1 −1 2 λ2 |Dλ ξλ | + CV + |ηλ |2 + nλ , A(σλ )nλ ) 2 2 2 ≤ Cλ ,
Lλ (γλ ) = Hλ ◦ Dλ−1 (γλ ) ≤
where CV is an upper bound for V in a neighbourhood of . Given this bound on the initial energies, we may assume that V is bounded, as was explained in the introduction. We now estimate the energy for later times t, Lλ (γλ (t)) = Hλ ◦ Dλ−1 (γλ (t)) 1 ≥ |Dλ−1 ξλ (t)|2 − V ∞ + Cλ2 |ηλ (t)|2 + |nλ (t)|2 2 ≥ −V ∞ + Cλ2 |ηλ (t)|2 + |nλ (t)|2 .
Since energy is conserved, i.e., Lλ (γλ (t)) = Lλ (γλ ), this implies (6.1). In a similar way we find that |Dλ−1 ξλ (t)|2 ≤ Cλ2 .
(6.4)
Now for |y| < C1 sufficiently large λ there is a constant C such that G−1 (x, y) < CG−1 (x, λ−1 y) in any of the finitely many co-ordinate patches. Thus, (6.3) implies |ξλ (t)| ≤ |Dλ−1 ξλ (t)|, so that (6.4) implies (6.2). The proof of bounds (6.1) and (6.2) for γ λ (t) is similar. We now wish to improve the bound (6.2) to |ξλ (t)| ≤ C
(6.5)
for 0 ≤ t ≤ T . We begin by defining a function Q that depends on our co-ordinate systems. Let χ1 (σ ), . . . , χN (σ ) be a partition of unity with each χk supported in a single co-ordinate patch. Define Q = Qk χk , where the local co-ordinate expression for Qk is 1 Qk (x, p) = p, G (x)−1 p + 1. 2
(We are abusing notation by using the same letter Qk for the function on T ∗ N and its local co-ordinate expression.) Given (6.1) we may find a constant C such that |ξλ (t)|2 ≤ CQ(γλ (t)). Thus bound (6.5) follows from an upper bound for Q along an orbit. To establish such a bound we first estimate the time derivative of Qk (xλ (t), pλ (t)). This derivative is given by the Poisson bracket, d Qk (xλ (t), pλ (t)) = {Qk , Lλ } (xλ (t), pλ (t), pλ (t), rλ (t)). dt Recall that the orthonormal frame n1 (σ ), . . . , nm (σ ) giving our local co-ordinates consists of eigenvectors of A(σ ). Thus Lλ = HB + λ2 HO + Eλ with HB (x, y, p, r) = Qk (x, p) − B(x, y)r, G (x)−1 p 1 + B(x, y)r, G (x)−1 B(x, y)r + V (x, 0), 2 1 1 2 2 HO (x, y, p, r) = r, r + ωi y i 2 2 i
and Eλ (x, y, p, r) 1 p − B(x, y)r , (G (x) + C(x, λ−1 y))−1 − G (x)−1 p − B(x, y)r = 2 + V (x, λ−1 y) − V (x, 0). Since Qk only depends on x and p any Poisson bracket {Qk , F } is given in local coordinates by ∂Qk ∂F ∂Qk ∂F {Qk , F } = − . ∂pi ∂xi ∂xi ∂pi i
Thus {Qk , HO } = {Qk , Qk } = 0. Using these formulas, together with (6.1) and (6.2) we find d Qk (xλ (t), pλ (t)) ≤ C pλ (t)2 + λ−1 pλ (t)3 (6.6) dt ≤ CQk (xλ (t), pλ (t)). Next, writing Hamilton’s equations for xλ (t) and using (6.1) we find ∂HB |x˙λ (t)| ≤ ∂p 1 2
≤ CQ (xλ (t), pλ (t)).
(6.7)
Since the cutoff functions, written in local co-ordinates, only depend on xλ we find that 1
|χ˙ k | ≤ C|x˙λ | ≤ CQ 2 .
(6.8)
Now we show if we evaluate Qk and Qj at the same point γ = (σ, n, ξ, η) with |n|, |η| < C then 1
|Qk (γ ) − Qj (γ )| ≤ CQk (γ ) 2 .
(6.9)
To see this, we first compute how our co-ordinates change. If (x, ˜ y, ˜ p, ˜ r˜ ) are the coordinates in the j th chart, obtained from the co- ordinates in the i th chart by a change of co-ordinates on and a change of frame, then p˜ = Mp + b, −1 −1 ˜ G = M −1 G−1 M , where M is the n × n matrix with entries ∂ x˜i /∂xj and b is a vector with components rk yl ∂θkl /∂xi for an orthogonal matrix valued function θ(x) given by taking inner products of the elements of the old and new frames. Thus ˜ −1 p Qj = p, ˜ G ˜ +1
2 = Qk + 2b, M −1 G−1 p + b + 1 1
≤ Qk + CQk2 . This implies (6.9). ˙ denote Now we are ready to establish a bound for Q along an orbit. Let Q dQ(γλ (t))/dt. Then ˙ = ˙ j χj + Qj χ˙ j Q Q j
=
˙ j χj + Q
j
Qj χ˙ j χk .
k,j
The first term is estimated using (6.6) yielding ˙ j χj ≤ C Q Qj χj = CQ. j
j
To estimate the second term, note that since k χk = 1, we have k χ˙ k = 0. Thus Qk χ˙ j χk = 0, k,j
so that
k,j
Qj χ˙ j χk =
(Qj − Qk )χ˙ j χk k,j
≤ CQ
by (6.8) and (6.9). Thus we have the differential inequality
\[ \dot Q \le CQ \]
which implies $Q(\gamma_\lambda(t)) \le Q(\gamma_\lambda(0))e^{Ct}$. This implies (6.5). Note that (6.7) implies
\[ \langle\dot\sigma_\lambda(t), \dot\sigma_\lambda(t)\rangle < C \tag{6.10} \]
for $0\le t\le T$. We will now show that there exists $\varepsilon > 0$ such that if
\[ \lim_{\lambda\to\infty}\ \sup_{\tau\in[0,t]} \|\gamma_\lambda(\tau) - \bar\gamma_\lambda(\tau)\| = 0 \tag{6.11} \]
holds for some $t = t_1\le T$ then (6.11) also holds for any $t\le t_1 + \varepsilon$. Since (6.11) holds for $t = 0$ by the assumption on the initial conditions, this will complete the proof. So assume that (6.11) holds for $t = t_1\le T$. To compare the two orbits for nearby times, we want to ensure that they lie in the same co-ordinate patch. There exists an $\varepsilon_1 > 0$ such that $\gamma_\lambda$ and $\bar\gamma_\lambda$ will lie in the same co-ordinate chart if $\|\sigma_\lambda - \bar\sigma_\lambda\| < \varepsilon_1$. Choose $\lambda_0$ so that $\lambda > \lambda_0$ implies
\[ \sup_{\tau\in[0,t_1]} \|\gamma_\lambda(\tau) - \bar\gamma_\lambda(\tau)\| < \varepsilon_1/3. \]
Now fix λ > λ0 . For t > t1 , σλ (t) − σ λ (t) ≤ σλ (t) − σλ (t1 ) + σλ (t1 ) − σ λ (t1 ) + σ λ (t1 ) − σ λ (t) ≤ 2|t − t1 |C + :1 /3, where C is the constant from (6.10). Thus if we choose : < :1 /3C then γλ and γ λ will lie in the same co-ordinate chart for t ∈ [t1 , t1 + :]. Notice that we do not rule out the chart changes with λ. We now write down the differential equation for γλ and γ λ in this common co-ordinate chart. Let z ∈ R2(n+m) denote co-ordinates for T ∗ N , i.e., x y z = . p r Denote by zλ the co-ordinates of γλ and by zλ the co-ordinates of γ λ . For a Hamiltonian H , let XH denote the corresponding Hamiltonian vector field given in local co-ordinates by ∂H /∂x(z) ∂H /∂y(z) XH (z) = . −∂H /∂p(z) −∂H /∂r(z)
Then d zλ (t) = XHB (zλ (t)) + Xλ2 HO (zλ (t)) + XEλ (zλ (t)) . dt
(6.12)
Since HO is quadratic, the vector field Xλ2 HO is linear, given by Xλ2 HO (z) = λ2 Dz for a matrix D that is similar to a real antisymmetric matrix. It follows that (6.12) can be written in integral form t 2 2 2 zλ (t) = eλ (t−t1 )D zλ (t1 ) + eλ tD e−λ τ D XHB (zλ (τ )) + XEλ (zλ (τ )) dτ. t1
We may write a similar equation for the co-ordinates of γ λ and obtain zλ (t) − zλ (t) = eλ
2 (t−t )D 1
(zλ (t1 ) − zλ (t1 )) t 2 2 e−λ τ D XHB (zλ (τ )) − XHB zλ (τ ) + XEλ (zλ (τ )) dτ. + eλ tD t1
The harmonic oscillator evolution eλ tD is similar to a rotation and therefore uniformly bounded. Moreover we have the estimates XH (zλ (τ )) − XH zλ (τ ) ≤ C zλ (τ ) − zλ (τ ) 2
B
and
B
XE (zλ (τ )) ≤ Cλ−1 . λ
These follow from (6.1) and (6.5) which imply that the co-ordinates for the orbits stay in compact sets. Thus zλ (t) − zλ (t) = C zλ (t1 ) − zλ (t1 ) + C|t − t1 | sup zλ (τ ) − zλ (τ ) + C|t − t1 |λ−1 . τ ∈[t1 ,t1 +:]
If we now also insist that : < 1/(2C), then we find that 1 sup zλ (τ ) − zλ (τ ) ≤ C zλ (t1 ) − zλ (t1 ) + C:λ−1 . 2 τ ∈[t1 ,t1 +:] Since we have only finitely many co-ordinate charts, there is a constant C so that C −1 zλ (τ ) − zλ (τ ) ≤ γλ (τ ) − γ λ (τ ) ≤ C zλ (τ ) − zλ (τ ) in any chart. Thus we conclude that sup
τ ∈[t1 ,t1 +:]
γλ (τ ) − γ λ (τ ) ≤ C γλ (t1 ) − γ λ (t1 ) + C:λ−1 .
This implies that
\[ \lim_{\lambda\to\infty}\ \sup_{\tau\in[t_1,t_1+\varepsilon]} \|\gamma_\lambda(\tau) - \bar\gamma_\lambda(\tau)\| = 0 \]
and completes the proof. □

Proof of Theorem 3.2. We will show that there exists $\varepsilon > 0$ such that if (3.5) holds for some $t = t_1\le T$, then (3.5) also holds for any $t\le t_1 + \varepsilon$. So assume that (3.5) holds for some $t = t_1\le T$. Define
\[ \psi_\lambda(t) = \phi_{-t}^{\lambda^2 H_O}\circ\phi_t^{H_B+\lambda^2 H_O}(\gamma_0). \]
Choosing our co-ordinate charts as in the proof of Theorem 3.1, we find that for small enough $\varepsilon$, $\psi_\lambda(t)$ will stay in a single chart for $t\in[t_1, t_1+\varepsilon]$. This follows from the estimate (6.10) for $\bar\gamma_\lambda(t) = \phi_t^{H_B+\lambda^2 H_O}(\gamma_0)$ and the fact that the harmonic oscillator motion $\phi_{-t}^{\lambda^2 H_O}$ keeps the base point $\sigma$ fixed. Let $w_\lambda(t)$ denote the local co-ordinates of $\psi_\lambda(t)$. In local co-ordinates, the evolution $\phi_{-t}^{\lambda^2 H_O}$ is given by multiplication by $e^{-t\lambda^2 D}$, and so
\[ w_\lambda(t) = e^{-t\lambda^2 D}\,\bar z_\lambda(t), \]
where $D$ is the same matrix, similar to a real antisymmetric matrix, that appeared in the proof of Theorem 3.1, and $\bar z_\lambda(t)$ are the co-ordinates of $\bar\gamma_\lambda(t)$. Differentiating, we obtain
\[ \frac{dw_\lambda(t)}{dt} = e^{-t\lambda^2 D} X_{H_B}(e^{t\lambda^2 D} w_\lambda(t)), \]
so that for $t\in[t_1, t_1+\varepsilon]$,
\[ w_\lambda(t) = w_\lambda(t_1) + \int_{t_1}^t e^{-s\lambda^2 D} X_{H_B}(e^{s\lambda^2 D} w_\lambda(s))\,ds. \tag{6.13} \]
Now consider the family of R2(n+m) valued functions on [t1 , t1 + :] given by W = {wλ (·) : λ > 0}. We will show for any sequence λj → ∞, there is a subsequence λ1,j such that wλ1,j converges uniformly to the same limit w∞ . This will imply that wλ → w∞ uniformly. The estimates (6.1) and (6.5) of Theorem 3.1 and the fact that the matrices e−tD are bounded uniformly in t imply that W is a bounded family. Moreover, from (6.13) and the boundedness of the orbits, it follows that wλ (t) − wλ (t ) ≤ C|t − t | so that W is equicontinuous. Suppose we are given a sequence λj → ∞. Then, by Ascoli’s theorem, there exists a subsequence λ1,j so that wλ1,j converges uniformly to w∞ . We wish to show that w∞ is always the same, no matter which sequence we start with. Our assumption on t1 implies that wλ1,j (t1 ) always converges to the same w0 , namely to the co-ordinates of φtH1 B (γ0 ). We will show that w∞ (t) is the orbit generated by the Hamiltonian H B with initial condition w0 at t = t1 . Using the uniform boundedness of the matrices e−tD in (6.13) we find that t 2 2 w∞ (t) = w0 + e−sλ1,j D XHB (esλ1,j D w∞ (s))ds + o(1) t1
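as $j\to\infty$. Now $e^{s\lambda_{1,j}^2 D}$ is a symplectic map, being the Hamiltonian flow $\phi^{H_O}_{s\lambda_{1,j}^2}$ in local co-ordinates. It follows that
\[ e^{-s\lambda_{1,j}^2 D} X_{H_B}(e^{s\lambda_{1,j}^2 D} w_\infty(s)) = X_{H_B\circ\phi^{H_O}_{s\lambda_{1,j}^2}}(w_\infty(s)). \]
If we use the Fourier expansion
\[ H_B\circ\phi^{H_O}_{s\lambda_{1,j}^2} = \sum_{\nu\in\mathbb{Z}^{m_0}} e^{is\lambda_{1,j}^2\langle\nu,\omega\rangle} F_\nu \]
we find that
\[ X_{H_B\circ\phi^{H_O}_{s\lambda_{1,j}^2}} = \sum_{\nu\in\mathbb{Z}^{m_0}} e^{is\lambda_{1,j}^2\langle\nu,\omega\rangle} X_{F_\nu} \]
so that
\[ w_\infty(t) = w_0 + \sum_{\nu\in\mathbb{Z}^{m_0}} \int_{t_1}^t e^{is\lambda_{1,j}^2\langle\nu,\omega\rangle} X_{F_\nu}(w_\infty(s))\,ds + o(1). \]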
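Taking $j$ to infinity and using the Riemann–Lebesgue lemma, we find that
\[ w_\infty(t) = w_0 + \sum_{\nu\in\mathbb{Z}^{m_0}:\ \langle\nu,\omega\rangle = 0} \int_{t_1}^t X_{F_\nu}(w_\infty(s))\,ds = w_0 + \int_{t_1}^t X_{\bar H_B}(w_\infty(s))\,ds. \]
This identifies $w_\infty(t)$ as the orbit generated by $\bar H_B$ with initial condition $w_0$ at $t_1$, as claimed. Now we have
\[ \sup_{t\in[t_1,t_1+\varepsilon]} \big\| e^{-t\lambda^2 D}\,\bar z_\lambda(t) - w_\infty(t) \big\| \to 0 \]
as $\lambda\to\infty$, which implies
\[ \sup_{t\in[t_1,t_1+\varepsilon]} \big\| \bar z_\lambda(t) - e^{t\lambda^2 D} w_\infty(t) \big\| \to 0. \]
This implies
\[ \sup_{t\in[t_1,t_1+\varepsilon]} \big\| \phi_t^{H_B+\lambda^2 H_O}(\gamma_0) - \phi_t^{\lambda^2 H_O}\circ\phi_t^{\bar H_B}(\gamma_0) \big\| \to 0 \]
and completes the proof. □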
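The Riemann–Lebesgue step used in the proof above says that an integral of a fixed integrable function against a rapidly oscillating phase tends to zero. A small numerical sketch makes the decay visible; the test function `f`, the parameter name `big` (standing in for $\lambda^2\langle\nu,\omega\rangle$ with $\langle\nu,\omega\rangle \neq 0$) and the quadrature routine are assumptions chosen for illustration.

```python
import cmath


def oscillatory_integral(f, big, steps=100000):
    """Midpoint approximation of int_0^1 exp(i*big*s) * f(s) ds."""
    ds = 1.0 / steps
    total = 0j
    for k in range(steps):
        s = (k + 0.5) * ds
        total += cmath.exp(1j * big * s) * f(s) * ds
    return total


def f(s):
    # Hypothetical smooth integrand standing in for a component of X_{F_nu}(w(s)).
    return s * s


values = [abs(oscillatory_integral(f, big)) for big in (0.0, 10.0, 100.0, 1000.0)]
```

With no oscillation the integral is $1/3$, while for large `big` its modulus shrinks roughly like `1/big`, so only the resonant terms survive in the limit.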
7. More Co-ordinate Expressions

In this section we give the co-ordinate expressions that will be needed in our proofs of the quantum theorems. We begin by defining the second fundamental form, the Weingarten maps and the mean and scalar curvatures. Let $X$ and $Y$ be two vector fields tangent to $\Sigma$. Since the Lie bracket $[X,Y] = dY[X] - dX[Y]$ is tangent to $\Sigma$ we find that
\[ II(X,Y) = P^N dX[Y] = P^N dY[X] - P^N[X,Y] = P^N dY[X] \]
is symmetric in $X$ and $Y$. Here $P^N$ denotes the projection onto the normal space. By definition, $II(X,Y)$ is the second fundamental form. Given an orthonormal frame $n_1(\sigma),\dots,n_m(\sigma)$ for the normal bundle, we have
\[ II(X,Y) = \sum_k \langle X, S_k Y\rangle\, n_k \]
for a collection of symmetric linear transformations $S_k$ on the tangent space. These are called the Weingarten maps. Clearly $\langle X, S_k Y\rangle = \langle n_k, dX[Y]\rangle$. But, by differentiating $\langle n_k, X\rangle = 0$, we obtain $\langle dn_k[Y], X\rangle + \langle n_k, dX[Y]\rangle = 0$, so that the Weingarten maps can also be written as $S_k = -P^T dn_k$. Here $P^T$ denotes the orthogonal projection onto the tangent space. The mean curvature vector is given by
\[ h = \frac1n\sum_{k=1}^m \mathrm{tr}(S_k)\, n_k \tag{7.1} \]
while the scalar curvature is
\[ s = \frac{1}{n(n-1)}\sum_{k=1}^m \big( (\mathrm{tr}(S_k))^2 - \mathrm{tr}(S_k^2) \big). \tag{7.2} \]
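To see what these normalizations give in familiar cases, one can evaluate (7.1) and (7.2) from the traces of the Weingarten maps and feed the result into the extra potential of Section 4. The sketch below is hedged: it reads that potential as $K = \frac{n(n-1)}{4}s - \frac{n^2}{8}\|h\|^2$, the function name `curvature_potential` is an assumption, and the sphere and cylinder data are standard principal curvatures supplied for illustration.

```python
def curvature_potential(n, traces):
    """Return (|h|^2, s, K) from a list of (tr S_k, tr S_k^2) pairs, one per normal n_k.

    Requires n >= 2 because (7.2) divides by n*(n-1). Since the n_k are
    orthonormal, |h|^2 is the sum of (tr S_k / n)^2.
    """
    h_sq = sum(t1 ** 2 for t1, _ in traces) / n ** 2              # from (7.1)
    s = sum(t1 ** 2 - t2 for t1, t2 in traces) / (n * (n - 1))    # (7.2)
    K = n * (n - 1) / 4.0 * s - n ** 2 / 8.0 * h_sq               # assumed form of K
    return h_sq, s, K


R = 2.0
# Sphere of radius R in R^3: principal curvatures 1/R, 1/R.
sphere = curvature_potential(2, [(2.0 / R, 2.0 / R ** 2)])
# Cylinder of radius R in R^3: principal curvatures 1/R, 0.
cylinder = curvature_potential(2, [(1.0 / R, 1.0 / R ** 2)])
```

For the sphere the two contributions cancel and $K = 0$, while for the cylinder $K = -1/(8R^2)$. Only the traces of $S_k$ and $S_k^2$ enter, so the overall sign of each $S_k$ does not matter.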
Recall that the local expression $G(x,y)$ for the pulled back metric on $N\Sigma$ has the block form (5.2). Initially, $G(x,y)$ is only defined for $|y| < \delta$. In our theorem, we wish to extend this metric to a complete Riemannian metric on all of $N\Sigma$. One way to achieve this is to join the induced metric for small $|y|$ to the metric $\langle\cdot,\cdot\rangle_1$ given by (3.3) for large $|y|$. Since the matrix for the metric $\langle\cdot,\cdot\rangle_1$ is
$$\begin{bmatrix} I & B\\ 0 & I\end{bmatrix}^T \begin{bmatrix} G_\Sigma & 0\\ 0 & I\end{bmatrix} \begin{bmatrix} I & B\\ 0 & I\end{bmatrix}$$
the resulting metric on all of $N\Sigma$ would have the matrix
$$G(x,y) = \begin{bmatrix} I & B\\ 0 & I\end{bmatrix}^T \begin{bmatrix} G_\Sigma + \chi C & 0\\ 0 & I\end{bmatrix} \begin{bmatrix} I & B\\ 0 & I\end{bmatrix},$$
where $\chi = \chi(|y|)$ is a cutoff function that equals 1 for $|y| < \epsilon$ and 0 for $|y| > \delta$. With this special form of the extended metric the local co-ordinate expression below remains true on all of $N\Sigma$ if $C$ is replaced by $\chi C$. However, this special form of the extension is not required for our theorems.
R. Froese, I. Herbst
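Let $g(x,y) = \det(G(x,y)) = \det(G_\Sigma + C)$. Define
$$D_x = \begin{bmatrix} D_{x_1}\\ \vdots\\ D_{x_n}\end{bmatrix},\qquad D_y = \begin{bmatrix} D_{y_1}\\ \vdots\\ D_{y_m}\end{bmatrix}.$$
The local co-ordinate expression for the operator $H_\lambda = -\tfrac12\Delta + V(\sigma,n) + \lambda^4 W(\sigma,n)$ in the region $|y| < \delta$ is
$$H_\lambda = -\tfrac12\, g^{-1/2} \begin{bmatrix} D_x - BD_y\\ D_y\end{bmatrix}^T g^{1/2} \begin{bmatrix} (G_\Sigma + C)^{-1} & 0\\ 0 & I\end{bmatrix} \begin{bmatrix} D_x - BD_y\\ D_y\end{bmatrix} + V(x,y) + \frac{\lambda^4}{2}\langle y, A(x)y\rangle$$
$$= -\tfrac12\, g^{-1/2}\left( (D_x - BD_y)^T g^{1/2} (G_\Sigma + C)^{-1} (D_x - BD_y) + D_y^T g^{1/2} D_y \right) + V(x,y) + \frac{\lambda^4}{2}\langle y, A(x)y\rangle.$$
Local expressions for the densities on $N\Sigma$ are
$$d\mathrm{vol} = \sqrt{g(x,y)}\,|d^n x||d^m y|,\qquad d\mathrm{vol}_\lambda = \sqrt{g(x,y/\lambda)}\,|d^n x||d^m y|,$$
$$d\mathrm{vol}_{N\Sigma} = \sqrt{g(x,0)}\,|d^n x||d^m y| = \sqrt{g_\Sigma(x)}\,|d^n x||d^m y|,$$
where $g_\Sigma(x) = \det(G_\Sigma(x))$. Thus the multiplication operator $M_\lambda$ appearing in (4.2) is multiplication by $f_\lambda^{-1/4}$ where
$$f_\lambda(x,y) = \frac{g(x,y/\lambda)}{g_\Sigma(x)}.$$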
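We may now compute the local expression for $L_\lambda$. Conjugation by $D_\lambda$ results in every multiplication by a (possibly matrix valued) function $F(x,y)$ being replaced by multiplication by $F(x,y/\lambda)$, and every $D_y$ being replaced by $\lambda D_y$. Conjugation by $M_\lambda$ simply puts a multiplication by $f_\lambda^{-1/4}$ to the right of the operator, and a multiplication by $f_\lambda^{1/4}$ to the left. In a co-ordinate system for a domain in $N\Sigma$ of the form $\{(\sigma,n) : \sigma\in U,\ n\in N_\sigma\Sigma\}$ let $D = \begin{bmatrix} D_x\\ D_y\end{bmatrix}$ and let $G_\lambda(x,y)$ be the scaled and extended metric taking into account the scaling of $D_y$ as well as $y$. In other words
$$G_\lambda(x,y)^{-1} = \begin{bmatrix} I & 0\\ 0 & \lambda I\end{bmatrix} G(x,y/\lambda)^{-1} \begin{bmatrix} I & 0\\ 0 & \lambda I\end{bmatrix}. \tag{7.3}$$
Then
$$L_\lambda = -\tfrac12 f_\lambda^{1/4}\, g(x,y/\lambda)^{-1/2} D^T g(x,y/\lambda)^{1/2} G_\lambda^{-1} D f_\lambda^{-1/4} + V(x,y/\lambda) + \frac{\lambda^2}{2}\langle y, A(x)y\rangle$$
$$= -\tfrac12\, g_\Sigma^{-1/2} f_\lambda^{-1/4} D^T f_\lambda^{1/4} g_\Sigma^{1/2} G_\lambda^{-1} f_\lambda^{1/4} D f_\lambda^{-1/4} + V(x,y/\lambda) + \frac{\lambda^2}{2}\langle y, A(x)y\rangle. \tag{7.4}$$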
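Thus in the region where $|y| < \delta\lambda$ we may use the explicit form of the metric to obtain
$$L_\lambda = -\tfrac12\, g_\Sigma^{-1/2} f_\lambda^{-1/4} \begin{bmatrix} D_x - BD_y\\ D_y\end{bmatrix}^T \begin{bmatrix} (G_\Sigma + C_\lambda)^{-1} & 0\\ 0 & \lambda^2 I\end{bmatrix} \cdot g_\Sigma^{1/2} f_\lambda^{1/2} \begin{bmatrix} D_x - BD_y\\ D_y\end{bmatrix} f_\lambda^{-1/4} + V(x,y/\lambda) + \frac{\lambda^2}{2}\langle y, A(x)y\rangle, \tag{7.5}$$
where $C_\lambda(x,y) = C(x,y/\lambda)$. Note that formally putting $f_\lambda = 1$ above, and replacing $C_\lambda$ by 0, we obtain for the first line of (7.5),
$$-\tfrac12\, g_\Sigma^{-1/2} \begin{bmatrix} D_x\\ D_y\end{bmatrix}^T \begin{bmatrix} I & -B\\ 0 & I\end{bmatrix}^T g_\Sigma^{1/2} \begin{bmatrix} G_\Sigma^{-1} & 0\\ 0 & \lambda^2 I\end{bmatrix} \begin{bmatrix} I & -B\\ 0 & I\end{bmatrix} \begin{bmatrix} D_x\\ D_y\end{bmatrix},$$
which is the Laplace–Beltrami operator for the metric which in local co-ordinates is
$$\begin{bmatrix} I & B\\ 0 & I\end{bmatrix}^T \begin{bmatrix} G_\Sigma & 0\\ 0 & \lambda^{-2} I\end{bmatrix} \begin{bmatrix} I & B\\ 0 & I\end{bmatrix}.$$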
This is easily seen to be the matrix for the metric (3.3). This explains part of the origin of the expression $H_B + \lambda^2 H_O$. A more complete analysis (to which we now turn) is necessary to understand the origin of the term $K(\sigma)$. Before beginning this, note that the local expressions for $H_B$ and $H_O$ are given by
$$H_B = \tfrac12 (D_x - B(x,y)D_y)^* G_\Sigma^{-1} (D_x - B(x,y)D_y) + K(x) + V(x,0) \tag{7.6}$$
and
$$H_O = \tfrac12 D_y^* D_y + \tfrac12\langle y, A(x)y\rangle. \tag{7.7}$$
Here $D_x^*$ and $D_y^*$ denote the formal adjoints with respect to $d\mathrm{vol}_{N\Sigma}$ given by $D_x^* = -g_\Sigma^{-1/2} D_x^T g_\Sigma^{1/2}$, $D_y^* = -g_\Sigma^{-1/2} D_y^T g_\Sigma^{1/2} = -D_y^T$ and $B^* = g_\Sigma^{-1/2} B^T g_\Sigma^{1/2} = B^T$.

We now wish to perform a large $\lambda$ expansion of $L_\lambda$. To state the error estimates precisely, we introduce the notation $E_k$ to denote a smooth function of $x$ and $y$ that vanishes to $k$th order at $y = 0$, evaluated at $(x, y/\lambda)$. Roughly speaking, $E_k$ behaves like $(y/\lambda)^k$ for small $y/\lambda$. The effect of differentiating such an error term is given by
$$\frac{\partial E_k}{\partial x_i} = E_k,\qquad \frac{\partial E_k}{\partial y_i} = \begin{cases} \lambda^{-1}E_{k-1} & \text{if } k\ge 1,\\ \lambda^{-1}E_0 & \text{if } k = 0.\end{cases}$$
In our theorems we will always assume that the eigenvalues $\omega_j^2$ of $A(\sigma)$ are constant. If we choose the orthonormal frame in the definition of our co-ordinates to consist of eigenvectors of $A(\sigma)$ then $\langle n, A(\sigma)n\rangle = \sum_j \omega_j^2 y_j^2$. We will make this substitution without further comment below.
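Lemma 7.1. In the region where $|y| < \delta\lambda$, the local expression for $L_\lambda$ can be written
$$L_\lambda = H_B + \lambda^2 H_O + (D_x - BD_y)^* E_1 (D_x - BD_y) + E_1.$$

Proof. In a co-ordinate system for a domain in $N\Sigma$ of the form $\{(\sigma,n) : \sigma\in U,\ n\in N_\sigma\Sigma\}$ let $D = \begin{bmatrix} D_x\\ D_y\end{bmatrix}$ and $G_\lambda(x,y)$ be given by (7.3). Setting $k_\lambda = (1/4)\ln f_\lambda$, we may write (7.4) as
$$L_\lambda = \tfrac12 (D - \partial k_\lambda)^* G_\lambda^{-1} (D - \partial k_\lambda) + V(x,y/\lambda) + \frac{\lambda^2}{2}\sum_j \omega_j^2 y_j^2, \tag{7.8}$$
where $\partial k_\lambda = \begin{bmatrix} \partial_x k_\lambda\\ \partial_y k_\lambda\end{bmatrix}$, $\partial k_\lambda^* = (\partial k_\lambda)^T$, and $D^* = -g_\Sigma^{-1/2} D^T g_\Sigma^{1/2}$. We further expand (7.8) to obtain
$$L_\lambda = \tfrac12 D^* G_\lambda^{-1} D + \tfrac12\, \partial k_\lambda^* G_\lambda^{-1} \partial k_\lambda + \tfrac12\, g_\Sigma^{-1/2} \sum_{i,j} \partial_i\left( g_\Sigma^{1/2} (G_\lambda^{-1})_{i,j}\, \partial_j k_\lambda \right) + V(x,y/\lambda) + \frac{\lambda^2}{2}\sum_j \omega_j^2 y_j^2. \tag{7.9}$$
If $|y| < \lambda\delta$ then
$$G_\lambda^{-1}(x,y) = \begin{bmatrix} I & -B(x,y)\\ 0 & I\end{bmatrix}^T \begin{bmatrix} (G_\Sigma(x) + C(x,y/\lambda))^{-1} & 0\\ 0 & \lambda^2 I\end{bmatrix} \begin{bmatrix} I & -B(x,y)\\ 0 & I\end{bmatrix} \tag{7.10}$$
so that in this region we obtain
$$L_\lambda = \tfrac12 (D_x - BD_y)^* G_\Sigma(x)^{-1} (D_x - BD_y) + \frac{\lambda^2}{2} D_y^* D_y + (D_x - BD_y)^* E_1 (D_x - BD_y) + E_1$$
$$\qquad + \frac{\lambda^2}{2}\sum_i \left( \partial_{y_i}^2 k_\lambda + (\partial_{y_i} k_\lambda)^2 \right) + V(x,y/\lambda) + \frac{\lambda^2}{2}\sum_j \omega_j^2 y_j^2$$
$$= H_B + \lambda^2 H_O + (D_x - BD_y)^* E_1 (D_x - BD_y) + E_1 + \frac{\lambda^2}{2}\sum_i \left( \partial_{y_i}^2 k_\lambda + (\partial_{y_i} k_\lambda)^2 \right) - K(x).$$
Here we used $(\partial_x - B\partial_y)E_k = E_k$ and $\partial k_\lambda = \begin{bmatrix} E_1\\ \lambda^{-1}E_0\end{bmatrix}$, so that $(\partial_x - B\partial_y)k_\lambda = E_1$. The lemma will follow if we can show
$$\frac{\lambda^2}{2}\sum_i \left( \partial_{y_i}^2 k_\lambda + (\partial_{y_i} k_\lambda)^2 \right) = K(x) + E_1. \tag{7.11}$$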
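This requires a more careful expansion of $f_\lambda$. The first step is to uncover the geometrical meaning of the term $G_\Sigma(x) + C(x,y)$ occurring in the expression (5.2) for the metric. Note that $\langle dn_k[\sigma_i], \sigma_j\rangle = -\langle S_k\sigma_i, \sigma_j\rangle = -\langle \sigma_i, S_k\sigma_j\rangle = \langle \sigma_i, dn_k[\sigma_j]\rangle$ and that
$$M_k = G_\Sigma^{-1}\left[\langle \sigma_i, S_k\sigma_j\rangle\right]$$
is the matrix for the Weingarten map $S_k$ in the basis $\sigma_1,\dots,\sigma_n$. Let $S$ be the symmetric operator defined by $\langle n, II(X,Y)\rangle = \langle X, SY\rangle$. Then $S = \sum_k y_k S_k$, and the matrix for $S$ in the basis $\sigma_1,\dots,\sigma_n$ is
$$M = M(x,y) = \sum_k y_k M_k(x).$$
A short calculation shows
$$G_\Sigma + C = G_\Sigma (I - M)^2. \tag{7.12}$$
Given the block form (5.2) of $G$ and (7.12), we obtain
$$f_\lambda = g_\lambda/g_\Sigma = \det(G(x,y/\lambda))/\det(G_\Sigma(x)) = \det\left(G_\Sigma(x)(I - \lambda^{-1}M(x,y))^2\right)/\det(G_\Sigma(x)) = \det(I - \lambda^{-1}M(x,y))^2.$$
Thus
$$k_\lambda = \tfrac12 \ln(f_\lambda^{1/2}) = \tfrac12 \ln\det(I - \lambda^{-1}M) = \tfrac12 \operatorname{tr}\ln(I - \lambda^{-1}M)$$
$$= -\tfrac12\lambda^{-1}\operatorname{tr}(M) - \tfrac14\lambda^{-2}\operatorname{tr}(M^2) + E_3 = -\tfrac12\lambda^{-1}\sum_k y_k \operatorname{tr}(S_k) - \tfrac14\lambda^{-2}\sum_{k,l} y_k y_l \operatorname{tr}(S_k S_l) + E_3.$$
This implies that
$$\partial_{y_i} k_\lambda = -\tfrac12\lambda^{-1}\operatorname{tr}(S_i) + \lambda^{-2}E_1 + \lambda^{-1}E_2$$
and
$$(\partial_{y_i})^2 k_\lambda = -\tfrac12\lambda^{-2}\operatorname{tr}(S_i^2) + \lambda^{-2}E_1.$$
Thus
$$\frac{\lambda^2}{2}\sum_i \left( \partial_{y_i}^2 k_\lambda + (\partial_{y_i} k_\lambda)^2 \right) = \sum_i \left( -\tfrac14 \operatorname{tr}(S_i^2) + \tfrac18 (\operatorname{tr}(S_i))^2 \right) + E_1$$
$$= \sum_i \left( \tfrac14\left( (\operatorname{tr}(S_i))^2 - \operatorname{tr}(S_i^2) \right) - \tfrac18 (\operatorname{tr}(S_i))^2 \right) + E_1 = \frac{n(n-1)}{4}\, s - \frac{n^2}{8}\|h\|^2 + E_1.$$
This proves (7.11) and completes the proof. $\Box$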
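The key step in the proof is the second-order expansion $\tfrac12\ln\det(I - \epsilon M) = -\tfrac12\epsilon\operatorname{tr}(M) - \tfrac14\epsilon^2\operatorname{tr}(M^2) + O(\epsilon^3)$ with $\epsilon = \lambda^{-1}$. The sketch below checks this numerically for one $2\times 2$ matrix; the matrix entries and the function names are arbitrary choices made for illustration, not data from the text. If the expansion is right, halving $\epsilon$ should shrink the remainder by a factor close to 8.

```python
import math


def half_log_det(m, eps):
    """Return (1/2) * ln det(I - eps * M) for a 2x2 matrix M."""
    a = 1.0 - eps * m[0][0]
    b = -eps * m[0][1]
    c = -eps * m[1][0]
    d = 1.0 - eps * m[1][1]
    return 0.5 * math.log(a * d - b * c)


def second_order(m, eps):
    """Return -(1/2) eps tr(M) - (1/4) eps^2 tr(M^2) for a 2x2 matrix M."""
    tr = m[0][0] + m[1][1]
    tr_sq = m[0][0] ** 2 + 2.0 * m[0][1] * m[1][0] + m[1][1] ** 2
    return -0.5 * eps * tr - 0.25 * eps ** 2 * tr_sq


# Arbitrary symmetric test matrix (an assumption for illustration only).
M = [[1.0, 0.5], [0.5, -2.0]]


def remainder(eps):
    """Size of the neglected terms, expected to scale like eps^3."""
    return abs(half_log_det(M, eps) - second_order(M, eps))


ratio = remainder(0.02) / remainder(0.01)
print(ratio)  # expected to be close to 8 (cubic remainder)
```

A ratio near 8 is consistent with the remainder being of third order, which is what the $E_3$ term in the expansion of $k_\lambda$ records.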
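We conclude this section by discussing the expression for $H_B$ in local co-ordinates. We may define local annihilation and creation operators, using the co-ordinates $y_{\alpha,j}$ defined in Sect. 5, as
$$a_{\alpha,j} = \frac{1}{\sqrt{2\omega_\alpha}}\left(\omega_{\alpha,j}\, y_{\alpha,j} + D_{y_{\alpha,j}}\right),\qquad a^*_{\alpha,j} = \frac{1}{\sqrt{2\omega_\alpha}}\left(\omega_{\alpha,j}\, y_{\alpha,j} - D_{y_{\alpha,j}}\right).$$
Then we find
$$I_\alpha = \sum_j \left( -\frac{1}{2\omega_\alpha} D^2_{y_{\alpha,j}} + \frac{\omega_\alpha}{2}\, y^2_{\alpha,j} \right) = \sum_j \left( a^*_{\alpha,j} a_{\alpha,j} + \tfrac12 \right).$$
We may also write $H_B$ in terms of the annihilation and creation operators. We begin with
$$(B(x,y)D_y)_i = \sum_{\alpha,j,\beta,k} b^i_{\alpha,j,\beta,k}\, D_{y_{\alpha,j}}\, y_{\beta,k}.$$
Notice that the order of $D_{y_{\alpha,j}}$ and $y_{\beta,k}$ is irrelevant here, since $b^i$ is antisymmetric in $(\alpha,j)$ and $(\beta,k)$. Then we can use
$$D_{y_{\alpha,j}} = \sqrt{\frac{\omega_\alpha}{2}}\left( a_{\alpha,j} - a^*_{\alpha,j} \right),\qquad y_{\beta,k} = \frac{1}{\sqrt{2\omega_\beta}}\left( a_{\beta,k} + a^*_{\beta,k} \right),$$
and substitute the resulting expression in (7.6). The resulting formula expresses $H_B$ as a finite sum of terms involving the product of 0, 2 or 4 annihilation or creation operators. The identities
$$e^{itH_O} a_{\alpha,j} e^{-itH_O} = e^{-it\omega_\alpha} a_{\alpha,j},\qquad e^{itH_O} a^*_{\alpha,j} e^{-itH_O} = e^{it\omega_\alpha} a^*_{\alpha,j} \tag{7.13}$$
lead to a finite sum
$$e^{itH_O} H_B e^{-itH_O} = \sum_{\nu\in\mathbb{Z}^{m_0}} e^{it\langle\nu,\omega\rangle} F_\nu$$
that defines the differential operators $F_\nu$.

Lemma 7.2. For $\varphi\in C_0^\infty(N\Sigma)$, $e^{-itH_O}\varphi\in D(H_B)$ and
$$e^{itH_O} H_B e^{-itH_O}\varphi = \sum_{\nu\in\mathbb{Z}^{m_0}} e^{it\langle\nu,\omega\rangle} F_\nu\varphi, \tag{7.14}$$
where the operators $F_\nu$ are defined by the sum above.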
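The identities (7.13) can be illustrated in a finite-dimensional model. The sketch below is an assumption-laden toy, not the operators of the text: it takes a single oscillator mode, represents $a$ in a truncated number basis where $a^*a$ is diagonal with entries $0, 1, 2, \dots$, and uses the one-mode Hamiltonian $\omega(a^*a + \tfrac12)$. Because this Hamiltonian is diagonal in that basis, conjugating $a$ by its time evolution only multiplies each matrix entry by a phase, and the phase is $e^{-it\omega}$ for every non-zero entry.

```python
import cmath
import math


def lowering(dim):
    """Truncated annihilation matrix: a[n-1][n] = sqrt(n) in the number basis."""
    a = [[0j] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = complex(math.sqrt(n))
    return a


def conjugate_by_oscillator(op, omega, t):
    """Return exp(itH) op exp(-itH) for the diagonal H = omega * (N + 1/2)."""
    dim = len(op)
    energies = [omega * (n + 0.5) for n in range(dim)]
    return [[cmath.exp(1j * t * energies[r]) * op[r][c] * cmath.exp(-1j * t * energies[c])
             for c in range(dim)] for r in range(dim)]


def max_deviation(dim, omega, t):
    """Largest entry of exp(itH) a exp(-itH) - exp(-it omega) a."""
    a = lowering(dim)
    evolved = conjugate_by_oscillator(a, omega, t)
    phase = cmath.exp(-1j * t * omega)
    return max(abs(evolved[r][c] - phase * a[r][c])
               for r in range(dim) for c in range(dim))


print(max_deviation(6, 1.3, 0.7) < 1e-12)  # True
```

The truncation does not spoil the identity here because only the diagonal Hamiltonian is exponentiated; products such as $a a^*$ would be affected by the cutoff.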
Proof. It suffices to prove this for $\varphi\in C_0^\infty$ supported in a single co-ordinate patch, since a general $\varphi\in C_0^\infty$ can be written as a sum of such functions. Introducing our usual local co-ordinates $x$ and $y$, we find that $e^{-itH_O}$ is simply a harmonic oscillator time evolution in the $y$ variables. Hence $e^{-itH_O}\varphi$ is in Schwartz space. This implies that $e^{-itH_O}\varphi\in D(H_B)$, and that the expansion of $H_B$ into a sum of terms involving products of $a_{\alpha,j}$ and $a^*_{\alpha,j}$ is valid when applied to $e^{-itH_O}\varphi$. To complete the proof, it remains to show that the identities (7.13) hold when applied to a function $\varphi$ in Schwartz space. This follows from
$$\frac{d}{dt}\, e^{itH_O} a_{\alpha,j} e^{-itH_O}\varphi = i e^{itH_O}[H_O, a_{\alpha,j}] e^{-itH_O}\varphi = i\omega_\alpha e^{itH_O}[a^*_{\alpha,j} a_{\alpha,j}, a_{\alpha,j}] e^{-itH_O}\varphi$$
$$= i\omega_\alpha e^{itH_O}[a^*_{\alpha,j}, a_{\alpha,j}]\, a_{\alpha,j} e^{-itH_O}\varphi = -i\omega_\alpha e^{itH_O} a_{\alpha,j} e^{-itH_O}\varphi.\qquad\Box$$

8. Proofs of Theorems in Quantum Mechanics

We begin with some analysis that allows us to transfer our considerations from $\mathbb{R}^{n+m}$ to the normal bundle $N\Sigma$. Let $d(x,\Sigma) = \inf\{\|x - \sigma\| : \sigma\in\Sigma\}$ denote the distance to $\Sigma$ in $\mathbb{R}^{n+m}$ and let $U_\delta = \{x\in\mathbb{R}^{n+m} : d(x,\Sigma) < \delta\}$ be the tubular neighbourhood of $\Sigma$ that is diffeomorphic to $N\Sigma_\delta$. The first proposition shows that the time evolution in $L^2(\mathbb{R}^{n+m})$ under $H_\lambda$ is approximately the same for large $\lambda$ as the time evolution in $L^2(U_\delta)$ under the same Hamiltonian, except with Dirichlet boundary conditions.

Proposition 8.1. Suppose that $W, V\in C^\infty(\mathbb{R}^{n+m})$ with $W\ge 0$ and $V$ bounded below. Suppose $W(x) = 0$ if and only if $x\in\Sigma$ and that $W(x)\ge w_0 > 0$ for large $|x|$. Suppose $\lambda\ge 1$, $\psi\in L^2(\mathbb{R}^{n+m})$, $\|\psi\| = 1$ and $\|H_\lambda\psi\|\le C_1\lambda^2$, where $H_\lambda = -\tfrac12\Delta + V + \lambda^4 W$. Then, given $\epsilon > 0$ there exists $C_2$ such that for all $t\in\mathbb{R}$,
$$\|F(d\ge\epsilon)\, e^{-itH_\lambda}\psi\| \le C_2\lambda^{-1}. \tag{8.1}$$
Here $F(\cdot)$ denotes multiplication by the characteristic function supported on the region indicated in the parentheses. Define $H_\lambda^\delta$ to be the operator in $L^2(U_\delta)$ given by $H_\lambda$ with Dirichlet boundary conditions on $\partial U_\delta$. Then for all $t\in[0,T]$ and $0 < \epsilon < \delta$,
$$\left\|F(d\le\epsilon)\, e^{-itH_\lambda}\psi - e^{-itH_\lambda^\delta} F(d\le\epsilon)\psi\right\| \le C_3\lambda^{-1/4}. \tag{8.2}$$
Here $C_2$ depends only on $C_1$ and $\epsilon$ and $C_3$ depends only on $C_1$, $T$ and $\epsilon$.

Remark. The power 1/4 in (8.2) is not optimal.
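Proof. By the assumption on $\psi$ and the Schwarz inequality $\langle\psi, H_\lambda\psi\rangle\le C_1\lambda^2$. Without loss we may assume that $V\ge 0$, so that
$$\tfrac12\|\nabla\psi\|^2\le C_1\lambda^2,\qquad \langle\psi, W\psi\rangle\le C_1\lambda^{-2}. \tag{8.3}$$
It follows that
$$C(\epsilon)\langle\psi, F(d\ge\epsilon)\psi\rangle \le \langle\psi, F(d\ge\epsilon)W\psi\rangle \le C_1\lambda^{-2}$$
which proves (8.1), since $e^{-itH_\lambda}\psi$ satisfies the same hypotheses as $\psi$. For $0 < \epsilon_1\le\alpha$ we will need the estimate
$$\|F(\epsilon_1\le d\le\alpha)\nabla\psi\| \le C_4\lambda^{1/2}, \tag{8.4}$$
where $C_4$ depends only on $\alpha$, $\epsilon_1$ and $C_1$. To prove this, choose a function $\chi\in C_0^\infty(\mathbb{R}^{n+m})$, $0\le\chi\le 1$, which is 1 in a neighbourhood of $\{x : \epsilon_1\le d(x,\Sigma)\le\alpha\}$ and vanishes in a neighbourhood of $\Sigma$. Then $\|F(\epsilon_1\le d\le\alpha)\nabla\psi\| = \|F(\epsilon_1\le d\le\alpha)\nabla(\chi\psi)\| \le \|\nabla(\chi\psi)\|$. The Schwarz inequality and integration by parts gives
$$\|\nabla(\chi\psi)\| \le \|\Delta(\chi\psi)\|^{1/2}\,\|\chi\psi\|^{1/2}$$
so that (8.4) follows from
$$\|\Delta(\chi\psi)\| \le C_5\lambda^2 \tag{8.5}$$
and (8.1). To prove (8.5) let $p = -i\nabla$ and calculate, as forms on $C_0^\infty\times C_0^\infty$,
$$H_\lambda^2 = \tfrac14|p|^4 + (V + \lambda^4 W)^2 + \sum_j p_j (V + \lambda^4 W) p_j - \tfrac12(\Delta V + \lambda^4\Delta W). \tag{8.6}$$
It follows from (8.6) and the fact that $C_0^\infty$ is a core for $H_\lambda$ that $\chi\psi\in D(H_\lambda)$ and
$$\left\|\tfrac12 p^2\chi\psi\right\|^2 \le \|H_\lambda(\chi\psi)\|^2 + C\lambda^4,$$
or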
√ 1 1 2 p χ ψ ≤ Cλ2 + Hλ ψ + [ p 2 , χ ]ψ. 2 2 The last term can be bounded by (8.3), yielding (8.5). Let χ˜ be a smooth function which satisfies 0 ≤ χ˜ ≤ F(d 0, lim sup F01 ψ, eitL0λ e−itLλ F1 ψ − F01 ψ, F1 ψ = 0. λ→∞ 0≤t≤T
(8.15)
We now need to show the quantum analogue of the fact in classical mechanics that the orbits stay in a bounded region of phase space if we watch the system for a time T < ∞ which is independent of λ. Using energy considerations it follows from Lemma 8.2 that n and Dy are bounded but only that Dx cannot grow faster than λ. We now seek a λ independent bound, showing that up to a fixed time T , not too much energy can be transferred from normal to tangential modes. In the quantum setting the statement F2 Dx χ e−itLOλ FO1 ψ < C,
(8.16)
where F2 is as in Lemma 8.2, will suffice. We will prove this estimate when LOλ = Lλ , since the other case when LOλ = L0λ is similar. Let {χk2 (σ )} be a partition of unity subordinate to a finite cover of co-ordinate charts. In other words, each χk2 is supported in a single co- ordinate chart, and k χk2 = 1. We may assume that each χk is a smooth function only of σ . Define Q= χk Dx∗ G−1 (x)Dx χk , k
where, in each term, $D_x$ and $x$ are defined in terms of the co-ordinates for the chart in which $\chi_k$ is supported. We now want to cut $Q$ off to the region where we have explicit expressions for the metric, and then add a constant to regain positivity. So let
$$\bar Q = F_2 Q F_2 + 1.$$
Notice that $Q$ and $\bar Q$ commute with $F_2$, since in local co-ordinates $F_2$ is a function of $y$ alone. It is not difficult to show that both $Q$ and $\bar Q$ are essentially self-adjoint on $C_0^\infty(N\Sigma)$. Define
$$q(t) = \langle F_1\psi, e^{itL_\lambda}\bar Q e^{-itL_\lambda} F_1\psi\rangle.$$
Then (8.7) follows from $\sup\{q(t) : t\in[0,T]\}\le C$. We will prove a differential inequality as in the classical case. We will need further estimates to bound the terms which arise when we compute $\dot q(t)$ and to prove an upper bound for $q(0)$.
Lemma 8.3. Suppose F1 is a smooth cutoff in the energy λ−2 Lλ . Then γ ¯ −1/2 ≤C nl (λ−1 Dx )α Dyβ Dx χj F2 F1 Q if l + |α| + |β| ≤ 2 and |γ | = 1. Proof. We use the Helffer–Sjöstrand formula (see [D]) F1 = g(z)(Rλ − z)−1 dz ∧ d z¯ , where we may take g ∈ C0∞ (R2 ) with |g(z)|| Im z|−N ≤ CN for any N . (We are using γ the fact that F1 (λ−2 Lλ ) = F˜1 (Rλ ) for F˜1 ∈ C0∞ (0, 2). Let A1 = nα (λ−1 Dx )β Dy χ with χ ∈ C ∞ (), supported in the j th co-ordinate patch, χ χ1 = χ1 , and let F2,1 be a smooth function of |n|/λ with F2,1 F2 = F2 . Then γ ¯ −1/2 = A1 F2,1 F1 Dxγ χj F2 Q ¯ −1/2 + A1 F2,1 [Dxγ χj F2 , F1 ]Q ¯ −1/2 . A1 Dx χj F2 F1 Q
Using (8.8), the first term is bounded by a constant times γ ¯ −1/2 ≤ C A1 F2,1 Rλ · Dx χj F2 Q
and it is thus sufficient to show γ
Rλ−1 [Dx χj F2 , F1 ] ≤ C. We compute from the Helffer–Sjöstrand formula γ
γ
Rλ−1 [Dx χj F2 , F1 ] ≤ C[Dx χj F2 , λ−2 Lλ ]Rλ .
(8.17)
For our present purposes we can write Lλ = (Dx − BDy )∗ E0 (Dx − BDy ) +
λ2 ∗ ω2 yj2 ) + E0 (Dy Dy + 2 j
and we thus obtain γ
γ
[Dx χj F2 , λ−2 Lλ ] = λ−1 Dx χj (∇F2 · Dy + Dy · ∇F2 ) γ
+ λ−2 [Dx χj , (Dx − BDy )∗ E0 (Dx − BDy )]F2 + λ−2 E0 . The first term gives a bounded contribution to (8.17) by Lemma 8.2. The second term can be written λ−1 (Dx − BDy )∗ E0 λ−1 (Dx − BDy ) + Dy∗ E0 λ−1 (Dx − BDy ) + λ−1 (Dx − BDy )∗ E0 Dy + λ−2 E0 (Dx − BDy ) χj F2 γ + λ−1 Dx (∂x χj )T E0 λ−1 (Dx − BDy ) + λ−1 (Dx − BDy )∗ E0 ∂x χj F2 and again this gives a bounded contribution to (8.17) by Theorem 8.2. # $
We now return to the proof of Theorem 4.1 and calculate
$$\dot q(t) = i\langle e^{-itL_\lambda}\psi, F_1[L_\lambda, \bar Q]F_1 e^{-itL_\lambda}\psi\rangle.$$
Let $F_{1,1}$ be a $C_0^\infty$ function of $\lambda^{-2}L_\lambda$ with slightly larger support than $F_1$, so that $F_1F_{1,1} = F_1$. We will show that
$$F_{1,1}[iL_\lambda, \bar Q]F_{1,1} \le C\bar Q \tag{8.18}$$
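so that $q(t)\le e^{Ct}q(0)$. First consider any term which arises when the cut-off F2 = F(|n|/λ 1, m > 0.

As a corollary of the formulas (4.13)–(4.15) we find the actions of the translation operators $T_\delta$ and $\tilde T_\delta$:
$$T_\delta(e_{k\delta+\alpha}) = (-1)^{k\theta(\alpha)} e_{(k+1)\delta+\alpha},\qquad T_\delta(e_{-k\delta-\alpha}) = (-1)^{(k+1)\theta(\alpha)} e_{-(k+1)\delta-\alpha},$$
$$T_\delta(e_{l\delta-\alpha}) = (-1)^{l\theta(\alpha)} e_{(l-1)\delta-\alpha},\qquad T_\delta(e_{-l\delta+\alpha}) = (-1)^{(l-1)\theta(\alpha)} e_{(-l+1)\delta+\alpha}, \tag{4.16}$$
$$T_\delta(e'_{m\delta}) = (-1)^{m\theta(\alpha)} e'_{m\delta},\qquad T_\delta(e'_{-m\delta}) = (-1)^{m\theta(\alpha)} e'_{-m\delta}$$
for $k\ge 0$, $l > 1$, $m > 0$, and
$$\tilde T_\delta(\tilde e_{n\delta+\alpha}) = (-1)^{(n-1)\theta(\alpha)}\tilde e_{(n-1)\delta+\alpha},\qquad \tilde T_\delta(\tilde e_{-n\delta-\alpha}) = (-1)^{n\theta(\alpha)}\tilde e_{(-n-1)\delta+\alpha},$$
$$\tilde T_\delta(\tilde e_{n\delta-\alpha}) = (-1)^{(n+1)\theta(\alpha)}\tilde e_{(n+1)\delta-\alpha},\qquad \tilde T_\delta(\tilde e_{-n\delta+\alpha}) = (-1)^{n\theta(\alpha)}\tilde e_{-(n+1)\delta+\alpha}, \tag{4.17}$$
$$\tilde T_\delta(\tilde e'_{n\delta}) = (-1)^{n\theta(\alpha)}\tilde e'_{n\delta},\qquad \tilde T_\delta(\tilde e'_{-n\delta}) = (-1)^{n\theta(\alpha)}\tilde e'_{-n\delta},$$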
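where $n > 0$. (Also see (3.15) and (3.16).) Using the formulas (4.16), (4.17) we can easily find the actions for the inverse translation operators $T_\delta^{-1}$, $\tilde T_\delta^{-1}$ and $T_{2\delta}^{-1}$, $\tilde T_{2\delta}^{-1}$. These actions are not written here. From the relations (4.16), (4.17) it is clear that the operators $T_\delta^{\pm1}$ and $\tilde T_\delta^{\pm1}$ can be used for construction of the Cartan–Weyl generators (4.3)–(4.6) starting from the Chevalley basis. In the case of the quantum untwisted affine algebras a similar procedure was applied in the paper [2].

Proposition 4.1. The root vectors (4.3)–(4.6) satisfy the following permutation relations:
$$k_d\, e_{n\delta\pm\alpha}\, k_d^{-1} = q^{n(d,\delta)} e_{n\delta\pm\alpha},\qquad k_d\, e'_{n\delta}\, k_d^{-1} = q^{n(d,\delta)} e'_{n\delta},$$
$$k_\gamma\, e_{n\delta\pm\alpha}\, k_\gamma^{-1} = q^{\pm(\gamma,\alpha)} e_{n\delta\pm\alpha},\qquad k_\gamma\, e'_{n\delta}\, k_\gamma^{-1} = e'_{n\delta} \tag{4.18}$$
for any $n\in\mathbb{Z}$ and any $\gamma\in\Delta_+$, and also
$$[e_{n\delta+\alpha}, e_{-n\delta-\alpha}] = (-1)^{n\theta(\alpha)}\,\frac{k_{n\delta+\alpha} - k^{-1}_{n\delta+\alpha}}{q - q^{-1}}\qquad (n\ge 0), \tag{4.19}$$
$$[e_{n\delta-\alpha}, e_{-n\delta+\alpha}] = (-1)^{(n-1)\theta(\alpha)}\,\frac{k_{n\delta-\alpha} - k^{-1}_{n\delta-\alpha}}{q - q^{-1}}\qquad (n > 0); \tag{4.20}$$
$$[e_{n\delta+\alpha}, e_{(n+2m-1)\delta+\alpha}]_q = (q_\alpha^2 - 1)\sum_{l=1}^{m-1} q_\alpha^{-l}\, e_{(n+l)\delta+\alpha}\, e_{(n+2m-1-l)\delta+\alpha}, \tag{4.21}$$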
Quantum Affine (Super)Algebras
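$$[e_{n\delta+\alpha}, e_{(n+2m)\delta+\alpha}]_q = (q_\alpha - 1)\, q_\alpha^{-m+1}\, e^2_{(n+m)\delta+\alpha} + (q_\alpha^2 - 1)\sum_{l=1}^{m-1} q_\alpha^{-l}\, e_{(n+l)\delta+\alpha}\, e_{(n+2m-l)\delta+\alpha} \tag{4.22}$$
for any integers $n\ge 0$, $m > 0$;
$$[e_{(n+2m-1)\delta-\alpha}, e_{n\delta-\alpha}]_q = -(q_\alpha^2 - 1)\sum_{l=1}^{m-1} q_\alpha^{-l}\, e_{(n+2m-1-l)\delta-\alpha}\, e_{(n+l)\delta-\alpha}, \tag{4.23}$$
$$[e_{(n+2m)\delta-\alpha}, e_{n\delta-\alpha}]_q = -(q_\alpha - 1)\, q_\alpha^{-m+1}\, e^2_{(n+m)\delta-\alpha} - (q_\alpha^2 - 1)\sum_{l=1}^{m-1} q_\alpha^{-l}\, e_{(n+2m-l)\delta-\alpha}\, e_{(n+l)\delta-\alpha} \tag{4.24}$$
for any integers $n, m > 0$;
$$[e_{-n\delta+\alpha}, e_{(n+2m-1)\delta+\alpha}] = -(-1)^{(n-1)\theta(\alpha)}(q_\alpha^2 - 1)\sum_{l=n}^{n+m-1} q_\alpha^{-l}\, k_{n\delta-\alpha}\, e_{(l-n)\delta+\alpha}\, e_{(n+2m-1-l)\delta+\alpha}$$
$$\qquad + (q_\alpha^2 - 1)\sum_{l=1}^{n-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, k_\delta^l\, e_{(l-n)\delta+\alpha}\, e_{(n+2m-1-l)\delta+\alpha}, \tag{4.25}$$
$$[e_{-n\delta+\alpha}, e_{(n+2m)\delta+\alpha}] = -(-1)^{(n-1)\theta(\alpha)}(q_\alpha^2 - 1)\sum_{l=n}^{n+m-1} q_\alpha^{-l}\, k_{n\delta-\alpha}\, e_{(l-n)\delta+\alpha}\, e_{(n+2m-l)\delta+\alpha}$$
$$\qquad + (q_\alpha^2 - 1)\sum_{l=1}^{n-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, k_\delta^l\, e_{(l-n)\delta+\alpha}\, e_{(n+2m-l)\delta+\alpha} - (-1)^{(n-1)\theta(\alpha)}(q_\alpha - 1)\, q_\alpha^{-m-n+1}\, k_{n\delta-\alpha}\, e^2_{m\delta+\alpha} \tag{4.26}$$
for any integers $n, m > 0$;
$$[e_{(n+2m-1)\delta-\alpha}, e_{-n\delta-\alpha}] = (-1)^{(n+1)\theta(\alpha)}(q_\alpha^2 - 1)\sum_{l=n+1}^{n+m-1} q_\alpha^{-l}\, e_{(n+2m-1-l)\delta-\alpha}\, e_{(l-n)\delta-\alpha}\, k^{-1}_{n\delta+\alpha}$$
$$\qquad - (q_\alpha^2 - 1)\sum_{l=1}^{n-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, e_{(n+2m-1-l)\delta-\alpha}\, e_{(l-n)\delta-\alpha}\, k_\delta^{-l}, \tag{4.27}$$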
S. M. Khoroshkin, J. Lukierski, V. N. Tolstoy
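$$[e_{(n+2m)\delta-\alpha}, e_{-n\delta-\alpha}] = (-1)^{(n+1)\theta(\alpha)}(q_\alpha^2 - 1)\sum_{l=n}^{n+m-1} q_\alpha^{-l}\, e_{(n+2m-l)\delta-\alpha}\, e_{(l-n)\delta-\alpha}\, k^{-1}_{n\delta+\alpha}$$
$$\qquad - (q_\alpha^2 - 1)\sum_{l=1}^{n-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, e_{(n+2m-l)\delta-\alpha}\, e_{(l-n)\delta-\alpha}\, k_\delta^{-l} + (-1)^{(n-1)\theta(\alpha)}(q_\alpha - 1)\, q_\alpha^{-m-n+1}\, e^2_{m\delta-\alpha}\, k^{-1}_{n\delta+\alpha} \tag{4.28}$$
for any integers $n\ge 0$, $m > 0$;
$$[e_{n\delta+\alpha}, e_{m\delta-\alpha}]_q = e'_{(n+m)\delta}\qquad (n\ge 0,\ m > 0), \tag{4.29}$$
$$[e_{n\delta+\alpha}, e_{-m\delta-\alpha}] = -(-1)^{(m+1)\theta(\alpha)}\, e'_{(n-m)\delta}\, k^{-1}_{m\delta+\alpha}\qquad (n > m\ge 0), \tag{4.30}$$
$$[e_{-m\delta+\alpha}, e_{n\delta-\alpha}] = -(-1)^{m\theta(\alpha)}\, k_{m\delta-\alpha}\, e'_{(n-m)\delta}\qquad (n > m > 0), \tag{4.31}$$
$$[e'_{n\delta}, e'_{m\delta}] = [e'_{-n\delta}, e'_{-m\delta}] = 0\qquad (n > 0,\ m > 0), \tag{4.32}$$
$$[e_{n\delta+\alpha}, e'_{m\delta}] = q_\alpha^{-m+1}\, a\, e_{(n+m)\delta+\alpha} + (q_\alpha^2 - 1)\sum_{l=1}^{m-1} q_\alpha^{-l}\, e_{(n+l)\delta+\alpha}\, e'_{(m-l)\delta} \tag{4.33}$$
for any integers $n\ge 0$, $m > 0$;
$$[e'_{m\delta}, e_{n\delta-\alpha}] = q_\alpha^{-m+1}\, a\, e_{(n+m)\delta-\alpha} + (q_\alpha^2 - 1)\sum_{l=1}^{m-1} q_\alpha^{-l}\, e'_{(m-l)\delta}\, e_{(n+l)\delta-\alpha} \tag{4.34}$$
for any integers $n, m > 0$;
$$[e_{-n\delta+\alpha}, e'_{m\delta}] = -(-1)^{(n-1)\theta(\alpha)} q_\alpha^{-m+1}\, a\, k_{n\delta-\alpha}\, e_{(m-n)\delta+\alpha} - (-1)^{(n-1)\theta(\alpha)}(q_\alpha^2 - 1)\, k_{n\delta-\alpha}\sum_{l=n}^{m-1} q_\alpha^{-l}\, e_{(l-n)\delta+\alpha}\, e'_{(m-l)\delta}$$
$$\qquad + (q_\alpha^2 - 1)\sum_{l=1}^{n-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, k_\delta^l\, e_{(l-n)\delta+\alpha}\, e'_{(m-l)\delta} \tag{4.35}$$
for any integers $m\ge n > 0$;
$$[e_{-n\delta+\alpha}, e'_{m\delta}] = (-1)^{m\theta(\alpha)} q_\alpha^{-m+1}\, a\, k_\delta^m\, e_{(m-n)\delta+\alpha} + (q_\alpha^2 - 1)\sum_{l=1}^{m-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, k_\delta^l\, e_{(l-n)\delta+\alpha}\, e'_{(m-l)\delta} \tag{4.36}$$
for any integers $n > m > 0$;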
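$$[e'_{m\delta}, e_{-n\delta-\alpha}] = -(-1)^{(n+1)\theta(\alpha)} q_\alpha^{-m+1}\, a\, e_{(m-n)\delta-\alpha}\, k^{-1}_{n\delta+\alpha} - (-1)^{(n+1)\theta(\alpha)}(q_\alpha^2 - 1)\sum_{l=n+1}^{m-1} q_\alpha^{-l}\, e'_{(m-l)\delta}\, e_{(l-n)\delta-\alpha}\, k^{-1}_{n\delta+\alpha}$$
$$\qquad + (q_\alpha^2 - 1)\sum_{l=1}^{n} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, e'_{(m-l)\delta}\, e_{(l-n)\delta-\alpha}\, k_\delta^{-l} \tag{4.37}$$
for any integers $m > n\ge 0$;
$$[e'_{m\delta}, e_{-n\delta-\alpha}] = (-1)^{m\theta(\alpha)} q_\alpha^{-m+1}\, a\, e_{(m-n)\delta-\alpha}\, k_\delta^{-m} + (q_\alpha^2 - 1)\sum_{l=1}^{m-1} (-1)^{l\theta(\alpha)} q_\alpha^{-l}\, e'_{(m-l)\delta}\, e_{(l-n)\delta-\alpha}\, k_\delta^{-l} \tag{4.38}$$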
for any integers $n\ge m > 0$. Here in the relations (4.21)–(4.38) and in what follows $q_\alpha := (-1)^{\theta(\alpha)} q^{(\alpha,\alpha)}$.

Outline of proof. First of all, the formulas (4.18) are trivial. The relations (4.19) and (4.20) are obtained by application of the translation operators $T_\delta^n$ and $T_\delta^{-n}$ to the relations (2.3). Further, in terms of the generators (4.3)–(4.6) the relation (2.5) means that $[e_\alpha, e_{\delta+\alpha}]_q = 0$. Applying to it the operator $T_\delta^n$, we obtain the relation (4.21) for $m = 1$. In the case $m > 1$ the formulas (4.21) and (4.22) are proved for arbitrary $m$ by induction. If we apply the operator $T_\delta^{-k}$ to the relations (4.21) and (4.22) for $n = 0$, then in the case $k < m$ we obtain the relations (4.25) and (4.26), in the case $m < k < 2m$ we obtain the relations which are obtained from (4.27) and (4.28) by the conjugation "$*$", and finally for $k > 2m$ we get the relations which are obtained from (4.23) and (4.24) by the conjugation "$*$". Further, the relation (4.29) for $n = 0$ is trivial (see (4.6)). Applying to (4.29) with $n = 0$ the operators $T_\delta^n$, we can obtain for any $n > 0$ and $m > 0$ the relation (4.29) as well as the relation (4.30). The relation (4.31) can be obtained from (4.29) by repeated application of the operator $T_\delta^{-1}$. The relations (4.33) in the case $n = 0$ and (4.34) in the case $n = 1$ are proved by direct verification with the help of previous results. Repeated application of the operators $T_\delta^{\pm1}$ to these relations provides the general case $n, m > 0$. The relation (4.32) is proved by direct verification with the help of the relations (4.33) and (4.34). Finally, the relations (4.35)–(4.38) can be obtained from (4.33) and (4.34) by repeated application of the operator $T_\delta^{-1}$.

The imaginary root vectors $e'_{n\delta}$ do not satisfy the relations of the type (4.19) and therefore we introduce new imaginary root vectors $e_{\pm n\delta}$ by the following (Schur) relations:
$$e'_{n\delta} = \sum_{p_1+2p_2+\dots+np_n = n} \frac{\left((-1)^{\theta(\alpha)}(q - q^{-1})\right)^{\sum p_i - 1}}{p_1!\cdots p_n!}\, e_\delta^{p_1}\cdots e_{n\delta}^{p_n}. \tag{4.39}$$
In terms of generating functions
$$E'(u) = (-1)^{\theta(\alpha)}(q - q^{-1})\sum_{n\ge 1} e'_{n\delta}\, u^{-n}, \tag{4.40}$$
$$E(u) = (-1)^{\theta(\alpha)}(q - q^{-1})\sum_{n\ge 1} e_{n\delta}\, u^{-n} \tag{4.41}$$
the relation (4.39) may be rewritten in the form
$$E'(u) = -1 + \exp E(u) \tag{4.42}$$
or
$$E(u) = \ln(1 + E'(u)). \tag{4.43}$$
This provides a formula inverse to (4.39),
$$e_{n\delta} = \sum_{p_1+2p_2+\dots+np_n = n} \frac{\left((-1)^{\theta(\alpha)}(q^{-1} - q)\right)^{\sum p_i - 1}\left(\sum_{i=1}^n p_i - 1\right)!}{p_1!\cdots p_n!}\, (e'_\delta)^{p_1}\cdots (e'_{n\delta})^{p_n}. \tag{4.44}$$
The new root vectors corresponding to negative roots are obtained by the Cartan conjugation ($*$):
$$e_{-n\delta} = (e_{n\delta})^*. \tag{4.45}$$
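Since the imaginary root vectors with positive index commute among themselves by (4.32), the Schur relations (4.39) and their inverse (4.44) can be sanity-checked with ordinary commuting numbers in place of the root vectors. The sketch below does this in exact rational arithmetic. It is an illustration under assumptions: the function names are hypothetical, the scalar `c` stands for $(-1)^{\theta(\alpha)}(q - q^{-1})$, and the sample values are arbitrary.

```python
from fractions import Fraction
from math import factorial


def exponent_vectors(n):
    """Yield all (p_1, ..., p_n) of non-negative integers with p_1 + 2 p_2 + ... + n p_n = n."""
    def rec(i, remaining):
        if i > n:
            if remaining == 0:
                yield ()
            return
        for p in range(remaining // i + 1):
            for tail in rec(i + 1, remaining - i * p):
                yield (p,) + tail
    return rec(1, n)


def schur_sum(values, c, weight):
    """Sum of weight(k) * c^(k-1) * prod(values[i]^p_i / p_i!) with k = sum of the p_i."""
    n = len(values)
    total = Fraction(0)
    for p in exponent_vectors(n):
        k = sum(p)
        term = Fraction(weight(k)) * c ** (k - 1)
        for v, e in zip(values, p):
            term *= v ** e / Fraction(factorial(e))
        total += term
    return total


def primed_from_new(e, c):
    """Commuting-number version of (4.39): e'_n from e_1, ..., e_n."""
    return [schur_sum(e[:n], c, lambda k: 1) for n in range(1, len(e) + 1)]


def new_from_primed(e_primed, c):
    """Commuting-number version of (4.44): note the sign change c -> -c and the (k-1)! weight."""
    return [schur_sum(e_primed[:n], -c, lambda k: factorial(k - 1))
            for n in range(1, len(e_primed) + 1)]


e = [Fraction(1, 2), Fraction(-1, 3), Fraction(2, 5), Fraction(3, 7)]
c = Fraction(3, 2)
print(new_from_primed(primed_from_new(e, c), c) == e)  # True
```

The round trip returns the original values exactly, which is what the pair (4.42), (4.43) predicts: one map is the exponential and the other the logarithm of the same generating series.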
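Proposition 4.2. The new root vectors $e_{\pm n\delta}$ satisfy the following commutation relations:
$$[e_{n\delta+\alpha}, e_{m\delta}] = (-1)^{(m-1)\theta(\alpha)}\, a(m)\, e_{(n+m)\delta+\alpha}\qquad (n\ge 0,\ m > 0), \tag{4.46}$$
$$[e_{m\delta}, e_{n\delta-\alpha}] = (-1)^{(m-1)\theta(\alpha)}\, a(m)\, e_{(n+m)\delta-\alpha}\qquad (n, m > 0), \tag{4.47}$$
$$[e_{-n\delta+\alpha}, e_{m\delta}] = -(-1)^{(n+m)\theta(\alpha)}\, a(m)\, k_{n\delta-\alpha}\, e_{(m-n)\delta+\alpha}\qquad (m\ge n > 0), \tag{4.48}$$
$$[e_{-n\delta+\alpha}, e_{m\delta}] = (-1)^{\theta(\alpha)}\, a(m)\, k_\delta^m\, e_{(m-n)\delta+\alpha}\qquad (n > m > 0), \tag{4.49}$$
$$[e_{m\delta}, e_{-n\delta-\alpha}] = -(-1)^{(n+m)\theta(\alpha)}\, a(m)\, e_{(m-n)\delta-\alpha}\, k^{-1}_{n\delta+\alpha}\qquad (m > n\ge 0), \tag{4.50}$$
$$[e_{m\delta}, e_{-n\delta-\alpha}] = (-1)^{\theta(\alpha)}\, a(m)\, e_{(m-n)\delta-\alpha}\, k_\delta^{-m}\qquad (n\ge m > 0), \tag{4.51}$$
$$[e_{n\delta}, e_{-m\delta}] = \delta_{nm}\, a(m)\,\frac{k_\delta^m - k_\delta^{-m}}{q - q^{-1}}\qquad (n, m > 0), \tag{4.52}$$
where
$$a(m) := \frac{q^{m(\alpha,\alpha)} - q^{-m(\alpha,\alpha)}}{m(q - q^{-1})}. \tag{4.53}$$
This can be proved by direct calculation, applying the relations of Proposition 4.1 and the actions of the translation operators $T_\delta^{\pm1}$. All the relations of Propositions 4.1 and 4.2, together with the ones obtained from them by the conjugation, describe a complete list of the permutation relations of the Cartan–Weyl bases corresponding to the "direct" normal ordering (4.1). Applying to these relations the Dynkin involution $\tau$, it is easy to obtain these results for the "inverse" normal ordering (4.2).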
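5. Extremal Projector for $U_q(A_1^{(1)})$ and $U_q(C(2)^{(2)})$

A general formula for the extremal projector for quantized contragredient Lie (super)algebras of finite growth was presented in Refs. [17, 10, 11]. Here we specialize this result to our case $U_q(g)$, where $g = A_1^{(1)}, C(2)^{(2)}$. By definition, the extremal projector for $U_q(g)$ is a nonzero element $p := p(U_q(g))$ of the Taylor extension $T_q(g)$ of $U_q(g)$ (see Refs. [17, 10, 11]), satisfying the equations
$$e_\alpha\, p = p\, e_{-\alpha} = 0,\qquad e_{\delta-\alpha}\, p = p\, e_{-\delta+\alpha} = 0,\qquad p^2 = p. \tag{5.1}$$
The explicit expression of the extremal projector $p$ for our case $U_q(g)$ can be presented as follows:
$$p = p_+\, p_0\, p_-, \tag{5.2}$$
where the factors $p_+$, $p_0$ and $p_-$ have the following form:
$$p_+ = \prod_{n\ge 0}^{\rightarrow} p_{n\delta+\alpha},\qquad p_0 = \prod_{n\ge 1} p_{n\delta},\qquad p_- = \prod_{n\ge 1}^{\leftarrow} p_{n\delta-\alpha}. \tag{5.3}$$
The elements $p_\gamma$ are given by the formulas
$$p_{n\delta+\alpha} = \sum_{m=0}^{\infty} \frac{(-1)^m}{(m)_{\bar q_\alpha}!}\, \varphi^+_{n,m}\, e^m_{-n\delta-\alpha}\, e^m_{n\delta+\alpha}, \tag{5.4}$$
$$p_{n\delta} = \sum_{m=0}^{\infty} \frac{(-1)^m}{m!}\, \varphi^0_{n,m}\, e^m_{-n\delta}\, e^m_{n\delta}, \tag{5.5}$$
$$p_{n\delta-\alpha} = \sum_{m=0}^{\infty} \frac{(-1)^m}{(m)_{\bar q_\alpha}!}\, \varphi^-_{n,m}\, e^m_{-n\delta+\alpha}\, e^m_{n\delta-\alpha}, \tag{5.6}$$
where the coefficients $\varphi^+_{n,m}$, $\varphi^0_{n,m}$ and $\varphi^-_{n,m}$ are determined as follows:
$$\varphi^+_{n,m} = \frac{(-1)^{mn\theta(\alpha)}(q - q^{-1})^m\, q^{-m\left(\frac{m-1}{4}+n\right)(\alpha,\alpha)}}{\prod_{r=1}^{m}\left( k_{n\delta+\alpha}\, q^{\left(n+\frac12+\frac r2\right)(\alpha,\alpha)} - (-1)^{(r-1)\theta(\alpha)}\, k^{-1}_{n\delta+\alpha}\, q^{-\left(n+\frac12+\frac r2\right)(\alpha,\alpha)} \right)}, \tag{5.7}$$
$$\varphi^0_{n,m} = \frac{n^m (q - q^{-1})^{n+m}\, q^{-mn(\alpha,\alpha)}}{\left(q^{n(\alpha,\alpha)} - q^{-n(\alpha,\alpha)}\right)^m \left(k_\delta^n\, q^{n(\alpha,\alpha)} - k_\delta^{-n}\, q^{-n(\alpha,\alpha)}\right)^m}, \tag{5.8}$$
$$\varphi^-_{n,m} = \frac{(-1)^{m(n-1)\theta(\alpha)}(q - q^{-1})^m\, q^{-m\left(\frac{m-5}{4}+n\right)(\alpha,\alpha)}}{\prod_{r=1}^{m}\left( k_{n\delta-\alpha}\, q^{\left(n-\frac12+\frac r2\right)(\alpha,\alpha)} - (-1)^{(r-1)\theta(\alpha)}\, k^{-1}_{n\delta-\alpha}\, q^{-\left(n-\frac12+\frac r2\right)(\alpha,\alpha)} \right)}. \tag{5.9}$$
Here in the relations (5.4), (5.6) and in what follows we use the notation $\bar q_\alpha := (-1)^{\theta(\alpha)} q^{-(\alpha,\alpha)}$, and the symbol $(m)_{\bar q_\alpha}$ is defined by the formula (6.4). Acting by the extremal projector $p$ on any highest weight $U_q(g)$-module $M$ we obtain a space $M^0 = pM$ of highest weight vectors for $M$ if $pM$ has no singularities. A concrete example of the application of the extremal projector for the case of the quantum algebra $U_q(gl(n,\mathbb{C}))$ can be found in Ref. [17].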
6. Universal R-Matrix for $U_q(A_1^{(1)})$ and $U_q(C(2)^{(2)})$

Any quantum (super)algebra $U_q(g)$ is a non-cocommutative Hopf (super)algebra which has the intertwining operator called the universal R-matrix. By definition [5], the universal R-matrix for the Hopf (super)algebra $U_q(g)$ is an invertible element $R$ of the Taylor extension $T_q(g)\otimes T_q(g)$ of $U_q(g)\otimes U_q(g)$ (see Refs. [11–13]), satisfying the equations
$$\tilde\Delta_q(a) = R\,\Delta_q(a)\,R^{-1}\qquad \forall\, a\in U_q(g), \tag{6.1}$$
$$(\Delta_q\otimes\mathrm{id})R = R^{13}R^{23},\qquad (\mathrm{id}\otimes\Delta_q)R = R^{13}R^{12}, \tag{6.2}$$
where $\tilde\Delta_q$ is the opposite comultiplication: $\tilde\Delta_q = \sigma\Delta_q$, $\sigma(a\otimes b) = (-1)^{\deg a\,\deg b}\, b\otimes a$ for all homogeneous elements $a, b\in U_q(g)$. In the relation (6.2) we use the standard notations $R^{12} = \sum a_i\otimes b_i\otimes\mathrm{id}$, $R^{13} = \sum a_i\otimes\mathrm{id}\otimes b_i$, $R^{23} = \sum \mathrm{id}\otimes a_i\otimes b_i$ if $R$ has the form $R = \sum a_i\otimes b_i$. We employ the following standard notation for the q-exponential:
$$\exp_q(x) := 1 + x + \frac{x^2}{(2)_q!} + \dots + \frac{x^n}{(n)_q!} + \dots = \sum_{n\ge 0}\frac{x^n}{(n)_q!}, \tag{6.3}$$
where
$$(n)_q := \frac{q^n - 1}{q - 1}. \tag{6.4}$$
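To make the notation (6.3), (6.4) concrete, here is a minimal numerical sketch for a real number $x$ in place of an algebra element. The function names and the truncation length are assumptions of this illustration. It checks two standard facts about this q-exponential: it approaches the ordinary exponential as $q\to 1$, and $\exp_q(x)\exp_{q^{-1}}(-x) = 1$ wherever both series converge.

```python
import math


def q_number(n, q):
    """(n)_q = (q^n - 1) / (q - 1), as in (6.4); requires q != 1."""
    return (q ** n - 1.0) / (q - 1.0)


def q_exp(x, q, terms=60):
    """Partial sum of exp_q(x) = sum_{n >= 0} x^n / (n)_q!, as in (6.3)."""
    total = 1.0
    term = 1.0
    for n in range(1, terms):
        term *= x / q_number(n, q)  # builds x^n / (n)_q! incrementally
        total += term
    return total


# For q = 1/2 the series converges for |x| < 1 / (1 - q) = 2; x = 0.3 is well inside.
product = q_exp(0.3, 0.5) * q_exp(-0.3, 2.0)
near_classical = q_exp(1.0, 1.0 + 1e-7)
print(abs(product - 1.0) < 1e-12, abs(near_classical - math.e) < 1e-5)  # True True
```

The inverse relation is the scalar shadow of the way factors such as $R_\gamma$ are inverted, with $q$ replaced by $q^{-1}$ and the argument negated.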
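A general formula for the universal R-matrix $R$ for quantized contragredient Lie (super)algebras was presented in Refs. [11–13]. Here we specialize this result to our case $U_q(g)$, where $g = A_1^{(1)}, C(2)^{(2)}$. The explicit expression of the universal R-matrix $R$ for our case $U_q(g)$ can be presented as follows:
$$R = R_+ R_0 R_- K. \tag{6.5}$$
Here the factors $K$ and $R_\pm$ have the following form:
$$K = q^{\frac{1}{(\alpha,\alpha)} h_\alpha\otimes h_\alpha + h_\delta\otimes h_d + h_d\otimes h_\delta}, \tag{6.6}$$
$$R_+ = \prod_{n\ge 0}^{\rightarrow} R_{n\delta+\alpha},\qquad R_- = \prod_{n\ge 1}^{\leftarrow} R_{n\delta-\alpha}. \tag{6.7}$$
The elements $R_\gamma$ are given by the formula
$$R_\gamma = \exp_{\bar q_\gamma}\left( A(\gamma)(q - q^{-1})(e_\gamma\otimes e_{-\gamma}) \right), \tag{6.8}$$
where
$$A(\gamma) = \begin{cases} (-1)^{n\theta(\alpha)} & \text{if } \gamma = n\delta+\alpha,\\ (-1)^{(n-1)\theta(\alpha)} & \text{if } \gamma = n\delta-\alpha.\end{cases} \tag{6.9}$$
Finally, the factor $R_0$ is defined as follows:
$$R_0 = \exp\left( (q - q^{-1})\sum_{n>0} d(n)\, e_{n\delta}\otimes e_{-n\delta} \right), \tag{6.10}$$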
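where $d(n)$ is the inverse to $a(n)$, i.e.
$$d(n) = \frac{n(q - q^{-1})}{q^{n(\alpha,\alpha)} - q^{-n(\alpha,\alpha)}}. \tag{6.11}$$

7. The "New Realization"

Let us denote by $d$ the Cartan element $h_d$ and by $c$ the Cartan element $h_\delta$, emphasizing that $d$ defines homogeneous gradation of the algebra and $k_\delta = q^{h_\delta}$ is the central element. It will be convenient in the following to add its square roots $q^{\pm c/2} = k_\delta^{\pm 1/2}$. Let us introduce the new notations: $e_n := e_{n\delta+\alpha}$ $(n\ge 0)$, $e_{-n} := -(-1)^{(n-1)\theta(\alpha)} k_{-n\delta+\alpha}\, e_{-n\delta+\alpha}$ $(n > 0)$, and $f_n := -e_{n\delta-\alpha}\, k_{n\delta-\alpha}$ $(n > 0)$, $f_{-n} := (-1)^{(n+1)\theta(\alpha)} e_{-n\delta-\alpha}$ $(n\ge 0)$. We also put $a_n := e_{n\delta}\, q^{nc/2}$ $(n\ge 1)$, and $a_{-n} := (-1)^{n\theta(\alpha)} e_{-n\delta}\, q^{-nc/2}$ $(n\ge 1)$. Collect the elements $e_n$, $f_n$ $(n\in\mathbb{Z})$ and $a_{\pm n}$ $(n\ge 1)$ into the generating functions ("fields")
$$e(z) = \sum_{n\in\mathbb{Z}} e_n z^{-n},\qquad f(z) = \sum_{n\in\mathbb{Z}} f_n z^{-n},$$
$$\psi_+(z) = k_\alpha^{-1}\exp\left( (-1)^{\theta(\alpha)}(q - q^{-1})\sum_{n=1}^{\infty} a_n z^{-n} \right),\qquad \psi_-(z) = k_\alpha\exp\left( (-1)^{\theta(\alpha)}(q^{-1} - q)\sum_{n=1}^{\infty} a_{-n} z^{n} \right), \tag{7.1}$$
such that
$$\deg e(z) = \deg f(z) = \theta(\alpha),\qquad \deg\psi_\pm(z) = 0. \tag{7.2}$$
These fields satisfy the following conjugation conditions with respect to graded conjugation "$\ddagger$":
$$(e(z))^\ddagger = f(z^{-1}),\qquad (f(z))^\ddagger = (-1)^{\theta(\alpha)} e(z^{-1}),\qquad (\psi_+(z))^\ddagger = \psi_-(z^{-1}),\qquad (\psi_-(z))^\ddagger = \psi_+(z^{-1}), \tag{7.3}$$
and have the following symmetry with respect to the translation operator $T_\delta$:
$$T_\delta(e(z)) = (-1)^{\theta(\alpha)} z\, e((-1)^{\theta(\alpha)} z),\qquad T_\delta(f(z)) = (-1)^{\theta(\alpha)} z^{-1} f((-1)^{\theta(\alpha)} z),$$
$$T_\delta(\psi_+(z)) = q^{-c}\psi_+((-1)^{\theta(\alpha)} z),\qquad T_\delta(\psi_-(z)) = q^{c}\psi_-((-1)^{\theta(\alpha)} z). \tag{7.4}$$

Proposition 7.1. In terms of the fields (7.1) the relations of Sect. 4 can be rewritten in the following compact form:
$$[q^c, \text{everything}] = 0,\qquad u^d\varphi(v)u^{-d} = \varphi(uv), \tag{7.5}$$
where $\varphi(v) = e(v), f(v), \psi_\pm(v)$, and also
$$\psi_\pm(u)\psi_\pm(v) = \psi_\pm(v)\psi_\pm(u), \tag{7.6}$$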
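$$(u - \bar q_\alpha v)\, e(u)e(v) = (\bar q_\alpha u - v)\, e(v)e(u), \tag{7.7}$$
$$(u - q_\alpha v)\, f(u)f(v) = (q_\alpha u - v)\, f(v)f(u), \tag{7.8}$$
$$\psi_\pm(u)\, e(v)\, \psi_\pm(u)^{-1} = (-1)^{\theta(\alpha)}\,\frac{\bar q_\alpha q^{\mp c/2} u - v}{q^{\mp c/2} u - \bar q_\alpha v}\, e(v), \tag{7.9}$$
$$\psi_\pm(u)\, f(v)\, \psi_\pm(u)^{-1} = (-1)^{\theta(\alpha)}\,\frac{q_\alpha q^{\pm c/2} u - v}{q^{\pm c/2} u - q_\alpha v}\, f(v), \tag{7.10}$$
$$\psi_+(u)^{-1}\psi_-(v)\psi_+(u)\psi_-(v)^{-1} = \frac{(q^c u - q_\alpha v)(q^{-c} u - \bar q_\alpha v)}{(q^c v - \bar q_\alpha u)(q^{-c} v - q_\alpha u)}, \tag{7.11}$$
$$[e(u), f(v)] = \frac{1}{q - q^{-1}}\left( \delta\!\left(\frac{u}{v} q^{-c}\right)\psi_-(v q^{c/2}) - \delta\!\left(\frac{u}{v} q^{c}\right)\psi_+(u q^{c/2}) \right). \tag{7.12}$$
Here in (7.12) $\delta(z) = \sum_{n\in\mathbb{Z}} z^n$, and the brackets $[\cdot,\cdot]$ in the relation (7.12) mean the supercommutator:
$$[e(u), f(v)] = e(u)f(v) - (-1)^{\theta(\alpha)} f(v)e(u). \tag{7.13}$$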
The given description is called the "new realization", or the current realization of the quantum affine superalgebras $U_q(A_1^{(1)})$ and $U_q(C(2)^{(2)})$. It should be noted that the relations (7.1) and (7.6)–(7.12) differ from the corresponding relations of Refs. [3, 5, 20] by replacement of $q$ by $q^{-1}$. The current realization possesses its own graded comultiplication structure, different from (2.12):
$$\Delta_q^{(D)}(c) = c\otimes 1 + 1\otimes c,\qquad \Delta_q^{(D)}(d) = d\otimes 1 + 1\otimes d,$$
$$\Delta_q^{(D)}(\psi_\pm(z)) = \psi_\pm(z q^{\pm c_2/2})\otimes\psi_\pm(z q^{\mp c_1/2}),$$
$$\Delta_q^{(D)}(e(z)) = e(z)\otimes 1 + \psi_-(z q^{c_1/2})\otimes e(z q^{c_1}),$$
$$\Delta_q^{(D)}(f(z)) = f(z q^{c_2})\otimes\psi_+(z q^{c_2/2}) + 1\otimes f(z), \tag{7.14}$$
$$S_q^{(D)}(c) = -c,\qquad S_q^{(D)}(d) = -d,\qquad S_q^{(D)}(\psi_\pm(z)) = \psi_\pm(z)^{-1},$$
$$S_q^{(D)}(e(z)) = -\psi_-(z q^{-c/2})^{-1}\, e(z q^{-c}),\qquad S_q^{(D)}(f(z)) = -f(z q^{-c})\,\psi_+(z q^{-c/2})^{-1}, \tag{7.15}$$
$$\varepsilon(c) = \varepsilon(d) = \varepsilon(e(z)) = \varepsilon(f(z)) = 0,\qquad \varepsilon(\psi_\pm(z)) = 1. \tag{7.16}$$
Here $\Delta_q^{(D)}$, $S_q^{(D)}$, and $\varepsilon$ are the comultiplication, antipode and counit respectively. The two comultiplications $\Delta_q$ and $\Delta_q^{(D)}$ are related by the twist [13]:
$$\Delta_q^{(D)}(x) = F^{-1}\Delta_q(x)F, \tag{7.17}$$
where $F = R_+^{21}$, with $R_+$ given by (6.7)–(6.9), such that the universal R-matrix for the comultiplication $\Delta_q^{(D)}$ equals
$$R^{(D)} = R_0 R_- K R_+^{21} \tag{7.18}$$
with the factors from (6.5). In the generators $e_n$, $f_n$ and $a_n$ it can be rewritten as follows
$$R^{(D)} = \mathcal{K}\bar R, \tag{7.19}$$
where
$$\mathcal{K} = q^{\frac{h_\alpha\otimes h_\alpha}{(\alpha,\alpha)}}\, q^{\frac12(c\otimes d + d\otimes c)}\exp\left( (q - q^{-1})\sum_{n=1}^{\infty} d(n)\, a_n\otimes a_{-n} \right) q^{\frac12(c\otimes d + d\otimes c)}, \tag{7.20}$$
$$\bar R = \prod_{n\in\mathbb{Z}}^{\rightarrow}\exp_{\bar q_\alpha}\left( (q^{-1} - q)\, f_{-n}\otimes e_n \right), \tag{7.21}$$
and
$$d(n) = \frac{n(q - q^{-1})}{q_\alpha^n - q_\alpha^{-n}}. \tag{7.22}$$
It is possible to give another presentation of the element $\bar R$ in the completed algebras $\bar U(g)$, where $g$ is either $A_1^{(1)}$ or $C(2)^{(2)}$ [3, 4]. The completion is done with respect to open neighborhoods of zero $\bar U_r$ built from the $U_s$ with $s > r$, where $U_s$ consists of all the elements from $U(g)$ of degree $s$. The completed algebra acts on (infinite-dimensional) representations of highest weight and admits the series over monomials $x_{i_1}x_{i_2}\cdots x_{i_n}$, $i_1\le i_2\le\dots\le i_n$, with $x = e, f, a$ and fixed $\sum i_k$. The matrix coefficients of the products of the currents $e(z_1)e(z_2)\cdots e(z_n)$ and $f(z_1)f(z_2)\cdots f(z_n)$, defined originally as formal series, converge to functions meromorphic in $\mathbb{C}^n$ with poles at $z_i = 0$ and $z_i = q_\alpha^{\mp1}z_j$, $i\le j$. Let $t(z) = (q - q^{-1})\, f(z)\otimes e(z)$. As before, we understand the product $t(z_1)\cdots t(z_n)$ as an operator-valued meromorphic function in $(\mathbb{C}^*)^n$ with simple poles at $z_i = q_\alpha^{\mp1}z_j$, $i\ne j$. Define
$$\bar R = 1 + \sum_{n>0}\frac{1}{n!(2\pi i)^n}\oint\cdots\oint_{D_n}\frac{dz_1}{z_1}\cdots\frac{dz_n}{z_n}\, t(z_1)\cdots t(z_n), \tag{7.23}$$
and the integration region $D_n$ is defined as $D_n = \{|z_i| = 1,\ i = 1,\dots,n\}$ for $|q| < 1$ and, more generally, by
$$D_n = \Big\{ \Big| z_i\prod_{j=1,\dots,n,\ j\ne i}(z_i - q_\alpha z_j) \Big| = 1,\ i = 1,\dots,n \Big\} \tag{7.24}$$
for any $q$ such that $q_\alpha^N\ne 1$, $N\in\mathbb{Z}\setminus\{0\}$.
Proposition 7.2. The action of the tensor $R = \mathcal{K}\bar R$ in the tensor product of highest weight modules is well defined and coincides with the action of the universal R-matrix (7.19).

The integrals in (7.23) can be computed explicitly. Let us put by induction
$$t^{(n)}(z) = -\operatorname*{Res}_{z_1 = z\bar q_\alpha^{\,2n-2}}\ t(z_1)\, t^{(n-1)}(z)\,\frac{dz_1}{z_1}, \tag{7.25}$$
where $t^{(1)}(z) = t(z)$. In the components the fields $t^{(n)}(z)$ look as follows:
$$t^{(n)}(z) = C_n\, e^{(n)}(z)\otimes f^{(n)}(z), \tag{7.26}$$
where
$$C_n = (-1)^{(n-1)\theta(\alpha)}(q - q^{-1})^n\, \tilde q_\alpha^{-\frac{n(n-1)}{2}}(\tilde q_\alpha - 1)^{n-1}(n-1)_{\tilde q_\alpha}!\,(n)_{\tilde q_\alpha}!,$$
$$e^{(n)}(z) = e(z)\, e(\bar q_\alpha z)\, e(\bar q_\alpha^2 z)\cdots e(\bar q_\alpha^{n-2} z)\, e(\bar q_\alpha^{n-1} z), \tag{7.27}$$
$$f^{(n)}(z) = f(\bar q_\alpha^{n-1} z)\, f(\bar q_\alpha^{n-2} z)\cdots f(\bar q_\alpha^2 z)\, f(\bar q_\alpha z)\, f(z),$$
such that e(n) (z) =
m∈ Z
zq¯αn
m
λ1 ≥···≥λn , λ1 +···+λn =m
qαλ1 +2λ2 +...nλn · · · eλ1 , e e (λj − λj +1 )q¯α ! λn λn−1
j∈ Z
(7.28) f (n) (z) =
m∈ Z
zqα
m
λ1 ≥···≥λn , λ1 +···+λn =m
qαλ1 +2λ2 +...nλn f f · · · fλ1 . (λj − λj +1 )qα ! λn λn−1
j∈ Z
Here $\lambda'_j = \#\{k : \lambda_k \ge j\}$, and $\tilde{q}_\alpha := q^{(\alpha,\alpha)}$. The product in the denominator is finite, since there are only finitely many distinct $\lambda'_j$ for a given choice of $\lambda_k$. Then, repeating the calculations in [4], we get a vertex type presentation of the element $\bar{R}$:
$$\bar{R} = \exp\Bigl(\sum_{n>0} \frac{1}{n}\, I_n\Bigr), \qquad (7.29)$$
where the operators
$$I_n = \oint \frac{t^{(n)}(z)\, dz}{2\pi i z} \qquad (7.30)$$
commute among themselves: $[I_n, I_m] = 0$, $n, m > 0$.
The vertex operator presentation (7.29) is convenient for applications to integrable representations: it is expressed through integrals over the fields, whose number is precisely $k$ for level $k$ integrable representations.
8. Final Remarks

The aim of this paper is to describe in detail, and in a unified way, the q-deformed untwisted affine algebra $U_q(\widehat{sl}(2)) = U_q(A_1^{(1)})$ and twisted superalgebra $U_q(osp(2|2)^{(2)}) = U_q(C(2)^{(2)})$. In order to describe the complete list of quantum affine (super)algebras of rank 2 one should consider the following three quantum affine (super)algebras: $U_q(sl(1|3)^{(4)}) = U_q(A(0,2)^{(4)})$, $U_q(sl(3)^{(2)}) = U_q(A_2^{(2)})$ and $U_q(\widehat{osp}(1|2)) = U_q(B(0,1)^{(1)})$. The Dynkin diagram of the superalgebra $A(0,2)^{(4)}$ has the same geometric structure as the (super)algebras $A_1^{(1)}$ and $C(2)^{(2)}$, but in this case the root $\alpha$ is even and $\delta - \alpha$ is odd, and the sector of imaginary roots has odd roots. Therefore in the case of the quantum superalgebra $A(0,2)^{(4)}$ the relations of the type (4.29)–(4.38) are more complicated and they require special consideration. The second family of two quantum affine (super)algebras $U_q(A_2^{(2)})$ and $U_q(B(0,1)^{(1)})$ are described by the same Dynkin diagram with different colors of roots. Preliminary results in this direction are given in [14], where in particular the Cartan–Weyl basis of the basic affine superalgebra $U_q(B(0,1)^{(1)}) \equiv U_q(\widehat{osp}(1|2))$ is considered. The unified description of the two affine (super)algebras mentioned above, analogous to the one given in the present paper, is in preparation.

Acknowledgements. This work was supported (S. M. Khoroshkin, V. N. Tolstoy) by the Russian Foundation for Fundamental Research, grant No. 98-01-00303, by the program of French-Russian scientific cooperation (CNRS grant PICS-608 and grant RFBR-98-01-22033), as well as by KBN grant 5P03B05620 (J. Lukierski) and INTAS-99-1705 (S. Khoroshkin).
References

1. Asherova, R.M., Smirnov, Yu.F., and Tolstoy, V.N.: A description of some class of projection operators for semisimple complex Lie algebras. (Russian) Matem. Zametki 26, 15–25 (1979)
2. Beck, J.: Braid group actions and quantum affine algebras. Commun. Math. Phys. 165, 555–568 (1994)
3. Ding, J., and Khoroshkin, S.: Weyl group extension of quantized current algebras. Transformation Groups 5, 35–59 (2000); math.QA/9804139
4. Ding, J., Khoroshkin, S., and Pakuliak, S.: Integral representations for the universal R-matrix. ITF preprint, ITEP-TH-67/99, Moscow, 1999; http://wwwth.itep.ru/mathphys/psfiles/99_67.ps
5. Drinfeld, V.G.: Quantum groups. Proc. ICM-86 (Berkeley USA) Vol. 1, Providence, RI: Am. Math. Soc., 1987, pp. 798–820
6. Drinfeld, V.G.: A new realization of Yangians and quantized affine algebras. Soviet Math. Dokl. 36, 212–216 (1988)
7. Kac, V.G.: Lie superalgebras. Adv. Math. 26, 8–96 (1977)
8. Kac, V.G.: Infinite dimensional Lie algebras. Cambridge: Cambridge University Press, 1990
9. Khoroshkin, S.M., and Tolstoy, V.N.: Universal R-matrix for quantized (super)algebras. Commun. Math. Phys. 141, no. 3, 599–617 (1991)
10. Khoroshkin, S.M., and Tolstoy, V.N.: Extremal projector and universal R-matrix for quantum contragredient Lie (super)algebras. In: Quantum groups and related topics (Wrocław, 1991), Math. Phys. Stud. 13, Dordrecht: Kluwer Acad. Publ., 1992, pp. 23–32
11. Khoroshkin, S.M., and Tolstoy, V.N.: The uniqueness theorem for the universal R-matrix. Lett. Math. Phys. 24, no. 3, 231–244 (1992)
12. Khoroshkin, S.M., and Tolstoy, V.N.: The Cartan–Weyl basis and the universal R-matrix for quantum Kac-Moody algebras and superalgebras. In: Quantum Symmetries (Clausthal 1991). River Edge, NJ: World Sci. Publishing, 1993, pp. 336–351
13. Khoroshkin, S.M., and Tolstoy, V.N.: Twisting of quantum (super)algebras. Connection of Drinfeld's and Cartan–Weyl realizations for quantum affine algebras. MPIM preprint, Bonn: MPI/94-23, 29 p., 1994; hep-th/9404036
14. Lukierski, J., and Tolstoy, V.N.: Cartan–Weyl basis for quantum affine superalgebra $U_q(\widehat{osp}(1|2))$. Czech. J. Phys. 47, no. 12, 1231–1239 (1997); q-alg/9710030
15. Lusztig, G.: Canonical bases arising from quantized enveloping algebras. J. Am. Math. Soc. 3, 447–498 (1990)
16. Tolstoy, V.N.: Extremal projectors for contragredient Lie algebras and superalgebras of finite growth. (Russian) Uspekhi Math. Nauk 44, no. 1 (265), 211–212 (1989); translation in Russian Math. Surveys 44, no. 1, 257–258 (1989)
17. Tolstoy, V.N.: Extremal projectors for quantized Kac-Moody superalgebras and some of their applications. In: Quantum Groups (Clausthal, 1989), Lecture Notes in Phys. 370, Berlin: Springer, 1990, pp. 118–125
18. Tolstoy, V.N., and Khoroshkin, S.M.: The Universal R-matrix for quantum nontwisted affine Lie algebras. (Russian) Funktsional. Anal. i Prilozhen. 26, no. 1, 85–88 (1992); translation in Functional Anal. Appl. 26, no. 1, 69–71 (1992)
19. Van der Leur, J.W.: Contragredient Lie superalgebras of finite growth. Utrecht thesis, 1985
20. Yang, W.-L., and Zhang, Y.-Z.: Drinfeld basis and a nonclassical free boson representation of twisted quantum affine superalgebra $U_q[osp(2|2)]$. Preprint math.QA/9904017

Communicated by A. Connes
Commun. Math. Phys. 220, 561 – 582 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Floquet Spectrum of Weakly Coupled Map Lattices
Viviane Baladi¹, Hans Henrik Rugh²
¹ CNRS UMR 8628, Université de Paris-Sud, 91405 Orsay, France. E-mail: [email protected]
² Département des Mathématiques, Université de Cergy-Pontoise, 95302 Cergy-Pontoise, France. E-mail: [email protected]
Received: 4 January 2001 / Accepted: 6 February 2001
Abstract: We consider weakly coupled analytic expanding circle maps on the lattice $\mathbb{Z}^D$ (for $D \ge 1$), with small coupling strength $\epsilon$ and summable decay of the two-sites coupling. We study the spectrum of the associated (Perron–Frobenius) transfer operators $\mathcal{L}_\epsilon$. On a suitable Banach space, perturbation theory applied to the difference of a high iterate $n_0$ of the transfer operators $\mathcal{L}_0^{n_0}$ and $\mathcal{L}_\epsilon^{n_0}$ yields localisation of the full spectrum of $\mathcal{L}_\epsilon^{n_0}$. The time-transfer operator $\mathcal{L}_\epsilon^{n_0}$ commutes with the spatial translations, and we provide a description of part of their joint eigenvalues, more precisely of the "nonresonant" spectrum of $\mathcal{L}_\epsilon^{n_0}$ restricted to the eigenspaces of the spatial translations. In particular, we exhibit smooth curves of eigenvalues and eigenspaces of $\mathcal{L}_\epsilon^{n_0}$ as functions of the eigenvalues $e^{i\alpha}$ ("crystal momenta") of the spatial translations.

1. Introduction

Transfer operators are bounded linear operators $\mathcal{L}$, which are usually defined through composition by a discrete-time dynamical system $F : X \to X$, and multiplication by a weight function $G$ on $X$:
$$\mathcal{L}\phi(x) = \sum_{Fy = x} G(y)\,\phi(y),$$
where the observable $\phi : X \to \mathbb{C}$ belongs to a suitable Banach space $\mathcal{B}$. The weight, or potential, $G$ is often a positive function related to some reference measure $m$ by a "conformality" condition:
$$\int_X \psi\circ F\;\phi\; dm = \int_X \psi\;\mathcal{L}\phi\; dm.$$
In particular, if X is a (finite-dimensional) compact manifold, m is often Lebesgue measure and G the Jacobian. Then, if F is sufficiently smooth and hyperbolic, such a transfer operator enjoys Perron–Frobenius type properties, the maximal eigenfunction
being related to an $F$-invariant probability measure with good thermodynamic and/or ergodic properties, and the spectral gap corresponding to the exponential rate of decay of correlations for test functions in $\mathcal{B}$. Even in this case, transfer operators are usually, however, very far from being finite rank: compactness essentially only holds if $F$ is analytic, while a weaker quasicompactness property is the best one can do in finite differentiability. That part of the spectrum which consists of isolated eigenvalues of finite multiplicity sometimes has a dynamical interpretation, and can often be described by a dynamical Fredholm determinant or zeta function. For the rich array of results obtained since the early work of Ruelle in the 1960's, we refer, e.g., to the monograph [B] and references therein.

One aim of this article is to see how much of the above picture remains true in infinite-dimensional dynamics, focusing on one of the most tractable situations: weakly coupled map lattices. Coupled map lattices (CMLs) have been an object of rigorous studies since the seminal article of Bunimovich and Sinai [BS]. In this work, we shall consider the situation of weakly coupled analytic circle maps $F_\epsilon$, the lattice being just $\mathbb{Z}^D$, in the footsteps of Bricmont and Kupiainen [BK1]. We refer to the book of Kaneko [K], the survey of Bunimovich [Bu] or the introductions, e.g., of the more recent articles [BIJ] or [FR] for overview and motivation, or discussion of other settings (in particular the work of Pesin–Sinai [PS], Keller and Künzle [KK], Volevich [V], Jiang [J], Jiang–Pesin [JP], or the article of Bricmont–Kupiainen [BK2] in a nonanalytic framework).

Bricmont and Kupiainen [BK1], in a remarkable paper, constructed a Sinai–Ruelle–Bowen measure (limit of $(F_\epsilon)_*^n(\otimes_{\mathbb{Z}^D}\,\text{Lebesgue})$) and proved the exponential decay of its correlation functions for analytic observables depending on finitely many sites.
(They were supposing exponential decay of the couplings, an assumption which has been weakened only very recently by Rugh [R], see below.) Their main technical tools were the transfer operators $\mathcal{L}_{\epsilon,\Lambda}$, associated to the truncations of the system on finite subsets $\Lambda \subset \mathbb{Z}^D$ of the lattice, acting on holomorphic and bounded functions on finite-dimensional polyannuli. They showed bounds on their spectral gaps, uniform in the size $|\Lambda|$. This approach forced them to consider only local observables, and the prefactor in the correlation decay bounds grows exponentially with the size of $\Lambda$. It is therefore not only for esthetical reasons that one would like to define a Banach space of observables depending on all lattice sites, and a transfer operator associated to the full (non-truncated) coupled dynamics.

There are two obvious caveats in this enterprise. Firstly, in the translation-invariant case the SRB measure (which is ergodic) is absolutely continuous with respect to the product ($\otimes_{\mathbb{Z}^D}\,\text{Lebesgue}$) of Lebesgue measure on the individual sites only if it coincides with this product. So a Banach space which is big enough to contain the SRB fixed point of the transfer operator cannot be a subset of $L^1(\otimes_{\mathbb{Z}^D}\,\text{Lebesgue})$. Secondly, an element $\lambda \ne 1$ of the spectrum should not be expected to be an eigenvalue of finite multiplicity, even for the uncoupled transfer operator. Indeed, restricting the uncoupled operator to a finite subset $\Lambda \subset \mathbb{Z}^D$, we obtain a finite tensor product, with eigenvalues $\prod_{k=1}^{K} \lambda_{i_k}$, $K \le |\Lambda|$, where the $\lambda_i$ are eigenvalues of the single-site operator $\mathcal{L}^{n_0}$, and the multiplicity, e.g., of a simple eigenvalue $\lambda_i$ of $\mathcal{L}^{n_0}$ viewed as an eigenvalue (for $K = 1$) of $\otimes_\Lambda \mathcal{L}^{n_0}$ coincides with the cardinality of $\Lambda$. (Writing $h$ for the fixed point of $\mathcal{L}^{n_0}$ and $\varphi_i$ for the eigenfunction of $\mathcal{L}^{n_0}$ and $\lambda_i$, just note that for each $p \in \Lambda$ the tensor product $\varphi_i(x_p) \otimes_{q\in\Lambda\setminus\{p\}} h(x_q)$ gives a different eigenfunction.)
An obvious solution to the second difficulty is to use translation invariance to restrict the transfer operator to the (invariant) eigenspaces of the spatial translations σ ∗ . The spectrum of each such restriction should
be more tractable, in particular the "single-site" or "nonresonant" eigenvalues of the uncoupled operator ($K = 1$ in the analysis above for the uncoupled operator) should have finite multiplicity, giving some hope that a perturbative analysis might lead to the construction of nontrivial eigenvalues and eigenfunctions for the coupled operator $\mathcal{L}_\epsilon$. It is this "Floquet spectrum" strategy (suggested to the first author by D. Ruelle in 1996) which we are able, finally, to carry out successfully in the present article. This is Theorem 1, stated in Sect. 2, which describes the Floquet eigenvalues generated by "nonresonant" eigenvalues of the full uncoupled operator $\mathcal{L}_0$.

A few words about our use of the Floquet spectrum terminology ([JS, Chap. 9], see [FKT] for a recent example) are perhaps appropriate here. In fact, we hesitated between "Bloch decomposition" (see [AC] for a recent implementation of this idea to Schrödinger operators) and "Floquet spectrum". We opted for the second expression, which seems to be more frequent in the rigorous literature. We would like to point out that in our situation all complex frequencies or multifrequencies (sometimes called crystal momenta elsewhere, they are denoted $\alpha$ below) are present because of the infinite-dimensional nature of the system.

A brief history of the "full" transfer operator for CMLs. Keller and Künzle [KK] were the first to let the transfer operator associated to the full (infinite-dimensional) coupled dynamics act on a Banach space, in the setting of coupled non-Markov piecewise monotone interval maps, with bounded variation norms. Unfortunately, they were not able to prove the existence of a spectral gap for this full transfer operator (this remains to our knowledge an open question in this framework). In the easier situation of coupled analytic expanding circle maps from [BK1], assuming furthermore that $D = 1$, Baladi et al.
[BIJ] were able to construct a Banach space on which the full transfer operator has a simple eigenvalue at 1 and enjoys a spectral gap. (This yields exponential decay of correlations for a class of nonlocal observables.) Sadly, this Banach space is not spatially homogeneous, so that the spatial translation $\sigma^*$ does not define a bounded operator on it. For this reason, it was not possible to go "beyond the first gap" for the Banach space of [BIJ]. In the same paper [BIJ], a translationally invariant Fréchet space $\mathcal{F}$ was introduced on which the coupled transfer operator enjoyed a spectral gap property (for all $D \ge 1$), except, however, for the possible presence of continuous spectrum, which also obstructed further spectral analysis. This frustrating situation improved greatly when Fischer and Rugh [FR], using the same basic assumptions from [BK1], but exploiting a much simpler cluster expansion formalism (inspired by a paper of Maes and van Moffaert [MM] in a random noise framework), showed that the Banach space subset of $\mathcal{F}$ defined by a natural norm was preserved by a suitable iterate $\mathcal{L}_\epsilon^{n_0}$ of the transfer operator, and that this iterate enjoyed a spectral gap. (As Fischer and Rugh were mostly concerned with ergodic properties of the dynamics, these facts are not explicitly stated in [FR], but can easily be deduced from their results.) The Banach space analysed in [FR] appears to be the "correct" invariant space: neither too big, nor too small, and translationally invariant.

In the present work, we carry out the Floquet spectrum program sketched above for this Banach space, not only under the hypotheses of [FR], but in fact in a more general setting which was introduced very recently by Rugh [R]. Rugh also considered weakly coupled expanding analytic circle maps, and he studied the transfer operator acting on the Banach space from [FR].
The main novelty, for our present purposes, is that exponential decay of the coupling is not necessary: only a summability condition (see (2.1–2.2)) is required.
Sketch of the paper. The article is organised as follows. In Sect. 2, after recalling the restriction of the setting of [R] (translationally invariant coupled maps on a lattice) which is relevant here, we first state Theorem 0 (proved in the appendix), which exploits the results of [R] to describe the spectrum of the uncoupled transfer operator and show that the coupled and uncoupled operators are close in operator norm when the coupling strength is small. Next, we describe the eigenspaces $X^\alpha$ of $\sigma^*$ as well as the eigenspaces of the restriction of the uncoupled transfer operator to the various $X^\alpha$ (see in particular Lemma 1). We then state our main result, Theorem 1, which describes the nonresonant part of the spectrum of $\mathcal{L}_\epsilon^{n_0}$ restricted to each eigenspace $X^\alpha$ of $\sigma^*$ (Claim (1), which can be deduced from Theorem 0 and perturbation theory), as well as the $\alpha$-dependence of the corresponding eigenvalues (Claim (2)). Section 3 is devoted to an elementary abstract functional analytical argument inspired from [BY] which gives, in particular, a very explicit proof of Theorem 1 (1). The results from Sect. 3 are used in Sect. 4 to yield a matrix description of the restriction of $\mathcal{L}_\epsilon^{n_0}$ to an invariant space, expressed as a graph over the corresponding invariant space for $\mathcal{L}_0^{n_0}$. We exploit this description to show Theorem 1 (2). The key for this is Proposition 3, which embodies that information from the coupling decay which is relevant.
2. Setting and Formal Statement of Results

We start by recalling the settings of [FR] and [R], keeping as close as possible to the notation of [R] (the main exceptions are our use of $n$, $n_0$ for the time parameter instead of Rugh's $T$, $T_0$, $\epsilon$ instead of $\kappa$ for the coupling strength, and $\Theta$ instead of $\vartheta$ for the Banach space parameter). In fact, we are essentially restricting the framework of [R] to the case when the infinite set $\Omega = \mathbb{Z}^D$, with a translationally invariant coupling which enjoys some spatial decay. We concentrate on pair couplings with exponential or polynomial decay for simplicity: $m$-point couplings (allowing $m$ to be unbounded) with other kinds of decay may be treated by exactly the same methods.

Tori and polyannuli. We view the circle $S^1 = \mathbb{R}/\mathbb{Z}$ as a subset of the complex cylinder $\mathcal{C} = \mathbb{C}/\mathbb{Z}$. For $\rho \ge 0$, we define the closed annulus in the cylinder, $A[\rho] = \{z \in \mathbb{C}/\mathbb{Z} : |\operatorname{Im} z| \le \rho\} \subset \mathcal{C}$. For each integer $D \ge 1$, the infinite torus $S^{\mathbb{Z}^D} = \prod_{p\in\mathbb{Z}^D} S^1$ and the infinite polyannulus $A^{\mathbb{Z}^D} = \prod_{p\in\mathbb{Z}^D} A[\rho]$ are both compact for the product topology. Let $\mathcal{S}$ denote the family of all finite subsets, including the empty set, of the lattice $\mathbb{Z}^D$. For $\Lambda \in \mathcal{S}$ we write $S^\Lambda = \prod_{p\in\Lambda} S^1$ and $A^\Lambda = \prod_{p\in\Lambda} A[\rho]$ for the $\Lambda$-torus and the $\Lambda$-annulus, respectively. $H_\Lambda = H(A^\Lambda)$ denotes the space of complex valued functions, holomorphic on the interior of $A^\Lambda$ and continuous on $A^\Lambda$. In the case $\Lambda = \emptyset$, we let $H_\emptyset = \mathbb{C}$. Each $H_\Lambda$ is a Banach space for the uniform norm denoted $|\cdot|$. Through coordinate projections we obtain natural inclusions: $j_{\Lambda,K} : H_K \hookrightarrow H_\Lambda$ whenever $K \subset \Lambda$ ($\in \mathcal{S}$) and also $j_\Lambda : H_\Lambda \hookrightarrow C(A^{\mathbb{Z}^D})$, where $C(A^{\mathbb{Z}^D})$ is the set of continuous functions on $A^{\mathbb{Z}^D}$. We denote by $H(A^{\mathbb{Z}^D})$ the closure of $\cup_\Lambda\, j_\Lambda H_\Lambda$ in $C(A^{\mathbb{Z}^D})$, for the supremum norm.
Analytic expanding circle map $f$, uncoupled map $F$.

Definition. For $\rho > 0$ and $\lambda > 1$ we say that $f : A[\rho] \to \mathcal{C}$ is a real analytic, $(\rho, \lambda)$-expanding map of the circle if
(1) $f$ is holomorphic in $\operatorname{Int} A[\rho]$ and continuous on $A[\rho]$.
(2) $f(S^1) = S^1$. (Real-analyticity).
(3) The intersection $f(\partial A[\rho]) \cap A[\lambda\rho]$ is empty. (Expansion).

A real analytic, $(\rho, \lambda)$-expanding map of the circle is real analytic expanding on the circle in the usual metric sense, and $\lambda$ gives a lower bound for the expansion constant (see Appendix A in [R]). We shall choose a single real analytic, $(\rho, \lambda)$-expanding map $f$. The uncoupled system is the direct product $F = (f)_{p\in\mathbb{Z}^D}$, which leaves the infinite (real) torus invariant.

Coupling $g$, coupling-strength $\epsilon$, and coupled map $F_\epsilon$. It is useful to introduce now the notation $\sigma_p : \mathbb{Z}^D \to \mathbb{Z}^D$ ($p \in \mathbb{Z}^D$) for the translation $\sigma_p(k) = k + p$. The theory in [R] is valid for a general class of summable couplings. For the sake of simplicity we restrict here to pair-couplings. Let us first describe an exponentially decaying setting in which $d$ will denote a translationally invariant metric on the lattice, to be specified later. For fixed $\xi \in (0, 1)$, let $H_{\xi,0} \subset H(A^{\mathbb{Z}^D})$ be those $\phi$ which may be written as
$$\phi = \sum_{q\in\mathbb{Z}^D} j_{\{0,q\}}\,\phi_{\{0,q\}},$$
the sum being uniformly converging with each $\phi_{\{0,q\}} \in H_{\{0,q\}}$, and where
$$\sum_{q\in\mathbb{Z}^D} \xi^{-d(0,q)}\,|\phi_{\{0,q\}}| < \infty.$$
Let $|\phi|_{\xi,0}$ denote the infimum of the above sum over all possible decompositions of $\phi$. The parameter $\xi$ quantifies an exponential spatial decay of the terms $\phi_{\{p,q\}}$ with respect to the distinguished point 0 and the metric $d$. In our context there are two particular choices for the metric $d$ which will be of interest. First, we may take the Euclidean metric, i.e., $d(p, q) = \|p - q\|$, for which the decay is truly exponential. In this case the above sum becomes
$$\sum_{q\in\mathbb{Z}^D} \xi^{-\|q\|}\,|\phi_{\{0,q\}}|. \qquad (2.1)$$
Our second choice, cf. also [R, Sect. 6], is to take a "renormalized" metric, $d(p, q) = \log(1 + \|p - q\|)$. Setting $P = -\log(\xi) > 0$, the identity $\xi^{d(p,q)} = (1 + \|p - q\|)^{-P}$ shows that we are in fact describing a polynomial decay with respect to the Euclidean metric. The above sum then reads
$$\sum_{q\in\mathbb{Z}^D} (1 + \|q\|)^{P}\,|\phi_{\{0,q\}}|. \qquad (2.2)$$
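The passage from exponential decay in the renormalized metric to polynomial decay in the Euclidean metric can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and the sample lattice points are illustrative assumptions, and the Euclidean norm is used for the distance between lattice points.

```python
import math

def renormalized_distance(p, q):
    """Renormalized metric d(p, q) = log(1 + |p - q|) for lattice points given as tuples."""
    return math.log(1.0 + math.dist(p, q))

def exponential_form(xi, p, q):
    """Decay factor xi ** d(p, q), written with the renormalized metric."""
    return xi ** renormalized_distance(p, q)

def polynomial_form(xi, p, q):
    """Same decay factor written as (1 + |p - q|) ** (-P) with P = -log(xi)."""
    P = -math.log(xi)
    return (1.0 + math.dist(p, q)) ** (-P)

xi = 0.3                          # hypothetical decay parameter in (0, 1)
origin = (0, 0)
sample_points = [(1, 0), (3, 4), (10, -7)]
for q in sample_points:
    print(q, exponential_form(xi, origin, q), polynomial_form(xi, origin, q))
```

Both columns agree up to rounding, which is just the identity $\xi^{\log(1+r)} = (1+r)^{\log \xi}$.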
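We shall consider a translationally invariant system of couplings $g_{\mathbb{Z}^D} = (g_p)_{p\in\mathbb{Z}^D}$, where $g_p = g_0 \circ \sigma_p$ with $g_0$ in $H_{\xi,0}$, either for the Euclidean metric (exponential case (2.1)) or for the renormalized Euclidean metric (polynomial case (2.2)), mapping the real torus to the real line, and so that $|g_0|_{\xi,0} \le \epsilon$. We say that the coupling decays exponentially with rate $\xi$ (respectively polynomially with exponent $P = \log(1/\xi)$) and (spatial) coupling strength $\epsilon$. The coupled system $F_\epsilon = (F_{\epsilon,p})_{p\in\mathbb{Z}^D} : A^{\mathbb{Z}^D} \to \mathcal{C}^{\mathbb{Z}^D}$ is obtained by setting
$$F_{\epsilon,p} : A^{\mathbb{Z}^D} \to \mathcal{C}, \qquad F_{\epsilon,p} : z \mapsto f(z_p) + g_p(z), \qquad p \in \mathbb{Z}^D.$$
The infinite real torus is invariant under $F_\epsilon$, by definition.

Boundary condition $\bar{x}$, truncated map $F_{\epsilon,\Lambda}$. Let $\bar{x} \in S^{\mathbb{Z}^D}$ denote an arbitrary but fixed reference point. For $\Lambda \in \mathcal{S}$ we define a holomorphic injection, $i_\Lambda : A^\Lambda \to A^{\mathbb{Z}^D}$, $i_\Lambda(z_\Lambda) = (z_\Lambda, \bar{x}_{\Lambda^c})$ and a natural projection, $q_\Lambda : A^{\mathbb{Z}^D} \to A^\Lambda$, $q_\Lambda(z_{\mathbb{Z}^D}) = z_\Lambda$, all expressed in natural coordinates on the annuli. The map $q_\Lambda$ is the left-inverse of $i_\Lambda$ while $r_\Lambda = i_\Lambda \circ q_\Lambda : A^{\mathbb{Z}^D} \to A^{\mathbb{Z}^D}$ is a projection which gives boundary conditions outside $\Lambda$. We will use the same notation when considering the restriction of the above maps to the tori $S^{\mathbb{Z}^D}$ and $S^\Lambda$. Using the above notation, we define for $\Lambda \in \mathcal{S}$ the $\Lambda$-truncated coupled map, $F_{\epsilon,\Lambda} : S^\Lambda \to S^\Lambda$ (and $F_{\epsilon,\Lambda} : A^\Lambda \to \mathcal{C}^\Lambda$) by setting $F_{\epsilon,\Lambda} = q_\Lambda \circ F_\epsilon \circ i_\Lambda$. When $w_\Lambda \in A^\Lambda$ we denote by $F_{\epsilon,\Lambda}^{-1}(w_\Lambda)$ the (finite) set of inverse images in $A^\Lambda$ obtained by solving $w_\Lambda = F_{\epsilon,\Lambda}(z_\Lambda)$ for $z_\Lambda \in A^\Lambda$. The condition $\epsilon < (\lambda - 1)\rho$ on the coupling strength implies [R, App. E] that local inverses indeed exist and are real analytic in $w_\Lambda$. In particular, $F_{\epsilon,\Lambda}$ is non-singular and therefore has a well-defined orientation on $S^\Lambda$.

Truncated transfer operator $\mathcal{L}_{\epsilon,\Lambda}$, single-site gap $\eta$. The transfer operator $\mathcal{L}_{\epsilon,\Lambda}$ associated with the truncated dynamical system $(S^\Lambda, F_{\epsilon,\Lambda})$ and the Lebesgue measure $m^\Lambda$ on the $\Lambda$-torus is defined by the identification,
$$\int_{S^\Lambda} \psi\;\mathcal{L}_{\epsilon,\Lambda}\phi\; dm^\Lambda \equiv \int_{S^\Lambda} \psi\circ F_{\epsilon,\Lambda}\;\phi\; dm^\Lambda,$$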
for $\psi \in L^\infty(S^\Lambda)$ and $\phi \in L^1(S^\Lambda)$. By standard arguments, this defines a bounded linear operator, $\mathcal{L}_{\epsilon,\Lambda}$, on $L^1(S^\Lambda)$. This operator may be extended to a compact (in fact nuclear) operator on $H_\Lambda$. We shall write $\mathcal{L}$ for the single-site operator associated to $f$ acting on $H = H(A[\rho])$, and we denote by $\eta < 1$ the maximal modulus of its eigenvalues different from 1, which is a simple eigenvalue with a positive eigenfunction $h \in H$. It may happen (nongenerically) that $\eta = 0$, e.g. for the dynamics $z \mapsto z^2$. The present work is only interesting if $\eta > 0$; we shall make this assumption and fix η < η < 1. Note that $c_h := \sup_n \|\mathcal{L}^n(1)\|$ and $c_r := \sup_n \|(\mathcal{L} - h\,\cdot)^n\|/(\eta)^n$ are finite. Since $\mathcal{L}^n(1)$ converges to $h$ in the supremum norm on $S^1$, for any ch > ch we have $\|h\|_H \le c_h$, up to taking smaller $\rho > 0$ if necessary.
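The degenerate case $\eta = 0$ can be made concrete with the doubling map. The sketch below is illustrative only and rests on assumptions not stated in the text: in the additive coordinate on $\mathbb{R}/\mathbb{Z}$ the dynamics $z \mapsto z^2$ is taken to be $x \mapsto 2x \bmod 1$, the weight is the constant $1/2$ (Lebesgue reference measure), and the helper names are hypothetical. On Fourier modes $e_k(x) = e^{2\pi i k x}$ the single-site transfer operator then sends $e_k$ to $e_{k/2}$ for even $k$ and to 0 for odd $k$, so every nonconstant mode is annihilated after finitely many iterations.

```python
import numpy as np

def transfer_doubling(phi):
    """Return L(phi) for x -> 2x mod 1: average of phi over the two preimages of x."""
    def L_phi(x):
        x = np.asarray(x, dtype=float)
        return 0.5 * (phi(x / 2.0) + phi((x + 1.0) / 2.0))
    return L_phi

def fourier_mode(k):
    """Fourier mode e_k(x) = exp(2*pi*i*k*x) on the circle R/Z."""
    return lambda x: np.exp(2j * np.pi * k * np.asarray(x, dtype=float))

xs = np.linspace(0.0, 1.0, 101)       # sample points on the circle

constant = fourier_mode(0)
even_image = transfer_doubling(fourier_mode(4))    # expected to equal e_2
odd_image = transfer_doubling(fourier_mode(3))     # expected to vanish

# Three iterations kill e_4: e_4 -> e_2 -> e_1 -> 0.
third_iterate = transfer_doubling(transfer_doubling(transfer_doubling(fourier_mode(4))))

print(np.max(np.abs(transfer_doubling(constant)(xs) - 1.0)))
print(np.max(np.abs(even_image(xs) - fourier_mode(2)(xs))))
print(np.max(np.abs(odd_image(xs))))
print(np.max(np.abs(third_iterate(xs))))
```

All four printed values are at the level of rounding error, which is consistent with the constant function being fixed and no other nonzero eigenvalue being present in this example.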
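Banach space $\mathcal{M}_\theta$. Given a non-empty $\Lambda \in \mathcal{S}$, let $z_\Lambda = (z_p)_{p\in\Lambda}$ be natural coordinates on the $\Lambda$-torus $S^\Lambda$. For every pair $K \subset \Lambda \in \mathcal{S}$ we define a linear operator $\pi_{K,\Lambda} : H_\Lambda \to H_K$ by "integrating away" the coordinates outside $K$:
$$\pi_{K,\Lambda}\phi_\Lambda(z_K) = \int_{S^{\Lambda\setminus K}} \phi_\Lambda(z_\Lambda)\; dm^{\Lambda\setminus K}(z_{\Lambda\setminus K}), \qquad \phi_\Lambda \in H_\Lambda.$$
Such operators are norm-contracting and Fubini's theorem shows that $\pi_{L,K}\,\pi_{K,\Lambda} = \pi_{L,\Lambda}$ if $L \subset K \subset \Lambda$. Hence the family $(H_\Lambda, \pi_{K,\Lambda})$ is projective. We denote by $\mathcal{M}$ its projective limit. An element $\phi = (\phi_\Lambda)_{\Lambda\in\mathcal{S}} \in \mathcal{M}$ satisfies $\pi_{K,\Lambda}\phi_\Lambda = \phi_K$ whenever $K \subset \Lambda \in \mathcal{S}$. We write $\pi_\Lambda\phi$ for the natural projection $\phi \in \mathcal{M} \mapsto \phi_\Lambda \in H_\Lambda$. From the defining equation it is clear that if $K \subset \Lambda$, $a_K \in H_K$, and $\phi_\Lambda \in H_\Lambda$ then $\pi_{K,\Lambda}((j_{\Lambda,K}a_K)\phi_\Lambda) = a_K\,\pi_{K,\Lambda}\phi_\Lambda$, simply because $a_K$ does not depend on the variables which are "integrated away". We introduce for $\theta \in (0, 1)$,
$$\|\phi\|_\theta = \sup_{\Lambda\in\mathcal{S}} \theta^{|\Lambda|}\,|\phi_\Lambda| \in [0, +\infty], \qquad \phi \in \mathcal{M}, \qquad (2.3)$$
and define $\mathcal{M}_\theta$ to be the set of those $\phi$ with finite norm. The norm of the natural projection $\pi_\Lambda : \mathcal{M}_\theta \to H_\Lambda$ is given by $\|\pi_\Lambda\|_\theta = \theta^{-|\Lambda|}$.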
Full transfer operator $\mathcal{L}_\epsilon$, choice of $0 < \theta < \Theta$. In the following we use the constants $\rho > 0$, $\lambda > 1$, $\eta < 1$, $c_h$, and $c_r$ introduced above and fix $\theta \in (0, 1)$ and $\epsilon_0 > 0$ (denoted $\kappa$ in [R]) so that the condition TR in [R, Def. 4.18] is verified for some $1 < \gamma < \eta^{-1}$ (the constant denoted $C_\beta$ there is our $C(\epsilon)$ defined in (2.4) below). For a suitable value of $0 < \theta < \Theta$ [R, Theorem 2.1, Lemmas 4.20 and 4.25], we may define for each $n \ge 0$ a bounded linear operator $\mathcal{L}_\epsilon^{(n)} : \mathcal{M}_\Theta \to \mathcal{M}_\theta$ by the requirement that for all $0 < \epsilon < \epsilon_0$ and each finite $K \subset \mathbb{Z}^D$:
$$\pi_K\,\mathcal{L}_\epsilon^{(n)} = \lim_{\Lambda\to\mathbb{Z}^D} \pi_{K,\Lambda} \circ \mathcal{L}_{\epsilon,\Lambda}^{n} \circ \pi_\Lambda.$$
Furthermore, there is $n_0$ finite, such that for $n \ge n_0$, $\mathcal{L}_\epsilon^{(n)}$ maps $\mathcal{M}_\Theta$ into $\mathcal{M}_\Theta$ itself so that we may consider the spectrum of this operator. Rugh [R] gives another characterisation of $\mathcal{L}_\epsilon^{(n)}$ as a sum over configurations (see his Lemma 4.25). This makes it possible to bypass the boundary condition $\bar{x}$ and is convenient to obtain bounds (some of which are quoted and used below), but since our current aim is to reduce technicalities as much as possible we content ourselves here with the naive definition and refer to [R] for more. If $n, N \ge n_0$ we have that $\mathcal{L}_\epsilon^{(N+n)} = \mathcal{L}_\epsilon^{(N)}\mathcal{L}_\epsilon^{(n)}$ is a bounded operator on $\mathcal{M}_\Theta$. It is thus not a (serious) abuse of notation to write $\mathcal{L}_\epsilon^{n}$, without parentheses, for $n \ge n_0$.
Uncoupled spectrum and perturbative bounds. Let $\mathrm{sp}(\cdot)$ denote the spectrum of an operator. We state here extensions of the results from [R] required for our present purposes (the proofs are given in the appendix):

Theorem 0 (Uncoupled spectrum and perturbation). Let $\theta$, $\Theta$, $\epsilon_0$, and $n_0$ be as above.
(1) If the coupling strength $0 \le \epsilon \le \epsilon_0$, the operators $\mathcal{L}_\epsilon^{n} : \mathcal{M}_\Theta \to \mathcal{M}_\theta$ and $\mathcal{L}_\epsilon^{n} : \mathcal{M}_\Theta \to \mathcal{M}_\Theta$ are bounded for all $n \ge 0$ and $n \ge n_0$, respectively.
(2) Let $\mathrm{sp}(\mathcal{L}^{n_0}) = \{1, \lambda_j,\ j = 1, 2, \dots,\ |\lambda_j| < 1\}$ denote the nonzero spectrum of the compact single-site operator $\mathcal{L}^{n_0}$ on $H$. Then the nonzero spectrum of $\mathcal{L}_0^{n_0}$ on $\mathcal{M}_\Theta$ consists in the discrete set
$$\Sigma_0 = \Bigl\{1,\ \prod_{k=1}^{K} \lambda_{j_k},\ K = 1, 2, \dots,\ 1 \le j_1 \le j_2 \le \dots \le j_K\Bigr\},$$
where 1 is a simple eigenvalue, and all other points are eigenvalues of infinite multiplicity. Furthermore, the image of the spectral projector associated [Ka, III.6.4] to any domain $D \subset \mathbb{C}$, with $0 \notin D$ and $\partial D$ a union of rectifiable curves disjoint from $\Sigma_0$, consists of the generalised eigenvectors $\{\varphi \in \mathcal{M}_\Theta \mid \exists \ell \ge 1,\ \exists z \in \Sigma_0 \cap D,\ \text{s.t. } (z - \mathcal{L}_0^{n_0})^{\ell}\varphi = 0\}$.
(3) Define for $0 \le \epsilon \le \epsilon_0$ the function
$$C(\epsilon) = \frac{e^{2\pi\epsilon}}{e^{2\pi(\lambda-1)\rho} - e^{2\pi\epsilon}} - \frac{1}{e^{2\pi(\lambda-1)\rho} - 1}. \qquad (2.4)$$
Then we have the following bound on the perturbation for all $n \ge n_0$:
$$\|\mathcal{L}_0^{n} - \mathcal{L}_\epsilon^{n}\|_{\mathcal{M}_\Theta} \le C(\epsilon)/C(\epsilon_0). \qquad (2.5)$$
The function $C(\epsilon)$ is denoted $C_\beta$ in [R], see Lemma 3.4 there.

Corollary of Theorem 0 (Perturbation theory). Since 1 is an eigenvalue of $\mathcal{L}_\epsilon^{n_0}$, it follows from (2.5) that for small enough $0 \le \epsilon \le \epsilon_0$, the spectrum of the bounded operator $\mathcal{L}_\epsilon^{n_0} : \mathcal{M}_\Theta \to \mathcal{M}_\Theta$ consists in $\{1\} \cup \Sigma_\epsilon$, where $\Sigma_\epsilon$ is a subset of a disc of radius $\eta$. The maximal eigenvalue 1 is simple, with an eigenfunction invariant under spatial translations, and there are no other spectral points of $\mathcal{L}_\epsilon^{n_0}$ on the unit circle. For each $\delta > 0$ there is a function $C_\delta(\epsilon) = O(\epsilon)$ as $\epsilon \to 0$ and such that $d(z, \Sigma_0) < C_\delta(\epsilon)$ for each $z \in \Sigma_\epsilon$ with $|z| > \delta$.

In fact, the fixed point of $\mathcal{L}_\epsilon^{n_0}$ in $\mathcal{M}_\Theta$ gives rise to an $F_\epsilon$-invariant finite positive Borel measure on the infinite torus which is the SRB measure. We refer to [BK1, BIJ, FR], and [R] for this claim and for ergodic-theoretical properties of $F_\epsilon$ related to the spectral properties stated in Theorem 0 (quantified time-mixing for observables in a space related to the coupling strength).
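Floquet spectrum of $\mathcal{L}_\epsilon$, resonant spectrum $\mathcal{R}$. For simplicity, consider $D = 1$ (the extension to $D \ge 2$ is straightforward and left to the reader). Write $\sigma$ for the spatial translation (to the left, say) on $\mathbb{Z}$, and also $\sigma : A^{\mathbb{Z}} \to A^{\mathbb{Z}}$ for the associated shift on the product annulus. This shift defines in a natural way an operator $\sigma^*$ on $\mathcal{M}_\Theta$. To be more precise, let $K$ be a finite subset of our lattice and denote by $\sigma_K : A^K \to A^{\sigma(K)}$ the action of the shift on the truncated annulus. For a projective family, $\phi \in \mathcal{M}_\Theta$, we may for each finite subset, $K \subset \mathbb{Z}$, associate a pull-back, $(\sigma^*\phi)_K \equiv (\pi_{\sigma(K)}\phi) \circ \sigma_K$. Applying Fubini, it is readily seen that if $K \subset \Lambda \in \mathcal{S}$ then
$$\pi_{K,\Lambda}(\sigma^*\phi)_\Lambda = \pi_{K,\Lambda}(\phi_{\sigma(\Lambda)} \circ \sigma_\Lambda) = (\pi_{\sigma(K),\sigma(\Lambda)}\phi_{\sigma(\Lambda)}) \circ \sigma_K = \phi_{\sigma(K)} \circ \sigma_K = (\sigma^*\phi)_K$$
(i.e., the shift and the projection commute). The same is true for the inverse operator. Thus, the family $(\sigma^*\phi)_K$, $K \in \mathcal{S}$, is projective and clearly has the same norm in $\mathcal{M}_\Theta$ as $\phi$. Hence, $\sigma^* : \mathcal{M}_\Theta \to \mathcal{M}_\Theta$ is an isometry. It is not difficult to see that every frequency $e^{i\alpha}$, $\alpha \in [0, 2\pi)$, is an eigenvalue of $\sigma^*$ of infinite multiplicity. Letting $\pi_\emptyset : \mathcal{M}_\Theta \to \mathbb{C}$ be the projection to the empty set, we denote by
$$X^\alpha = \{\varphi \in \mathcal{M}_\Theta \mid \sigma^*\varphi = e^{i\alpha}\varphi,\ \pi_\emptyset(\varphi) = 0\}$$
the corresponding eigenspace in the kernel of Lebesgue. The set $X^\alpha$ is obviously a complete subspace of $\mathcal{M}_\Theta$. Note (although we shall not use this fact here) that $\|(\sigma^*)^n\| = 1$ for all $n$ implies that $X^\alpha$ coincides with the generalised eigenspace $\{\varphi \in \mathcal{M}_\Theta \mid \exists \ell \ge 1,\ (\sigma^* - e^{i\alpha}\,\mathrm{Id})^{\ell}\varphi = 0,\ \pi_\emptyset(\varphi) = 0\}$. For example, if $\varphi$ is not an eigenvector but is a generalised eigenvector for $\ell = 2$ and $e^{i\alpha}$, then one easily shows by induction that $(\sigma^*)^r\varphi = r e^{i(r-1)\alpha}(\sigma^*\varphi - e^{i\alpha}\varphi) + e^{ir\alpha}\varphi$, for all $r \ge 2$, contradicting $\|(\sigma^*)^n\| = 1$ for all $n$.

Our aim in the present work is to describe in more detail the structure of the spectrum of $\mathcal{L}_\epsilon^{n_0}$, using our spatially invariant setting. Choosing a translationally invariant reference point $\bar{x}$ for the boundary condition, the truncated dynamical system becomes spatially invariant and the very definition of the truncated operator ensures that $(\mathcal{L}_{\epsilon,\sigma(\Lambda)}\phi_{\sigma(\Lambda)}) \circ \sigma_\Lambda = \mathcal{L}_{\epsilon,\Lambda}(\phi_{\sigma(\Lambda)} \circ \sigma_\Lambda)$ for $\Lambda \in \mathcal{S}$. Therefore $\pi_{K,\Lambda}\mathcal{L}_{\epsilon,\Lambda}^{n}(\phi_{\sigma(\Lambda)} \circ \sigma_\Lambda) = (\pi_{\sigma(K),\sigma(\Lambda)}\mathcal{L}_{\epsilon,\sigma(\Lambda)}^{n}\phi_{\sigma(\Lambda)}) \circ \sigma_K$ for $K \subset \Lambda \in \mathcal{S}$ and any $n \ge 0$. Taking the limit $\Lambda \to \mathbb{Z}$ (which also implies $\sigma(\Lambda) \to \mathbb{Z}$) we obtain $\pi_K(\mathcal{L}_\epsilon^{n}(\sigma^*\phi)) = (\pi_{\sigma(K)}(\mathcal{L}_\epsilon^{n}\phi)) \circ \sigma_K = \pi_K(\sigma^*(\mathcal{L}_\epsilon^{n}\phi))$, i.e., that $\sigma^*$ and $\mathcal{L}_\epsilon^{n}$ commute as operators on $\mathcal{M}_\Theta$:
$$\mathcal{L}_\epsilon^{n} \circ \sigma^* = \sigma^* \circ \mathcal{L}_\epsilon^{n}, \qquad \forall\ 0 \le \epsilon < \epsilon_0,\ n \ge n_0.$$
In particular, $\mathcal{L}_\epsilon^{n_0}$ sends $X^\alpha$ into itself. Recalling the notation $\{\lambda_j\}$ and $\Sigma_0$ from Theorem 0, we define the "resonant eigenvalues" of $\mathcal{L}_0^{n_0}$ to be:
$$\mathcal{R} = \Bigl\{z \in \Sigma_0 \;\Big|\; z = \prod_{k=1}^{K} \lambda_{j_k},\ K \ge 2,\ j_1 \le \dots \le j_K\Bigr\} \cup \{0\}.$$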
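We also call resonant those eigenvalues of $\mathcal{L}^{n_0}$ which belong to $\mathcal{R}$ (this happens if an eigenvalue of $\mathcal{L}^{n_0}$ coincides with a product of other eigenvalues). We have:

Lemma 1 (Nonresonant spectrum of $\mathcal{L}_0^{n_0}|_{X^\alpha}$). Except for the resonant eigenvalues in $\mathcal{R}$ and, possibly, $\{1\}$, for each $\alpha \in [0, 2\pi)$ the nonzero spectrum of $\mathcal{L}_0^{n_0}|_{X^\alpha}$ is the same (in particular, consists of eigenvalues with the same finite algebraic multiplicities) as the nonresonant spectrum of $\mathcal{L}^{n_0}$ on $H$. For each $\lambda \in \mathrm{sp}(\mathcal{L}^{n_0}) \setminus \{1, 0\}$, a bijection between any basis of the generalised eigenspace of $\mathcal{L}^{n_0}$ and a basis for that of $\mathcal{L}_0^{n_0}|_{X^\alpha}$ is given by:
$$\varphi \mapsto \varphi^\alpha = \sum_{k\in\mathbb{Z}^D} e^{i\alpha k}\,\varphi_0 \circ \sigma^k \in \mathcal{M}_\Theta, \qquad (2.6)$$
where $\varphi_0(x) = \varphi(x_0) \otimes_{\ell\in\mathbb{Z}^D\setminus\{0\}} h(x_\ell)$. More precisely, the projective families defined by (2.6) should be understood as follows. For each finite $\Lambda \subset \mathbb{Z}^D$ we set $(\varphi^\alpha)|_\Lambda = \sum_{k\in\Lambda} e^{i\alpha k}\varphi_{k,\Lambda}$, where $\varphi_{k,\Lambda}(x) = \varphi(x_k) \otimes_{\ell\in\Lambda\setminus\{k\}} h(x_\ell)$. (Note that $\int_{S^1} \varphi\, dm = 0$ and $\sup_\Lambda \sup |(\varphi^\alpha)|_\Lambda|/|\Lambda| < \infty$.)

Proof of Lemma 1. First note that, since a spectral projector $\Pi^\alpha_D$ for $\mathcal{L}_0^{n_0}|_{X^\alpha}$ is just the restriction to $X^\alpha$ of the corresponding spectral projector for $\mathcal{L}_0^{n_0}$, the last statement of Theorem 0 (2) implies that the image of $\Pi^\alpha_D$ is a vector space of generalised eigenvectors in $X^\alpha$. Fix a nonresonant eigenvalue $\lambda \ne 1$ for $\mathcal{L}^{n_0}$ and let $\varphi$ be an element of a basis of a generalised eigenspace for the compact operator $\mathcal{L}^{n_0}$ and $\lambda$, in particular $(\mathcal{L}^{n_0} - \lambda)^{\ell}\varphi = 0$ for some $\ell \ge 1$ and $\lambda$. Obviously, $\varphi^\alpha \in X^\alpha$ defined by (2.6) is a generalised eigenvector of $\mathcal{L}_0^{n_0}$ for $\lambda$, and $\lambda \notin \mathcal{R}$ by the choice of $\lambda$. To check that the image of a linearly independent set by the map (2.6) is a linearly independent set, use that $\sum_i \upsilon_i\varphi_i^\alpha = 0$ implies $\sum_i \upsilon_i\,\pi_{\{0\}}\varphi_i^\alpha = \sum_i \upsilon_i\varphi_i = 0$. To check that the image of (2.6) spans the generalised eigenspace for $\mathcal{L}_0^{n_0}|_{X^\alpha}$ and $\lambda$, first note that if $\psi^\alpha$ is a generalised eigenvector for $\mathcal{L}_0^{n_0}|_{X^\alpha}$ and $\lambda \notin \mathcal{R}$, then there is some nonempty finite $\Lambda$ so that $\psi^\alpha_\Lambda \ne 0$ is a generalised eigenfunction for $\mathcal{L}_\Lambda^{n_0}$ and $\lambda$. The tensor product expression for $\mathcal{L}_\Lambda^{n_0}$ and elementary algebra imply that there are $\upsilon_{t,p} \in \mathbb{C}$, for $t = 1, \dots, m$ and $p \in \Lambda$, with
$$\psi^\alpha_\Lambda = \sum_{p\in\Lambda} \sum_{t=1}^{m} \upsilon_{t,p}\,\hat{\psi}_t(x_p) \otimes_{k\in\Lambda\setminus\{p\}} h(x_k),$$
where $\{\hat{\psi}_s,\ s = 1, \dots, m\}$ is a basis of the $m$-dimensional space of generalised eigenfunctions for $(\mathcal{L}^{n_0}, \lambda)$ (observe that $m = m_\lambda$ does not depend on $\Lambda$). Now, since $\psi^\alpha \in X^\alpha$, linear independence yields $\upsilon_{t,p} = \upsilon_{t,q}\,e^{i\alpha(p-q)}$ so that, writing $\upsilon_t = \upsilon_{t,p_0}$ for some arbitrarily chosen $p_0 \in \Lambda$, we find $\psi^\alpha = \sum_{t=1}^{m} \upsilon_t\,\hat{\psi}_t^\alpha \circ \sigma_{p_0}$. □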
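A finite-dimensional analogue may help to visualise Lemma 1. The sketch below is an illustration under explicit assumptions and is not the construction of the paper: the lattice is replaced by a periodic chain of `N` sites, each site carries a two-dimensional space spanned by stand-ins for $h$ and $\varphi$, the single-site operator is the hypothetical diagonal matrix `A` with eigenvalues 1 and `lam`, and the momenta are restricted to $\alpha = 2\pi m/N$. In that toy setting the eigenvalue `lam` of the uncoupled tensor-product operator has multiplicity `N`, while each Bloch combination $\sum_k e^{i\alpha k}\varphi_k$ is a joint eigenvector of the operator and of the shift.

```python
import itertools
import numpy as np

N = 4                       # number of sites of the periodic chain (assumption)
lam = 0.5                   # hypothetical nonresonant single-site eigenvalue
A = np.diag([1.0, lam])     # single-site operator in the basis (h, phi)

def kron_all(factors):
    """Tensor (Kronecker) product of a list of matrices, first factor = site 0."""
    out = np.array([[1.0]])
    for f in factors:
        out = np.kron(out, f)
    return out

def site_vector(k):
    """phi at site k tensored with h at every other site."""
    h, phi = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    v = np.array([1.0])
    for p in range(N):
        v = np.kron(v, phi if p == k else h)
    return v

def shift_matrix():
    """Cyclic shift: site p receives the state of site p + 1, so phi_k -> phi_{k-1}."""
    dim = 2 ** N
    S = np.zeros((dim, dim))
    for idx in itertools.product(range(2), repeat=N):
        shifted = idx[1:] + idx[:1]
        src = int("".join(map(str, idx)), 2)
        dst = int("".join(map(str, shifted)), 2)
        S[dst, src] = 1.0
    return S

def bloch_vector(m):
    """Finite analogue of (2.6) with momentum alpha = 2*pi*m/N."""
    alpha = 2.0 * np.pi * m / N
    return sum(np.exp(1j * alpha * k) * site_vector(k) for k in range(N))

T = kron_all([A] * N)       # uncoupled operator on the whole chain
S = shift_matrix()

multiplicity = int(np.sum(np.isclose(np.diag(T), lam)))
print("multiplicity of lam in the full space:", multiplicity)
for m in range(N):
    v = bloch_vector(m)
    alpha = 2.0 * np.pi * m / N
    print(m, np.allclose(T @ v, lam * v), np.allclose(S @ v, np.exp(1j * alpha) * v))
```

The `N`-fold degeneracy of `lam` is resolved by the `N` distinct momenta, one Bloch vector per momentum sector, which is the finite counterpart of the finite multiplicities obtained after restricting to $X^\alpha$.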
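Recall that $\mathrm{sp}(\mathcal{L}_\epsilon^{n_0}) \subset \{1\} \cup \{|z| < \eta\}$ with 1 a simple translationally invariant eigenvalue. The key result of this work is the following description of part of the spectrum of $\mathcal{L}_\epsilon^{n_0}|_{X^\alpha}$ (see also the remark after the statement, and note that similar properties of the spectrum of $\mathcal{L}_\epsilon^{n}$ on $X^\alpha$ for $n > n_0$ may easily be formulated and proved):

Theorem 1. Let $\mathcal{D} \subset \{z \mid |z| < 1\}$ be a complex neighbourhood of a finite subset of the nonresonant spectrum $\mathrm{sp}(\mathcal{L}_0^{n_0}) \setminus R$ of $\mathcal{L}_0^{n_0}$ such that $\mathcal{D} \cap R = \emptyset$. There is $C > 0$ so that for each small enough coupling strength $0 < \epsilon < \epsilon_0$:

(1) For every $\alpha \in [0, 2\pi)$, the spectrum $\mathrm{sp}(\mathcal{L}_\epsilon^{n_0}|_{X^\alpha}) \cap \mathcal{D}$ consists of finitely many eigenvalues $\{\lambda^\epsilon_\ell(\alpha),\ \ell \in E(\alpha)\}$ of finite multiplicity. To each $\lambda_j \in \mathcal{D} \cap \mathrm{sp}(\mathcal{L}_0^{n_0})$ is associated a finite set $\{\lambda^\epsilon_\ell(\alpha),\ \ell \in E_j(\alpha)\}$ so that the $E_j(\alpha)$ form a partition of $E(\alpha)$, and the sum of the algebraic multiplicities of the $\lambda^\epsilon_\ell(\alpha)$ for $\ell \in E_j(\alpha)$ coincides with the algebraic multiplicity $M_j$ of $\lambda_j$ for $\mathcal{L}^{n_0}$. Furthermore, $|\lambda^\epsilon_\ell(\alpha) - \lambda_j| \to 0$, uniformly, as $\epsilon \to 0$ for all $\ell \in E_j(\alpha)$. Finally, if $\varphi^\alpha$ is associated to $\varphi$ (with $(\lambda_j - \mathcal{L}^{n_0})^k \varphi = 0$) as in (2.6) then there are $\varphi^\alpha_{\epsilon,\ell} \in X^\alpha$ with $(\lambda^\epsilon_\ell - \mathcal{L}_\epsilon^{n_0})^{k_\ell} \varphi^\alpha_{\epsilon,\ell} = 0$ so that $\|\sum_{\ell\in E_j(\alpha)} \varphi^\alpha_{\epsilon,\ell} - \varphi^\alpha\| \to 0$, uniformly, as $\epsilon \to 0$. More precisely, we have
$$|\lambda^\epsilon_\ell(\alpha) - \lambda_j| \le C \cdot \Big(\frac{C(\epsilon)}{C(\epsilon_0)}\Big)^{1/M_j}\ \ \forall \ell, \qquad \Big\| \sum_{\ell\in E_j(\alpha)} \varphi^\alpha_{\epsilon,\ell} - \varphi^\alpha \Big\| \le C \cdot \Big(\frac{C(\epsilon)}{C(\epsilon_0)}\Big)^{1/M_j}. \tag{2.7}$$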
(2) If the coupling decays exponentially, then for each $j$ the map
$$\alpha \mapsto \sum_{\ell\in E_j(\alpha)} \lambda^\epsilon_\ell(\alpha) \tag{2.8}$$
is analytic. If the coupling decays polynomially with exponent $P > D$, then each map (2.8) is $(P - D)$-times differentiable. Analogous statements, in the appropriate topology, hold for the (sums of) eigenfunctions and eigenfunctionals.

Theorem 1(1) is obtained through perturbation theory [Ka], and it will be proved in Sect. 3. Theorem 1(2) is proved in Sect. 4, using information from Sect. 3.

Remark (Joint eigenvalues vs. joint spectrum). Since there is no obvious spectral decomposition of $\sigma^*$ acting on the Banach space $\mathcal{M}$ (indeed, spectral projectors associated to the eigenvalues of $\sigma^*$ are not available), Theorem 1 does not give the full joint spectrum of the commuting pair $(\mathcal{L}_\epsilon^{n_0}, \sigma^*)$ on $\mathcal{M}$, even restricting to $\mathcal{D}$. We refer, e.g., to [MPR] for a list of definitions of joint spectrum of commuting operators in infinite dimension. If there are just two operators, say $\mathcal{L}_\epsilon^{n_0}$ and $\sigma^*$, the easiest to state is that of the Harte spectrum, given as follows: $(\nu, \lambda) \in \mathbb{C}^2$ belongs to the joint Harte spectrum of $(\mathcal{L}_\epsilon^{n_0}, \sigma^*)$ on $\mathcal{M}$ if at least one of the two equations
$$(\sigma^* - \nu) A_1 + (\mathcal{L}_\epsilon^{n_0} - \lambda) A_2 = \mathrm{Id}, \qquad A_1 (\sigma^* - \nu) + A_2 (\mathcal{L}_\epsilon^{n_0} - \lambda) = \mathrm{Id}$$
has no bounded operator solutions $A_1$, $A_2$. For commuting finite matrices $B_1$, $B_2$, one defines the set of joint eigenvalues to be those $(\nu, \lambda) \in \mathbb{C}^2$ so that there is a nonzero vector $x$ with $B_1 x = \nu x$ and $B_2 x = \lambda x$ (see, e.g., [BB]). Extending this definition to bounded linear operators, Theorem 1 indeed describes the intersection of $\mathcal{D} \times \mathbb{C} \subset \mathbb{C}^2$ with the set of joint eigenvalues of the commuting pair $(\mathcal{L}_\epsilon^{n_0}, \sigma^*)$ on $\mathcal{M}$.

3. Existence of Eigenvalues: Elementary Perturbation Theory

We formulate some abstract results in an axiomatic setting: Let $(X, \|\cdot\|)$ be a complex Banach space, and let $\{T_\epsilon,\ 0 \le \epsilon \le \epsilon_0\}$ be a family of bounded linear operators on $X$. We make the following two assumptions about $T_0$ and $T_\epsilon$ for $\epsilon \ge 0$: There is $\kappa_0 > 0$ so that the spectrum of $T_0$ decomposes as $\Sigma_0 \cup \Sigma_1$, with
$$\inf_{z_0\in\Sigma_0,\, z_1\in\Sigma_1} |z_0 - z_1| > \kappa_0, \tag{3.1}$$
$$\lim_{\epsilon\to 0} \|T_\epsilon - T_0\| = 0. \tag{3.2}$$
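The main result of this section follows:

Proposition 1 (Perturbation theory). Let $\{T_\epsilon,\ \epsilon \ge 0\}$ satisfy (3.1–3.2). Then there is $C > 1$ such that for all sufficiently small $\epsilon > 0$, there is a decomposition of $\mathrm{sp}(T_\epsilon)$ into $\Sigma_0^\epsilon \cup \Sigma_1^\epsilon$ such that
(1) $\inf_{z_0\in\Sigma_0^\epsilon,\, z_1\in\Sigma_1^\epsilon} |z_0 - z_1| > \kappa_0/3$.
(2) Let $\pi_0^\epsilon : X_0^\epsilon \oplus X_1^\epsilon \to X_0^\epsilon$ be the projection associated with the spectral decomposition of $T_\epsilon$. Then $\|\pi_0^0 - \pi_0^\epsilon\| \le C \cdot \|T_\epsilon - T_0\|$.
(3) Let $X_i$ be the image of $X$ by the spectral projector of $T_0$ corresponding to $\Sigma_i$. The invariant space $X_0^\epsilon$ may be written as the graph of a linear map $S_\epsilon : X_0 \to X_1$, with $\|S_\epsilon\| \le C \cdot \|T_\epsilon - T_0\|$.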
It follows from Proposition 1(2) that, if we assume additionally that
$$\dim(X_0) < \infty \quad \text{(in particular, $X_0$ is a generalised eigenspace)}, \tag{3.3}$$
then $\dim(X_0^\epsilon) = \dim(X_0)$ is finite for all small enough $\epsilon$. Clearly, statements (1) and (2) of Proposition 1 can be obtained from standard perturbation theory, see, e.g., [Ka, Sect. IV.3.5]. The third claim allows us to define $\mathcal{T}_\epsilon : X_0 \to X_0$ by
$$\mathcal{T}_\epsilon(x) = \pi_0^0 \circ T_\epsilon(x + S_\epsilon(x)). \tag{3.4}$$
The map $\mathcal{T}_\epsilon$ will be convenient to show (2.7) in Theorem 1(1) (see the following Proposition 2), and also to prove Theorem 1(2) (see Sect. 4).

Proposition 2 (Finite-dimensional perturbation). Assume that $\{T_\epsilon,\ \epsilon \ge 0\}$ satisfies (3.1–3.3). Let $M$ denote the maximum algebraic multiplicity of $\lambda \in \mathrm{sp}(T_0|_{X_0})$. Then there is $C > 0$ so that for all small enough $\epsilon$:
(1) For each $z_\epsilon \in \mathrm{sp}(T_\epsilon|_{X_0^\epsilon})$ we have $\inf_{z_0\in\mathrm{sp}(T_0|_{X_0})} |z_0 - z_\epsilon| \le C \cdot \|T_\epsilon - T_0\|^{1/M}$.
(2) If $x_j \in X_0$ is an eigenvector for $T_0$ with eigenvalue $\lambda_j$ and algebraic multiplicity $M_j$, then $T_\epsilon$ has at most $M_j$ eigenvectors $x^\epsilon_{j,\ell} \in X_0^\epsilon$ with eigenvalues $\lambda^\epsilon_{j,\ell}$ such that
$$|\lambda^\epsilon_{j,\ell} - \lambda_j| \le C \cdot \|T_\epsilon - T_0\|^{1/M_j}\ \ \forall \ell, \qquad \Big\| \sum_\ell x^\epsilon_{j,\ell} - x_j \Big\| \le C \cdot \|T_\epsilon - T_0\|^{1/M_j}.$$
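The fractional exponent $1/M$ in Proposition 2 can be seen on the simplest possible example. The following is a minimal numerical sketch, not part of the original argument: it assumes a single Jordan block of size $M$ as $T_0$ and a hypothetical rank-one corner perturbation of operator norm $\epsilon$, for which the eigenvalues move by exactly $\epsilon^{1/M}$. The function names are illustrative only.

```python
import numpy as np

def jordan_block(lam, M):
    """M x M Jordan block: eigenvalue lam with algebraic multiplicity M."""
    return lam * np.eye(M) + np.diag(np.ones(M - 1), k=1)

def max_eigenvalue_shift(lam, M, eps):
    """Largest |z - lam| over the eigenvalues z of the perturbed block."""
    T0 = jordan_block(lam, M)
    E = np.zeros((M, M))
    E[M - 1, 0] = eps  # illustrative corner perturbation, operator norm eps
    return float(np.max(np.abs(np.linalg.eigvals(T0 + E) - lam)))

if __name__ == "__main__":
    # Compare the observed shift with eps**(1/M) for M = 3.
    for eps in (1e-3, 1e-6):
        print(eps, max_eigenvalue_shift(0.5, 3, eps), eps ** (1.0 / 3))
```

For this particular perturbation the characteristic equation reduces to $(z - \lambda)^M = \pm\epsilon$, so the bound of Proposition 2(1) is attained with $C = 1$; other perturbations of the same size may move the eigenvalues less.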
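Proof of Theorem 1 (1). Taking $X = X^\alpha$ for some $\alpha \in [0, 2\pi)$ and applying Theorem 0 and Lemma 1, Assumptions (3.1–3.3) are satisfied for $T_0 = \mathcal{L}_0^{n_0}|_{X^\alpha}$ and $T_\epsilon = \mathcal{L}_\epsilon^{n_0}|_{X^\alpha}$. Proposition 2 gives (2.7) while Proposition 1 implies all other claims. □

We now prove Propositions 1 and 2. We let $\pi_i : X \to X_i$ ($i = 0, 1$) denote the projections on $X_i$, i.e., $x = \pi_0(x) + \pi_1(x) \in X_0 \oplus X_1$.

Proof of Proposition 1. (1) Let $\lambda \in \mathbb{C}$ satisfy $d(\lambda, \mathrm{sp}(T_0)) > \kappa_0/4$. To show that $\lambda \notin \mathrm{sp}(T_\epsilon)$ (if $\epsilon$ is small enough) it suffices to prove that the resolvent $R(\lambda, T_\epsilon)$ is a bounded operator. If the following sum converges it coincides with the resolvent (see, e.g., [BY, (2.1)]):
$$R(\lambda, T_\epsilon) = \sum_{j=0}^{\infty} \big[ R(\lambda, T_0) \circ (T_\epsilon - T_0) \big]^j \cdot R(\lambda, T_0). \tag{3.5}$$
It is easily seen that $\sigma_0 := \sup_{\{\lambda \,\mid\, d(\lambda, \mathrm{sp}(T_0)) > \kappa_0/4\}} \|R(\lambda, T_0)\| < \infty$. Therefore, it suffices to take $\epsilon$ small enough so that $\|T_\epsilon - T_0\| < 1/\sigma_0$. Setting
$$\Sigma_0^\epsilon = \{z \in \sigma(T_\epsilon) \mid d(z, \Sigma_0) < \kappa_0/3\}, \qquad \Sigma_1^\epsilon = \{z \in \sigma(T_\epsilon) \mid d(z, \Sigma_1) < \kappa_0/3\},$$
our Assumption (3.1) allows us to conclude.

(2) Let $\mathcal{D} \subset \mathbb{C}$ contain $\Sigma_0 \cup \Sigma_0^\epsilon$, be disjoint from a $\kappa_0/3$ neighbourhood of $\{0\} \cup \Sigma_1 \cup \Sigma_1^\epsilon$, and be such that $\partial\mathcal{D}$ is a union of rectifiable curves with finite total length (such a domain $\mathcal{D}$ exists by (1)). Then we have
$$\pi_0 = \frac{1}{2i\pi} \oint_{\partial\mathcal{D}} R(\lambda, T_0)\, d\lambda, \qquad \pi_0^\epsilon = \frac{1}{2i\pi} \oint_{\partial\mathcal{D}} R(\lambda, T_\epsilon)\, d\lambda.$$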
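We may thus estimate $\|\pi_0^\epsilon - \pi_0\|$ by
$$\|\pi_0^\epsilon - \pi_0\| \le \frac{1}{2\pi} \Big\| \oint_{\partial\mathcal{D}} \big( R(\lambda, T_0) - R(\lambda, T_\epsilon) \big)\, d\lambda \Big\| \le \frac{1}{2\pi}\, \mathrm{length}(\partial\mathcal{D}) \max_{\lambda\in\partial\mathcal{D}} \|R(\lambda, T_0) - R(\lambda, T_\epsilon)\|. \tag{3.6}$$
Using (3.5), we find
$$\|R(\lambda, T_0) - R(\lambda, T_\epsilon)\| \le \sum_{j=1}^{\infty} \|R(\lambda, T_0)\|^{j+1} \cdot \|T_\epsilon - T_0\|^{j}.$$
Statement (2) immediately follows from (3.2).

(3) To prove the last claim, take $\epsilon$ small and $x$ in $X_0^\epsilon$. Since $\|x - \pi_0(x)\| \le \|\pi_0^\epsilon - \pi_0\| \|x\|$, it follows that if $x = (x_0, x_1) \in X_0 \oplus X_1$, then $\|x_1\| \ll \|x_0\|$. This inequality implies in particular that $\pi_0$ is injective on $X_0^\epsilon$ so that $S_\epsilon$ is well defined. The estimate on $S_\epsilon$ follows from
$$\|x_1\| = \|x - \pi_0(x)\| \le C \|T_\epsilon - T_0\| \|x\| \le C\, \frac{\|T_\epsilon - T_0\|}{1 - C \|T_\epsilon - T_0\|}\, \|x_0\|. \qquad □$$

Proof of Proposition 2. Using the map $\mathcal{T}_\epsilon : X_0 \to X_0$ from (3.4), we have for $x \in X_0$ with $\|x\| = 1$,
$$\|\mathcal{T}_\epsilon(x) - T_0(x)\| \le \|\pi_0\| \cdot \big( \|T_\epsilon(x) - T_0(x)\| + \|T_\epsilon(S_\epsilon(x))\| \big) \le C \cdot \|\pi_0\| \cdot \big( \|T_\epsilon - T_0\| + \|T_\epsilon\| \cdot \|T_\epsilon - T_0\| \big). \tag{3.7}$$
The assertions in Proposition 2 follow immediately. (See, e.g., [W, Chap. 2] and [Ka, II.5].) □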
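The two tools used in the proof of Proposition 1, the Neumann series (3.5) for the perturbed resolvent and the contour-integral formula for the spectral projectors, can be checked numerically on a small matrix. The sketch below is only an illustration under stated assumptions: the $4\times 4$ matrix `T0`, the perturbation, the circle standing in for $\partial\mathcal{D}$, and all function names are hypothetical choices, and the integral is approximated by the trapezoid rule.

```python
import numpy as np

def resolvent(z, T):
    """R(z, T) = (z - T)^{-1}, the convention used in the text."""
    d = T.shape[0]
    return np.linalg.solve(z * np.eye(d) - T, np.eye(d))

def neumann_resolvent(z, T0, T_eps, n_terms=20):
    """Truncation of the series (3.5): sum_j [R(z,T0)(T_eps - T0)]^j R(z,T0)."""
    R0 = resolvent(z, T0)
    step = R0 @ (T_eps - T0)
    term = R0.astype(complex)
    total = np.zeros_like(term)
    for _ in range(n_terms):
        total += term
        term = step @ term
    return total

def riesz_projector(T, center, radius, n_nodes=400):
    """Trapezoid-rule value of (1/(2 pi i)) * integral of R(z, T) dz over a circle."""
    d = T.shape[0]
    P = np.zeros((d, d), dtype=complex)
    for t in 2 * np.pi * np.arange(n_nodes) / n_nodes:
        z = center + radius * np.exp(1j * t)
        dz = 1j * radius * np.exp(1j * t) * (2 * np.pi / n_nodes)
        P += resolvent(z, T) * dz
    return P / (2j * np.pi)

# Toy data (assumed): Sigma_0 = {0.5} (a Jordan pair), Sigma_1 = {-0.3, 0.1}.
T0 = np.array([[0.5, 1.0, 0.0, 0.0],
               [0.0, 0.5, 0.0, 0.0],
               [0.0, 0.0, -0.3, 1.0],
               [0.0, 0.0, 0.0, 0.1]])
eps = 1e-4
T_eps = T0 + eps * np.ones((4, 4)) / 4.0   # perturbation of operator norm eps

P0 = riesz_projector(T0, 0.5, 0.2)         # unperturbed projector onto X_0
P_eps = riesz_projector(T_eps, 0.5, 0.2)   # perturbed projector

if __name__ == "__main__":
    print("rank of P0    :", round(np.trace(P0).real))
    print("rank of P_eps :", round(np.trace(P_eps).real))
    print("||P_eps - P0|| :", np.linalg.norm(P_eps - P0, 2))
```

In this toy example the rank of the projector is unchanged by the perturbation and the projectors differ by a quantity of order $\epsilon$, in line with statement (2); the circle of radius 0.2 around 0.5 is one admissible choice of contour among many.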
4. Smoothness of Eigenvalue Curves

Returning now to the setting of Sect. 2, we show here how the information from Sect. 3 can be combined with further crucial quantitative bounds on $\mathcal{L}_\epsilon^{n_0}$ (Proposition 3) to show Theorem 1(2). Recall from Sect. 2 that the distance $d(p, q)$ in the definition of the spatial coupling strength is either the Euclidean metric (exponential decay with rate $\xi < 1$) or the renormalized Euclidean metric (polynomial decay with exponent $P = \log(1/\xi)$) on the lattice.

Proposition 3. Let $n_0$ and $\epsilon_0$ be given by Theorem 0. Let $\Pi_0$ be a spectral projector for a subset of a disc of radius strictly less than 1 in the spectrum of $\mathcal{L}_0^{n_0}$. Then there is $C > 0$ so that for each $\epsilon < \epsilon_0$ and $k \in \mathbb{Z}^D$, writing $\Pi_0^\epsilon$ for the corresponding spectral projector in the spectrum of $\mathcal{L}_\epsilon^{n_0}$, and setting $Z_k = \{\varphi \in \mathcal{M}_\theta \mid \pi_\Lambda \varphi = 0,\ \forall \Lambda \text{ s.t. } k \notin \Lambda\}$, we have
$$\max\Big( \big\| \pi_{\{0\}} \circ (\Pi_0^\epsilon)|_{Z_k} \big\|,\ \big\| \pi_{\{0\}} \circ (\mathcal{L}_\epsilon^{n_0} \circ \Pi_0^\epsilon)|_{Z_k} \big\| \Big) \le C\, \xi^{d(0,k)}. \tag{4.1}$$
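Proof of Proposition 3. To simplify notation, we write $\mathcal{L}_0$ and $\mathcal{L}_\epsilon$ instead of $\mathcal{L}_0^{n_0}$ and $\mathcal{L}_\epsilon^{n_0}$ in this proof. We shall concentrate on bounding the second expression in the max; a simpler version of our arguments gives the other estimate. We shall represent $\Pi_0^\epsilon$ as a suitable contour integral $\oint_{\partial\mathcal{D}} \cdot\, d\lambda$ of the resolvent $R(\lambda, \mathcal{L}_\epsilon)$ and use (3.5) again:
$$R(\lambda, \mathcal{L}_\epsilon) = \sum_{j=0}^{\infty} \big[ R(\lambda, \mathcal{L}_0) \circ (\mathcal{L}_\epsilon - \mathcal{L}_0) \big]^j \circ R(\lambda, \mathcal{L}_0).$$
The resolvent $R(\lambda, \mathcal{L}_0)$ preserves $Z_k$, and we shall in fact show that for $j = 0, 1, \ldots$:
$$\big\| \pi_{\{0\}} \circ \mathcal{L}_\epsilon \circ \big( R(\lambda, \mathcal{L}_0) \circ (\mathcal{L}_\epsilon - \mathcal{L}_0) \big)^j |_{Z_k} \big\| \le \theta^{-1} \gamma^{-n_0}\, \xi^{d(0,k)}\, \big( \sigma_0 \gamma^{-n_0} (C(\epsilon)/C(\epsilon_0)) \big)^j, \tag{4.2}$$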
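where $\sigma_0 = \sup_{\lambda\in\partial\mathcal{D}} \|R(\lambda, \mathcal{L}_0)\|$ and $\gamma > 1$ is again the constant from TR in [R, Definition 4.18]. Let us first consider $j = 1$. For $\Lambda$ a finite subset of our lattice, we consider the operator $\pi_\Lambda \circ R(\lambda, \mathcal{L}_0) \circ (\mathcal{L}_\epsilon - \mathcal{L}_0)$. By Lemma A.1 from the appendix, we may write $\pi_\Lambda \circ R(\lambda, \mathcal{L}_0) = R(\lambda, \mathcal{L}_\Lambda) \circ \pi_\Lambda$, where $R(\lambda, \mathcal{L}_\Lambda)$ is the (bounded) resolvent of $\mathcal{L}_\Lambda$ acting on $\mathcal{H}_\Lambda$. The operator $\mathcal{L}_\epsilon$ is defined in [R, Sect. 4.7] through a sum of configurational operators $\pi_\Lambda \circ \mathcal{L}_\epsilon = \sum_{C\in\mathcal{C}[\Lambda, n_0]} \mathcal{L}_\epsilon[C] : \mathcal{M} \to \mathcal{H}_\Lambda$. We may therefore write:
$$\pi_\Lambda \circ R(\lambda, \mathcal{L}_0) \circ (\mathcal{L}_\epsilon - \mathcal{L}_0) = R(\lambda, \mathcal{L}_\Lambda) \circ \sum_{C\in\mathcal{C}^*[\Lambda, n_0]} \mathcal{L}_\epsilon[C],$$
where $\mathcal{C}^*[\Lambda, n_0]$ represents the configurations which do not consist only of initial-leaves and end-leaves (see the proof of Theorem 0 (2) in the Appendix). The bounds on the configurational operators (see also the appendix) imply furthermore that
$$\sum_{C\in\mathcal{C}^*[\Lambda, n_0]} \big\| R(\lambda, \mathcal{L}_\Lambda) \circ \mathcal{L}_\epsilon[C] \big\| \le \frac{C(\epsilon)}{C(\epsilon_0)}\, \|R(\lambda, \mathcal{L}_0)\|\, \theta^{-|\Lambda|}.$$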
The proof of this bound is perhaps slightly trickier than it looks. The reason is that the a priori bounds for $R(\lambda, \mathcal{L}_\Lambda)$ are useless here. Instead one should use the same expansion as in the proof of Lemma A.1, integrate each term in the resolvent separately (i.e., each term in $U_\Lambda^{<m}$ and $U_\Lambda^{\ge m}$), and combine with the configurational expansion of $\mathcal{L}_\epsilon - \mathcal{L}_0$. The terms in this expansion of the projector involve in particular projections to subsets $J$ of $\Lambda$. Therefore, when composing with a configurational operator $\mathcal{L}_\epsilon[C]$ such a term vanishes unless $C$ is a configuration over $J$ (we refer to [R] for the terminology). Summing up what remains yields the above inequality.

When acting on the kernel $\mathrm{Ker\,Leb} = \{\varphi \mid \pi_\emptyset \varphi = 0\}$ of the Lebesgue measure and taking the spatial decay of the couplings into account, more can be said. First, the “size” [R, Def. 4.11, Lemma 4.25] of each configuration must be at least $n_0$ in order to get a non-vanishing contribution. Second, we may introduce an “interaction-radius”, $\mathrm{rad}(C)$, of a configuration $C$: First, we map $C$ into a collection of trees $y_p$, $p \in \Lambda$ [R, Sect. 4.3, Def. 4.9]. A branching (in a tree) of a point $q$ into a set $K$ is associated with the interaction-radius $\mathrm{rad}(q, K) = \max\{d(q, r) \mid r \in K \cup \{q\}\}$, and we define $\mathrm{rad}(C)$ to be the sum of interaction-radii over all branchings and all trees.
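As in the proofs of temporal [R, Lemma 4.25] and spatial [R, Sect. 5.1] decay, we then obtain the bound
$$\sum_{C\in\mathcal{C}^*[\Lambda, n_0]} \big\| R(\lambda, \mathcal{L}_\Lambda) \circ \mathcal{L}_\epsilon[C]|_{\mathrm{Ker\,Leb}} \big\|\, \xi^{-\mathrm{rad}(C)} \cdot \gamma^{n_0} \le \frac{C(\epsilon)}{C(\epsilon_0)}\, \|R(\lambda, \mathcal{L}_0)\|\, \theta^{-|\Lambda|}.$$
We wish to iterate this type of argument when taking powers $j \ge 2$ of the operator difference composed with the resolvent. (The case $j = 0$ is easier and left to the reader.) In order to do so, we note that both $\mathcal{L}_\epsilon - \mathcal{L}_0$ and $R(\lambda, \mathcal{L}_0)$ map the kernel of Lebesgue measure into itself ([R, Lemma 4.25] and the fact that no projector involved includes the eigenvalue 1). In the product $\pi_\Lambda \circ \mathcal{L}_\epsilon \circ (R(\lambda, \mathcal{L}_0) \circ (\mathcal{L}_\epsilon - \mathcal{L}_0))^j$, we introduce the configurational expansion in each factor $\mathcal{L}_\epsilon - \mathcal{L}_0$. We then obtain a sequence of finite subsets $\Lambda_0 \equiv \Lambda \subset \Lambda_1 \subset \cdots \subset \Lambda_j$ and configurations $C_i \in \mathcal{C}^*[\Lambda_i, n_0]$, $i = 0, \ldots, j - 1$, where each configuration expands $\Lambda_i$ into $\Lambda_{i+1}$. It is important here to notice that each occurrence of $R(\lambda, \mathcal{L}_0)$ does not expand a given finite subset (since $\pi_K \circ R(\lambda, \mathcal{L}_0) = R(\lambda, \mathcal{L}_K) \circ \pi_K$). When acting on $Z_k$ we note that non-vanishing contributions can only occur provided $k \in \Lambda_j$, i.e., the very last expansion of the original set $\Lambda$ has to contain the distinguished point $k$. This, in turn, implies that the sequence of configurations must include a sequence of “trees” which “connects” by a path (of branchings) the point $k$ with a point in $\Lambda$. In the product we therefore obtain factors of $\xi$ raised to the power $\mathrm{rad}(C_0) + \cdots + \mathrm{rad}(C_{j-1}) \ge d(k, \Lambda)$. This finally implies
$$\big\| \pi_\Lambda \circ \mathcal{L}_\epsilon \circ \big( R(\lambda, \mathcal{L}_0) \circ (\mathcal{L}_\epsilon - \mathcal{L}_0) \big)^j |_{Z_k} \big\| \le \theta^{-|\Lambda|} \gamma^{-n_0} \big( \sigma_0 \gamma^{-n_0} (C(\epsilon)/C(\epsilon_0)) \big)^j\, \xi^{d(k,\Lambda)}.$$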
The claim (4.2) now follows if for the set $\Lambda$ we take the origin of our lattice. To end the proof of Proposition 3, just note that $\sum_j (\sigma_0 \gamma^{-n_0} C(\epsilon)/C(\epsilon_0))^j$ remains uniformly bounded for small $\epsilon$. □

Proof of Theorem 1 (2). To simplify notation, we write $\mathcal{L}_0$ and $\mathcal{L}_\epsilon$ instead of $\mathcal{L}_0^{n_0}$ and $\mathcal{L}_\epsilon^{n_0}$ in this proof. We shall use the notation from Sect. 3, taking again $T_0 = \mathcal{L}_0|_{X^\alpha}$, $T_\epsilon = \mathcal{L}_\epsilon|_{X^\alpha}$, and $X = X^\alpha$ with the norm induced by $\mathcal{M}$, writing also $X_0^\alpha = X_0$, $X_1^\alpha = X_1$. Note that $\pi_0 = \pi_0^\alpha$ and $\pi_1 = \pi_1^\alpha$ are just the restrictions to $X^\alpha$ of the spectral projections $\Pi_0$, $\Pi_1$ of $\mathcal{L}_0 : \mathcal{M} \to \mathcal{M}$ associated to the partition $\mathcal{D} \cup (\mathbb{C} \setminus \mathcal{D})$. Theorem 0 (2) guarantees that the image of $\pi_0^\alpha$ is a generalised eigenspace, and since $\mathcal{D}$ is disjoint from a neighbourhood of $R$, Lemma 1 says that this eigenspace is finite-dimensional and in bijection with the eigenspace associated to $\mathcal{L}^{n_0}$ and $\mathcal{D}$. Using also Theorem 0 (3), we see that Conditions (3.1–3.3) are satisfied. Note also that $\|\pi_i^\alpha\| \le \|\Pi_i\|$ for $i = 0, 1$ and all $\alpha$.

We work with the finite-dimensional operator $\mathcal{T}_\epsilon = \mathcal{T}_\epsilon(\alpha)$ from (3.4). Since $\mathcal{T}_\epsilon(x_0) = \pi_0 T_\epsilon(x_0 + S_\epsilon(x_0)) = \lambda x_0$ if $x_0 + S_\epsilon(x_0)$ is an eigenvector of $T_\epsilon$ for the eigenvalue $\lambda$, the relevant eigenvalues $\lambda^\alpha_\epsilon$ of $\mathcal{L}_\epsilon|_{X^\alpha}$ are just the eigenvalues of any finite matrix representing $\mathcal{T}_\epsilon(\alpha)$ on the finite-dimensional space $X_0^\alpha$. Therefore, to prove our claim it suffices (by classical results, see, e.g., [W, Chapter 2], [DS, Theorem VII.6.9]) to check that the coefficients of such a matrix depend analytically (respectively differentiably) on $\alpha \in [0, 2\pi)$. By Lemma 1, these coefficients may be indexed by the following finite basis of generalised eigenvectors of $\mathcal{L}_0$ in $X^\alpha$ (for eigenvalues in $\mathcal{D}$):
$$\varphi_q^\alpha = \sum_{k\in\mathbb{Z}^D} e^{i\alpha k}\, \varphi_q \circ \sigma^k, \qquad \varphi_q(z) = \hat\varphi_q(z_0) \otimes_{\ell\in\mathbb{Z}^D\setminus\{0\}} h(z_\ell), \tag{4.3}$$
with $\{\hat\varphi_q\}$ a basis of generalised eigenvectors for $\mathcal{L}$ for eigenvalues in $\mathcal{D}$.
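Just like for the more precise definition for (2.6) given after Lemma 1, the expression in (4.3) means that for each $\Lambda \subset \mathbb{Z}^D$ we define $(\varphi_q^\alpha)|_\Lambda = \sum_{k\in\Lambda} e^{i\alpha k} \varphi_{q,k,\Lambda}$ with $\varphi_{q,k,\Lambda}(z) = \hat\varphi_q(z_k) \otimes_{\ell\in\Lambda\setminus\{k\}} h(z_\ell)$ (note that $\int_{S^1} \hat\varphi_q\, dm = 0$). (Recall also that we in fact restrict to $D = 1$ for simplicity; the extension to the general case is straightforward, with $\alpha$ in a $D$-dimensional torus and using that $1/(x^t (1 + x)^P)$ is integrable on $\mathbb{R}^D$ if and only if $t > D - P$.) To compute these coefficients, we may also use a finite basis of generalised eigenfunctionals of $\mathcal{L}_0^*$ in $(X^\alpha)^*$:
$$\nu_n = \hat\nu_n \otimes_{k\in\mathbb{Z}^D\setminus\{0\}} d\mathrm{Leb}, \qquad \{\hat\nu_n\} \text{ a basis of eigenvectors for } \mathcal{L}^* \text{ for eigenvalues in } \mathcal{D}.$$
Note that Proposition 1 may be applied to the spectral decomposition of $\mathcal{L}_0^{n_0}$ associated to $\mathcal{D} \cup (\mathbb{C} \setminus \mathcal{D})$ on $\mathcal{M}$, giving projectors $\Pi_i^\epsilon$ for $\mathcal{L}_\epsilon$. Recall from Sect. 3 that $S_\epsilon = S_\epsilon(\alpha) : X_0^\alpha \to X_1^\alpha$ is such that $X_0^{\alpha,\epsilon} = \Pi_0^\epsilon(X_0^\alpha) = X_0^\alpha + S_\epsilon(X_0^\alpha)$. Now, the operator $\pi_0$ is invertible when acting on the finite-dimensional space $\pi_0^\epsilon(X_0^\alpha)$. (In fact, $\pi_0(\pi_0^\epsilon x) = \pi_0^\epsilon(x) + (\pi_0 - \pi_0^\epsilon)(\pi_0^\epsilon x)$ so that $\pi_0|_{\pi_0^\epsilon(X_0^\alpha)}$ is a small perturbation of the identity.) Thus, we may define $Q_\epsilon = Q_\epsilon(\alpha) : X_0^\alpha \to \pi_0^\epsilon(X_0^\alpha)$ by
$$Q_\epsilon(\alpha) = \mathrm{Id} + S_\epsilon(\alpha) = \big( \pi_0|_{\pi_0^\epsilon(X_0^\alpha)} \big)^{-1}. \tag{4.4}$$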
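For each fixed $\alpha$, the coefficient $\mathcal{T}_{\epsilon,nq}(\alpha)$ of the matrix of $\mathcal{T}_\epsilon$ in the chosen bases can then be expressed as:
$$\mathcal{T}_{\epsilon,nq}(\alpha) = \nu_n \Big( \mathcal{L}_\epsilon Q_\epsilon \sum_{k\in\mathbb{Z}^D} e^{i\alpha k}\, \varphi_q \circ \sigma^k \Big). \tag{4.5}$$
Denoting by $Q^\epsilon_{rq}(\alpha)$ the coefficients of the matrix of $Q_\epsilon$ in the bases given by (4.3) and its image under $\pi_0^\epsilon$, we finally get (the sum over $r$ is finite)
$$\mathcal{T}_{\epsilon,nq}(\alpha) = \nu_n \Big( \mathcal{L}_\epsilon \sum_r Q^\epsilon_{rq}(\alpha)\, \pi_0^\epsilon \sum_{\ell\in\mathbb{Z}^D} e^{i\alpha\ell} (\varphi_r \circ \sigma^\ell) \Big) = \sum_r Q^\epsilon_{rq}(\alpha) \sum_{\ell\in\mathbb{Z}^D} e^{i\alpha\ell}\, \nu_n (\mathcal{L}_\epsilon \Pi_0^\epsilon)(\varphi_r \circ \sigma^\ell).$$
Formally, we may write the derivatives of the interior sum over $\ell$ as:
$$\frac{d^s}{d\alpha^s} \sum_{\ell\in\mathbb{Z}^D} e^{i\alpha\ell}\, \nu_n (\mathcal{L}_\epsilon \Pi_0^\epsilon)(\varphi_r \circ \sigma^\ell) = \sum_{\ell\in\mathbb{Z}^D} (i\ell)^s e^{i\alpha\ell}\, \nu_n (\mathcal{L}_\epsilon \Pi_0^\epsilon)(\varphi_r \circ \sigma^\ell).$$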
By definition (see also the remarks after (4.3)), $\varphi_r \in Z_0$ so that $\varphi_r \circ \sigma^\ell \in Z_\ell$. The upper bounds given by Proposition 3,
$$|\nu_n (\mathcal{L}_\epsilon \Pi_0^\epsilon)(\varphi_r \circ \sigma^\ell)| \le C \cdot \begin{cases} \xi^{|\ell|}\, \|\varphi_r\|, & \text{for exponential coupling}, \\ (1 + |\ell|)^{-P}\, \|\varphi_r\|, & \text{for polynomial coupling}, \end{cases}$$
are good enough for our purposes. By the Leibniz formula, it only remains to prove that each coefficient map $\alpha \mapsto Q^\epsilon_{rq}(\alpha)$ is analytic (respectively $P$ times differentiable). It is equivalent to prove the same claim for the coefficients $(Q_\epsilon^{-1})_{qr}(\alpha)$ of the inverse map $Q_\epsilon(\alpha)^{-1} = \pi_0|_{\pi_0^\epsilon(X_0^\alpha)}$. We have
$$\frac{d^s}{d\alpha^s} (Q_\epsilon^{-1})_{qr}(\alpha) = \sum_{k\in\mathbb{Z}^D} (ik)^s e^{i\alpha k}\, \nu_q \Pi_0 (\varphi_r \circ \sigma^k),$$
so that our task reduces to finding good bounds for $|\nu_q \Pi_0 (\varphi_r \circ \sigma^k)|$ for all $k \in \mathbb{Z}$. But this is again Proposition 3. □
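The mechanism of this proof, eigenvalue curves given by Fourier series in $\alpha$ whose coefficients inherit the spatial decay of the coupling, can be illustrated on a finite periodic lattice. The following sketch is a toy model and not the operator of the paper: it assumes a translation-invariant matrix on $\mathbb{Z}/N$ with hypothetical coupling profiles ($\xi = 0.5$ and $P = 3$), and the last function only checks the crude sufficient condition of absolute summability of the $s$-times differentiated series, which is not claimed to be the sharp criterion behind Theorem 1(2).

```python
import numpy as np

def ring_operator(coupling, N):
    """Matrix of (T v)[p] = sum_q c(p - q) v[q] on the periodic lattice Z/N."""
    idx = np.arange(N)
    diff = (idx[:, None] - idx[None, :]) % N
    signed = np.where(diff > N // 2, diff - N, diff)   # representative of p - q
    return coupling(signed)

def bloch_vector(alpha, N):
    """v[k] = exp(i alpha k); periodic when alpha is a multiple of 2 pi / N."""
    return np.exp(1j * alpha * np.arange(N))

def floquet_eigenvalue(coupling, alpha, N):
    """lambda(alpha) = sum_r c(r) exp(-i alpha r), a finite Fourier series."""
    r = np.arange(N)
    signed = np.where(r > N // 2, r - N, r)
    return np.sum(coupling(signed) * np.exp(-1j * alpha * signed))

def derivative_tail(coupling, s, k_min, k_max):
    """Sum over k_min < |k| <= k_max of |k|^s |c(k)|: tail of the s-th derivative series."""
    k = np.arange(k_min + 1, k_max + 1, dtype=float)
    return 2.0 * float(np.sum(k ** s * np.abs(coupling(k))))

exponential = lambda k: 0.5 ** np.abs(k)             # xi = 0.5 (assumed)
polynomial = lambda k: (1.0 + np.abs(k)) ** (-3.0)    # P = 3 (assumed)

if __name__ == "__main__":
    N = 16
    alpha = 2 * np.pi * 3 / N
    T = ring_operator(exponential, N)
    v = bloch_vector(alpha, N)
    lam = floquet_eigenvalue(exponential, alpha, N)
    print("Bloch vector is an eigenvector:", np.allclose(T @ v, lam * v))
    print("tail, exponential, s=5:", derivative_tail(exponential, 5, 100, 200))
    print("tail, polynomial,  s=1:", derivative_tail(polynomial, 1, 10**4, 2 * 10**4))
```

With exponential decay the tails are negligible for every fixed $s$, which is the finite-lattice shadow of analyticity; with polynomial decay only finitely many derivatives are controlled this way.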
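Appendix

Proof of Theorem 0. The proof of (1) may be found in [R], so that we first concentrate on (2). We recall some results from [R, App. B]: For each finite $\Lambda \subset \mathbb{Z}^D$, the tensor product $\otimes_{p\in\Lambda} \mathcal{H}_p$ is dense in $\mathcal{H}_\Lambda$. Also, if $J \subset \Lambda$ then a bounded operator $T_J$ on $\mathcal{H}_J$ has a natural norm-preserving extension to a bounded operator $T_J^\Lambda$ acting on $\mathcal{H}_\Lambda$ (see Theorem B.4 in [R, App. B]). The extended operator may be defined as $T_J^\Lambda(\phi_J \otimes \phi_{\Lambda\setminus J}) \equiv (T_J \phi_J) \otimes \phi_{\Lambda\setminus J}$ when acting on a direct tensor product and extended by continuity. The truncated unperturbed Perron–Frobenius operator $\mathcal{L}_\Lambda$ acting on $\mathcal{H}_\Lambda$ is itself a tensor product, $\mathcal{L}_\Lambda = \otimes_{p\in\Lambda} \mathcal{L}_p$, of the operators acting at the individual sites (here, for simplicity, we do not write $\mathcal{L}_p^\Lambda$ for the extension of $\mathcal{L}_p : \mathcal{H}_p \to \mathcal{H}_p$). In the rest of this proof we write $\mathcal{L}$ instead of $\mathcal{L}^{n_0}$ (and $\mathcal{L}_0$, $\mathcal{L}_\epsilon$ for $\mathcal{L}_0^{n_0}$, $\mathcal{L}_\epsilon^{n_0}$) in order to simplify notation, and we let $P_p = h_p \otimes m_p$ denote the principal eigenprojection (which yields the single-site SRB measure) of $\mathcal{L}_p$. Thus, $\mathcal{L}_p \circ P_p = P_p \circ \mathcal{L}_p = P_p$. Denoting $P_J = \otimes_{p\in J} P_p$ and $(1 - P)_J = \otimes_{p\in J} (1 - P_p)$, let us introduce another extension of $T_J : \mathcal{H}_J \to \mathcal{H}_J$ through
$$\mathrm{Ext}_\Lambda(T_J) = T_J \otimes P_{\Lambda\setminus J} = T_J^\Lambda P^\Lambda_{\Lambda\setminus J}.$$
We end these preliminaries by a bound for $\sum_{J\subset\Lambda,\,|J|=k} \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda$. For this, we start by noting that for $J \subset \Lambda$,
$$\mathrm{Ext}_\Lambda((1 - P)_J) = \bigotimes_{p\in J} (1 - P_p) \otimes P_{\Lambda\setminus J} = \sum_{I\subset J} (-1)^{|I|} P_I \otimes P_{\Lambda\setminus J}.$$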
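Next, define for $k \in \mathbb{N}$ the constants $\alpha_k = \alpha_k(\theta, c_h) = \max_{\ell\ge 0} \binom{\ell}{k} (\theta c_h + 1)^k (\theta c_h)^{\ell-k}$, where $c_h > 1$ is associated as above to the single-site operator $\mathcal{L}$ (equivalently, to $\mathcal{L}^{n_0}$) and $0 < \theta < 1$ is chosen as in Sect. 2 (see also [R, Theorem 2.1, Lemmas 4.20 and 4.25]). Since $\theta c_h < 1$ these constants are finite, though not uniformly bounded in $k$. Using again [R, Theorem B.4], we get, recalling also the definition of $c_h$, the estimate (with $K = J \setminus I$),
$$\begin{aligned}
\theta^{|\Lambda|} \Big\| \sum_{J\subset\Lambda,\,|J|=k} \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda \phi \Big\|
&\le \theta^{|\Lambda|} \sum_{J\subset\Lambda,\,|J|=k}\ \sum_{I\subset J} \big\| P_I \otimes P_{\Lambda\setminus J} \circ \pi_\Lambda \phi \big\| \\
&\le \theta^{|\Lambda|} \sum_{J\subset\Lambda,\,|J|=k} c_h^{|\Lambda\setminus J|} \sum_{K\subset J} c_h^{|J\setminus K|}\, \|\phi_K\| \\
&\le \theta^{|\Lambda|} \sum_{J\subset\Lambda,\,|J|=k} c_h^{|\Lambda\setminus J|} \sum_{K\subset J} \theta^{-|K|} c_h^{|J\setminus K|}\, \|\phi\| \\
&\le \sum_{J\subset\Lambda,\,|J|=k} (\theta c_h)^{|\Lambda\setminus J|} \sum_{K\subset J} (\theta c_h)^{|J\setminus K|}\, \|\phi\| \\
&\le \sum_{J\subset\Lambda,\,|J|=k} (\theta c_h + 1)^{|J|} (\theta c_h)^{|\Lambda\setminus J|}\, \|\phi\| \\
&\le \alpha_k \|\phi\|.
\end{aligned} \tag{A.1}$$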
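We will use the following lemma:

Lemma A.1. For $\lambda \notin \Sigma_0$, $\lambda - \mathcal{L}_0$ is invertible in $\mathcal{M}$. We have the following expansion for the marginals of the resolvent:
$$\pi_\Lambda \circ R(\lambda, \mathcal{L}_0) = \sum_{J\subset\Lambda} \mathrm{Ext}_\Lambda\big( R(\lambda, \mathcal{L}_J)(1 - P)_J \big).$$

Proof of Lemma A.1. The identity $m_p \mathcal{L}_p = m_p$ and the fact that $\mathcal{L}_\Lambda$ is a tensor product implies for $K \subset \Lambda$ that $\pi_{K,\Lambda} \mathcal{L}_\Lambda = \mathcal{L}_K \pi_{K,\Lambda}$. When $\lambda \notin \Sigma_0$ then the definition of the resolvent $R(\lambda, \mathcal{L}_\Lambda) = (\lambda - \mathcal{L}_\Lambda)^{-1}$ yields $\pi_{K,\Lambda} R(\lambda, \mathcal{L}_\Lambda) = R(\lambda, \mathcal{L}_K) \pi_{K,\Lambda}$. The properties of $P_p$ also ensure that $\mathcal{L}_\Lambda \mathrm{Ext}_\Lambda((1 - P)_K) = \mathrm{Ext}_\Lambda(\mathcal{L}_K (1 - P)_K)$ and therefore also
$$R(\lambda, \mathcal{L}_\Lambda)\, \mathrm{Ext}_\Lambda((1 - P)_K) = \mathrm{Ext}_\Lambda\big( R(\lambda, \mathcal{L}_K)(1 - P)_K \big). \tag{A.2}$$
Note that this is consistent with $R(\lambda, \mathcal{L}_\Lambda) P_\Lambda = (\lambda - 1)^{-1} P_\Lambda$ since for $K = \emptyset$ we define $\mathcal{L}_\emptyset$ and $(1 - P)_\emptyset : \mathbb{C} \to \mathbb{C}$ to be the identity and $\mathrm{Ext}_\Lambda(\mathrm{Id}) = P_\Lambda$.

Our goal is to extend the family of resolvents $R(\lambda, \mathcal{L}_\Lambda)$ to a bounded operator acting on the space $\mathcal{M}$. We may clearly define $R(\lambda, \mathcal{L}_\Lambda) \circ \pi_\Lambda : \mathcal{M} \to \mathcal{H}_\Lambda$. This family is projective but a priori it is not clear that it is bounded in the $\mathcal{M}$ norm. Let us first note that
$$1_\Lambda = \bigotimes_{p\in\Lambda} \big( P_p + (1 - P)_p \big) = \sum_{J\subset\Lambda} \mathrm{Ext}_\Lambda((1 - P)_J). \tag{A.3}$$
Acting on each individual term with the resolvent of $\mathcal{L}_\Lambda$ we obtain the right-hand side in Lemma A.1. In general, however, this expansion is not going to be $\mathcal{M}$ convergent so we need to be more careful. Instead we choose $m = m(|\lambda|) \in \mathbb{N}$ such that $|\lambda| > \eta^m$ and we split the decomposition of $1_\Lambda$ just introduced into two sums,
$$1_\Lambda = \sum_{J\subset\Lambda,\,|J|<m} \mathrm{Ext}_\Lambda((1 - P)_J) + \sum_{J\subset\Lambda,\,|J|\ge m} \mathrm{Ext}_\Lambda((1 - P)_J). \tag{A.4}$$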
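The decomposition of the identity into the terms $\mathrm{Ext}_\Lambda((1-P)_J)$ and its splitting according to $|J| < m$ or $|J| \ge m$ is purely algebraic and can be verified with Kronecker products. The sketch below is a finite-dimensional stand-in under stated assumptions: every site carries the same hypothetical rank-one projector $P = h \otimes m$ on $\mathbb{C}^d$, and the names `ext_term`, `split_identity` and `m_cut` are illustrative.

```python
import itertools
import numpy as np

def rank_one_projector(h, m):
    """P = h (x) m normalised so that m(h) = 1 (finite-dimensional stand-in for P_p)."""
    return np.outer(h, m) / np.dot(m, h)

def ext_term(P, J, n_sites):
    """(1 - P) on the sites in J tensored with P on all other sites."""
    d = P.shape[0]
    out = np.eye(1)
    for p in range(n_sites):
        out = np.kron(out, np.eye(d) - P if p in J else P)
    return out

def split_identity(P, n_sites, m_cut):
    """Sums of ext_term over |J| < m_cut and over |J| >= m_cut."""
    dim = P.shape[0] ** n_sites
    small, large = np.zeros((dim, dim)), np.zeros((dim, dim))
    for size in range(n_sites + 1):
        for J in itertools.combinations(range(n_sites), size):
            if size < m_cut:
                small += ext_term(P, set(J), n_sites)
            else:
                large += ext_term(P, set(J), n_sites)
    return small, large

if __name__ == "__main__":
    P = rank_one_projector(np.array([1.0, 2.0]), np.array([0.5, 0.25]))
    small, large = split_identity(P, n_sites=3, m_cut=2)
    print("two sums add up to the identity:", np.allclose(small + large, np.eye(8)))
```

Each term is itself a projector, so the splitting cuts the identity on the tensor product into two complementary pieces, which is what allows the two sums to be estimated separately.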
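We shall deal with these two terms separately. For the first term we note that by (A.2) we get, using again [R, Theorem B.4], the bound
$$\big\| R(\lambda, \mathcal{L}_\Lambda)\, \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda \big\| \le \|R(\lambda, \mathcal{L}_J)\|\, \big\| \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda \big\|.$$
Here $\|R(\lambda, \mathcal{L}_J)\| = \|R(\lambda, \mathcal{L}^{\otimes|J|})\|$ depends on $J$ only through its cardinality. The operator
$$U_\Lambda^{<m} = R(\lambda, \mathcal{L}_\Lambda) \sum_{J\subset\Lambda,\,|J|<m} \mathrm{Ext}_\Lambda((1 - P)_J) = \sum_{J\subset\Lambda,\,|J|<m} \mathrm{Ext}_\Lambda\big( R(\lambda, \mathcal{L}_J)(1 - P)_J \big) \tag{A.5}$$
is clearly bounded on $\mathcal{H}_\Lambda$. If we combine with (A.1) we see that we also have the norm bound
$$\theta^{|\Lambda|} \big\| U_\Lambda^{<m} \circ \pi_\Lambda \big\| \le \sum_{k=0}^{m-1} \|R(\lambda, \mathcal{L}^{\otimes k})\| \cdot \alpha_k(\theta, c_h), \tag{A.6}$$
uniformly in $\Lambda$. By commutativity of the involved operators we also have
$$(\lambda - \mathcal{L}_\Lambda) U_\Lambda^{<m} = U_\Lambda^{<m} (\lambda - \mathcal{L}_\Lambda) = \sum_{J\subset\Lambda,\,|J|<m} \mathrm{Ext}_\Lambda((1 - P)_J). \tag{A.7}$$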
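For the second term in (A.4) we shall use the following “large deviations” bound (see Lemma 4.15 in [R] for a similar computation): if $a, b > 0$ and $\gamma > 1$ are so that $\gamma a + b \le 1$, then
$$\sum_{J\subset\Lambda,\,|J|\ge m} a^{|J|} b^{|\Lambda\setminus J|} \le \sum_{J\subset\Lambda} a^{|J|} b^{|\Lambda\setminus J|} \gamma^{|J|-m} \le (\gamma a + b)^{|\Lambda|} \gamma^{-m} \le \gamma^{-m}. \tag{A.8}$$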
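Since the summand in (A.8) depends on $J$ only through $|J|$, the left-hand side is a binomial tail and the chain of inequalities can be checked directly. The following sketch does this for hypothetical values of $a$, $b$ and $\gamma$ satisfying $\gamma a + b \le 1$; the function names are illustrative.

```python
from math import comb

def tail_sum(a, b, n, m):
    """Sum over subsets J of an n-point set with |J| >= m of a^|J| * b^(n - |J|)."""
    return sum(comb(n, j) * a ** j * b ** (n - j) for j in range(m, n + 1))

def large_deviation_bound(a, b, gamma, n, m):
    """Middle term of (A.8): (gamma * a + b)^n * gamma^(-m)."""
    return (gamma * a + b) ** n * gamma ** (-m)

if __name__ == "__main__":
    a, b, gamma = 0.1, 0.6, 3.0   # assumed values with gamma * a + b = 0.9 <= 1
    for n, m in [(5, 2), (20, 5), (40, 10)]:
        print(n, m, tail_sum(a, b, n, m), large_deviation_bound(a, b, gamma, n, m), gamma ** (-m))
```

The point of the bound is that the right-hand side $\gamma^{-m}$ does not depend on $|\Lambda|$, which is what makes the later estimates uniform in $\Lambda$.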
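Taking $\gamma > 1$ such that condition TR in [R, Def. 4.18] holds, as before, our choices above for $\theta$ (see [R, Lemma 4.20]) and $n_0$ (see [R, Lemma 4.21], recalling that the notation $\eta$ represents in fact $\eta^{n_0}$) ensure that $c_r \gamma \eta (1 + \theta c_h) + \theta c_h \le 1$. Then the “large deviations” bound gives (note the extra factor of $\eta^{\ell-1}$)
$$\sum_{J\subset\Lambda,\,|J|\ge m} \big( c_r \eta^\ell (1 + \theta c_h) \big)^{|J|} (\theta c_h)^{|\Lambda\setminus J|} \le \eta^{\ell m} (\gamma\eta)^{-m}. \tag{A.9}$$
Hence, using
$$\big\| \mathcal{L}_\Lambda^\ell\, \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda \big\| = \big\| \mathrm{Ext}_\Lambda(\mathcal{L}_J^\ell (1 - P)_J) \circ \pi_\Lambda \big\| \le (c_r \eta^\ell)^{|J|}\, \big\| \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda \big\|,$$
we get by a calculation similar to (A.1)
$$\theta^{|\Lambda|} \Big\| \sum_{J\subset\Lambda,\,|J|\ge m} \mathcal{L}_\Lambda^\ell\, \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda \Big\| \le \eta^{\ell m} (\gamma\eta)^{-m}. \tag{A.10}$$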
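Therefore, since $|\lambda| > \eta^m$, the following sum is absolutely convergent on $\mathcal{H}_\Lambda$ (but the bound depends on the size of $\Lambda$):
$$A_\Lambda = \sum_{\ell=1}^{\infty} \frac{1}{\lambda^{\ell+1}} \sum_{J\subset\Lambda,\,|J|\ge m} \mathcal{L}_\Lambda^\ell\, \mathrm{Ext}_\Lambda((1 - P)_J).$$
More precisely, the above calculations show that $\theta^{|\Lambda|} A_\Lambda \circ \pi_\Lambda : \mathcal{M} \to \mathcal{H}_\Lambda$ is bounded in norm by
$$\sum_{\ell=1}^{\infty} \frac{1}{|\lambda|^{\ell+1}}\, \eta^{\ell m} (\gamma\eta)^{-m} = \frac{1}{\gamma^m |\lambda|} \cdot \frac{1}{|\lambda| - \eta^m},$$
uniformly in $\Lambda$. Finally, using (A.4), we find (corresponding to the “missing term” $\ell = 0$ in $A_\Lambda$) that
$$\theta^{|\Lambda|}\, \frac{1}{\lambda} \sum_{J\subset\Lambda,\,|J|\ge m} \mathrm{Ext}_\Lambda((1 - P)_J) \circ \pi_\Lambda = \theta^{|\Lambda|}\, \frac{1}{\lambda} \Big( 1_\Lambda - \sum_{J\subset\Lambda,\,|J|<m} \mathrm{Ext}_\Lambda((1 - P)_J) \Big) \circ \pi_\Lambda,$$
which by (A.1) is bounded by
$$\frac{1}{|\lambda|} \Big( 1 + \sum_{k=0}^{m-1} \alpha_k \Big),$$
again uniformly in $\Lambda$. We may therefore define a bounded linear operator $U_\Lambda^{\ge m} : \mathcal{H}_\Lambda \to \mathcal{H}_\Lambda$ through
$$U_\Lambda^{\ge m} = \sum_{\ell=0}^{\infty} \frac{1}{\lambda^{\ell+1}} \sum_{J\subset\Lambda,\,|J|\ge m} \mathcal{L}_\Lambda^\ell\, \mathrm{Ext}_\Lambda((1 - P)_J).$$
Now, $\theta^{|\Lambda|} U_\Lambda^{\ge m} \circ \pi_\Lambda$ is absolutely bounded by
$$\frac{1}{|\lambda|} \Big( \frac{1}{\gamma^m} \cdot \frac{1}{|\lambda| - \eta^m} + 1 + \sum_{k=0}^{m-1} \alpha_k \Big). \tag{A.11}$$
Uniform convergence and continuity allow us to interchange the sum and the action of the operator to obtain
$$(\lambda - \mathcal{L}_\Lambda) U_\Lambda^{\ge m} = U_\Lambda^{\ge m} (\lambda - \mathcal{L}_\Lambda) = \sum_{J\subset\Lambda,\,|J|\ge m} \mathrm{Ext}_\Lambda((1 - P)_J). \tag{A.12}$$
Combining the above, in particular (A.7) and (A.12), we see that by setting
$$U_\Lambda = U_\Lambda^{<m} + U_\Lambda^{\ge m}, \tag{A.13}$$
we obtain a bounded linear operator on $\mathcal{H}_\Lambda$ for which $U_\Lambda (\lambda - \mathcal{L}_\Lambda) = (\lambda - \mathcal{L}_\Lambda) U_\Lambda = 1_\Lambda$, thus showing that $U_\Lambda \equiv R(\lambda, \mathcal{L}_\Lambda)$ is indeed the resolvent of $\mathcal{L}_\Lambda$. In particular, $\theta^{|\Lambda|} \|R(\lambda, \mathcal{L}_\Lambda) \circ \pi_\Lambda\|$ is uniformly bounded in $\Lambda$. As already mentioned, the family $R(\lambda, \mathcal{L}_\Lambda) \circ \pi_\Lambda : \mathcal{M} \to \mathcal{H}_\Lambda$ is projective. Hence, if we set $(U\phi)_\Lambda \equiv R(\lambda, \mathcal{L}_\Lambda) \phi_\Lambda$, then $U$ is a linear operator on $\mathcal{M}$ bounded in norm by the sum of (A.6) and (A.11). Finally $\pi_\Lambda \circ U \circ (\lambda - \mathcal{L}_0) = U_\Lambda \circ (\lambda - \mathcal{L}_\Lambda) \circ \pi_\Lambda = \pi_\Lambda$ and $\pi_\Lambda \circ (\lambda - \mathcal{L}_0) \circ U = (\lambda - \mathcal{L}_\Lambda) \circ U_\Lambda \circ \pi_\Lambda = \pi_\Lambda$, demonstrating that $U$ is precisely the resolvent $R(\lambda, \mathcal{L}_0)$, which is therefore a bounded linear operator on $\mathcal{M}$. The finite marginals $\pi_\Lambda \circ R(\lambda, \mathcal{L}_0)$ may be computed from the expansion of the identity, which completes the proof of Lemma A.1. □

End of the proof of Theorem 0. To end the proof of the unperturbed case (2), we must prove the claim on the image of the spectral projection. (The fact that every $\lambda \in \Sigma_0$ is an eigenvalue of $\mathcal{L}_0$ on $\mathcal{M}$ is obvious, but not sufficient to prove the claim.) For this, we apply Theorem VII.3.24 in [DS] and need to check that every nonzero eigenvalue $\lambda_0$ of $\mathcal{L}_0$ on $\mathcal{M}$ is a pole of finite order of the operator-valued resolvent function $\lambda \mapsto R(\lambda, \mathcal{L}_0)$. (See also [DS, Theorem VII.3.18] and its proof.) Clearly, it suffices to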
show that there is $\ell(\lambda_0)$ such that for each finite subset $\Lambda \subset \mathbb{Z}^D$, the point $\lambda_0$ is a pole of finite order at most $\ell(\lambda_0)$ of the operator-valued resolvent function $\lambda \mapsto R(\lambda, \mathcal{L}_\Lambda)$, and that the projective family of operator-valued residues $W_\Lambda^{\lambda_0}$ of this pole define a bounded linear operator $W^{\lambda_0}$ on $\mathcal{M}$ by setting $(W^{\lambda_0}(\phi))_\Lambda = W_\Lambda^{\lambda_0} \phi_\Lambda$. The first claim follows from the fact that the index $\ell$ of the eigenvalue $\lambda_0 = \lambda_1 \cdots \lambda_m$ of $\mathcal{L}_\Lambda$ is the same as the index of $\lambda_0$ for $\mathcal{L}_{\Lambda'}$ if $\lambda_0$ is also an eigenvalue of $\mathcal{L}_{\Lambda'}$ (e.g. if $\Lambda \subset \Lambda'$). The second can be obtained by large deviations arguments similar to those used above (the combinatorics is slightly more involved), using $m \ge 1$ so that $\eta^{m+1} < |z| < \eta^m$.

The proof of (3) is short but we need to go a bit deeper into the construction in [R]. Let $0 < \epsilon \le \epsilon_0$. The generating functions $u_{np}(\gamma)$ [R, Def. 4.16] defined for each fixed $0 \le \epsilon \le \epsilon_0$ verify the following bound [R, Lemma 4.20] for all $\gamma > 1$, close enough to 1:
$$u_{np}(1) \le u_{np}(\gamma) \le A + B\, C(\epsilon) \le \theta^{-1}.$$
Here $A$ and $B$ are positive constants, related to the single-site unperturbed operators (initial- and end-leaves) and the couplings (branchings), respectively, and $C(\epsilon)$ is the function defined in our Theorem 0 above. (The bound in [R, Lemma 4.20] is stated for $u_{np}(\gamma)$ with some $\gamma > 1$, but the functions $u_{np}(\gamma)$ are monotone increasing.) The Perron–Frobenius operator when projected to $\mathcal{H}_\Lambda$ is bounded by [R, Lemma 4.22]:
$$\|\pi_\Lambda \circ \mathcal{L}_\epsilon\| \le \prod_{p\in\Lambda} u_{np}(1) \le \prod_{p\in\Lambda} \big( A + B\, C(\epsilon) \big) \le \theta^{-|\Lambda|}.$$
If we now consider the perturbation alone, then in the proof of [R, Lemma 4.22] we should omit all contributions associated with a product of trees related to the unperturbed operator (trees consisting of initial-leaves and end-leaves only [R, Def. 4.9, Lemma 4.14]). This amounts to subtracting from the above expression precisely the contributions coming from such product trees, in other words
$$\|\pi_\Lambda \circ (\mathcal{L}_\epsilon - \mathcal{L}_0)\| \le \prod_{p\in\Lambda} \big( A + B\, C(\epsilon) \big) - \prod_{p\in\Lambda} A. \tag{A.14}$$
Expanding the product we see that at least one factor of $C(\epsilon)$ occurs in each term on the right-hand side. On the other hand, we know that each factor $(A + B\, C(\epsilon_0))$ (note here the occurrence of $\epsilon_0$ rather than $\epsilon$) is bounded by $\theta^{-1}$. Hence, applying a large deviation argument similar to (A.8) (with $m = 1$ and $\gamma = C(\epsilon_0)/C(\epsilon)$), we see that
$$\theta^{|\Lambda|}\, \|\pi_\Lambda \circ (\mathcal{L}_\epsilon - \mathcal{L}_0)\| \le C(\epsilon)/C(\epsilon_0), \tag{A.15}$$
from which we obtain the desired bound. □
Acknowledgement. Both authors thank David Ruelle and the I.H.É.S. where this project was started during a visit in 1998. Support by the PRODYN programme of the European Science Foundation is also gratefully acknowledged. V.B. was partially supported by the Fonds National Suisse de la Recherche Scientifique and is grateful to IMPA for its hospitality in the final phase of this work. We are grateful to the referee for many thoughtful comments which improved the presentation.
References
[AC] Allaire, G. and Conca, C.: Bloch wave homogenization and spectral asymptotic analysis. J. Math. Pures Appl. 77, 153–208 (1998)
[B] Baladi, V.: Positive transfer operators and decay of correlations. Singapore: World Scientific, 2000
[BIJ] Baladi, V., Degli Esposti, M., Isola, S., Järvenpää, E. and Kupiainen, A.: The spectrum of weakly coupled map lattices. J. Math. Pures Appl. 77, 539–584 (1998)
[BY] Baladi, V. and Young, L.-S.: On the spectra of randomly perturbed expanding maps. Commun. Math. Phys. 156, 355–385 (1993); (see also Erratum, Commun. Math. Phys. 166, 219–220 (1994))
[BB] Bhatia, R. and Bhattacharyya, T.: A Henrici theorem for joint spectra of commuting matrices. Proc. Am. Math. Soc. 118, 5–14 (1993)
[BK1] Bricmont, J. and Kupiainen, A.: Coupled analytic maps. Nonlinearity 8, 379–396 (1995)
[BK2] Bricmont, J. and Kupiainen, A.: High temperature expansions and dynamical systems. Commun. Math. Phys. 178, 703–732 (1996)
[Bu] Bunimovich, L.A.: Coupled map lattices: One step forward and two steps back. Phys. D (Chaos, order and patterns: aspects of nonlinearity – the “gran finale”, Como 1993) 86, 248–255 (1995)
[BS] Bunimovich, L.A. and Sinai, Ya.G.: Spacetime chaos in coupled map lattices. Nonlinearity 1, 491–516 (1988)
[DS] Dunford, N. and Schwartz, J.T.: Linear Operators, Part I. New York: Wiley-Interscience (Wiley Classics Library), 1988
[FKT] Feldman, J., Knörrer, H., and Trubowitz, E.: Perturbatively unstable eigenvalues of a periodic Schrödinger operator. Comment. Math. Helv. 66, 557–579 (1991)
[FR] Fischer, T. and Rugh, H.H.: Transfer operators for coupled analytic maps. Ergodic Theory Dynam. Syst. 20, 109–144 (2000)
[J] Jiang, M.: Equilibrium states for lattice models of hyperbolic type. Nonlinearity 8, 631–659 (1995)
[JP] Jiang, M. and Pesin, Ya.B.: Equilibrium measures for coupled map lattices: Existence, uniqueness and finite-dimensional approximations. Commun. Math. Phys. 193, 675–711 (1998)
[JS] Jordan, D.W. and Smith, P.: Nonlinear ordinary differential equations. An introduction to dynamical systems. Third edition. Oxford: Oxford University Press, 1999
[K] Kaneko, K. (ed.): Theory and applications of coupled map lattices. Chichester: J. Wiley & Sons, 1993
[Ka] Kato, T.: Perturbation theory for linear operators. (Reprint of the 1980 edition) Berlin: Springer-Verlag, 1995
[KK] Keller, G. and Künzle, M.: Transfer operators for coupled map lattices. Ergodic Theory Dynam. Syst. 12, 297–318 (1992)
[MPR] McIntosh, A., Pryde, A. and Ricker, W.: Comparison of joint spectra for certain classes of commuting operators. Studia Math. LXXXVIII, 23–36 (1988)
[MM] Maes, C. and Van Moffaert, A.: Stochastic stability of weakly coupled lattice maps. Nonlinearity 10, 715–730 (1997)
[PS] Pesin, Ya.B. and Sinai, Ya.G.: Space-time chaos in chains of weakly interacting hyperbolic mappings. In: Dynamical systems and statistical mechanics. Moscow, 1991; Providence, RI: Am. Math. Soc., 1991
[R] Rugh, H.H.: Coupled maps and analytic function spaces. Preprint (2000), submitted for publication
[V] Volevich, V.L.: Construction of an analogue of the Bowen–Ruelle–Sinai measure for a multidimensional lattice of interacting hyperbolic mappings. Mat. Sb. 184, 17–36 (1993)
[W] Wilkinson, J.H.: The Algebraic Eigenvalue Problem. London: Oxford University Press, 1965
Communicated by A. Kupiainen
Commun. Math. Phys. 220, 583 – 621 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Anderson Localization for Schrödinger Operators on Z with Potentials Given by the Skew–Shift
Jean Bourgain (1), Michael Goldstein (1,*), Wilhelm Schlag (2)
(1) Institute for Advanced Study, Olden Lane, Princeton, NJ 08540, USA. E-mail: [email protected]; [email protected]
(2) Department of Mathematics, Princeton University, Fine Hall, Princeton NJ 08544, USA. E-mail: [email protected]
(*) Permanent address: Dept. of Mathematics, University of Toronto, Toronto, Ontario, Canada M5S 1A1
Received: 19 September 2000 / Accepted: 15 February 2001
Dedicated to Yakov G. Sinai on the occasion of his 65th birthday

Abstract: In this paper we study one-dimensional Schrödinger operators on the lattice with a potential given by the skew shift. We show that Anderson localization takes place for most phases and frequencies and sufficiently large disorders.

1. Introduction

In this paper we study the positivity of the Lyapunov exponent, the regularity of the integrated density of states, and the nature of the spectrum for the Schrödinger operators
$$H_{\omega,(x,y)} \psi_n = -\psi_{n+1} - \psi_{n-1} + v(T_\omega^n(x, y))\, \psi_n \quad \text{on } \ell^2(\mathbb{Z}), \tag{1.1}$$
where $T_\omega(x, y) = (x + y, y + \omega) \pmod 1$ is the skew-shift on the two-dimensional torus $\mathbb{T}^2$. The number $\omega$ will be assumed to be Diophantine. The study of families of Schrödinger operators with potentials that are in some sense random has a long and rich history, starting with the famous work by P. Anderson [1]. It is not our intention to review this subject, as some of the history as well as many references can be found in [7]. Furthermore, the methods in this paper have little overlap with the work that has been done on the purely random case. Our approach is motivated by the recent works [3] and [7].

The main results in this paper are as follows. Fix a nonconstant real-analytic function $v_0$ on $\mathbb{T}^2$ and some small $\varepsilon > 0$. Then there exists a set $\Omega_\varepsilon \subset \mathbb{T}$ with $\mathrm{mes}[\mathbb{T} \setminus \Omega_\varepsilon] < \varepsilon$, and a large constant $\lambda_0(\varepsilon, v_0)$ so that for any $\omega \in \Omega_\varepsilon$ and $\lambda \ge \lambda_0$, the equation (1.1) with $v = \lambda v_0$ has the following properties:
584
J. Bourgain, M. Goldstein, W. Schlag
• The Lyapunov exponents of (1.1) are positive for all energies, see Prop. 2.11.
• The integrated density of states is continuous with modulus of continuity $h(t) = \exp\bigl(-c|\log t|^{\frac{1}{24}-}\bigr)$, see Prop. 2.13.
• The operators (1.1) display Anderson localization, i.e., there exists $\tilde\Omega_\varepsilon \subset \mathbb{T}^2$ with $\mathrm{mes}[\mathbb{T}^2 \setminus \tilde\Omega_\varepsilon] < \varepsilon$ so that for all $(x, y) \in \tilde\Omega_\varepsilon$ the spectrum is pure point and the eigenfunctions decay exponentially, see Theorem 3.7.

2. A Large Deviation Theorem for the Monodromy Matrices and Positivity of the Lyapunov Exponents for Large Disorder

Consider the Schrödinger operator (1.1), where $v$ is a trigonometric polynomial, say. An important example is $v(x, y) = \cos(2\pi x)$. Any solution of (1.1) is of the form

$$ \begin{pmatrix} \psi_{n+1} \\ \psi_n \end{pmatrix} = M_n(x, y; E) \begin{pmatrix} \psi_1 \\ \psi_0 \end{pmatrix}, $$

where $M_n(x, y; E) = \prod_{j=n}^{1} A_j(x, y; E)$ with ($T = T_\omega$ for simplicity)

$$ A_j(x, y; E) = \begin{pmatrix} v(T^j(x, y)) - E & -1 \\ 1 & 0 \end{pmatrix}. \tag{2.1} $$

The matrix $M_n(x, y; E)$ is called the fundamental, or monodromy matrix of Eq. (1.1). As usual,

$$ L_n(E) = \int_{\mathbb{T}^2} \frac{1}{n} \log \|M_n(x, y; E)\| \, dx\,dy $$

and $L(E) = \lim_{n\to\infty} L_n(E) = \inf_n L_n(E)$ denotes the Lyapunov exponent. Clearly, $L(E) \ge 0$ for all $E$. Kingman's subadditive ergodic theorem asserts that

$$ \frac{1}{n} \log \|M_n(x, y; E)\| \to L(E) \quad \text{for a.e. } (x, y) \in \mathbb{T}^2 $$

as $n \to \infty$. A more quantitative version of this convergence statement will be of particular importance in this paper. In fact, the goal of this section is to prove an estimate of the form

$$ \sup_E \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log \|M_n(x, y; E)\| - L_n(E)\Bigr| > n^{-\sigma}\Bigr] \le C \exp(-n^{\sigma}) \tag{2.2} $$

for all positive integers $n$ and some constant $\sigma > 0$, see Prop. 2.11 below for a more precise statement. These so-called “large deviation estimates” have been of central importance in some recent papers by the authors, see [3, 7], and [4]. They are a key ingredient in the proof of localization in [3] on the one hand, and are essential for proving regularity of the density of states as well as positivity of the Lyapunov exponent in [7]. The Schrödinger equations considered in [3] and [7] were of the form (1.1) with $T$ given by the shift rather than the skew-shift, i.e., $T(x, y) = (x + \omega_1, y + \omega_2) \pmod{\mathbb{Z}^2}$ in the case of two dimensions. We want to emphasize that the methods from these papers do
Anderson Localization for the Skew–Shift
585
not directly apply to the skew-shift and a completely new approach was required for the proof of Prop. 2.11 below. To understand the difficulty introduced by the skew-shift, let us briefly review some basic aspects of the techniques underlying the proof of the large deviation estimates in [3] for the case of the shift. Firstly, the map

$$ u_n(z_1, z_2) = \frac{1}{n} \log \|M_n(z_1, z_2; E)\| \tag{2.3} $$
extends to a subharmonic function on a complex neighborhood of $\mathbb{T}^2$. Moreover, these subharmonic functions are bounded in that neighborhood uniformly in $n$. Using the standard Riesz representation for subharmonic functions one obtains the decay of the Fourier coefficients

$$ |\hat u_n(\ell_1, \ell_2)| \le \frac{C}{|\ell_1| + |\ell_2| + 1} \tag{2.4} $$
with some absolute constant $C$. The second important idea is to exploit the almost invariance of $u_n$ under the transformation $T$. In fact, it follows immediately from the definition of $M_n$ as a product that

$$ \sup_{(x,y)\in\mathbb{T}^2} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_n(T^k(x, y)) - u_n(x, y) \Bigr| \le C\,\frac{K}{n}. \tag{2.5} $$

Fourier expanding the sum in (2.5) leads to a series in which the main contributions are given by the resonances of the shift, i.e., those $k \in \mathbb{Z}^2 \setminus \{0\}$ for which $\|k \cdot \omega\| \ll 1$. Since $\omega = (\omega_1, \omega_2)$ is assumed to be Diophantine, such resonances only occur for a sparse set of frequencies $k$ and the decay (2.4) then controls the size of these contributions (in [3] certain technical problems arise due to the non-$\ell^2$ decay provided by (2.4), which however do not concern us here). The difficulty one faces with this method in the case of the skew-shift derives from the failure of uniform boundedness of the subharmonic function (2.3). This is due to the fact that iteration of the skew-shift is given by

$$ T^k(x, y) = (x + ky + k(k-1)\omega/2,\; y + k\omega) \mod \mathbb{Z}^2. \tag{2.6} $$
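The closed form (2.6) can be checked numerically against direct iteration of the map. The following is only an illustrative sketch: the function names, the sample point, and the choice of the golden mean as frequency are assumptions made here and do not come from the paper.

```python
import math


def skew_shift(x, y, omega):
    """One step of T(x, y) = (x + y, y + omega) mod 1."""
    return (x + y) % 1.0, (y + omega) % 1.0


def skew_shift_closed_form(x, y, omega, k):
    """Right-hand side of (2.6) for the k-th iterate."""
    return ((x + k * y + k * (k - 1) * omega / 2.0) % 1.0,
            (y + k * omega) % 1.0)


def torus_distance(a, b):
    """Distance between a and b on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def max_discrepancy(x, y, omega, k_max):
    """Largest torus distance between k-fold iteration and (2.6), 1 <= k <= k_max."""
    worst = 0.0
    px, py = x, y
    for k in range(1, k_max + 1):
        px, py = skew_shift(px, py, omega)
        cx, cy = skew_shift_closed_form(x, y, omega, k)
        worst = max(worst, torus_distance(px, cx), torus_distance(py, cy))
    return worst


# Hypothetical sample values; the golden mean is only a convenient irrational.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0
print(max_discrepancy(0.123, 0.456, OMEGA, 200))
```

The quadratic term $k(k-1)\omega/2$ in the first coordinate is what makes the imaginary part grow like $k$ once $y$ is complexified, which is the source of the difficulty described next.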
Complexifying in the variable $y$ therefore produces an imaginary part of size about $n$ in half of the factors of the product $M_n$, cf. (2.1). Therefore, most factors of $M_n$ will be of size $e^n$ rather than bounded as in the case of the shift. Instead of (2.4) one can only assert that

$$ |\hat u_n(\ell_1, \ell_2)| \le \frac{Cn}{|\ell_1| + |\ell_2| + 1}. \tag{2.7} $$

However, since one typically has a resonance at the site $(0, n)$ the Fourier series argument based on the decay (2.7) does not even provide that $\|u_n - L_n\|_2 \to 0$. Of course, the argument which we outlined above is rather crude as the structure of $M_n$ only enters through the almost invariance (2.5). The tool that will allow us to exploit the structure of $M_n$ more carefully is the “avalanche principle” from [7]. We now reproduce the statement of this principle from [7], but refer the reader to that paper for the proof.
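Proposition 2.1. Let $A_1, \ldots, A_n$ be a sequence of arbitrary unimodular $2 \times 2$-matrices. Suppose that

$$ \min_{1 \le j \le n} \|A_j\| \ge \mu \ge n \tag{2.8} $$

and

$$ \max_{1 \le j < n} \bigl[\log \|A_{j+1}\| + \log \|A_j\| - \log \|A_{j+1} A_j\|\bigr] \le \frac{1}{2} \log \mu. \tag{2.9} $$

Then

$$ \Bigl| \log \|A_n \cdot \ldots \cdot A_1\| + \sum_{j=2}^{n-1} \log \|A_j\| - \sum_{j=1}^{n-1} \log \|A_{j+1} A_j\| \Bigr| < C\,\frac{n}{\mu}. \tag{2.10} $$

[...]

$$ \int_{[R|y| > C]} |(H\phi_R)(y)| \, dy \le C \log R. \tag{2.28} $$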
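On the other hand,

$$ \int_{[R|y| \le C]} |(H\phi_R)(y)| \, dy \le C \|H\phi_R\|_2 R^{-\frac{1}{2}} \le C \|\phi_R\|_2 R^{-\frac{1}{2}} \le C. \tag{2.29} $$

Thus

$$ \|u_0 * H\phi_R\|_\infty \le \|u_0\|_\infty \|H\phi_R\|_1 \le C \varepsilon_0 \log R. $$

In view of the preceding

$$ \|F\|_\infty \le C \bigl(\varepsilon_0 \log R + \varepsilon_1 R + N R^{-1}\bigr). \tag{2.30} $$

The lemma follows from (2.19) and (2.30) by taking $R = \sqrt{N/\varepsilon_1}$. □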
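For convenience, the arithmetic behind this choice of $R$ can be spelled out. The following lines only substitute $R = \sqrt{N/\varepsilon_1}$ into the right-hand side of (2.30); nothing beyond (2.30) is assumed.

```latex
R = \sqrt{N/\varepsilon_1}
\quad\Longrightarrow\quad
\varepsilon_1 R = N R^{-1} = \sqrt{N \varepsilon_1},
\qquad
\log R = \tfrac{1}{2} \log (N/\varepsilon_1),
\qquad\text{so that}\qquad
\|F\|_\infty \le C \bigl( \varepsilon_0 \log (N/\varepsilon_1) + \sqrt{N \varepsilon_1} \bigr).
```

This choice balances the two terms $\varepsilon_1 R$ and $N R^{-1}$, and the resulting expression is the form in which the bound of Lemma 2.3 is quoted in the proof of Lemma 2.5.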
Remark 2.4. The main application of Lemma 2.3 in this paper will be to estimates on the measure of the set $\{x \in \mathbb{T} : |u(x) - \langle u \rangle| > \lambda\}$. In fact, by the well-known John–Nirenberg inequality [16], the measure of this set does not exceed

$$ C \exp\Bigl(-\frac{c\lambda}{\|u\|_{\mathrm{BMO}}}\Bigr). \tag{2.31} $$

The exponential integrability of the Hilbert transform of a bounded function can be derived much more easily than by going through BMO and John–Nirenberg. Indeed, it is a classical, and rather simple fact that for any real-valued function $f$ on $\mathbb{T}$ such that $|f| \le 1$, one has the bound

$$ \int_{\mathbb{T}} \exp\bigl(\alpha |(Hf)(t)|\bigr) \, dt \le \frac{2}{\cos(\alpha\pi/2)} $$

for any $0 \le \alpha < 1$, see Theorem 1.9 in [9] (with $\alpha < 1$ being optimal). Using this bound in the previous proof instead of the deeper fact that $H : L^\infty \to \mathrm{BMO}$ leads directly to the estimate (2.31) on the measure. Since the BMO-estimate (2.13) might be of interest in its own right, we have chosen to present Lemma 2.3 in this way.

Lemma 2.5. Let $u : \mathbb{T}^2 \to \mathbb{R}$ satisfy $\|u\|_{L^\infty(\mathbb{T}^2)} \le 1$. Assume that $u$ extends as a separately subharmonic function in each variable to a neighborhood of $\mathbb{T}^2$ such that for some $N \ge 1$ and $\rho > 0$,

$$ \sup_{z_1 \in A_\rho} \sup_{z_2 \in A_\rho} |u(z_1, z_2)| \le N. $$

Furthermore, suppose that $u = u_0 + u_1$ on $\mathbb{T}^2$ where

$$ \|u_0 - \langle u \rangle\|_{L^\infty(\mathbb{T}^2)} \le \varepsilon_0 \quad\text{and}\quad \|u_1\|_{L^1(\mathbb{T}^2)} \le \varepsilon_1 $$

with $0 < \varepsilon_0, \varepsilon_1 < 1$. Here $\langle u \rangle := \int_{\mathbb{T}^2} u(x, y) \, dx\,dy$. Then for any $\delta > 0$,

$$ \mathrm{mes}\bigl[(x, y) \in \mathbb{T}^2 : |u(x, y) - \langle u \rangle| > B^{\delta} \log(N/\varepsilon_1)\bigr] \le C N^2 \varepsilon_1^{-1} \exp\bigl(-c B^{-\frac{1}{2}+\delta}\bigr), $$

where $B = \varepsilon_0 \log(N/\varepsilon_1) + N^{\frac{3}{2}} \varepsilon_1^{\frac{1}{4}}$. The constants $c, C$ only depend on $\rho$.

Proof. We may assume that $\langle u \rangle = 0$ without significantly changing the hypotheses. Let $M = N^2 \varepsilon_1^{-\frac{1}{2}}$ and denote the Fejér-kernel on $\mathbb{T}$ with Fourier support $[-M+1, M-1]$ by $F_M$. Then $u *_1 F_M = u_0 *_1 F_M + u_1 *_1 F_M$, where $*_1$ denotes the convolution in $x$ alone. It is clear that for fixed $x \in \mathbb{T}$,

$$ \|u_0 *_1 F_M(x, \cdot)\|_{L^\infty_y} \le \varepsilon_0 \quad\text{and}\quad \|u_1 *_1 F_M(x, \cdot)\|_{L^1_y} \le M \varepsilon_1 \le 2 N^2 \sqrt{\varepsilon_1}. $$
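Since $F_M \ge 0$, $(u *_1 F_M)(x, \cdot)$ extends to a subharmonic function in the second variable satisfying

$$ \sup_{z \in A_\rho} |u *_1 F_M(x, z)| \le N. $$

Hence Lemma 2.3, in conjunction with the John–Nirenberg inequality, implies that for any $\lambda > 0$,

$$ \sup_{x \in \mathbb{T}} \mathrm{mes}\bigl[y \in \mathbb{T} : |(u *_1 F_M)(x, y) - \langle (u *_1 F_M)(x, \cdot) \rangle| > \lambda\bigr] \le C \exp\Bigl(-\frac{c\lambda}{B}\Bigr), \tag{2.32} $$

where

$$ B := C_\rho \bigl(\varepsilon_0 \log(N/\varepsilon_1) + N^{\frac{3}{2}} \varepsilon_1^{\frac{1}{4}}\bigr). \tag{2.33} $$

Observe that for any $x, x' \in \mathbb{T}$

$$ \sup_{y \in \mathbb{T}} |(u *_1 F_M)(x, y) - (u *_1 F_M)(x', y)| \le M \|u\|_{L^\infty(\mathbb{T}^2)} |x - x'| \le M |x - x'|. \tag{2.34} $$

Let $\mathcal{N} \subset \mathbb{T}$ be a $M^{-1}\lambda/4$-net. In view of (2.32) and (2.34) one concludes that

$$ \mathrm{mes}\Bigl[y \in \mathbb{T} : \sup_{x \in \mathbb{T}} |(u *_1 F_M)(x, y) - \langle (u *_1 F_M)(x, \cdot) \rangle| > \frac{1}{2}\lambda\Bigr] \le C\,\frac{M}{\lambda} \exp\Bigl(-\frac{c\lambda}{B}\Bigr). \tag{2.35} $$

Now let $\lambda = 2\sqrt{B}$ and denote the set on the left-hand side of (2.35) with this choice of $\lambda$ by $\mathcal{B}_1$. Thus

$$ \mathrm{mes}(\mathcal{B}_1) \le C N^2 \varepsilon_1^{-\frac{1}{2}} B^{-\frac{1}{2}} \exp\bigl(-c B^{-\frac{1}{2}}\bigr) \le C N^2 \varepsilon_1^{-1} \exp\bigl(-c B^{-\frac{1}{2}}\bigr). \tag{2.36} $$

Now fix some $y \in \mathbb{T} \setminus \mathcal{B}_1$ and consider the decomposition of $u(\cdot, y)$ as a function of the first variable given by

$$ u(\cdot, y) = \bigl[u(\cdot, y) - (u *_1 F_M)(\cdot, y)\bigr] + (u *_1 F_M)(\cdot, y). \tag{2.37} $$

From the Riesz representation

$$ u(z, y) = \int \log |z - \zeta| \, d\mu(\zeta) + h(z) \quad\text{with}\quad \mu(A_{\rho/2}) + \|h\|_{L^\infty(A_{\rho/4})} \le C_\rho N, $$

it is standard to deduce that the Fourier coefficients

$$ \hat u(\ell, y) := \int_{\mathbb{T}} u(x, y)\, e(-\ell x) \, dx $$

decay as follows:

$$ |\hat u(\ell, y)| \le \frac{C_\rho N}{|\ell|}. $$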
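In particular, by definition of $F_M$ and because of our choice of $y$, see (2.35),

$$ \|u(\cdot, y) - (u *_1 F_M)(\cdot, y)\|_2 \le C_\rho N M^{-\frac{1}{2}} \quad\text{and}\quad \sup_{x \in \mathbb{T}} |(u *_1 F_M)(x, y) - \langle (u *_1 F_M)(x, \cdot) \rangle| \le \sqrt{B}. \tag{2.38} $$

The mean appearing in the second term is uniformly small. In fact, for all $x \in \mathbb{T}$,

$$ |\langle (u *_1 F_M)(x, \cdot) \rangle| \le \int_{\mathbb{T}} |(u_0 *_1 F_M)(x, y)| \, dy + \int_{\mathbb{T}} |(u_1 *_1 F_M)(x, y)| \, dy \le \|u_0\|_{L^\infty(\mathbb{T}^2)} + M \|u_1\|_{L^1(\mathbb{T}^2)} \le \varepsilon_0 + 2 N^2 \sqrt{\varepsilon_1}. \tag{2.39} $$

Assuming as we may that $B \le 1$, one checks from (2.33) that the bound in (2.39) is no larger than $C\sqrt{B}$. Hence (2.38) implies that for any $y \in \mathbb{T} \setminus \mathcal{B}_1$ (recall that $M = N^2 \varepsilon_1^{-\frac{1}{2}}$)

$$ \|u(\cdot, y) - (u *_1 F_M)(\cdot, y)\|_1 \le C_\rho \varepsilon_1^{\frac{1}{4}} \quad\text{and}\quad \sup_{x \in \mathbb{T}} |(u *_1 F_M)(x, y)| \le C \sqrt{B}. $$

Applying Lemma 2.3 to the function $u(\cdot, y)$ with the decomposition given by (2.37) therefore yields

$$ \sup_{y \in \mathbb{T} \setminus \mathcal{B}_1} \|u(\cdot, y)\|_{\mathrm{BMO}} \le C_\rho \bigl(\sqrt{B} \log(N/\varepsilon_1) + N^{\frac{1}{2}} \varepsilon_1^{\frac{1}{8}}\bigr) \le C_\rho \sqrt{B} \log(N/\varepsilon_1). \tag{2.40} $$

It remains to be shown that

$$ v(y) := \langle u(\cdot, y) \rangle = \int_{\mathbb{T}} u(x, y) \, dx $$

is close to zero for most $y$. Clearly, $v$ extends to a subharmonic function on $A_\rho$ such that

$$ \sup_{z \in A_\rho} |v(z)| \le N \quad\text{and}\quad \langle v \rangle = \langle u \rangle = 0. $$

With $v_0(y) := \langle u_0(\cdot, y) \rangle$ and $v_1(y) := \langle u_1(\cdot, y) \rangle$ one has

$$ \|v_0\|_{L^\infty(\mathbb{T})} \le \varepsilon_0 \quad\text{and}\quad \|v_1\|_{L^1(\mathbb{T})} \le \varepsilon_1. $$

Therefore, Lemma 2.3 implies that

$$ \|v\|_{\mathrm{BMO}} \le C \bigl(\varepsilon_0 \log(N/\varepsilon_1) + \sqrt{N \varepsilon_1}\bigr) \le C B. $$

Thus

$$ \mathrm{mes}\bigl[y \in \mathbb{T} : |v(y)| > \sqrt{B}\bigr] \le C \exp\bigl(-c B^{-\frac{1}{2}}\bigr). \tag{2.41} $$

Denoting the set on the left-hand side by $\mathcal{B}_2$, let $\mathcal{B} := \mathcal{B}_1 \cup \mathcal{B}_2$. One concludes from (2.36), (2.41), and (2.40) by means of the John–Nirenberg inequality that

$$ \mathrm{mes}\bigl[(x, y) \in \mathbb{T}^2 : |u(x, y)| > B^{\delta} \log(N/\varepsilon_1)\bigr] \le \mathrm{mes}(\mathcal{B}) + C \exp\bigl(-c B^{-\frac{1}{2}+\delta}\bigr), $$

and the lemma follows. □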
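2.2. Averages of subharmonic functions over orbits of the skew-shift.

In what follows we assume that $\omega \in (0, 1)$ is Diophantine in the sense that

$$ \|n\omega\| \ge \varepsilon\, n^{-1} (1 + \log n)^{-2} \quad\text{for any } n \in \mathbb{Z}^+, \tag{2.42} $$

where $\varepsilon > 0$ is some arbitrary but fixed small number. Let $\Omega_\varepsilon$ be the set of those $\omega$ that satisfy (2.42). It is clear that $\mathrm{mes}[\mathbb{T} \setminus \Omega_\varepsilon] < C\varepsilon$ with an absolute constant $C$. The choice of logarithm in (2.42) is mainly for convenience. A very small power loss is also acceptable. Throughout this section we will use $\Omega_\varepsilon$ in this sense. Let $T_\omega : \mathbb{T}^2 \to \mathbb{T}^2$, $T_\omega(x, y) = (x + y, y + \omega) \pmod{\mathbb{Z}^2}$ be the skew-shift. Observe that the iterates of $T_\omega$ are given by

$$ T_\omega^k(x, y) = (x + ky + k(k-1)\omega/2,\; y + k\omega) \mod \mathbb{Z}^2 \tag{2.43} $$

for any $k \in \mathbb{Z}$.

Lemma 2.6. Let $u : \mathbb{T}^2 \to \mathbb{R}$ extend to some neighborhood of $\mathbb{T}^2$ as a separately subharmonic function in each variable so that for some $\rho > 0$,

$$ \sup_{z_1 \in A_\rho} \sup_{z_2 \in A_\rho} |u(z_1, z_2)| \le 1. \tag{2.44} $$

Fix a small $\varepsilon > 0$ and let $\omega \in \Omega_\varepsilon$, see (2.42). For any $\delta > 0$ there exist constants $c, C$ such that

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{K} \sum_{k=1}^{K} u \circ T_\omega^k(x, y) - \langle u \rangle\Bigr| > K^{-\frac{1}{12}+2\delta}\Bigr] \le C \exp\bigl(-c K^{\delta}\bigr), \tag{2.45} $$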
for any positive integer $K$. Here $\langle u \rangle = \int_{\mathbb{T}^2} u(x, y) \, dx\,dy$ and the constants depend only on $\rho, \delta, \varepsilon$.

Proof. Let $\hat u(\ell, y) = \int_0^1 u(x, y)\, e(-\ell x) \, dx$ denote the Fourier coefficient with respect to the first variable. As above one deduces by means of the Riesz representation of the subharmonic function $z \mapsto u(z, y)$ and from (2.44) that

$$ \sup_{y \in \mathbb{T}} |\hat u(\ell, y)| \le C_\rho |\ell|^{-1}. \tag{2.46} $$

With some positive integer $p_1$ to be determined, let

$$ u(x, y) = \sum_{|\ell_1| \le p_1} \hat u(\ell_1, y)\, e(\ell_1 x) + \sum_{|\ell_1| > p_1} \hat u(\ell_1, y)\, e(\ell_1 x) =: u_1(x, y) + u_1'(x, y), \tag{2.47} $$

where $u_1$ and $u_1'$ are the respective sums on the right-hand side of (2.47). By (2.46),

$$ \sup_{y \in \mathbb{T}} \|u_1'(\cdot, y)\|_{L^2_x} \le C p_1^{-\frac{1}{2}}. \tag{2.48} $$
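With some positive integer $p_2$ to be determined below, let

$$ u_1(x, y) = \sum_{\substack{|\ell_1| \le p_1 \\ |\ell_2| > p_2}} \hat u(\ell_1, \ell_2)\, e(\ell_1 x + \ell_2 y) + \sum_{\substack{|\ell_1| \le p_1 \\ |\ell_2| \le p_2}} \hat u(\ell_1, \ell_2)\, e(\ell_1 x + \ell_2 y) =: u_2(x, y) + u_3(x, y). \tag{2.49} $$

Using the Riesz representation in the second variable one derives from (2.44) that

$$ |\hat u(\ell_1, \ell_2)| \le \int_{\mathbb{T}} \Bigl| \int_{\mathbb{T}} e(-\ell_2 y)\, u(x, y) \, dy \Bigr| \, dx \le \frac{C_\rho}{1 + |\ell_2|}. \tag{2.50} $$

Therefore,

$$ \|u_2\|_{L^2(\mathbb{T}^2)} \le \sum_{|\ell_1| \le p_1} \Bigl\| \sum_{|\ell_2| > p_2} \hat u(\ell_1, \ell_2)\, e(\ell_2 y) \Bigr\|_{L^2_y} \le C p_1 p_2^{-\frac{1}{2}}. $$

In particular,

$$ \mathrm{mes}\Bigl[y \in \mathbb{T} : \int_{\mathbb{T}} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_2 \circ T^k(x, y) \Bigr| \, dx > K^{-1}\Bigr] \le K \int_{\mathbb{T}^2} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_2 \circ T^k(x, y) \Bigr| \, dx\,dy \le K \|u_2\|_{L^1(\mathbb{T}^2)} \le C K p_1 p_2^{-\frac{1}{2}}. \tag{2.51} $$

Let $\mathcal{B}$ be the set on the left-hand side of (2.51). In view of (2.43),

$$ \sup_{(x,y) \in \mathbb{T}^2} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_3 \circ T^k(x, y) - \langle u \rangle \Bigr| \le \frac{1}{K} \sum_{\substack{|\ell_1| \le p_1,\, |\ell_2| \le p_2 \\ |\ell_1| + |\ell_2| \ne 0}} \frac{C}{1 + |\ell_2|} \Bigl| \sum_{k=1}^{K} e\bigl(\ell_1 (ky + \omega k(k-1)/2) + \ell_2 k\omega\bigr) \Bigr| $$

$$ \le \sum_{\ell_2 = 1}^{p_2} \frac{C}{\ell_2} \Bigl| \frac{1}{K} \sum_{k=1}^{K} e(\ell_2 k\omega) \Bigr| + \frac{C}{K} \sum_{\ell_1 = 1}^{p_1} \sum_{\ell_2 = 0}^{p_2} \frac{1}{1 + \ell_2} \Bigl( \sum_{m=1}^{K-1} \min\bigl(K, \|m \ell_1 \omega\|^{-1}\bigr) \Bigr)^{\frac{1}{2}} \tag{2.52} $$

$$ \le \frac{C}{K} \sum_{\ell_2 = 1}^{p_2} \frac{1}{\ell_2} \min\bigl(K, \|\ell_2 \omega\|^{-1}\bigr) + \frac{C}{K} \sqrt{p_1}\, \log p_2 \Bigl( \sum_{\ell_1 = 1}^{p_1} \sum_{m=1}^{K-1} \min\bigl(K, \|m \ell_1 \omega\|^{-1}\bigr) \Bigr)^{\frac{1}{2}} =: S_1 + S_2. \tag{2.53} $$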
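To obtain the second term in line (2.52), one uses the well-known method of Weyl-differencing, cf. Montgomery [11, Chap. 3]. In fact,

$$ \Bigl| \sum_{k=1}^{K} e\bigl(\ell_1 (ky + \omega k(k-1)/2) + \ell_2 k\omega\bigr) \Bigr|^2 \le K + 2 \sum_{m=1}^{K-1} \min\Bigl(K, \frac{2}{|1 - e(\ell_1 \omega m)|}\Bigr) \le C \sum_{m=1}^{K-1} \min\bigl(K, \|\ell_1 \omega m\|^{-1}\bigr), $$

which leads to (2.52). In view of (2.42) (with $a \sim b$ denoting $b \le a \le 2b$), for any positive integer $R$,

$$ \sum_{\ell=1}^{R} \frac{1}{\ell} \min\bigl(1, K^{-1} \|\ell\omega\|^{-1}\bigr) \le \sum_{1 \le 2^j \le K} \sum_{\ell=1}^{R} \frac{1}{\ell}\, \chi_{[\|\ell\omega\| \sim 2^{-j}]} \min\bigl(1, K^{-1} 2^j\bigr) + \sum_{\ell=1}^{R} \frac{1}{\ell}\, \chi_{[\|\ell\omega\| \le K^{-1}]} $$

$$ \le C \sum_{1 \le 2^j \le K} \frac{2^j}{K}\, 2^{-j} j^2 \log R + C\, \frac{(\log K)^2}{K} \sum_{\ell=1}^{R} \frac{1}{\ell} \le C\, \frac{(\log K)^2}{K} \log R. $$

Here the constants depend on $\varepsilon$. Thus,

$$ S_1 \le C_\varepsilon\, \frac{(\log K)^2}{K} \log p_2. \tag{2.54} $$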
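The Weyl-differencing inequality used to obtain (2.52) can be tested numerically. The sketch below compares the squared exponential sum with the bound $K + 2\sum_{m=1}^{K-1} \min(K, 2/|1 - e(\ell_1\omega m)|)$. The function names and the sample parameters are assumptions chosen here for illustration only.

```python
import cmath
import math


def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)


def weyl_sum_sq(l1, l2, y, omega, K):
    """|sum_{k=1}^K e(l1 (k y + omega k (k-1)/2) + l2 k omega)|^2."""
    s = sum(e(l1 * (k * y + omega * k * (k - 1) / 2.0) + l2 * k * omega)
            for k in range(1, K + 1))
    return abs(s) ** 2


def weyl_bound(l1, omega, K):
    """K + 2 sum_{m=1}^{K-1} min(K, 2 / |1 - e(l1 omega m)|)."""
    total = float(K)
    for m in range(1, K):
        denom = abs(1.0 - e(l1 * omega * m))
        # If the ratio e(l1 omega m) equals 1, the inner geometric sum is trivially <= K.
        term = float(K) if denom == 0.0 else min(float(K), 2.0 / denom)
        total += 2.0 * term
    return total


# Hypothetical sample frequency (golden mean), used only for illustration.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0
print(weyl_sum_sq(2, -3, 0.71, OMEGA, 64), weyl_bound(2, OMEGA, 64))
```

The point of the differencing step is that the difference of the quadratic phase at $k+m$ and at $k$ is linear in $k$, so that each inner sum is a geometric series with ratio $e(\ell_1\omega m)$; this is why only $\|\ell_1\omega m\|$, and neither $y$ nor $\ell_2$, appears in the bound.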
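By Dirichlet's principle there is an integer $1 \le q \le K$ and an integer $p$ so that $\gcd(p, q) = 1$ and $|\omega - \frac{p}{q}| \le \frac{1}{qK}$. In view of (2.42), one also has $q \ge c_\varepsilon \frac{K}{(\log K)^2}$. By means of the standard bound on the divisor function and the usual estimates for reciprocal sums, cf. [11, Chap. 3],

$$ \sum_{\ell_1 = 1}^{p_1} \sum_{m=1}^{K-1} \min\bigl(K, \|m \ell_1 \omega\|^{-1}\bigr) \le C_{\varepsilon_2} (p_1 K)^{\varepsilon_2} \sum_{k=1}^{p_1 K} \min\bigl(K, \|k\omega\|^{-1}\bigr) $$

$$ \le C_{\varepsilon_2} (p_1 K)^{\varepsilon_2} \Bigl( \frac{p_1 K^2}{q} + p_1 K \log q + K + q \log q \Bigr) \le C_{\varepsilon_2} (p_1 K)^{1 + 2\varepsilon_2}, \tag{2.55} $$

where $\varepsilon_2 > 0$ is an arbitrarily small parameter. One obtains from (2.53), (2.54), and (2.55) that

$$ \sup_{(x,y) \in \mathbb{T}^2} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_3 \circ T^k(x, y) - \langle u \rangle \Bigr| \le S_1 + S_2 \le C p_1^{1+\varepsilon_2} K^{-\frac{1}{2}+\varepsilon_2} \log p_2 \tag{2.56} $$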
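with a constant that depends both on $\varepsilon$ and $\varepsilon_2$. Fix some small $\delta > 0$ and choose $p_1 = K^{\frac{1}{3}}$ and $p_2 = \exp(4K^{\delta})$. The conclusion from the preceding is as follows, cf. (2.48), (2.51), and (2.56): There exists a subset $\mathcal{B} \subset \mathbb{T}$ of measure

$$ \mathrm{mes}(\mathcal{B}) \le C K^{\frac{4}{3}} \exp\bigl(-2K^{\delta}\bigr) \le C \exp\bigl(-K^{\delta}\bigr), \tag{2.57} $$

such that (choosing $2\varepsilon_2 < \delta$)

$$ \sup_{y \in \mathbb{T} \setminus \mathcal{B}} \Bigl\| \frac{1}{K} \sum_{k=1}^{K} u \circ T^k(\cdot, y) - \langle u \rangle \Bigr\|_{L^1_x} \le \sup_{y \in \mathbb{T}} \Bigl\| \frac{1}{K} \sum_{k=1}^{K} u_1' \circ T^k(x, y) \Bigr\|_{L^1_x} + \sup_{y \in \mathbb{T} \setminus \mathcal{B}} \int_{\mathbb{T}} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_2 \circ T^k(x, y) \Bigr| \, dx + \sup_{(x,y) \in \mathbb{T}^2} \Bigl| \frac{1}{K} \sum_{k=1}^{K} u_3 \circ T^k(x, y) - \langle u \rangle \Bigr| $$

$$ \le C K^{-\frac{1}{6}} + K^{-1} + C K^{\frac{1}{3}+\varepsilon_2} K^{\delta - \frac{1}{2} + \varepsilon_2} \le C K^{-\frac{1}{6}+2\delta} \tag{2.58} $$

with constants that depend on both $\delta$ and $\varepsilon$. Here $u_1'$ denotes the high-frequency part $\sum_{|\ell_1| > p_1}$ of $u$ from (2.47), which is controlled by (2.48). To obtain (2.45), one uses Lemma 2.3 to convert the $L^1$-bound (2.58) into an $L^\infty$-bound at the cost of removing an exponentially small set. For any fixed $y \in \mathbb{T} \setminus \mathcal{B}$, consider the bounded subharmonic function

$$ v_y(z) := \frac{1}{K} \sum_{k=1}^{K} u \circ T^k(z, y) \quad\text{with } z \in A_\rho. $$

It is important to notice that $y$ is real. Otherwise $T^k(z, y) \notin A_\rho \times A_\rho$ for large $k$, see (2.43). One has the decomposition

$$ \frac{1}{K} \sum_{k=1}^{K} u \circ T^k(\cdot, y) = \langle u \rangle + \Bigl[ \frac{1}{K} \sum_{k=1}^{K} u \circ T^k(\cdot, y) - \langle u \rangle \Bigr]. $$

In view of (2.58) one obtains from Lemma 2.3 (with $N = 1$, $\varepsilon_0 = 0$, and $\varepsilon_1 = K^{-\frac{1}{6}+2\delta}$) that

$$ \Bigl\| \frac{1}{K} \sum_{k=1}^{K} u \circ T^k(\cdot, y) \Bigr\|_{\mathrm{BMO}_x} \le C_\delta K^{-\frac{1}{12}+\delta}. $$

By the John–Nirenberg inequality thus

$$ \sup_{y \in \mathbb{T} \setminus \mathcal{B}} \mathrm{mes}\bigl[x \in \mathbb{T} : |v_y(x) - \langle v_y \rangle| > C_\delta K^{-\frac{1}{12}+2\delta}\bigr] \le C \exp\bigl(-K^{\delta}\bigr). \tag{2.59} $$

Since (2.58) implies that $|\langle v_y \rangle - \langle u \rangle| \le C_\delta K^{-\frac{1}{6}+2\delta}$, the lemma follows from (2.59) and (2.57). □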
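The qualitative content of Lemma 2.6, namely that averages along skew-shift orbits approach the mean $\langle u \rangle$, can be observed numerically. The sketch below is only an illustration under assumptions made here: the test function is a hypothetical trigonometric polynomial with mean zero, the frequency is the golden mean, and no attempt is made to reproduce the rate $K^{-1/12+2\delta}$ or the exceptional set.

```python
import math


def birkhoff_average(u, x, y, omega, K):
    """(1/K) sum_{k=1}^K u(T^k(x, y)) for T(x, y) = (x + y, y + omega) mod 1."""
    total = 0.0
    for _ in range(K):
        x, y = (x + y) % 1.0, (y + omega) % 1.0
        total += u(x, y)
    return total / K


def u_example(x, y):
    """Hypothetical test function with mean zero over the torus."""
    return math.cos(2.0 * math.pi * x) + 0.5 * math.sin(2.0 * math.pi * (x + y))


# Golden-mean frequency, chosen only for illustration.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0

for K in (100, 1000, 10000):
    # Deviation of the orbit average from the mean <u> = 0.
    print(K, abs(birkhoff_average(u_example, 0.1, 0.2, OMEGA, K)))
```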
Remark 2.7. It will be important in the proof of localization below that the previous lemma requires only finitely many conditions on $\omega$. More precisely, the arithmetic nature of $\omega$ only enters into the estimate of $S_1$ and $S_2$. Furthermore, what is required for the bound on $S_1$ is the following: If for some $K^{-1} \le \kappa < 1$ and some positive distinct integers $\ell, \ell'$,

$$ \|\ell\omega\| < \kappa \quad\text{and}\quad \|\ell'\omega\| < \kappa, $$

then $|\ell - \ell'| \ge c_\varepsilon \kappa^{-1} (\log \kappa)^{-2}$. This clearly requires the Diophantine condition (2.42) only for $1 \le k \le K$. As far as $S_2$ is concerned, it is evident from the estimate of $S_2$ that (2.42) is used only in the range $1 \le k \le p_1 K \le K^2$.

2.3. The main inductive step in the proof of the large deviation theorem.

Consider equations of the form

$$ -\psi_{n+1} - \psi_{n-1} + \lambda v(T_\omega^n(x, y))\,\psi_n = E\psi_n, \tag{2.60} $$

where $T_\omega : \mathbb{T}^2 \to \mathbb{T}^2$, $T_\omega(x, y) = (x + y, y + \omega) \pmod{\mathbb{Z}^2}$ is the skew-shift, and $v$ is a nonconstant real–analytic function on $\mathbb{T}^2$ satisfying some further conditions that will be specified below. Let

$$ A_j(x, y; \lambda, E) = \begin{pmatrix} \lambda v(T_\omega^j(x, y)) - E & -1 \\ 1 & 0 \end{pmatrix}. $$

The matrix $M_n(x, y; \lambda, E) = \prod_{j=n}^{1} A_j(x, y; \lambda, E)$ denotes the monodromy matrix of Eq. (2.60). As usual,

$$ L_n(\lambda, E) = \frac{1}{n} \int_{\mathbb{T}^2} \log \|M_n(x, y; \lambda, E)\| \, dx\,dy $$

and $L(\lambda, E) = \lim_{n\to\infty} L_n(\lambda, E)$ denotes the Lyapunov exponent. Introduce a scaling factor

$$ S(\lambda, E) = \log(C_v + |\lambda| + |E|) \ge 1, \tag{2.61} $$

where $C_v$ is a constant depending only on the potential $v$ so that for all $n$

$$ \sup_{z \in A_\rho} \sup_{y \in \mathbb{T}} \frac{1}{n} \log \|M_n(z, y; \lambda, E)\| + \sup_{z_1 \in A_\rho} \sup_{z_2 \in A_\rho} \frac{1}{n^2} \log \|M_n(z_1, z_2; \lambda, E)\| \le S(\lambda, E). \tag{2.62} $$

Here $\rho = \rho_v$ is determined by $v$. Observe that (2.62) basically requires the function $v$ to extend in the first variable to an analytic function on $\mathbb{C} \setminus \{0\}$ such that

$$ \sup_{y \in \mathbb{T}} |v(z, y)| \le C(|z|^d + |z|^{-d}) $$

with some constant $d$, see (2.43). For example, any trigonometric polynomial

$$ v(x, y) = \sum_{|k| + |\ell| \le d} a_{k,\ell}\, e(kx + \ell y) $$
satisfies this requirement. Another possibility, which is slightly more technical to state, but applies to any analytic function on a neighborhood of $\mathbb{T}^2$, is as follows: For all $n$,

$$ \sup_{z_1 \in A_\rho} \sup_{z_2 \in A_{\rho/n}} \frac{1}{n} \log \|M_n(z_1, z_2; \lambda, E)\| \le S(\lambda, E). \tag{2.63} $$

The difference from (2.62) here is that in the second term the $z_2$-variable only needs to be taken in an annulus of thickness $\frac{\rho}{n}$. Observe that (2.63) can be stated for any potential $v$ that extends analytically to a neighborhood of $\mathbb{T}^2$ of size $\rho$. This is essential for real-analytic $v$. The reason (2.63) is sufficient for our purposes is the following simple fact. Suppose $u$ is a subharmonic function on $A_{\rho/n}$ bounded by one. Then there is the Riesz representation

$$ u(z) = \int \log |z - \zeta| \, d\mu(\zeta) + h(z), \quad\text{where}\quad \mu(A_{\rho/(2n)}) + \|h\|_{L^\infty(A_{\rho/(4n)})} \le C_\rho n. \tag{2.64} $$

In particular, one has the decay of the Fourier coefficients

$$ |\hat u(\ell)| \le \frac{Cn}{|\ell|}. \tag{2.65} $$

The reader will easily verify that (2.64), (2.65) are all that is required in the proof of the following lemma. The following lemma provides the inductive step in the proof of the large deviation theorem. It is based on the avalanche principle and all our previous lemmas.

Lemma 2.8. Fix $\varepsilon > 0$ small and let $\omega \in \Omega_\varepsilon$, see (2.42). Suppose $n$ and $N > n$ are positive integers such that

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log \|M_n(x, y; \lambda, E)\| - L_n(\lambda, E)\Bigr| > \frac{\gamma}{10} S(\lambda, E)\Bigr] \le N^{-10}, \tag{2.66} $$

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{2n} \log \|M_{2n}(x, y; \lambda, E)\| - L_{2n}(\lambda, E)\Bigr| > \frac{\gamma}{10} S(\lambda, E)\Bigr] \le N^{-10}. \tag{2.67} $$

Assume that

$$ \min\bigl(L_n(\lambda, E), L_{2n}(\lambda, E)\bigr) \ge \gamma S(\lambda, E), \tag{2.68} $$

$$ L_n(\lambda, E) - L_{2n}(\lambda, E) \le \frac{\gamma}{40} S(\lambda, E), \tag{2.69} $$

$$ 9\gamma n S \ge 10 \log(2N) \quad\text{and}\quad n^2 \le N. \tag{2.70} $$

Then there is some absolute constant $C_0$ with the property that (with $L_N = L_N(\lambda, E)$ etc.)

$$ L_N \ge \gamma S(\lambda, E) - 2(L_n - L_{2n}) - C_0 S(\lambda, E)\, n N^{-1} \quad\text{and}\quad L_N - L_{2N} \le C_0 S(\lambda, E)\, n N^{-1}. \tag{2.71} $$
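Furthermore, for any $\sigma < \frac{1}{24}$ there is $\tau = \tau(\sigma) > 0$ so that

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{N} \log \|M_N(x, y; \lambda, E)\| - L_N(\lambda, E)\Bigr| > S(\lambda, E) N^{-\tau}\Bigr] \le C \exp(-N^{\sigma}) \tag{2.72} $$

with some constant $C = C(\sigma, \varepsilon)$.

Proof. We shall fix $\omega$, $\lambda$, and $E$ for the purposes of this proof and suppress these variables in the notation. In particular, $S = S(\lambda, E)$. Denote the set on the left-hand side of (2.66) by $\mathcal{B}_n$ and the set on the left-hand side of (2.67) by $\mathcal{B}_{2n}$. For any $(x, y) \in \mathbb{T}^2 \setminus \mathcal{B}_n$,

$$ \|M_n(x, y)\| \ge \exp\Bigl(n\gamma S - \frac{\gamma}{10} S n\Bigr) = \exp\Bigl(\frac{9\gamma}{10} S n\Bigr) =: \mu. $$

By (2.70), $\mu \ge 2N$. Furthermore, for any $(x, y) \notin \mathcal{B}_n \cup T^{-n}\mathcal{B}_n \cup \mathcal{B}_{2n}$, (2.66)–(2.69) imply

$$ \log \|M_n \circ T^n(x, y)\| + \log \|M_n(x, y)\| - \log \|M_{2n}(x, y)\| \le 2n(L_n - L_{2n}) + \frac{4\gamma}{10} S n \le \frac{9\gamma}{20} S n = \frac{1}{2} \log \mu. $$

Applying Prop. 2.1 $N$ times yields a set $\mathcal{B}_1 \subset \mathbb{T}^2$ with measure

$$ \mathrm{mes}(\mathcal{B}_1) \le 4N \cdot N^{-10} = 4N^{-9} \tag{2.73} $$

so that for any $(x, y) \in \mathbb{T}^2 \setminus \mathcal{B}_1$,

$$ \Bigl| \frac{1}{N} \log \|M_N(x, y)\| + \frac{1}{N} \sum_{j=1}^{N} \frac{1}{n} \log \|M_n \circ T^j(x, y)\| - \frac{2}{N} \sum_{j=1}^{N} \frac{1}{2n} \log \|M_{2n} \circ T^j(x, y)\| \Bigr| \le C \Bigl( \frac{Sn}{N} + \frac{1}{\mu} \Bigr) \le C S n N^{-1}. \tag{2.74} $$

Integrating (2.74) over $\mathbb{T}^2$ yields

$$ |L_N + L_n - 2L_{2n}| \le C S n N^{-1} + 16 S N^{-9}, \tag{2.75} $$

which implies the first inequality in (2.71). To obtain the second inequality in (2.71), observe that by virtue of (2.70) all arguments so far apply equally well to $M_{2N}$ instead of $M_N$. Subtracting (2.75) from the analogous inequality involving $L_{2N}$ yields the desired bound. Denote

$$ u_N(x, y) = \frac{1}{N} \log \|M_N(x, y)\|, $$

and similarly with $n$ and $2n$. In view of (2.63), both $u_n$ and $u_{2n}$ extend to separately subharmonic functions in both variables such that

$$ \sup_{z_1 \in A_\rho} \sup_{z_2 \in A_{\rho/n}} \bigl( |u_n(z_1, z_2)| + |u_{2n}(z_1, z_2)| \bigr) \le C S. $$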
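Applying Lemma 2.6 to $u_n/S$ and $u_{2n}/(2S)$ (cf. the comments following (2.63), in particular (2.64) and (2.65)) therefore implies that there is a set $\mathcal{B}_2 \subset \mathbb{T}^2$ with measure ($\delta > 0$ is a fixed small number)

$$ \mathrm{mes}(\mathcal{B}_2) \le C \exp\bigl(-N^{\delta}\bigr), \tag{2.76} $$

such that for any $(x, y) \in \mathcal{G} := \mathbb{T}^2 \setminus (\mathcal{B}_1 \cup \mathcal{B}_2)$,

$$ |u_N(x, y) + L_n - 2L_{2n}| \le C S n N^{-1} + C_\delta S N^{-\frac{1}{12}+2\delta}, \tag{2.77} $$

see (2.74). For small $\delta$ the second term is the larger one since $N \ge n^2$. Fix such an integer $N$. Consider the following decomposition of $u := u_N$ as a function on $\mathbb{T}^2$:

$$ u = u\chi_{\mathcal{G}} + L_N \chi_{\mathcal{G}^c} + u\chi_{\mathcal{G}^c} - L_N \chi_{\mathcal{G}^c} =: u_0 + u_1. $$

Here $u_0$ is the sum of the first two terms (and $\mathcal{G}^c := \mathbb{T}^2 \setminus \mathcal{G}$). In view of (2.77) and (2.75),

$$ \|u_0 - \langle u \rangle\|_\infty = \|u_0 - L_N\|_\infty = \|u - L_N\|_{L^\infty(\mathcal{G})} \le \|u_N + L_n - 2L_{2n}\|_{L^\infty(\mathcal{G})} + |L_N + L_n - 2L_{2n}| \le C_\delta S N^{-\frac{1}{12}+2\delta}. \tag{2.78} $$

On the other hand, (2.73) and (2.76) imply that

$$ \|u_1\|_1 \le 2S\, \mathrm{mes}(\mathcal{G}^c) \le C S \bigl( N^{-9} + \exp\bigl(-N^{\delta}\bigr) \bigr) \le C_\delta S N^{-9}. \tag{2.79} $$

Applying Lemma 2.5 to the function $u/S$ with $\varepsilon_0$ and $\varepsilon_1$ given by (2.78) and (2.79), respectively, proves (2.72). Indeed, in this case the quantity $B$ from Lemma 2.5 satisfies

$$ B \le C_\delta N^{-\frac{1}{12}+2\delta} \log(N^{10}) + C N^{\frac{3}{2}} N^{-\frac{9}{4}}, $$

which gives the value of $\sigma$ stated above. □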
Remark 2.9. In view of Remark 2.7 it is clear that Prop. 2.11 only requires the Diophantine condition (2.42) in the range $1 \le k \le N^2$. This will be relevant in the proof of localization below.

2.4. The initial condition via large disorder.

Let $V_j = v \circ T^j(x, y)$ and define

$$ f_n(x, y; \lambda, E) = \det \begin{pmatrix} \lambda V_1 - E & -1 & 0 & \cdots & 0 \\ -1 & \lambda V_2 - E & -1 & \cdots & 0 \\ 0 & -1 & \lambda V_3 - E & \ddots & \vdots \\ \vdots & & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & \lambda V_n - E \end{pmatrix}. \tag{2.80} $$

Recall the simple property

$$ M_n(x, y; \lambda, E) = \begin{pmatrix} f_n(x, y; \lambda, E) & -f_{n-1}(T(x, y); \lambda, E) \\ f_{n-1}(x, y; \lambda, E) & -f_{n-2}(T(x, y); \lambda, E) \end{pmatrix}. \tag{2.81} $$

Finally, let

$$ D_n(x, y; \lambda, E) = \mathrm{diag}(\lambda V_1 - E, \ldots, \lambda V_n - E). \tag{2.82} $$
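The determinant identity (2.81) is easy to confirm numerically for small $n$. The sketch below builds the diagonal entries $\lambda V_j - E$ along a skew-shift orbit, computes the determinants by Gaussian elimination on the dense tridiagonal matrix, and compares with the product $A_n \cdots A_1$. The potential $\cos(2\pi x)$ is the paper's model example; the helper names, parameter values and the elimination routine are assumptions introduced here for illustration.

```python
import math


def orbit_values(x, y, omega, n, lam, E):
    """Diagonal entries lam * V_j - E with V_j = cos(2 pi x_j), (x_j, y_j) = T^j(x, y)."""
    vals = []
    for _ in range(n):
        x, y = (x + y) % 1.0, (y + omega) % 1.0
        vals.append(lam * math.cos(2.0 * math.pi * x) - E)
    return vals


def tridiag_det(diag):
    """Determinant of the matrix in (2.80): given diagonal, -1 on both off-diagonals.
    Uses Gaussian elimination with partial pivoting; the empty determinant is 1."""
    n = len(diag)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = diag[i]
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -1.0
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


def monodromy(diag):
    """M_n = A_n ... A_1 with A_j = [[diag_j, -1], [1, 0]]."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for d in diag:
        m = [[d * m[0][0] - m[1][0], d * m[0][1] - m[1][1]],
             [m[0][0], m[0][1]]]
    return m


def monodromy_from_determinants(diag):
    """Right-hand side of (2.81); requires len(diag) >= 2."""
    n = len(diag)
    return [[tridiag_det(diag), -tridiag_det(diag[1:])],           # f_n, -f_{n-1} o T
            [tridiag_det(diag[:n - 1]), -tridiag_det(diag[1:n - 1])]]  # f_{n-1}, -f_{n-2} o T


OMEGA = (math.sqrt(5.0) - 1.0) / 2.0  # hypothetical sample frequency
d = orbit_values(0.3, 0.6, OMEGA, 6, 3.0, 0.5)
print(monodromy(d))
print(monodromy_from_determinants(d))
```

Shifting the starting point by $T$ corresponds to dropping the first diagonal entry, which is why the second column uses `diag[1:]`.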
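Lemma 2.10. There exist constants $\lambda_0$ and $B$ depending only on $v$ such that for any positive integer $n$,

$$ \sup_E \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log \|M_n(x, y; \lambda, E)\| - L_n(\lambda, E)\Bigr| \ge \frac{1}{20} S(\lambda, E)\Bigr] \le n^{-50}, \tag{2.83} $$

provided $\lambda \ge \lambda_0 \vee n^B$. Furthermore, for those $\lambda$ and all $E$,

$$ L_n(\lambda, E) \ge \frac{1}{2} S(\lambda, E) \quad\text{and}\quad L_n(\lambda, E) - L_{2n}(\lambda, E) \le \frac{1}{80} S(\lambda, E). $$

Proof. The matrix on the right-hand side of (2.80) can be written in the form $D_n + B_n$, where $D_n$ is given by (2.82). Clearly, $\|B_n\| \le 2$ and

$$ \frac{1}{n} \log |\det D_n(x, y; \lambda, E)| = \log \lambda + \frac{1}{n} \sum_{j=1}^{n} \log |v(T^j(x, y)) - E/\lambda|. \tag{2.84} $$

It is a well-known property of nonconstant real-analytic functions $v$ that there exist constants $b > 0$ and $C$ depending on $v$ such that

$$ \mathrm{mes}\bigl[(x, y) \in \mathbb{T}^2 : |v(x, y) - h| < t\bigr] \le C t^{b} \tag{2.85} $$

for all $-2\|v\|_\infty \le h \le 2\|v\|_\infty$ and $t > 0$, see for example Lemma 11.4 in [7]. Therefore, for any $|E| \le 2\lambda \|v\|_\infty$,

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \frac{1}{n} \sum_{j=1}^{n} \log |v \circ T^j(x, y) - E/\lambda| < -\rho\Bigr] < C n\, e^{-b\rho}. \tag{2.86} $$

One also has the upper bound

$$ \sup_{(x,y) \in \mathbb{T}^2} \frac{1}{n} \sum_{j=1}^{n} \log |v(T^j(x, y)) - E/\lambda| \le \log(3\|v\|_\infty). \tag{2.87} $$

Since

$$ \|D_n(x, y; \lambda, E)^{-1}\| \le \lambda^{-1} \max_{1 \le j \le n} |v \circ T^j(x, y) - E/\lambda|^{-1}, $$

(2.85) implies that

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \|D_n(x, y; \lambda, E)^{-1}\| > \frac{1}{4}\Bigr] \le n\, \mathrm{mes}\bigl[(x, y) \in \mathbb{T}^2 : |v(x, y) - E/\lambda| < 4\lambda^{-1}\bigr] \le C n \lambda^{-b}. \tag{2.88} $$

Hence

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \|D_n(x, y; \lambda, E)^{-1} B_n\| > \frac{1}{2}\Bigr] \le C n \lambda^{-b}. \tag{2.89} $$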
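In view of (2.80), (2.84), (2.86), (2.87), and (2.88),

$$ \Bigl| \frac{1}{n} \log |f_n(x, y; \lambda, E)| - \log \lambda \Bigr| \le \Bigl| \frac{1}{n} \sum_{j=1}^{n} \log |v(T^j(x, y)) - E/\lambda| \Bigr| + \frac{1}{n} \bigl| \log |\det(I + D_n(x, y; \lambda, E)^{-1} B_n)| \bigr| \le \rho + \log(3\|v\|_\infty) + \log 2 \tag{2.90} $$

up to a set of measure not exceeding

$$ C n\, e^{-b\rho} + C n \lambda^{-b}. \tag{2.91} $$

Now let $\rho = \frac{1}{400} \log \lambda$ and assume $(6\|v\|_\infty)^{400} \le \lambda$. Then the right-hand side of (2.90) is no larger than $\frac{1}{200} \log \lambda$. Under these assumptions the measure given by (2.91) is on the order of $C n \lambda^{-\frac{b}{400}}$. Choosing $\lambda \ge n^B$ for some $B$ depending only on $v$ implies

$$ \sup_{|E| \le 2\lambda\|v\|_\infty} \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log |f_n(x, y; \lambda, E)| - \log \lambda\Bigr| \ge \frac{1}{200} \log \lambda\Bigr] \le n^{-100}. $$

In view of (2.81) one therefore obtains

$$ \sup_{|E| \le 2\lambda\|v\|_\infty} \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log \|M_n(x, y; \lambda, E)\| - \log \lambda\Bigr| \ge \frac{1}{199} \log \lambda\Bigr] \le 4n^{-100}. \tag{2.92} $$

In particular,

$$ |L_n(\lambda, E) - \log \lambda| \le \frac{1}{199} \log \lambda + 4 S(\lambda, E)\, n^{-100} \le \frac{1}{198} S(\lambda, E), \tag{2.93} $$

provided $n \ge 2$. Since

$$ \log \lambda \ge \frac{99}{100} \sup_{|E| \le 2\lambda\|v\|_\infty} S(\lambda, E) $$

for large $\lambda_0$, (2.93) implies the second statement of the lemma in this range of $E$. Replacing $\log \lambda$ with $L_n$ in (2.92) yields

$$ \sup_{|E| \le 2\lambda\|v\|_\infty} \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log \|M_n(x, y; \lambda, E)\| - L_n(\lambda, E)\Bigr| \ge \frac{1}{90} S(\lambda, E)\Bigr] \le 4n^{-100}. \tag{2.94} $$

If $|E| > 2\lambda\|v\|_\infty$ and $\lambda_0$ is sufficiently large, then the set in (2.83) is empty. In fact, for such $E$,

$$ \Bigl| \frac{1}{n} \log |\det D_n(x, y; \lambda, E)| - \log |E| \Bigr| \le 2, $$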
and thus

$$ \Bigl| \frac{1}{n} \log |f_n(x, y; \lambda, E)| - \log |E| \Bigr| \le 4, $$

which implies that for large $\lambda$,

$$ \Bigl| \frac{1}{n} \log \|M_n(x, y; \lambda, E)\| - \log |E| \Bigr| \le 8 \le \frac{1}{200} S(\lambda, E). $$

Hence

$$ \bigl| L_n(\lambda, E) - \log |E| \bigr| \le \frac{1}{200} S(\lambda, E), $$

and the lemma follows. □

2.5. The proof of the large deviation estimate and positivity of the Lyapunov exponent.

Proposition 2.11. Fix $\varepsilon > 0$ small and let $\omega \in \Omega_\varepsilon$, see (2.42). Assume $v$ is a nonconstant real–analytic function on $\mathbb{T}^2$. Then for all $\sigma < \frac{1}{24}$ there exist $\tau = \tau(\sigma) > 0$ and constants $\lambda_1$ and $n_0$ depending only on $\varepsilon$, $v$ and $\sigma$ such that

$$ \sup_E \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{n} \log \|M_n(x, y; \lambda, E)\| - L_n(\lambda, E)\Bigr| > S(\lambda, E)\, n^{-\tau}\Bigr] \le C \exp(-n^{\sigma}) \tag{2.95} $$

for all $\lambda \ge \lambda_1$ and $n \ge n_0$. Furthermore, for those $\omega$, $\lambda$ and all $E$,

$$ L(\lambda, E) = \inf_n L_n(\lambda, E) \ge \frac{1}{4} \log \lambda. $$

Proof. Fix $\sigma < \frac{1}{24}$ throughout the proof and let $\tau = \tau(\sigma) > 0$ be as in (2.72). Moreover, let $\lambda \ge \lambda_0 \vee n_0^B =: \lambda_1$ be as in Lemma 2.10. In this proof we shall require $n_0$ to be sufficiently large at various places, but of course $n_0$ will be assumed fixed. In view of Lemma 2.10 the hypotheses of Lemma 2.8 are satisfied with $\gamma = \gamma_0 = \frac{1}{2}$,

$$ n_0^2 \le N \le n_0^5, \tag{2.96} $$

provided

$$ 9 n_0 \ge 20 \log(2 n_0^{10}), \tag{2.97} $$

cf. (2.70) (recall that $S(\lambda, E) \ge 1$). It is clear that (2.97) holds if $n_0$ is large. Applying Lemma 2.8 one obtains (suppressing $\lambda, E$ for simplicity)

$$ L_N \ge \Bigl(\frac{1}{2} - \frac{1}{40}\Bigr) S - C_0 S N^{-1} n_0 \ge \gamma_1 S \quad\text{and}\quad L_N - L_{2N} \le C_0 S N^{-1} n_0 \le \frac{\gamma_1}{40} S \tag{2.98} $$
with $\gamma_1 = \frac{1}{3}$. Moreover, with some constant $C_1 \ge 1$ depending on $\varepsilon$,

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{N} \log \|M_N(x, y; \lambda, E)\| - L_N(\lambda, E)\Bigr| > S(\lambda, E) N^{-\tau}\Bigr] \le C_1 \exp(-N^{\sigma}) \tag{2.99} $$

for all $N$ in the range given by (2.96). In particular, (2.99) implies that

$$ \mathrm{mes}\Bigl[(x, y) \in \mathbb{T}^2 : \Bigl|\frac{1}{N} \log \|M_N(x, y; \lambda, E)\| - L_N(\lambda, E)\Bigr| > \frac{\gamma_1}{10} S(\lambda, E)\Bigr] \le C_1 \exp(-N^{\sigma}) \le \bar N^{-10}, $$

provided $n_0$ is large and

$$ N^2 \le \bar N \le C_1^{-\frac{1}{10}} \exp\Bigl(\frac{1}{10} N^{\sigma}\Bigr). $$

The first inequality was added to satisfy (2.70). In view of (2.96), one thus has the range

$$ n_0^4 \le \bar N \le \exp\Bigl(\frac{1}{10} n_0^{5\sigma}\Bigr) \tag{2.100} $$

of admissible $\bar N$. Moreover,

$$ L_{\bar N} \ge \gamma_1 S - 2 C_0 S N^{-1} n_0 - C_0 S \bar N^{-1} N \quad\text{and}\quad L_{\bar N} - L_{2\bar N} \le C_0 S \bar N^{-1} N. \tag{2.101} $$

At the next stage of this procedure, observe that the left end-point of the range of admissible indices starts at $n_0^8$, which is less than the right end-point of the range (2.100) (for $n_0$ large). Therefore, from this point on the ranges will overlap and cover all large integers. To ensure that the process does not terminate, simply note the rapid convergence of the series given by (2.101). □

Remark 2.12. Herman's method [8] for proving positivity of the Lyapunov exponent for potentials given by trigonometric polynomials also applies to the skew-shift. However, it is well-known that his bound only involves the coefficient of the highest frequency of the trigonometric polynomial. In particular, it does not generalize to analytic functions covered by Prop. 2.11. On the other hand, for the important example $v(x, y) = \cos(2\pi x)$, it gives the superior lower bound

$$ \inf_E L(\lambda, E) \ge \log(\lambda/2). $$
Finally, in [2] the first author has recently shown that for this choice of $v$ and all sufficiently small $\lambda > 0$ there is $\omega_0(\lambda) > 0$ and a subset $\mathcal{E}_\lambda \subset [-2, 2]$ with the property that $\mathrm{mes}([-2, 2] \setminus \mathcal{E}_\lambda) \to 0$ as $\lambda \to 0$ and such that

$$ \inf_{E \in \mathcal{E}_\lambda} L(\omega, E) > 0 \quad\text{provided } 0 < \omega < \omega_0. $$

Here $L(\omega, E)$ denotes the Lyapunov exponent for the skew-shift $T_\omega(x, y) = (x + y, y + \omega)$. Observe that this behavior is the exact opposite of the one displayed by the well-known almost Mathieu equation as $\lambda \to 0$. The approach in [2] is based on Kotani's theorem [10, 14], Aubry-duality, and a perturbative argument for the almost Mathieu equation.
2.6. Regularity of the integrated density of states.

Let $E_{C,j}(\lambda, x, y)$, $j = 1, \ldots, b - a + 1 = |C|$ be the eigenvalues of the restriction of (2.60) to the interval $C = [a, b]$ with zero boundary conditions, $\psi(a - 1) = \psi(b + 1) = 0$. Consider

$$ N_C(\lambda, E, x, y) = \frac{1}{|C|} \sum_j \chi_{(-\infty, E)}(E_{C,j}). $$
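The finite-volume counting function $N_C$ can be evaluated without computing the eigenvalues themselves. The sketch below uses a Sturm-sequence count (the number of negative pivots in the elimination of $H_C - E$), which is a standard technique for symmetric tridiagonal matrices and is not taken from the paper. The potential $\cos(2\pi x)$, the closed form (2.43) for the orbit, and all names and parameter values are assumptions made here for illustration.

```python
import math


def restricted_diagonal(x, y, omega, lam, a, b):
    """Diagonal entries lam * cos(2 pi x_n), n = a..b, with x_n the first coordinate
    of T^n(x, y) as given by the closed form (2.43), valid for all integers n."""
    return [lam * math.cos(2.0 * math.pi * (x + n * y + n * (n - 1) * omega / 2.0))
            for n in range(a, b + 1)]


def count_below(diag, E):
    """Number of eigenvalues strictly below E of the symmetric tridiagonal matrix
    with the given diagonal and -1 on both off-diagonals (Sturm sequence count)."""
    count = 0
    d = None
    for a_j in diag:
        d = (a_j - E) - (0.0 if d is None else 1.0 / d)
        if d == 0.0:
            d = -1e-300  # nudge an exactly vanishing pivot to keep the recurrence defined
        if d < 0.0:
            count += 1
    return count


def counting_function(diag, E):
    """N_C(E): fraction of eigenvalues of the restricted operator below E."""
    return count_below(diag, E) / len(diag)


OMEGA = (math.sqrt(5.0) - 1.0) / 2.0  # hypothetical sample frequency
diag = restricted_diagonal(0.2, 0.7, OMEGA, 3.0, -20, 20)
for E in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(E, counting_function(diag, E))
```

Since all eigenvalues lie in $[-\lambda - 2, \lambda + 2]$ for this potential, the printed values increase from near 0 to near 1 across that interval.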
It is well-known that the weak limit (in the sense of measures)

$$ \lim_{a \to -\infty,\, b \to +\infty} dN_C(\lambda, \cdot, x, y) = dN(\lambda, \cdot) $$

exists and does not depend on $(x, y) \in \mathbb{T}^2$ (up to a set of measure zero). The distribution function $N(\lambda, \cdot)$ is called the integrated density of states. It is connected with the Lyapunov exponent via the Thouless formula

$$ L(\lambda, E) = \int \log |E - E'| \, dN(\lambda, E'). \tag{2.102} $$

In this subsection we show that for large $\lambda$ both $L$ and $N$ have a modulus of continuity which is at least as good as

$$ h(t) = \exp\bigl(-c |\log t|^{\frac{1}{24}-}\bigr). \tag{2.103} $$

This improves on various well-known continuity properties of $L$ and $N$ that hold for very general classes of transformations $T$. So far nothing better was known for the skew-shift than log-continuity, which corresponds to replacing the power of $\log t$ in (2.103) with $\log\log t$, see Figotin, Pastur [5] and the references therein. For the proof of (2.103) we follow the approach from [4], which only requires a large deviation estimate and the avalanche principle. The latter does not depend on the transformation, and the former is given by Prop. 2.11. In particular, our assumption of large disorder is made necessary by that proposition. Since it is rather straightforward to apply the technique from [4] here, we shall be somewhat brief.

Proposition 2.13. Let $\omega$, $v$, and $\lambda_1$ be as in Prop. 2.11. For $\lambda > \lambda_1$ both $N(\lambda, E)$ and $L(\lambda, E)$ are continuous in $E$ with modulus of continuity given by (2.103).

Proof. We shall prove this for $L$. It is standard to deduce the statement about $N$ from that on $L$ by means of (2.102), see [7, Sect. 10]. For the sake of simplicity we shall suppress $\lambda$ in the notation. Fix any positive $\sigma < \frac{1}{24}$. Let $N$ be a large integer and set $n = \lfloor C_0 (\log N)^{\frac{1}{\sigma}} \rfloor$ with some large constant $C_0$. One deduces from the avalanche principle and (2.95) that

$$ |L_N(E) - 2L_{2n}(E) + L_n(E)| \le \frac{Cn}{N}, \qquad |L_{2N}(E) - 2L_{2n}(E) + L_n(E)| \le \frac{Cn}{N}. \tag{2.104} $$

The point is that (2.95) insures that the hypotheses (2.8) and (2.9) in Prop. 2.1 are satisfied up to a set of measure less than $CN \exp(-n^{\sigma})$. This measure can therefore
be made less than $N^{-1}$ by taking $C_0$ large enough. Taking the difference of the two inequalities in (2.104) yields

$$ |L_N(E) - L_{2N}(E)| \le \frac{Cn}{N}, $$

which after summing over dyadic $N$ gives

$$ |L_N(E) - L(E)| \le \frac{Cn}{N}. \tag{2.105} $$

Inserting (2.105) into (2.104) leads to

$$ |L(E) - 2L_{2n}(E) + L_n(E)| \le \frac{Cn}{N}. \tag{2.106} $$

It is clear that the derivatives of $L_{2n}(E)$ and $L_n(E)$ in $E$ are at most of size $e^{Cn}$. In view of this fact (2.106) implies that for any nearby $E, E'$,

$$ |L(E) - L(E')| \le \frac{Cn}{N} + e^{Cn} |E - E'| \le C \exp\bigl(-c\, \bigl|\log |E - E'|\bigr|^{\sigma}\bigr), \tag{2.107} $$

if one sets $|E - E'| = \exp(-2Cn)$.

3. Localization

The purpose of this section is to show that the operator (2.60) has pure point spectrum with exponentially decaying eigenfunctions for most $\omega, x, y \in \mathbb{T}$ (i.e., up to a set of small measure) provided $\lambda$ is sufficiently large, see Theorem 3.7 below. We will follow the scheme from [3]. The basic idea behind the proof is to start with a generalized eigenfunction with energy $E$, whose existence is guaranteed by the Shnol–Simon theorem, and then to show that it in fact decays exponentially. It is well-known that for this to hold one needs the Green's functions $G_I(x, y; E)$ on most intervals $I \subset \mathbb{Z}$ with

$$ \mathrm{dist}(I, 0) \sim |I| \tag{3.1} $$
to possess exponential off–diagonal decay. This in turn is the case provided the monodromy matrices corresponding to those intervals $I$ have norms which are on the order of $e^{L(E)|I|}$, $L(E)$ being the Lyapunov exponent. By the large deviation estimate (2.95), the bad set of $(x, y) \in \mathbb{T}^2$ where any given one of these monodromy matrices has too small a norm is exponentially small in $|I|$. The difficulty that arises here is of course that the sets of bad parameters depend on $E$. In principle, one would therefore need to remove the union over $E$ of all these bad sets, which might amount to the entire parameter set. The approach in [3] is to consider the set of parameters where there is some energy $E$ with the property that, on the one hand, for some interval $J \subset \mathbb{Z}$ centered at 0 the Green's function $G_J(x, y; E)$ has very large norm and, on the other hand, the Green's function $G_I(x, y; E)$ fails to have the necessary off–diagonal decay. Here $I$ is an arbitrary interval as in (3.1), whose length and position are related to the length of $J$, see the proof of Theorem 3.7 below for details. Using the large deviation theorem it is possible to show that this set of parameters has small measure, see Lemma 3.6 below. It was observed in [3] that estimating the measure of the set of parameters that produce these “double resonances” can be accomplished provided one has some control on its complexity. This
J. Bourgain, M. Goldstein, W. Schlag
can be made precise in terms of semi-algebraic sets, which we also use here. The main technical statement in this context is Lemma 3.3 below. That lemma is in turn based on a general fact about the number of lattice points that can fall into a semi-algebraic set of not too large degree and small measure, see Lemma 3.2 for the exact statement. However, the proof of Lemma 3.3 also heavily exploits the structure of the skew-shift. It remains to be seen to what extent this method applies to other transformations. The arguments in this section do not directly invoke the lemmas from the previous section. We do, of course, use Proposition 2.11 in an essential way.
3.1. An estimate on the number of lattice points falling into a small set of bounded complexity. We begin by introducing some notation that will be used repeatedly in this section.

Definition 3.1. For any $a, b > 0$ let $a \lesssim b$ denote $Ca \le b$ for some absolute constant $C$. The case where $C$ is very large will be written as $a \ll b$. Finally, $a \sim b$ means that both $a \lesssim b$ and $a \gtrsim b$.

The following lemma will be important in the process of elimination of the energy. It is basically contained in Sect. 13 of [3].

Lemma 3.2. Let $S \subset [0, 1] \times [0, 1]$ be an open set with the following three properties:

$$\operatorname{mes}(S) < e^{-B^{\sigma}} \text{ for some } \sigma > 0, \tag{3.2}$$

$$\partial S \text{ is contained in the union of at most } B \text{ algebraic curves } G = [P = 0] \text{ of degree } \deg P < B, \tag{3.3}$$

$$\text{for any line } L,\ S \cap L \text{ has at most } B \text{ connected components.} \tag{3.4}$$

Suppose $M$ and $B$ are related by the inequalities

$$\log\log M \ll \log B \ll \log M. \tag{3.5}$$

Then

$$\#\Bigl\{(m_1, m_2) \in \mathbb{Z}^2 \;\Big|\; |m_i| < M \text{ and } \Bigl(\frac{m_1}{M}, \frac{m_2}{M}\Bigr) \in S\Bigr\} < B^C M. \tag{3.6}$$

Furthermore, assume that

$$\#\Bigl\{(m_1, m_2) \in \mathbb{Z}^2 \;\Big|\; |m_i| < M \text{ and } \Bigl(\frac{m_1}{M}, \frac{m_2}{M}\Bigr) \in S\Bigr\} > M^{1 - 10^{-7}}. \tag{3.7}$$

Then $S$ contains a line segment $L$ of length $|L| > M^{-1 + 10^{-2}}$ which is parallel to some integer vector with coordinates bounded by $M^{10^{-6}}$ and which contains a point of the form $\bigl(\frac{m_1}{M}, \frac{m_2}{M}\bigr)$.

Proof. See [3]. □
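To make the counted quantity in (3.6) concrete, here is a minimal sketch that counts the lattice points $(m_1/M, m_2/M)$ falling into a thin open set. The set chosen here, a $\delta$-neighbourhood of the parabola $v = u^2$ inside the open unit square, and the helper names `count_lattice_points` and `thin_parabola` are illustrative assumptions, not objects from the paper.

```python
def count_lattice_points(M, in_set):
    """Count pairs (m1, m2) with |m_i| < M such that in_set(m1, m2, M) holds."""
    return sum(
        1
        for m1 in range(-M + 1, M)
        for m2 in range(-M + 1, M)
        if in_set(m1, m2, M)
    )


def thin_parabola(delta_num, delta_den):
    """Membership test for the hypothetical set
    S = {(u, v) in (0,1)^2 : |v - u^2| < delta}, delta = delta_num / delta_den,
    evaluated at (u, v) = (m1/M, m2/M) using exact integer arithmetic."""
    def in_set(m1, m2, M):
        if not (0 < m1 < M and 0 < m2 < M):
            return False
        # |m2/M - m1^2/M^2| < delta  <=>  delta_den * |m2*M - m1^2| < delta_num * M^2
        return delta_den * abs(m2 * M - m1 * m1) < delta_num * M * M
    return in_set


M = 100
count = count_lattice_points(M, thin_parabola(1, 100000))
print(count, "lattice points in S; the trivial bound is", (2 * M - 1) ** 2)
```

For a set this thin the count stays far below the trivial bound of order $M^2$, which is the regime that (3.6) quantifies.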
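3.2. On the number of times a generic orbit of the skew-shift visits a small set of bounded complexity.

Lemma 3.3. Denote by $T_\omega : \mathbb{T}^2 \to \mathbb{T}^2$ the $\omega$-skew-shift on $\mathbb{T}^2$. Let $S \subset \mathbb{T}^4 \times \mathbb{R}$ be a semi-algebraic set of degree at most $B$ such that

$$\operatorname{mes}(\operatorname{Proj}_{\mathbb{T}^4} S) < e^{-B^{\sigma}} \text{ for some } \sigma > 0. \tag{3.8}$$

Under the assumption (3.5) on $M$ and $B$,

$$\operatorname{mes}\bigl\{(y_0, \omega) \in \mathbb{T}^2 \;\big|\; \bigl(y_0, \omega, T_\omega^j(0, y_0)\bigr) \in \operatorname{Proj}_{\mathbb{T}^4} S \text{ for some } j \sim M\bigr\} < M^{-10^{-8}}. \tag{3.9}$$

Proof. Let $\omega \in (0, 1)$ be fixed and choose some $y_0 \in [0, 1)$. Then there are $(x, y) \in [0, 1)^2$ such that mod $\mathbb{Z}^2$ (with $\equiv$ denoting congruence mod $\mathbb{Z}^2$)

$$\begin{aligned}
(x, y) \equiv T_\omega^j(0, y_0) &\equiv \Bigl(j y_0 + \frac{j(j-1)}{2}\omega,\ y_0 + j\omega\Bigr) \\
&\equiv \Bigl(j y_0 + \frac{j-1}{2}(y - y_0 + \nu'),\ y_0 + j\omega\Bigr) \\
&\equiv \Bigl(\frac{j-1}{2} y + \frac{j+1}{2} y_0 + \frac{\nu}{2},\ y_0 + j\omega\Bigr),
\end{aligned} \tag{3.10}$$

where $\nu, \nu' \in \{0, 1\}$. Assume $j \sim M$. Rewriting the congruences (3.10) as equalities in $\mathbb{R}$ yields

$$\begin{cases}
x = \frac{\nu}{2} + \frac{j-1}{2} y + \frac{j+1}{2} y_0 + m_1 \\
y = y_0 + j\omega + m_2
\end{cases} \tag{3.11}$$

with $|m_i| \lesssim M$. Solving (3.11) for $y_0, \omega$ one obtains

$$\begin{cases}
y_0 = \frac{2}{j+1}\Bigl(-\frac{\nu}{2} + x - \frac{j-1}{2} y - m_1\Bigr) = \frac{2x - \nu}{j+1} - \frac{j-1}{j+1} y - \frac{2}{j+1} m_1 \\
\omega = \frac{1}{j}(y - y_0 - m_2) = \frac{\nu - 2x}{j(j+1)} + \frac{2}{j+1} y + \frac{2}{j(j+1)} m_1 - \frac{m_2}{j}.
\end{cases} \tag{3.12}$$

Denoting $\pi(S) = \operatorname{Proj}_{\mathbb{T}^4}(S)$ we shall estimate

$$\int_{\mathbb{T}^2} \sum_{j \sim M} \chi_{\pi(S)}\bigl(y_0, \omega, T_\omega^j(0, y_0)\bigr)\, dy_0\, d\omega. \tag{3.13}$$

Using the change of variables given by (3.12) one obtains that the integral (3.13) is no larger than

$$\begin{aligned}
&\sum_{j \sim M} \sum_{|m_i| \lesssim M} \int_{\mathbb{T}^2} \left|\det \begin{pmatrix} \frac{2}{j+1} & -\frac{j-1}{j+1} \\ -\frac{2}{j(j+1)} & \frac{2}{j+1} \end{pmatrix}\right| \chi_{\pi(S)}\bigl(y_0(x, y), \omega(x, y), x, y\bigr)\, dx\, dy \\
&\quad \sim M^{-2} \int_{\mathbb{T}^2} \sum_{j \sim M} \sum_{|m_i| \lesssim M} \chi_{\pi(S)_{x,y}}\Bigl(\frac{2x - \nu}{j+1} - \frac{j-1}{j+1} y - \frac{2}{j+1} m_1,\ \frac{\nu - 2x}{j(j+1)} + \frac{2}{j+1} y + \frac{2}{j(j+1)} m_1 - \frac{m_2}{j}\Bigr)\, dx\, dy.
\end{aligned} \tag{3.14}$$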
Here $\pi(S)_{x,y}$ denotes the slice of $\pi(S)$ for fixed $(x, y)$. Restrict $(x, y) \in \mathbb{T}^2$ to the set where

$$\operatorname{mes}(\pi(S)_{x,y}) < e^{-\frac{1}{2} B^{\sigma}}. \tag{3.15}$$

By (3.8), the complementary set contributes to the integral (3.14) an amount not exceeding

$$e^{-\frac{1}{2} B^{\sigma}} M < e^{-\frac{1}{3} B^{\sigma}}. \tag{3.16}$$
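The closed form (3.10) for the orbit of the skew-shift and the inversion formulas (3.12) can be checked numerically. The sketch below is illustrative only: the helper names (`iterate`, `closed_form`, `integers_in_3_11`, `invert`) and the sample values of $j$, $y_0$, $\omega$ are assumptions made for this example, and the integers $\nu$, $m_1$, $m_2$ are recovered from the unreduced coordinates rather than by any procedure described in the paper.

```python
import math


def skew_shift(x, y, omega):
    """One step of T_omega(x, y) = (x + y, y + omega) mod 1."""
    return (x + y) % 1.0, (y + omega) % 1.0


def iterate(y0, omega, j):
    """T_omega^j(0, y0), computed by iterating the map j times."""
    x, y = 0.0, y0
    for _ in range(j):
        x, y = skew_shift(x, y, omega)
    return x, y


def closed_form(y0, omega, j):
    """First line of (3.10): (j*y0 + j(j-1)/2 * omega, y0 + j*omega) mod 1."""
    return (j * y0 + j * (j - 1) / 2 * omega) % 1.0, (y0 + j * omega) % 1.0


def torus_dist(a, b):
    """Distance between a and b on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def integers_in_3_11(y0, omega, j):
    """Return (x, y, nu, m1, m2) so that the equalities (3.11) hold in R."""
    X = j * y0 + j * (j - 1) / 2 * omega      # unreduced first coordinate
    Y = y0 + j * omega                        # unreduced second coordinate
    h = -2 * math.floor(X) + (j - 1) * math.floor(Y)   # equals nu + 2*m1
    nu = h % 2
    m1 = (h - nu) // 2
    m2 = -math.floor(Y)
    return X % 1.0, Y % 1.0, nu, m1, m2


def invert(x, y, j, nu, m1, m2):
    """Formulas (3.12): recover (y0, omega) from (x, y) and the integers."""
    y0 = (2 * x - nu) / (j + 1) - (j - 1) / (j + 1) * y - 2 * m1 / (j + 1)
    omega = ((nu - 2 * x) / (j * (j + 1)) + 2 * y / (j + 1)
             + 2 * m1 / (j * (j + 1)) - m2 / j)
    return y0, omega


j, y0, omega = 7, 0.3, (math.sqrt(5) - 1) / 2
x, y, nu, m1, m2 = integers_in_3_11(y0, omega, j)
print("recovered (y0, omega):", invert(x, y, j, nu, m1, m2))
```

The recovered pair agrees with the input up to rounding, which is the content of the change of variables used to pass from (3.13) to (3.14).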
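For fixed $x, y$, the set $S_{x,y} \subset \mathbb{T}^2 \times \mathbb{R}$ is still semi-algebraic of degree at most $B$. Therefore, condition (3.3) of Lemma 3.2 holds for $\pi(S_{x,y}) = \pi(S)_{x,y}$, with $B^C$ instead of $B$. Moreover, for any line $L$ in $[0, 1]^2$

$$\pi(S)_{x,y} \cap L = \pi\bigl(S_{x,y} \cap (L \times \mathbb{R})\bigr)$$

has at most $B^C$ connected components, each of which is an interval. Thus condition (3.4) holds with $B$ replaced by $B^C$. Fix a point $(x, y) \in [0, 1]^2$ satisfying (3.15) and assume

$$\sum_{j \sim M;\ |m_i| \lesssim M} \chi_{\pi(S)_{x,y}}\Bigl(\frac{2x - \nu}{j+1} - \frac{j-1}{j+1} y - \frac{2}{j+1} m_1,\ \frac{\nu - 2x}{j(j+1)} + \frac{2}{j+1} y + \frac{2}{j(j+1)} m_1 - \frac{m_2}{j}\Bigr) > \kappa M^2, \tag{3.17}$$

where $\kappa = M^{-10^{-7}}$. Fix $j \sim M$ and consider the affine transformation of $\mathbb{R}^2$

$$A(z_1, z_2) := \Bigl(\frac{2x - \nu}{j+1} - \frac{j-1}{j+1} y - \frac{2j}{j+1} z_1,\ \frac{\nu - 2x}{j(j+1)} + \frac{2}{j+1} y + \frac{2}{j+1} z_1 - z_2\Bigr) \tag{3.18}$$

for which

$$|\det(DA)| = \left|\det \begin{pmatrix} -\frac{2j}{j+1} & 0 \\ \frac{2}{j+1} & -1 \end{pmatrix}\right| \sim 1.$$

Thus the set $A^{-1}\pi(S)_{x,y}$ still satisfies conditions (3.2)–(3.4) of Lemma 3.2. Therefore, in view of (3.6),

$$\sum_{|m_1| \lesssim M,\ |m_2| \lesssim M} \chi_{\pi(S)_{x,y}}\bigl(A(m_1/j, m_2/j)\bigr) < B^C M.$$

In conjunction with (3.17) this implies that there exists a subset $J \subset \{j \sim M\}$ such that

$$\#J > B^{-C} \kappa M \tag{3.19}$$

and

$$\sum_{|m_i| \lesssim M} \chi_{\pi(S)_{x,y}}\bigl(A(m_1/j, m_2/j)\bigr) > \frac{\kappa}{2} M = \frac{1}{2} M^{1 - 10^{-7}} \tag{3.20}$$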
for any choice of $j \in J$. In view of (3.20), condition (3.7) of Lemma 3.2 holds for the set $A^{-1}\pi(S)_{x,y}$. Hence, for any $j \in J$ there exists a vector $v \in \mathbb{Z}^2 \setminus \{0\}$ such that

$$|v_1| + |v_2| < M^{10^{-6}} \tag{3.21}$$

and a lattice point $m \in \mathbb{Z}^2$, $|m| \lesssim M$ such that

$$P + tv := m/j + tv \in A^{-1}\pi(S)_{x,y} \text{ for all } 0 < t < M^{-1 + \frac{1}{200}}. \tag{3.22}$$

Applying the affine transformation $A$ given by (3.18) yields

$$\Bigl(\frac{2x - \nu}{j+1} - \frac{j-1}{j+1} y - \frac{2j}{j+1}(t v_1 + P_1),\ \frac{\nu - 2x}{j(j+1)} + \frac{2}{j+1} y + \frac{2}{j+1}(t v_1 + P_1) - (t v_2 + P_2)\Bigr) \in \pi(S)_{x,y} \tag{3.23}$$

for all $t$ as in (3.22). Here $v = (v_1, v_2)$ and $P = (P_1, P_2) = m/j$ depend on $j$. Because of (3.19) and (3.21), there is a subset $J' \subset J$, so that

$$\#J' > M^{-2 \cdot 10^{-6}}\, \#J > M^{-3 \cdot 10^{-6}} M \tag{3.24}$$

and for which all choices of $j \in J'$ have the same vector $v$. We first consider the case where $v$ lies on the line

$$v_2 = 2 v_1. \tag{3.25}$$

Denoting by $L^{(j)}$ the line segment given by (3.23), assume that for some choice of $j \ne j'$ in $J'$,

$$\operatorname{dist}(L^{(j)}, L^{(j')}) < \tau.$$

Thus there exist $t, t'$ as in (3.22) so that

$$\Bigl|\frac{2x - \nu}{j+1} - \frac{j-1}{j+1} y - \frac{2}{j+1} m_1 - \frac{2j}{j+1} t v_1 - \Bigl(\frac{2x - \nu}{j'+1} - \frac{j'-1}{j'+1} y - \frac{2}{j'+1} m_1' - \frac{2j'}{j'+1} t' v_1\Bigr)\Bigr| < \tau \tag{3.26}$$

and

$$\Bigl|\frac{\nu - 2x}{j(j+1)} + \frac{2}{j+1} y + \frac{2 m_1}{j(j+1)} - \frac{m_2}{j} + t\Bigl(\frac{2 v_1}{j+1} - v_2\Bigr) - \Bigl[\frac{\nu - 2x}{j'(j'+1)} + \frac{2}{j'+1} y + \frac{2 m_1'}{j'(j'+1)} - \frac{m_2'}{j'} + t'\Bigl(\frac{2 v_1}{j'+1} - v_2\Bigr)\Bigr]\Bigr| < \tau. \tag{3.27}$$

Since by (3.25)

$$-\frac{2 j v_1}{j+1} = \frac{2 v_1}{j+1} - v_2, \tag{3.28}$$
subtracting (3.26) from (3.27) and multiplying the resulting expression by $j(j+1)j'(j'+1)$ yields

$$\bigl\|2jj'(j'+1)x - (j-1)jj'(j'+1)y - 2jj'(j+1)x + jj'(j+1)(j'-1)y + 2j'(j'+1)x - 2jj'(j'+1)y - 2j(j+1)x + 2j(j+1)j'y\bigr\| \lesssim \tau M^4,$$

which is the same as

$$\bigl\|2(j - j')(1 + j)(1 + j')x\bigr\| \lesssim M^4 \tau. \tag{3.29}$$
Here $\|\cdot\|$ denotes the distance to the nearest integer. The points $x \in \mathbb{T}$ for which (3.29) holds for an arbitrary choice of distinct $1 \le j, j' \lesssim M$ form a set of measure $\lesssim M^6 \tau$. Taking $\tau = M^{-100}$, one concludes that the contribution of those points $x$ to the integral (3.14) is at most

$$M^{-90}. \tag{3.30}$$
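The reduction to (3.29) rests on two elementary identities, worked out here as a check. Writing $j'$ for the second index, the integer terms ($\nu$, $m_1$, $m_2$ and their primed counterparts) become integers after multiplication by $j(j+1)j'(j'+1)$ and so drop out of the distance to the nearest integer, while the coefficients of $x$ and $y$ simplify as follows. The sign of $(j - j')$ is immaterial because $\|\cdot\|$ is even.

```latex
\begin{aligned}
\text{coefficient of } x:\quad
  & 2jj'(j'+1) - 2jj'(j+1) + 2j'(j'+1) - 2j(j+1) \\
  &= 2jj'(j'-j) + 2(j'-j)(j'+j+1)
   = 2(j'-j)(1+j)(1+j'), \\[4pt]
\text{coefficient of } y:\quad
  & jj'\bigl[-(j-1)(j'+1) + (j+1)(j'-1) - 2(j'+1) + 2(j+1)\bigr] \\
  &= jj'\bigl[2(j'-j) - 2(j'-j)\bigr] = 0.
\end{aligned}
```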
Excluding those points, one can therefore assume that for any choice of $j \ne j'$ in $J'$,

$$\operatorname{dist}(L^{(j)}, L^{(j')}) > \tau, \tag{3.31}$$

where the line segments $L^{(j)} \subset \pi(S)_{x,y}$. We will show that this leads to a contradiction. For an arbitrary set $\Omega \subset \mathbb{R}^2$ denote by $N(\Omega, \tau)$ the number of $\tau$-balls needed to cover the set $\Omega$. $N$ is also referred to as “entropy”. In view of (3.31), (3.24), and the property that $|L^{(j)}| > M^{-1 + \frac{1}{100}}$,

$$N\Bigl(\pi(S)_{x,y}, \frac{\tau}{10}\Bigr) > N\Bigl(\bigcup_{j \in J'} L^{(j)}, \frac{\tau}{10}\Bigr) \gtrsim \#J'\, \tau^{-1} M^{-(1 - \frac{1}{100})} \gtrsim M^{1 - 3 \cdot 10^{-6}}\, \tau^{-1} M^{\frac{1}{100} - 1} \gtrsim M^{\frac{1}{200}} \tau^{-1}. \tag{3.32}$$

On the other hand, $\pi(S)_{x,y}$ lies within a $e^{-\frac{1}{4} B^{\sigma}}$-neighborhood of at most $B^C$ many algebraic curves $G$ of degree not exceeding $B^C$. By our assumption (3.5), $\tau \gg e^{-\frac{1}{4} B^{\sigma}}$. Therefore,

$$N(G, \tau) \lesssim \tau^{-1} \ell(G) < B^C \tau^{-1}, \qquad N(\pi(S)_{x,y}, \tau) \lesssim B^C \tau^{-1}. \tag{3.33}$$

Because of $\log M \gg \log B$ this contradicts (3.32). It remains to consider the case where the vector $v \in \mathbb{Z}^2 \setminus \{0\}$ satisfies (3.21) but $v_2 \ne 2 v_1$. It follows from (3.23) that the segment $L^{(j)}$ is oriented in the direction

$$\frac{\frac{2}{j+1} - \frac{v_2}{v_1}}{-\frac{2j}{j+1}} = \frac{s(j+1) - 1}{j} \quad \text{where } s := \frac{v_2}{2 v_1} \ne 1, \text{ in fact, } |s - 1| \ge \frac{1}{2|v_1|} \ge \frac{1}{M}.$$
Fig. 2. The bush $L^{(j)}$ (labels in the figure: $L^{(j)}$, $L_0$, $P_0$)
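Thus for any choice of $j \ne j'$,

$$\Bigl|\frac{s(j+1) - 1}{j} - \frac{s(j'+1) - 1}{j'}\Bigr| \ge \frac{|1 - s|}{M^2} \gtrsim M^{-3}. \tag{3.34}$$

One now again considers the system of lines $\{L^{(j)} \mid j \in J'\}$. Let $L_\tau^{(j)}$ denote a $\tau$-neighborhood of $L^{(j)}$. Then, on the one hand,

$$\int_{\mathbb{T}^2} \sum_{j \in J'} \chi_{L_\tau^{(j)}}\, dx\, dy \gtrsim \#J'\, M^{-1 + \frac{1}{100}} \tau \gtrsim M^{\frac{1}{200}} \tau. \tag{3.35}$$

On the other hand, since each $L_\tau^{(j)}$ is contained in a $\tau$-neighborhood of $\pi(S)_{x,y}$, (3.33) implies that

$$\int_{\mathbb{T}^2} \sum_{j \in J'} \chi_{L_\tau^{(j)}}\, dx\, dy \lesssim N(\pi(S)_{x,y}, \tau)\, \tau^2\, \Bigl\|\sum_{j \in J'} \chi_{L_\tau^{(j)}}\Bigr\|_\infty \lesssim \tau B^C \Bigl\|\sum_{j \in J'} \chi_{L_\tau^{(j)}}\Bigr\|_\infty. \tag{3.36}$$

One concludes from (3.35) and (3.36) that

$$\Bigl\|\sum_{j \in J'} \chi_{L_\tau^{(j)}}\Bigr\|_\infty \gtrsim M^{\frac{1}{200}} B^{-C} \gtrsim M^{\frac{1}{300}}. \tag{3.37}$$

Hence there is a subsystem $\{L^{(j)} \mid j \in J''\}$ of cardinality $\#J'' \gtrsim M^{\frac{1}{300}}$ such that the tubes $\{L_\tau^{(j)} \mid j \in J''\}$ have a common point $P_0$. It follows from (3.34) that

$$\angle(L^{(j)}, L^{(j')}) \gtrsim M^{-3} \text{ for any choice of } j \ne j'. \tag{3.38}$$

Choose a line $L_0$ that crosses the majority of lines in the bush $\{L^{(j)} \mid j \in J''\}$ transversely. Recalling that $\pi(S)_{x,y} \cap L_0$ has at most $B^C \ll M^{\frac{1}{300}}$ many components, one obtains two distinct $j, j' \in J''$ for which the points

$$L_0 \cap L^{(j)},\quad L_0 \cap L^{(j')} \in \pi(S)_{x,y}$$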
belong to the same component of $\pi(S)_{x,y} \cap L_0$. In view of (3.38) and (3.22) this implies that $\operatorname{mes}_{L_0}(\pi(S)_{x,y} \cap L_0) \gtrsim M^{-4}$. Since one can translate $L_0$ by an amount $\sim M^{-1}$, one finally obtains $\operatorname{mes}(\pi(S)_{x,y}) \gtrsim M^{-5}$, which again contradicts (3.15). We have reached the conclusion that our assumption (3.17) fails. Recalling estimates (3.16), (3.30) on the exceptional $(x, y)$-sets, this implies that

$$(3.13), (3.14) \lesssim e^{-\frac{1}{3} B^{\sigma}} + M^{-99} + \frac{1}{M^2} \kappa M^2 < 2 M^{-10^{-7}},$$

which proves (3.9). □
3.3. Averaging the monodromy matrix over long orbits. For the remainder of this paper we shall assume that there is a large deviation estimate as in Prop. 2.11, without specifying $\lambda$ in our notation. More precisely, we shall write the large deviation estimate in the form

$$\sup_E\, \operatorname{mes}\Bigl\{(x, y) \in \mathbb{T}^2 \;\Big|\; \Bigl|\frac{1}{n} \log\|M_n(x, y; E)\| - L_n(E)\Bigr| > n^{-\sigma}\Bigr\} \le C \exp(-n^{\sigma}). \tag{3.39}$$

By Prop. 2.11 this holds provided $\sigma > 0$ is sufficiently small and for all $n \ge n_1(\lambda, v, \varepsilon)$, where $\omega \in \Omega_\varepsilon$. Moreover, for the sake of simplicity $v$ will be assumed to be a trigonometric polynomial. The extension to real–analytic potentials is straightforward.

Lemma 3.4. Let $T_\omega$ be the $\omega$-skew-shift, $\omega$ satisfying

$$\|k\omega\| \ge c_\varepsilon |k|^{-1-\varepsilon} \text{ for all } k \in \mathbb{Z},\ 0 < |k| < N. \tag{3.40}$$

Then, denoting

$$u_{N_0}(x, y) := \frac{1}{N_0} \log\Bigl\|\prod_{j=N_0}^{1} \begin{pmatrix} v(T_\omega^j(x, y)) - E & -1 \\ 1 & 0 \end{pmatrix}\Bigr\|,$$

there exist constants $\sigma > 0$, $C > 1$ so that for $N > N_0^C$ one has the uniform bound

$$\Bigl\|\frac{1}{N} \sum_{j=1}^{N} u_{N_0} \circ T_\omega^j - \int_{\mathbb{T}^2} u_{N_0}(x, y)\, dx\, dy\Bigr\|_{L^\infty(\mathbb{T}^2)} < N_0^{-\sigma}. \tag{3.41}$$
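The kind of averaging in (3.41) can be observed numerically on a simpler observable. The sketch below is a hedged illustration, not the lemma itself: it replaces $u_{N_0}$ by a mean-zero trigonometric polynomial chosen for this example, takes the golden mean as a convenient Diophantine $\omega$, and uses a hypothetical helper `birkhoff_average`.

```python
import math


def birkhoff_average(f, x, y, omega, N):
    """(1/N) * sum_{j=1}^{N} f(T^j(x, y)) for T(x, y) = (x + y, y + omega) mod 1."""
    total = 0.0
    for _ in range(N):
        x, y = (x + y) % 1.0, (y + omega) % 1.0
        total += f(x, y)
    return total / N


omega = (math.sqrt(5) - 1) / 2       # golden mean, an assumed Diophantine frequency


def f(x, y):
    """Illustrative observable with integral zero over the torus."""
    return math.cos(2 * math.pi * x) + 0.5 * math.cos(2 * math.pi * y)


for N in (100, 1000, 10000):
    print(N, birkhoff_average(f, 0.1, 0.2, omega, N))
```

The printed averages shrink toward the space average $0$ as $N$ grows. The lemma asserts this uniformly in the starting point for the specific function $u_{N_0}$, with a quantitative rate.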
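Proof. By the large deviation theorem, the set

$$\Omega := \bigl\{(x, y) \in \mathbb{T}^2 \;\big|\; |u_{N_0}(x, y) - \langle u_{N_0} \rangle| > N_0^{-\sigma}\bigr\}$$

satisfies

$$\operatorname{mes}(\Omega) < e^{-N_0^{\sigma}}. \tag{3.42}$$

Since $v$ is a trigonometric polynomial, $\Omega$ is clearly semi-algebraic expressed by polynomials in $(x, y)$ of degree not exceeding $N_0^C$. Hence $\partial\Omega$ is contained in the union of no more than $N_0^C$ many algebraic curves $G$ of degree bounded by $N_0^C$. Therefore, one has the entropy bounds $N(G, \tau) \lesssim N_0^C \tau^{-1}$, and since, by (3.42)

$$\sup_{(x, y) \in \Omega} \operatorname{dist}((x, y), \partial\Omega) \lesssim e^{-\frac{1}{2} N_0^{\sigma}},$$

one also has

$$N(\Omega, \tau) \lesssim N_0^C \tau^{-1}, \tag{3.43}$$

provided $\tau > e^{-\frac{1}{3} N_0^{\sigma}}$. It clearly suffices to prove (3.41) for $N < e^{\frac{1}{10} N_0^{\sigma}}$. Consider the expression

$$\frac{1}{N^2} \sum_{j \ne j' = 1}^{N} \bigl\|T^j(x, y) - T^{j'}(x, y)\bigr\|^{-2} \sim \frac{1}{N^2} \sum_{j \ne j' = 1}^{N} \Bigl(\bigl\|(j - j')y + \bigl(j(j-1) - j'(j'-1)\bigr)\omega/2\bigr\| + \bigl\|(j - j')\omega\bigr\|\Bigr)^{-2}, \tag{3.44}$$

where $\|\cdot\|$ denotes both the natural distance on $\mathbb{T}^2$ and $\mathbb{T}$. Setting $k = j - j'$ and $\ell = j + j' - 1$, (3.44) can be rewritten in the form $\frac{1}{N^2}$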
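0 N1 and choose a collection of disks $\{D(P_s, \tau) \mid s = 1, \dots, r\}$ covering $\Omega$, where by (3.43)

$$r \lesssim N_0^C \tau^{-1}. \tag{3.48}$$

Since by (3.47)

$$N^{-2} \sum_{j \ne j' = 1}^{N} \bigl\|T^j(x, y) - T^{j'}(x, y)\bigr\|^{-2} \lesssim N^{\varepsilon}, \tag{3.49}$$

we obtain in particular that

$$\tau^{-2} N^{-2} \sum_{s=1}^{r} \#\bigl\{j \ne j' \;\big|\; T^j(x, y) \in D(P_s, \tau),\ T^{j'}(x, y) \in D(P_s, \tau)\bigr\} \lesssim N^{\varepsilon}. \tag{3.50}$$

Define for $s = 1, \dots, r$,

$$J_s = \{j = 1, \dots, N \mid T^j(x, y) \in D(P_s, \tau)\}$$

so that $J \subset \bigcup_s J_s$. Clearly, (3.48) implies that

$$\#J \lesssim N_0^C \tau^{-1} + \sum_{\#J_s > 1} \#J_s. \tag{3.51}$$

Furthermore, by (3.50),

$$\sum_{\#J_s > 1} (\#J_s)^2 \lesssim \tau^2 N^{2+\varepsilon}. \tag{3.52}$$

It follows from (3.51) and (3.52) that

$$\#J \lesssim N_0^C \tau^{-1} + \sqrt{r}\, \tau N^{1+\varepsilon} \lesssim N_0^C \tau^{-1} + N_0^C \tau^{1/2} N^{1+\varepsilon}.$$

Optimizing in $\tau$ yields

$$\#J \lesssim N_0^C N^{\frac{2}{3}+\varepsilon}. \tag{3.53}$$

Since $u_{N_0}$ is bounded, (3.53) implies that

$$\Bigl|\frac{1}{N} \sum_{j=1}^{N} u_{N_0}(T^j(x, y)) - \int_{\mathbb{T}^2} u_{N_0}\Bigr| \lesssim N_0^{-\sigma} + C N^{-1}\, \#J \lesssim N_0^{-\sigma} + N_0^C N^{-\frac{1}{3}+\varepsilon}.$$

Inequality (3.41) follows provided $N > N_0^{C_1}$ with some large $C_1$. □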
The somewhat technical assumption (3.40), which requires only finitely many conditions on $\omega$ in terms of $k$, was made in order to ensure that Lemma 3.3 can be applied. This will be important in the proof of localization, see Theorem 3.7 below. The previous lemma turns out to have several applications, one of which is the following uniform upper bound on the norm of the monodromy matrices.

Corollary 3.5. Assume $\omega$ satisfies the Diophantine condition (3.40). For any $N > N_0^C$, there is a uniform estimate for all $E \in \mathbb{R}$,

$$\sup_{(x, y) \in \mathbb{T}^2} \frac{1}{N} \log\|M_N(x, y; E)\| < L_{N_0}(E) + N_0^{-\sigma}. \tag{3.54}$$
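The quantity $\frac{1}{N}\log\|M_N(x, y; E)\|$ bounded in (3.54) is straightforward to compute along an orbit. The following is a minimal sketch under stated assumptions: the potential $v(x, y) = \cos(2\pi x)$, the Frobenius norm, the explicit coupling `lam` (the paper suppresses $\lambda$ in this part), and the function name `log_norm_monodromy` are all choices made for this illustration.

```python
import math


def log_norm_monodromy(x, y, omega, lam, E, N,
                       v=lambda x, y: math.cos(2 * math.pi * x)):
    """Return (1/N) * log ||M_N(x, y; E)|| in the Frobenius norm, where
    M_N = A_N ... A_1 and A_j = [[lam * v(T^j(x, y)) - E, -1], [1, 0]].
    The running product is renormalized at every step to avoid overflow."""
    a, b, c, d = 1.0, 0.0, 0.0, 1.0      # running product, starts at the identity
    log_norm = 0.0
    for _ in range(N):
        x, y = (x + y) % 1.0, (y + omega) % 1.0
        t = lam * v(x, y) - E
        # left-multiply the running product by [[t, -1], [1, 0]]
        a, b, c, d = t * a - c, t * b - d, a, b
        r = math.sqrt(a * a + b * b + c * c + d * d)
        a, b, c, d = a / r, b / r, c / r, d / r
        log_norm += math.log(r)          # accumulates log of the true norm
    return log_norm / N


omega = (math.sqrt(5) - 1) / 2
for N in (50, 500, 5000):
    print(N, log_norm_monodromy(0.1, 0.2, omega, lam=5.0, E=0.5, N=N))
```

Because each factor has determinant one, the result is nonnegative, and it never exceeds $\log(\lambda + |E| + 2)$ when $|v| \le 1$. Comparing the values at different starting points gives a feel for the uniformity expressed by the corollary.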
3.4. Double resonances occur with small probability. Fix $\varepsilon > 0$ small and let $\omega \in \Omega_\varepsilon$, see (2.42). Since we are assuming that the disorder $\lambda$ is large, Prop. 2.11 guarantees that

$$\inf_E L(E) > c_0 > 0. \tag{3.55}$$

The purpose of this subsection is to prove the following lemma, which asserts in effect that double resonances occur with small probability. An analogous statement for the shift can be found in [3]. The importance of double resonances is of course a standard fact in the theory of localization, cf. Sinai [15] and Fröhlich, Spencer, Wittwer [6]. In what follows, $H_{[-N_1, N_1]}(\omega, x, y)$ denotes the operator given by the left-hand side of (2.60) (with $T = T_\omega$) restricted to the interval $[-N_1, N_1]$ with Dirichlet boundary conditions. We shall also write $L_N(\omega, E)$ instead of $L_N(E)$ to indicate the dependence on $\omega$.

Lemma 3.6. Fix a small $\varepsilon > 0$. Let $N$ be an arbitrary positive integer and let $C_2 \ge 1$ be some constant. Define $S = S_N \subset \mathbb{T}^4 \times \mathbb{R}$ to be the set of those $(\omega, y_0, x, y, E)$ for which there exists some $N_1 < N^{C_2}$ so that

$$\|k\omega\| \ge \varepsilon |k|^{-1}(1 + \log k)^{-2} \text{ for all } 0 < k < N, \tag{3.56}$$

$$\bigl\|\bigl(H_{[-N_1, N_1]}(\omega, 0, y_0) - E\bigr)^{-1}\bigr\| > e^{C_3 N}, \tag{3.57}$$

$$\frac{1}{N} \log\|M_N(\omega, x, y, E)\| < L_N(\omega, E) - c_0/10. \tag{3.58}$$

Here $c_0$ is the constant from (3.55) and $C_3$ will be a sufficiently large constant depending on $v$. Then

$$\operatorname{mes}(\operatorname{Proj}_{\mathbb{T}^4} S) \lesssim \exp\Bigl(-\frac{1}{2} N^{\sigma}\Bigr). \tag{3.59}$$

Moreover, $S$ is contained in a set $S'$ satisfying the measure estimate (3.59) and which is semi-algebraic of degree at most $N^C$ for some constant $C$ depending on $v$, $\varepsilon$, $C_2$ and $C_3$.

Proof. Fix some sufficiently large $N$. Firstly, recall that the large deviation estimate (3.39) for $n = N$ holds under the condition (3.56) on $\omega$, see Remark 2.9. Now fix some $\omega$ as in (3.56) and let $y_0 \in \mathbb{T}$ be arbitrary. If $E$ satisfies (3.57), then by self-adjointness of $H$,

$$|E - E'| < e^{-C_3 N} \tag{3.60}$$
for some $E' \in \operatorname{Spec}\bigl(H_{[-N_1, N_1]}(\omega, 0, y_0)\bigr)$. Observe that these eigenvalues $E'$ do not depend on $(x, y)$. It follows from (3.60) with sufficiently large $C_3$ and (3.58) that

$$\frac{1}{N} \log\|M_N(\omega, x, y, E')\| < L_N(\omega, E') - c_0/20. \tag{3.61}$$

This can be seen by differentiating the functions on the left-hand side of (3.61) in the energy. In view of (3.39), the measure of the set of $(x, y) \in \mathbb{T}^2$ for which (3.61) holds with fixed $E'$ does not exceed $e^{-N^{\sigma}}$. This proves that

$$\operatorname{mes}(\operatorname{Proj}_{\mathbb{T}^4} S) \lesssim N_1^2 e^{-N^{\sigma}} \lesssim e^{-\frac{1}{2} N^{\sigma}}, \tag{3.62}$$

as claimed. It remains to be shown that conditions (3.57) and (3.58) can be replaced by inequalities involving only polynomials of degree at most $N^C$ for some $C$, without increasing the measure estimate (3.59) by more than a factor of two, say. We will not provide all details, since they can be readily found in [3]. Using Hilbert–Schmidt norms in (3.57) and expressing the inverse in terms of Cramer's rule shows that condition (3.57) is semi-algebraic of degree at most $C N_1^3$. Using Lemma 3.4, we may express the Lyapunov exponent

$$L_N(\omega, E) = \frac{1}{N} \int_{[0,1]^2} \log\|M_N(\omega, x, y, E)\|\, dx\, dy$$

appearing in (3.58) as a discrete average

$$L_N(\omega, E) = R^{-1} \sum_{j=1}^{R} \frac{1}{N} \log\|M_N(\omega, T_\omega^j(0, 0), E)\| + o(1)$$

with $R < N^C$. Therefore, one obtains a semi-algebraic condition in $\omega, x, y, E$ of degree at most $N^C$ by rewriting (3.58) in the form

$$\|M_N(\omega, x, y; E)\|^{2R} \le e^{-NRc_0/10} \prod_{j=1}^{R} \|M_N(\omega, T_\omega^j(0, 0); E)\|^2.$$

Finally, the measure of the set $S$ does not change by more than a factor in this process. □

3.5. The proof of localization for the skew-shift with large disorder. The following theorem is the main result of this section.

Theorem 3.7. Fix $\varepsilon > 0$ small. Let $v = v(x, y)$ be a nonconstant trigonometric polynomial on $\mathbb{T}^2$ and let $\lambda_1 = \lambda_1(v, \varepsilon)$ be as in Prop. 2.11. Let $T_\omega(x, y) = (x + y, y + \omega) \pmod{\mathbb{Z}^2}$ denote the $\omega$-skew-shift on $\mathbb{T}^2$. Then for every $\lambda > \lambda_1$ and all $(\omega, x, y) \in \mathbb{T}^3$ up to a set of measure $\varepsilon$, the operator

$$\bigl(H_{\omega,(x,y)}\psi\bigr)_n := -\psi_{n-1} - \psi_{n+1} + \lambda v(T_\omega^n(x, y))\psi_n$$

on $\ell^2(\mathbb{Z})$ displays Anderson localization for all energies.
Proof. Let $\omega \in \Omega_\varepsilon$, see (2.42). For large $N$, let $S_N$ be as in Lemma 3.6. Then Lemma 3.3 applies to $S_N$ and setting $\bar{N} = e^{(\log N)^2}$ it follows that

$$\operatorname{mes}\bigl\{(y_0, \omega) \in \mathbb{T}^2 \;\big|\; \bigl(y_0, \omega, T_\omega^j(0, y_0)\bigr) \in \operatorname{Proj}_{\mathbb{T}^4}(S_N) \text{ for some } j \sim \bar{N}\bigr\} < \bar{N}^{-10^{-8}}. \tag{3.63}$$

Let $B_N$ denote the set on the left-hand side of (3.63) and define

$$B^{(0)} := \limsup_{N \to \infty} B_N.$$

Thus $\operatorname{mes}(B^{(0)}) = 0$. Since $T(x, y) = x + T(0, y) \pmod 1$, this construction applied to the potential $v(x + \cdot, \cdot)$ instead of $v$ produces a set $B^{(x)}$ of measure zero. Finally, set

$$B := \bigl\{(\omega, x, y) \;\big|\; (y, \omega) \in B^{(x)}\bigr\},$$

which is again of measure zero. It is for all $(\omega, x, y) \in \Omega_\varepsilon \times \mathbb{T}^2 \setminus B$ that we shall prove localization.

Fix such a choice of $(\omega, x, y)$ and any $E \in \operatorname{Spec}(H_{\omega,(x,y)})$. By the Shnol–Simon theorem [12, 13] there exists a generalized eigenfunction $\xi$, i.e.,

$$(H_{\omega,(x,y)} - E)\xi = 0 \text{ and } |\xi_n| \lesssim 1 + |n| \text{ for all } n \in \mathbb{Z}. \tag{3.64}$$
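Furthermore, we normalize $|\xi_0| + |\xi_1| = 1$. Fix some large integer $N$ and assume that (3.57) holds. By our choice of $(\omega, x, y)$,

$$\frac{1}{N'} \log\bigl\|M_{N'}\bigl(T_\omega^j(x, y); E\bigr)\bigr\| > L(E) - c_0/10$$

for all $N' \sim N$ and $j \sim \bar{N} = e^{(\log N)^2}$, cf. (3.58). It follows from the avalanche principle that then also

$$\frac{1}{N_2} \log\bigl\|M_{N_2}\bigl(T_\omega^j(x, y); E\bigr)\bigr\| > L(E) - c_0/10 \quad \text{if } \frac{\bar{N}}{2} < |j| < \bar{N} \text{ and } N < N_2 < \frac{\bar{N}}{10}. \tag{3.65}$$

As usual, let

$$G_\Lambda(\omega, x, y; E) := \bigl(H_\Lambda(\omega, x, y) - E\bigr)^{-1}$$

be the Green's function. As before, $H_\Lambda$ denotes the restriction of $H$ to the interval $\Lambda$ with Dirichlet boundary conditions. Consider intervals

$$\Lambda = \Bigl[j, j + \frac{\bar{N}}{10}\Bigr], \text{ where } \frac{\bar{N}}{2} < |j| < \bar{N}.$$

By definition of $G_\Lambda$ and because of (3.64), it will suffice to prove that

$$\max_{\ell \in \partial\Lambda} |G_\Lambda(\omega, x, y; E)(k, \ell)| \lesssim \exp(-c_1 \bar{N}) \text{ for all } k \in \Lambda \text{ with } \operatorname{dist}(k, \partial\Lambda) > \frac{1}{4}|\Lambda|. \tag{3.66}$$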
Here $c_1 > 0$ is some fixed constant. The proof of (3.66) follows from (3.65) by a standard argument. In fact, it is a simple consequence of Cramer's rule and the representation of the Hamiltonian as the matrix appearing on the right-hand side of (2.80) that for any $n$ and $1 \le k, \ell \le n$,

$$G_{[1,n]}(x, y; E)(k, \ell) = \frac{f_{k-1}(x, y; E)\, f_{n-\ell-1}(T^{\ell}(x, y); E)}{f_n(x, y; E)}.$$
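The Cramer's rule representation of the Green's function can be sketched for a finite Dirichlet block as follows. This is an illustration under assumptions, not the paper's code: the potential $\lambda\cos(2\pi x)$, the helper names `potential`, `dets` and `green`, and in particular the indexing of the determinants (here `dets` returns determinants on the first $i$ sites of a block) are choices made for this example and may differ from the convention behind $f_n$ in (2.80).

```python
import math


def potential(x, y, omega, lam, n):
    """Values lam * cos(2*pi*x_j), j = 1..n, along the skew-shift orbit of (x, y)."""
    vals = []
    for _ in range(n):
        x, y = (x + y) % 1.0, (y + omega) % 1.0
        vals.append(lam * math.cos(2 * math.pi * x))
    return vals


def dets(diag, E):
    """[f_0, ..., f_m], f_i = det(H - E) on the first i sites, where H is
    tridiagonal with the given diagonal and off-diagonal entries -1."""
    f = [1.0]
    prev = 0.0
    for d in diag:
        cur = (d - E) * f[-1] - prev     # three-term recursion for the determinant
        prev = f[-1]
        f.append(cur)
    return f


def green(diag, E, k, l):
    """Entry (k, l), 1 <= k <= l <= n, of (H - E)^{-1} via Cramer's rule."""
    head = dets(diag[:k - 1], E)[-1]     # determinant on sites 1..k-1
    tail = dets(diag[l:], E)[-1]         # determinant on sites l+1..n
    return head * tail / dets(diag, E)[-1]


omega = (math.sqrt(5) - 1) / 2
diag = potential(0.1, 0.2, omega, lam=3.0, n=8)
print("G(1, n) =", green(diag, 10.0, 1, len(diag)))
```

The off-diagonal entry $G(1, n)$ equals $1/\det(H_{[1,n]} - E)$ in this convention, so lower bounds on the determinants (equivalently, on monodromy matrix norms) translate into decay of the Green's function.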
In conjunction with (2.81), Corollary 3.5, and (3.65), the determinant formula for $G_{[1,n]}$ implies (3.66) as desired. Recall, however, that we made the assumption that (3.57) holds. To establish this condition it suffices to show that

$$|\xi_{N_1+1}| + |\xi_{-N_1-1}| \lesssim e^{-2C_3 N} \text{ for some } N_1 \sim N^{C_2}.$$

In view of (3.64) this estimate holds provided both Green's functions

$$G_{[j - 4C_3 N,\ j + 4C_3 N]}(\omega, x, y; E) = G_{[-4C_3 N,\ 4C_3 N]}\bigl(\omega, T^j(x, y); E\bigr)$$

with $j = N_1, -N_1$ satisfy an exponential decay estimate as in (3.66). In view of the preceding argument involving (3.66) it remains to show that for some $j \sim N^{C_2}$ one has the property

$$\frac{1}{4C_3 N} \log\bigl\|M_{4C_3 N}\bigl(T_\omega^j(x, y), E\bigr)\bigr\| > L(E) - c_0/10$$

and similarly for $-j$. That, however, is an immediate consequence of Lemma 3.4. □
Acknowledgement. The authors thank Thomas Spencer and Yakov Sinai for helpful discussions. The third author wishes to thank Thomas Wolff for the suggestion that the Hilbert transform might lead to better bounds in the paper [7], and for pointing out Theorem 1.9 in [9], and Charles Fefferman for sketching to him the proof of an important BMO estimate for logarithms of polynomials. The second author was supported by a grant of the NEC Research Institute, Inc. during his stay at the Institute for Advanced Study, Princeton. The third author was supported in part by the NSF, grant number DMS-9706889.
References
1. Anderson, P.: Absence of diffusion in certain random lattices. Phys. Rev. 109, 1492–1501 (1958)
2. Bourgain, J.: Positive Lyapunov exponents for most energies. Geometric Aspects of Functional Analysis. Lecture Notes in Math. 1745. Berlin–Heidelberg–New York: Springer, 2000, pp. 37–66
3. Bourgain, J., Goldstein, M.: On nonperturbative localization with quasiperiodic potential. Annals of Math. 152, (3), 835–879 (2000)
4. Bourgain, J., Schlag, W.: Anderson localization for Schrödinger operators on Z with strongly mixing potentials. Commun. Math. Phys. 215, 143–175 (2000)
5. Figotin, A., Pastur, L.: Spectra of random and almost–periodic operators. Grundlehren der mathematischen Wissenschaften 297, Berlin–Heidelberg–New York: Springer, 1992
6. Fröhlich, J., Spencer, T., Wittwer, P.: Localization for a class of one dimensional quasi-periodic Schrödinger operators. Commun. Math. Phys. 132, 5–25 (1990)
7. Goldstein, M., Schlag, W.: Hölder continuity of the integrated density of states for quasiperiodic Schrödinger equations and averages of shifts of subharmonic functions. To appear in Annals of Math.
8. Herman, M.: Une méthode pour minorer les exposants de Lyapounov et quelques exemples montrant le caractère local d’un théorème d’Arnold et de Moser sur le tore de dimension 2. Comment. Math. Helv. 58 no. 3, 453–502 (1983)
9. Katznelson, Y.: An introduction to harmonic analysis. New York: Dover, 1976
10. Kotani, S.: Ljapunov indices determine absolutely continuous spectra of stationary random one-dimensional Schrödinger operators. In: Stochastic analysis (Katata/Kyoto, 1982), North-Holland Math. Library, 32, Amsterdam–New York: North-Holland, 1984, pp. 225–247
11. Montgomery, H.: Ten lectures on the interface between harmonic analysis and analytic number theory. CBMS Regional conference series in mathematics 84, Providence, RI: AMS, 1994
12. Shnol, I.: On the behavior of the Schroedinger equation. Mat. Sb. 273–286 (1957) Russian
13. Simon, B.: Spectrum and continuum eigenfunctions of Schroedinger Operators. J. Funct. Anal. 42, 66–83 (1981)
14. Simon, B.: Kotani theory for one-dimensional stochastic Jacobi matrices. Commun. Math. Phys. 89, 227–234 (1983)
15. Sinai, Y.G.: Anderson localization for one-dimensional difference Schrödinger operator with quasiperiodic potential. J. Stat. Phys. 46, 861–909 (1987)
16. Stein, E.: Harmonic analysis. Princeton Mathematical Series 43, Princeton, NJ: Princeton University Press, 1993

Communicated by P. Sarnak
Commun. Math. Phys. 220, 623 – 656 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Bryuno Function and the Standard Map
Alberto Berretti (1), Guido Gentile (2)
(1) Dipartimento di Matematica, II Università di Roma (Tor Vergata), Via della Ricerca Scientifica, 00133 Roma, Italy. E-mail: [email protected]
(2) Dipartimento di Matematica, Università di Roma 3, Largo S. Leonardo Murialdo 1, 00146 Roma, Italy. E-mail: [email protected]
Received: 8 February 2000 / Accepted: 2 March 2001
Abstract: For the standard map the homotopically non-trivial invariant curves of rotation number $\omega$ satisfying the Bryuno condition are shown to be analytic in the perturbative parameter $\varepsilon$, provided $|\varepsilon|$ is small enough. The radius of convergence $\rho(\omega)$ of the Lindstedt series – sometimes called critical function of the standard map – is studied and the relation with the Bryuno function $B(\omega)$ is derived: the quantity $|\log\rho(\omega) + 2B(\omega)|$ is proved to be bounded uniformly in $\omega$.

1. Introduction

We continue the study, started in [1], of the radius of convergence of the Lindstedt series for the standard map, for rotation numbers close to rational values. We consider real rotation numbers $\omega$ satisfying the Bryuno condition (see below), and study how the corresponding radius of convergence depends on the Bryuno function $B(\omega)$, introduced by Yoccoz in [2]. The standard map is a discrete time, one-dimensional dynamical system generated by the iteration of the area-preserving – symplectic – map of the cylinder into itself $T_\varepsilon : \mathbb{T} \times \mathbb{R} \to \mathbb{T} \times \mathbb{R}$, given by:

$$T_\varepsilon : \begin{cases} x' = x + y + \varepsilon \sin x, \\ y' = y + \varepsilon \sin x. \end{cases} \tag{1.1}$$

Given a real rotation number $\omega \in [0, 1)$, we can look for (homotopically non-trivial) invariant curves described parametrically by:

$$\begin{cases} x = \alpha + u(\alpha, \varepsilon; \omega), \\ y = 2\pi\omega + u(\alpha, \varepsilon; \omega) - u(\alpha - 2\pi\omega, \varepsilon; \omega), \end{cases} \tag{1.2}$$
such that the dynamics induced in the variable $\alpha$ is given by rotations by $\omega$:

$$\alpha' = \alpha + 2\pi\omega. \tag{1.3}$$
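As a concrete companion to (1.1), here is a minimal sketch of the standard map together with a finite-difference check that it is area-preserving. The function names `standard_map` and `jacobian_det`, the reduction of $x$ modulo $2\pi$, and the sample point are assumptions made for this illustration.

```python
import math


def standard_map(x, y, eps):
    """One step of T_eps: x' = x + y + eps*sin(x), y' = y + eps*sin(x),
    with x reduced mod 2*pi (the angle variable on the cylinder)."""
    y_new = y + eps * math.sin(x)
    x_new = (x + y_new) % (2 * math.pi)
    return x_new, y_new


def jacobian_det(x, y, eps, h=1e-6):
    """Central finite-difference estimate of det DT_eps at (x, y),
    computed on the lift of the map (no reduction mod 2*pi)."""
    def raw(x, y):
        return x + y + eps * math.sin(x), y + eps * math.sin(x)
    fx1, gx1 = raw(x + h, y)
    fx0, gx0 = raw(x - h, y)
    fy1, gy1 = raw(x, y + h)
    fy0, gy0 = raw(x, y - h)
    dfdx, dgdx = (fx1 - fx0) / (2 * h), (gx1 - gx0) / (2 * h)
    dfdy, dgdy = (fy1 - fy0) / (2 * h), (gy1 - gy0) / (2 * h)
    return dfdx * dgdy - dfdy * dgdx


print("det DT at a sample point:", jacobian_det(0.7, 0.3, 0.5))
```

Analytically the Jacobian is $\begin{pmatrix} 1 + \varepsilon\cos x & 1 \\ \varepsilon\cos x & 1 \end{pmatrix}$, whose determinant is $1$. For $\varepsilon = 0$ the variable $y$ is conserved and $x$ advances by $y$ at each step, which is the unperturbed rotation underlying (1.3).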
For irrational rotation numbers $\omega$, by imposing that the average of $u$ over $\alpha$ is 0, the (formal) conjugating function $u$ is unique and odd in $\alpha$, and has a formal expansion – known as the Lindstedt series – of the form:

$$u(\alpha, \varepsilon) = \sum_{\nu \in \mathbb{Z}} u_\nu(\varepsilon) e^{i\nu\alpha} = \sum_{k \ge 1} u^{(k)}(\alpha) \varepsilon^k = \sum_{k \ge 1} \sum_{\nu \in \mathbb{Z}} u_\nu^{(k)} e^{i\nu\alpha} \varepsilon^k; \tag{1.4}$$

the coefficients $u_\nu^{(k)}$ can be expressed graphically in terms of sums over trees as explained shortly (see also [1] and references quoted therein). The radius of convergence of the series (1.4), called sometimes the critical function of the standard map, is defined as:

$$\rho(\omega) = \inf_{\alpha \in \mathbb{T}} \Bigl(\limsup_{k \to \infty} \bigl|u^{(k)}(\alpha)\bigr|^{1/k}\Bigr)^{-1}. \tag{1.5}$$

Given $\omega$, let $\{p_n/q_n\}$ be the sequence of convergents defined by the standard continued fraction expansion of $\omega$, and let:

$$B_1(\omega) = \sum_{n=0}^{\infty} \frac{\log q_{n+1}}{q_n}. \tag{1.6}$$

The irrational number $\omega \in [0, 1)$ satisfies the Bryuno condition if $B_1(\omega) < \infty$; we also say that in this case $\omega$ is a Bryuno number. After Yoccoz [2], we define on the irrational numbers the Bryuno function $B(\omega)$ by the functional equation:

$$\begin{cases} B(\omega) = -\log\omega + \omega B(\omega^{-1}) & \text{for } \omega \in (0, \tfrac{1}{2}) \text{ and irrational}, \\ B(\omega + 1) = B(-\omega) = B(\omega). \end{cases} \tag{1.7}$$

It can be proved that such a functional equation has a unique solution in $L^p$, $p \ge 1$; moreover $B(\omega)$ is related to the series $B_1(\omega)$ by the inequality:

$$\bigl|B(\omega) - B_1(\omega)\bigr| < C_1, \tag{1.8}$$

for some constant $C_1$. See [2] and [3] for the proofs of these statements. We prove the following theorem.

Theorem. Consider the standard map (1.1) and let $\omega$ be an irrational number, $\omega \in [0, 1)$, satisfying the Bryuno condition. Then the radius of convergence (1.5) satisfies the bound:

$$|\log\rho(\omega) + 2B(\omega)| \le C_0, \tag{1.9}$$

where $C_0$ is a constant independent of $\omega$.
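The two arithmetic quantities entering the theorem, the series (1.6) and the Bryuno function defined by (1.7), are easy to approximate numerically. The sketch below is illustrative and makes several assumptions: the function names `convergent_denominators`, `bryuno_series` and `bryuno` are hypothetical, the recursion is truncated at a fixed depth, and floating-point arithmetic limits the number of continued fraction steps that can be trusted (roughly twenty for a double).

```python
import math


def convergent_denominators(omega, n_terms):
    """Denominators q_0, q_1, ... of the continued fraction convergents of omega in (0, 1)."""
    qs = []
    q_prev, q = 0, 1                     # q_{-1} = 0, q_0 = 1
    x = omega
    for _ in range(n_terms):
        qs.append(q)
        if x == 0.0:
            break
        x = 1.0 / x
        a = math.floor(x)                # next partial quotient
        x -= a
        q_prev, q = q, a * q + q_prev
    return qs


def bryuno_series(omega, n_terms=20):
    """Partial sum of B_1(omega) = sum_n log(q_{n+1}) / q_n, cf. (1.6)."""
    q = convergent_denominators(omega, n_terms + 1)
    return sum(math.log(q[n + 1]) / q[n] for n in range(len(q) - 1))


def bryuno(omega, depth=20):
    """Truncated evaluation of the functional equation (1.7)."""
    total, weight = 0.0, 1.0
    x = omega
    for _ in range(depth):
        x = x % 1.0                      # B(omega + 1) = B(omega)
        if x > 0.5:
            x = 1.0 - x                  # B(-omega) = B(omega)
        if x == 0.0:
            return math.inf              # rational input: the recursion diverges
        total += weight * (-math.log(x))
        weight *= x
        x = 1.0 / x
    return total


gamma = (math.sqrt(5) - 1) / 2           # golden mean
print("B(gamma) ~", bryuno(gamma), " B_1(gamma) ~", bryuno_series(gamma))
for k in (2, 5, 10):
    w = 1.0 / (k + gamma)
    print(k, bryuno(w), math.exp(-2 * bryuno(w)))
```

For the golden mean the functional equation closes on itself and gives $B(\gamma) = -2\log\gamma/\gamma$, which the truncated recursion reproduces. The last column, $e^{-2B(\omega)}$, is only the order of magnitude that (1.9) assigns to $\rho(\omega)$, since the theorem allows a bounded multiplicative factor.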
An analogous result was proved by Davie [4] for the semistandard map (where the nonlinear term $\sin x$ in (1.1) is replaced by $e^{ix}$); in the same paper it was also shown that the upper bound in (1.9) holds:

$$\log\rho(\omega) + 2B(\omega) < C_2, \tag{1.10}$$

for some constant $C_2$. In [5] it was proved, by “phase space renormalization” arguments, that $\forall \eta > 0$ $\exists C_3$, depending on $\eta$, such that:

$$\log\rho(\omega) + (2 + \eta)B(\omega) > C_3. \tag{1.11}$$

So our theorem improves the result of [5] (using also a different, direct technique, taken from [6] – and inspired by the works [7] and [8] – in some sense more elementary than the one of [5]) and proves the conjecture (“Bryuno’s interpolation”) first stated for the standard map in [9]; see also [10] and references quoted therein. Our theorem can be related to the result and the methods of [1]. There we proved that, for $\omega \in \mathbb{C}$, if $\omega$ tends to a rational number $p/q$ through a path in the complex plane non-tangential to the real axis, then the radius of convergence satisfies:

$$\Bigl|\log\rho(\omega) + \frac{2}{q}\log\Bigl|\omega - \frac{p}{q}\Bigr|\Bigr| < C_4 \tag{1.12}$$

for some constant $C_4$. If instead we consider a sequence of real, irrational numbers tending to a rational value $p/q$, the situation is quite a bit more complex. In fact, the limit and its very existence may depend on the arithmetic properties of the numbers of the sequence we consider, and on their uniformity in $k$; namely:

1. The sequence $\{\omega_k\}$ can tend to $p/q$ but, though all the $\omega_k$ are irrational, some of them are not Bryuno numbers so that for those $B(\omega_k) = +\infty$ and $\rho(\omega_k) = 0$.

2. The sequence $\{\omega_k\}$ can tend to $p/q$ through Bryuno numbers, or even Diophantine numbers, but they are not uniformly such in $k$ so that $B(\omega_k)$ diverges faster than $\log|\omega_k - p/q|^{1/q}$ (and so $\rho(\omega_k)$ tends to zero faster than $|\omega_k - p/q|^{2/q}$). An example can be the sequence of Diophantine (actually even “noble”) numbers:

$$\omega_k = \cfrac{1}{k + \cfrac{1}{2^{k^2} + \gamma}}, \tag{1.13}$$

where $\gamma$ denotes the “golden mean”:

$$\gamma = \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = \frac{\sqrt{5} - 1}{2}; \tag{1.14}$$

a simple calculation using the recursion relation (1.7) shows that indeed $B(\omega_k) = O(k)$ while $\omega_k = O(1/k)$, so that, by taking into account also logarithmic corrections in $B(\omega_k)$, $\rho(\omega_k) = O(\omega_k^2 e^{-2/\omega_k})$, that is much faster than $\omega_k^2$.
3. Finally, the sequence $\{\omega_k\}$ can tend to $p/q$ through a sequence of Bryuno numbers satisfying uniform estimates in $k$, so that an estimate like (1.12) holds (note that decays slower than $|\omega_k - p/q|^{2/q}$ are not possible); an example can be given by the sequence:

$$\omega_k = \frac{1}{k + \gamma}, \tag{1.15}$$
where again $\gamma$ is the golden mean (1.14). Notice that in the numerical calculations of [11] only real sequences of type 1 were considered, and that sequences of type 1 are practically inaccessible from the numerical point of view.

One might also ask whether the same interpolation property holds for the analytic critical threshold $\varepsilon_c(\omega)$, defined as the supremum of the set:

$$E_\omega = \{\varepsilon > 0 \mid \forall \tilde{\varepsilon} \in [0, \varepsilon)\ \exists \text{ an analytic invariant curve with rotation number } \omega\}; \tag{1.16}$$

of course $\rho(\omega) \le \varepsilon_c(\omega)$. The interpolation properties of $\varepsilon_c(\omega)$ should be different, as, according to Davie [5], their orders of magnitude asymptotically differ as $\omega \to 0$. This, in turn, adds interest to the study of the interpolation properties for the radius of convergence $\rho(\omega)$, as a standard against which to check $\varepsilon_c(\omega)$, besides the obvious interest in an important analyticity property of the function $u$. Note that this is a much harder problem, especially considering that it is not at all clear what is the right question to ask. For example, for generic standard-like maps, the analytic critical threshold is different for positive or negative values of $\varepsilon$, as numerical experiments suggest (see e.g. [12]), and of course there is nothing special about positive values of $\varepsilon$ from the physical point of view. Moreover, again for generic maps, one can have the phenomenon of erratic invariant curves, that is, for a given $\omega$ the invariant curve can break down at a certain value of $\varepsilon$, to reappear and disappear again as $\varepsilon$ grows: again, this has been shown only numerically (see [13]) and it is unlikely to be the case for the standard map, but such a possibility makes the simple definition (1.16) questionable from the physical point of view.

Finally, one may ask how much these results can be extended to more complicated, and realistic, symplectic maps and continuous time Hamiltonian systems. We believe that while some additional complications may arise, the really hard problem (i.e.
how to handle resonances) is already present in the standard map and it was solved by carefully using the trees formalism and the multi-scale decomposition of the propagators (see below). More general maps and Hamiltonian systems, though, as already pointed out in [1, 14], have different, more complicated interpolation properties for the radius of convergence of their Lindstedt series: the challenge here seems to be to find the right interpolation formula, which the work of [14] shows is different from Bryuno’s interpolation; this is an area where much work still has to be done. The paper is organized as follows. In Sect. 2 we introduce the formalism and give the scheme of the proof of the theorem, elucidating the major difficulties, due to the accumulation of small divisors in the Lindstedt series, and showing that, in the absence of such a phenomenon, the proof could be carried out by a detailed analysis of the single terms of the series. In Sect. 3 and 4, we shall see how to handle the small divisors problem, by showing that there are cancellation mechanisms, operating to all perturbative orders
between different terms of the Lindstedt series, which assure its convergence. Finally, Sects. 5 and 6 deal with the proof of the main technical lemmata used in the proof of the theorem.

2. Formalism: Trees, Clusters and Resonances

As in [1], we can express graphically the coefficients u_ν^{(k)} in (1.4) in terms of trees. We shall only recall the definitions used in this paper and set up the notations, leaving the full details of the tree expansion for our problem to [1] and the references quoted therein.

A tree ϑ consists of a family of lines arranged to connect a partially ordered set of points – nodes – with the lower nodes to the right. All the lines have two nodes at their extremes, except the highest which has only one node, the last node u_0 of the tree; the other extreme r will be called the root of the tree and it will not be regarded as a node. We denote by ⪯ the partial ordering relation between nodes defined as follows: given two nodes u, v, we say that v ⪯ u if u is along the path of lines connecting v to the root r of the tree (they could coincide); we say that v ≺ u if they do not coincide. So our trees are “rooted trees”, following the terminology of [15].

We assign to each line ℓ joining two nodes u and u′ an “arrow” pointing from the higher to the lower node according to the order relation just defined; if u ≺ u′, we say that the line ℓ exits from u and enters u′, and that u′ is the node immediately following u. We write u_0′ = r even if, strictly speaking, r is not considered a node. For each node u there is a unique exiting line, and m_u ≥ 0 entering lines; as there is a one-to-one correspondence between lines and nodes, we can associate to each node u the line ℓ_u exiting from it. The line ℓ_{u_0} exiting the last node u_0 will be called the root line. Note that each line ℓ_u can be considered the root line of the subtree consisting of the nodes v satisfying v ⪯ u, and u′ will be the root of such a tree. The order k of the tree is defined as the number of its nodes.

To each node u ∈ ϑ we associate a mode label ν_u = ±1, and define the momentum flowing through the line ℓ_u as:
\[ \nu_{\ell_u} = \sum_{w \preceq u} \nu_w, \qquad \nu_w = \pm 1; \tag{2.1} \]
note that no line can have zero momentum, as u_0^{(k)} = 0 in (1.4).

While in [1] we could get along considering only two “scales”, we need here a full multiscale decomposition of the momenta associated to each line. Given a rotation number ω ∈ [0, 1) \ Q, let {p_n/q_n} be the sequence of convergents coming from the standard continued fraction expansion of ω. For x ∈ R, let:
\[ \|x\| = \inf_{\nu \in \mathbb{Z}} |x - \nu| \tag{2.2} \]
be the distance of x from the nearest integer. Let now:
\[ \gamma(\nu) = 2(\cos 2\pi\omega\nu - 1); \tag{2.3} \]
then we have the estimate:
\[ |\gamma(\nu)| = 2\,|\cos 2\pi\omega\nu - 1| \ge \$\,\|\omega\nu\|^2, \tag{2.4} \]
for some constant $.
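The quantities in (2.2)–(2.4) are easy to evaluate numerically. The following is a minimal sketch, not part of the original argument: the function names, the choice of the rotation number ω = (√5 − 1)/2 and the explicit constant 16 are assumptions made here for illustration. The value 16 is one admissible constant in (2.4), since |γ(ν)| = 4 sin²(π||ων||) and sin(πt) ≥ 2t for 0 ≤ t ≤ 1/2.

```python
import math

def nearest_int_distance(x: float) -> float:
    """Distance ||x|| of x from the nearest integer, as in (2.2)."""
    return abs(x - round(x))

def small_divisor(omega: float, nu: int) -> float:
    """gamma(nu) = 2 (cos(2 pi omega nu) - 1), as in (2.3)."""
    return 2.0 * (math.cos(2.0 * math.pi * omega * nu) - 1.0)

def convergent_denominators(partial_quotients):
    """Denominators q_0, q_1, ... of the convergents of [0; a_1, a_2, ...]."""
    q_prev, q = 0, 1                      # q_{-1} = 0, q_0 = 1
    qs = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev     # q_{n+1} = a_{n+1} q_n + q_{n-1}
        qs.append(q)
    return qs

# Illustrative rotation number (an assumption): [0; 1, 1, 1, ...].
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0

def check_lower_bound(omega: float, nu_max: int, c: float = 16.0) -> bool:
    """Check |gamma(nu)| >= c ||omega nu||^2 for 1 <= nu <= nu_max, up to rounding."""
    for nu in range(1, nu_max + 1):
        d = nearest_int_distance(omega * nu)
        if abs(small_divisor(omega, nu)) < c * d * d - 1e-12:
            return False
    return True
```

For instance, `convergent_denominators([1] * 5)` returns the first Fibonacci-type denominators of this ω, and `check_lower_bound(OMEGA, 1000)` tests the estimate on the first thousand momenta.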
We introduce a C^∞ partition of unity in the following way. Let χ(x) be a C^∞, nonincreasing, compact-support function defined on R+, such that:
\[ \chi(x) = \begin{cases} 1 & \text{for } x \le 1, \\ 0 & \text{for } x \ge 2, \end{cases} \tag{2.5} \]
and define for each n ∈ N:
\[ \chi_0(x) = 1 - \chi(96 q_1 x), \qquad \chi_n(x) = \chi(96 q_n x) - \chi(96 q_{n+1} x) \quad \text{for } n \ge 1. \tag{2.6} \]
Then for each line ℓ set:
\[ g(\nu_\ell) \equiv \frac{1}{\gamma(\nu_\ell)} = \sum_{n=0}^{\infty} \frac{\chi_n(\|\omega\nu_\ell\|)}{\gamma(\nu_\ell)} \equiv \sum_{n=0}^{\infty} g_n(\nu_\ell), \tag{2.7} \]
and call g_n(ν_ℓ) the propagator on scale n.

Given a tree ϑ, we can associate to each line ℓ of ϑ a scale label n_ℓ, using the multiscale decomposition (2.7) and singling out the summands with n = n_ℓ. We shall call n_ℓ the scale label of the line ℓ, and we shall say also that the line ℓ is on scale n_ℓ.

Remark 1. Given a value ν_ℓ there can be at most two possible – consecutive – values of n such that the corresponding χ_n(||ων_ℓ||) are not vanishing. This means that at most two summands of the infinite series (2.7) really appear; nevertheless keeping all terms is more convenient, in order to have a label to characterize the “size” of the “propagators” g(ν_ℓ).

Remark 2. Note that if a line ℓ has momentum ν_ℓ and scale n_ℓ, then:
\[ \frac{1}{96\, q_{n_\ell+1}} \le \|\omega\nu_\ell\| \le \frac{1}{48\, q_{n_\ell}}, \tag{2.8} \]
provided that one has χ_{n_ℓ}(||ων_ℓ||) ≠ 0.

A group G of transformations acts on the trees, generated by the permutations of all the subtrees emerging from each node with at least one entering line: G is therefore a Cartesian product of copies of the symmetric groups of various orders. Two trees that can be transformed into each other by the action of the group G are considered identical. Denote by T_{ν,k} the set of trees, with nonvanishing value, of order k and total momentum ν_{ℓ_{u_0}} = ν, if u_0 is the last node of the tree. The number of elements in T_{ν,k} is bounded by 2^k · 2^k · 2^{2k} = 2^{4k}: the number of semitopological trees (see [1]) of order k is bounded by 2^{2k},¹ and there are two possible values for the mode label of each node and two possible values for the scale label of each line. Then, as in [1] (to which we refer for more details and figures) one finds:
\[ u^{(k)}_\nu = \frac{1}{2^k} \sum_{\vartheta \in T_{\nu,k}} \mathrm{Val}(\vartheta), \qquad \mathrm{Val}(\vartheta) = -i \Biggl[ \prod_{u \in \vartheta} \frac{\nu_u^{m_u+1}}{m_u!} \Biggr] \Biggl[ \prod_{\ell \in \vartheta} g_{n_\ell}(\nu_\ell) \Biggr]; \tag{2.9} \]
the factors g_{n_ℓ}(ν_ℓ) above are called propagators of small divisors on scale n_ℓ, and the quantity Val(ϑ) will be called the value of the tree ϑ. We define now the main combinatorial tools.

¹ The number of semitopological trees can be bounded by the number of one-dimensional random walks with 2k − 1 steps.
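The partition of unity (2.5)–(2.6) and the scale labels of Remark 2 can be illustrated with a concrete cutoff. The sketch below is only an illustration under stated assumptions: the text fixes no particular χ, so the standard construction from exp(−1/t) is chosen here, and the names `chi`, `chi_n`, `scale_weights`, together with the golden-mean denominators `QS`, are hypothetical.

```python
import math

def _bump(t: float) -> float:
    """Smooth function vanishing identically for t <= 0."""
    return math.exp(-1.0 / t) if t > 0.0 else 0.0

def chi(x: float) -> float:
    """One possible C-infinity nonincreasing cutoff: 1 for x <= 1, 0 for x >= 2."""
    a, b = _bump(2.0 - x), _bump(x - 1.0)
    return a / (a + b)

def chi_n(n: int, x: float, qs) -> float:
    """Scale functions of (2.6); qs[n] plays the role of the denominator q_n."""
    if n == 0:
        return 1.0 - chi(96 * qs[1] * x)
    return chi(96 * qs[n] * x) - chi(96 * qs[n + 1] * x)

def scale_weights(x: float, qs):
    """List [chi_0(x), chi_1(x), ...] over all scales available in qs."""
    return [chi_n(n, x, qs) for n in range(len(qs) - 1)]

def dist(x: float) -> float:
    """Distance from the nearest integer, as in (2.2)."""
    return abs(x - round(x))

# Illustrative data (an assumption): omega = [0; 1, 1, 1, ...], q_{n+1} = q_n + q_{n-1}.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0
QS = [1, 1]
while len(QS) < 30:
    QS.append(QS[-1] + QS[-2])
```

Summing `scale_weights(dist(OMEGA * nu), QS)` gives 1 up to rounding, which is the telescoping identity behind (2.7); the indices n with a nonzero weight are the admissible scale labels for that momentum.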
Definition (Cluster). Given a tree ϑ, a cluster T of ϑ on scale n is a maximal connected set of lines on scale ≤ n with at least one line on scale n. We shall say that such lines are internal to T, and write ℓ ∈ T for a line internal to T. A node u is called internal to T, and we write u ∈ T, if at least one of its entering lines or its exiting line is in T. Each cluster has an arbitrary number m_T ≥ 0 of entering lines but only one or zero exiting lines; we shall call external to T the lines entering or exiting T (which are all on scale > n). We shall denote with n_T the scale of the cluster T, with n^i_T the minimum of the scales of the lines entering T, with n^o_T the scale of the line exiting T and with k_T the number of nodes in T. Note that, despite the name, not all lines outside T are “external” to it: only those lines outside T which enter or exit T are external to it. On the contrary a line inside T is said to be “internal” to it. The use of such a terminology is inherited from Quantum Field Theory.

Definition (Resonance). Given a tree ϑ, a cluster V of ϑ will be called a resonance with resonance-scale n = n^R_V ≡ min{n^i_V, n^o_V}, if: 1. the sum of the mode labels of its nodes is 0:
\[ \nu_V \equiv \sum_{u \in V} \nu_u = 0; \tag{2.10} \]
2. all the lines entering V are on the same scale except at most one, which can be on a higher scale; 3. n^i_V ≤ n^o_V if m_V ≥ 2, and |n^i_V − n^o_V| ≤ 1 for m_V = 1; 3. k_V < q_n; 4. m_V = 1 if q_{n+1} ≤ 4q_n; 5. if q_{n+1} > 4q_n and m_V ≥ 2, denoting by k_0 the sum of the orders of the subtrees of order < q_{n+1}/4 entering V, either (a) there is only one subtree of order k_1 ≥ q_{n+1}/4 entering V and k_0 < q_{n+1}/8, or (b) there is no such subtree and k_0 + k_V < q_{n+1}/4.

Remark 3. Note that for any resonance V one has n^R_V ≥ n_V + 1, if n_V is the scale of the resonance V as a cluster. As in [16] we use the notation with a hyphen for the resonance-scale to avoid confusion between n^R_V and n_V.

Remark 4. One would be tempted to give a simpler definition of resonance (for instance, by imposing only condition 2 to the cluster V). This temptation should be resisted, as it would make it impossible to exploit the cancellations leading to the improvement of the bound discussed at the end of this section (in fact, no relation would continue to subsist between momenta and scale labels, and factorials would arise from counting the summands generated by the renormalization procedure described in Sect. 4). On the other hand we shall see in Sect. 5 that no problems should arise if no resonances, exactly as they are defined above, could appear.

In the following we shall need to introduce trees in which it can happen that a line ℓ is on a scale n_ℓ and yet its momentum does not satisfy (2.8). The value of any such tree ϑ is vanishing as χ_{n_ℓ}(||ων_ℓ||) = 0; nevertheless it will be useful to write Val(ϑ) as the sum of two (possibly) nonvanishing terms: one of them will be used to cancel terms arising from other tree values, so it will disappear, while the other one is left and has to
be bounded. This means that we shall have to deal with trees in which there are lines ℓ with momentum ν_ℓ and scale n_ℓ which do not satisfy (2.8). What will be shown to hold is that for such lines a bound similar to (2.8), though weaker, still holds; more precisely, a line ℓ with momentum ν_ℓ will have only scales n_ℓ such that:
\[ \frac{1}{768\, q_{n_\ell+1}} \le \|\omega\nu_\ell\| \le \frac{1}{8\, q_{n_\ell}}, \tag{2.11} \]
and, for fixed ν_ℓ, the number of possible scales to associate to ℓ is bounded by an absolute constant. As (2.11) is implied by (2.8), even for trees with nonvanishing value we shall use that if a line is on scale n_ℓ then (2.11) holds. Then, if N_n(ϑ), n ∈ N, denotes the number of lines on scale n in ϑ, we have trivially for a given tree ϑ the bound:
\[ |\mathrm{Val}(\vartheta)| \le D_1^k \prod_{n=0}^{\infty} \bigl( 768\, q_{n+1} \bigr)^{2 N_n(\vartheta)}, \tag{2.12} \]
for some constant D_1 (actually D_1 = 1/$; see (2.4), (2.9) and (2.11)).

Given a tree ϑ, let us denote with N^R_n(ϑ) the number of resonances with resonance-scale n and by P_n(ϑ) the number of resonances on scale n. Of course N^R_0 = 0.

Remark 5. Note that the number N^R_n(ϑ) of resonances with resonance-scale n can be counted by counting the number of lines exiting resonances with resonance-scale n; analogously P_n(ϑ) can be counted by counting the number of lines exiting resonances on scale n. Such counts will be performed in Sect. 5.

The following simple lemmata contain all the arithmetic we shall need, and are basically adapted from [4].

Lemma 1 (Davie’s lemma). Given ν ∈ Z such that ||ων|| ≤ 1/(4q_n), then 1. either ν = 0 or |ν| ≥ q_n, 2. either |ν| ≥ q_{n+1}/4 or ν = s q_n for some integer s.

Lemma 2. If a tree ϑ has k < q_n nodes, then N_n(ϑ) = 0 and P_{n−1}(ϑ) = 0.

Lemma 3. For any irrational number ω ∈ [0, 1):
\[ \sum_{n=0}^{\infty} \frac{\log q_n}{q_n} \le D_2, \tag{2.13} \]
for a constant D_2; here q_n are the denominators of the convergents of ω.

Lemma 4. Given a momentum ν such that
\[ \frac{1}{768\, q_{n+1}} \le \|\omega\nu\| \le \frac{1}{8\, q_n}, \tag{2.14} \]
then one can have χ_{n′}(||ων||) ≠ 0 only for n′ such that n − 8 ≤ n′ ≤ n + 8.
Proof of Lemma 1. If {q_n} are the denominators of the convergents of ω, then (see e.g. [17, Ch. 1, §3]):
\[ \frac{1}{2 q_{n+1}} < \|\omega q_n\| < \frac{1}{q_{n+1}}, \tag{2.15} \]
and:
\[ \forall\, |\nu| < q_{n+1},\ |\nu| \ne q_n: \qquad \|\omega\nu\| > \|\omega q_n\|. \tag{2.16} \]
To prove 1 note that if ν = 0 nothing has to be proved: so we assume ν ≠ 0. If |ν| < q_n, by (2.16) and (2.15), ||ων|| ≥ ||ωq_{n−1}|| > 1/(2q_n), so that ||ων|| < 1/(4q_n) implies |ν| ≥ q_n, proving the first assertion of Lemma 1. To prove 2, again if ν = 0 nothing has to be proved (and s = 0): so we assume ν ≠ 0, and proceed by reductio ad absurdum. If 0 < ν < q_{n+1}/4 and there does not exist any s ∈ Z such that ν = s q_n, then one has ν = m q_n + r, with 0 < r < q_n and m < q_{n+1}/(4q_n); then, by (2.15), ||ωmq_n|| ≤ m||ωq_n|| < m/q_{n+1} < 1/(4q_n), and, by (2.16), ||ωr|| ≥ ||ωq_{n−1}|| > 1/(2q_n), as r ≠ 0; so ||ων|| ≥ ||ωr|| − ||ωmq_n|| > 1/(4q_n). The case 0 > ν > −q_{n+1}/4 is identical as ||·|| is even.

Proof of Lemma 2. If k < q_n, then for any ℓ ∈ ϑ one has |ν_ℓ| ≤ k < q_n, so that, by (2.15) and (2.16), ||ων_ℓ|| ≥ ||ωq_{n−1}|| > 1/(2q_n), hence n_ℓ < n and so N_{n′}(ϑ) = 0 ∀n′ ≥ n. If there are no lines on scale ≥ n, it is impossible to form a cluster on scale n − 1 (which is different from the whole tree), a fortiori a resonance.

Proof of Lemma 3. The denominators of the convergents {q_n} of ω satisfy q_0 = 1, q_1 ≥ 1 and q_n ≥ 2q_{n−2} for any n ≥ 2. So we can write:
\[ \sum_{n=0}^{\infty} \frac{\log q_n}{q_n} = \sum_{n=0}^{\infty} \frac{\log q_{2n}}{q_{2n}} + \sum_{n=0}^{\infty} \frac{\log q_{2n+1}}{q_{2n+1}}; \tag{2.17} \]
using the fact that, for x ≥ e, x^{−1} log x is decreasing, we obtain easily:
\[ \sum_{n=0}^{\infty} \frac{\log q_n}{q_n} \le 3 \max_{x \ge 1} \frac{\log x}{x} + 2 \log 2 \sum_{k=2}^{\infty} \frac{k}{2^k} = 3\,(e^{-1} + \log 2) \equiv D_2, \tag{2.18} \]
which also gives an explicit value for the constant D2 .
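Lemma 1 and the constant D_2 of (2.18) can be checked on a concrete example. The sketch below is illustrative only: the partial quotients are an arbitrary choice made here, and the irrational ω is replaced by its last listed convergent (a rational stand-in) so that all comparisons are exact; the indices n are kept at least two below the number of partial quotients, where (2.15) and (2.16) still hold for the stand-in. All function names are hypothetical.

```python
import math
from fractions import Fraction

def convergents(partial_quotients):
    """Convergents (p_n, q_n), n = 0, 1, ..., of [0; a_1, a_2, ...]."""
    p_prev, q_prev, p, q = 1, 0, 0, 1     # (p_{-1}, q_{-1}) = (1, 0), (p_0, q_0) = (0, 1)
    out = [(p, q)]
    for a in partial_quotients:
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
        out.append((p, q))
    return out

def dist(x: Fraction) -> Fraction:
    """Exact distance of a rational x from the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)

def davie_lemma_holds(partial_quotients, n_max, nu_max):
    """Brute-force check of both assertions of Lemma 1 for n <= n_max, |nu| <= nu_max."""
    if n_max > len(partial_quotients) - 2:
        raise ValueError("n_max too close to the truncation depth")
    conv = convergents(partial_quotients)
    omega = Fraction(*conv[-1])           # rational stand-in for omega (assumption)
    qs = [q for _, q in conv]
    for n in range(n_max + 1):
        for nu in range(-nu_max, nu_max + 1):
            if dist(omega * nu) <= Fraction(1, 4 * qs[n]):
                first = nu == 0 or abs(nu) >= qs[n]
                second = 4 * abs(nu) >= qs[n + 1] or nu % qs[n] == 0
                if not (first and second):
                    return False
    return True

D2 = 3.0 * (math.exp(-1.0) + math.log(2.0))   # explicit constant from (2.18)

def lemma3_partial_sum(qs):
    """Partial sum of log(q_n)/q_n over the listed denominators."""
    return sum(math.log(q) / q for q in qs)

PARTIAL_QUOTIENTS = [3, 1, 12, 2, 25, 1, 4, 1, 1, 7, 2, 3]   # arbitrary illustrative choice
QS = [q for _, q in convergents(PARTIAL_QUOTIENTS)]
```

With these data the denominators start 1, 3, 4, 51, 106, 2701, so the second assertion of the lemma is non-trivial (for example at n = 4, where q_5 > 4q_4).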
Proof of Lemma 4. Simply use that q_{n+1} ≥ q_n and q_{n+2} ≥ 2q_n for all n ≥ 0, to deduce that 1/(48q_{n+9}) < 1/(768q_{n+1}) and 1/(96q_{n−8}) > 1/(8q_n).

The following “counting” lemma is the main result stated in this section, and it can be considered an adaptation and extension of Lemma 2.3 in [4]. We postpone its proof to Sect. 5.

Lemma 5. Given a tree ϑ, let M_n(ϑ) = N_n(ϑ) + P_n(ϑ). Then:
\[ M_n(\vartheta) \le \frac{k}{q_n} + \frac{8k}{q_{n+1}} + N^R_n(\vartheta), \tag{2.19} \]
where k is the order of ϑ.
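Therefore we can rewrite the bound (2.12) on the tree value as:
\[ |\mathrm{Val}(\vartheta)| \le D_1^k \prod_{n=0}^{\infty} \bigl( 768\, q_{n+1} \bigr)^{2 (M_n(\vartheta) - P_n(\vartheta))} \le D_1^k \prod_{n=0}^{\infty} \bigl( 768\, q_{n+1} \bigr)^{2 (k/q_n + 8k/q_{n+1} + N^R_n(\vartheta) - P_n(\vartheta))}. \tag{2.20} \]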
Note that at this point it would be very easy to prove the lower bound in (1.9) for the semistandard map and, by simple modifications of the same scheme, for the Siegel problem, since in these cases no resonances appear. On the contrary, in the more difficult case of the standard map we lack, for the moment, a control on the number N^R_n(ϑ) of resonances in ϑ with resonance-scale n.

In Sects. 3 and 4 we shall see how to improve the bound on the sum over the trees of fixed order and total momentum, in order to prove the theorem stated in Sect. 1. We postpone the proofs to the forthcoming sections, limiting ourselves here to a heuristic discussion in order to give an idea of the structure of the proof.

We perform a suitable resummation (described in Sects. 3 and 4) whose consequence is that, for each resonance V, it is as if one of the external lines on scale n^R_V contributed (768 q_{n_V+1})² instead of (768 q_{n^R_V+1})². To obtain such a result, we shall perform on trees transformations which will lead to the introduction of new trees: so we extend T_{ν,k} to a larger set T*_{ν,k}. However we shall prove that the value of each single tree in T*_{ν,k} still admits the bound (2.20) (even if, unlike the values of the trees in T_{ν,k}, it fails to satisfy the same bound with 768 replaced with 96) and the number of elements in T*_{ν,k} is bounded by a constant to the power k (i.e. no bad counting factors, like factorials, appear). Then we obtain, for the sum of the resummed trees, a bound of the form (2.20) with:
\[ \prod_{n=0}^{\infty} \bigl( 768\, q_{n+1} \bigr)^{2 N^R_n(\vartheta)} \]
replaced with:
\[ D_3^k \prod_{n=0}^{\infty} \bigl( 768\, q_{n+1} \bigr)^{2 P_n(\vartheta)}, \]
for some constant D_3. By using that the number of trees in T*_{ν,k} will be shown to be bounded by a constant to the power k, we obtain, for some constants D_4, D_5:
\[ |u^{(k)}(\alpha)| \le \sum_{|\nu| \le k} \Bigl| \sum_{\vartheta \in T_{\nu,k}} \mathrm{Val}(\vartheta) \Bigr| \le \sum_{|\nu| \le k} \sum_{\vartheta \in T^*_{\nu,k}} |\mathrm{Val}(\vartheta)| \le D_4^k \prod_{n=0}^{\infty} \bigl( 768\, q_{n+1} \bigr)^{2k/q_n + 16k/q_{n+1}} \le D_5^k \exp\Biggl[ 2k \sum_{n=0}^{\infty} \Bigl( \frac{\log q_{n+1}}{q_n} + \frac{8 \log q_{n+1}}{q_{n+1}} \Bigr) \Biggr], \tag{2.21} \]
which, by making use of Lemma 3, gives:
\[ \log \rho(\omega) + 2 B_1(\omega) \ge -16 D_2 - \log D_5. \tag{2.22} \]
By making rigorous the above discussion in Sects. 3 and 4, we shall complete the proof of the theorem, since the bound from above was already proved in [4].

3. Renormalization of Resonances: Set-up and the First Step

Given a tree ϑ, let us consider maximal resonances, i.e. resonances not contained in any larger resonance; let us call them first generation resonances. Inside the first generation resonances let us consider the “next maximal” resonances, i.e. the resonances not contained in any larger resonance except first generation resonances, and let us call them second generation resonances. We can define in this way jth generation resonances, for j ≥ 2, as resonances which are maximal within (j − 1)th generation resonances. Let 𝐕 be the set of all resonances of a tree ϑ, and 𝐕_j the set of all resonances of jth generation, with j = 1, …, G, for some integer G, depending on ϑ.

Given a tree ϑ and a resonance V ∈ 𝐕_j with m_V entering lines, define V_0 as the set of nodes and lines internal to V and outside any resonances contained in V. Let L_V = {ℓ_1, …, ℓ_{m_V}} be the set of entering lines of V; we define L^R_V as the subset of the lines in L_V which enter some resonances of higher generation contained inside V and L^0_V = L_V \ L^R_V as the subset of lines in L_V which enter nodes in V_0. For any line ℓ_m ∈ L^R_V, let V(ℓ_m) be the minimal resonance containing the node which the line ℓ_m enters (i.e. the highest generation resonance containing such a node) and V_0(ℓ_m) the set of nodes and lines internal to V(ℓ_m) and outside resonances contained in V(ℓ_m). Define:
\[ \tilde{\mathbf{V}}(V) = \{ \tilde V \subset V : \tilde V = V(\ell_m) \text{ for some } \ell_m \in L^R_V \}. \tag{3.1} \]
Call m_{V_0} the number of lines in L^0_V. The number of lines in L^R_V entering the same resonance Ṽ ∈ 𝐕̃(V) is not arbitrary: it is always 1, as is shown by the following lemma.

Lemma 6. For j ≥ 1, given a resonance W ∈ 𝐕_{j+1} contained inside a resonance V ∈ 𝐕_j, only one among the lines entering W can also enter V.

Proof. The case m_W = 1 is obvious, so we assume m_W ≥ 2. One has n^R_W ≤ n_V, otherwise V would be a cluster on scale < n^R_W, so that all the lines external to W would be also external to V and V = W, while we assumed that W ⊂ V. Then if a line ℓ enters both V and W, one must have n_ℓ > n^R_W. But, by items 2 and 2 in the definition of resonance, all lines entering W have the same scale n^R_W except at most one.

We define the resonance family F_V(ϑ) of V ∈ 𝐕 in ϑ as the set of trees obtained from ϑ by the action of a group of transformations P_V on ϑ, generated by the following operations:

1. Detach the line ℓ_1, then consider all trees obtained by reattaching it to any node internal to V_0(ℓ_1) if ℓ_1 ∈ L^R_V and to any node in V_0 if ℓ_1 ∈ L^0_V; for each tree so obtained, do the same operations with the line ℓ_2 (i.e. detach ℓ_2 and reattach it to any node internal to V_0(ℓ_2) if ℓ_2 ∈ L^R_V and to any node in V_0 if ℓ_2 ∈ L^0_V), and so forth for each line entering the resonance.
2. In a given tree, each node u ∈ V will have m_u entering lines, of which s_u are inside V and r_u = m_u − s_u are outside V (i.e. are entering lines of V); then we can apply to the set of lines entering u a transformation in the group obtained as the quotient of the group of permutations of the m_u lines entering u by the groups of permutations of the s_u internal entering lines and of permutations of the r_u entering lines outside V; in this way for each node u ∈ V a number of trees equal to:
\[ \binom{m_u}{s_u} = \frac{m_u!}{s_u!\, r_u!} \]
is obtained.

3. Change sign simultaneously to all the mode labels of the nodes internal to V.

We shall call renormalization transformations (of type 1, 2, 3) the operations described above.

Remark 6. Note that in all such transformations the scales are not changed (by definition) and the set of resonances 𝐕 remains the same (by construction). On the contrary the momenta flowing through the lines can change (because of the shift of the lines entering resonances) and in particular one can have for some lines ℓ, χ_{n_ℓ}(||ων_ℓ||) = 0, if ν_ℓ is the modified momentum flowing through ℓ.

Remark 7. The definition of resonance families is aimed at grouping together the trees between which one will look for compensations, but in doing so one has to avoid overcountings. In fact, to each tree ϑ we associate a value Val(ϑ) according to (2.9); when applying the transformations of the group P_V on the tree ϑ, the same tree ϑ′ can be obtained, in general, in several ways; however, it has to be counted once. This means that P_V, as a group, defines an equivalence class, and only inequivalent elements obtained through the transformations defining P_V have to be retained.

Let us call F_{𝐕_1}(ϑ) the family obtained by the composition of all transformations defining the resonance families F_{V_1}(ϑ), V_1 ∈ 𝐕_1. For any tree ϑ_1 ∈ F_{𝐕_1}(ϑ), let V_2 be a resonance in 𝐕_2 and let us define the resonance family F_{V_2}(ϑ_1) of V_2 in ϑ_1 as the set of trees obtained from ϑ_1 by the action of the group of transformations P_{V_2}. The composition of all transformations defining the resonance families F_{V_2}(ϑ_1), for all ϑ_1 ∈ F_{𝐕_1}(ϑ) and all V_2 ∈ 𝐕_2, gives a family that we shall denote by F_{𝐕_2}(ϑ). We continue by considering resonances of 3rd generation, and so on until the Gth generation resonances are reached.

At the end we shall have a family F(ϑ) of trees obtained by the composition of all transformations of the groups P_V, V ∈ 𝐕, defined recursively through the application of the renormalization transformations corresponding to resonances V ∈ 𝐕_j to all trees belonging to the family F_{𝐕_{j−1}}(ϑ).

Remark 8. Given a tree ϑ ∈ T_{ν,k} and a family F(ϑ), when considering another tree ϑ′ ∈ F(ϑ) with nonvanishing value Val(ϑ′), the same family F(ϑ′) = F(ϑ) is obtained (by construction). Note however that F(ϑ) can contain also trees with vanishing values, as they can have lines ℓ such that χ_{n_ℓ}(||ων_ℓ||) = 0 (see Remark 6). Define also N_F(ϑ) as the number of trees in F(ϑ) whose value is not vanishing; of course N_F(ϑ) ≤ |F(ϑ)|, if |F(ϑ)| is the number of elements in F(ϑ).
Write:
\[ \sum_{\vartheta \in T_{\nu,k}} \mathrm{Val}(\vartheta) = \sum_{\vartheta \in T_{\nu,k}} \frac{1}{N_F(\vartheta)} \sum_{\vartheta' \in F(\vartheta)} \mathrm{Val}(\vartheta') = \sum_{\vartheta \in T^*_{\nu,k}} \frac{1}{|F(\vartheta)|} \sum_{\vartheta' \in F(\vartheta)} \mathrm{Val}(\vartheta'), \tag{3.2} \]
where the factors N_F(ϑ) and |F(ϑ)| have been introduced in order to avoid overcountings (see Remark 8) and the last sum implicitly defines the set T*_{ν,k}: so T*_{ν,k} is the set of inequivalent trees in ∪_{ϑ∈T_{ν,k}} F(ϑ).

If a tree ϑ ∈ T*_{ν,k}, then ϑ ∈ F(ϑ_0) for some tree ϑ_0 ∈ T_{ν,k}; however one has to bear in mind that ϑ, unlike ϑ_0, could vanish.

Given a tree ϑ ∈ T*_{ν,k}, if V is a first generation resonance, we define its resonance factor 𝒱_V(ϑ) as its contribution to the value of the tree ϑ, namely:
\[ \mathcal{V}_V(\vartheta) = \Biggl[ \prod_{u \in V} \frac{\nu_u^{m_u+1}}{m_u!} \Biggr] \Biggl[ \prod_{\ell \in V} g_{n_\ell}(\nu_\ell) \Biggr], \tag{3.3} \]
which of course depends on the subset of ϑ outside the resonance V only through the momenta of the entering lines of V. Given a node u ∈ V, let us denote with E_u the set of lines entering V such that they end in nodes preceding u. For future notational convenience, we rewrite (3.3) as:
\[ \mathcal{V}_V(\vartheta) = U_V(\vartheta)\, L_V(\vartheta), \qquad U_V(\vartheta) = \prod_{u \in V} \frac{\nu_u^{m_u+1}}{m_u!}, \qquad L_V(\vartheta) = \prod_{\ell \in V} g_{n_\ell}(\nu_\ell). \tag{3.4} \]
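In the following, we shall consider the quantities ων, ν ∈ Z, modulo 1, and shall continue to use the symbol ων to denote the representative of the equivalence class within the interval (−1/2, 1/2]. For any node u contained in a resonance V, we shall write:
\[ \nu_{\ell_u} = \nu^0_{\ell_u} + \sum_{\ell' \in E_u} \nu_{\ell'}, \qquad \nu^0_{\ell_u} = \sum_{\substack{w \in V \\ w \preceq u}} \nu_w, \tag{3.5} \]
where the set E_u was defined after (3.3). We shall consider the resonance factor (3.3) as a function of the quantities μ_1 = ων_{ℓ_1}, …, μ_{m_V} = ων_{ℓ_{m_V}}, where ν_{ℓ_1}, …, ν_{ℓ_{m_V}} are the momenta flowing through the lines ℓ_1, …, ℓ_{m_V} entering V. More precisely, we let:
\[ \mathcal{V}_V(\vartheta) \equiv \mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}, \ldots, \omega\nu_{\ell_{m_V}}), \tag{3.6} \]
and we write:
\[ \mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}, \ldots, \omega\nu_{\ell_{m_V}}) = \mathcal{L}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}, \ldots, \omega\nu_{\ell_{m_V}}) + \mathcal{R}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}, \ldots, \omega\nu_{\ell_{m_V}}), \tag{3.7} \]
where:
\[ \mathcal{L}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}, \ldots, \omega\nu_{\ell_{m_V}}) = \mathcal{V}_V(\vartheta; 0, \ldots, 0) + \sum_{m=1}^{m_V} \omega\nu_{\ell_m} \frac{\partial}{\partial\mu_m} \mathcal{V}_V(\vartheta; 0, \ldots, 0) \tag{3.8} \]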
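is the localized part of the resonance factor, or localized resonance factor, while:
\[ \mathcal{R}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}, \ldots, \omega\nu_{\ell_{m_V}}) = \sum_{m,m'=1}^{m_V} \omega\nu_{\ell_m}\, \omega\nu_{\ell_{m'}} \int_0^1 dt\, (1-t)\, \frac{\partial^2}{\partial\mu_m \partial\mu_{m'}} \mathcal{V}_V(\vartheta; t\omega\nu_{\ell_1}, \ldots, t\omega\nu_{\ell_{m_V}}) \tag{3.9} \]
is the renormalized part of the resonance factor, or renormalized resonance factor. In (3.7) ℒ is called the localization operator and ℛ = 1 − ℒ is called the renormalization operator. Using the notations (3.4), we can write:
\[ \mathcal{L}\mathcal{V}_V(\vartheta) = U_V(\vartheta)\, \mathcal{L} L_V(\vartheta), \qquad \mathcal{R}\mathcal{V}_V(\vartheta) = U_V(\vartheta)\, \mathcal{R} L_V(\vartheta), \tag{3.10} \]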
as only the factors in L_V(ϑ) depend on the momenta flowing through the lines entering the resonance V.

Remark 9. Note that in the localized part (3.8) the momentum ν_ℓ flowing through any line ℓ internal to V is changed into ν^0_ℓ (see (3.5)).

Then we perform the renormalization transformations in P_V described above. By Remark 9, for all trees obtained by applying the group P_V the contribution to the localized resonance factor arising from the L_V(ϑ) term in (3.4) is the same, i.e.:
\[ \mathcal{L} L_V(\vartheta) = \mathcal{L} L_V(\vartheta'), \qquad \forall \vartheta' \in F_V(\vartheta), \tag{3.11} \]
so that we can consider:
\[ \sum_{\vartheta' \in F_V(\vartheta)} \mathcal{L}\mathcal{V}_V(\vartheta'). \tag{3.12} \]
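The splitting (3.7)–(3.9) is Taylor's formula with integral remainder in the variables μ_1, …, μ_{m_V}. A minimal numerical sketch of this identity follows; it is not the construction used in the proofs. The two-line "resonance factor", the base points 0.3 and 0.2, the finite-difference derivatives and the Simpson quadrature are all assumptions introduced here for illustration.

```python
import math

def g(x: float) -> float:
    """Toy propagator 1 / (2 (cos 2 pi x - 1)); x must stay away from the integers."""
    return 1.0 / (2.0 * (math.cos(2.0 * math.pi * x) - 1.0))

def resonance_factor(mu):
    """Hypothetical factor with two internal lines, as a function of mu = (mu_1, mu_2)."""
    return g(0.3 + mu[0]) * g(0.2 + mu[0] + mu[1])

def partial(f, point, i, h=1e-4):
    """Central finite-difference approximation of the i-th first derivative."""
    up, dn = list(point), list(point)
    up[i] += h
    dn[i] -= h
    return (f(up) - f(dn)) / (2.0 * h)

def second_partial(f, point, i, j, h=1e-4):
    """Nested central differences for the (i, j) second derivative."""
    return partial(lambda p: partial(f, p, j, h), point, i, h)

def simpson(func, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3.0

def localized(f, mu):
    """Zeroth- and first-order Taylor terms at mu = 0, as in (3.8)."""
    zero = [0.0] * len(mu)
    return f(zero) + sum(mu[m] * partial(f, zero, m) for m in range(len(mu)))

def renormalized(f, mu):
    """Integral remainder of (3.9), evaluated numerically."""
    total = 0.0
    for i in range(len(mu)):
        for j in range(len(mu)):
            integrand = lambda t, i=i, j=j: (1.0 - t) * second_partial(
                f, [t * m for m in mu], i, j)
            total += mu[i] * mu[j] * simpson(integrand, 0.0, 1.0)
    return total
```

For small μ, for example `mu = [0.01, -0.02]`, the sum `localized(resonance_factor, mu) + renormalized(resonance_factor, mu)` reproduces `resonance_factor(mu)` up to discretization error, and the renormalized part is quadratically small in μ, which is the gain exploited in the bounds.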
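The sum over all the trees in the resonance family F_V(ϑ) of the localized resonance factors produces zero, so that only the renormalized part has to be taken into account. The proof of this assertion is similar to the proof of the analogous statement in [1], and it is given in Sect. 6 as a particular case of the proof of the more general statement in Lemma 8 below.

Then only the second order terms have to be taken into account in (3.7). This leads to the following expression for the renormalized resonance factor:
\[ \mathcal{R}\mathcal{V}_V(\vartheta) = U_V(\vartheta) \sum_{m,m'=1}^{m_V} \omega\nu_{\ell_m}\, \omega\nu_{\ell_{m'}} \Biggl[ \sum_{\substack{\ell^1_V, \ell^2_V \in V \\ \ell^1_V \ne \ell^2_V}} \Bigl( \frac{\partial}{\partial\mu_m} g_{n_{\ell^1_V}}(\nu_{\ell^1_V}) \Bigr) \Bigl( \frac{\partial}{\partial\mu_{m'}} g_{n_{\ell^2_V}}(\nu_{\ell^2_V}) \Bigr) \prod_{\substack{\ell \in V \\ \ell \ne \ell^1_V, \ell^2_V}} g_{n_\ell}(\nu_\ell) + \sum_{\ell_V \in V} \Bigl( \frac{\partial}{\partial\mu_m} \frac{\partial}{\partial\mu_{m'}} g_{n_{\ell_V}}(\nu_{\ell_V}) \Bigr) \prod_{\substack{\ell \in V \\ \ell \ne \ell_V}} g_{n_\ell}(\nu_\ell) \Biggr], \tag{3.13} \]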
from the very definition of the renormalized resonance factor (3.9), by noting that the two derivatives in (3.9) act either on two distinct propagators (the sum with ℓ^1_V ≠ ℓ^2_V in (3.13)) or on the same propagator (the sum with only one line ℓ_V in (3.13)).

Note that it can happen that ϑ ∈ F_V(ϑ_0), for some tree ϑ_0 ∈ T_{ν,k}, i.e. for some tree ϑ_0 with nonvanishing value, while 𝒱_V(ϑ) = 0 (correspondingly there does not exist any tree in T_{ν,k} of that shape associated with the given choice of mode and scale labels). The tree ϑ is obtained from ϑ_0 through a transformation of P_V, so that there is a correspondence between the lines of ϑ_0 and the lines of ϑ: we shall say that the lines are conjugate. The tree ϑ inherits the scale labels of the tree ϑ_0, i.e. the lines in ϑ have the same scales as the conjugate lines of ϑ_0. So it can happen that in ϑ_0 some line internal to V has a scale n_ℓ and a momentum ν̃_ℓ such that χ_{n_ℓ}(||ων̃_ℓ||) ≠ 0, while the momentum ν_ℓ of the line ℓ seen as a line of ϑ (i.e. of the line of ϑ conjugate to the line ℓ of ϑ_0) is such that χ_{n_ℓ}(||ων_ℓ||) = 0 (see Remark 8). This means that for such a line (2.8) does not hold. Nevertheless, as anticipated in Remark 6, one finds that the momentum ν_ℓ cannot change “too much” with respect to ν̃_ℓ; more precisely:
\[ \frac{1}{768\, q_{n_\ell+1}} \le \|\omega\nu_\ell\| \le \frac{1}{24\, q_{n_\ell}}, \tag{3.14} \]
as we shall prove, using the following result.

Lemma 7. Given a tree ϑ_0 ∈ T_{ν,k} and a resonance V, let ϑ ∈ T*_{ν,k} be a tree obtained by the action of the group P_V, i.e. ϑ ∈ F_V(ϑ_0). If ||ων_{ℓ_m}|| ≤ 1/(8q_{n^R_V}) for any entering line ℓ_m of V, m = 1, …, m_V, then, for any line ℓ ∈ V, one has
\[ \bigl| \|\omega\nu_\ell\| - \|\omega\tilde\nu_\ell\| \bigr| \le \frac{1}{4 q_{n^R_V}}, \qquad \|\omega\nu_\ell\| \ge \frac{1}{4 q_{n^R_V}}, \qquad \|\omega\tilde\nu_\ell\| \ge \frac{1}{4 q_{n^R_V}}, \tag{3.15} \]
if ν_ℓ and ν̃_ℓ are the momenta flowing through ℓ in ϑ and ϑ_0, respectively.

Proof. As V is a resonance, then for each line ℓ ∈ V one has |ν^0_ℓ| ≤ k_V < q_{n^R_V} (see item 2 in the definition of resonance), so that:
\[ \|\omega\nu^0_\ell\| \ge \|\omega q_{n^R_V - 1}\| > \frac{1}{2 q_{n^R_V}}, \tag{3.16} \]
by (2.15) and (2.16). On the other hand:
\[ \|\omega\nu_\ell - \omega\nu^0_\ell\| \le \sum_{m=1}^{m_V} \|\omega\nu_{\ell_m}\|, \tag{3.17} \]
if ν_{ℓ_1}, …, ν_{ℓ_{m_V}} are the momenta flowing through the lines ℓ_1, …, ℓ_{m_V} entering V. By hypothesis:
\[ \|\omega\nu_{\ell_m}\| \le \frac{1}{8 q_{n^R_V}}, \qquad \forall m = 1, \ldots, m_V. \tag{3.18} \]
If m_V ≥ 2 then one must have q_{n^R_V+1} > 4q_{n^R_V} (see item 2 in the definition of resonance). In such a case if there is an entering line (say ℓ_1) which is the root line of a tree of order ≥ q_{n^R_V+1}/4, then all the other lines are the root lines of subtrees of orders k_2, …, k_{m_V}
such that k_0 ≡ k_2 + … + k_{m_V} < q_{n^R_V+1}/8 (see item 2 in the definition of resonance). Moreover, for each m = 2, …, m_V, k_m ≥ q_{n^R_V}, otherwise the line ℓ_m would not be on scale ≥ n^R_V. By Lemma 1, ν_{ℓ_m} = s_m q_{n^R_V} for all m = 2, …, m_V, with s_m ∈ Z, and:
\[ |s_2| + \ldots + |s_{m_V}| \le \frac{k_0}{q_{n^R_V}} \le \frac{q_{n^R_V+1}}{8 q_{n^R_V}}, \tag{3.19} \]
so that:
\[ \sum_{m=1}^{m_V} \|\omega\nu_{\ell_m}\| \le \frac{1}{8 q_{n^R_V}} + \sum_{m=2}^{m_V} |s_m|\, \|\omega q_{n^R_V}\| \le \frac{1}{8 q_{n^R_V}} + \frac{1}{8 q_{n^R_V}} = \frac{1}{4 q_{n^R_V}}, \tag{3.20} \]
where use was made of (2.15). Therefore, when replacing ϑ_0 with ϑ, (3.15) follows.

If there is no entering line of V which is the root line of a tree of order ≥ q_{n^R_V+1}/4 and the tree having as root line the exiting line of V is of order k < q_{n^R_V+1}/4 (see item 2 in the definition of resonance), then:
\[ \sum_{m=1}^{m_V} |s_m|\, q_{n^R_V} \le k_1 + \ldots + k_{m_V} \equiv k - k_V < k \le \frac{q_{n^R_V+1}}{4}, \tag{3.21} \]
so that:
\[ \sum_{m=1}^{m_V} \|\omega\nu_{\ell_m}\| \le \sum_{m=1}^{m_V} |s_m|\, \|\omega q_{n^R_V}\| \le \frac{q_{n^R_V+1}}{4 q_{n^R_V}} \cdot \frac{1}{q_{n^R_V+1}} = \frac{1}{4 q_{n^R_V}}, \tag{3.22} \]
which implies again (3.15). If m_V = 1, then (3.15) follows immediately from (3.17) and (3.18).

We come back to the proof of (3.14). As the entering lines of V satisfy (2.8), hence (2.11), Lemma 7 applies. Note that inside V in ϑ_0 (hence also in ϑ, see Remark 6) only lines on scale n_ℓ such that 1/(48q_{n_ℓ}) > 1/(4q_{n^R_V}) are possible, by the second inequality in (3.15) and the definition of scale (see (2.8)). Then, given a line ℓ internal to V on scale n_ℓ, one has:
\[ \|\omega\nu_\ell\| \le \frac{1}{48 q_{n_\ell}} + \frac{1}{4 q_{n^R_V}} \le \frac{1}{48 q_{n_\ell}} + \frac{1}{48 q_{n_\ell}} = \frac{1}{24 q_{n_\ell}}. \tag{3.23} \]
Likewise, if 1/(96q_{n_ℓ+1}) > 2/q_{n^R_V}, one has:
\[ \|\omega\nu_\ell\| \ge \frac{1}{96 q_{n_\ell+1}} - \frac{1}{4 q_{n^R_V}} \ge \frac{1}{96 q_{n_\ell+1}} - \frac{1}{768 q_{n_\ell+1}} = \frac{1}{96 q_{n_\ell+1}} \Bigl( 1 - \frac{1}{8} \Bigr), \tag{3.24} \]
while, if 1/(96q_{n_ℓ+1}) < 2/q_{n^R_V}, one has:
\[ \|\omega\nu_\ell\| \ge \frac{1}{4 q_{n^R_V}} \ge \frac{1}{768 q_{n_\ell+1}}, \tag{3.25} \]
by the third inequality in (3.15). Then (3.14) follows: so in particular the momentum ν_ℓ of the line ℓ ∈ ϑ still fulfills (2.11).
Note that (3.13) and (2.11) imply the following bound for the renormalized resonance factor of a first generation resonance:
\[ |\mathcal{R}\mathcal{V}_V(\vartheta)| \le D_6 D_7^{k_V} \Biggl[ \sum_{m,m'=1}^{m_V} \|\omega\nu_{\ell_m}\|\, \|\omega\nu_{\ell_{m'}}\| \Biggr] \bigl( 768\, q_{n_V+1} \bigr)^2 \prod_{\ell \in V} \bigl( 768\, q_{n_\ell+1} \bigr)^2, \tag{3.26} \]
(for some constants D_6 and D_7), where the last product (times $^{−k}) represents a bound on the resonance factor (3.3). The proof of such an assertion again is as in [1] (see the proof of the Corollary in [1, §3]), and follows immediately by noting that for any line ℓ ∈ V one has n_ℓ ≥ n_V. The only difference with respect to [1] is that now the derivatives can act also on the compact support functions: they were just missing in [1]; it is nevertheless straightforward to see that:
\[ \Bigl| \frac{\partial^p}{\partial\mu^p} \chi_n(\|\omega\nu_\ell\|) \Bigr| \le D_8 \bigl( 768\, q_{n+1} \bigr)^p, \tag{3.27} \]
with p = 1, 2, for some constant D_8, so that:
\[ \Bigl| \frac{\partial^p}{\partial\mu^p} g_n(\nu_\ell) \Bigr| \le D_9 \bigl( 768\, q_{n+1} \bigr)^{p+2}, \tag{3.28} \]
with p = 0, 1, 2, for some constant D_9. For any tree in F_V(ϑ) the bound (2.11) holds, so that Lemma 5 applies (see Remark 15 in Sect. 5).

Note that the two factors ||ων_{ℓ_m}||, ||ων_{ℓ_{m′}}|| in (3.26) allow us to neglect the propagator corresponding to a line entering a resonance with resonance-scale n^R_V, provided such a propagator is replaced by a factor (768q_{n_V+1})², where n_V is the scale of the resonance as a cluster. Such a mechanism corresponds to the discussion leading to (2.21), as far as only the first generation resonances are considered. In general a tree will contain more resonances, and the resonances can be contained into each other. Then the above discussion has to be extended to cover the more general case, which will be done in the next section.

4. Renormalization of Resonances: The General Step

We proceed following strictly the techniques of [6] and [18].

Consider a tree ϑ ∈ T*_{ν,k} in (3.2). For each resonance V of any generation, let us define a pair of derived lines ℓ^1_V, ℓ^2_V internal to V (possibly coinciding) with the following “compatibility” condition: if V is inside some other resonance W, the set {ℓ^1_V, ℓ^2_V} must contain those lines of {ℓ^1_W, ℓ^2_W} which are inside V. Clearly there can be 0, 1 or 2 such lines, and correspondingly we shall say that the resonance V is of type 2 if none of its derived lines is a derived line for one of the resonances containing it, of type 1 if just one of its two derived lines is a derived line for one of the resonances containing it, and of type 0 if both derived lines are derived lines for some resonances W, W′ (possibly coinciding) containing V; we shall use a label z_V = 0, 1, 2 to take note of the type of the resonance V. One associates also to each resonance V a pair of
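entering lines $\ell^V_m, \ell^V_{m'}$ if $z_V = 2$ and a single line $\ell^V_m$ if $z_V = 1$, with $m, m' = 1, \ldots, m_V$. Moreover for each resonance we shall introduce an interpolation parameter $t_V$ and a measure $\pi_{z_V}(t_V)\, dt_V$ such that:
\[ \pi_z(t) = \begin{cases} (1 - t), & z = 2, \\ 1, & z = 1, \\ \delta(t - 1), & z = 0; \end{cases} \tag{4.1} \]
we shall denote with $\mathbf{t} = \{t_V\}_{V \in \mathbf{V}}$ the set of all interpolation parameters. The momentum flowing through a line $\ell_u$ internal to any resonance $V$ will be defined recursively as:
\[ \nu_{\ell_u}(\mathbf{t}) = \nu^0_{\ell_u} + t_V \sum_{\ell \in E_u} \nu_\ell(\mathbf{t}), \qquad \nu^0_{\ell_u} = \sum_{w \in V,\; w \preceq u} \nu_w ; \tag{4.2} \]
of course $\nu_{\ell_u}(\mathbf{t})$ will depend only on the interpolation parameters corresponding to the resonances containing the line $\ell_u$ (by construction). For any resonance $V$ the resonance factor is defined as:
\[ \mathcal{V}_V(\vartheta) = \mathcal{U}_V(\vartheta) \prod_{\ell \in V} g_{n_\ell}(\nu_\ell(\mathbf{t})) , \tag{4.3} \]
when $z_V = 2$, as:
\[ \mathcal{V}_V(\vartheta) = \mathcal{U}_V(\vartheta) \left[ \frac{\partial}{\partial\mu}\, g_{n_{\ell^1_V}}(\nu_{\ell^1_V}(\mathbf{t})) \right] \prod_{\ell \in V,\; \ell \ne \ell^1_V} g_{n_\ell}(\nu_\ell(\mathbf{t})) , \tag{4.4} \]
when $z_V = 1$ (and we have called $\ell^1_V$ the line in $\{\ell^1_V, \ell^2_V\}$ which belongs to the set $\{\ell^1_W, \ell^2_W\}$ for some resonance $W$ containing $V$), as:
\[ \mathcal{V}_V(\vartheta) = \mathcal{U}_V(\vartheta) \left[ \frac{\partial^2}{\partial\mu\, \partial\mu'}\, g_{n_{\ell^1_V}}(\nu_{\ell^1_V}(\mathbf{t})) \right] \prod_{\ell \in V,\; \ell \ne \ell^1_V} g_{n_\ell}(\nu_\ell(\mathbf{t})) , \tag{4.5} \]
when $z_V = 0$ and $\ell^1_V = \ell^2_V$, and as:
\[ \mathcal{V}_V(\vartheta) = \mathcal{U}_V(\vartheta) \left[ \frac{\partial}{\partial\mu}\, g_{n_{\ell^1_V}}(\nu_{\ell^1_V}(\mathbf{t})) \right] \left[ \frac{\partial}{\partial\mu'}\, g_{n_{\ell^2_V}}(\nu_{\ell^2_V}(\mathbf{t})) \right] \cdot \prod_{\ell \in V,\; \ell \ne \ell^1_V, \ell^2_V} g_{n_\ell}(\nu_\ell(\mathbf{t})) , \tag{4.6} \]
when $z_V = 0$ and $\ell^1_V \ne \ell^2_V$. In (4.4)–(4.6) one has $\mu = \omega\nu_{\ell^W_m}$ and $\mu' = \omega\nu_{\ell^{W'}_{m'}}$, for some lines $\ell^W_m$ and $\ell^{W'}_{m'}$ (possibly coinciding) entering, respectively, some resonances $W$ and $W'$ (possibly coinciding) containing $V$.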
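We define the renormalization operator according to the type of the resonance; namely, if $z_V = 2$, then:
\[ \mathcal{R}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}(\mathbf{t}), \ldots, \omega\nu_{\ell_{m_V}}(\mathbf{t})) = \int_0^1 dt_V\, (1 - t_V) \sum_{m, m' = 1}^{m_V} \omega\nu_{\ell_m}(\mathbf{t})\, \omega\nu_{\ell_{m'}}(\mathbf{t}) \cdot \frac{\partial^2}{\partial\mu_m\, \partial\mu_{m'}}\, \mathcal{V}_V(\vartheta; t_V\omega\nu_{\ell_1}(\mathbf{t}), \ldots, t_V\omega\nu_{\ell_{m_V}}(\mathbf{t})) ; \tag{4.7} \]
if $z_V = 1$, then:
\[ \mathcal{R}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}(\mathbf{t}), \ldots, \omega\nu_{\ell_{m_V}}(\mathbf{t})) = \int_0^1 dt_V \sum_{m = 1}^{m_V} \omega\nu_{\ell_m}(\mathbf{t}) \cdot \frac{\partial}{\partial\mu_m}\, \mathcal{V}_V(\vartheta; t_V\omega\nu_{\ell_1}(\mathbf{t}), \ldots, t_V\omega\nu_{\ell_{m_V}}(\mathbf{t})) ; \tag{4.8} \]
finally if $z_V = 0$, then:
\[ \mathcal{R}\mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}(\mathbf{t}), \ldots, \omega\nu_{\ell_{m_V}}(\mathbf{t})) = \mathcal{V}_V(\vartheta; \omega\nu_{\ell_1}(\mathbf{t}), \ldots, \omega\nu_{\ell_{m_V}}(\mathbf{t})) . \tag{4.9} \]
In all cases set $\mathcal{L} = 1 - \mathcal{R}$.

Remark 10. Note that $z_V$ equals the order of the renormalization performed on the resonance $V$.

Remark 11. If a resonance $V$ has a resonance-scale $n^R_V$, then there is a line $\ell^0_V$ on scale $n^R_V$ entering $V$ such that $\|\omega\nu_\ell\| \le \|\omega\nu_{\ell^0_V}\|$ for each $\ell$ entering $V$. If there is ambiguity, $\ell^0_V$ can be chosen arbitrarily. For any resonance $V$ one has a factor bounded by $\|\omega\nu_{\ell^0_V}\|^{z_V}$, from (4.7), (4.8) and (4.9) and by the definition of $\ell^0_V$.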
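The operators in (4.7) and (4.8) have the form of the integral Taylor remainders of second and first order, with the weights $(1 - t)$ and $1$ of (4.1). The following is a minimal numerical sketch of the one-variable analogue only; the test function `f` (here the exponential) is a hypothetical stand-in for a resonance factor, and the quadrature routine is an illustrative choice, not part of the paper.

```python
import math

def simpson(func, a, b, panels=2000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Hypothetical smooth test function and its first two derivatives.
f = math.exp
df = math.exp
d2f = math.exp

def remainder_first_order(mu):
    # One-variable analogue of (4.8): weight 1, first derivative at t*mu.
    return simpson(lambda t: mu * df(t * mu), 0.0, 1.0)

def remainder_second_order(mu):
    # One-variable analogue of (4.7): weight (1 - t), second derivative at t*mu.
    return simpson(lambda t: (1 - t) * mu * mu * d2f(t * mu), 0.0, 1.0)

mu = 0.3
# First order: f(mu) - f(0) equals the remainder with weight 1.
err1 = abs(remainder_first_order(mu) - (f(mu) - f(0.0)))
# Second order: f(mu) - f(0) - mu f'(0) equals the remainder with weight (1 - t).
err2 = abs(remainder_second_order(mu) - (f(mu) - f(0.0) - mu * df(0.0)))
```

In this reading the localized part $\mathcal{L} = 1 - \mathcal{R}$ is the Taylor polynomial that the remainder leaves out: the value at zero for first order, and the value plus the linear term for second order.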
To each line $\ell$ derived once one can associate the line $\ell_m(\ell)$ corresponding to the quantity $\mu_m = \omega\nu_{\ell_m(\ell)}$ with respect to which the propagator $g_{n_\ell}(\nu_\ell(\mathbf{t}))$ is derived. If the line $\ell$ is derived twice one associates to it the two lines $\ell_m(\ell)$ and $\ell_{m'}(\ell)$ such that $\mu_m = \omega\nu_{\ell_m(\ell)}$ and $\mu_{m'} = \omega\nu_{\ell_{m'}(\ell)}$ are the quantities with respect to which the propagator $g_{n_\ell}(\nu_\ell(\mathbf{t}))$ is derived.

Given a derived line $\ell$, let $V$ be the minimal resonance containing it. If the line $\ell$ is derived once, then let $W$ be the resonance for which $\ell_m(\ell)$ is an entering line; if instead $\ell$ is derived twice, let $W$, $W' \subseteq W$ be the resonances for which the lines $\ell_m(\ell)$, $\ell_{m'}(\ell)$ respectively are entering lines. In the first case, let $W_i$, $i = 0, \ldots, p$, be the resonances contained in $W$ and containing $V$, ordered naturally by inclusion:
\[ V = W_0 \subset W_1 \subset \cdots \subset W_p = W . \tag{4.10} \]
We shall call the set $\mathcal{W}(\ell) = \{W_0, \ldots, W_p\}$ the simple cloud of $\ell$. In the second case, let $W_i$, $i = 0, \ldots, p$, be the resonances contained in $W$ and containing $V$, ordered naturally by inclusion:
\[ V = W_0 \subset W_1 \subset \cdots \subset W_{p'} = W' \subset \cdots \subset W_p = W , \tag{4.11} \]
with $p' \le p$. We shall say that $\mathcal{W}_-(\ell) = \{W_0, \ldots, W_{p'}\}$ is the minor cloud of $\ell$ while $\mathcal{W}_+(\ell) = \{W_0, \ldots, W_p\}$ is the major cloud of $\ell$.

When the renormalization of a resonance $V \in \mathbf{V}_{j+1}$ is performed, a tree $\vartheta^V_0 \in \mathcal{F}_{V'}(\vartheta')$, with $V' \in \mathbf{V}_j$, $\vartheta' \in \mathcal{T}_{\nu,k}$, is replaced by the action of the group $\mathcal{P}_V$ with a new tree $\vartheta^V$. As this replacement is performed iteratively, one has the constraint that if $V_1$ and $V_2$ are two resonances such that $V_1$ is the minimal resonance containing $V_2$, then $\vartheta^{V_1} = \vartheta^{V_2}_0$. At the end, the original tree $\vartheta_0 \in \mathcal{T}_{\nu,k}$ is replaced with a tree $\vartheta \in \mathcal{T}^*_{\nu,k}$. On each resonance $V \in \mathbf{V}$ of $\vartheta$ the renormalization operator $\mathcal{R}$ acts: a tree whose resonance factors have all been renormalized will be called a renormalized (or resummed) tree. As the replacement corresponding to each resonance settles a conjugation between lines of $\vartheta^V_0$ and those of $\vartheta^V$, in the end for each line of $\vartheta$ there will be a conjugate line of $\vartheta_0$. Note that, as the transformations of the groups $\mathcal{P}_V$, $V \in \mathbf{V}$, do not modify the scales of $\vartheta_0$ (see Remark 6), the scales of the lines of $\vartheta$ are the same as those of the conjugate lines of the tree $\vartheta_0$, so that, in order to apply Lemma 5, we have only to verify that (2.11) holds for the lines in $\vartheta$: this will be done below (after Remark 12).

Now, we shall show that:
• the localized resonance factors can be neglected (in a sense that will become clear shortly, see Lemma 8 below),
• for any (renormalized) resonance we obtain a factor:
\[ \left( 768\, q_{n_V+1} \right)^2 \|\omega\nu_{\ell^0_V}\|^2 , \tag{4.12} \]
and
• the number of terms generated by the renormalization procedure is bounded by a constant to the power $k$, so that the bound (2.20) can be replaced by a bound which leads to (2.21), as anticipated in Sect. 2.

Note firstly that the localized part of the resonance factors can be dealt with as in Sect. 3, when only first generation resonances were considered. More formally, we have the following result, which is proved in Sect. 6.

Lemma 8. Given a tree $\vartheta$ and a resonance $V \in \vartheta$, the localized resonance factor $\mathcal{L}\mathcal{V}_V(\vartheta)$ gives zero when the values of the trees belonging to the same resonance family $\mathcal{F}_V(\vartheta)$ are summed together.

Define the map $\Phi$:
\[ \Phi : V \to \Phi_V = \left( z_V, \ell^1_V, \ell^2_V, \{\ell^V_m, \ell^V_{m'}\}^* \right), \qquad V \in \mathbf{V} , \tag{4.13} \]
which associates to each resonance $V \in \mathbf{V}$ the derived lines $\ell^1_V, \ell^2_V$ and the lines in the set $\{\ell^V_m, \ell^V_{m'}\}^*$ defined as:
\[ \{\ell^V_m, \ell^V_{m'}\}^* = \begin{cases} \{\ell^V_m, \ell^V_{m'}\}, & \text{if } z_V = 2, \\ \ell^V_m, & \text{if } z_V = 1, \\ \emptyset, & \text{if } z_V = 0, \end{cases} \tag{4.14} \]
where $m, m' = 1, \ldots, m_V$ and $\ell^V_1, \ldots, \ell^V_{m_V}$ are the lines entering $V$.
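Note that the map $\Phi$ gives a natural decomposition of the set $L$ of all lines of $\vartheta$ into $L = L_0 \cup L_1 \cup L_2$, where $L_j$ is the set of lines derived $j$ times. Then, by using Lemma 8, one has:
\[ \mathrm{Val}(\vartheta) = \sum_{\Phi_V} \left( \prod_{V \in \mathbf{V}} \int_0^1 \pi_{z_V}(t_V)\, dt_V \right) \left( \prod_{u \in \vartheta} \frac{\nu_u^{m_u+1}}{m_u!} \right) \left( \prod_{\ell \in L_0} g_{n_\ell}(\nu_\ell(\mathbf{t})) \right) \cdot \left( \prod_{\ell \in L_1} \omega\nu_{\ell_m(\ell)}\, \frac{\partial}{\partial\mu_m}\, g_{n_\ell}(\nu_\ell(\mathbf{t})) \right) \cdot \left( \prod_{\ell \in L_2} \omega\nu_{\ell_m(\ell)}\, \omega\nu_{\ell_{m'}(\ell)}\, \frac{\partial^2}{\partial\mu_m\, \partial\mu_{m'}}\, g_{n_\ell}(\nu_\ell(\mathbf{t})) \right) . \tag{4.15} \]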
Remark 12. Note that no propagator is derived more than twice: this fact is essential for our proof since we have no control on the growth rate of the derivatives of the compact support functions (2.6).

After the renormalization procedure has been applied for all resonances, one checks that the momenta of the lines in $\vartheta$ have changed, with respect to the original tree $\vartheta_0$ with nonvanishing value, in such a way that the bound (2.11) still holds.

Lemma 9. Consider a renormalized tree $\vartheta \in \mathcal{T}^*_{\nu,k}$, obtained from $\vartheta_0 \in \mathcal{T}_{\nu,k}$ by the iterative replacements, described above, that take place each time a resonance appears. Then the lines of $\vartheta$ inherit the scales of the conjugate lines of $\vartheta_0$ and Lemma 5 applies to $\vartheta$.

Proof. The first assertion follows by construction. The second one can be seen by induction on the generation of the resonances, by taking into account that for the first generation resonances the result has already been proved in Sect. 3. So let us suppose that (2.14) holds for resonances of any generation $j'$, with $j' < j$. Consider a line $\ell$ contained inside a resonance $V \in \mathbf{V}_j$ and outside all resonances in $\mathbf{V}_{j+1}$ contained inside $V$: then there will be $j$ resonances $V \equiv W_1 \subset \ldots \subset W_j$ containing $\ell$. Each renormalization produces a change in the momentum flowing through the line $\ell$, such that, if $\tilde\nu_\ell$ is the momentum flowing through the line $\ell$ in $\vartheta_0$ and $\nu_\ell$ is the momentum flowing through the conjugate line $\ell$ in $\vartheta$, then:
\[ \frac{1}{96\, q_{n_\ell+1}} - \sum_{i=1}^{j} \frac{1}{4\, q_{n^R_{W_i}}} \le \|\omega\tilde\nu_\ell\| \le \frac{1}{48\, q_{n_\ell}} + \sum_{i=1}^{j} \frac{1}{4\, q_{n^R_{W_i}}} . \tag{4.16} \]
Call $\vartheta^V_0 \in \mathcal{F}_{V_j}(\vartheta_0)$ the tree containing $V$ (which is not, in general, the original tree $\vartheta_0$) and $\vartheta^V$ the tree in $\mathcal{F}_V(\vartheta^V_0)$ obtained by the action of the group $\mathcal{P}_V$. As (2.11) is supposed to hold before renormalizing $V$, for all lines $\ell_m$, $m = 1, \ldots, m_V$, entering $V$ one has $\|\omega\nu_{\ell_m}\| < 1/8q_{n_{\ell_m}}$, so that, by reasoning as in Sect. 3 to prove Lemma 7, we can conclude that:
\[ \|\omega\nu_\ell\| - \|\omega\tilde\nu_\ell\| \le \frac{1}{4\, q_{n^R_V}} , \qquad \|\omega\nu_\ell\| \ge \frac{1}{4\, q_{n^R_V}} , \qquad \|\omega\tilde\nu_\ell\| \ge \frac{1}{4\, q_{n^R_V}} , \tag{4.17} \]
where $\nu_\ell$ is the momentum flowing through the line $\ell$ in $\vartheta^V$.
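In order that $\ell$ be contained inside $V = W_1$, one must have $1/48q_{n_\ell} \ge 1/4q_{n^R_V}$; moreover if $j_1 = \lfloor (j - 1)/2 \rfloor$ and $j_2 = \lfloor j/2 \rfloor$ (here $\lfloor \cdot \rfloor$ denotes the integer part), one has:
\[ q_{n^R_{W_1}} \le \frac{q_{n^R_{W_3}}}{2} \le \ldots \le \frac{q_{n^R_{W_{2j_1+1}}}}{2^{j_1}} , \qquad q_{n^R_{W_2}} \le \frac{q_{n^R_{W_4}}}{2} \le \ldots \le \frac{q_{n^R_{W_{2j_2}}}}{2^{j_2-1}} , \tag{4.18} \]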
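(simply use that $q_{n+1} \ge q_n$ and $q_{n+2} \ge 2q_n$ for any $n \ge 0$). Then one can write:
\[ \|\omega\nu_\ell\| \le \frac{1}{48\, q_{n_\ell}} + \frac{1}{4\, q_{n^R_V}} \left( \sum_{i=0}^{j_1} \frac{1}{2^i} + \sum_{i=0}^{j_2} \frac{1}{2^i} \right) \le \frac{1}{48\, q_{n_\ell}} + \frac{1}{q_{n^R_V}} ; \tag{4.19} \]
this is bounded from above by $5/48q_{n_\ell}$. Likewise one finds:
\[ \|\omega\nu_\ell\| \ge \frac{1}{96\, q_{n_\ell+1}} - \frac{1}{4\, q_{n^R_V}} \left( \sum_{i=0}^{j_1} \frac{1}{2^i} + \sum_{i=0}^{j_2} \frac{1}{2^i} \right) \ge \frac{1}{96\, q_{n_\ell+1}} - \frac{1}{q_{n^R_V}} ; \tag{4.20} \]
this is bounded from below by $1/192q_{n_\ell+1}$ if $1/96q_{n_\ell+1} > 2/q_{n^R_V}$ and by $1/768q_{n_\ell+1}$ if $1/96q_{n_\ell+1} \le 2/q_{n^R_V}$.

Then (2.14) holds also for any line $\ell$ contained inside $V_0$, if $V$ is a resonance in $\mathbf{V}_j$. As any subsequent renormalization is on resonances $V' \in \mathbf{V}_{j'}$, with $j' > j$, so that it does not shift the line $\ell$, the momentum $\nu_\ell$ changes no more, so that the inductive proof is complete.

Then in (4.15) we can bound, for $\ell \in L_1$:
\[ \begin{aligned} \left| \omega\nu_{\ell_m(\ell)}\, \frac{\partial}{\partial\mu_m}\, g_{n_\ell}(\nu_\ell(\mathbf{t})) \right| &\le D_9\, \|\omega\nu_{\ell_m(\ell)}\| \left( 768\, q_{n_\ell+1} \right)^3 \\ &\le D_9\, \|\omega\nu_{\ell_m(\ell)}\| \left( 768\, q_{n_\ell+1} \right)^3 \prod_{i=0}^{p-1} \frac{\|\omega\nu_{\ell^0_{W_i}}\|}{\|\omega\nu_{\ell^0_{W_i}}\|} \\ &\le D_9 \left( 768\, q_{n_\ell+1} \right)^2 \left( \prod_{i=0}^{p} \|\omega\nu_{\ell^0_{W_i}}\| \right) \left( \prod_{i=0}^{p} 768\, q_{n_{W_i}+1} \right) , \end{aligned} \tag{4.21} \]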
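where $\mathcal{W}(\ell) = \{W_0, \ldots, W_p\}$ is the simple cloud of $\ell$, and, for $\ell \in L_2$:
\[ \begin{aligned} \left| \omega\nu_{\ell_m(\ell)}\, \omega\nu_{\ell_{m'}(\ell)}\, \frac{\partial^2}{\partial\mu_m\, \partial\mu_{m'}}\, g_{n_\ell}(\nu_\ell(\mathbf{t})) \right| &\le D_9\, \|\omega\nu_{\ell_m(\ell)}\|\, \|\omega\nu_{\ell_{m'}(\ell)}\| \left( 768\, q_{n_\ell+1} \right)^4 \\ &\le D_9\, \|\omega\nu_{\ell_m(\ell)}\|\, \|\omega\nu_{\ell_{m'}(\ell)}\| \left( 768\, q_{n_\ell+1} \right)^4 \prod_{i=0}^{p-1} \frac{\|\omega\nu_{\ell^0_{W_i}}\|}{\|\omega\nu_{\ell^0_{W_i}}\|} \prod_{i'=0}^{p'-1} \frac{\|\omega\nu_{\ell^0_{W_{i'}}}\|}{\|\omega\nu_{\ell^0_{W_{i'}}}\|} \\ &\le D_9 \left( 768\, q_{n_\ell+1} \right)^2 \left( \prod_{i=0}^{p} \|\omega\nu_{\ell^0_{W_i}}\| \right) \left( \prod_{i'=0}^{p'} \|\omega\nu_{\ell^0_{W_{i'}}}\| \right) \left( \prod_{i=0}^{p} 768\, q_{n_{W_i}+1} \right) \left( \prod_{i'=0}^{p'} 768\, q_{n_{W_{i'}}+1} \right) , \end{aligned} \tag{4.22} \]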
where $\mathcal{W}_-(\ell) = \{W_0, \ldots, W_{p'}\}$ is the minor cloud and $\mathcal{W}_+(\ell) = \{W_0, \ldots, W_p\}$ is the major cloud of $\ell$. Note that (4.21) and (4.22) give a factor:
\[ \|\omega\nu_{\ell^0_{W_i}}\|\; 768\, q_{n_{W_i}+1} \tag{4.23} \]
for each resonance $W_i$ belonging to the (simple or minor or major) cloud of $\ell$. As each resonance belongs to the cloud of some line internal to it and each resonance contains two derived lines or one line derived twice (by definition of the renormalization procedure), one concludes that a factor equal to the square of (4.23) is obtained for each resonance. If we note that each underived propagator can be bounded again using (3.28) with $p = 0$, then we can summarize the bounds (4.21)–(4.22) by stating that, for each resummed tree $\vartheta$, we have:
• for each resonance $V$, a factor $\|\omega\nu_{\ell^0_V}\|^2$ times a factor $(768\, q_{n_V+1})^2$;
• for each line $\ell$, a factor $D_9 (768\, q_{n_\ell+1})^2$ (as the factors $(768\, q_{n_\ell+1})^p$, $p = 1, 2$, appearing when the corresponding propagator is derived, are taken into account by the factors associated to the resonances, see the item above).

Then the statement concerning (4.12) is proved.

Once the single summand in (4.15) has been bounded, one is left with the problem of bounding the number of terms on which the sum is performed. For each first generation resonance $V$ at most $m_V^2$ times $k_V^2$ summands are generated by the renormalization procedure (see (3.13)). In general, for each (renormalized) resonance, we have to sum over the entering lines $\{\ell^V_m, \ell^V_{m'}\}^*$ (corresponding to the quantities $\mu_m$, $m = 1, \ldots, m_V$, in terms of which the renormalized resonance factor is considered a function) and over the internal lines $\{\ell^1_V, \ell^2_V\}$ (corresponding to the factors on which the derivatives act).

An estimate on the number of summands generated by the renormalization procedure can be obtained by using the counting Lemma 6. If $V \in \mathbf{V}_j$, $j \ge 1$, let $N_V$ be the number of $(j + 1)$th generation resonances contained inside $V$. Recall that $V_0$ is the set of lines internal to $V$ which are outside any resonance contained in $V$, and denote by $k_{V_0}$ the number of elements in $V_0$. The renormalization procedure, for each renormalized resonance, generates a single or double sum over the entering lines whose momenta appear in the quantities $\omega\nu_{\ell_1}(\mathbf{t}), \ldots, \omega\nu_{\ell_{m_V}}(\mathbf{t})$, in terms of which the resonance factor is expanded: the sum is single if the localization is to first order and double if the localization is to second order (see (4.7) and (4.8)). Then we find, using Lemma 6, that in the renormalization procedure each sum over the entering lines of a first generation resonance $V$ is on $m_V$ terms, each sum over the entering lines of all second generation resonances $V' \subset V$ is on $k_{V_0} + N_V$ terms, each sum over the entering lines of all third generation resonances $V'' \subset V' \subset V$ is on
$k_{V'_0} + N_{V'}$, and so on; in general, each sum over the entering lines of all the resonances $V' \in \mathbf{V}_{j+1}$ contained inside a resonance $V \in \mathbf{V}_j$ is bounded by $k_{V_0} + N_V$. Once all generations of resonances have been considered, the overall number of summands generated by the renormalization procedure – by taking also into account the sum over the derived lines and using Remark 12 – is bounded by:
\[ \prod_{V \in \mathbf{V}_1} k_V^2 \prod_{V \in \mathbf{V}_1} m_V^2 \prod_{V \in \mathbf{V}} \left( k_{V_0} + N_V \right)^2 \le e^{6k} , \tag{4.24} \]
where $k$ is the order of the tree $\vartheta$. In fact, just use $x \le e^x$ and the obvious inequalities:
\[ \sum_{V \in \mathbf{V}_1} k_V \le k , \qquad \sum_{V \in \mathbf{V}_1} m_V + \sum_{V \in \mathbf{V}} k_{V_0} \le k , \qquad \sum_{V \in \mathbf{V}} N_V \le k . \tag{4.25} \]
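The step from (4.25) to (4.24) is elementary arithmetic, and a small numerical sketch may help fix the bookkeeping. The resonance data below are made-up integers chosen only to satisfy the three sums in (4.25); they do not come from an actual tree, and the function names are illustrative.

```python
import math

def summand_count(first_gen, all_res):
    """Left-hand side of (4.24).

    first_gen: list of (k_V, m_V) for the first generation resonances.
    all_res:   list of (k_V0, N_V) for every resonance.
    """
    total = 1
    for k_V, m_V in first_gen:
        total *= k_V ** 2 * m_V ** 2
    for k_V0, N_V in all_res:
        total *= (k_V0 + N_V) ** 2
    return total

def satisfies_425(k, first_gen, all_res):
    """Check the three inequalities of (4.25) for a tree of order k."""
    sum_kV = sum(k_V for k_V, _ in first_gen)
    sum_mV = sum(m_V for _, m_V in first_gen)
    sum_kV0 = sum(k_V0 for k_V0, _ in all_res)
    sum_NV = sum(N_V for _, N_V in all_res)
    return sum_kV <= k and sum_mV + sum_kV0 <= k and sum_NV <= k

# Hypothetical data, for illustration only.
k = 12
first_gen = [(5, 2), (4, 1)]
all_res = [(3, 1), (2, 0), (4, 0)]

count = summand_count(first_gen, all_res)
bound_holds = satisfies_425(k, first_gen, all_res) and count <= math.exp(6 * k)
```

Each squared factor $x^2$ is bounded by $e^{2x}$, so the three sums in (4.25) contribute at most $e^{2k}$ each, which is where the exponent $6k$ comes from.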
Then the statement after (4.12) is proved and the constant $D_3$ is $e^6$.

Finally one has to count the number of trees. The bound given in Sect. 2 is no longer valid, as a line $\ell \in \vartheta$ can have more than two scale labels. However Lemma 4 proves that to each line at most $D_{10} = 17$ scale labels can be associated, so that the number of trees in $\mathcal{T}^*_{\nu,k}$ is bounded by $2^{3k} D_{10}^k$. Then the bound (2.21) follows, with $D_4 = 2^3 D_3 D_9 D_{10}$: this concludes the proof of the theorem.

5. Proof of Lemma 5

We shall prove inductively on the order $k$ the following bounds:
\[ M_n(\vartheta) = 0 , \qquad \text{if } k < q_n , \tag{5.1a} \]
\[ M_n(\vartheta) \le \frac{2k}{q_n} - 1 + N^R_n(\vartheta) , \qquad \text{if } k \ge q_n , \tag{5.1b} \]
for any $n \ge 0$, and:
\[ M_n(\vartheta) = 0 , \qquad \text{if } k < q_n , \tag{5.2a} \]
\[ M_n(\vartheta) \le \frac{k}{q_n} + N^R_n(\vartheta) , \qquad \text{if } q_n \le k < \frac{q_{n+1}}{4} , \tag{5.2b} \]
\[ M_n(\vartheta) \le \frac{k}{q_n} + \frac{8k}{q_{n+1}} - 1 + N^R_n(\vartheta) , \qquad \text{if } k \ge \frac{q_{n+1}}{4} , \tag{5.2c} \]
for $q_{n+1} > 4q_n$, where $k$ is the order of the tree $\vartheta$. Note that (5.1a) and (5.2a) are simply a consequence of Lemma 2 of Sect. 2, so we have to prove only (5.1b), (5.2b) and (5.2c).

Remark 13. If we were only interested in proving the analyticity of the invariant curves for rotation numbers satisfying the Bryuno condition, then Eqs. (5.1) would be sufficient, as is easy to check by proceeding along the lines of Sects. 3 and 4. However, in order to find the optimal dependence of the radius of convergence $\rho(\omega)$ on $\omega$, which is the main focus of this paper, the more refined bounds (5.2) are necessary.
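The case distinction $q_{n+1} > 4q_n$ versus $q_{n+1} \le 4q_n$ refers to the denominators of the continued fraction convergents of $\omega$. As a minimal sketch, assuming the standard recursion $q_n = a_n q_{n-1} + q_{n-2}$ with $q_{-1} = 0$, $q_0 = 1$ (the indexing convention is an assumption and may differ from the one fixed in Sect. 2), the following computes the $q_n$ from a list of partial quotients, checks the two growth properties $q_{n+1} \ge q_n$ and $q_{n+2} \ge 2q_n$ used for (4.18), and lists the indices where the refined bounds (5.2) are the relevant ones.

```python
def denominators(partial_quotients):
    """Return [q_0, q_1, ...] for partial quotients [a_1, a_2, ...]."""
    q_prev, q_curr = 0, 1  # q_{-1} and q_0 (assumed convention)
    qs = [q_curr]
    for a in partial_quotients:
        q_prev, q_curr = q_curr, a * q_curr + q_prev
        qs.append(q_curr)
    return qs

def check_growth(qs):
    """Check q_{n+1} >= q_n and q_{n+2} >= 2 q_n along the list."""
    monotone = all(qs[n + 1] >= qs[n] for n in range(len(qs) - 1))
    doubling = all(qs[n + 2] >= 2 * qs[n] for n in range(len(qs) - 2))
    return monotone and doubling

def large_gap_indices(qs):
    """Indices n with q_{n+1} > 4 q_n, the regime of the bounds (5.2)."""
    return [n for n in range(len(qs) - 1) if qs[n + 1] > 4 * qs[n]]

# Golden-mean-like case: all partial quotients equal to 1 (Fibonacci numbers).
golden = denominators([1] * 10)
# An arbitrary illustrative list of partial quotients.
mixed = denominators([3, 1, 4, 1, 5, 9, 2, 6])
```

For the all-ones list no index has $q_{n+1} > 4q_n$, so only (5.1) would be needed there; large partial quotients are what force the refined bounds.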
Remark 14. The proof of (5.1) is easier, as is to be expected since it is a weaker result. After dealing with (5.2), the proof of (5.1) could be left as an exercise: we shall prove it explicitly for completeness, and as it could be read as an introduction to the more involved proof of (5.2).

We shall prove first (5.2) (case $q_{n+1} > 4q_n$) in cases [1]–[3] below, then (5.1) in items [4]–[6] below. We proceed by induction, and assuming that (5.1), (5.2) hold for any $k' < k$ we shall show that they hold for $k$ also; their validity for $k = 1$ being trivial, Lemma 5 is proved. Recall also Remark 5 in Sect. 2 about the way of counting the resonances on scale $n$ and the resonances with resonance-scale $n$.

• So consider first $q_{n+1} > 4q_n$.

[1] If the root line $\ell$ of $\vartheta$ has scale $\ne n$ and it is not the exiting line of a resonance on scale $n$, let us denote with $\ell_1, \ldots, \ell_m$ the lines entering the last node $u_0$ of $\vartheta$ and $\vartheta_1, \ldots, \vartheta_m$ the subtrees of $\vartheta$ whose root lines are those lines. By construction $M_n(\vartheta) = M_n(\vartheta_1) + \cdots + M_n(\vartheta_m)$ and $N^R_n(\vartheta) = N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m)$: the bounds (5.2) follow inductively by noting that for $k \ge q_{n+1}/4$ one has $8k/q_{n+1} - 1 \ge 1$.

[2] If the root line $\ell$ of $\vartheta$ has scale $n$, then we can reason as follows. Let us denote with $\ell_1, \ldots, \ell_m$ the lines on scale $\ge n$ which are the nearest to the root line of $\vartheta$ (that is, such that no other line along the paths connecting the lines $\ell_1, \ldots, \ell_m$ to the root line is on scale $\ge n$), and let $\vartheta_1, \ldots, \vartheta_m$ be the subtrees with root lines $\ell_1, \ldots, \ell_m$. If $m = 0$ then (5.2) follow immediately from Lemma 2 of Sect. 2; so let us suppose that $m \ge 1$. Then the lines $\ell_1, \ldots, \ell_m$ are the entering lines of a cluster $T$ (which can degenerate to a single point) having the root line of $\vartheta$ as the exiting line. As $\ell$ cannot be the exiting line of a resonance on scale $n$, one has:
\[ M_n(\vartheta) = 1 + M_n(\vartheta_1) + \cdots + M_n(\vartheta_m) . \tag{5.3} \]
In general $\tilde m$ subtrees among the $m$ considered have orders $\ge q_{n+1}/4$, with $0 \le \tilde m \le m$, while the remaining $m_0 = m - \tilde m$ have orders $< q_{n+1}/4$. Let us number the subtrees so that the first $\tilde m$ have orders $\ge q_{n+1}/4$. Let us distinguish the cases $k < q_{n+1}/4$ and $k \ge q_{n+1}/4$.

[2.1] If $k < q_{n+1}/4$, then $\tilde m = 0$ and each line entering $T$, by Lemma 1 of Sect. 2, has a momentum which is a multiple of $q_n$ and, by Lemma 2, has a scale label $n$. Therefore the momentum flowing through the root line is $\nu = \nu_T + s_0 q_n$, for some $s_0 \in \mathbb{Z}$, with:
\[ \nu_T \equiv \sum_{u \in T} \nu_u . \tag{5.4} \]
Moreover also the root line of $\vartheta$ has scale $n$, by assumption, and momentum $\nu = s q_n$ for some $s \in \mathbb{Z}$, by Lemma 1, so that $\nu_T = (s - s_0) q_n = s' q_n$, for some integer $s'$.

[2.1.1] If $s' \ne 0$, then $k_T \ge |\nu_T| \ge q_n$, giving:
\[ \begin{aligned} M_n(\vartheta) &\le 1 + \frac{k_1 + \cdots + k_m}{q_n} + N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) \\ &\le 1 + \frac{k - k_T}{q_n} + N^R_n(\vartheta) \le \frac{k}{q_n} + N^R_n(\vartheta) , \end{aligned} \tag{5.5} \]
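as $N^R_n(\vartheta) = N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m)$, and (5.2b) follows.

[2.1.2] If $s' = 0$ and $k_T \ge q_n$, one can reason as in case [2.1.1].

[2.1.3] If $s' = 0$ and $k_T < q_n$, then $T$ is a resonance with resonance-scale $n$, and:
\[ \begin{aligned} M_n(\vartheta) &\le 1 + \frac{k_1 + \cdots + k_m}{q_n} + N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) \\ &\le 1 + \frac{k}{q_n} + N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) \le \frac{k}{q_n} + N^R_n(\vartheta) , \end{aligned} \tag{5.6} \]
as $N^R_n(\vartheta) = 1 + N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m)$, and again (5.2b) follows.

[2.2] If $k \ge q_{n+1}/4$, assume again inductively the bounds (5.2). From (5.3) we have:
\[ M_n(\vartheta) \le 1 + \sum_{j=1}^{\tilde m} \left( \frac{k_j}{q_n} + \frac{8k_j}{q_{n+1}} - 1 \right) + \sum_{j=\tilde m+1}^{m} \frac{k_j}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) , \tag{5.7} \]
where $k_j$ is the order of the subtree $\vartheta_j$, $j = 1, \ldots, m$.

[2.2.1] If $\tilde m \ge 2$, then (5.2c) follows immediately.

[2.2.2] If $\tilde m = 0$, then (5.7) gives:
\[ \begin{aligned} M_n(\vartheta) &\le 1 + \frac{k_1 + \cdots + k_m}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) \le 1 + \frac{k}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) \\ &\le \frac{8k}{q_{n+1}} - 1 + \frac{k}{q_n} + N^R_n(\vartheta) , \end{aligned} \tag{5.8} \]
as we are considering $k$ such that $1 \le 8k/q_{n+1} - 1$ and $N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) = N^R_n(\vartheta)$.

[2.2.3] If $\tilde m = 1$, then (5.7) gives:
\[ \begin{aligned} M_n(\vartheta) &\le 1 + \left( \frac{k_1}{q_n} + \frac{8k_1}{q_{n+1}} - 1 \right) + \sum_{j=2}^{m} \frac{k_j}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) \\ &= \frac{k_1}{q_n} + \frac{8k_1}{q_{n+1}} + \frac{k_0}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) , \end{aligned} \tag{5.9} \]
where $k_0 = k_2 + \cdots + k_m$.

[2.2.3.1] If in such case $k_0 \ge q_{n+1}/8$, then we can bound in (5.9):
\[ \frac{k_1}{q_n} + \frac{8k_1}{q_{n+1}} + \frac{k_0}{q_n} \le \frac{k_1 + k_0}{q_n} + \frac{8(k_1 + k_0)}{q_{n+1}} - \frac{8k_0}{q_{n+1}} \le \frac{k}{q_n} + \frac{8k}{q_{n+1}} - 1 , \tag{5.10} \]
and $N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) = N^R_n(\vartheta)$, so that (5.2c) follows.

[2.2.3.2] If $k_0 < q_{n+1}/8$, then, denoting with $\nu$ and $\nu_1$ the momenta flowing through the root line $\ell$ of $\vartheta$ and the root line $\ell_1$ of $\vartheta_1$ respectively, one has:
\[ \|\omega(\nu - \nu_1)\| \le \|\omega\nu\| + \|\omega\nu_1\| \le \frac{1}{4q_n} , \tag{5.11} \]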
as both $\ell$ and $\ell_1$ are on scale $\ge n$ (see Remark 2 in Sect. 2 and use (2.14)). Then either $|\nu - \nu_1| \ge q_{n+1}/4$ or $\nu - \nu_1 = \tilde s q_n$, $\tilde s \in \mathbb{Z}$, by Lemma 1 of Sect. 2.

[2.2.3.2.1] If $|\nu - \nu_1| \ge q_{n+1}/4$, noting that $\nu = \nu_1 + \nu_T + \nu_0$, where $\nu_0 = s_0 q_n$ (with $s_0 \in \mathbb{Z}$ and $|\nu_0| \le k_0 < q_{n+1}/8$) is the sum of the momenta flowing through the root lines of the $m_0$ subtrees entering $T$ with orders $< q_{n+1}/4$ and $\nu_T$ is defined by (5.4), one has:
\[ k_T \ge |\nu_T| \ge |\nu - \nu_1| - |\nu_0| \ge \frac{q_{n+1}}{8} , \tag{5.12} \]
so that in (5.9) one can bound:
\[ \begin{aligned} \frac{k_1}{q_n} + \frac{8k_1}{q_{n+1}} + \frac{k_0}{q_n} &\le \frac{k - k_T}{q_n} + \frac{8(k - k_0 - k_T)}{q_{n+1}} \le \frac{k}{q_n} + \frac{8(k - k_T)}{q_{n+1}} \\ &\le \frac{k}{q_n} + \frac{8k}{q_{n+1}} - 1 , \end{aligned} \tag{5.13} \]
and $N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) = N^R_n(\vartheta)$, so that (5.2c) follows again.

[2.2.3.2.2] If $\nu - \nu_1 = \tilde s q_n$, $\tilde s \in \mathbb{Z}$, then:
\[ \nu_T = \nu - \nu_1 - \nu_0 = (\tilde s - s_0) q_n \equiv s q_n , \tag{5.14} \]
where $s \in \mathbb{Z}$.

[2.2.3.2.2.1] If $s \ne 0$, then $k_T \ge q_n$, so that in (5.3) one has:
\[ \frac{k_1}{q_n} + \frac{8k_1}{q_{n+1}} + \frac{k_0}{q_n} \le \frac{k - k_T}{q_n} + \frac{8k}{q_{n+1}} \le \frac{k}{q_n} - 1 + \frac{8k}{q_{n+1}} , \tag{5.15} \]
and $N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) = N^R_n(\vartheta)$, so implying (5.2c).

[2.2.3.2.2.2] If $s = 0$ (i.e. $\nu_T = 0$) and $k_T \ge q_n$, one can proceed as in case [2.2.3.2.2.1].

[2.2.3.2.2.3] If $s = 0$ and $k_T < q_n$, then $T$ is a resonance with resonance-scale $n$ (if $m_0 = 0$, then $\nu \equiv \nu_\ell = \nu_{\ell_1}$ so that $n_\ell \le n_{\ell_1} \le n_\ell + 1$, by construction and by item 2 in the definition of resonance), so that $N^R_n(\vartheta) = 1 + N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m)$, hence (5.9) gives:
\[ M_n(\vartheta) \le \frac{k}{q_n} + \frac{8k}{q_{n+1}} - 1 + 1 + \sum_{j=1}^{m} N^R_n(\vartheta_j) \le \frac{k}{q_n} + \frac{8k}{q_{n+1}} - 1 + N^R_n(\vartheta) , \tag{5.16} \]
and (5.2c) follows.

[3] If the root line $\ell$ of $\vartheta$ is on scale $> n$ and it is the exiting line of a resonance $V_n$ on scale $n$, let us denote with $\ell_1, \ldots, \ell_m$ the lines on scale $\ge n$ which are the nearest to the root line of $\vartheta$, and let $\vartheta_1, \ldots, \vartheta_m$ be the subtrees with root lines $\ell_1, \ldots, \ell_m$; some of these lines – at least one – are lines on scale $n$ inside $V_n$ (otherwise $V_n$ would not contain any line on scale $n$, so that it would not be a resonance on scale $n$ as we are supposing). Let $T$ be the cluster which the lines $\ell_1, \ldots, \ell_m$ enter; of course $T \subset V_n$ and $T$ can degenerate into a single point. As in case [2], let $\tilde m$ be the number of subtrees among the $m$ considered which have orders $\ge q_{n+1}/4$, and again let us number the subtrees in such a way that the ones with orders $\ge q_{n+1}/4$ are the first $\tilde m$.
Note that $k \ge q_{n+1}$ (otherwise $\ell$ could not be on scale $> n$) and:
\[ M_n(\vartheta) = 1 + M_n(\vartheta_1) + \cdots + M_n(\vartheta_m) , \tag{5.17} \]
as the root line $\ell$ contributes one unit to $P_n(\vartheta)$ and does not contribute to $N_n(\vartheta)$. Note also that if $T$ is a resonance then its resonance-scale is $n$.

[3.1] If $T$ is not a resonance, then:
\[ N^R_n(\vartheta) = N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) . \tag{5.18} \]
By induction (5.2) and (5.17) imply:
\[ M_n(\vartheta) \le 1 + \sum_{j=1}^{\tilde m} \left( \frac{k_j}{q_n} + \frac{8k_j}{q_{n+1}} - 1 \right) + \sum_{j=\tilde m+1}^{m} \frac{k_j}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) , \tag{5.19} \]
where $k_j$ are the orders of the subtrees $\vartheta_j$, $j = 1, \ldots, m$.

[3.1.1] If $\tilde m \ge 2$, then (5.2c) follows immediately.

[3.1.2] The case $\tilde m = 0$ is impossible because $T$ is contained inside a resonance $V_n$ on scale $n$, so that at least one of the subtrees entering $T$ must have order $\ge q_{n+1}/4$ – otherwise no line on scale $> n$ could enter $V_n$, see Lemma 2.

[3.1.3] If $\tilde m = 1$ let $k_0 = k_2 + \cdots + k_m$; then the case $k_0 \ge q_{n+1}/8$ can be dealt with as in case [2.2.3.1]; if $k_0 < q_{n+1}/8$, we deduce from Lemma 1 that either $|\nu - \nu_1| \ge q_{n+1}/4$ or $\nu - \nu_1 = \tilde s q_n$, using the same notations as in case [2.2.3.2]. The first case can be discussed as in case [2.2.3.2.1], while in the second case we find, as in case [2.2.3.2.2], that $\nu_T = \nu - \nu_1 - \nu_0 = s q_n$, with either $s \ne 0$ or $s = 0$ and $k_T \ge q_n$ (otherwise $T$ would be a resonance), so that the conclusions in cases [2.2.3.2.2.1] and [2.2.3.2.2.2] can be inherited in the present case and (5.2c) follows again.

[3.2] If $T$ is a resonance, then its resonance-scale is $n$, so that:
\[ N^R_n(\vartheta) = 1 + N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) . \tag{5.20} \]
The discussion goes on as in case [3.1] above, with the only difference that now, when $\tilde m = 1$ (and $k_T < q_n$, $k_0 < q_{n+1}/8$), the case $\nu_T = 0$ (i.e. $\nu_T = s q_n$, with $s = 0$) is the only one possible since $T$ is a resonance. In such a case:
\[ M_n(\vartheta) \le 1 + \left( \frac{k_1}{q_n} + \frac{8k_1}{q_{n+1}} - 1 \right) + \frac{k_0}{q_n} + \sum_{j=1}^{m} N^R_n(\vartheta_j) \le \frac{k}{q_n} + \frac{8k}{q_{n+1}} - 1 + N^R_n(\vartheta) , \tag{5.21} \]
and (5.2c) follows once more.

• Now we prove (5.1).

[4] If the root line $\ell$ of $\vartheta$ has scale $\ne n$ and it is not the exiting line of a resonance on scale $n$, let us denote with $\ell_1, \ldots, \ell_m$ the lines entering the last node $u_0$ of $\vartheta$. By construction $M_n(\vartheta) = M_n(\vartheta_1) + \cdots + M_n(\vartheta_m)$ and $N^R_n(\vartheta) = N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m)$ so that the bound (5.1) follows immediately by induction.

[5] If the root line $\ell$ of $\vartheta$ has scale $n$, using the same notations as in case [2], denote with $\ell_1, \ldots, \ell_m$ the lines on scale $\ge n$ which are nearest to the root line of $\vartheta$, and let $\vartheta_1$,
$\ldots, \vartheta_m$ be the subtrees with these lines as root lines. Then such lines are the entering lines of a cluster $T$ (which can degenerate into a single point) having the root line of $\vartheta$ as the exiting line. We have:
\[ M_n(\vartheta) = 1 + M_n(\vartheta_1) + \cdots + M_n(\vartheta_m) . \tag{5.22} \]
Assuming again inductively the bounds (5.1), from (5.22) we have:
\[ M_n(\vartheta) \le 1 + \sum_{j=1}^{m} \left( \frac{2k_j}{q_n} - 1 \right) + \sum_{j=1}^{m} N^R_n(\vartheta_j) , \tag{5.23} \]
where $k_j$ is the order of the subtree $\vartheta_j$, $j = 1, \ldots, m$.

[5.1] If $m \ge 2$, then (5.1b) follows immediately.

[5.2] If $m = 0$, then $M_n(\vartheta) = 1$. As $\ell$ is on scale $n$, the order $k$ of $\vartheta$ has to be $k \ge q_n$, so that:
\[ M_n(\vartheta) = 1 \le \frac{2k}{q_n} - 1 , \qquad N^R_n(\vartheta) = 0 , \tag{5.24} \]
and (5.1b) follows again.

[5.3] If $m = 1$, then (5.23) gives:
\[ M_n(\vartheta) \le 1 + \left( \frac{2k_1}{q_n} - 1 \right) + N^R_n(\vartheta_1) = \frac{2k_1}{q_n} + N^R_n(\vartheta_1) . \tag{5.25} \]
Denoting with $\nu$ and $\nu_1$ the momenta flowing, respectively, through the root line $\ell$ of $\vartheta$ and through the root line $\ell_1$ of $\vartheta_1$, we have:
\[ \|\omega(\nu - \nu_1)\| \le \|\omega\nu\| + \|\omega\nu_1\| \le \frac{1}{4q_n} , \tag{5.26} \]
as both $\ell$ and $\ell_1$ are on scale $\ge n$ (see Remark 2 and use (2.14)). Then, as $\nu_T = \nu - \nu_1$, either $|\nu_T| \ge q_n$ or $\nu_T = 0$.

[5.3.1] If $|\nu_T| \ge q_n$, then $k_T \ge |\nu_T| \ge q_n$ and $N^R_n(\vartheta_1) = N^R_n(\vartheta)$ (since $T$ is not a resonance), so that (5.25) gives:
\[ M_n(\vartheta) \le \frac{2k}{q_n} - \frac{2k_T}{q_n} + N^R_n(\vartheta_1) \le \frac{2k}{q_n} - 1 + N^R_n(\vartheta_1) = \frac{2k}{q_n} - 1 + N^R_n(\vartheta) , \tag{5.27} \]
and (5.1b) follows.

[5.3.2] If $\nu_T = 0$ and $k_T \ge q_n$, one can reason as in case [5.3.1].

[5.3.3] If $\nu_T = 0$ and $k_T < q_n$, then $\nu_1 = \nu$ and either $n_{\ell_1} = n$ or $n_{\ell_1} = n + 1$ (see item 2 in the definition of resonance): then $T$ is a resonance with resonance-scale $n$, so that $1 + N^R_n(\vartheta_1) = N^R_n(\vartheta)$, hence (5.25) gives:
\[ M_n(\vartheta) \le \frac{2k}{q_n} - 1 + 1 + N^R_n(\vartheta_1) \le \frac{2k}{q_n} - 1 + N^R_n(\vartheta) , \tag{5.28} \]
and (5.1) follows again.

[6] If the root line $\ell$ of $\vartheta$ is on scale $> n$ and it is the exiting line of a resonance $V_n$, as in case [3] above, denote with $\ell_1, \ldots, \ell_m$ the lines on scale $\ge n$ which are nearest to
the root line of $\vartheta$, and let $\vartheta_1, \ldots, \vartheta_m$ be the subtrees of $\vartheta$ of which these lines are root lines. Some of these lines – at least one – are lines on scale $n$ inside $V_n$. Let $T$ be the cluster which the lines $\ell_1, \ldots, \ell_m$ enter; of course $T \subset V_n$, and $T$ can degenerate into a single point. Note that as in case [3]:
\[ M_n(\vartheta) = 1 + M_n(\vartheta_1) + \cdots + M_n(\vartheta_m) , \tag{5.29} \]
as the root line $\ell$ contributes one unit to $P_n(\vartheta)$ and does not contribute to $N_n(\vartheta)$, and that if $T$ is a resonance then its resonance-scale is $n$.

[6.1] If $T$ is not a resonance, then:
\[ N^R_n(\vartheta) = N^R_n(\vartheta_1) + \cdots + N^R_n(\vartheta_m) . \tag{5.30} \]
By induction, (5.1) and (5.29) imply:
\[ M_n(\vartheta) \le 1 + \sum_{j=1}^{m} \left( \frac{2k_j}{q_n} - 1 \right) + \sum_{j=1}^{m} N^R_n(\vartheta_j) , \tag{5.31} \]
where $k_j$ are the orders of the subtrees $\vartheta_j$, $j = 1, \ldots, m$.

[6.1.1] If $m \ge 2$, then (5.1b) follows immediately.

[6.1.2] The case $m = 0$ is impossible (see case [3.1.2]).

[6.1.3] If $m = 1$ in (5.31), we have $\nu_T = \nu - \nu_1$, so that $|\nu_T| \ge q_n$ (as $\nu_T \ne 0$, otherwise $T$ would be a resonance). Then we can go on along the lines of case [5.3.1] in order to obtain (5.1b).

[6.2] If $T$ is a resonance, then its resonance-scale is $n$, so that:
\[ N^R_n(\vartheta) = 1 + N^R_n(\vartheta_1) , \tag{5.32} \]
and the discussion goes on as in case [6.1], with the only difference that now, for $m = 1$, the case $\nu_T = 0$ is the only one possible as $T$ is supposed to be a resonance. In such a case:
\[ M_n(\vartheta) \le 1 + \frac{2k}{q_n} - 1 + N^R_n(\vartheta_1) \le \frac{2k}{q_n} - 1 + N^R_n(\vartheta) , \tag{5.33} \]
implying again (5.1b).

• Finally, to deduce (2.19) from (5.1) and (5.2), simply note that, for $q_{n+1} \le 4q_n$, we have $2k/q_n \le 8k/q_{n+1}$; then Lemma 5 follows.

Remark 15. Note that the correspondence between momenta and scale labels has been used only through the inequality (2.11). As we have seen in Sect. 4 the renormalization procedure can shift the “original” momenta flowing through the lines by a bounded quantity which does not alter such an inequality. This allows us to apply Lemma 5 also to the renormalized trees, as was repeatedly claimed in the previous sections.
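The piecewise structure of (5.1) and (5.2) can be summarized in a short sketch. The function below returns the bound on $M_n(\vartheta) - N^R_n(\vartheta)$ as an exact rational, using (5.2) when $q_{n+1} > 4q_n$ and (5.1) otherwise; this way of combining the two sets of bounds, and the function name, are illustrative assumptions, since the precise form of (2.19) is stated in Sect. 2 and not reproduced here.

```python
from fractions import Fraction

def lemma5_bound(k, qn, qn1):
    """Bound on M_n - N_n^R for a tree of order k, given q_n and q_{n+1}."""
    if k < qn:
        return Fraction(0)                      # (5.1a) and (5.2a)
    if qn1 > 4 * qn:
        if 4 * k < qn1:
            return Fraction(k, qn)              # (5.2b)
        return Fraction(k, qn) + Fraction(8 * k, qn1) - 1   # (5.2c)
    return Fraction(2 * k, qn) - 1              # (5.1b)

def closing_inequality_holds(k, qn, qn1):
    """For q_{n+1} <= 4 q_n, check 2k/q_n <= 8k/q_{n+1}."""
    return Fraction(2 * k, qn) <= Fraction(8 * k, qn1)

# Check the closing remark of the proof on a small grid of admissible values.
closing_ok = all(
    closing_inequality_holds(k, qn, qn1)
    for qn in range(1, 12)
    for qn1 in range(qn, 4 * qn + 1)
    for k in range(1, 40)
)
```

The inequality in the last bullet is just $q_{n+1} \le 4q_n$ rearranged, which is why the weaker bounds (5.1) suffice whenever consecutive denominators are comparable.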
6. Proof of Lemma 8

As far as only the localized resonance factor is involved, the momenta flowing through the lines entering any resonance are set to zero, so that it does not matter if such momenta are interpolated or not (i.e. if they are of the form $\nu$ or $\nu(\mathbf{t})$). In particular, the case of first generation resonances (discussed in Sect. 3) is included in Lemma 8.

A basic property of the trees belonging to the resonance family $\mathcal{F}_V(\vartheta)$ is that the difference between their values is only in the resonance factor: for any tree $\vartheta' \in \mathcal{F}_V(\vartheta)$, we can write:
\[ \mathrm{Val}(\vartheta') = A(\vartheta)\, \mathcal{V}_V(\vartheta') , \tag{6.1} \]
for some factor $A(\vartheta)$ which is the same for all $\vartheta' \in \mathcal{F}_V(\vartheta)$. This simply follows from the fact that the transformations in $\mathcal{P}_V$ do not touch the part of the tree $\vartheta$ which is outside the resonance $V$. Therefore a cancellation between localized resonance factors yields a cancellation between tree values (in which the resonance factor has been localized of course).

By item 2 in the definition of resonance and by definition of $V_0$, one has:
\[ \sum_{u \in V_0} \nu_u = 0 ; \tag{6.2} \]
moreover, given an entering line $\ell_m$ of $V$, if $\ell_m \in L^R_V$ and $\tilde V_0 = V_0(\ell_m)$, then:
\[ \sum_{u \in \tilde V_0} \nu_u \equiv \sum_{u \in V_0(\ell_m)} \nu_u = 0 . \tag{6.3} \]
In general we can write, for any tree $\vartheta' \in \mathcal{F}_V(\vartheta)$,
\[ \mathcal{L}\mathcal{V}_V(\vartheta') = B(\vartheta')\, \mathcal{L}\mathcal{V}_{V_0}(\vartheta') \prod_{\ell \in L^R_V} \mathcal{L}\mathcal{V}_{V(\ell)}(\vartheta') , \tag{6.4} \]
where $\mathcal{V}_{V_0}(\vartheta')$ and $\mathcal{V}_{V(\ell)}(\vartheta')$ are defined as the resonance factor $\mathcal{V}_V(\vartheta')$, but with the product ranging only over nodes and lines internal to $V_0$ and $V(\ell)$, respectively, while $\mathcal{L}\mathcal{V}_{V_0}(\vartheta')$ and $\mathcal{L}\mathcal{V}_{V(\ell)}(\vartheta')$ are obtained from $\mathcal{V}_{V_0}(\vartheta')$ and $\mathcal{V}_{V(\ell)}(\vartheta')$, respectively, by replacing $\nu_\ell$ with $\nu^0_\ell$ in $V$, for all lines $\ell \in V$. In (6.4) $B(\vartheta')$ takes into account all other factors (if there are any), always evaluated with $\nu_\ell$ replaced with $\nu^0_\ell$, $\ell \in V$. Note that, as $A(\vartheta)$ in (6.1), also $B(\vartheta')$ is the same for all $\vartheta' \in \mathcal{F}_V(\vartheta)$, so that one can set $B(\vartheta') = B(\vartheta)$ and write:
\[ \mathrm{Val}(\vartheta') = A(\vartheta)\, \mathcal{V}_V(\vartheta') , \qquad \mathcal{L}\mathcal{V}_V(\vartheta') = B(\vartheta)\, \mathcal{L}\mathcal{V}_{V_0}(\vartheta') \prod_{\ell \in L^R_V} \mathcal{L}\mathcal{V}_{V(\ell)}(\vartheta') . \tag{6.5} \]

[1] If $z_V = 1$ the localized resonance factor is given by the resonance factor computed for $\mu_1 = \cdots = \mu_m = 0$. Summing the localized resonance factors corresponding to the trees belonging to $\mathcal{F}_V(\vartheta)$, we can group them into subfamilies of inequivalent trees whose contributions are different, as for each node $u \in V$ there is a factor:
\[ \frac{1}{m_u!} \binom{m_u}{s_u} = \frac{1}{s_u!}\, \frac{1}{r_u!} , \tag{6.6} \]
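as all terms which are obtained by permutations are summed together (this gives the binomial coefficient in the left hand side of the above equation), times a factor:
\[ \nu_u^{m_u+1} = \nu_u^{(s_u+1)+r_u} , \tag{6.7} \]
times a propagator $g_{n_{\ell_u}}(\nu^0_{\ell_u})$ (the last factor is missing if corresponding to the line exiting $V$; see definitions (4.3)–(4.6)). Then for $\mu_1 = \cdots = \mu_m = 0$ we can write:
\[ \begin{aligned} \sum_{\vartheta' \in \mathcal{F}_V(\vartheta)} \mathcal{L}\mathcal{V}_V(\vartheta') &= \sum_{\vartheta' \in \mathcal{F}_V(\vartheta)} \left[ \prod_{u \in V} \frac{\nu_u^{s_u+1}}{s_u!} \prod_{\ell \in V} g_{n_\ell}(\nu^0_\ell) \right] \cdot \left( \prod_{u \in V_0} \frac{\nu_u^{r_u}}{r_u!} \right) \left( \prod_{\ell \in L^R_V} \prod_{u \in V_0(\ell)} \frac{\nu_u^{r_u}}{r_u!} \right) \\ &= \left[ \prod_{u \in V} \frac{\nu_u^{s_u+1}}{s_u!} \prod_{\ell \in V} g_{n_\ell}(\nu^0_\ell) \right] \cdot \sum_{\vartheta' \in \mathcal{F}_V(\vartheta)} \left( \prod_{u \in V_0} \frac{\nu_u^{r_u}}{r_u!} \right) \left( \prod_{\ell \in L^R_V} \prod_{u \in V_0(\ell)} \frac{\nu_u^{r_u}}{r_u!} \right) , \end{aligned} \tag{6.8} \]
where we have used the fact that for $\mu_1 = \cdots = \mu_m = 0$ the factors in square brackets have the same value for all $\vartheta' \in \mathcal{F}_V(\vartheta)$ (see (3.11) and take into account what was observed at the beginning of this section). The last sum in (6.8) can be rewritten as:
\[ \begin{aligned} \sum_{\vartheta' \in \mathcal{F}_V(\vartheta)} \left( \prod_{u \in V_0} \frac{\nu_u^{r_u}}{r_u!} \right) \left( \prod_{\ell \in L^R_V} \prod_{u \in V_0(\ell)} \frac{\nu_u^{r_u}}{r_u!} \right) &= \left( \sum_{\substack{\{r_u \ge 0\}_{u \in V_0} \\ \sum_{u \in V_0} r_u = m_{V_0}}} \prod_{u \in V_0} \frac{\nu_u^{r_u}}{r_u!} \right) \prod_{\tilde V \in \tilde{\mathcal{V}}(V)} \left( \sum_{\substack{\{r_u \ge 0\}_{u \in \tilde V_0} \\ \sum_{u \in \tilde V_0} r_u = 1}} \prod_{u \in \tilde V_0} \frac{\nu_u^{r_u}}{r_u!} \right) \\ &= \frac{1}{m_{V_0}!} \left( \sum_{u \in V_0} \nu_u \right)^{m_{V_0}} \prod_{\tilde V \in \tilde{\mathcal{V}}(V)} \left( \sum_{u \in \tilde V_0} \nu_u \right) , \end{aligned} \tag{6.9} \]
which is zero by definition of resonance (see (6.2) and (6.3) above).

[2] If $z_V = 2$ the localized resonance factor, with respect to the previous case, contains also the first order terms (again computed in $\mu_1 = \cdots = \mu_m = 0$). The zeroth order contribution can be discussed as for the case $z_V = 1$, and the same result holds. Also the first order contribution vanishes, after summing over the trees $\vartheta' \in \mathcal{F}_V(\vartheta)$. To prove this we shall consider separately the cases $m_V \ge 2$ and $m_V = 1$.

In the first case, when the derivative $(\partial/\partial\mu_m)\mathcal{V}_V(\vartheta; 0, \ldots, 0)$ is considered, let us compare all the trees $\vartheta'$ in the subfamily of $\mathcal{F}_V(\vartheta)$ in which the line $\ell_m$ is kept fixed (call $\bar u$ the node which such a line enters), while all other lines are shifted (i.e. detached and reattached to all nodes inside the resonance). The difference with respect to the previous case, discussed above, is that the line with momentum $\nu_{\ell_m}$ can be chosen in $r_{\bar u}$ ways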
among the $r_{\bar u}$ lines which enter the node $\bar u \in V$ from outside $V$. This means that we can write:
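$$\frac{\nu_u^{m_u+1}}{m_u!}\binom{m_u}{s_u} = \frac{\nu_u^{(s_u+1)+r_u}}{s_u!\,r_u!} \tag{6.10}$$
for all nodes $u \neq \bar u$, and:
$$\frac{\nu_{\bar u}^{m_{\bar u}}}{m_{\bar u}!}\binom{m_{\bar u}}{s_{\bar u}}\, r_{\bar u} = \frac{\nu_{\bar u}^{(s_{\bar u}+1)+(r_{\bar u}-1)}}{s_{\bar u}!\,(r_{\bar u}-1)!} \tag{6.11}$$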
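for $\bar u$. Then we have an expression analogous to (6.8), with the only difference that the labels $\{r_u\}$ have to be replaced with labels $\{r'_u\}$, defined as:
$$r'_u = r_u - \delta_{u\bar u}, \qquad \forall u \ \text{either in } V_0 \text{ or in } \bigcup_{\tilde V\in\tilde{\mathcal V}(V)} \tilde V_0, \tag{6.12}$$
such that:
$$\sum_{u\in V_0} r'_u + \sum_{\tilde V\in\tilde{\mathcal V}(V)}\,\sum_{u\in\tilde V_0} r'_u = m_V - 1; \tag{6.13}$$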
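so the last sum in the second line of (6.8) has to be replaced by:
$$
\begin{aligned}
&\nu_{\bar u}\sum_{\vartheta'\in\mathcal F_V(\vartheta)}
\Big(\prod_{u\in V_0}\frac{\nu_u^{r'_u}}{r'_u!}\Big)
\Big(\prod_{\ell\in L_V^R}\,\prod_{u\in V_0(\ell)}\frac{\nu_u^{r'_u}}{r'_u!}\Big)\\
&\quad= \nu_{\bar u}\Bigg(\sum_{\substack{\{r'_u\ge 0\}_{u\in V_0}\\ \sum_{u\in V_0} r'_u = m^*_{V_0}}}\;\prod_{u\in V_0}\frac{\nu_u^{r'_u}}{r'_u!}\Bigg)
\prod_{\tilde V\in\tilde{\mathcal V}(V)}
\Bigg(\sum_{\substack{\{r'_u\ge 0\}_{u\in\tilde V_0}\\ \sum_{u\in\tilde V_0} r'_u = \zeta^*(\tilde V)}}\;\prod_{u\in\tilde V_0}\frac{\nu_u^{r'_u}}{r'_u!}\Bigg)\\
&\quad= \nu_{\bar u}\,\frac{1}{m^*_{V_0}!}\Big(\sum_{u\in V_0}\nu_u\Big)^{m^*_{V_0}}
\prod_{\tilde V\in\tilde{\mathcal V}(V)}\Big(\sum_{u\in\tilde V_0}\nu_u\Big)^{\zeta^*(\tilde V)},
\end{aligned}
\tag{6.14}
$$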
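where:
$$
m^*_{V_0} = \begin{cases} m_{V_0}, & \text{if } \bar u \notin V_0,\\ m_{V_0}-1, & \text{if } \bar u \in V_0,\end{cases}
\qquad
\zeta^*(\tilde V) = \begin{cases} 1, & \text{if } \bar u \notin \tilde V_0,\\ 0, & \text{if } \bar u \in \tilde V_0,\end{cases}
\tag{6.15}
$$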
so that we have again vanishing contributions (as $m_V \ge 2$). If, on the contrary, $m_V = 1$, the above reasoning does not apply, as there is only one entering line. However, the function $(\partial/\partial\mu_1)\mathcal V_V(\vartheta; 0)$ is an odd function: all the propagators are even in their arguments, so that the derived one$^{5}$ becomes odd, and the numerator contains an even number of $\nu_u$'s. Then, by reversing the signs of the labels $\nu_u$, $u \in V$, the numerator does not change, while the overall sign of the denominator changes, so that the sum over the first order contributions of the localized resonance factors of the two tree values being considered vanishes.$^{6}$

[3] Finally, if $z_V = 0$, the localization operator $\mathcal L$ gives zero when acting on the resonance factors, so that nothing has to be proved.

Communicated by Ya. G. Sinai

$^{5}$ If $z_V = 2$, then there is only one derived propagator, arising from the renormalization of the resonance $V$ itself.

$^{6}$ Note that the renormalization transformations of type 3 are explicitly used in order to implement the cancellation mechanism only in the case of a resonance $V$ with $z_V = 2$ and $m_V = 1$. In general not all the transformations are used for all resonances: in particular, when $z_V = 0$, we consider separately all terms generated by the action of the group $\mathcal P_V$, as there is no need of additional renormalizations.
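The last equalities in (6.9) and (6.14) rest on the multinomial theorem: the sum of $\prod_u \nu_u^{r_u}/r_u!$ over all non-negative labels with $\sum_u r_u = m$ equals $(\sum_u \nu_u)^m/m!$. The following is a minimal sketch that checks this identity numerically in exact rational arithmetic. The function names `constrained_sum` and `closed_form` and the sample mode labels are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def constrained_sum(nus, m):
    """Sum of prod_u nu_u**r_u / r_u! over all r_u >= 0 with sum_u r_u = m."""
    total = Fraction(0)
    # Brute-force enumeration of all label assignments (fine for small cases).
    for rs in product(range(m + 1), repeat=len(nus)):
        if sum(rs) != m:
            continue
        term = Fraction(1)
        for nu, r in zip(nus, rs):
            term *= Fraction(nu) ** r / factorial(r)
        total += term
    return total


def closed_form(nus, m):
    """Right-hand side of the multinomial identity: (sum_u nu_u)**m / m!."""
    return Fraction(sum(nus)) ** m / factorial(m)


if __name__ == "__main__":
    # Hypothetical mode labels; the second set sums to zero, as for the
    # nodes of a resonance.
    for nus, m in [([1, 2], 3), ([1, 2, -3], 2), ([1, -4, 3], 3)]:
        print(nus, m, constrained_sum(nus, m), closed_form(nus, m))
```

When the labels sum to zero and the exponent is positive, both sides vanish, which is the mechanism invoked after (6.9). For exponent zero the closed form equals 1, so at least one factor in (6.14) must carry a positive exponent for the product to vanish.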