
Author: Jonathan Wahl (Academic Editor)


DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 1

ALGEBRAIC ASPECTS OF INCREASING SUBSEQUENCES JINHO BAIK AND ERIC M. RAINS

Abstract

We present a number of results relating partial Cauchy-Littlewood sums, integrals over the compact classical groups, and increasing subsequences of permutations. These include: integral formulae for the distribution of the longest increasing subsequence of a random involution with constrained number of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as new proofs of old formulae; relations of these expressions to orthogonal polynomials on the unit circle; and explicit bases for invariant spaces of the classical groups, together with appropriate generalizations of the straightening algorithm.

Introduction

Consider the following two identities:

\[ \sum_{\lambda} s_\lambda(x_1, x_2, \dots)\, s_\lambda(y_1, y_2, \dots) = \prod_{i,j} (1 - x_i y_j)^{-1}, \tag{0.1} \]

\[ \lim_{l\to\infty} E_{U\in U(l)} \prod_i \det(1 - x_i U)^{-1} \prod_i \det(1 - y_i U^{-1})^{-1} = \prod_{i,j} (1 - x_i y_j)^{-1}. \tag{0.2} \]

The first of these is the well-known identity of Cauchy (see [29]). The second is a formal analogue of the Szegő limit theorem, equivalent to a theorem of [10]. Since the right-hand sides are the same, we also have a third identity:

\[ \sum_{\lambda} s_\lambda(x_1, x_2, \dots)\, s_\lambda(y_1, y_2, \dots) = \lim_{l\to\infty} E_{U\in U(l)} \prod_i \det(1 - x_i U)^{-1} \prod_i \det(1 - y_i U^{-1})^{-1}. \tag{0.3} \]
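As a sanity check on (0.1) and (0.3), one can specialize to a single nonzero variable on each side. The computation below is only a sketch under that assumption: $x = (x, 0, 0, \dots)$, $y = (y, 0, 0, \dots)$ with $|x|, |y| < 1$, and the group taken to be $U(1) = \{e^{i\theta}\}$. It uses the standard fact that a Schur function in one variable vanishes unless $\lambda = (n)$ has at most one part, in which case $s_{(n)}(x) = x^n$.

```latex
\begin{align*}
\sum_{\lambda} s_\lambda(x)\, s_\lambda(y)
  &= \sum_{n \ge 0} s_{(n)}(x)\, s_{(n)}(y)
   = \sum_{n \ge 0} x^n y^n
   = (1 - xy)^{-1}, \\
E_{U \in U(1)} (1 - xU)^{-1} (1 - yU^{-1})^{-1}
  &= \frac{1}{2\pi} \int_0^{2\pi} \sum_{m, n \ge 0} x^m y^n e^{i(m-n)\theta} \, d\theta
   = \sum_{n \ge 0} x^n y^n
   = (1 - xy)^{-1}.
\end{align*}
```

In this one-variable specialization the integral already agrees with the sum at $l = 1$, since every partition that contributes has at most one part.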

Our object of study in the present work is generalizations of these three identities. Our generalizations take two forms. One is to remove the limit l → ∞. As we see in Section 5, in order to preserve (0.3), we must then restrict the sum over partitions to involve only partitions with at most l parts. It is here that increasing subsequences appear; in order to rescue equations (0.1) and (0.2), we must replace the common right-hand side with a generating function counting objects (reducing to permutations in an appropriate limit) without long increasing subsequences. In the case of (0.1), the connection is via the Robinson-Schensted-Knuth correspondence, with its well-known connections to increasing subsequences. It turns out that there is also a direct connection for (0.2), in terms of the invariant theory of the unitary group. In particular, this gives a direct (and essentially elementary) proof of the known connection between unitary group integrals and increasing subsequences (see [32]), as well as of the new connections given here.

The other way in which we generalize these identities is to replace the unitary group U(l) by one of four other groups, including the orthogonal and symplectic groups. In terms of permutations, this corresponds to considering involutions (in two ways), signed permutations, and signed involutions, in addition to the original case of permutations; each of these conditions can be described as a particular type of symmetry condition. In each case, we obtain analogues of the finite l versions of equations (0.1), (0.2), and (0.3), together with increasing subsequence interpretations.

Received 23 February 2000. Revision received 26 September 2000. 2000 Mathematics Subject Classification. Primary 05E15. Baik's work supported in part by a Sloan Doctoral Foundation Fellowship.

Guide to main results

One of the classical models used to analyze increasing subsequences is the Poisson model: one generates a random subset of the unit square via a Poisson process; then one associates a permutation to this subset in a canonical way (the order of the y coordinates relative to the x coordinates). The five symmetry types correspond to the five subgroups of Z_2 × Z_2 (acting on the unit square via diagonal reflections); one insists that the random subset be preserved by the appropriate group.
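The Poisson model with no symmetry imposed can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' construction: the function names are hypothetical, the Poisson count is obtained by counting rate-1 exponential arrivals before time `lam`, and coordinates are assumed distinct (which holds with probability 1).

```python
import random
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible last value of an increasing run of length k+1
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def points_to_permutation(points):
    """Order of the y coordinates relative to the x coordinates (1-based ranks)."""
    by_x = sorted(points)  # sort along the x coordinate
    ys_sorted = sorted(y for _, y in by_x)
    rank = {y: r + 1 for r, y in enumerate(ys_sorted)}  # assumes distinct y values
    return [rank[y] for _, y in by_x]


def poisson_point_set(lam, rng):
    """Poisson(lam) many independent uniform points in the unit square."""
    n, t = 0, rng.expovariate(1.0)
    while t < lam:  # arrivals of a rate-1 Poisson process in [0, lam)
        n += 1
        t += rng.expovariate(1.0)
    return [(rng.random(), rng.random()) for _ in range(n)]


rng = random.Random(0)
pts = poisson_point_set(25.0, rng)
perm = points_to_permutation(pts)
print(len(perm), lis_length(perm))
```

Imposing one of the nontrivial symmetry types would amount to replacing each sampled point by its orbit under the chosen reflections before calling `points_to_permutation`.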
It turns out that each symmetry type is naturally associated with a certain compact Lie group, determined as a fixed subgroup via an appropriate action of Z_2 × Z_2 on the unitary group. Our first main result (Theorem 1.2) then states that the (exact) distribution of the (length of the) longest increasing subsequence of a random permutation of a given symmetry type is given by the moments of the trace of a random element of the corresponding Lie group. By the Schensted correspondence, each of the five cases of Theorem 1.2 can be viewed as expressing an integral over one of the five groups as an appropriate sum over partitions. Each such identity specializes an appropriate Schur function identity (see Theorem 5.2 and Corollary 5.3). For the three symmetry types with diagonal symmetry, this can be further generalized (essentially allowing points on the diagonal of symmetry); thus, for instance, if $f(\lambda)$ is the number of odd parts and if $\ell(\lambda)$ is the number of parts of $\lambda$, then (see Theorem 5.6)

\[ \sum_{\ell(\lambda) \le l} \alpha^{f(\lambda)} s_\lambda(x) = E_{U\in O(l)} \det(1 + \alpha U) \prod_j \det(1 - x_j U)^{-1}. \tag{0.4} \]

For a quite general class of specializations (corresponding to “super-Schur” or “hook Schur” functions), these Schur function sums have combinatorial interpretations in terms of increasing subsequences of multisets. That is, for each symmetry type and each specialization (subject to convergence conditions), we construct a random multiset and a notion of increasing subsequence such that (see Theorem 7.1) the normalized Schur function sum gives the distribution of the longest increasing subsequence. These random multiset models generalize K. Johansson’s 2-dimensional random growth model in [24]. Putting these together, we obtain a connection between integrals and increasing subsequences of multisets. In each case, the identity states that the dimension of a certain space of invariants is given by counting a certain collection of multisets without long increasing subsequences. For the three classical groups, we give direct proofs of this fact by producing an explicit basis of the invariants indexed by the appropriate multisets. Thus, for instance (see Theorem 8.2), the centralizer algebra of the nth tensor power representation of U (l) has basis given by permutations of length n with no increasing subsequence of length greater than l. More generally (see Theorem 8.4), the space of simultaneous multilinear invariants of a collection of symmetric and antisymmetric, covariant and contravariant tensors is explicitly indexed by multisets without long increasing subsequences (generalizing the classical straightening algorithm). Corresponding results hold for the orthogonal (see Theorems 8.5 and 8.6) and symplectic (see Theorems 8.7 and 8.8) groups. The remaining collection of results is of lesser interest combinatorially, but it is crucial to our asymptotic analysis in [3]. The key step in the analysis is to express the integrals of interest in terms of orthogonal polynomials on the unit circle. 
This is done for the classical groups in Theorem 2.3 (the remaining two groups reduce to the unitary group); for the unitary group the connection is immediate, while for the orthogonal group one must pass through orthogonal polynomials on [−1, 1]. We also give a number of results indicating how certain modifications to the integrand affect the integral. As a consequence, we find (see Corollary 4.3) that for each of the five natural Poisson models, the longest increasing subsequence distribution can be expressed in terms of the same family of orthogonal polynomials. In the sequel to this paper (see [3]), we determine the asymptotics of such polynomials via the Riemann-Hilbert method, and thus we obtain the limiting longest increasing subsequence distribution for each of the five Poisson models, as well as for the de-Poissonized versions (random symmetric permutations). The results are expressed in terms of the solution to the Painlevé II equation, thus connecting to random matrix theory. Further related


asymptotic work can be found in [2], [1], and [33].

Outline

Section 1 introduces the five symmetry types and their associated groups. In particular, we express the (exact) distribution of the longest increasing subsequence of a random permutation of a given symmetry type as an integral over the corresponding group. Section 2 expresses these integrals as determinants of Bessel functions related to orthogonal polynomials; this relation is given in Section 3. Section 4 describes the extension of the integrals to include the cases when fixed points are allowed. In order to prove the formulae of Section 4, Section 5 uses representation-theoretic arguments to deduce integral representations of partial Cauchy-Littlewood sums, at which point the theory of symmetric functions can be applied. Section 6 briefly discusses alternate proofs of those formulae based on intermediate Pfaffian forms. Section 7 uses a generalized Robinson-Schensted-Knuth correspondence from [5] to relate the partial Cauchy-Littlewood sums to increasing subsequences of certain distributions of random multisets. Finally, Section 8 shows that there is an intimate connection between increasing subsequences and invariants of the classical groups. Indeed, all of the integrals for which we give increasing subsequence interpretations also can be interpreted as dimensions of certain spaces of invariants. We give direct, elementary proofs of these identities by constructing bases of invariants explicitly indexed by multisets with restricted increasing subsequence length. In the process we obtain a generalization of the straightening algorithm of invariant theory, as well as analogues for the orthogonal and symplectic groups. We also discuss extensions to quantum groups and supergroups.

Notation

We refer the reader to [29] for notation and an introduction to symmetric functions. In other notation, if G is a compact group, we use

\[ E_{U\in G} f(U) \tag{0.5} \]

to denote the integral of f(U) with respect to the (normalized) Haar measure on G. In other words, this is the expected value of f evaluated at a uniform random element of G. When G is the orthogonal group, we occasionally need to consider the two components of G. Thus we write

\[ E_{U\in O^{\pm}(l)} f(U) \tag{0.6} \]



to denote the integral of f over the coset of O(l) of determinant ±1. In particular, we have 1 (0.7) EU ∈O ± (l) f (U ) = EU ∈O(l) (1 ± det(U )) f (U ). 2 1. Symmetrized increasing subsequence problems One of the standard models used in analyzing the usual increasing subsequence problem is defined as follows. We say that a collection of k points in the unit square is increasing if for any two points (x 1 , y1 ) and (x 2 , y2 ), either x 1 < x2 and y1 < y2 or x1 > x2 and y1 > y2 . One can ask, then, for the distribution of the size of the largest increasing subset of n points chosen uniformly and independently from the unit square. It is not too difficult to see that this is the same as the distribution of the longest increasing subsequence of a random permutation; indeed, we can associate a (uniformly distributed) permutation to a given collection of points by using the relative order of the y coordinates after sorting along the x coordinates. Also of interest is the Poissonized analogue, in which new points are occasionally added in such a way that the number of points at time λ is Poisson with parameter λ. One way to generalize this model is to impose a symmetry condition on the set of points. The square has eight symmetries; if we insist that the symmetry preserve increasing collections, we obtain a group H of four elements, generated by the reflections through the main diagonals. Thus there are five possible symmetry conditions we can impose (including the trivial condition), which we denote by the symbols , , , · , and , with associated groups H , H , H , H · , and H . (The symbol indicates the point/line(s) of reflection.) We also use the symbol ~ to denote an arbitrary choice of the five possibilities. Definition 1 We define ~ pnl

to be the probability that, if n points are chosen uniformly at random in the unit square, then the set $\Sigma$ consisting of the images of those points under H~ contains no increasing subset with more than l points. We also define a function

\[ Q_l^{\sim}(\lambda) = e^{-\lambda} \sum_{0 \le n} \frac{\lambda^n\, p_{nl}^{\sim}}{n!}. \tag{1.1} \]
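For the trivial symmetry type, the series (1.1) can be evaluated numerically. The sketch below rests on assumptions not spelled out at this point in the text: it uses the standard Robinson-Schensted fact that the number of permutations of n with no increasing subsequence longer than l equals the sum of squared standard tableau counts over partitions of n with at most l parts, it counts tableaux with the hook length formula, and it truncates the series at a hypothetical cutoff `n_max`. All function names are illustrative.

```python
from math import exp, factorial


def partitions(n, max_parts, max_part=None):
    """Yield partitions of n (non-increasing tuples) with at most max_parts parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest


def num_standard_tableaux(shape):
    """Hook length formula: n! divided by the product of all hook lengths."""
    if not shape:
        return 1
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]  # conjugate partition
    hooks = 1
    for i, r in enumerate(shape):
        for j in range(r):
            hooks *= (r - j - 1) + (cols[j] - i - 1) + 1  # arm + leg + 1
    return factorial(sum(shape)) // hooks


def f(n, l):
    """Permutations of n with no increasing subsequence longer than l (trivial symmetry)."""
    return sum(num_standard_tableaux(s) ** 2 for s in partitions(n, l))


def Q(l, lam, n_max=30):
    """Truncated series (1.1) with p_{nl} = f(n, l) / n!; n_max is an assumed cutoff."""
    total, weight = 0.0, 1.0  # weight = lam**n / n!
    for n in range(n_max + 1):
        total += weight * f(n, l) / factorial(n)
        weight *= lam / (n + 1)
    return exp(-lam) * total


print(f(4, 2), Q(2, 1.0))
```

The truncation error is bounded by the Poisson tail beyond `n_max`, so the cutoff should grow with `lam`.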

The function $Q_l^{\sim}(\lambda)$ corresponds to the natural Poisson model; $Q_l^{\sim}(\lambda)$ is the probability that the largest increasing subset at time λ has size at most l, and $1 - Q_l^{\sim}(\lambda)$ is the distribution function of the time at which an increasing subset of size l + 1 first appears. In the sequel, however, it turns out to be convenient to use a different time scale; we thus define

\[ P_l^{\sim}(t) = Q_l^{\sim}(a_{\sim} t^2/2), \tag{1.2} \]

where a = a = 1, a = a = 2, and a · = 4. (Here a~ is the number of Young tableaux associated to an element of Sn~ in the proof of Theorem 1.2.) If we map the set of points to a permutation, we obtain a permutation uniformly distributed from an appropriate ensemble. To be precise, define the involution ι ∈ Sn by x 7→ n + 1 − x. Then define an ensemble Sn~ for each symmetry type as follows: Sn = Sn ,

(1.3)

Sn = {π ∈ S2n : π = π −1 , π(x) 6= x},

(1.4)

−1

Sn = {π ∈ S2n : π = ιπ ι, π(x) 6= ι(x)}, Sn· = {π ∈ S2n : π = ιπι},

Sn = {π ∈ S4n : π = π

−1

, π = ιπ

−1

ι, π(x) 6= x, ι(x)}.

(1.5) (1.6) (1.7)

LEMMA 1.1 If a set $\Sigma$ is chosen as above with symmetry ~, then with probability 1, the associated permutation is well defined and is uniformly distributed from $S_n^{\sim}$.

This motivates the further definition

\[ f_{nl}^{\sim} = |S_n^{\sim}|\, p_{nl}^{\sim}. \tag{1.8} \]

That is, f nl~ is the number of elements of Sn~ with no increasing subsequence of length greater than l. It is straightforward to compute |Sn~ | for each case: |Sn | = n!,

(1.9)

|Sn | = |Sn | =

|Sn· | = 2n n!, (2n)! |Sn | = . n!

(2n)! , 2n n!

(1.11) (1.12)
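The ensemble sizes above can be confirmed by brute force for small n. In the sketch below the string labels (`"all"`, `"inv"`, `"anti_inv"`, `"central"`, `"both"`) are hypothetical stand-ins for the five symmetry symbols, chosen here to name the defining conditions of (1.3) through (1.7) in order; the enumeration is exponential and only meant for n ≤ 2.

```python
from itertools import permutations
from math import factorial


def iota(x, size):
    """The involution x -> size + 1 - x."""
    return size + 1 - x


def ensemble(kind, n):
    """Enumerate the ensemble with the given (hypothetical) label as tuples of images."""
    size = {"all": n, "inv": 2 * n, "anti_inv": 2 * n, "central": 2 * n, "both": 4 * n}[kind]
    pts = range(1, size + 1)
    found = []
    for images in permutations(pts):
        p = dict(zip(pts, images))
        q = {v: k for k, v in p.items()}  # inverse permutation
        if kind == "all":
            ok = True
        elif kind == "inv":  # pi = pi^{-1}, no fixed points
            ok = all(p[x] == q[x] and p[x] != x for x in pts)
        elif kind == "anti_inv":  # pi = iota pi^{-1} iota, pi(x) != iota(x)
            ok = all(p[x] == iota(q[iota(x, size)], size) and p[x] != iota(x, size) for x in pts)
        elif kind == "central":  # pi = iota pi iota
            ok = all(p[x] == iota(p[iota(x, size)], size) for x in pts)
        else:  # "both": all conditions of the last ensemble
            ok = all(
                p[x] == q[x]
                and p[x] == iota(q[iota(x, size)], size)
                and p[x] != x
                and p[x] != iota(x, size)
                for x in pts
            )
        if ok:
            found.append(images)
    return found


def expected_size(kind, n):
    """Closed forms matching (1.9) through (1.12)."""
    if kind == "all":
        return factorial(n)
    if kind in ("inv", "anti_inv"):
        return factorial(2 * n) // (2 ** n * factorial(n))
    if kind == "central":
        return 2 ** n * factorial(n)
    return factorial(2 * n) // factorial(n)


for kind in ("all", "inv", "anti_inv", "central", "both"):
    print(kind, [len(ensemble(kind, n)) for n in (1, 2)])
```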

Note that Pl~ (t) = e−a~ t for

2 /2

and · , and note that Pl~ (t) = e−a~ t

2 /2

(1.10)

t 2n n!n!

(1.13)

t 2n (2n)!

(1.14)

X

f nl~

X

f nl~

0≤n

0≤n



for , , and . A major reason for considering the above problems is the following theorem. 1.2 Fix an integer l > 0. Map H into Aut(U (2l)) by THEOREM

7→ (U 7→ (U t )−1 ) and 7→ (U 7→ −J (U t )−1 J ), (1.15) ~ l where J = I0l −I 0 . Let U (2l) be the subgroup of U (2l) fixed by the corresponding automorphisms. Then ~ f n(2l) = EU ∈U ~ (2l) | Tr(U )|2n . (1.16) Before giving the proof, it is helpful to list the groups U ~ (2l): U (2l) = U (2l),

U (2l) = O(2l),

U (2l) = Sp(2l), U · (2l) ∼ = U (l) × U (l),

U (2l) = O(2l) ∩ Sp(2l) ∼ = U (l).

(1.17) (1.18) (1.19) (1.20) (1.21)

The last instance is the image of the fundamental representation of U (l) as a 2l×2l real matrix, and thus it corresponds to the direct sum of the fundamental representation and its conjugate. Proof The cases , , and are given in [32]; more precisely, is given there as [32, Th. 1.1], while and are given in [32, Th. 3.4]. (Note that if π ∈ Sn , then πι is a fixed-point-free involution with decreasing subsequences corresponding to the increasing subsequence of π.) We also give new, elementary proofs (see Theorems 8.2, 8.5, and 8.7). It remains to consider · and . Via the Robinson-Schensted correspondence (see [26, Sec. 5.1.4] for an excellent introduction), we can associate a pair (P, Q) of Young tableaux of the same shape to π ∈ Sn· , satisfying the relations P S = P, Q S = Q, where S is the duality operation (“evacuation”) of Schu¨ tzenberger (see [26]). But there is a bijective correspondence between self-dual tableaux and domino tableaux (see, e.g., [42]), and, further, from domino tableaux to pairs of ordinary tableaux with disjoint content, and with shape determined only by the shape of the domino tableau (see [37]). Thus we have associated four Young tableaux (P1 , P2 , Q 1 , Q 2 ), where P1 and Q 1 have the same shape, P2 and Q 2 have the same shape, P1 and P2 have


disjoint content, and $Q_1$ and $Q_2$ have disjoint content. This corresponds to a choice of $0 \le m \le n$, independent choices of two subsets of size m of [1, 2, . . . , n], and independent choices of $\pi_1$ and $\pi_2$ of length m and n − m. Furthermore, the longest increasing subsequence of π has length at most 2l precisely when the longest increasing subsequences of $\pi_1$ and $\pi_2$ are of length at most l. Putting this together, we find

\[ f^{\,\cdot}_{n(2l)} = \sum_{0 \le m \le n} \binom{n}{m}^{2} f_{ml}\, f_{(n-m)l}, \tag{1.22} \]

while the integral formula simplifies as follows:

\begin{align*}
E_{U\in U^{\cdot}(2l)} |\operatorname{Tr}(U)|^{2n}
 &= E_{U_1\in U(l),\,U_2\in U(l)} |\operatorname{Tr}(U_1) + \operatorname{Tr}(U_2)|^{2n} \\
 &= E_{U_1\in U(l),\,U_2\in U(l)} \bigl|(\operatorname{Tr}(U_1) + \operatorname{Tr}(U_2))^{n}\bigr|^{2} \\
 &= E_{U_1\in U(l),\,U_2\in U(l)} \Bigl|\sum_{0 \le m \le n} \binom{n}{m} \operatorname{Tr}(U_1)^{m} \operatorname{Tr}(U_2)^{n-m}\Bigr|^{2} \\
 &= \sum_{0 \le m \le n} \binom{n}{m}^{2} f_{ml}\, f_{(n-m)l} \\
 &= f^{\,\cdot}_{n(2l)}. \tag{1.23}
\end{align*}

Similarly, an element π ∈ Sn corresponds to a pair of Young tableaux of the same shape with disjoint content, and thus 2n f nl , f n(2l) = (1.24) n while EU ∈U

(2l)

| Tr(U )|2n = EU ∈U (l) (Tr(U ) + Tr(U ))2n X 2n 2n−m = EU ∈U (l) Tr(U )m Tr(U ) m 0≤m≤2n 2n = (1.25) EU ∈U (l) | Tr(U )|2n . n

It ought to be possible to give a more uniform proof of this result; the results of Section 8 may be relevant to this goal.
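The combinatorial identity (1.22) used in the proof can be checked by exhaustive enumeration for small n. The following is a minimal sketch with hypothetical function names: `f_central` counts permutations of 2n points satisfying π = ιπι with no increasing subsequence longer than 2l, and `rhs` evaluates the binomial sum using plain permutation counts.

```python
from bisect import bisect_left
from itertools import permutations
from math import comb


def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def f_plain(n, l):
    """Permutations of n with no increasing subsequence longer than l."""
    return sum(1 for p in permutations(range(1, n + 1)) if lis_length(p) <= l)


def f_central(n, l):
    """Permutations of 2n with pi = iota pi iota and longest increasing subsequence <= 2l."""
    size = 2 * n
    count = 0
    for p in permutations(range(1, size + 1)):
        # pi(iota(x)) = iota(pi(x)), written with 0-based positions
        symmetric = all(p[size - 1 - i] == size + 1 - p[i] for i in range(size))
        if symmetric and lis_length(p) <= 2 * l:
            count += 1
    return count


def rhs(n, l):
    """Right-hand side of (1.22)."""
    return sum(comb(n, m) ** 2 * f_plain(m, l) * f_plain(n - m, l) for m in range(n + 1))


for n, l in [(1, 1), (2, 1), (2, 2), (3, 1)]:
    print(n, l, f_central(n, l), rhs(n, l))
```

The enumeration grows like (2n)!, so this is a check for n up to about 4, not a practical way to compute the counts.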



There is an analogue of Theorem 1.2 in which 2l is replaced by 2l + 1. THEOREM 1.3 For any n, l ≥ 0,

f n(2l+1) = EU ∈U (2l+1) | Tr(U )|2n , 2n

(1.26)

f n(2l+1) = EU ∈O(2l+1) | Tr(U )| , · f n(2l+1) = EU ∈U (l)⊕U (l+1) | Tr(U )|2n ,

(1.27)

f n(2l+1) = f n(2l) ,

(1.29)

(1.28)

while

f n(2l+1) = f n(2l) .

(1.30)

Also, we have the following corollary for · and . COROLLARY 1.4 For any n, l ≥ 0,

P2l· (t) = Pl (t)2 , · (t) = P (t)P (t), P2l+1 l l+1 P2l (t) = Pl (t).

(1.31) (1.32) (1.33)

And for , , and , we have the following corollary. 1.5 For any n, l ≥ 0, COROLLARY

Pl (t) = e−t P2l (t) = e P2l (t) = e

2 /2

−t 2 /2 −t 2

EU ∈O(l) exp(t Tr(U )),

(1.34)

EU ∈Sp(2l) exp(t Tr(U )),

(1.35)

EU ∈U

(2l) exp(t

Tr(U )).

(1.36)

Proof For , we have et

2 /2

Pl (t) =

X t 2n EU ∈O(l) Tr(U )2n . (2n)!

(1.37)

0≤n

But EU ∈O(l) Tr(U )n = 0 for n odd, so this is EU ∈O(l) exp(t Tr(U )),

(1.38)


as required. The calculations for

and

are analogous.

Remark. In particular, we see that formula (1.36), which was derived in [30] as an expression for , is really most naturally interpreted in terms. As an aside, we observe that if we remove the condition that the symmetries under consideration preserve increasing sets but insist that the corresponding sets should still give permutations, there is one further type of symmetry allowed, namely, rotation by 90 degrees. In terms of permutations, this is the set Sn◦ = {π ∈ S4n : π 2 = ι},

|Sn◦ | = (2n)!/n!.

(1.39)

Such permutations correspond to pairs of tableaux $(P, Q^t)$ with n elements such that P and Q have the same shape and disjoint content. It follows that the length l of the longest increasing subsequence from this set has the same distribution as $\max(2l^{+}(\pi),\, 2l^{-}(\pi) - 1)$, where π is randomly chosen from $S_n$, $l^{+}(\pi)$ is the increasing subsequence length of π, and $l^{-}(\pi)$ is the decreasing subsequence length of π. In particular, the bound of P. Erdős and G. Szekeres implies that $f^{\circ}_{nl} = 0$ for $n > l^2$, and thus no integral formula à la Theorem 1.2 can exist for this case. There is a determinant formula, however, which can be obtained from the following symmetric function identity: ! X (h j−i (x))0≤i 0, XY R i π2∗ (F B ⊗A op B G ) = 0.

Moreover,

XY ^ F G ' π2∗ (F B ⊗A op B G ) ' F ⊗ Aop G.

Proof Consider first the case that G is Aop B itself. Then F B ⊗A op B G is simply F B , which is a quasi-coherent O X ×Y -module and therefore is acyclic for the functor Rπ2∗ . Its direct image onto Y is also quasi-coherent, and on the level of global sections we have XY 0(Y, π2∗ (F B ⊗A op B G )) = 0(X × Y , g F B)

= F B = F ⊗ Aop G. This proves the lemma when G = Aop B and, therefore, also when G is free. To prove the lemma in general, let G • →G→0 be a resolution of G by free Aop Bmodules. Then L F B ⊗A op B G ' F B ⊗A op B G˜ • .

FOURIER TRANSFORM FOR D-ALGEBRAS, I


If we can show that this complex has cohomology only in degree zero, we then have L

F B ⊗A op B G ' F B ⊗A op B G .

We also know that the terms of the complex F B ⊗A op B G˜ • are acyclic for the functor Rπ2∗ , from which we learn that L

RπY ∗ (F B ⊗A op B G ) ' πY ∗ (F B ⊗A op B G˜ • ). We would be finished if we then knew that the complex πY ∗ (F B ⊗A op B G˜ • ) had cohomology only in degree zero, and that its zeroth cohomology was F ^ ⊗ Aop G. But since each G i is a free module, the terms of the complex F B ⊗A op B G˜ • are quasi-coherent O X ×Y -modules, with the morphisms being differential operators. So everything that remains to be proved can be checked on the level of global sections. We have L

0(X × Y , F B ⊗A op B G˜ • ) = F B ⊗ Aop B G • ' F ⊗ Aop G ' F ⊗ Aop G.

PROPOSITION 4.3 op Let G ∈ D− qc (X × Y, A B ).

− Then the functor (·)G takes D− qc (X, A ) to Dqc (Y, B ).

Proof By [H1, Chap. I, Prop. 7.3, p. 73], it suffices to prove that F G ∈ D^-_qc(Y, B) when F and G are quasi-coherent sheaves. Then one has the following observation, useful here and elsewhere in the paper. For F a quasi-coherent A-module, there are quasi-isomorphisms of complexes bounded above

F → c_1(F) ← c_2(F)

such that the terms of the complex c_2(F) are direct sums of sheaves of the form j_*(L), where j : U → X is an open embedding from an affine U and where L is a flat quasi-coherent A|_U-module.∗ For c_1(F), take the Čech resolution of F with respect to a cover of X by finitely many affine open subsets. Now, given the embedding j : U → X, there is a canonical flat resolution of j_* j^*(F). Namely, resolve j^*(F) by free A|_U-modules using the functor that sends every module to the free module generated by its elements, then apply j_*. Do this for every term of c_1(F) to get a double complex whose associated simple complex is c_2(F). Now use Lemma 4.2 to deduce that F G is quasi-isomorphic to the complex π_{Y*}(c_2(F) B ⊗_{A^op B} G). Moreover, the terms of this complex are quasi-coherent, again by Lemma 4.2.

∗ We thank P. Deligne for pointing this out to us.


POLISHCHUK AND ROTHSTEIN

4.2. Some natural isomorphisms The following proposition lists several natural morphisms analogous to the projection and base-change formulas. PROPOSITION 4.4 One has the following natural morphisms: (1) for F ∈ D− (X, A ), G ∈ D− (X × Y , A op B ), and H ∈ D− (Z , C ),

F G H →F G H ;

(2)

(4.2.1)

for F ∈ D− (X, A ), G ∈ D− (X × Y , A op B ), and H ∈ D− (Y × Z , B op C ), (F G )H →G F H ;

(3)

(4.2.2)

for F ∈ D− (X, A ), G ∈ D− (Y, B ), and H ∈ D− (X × Y × Z , A op B op C ), F (G

H

)

→(F G )H .

(4.2.3)

Moreover, in every case the morphism is an isomorphism if F , G , and H have quasicoherent cohomology sheaves. To illustrate, here is the morphism (4.2.1). We have F G H = BC ⊗π

1

−1 (B )π −1 (C ) 2

(π1Y Z

−1

(F G )π2Y Z

−1

(H )).

There is a natural transformation π2X Y Z

−1

XY XY Z Rπ2∗ (·)→π12

XY Z given by adjunction. Moreover, π12 XY Z π23

−1

(F G H )→

XY Z π23

−1

(BC ) ⊗π

2

= π1X Y Z ·

−1 (B )π −1 (C ) 3

−1

−1

−1

(·)

(·) is exact. So we get a morphism

−1

XY Z (π12

L

(F B ⊗A op B G )π3X Y Z

−1

(H ))

L

(F )⊗π

1

−1 (A op )

X Y Z −1 π23 (BC ) ⊗π −1 (B )π −1 (C ) 2 3

XY Z (π12

−1

(G )π3X Y Z

−1

(H )) .

On the other hand, F BC ⊗A op BC G H = π1X Y Z

−1

L

(F )⊗π

1

−1 (A op )

A op BC ⊗π

12

XY Z · (π12

−1

−1 (A op B )π −1 (C ) 3

(G )π3X Y Z

−1

(H )) .

The morphism (π_{23}^{XYZ})^{-1}(BC) → A^op BC then gives us our morphism F G H → F G H.

The proofs that the morphisms are isomorphisms in the quasi-coherent case are all done in the same way. One first uses the lemma on way-out functors (see [H1, Chap. 1, Sect. 7, Prop 7.1, p. 68]) to reduce to the case when the objects involved are quasi-coherent sheaves. Then one fixes finite affine open covers of all the schemes and uses the quasi-isomorphisms (·)→c2 (·) described in the proof of Proposition 4.3. This reduces everything to the case of complexes of flat sheaves on affine schemes, where one may invoke Lemma 4.2. Then the morphisms reduce to the canonical isomorphisms (F ⊗ Aop G)H ' F ⊗ Aop G H

(4.2.4)

(F ∈ A-mod, G ∈ A B-mod, H ∈ C-mod), op

(F ⊗ Aop G) ⊗ B op H ' G ⊗ AB op (F H )

(4.2.5)

(F ∈ A-mod, G ∈ A B-mod, H ∈ B C-mod), op

op

F ⊗ Aop (G ⊗ B op H ) ' (F G) ⊗ Aop B op H

(4.2.6)

(F ∈ A-mod, G ∈ AB-mod, H ∈ A B C-mod). op

op

4.3. Circle product Given the D-scheme (Y, B ), if we view B as a sheaf on Y × Y supported on the diagonal, it has a natural BB op -module structure. Denote this BB op -module by δ B . Given F ∈ D− (X × Y , A B op ), G ∈ D− (Y × Z , BC op ), we have F G ∈ D− (X × Y × Y × Z , A B op BC op ).

Thus we can make the following definition. Definition 4.5 Let F ∈ D− (X × Y , A B op ) and G ∈ D− (Y × Z , BC op ). Define F ◦B G ∈ D− (X × Z , A C op )

by the formula

F ◦B G = δ B F G .

PROPOSITION 4.6 For F ∈ D− (X × Y, A B op ) there is a natural morphism

F →F ◦B δ B . op It is an isomorphism if F ∈ D− qc (X × Y, A B ).



Proof If we assume that F is a complex of flat sheaves and if we set H = A (δ B )B op ⊗A B op BB op F δ B ,

then we need a natural morphism XY Y Y π14

−1

(F )→H .

So it is enough to give such a morphism when F is an arbitrary sheaf of A B op modules. Since H is supported on the main diagonal in X × Y 3 , it suffices to define the morphism on open sets U 0 ⊂ X × Y 3 of the form U 0 = U × X U × X U , where U is an open subset of X × Y . On the level of additive groups, one has the obvious morphism 0(U 0 , π12 −1 (F ))→0(U 0 , H ) and the isomorphism 0(U 0 , π14 −1 (F ))→0(U 0 , π12 −1 (F )). It must be checked that the composite morphism respects the 0(U, A B op )-module structure. We leave this to the reader. (Take U to be a product open set.) The proof that the morphism is an isomorphism in the quasi-coherent case proceeds, as usual, by reducing to the case of a flat quasi-coherent sheaf on an affine variety and then reducing by Lemma 4.2 to the obvious identity B ⊗ B op B (F B) ' F. The same proof works for transforms. PROPOSITION 4.7 Let F ∈ D− qc (X, A ).

There is a natural isomorphism F ' F δA . op

4.4. Associativity Now let (W, A ), (X, B ), (Y, C ), and (Z , D ) be D-schemes, and let F ∈ D− (A B op ),

G ∈ D− (BC op ),

H ∈ D− (C D op ).

We have the following morphism by Proposition 4.4: (F ◦B G ) ◦C H = δ C ((δ B

F G )H

→ δC δB

)

→δ C δ B

F G H

.

FG H

(4.4.1)



A morphism F ◦B (G ◦C H )→ δ C δ B

F G H

(4.4.2)

is similarly defined. These are isomorphisms in the quasi-coherent case, so we have the following result. PROPOSITION 4.8 op − op − op For F ∈ D− qc (A B ), G ∈ Dqc (BC ), and H ∈ Dqc (C D ), there is a natural isomorphism (F ◦B G ) ◦C H ' F ◦B (G ◦C H ). − op One final remark; given F ∈ D− qc (A ) and G ∈ Dqc (A B ), one may regard (X, A ) as the product D-scheme (Spec(C) × X, OSpec(C) A ) and hence consider F ◦A op G or consider instead F G .

PROPOSITION 4.9 Given F ∈ D− qc (A )

op and G ∈ D− qc (A B ), there is a natural isomorphism

F G ' F ◦A op G .

Proof We have

F G ' (F δ A )G . op

Then by the isomorphism (4.2.2), (F δ A )G ' (δ A op )F G = F ◦A op G . op

5. Lie algebroids and twisted differential operators

Let $T = \mathrm{Der}\, O_X$ be the tangent sheaf of X. A Lie algebroid L on X is a (quasi-coherent) $O_X$-module equipped with a morphism of $O_X$-modules $\sigma : L \to T$ and a C-linear Lie bracket $[\cdot, \cdot] : L \otimes L \to L$ such that σ is a homomorphism of Lie algebras and the following identity is satisfied:

\[ [\ell_1, f\ell_2] = f \cdot [\ell_1, \ell_2] + \sigma(\ell_1)(f)\,\ell_2, \]

where $\ell_1, \ell_2 \in L$, $f \in O_X$ (see [Mc]). To every Lie algebroid L one can associate a D-algebra U(L) called the universal enveloping algebra of L. By definition, U(L) is a sheaf of algebras equipped with the morphisms of sheaves $i : O_X \to U(L)$, $i_L : L \to U(L)$ such that U(L) is generated, as an algebra, by the images of these morphisms, and the only relations are
(i) i is a morphism of algebras;
(ii) $i_L$ is a morphism of Lie algebras;
(iii) $i_L(f\ell) = i(f)\, i_L(\ell)$, $[i_L(\ell), i(f)] = i(\sigma(\ell)(f))$, where $f \in O_X$, $\ell \in L$.
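The defining identity can be verified by hand in the simplest case. The computation below is an illustrative sketch under the assumption that X is the affine line with coordinate x, L = T is the tangent sheaf with σ the identity, and sections are written $a\partial$ with $\partial = d/dx$ and primes denoting derivatives; it uses only the usual bracket of vector fields, $[a\partial, g\partial] = (a g' - g a')\partial$.

```latex
\begin{align*}
[a\partial,\; f\, b\partial]
  &= \bigl(a\,(fb)' - fb\,a'\bigr)\partial \\
  &= f\,(a b' - b a')\,\partial + a f' b\,\partial \\
  &= f \cdot [a\partial,\, b\partial] + \sigma(a\partial)(f)\; b\partial .
\end{align*}
```

In the same setting, relation (iii) reduces to the familiar commutation rule $[\partial, f] = f'$ for differential operators on the line.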

5.1 Let L be a Lie algebroid on X. A central extension of L by $O_X$ is a Lie algebroid $\tilde L$ on X equipped with an embedding of $O_X$-modules $c : O_X \hookrightarrow \tilde L$ such that $[c(1), \tilde\ell] = 0$ for every $\tilde\ell \in \tilde L$ (in particular, $c(O_X)$ is an ideal in $\tilde L$) and an isomorphism of Lie algebroids $\tilde L / c(O_X) \simeq L$. For such a central extension we denote by $U^{\circ}(\tilde L)$ the quotient of $U(\tilde L)$ modulo the ideal generated by the central element $i(1) - i_{\tilde L}(c(1))$.

LEMMA 5.1 Let L be a locally free $O_X$-module of finite rank. Then there is a bijective correspondence between isomorphism classes of the following data:
(i) a structure of a Lie algebroid on L and a central extension $\tilde L$ of L by $O_X$,
(ii) a D-algebra A equipped with an increasing algebra filtration $O_X = A_0 \subset A_1 \subset A_2 \subset \dots$ such that $\cup A_n = A$ and an isomorphism of the associated graded algebra gr A with the symmetric algebra $S^{\bullet} L$.

Proof The correspondence between (i) and (ii) is established as follows. Given a central extension $\tilde L$ as in (i), the corresponding D-algebra is $U^{\circ}(\tilde L)$ with its standard filtration. The isomorphism of the associated graded algebra with $S^{\bullet} L$ is provided by the Poincaré-Birkhoff-Witt (PBW) theorem for Lie algebroids (see [Mc]). Conversely, given a D-algebra A with filtration as in (ii), it gives rise to a central extension $0 \to A_0 \to A_1 \to A_1/A_0 \to 0$, where the Lie algebroid structure on $A_1$ is induced by the algebra structure on A. Since $A_0 \simeq O_X$ and $A_1/A_0 \simeq L$, this is an extension of L by $O_X$.

5.2 Assume that X is smooth. Then one can take L = T with its natural Lie algebroid structure. The corresponding central extensions $\tilde T$ of T by O are called Picard algebroids, and the associated D-algebras are called algebras of twisted differential operators or simply tdo's. If D is a tdo and $D_{-1} = 0 \subset D_0 \subset D_1 \subset D_2 \subset \dots$ is its maximal D-filtration, that is,

\[ D_i = \{ d \in D \mid \mathrm{ad}(f)\, d \in D_{i-1},\ f \in O_X \}, \]

then $\mathrm{gr}\, D \simeq S^{\bullet} T$.



LEMMA 5.2 For a locally free $O_X$-module of finite rank E, one has a canonical isomorphism

\[ \mathrm{Ext}^1_{O_{X\times X}}(\Delta_* E, \Delta_* O_X) \simeq \mathrm{Hom}_{O_X}(E, T) \oplus \mathrm{Ext}^1_{O_X}(E, O_X), \]

where $\Delta : X \to X \times X$ is the diagonal embedding.

Proof Since $\Delta_* E \simeq p_1^* E \otimes_{O_{X\times X}} (O_{X\times X}/J)$, where J is the ideal sheaf of the diagonal, we have an exact sequence

\[ 0 \to \mathrm{Hom}(p_1^* E \otimes_{O_{X\times X}} J, \Delta_* O_X) \to \mathrm{Ext}^1(\Delta_* E, \Delta_* O_X) \to \mathrm{Ext}^1(p_1^* E, \Delta_* O_X). \]

Note that the first and last terms are isomorphic to Hom(E, T) and $\mathrm{Ext}^1(E, O_X)$, respectively. It remains to note that there is a canonical splitting $\Delta_* : \mathrm{Ext}^1(E, O_X) \to \mathrm{Ext}^1(\Delta_* E, \Delta_* O)$.

Note that the projection $\mathrm{Ext}^1(\Delta_* E, \Delta_* O_X) \to \mathrm{Hom}(E, T)$ can be described as follows. Given an extension

\[ 0 \to \Delta_* O_X \to \tilde E \to \Delta_* E \to 0, \]

the action of $J/J^2$ on $\tilde E$ induces the morphism $J/J^2 \otimes \tilde E \to \Delta_* O_X$, which factors through $J/J^2 \otimes \Delta_* E$ since J annihilates $\Delta_* O_X$. Hence we get a morphism $\Delta_* E \to \Delta_* T$. Now if A is a D-algebra, equipped with a filtration $A_{\bullet}$ such that $\mathrm{gr}\, A \simeq S^{\bullet}(E)$, then we consider the corresponding extension of $O_X$-bimodules

\[ 0 \to O_X = A_0 \to A_1 \to E = A_1/A_0 \to 0 \]

as an element in $\mathrm{Ext}^1_{O_{X\times X}}(\Delta_* E, \Delta_* O_X)$. By definition, A is a tdo if the projection of this element to $\mathrm{Hom}_{O_X}(E, T)$ is a map E → T that is an isomorphism.

6. Equivalences of categories of modules over D-algebras

6.1 Let P and Q be objects in $D^-_{qc}(X \times Y)$ such that

\[ P \circ_Y Q \simeq \Delta_* O_X, \qquad Q \circ_X P \simeq \Delta_* O_Y. \tag{6.1.1} \]



By Proposition 3.1, the transforms by P and Q give equivalences of the derived categories $D^-_{qc}(X)$ and $D^-_{qc}(Y)$. There are many examples of such equivalences. However, we are mainly interested in the case when X and Y are dual abelian varieties, where we may take P to be the normalized Poincaré line bundle on X × Y and $Q = P^{-1} \omega_X^{-1}[-g]$, where g = dim X. In this case we can extend the equivalence to the derived categories of modules over a large class of D-algebras on X and Y. The idea is to study the functor $Q \circ_X (\cdot) \circ_X P$ from $D^-_{qc}(X \times X)$ to $D^-_{qc}(Y \times Y)$ for sheaves supported on the diagonal.

6.2. Special sheaves
Let us call a quasi-coherent sheaf $K$ on $X \times X$ special if there is an exhaustive filtration of $K$ by quasi-coherent sheaves $0 = K_{-1} \subset K_0 \subset K_1 \subset \cdots$ such that for all $i$, $K_i/K_{i-1} \simeq \Delta_*(F_i)$ for some globally free $\mathcal{O}_X$-module $F_i$. Denote by $\mathcal{S}_X$ the exact category of special sheaves on $X \times X$. We have the following easy proposition.

PROPOSITION 6.1
(1) For every $K \in \mathcal{S}_X$, the functor $M \mapsto K \circ_X M$ is exact from $\mathcal{O}_{X\times Z}$-modules to $\mathcal{O}_{X\times Z}$-modules.
(2) For every pair of special sheaves $K, K' \in \mathcal{S}_X$, $K \circ_X K'$ is special.

From the fact that $\mathcal{Q} \circ_X (\Delta_*\mathcal{O}_X) \circ_X \mathcal{P} \simeq \Delta_*\mathcal{O}_Y$, we obtain the following proposition.

PROPOSITION 6.2
The functor $\Phi : K \mapsto \mathcal{Q} \circ_X K \circ_X \mathcal{P}$ defines an equivalence of categories $\Phi : \mathcal{S}_X \to \mathcal{S}_Y$, the inverse being $K \mapsto \mathcal{P} \circ_Y K \circ_Y \mathcal{Q}$.

PROPOSITION 6.3
For $K, K' \in \mathcal{S}_X$, $M \in D^-(X)$, one has a canonical isomorphism of $\mathcal{O}_Y$-bimodules,

$$\Phi(K \circ_X K') \simeq \Phi K \circ_Y \Phi K',$$

and a canonical isomorphism in $D^-(Y)$,

$$\Phi(K \circ_X M) \simeq \Phi K \circ_Y \Phi M,$$

where $\Phi M = \mathcal{Q} \circ_X M$.

Definition 6.4
A D-scheme $(X, \mathcal{A})$ is special if $\delta\mathcal{A}$ is special when regarded as an $\mathcal{O}_{X\times X}$-module.

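The main class of examples of special D-schemes is provided by tdo's over abelian varieties. It follows from Propositions 6.2 and 6.3 that for any special D-scheme $(X, \mathcal{A})$ there exists a canonical D-algebra $\Phi\mathcal{A}$ on $Y$ such that $\delta(\Phi\mathcal{A}) \simeq \Phi(\delta\mathcal{A})$. Indeed, every special D-algebra is flat, so the structural morphism of $\mathcal{A}$ is a morphism $\delta\mathcal{A} \circ_X \delta\mathcal{A} \to \delta\mathcal{A}$. One just has to apply $\Phi$ to this morphism and also to the morphism $\Delta_*\mathcal{O}_X \to \delta\mathcal{A}$. Then $(Y, \Phi\mathcal{A})$ is again a special D-scheme. Furthermore, we now prove that the derived categories of modules over $\mathcal{A}$ and $\Phi\mathcal{A}$ are equivalent.

THEOREM 6.5
Assume that the objects $\mathcal{P}$ and $\mathcal{Q}$ in (6.1.1) are quasi-coherent sheaves up to a shift (i.e., they have only one nonzero cohomology sheaf). Then for every special D-algebra $\mathcal{A}$ on $X$ there is an exact equivalence $\Phi : D^-_{qc}(\mathcal{A}) \to D^-_{qc}(\Phi\mathcal{A})$ such that the following diagram of functors is commutative:

$$\begin{array}{ccc} D^-_{qc}(\mathcal{A}) & \xrightarrow{\ \Phi\ } & D^-_{qc}(\Phi\mathcal{A}) \\ \downarrow & & \downarrow \\ D^-_{qc}(\mathcal{A}') & \xrightarrow{\ \Phi\ } & D^-_{qc}(\Phi\mathcal{A}') \end{array}$$

for every homomorphism of special D-algebras $\mathcal{A}' \to \mathcal{A}$ (resp., $\mathcal{A} \to \mathcal{A}'$), where the vertical arrows are the restriction (resp., induction) functors.

Proof
Set $\mathcal{B} = \Phi\mathcal{A}$. Set

$$G' = \mathcal{P} \circ_Y \delta\mathcal{B} \in D^-_{qc}(\mathcal{O}_X \boxtimes \mathcal{B}^{op}), \qquad H' = \delta\mathcal{B} \circ_Y \mathcal{Q} \in D^-_{qc}(\mathcal{B} \boxtimes \mathcal{O}_X).$$

Note that these objects are concentrated in one cohomological degree. We claim, moreover, that $G'$ and $H'$ come from objects $G$ and $H$ in $D^-_{qc}(\mathcal{A} \boxtimes \mathcal{B}^{op})$ and $D^-_{qc}(\mathcal{B} \boxtimes \mathcal{A}^{op})$, respectively, via the forgetful functor. By Proposition 3.2, it suffices to endow $G'$ with a left action of $\mathcal{A}$ with respect to $\circ$, commuting with the right $\mathcal{B}$-action. But $G' = \mathcal{P} \circ (\mathcal{Q} \circ \delta\mathcal{A} \circ \mathcal{P}) = \delta\mathcal{A} \circ_X \mathcal{P}$, which exhibits the desired structure. If we apply the forgetful functor to $G \circ_{\mathcal{B}} H$, we get $G' \circ_{\mathcal{B}} H' \in D^-_{qc}(X \times X)$. Moreover,

$$G' \circ_{\mathcal{B}} H' = (\mathcal{P} \circ_Y \delta\mathcal{B}) \circ_{\mathcal{B}} (\delta\mathcal{B} \circ_Y \mathcal{Q}) = \mathcal{P} \circ_Y \delta\mathcal{B} \circ_Y \mathcal{Q} = \delta\mathcal{A}$$

as an $\mathcal{O}_{X\times X}$-module. We have to check that we have the correct $\mathcal{A} \boxtimes \mathcal{A}^{op}$-module structure on $G \circ_{\mathcal{B}} H$, but this can be done one side at a time. That is, we have

$$G' \circ_{\mathcal{B}} H \in D^-_{qc}(\mathcal{O}_X \boxtimes \mathcal{A}^{op}) \quad\text{and}\quad G \circ_{\mathcal{B}} H' \in D^-_{qc}(\mathcal{A} \boxtimes \mathcal{O}_X),$$

and it is easy to check that we get the correct $\pi^{-1}(\mathcal{A}^{op})$- and $\pi^{-1}(\mathcal{A})$-module structures. We now have $G \circ_{\mathcal{B}} H \simeq \delta\mathcal{A}$ and $H \circ_{\mathcal{A}} G \simeq \delta\mathcal{B}$, so we have equivalences of categories $(\cdot)G$ and $(\cdot)H$.

Now let $\mathcal{A}_1 \to \mathcal{A}_2$ be a homomorphism of special D-algebras on $X$. Let $G_i = \mathcal{P} \circ_{\mathcal{O}_Y} \delta\mathcal{A}_i$ for $i = 1, 2$. Then we have an isomorphism of $\Phi\mathcal{A}_1 \boxtimes (\mathcal{A}_2)^{op}$-modules

$$G_2 \simeq G_1 \circ_{\mathcal{A}_1} \mathcal{A}_2.$$

This immediately implies that the equivalences $\Phi$ for $\mathcal{A}_1$ and $\mathcal{A}_2$ commute with restriction. Indeed, for an $\mathcal{A}_2$-module $M$ we have an isomorphism of $\Phi\mathcal{A}_1$-modules,

$$G_2 \circ_{\mathcal{A}_2} M \simeq (G_1 \circ_{\mathcal{A}_1} \mathcal{A}_2) \circ_{\mathcal{A}_2} M \simeq G_1 \circ_{\mathcal{A}_1} M.$$

The compatibility of $\Phi$ with induction is checked similarly, using the isomorphism $G_2 \simeq \Phi\mathcal{A}_2 \circ_{\Phi\mathcal{A}_1} G_1$.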

Example 6.6 (Connections; see [R])
For any smooth variety $X$ there is a D-algebra $\mathcal{C}_X$ such that, for any $\mathcal{O}_X$-module $F$, endowing $F$ with a (not necessarily integrable) connection is equivalent to endowing $F$ with the structure of a $\mathcal{C}_X$-module. If we denote by $\mathcal{T}$ the tangent sheaf of $X$, then there is a map of left $\mathcal{O}_X$-modules $\alpha : \mathcal{T} \to \mathcal{C}_X$ such that, for $f \in \mathcal{O}$ and $\xi \in \mathcal{T}$, $\alpha(\xi(f)) = \alpha(\xi) f - f\alpha(\xi)$, and $\mathcal{C}_X$ is universal among $\mathcal{O}_X$-algebras with this property. The graded algebra associated to the maximal D-filtration of $\mathcal{C}_X$ is the tensor algebra $T(\mathcal{T})$. In particular, if $X$ is an abelian variety, $(X, \mathcal{C}_X)$ is a special D-scheme. To compute its Fourier transform, it suffices to transform the extension of $\mathcal{O}_{X\times X}$-modules $0 \to \mathcal{O}_X \to \mathcal{O}_X \oplus \mathcal{T} \to \mathcal{T} \to 0$, where the middle term is an $\mathcal{O}_X$-bimodule by the formula

$$f(g, \xi)h = (fgh + f\xi(h),\ fh\xi). \tag{6.2.1}$$

Set $\mathfrak{g} = H^0(X, \mathcal{T}) = H^1(Y, \mathcal{O})$, where $Y$ is the dual abelian variety. By Lemma 5.2, $\mathrm{Ext}^1_{\mathcal{O}_{X\times X}}(\Delta_*\mathcal{O}_X, \Delta_*\mathcal{O}_X) = \mathfrak{g} \oplus \hat{\mathfrak{g}}$, where $\hat{\mathfrak{g}} = H^0(Y, \mathcal{T}) = H^1(X, \mathcal{O})$. With this identification the extension class of (6.2.1) is the identity element in $\mathfrak{g}^* \otimes \mathfrak{g}$. Therefore the transform of extension (6.2.1) is the universal vector bundle extension on $Y$,

$$0 \to \mathcal{O}_Y \to \mathcal{E} \to \mathfrak{g} \otimes \mathcal{O}_Y \to 0. \tag{6.2.2}$$

We find, therefore, that $\Phi(\mathcal{C}_X)$ is the universal D-algebra associated to the extension (6.2.2); that is, $\Phi(\mathcal{C}_X) = T(\mathcal{E})/(1_{\mathcal{E}} - 1)$, the tensor algebra of $\mathcal{E}$ modulo the identification of 1's. Note that for an $\mathcal{O}_Y$-module $F$ to be endowed with a module structure over $\Phi(\mathcal{C}_X)$ is the same as to give a splitting of the sequence

$$0 \to F \to \mathcal{E} \otimes_{\mathcal{O}} F \to \mathfrak{g} \otimes_{\mathbb{C}} F \to 0.$$

Thus the derived category of sheaves on $Y$ equipped with such a splitting is equivalent via the Fourier-Mukai transform to the derived category of sheaves on $X$ equipped with a connection. This is the form of the equivalence given in [R].

Example 6.7 (Integrable connections)
These are $\mathcal{D}$-modules. Now $\mathcal{D}_X$ is the quotient of $\mathcal{C}_X$ by the relations $[\alpha(\xi), \alpha(\zeta)] = \alpha([\xi, \zeta])$ for vector fields $\xi$ and $\zeta$. Upon Fourier transform this translates to the condition that our $\Phi(\mathcal{C}_X)$-module is in fact a module over the commutative D-algebra $\mathcal{A}_Y = \mathrm{Sym}(\mathcal{E})/(1_{\mathcal{E}} - 1)$. Now $\mathrm{Spec}_Y(\mathcal{A}_Y)$ is the universal additive group extension $Y^\natural \to Y$. This recovers Laumon's correspondence between the derived category of $\mathcal{D}$-modules on $X$ and the derived category of $\mathcal{O}$-modules on $Y^\natural$. Below we show that this equivalence is a degenerate case of a more symmetric picture involving the categories of modules over tdo's on both $X$ and $Y$.

Remark 6.8
Assuming that $X$ is an abelian variety, we can generalize the notion of a special D-algebra on $X$ as follows. Instead of considering special sheaves on $X \times X$ one can consider quasi-coherent sheaves on $X \times X$ admitting a filtration with quotients of the form $(\mathrm{id}, t_x)_* L$, where $(\mathrm{id}, t_x) : X \to X \times X$ is the graph of the translation by some point $x \in X$ and $L$ is a line bundle algebraically equivalent to zero on $X$. Let us call such sheaves quasi-special. It is easy to see that quasi-special sheaves are flat over $X$ with respect to both projections $p_1$ and $p_2$, so the operation $\circ$ is exact on them. We can define a quasi-special algebra as a quasi-special sheaf $K$ on $X \times X$ together with an associative multiplication $K \circ K \to K$ admitting a unit $\Delta_*\mathcal{O}_X \to K$. Then there is a Fourier duality for quasi-special algebras and an equivalence of the corresponding derived categories. The proof of Theorem 6.5 works literally in this situation. Note that modules over quasi-special algebras form a much broader class of categories than those over special D-algebras. Among these categories we can find some categories of modules over 1-motives, and our Fourier duality coincides with the one defined by G. Laumon in [L]. For example, a homomorphism $\phi : \mathbb{Z} \to X$ defines a quasi-special algebra on $X$ that is a sum of structural sheaves of graphs of translations by $\phi(n)$, $n \in \mathbb{Z}$. The corresponding category of modules is the category of $\mathbb{Z}$-equivariant $\mathcal{O}_X$-modules. The Fourier dual algebra corresponds to the affine group over $Y$ that is an extension of $Y$ by the multiplicative group.

7. Transforms of Lie algebroids and twisted differential operators

PROPOSITION 7.1
Let $L$ be a Lie algebroid on $Y$ such that $L \simeq \mathcal{O}_Y^d$ as an $\mathcal{O}_Y$-module. Then for any central extension $\tilde{L}$ of $L$ by $\mathcal{O}_Y$, the D-algebra $U^\circ(\tilde{L})$ is special. Furthermore, one has $\Phi U^\circ(\tilde{L}) \simeq U^\circ(\tilde{L}')$ for some central extension $\tilde{L}'$ of a Lie algebroid $L'$ on $X$ such that $L' \simeq \mathcal{O}_X^d$ as an $\mathcal{O}_X$-module.

Proof
This follows from Lemma 5.1. One just has to notice that if a D-algebra $\mathcal{A}$ on $Y$ has an algebra filtration $\mathcal{A}_\bullet$ with $\mathrm{gr}\,\mathcal{A}_\bullet \simeq S^\bullet(\mathcal{O}_Y^d)$, then $\Phi\mathcal{A}$ has an algebra filtration $\Phi\mathcal{A}_\bullet$ with $\mathrm{gr}\,\Phi\mathcal{A}_\bullet \simeq S^\bullet(\mathcal{O}_X^d)$.

Note that if $L$ is a successive extension of trivial bundles, then the D-algebra $U^\circ(\tilde{L})$ is still special, but $\Phi U^\circ(\tilde{L})$ is not necessarily of the form $U^\circ(\tilde{L}')$.

7.1
From now on, we assume that $X$ is an abelian variety; $Y$ is the dual abelian variety. As before, denote by $\mathfrak{g}$ (resp., $\hat{\mathfrak{g}}$) the tangent space to $X$ (resp., $Y$) at zero. Let $\tilde{\mathcal{T}}$ be a Picard algebroid on $X$, and let $\mathcal{D} = U^\circ(\tilde{\mathcal{T}})$ be the corresponding tdo. Then $\tilde{\mathcal{T}}/\mathcal{O}_X \simeq \mathcal{T}_X \simeq \mathfrak{g} \otimes_{\mathbb{C}} \mathcal{O}_X$ is a trivial $\mathcal{O}_X$-module. Hence $\mathcal{D}$ is a special D-algebra. By Proposition 7.1, $\Phi\mathcal{D} \simeq U^\circ(\tilde{L}')$ for some Lie algebroid $L'$ on $Y$ and its central extension $\tilde{L}'$ by $\mathcal{O}_Y$. It is then natural to ask whether $\Phi\mathcal{D}$ is a tdo.

THEOREM 7.2
Let $\mathcal{D}$ be a tdo on $X$, and let $\tilde{\mathcal{T}}$ be the corresponding Picard algebroid. Then $\Phi\mathcal{D}$ is a tdo on $Y$ if and only if the map $\mathfrak{g} \to H^1(X, \mathcal{O})$, induced by the extension of $\mathcal{O}_X$-modules

$$0 \to \mathcal{O}_X \to \tilde{\mathcal{T}} \to \mathfrak{g} \otimes_{\mathbb{C}} \mathcal{O}_X \to 0,$$

is an isomorphism.

Proof
Let $\mathcal{D}_\bullet$ be the canonical filtration of $\mathcal{D}$. Then $\Phi\mathcal{D}$ is a tdo if and only if the class of the extension of $\mathcal{O}_Y$-bimodules

$$0 \to \mathcal{O}_Y \simeq \Phi\mathcal{D}_0 \to \Phi\mathcal{D}_1 \to \Phi(\mathcal{D}_1/\mathcal{D}_0) \simeq \mathfrak{g} \otimes_{\mathbb{C}} \mathcal{O}_Y \to 0$$

induces an isomorphism $\hat{\mathfrak{g}} \otimes_{\mathbb{C}} \mathcal{O}_Y \to \mathcal{T}_Y$. Thus it is sufficient to check that the components of the canonical decomposition

$$\mathrm{Ext}^1_{\mathcal{O}_{X\times X}}(\Delta_*\mathcal{O}_X, \Delta_*\mathcal{O}_X) \simeq H^0(X, \mathcal{T}_X) \oplus H^1(X, \mathcal{O}_X),$$

introduced in Lemma 5.2, get interchanged by the Fourier-Mukai transform if we take into account the natural isomorphisms $H^0(X, \mathcal{T}) \simeq \mathfrak{g} \simeq H^1(Y, \mathcal{O})$, $H^1(X, \mathcal{O}) \simeq \hat{\mathfrak{g}} \simeq H^0(Y, \mathcal{T})$. We leave this to the reader as a pleasant exercise on the Fourier-Mukai transform.

7.2
Let us describe in more detail the data consisting of a Lie algebroid $L$ on an abelian variety $X$ such that $L \simeq V \otimes_{\mathbb{C}} \mathcal{O}_X$ as an $\mathcal{O}_X$-module (where $V$ is a finite-dimensional $k$-vector space) and a central extension $\tilde{L}$ of $L$ by $\mathcal{O}_X$. First of all, $V = H^0(X, L)$ has a structure of a Lie algebra, and the structural morphism $L \to \mathcal{T}$ is given by some $k$-linear map $\beta : V \to \mathfrak{g} = H^0(X, \mathcal{T})$ which is a homomorphism of Lie algebras (where $\mathfrak{g}$ is an abelian Lie algebra). The central extension $\tilde{L}$ is described (up to an isomorphism) by a class $\tilde{\alpha}$ in the first hypercohomology space $\mathbb{H}^1(X, L^* \to \wedge^2 L^* \to \wedge^3 L^* \to \cdots)$ of the truncated Koszul complex of $L$. In particular, we have the corresponding class $\alpha \in H^1(X, L^*)$, which is just the class of the extension of $\mathcal{O}_X$-modules

$$0 \to \mathcal{O}_X \to \tilde{L} \to L \to 0.$$

We can consider $\alpha$ as a linear map $V \to H^1(X, \mathcal{O}_X) = \hat{\mathfrak{g}}$. The maps $\alpha$ and $\beta$ get interchanged by the Fourier transform, up to a sign.

By definition, the D-algebra associated with $\tilde{L}$ is a tdo if and only if $\beta : V \to \mathfrak{g}$ is an isomorphism. If in addition $\alpha : V \to \hat{\mathfrak{g}}$ is an isomorphism, then the dual D-algebra is also a tdo. Thus we have a bijection between tdo's with nondegenerate first Chern class on $X$ and $Y$ such that the corresponding derived categories of modules are equivalent. According to [BB], isomorphism classes of tdo's on $X$ are classified by $\mathbb{H}^2(X, \Omega^{\geq 1})$, which is an extension of $H^1(X, \Omega^1) \simeq \mathrm{Hom}(\mathfrak{g}, \hat{\mathfrak{g}})$ by $H^0(X, \Omega^2) = \wedge^2\mathfrak{g}^*$. Let $U_X \subset \mathbb{H}^2(X, \Omega^{\geq 1})$ be the subset of elements with nondegenerate projection to $H^1(X, \Omega^1)$. The duality gives an isomorphism between $U_X$ and $U_Y$. It is easy to see that under this isomorphism the operation of multiplication by $\lambda \in \mathbb{C}^*$ on $U_X$ corresponds to multiplication by $\lambda^{-1}$ on $U_Y$.

On the other hand, let $\mathcal{A}$ be a tdo with trivial $c_1$. In other words, $\mathcal{A}$ corresponds to some global 2-form $\omega$ on $X$. Modules over $\mathcal{A}$ are $\mathcal{O}$-modules equipped with a connection having curvature $\omega$. Let $\mathcal{B}$ be the dual D-algebra on $Y$, and let $\tilde{L} \to L = H^0(X, \mathcal{T}_X) \otimes \mathcal{O}_Y$ be the corresponding central extension of Lie algebroids. We claim that $L$ is just an $\mathcal{O}_Y$-linear commutative Lie algebra while the central extension $\tilde{L}$ is given by the class $(e, \omega) \in H^1(L^*) \oplus H^0(\wedge^2 L^*)$, where $e$ is the canonical element in $H^1(L^*) \simeq H^1(Y, \mathcal{O}) \otimes H^1(Y, \mathcal{O})^*$. Indeed, as an $\mathcal{O}_Y$-module, $\tilde{L}$ is a universal extension of $H^1(Y, \mathcal{O}) \otimes \mathcal{O}$ by $\mathcal{O}$. Hence the Lie bracket defines a morphism of $\mathcal{O}$-modules $\wedge^2 L \to \tilde{L}$. Since $H^0(\tilde{L}) = H^0(\mathcal{O})$, it follows that $[\tilde{L}, \tilde{L}] \subset \mathcal{O} \subset \tilde{L}$. It is easy to see that the Lie bracket is just given by $\omega : \wedge^2 L \to \mathcal{O}$.

7.3. Action on the Neron-Severi group
Recall that the Neron-Severi group of $X$ is identified with $\mathrm{Hom}^{sym}(X, Y) \otimes \mathbb{Q}$, where $\mathrm{Hom}^{sym}(X, Y)$ is the group of symmetric homomorphisms $X \to Y$. Namely, to a line bundle $L$ there corresponds a symmetric homomorphism $\phi_L : X \to Y$ sending a point $x$ to $t_x^* L \otimes L^{-1}$, where $t_x : X \to X$ is the translation by $x$.
One has the natural $\mathbb{Q}$-linear homomorphism $c_1 : NS(X) \to \mathbb{H}^2(X, \Omega^{\geq 1})$ sending a line bundle $L$ to the class of the ring $\mathcal{D}_L$ of differential operators on $L$. For $\mu \in NS(X)$ we denote by $\mathcal{D}_\mu$ the corresponding tdo. For a vector bundle $E$ we set $c_1(E) = c_1(\det E)$.

PROPOSITION 7.3
If $\mu \in NS(X)$ is a nondegenerate class, then the tdo Fourier dual to $\mathcal{D}_\mu$ is

$$\Phi(\mathcal{D}_\mu) = \mathcal{D}_{-\mu^{-1}}. \tag{7.3.1}$$

Proof
It suffices to check this when $\mu$ is a class of a line bundle $L$, in which case it follows easily from the isomorphism

$$\phi_L^* \det\Phi(L) \simeq L^{-\mathrm{rk}\,\Phi(L)}$$

and the fact that the dual tdo to $\mathcal{D}_L$ acts on $\Phi(L)$.

8. Projective connections
Let $E$ be a coherent sheaf that is a module over some tdo on $X$ (then $E$ is automatically locally free). Following [BB], we say in this case that there is an integrable projective connection on $E$.

PROPOSITION 8.1
Let $E$ be a vector bundle on $X$ equipped with an integrable projective connection. Assume that $\det E$ is a nondegenerate line bundle. Then $H^i\Phi(E)$ are vector bundles with canonical integrable projective connections, and the following equality holds:

$$\phi_{\det E}^*\, c_1(\Phi(E)) = -\chi(X, E) \cdot \mathrm{rk}\,E \cdot c_1(E).$$

Proof
The first statement follows immediately from the fact that $\Phi(E)$ is quasi-isomorphic to a complex of modules over the tdo on $Y$ dual to $\mathcal{D}_{(\det E)^{1/r}}$, where $r = \mathrm{rk}\,E$. On the other hand, this tdo acting on $\Phi(E)$ is isomorphic to $\mathcal{D}_{(\det\Phi(E))^{1/r'}}$, where $r' = \mathrm{rk}\,\Phi(E) = \chi(X, E)$. Considering classes of these dual tdo's and using the isomorphism (7.3.1) applied to $\mu = (1/r)\phi_{\det E}$, we get the above formula.

8.1
The following two natural questions arise:
(1) For every $\mu \in NS(X)$, does there exist a vector bundle $E$ on $X$ that is a module over $\mathcal{D}_\mu$?
(2) Which vector bundles on an abelian variety admit integrable projective connections?

To answer these questions we use the following construction. Let $\pi : X_1 \to X_2$ be an isogeny of abelian varieties, and let $E$ be a vector bundle with an integrable projective connection on $X_1$. Then there is a canonical integrable projective connection on $\pi_* E$. Indeed, the simplest way to see this is to use Fourier duality. If $E$ is a module over some tdo $\mathcal{D}_\lambda$ on $X_1$, then $\Phi(E)$ is a module over the dual D-algebra $\Phi(\mathcal{D}_\lambda)$ on $Y_1$. Now we use the formula $\pi_* E \simeq \Phi^{-1}\hat{\pi}^*(\Phi(E))$, where $\Phi^{-1}$ is the inverse Fourier transform on $X_2$; hence $\pi_* E$ is a module over $\Phi^{-1}\hat{\pi}^*\Phi(\mathcal{D}_\lambda)$, which is a tdo on $X_2$. In particular, the pushforwards of line bundles under isogenies have canonical integrable projective connections. Also, it is clear that if $E$ is a vector bundle with an integrable projective connection and $F$ is a flat vector bundle, then $E \otimes F$ has a natural integrable projective connection.



Now we can answer the above questions.

THEOREM 8.2
For every $\mu \in NS(X)$ there exists a vector bundle $E$ that is a module over $\mathcal{D}_\mu$.

Proof
We can write $\mu = [L]/n$, where $n > 0$ is an integer and $[L]$ is a class of a line bundle $L$ on $X$. Let $[n]_A : A \to A$ be the endomorphism of multiplication by $n$. Then $[n]_A^*(\mu) \in NS(X)$ is represented by a line bundle $L'$. Now we claim that the pushforward $[n]_{A,*} L'$ has the structure of a module over $\mathcal{D}_\mu$. Indeed, it suffices to check that $c_1([n]_{A,*} L')/\deg([n]_A) = \mu$. Let $\mathrm{Nm}_n : NS(X) \to NS(X)$ be the norm homomorphism corresponding to the isogeny $[n]_A$. Then the left-hand side of the above equality is $\mathrm{Nm}_n([L'])/\deg([n]_A)$. Hence the pullback of the left-hand side by $[n]_A$ is equal to $[L'] = [n]_A^*(\mu)$, which implies our claim.

THEOREM 8.3
Let $E$ be an indecomposable vector bundle with an integrable projective connection on an abelian variety $X$. Then there exist an isogeny of abelian varieties $\pi : X' \to X$, a line bundle $L$ on $X'$, and a flat bundle $F$ on $X$, such that $E \simeq \pi_* L \otimes F$.

Proof
The main idea is to analyze the sheaf of algebras $\mathcal{A} = \mathrm{End}(E)$. Namely, $\mathcal{A}$ has a flat connection such that the multiplication is covariantly constant. In other words, it corresponds to a representation of the fundamental group $\pi_1(X)$ in automorphisms of the matrix algebra. Since all such automorphisms are inner, we get a homomorphism $\rho : \pi_1(X) \to \mathrm{PGL}(E_0)$, where $E_0$ is the fiber of $E$ at zero. Now the central extension $\mathrm{SL}(E_0) \to \mathrm{PGL}(E_0)$ induces a central extension of $\pi_1(X) = \mathbb{Z}^{2g}$ by the group of roots of unity of order $\mathrm{rk}\,E$. This central extension splits on some subgroup of finite index $H \subset \pi_1(X)$. In other words, the restriction of $\rho$ to $H$ lifts to a homomorphism $\rho_H : H \to \mathrm{GL}(E_0)$. Let $\pi : \tilde{X} \to X$ be an isogeny corresponding to $H$, so that $\tilde{X}$ is an abelian variety with $\pi_1(\tilde{X}) = H$. Then $\rho_H$ defines a flat bundle $\tilde{F}$ on $\tilde{X}$ such that $\pi^*\mathcal{A} \simeq \mathrm{End}(\tilde{F})$ as algebras with connections. It follows that $\pi^* E \simeq L \otimes \tilde{F}$ for some line bundle $L$ on $\tilde{X}$. Thus $E$ is a direct summand of $\pi_*(L \otimes \tilde{F})$. Note that there exists a flat bundle $F$ on $X$ such that $\tilde{F} \simeq \pi^* F$. (Again, the simplest way to see this is to use the Fourier duality.) Hence $E$ is a direct summand of $\pi_* L \otimes F$. It remains to check that all indecomposable summands of the latter bundle have the same form. This follows from the following lemma.

LEMMA 8.4
Let $\pi : X_1 \to X_2$ be an isogeny of abelian varieties, let $L$ be a line bundle on $X_1$, and let $F$ be an indecomposable flat bundle on $X_1$. Assume that $\pi_*(L \otimes F)$ is decomposable. Then there exists a nontrivial factorization of $\pi$ into a composition

$$X_1 \xrightarrow{\ \pi'\ } X_1' \to X_2$$

such that $L \simeq (\pi')^* L'$ for some line bundle $L'$ on $X_1'$.

Proof
By adjunction and the projection formula we have

$$\mathrm{End}(\pi_*(L \otimes F)) \simeq \mathrm{Hom}(\pi^*\pi_*(L \otimes F), L \otimes F) \simeq \bigoplus_{x \in K} \mathrm{Hom}(t_x^* L \otimes F, L \otimes F),$$

where $K \subset X_1$ is the kernel of $\pi$. If $t_x^* L \simeq L$ for some $x \in K$, $x \neq 0$, then $L$ descends to a line bundle on the quotient of $X_1$ by the subgroup generated by $x$. Otherwise we get $\mathrm{End}(\pi_*(L \otimes F)) \simeq \mathrm{End}(F)$; hence $\pi_*(L \otimes F)$ is indecomposable.

References

[BB] A. BEILINSON and J. BERNSTEIN, "A proof of Jantzen conjectures" in I. M. Gelfand Seminar, Adv. Soviet Math. 16, Part 1, Amer. Math. Soc., Providence, 1993, 1–50. MR 95a:22022
[H1] R. HARTSHORNE, Residues and Duality, Lecture Notes in Math. 20, Springer, Berlin, 1966. MR 36:5145
[H2] R. HARTSHORNE, Algebraic Geometry, Grad. Texts in Math. 52, Springer, New York, 1977. MR 57:3116
[K] M. KAPRANOV, Noncommutative geometry based on commutator expansions, J. Reine Angew. Math. 505 (1998), 73–118. MR 2000b:14003
[Kr1] I. M. KRICHEVER, Algebraic-geometric construction of the Zaharov-Sabat equations and their periodic solutions, Soviet Math. Dokl. 17 (1976), 394–397.
[Kr2] I. M. KRICHEVER, Integration of nonlinear equations by the methods of nonlinear geometry (in Russian), Funk. Anal. i Pril. 11 (1977), 15–31.
[L] G. LAUMON, Transformation de Fourier généralisée, preprint, arXiv:math.alg-geom/9603004
[Mc] K. MACKENZIE, Lie Groupoids and Lie Algebroids in Differential Geometry, London Math. Soc. Lecture Note Ser. 124, Cambridge Univ. Press, Cambridge, 1987. MR 89g:58225
[M] S. MUKAI, Duality between $D(X)$ and $D(\hat{X})$ with its application to Picard sheaves, Nagoya Math. J. 81 (1981), 153–175. MR 82f:14036
[R] M. ROTHSTEIN, Sheaves with connection on abelian varieties, Duke Math. J. 84 (1996), 565–598, MR 98i:14044a; Correction, Duke Math. J. 87 (1997), 205–211. MR 98i:14044b

Polishchuk
Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA; current: Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA

Rothstein
Department of Mathematics, University of Georgia, Athens, Georgia 30602, USA


LOW-LYING ZEROS OF L-FUNCTIONS AND RANDOM MATRIX THEORY
MICHAEL RUBINSTEIN

Abstract
By looking at the average behavior (n-level density) of the low-lying zeros of certain families of L-functions, we find evidence, as predicted by function field analogs, in favor of a spectral interpretation of the nontrivial zeros in terms of the classical compact groups.

1. Introduction
In this paper, a connection is made between the low-lying zeros of L-functions and the eigenvalues of large matrices from the classical compact groups. The Langlands program (see [2], [10], [7]) predicts that all L-functions can be written as products of $\zeta(s)$ and L-functions attached to automorphic cuspidal representations of $\mathrm{GL}_M$ over $\mathbb{Q}$. Such an L-function is given initially (for $\Re s$ sufficiently large) as an Euler product of the form

$$L(s, \pi) = \prod_p L(s, \pi_p) = \prod_p \prod_{j=1}^{M} \left(1 - \alpha_\pi(p, j)\, p^{-s}\right)^{-1}. \tag{1.1}$$

Basic properties of such L-functions are described in [15]. The L-functions that arise in the $M = 1$ case are the Riemann zeta-function $\zeta(s)$ and Dirichlet L-functions $L(s, \chi)$, $\chi$ a primitive character. For $M = 2$, the L-functions in question are associated to cusp forms or Maass forms of congruence subgroups of $\mathrm{SL}_2(\mathbb{Z})$. The Riemann hypothesis (RH) for $L(s, \pi)$ asserts that the nontrivial zeros of $L(s, \pi)$, $\{1/2 + i\gamma_\pi\}$, all have $\gamma_\pi \in \mathbb{R}$. (Our L-functions are always normalized so that the critical line is through $\Re s = 1/2$.) A vague suggestion of G. Pólya and D. Hilbert points to an approach that one might take in establishing RH. They hypothesized (for $\zeta(s)$) that one might be able to associate the nontrivial zeros of $\zeta$ to the eigenvalues of some operator acting on some Hilbert space, thus (depending on the properties of the operator) forcing the zeros to lie on a line.

DUKE MATHEMATICAL JOURNAL, © 2001, Vol. 109, No. 1. Received 11 February 2000. Revision received 25 September 2000. 2000 Mathematics Subject Classification. Primary 11M26; Secondary 15A52.
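The simplest instance of the Euler product (1.1) is the Riemann zeta-function, where $M = 1$ and every $\alpha_\pi(p, 1)$ equals 1. The following minimal sketch truncates that product at a finite prime bound and compares it with the known value $\zeta(2) = \pi^2/6$. The function names `primes_up_to` and `truncated_euler_product` are illustrative choices, not notation from the paper, and the sketch only makes sense for real $s > 1$, where the product converges.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes p <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if sieve[n]:
            # Cross out the multiples of n starting at n*n.
            count = len(range(n * n, limit + 1, n))
            sieve[n * n :: n] = bytearray(count)
    return [n for n in range(limit + 1) if sieve[n]]


def truncated_euler_product(s, limit):
    """Product over primes p <= limit of (1 - p^(-s))^(-1), for real s > 1.

    This is the M = 1, alpha = 1 case of an Euler product, i.e. zeta(s).
    """
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


if __name__ == "__main__":
    approx = truncated_euler_product(2.0, 10_000)
    print(approx, math.pi ** 2 / 6)
```

Every omitted factor exceeds 1, so the truncated product approaches $\zeta(s)$ from below as the prime bound grows.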


The first evidence in favor of this approach was obtained by H. Montgomery [9], who derived (under certain restrictions) the pair correlation of the zeros of $\zeta(s)$. Together with an observation of Freeman Dyson, who pointed out that the Gaussian Unitary Ensemble (GUE), consisting of $N \times N$ random Hermitian matrices (see [8] for a more precise definition), has the same pair correlation (as $N \to \infty$), it seems to suggest that the relevant operator, at least for $\zeta(s)$, might be Hermitian. Extensive computations of A. Odlyzko [11], [12] further seem to bolster the Hermitian nature of the zeros of $\zeta(s)$, as might the work of Z. Rudnick and P. Sarnak [15], where, under certain restrictions, the n-level correlations of $\zeta(s)$ and $L(s, \pi)$ are found to be the same as those of the GUE.

However, recent developments suggest that, rather than being Hermitian, the relevant operators for L-functions belong to the classical compact groups. (This is consistent with the above work of Montgomery, Odlyzko, and Rudnick and Sarnak since all the classical compact groups have the same n-level correlations as the GUE (as $N \to \infty$).) First, analogs with function field zeta-functions, where there is a spectral interpretation of the zeros in terms of Frobenius on cohomology, point towards the classical compact groups (see [6]). Second, even though all the mentioned families of matrices have the same n-level correlations, there is another statistic, called n-level density, which is sensitive to the particular family. By looking at this statistic for zeros of L-functions, one finds the fingerprints of the classical compact groups. For $n = 1$ this was done, for quadratic twists of $\zeta(s)$, and with certain restrictions, by A. Özlük and C. Snyder [13]. A stronger result (which takes into account certain nondiagonal contributions and allows one to choose test functions whose Fourier transform is supported in $(-2/M, 2/M)$) which applies for $\zeta(s)$ as well as all $L(s, \pi)$ was obtained by N. Katz and Sarnak [6]. The general case, $n \geq 1$, is worked out (again, with some restrictions) in this paper.

2. n-level density
For an $(N \times N)$-matrix $A$ in one of the classical compact groups, write its eigenvalues as

$$\lambda_j = e^{i\theta_j}, \quad\text{with } 0 \leq \theta_1 \leq \cdots \leq \theta_N < 2\pi. \tag{2.1}$$

Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is bounded, Borel measurable, and compactly supported. Then, letting

$$H^{(n)}(A, f) = \sum_{\substack{1 \leq j_1, \ldots, j_n \leq N \\ \text{distinct}}} f\big(\theta_{j_1} N/(2\pi), \ldots, \theta_{j_n} N/(2\pi)\big),$$

Katz and Sarnak [5, Appendix] obtain the following family dependent result:

$$\lim_{N \to \infty} \int_{G(N)} H^{(n)}(A, f)\, dA = \int_{\mathbb{R}^n_{\geq 0}} W_G^{(n)}(x) f(x)\, dx \tag{2.2}$$


for the following families:

$$\begin{array}{ll}
G & W_G^{(n)} \\ \hline
\mathrm{U}(N),\ \mathrm{U}_\kappa(N) & \det\big(K_0(x_j, x_k)\big)_{1 \leq j \leq n,\ 1 \leq k \leq n} \\
\mathrm{USp}(N) & \det\big(K_{-1}(x_j, x_k)\big)_{1 \leq j \leq n,\ 1 \leq k \leq n} \\
\mathrm{SO}(2N),\ \mathrm{O}^-(2N+1) & \det\big(K_1(x_j, x_k)\big)_{1 \leq j \leq n,\ 1 \leq k \leq n} \\
\mathrm{SO}(2N+1),\ \mathrm{O}^-(2N) & \det\big(K_{-1}(x_j, x_k)\big)_{1 \leq j \leq n,\ 1 \leq k \leq n} + \sum_{\nu=1}^{n} \delta(x_\nu) \det\big(K_{-1}(x_j, x_k)\big)_{1 \leq j \neq \nu \leq n,\ 1 \leq k \neq \nu \leq n}
\end{array}$$

with

$$K_\varepsilon(x, y) = \frac{\sin(\pi(x - y))}{\pi(x - y)} + \varepsilon\, \frac{\sin(\pi(x + y))}{\pi(x + y)}.$$

In the above, $dA$ is the Haar measure on $G(N)$ (normalized so that $\int_{G(N)} dA = 1$), and

$$\mathrm{U}_\kappa(N) = \{A \in \mathrm{U}(N) : \det{}^\kappa(A) = 1\},$$
$$\mathrm{SO}(N) = \{A \in \mathrm{O}(N) : \det A = 1\},$$
$$\mathrm{O}^-(N) = \{A \in \mathrm{O}(N) : \det A = -1\}.$$

The delta functions in the $\mathrm{SO}(2N+1)$, $\mathrm{O}^-(2N)$ case are accounted for by the eigenvalue $\lambda_1 = 1$. (Notice, for $\mathrm{O}(N)$, that $\lambda = 1$ is an eigenvalue if $N$ is even and $\det A = -1$ (i.e., $A \in \mathrm{O}^-(2N)$) or if $N$ is odd and $\det A = 1$ (i.e., $A \in \mathrm{SO}(2N+1)$).) Removing this zero from (2.2) would yield the same $W_G^{(n)}$ as for USp. For ease of notation, we refer to the third $W_G^{(n)}$ above (i.e., $\det(K_1(x_j, x_k))$) as the scaling density of $\mathrm{O}^+$ and the fourth $W_G^{(n)}$ as the scaling density of $\mathrm{O}^-$. (We use this notation because the former comes from orthogonal matrices with even functional equations $p(z) = z^N p(1/z)$, while the latter comes from orthogonal matrices with odd functional equations $p(z) = -z^N p(1/z)$.)

One could also form a similar statistic for the eigenvalues of the GUE (where we would normalize the eigenvalues according to the Wigner semicircle law), and one would obtain the same answer (as $N \to \infty$) as for $\mathrm{U}(N)$.

The function $W_G^{(n)}(x)$ is called the n-level scaling density of the group $G(N)$, and its nonuniversality can be used to detect which group lies behind which family of L-functions. Notice that the normalization by $N/(2\pi)$ is such that the mean spacing is 1 and that only the low-lying eigenvalues (those with $\theta \leq c/N$ for some constant $c$) contribute to $H^{(n)}(A, f)$. So, (2.2) measures how the low-lying eigenvalues of matrices in $G(N)$ fall near the point 1 (as $N \to \infty$).
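To see concretely how the scaling densities distinguish the families, the determinantal expressions built from $K_\varepsilon$ can be evaluated numerically. The sketch below is illustrative only: the names `sinc`, `kernel`, `determinant`, and `scaling_density` are hypothetical helpers, and for the $\mathrm{SO}(2N+1)$, $\mathrm{O}^-(2N)$ row it computes only the continuous part $\det(K_{-1})$, not the delta-function terms.

```python
import math


def sinc(t):
    """sin(pi t)/(pi t), extended by continuity to the value 1 at t = 0."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def kernel(x, y, eps):
    """K_eps(x, y) with eps = 0 (unitary), -1 (symplectic), +1 (O^+)."""
    return sinc(x - y) + eps * sinc(x + y)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def scaling_density(xs, eps):
    """det(K_eps(x_j, x_k)) over the points xs: the continuous part of W_G^(n)."""
    return determinant([[kernel(xj, xk, eps) for xk in xs] for xj in xs])


if __name__ == "__main__":
    for name, eps in (("U", 0), ("USp", -1), ("O+", 1)):
        print(name, [round(scaling_density([x], eps), 4) for x in (0.0, 0.25, 0.5, 1.0)])
```

For $n = 1$ the three kernels give $1$, $1 - \sin(2\pi x)/(2\pi x)$, and $1 + \sin(2\pi x)/(2\pi x)$, so the families already differ in how the 1-level density behaves near $x = 0$.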


3. Results
In this section, we consider the analog of (2.2) for the zeros of families of L-functions. One looks at the average behavior of the low-lying nontrivial zeros (i.e., those close to the real axis) of a family of L-functions hoping to find evidence (as predicted by function field analogs (see [6])) in favor of a spectral interpretation in terms of the classical compact groups.

Indeed, if we take quadratic twists of $\zeta(s)$, $\{L(s, \chi_d)\}$, as our family of L-functions, where $\chi_d(n) = \left(\frac{d}{n}\right)$ is Kronecker's symbol and we restrict ourselves to primitive $\chi_d$, we find evidence of a $\mathrm{USp}(\infty)$ symmetry. This is Theorem 3.1.

More generally, we take a self-contragredient automorphic cuspidal representation of $\mathrm{GL}_M$ over $\mathbb{Q}$, $\pi = \tilde{\pi}$, that is, one whose L-function has real coefficients, $\alpha_\pi(p, j) \in \mathbb{R}$, and we look at the family of quadratic twists, $\{L(s, \pi \otimes \chi_d)\}$. The low-lying zeros of this family behave as if they are coming either from $\mathrm{USp}(\infty)$ or from $\mathrm{O}^\pm(\infty)$. (Here the $\pm$ is to indicate that we need to consider separately the $L(s, \pi \otimes \chi_d)$'s with even (resp., odd) functional equations.) We describe this result in Theorem 3.2. It confirms the connection to the classical compact groups, and it gives an answer that cannot be confused with the corresponding statistic for the GUE. Numerical experiments that further support the connection to classical compact groups are described in the author's thesis [14] and in Katz and Sarnak [6].

3.1. Main theorem
Write the nontrivial zeros of $L(s, \chi_d)$ as $1/2 + i\gamma_d^{(j)}$, where

(1)

0 ≤ α. But, by the support condition on j Q P ˆ ˆ f (v ), f (v ) = 0 if i i i i i=1 i∈∪ F` i∈∪ F` |vi | > α. Hence (3.16) is zero if j j Pk u > α; thus the claim is proved. j j=1 So, if Qn

j=1


CLAIM 2
Suppose that $\prod_{i=1}^{n} \hat{f}_i(u_i)$ is supported in $\sum_{i=1}^{n} |u_i| \leq \alpha < 1$. Then

$$\lim_{X \to \infty} \frac{1}{|D(X)|} \sum_{d \in D(X)} \left(\frac{-2}{\log X}\right)^{k} \sum_{\substack{m_i \geq 1,\ i = 1, \ldots, k \\ m_1 \cdots m_k \neq \square}} \prod_{j=1}^{k} \frac{\Lambda(m_j)}{m_j^{1/2}}\, \chi_d(m_j)\, \hat{F}_{\ell_j}\!\left(\frac{\log m_j}{\log X}\right) = 0. \tag{3.17}$$

Here we are summing over all k-tuples $(m_1, \ldots, m_k)$ of positive integers with $\prod_1^k m_i \notin \{1, 4, 9, 16, \ldots\}$, and $S = \{\ell_1, \ldots, \ell_k\}$.

Remark. This claim tells us that the only contributions to (3.14) come from perfect squares. (This is dealt with in Claim 3.)

Proof
Changing the order of summation and applying Claim 1 and the Cauchy-Schwarz inequality, we find that the l.h.s. of (3.17) is

$$\ll \lim_{X \to \infty} \frac{1}{|D(X)|} \frac{1}{\log^k X} \left( \sum_{\substack{m_i \geq 1 \\ \sum \log m_i \leq \alpha \log X \\ m_1 \cdots m_k \neq \square}} \frac{\Lambda^2(m_1) \cdots \Lambda^2(m_k)}{m_1 \cdots m_k} \right)^{1/2} \cdot \left( \sum_{\substack{m_i \geq 1 \\ \sum \log m_i \leq \alpha \log X \\ m_1 \cdots m_k \neq \square}} \left( \sum_{d \in D(X)} \chi_d(m_1 \cdots m_k) \right)^{2} \right)^{1/2}. \tag{3.18}$$

The first bracketed term is less than

$$\left( \sum_{m \leq X^\alpha} \frac{\Lambda^2(m)}{m} \right)^{k/2} \ll \log^k X. \tag{3.19}$$

Next, the number of times we may write $m = m_1 \cdots m_k$, $m_i \geq 1$, is $O(\sigma_0^{k-1}(m)) = O_\varepsilon(m^\varepsilon)$ for any $\varepsilon > 0$ ($\sigma_0(m)$ being the number of divisors of $m$), so that the second bracketed term is

$$\ll_\varepsilon X^\varepsilon \left( \sum_{m \leq X^\alpha} \left( \sum_{d \in D(X)} \chi_d(m) \right)^{2} \right)^{1/2}. \tag{3.20}$$

Applying the methods of M. Jutila [4], we find that the above is

$$\ll_\varepsilon \left( X^{\varepsilon + 1 + \alpha} \log^A X \right)^{1/2}$$

for some constant $A$ ($A = 10$ is admissible), which, combined with (3.19), shows that (3.18) is

$$\ll_\varepsilon \lim_{X \to \infty} \frac{1}{|D(X)|} X^{\varepsilon + (1 + \alpha)/2}.$$

But, for $\varepsilon$ small enough, this limit equals zero (because $|D(X)| \sim cX$ for some constant $c$, and we are assuming $\alpha < 1$).

CLAIM 3
We have

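$$\lim_{X \to \infty} \frac{1}{|D(X)|} \sum_{d \in D(X)} \left(\frac{-2}{\log X}\right)^{k} \sum_{\substack{m_i \geq 1 \\ m_1 \cdots m_k = \square}} \prod_{j=1}^{k} \frac{\Lambda(m_j)}{m_j^{1/2}}\, \chi_d(m_j)\, \hat{F}_{\ell_j}\!\left(\frac{\log m_j}{\log X}\right)$$
$$= \sum_{\substack{S_2 \subseteq S \\ |S_2| \text{ even}}} \left( \left(\frac{-1}{2}\right)^{|S_2^c|} \prod_{\ell \in S_2^c} \int_{\mathbb{R}} \hat{F}_\ell(u)\, du \right) \cdot \left( \sum_{(A;B)} 2^{|S_2|/2} \prod_{j=1}^{|S_2|/2} \int_{\mathbb{R}} |u|\, \hat{F}_{a_j}(u) \hat{F}_{b_j}(u)\, du \right). \tag{3.21}$$

Here we are summing over all k-tuples $(m_1, \ldots, m_k)$ of positive integers with $\prod_1^k m_i \in \{1, 4, 9, 16, \ldots\}$.

Proof
First, the $\Lambda(m_i)$'s restrict us to prime powers, $m_i = p_i^{e_i}$, so the only way that $\prod_1^k m_i$ can equal a perfect square is if some of the $e_i$'s are even, and the rest of the $p_i^{e_i}$'s match up to produce squares. We can focus our attention on $e_i = 1$ or $2$ since the sum over $e_i \geq 3$ contributes zero as $X \to \infty$.

Also, note, in (3.21), that $\chi_d(\prod_1^k m_i) = 1$ since $\prod_1^k m_i$ is restricted to perfect squares. Hence the l.h.s. of (3.21) is

$$\lim_{X \to \infty} \sum_{\substack{S_2 \subseteq S \\ |S_2| \text{ even}}} \left( \sum_{\substack{p_\ell,\ \ell \in S_2 \\ \prod_{\ell \in S_2} p_\ell = \square}} \left(\frac{-2}{\log X}\right)^{|S_2|} \prod_{i \in S_2} \frac{\log(p_i)}{p_i^{1/2}}\, \hat{F}_i\!\left(\frac{\log p_i}{\log X}\right) \right) \cdot \left( \sum_{p_\ell,\ \ell \in S_2^c} \left(\frac{-2}{\log X}\right)^{|S_2^c|} \prod_{i \in S_2^c} \frac{\log(p_i)}{p_i}\, \hat{F}_i\!\left(\frac{2\log p_i}{\log X}\right) \right).$$

(We have dropped the $(1/|D(X)|) \sum_{d \in D(X)}$ since the terms in the sum do not depend on $d$.) The sum over $\ell \in S_2$ corresponds to the $e_\ell$'s that are equal to 1 (and that pair up to produce squares), while the sum over $\ell \in S_2^c$ corresponds to the $e_\ell$'s that are equal to 2.

To complete the proof of this claim and hence of Lemma 1, we establish the two subclaims below.

SUBCLAIM 3.1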

We have X −2 | S2c | Y log( pi ) 2 log pi lim Fî X →∞ log X pi log X c p` i∈S2

`∈S2c

=

c Z −1 | S2 | Y Fˆ` (u) du. 2 c R

(3.22)

`∈S2

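Proof
The l.h.s. of (3.22) factors
$$\prod_{\ell\in S_2^c}\left(\frac{-2}{\log X}\sum_{p}\frac{\log(p)}{p}\,\hat F_\ell\!\left(\frac{2\log p}{\log X}\right)\right),$$
which, summing by parts, equals
$$\prod_{\ell\in S_2^c}\frac{2}{\log X}\int_1^\infty\left(\sum_{p\le t}\frac{\log(p)}{p}\right)\left(\hat F_\ell\!\left(\frac{2\log t}{\log X}\right)\right)'dt.$$
The sum $\sum_{p\le t}\log(p)/p$ can be evaluated elementarily (see [3, p. 22]), and the above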

LOW-LYING ZEROS


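becomes
$$\prod_{\ell\in S_2^c}\frac{2}{\log X}\int_1^\infty(\log t+O(1))\left(\hat F_\ell\!\left(\frac{2\log t}{\log X}\right)\right)'dt=\prod_{\ell\in S_2^c}\left(\frac{-2}{\log X}\int_1^\infty\hat F_\ell\!\left(\frac{2\log t}{\log X}\right)\frac{dt}{t}+O\!\left(\frac{1}{\log X}\right)\right), \tag{3.23}$$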

(1) the last step from integration by parts, and using the fact that Fˆ` (u) is supported in |u| ≤ α. Changing variables u = 2 log t/ log X and noting that all the Fˆ` ’s are even (since all the f i ’s are), we thus find that the limit in (3.22) is

$$\left(\frac{-1}{2}\right)^{|S_2^c|}\prod_{\ell\in S_2^c}\int_{\mathbb R}\hat F_\ell(u)\,du.$$
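The elementary estimate used in this proof, $\sum_{p\le t}\log(p)/p=\log t+O(1)$, can be checked numerically. The following is only an illustrative sketch: the helper names `primes_up_to` and `mertens_gap` are ours, not the paper's, and the sample values of $t$ are arbitrary.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes <= n (n >= 2 assumed)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def mertens_gap(t):
    """Return sum_{p <= t} log(p)/p - log(t), which should stay bounded."""
    return sum(math.log(p) / p for p in primes_up_to(t)) - math.log(t)

# Hypothetical sample points; the gap should remain O(1) as t grows.
gaps = {t: mertens_gap(t) for t in (10**2, 10**3, 10**4, 10**5)}
for t, gap in gaps.items():
    print(t, round(gap, 4))
```

The printed gaps stay bounded while $\log t$ grows, which is all that the summation by parts in (3.23) requires.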

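SUBCLAIM 3.2
We have
$$\lim_{X\to\infty}\left(\frac{-2}{\log X}\right)^{|S_2|}\sum_{\substack{p_\ell,\ \ell\in S_2\\ \prod_{\ell\in S_2}p_\ell=\square}}\ \prod_{i\in S_2}\frac{\log(p_i)}{p_i^{1/2}}\,\hat F_i\!\left(\frac{\log p_i}{\log X}\right)=\sum_{(A;B)}2^{|S_2|/2}\prod_{j=1}^{|S_2|/2}\int_{\mathbb R}|u|\,\hat F_{a_j}(u)\hat F_{b_j}(u)\,du. \tag{3.24}$$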

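Proof
In (3.24), $\prod_{\ell\in S_2}p_\ell=\square$ implies that the $p_\ell$’s pair up to produce squares. So, the l.h.s. of (3.24) equals
$$\lim_{X\to\infty}\sum_{(A;B)}\ \sum_{\substack{p_i\\ i=1,\ldots,|S_2|/2}}\ \prod_{j=1}^{|S_2|/2}\frac{4\log^2(p_j)}{p_j\log^2(X)}\,\hat F_{a_j}\!\left(\frac{\log p_j}{\log X}\right)\hat F_{b_j}\!\left(\frac{\log p_j}{\log X}\right). \tag{3.25}$$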

The sum over (A; B) accounts for all ways of pairing up primes in (3.24). Note that there is a bit of overlap produced in (3.25), but this overlap contributes zero as $X\to\infty$. For example, if $S_2=\{1,2,5,7\}$, then the three ways of pairing up $p_1,p_2,p_5,p_7$ are: $p_1=p_5$ and $p_2=p_7$, $p_1=p_7$ and $p_2=p_5$, $p_1=p_2$ and $p_5=p_7$. So the sum over $p_1=p_2=p_5=p_7$ is counted three times in (3.25), whereas it is counted only once in the l.h.s. of (3.24). Such diagonal sums do not bother us since there are $O_k(1)$ such sums, and a typical $p_{j_1}=p_{j_2}=\cdots=p_{j_{2r}}$, $r\ge 2$, contributes to (3.25) a term with a factor that is
$$\ll\lim_{X\to\infty}\frac{1}{\log^{2r}X}\sum_{p}\frac{\log^{2r}p}{p^{r}}=0.$$
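The pairings (A; B) of $S_2$ that index the sum in (3.24) and (3.25) can be enumerated directly. The sketch below is ours (the function name `pairings` is hypothetical) and simply lists every way of splitting a set of even size into unordered pairs; for $S_2=\{1,2,5,7\}$ it produces the three pairings named in the text.

```python
def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        # Pair `first` with `partner`, then pair up whatever is left.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

S2 = [1, 2, 5, 7]
all_pairings = list(pairings(S2))
for pairing in all_pairings:
    print(pairing)
```

A set of size $2r$ has $(2r-1)!!$ pairings, so the number of terms in the sum over (A; B) depends only on $|S_2|$.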

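Now, (3.25) can be written as
$$\lim_{X\to\infty}\sum_{(A;B)}\prod_{j=1}^{|S_2|/2}\left(\sum_{p}\frac{4\log^2(p)}{p\log^2(X)}\,\hat F_{a_j}\!\left(\frac{\log p}{\log X}\right)\hat F_{b_j}\!\left(\frac{\log p}{\log X}\right)\right).$$
Summing by parts, we find that the bracketed term is
$$4\int_0^\infty u\,\hat F_{a_j}(u)\hat F_{b_j}(u)\,du+O(1/\log X).$$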

Recalling that the $\hat F$’s are even, we obtain the subclaim. We thus obtain Claim 3 and Lemma 1.

LEMMA 2
Let
$$a_\ell(d)=\sum_{\gamma_d^{(j)}}F_\ell\!\left(L\gamma_d^{(j)}\right),$$
where $F_\ell(x)=\prod_{i\in F_\ell}f_i(x)$, and $f_i$ is as in Theorem 3.1. Then
$$\lim_{X\to\infty}\frac{1}{|D(X)|}\sum_{d\in D(X)}\prod_{\ell=1}^{\nu(F)}a_\ell(d)=\lim_{X\to\infty}\frac{1}{|D(X)|}\sum_{d\in D(X)}\prod_{\ell=1}^{\nu(F)}\left(a_\ell(d)+O(1/\log X)\right).$$

Remark. This lemma justifies dropping the $O(1/\log X)$ when plugging (3.12) into (3.10).

Proof
The proof is by induction. We consider
$$\lim_{X\to\infty}\frac{1}{|D(X)|}\sum_{d\in D(X)}\prod_{\ell=1}^{k}\left(a_\ell(d)+O(1/\log X)\right) \tag{3.26}$$
for $k=1,2,\ldots,\nu(F)$. When $k=1$, this clearly equals
$$\lim_{X\to\infty}\frac{1}{|D(X)|}\sum_{d\in D(X)}a_\ell(d).$$
Now, consider the general case. Multiplying out the product in (3.26), we get
$$\lim_{X\to\infty}\frac{1}{|D(X)|}\sum_{d\in D(X)}\prod_{\ell=1}^{k}a_\ell(d)+\text{remainder},$$
where the remainder consists of $2^k-1$ terms, each of which is of the form
$$O\!\left(\frac{1}{\log^{r}(X)}\,\frac{1}{|D(X)|}\sum_{d\in D(X)}\prod_{j=1}^{k_2}\left|a_{\ell_j}(d)\right|\right) \tag{3.27}$$

with $r\ge 1$, $k_2<k$. Now, if $F_\ell(x)\ge 0$ for all $x$, then $|a_{\ell_j}(d)|=a_{\ell_j}(d)$, and, by our inductive hypothesis combined with Lemma 1, the O-term above tends to zero as $X\to\infty$. If $F_\ell(x)$ is not greater than or equal to zero for all $x$, we can show that the O-term in (3.27) tends to zero as $X\to\infty$ by replacing each $f_i(x)$ ($i=1,\ldots,n$) with a function $g_i(x)$, which is positive and bigger in absolute value than $f_i(x)$, and which satisfies the conditions of Theorem 3.1; that is, we require that
• $g_i(x)\ge|f_i(x)|$,
• $g_i(x)$ be even and in $S(\mathbb R)$,
• $\prod_{i=1}^n\hat g_i(u_i)$ be supported in $\sum_{i=1}^n|u_i|<1$.
That there exist $g_i$’s satisfying the required conditions can be seen as follows. Let
$$h(t)=\begin{cases}K\exp(-1/(1-t^2)),&|t|<1,\\ 0,&|t|\ge 1,\end{cases}$$
where $K$ is chosen so that
$$\int_{-1}^{1}h(t)\,dt=1,$$
let
$$\theta_\beta(t)=\frac{1}{\beta}h(t/\beta) \tag{3.28}$$
(so that $\theta_\beta$ approximates the δ-function when β is small), and consider
$$\Psi_\beta(x)=(\theta_\beta*\theta_\beta)\hat{\ }(x)=(\hat\theta_\beta(x))^2. \tag{3.29}$$


Now
$$\hat\theta_\beta(x)=\frac{1}{\beta}\int_{-\beta}^{\beta}h(t/\beta)\cos(2\pi xt)\,dt=\int_{-1}^{1}h(u)\cos(2\pi\beta ux)\,du. \tag{3.30}$$
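The definitions (3.28) to (3.30) are easy to explore numerically. The following is a rough sketch, not part of the original argument: it uses a midpoint rule (our choice of quadrature), the helper names are ours, and β = 0.1 is an arbitrary sample value. It evaluates $\hat\theta_\beta$ through (3.30) on the range $|x|\le 1/(8\beta)$.

```python
import math

def h_unnormalized(t):
    """The bump exp(-1/(1 - t^2)) on |t| < 1, zero elsewhere."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

def integrate(func, a, b, steps=20000):
    """Composite midpoint rule on [a, b] (an assumed, simple quadrature)."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

# K normalizes h so that its integral over [-1, 1] is 1.
K = 1.0 / integrate(h_unnormalized, -1.0, 1.0)

def h(t):
    return K * h_unnormalized(t)

def theta_hat(beta, x):
    """Right-hand side of (3.30): integral of h(u) cos(2 pi beta u x) du."""
    return integrate(lambda u: h(u) * math.cos(2 * math.pi * beta * u * x), -1.0, 1.0)

def psi(beta, x):
    """Psi_beta(x) = theta_hat(beta, x)^2, as in (3.29)."""
    return theta_hat(beta, x) ** 2

beta = 0.1  # hypothetical sample value
for x in (0.0, 0.5, 1.0, 1.25):  # 1.25 = 1/(8*beta)
    print(x, round(theta_hat(beta, x), 4), round(psi(beta, x), 4))
```

On this range the argument of the cosine is at most π/4 in absolute value, so the computed values of $\hat\theta_\beta$ stay above $\sqrt2/2$ and those of $\Psi_\beta$ above 1/2.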

But when $|x|\le 1/(8\beta)$, we have
$$\hat\theta_\beta(x)>\frac{\sqrt2}{2}\int_{-1}^{1}h(u)\,du=\frac{\sqrt2}{2}$$
(since, when $|x|\le 1/(8\beta)$, $|u|\le 1$, we get $|2\pi\beta ux|\le\pi/4$). Hence
$$\Psi_\beta(x)>1/2\quad\text{when }|x|\le 1/(8\beta)$$
(so $\Psi_\beta$ is bounded away from zero for long stretches when β is small), and, from (3.29), $\Psi_\beta(x)\ge 0$ for all $x$. Also, note that $\Psi_\beta$ is even and in $S(\mathbb R)$ (since $h(t)$ enjoys these properties), and note that $\hat\Psi_\beta(t)=(\theta_\beta*\theta_\beta)(t)$ is supported in $[-2\beta,2\beta]$. We use $\Psi_\beta(x)$’s to construct a $g_i(x)$ satisfying the three required properties. Let
$$M_f(c,d)=\max_{c\le|x|\le d}|f(x)|,$$
and let
$$\beta_j^{-1}=\begin{cases}2n+j,&j\ge 1,\\ 0,&j=0.\end{cases}$$
(The $j=0$ case is only for notational convenience.) Then
$$g_i(x)=2\sum_{j=0}^{\infty}M_{f_i}\!\left((8\beta_j)^{-1},(8\beta_{j+1})^{-1}\right)\Psi_{\beta_{j+1}}(x)$$
has the required properties.

3.3. r.h.s.
Our goal is to express
$$\int_{\mathbb R^n}f(x)\,W^{(n)}_{USp}(x)\,dx$$
in a manner that allows us to easily see how to match terms with (3.13).


We consider the more general
$$\int_{\mathbb R^n}f(x)\,W_\varepsilon(x)\,dx, \tag{3.31}$$
where $\varepsilon\in\{-1,1\}$ and
$$W_\varepsilon(x_1,\ldots,x_n)=\det\left(K_\varepsilon(x_j,x_k)\right)_{\substack{1\le j\le n\\ 1\le k\le n}},\qquad K_\varepsilon(x,y)=\frac{\sin(\pi(x-y))}{\pi(x-y)}+\varepsilon\,\frac{\sin(\pi(x+y))}{\pi(x+y)},$$
because it is needed when we study analogous questions for $GL_M/\mathbb Q$. Write
$$W_\varepsilon(x_1,\ldots,x_n)=\sum_{\sigma}\operatorname{sgn}(\sigma)\prod_{j=1}^{n}K_\varepsilon(x_j,x_{\sigma(j)}).$$
Here, σ is over all permutations of $n$ elements. Express σ as a product of disjoint cycles
$$\sigma\in\bigsqcup_{F}S^*(F_1)\times\cdots\times S^*(F_{\nu(F)}), \tag{3.32}$$
where $F$ is over set partitions of $\{1,\ldots,n\}$ (as in Section 3.2) and $S^*(F_\ell)$ denotes the set of all $(|F_\ell|-1)!$ cyclic permutations of the elements of $F_\ell$. Notice that $\operatorname{sgn}(\sigma)=\prod_{\ell=1}^{\nu(F)}(-1)^{|F_\ell|-1}$.

For example, if $n=7$ and $F=[\{1,3,4,6\},\{2,5,7\}]$, then $S^*(\{1,3,4,6\})\times S^*(\{2,5,7\})$ is the set of 12 permutations:
{(1 3 4 6)(2 5 7), (1 3 6 4)(2 5 7), (1 4 3 6)(2 5 7), (1 4 6 3)(2 5 7), (1 6 3 4)(2 5 7), (1 6 4 3)(2 5 7), (1 3 4 6)(2 7 5), (1 3 6 4)(2 7 5), (1 4 3 6)(2 7 5), (1 4 6 3)(2 7 5), (1 6 3 4)(2 7 5), (1 6 4 3)(2 7 5)}.

We are applying Parseval’s formula to (3.31), and thus we need to determine $\hat W_\varepsilon(u)$. So, for each cycle $(i_1,\ldots,i_m)$, we evaluate the Fourier transform
$$\int_{\mathbb R^m}K_\varepsilon(x_{i_1},x_{i_2})K_\varepsilon(x_{i_2},x_{i_3})\cdots K_\varepsilon(x_{i_m},x_{i_1})\,e^{2\pi i\sum_{j=1}^{m}u_{i_j}x_{i_j}}\,dx_{i_1}\cdots dx_{i_m}. \tag{3.33}$$

Expanding the product of $K_\varepsilon$’s, we obtain $2^m$ terms
$$\sum_{a}\varepsilon^{\beta(a)}\int_{\mathbb R^m}\frac{\sin(\pi(x_{i_1}-a_1x_{i_2}))}{\pi(x_{i_1}-a_1x_{i_2})}\cdots\frac{\sin(\pi(x_{i_m}-a_mx_{i_1}))}{\pi(x_{i_m}-a_mx_{i_1})}\cdot e^{2\pi i\sum_{j=1}^{m}u_{i_j}x_{i_j}}\,dx_{i_1}\cdots dx_{i_m}. \tag{3.34}$$
Here $a$ ranges over all $2^m$ m-tuples $(a_1,\ldots,a_m)$ with $a_j\in\{1,-1\}$, and $\beta(a)=\#\{j\mid a_j=-1\}$. According to Lemma 3, if $\sum|u_{i_j}|<1$, then (3.34) is
$$2^{m-2}\varepsilon+\sum_{c}\delta\Big(\sum_{j=1}^{m}c_ju_{i_j}\Big)\left(1-V(c_1u_{i_1},\ldots,c_mu_{i_m})\right), \tag{3.35}$$
where $c$ is over all $2^{m-1}$ m-tuples $(c_1,\ldots,c_m)$ with $c_j\in\{1,-1\}$, $c_m=1$, and where
$$V(y)=M(y)-m(y), \tag{3.36}$$
$$M(y)=\max\{s_k(y),\ k=1,\ldots,n\},\qquad m(y)=\min\{s_k(y),\ k=1,\ldots,n\},\qquad s_k(y)=\sum_{j=1}^{k}y_j.$$
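The quantity $V(y)$ of (3.36) is the spread of the partial sums of $y$. A minimal sketch (the function name `V` follows the text; the sample vector is arbitrary):

```python
from itertools import accumulate

def V(y):
    """V(y) = max_k s_k(y) - min_k s_k(y), with s_k(y) = y_1 + ... + y_k."""
    partial_sums = list(accumulate(y))
    return max(partial_sums) - min(partial_sums)

y = [0.2, -0.5, 0.1]           # partial sums: 0.2, -0.3, -0.2
print(V(y))                    # spread of the partial sums
print(V([-v for v in y]))      # same value, since V(-y) = V(y)
```

Negating $y$ swaps the roles of the maximum and the minimum of the partial sums, which gives the symmetry $V(-y)=V(y)$ used later in passing from (3.52) to (3.53).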

Applying Parseval’s formula to (3.31) and recalling the assumption that the support of $\prod_{i=1}^n\hat f_i(u_i)$ is in $\sum_{i=1}^n|u_i|<1$ (so in the integral below, we are restricted to the region where Lemma 3 applies), we find that (3.31) equals
$$\int_{\mathbb R^n}\Big(\prod_{i=1}^{n}du_i\,\hat f_i(u_i)\Big)\sum_{F}\prod_{\ell=1}^{\nu(F)}(-1)^{|F_\ell|-1}\ {\sum_{\{i\mid i\in F_\ell\}}}'\left(2^{|F_\ell|-2}\varepsilon+\sum_{c}\delta\Big(\sum_{j=1}^{|F_\ell|}c_ju_{i_j}\Big)\Big(1-V\big(c_1u_{i_1},\ldots,c_{|F_\ell|}u_{i_{|F_\ell|}}\big)\Big)\right), \tag{3.37}$$
where $\sum'_{\{i\mid i\in F_\ell\}}$ is over all $(|F_\ell|-1)!$ cyclic permutations of the elements of $F_\ell$. Next, in the inner sum, change variables $w_{i_j}=c_ju_{i_j}$. Recalling that the $\hat f$’s are assumed to be even functions, we find that the above becomes
$$\int_{\mathbb R^n}\Big(\prod_{i=1}^{n}dw_i\,\hat f_i(w_i)\Big)\sum_{F}\prod_{\ell=1}^{\nu(F)}(-2)^{|F_\ell|-1}\cdot{\sum_{\{i\mid i\in F_\ell\}}}'\left(\frac{\varepsilon}{2}+\delta\Big(\sum_{j=1}^{|F_\ell|}w_{i_j}\Big)\Big(1-V\big(w_{i_1},\ldots,w_{i_{|F_\ell|}}\big)\Big)\right).$$
Applying the combinatorial identity [15, (4.35)], we get
$$\int_{\mathbb R^n}\Big(\prod_{i=1}^{n}dw_i\,\hat f_i(w_i)\Big)\sum_{F}\prod_{\ell=1}^{\nu(F)}(-2)^{|F_\ell|-1}\cdot\left((|F_\ell|-1)!\,\frac{\varepsilon}{2}+\delta\Big(\sum_{i\in F_\ell}w_i\Big)\cdot\Big((|F_\ell|-1)!-\sum_{[H,H^c]}(|H|-1)!\,(|F_\ell|-1-|H|)!\,\Big|\sum_{k\in H}w_k\Big|\Big)\right).$$
Here, $[H,H^c]$ runs over all $(2^{|F_\ell|}-2)/2$ ways of decomposing $F_\ell$ into two disjoint proper subsets: $H\cup H^c=F_\ell$, $H\cap H^c=\emptyset$, with $H\ne\emptyset,F_\ell$. Since $\sum_\ell|F_\ell|=n$, we can rewrite the above as
$$\int_{\mathbb R^n}\Big(\prod_{i=1}^{n}du_i\,\hat f_i(u_i)\Big)\sum_{F}(-2)^{n-\nu(F)}\prod_{\ell=1}^{\nu(F)}\left((|F_\ell|-1)!\,\frac{\varepsilon}{2}+\delta\Big(\sum_{i\in F_\ell}u_i\Big)\Big((|F_\ell|-1)!-\sum_{[H,H^c]}(|H|-1)!\,(|F_\ell|-1-|H|)!\,\Big|\sum_{k\in H}u_k\Big|\Big)\right).$$

(3.38)

We now prove the lemma that was required in deriving the above.

LEMMA 3
Let $\sum_{j=1}^{m}|u_j|<1$. Then
$$\sum_{a}\varepsilon^{\beta(a)}\int_{\mathbb R^m}\frac{\sin(\pi(x_1-a_1x_2))}{\pi(x_1-a_1x_2)}\cdots\frac{\sin(\pi(x_m-a_mx_1))}{\pi(x_m-a_mx_1)}\,e^{2\pi iu\cdot x}\,dx=2^{m-2}\varepsilon+\sum_{c}\delta\Big(\sum_{j=1}^{m}c_ju_j\Big)\left(1-V(c_1u_1,\ldots,c_mu_m)\right). \tag{3.39}$$
The notation here is defined between (3.34) and (3.36).

Note: In the degenerate case $m=1$, the above should be read as
$$\int_{\mathbb R}\left(1+\varepsilon\,\frac{\sin(2\pi x)}{2\pi x}\right)e^{2\pi iux}\,dx=\frac{1}{2}\varepsilon+\delta(u),\qquad|u|<1.$$

Proof
The $m=1$ case is easy to check and follows from the fact that $(1/2)\chi_{[-1,1]}(u)=\int_{\mathbb R}(\sin(2\pi x)/(2\pi x))e^{2\pi iux}\,dx$. So, assume that $m\ge 2$, and consider a typical
$$\int_{\mathbb R^m}\frac{\sin(\pi(x_1-a_1x_2))}{\pi(x_1-a_1x_2)}\cdots\frac{\sin(\pi(x_m-a_mx_1))}{\pi(x_m-a_mx_1)}\,e^{2\pi iu\cdot x}\,dx. \tag{3.40}$$


Let
$$t_i=x_i-a_ix_{i+1},\quad i=1,\ldots,m-1,\qquad t_m=x_m, \tag{3.41}$$
so that
$$\begin{pmatrix}x_1\\ \vdots\\ x_m\end{pmatrix}=\begin{pmatrix}1&a_1&a_1a_2&a_1a_2a_3&\cdots&a_1\cdots a_{m-1}\\ 0&1&a_2&a_2a_3&\cdots&a_2\cdots a_{m-1}\\ 0&0&1&a_3&\cdots&a_3\cdots a_{m-1}\\ \vdots&&&\ddots&&\vdots\\ 0&\cdots&\cdots&0&1&a_{m-1}\\ 0&\cdots&\cdots&\cdots&0&1\end{pmatrix}\begin{pmatrix}t_1\\ \vdots\\ t_m\end{pmatrix}.$$
Let
$$K(y)\overset{\mathrm{def}}{=}\sin(\pi y)/(\pi y).$$
Changing variables, (3.40) is
$$\int_{\mathbb R^m}K(t_1)\cdots K(t_{m-1})\,K\big(t_m-a_m(t_1+a_1t_2+a_1a_2t_3+\cdots+a_1\cdots a_{m-1}t_m)\big)\cdot e^{2\pi i(t_1s_1+\cdots+t_ms_m)}\,dt_1\cdots dt_m, \tag{3.42}$$
where
$$\begin{aligned}s_1&=u_1,\\ s_2&=a_1u_1+u_2,\\ s_3&=a_1a_2u_1+a_2u_2+u_3,\\ &\ \ \vdots\\ s_k&=a_1\cdots a_{k-1}u_1+a_2\cdots a_{k-1}u_2+\cdots+a_{k-1}u_{k-1}+u_k,\\ &\ \ \vdots\end{aligned} \tag{3.43}$$
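The change of variables (3.41) and the definition (3.43) of the $s_k$ can be sanity-checked on random data. The sketch below uses our own helper names and 0-based lists; it verifies that the upper triangular matrix inverts (3.41) and that the phase $u\cdot x$ becomes $t_1s_1+\cdots+t_ms_m$.

```python
import random
from math import prod

def t_from_x(x, a):
    """(3.41): t_i = x_i - a_i x_{i+1} for i < m, and t_m = x_m."""
    m = len(x)
    return [x[i] - a[i] * x[i + 1] for i in range(m - 1)] + [x[m - 1]]

def x_from_t(t, a):
    """Inverse map: x_i = sum_{j >= i} a_i ... a_{j-1} t_j (the matrix after (3.41))."""
    m = len(t)
    return [sum(prod(a[i:j]) * t[j] for j in range(i, m)) for i in range(m)]

def s_from_u(u, a):
    """(3.43): s_k = sum_{j <= k} a_j ... a_{k-1} u_j."""
    return [sum(prod(a[j:k]) * u[j] for j in range(k + 1)) for k in range(len(u))]

def max_discrepancy(m, seed=0):
    """Largest error in the inversion and in the identity u.x = t.s for random data."""
    rng = random.Random(seed)
    a = [rng.choice((1, -1)) for _ in range(m)]
    x = [rng.uniform(-1, 1) for _ in range(m)]
    u = [rng.uniform(-1, 1) for _ in range(m)]
    t, s = t_from_x(x, a), s_from_u(u, a)
    inverse_error = max(abs(xi - yi) for xi, yi in zip(x, x_from_t(t, a)))
    phase_error = abs(sum(ui * xi for ui, xi in zip(u, x)) - sum(ti * si for ti, si in zip(t, s)))
    return max(inverse_error, phase_error)

print(max(max_discrepancy(m, seed) for m in range(2, 8) for seed in range(20)))
```

The printed discrepancy is at the level of floating-point rounding, consistent with the exponent in (3.42).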

Now, $K(y)=K(-y)$, so, because $a_m\in\{1,-1\}$, we find that (3.42) equals
$$\int_{\mathbb R^m}K(t_1)\cdots K(t_{m-1})\,K\big(a_mt_m-t_1-a_1t_2-a_1a_2t_3-\cdots-a_1\cdots a_{m-1}t_m\big)\cdot e^{2\pi i(t_1s_1+\cdots+t_ms_m)}\,dt_1\cdots dt_m.$$
Applying [15, (4.28)] (to the variable $t_1$ with $\tau=-a_mt_m+a_1t_2+a_1a_2t_3+\cdots+a_1\cdots a_{m-1}t_m$), the above becomes
$$\int_{\mathbb R^m}\chi_{[-1/2,1/2]}(v)\,\chi_{[-1/2,1/2]}(v+s_1)\,e^{2\pi iv(-a_mt_m+a_1t_2+a_1a_2t_3+\cdots+a_1\cdots a_{m-1}t_m)}\cdot K(t_2)\cdots K(t_{m-1})\,e^{2\pi i(t_2s_2+\cdots+t_ms_m)}\,dv\,dt_2\cdots dt_m.$$
Integrating over $t_2,\ldots,t_{m-1}$, we get
$$\int_{\mathbb R^2}\chi_{[-1/2,1/2]}(v)\,\chi_{[-1/2,1/2]}(v+s_1)\,\chi_{[-1/2,1/2]}(a_1v+s_2)\cdot\chi_{[-1/2,1/2]}(a_1a_2v+s_3)\cdots\chi_{[-1/2,1/2]}(a_1\cdots a_{m-2}v+s_{m-1})\cdot e^{2\pi it_m(s_m+v(a_1\cdots a_{m-1}-a_m))}\,dv\,dt_m. \tag{3.44}$$
Now, if $\beta(a)=\#\{i\mid a_i=-1\}$ is even, then $a_1\cdots a_m=1$, so $a_1\cdots a_{m-1}=a_m$ and thus $a_1\cdots a_{m-1}-a_m=0$. Hence the integral over $t_m$ pulls out a $\delta(s_m)$ from the integral. Next, if $\beta(a)$ is odd, then $a_1\cdots a_m=-1$, so $a_1\cdots a_{m-1}=-a_m$ and thus $a_1\cdots a_{m-1}-a_m=-2a_m$. Hence the integral over $t_m$ gives us a $\delta(s_m-2a_mv)$, which, when integrated over $v$, pulls out a product of characteristic functions. Hence, we find that (3.44) (and hence that (3.40)) is
$$\delta(s_m)\int_{\mathbb R}\chi_{[-1/2,1/2]}(v)\,\chi_{[-1/2,1/2]}(v+s_1)\,\chi_{[-1/2,1/2]}(a_1v+s_2)\cdots\chi_{[-1/2,1/2]}(a_1\cdots a_{m-2}v+s_{m-1})\,dv\quad\text{if }\beta(a)\text{ is even}, \tag{3.45}$$
$$\frac12\,\chi_{[-1/2,1/2]}\!\Big(\frac{s_m}{2a_m}\Big)\chi_{[-1/2,1/2]}\!\Big(\frac{s_m}{2a_m}+s_1\Big)\chi_{[-1/2,1/2]}\!\Big(a_1\frac{s_m}{2a_m}+s_2\Big)\cdots\chi_{[-1/2,1/2]}\!\Big(a_1\cdots a_{m-2}\frac{s_m}{2a_m}+s_{m-1}\Big)\quad\text{if }\beta(a)\text{ is odd}. \tag{3.46}$$
We require the following two claims.

CLAIM 4
Let $\beta(a)$ be odd, and assume that $\sum_{i=1}^{m}|u_i|<1$. Then
$$\chi_{[-1/2,1/2]}\!\Big(a_1\cdots a_{k-1}\frac{s_m}{2a_m}+s_k\Big)=1,\qquad k=1,\ldots,m-1. \tag{3.47}$$

Thus, (3.46) equals 1/2.

Proof
Because $a_k\in\{1,-1\}$, we have, from (3.43),
$$s_k=a_1\cdots a_{k-1}\left(u_1+a_1u_2+a_1a_2u_3+\cdots+a_1\cdots a_{k-1}u_k\right). \tag{3.48}$$
So the coefficient of $u_j$ in (3.47) is
$$\frac{(a_1\cdots a_{k-1})(a_1\cdots a_{m-1})(a_1\cdots a_{j-1})}{2a_m}+(a_1\cdots a_{k-1})(a_1\cdots a_{j-1}). \tag{3.49}$$
When $\beta(a)$ is odd, $\prod_{i=1}^{m}a_i=-1$; hence (3.49) equals
$$\frac{(a_1\cdots a_{k-1})(a_1\cdots a_{j-1})}{2}\in\Big\{\frac12,-\frac12\Big\}.$$
So
$$\Big|a_1\cdots a_{k-1}\frac{s_m}{2a_m}+s_k\Big|<1/2$$
(since we are assuming $\sum_{i=1}^{m}|u_i|<1$), and hence the claim is proved.

CLAIM 5
Let $\beta(a)$ be even, and assume that $\sum_{i=1}^{m}|u_i|<1$. Then (3.45) equals
$$\delta(s_m)\left(1-V(u_1,a_1u_2,\ldots,a_1\cdots a_{m-1}u_m)\right)$$
with $V(y)$ defined in (3.36).

Proof
In (3.45), we have, by (3.48),
$$\chi_{[-1/2,1/2]}(a_1\cdots a_{k-1}v+s_k)=\chi_{[-1/2,1/2]}\big(a_1\cdots a_{k-1}(v+u_1+a_1u_2+a_1a_2u_3+\cdots+a_1\cdots a_{k-1}u_k)\big),$$
and we can drop the $a_1\cdots a_{k-1}\in\{1,-1\}$ since $\chi_{[-1/2,1/2]}(y)$ is even. Furthermore, the $\delta(s_m)$ restricts us to $u_1+a_1u_2+a_1a_2u_3+\cdots+a_1\cdots a_{m-1}u_m=0$. And because we are assuming $\sum_{i=1}^{m}|u_i|<1<2$, we may apply [15, Lemma 4.3], obtaining the claim.

Note: In [15, (4.32)], $n$ could read $n-1$ without affecting the truth of the equation since, in the notation of that paper, $f_2(v)f_2(v+u_1+\cdots+u_n)=f_2(v)$.

We are now ready to complete the proof of this lemma. By Claim 4, the contribution to (3.39) from $a$ with $\beta(a)$ odd is
$$\sum_{\substack{a\\ \beta(a)\ \mathrm{odd}}}\frac12\,\varepsilon^{\beta(a)}. \tag{3.50}$$
But we are assuming $\varepsilon\in\{1,-1\}$, so the above is $2^{m-2}\varepsilon$. The contribution to (3.39) from $a$ with $\beta(a)$ even is, by Claim 5,
$$\sum_{\substack{a\\ \beta(a)\ \mathrm{even}}}\delta(s_m)\left(1-V(u_1,a_1u_2,\ldots,a_1\cdots a_{m-1}u_m)\right). \tag{3.51}$$

Now,
$$s_m=a_1\cdots a_{m-1}\left(u_1+a_1u_2+a_1a_2u_3+\cdots+a_1\cdots a_{m-1}u_m\right)=a_mu_1+a_ma_1u_2+a_ma_1a_2u_3+\cdots+a_ma_1\cdots a_{m-1}u_m$$
because $\prod_{i=1}^{m}a_i=1$ when $\beta(a)$ is even. Let
$$c=(c_1,\ldots,c_m)=(a_m,\ a_ma_1,\ a_ma_1a_2,\ \ldots,\ a_ma_1\cdots a_{m-1}).$$
Now, because $\prod_{i=1}^{m}a_i=1$, $c$ ranges over all m-tuples with $c_j\in\{1,-1\}$ and $c_m=1$. So, summing over such $c$, we find that (3.51) equals
$$\sum_{c}\delta\Big(\sum_{j=1}^{m}c_ju_j\Big)\left(1-V(a_mc_1u_1,\ldots,a_mc_mu_m)\right). \tag{3.52}$$
But, because $V(-y)=V(y)$, the above is (regardless of the value of $a_m=\pm1$)
$$\sum_{c}\delta\Big(\sum_{j=1}^{m}c_ju_j\Big)\left(1-V(c_1u_1,\ldots,c_mu_m)\right). \tag{3.53}$$
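Two counting facts are used in this last step: the map $a\mapsto c$ is a bijection from sign patterns with $\beta(a)$ even onto tuples with $c_m=1$, and the odd patterns contribute $2^{m-2}\varepsilon$ in (3.50). Both can be confirmed by enumeration for small $m$. The helper names below are ours, and tuples are 0-indexed.

```python
from itertools import product
from math import prod

def c_from_a(a):
    """c_j = a_m * a_1 * ... * a_{j-1}, the tuple defined before (3.52)."""
    return tuple(a[-1] * prod(a[:j]) for j in range(len(a)))

def even_sign_patterns(m):
    """All a in {1,-1}^m with an even number of entries equal to -1."""
    return [a for a in product((1, -1), repeat=m) if a.count(-1) % 2 == 0]

def odd_contribution(m, eps):
    """Sum of eps^beta(a) / 2 over a with beta(a) odd, as in (3.50)."""
    return sum(eps ** a.count(-1) / 2
               for a in product((1, -1), repeat=m) if a.count(-1) % 2 == 1)

m = 4
images = {c_from_a(a) for a in even_sign_patterns(m)}
print(len(images), all(c[-1] == 1 for c in images))  # distinct images, all with c_m = 1
print(odd_contribution(m, -1), odd_contribution(m, 1))
```

Since there are $2^{m-1}$ even patterns and equally many tuples with $c_m=1$, injectivity of $a\mapsto c$ is enough to justify re-indexing the sum (3.51) by $c$.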

This, in combination with (3.50), establishes the lemma.

3.4. l.h.s. = r.h.s.

LEMMA 4
We have
$$\int_{\mathbb R^{|F_\ell|}}\prod_{i\in F_\ell}du_i\,\hat f_i(u_i)=\int_{\mathbb R}\hat F_\ell(u)\,du.$$

Proof
Both are equal, by Fourier inversion, to $\prod_{i\in F_\ell}f_i(0)$.

LEMMA 5
We have
$$\int_{\mathbb R^{|F_\ell|}}\Big(\prod_{i\in F_\ell}du_i\,\hat f_i(u_i)\Big)\,\delta\Big(\sum_{i\in F_\ell}u_i\Big)=\int_{\mathbb R}F_\ell(x)\,dx.$$

Proof
We obtain the lemma by Parseval’s formula.

LEMMA 6
Let $H\subset F_\ell$, $H\ne\emptyset$. Then
$$\int_{\mathbb R^{|F_\ell|}}\Big(\prod_{i\in F_\ell}du_i\,\hat f_i(u_i)\Big)\,\delta\Big(\sum_{i\in F_\ell}u_i\Big)\Big|\sum_{k\in H}u_k\Big|=\int_{\mathbb R}\widehat{\Big(\prod_{i\in H}f_i\Big)}(u)\,\widehat{\Big(\prod_{i\in H^c}f_i\Big)}(u)\,|u|\,du.$$

Proof
We obtain the lemma by Parseval’s formula.

Now, $W^{(n)}_{USp}=W_{-1}$, so we need to compare (3.38), with $\varepsilon=-1$, to (3.13). By Lemmas 4–6, write (3.38) as
$$\sum_{F}(-2)^{n-\nu(F)}\prod_{\ell=1}^{\nu(F)}(P_\ell+Q_\ell+R_\ell) \tag{3.54}$$
with
$$P_\ell=\frac{-1}{2}(|F_\ell|-1)!\int_{\mathbb R}\hat F_\ell(u)\,du,$$
$$Q_\ell=(|F_\ell|-1)!\int_{\mathbb R}F_\ell(x)\,dx,$$
$$R_\ell=-\sum_{[H,H^c]}(|H|-1)!\,(|F_\ell|-1-|H|)!\int_{\mathbb R}\widehat{\Big(\prod_{i\in H}f_i\Big)}(u)\,\widehat{\Big(\prod_{i\in H^c}f_i\Big)}(u)\,|u|\,du.$$

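(3.55)

Expanding the product over ℓ, we get
$$\sum_{F}(-2)^{n-\nu(F)}\sum_{S}\Big(\prod_{\ell\in S^c}Q_\ell\Big)\sum_{T\subseteq S}\Big(\prod_{\ell\in T^c}P_\ell\Big)\Big(\prod_{\ell\in T}R_\ell\Big), \tag{3.56}$$
where $S$ ranges over all subsets of $\{1,\ldots,\nu(F)\}$. (We take empty products to be 1.) Expanding the product $\prod_{\ell\in T}R_\ell$, we find that (3.56) is
$$\sum_{F}(-2)^{n-\nu(F)}\sum_{S}\Big(\prod_{\ell\in S^c}Q_\ell\Big)\sum_{T\subseteq S}\Big(\prod_{\ell\in T^c}P_\ell\Big)\cdot\left((-1)^{|T|}\sum_{H}\prod_{j=1}^{|T|}(|H_j|-1)!\cdot(|F_{\ell_j}|-1-|H_j|)!\int_{\mathbb R}\widehat{\Big(\prod_{i\in H_j}f_i\Big)}(u)\,\widehat{\Big(\prod_{i\in H_j^c}f_i\Big)}(u)\,|u|\,du\right),$$
(3.57)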


where $\sum_H$ is over all $|T|$-tuples $([H_1,H_1^c],\ldots,[H_{|T|},H_{|T|}^c])$ and where $T=\{\ell_1,\ldots,\ell_{|T|}\}$. (If $T=\emptyset$, we take the large bracketed factor to be 1. And if $T\ne\emptyset$, but $\sum_H$ contains no terms, we take it to be zero.)

We have thus expressed, in (3.57), the r.h.s. of (3.7) in a form that can easily be compared with the l.h.s., as expressed in (3.13). More precisely, a typical term in (3.13) is specified by $F_{\mathrm{l.h.s.}}$, $S_{\mathrm{l.h.s.}}$, $S_2$, (A; B). The sum over $F$ arises from combinatorial sieving, and the sum over $S\subseteq\{1,\ldots,\nu(F)\}$ arises from multiplying out the explicit formula (3.12). The sum over $S_2\subseteq S$ comes from deciding which prime powers are paired up to produce squares and which are already squares ($S_2^c$). (A; B) accounts for all ways of pairing up $S_2$. The contribution to (3.13) from a typical term is
$$(-2)^{n-\nu(F_{\mathrm{l.h.s.}})}\Big(\prod_{\ell\in S^c_{\mathrm{l.h.s.}}}(|F_\ell|-1)!\int_{\mathbb R}F_\ell(x)\,dx\Big)\cdot\Big(\prod_{\ell\in S_2^c}(|F_\ell|-1)!\,\frac{-1}{2}\int_{\mathbb R}\hat F_\ell(u)\,du\Big)\cdot\Big(2^{|S_2|/2}\prod_{j=1}^{|S_2|/2}(|F_{a_j}|-1)!\,(|F_{b_j}|-1)!\int_{\mathbb R}\hat F_{a_j}(u)\hat F_{b_j}(u)\,|u|\,du\Big)$$
$$=(-2)^{n-\nu(F_{\mathrm{l.h.s.}})}\Big(\prod_{\ell\in S^c_{\mathrm{l.h.s.}}}Q_\ell\Big)\Big(\prod_{\ell\in S_2^c}P_\ell\Big)\cdot\Big(2^{|S_2|/2}\prod_{j=1}^{|S_2|/2}(|F_{a_j}|-1)!\,(|F_{b_j}|-1)!\int_{\mathbb R}\hat F_{a_j}(u)\hat F_{b_j}(u)\,|u|\,du\Big). \tag{3.58}$$

On the other hand, in (3.57), a typical term is specified by $F_{\mathrm{r.h.s.}}$, $S_{\mathrm{r.h.s.}}$, $T$, $[H_1,H_1^c],\ldots,[H_{|T|},H_{|T|}^c]$. Set
$$F_{\mathrm{r.h.s.}}=\{F_\ell\mid\ell\in S^c_{\mathrm{l.h.s.}}\}\cup\{F_\ell\mid\ell\in S_2^c\}\cup\{F_{a_j}\cup F_{b_j}\mid j=1,\ldots,|S_2|/2\},$$
$$H_1=F_{a_1},\quad H_1^c=F_{b_1},\quad\ldots,\quad H_{|S_2|/2}=F_{a_{|S_2|/2}},\quad H^c_{|S_2|/2}=F_{b_{|S_2|/2}}. \tag{3.59}$$
$S_{\mathrm{r.h.s.}}$ and $T$ are chosen in the obvious way (so that both products of Q’s match, and both products of P’s match). Notice that $|T|=|S_2|/2$ and that $\nu(F_{\mathrm{l.h.s.}})=\nu(F_{\mathrm{r.h.s.}})+|S_2|/2$. The contribution to (3.57) from this term is thus
$$(-2)^{n-\nu(F_{\mathrm{l.h.s.}})+|S_2|/2}\Big(\prod_{\ell\in S^c_{\mathrm{l.h.s.}}}Q_\ell\Big)\Big(\prod_{\ell\in S_2^c}P_\ell\Big)\cdot\Big((-1)^{|S_2|/2}\prod_{j=1}^{|S_2|/2}(|F_{a_j}|-1)!\,(|F_{b_j}|-1)!\int_{\mathbb R}\hat F_{a_j}(u)\hat F_{b_j}(u)\,|u|\,du\Big), \tag{3.60}$$
which is equal, because $|S_2|$ is even, to (3.58). So every term on the l.h.s. has a corresponding term on the r.h.s.

Conversely, this method of matching (i.e., (3.59)) produces for every term on the r.h.s. its corresponding term on the l.h.s. (with the convention that we disregard, on the r.h.s., any term with $|T|\ge 1$ but $\sum_H$ empty; we can do so since these terms contribute nothing to (3.57)). Thus (3.13) = (3.38) and Theorem 3.1 is proved. □

3.5. Examples

One term for n = 17
Let $n=17$, and let
$$F_{\mathrm{l.h.s.}}=[F_1,F_2,F_3,F_4,F_5,F_6,F_7]=[\{1,2,13\},\{4\},\{3,6,7,9,17\},\{8,10,11\},\{5,12\},\{14\},\{15,16\}],$$
$$S_{\mathrm{l.h.s.}}=\{1,2,3,5,6\},\qquad S^c_{\mathrm{l.h.s.}}=\{4,7\},$$
$$S_2=\{1,2,5,6\},\qquad S_2^c=\{3\},$$
$$(A;B)=(1,5;2,6). \tag{3.61}$$
This corresponds on the r.h.s. to
$$F_{\mathrm{r.h.s.}}=[F'_1,F'_2,F'_3,F'_4,F'_5],\qquad F'_1=F_4,\quad F'_2=F_7,\quad F'_3=F_3,\quad F'_4=F_1\cup F_2,\quad F'_5=F_5\cup F_6,$$
$$S_{\mathrm{r.h.s.}}=\{3,4,5\},\qquad S^c_{\mathrm{r.h.s.}}=\{1,2\},$$
$$T=\{4,5\},\qquad T^c=\{3\},$$
$$H_1=F_1,\quad H_1^c=F_2,\quad H_2=F_5,\quad H_2^c=F_6. \tag{3.62}$$
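The matching rule (3.59) is mechanical, and the $n=17$ example can be replayed in a few lines. The sketch below is ours: the function name and the ordering of the blocks of $F_{\mathrm{r.h.s.}}$ (Q-blocks first, then P-blocks, then merged blocks) are assumptions made for illustration, chosen so that the output agrees with the example.

```python
def match_lhs_to_rhs(F_lhs, S_lhs, S2, pairs):
    """Apply (3.59). Blocks are numbered from 1; `pairs` lists the (a_j, b_j) of (A; B)."""
    def block(index):
        return F_lhs[index - 1]

    q_blocks = [block(l) for l in range(1, len(F_lhs) + 1) if l not in S_lhs]
    p_blocks = [block(l) for l in sorted(S_lhs - S2)]      # S_2^c, taken inside S_lhs
    merged = [block(a) | block(b) for a, b in pairs]       # F_{a_j} union F_{b_j}

    F_rhs = q_blocks + p_blocks + merged                   # assumed ordering
    first_p = len(q_blocks) + 1
    first_merged = first_p + len(p_blocks)
    S_rhs = set(range(first_p, len(F_rhs) + 1))
    T = set(range(first_merged, len(F_rhs) + 1))
    H = [(block(a), block(b)) for a, b in pairs]           # (H_j, H_j^c)
    return F_rhs, S_rhs, T, H

F_lhs = [{1, 2, 13}, {4}, {3, 6, 7, 9, 17}, {8, 10, 11}, {5, 12}, {14}, {15, 16}]
F_rhs, S_rhs, T, H = match_lhs_to_rhs(F_lhs, {1, 2, 3, 5, 6}, {1, 2, 5, 6}, [(1, 2), (5, 6)])
print(F_rhs)
print(S_rhs, T)
print(H)
```

The relation $\nu(F_{\mathrm{l.h.s.}})=\nu(F_{\mathrm{r.h.s.}})+|S_2|/2$ holds here as $7=5+2$.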


Tables 3.1 and 3.2 show the correspondence between terms on the l.h.s. (as expressed in (3.58)) and the r.h.s. (as expressed in (3.60)).

3.6. Analogous results for $GL_M/\mathbb Q$
Let $L(s,\pi)$ be the L-function attached to a self-contragredient ($\pi=\tilde\pi$) automorphic cuspidal representation of $GL_M$ over $\mathbb Q$. Such an L-function is given initially (for $\Re s$ sufficiently large) as an Euler product of the form
$$L(s,\pi)=\prod_{p}L(s,\pi_p)=\prod_{p}\prod_{j=1}^{M}\left(1-\alpha_\pi(p,j)p^{-s}\right)^{-1}.$$
The condition $\pi=\tilde\pi$ implies that $\alpha_\pi(p,j)\in\mathbb R$. The Rankin-Selberg L-function $L(s,\pi\otimes\tilde\pi)$ factors as the product of the symmetric and exterior square L-functions (see [1]):
$$L(s,\pi\otimes\tilde\pi)=L(s,\pi\otimes\pi)=L(s,\pi,\vee^2)\,L(s,\pi,\wedge^2)$$
and has a simple pole at $s=1$ which is carried by one of the two factors. Write the order of the pole of $L(s,\pi,\wedge^2)$ as $(\delta(\pi)+1)/2$ (so that $\delta(\pi)=\pm1$). We desire to generalize Theorem 3.1 to the zeros of $L(s,\pi\otimes\chi_d)$ whose Euler product is given by
$$L(s,\pi\otimes\chi_d)=\prod_{p}\prod_{j=1}^{M}\left(1-\chi_d(p)\alpha_\pi(p,j)p^{-s}\right)^{-1}.$$
Now, when $\pi=\tilde\pi$, $L(s,\pi\otimes\chi_d)$ has a functional equation of the form
$$\Phi(s,\pi\otimes\chi_d):=\pi^{-Ms/2}\prod_{j=1}^{M}\Gamma\big((s+\mu_{\pi\otimes\chi_d}(j))/2\big)\,L(s,\pi\otimes\chi_d)=\varepsilon(s,\pi\otimes\chi_d)\,\Phi(1-s,\pi\otimes\chi_d),$$
where the $\mu_{\pi\otimes\chi_d}(j)$’s are complex numbers that are known to satisfy $\Re\mu_{\pi\otimes\chi_d}(j)>-1/2$ (and are conjectured to satisfy $\Re\mu_{\pi\otimes\chi_d}(j)\ge 0$). We also have
$$\varepsilon(s,\pi\otimes\chi_d)=\varepsilon(\pi\otimes\chi_d)\,Q_{\pi\otimes\chi_d}^{-s+1/2}=\pm Q_{\pi\otimes\chi_d}^{-s+1/2}$$
with $\varepsilon(\pi\otimes\chi_d)=\chi'(d)$, where $\chi'$ is a quadratic character that depends only on π. When $\delta(\pi)=-1$, all twists have $\varepsilon(\pi\otimes\chi_d)=1$. If $\delta(\pi)=1$, then half the $L(s,\pi\otimes\chi_d)$’s have $\varepsilon(\pi\otimes\chi_d)=1$ and the other half have $\varepsilon(\pi\otimes\chi_d)=-1$ (with


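Table 3.1. Matching the l.h.s. with the r.h.s. for n = 1, 2, 3. Here S_l.h.s. ⊆ {1, . . . , ν(F_l.h.s.)}, S_2 ⊆ S_l.h.s., with |S_2| even. (A; B) accounts for all ways of pairing up S_2. Further, S_r.h.s. ⊆ {1, . . . , ν(F_r.h.s.)}, T ⊆ S_r.h.s., and H is over all |T|-tuples [H_1, H_1^c], . . . , [H_|T|, H_|T|^c]. The matching is as described in (3.59).

n | F_l.h.s. | S_l.h.s. | S_2 | (A; B) | F_r.h.s. | S_r.h.s. | T | H
1 | [{1}] | ∅ | ∅ | - | [{1}] | ∅ | ∅ | -
1 | [{1}] | {1} | ∅ | - | [{1}] | {1} | ∅ | -
2 | [{1, 2}] | ∅ | ∅ | - | [{1, 2}] | ∅ | ∅ | -
2 | [{1, 2}] | {1} | ∅ | - | [{1, 2}] | {1} | ∅ | -
2 | [{1}, {2}] | ∅ | ∅ | - | [{1}, {2}] | ∅ | ∅ | -
2 | [{1}, {2}] | {1} | ∅ | - | [{1}, {2}] | {1} | ∅ | -
2 | [{1}, {2}] | {2} | ∅ | - | [{1}, {2}] | {2} | ∅ | -
2 | [{1}, {2}] | {1, 2} | ∅ | - | [{1}, {2}] | {1, 2} | ∅ | -
2 | [{1}, {2}] | {1, 2} | {1, 2} | (1; 2) | [{1, 2}] | {1} | {1} | [{1}, {2}]
3 | [{1, 2, 3}] | ∅ | ∅ | - | [{1, 2, 3}] | ∅ | ∅ | -
3 | [{1, 2, 3}] | {1} | ∅ | - | [{1, 2, 3}] | {1} | ∅ | -
3 | [{1, 2}, {3}] | ∅ | ∅ | - | [{1, 2}, {3}] | ∅ | ∅ | -
3 | [{1, 2}, {3}] | {1} | ∅ | - | [{1, 2}, {3}] | {1} | ∅ | -
3 | [{1, 2}, {3}] | {2} | ∅ | - | [{1, 2}, {3}] | {2} | ∅ | -
3 | [{1, 2}, {3}] | {1, 2} | ∅ | - | [{1, 2}, {3}] | {1, 2} | ∅ | -
3 | [{1, 2}, {3}] | {1, 2} | {1, 2} | (1; 2) | [{1, 2, 3}] | {1} | {1} | [{1, 2}, {3}]
3 | [{1, 3}, {2}] | ∅ | ∅ | - | [{1, 3}, {2}] | ∅ | ∅ | -
3 | [{1, 3}, {2}] | {1} | ∅ | - | [{1, 3}, {2}] | {1} | ∅ | -
3 | [{1, 3}, {2}] | {2} | ∅ | - | [{1, 3}, {2}] | {2} | ∅ | -
3 | [{1, 3}, {2}] | {1, 2} | ∅ | - | [{1, 3}, {2}] | {1, 2} | ∅ | -
3 | [{1, 3}, {2}] | {1, 2} | {1, 2} | (1; 2) | [{1, 2, 3}] | {1} | {1} | [{1, 3}, {2}]
3 | [{2, 3}, {1}] | ∅ | ∅ | - | [{2, 3}, {1}] | ∅ | ∅ | -
3 | [{2, 3}, {1}] | {1} | ∅ | - | [{2, 3}, {1}] | {1} | ∅ | -
3 | [{2, 3}, {1}] | {2} | ∅ | - | [{2, 3}, {1}] | {2} | ∅ | -
3 | [{2, 3}, {1}] | {1, 2} | ∅ | - | [{2, 3}, {1}] | {1, 2} | ∅ | -
3 | [{2, 3}, {1}] | {1, 2} | {1, 2} | (1; 2) | [{1, 2, 3}] | {1} | {1} | [{2, 3}, {1}]
3 | [{1}, {2}, {3}] | ∅ | ∅ | - | [{1}, {2}, {3}] | ∅ | ∅ | -
3 | [{1}, {2}, {3}] | {1} | ∅ | - | [{1}, {2}, {3}] | {1} | ∅ | -
3 | [{1}, {2}, {3}] | {2} | ∅ | - | [{1}, {2}, {3}] | {2} | ∅ | -
3 | [{1}, {2}, {3}] | {3} | ∅ | - | [{1}, {2}, {3}] | {3} | ∅ | -
3 | [{1}, {2}, {3}] | {1, 2} | ∅ | - | [{1}, {2}, {3}] | {1, 2} | ∅ | -
3 | [{1}, {2}, {3}] | {1, 2} | {1, 2} | (1; 2) | [{1, 2}, {3}] | {1} | {1} | [{1}, {2}]
3 | [{1}, {2}, {3}] | {1, 3} | ∅ | - | [{1}, {2}, {3}] | {1, 3} | ∅ | -
3 | [{1}, {2}, {3}] | {1, 3} | {1, 3} | (1; 3) | [{1, 3}, {2}] | {1} | {1} | [{1}, {3}]
3 | [{1}, {2}, {3}] | {2, 3} | ∅ | - | [{1}, {2}, {3}] | {2, 3} | ∅ | -
3 | [{1}, {2}, {3}] | {2, 3} | {2, 3} | (2; 3) | [{2, 3}, {1}] | {1} | {1} | [{2}, {3}]
3 | [{1}, {2}, {3}] | {1, 2, 3} | ∅ | - | [{1}, {2}, {3}] | {1, 2, 3} | ∅ | -
3 | [{1}, {2}, {3}] | {1, 2, 3} | {1, 2} | (1; 2) | [{1, 2}, {3}] | {1, 2} | {1} | [{1}, {2}]
3 | [{1}, {2}, {3}] | {1, 2, 3} | {1, 3} | (1; 3) | [{1, 3}, {2}] | {1, 2} | {1} | [{1}, {3}]
3 | [{1}, {2}, {3}] | {1, 2, 3} | {2, 3} | (2; 3) | [{2, 3}, {1}] | {1, 2} | {1} | [{2}, {3}]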
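The number of rows of Table 3.1 for each $n$ can be cross-checked by counting terms on both sides independently. The following sketch is ours (all function names are hypothetical): on the l.h.s. it counts quadruples ($F$, $S$, $S_2$, pairing), and on the r.h.s. it counts tuples ($F$, $S$, $T$, $H$) with a nonempty choice of $H$, where each block $F_\ell$ with $\ell\in T$ admits $(2^{|F_\ell|}-2)/2$ decompositions $[H,H^c]$.

```python
from itertools import combinations
from math import factorial

def set_partitions(elements):
    """Yield all set partitions of `elements` as lists of blocks."""
    elements = list(elements)
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def subsets(indices):
    indices = list(indices)
    for r in range(len(indices) + 1):
        yield from combinations(indices, r)

def count_pairings(k):
    """Number of ways to pair up k labelled objects (0 if k is odd)."""
    return 0 if k % 2 else factorial(k) // (2 ** (k // 2) * factorial(k // 2))

def count_lhs_terms(n):
    """Count (F, S, S_2, (A;B)) with S_2 a subset of S of even size."""
    return sum(count_pairings(len(S2))
               for F in set_partitions(range(1, n + 1))
               for S in subsets(range(len(F)))
               for S2 in subsets(S))

def count_rhs_terms(n):
    """Count (F, S, T, H) with T a subset of S and a nonempty choice of H."""
    total = 0
    for F in set_partitions(range(1, n + 1)):
        for S in subsets(range(len(F))):
            for T in subsets(S):
                ways = 1
                for l in T:
                    ways *= (2 ** len(F[l]) - 2) // 2
                total += ways
    return total

for n in (1, 2, 3):
    print(n, count_lhs_terms(n), count_rhs_terms(n))
```

For $n=1,2,3$ both counts agree with the number of rows of Table 3.1 (2, 7, and 31 respectively), as the bijection (3.59) predicts.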


Table 3.2. Terms on the r.h.s. that are discarded since they contribute nothing to (3.57).

n | F_r.h.s. | S_r.h.s. | T | H
1 | [{1}] | {1} | {1} | none
2 | [{1}, {2}] | {1} | {1} | none
2 | [{1}, {2}] | {2} | {2} | none
2 | [{1}, {2}] | {1, 2} | {1, 2} | none
3 | [{1, 2}, {3}] | {2} | {2} | none
3 | [{1, 2}, {3}] | {1, 2} | {2} | none
3 | [{1, 2}, {3}] | {1, 2} | {1, 2} | none
3 | [{1, 3}, {2}] | {2} | {2} | none
3 | [{1, 3}, {2}] | {1, 2} | {2} | none
3 | [{1, 3}, {2}] | {1, 2} | {1, 2} | none
3 | [{2, 3}, {1}] | {2} | {2} | none
3 | [{2, 3}, {1}] | {1, 2} | {2} | none
3 | [{2, 3}, {1}] | {1, 2} | {1, 2} | none
3 | [{1}, {2}, {3}] | {1} | {1} | none
3 | [{1}, {2}, {3}] | {2} | {2} | none
3 | [{1}, {2}, {3}] | {3} | {3} | none
3 | [{1}, {2}, {3}] | {1, 2} | {1} | none
3 | [{1}, {2}, {3}] | {1, 2} | {2} | none
3 | [{1}, {2}, {3}] | {1, 2} | {1, 2} | none
3 | [{1}, {2}, {3}] | {1, 3} | {1} | none
3 | [{1}, {2}, {3}] | {1, 3} | {3} | none
3 | [{1}, {2}, {3}] | {1, 3} | {1, 3} | none
3 | [{1}, {2}, {3}] | {2, 3} | {2} | none
3 | [{1}, {2}, {3}] | {2, 3} | {3} | none
3 | [{1}, {2}, {3}] | {2, 3} | {2, 3} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {1} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {2} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {3} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {1, 2} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {1, 3} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {2, 3} | none
3 | [{1}, {2}, {3}] | {1, 2, 3} | {1, 2, 3} | none


the corresponding $d$’s lying in fixed arithmetic progressions to the modulus of the character $\chi'$). When $\varepsilon(\pi\otimes\chi_d)=1$, we write the nontrivial zeros of $L(s,\pi\otimes\chi_d)$ as $1/2+i\gamma^{(j)}_{\pi\otimes\chi_d}$, with

j = ±1, ±2, ±3, . . . , (1)

(−1)

(−2)

(2)

. . . 0 if $t\ne 0$, $(d^n\varphi/dt^n)(0)=0$ for all $n\in\mathbb N$, and $\varphi(1)=1$. Let $g:S^{n-1}\to\mathbb R$ be a $C^\infty$-function having zero as a regular value and such that $g^{-1}((-\infty,0])=A$. Define $f:D(\delta,0)\to\mathbb R$ by $f(y)=\varphi(\|y\|^2)g(x)$, where $y=(x,\|y\|)\in S^{n-1}\times[0,1]$ in polar coordinates. As all the derivatives of φ vanish at zero, it follows that $f$ is $C^\infty$. Clearly, $f^{-1}((-\infty,0])\cap S(\delta,0)=A$. Now, in polar coordinates $(x,t)$, the gradient of $f$ has the form $(\varphi(t^2)/t\,\nabla g,\ 2t(d\varphi/dt)(t^2)g)$. This means that the only singularity of $f$ is at zero. Moreover, whenever $g\ne 0$ the product of this gradient with the position vector has sign $=\operatorname{sign}(g)=\operatorname{sign}(f)$. Together with the fact that if $x\in f^{-1}(0)$, $x\ne 0$, then the vector $x$ belongs to $T_x(f^{-1}(0))$, this implies that zero is a nondepraved critical point of $f$.

For the second part we first assume that $k=0=h$. Notice that if $f:D^n\to\mathbb R$ has a single singularity at zero which is nondepraved, then there is a new smooth function $\bar f$ with a single critical point and small disks (all centered at zero) $D'\subset D''\subset D^n$ such that $\bar f=f$ on $D'$, $\operatorname{sign}(\bar f)=\operatorname{sign}(f)$ on $D''$, and $\operatorname{sign}(\nabla\bar f(x)\bullet x)=\operatorname{sign}(\bar f)$ for $x\in\partial D''$. Indeed, consider a cylindrical neighbourhood U(, δ) as in Lemma 2.2. We may round the corners of U(, δ), getting a new neighbourhood $U'$ in such a way that $-\nabla f$ points in on $\partial U'\cap f^{-1}((0,\infty))$ and out on $\partial U'\cap f^{-1}((-\infty,0))$. We have, as in Lemma 2.2, a diffeomorphism of pairs $(U',A(,\delta))\approx(D'',A_f)$ for some small disk $D''$ containing $U'$. Moreover, this diffeomorphism can be taken to be equal to the identity on some small disk $D'\subset U'$. We define $\bar f$ as the composition of the restriction of $f$ to $U'$ with the inverse of this diffeomorphism. We perform the same construction for $g$, thus getting a function $\bar g$. We have that $-\nabla\bar g$ and $-\nabla\bar f$ point inside (respectively, outside) $D''$ precisely on $A_{-f}=A_{-g}$ (respectively, $A_f=A_g$).
This implies that the function $G_\tau=\tau\bar f+(1-\tau)\bar g$ has the same properties on $\partial D''$. In particular, its gradient never vanishes on $\partial D''$ for any $\tau\in[0,1]$. Because of this we may use the same blow-up technique as in the proof of Proposition 2.3 to extend the function $G:D''\times[0,1]\to\mathbb R$ to a smooth function $F:\mathbb R^n\times[0,1]\to\mathbb R$ such that $F_\tau$ extends $G_\tau$, with respect to a suitable metric $F$ is a Palais-Smale deformation such that $F_0$ extends $f$ and $F_1$ extends $g$, and the critical points of $F_\tau$ coincide with those of $G_\tau$. Notice that we may assume that the relevant metrics restrict to the canonical one in a neighbourhood of $0\in\mathbb R^n$.

PS-DEFORMATIONS OF REAL SINGULARITIES


We now come back to the general case when $k>0$ and $h\ne 0$. Because of Lemma 2.4, sums of nondepraved singularities are neat with respect to metrics restricting to products in a neighbourhood of the origin and have cylindrical neighbourhoods satisfying the diffeomorphism of Lemma 2.2. Therefore, the construction above can also be applied to $f+h$. The statement follows.

Proof of Proposition 3.2
Let $h:A_{f+q}\times[0,1]\to S^{n-1}\times[0,1]$ be the smooth embedding provided by the isotopy of $A_{f+q}$ and $A_{g+q}$. It has the properties that $h_t$ is an embedding for all $t\in[0,1]$, $h_0$ is the inclusion of $A_{f+q}$, and $h_1$ is a diffeomorphism $A_{f+q}\approx A_{g+q}$. Denote by $A$ the image of $h$. We may use a parametrized version of the construction in Lemma 3.3(a) to obtain a function $G:D^n\times[0,1]\to\mathbb R$ such that, for each $t\in[0,1]$, the function $G_t$ has a single singularity at zero which is nondepraved. Of course, the sublink of $G_0$ is $A_{f+q}$, and that of $G_1$ is $A_{g+q}$. A parametrized version of Proposition 2.3 shows that $G$ can be extended to a PS-deformation of $G_0$ to $G_1$. By applying the second part of Lemma 3.3, we conclude the proposition.

COROLLARY 3.4
Under the assumptions of Theorem 3.1(a) we also have $\Sigma L_f\simeq\Sigma L_g$.

Proof
It is easy to see that $L_f$ is the void set if and only if $L_g$ is void. We now assume that $L_f$ is not void. We have an obvious homotopy pushout:
$$\begin{array}{ccc}L_f&\longrightarrow&A_f\\ \downarrow&&\downarrow{\scriptstyle i}\\ A_{-f}&\xrightarrow{\ j\ }&S^{n-1}\end{array}$$
The maps $i$ and $j$ are null-homotopic. This implies that up to homotopy $\Sigma L_f\simeq\Sigma A_f\vee\Sigma A_{-f}\vee S^{n-1}$. By applying Theorem 3.1, we have $\Sigma A_f\simeq\Sigma A_g$ and also $\Sigma A_{-f}\simeq\Sigma A_{-g}$.

COROLLARY 3.5
The relation “$f$ is related to $g$ by a stable PS-deformation” is an equivalence relation on the class of nondepraved germs.

Proof Obviously, our relation is well defined, reflexive, and symmetric. The only difficulty is transitivity. Assume that f is related to g by a stable PS-deformation and that g


OCTAVIAN CORNEA

is related to $h$ by another such deformation. Then $A_f\simeq_S A_g\simeq_S A_h$ by Theorem 3.1(a). But now Theorem 3.1(b) implies that $h$ and $f$ can be related by a stable PS-deformation.

Remark 7
Recall that the standard equivalence relation for singularities (see, for example, [11]) is right equivalence. Two singularity germs $f$ and $g$ are right equivalent if there is a diffeomorphism germ $h:\mathbb R^n\to\mathbb R^n$ such that $f=g\circ h$. Obviously, if $f$ and $g$ are nondepraved and right equivalent, then $A_f\approx A_g$ and therefore $f$ and $g$ are stably PS-equivalent. It is also worth mentioning that right equivalence is not implied only by the existence of an isotopy of the inclusions $A_f\subset S^{n-1}$ and $A_g\subset S^{n-1}$ (see [11]).

4. The main theorems and Hamiltonian flows

4.1. Proof of Theorem 1.1
Let $\mathrm{Sing}^{PS}_n$ be the set of stable PS-equivalence classes of nondepraved germs $f:D^n\to\mathbb R$. Let $\Phi:\mathrm{Sing}^{PS}_n\to CW_n$ be defined by $\Phi([f])=[A_f]$, where $[f]$ is the stable PS-equivalence class of $f$ and $[X]$ is the stable homotopy equivalence class of the complex $X\subset S^{n-1}$. ($CW_n$ is the set of stable homotopy classes of complexes admitting a thickening in $S^{n-1}$.) By Theorem 3.1 and Lemma 3.3 we already know that Φ is a bijection. The map Φ is clearly compatible with the involution $f\to-f$ as $A_{-f}$ is the closure of the complement of $A_f$ in $S^{n-1}$. To end the proof of Theorem 1.1 we only need to discuss the two operations that are transported by Φ to the wedge and, respectively, to the join of CW-complexes. The existence of these two operations is obvious as they can be defined using the bijection $\Phi^{-1}$. However, it is useful to have a definition that is more geometric.

Fusion of singularities
Given two nondepraved germs $f,g:D^n\to\mathbb R$, we construct a function $k$ in the following way. Let U = U(, δ) and U′ = U′(, δ) be cylindrical neighbourhoods for $f$ and $g$. We take the connected sum of $U$ and $U'$ by first identifying a disk $D'\subset f^{-1}(0)\cap\partial U$ with a disk $D''\subset g^{-1}(0)\cap\partial U'$ and then extending this identification to the set $W$ of points in $\partial U$ and points in $\partial U'$ situated on flow lines that cross $D'$, respectively, $D''$. Of course, $W$ is again a disk (of dimension $n-1$), and the union $U\cup_W U'$ is a disk of dimension $n$. We define the function $k:U\cup U'\to\mathbb R$ by pasting together $f$ and $g$ on $W$. Of course, by composing with a diffeomorphism we



may assume that $k$ is defined on $D^n$. We see that $k$ has exactly two critical points both with the same critical value equal to zero, that it is negative or zero on a set ambiently isotopic to the connected sum of $A_f$ and $A_g$, and that it extends (translations of) both $f$ and $g$.

Assume now that $[f]$ and $[g]$ are in $\mathrm{Sing}^{PS}_n$. Let $[f]\vee[g]\in\mathrm{Sing}^{PS}_n$ be the stable PS-equivalence class of a nondepraved germ $h:D^n\to\mathbb R$ which admits a PS-deformation $H$ such that $H_0$ extends $h$, $H_1$ extends the function $k$ constructed above, and $H_0$ as well as $H_1$ do not have more critical points than $h$, respectively, $k$. By a method similar to the proof of Theorem 3.1(a), we see that $\Sigma A_h\simeq\Sigma A_f\vee\Sigma A_g$. (This is because the Conley index of the maximal invariant set of $k$ is precisely this wedge.) As Φ is bijective, it follows that this operation is well defined once we prove that such functions $h$, $H$ exist. This follows immediately by the same methods as in Proposition 3.2. (The only thing that is crucial is that even if $k$ has two critical points, it still admits cylindrical neighbourhoods homeomorphic to disks; in this case they correspond to connected sums of cylindrical neighbourhoods of $f$ and $g$.)

Exterior sum
For $[f]\in\mathrm{Sing}^{PS}_n$ and $[g]\in\mathrm{Sing}^{PS}_k$, let $[f]\oplus[g]=[h]\in\mathrm{Sing}^{PS}_{n+k}$, where $h$ is a nondepraved germ such that there is a PS-deformation relating $h$ to $f+g$. (Such a germ $h$ exists by Lemma 3.3.) By Lemma 2.4, for a metric restricting to a product in a neighbourhood of the origin, we have that $f+g$ is neat and $A_{f+g}\simeq A_f*A_g=A_h$. Assume that $h'$ is any other nondepraved germ with the same property relative to $f+g$. Then $A_{h'}\simeq_S A_h$, and therefore $h$ and $h'$ are stably PS-equivalent. As a consequence, the operation is well defined. As an immediate consequence of the properties of these two operations, we obtain Theorem 1.1. □

4.2. Proofs of Theorem 1.2 and of Corollary 1.3
For completeness, we start by recalling the basic notions needed.
A symplectic form $\omega$ on $\mathbb{R}^{2n}$ is a 2-form that is closed and nowhere degenerate. Assume that $\alpha$ is some Riemannian metric on $\mathbb{R}^{2n}$. As before, for a smooth function $g : \mathbb{R}^{2n} \to \mathbb{R}$, let $\nabla^\alpha g$ be the $\alpha$-gradient of $g$. The Hamiltonian vector field $H_g$, induced by $g$, is defined by the equation $\omega(H_g, X) = dg(X)$ that holds for all smooth vector fields $X$. It has the property that $\nabla^\alpha g$ and $H_g$ are $\alpha$-orthogonal, since $\alpha(\nabla^\alpha g, H_g) = dg(H_g) = \omega(H_g, H_g) = 0$. Therefore, $H_g$ is tangent to the hypersurfaces $g^{-1}(a)$. The Hamiltonian flow induced by $g$ and $\omega$ is the flow obtained by integrating $H_g$. We recall that we assume our isolated singularities to be different from local extrema.
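The definition above can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not in the text: it takes $\omega$ to be the standard form $\sum_i dq_i \wedge dp_i$ on $\mathbb{R}^{2n}$, takes $\alpha$ to be the Euclidean metric, and uses a hypothetical test function `saddle`. With the sign convention $\omega(H_g, X) = dg(X)$ used here, this gives $H_g = (\partial g/\partial p, -\partial g/\partial q)$.

```python
def grad(g, x, h=1e-6):
    """Central-difference Euclidean gradient of g at x (a list of floats)."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((g(xp) - g(xm)) / (2.0 * h))
    return out


def omega(u, v):
    """Standard symplectic form on R^{2n}; coordinates ordered (q_1..q_n, p_1..p_n)."""
    n = len(u) // 2
    return sum(u[i] * v[n + i] - u[n + i] * v[i] for i in range(n))


def hamiltonian_field(g, x):
    """Vector H_g(x) solving omega(H_g, X) = dg(X) for every X (assumed standard omega)."""
    n = len(x) // 2
    dg = grad(g, x)
    # Solving the defining equation coordinate by coordinate gives (dg/dp, -dg/dq).
    return dg[n:] + [-d for d in dg[:n]]


def saddle(x):
    """Hypothetical test function on R^4 with a nondegenerate critical point at 0."""
    q1, q2, p1, p2 = x
    return q1 * q1 - p1 * p1 + q2 * p2


point = [0.3, -0.7, 1.1, 0.4]
dg = grad(saddle, point)
H = hamiltonian_field(saddle, point)
X = [1.0, 2.0, -0.5, 0.25]          # an arbitrary test vector

# Orthogonality of the gradient and H_g, and the defining identity.
print("dg . H_g      =", sum(a * b for a, b in zip(dg, H)))
print("omega(H_g, X) =", omega(H, X))
print("dg(X)         =", sum(a * b for a, b in zip(dg, X)))
```

For a general metric $\alpha$ and a general symplectic form the same identities hold, but the coordinate formulas for the gradient and for $H_g$ change.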


Proof of Theorem 1.2
The idea of the proof is simple and is similar to [1, Chapter I, Section 9.2]: assuming that $\bar f$ does not have bounded orbits, we show that there exists a continuation from the gradient flow of $\bar f$ to the gradient flow of $-\bar f$ in such a way that there is some global, compact, isolating neighbourhood. (Here $\bar f$ is an arbitrary PS-extension of $f$.) This implies, by the continuation properties of the Conley index, that $\Sigma A_f \simeq \Sigma A_{-f}$, which shows by Theorem 1.1 that $f$ and $-f$ are stably PS-equivalent. It is useful to note that the intermediate stages of the continuation that is constructed are not gradient flows, and, therefore, we do not construct directly a PS-deformation of $f$ to $-f$.

Fix a metric $\alpha$ on $\mathbb{R}^{2n}$. Let $\bar f$ be some PS-extension of $f$ (relative to $\alpha$). Fix also a symplectic form $\omega$ on $\mathbb{R}^{2n}$. Let $\phi, \psi, \nu : [-1, 1] \to [0, 1]$ be smooth functions such that $\phi(x) = 0$ for $x \ge 0$, $\psi(x) = 0$ for $x \le 0$, $\nu(x) = 0$ for $|x| \ge 1/2$; $\phi$ is decreasing, $\phi(-1) = 1$, $\psi$ is increasing, and $\psi(1) = 1$; $\nu$ is increasing for $x < 0$ and decreasing for $x > 0$, and $\nu(0) = 1$; $\phi(x) + \psi(x) + \nu(x) = 1$ for all $x \in [-1, 1]$. Let $X = -\nabla^\alpha \bar f$, let $H = H_{\bar f}$ be the Hamiltonian vector field induced by $\bar f$, and let $h$ be the associated Hamiltonian flow. Define a new vector field on $[-1, 1] \times \mathbb{R}^{2n}$ by $V(t, x) = \phi(t)X(x) + \nu(t)H(x) - \psi(t)X(x)$. Let $\gamma$ be the associated flow on $[-1, 1] \times \mathbb{R}^{2n}$, and let $\gamma^\tau$ be the restriction of $\gamma$ to $\{\tau\} \times \mathbb{R}^{2n}$. For a flow $\eta$ and a subset $K$ of its domain, we denote by $I_\eta(K)$ the maximal invariant set of $\eta$ inside $K$. We consider the set $N = [-1, 1] \times D(R, 0)$. This set is certainly compact. Notice that $I_\gamma(N) = [-1, 1] \times \{0\} \cup I_h(D(R, 0))$. This happens because

$$I_{\gamma^\tau}(D(R, 0)) = \{0\} \quad \text{if } \tau \ne 0 \tag{1}$$

and $\gamma^0 = h$. Assume that for all $x \in S(R, 0)$ the $h$-orbit of $x$ is not bounded. This is equivalent to the fact that $I_h(D(R, 0)) \cap S(R, 0) = \emptyset$. It follows that $I_\gamma(N) \subset \mathrm{Int}([-1, 1] \times D(R, 0))$, and therefore $[-1, 1] \times D(R, 0)$ is an isolating neighbourhood of this invariant set. The continuation properties of the Conley index, the identity (1), and the particular form of the Conley index of a neat singularity imply that $\Sigma A_f \simeq c_{\gamma^{-1}}(0) = c_{\gamma^{1}}(0) \simeq \Sigma A_{-f}$. □

Remark 8
Clearly, the same argument as above shows that, if $U(R)$, $R \in \mathbb{R}^+$, is any family of compact neighbourhoods of zero with mutually disjoint boundaries and such that $R < R'$ implies $U(R) \subset U(R')$ and $\bigcup_R U(R) = \mathbb{R}^n$, then there is a bounded orbit in $U(R)$ that intersects $\partial U(R)$.
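The cutoff functions $\phi$, $\psi$, $\nu$ used in the proof of Theorem 1.2 are only required to satisfy the listed conditions; the proof does not single out a particular choice. The sketch below gives one possible choice, built from the standard flat function $e^{-1/s}$, and evaluates the interpolating field $V(t, x) = \phi(t)X(x) + \nu(t)H(x) - \psi(t)X(x)$ for a hypothetical model function $f(q, p) = (q^2 - p^2)/2$ on $\mathbb{R}^2$ with the Euclidean metric and $\omega = dq \wedge dp$. The names `smooth_step`, `X`, `H`, and the model function are assumptions made for this illustration.

```python
import math


def _flat(s):
    """Smooth function that vanishes to infinite order for s <= 0."""
    return math.exp(-1.0 / s) if s > 0 else 0.0


def smooth_step(s):
    """Smooth nondecreasing step: 0 for s <= 0, 1 for s >= 1."""
    a, b = _flat(s), _flat(1.0 - s)
    return a / (a + b)


def nu(t):
    """Equals 1 at t = 0 and vanishes for |t| >= 1/2."""
    return 1.0 - smooth_step(2.0 * abs(t))


def phi(t):
    """Decreasing, phi(-1) = 1, and phi(t) = 0 for t >= 0."""
    return smooth_step(-2.0 * t) if t < 0 else 0.0


def psi(t):
    """Increasing, psi(1) = 1, and psi(t) = 0 for t <= 0."""
    return smooth_step(2.0 * t) if t > 0 else 0.0


def V(t, x, X, H):
    """Interpolating field V(t, x) = phi(t) X(x) + nu(t) H(x) - psi(t) X(x)."""
    Xx, Hx = X(x), H(x)
    return [phi(t) * a + nu(t) * b - psi(t) * a for a, b in zip(Xx, Hx)]


# Hypothetical model: f(q, p) = (q^2 - p^2) / 2 on R^2.
def X(x):
    """X = -grad f for the Euclidean metric."""
    q, p = x
    return [-q, p]


def H(x):
    """omega(H, Y) = df(Y) with omega = dq ^ dp gives H = (df/dp, -df/dq)."""
    q, p = x
    return [-p, -q]


x0 = [0.4, -1.3]
for t in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(t, V(t, x0, X, H))
```

At $t = -1$ the field is $X = -\nabla f$, at $t = 0$ it is the Hamiltonian field, and at $t = 1$ it is $-X$, the negative gradient of $-f$, which matches the continuation described in the proof.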

Proof of Corollary 1.3
We first verify that tame equivalence is indeed an equivalence relation. Recall that we say that two nondepraved germs $f$ and $g$ are tamely equivalent if there are self-dual, nondepraved germs $T_1$ and $T_2$ and quadratic forms $q_1$, $q_2$ such that $f + T_1 + q_1$ and $g + T_2 + q_2$ are related by a PS-deformation. By Theorem 3.1 this is equivalent to $A_f * A_{T_1} \simeq_S A_g * A_{T_2}$. Notice that if $T_i$ is a nondepraved germ that is self-dual for $i = 1, 2$, then $T_1 \oplus T_2$ is self-dual. This immediately implies that our relation is an equivalence. By Theorems 1.2 and 1.1, the statement is now obvious. Indeed, we have that $\mathrm{Sing}^T$ is isomorphic, via $\Phi$, to $CW^S_*$ factorized by the equivalence relation identifying $X$ to $\Sigma X$ for every CW-complex $X$ and, additionally, killing all homotopy types of sublinks of self-dual germs. All these germs correspond via $\Phi$ to self-dual complexes. Moreover, as germs of the form $f \oplus (-f)$ are self-dual (as they have a sublink of the form $A * A^*$ with $A^*$ the Spanier-Whitehead dual of $A$), we obtain that $\mathrm{Sing}^T$ is indeed a group, and the required isomorphism is immediate. If $g$ is a nondepraved germ in the stable PS-class of $[f] + [q]$, then $A_g \simeq_S A_f$ and, as $\Phi(f) \ne 0$, it follows that $A_g$ is not (Spanier-Whitehead) self-dual; hence $g$ has bounded orbits.

4.3. Closed orbits
In this subsection we show Corollary 1.4 and discuss its relations with results in the literature. As above, we assume a symplectic form $\omega$ fixed on $\mathbb{R}^{2n}$. Recall that by the closed orbit of a flow $\gamma$ we mean either an orbit generated by a point $x$ such that there is a period $T \in \mathbb{R}$ with $\gamma_T(x) = x$ or the closure of an orbit generated by a point $x$ such that the two limits $\lim_{t\to+\infty} \gamma_t(x)$ and $\lim_{t\to-\infty} \gamma_t(x)$ exist and are equal. Recall also the $\omega$-limits of a point $x$ in the flow $\gamma$:

$$\omega^+(x) = \bigcap_{t>0} \overline{\gamma_{[t,\infty)}(x)}, \qquad \omega^-(x) = \bigcap_{t>0} \overline{\gamma_{(-\infty,-t]}(x)}.$$

We start with a particular case.

LEMMA 4.1
Generically, any PS-function $g : \mathbb{R}^{2n} \to \mathbb{R}$ with a single critical point which is nondegenerate of index different from $n$ induces a Hamiltonian flow with at least one nonconstant closed orbit.

Proof
From Theorem 1.2 we know that the function $g$ has bounded orbits. To show that any generic such $g$ induces a Hamiltonian flow with at least one nonconstant closed orbit, we are going to use the $C^1$-closing lemma of Pugh and Robinson [15] in the following particular form (see [15, Section 11.3]): there is a dense set of Hamiltonian vector fields, $S$, on $\mathbb{R}^{2n}$ that satisfy $\Omega_c(S) \subset \Gamma(S)$. Density is understood here in the $C^1$-Whitney strong topology; $\Gamma(S)$ is the closure of the set of points on periodic trajectories; $\Omega_c(S)$ is the set of nonwandering points with at least one nonvoid $\omega$-limit. We make a distinction here between closed orbits and periodic trajectories: the first class contains the second but also contains the orbits that close at a stationary point. Assume that the index of the critical point of $g$ is $k$. There is a $C^2$-(Whitney-compact open topology) neighbourhood of $g$ consisting of functions that have a single critical point that is nondegenerate and of index $k$. Therefore, generically, we may assume that $g$ induces a Hamiltonian vector field $H_g$ that satisfies $\Omega_c(H_g) \subset \Gamma(H_g)$. We may also assume that the critical point of $g$ is zero. Assume that $x \in \mathbb{R}^{2n}$ generates a bounded orbit of the flow induced by $H_g$. There are two possibilities: either $\omega^+(x) = \omega^-(x) = 0$, and in this case the orbit of $x$ closes at zero, or there is a point $y \in \omega^+(x) \cup \omega^-(x)$ different from zero. This point is nonwandering; notice that the orbit generated by $y$ is also bounded, and hence its $\omega$-limits are nonempty. Therefore $y$ is in the closure of the space of closed orbits, and therefore this space contains more than the point zero.

Proof of Corollary 1.4
Let $f : D^{2n} \to \mathbb{R}$ be a nondepraved germ, and let $\bar f$ be any PS-extension of $f$.
Close to $\bar f$ we may find a generic family of smooth functions $g$ such that the induced Hamiltonian vector fields are $C^1$-generic in the sense above when restricted to a closed disk $D$ and $g$ is a Morse-Smale function with critical points in $D$ of distinct critical values. (This happens because these Morse functions form an open and dense set.) Assume that $a_1, \dots, a_m$ are the critical values corresponding, respectively, to the critical points $x_1, \dots, x_m$. We may apply Lemma 4.1 to each critical point $x_i$ at a time. Indeed, this lemma depends on the existence of bounded orbits of the Hamiltonian flow inside $f^{-1}([a_i - \epsilon, a_i + \epsilon])$ for small $\epsilon$ and all $i$. But the same argument as that used in the first step of the proof of Theorem 1.2 can be applied in this situation (see Remark 8), and it leads to the existence of closed orbits inside this set whenever the index of the critical point $x_i$ is not $n$ (or, in other words, whenever $x_i$, as singularity, is not self-dual). It is easy to see that the number of critical points of index $k$ of $g$ is bounded from below by $\mathrm{rk}(H_{k-1}(A_f; \mathbb{Z}))$. Indeed, the maximal invariant set of the flow induced by the gradient of $g$ is compact, and, as $g$ is very close to $\bar f$, by continuation its Conley index is $\Sigma A_f$. On the other hand, the Morse complex of $g$ computes the homology of this Conley index, and our statement is implied by the Morse inequalities.

Remark 9
(a) It is useful to recall that the problem of finding closed orbits on some hypersurface $h^{-1}(a)$ for the Hamiltonian flow induced by a function $h : \mathbb{R}^{2n} \to \mathbb{R}$ depends only on the set $h^{-1}(a)$ and not on $h$. This provides the connection between our result and, for example, that of P. Rabinowitz [16] claiming that if $h^{-1}(a)$ is diffeomorphic to $S^{n-1}$ under radial projection, then it contains a periodic orbit. Indeed, if $h^{-1}(a)$ has this property, then by possibly replacing $h$ with a different function without modifying the preimage of $a$, we may assume that $h$ has a single critical point that is nondepraved and a minimum. Corollary 1.4 implies that, generically, we can find closed orbits on a hypersurface $h^{-1}(a)$. This is much weaker than Rabinowitz's result, of course. However, our result becomes of interest in the cases when the hypersurfaces in question are not compact and the singularity of $h$ is complicated. In this noncompact setting, it appears that no other tools are available. Of course, one would like to strengthen Corollary 1.4 by producing closed orbits whenever self-duality is not present without the genericity assumption. However, some additional conditions are probably necessary as there are compact hypersurfaces that do not carry any closed orbits (see [7]).
(b) Estimating precisely the number of critical points in close Morse approximations is rather subtle. The question turns out to depend on whether the approximation needs to be only $C^0$-close or higher (see [12]). For a more general discussion on morsification, see also [17].

References
[1] C. CONLEY, Isolated Invariant Sets and the Morse Index, CBMS Regional Conf. Ser. Math. 38, Amer. Math. Soc., Providence, 1978. MR 80c:58009
[2] O. CORNEA, Cone-decompositions and degenerate critical points, Proc. London Math. Soc. (3) 77 (1998), 437–461. MR 99j:57036
[3] O. CORNEA, "Spanier-Whitehead duality and critical points" in Homotopy Theory via Algebraic Geometry and Group Representations (Evanston, Ill., 1997), Contemp. Math. 220, Amer. Math. Soc., Providence, 1998, 47–63. MR 99g:55011
[4] O. CORNEA, Homotopical dynamics: Suspension and duality, Ergodic Theory Dynam. Systems 20 (2000), 379–391. MR CMP 1 756 976
[5] O. CORNEA, Homotopical dynamics, II: Hopf invariants, smoothings and the Morse complex, preprint, 1998, arXiv:math.GT/9812103
[6] E. N. DANCER, Degenerate critical points, homotopy indices and Morse inequalities, J. Reine Angew. Math. 350 (1984), 1–22. MR 85i:58033
[7] V. GINZBURG, Some remarks on symplectic actions of compact groups, Math. Z. 210 (1992), 625–640. MR 93h:57053
[8] M. GORESKY and R. MACPHERSON, Stratified Morse Theory, Ergeb. Math. Grenzgeb. (3) 14, Springer, Berlin, 1988. MR 90d:57039
[9] H. HOFER and E. ZEHNDER, Symplectic Invariants and Hamiltonian Dynamics, Birkhäuser Adv. Texts Basler Lehrbücher, Birkhäuser, Basel, 1994. MR 96g:58001
[10] J. F. P. HUDSON, Concordance, isotopy, and diffeotopy, Ann. of Math. (2) 91 (1970), 425–448. MR 41:4549
[11] H. KING, Real analytic germs and their varieties at isolated singularities, Invent. Math. 37 (1976), 193–199. MR 54:13114
[12] H. KING, The number of critical points in Morse approximations, Compositio Math. 34 (1977), 285–288. MR 56:1330
[13] J. W. MILNOR, Singular Points of Complex Hypersurfaces, Ann. of Math. Stud. 61, Princeton Univ. Press, Princeton, 1968. MR 39:969
[14] R. PALAIS, Lusternik-Schnirelman theory on Banach manifolds, Topology 5 (1966), 115–132. MR 41:4584
[15] C. C. PUGH and C. ROBINSON, The $C^1$ closing lemma, including Hamiltonians, Ergodic Theory Dynam. Systems 3 (1983), 261–313. MR 85m:58106
[16] P. H. RABINOWITZ, Periodic solutions of Hamiltonian systems, Comm. Pure Appl. Math. 31 (1978), 157–184. MR 57:7674
[17] J. REINECK, Continuation to the minimal number of critical points in gradient flows, Duke Math. J. 68 (1992), 185–194. MR 93i:58028
[18] E. ROTHE, A relation between the type numbers of a critical point and the index of the corresponding field of gradient vectors, Math. Nachr. 4 (1951), 12–17. MR 12:720c
[19] D. SALAMON, Connected simple systems and the Conley index of isolated invariant sets, Trans. Amer. Math. Soc. 291 (1985), 1–41. MR 87e:58182
[20] E. H. SPANIER, Function spaces and duality, Ann. of Math. (2) 70 (1959), 338–378. MR 21:6584
[21] F. TAKENS, The minimal number of critical points of a function on a compact manifold and the Lusternik-Schnirelman category, Invent. Math. 6 (1968), 197–244. MR 38:5235
[22] R. THOM, Ensembles et morphismes stratifiés, Bull. Amer. Math. Soc. 75 (1969), 240–284. MR 39:970
[23] C. T. C. WALL, Classification problems in differential topology, IV: Thickenings, Topology 5 (1966), 73–94. MR 33:734
[24] H. WHITNEY, Elementary structure of real algebraic varieties, Ann. of Math. (2) 66 (1957), 545–556. MR 20:2342

Université de Lille 1, Unité de Formation et de Recherche de Mathématiques, 59655 Villeneuve D'Ascq, France; http://www-gat.univ-lille1.fr/~cornea/octav.html


THE ASYMPTOTICS OF MONOTONE SUBSEQUENCES OF INVOLUTIONS JINHO BAIK AND ERIC M. RAINS

Abstract
We compute the limiting distributions of the lengths of the longest monotone subsequences of random (signed) involutions with or without conditions on the number of fixed points (and negated points) as the sizes of the involutions tend to infinity. The resulting distributions are, depending on the number of fixed points, (1) the Tracy-Widom distributions for the largest eigenvalues of random GOE, GUE, GSE matrices, (2) the normal distribution, or (3) new classes of distributions which interpolate between pairs of the Tracy-Widom distributions. We also consider the second rows of the corresponding Young diagrams. In each case the convergence of moments is also shown. The proof is based on the algebraic work of J. Baik and E. Rains in [7], which establishes a connection between the statistics of random involutions and a family of orthogonal polynomials, and on an asymptotic analysis of the orthogonal polynomials which is obtained by extending the Riemann-Hilbert analysis for the orthogonal polynomials by P. Deift, K. Johansson, and Baik in [3].

DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 2. Received 23 February 2000. Revision received 5 February 2001. 2000 Mathematics Subject Classification. Primary 60C05; Secondary 45E05, 05A05. Baik's work supported in part by a Sloan Doctoral Dissertation Fellowship during the academic year 1998–1999 as a graduate student at Courant Institute of Mathematical Sciences.

1. Introduction

β-Plancherel measure
In the last few years, it has been observed by many authors that there are certain connections between random permutations and/or Young tableaux, and random matrices. One of the earliest clues to this relationship appeared in the work of A. Regev [41] in 1981. A Young diagram, or equivalently a partition $\lambda = (\lambda_1, \lambda_2, \dots) \vdash n$ ($\lambda_1 \ge \lambda_2 \ge \cdots$, $\sum_j \lambda_j = n$), is an array of $n$ boxes with top and left adjusted as in the first picture of Figure 1, which represents the example $\lambda = (4, 3, 1) \vdash 8$. A standard Young tableau $Q$ is a filling of the diagram $\lambda$ by numbers $1, 2, \dots, n$ such that numbers are increasing along each row and along each column. In this case, we say that the tableau $Q$ has the shape $\lambda$. The second picture in Figure 1 is an example of a standard Young tableau with shape $\lambda = (4, 3, 1)$.

Figure 1. Young diagram and standard Young tableau

Let $d_\lambda$ denote the number of standard Young tableaux of shape $\lambda$. A result of [41] is that for fixed $\beta > 0$ and fixed $l$, as $n \to \infty$,

$$\sum_{\substack{\lambda \vdash n \\ \lambda_1 \le l}} d_\lambda^{\beta} \sim \left[ \frac{l^{l^2/2}\, l^{n}}{(\sqrt{2\pi})^{(l-1)/2}\, n^{(l-1)(l+2)/4}} \right]^{\beta} \frac{n^{(l-1)/2}}{l!} \int_{\mathbb{R}^l} e^{-(1/2)\beta l \sum_j x_j^2} \prod_{j<k} |x_j - x_k|^{\beta} \, d^l x. \tag{1.1}$$

[...] $\beta > 0$, again from (1.1), we expect that in the large $n$ limit the rows of a random Young diagram under $M_n^\beta$ correspond to the Coulomb charges on the real line with the quadratic potential at the inverse temperature $\beta$, which specializes to GOE, GUE, GSE eigenvalue distributions for the cases $\beta = 1, 2, 4$, respectively. This conjecture seems natural from the perspective of the discrete Coulomb gas interpretation for the Plancherel measure case $M_n^2$ given by Johansson [31], [32].

Random involutions
The Plancherel measure $M_n^2$ has a nice combinatorial interpretation. The well-known Robinson-Schensted correspondence in [42] establishes a bijection between the permutations $\pi$ of size $n$ and the pairs of standard Young tableaux $(P, Q)$ where the shape of $P$ and the shape of $Q$ are the same and the shape of $P$ (or $Q$), denoted by $\lambda(\pi)$, is a partition of $n$. Thus the Plancherel measure $M_n^2$ on $Y_n$ is the pushforward of the uniform probability measure on $S_n$. Moreover, under this correspondence, $\lambda_1(\pi)$ is equal to the length of the longest increasing subsequence of $\pi$. More generally, a theorem of C. Greene [26] says that $\lambda_1(\pi) + \cdots + \lambda_k(\pi)$ is equal to the length of the longest so-called $k$-increasing subsequence of $\pi$. Thus the difference of the lengths of the longest $k$-increasing subsequence and the longest $(k-1)$-increasing subsequence of $\pi \in S_n$ under the uniform probability measure is equal to $\lambda_k$ of $\lambda \in Y_n$ under the Plancherel measure $M_n^2$ in the sense of joint distributions. Thus, for example, (1.3) and (1.4) can be restated as results on the longest increasing subsequence of a random permutation. On the other hand, the sum of the lengths of the first $k$ columns of $\lambda$ is equal to the length of the longest $k$-decreasing subsequence of the corresponding $\pi$. But since the transpose $\lambda^t$ has the same statistics as $\lambda$ under $M_n^2$, the results (1.3) and (1.4) also hold for the longest decreasing subsequence of a random permutation. The measure $M_n^1$ also has a combinatorial interpretation.
If $\pi$ is mapped to $(P, Q)$ under the Robinson-Schensted correspondence, then $\pi^{-1}$ is mapped to $(Q, P)$ (see, e.g., [33, Sec. 5.1.4]). Therefore the set of involutions $\pi = \pi^{-1} \in S_n$ is in bijection with the set of standard Young tableaux whose shapes are partitions of $n$. Consequently, the uniform probability measure on the set of involutions

$$\tilde S_n = \{\pi \in S_n : \pi = \pi^{-1}\} \tag{1.6}$$

is pushed forward to the 1-Plancherel measure $M_n^1$ on $Y_n$. Thus the result (1.5) for $k = 1$ implies that in the large $n$ limit the length of the longest increasing (also decreasing) subsequence of a random involution behaves statistically like the largest eigenvalue of a random GOE matrix. An involution $\pi \in \tilde S_n$ consists of only 1-cycles and 2-cycles. It turns out that if we put a condition on the number of 1-cycles (or fixed points) of $\pi$, the limiting distribution is different. Introduce a new ensemble,

$$S_{n,m} = \{\pi \in \tilde S_{2n+m} : |\{x : \pi(x) = x\}| = m\}. \tag{1.7}$$

For an involution $\pi$, the number of fixed points is equal to the number of odd parts of $\lambda^t$ (see [33]). Equivalently, the number of fixed points of $\pi$ is equal to $\lambda_1 - \lambda_2 + \lambda_3 - \cdots$. Thus the uniform probability measure on the set $S_{n,m}$ is pushed forward to the measure

$$\frac{d_\lambda}{\sum_{\mu \in Y_{n,m}} d_\mu}, \qquad \lambda \in Y_{n,m}, \tag{1.8}$$

where

$$Y_{n,m} = \Bigl\{\lambda = (\lambda_1, \lambda_2, \dots) \in Y_{2n+m} : \sum_j (-1)^{j-1} \lambda_j = m\Bigr\}. \tag{1.9}$$
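The statements above (an involution has $P = Q$, the first row of the shape is the length of the longest increasing subsequence, and the number of fixed points equals $\lambda_1 - \lambda_2 + \lambda_3 - \cdots$) can be checked on a small example. The sketch below is a minimal row-insertion implementation of the Robinson-Schensted correspondence, applied to the involution 1537264 that appears in Figure 2; the function names are chosen for this illustration only.

```python
from bisect import bisect_right


def rsk(perm):
    """Row-insert a sequence of distinct integers; return the tableaux (P, Q)."""
    P, Q = [], []
    for step, value in enumerate(perm, start=1):
        row = 0
        while True:
            if row == len(P):                    # start a new row at the bottom
                P.append([value])
                Q.append([step])
                break
            current = P[row]
            pos = bisect_right(current, value)   # first entry larger than value
            if pos == len(current):              # value fits at the end of the row
                current.append(value)
                Q[row].append(step)
                break
            current[pos], value = value, current[pos]   # bump into the next row
            row += 1
    return P, Q


def lis_length(seq):
    """Longest increasing subsequence length by quadratic dynamic programming."""
    best = []
    for i, v in enumerate(seq):
        best.append(1 + max([best[j] for j in range(i) if seq[j] < v], default=0))
    return max(best, default=0)


pi = [1, 5, 3, 7, 2, 6, 4]                       # one-line notation, pi(1) = 1, pi(2) = 5, ...
P, Q = rsk(pi)
shape = [len(r) for r in P]
fixed = sum(1 for i, v in enumerate(pi, start=1) if v == i)
alternating = sum((-1) ** j * part for j, part in enumerate(shape))

print("P =", P)
print("Q =", Q)
print("shape =", shape, " LIS =", lis_length(pi))
print("fixed points =", fixed, " lambda_1 - lambda_2 + ... =", alternating)
```

Here $2n + m = 7$ with $m = 3$ fixed points and $n = 2$ two-cycles, so this $\pi$ lies in $S_{2,3}$ and its shape lies in $Y_{2,3}$.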

Note that the rows and columns of $\lambda \in Y_{n,m}$ now have different distributions. We denote by L ,(k) n,m and L ,(k) n,m the random variables given by the lengths of the $k$th row and the $k$th column, respectively, of a random $\lambda \in Y_{n,m}$ under the measure (1.8). We also set L n,m = L ,(1) n,m and L n,m = L ,(1) n,m , the length of the longest increasing and decreasing subsequences of a random $\pi \in S_{n,m}$ under the uniform probability measure. Set

$$\alpha = \frac{m}{\sqrt{2n}}. \tag{1.10}$$

The limiting distribution of $L_n$ differs depending on $\alpha$. Indeed, we prove in Theorem 3.1 and Theorem 9.2 that

$$\lim_{n\to\infty} \Pr\left( \frac{L_{n,[\alpha\sqrt{2n}]} - 2\sqrt{2n+m}}{(2n+m)^{1/6}} \le x \right) = F_4(x), \qquad 0 \le \alpha < 1, \tag{1.11}$$

$$\lim_{n\to\infty} \Pr\left( \frac{L_{n,[\sqrt{2n}]} - 2\sqrt{2n+m}}{(2n+m)^{1/6}} \le x \right) = F_1(x), \qquad \alpha = 1, \tag{1.12}$$

$$\lim_{n\to\infty} \Pr\left( \frac{L_{n,[\alpha\sqrt{2n}]} - (\alpha + 1/\alpha)\sqrt{2n+m}}{\sqrt{1/\alpha - 1/\alpha^3}\,(2n+m)^{1/4}} \le x \right) = \mathrm{erf}(x), \qquad \alpha > 1, \tag{1.13}$$

where $F_4$ and $F_1$ are the distributions for the limiting fluctuations of the largest eigenvalues of random GSE and GOE matrices, respectively, and erf is the standard normal distribution. Again, we also prove convergence of moments. We note that $F_4 = F_1^{(2)}$; the limiting distributions of the largest eigenvalue of GSE and the second largest eigenvalue of GOE are the same (see the discussion at the end of Section 3). The role of the number of fixed points for the limiting distribution can be seen from the following point selection picture. Consider a unit square $[0, 1] \times [0, 1]$ in the plane, and set $\delta = \{(x, x) : 0 \le x \le 1\}$, the diagonal. Suppose we select $n$ points at

Figure 2. Point selection process (the example shown corresponds to the involution 1234567 ↦ 1537264)

random in the lower triangle $0 \le x < y \le 1$, and suppose we take the mirror image of the points about the diagonal $\delta$. We also select $m$ points at random on the diagonal $\delta$. Hence there is a total of $2n + m$ points in the square. As illustrated in Figure 2, one such choice of points gives rise to a permutation $\pi$ satisfying $\pi^2 = 1$ with $m$ fixed points; that is, $\pi \in S_{n,m}$. The length $L_n(\pi)$ of the longest increasing subsequence of $\pi$ is then equal to the "length" of the longest (piecewise linear) up/right path in the square from $(0, 0)$ to $(1, 1)$, where the "length" of a path is defined by the number of points on the path. The length of the longest up/right path in the above point selection process has the same distribution as $L_n$. Now, note that the points on $\delta$ form an increasing path. When $m$ is large compared to $n$, there are many points on $\delta$ and we expect that the longest path consists mostly of diagonal points. Hence we are in the linear statistics situation, and thus the order of fluctuation of the length of the longest path is expected to be $(\text{mean})^{1/2}$ by the usual central limit theorem. On the other hand, when $m$ is small compared to $n$, then the longest path contains few diagonal points (none if $m = 0$) and we are in the situation of a 2-dimensional maximization problem. In this case it has been believed, and in a few cases (e.g., [3], [32], [25]) it has been proved, that the fluctuation has order $(\text{mean})^{1/3}$. (For random permutations, which have a similar interpretation as a point selection process, one can see from the scaling in (1.4) that the fluctuation is of order $(\text{mean})^{1/3}$.) Thus there must be a transition of the limiting distribution as the size of $m$ varies. The results (1.11)–(1.13) show that $\alpha = 1$ is the transition point. The fixed points play the role of adding a special line in the 2-dimensional maximization problem (see [6] for a relevant work where two special lines are added to a 2-dimensional maximization problem). We note that when $\alpha = 1$, $L_{n,[\sqrt{2n}]}$ in (1.12) has the same limiting distribution as $\tilde L_n^{(1)}$ (see (1.5)). This is because the typical number of fixed points of a random involution of size $k$ is $\sqrt{k}$.
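The point selection picture can be simulated directly. The sketch below is an illustration only: it samples a uniformly random involution of $2n + m$ letters with exactly $m$ fixed points (by shuffling the letters, fixing the first $m$, and pairing the rest) and computes the length of its longest increasing subsequence by patience sorting. The helper names and the parameter values are assumptions made for this example, and no limiting statement is verified by it.

```python
import math
import random
from bisect import bisect_left


def random_involution(n, m, rng):
    """Uniform involution of {0, ..., 2n+m-1} with exactly m fixed points."""
    size = 2 * n + m
    items = list(range(size))
    rng.shuffle(items)
    perm = list(range(size))          # start from the identity
    rest = items[m:]                  # the first m shuffled letters stay fixed
    for a, b in zip(rest[0::2], rest[1::2]):
        perm[a], perm[b] = b, a       # pair up the remaining letters as 2-cycles
    return perm


def lis_length(seq):
    """Longest increasing subsequence length via patience sorting."""
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


rng = random.Random(2001)
n = 2000
for alpha in (0.5, 1.0, 3.0):
    m = int(alpha * math.sqrt(2 * n))             # m = [alpha * sqrt(2n)], as in (1.10)
    samples = [lis_length(random_involution(n, m, rng)) for _ in range(20)]
    mean = sum(samples) / len(samples)
    print("alpha =", alpha, " m =", m, " mean LIS over 20 samples =", mean)
```

Since the fixed points of an involution always form an increasing subsequence, every sample satisfies `lis_length(perm) >= m`, which is the discrete counterpart of the diagonal path in the picture.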


Indeed, the result (1.5) is proved by using (1.11)–(1.13) and by taking a summation over the number of fixed points (see Section 8) to which only $\alpha = 1$ gives the main contribution. Once the transition point $\alpha = 1$ is known, it is of interest to investigate the transition more carefully. We set

$$\alpha = 1 - \frac{2w}{(2n)^{1/6}}, \tag{1.14}$$

and we take $n \to \infty$ while keeping $w$ fixed. We prove that (see Theorem 3.2) there is a one-parameter family of distribution functions $F(x; w)$, which is expressed in terms of the Riemann-Hilbert representation for the Painlevé II equation (see Definition 4), such that

$$\lim_{n\to\infty} \Pr\left( \frac{L_{n,[\alpha\sqrt{2n}]} - 2\sqrt{2n+m}}{(2n+m)^{1/6}} \le x \right) = F(x; w). \tag{1.15}$$

The new class of distributions $F(x; w)$ interpolates $F_4$ and $F_1$ as $w \to \infty$ and $w = 0$, respectively, and satisfies $\lim_{w\to-\infty} F(x; w) = 0$, so (1.15) is consistent with (1.11)–(1.13). Alternatively, since $F_4 = F_1^{(2)}$, $F(x; w)$ interpolates the limiting distributions of the second and first eigenvalues of a random GOE matrix. The meaning of $F(x; w)$ in terms of random matrices is not clear, but there is a Coulomb gas interpretation. In [7, (7.64)–(7.65)], the following density function is introduced. Suppose $2N$ ordered particles on the positive real line, $0 < \xi_{2N} < \cdots < \xi_2 < \xi_1$, are distributed according to the density function

$$\frac{1}{Z_{2N}}\, e^{A \sum_{j=1}^{2N} (-1)^j \xi_j} \prod_{1 \le i < j \le 2N} (\xi_i - \xi_j) \prod_{j=1}^{2N} e^{-\xi_j/2}\, d\xi_j, \tag{1.16}$$

where $Z_{2N}$ is the normalization constant. Hence, in addition to the usual Coulomb gas interaction, there is an additional attraction between neighboring pairs $\xi_{2j-1}$ and $\xi_{2j}$, $j = 1, \dots, N$. When $A = 0$, this attraction vanishes, and one sees that (1.16) is the eigenvalue density for the $2N \times 2N$ Laguerre orthogonal ensemble (LOE). On the other hand, when $A \to \infty$, the neighboring particles $\xi_{2j-1}$, $\xi_{2j}$, $j = 1, 2, \dots, N$, coalesce, and (1.16) becomes the eigenvalue density function for the $N \times N$ Laguerre symplectic ensemble (LSE). Thus (1.16) interpolates LOE and LSE eigenvalue distributions. This density function arises in a symmetric version of the growth model with a growth rule given by the exponential distribution considered in [31, Prop. 1.4]. A discrete version of the above density function was considered in [8, (4.27)], and the limiting distribution of the largest particle was precisely $F(x; w)$. Now formally taking the exponential limit (set $q = 1 - 1/L$ and $\alpha = 1 - A/L$, and take $L \to \infty$) of [8, (4.27)] (which is convincing but is not justified yet), we see that if we set

$$A = \frac{w}{N^{1/3}} \tag{1.17}$$

in (1.16) and if we take N → ∞, the scaled largest particle (ξ1 −4N )/(2N 1/3 ) has the limiting distribution F (x; w). We plan to exploit the justification of the exponential limit in a later publication. On the other hand, the longest decreasing subsequence corresponds to the longest down/right path from (0, 1) to (1, 0) in the above point selection process. Thus it is clear that the distribution of L n,m is insensitive to the number of fixed points (see Section 3 for results and discussions). The other ensemble we consider is the set of signed involutions. A signed permutation π is a bijection from {−n, . . . , −1, 1, . . . , n} onto itself which satisfies π(x) = −π(−x). The limiting distribution for a random signed permutation is obtained by [46] and [9]. In this paper we consider random signed involutions with/without constraints on fixed points and also on negated points (π(x) = −x) (see Section 3 for results). Especially, we obtain another one-parameter family of distributions which now interpolates F2 and F12 . Here F12 means the limiting distribution for the largest “eigenvalue” of the superimposition of the eigenvalues of two random GOE matrices. We note that in this case, F2 is equal to the second largest “eigenvalue” of such superimposition (see the discussion at the end of Section 3). For convenience of future reference, we summarize various definitions introduced above. By the kth row/column of π we mean the kth row/column of the corresponding Young diagram under the Robinson-Schensted map. Definition 1 Let Sn be the symmetric group of n letters, and let Sn be the set of bijections from {−n, . . . , −2, −1, 1, 2, . . . , n} onto itself satisfying π(x) = −π(−x). We define Sñ = {π ∈ Sn : π = π −1 }, Sn,m = π ∈ S˜2n+m : |{x : π(x) = x}| = m ,

Sñ = {π ∈ Sn : π = π −1 }, Sn,m + ,m − = π ∈ S˜2n+m + +m − : |{x : π(x) = x}| = 2m + , |{x : π(x) = −x}| = 2m − , ˜ L˜ (k) n (π) = the length of the kth row of π ∈ Sn ,

L˜ n

,(k)

(π) = the length of the kth row of π ∈ Sñ ,

,(k) L n,m (π) = the length of the kth row of π ∈ Sn,m , ,(k) (π) L n,m

,(k) L n,m (π) + ,m −

(1.18) (1.19) (1.20)

(1.21) L˜ n = L˜ (1) n , L˜ n = L˜ n

,(1) L n,m = L n,m ,

,(1)

(1.22) ,

(1.23)

(1.24)

= the length of the kth column of π ∈ Sn,m , ,(1) L n,m = L n,m ,

= the length of the kth row of π ∈ Sn,m + ,m − ,

(1.25)


213 ,(1) L n,m + ,m − = L n,m . + ,m −

(1.26)

The results of this paper were announced in [8]. Since we completed this paper, there have been two applications. One is to random vicious walker models (see [23], [8]), and the other is to polynuclear growth models (see [40], [39], [6]). Indeed, there are bijections between the above two applications and various ensembles considered in this paper, and thus the results in this paper can be employed to answer asymptotic questions in the above applications. The proofs of our theorems use the Poissonization and de-Poissonization scheme of [30] and [3]. We define the Poisson generating function, for example, for $L_{n,m}$ by (see Definition 6)

$$Q_l(\lambda_1, \lambda_2) := e^{-\lambda_1 - \lambda_2} \sum_{n_1, n_2 \ge 0} \frac{\lambda_1^{n_1} \lambda_2^{n_2}}{n_1!\, n_2!} \Pr\bigl(L_{n_2, n_1} \le l\bigr). \tag{1.27}$$
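The idea behind Poissonization can be seen in a one-variable toy version of (1.27). The sketch below is an illustration under stated assumptions, not the lemma used in the paper: `q` is a hypothetical slowly varying sequence standing in for the probabilities, and `poissonize` computes $e^{-\lambda} \sum_n \lambda^n q(n)/n!$ by a truncated sum. Because the Poisson weights concentrate within a few multiples of $\sqrt{\lambda}$ around $n = \lambda$, the generating function evaluated at $\lambda$ is close to $q(\lambda)$.

```python
import math


def poissonize(q, lam, terms=None):
    """Truncated Poisson average e^{-lam} * sum_n lam^n / n! * q(n), for lam > 0."""
    if terms is None:
        # Keep terms up to about 20 standard deviations above the mean.
        terms = int(lam + 20.0 * math.sqrt(lam) + 50)
    log_lam = math.log(lam)
    total = 0.0
    for n in range(terms):
        # Work with logarithms so that lam^n / n! does not overflow.
        log_weight = -lam + n * log_lam - math.lgamma(n + 1)
        total += math.exp(log_weight) * q(n)
    return total


def q(n):
    """Hypothetical decreasing sequence, playing the role of Pr(L_n <= l) in n."""
    return 1.0 / (1.0 + n / 100.0)


for lam in (50.0, 200.0, 800.0):
    print("lambda =", lam, " Q(lambda) =", poissonize(q, lam), " q(lambda) =", q(lam))
```

The de-Poissonization lemmas stated in Section 6 make this comparison precise for the actual probabilities, using their monotonicity in $n$.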

A generalization of the de-Poissonization lemma due to Johansson [30] yields that $\Pr(L_{n_2,n_1} \le l) \sim Q_l(n_1, n_2)$ as $n_1, n_2 \to \infty$ (see Section 6 for the precise statement). Thus if we obtain the asymptotics of the generating function, the asymptotics of the coefficients follow. The point of the scheme is that the Poisson generating functions can be expressed in terms of Toeplitz and/or Hankel determinants. The necessary algebraic work for this purpose was done in our earlier paper [7]. The general theory of orthogonal polynomials then tells us that Toeplitz/Hankel determinants can be expressed in terms of orthogonal polynomials. In [7], it turned out that for all the ensembles being discussed in this paper, we need only one family of orthogonal polynomials $\pi_n(z; t) = z^n + \cdots$ which is orthogonal with respect to the weight $e^{t\cos\theta}\, d\theta/(2\pi)$ on the unit circle. This orthogonal polynomial is precisely the same orthogonal polynomial used in [3] to analyze the random permutation problem. The authors in [3] computed the uniform asymptotics of the normalization constant $N_n(t)$ of $\pi_n$ as $n, t \to \infty$ using the steepest-descent analysis for the corresponding Riemann-Hilbert problem (see (5.3)). The difference between the present paper and [3] is that we need $\pi_n(-\alpha; t)$ for all $\alpha \ge 0$, which is in contrast to [3], where only one quantity, $N_n(t)$, was needed. But in order to analyze $N_n(t)$, [3] controlled in a uniform way the asymptotic behavior of the solution to the associated Riemann-Hilbert problem. Therefore the asymptotics of $\pi_n(-\alpha; t)$ for $\alpha$ uniformly apart from 1 can be (almost) directly read off from the analysis of [3] and eventually imply (1.11). The point $z = -1$ ($\alpha = 1$) in the complex plane plays a special role in this Riemann-Hilbert analysis as discussed in [3]: it is the point where a gap starts to open up in the support of the associated equilibrium measure as the relation of $t$ to $n$ varies.
When $\alpha \to 1$ as $n \to \infty$ according to the scaling (1.14) (which is required for (1.15)), we need a more careful analysis of the Riemann-Hilbert problem which is the new part


of the asymptotic analysis of the orthogonal polynomials and the Riemann-Hilbert problem. In this paper we establish this goal by extending the analysis of [3]. In the analysis of the Riemann-Hilbert problem in Section 10, we give a rather sketchy presentation for the parts that overlap the work of [3], but we give a full proof for the new analysis required for the case α → 1. We also rework portions of [3] as necessary for consistency of presentation. This paper is organized as follows. Section 2 defines the Tracy-Widom distribution functions as well as new classes of distribution functions which are used to state the main results, and their properties are discussed (see Lemma 2.1). The main results of the paper are then stated in Section 3. Determinantal formulae and orthogonal polynomial expressions for the Poisson generating functions are taken from [7] and summarized in Section 4. In Section 5 we state the main estimates of the relevant quantities of orthogonal polynomials; these estimates are key to the proofs of the theorems of Section 3. The de-Poissonization lemmas are stated in Section 6. Proofs of the main theorems are given in Section 7 for involutions with constraints on the number of fixed points (see Theorems 3.1, 3.2, 3.3, and 3.5), and in Section 8 for general involutions and equivalently for Mn1 (see Theorems 3.4 and 3.6), respectively. The case when α > 1 is considered in Section 9 (see remark to Theorem 3.3). Finally, the Riemann-Hilbert analysis is given in Section 10, which proves the propositions in Section 5. Notational remarks The ensemble Sñ in the present paper is identical to Sñ in [7]. In [7], Sñ was introduced to denote the ensemble of “neginvolutions,” and we investigated the longest increasing subsequence of π ∈ Sñ . But there is a bijection between Sñ and Sñ , and the longest increasing subsequence of π ∈ Sñ corresponds to the longest decreasing subsequence of the image of π. 
In the present paper we choose the viewpoint of considering both the increasing and decreasing subsequences of involutions of the same ensemble rather than considering only the increasing subsequences of involutions of the different ensembles.

2. Limiting distribution functions

Let u(x) be the solution of the Painlevé II (PII) equation
\[ u_{xx} = 2u^3 + xu \tag{2.1} \]
with the boundary condition
\[ u(x) \sim -\mathrm{Ai}(x) \quad \text{as } x \to +\infty, \tag{2.2} \]

where Ai is the Airy function. The (global) existence and the uniqueness of the solution were first established in [27]; the asymptotics as x → ±∞ are (see, e.g., [27], [19])
\[ u(x) = -\mathrm{Ai}(x) + O\Bigl(\frac{e^{-(4/3)x^{3/2}}}{x^{1/4}}\Bigr) \quad \text{as } x \to +\infty, \tag{2.3} \]
\[ u(x) = -\sqrt{\frac{-x}{2}}\,\Bigl(1 + O\Bigl(\frac{1}{x^2}\Bigr)\Bigr) \quad \text{as } x \to -\infty. \tag{2.4} \]
Recall that $\mathrm{Ai}(x) \sim e^{-(2/3)x^{3/2}}/(2\sqrt{\pi}\,x^{1/4})$ as x → +∞. Define
\[ v(x) := \int_{\infty}^{x} (u(s))^2\,ds, \tag{2.5} \]

so that $v'(x) = (u(x))^2$. We can now introduce the Tracy-Widom (TW) distributions. (Note that q := −u, which Tracy and Widom used in their papers, solves the same differential equation with the boundary condition q(x) ∼ +Ai(x) as x → ∞.)

Definition 2 (TW distribution functions)
Set
\[ F(x) := \exp\Bigl(\frac12\int_x^\infty v(s)\,ds\Bigr) = \exp\Bigl(-\frac12\int_x^\infty (s-x)(u(s))^2\,ds\Bigr), \tag{2.6} \]
\[ E(x) := \exp\Bigl(\frac12\int_x^\infty u(s)\,ds\Bigr), \tag{2.7} \]
and set
\[ F_2(x) := F(x)^2 = e^{-\int_x^\infty (s-x)(u(s))^2\,ds}, \tag{2.8} \]
\[ F_1(x) := F(x)E(x) = \bigl(F_2(x)\bigr)^{1/2} e^{(1/2)\int_x^\infty u(s)\,ds}, \tag{2.9} \]
\[ F_4(x) := F(x)\bigl[E(x)^{-1} + E(x)\bigr]/2 = \bigl(F_2(x)\bigr)^{1/2}\bigl[e^{-(1/2)\int_x^\infty u(s)\,ds} + e^{(1/2)\int_x^\infty u(s)\,ds}\bigr]/2. \tag{2.10} \]
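As a numerical illustration of Definition 2, the following sketch integrates (2.1) backward from a point x0 where u is taken to be −Ai, and accumulates the integrals in (2.6) and (2.7) along the way. It is a rough check under stated assumptions, not a production solver: the function names `airy` and `tracy_widom`, the starting point x0 = 5, and the step count are all choices made here, and the tails of the integrals beyond x0 are simply dropped. At x = 0 the output should land near the commonly tabulated values F2(0) ≈ 0.9694 and F1(0) ≈ 0.8319.

```python
import math

# Ai(0) and Ai'(0) in closed form.
AI0 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))
AIP0 = -1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))

def airy(x, terms=60):
    """Return (Ai(x), Ai'(x)) from the Maclaurin series; intended for moderate |x|."""
    x3 = x ** 3
    a, b = 1.0, 1.0              # a = a_k x^{3k}, b = b_k x^{3k}
    f = g = fp = gp = 0.0        # f'' = x f with f(0)=1; g'' = x g with g'(0)=1
    for k in range(terms):
        f += a
        g += b * x
        gp += (3 * k + 1) * b
        fp += a * x * x / (3 * k + 2)          # derivative of the next term of f
        a *= x3 / ((3 * k + 2) * (3 * k + 3))
        b *= x3 / ((3 * k + 3) * (3 * k + 4))
    return AI0 * f + AIP0 * g, AI0 * fp + AIP0 * gp

def tracy_widom(x, x0=5.0, steps=4000):
    """Approximate (F1(x), F2(x), F4(x)) of (2.8)-(2.10) for x <= x0."""
    ai, aip = airy(x0)
    # State: u, u', v, I = int_x^inf (s-x) u^2 ds, J = int_x^inf u ds.
    # Assumption: u = -Ai at x0 and the tails v, I, J beyond x0 are negligible.
    y = [-ai, -aip, 0.0, 0.0, 0.0]
    h = (x - x0) / steps                        # negative step: integrate backward

    def rhs(s, state):
        u, up, v, _, _ = state
        return [up, 2.0 * u ** 3 + s * u, u * u, v, -u]

    s = x0
    for _ in range(steps):                      # classical RK4
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(s + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(s + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (p + 2 * q + 2 * r + w)
             for yi, p, q, r, w in zip(y, k1, k2, k3, k4)]
        s += h
    F = math.exp(-y[3] / 2.0)                   # (2.6)
    E = math.exp(y[4] / 2.0)                    # (2.7)
    return F * E, F * F, F * (1.0 / E + E) / 2.0
```

Backward integration is used because the boundary condition (2.2) is imposed at +∞; the solution is sensitive for large negative x, so this sketch should only be trusted on a moderate range of x.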

In [44] and [45], Tracy and Widom proved that under proper centering and scaling, the distribution of the largest eigenvalue of a random GUE/GOE/GSE matrix converges to F2(x) / F1(x) / F4(x) as the size of the matrix becomes large. We note that from the asymptotics (2.3) and (2.4), for some positive constant c,
\[ F(x) = 1 + O\bigl(e^{-cx^{3/2}}\bigr) \quad \text{as } x \to +\infty, \tag{2.11} \]
\[ E(x) = 1 + O\bigl(e^{-cx^{3/2}}\bigr) \quad \text{as } x \to +\infty, \tag{2.12} \]
\[ F(x) = O\bigl(e^{-c|x|^{3}}\bigr) \quad \text{as } x \to -\infty, \tag{2.13} \]
\[ E(x) = O\bigl(e^{-c|x|^{3/2}}\bigr) \quad \text{as } x \to -\infty. \tag{2.14} \]

Hence, in particular, $\lim_{x\to+\infty} F_\beta(x) = 1$ and $\lim_{x\to-\infty} F_\beta(x) = 0$, β = 1, 2, 4. Monotonicity of $F_\beta(x)$ follows from the fact that $F_\beta(x)$ is the limit of a sequence of distribution functions. Therefore $F_\beta(x)$ is indeed a distribution function.

Definition 3
Define $\chi_{\mathrm{GOE}}$, $\chi_{\mathrm{GUE}}$, and $\chi_{\mathrm{GSE}}$ to be random variables whose distribution functions are given by F1(x), F2(x), and F4(x), respectively. Define $\chi_{\mathrm{GOE}^2}$ to be a random variable with the distribution function $F_1(x)^2$.

As indicated in the introduction, we need new classes of distribution functions to describe the transitions from $\chi_{\mathrm{GSE}}$ to $\chi_{\mathrm{GOE}}$ and from $\chi_{\mathrm{GUE}}$ to $\chi_{\mathrm{GOE}^2}$. First, we consider the Riemann-Hilbert problem (RHP) for the Painlevé II equation (see [20], [29]). Let Γ be the real line ℝ, oriented from +∞ to −∞, and let m(· ; x) be the solution of the following RHP:
\[
\begin{cases}
m(z;x) \text{ is analytic in } z \in \mathbb{C}\setminus\Gamma, \\
m_+(z;x) = m_-(z;x)\begin{pmatrix} 1 & -e^{-2i((4/3)z^3 + xz)} \\ e^{2i((4/3)z^3 + xz)} & 0 \end{pmatrix} \text{ for } z \in \Gamma, \\
m(z;x) = I + O(1/z) \text{ as } z \to \infty.
\end{cases} \tag{2.15}
\]
Here $m_+(z;x)$ (resp., $m_-$) is the limit of m(z′; x) as z′ → z from the left (resp., right) of the contour Γ: $m_\pm(z;x) = \lim_{\epsilon\downarrow 0} m(z \mp i\epsilon; x)$. Relation (2.15) corresponds to the RHP for the PII equation with the special monodromy data p = −q = 1, r = 0 (see [20], [29], also [22], [19]). In particular, if the solution is expanded at z = ∞,
\[ m(z;x) = I + \frac{m_1(x)}{z} + O\Bigl(\frac{1}{z^2}\Bigr) \quad \text{as } z \to \infty, \tag{2.16} \]
we have
\[ 2i(m_1(x))_{12} = -2i(m_1(x))_{21} = u(x), \tag{2.17} \]
\[ 2i(m_1(x))_{22} = -2i(m_1(x))_{11} = v(x), \tag{2.18} \]

where u(x) and v(x) are defined in (2.1)–(2.5). Now we define two one-parameter families of distribution functions.

Definition 4
Let m(z; x) be the solution of RHP (2.15), and denote by $m_{jk}(z;x)$ the (jk)-entry of m(z; x). For w > 0, define
\[ F\,(x;w) := F(x)\bigl\{\bigl[m_{22}(-iw;x) - m_{12}(-iw;x)\bigr]E(x)^{-1} + \bigl[m_{22}(-iw;x) + m_{12}(-iw;x)\bigr]E(x)\bigr\}/2, \tag{2.19} \]
and for w < 0, define
\[ F\,(x;w) := e^{(8/3)w^3 - 2xw}F(x)\bigl\{\bigl[-m_{21}(-iw;x) + m_{11}(-iw;x)\bigr]E(x)^{-1} - \bigl[m_{21}(-iw;x) + m_{11}(-iw;x)\bigr]E(x)\bigr\}/2. \tag{2.20} \]
Also, define
\[ F\,(x;w) := m_{22}(-iw;x)F_2(x), \qquad w > 0, \tag{2.21} \]
\[ F\,(x;w) := -e^{(8/3)w^3 - 2xw}\,m_{21}(-iw;x)F_2(x), \qquad w < 0. \tag{2.22} \]

First, F (x; w) and F (x; w) are real from Lemma 2.1(i). Note that F (x; w) and F (x; w) are continuous at w = 0 since at z = 0, the jump condition of RHP (2.15) implies $(m_{12})_+(0;x) = -(m_{11})_-(0;x)$ and $(m_{22})_+(0;x) = -(m_{21})_-(0;x)$. In fact, F (x; w) and F (x; w) are entire in w ∈ ℂ from RHP (2.15). From (2.11)–(2.14) and (2.24)–(2.27), we see that
\[ \lim_{x\to+\infty} F\,(x;w),\ F\,(x;w) = 1, \qquad \lim_{x\to-\infty} F\,(x;w),\ F\,(x;w) = 0 \tag{2.23} \]

for any fixed w ∈ ℝ. Also, Theorem 3.2 shows that F (x; w) and F (x; w) are limits of distribution functions, implying that they are monotone in x. Therefore F (x; w) and F (x; w) are indeed distribution functions for each w ∈ ℝ.

Definition 5
Define χw and χw to be random variables with distribution functions F (x; w) and F (x; w), respectively.

We close this section by summarizing some properties of m(−iw; x) in the following lemma. In particular, the lemma implies that F (x; w) interpolates between F4(x) and F1(x), and F (x; w) interpolates between F2(x) and $F_1(x)^2$ (see Corollary 2.2).

LEMMA 2.1
Let $\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, $\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and set [a, b] = ab − ba.
(i) For real w, m(−iw; x) is real.
(ii) For fixed w ∈ ℝ, we have
\[ m(-iw;x) = \bigl(I + O(e^{-cx^{3/2}})\bigr)\begin{pmatrix} 1 & -e^{(8/3)w^3 - 2xw} \\ 0 & 1 \end{pmatrix}, \qquad w > 0,\ x \to +\infty, \tag{2.24} \]
\[ m(-iw;x) = \bigl(I + O(e^{-cx^{3/2}})\bigr)\begin{pmatrix} 1 & 0 \\ -e^{-(8/3)w^3 + 2xw} & 1 \end{pmatrix}, \qquad w < 0,\ x \to +\infty, \tag{2.25} \]
\[ m(-iw;x) \sim \frac{1}{\sqrt2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} e^{(-(4/3)w^3 + xw)\sigma_3}\, e^{((\sqrt2/3)(-x)^{3/2} + \sqrt2\,w^2(-x)^{1/2})\sigma_3}, \qquad w > 0,\ x \to -\infty, \tag{2.26} \]
\[ m(-iw;x) \sim \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} e^{(-(4/3)w^3 + xw)\sigma_3}\, e^{(-(\sqrt2/3)(-x)^{3/2} - \sqrt2\,w^2(-x)^{1/2})\sigma_3}, \qquad w < 0,\ x \to -\infty. \tag{2.27} \]
(iii) For any x, we have
\[ \lim_{w\to0+} m(-iw;x) = \lim_{w\to0-} \sigma_1 m(-iw;x)\sigma_1 = \begin{pmatrix} (1/2)\bigl(E(x)^2 + E(x)^{-2}\bigr) & -E(x)^2 \\ (1/2)\bigl(-E(x)^2 + E(x)^{-2}\bigr) & E(x)^2 \end{pmatrix}. \tag{2.28} \]
(iv) For fixed w ∈ ℝ \ {0}, m(−iw; x) solves the differential equation
\[ \frac{d}{dx} m = w[m, \sigma_3] + u(x)\sigma_1 m, \tag{2.29} \]
where u(x) is the solution of the PII equation (2.1) and (2.2).
(v) For fixed x, m(−iw; x) solves
\[ \frac{\partial}{\partial w} m = (-4w^2 + x)[m, \sigma_3] - 4wu(x)\sigma_1 m - 2\begin{pmatrix} u^2 & -u' \\ u' & -u^2 \end{pmatrix} m. \tag{2.30} \]
(vi) For any x, we have
\[ m(z;x) = \sigma_1 m(-z;x)\sigma_1. \tag{2.31} \]

COROLLARY 2.2
We have
\[ F\,(x;0) = F_1(x), \tag{2.32} \]
\[ \lim_{w\to\infty} F\,(x;w) = F_4(x), \tag{2.33} \]
\[ \lim_{w\to-\infty} F\,(x;w) = 0, \tag{2.34} \]
\[ F\,(x;0) = F_1(x)^2, \tag{2.35} \]
\[ \lim_{w\to\infty} F\,(x;w) = F_2(x), \tag{2.36} \]
\[ \lim_{w\to-\infty} F\,(x;w) = 0. \tag{2.37} \]

Proof
The values at w = 0 follow from (2.28). For w → ±∞, note from RHP (2.15) that we have $\lim_{z\to\infty} m(z;x) = I$.

Proof of Lemma 2.1
Let v(z) = v(z; x) denote the jump matrix of RHP (2.15). Since $\overline{v(-\bar z)} = v(z)$ for z ∈ ℝ, $M(z) := \overline{m(-\bar z; x)}$ also solves the same RHP. By the uniqueness of the solution of RHP (2.15), we have
\[ \overline{m(-\bar z;x)} = m(z;x), \qquad z \in \mathbb{C}\setminus\mathbb{R}. \tag{2.38} \]
Thus m(−iw; x) is real for w ∈ ℝ, which proves (i). By the symmetry of the jump matrix, $\sigma_1 v(-z)^{-1}\sigma_1 = v(z)$, we obtain, by an argument similar to that for (i),
\[ \sigma_1 m(-z;x)\sigma_1 = m(z;x), \tag{2.39} \]
which is (vi). The asymptotic results (ii) as x → ±∞ follow from the calculations in [19, Sec. 6, pp. 329–333].

For the proof of (iv), define a new matrix function
\[ f(z;x) := m(z;x)e^{-i\theta\sigma_3}, \qquad \theta := \frac43 z^3 + xz. \tag{2.40} \]
Then f(· ; x) satisfies the jump condition $f_+(z;x) = f_-(z;x)\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$ for z ∈ ℝ, and $f(z;x)e^{i\theta\sigma_3} \to I$ as z → ∞. Since the jump matrix for f(z; x) is independent of x, f′(z; x), the derivative with respect to x, satisfies $f'_+(z;x) = f'_-(z;x)\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$, and $f'e^{i\theta\sigma_3} + i\theta' f\sigma_3 e^{i\theta\sigma_3} \to 0$ as z → ∞. Hence $f'f^{-1}$ has no jump across ℝ, and it satisfies $f'f^{-1} + i\theta' f\sigma_3 f^{-1} \to 0$ as z → ∞. If we write $m(z;x) = I + m_1(x)/z + O(1/z^2)$ as z → ∞, we have $i\theta' f\sigma_3 f^{-1} = iz\sigma_3 + i[m_1, \sigma_3] + O(z^{-1})$ as z → ∞. Thus $f'f^{-1}$ is entire and as z → ∞, $f'f^{-1} \sim -iz\sigma_3 - i[m_1, \sigma_3]$. Therefore, by Liouville’s theorem, we obtain
\[ f'(z;x)\bigl(f(z;x)\bigr)^{-1} = -iz\sigma_3 - i[m_1, \sigma_3]. \tag{2.41} \]
Recalling that $u(x) = 2i(m_1(x))_{12} = -2i(m_1(x))_{21}$ in (2.17), we have $[m_1, \sigma_3] = iu(x)\sigma_1$. Changing f to m by (2.40), (2.41) becomes
\[ \frac{d}{dx} m(z;x) = iz[m(z;x), \sigma_3] + u(x)\sigma_1 m(z;x). \tag{2.42} \]
This is (2.29) when z = −iw. The proof of (v) is very similar to that of (iv), and the details are left to the reader. We note only that in the derivation of (v) we need the identity
\[ \frac{d}{dx} m_1 = i[m_2, \sigma_3] - i[m_1, \sigma_3]m_1, \tag{2.43} \]
which can be obtained from (2.42) by setting $m(z;x) = I + m_1(x)/z + m_2(x)/z^2 + O(1/z^3)$ as z → ∞.

Finally, we prove (iii). Note that $\lim_{w\to0\pm} m(-iw;x) = m_\pm(0;x)$. From the jump condition at z = 0, we have
\[ m_+(0;x) = m_-(0;x)\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}. \tag{2.44} \]
Letting z → 0, Im z > 0, in (vi), we have $\sigma_1 m_+(0;x)\sigma_1 = m_-(0;x)$, which together with (2.44) implies that $m_+(0;x) = \sigma_1 m_+(0;x)\sigma_1\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$. Thus we have
\[ m_+(0;x) = \begin{pmatrix} a(x) & b(x) \\ a(x) + b(x) & -b(x) \end{pmatrix} \tag{2.45} \]
for some a(x), b(x). Also, the condition det v(z) = 1 for all z ∈ ℝ implies that det m(z; x) = 1 for all z ∈ ℂ \ ℝ, and hence we have
\[ b^2 + 2ab + 1 = 0. \tag{2.46} \]
Now letting z → 0, Im z < 0, in (2.42), we obtain
\[ m'_+(0;x)\bigl(m_+(0;x)\bigr)^{-1} = \begin{pmatrix} 0 & u(x) \\ u(x) & 0 \end{pmatrix}. \tag{2.47} \]
Thus from (2.45) and (2.46), b′/b = −u, which has the solution
\[ b(x) = b(y)\,e^{-\int_y^x u(s)\,ds}. \tag{2.48} \]
From (2.24) with w = 0+, we have
\[ b(x) = (m_{12})_+(0;x) \to -1 \quad \text{as } x \to +\infty. \tag{2.49} \]
Therefore $b(x) = -e^{\int_x^\infty u(s)\,ds}$, which is $-E(x)^2$ from (2.7). Now (2.46) gives $a(x) = (1/2)\bigl(E(x)^2 + E(x)^{-2}\bigr)$, proving (2.28).
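The algebra at w = 0 can be checked mechanically. The sketch below takes an arbitrary positive number E in place of E(x), builds the matrix in (2.28), and confirms that it has determinant 1 as required by (2.46) and that the bracketed factors of (2.19) and (2.20) both reduce to E, which is the statement F (x; 0) = F(x)E(x) = F1(x) of (2.32). The helper names are hypothetical and introduced only for this illustration.

```python
def m_plus_at_zero(E):
    """Matrix (2.28): limit of m(-iw; x) as w -> 0+, in terms of E = E(x) > 0."""
    a = 0.5 * (E ** 2 + E ** -2)
    b = -E ** 2
    return [[a, b], [a + b, -b]]          # shape (2.45)

def sigma1_conj(m):
    """Return sigma_1 m sigma_1, which swaps both the rows and the columns."""
    return [[m[1][1], m[1][0]], [m[0][1], m[0][0]]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def bracket_2_19(m, E):
    """Factor multiplying F(x) in (2.19), the w > 0 branch."""
    return ((m[1][1] - m[0][1]) / E + (m[1][1] + m[0][1]) * E) / 2.0

def bracket_2_20(m, E):
    """Factor multiplying F(x) in (2.20) at w = 0, where the exponential equals 1."""
    return ((-m[1][0] + m[0][0]) / E - (m[1][0] + m[0][0]) * E) / 2.0

for E in (0.3, 0.8, 1.0):
    m_plus = m_plus_at_zero(E)            # w -> 0+
    m_minus = sigma1_conj(m_plus)         # w -> 0-, by Lemma 2.1(vi)
    print(E, det2(m_plus), bracket_2_19(m_plus, E), bracket_2_20(m_minus, E))
```

Both branches agreeing at w = 0 is the continuity noted after Definition 4.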


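3. Statement of results

3.1. Involutions with constraints on the number of fixed points
Recall (see Definition 1 in the introduction) the ensembles $\tilde S_{n,m}$, $\tilde S_{n,m_+,m_-}$ of (signed) involutions with constraints on the number of fixed (and negated) points. We scale the random variables:
\[ \chi_{n,m} := \frac{L_{n,m} - 2\sqrt{2n+m}}{(2n+m)^{1/6}}, \tag{3.1} \]
\[ \chi_{n,m} := \frac{L_{n,m} - 2\sqrt{2n+m}}{(2n+m)^{1/6}}, \tag{3.2} \]
\[ \chi_{n,m_+,m_-} := \frac{L_{n,m_+,m_-} - 2\sqrt{4n + 2m_+ + 2m_-}}{2^{2/3}(4n + 2m_+ + 2m_-)^{1/6}}. \tag{3.3} \]

THEOREM 3.1
For fixed α and β, we have
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}\alpha]} \le x\bigr) = F_4(x), \qquad 0 \le \alpha < 1, \tag{3.4} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}]} \le x\bigr) = F_1(x), \tag{3.5} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}\alpha]} \le x\bigr) = 0, \qquad \alpha > 1; \tag{3.6} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{2n}\beta]} \le x\bigr) = F_1(x), \qquad \beta \ge 0; \tag{3.7} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]} \le x\bigr) = F_2(x), \qquad 0 \le \alpha < 1,\ \beta \ge 0, \tag{3.8} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{n}],[\sqrt{n}\beta]} \le x\bigr) = F_1(x)^2, \qquad \beta \ge 0, \tag{3.9} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]} \le x\bigr) = 0, \qquad \alpha > 1,\ \beta \ge 0. \tag{3.10} \]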

As indicated in the introduction, as α → 1 at a certain rate, we see smooth transitions.

THEOREM 3.2
For fixed w ∈ ℝ and β ≥ 0, we have
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,m} \le x\bigr) = F\,(x;w), \qquad m = [\sqrt{2n} - 2w(2n)^{1/3}], \tag{3.11} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi_{n,m_+,m_-} \le x\bigr) = F\,(x;w), \qquad m_+ = [\sqrt{n} - 2wn^{1/3}],\ m_- = [\sqrt{n}\beta]. \tag{3.12} \]


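From Corollary 2.2, this result is consistent with Theorem 3.1. We also have convergence of moments.

THEOREM 3.3
For any p = 1, 2, 3, . . ., the following hold. For fixed α and β,
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{2n}\alpha]})^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \qquad 0 \le \alpha < 1, \tag{3.13} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{2n}]})^p\bigr) = E\bigl((\chi_{\mathrm{GOE}})^p\bigr), \tag{3.14} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{2n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GOE}})^p\bigr), \qquad 0 \le \beta, \tag{3.15} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GUE}})^p\bigr), \qquad 0 \le \alpha < 1,\ \beta \ge 0, \tag{3.16} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,[\sqrt{n}],[\sqrt{n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GOE}^2})^p\bigr), \qquad \beta \ge 0. \tag{3.17} \]
Also, for fixed w ∈ ℝ and β ≥ 0,
\[ \lim_{n\to\infty} E\bigl((\chi_{n,m})^p\bigr) = E\bigl((\chi_w)^p\bigr), \qquad m = [\sqrt{2n} - 2w(2n)^{1/3}], \tag{3.18} \]
\[ \lim_{n\to\infty} E\bigl((\chi_{n,m_+,m_-})^p\bigr) = E\bigl((\chi_w)^p\bigr), \qquad m_+ = [\sqrt{n} - 2wn^{1/3}],\ m_- = [\sqrt{n}\beta]. \tag{3.19} \]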

Remark. Theorem 3.1 shows that when α > 1 is fixed, we have used incorrect scaling. When properly scaled, the resulting limiting distribution is Gaussian (see Section 9 for the statement and the proof).

The proofs of Theorems 3.1, 3.2, and 3.3 are provided in Section 7.

In terms of the point selection process, which is a version of (directed site) percolation, mentioned in the introduction, the above results show that the limiting distribution of the longest path depends on the geometry of the domain, while the order of fluctuation is the same: $(\text{mean})^{1/3}$. From (1.4), the longest up/right path in a rectangle 0 ≤ x, y ≤ 1 has F2 in the limit, while the longest up/right path in a lower triangle 0 ≤ x < y ≤ 1 has F4 (see (3.4)) in the limit if there are no points on the edge 0 ≤ x = y ≤ 1. If there are points on 0 ≤ x = y ≤ 1, they affect the length of the longest up/right path. On the other hand, the longest down/right path corresponding to $L_{n,m}$ can be thought of as the longest path from the point (0, 1) to the line 0 ≤ x = y ≤ 1. Thus result (3.7) shows that the point-to-line maximizing path has a different limiting distribution from that of the point-to-point maximizing path (F2, from (1.4)), though the fluctuation order is identical. One can also state similar results for the Poisson process (see Proposition 7.3) and certain directed site percolation processes considered in [31] (see [8]). This observation came from discussions between Baik and Charles Newman, to whom we are especially grateful.


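3.2. General involutions
Now we consider general involutions and signed involutions without any conditions on the number of fixed or negated points.

THEOREM 3.4
For any fixed x ∈ ℝ, we have
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi_n := \frac{\tilde L_n - 2\sqrt{n}}{n^{1/6}} \le x\Bigr) = F_1(x), \tag{3.20} \]
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi_n := \frac{\tilde L_n - 2\sqrt{2n}}{2^{2/3}(2n)^{1/6}} \le x\Bigr) = F_1(x)^2. \tag{3.21} \]
Also, for any p = 1, 2, 3, . . .,
\[ \lim_{n\to\infty} E\bigl((\tilde\chi_n)^p\bigr) = E\bigl((\chi_{\mathrm{GOE}})^p\bigr), \tag{3.22} \]
\[ \lim_{n\to\infty} E\bigl((\tilde\chi_n)^p\bigr) = E\bigl((\chi_{\mathrm{GOE}^2})^p\bigr). \tag{3.23} \]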
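To make the quantity in (3.20) concrete, here is a small simulation sketch. It assumes, as the definitions in the introduction suggest but which is not restated in this section, that $\tilde L_n$ is the length of the longest increasing subsequence of a uniformly random involution of n letters. The function names (`lis_length`, `random_involution`, `scaled_first_row`) are illustrative only. The involution sampler uses the standard count I(k) = I(k−1) + (k−1)I(k−2) so that each involution is equally likely.

```python
import bisect
import random

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []                      # tails[i] = smallest tail of an increasing run of length i+1
    for value in seq:
        i = bisect.bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    return len(tails)

def random_involution(n, rng):
    """Uniformly random involution of {0, ..., n-1}, returned as a list p with p[p[i]] == i."""
    count = [1, 1]                  # count[k] = number of involutions of k letters
    for k in range(2, n + 1):
        count.append(count[k - 1] + (k - 1) * count[k - 2])
    perm = list(range(n))
    remaining = list(range(n))
    while remaining:
        k = len(remaining)
        first = remaining.pop()
        # `first` is a fixed point with probability count[k-1] / count[k].
        if k == 1 or rng.randrange(count[k]) < count[k - 1]:
            continue
        partner = remaining.pop(rng.randrange(len(remaining)))
        perm[first], perm[partner] = partner, first
    return perm

def scaled_first_row(length, n):
    """Centering and scaling used in (3.20): (L - 2 sqrt(n)) / n^(1/6)."""
    return (length - 2.0 * n ** 0.5) / n ** (1.0 / 6.0)

rng = random.Random(2001)
n = 400
samples = [scaled_first_row(lis_length(random_involution(n, rng)), n) for _ in range(200)]
print(sum(samples) / len(samples))  # empirical mean of the scaled variable at this n
```

At such small n the empirical distribution is only a rough approximation to the limit; the sketch is meant to show the objects involved, not to reproduce F1 accurately.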

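As mentioned in the introduction, this result proves that the first row of a random Young diagram under the 1-Plancherel measure $M_n^1$ behaves statistically like the largest eigenvalue of a random GOE matrix as n → ∞. The proof of Theorem 3.4 is given in Section 8.

3.3. Second rows
For the second row, we scale the same way as in (3.1)–(3.3), and we denote the scaled random variables by $\chi^{\,,(2)}_{n,m}$, $\chi^{\,,(2)}_{n,m}$, and $\chi^{\,,(2)}_{n,m_+,m_-}$, respectively.

THEOREM 3.5
Let α, β ≥ 0 be fixed. Then
\[ \lim_{n\to\infty} \Pr\bigl(\chi^{\,,(2)}_{n,[\sqrt{2n}\alpha]} \le x\bigr) = F_4(x), \tag{3.24} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi^{\,,(2)}_{n,[\sqrt{2n}\beta]} \le x\bigr) = F_4(x), \tag{3.25} \]
\[ \lim_{n\to\infty} \Pr\bigl(\chi^{\,,(2)}_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]} \le x\bigr) = F_2(x), \tag{3.26} \]
and for any p = 1, 2, 3, . . .,
\[ \lim_{n\to\infty} E\bigl((\chi^{\,,(2)}_{n,[\sqrt{2n}\alpha]})^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \tag{3.27} \]
\[ \lim_{n\to\infty} E\bigl((\chi^{\,,(2)}_{n,[\sqrt{2n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \tag{3.28} \]
\[ \lim_{n\to\infty} E\bigl((\chi^{\,,(2)}_{n,[\sqrt{n}\alpha],[\sqrt{n}\beta]})^p\bigr) = E\bigl((\chi_{\mathrm{GUE}})^p\bigr). \tag{3.29} \]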


Theorem 3.5 is proved in Section 7. As in the first row, these results yield the following theorem on the second rows of general (signed) involutions. The proof is very similar to the proof of Theorem 3.4, and we skip the details.

THEOREM 3.6
For any fixed x ∈ ℝ, we have
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi^{(2)}_n := \frac{\tilde L^{(2)}_n - 2\sqrt{n}}{n^{1/6}} \le x\Bigr) = F_4(x), \tag{3.30} \]
\[ \lim_{n\to\infty} \Pr\Bigl(\tilde\chi^{\,,(2)}_n := \frac{\tilde L^{\,,(2)}_n - 2\sqrt{2n}}{2^{2/3}(2n)^{1/6}} \le x\Bigr) = F_2(x). \tag{3.31} \]
Also, for any p = 1, 2, 3, . . .,
\[ \lim_{n\to\infty} E\bigl((\tilde\chi^{(2)}_n)^p\bigr) = E\bigl((\chi_{\mathrm{GSE}})^p\bigr), \tag{3.32} \]
\[ \lim_{n\to\infty} E\bigl((\tilde\chi^{\,,(2)}_n)^p\bigr) = E\bigl((\chi_{\mathrm{GUE}})^p\bigr). \tag{3.33} \]

We conclude this section with some remarks on GOE and GSE. If the conjecture given in the introduction that the kth row of a random involution behaves in the limit like the kth largest eigenvalue of a random GOE matrix is true, the result (3.30) suggests that the limiting distribution, $F_1^{(2)}$, of the second largest eigenvalue of GOE is equal to the limiting distribution, $F_4$, of the largest eigenvalue of GSE. Equivalently, since a GSE matrix has double eigenvalues, the second eigenvalues of GOE and GSE are expected to have the same limiting distribution. An indication for this is [36, Th. 10.6.1], which says that the distributions of N alternate angles of the eigenvalues of a random (2N × 2N)-matrix taken from the circular orthogonal ensemble (COE) are identical to those of the N angles of the eigenvalues of a random (N × N)-matrix taken from the circular symplectic ensemble (CSE). Indeed, for 2N × 2N Laguerre ensembles, we have proved that the joint distributions of the second, fourth, sixth, . . . largest eigenvalues of the Laguerre orthogonal ensemble (LOE) and the Laguerre symplectic ensemble (LSE) are identical (see [7, Rem. 1 to Cor. 7.6]). In particular, since the kth largest eigenvalue of a Laguerre ensemble has the same limiting distribution as the corresponding quantity for a Gaussian ensemble, the above remark implies that
\[ F_4^{(2k-1)} = F_4^{(2k)} = F_1^{(2k)}, \qquad k = 1, 2, \ldots. \tag{3.34} \]

Thus (3.20) and (3.30) imply that the first and second rows of a random involution have the same limiting distribution as the first and second eigenvalues of GOE, respectively.


Recently the authors of [24] proved that the same property holds true for GOE and GSE. They also proved, among many other things, that the (2k)th “eigenvalue” of a superimposition of two random GOE matrices has the same distribution as the kth eigenvalue of a random GUE matrix. In particular, when k = 1, this implies that
\[ (F_1^2)^{(2)} = F_2, \tag{3.35} \]
and hence (3.21) and (3.31) state that the first and second rows of a random signed involution have the same limiting distribution as the first and second “eigenvalues” of the superimposition of two random GOE matrices, respectively.

4. Poisson generating functions
We review the results from [7] which we need in the proof of the theorems in Section 3. As in [7], throughout the paper the notation ~ indicates an arbitrary member of the set { , , }.

Definition 6
We define the Poisson generating functions for the distributions introduced above:
\[ Q_l(\lambda_1, \lambda_2) := e^{-\lambda_1 - \lambda_2} \sum_{n_1, n_2 \ge 0} \frac{\lambda_1^{n_1}\lambda_2^{n_2}}{n_1!\,n_2!} \Pr\bigl(L_{n_2,n_1} \le l\bigr), \tag{4.1} \]
\[ Q_l(\lambda_1, \lambda_2) := e^{-\lambda_1 - \lambda_2} \sum_{n_1, n_2 \ge 0} \frac{\lambda_1^{n_1}\lambda_2^{n_2}}{n_1!\,n_2!} \Pr\bigl(L_{n_2,n_1} \le l\bigr), \tag{4.2} \]
\[ Q_l(\lambda_1, \lambda_2, \lambda_3) := e^{-\lambda_1 - \lambda_2 - \lambda_3} \sum_{n_1, n_2, n_3 \ge 0} \frac{\lambda_1^{n_1}\lambda_2^{n_2}\lambda_3^{n_3}}{n_1!\,n_2!\,n_3!} \Pr\bigl(L_{n_3,n_1,n_2} \le l\bigr). \tag{4.3} \]

As in [7], let $\tilde f_{nml}$ (resp., $\tilde f_{nml}$) be the number of involutions on n numbers with m fixed points with no increasing (resp., decreasing) subsequence of length greater than l. Thus $\tilde f_{(2n_2+n_1)n_1 l} = \Pr(L_{n_2,n_1} \le l)\cdot|S_{n_2,n_1}|$, and so on. Also, let $\tilde f_{n m_+ m_- l}$ be the number of signed involutions on 2n letters with $2m_+$ fixed points and $2m_-$ negated points with no increasing subsequence of length greater than l: $\tilde f_{(2n_3+n_1+n_2)n_1 n_2 l} = \Pr(L_{n_3,n_1,n_2} \le l)\cdot|S_{n_3,n_1,n_2}|$. We also define
\[ P_l(t;\alpha) := e^{-\alpha t - t^2/2} \sum_{0 \le n} \frac{t^n}{n!} \sum_{0 \le m} \alpha^m \tilde f_{nml}, \tag{4.4} \]
\[ P_l(t;\beta) := e^{-\beta t - t^2/2} \sum_{0 \le n} \frac{t^n}{n!} \sum_{0 \le m} \beta^m \tilde f_{nml}, \tag{4.5} \]
\[ P_l(t;\alpha,\beta) := e^{-\alpha t - \beta t - t^2} \sum_{0 \le n} \frac{t^n}{n!} \sum_{0 \le m_+, m_-} \alpha^{m_+}\beta^{m_-} \tilde f_{n m_+ m_- l}. \tag{4.6} \]


Using $|S_{n,m}| = (2n+m)!/(n!\,m!\,2^n)$ and $|S_{n,m_+,m_-}| = (2n+m_++m_-)!/(n!\,m_+!\,m_-!)$, it is easy to check that
\[ P_l(t;\alpha) = Q_l(\alpha t, t^2/2), \tag{4.7} \]
\[ P_l(t;\beta) = Q_l(\beta t, t^2/2), \tag{4.8} \]
\[ P_l(t;\alpha,\beta) = Q_l(\alpha t, \beta t, t^2). \tag{4.9} \]
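The first counting formula used here can be confirmed by brute force for small sizes. The sketch below enumerates all permutations of 2n + m letters, keeps the involutions with exactly m fixed points, and compares the count with (2n+m)!/(n! m! 2^n). The function names are illustrative, and the enumeration is only feasible for a handful of letters.

```python
from itertools import permutations
from math import factorial

def count_involutions(n, m):
    """Brute-force count of involutions of 2n+m letters with exactly m fixed points."""
    size = 2 * n + m
    total = 0
    for p in permutations(range(size)):
        is_involution = all(p[p[i]] == i for i in range(size))
        fixed_points = sum(1 for i in range(size) if p[i] == i)
        if is_involution and fixed_points == m:
            total += 1
    return total

def size_formula(n, m):
    """The closed form (2n+m)! / (n! m! 2^n) quoted in the text."""
    return factorial(2 * n + m) // (factorial(n) * factorial(m) * 2 ** n)

for n, m in [(1, 0), (1, 1), (2, 0), (2, 1), (1, 3), (2, 2)]:
    print((n, m), count_involutions(n, m), size_formula(n, m))
```

With this cardinality the factor n1! n2! in (4.1) combines with |S| to leave 1/(2n2+n1)! and a power of 2, which is what turns the Q-series at (αt, t²/2) into the P-series (4.4).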

It turns out that the P-formulae in (4.4)–(4.6) are useful for algebraic manipulations (see [7]), while the Q-formulae (4.1)–(4.3) are well adapted to asymptotic analysis. The following results from [7] provide the starting point for our analysis in this paper. For a nonnegative integer k, define $\pi_k(z;t) = z^k + \cdots$ to be the monic orthogonal polynomial of degree k with respect to the weight function exp(t(z + 1/z)) dz/(2πiz) on the unit circle. Let the norm of $\pi_k(z;t)$ be $N_k(t)$:
\[ \int_\Sigma \pi_n(z;t)\,\overline{\pi_m(z;t)}\,e^{t(z+1/z)}\,\frac{dz}{2\pi i z} = N_n(t)\delta_{nm}. \tag{4.10} \]
We note that all the coefficients of $\pi_n(z;t)$ are real. Define
\[ \pi_n^*(z;t) := z^n \pi_n(z^{-1};t). \tag{4.11} \]
Then we have the following theorem.

THEOREM 4.1 ([7, Cors. 4.3 and 2.7])
For α, β ≥ 0,
\[ P_{2l}(t;\alpha) = e^{-\alpha t - t^2/2}\,\frac12\Bigl\{\bigl[\pi^*_{2l-1}(-\alpha;t) - \alpha\pi_{2l-1}(-\alpha;t)\bigr]D_l^{--}(t) + \bigl[\pi^*_{2l-1}(-\alpha;t) + \alpha\pi_{2l-1}(-\alpha;t)\bigr]D_{l-1}^{++}(t)\Bigr\}, \tag{4.12} \]
\[ P_{2l+1}(t;\alpha) = e^{-\alpha t - t^2/2}\,\frac12\Bigl\{\bigl[\pi^*_{2l}(-\alpha;t) + \alpha\pi_{2l}(-\alpha;t)\bigr]e^{t}D_l^{+-}(t) + \bigl[\pi^*_{2l}(-\alpha;t) - \alpha\pi_{2l}(-\alpha;t)\bigr]e^{-t}D_l^{-+}(t)\Bigr\}, \tag{4.13} \]
\[ P_{2l+1}(t;\beta) = e^{-t^2/2}D_l^{++}(t), \tag{4.14} \]
\[ P_{2l+1}(t;\alpha,\beta) = e^{-\alpha t - t^2}\,\pi_l^*(-\alpha;t)D_l(t), \tag{4.15} \]
where for any real t ≥ 0, $D_l(t)$ and $D_l^{\pm\pm}(t)$ are certain Toeplitz and Hankel determinants which in turn can be written as
\[ e^{-t^2}D_l(t) = \prod_{j \ge l} N_j(t)^{-1}, \tag{4.16} \]
\[ e^{-t^2/2}D_l^{--}(t) = \prod_{j \ge l} N_{2j+2}(t)^{-1}\bigl(1 + \pi_{2j+2}(0;t)\bigr), \tag{4.17} \]
\[ e^{-t^2/2}D_l^{++}(t) = \prod_{j \ge l} N_{2j+2}(t)^{-1}\bigl(1 - \pi_{2j+2}(0;t)\bigr), \tag{4.18} \]
\[ e^{-t^2/2+t}D_l^{+-}(t) = \prod_{j \ge l} N_{2j+1}(t)^{-1}\bigl(1 - \pi_{2j+1}(0;t)\bigr), \tag{4.19} \]
\[ e^{-t^2/2-t}D_l^{-+}(t) = \prod_{j \ge l} N_{2j+1}(t)^{-1}\bigl(1 + \pi_{2j+1}(0;t)\bigr). \tag{4.20} \]

Remark. The absence of β on the right-hand side of (4.14) is fairly simple to explain. Observe that in the point selection model, the longest decreasing subsequence can always be chosen to be symmetric about the diagonal; moreover, any decreasing subsequence can contain at most one diagonal point. Thus if the longest decreasing subsequence has l points, then removing the diagonal points results in a longest decreasing subsequence with 2[l/2] points. The independence from β is thus special to $P_l$ for l odd; for l even, we do indeed have β-dependence. But by the monotonicity of $P_l$ in l, we need only (4.14) to compute the limiting distribution; in particular, the limiting distribution does not depend on β. A similar remark applies to (4.15).

As a special case, we have the following theorem.

THEOREM 4.2 ([7, Th. 2.5 and Cor. 4.3])
For l ≥ 0, we have the following formulae:
\[ P_{2l+2}(t;0) = e^{-t^2/2}\bigl[D_{l+1}^{--}(t) + D_l^{++}(t)\bigr]/2, \tag{4.21} \]
\[ P_{2l+1}(t;0) = e^{-t^2/2}\bigl[e^{t}D_l^{+-}(t) + e^{-t}D_l^{-+}(t)\bigr]/2, \tag{4.22} \]
\[ P_{2l}(t;0) = e^{-t^2/2}D_l^{++}(t), \tag{4.23} \]
\[ P_{2l}(t;1) = P_{2l}(t;1) = e^{-t-t^2/2}D_l^{-+}(t), \tag{4.24} \]
\[ P_{2l+1}(t;1) = P_{2l+1}(t;1) = e^{-t^2/2}D_l^{++}(t), \tag{4.25} \]
\[ P_{2l}(t;0,0) = e^{-t^2}D_l(t), \tag{4.26} \]
\[ P_{4l+1}(t;1,\beta) = e^{-t-t^2}D_l^{++}(t)D_l^{-+}(t), \tag{4.27} \]
\[ P_{4l+3}(t;1,\beta) = e^{-t-t^2}D_l^{++}(t)D_{l+1}^{-+}(t). \tag{4.28} \]
Also, $P_0(t;0) = e^{-t^2/2}D_0^{--}(t) = e^{-t^2/2}$.

For the second row, we define the Poisson generating functions in a similar manner. Then we have the following theorem.


THEOREM 4.3 ([7, Th. 5.8 and Cor. 5.12])
For α, β ≥ 0,
\[ P_l^{\,,(2)}(t;\alpha) = P_l(t;0), \tag{4.29} \]
\[ P_{2l+1}^{\,,(2)}(t;\beta) = P_{2l}(t;0), \tag{4.30} \]
\[ P_{2l+1}^{\,,(2)}(t;\alpha,\beta) = P_{2l}(t;0,0). \tag{4.31} \]

5. Asymptotics of orthogonal polynomials
Let Σ = {z ∈ ℂ : |z| = 1} be the unit circle in the complex plane, oriented counterclockwise. Set
\[ \psi(z;t) := e^{t(z + z^{-1})}. \tag{5.1} \]
Let $\pi_n(z;t) = z^n + \cdots$ be the nth monic orthogonal polynomial with respect to the measure ψ(z; t) dz/(2πiz) on the unit circle. From Theorems 4.1 and 4.2, in order to obtain the asymptotics of the Poisson generating functions, we need the asymptotics, as k, t → ∞, of
\[ N_k(t), \qquad \pi_k(z;t), \qquad \pi_k^*(z;t). \tag{5.2} \]
In this section, we summarize the asymptotic results for these quantities. Define the (2 × 2)-matrix-valued function of z in ℂ \ Σ by
\[
Y(z;k;t) := \begin{pmatrix}
\pi_k(z;t) & \displaystyle\int_\Sigma \frac{\pi_k(s;t)}{s-z}\,\frac{\psi(s;t)\,ds}{2\pi i s^k} \\
-N_{k-1}(t)^{-1}\pi^*_{k-1}(z;t) & \displaystyle -N_{k-1}(t)^{-1}\int_\Sigma \frac{\pi^*_{k-1}(s;t)}{s-z}\,\frac{\psi(s;t)\,ds}{2\pi i s^k}
\end{pmatrix}, \qquad k \ge 1. \tag{5.3}
\]
Then Y(· ; k; t) solves the following RHP (see [3, Lem. 4.1]):
\[
\begin{cases}
Y(z;k;t) \text{ is analytic in } z \in \mathbb{C}\setminus\Sigma, \\
Y_+(z;k;t) = Y_-(z;k;t)\begin{pmatrix} 1 & z^{-k}\psi(z;t) \\ 0 & 1 \end{pmatrix} \text{ on } z \in \Sigma, \\
Y(z;k;t)\begin{pmatrix} z^{-k} & 0 \\ 0 & z^{k} \end{pmatrix} = I + O(1/z) \text{ as } z \to \infty.
\end{cases} \tag{5.4}
\]
Here the notation $Y_+(z;k)$ (resp., $Y_-$) denotes the limiting value $\lim_{z'\to z} Y(z';k)$ with |z′| < 1 (resp., |z′| > 1). Note that k and t play the role of external parameters in RHP (5.4); in particular, the term O(1/z) does not imply a uniform bound in k and t. One can easily show that the solution of RHP (5.4) is unique; hence (5.3) is the unique solution of RHP (5.4). This RHP formulation of orthogonal polynomials on the unit circle given in [3] is an adaptation of a result of A. Fokas, A. Its, and A. Kitaev in [21], where they considered orthogonal polynomials on the real line.


From (5.3), the quantities in (5.2) are equal to
\[ N_{k-1}(t)^{-1} = -Y_{21}(0;k;t), \tag{5.5} \]
\[ \pi_k(z;t) = Y_{11}(z;k;t), \tag{5.6} \]
\[ \pi_k^*(z;t) = z^k Y_{11}(z^{-1};k;t) = Y_{21}(z;k+1;t)\bigl(Y_{21}(0;k+1;t)\bigr)^{-1}. \tag{5.7} \]

(For the other entries of Y, one can check directly from (5.3) that $Y_{12}(0;k;t) = N_k(t)$, $Y_{22}(0;k;t) = \pi_k(0;t)$.) Thus the asymptotic analysis of RHP (5.4) would yield the asymptotics of the above quantities, and hence eventually the theorems in Section 3. The asymptotic analysis of RHP (5.4) was conducted in [3] with special interest in $Y_{21}(0;k;t)$. But as mentioned in the introduction, [3] controlled the solution Y(z) to RHP (5.4) in a uniform way. In [3] and Proposition 5.1, it is natural to distinguish five different regimes of k and t. From the analysis of [3], the following results for Y(0; k; t), except for $\pi_k(0;t)$ in Proposition 5.1(ii), can be directly read off. For example, [3, (5.34)–(5.35)] yield the estimates for Proposition 5.1(iii) when x ≥ 0. For Proposition 5.1(ii), we need to improve the $L^1$-norm bound (see [3, (5.23)]) of the associated jump matrix. If one is interested only in $N_{k-1}(t)$, the first integral involving $w^{(3)}(s)$ in [3, displayed equation before (5.19)] vanishes, and hence [3, bound (5.23)] is enough. But for $\pi_k(0;t)$, this integral does not vanish, and we need an improved bound (see the discussion in (10.43)–(10.45)).

PROPOSITION 5.1 ([3])
There exists $M_0 > 0$ such that as k, t → ∞, we have the following asymptotic results for $N_{k-1}(t)$ and $\pi_k(0;t)$ in each different region of k and t.
(i) If 0 ≤ 2t ≤ ak with 0 < a < 1, then
\[ \bigl|N_{k-1}(t)^{-1} - 1\bigr|,\ \bigl|\pi_k(0;t)\bigr| \le Ce^{-ck} \tag{5.8} \]
for some constants C, c > 0.
(ii) If $ak \le 2t \le k - Mk^{1/3}$ with some $M > M_0$ and 0 < a < 1, then
\[ \bigl|N_{k-1}(t)^{-1} - 1\bigr|,\ \bigl|\pi_k(0;t)\bigr| \le \frac{C}{k^{1/3}}\,e^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}} \tag{5.9} \]
for some constant C > 0.
(iii) If $2t = k - (x/2^{1/3})k^{1/3}$ with −M ≤ x ≤ M for some constant M > 0, then
\[ \Bigl|N_{k-1}(t)^{-1} - 1 - \frac{2^{1/3}}{k^{1/3}}v(x)\Bigr|,\ \Bigl|\pi_k(0;t) + (-1)^k\frac{2^{1/3}}{k^{1/3}}u(x)\Bigr| \le \frac{C}{k^{2/3}} \tag{5.10} \]
for some constant C > 0, where u(x) and v(x) are defined in (2.1) and (2.5), respectively.
(iv) If $k + Mk^{1/3} \le 2t \le ak$ with some $M > M_0$ and a > 1, then
\[ \Bigl|\sqrt{\frac{2t}{k}}\,e^{k(2t/k - \log(2t/k) - 1)}N_{k-1}(t)^{-1} - 1\Bigr|,\ \Bigl|\sqrt{\frac{2t}{2t-k}}\,(-1)^k\pi_k(0;t) - 1\Bigr| \le \frac{C}{2t-k} \tag{5.11} \]
for some constant C > 0.
(v) If ak ≤ 2t ≤ bk with 1 < a < b, then
\[ \Bigl|\sqrt{\frac{2t}{k}}\,e^{k(2t/k - \log(2t/k) - 1)}N_{k-1}(t)^{-1} - 1\Bigr|,\ \Bigl|\sqrt{\frac{2t}{2t-k}}\,(-1)^k\pi_k(0;t) - 1\Bigr| \le \frac{C}{k} \tag{5.12} \]
for some constant C > 0.
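The quantities $N_k(t)$ and $\pi_k(0;t)$ can be computed directly for moderate k and t, which gives a feel for regime (i). The sketch below is an illustration under stated choices, not the method of [3]: it uses the fact that the moments of the weight ψ(z; t) dz/(2πiz) are the modified Bessel functions $I_j(2t)$, and then runs the standard Szegő recursion for monic orthogonal polynomials on the unit circle. The function names `bessel_i` and `opuc` are hypothetical.

```python
import math

def bessel_i(j, x, terms=80):
    """Modified Bessel function I_j(x), integer j >= 0, from its power series."""
    half = x / 2.0
    term = half ** j / math.factorial(j)          # m = 0 term
    total = 0.0
    for m in range(terms):
        total += term
        term *= half * half / ((m + 1) * (m + 1 + j))
    return total

def opuc(kmax, t):
    """Return (N, pi0) with N[k] = N_k(t) and pi0[k] = pi_k(0; t) for 0 <= k <= kmax."""
    c = [bessel_i(j, 2.0 * t) for j in range(kmax + 1)]   # moments c_j = I_j(2t)
    coeffs = [1.0]                                # coefficients of pi_k, lowest degree first
    N = [c[0]]
    pi0 = [1.0]
    for k in range(kmax):
        # Szego recursion: pi_{k+1}(z) = z pi_k(z) + rho pi_k^*(z),
        # with rho = pi_{k+1}(0) fixed by orthogonality to the constant 1.
        rho = -sum(a * c[i + 1] for i, a in enumerate(coeffs)) / N[k]
        shifted = [0.0] + coeffs                  # z * pi_k(z)
        starred = coeffs[::-1] + [0.0]            # pi_k^*(z) = z^k pi_k(1/z)
        coeffs = [s + rho * r for s, r in zip(shifted, starred)]
        N.append(N[k] * (1.0 - rho * rho))
        pi0.append(rho)
    return N, pi0

N, pi0 = opuc(20, 2.0)                            # 2t = 4 is well below k = 20
print(abs(1.0 / N[19] - 1.0), abs(pi0[20]))       # both should be tiny, as in (5.8)
```

In double precision this recursion is only reliable while k stays moderate; the transition regimes (ii)–(iv) are where the Riemann-Hilbert analysis is needed.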

Now we are interested in $\pi_k(z;t)$. If z is fixed and away from −1, then similar estimates for Y(z; k; t) can be obtained from the analysis of [3]. The result below when x ≥ 0 is (almost) direct from the work of [3]. For the case when x < 0, the analysis of [3] expresses the bound in terms of the so-called g-function, and we need further analysis for this g-function. When z = 0, this g-function becomes very simple: g(0) = πi (see (10.134)–(10.139)).

PROPOSITION 5.2
For $2t = k - x(k/2)^{1/3}$, x fixed, and for each fixed z ∈ ℂ \ Σ, we have
\[ \lim_{k\to\infty} e^{tz}\pi_k(z;t) = 0, \qquad \lim_{k\to\infty} e^{tz}\pi_k^*(z;t) = 1, \qquad |z| < 1, \tag{5.13} \]
\[ \lim_{k\to\infty} z^{-k}e^{tz^{-1}}\pi_k(z;t) = 1, \qquad \lim_{k\to\infty} z^{-k}e^{tz^{-1}}\pi_k^*(z;t) = 0, \qquad |z| > 1. \tag{5.14} \]

COROLLARY 5.3
For $2t = k - x(k/2)^{1/3}$, x fixed, we have for fixed α > 1,
\[ \lim_{k\to\infty} e^{-\alpha t}\pi_k(-\alpha;t) = 0, \qquad \lim_{k\to\infty} e^{-\alpha t}\pi_k^*(-\alpha;t) = 0. \tag{5.15} \]

Proof
Write
\[ e^{-\alpha t}\pi_k(-\alpha;t) = \alpha^k e^{t(-\alpha + \alpha^{-1})}\bigl[\alpha^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t)\bigr] = e^{kf(\alpha;2t/k)}\,\alpha^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t), \tag{5.16} \]
where
\[ f(\alpha;\gamma) = \frac{\gamma}{2}(-\alpha + \alpha^{-1}) + \log\alpha. \tag{5.17} \]
The function f(α; 1) is strictly decreasing for α > 0, and f(1; 1) = 0. Hence f(α; 1) < 0 for α > 1. Note that
\[ f(\alpha;\gamma) = f(\alpha;1) + \frac{\gamma - 1}{2}(-\alpha + \alpha^{-1}). \tag{5.18} \]
When x ≤ 0, 2t/k ≥ 1 and hence f(α; 2t/k) ≤ f(α; 1). On the other hand, when x > 0, since $2t/k - 1 = -x/(2^{1/3}k^{2/3})$, we have f(α; 2t/k) ≤ (1/2) f(α; 1) if $k > \bigl(2^{2/3}x(-\alpha + \alpha^{-1})/f(\alpha;1)\bigr)^{3/2}$. Therefore (5.14) implies that
\[ \bigl|e^{-\alpha t}\pi_k(-\alpha;t)\bigr| \le e^{(k/2)f(\alpha;1)}\bigl|(-\alpha)^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t)\bigr| \to 0, \tag{5.19} \]
as k → ∞. Similar calculations give the desired result for $\pi_k^*$.
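The monotonicity of f(α; 1) used in this proof follows from a one-line computation, spelled out here for convenience; it uses only the definition (5.17).

```latex
\[
\frac{\partial}{\partial\alpha} f(\alpha;1)
  = \frac12\bigl(-1 - \alpha^{-2}\bigr) + \alpha^{-1}
  = -\frac12\bigl(1 - \alpha^{-1}\bigr)^2 \le 0,
\]
% with equality only at \alpha = 1, so f(\alpha;1) is strictly decreasing on (0,\infty);
% since f(1;1) = \tfrac12(-1 + 1) + \log 1 = 0, it follows that f(\alpha;1) < 0 for \alpha > 1.
```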

When z → −1 (which is required for the proof of Theorem 3.2), the estimates for Y(z; k; t) cannot be directly read off from the result of [3]. However, with more detailed estimates, the same procedure as in [3] gives us the following results (see Section 10.1.3, (10.73)–(10.83), for the case when x ≥ 0, and see Section 10.2.3, (10.140)–(10.150), for the case when x < 0). Recall from Section 2 that m(· ; x) solves the RHP (2.15) for the PII equation.

PROPOSITION 5.4
Let $2t = k - x(k/2)^{1/3}$, where x is a fixed number. Set
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}}. \tag{5.20} \]
We have for w > 0 fixed,
\[ \lim_{k\to\infty} (-1)^k e^{-t\alpha}\pi_k(-\alpha;t) = -m_{12}(-iw;x), \tag{5.21} \]
\[ \lim_{k\to\infty} e^{-t\alpha}\pi_k^*(-\alpha;t) = m_{22}(-iw;x), \tag{5.22} \]
and for w < 0 fixed,
\[ \lim_{k\to\infty} (-\alpha)^{-k}e^{-t\alpha^{-1}}\pi_k(-\alpha;t) = m_{11}(-iw;x), \tag{5.23} \]
\[ \lim_{k\to\infty} \alpha^{-k}e^{-t\alpha^{-1}}\pi_k^*(-\alpha;t) = -m_{21}(-iw;x). \tag{5.24} \]

COROLLARY 5.5
For w < 0, under the same condition as Proposition 5.4, we have
\[ \lim_{k\to\infty} (-1)^{-k}e^{-t\alpha}\pi_k(-\alpha;t) = m_{11}(-iw;x)\,e^{(8/3)w^3 - 2xw}, \tag{5.25} \]
\[ \lim_{k\to\infty} e^{-t\alpha}\pi_k^*(-\alpha;t) = -m_{21}(-iw;x)\,e^{(8/3)w^3 - 2xw}. \tag{5.26} \]

Proof
Note that under the stated conditions we have
\[ e^{t(\alpha^{-1} - \alpha)}\alpha^k = e^{(8/3)w^3 - 2xw + O(k^{-1/3})}. \tag{5.27} \]
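The expansion (5.27) is easy to confirm numerically. The following sketch evaluates the logarithm of the left-hand side for a large k under the scalings $2t = k - x(k/2)^{1/3}$ and (5.20), and compares it with (8/3)w³ − 2xw. The function names are illustrative; `log1p` and the factored form of 1/α − α are used only to limit floating-point cancellation.

```python
import math

def exponent_5_27(k, x, w):
    """log of e^{t(1/alpha - alpha)} alpha^k with 2t = k - x (k/2)^{1/3}, alpha = 1 - 2^{4/3} w k^{-1/3}."""
    t = 0.5 * (k - x * (k / 2.0) ** (1.0 / 3.0))
    delta = 2.0 ** (4.0 / 3.0) * w * k ** (-1.0 / 3.0)    # alpha = 1 - delta
    alpha = 1.0 - delta
    one_over_minus = delta * (2.0 - delta) / alpha         # equals 1/alpha - alpha
    return t * one_over_minus + k * math.log1p(-delta)

def limit_5_27(x, w):
    """The claimed limit (8/3) w^3 - 2 x w."""
    return (8.0 / 3.0) * w ** 3 - 2.0 * x * w

for x, w in [(1.0, -0.5), (0.0, -1.0), (-2.0, -0.3)]:
    print(x, w, exponent_5_27(1e12, x, w), limit_5_27(x, w))
```

Expanding log α and 1/α − α in powers of $k^{-1/3}$ shows why this works: the terms of order $k^{2/3}$ and $k^{1/3}$ cancel, and the cubic term leaves (8/3)w³ while the x-dependent part of t contributes −2xw.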

Remark. As noted in Section 2, it follows from RHP (2.15) that $(m_{12})_+(0;x) = -(m_{11})_-(0;x)$ and $(m_{22})_+(0;x) = -(m_{21})_-(0;x)$, and hence by Corollary 5.5 the limits in (5.21) and (5.22) are in fact continuous across w = 0.

For convergence of moments, we need a uniform bound of $\pi_k(z;t)$ for |x| ≥ M for a fixed number M > 0. The results (5.29) and (5.30) are essentially in the analysis of [3], while (5.31)–(5.34) are new estimates. We again need to extend the method of [3] to obtain the results below. The proof is provided in Section 10 (see Sections 10.1.1 and 10.1.2 for the case when x ≥ M, and see Sections 10.2.1 and 10.2.2 for the case when x ≤ −M).

PROPOSITION 5.6
Define x through the relation
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}. \tag{5.28} \]
Then there exists $M_0$ such that the following holds for any fixed $M > M_0$. Let 0 < b < 1 and $0 < L < 2^{-3/2}\sqrt{M}$ be fixed. Then as k, t → ∞, we have for x ≥ M,
\[ \bigl|e^{tz}\pi_k(z;t)\bigr| \le Ce^{-c|x|^{3/2}}, \qquad |z| \le b, \tag{5.29} \]
\[ \bigl|e^{tz^{-1}}z^{-k}\pi_k(z;t) - 1\bigr| \le Ce^{-c|x|^{3/2}}, \qquad |z| \ge b^{-1}, \tag{5.30} \]
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le Ce^{c|x|}, \qquad \alpha = 1 - 2^{4/3}k^{-1/3}w,\ -L \le w \le L, \tag{5.31} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t) - 1\bigr| \le Ce^{-c|x|^{3/2}}, \qquad \alpha = 1 - 2^{4/3}k^{-1/3}w,\ -L \le w \le L, \tag{5.32} \]
and for x ≤ −M,
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le C, \qquad 0 < \alpha \le 1, \tag{5.33} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t)\bigr| \le C, \qquad \alpha \ge 1. \tag{5.34} \]


COROLLARY 5.7
Let $\alpha = 1 - 2^{4/3}wk^{-1/3}$, and let −L ≤ w ≤ L for fixed L > 0. Under the assumption of Proposition 5.6, for x ≤ −M, we have
\[ \bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| \le Ce^{c|x|}, \tag{5.35} \]
\[ \bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t)\bigr| \le Ce^{c|x|}, \tag{5.36} \]
for some positive constants C and c.

Proof
We have
\[ \bigl|e^{-t(\alpha - \alpha^{-1})}\alpha^{k}\bigr| = e^{2xw + (8/3)w^3 + O(k^{-1})}, \tag{5.37} \]
\[ \bigl|e^{t(\alpha - \alpha^{-1})}\alpha^{-k}\bigr| = e^{-2xw - (8/3)w^3 + O(k^{-1})}. \tag{5.38} \]
Proposition 5.6 shows that (5.35) is true for w ≥ 0. For w < 0, write
\[ e^{-t\alpha}\pi_k(-\alpha;t) = e^{-t(\alpha - \alpha^{-1})}(-\alpha)^k\bigl[e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t)\bigr]. \tag{5.39} \]
Now (5.35) follows from (5.34) and (5.37). The estimate (5.36) is proved similarly.

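The result below is new and is used for the asymptotics of $L_{n,[\sqrt{2n}\alpha]}$ when α > 1 (see Section 10.3 for the proof).

PROPOSITION 5.8
Let α > 1 be fixed. When
\[ \frac{t}{k} = \frac{\alpha}{\alpha^2 + 1} - \frac{\alpha(\alpha^2 - 1)^{1/2}}{(\alpha^2 + 1)^{3/2}}\cdot\frac{x}{\sqrt{k}}, \qquad x \text{ fixed}, \tag{5.40} \]
we have
\[ \lim_{k\to\infty} e^{-\alpha t}(-\alpha)^k\pi_k(-\alpha^{-1};t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-(1/2)y^2}\,dy. \tag{5.41} \]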

6. De-Poissonization lemmas
In this section, we describe a series of Tauberian-type de-Poissonization lemmas, which enable us to extract the asymptotics of the coefficient from the knowledge of the asymptotics of its generating function. Lemma 6.1 is due to Johansson [30], and Lemma 6.2 is taken from [3, Sec. 8]. Lemmas 6.3 and 6.4 are multi-index versions. Lemmas 6.1 and 6.3 are enough for both the convergence in distribution and the convergence of moments, but for convenience of computation we use Lemmas 6.2 and 6.4 for the convergence of moments in subsequent sections.


For a sequence $q = \{q_n\}_{n\ge 0}$, we define its Poisson generating function by

$$\phi(\lambda) = e^{-\lambda}\sum_{0\le n} q_n \frac{\lambda^n}{n!}. \tag{6.1}$$

LEMMA 6.1
For any fixed real number $d > 0$, set

$$\mu_n^{(d)} = n + (2\sqrt{d+1}+1)\sqrt{n\log n}, \tag{6.2}$$
$$\nu_n^{(d)} = n - (2\sqrt{d+1}+1)\sqrt{n\log n}. \tag{6.3}$$

Then there are constants $C$ and $n_0$ such that for any sequence $q = \{q_n\}_{n\ge 0}$ satisfying
(i) $q_n \ge q_{n+1}$,
(ii) $0 \le q_n \le 1$,
for all $n \ge 0$, we have

$$\phi(\mu_n^{(d)}) - Cn^{-d} \le q_n \le \phi(\nu_n^{(d)}) + Cn^{-d} \tag{6.4}$$

for all $n \ge n_0$.

LEMMA 6.2
For any fixed real number $d > 0$, there exist constants $C$ and $n_0$ such that for any sequence $q = \{q_n\}_{n\ge 0}$ satisfying Lemma 6.1(i) and (ii),

$$q_n \le C\phi(n - d\sqrt{n}), \tag{6.5}$$
$$1 - q_n \le C\bigl(1 - \phi(n + d\sqrt{n})\bigr), \tag{6.6}$$

for all $n \ge n_0$.

For multi-indexed sequences there are similar results. For $q = \{q_{n_1,n_2}\}_{n_1,n_2\ge 0}$, define

$$\phi(\lambda_1,\lambda_2) = e^{-\lambda_1-\lambda_2}\sum_{n_1,n_2\ge 0} q_{n_1 n_2}\frac{\lambda_1^{n_1}\lambda_2^{n_2}}{n_1!\,n_2!}. \tag{6.7}$$
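The sandwich (6.4) of Lemma 6.1 can be observed numerically. The sketch below is only an illustration under stated assumptions: the step sequence `q` with cutoff `N0` is a hypothetical test sequence (it is decreasing and takes values in [0, 1]), the constant C is taken to be 1, and negative values of ν are clipped to 0, which only matters for n below the threshold n_0 of the lemma.

```python
import math

def poisson_gf(q, lam):
    """phi(lam) = e^{-lam} * sum_n q[n] lam^n / n!, summed over the entries of q."""
    if lam <= 0.0:
        return float(q[0])  # only the n = 0 term survives at lam = 0
    log_lam = math.log(lam)
    total = 0.0
    for n, qn in enumerate(q):
        if qn:
            # Poisson weight evaluated in log space to avoid overflow.
            total += qn * math.exp(-lam + n * log_lam - math.lgamma(n + 1))
    return total

def mu_nu(n, d):
    """Shifted indices mu_n^{(d)} and nu_n^{(d)} from (6.2) and (6.3)."""
    shift = (2.0 * math.sqrt(d + 1.0) + 1.0) * math.sqrt(n * math.log(n))
    return n + shift, n - shift

def sandwich(q, n, d):
    """Return (phi(mu_n), q_n, phi(nu_n)); nu_n is clipped at 0 (an assumption)."""
    mu, nu = mu_nu(n, d)
    return poisson_gf(q, mu), q[n], poisson_gf(q, max(nu, 0.0))

# Hypothetical test sequence: q_n = 1 for n <= N0 and q_n = 0 afterwards.
N0 = 200
q = [1.0 if n <= N0 else 0.0 for n in range(2000)]

if __name__ == "__main__":
    for n in (100, 200, 201, 400):
        print(n, sandwich(q, n, d=1.0))
```

For this step sequence φ(λ) is just the probability that a Poisson variable of mean λ is at most `N0`, so the three printed values show how the shifts in (6.2) and (6.3) move the Poisson mean far enough from n for the inequalities to hold.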

From the above two lemmas, we easily obtain the following lemmas.

LEMMA 6.3
For any real number $d > 0$, define $\mu_n^{(d)}$ and $\nu_n^{(d)}$ as in Lemma 6.1. Then there exist constants $C$ and $n_0$ such that for any $q = \{q_{n_1,n_2}\}_{n_1,n_2\ge 0}$ satisfying
(i) $q_{n_1,n_2} \ge q_{n_1+1,n_2}$, $q_{n_1,n_2} \ge q_{n_1,n_2+1}$,
(ii) $0 \le q_{n_1,n_2} \le 1$,
for all $n_1, n_2 \ge 0$, we have

$$\phi(\mu_{n_1}^{(d)},\mu_{n_2}^{(d)}) - C(n_1^{-d}+n_2^{-d}) \le q_{n_1 n_2} \le \phi(\nu_{n_1}^{(d)},\nu_{n_2}^{(d)}) + C(n_1^{-d}+n_2^{-d}) \tag{6.8}$$

for all $n_1, n_2 \ge n_0$.


Similarly, we have the following lemma.

LEMMA 6.4
For any fixed real number $d > 0$, there exist constants $C$ and $n_0$ such that for any $q = \{q_{n_1,n_2}\}_{n_1,n_2\ge 0}$ satisfying the two conditions in Lemma 6.3,

$$q_{n_1 n_2} \le C\phi(n_1 - d\sqrt{n_1},\, n_2 - d\sqrt{n_2}), \tag{6.9}$$
$$1 - q_{n_1 n_2} \le C\bigl(1 - \phi(n_1 - d\sqrt{n_1},\, n_2 + d\sqrt{n_2})\bigr), \tag{6.10}$$

for $n_1, n_2 \ge n_0$.

Remark. Similar lemmas hold true for sequences of arbitrarily many indices.

7. Proofs of Theorems 3.1, 3.2, 3.3, and 3.5

The following results follow from Proposition 5.1. Result (7.1) is derived in [3, Lem. 7.1(iii)], and the other cases are similar. We omit the details.

COROLLARY 7.1
Let $M > M_0$, where $M_0$ is given in Proposition 5.6. Then there exist positive constants $C$ and $c$ which are independent of $M$, and a positive constant $C(M)$ which may depend on $M$, such that the following results hold for large $l$.

(i) (See [3].) Define $x$ by $2t = l - x(l/2)^{1/3}$. For $-M < x < M$,

$$\Bigl|\sum_{j\ge l}\log N_j(t)^{-1} - 2\log F(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.1}$$

(ii) Define $x$ by $t = l - (x/2)l^{1/3}$. For $-M < x < M$,

$$\Bigl|\sum_{j\ge l}\log N_{2j+2}(t)^{-1} - \log F(x)\Bigr|,\ \Bigl|\sum_{j\ge l}\log N_{2j+1}(t)^{-1} - \log F(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.2}$$
$$\Bigl|\sum_{j\ge l}\log(1-\pi_{2j+2}(0;t)) - \log E(x)\Bigr|,\ \Bigl|\sum_{j\ge l}\log(1+\pi_{2j+1}(0;t)) - \log E(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.3}$$
$$\Bigl|\sum_{j\ge l}\log(1+\pi_{2j+2}(0;t)) + \log E(x)\Bigr|,\ \Bigl|\sum_{j\ge l}\log(1-\pi_{2j+1}(0;t)) + \log E(x)\Bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.4}$$

These results yield the asymptotics of the determinants in Theorem 4.1.

COROLLARY 7.2
There exists $M_1$ such that for $M > M_1$, there exist positive constants $C$ and $c$ which are independent of $M$, and a positive constant $C(M)$ which may depend on $M$, such that the following results hold for large $l$.

(i) Define $x$ by $2t = l - x(l/2)^{1/3}$. For $-M < x < M$,

$$\bigl|e^{-t^2}D_l(t) - F(x)^2\bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.5}$$

(ii) Define $x$ by $t = l - (x/2)l^{1/3}$. For $-M < x < M$,

$$\bigl|e^{-t^2/2}D_l^{--}(t) - F(x)E(x)^{-1}\bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.6}$$
$$\bigl|e^{-t^2/2}D_{l-1}^{++}(t) - F(x)E(x)\bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.7}$$
$$\bigl|e^{-t^2/2+t}D_l^{+-}(t) - F(x)E(x)^{-1}\bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}, \tag{7.8}$$
$$\bigl|e^{-t^2/2-t}D_l^{-+}(t) - F(x)E(x)\bigr| \le \frac{C(M)}{l^{1/3}} + Ce^{-cM^{3/2}}. \tag{7.9}$$

Proof
For $C$ and $c$ in Corollary 7.1, take $M_1 > M_0$ such that $Ce^{-cM_1^{3/2}} \le 1/2$. Once we fix $M > M_1$, then for $l$ large, $C(M)/l^{1/3} + Ce^{-cM^{3/2}} < 1$, and hence by (7.1), $\bigl|\sum_{j\ge l}\log N_j(t)^{-1} - 2\log F(x)\bigr| \le 1$. Using $|e^x - 1| \le (e-1)|x|$ for $|x| \le 1$,

$$\bigl|e^{-t^2}D_l(t) - F(x)^2\bigr| = F(x)^2\Bigl|e^{\sum_{j\ge l}\log N_j(t)^{-1} - 2\log F(x)} - 1\Bigr| \le (e-1)F(x)^2\Bigl|\sum_{j\ge l}\log N_j(t)^{-1} - 2\log F(x)\Bigr|. \tag{7.10}$$

But from (2.11) and (2.13), F(x) is bounded for x ∈ R. Hence, using (7.1), we obtain the result for (i) with new constants C, c, and C(M). For (ii), we note that F(x)E(x) and F(x)E(x)−1 are bounded for x ∈ R from (2.11)–(2.14). From Proposition 5.2, Corollary 5.3, and Theorems 4.1 and 4.2, Corollary 7.2 immediately yields the following asymptotics for Poisson generating functions.


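PROPOSITION 7.3
Let $2t = l - x(l/2)^{1/3}$, where $x$ is fixed. As $l \to \infty$, for each fixed $\alpha$, $\beta$,

$$P_l(t;\alpha) \to F_4(x), \qquad 0 \le \alpha < 1, \tag{7.11}$$
$$P_l(t;1) \to F_1(x), \tag{7.12}$$
$$P_l(t;\alpha) \to 0, \qquad \alpha > 1, \tag{7.13}$$
$$P_l(t;\beta) \to F_1(x), \qquad \beta \ge 0. \tag{7.14}$$

Let $4t = l - x(2l)^{1/3}$, where $x$ is fixed. As $l \to \infty$, for each fixed $\alpha$, $\beta$,

$$P_l(t;\alpha,\beta) \to F_2(x), \qquad 0 \le \alpha < 1,\ \beta \ge 0, \tag{7.15}$$
$$P_l(t;1,\beta) \to F_1(x)^2, \qquad \beta \ge 0, \tag{7.16}$$
$$P_l(t;\alpha,\beta) \to 0, \qquad \alpha > 1,\ \beta \ge 0. \tag{7.17}$$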

Similarly, using Proposition 5.4, Corollary 5.5, and Theorem 4.1, we have the following theorem.

THEOREM 7.4
Let $2t = l - x(l/2)^{1/3}$, where $x$ is fixed. As $l \to \infty$, we have for any fixed $w \in \mathbb{R}$,

$$P_l(t;\alpha) \to F(x;w), \qquad \alpha = 1 - \frac{2^{4/3}w}{l^{1/3}}. \tag{7.18}$$

Let $4t = l - x(2l)^{1/3}$, where $x$ is fixed. As $l \to \infty$, we have for each fixed $\beta$ and $w \in \mathbb{R}$,

$$P_l(t;\alpha,\beta) \to F(x;w), \qquad \alpha = 1 - \frac{2^{5/3}w}{l^{1/3}},\ \beta \ge 0. \tag{7.19}$$

Recall the relation between $Q_l^{\sim}(\lambda)$ and $P_l^{\sim}(t)$ in (4.7)–(4.9). We now use the de-Poissonization Lemma 6.3 to obtain the asymptotic results of Theorems 3.1 and 3.2. In order to apply the de-Poissonization lemma, we need the following monotonicity results.

LEMMA 7.5 (Monotonicity)
For any $l$, $\Pr(L_{k,m} \le l)$, $\Pr(L_{k,m} \le l)$, and $\Pr(L_{k,m_+,m_-} \le l)$ are monotone decreasing in $k$, $m$, $m_+$, and $m_-$.
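The monotonicity in m can be sanity-checked by brute force for small cases. The following sketch is an illustration only: it assumes the statistic is the length of the longest strictly increasing subsequence of an involution with k 2-cycles and m fixed points, and the helper names are hypothetical. It also confirms the class size (2k + m)!/(2^k k! m!).

```python
from bisect import bisect_left
from fractions import Fraction
from math import factorial

def involutions(k, m):
    """Yield every involution of {0,...,2k+m-1} with k 2-cycles and m fixed points."""
    n = 2 * k + m

    def build(perm, free, k_left, m_left):
        if not free:
            yield tuple(perm)
            return
        i, rest = free[0], free[1:]
        if m_left > 0:                       # make i a fixed point
            perm[i] = i
            yield from build(perm, rest, k_left, m_left - 1)
        if k_left > 0:                       # pair i with a larger free index j
            for idx, j in enumerate(rest):
                perm[i], perm[j] = j, i
                yield from build(perm, rest[:idx] + rest[idx + 1:], k_left - 1, m_left)

    yield from build([None] * n, list(range(n)), k, m)

def lis_length(perm):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in perm:
        pos = bisect_left(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)

def prob_lis_at_most(k, m, l):
    """Exact Pr(L <= l) under the uniform measure on the class with parameters (k, m)."""
    perms = list(involutions(k, m))
    good = sum(1 for p in perms if lis_length(p) <= l)
    return Fraction(good, len(perms))

if __name__ == "__main__":
    for m in range(4):
        print(m, [prob_lis_at_most(2, m, l) for l in range(1, 5)])
```

Each printed row should be entrywise no larger than the row above it, which is the statement of the lemma in the variable m for k = 2.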

Proof
We first consider $\Pr(L_{k,m} \le l)$. Let $f_{km} := \Pr(L_{k,m} \le l)\cdot|S_{k,m}|$ be the number of elements in $S_{k,m}$ with no increasing subsequence of length greater than $l$. Consider the map $h : S_{k,m-1} \times \{1, 2, \ldots, 2k+m\} \to S_{k,m}$ defined as follows: for $(\pi, j) \in S_{k,m-1} \times \{1, 2, \ldots, 2k+m\}$, set $h(\pi,j)(x) = \pi(x)$ for $1 \le x < j-1$, $h(\pi,j)(j) = j$, and set $h(\pi,j)(x) = \pi(x-1)$ for $j < x \le 2k+m$. Then it is easy to see that $h^{-1}(\sigma)$ consists of $m$ elements and hence that $(2k+m)|S_{k,m-1}| = m|S_{k,m}|$. Moreover, if $\pi \in S_{k,m-1}$ has an increasing subsequence of length greater than $l$, then $h(\pi,j)$ has an increasing subsequence of length greater than $l$. Thus $(2k+m)f_{k(m-1)} \ge m f_{km}$. But since $|S_{k,m}| = (2k+m)!/(2^k k!\,m!)$, we obtain $\Pr(L_{k,m-1} \le l) \ge \Pr(L_{k,m} \le l)$. A similar argument works for the other cases. Note that $|S_{k,m_+,m_-}| = (2k+m_++m_-)!/(k!\,m_+!\,m_-!)$.

Thus Lemma 6.3 can be applied to obtain the asymptotic results in Theorems 3.1 and 3.2. The proofs are similar to that in [3, Sec. 9].

Now we consider convergence of moments. For this we first obtain the following estimates that follow from Proposition 5.1(i), (ii), (iv), and (v). The proof is very similar to that of [3, Lem. 7.1(i), (ii), (iv), (v)]. Compare the results with (2.11)–(2.14), noting Corollary 7.1.

COROLLARY 7.6
Set

$$t = l - \frac{x}{2}\,l^{1/3}. \tag{7.20}$$

There exists $M_2$ such that for a fixed $M > M_2$, there are positive constants $C = C(M)$ and $c = c(M)$ such that the following results hold.

(i) For $x \ge M$,

$$1 - \prod_{j\ge l} N_{2j+2}(t)^{-1},\ \ 1 - \prod_{j\ge l} N_{2j+1}(t)^{-1} \le Ce^{-c|x|^{3/2}}, \tag{7.21}$$
$$1 - \prod_{j\ge l}(1-\pi_{2j+2}(0;t)),\ \ 1 - \prod_{j\ge l}(1+\pi_{2j+1}(0;t)) \le Ce^{-c|x|^{3/2}}, \tag{7.22}$$
$$1 - \prod_{j\ge l}(1+\pi_{2j+2}(0;t)),\ \ 1 - \prod_{j\ge l}(1-\pi_{2j+1}(0;t)) \le Ce^{-c|x|^{3/2}}. \tag{7.23}$$

(ii) For $x \le -M$,

$$\prod_{j\ge l} N_{2j+2}(t)^{-1},\ \ \prod_{j\ge l} N_{2j+1}(t)^{-1} \le Ce^{-c|x|^{3}}, \tag{7.24}$$
$$\prod_{j\ge l}(1-\pi_{2j+2}(0;t)),\ \ \prod_{j\ge l}(1+\pi_{2j+1}(0;t)) \le Ce^{-c|x|^{3/2}}, \tag{7.25}$$
$$\prod_{j\ge l}(1+\pi_{2j+2}(0;t)),\ \ \prod_{j\ge l}(1-\pi_{2j+1}(0;t)) \le Ce^{+c|x|^{3/2}}. \tag{7.26}$$


Remark. From the definitions of $P_l^{\sim}(t)$ and the equalities of Theorem 4.1, we know that all the infinite products above are between 0 and 1.

Now as in [3, Sec. 9], using Lemma 6.4 and Theorems 3.1 and 3.2, this implies Theorem 3.3. Theorem 3.5 follows from Theorem 4.3.

8. Proofs of Theorems 3.4 and 3.6

In this section, we prove Theorem 3.4 by summing up the asymptotic results of Theorems 3.1, 3.2, and 3.3. Theorem 3.6 can be proved in a similar way from Theorem 4.3.

Proof of (3.20)
Note that we have a disjoint union

$$\tilde S_n = \bigcup_{2k+m=n} S_{k,m}. \tag{8.1}$$

Set $p_{kml} = \Pr(L_{km} \le l)$, the probability that the length of the longest decreasing subsequence of $\pi \in S_{k,m}$ is less than or equal to $l$. As the first row and the first column of $\pi$ in $\tilde S_n$ have the same statistics, we have

$$\Pr(\tilde L_n \le l) = \frac{1}{|\tilde S_n|}\sum_{2k+m=n} p_{kml}|S_{k,m}|. \tag{8.2}$$

Note that

$$|S_{k,m}| = \binom{2k+m}{2k}\frac{(2k)!}{2^k k!}. \tag{8.3}$$

As $n \to \infty$ (see [33, pp. 66–67]), we have

$$|\tilde S_n| = \sum_{2k+m=n}|S_{k,m}| = \frac{1}{\sqrt{2}}\,n^{n/2}e^{-n/2+\sqrt{n}-1/4}\Bigl(1 + \frac{7}{24}n^{-1/2} + O(n^{-3/4})\Bigr), \tag{8.4}$$

and the main contribution to the sum comes from $\sqrt{n} - n^{\epsilon+1/4} \le m \le \sqrt{n} + n^{\epsilon+1/4}$. Fix $0 < a < 1 < b$. We split the sum in (8.2) into two pieces:

$$\Pr(\tilde L_n \le l) = \frac{1}{|\tilde S_n|}\Bigl[\sum_{(*)} p_{kml}|S_{k,m}| + \sum_{(**)} p_{kml}|S_{k,m}|\Bigr], \tag{8.5}$$

where $(*)$ is the region $a\sqrt{n} \le m \le b\sqrt{n}$ and where $(**)$ is the rest. For $2k+m = n$, the quantity $|S_{k,m}| = \binom{n}{2k}(2k)!/(2^k k!)$ is unimodal for $0 \le k \le n$, and the maximum is achieved when $k \sim (n-\sqrt{n})/2$ as $n \to \infty$. Hence

$$\sum_{(**)} p_{kml}|S_{k,m}| \le n\cdot\max\bigl(|S_{k,[a\sqrt{n}]}|,\ |S_{k,[b\sqrt{n}]}|\bigr). \tag{8.6}$$
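The count (8.3) and the asymptotic formula (8.4) can be compared directly for moderate n. The sketch below is a numerical illustration, not part of the argument: it counts involutions with the standard recurrence I(n) = I(n-1) + (n-1) I(n-2), sums the class sizes (8.3) over 2k + m = n, and evaluates the logarithm of the leading term of (8.4). The function names are hypothetical.

```python
import math

def count_involutions(n):
    """Number of involutions of n letters via I(n) = I(n-1) + (n-1) I(n-2)."""
    a, b = 1, 1  # I(0), I(1)
    for j in range(2, n + 1):
        a, b = b, b + (j - 1) * a
    return b if n >= 1 else a

def class_size(k, m):
    """|S_{k,m}| from (8.3): binom(2k+m, 2k) (2k)! / (2^k k!)."""
    n = 2 * k + m
    return math.comb(n, 2 * k) * math.factorial(2 * k) // (2 ** k * math.factorial(k))

def log_leading_term(n):
    """log of n^{n/2} e^{-n/2 + sqrt(n) - 1/4} / sqrt(2), the leading term of (8.4)."""
    return 0.5 * n * math.log(n) - 0.5 * n + math.sqrt(n) - 0.25 - 0.5 * math.log(2.0)

def log_ratio(n):
    """log(exact / leading term); (8.4) predicts roughly log(1 + 7/(24 sqrt(n)))."""
    return math.log(count_involutions(n)) - log_leading_term(n)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, log_ratio(n), math.log1p(7.0 / (24.0 * math.sqrt(n))))
```

The two printed columns should approach each other as n grows, reflecting the correction term 7/(24 sqrt(n)) in (8.4).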


Using Stirling's formula for (8.3), for any fixed $c$, when $2k + [c\sqrt{n}] = n$, we have

$$|S_{k,[c\sqrt{n}]}| \sim \frac{e^{-1/2+c^2/4}}{\sqrt{\pi c}\,n^{1/4}}\,n^{n/2}e^{-n/2+\sqrt{n}(c-c\log c)}. \tag{8.7}$$

Hence, using (8.4), we have

$$\frac{1}{|\tilde S_n|}\sum_{(**)} p_{kml}|S_{k,m}| \le Cn^{3/4}\cdot\max\bigl(e^{\sqrt{n}(a-1-a\log a)},\ e^{\sqrt{n}(b-1-b\log b)}\bigr). \tag{8.8}$$

But $f(x) = x - 1 - x\log x$ is increasing in $0 < x < 1$, is decreasing in $x > 1$, and $f(1) = 0$. Therefore there are positive constants $C$ and $c$ such that for large $n$,

$$\frac{1}{|\tilde S_n|}\sum_{(**)} p_{kml}|S_{k,m}| \le Ce^{-c\sqrt{n}}. \tag{8.9}$$

On the other hand, Lemma 6.3 says that (recall (4.8)) for any fixed real number $d > 0$, there is a constant $C$ such that for $a\sqrt{n} \le m \le b\sqrt{n}$,

$$P_l\bigl((2\mu_k^{(d)})^{1/2};\ \mu_m^{(d)}(2\mu_k^{(d)})^{-1/2}\bigr) - Cn^{-d/2} \le p_{kml} \le P_l\bigl((2\nu_k^{(d)})^{1/2};\ \nu_m^{(d)}(2\nu_k^{(d)})^{-1/2}\bigr) + Cn^{-d/2} \tag{8.10}$$

for sufficiently large $n$. Since $P_l(t;\beta) \le P_{l+1}(t;\beta)$, Theorem 4.1 for $P_{2l+1}(t;\beta)$ yields

$$e^{-\mu_k^{(d)}}D^{++}_{[(l-1)/2]}\bigl((2\mu_k^{(d)})^{1/2}\bigr) - Cn^{-d/2} \le p_{kml} \le e^{-\nu_k^{(d)}}D^{++}_{[l/2]}\bigl((2\nu_k^{(d)})^{1/2}\bigr) + Cn^{-d/2}. \tag{8.11}$$

Let $l = [2\sqrt{n} + xn^{1/6}]$. For $a\sqrt{n} \le m \le b\sqrt{n}$ and hence for $(n - b\sqrt{n})/2 \le k \le (n - a\sqrt{n})/2$,

$$\bigl(l/2 - (2\mu_k^{(d)})^{1/2}\bigr)2(l/2)^{-1/3},\ \ \bigl(l/2 - (2\nu_k^{(d)})^{1/2}\bigr)2(l/2)^{-1/3} = x + O\bigl(n^{-1/6}\sqrt{\log n}\bigr). \tag{8.12}$$

Also, note that from the asymptotics (2.3), (2.4), and (2.11)–(2.14),

$$(F(x)E(x))' = -\frac{1}{2}\bigl(v(x)+u(x)\bigr)F(x)E(x) \tag{8.13}$$

is bounded for $x \in \mathbb{R}$. Hence, using (7.7) in Corollary 7.2, (8.12), and (8.13), we obtain

$$\begin{aligned}
\bigl|e^{-\nu_k^{(d)}}D^{++}_{[l/2]}\bigl((2\nu_k^{(d)})^{1/2}\bigr) - (FE)(x)\bigr|
&\le \bigl|e^{-\nu_k^{(d)}}D^{++}_{[l/2]}\bigl((2\nu_k^{(d)})^{1/2}\bigr) - (FE)\bigl((l/2 - (2\nu_k^{(d)})^{1/2})2(l/2)^{-1/3}\bigr)\bigr| \\
&\quad + \bigl|(FE)\bigl((l/2 - (2\nu_k^{(d)})^{1/2})2(l/2)^{-1/3}\bigr) - (FE)(x)\bigr| \\
&\le C(M)n^{-1/6} + Ce^{-cM^{3/2}} + Cn^{-1/6}\sqrt{\log n}.
\end{aligned} \tag{8.14}$$


Therefore we have

$$\sum_{(*)} p_{kml}|S_{k,m}| \le \Bigl(F(x)E(x) + C(M)n^{-1/6} + Ce^{-cM^{3/2}} + Cn^{-1/6}\sqrt{\log n}\Bigr)\sum_{(*)}|S_{k,m}|. \tag{8.15}$$

Similarly,

$$\sum_{(*)} p_{kml}|S_{k,m}| \ge \Bigl(F(x)E(x) - C(M)n^{-1/6} - Ce^{-cM^{3/2}} - Cn^{-1/6}\sqrt{\log n}\Bigr)\sum_{(*)}|S_{k,m}|. \tag{8.16}$$

But from (8.9),

$$\frac{1}{|\tilde S_n|}\sum_{(*)}|S_{k,m}| = 1 - \frac{1}{|\tilde S_n|}\sum_{(**)}|S_{k,m}| = 1 + O(e^{-c\sqrt{n}}). \tag{8.17}$$

Thus, using (8.5), (8.9), (8.15), (8.16), and (8.17), we obtain (3.20).

Proof of (3.22)
As in [3, Sec. 9], integrating by parts, we have

$$E\bigl((\tilde\chi_n)^p\bigr) = \int_{-\infty}^{\infty} x^p\,dF_n(x) = -\int_{-\infty}^{0} px^{p-1}F_n(x)\,dx + \int_{0}^{\infty} px^{p-1}\bigl(1 - F_n(x)\bigr)\,dx, \tag{8.18}$$

where $F_n(x) := \Pr(\tilde\chi_n \le x) = \Pr(\tilde L_n \le 2\sqrt{n} + xn^{1/6})$. From Theorem 4.1 and Corollary 7.6, we have

$$1 - e^{-t^2/2}D_l^{++}(t) \le Ce^{-c|x|^{3/2}}, \qquad x \ge M, \tag{8.19}$$
$$e^{-t^2/2}D_l^{++}(t) \le Ce^{-c|x|^{3}}, \qquad x \le -M, \tag{8.20}$$

for a fixed $M > M_2$ where $t = l - (x/2)l^{1/3}$. Noting that $P_{2l+1}(t;\beta) = e^{-t^2/2}D_l^{++}(t)$ for all $\beta \ge 0$, from (8.2), Lemma 6.4, (8.19), and (8.20), we obtain

$$1 - F_n(x) \le Ce^{-c|x|^{3/2}}, \qquad x \ge M, \tag{8.21}$$
$$F_n(x) \le Ce^{-c|x|^{3}}, \qquad x \le -M. \tag{8.22}$$

Now using convergence in distribution, the dominated convergence theorem gives (3.22).


Remark. We could also proceed using

$$\Pr(\tilde L_n \le l) = \frac{1}{|\tilde S_n|}\sum_{2k+m=n} p_{kml}|S_{k,m}|. \tag{8.23}$$

The main contribution to the sum from $|S_{k,m}|$ comes from the region $|m - \sqrt{n}| \le n^{1/4+\epsilon}$. On the other hand, from Theorem 3.2, when $m = \sqrt{n} - 2wn^{1/3}$, the quantity $p_{kml}$ converges to $F(x;w)$. Since the region $m = \sqrt{n} + cn^{1/4+\epsilon}$ is much narrower than the region $m = \sqrt{n} + cn^{1/3}$, the main contribution to the sum comes from the case when $w = 0$, implying that

$$\Pr(\tilde L_n \le l) \sim \frac{1}{|\tilde S_n|}\sum_{m=0}^{n} F(x;0)|S_{n,m}| = F_1(x). \tag{8.24}$$

In the following proof for signed involutions, we make this argument rigorous.

Proof of (3.21)
We have a disjoint union

$$\tilde S_n = \bigcup_{2k+m_++m_-=n} S_{k,m_+,m_-}. \tag{8.25}$$

Hence again

$$\Pr(\tilde L_n \le l) = \frac{1}{|\tilde S_n|}\sum_{2k+m_++m_-=n} p_{km_+m_-l}\,|S_{k,m_+,m_-}|. \tag{8.26}$$

One can check that

$$|S_{k,m_+,m_-}| = \frac{(2k+m_++m_-)!}{k!\,m_+!\,m_-!}. \tag{8.27}$$

Hence we have

$$|\tilde S_n| = \sum_{2k+m_++m_-=n}|S_{k,m_+,m_-}| = \sum_{0\le k\le[n/2]}\ \sum_{0\le m_+\le n-2k} f(m_+,k), \tag{8.28}$$

where

$$f(m_+,k) := \frac{n!}{m_+!\,(n-m_+-2k)!\,k!}. \tag{8.29}$$

For fixed $0 \le k \le [n/2]$, $f(m_+,k)$ is unimodal in $m_+$ and achieves its maximum when $m_+ \sim n/2 - k$. Also, $f(n/2-k,k)$ is unimodal in $k$, and the maximum is attained when $k \sim n/2 - \sqrt{n/2}$. Hence $f(m_+,k)$ has its maximum when $(m_+,k) \sim (\sqrt{n/2},\ n/2 - \sqrt{n/2})$. Consider the disk $D$ of radius $n^{1/4+\epsilon}$ centered at $(\sqrt{n/2},\ n/2 - \sqrt{n/2})$. We show that the main contribution to the sum in (8.28) comes from $D$. Set

$$m_+ = \sqrt{\frac{n}{2}} + x, \qquad k = \frac{n}{2} - \sqrt{\frac{n}{2}} + y, \qquad |x|, |y| \le n^{1/4+\epsilon}. \tag{8.30}$$


By Stirling's formula,

$$f(m_+,k) = \frac{1}{\sqrt{en}\,\pi}(2n)^{n/2}e^{-n/2+\sqrt{2n}}\,e^{-(x^2+(x+2y)^2)/\sqrt{2n}}\bigl(1 + O(n^{-1/4+3\epsilon/2})\bigr). \tag{8.31}$$

Hence from the unimodality discussed above,

$$\sum_{(m_+,k)\notin D} f(m_+,k) \le n^2\max_{(m_+,k)\in\partial D} f(m_+,k) \le \frac{n^2}{\sqrt{en}\,\pi}(2n)^{n/2}e^{-n/2+\sqrt{2n}}\,e^{-5\sqrt{2}\,n^{2\epsilon}}, \tag{8.32}$$

and by summing up using (8.31),

$$\sum_{(m_+,k)\in D} f(m_+,k) = \frac{1}{\sqrt{2e}}(2n)^{n/2}e^{-n/2+\sqrt{2n}}\bigl(1 + O(n^{-1/4+3\epsilon/2})\bigr). \tag{8.33}$$

Hence we have

$$|\tilde S_n| = \frac{1}{\sqrt{2e}}(2n)^{n/2}e^{-n/2+\sqrt{2n}}\bigl(1 + O(n^{-1/4+3\epsilon/2})\bigr), \tag{8.34}$$

and the main contribution to the sum in (8.28) comes from $D$. As in Theorem 3.4, we write

$$\Pr(\tilde L_n \le l) = \frac{1}{|\tilde S_n|}\Bigl[\sum_{D} p_{km_+m_-l}\,|S_{k,m_+,m_-}| + \sum_{D^c} p_{km_+m_-l}\,|S_{k,m_+,m_-}|\Bigr]. \tag{8.35}$$

From (8.32) and (8.34),

$$\frac{1}{|\tilde S_n|}\sum_{D^c} p_{km_+m_-l}\,|S_{k,m_+,m_-}| \le e^{-10n^{2\epsilon}}. \tag{8.36}$$
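The leading term in (8.34) and the location of the maximum of f(m_+, k) can be checked numerically. This sketch is an illustration under stated assumptions: the total count is computed with the recurrence a(n) = 2a(n-1) + 2(n-1)a(n-2), which is an assumption derived here from the exponential generating function exp(2x + x^2) of the double sum (8.28) and is cross-checked against that sum in the tests; all function names are hypothetical.

```python
import math

def f(m_plus, k, n):
    """Summand (8.29): n! / (m_plus! (n - m_plus - 2k)! k!)."""
    m_minus = n - m_plus - 2 * k
    return math.factorial(n) // (
        math.factorial(m_plus) * math.factorial(m_minus) * math.factorial(k)
    )

def count_by_double_sum(n):
    """Right-hand side of (8.28), feasible for small and moderate n."""
    return sum(f(mp, k, n) for k in range(n // 2 + 1) for mp in range(n - 2 * k + 1))

def count_by_recurrence(n):
    """Same count via a(n) = 2 a(n-1) + 2 (n-1) a(n-2), a(0) = 1, a(1) = 2."""
    a, b = 1, 2
    for j in range(2, n + 1):
        a, b = b, 2 * b + 2 * (j - 1) * a
    return b if n >= 1 else a

def log_leading_term(n):
    """log of (2n)^{n/2} e^{-n/2 + sqrt(2n)} / sqrt(2e), the leading term of (8.34)."""
    return (0.5 * n * math.log(2.0 * n) - 0.5 * n + math.sqrt(2.0 * n)
            - 0.5 * (math.log(2.0) + 1.0))

def argmax_f(n):
    """Pair (m_plus, k) maximizing f, to compare with (sqrt(n/2), n/2 - sqrt(n/2))."""
    pairs = ((mp, k) for k in range(n // 2 + 1) for mp in range(n - 2 * k + 1))
    return max(pairs, key=lambda p: f(p[0], p[1], n))

if __name__ == "__main__":
    n = 200
    print(argmax_f(n), (math.sqrt(n / 2), n / 2 - math.sqrt(n / 2)))
    print(math.log(count_by_recurrence(2000)) - log_leading_term(2000))
```

The first line compares the discrete maximizer with the center of the disk D, and the second prints the logarithm of the ratio between the exact count and the leading term, which should be small for large n.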

On the other hand, by the remark to Lemma 6.3 and Theorem 4.1 (recall (4.9)),

$$p_{km_+m_-l} \le e^{-\nu_{m_+}^{(d)}-\nu_k^{(d)}}\,\pi^*_{[l/2]}\Bigl(-\frac{\nu_{m_+}^{(d)}}{(\nu_k^{(d)})^{1/2}};\ (\nu_k^{(d)})^{1/2}\Bigr)D_{[l/2]}\bigl((\nu_k^{(d)})^{1/2}\bigr) + Cn^{-d/2} \tag{8.37}$$

for large $n$. We have a similar inequality in the other direction with $\nu$, $l$, and $+Cn^{-d/2}$ replaced by $\mu$, $l-1$, and $-Cn^{-d/2}$. Let $l = [2\sqrt{2n} + x2^{2/3}(2n)^{1/6}]$. In the region $D$,

$$\bigl(l/2 - (4\nu_k^{(d)})^{1/2}\bigr)(l/4)^{-1/3} = x + O\bigl(n^{-1/6}\sqrt{\log n}\bigr) \tag{8.38}$$

and

$$\bigl(1 - \nu_{m_+}^{(d)}/(\nu_k^{(d)})^{1/2}\bigr)2^{-4/3}(l/2)^{1/3} = O(n^{-1/12+\epsilon}). \tag{8.39}$$


Hence, as in (8.14), using (7.5) in Corollary 7.2, Proposition 5.4, and Corollary 5.5,

$$\Bigl|e^{-\nu_{m_+}^{(d)}-\nu_k^{(d)}}\,\pi^*_{[l/2]}\Bigl(-\frac{\nu_{m_+}^{(d)}}{(\nu_k^{(d)})^{1/2}};\ (\nu_k^{(d)})^{1/2}\Bigr)D_{[l/2]}\bigl((\nu_k^{(d)})^{1/2}\bigr) - F(x;0)\Bigr| \le Cn^{-1/6}\sqrt{\log n} + Cn^{-1/12+\epsilon} + C(M)n^{-1/6} + Ce^{-cM^{3/2}} \tag{8.40}$$

for a constant $C(M)$ which may depend on $M$ and constants $C$ and $c$ which are independent of $M$. Thus for large $n$,

$$\Pr\Bigl(\frac{\tilde L_n - 2\sqrt{2n}}{2^{2/3}(2n)^{1/6}} \le x\Bigr) \le F(x;0) + e(n,M) \tag{8.41}$$

with some error $e(n,M)$ such that $\lim_{M\to\infty}\lim_{n\to\infty} e(n,M) = 0$. Similarly we have an inequality for the other direction. Recalling $F(x;0) = F_1(x)^2$ from (2.35), we obtain (3.21).

Proof of (3.23)
Integrating by parts,

$$E\bigl((\tilde\chi_n)^p\bigr) = \int_{-\infty}^{\infty} x^p\,dF_n(x) = -\int_{-\infty}^{0} px^{p-1}F_n(x)\,dx + \int_{0}^{\infty} px^{p-1}\bigl(1 - F_n(x)\bigr)\,dx, \tag{8.42}$$

where $F_n(x) := \Pr(\tilde\chi_n \le x) = \Pr(\tilde L_n \le 2\sqrt{2n} + x2^{2/3}(2n)^{1/6})$. Note that when $x < -(4n)^{1/3}$, $F_n(x) = 0$, and that when $x > 2^{1/6}n^{5/6} - (4n)^{1/3}$, $F_n(x) = 1$. Let $M > M_0$ be fixed. Consider the case when $-(4n)^{1/3} \le x \le -M$. From (8.35) and (8.36),

$$F_n(x) \le \frac{1}{|\tilde S_n|}\sum_{D} p_{km_+m_-l}\,|S_{k,m_+,m_-}| + Ce^{-10n^{2\epsilon}}, \tag{8.43}$$

where $l = [2\sqrt{2n} + x2^{2/3}(2n)^{1/6}]$. We apply Lemma 6.4, Corollary 7.6, and (5.36) in Corollary 5.7. Note that we are in the region $\alpha \to 1$ faster than $k^{-1/3}$, and hence $w$ is bounded, say, $-1 \le w \le 1$. So we can apply (5.36) in Corollary 5.7. Then we obtain

$$F_n(x) \le Ce^{-c|x|^{3}} + Ce^{-10n^{2\epsilon}}. \tag{8.44}$$

Since $-(4n)^{1/3} \le x \le -M$, we have

$$e^{-10n^{2\epsilon}} \le e^{-(10/2^{4\epsilon})|x|^{6\epsilon}}; \tag{8.45}$$

thus

$$F_n(x) \le Ce^{-c|x|^{3}} + Ce^{-(10/2^{4\epsilon})|x|^{6\epsilon}}. \tag{8.46}$$


On the other hand, when $M \le x \le 2^{1/6}n^{5/6} - (4n)^{1/3}$, similarly we have

$$1 - F_n(x) \le \frac{1}{|\tilde S_n|}\sum_{D}\bigl(1 - p_{km_+m_-l}\bigr)|S_{k,m_+,m_-}| + Ce^{-10n^{2\epsilon}}. \tag{8.47}$$

Using Lemma 6.4, Corollary 7.6, and (5.32) in Proposition 5.6, we obtain

$$1 - F_n(x) \le Ce^{-c|x|^{3/2}} + Ce^{-10n^{2\epsilon}}. \tag{8.48}$$

Since $M \le x \le 2^{1/6}n^{5/6} - (4n)^{1/3}$,

$$e^{-10n^{2\epsilon}} \le e^{-(10/2^{2\epsilon/5})|x|^{12\epsilon/5}}; \tag{8.49}$$

thus

$$1 - F_n(x) \le Ce^{-c|x|^{3/2}} + Ce^{-(10/2^{2\epsilon/5})|x|^{12\epsilon/5}}. \tag{8.50}$$

Therefore, using the dominated convergence theorem, we obtain (3.23).

9. Asymptotics for α > 1

As we remarked after Theorem 3.3, when $\alpha > 1$, we must use a different scaling to obtain useful results. Let $L(t;\alpha)$ and $L(t;\alpha,\beta)$ be random variables with the distribution functions given by $\Pr(L(t;\alpha) \le l) = P_l(t;\alpha)$ and $\Pr(L(t;\alpha,\beta) \le l) = P_l(t;\alpha,\beta)$, respectively: the Poissonized version of $L$ and of $L$. Under appropriate scalings, we obtain the Gaussian distribution in the limit.

THEOREM 9.1
For $\alpha > 1$ and $\beta \ge 0$ fixed,

$$\lim_{t\to\infty}\Pr\Bigl(\frac{L(t;\alpha) - (\alpha+\alpha^{-1})t}{\sqrt{(\alpha-\alpha^{-1})t}} \le x\Bigr) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-(1/2)y^2}\,dy, \tag{9.1}$$
$$\lim_{t\to\infty}\Pr\Bigl(\frac{L(t;\alpha,\beta) - 2(\alpha+\alpha^{-1})t}{\sqrt{2(\alpha-\alpha^{-1})t}} \le x\Bigr) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-(1/2)y^2}\,dy. \tag{9.2}$$

Proof
Let $l = (\alpha+\alpha^{-1})t + x\sqrt{(\alpha-\alpha^{-1})t}$ for $L$. For large $t$, $2t/l \le c < 1$ for some $c > 0$. From Proposition 5.1(i), using Theorem 4.1, it is easy to see that $e^{-t^2/2}D_l^{\pm\pm}(t),\ e^{-t^2/2\pm t}D_l^{\pm\mp}(t) \to 1$ exponentially as $l \to \infty$. Now Theorem 4.1, (5.29), and Proposition 5.8 imply (9.1). For $L$, let $l = 2(\alpha+\alpha^{-1})t + x\sqrt{2(\alpha-\alpha^{-1})t}$. Similarly, $e^{-t^2}D_l(t) \to 1$ exponentially as $l \to \infty$, and we obtain (9.2).

Unfortunately, we can no longer apply the de-Poissonization technique; the difficulty is that $(\alpha+\alpha^{-1})t$ depends too strongly on small perturbations in $\alpha$. Indeed, as we see, the asymptotics of the non-Poisson processes are different.


Consider the case of involutions with $[\alpha t]$ fixed points and $[t^2/2]$ 2-cycles; the case of signed involutions is analogous. By symmetry, this is the same as the largest increasing subset distribution for the point selection process in the triangle $0 \le y \le x \le 1$ with $[t^2/2]$ generic points and $[\alpha t]$ diagonal points. As was observed in [7, Rem. 2 to Cor. 7.6], it is equivalent to consider weakly increasing subsets where the extra points are added to the line $y = 0$ instead of to the diagonal. As in (3.1), let

$$\chi_{[t^2/2],[\alpha t]} = \frac{L_{[t^2/2],[\alpha t]} - (\alpha+1/\alpha)t}{\sqrt{(1/\alpha-1/\alpha^3)t}}. \tag{9.3}$$

THEOREM 9.2
As $t \to \infty$, the variable $\chi_{[t^2/2],[\alpha t]}$ converges in distribution and moments to $N(0,1)$.
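The rescaled variable (9.3) can be explored by simulation. The sketch below is an illustration only and makes several assumptions: the statistic is taken to be the longest strictly increasing subsequence, a uniform involution with [t^2/2] 2-cycles and [αt] fixed points is drawn by shuffling labels, and the helper names and the seed are hypothetical. Convergence to N(0, 1) is only asymptotic, so finite-t samples need not look Gaussian.

```python
import math
import random
from bisect import bisect_left

def random_involution(k, m, rng):
    """Uniform random involution of {0,...,2k+m-1} with k 2-cycles and m fixed points."""
    n = 2 * k + m
    labels = list(range(n))
    rng.shuffle(labels)
    perm = list(range(n))               # start from the identity
    for j in range(k):                  # pair up the first 2k shuffled labels
        a, b = labels[2 * j], labels[2 * j + 1]
        perm[a], perm[b] = b, a
    return perm                         # the remaining m labels stay fixed

def lis_length(perm):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in perm:
        pos = bisect_left(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)

def chi(L, t, alpha):
    """Centering and scaling of (9.3)."""
    return (L - (alpha + 1.0 / alpha) * t) / math.sqrt((1.0 / alpha - 1.0 / alpha ** 3) * t)

def sample_chi(t, alpha, trials, seed=0):
    """Draw `trials` samples of the rescaled variable for given t and alpha."""
    rng = random.Random(seed)
    k, m = int(t * t / 2), int(alpha * t)
    return [chi(lis_length(random_involution(k, m, rng)), t, alpha) for _ in range(trials)]

if __name__ == "__main__":
    xs = sample_chi(t=30.0, alpha=2.0, trials=200)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(mean, var)
```

Since the fixed points of an involution always form an increasing subsequence, every sample satisfies L ≥ [αt], which gives a quick consistency check on the sampler.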

Proof
Let $S(t)$ be the set of points at time $t$, and let $I$ be a largest increasing subset of $S(t)$. Then there exists some number $0 \le s_+ \le 1$ (not unique) such that

$$\bigl(S(t) \cap \{y = 0,\ 0 \le x \le s_+\}\bigr) \subset I \tag{9.4}$$

and such that every other point of $I$ has $x > s_+$ and $y > 0$. For any $0 \le s \le 1$, we thus have

$$f_1(s) + f_2(s) \le |I|, \tag{9.5}$$

where $f_1(s)$ is the number of points of $S(t)$ with $y = 0$ and $0 \le x \le s$, and where $f_2(s)$ is the size of the largest increasing subset of $S(t)$ lying entirely in the (part-open) trapezoid with $x \ge s$, $y > 0$. Since $f_1(s)$ is binomial with parameters $[\alpha t]$ and $s$, we have the following lemma.

LEMMA 9.3
Let $M > 0$ be sufficiently large and fixed. For all $0 \le s < 1$, there exist positive constants $C$ and $c$ independent of $s$ such that for $w \ge M$,

$$\Pr\bigl(f_1(s) > \alpha st + wt^{1/2}\bigr) \le Ce^{-c|w|^2}, \tag{9.6}$$

while for $w \le -M$,

$$\Pr\bigl(f_1(s) < \alpha st + wt^{1/2}\bigr) \le Ce^{-c|w|^2}. \tag{9.7}$$

For $f_2$ we have the following lemma.

LEMMA 9.4
Let $M > 0$ be sufficiently large and fixed. For $0 \le s < 1$, there exist positive constants $C$ and $c$ independent of $s$ such that for all $w \ge M$,

$$\Pr\bigl(f_2(s) > 2\sqrt{1-s}\,t + wt^{1/3}\bigr) \le Ce^{-c|w|^{3/2}}, \tag{9.8}$$

and for all $w \le -M$,

$$\Pr\bigl(f_2(s) < 2\sqrt{1-s}\,t + wt^{1/3}\bigr) \le Ce^{-c|w|^{3}}. \tag{9.9}$$

Proof
We first show the corresponding large-deviation result for the Poissonization. Define $f_2'(s,t)$ to be the length of the longest increasing subsequence when the number of points in the trapezoid is Poisson with parameter $t^2(1-s^2)/2$. Then $f_2'(s,t)$ is bounded between the corresponding processes for the rectangle $s \le x \le 1$, $0 \le y \le 1$ and for the triangle $0 \le (x-s)/(1-s) \le y \le 1$. In particular, if $f_2'(s,t)$ deviates significantly from $\sqrt{1-s}\,t$, so must the appropriate bounding process; the result follows immediately from the corresponding results for rectangles and triangles. The corresponding large-deviation result when the number of points is fixed then follows from Lemma 6.2. In our case the number of points in the trapezoid is binomial with parameters $t^2/2$ and $(1-s^2)$; the lemma follows via essentially the same argument used to prove Lemma 6.2.

As we see, the value $s = 1 - \alpha^{-2}$ deserves special attention.

LEMMA 9.5
Let $M > 0$ be sufficiently large and fixed. There exist positive constants $C$ and $c$ such that for $w \ge M$,

$$\Pr\bigl(f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) > (\alpha+1/\alpha)t + wt^{1/2}\bigr) \le Ce^{-c\min(|w|^2,\ |w|^{3/2}t^{1/4})}, \tag{9.10}$$

and for $w \le -M$,

$$\Pr\bigl(f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) < (\alpha+1/\alpha)t + wt^{1/2}\bigr) \le Ce^{-c|w|^2}. \tag{9.11}$$

Moreover, if we define

$$\chi_0(t) = \frac{f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) - (\alpha+1/\alpha)t}{\sqrt{(1/\alpha-1/\alpha^3)t}}, \tag{9.12}$$

then $\chi_0(t)$ converges to a standard normal distribution, both in distribution and moments.
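The centering and scaling in (9.12) can be recovered from the binomial law of f_1 at the special value s = 1 - α^{-2}. The short sketch below is an illustration with exact rational arithmetic; it ignores the integer part in [αt] (an assumption made for simplicity) and uses a hypothetical function name.

```python
from fractions import Fraction

def centering_and_variance(alpha, t):
    """Mean and variance of Binomial(alpha t, s) at s = 1 - alpha^{-2}, and 2 sqrt(1-s) t."""
    s = 1 - 1 / alpha ** 2
    mean_f1 = alpha * t * s              # (alpha - 1/alpha) t
    var_f1 = alpha * t * s * (1 - s)     # (1/alpha - 1/alpha^3) t
    lead_f2 = 2 * t / alpha              # 2 sqrt(1 - s) t, since sqrt(1 - s) = 1/alpha
    return mean_f1, var_f1, lead_f2

if __name__ == "__main__":
    alpha, t = Fraction(3), Fraction(10)
    mean_f1, var_f1, lead_f2 = centering_and_variance(alpha, t)
    print(mean_f1 + lead_f2, (alpha + 1 / alpha) * t)       # both 100/3
    print(var_f1, (1 / alpha - 1 / alpha ** 3) * t)          # both 80/27
```

The sum of the two leading terms gives the centering (α + 1/α)t, and the binomial variance gives the normalization (1/α - 1/α^3)t used in (9.3) and (9.12).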


Proof
That $\chi_0(t)$ converges as stated follows from the fact that if we write $\chi_0(t) = \chi_1(t) + \chi_2(t)$, with

$$\chi_1(t) = \frac{f_1(1-\alpha^{-2}) - (\alpha-1/\alpha)t}{\sqrt{(1/\alpha-1/\alpha^3)t}}, \tag{9.13}$$
$$\chi_2(t) = \frac{f_2(1-\alpha^{-2}) - (2/\alpha)t}{\sqrt{(1/\alpha-1/\alpha^3)t}}, \tag{9.14}$$

then $\chi_1(t)$ converges in distribution and moments to a standard normal distribution, and $\chi_2(t)$ converges in distribution and moments to zero. For the large-deviation bounds, we note that if $x + y > z + w$, then either $x > z$ or $y > w$. Thus for any $0 \le b \le 1$, we have

$$\begin{aligned}
\Pr\bigl(f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2}) > (\alpha+1/\alpha)t + wt^{1/2}\bigr)
&\le \Pr\bigl(f_1(1-\alpha^{-2}) > (\alpha-1/\alpha)t + bwt^{1/2}\bigr) \\
&\quad + \Pr\bigl(f_2(1-\alpha^{-2}) > (2/\alpha)t + (1-b)wt^{1/2}\bigr) \\
&\le Ce^{-c|bw|^2} + Ce^{-c|(1-b)w|^{3/2}t^{1/4}};
\end{aligned} \tag{9.15}$$

the result follows by balancing the two terms. In the other case, the $Ce^{-c|w|^2}$-term always dominates.

LEMMA 9.6
For any sufficiently small $\epsilon > 0$, there exist positive constants $C$ and $c$ such that

$$\Pr\bigl(s_+ - (1-1/\alpha^2) > t^{\epsilon/3-1/3}\bigr) < Ce^{-ct^{\epsilon}} \tag{9.16}$$

and

$$\Pr\bigl((1-1/\alpha^2) - s_+ > t^{\epsilon/2-1/2}\bigr) < Ce^{-ct^{\epsilon}} \tag{9.17}$$

for all sufficiently large $t$.

Proof
Define a sequence $s_i$ by taking

$$s_i = 1 - \bigl(1 - 2/(i+2)\bigr)^2/\alpha^2 \tag{9.18}$$

for all $i \ge 0$. Similarly, define a sequence $s_i'$ by

$$s_i' = \max(t_i, 0), \tag{9.19}$$

with

$$t_i = 1 - \bigl(1 + 2e^{2^{-1-i}}\bigr)^2/\alpha^2 \tag{9.20}$$

for $i < 0$ and

$$t_i = 1 - \bigl(1 + 4/(i+1)\bigr)^2/\alpha^2 \tag{9.21}$$

for $i \ge 0$. Note that $s_i$ is strictly decreasing and $t_i$ is strictly increasing.

LEMMA 9.7
For all $i \ge 0$,

$$\alpha s_i + 2\sqrt{1 - s_{i+1}} < \alpha + 1/\alpha. \tag{9.22}$$

For all $i$,

$$\alpha s_{i+1}' + 2\sqrt{1 - s_i'} < \alpha + 1/\alpha. \tag{9.23}$$

Proof
In the first case, we have

$$\alpha + 1/\alpha - \bigl(\alpha s_i + 2\sqrt{1 - s_{i+1}}\bigr) = \frac{4}{(i+2)^2(i+3)\alpha}. \tag{9.24}$$

In the second case, it suffices to verify the formula with $s'$ replaced by $t$. For $i < -1$,

$$\alpha + 1/\alpha - \bigl(\alpha t_{i+1} + 2\sqrt{1 - t_i}\bigr) = 4e^{2^{-i}/4}/\alpha. \tag{9.25}$$

For $i = -1$,

$$\alpha + 1/\alpha - \bigl(\alpha t_{i+1} + 2\sqrt{1 - t_i}\bigr) = (24 - 4e)/\alpha. \tag{9.26}$$

Finally, for $i \ge 0$,

$$\alpha + 1/\alpha - \bigl(\alpha t_{i+1} + 2\sqrt{1 - t_i}\bigr) = \frac{8i}{(i+1)(i+2)^2\alpha}. \tag{9.27}$$
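The closed forms (9.24)–(9.27) follow from direct algebra and can be confirmed numerically from the definitions (9.18), (9.20), and (9.21). The sketch below is an illustration in floating point; the function names are hypothetical, and the range of negative i is kept small because the double exponential in (9.20) grows very quickly.

```python
import math

def s(i, alpha):
    """s_i from (9.18), for i >= 0."""
    return 1.0 - (1.0 - 2.0 / (i + 2)) ** 2 / alpha ** 2

def t(i, alpha):
    """t_i from (9.20) for i < 0 and from (9.21) for i >= 0."""
    if i < 0:
        return 1.0 - (1.0 + 2.0 * math.exp(2.0 ** (-1 - i))) ** 2 / alpha ** 2
    return 1.0 - (1.0 + 4.0 / (i + 1)) ** 2 / alpha ** 2

def gap_s(i, alpha):
    """Left-hand side of (9.24)."""
    return alpha + 1.0 / alpha - (alpha * s(i, alpha) + 2.0 * math.sqrt(1.0 - s(i + 1, alpha)))

def gap_t(i, alpha):
    """Left-hand side of (9.25)-(9.27)."""
    return alpha + 1.0 / alpha - (alpha * t(i + 1, alpha) + 2.0 * math.sqrt(1.0 - t(i, alpha)))

def closed_form_t(i, alpha):
    """Right-hand sides of (9.25), (9.26), and (9.27)."""
    if i < -1:
        return 4.0 * math.exp(2.0 ** (-i) / 4.0) / alpha
    if i == -1:
        return (24.0 - 4.0 * math.e) / alpha
    return 8.0 * i / ((i + 1) * (i + 2) ** 2 * alpha)

if __name__ == "__main__":
    alpha = 2.0
    for i in range(-3, 4):
        print(i, gap_t(i, alpha), closed_form_t(i, alpha))
```

Each printed pair should agree up to rounding, which gives an independent check that the reconstructed exponents in (9.20) and (9.25) are mutually consistent.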

Let $i_1 = t^{1/6-\epsilon/6}$, $i_2 = t^{1/4-\epsilon/4}$. Then there exist constants $C$ and $c$ such that for $0 \le i \le i_1$,

$$\Pr\bigl(f_1(s_i) + f_2(s_{i+1}) > f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2})\bigr) < Ce^{-ct^{\epsilon}}. \tag{9.28}$$

Since $f_1(s_i) + f_2(s_{i+1})$ is an upper bound on $f_1(s) + f_2(s)$ with $s_{i+1} \le s \le s_i$, it follows that

$$\Pr\bigl(s_+ \in [s_{i+1}, s_i]\bigr) < Ce^{-ct^{\epsilon}}. \tag{9.29}$$

Similarly, for $i \le i_2$,

$$\Pr\bigl(f_1(s_{i+1}') + f_2(s_i') > f_1(1-\alpha^{-2}) + f_2(1-\alpha^{-2})\bigr) < Ce^{-ct^{\epsilon}}, \tag{9.30}$$

and thus

$$\Pr\bigl(s_+ \in [s_i', s_{i+1}']\bigr) < Ce^{-ct^{\epsilon}}. \tag{9.31}$$

Since there are only $i_1 + i_2 + \log\log\alpha$ such events to consider, Lemma 9.6 is proved.


In particular, with probability $1 - Ce^{-ct^{\epsilon}}$, we have

$$f_1(1-\alpha^{-2}-t^{\epsilon/2-1/2}) + f_2(1-\alpha^{-2}+t^{\epsilon/3-1/3}) \le L(t) \le f_1(1-\alpha^{-2}+t^{\epsilon/3-1/3}) + f_2(1-\alpha^{-2}-t^{\epsilon/2-1/2}). \tag{9.32}$$

But then, using the fact that

$$f_1(1-\alpha^{-2}) - f_1(1-\alpha^{-2}-t^{\epsilon/2-1/2}) \tag{9.33}$$

and

$$f_1(1-\alpha^{-2}-t^{\epsilon/2-1/2}) - f_1(1-\alpha^{-2}) \tag{9.34}$$

are Poisson and using the large-deviation behavior of $f_2(s)$, we find that

$$\Pr\Bigl(L(t) - \bigl[f_1(1-\alpha^{-2}-t^{\epsilon/2-1/2}) + f_2(1-\alpha^{-2}+t^{\epsilon/3-1/3})\bigr] \ge t^{1/2-\epsilon}\Bigr) \le Ce^{-ct^{\epsilon'}} \tag{9.35}$$

and

$$\Pr\Bigl(f_1(1-\alpha^{-2}-t^{\epsilon/3-1/3}) + f_2(1-\alpha^{-2}+t^{\epsilon/2-1/2}) - L(t) \ge t^{1/2-\epsilon}\Bigr) \le Ce^{-ct^{\epsilon'}}. \tag{9.36}$$

So $\chi(t) - \chi_0(t)$ converges to zero in a fairly strong sense; in particular, $\chi(t)$ and $\chi_0(t)$ must have the same limiting distribution and limiting moments. Thus Theorem 9.2 is proved.

Remark. The above proof could be applied equally well to the Poisson process; the beta distribution would then be replaced by a Poisson distribution.

For the signed involution case, with $[2\alpha t]$ fixed points, $[2\beta t]$ negated points, and $[2t^2]$ 2-cycles, again we let

$$\chi_{[t^2],[2\alpha t],[2\beta t]} = \frac{L_{[t^2/2],[\alpha t],[\beta t]} - 2(\alpha+1/\alpha)t}{\sqrt{2(1/\alpha-1/\alpha^3)t}}. \tag{9.37}$$

Then the analogous argument proves that χ 0 (t) also converges in moments and distribution to a standard normal.

10. Steepest descent–type analysis for Riemann-Hilbert problems

In this section, we prove the asymptotic results for orthogonal polynomials stated in Section 5 by applying the steepest descent–type method to RHP (5.4). The steepest descent method for RHP’s, the Deift-Zhou method, was introduced by Deift and X. Zhou in [18], developed further in [19] and [16], and finally placed in a systematic


form by Deift, S. Venakides, and Zhou in [17]. The steepest descent analysis of RHP (5.4) was first conducted in [3]. The analysis of [3] has many similarities with [13], [14], and [15] where the asymptotics of orthogonal polynomials on the real line with respect to a general weight is obtained, leading to a proof of universality conjectures in random matrix theory. As mentioned in the introduction and Section 5, in this section we extend the analysis of [3] and obtain new estimates on the orthogonal polynomials πk (z; t). The extension is done roughly in two categories. In [3], the quantity of interest was Y21 (0; k; t), and so the z-dependence of the error bound of Y (z; k; t) was not considered carefully. But in the present paper we need the asymptotics of Y (z) for general z ∈ C and also for the case when z → −1 as k, t → ∞. Hence the first category of our extension is to investigate how the error estimate depends on z. This task sometimes requires improved estimates of the solution Y (z) (see, e.g., (10.42), where an improved L 1 -norm bound of the jump matrix is needed). On the other hand, as we see, the asymptotic solution Y (z) is expressed in terms of the so-called g-function. Thus we need detailed analysis of the g-function to obtain the asymptotics of the orthogonal polynomials. In the special case z = 0, we have g(0) = πi (see [3, Lem. 4.2]). Hence in [3], the analysis of the g-function was quite simple. But in the present paper, we need general values of g(z), and this in some cases requires further analysis. Hence the analysis of the g-function is the second category of our extension (e.g., see (10.109)–(10.121), where we need a further analysis of the g-function). Again the analysis in this section relies heavily on the analysis of [3] and we extend the method of [3]. For continuity of presentation and also for the convenience of readers, we nevertheless include some calculations that overlap [3]. 
When the analysis overlaps that of [3], we only sketch the method, and instead we focus on new features to indicate how to prove the propositions in Section 5. We say that an RHP is normalized at ∞ if the solution m satisfies the condition m → I as z → ∞. Thus, for instance, RHP’s (2.15) and (10.1) are normalized at ∞, while RHP (5.4) is not. In [3] it turned out that the asymptotic analysis differs critically when (2t)/k ≤ 1 and (2t)/k > 1, due to the difference of (the support of) the associated equilibrium measure (see [3, Lem. 4.3]). Hence we discuss those two cases separately in Sections 10.1 and 10.2, which extend [3, Secs. 5 and 6], respectively. Each section is also divided into three subcases. In each subcase the corresponding case of the propositions in Section 5 (except Proposition 5.8) is proved. Section 10.3 is new, and Proposition 5.8 is proved there.


10.1. When $(2t)/k \le 1$

The following algebraic transformations (10.1)–(10.10) of RHP's are taken from [3, (5.1)–(5.3)]. Define

$$\begin{aligned}
m^{(1)}(z;k;t) &:= Y(z;k;t)\begin{pmatrix}(-1)^k e^{tz} & 0\\ 0 & (-1)^k e^{-tz}\end{pmatrix}, && |z| < 1,\\
m^{(1)}(z;k;t) &:= Y(z;k;t)\begin{pmatrix}z^{-k}e^{tz^{-1}} & 0\\ 0 & z^{k}e^{-tz^{-1}}\end{pmatrix}, && |z| > 1.
\end{aligned} \tag{10.1}$$

Then $m^{(1)}$ solves a new RHP that is equivalent to RHP (5.4) in the sense that a solution of one RHP yields algebraically a solution of the other RHP, and vice versa:

$$\begin{cases}
m^{(1)}(z;k;t) \text{ is analytic in } \mathbb{C}\setminus\Sigma,\\[2pt]
m^{(1)}_+(z;k;t) = m^{(1)}_-(z;k;t)\begin{pmatrix}(-1)^k z^k e^{t(z-z^{-1})} & (-1)^k\\ 0 & (-1)^k z^{-k}e^{-t(z-z^{-1})}\end{pmatrix} \text{ on } \Sigma,\\[2pt]
m^{(1)}(z;k;t) = I + O(1/z) \text{ as } z \to \infty,
\end{cases} \tag{10.2}$$

where $\Sigma$ is the unit circle oriented counterclockwise as before. Here and in the sequel, $m_+(z)$ (resp., $m_-(z)$) is understood as the limit from the left-hand (resp., right-hand) side of the contour as one goes along the orientation of the contour. Now we define $m^{(2)}(z;k;t)$ in terms of $m^{(1)}(z;k;t)$ as follows: for even $k$,

$$\begin{cases}
m^{(2)} \equiv m^{(1)}, & |z| > 1,\\
m^{(2)} \equiv m^{(1)}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}, & |z| < 1;
\end{cases}$$

for odd $k$,

$$\begin{cases}
m^{(2)} \equiv \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}m^{(1)}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}, & |z| > 1,\\
m^{(2)} \equiv \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}m^{(1)}\begin{pmatrix}0 & -1\\ -1 & 0\end{pmatrix}, & |z| < 1.
\end{cases} \tag{10.3}$$

Then $m^{(2)}(\cdot;k;t)$ solves another RHP

$$\begin{cases}
m^{(2)}_+(z;k;t) = m^{(2)}_-(z;k;t)\,v^{(2)}(z;k;t) \text{ on } \Sigma,\\
m^{(2)}(z;k;t) = I + O(1/z) \text{ as } z \to \infty,
\end{cases} \tag{10.4}$$

where

$$v^{(2)}(z;k;t) = \begin{pmatrix}1 & -(-1)^k z^k e^{t(z-z^{-1})}\\ (-1)^k z^{-k}e^{-t(z-z^{-1})} & 0\end{pmatrix}. \tag{10.5}$$


The jump matrix has the following factorization:

$$v^{(2)} = \begin{pmatrix}1 & 0\\ (-1)^k z^{-k}e^{-t(z-z^{-1})} & 1\end{pmatrix}\begin{pmatrix}1 & -(-1)^k z^k e^{t(z-z^{-1})}\\ 0 & 1\end{pmatrix} =: (b_-^{(2)})^{-1}b_+^{(2)}. \tag{10.6}$$

We note that through the changes $Y \to m^{(1)} \to m^{(2)}$, we have

$$Y_{11}(z;k;t) = -(-1)^k e^{-tz}m^{(2)}_{12}(z;k;t), \qquad |z| < 1, \tag{10.7}$$
$$Y_{21}(z;k;t) = -e^{-tz}m^{(2)}_{22}(z;k;t), \qquad |z| < 1, \tag{10.8}$$
$$Y_{11}(z;k;t) = z^k e^{-tz^{-1}}m^{(2)}_{11}(z;k;t), \qquad |z| > 1, \tag{10.9}$$
$$Y_{21}(z;k;t) = (-z)^k e^{-tz^{-1}}m^{(2)}_{21}(z;k;t), \qquad |z| > 1. \tag{10.10}$$

As in [3, (5.4)], the absolute value of the (12)-entry of the jump matrix $v^{(2)}$ is $e^{kF(\rho,\theta;2t/k)}$ where

$$F(z;\gamma) = F(\rho e^{i\theta};\gamma) := \frac{\gamma}{2}(\rho - \rho^{-1})\cos\theta + \log\rho, \qquad z = \rho e^{i\theta}. \tag{10.11}$$

The absolute value of the (21)-entry of $v^{(2)}$ is $e^{-kF(\rho e^{i\theta};2t/k)}$. Note that

$$F(\rho,\theta;\gamma) = -F(\rho^{-1},\theta;\gamma). \tag{10.12}$$

Figure 3 shows the curves $F(z;\gamma) = 0$. In $\Omega_1 \cup \Omega_3$, $F > 0$, and in $\Omega_2 \cup \Omega_4$, $F < 0$. The region $\Omega_2$ becomes smaller as $\gamma$ increases, and when $\gamma = 1$, the curve $F(z;\gamma) = 0$ contacts the unit circle $\Sigma$ at $z = -1$ with the angle $\pi/3$. We distinguish three subcases, as in [3, Sec. 5].

10.1.1. The case $0 \le 2t \le ak$ for some $0 < a < 1$

It is possible to fix $\rho_a < 1$ such that the circle $\{z : |z| = \rho_a\}$ is in the region $\Omega_2$ for all such $t$ and $k$. Define $m^{(3)}(z;k;t)$ by (see [3, (5.9)])

$$\begin{cases}
m^{(3)} = m^{(2)}(b_+^{(2)})^{-1}, & \rho_a < |z| < 1,\\
m^{(3)} = m^{(2)}(b_-^{(2)})^{-1}, & 1 < |z| < \rho_a^{-1},\\
m^{(3)} = m^{(2)}, & |z| < \rho_a,\ |z| > \rho_a^{-1}.
\end{cases} \tag{10.13}$$

Then $m^{(3)}$ satisfies a new jump condition $m^{(3)}_+ = m^{(3)}_- v^{(3)}$ on $\Sigma^{(3)} := \{z : |z| = \rho_a, \rho_a^{-1}\}$, where $v^{(3)} = b_+^{(2)}$, $|z| = \rho_a$ and $v^{(3)} = (b_-^{(2)})^{-1}$, $|z| = \rho_a^{-1}$. This $\Sigma^{(3)}$ is not the best choice (see Section 10.1.2). But for a simple and direct estimate, we use this choice in this section. From the choice of $\rho_a$, we have (see [3, (5.13)–(5.14)])

$$|v^{(3)}(z;k;t) - I| \le e^{-ck} \qquad \text{for all } z \in \Sigma^{(3)}, \tag{10.14}$$

Figure 3. Curves of F(z; γ) = 0 when 0 < γ < 1

which implies that $I - C_{w^{(3)}}$ is invertible and the norm of the inverse is uniformly bounded, where $w^{(3)} := v^{(3)} - I$ and $C_{w^{(3)}}(f) := C_-(fw^{(3)})$ on $L^2(\Sigma^{(3)}, |dz|)$, $C_\pm$ being Cauchy operators (see [3, (2.5)–(2.9) and references therein]). From the general theory of RHP's, we have

$$m^{(3)}(z) = I + \frac{1}{2\pi i}\int_{\Sigma^{(3)}}\frac{\bigl((I - C_{w^{(3)}})^{-1}I\bigr)(s)\,w^{(3)}(s)}{s - z}\,ds, \qquad z \notin \Sigma^{(3)}. \tag{10.15}$$

This implies the estimates (see [3, (5.16)])

$$|m^{(3)}_{22}(0;k;t) - 1|,\ \ |m^{(3)}_{12}(0;k;t)| \le Ce^{-ck}, \tag{10.16}$$

which, using (10.13), (10.7), (10.8) and (5.5), (5.6), yield Proposition 5.1(i). This is precisely the result contained in [3, (5.17)]. From (10.13), (10.7), (10.9), and (5.6), we have

$$\pi_k(z;t) = -(-1)^k e^{-tz}m^{(3)}_{12}(z;k;t), \qquad |z| < \rho_a, \tag{10.17}$$
$$\pi_k(z;t) = z^k e^{-tz^{-1}}m^{(3)}_{11}(z;k;t), \qquad |z| > \rho_a^{-1}, \tag{10.18}$$
$$\pi_k(z;t) = z^k e^{-tz^{-1}}m^{(3)}_{11}(z;k;t) - (-1)^k e^{-tz}m^{(3)}_{12}(z;k;t), \qquad \rho_a < |z| < \rho_a^{-1}. \tag{10.19}$$

RANDOM INVOLUTIONS

255

Let $0 < b < 1$ be a fixed number. From Figure 3, we could have chosen $\rho_a$ such that $\rho_a > b$. When $|z| \le b$ or $|z| \ge b^{-1}$, we have $\operatorname{dist}(z, \Sigma^{(3)}) \ge c > 0$. Since $z$ is uniformly bounded away from the contour, we can extend the argument leading to [3, (5.17)], which uses the fact that zero is uniformly bounded away from the contour. Hence (10.15), (10.17), and (10.18) imply that
\begin{align}
|e^{tz}\pi_k(z;t)| &\le Ce^{-ck}, & |z| &\le b, \tag{10.20}\\
|e^{tz^{-1}} z^{-k}\pi_k(z;t)| &\le Ce^{-ck}, & |z| &\ge b^{-1}. \tag{10.21}
\end{align}

These are (5.29) and (5.30) in Proposition 5.6 in the special case $x \ge 2^{1/3}(1-a)k^{2/3}$. On the other hand, let $L > 0$ be a fixed number. Set $\alpha = 1 - 2^{4/3}k^{-1/3}w$ with $-L \le w \le L$, as in Proposition 5.6. Since $\rho_a$ is fixed, when $k$ is large, $\operatorname{dist}(-\alpha, \Sigma^{(3)}) \ge c > 0$. Then from (10.15),
\[ |m^{(3)}_{11}(-\alpha;k;t) - 1|,\ |m^{(3)}_{12}(-\alpha;k;t)| \le Ce^{-ck}. \tag{10.22} \]
Note that
\begin{align}
\tfrac12 (s - s^{-1}) &\le s - 1, & s &> 0, \tag{10.23}\\
-\tfrac12 (s - s^{-1}) + \log s &\le \tfrac23 (1-s)^3, & \tfrac12 &\le s \le 1, \tag{10.24}\\
-\tfrac12 (s - s^{-1}) + \log s &\le 0, & s &\ge 1. \tag{10.25}
\end{align}
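The three elementary inequalities (10.23)–(10.25) can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the helper names (`slack_10_23`, `min_slack`, and so on) and the grid sizes are illustrative choices, and a grid check is of course no substitute for the short calculus proofs.

```python
import math

def slack_10_23(s: float) -> float:
    """Right side minus left side of (10.23); expected >= 0 for s > 0."""
    return (s - 1.0) - 0.5 * (s - 1.0 / s)

def slack_10_24(s: float) -> float:
    """Right side minus left side of (10.24); expected >= 0 for 1/2 <= s <= 1."""
    return (2.0 / 3.0) * (1.0 - s) ** 3 - (-0.5 * (s - 1.0 / s) + math.log(s))

def slack_10_25(s: float) -> float:
    """Right side minus left side of (10.25); expected >= 0 for s >= 1."""
    return 0.5 * (s - 1.0 / s) - math.log(s)

def min_slack(f, lo: float, hi: float, n: int = 2000) -> float:
    """Smallest value of f on a uniform grid of n + 1 points in [lo, hi]."""
    return min(f(lo + (hi - lo) * j / n) for j in range(n + 1))

if __name__ == "__main__":
    # Each printed value should be nonnegative up to rounding error.
    print(min_slack(slack_10_23, 0.01, 10.0))
    print(min_slack(slack_10_24, 0.5, 1.0))
    print(min_slack(slack_10_25, 1.0, 10.0))
```

All three slacks vanish at $s = 1$, which is why the bounds are sharp enough to capture the cubic behavior near $z = -1$.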

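Thus for $\gamma \le 1$, $s \ge 1/2$,
\[
F(-s;\gamma) = -\frac{\gamma}{2}(s - s^{-1}) + \log s = \frac{1-\gamma}{2}(s - s^{-1}) - \frac12 (s - s^{-1}) + \log s \le (1-\gamma)(s-1) + \frac23 |s-1|^3. \tag{10.26}
\]
For large $k$, $\alpha \ge 1/2$ for all $-L \le w \le L$, and hence
\[
\bigl|(-\alpha)^k e^{-t(\alpha - \alpha^{-1})}\bigr| = e^{kF(-\alpha;2t/k)} \le e^{(32/2)|w|^3 - k(1-2t/k)(2^{4/3}w)/k^{1/3}} \le e^{(32/2)L^3} e^{2^{4/3}Lk^{2/3}} = Ce^{ck^{2/3}}. \tag{10.27}
\]
Similarly, since $\alpha^{-1} \ge 1/2$ for large $k$,
\[
\bigl|(-\alpha)^{-k} e^{t(\alpha - \alpha^{-1})}\bigr| = e^{kF(-\alpha^{-1};2t/k)} \le e^{k(1-2t/k)(2^{4/3}w)/(k^{1/3}\alpha) + (32/2)|w|^3\alpha^{-3}} \le Ce^{ck^{2/3}}. \tag{10.28}
\]
Therefore, from (10.19) and (10.22),
\begin{align}
\bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| &= \bigl|(-\alpha)^k e^{-t(\alpha-\alpha^{-1})} m^{(3)}_{11}(-\alpha;k;t) - (-1)^k m^{(3)}_{12}(-\alpha;k;t)\bigr| \le Ce^{ck^{2/3}}, \tag{10.29}\\
\bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t) - 1\bigr| &= \bigl|m^{(3)}_{11}(-\alpha;k;t) - 1 - \alpha^{-k} e^{t(\alpha-\alpha^{-1})} m^{(3)}_{12}(-\alpha;k;t)\bigr| \le Ce^{-ck}. \tag{10.30}
\end{align}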

Noting $x \sim k^{2/3}$, these are (5.31) and (5.32) in Proposition 5.6 for the special case $x \ge 2^{1/3}(1-a)k^{2/3}$. Thus we have extended the argument of [3, (5.17)] to the case when $z \to -1$. This is an example of the extension of the first category mentioned at the beginning of Section 10 (though it is straightforward to extend in this case).

10.1.2. The case $ak \le 2t \le k - M2^{-1/3}k^{1/3}$ for some $0 < a < 1$ and $M > M_0$

In Section 10.1.1, the contour $\Sigma^{(3)}$ was not the best choice. We could have chosen the steepest descent curve for $F(z;\gamma)$. For the previous case, it was not necessary to use the steepest descent curve to obtain the desired results, but for the case at hand and in the future calculations, we need to use the steepest descent curve. For fixed $\theta$ satisfying $0 \le \theta < \pi/2$ or $3\pi/2 < \theta < 2\pi$, $F(\rho,\theta;\gamma)$ is always negative for $0 < \rho < 1$, and as $\rho \downarrow 0$, it decreases to minus infinity. On the other hand, one can check that (see [3, (5.5)]) when $\gamma \le 1$, the minimum of $F(\rho,\theta;\gamma)$, $0 < \rho \le 1$, is attained, for fixed $\pi/2 \le \theta \le 3\pi/2$, at
\[ \rho = \rho_\theta := \frac{1 - \sqrt{1 - \gamma^2\cos^2\theta}}{-\gamma\cos\theta}, \tag{10.31} \]
and $F(\rho_\theta,\theta;\gamma) < 0$. And it is straightforward to check that for $0 \le \gamma \le 1$, $\pi/2 \le \theta \le 3\pi/2$,
\[
F(\rho_\theta,\theta;\gamma) = \sqrt{1 - \gamma^2\cos^2\theta} + \log\frac{1 - \sqrt{1 - \gamma^2\cos^2\theta}}{-\gamma\cos\theta} \le -\frac{2\sqrt2}{3}(1 + \gamma\cos\theta)^{3/2}. \tag{10.32}
\]
This is an extension of [3, (5.13)], where only the case when $\theta = \pi$ was considered. We need this improved version to obtain a better $L^1$-estimate of $v^{(3)} - I$ in the sequel. Also, $F(\rho_\theta,\theta;\gamma)$ is increasing in $\pi/2 \le \theta \le \pi$ and is decreasing in $\pi \le \theta \le 3\pi/2$. In fact, the saddle points for $(\gamma/2)(z - z^{-1}) + \log z$ are $z = -\rho_\pi$ and $z = -\rho_\pi^{-1}$.

This time, define $\Sigma^{(3)} := \Sigma^{(3)}_{in} \cup \Sigma^{(3)}_{out}$, as in [3, (5.6)], by
\begin{align}
\Sigma^{(3)}_{in} &= \{\rho_\theta e^{i\theta} : 3\pi/4 \le \theta \le 5\pi/4\} \cup \{\rho_{3\pi/4} e^{i\theta} : 0 \le \theta \le 3\pi/4,\ 5\pi/4 \le \theta < 2\pi\}, \notag\\
\Sigma^{(3)}_{out} &= \{\rho_\theta^{-1} e^{i\theta} : 3\pi/4 \le \theta \le 5\pi/4\} \cup \{\rho_{3\pi/4}^{-1} e^{i\theta} : 0 \le \theta \le 3\pi/4,\ 5\pi/4 \le \theta < 2\pi\}, \tag{10.33}
\end{align}

Figure 4. $\Sigma^{(3)}$ and $\Omega^{(3)}$ when $\gamma < 1$ (contours $\Sigma^{(3)}_{in}$, $\Sigma$, $\Sigma^{(3)}_{out}$; regions $\Omega^{(3)}_1,\dots,\Omega^{(3)}_4$; marked points $-1$ and $0$)

where $\rho_\theta$ is defined in (10.31) with $\gamma = (2t)/k$. Orient $\Sigma^{(3)}$ as in Figure 4. Note that $\Sigma^{(3)}$ lies in $\Omega^{sig}_2$, and for $3\pi/4 \le \theta \le 5\pi/4$, it is the steepest descent curve. The reason why we choose a part of the circle as the contour for the remaining angles is to ensure the uniform boundedness of the Cauchy operators (see [3, Sec. 5]). This does not affect the asymptotics since, as we see, the main contribution to the asymptotics comes from the neighborhood of $z = -1$.

Define the regions $\Omega^{(3)}_j$, $j = 1, \dots, 4$, as in Figure 4. Define $m^{(3)}(z;k;t)$, as in [3, (5.9)], by
\[
\begin{cases}
m^{(3)} = m^{(2)}\,(b^{(2)}_+)^{-1} & \text{in } \Omega^{(3)}_2,\\
m^{(3)} = m^{(2)}\,(b^{(2)}_-)^{-1} & \text{in } \Omega^{(3)}_3,\\
m^{(3)} = m^{(2)} & \text{in } \Omega^{(3)}_1,\ \Omega^{(3)}_4,
\end{cases}
\tag{10.34}
\]
where $b^{(2)}_\pm$ are defined in (10.6). Then $m^{(3)}$ solves a new RHP with the jump matrix $v^{(3)}(z;k;t)$ where
\[
\begin{cases}
v^{(3)} = \begin{pmatrix} 1 & -(-1)^k z^k e^{t(z - z^{-1})} \\ 0 & 1 \end{pmatrix} & \text{on } \Sigma^{(3)}_{in},\\[2ex]
v^{(3)} = \begin{pmatrix} 1 & 0 \\ (-1)^k z^{-k} e^{-t(z - z^{-1})} & 1 \end{pmatrix} & \text{on } \Sigma^{(3)}_{out}.
\end{cases}
\tag{10.35}
\]
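The choice of $\rho_\theta$ in (10.31) and the bound (10.32) can be checked numerically. The sketch below assumes the explicit form $F(\rho,\theta;\gamma) = (\gamma/2)(\rho - \rho^{-1})\cos\theta + \log\rho$, that is, the real part of $(\gamma/2)(z - z^{-1}) + \log z$ at $z = \rho e^{i\theta}$; this form is an assumption here, chosen because it agrees with (10.12) and (10.26). The function names are illustrative.

```python
import math

def F(rho: float, theta: float, gamma: float) -> float:
    # Assumed form: Re[(gamma/2)(z - 1/z) + log z] at z = rho * exp(i * theta).
    return 0.5 * gamma * (rho - 1.0 / rho) * math.cos(theta) + math.log(rho)

def dF_drho(rho: float, theta: float, gamma: float) -> float:
    # Partial derivative of the assumed F with respect to rho.
    return 0.5 * gamma * math.cos(theta) * (1.0 + 1.0 / rho ** 2) + 1.0 / rho

def rho_theta(theta: float, gamma: float) -> float:
    # Formula (10.31); c is negative for pi/2 < theta < 3*pi/2.
    c = gamma * math.cos(theta)
    return (1.0 - math.sqrt(1.0 - c * c)) / (-c)

def bound_10_32(theta: float, gamma: float) -> float:
    # Right-hand side of (10.32).
    base = max(0.0, 1.0 + gamma * math.cos(theta))
    return -(2.0 * math.sqrt(2.0) / 3.0) * base ** 1.5

def worst_gap(n_gamma: int = 40, n_theta: int = 200) -> float:
    """Largest value of F(rho_theta) - bound on a grid; expected <= 0."""
    worst = -math.inf
    for i in range(1, n_gamma + 1):
        gamma = i / n_gamma
        for j in range(n_theta + 1):
            # theta runs from just above pi/2 up to pi.
            theta = math.pi / 2 + 0.05 + (math.pi / 2 - 0.05) * j / n_theta
            gap = F(rho_theta(theta, gamma), theta, gamma) - bound_10_32(theta, gamma)
            worst = max(worst, gap)
    return worst

if __name__ == "__main__":
    print(worst_gap())                                # should not exceed 0 beyond rounding
    print(dF_drho(rho_theta(2.5, 0.7), 2.5, 0.7))     # critical point: close to 0
```

Equality in (10.32) is approached only as $\gamma\cos\theta \to -1$, so the grid check is tight exactly at the saddle point near $z = -1$.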

Set $w^{(3)} := v^{(3)} - I$. For $z \in \Sigma^{(3)}_{in}$, from the choice of $\Sigma^{(3)}$ and (10.32), the (12)-entry of the jump matrix satisfies, for $3\pi/4 \le \arg z \le 5\pi/4$,
\[
\bigl|z^k e^{t(z - z^{-1})}\bigr| = e^{kF(\rho_\theta,\theta;2t/k)} \le e^{-(2\sqrt2/3)k(1 + (2t/k)\cos\theta)^{3/2}} \le e^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}} \le e^{-(2/3)M^{3/2}}, \tag{10.36}
\]
and for $0 \le \arg z \le 3\pi/4$ or $5\pi/4 \le \arg z < 2\pi$,
\[
\bigl|z^k e^{t(z - z^{-1})}\bigr| = e^{kF(\rho_{3\pi/4},\theta;2t/k)} \le e^{kF(\rho_{3\pi/4},3\pi/4;2t/k)} \le e^{-(2\sqrt2/3)k(1 + (2t/k)\cos(3\pi/4))^{3/2}} \le e^{-(1/24)k}. \tag{10.37}
\]
From (10.12), similar estimates hold for $z^{-k} e^{-t(z - z^{-1})}$, $z \in \Sigma^{(3)}_{out}$. Thus we have
\[ \|w^{(3)}\|_{L^\infty(\Sigma^{(3)})} \le Ce^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}}. \tag{10.38} \]

Also, there exists $M_0$ such that for $M > M_0$, $\|C_{w^{(3)}}\|_{L^2(\Sigma^{(3)}) \to L^2(\Sigma^{(3)})} \le c_1 < 1$, and hence (10.15) holds. This is precisely [3, (5.18)]. For this derivation we do not need the extension (10.32) of [3, (5.13)]. But for the improved $L^1$-norm estimate of $w^{(3)}$, which we do now, we need (10.32). Note that $|dz| \le C|d\theta|$ on $\Sigma^{(3)}$. Using the estimates in (10.36) and (10.37),
\[
\int_{\Sigma^{(3)}_{in}} \bigl|z^k e^{t(z - z^{-1})}\bigr|\,|dz| \le C\int_{3\pi/4}^{5\pi/4} e^{-(2\sqrt2/3)k(1 + (2t/k)\cos\theta)^{3/2}}\,d\theta + C\int_{[0,2\pi)\setminus[3\pi/4,5\pi/4]} e^{-(1/24)k}\,d\theta. \tag{10.39}
\]
The second integral is clearly less than $Ce^{-(1/24)k}$. For the first integral, recall the inequality $(x+y)^a \ge x^a + y^a$, $x, y > 0$, $a \ge 1$. Then using the inequality $1 + \cos\theta \ge (1/(2\sqrt2))(\theta - \pi)^2$ for $\theta \in [3\pi/4, 5\pi/4]$, together with the condition $ak \le 2t$, the first integral is less than or equal to
\[
\int_{3\pi/4}^{5\pi/4} e^{-(2\sqrt2/3)k[(1 - 2t/k)^{3/2} + ((2t/k)(1 + \cos\theta))^{3/2}]}\,d\theta \le e^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}} \int_{3\pi/4}^{5\pi/4} e^{-(a^{3/2}/(3\cdot 2^{3/4}))k|\theta - \pi|^3}\,d\theta, \tag{10.40}
\]
where the last integral is less than or equal to $Ck^{-1/3}$ for some constant $C > 0$. Therefore, adjusting constants, we obtain
\[ \int_{\Sigma^{(3)}_{in}} \bigl|z^k e^{t(z - z^{-1})}\bigr|\,|dz| \le \frac{C}{k^{1/3}} e^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}}. \tag{10.41} \]
We have similar estimates on $\Sigma^{(3)}_{out}$. Therefore
\[ \|w^{(3)}\|_{L^1(\Sigma^{(3)})} \le \frac{C}{k^{1/3}} e^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}}. \tag{10.42} \]
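The factor $k^{-1/3}$ in (10.41) and (10.42) comes from the cubic exponent in the last integral of (10.40). The following sketch, with illustrative names and an arbitrary sample value $a = 0.5$, evaluates that integral by the midpoint rule and compares it with the full-line value $2\Gamma(4/3)(ck)^{-1/3}$, where $c = a^{3/2}/(3\cdot 2^{3/4})$; the comparison with the Gamma function is a standard fact about $\int_{-\infty}^{\infty} e^{-\lambda|\phi|^3}\,d\phi$, not a statement from the text.

```python
import math

def cubic_rate(a: float) -> float:
    """Constant c = a^{3/2} / (3 * 2^{3/4}) from the exponent in (10.40)."""
    return a ** 1.5 / (3.0 * 2.0 ** 0.75)

def tail_integral(k: float, a: float, n: int = 20000) -> float:
    """Midpoint-rule value of the last theta-integral in (10.40)."""
    c = cubic_rate(a)
    lo, hi = 3.0 * math.pi / 4.0, 5.0 * math.pi / 4.0
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        theta = lo + (j + 0.5) * h
        total += math.exp(-c * k * abs(theta - math.pi) ** 3)
    return total * h

def scaled_integral(k: float, a: float) -> float:
    """Integral multiplied by (c k)^{1/3}; approaches 2 * Gamma(4/3) for large k."""
    return tail_integral(k, a) * (cubic_rate(a) * k) ** (1.0 / 3.0)

if __name__ == "__main__":
    for k in (1e3, 1e4, 1e5):
        print(k, scaled_integral(k, 0.5), 2.0 * math.gamma(4.0 / 3.0))
```

Since the scaled value is essentially independent of $k$, the integral itself decays like $k^{-1/3}$, with a constant that depends only on $a$.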


This is a refinement of [3, (5.23)]. Now from (10.15), we have
\[
m^{(3)}(z) = I + \frac{1}{2\pi i}\int_{\Sigma^{(3)}} \frac{w^{(3)}(s)}{s - z}\,ds + \frac{1}{2\pi i}\int_{\Sigma^{(3)}} \frac{[(I - C_{w^{(3)}})^{-1} C_{w^{(3)}} I](s)\, w^{(3)}(s)}{s - z}\,ds, \qquad z \notin \Sigma^{(3)}. \tag{10.43}
\]
In [3], only the term $m^{(3)}_{22}(z)$ (at $z = 0$) was of interest. But then $w^{(3)}_{22} = 0$, and the first integral in (10.43) was zero. As computed in [3, (5.20)], the second integral was bounded by the product of the $L^\infty$- and $L^1$-norms of $w^{(3)}$, and then, due to (10.38), the estimate $\|w^{(3)}\|_{L^1} \le Ck^{-1/3}$ in [3, (5.23)] was enough to control $m^{(3)}_{22}$. But in the present paper, we need estimates of $m^{(3)}_{12}$ for $\pi_k(z)$, and hence we need an estimate of the first integral, which amounts to a bound on the $L^1$-norm of $w^{(3)}$. The $L^1$ bound that [3, (5.23)] obtained is not good enough for this purpose, and we need an improved estimate on the $L^1$-norm of $w^{(3)}$. Now by (10.42), the first integral is less than or equal to
\[ \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})\,k^{1/3}}\, e^{-(2\sqrt2/3)k(1 - 2t/k)^{3/2}}, \tag{10.44} \]
while, as in [3, (5.20)], the second integral is less than or equal to, by using (10.38) and (10.42),
\begin{align}
&\frac{1}{2\pi \operatorname{dist}(z,\Sigma^{(3)})}\, \|(I - C_{w^{(3)}})^{-1} C_{w^{(3)}} I\|_{L^2}\, \|w^{(3)}\|_{L^2} \notag\\
&\qquad \le \frac{1}{2\pi \operatorname{dist}(z,\Sigma^{(3)})}\, \|(I - C_{w^{(3)}})^{-1}\|_{L^2 \to L^2}\, \|C_{w^{(3)}} I\|_{L^2}\, \|w^{(3)}\|_{L^2} \notag\\
&\qquad \le \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})}\, \|w^{(3)}\|^2_{L^2} \notag\\
&\qquad \le \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})}\, \|w^{(3)}\|_{L^\infty}\, \|w^{(3)}\|_{L^1} \notag\\
&\qquad \le \frac{C}{\operatorname{dist}(z,\Sigma^{(3)})\,k^{1/3}}\, e^{-(4\sqrt2/3)k(1 - 2t/k)^{3/2}}. \tag{10.45}
\end{align}

When $z = 0$, $\operatorname{dist}(z,\Sigma^{(3)}) \ge c_1 > 0$; hence, using (10.34), (10.7), (10.8) and (5.5), (5.6), we obtain Proposition 5.1(ii). As in (10.17)–(10.19), from (10.13), (10.7), (10.9), and (5.6), we have
\begin{align}
\pi_k(z;t) &= -(-1)^k e^{-tz}\, m^{(3)}_{12}(z;k;t), & z &\in \Omega^{(3)}_1, \tag{10.46}\\
\pi_k(z;t) &= z^k e^{-tz^{-1}}\, m^{(3)}_{11}(z;k;t), & z &\in \Omega^{(3)}_4, \tag{10.47}\\
\pi_k(z;t) &= z^k e^{-tz^{-1}}\, m^{(3)}_{11}(z;k;t) - (-1)^k e^{-tz}\, m^{(3)}_{12}(z;k;t), & z &\in \Omega^{(3)}_2 \cup \Omega^{(3)}_3. \tag{10.48}
\end{align}

Define $x$ by
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}, \tag{10.49} \]
as in Proposition 5.6. Let $0 < b < 1$ be a fixed number. Given $b$, from the beginning, we could have chosen $0 < a < 1$ such that
\[ \rho_{\theta_b} = \frac{1 - \sqrt{1 - a^2\cos^2\theta_b}}{-a\cos\theta_b} \tag{10.50} \]
is strictly greater than $b$ for some $\pi/2 \le \theta_b < \pi$. Note that in (10.33) the choice of $3\pi/4$ and $5\pi/4$ was arbitrary in defining $\Sigma^{(3)}$. Instead of $3\pi/4$ and $5\pi/4$, this time we use $\theta_b$ and $2\pi - \theta_b$, and we carry this forward through the later calculations. Thus we obtain the same estimates of (10.44) and (10.45) with different constants $C$. Now $|z| \le b$ lies in $\Omega^{(3)}_1$, and $|z| \ge b^{-1}$ lies in $\Omega^{(3)}_4$. Since the distance $\operatorname{dist}(z,\Sigma^{(3)}) \ge c_2 > 0$, using (10.43), (10.44), (10.45), (10.46), and (10.47), we obtain (5.29) and (5.30) in Proposition 5.6 for $M \le x \le (1-a)2^{1/3}k^{2/3}$. Since in (10.20) and (10.21) in Section 10.1.1 the choice of $0 < a < 1$ was arbitrary, we obtain (5.29) and (5.30) in Proposition 5.6 for all $x \ge M$.

On the other hand, let $0 < L < 2^{-3/2}\sqrt{M}$ be a fixed number. Set $\alpha = 1 - 2^{4/3}k^{-1/3}w$ with $-L \le w \le L$ as in Proposition 5.6. From the inequality $(1 - \sqrt{1 - \gamma^2})/\gamma \le 1 - \sqrt{1 - \gamma}$ for all $0 \le \gamma \le 1$, we have
\[ \rho_\pi = \frac{1 - \sqrt{1 - (2t/k)^2}}{2t/k} \le 1 - \sqrt{1 - \frac{2t}{k}} \le 1 - \frac{\sqrt{M}}{2^{1/6}k^{1/3}}. \tag{10.51} \]
But
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}} \ge 1 - \frac{2^{4/3}L}{k^{1/3}}. \tag{10.52} \]
Hence $\operatorname{dist}(-\alpha,\Sigma^{(3)}) \ge Ck^{-1/3}$. Thus from (10.43), (10.44), and (10.45), we have
\[ |m^{(3)}(-\alpha;k;t) - I| \le Ce^{-(2/3)x^{3/2}}, \tag{10.53} \]
which together with (10.48) (note that $-\alpha \in \Omega^{(3)}_2 \cup \Omega^{(3)}_3$) implies that
\begin{align}
\bigl|e^{-t\alpha}\pi_k(-\alpha;t)\bigr| &\le \bigl|(-\alpha)^k e^{-t(\alpha - \alpha^{-1})}\bigr|\bigl(1 + Ce^{-c|x|^{3/2}}\bigr) + Ce^{-c|x|^{3/2}}, \tag{10.54}\\
\bigl|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;t) - 1\bigr| &\le Ce^{-c|x|^{3/2}} + \bigl|(-\alpha)^{-k} e^{t(\alpha - \alpha^{-1})}\bigr|\, Ce^{-c|x|^{3/2}}. \tag{10.55}
\end{align}
For large $k$, $\alpha \ge 1/2$ for all $-L \le w \le L$, and hence, using (10.26), we obtain, as in (10.27) and (10.28),
\[ \bigl|(-\alpha)^k e^{-t(\alpha - \alpha^{-1})}\bigr| = e^{kF(-\alpha;2t/k)} \le e^{-2wx + (32/2)|w|^3} \le Ce^{c|x|} \tag{10.56} \]

Figure 5. $\Sigma^{PII,2}$ and $\Omega^{PII,2}_j$ (regions $\Omega^{PII,2}_1,\dots,\Omega^{PII,2}_4$; marked point $0$)

and
\[ \bigl|(-\alpha)^{-k} e^{t(\alpha - \alpha^{-1})}\bigr| = e^{kF(-\alpha^{-1};2t/k)} \le Ce^{c|x|}. \tag{10.57} \]
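The middle inequality in (10.56) follows from (10.26) with $s = \alpha$, since $k(1 - 2t/k)(\alpha - 1) = -2wx$ and $k\cdot\tfrac23|\alpha - 1|^3 = \tfrac{32}{3}|w|^3$ under (10.49) and the definition of $\alpha$. The sketch below checks the resulting bound with the constant $32/3$ on a few sample parameters; because $32/3 < 32/2$, it also covers the constant as printed in (10.56). The function names and the sample ranges are illustrative assumptions.

```python
import math

def F_neg(s: float, gamma: float) -> float:
    """F(-s; gamma) as written in (10.26)."""
    return -0.5 * gamma * (s - 1.0 / s) + math.log(s)

def slack_10_56(k: float, x: float, w: float) -> float:
    """(-2wx + (32/3)|w|^3) - k F(-alpha; 2t/k); expected >= 0 when alpha >= 1/2."""
    gamma = 1.0 - x / (2.0 ** (1.0 / 3.0) * k ** (2.0 / 3.0))   # (10.49)
    alpha = 1.0 - 2.0 ** (4.0 / 3.0) * w / k ** (1.0 / 3.0)
    lhs = k * F_neg(alpha, gamma)
    rhs = -2.0 * w * x + (32.0 / 3.0) * abs(w) ** 3
    return rhs - lhs

def min_slack_10_56() -> float:
    """Smallest slack over a small grid with 0 <= gamma <= 1 and alpha >= 1/2."""
    worst = math.inf
    for k in (1e3, 1e6):
        for x in (0.0, 1.0, 5.0):
            for j in range(31):
                w = -1.5 + 0.1 * j
                worst = min(worst, slack_10_56(k, x, w))
    return worst

if __name__ == "__main__":
    print(min_slack_10_56())   # nonnegative up to rounding
```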

Thus from (10.54) and (10.55), we obtain (5.31) and (5.32) in Proposition 5.6.

10.1.3. The case $k - M2^{-1/3}k^{1/3} \le 2t \le k$ for some $M > 0$

In this case, as $k \to \infty$, the point $z = -\rho_\pi$ on the deformed contour $\Sigma^{(3)}$ defined in (10.33) approaches $z = -1$ rapidly, and so we need to pay special attention to the neighborhood of $z = -1$. More precisely, we need to introduce the so-called parametrix for the RHP around $z = -1$, which is an approximate local solution. Recall RHP (2.15) for the Painlevé II equation. Let $\Sigma^{PII,2} = \Sigma^{PII,2}_1 \cup \Sigma^{PII,2}_2$ be a contour of the general shape indicated in Figure 5. Asymptotically for large $z$, the curves are straight lines of angle less than $\pi/3$ (see [3, paragraph after (2.18)] for more precise discussions on the curve). We define the exact shape of $\Sigma^{PII,2}$ below. Define $m^{PII,2}(z;x)$ by
\[
\begin{cases}
m^{PII,2}(z,x) = m(z;x)\begin{pmatrix} 1 & e^{-2i((4/3)z^3 + xz)} \\ 0 & 1 \end{pmatrix} & \text{in } \Omega^{PII,2}_2,\\[2ex]
m^{PII,2}(z,x) = m(z;x)\begin{pmatrix} 1 & 0 \\ e^{2i((4/3)z^3 + xz)} & 1 \end{pmatrix} & \text{in } \Omega^{PII,2}_3,\\[2ex]
m^{PII,2}(z,x) = m(z;x) & \text{in } \Omega^{PII,2}_1,\ \Omega^{PII,2}_4,
\end{cases}
\tag{10.58}
\]
where $m(z;x)$ is the solution of the RHP for the PII equation given in (2.15). Then $m^{PII,2}$ solves a new RHP (see [3, (2.19)])
\[
\begin{cases}
m^{PII,2} \text{ is analytic in } \mathbb{C} \setminus \Sigma^{PII,2},\\[1ex]
m^{PII,2}_+ = m^{PII,2}_-\begin{pmatrix} 1 & -e^{-2i((4/3)z^3 + xz)} \\ 0 & 1 \end{pmatrix} & \text{in } \Sigma^{PII,2}_1,\\[2ex]
m^{PII,2}_+ = m^{PII,2}_-\begin{pmatrix} 1 & 0 \\ e^{2i((4/3)z^3 + xz)} & 1 \end{pmatrix} & \text{in } \Sigma^{PII,2}_2,\\[2ex]
m^{PII,2} = I + O(1/z) & \text{as } z \to \infty.
\end{cases}
\tag{10.59}
\]

Also, $m^{PII,2}_1(x)$ defined by $m^{PII,2}(z;x) = I + m^{PII,2}_1(x)/z + O(z^{-2})$ satisfies
\[ m^{PII,2}_1(x) = m_1(x), \tag{10.60} \]
where $m_1(x)$ is defined in a manner similar to that in (2.16). Set $x$ by
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}. \tag{10.61} \]
We define $\Sigma^{(3)}$ and $m^{(3)}$ as in (10.33) and (10.34). Now we proceed as in [3, (5.25)–(5.35)]. Let $\mathcal{O}$ be the ball of radius $\epsilon$ around $z = -1$, where $\epsilon > 0$ is a small fixed number. Define the map (see [3, equation displayed between (5.26) and (5.27)])
\[ \lambda(z) := -i2^{-4/3}k^{1/3}\,\tfrac12 (z - z^{-1}) \tag{10.62} \]
in $\mathcal{O}$. Define $\Sigma^{PII,2}$ by $\Sigma^{PII,2} \cap \lambda(\mathcal{O}) := \lambda(\Sigma^{(3)} \cap \mathcal{O})$, and extend it smoothly outside $\lambda(\mathcal{O})$ as indicated in [3, (5.29)]. Define $m^{PII,2}$ as above using this contour. Now we define the parametrix by (see [3, equation displayed between (5.30) and (5.31)])
\[
\begin{cases}
m_p(z;k;t) = m^{PII,2}(\lambda(z), x) & \text{in } \mathcal{O} \setminus \Sigma^{(3)},\\
m_p(z;k;t) = I & \text{in } \bar{\mathcal{O}}^c \setminus \Sigma^{(3)}.
\end{cases}
\tag{10.63}
\]
It is proved in [3, (5.25)–(5.34)] that if we take $\epsilon$ small enough but fix it, then the ratio
\[ R(z;k;t) := m^{(3)} m_p^{-1} \tag{10.64} \]
solves a new RHP
\[
\begin{cases}
R(z;k;t) \text{ is analytic in } \mathbb{C} \setminus \Sigma_R,\\
R_+(z;k;t) = R_-(z;k;t)\,v_R(z;k;t) & \text{on } \Sigma_R,\\
R(z;k;t) = I + O(1/z) & \text{as } z \to \infty,
\end{cases}
\tag{10.65}
\]

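where $\Sigma_R := \partial\mathcal{O} \cup \Sigma^{(3)}$, and the jump matrix satisfies (see [3, (5.34)])
\[
\begin{cases}
\|v_R - I\|_{L^\infty} \le C/k^{2/3} & \text{on } \mathcal{O} \cap \Sigma^{(3)},\\
\|v_R - I\|_{L^\infty} \le Ce^{-ck} & \text{on } \mathcal{O}^c \cap \Sigma^{(3)},\\
\|v_R - I + m^{PII,2}_1(x)/\lambda(z)\|_{L^\infty} \le C/k^{2/3} & \text{on } \partial\mathcal{O}, \text{ as } k \to \infty,
\end{cases}
\tag{10.66}
\]
with some positive constants $C$ and $c$ which may depend on $M$. Set $w_R := v_R - I$. Using (10.15), which holds generally, we have (see [3, (5.35) and the preceding calculations])
\begin{align}
R(z;k;t) &= I + \frac{1}{2\pi i}\int_{\Sigma_R} \frac{((I - C_{w_R})^{-1} I)(s)\, w_R(s)}{s - z}\,ds \notag\\
&= I + \frac{1}{2\pi i}\int_{\Sigma_R} \frac{v_R(s) - I}{s - z}\,ds + \frac{1}{2\pi i}\int_{\Sigma_R} \frac{[(I - C_{w_R})^{-1} C_{w_R} I](s)\, w_R(s)}{s - z}\,ds. \tag{10.67}
\end{align}
Now the absolute value of the second integral is less than or equal to (recall that $|\lambda(z)|^{-1} = O(k^{-1/3})$ for $z \in \partial\mathcal{O}$)
\begin{align}
&\frac{C}{\operatorname{dist}(z,\Sigma_R)}\, \|(I - C_{w_R})^{-1}\|_{L^2(\Sigma_R) \to L^2(\Sigma_R)}\, \|C_{w_R} I\|_{L^2(\Sigma_R)}\, \|w_R\|_{L^2(\Sigma_R)} \notag\\
&\qquad \le \frac{C}{\operatorname{dist}(z,\Sigma_R)}\, \|w_R\|^2_{L^2(\Sigma_R)} \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}, \tag{10.68}
\end{align}
and similarly, the first integral satisfies
\[
\left| \frac{1}{2\pi i}\int_{\Sigma_R} \frac{v_R(s) - I}{s - z}\,ds + \frac{1}{2\pi i}\int_{\partial\mathcal{O}} \frac{m^{PII,2}_1(x)}{\lambda(s)(s - z)}\,ds \right| \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}. \tag{10.69}
\]
Hence
\[
\left| m^{(3)}(z;k;t)\bigl(m_p(z;k;t)\bigr)^{-1} - I + \frac{1}{2\pi i}\int_{\partial\mathcal{O}} \frac{m^{PII,2}_1(x)}{\lambda(s)(s - z)}\,ds \right| \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}. \tag{10.70}
\]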

This is an extension of [3, (5.35)] to the case when $z \ne 0$. For $z = 0$, from (10.63) and (10.64), $R(0) = m^{(3)}(0)$ and $\operatorname{dist}(0,\Sigma_R) \ge c_1 > 0$. Note that $1/\lambda(s)$ is analytic in $\mathcal{O}$ except at $s = -1$, and
\[ \lambda(s) = -i2^{-4/3}k^{1/3}\bigl[(s+1) + \tfrac12 (s+1)^2 + \cdots\bigr], \qquad s \sim -1. \tag{10.71} \]
By a residue calculation for (10.70), we have, as in [3, (5.35)],
\[ m^{(3)}(0;k;t) = I + \frac{i2^{4/3} m^{PII,2}_1(x)}{k^{1/3}} + O\Bigl(\frac{1}{k^{2/3}}\Bigr). \tag{10.72} \]

Thus, using (2.17) and (2.18), from (5.5), (5.6), (10.7), (10.8), and (10.34), we obtain Proposition 5.1(iii) in the case when $0 \le x \le M$.

We now prove Proposition 5.2 when $x \ge 0$. Since the choice of $M$ was arbitrary in our calculations, for fixed $x$ we choose $M > 0$ large enough so that $x < M$. Let $z \in \mathbb{C} \setminus \Sigma$ be fixed. We first assume $|z| < 1$. By modifying the contour $\Sigma^{(3)}$ if necessary, as in (10.50) and the following paragraph, we have $z \in \Omega^{(3)}_1$ and $\operatorname{dist}(z,\Sigma^{(3)}) \ge c_1 > 0$. Thus from (10.70), $|R(z) - I| \le Ck^{-1/3}$ with some constant $C$ that depends on $x$. Thus from (5.6), (10.7), (10.34), and (10.64), we obtain the first limit of (5.13). A similar calculation applies to the case $|z| > 1$, and we obtain the first limit of (5.14). The second limits of (5.13) and (5.14) follow from the first limits of (5.14) and (5.13), respectively, by replacing $z \to 1/z$. Hence this extends the calculation [3, (5.35)] to the case when $z$ is bounded away from the contour.

Finally, we prove Proposition 5.4 when $x > 0$. Set
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}}, \qquad w \text{ fixed}, \tag{10.73} \]
and
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}, \qquad x > 0 \text{ fixed}. \tag{10.74} \]
In this case, $-\alpha \in \mathcal{O}$. By a residue calculation again, for $w$ not equal to zero,
\begin{align}
\frac{1}{2\pi i}\int_{\partial\mathcal{O}} \frac{ds}{\lambda(s)(s + \alpha)} &= \frac{1}{\lambda(-\alpha)} + \frac{i2^{4/3}}{(-1 + \alpha)k^{1/3}} \notag\\
&= -\frac{i}{w} + \frac{i}{w + (2^{1/3}/k^{1/3})w^2 + \cdots} = O\Bigl(\frac{1}{k^{1/3}}\Bigr). \tag{10.75}
\end{align}
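The cancellation in (10.75) can be illustrated numerically with the map $\lambda$ from (10.62). The sketch below only evaluates the two residue contributions on the right of (10.75) (it does not perform the contour integral); the function names are illustrative. Expanding (10.71) one order further suggests that $k^{1/3}$ times the sum tends to $-i2^{1/3}$, which is the value used for comparison.

```python
def lam(z: complex, k: float) -> complex:
    """The map lambda(z) of (10.62)."""
    return -1j * 2.0 ** (-4.0 / 3.0) * k ** (1.0 / 3.0) * 0.5 * (z - 1.0 / z)

def residue_sum(w: float, k: float) -> complex:
    """1/lambda(-alpha) + i 2^{4/3} / ((alpha - 1) k^{1/3}), as in (10.75)."""
    alpha = 1.0 - 2.0 ** (4.0 / 3.0) * w / k ** (1.0 / 3.0)
    return 1.0 / lam(-alpha, k) + 1j * 2.0 ** (4.0 / 3.0) / ((alpha - 1.0) * k ** (1.0 / 3.0))

def scaled_residue_sum(w: float, k: float) -> complex:
    """k^{1/3} times the residue sum; stays bounded as k grows."""
    return residue_sum(w, k) * k ** (1.0 / 3.0)

if __name__ == "__main__":
    for k in (1e6, 1e9, 1e12):
        print(k, scaled_residue_sum(0.7, k))   # approaches -i * 2^{1/3}
```

Each of the two terms is of order one (about $\pm i/w$), so the $O(k^{-1/3})$ size of the sum is a genuine cancellation between the pole at $s = -1$ and the pole at $s = -\alpha$.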

When $w = 0$, we have the same order $O(k^{-1/3})$ by a similar calculation. On the other hand, since
\[ \rho_\pi = \frac{1 - \sqrt{1 - (2t/k)^2}}{2t/k} = 1 - \frac{2^{1/3}\sqrt{x}}{k^{1/3}} + \frac{x}{2^{1/3}k^{2/3}} + O\Bigl(\frac1k\Bigr), \tag{10.76} \]
using (10.73) we have $\operatorname{dist}(-\alpha,\Sigma_R) \ge Ck^{-2/3}$. Thus we obtain from (10.70),
\[ |R(-\alpha;k;t) - I| \le C. \tag{10.77} \]

Using $\lambda(-\alpha) \sim -iw$, from (10.63) and (10.64), we have
\[ \lim_{k\to\infty} m^{(3)}(-\alpha;k;t) = m^{PII,2}(-iw, x) \tag{10.78} \]
since $\epsilon$ is arbitrarily small. On the other hand, from the conditions on $t$ and $\alpha$, we have
\[ \lim_{k\to\infty} \alpha^k e^{-t(\alpha - \alpha^{-1})} = e^{(8/3)w^3 - 2xw}. \tag{10.79} \]
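The limit (10.79) can be observed numerically by evaluating the logarithm $k\log\alpha - t(\alpha - \alpha^{-1})$ under the scalings (10.73) and (10.74). This is a small illustrative sketch; the function names and sample parameters are assumptions, and the convergence is only at rate $O(k^{-1/3})$, so large $k$ is needed.

```python
import math

def log_factor(w: float, x: float, k: float) -> float:
    """k log(alpha) - t (alpha - 1/alpha) with alpha, t as in (10.73), (10.74)."""
    alpha = 1.0 - 2.0 ** (4.0 / 3.0) * w / k ** (1.0 / 3.0)
    t = 0.5 * k * (1.0 - x / (2.0 ** (1.0 / 3.0) * k ** (2.0 / 3.0)))
    return k * math.log(alpha) - t * (alpha - 1.0 / alpha)

def limit_exponent(w: float, x: float) -> float:
    """Exponent on the right-hand side of (10.79)."""
    return (8.0 / 3.0) * w ** 3 - 2.0 * x * w

if __name__ == "__main__":
    for k in (1e6, 1e9):
        print(k, log_factor(1.0, 1.0, k), limit_exponent(1.0, 1.0))
```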

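Thus, using (10.34), we obtain
\begin{align}
\lim_{k\to\infty} m^{(2)}(-\alpha;k;t) &= m^{PII,2}(-iw, x), & -\alpha &\in \Omega^{(3)}_1,\ \Omega^{(3)}_4, \tag{10.80}\\
\lim_{k\to\infty} m^{(2)}(-\alpha;k;t) &= m^{PII,2}(-iw, x)\begin{pmatrix} 1 & -e^{(8/3)w^3 - 2xw} \\ 0 & 1 \end{pmatrix}, & -\alpha &\in \Omega^{(3)}_2, \tag{10.81}\\
\lim_{k\to\infty} m^{(2)}(-\alpha;k;t) &= m^{PII,2}(-iw, x)\begin{pmatrix} 1 & 0 \\ -e^{-(8/3)w^3 + 2xw} & 1 \end{pmatrix}, & -\alpha &\in \Omega^{(3)}_3. \tag{10.82}
\end{align}
Now finally using (10.58), for each fixed $w$ and $x$, we have
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m(-iw; x). \tag{10.83} \]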

From (10.7)–(10.10) and (10.79), this implies Proposition 5.4 in the case when $x > 0$. This is a new computation we had to do in the present paper in order to include the case when $\alpha \to 1$.

10.2. When $(2t)/k > 1$

Throughout this section we set
\[ \gamma := \frac{2t}{k} > 1. \tag{10.84} \]
We need some definitions from [3]. Set $0 < \theta_c < \pi$ by $\sin^2(\theta_c/2) = 1/\gamma$. Define a probability measure on an arc (see [3, (4.13)]),
\[ d\mu(\theta) := \frac{\gamma}{\pi}\cos\frac{\theta}{2}\sqrt{\frac{1}{\gamma} - \sin^2\frac{\theta}{2}}\;d\theta, \qquad -\theta_c \le \theta \le \theta_c, \tag{10.85} \]
and define a constant (see [3, (4.14)])
\[ l := -\gamma + \log\gamma + 1. \tag{10.86} \]
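That (10.85) defines a probability measure can be confirmed by quadrature. The sketch below is illustrative only: it uses a plain midpoint rule (adequate here because the density vanishes like a square root at $\pm\theta_c$), and the helper names are not from the text. It also evaluates the constant $l$ of (10.86), which is negative for $\gamma > 1$.

```python
import math

def theta_c(gamma: float) -> float:
    """Endpoint of the arc: sin^2(theta_c / 2) = 1 / gamma, for gamma > 1."""
    return 2.0 * math.asin(1.0 / math.sqrt(gamma))

def density(theta: float, gamma: float) -> float:
    """Density of d mu with respect to d theta, as in (10.85)."""
    inner = 1.0 / gamma - math.sin(theta / 2.0) ** 2
    return (gamma / math.pi) * math.cos(theta / 2.0) * math.sqrt(max(inner, 0.0))

def total_mass(gamma: float, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of d mu over [-theta_c, theta_c]."""
    tc = theta_c(gamma)
    h = 2.0 * tc / n
    return h * sum(density(-tc + (j + 0.5) * h, gamma) for j in range(n))

def l_const(gamma: float) -> float:
    """The constant l of (10.86)."""
    return -gamma + math.log(gamma) + 1.0

if __name__ == "__main__":
    for gamma in (1.1, 1.5, 4.0):
        print(gamma, total_mass(gamma), l_const(gamma))
```

The substitution $u = \sin(\theta/2)$ turns the density into a semicircle law on $[-\gamma^{-1/2}, \gamma^{-1/2}]$, which explains why the total mass is exactly one.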


Now we introduce the so-called $g$-function (see [3, (4.8)])
\[ g(z;k;t) := \int_{-\theta_c}^{\theta_c} \log(z - e^{i\theta})\,d\mu(\theta), \qquad z \in \mathbb{C} \setminus \bigl(\Sigma \cup (-\infty,-1]\bigr). \tag{10.87} \]
The measure $d\mu(\theta)$ is the equilibrium measure of a certain variational problem, and the constant $l$ is a related constant (see [3, Sec. 4]). For each $|\theta| \le \theta_c$, the branch is chosen such that $\log(z - e^{i\theta})$ is analytic in $\mathbb{C} \setminus \bigl((-\infty,-1] \cup \{e^{i\phi} : -\pi \le \phi \le \theta\}\bigr)$ and behaves like $\log z$ as $z \in \mathbb{R} \to +\infty$. The basic properties of $g(z)$ are summarized in [3, Lem. 4.2]. In general, the role of the $g$-function in RHP analysis, first introduced in [19] and then generalized in [17], is to replace exponentially growing terms in the jump matrix by oscillating or exponentially decaying terms. The authors in [13] introduced a $g$-function of a form similar to (10.87) to analyze an RHP associated to orthogonal polynomials on the real line. The above $g$-function (10.87) introduced in [3] is an adaptation of their work to the circle case. When $0 \le \gamma \le 1$, the related equilibrium measure is (see [3, (4.12)])
\[ d\mu(\theta) = \frac{1}{2\pi}(1 + \gamma\cos\theta)\,d\theta, \qquad -\pi \le \theta < \pi, \tag{10.88} \]
with the related constant $l = 0$, and hence (see [3, (4.15)])
\[
g(z) = \begin{cases} \log z - \gamma/(2z), & |z| > 1,\ z \notin (-\infty,-1),\\ -(\gamma/2)z + \pi i, & |z| < 1. \end{cases}
\tag{10.89}
\]
Since $g(z)$ is explicit in this case, we did not introduce it in the form (10.87) in Section 10.1. Then up to (10.104), we follow the procedure in [3, Sec. 6]. Write $\Sigma = \bar{C}_1 \cup C_2$, where $C_2 := \{e^{i\theta} : -\theta_c < \theta < \theta_c\}$ and $C_1 := \Sigma \setminus \bar{C}_2$. Define $m^{(1)}(z;k;t)$ by
\[ m^{(1)}(z;k;t) := e^{(kl/2)\sigma_3}\, Y(z;k;t)\, e^{-kg(z;k;t)\sigma_3}\, e^{-(kl/2)\sigma_3}, \tag{10.90} \]
where $\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. Then $m^{(1)}$ solves (see [3, (6.1)]) a new RHP
\[
\begin{cases}
m^{(1)}(z;k;t) \text{ is analytic in } \mathbb{C} \setminus \Sigma,\\[1ex]
m^{(1)}_+(z;k;t) = m^{(1)}_-(z;k;t)\begin{pmatrix} e^{-2k\tilde\alpha(z;k;t)} & (-1)^k \\ 0 & e^{2k\tilde\alpha(z;k;t)} \end{pmatrix} & \text{on } C_2,\\[2ex]
m^{(1)}_+(z;k;t) = m^{(1)}_-(z;k;t)\begin{pmatrix} 1 & (-1)^k e^{-2k\tilde\alpha_-(z;k;t)} \\ 0 & 1 \end{pmatrix} & \text{on } C_1,\\[2ex]
m^{(1)}(z;k;t) = I + O(1/z) & \text{as } z \to \infty,
\end{cases}
\tag{10.91}
\]

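Figure 6. $\Sigma^{(3)}$ and $\Omega^{(3)}$ when $\gamma > 1$ (contours $C_1$, $\tilde{C}_{inside}$, $\tilde{C}_{outside}$; regions $\Omega^{(3)}_1,\dots,\Omega^{(3)}_4$; marked points $e^{i\theta_c}$ and $e^{-i\theta_c}$)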

where $\tilde\alpha(z;k;t)$ is defined by (see [3, Lem. 6.1])
\[ \tilde\alpha(z;k;t) := -\frac{\gamma}{4}\int_{e^{i\theta_c}}^{z} \frac{s+1}{s^2}\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}\;ds, \qquad \xi := e^{i\theta_c}. \tag{10.92} \]
(Notation: We use $\tilde\alpha$ here instead of $\alpha$ in [3] to avoid confusion with $\alpha$ in (10.140).) The branch is chosen such that $\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}$ is analytic in $\mathbb{C} \setminus \bar{C}_1$ and behaves like $s$ as $s \in \mathbb{R} \to +\infty$. Define $m^{(2)}(z;k;t)$ as in (10.3). Then $m^{(2)}$ solves a new RHP, normalized as $z \to \infty$, with the jump matrices (see [3, (6.2)])
\[
\begin{cases}
v^{(2)}(z;k;t) = \begin{pmatrix} 1 & -e^{-2k\tilde\alpha(z;k;t)} \\ e^{2k\tilde\alpha(z;k;t)} & 0 \end{pmatrix} & \text{on } C_2,\\[2ex]
v^{(2)}(z;k;t) = \begin{pmatrix} e^{-2k\tilde\alpha_-(z;k;t)} & -1 \\ 1 & 0 \end{pmatrix} & \text{on } C_1.
\end{cases}
\tag{10.93}
\]
Through the changes $Y \to m^{(1)} \to m^{(2)}$, we have
\begin{align}
Y_{11}(z;k;t) &= -e^{kg(z;k;t)}\, m^{(2)}_{12}(z;k;t), & |z| &< 1, \tag{10.94}\\
Y_{21}(z;k;t) &= -(-1)^k e^{kg(z;k;t) + kl}\, m^{(2)}_{22}(z;k;t), & |z| &< 1, \tag{10.95}\\
Y_{11}(z;k;t) &= e^{kg(z;k;t)}\, m^{(2)}_{11}(z;k;t), & |z| &> 1, \tag{10.96}\\
Y_{21}(z;k;t) &= (-1)^k e^{kg(z;k;t) + kl}\, m^{(2)}_{21}(z;k;t), & |z| &> 1. \tag{10.97}
\end{align}

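Set $\Sigma^{(3)} := \bar{C}_1 \cup \tilde{C}_{inside} \cup \tilde{C}_{outside}$ as in Figure 6, which divides $\mathbb{C}$ into four regions, $\Omega^{(3)}_j$, $j = 1, \dots, 4$. Again, there is a certain freedom in choosing the shape of $\tilde{C}_{inside}$ and $\tilde{C}_{outside}$. For example, $\tilde{C}_{inside}$ (resp., $\tilde{C}_{outside}$) can be any smooth curve lying in $\Omega^{(3)}_2$ (resp., $\Omega^{(3)}_3$) connecting $e^{i\theta_c}$ and $e^{-i\theta_c}$; the precise requirement is given in [3] (see also (10.100)–(10.102)). Define $m^{(3)}(z;k;t)$ by (see [3, p. 1151])
\[
\begin{cases}
m^{(3)} = m^{(2)}\begin{pmatrix} 1 & -e^{-2k\tilde\alpha(z;k;t)} \\ 0 & 1 \end{pmatrix}^{-1} & \text{in } \Omega^{(3)}_2,\\[2ex]
m^{(3)} = m^{(2)}\begin{pmatrix} 1 & 0 \\ e^{2k\tilde\alpha(z;k;t)} & 1 \end{pmatrix} & \text{in } \Omega^{(3)}_3,\\[2ex]
m^{(3)} = m^{(2)} & \text{in } \Omega^{(3)}_1,\ \Omega^{(3)}_4.
\end{cases}
\tag{10.98}
\]
Then $m^{(3)}$ solves an RHP, normalized as $z \to \infty$, with the jump matrix given by
\[
v^{(3)}(z;k;t) = \begin{cases}
\begin{pmatrix} 1 & -e^{-2k\tilde\alpha(z;k;t)} \\ 0 & 1 \end{pmatrix} & \text{on } \tilde{C}_{inside},\\[2ex]
\begin{pmatrix} 1 & 0 \\ e^{2k\tilde\alpha(z;k;t)} & 1 \end{pmatrix} & \text{on } \tilde{C}_{outside},\\[2ex]
\begin{pmatrix} e^{-2k\tilde\alpha_-(z;k;t)} & -1 \\ 1 & 0 \end{pmatrix} & \text{on } C_1.
\end{cases}
\tag{10.99}
\]
From the properties of $g(z)$, it is proved in [3, between (6.3) and (6.4)] that
\begin{align}
e^{-k\tilde\alpha_-(z;k;t)} &\to 0 & &\text{as } k \to \infty,\ z \in C_1, \tag{10.100}\\
e^{-k\tilde\alpha(z;k;t)} &\to 0 & &\text{as } k \to \infty,\ z \in \tilde{C}_{inside}, \tag{10.101}\\
e^{k\tilde\alpha(z;k;t)} &\to 0 & &\text{as } k \to \infty,\ z \in \tilde{C}_{outside}. \tag{10.102}
\end{align}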

The choice of $\tilde{C}_{inside}$ and $\tilde{C}_{outside}$ is precisely for these properties. Here the convergence is uniform for any compact part of each contour away from the end points $e^{i\theta_c}$ and $e^{-i\theta_c}$, but it is not uniform on the whole contour. This gives rise to a technical difficulty that is overcome below using the idea of the parametrix. Formally, $v^{(3)} \to v^\infty$ as $k \to \infty$ where
\[
\begin{cases}
v^\infty(z) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \text{on } \tilde{C}_{inside} \cup \tilde{C}_{outside},\\[2ex]
v^\infty(z) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} & \text{on } C_1.
\end{cases}
\tag{10.103}
\]
Thus we expect that $m^{(3)}$ converges to $m^\infty$, the solution of the RHP $m^\infty_+ = m^\infty_- v^\infty$ with $m^\infty \to I$ as $z \to \infty$. The solution $m^\infty$ is easily given by (see [3, Lem. 6.2])
\[
m^\infty(z) = \begin{pmatrix} (1/2)(\beta + \beta^{-1}) & (1/2i)(\beta - \beta^{-1}) \\ -(1/2i)(\beta - \beta^{-1}) & (1/2)(\beta + \beta^{-1}) \end{pmatrix},
\tag{10.104}
\]
where $\beta(z) := \bigl((z - e^{i\theta_c})/(z - e^{-i\theta_c})\bigr)^{1/4}$, which is analytic in $\mathbb{C} \setminus \bar{C}_1$ and $\beta \sim +1$ as $z \in \mathbb{R} \to +\infty$.

10.2.1. The case $2t \ge ak$ for some $a > 1$

The parametrix $m_p(z)$ is introduced in [3, (6.25)–(6.31)], and it has the following properties. In the neighborhood $\mathcal{O}$ of size $\epsilon$ around the points $e^{i\theta_c}$ and $e^{-i\theta_c}$, $m_p(z)$ is constructed using the Airy function in such a way that $(m_p(z))_+ = (m_p(z))_- v^{(3)}(z)$ for $z \in \Sigma^{(3)} \cap \mathcal{O}$, and $\|m_p (m^\infty)^{-1} - I\|_{L^\infty(\partial\mathcal{O})} = O(k^{-1})$. In $\mathbb{C} \setminus \bar{\mathcal{O}}$, we set $m_p(z) := m^\infty(z)$. Then the ratio $R(z) := m^{(3)}(z) m_p^{-1}(z)$ has no jump on $\Sigma^{(3)} \cap \mathcal{O}$, and it has a jump $v_R := w_R + I$ converging to $I$ uniformly of order $O(k^{-1})$ on $\partial\mathcal{O}$, and of order $O(e^{-ck})$ on $\Sigma^{(3)} \cap \mathcal{O}^c$ as $k \to \infty$. This implies that $R(z) = I + O(k^{-1})$ for any $z \in \mathbb{C} \setminus \Sigma_p$, $\Sigma_p := (\Sigma^{(3)} \cap \mathcal{O}^c) \cup \partial\mathcal{O}$. Moreover, following the arguments in [14, Sec. 8], the error is uniform up to the boundary in each open region in $\mathbb{C} \setminus \Sigma_p$. In particular, for $z \in \Omega^{(3)}_1 \cup \Omega^{(3)}_4$ (see [3, (6.34)–(6.40)]),
\[ m^{(3)}(z) = \bigl(I + O(k^{-1})\bigr)\, m^\infty(z). \tag{10.105} \]
Here the error is uniform for $ak \le 2t \le bk$ for some $0 < a < b$. For the case $(2t)/k \to \infty$, by shrinking the size of $\mathcal{O}$ properly, we again obtain uniform error (see [2]). Therefore, for any $a > 0$, we obtain uniformity in (10.105) for $ak \le 2t$. When $z = 0$, $\beta(0) = -ie^{i\theta_c/2}$ and $g(0) = \pi i$ (see [3, Lem. 4.2(vi)]). Also, (10.98) says that $m^{(3)}(0) = m^{(2)}(0)$. Thus Proposition 5.1(v) follows from (10.94) and (10.95), as in [3, (6.4)].

Now we consider Proposition 5.6 in the case where $x \le -2^{1/3}(a-1)k^{2/3}$. For $z = -\alpha$ real, $|\beta(-\alpha)| = 1$, so $m^\infty(-\alpha)$ is bounded. Hence from (10.98) and (10.94), (10.96), we have for $\alpha \ge 1$,
\[ |\pi_k(-\alpha;k;t)| \le C\bigl|e^{kg(-\alpha;k;t)}\bigr|. \tag{10.106} \]
Then we proceed as in (10.109)–(10.121) of the following section to obtain the proper estimate.

10.2.2. The case $k + M2^{-1/3}k^{1/3} \le 2t \le ak$ for some $a > 1$ and $M > M_0$

In this case, the points $e^{i\theta_c}$ and $e^{-i\theta_c}$ are allowed to approach $-1$, but the rate is restricted:
\[ |e^{i\theta_c} + 1| = 2\Bigl(1 - \frac{k}{2t}\Bigr)^{1/2} \ge \frac{\sqrt{M}}{k^{1/3}} \tag{10.107} \]
for $k$ large. We now take the neighborhood $\mathcal{O}$ to be of size $\epsilon\sqrt{2t/k - 1}$ around $e^{i\theta_c}$ and $e^{-i\theta_c}$. From (10.107), $\mathcal{O}$ consists of two disjoint disks and their boundaries do not touch the real axis. We introduce the same parametrix $m_p$ as in Section 10.2.1. Then

we have a similar result: there is $M_0 > 0$ such that for $M > M_0$,
\[ m^{(3)}(z) = \Bigl(I + O\Bigl(\frac{1}{k(2t/k - 1)}\Bigr)\Bigr)\, m^\infty(z) \tag{10.108} \]
for $z \in \Omega^{(3)}_1 \cup \Omega^{(3)}_4$. This is proved in [3, (6.34)–(6.40)]. When $z = 0$, as in Section 10.2.1, we obtain Proposition 5.1(iv), as in [3].

Now we prove Proposition 5.6 when $x \le -M$. As in Section 10.2.1, we have
\[ |\pi_k(-\alpha;k;t)| \le C\bigl|e^{kg(-\alpha;k;t)}\bigr|. \tag{10.109} \]

Now we need an estimate of $g(-\alpha;k;t)$; as we mentioned at the beginning of Section 10, this is the second way in which we must extend [3]. Note that
\[ \operatorname{Re} g(-\alpha) = \frac12\int_{-\theta_c}^{\theta_c} \log(1 + \alpha^2 + 2\alpha\cos\theta)\,d\mu(\theta) = \log 2 + \frac12\log\frac{\alpha}{\gamma} + I(s), \tag{10.110} \]
where
\[ I(s) = \frac{1}{\pi}\int_{-1}^{1} \log(s^2 - x^2)\sqrt{1 - x^2}\;dx, \qquad s := \frac{\sqrt{\gamma}\,(1 + \alpha)}{2\sqrt{\alpha}} > 1. \tag{10.111} \]
The inequality $s > 1$ follows from the arithmetic-geometric mean inequality and the assumption $\gamma > 1$. A residue calculation gives us
\[ I'(y) = \frac{1}{\pi}\int_{-1}^{1} \frac{2y}{y^2 - x^2}\sqrt{1 - x^2}\;dx = 2y - 2\sqrt{y^2 - 1}, \qquad y > 1. \tag{10.112} \]
Integrating from $1$ to $s > 1$, we have
\[ I(s) = s^2 - 1 - 2\int_1^s \sqrt{y^2 - 1}\;dy + I(1). \tag{10.113} \]
The constant $I(1)$ can be evaluated (cf. [3, Lem. 4.3(ii)–(a)]):
\[ I(1) = \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} \log(\sin^2\theta)\sin^2\theta\;d\theta = \frac12 - \log 2. \tag{10.114} \]
Thus we have
\[ \operatorname{Re} g(-\alpha) = -\frac12 + \frac12\log\frac{\alpha}{\gamma} + s^2 - s\sqrt{s^2 - 1} + \log\bigl(s + \sqrt{s^2 - 1}\bigr). \tag{10.115} \]
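The evaluations (10.112)–(10.115) can be cross-checked by quadrature. The sketch below uses Gauss–Chebyshev nodes of the second kind, which integrate against the weight $\sqrt{1 - x^2}$ on $[-1, 1]$; the closed form `I_closed` is obtained by carrying out the integral in (10.113) with the value (10.114). The function names and node counts are illustrative assumptions.

```python
import math

def weighted_mean(f, n: int) -> float:
    """(1/pi) * integral over [-1, 1] of f(x) sqrt(1 - x^2) dx, by Gauss-Chebyshev (2nd kind)."""
    h = math.pi / (n + 1)
    total = 0.0
    for j in range(1, n + 1):
        th = j * h
        total += math.sin(th) ** 2 * f(math.cos(th))
    return total / (n + 1)

def I_quad(s: float, n: int = 400) -> float:
    """I(s) of (10.111) by quadrature (s >= 1)."""
    return weighted_mean(lambda x: math.log(s * s - x * x), n)

def I_prime_quad(y: float, n: int = 400) -> float:
    """Left-hand integral of (10.112) by quadrature (y > 1)."""
    return weighted_mean(lambda x: 2.0 * y / (y * y - x * x), n)

def I_closed(s: float) -> float:
    """I(s) from (10.113) with I(1) = 1/2 - log 2, integral evaluated in closed form."""
    r = math.sqrt(s * s - 1.0)
    return s * s - 1.0 - s * r + math.log(s + r) + 0.5 - math.log(2.0)

if __name__ == "__main__":
    print(I_quad(1.3), I_closed(1.3))
    print(I_prime_quad(1.3), 2.0 * 1.3 - 2.0 * math.sqrt(1.3 ** 2 - 1.0))
    print(I_quad(1.0, n=4000), 0.5 - math.log(2.0))
```

Adding $\log 2 + \tfrac12\log(\alpha/\gamma)$ to `I_closed(s)` reproduces the right-hand side of (10.115).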

Assume $0 < \alpha \le 1$. We change the variables $\gamma$, $\alpha$ into $s$, $\xi$, where $s$ is defined in (10.111) and
\[ \xi := \Bigl(\frac{\gamma}{\alpha}\Bigr)^{1/2} > 1. \tag{10.116} \]

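Then
\[
F(\xi) := \operatorname{Re} g(-\alpha) - \frac{\gamma}{2}\alpha = -\frac12 - \log\xi - \frac12\xi^2 + 2s\xi - s^2 - s\sqrt{s^2 - 1} + \log\bigl(s + \sqrt{s^2 - 1}\bigr). \tag{10.117}
\]
Differentiating with respect to $\xi$, we find
\[ F'(\xi) = -\frac{1}{\xi} - \xi + 2s. \tag{10.118} \]
Thus the maximum of $F$ occurs at $\xi = s + \sqrt{s^2 - 1}$. But $F(s + \sqrt{s^2 - 1}) = 0$; hence $F(\xi) \le 0$. Thus we obtain
\[ |e^{-t\alpha}\pi_k(-\alpha;k)| \le Ce^{k\operatorname{Re}(g(-\alpha;k;t) - (\gamma/2)\alpha)} \le C, \qquad 0 < \alpha \le 1. \tag{10.119} \]
For $\alpha \ge 1$, note that
\[ \operatorname{Re}(g(-\alpha)) = \log\alpha + \operatorname{Re}(g(-\alpha^{-1})). \tag{10.120} \]
Thus, using (10.119), we have
\[
|e^{-t\alpha^{-1}}(-\alpha)^{-k}\pi_k(-\alpha;k)| \le Ce^{k\operatorname{Re}(g(-\alpha;k;t) - (\gamma/2)\alpha^{-1} - \log\alpha)} = Ce^{k\operatorname{Re}(g(-\alpha^{-1};k;t) - (\gamma/2)\alpha^{-1})} \le C. \tag{10.121}
\]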
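The claim that $F(\xi) \le 0$ with equality at $\xi = s + \sqrt{s^2 - 1}$, which drives (10.119) and (10.121), is easy to confirm numerically from the closed form (10.117). In the sketch below the function names, the sample values of $s$, and the grid over $\xi \in (1, 2s)$ (the range corresponding to $0 < \alpha$ under $\alpha = 2s/\xi - 1$) are illustrative choices.

```python
import math

def F_xi(xi: float, s: float) -> float:
    """Right-hand side of (10.117) as a function of xi for fixed s > 1."""
    r = math.sqrt(s * s - 1.0)
    return (-0.5 - math.log(xi) - 0.5 * xi * xi + 2.0 * s * xi
            - s * s - s * r + math.log(s + r))

def xi_star(s: float) -> float:
    """Critical point of F from (10.118): the root of -1/xi - xi + 2s = 0 with xi > 1."""
    return s + math.sqrt(s * s - 1.0)

def max_F_on_grid(s: float, n: int = 4000) -> float:
    """Largest value of F on a uniform grid in (1, 2s)."""
    lo, hi = 1.0, 2.0 * s
    return max(F_xi(lo + (hi - lo) * (j + 0.5) / n, s) for j in range(n))

if __name__ == "__main__":
    for s in (1.01, 1.5, 3.0):
        print(s, F_xi(xi_star(s), s), max_F_on_grid(s))
```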

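10.2.3. Case $k < 2t \le k + M2^{-1/3}k^{1/3}$ for some $M > 0$

First, we introduce $m^{PII,3}$ as in [3, (2.22)–(2.28)]. Set
\[ g^{PII}(z) := \frac43\Bigl(z^2 + \frac{x}{2}\Bigr)^{3/2}, \tag{10.122} \]
which is analytic in $\mathbb{C} \setminus \bigl[-\sqrt{-x/2}, \sqrt{-x/2}\bigr]$ and behaves like $(4/3)z^3 + xz + x^2/(8z) + O(z^{-3}) =: \theta_{PII}(z) + O(z^{-1})$ as $z \to +\infty$. Let $\Sigma^{PII,3} := \cup_{j=1}^{5} \Sigma^{PII,3}_j$ as shown in Figure 7. The angles of the rays with the real line are between zero and $\pi/3$. Recall that $m(z;x)$ solves (2.15), the RHP for the PII equation. Define $m^{PII,3}(z;x)$ by
\[
\begin{cases}
m^{PII,3} = m(z;x)\, e^{i(g^{PII} - \theta_{PII})\sigma_3}, & z \in \Omega^{PII,3}_1,\ \Omega^{PII,3}_4,\\[1ex]
m^{PII,3} = m(z;x)\begin{pmatrix} 1 & e^{-2i\theta_{PII}} \\ 0 & 1 \end{pmatrix} e^{i(g^{PII} - \theta_{PII})\sigma_3}, & z \in (\Omega^{PII,3}_2 \cup \Omega^{PII,3}_3) \cap \mathbb{C}_-,\\[2ex]
m^{PII,3} = m(z;x)\begin{pmatrix} 1 & 0 \\ e^{2i\theta_{PII}} & 1 \end{pmatrix} e^{i(g^{PII} - \theta_{PII})\sigma_3}, & z \in (\Omega^{PII,3}_2 \cup \Omega^{PII,3}_3) \cap \mathbb{C}_+.
\end{cases}
\tag{10.123}
\]

Figure 7. $\Sigma^{PII,3}$ and $\Omega^{PII,3}_j$ (contours $\Sigma^{PII,3}_1,\dots,\Sigma^{PII,3}_5$; regions $\Omega^{PII,3}_1,\dots,\Omega^{PII,3}_4$; marked points $-(-x/2)^{1/2}$, $0$, $(-x/2)^{1/2}$)

Then $m^{(3)}$ solves the RHP (see [3, (2.25)]) normalized at $\infty$ with the jump matrix
\[
v^{(3)}(z;k;t) = \begin{cases}
\begin{pmatrix} 1 & 0 \\ e^{2ig^{PII}} & 1 \end{pmatrix} & \text{on } \Sigma^{PII,3}_1,\ \Sigma^{PII,3}_2,\\[2ex]
\begin{pmatrix} 1 & -e^{-2ig^{PII}} \\ 0 & 1 \end{pmatrix} & \text{on } \Sigma^{PII,3}_3,\ \Sigma^{PII,3}_4,\\[2ex]
\begin{pmatrix} e^{-2ig^{PII}_-} & -1 \\ 1 & 0 \end{pmatrix} & \text{on } \Sigma^{PII,3}_5.
\end{cases}
\tag{10.124}
\]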

Also, we have
\[ m_1(x) = m^{PII,3}_1(x) - \frac{ix^2}{8}\sigma_3, \tag{10.125} \]
where $m^{PII,3}(z;x) = I + m^{PII,3}_1(x)/z + O(z^{-2})$ as $z \to \infty$. Now as before, set $x$ by
\[ \frac{2t}{k} = 1 - \frac{x}{2^{1/3}k^{2/3}}. \tag{10.126} \]
Hence we have $-M \le x < 0$ in this section. We now proceed as in [3, case (iii) of Sec. 6]. Define the parametrix
\[
\begin{cases}
m_p(z;k;t) = m^{PII,3}\bigl(\lambda(z), x\bigr) & \text{in } \mathcal{O} \setminus \Sigma^{(3)},\\
m_p(z;k;t) = I & \text{in } \bar{\mathcal{O}}^c \setminus \Sigma^{(3)},
\end{cases}
\tag{10.127}
\]
where $\lambda(z)$ is defined in (10.62) and $\mathcal{O}$ is a small neighborhood of size $\epsilon > 0$ around $z = -1$ (see [3, case (iii) of Sec. 6] for details). As in Section 10.1.3, the ratio $R(z;k;t) := m^{(3)} m_p^{-1}$ satisfies a new RHP, normalized at $\infty$, with jump matrix $v_R$ satisfying the estimate (10.66), where $m^{PII,2}_1(x)$ is replaced by $m^{PII,3}_1(x)$. Hence we have
\[
\left| m^{(3)}(z;k;t)\bigl(m_p(z;k;t)\bigr)^{-1} - I + \frac{1}{2\pi i}\int_{\partial\mathcal{O}} \frac{m^{PII,3}_1(x)}{\lambda(s)(s - z)}\,ds \right| \le \frac{C}{\operatorname{dist}(z,\Sigma_R)\,k^{2/3}}, \tag{10.128}
\]
which is hidden in the derivation of [3, (6.19)]. Then as in (10.72), we have
\[ m^{(3)}(0;k;t) = I + \frac{i2^{4/3} m^{PII,3}_1(x)}{k^{1/3}} + O\Bigl(\frac{1}{k^{2/3}}\Bigr), \tag{10.129} \]

which is a (direct) extension of [3, (6.19)]. Now from (10.98) and (10.125) (see [3, (6.19)]),
\[ m^{(2)}(0;k;t) = I + \frac{i2^{4/3} m_1(x) - 2^{-5/3}x^2\sigma_3}{k^{1/3}} + O\Bigl(\frac{1}{k^{2/3}}\Bigr). \tag{10.130} \]
Hence, using $e^{kl} = 1 - x^2/(2^{5/3}k^{1/3}) + O(k^{-2/3})$ and $g(0) = \pi i$, (10.94) and (10.95) yield Proposition 5.1(iii) in the case when $-M \le x < 0$, as in [3].

For the proof of Proposition 5.2, note that, as before, for each fixed $z \in \mathbb{C} \setminus \Sigma$, we can use the freedom of the shape of $\Sigma^{(3)}$ (and $\Sigma_R$) so that $z \in \Omega^{(3)}_1 \cup \Omega^{(3)}_4$ and $\operatorname{dist}(z,\Sigma_R) \ge c_1 > 0$. Thus we obtain
\[ \lim_{k\to\infty} m^{(2)}(z;k;t) = I, \qquad z \in \mathbb{C} \setminus \Sigma \text{ fixed}. \tag{10.131} \]
From (10.94) and (10.96), we have
\begin{align}
\lim_{k\to\infty} e^{-kg(z;k;t)}\pi_k(z) &= 0, & |z| &< 1, \tag{10.132}\\
\lim_{k\to\infty} e^{-kg(z;k;t)}\pi_k(z) &= 1, & |z| &> 1. \tag{10.133}
\end{align}
This is an extension of the calculation in [3] where $z = 0$ is given. Now in order to prove Proposition 5.2, we need further analysis that is an extension in the second category as mentioned at the beginning of Section 10. Since $\gamma = (2t)/k$, for the proof of Proposition 5.2, it is enough to show that
\begin{align}
\lim_{k\to\infty} (-1)^k e^{k[g(z;k;t) + (\gamma/2)z]} &= 1, & |z| &< 1, \tag{10.134}\\
\lim_{k\to\infty} e^{k[g(z;k;t) + (\gamma/2)z^{-1} - \log z]} &= 1, & |z| &> 1. \tag{10.135}
\end{align}

But the proof of [3, Lem. 4.3(ii)] says that for $|z| > 1$, $z \notin (-\infty,-1)$,
\[
g(z) = \frac12\log z - \frac{\gamma}{4}(z + z^{-1}) + \frac{\gamma}{2} + \frac{\gamma}{4}\int_{1+0}^{z} \frac{s+1}{s^2}\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}\;ds + g_-(1), \tag{10.136}
\]
where the integral is taken over a curve from $1+0$ to $z$ lying in $\{z \in \mathbb{C} : |z| > 1,\ z \notin (-\infty,-1)\}$. Here $\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}$ is analytic in $\mathbb{C} \setminus \bar{C}_1$ and behaves like $s$ as $s \in \mathbb{R} \to +\infty$, and $\log z$ is analytic in $\mathbb{C} \setminus (-\infty,0]$ and is real for $z \in \mathbb{R}_+$. Calculations in the same proof, together with [3, Lem. 4.2(viii)], give us $g_-(1) = -1/2 - (1/2)\log\gamma$. Also, using $\sin^2(\theta_c/2) = 1/\gamma$, for $|s| > 1$, $s \notin (-\infty,-1)$,
\[ \sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})} = (s+1) - \frac{2s}{s+1}(\gamma - 1) + O((\gamma - 1)^2). \tag{10.137} \]
Thus, expanding in $\gamma - 1$, we have
\[ g(z) + \frac{\gamma}{2}z^{-1} - \log z = O((\gamma - 1)^2) = O(k^{-4/3}), \tag{10.138} \]
which implies (10.135). A similar computation implies that for $|z| < 1$, $z \notin (-1,0]$, we have
\[
g(z) = \frac12\log z - \frac{\gamma}{4}(z + z^{-1}) + \frac{\gamma}{2} + \frac{\gamma}{4}\int_{1+0}^{z} \frac{s+1}{s^2}\sqrt{(s - e^{i\theta_c})(s - e^{-i\theta_c})}\;ds + g_+(1) \tag{10.139}
\]
and $g_+(1) = -1/2 - (1/2)\log\gamma + \pi i$, which yield (10.134).

For the proof of Proposition 5.4 when $x < 0$, set
\[ \alpha = 1 - \frac{2^{4/3}w}{k^{1/3}}. \tag{10.140} \]
In this case, we need a new argument as $\alpha \to 1$. When $w$ and $x$ are fixed, again as in (10.77), we have $\lim_{k\to\infty} R(-\alpha;k;t) = I$, which implies that

\[ \lim_{k\to\infty} m^{(3)}(-\alpha;k;t) = m^{PII,3}(-iw, x). \tag{10.141} \]
From [3, (6.8)], we have
\[ \lim_{k\to\infty} k\tilde\alpha(-\alpha;k;t) = i\,\frac43\Bigl((-iw)^2 + \frac{x}{2}\Bigr)^{3/2} = ig^{PII}(-iw), \tag{10.142} \]
which from (10.98) and (10.123) implies
\[ \lim_{k\to\infty} m^{(2)}(-\alpha;k;t) = m(-iw, x)\, e^{i(g^{PII}(-iw) - \theta_{PII}(-iw))\sigma_3}. \tag{10.143} \]

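Now we compute the large $k$ limit of $kg(-\alpha; k; t) - t\alpha$ when $w > 0$, and of $kg(-\alpha; k; t) - t\alpha^{-1} - k\log\alpha$ when $w < 0$. For $-\pi < \theta < \pi$, $\lim_{\epsilon\downarrow 0} \arg(-\alpha + i\epsilon - e^{i\theta}) = \pi + \tan^{-1}(\sin\theta/(\alpha + \cos\theta))$, where $-\pi < \tan^{-1}\phi < \pi$. Since $\tan^{-1}(\sin\theta/(\alpha + \cos\theta))$ is odd in $\theta$, we have from (10.115),

$$\lim_{\epsilon\downarrow 0} g(-\alpha + i\epsilon) = \lim_{\epsilon\downarrow 0} \operatorname{Re} g(-\alpha + i\epsilon) + \pi i = -\frac{1}{2} + \frac{1}{2}\log\frac{\alpha}{\gamma} + s^2 - s\sqrt{s^2 - 1} + \log\big(s + \sqrt{s^2 - 1}\big) + \pi i, \tag{10.144}$$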


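where $s = \sqrt{\gamma}(1 + \alpha)/(2\sqrt{\alpha}) > 1$. Under the stated conditions on $\gamma$ and $\alpha$, as $k \to \infty$,

$$\frac{\sqrt{\gamma}(1 + \alpha)}{2\sqrt{\alpha}} = 1 + \Big(\frac{w^2}{2^{1/3}} - \frac{x}{2^{4/3}}\Big)\cdot\frac{1}{k^{2/3}} + \frac{2w^3}{k} + O\big(k^{-4/3}\big), \tag{10.145}$$

$$\frac{\alpha}{\gamma} = 1 - \frac{2^{4/3} w}{k^{1/3}} + \frac{x}{2^{1/3} k^{2/3}} - \frac{2xw}{k} + O\big(k^{-4/3}\big). \tag{10.146}$$

Note that $\lim_{\epsilon\downarrow 0} g(-\alpha + i\epsilon) - \lim_{\epsilon\downarrow 0} g(-\alpha - i\epsilon)$ is $2\pi i$ for $\alpha > 1$, and it is zero for $0 < \alpha < 1$. Therefore we obtain

$$\lim_{k\to\infty} (-1)^k e^{kg(-\alpha) - t\alpha} = e^{(4/3)w^3 - xw - (4/3)(w^2 - x/2)^{3/2}}, \qquad w > 0, \tag{10.147}$$

$$\lim_{k\to\infty} (-1)^k e^{kg(-\alpha) - t\alpha^{-1} - k\log\alpha} = e^{-(4/3)w^3 + xw - (4/3)(w^2 - x/2)^{3/2}}, \qquad w < 0. \tag{10.148}$$

Also, being careful of branch, we have

$$i\big(g^{PII}(-iw) - \theta_{PII}(-iw)\big) = \frac{4}{3}w^3 - xw - \frac{4}{3}\Big(w^2 - \frac{x}{2}\Big)^{3/2}, \qquad w > 0, \tag{10.149}$$

$$i\big(g^{PII}(-iw) - \theta_{PII}(-iw)\big) = \frac{4}{3}w^3 - xw + \frac{4}{3}\Big(w^2 - \frac{x}{2}\Big)^{3/2}, \qquad w < 0. \tag{10.150}$$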
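The expansion (10.145) can be sanity-checked numerically. The sketch below is only an illustration, not part of the original argument: it takes $\alpha$ from (10.140) and assumes the scaling $1/\gamma = 1 + x/(2^{1/3}k^{2/3})$, which is the choice that makes (10.146) hold exactly. The function names are hypothetical. It rescales the difference between the two sides of (10.145) by $k^{4/3}$, which should stay bounded as $k$ grows.

```python
import math

def lhs_10_145(w, x, k):
    # Assumption: alpha as in (10.140); 1/gamma = 1 + x / (2^(1/3) k^(2/3)),
    # chosen so that (10.146) holds with zero remainder.
    alpha = 1.0 - 2.0 ** (4.0 / 3.0) * w / k ** (1.0 / 3.0)
    gamma = 1.0 / (1.0 + x / (2.0 ** (1.0 / 3.0) * k ** (2.0 / 3.0)))
    return math.sqrt(gamma) * (1.0 + alpha) / (2.0 * math.sqrt(alpha))

def rhs_10_145(w, x, k):
    # Explicit terms on the right-hand side of (10.145), without the O-term.
    second_order = (w * w / 2.0 ** (1.0 / 3.0) - x / 2.0 ** (4.0 / 3.0)) / k ** (2.0 / 3.0)
    return 1.0 + second_order + 2.0 * w ** 3 / k

def scaled_remainder(w, x, k):
    # (lhs - rhs) * k^(4/3); bounded in k if the O(k^(-4/3)) claim holds.
    return (lhs_10_145(w, x, k) - rhs_10_145(w, x, k)) * k ** (4.0 / 3.0)

if __name__ == "__main__":
    for k in (1e6, 1e9):
        print(k, scaled_remainder(1.0, -1.0, k), scaled_remainder(-1.0, -1.0, k))
```

The values $w = \pm 1$, $x = -1$ are arbitrary test points in the regime $x < 0$ considered here.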

Since $\lim_{k\to\infty} e^{kl} = 1$, using (10.94)–(10.97) and (10.143), this implies (5.21)–(5.24) when $x < 0$.

10.3. Proof of Proposition 5.8

The analysis in this section is new, and it is needed for the proof of Proposition 5.8. Let $\alpha > 1$ be fixed. Set

$$\frac{t}{k} = \frac{\alpha}{\alpha^2 + 1} - \frac{\alpha(\alpha^2 - 1)^{1/2}}{(\alpha^2 + 1)^{3/2}}\cdot\frac{x}{\sqrt{k}}, \qquad x \in \mathbb{R}\setminus\{0\} \text{ fixed}. \tag{10.151}$$

We are interested in the asymptotics of $e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t)$. Since $2\alpha/(\alpha^2 + 1) < 1$ and $x$ is fixed, we are in the case of Sections 10.1.1 and/or 10.1.2. We define $m^{(1)}$ and $m^{(2)}$ as in (10.1) and (10.3). Recall that we have a certain freedom in the choice of $\Sigma^{(3)}$. We choose a contour passing through the saddle points of

$$f(z) := \frac{t}{k}(z - z^{-1}) + \log(-z), \tag{10.152}$$

the exponent of the (12)-entry of $v^{(2)}$ divided by $k$. The saddle points (see (10.31) and the following discussion) are $-\rho_\pi$ and $-\rho_\pi^{-1}$, where

$$\rho_\pi = \frac{1 - \sqrt{1 - (2t/k)^2}}{2t/k} = \frac{1}{\alpha} - \frac{(\alpha^2 + 1)^{1/2}}{\alpha(\alpha^2 - 1)^{1/2}}\cdot\frac{x}{\sqrt{k}} + O\Big(\frac{1}{k}\Big) =: \rho_c + O\Big(\frac{1}{k}\Big). \tag{10.153}$$
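The saddle-point claims around (10.152) and (10.153) can be checked numerically. The following is a minimal sketch with hypothetical function names: it computes $t/k$ from (10.151), confirms that $f'$ vanishes at $-\rho_\pi$ and $-\rho_\pi^{-1}$, and confirms that $\rho_\pi - \rho_c$ is of size $1/k$. The sample values $\alpha = 2$, $x = 1$ are arbitrary.

```python
import math

def t_over_k(alpha, x, k):
    # Right-hand side of (10.151).
    return (alpha / (alpha ** 2 + 1)
            - alpha * math.sqrt(alpha ** 2 - 1) / (alpha ** 2 + 1) ** 1.5 * x / math.sqrt(k))

def rho_exact(alpha, x, k):
    # First expression for rho_pi in (10.153), with g = 2t/k.
    g = 2.0 * t_over_k(alpha, x, k)
    return (1.0 - math.sqrt(1.0 - g * g)) / g

def rho_c(alpha, x, k):
    # Leading part of (10.153), i.e. the quantity the text calls rho_c.
    return (1.0 / alpha
            - math.sqrt(alpha ** 2 + 1) / (alpha * math.sqrt(alpha ** 2 - 1)) * x / math.sqrt(k))

def f_prime(z, alpha, x, k):
    # Derivative of f(z) = (t/k)(z - 1/z) + log(-z) from (10.152).
    return t_over_k(alpha, x, k) * (1.0 + z ** -2) + 1.0 / z

if __name__ == "__main__":
    alpha, x, k = 2.0, 1.0, 1e6
    rho = rho_exact(alpha, x, k)
    print("f'(-rho)      =", f_prime(-rho, alpha, x, k))
    print("f'(-1/rho)    =", f_prime(-1.0 / rho, alpha, x, k))
    print("k*(rho-rho_c) =", k * (rho - rho_c(alpha, x, k)))
```

The last printed quantity staying bounded as $k$ varies is what the $O(1/k)$ term in (10.153) asserts.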

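[Figure 8. $\Sigma^{(3)}$ and $\Omega^{(3)}$: the contour pieces $\Sigma_c$, $\Sigma_r$, $\Sigma^{(3)}_{out}$ and the regions $\Omega^{(3)}_1, \ldots, \Omega^{(3)}_4$, with the points $-1$ and $0$ marked.]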

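Take $\delta > 0$ and $\epsilon > 0$ small such that $\Sigma_c := \{-\rho_c + is : -k^{\delta - 1/2} \le s \le k^{\delta - 1/2}\}$ lies inside the open unit disk for all $k \ge 1$. Define (see Figure 8) $\Sigma^{(3)} := \Sigma^{(3)}_{in} \cup \Sigma^{(3)}_{out}$ by $\Sigma^{(3)}_{in} := \Sigma_c \cup \Sigma_r$ and $\Sigma^{(3)}_{out} := \{r^{-1}e^{i\phi} : re^{i\phi} \in \Sigma^{(3)}_{in}\}$, where $\Sigma_r := \{|-\rho_c + ik^{\delta - 1/2}|e^{i\theta} : |\theta| < \theta_0\}$, $-\rho_c + ik^{\delta - 1/2} = |-\rho_c + ik^{\delta - 1/2}|e^{i\theta_0}$. Let $\Omega^{(3)}_j$ be as in Figure 8, and define $m^{(3)}$ as in (10.34). As in (10.46)–(10.48), the quantities we are interested in are

$$e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t) = -\alpha^k e^{-t(\alpha - \alpha^{-1})} m^{(3)}_{12}(-\alpha^{-1}; k; t), \qquad x < 0, \tag{10.154}$$

$$e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t) = -\alpha^k e^{-t(\alpha - \alpha^{-1})} m^{(3)}_{12}(-\alpha^{-1}; k; t) + m^{(3)}_{11}(-\alpha^{-1}; k; t), \qquad x > 0. \tag{10.155}$$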

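For the estimates of $w^{(3)} := v^{(3)} - I$, note that for any fixed $0 < \rho < 1$, $\operatorname{Re} f(\rho e^{i\theta}) = F(\rho e^{i\theta}; 2t/k)$ (recall (10.11)) is increasing in $0 < \theta < \pi$ and is decreasing in $\pi < \theta < 2\pi$; hence $\|e^{kf(z)}\|_{L^\infty(\Sigma_r)} = e^{k\operatorname{Re} f(-\rho_c + ik^{\delta - 1/2})}$. But we have

$$f(-\rho_c + ia) = \frac{t}{k}(\alpha - \alpha^{-1}) - \log\alpha - \frac{1}{2}\Big(\frac{\alpha^2(\alpha^2 - 1)}{\alpha^2 + 1}a^2 + \frac{x^2}{k}\Big) + O(k^{-3/2 + 2\delta}), \qquad |a| \le k^{\delta - 1/2}. \tag{10.156}$$

Thus

$$\big\|\alpha^k e^{-t(\alpha - \alpha^{-1})} e^{kf(z)}\big\|_{L^\infty(\Sigma_r)} \le Ce^{-ck^{2\delta}}, \tag{10.157}$$

and also using

$$\alpha^{-k} e^{t(\alpha - \alpha^{-1})} = e^{-k[\log\alpha - (\alpha^2 - 1)/(\alpha^2 + 1) + O(1/\sqrt{k})]}, \qquad \log\alpha - \frac{\alpha^2 - 1}{\alpha^2 + 1} > 0 \text{ for } \alpha > 1, \tag{10.158}$$

we have

$$\|e^{kf(z)}\|_{L^\infty(\Sigma_r)} \le Ce^{-ck}. \tag{10.159}$$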

On the other hand, one can directly check that $\operatorname{Re} f(-\rho_c + ia)$ has its maximum at $a = 0$ for $-k^{\delta - 1/2} \le a \le k^{\delta - 1/2}$; hence $\|e^{kf(z)}\|_{L^\infty(\Sigma_c)} = e^{k\operatorname{Re} f(-\rho_c)}$. Again (10.156) and (10.158) yield

$$\|e^{kf(z)}\|_{L^\infty(\Sigma_c)} \le e^{-ck}. \tag{10.160}$$

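Similarly, we have $\|e^{-kf(z)}\|_{L^\infty(\Sigma^{(3)}_{out})} \le e^{-ck}$. Now calculations as in Section 10.1.2 give us the result (10.43). Hence, using (10.158) and (10.160) and noting $\operatorname{dist}(-\alpha^{-1}, \Sigma^{(3)}) = ((\alpha^2 + 1)^{1/2}/(\alpha(\alpha^2 - 1)^{1/2}))\cdot(x/\sqrt{k})$, we have

$$m^{(3)}_{11}(-\alpha^{-1}; k; t) = 1 + O\big(\sqrt{k}\,e^{-2(1 - \epsilon_1)c_1 k}\big), \tag{10.161}$$

$$\alpha^k e^{-t(\alpha - \alpha^{-1})} m^{(3)}_{12}(-\alpha^{-1}; k; t) = \alpha^k e^{-t(\alpha - \alpha^{-1})} \int_{\Sigma_c} \frac{(-s)^k e^{t(s - s^{-1})}}{s + \alpha^{-1}}\cdot\frac{ds}{2\pi i} + O\big(\sqrt{k}\,e^{-ck^{2\delta}}\big). \tag{10.162}$$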

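To evaluate the integral asymptotically, first we change the variable by $s = -\rho_c - \big(i(\alpha^2 + 1)^{1/2}/(\alpha(\alpha^2 - 1)^{1/2})\big)\cdot(y/\sqrt{k})$. Then from (10.156), the numerator of the integrand becomes

$$\alpha^{-k} e^{t(\alpha - \alpha^{-1})} e^{-(1/2)(y^2 + x^2) + O(k^{-1/2 + 2\delta})}. \tag{10.163}$$

Hence, setting $A := \alpha(\alpha^2 - 1)^{1/2}/(\alpha^2 + 1)^{1/2}$,

$$\frac{1}{2\pi i}\int_{\Sigma_c} \alpha^k e^{-t(\alpha - \alpha^{-1})} \frac{(-s)^k e^{t(s - s^{-1})}}{s + \alpha^{-1}}\,ds = \frac{1}{2\pi i}\int_{-Ak^\delta}^{Ak^\delta} \frac{e^{-(1/2)(y^2 + x^2)}}{y + ix}\,dy\,\big(1 + O(k^{-(1/2) + 2\delta})\big) + O\big(e^{-ck^{2\delta}}\big). \tag{10.164}$$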

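Thus from (10.154) and (10.155), we obtain

$$\lim_{k\to\infty} e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-(1/2)(y^2 + x^2)}}{y + ix}\,dy, \qquad x < 0, \tag{10.165}$$

$$\lim_{k\to\infty} e^{-\alpha t}(-\alpha)^k \pi_k(-\alpha^{-1}; t) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-(1/2)(y^2 + x^2)}}{y + ix}\,dy + 1, \qquad x > 0. \tag{10.166}$$

The function $h(x) := (1/(2\pi i))\int_{-\infty}^{\infty} e^{-(1/2)(y^2 + x^2)}/(y + ix)\,dy$ is smooth in $x > 0$ and $x < 0$. The derivative is $h'(x) = (1/\sqrt{2\pi})e^{-(1/2)x^2}$. As $x \to \pm\infty$, $h(x) \to 0$. Therefore we see that

$$\frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{-(1/2)(y^2 + x^2)}}{y + ix}\,dy = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{x} e^{-(1/2)y^2}\,dy, & x < 0, \\[2ex] \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{\infty}^{x} e^{-(1/2)y^2}\,dy, & x > 0. \end{cases} \tag{10.167}$$

Thus Proposition 5.8 is proved. □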
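Identity (10.167) is easy to test numerically. The sketch below is an illustration only, with hypothetical function names. It uses $1/(y + ix) = (y - ix)/(y^2 + x^2)$: the part odd in $y$ integrates to zero, so $h(x) = -(x/2\pi)\int e^{-(y^2 + x^2)/2}/(y^2 + x^2)\,dy$, which is real. The integral is approximated by the trapezoid rule on a truncated interval (an assumption of this sketch, adequate because the integrand decays like a Gaussian) and compared with the Gaussian integrals on the right-hand side, written via `math.erfc`.

```python
import math

def h(x, half_width=12.0, n=48000):
    """Trapezoid approximation of (1/(2*pi*i)) * int exp(-(y^2+x^2)/2) / (y + i x) dy."""
    if x == 0:
        raise ValueError("h is only defined for x != 0")
    step = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        y = -half_width + j * step
        weight = 0.5 if j in (0, n) else 1.0  # endpoint weights of the trapezoid rule
        total += weight * math.exp(-0.5 * (y * y + x * x)) / (y * y + x * x)
    return -x / (2.0 * math.pi) * total * step

def gaussian_cdf(x):
    # (1/sqrt(2*pi)) * integral from -infinity to x of exp(-y^2/2) dy
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def rhs_10_167(x):
    # For x > 0 the integral runs from +infinity to x, giving cdf(x) - 1.
    return gaussian_cdf(x) if x < 0 else gaussian_cdf(x) - 1.0

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(x, h(x), rhs_10_167(x))
```

The jump of size 1 across $x = 0$ is exactly the extra $+1$ in (10.166), so the limits in (10.165) and (10.166) together give the standard Gaussian distribution function.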

Acknowledgments. We would like to thank Percy Deift for helpful discussions and encouragement, especially for his help in proving Lemma 2.1. We would also like to acknowledge many useful conversations and communications with Peter Forrester, Kurt Johansson, Charles Newman, and Harold Widom. Special thanks are due to the referee who gave us crucial advice, improving the exposition of the paper significantly.

References

[1] D. ALDOUS and P. DIACONIS, Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem, Bull. Amer. Math. Soc. (N.S.) 36 (1999), 413–432. MR 2000g:60013
[2] J. BAIK, Riemann-Hilbert problems and random permutations, Ph.D. dissertation, Courant Institute of Mathematical Sciences, New York, 1999, http://www.math.princeton.edu/~jbaik/
[3] J. BAIK, P. DEIFT, and K. JOHANSSON, On the distribution of the length of the longest increasing subsequence of random permutations, J. Amer. Math. Soc. 12 (1999), 1119–1178. MR 2000e:05006
[4] J. BAIK, P. DEIFT, and K. JOHANSSON, On the distribution of the length of the second row of a Young diagram under Plancherel measure, Geom. Funct. Anal. 10 (2000), 702–731, MR CMP 1 791 137; Addendum, Geom. Funct. Anal. 10 (2000), 1606–1607. MR CMP 1 810 756
[5] J. BAIK, P. DEIFT, and E. RAINS, A Fredholm determinant identity and the convergence of moments for random Young tableaux, preprint, arXiv:math.CO/0012117
[6] J. BAIK and E. M. RAINS, Limiting distributions for a polynuclear growth model with external sources, J. Statist. Phys. 100 (2000), 523–541. MR CMP 1 788 477
[7] J. BAIK and E. M. RAINS, Algebraic aspects of increasing subsequences, Duke Math. J. 109 (2001), 1–65.
[8] J. BAIK and E. M. RAINS, Symmetrized random permutations, preprint, arXiv:math.CO/9910019, to appear in Random Matrix Models and Their Applications, ed. P. Bleher and A. Its, Math. Sci. Res. Inst. Publ. 40, Cambridge Univ. Press, Cambridge, 2001.
[9] A. BORODIN, Longest increasing subsequences of random colored permutations, Electron. J. Combin. 6 (1999), R13, http://www.combinatorics.org MR 2000a:05014
[10] A. BORODIN, A. OKOUNKOV, and G. OLSHANSKI, Asymptotics of Plancherel measures for symmetric groups, J. Amer. Math. Soc. 13 (2000), 481–515, http://www.ams.org/jams/ MR CMP 1 758 751
[11] P. A. DEIFT, Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert Approach, Courant Lect. Notes Math. 3, Courant Inst. Math. Sci., New York, 1999. MR 2000g:47048
[12] P. A. DEIFT, Integrable systems and combinatorial theory, Notices Amer. Math. Soc. 47 (2000), 631–640. MR CMP 1 764 262
[13] P. DEIFT, T. KRIECHERBAUER, K. T.-R. MCLAUGHLIN, S. VENAKIDES, and X. ZHOU, Asymptotics for polynomials orthogonal with respect to varying exponential weights, Internat. Math. Res. Notices 1997, 759–782. MR 99g:34038
[14] P. DEIFT, T. KRIECHERBAUER, K. T.-R. MCLAUGHLIN, S. VENAKIDES, and X. ZHOU, Strong asymptotics of orthogonal polynomials with respect to exponential weights, Comm. Pure Appl. Math. 52 (1999), 1491–1552. MR CMP 1 711 036
[15] P. DEIFT, T. KRIECHERBAUER, K. T.-R. MCLAUGHLIN, S. VENAKIDES, and X. ZHOU, Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory, Comm. Pure Appl. Math. 52 (1999), 1335–1425. MR CMP 1 702 716
[16] P. DEIFT, S. VENAKIDES, and X. ZHOU, The collisionless shock region for the long-time behavior of solutions of the KdV equation, Comm. Pure Appl. Math. 47 (1994), 199–206. MR 95f:35220
[17] P. DEIFT, S. VENAKIDES, and X. ZHOU, New results in small dispersion KdV by an extension of the steepest descent method for Riemann-Hilbert problems, Internat. Math. Res. Notices 1997, 286–299. MR 98b:35155
[18] P. DEIFT and X. ZHOU, A steepest descent method for oscillatory Riemann-Hilbert problems: Asymptotics for the MKdV equation, Ann. of Math. (2) 137 (1993), 295–368. MR 94d:35143
[19] P. DEIFT and X. ZHOU, Asymptotics for the Painlevé II equation, Comm. Pure Appl. Math. 48 (1995), 277–337. MR 96d:34004
[20] H. FLASCHKA and A. C. NEWELL, Monodromy- and spectrum-preserving deformations, I, Comm. Math. Phys. 76 (1980), 67–116. MR 82g:35103
[21] A. S. FOKAS, A. R. ITS, and A. V. KITAEV, Discrete Painlevé equations and their appearance in quantum gravity, Comm. Math. Phys. 142 (1991), 313–344. MR 93a:58080
[22] A. S. FOKAS and X. ZHOU, On the solvability of Painlevé II and IV, Comm. Math. Phys. 144 (1992), 601–622. MR 93d:34004
[23] P. J. FORRESTER, Random walks and random permutations, preprint, arXiv:math.CO/9907037
[24] P. J. FORRESTER and E. M. RAINS, Inter-relationships between orthogonal, unitary and symplectic matrix ensembles, preprint, arXiv:solv-int/9907008
[25] J. GRAVNER, C. A. TRACY, and H. WIDOM, Limit theorems for height fluctuations in a class of discrete space and time growth models, preprint, arXiv:math.PR/0005133
[26] C. GREENE, An extension of Schensted's theorem, Adv. Math. 14 (1974), 254–265. MR 50:6874
[27] S. P. HASTINGS and J. B. MCLEOD, A boundary value problem associated with the second Painlevé transcendent and the Korteweg–de Vries equation, Arch. Rational Mech. Anal. 73 (1980), 31–51. MR 81i:34024
[28] A. R. ITS, C. A. TRACY, and H. WIDOM, Random words, Toeplitz determinants and integrable systems, I, preprint, arXiv:math.CO/9909169
[29] M. JIMBO, T. MIWA, and K. UENO, Monodromy preserving deformation of linear ordinary differential equations with rational coefficients, I: General theory and τ-function, Phys. D 2 (1981), 306–352. MR 83k:34010a
[30] K. JOHANSSON, The longest increasing subsequence in a random permutation and a unitary random matrix model, Math. Res. Lett. 5 (1998), 63–82. MR 99e:60033
[31] K. JOHANSSON, Shape fluctuations and random matrices, Comm. Math. Phys. 209 (2000), 437–476. MR CMP 1 737 991
[32] K. JOHANSSON, Discrete orthogonal polynomial ensembles and the Plancherel measure, preprint, arXiv:math.CO/9906120, to appear in Ann. of Math. (2).
[33] D. E. KNUTH, The Art of Computer Programming, Vol. 3: Sorting and Searching, 2d ed., Addison-Wesley Ser. Comput. Sci. Inform. Process., Addison-Wesley, Reading, Mass., 1973. MR 56:4281
[34] G. KUPERBERG, Random words, quantum statistics, central limits, random matrices, preprint, arXiv:math.PR/9909104
[35] B. F. LOGAN and L. A. SHEPP, A variational problem for random Young tableaux, Adv. Math. 26 (1977), 206–222. MR 98e:05108
[36] M. L. MEHTA, Random Matrices, 2d ed., Academic Press, Boston, 1991. MR 92f:82002
[37] A. M. ODLYZKO and E. M. RAINS, "On longest increasing subsequences in random permutations" in Analysis, Geometry, Number Theory: The Mathematics of Leon Ehrenpreis (Philadelphia, 1998), Contemp. Math. 251, Amer. Math. Soc., Providence, 2000, 439–451. MR 2001d:05003
[38] A. OKOUNKOV, Random matrices and random permutations, Internat. Math. Res. Notices 2000, 1043–1095. MR CMP 1 802 530
[39] M. PRÄHOFER and H. SPOHN, Statistical self-similarity of one-dimensional growth processes, Phys. A 279 (2000), 342–352. MR CMP 1 797 145
[40] M. PRÄHOFER and H. SPOHN, Universal distributions for growth processes in 1 + 1 dimensions and random matrices, Phys. Rev. Lett. 84 (2000), 4882–4885.
[41] A. REGEV, Asymptotic values for degrees associated with strips of Young diagrams, Adv. Math. 41 (1981), 115–136. MR 82h:20015
[42] C. SCHENSTED, Longest increasing and decreasing subsequences, Canad. J. Math. 13 (1961), 179–191. MR 22:12047
[43] R. P. STANLEY, Generalized riffle shuffles and quasisymmetric functions, preprint, arXiv:math.CO/9912025
[44] C. A. TRACY and H. WIDOM, Level-spacing distributions and the Airy kernel, Comm. Math. Phys. 159 (1994), 151–174. MR 95e:82003
[45] C. A. TRACY and H. WIDOM, On orthogonal and symplectic matrix ensembles, Comm. Math. Phys. 177 (1996), 727–754. MR 97a:82055
[46] C. A. TRACY and H. WIDOM, Random unitary matrices, permutations and Painlevé, Comm. Math. Phys. 207 (1999), 665–685. MR CMP 1 727 236
[47] C. A. TRACY and H. WIDOM, On the distributions of the lengths of the longest monotone subsequences in random words, Probab. Theory Related Fields 119 (2001), 350–380.
[48] A. M. VERSHIK and S. V. KEROV, Asymptotics of the Plancherel measure of the symmetric group and the limiting form of Young tables, Soviet Math. Dokl. 233 (1977), 527–531. MR 58:562

Baik
Mathematics Department, Princeton University, Princeton, New Jersey 08544-1000, USA; [email protected]; School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540, USA

Rains
AT&T Labs-Research, Florham Park, New Jersey 07932, USA; [email protected]


ON ICOSAHEDRAL ARTIN REPRESENTATIONS KEVIN BUZZARD, MARK DICKINSON, NICK SHEPHERD-BARRON, AND RICHARD TAYLOR

DUKE MATHEMATICAL JOURNAL, Vol. 109, No. 2, © 2001. Received 29 December 1999. Revision received 13 October 2000. 2000 Mathematics Subject Classification. Primary 11F11, 11F80; Secondary 11F33, 11G18, 14G22. Taylor's work partially supported by National Science Foundation grant number DMS-9702885 and by the Miller Institute at the University of California at Berkeley.

Abstract

If $\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \mathrm{GL}_2(\mathbb{C})$ is a continuous odd irreducible representation with nonsolvable image, then under certain local hypotheses we prove that $\rho$ is the representation associated to a weight 1 modular form and hence that the L-function of $\rho$ has an analytic continuation to the entire complex plane.

Introduction

E. Artin [A] conjectured that the L-series $L(r, s)$ of any continuous representation $r : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \mathrm{GL}_n(\mathbb{C})$ is entire except possibly for a pole at $s = 1$ when $r$ contains the trivial representation. The case when $n = 1$ is simply a restatement of the Kronecker-Weber theorem and standard results on the analytic continuation of Dirichlet L-series. Artin proved his conjecture when $r$ is induced from a 1-dimensional representation of an open subgroup of $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$. Moreover, R. Brauer [Br] was able to show in general that $L(r, s)$ is meromorphic on the whole complex plane. Since then, the only real progress has been for $n = 2$, although very recently D. Ramakrishnan [Ra] has dealt with some $n = 4$ cases.

When $n = 2$, such representations can be classified according to the image of the projectivised representation $\mathrm{proj}\, r : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \mathrm{PGL}_2(\mathbb{C})$. This image is either cyclic, dihedral, the alternating group $A_4$ (the tetrahedral case), the symmetric group $S_4$ (the octahedral case), or the alternating group $A_5$ (the icosahedral case). When the image of $\mathrm{proj}\, r$ is cyclic, then $r$ is reducible and Artin's conjecture follows from the $n = 1$ case. When the image of $\mathrm{proj}\, r$ is dihedral, then $r$ is induced from a character of an open subgroup of index 2, and so Artin himself proved the conjecture in this case, although the result is implicit in earlier work of E. Hecke [He]. R. Langlands [Langl] proved Artin's conjecture for tetrahedral and some octahedral representations. J. Tunnell [Tu] extended this to all octahedral representations. These results are based on Langlands's theory of cyclic base change for automorphic representations of $\mathrm{GL}_2$, and so the method seems to be restricted (at best) to cases where the image of $r$ is soluble.

The icosahedral case has until now largely been attacked using computational methods, where one can hope to construct an explicit weight 1 modular form to deal with any particular case. There is a growing literature on the computational side of the subject, started by J. Buhler in [Buh] and continued by G. Frey and others in [F], E. Goins [Go], K. Buzzard and W. Stein [BS], and A. Jehanne and M. Müller [JM]. In each case finitely many (up to twist) icosahedral cases of Artin's conjecture are treated. The contribution here to the problem is to treat infinitely many icosahedral cases using a theoretical approach. More precisely, we prove the following theorem.

THEOREM A

Suppose that $r : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \mathrm{GL}_2(\mathbb{C})$ is a continuous irreducible representation and that $r$ is odd, that is, that the determinant of complex conjugation is $-1$. If $r$ is icosahedral, suppose that
• $\mathrm{proj}\, r$ is unramified at 2 and that the image of a Frobenius element at 2 under $\mathrm{proj}\, r$ has order 3 and that
• $\mathrm{proj}\, r$ is unramified at 5.
Then there is a weight 1 newform $f$ such that for all prime numbers $p$ the $p$th Fourier coefficient of $f$ equals the trace of Frobenius at $p$ on the space of coinvariants for the inertia group at $p$ in the representation $r$. In particular, the Artin L-series for $r$ is the Mellin transform of a weight 1 newform and is an entire function.

The proof follows a strategy outlined to A. Wiles by R. Taylor in 1992 (see [Ta2]), which we have now carried out in three main steps (see [ST], [Di], and [BT]). The purpose of this article is simply to pull these results together and document some technical results that we require but that do not seem to be available in the literature. The result is that this paper is rather technical. The reader who simply desires to get an overview of the main ideas of the proof should consult [Ta2], perhaps followed by [ST], [BT], and [Di], rather than this paper.

We remark also that by using arguments mod 5 rather than mod 2, Taylor has proved in [Ta3] a theorem similar to Theorem A but with different local conditions. One might hope that extensions of our method may treat all odd 2-dimensional icosahedral representations of $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$, although considerable work remains to be done. On the other hand, our method seems to offer no prospect of treating the general Artin conjecture.


1. mod 2 icosahedral representations

In this section we give a slight extension of results in [ST]. This could be avoided by appealing to the results of B. Gross in [Gr]. However, Gross's results depend on certain "unchecked compatibilities," and so we prefer to make our result unconditional by using this more ad hoc argument. We remark that the hypotheses in our main theorem could be weakened if one could make Gross's theorem unconditional. We start with a strengthening of [ST, Th. 3.4].

THEOREM 1.1
Fix a continuous homomorphism

$$\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \mathrm{SL}_2(\mathbb{F}_4).$$

Suppose that $\rho$ is unramified at 2 and that $\rho(\mathrm{Frob}_2)$ has distinct eigenvalues $\alpha, \beta \in \mathbb{F}_4^\times$. Then there is an abelian surface $A/\mathbb{Q}$ together with a principal polarisation $\lambda : A \to A^\vee$ and an embedding $i : \mathbb{Z}[(1 + \sqrt{5})/2] \hookrightarrow \mathrm{End}(A)$ (both defined over $\mathbb{Q}$) such that
(1) $\lambda \circ i(a) = i(a)^\vee \circ \lambda$ for all $a \in \mathbb{Z}[(1 + \sqrt{5})/2]$;
(2) the action of $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$ on $A[2] \cong \mathbb{F}_4^2$ is equivalent to $\rho$;
(3) $A$ has good ordinary reduction at 2 and $\mathrm{Frob}_2 = \alpha$ on $A[2]^{et}$ (the generic fibre of the maximal étale quotient of the 2-torsion on the Néron model of $A$ over Z); and
(4) the action of $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$ on the $\sqrt{5}$-division points, $A[\sqrt{5}]$, is via a surjection $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \twoheadrightarrow \mathrm{GL}_2(\mathbb{F}_5)$.

Proof
With the third condition removed, this is the main result of [ST]. The proof of this strengthening is a slight variant of the argument of that paper. We start by recalling some of the constructions there.

We fix an identification of $\mathbb{F}_4$ with $\mathbb{Z}[(1 + \sqrt{5})/2]/(2)$ and of $\mathrm{SL}_2(\mathbb{F}_4)$ with $A_5$. We let $Y/\mathbb{Q}$ denote the smooth cubic surface given in $\mathbb{P}^4$ by

$$\sum_{i=1}^{5} y_i = \sum_{i=1}^{5} y_i^3 = 0.$$

The group $A_5$ acts on $Y$ by permuting the variables. We let $Y^0 \subset Y$ (resp., $Y^1 \subset Y$) denote the complement of the 15 lines conjugate to $(s : -s : t : -t : 0)$ (resp., the complement of the 10 points conjugate to $(1 : -1 : 0 : 0 : 0)$). We let $Y_\rho$ (resp., $Y_\rho^0$, resp., $Y_\rho^1$) denote the twist of $Y$ (resp., $Y^0$, resp., $Y^1$) by $\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to A_5$. There is an étale $\mathbb{P}^1$-bundle $C_\rho \to Y_\rho^1$ together with 6 distinguished sections $s_1, \ldots, s_6 : Y_\rho^1 \times \mathbb{Q}^{ac} \to C_\rho \times \mathbb{Q}^{ac}$ such that the set $\{s_1, \ldots, s_6\}$ is $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$-invariant.

SUBLEMMA
Over $Y_\rho^0$, these sections are distinct.

Proof
We use without comment some notation from [ST, Sec. 2]. By the formulae in [DO, pp. 15–17], the locus in $P_1^6$ where $s_1, \ldots, s_6$ are distinct is identified with the complement, $Z^0$, in $\sum_{i=1}^{6} z_i = \sum_{i=1}^{6} z_i^3 = 0$ of the 15 $S_6$-conjugates of the plane $(s : -s : t : -t : u : -u)$. Then using [ST, Lem. 2.4] it is easy to see that $j^{-1}Z^0 = Y^0$, and the result follows.

We let $W_\rho/\mathbb{Q}$ denote the $\mathbb{F}_4$-vector space scheme corresponding to $\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \mathrm{GL}_2(\mathbb{F}_4)$. It comes with a standard pairing

$$W_\rho \times W_\rho \to \mu_2$$

which on $\mathbb{Q}^{ac}$-points sends

$$(a, b) \times (c, d) \mapsto (-1)^{\mathrm{tr}_{\mathbb{F}_4/\mathbb{F}_2}(ad - bc)}.$$

Then there is a coarse moduli space $H_\rho/\mathbb{Q}$ parametrising quadruples $(A, \lambda, i, \alpha)$, where $(A, \lambda)$ is a principally polarised abelian surface, where $i : \mathbb{Z}[(1 + \sqrt{5})/2] \hookrightarrow \mathrm{End}(A)$ has image fixed by the $\lambda$-Rosati involution, and where $\alpha : W_\rho \xrightarrow{\sim} A[2]$ is an isomorphism of $\mathbb{F}_4$-vector space schemes taking the standard pairing to the $\lambda$-Weil pairing. There is a Zariski-open subset $H_\rho^0 \subset H_\rho$ consisting of those geometric points for which the corresponding $(A, \lambda)$ is a Jacobian.

We claim that there is an isomorphism $Y_\rho^0 \cong H_\rho^0$ so that a geometric point $y$ of $Y_\rho^0$ maps to the point parametrising a quadruple $(A, \lambda, i, \alpha)$ such that $(A, \lambda)$ is the Jacobian of the curve which maps 2 : 1 to $C_{\rho,y}$ ramified exactly at $s_1(y), \ldots, s_6(y)$. Unfortunately, this is not explicitly stated in [ST]. To prove it, one may assume that $\rho = 1$. Recall from [ST] that we have maps

$$Y \dashrightarrow H_2^* \to A_2^* \dashrightarrow P_1^6.$$

(We keep the notation of [ST], so in particular $H_2^*$ is a compactification of what we are now calling $H_1$.) The locus of Jacobians in $A_2^*$ is the locus of points where $A_2^* \dashrightarrow P_1^6$ is regular and which map to $Z^0 \subset P_1^6$. Thus $Y^0$ maps to $H_1^0 \subset H_2^*$. On the other hand, $H_2^*$ is the disjoint union of the image of $Y^0$ and some $\mathbb{P}^1$'s which get contracted to the points of $P_1^6 - (P_1^6)^s$ (see [ST, Sec. 2]). Thus, if $y$ is a point of $H_2^*$ not in the image of $Y^0$, then either $H_2^* \dashrightarrow P_1^6$ is not regular at $y$ or $y$ gets mapped outside $Z^0$. In either case, $y$ does not lie in $H_1^0$, establishing the claim.



If $X'_\rho$ denotes the blow-up of $Y_\rho \times Y_\rho$ along the diagonal, then $X'_\rho$ has an involution $t$ that exchanges the two factors. We let $X_\rho$ denote the twist of $X'_\rho$ by $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \twoheadrightarrow \mathrm{Gal}(\mathbb{Q}(\sqrt{5})/\mathbb{Q}) \xrightarrow{\sim} \{1, t\}$, and we let $X_\rho^2$ be the complement in $X_\rho$ of the strict transforms of $L \times L$ as $L$ runs over lines on $Y_\rho$. Then there is a morphism

$$\theta : X_\rho^2 \to Y_\rho$$

which (loosely speaking) sends $(P, Q)$ to the third point of intersection of the line through $P$ and $Q$ with $Y_\rho$ (see [ST] for details). We let $X_\rho^0$ (resp., $X_\rho^1$, resp., $D_\rho/X_\rho^1$) denote the preimage of $Y_\rho^0$ (resp., the preimage of $Y_\rho^1$, resp., the pullback of $C_\rho$) under $\theta$. Then it is proved in [ST, Lem. 3.1 and Prop. 3.2] that $X_\rho/\mathbb{Q}$ is rational and that $D_\rho/X_\rho^1$ is a Zariski $\mathbb{P}^1$-bundle.

The argument preceding [ST, Lem. 2.7] shows that given $x \in X_\rho^0$ we can find a Zariski-open subset $U \subset X_\rho^0$ containing $x$ and a principally polarised abelian surface $(A_U, \lambda_U)/U$ such that
(1) for all $x_1 \in U$ the fibre $(A_U, \lambda_U)_{x_1}$ is the Jacobian of a curve which maps 2 : 1 to $D_{\rho,x_1}$ ramified exactly at $s_1(x_1), \ldots, s_6(x_1)$;
(2) there is an isomorphism $\alpha_U : W_\rho \xrightarrow{\sim} A_U[2]$ of finite flat group schemes over $U$ with alternating pairings; and
(3) there exists $i_U : \mathbb{Z}[(1 + \sqrt{5})/2] \hookrightarrow \mathrm{End}(A_U)$ which is compatible with $\alpha_U$ and the action of $\mathbb{F}_4$ on $W_\rho$.
We remark that in [ST] the existence of $i_U$ is explained only over a nonempty open subset of $U$, but by [CF, Rem. 1.10(a) of Chap. I], $i_U$ extends to $U$. We remind the reader that $A_U$ is not canonical.

Suppose that $x$ is a geometric point of $U$. If $f$ is an automorphism of $(A_U, \lambda_U, i_U, \alpha_U)_x$, then $T_2(f) \equiv 1 \bmod 2$ and so $T_2(f^2) \equiv 1 \bmod 4$. As $f$ has finite order, this implies that $f^2 = 1$. If $f \ne \pm 1$, then $A_{U,x} \cong (1 + f)/2\, A_{U,x} \oplus (1 - f)/2\, A_{U,x}$ and $\lambda_U$ correspondingly decomposes as the direct sum of two polarisations. This contradicts the fact that $\theta(x) \in Y_\rho^0 \cong H_\rho^0$. Thus we must have $\mathrm{Aut}((A_U, \lambda_U, i_U, \alpha_U)_x) = \{\pm 1\}$. In particular, if we set

$$\tilde{U} = \{(a, b) \in (A_U \times A_U)[\sqrt{5}] \mid \langle a, b\rangle \ne 1\}/\sim,$$

where $(a, b) \sim (a', b')$ if and only if $(a, b) = \pm(\mu a', b')$ for some $\mu \in \mathbb{F}_5^\times$, then the construction of $\tilde{U}$ is canonical and so we can glue the $\tilde{U}/U$ to give an étale cover $\tilde{X}_\rho^0/X_\rho^0$ of degree 60. The argument of [ST, Lem. 2.7] shows that $\tilde{X}_\rho^0$ is geometrically irreducible.



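Suppose for the moment that we can find a point $x_2 \in X_\rho^0(\mathbb{Q}_2)$, a Zariski-open $U_2 \subset X_\rho^0 \times \mathbb{Q}_2$ as above, and a continuous character $\chi_2 : \mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2) \to \{\pm 1\}$ such that
• the twist $A_{U_2,x_2}(\chi_2)$ of $A_{U_2,x_2}$ by $\chi_2$ has good reduction, and
• if $\overline{A}_{U_2,x_2}(\chi_2)$ denotes the mod 2 reduction of the Néron model of $A_{U_2,x_2}(\chi_2)$ over $\mathbb{Z}_2$, then $\overline{A}_{U_2,x_2}(\chi_2)[2]^{et} \ne (0)$ and $\mathrm{Frob}_2$ acts on $\overline{A}_{U_2,x_2}(\chi_2)[2]^{et}$ by $\alpha$.
Then we can find a neighbourhood (for the 2-adic topology) $U \subset X_\rho^0(\mathbb{Q}_2)$ of $x_2$ such that if $x \in U$, then
• $x \in U_2$,
• $A_{U_2,x}(\chi_2)$ has good reduction at 2, and
• $A_{U_2,x}(\chi_2)[2] \cong A_{U_2,x_2}(\chi_2)[2]$.
Because $X_\rho$ is rational, it follows from T. Ekedahl's version of the Hilbert irreducibility theorem (see [E, Th. 1.3]) that we can find a point $x \in X_\rho^0(\mathbb{Q})$ such that
• $x \in U$, and
• if $\tilde{x}$ is a point of $\tilde{X}_\rho^0$ above $x$, then $[\mathbb{Q}(\tilde{x}) : \mathbb{Q}] = 60$.

Suppose that $U$ is a Zariski neighbourhood of $x$ in $X_\rho^0$ as above. Then $(A_U, \lambda_U, i_U, \alpha_U)_x \times \mathbb{Q}_2$ is a twist by some character $\chi_2' : \mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2) \to \{\pm 1\}$ of $(A_{U_2}, \lambda_{U_2}, i_{U_2}, \alpha_{U_2})_x$. Choose a character $\chi : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \to \{\pm 1\}$ which restricts to $\chi_2\chi_2'$ on $\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$. Then $A_{U,x}(\chi)$ has the following properties:
• $(A_{U,x}(\chi), \lambda_{U,x})/\mathbb{Q}$ is a principally polarised abelian surface;
• $i_{U,x} : \mathbb{Z}[(1 + \sqrt{5})/2] \hookrightarrow \mathrm{End}(A_{U,x}(\chi))$, and the image is fixed by the $\lambda_{U,x}$-Rosati involution;
• as an $\mathbb{F}_4[\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})]$-module, $A_{U,x}(\chi)[2](\mathbb{Q}^{ac})$ is equivalent to $\rho$;
• $A_{U,x}(\chi) \times \mathbb{Q}_2 \cong A_{U_2,x}(\chi_2)$, and so $A_{U,x}(\chi)$ has good reduction at 2;
• $\overline{A}_{U,x}(\chi)[2] \cong \overline{A}_{U_2,x_2}(\chi_2)[2]$, and so $\overline{A}_{U,x}(\chi)[2]^{et} \ne (0)$ and $\mathrm{Frob}_2$ acts on $\overline{A}_{U,x}(\chi)[2]^{et}$ by $\alpha$;
• if $G$ denotes the image of $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$ in $\mathrm{Aut}_{\mathbb{F}_4}(A_{U,x}(\chi)[\sqrt{5}]) \cong \mathrm{GL}_2(\mathbb{F}_5)$, then $\det G = \mathbb{F}_5^\times$ (because of the $\lambda$-Weil pairing) and

$$\#\, G \Big/ \Big( G \cap \Big\{ \begin{pmatrix} \mu & 0 \\ 0 & \nu \end{pmatrix} : \nu = \pm 1,\ \mu \in \mathbb{F}_5^\times \Big\} \Big) = 60;$$

it is then elementary to check that $G = \mathrm{GL}_2(\mathbb{F}_5)$.

It remains to explain the construction of $x_2$. This we do in two steps. More precisely, we show the following two results.
(1) There is a quadruple $(A, \lambda, i, \alpha)$ (as above) defined over K such that $A$ has good reduction, and if $\overline{A}$ denotes the reduction of its Néron model, then $\overline{A}[2]^{et} \ne (0)$ and $\mathrm{Frob}_2$ acts on $\overline{A}[2]^{et}$ by $\alpha$.
(2) If $y \in Y_\rho^0(\mathbb{Q}_2)$, then there is a point of $X_\rho^0(\mathbb{Q}_2)$ mapping to $y$ under $\theta$.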



The first assertion gives a point $y_2 \in H_\rho^0(\mathbb{Q}_2) = Y_\rho^0(\mathbb{Q}_2)$ and the second a point $x_2 \in X_\rho^0(\mathbb{Q}_2)$ mapping to $y_2$ under $\theta$. This point $x_2$ suffices.

We initially establish the second assertion. Suppose $y \in Y_\rho^0(\mathbb{Q}_2)$, and let $Y_\rho(y)'$ denote the complement in $Y_\rho$ of the intersection of $Y_\rho$ with the tangent plane to $Y_\rho$ at $y$. Thus $Y_\rho(y)'$ is a smooth affine cubic surface. There is an involution $\iota_y$ of $Y_\rho(y)'$ which sends any point $z$ to the third point of intersection of the line through $y$ and $z$ with the cubic surface $Y_\rho$. We let $Y_\rho(y)$ denote the twist of $Y_\rho(y)'$ by $\iota_y$ over $\mathrm{Gal}(\mathbb{Q}_2(\sqrt{5})/\mathbb{Q}_2)$. We may identify $Y_\rho(y)$ as a Zariski-open subset of the fibre of $\theta : X_\rho^0 \to Y_\rho^0$ above $y$, and so it suffices to show that $Y_\rho(y)(\mathbb{Q}_2) \ne \emptyset$.

Note that the equations defining $Y$ also define a smooth projective surface over $\mathbb{Z}_2$, which we also denote by $Y$. The constructions of $Y_\rho$, $Y_\rho(y)'$, and $Y_\rho(y)$ from $Y$ all make sense over $\mathbb{Z}_2$ and give rise to smooth relative surfaces over $\mathbb{Z}_2$, which we denote by the same symbols. Here we are using the fact that $\rho$ is unramified, and we are not asserting that these integral models have any moduli theoretic meaning. By Hensel's lemma it suffices to show that $Y_\rho(y)(\mathbb{F}_2)$ is nonempty.

Without loss of generality, the surface $Y_\rho \times \mathbb{F}_2$ is given in $\mathbb{P}^3$ by the equation

$$X_1^3 + X_1X_2^2 + X_2^3 + X_3^2X_4 + X_3X_4^2 = 0.$$

(If $\gamma$ is a root of $T^3 + T + 1 = 0$, then $(X_1 : X_2 : X_3 : X_4)$ corresponds to the point

$$\big((X_3 + X_4) + X_1\gamma + X_2\gamma^2 : (X_3 + X_4) + X_1\gamma^2 + X_2\gamma^4 : (X_3 + X_4) + X_1\gamma^4 + X_2\gamma : X_3 : X_4\big)$$

of $Y \times \mathbb{F}_2$.) Thus $Y_\rho(\mathbb{F}_2)$ has three points: $P = (0 : 0 : 1 : 0)$, $Q = (0 : 0 : 0 : 1)$, and $R = (0 : 0 : 1 : 1)$.

First, suppose that $y$ reduces to $P$. Then $Y_\rho(y)' \times \mathbb{F}_2$ is the surface given in affine 3-space by the equation

$$x_1^3 + x_1x_2^2 + x_2^3 + x_3 + x_3^2 = 0$$

and $\iota_y$ maps $(x_1, x_2, x_3)$ to $(x_1, x_2, x_3 + 1)$. (Here we set $x_i = X_i/X_4$.) Thus $Y_\rho(y) \times \mathbb{F}_2$ is given in affine 3-space by the equation

$$y_1^3 + y_1y_2^2 + y_2^3 + 1 + y_3 + y_3^2 = 0.$$

(Here we let $(y_1, y_2, y_3)$ correspond to the point $(x_1, x_2, x_3) = (y_1, y_2, y_3 + (1 + \sqrt{5})/2)$.) Thus $Y_\rho(y)(\mathbb{F}_2)$ consists of 6 points.

Second, suppose that $y$ reduces to $Q$. This case is exactly analogous, and again we see that $Y_\rho(y)(\mathbb{F}_2)$ consists of 6 points.
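The point counts over $\mathbb{F}_2$ quoted above are small enough to confirm by brute force. The sketch below (hypothetical function names, offered only as a check) enumerates the 15 points of $\mathbb{P}^3(\mathbb{F}_2)$ and the 8 points of affine 3-space over $\mathbb{F}_2$ and tests the three equations as printed.

```python
from itertools import product

def projective_points_f2():
    """Nonzero vectors of F_2^4; over F_2 these are exactly the 15 points of P^3(F_2)."""
    return [p for p in product((0, 1), repeat=4) if any(p)]

def cubic(X1, X2, X3, X4):
    # Equation of the reduction of Y_rho in P^3, evaluated mod 2.
    return (X1**3 + X1 * X2**2 + X2**3 + X3**2 * X4 + X3 * X4**2) % 2

def untwisted_affine(x1, x2, x3):
    # Affine equation in the chart X4 != 0, before twisting by the involution.
    return (x1**3 + x1 * x2**2 + x2**3 + x3 + x3**2) % 2

def twisted_affine(y1, y2, y3):
    # Affine equation after the twist (note the extra constant term 1).
    return (y1**3 + y1 * y2**2 + y2**3 + 1 + y3 + y3**2) % 2

surface_points = [p for p in projective_points_f2() if cubic(*p) == 0]
untwisted_points = [q for q in product((0, 1), repeat=3) if untwisted_affine(*q) == 0]
twisted_points = [q for q in product((0, 1), repeat=3) if twisted_affine(*q) == 0]

if __name__ == "__main__":
    print("points of the cubic over F_2:", surface_points)
    print("untwisted affine points:", len(untwisted_points))
    print("twisted affine points:", len(twisted_points))
```

The contrast between the untwisted and twisted counts shows why the twist matters: the constant term changes which pairs $(y_1, y_2)$ contribute.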



Third, suppose that $y$ reduces to $R$. Introducing a new variable $X_4' = X_3 + X_4$, we see that $Y_\rho \times \mathbb{F}_2$ can also be described in $\mathbb{P}^3$ by the equation

$$X_1^3 + X_1X_2^2 + X_2^3 + X_3^2X_4' + X_3(X_4')^2 = 0$$

and that in these new coordinates $R$ becomes the point $(0 : 0 : 1 : 0)$. Thus the analysis is the same again and we see that $Y_\rho(y)(\mathbb{F}_2)$ again consists of 6 points.

Finally, we turn to our first assertion. Let $K$ denote the field $\mathbb{Q}(a)$, where $a$ is a root of $T^4 + 13T^2 + 41 = 0$. Then $13 + 2a^2$ is a square root of 5, which we denote $\sqrt{5}$. Moreover, $K$ is a CM field with totally real subfield $\mathbb{Q}(\sqrt{5})$. The inverse different $d_{K/\mathbb{Q}}^{-1}$ is principal with generator $\xi = (13a + 2a^3)^{-1}$. We have the prime factorisation

$$2O_K = \big(((1 + \sqrt{5})/2 + a)/2\big)\big(((1 + \sqrt{5})/2 - a)/2\big).$$

As $\pm 1$ are the only roots of unity in $K$, the only elements of $K^\times$ with norm down to $\mathbb{Q}(\sqrt{5})$ equal to 2 are $(\pm(1 + \sqrt{5})/2 \pm a)/2$. The normal closure of $K/\mathbb{Q}$ is $K(\sqrt{41})/\mathbb{Q}$, and $\mathrm{Gal}(K(\sqrt{41})/\mathbb{Q})$ is generated by two elements $\sigma$ and $\tau$, where

$$\sigma(a) = \sqrt{41}/a, \qquad \tau(a) = a, \qquad \sigma(\sqrt{41}) = -\sqrt{41}, \qquad \tau(\sqrt{41}) = -\sqrt{41}.$$
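Two of the arithmetic facts about $K = \mathbb{Q}(a)$ stated above can be verified with exact rational arithmetic. The sketch below is illustrative only and its helper names are hypothetical. It represents an element $c_0 + c_1a + c_2a^2 + c_3a^3$ as a list of four `Fraction`s, multiplies modulo $a^4 = -13a^2 - 41$, and checks that $(13 + 2a^2)^2 = 5$ and that the two displayed factors multiply to exactly 2.

```python
from fractions import Fraction

def mul(u, v):
    """Multiply two elements of Q[a]/(a^4 + 13 a^2 + 41), given as coefficient lists."""
    prod = [Fraction(0)] * 7
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += ui * vj
    # Reduce degrees 6, 5, 4 using a^d = -13 a^(d-2) - 41 a^(d-4).
    for d in range(6, 3, -1):
        c = prod[d]
        prod[d] = Fraction(0)
        prod[d - 2] -= 13 * c
        prod[d - 4] -= 41 * c
    return prod[:4]

def elem(c0, c1, c2, c3):
    return [Fraction(c0), Fraction(c1), Fraction(c2), Fraction(c3)]

# sqrt5 := 13 + 2a^2, so (1 + sqrt5)/2 = 7 + a^2.
sqrt5 = elem(13, 0, 2, 0)
# The two factors ((1 + sqrt5)/2 + a)/2 and ((1 + sqrt5)/2 - a)/2.
pi_plus = elem(Fraction(7, 2), Fraction(1, 2), Fraction(1, 2), 0)
pi_minus = elem(Fraction(7, 2), Fraction(-1, 2), Fraction(1, 2), 0)

if __name__ == "__main__":
    print("sqrt5^2 =", mul(sqrt5, sqrt5))
    print("pi_plus * pi_minus =", mul(pi_plus, pi_minus))
```

Since complex conjugation sends $a$ to $-a$ and fixes $\sqrt{5}$, the second product is also the norm of either factor down to $\mathbb{Q}(\sqrt{5})$, which is the statement that this norm equals 2.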

Thus σ 4 = τ 2 = 1, τ σ τ = σ 3 , and σ 2 = c. By the Chebotarev density theorem, we may choose a prime ℘ of O K which is split completely and lies above a rational prime p ≡ 3 mod 4. Let α0 denote the character O K×, p → → O K×,℘ → → {±1}.

√ Fix an embedding K ( 41) ,→ C such that a has negative imaginary part, 13 + √ 2a 2 > 0,√ and 41 > 0. Then 8 = {1, σ } is a CM-type with reflex (L , 80 ), where 0 3 L = K ( 41){1,σ τ } and √ 8 = {1, σ }. The field L is also a CM field and has a totally real subfield Q( 41). It is isomorphic to the field obtained by adjoining a root × 2 of T 4 + 26T √ + 5 to Q. Then L has class number 1 andc O L is generated by −1 and 32 + 5 41. We have a prime factorisation 2Ol = I I J with #O L /I = 2 and #O L /J = 4. We have a homomorphism N80 : L × −→ K × , x 7−→ xσ 3 (x). × Then N80 extends to a map A× L → A K . Define a continuous homomorphism × α : A× L −→ K

by setting
• $\alpha|_{L^\times} = N_{\Phi'}$,
• $\alpha|_{\mathcal{O}_{L,p}^\times} = \alpha_0 \circ N_{\Phi'}$,
• $\alpha|_{\mathcal{O}_{L,p'}^\times} = 1$ for any rational prime $p' \neq p$, and
• $\alpha|_{L_v^\times} = 1$ for any infinite place $v$ of $L$.
(This makes sense because the class number of $L$ is 1 and because $(\alpha_0 \circ N_{\Phi'})|_{\mathcal{O}_L^\times} = N_{\Phi'}|_{\mathcal{O}_L^\times}$.)

By results in [Lang], especially [Lang, Chap. 1, Ths. 3.6 and 4.5 and Chap. 5, Cor. 5.3], we see that there is a triple $(A, \lambda, i)/L$, where $(A, \lambda)$ is a principally polarised simple abelian surface with an action $i$ of $\mathcal{O}_K$, which has type $(K, \Phi, \mathcal{O}_K, \xi)$ and character $\alpha$. Because $\alpha$ is trivial on $\mathcal{O}_{L,I}^\times$, we see from the fundamental theorem of complex multiplication (see [Lang, Th. 1.1 of Chap. 4]) that, for a rational prime $l > 2$, inertia at $I$ acts trivially on $T_l A$, the $l$-adic Tate module of $A$. Thus $A$ has good reduction at $I$. Let $\bar{A}$ denote the reduction mod $I$ of the Néron model of $A$. Moreover, if $I = (a)$, then $\mathrm{Frob}_2$ acts on $T_l \bar{A}$ via $\pm N_{\Phi'} a$. As $N_{K/\mathbb{Q}(\sqrt{5})} N_{\Phi'} a = 2$, we see that
$$\pm N_{\Phi'} a = (\pm(1+\sqrt{5})/2 \pm a)/2,$$
and so
$$\pm N_{\Phi'} a \equiv (1+\sqrt{5})/2 \bmod (N_{\Phi'} I)^c.$$
Thus $\bar{A}[N_{\Phi'} I^c]$ is étale and $\mathrm{Frob}_2$ acts on it as $(1+\sqrt{5})/2$. If $\alpha = (1+\sqrt{5})/2$, then $(A, \lambda, i|_{\mathbb{Z}[(1+\sqrt{5})/2]})/L_I$ suffices to give the desired example. If on the other hand $\alpha = (1-\sqrt{5})/2$, then $(A, \lambda, i|_{\mathbb{Z}[(1+\sqrt{5})/2]} \circ \sigma)/L_I$ suffices to give the desired example.
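Two numerical statements about the reflex field $L$ used in this construction can be checked quickly. The sketch below works in floating point with the embedding fixed above ($a$ with negative imaginary part, $13 + 2a^2 = \sqrt{5} > 0$, $\sqrt{41} > 0$). The element $b = a + \sqrt{41}/a$ is an assumption made here for illustration: it is fixed by $\sigma\tau$, hence lies in $L$, and it should be a root of $T^4 + 26T^2 + 5$. The norm of $32 + 5\sqrt{41}$ is computed exactly.

```python
import math

sqrt5, sqrt41 = math.sqrt(5), math.sqrt(41)

# Root a of T^4 + 13 T^2 + 41 with 13 + 2 a^2 = +sqrt5 and Im(a) < 0.
a = -1j * math.sqrt((13 - sqrt5) / 2)

def quartic_K(t):
    """Defining polynomial of K evaluated at t."""
    return t**4 + 13 * t**2 + 41

def quartic_L(t):
    """The quartic whose root is said to generate a field isomorphic to L."""
    return t**4 + 26 * t**2 + 5

# Hypothetical generator of L: a + sigma(a) = a + sqrt41 / a (fixed by sigma*tau).
b = a + sqrt41 / a

# Norm of 32 + 5*sqrt41 from Q(sqrt41) down to Q, computed with integers.
unit_norm = 32**2 - 41 * 5**2

print(abs(quartic_K(a)), abs(quartic_L(b)), unit_norm)
```

A norm of $\pm 1$ is what one expects of a unit of $\mathbb{Q}(\sqrt{41})$; the check says nothing about whether $-1$ and $32 + 5\sqrt{41}$ generate all of $\mathcal{O}_L^\times$.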

We now apply this theorem to deduce the modularity of certain mod 2 representations. If $N$, $M$, and $k$ are positive integers, we denote by $h_k(N; M)$ the $\mathbb{Z}$-algebra generated by the Hecke operators $T_p$ and $\langle p \rangle$ for any prime $p \nmid NM$, and by the Hecke operators $U_p$ for any prime $p \mid NM$ acting on the space of weight $k$ cusp forms for $\Gamma_1(N) \cap \Gamma_0(M)$. If $M \mid N$, we drop it from the notation and write simply $h_k(N)$. If $p \nmid NM$, set $S(p) = p^{k-2}\langle p \rangle$. Also, for every positive integer $n$, define $T(n)$ by the relations
• $T(n_1 n_2) = T(n_1)T(n_2)$ if $n_1$ and $n_2$ are coprime,
• $(1 - T_p X + pS(p)X^2)\sum_{r=0}^{\infty} T(p^r)X^r = 1$ for any prime $p \nmid NM$, and
• $T(p^r) = U_p^r$ for every prime $p \mid NM$.

COROLLARY 1.2
Fix a continuous homomorphism
$$\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow SL_2(\mathbb{F}_4).$$
Suppose that $\rho$ is unramified at 2 and 5 and that $\rho(\mathrm{Frob}_2)$ has distinct eigenvalues $\alpha, \beta \in \mathbb{F}_4^\times$. Then there is an odd positive integer $N$ divisible by all primes at which $\rho$ ramifies and a homomorphism $f_\alpha : h_2(N) \to \mathbb{F}_4$ which takes
(1) $T_p$ to $\mathrm{tr}\,\rho(\mathrm{Frob}_p)$ for all primes $p \nmid 2N$,
(2) $T_2$ to $\alpha$, and
(3) $U_p$ to zero for all $p \mid N$.

Proof
First, note that [ST, Th. 4.1] is improved in [BCDT] to suppress the condition on $\rho(I_3)$. Thus [ST, Th. 4.2] can be improved to suppress the condition that $A$ has semistable reduction at 3. The proof of this corollary is then the same as the proof of [ST, Th. 4.3] except that we replace references to [ST, Th. 4.2] by this improvement and references to [ST, Th. 3.4] by references to Theorem 1.1 of this paper.

2. l-adic modular forms
Let $l$ be a prime. In this section we recall some facts about $l$-adic modular forms, which are applied later in the case when $l = 2$. The most important for us is the assertion that an $l$-adic limit of ordinary classical modular forms is overconvergent (see Lemma 2.12). Many of these assertions appear in the literature, but we have not been able to locate proofs for them. For primes $l > 3$, such results are due to N. Katz [K], but we follow Coleman's approach via rigid geometry.

Fix an integer $N \ge 5$ which is not divisible by $l$. Let $X_1(N)/\mathbb{Z}_l$ denote the usual compactification of the moduli scheme for pairs $(E, i)$, where $E$ is an elliptic curve and $i$ is an embedding $\mu_N \hookrightarrow E[N]$. Also, let $X_1(N; l)/\mathbb{Z}_l$ denote the usual compactification of the moduli scheme for pairs $(E, i, E \xrightarrow{\alpha} E')$, where $E$ is an elliptic curve, $i$ is an embedding $\mu_N \hookrightarrow E[N]$, and $\alpha : E \to E'$ is an isogeny of degree $l$. There are two natural projections, $\pi_1$ and $\pi_2 : X_1(N; l) \to X_1(N)$, which take $(E, i, E \xrightarrow{\alpha} E')$ to $(E, i)$ and $(E', \alpha \circ i)$, respectively. We let $\omega_{X_1(N)}$ (resp., $\omega_{X_1(N;l)}$) denote the canonical extension to the cusps of the pullback by the identity section of the sheaf of relative differentials of the universal elliptic curve over the noncuspidal locus of $X_1(N)$ (resp., $X_1(N; l)$).
Then $\pi_1^* \omega_{X_1(N)} = \omega_{X_1(N;l)}$ and there is a natural map $j = (\alpha^\vee)^* : \omega_{X_1(N;l)} \to \pi_2^* \omega_{X_1(N)}$. After one inverts $l$, $j$ becomes an isomorphism. We let $SS$ denote the finite set of points in $X_1(N)(\mathbb{F}_l^{ac})$ corresponding to supersingular elliptic curves. For $s \in SS$, choose $T_s \in \mathcal{O}_{X_1(N) \times W(\mathbb{F}_l^{ac}),\, s}$ so that
$$(X_1(N) \times W(\mathbb{F}_l^{ac}))^\wedge_s \cong \mathrm{Spf}\, W(\mathbb{F}_l^{ac})[[T_s]]$$
and so that if $\sigma \in \mathrm{Gal}(\mathbb{F}_l^{ac}/\mathbb{F}_l)$ and $s \in SS$, then
$$(1 \times \sigma^*)^*(T_{(1 \times \sigma^*)(s)}) = T_s.$$



(Here $W(k)$ denotes the Witt vectors of $k$.) Let $\mathbb{C}_l$ denote the completion of $\mathbb{Q}_l^{ac}$. We let $X_1(N)^{an}$ denote the rigid analytic space over $\mathbb{C}_l$ associated to $X_1(N)$. It is connected. If $r \in l^{\mathbb{Q}}$ and $1 \ge r \ge 1/l$, we let $X_1(N)_{\ge r}$ (if $r \ne 1/l$) (resp., $X_1(N)_{>r}$ (if $r \ne 1$)) denote the rigid analytic subspace of $X_1(N)^{an}$ where for each $s \in SS$ we remove all points $x$ in the residue disc of $s$ with $|T_s(x)|_l < r$ (resp., $\le r$). (Here $|\ |_l$ is the $l$-adic absolute value normalised by $|l|_l = 1/l$.)

LEMMA 2.1
The rigid space $X_1(N)_{\ge r}$ is connected.

Proof
Suppose that $X_1(N)_{\ge r}$ has an admissible open cover $\{U, V\}$ with $U$ and $V$ nonempty and disjoint. For each $s \in SS$ the preimage of $s$ in $X_1(N)_{\ge r}$ is an annulus and hence connected and contained in either $U$ or $V$. Let $\tilde{U}$ (resp., $\tilde{V}$) denote the union of $U$ (resp., $V$) with the residue disc of each $s \in SS$ for which the preimage of $s$ in $X_1(N)_{\ge r}$ is contained in $U$ (resp., $V$). Then $\{\tilde{U}, \tilde{V}\}$ is an admissible open cover of $X_1(N)^{an}$ by disjoint nonempty sets, a contradiction.

We let $M_k^{\ge r}(N)$ (resp., $M_k^{>r}(N)$) denote the space of sections of $(\omega^{an}_{X_1(N)})^{\otimes k}$ over $X_1(N)_{\ge r}$ (resp., $X_1(N)_{>r}$). The spaces $M_k^{\ge r}(N)$ have natural norms making them Banach spaces. More precisely, we set
$$|f|_r = \sup_{x \in X_1(N)_{\ge r}(\mathbb{C}_l)} |f|_x,$$
where we define $|f|_x$ as follows. Let $\bar{x} \in X_1(N)(\mathbb{F}_l^{ac})$ denote the reduction of $x$, and let $f_0$ denote a local generator for $\omega^{\otimes k}_{X_1(N)}$ near $\bar{x}$. Then we set $|f|_x = |(f/f_0)(x)|_l$, which is easily checked to be independent of the choice of $f_0$. Note that if $r_1 \ge r_2$ and if $f \in M_k^{\ge r_2}(N)$, then $|f|_{r_1} \le |f|_{r_2}$.

We let $X_1(N)^0$ denote the formal completion of $X_1(N)$ along its locally closed subscheme $X_1(N) \times \mathbb{F}_l - SS$. It is a formal scheme over $\mathbb{Z}_l$. The base change to $\mathbb{C}_l$ of the rigid analytic space associated to $X_1(N)^0$ is just $X_1(N)_{\ge 1}$. Thus we get an identification
$$\Gamma(X_1(N)^0, \omega^{\otimes k}_{X_1(N)}) \,\hat{\otimes}_{\mathbb{Z}_l}\, \mathbb{C}_l \xrightarrow{\ \sim\ } M_k^{\ge 1}(N),$$
under which $\Gamma(X_1(N)^0, \omega^{\otimes k}_{X_1(N)}) \,\hat{\otimes}_{\mathbb{Z}_l}\, \mathcal{O}_{\mathbb{C}_l}$ is identified to the unit ball in $M_k^{\ge 1}(N)$. There is a map
$$\mathrm{Spec}\, \mathbb{Z}_l((q)) \longrightarrow X_1(N)$$



corresponding to the pair $(\mathbb{G}_m/q^{\mathbb{Z}}, i_{can})$, where $\mathbb{G}_m/q^{\mathbb{Z}}$ denotes the Tate curve (Tate($q$) in the notation of [KM, Sec. 8.8]) and where $i_{can}$ comes from the tautological embedding $\mu_N \hookrightarrow \mathbb{G}_m$ (see [KM, Prop. 8.11.7]). This map extends to a map
$$\mathrm{Spec}\, \mathbb{Z}_l[[q]] \longrightarrow X_1(N)$$
(use [KM, Th. 8.11.10]), and this gives rise to a map
$$\mathrm{Spf}\, \mathbb{Z}_l[[q]] \longrightarrow X_1(N)^0.$$
If $f \in \Gamma(X_1(N)^0, \omega^{\otimes k}_{X_1(N)})$, then its pullback to $\mathrm{Spf}\, \mathbb{Z}_l[[q]]$ has the form
$$\sum_{n=0}^{\infty} c_n(f) q^n (dt/t)^{\otimes k},$$
where $t$ is the usual parameter on $\mathbb{G}_m$ and where we refer to $\sum_{n=0}^{\infty} c_n(f) q^n$ as the $q$-expansion at infinity of $f$. This extends to a map
$$M_k^{\ge 1}(N) \longrightarrow \mathbb{C}_l[[q]], \quad f \longmapsto \sum_{n=0}^{\infty} c_n(f) q^n.$$
From the $q$-expansion principle (see [K, Sec. 1.6] and note that $X_1(N) \times \mathbb{F}_l^{ac}$ is irreducible), we deduce that for $f \in M_k^{\ge 1}(N)$ we have
$$|f|_1 = \sup_n |c_n(f)|_l.$$

If $l \ge 5$, we let $E$ denote the section of $\omega^{\otimes(l-1)}_{X_1(N)}$ over $X_1(N)$ with $q$-expansion at infinity,
$$1 - (2(l-1)/B_{l-1}) \sum_{n=1}^{\infty} \sigma_{l-2}(n) q^n,$$
where $B_k$ denotes the Bernoulli number, and $\sigma_t(n) = \sum_{0<d \mid n} d^t$ [...] $>1/l$. If $l \ge 3$, we set $E' = E$. If $l = 2$, we take $E'$ to be the section of $\omega^{\otimes 4}_{X_1(N)}$ over $X_1(N)$ with $q$-expansion at infinity,
$$1 + 240 \sum_{n=1}^{\infty} \sigma_3(n) q^n.$$
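The two $q$-expansions just written down have constant term 1, and their higher coefficients can be tested for divisibility by $l$ in small cases. The sketch below is only an illustration under stated assumptions: it uses the Bernoulli convention $B_1 = -1/2$, it tests $l \in \{5, 7, 11, 13\}$ only, and the function names are hypothetical. A coefficient is treated as divisible by $l$ when its numerator is divisible by $l$ and its denominator is prime to $l$.

```python
from fractions import Fraction
from math import comb

def bernoulli(m):
    """Return [B_0, ..., B_m] via sum_{j<=n} C(n+1, j) B_j = 0 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

def sigma(t, n):
    """Sum of d^t over the positive divisors d of n."""
    return sum(d ** t for d in range(1, n + 1) if n % d == 0)

def eisenstein_coeffs(l, terms):
    """Coefficients c_0..c_{terms-1} of 1 - (2(l-1)/B_{l-1}) sum sigma_{l-2}(n) q^n."""
    factor = -Fraction(2 * (l - 1)) / bernoulli(l - 1)[l - 1]
    return [Fraction(1)] + [factor * sigma(l - 2, n) for n in range(1, terms)]

def higher_coeffs_divisible(l, coeffs):
    """True if every coefficient after the constant term is divisible by l in Z_(l)."""
    return all(c.numerator % l == 0 and c.denominator % l != 0 for c in coeffs[1:])

for l in (5, 7, 11, 13):
    print(l, higher_coeffs_divisible(l, eisenstein_coeffs(l, 10)))

# For l = 2 the relevant series is 1 + 240 * sum sigma_3(n) q^n, and 240 = 16 * 15.
print(240 % 16)
```

For $l = 5$ the series produced is $1 + 240q + 2160q^2 + \cdots$, the same weight 4 series that is used for $l = 2$.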

In either case the $q$-expansion at infinity of $E'$ is congruent to 1 modulo $l$ and $E'$ has no zeros in $X_1(N)_{>l^{-1/4}}$.

We recall some elementary results about rigid analytic functions on annuli. The set of analytic functions on the annulus $\beta \le |z|_l \le \alpha$ is the set of functions
$$f(z) = \sum_{n=-\infty}^{\infty} a_n z^n$$
for which $|a_n|_l \beta^n \to 0$ as $n \to -\infty$ and $|a_n|_l \alpha^n \to 0$ as $n \to \infty$.

LEMMA 2.2
If $r \in l^{\mathbb{Q}}$ and $\beta \le r \le \alpha$, then the supremum of $|f(z)|_l$ on $|z|_l = r$ equals
$$\sup_n |a_n|_l r^n.$$

Proof
Set $A = \sup_n |a_n|_l r^n$. Then
$$\sup_{|z|_l = r} |f(z)|_l = A \sup_{|w|_l = 1} \Big| \sum_{|a_n|_l r^n = A} c_n w^n \Big|_l$$
for some $c_n$ with $|c_n|_l = 1$. However, for $|w|_l = 1$ we see that
$$\Big| \sum_{|a_n|_l r^n = A} c_n w^n \Big|_l \le 1$$
with equality for some such choice of $w$, which is enough.

In particular, we see that if $f(z) = \sum_{n=-\infty}^{\infty} a_n z^n$ is a function on the annulus $\beta \le |z|_l \le \alpha$, then $|f(z)|_l$ always achieves its maximum on either $|z|_l = \alpha$ or $|z|_l = \beta$ (or possibly on both). In the former case this maximum equals
$$\sup_n |a_n|_l \alpha^n = \sup_{n \ge 0} |a_n|_l \alpha^n,$$
and in the latter case it equals
$$\sup_n |a_n|_l \beta^n = \sup_{n \le 0} |a_n|_l \beta^n.$$
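The formula of Lemma 2.2 is easy to experiment with for Laurent polynomials with rational coefficients. The sketch below is a toy illustration under restrictive assumptions: only radii $r = l^{-k}$ with $k$ an integer are handled, and only rational points $z$ are tested, so it can exhibit the inequality $|f(z)|_l \le \sup_n |a_n|_l r^n$ and some points where equality holds, not the full supremum over $\mathbb{C}_l$. All function names are illustrative.

```python
from fractions import Fraction

def padic_abs(x, l):
    """l-adic absolute value of a rational number, normalised by |l| = 1/l."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % l == 0:
        num //= l
        v += 1
    while den % l == 0:
        den //= l
        v -= 1
    return Fraction(1, l) ** v

def sup_on_circle(coeffs, l, k):
    """sup_n |a_n|_l r^n for r = l^(-k); coeffs maps the exponent n to a_n."""
    r = Fraction(1, l) ** k
    return max(padic_abs(a, l) * r ** n for n, a in coeffs.items())

def evaluate(coeffs, z):
    """Evaluate the Laurent polynomial at a nonzero rational z."""
    z = Fraction(z)
    return sum(Fraction(a) * z ** n for n, a in coeffs.items())

l = 5
F = {-2: 25, 0: 1, 1: Fraction(1, 5)}   # f(z) = 25 z^-2 + 1 + z/5 on 1/5 <= |z| <= 1
G = {0: 1, 1: 1}                        # g(z) = 1 + z

print(sup_on_circle(F, l, 0), padic_abs(evaluate(F, 2), l))    # circle |z| = 1
print(sup_on_circle(F, l, 1), padic_abs(evaluate(F, 10), l))   # circle |z| = 1/5
print(sup_on_circle(G, l, 0), padic_abs(evaluate(G, 4), l))    # strict inequality at z = 4
```

For this `F` the larger of the two boundary values occurs on the outer circle $|z|_l = 1$ and comes from the term with $n = 1 \ge 0$, in line with the description of the maximum above.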

Suppose now that $f$ is an analytic function on the annulus $\beta \le |z|_l < \alpha$ such that $|f(z)|_l$ is bounded by $A$. Then we have
$$f(z) = \sum_{n=-\infty}^{\infty} a_n z^n,$$
where $|a_n|_l \beta^n \to 0$ as $n \to -\infty$ and where for all $n$ we have $|a_n|_l \le A\alpha^{-n}$ and $|a_n|_l \le A\beta^{-n}$. If $|f(z)|_l$ achieves its supremum, it does so on $|z|_l = \beta$ and the supremum equals
$$\sup_n |a_n|_l \beta^n = \sup_{n \le 0} |a_n|_l \beta^n.$$

LEMMA 2.3
Suppose that $1 > r > 1/l$, that $r \in l^{\mathbb{Q}}$, and that $f$ is a rigid analytic function on $X_1(N)_{\ge r}$. Then $|f(x)|_l$ achieves its supremum and does so at some point $y$ that reduces to an element $s \in SS$ and satisfies $|T_s(y)|_l = r$.

Proof
Because $X_1(N)_{\ge r}$ is a finite union of affinoids, the maximum modulus principle tells us that $|f(x)|_l$ does achieve its supremum. Thus we may assume that this supremum equals 1. If $|f(x)|_l$ does not achieve its supremum in $X_1(N)_{\ge 1}$, then it does so in the inverse image under reduction of some $s \in SS$ and the lemma follows from the facts about rigid analytic functions on annuli which we recalled above. Thus, suppose that $f$ achieves its maximum in $X_1(N)_{\ge 1}$. As $|f(x)|_l \le 1$ on $X_1(N)_{\ge 1}$, $f$ is a global section of the structure sheaf of the formal completion of $X_1(N) \times \mathcal{O}_{\mathbb{C}_l}$ along $X_1(N) \times \mathbb{F}_l^{ac} - SS$, and it thus reduces to give a regular function $\bar{f}$ on $X_1(N) \times \mathbb{F}_l^{ac} - SS$. Thus we may choose $s \in SS$ such that either $\bar{f}$ has a pole at $s$ or $\bar{f}$ is constant. Choose also an affine neighbourhood $U$ of $s$ in $X_1(N) \times \mathbb{F}_l^{ac}$ which contains no other element of $SS$ and which admits a regular function $\bar{g}$ that has a simple zero at $s$ and no other zero on $U$. Let the formal completion of $X_1(N) \times W(\mathbb{F}_l^{ac})$ along $U$ equal $\mathrm{Spf}\, A$, and let $g \in A$ be a lift of $\bar{g}$. Note that the formal completion of $X_1(N) \times W(\mathbb{F}_l^{ac})$ at $s$ is isomorphic to $\mathrm{Spf}\, W(\mathbb{F}_l^{ac})[[g]]$.

The formal completion of $X_1(N) \times \mathcal{O}_{\mathbb{C}_l}$ along $U - \{s\}$ is $\mathrm{Spf}\,(A \hat{\otimes} \mathcal{O}_{\mathbb{C}_l})\langle\langle S \rangle\rangle/(gS - 1)$. Thus we may expand $f$ as
$$\sum_{i=0}^{\infty} f_i S^i$$
with $f_i \in (A \hat{\otimes} \mathcal{O}_{\mathbb{C}_l})$ and $f_i \to 0$ as $i \to \infty$. The same expansion holds on the rigid analytic subspace of $X_1(N)_{\ge r}$ consisting of points that reduce to $U$ (as this space is connected, being the inverse image under reduction of a Zariski-connected space). Moreover, on $U$ we see that
$$\bar{f} = \sum_{i=0}^{\infty} \bar{f}_i \bar{g}^{-i},$$
where $\bar{f}_i$ denotes the reduction of $f_i$ and where now the sum is finite. In the formal completion of $X_1(N) \times \mathcal{O}_{\mathbb{C}_l}$ at $s$, we may expand
$$f_i = \sum_{j=0}^{\infty} a_{ij} g^j$$
with $a_{ij} \in \mathcal{O}_{\mathbb{C}_l}$. Thus, on the rigid analytic subspace of $X_1(N)_{\ge r}$ consisting of points that reduce to $s$, we see that
$$f = \sum_{k=-\infty}^{\infty} \sum_i a_{i,i+k}\, g^k.$$
(The second sum is over $i \in \mathbb{Z}$ such that $i \ge 0$ and $i + k \ge 0$.) Similarly, we see that in the formal completion of $X_1(N) \times \mathbb{F}_l^{ac}$ at $s$ we have
$$\bar{f} = \sum_{k=-\infty}^{\infty} \sum_i \bar{a}_{i,i+k}\, \bar{g}^k.$$
Write $b_k$ for $\sum_i a_{i,i+k}$. Then $b_k \in \mathcal{O}_{\mathbb{C}_l}$ and either
• $b_k$ is a unit for some $k < 0$, or
• $b_0$ is a unit and $b_k$ reduces to zero for all $k \ne 0$.
In either case we see that the supremum of $|f(x)|_l$ on $|g(x)|_l = r$ (i.e., on $|T_s(x)|_l = r$) is greater than or equal to 1, as desired.

LEMMA 2.4
If $1 > r > 1/l$, then there is a constant $C$ (depending on $k$, $N$, and $r$) such that for all $f \in M_k^{\ge r}(N)$ we have
$$|f|_r \le C \sup_{s,x} |f|_x,$$
where $s$ runs over $SS$ and where $x$ runs over elements of the residue disc of $s$ with $|T_s(x)|_l = r$.



Proof
If $l = 2$, reduce to the case when $5 \mid N$ by passing to a cover. By Lemma 2.3 we see that $|f^{l-1}/E^k|_l$ on $X_1(N)_{\ge r}$ achieves its supremum at some point $x$ that reduces to some $s \in SS$ and that satisfies $|T_s(x)|_l = r$. Thus for all $y \in X_1(N)_{\ge r}$ we have
$$|f|_y^{l-1}/|E|_y^k \le \sup_{s,x} |f|_x^{l-1}/|E|_x^k,$$
where $s$ and $x$ run over the sets described in the statement of the lemma. Hence
$$|f|_r^{l-1} \le |E|_r^k \sup_{s,x}\big(|f|_x^{l-1}/r^k\big),$$
where again $s$ and $x$ run over the sets described in the statement of the lemma. The lemma follows with $C = (|E|_r/r)^{k/(l-1)}$.

For each $s \in SS$, choose a local generator $f_s$ of $\omega^{\otimes k}_{X_1(N)}$ near $s$. If $f \in M_k^{\ge r}(N)$ and $s \in SS$, then restricting $f$ to the annulus $1 > |T_s(x)|_l \ge r$ in the residue disc of $s$ we see that $f/f_s$ can be expanded as
$$f/f_s = \sum_{n=-\infty}^{\infty} a_n(s, f) T_s^n,$$
where the $a_n(s, f)$ are bounded for $n > 0$ and where $|a_n(s, f)|_l r^n \to 0$ as $n \to -\infty$. Choose a nonnegative integer $M$ such that $r^{-M} > C$ (the constant from the lemma above), and choose $\pi_r \in \mathbb{C}_l$ with $|\pi_r|_l = r$. Now consider the map $\Theta$ from $M_k^{\ge r}(N)$ to the direct sum of $\#SS$ Tate algebras $\mathbb{C}_l\langle T \rangle^{SS}$ which sends $f$ to
$$\Big( \sum_{n=0}^{\infty} a_{M-n}(s, f)\, \pi_r^{M-n}\, T^n \Big)_{s \in SS}.$$
One clearly has $|\Theta(f)| \le |f|_r$. (Here, as usual, we set $|(\sum_n b_n(s) T^n)_{s \in SS}| = \sup_{s,n} |b_n(s)|_l$.) On the other hand, for all $n \in \mathbb{Z}$ and $s \in SS$ we have $|a_n(s, f)\pi_r^n|_l \le |\Theta(f)|$; if this were false, then we could choose $s$ and $n$ so that $|a_n(s, f)\pi_r^n|_l$ is maximal. Then we must have $n > M$ and we see that
$$|f|_r \ge |a_n(s, f)|_l = r^{-n} \sup_{s,x} |f|_x > C \sup_{s,x} |f|_x \ge |f|_r,$$
a contradiction. Thus
$$C|\Theta(f)| \ge C \sup_{s,x} |f|_x \ge |f|_r.$$
We deduce that $\Theta$ is a homeomorphism onto a closed subspace of $\mathbb{C}_l\langle T \rangle^{SS}$.



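LEMMA 2.5
Suppose that $1 > r_1 > r_2 > 1/l$. Then the natural inclusion
$$M_k^{\ge r_2}(N) \hookrightarrow M_k^{\ge r_1}(N)$$
is completely continuous.

Proof
We have a commutative diagram
$$\begin{array}{ccc}
M_k^{\ge r_2}(N) & \hookrightarrow & M_k^{\ge r_1}(N) \\
\downarrow & & \downarrow \\
\mathbb{C}_l\langle T \rangle^{SS} & \longrightarrow & \mathbb{C}_l\langle T \rangle^{SS} \\
\Big(\sum_{n=0}^{\infty} b_n(s) T^n\Big)_{s \in SS} & \longmapsto & \Big(\sum_{n=0}^{\infty} b_n(s)(\pi_{r_2}/\pi_{r_1})^n T^n\Big)_{s \in SS}
\end{array}$$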

where the vertical arrows are homeomorphisms onto closed subspaces (and where we have made the same choice of $M$ to define both vertical arrows). The lower horizontal arrow is a limit of continuous operators with finite range and hence completely continuous. It follows that the upper horizontal arrow is completely continuous.

The reduction $X_1(N; l) \times \mathbb{F}_l^{ac}$ of $X_1(N; l)$ has two irreducible components that we denote $X_1(N; l)^\infty$ and $X_1(N; l)^0$. We choose the labelling so that
• $\pi_1 : X_1(N; l)^\infty \xrightarrow{\ \sim\ } X_1(N) \times \mathbb{F}_l^{ac}$,
• $\pi_2 : X_1(N; l)^\infty \to X_1(N) \times \mathbb{F}_l^{ac}$ has degree $l$,
• $\pi_1 : X_1(N; l)^0 \to X_1(N) \times \mathbb{F}_l^{ac}$ has degree $l$, and
• $\pi_2 : X_1(N; l)^0 \xrightarrow{\ \sim\ } X_1(N) \times \mathbb{F}_l^{ac}$.
The two curves $X_1(N; l)^\infty$ and $X_1(N; l)^0$ intersect in a finite number of points which we denote $SS_l$. Then $\pi_1 : SS_l \xrightarrow{\sim} SS$ and $\pi_2 : SS_l \xrightarrow{\sim} SS$ are both bijections (see, for instance, [KM, Lem. 5.3.1] for these assertions). If $s \in SS_l$, we write $T_{s,i}$ for $\pi_i^* T_{\pi_i s}$.

LEMMA 2.6
If $s \in SS_l$, then $(X_1(N; l) \times W(\mathbb{F}_l^{ac}))^\wedge_s$ is isomorphic to
$$\mathrm{Spf}\, W(\mathbb{F}_l^{ac})[[T_{s,1}, T_{s,2}]]/\big((T_{s,1} - T_{s,2}^l)(T_{s,2} - T_{s,1}^l) - l u_s\big)$$
for some $u_s \in W(\mathbb{F}_l^{ac})[[T_{s,1}, T_{s,2}]]^\times$.

Proof
[KM, Th. 6.6.2] tells us that $(X_1(N; l) \times W(\mathbb{F}_l^{ac}))^\wedge_s \cong \mathrm{Spf}\, R$ for some 2-dimensional,



regular complete local ring $R$, which is flat over $W(\mathbb{F}_l^{ac})$. [KM, Th. 13.4.7] tells us that
$$R/lR \cong \mathbb{F}_l^{ac}[[T_{s,1}, T_{s,2}]]/\big((T_{s,1} - T_{s,2}^l)(T_{s,2} - T_{s,1}^l)\big).$$
Thus we have a surjection $W(\mathbb{F}_l^{ac})[[T_{s,1}, T_{s,2}]] \twoheadrightarrow R$ and the kernel must be generated by one element $f$ with
• $f \equiv (T_{s,1} - T_{s,2}^l)(T_{s,2} - T_{s,1}^l) \bmod l$, and
• $f \notin (l, T_{s,1}, T_{s,2})^2$.
The lemma follows.

COROLLARY 2.7
If $s \in SS_l$, then
$$(X_1(N; l) \times W(\mathbb{F}_l^{ac}))^\wedge_s \cong \mathrm{Spf}\, W(\mathbb{F}_l^{ac})[[X_1, X_2]]/(X_1 X_2 - l).$$

Proof
Take, for instance, $X_1 = (T_{s,1} - T_{s,2}^l)$ and $X_2 = (T_{s,2} - T_{s,1}^l) u_s^{-1}$.

For $r \in l^{\mathbb{Q}}$ and $1 \ge r > 1/l$, we define $X_1(N; l)^\infty_{\ge r}$ (resp., $X_1(N; l)^0_{\ge r}$) to be the admissible open subset of $X_1(N; l)^{an}$ consisting of
• all points of $X_1(N; l)^{an}$ which reduce to a point of $X_1(N; l)^\infty - SS_l$ (resp., $X_1(N; l)^0 - SS_l$), and
• all points $x \in X_1(N; l)^{an}$ which reduce to some $s \in SS_l$ and for which $|T_{s,1}(x) - T_{s,2}(x)^l|_l \ge r$ (resp., $|T_{s,2}(x) - T_{s,1}(x)^l|_l \ge r$).
If in fact $1 > r^2 > 1/l$ and $s \in SS_l$, then we let $U_s(r)$ denote the admissible open subset of $X_1(N; l)^{an}$ consisting of points that reduce to $s$ and that satisfy
$$|T_{s,1}(x) - T_{s,2}(x)^l|_l \le r \quad \text{and} \quad |T_{s,2}(x) - T_{s,1}(x)^l|_l \le r.$$
It is easy to check that these sets do not depend on the choice of $\{T_s\}$ as long as the choices satisfy $(1 \times \sigma^*)^*(T_{(1 \times \sigma^*)(s)}) = T_s$ for $\sigma \in \mathrm{Gal}(\mathbb{F}_l^{ac}/\mathbb{F}_l)$.



LEMMA 2.8
If $r_1, r_2, r_3 \in l^{\mathbb{Q}}$, $1 > r_1^2 > 1/l$, $r_1 > r_2 > 1/l$, and $r_1 > r_3 > 1/l$, then the sets
• $X_1(N; l)^\infty_{\ge r_2}$,
• $X_1(N; l)^0_{\ge r_3}$, and
• for each $s \in SS_l$ the set $U_s(r_1)$
form an admissible cover of $X_1(N; l)^{an}$ by connected admissible open subsets.

Proof
This seems to be very well known, but as we are unable to find a reference, let us sketch the argument. Take an affine Zariski cover $\bar{U}^0$, $\bar{U}^\infty$, and $\bar{U}_s$ for $s \in SS_l$ of $X_1(N; l) \times \mathbb{F}_l^{ac}$, where for $s \in SS_l$ we have $SS_l \cap \bar{U}_s = \{s\}$, where $\bar{U}^0 = X_1(N; l) \times \mathbb{F}_l^{ac} - X_1(N; l)^\infty$, and where $\bar{U}^\infty = X_1(N; l) \times \mathbb{F}_l^{ac} - X_1(N; l)^0$. Shrinking $\bar{U}_s$ if necessary, choose a regular function $\bar{x}_s^0$ on $\bar{U}_s$ which is identically zero on $X_1(N; l)^\infty \cap \bar{U}_s$ and nonzero on $(X_1(N; l)^0 \cap \bar{U}_s) - \{s\}$ with a simple zero at $s$. We can lift $\bar{x}_s^0$ to a function $x_s^0$ on some affine open subset of $X_1(N; l) \times W(\mathbb{F}_l^{ac})$ which intersects the special fibre in $\bar{U}_s$. Set $x_s^\infty = l/x_s^0$. In $(X_1(N; l) \times W(\mathbb{F}_l^{ac}))^\wedge_s$ we have
$$x_s^0 = \sum_{i=1}^{\infty} a_i X_2^i + l f = X_2\Big(\sum_{i=1}^{\infty} a_i X_2^{i-1} + X_1 f\Big);$$
that is, $x_s^0$ is $X_2$ times a unit (the same $X_1$, $X_2$ as in Corollary 2.7). Thus, again shrinking $\bar{U}_s$ if necessary, we may assume that $x_s^\infty$ reduces to a function that is regular on $\bar{U}_s$, identically zero on $X_1(N; l)^0 \cap \bar{U}_s$, and nonzero on $(X_1(N; l)^\infty \cap \bar{U}_s) - \{s\}$. Moreover, in $(X_1(N; l) \times W(\mathbb{F}_l^{ac}))^\wedge_s$, $x_s^\infty$ is a unit times $X_1$.

We let $U_\infty$ (resp., $U_0$, resp., $U_s$) denote the preimage in $X_1(N; l)^{an}$ of $\bar{U}^\infty$ (resp., $\bar{U}^0$, resp., $\bar{U}_s$). They form an admissible affinoid cover of $X_1(N; l)^{an}$. For $r \in l^{\mathbb{Q}}$ and $1 \ge r > 1/l$, set $U^0_{s,\ge r} \subset U_s$ (resp., $U^\infty_{s,\ge r} \subset U_s$) to be the locus where $|x_s^0|_l \ge r$ (resp., $|x_s^\infty|_l \ge r$). Note also that $U_s(r)$ is the subspace of $U_s$ where $|x_s^0|_l \le r$ and $|x_s^\infty|_l \le r$. Note that $X_1(N; l)^0_{\ge r}$ (resp., $X_1(N; l)^\infty_{\ge r}$) is the union of $U_0$ and $U^0_{s,\ge r}$ for $s \in SS_l$ (resp., $U_\infty$ and $U^\infty_{s,\ge r}$ for $s \in SS_l$). If $r_1, r_2, r_3 \in l^{\mathbb{Q}}$, $1 > r_1^2 > 1/l$, $r_1 > r_2 > 1/l$, and $r_1 > r_3 > 1/l$, then $U^\infty_{s,\ge r_2}$, $U^0_{s,\ge r_3}$, and $U_s(r_1)$ form an admissible affinoid cover of $U_s$. Thus $X_1(N; l)^\infty_{\ge r_2}$, $X_1(N; l)^0_{\ge r_3}$, and $U_s(r_1)$ for $s \in SS_l$ form an admissible open cover of $X_1(N; l)^{an}$.

It remains to show that for $r \in l^{\mathbb{Q}}$ and $1 \ge r > 1/l$ the spaces $X_1(N; l)^0_{\ge r}$ and $X_1(N; l)^\infty_{\ge r}$ are connected. To save on notation, we explain only the case of $X_1(N; l)^0_{\ge r}$. It suffices to check that $U_0$ and $U^0_{s,\ge r}$ for $s \in SS_l$ are all connected. This follows because in each case the reduction map gives a continuous map with connected fibres to a connected (in the Zariski topology) space.

If $r \in l^{\mathbb{Q}}$ and $1 \ge r > l^{-l/(1+l)}$, then it is easy to check that
$$\pi_1^{-1} X_1(N)_{\ge r} = X_1(N; l)^\infty_{\ge r} \amalg X_1(N; l)^0_{\ge r^{1/l}}$$
and
$$\pi_2^{-1} X_1(N)_{\ge r} = X_1(N; l)^\infty_{\ge r^{1/l}} \amalg X_1(N; l)^0_{\ge r}.$$



Moreover, $X_1(N; l)^\infty_{\ge r}$ and $X_1(N; l)^0_{\ge r^{1/l}}$ (resp., $X_1(N; l)^0_{\ge r}$ and $X_1(N; l)^\infty_{\ge r^{1/l}}$) form an admissible open cover of $\pi_1^{-1} X_1(N)_{\ge r}$ (resp., $\pi_2^{-1} X_1(N)_{\ge r}$). As $X_1(N; l) \to X_1(N)$ is finite flat of degree $l + 1$, the same is true of the analytifications. Thus
$$\pi_1 : X_1(N; l)^\infty_{\ge r} \amalg X_1(N; l)^0_{\ge r^{1/l}} \longrightarrow X_1(N)_{\ge r}$$
and
$$\pi_2 : X_1(N; l)^\infty_{\ge r^{1/l}} \amalg X_1(N; l)^0_{\ge r} \longrightarrow X_1(N)_{\ge r}$$
are both finite and flat of degree $l + 1$. Looking at the cardinality of the preimages of points, we deduce the following lemma.

LEMMA 2.9
(1) Suppose that $r \in l^{\mathbb{Q}}$ and $1 \ge r > l^{-l/(1+l)}$; then
$$\pi_1 : X_1(N; l)^\infty_{\ge r} \xrightarrow{\ \sim\ } X_1(N)_{\ge r} \quad \text{and} \quad \pi_2 : X_1(N; l)^0_{\ge r} \xrightarrow{\ \sim\ } X_1(N)_{\ge r}.$$
(2) Suppose that $r \in l^{\mathbb{Q}}$ and $1 \ge r > l^{-1/(1+l)}$; then
$$\pi_2 : X_1(N; l)^\infty_{\ge r} \longrightarrow X_1(N)_{\ge r^l} \quad \text{and} \quad \pi_1 : X_1(N; l)^0_{\ge r} \longrightarrow X_1(N)_{\ge r^l}$$
are both finite flat of degree $l$.

We define a bounded linear map
$$U = (1/l)\, \mathrm{tr}_{\pi_2} \circ j \circ \pi_1|^{-1}_{X_1(N;l)^\infty_{\ge r}} : M_k^{\ge r}(N) \longrightarrow M_k^{\ge r^l}(N).$$
One may check that $U$ is compatible with the map on $q$-expansions which sends
$$\sum_{n=0}^{\infty} a_n q^n \longmapsto \sum_{n=0}^{\infty} a_{nl} q^n.$$
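On truncated $q$-expansions the map $\sum a_n q^n \mapsto \sum a_{nl} q^n$ is simply "keep every $l$-th coefficient". The sketch below is a toy illustration: the function names and the test sequence $a_n = 3^{v_2(n)}$ are assumptions chosen here, not objects from the text. Any sequence with $a_{nl} = c\,a_n$ for all $n$ is an eigenvector of this map with eigenvalue $c$, which the example checks for $l = 2$ and $c = 3$.

```python
def u_operator(coeffs, l):
    """Send sum a_n q^n to sum a_{nl} q^n on a truncated coefficient list."""
    return coeffs[::l]

def two_adic_valuation(n):
    """Largest v with 2^v dividing the positive integer n."""
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return v

# Toy sequence with a_0 = 0 and a_{2n} = 3 * a_n for every n >= 1.
a = [0] + [3 ** two_adic_valuation(n) for n in range(1, 33)]
ua = u_operator(a, 2)

print(ua[:8])
print([3 * x for x in a[:8]])
```

Truncation costs precision: from coefficients $a_0, \dots, a_{32}$ one only recovers the first 17 coefficients of the image.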

Note that for $1 \ge r \ge l^{-l/(1+l)}$, using $\pi_1$ to identify $X_1(N; l)^\infty_{\ge r}$ and $X_1(N)_{\ge r}$, we get a map
$$\mathrm{Hom}(h_k(N; l), \mathbb{C}_l) \hookrightarrow M_k^{\ge r}(N)$$
which sends $f$ to the form with $q$-expansion at infinity,
$$\sum_{n=1}^{\infty} f(T(n)) q^n.$$
Under this map the Hecke operator $U_l$ corresponds to the linear map $U$.

Suppose that $1 > r > l^{-1/(1+l)}$. Combining $U : M_k^{\ge r}(N) \to M_k^{\ge r^l}(N)$ with the inclusion $M_k^{\ge r^l}(N) \hookrightarrow M_k^{\ge r}(N)$, we get a continuous endomorphism of $M_k^{\ge r}(N)$, which we also denote $U$. It follows from Lemma 2.5 that $U$ is completely continuous as an endomorphism of $M_k^{\ge r}(N)$. From the theory of completely continuous operators on $p$-adic Banach spaces (see [S1]), we see that we may write
$$M_k^{\ge r}(N) = M_k^{\ge r}(N)^0 \oplus M_k^{\ge r}(N)^1$$
as a direct sum of $U$-invariant subspaces, where $M_k^{\ge r}(N)^0$ is finite-dimensional, all the eigenvalues of $U|_{M_k^{\ge r}(N)^0}$ are $l$-adic units, and $U|_{M_k^{\ge r}(N)^1}$ is topologically nilpotent (i.e., if $f \in M_k^{\ge r}(N)^1$, then $U^t f \to 0$ as $t \to \infty$). We let $e$ denote projection onto the summand $M_k^{\ge r}(N)^0$, so that
$$e f = \lim_{t \to \infty} U^{t!} f.$$
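The limit $\lim U^{t!}$ can be watched converging in a finite-dimensional toy model. The sketch below replaces the Banach space by $(\mathbb{Z}/2^8)^2$ and $U$ by a hypothetical $2 \times 2$ integer matrix with one unit eigenvalue (3) and one non-unit eigenvalue (2); none of these choices comes from the text. Working modulo $2^8$, the power $U^{t!}$ stabilises for $t \ge 8$ to an idempotent that commutes with $U$ and projects onto the unit-eigenvalue line.

```python
from math import factorial

def mat_mul(a, b, mod):
    """Product of two square matrices with entries reduced modulo mod."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % mod for j in range(n)]
            for i in range(n)]

def mat_pow(a, e, mod):
    """a raised to the power e modulo mod, by repeated squaring."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = mat_mul(result, a, mod)
        a = mat_mul(a, a, mod)
        e >>= 1
    return result

def ordinary_projector(u, l, precision, t):
    """Approximate e = lim U^(t!) by computing U^(t!) modulo l^precision."""
    return mat_pow(u, factorial(t), l ** precision)

U = [[3, 1], [0, 2]]                  # eigenvalues 3 (2-adic unit) and 2 (non-unit)
E = ordinary_projector(U, 2, 8, 8)    # U^(8!) modulo 2^8

print(E)
print(mat_mul(E, E, 2 ** 8) == E)     # idempotent modulo 2^8
```

Factorials are used rather than plain powers because a unit eigenvalue $a$ satisfies $a^{t!} \to 1$ while $a^t$ need not converge.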

LEMMA 2.10
If $f \in M_k^{\ge r}(N)^0$ for some $1 > r > l^{-1/(1+l)}$, then
$$f \in M_k^{>l^{-l/(1+l)}}(N).$$

Proof
Choose a minimal integer $i$ such that $r^{l^i} \le l^{-1/(1+l)}$, and write $f = U^{i+1} f'$ for some $f' \in M_k^{\ge r}(N)^0$. Then we see that
$$U^i f' \in M_k^{\ge r^{l^i}}(N) \subset M_k^{>l^{-1/(1+l)}}(N)$$
and hence that
$$f = U(U^i f') \in M_k^{>l^{-l/(1+l)}}(N).$$

LEMMA 2.11
Suppose that $1 > r \ge l^{-1/(1+l)}$, that $f \in M_k^{\ge r}(N)$, that $a \in \mathbb{C}_l$ is an $l$-adic unit, and that $\epsilon \in \mathbb{R}_{>0}$. If
$$|U f - a f|_1 \le \epsilon,$$
then
$$|f - e f|_1 \le \epsilon.$$

Proof
For all positive integers $t$, we see that $|U^{t!} f - a^{t!} f|_1 \le \epsilon$. Taking the limit as $t \to \infty$ and noting that $|\ |_1 \le |\ |_r$, the lemma follows. (We remark that $a^{t!} \to 1$ as $t \to \infty$.)

LEMMA 2.12
Suppose we are given an integer $k$ and a formal $q$-expansion
$$\sum_{n=1}^{\infty} a_n q^n \in \mathbb{C}_l[[q]]$$
such that for all $n$ we have $a_{nl} = a_l a_n$ and such that $a_l$ is an $l$-adic unit. Suppose we also have two series of positive integers $t_i$ and $k_i$ and a series of abelian group homomorphisms $f_i : h_{k_i}(N; l) \to \mathbb{C}_l$ such that
(1) $t_i \to \infty$ as $i \to \infty$,
(2) $k_i \equiv k \bmod (l-1)l^{t_i - 1}$, and
(3) for all positive integers $n$ and for all $i$, we have $f_i(T(n)) \equiv a_n \bmod l^{t_i}$.
Then $\sum a_n q^n$ is the $q$-expansion at infinity of an element of $M_k^{>l^{-l/(1+l)}}(N)$.

Proof
By Lemma 2.10 we need only show that $\sum a_n q^n$ is the $q$-expansion at infinity of an element of $M_k^{\ge r}(N)$ for some $r < 1$. Choose such an $r$ with $r > l^{-1/4}$ and $r > l^{-1/(1+l)}$. We may suppose that each $t_i \ge 3$. Set $h = 4$ if $l = 2$, and $h = l - 1$ otherwise. Then $f_i$ corresponds to an element of $M_{k_i}^{\ge r}(N)$ which we also denote by $f_i$. Moreover, $f_i/(E')^{(k_i - k)/h} \in M_k^{\ge r}(N)$ and has $q$-expansion at infinity congruent to $\sum_n a_n q^n$ modulo $l^{t_i}$. (If $l = 2$, note that $E'$ is congruent to 1 modulo $2^4$.) Thus $e(f_i/(E')^{(k_i - k)/h}) \in M_k^{\ge r}(N)^0$ also has $q$-expansion at infinity congruent to $\sum_n a_n q^n$ modulo $l^{t_i}$. As $M_k^{\ge r}(N)^0$ is finite-dimensional, all $l$-adic norms are equivalent. The $e(f_i/(E')^{(k_i - k)/h})$ form a Cauchy sequence for $|\ |_1$ and hence also for $|\ |_r$. Let $f \in M_k^{\ge r}(N)^0$ denote the limit of the $e(f_i/(E')^{(k_i - k)/h})$ in both of these norms. Then $f$ has $q$-expansion at infinity $\sum_n a_n q^n$, as desired.

Finally, we state the generalisation of [BT, Th. 4] to $l = 2$ and 3. Although in [BT] there is a running hypothesis that $l \ge 5$, the proof of this theorem given there makes no use of that hypothesis.



THEOREM 2.13
Let $N$ and $k$ denote integers with $N \ge 5$. Let $l \nmid N$ be a prime. Suppose $\alpha$ and $\beta$ are distinct nonzero elements of $\mathbb{C}_l$ and that $f_\alpha, f_\beta \in M_k^{>l^{-l/(1+l)}}(N)$ are eigenvectors for $U$ with eigenvalues $\alpha$ and $\beta$. Suppose also that $f_\alpha$ (resp., $f_\beta$) have $q$-expansions at infinity, $\sum_{n \ge 1} a_n(f_\alpha) q^n$ (resp., $\sum_{n \ge 1} a_n(f_\beta) q^n$), and that for all positive integers $n$ not divisible by $l$, we have
$$a_n(f_\alpha) = a_n(f_\beta).$$
Then $f = (\alpha f_\alpha - \beta f_\beta)/(\alpha - \beta)$ is classical; that is, there is an abelian group homomorphism $f' : h_k(N) \to \mathbb{C}_l$ such that for all $n$,
$$f'(T(n)) = (\alpha a_n(f_\alpha) - \beta a_n(f_\beta))/(\alpha - \beta).$$

3. 2-adic Hida theory and deformation theory
In this section we draw together some results about 2-adic Hida theory which are not well documented in the literature, and we deduce some slight extensions of the results of [Di]. If $N \ge 5$ is an odd positive integer, we let $h^0(N)$ denote
$$\varprojlim_r e(h_2(2^r N) \otimes_{\mathbb{Z}} \mathbb{Z}_2),$$
where $e$ denotes H. Hida's idempotent
$$e = \lim_{t \to \infty} U_2^{t!}.$$
Taking the limit of the homomorphisms
$$\langle\ \rangle : (\mathbb{Z}/2^r N\mathbb{Z})^\times \longrightarrow e(h_2(2^r N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)^\times,$$
we get a continuous homomorphism
$$S = S^2 \times S_2 : (\mathbb{Z}/N\mathbb{Z})^\times \times \mathbb{Z}_2^\times \longrightarrow h^0(N)^\times.$$
We let $\Lambda$ denote the completed group ring $\mathbb{Z}_2[[(1 + 4\mathbb{Z}_2)]] \cong \mathbb{Z}_2[[T]]$, where $T + 1$ is identified with the element 5 of $1 + 4\mathbb{Z}_2$. Then $S_2$ induces a continuous homomorphism $\Lambda \to h^0(N)$. According to [Hi, Ths. 3.3 and 3.4], $h^0(N)$ is a finitely generated, torsion-free $\Lambda$-module and for any integer $k \ge 2$ we have a surjection
$$h^0(N)/(S_2(5) - 5^{k-2}) \twoheadrightarrow e(h_k(4N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)$$
which sends $T(n)$ to $T(n)$ for all $n$ and which becomes an isomorphism after tensoring with $\mathbb{Z}_2$'s fraction field $\mathbb{Q}_2$.
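The identification of $T + 1$ with $5$ relies on $5$ being a topological generator of $1 + 4\mathbb{Z}_2$. At finite level this says that the powers of 5 modulo $2^m$ are exactly the residues congruent to 1 modulo 4, which can be verified directly. The function name below is an illustrative choice, and only small $m$ are tested.

```python
def powers_of_five(m):
    """Return the set {5^i mod 2^m : i >= 0}."""
    modulus = 2 ** m
    seen, x = set(), 1
    while x not in seen:
        seen.add(x)
        x = (x * 5) % modulus
    return seen

for m in range(2, 11):
    one_mod_four = {x for x in range(2 ** m) if x % 4 == 1}
    print(m, powers_of_five(m) == one_mod_four, len(one_mod_four))
```

For $m \ge 2$ the set has $2^{m-2}$ elements, so 5 has order $2^{m-2}$ in $(\mathbb{Z}/2^m\mathbb{Z})^\times$.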



Set $e_\pm = (1 \pm S_2(-1))/2$ and $h^0(N)_\pm = e_\pm h^0(N) \subset h^0(N) \otimes_{\mathbb{Z}_2} \mathbb{Q}_2$. Then
$$h^0(N) \subset h^0(N)_+ \oplus h^0(N)_- \subset (1/2) h^0(N).$$
Thus we see that $h^0(N)_\pm$ are finitely generated torsion-free $\Lambda$-modules, and so from the structure theory of finitely generated $\Lambda$-modules we see that we have exact sequences of $\Lambda$-modules
$$(0) \longrightarrow h^0(N)_\pm \longrightarrow \Lambda^{r_\pm} \longrightarrow X_\pm \longrightarrow (0),$$
where $r_\pm$ are nonnegative integers and where $X_\pm$ have finite cardinality $2^{a_\pm}$.

LEMMA 3.1
If $k \ge 2$ is an integer with the same parity as $(1 \mp 1)/2$, then there is a surjection
$$h^0(N)_\pm/(S_2(5) - 5^{k-2}) \twoheadrightarrow e(h_k(2N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)$$
which sends $T(n)$ to $T(n)$ for all $n$, and with kernel of finite order divisible by $2^{a_\pm}$.

Proof
Observe first that $U_2$ maps the space of modular forms of weight $k$ and level $\Gamma_1(2N) \cap \Gamma_0(4)$ to the space of modular forms of weight $k$ and level $\Gamma_1(2N)$ (cf., for instance, [Hi, Prop. 8.3]). We deduce that for $k \equiv (1 \mp 1)/2 \bmod 2$ we have an equality
$$e e_\pm(h_k(4N) \otimes \mathbb{Q}_2) = e(h_k(2N) \otimes \mathbb{Q}_2),$$
and the lemma follows.

Similarly, set $h_2(4N)_- = e_- h_2(4N) \subset h_2(4N) \otimes \mathbb{Q}$.

LEMMA 3.2
Suppose that $f : h^0(N)_- \to \mathbb{Q}_2^{ac}$ is a continuous $\mathbb{Z}_2$-algebra homomorphism such that $f(S_2(5)) = 1/5$. Then
$$\sum_{n=1}^{\infty} f(T(n)) q^n$$
is the $q$-expansion at infinity of an element of $M_1^{>2^{-2/3}}(N)$.

Proof
For each integer $r \ge 1$, set $k(r) = 1 + 2^{a_- + r}$. Then we can find a continuous homomorphism of $\mathbb{Z}_2$-modules
$$f_r : e(h_{k(r)}(2N) \otimes_{\mathbb{Z}} \mathbb{Z}_2) \longrightarrow \mathbb{Q}_2^{ac}$$
such that $f_r(T(n)) \equiv f(T(n)) \bmod 2^{r+2}$ for all $n$. The lemma follows from Lemma 2.12.

Suppose that $k \ge 2$ is an integer. If $\wp$ is a minimal prime ideal of $h^0(N)_\pm$ containing $S_2(5) - 5^{k-2}$, then $h^0(N)_\pm/\wp$ is a 1-dimensional integral domain. Thus $\wp$ contains
$$\ker\big(h^0(N)_\pm \twoheadrightarrow e e_\pm(h_k(4N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)\big).$$
Thus contraction gives a bijection between prime ideals of $e e_\pm(h_k(4N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)$ and prime ideals of $h^0(N)_\pm$ containing $S_2(5) - 5^{k-2}$, and hence also a bijection between maximal ideals of $e e_\pm(h_k(4N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)$ and maximal ideals of $h^0(N)_\pm$. Hence to any maximal ideal $\mathfrak{m}$ of $h^0(N)_\pm$ we can associate a continuous semisimple representation
$$\bar{\rho}_{\mathfrak{m}} : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow GL_2(h^0(N)_\pm/\mathfrak{m})$$
such that for all but finitely many primes $p$ we have
• $\mathrm{tr}\, \bar{\rho}_{\mathfrak{m}}(\mathrm{Frob}_p) = T_p$, and
• $\det \bar{\rho}_{\mathfrak{m}}(\mathrm{Frob}_p) = pS(p)$.
We call $\mathfrak{m}$ Eisenstein if $\bar{\rho}_{\mathfrak{m}}$ is not absolutely irreducible. Note that the intersection over all integers $k \ge 2$ with $k \equiv (1 \mp 1)/2$ of
$$\ker\big(h^0(N)_\pm \twoheadrightarrow e(h_k(2N) \otimes \mathbb{Q}_2)\big)$$
equals
$$\bigcap_k \big(S_2(5) - 5^{k-2}\big)\Lambda^{r_\pm} \cap h^0(N) = (0).$$
Thus
$$h^0(N)_\pm = \varprojlim\, h^0(N)_\pm \Big/ \bigcap_{k \in K} \ker\big(h^0(N)_\pm \twoheadrightarrow e(h_k(2N) \otimes \mathbb{Z}_2)\big),$$
where the inverse limit is over finite sets $K$ of integers $k \ge 2$ with $k \equiv (1 \mp 1)/2 \bmod 2$. For each $k \ge 2$ there is a continuous 2-dimensional pseudorepresentation (see [Ta1] for the definition of pseudorepresentation)
$$T : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow e(h_k(2N) \otimes \mathbb{Z}_2)$$
such that for all primes $p \nmid 2N$ the pseudorepresentation $T$ is trivial on $I_p$, and, moreover, $T(\mathrm{Frob}_p) = T_p$ and $T(\mathrm{Frob}_p^2) = T_p^2 - 2pS(p)$. We remind the reader that $I_p$ is the inertia group at $p$, and to say that $T$ is trivial on $I_p$ means that $T(\sigma\tau) = T(\tau)$ for all $\sigma \in I_p$ and $\tau \in \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q})$. By the Chebotarev density theorem, we see that $T$ is



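uniquely defined by these properties. Thus for any finite set $K$ as in the last paragraph we get a continuous pseudorepresentation
$$T : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow h^0(N)_\pm \Big/ \bigcap_{k \in K} \ker\big(h^0(N)_\pm \twoheadrightarrow e(h_k(2N) \otimes \mathbb{Z}_2)\big) \subset \bigoplus_{k \in K} e(h_k(2N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)$$
such that for all primes $p \nmid 2N$ the pseudorepresentation $T$ is trivial on $I_p$, $T(\mathrm{Frob}_p) = T_p$, and $T(\mathrm{Frob}_p^2) = T_p^2 - 2pS(p)$. Taking the limit over $K$, we find a continuous pseudorepresentation
$$T : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow h^0(N)_\pm$$
such that for all primes $p \nmid 2N$ the pseudorepresentation $T$ is trivial on $I_p$, $T(\mathrm{Frob}_p) = T_p$, and $T(\mathrm{Frob}_p^2) = T_p^2 - 2pS(p)$. By [N, main theorem] (see also [Ro]), we see that if $\mathfrak{m}$ is a non-Eisenstein maximal ideal of $h^0(N)_\pm$, then there is a continuous representation
$$\rho_{\mathfrak{m}}^{ord} : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow GL_2(h^0(N)_{\pm,\mathfrak{m}})$$
such that $\rho_{\mathfrak{m}}^{ord}$ is unramified at all primes $p \nmid 2N$ and satisfies
• $\mathrm{tr}\, \rho_{\mathfrak{m}}^{ord}(\mathrm{Frob}_p) = T_p$, and
• $\det \rho_{\mathfrak{m}}^{ord}(\mathrm{Frob}_p) = pS(p)$.
It is known (by [De] or [W, Th. 2.1.4]) that $\bar{\rho}_{\mathfrak{m}}|^{ss}_{\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)}$ is unramified. We suppose that $\bar{\rho}_{\mathfrak{m}}|^{ss}_{\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)}(\mathrm{Frob}_2)$ has two distinct eigenvalues $\alpha$ and $\beta$. Then it is also known that $\alpha, \beta \in h^0(N)_\pm/\mathfrak{m}$ and that either $U_p - \alpha \in \mathfrak{m}$ or $U_p - \beta \in \mathfrak{m}$ (see [De] or [W, Th. 2.1.4]). We suppose it is the former. Choose an element $\sigma_0 \in \mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$ above $\mathrm{Frob}_2$. It follows from Hensel's lemma that $\rho_{\mathfrak{m}}^{ord}(\sigma_0)$ has distinct eigenvalues $A$ and $B$ in $h^0(N)_{\pm,\mathfrak{m}}$ with $A \equiv \alpha \bmod \mathfrak{m}$ and $B \equiv \beta \bmod \mathfrak{m}$. Choose a basis $(e_B, e_A)$ of $h^0(N)^2_{\pm,\mathfrak{m}}$ consisting of eigenvectors of $\rho_{\mathfrak{m}}^{ord}(\sigma_0)$ with eigenvalues $B$ and $A$, respectively. With respect to this basis, write
$$\rho_{\mathfrak{m}}^{ord}(\sigma) = \begin{pmatrix} a(\sigma) & b(\sigma) \\ c(\sigma) & d(\sigma) \end{pmatrix}.$$
Also, write
• $\psi_a$ for the unramified character of $\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$ which takes $\mathrm{Frob}_2$ to $a$,
• $\chi_2$ for the 2-adic cyclotomic character, and
• $S$ for the composite
$$\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{Gal}(\mathbb{Q}(\mu_{2^\infty N})/\mathbb{Q}) \xrightarrow{\ \sim\ } \mathbb{Z}_2^\times \times (\mathbb{Z}/N\mathbb{Z})^\times \xrightarrow{\ S\ } h^0(N)^\times.$$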


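Then by [W, Th. 2.1.4] we see that for any integer $k \ge 2$ with $k \equiv (1 \mp 1)/2 \bmod 2$ and for any $\sigma \in \mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$, we have
• $a(\sigma) \equiv (\chi_2 S \psi_{U_2}^{-1})(\sigma)$,
• $c(\sigma) \equiv 0$, and
• $d(\sigma) \equiv \psi_{U_2}(\sigma)$,
all modulo $\ker\big(h^0(N)_{\pm,\mathfrak{m}} \twoheadrightarrow e(h_k(2N) \otimes_{\mathbb{Z}} \mathbb{Z}_2)_{\mathfrak{m}}\big)$. We conclude that
$$\rho_{\mathfrak{m}}^{ord}|_{\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)} \sim \begin{pmatrix} \chi_2 S \psi_{U_2}^{-1} & * \\ 0 & \psi_{U_2} \end{pmatrix}$$
and that $A = U_2$.

Now, suppose that
$$\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow GL_2(\mathbb{F}_2^{ac})$$
is a continuous representation such that
• $\rho(c) \ne 1$,
• $\rho|^{ss}_{\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)}$ is unramified and $\rho|^{ss}_{\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)}(\mathrm{Frob}_2)$ has distinct eigenvalues $\alpha$ and $\beta$,
• $\rho|_{\mathrm{Gal}(\mathbb{Q}(\sqrt{-1})^{ac}/\mathbb{Q}(\sqrt{-1}))}$ is irreducible, and
• such that there exists an odd integer $N \ge 5$ and a homomorphism $f : h_2(N) \to \mathbb{F}_2^{ac}$ satisfying
(1) $f(T_2) = \alpha$,
(2) $f(T_p) = \mathrm{tr}\, \rho(\mathrm{Frob}_p)$ for all primes $p \nmid 2N$, and
(3) $f(pS(p)) = \det \rho(\mathrm{Frob}_p)$ for all primes $p \nmid 2N$.
We let $N(\rho)$ denote the conductor of $\rho$. Suppose also that $S$ is a finite set of odd primes containing all the primes where $\rho$ ramifies and some prime $p \ge 5$. Then set
$$N_S(\rho) = N(\rho) \prod_{p \in S} p^{\dim \rho^{I_p}}.$$
It follows from [Buz, Th. 3.1] that we can find a ring homomorphism $h_2(2N_S(\rho)) \to \mathbb{F}_2^{ac}$ such that
• $U_2$ maps to $\alpha$,
• $U_p$ maps to zero if $p \in S$,
• $T_p$ maps to $\mathrm{tr}\, \rho(\mathrm{Frob}_p)$ if $p \nmid 2N_S(\rho)$, and
• $pS(p)$ maps to $\det \rho(\mathrm{Frob}_p)$ if $p \nmid 2N_S(\rho)$.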



(It is here we use the fact that $\rho|_{\mathrm{Gal}(\mathbb{Q}(\sqrt{-1})^{ac}/\mathbb{Q}(\sqrt{-1}))}$ is irreducible, rather than the weaker assumption that $\rho$ is irreducible.) We let $\mathfrak{m}_S(\rho,\alpha)_+$ denote the kernel of this homomorphism.

LEMMA 3.3
Keep the above notation and assumptions.
(1) There is a ring homomorphism $h_2(4N_S(\rho))_- \to \mathbb{F}_2^{ac}$ such that
• $U_2$ maps to $\alpha$,
• $U_p$ maps to zero if $p \in S$,
• $T_p$ maps to $\operatorname{tr} \rho(\mathrm{Frob}_p)$ if $p \nmid 2N_S(\rho)$, and
• $pS(p)$ maps to $\det \rho(\mathrm{Frob}_p)$ if $p \nmid 2N_S(\rho)$.
We denote its kernel $\mathfrak{m}_S(\rho,\alpha)_-$.
(2) There is a surjection

$$h_2(4N_S(\rho))_{-,\mathfrak{m}_S(\rho,\alpha)_-}/(2) \twoheadrightarrow h_2(2N_S(\rho))_{\mathfrak{m}_S(\rho,\alpha)_+}/(2)$$

which takes $T(n)$ to $T(n)$ for all $n$.

Proof
Let $\mathbb{T}$ denote the polynomial algebra over $\mathbb{Z}_2$ generated by variables $t_p$ and $s_p$ for $p \nmid 2N_S(\rho)$ and $u_p$ for $p \mid 2N_S(\rho)$. Then there is a natural map $\mathbb{T} \to h_2(2N_S(\rho))_{\mathfrak{m}_S(\rho,\alpha)_+}/(2)$ which sends $t_p$ to $T_p$, and so on. Let $\mathfrak{m}$ denote the pullback of $\mathfrak{m}_S(\rho,\alpha)_+$. It is a maximal ideal of $\mathbb{T}$. Let $Y$ denote the open (i.e., with the cusps removed) modular curve of level $\Gamma_1(2N_S(\rho)) \cap \Gamma_0(4)$. Let $\epsilon$ denote the character of $\Gamma_1(4N_S(\rho))/(\Gamma_1(2N_S(\rho)) \cap \Gamma_0(4))$ of order 2, thought of as a character of the fundamental group of $Y$. It is known that

$$H^1(Y, \mathbb{Z}_2)_{\mathfrak{m}} \cong h_2(2N_S(\rho))^2_{\mathfrak{m}_S(\rho,\alpha)_+},$$

where $\mathbb{T}$ acts on the cohomology by sending $t_p$ to $T_p$, and so on (see [Gr, Prop. 12.10]). Because $H^2(Y, \mathbb{Z}_2) = (0)$ (as $Y$ is affine), we conclude that

$$H^1(Y, \mathbb{F}_2)_{\mathfrak{m}} \cong \big(h_2(2N_S(\rho))_{\mathfrak{m}_S(\rho,\alpha)_+}/(2)\big)^2.$$

Thus to prove the lemma it suffices to see that the action of $\mathbb{T}$ on $H^1(Y, \mathbb{F}_2)_{\mathfrak{m}}$ factors through $h_2(4N_S(\rho))_{-,\mathfrak{m}_S(\rho,\alpha)_-}$. However,

$$H^1(Y, \mathbb{F}_2)_{\mathfrak{m}} = H^1(Y, \mathbb{F}_2(\epsilon))_{\mathfrak{m}} \cong H^1(Y, \mathbb{Z}_2(\epsilon))_{\mathfrak{m}} \otimes \mathbb{F}_2$$

because $H^2(Y, \mathbb{Z}_2(\epsilon)) = (0)$ (as $Y$ is affine). Finally, the action of $\mathbb{T}$ on $H^1(Y, \mathbb{Z}_2(\epsilon))_{\mathfrak{m}}$ factors through $h_2(4N_S(\rho))_{-,\mathfrak{m}_S(\rho,\alpha)_-}$ because $H^1(Y, \mathbb{Z}_2(\epsilon))_{\mathfrak{m}}$ is torsion-free (because in turn $H^0(Y, \mathbb{F}_2(\epsilon))_{\mathfrak{m}} = H^0(Y, \mathbb{F}_2)_{\mathfrak{m}} = (0)$, as $\mathfrak{m}$ is non-Eisenstein).



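We remark that by our choice of $N_S(\rho)$, for $p \in S$ we have $U_p = 0$ in each of $h_2(2N_S(\rho))_{\mathfrak{m}_S(\rho,\alpha)_+}$, $h_2(4N_S(\rho))_{-,\mathfrak{m}_S(\rho,\alpha)_-}$, and $h^0(N_S(\rho))_{\pm,\mathfrak{m}_S(\rho,\alpha)_\pm}$. To see this it suffices to check that for $p \in S$ we have $U_p = 0$ in $h_k(4N_S(\rho))_{\mathfrak{m}_S(\rho,\alpha)_\pm}$ whenever $k \ge 2$ and $k \equiv (1 \mp 1)/2 \bmod 2$. This is, however, standard (see, e.g., [CDT, Cor. 4.2.3 and proof of Lem. 5.1.1]).

We let

$$\rho^{(2)}_{S,\alpha,\pm} : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{GL}_2(R^{(2)}_{S,\alpha,\pm})$$

denote the universal deformation of $\rho$ to a representation that is unramified outside $S \cup \{2\}$ and that when restricted to $\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$ is of the form

$$\begin{pmatrix} \phi_1 & * \\ 0 & \phi_2 \end{pmatrix},$$

where $\phi_2$ is unramified and $\phi_2(\mathrm{Frob}_2) \equiv \alpha$ modulo the maximal ideal, and where, thinking of $\phi_1$ as a character of $\mathbb{Q}_2^\times$ by local class field theory, we have $\phi_1(-1) = \mp 1$ and $\phi_1(x) = x$ for all $x \in (1 + 4\mathbb{Z}_2)$. Similarly, we let

$$\rho^{\mathrm{ord}}_{S,\alpha,\pm} : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{GL}_2(R^{\mathrm{ord}}_{S,\alpha,\pm})$$

denote the universal deformation of $\rho$ to a representation that is unramified outside $S \cup \{2\}$ and that when restricted to $\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$ is of the form

$$\begin{pmatrix} \phi_1 & * \\ 0 & \phi_2 \end{pmatrix},$$

where $\phi_2$ is unramified and $\phi_2(\mathrm{Frob}_2) \equiv \alpha$ modulo the maximal ideal, and where, thinking of $\phi_1$ as a character of $\mathbb{Q}_2^\times$ by local class field theory, we have $\phi_1(-1) = \mp 1$. The character $\phi_1 \chi_2^{-1}$ gives a continuous homomorphism, which we denote $S_2$,

$$(1 + 4\mathbb{Z}_2) \longrightarrow (R^{\mathrm{ord}}_{S,\alpha,\pm})^\times$$

and so makes $R^{\mathrm{ord}}_{S,\alpha,\pm}$ into a $\Lambda$-algebra. From the definitions one sees that

$$R^{(2)}_{S,\alpha,\pm} = R^{\mathrm{ord}}_{S,\alpha,\pm}/(S_2(5) - 1).$$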
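One way to see why a single relation suffices is sketched below. It uses the standard fact that 5 topologically generates $1 + 4\mathbb{Z}_2$, and it assumes the normalisation of class field theory is such that $\chi_2(x) = x$ on $1 + 4\mathbb{Z}_2$; that normalisation is inferred from the displayed identity rather than stated here.

```latex
% Standard fact: 1 + 4\mathbb{Z}_2 is procyclic, topologically generated by 5.
\[
1+4\mathbb{Z}_2 \;=\; \overline{\langle 5\rangle} \;=\; 5^{\mathbb{Z}_2},
\qquad\text{so for continuous } S_2:\quad
S_2\big|_{1+4\mathbb{Z}_2} = 1 \iff S_2(5) = 1 .
\]
% Assumed normalisation: \chi_2(x) = x for x \in 1+4\mathbb{Z}_2, so that
% \phi_1(x) = x on 1+4\mathbb{Z}_2 is the same as S_2 = \phi_1\chi_2^{-1} being trivial there.
\[
\phi_1(x) = x \ \ \forall x\in 1+4\mathbb{Z}_2
\iff S_2(5) - 1 = 0 .
\]
```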

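From the universal properties we get maps
• $R^{(2)}_{S,\alpha,+} \longrightarrow h_2(2N_S(\rho))_{\mathfrak{m}(\rho,\alpha)_+}$,
• $R^{(2)}_{S,\alpha,-} \longrightarrow h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$, and
• $R^{\mathrm{ord}}_{S,\alpha,\pm} \longrightarrow h^0(N_S(\rho))_{\pm,\mathfrak{m}(\rho,\alpha)_\pm}$,
which we claim are surjections. To see this, note that $U_p = 0$ if $p \in S$, that $T_p = \operatorname{tr} \rho^{(2)}_{\mathfrak{m}(\rho,\alpha)_\pm}(\mathrm{Frob}_p)$ or $\operatorname{tr} \rho^{\mathrm{ord}}_{\mathfrak{m}(\rho,\alpha)_\pm}(\mathrm{Frob}_p)$ is in the image for $p \nmid 2N_S(\rho)$, that $S(p)$ is similarly in the image for $p \nmid 2N_S(\rho)$, and that $U_2$ is in the image by Hensel's lemma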



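(as it is an eigenvalue for an element of $\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)$ above $\mathrm{Frob}_2$ in one of these representations). [Di, Th. 4 and Prop. 6] show that the map

$$R^{(2)}_{S,\alpha,+} \longrightarrow h_2(2N_S(\rho))_{\mathfrak{m}(\rho,\alpha)_+}$$

is an isomorphism.

PROPOSITION 3.4
The natural maps

$$R^{(2)}_{S,\alpha,-} \twoheadrightarrow h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$$

and

$$R^{\mathrm{ord}}_{S,\alpha,\pm} \twoheadrightarrow h^0(N_S(\rho))_{\pm,\mathfrak{m}(\rho,\alpha)_\pm}$$

are isomorphisms.

Proof
Consider the first of these maps. We have a commutative diagram

$$\begin{array}{ccc}
R^{(2)}_{S,\alpha,-}/(2) & \twoheadrightarrow & h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(2) \\
\downarrow & & \downarrow \\
R^{(2)}_{S,\alpha,+}/(2) & \xrightarrow{\ \sim\ } & h_2(2N_S(\rho))_{\mathfrak{m}(\rho,\alpha)_+}/(2)
\end{array}$$

where the left-hand vertical arrow is an isomorphism from the definitions. Thus

$$R^{(2)}_{S,\alpha,-}/(2) \xrightarrow{\ \sim\ } h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(2),$$

and, because $h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$ is torsion-free over $\mathbb{Z}_2$, we deduce that

$$R^{(2)}_{S,\alpha,-} \xrightarrow{\ \sim\ } h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}.$$

Now the composite

$$R^{\mathrm{ord}}_{S,\alpha,-}/(S_2(5) - 1) \twoheadrightarrow h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(S_2(5) - 1) \twoheadrightarrow h_2(4N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$$

is an isomorphism, and so

$$R^{\mathrm{ord}}_{S,\alpha,-}/(S_2(5) - 1) \xrightarrow{\ \sim\ } h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}/(S_2(5) - 1).$$

Because $h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}$ is $\Lambda$-torsion-free, we deduce that

$$R^{\mathrm{ord}}_{S,\alpha,-} \xrightarrow{\ \sim\ } h^0(N_S(\rho))_{-,\mathfrak{m}(\rho,\alpha)_-}.$$

The same argument also shows that

$$R^{\mathrm{ord}}_{S,\alpha,+} \xrightarrow{\ \sim\ } h^0(N_S(\rho))_{+,\mathfrak{m}(\rho,\alpha)_+}.$$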



Putting Proposition 3.4 together with Corollary 1.2 and Lemma 3.2, we obtain the following corollary.

COROLLARY 3.5
Suppose that $K/\mathbb{Q}_2$ is a finite extension with ring of integers $\mathcal{O}_K$, maximal ideal $\wp_K$, and residue field containing $\mathbb{F}_4$. Suppose also that

$$\rho : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{GL}_2(\mathcal{O}_K)$$

is a continuous representation such that
(1) $(\rho \bmod \wp_K)$ has image equal to a conjugate of $\mathrm{SL}_2(\mathbb{F}_4) \subseteq \mathrm{SL}_2(\mathcal{O}_K/\wp_K)$,
(2) $(\rho \bmod \wp_K)(c) \neq 1$,
(3) $(\rho \bmod \wp_K)$ is unramified at 5,
(4) $\rho$ is ramified at only finitely many primes,
(5) $\rho$ is unramified at 2 and $\rho(\mathrm{Frob}_2)$ has eigenvalues $\alpha$ and $\beta$ in $\mathcal{O}_K$ with distinct reduction modulo $\wp_K$.
Then there exists an odd integer $N \ge 5$ divisible by all primes at which $\rho$ ramifies and a normalised eigenform $f_\alpha \in M_1^{>2^{-2/3}}(N)$ such that
• $T_p f_\alpha = (\operatorname{tr} \rho(\mathrm{Frob}_p)) f_\alpha$ for all primes $p \nmid 2N$,
• $pS(p) f_\alpha = (\det \rho(\mathrm{Frob}_p)) f_\alpha$ for all primes $p \nmid 2N$,
• $U_2 f_\alpha = \alpha f_\alpha$, and
• $U_p f_\alpha = 0$ for all $p \mid N$.

(We remark that it is presumably not hard to weaken the fifth assumption to simply require that $\rho|^{ss}_{\mathrm{Gal}(\mathbb{Q}_2^{ac}/\mathbb{Q}_2)}$ be unramified and that $\alpha$ be an eigenvalue of $\rho^{I_2}(\mathrm{Frob}_2)$. We do not do so, as we do not need this result.)

4. The main theorem
We now turn to the proof of Theorem A. By the previous work cited in the introduction, it suffices to check the following special case, which is our only contribution.

THEOREM 4.1
Suppose that $K/\mathbb{Q}$ is a Galois extension with Galois group $A_5$. Suppose also that
• 2 is unramified in $K$ and $\mathrm{Frob}_2 \in \mathrm{Gal}(K/\mathbb{Q})$ has order 3,
• 5 is unramified in $K$, and
• $K$ is not totally real.
If $r : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{GL}_2(\mathbb{C})$ is a continuous icosahedral representation such that $\operatorname{proj} r$ factors through $\mathrm{Gal}(K/\mathbb{Q})$, then there is a weight 1 newform $f$ such that for all prime numbers $p$ the $p$th Fourier coefficient of $f$ equals the trace of Frobenius at $p$ on the space of coinvariants for the inertia group at $p$ in the representation $r$. In



particular, the Artin L-series for $r$ is the Mellin transform of a weight 1 newform and is an entire function.

Proof
Twisting $r$ by a character of finite order, we may suppose that the image of $\det r$ has two-power order, that $r$ is unramified at 2 and 5, and that $r(\mathrm{Frob}_2)$ has order 3. Choose an isomorphism of fields $\mathbb{Q}_2^{ac} \cong \mathbb{C}$, so that we may think of $r$ as a representation $\mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{GL}_2(\mathcal{O}_K)$ for some finite extension $K/\mathbb{Q}_2$ inside $\mathbb{Q}_2^{ac}$. By Corollary 3.5 we see that we may find an odd integer $N \ge 5$ divisible by all primes at which $r$ ramifies and normalised eigenforms $f_\alpha, f_\beta \in M_1^{>2^{-2/3}}(N)$ such that
• $T_p f_\alpha = (\operatorname{tr} r(\mathrm{Frob}_p)) f_\alpha$ and $T_p f_\beta = (\operatorname{tr} r(\mathrm{Frob}_p)) f_\beta$ for all primes $p \nmid 2N$,
• $pS(p) f_\alpha = (\det r(\mathrm{Frob}_p)) f_\alpha$ and $pS(p) f_\beta = (\det r(\mathrm{Frob}_p)) f_\beta$ for all primes $p \nmid 2N$,
• $U_2 f_\alpha = \alpha f_\alpha$ and $U_2 f_\beta = \beta f_\beta$, and
• $U_p f_\alpha = U_p f_\beta = 0$ for all $p \mid N$.
Theorem 2.13 tells us that $f = (\alpha f_\alpha - \beta f_\beta)/(\alpha - \beta)$ is classical, and Theorem A follows from this.

Lastly, let us give some examples. To that end we call a number field $K$ suitable if
• $K$ is Galois over $\mathbb{Q}$ with group $A_5$,
• 2 is unramified in $K$ and $\mathrm{Frob}_2 \in \mathrm{Gal}(K/\mathbb{Q})$ has order 3,
• 5 is unramified in $K$, and
• $K$ is totally complex.
If $K$ is such a number field, then we can find a continuous homomorphism $r : \mathrm{Gal}(\mathbb{Q}^{ac}/\mathbb{Q}) \longrightarrow \mathrm{GL}_2(\mathbb{C})$ such that the image of $\operatorname{proj} r$ is $\mathrm{Gal}(K/\mathbb{Q})$ (see, for instance, [S2, corollary to Th. 4]). For any such $r$ we have just shown that $L(r, s)$ has analytic continuation to the whole complex plane. Thus to give examples of our theorem, one need only give examples of suitable number fields $K$.

Suppose that $S$ is a finite set of places of $\mathbb{Q}$ including 2, 5, and $\infty$. For $v \in S$, let $L_v/\mathbb{Q}_v$ be a finite Galois extension such that $\mathrm{Gal}(L_v/\mathbb{Q}_v)$ embeds into $A_5$. Suppose that
• $L_2/\mathbb{Q}_2$ is unramified of degree 3,
• $L_5/\mathbb{Q}_5$ is unramified, and



• $L_\infty = \mathbb{C}$.
According to [M], the quotient of affine 5-space over $\mathbb{Q}$ by the action of $A_5$ which simply permutes the variables is rational. Hence, by, for example, the discussion in [S3, p. xiv] (see, in particular, [S3, Th. 2 and the remark that follows]), we see that there is a number field $K$ that is Galois over $\mathbb{Q}$ with group $A_5$ and such that for $v \in S$ we have $K_v \cong L_v^{60/[L_v : \mathbb{Q}_v]}$. By varying $S$ we see in particular that there are infinitely many suitable number fields.

More concrete examples can be found in the literature. For example, according to Buhler [Buh], the splitting fields of the following are suitable:
• $x^5 + 4x^4 + 25x^3 + 17x^2 + 5x + 2$,
• $x^5 + 6x^4 + 19x^3 + 25x^2 + 11x + 2$,
• $x^5 + 3x^4 + 7x^3 + 6x^2 - 11x - 24$,
• $x^5 + 3x^4 + x^3 - 4x^2 + 17x - 8$,
• $x^5 + 2x^4 + 37x^3 - 7x^2 + 25x - 4$.

Corrigenda for [Ta2]. Taylor would like to take the opportunity to record some corrections to [Ta2]. He would like to thank Kevin Buzzard, Henri Darmon, and Nick Shepherd-Barron for pointing these out.
• Page 339, line 10: the formula defining $T_p$ should have a factor $p^{k-1}$ multiplying the second sum.
• Page 339, line −4: between “if and only if” and “$f$ (as an element . . . ” insert “$c_1(f) = 1$ and”.
• Page 340, line −11: the SS>r should read SS 0 and if $\pi$ is an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbb{Q}_p)$, then write $U_p$ for the Hecke operator $V \begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix} V$ acting on $\pi^V$. We also write $T_p$ and $S_p$ for the Hecke operators $\mathrm{GL}_2(\mathbb{Z}_p) \begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix} \mathrm{GL}_2(\mathbb{Z}_p)$ and $\mathrm{GL}_2(\mathbb{Z}_p) \begin{pmatrix} p & 0 \\ 0 & p \end{pmatrix} \mathrm{GL}_2(\mathbb{Z}_p)$ acting on $\pi^{\mathrm{GL}_2(\mathbb{Z}_p)}$. We then have the following proposition, which occurs as a corollary to [Ca, proof of Th. 1].

PROPOSITION 12
Let $\pi$ be an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbb{Q}_p)$. Then there is an integer $c \ge 0$ such that for every $n \ge 0$ the dimension of $\pi^{U_1(p^n)}$ is equal to $\max\{0, n - c + 1\}$.

We call the integer $p^c$ with $c$ as in the above proposition the conductor of $\pi$. If $c = 0$, then we say that $\pi$ is unramified. Each infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbb{Q}_p)$ is classified as principal series, special, or supercuspidal, as described in [DI]. If $\chi : \mathbb{Q}_p^\times \to \bar{K}^\times$ and $\psi : \mathbb{Q}_p^\times \to \bar{K}^\times$ are continuous characters, then we write $\pi_p(\chi, \psi)$ for the space of locally constant functions $f : \mathrm{GL}_2(\mathbb{Q}_p) \to \bar{K}$ which satisfy

$$f\left(\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} g\right) = \chi(a)\psi(d) f(g)$$

for all $g$ in $\mathrm{GL}_2(\mathbb{Q}_p)$ and any matrix $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ in $\mathrm{GL}_2(\mathbb{Q}_p)$. (Note that this notation differs from the notation in [DI].) If $\chi/\psi$ is not equal to the identity or to $x \mapsto p^{-2v_p(x)}$, then $\pi(\chi, \psi)$ is irreducible and is a principal series representation, of conductor equal to the product of the conductors of $\chi$ and $\psi$. Otherwise, $\pi(\chi, \psi)$ has a unique infinite-dimensional irreducible subquotient $\mathrm{sp}(\chi, \psi)$, which is a special representation. If $\chi$ and $\psi$ are unramified, then $\mathrm{sp}(\chi, \psi)$ is called unramified special and it has conductor $p$; otherwise $\chi$ and $\psi$ have the same conductor and the conductor of $\mathrm{sp}(\chi, \psi)$ is equal to the product of the conductors of $\chi$ and $\psi$. The supercuspidal representations are more difficult to construct; they all have conductor at least $p^2$.

To relate the above definitions to the classical situation, if $\pi = \otimes'_p \pi_p$ is an automorphic representation arising from a weight 2 newform $f$ of level $N_f$ and character $\chi_f$, then for any $p$ the conductor $p^{c_p}$ of $\pi_p$ is equal to the $p$-part of $N_f$; thus $\pi_p$ is unramified if and only if $p$ does not divide $N_f$. Moreover, the eigenvalues of the classical Hecke operators $T_p$ (when $p$ does not divide $N_f$) and $U_p$ (when $p$ does divide $N_f$) acting on the newform $f$ agree with the eigenvalues of the actions of the corresponding operators defined above on the 1-dimensional space $\pi_p^{U_1(p^{c_p})}$, and for $p$ not dividing $N_f$ the eigenvalue of $S_p$ on $\pi_p^{U_1(p^{c_p})}$ is equal to $\chi_f(p)$.
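To make the dimension formula of Proposition 12 concrete, the following table tabulates $\max\{0, n - c + 1\}$ for small values; the particular values of $c$ and $n$ are chosen here only for illustration.

```latex
% c = exponent of the conductor p^c of \pi, as in Proposition 12.
\[
\dim \pi^{U_1(p^n)} = \max\{0,\, n-c+1\}:
\qquad
\begin{array}{c|ccccc}
n & 0 & 1 & 2 & 3 & 4\\ \hline
c=0 & 1 & 2 & 3 & 4 & 5\\
c=1 & 0 & 1 & 2 & 3 & 4\\
c=2 & 0 & 0 & 1 & 2 & 3
\end{array}
\]
% Consequences of the formula, for any e \ge 0:
\[
\dim \pi^{U_1(p^{c})} = 1,
\qquad
\dim \pi^{U_1(p^{c+e})} = e+1 .
\]
```

The first nonzero space of invariants thus appears exactly at level $p^c$ and is a line, which is the 1-dimensional space on which the Hecke eigenvalues are read off.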


MARK DICKINSON

3.2. The local Langlands correspondence for GL2 and Carayol’s theorem
Recall that the local Langlands correspondence for $\mathrm{GL}_2$ (whose proof was completed by P. Kutzko [Ku]) gives for any odd prime $p$ a natural bijection between isomorphism classes of irreducible admissible representations of $\mathrm{GL}_2(\mathbb{Q}_p)$ over $\bar{K}$ and isomorphism classes of continuous 2-dimensional representations of the absolute Weil group $W_p$ of $\mathbb{Q}_p$ over $\bar{K}$ for which any choice of $\Phi$ in $W_p$ lifting $\mathrm{Frob}_p$ acts semisimply. There are various normalisations used in the literature for this correspondence; we use the normalisation described by Carayol in [C, Sec. 0]. (Note that Carayol uses the opposite convention for the identification $W_p \cong \mathbb{Q}_p^\times$ and identifies $p$ with a lift of the geometric Frobenius, so his results look a little different.) We are also using here the correspondence described in [T, Sec. 4.2.1] between continuous 2-dimensional representations of $W_p$ over a finite extension $K'$ of $K$ and 2-dimensional representations of the Weil-Deligne group of $\mathbb{Q}_p$ over $K'$. (This is the reason for our restriction to odd primes $p$.)

If $\pi$ is an infinite-dimensional irreducible admissible representation of $\mathrm{GL}_2(\mathbb{Q}_p)$, then the conductor of $\pi$ is equal to the conductor of the corresponding representation of $W_p$, and the determinant of the representation corresponding to $\pi$ is equal to $\varepsilon_2(\chi_\pi \circ \omega)$, where $\chi_\pi$ is the central character of $\pi$. The correspondence also preserves L-factors and ε-factors (see [K] for more details and additional properties of the correspondence). The representation of $W_p$ corresponding to a particular $\pi$ is decomposable, reducible but not decomposable, or irreducible according as $\pi$ is principal series, special, or supercuspidal, respectively. More precisely, if $\pi = \pi(\chi, \psi)$ is principal series, then the corresponding representation of $W_p$ has the form

$$\begin{pmatrix} (\chi \circ \omega_p)\varepsilon_2 & 0 \\ 0 & \psi \circ \omega_p \end{pmatrix}.$$

If $\pi = \mathrm{sp}(\chi, \chi)$ is special, then the corresponding representation has the form

$$\begin{pmatrix} (\chi \circ \omega_p)\varepsilon_2 & * \\ 0 & \chi \circ \omega_p \end{pmatrix}.$$
Carayol showed in [C] that if $p$ is an odd prime and $f$ is a weight 2 newform with corresponding automorphic representation $\pi = \otimes'_p \pi_p$, then $\pi_p$ corresponds under the local Langlands correspondence above to the restriction of the representation $\rho_f$ to the Weil group $W_p$ at $p$. This theorem combined with the properties of the local Langlands correspondence has many useful consequences; we collect some of them here for later use.

THEOREM 13 (Carayol)
Let $f$ be a weight 2 newform, and let $\pi = \otimes'_p \pi_p$ be the corresponding automorphic representation of $\mathrm{GL}_2(\mathbb{A}^\infty)$. Let $\rho_f : G_{\mathbb{Q}} \to \mathrm{GL}_2(\bar{K})$ be the representation associated to $f$. Then for an odd prime $p$,
(1) $\rho_f|_{W_p}$ is the representation corresponding to $\pi_p$ under the local Langlands correspondence as described above;

MODULARITY OF 2-ADIC GALOIS REPRESENTATIONS

(2) let $\chi_\pi : (\mathbb{A}^\infty)^\times \to \bar{K}^\times$ be the central character of $\pi$; then, using the identifications of class field theory described in the notation and conventions section, $\det \rho_f = (\chi_\pi \circ \varepsilon)\varepsilon_2$ and $\det \rho_f|_{W_p} = (\chi_\pi|_{\mathbb{Q}_p^\times} \circ \omega_p)\,\varepsilon_2|_{W_p}$;
(3) the odd part of the conductor of $\rho_f$ is equal to the odd part of the level of $f$;
(4) let $p^{c_p}$ be the conductor of $\pi_p$, and let $e_p$ be the dimension of $(\rho_f)^{I_p}$; then the characteristic polynomial $g(X)$ of $U_p$ acting on the $(e_p + 1)$-dimensional space $\pi_p^{U_1(p^{c_p + e_p})}$ is equal to $X$ times the characteristic polynomial of the action of $\mathrm{Frob}_p$ on $(\rho_f)^{I_p}$.

Theorem 13 gives the facts we need about the restrictions $\rho_f|_{G_p}$ for odd $p$; we also need to say something about the structure of $\rho_f|_{G_2}$ when the level of $f$ is not divisible by 4.

PROPOSITION 14
Let $f$ be a weight 2 newform. Then we have the following:
(1) if $f$ has odd level and the reduction of $t_2(f)$ to $\bar{k}$ is nonzero, then

$$\rho_f|_{G_2} \cong \begin{pmatrix} * & * \\ 0 & \psi \end{pmatrix},$$

where $\psi$ is unramified and sends $\mathrm{Frob}_2$ to the unit root of $X \mapsto X^2 - t_2(f)X + 2s_2(f)$;
(2) if $f$ has odd level and the reduction of $t_2(f)$ to $\bar{k}$ is zero, then $\rho_f|_{G_2}$ is absolutely irreducible; and
(3) if $f$ has level exactly divisible by 2, then

$$\rho_f|_{G_2} \cong \begin{pmatrix} \chi\varepsilon_2 & * \\ 0 & \chi \end{pmatrix},$$

where $\chi$ is unramified and sends $\mathrm{Frob}_2$ to the eigenvalue of $U_2$ acting on $f$.

Proof The first part follows from [Wi1, Th. 2]. The third part also follows from this theorem along with the fact that the character χ f of f has odd conductor and the square of the eigenvalue of U2 on f is equal to χ f (2) (see [DDT, Th. 1.27]). The second part follows from a theorem of Fontaine (see [E, Th. 2.6]). 4. Cohomology of modular curves In this section we give some basic results about sheaf cohomology on modular curves. Let V be an open compact subgroup of GL2 (A∞ ). As in [CDT], we define the open modular curve YV to be the real 2-manifold YV = GL2 (Q)\ GL2 (A)/V U∞ ,


where $U_\infty = \mathrm{SO}_2(\mathbb{R})\mathbb{R}^\times \subset \mathrm{GL}_2(\mathbb{R})$ is the stabiliser of $i$ for the transitive action of $\mathrm{GL}_2(\mathbb{R})$ on $\mathbb{C} - \mathbb{R}$, and we define $X_V$ to be the standard compactification of $Y_V$ obtained by the addition of cusps. One can show that the number of connected components of $Y_V$ is equal to the index of $\det V$ in $\hat{\mathbb{Z}}^\times$ and that each component can be identified with a quotient of the upper-half complex plane by some congruence subgroup of $\mathrm{SL}_2(\mathbb{Z})$.

We define some particular open compact subgroups of $\mathrm{GL}_2(\mathbb{A}^\infty)$ as follows. Let $N$ be a positive integer, and let $\pi_N$ be the natural quotient map $\mathrm{GL}_2(\hat{\mathbb{Z}}) \to \mathrm{GL}_2(\mathbb{Z}/N\mathbb{Z})$. Then we define $U_0(N)$, $U_1(N)$, $U_2(N)$, and $U(N)$ to be the inverse images under $\pi_N$ of all matrices of the form

$$\begin{pmatrix} * & * \\ 0 & * \end{pmatrix}, \quad \begin{pmatrix} * & * \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & * \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

respectively. One can check that $Y_{U_0(N)}$ and $Y_{U_1(N)}$ can be identified with the classical curves $Y_0(N)$ and $Y_1(N)$, respectively.

Definition
We say that an open compact subgroup $V$ of $\mathrm{GL}_2(\mathbb{A}^\infty)$ is sufficiently small if it does not contain any conjugate of either of the matrices $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ or $\begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix}$.

Note that we do not demand that $V$ not contain $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ in the above definition. If $V$ is sufficiently small, then $Y_V$ has no elliptic points; that is, for every $g$ in $\mathrm{GL}_2(\mathbb{A})$, the stabiliser in $\mathrm{GL}_2(\mathbb{Q})$ of the coset $gVU_\infty$ is contained in $\{\pm I\}$. If $V$ is contained in $U_1(N)$ for some integer $N \ge 4$, then both $V$ and $V\{\pm I\}$ are sufficiently small.

Now suppose that $V = \prod_p V_p$ is sufficiently small, that $M$ is a finitely generated $\mathcal{O}$-module equipped with an action of $V$ on the right, that the action of $V$ on $M$ factors through some finite quotient of $V$, and that $V \cap \{\pm 1\}$ acts trivially on $M$. Extend the action of $V$ to an action of $VU_\infty$ by letting $U_\infty$ act trivially on $M$. Then we can define a sheaf (in the sense of [Wa, Chap. 5])

$$\mathcal{F}_M = \big((\mathrm{GL}_2(\mathbb{Q}) \backslash \mathrm{GL}_2(\mathbb{A})) \times M\big)/VU_\infty \to Y_V$$

on $Y_V$, using the obvious projection map $\mathcal{F}_M \to Y_V$.
The conditions that $V$ is sufficiently small and that $V \cap \{\pm 1\}$ acts trivially on $M$ ensure that this sheaf is locally constant and that each of its stalks is isomorphic to $M$. Let $N$ and $D$ be coprime positive integers such that $U_2(N) \cap U(D) \subset V \subset U_0(N)$ and such that $V \cap U(D)$ acts trivially on $M$; then the Hecke operators $T_p$ for $p$ not dividing $ND$ and $U_p$ for $p$ dividing $N$ act naturally on the cohomology group $H^1(Y_V, \mathcal{F}_M)$. We note that [CDT, Lems. 6.1.2 and 6.3.1] adapt without change to the case when $\ell = 2$, and that, furthermore, the hypothesis that $V$ is contained in $U_1(r^2)$ for a suitable prime $r$ can be weakened to allow any sufficiently small $V$. Thus we have the following two results.



PROPOSITION 15
Let $N$ and $D$ be coprime positive integers. Let $V = \prod_p V_p$ be an open compact subgroup of $\mathrm{GL}_2(\hat{\mathbb{Z}})$ satisfying

$$U_2(N) \cap U(D) \subset V \subset U_0(N),$$

and let $M$ be a finitely generated $\mathcal{O}$-module with a right action of $V$ such that both $V \cap U(D)$ and $V \cap \{\pm 1\}$ act trivially on $M$. Suppose that $V$ is sufficiently small. Let $\mathbb{T}$ be the polynomial ring over $\mathcal{O}$ generated by indeterminates $T_p$ for primes $p$ not dividing $ND$ and $U_p$ for $p$ dividing $N$; we consider $H^1(Y_V, \mathcal{F}_M)$ as a module for $\mathbb{T}$. Let $\mathfrak{m}$ be a non-Eisenstein maximal ideal of $\mathbb{T}$ with finite residue field. (Recall that a maximal ideal $\mathfrak{m}$ of $\mathbb{T}$ is said to be Eisenstein if there is an integer $N'$ such that $T_p - 2$ is in $\mathfrak{m}$ for all but finitely many $p$ congruent to 1 modulo $N'$.) Then we have the following:
(1) the natural maps $H^1_c(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}}$ and $H^1(X_V, \mathcal{F}_M)_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}}$ are isomorphisms;
(2) there is an isomorphism

$$H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \cong \big(M \otimes_{\mathcal{O}} H^1(Y_{U_2(N) \cap U(D)}, \mathcal{O})_{\mathfrak{m}}\big)^{V/U_2(N) \cap U(D)};$$

(3) if $0 \to M' \to M \to M'' \to 0$ is a short exact sequence of right $\mathcal{O}[V]$-modules, then the sequence

$$0 \to H^1(Y_V, \mathcal{F}_{M'})_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_M)_{\mathfrak{m}} \to H^1(Y_V, \mathcal{F}_{M''})_{\mathfrak{m}} \to 0$$

of $\mathbb{T}_{\mathfrak{m}}$-modules is exact;
(4) if $M$ is free as an $\mathcal{O}$-module, then the natural map

$$H^1(Y_U, \mathcal{F}_M)_{\mathfrak{m}} \otimes_{\mathcal{O}} k \to H^1(Y_U, \mathcal{F}_{M \otimes_{\mathcal{O}} k})_{\mathfrak{m}}$$

is an isomorphism.
Furthermore, all the maps above are maps of $\mathbb{T}_{\mathfrak{m}}$-modules.

PROPOSITION 16 (Ihara, Wiles)
Let $N$, $D$, $V$, and $M$ be as above, and let $p$ be a prime not dividing $2ND$. Let $\mathbb{T}$ be as above but with the indeterminate $T_p$ removed, and let $\mathfrak{m}$ be a non-Eisenstein maximal ideal of $\mathbb{T}$. Suppose in addition that the maximal ideal of $\mathcal{O}$ annihilates $M$, so that $M$ is a $k[V/V \cap U(D)]$-module. For $s \ge 1$, write $\gamma_p$ and $\delta_p$ for the maps $H^1(X_{V \cap U_1(p^{s-1})}, \mathcal{F}_M) \to H^1(X_{V \cap U_1(p^s)}, \mathcal{F}_M)$ arising from the maps $X_{V \cap U_1(p^s)} \to X_{V \cap U_1(p^{s-1})}$ given by multiplication by the matrices $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix}^{-1}$ in $\mathrm{GL}_2(\mathbb{Q}_p) \subset \mathrm{GL}_2(\mathbb{A})$, respectively. Then we have the following:

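(1) the map

$$H^1(X_V, \mathcal{F}_M)^2_{\mathfrak{m}} \xrightarrow{\ \gamma_p \oplus \delta_p\ } H^1(X_{V \cap U_1(p)}, \mathcal{F}_M)_{\mathfrak{m}}$$

is injective;
(2) if $s \ge 1$, then the sequence

$$0 \longrightarrow H^1(X_{V \cap U_1(p^{s-1})}, \mathcal{F}_M)_{\mathfrak{m}} \xrightarrow{\ (-\delta_p, \gamma_p)\ } H^1(X_{V \cap U_1(p^s)}, \mathcal{F}_M)^2_{\mathfrak{m}} \xrightarrow{\ \gamma_p \oplus \delta_p\ } H^1(X_{V \cap U_1(p^{s+1})}, \mathcal{F}_M)_{\mathfrak{m}}$$

is exact.
Again, all the maps are maps of $\mathbb{T}_{\mathfrak{m}}$-modules.

Proof
For a proof of this, see [CDT, Lem. 6.3.1].

If $V_1 \subset V_2$ is an inclusion of open compact subgroups of $\mathrm{GL}_2(\mathbb{A}^\infty)$, then there is a corresponding map $X_{V_1} \to X_{V_2}$ of modular curves and hence a map $H^1(X_{V_2}, \mathcal{O}) \to H^1(X_{V_1}, \mathcal{O})$ of cohomology groups. We denote by $H$ the limit $\varinjlim_V H^1(X_V, \mathcal{O})$; this is an admissible $\mathrm{GL}_2(\mathbb{A}^\infty)$-module. The admissible representation $H \otimes_{\mathcal{O}} \bar{K}$ of $\mathrm{GL}_2(\mathbb{A}^\infty)$ over $\bar{K}$ has a decomposition

$$H \otimes_{\mathcal{O}} \bar{K} \cong \bigoplus_f \pi_f^2,$$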

where $f$ runs over the set of all weight 2 newforms (see [CDT, Sec. 5.3]).

5. The modular deformation ring
Let $S$ be any finite set of odd primes. As in Section 1, we define the set $N_S$ to be the set of all weight 2 newforms $f$ of odd level from which $\bar\rho$ arises and for which the corresponding deformation $(R_f, \rho_f)$ is an $S$-deformation of $\bar\rho$, and we use this set of newforms to define a modular $S$-deformation $(R_S^{\mathrm{mod}}, \rho_S^{\mathrm{mod}})$ of $\bar\rho$. In this section we give some properties of the sets $N_S$ and of the ring $R_S^{\mathrm{mod}}$, and we define some new elements of $R_S^{\mathrm{mod}}$. The proof of the following result is delayed until Section 12.

PROPOSITION 17
The set $N_S$ is nonempty. Furthermore, if $S = Q$ consists entirely of primes $p$ for which $\bar\rho$ is unramified at $p$ and $\bar\rho(\mathrm{Frob}_p)$ has distinct $k$-rational eigenvalues, then we have the equality $\#N_Q = \#G_Q \, \#N_\emptyset$, where $G_Q$ is the group defined at the end of Section 2.

The first part of this follows essentially from Buzzard’s level-lowering result (Proposition 1).

PROPOSITION 18
The ring $R_S^{\mathrm{mod}}$ has the following properties.
(1) $R_S^{\mathrm{mod}}$ is reduced, and it is free as an $\mathcal{O}$-module of finite rank equal to the cardinality of $N_S$.
(2) If $K'$ is a finite extension of $K$ with ring of integers $\mathcal{O}'$ and if $(R_S^{\mathrm{mod}})'$ is the corresponding ring defined for $K'$, then the natural map

R Smod ⊗O O 0 → (R Smod )0 (3)

of O 0 -algebras which sends T p ⊗O 1 to T p is an isomorphism. Let δ : G Q → k × be a continuous character of odd conductor, and let δ˜ be its Teichmüller lift, and suppose that ρ¯ = σ¯ ⊗k δ for some representation σ¯ . univ , ρ univ ) and (R mod , ρ mod ) be the universal deformation and modular Let (R S, σ¯ S,σ¯ S,σ¯ S,σ¯ deformation, respectively, for S-deformations of σ¯ (defined with respect to the eigenvalue ασ¯ = αδ −1 (Frob2 ) of σ¯ (Frob2 ) if ρ¯ is unramified at 2), and let univ → R mod be the map arising from the universal property. Then φ S,σ¯ : R S, σ¯ S,σ¯ there is a commutative diagram R Suniv φS

R Smod

/ R univ S,σ¯

φ S,σ¯

/ R mod S,σ¯

of objects of CO where the horizontal maps are isomorphisms, and the bottom ˜ map sends T p to δ(Frob p )T p for all odd p at which ρ¯ is unramified. Proof First, note that the set N S is finite since if f is a form of odd level which gives rise to an S-deformation of ρ, ¯ then by Carayol’s theorem the level N f of f is equal to the Q dimk ρ¯ I p . conductor of ρ f , and this is bounded by N (ρ) ¯ p∈S p


Since $R_S^{\mathrm{mod}}$ is by definition a subring of $\prod_{f \in N_S} K_f$, it is clear that $R_S^{\mathrm{mod}}$ is reduced and free of finite rank as an $\mathcal{O}$-module. Now, consider the inclusion map

$$R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K \to \prod_{f \in N_S} K_f.$$

The algebra on the left is a reduced finite-dimensional $K$-algebra, so it is a product of finite extensions of $K$. Each maximal ideal of $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$ arises as the inverse image of a maximal ideal of the product on the right; that is, it arises from some particular $f$ in $N_S$ as the kernel of the surjective map $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K \to K_f$ which sends $T_p$ to $t_p(f)$ for all odd $p$ not in $S$. If two newforms $f$ and $g$ give rise to the same maximal ideal $\mathfrak{p}$ of $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$, then the corresponding field extensions $(R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K)/\mathfrak{p} \to \bar{K}$ are conjugate by an element of $\mathrm{Gal}(\bar{K}/K)$ and so $f$ and $g$ are conjugate by this same element, by multiplicity one. Conversely, any two newforms $f$ and $g$ which are conjugate by an element of $\mathrm{Gal}(\bar{K}/K)$ give rise to the same maximal ideal, and if $f$ is an element of $N_S$, then so is any $\mathrm{Gal}(\bar{K}/K)$-conjugate of $f$; thus the number of conjugates of $f$ in $N_S$ is equal to the degree of $K_f$ over $K$. Summing over the various conjugacy classes, we find that the $K$-dimension of $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$, and hence the $\mathcal{O}$-rank of $R_S^{\mathrm{mod}}$, is equal to the number of elements of $N_S$. Now $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} \mathcal{O}' \to (R_S^{\mathrm{mod}})'$ is a surjection of $\mathcal{O}'$-modules of equal rank, and so it is an isomorphism.

For the last part of the proposition, note that the set $N_S$ is in bijection with the set $N_{S,\bar\sigma}$ of weight 2 newforms of odd level giving rise to $S$-deformations of $\bar\sigma$; the bijection is essentially given by twisting by the Dirichlet character corresponding to the Teichmüller lift of $\delta$. The result then follows easily from the definitions of $R_S^{\mathrm{mod}}$ and $R^{\mathrm{mod}}_{S,\bar\sigma}$.

In addition to the elements $T_p$ (for $p$ not in $S$ and not dividing $2N(\bar\rho)$) of $R_S^{\mathrm{mod}}$ already defined in Section 1, we let $S_p$ be the element $(s_p(f))_{f \in N_S}$ for $p = 2$ and for odd primes $p$ not in $S$ at which $\bar\rho$ is unramified.
For odd $p$, $S_p = p^{-1} \det \rho_S^{\mathrm{mod}}(\mathrm{Frob}_p)$ is clearly contained in $R_S^{\mathrm{mod}}$; since the map $p \mapsto S_p$ factors through some Dirichlet character of odd conductor, the element $S_2$ is equal to $S_p$ for some odd $p$ and so is also in $R_S^{\mathrm{mod}}$. We also need to know that the element $T_2 = (t_2(f))_{f \in N_S}$ is in $R_S^{\mathrm{mod}}$. Let $\Phi$ be an element of $G_2$ which lifts $\mathrm{Frob}_2$; then by our assumptions on $\bar\rho|_{G_2}$ the matrix $\bar\rho(\Phi)$ has distinct eigenvalues and so by Hensel’s lemma the characteristic polynomial $X \mapsto X^2 - \operatorname{tr} \rho_S^{\mathrm{mod}}(\Phi) X + \det \rho_S^{\mathrm{mod}}(\Phi)$ of $\rho_S^{\mathrm{mod}}(\Phi)$ has distinct roots in $R_S^{\mathrm{mod}}$. Let $u$ be the root that lies above $\alpha$. From Proposition 14 we know that for each $f$ in $N_S$ the eigenvalue of the matrix $\rho_f(\Phi)$ which lies over $\alpha$ is equal to the unit root of the polynomial $X^2 - t_2(f)X + 2s_2(f)$.
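The role of the unit root can be made explicit by the following short computation, a sketch in which $t$, $s$, $u$, and $u'$ are generic symbols introduced here for illustration and $s$ is assumed to be a unit.

```latex
% Assumptions: u is a unit root of X^2 - tX + 2s, and s is a unit.
\[
X^2 - tX + 2s = (X-u)(X-u'),
\qquad u + u' = t,\quad uu' = 2s
\;\Longrightarrow\;
u' = \frac{2s}{u},
\qquad t = u + \frac{2s}{u}.
\]
% Since u is a unit, u' = 2s/u is divisible by 2, so u is the only unit root,
% and the trace t is recovered from u and s alone.
```

In particular, once the unit root and $s$ are known to lie in a ring in which the unit root is invertible, so does $t$.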



Thus $u^2 - T_2 u + 2S_2 = 0$, and $(t_2(f))_{f \in N_S} = T_2 = u + 2S_2/u$ is contained in $R_S^{\mathrm{mod}}$.

Variant. We could also define the set $N_S$ to be the set of forms giving rise to $S$-deformations of $\bar\rho$ and of level not divisible by 4. By Proposition 14 and our assumptions on the behaviour of $\bar\rho$ at 2, this gives the same set. Now, instead of defining $T_2$, define an element $U_2$ of $R_S^{\mathrm{mod}}$ to be equal to the element $u$ found above. If $f$ is an element of $N_S$ with corresponding automorphic representation $\pi_f = \otimes'_p \pi_p$, then $\pi_2$ is unramified and $\pi_2^{U_1(2)}$ is 2-dimensional; thus $U_2 = (u_2(f))_{f \in N_S}$, where $u_2(f)$ is the eigenvalue of $U_2$ on $\pi_2^{U_1(2)}$ which lies above $\alpha$.

6. Representations of $\mathrm{GL}_2(\mathbb{Z}_p)$
In this section we recall the definition and properties of some representations of $\mathrm{GL}_2(\mathbb{Z}_p)$ introduced in [CDT]. Let $p$ be an odd prime, let $A$ be the ring of Witt vectors of $\mathbb{F}_{p^2}$, and let $\sigma$ denote the $\mathbb{Z}_p$-automorphism of $A$ corresponding to the Frobenius automorphism of $\mathbb{F}_{p^2}$. Let $F$ be any algebraically closed field of characteristic zero, and let $\chi : A^\times \to F^\times$ be a finite-order character of conductor $p^n$, some $n \ge 1$, such that $\chi/\chi \circ \sigma$ also has conductor $p^n$. Then B. Conrad, Diamond, and Taylor define in [CDT, Sec. 3.2] a representation $\Theta(\chi)$ of $\mathrm{GL}_2(\mathbb{Z}_p)$ over $F$.

PROPOSITION 19
The representation $\Theta(\chi)$ has the following properties:
(1) $\Theta(\chi)$ factors through $\mathrm{GL}_2(\mathbb{Z}/p^n\mathbb{Z})$ to give an irreducible representation of degree $(p - 1)p^{n-1}$;
(2) if $\delta : \mathbb{Z}_p^\times \to F^\times$ is a character of finite order and conductor at most $p^n$, then $\Theta(\chi(\delta \circ \mathrm{norm}))$ is isomorphic to $\Theta(\chi) \otimes_F F(\delta \circ \det)$, where $\mathrm{norm} : A^\times \to \mathbb{Z}_p^\times$ is the norm map;
(3) $\Theta(\chi)$ is isomorphic to $\Theta(\psi)$ if and only if $\chi$ is equal to $\psi$ or $\psi \circ \sigma$;
(4) the restriction of $\Theta(\chi)$ to $U_1(p)$ is irreducible, and the restrictions of $\Theta(\chi)$ and $\Theta(\psi)$ to $U_1(p)$ are isomorphic if and only if $\chi|_{1+pA}$ is equal to one of $\psi|_{1+pA}$ or $\psi \circ \sigma|_{1+pA}$;
(5) the central character of $\Theta(\chi)$ is equal to the restriction of $\chi$ to $\mathbb{Z}_p^\times$;
(6) let $V$ be the open compact subgroup of $\mathrm{GL}_2(\mathbb{Z}_p)$ consisting of all elements that reduce modulo $p^n$ to a matrix of the form $\begin{pmatrix} * & 0 \\ 0 & 1 \end{pmatrix}$; then $\Theta(\chi)^V$ has dimension 1 over $F$;
(7) $\Theta(\chi)$ can be realised over the subfield of $F$ generated by values of $(\chi + \chi \circ \sigma)$.

Proof Although the statements above are slightly different from those in [CDT] (since we deal with restrictions to U1 ( p) rather than to U0 ( p)), the proof of [CDT, Lem. 3.2.1]

344

MARK DICKINSON

also adapts to prove the analogous statements above. The last two statements are not proved in [CDT]; to give a somewhat ad hoc proof of the first of these, we may assume that F = K¯ ; then we may use Proposition 21 to find a 2-dimensional representation of W p of conductor p 2n such that if π is the corresponding representation of GL2 (Q p ), n then (π|GL2 (Z p ) )U ( p ) is isomorphic to 2(χ). Then π also has conductor p 2n and its central character has conductor p n ; it follows that the subspace of π fixed by n −n (1+ p n Z p )U1 ( p 2n ) = p0 01 V p0 01 is 1-dimensional over F and hence that π V = 2(χ )V is 1-dimensional. For the last statement, it is evident from the construction in [CDT] and the fact that 2(χ) is isomorphic to 2(χ ◦ σ ) that the trace of 2(χ) takes values in Q(χ + χ ◦ σ ); now we use the previous part of the proposition along with [W, Lem. I.1]. If χ : A× → F × is such that χ/χ ◦ σ has conductor p n for some n ≥ 1, but χ × itself has strictly greater conductor, then for some character δ : Z× p → F the twist −1 n −1 χ (δ ◦norm) of χ has conductor p and we define 2(χ) to be 2(χ (δ ◦norm))⊗ F F(δ ◦ det); by Proposition 19 this definition does not depend on the choice of δ. Now, suppose that F = K¯ and that χ : A× → K¯ × is a continuous (hence finite order) character. By a model of 2(χ) over O we mean an O [GL2 (Z p )]-module L, free over O , such that L ⊗O K¯ is isomorphic to 2(χ). 20 K¯ × be a character as above, and suppose there is a model L for 2(χ ) over O . Then we have the following: (1) L is the unique model of 2(χ) up to isomorphism; (2) the reduction L¯ = L ⊗O k of L is an absolutely irreducible k[U1 ( p)]-module; (3) there is an isomorphism PROPOSITION Let χ : A× →

L∼ = HomO (L , O ) ⊗ (χ |Z×p ◦ det) (4)

of O [GL2 (Z p )]-modules; and if L 0 is a model for 2(χ 0 ) over O and if the reduction of the pair {χ 0 , χ 0 ◦ σ } ¯ is equal to that of {χ, χ ◦ σ }, then L¯ 0 is isomorphic to L.

Proof The second part follows from [S1, Sec. 16.4, Prop. 46] using the fact that $\Theta(\chi)|_{U_1(p)}$ is absolutely irreducible; the first part then follows from [S1, Exercise 15.3]. To prove the third part, note that there is a corresponding nondegenerate pairing on $V$, as described in [CDT, Sec. 3.3], so that both sides are models for $V$ and hence isomorphic. The last part comes from the fact that both $\bar{L}'$ and $\bar{L}$ have the same Brauer character; this follows from the construction of $\Theta(\chi)$ in [CDT].



Finally, we have the following supplement to the description of the local Langlands correspondence given in Section 3.

PROPOSITION 21 Suppose that $\psi : I_p \to \bar{K}^\times$ is the restriction of a character of the Galois group $H \subset G_p$ of the unramified quadratic extension of $\mathbf{Q}_p$, and write $\chi$ for the corresponding character on $A^\times$. Assume that the conductors of $\chi$ and $\chi/\chi\circ\sigma$ are both equal to $p^n$ for some $n \ge 1$. Let $\pi$ be an irreducible admissible representation of $\mathrm{GL}_2(\mathbf{Q}_p)$ over $\bar{K}$. Then the representation $\pi^{U(p^n)}$ of $\mathrm{GL}_2(\mathbf{Z}_p)$ is isomorphic to $\Theta(\chi)$ if and only if the restriction to $I_p$ of the corresponding representation of $W_p$ is isomorphic to

$$\begin{pmatrix} \psi & 0 \\ 0 & \psi\circ\mathrm{Frob}_p \end{pmatrix};$$

if this is not the case, then the module $\mathrm{Hom}_{\bar{K}[\mathrm{GL}_2(\mathbf{Z}_p)]}(\Theta(\chi), \pi^{U(p^n)})$ is trivial.

Here $\mathrm{Frob}_p$ denotes the automorphism of $H^{\mathrm{ab}}$ given by conjugation by $\mathrm{Frob}_p$.

Proof This is [CDT, part 3 of Lem. 4.2.4].

7. A module for the modular deformation ring

In this section we explain how to define a module $H_S$ for the modular deformation ring $R_S^{\mathrm{mod}}$, and we give a useful reinterpretation of the definition of $H_S$ in the special case when $S$ contains all odd primes at which $\bar\rho$ is ramified. The module $H_S$ eventually turns out to be free of rank 2 over $R_S^{\mathrm{mod}}$. These results are used along with the main theorem to establish the multiplicity one result given in Section 1.

For convenience we assume in this section that $K$ is large enough that some twist of $\bar\rho$ by a $k^\times$-valued character is minimal, in the sense of Section 2.2. This can be achieved, for example, by adding all roots of unity of order $M$ to $K$, where $M$ is the odd part of $\varphi(N(\bar\rho))$. Thus, in general, our definition of $H_S$ is valid only after replacing $K$ by some finite (unramified) extension and making the corresponding replacements for $\mathcal{O}$, $k$, $\bar\rho$, and $\mathcal{C}_{\mathcal{O}}$.

Begin by choosing, once and for all, a representation $\bar\sigma : G_{\mathbf{Q}} \to \mathrm{GL}_2(k)$ and a character $\delta : G_{\mathbf{Q}} \to k^\times$ such that $\bar\rho = \bar\sigma \otimes_k \delta$ and $\bar\sigma$ is minimal. We may assume that $\delta$ is ramified only at those primes dividing $N(\bar\rho)/N(\bar\sigma)$. Write $\Delta : \hat{\mathbf{Z}}^\times \to \mathcal{O}^\times$ for the Teichmüller lift of the character $\delta\circ\varepsilon^{-1}$ corresponding to $\delta$. For each odd prime $p$, let $p^{c_p}$ be the conductor of $\bar\sigma|_{G_p}$ and let $e_p$ be the dimension of $\bar\sigma^{I_p}$.

Let $A$ and $\sigma$ be as in the previous section. Recall that in Section 2 we defined $T(\bar\rho)$ to be the set of odd primes for which the restriction of $\bar\rho$ to $I_p$ is decomposable over $\bar{k}$ but the restriction to $G_p$ is absolutely irreducible. For each such prime the representation $\bar\rho|_{G_p} \otimes_k \bar{k}$ is induced from a character of the absolute Galois group of the degree 2 unramified extension of $\mathbf{Q}_p$, whose restriction to $I_p$ corresponds under local class field theory to a character $A^\times \to \bar{k}^\times$. We let $\chi_p : A^\times \to \bar{K}^\times$ be the Teichmüller lift of this character; note that $\chi_p + \chi_p\circ\sigma$ takes values in $K$.

We describe how to construct for each prime $p$ an $\mathcal{O}[\mathrm{GL}_2(\mathbf{Z}_p)]$-module $M_p$.
(1) If $p$ is in $T(\bar\rho)$, then let $M_p$ be a model over $\mathcal{O}$ for the representation $\Theta(\chi_p)$ described in Section 6. (By the last part of Proposition 19 the representation $\Theta(\chi_p)$ is realisable over $K$, so that a model over $\mathcal{O}$ exists.)
(2) If $p$ is not in $T(\bar\rho)$, then let $M_p = \mathcal{O}(\Delta_p\circ\det)$, where $\Delta_p$ denotes the restriction of $\Delta$ to $\mathbf{Z}_p^\times$.
Note, in particular, that $M_p$ has trivial $\mathrm{GL}_2(\mathbf{Z}_p)$-action when $p = 2$ or $\bar\rho$ is unramified at $p$. Let $M$ be the module $\bigotimes_p M_p$ for $\mathrm{GL}_2(\hat{\mathbf{Z}})$.

Let $S$ be a finite set of odd primes; then we also define an open compact subgroup $V_S$ of $\mathrm{GL}_2(\hat{\mathbf{Z}})$ of the form $V_S = \prod_p V_{S,p}$, where each $V_{S,p}$ is an open compact subgroup of $\mathrm{GL}_2(\mathbf{Z}_p)$. For $p$ not in $S$, define $V_{S,p}$ as follows:
(1) let $V_{S,2} = \mathrm{GL}_2(\mathbf{Z}_2)$;
(2) if $p$ is an odd prime not in $T(\bar\rho)$, then let $V_{S,p}$ be the set of elements $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in $U_0(p^{c_p})$ for which $d$ has order a power of 2 in $(\mathbf{Z}/p^{c_p}\mathbf{Z})^\times$;
(3) if $p$ is an odd prime in $T(\bar\rho)$, then let $V_{S,p} = \mathrm{GL}_2(\mathbf{Z}_p)$.
If $p$ is in $S$, then we use the following definitions:
(1) if $p$ is not in $T(\bar\rho)$, then let $V_{S,p} = U_1(p^{c_p+e_p})$;
(2) if $p$ is in $T(\bar\rho)$, then let $V_{S,p} = U_1(p)$.

We now explain how to use the choices of $V_S$ and $M$ to identify the set $\mathcal{N}_S$ of weight 2 newforms of odd level giving rise to $S$-deformations of $\bar\rho$. First we do some local analysis. Suppose that $f$ is a weight 2 newform from which $\bar\rho$ arises, with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$. Then for each prime $p$ we define a finite-dimensional $\bar{K}$-module

$$H_{p,f} = \mathrm{Hom}_{\bar{K}[V_{S,p}]}(M_p \otimes_{\mathcal{O}} \bar{K}, \pi_p).$$

For $p = 2$ and for odd $p$ not in $S$ at which $\bar\rho$ is unramified, $H_{p,f} \cong \pi_p^{\mathrm{GL}_2(\mathbf{Z}_p)}$ and there is a natural action of the double coset operator $T_p$ on $H_{p,f}$.
For $p$ in $S - T(\bar\rho)$ the $\mathrm{GL}_2(\mathbf{Z}_p)$-module $M_p$ is equal to $\mathcal{O}(\Delta_p\circ\det)$, and we can define an action of $U_p$ as follows. Extend $\Delta_p$ to a character of $\mathbf{Q}_p^\times$ by setting $\Delta_p(p) = 1$; then $M_p$ can be considered as a module for $\mathrm{GL}_2(\mathbf{Q}_p)$. Hence $\mathrm{Hom}_{\bar{K}}(M_p \otimes_{\mathcal{O}} \bar{K}, \pi_p)$ can also be considered as a module for $\mathrm{GL}_2(\mathbf{Q}_p)$ (by defining $(gf)(x) = g(f(g^{-1}x))$ for $g$ in $\mathrm{GL}_2(\mathbf{Z}_p)$, $x$ in $M_{\bar{K}}$, and $f : M_{\bar{K}} \to \pi_f$), and we define $U_p$ to be the usual double coset operator $V_{S,p}\begin{pmatrix} p & 0 \\ 0 & 1 \end{pmatrix}V_{S,p}$.

LEMMA 22 The module $H_{p,f}$ is nontrivial if and only if $p = 2$ and $f$ has odd level, or $p$ is odd and the deformation $(R_f, \rho_f)$ corresponding to $f$ satisfies the condition at $p$ in the definition of an $S$-deformation. Furthermore, if this condition is satisfied, then
(1) for $p = 2$ or $p$ not in $S$ at which $\bar\rho$ is unramified, $H_{p,f}$ is 1-dimensional and $T_p$ acts by $t_p(f)$;
(2) for $p$ in $S - T(\bar\rho)$, write $\mathfrak{m}_p$ for the ideal $(U_p) + \mathfrak{m}_{\mathcal{O}}$ of the polynomial algebra $\mathcal{O}[U_p]$; then there is a decomposition

$$H_{p,f} = H_{p,f,0} \oplus H_{p,f,1}$$

of $\bar{K}[U_p]$-modules such that $H_{p,f,0}$ is 1-dimensional and annihilated by $U_p$ and $(H_{p,f,1})_{\mathfrak{m}_p} = 0$;
(3) for the remaining $p$, the module $H_{p,f}$ is 1-dimensional.

Note that if $p$ is in $S$, then there is no condition imposed at $p$ in the definition of an $S$-deformation, so that for these primes Lemma 22 implies that $H_{p,f}$ should always be nontrivial.

Proof If $p = 2$ or $p$ is an odd prime not in $S$ at which $\bar\rho$ is unramified, then, using the results described in Section 3, the space $H_{p,f} = \pi_p^{\mathrm{GL}_2(\mathbf{Z}_p)}$ is trivial unless $\pi_p$ is unramified, in which case it has dimension 1. But by Theorem 13, for odd $p$ the representation $\pi_p$ is unramified if and only if $\rho_f|_{G_p}$ is, while for $p = 2$ the representation $\pi_2$ is unramified if and only if $f$ has odd level.

Now, suppose that $p$ is in $S - T(\bar\rho)$. Then there is no condition imposed at $p$ in the definition of an $S$-deformation, so it is enough to check that the stated decomposition holds, and then $H_{p,f} \cong (\pi_p \otimes_{\bar{K}} \bar{K}(\Delta_p^{-1}\circ\det))^{U_1(p^{c_p+e_p})}$ is nontrivial. By Theorem 13 the representation $\pi_p \otimes_{\bar{K}} \bar{K}(\Delta_p^{-1}\circ\det)$ (considered as a representation of $\mathrm{GL}_2(\mathbf{Q}_p)$ by setting $\Delta(p) = 1$) corresponds under the local Langlands correspondence to a lift $\sigma_p : W_p \to \mathrm{GL}_2(\bar{K})$ of $\bar\sigma|_{W_p}$. If $p^{c_p(\sigma_p)}$ is the conductor of $\sigma_p$ and $e_p(\sigma_p) = \dim_{\bar{K}} \sigma_p^{I_p}$, then from the definitions $c_p(\sigma_p) + e_p(\sigma_p) = c_p + e_p$, and then the last part of Theorem 13 provides the required decomposition.

The last part of the lemma again follows from Theorem 13 along with Proposition 9 and, for $p$ in $T(\bar\rho)$, Proposition 21. We give details when $p$ is in $S \cap T(\bar\rho)$. In this case, we must show that the module $H_{p,f} = \mathrm{Hom}_{\bar{K}[U_1(p)]}(\Theta(\chi_p), \pi_p)$ is always 1-dimensional; the argument is mildly complicated by the need to twist representations so that the conductor conditions of Proposition 21 are satisfied.
The representation $\pi_p$ corresponds under the local Langlands correspondence to a representation $\rho_p : W_p \to \mathrm{GL}_2(\bar{K})$ which lifts $\bar\rho|_{W_p}$; by Proposition 9 this representation is induced from some character of the index 2 subgroup of $W_p$ containing $I_p$, and the restriction of this character to $I_p$ corresponds by local class field theory to a character $\psi_p : A^\times \to \bar{K}^\times$ whose reduction (replacing $\psi_p$ with $\psi_p\circ\sigma$ if necessary) is equal to that of $\chi_p$. Note that $\psi_p|_{1+pA} = \chi_p|_{1+pA}$ since the quotient $\psi_p/\chi_p$ has 2-power order and so is trivial on the pro-$p$-group $1 + pA$. From Proposition 21, applied with $\chi = \psi_p(\Delta_p^{-1}\circ\mathrm{norm})$ and $\pi = \pi_p \otimes (\Delta_p^{-1}\circ\det)$, we find that $(\pi_p \otimes (\Delta_p^{-1}\circ\det))^{U(p^n)}$ is isomorphic to $\Theta(\psi_p(\Delta_p^{-1}\circ\mathrm{norm}))$. Here $p^n$ is the conductor of $\psi_p(\Delta_p^{-1}\circ\mathrm{norm})$, which is also equal to the conductor of $\chi_p(\Delta_p^{-1}\circ\mathrm{norm})$. Since the representation $\Theta(\chi_p(\Delta_p^{-1}\circ\mathrm{norm}))$ of $\mathrm{GL}_2(\mathbf{Z}_p)$ has trivial $U(p^n)$-action, it follows that

$$H_{p,f} \cong \mathrm{Hom}_{\bar{K}[U_1(p)]}\bigl(\Theta(\chi_p(\Delta_p^{-1}\circ\mathrm{norm})),\ \Theta(\psi_p(\Delta_p^{-1}\circ\mathrm{norm}))\bigr),$$

which is 1-dimensional since the two representations $\Theta(\chi_p)$ and $\Theta(\psi_p)$ of $U_1(p)$ are irreducible and isomorphic by Proposition 19.

Now, let $T_S$ be the polynomial $\mathcal{O}$-algebra generated by indeterminates $T_p$ for $p = 2$ and for odd primes $p$ not in $S$ at which $\bar\rho$ is unramified and $U_p$ for $p$ in $S - T(\bar\rho)$. Let $I_S$ be the kernel of the surjective map $T_S \to R_S^{\mathrm{mod}}$ which sends the indeterminate $U_p$ to zero for $p$ in $S - T(\bar\rho)$ and the indeterminate $T_p$ to the element $T_p$ of $R_S^{\mathrm{mod}}$ for $p = 2$ and for odd primes not in $S$ at which $\bar\rho$ is unramified. Let $\mathfrak{m}_S$ be the preimage in $T_S$ of the maximal ideal of $R_S^{\mathrm{mod}}$. Let $M_{\bar{K}} = \bigotimes_p (M_p)_{\bar{K}}$ denote the (irreducible) representation $M \otimes_{\mathcal{O}} \bar{K}$ of $\mathrm{GL}_2(\hat{\mathbf{Z}})$ over $\bar{K}$, and let $f$ be a weight 2 newform with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$. Then the actions of $T_p$ and $U_p$ described above give the $\bar{K}$-vector space $\mathrm{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)$ an action of $T_S$.

The following lemma indicates when a weight 2 newform $f$ gives rise to an $S$-deformation of $\bar\rho$, in terms of the local components of the automorphic representation associated to $f$.

LEMMA 23 Let $f$ be a weight 2 newform with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$ over $\bar{K}$. Then $f$ is in $\mathcal{N}_S$ if and only if the finite-dimensional $\bar{K}$-vector space

$$\mathrm{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)_{\mathfrak{m}_S}$$

is nontrivial. In this case, this vector space has dimension 1 and the natural map

$$\mathrm{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)[I_S] \to \mathrm{Hom}_{\bar{K}[V_S]}(M_{\bar{K}}, \pi_f)_{\mathfrak{m}_S}$$

is an isomorphism.

Proof Write $H$ for the $\bar{K}$-vector space $\mathrm{Hom}_{V_S}(M_{\bar{K}}, \pi_f)$; we have an isomorphism

$$H \cong \bigotimes_p H_{p,f}$$

of finite-dimensional $\bar{K}$-vector spaces where $H_{p,f}$ was defined above, and for each prime $p$ the operator $T_p$ or $U_p$ acts only through $H_{p,f}$. One can check that the $\bar{K}$-modules $H_{\mathfrak{m}_S}$ and $H[I_S]$ are unaffected by base change (that is, by replacing $K$ with a finite extension and making corresponding changes for all related data), and so without loss of generality we may assume that $\mathcal{O}$, and hence $T_S$, contains the Hecke eigenvalues of $f$.

First, suppose that $H_{\mathfrak{m}_S}$ is nonzero, so that $H_{p,f}$ must be nonzero for each $p$. For all but finitely many $p$, the element $T_p - t_p(f)$ of $T_S$ annihilates $H_{p,f}$ and hence also $H$; so if $H_{\mathfrak{m}_S}$ is to be nonzero, then $T_p - t_p(f)$ must be in $\mathfrak{m}_S$ for these $p$ and it follows that $\bar\rho$ arises from $f$. Now we can apply the first part of Lemma 22 to deduce that $f$ must in fact be an element of $\mathcal{N}_S$.

Conversely, suppose that $f$ is in $\mathcal{N}_S$. Since $\mathfrak{m}_S$ is the unique maximal ideal of $T_S$ containing $I_S$, it follows that, for any $T_S$-module $L$, the natural map $L[I_S] \to L_{\mathfrak{m}_S}$ is an inclusion. We describe a decomposition $H = H_0 \oplus H_1$ of $T_S$-modules such that $H_0$ is 1-dimensional and equal to $H_0[I_S]$ while $(H_1)_{\mathfrak{m}_S} = 0$; then $H[I_S] = H_0 = H_{\mathfrak{m}_S}$, and the result follows.

For each $p$ in $S - T(\bar\rho)$, Lemma 22 gives a decomposition of $H_{p,f}$ into two components $H_{p,f,0}$ and $H_{p,f,1}$; this gives a corresponding decomposition of $H = \bigotimes_p H_{p,f}$ into $2^{\#(S - T(\bar\rho))}$ components. Let $H_0$ be the single component involving the product of the $H_{p,f,0}$, and let $H_1$ be the sum of the remaining components. Let $\mathfrak{p}_f$ be the kernel of the map $T_S \to \bar{K}$ which sends $T_p$ to $t_p(f)$ for $p = 2$ and odd $p$ at which $\bar\rho$ is unramified, and sends $U_p$ to zero for $p$ in $S - T(\bar\rho)$; then from Lemma 22 it follows that $H_0 = H_0[\mathfrak{p}_f]$ and that $(H_1)_{\mathfrak{m}_S} = 0$. Furthermore, since $f$ is in $\mathcal{N}_S$ by assumption, the map $T_S \to \bar{K}$ factors through the map $T_S \to R_S^{\mathrm{mod}}$, and so $I_S \subset \mathfrak{p}_f$ and $H_0[I_S] = H_0$. This gives the required decomposition of $H$ and completes the proof of the lemma.
The preceding result motivates the following definition of the module $H_S$ for $R_S^{\mathrm{mod}}$. Recall that in Section 4 we defined an admissible $\mathrm{GL}_2(\mathbf{A}^\infty)$-module $H$ as the direct limit $\varinjlim_U H^1(X_U, \mathcal{O})$ of cohomology groups of modular curves. Now using the definitions of $V_S$, $M$, $T_S$, $I_S$, and $\mathfrak{m}_S$ above, we have a natural action of $T_S$ on $\mathrm{Hom}_{\mathcal{O}[V_S]}(M, H)$ defined in the same way as the action on $\mathrm{Hom}_{\mathcal{O}[V_S]}(M_{\bar{K}}, \pi_f)$, and we define

$$H_S = \mathrm{Hom}_{\mathcal{O}[V_S]}(M, H)[I_S].$$

Since $H_S$ is supported only at the maximal ideal $\mathfrak{m}_S$ of $T_S$, there is an inclusion

$$H_S \to \mathrm{Hom}_{\mathcal{O}[V_S]}(M, H)_{\mathfrak{m}_S}$$

of $T_S$-modules, which is an isomorphism since both sides are free $\mathcal{O}$-modules of the same rank (from Lemma 23 and the decomposition of $H \otimes_{\mathcal{O}} \bar{K}$) and $H_S$ has torsion-free cokernel in $\mathrm{Hom}_{\mathcal{O}[V_S]}(M, H)$.


PROPOSITION 24 The module $H_S \otimes_{\mathcal{O}} K$ is a free $R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} K$-module of rank 2.

Proof It is enough to show this result after base change to $\bar{K}$, and then it follows easily from Lemma 23 and from the decomposition of $H \otimes_{\mathcal{O}} \bar{K}$.

COROLLARY 25 The ring $R_S^{\mathrm{mod}}$ can be identified with the $\mathcal{O}$-subalgebra of $\mathrm{End}_{\mathcal{O}} H_S$ generated by the Hecke operators $T_p$ and $S_p$ for $p = 2$ and for odd $p$ not in $S$ at which $\bar\rho$ is unramified, and by operators $U_p$ for the remaining odd primes $p$, excluding those in $T(\bar\rho)$.

Proof The natural map $R_S^{\mathrm{mod}} \to \mathrm{End}_{\mathcal{O}} H_S$ of free $\mathcal{O}$-modules is injective by Proposition 24, and its image is generated by $T_p$ for some cofinite set of primes $p$. We need only show that the images of all the Hecke operators actually lie in $R_S^{\mathrm{mod}}$. For $p = 2$ we have already shown in Section 5 that there is an element $T_2$ of $R_S^{\mathrm{mod}}$ whose action on $H_S$ corresponds to the action of the Hecke operator $T_2$. If $p$ is odd and not in $S$ and $\bar\rho$ is unramified at $p$, then $T_p = \operatorname{tr} \rho_S^{\mathrm{mod}}(\mathrm{Frob}_p)$. If $p$ is odd and an element of $S - T(\bar\rho)$, then $U_p = 0$. If $p$ is odd and not in $S$ or $T(\bar\rho)$, then either $\bar\rho^{I_p} = 0$ and $U_p = 0$ or $\bar\rho^{I_p}$ has dimension 1 over $k$, $(\rho_S^{\mathrm{mod}})^{I_p}$ is a free rank 1 $R_S^{\mathrm{mod}}$-module, and $U_p = \operatorname{tr}\,(\rho_S^{\mathrm{mod}})^{I_p}(\mathrm{Frob}_p)$.

If $S$ contains $T(\bar\rho)$ and all primes at which $\delta$ is ramified (in other words, if $S$ contains all those primes $p$ for which the component $M_p$ of $M$ is not just $\mathcal{O}$ with trivial $\mathrm{GL}_2(\mathbf{Z}_p)$-action), it is possible to give a simpler description of $H_S$ which does not involve the module $M$. The most useful case of this is that in which $S$ contains all odd primes at which $\bar\rho$ is ramified.

PROPOSITION 26 Suppose that $S$ contains all odd primes $p$ at which $\bar\rho$ is ramified. Define an integer $N$ by the formula

$$N = N(\bar\rho) \prod_{p \in S} p^{\dim_k \bar\rho^{I_p}}.$$

Then there are isomorphisms

$$H_S \cong (H^{U_1(N)})_{\mathfrak{m}_S} \cong H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_S}$$

of $T_S$-modules. Furthermore, for all $p$ in $S$ (including those $p$ in $T(\bar\rho)$) the natural action of $U_p$ on $H^1(X_1(N), \mathcal{O})_{\mathfrak{m}_S}$ is by zero.
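The level in Proposition 26 is the conductor of the residual representation multiplied by one prime power for each p in S, with exponent the dimension of the inertia invariants. The following is a minimal Python sketch of that arithmetic only. The function name `level_N` and the sample conductor and dimensions are hypothetical illustrations, not data taken from the paper.

```python
def level_N(conductor, inertia_dims):
    """Return N = N(rho_bar) * prod_{p in S} p ** dim_k(rho_bar^{I_p}).

    conductor    -- the integer N(rho_bar) (supplied by the caller)
    inertia_dims -- dict mapping each prime p in S to dim_k of the
                    I_p-invariants of rho_bar, which must be 0, 1 or 2
    """
    level = conductor
    for p, dim in inertia_dims.items():
        if dim not in (0, 1, 2):
            raise ValueError("a 2-dimensional representation has 0, 1 or 2 invariants")
        level *= p ** dim
    return level


# Hypothetical example: conductor 11, S = {3, 11}, with rho_bar unramified
# at 3 (dimension 2) and a 1-dimensional space of invariants at 11.
example_level = level_N(11, {3: 2, 11: 1})
```

With these illustrative inputs the level is 11 · 3² · 11 = 1089; taking S empty returns the conductor unchanged.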



Proof First, note that $N$ is at least 11 by Proposition 1 (since there are no weight 2 cusp forms of level smaller than 11), and so $U_1(N)$ is sufficiently small. Thus the second isomorphism of the proposition is immediate from Proposition 15.

For the first isomorphism, we only have to deal with primes $p$ in $T(\bar\rho)$ and primes $p$ at which $\delta$ is ramified, since these are the only $p$ for which $M_p$ has nontrivial $\mathrm{GL}_2(\mathbf{Z}_p)$-action. First, suppose that $p$ is in $T(\bar\rho)$. Let $V'_{S,p}$ be the subgroup of $\mathrm{GL}_2(\mathbf{Z}_p)$ consisting of all matrices that reduce modulo $p^{c_p/2}$ to something of the form $\begin{pmatrix} * & 0 \\ 0 & 1 \end{pmatrix}$. By Proposition 19 the space $\mathrm{Hom}_{\mathcal{O}[V'_{S,p}]}(\mathcal{O}(\Delta_p\circ\det), M_p)$ is a free $\mathcal{O}$-module of rank 1; let $f_p$ be a generator for this. Then composition with $f_p$ gives a map $H_S \to \mathrm{Hom}_{\mathcal{O}[V'_{S,p}]}(\mathcal{O}(\Delta\circ\det), H)$; since

$$\begin{pmatrix} p^{-c_p/2} & 0 \\ 0 & 1 \end{pmatrix} V'_{S,p} \begin{pmatrix} p^{c_p/2} & 0 \\ 0 & 1 \end{pmatrix}$$

contains $U_1(p^{c_p})$, we can compose further with the map $H \to H$ given by $x \mapsto \begin{pmatrix} p^{-c_p/2} & 0 \\ 0 & 1 \end{pmatrix} x$ to obtain an injective map

$$H_S \to \mathrm{Hom}_{\mathcal{O}[U_1(p^{c_p})]}(\mathcal{O}(\Delta\circ\det), H)_{\mathfrak{m}_S}.$$

(Here the matrix $\begin{pmatrix} p^{c_p} & 0 \\ 0 & 1 \end{pmatrix}$ should be thought of as an element of $\mathrm{GL}_2(\mathbf{Q}_p)$.) This map is easily seen to be injective with torsion-free cokernel. (The latter fact follows from the fact that the reduction of $M_p$ modulo the maximal ideal of $\mathcal{O}$ is an absolutely irreducible $k[U_1(p)]$-module and hence that $f_p$ generates $M_p$ as an $\mathcal{O}[U_1(p)]$-module.) Combining these maps for all $p$ in $T(\bar\rho)$, we obtain a map

$$H_S \to \mathrm{Hom}_{\mathcal{O}[U_1(N')]}(\mathcal{O}(\Delta\circ\det), H)_{\mathfrak{m}_S}$$

which is again injective with torsion-free cokernel, where $N'$ denotes the integer $N(\bar\sigma)\prod_{p\in S} p^{\dim_k \bar\sigma^{I_p}} = \prod_{p \in S} p^{c_p + e_p}$. Now, using analogues of Lemmas 22 and 23, one can show that the domain and codomain of this map both have the same rank over $\mathcal{O}$ and hence that the map is an isomorphism.

Now we untwist to remove the factor of $\Delta\circ\det$: let $f$ be any element of $\mathrm{Hom}_{U_1(N')}(\mathcal{O}(\Delta\circ\det), H)[I_S]$; then the image of $f$ is contained in the subspace $H' \subset H$ consisting of all elements $x$ of $H$ for which
(1) $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ acts trivially on $x$, and
(2) the Hecke operator $U_p$ annihilates $x$ for each $p$ dividing the conductor $N(\Delta)$ of $\Delta$.
For any $x$ satisfying these two properties, note that $\sum_{i=0}^{N(\Delta)-1} \begin{pmatrix} 1 & i/N(\Delta) \\ 0 & 1 \end{pmatrix} x = 0$. Now, consider the map

$$\theta : H' \to H'$$

given by multiplication by the Gauss sum $\sum_{i=0}^{N(\Delta)-1} \Delta(i) \begin{pmatrix} 1 & i/N(\Delta) \\ 0 & 1 \end{pmatrix}$. This map is an isomorphism, with inverse given by

$$x \mapsto \frac{1}{N(\Delta)} \sum_{i=0}^{N(\Delta)-1} \Delta^{-1}(i) \begin{pmatrix} 1 & -i/N(\Delta) \\ 0 & 1 \end{pmatrix} x.$$

Hence the map

$$\mathrm{Hom}_{\mathcal{O}[V_S]}(\mathcal{O}(\Delta\circ\det), H)[I_S] \to \mathrm{Hom}_{\mathcal{O}}(\mathcal{O}, H)$$

given by composition with $\theta$ is injective with torsion-free cokernel. A straightforward but tedious case-by-case check shows that the image of this map is contained in $\mathrm{Hom}_{\mathcal{O}[U_1(N)]}(\mathcal{O}, H)[I_S] = (H^{U_1(N)})_{\mathfrak{m}_S}$; it is also straightforward to check (again using the methods of Lemmas 22 and 23) that both modules are free $\mathcal{O}$-modules of the same rank, so that this map is also an isomorphism.

In order to reduce statements of the main theorem to the minimal case, we also need to understand the behaviour of $H_S$ under twisting and under change of base.

PROPOSITION 27 The module $H_S$ has the following two properties.
(1) Let $K'$ be a finite extension of $K$ with ring of integers $\mathcal{O}'$ and residue field $k'$, and, using these data, define a module $H'_S$ as above. Then $H'_S$ is isomorphic as an $(R_S^{\mathrm{mod}})' = R_S^{\mathrm{mod}} \otimes_{\mathcal{O}} \mathcal{O}'$-module to $H_S \otimes_{\mathcal{O}} \mathcal{O}'$.
(2) Let $H_{S,\bar\sigma}$ denote the module defined as above using the representation $\bar\sigma$ in place of $\bar\rho$. Then there is an isomorphism

$$H_S \to H_{S,\bar\sigma}$$

of $\mathcal{O}$-modules such that for each odd prime $p$ at which $\bar\rho$ is unramified the action of $T_p$ in $R_S^{\mathrm{mod}}$ on $H_S$ corresponds to the action of $T_p\,\tilde\delta(\mathrm{Frob}_p)$ on $H_{S,\bar\sigma}$. Equivalently, the module $H_S$ regarded as a module over $R_{S,\bar\sigma}^{\mathrm{mod}}$ via the inverse of the natural isomorphism $R_S^{\mathrm{mod}} \to R_{S,\bar\sigma}^{\mathrm{mod}}$ is isomorphic to $H_{S,\bar\sigma}$.

Thus if we can show that $H_{S,\bar\sigma}$ is free over $R_{S,\bar\sigma}^{\mathrm{mod}}$ of rank 2, then it follows that $H_S$ is free of rank 2 over $R_S^{\mathrm{mod}}$.
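The inverse formula for the Gauss-sum map θ in the proof of Proposition 26 can be sanity-checked in a scalar model. The sketch below rests on assumptions that are not in the paper: Δ is replaced by a nontrivial complex-valued Dirichlet character modulo an odd prime q, and the translation matrix with upper-right entry i/q is replaced by the scalar ζ^(a·i), where ζ = exp(2πi/q) and a is a nonzero residue, as it would be on an eigenvector. All function names are hypothetical. Under these assumptions the product of the two scalars should be 1, which reduces to the classical identity g(χ)·g(χ̄) = χ(−1)·q.

```python
import cmath


def primitive_root(q):
    """Smallest generator of the multiplicative group mod an odd prime q."""
    for g in range(2, q):
        if len({pow(g, k, q) for k in range(1, q)}) == q - 1:
            return g
    raise ValueError("no primitive root found; q must be an odd prime")


def dirichlet_character(q, index):
    """Character mod q sending a fixed primitive root to exp(2*pi*i*index/(q-1))."""
    g = primitive_root(q)
    discrete_log = {pow(g, k, q): k for k in range(q - 1)}

    def chi(a):
        a %= q
        if a == 0:
            return 0j
        return cmath.exp(2j * cmath.pi * index * discrete_log[a] / (q - 1))

    return chi


def theta_scalar(chi, q, a):
    """Scalar by which sum_i chi(i) * (translation by i/q) acts when the
    translation by 1/q acts as zeta**a."""
    return sum(chi(i) * cmath.exp(2j * cmath.pi * a * i / q) for i in range(q))


def inverse_scalar(chi, q, a):
    """Scalar for (1/q) * sum_i chi^{-1}(i) * (translation by -i/q)."""
    total = sum(chi(i).conjugate() * cmath.exp(-2j * cmath.pi * a * i / q)
                for i in range(q))
    return total / q


# Check theta * inverse = 1 for every nontrivial character mod 7 and a != 0.
q = 7
products = [theta_scalar(dirichlet_character(q, idx), q, a)
            * inverse_scalar(dirichlet_character(q, idx), q, a)
            for idx in range(1, q - 1) for a in range(1, q)]
```

The case a = 0 is excluded because the first scalar is then the sum of the character values, which is zero; in the text the corresponding component is removed by the condition that the sum of the translates of x vanishes.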

Proof The proof of the first statement is straightforward from the definition of $H_S$ and from the fact that the module $M$ is a flat and finitely presented $\mathcal{O}$-module.

For the proof of the second statement, let $N(\delta)$ be the conductor of $\delta$ and suppose that $V$ is any open compact subgroup of $\mathrm{GL}_2(\mathbf{A}^\infty)$ whose determinant is contained in the kernel of the map $\hat{\mathbf{Z}}^\times \to (\mathbf{Z}/N(\delta)\mathbf{Z})^\times$. Write $\Delta : \hat{\mathbf{Z}}^\times \to \mathcal{O}^\times$ for the character corresponding to the Teichmüller lift of $\delta$, and extend $\Delta$ to $\mathbf{A}^\times = \mathbf{Q}^\times(\hat{\mathbf{Z}}^\times \times \mathbf{R}^\times_{>0})$ by making it trivial on $\mathbf{Q}^\times$ and $\mathbf{R}^\times_{>0}$. Then the map $[g] \mapsto \Delta(\det g)$ from $Y_V$ to $\mathcal{O}$ (where $[g]$ is the element of $Y_V$ represented by an element $g$ of $\mathrm{GL}_2(\mathbf{A})$) is a well-defined locally constant map, which extends to the cusps to give a map $\theta : X_V \to \mathcal{O}$; consider the endomorphism of the space of $\mathcal{O}$-valued divisors of $X_V$ which sends a point $[g]$ to $\theta(g)\cdot[g]$. This gives a natural map $H^1(X_V, \mathcal{O}) \to H^1(X_V, \mathcal{O})$; putting these maps together for all $V$ as above, we obtain a map

$$\Theta : H \to H$$

of $\mathcal{O}$-modules satisfying $\Theta(gx) = \Delta(\det g)\Theta(x)$ for $x$ in $H$ and $g$ in $\mathrm{GL}_2(\mathbf{A}^\infty)$. Now, let $M_{\bar\sigma}$ be the module defined in the same way as $M$ but with respect to $\bar\sigma$ in place of $\bar\rho$; then $M = M_{\bar\sigma} \otimes_{\mathcal{O}} \Delta\circ\det$ as a $U_S$-module. Let $\mathfrak{m}_{S,\bar\sigma}$ be the maximal ideal of $T_S$ corresponding to $\bar\sigma$. Then one can check that composition with $\Theta$ gives the required isomorphism

$$\mathrm{Hom}_{U_S}(M, H)_{\mathfrak{m}_S} \to \mathrm{Hom}_{U_S}(M_{\bar\sigma}, H)_{\mathfrak{m}_{S,\bar\sigma}}.$$

Variant. Building on the variant at the end of Section 5, one can take $V_{S,2}$ to be $U_1(2)$ instead of $\mathrm{GL}_2(\mathbf{Z}_2)$, include an indeterminate $U_2$ in $T_S$ in place of $T_2$, and send it to $U_2$ in $R_S^{\mathrm{mod}}$ in the definition of $I_S$ and $\mathfrak{m}_S$. Then the corresponding definition of $H_S$ gives a module that is isomorphic to the original $H_S$, but the proof of Proposition 26 naturally identifies this new $H_S$ with $H^1(X_1(2N), \mathcal{O})_{\mathfrak{m}_S}$.

8. The main theorem

We are now in a position to state a stronger version of the main theorem. Let $\bar\rho$ and $S$ be as in Section 1. We prove the following theorem.

THEOREM 28 We have the following:
(1) the map $\varphi_S : R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is an isomorphism;
(2) the algebra $R_S^{\mathrm{univ}}$ is a complete intersection ring; and
(3) after any change of base necessary to define $H_S$, this module is free over $R_S^{\mathrm{univ}}$ of rank 2.

The first step in the proof is to reduce to the case where $\bar\rho$ is minimal; this reduces the number of possible cases for the local behaviour of $\bar\rho$ and so makes some of the later calculations easier. In order to make $\bar\rho$ minimal, we must first replace $K$ by some finite extension, with corresponding replacements for $\mathcal{O}$, $k$, $\bar\rho$, and $\mathcal{C}_{\mathcal{O}}$, and then we must twist $\bar\rho$ by a suitably chosen $k^\times$-valued character. That we can do both of these things follows from Propositions 8, 18, and 27, along with standard base change results. We assume from this point onwards that $\bar\rho$ is minimal.

We also explain here how to add auxiliary structure that is necessary in later arguments. Let $r$ be an odd prime congruent to 3 modulo 4, and assume that $r$ is not in $S$, $\bar\rho$ is unramified at $r$, and $\bar\rho(\mathrm{Frob}_r)$ has distinct $k$-rational eigenvalues. That such a prime exists follows from the Čebotarev density theorem together with the fact that the restriction of $\bar\rho$ to $G_{\mathbf{Q}(i)}$ is absolutely irreducible. (If this were false, then $\bar\rho$ would be induced from a character of $G_{\mathbf{Q}(i)}$ and this would contradict the hypothesis on the restriction of $\bar\rho$ to $G_2$.) Now in the definition of $H_S$ we alter the subgroup $V_S$ of $\mathrm{GL}_2(\hat{\mathbf{Z}})$ by replacing $V_{S,r} = \mathrm{GL}_2(\mathbf{Z}_r)$ with $U_1(r)\{\pm I\}$ and by removing the generator $T_r$ from $T_S$, and we write $L_S$ for the $R_S^{\mathrm{mod}}$-module thus obtained.

PROPOSITION 29 There is an isomorphism $L_S \cong H_S \oplus H_S$ of $R_S^{\mathrm{mod}}$-modules.

Proof Let $V'_S$ be the subgroup obtained from $V_S$ by adding auxiliary structure at $r$. Thus $V'_S$ is a product of subgroups of $\mathrm{GL}_2(\mathbf{Z}_p)$, and these component subgroups are identical to those of $V_S$, except at the place $r$ where we use $U_1(r)\{\pm I\}$ in place of $\mathrm{GL}_2(\mathbf{Z}_r)$. The module $L_S$ is therefore defined as $\mathrm{Hom}_{\mathcal{O}[V'_S]}(M, H)_{\mathfrak{m}_S}$.

Since $(R_S^{\mathrm{mod}}, \rho_S^{\mathrm{mod}})$ is an $S$-deformation of $\bar\rho$, the representation $\rho_S^{\mathrm{mod}}$ is unramified at $r$. Let $\tilde\alpha_r$ and $\tilde\beta_r$ in $R_S^{\mathrm{mod}}$ be the eigenvalues of $\rho_S^{\mathrm{mod}}(\mathrm{Frob}_r)$ lifting the eigenvalues $\alpha_r$ and $\beta_r$ of $\bar\rho(\mathrm{Frob}_r)$. The module $L_S$ has actions of $R_S^{\mathrm{mod}}$ and of $U_r$ which commute with each other.

Now, define a map $H_S \to L_S$ sending an element $f$ of $H_S$ to $(U_r - \tilde\beta_r) f$, and a map $L_S \to H_S$ which sends an element $f$ of $L_S$ to $\sum_g g f$ as $g$ runs over a set of coset representatives for $\mathrm{GL}_2(\mathbf{Z}_r)/U_0(r)$. We claim that the module $L_S$ is unaltered if we define it using $U_0(r)$ in place of $U_1(r)\{\pm I\}$ so that the latter map is well defined and $L_S$ decomposes into two pieces on which $U_r$ acts by $\tilde\alpha_r$ and $\tilde\beta_r$, respectively; we claim further that the composite $H_S \to H_S$ of the two maps is equal to multiplication by the unit $r\tilde\alpha_r - \tilde\beta_r$ of $R_S^{\mathrm{mod}}$. Then the maps given identify $H_S$ with the submodule of $L_S$ on which $U_r$ acts by $\tilde\alpha_r$, and by symmetry we can also identify $H_S$ with the submodule of $L_S$ on which $U_r$ acts by $\tilde\beta_r$. This proves the proposition.

It is enough to check the claims after base change from $\mathcal{O}$ to $\bar{K}$. But then, using the decomposition of $H \otimes_{\mathcal{O}} \bar{K}$, both $H_S \otimes_{\mathcal{O}} \bar{K}$ and $L_S \otimes_{\mathcal{O}} \bar{K}$ decompose into components corresponding to weight 2 newforms from which $\bar\rho$ arises, and the problem is reduced to checking the claims for any one component.



By Lemma 23 the only newforms contributing to $H_S \otimes_{\mathcal{O}} \bar{K}$ are those in $\mathcal{N}_S$, and in fact the same is true for $L_S$; to see this, note that if $f$ is a weight 2 newform from which $\bar\rho$ arises and $\pi_f = \bigotimes'_p \pi_p$ is the corresponding automorphic representation, then $\pi_r$ is a principal series representation.

For each weight 2 newform $f$ from which $\bar\rho$ arises, the $r$th component of the corresponding representation is principal series, equal to $\pi(\chi, \psi)$, where $\chi$ and $\psi$ are tamely ramified, and the restriction of its central character to $\mathbf{Z}_r^\times$ is of 2-power order. This form can contribute nontrivially to $L_S \otimes_{\mathcal{O}} \bar{K}$ only if $\pi(\chi, \psi)^{U_1(r)\{\pm I\}}$ is nonzero. But then, since $r$ is congruent to 3 modulo 4, the central character restricted to $\mathbf{Z}_r^\times$ has odd order and so is trivial. So $\pi_r^{U_1(r)\{\pm I\}}$ is equal to $\pi_r^{U_0(r)}$. Furthermore, if $\pi_r^{U_0(r)}$ is nontrivial, then $\pi_r$ has conductor at worst $r$, so that at least one of the characters $\chi$ and $\psi$ is unramified, and the central character of $\pi_r$ is also unramified. Hence both $\chi$ and $\psi$ are unramified, and so is $\pi_r$. It follows that the representation associated to $f$ is unramified at $r$ and hence that the only newforms contributing nontrivially to either of $H_S$ or $L_S$ are those in $\mathcal{N}_S$. Furthermore, the map $L_S \to H_S$ above is well defined, as claimed. One can also check, using the explicit description of $\pi(\chi, \psi)$, that the composite of the two maps defined above is equal to multiplication by $r^2\chi(r) - \psi(r)$, or $r\tilde\alpha_r - \tilde\beta_r$, as required.

Thus it is enough to show that $L_S$ is a free $R_S^{\mathrm{mod}}$-module (of rank 4), and then $H_S$ is projective, so also free, necessarily of rank 2. A key step in proving the main theorem, both in the case $S = \emptyset$ and the case $S \ne \emptyset$, requires relating the modules $L_S$ for various $S$.
Suppose that $S \subset S'$ is an inclusion of finite sets of odd primes; then for each $p$ not in $S$ we define an element $\mu_p$ of $R_S^{\mathrm{mod}}$ by the following formulas:
(1) if $e_p = 2$, then $\mu_p = (p-1)^2\,(T_p^2 - S_p(1+p)^2)$;
(2) if $e_p = 1$ and $\bar\rho|_{I_p}$ is not semisimple, then $\mu_p = (p-1)^2(p+1)$;
(3) if $e_p = 1$ and $\bar\rho|_{I_p}$ is semisimple, then $\mu_p = (p-1)^2$;
(4) if $e_p = 0$ and $p \in T(\bar\rho)$, then $\mu_p = p^2 - 1$; and
(5) if $e_p = 0$ and $p \notin T(\bar\rho)$, then $\mu_p = p - 1$.
Now we define an element $\mu_{S,S'}$ of $R_S^{\mathrm{mod}}$ by setting $\mu_{S,S'} = \prod_{S'-S}\mu_p$, except in the special case $S = \emptyset \ne S'$ when we define instead $\mu_{S,S'} = (1/2)\prod_{S'-S}\mu_p$.

Note that if $p$ is a prime at which $\bar\rho$ is unramified and $\bar\rho(\mathrm{Frob}_p)$ has distinct eigenvalues, then $T_p^2 - S_p(1+p)^2$ is a unit in $R_\emptyset^{\mathrm{mod}}$, so $\mu_p$ is a unit times $(p-1)^2$. In particular, if $S = Q$ is of the form considered toward the end of Section 2, then the element $\mu_{\emptyset,Q}$ of $R_\emptyset^{\mathrm{mod}}$ is equal to a unit times the order of the group $G_Q$.

When $S \subset S'$, we have a natural surjection $R_{S'}^{\mathrm{univ}} \to R_S^{\mathrm{univ}}$ arising from the universal property of $(R_{S'}^{\mathrm{univ}}, \rho_{S'}^{\mathrm{univ}})$, and we also have a surjection $R_{S'}^{\mathrm{mod}} \to R_S^{\mathrm{mod}}$ induced by the projection $\prod_{f \in \mathcal{N}_{S'}} \bar{K} \to \prod_{f \in \mathcal{N}_S} \bar{K}$; these maps are compatible with the maps $\varphi_{S'} : R_{S'}^{\mathrm{univ}} \to R_{S'}^{\mathrm{mod}}$ and $\varphi_S : R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$. We have the following analogue of [CDT, Prop. 5.5.1], which is used to establish the main theorem both in the case $S = \emptyset$ and in the case $S \ne \emptyset$.
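The case list defining µ_p above is easy to misread, so here is a minimal Python sketch that tabulates it. It covers cases (2) to (5) as integers; case (1) involves the ring elements T_p and S_p, so the caller must pass a stand-in value for T_p² − S_p(1+p)² through the hypothetical parameter `hecke_factor`. The function names and the sample inputs are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def mu_p(p, e_p, semisimple=None, in_T=None, hecke_factor=None):
    """Value of mu_p following the five cases in the text.

    e_p          -- 0, 1 or 2
    semisimple   -- needed when e_p == 1: is rho_bar restricted to I_p semisimple?
    in_T         -- needed when e_p == 0: is p in T(rho_bar)?
    hecke_factor -- needed when e_p == 2: stand-in for T_p^2 - S_p (1 + p)^2
    """
    if e_p == 2:
        if hecke_factor is None:
            raise ValueError("case (1) needs a value for T_p^2 - S_p (1+p)^2")
        return (p - 1) ** 2 * hecke_factor
    if e_p == 1:
        if semisimple is None:
            raise ValueError("cases (2) and (3) need the semisimple flag")
        return (p - 1) ** 2 if semisimple else (p - 1) ** 2 * (p + 1)
    if e_p == 0:
        if in_T is None:
            raise ValueError("cases (4) and (5) need the in_T flag")
        return p * p - 1 if in_T else p - 1
    raise ValueError("e_p must be 0, 1 or 2")


def mu_S_Sprime(factors, S_is_empty):
    """Product of the mu_p over S' - S, halved when S is empty and S' is not."""
    product = 1
    for factor in factors:
        product *= factor
    if S_is_empty and factors:
        return Fraction(product, 2)
    return product


# Illustrative values: p = 3 in case (2) and p = 7 in case (5).
sample = mu_S_Sprime([mu_p(3, 1, semisimple=False), mu_p(7, 0, in_T=False)],
                     S_is_empty=True)
```

Every case carries at least one factor of p − 1, which is the extra factor relative to [CDT] discussed in the proof of Proposition 30.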



PROPOSITION 30 There is a perfect pairing

$$L_S \otimes_{\mathcal{O}} L_S \to \mathcal{O}$$

of $\mathcal{O}$-modules which induces an isomorphism $L_S \to \mathrm{Hom}_{\mathcal{O}}(L_S, \mathcal{O})$ of $R_S^{\mathrm{mod}}$-modules. For each inclusion $S \subset S'$ of finite sets of odd primes, with $r$ not in $S'$, there is a map $i_{S,S'} : L_S \to L_{S'}$ of $R_{S'}^{\mathrm{mod}}$-modules such that
(1) if $S \subset S' \subset S''$, then $i_{S,S''} = i_{S',S''} \circ i_{S,S'}$;
(2) $i_{S,S'}$ is injective with torsion-free cokernel; and
(3) if $j_{S',S} : L_{S'} \to L_S$ is the adjoint of $i_{S,S'}$ with respect to the pairings of Proposition 30, then the composite $j_{S',S} \circ i_{S,S'} : L_S \to L_S$ is given by multiplication by an element of $R_S^{\mathrm{mod}}$; this element is a unit times the element $\mu_{S,S'}$ defined above.

Proof These results are proved in exactly the same way as in [CDT, Sec. 6.3], using Propositions 15 and 16 of Section 4 in place of [CDT, Lems. 6.1.2 and 6.3.1]. Note that each of the elements $\mu_p$ defined above is equal to $(p-1)$ times the corresponding factor defined in [CDT]. This difference is accounted for by the fact that we impose no global restriction on the determinant of a deformation of $\bar\rho$, with the result that the maps of modular curves used to define the inclusion $L_S \to L_{S'}$ may have larger degree than the corresponding maps in [CDT]. The larger degree just results in an extra constant factor in the computation of the composite $j_{S',S} \circ i_{S,S'}$, and it does not affect the calculations otherwise. Similarly, in the special case when $S = \emptyset \ne S'$, an extra factor of $1/2$ appears in the computations; this is a result of the fact that the index of $V_{S'}$ in $V_\emptyset$ is twice the degree of the map $Y_{S'} \to Y_\emptyset$ because $V_\emptyset$ contains the element $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ while $V_{S'}$ does not.

9. Proof of the main theorem when $S = \emptyset$

In this section we assemble the ingredients needed to prove the main theorem when $S = \emptyset$. The backbone of the argument is the following commutative algebra lemma, due to Diamond [Dia3], which simplifies the original commutative algebra arguments described in [Wi2] and [TW].
We restate it here in a less general form that is more immediately applicable to our situation.

LEMMA 31 Let $R$ be an object of $\mathcal{C}_{\mathcal{O}}$, and let $H$ be a nonzero $R$-module that is free of finite rank as an $\mathcal{O}$-module. Suppose that for some fixed integer $r \ge 0$ and for each of an unbounded set of $n \ge 0$ we have the following data:
(1) an object $R_n$ of $\mathcal{C}_{\mathcal{O}}$ which can be generated as a topological $\mathcal{O}$-algebra by $r$ elements and a surjection $R_n \to R$;
(2) a finite abelian 2-group $G_n$ that is the product of $r$ cyclic groups, each of order at least $2^n$, and a map $\mathcal{O}[G_n] \to R_n$ of objects of $\mathcal{C}_{\mathcal{O}}$ such that the kernel of the composite $\mathcal{O}[G_n] \to R_n \to R$ contains the augmentation ideal of $\mathcal{O}[G_n]$; and
(3) a module $H_n$ over $R_n$ which is free over $\mathcal{O}[G_n]$ and a map $H_n \to H$ of $R_n$-modules inducing an isomorphism $(H_n)_{G_n} \cong H$.
Then $R$ is a complete intersection ring and $H$ is a free $R$-module.

Proof See Theorem 2.1 and [Dia3, proof of Th. 3.1].

Recall that $L_\emptyset$ was defined as a module over $R_\emptyset^{\mathrm{mod}}$ and that it becomes a module over $R_\emptyset^{\mathrm{univ}}$ via the surjection $\varphi_\emptyset : R_\emptyset^{\mathrm{univ}} \to R_\emptyset^{\mathrm{mod}}$. We apply this lemma with $R = R_\emptyset^{\mathrm{univ}}$ and $H = L_\emptyset$ to deduce that $L_\emptyset$ is free as an $R_\emptyset^{\mathrm{univ}}$-module and that $R_\emptyset^{\mathrm{univ}}$ is a complete intersection. It follows trivially that $\varphi_\emptyset$ is an isomorphism. We prove in Section 11 the following theorem.

THEOREM 32 There is an integer $s$ such that for every integer $n \ge 3$ we can find a set $Q_n$ consisting of $s$ primes, with the property that for each $p$ in $Q_n$ the representation $\bar\rho$ is unramified at $p$, $\bar\rho(\mathrm{Frob}_p)$ has distinct $k$-rational eigenvalues, and $p$ is congruent to 1 modulo $2^{n+1}$, and such that the universal deformation ring $R_{Q_n}^{\mathrm{univ}}$ can be generated by $2s$ elements as a topological $\mathcal{O}$-algebra.
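Of the conditions on the primes in Theorem 32, only the congruence p ≡ 1 modulo 2^(n+1) is purely arithmetic, and it forces the 2-part of p − 1 to be at least 2^(n+1). The sketch below is an illustration under stated assumptions: it enumerates primes satisfying that congruence alone and records the 2-part of p − 1. It does not test whether ρ̄ is unramified at p or whether Frobenius has distinct eigenvalues, and it does not reproduce the definition of the group attached to Q_n in Section 2. The function names are hypothetical.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def two_part(m):
    """Largest power of 2 dividing the positive integer m."""
    return m & -m


def primes_one_mod_power_of_two(n, count):
    """First `count` primes p with p congruent to 1 modulo 2**(n + 1)."""
    modulus = 2 ** (n + 1)
    found = []
    candidate = modulus + 1
    while len(found) < count:
        if is_prime(candidate):
            found.append(candidate)
        candidate += modulus
    return found


# For n = 3 the modulus is 16; list the first three admissible primes
# together with the 2-part of p - 1.
n = 3
candidates = primes_one_mod_power_of_two(n, 3)
two_parts = [two_part(p - 1) for p in candidates]
```

For n = 3 this returns the primes 17, 97 and 113, each with 2-part of p − 1 at least 16, which is the kind of large 2-power cyclic structure that condition (2) of Lemma 31 asks for.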

The ring $R_{Q_n}^{\mathrm{univ}}$ for $n \ge 3$ plays the role of $R_n$, and $L_{Q_n}$ takes the part of $H_n$. Let $G_n$ be the group $G_{Q_n}$, and let $\mathcal{O}[G_n] \to R_n$ be the natural map defined toward the end of Section 2.2. Let $r = 2s$. It remains only to find a map $L_{Q_n} \to L_\emptyset$ of $R_{Q_n}^{\mathrm{univ}}$-modules which induces an isomorphism $(L_{Q_n})_{G_{Q_n}} \to L_\emptyset$. By Proposition 30 we have dualities

$$L_\emptyset \to \mathrm{Hom}_{\mathcal{O}}(L_\emptyset, \mathcal{O}) \quad\text{and}\quad L_{Q_n} \to \mathrm{Hom}_{\mathcal{O}}(L_{Q_n}, \mathcal{O})$$

and an injection

$$i_n = i_{\emptyset,Q_n} : L_\emptyset \to L_{Q_n}$$

with torsion-free cokernel; furthermore, these maps are all maps of $R_{Q_n}^{\mathrm{mod}}$-modules and hence also maps of $R_{Q_n}^{\mathrm{univ}}$-modules, and if $j_n = j_{Q_n,\emptyset}$ is the adjoint of $i_n$ with respect to these pairings, then the composite $j_n i_n : L_\emptyset \to L_\emptyset$ is given by multiplication by a unit times the order of $G_{Q_n}$. Recall also that from Proposition 17 the cardinality of $\mathcal{N}_{Q_n}$ is equal to the cardinality of $G_{Q_n}$ times that of $\mathcal{N}_\emptyset$, and hence the $\mathcal{O}$-rank of $L_{Q_n}$ is equal to the cardinality of $G_{Q_n}$ times the $\mathcal{O}$-rank of $L_\emptyset$. One can also check that the $\mathcal{O}$-rank of $L_{Q_n}^{G_{Q_n}}$ is equal to $4\,\#\mathcal{N}_\emptyset$; this follows from the observation that a newform $f$ in $\mathcal{N}_{Q_n}$ is in $\mathcal{N}_\emptyset$ if and only if the image of $G_{Q_n}$ in $R_f^\times$ is trivial. Now it follows, using the same pure commutative algebra argument as in [CDT, last paragraph of Sec. 6.4], that $L_{Q_n}$ is a free $\mathcal{O}[G_{Q_n}]$-module and that the map $j_n$ of $R_{Q_n}^{\mathrm{univ}}$-modules induces an isomorphism, as required. This completes the proof of the main theorem in the minimal case.

10. Proof of the main theorem when $S \ne \emptyset$

We prove the main theorem when $S \ne \emptyset$ using a freeness criterion of Diamond [Dia3, Th. 2.4], based on H. Lenstra's generalisation of the numerical isomorphism criterion of Wiles. We reproduce Diamond's result here for convenience.

THEOREM 33 (Diamond) Let $R$ be an object of $\mathcal{C}_{\mathcal{O}}$ equipped with an $\mathcal{O}$-algebra homomorphism $R \to \mathcal{O}$, and let $\mathfrak{p}$ denote the kernel of this homomorphism. Suppose that $H$ is an $R$-module, finitely generated and free over $\mathcal{O}$, and suppose that $\mathfrak{p}$ is in the support of $H$. Let $T = R/\mathrm{Ann}_R H$, write $\mathfrak{p}_T$ for the image of $\mathfrak{p}$ in $T$, and write $J_T$ for $\mathrm{Ann}_T\,\mathfrak{p}_T$. Let $\Omega$ denote $H/(H[\mathfrak{p}_T] + H[J_T])$. Let $d$ denote the rank of $H[\mathfrak{p}] = H[\mathfrak{p}_T]$ over $\mathcal{O}$. If $\Omega$ has finite length over $\mathcal{O}$, then the following are equivalent:
(1) $\mathrm{rank}_{\mathcal{O}} H \le d\,\mathrm{rank}_{\mathcal{O}} T$ and $\mathrm{length}_{\mathcal{O}}\,\Omega \ge d\,\mathrm{length}_{\mathcal{O}}(\mathfrak{p}/\mathfrak{p}^2)$,
(2) $\mathrm{rank}_{\mathcal{O}} H = d\,\mathrm{rank}_{\mathcal{O}} T$ and $\Omega \cong (\mathcal{O}/\mathrm{Fitt}_{\mathcal{O}}(\mathfrak{p}/\mathfrak{p}^2))^d$, and
(3) $R$ is a complete intersection ring and $H$ is free of rank $d$ over $R$.

Proof
See [Dia3, Th. 2.4].

Let $S$ be a set of odd primes, and let $f$ be an element of the (nonempty) set $N_\emptyset$. Assume that $K$ contains the eigenvalues of $f$. We apply the above criteria with $R = R_S^{\mathrm{univ}}$, $H = L_S$ and with the map $\theta_S : R_S^{\mathrm{univ}} \to \mathcal{O}$ corresponding to $f$; then, since $R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is surjective and $L_S$ is a faithful $R_S^{\mathrm{mod}}$-module, $T$ is isomorphic to $R_S^{\mathrm{mod}}$, $d$ is equal to 4, and we know from Propositions 24 and 29 that $\mathrm{rank}_{\mathcal{O}} H = 4\,\mathrm{rank}_{\mathcal{O}} T$. Write $\mathfrak{p}_S$ for the kernel of $\theta_S$, and write $J_S$ for the ideal $J_T$ of the theorem; then, in order to apply the theorem, all we have to show is that the $\mathcal{O}$-length of $\Omega$ is at least 4 times the $\mathcal{O}$-length of $\mathfrak{p}_S/\mathfrak{p}_S^2$.

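LEMMA 34
The module $C_S = L_S/(L_S[\mathfrak{p}_S] + L_S[J_S])$ fits into a short exact sequence
\[ 0 \to L_S[\mathfrak{p}_S] \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O}) \to C_S \to 0 \]
of $R_S^{\mathrm{mod}}$-modules.

Proof
Since $R_S^{\mathrm{mod}}$ is reduced, $\mathfrak{p}_S \cap J_S = 0$ and $\mathfrak{p}_S + J_S$ has finite index in $R_S^{\mathrm{mod}}$. It follows that the submodules $L_S[\mathfrak{p}_S]$ and $L_S[J_S]$ have trivial intersection and that their sum is a finite-index submodule of $L_S$. There is a natural map of $R_S^{\mathrm{mod}}$-modules
\[ L_S \cong \mathrm{Hom}_{\mathcal{O}}(L_S, \mathcal{O}) \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O}) \]
which is surjective because the inclusion $L_S[\mathfrak{p}_S] \to L_S$ has torsion-free cokernel. The kernel of $L_S \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$ is easily seen to be contained in $L_S[J_S]$, and hence it is equal to $L_S[J_S]$ since its $\mathcal{O}$-rank is equal to that of $L_S[J_S]$. Now $L_S/L_S[J_S] \cong \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$, and so $L_S/(L_S[J_S] + L_S[\mathfrak{p}_S])$ is isomorphic to the cokernel of the map $L_S[\mathfrak{p}_S] \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$. So this cokernel has finite length, and it follows that the map $L_S[\mathfrak{p}_S] \to \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$ is injective since it is a map of $\mathcal{O}$-modules of equal rank.

Let $i = i_{\emptyset,S} : L_\emptyset \to L_S$ be the inclusion map described in Proposition 30, and let $j = j_{S,\emptyset}$ be its adjoint. We have a diagram of $R_S^{\mathrm{mod}}$-modules
\[
\begin{array}{ccccccccc}
0 & \to & L_\emptyset[\mathfrak{p}_\emptyset] & \to & \mathrm{Hom}_{\mathcal{O}}(L_\emptyset[\mathfrak{p}_\emptyset], \mathcal{O}) & \to & C_\emptyset & \to & 0 \\
 & & {\scriptstyle i}\downarrow\uparrow{\scriptstyle j} & & {\scriptstyle j^*}\downarrow\uparrow{\scriptstyle i^*} & & \downarrow & & \\
0 & \to & L_S[\mathfrak{p}_S] & \to & \mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O}) & \to & C_S & \to & 0 \\
 & & & & \downarrow & & \downarrow & & \\
 & & & & C_1 & \to & C_2 & & \\
 & & & & \downarrow & & \downarrow & & \\
 & & & & 0 & & 0 & &
\end{array}
\]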

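in which all rows and columns are exact and every module is killed by $\mathfrak{p}_S$, so that $R_S^{\mathrm{mod}}$ acts through $\theta_S$. (Note that $\mathfrak{p}_S$ maps onto $\mathfrak{p}_\emptyset$, so that $L_\emptyset[\mathfrak{p}_\emptyset] = L_\emptyset[\mathfrak{p}_S]$.) One can check that the map denoted $i$ is injective with torsion-free cokernel and that $i^*$ is surjective; it follows that both maps are isomorphisms since in each case the $\mathcal{O}$-rank of the domain is equal to that of the codomain. The composites $ji$ and $i^*j^*$ are both given by multiplication by the nonzero element $\theta_\emptyset(\mu_{\emptyset,S})$ of $\mathcal{O}$, and so $j^*$ is injective and $j^*i^*$ is also equal to multiplication by $\theta_\emptyset(\mu_{\emptyset,S})$. Thus $C_1$ can be identified with the cokernel of multiplication by $\theta_\emptyset(\mu_{\emptyset,S})$ on the free rank-4 $\mathcal{O}$-module $\mathrm{Hom}_{\mathcal{O}}(L_S[\mathfrak{p}_S], \mathcal{O})$. It follows from a diagram chase that the map $C_1 \to C_2$ of cokernels is an isomorphism and that the map $C_\emptyset \to C_S$ is injective. So
\[ \mathrm{length}_{\mathcal{O}}\, C_S = \mathrm{length}_{\mathcal{O}}\, C_\emptyset + 4\,\mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_\emptyset(\mu_{\emptyset,S})\mathcal{O}. \]
From Theorem 33, since we know already from the proof of the minimal case of the main theorem that $L_\emptyset$ is free of rank 4 over the complete intersection ring $R_\emptyset^{\mathrm{mod}}$, we have $\mathrm{length}_{\mathcal{O}}\, C_\emptyset = 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2$. In Section 11 we prove the following result.

THEOREM 35
Suppose that $f$ is a newform in $N_\emptyset$, all of whose Hecke eigenvalues lie in $K$, with corresponding representation $\rho_f : G_{\mathbb{Q}} \to \mathrm{GL}_2(\mathcal{O})$. Let $S \subset S'$ be two finite sets of odd primes, and let $\mathfrak{p}_S$ and $\mathfrak{p}_{S'}$ be the kernels of the maps $\theta_S : R_S^{\mathrm{univ}} \to \mathcal{O}$ and $\theta_{S'} : R_{S'}^{\mathrm{univ}} \to \mathcal{O}$ corresponding to $\rho_f$. Then
\[ \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_{S'}/\mathfrak{p}_{S'}^2 \leq \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2 + \mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_S(\mu_{S,S'})\mathcal{O}, \]
where $\mu_{S,S'}$ is the element of $R_S^{\mathrm{mod}}$ defined in Section 8.

So we now have
\begin{align*}
\mathrm{length}_{\mathcal{O}}\, C_S &= 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2 + 4\,\mathrm{length}_{\mathcal{O}}\, \mathcal{O}/\theta_\emptyset(\mu_{\emptyset,S})\mathcal{O} \\
&\geq 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2 + 4\bigl(\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2 - \mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_\emptyset/\mathfrak{p}_\emptyset^2\bigr) \\
&= 4\,\mathrm{length}_{\mathcal{O}}\, \mathfrak{p}_S/\mathfrak{p}_S^2,
\end{align*}
and we deduce again from Theorem 33 that $R_S^{\mathrm{univ}}$ is a complete intersection ring and that $L_S$ is free of rank 4 over $R_S^{\mathrm{univ}}$; hence $R_S^{\mathrm{univ}} \to R_S^{\mathrm{mod}}$ is an isomorphism, as required. This completes the proof of the main theorem in the case $S \neq \emptyset$.

11. Galois cohomology and dimension calculations
In this section we give proofs of Theorems 32 and 35, which are used, respectively, to prove the minimal case and the nonminimal case of the main theorem.

11.1. Selmer groups
Here we give an interpretation of the tangent space of the universal deformation ring $R_S^{\mathrm{univ}}$ as a Selmer group contained in $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$.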

Let $M$ be any finite discrete abelian group with a continuous action of $G_{\mathbb{Q}}$, and let $M^* = \mathrm{Hom}(M, \mu(\bar{\mathbb{Q}}))$ denote its Cartier dual.

Definition
A family of local conditions for $M$ is a map that assigns to each place $v$ of $\mathbb{Q}$ a subgroup $L_v$ of $H^1(G_v, M)$ in such a way that for all but finitely many places $v$ the subgroup $L_v$ is equal to $H^1(G_v/I_v, M^{I_v})$.

If $\mathcal{L}$ is a family of local conditions, then so is the dual $\mathcal{L}^* = \{L_v^\perp\}_v$, where $L_v^\perp$ is the annihilator of $L_v$ under the perfect pairing
\[ H^1(G_v, M) \otimes H^1(G_v, M^*) \to \mathbb{Q}/\mathbb{Z} \]
given by J. Tate's local duality theorem. Given a family $\mathcal{L}$ of local conditions for $M$, we define the corresponding Selmer group $H^1_{\mathcal{L}}(G_{\mathbb{Q}}, M)$ or $H^1_{\mathcal{L}}(\mathbb{Q}, M)$ to be the set of all elements of $H^1(G_{\mathbb{Q}}, M)$ whose restriction to $H^1(G_v, M)$ lies in $L_v$ for each place $v$. The following theorem of Wiles relates the size of such a Selmer group to the size of its dual.

THEOREM 36 (Wiles)
Let $\mathcal{L}$ be a family of local conditions for a finite discrete $G_{\mathbb{Q}}$-module $M$. Then the order of the Selmer group $H^1_{\mathcal{L}}(\mathbb{Q}, M)$ is finite and
\[ \frac{\#H^1_{\mathcal{L}}(\mathbb{Q}, M)}{\#H^0(\mathbb{Q}, M)} = \frac{\#H^1_{\mathcal{L}^*}(\mathbb{Q}, M^*)}{\#H^0(\mathbb{Q}, M^*)} \prod_v \frac{\#L_v}{\#H^0(G_v, M)}. \]

To make sense of the infinite product on the right-hand side, note that for all but finitely many places $v$ the cardinalities of $L_v = H^1(G_v/I_v, M^{I_v})$ and $H^0(G_v, M)$ are equal.

Proof
For a proof of this theorem, see [DDT, Sec. 2.3].

Now suppose that $S$ is a set of odd primes and that $(R, \rho)$ is an $S$-deformation of $\bar\rho$, and let $M$ be an $R$-module of finite cardinality. Let $R[M]$ denote the object $R \oplus M$ of $\mathcal{C}_{\mathcal{O}}$ in which $M$ is an ideal whose square is zero. Then there is a correspondence (see [DDT, Sec. 2.4]) between deformations of $\rho$ to $R[M]$ (that is, deformations of $\bar\rho$ to $R[M]$ which lift $\rho$) and elements of the Galois cohomology group $H^1(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$, which sends a cocycle $\xi : G_{\mathbb{Q}} \to M_2(M)$ to the representation $(1 + \xi)\rho : G_{\mathbb{Q}} \to \mathrm{GL}_2(R[M])$. This correspondence restricts to give a correspondence between the set of $S$-deformations of $\rho$ to $R[M]$ and some submodule of $H^1(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$ which we denote $H^1_S(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$. Standard arguments give the following interpretation of $H^1_S(\mathbb{Q}, \mathrm{ad}\,\rho \otimes_R M)$ as a Selmer group.

PROPOSITION 37
The subspace $H^1_S(G_{\mathbb{Q}}, \mathrm{ad}\,\rho \otimes_R M)$ of $H^1(G_{\mathbb{Q}}, \mathrm{ad}\,\rho \otimes_R M)$ consists of all those cohomology classes $x$ such that
(1) for each odd prime $p$ not in $S$, the restriction of $x$ to a decomposition group at $p$ is contained in the submodule $H^1(G_p/I_p, (\mathrm{ad}\,\bar\rho)^{I_p})$ of $H^1(G_p, \mathrm{ad}\,\bar\rho)$, and
(2) the restriction of $x$ to $G_2$ satisfies the local condition at 2.

By Theorem 36 the module $H^1_S(G_{\mathbb{Q}}, \mathrm{ad}\,\rho \otimes_R M)$ has finite cardinality. In particular, in the special case when $(R, \rho) = (k, \bar\rho)$ and $M = k$, this Selmer group is naturally isomorphic to the tangent space $\mathrm{Hom}_{\mathcal{C}_{\mathcal{O}}}(R_S^{\mathrm{univ}}, k[\varepsilon])$ of $R_S^{\mathrm{univ}}$; hence we have the following.

COROLLARY 38
The universal deformation ring $R_S^{\mathrm{univ}}$ is Noetherian, equal to a quotient of the power series ring $\mathcal{O}[[X_1, \ldots, X_r]]$ in $r = \dim_k H^1_S(G_{\mathbb{Q}}, \mathrm{ad}\,\bar\rho)$ indeterminates over $\mathcal{O}$.
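The correspondence between liftings to $k[\varepsilon]$ and cocycles, used above to identify the tangent space with a Selmer group, can be checked by a direct computation. The following is a sketch under the assumption that $\xi : G_{\mathbb{Q}} \to M_2(k)$ is an arbitrary function and $\varepsilon^2 = 0$; it shows that $(1 + \varepsilon\xi)\bar\rho$ is multiplicative exactly when $\xi$ is a cocycle for the adjoint action.

```latex
% Assumption: \sigma = (1 + \varepsilon\xi)\bar\rho with \varepsilon^2 = 0,
% where \xi : G_{\mathbb{Q}} \to M_2(k) is a function (not yet assumed to be a cocycle).
\begin{align*}
\sigma(g)\sigma(h)
  &= (1 + \varepsilon\xi(g))\,\bar\rho(g)\,(1 + \varepsilon\xi(h))\,\bar\rho(h) \\
  &= \bigl(1 + \varepsilon\,[\xi(g) + \bar\rho(g)\,\xi(h)\,\bar\rho(g)^{-1}]\bigr)\,\bar\rho(gh), \\
\sigma(gh)
  &= (1 + \varepsilon\xi(gh))\,\bar\rho(gh).
\end{align*}
% Hence \sigma is a homomorphism if and only if
%   \xi(gh) = \xi(g) + \bar\rho(g)\,\xi(h)\,\bar\rho(g)^{-1},
% which is the cocycle condition for the conjugation action of G_{\mathbb{Q}} on M_2(k).
```

Conjugating $\sigma$ by an element $1 + \varepsilon A$ changes $\xi$ by the coboundary $g \mapsto A - \bar\rho(g) A \bar\rho(g)^{-1}$, which is why deformations correspond to cohomology classes rather than to cocycles.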

We also need to analyse further the local condition at 2 in this case. Let $N$ be the 2-dimensional $k[G_{\mathbb{Q}}]$-module corresponding to $\bar\rho$, and let $N_1$ be the 1-dimensional subspace that is fixed by inertia and on which $\mathrm{Frob}_2$ acts by the eigenvalue of $\bar\rho(\mathrm{Frob}_2)$ which is not equal to $\alpha$. Write $\mathrm{ad}^{-1}\bar\rho$ for the submodule $\mathrm{Hom}_k(N/N_1, N_1)$ of $\mathrm{ad}\,\bar\rho$. Then we have the following result.

PROPOSITION 39
Let $\sigma = (1 + \varepsilon\xi)\bar\rho$ be a lifting of $\bar\rho$ to $k[\varepsilon]$. Then $\sigma$ satisfies the local condition at 2 if and only if the following apply:
(1) if $\bar\rho$ is ramified at 2, then $\xi|_{G_2}$ is contained in the kernel of the natural map
\[ H^1(G_2, \mathrm{ad}\,\bar\rho) \to H^1(I_2, \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho); \]
(2) if $\bar\rho$ is unramified at 2, then $\xi|_{G_2}$ is contained in the submodule $H^1(G_2/I_2, (\mathrm{ad}\,\bar\rho)^{G_2}) \oplus H^1(G_2, \mathrm{ad}^{-1}\bar\rho)$ of $H^1(G_2, \mathrm{ad}\,\bar\rho)$.

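Proof
First, suppose that $\bar\rho$ is ramified at 2. Without loss of generality we may suppose that
\[ \bar\rho|_{G_2} = \begin{pmatrix} \chi_1 & * \\ 0 & \chi_2 \end{pmatrix}. \]
Then $\mathrm{ad}^{-1}\bar\rho$ is the set of matrices of the form $\left(\begin{smallmatrix} 0 & * \\ 0 & 0 \end{smallmatrix}\right)$. The condition imposed on $\sigma$ at 2 was that it should be conjugate to a representation
\[ \begin{pmatrix} \psi_1 & * \\ 0 & \psi_2 \end{pmatrix} \]
for some unramified characters $\psi_1$ and $\psi_2$. If $\sigma$ has this form, then $\xi|_{G_2}$ is clearly in the kernel of the natural map $H^1(G_2, \mathrm{ad}\,\bar\rho) \to H^1(I_2, \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho)$. Conversely, suppose that $\xi|_{G_2}$ is in this kernel; then lifting coboundaries, we may replace $\xi$ by an equivalent cocycle and assume that $\xi|_{I_2} = \left(\begin{smallmatrix} 0 & * \\ 0 & 0 \end{smallmatrix}\right)$, so that $\rho(g)$ is upper triangular for elements of the inertia group $I_2$. Now, since we have assumed that $\bar\rho$ is ramified at 2, there is an element $\left(\begin{smallmatrix} \alpha & \beta \\ 0 & 1 \end{smallmatrix}\right) = \bar\rho(g)$ for some $g$ in the inertia group $I_2$, with either $\alpha - 1$ or $\beta$ nonzero. For any element $h$ of the decomposition group $G_2$, we have the cocycle identity
\[ \xi(g) + g\,\xi(h) = \xi(h) + h\,\xi(h^{-1}gh) \]
in which both sides are just $\xi(gh) = \xi(h \cdot h^{-1}gh)$. But $\xi(h^{-1}gh)$ and $\xi(g)$ both have the form $\left(\begin{smallmatrix} 0 & * \\ 0 & 0 \end{smallmatrix}\right)$, and conjugation by $\bar\rho(h)$ preserves this; hence the matrix $\bar\rho(g)\xi(h)\bar\rho(g^{-1}) - \xi(h)$ also has this form, that is,
\[ \begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix} \xi(h) \begin{pmatrix} \alpha^{-1} & -\beta/\alpha \\ 0 & 1 \end{pmatrix} - \xi(h) = \begin{pmatrix} 0 & \gamma \\ 0 & 0 \end{pmatrix} \]
for some $\gamma$. It follows that $\rho(h)$ is upper triangular for every $h$ in $G_2$, and so that $\rho|_{G_2}$ has the required form.

Now, suppose that $\bar\rho$ is unramified at 2, so that, without loss of generality,
\[ \bar\rho(\mathrm{Frob}_2) = \begin{pmatrix} \beta & 0 \\ 0 & \alpha \end{pmatrix} \]
with $\alpha$ and $\beta$ distinct. Suppose that $\rho = (1 + \varepsilon\xi)\bar\rho$ is a lifting that satisfies the local condition at 2. Then $\xi|_{G_2}$ is contained in the submodule $H^1(G_2, \mathrm{ad}^{-1}\bar\rho) \oplus H^1(G_2/I_2, (\mathrm{ad}\,\bar\rho)^{G_2})$ of $H^1(G_2, \mathrm{ad}\,\bar\rho)$. Conversely, if $\xi|_{G_2}$ is contained in this submodule, then, altering $\xi$ by a coboundary if necessary, it has the form $\left(\begin{smallmatrix} \pi_1 & * \\ 0 & \pi_2 \end{smallmatrix}\right)$, where $\pi_1$ and $\pi_2$ are unramified additive characters, and hence $(1 + \varepsilon\xi)\bar\rho$ has the required form.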
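The matrix step in the ramified case can be written out entry by entry. The sketch below uses illustrative names $a, b, c, d$ for the entries of $\xi(h)$, which do not appear in the original; it shows why the displayed condition forces the lower-left entry of $\xi(h)$ to vanish.

```latex
% Illustrative notation (not from the original): \xi(h) = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
% and \bar\rho(g) = \begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix} as in the proof.
\[
\begin{pmatrix} \alpha & \beta \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} \alpha^{-1} & -\beta/\alpha \\ 0 & 1 \end{pmatrix}
-
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
=
\begin{pmatrix}
 \beta c/\alpha & (\alpha - 1)b + \beta(d - a) - \beta^2 c/\alpha \\
 (\alpha^{-1} - 1)\,c & -\beta c/\alpha
\end{pmatrix}.
\]
% For the right-hand side to have the shape \begin{pmatrix} 0 & \gamma \\ 0 & 0 \end{pmatrix}
% we need \beta c = 0 and (1 - \alpha) c = 0. Since either \alpha - 1 or \beta is nonzero,
% this gives c = 0, so \xi(h) is upper triangular.
```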



Now, define subgroups $L_v \subset H^1(G_v, \mathrm{ad}\,\bar\rho)$ for each place $v$ of $\mathbb{Q}$ as follows: for odd $p$ not in $S$, let $L_p = H^1(G_p/I_p, (\mathrm{ad}\,\bar\rho)^{I_p})$; for $p$ in $S$, let $L_p = H^1(G_p, \mathrm{ad}\,\bar\rho)$; let $L_\infty = H^1(G_\infty, \mathrm{ad}\,\bar\rho)$; and finally, let $L_2$ be an appropriate submodule of $H^1(G_2, \mathrm{ad}\,\bar\rho)$ as described in the statement of Proposition 39. Then Propositions 37 and 39 identify $H^1_S(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ with the Selmer group $H^1_{\mathcal{L}}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ defined by the family $\mathcal{L} = \{L_v\}_v$ of local conditions for $\mathrm{ad}\,\bar\rho$.

11.2. Proof of Theorem 32
In this section we show that we can find a special series of sets of primes $S$ such that the dimension of the tangent space of the corresponding universal deformation ring $R_S^{\mathrm{univ}}$ (and hence the number of elements required to generate $R_S^{\mathrm{univ}}$ as a topological $\mathcal{O}$-algebra) is equal to $2\#S$. We continue to assume that $\bar\rho$ satisfies the hypotheses of Section 1, although the modularity assumption is not used anywhere in this section.

As explained in Section 2, the number of elements required to generate $R_S^{\mathrm{univ}}$ can be expressed as the dimension of the space of $S$-deformations of $\bar\rho$ to $k[\varepsilon]$, and this in turn can be identified with the Selmer group $H^1_S(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ defined in Section 11.1. We use Wiles's result (Theorem 36) along with Propositions 37 and 39 to relate the dimension of $H^1_S(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ to the dimension of the dual Selmer group $H^1_{S^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$, and we then compute the latter dimension for certain carefully chosen sets $S$.

For each integer $n \geq 0$, let $E_n$ be the cyclotomic extension $\mathbb{Q}(\zeta_{2^n})$ obtained by adjoining a primitive $2^n$th root of unity to $\mathbb{Q}$, and let $F_n$ be the extension of $E_n$ cut out by the representation $\mathrm{ad}\,\bar\rho$, so that $G_{F_n}$ is the kernel of the map $\mathrm{proj}\,\bar\rho : G_{E_n} \to \mathrm{PGL}_2(k)$. The Galois group $\mathrm{Gal}(F_n/E_n)$ is thus isomorphic to the projective image in $\mathrm{PGL}_2(k)$ of the representation $\bar\rho$ restricted to $G_{E_n}$. The subgroups of $\mathrm{PGL}_2(k)$ were classified by L. Dickson in [Dic]; using his classification and our assumptions on $\bar\rho$, we find that the projective image of $\bar\rho(G_{\mathbb{Q}})$ is either dihedral, of order dividing $2(\#k - 1)$, or it is simple, conjugate to $\mathrm{PGL}_2(\kappa)$ for some subfield $\kappa \neq \mathbb{F}_2$ of $k$. The main result of this section is the following lemma, which allows us to turn the contribution of $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ to the dimension calculation into something more manageable.

LEMMA 40
Let $\psi$ be an element of $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$. If the restriction of $\psi$ to the group $H^1(E_n, \mathrm{ad}\,\bar\rho)$ is nontrivial, then there are infinitely many primes $q$ such that
(1) $q$ is congruent to 1 modulo $2^n$,
(2) $\bar\rho$ is unramified at $q$ and $\bar\rho(\mathrm{Frob}_q)$ has distinct $k$-rational eigenvalues, and
(3) $\psi$ maps to a nontrivial element of $H^1(\mathbb{F}_q, \mathrm{ad}\,\bar\rho) \subset H^1(\mathbb{Q}_q, \mathrm{ad}\,\bar\rho)$.
Conversely, if $\psi$ is an element of $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which restricts to the zero element of $H^1(E_n, \mathrm{ad}\,\bar\rho)$ and whose restriction to a decomposition group at 2 is contained in the dual local condition $L_2^\perp$, then $\psi$ is in the Selmer group $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ and there are no primes $q$ having the above properties.

Here $L_2$ is the subgroup of $H^1(G_2, \mathrm{ad}\,\bar\rho)$ defined following the proof of Proposition 39. Before proving this lemma, we establish two preliminary results.

LEMMA 41
The projective image of $\bar\rho$ is the same as that of $\bar\rho$ restricted to $G_{E_n}$.

Proof
The projective image of $\bar\rho$ is isomorphic to $\mathrm{Gal}(F_0/\mathbb{Q})$, while the projective image of $\bar\rho$ restricted to $G_{E_n}$ is isomorphic to $\mathrm{Gal}(F_n/E_n)$. The statement amounts to saying that the natural inclusion $\mathrm{Gal}(F_n/E_n) \to \mathrm{Gal}(F_0/\mathbb{Q})$ is a bijection, which is equivalent to the statement that the extensions $E_n$ and $F_0$ of $\mathbb{Q}$ are linearly disjoint. The Galois group $\mathrm{Gal}(E_n \cap F_0/\mathbb{Q})$ is a simultaneous quotient of $\mathrm{Gal}(F_0/\mathbb{Q})$ and of $\mathrm{Gal}(E_n/\mathbb{Q}) \cong (\mathbb{Z}/2^n\mathbb{Z})^\times$. If the projective image of $\bar\rho$ is $\mathrm{PGL}_2(\kappa)$ for some subfield $\kappa$ of $k$ of cardinality at least 4, then it is simple and so has no abelian quotients. Thus $\mathrm{Gal}(E_n \cap F_0/\mathbb{Q})$ is trivial, and $E_n$ and $F_0$ are linearly disjoint. If the projective image is isomorphic to the dihedral group $D_{2h}$, then $h$ is odd and the only abelian quotient of order a power of 2 is the quotient by the cyclic subgroup $C_h$ of index 2. So $\mathrm{Gal}(F_n/E_n)$ is isomorphic either to $D_{2h}$, in which case $E_n$ and $F_0$ are again linearly disjoint, or to $C_h$. In the latter case, the projective image of $\bar\rho$ becomes cyclic on restricting to the absolute Galois group of one of the three fields $\mathbb{Q}(\sqrt{2})$, $\mathbb{Q}(\sqrt{-2})$, or $\mathbb{Q}(\sqrt{-1})$, and it follows that $\bar\rho$ must be induced from a character of one of these Galois groups. One can check that then the semisimplification of the restriction of $\bar\rho$ to $G_2$ is a scalar representation, and this violates our assumption on the restriction of $\bar\rho$ to $G_2$. So the projective image of $\bar\rho$ restricted to $G_{E_n}$ cannot be isomorphic to $C_h$.

LEMMA 42
The cohomology group $H^1(F_n/E_n, \mathrm{ad}\,\bar\rho)$ is trivial.

Proof
By Lemma 41, $\mathrm{Gal}(F_n/E_n)$ is isomorphic to $\mathrm{Gal}(F_0/\mathbb{Q})$ with compatible actions on $\mathrm{ad}\,\bar\rho$, so it is enough to prove this result when $n = 0$.

We treat first the case when $\mathrm{Gal}(F_0/\mathbb{Q})$ is isomorphic to $\mathrm{PGL}_2(\kappa)$. In this case, the cohomology group above is $H^1(\mathrm{PGL}_2(\kappa), M_2(k))$, where $M_2(k)$ is the set of two-by-two matrices with entries in $k$ and $\mathrm{PGL}_2(\kappa)$ acts on $M_2(k)$ by conjugation. By decomposing $M_2(k)$ using a $\kappa$-basis of $k$, one sees that it is enough to show that $H^1(\mathrm{PGL}_2(\kappa), M_2(\kappa)) = 0$. In fact, this result is true for any finite field $\kappa$ with the exceptions of $\mathbb{F}_3$ and $\mathbb{F}_5$. See [DDT] for the proof when the order of $\kappa$ is odd. Here we give the proof where the cardinality of $\kappa$ is divisible by 4; when $\kappa = \mathbb{F}_2$, the group $\mathrm{PGL}_2(\kappa)$ is dihedral; this case is treated along with the other dihedral cases below.

Write $G$ for the group $\mathrm{PGL}_2(\kappa) \cong \mathrm{SL}_2(\kappa)$, $M$ for the $G$-module $M_2(\kappa)$, and $M^0$ for the set of elements of $M$ of trace zero. There is a filtration $0 \subset \kappa \subset M^0 \subset M$, in which the quotient $M^0/\kappa$ is isomorphic to $\kappa^2$ with an element $\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right)$ of $\mathrm{SL}_2(\kappa)$ acting on $\kappa^2$ as left multiplication by the matrix $\left(\begin{smallmatrix} a^2 & b^2 \\ c^2 & d^2 \end{smallmatrix}\right)$. The other two quotients of this filtration have trivial $G$-action. The cohomology group $H^1(G, \kappa) = \mathrm{Hom}(G, \kappa)$ is trivial because $G$ is simple, so it follows from the long exact sequence associated to the short exact sequence
\[ 0 \to \kappa \to M \to M/\kappa \to 0 \]
that $H^1(G, M)$ injects into $H^1(G, M/\kappa)$, and it suffices to prove that the latter group vanishes. One can check easily that $H^0(G, M/\kappa) = 0$; for instance, there are no nontrivial elements of $M/\kappa$ which are fixed under conjugation by both the matrix $\left(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right)$ and the matrix $\left(\begin{smallmatrix} a & 0 \\ 0 & b \end{smallmatrix}\right)$ with $a \neq b$. So, from the short exact sequence
\[ 0 \to M^0/\kappa \to M/\kappa \to \kappa \to 0 \]
of $G$-modules, we get an exact sequence
\[ 0 \to \kappa \to H^1(G, M^0/\kappa) \to H^1(G, M/\kappa) \to 0, \]
and it suffices to show that $H^1(G, M^0/\kappa)$ has dimension 1. Let $B$ be the image of the upper-triangular matrices in $\mathrm{PGL}_2(\kappa)$; then the restriction map
\[ H^1(\mathrm{PGL}_2(\kappa), M^0/\kappa) \to H^1(B, M^0/\kappa) \]
is injective since its composition with the corestriction map is just multiplication by the index of $B$ in $G$, which is odd. Write $M^1$ for the $B$-submodule of $M^0$ consisting of upper-triangular matrices with trace zero. We have a short exact sequence
\[ 0 \to M^1/\kappa \to M^0/\kappa \to M^0/M^1 \to 0 \]
in which an element $\left(\begin{smallmatrix} a & b \\ 0 & a^{-1} \end{smallmatrix}\right)$ of $B$ acts by $a^2$ on the first term and $a^{-2}$ on the third. A portion of the associated long exact sequence of cohomology groups is
\[ H^0(B, M^0/M^1) \to H^1(B, M^1/\kappa) \to H^1(B, M^0/\kappa) \to H^1(B, M^0/M^1) \]
in which the first term is zero because $B$ acts nontrivially on $M^0/M^1$. Let $U \lhd B$ be the subgroup of unipotent matrices of $B$; then from the five-term restriction-inflation sequence it follows that $H^1(B, N) \cong H^1(U, N)^{B/U}$ for any finite $\kappa$-module $N$ since the order of $B/U$ is coprime to that of $N$. In particular,
\[ H^1(B, M^1/\kappa) = H^1(U, M^1/\kappa)^{B/U} = \mathrm{Hom}_B(U, M^1/\kappa) \]
is 1-dimensional over $\kappa$ since an element $\left(\begin{smallmatrix} a & b \\ 0 & d \end{smallmatrix}\right)$ of $B$ acts by $a^2$ on $U$ and on $M^1/\kappa$. Similarly, $H^1(B, M^0/M^1) = \mathrm{Hom}_B(U, M^0/M^1)$ is the trivial module. So $H^1(B, M^0/\kappa)$ is 1-dimensional over $\kappa$.

The case where the projective image of $\bar\rho$ is dihedral is easier. We have to show that $H^1(D_{2h}, M_2(k))$ is zero, where the cyclic subgroup $C_h$ of $D_{2h}$ acts by conjugation by diagonal matrices in $\mathrm{PGL}_2(k)$ and $D_{2h}$ is generated by $C_h$ and the element $\left(\begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix}\right)$. Since the order of $C_h$ is odd, the inflation map
\[ H^1(D_{2h}/C_h, M_2(k)^{C_h}) \to H^1(D_{2h}, M_2(k)) \]
is an isomorphism. Now $M_2(k)^{C_h}$ is the space of diagonal matrices, and from the explicit description of cohomology for a cyclic group we see that $H^1(D_{2h}/C_h, M_2(k)^{C_h}) = 0$.
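The final vanishing in the dihedral case can be made explicit. This is a minimal sketch, assuming that $k$ has characteristic 2 and writing $\tau$ (a name introduced here for illustration) for the generator of $D_{2h}/C_h$, which acts on diagonal matrices by swapping the two diagonal entries.

```latex
% Assumptions: char k = 2; \tau generates D_{2h}/C_h (order 2) and acts on the
% diagonal matrices by \tau\cdot\mathrm{diag}(x, y) = \mathrm{diag}(y, x).
% For a cyclic group \langle\tau\rangle of order 2 acting on a module D,
%   H^1(\langle\tau\rangle, D) = \ker(1 + \tau)\,/\,\operatorname{im}(\tau - 1).
\begin{align*}
(1 + \tau)\,\mathrm{diag}(x, y) &= \mathrm{diag}(x + y,\ x + y),
 & \ker(1 + \tau) &= \{\mathrm{diag}(x, x) : x \in k\}, \\
(\tau - 1)\,\mathrm{diag}(x, y) &= \mathrm{diag}(y - x,\ x - y) = \mathrm{diag}(x + y,\ x + y),
 & \operatorname{im}(\tau - 1) &= \{\mathrm{diag}(z, z) : z \in k\}.
\end{align*}
% The kernel of the norm and the image of \tau - 1 coincide, so the quotient is zero.
```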

Proof of Lemma 40
Suppose that $\psi$ is a cocycle representing an element of $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which does not restrict to zero in $H^1(E_n, \mathrm{ad}\,\bar\rho)$. If $\bar\rho$ is unramified at $q$, then the conditions on the prime $q$ can be translated into the following conditions on any lifting $\Phi$ of $\mathrm{Frob}_q$:
(1) $\Phi$ is contained in $G_{E_n}$,
(2) $\bar\rho(\Phi)$ has distinct $k$-rational eigenvalues, and
(3) $\psi(\Phi)$ is not contained in $(\Phi - 1)\,\mathrm{ad}\,\bar\rho$.
The last condition arises because the cohomology group $H^1(\mathbb{F}_q, \mathrm{ad}\,\bar\rho)$ can be expressed as the cokernel of $\mathrm{Frob}_q - 1$ acting on $\mathrm{ad}\,\bar\rho$. By the Čebotarev density theorem, to find a prime $q$ satisfying the above conditions it suffices to show that the open subset of $G_{\mathbb{Q}}$ that consists of all elements $\sigma$ in $G_{E_n}$ for which $\bar\rho(\sigma)$ has distinct $k$-rational eigenvalues and $\psi(\sigma) \notin (\sigma - 1)\,\mathrm{ad}\,\bar\rho$ is nonempty. Lemma 42 shows that the first term in the restriction-inflation sequence
\[ 0 \to H^1(F_n/E_n, \mathrm{ad}\,\bar\rho) \to H^1(E_n, \mathrm{ad}\,\bar\rho) \to H^1(F_n, \mathrm{ad}\,\bar\rho) \]
is zero, so that the map on the right-hand side is an injection. It follows that $\psi$ is nontrivial when restricted to an element of $H^1(F_n, \mathrm{ad}\,\bar\rho)$. Since $G_{F_n}$ acts trivially on $\mathrm{ad}\,\bar\rho$, this last cohomology group is just the continuous homomorphisms from $G_{F_n}$ to $\mathrm{ad}\,\bar\rho$. Thus it makes sense to talk about the image $\psi(G_{F_n})$ of $\psi$ restricted to $G_{F_n}$, and this image is a nontrivial subgroup of $\mathrm{ad}\,\bar\rho$. It follows from the explicit descriptions of the possible projective images of $\bar\rho$ above that we can find elements α0 β0 , β0 α0 , and α0 β0 in the projective image of $\bar\rho$ restricted to $G_{E_n}$, with $\alpha \neq \beta$. The intersection of the subspaces of $\mathrm{ad}\,\bar\rho$ of the form $(\sigma - 1)\,\mathrm{ad}\,\bar\rho$ as $\sigma$ ranges over these three elements is $(0)$. But $\psi(G_{F_n})$ is nontrivial, so for at least one of these elements $\sigma$, $\psi(G_{F_n})$ is not contained in $(\sigma - 1)\,\mathrm{ad}\,\bar\rho$. Now if $\tau$ is any element of $G_{F_n}$, then $(\sigma - 1)\,\mathrm{ad}\,\bar\rho = (\tau\sigma - 1)\,\mathrm{ad}\,\bar\rho$, $\psi(\tau\sigma) = \tau\psi(\sigma) + \psi(\tau) = \psi(\sigma) + \psi(\tau)$, and $\bar\rho(\tau\sigma) = \bar\rho(\sigma)$ has distinct $k$-rational eigenvalues. Since $\psi(G_{F_n})$ is not contained in $(\sigma - 1)\,\mathrm{ad}\,\bar\rho$, it follows that there is an open subset of $G_{F_n}$ consisting of elements $\tau$ such that $\psi(\tau\sigma)$ is not contained in $(\tau\sigma - 1)\,\mathrm{ad}\,\bar\rho$, so that $\tau\sigma$ is an element satisfying the properties listed above. This completes the proof of the first part of Lemma 40.

Now, suppose that $\psi$ is an element of $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which restricts to the zero element of $H^1(E_n, \mathrm{ad}\,\bar\rho)$. Then, for every prime $q$ at which $\bar\rho$ is unramified and which is congruent to 1 modulo $2^n$, the decomposition group $G_q$ is contained in $G_{E_n}$, and so $\psi$ automatically restricts to the zero element of $H^1(\mathbb{Q}_q, \mathrm{ad}\,\bar\rho)$. Hence in this case there are no primes $q$ satisfying the listed properties.

Finally, note that for any prime $p \neq 2$ the inertia group $I_p$ is contained in $G_{E_n}$ since $E_n$ is ramified only at 2. So $\psi$ restricts to an element of $H^1(\mathbb{F}_p, (\mathrm{ad}\,\bar\rho)^{I_p})$ in $H^1(\mathbb{Q}_p, \mathrm{ad}\,\bar\rho)$ and all the local conditions $\emptyset^*$ are satisfied, except possibly the condition at 2. This completes the proof of Lemma 40.

We now use Theorem 36 and the Selmer group interpretation of $H^1_Q(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ given at the end of Section 11.1 to compute the size of $H^1_Q(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ for particular $Q$. Suppose that $Q$ is a finite set of odd primes with the property that for each $q$ in $Q$ the representation $\bar\rho$ is unramified at $q$ and $\bar\rho(\mathrm{Frob}_q)$ has distinct $k$-rational eigenvalues. Then in the formula of Theorem 36,
(1) the odd primes $p$ not in $Q$ contribute nothing;
(2) the primes $p$ in $Q$ each contribute a factor of $(\#k)^2$;
(3) the $G_{\mathbb{Q}}$-module $\mathrm{ad}\,\bar\rho$ is self-dual, so that $\#H^0(G_{\mathbb{Q}}, \mathrm{ad}\,\bar\rho) = \#H^0(\mathbb{Q}, \mathrm{ad}\,\bar\rho^*)$;
(4) the $k$-vector space $H^1(G_\infty, \mathrm{ad}\,\bar\rho)$ is trivial, so that the local condition $L_\infty$ is also trivial and the contribution to the dimension count at the place $\infty$ is $-\dim_k H^0(G_\infty, \mathrm{ad}\,\bar\rho) = -2$.
These facts give us the following formula for the dimension:
\[ \dim_k H^1_Q(\mathbb{Q}, \mathrm{ad}\,\bar\rho) = \dim_k H^1_{Q^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho) + \dim_k L_2 - \dim_k H^0(G_2, \mathrm{ad}\,\bar\rho) + 2\#Q - 2. \]
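The dimension formula is the logarithm, to base $\#k$, of the formula of Theorem 36 with $M = \mathrm{ad}\,\bar\rho$. The bookkeeping can be laid out as follows; this is only a restatement of items (1) to (4), with $H^1_Q$ and $H^1_{Q^*}$ abbreviating the Selmer group and its dual.

```latex
% Abbreviations (for this sketch only): H^1_Q = H^1_Q(\mathbb{Q}, \mathrm{ad}\,\bar\rho),
% H^1_{Q^*} = H^1_{Q^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho).
\begin{align*}
\dim_k H^1_Q - \dim_k H^1_{Q^*}
 &= \underbrace{\dim_k H^0(\mathbb{Q}, \mathrm{ad}\,\bar\rho) - \dim_k H^0(\mathbb{Q}, \mathrm{ad}\,\bar\rho^{*})}_{=\,0 \text{ by (3)}}
  + \sum_v \bigl(\dim_k L_v - \dim_k H^0(G_v, \mathrm{ad}\,\bar\rho)\bigr) \\
 &= \underbrace{0}_{\text{odd } p \notin Q,\ \text{by (1)}}
  + \underbrace{2\,\#Q}_{p \in Q,\ \text{by (2)}}
  + \underbrace{\dim_k L_2 - \dim_k H^0(G_2, \mathrm{ad}\,\bar\rho)}_{v = 2}
  + \underbrace{(-2)}_{v = \infty,\ \text{by (4)}}.
\end{align*}
```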


The idea now is to use Lemma 40 to find sets of primes $Q$ for which the dual Selmer group $H^1_{Q^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ is as small as possible. We first prove a lemma that makes no use of the local information at 2 in the deformation problem.

LEMMA 43
There is an integer $r \geq 0$ and for each $n \geq 3$ a set $Q_n$ of $r$ primes, each congruent to 1 modulo $2^n$, such that the dimension of $H^1_{Q_n}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ is equal to
\[ 2\#Q_n + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho) \]
if $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp \subset \beta L_2$, and
\[ 2\#Q_n - 1 + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho) \]
if $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp \not\subset \beta L_2$, where $\gamma$ and $\beta$ are the maps
\[ H^1(G_2, \mathrm{ad}^0\bar\rho) \xrightarrow{\ \gamma\ } H^1(G_2, \mathrm{ad}\,\bar\rho) \xrightarrow{\ \beta\ } H^1(G_2, k) \]
in the long exact sequence of $G_2$-modules associated to the short exact sequence
\[ 0 \to \mathrm{ad}^0\bar\rho \to \mathrm{ad}\,\bar\rho \to k \to 0 \]
in which the right-hand map is the trace map.

Proof
Let $r$ be the dimension of the vector space
\[ \mathrm{im}\bigl(H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho) \to H^1(E_n, \mathrm{ad}\,\bar\rho)\bigr). \]
For any choice of primes $Q$ at which $\bar\rho$ is unramified, we have an exact sequence
\[ 0 \to H^1_{Q^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho) \to H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho) \to \bigoplus_{q \in Q} H^1(\mathbb{F}_q, \mathrm{ad}\,\bar\rho) \]
since the only local conditions that differ are those at $q$. Now, starting with $Q_n = \emptyset$ and using Lemma 40, we can repeatedly reduce the dimension of $H^1_{Q_n^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ by adding a suitably chosen prime $q$ to the set $Q_n$; after doing this at most $r$ times, we can assume that $H^1_{Q_n^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ has been reduced to the portion of $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which lies in the kernel of the map $H^1_{\emptyset^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho) \to H^1(E_n, \mathrm{ad}\,\bar\rho)$; we can also then add extra primes to $Q_n$ so that it contains exactly $r$ elements. By Lemma 40 any cohomology class in $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ which lies in this kernel automatically satisfies all the local conditions away from 2, so $H^1_{Q_n^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ can be described as the intersection of the kernel of the map $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho) \to H^1(E_n, \mathrm{ad}\,\bar\rho)$ with the set of elements of $H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ satisfying the local condition $L_2^\perp$ at 2. Using the restriction-inflation sequence for the normal subgroup $G_{E_n}$ of $G_{\mathbb{Q}}$, this is
\[ \mathrm{im}\bigl(H^1(E_n/\mathbb{Q}, k) \to H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho)\bigr) \cap \ker\bigl(H^1(\mathbb{Q}, \mathrm{ad}\,\bar\rho) \to H^1(G_2, \mathrm{ad}\,\bar\rho)/L_2^\perp\bigr), \]


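which is isomorphic to its preimage
\[ \ker\bigl(H^1(E_n/\mathbb{Q}, k) \to H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho)/L_2^\perp\bigr) \]
in $H^1(E_n/\mathbb{Q}, k)$. But the inclusion $G_2 \to G_{\mathbb{Q}}$ induces a bijection $\mathrm{Gal}(E_n/\mathbb{Q}) \to \mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^n})/\mathbb{Q}_2)$ of Galois groups, and since $n \geq 3$, the group
\[ H^1\bigl(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^n})/\mathbb{Q}_2), k\bigr) = \mathrm{Hom}\bigl(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^n})/\mathbb{Q}_2), k\bigr) \]
can be identified with $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)$. So the dimension of $H^1_{Q_n^*}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ can now be expressed as
\[ \dim_k \ker\bigl(H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k) \to H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho)/L_2^\perp\bigr). \]
By local class field theory, the maximal abelian extension of $\mathbb{Q}_2$ is the compositum of the maximal unramified extension and $\mathbb{Q}_2(\zeta_{2^\infty})$, which are linearly disjoint, and so the abelianisation of $G_2$ is isomorphic to the direct product of $G_2/I_2 \cong \hat{\mathbb{Z}}$ and $\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2) \cong \mathbb{Z}_2^\times$. Therefore the dimension above is equal to
\[ \dim_k \ker\bigl(H^1(\mathbb{Q}_2, k) \to H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho)/L_2^\perp\bigr) \]
if $\ker(H^1(\mathbb{Q}_2, k) \to H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho)/L_2^\perp)$ is contained in $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)$, and
\[ -1 + \dim_k \ker\bigl(H^1(\mathbb{Q}_2, k) \to H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho)/L_2^\perp\bigr) \]
otherwise. We have two perfect pairings
\[
\begin{array}{ccccc}
H^1(\mathbb{Q}_2, k) & \xrightarrow{\ \alpha\ } & H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho) & & \\
\times & & \times & & \\
H^1(\mathbb{Q}_2, k) & \xleftarrow{\ \beta\ } & H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho) & \xleftarrow{\ \gamma\ } & H^1(\mathbb{Q}_2, \mathrm{ad}^0\bar\rho) \\
\downarrow & & \downarrow & & \\
\mathbb{Q}/\mathbb{Z} & & \mathbb{Q}/\mathbb{Z} & &
\end{array}
\]
of $k$-vector spaces, which are compatible in the sense that $(\alpha x, y) = (x, \beta y)$ for $x$ in $H^1(\mathbb{Q}_2, k)$ and $y$ in $H^1(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho)$. Now the kernel above is $\alpha^{-1}(L_2^\perp) = (\beta L_2)^\perp$, by compatibility of these pairings. So $\dim_k \alpha^{-1}(L_2^\perp)$ is equal to $\dim_k H^1(\mathbb{Q}_2, k) - \dim_k \beta L_2$, which is $3 - \dim_k \beta L_2$. Furthermore, the condition that $\alpha^{-1}(L_2^\perp)$ be contained in $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)$ can now be replaced by the equivalent dual condition that $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp$ be contained in $\beta L_2$. The expression for the dimension of $H^1_{Q_n}(\mathbb{Q}, \mathrm{ad}\,\bar\rho)$ now becomes
\[ 2\#Q_n + \dim_k L_2 - \dim_k H^0(G_2, \mathrm{ad}\,\bar\rho) - \dim_k \beta L_2 \]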

when $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp$ is not contained in $\beta L_2$, and one more than this otherwise. From the long exact sequence
\[ 0 \to H^0(\mathbb{Q}_2, \mathrm{ad}^0\bar\rho) \to H^0(\mathbb{Q}_2, \mathrm{ad}\,\bar\rho) \to H^0(\mathbb{Q}_2, k) \to \gamma^{-1}(L_2) \to L_2 \to \beta L_2 \to 0, \]
this is equal to $2\#Q_n - 1 + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho)$ when $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp$ is not contained in $\beta L_2$, and to $2\#Q_n + \dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho)$ when it is. This completes the proof of the lemma.

We now apply this lemma to the specific cases that we are considering. First, suppose that $\bar\rho$ is unramified at 2, so that $\bar\rho|_{G_2}$ looks like $\left(\begin{smallmatrix} \chi_1 & 0 \\ 0 & \chi_2 \end{smallmatrix}\right)$ with $\chi_1$ and $\chi_2$ distinct and $\chi_2(\mathrm{Frob}_2) = \alpha$. Then we require that any lifting $\rho$ look like $\left(\begin{smallmatrix} \psi_1 & * \\ 0 & \psi_2 \end{smallmatrix}\right)$ when restricted to $G_2$ where $\psi_i$ is unramified and lifts $\chi_i$, and the corresponding local condition is the subspace
\[ H^1(G_2/I_2, \{\text{diagonal matrices}\}) \oplus H^1(G_2, k(\chi_1/\chi_2)) \]
of the space
\[ H^1(G_2, \mathrm{ad}\,\bar\rho) = H^1(G_2, \{\text{diagonal matrices}\}) \oplus H^1(G_2, k(\chi_1/\chi_2)) \oplus H^1(G_2, k(\chi_2/\chi_1)). \]
The pullback $\gamma^{-1}(L_2)$ of this local condition to $H^1(G_2, \mathrm{ad}^0\bar\rho)$ is
\[ H^1(G_2/I_2, k) \oplus H^1(G_2, k(\chi_1/\chi_2)), \]
which has dimension 2, the dimension of $H^0(G_2, \mathrm{ad}^0\bar\rho)$ is 1, and the trace map $\beta$ takes $L_2$ into $H^1(G_2/I_2, k)$, so that $\beta L_2$ does not contain $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp$. So, by Lemma 43, the total dimension count is $2\#Q_n - 1 + 2 - 1 = 2\#Q_n$.

The other case we must consider is the case in which $\bar\rho$ is ramified at 2 and the restriction of $\bar\rho$ to a decomposition group at 2 has the form $\left(\begin{smallmatrix} \chi_1 & * \\ 0 & \chi_2 \end{smallmatrix}\right)$, where $\chi_1$ and $\chi_2$ are distinct and unramified. We require the same property of a lifting $\rho$ as above. In this case, the local condition is
\[ L_2 = \ker\bigl(H^1(G_2, \mathrm{ad}\,\bar\rho) \to H^1(I_2, \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho)\bigr), \]
and its pullback $\gamma^{-1}(L_2)$ to $H^1(G_2, \mathrm{ad}^0\bar\rho)$ is equal to
\[ \ker\bigl(H^1(G_2, \mathrm{ad}^0\bar\rho) \to H^1(I_2, \mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho)\bigr) \]
since it follows from the long exact sequence associated to
\[ 0 \to \mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho \to \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho \to k \to 0 \]
that the map $H^1(I_2, \mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho) \to H^1(I_2, \mathrm{ad}\,\bar\rho/\mathrm{ad}^{-1}\bar\rho)$ is an injection. The trace map $\beta$ again maps the local condition $L_2$ into $H^1(G_2/I_2, k)$, so that $\beta L_2$ cannot contain $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp$. It remains to compute $\dim_k \gamma^{-1}(L_2) - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho)$. We have two compatible short exact sequences of $G_2$-modules:
\[
\begin{array}{ccccccccc}
0 & \to & \mathrm{ad}^{-1}\bar\rho & \to & (\mathrm{ad}^0\bar\rho)^{I_2} & \to & (\mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho)^{I_2} & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & \mathrm{ad}^{-1}\bar\rho & \to & \mathrm{ad}^0\bar\rho & \to & \mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho & \to & 0
\end{array}
\]
in which the top sequence can be regarded as a sequence of $G_2/I_2$-modules. Now, consider the diagram of $k$-modules (Figure 1), in which the rows and columns are exact. We want to compute the dimension $\dim_k \ker \delta - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho)$. Write $Z$ for the cohomology group $H^1(G_2/I_2, (\mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho)^{I_2})$; then, since the square at the bottom left of the above diagram commutes and since $H^1(G_2/I_2, \mathrm{ad}^{-1}\bar\rho) = 0$, the image of $Z$ is contained in the image of $u$. So
\[ \dim_k \ker \delta = \dim_k \ker u + \dim_k Z \]
and
\[ \dim_k \ker \delta - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho) = \dim_k \ker u + \dim_k Z - \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho). \]

374

MARK DICKINSON

0 H 0 (G 2 , ad−1 ρ) ¯ H 0 (G 2 , ad0 ρ) ¯ H 0 (G 2 , ad0 ρ/ ¯ ad−1 ρ) ¯ H 1 (G 2 , ad−1 ρ) ¯

H 1 (G 2 /I2 , (ad0 ρ/ ¯ ad−1 ρ) ¯ I2 ) H 2 (G 2 /I2 , ad−1 ρ) ¯

H 1 (G 2 , ad0 ρ) ¯S SSS SSSδ SSS u SSS S) −1 1 (I , ad0 ρ/ / H 1 (G , ad0 ρ/ / ¯ ad ρ) ¯ H ¯ ad−1 ρ) ¯ 2 2 / H 2 (G , ad−1 ρ) ¯ 2 Figure 1

From the exact sequence
\[ 0 \to H^0(G_2, \mathrm{ad}^{-1}\bar\rho) \to H^0(G_2, \mathrm{ad}^0\bar\rho) \to H^0(G_2, \mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho) \to H^1(G_2, \mathrm{ad}^{-1}\bar\rho) \to \ker u \to 0 \]
and from the fact that $\dim_k Z = \dim_k H^0(G_2, \mathrm{ad}^0\bar\rho/\mathrm{ad}^{-1}\bar\rho)$, this dimension is equal to
\[ \dim_k H^1(G_2, \mathrm{ad}^{-1}\bar\rho) - \dim_k H^0(G_2, \mathrm{ad}^{-1}\bar\rho), \]
which is in turn equal to $1 + \dim_k H^2(G_2, \mathrm{ad}^{-1}\bar\rho)$, and so to
\[ 1 + \dim_k H^0(G_2, (\mathrm{ad}^{-1}\bar\rho)^*) = 1 \]
since $\chi_1$ and $\chi_2$ are distinct. So the total dimension count is $2\#Q_n$, and this completes the proof of Theorem 32.
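The last two equalities can be traced to two standard facts about the cohomology of $G_2$. The sketch below assumes Tate's local Euler characteristic formula over $\mathbb{Q}_2$ for a finite-dimensional $k$-vector space with $G_2$-action (with $k$ of characteristic 2) and Tate local duality; neither is restated in this section.

```latex
% Assumption: for a finite-dimensional k[G_2]-module V, with k of characteristic 2,
% the local Euler characteristic formula over \mathbb{Q}_2 reads
%   \dim_k H^0(G_2, V) - \dim_k H^1(G_2, V) + \dim_k H^2(G_2, V) = -\dim_k V.
\begin{align*}
\dim_k H^1(G_2, \mathrm{ad}^{-1}\bar\rho) - \dim_k H^0(G_2, \mathrm{ad}^{-1}\bar\rho)
 &= \dim_k \mathrm{ad}^{-1}\bar\rho + \dim_k H^2(G_2, \mathrm{ad}^{-1}\bar\rho) \\
 &= 1 + \dim_k H^0\bigl(G_2, (\mathrm{ad}^{-1}\bar\rho)^{*}\bigr)
   && \text{(Tate local duality)} \\
 &= 1.
\end{align*}
% The final step uses that G_2 acts on \mathrm{ad}^{-1}\bar\rho = \mathrm{Hom}_k(N/N_1, N_1)
% through the nontrivial character \chi_1/\chi_2, so the Cartier dual has no invariants.
```

Combined with the case $H^1(\mathrm{Gal}(\mathbb{Q}_2(\zeta_{2^\infty})/\mathbb{Q}_2), k)^\perp \not\subset \beta L_2$ of Lemma 43, this gives $2\#Q_n - 1 + 1 = 2\#Q_n$.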



11.3. Proof of Theorem 35
In this section we prove Theorem 35. Let $p$ be an odd prime. We begin with the following lemma.

LEMMA 44
Suppose that $f$ is an element of $N_\emptyset$ all of whose Hecke eigenvalues are in $K$. Then the determinant of $\mathrm{Frob}_p - 1$ acting on $(\mathrm{ad}\,\rho_f)^{I_p}(1)$ is equal to a unit of $\mathcal{O}$ times $\mu_p$, where $\mu_p$ is as defined in Section 8.

Proof
This is a straightforward case-by-case check.

LEMMA 45
Let $\rho : G_p \to \mathrm{GL}_2(\mathcal{O})$ be any continuous representation. Then the natural map
$$H^1(I_p, \mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O})^{G_p/I_p} \to H^1(I_p, K/\mathcal{O})^{G_p/I_p}$$
induced by the trace map $\mathrm{ad}\,\rho \to \mathcal{O}$ is a surjection.

Proof
We have a diagram
$$\begin{array}{ccc}
H^1(G_p, \mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O}) & \to & H^1(I_p, \mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O})^{G_p/I_p} \\
\downarrow & & \downarrow \\
H^1(G_p, K/\mathcal{O}) & \to & H^1(I_p, K/\mathcal{O})^{G_p/I_p}
\end{array}$$
It follows from the five-term restriction-inflation sequence and from the fact that $G_p/I_p \cong \hat{\mathbb{Z}}$ has cohomological dimension 1 that the horizontal maps are surjective. Thus it would be enough to show that the left-hand vertical map is surjective, and, from the long exact sequence in cohomology associated to the sequence
$$0 \to \mathrm{ad}^0\rho \otimes_{\mathcal{O}} K/\mathcal{O} \to \mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O} \to K/\mathcal{O} \to 0,$$
the surjectivity of the left-hand vertical map is equivalent to the injectivity of the map $H^2(G_p, \mathrm{ad}^0\rho \otimes_{\mathcal{O}} K/\mathcal{O}) \to H^2(G_p, \mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O})$. For any finite $G_p$-module $M$ of cardinality prime to $p$, Tate's local duality theorem gives a perfect pairing
$$H^2(G_p, M) \otimes H^0(G_p, M^*) \to \mathbb{Q}/\mathbb{Z},$$


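where $M^* = \mathrm{Hom}(M, \mu(\bar{\mathbb{Q}}_p))$ denotes the Cartier dual of $M$. Furthermore, for any $n$, the (Galois-equivariant) trace pairing on $\mathrm{ad}\,\rho$ can be used to identify the Cartier dual of the $G_p$-module $\mathrm{ad}\,\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}$ with the $G_p$-module $\mathrm{ad}\,\rho(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}$, and the dual of $\mathrm{ad}^0\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}$ with the module $(\mathrm{ad}\,\rho/\mathcal{O})(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}$. So the Pontryagin dual of the map
$$\varinjlim_n H^2(G_p, \mathrm{ad}^0\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}) \to \varinjlim_n H^2(G_p, \mathrm{ad}\,\rho \otimes_{\mathcal{O}} \mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O})$$
can be identified with the map
$$\varprojlim_n H^0\bigl(G_p, \mathrm{ad}\,\rho(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}\bigr) \to \varprojlim_n H^0\bigl(G_p, (\mathrm{ad}\,\rho/\mathcal{O})(1) \otimes_{\mathcal{O}} \mathcal{O}/\mathfrak{m}_{\mathcal{O}}^{n}\bigr),$$
which is equal to $(\mathrm{ad}\,\rho(1))^{G_p} \to (\mathrm{ad}\,\rho/\mathcal{O})(1)^{G_p}$, and to prove the lemma it is enough to show that this map is surjective. This we now do. Suppose that $A$ is an element of $M_2(\mathcal{O})$ which represents an element of $(\mathrm{ad}\,\rho/\mathcal{O})(1)^{G_p}$. It would be enough to show that the trace of $A$ is divisible by 2, for then $A - (1/2)\,\mathrm{tr}\,A$ would be an element of $\mathrm{ad}\,\rho(1)^{G_p}$ lifting $A$. Take any lift $\Phi$ in $G_p$ of $\mathrm{Frob}_p$ in $G_p/I_p$, and let $B$ be the invertible matrix $\rho(\Phi)$. Then $p\,BAB^{-1} = A + \lambda$ for some scalar matrix $\lambda$. Taking determinant and trace of this equation, we find that
$$(p^2 - 1)\det A = \lambda(\lambda + \mathrm{tr}\,A) \quad\text{and}\quad (p - 1)\,\mathrm{tr}\,A = 2\lambda.$$
It follows that $4\det A = (\mathrm{tr}\,A)^2$, and so $\mathrm{tr}\,A$ is divisible by 2. This completes the proof of the lemma.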
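The elimination of $\lambda$ in the last step can be checked mechanically. The following is a minimal sketch in exact rational arithmetic; the function name `solve_relations` and the sample ranges are illustrative choices, not part of the paper. It solves the trace relation for $\lambda$, substitutes into the determinant relation, and confirms $4\det A = (\mathrm{tr}\,A)^2$.

```python
from fractions import Fraction

def solve_relations(p, t):
    """Given p and t = tr A, return (lambda, det A) forced by the two relations.

    (p - 1) tr A = 2 lambda               (trace of  p B A B^-1 = A + lambda)
    (p^2 - 1) det A = lambda (lambda + t)  (determinant of the same equation)
    """
    lam = Fraction((p - 1) * t, 2)
    det_a = lam * (lam + t) / (p * p - 1)
    return lam, det_a

def det_shift(a, b, c, d, lam):
    """det(A + lambda) for A = [[a, b], [c, d]] and a scalar matrix lambda."""
    return (a + lam) * (d + lam) - b * c

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        for t in range(-4, 5):
            lam, det_a = solve_relations(p, t)
            print(p, t, lam, det_a, 4 * det_a == t * t)
```

The helper `det_shift` records the 2-by-2 identity $\det(A + \lambda) = \det A + \lambda\,\mathrm{tr}\,A + \lambda^2$ that underlies the determinant relation.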

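We now turn to the proof of Theorem 35.

Proof of Theorem 35
It is enough to prove the theorem in the case when $S' = S \cup \{p\}$. Let $N$ be the $G_{\mathbb{Q}}$-module $\mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O}$. The dual $\mathrm{Hom}_{\mathcal{O}}(\pi_S/\pi_S^2, K/\mathcal{O})$ of $\pi_S/\pi_S^2$ has the same length as $\pi_S/\pi_S^2$ and can be identified with the cohomology group $H^1_S(\mathbb{Q}, N)$. From Proposition 37 we have a diagram
$$\begin{array}{ccccccccc}
0 & \to & H^1_S(\mathbb{Q}, N) & \to & H^1_{S'}(\mathbb{Q}, N) & \to & C & \to & 0 \\
& & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & H^1(G_p/I_p, N^{I_p}) & \to & H^1(G_p, N) & \to & H^1(I_p, N)^{G_p/I_p} & \to & 0
\end{array}$$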



in which the rows are exact and the left-hand square is Cartesian. The module $C$ is by definition the cokernel of the top left-hand map; thus
$$\mathrm{length}_{\mathcal{O}}\, \pi_{S'}/\pi_{S'}^2 - \mathrm{length}_{\mathcal{O}}\, \pi_S/\pi_S^2 = \mathrm{length}_{\mathcal{O}}\, C,$$
and so to prove the theorem we must bound the length of $C$. By a diagram chase one sees that the right-hand vertical map is injective, so the length of $C$ is bounded by that of $H^1(I_p, N)^{G_p/I_p}$, and, as described in [CDT], the length of this is equal to the length of $\mathcal{O}/d_p\mathcal{O}$, where $d_p$ is the determinant of $\mathrm{Frob}_p - 1$ acting on $(\mathrm{ad}\,\rho)^{I_p}(1)$. By Lemma 44 this determinant is equal to a unit times $\mu_p$. In the case when $S$ is nonempty, the theorem follows.

In the case when $S$ is empty, we need a little more. Suppose that $S = \emptyset$ and $S' = \{p\}$, and let $(R', \sigma)$ be any $S'$-deformation of $\bar\rho$. Since $\sigma(c)$ necessarily has the form $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, it follows that
$$1 = (\det\sigma/\varepsilon_2)(c) = \prod_q \bigl((\det\sigma/\varepsilon_2)|_{W_q} \circ \omega_q^{-1}\bigr)(-1),$$
where the product runs over all primes $q$. For every $q$ not equal to $p$, it follows from Proposition 9 that $(\det\sigma/\varepsilon_2)|_{I_q}$ is the Teichmüller lift of $\det\bar\rho|_{I_q}$, which has odd order, and so $\bigl((\det\sigma/\varepsilon_2)|_{W_q} \circ \omega_q^{-1}\bigr)(-1) = 1$ for each $q \neq p$ and hence also for $q = p$. Now, consider the map
$$H^1_{S'}(\mathbb{Q}, N) \to H^1(I_p, K/\mathcal{O})^{G_p/I_p}$$
given by $\xi \mapsto \mathrm{tr}\,\xi|_{I_p}$. The codomain of this map can be identified with the $\mathcal{O}$-module $\mathrm{Hom}(\mathbb{Z}_p^\times, K/\mathcal{O})$; let $D \subset H^1(I_p, K/\mathcal{O})^{G_p/I_p}$ be the submodule corresponding to maps $f : \mathbb{Z}_p^\times \to K/\mathcal{O}$ which send $-1$ to 0; then the inclusion $D \subset H^1(I_p, K/\mathcal{O})^{G_p/I_p}$ has cokernel isomorphic to $\mathcal{O}/2\mathcal{O}$. An element $\xi$ of $H^1_{S'}(\mathbb{Q}, N)$ corresponds to an $S'$-deformation $(1 + \xi)\rho$ of $\rho$ to $R[\mathfrak{m}_{\mathcal{O}}^{-n}/\mathcal{O}]$ for some $n$, and, from the above calculation and the fact that $\det((1 + \xi)\rho) = (1 + \mathrm{tr}\,\xi)\det\rho$, we see that the image of the map above is contained in $D$. Hence the inverse image of $D$ under the surjective map
$$H^1(I_p, \mathrm{ad}\,\rho \otimes_{\mathcal{O}} K/\mathcal{O})^{G_p/I_p} \to H^1(I_p, K/\mathcal{O})^{G_p/I_p}$$
contains $C$ and
$$\mathrm{length}_{\mathcal{O}}\, C \le \mathrm{length}_{\mathcal{O}}\, H^1(I_p, \mathrm{ad}\,\rho \otimes K/\mathcal{O})^{G_p/I_p} - \mathrm{length}_{\mathcal{O}}\, \mathcal{O}/2\mathcal{O},$$
which is equal to $\mathrm{length}_{\mathcal{O}}\, \mathcal{O}/(1/2)\mu_p\mathcal{O}$, as required.

12. Numbers of newforms
In this section we prove the two results stated in Proposition 17. Since the cardinality of the set $N_S$ is not affected either by replacing $K$ with a finite extension of $K$ or by replacing $\bar\rho$ with a twist by a character of odd conductor, we assume in this section that $\bar\rho$ is minimal.


12.1. Existence of a minimal modular lift
Recall that $N_\emptyset$ is the set of classical weight 2 newforms of odd level whose associated deformation is an $\emptyset$-deformation. We show that the modularity assumption on $\bar\rho$, together with the assumption on $\bar\rho|_{G_2}$, implies that $N_\emptyset$ is nonempty. From Buzzard's level-lowering result (Proposition 1), there is a weight 2 newform $f$ of odd level equal to the conductor $N(\bar\rho)$ of $\bar\rho$ and for which the reduction of $t_2(f)$ is equal to $\alpha$ if $\bar\rho$ is unramified at 2. From Wiles's result (Proposition 14) we see that the deformation $(R_f, \rho_f)$ satisfies the lifting condition at 2. Recall that the free $\mathcal{O}$-module $H_\emptyset = \mathrm{Hom}_{V_\emptyset}(M, H)_{\mathfrak{m}_\emptyset}$ has $\mathcal{O}$-rank equal to twice the cardinality of $N_\emptyset$; thus in order to show that $N_\emptyset$ is nonempty it is enough to show that this module is nonzero. We do this by constructing a substitute module $M'$ for $M$ such that $M' \otimes_{\mathcal{O}} k \cong M \otimes_{\mathcal{O}} k$ and $\mathrm{Hom}_{V_\emptyset}(M', H)_{\mathfrak{m}_\emptyset}$ is demonstrably nonzero. Then from the fourth part of Proposition 15 it follows that $H_\emptyset$ is nonzero and $N_\emptyset$ is nonempty. The definition of the substitute module $M' = \bigotimes_{\mathcal{O}} M'_p$ is as follows. Let $f$ be the newform above which satisfies the local condition at 2, and assume that $K$ contains $K_f$. Then we have the following.
(1) At $p$ in $T(\bar\rho)$, $\rho_f|_{G_p}$ is induced from some character $\psi_p$ whose reduction is equal to that of $\chi_p$. Let $M'_p$ be a model over $\mathcal{O}$ for $\Theta(\psi_p)$ as described earlier.
(2) If $p$ is not in $T(\bar\rho)$, then $\det\rho_f|_{I_p}$ is a lift of $\bar\rho|_{I_p}$ and corresponds to some character $\psi_p : \mathbb{Z}_p^\times \to \mathcal{O}^\times$. Let $M'_p$ be a rank-1 $\mathcal{O}$-module on which $V_p$ acts by $\psi_p \circ \det$.
Note that for $p$ not in $T(\bar\rho)$ the group $\psi_p(\det V_p)$ has 2-power order and so has trivial reduction. Given these definitions, one can check that the automorphic representation corresponding to $f$ as above contributes nontrivially to $\mathrm{Hom}_{\mathcal{O}[V_\emptyset]}(M', H)_{\mathfrak{m}_\emptyset} \otimes_{\mathcal{O}} K$, and it follows as described above that $N_\emptyset$ is nonempty.

12.2. Cardinalities of sets of newforms
Now, suppose that $Q$ is a finite set of odd primes such that for each $p$ in $Q$ the representation $\bar\rho$ is unramified at $p$ and $\bar\rho(\mathrm{Frob}_p)$ has distinct eigenvalues. We prove the second part of Proposition 17, namely, that $\#N_Q = \#G_Q\,\#N_\emptyset$, where $G_Q$ is the abelian 2-group defined in Section 2. We define modules $L_i$ for $0 \le i \le 2$ as follows. Let $r$ be a prime congruent to 3 modulo 4 not in $Q$ at which $\bar\rho$ is unramified and $\bar\rho(\mathrm{Frob}_r)$ has distinct $k$-rational eigenvalues. Also, let $s$ be an odd prime (but not a Fermat prime) not in $Q \cup \{r\}$



at which $\bar\rho$ is unramified and $\bar\rho(\mathrm{Frob}_s)$ has distinct $k$-rational eigenvalues, and fix a nontrivial character $\psi : (\mathbb{Z}/s\mathbb{Z})^\times \to \mathcal{O}^\times$ of order not a power of 2 (enlarge $K$ if necessary). Define subgroups $V_2 \subset V_1 \subset V_0$ with $V_i = \prod_p V_{i,p}$ and an $\mathcal{O}[V_0]$-module $M' = \bigotimes_p M'_p$ as follows:
(1) at $p$ not in $Q \cup \{r, s\}$, let $V_{i,p} = V_{\emptyset,p}$ and $M'_p = M_p$;
(2) at $r$, let $V_{i,r} = U_1(r)\{\pm I\}$, and let $M'_r$ be a rank-1 $\mathcal{O}$-module with trivial $U_1(r)$-action;
(3) at $s$, let $V_{i,s} = U_0(s)$, and let $M'_s$ be a rank-1 $\mathcal{O}$-module on which $U_0(s)$ acts by the character $U_0(s) \to U_0(s)/U_1(s) \cong (\mathbb{Z}/s\mathbb{Z})^\times \to \mathcal{O}^\times$ arising from $\psi$;
(4) at $p$ in $Q$, let $V_{0,p} = U_0(p)$, let $V_{1,p}$ be the set of all $\begin{pmatrix} a & b \\ cp & d \end{pmatrix}$ in $U_0(p)$ such that $d$ has odd order in $(\mathbb{Z}/p\mathbb{Z})^\times$, and let $V_{2,p}$ be the set of all $\begin{pmatrix} a & b \\ cp & d \end{pmatrix}$ in $U_0(p)$ such that both $a$ and $d$ have odd order in $(\mathbb{Z}/p\mathbb{Z})^\times$; let $M'_p = \mathcal{O}$ have trivial $U_{0,p}$-action.
Let $\mathbb{T}$ be the polynomial algebra over $\mathcal{O}$ generated by operators $T_p$ for odd primes $p$ not in $Q \cup \{r, s\}$ at which $\bar\rho$ is unramified and by $T_2$, and let $\mathfrak{m}$ be the maximal ideal of $\mathbb{T}$ which sends $T_2$ to $\alpha$ and $T_p$ to $\mathrm{tr}\,\bar\rho(\mathrm{Frob}_p)$ for $p$ odd, as before. Now, define modules $L_i$ by
$$L_i = \mathrm{Hom}_{V_i}(M, H)_{\mathfrak{m}} \cong H^1(Y_{V_i}, \mathcal{F}_{\check{M}})_{\mathfrak{m}}$$
for $0 \le i \le 2$, where $\check{M}$ denotes the right $V_i$-module $\mathrm{Hom}_{\mathcal{O}}(M, \mathcal{O})$ and the isomorphism follows from Proposition 15. Then, using the methods of proof of Lemmas 22 and 23, we can show that an automorphic representation $\pi$ contributes nontrivially to $L_0 \otimes_{\mathcal{O}} \bar{K}$ if and only if $\pi \otimes \psi$ corresponds to an element of $N_\emptyset$, and it contributes nontrivially to the larger module $L_2 \otimes_{\mathcal{O}} \bar{K}$ if and only if $\pi \otimes \psi$ corresponds to an element of $N_Q$; in both cases the dimension of the contribution is exactly $2^{\#Q+1}$. The key observation is the following.

LEMMA 46
Suppose that $f$ is a weight 2 newform from which $\bar\rho$ arises, with corresponding automorphic representation $\pi_f = \bigotimes'_p \pi_p$, and let $p$ be a prime in $Q$. Then we have the following:
(1) the subspace $\pi_p^{V_{2,p}}$ of $\pi_p$ has dimension 2 over $\bar{K}$;
(2) if $\pi_p^{V_{0,p}}$ is nontrivial, then $\pi_p$ is unramified and $\pi_p^{V_{0,p}}$ has dimension 2 over $\bar{K}$.

Proof
By our assumption on $Q$, $\bar\rho$ is unramified at $p$ and the restriction $\bar\rho|_{G_p}$ has distinct $k$-rational eigenvalues; it follows that the deformation $\rho_f|_{G_p}$ of $\bar\rho$ is diagonalisable by Proposition 9, with both components at worst tamely ramified. Using Carayol's


theorem and the description of the local Langlands correspondence, it follows that $\pi_p = \pi(\chi_1, \chi_2)$ is principal series, corresponding to characters $\chi_1 : \mathbb{Q}_p^\times \to \mathcal{O}^\times$ and $\chi_2 : \mathbb{Q}_p^\times \to \mathcal{O}^\times$ which are trivial on $1 + p\mathbb{Z}_p$. Write $B_2(\mathbb{Q}_p)$ for the subgroup of upper-triangular matrices of $\mathrm{GL}_2(\mathbb{Q}_p)$. Let $V$ be any open compact subgroup of $\mathrm{GL}_2(\mathbb{Q}_p)$; then using the explicit description of the principal series representation given in Section 3, $\pi_p^V$ can be described as the space of maps $f : \mathrm{GL}_2(\mathbb{Q}_p)/V \to \bar{K}$ of $B_2(\mathbb{Q}_p)$-sets, where $B_2(\mathbb{Q}_p)$ acts on $\mathrm{GL}_2(\mathbb{Q}_p)/V$ by left multiplication and on $\bar{K}$ by the character $(\chi_1, \chi_2) : B_2(\mathbb{Q}_p) \to \bar{K}^\times$ given by $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \mapsto \chi_1(a)\chi_2(d)$. Choose a set $S \subset \mathrm{GL}_2(\mathbb{Q}_p)$ of representatives for $B_2(\mathbb{Q}_p)\backslash \mathrm{GL}_2(\mathbb{Q}_p)/V$; then restriction to $S$ identifies $\pi(\chi_1, \chi_2)^V$ with the space of maps $f : S \to \bar{K}$ such that for each $s$ in $S$ the image $f(s)$ is fixed by $sVs^{-1} \cap B_2(\mathbb{Q}_p)$. If $V$ is either $V_{0,p}$ or $V_{2,p}$, then we may take $S$ to be the set $\left\{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right\}$, and then it is straightforward to check that $\pi^{U_2(p)}$ has dimension 2 over $\bar{K}$ and that $\pi^{U_0(p)}$ is trivial unless $\chi_1$ and $\chi_2$ are unramified, in which case it is 2-dimensional.

Since each $\pi$ occurs twice in $H \otimes_{\mathcal{O}} \bar{K}$, we have the following.

LEMMA 47
The $\mathcal{O}$-rank of $L_0$ is equal to $2^{\#Q+2}\,\#N_\emptyset$, and the $\mathcal{O}$-rank of $L_2$ is equal to $2^{\#Q+2}\,\#N_Q$.

Thus all we have to do is show that the $\mathcal{O}$-rank of $L_2$ is equal to $\#G_Q$ times the $\mathcal{O}$-rank of $L_0$. Note that $Y_{V_2}$ is disconnected, having a number of components equal to the index of $V_2$ in $V_1$, which is equal to the cardinality of the maximal 2-power quotient of $\prod_{p \in Q} (\mathbb{Z}/p\mathbb{Z})^\times$, while $Y_{V_1}$ is connected and the natural map $Y_{V_2} \to Y_{V_1}$ induced by the inclusion $V_2 \to V_1$ is an isomorphism when restricted to any connected component of $Y_{V_2}$. The natural map $Y_{V_i} \to Y_{V_j}$ for $j \le i$ is unramified since $Y_{V_j}$ has no elliptic points, and there is an obvious isomorphism of the sheaf $\mathcal{F}_{\check{M}}$ on $Y_{V_i}$ with the pullback of the sheaf $\mathcal{F}_{\check{M}}$ on $Y_{V_j}$. Thus for every connected component $Z$ of $Y_{V_2}$ we obtain an isomorphism
$$H^1(Z, (\mathcal{F}_{\check{M}})|_Z) \to H^1(Y_{V_1}, \mathcal{F}_{\check{M}})$$
of $\mathcal{O}$-modules from which it follows that the $\mathcal{O}$-rank of $L_2$ is equal to the cardinality of the maximal 2-power quotient of $\prod_{p \in Q} (\mathbb{Z}/p\mathbb{Z})^\times$ times the $\mathcal{O}$-rank of $L_1$. It remains to show that the $\mathcal{O}$-rank of $L_1$ is equal to the cardinality of the maximal 2-power quotient of $\bigl(\prod_{p \in Q} (\mathbb{Z}/p\mathbb{Z})^\times\bigr)/\{\pm 1\}$ times the $\mathcal{O}$-rank of $L_0$, and this can be proved exactly as in [CDT, Lem. 6.4.3]. This completes the proof of Proposition 17.

Acknowledgments
I would like to thank my thesis adviser Richard Taylor for suggesting this project and for his constant availability and readiness to answer questions during my work on it. I



would also like to thank Fred Diamond for helpful conversations, and the referee for pointing out a serious error in the original version of Lemma 43 and for making many other corrections and useful suggestions.

References
[BCDT] C. BREUIL, B. CONRAD, F. DIAMOND, and R. TAYLOR, On the modularity of elliptic curves over Q, to appear in J. Amer. Math. Soc. 320
[B] K. BUZZARD, On level-lowering for mod 2 representations, Math. Res. Lett. 7 (2000), 95–110. MR 2001a:11080 322, 323
[BDST] K. BUZZARD, M. DICKINSON, N. SHEPHERD-BARRON, and R. TAYLOR, On icosahedral Artin representations, Duke Math. J. 109 (2001), 283–318. 320, 323, 326
[BT] K. BUZZARD and R. TAYLOR, Companion forms and weight one forms, Ann. of Math. (2) 149 (1999), 905–919. MR 2000j:11062 320
[C] H. CARAYOL, Sur les représentations l-adiques associées aux formes modulaires de Hilbert, Ann. Sci. École Norm. Sup. (4) 19 (1986), 409–468. MR 89c:11083 336
[Ca] W. CASSELMAN, On some results of Atkin and Lehner, Math. Ann. 201 (1973), 301–314. MR 49:2558 335
[CV] R. F. COLEMAN and J. F. VOLOCH, Companion forms and Kodaira-Spencer theory, Invent. Math. 110 (1992), 263–281. MR 93i:11063 322
[CDT] B. CONRAD, F. DIAMOND, and R. TAYLOR, Modularity of certain potentially Barsotti-Tate Galois representations, J. Amer. Math. Soc. 12 (1999), 521–567. MR 99i:11037 320, 321, 337, 338, 340, 343, 344, 345, 356, 357, 359, 377, 380
[CR] B. CONRAD and K. RUBIN, eds., Arithmetic Algebraic Geometry (Park City, Utah, 1999), Amer. Math. Soc., Providence, to appear.
[CSS] G. CORNELL, J. H. SILVERMAN, and G. STEVENS, eds., Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997. MR 99k:11004
[DDT] H. DARMON, F. DIAMOND, and R. TAYLOR, "Fermat's last theorem" in Current Developments in Mathematics (Cambridge, Mass., 1995), Internat. Press, Cambridge, Mass., 1994, 1–154. MR 99d:11067a 323, 337, 362, 367
[Dia1] F. DIAMOND, On deformation rings and Hecke rings, Ann. of Math. (2) 144 (1996), 137–166. MR 97d:11172 320
[Dia2] F. DIAMOND, "An extension of Wiles' results" in Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997, 475–489. MR CMP 1 638 490 332
[Dia3] F. DIAMOND, The Taylor-Wiles construction and multiplicity one, Invent. Math. 128 (1997), 379–391. MR 98c:11047 357, 358, 359
[DI] F. DIAMOND and J. IM, "Modular forms and modular curves" in Seminar on Fermat's Last Theorem (Toronto, 1993/1994), CMS Conf. Proc. 17, Amer. Math. Soc., Providence, 1995, 39–133. MR 97g:11044 321, 334, 335
[Dic] L. E. DICKSON, Linear Groups: With an Exposition of the Galois Field Theory, Teubner, Leipzig, 1901; reprint, Dover, New York, 1958. MR 21:3488 365
[E] B. EDIXHOVEN, The weight in Serre's conjectures on modular forms, Invent. Math. 109 (1992), 563–594. MR 93h:11124 323, 337
[FM] J.-M. FONTAINE and B. MAZUR, "Geometric Galois representations" in Elliptic Curves, Modular Forms, and Fermat's Last Theorem (Hong Kong, 1993), Ser. Number Theory 1, Internat. Press, Cambridge, Mass., 1995, 41–78. MR 96h:11049 319, 322, 325
[G] F. Q. GOUVÊA, "Deformation theory" to appear in Arithmetic Algebraic Geometry (Park City, Utah, 1999), ed. B. Conrad and K. Rubin, Amer. Math. Soc., Providence. 328
[Gr] B. H. GROSS, A tameness criterion for Galois representations associated to modular forms (mod p), Duke Math. J. 61 (1990), 445–517. MR 91i:11060 322
[Gro] A. GROTHENDIECK, Technique de descente et théorèmes d'existence en géométrie algébrique, II: Le théorème d'existence en théorie formelle des modules, Séminaire Bourbaki 5, 1958/59–1959/60, Soc. Math. France, Montrouge, 1995, 369–390, exp. no. 195. MR CMP 1 603 480 324, 328
[K] S. S. KUDLA, "The local Langlands correspondence: The non-Archimedean case" in Motives (Seattle, 1991), Proc. Sympos. Pure Math. 55, Part 2, Amer. Math. Soc., Providence, 1994, 365–391. MR 95d:11065 336
[Ku] P. KUTZKO, The Langlands conjecture for Gl2 of a local field, Ann. of Math. (2) 112 (1980), 381–412. MR 82e:12019 336
[ML] S. MACLANE, Categories for the Working Mathematician, Grad. Texts in Math. 5, Springer, New York, 1971. MR 50:7275 328
[RS] K. A. RIBET and W. A. STEIN, "Lectures on Serre's conjectures" to appear in Arithmetic Algebraic Geometry (Park City, Utah, 1999), ed. B. Conrad and K. Rubin, Amer. Math. Soc., Providence. 322
[S1] J.-P. SERRE, Linear Representations of Finite Groups, Grad. Texts in Math. 42, Springer, New York, 1977. MR 56:8675 344
[S2] J.-P. SERRE, Sur les représentations modulaires de degré 2 de $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$, Duke Math. J. 54 (1987), 179–230. MR 88g:11022 319, 321
[ST] N. I. SHEPHERD-BARRON and R. TAYLOR, mod 2 and mod 5 icosahedral representations, J. Amer. Math. Soc. 10 (1997), 283–298. MR 97h:11060 320, 323
[dSL] B. DE SMIT and H. W. LENSTRA, JR., "Explicit construction of universal deformation rings" in Modular Forms and Fermat's Last Theorem (Boston, 1995), Springer, New York, 1997, 313–326. MR CMP 1 638 482 325, 334
[T] J. TATE, "Number theoretic background" in Automorphic Forms, Representations and L-Functions (Corvallis, Ore., 1977), Part 2, Proc. Sympos. Pure Math. 33, Amer. Math. Soc., Providence, 1979, 3–26. MR 80m:12009 336
[Ta] R. TAYLOR, "Icosahedral Galois representations" in Olga Taussky-Todd: In Memoriam, Pacific J. Math. 1997, special issue, 337–347. MR 99d:11057 320
[TW] R. TAYLOR and A. WILES, Ring-theoretic properties of certain Hecke algebras, Ann. of Math. (2) 141 (1995), 553–572. MR 96d:11072 319, 357
[W] J.-L. WALDSPURGER, Sur les valeurs de certaines fonctions L automorphes en leur centre de symétrie, Compositio Math. 54 (1985), 173–242. MR 87g:11061b 344
[Wa] F. W. WARNER, Foundations of Differentiable Manifolds and Lie Groups, Grad. Texts in Math. 94, Springer, New York, 1983. MR 84k:58001 338
[Wi1] A. WILES, On ordinary λ-adic representations associated to modular forms, Invent. Math. 94 (1988), 529–573. MR 89j:11051 333, 337
[Wi2] A. WILES, Modular elliptic curves and Fermat's last theorem, Ann. of Math. (2) 141 (1995), 443–551. MR 96d:11071 319, 329, 357

Department of Mathematics, University of Michigan, 2074 East Hall, 525 East University Avenue, Ann Arbor, Michigan 48109-1109, USA; [email protected]



CUBIC RINGS AND THE EXCEPTIONAL JORDAN ALGEBRA
NOAM D. ELKIES AND BENEDICT H. GROSS

Abstract
In a previous paper [EG] we described an integral structure $(J, E)$ on the exceptional Jordan algebra of Hermitian $3 \times 3$ matrices over the Cayley octonions. Here we use modular forms and Niemeier's classification of even unimodular lattices of rank 24 to further investigate $J$ and the integral, even lattice $J_0 = (\mathbb{Z}E)^\perp$ in $J$. Specifically, we study ring embeddings of totally real cubic rings $A$ into $J$ which send the identity of $A$ to $E$, and we give a new proof of R. Borcherds's result that $J_0$ is characterized as a Euclidean lattice by its rank, type, discriminant, and minimal norm.

Contents
0. Preface 383
1. The integral structure (J, E) 384
2. Embeddings of cubic rings 388
3. The A-module structure on L 390
4. A Hilbert modular form 391
5. The case $D = p^2$ 394
6. The case D = 49 396
7. The case D = 16 399
8. The case D = 32 403
9. The uniqueness of $J_0$ 404
References 409

0. Preface
In a previous paper [EG] we described an integral structure $(J, E)$ on the exceptional cone in $\mathbb{R}^{27}$ and studied the integral, even lattice $J_0 = (\mathbb{Z}E)^\perp$ of rank 26 and discriminant 3. In this paper we study ring embeddings $f : A \to J$ of totally real cubic rings $A$ into $J$, mapping the identity element 1 of $A$ to the polarization $E = f(1)$ of $J$.

DUKE MATHEMATICAL JOURNAL, © 2001, Vol. 109, No. 2. Received 21 June 2000. Revision received 8 November 2000. 2000 Mathematics Subject Classification. Primary 11F30, 11H55, 11H56. Elkies's work supported in part by the Packard Foundation.


We first show how such an embedding gives rise to an integral, even lattice $L = A^\perp$ of rank 24, as well as a holomorphic Hilbert modular form $F(\tau)$ of weight (4,4,4) for the discrete group $\mathrm{SL}_2(A) \subset \mathrm{SL}_2(\mathbb{R})^3$. We then establish some general results on the lattice $L$ and the form $F(\tau)$. In particular, when the discriminant of $A$ is a square, we show there is a Niemeier lattice $M$ between $L$ and its dual lattice $L^\vee$, which is determined by the embedding $f : A \to J$. We then give some examples. In particular, when $A = \mathbb{Z}[2\cos(2\pi/7)] = \mathbb{Z}[\alpha]/(\alpha^3 + \alpha^2 - 2\alpha - 1)$ is the Dedekind domain of discriminant $D = 49$, and when $A = \{(a, b, c) \in \mathbb{Z}^3 : a \equiv b \equiv c \pmod 2\} = \mathbb{Z} + 2\mathbb{Z}^3$ has discriminant $D = 16$, the embedding $f$ is unique up to conjugation by the finite group $\mathrm{Aut}(J, E)$. In these cases we determine the lattices $L$ and $M$ and the Hilbert modular form $F(\tau)$.

In [B, Ch. 5.7], Borcherds proved that $J_0$ is characterized by the following properties: it is an even integral lattice of rank 26, discriminant 3, and minimal norm 4. His proof requires detailed knowledge of the Lorentzian lattice $II_{25,1}$. But one can also prove this uniqueness result using only theta functions and elementary Euclidean arguments, somewhat in the spirit of J. Conway's characterization in [C] of the Leech lattice by its rank, discriminant, and minimal norm. Starting from any even lattice $L \subset \mathbb{R}^{26}$ of discriminant 3 and minimal norm 4, the proof examines the configuration of minimal vectors of $L$ and its dual, showing that $L$ shares further combinatorial properties with $J_0$ until $L$ is forced to coincide with $J_0$. Most of these properties are also needed in our investigation of the algebra $J$; in particular, the Niemeier lattice for $D = 16$ arises naturally in the course of the uniqueness proof. Thus several steps in that proof also provide alternative explanations for facts about $J$ and $J_0$ that we used in [EG] to analyze embeddings of cubic rings. In the final section of our paper we indicate these steps of the proof; a fuller treatment of that uniqueness proof will appear elsewhere.

1. The integral structure (J, E)
In this section we recall the results from [EG] which we need. Let $R \subset \mathbb{O}$ be the Coxeter order in Cayley's octonion algebra (see [EG, (5.1)]), and let $\beta$ in $R$ be defined by
$$\beta = -\frac{1}{2} + \frac{1}{2}(e_1 + e_2 + \cdots + e_7). \tag{1.1}$$
Let $J$ be the $\mathbb{Z}$-lattice, of rank 27, consisting of the $3 \times 3$ Hermitian symmetric matrices over $R$. (This lattice was denoted $L$ in [EG], where $J$ was used for the real vector space containing the lattice, here denoted $J \otimes \mathbb{R}$.) An element $B$ of $J$ has the form
$$B = \begin{pmatrix} a & z & \bar{y} \\ \bar{z} & b & x \\ y & \bar{x} & c \end{pmatrix}$$


with $a, b, c$ in $\mathbb{Z}$ and $x, y, z$ in $R$. In particular, we define the element $E$ in $J$ by
$$E = \begin{pmatrix} 2 & \beta & \bar\beta \\ \bar\beta & 2 & \beta \\ \beta & \bar\beta & 2 \end{pmatrix} \tag{1.2}$$
(see [EG, (5.4)]). The function
$$d(B) = abc + \mathrm{Tr}(xyz) - a\,x\bar{x} - b\,y\bar{y} - c\,z\bar{z} \tag{1.3}$$
(see [EG, (1.7)]) defines a cubic form $d : J \to \mathbb{Z}$. This gives a symmetric trilinear form $(B, B', B'')$ with $(B, B, B) = 6d(B)$. Since $\beta\bar\beta = 2$ and $\mathrm{Tr}(\beta^3) = 5$, we find that $d(E) = 1$. Since $E$ is also positive definite (see [EG, (1.12)]), it defines a polarization of $J$. The finite group $\mathrm{Aut}(J, d, E)$ has order $2^{12}\,3^5\,7^2\,13$ and is isomorphic to ${}^3D_4(2).3$ (see [EG, (7.7)]). From $E$ and the cubic form, we obtain a linear form on $J$:
$$T(B) = \frac{1}{2}(E, E, B) = 2(a + b + c) + \mathrm{Tr}(x\beta) + \mathrm{Tr}(y\beta) + \mathrm{Tr}(z\beta). \tag{1.3a}$$
We also obtain two symmetric bilinear forms on $J$:
$$(B, B') = (E, B, B'), \qquad \langle B, B'\rangle = -(B, B') + T(B)T(B').$$
The first is even, of signature (1, 26) and discriminant 2. The second is positive definite and unimodular (see [EG, (7.2)]). We have $\langle B, B\rangle \equiv \langle E, B\rangle \pmod 2$ for all $B$ in $J$; that is, $E$ is a characteristic vector of the lattice. On the lattice of rank 26,
$$J_0 = \{B \in J : T(B) = 0\} = \{B \in J : \langle B, E\rangle = 0\} = \{B \in J : (B, E) = 0\},$$
we have the formula $(B, B') = -\langle B, B'\rangle$. (This lattice was denoted $L_0$ in [EG, §8].) It is even, of discriminant 3, and has no roots (see [EG, (8.4)]). Its theta function was determined in [EG, (8.7)].
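The numerical facts about $E$ quoted above can be sanity-checked without octonion arithmetic. This is a minimal sketch, assuming only that $\beta$ has trace $-1$ and norm 2 (so $\beta^2 = -\beta - 2$) and that all off-diagonal entries of $E$ lie in the commutative ring $\mathbb{Z}[\beta]$; the pair representation and the function names are illustrative choices, not notation from [EG].

```python
# An element u + v*beta of Z[beta] is stored as the pair (u, v),
# with beta^2 = -beta - 2 (trace -1, norm 2).

def mul(p, q):
    (a, b), (c, d) = p, q
    # (a + b*beta)(c + d*beta) = ac + (ad + bc)*beta + bd*beta^2
    return (a * c - 2 * b * d, a * d + b * c - b * d)

def conj(p):
    a, b = p
    # conjugation sends beta to -1 - beta
    return (a - b, -b)

def tr(p):
    a, b = p
    # Tr(u + v*beta) = 2u + v*Tr(beta) = 2u - v
    return 2 * a - b

def norm(p):
    n = mul(p, conj(p))
    assert n[1] == 0  # the norm is a rational integer
    return n[0]

BETA = (0, 1)

def d(a, b, c, x, y, z):
    """Cubic form (1.3), restricted to entries x, y, z in Z[beta]."""
    return a * b * c + tr(mul(mul(x, y), z)) - a * norm(x) - b * norm(y) - c * norm(z)

def T(a, b, c, x, y, z):
    """Linear form (1.3a), restricted to entries x, y, z in Z[beta]."""
    return 2 * (a + b + c) + tr(mul(x, BETA)) + tr(mul(y, BETA)) + tr(mul(z, BETA))

if __name__ == "__main__":
    E = (2, 2, 2, BETA, BETA, BETA)  # a = b = c = 2, x = y = z = beta
    print("norm(beta) =", norm(BETA))
    print("Tr(beta^3) =", tr(mul(mul(BETA, BETA), BETA)))
    print("d(E) =", d(*E))
    print("T(E) =", T(*E))
    # <E, E> = -(E, E, E) + T(E)^2 = -6 d(E) + T(E)^2
    print("<E, E> =", -6 * d(*E) + T(*E) ** 2)
```

The last line uses $(E, E) = (E, E, E) = 6d(E)$, so the value of $\langle E, E\rangle$ follows from $d(E)$ and $T(E)$ alone.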


Recall that the Jordan roots $S$ in $J$ are the matrices of rank 1 (see [EG, (1.8)]) which satisfy $T(S) = 2$. There are $819 = 3^2 \cdot 7 \cdot 13$ Jordan roots, permuted transitively by the group $\mathrm{Aut}(J, d, E)$ (see [EG, (7.8)]). If $S$ is a fixed Jordan root, then $\langle S, S'\rangle = 4, 2, 1, 0$ for precisely $1, 288, 512, 18$ Jordan roots $S'$, respectively (see [EG, (8.9)]). Moreover, the 18 roots $S'$ orthogonal to $S$ come in 9 pairs $(S', S'')$, with $\langle S', S''\rangle = 0$ and $2E = S + S' + S''$. These are the root triples containing $S$ (see [EG, (7.8)]). If $S'$ and $T'$ are orthogonal to $S$, with $S' \neq T'$ and $\langle S', T'\rangle \neq 0$, then $\langle S', T'\rangle = 2$. Indeed,
$$4 = \langle 2E, T'\rangle = 0 + \langle S', T'\rangle + \langle S'', T'\rangle,$$
so $\langle S', T'\rangle = \langle S'', T'\rangle = 2$. In [EG, §8], we showed that the short vectors $v$ in $J_0^\vee$ have the form
$$v = \pm\Bigl(S - \frac{2}{3}E\Bigr),$$
where $S$ is a Jordan root. From this we shall determine all elements $B$ in $J$ with $\langle B, B\rangle \le 4$.

PROPOSITION 1.4
If $\langle B, B\rangle = 0$, then $B = 0$. There are no $B$ in $J$ with $\langle B, B\rangle = 1$ or $\langle B, B\rangle = 2$. If $\langle B, B\rangle = 3$, then either
$$B = \pm E \quad\text{and}\quad T(B) = \pm 3,$$
or
$$B = \pm(E - S) \quad\text{and}\quad T(B) = \pm 1$$
for a unique Jordan root $S$. If $\langle B, B\rangle = 4$, then either
$$B = \pm S \quad\text{and}\quad T(B) = \pm 2$$
for a unique Jordan root $S$, or
$$B = S_1 - S_2 \quad\text{and}\quad T(B) = 0$$
for a pair $(S_1, S_2)$ of Jordan roots with $\langle S_1, S_2\rangle = 2$. There are precisely 2 representations of $B$ as a difference of Jordan roots.



Proof
The first statement is clear, as $\langle\cdot,\cdot\rangle$ is a positive-definite pairing. Since
$$\mathbb{Z}E \oplus J_0 \underset{3}{\subset} J \underset{3}{\subset} \frac{1}{3}\mathbb{Z}E \oplus J_0^\vee,$$
we may write
$$B = \frac{a}{3}E + v$$
with $v$ in $J_0^\vee$, and $a$ an integer. The class of $a$ mod 3 is determined by the class of $v$ in $J_0^\vee/J_0$. Since $\langle E, E\rangle = 3$,
$$\langle B, B\rangle = \frac{a^2}{3} + \langle v, v\rangle.$$
If $v = 0$, then $a = 3b$ for some nonzero integer $b$, and $B = bE$. Then $\langle B, B\rangle \ge 3$, with equality if and only if $b = \pm 1$. Otherwise, $\langle B, B\rangle \ge 12$. If $v \neq 0$, we have $\langle v, v\rangle \ge 8/3$, with equality if and only if $v = \mp(S - (2/3)E)$ for a unique Jordan root $S$ (see [EG, (8.4)]). Taking $a = \pm 1$, we obtain the elements $B = \pm(E - S)$, with
$$\langle B, B\rangle = 3, \qquad T(B) = \pm 1.$$
Taking $a = \mp 2$, we obtain the elements $B = \mp S$, with
$$\langle B, B\rangle = 4, \qquad T(B) = \mp 2.$$
If $\langle v, v\rangle > 8/3$, then $\langle v, v\rangle \ge 4$, with equality implying that $v$ lies in $J_0$. We conclude the proof by showing that
$$v = S_1 - S_2, \qquad \langle S_1, S_2\rangle = 2,$$
in precisely two distinct ways. Since there are $144 \cdot 819$ short vectors $v$ in $J_0$ (see [EG, (8.7)]) and $288 \cdot 819$ ordered pairs $(S_1, S_2)$ of Jordan roots with $\langle S_1, S_2\rangle = 2$, we will get all the short vectors, provided that we show each $S_1 - S_2$ has precisely one further representation as $T_1 - T_2$. If $S_1 - S_2 = T_1 - T_2$ with $S_i \neq T_i$, we have
$$2 = \langle S_1, T_1 - T_2\rangle = \langle S_1, T_1\rangle - \langle S_1, T_2\rangle.$$


Hence $\langle S_1, T_1\rangle = 2$ and $\langle S_1, T_2\rangle = 0$. Similarly, $\langle T_1, S_2\rangle = 0$, so we have
$$S_1 + T_2 + R = 2E, \qquad T_1 + S_2 + R' = 2E$$
for Jordan roots $R$ (orthogonal to $S_1$ and $T_2$) and $R'$ (orthogonal to $T_1$ and $S_2$). But $S_1 + T_2 = T_1 + S_2$, so $R = R'$ is orthogonal to both $(S_1, S_2)$ and $(T_1, T_2)$. Conversely, such an $R$ orthogonal to $(S_1, S_2)$ gives us another pair $(T_1, T_2)$. We conclude the proof by proving the following lemma.

LEMMA 1.5
If $S_1$ and $S_2$ are Jordan roots with $\langle S_1, S_2\rangle = 2$, there is a unique Jordan root $R$ orthogonal to $S_1$ and $S_2$.

Proof
As we noted in [EG, (8.9)] (by invoking the Atlas [A]), $\mathrm{Aut}(J_0)$ acts transitively on pairs $(S_1, S_2)$ of Jordan roots such that $\langle S_1, S_2\rangle = 2$. Thus the number of Jordan roots orthogonal to both $S_1, S_2$ is a constant independent of the choice of $S_1, S_2$; call this constant $n$. We determine $n$ by counting in two ways the triples $(S_1, S_2, R)$ of Jordan roots with the above inner products. On the one hand, there are 819 choices for $S_1$, then 288 choices for $S_2$, then $n$ choices for $R$, for a total of $819 \cdot 288 \cdot n$ triples. On the other hand, there are 819 choices for $R$, then 18 choices for $S_2$, then 16 choices for $S_1$, so $819 \cdot 18 \cdot 16 = 819 \cdot 288$ triples. Hence $n = 1$, as claimed.

2. Embeddings of cubic rings
Let $A$ be a cubic ring, by which we mean a commutative ring with unit which is isomorphic as an additive group with $\mathbb{Z}^3$. Assume that $A$ is totally real, that is, that $A \otimes \mathbb{R} \cong \mathbb{R}^3$. Let $N : A \to \mathbb{Z}$ be the norm, which is a cubic form. Let $f : A \to J$ be a homomorphism of abelian groups. We say $f$ is an embedding if the following three conditions hold:
$$d(f(a)) = Na \quad\text{for all } a \in A, \tag{2.1}$$
$$f(1) = E, \tag{2.2}$$
$$\text{the abelian group } J/f(A) \text{ is torsion-free.} \tag{2.3}$$
The first two conditions imply that $f$ is a ring homomorphism
$$f(a \cdot b) = f(a) \circ_E f(b) \tag{2.4}$$



(see [GG, Lem. 2]), where $B \circ_E B'$ is the Jordan product on $J \otimes \mathbb{Z}[1/2]$ defined in [EG, (2.15)]. The condition (2.3) implies that the embedding $f$ does not extend to a larger order $A' \supset A$ in the étale algebra $A \otimes \mathbb{Q}$. Since only maximal orders were considered in [GG], this condition was unnecessary there. By (2.1) and (2.2), we have $d(xE - f(a)) = N(x - a)$ as cubic polynomials over $\mathbb{Z}$. Hence
$$T(f(a)) = \mathrm{Tr}(a), \qquad \langle f(a), f(b)\rangle = T(f(a) \circ_E f(b)) = \mathrm{Tr}(a \cdot b).$$
Let $A^\vee \subset A \otimes \mathbb{Q}$ be the lattice dual to $A$ under the form $\langle a, b\rangle = \mathrm{Tr}(a \cdot b)$; the finite $A$-module $A^\vee/A$ has order $D = \mathrm{disc}(A)$. Let $L = f(A)^\perp$ be the subgroup, of rank 24, of elements $B$ of $J$ which are orthogonal to the image of $A$. Then $L \subset J_0$ is an even lattice, and $L^\perp = f(A)$, by (2.3). We have inclusions
$$A \oplus L \subset J \subset A^\vee \oplus L^\vee. \tag{2.5}$$

PROPOSITION 2.6
The projections onto the first and second components define isomorphisms of finite abelian groups
$$\alpha : J/(A \oplus L) \cong A^\vee/A, \qquad \beta : J/(A \oplus L) \cong L^\vee/L.$$
If $\beta \circ \alpha^{-1} = \gamma : A^\vee/A \cong L^\vee/L$, then for all $a, b$ in $A^\vee$ we have
$$\langle \gamma a, \gamma b\rangle \equiv -\langle a, b\rangle \pmod{\mathbb{Z}}.$$

Proof
We have $(A \oplus L)^\vee = A^\vee \oplus L^\vee$. Since $J$ is unimodular, the index $d$ of $A \oplus L$ in $J$ is equal to the index $d$ of $J$ in $A^\vee \oplus L^\vee$. Since the maps $\alpha$ and $\beta$ are both injective, we have $d \le \#(A^\vee/A)$ and $d \le \#(L^\vee/L)$. But by the above remark, $d^2 = \#(A^\vee/A) \cdot \#(L^\vee/L)$. Hence the maps $\alpha$ and $\beta$ are both isomorphisms, and we have $d = \#(L^\vee/L) = \#(A^\vee/A) = D$.

Define $t_A : J \to A^\vee$ as follows: $t_A(B)$ is the first component of $B$ in the decomposition $J \subset A^\vee + L^\vee$. Then
$$\mathrm{Tr}(t_A(B)) = \langle 1, t_A(B)\rangle = \langle E, B\rangle = T(B) \in \mathbb{Z}. \tag{2.7}$$
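Since $\#(A^\vee/A) = D$ is the determinant of the Gram matrix of the trace form, the two discriminants mentioned in the preface can be recomputed directly. The sketch below is an illustration under stated assumptions: the $\mathbb{Z}$-basis chosen for $\mathbb{Z} + 2\mathbb{Z}^3$ and the multiplication matrix for $\alpha$ are one possible choice, and the function names are hypothetical.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def trace(m):
    return m[0][0] + m[1][1] + m[2][2]

def disc_monogenic(mult_by_alpha):
    """Discriminant of Z[alpha] with basis 1, alpha, alpha^2.

    mult_by_alpha is the matrix of multiplication by alpha in that basis;
    the Gram matrix of the trace form has entries Tr(alpha^(i+j)).
    """
    powers = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]]]
    for _ in range(4):
        powers.append(matmul(powers[-1], mult_by_alpha))
    gram = [[trace(powers[i + j]) for j in range(3)] for i in range(3)]
    return det3(gram)

def disc_subring_of_Z3(basis):
    """Discriminant of a subring of Z^3 given by a Z-basis.

    In Z^3 the trace of u*v is the sum of the coordinatewise products.
    """
    gram = [[sum(u[k] * v[k] for k in range(3)) for v in basis] for u in basis]
    return det3(gram)

# alpha^3 = 1 + 2*alpha - alpha^2; columns are the images of 1, alpha, alpha^2.
MULT_ALPHA = [[0, 0, 1], [1, 0, 2], [0, 1, -1]]
# One Z-basis (an assumed choice) of {(a, b, c) : a = b = c mod 2}.
BASIS_Z_PLUS_2Z3 = [(1, 1, 1), (2, 0, 0), (0, 2, 0)]

if __name__ == "__main__":
    print(disc_monogenic(MULT_ALPHA))
    print(disc_subring_of_Z3(BASIS_Z_PLUS_2Z3))
```

Any other $\mathbb{Z}$-basis of the same ring gives the same determinant, since a change of basis multiplies the Gram determinant by the square of a unit of $\mathbb{Z}$.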


3. The A-module structure on L
The lattice $J_0$ is even, so $q(v) = \langle v, v\rangle/2$ defines a positive-definite quadratic form $q : J_0 \to \mathbb{Z}$. In this section we define an $A$-module structure on the lattice $L = f(A)^\perp \subset J_0$ and define a positive-definite quadratic map of $A$-modules
$$q_A : L \to A^\vee \tag{3.1}$$
such that $\mathrm{Tr}(q_A) = q$ on $L$. The $A$-module structure on $L$ is due to T. Springer (cf. [KMRT]). It comes from the $E$-adjoint map $B \mapsto B^\#$ on $J$ (cf. [EG, (2.21)], where $B^\#$ was denoted $B^*_E$). This is a quadratic map, which satisfies
$$B \circ_E B^\# = B^\# \circ_E B = d(B) \cdot E,$$
$$q(v) = -\langle v^\#, E\rangle \quad\text{for all } v \in J_0$$
(see [EG, 2.22]). There is a similar map $a \mapsto a^\#$ on $A$, with $a \cdot a^\# = N(a)$, and if $f : A \to J$ is an embedding, we have $f(a^\#) = f(a)^\#$. The Freudenthal product $B \times C$ is the symmetric bilinear map $J \times J \to J$ defined by the formulas
$$B \times C = (B + C)^\# - B^\# - C^\# = 2(B \circ_E C) - T(B)C - T(C)B + (B, C)E.$$
Note that $E \times C = -C$. A key identity, which can be verified using [EG, (2.15)], is the inner product formula
$$\langle B \times C, D\rangle = \langle B, C \times D\rangle.$$
For $a \in A$ and $v \in L$, we define $a \cdot v$ in $L$ by the formula
$$a \cdot v = -(f(a) \times v).$$
This lies in $L = f(A)^\perp$, as for any $b \in A$,
$$\langle f(b), a \cdot v\rangle = -\langle f(b), f(a) \times v\rangle = -\langle f(b) \times f(a), v\rangle = -\langle f(b \times a), v\rangle = 0.$$
It endows $L$ with an $A$-module structure. (The relevant identities can be checked for the action of $A \otimes \mathbb{R}$ on $L \otimes \mathbb{R}$, as in [EG, §1–3].)
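The identities in this section have a familiar toy analogue, which may help fix the shape of the formulas. The sketch below is not the octonionic algebra $J$: it assumes ordinary $3 \times 3$ integer symmetric matrices, with the identity matrix in the role of $E$, the determinant as cubic form, the trace as $T$, $\langle X, Y\rangle = \mathrm{tr}(XY)$, and the adjugate as $B^\#$. In that setting it checks $B B^\# = \det(B) I$, the two formulas for the cross product, and the inner product formula. All names and sample matrices are illustrative.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def lin(x, y, s=1):
    """Return x + s*y for 3x3 matrices."""
    return [[x[i][j] + s * y[i][j] for j in range(3)] for i in range(3)]

def scale(c, x):
    return [[c * x[i][j] for j in range(3)] for i in range(3)]

def tr(x):
    return x[0][0] + x[1][1] + x[2][2]

def det(x):
    (a, b, c), (d, e, f), (g, h, i) = x
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def sharp(x):
    """Adjugate via Cayley-Hamilton: x^2 - tr(x) x + e2(x) I."""
    x2 = matmul(x, x)
    e2 = (tr(x) ** 2 - tr(x2)) // 2  # always an integer for integer matrices
    return lin(lin(x2, scale(tr(x), x), -1), scale(e2, I3))

def cross(b, c):
    """Polarization of sharp: (b + c)# - b# - c#."""
    return lin(lin(sharp(lin(b, c)), sharp(b), -1), sharp(c), -1)

def cross_formula(b, c):
    """2(b o c) - T(b) c - T(c) b + (T(b)T(c) - <b, c>) I, with b o c = (bc + cb)/2."""
    out = lin(matmul(b, c), matmul(c, b))
    out = lin(out, scale(tr(b), c), -1)
    out = lin(out, scale(tr(c), b), -1)
    return lin(out, scale(tr(b) * tr(c) - tr(matmul(b, c)), I3))

def pair(x, y):
    return tr(matmul(x, y))

B = [[2, 1, 0], [1, 3, -1], [0, -1, 1]]
C = [[0, 2, 1], [2, -1, 0], [1, 0, 4]]
D = [[1, 0, 2], [0, 5, 1], [2, 1, -3]]

if __name__ == "__main__":
    print(matmul(B, sharp(B)) == scale(det(B), I3))
    print(cross(B, C) == cross_formula(B, C))
    print(pair(cross(B, C), D) == pair(B, cross(C, D)))
```

In this analogue the coefficient of the identity in the cross product is $T(B)T(C) - \langle B, C\rangle$, which matches the term $(B, C)E$ above because $(B, C) = T(B)T(C) - \langle B, C\rangle$.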


Now for $v \in L \subset J \subset A^\vee + L^\vee$, write
$$v^\# = -q_A(v) + \beta(v), \quad \text{with } q_A : L \to A^\vee, \ \beta : L \to L^\vee. \tag{3.2}$$

Again, by extending scalars to $\mathbb{R}$, one can check that $q_A(a \cdot v) = a^2 \cdot q_A(v)$, and $\langle v, w\rangle_A = q_A(v + w) - q_A(v) - q_A(w)$ is a bilinear form with values in $A^\vee$; thus $q_A$ is a quadratic form. Moreover, $\operatorname{Tr} q_A(v) = -T(v^\#) = -\langle v^\#, E\rangle = q(v)$. From this it follows that $L^\vee$ is also an $A$-module inside $L \otimes \mathbb{Q}$. Indeed, $L^\vee = \operatorname{Hom}(L, \mathbb{Z})$ under $\langle\cdot,\cdot\rangle$, so
$$L^\vee = \operatorname{Hom}_A(L, A^\vee) \tag{3.3}$$

under the bilinear form $\langle\cdot,\cdot\rangle_A$.

Some further identities include $\beta(a \cdot v) = a^\# \cdot \beta(v)$ in $L^\vee$ as well as the formula for the cubic form on $L$, $d(v) = \langle v, \beta(v)\rangle_A$ (see [KMRT, pp. 522–523]). The right-hand side miraculously takes values in the subring $\mathbb{Z}$ of $A$. In particular, for $a \in A$, $d(a \cdot v) = \mathrm{N}a \cdot d(v)$.

Even though the quotient $L^\vee/L$ has the structure of a finite $A$-module, the isomorphism $\gamma : A^\vee/A \simeq L^\vee/L$ of finite abelian groups in Proposition 2.6 is not an $A$-module homomorphism, as the action of $A$ on $A^\vee + L^\vee$ does not stabilize the sublattice $J$.

4. A Hilbert modular form

Let $A$ be a totally real cubic ring. An element of $A \otimes \mathbb{R} \simeq \mathbb{R}^3$ is said to be totally positive if each of its three coordinates in $\mathbb{R}^3$ is nonnegative. We denote by $(A \otimes \mathbb{R})_+$ the self-dual cone of such elements.

Fix an embedding $f : A \to J$. Since $(A \otimes \mathbb{R})_+ = (A \otimes \mathbb{R})^2$ and $(J \otimes \mathbb{R})^2$ is the cone of positive-semidefinite matrices $B$ in $J \otimes \mathbb{R}$, $f$ maps totally positive $\alpha$ in $A$ to positive-semidefinite $B = f(\alpha)$ in $J$. Conversely, if $B \ge 0$ in $J$, then $\alpha = t_A(B)$ lies in $A^\vee_+$. To verify this it suffices to check that $\operatorname{Tr}(\alpha\alpha') \ge 0$ for all $\alpha' \in A_+$. But
$$\operatorname{Tr}(\alpha\alpha') = \langle t_A(B), f(\alpha')\rangle = \langle B, f(\alpha')\rangle \ge 0$$


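as $f(\alpha') \ge 0$ in $J$. Let $H$ be the upper half-plane.

PROPOSITION 4.1
The holomorphic function $F : H^3 \to \mathbb{C}$, defined by the convergent Fourier expansion
$$F(\tau) = F(\tau_1, \tau_2, \tau_3) = \sum_{\alpha \in A^\vee_+} c(\alpha)\, e^{2\pi i(\alpha_1\tau_1 + \alpha_2\tau_2 + \alpha_3\tau_3)}$$
with $c(0) = 1$ and
$$c(\alpha) = 240 \sum_{\substack{S \in J \\ \operatorname{rank}(S) = 1 \\ t_A(S) = \alpha}} \Bigl(\sum_{d \mid c(S)} d^3\Bigr), \tag{4.2}$$
is a Hilbert modular form of weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$. That is,
$$F\Bigl(\frac{a\tau + b}{c\tau + d}\Bigr) = \mathrm{N}(c\tau + d)^4 F(\tau) \quad \text{for all } \begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ in } \mathrm{SL}_2(A). \tag{4.3}$$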

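In the proposition, $c(S)$ is the largest positive integer dividing $S$ in $J$. If $\alpha$ is primitive in $A^\vee$, then $c(S) = 1$ and $c(\alpha) = 240 \cdot \#\{S : \operatorname{rank} S = 1,\ t_A(S) = \alpha\}$. Writing
$$S = \alpha + v \quad \text{in } J \subset A^\vee + L^\vee$$
with $\alpha = t_A(S)$, we find
$$S^\# = (\alpha^\# - q_A(v)) + (\beta(v) - \alpha \cdot v).$$
The condition that $\operatorname{rank}(S) = 1$ is equivalent to the fact that $S^\# = 0$ (see [EG, (1.11)]), so
$$\begin{cases} q_A(v) = \alpha^\# & \text{in } A^\vee \otimes A^\vee, \\ \beta(v) = \alpha \cdot v & \text{in } A^\vee \otimes L^\vee. \end{cases} \tag{4.4}$$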

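To prove the proposition, we begin with a description of H. Kim's singular form $F(Z)$ on the exceptional tube domain
$$\mathcal{D} = \{Z = X + iY, \text{ with } X \in J \otimes \mathbb{R} \text{ and } Y \in (J \otimes \mathbb{R})_+\}.$$
This is a holomorphic function $F : \mathcal{D} \to \mathbb{C}$ which satisfies
$$F(Z + B) = F(Z) \quad \text{for all } B \in J,$$
$$F(gZ) = F(Z) \quad \text{for all } g \in \operatorname{Aut}(J, d),$$
$$F(-Z^\#/d(Z)) = d(Z)^4 F(Z).$$
It has Fourier expansion
$$F(Z) = 1 + 240 \sum_{\substack{S \ge 0 \\ \operatorname{rank}(S) = 1}} \Bigl(\sum_{d \mid c(S)} d^3\Bigr) e^{2\pi i\langle S, Z\rangle}. \tag{4.5}$$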

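These facts were established by Kim [K, p. 146], using the identity $I$ to polarize $J$; in fact, any polarization $E$ determines an isomorphic discrete subgroup of automorphisms of the exceptional domain. The form $F(\tau)$ is simply the restriction of $F(Z)$ to the sub-tube-domain
$$H^3 = (A \otimes \mathbb{R}) + i(A \otimes \mathbb{R})_+ \overset{f}{\hookrightarrow} (J \otimes \mathbb{R}) + i(J \otimes \mathbb{R})_+ = \mathcal{D}.$$
This satisfies
$$F(\tau + b) = F(\tau) \quad \text{for all } b \in A,$$
$$F(\alpha^2 \cdot \tau) = F(\tau) \quad \text{for all } \alpha \in A^*,$$
$$F(-1/\tau) = (\mathrm{N}\tau)^4 F(\tau).$$
The matrices
$$\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} \alpha & 0 \\ 0 & \alpha^{-1} \end{pmatrix}, \quad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
generate $\mathrm{SL}_2(A)$, so $F(\tau)$ has weight $(4, 4, 4)$ for this discrete group acting on $H^3$. Its Fourier expansion is given by
$$F(\tau) = 1 + 240 \sum_{\substack{S \ge 0 \\ \operatorname{rank}(S) = 1}} \Bigl(\sum_{d \mid c(S)} d^3\Bigr) e^{2\pi i \operatorname{Tr}(t_A(S) \cdot \tau)} = 1 + 240 \sum_{\substack{\alpha \in A^\vee_+ \\ \alpha \ne 0}} \Bigl(\sum_{\substack{\operatorname{rank}(S) = 1 \\ t_A(S) = \alpha}} \ \sum_{d \mid c(S)} d^3\Bigr) e^{2\pi i \operatorname{Tr}(\alpha \cdot \tau)},$$
as claimed.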


5. The case $D = p^2$

In this section we consider the case when the cubic ring $A$ is maximal and has discriminant $D = p^2$, with $p$ a prime. Then $p \equiv 1 \pmod 3$, and $A$ is the ring of integers in the cubic subfield of the $p$th cyclotomic field. Thus $p$ is tamely ramified in $A$ and lies under a unique prime $\mathfrak{p}$ of $A$; we have
$$\begin{cases} pA = \mathfrak{p}^3, \\ A^\vee = \mathfrak{p}^{-2}A, \\ A^\vee/A \simeq (\mathbb{Z}/p\mathbb{Z})^2. \end{cases} \tag{5.1}$$
The quadratic space $A^\vee/A$ with form $p \cdot \operatorname{Tr}(xy) \pmod p$ is split over $\mathbb{Z}/p\mathbb{Z}$, with one of its isotropic lines the sub-$A$-module
$$\mathcal{N} = \mathfrak{p}^{-1}A/A. \tag{5.2}$$
Let $\mathcal{N}'$ be the other isotropic line in $A^\vee/A$, and let $N$ and $N'$ be the corresponding unimodular lattices contained in $A^\vee$. Since $\operatorname{rank}(N) = \operatorname{rank}(N') = 3$, both are isomorphic to the lattice $\mathbb{Z}^3$ (see [MH, p. 19]). Thus $N$ has six short vectors $\pm e_1, \pm e_2, \pm e_3$ which satisfy $\langle e_i, e_j\rangle = \operatorname{Tr}(e_i e_j) = \delta_{ij}$, and likewise $N'$ has six short vectors $\pm e_1', \pm e_2', \pm e_3'$ with $\operatorname{Tr}(e_i' e_j') = \delta_{ij}$. Both $N$ and $N'$ contain the element 1 of $A$, with $\langle 1, 1\rangle = 3$. We may normalize the signs of the $e_i$ so that $e_1 + e_2 + e_3 = 1$, as the only $v$ with $\langle v, v\rangle = 3$ have the form $v = \pm e_1 \pm e_2 \pm e_3$.

We next determine the cubic equations satisfied by the $e_i$, $e_i'$, and in the process we obtain a novel proof of the existence of integers $s, t$ such that $4p = s^2 + 27t^2$. Let $\sigma$ generate the cyclic group (of order 3) of automorphisms of $A$. Then $\sigma(A^\vee) = A^\vee$, $\sigma(N) = N$, and $\sigma(N') = N'$. Since $\langle e_i^\sigma, e_i^\sigma\rangle = 1$, we see that $e_i^\sigma$ is a short vector not equal to $\pm e_i$. Since $e_1^\sigma + e_2^\sigma + e_3^\sigma = 1^\sigma = 1$, $\sigma$ cyclically permutes the set $\{e_1, e_2, e_3\}$. Hence $\operatorname{Tr}(e_i) = 1$. Since $\operatorname{Tr}(e_i^2) = 1$, the $e_i$ are the three roots of a cubic equation of the form
$$f_m(x) = x^3 - x^2 - m = 0 \tag{5.3}$$

with $m = \mathrm{N}(e_i)$. The same argument shows that the $e_i'$ are roots of a cubic equation $f_{m'}(x) = 0$ with $m' = \mathrm{N}(e_i')$. These norms $m, m'$ are rational numbers of $p$-valuation $-1$ and $-2$, respectively.

Now the discriminant $d_\mu$ of $f_\mu(x)$ is given by
$$d_\mu = -4\mu - 27\mu^2 = -\mu(4 + 27\mu).$$
Since $A$ has square discriminant, $d_\mu$ must be a rational square for both $\mu = m$ and $\mu = m'$. Thus if we write $m = -m_1/p$, $m' = -m_2/p^2$, then both $m_1(4p - 27m_1)$ and $m_2(4p^2 - 27m_2)$ are squares. Now $\gcd(m_i, 4p^i - 27m_i) \mid 4$ for $i = 1, 2$ since the $m_i$ are integers not divisible by $p$. Thus each $m_i$ must be either a square or twice a square. But the latter is impossible because then $4p^i - 27m_i$ would also be twice a square, yet $4p^i - 27m_i \equiv 1 \bmod 3$. Therefore each $m_i$ is a square. Writing $m_1 = t^2$, we find that $4p - 27t^2 = s^2$ for some integer $s$, and we have thus solved $4p = s^2 + 27t^2$; hence
$$m = -\frac{t^2}{p}, \qquad d_m = \frac{s^2t^2}{p^2}. \tag{5.4}$$
Next, if $m_2 = t'^2$, then $4p^2 - 27t'^2 = s'^2$ for some integer $s'$. Having solved $4p = s^2 + 27t^2$, we have represented $p$ as a norm of the algebraic integer $(s + t\sqrt{-27})/2$ in the quadratic imaginary ring $\mathbb{Z} + \mathbb{Z}(1 + \sqrt{-27})/2$; squaring, we obtain the representation of $p^2$ and conclude that $t' = st$ and $s' = (s^2 - 27t^2)/2$, so
$$m' = -\frac{s^2t^2}{p^2}, \qquad d_{m'} = \frac{s^2t^2\bigl((s^2 - 27t^2)/2\bigr)^2}{p^4}. \tag{5.5}$$
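The arithmetic behind (5.4) and (5.5) can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `represent_4p`, `cubic_disc` and `check` are hypothetical, and the search for $(s, t)$ is a brute-force loop rather than anything the text prescribes. It only verifies the identities for the discriminant of $x^3 - x^2 - \mu$ and the squaring step $t' = st$, $s' = |s^2 - 27t^2|/2$.

```python
from fractions import Fraction
from math import isqrt


def represent_4p(p):
    """Return positive integers (s, t) with 4p = s^2 + 27 t^2, or None."""
    t = 1
    while 27 * t * t < 4 * p:
        rest = 4 * p - 27 * t * t
        s = isqrt(rest)
        if s * s == rest:
            return s, t
        t += 1
    return None


def cubic_disc(mu):
    """Discriminant of x^3 - x^2 - mu, namely -mu(4 + 27 mu)."""
    return -mu * (4 + 27 * mu)


def check(p):
    """Verify the identities (5.4) and (5.5) for a prime p = 1 mod 3."""
    s, t = represent_4p(p)
    m = Fraction(-t * t, p)
    assert cubic_disc(m) == Fraction(s * t, p) ** 2          # (5.4)
    # Squaring (s + t sqrt(-27))/2 gives the representation of p^2.
    t2 = s * t
    s2 = abs(s * s - 27 * t * t) // 2
    assert 4 * p * p == s2 * s2 + 27 * t2 * t2
    m_prime = Fraction(-t2 * t2, p * p)
    assert cubic_disc(m_prime) == Fraction(s2 * t2, p * p) ** 2  # (5.5)
    return s, t, s2, t2


for prime in (7, 13, 19, 31, 37):
    print(prime, check(prime))
```

For $p = 7$ this gives $s = t = 1$, which is the case used in §6.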

Fix an embedding $f : A \to J$, and let $L = f(A)^\perp$ as in §2. By Proposition 2.6, the map $\gamma : A^\vee/A \simeq L^\vee/L$ identifies the isotropic lines in these rank-2 quadratic spaces over $\mathbb{Z}/p\mathbb{Z}$. We let $M/L$ be the line corresponding to $N/A$, and we let $M'/L$ be the line corresponding to $N'/A$. Then $M$ and $M'$ are two even, integral, unimodular lattices of rank 24 which lie between $L$ and $L^\vee$.

The abelian group $L^\vee/L$ has the structure of a finite $A$-module, by the results of §3.

PROPOSITION 5.6
The $A$-module $L^\vee/L$ is cyclic and is isomorphic to $A/\mathfrak{p}^2$. It has a unique nontrivial $A$-submodule $\mathfrak{p}(L^\vee/L)$, which is equal to the isotropic line $M/L$. The quadratic form $q_A$ on $M$ takes values in $A^\vee$, and the $A$-bilinear map $\langle\cdot,\cdot\rangle_A : M \times M \to A^\vee$ identifies the $A$-module $M$ with $\operatorname{Hom}_A(M, A^\vee)$.

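Proof
Since $L^\vee/L$ has order $p^2$, it is isomorphic to either the cyclic $A$-module $A/\mathfrak{p}^2$ or the $A$-module $(A/\mathfrak{p})^2$. In the latter case, $\mathfrak{p}L^\vee \subset L$, which we use to derive a contradiction. Indeed, $\langle L, L^\vee\rangle_A \subset A^\vee$, so if $\mathfrak{p}L^\vee \subset L$, we have $\langle L^\vee, L^\vee\rangle_A \subset \mathfrak{p}^{-1}A^\vee$. Since $p > 2$, this means that for any $y \in L^\vee$ we would have
$$\operatorname{ord}_{\mathfrak{p}} q_A(y) = \operatorname{ord}_{\mathfrak{p}} \tfrac{1}{2}\langle y, y\rangle_A \ge -3.$$
On the other hand, take $a$ in $A^\vee$ with $\operatorname{ord}_{\mathfrak{p}}(a) = -2$, and find $y$ in $L^\vee$ such that $v = a + y$ is in $J$. Then $v^\# = (a^\# - q_A(y)) + (\beta(y) - a \cdot y)$ is also in $J$, so it has first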


component $a^\# - q_A(y)$ in $A^\vee$. Since $\operatorname{ord}_{\mathfrak{p}}(a^\#) = -4$, this forces $\operatorname{ord}_{\mathfrak{p}}(q_A(y)) = -4$, a contradiction.

Hence $L^\vee/L$ is cyclic, and its unique $A$-submodule is $\mathfrak{p}(L^\vee/L)$. We show that this submodule is isotropic for the quadratic form $p \cdot q(y) : L^\vee/L \to \mathbb{Z}/p\mathbb{Z}$. Let $\pi$ be a uniformizing element at $\mathfrak{p}$ in $A$, and write a basis of $\mathfrak{p}(L^\vee/L)$ as $e = \pi \cdot \lambda$, with $\lambda \in L^\vee$. Then
$$p \cdot q(e) = p \cdot \operatorname{Tr} q_A(e) = p \cdot \operatorname{Tr}(\pi^2 q_A(\lambda)).$$
It suffices to show that $\pi^2 q_A(\lambda)$ lies in $A^\vee$. Since $L^\vee = \operatorname{Hom}_A(L, A^\vee)$ and $\mathfrak{p}^2(L^\vee) \subset L$, the quadratic form $q_A$ takes elements of $L^\vee$ to elements in $\mathfrak{p}^{-2}A^\vee = (A^\vee)^{\otimes 2}$. Hence $\pi^2 q_A(\lambda) \in A^\vee$.

Since $\mathfrak{p}(L^\vee/L)$ is isotropic, it is equal to the line $M/L$ or the line $M'/L$ in $L^\vee/L$. If it is equal to $M/L$, then $M$ is an $A$-module, $q_A : M \to A^\vee$, and $\langle\cdot,\cdot\rangle_A$ identifies $M$ with the $A$-module $\operatorname{Hom}_A(M, A^\vee)$. If not, $\mathfrak{p}(L^\vee/L) = M'/L$ and $q_A : M' \to A^\vee$. From this we derive a contradiction. Indeed, $M'$ corresponds to the unimodular lattice $A \subset N' \subset A^\vee$, and if $a \in N' - A$, then $\operatorname{ord}_{\mathfrak{p}}(a) = -2$ and $\operatorname{ord}_{\mathfrak{p}}(a^\#) = -4$. By the definition of $M'$, we may find an element $m'$ in $M'$ with $a + m'$ in $J$. Then $(a + m')^\# = (a^\# - q_A(m')) + (\beta(m') - a \cdot m')$ also lies in $J$. Hence its first component $a^\# - q_A(m')$ lies in $A^\vee = \mathfrak{p}^{-2}A$. This contradicts the fact that $\operatorname{ord}_{\mathfrak{p}}(a^\#) = -4$ and $\operatorname{ord}_{\mathfrak{p}}(q_A(m')) \ge -2$.

Note 5.7
The results in this section extend, with minor modifications, to every cubic $A$ of square discriminant. There is always a canonical $A$-module $M$ with $L \subset M \subset L^\vee$ which is unimodular for $\langle\cdot,\cdot\rangle$ and on which $\langle\cdot,\cdot\rangle_A$ takes values in $A^\vee$.

6. The case $D = 49$

We now study the case when $A$ is the Dedekind domain $\mathbb{Z}[2\cos(2\pi/7)] = \mathbb{Z}[\alpha]/(\alpha^3 + \alpha^2 - 2\alpha - 1)$ of discriminant $D = 49$. In this case there are $2^9 \cdot 3^4 \cdot 13$ embeddings $f : A \to J$, all conjugate under the finite group $\operatorname{Aut}(J, E) = {}^3D_4(2).3$ of order $2^{12} \cdot 3^5 \cdot 7^2 \cdot 13$ (see [GG, §8]). The stabilizer of a fixed embedding is the subgroup $7^2 : 2A_4$ of order $2^3 \cdot 3 \cdot 7^2$, and the normalizer of this subgroup is the maximal subgroup $7^2 : 2A_4 \times 3$. The quotient is the cyclic group $\operatorname{Aut}(A)$ of order 3. In particular, Galois conjugate embeddings are conjugate under $\operatorname{Aut}(J, E)$.
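The count of embeddings quoted above is the index of the stabilizer in $\operatorname{Aut}(J, E)$, which is easy to confirm by direct arithmetic. The sketch below only checks the numbers stated in the text; the variable names are illustrative.

```python
# Orders quoted in the text for Aut(J, E) = 3D4(2).3 and the stabilizer 7^2 : 2A_4.
aut_order = 2**12 * 3**5 * 7**2 * 13
stabilizer_order = 2**3 * 3 * 7**2

# One orbit of embeddings, so their number is the index of the stabilizer.
embeddings = aut_order // stabilizer_order

# The normalizer 7^2 : 2A_4 x 3 contains the stabilizer with index |Aut(A)| = 3.
normalizer_order = stabilizer_order * 3

print(embeddings, normalizer_order)
```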


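We may specify one embedding by taking
$$f(\alpha) = \begin{pmatrix} -1 & 1 & -\bar\beta \\ 1 & -1 & -\beta \\ -\beta & -\bar\beta & -1 \end{pmatrix}, \tag{6.1}$$
where we recall that
$$f(1) = E = \begin{pmatrix} 2 & \beta & \bar\beta \\ \bar\beta & 2 & \beta \\ \beta & \bar\beta & 2 \end{pmatrix},$$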

and $\beta^2 + \beta + 2 = 0$. The image $f(A)$ consists of the $\mathbb{Z}$-module of Hermitian matrices
$$\begin{pmatrix} f + p + r & p - r + p\beta & f - r - r\beta \\ p - r + p\bar\beta & f + p + r & f + r\beta \\ f - r - r\bar\beta & f + r\bar\beta & f + p + r \end{pmatrix}$$
with $f, p, r$ all integers. The element $E$ corresponds to the triple $(0, 1, 1)$, and the element $f(\alpha)$ corresponds to the triple $(0, 0, -1)$. The trace form is $4f + 2p + r$, so $f(A^\vee)$ consists of matrices with $f, p, r$ in $(1/7)\mathbb{Z}$ and $4f + 2p + r$ in $\mathbb{Z}$.

Since the trace form $\operatorname{Tr}(x^2)$ on $A$ takes the values $0, 3, 5, 6, \ldots$, the six nontrivial cosets of $N/A = \mathfrak{p}^{-1}A/A$ are represented uniquely by the six short vectors $n$ in $N$ with $\operatorname{Tr}(n^2) = 1$. Let $M$ (with $L \subset M \subset L^\vee$) be the even unimodular lattice of rank 24 corresponding to the lattice $N$. If $\lambda \in M$ is a root, we may find a unique short vector $n \in N$ which satisfies $\operatorname{Tr}(n) = 1$ and a unique choice of sign $\pm$ such that $S = E \pm \lambda - n$ is a Jordan root in $J$. Indeed, choose $n$ and the sign uniquely so that the sum
$$v = n \pm \lambda \quad \text{in } A^\vee + L^\vee \tag{6.2}$$
lies in $J$. Since $\langle v, v\rangle = \langle n, n\rangle + \langle\lambda, \lambda\rangle = 1 + 2 = 3$ and $T(v) = \operatorname{Tr}(n) = 1$, we have $v = E - S$ for a Jordan root $S$ by Proposition 1.4. Consequently, we have shown the following proposition.

PROPOSITION 6.3
The number of roots $\lambda$ in the Niemeier lattice $M$ is equal to twice the number of Jordan roots $S$ in $J$ which satisfy $t_A(S) = 1 - n$ in $A^\vee_+$, where $n$ is any short vector in $N$ with $\operatorname{Tr}(n) = 1$.


Since the three short vectors in $N$ with $\operatorname{Tr}(n) = 1$ are Galois conjugate, and Galois conjugate embeddings of $A$ are conjugate by $\operatorname{Aut}(J, E)$, the number of $S$ with $t_A(S) = 1 - n$ is equal to the number of $S$ with $t_A(S) = 1 - n^\sigma$. Hence we obtain the following corollary.

COROLLARY 6.4
Fix a short vector $n$ in $N$ with $\operatorname{Tr}(n) = 1$, and let $a = 1 - n$ be the corresponding (totally positive) element in $A^\vee_+$. Then
$$\#\{\text{roots } \lambda \text{ in } M\} = 6 \cdot \#\{\text{Jordan roots } S \text{ in } J \text{ with } t_A(S) = a\}.$$

A similar argument works for the lattices $M'$ and $N'$. Fix a short vector $n'$ in $N'$ with $\operatorname{Tr}(n') = 1$, and let $a' = 1 - n'$. Then
$$\#\{\text{roots } \lambda' \text{ in } M'\} = 6 \cdot \#\{\text{Jordan roots } S \text{ with } t_A(S) = a'\}. \tag{6.5}$$

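We can calculate these numbers by a determination of the Hilbert modular form $F(\tau)$. This has weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$, and since the Galois conjugate embeddings are conjugate, it is invariant under the action of $\operatorname{Aut}(A)$. One can show, using the trace formula, that the space of such forms is 2-dimensional and is spanned by the forms $E_2^2$ and $E_4$. Here $E_k$ is the weight-$(k, k, k)$ Eisenstein series studied by Siegel, with the Fourier expansion
$$E_k = \frac{1}{2^3}\zeta_A(1 - k) + \sum_{a > 0 \text{ in } A^\vee} \Bigl(\sum_{\mathfrak{c} \mid (a)\mathfrak{p}^2} (\mathrm{N}\mathfrak{c})^{k-1}\Bigr) q^a \tag{6.6}$$
(see [vdG, pp. 19–20]). From the values
$$\zeta_A(-1) = -\frac{1}{3 \cdot 7}, \qquad \zeta_A(-3) = \frac{79}{2 \cdot 3 \cdot 5 \cdot 7},$$
we find
$$E_2 = -\frac{1}{2^3 \cdot 3 \cdot 7} + \sum_{a > 0 \text{ in } A^\vee} \Bigl(\sum_{\mathfrak{c} \mid (a)\mathfrak{p}^2} \mathrm{N}\mathfrak{c}\Bigr) q^a,$$
$$E_4 = \frac{79}{2^4 \cdot 3 \cdot 5 \cdot 7} + \sum_{a > 0 \text{ in } A^\vee} \Bigl(\sum_{\mathfrak{c} \mid (a)\mathfrak{p}^2} (\mathrm{N}\mathfrak{c})^3\Bigr) q^a.$$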

There is a unique $\operatorname{Aut}(A)$ orbit of elements $a > 0$ in $A^\vee$ with $\operatorname{Tr}(a) = 1$, represented by the squares $n^2$ of short vectors in $N$. Since the space of modular forms is 2-dimensional, there is a unique form $F(\tau)$ with constant Fourier coefficient $c(0) = 1$ and coefficient $c(n^2) = 0$. Some calculation shows that this is the linear combination
$$F(\tau) = 2^4 \cdot 3 \cdot 5 \cdot 7\, E_2(\tau)^2 + 2^2 \cdot 5\, E_4(\tau). \tag{6.7}$$
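The two normalizing conditions $c(0) = 1$ and $c(n^2) = 0$ can be checked against (6.7) with exact rational arithmetic. The sketch below rests on two assumptions that are spelled out in the comments: the ideal $(n^2)\mathfrak{p}^2$ is the unit ideal (its norm is $49 \cdot \mathrm{N}(n)^2 = 1$), so the $q^{n^2}$ coefficient of each Eisenstein series is 1, and $n^2$ is not a sum of two nonzero totally positive elements of $A^\vee$ (each would have trace at least 1), so the $q^{n^2}$ coefficient of $E_2^2$ comes only from the cross term with the constant.

```python
from fractions import Fraction

# Zeta values quoted in the text.
zeta_minus1 = Fraction(-1, 3 * 7)
zeta_minus3 = Fraction(79, 2 * 3 * 5 * 7)

# Constant terms of E_2 and E_4 from (6.6): zeta_A(1 - k) / 2^3.
e2_const = zeta_minus1 / 2**3
e4_const = zeta_minus3 / 2**3

# Assumption: (n^2) p^2 = (1), so the only divisor is the unit ideal.
e2_at_n2 = 1
e4_at_n2 = 1

# Coefficients of the linear combination (6.7).
coeff_e2_squared = 2**4 * 3 * 5 * 7
coeff_e4 = 2**2 * 5

# Constant term of F.
c_zero = coeff_e2_squared * e2_const**2 + coeff_e4 * e4_const

# q^{n^2} coefficient of F; in E_2^2 only 2 * const * coeff contributes
# (assumption: n^2 has trace 1 and is not a sum of two positive elements).
c_n2 = coeff_e2_squared * (2 * e2_const * e2_at_n2) + coeff_e4 * e4_at_n2

print(c_zero, c_n2)
```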

There are five orbits of $\operatorname{Aut}(A)$ on elements $a > 0$ in $A^\vee$ with $\operatorname{Tr}(a) = 2$, and we tabulate the Fourier coefficients $c(a)$ of $F(\tau)$ on these orbits (see Table 1). As before, $n$ is a short vector in $N$ with $\operatorname{Tr}(n) = 1$, and $n'$ is a short vector in $N'$ with $\operatorname{Tr}(n') = 1$. We have $n^3 - n^2 + (1/7) = 0$, and $(n')^3 - (n')^2 + (1/49) = 0$ by (5.4) and (5.5). Indeed, for $p = 7$, $s^2 = t^2 = 1$.

Table 1

$a > 0$ in $A^\vee$, $\operatorname{Tr}(a) = 2$     $(a)\mathfrak{p}^2$          $c(a)$ of $F(\tau)$
$2 \cdot n^2$                     $(2)$                  $240 \cdot 49$
$1 - n$                           $\mathfrak{p}$                    $240 \cdot 28$
$1 - n'$                          $1$                    $0$
$1 - n^2$                         a prime of norm 13     $240 \cdot 196$
$1 - 2n + n^2$                    $1$                    $0$
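The middle column of Table 1 can be cross-checked through norms alone. Since $n$ is a root of $f_m(x) = x^3 - x^2 - m$ with $m = -1/7$ (and $n'$ of the same cubic with $m' = -1/49$), the norm of $c - n$ for a rational $c$ is $f_m(c)$, and the norm of the ideal $(a)\mathfrak{p}^2$ is $49 \cdot |\mathrm{N}(a)|$. The following sketch, with hypothetical helper names, computes these ideal norms; it confirms the norms 8, 7, 1, 13, 1 but does not by itself identify the ideals.

```python
from fractions import Fraction


def cubic(m, x):
    """Evaluate f_m(x) = x^3 - x^2 - m, the cubic (5.3)."""
    return x**3 - x**2 - m


def norm_shift(m, c):
    """N(c - n) for a root n of f_m: the product of (c - n_i) is f_m(c)."""
    return cubic(m, Fraction(c))


m = Fraction(-1, 7)         # N(n) for p = 7
m_prime = Fraction(-1, 49)  # N(n') for p = 7

norm_n = m
norm_1_minus_n = norm_shift(m, 1)
norm_1_plus_n = -norm_shift(m, -1)   # (1 + n_i) = -(-1 - n_i), three factors
norm_1_minus_n_prime = norm_shift(m_prime, 1)

NORM_P2 = 49  # norm of the ideal p^2

ideal_norms = {
    "2*n^2": NORM_P2 * abs(8 * norm_n**2),
    "1-n": NORM_P2 * abs(norm_1_minus_n),
    "1-n'": NORM_P2 * abs(norm_1_minus_n_prime),
    "1-n^2": NORM_P2 * abs(norm_1_minus_n * norm_1_plus_n),
    "1-2n+n^2": NORM_P2 * abs(norm_1_minus_n**2),
}

for name, value in ideal_norms.items():
    print(name, value)
```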

The form $F(\tau)$ we have determined is the one that appears in §4, as that satisfies $c(0) = 1$, $c(n^2) = 0$. Hence, for $\operatorname{Tr}(a) = 2$ we have
$$c(a) = 240 \cdot \#\{S = \text{Jordan roots of } J \text{ with } t_A(S) = a\}.$$
In particular, this shows that the lattice $M$ has $6 \cdot 28 = 168$ roots and that the lattice $M'$ has $6 \cdot 0 = 0$ roots. Hence, as Niemeier lattices,
$$\begin{cases} M \simeq A_6^4, \\ M' \simeq \text{Leech lattice}, \end{cases} \tag{6.8}$$
as claimed in [GG, §8]. Indeed, $A_6^4$ is the unique Niemeier lattice with 168 roots (and Coxeter number $h = 7$), and the Leech lattice is the unique Niemeier lattice with no roots (see [N], [V]).

7. The case $D = 16$

We treat another case, when $A$ is the subring of index 4 in $\mathbb{Z}^3$ consisting of the triples $(a, b, c)$ with $a \equiv b \equiv c \pmod 2$. This ring has discriminant $D = 16$ and admits an


embedding $f : A \to J$ which is unique up to conjugacy by $\operatorname{Aut}(J, d, E)$. Indeed, an embedding of $A$ is given by the images
$$\begin{cases} f(2, 0, 0) = S_1, \\ f(0, 2, 0) = S_2, \\ f(0, 0, 2) = S_3, \end{cases} \tag{7.1}$$
which satisfy $S_i^2 = 2S_i$, $S_iS_j = 0$, $S_1 + S_2 + S_3 = 2E$. Thus $(S_1, S_2, S_3)$ forms a root triple in the sense of [EG, §3], and by [EG, Prop. 7.8], the group $\operatorname{Aut}(J, d, E) = {}^3D_4(2).3$ of order $2^{12} \cdot 3^5 \cdot 7^2 \cdot 13$ acts transitively on root triples, with fixer the subgroup $2^{2+3+6}.7.3$. Hence there are $2 \cdot 3^4 \cdot 7 \cdot 13 = 14742$ distinct embeddings $f : A \to J$.

Fix an embedding, and let $L = A^\perp$. Then $L^\vee/L \simeq A^\vee/A$. Since $A^\vee$ is the subgroup of $((1/2)\mathbb{Z})^3$ consisting of triples $(a, b, c)$ with $a + b + c$ in $\mathbb{Z}$, we find $A^\vee/A \simeq (\mathbb{Z}/4\mathbb{Z})^2$. The unimodular lattice $\mathbb{Z}^3$, $A \subset \mathbb{Z}^3 \subset A^\vee$, corresponds to the subgroup $(2\mathbb{Z}/4\mathbb{Z})^2$ killed by 2, and in turn it yields a Niemeier lattice $M$ between $L$ and $L^\vee$. We show that $M$ is isomorphic to the Niemeier lattice whose root system is $A_1^{24}$ by showing that $M$ contains precisely 48 roots.

To do this, we first determine the modular form $F(\tau)$ of weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$, defined in §4. Let $\Gamma(2) \subset \mathrm{SL}_2(\mathbb{Z})$ be the normal subgroup of integral matrices that reduce to the identity mod 2. Then $\mathrm{SL}_2(A) \supset \Gamma(2)^3$, so $F(\tau)$ has weight $(4, 4, 4)$ for $\Gamma(2)^3$ and enjoys some additional invariance properties.

Let $W$ be the complex vector space of holomorphic modular forms of weight 4 for $\Gamma(2)$. This has dimension 3 and is spanned by the Eisenstein series at the three cusps
$$f_1 = \frac{1}{16} - q + 7q^2 + \cdots,$$
$$f_2 = -q^{1/2} + 8q - 28q^{3/2} + 64q^2 + \cdots,$$
$$f_3 = q^{1/2} + 8q + 28q^{3/2} + 64q^2 + \cdots$$
(see [R, pp. 232–235]). Here $q^{1/2} = e^{\pi i\tau}$. The form $f_1$ is a power series in $q = e^{2\pi i\tau}$,



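as is $f_2 + f_3$. The general Fourier coefficient $a_n$ of $q^{n/2}$ is given by
$$a_n = \sum_{\substack{d \mid n \\ n/d \text{ even}}} (-1)^d d^3 \quad \text{for } f_1,$$
$$a_n = \sum_{\substack{d \mid n \\ n/d \text{ odd}}} (-1)^d d^3 \quad \text{for } f_2,$$
$$a_n = \sum_{\substack{d \mid n \\ n/d \text{ odd}}} d^3 \quad \text{for } f_3$$
(see [R, Th. 7.3.1]). The group $\mathrm{SL}_2(\mathbb{Z})/\Gamma(2) \simeq S_3$ acts on this space by permuting the forms $f_1, f_2, f_3$. The unique invariant is the sum
$$f_1 + f_2 + f_3 = \frac{1}{16} + 15q + 135q^2 + \cdots = \frac{1}{16}E_4 \quad \text{of weight 4 for } \mathrm{SL}_2(\mathbb{Z}).$$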
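The divisor-sum formulas for the coefficients of $f_1, f_2, f_3$ can be evaluated directly, and their sum compared with $\frac{1}{16}E_4 = \frac{1}{16}\bigl(1 + 240\sum_{k \ge 1}\sigma_3(k)q^k\bigr)$. The sketch below is illustrative only; the function names are not from the text, and the index $n$ always refers to the coefficient of $q^{n/2}$ with $n \ge 1$ (the constant terms $1/16, 0, 0$ are handled separately in the text).

```python
def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def a_f1(n):
    """Coefficient of q^(n/2) in f_1, for n >= 1."""
    return sum((-1) ** d * d**3 for d in divisors(n) if (n // d) % 2 == 0)


def a_f2(n):
    """Coefficient of q^(n/2) in f_2, for n >= 1."""
    return sum((-1) ** d * d**3 for d in divisors(n) if (n // d) % 2 == 1)


def a_f3(n):
    """Coefficient of q^(n/2) in f_3, for n >= 1."""
    return sum(d**3 for d in divisors(n) if (n // d) % 2 == 1)


def sigma3(k):
    """Sum of the cubes of the divisors of k."""
    return sum(d**3 for d in divisors(k))


# Leading coefficients, as listed in the expansions of f_1, f_2, f_3.
print([a_f1(n) for n in range(1, 5)])
print([a_f2(n) for n in range(1, 5)])
print([a_f3(n) for n in range(1, 5)])

# The sum has no half-integral powers and equals E_4 / 16 term by term.
for k in range(1, 21):
    assert a_f1(2 * k - 1) + a_f2(2 * k - 1) + a_f3(2 * k - 1) == 0
    assert a_f1(2 * k) + a_f2(2 * k) + a_f3(2 * k) == 15 * sigma3(k)
```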

The space of forms of weight $(4, 4, 4)$ for $\Gamma(2)^3$ is isomorphic to $W \otimes W \otimes W$. This has dimension 27 and basis $f_i \otimes f_j \otimes f_k$. The group $\mathrm{SL}_2(\mathbb{Z})/\Gamma(2)$ acts diagonally and has invariant subspace of dimension 5. A basis for the invariants is given by

$g_1 = f_1 \otimes f_1 \otimes f_1 + f_2 \otimes f_2 \otimes f_2 + f_3 \otimes f_3 \otimes f_3$,
$g_2 = f_1 \otimes f_1 \otimes f_2 + f_1 \otimes f_1 \otimes f_3 + f_2 \otimes f_2 \otimes f_1 + f_2 \otimes f_2 \otimes f_3 + f_3 \otimes f_3 \otimes f_1 + f_3 \otimes f_3 \otimes f_2$,
$g_3 = f_1 \otimes f_2 \otimes f_1 + f_1 \otimes f_3 \otimes f_1 + f_2 \otimes f_1 \otimes f_2 + f_2 \otimes f_3 \otimes f_2 + f_3 \otimes f_1 \otimes f_3 + f_3 \otimes f_2 \otimes f_3$,
$g_4 = f_2 \otimes f_1 \otimes f_1 + f_3 \otimes f_1 \otimes f_1 + f_1 \otimes f_2 \otimes f_2 + f_3 \otimes f_2 \otimes f_2 + f_1 \otimes f_3 \otimes f_3 + f_2 \otimes f_3 \otimes f_3$,
$g_5 = f_1 \otimes f_2 \otimes f_3 + f_1 \otimes f_3 \otimes f_2 + f_2 \otimes f_1 \otimes f_3 + f_2 \otimes f_3 \otimes f_1 + f_3 \otimes f_1 \otimes f_2 + f_3 \otimes f_2 \otimes f_1$.

This is precisely the space of forms of weight $(4, 4, 4)$ on $\mathrm{SL}_2(A)$, as $\mathrm{SL}_2(A)$ is the subgroup of $\mathrm{SL}_2(\mathbb{Z})^3$ consisting of triples of matrices with the same reduction mod 2.

The form $F(\tau)$ has an additional invariance property. Since the $\operatorname{Aut}(A) = S_3$ conjugate embeddings $f : A \to J$ are conjugate under $\operatorname{Aut}(J, d, E)$, the number of $S$ in $J$ of rank 1 with $t_A(S) = a$ is equal to the number with $t_A(S) = a^\sigma$, for all $\sigma \in S_3 = \operatorname{Aut}(A)$. Hence the Fourier coefficient $c(a)$ of $F(\tau)$ is equal to $c(a^\sigma)$ for all $\sigma \in S_3$ and $a \in A^\vee_+$, and
$$F(\tau_{\sigma 1}, \tau_{\sigma 2}, \tau_{\sigma 3}) = F(\tau_1, \tau_2, \tau_3).$$
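Because $S_3$ permutes the basis $f_i \otimes f_j \otimes f_k$, the dimension of an invariant subspace is the number of orbits on index triples. The sketch below counts those orbits by brute force, first for the diagonal relabeling action alone and then with the permutations of the three variables $\tau_1, \tau_2, \tau_3$ added. The function names are hypothetical, and the code is a counting check rather than a construction of the forms themselves.

```python
from itertools import permutations, product

INDICES = (1, 2, 3)
PERMS = list(permutations(INDICES))  # the six elements of S_3, as tuples


def relabel(sigma, triple):
    """Diagonal action: replace each index i by sigma(i)."""
    return tuple(sigma[i - 1] for i in triple)


def reorder(pi, triple):
    """Permute the three tensor positions (the variables tau_1, tau_2, tau_3)."""
    return tuple(triple[p - 1] for p in pi)


def count_orbits(use_reorder):
    """Number of orbits on triples (i, j, k) under the chosen group action."""
    seen = set()
    count = 0
    for triple in product(INDICES, repeat=3):
        if triple in seen:
            continue
        count += 1
        for sigma in PERMS:
            image = relabel(sigma, triple)
            if use_reorder:
                for pi in PERMS:
                    seen.add(reorder(pi, image))
            else:
                seen.add(image)
    return count


print(count_orbits(False))  # orbits under the diagonal action only
print(count_orbits(True))   # orbits when the three variables may also be permuted
```

The first count corresponds to the basis $g_1, \ldots, g_5$; the second groups $g_2, g_3, g_4$ into a single orbit.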


The subspace of forms of weight $(4, 4, 4)$ for $\mathrm{SL}_2(A)$ with this extra invariance property has dimension 3. As a basis, we may take

$h_1 = 2^{12} g_1$ with $c(0, 0, 0) = 1$,
$h_2 = 2^4 (g_2 + g_3 + g_4)$ with $c(0, 0, 0) = 0$, $c(1, 0, 0) = 1$,
$h_3 = -2^3 g_5$ with $c(0, 0, 0) = 0$, $c(1, 0, 0) = 0$, $c(1/2, 1/2, 0) = 1$.

We tabulate the coefficients of the basis elements $h_i$, at orbits of $\operatorname{Aut}(A)$ on $A^\vee_+$ with $\operatorname{Tr}(a) \le 2$ (see Table 2).

Table 2

$a =$    (0,0,0)   (1,0,0)   (1/2,1/2,0)   (2,0,0)   (1,1,0)   (1/2,3/2,0)   (1/2,1/2,1)
$h_1$    1         -16       0             112       256       0             65536
$h_2$    0         1         2             8         96        56            -288
$h_3$    0         0         1             0         -64       28            -16

The form $F(\tau)$ has Fourier coefficients
$$c(0, 0, 0) = 1, \qquad c(1, 0, 0) = 0, \qquad c\bigl(\tfrac{1}{2}, \tfrac{1}{2}, 0\bigr) = 0.$$
Hence we must have
$$F = h_1 + 16h_2 - 32h_3. \tag{7.2}$$

The coefficients of $F$ at elements $a$ in $A^\vee_+$ with $\operatorname{Tr}(a) = 2$ are
$$c(2, 0, 0) = 240, \qquad c(1, 1, 0) = 16 \cdot 240, \qquad c\bigl(\tfrac{1}{2}, \tfrac{3}{2}, 0\bigr) = 0, \qquad c\bigl(\tfrac{1}{2}, \tfrac{1}{2}, 1\bigr) = 256 \cdot 240.$$
Hence there are 16 Jordan roots $S$ with $t_A(S) = (1, 1, 0)$, 16 roots with $t_A(S) = (0, 1, 1)$, and 16 roots with $t_A(S) = (1, 0, 1)$.
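Table 2 and the coefficients of $F = h_1 + 16h_2 - 32h_3$ can be recomputed from the divisor-sum formulas for $f_1, f_2, f_3$. In the sketch below exponents are doubled, so the tuple $(n_1, n_2, n_3)$ stands for $a = (n_1/2, n_2/2, n_3/2)$; the function names are illustrative, and `h(kind, a)` selects the tensors $f_i \otimes f_j \otimes f_k$ whose index triple has exactly `kind` distinct values (1 for $g_1$, 2 for $g_2 + g_3 + g_4$, 3 for $g_5$).

```python
from fractions import Fraction
from itertools import product


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def coeff(i, n):
    """Coefficient of q^(n/2) in f_i (i = 1, 2, 3)."""
    if n == 0:
        return Fraction(1, 16) if i == 1 else Fraction(0)
    if i == 1:
        return Fraction(sum((-1) ** d * d**3
                            for d in divisors(n) if (n // d) % 2 == 0))
    sign = -1 if i == 2 else 1
    return Fraction(sum(sign**d * d**3
                        for d in divisors(n) if (n // d) % 2 == 1))


SCALE = {1: 2**12, 2: 2**4, 3: -(2**3)}


def h(kind, a):
    """Coefficient of h_kind at the doubled exponent vector a."""
    total = Fraction(0)
    for idx in product((1, 2, 3), repeat=3):
        if len(set(idx)) == kind:
            total += (coeff(idx[0], a[0]) * coeff(idx[1], a[1])
                      * coeff(idx[2], a[2]))
    return SCALE[kind] * total


def F(a):
    """Coefficient of F = h_1 + 16 h_2 - 32 h_3 at a, as in (7.2)."""
    return h(1, a) + 16 * h(2, a) - 32 * h(3, a)


points = [(0, 0, 0), (2, 0, 0), (1, 1, 0), (4, 0, 0),
          (2, 2, 0), (1, 3, 0), (1, 1, 2)]
for a in points:
    print(a, [h(k, a) for k in (1, 2, 3)], F(a))

# Number of Jordan roots S with t_A(S) = (1, 1, 0).
print(F((2, 2, 0)) / 240)
```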



The three nontrivial cosets of $\mathbb{Z}^3/A$ are represented by short vectors $n$ with $\langle n, n\rangle = 1$ and $\operatorname{Tr} n = 1$. An argument similar to that in §6 shows that if $n$ is such a short vector,
$$\#\{\text{roots } \lambda \text{ in } M\} = 3 \cdot \#\{\text{Jordan roots with } t_A(S) = 1 - n\} = 3 \cdot 16 = 48.$$
This completes the proof that $M \simeq A_1^{24}$, as that is the unique Niemeier lattice with 48 roots (and Coxeter number $h = 2$; again, see [N], [V]).

8. The case $D = 32$

Another interesting case, which merits further study, is the cubic ring
$$A = \{(b, c + d\sqrt{2}) : b \equiv c \pmod 2\},$$
which has index 2 in the maximal order
$$A_0 = \mathbb{Z} + \mathbb{Z}[\sqrt{2}] \quad \text{of discriminant 8.}$$
Hence $A$ has discriminant $D = 32$. The embeddings $f : A \to J$ are all conjugate. To construct one, choose a triple of Jordan roots $(S_1, S_2, S)$ with $\langle S_1, S_2\rangle = 2$ and $S$ orthogonal to $S_1$ and $S_2$, as in the proof of Lemma 1.6. We embed $A$ into $J$ by mapping
$$f(1, 1) = E, \qquad f(2, 0) = S, \qquad f(0, 2 + \sqrt{2}) = S_1 + S_2.$$
Since $B = S_1 + S_2$ satisfies
$$B^3 - 4B^2 + 2B = 0 \quad \text{in } J,$$
and $S \cdot B = B \cdot S = 0$, this is a ring embedding.

The abelian group $A^\vee/A$ is isomorphic to $\mathbb{Z}/8 + \mathbb{Z}/4$. Indeed,
$$A^\vee = \Bigl\{\Bigl(\frac{b}{2}, \frac{c + d\sqrt{2}}{4}\Bigr) : b \equiv c \pmod 2\Bigr\}.$$
Hence the exponent of $A^\vee/A$ is 8. The 2-torsion in $A^\vee/A$ is given by the image of the lattice
$$N = \Bigl\{\Bigl(b, \frac{c}{\sqrt{2}} + d\Bigr)\Bigr\},$$


and $N/A \simeq \mathbb{Z}/2 + \mathbb{Z}/2$. The subgroup $N/A$ is also isotropic for the induced pairing $A^\vee/A \times A^\vee/A \to \mathbb{Q}/\mathbb{Z}$, as
$$\operatorname{Tr}(n^2) = \operatorname{Tr}\bigl(b^2,\ c^2/2 + d^2 + \sqrt{2}\,cd\bigr) = b^2 + c^2 + 2d^2$$
is integral for $n \in N$. We have
$$N \overset{2}{\supset} A_0 \overset{2}{\supset} A.$$
The lattice $N$ has discriminant 2 and is isomorphic to $\mathbb{Z} + \mathbb{Z} + \mathbb{Z}(2)$.

By the results of §2, the lattice $L = f(A)^\perp$ has index 32 in $L^\vee$ and is contained in the intermediate integral lattices $L'$, $M$ with
$$L \underset{2}{\subset} L' \underset{2}{\subset} M \subset L^\vee,$$
corresponding to $A_0$ and $N$, respectively. The discriminant of $M$ is 2. Can one determine these lattices explicitly and identify the Hilbert modular form $F(\tau)$ inside the space of forms of weight $(4, 4, 4)$ for $\Gamma(2) \times \Gamma(\sqrt{2})$, where $\Gamma(\sqrt{2})$ is the normal subgroup of $\mathrm{SL}_2(\mathbb{Z}[\sqrt{2}])$ consisting of matrices reducing to the identity $\pmod{\sqrt{2}}$?

9. The uniqueness of $J_0$

We outline here how $J_0$ can be proved unique without using the Lorentzian lattice $\mathrm{II}_{25,1}$. We remark that this uniqueness result was recently used (see [BV]) in the classification of rootless lattices in dimensions 27 and 28. We postpone to a future paper some details of the spaces of modular forms that can arise as weighted theta functions and of the reconstruction of $J_0$ from the Niemeier lattice with root system $A_1^{24}$.

Let $\Lambda$ be a positive-definite even integral lattice of rank 26, discriminant 3, and minimal norm 4. To prove that $\Lambda$ is isometric with $J_0$, we show the following proposition.

PROPOSITION 9.1
(i) Every vector of the dual lattice $\Lambda^\vee$ is either in $\Lambda$ or has norm congruent to $2/3 \bmod 2\mathbb{Z}$.
(ii) No vector of $\Lambda^\vee$ has norm $2/3$.
(iii) The theta series of $\Lambda$, $\Lambda^\vee$ coincide with those of $J_0$, $J_0^\vee$. In particular, each of the two nontrivial cosets of $\Lambda$ in $\Lambda^\vee$ has 819 vectors of minimal norm $8/3$.
(iv) Those 819 vectors constitute a spherical 2-design $\Delta$, and the $2 \cdot 819$ minimal vectors of $\Lambda^\vee$ constitute a spherical 4-design $\Delta \cup (-\Delta)$.
(v) The inner product of any $w_1, w_2 \in \Delta$ is one of $8/3$, $2/3$, $-1/3$, $-4/3$. For each $w_1$ these inner products occur, respectively, for 1, 288, 512, 18 of the 819 choices of $w_2$.
(vi) For each $i, j, k \in \{8/3, 2/3, -1/3, -4/3\}$ there exists an integer $n^k_{i,j}$, independent of $\Lambda$, such that for any $w_1, w_2 \in \Delta$ with $\langle w_1, w_2\rangle = k$ the number of $w \in \Delta$ with $\langle w_1, w\rangle = i$ and $\langle w_2, w\rangle = j$ is $n^k_{i,j}$.

405

(For part (iv), recall that a nonempty finite subset $S$ of a sphere in $\mathbb{R}^n$ is a spherical $t$-design if, for every polynomial $P$ on $\mathbb{R}^n$ of degree at most $t$, $|S|^{-1}\sum_{x \in S} P(x)$ equals the average of $P$ over the sphere. Part (vi) is the statement that $\Delta$ is a 4-class association scheme indexed by $\{8/3, 2/3, -1/3, -4/3\}$, with parameters $n^k_{i,j}$ independent of $\Lambda$. That $\Delta$ is such an association scheme when $\Lambda = J_0$ follows from the fact that $\operatorname{Aut}(J_0)$ acts distance-transitively on $\Delta$; but of course this argument, which we used earlier in this paper, is not yet available to us for arbitrary $\Lambda$.)


Proof
(i) This is known to be true for any positive-definite even integral lattice $\Lambda$ of discriminant 3 and rank $8n + 2$. Since $[\Lambda^\vee : \Lambda] = \det(\Lambda) = 3$, we have either $v^* - w^* \in \Lambda$ or $v^* + w^* \in \Lambda$ for any $v^*, w^* \in \Lambda^\vee - \Lambda$. Thus $\langle v^*, v^*\rangle \equiv \langle w^*, w^*\rangle \bmod \mathbb{Z}$. Moreover, $3v^* \in \Lambda$, so $3\langle v^*, v^*\rangle = \langle 3v^*, v^*\rangle \in \mathbb{Z}$ and $9\langle v^*, v^*\rangle = \langle 3v^*, 3v^*\rangle \in 2\mathbb{Z}$. Thus there exists an integer $c$ such that $\langle v^*, v^*\rangle \equiv 2c/3 \bmod 2\mathbb{Z}$ for all $v^* \in \Lambda^\vee - \Lambda$. If we had $c = 0$, then $\Lambda^\vee$ would be an integral lattice, which is impossible because $\det \Lambda^\vee = 1/3 \notin \mathbb{Z}$. If $c = 2$, then we could glue $\Lambda^\vee$ to the $A_2$ lattice, obtaining a positive-definite even unimodular lattice of rank $8n + 4$, which is impossible. Thus $c = 1$, as claimed.

(ii) Assume on the contrary that $\langle v^*, v^*\rangle = 2/3$ for some $v^* \in \Lambda^\vee$. Let $\Lambda_1 = \Lambda \cap (\mathbb{Z}v^*)^\perp$. This is a positive-definite even integral lattice of rank 25 and discriminant $\langle v^*, v^*\rangle(\det \Lambda) = 2$ containing no vectors of norm 2. But no such lattice exists. As with $J_0$, but more simply, the nonexistence of $\Lambda_1$ was proved by Borcherds via $\mathrm{II}_{25,1}$ (see [B, Lem. 4.3.1]), and it can also be established using theta series without invoking hyperbolic lattices. To do this, first show, as in (i), that all vectors in $\Lambda_1^\vee - \Lambda_1$ would have norm $\equiv 1/2 \bmod 2\mathbb{Z}$, and note that $\Lambda_1^\vee$ has minimal norm at least $5/2$ because if $w^* \in \Lambda_1^\vee$ has norm $1/2$, then $2w^* \in \Lambda_1$ has norm 2. Then show, as we do for $\Lambda$ in (iii) and (iv), that $\Lambda_1^\vee$ has minimal norm $5/2$ and that its minimal vectors constitute a spherical 2-design. Thus, for any minimal vector $w_0^* \in \Lambda_1^\vee$, the average of $\langle w^*, w_0^*\rangle^2$ over all minimal vectors $w^*$ is $(5/2)^2/25 = 1/4$. But $\langle w^*, w_0^*\rangle \in \mathbb{Z} + 1/2$ for all $w^* \in \Lambda_1^\vee$, and $\langle w_0^*, w_0^*\rangle = 5/2$. Thus each $\langle w^*, w_0^*\rangle^2 \ge 1/4$, and the inequality is strict at least for $w^* = w_0^*$. Therefore the average of $\langle w^*, w_0^*\rangle^2$ exceeds $1/4$. This contradiction proves that $\Lambda_1$ cannot exist, and thus it proves that $\Lambda^\vee$ has no vectors of norm $2/3$.

(iii) This is in effect already proved in [EG, Prop. 8.6], in which the theta series of $J_0$, $J_0^\vee$ was determined using only the facts about $J_0$ that we assumed for $\Lambda$ or proved in (i) and (ii). Since the involution $x \leftrightarrow -x$ switches the two nontrivial cosets of $\Lambda$ in $\Lambda^\vee$, each of these two cosets has the same number of minimal vectors; thus the theta series of $\Lambda^\vee$ also determines the number of minimal vectors in each nontrivial coset.

(iv) We use the fact that $S$ is a $t$-design if and only if $\sum_{x \in S} P(x) = 0$ whenever $P$ is a spherical harmonic of positive degree at most $t$. Let $C$ be one of the nontrivial cosets of $\Lambda$ in $\Lambda^\vee$, and consider the weighted theta series $\sum_{x \in C} P(x) q^{\langle x, x\rangle/2}$. This is a modular form of weight $13 + \deg(P)$ for $\Gamma(3)$. Because $\Lambda^\vee$ has minimal norm $8/3$, this form vanishes at each cusp at least to the same order as $\eta(z)^{32}$, a form of weight 16. Thus, if $13 + \deg(P) < 16$, then the form is identically zero. In particular, its $q^{4/3}$ coefficient vanishes; since this coefficient is the sum of $P(x)$ over the 819 minimal vectors of $C$, we confirm that these vectors constitute a spherical 2-design.



(This is the argument we suggested in [EG, p. 693]; see also the second part of (v) below.) As for the minimal vectors of $\Lambda^\vee$, the same construction yields a modular form $\vartheta(z) := \sum_{x \in \Lambda^\vee} P(x) q(z)^{\langle x, x\rangle/2}$ of weight $13 + \deg(P)$ for $\Gamma_0(3)$. It still vanishes at least as $\eta^{32}$ at each cusp, but at the cusp $z = 0$ we have $\vartheta(z) = O(q(-1/z)^2)$, not just $O(q(-1/z)^{4/3})$, because $\vartheta(-1/z)$ is proportional to $\sum_{x \in \Lambda} P(x) q(z)^{\langle x, x\rangle/2}$ and $\Lambda$ has minimal norm 4. This lets us conclude that $\vartheta \equiv 0$ if $13 + \deg(P) < 18$ and thus that $\Delta \cup (-\Delta)$, the set of minimal vectors of $\Lambda^\vee$, is a spherical 4-design, as claimed. (Since this design is symmetric about the origin, it is automatically a 5-design as well, but we do not use this.)

(v) Since $w_2 \equiv w_1 \bmod \Lambda$, we have $\langle w_1, w_2\rangle \equiv \langle w_1, w_1\rangle = 8/3 \equiv 2/3 \bmod \mathbb{Z}$. By Cauchy-Schwarz $|\langle w_1, w_2\rangle| \le 8/3$, with equality if and only if $w_1, w_2$ are proportional. Since $w_1, w_2 \in \Delta$, this equality condition is equivalent to $w_1 = w_2$. If $w_1 \ne w_2$, then $w_1 - w_2 \in \Lambda - \{0\}$, so $w_1 - w_2$ has norm at least 4, whence $\langle w_1, w_2\rangle = (|w_1|^2 + |w_2|^2 - |w_1 - w_2|^2)/2 \le (16/3 - 4)/2 = 2/3$. Likewise, $w_1 + w_2$, a nonzero vector in $\Lambda^\vee$, has norm at least $8/3$, whence $\langle w_1, w_2\rangle \ge -4/3$. Thus the only possibilities for $\langle w_1, w_2\rangle$ are $-4/3$, $-1/3$, $2/3$, and $8/3$, the last occurring if and only if $w_2 = w_1$. Now fix $w_1$ and apply (iv) with $P(x) = \langle w_1, x\rangle$ and $P(x) = \langle w_1, x\rangle^2$. This yields two linear equations on the four counts $n_i := \#\{w_2 \in \Delta : \langle w_1, w_2\rangle = i\}$ ($i = 8/3, 2/3, -1/3, -4/3$). These are already known to satisfy the two linear conditions $n_{8/3} = 1$ and $\sum_i n_i = 819$. Solving these simultaneous linear equations yields $(n_{2/3}, n_{-1/3}, n_{-4/3}) = (288, 512, 18)$, as claimed. For a check on the computation we may verify that $\sum_i i^4 n_i = 512/3$ is consistent with $\Delta \cup (-\Delta)$ being a spherical 4-design.

(vi) (Sketch) We may assume that $k \ne 8/3$. For each of the remaining three values of $k$, and $w_1, w_2 \in \Delta$ such that $\langle w_1, w_2\rangle = k$, let $n_{i,j} := \#\{w \in \Delta : \langle w_1, w\rangle = i,\ \langle w_2, w\rangle = j\}$.
We know $\sum_i n_{i,j}$ for each $j$, and $\sum_j n_{i,j}$ for each $i$, from (v). We can also calculate $\sum_{i,j} ij\, n_{i,j}$ and $\sum_{i,j} (ij)^2 n_{i,j}$ using the fact that $\Delta \cup (-\Delta)$ is a 4-design. In each case this gives us enough independent linear equations to determine all the $n_{i,j}$, and, in particular, to show that they depend only on $k$ and not on the choice of $w_1, w_2$. This completes the proof of the proposition.

We can now obtain the uniqueness of $J_0$ in two ways. The first is to use a combinatorial characterization of a regular graph $G$ of order 819 and degree 18 obtained from $\Delta$. This graph has vertex set $\Delta$ and an edge connecting any $w_1, w_2 \in \Delta$ if and only if $\langle w_1, w_2\rangle = -4/3$. It turns out that the $n^k_{i,j}$ of Proposition 9.1 are equivalent to the condition that $G$ be a generalized hexagon of order $(2, 8)$. A. Cohen and J.


Tits showed in [CT] that every such generalized hexagon is isomorphic to the graph obtained from the norm-$(8/3)$ vectors of $J_0^\vee$. (They actually reduced this result in turn to M. Ronan's characterization (see [Ro1], [Ro2]) of this graph, and they offered an alternative proof by showing that for each vertex of such a graph there is a graph involution fixing only the vertex and its neighbors and then citing F. Timmesfeld's group-theoretic characterization in [T, (3.3)] of ${}^3D_4(2)$.) But $G$ determines the inner products of all pairs of vectors in $\Delta$: two vertices at distance $d$ on the graph correspond to vectors in $\Delta$ with inner product $(-1)^d 2^{3-d}/3$. Thus $\Delta$ is isometric with the configuration of minimal vectors of $J_0^\vee$, and since these vectors generate $J_0^\vee$, it follows that $\Lambda^\vee \cong J_0^\vee$ and thus that $\Lambda \cong J_0$, as claimed.

In the second approach we use the counts $n^k_{i,j}$ for $k = -4/3$ to find a copy of the $A_1^{24}$ Niemeier lattice in $\Lambda$. Fix $w_1, w_2 \in \Delta$ such that $\langle w_1, w_2\rangle = -4/3$, and let $w_3 = -(w_1 + w_2)$. Then $w_3 \in \Delta$ also, and $w_1, w_2, w_3$ form an equilateral triangle. When $\Lambda = J_0$, such triangles are precisely the projections to $J_0$ of root triples in $J$. Let $L$ be the 24-dimensional slice of $\Lambda$ orthogonal to $w_1, w_2, w_3$. As in §7, we show that this is an even lattice of discriminant 16 with $L^\vee/L \cong (\mathbb{Z}/4\mathbb{Z})^2$ and thus that the preimage of $(2\mathbb{Z}/4\mathbb{Z})^2$ in $L^\vee$ is a self-dual lattice $M$. This lattice can also be described as the projection to $L \otimes \mathbb{R}$ of all vectors of $\Lambda$ whose inner product with each $w_i$ is even. The norm of such a projection must be even; thus $M$ is a Niemeier lattice.

The next step is to show that $M$ contains 48 roots. We cannot use the methods of §7 to count the roots, so instead we reduce the problem to the results of Proposition 9.1. If $r \in M$ has norm 2, then since $r \notin \Lambda$ we must have $r + 3w_i/2 \in \Lambda$ for exactly one of $i = 1, 2, 3$. Thus $w := r - w_i/2 \in \Lambda^\vee$. Then $w$ has norm $8/3$, and being congruent to $w_i \bmod \Lambda$ it must be contained in $\Delta$. Since $\langle r, w_i\rangle = 0$ for each $i$, the projection of $w$ to the $(w_1, w_2, w_3)$ plane is $-w_i/2$. Conversely, given $w \in \Delta$ with that projection, we can reconstruct the root $r = w + w_i/2$. Enumerating the roots thus reduces to enumerating the $w$'s. But this is done in part (vi) of Proposition 9.1 (actually, in this case part (v) would suffice); the multiplicities of the projections of $\Delta$ to that plane are given by the diagram (see Figure 1). In particular, the number of roots is $16 + 16 + 16 = 48$, as claimed.

We finish this proof of the uniqueness of $J_0$ by showing that, up to the automorphisms of $M$, there is a unique suitable choice for its index-4 sublattice $L$, from which we recover $\Lambda$ and thus identify it with $J_0$. We can even obtain the size, if not the structure, of $\operatorname{Aut}(J_0)$ by multiplying $\#\operatorname{Aut}(L)$ by the number of choices we have made along the way. This analysis, too, we relegate to a future paper.


Figure 1. Multiplicities of the projections to the plane of the triangle with vertices w1, w2, w3 (diagram not reproduced; the multiplicities marked in it are 16 and 256)

Acknowledgments. It is a pleasure to thank Wee Teck Gan for his help and Richard Borcherds for a copy and discussion of his thesis.

References

[BV] R. BACHER and B. VENKOV, Réseaux entiers unimodulaires sans racine en dimension 27 et 28, preprint, 2000, http://www.fourier.ujf-grenoble.fr

[B] R. E. BORCHERDS, The Leech lattice and other lattices, Ph.D. dissertation, Trinity College, Cambridge, 1984, http://www.math.berkeley.edu/~reb

[CT] A. M. COHEN and J. TITS, On generalized hexagons and a near octagon whose lines have three points, European J. Combin. 6 (1985), 13–27. MR 86j:51021

[C] J. H. CONWAY, A characterisation of Leech's lattice, Invent. Math. 7 (1969), 137–142. MR 39:6824

[A] J. H. CONWAY, R. T. CURTIS, S. P. NORTON, R. A. PARKER, and R. A. WILSON, Atlas of Finite Groups: Maximal Subgroups and Ordinary Characters for Simple Groups, Oxford Univ. Press, Eynsham, 1985. MR 88g:20025

[CS] J. H. CONWAY and N. J. A. SLOANE, Sphere Packings, Lattices and Groups, 2d ed., Grundlehren Math. Wiss. 290, Springer, New York, 1993. MR 93h:11069

[EG] N. D. ELKIES and B. H. GROSS, The exceptional cone and the Leech lattice, Internat. Math. Res. Notices 1996, 665–698. MR 97g:11070

[GG] B. H. GROSS and W. T. GAN, Commutative subrings of certain non-associative rings, Math. Ann. 314 (1999), 265–283. MR 2000j:11050

[K] H. KIM, Exceptional modular form of weight 4 on an exceptional tube domain contained in C^27, Rev. Math. Iberoamericana 9 (1983), 139–200. MR 94c:11040

[KMRT] M.-A. KNUS, A. MERKURJEV, M. ROST, and J. TIGNOL, The Book of Involutions, Amer. Math. Soc. Colloq. Publ. 44, Amer. Math. Soc., Providence, 1998. MR 2000a:16031

[MH] J. MILNOR and D. HUSEMOLLER, Symmetric Bilinear Forms, Ergeb. Math. Grenzgeb. (3) 73, Springer, New York, 1973. MR 58:22129

[N] H.-V. NIEMEIER, Definite quadratische Formen der Dimension 24 und Diskriminante 1, J. Number Theory 5 (1973), 142–178. MR 47:4931

[R] R. RANKIN, Modular Forms and Functions, Cambridge Univ. Press, Cambridge, 1977. MR 58:16518

[Ro1] M. A. RONAN, A note on the 3D4(q) generalized hexagons, J. Combin. Theory Ser. A 29 (1980), 249–250. MR 82g:51015

[Ro2] M. A. RONAN, A combinatorial characterization of the dual Moufang hexagons, Geom. Dedicata 11 (1981), 61–67. MR 82i:51016

[T] F. G. TIMMESFELD, A characterization of the Chevalley- and Steinberg-groups over F2, Geometriae Dedicata 1 (1973), 269–321. MR 48:8616

[vdG] G. VAN DER GEER, Hilbert modular surfaces, Ergeb. Math. Grenzgeb. (3) 16, Springer, Berlin, 1988. MR 89c:11073

[V] B. B. VENKOV, "Even unimodular 24-dimensional lattices" in Sphere Packings, Lattices and Groups, 2d ed., Grundlehren Math. Wiss. 290, Springer, New York, 1993, 429–440.

Elkies Department of Mathematics, Harvard University, One Oxford Street, Cambridge, Massachusetts 01238, USA; [email protected] Gross Department of Mathematics, Harvard University, One Oxford Street, Cambridge, Massachusetts 01238, USA; [email protected]


CORRECTION TO: "INTERNAL LIFSHITS TAILS FOR RANDOM PERTURBATIONS OF PERIODIC SCHRÖDINGER OPERATORS"

FRÉDÉRIC KLOPP

In [1] we studied the existence of Lifshitz tails for internal gaps of a randomly perturbed periodic Schrödinger operator and concluded that, for both long-range and short-range single-site perturbation, one has Lifshitz tails at a band edge if and only if a suitably chosen underlying periodic operator has a nondegenerate density of states at the corresponding band edge. This result is correct only in the case of short-range potentials. In the case of long-range potentials, one finds that the Lifshitz tails hold without any assumptions on the underlying periodic potential.

More precisely, if one assumes only [1, (H.4)], then the results stated in [1, Th. 2.1] are correct. If one assumes [1, (H.4s)], then the result stated in [1, Th. 2.1] in the case [1, (H.2bis(2))] is correct if one asks for a slightly faster decay, that is, if assumption [1, (H.2bis(2))] is replaced with

(H.2bis(2)) there exists $\nu > d + 2$ such that one has, for any $\gamma \in \Gamma$ and almost every $x \in C_0$, $V > 0$ on some open set and
\[ 0 \le V(x + \gamma) \cdot (1 + |\gamma|)^{\nu} \le g_+(x). \tag{0.1} \]

In the case when one has, for any $\gamma \in \Gamma$ and almost every $x \in C_0$, $V > 0$ on some open set and
\[ 0 \le V(x + \gamma) \cdot (1 + |\gamma|)^{d+2} \le g_+(x), \tag{0.2} \]
then the correct statement (which is proved in [1]) is
\[ \lim_{E \to 0^+} \frac{\log(n(E) - n(0))}{\log E} = \frac{d}{2} \;\Longrightarrow\; \lim_{E \to 0^+} \frac{\log|\log(N(E) - N(0))|}{\log E} = -\frac{d}{2}. \tag{0.3} \]

DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 2. Received 10 November 2000. Revision received 9 April 2001. 2000 Mathematics Subject Classification. Primary 82B44; Secondary 47B80, 60H25. Author's work supported by European Training and Mobility of Researchers network grant number ERBFMRXCT960001.

Let us now state the correct result in the case of long-range single-site perturbations; that is, we assume

(H.2bis(1)) for some $\nu \in (d, d + 2]$, there exists $0 \le g_- \le g_+$, $g_+ \in L^p(\mathbb{R}^d)$ (here $p$ is taken as in [1, (H.2)]), and $0 < g_-$ on some open set, such that, for any $\gamma \in \Gamma$ and almost every $x \in C_0$, one has
\[ g_-(x) \le V(x + \gamma) \cdot (1 + |\gamma|)^{\nu} \le g_+(x). \]

Then we prove that if [1, (H.4s)] holds, then zero is a continuity point for $N$ and we have
\[ \lim_{E \to 0^+} \frac{\log|\log(N(E) - N(0))|}{\log E} = -\frac{d}{\nu - d}. \tag{0.4} \]
So (0.4) is true without any assumption on the underlying periodic operator. The proof of this result will appear elsewhere (see [2]). Note that (0.4) also implies that the converse of (0.3) is not true if one assumes only (0.2).

References

[1] FRÉDÉRIC KLOPP, Internal Lifshits tails for random perturbations of periodic Schrödinger operators, Duke Math. J. 98 (1999), 335–396. MR 2000m:82029

[2] FRÉDÉRIC KLOPP, Internal Lifshitz tails for long range single site potentials, preprint, 2001, http://zeus.math.univ-paris13.fr/~klopp/publi.html

Département de Mathématiques, Institut Galilée, Unité Mixte de Recherche (UMR) 7539 Centre National de la Recherche Scientifique (CNRS), Université de Paris-Nord, 99 avenue Jean-Baptiste Clément, F-93430 Villetaneuse, France; [email protected]

DUKE MATHEMATICAL JOURNAL Vol. 109, No. 3, © 2001

DIRECT AND INVERSE SPECTRAL PROBLEM FOR A SYSTEM OF DIFFERENTIAL EQUATIONS DEPENDING RATIONALLY ON THE SPECTRAL PARAMETER R. MENNICKEN, A. L. SAKHNOVICH, AND C. TRETTER

Abstract
A nonclassical skew-selfadjoint system of two linear differential equations is considered, which depends rationally on the spectral parameter. Systems of this type are related to the sine-Gordon equation. We introduce the notion of $W_p$-functions (Weyl functions) which are defined in a neighborhood of the poles. The main results are theorems on the existence and uniqueness of the Weyl functions, on the uniqueness of the solutions of the inverse problem, and on explicit solutions for the direct and the inverse problem.

Received 11 May 2000. Revision received 10 October 2000. 2000 Mathematics Subject Classification. Primary 34B07; Secondary 34A55, 34B20, 47E05. Authors' work supported by the Deutsche Forschungsgemeinschaft (German Research Foundation).

1. Introduction
The classical Gelfand-Levitan-Krein-Marchenko approach to inverse spectral and scattering problems was intensely and variously developed during the last decades (see [K], [M], [LS], and the references therein). This development was stimulated essentially by the creation of the famous method of inverse scattering transformation in the theory of integrable nonlinear equations (see, e.g., [AS], [FT]). Some other interesting nonclassical (direct) spectral problems with rational dependence on the spectral parameter were studied in [AL], [LMM], and [LT] (see also the references therein). An inverse problem for an analogue of a canonical system with a rational dependence like $(\lambda - x)^{-1}$ on the spectral parameter was treated in [SaL2] and [SaL3]. Direct and inverse scattering problems for potentials with general rational dependence on the spectral parameter $\lambda$ were studied in the important paper [Z] under some natural restrictions.

In this paper we consider (2 × 2)-systems of first-order differential equations of the form
\[ y_x(x, \lambda) = i \left( \frac{b_1 \beta_1(x)^* \beta_1(x)}{\lambda - d_1} + \frac{b_2 \beta_2(x)^* \beta_2(x)}{\lambda - d_2} \right) y(x, \lambda), \qquad x \in [0, \infty),\ \lambda \in \mathbb{C}, \tag{1.1} \]
where $y_x = dy/dx$ denotes the derivative with respect to the variable $x$, $b_p = \pm 1$, $p = 1, 2$, and $d_p = \overline{d_p}$, $p = 1, 2$, $d_1 \ne d_2$, are constants, and the vector functions $\beta_p = [\beta_{p1} \ \ \beta_{p2}]$ have the property
\[ \beta_p(x) \beta_p(x)^* = 1, \qquad x \in [0, \infty),\ p = 1, 2. \tag{1.2} \]

Problems of the form (1.1) are intimately related to nonlinear differential equations. For example, the well-known sine-Gordon equation in laboratory coordinates ωx x − ωtt = sin ω leads to two families of auxiliary problems (1.1) depending on the parameter t with d1 = 1, d2 = −1, with either b1 = b2 = 1 or b1 = 1, b2 = −1, and with coefficients 1 β1 ( · , t) := √ 1 i eiω( · ,t)/2 h( · , t), 2 1 β2 ( · , t) := √ 1 i e−iω( · ,t)/2 h( · , t), 2 where the function h is given by ω ωt 1 h x = −i j + sin J h, 4 2 2 h(0, 0) = I2 , ω ωx 1 h t = −i j + cos J j h, 4 2 2 with j :=

\[ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad J := \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \]

For details we refer the reader to [FT], [AS], and [SaA2]. The (2 × 2)-matrix function $w(\cdot, \lambda)$ satisfying differential equation (1.1) and the normalization condition
\[ w(0, \lambda) = I_2 \tag{1.3} \]
is called the fundamental solution of (1.1). Here $I_2$ denotes the (2 × 2)-unit matrix. Important transformations of the fundamental solutions are obtained via Bäcklund-Darboux transformations. For the role of Bäcklund-Darboux transformations and different representations of the fundamental solutions and eigenfunctions in spectral theory we refer the reader to [D], [AM], [DIKZ], [RS], and the references therein. Different generalizations of the notion of a Weyl function (especially for the nonselfadjoint case) are based on the asymptotics of the fundamental solution of differential equation (1.1) (see, e.g., [L], [BC], [Y], [BDZ], [FI], [SaA2], [GKS]). In the present paper we use the following definition.

Definition 1.1
Let $p \in \{1, 2\}$. A function $\varphi_p$ is called a $W_p$-function ($p$th Weyl function) of system (1.1) with the property (1.2) if and only if there exists an $M > 0$ such that $\varphi_p$ is defined on the complex domain
\[ D_M := \left\{ \lambda \in \mathbb{C} : \left| \lambda - d_p - i \frac{b_p}{M} \right| < \frac{1}{M} \right\} \]
and for all $x \in [0, \infty)$,
\[ \sup_{\lambda \in D_M} \left\| w(x, \lambda) \begin{bmatrix} \varphi_p(\lambda) \\ 1 \end{bmatrix} \right\| < \infty. \tag{1.4} \]

It turns out that the Weyl functions are closely related to the spectral properties of a certain auxiliary system associated with system (1.1). For example, if $\lambda_0$ is a zero of a $W_p$-function, then $\mu_0 = b_p/(2(\lambda_0 - d_p))$ is an eigenvalue of this auxiliary system. We also would like to mention that for the system (1.1) induced by the sine-Gordon equation the evolution of the Weyl functions $\varphi_p(\cdot, t)$, $p = 1, 2$, can be described via the boundary values $w(0, t)$ and $w_x(0, t)$.

The present paper is organized as follows. In Section 2 we prove the existence of unique Weyl functions $\varphi_1$, $\varphi_2$ of system (1.1). In Section 3 we show the uniqueness of the solution of the inverse problem, which consists of recovering the coefficient functions $\beta_p$ of system (1.1) from its $W_p$-functions. The scheme used here may be used without modifications for a system with $r$ summands $b_p \beta_p(x)^* \beta_p(x)$ instead of two as in (1.1). In Section 4 explicit solutions of the direct and the inverse spectral problem are established for $\lambda$-rational systems (1.1) as given in [GKS] for pseudocanonical systems with linear dependence on the spectral parameter.

2. Existence and uniqueness of the Weyl functions (direct problem)
Suppose that the functions $\beta_p$, $p = 1, 2$, are absolutely continuous and that
\[ \sup_{0 < x < \infty} \|\beta_p'(x)\| < \infty, \qquad p = 1, 2. \tag{2.1} \]

[...] $0$ such that the inequality
\[ \sup_{x \in [0, \infty)} \|\xi(x, \mu)\| < \frac{M}{4} \tag{2.9} \]
$\Im(\mu)$ [...] $0$ (2.13) for $x \in [0, \infty)$ when $\Im(\mu) < -M/4$. By (2.3) it is easy to see that the inequality $\Im(\mu) < -M/4$ is equivalent to $|\lambda - d_p - (i b_p/M)| < 1/M$, that is, $\lambda \in D_M$. Since

$W(0, \mu) = I_2$, (2.13) implies that $W(x, \mu)^* j W(x, \mu) \ge j$, or, equivalently,
\[ (W(x, \mu)^{-1})^* \, j \, W(x, \mu)^{-1} \le j, \qquad \Im(\mu) < -\frac{M}{4}, \tag{2.14} \]
that is, $W(x, \mu)^{-1}$ is $j$-contractive. We set
\[ W(x, \mu)^{-1} =: \begin{bmatrix} v_{11}(x, \mu) & v_{12}(x, \mu) \\ v_{21}(x, \mu) & v_{22}(x, \mu) \end{bmatrix}. \tag{2.15} \]
For a fixed $x \in [0, \infty)$ we define a family of linear fractional transformations
\[ \psi_p(x, \mu) := \frac{v_{11}(x, \mu) P(\mu) + v_{12}(x, \mu)}{v_{21}(x, \mu) P(\mu) + v_{22}(x, \mu)}, \tag{2.16} \]
where $P$ is an analytic function which is bounded by 1, and we denote this family by
\[ N(x) := \{ \psi_p(x, \cdot) : P \text{ analytic},\ |P(\mu)| \le 1,\ \Im(\mu) < -M/4 \}. \]
Note that according to (2.14) and (2.15) the denominator in (2.16) does not vanish and, moreover, $|\psi_p(x, \mu)| \le 1$ since $W^{-1}$ is $j$-contractive. From (2.13) we infer
\[ W(x_1, \mu)^* j W(x_1, \mu) > W(x_2, \mu)^* j W(x_2, \mu), \qquad x_1 > x_2, \tag{2.17} \]
that is, $W(x_2, \mu) W(x_1, \mu)^{-1}$ is $j$-contractive. From this observation it follows that
\[ N(x_1) \subset N(x_2), \qquad x_1 > x_2. \tag{2.18} \]
If we let
\[ R(x, \mu) := \begin{bmatrix} r_{11}(x, \mu) & r_{12}(x, \mu) \\ r_{21}(x, \mu) & r_{22}(x, \mu) \end{bmatrix} := W(x, \mu)^* j W(x, \mu), \tag{2.19} \]
we see that by definition (2.16) of $\psi_p$ the condition $\psi_p(x, \cdot) \in N(x)$ is equivalent to
\[ \begin{bmatrix} \psi_p(x, \mu) \\ 1 \end{bmatrix}^{*} R(x, \mu) \begin{bmatrix} \psi_p(x, \mu) \\ 1 \end{bmatrix} \le 0, \qquad \Im(\mu) \le -\frac{M}{4}. \tag{2.20} \]
Hence we can parametrize the values $\psi_p(x, \cdot) \in N(x)$ in the form of a so-called Weyl disc,
\[ \psi_p(x, \mu) = \rho_1(x, \mu)^{-1/2} u(x, \mu) \rho_2(x, \mu)^{-1/2} + \rho_0(x, \mu), \qquad x \in [0, \infty),\ \Im(\mu) < -\frac{M}{4}, \tag{2.21} \]
with the center of the disc given by $\rho_0(x, \mu) = -r_{11}(x, \mu)^{-1} r_{12}(x, \mu)$, the radii $\rho_1^{-1/2}$ and $\rho_2^{-1/2}$ given by
\[ \rho_1(x, \mu) = r_{11}(x, \mu), \qquad \rho_2(x, \mu) = \left( r_{21}(x, \mu) r_{11}(x, \mu)^{-1} r_{12}(x, \mu) - r_{22}(x, \mu) \right)^{-1}, \]

and a bounded parameter function $|u(x, \mu)| \le 1$. By means of (2.17) and the definition of $R(\cdot, \mu)$ (see (2.19), and noting that $r_{21}(\cdot, \mu) = r_{12}(\cdot, \mu)^*$), it is easy to see that the functions $\rho_1^{-1/2}$ and $\rho_2^{-1/2}$ are decreasing. Moreover, by (2.9) and (2.13) we have
\[ R(x, \mu) \ge j - 2 \left( \Im(\mu) + \frac{M}{4} \right) \int_0^x W(s, \mu)^* W(s, \mu) \, ds. \tag{2.22} \]
In particular, as $W(s, \mu)^* W(s, \mu) \ge W(s, \mu)^* j W(s, \mu) \ge j$, formula (2.22) yields
\[ \rho_1(x, \mu) = r_{11}(x, \mu) \ge 1 - 2x \left( \Im(\mu) + \frac{M}{4} \right) \to \infty, \qquad x \to \infty. \tag{2.23} \]
Therefore, for $\Im(\mu) < -M/4$, the discs of functions $N(x)$ converge to a point (the so-called Weyl point):
\[ \bigcap_{0 \le x < \infty} N(x) = \lim_{x \to \infty} \rho_0(x, \cdot) =: \psi_p(\cdot). \tag{2.24} \]

[...] 2 and $g \equiv \infty$ if $n \le 2$. Our general result says the following: Given the parameters $\alpha > \beta \ge 2$, the two-sided sub-Gaussian estimate
\[ p_t(x, y) \simeq t^{-\alpha/\beta} \exp\left( -\left( \frac{d^\beta(x, y)}{ct} \right)^{1/(\beta - 1)} \right) \tag{1.7} \]
holds if and only if
\[ V(x, r) \simeq r^\alpha \qquad \text{and} \qquad g(x, y) \simeq d(x, y)^{-(\alpha - \beta)}. \tag{1.8} \]

We do not specify here the ranges of the variables $x, y, t, r$ because they are different for different settings. In the present paper, we treat the case when the underlying space is a graph, and the time is also discrete. However, the graph case already contains all difficulties. We present the proof in such a way that only minimal changes are required to pass to a general setting of abstract metric spaces, which will be dealt with elsewhere. The exact statements are given in Section 2. Note that our result is new, even for the Gaussian case $\beta = 2$.

Hypothesis (1.8) consists of two conditions of different nature. The first one is a geometric condition of the volume growth, whereas the second is an estimate of a fundamental solution to an elliptic equation. Neither of them separately implies heat kernel bounds (1.7). Surprisingly enough, the exponent $\beta$, which provides the scaling of space and time variables for a parabolic equation, can be recovered from an elliptic equation, although combined with the volume growth.

The paper is arranged as follows. In Section 2 we state the main result, Theorem 2.1. In Section 3 we introduce necessary tools such as the discrete Laplace operator, its eigenvalues, the mean exit time, and so on. In Section 4 we describe the scheme of the proof of Theorem 2.1, as well as some consequences. In particular, we mention some other conditions equivalent to (1.7). The actual proof of Theorem 2.1 consists of many steps that are considered in detail in Sections 5–15.

Notation. The letters $c$, $C$ are reserved for positive constants not depending on the variables in question. They may be different on different occurrences, even within the same formula. All results of the paper are quantitative in the sense that the constants in conclusions depend only on the constants in hypotheses. The relation $f \simeq g$ means that the ratio of functions $f$ and $g$ is bounded from above and below by positive constants, for the specified range of the variables. If one of those functions contains a sub-Gaussian factor $\exp(-(d^\beta/(ct))^{1/(\beta - 1)})$, then the constant $c$ in $\exp$ may be different for the upper and lower bounds (cf. (1.7)). We use a number of lettered formulas such as (UE), (LE), and so on, to refer to the most important and frequently used conditions. In the appendix we provide a complete list of all such formulas.

2. Statement of the main result
Throughout the paper, $\Gamma$ denotes an infinite, connected, locally finite graph. If $x, y \in \Gamma$, then we write $x \sim y$, provided $x$ and $y$ are connected by an edge. The graph is always assumed nonoriented; that is, $x \sim y$ is equivalent to $y \sim x$. We do not exclude loops so that $x \sim x$ is possible. If $x \sim y$, then $xy$ denotes the edge connecting $x$ and $y$. The distance $d(x, y)$ is the minimal number of edges in any edge path connecting $x$ and $y$.

Assume that graph $\Gamma$ is endowed by a weight $\mu_{xy}$, which is a symmetric nonnegative function on $\Gamma \times \Gamma$ such that $\mu_{xy} > 0$ if and only if $x \sim y$. Given $\mu_{xy}$, we also define a measure $\mu$ on vertices by
\[ \mu(x) := \sum_{y \sim x} \mu_{xy} \]
and
\[ \mu(A) := \sum_{x \in A} \mu(x), \]
for any finite set $A \subset \Gamma$. The couple $(\Gamma, \mu)$ is called a weighted graph. Here $\mu$ refers both to the weight $\mu_{xy}$ and to the measure $\mu$. Any graph $\Gamma$ admits a standard weight, which is defined by $\mu_{xy} = 1$ for all edges $xy$. For such a weight, $\mu(x)$ is equal to the degree of the vertex $x$, which is the number of its neighbors.

Any weighted graph has a natural Markov operator $P(x, y)$ defined by
\[ P(x, y) := \frac{\mu_{xy}}{\mu(x)}. \tag{2.1} \]
Clearly, we have
\[ \sum_{y \in \Gamma} P(x, y) = 1 \tag{2.2} \]
and
\[ P(x, y) \mu(x) = P(y, x) \mu(y). \tag{2.3} \]
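Identities (2.2) and (2.3) can be checked on a small example. The following is a minimal sketch, not part of the paper: the toy graph `EDGES` and the helper names `symmetric_weight`, `vertex_measure`, and `markov_operator` are hypothetical choices made here for illustration, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical toy graph (not from the paper): a path on vertices 0..3
# with a loop at vertex 3. Keys are unordered edges, values are weights.
EDGES = {(0, 1): 2, (1, 2): 1, (2, 3): 3, (3, 3): 1}


def symmetric_weight(edges):
    """Return mu_xy as a dict on ordered pairs, symmetric in x and y."""
    mu_xy = {}
    for (x, y), w in edges.items():
        mu_xy[(x, y)] = Fraction(w)
        mu_xy[(y, x)] = Fraction(w)
    return mu_xy


def vertex_measure(mu_xy):
    """mu(x) = sum of mu_xy over the neighbours y of x (a loop counts once)."""
    mu = {}
    for (x, _y), w in mu_xy.items():
        mu[x] = mu.get(x, Fraction(0)) + w
    return mu


def markov_operator(mu_xy, mu):
    """P(x, y) = mu_xy / mu(x), as in (2.1)."""
    return {(x, y): w / mu[x] for (x, y), w in mu_xy.items()}


mu_xy = symmetric_weight(EDGES)
mu = vertex_measure(mu_xy)
P = markov_operator(mu_xy, mu)

# (2.2): every row of P sums to 1.
row_sums = {x: sum(p for (a, _b), p in P.items() if a == x) for x in mu}
# (2.3): detailed balance P(x, y) mu(x) = P(y, x) mu(y).
reversible = all(P[(x, y)] * mu[x] == P[(y, x)] * mu[y] for (x, y) in P)
```

Both checks reduce to the symmetry of the weight: each side of (2.3) equals $\mu_{xy}$.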

For the Markov operator $P$, there is an associated random walk $X_n$, jumping at each time $n \in \mathbb{N}$ from a current vertex $x$ to a neighboring vertex $y$ with probability $P(x, y)$. The process $X_n$ is Markov and reversible with respect to measure $\mu$. If $\mu$ is the standard weight on $\Gamma$, then $X_n$ is called a simple random walk on $\Gamma$.

Conversely, given a countable set $\Gamma$ with a measure $\mu$ and a Markov operator $P(x, y)$ on $\Gamma$ satisfying (2.3), identity (2.1) uniquely determines a symmetric weight $\mu_{xy}$ on $\Gamma \times \Gamma$. Then one defines edges $xy$ as those pairs of vertices for which $\mu_{xy} \ne 0$, and one obtains a weighted graph $(\Gamma, \mu)$. One has to assume in addition that the resulting graph $\Gamma$ is connected and locally finite.

Let $P_n$ denote the $n$th convolution power of the operator $P$. Alternatively, $P_n(x, y)$ is the transition function of the random walk $X_n$; that is, $P_n(x, y) = \mathbb{P}_x(X_n = y)$. Define also the transition density of $X_n$, or the heat kernel, by
\[ p_n(x, y) := \frac{P_n(x, y)}{\mu(y)}. \]
As obviously follows from (2.3), $p_n(x, y) = p_n(y, x)$.

The only a priori assumption that we normally make about the transition probability is the following:
\[ P(x, y) \ge p_0, \qquad \forall x \sim y, \tag{$p_0$} \]
where $p_0$ is a positive constant. Due to (2.2), hypothesis ($p_0$) implies that the degree of each vertex $x \in \Gamma$ is uniformly bounded from above. The latter is in fact equivalent to ($p_0$), provided $X_n$ is a simple random walk.

By sub-Gaussian heat kernel estimates on graphs, we mean the following inequalities:
\[ p_n(x, y) \le C n^{-\alpha/\beta} \exp\left( -\left( \frac{d(x, y)^\beta}{Cn} \right)^{1/(\beta - 1)} \right) \tag{UE} \]
and
\[ p_n(x, y) + p_{n+1}(x, y) \ge c n^{-\alpha/\beta} \exp\left( -\left( \frac{d(x, y)^\beta}{cn} \right)^{1/(\beta - 1)} \right), \qquad n \ge d(x, y), \tag{LE} \]

where $x, y$ are arbitrary points on $\Gamma$ and where $n$ is a positive integer.

Let us comment on the differences between (UE) and (LE). First, observe that $p_n(x, y) = 0$ whenever $n < d(x, y)$. (Indeed, the random walk cannot get from $x$ to $y$ in a number of steps smaller than $d(x, y)$.) Therefore, the restriction $n \ge d(x, y)$ in (LE) is necessary. We could assume the same restriction in (UE), but if $p_n(x, y) = 0$, then (UE) is true anyway.

Another difference, the use of $p_n + p_{n+1}$ in (LE) in place of $p_n$ in (UE), is due to the parity problem. Indeed, if graph $\Gamma$ is bipartite (e.g., $\mathbb{Z}^D$), then $p_n(x, y) = 0$ whenever $n$ and $d(x, y)$ have different parities. Therefore, the lower bound for $p_n$ cannot hold in general, and we state it for $p_n + p_{n+1}$ instead. Alternatively, one could say that the lower bound holds either for $p_n$ or for $p_{n+1}$. The structure of the graph may cause one of $p_n$, $p_{n+1}$ to be small (or even vanish), but it is not possible to decide a priori which of these two terms admits the lower bound (see Section 14 for more details).

Denote by $B(x, R)$ a ball on $\Gamma$ of radius $R$ centered at $x$, and denote by $V(x, R)$ its measure, that is,
\[ B(x, R) := \{ y \in \Gamma : d(x, y) < R \}, \qquad V(x, R) := \mu(B(x, R)). \]
We say that graph (0, µ) has the regular volume growth of degree α if V (x, R) ' R α ,

∀x ∈ 0, R ≥ 1.

(V )

The Green kernel of (0, µ) is defined by g(x, y) :=

∞ X

pn (x, y).

n=0

Assuming that α > β, the estimates (U E) and (L E) imply, upon summation in n, g(x, y) ' d(x, y)−γ ,

∀x 6= y,

(G)

where γ = α − β. It turns out that (G), together with the volume growth condition (V ), is sufficient to recover the heat kernel estimates (U E) and (L E), as is stated in the following main theorem. 2.1 Let α > β > 1, and let γ = α − β. For any infinite connected weighted graph (0, µ) satisfying ( p0 ), the following equivalence holds: THEOREM

(V ) + (G) ⇐⇒ (U E) + (L E).
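The summation step that leads from (UE) and (LE) to (G) can be outlined as follows. This is a heuristic sketch under the assumption $\alpha > \beta$, not the argument of the paper; the splitting of the sum at $n \approx d^\beta$, with $d = d(x, y) \ge 1$, is a choice made here for illustration.

```latex
% For n >= d^beta the exponential factor in (UE) and (LE) lies between
% two positive constants, so these terms behave like n^{-alpha/beta}:
\[
\sum_{n \ge d^\beta} n^{-\alpha/\beta}
  \;\simeq\; \bigl(d^\beta\bigr)^{1 - \alpha/\beta}
  \;=\; d^{-(\alpha - \beta)} ,
  \qquad \text{since } \alpha/\beta > 1 .
\]
% For n < d^beta the exponential factor decays rapidly in d^beta / n,
% and the remaining terms add at most a constant multiple of the same power:
\[
\sum_{n < d^\beta} n^{-\alpha/\beta}
  \exp\!\left( -\Bigl( \frac{d^\beta}{Cn} \Bigr)^{1/(\beta - 1)} \right)
  \;\le\; C' \, d^{-(\alpha - \beta)} .
\]
```

The lower bound in (G) uses only the range $n \ge d^\beta$ together with the inequality $\sum_n (p_n + p_{n+1}) \le 2g$, which is where the restriction $n \ge d(x, y)$ in (LE) causes no loss.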

Remark 2.1
Under hypotheses (V) and (G), some partial heat kernel estimates were obtained by A. Telcs [62].

It is well known that a simple random walk in $\mathbb{Z}^D$ admits the Gaussian estimate
\[ c n^{-D/2} \exp\left( -\frac{d^2(x, y)}{cn} \right) \le p_n(x, y) \le C n^{-D/2} \exp\left( -\frac{d^2(x, y)}{Cn} \right), \tag{2.4} \]

subject to the restrictions $n \equiv d(x, y) \pmod 2$ and $d(x, y) \le n$. Similar Gaussian estimates were also proved for more general graphs, under various assumptions (see [37], [54], [22], [28]). It is easy to see that (2.4) is equivalent to (UE) + (LE) for $\alpha = D$ and $\beta = 2$ (see Section 14 for the parity matters).

Barlow and Bass [8] constructed a family of graphs, the graphical Sierpiński carpets (resembling in the large scale the multi-dimensional Sierpiński carpet), which are characterized by the parameters $\alpha$ and $\beta$. Heat kernels on those graphs satisfy sub-Gaussian estimates (UE) and (LE). In general, the parameters $\alpha$ and $\beta$ in (UE) and (LE) must satisfy the inequalities
\[ 2 \le \beta \le \alpha + 1, \tag{2.5} \]
which can be seen as follows. By [9, Th. 2.1], the lower bound in (V) implies the on-diagonal upper bound $p_n(x, x) \le C n^{-\alpha/(\alpha + 1)}$. By the result of [48], the upper bound in (V) implies the on-diagonal lower bound $p_n(x, x) \ge c (n \log n)^{-\alpha/2}$. Comparing these estimates with the on-diagonal lower and upper bounds implied by (LE) and (UE), we obtain (2.5) (cf. [4, Th. 3.20 and Rem. 3.22], [59], as well as Lemma 5.4).

The sub-Gaussian estimates for different $\alpha$ and $\beta$ are related as follows. Consider the right-hand side of (UE) and (LE) as a function of $\alpha$ and $\beta$. It is easy to see that it decreases as $\beta$ and $\alpha/\beta$ simultaneously increase (assuming $d(x, y) \ge n$). In particular, (UE) gets stronger (and (LE) gets weaker) as $\alpha$ increases with $\beta$ held constant, whereas in general there is no monotonicity in $\beta$.

Estimates (UE) and (LE) were proved by O. Jones [39] for the graphical Sierpiński gasket. The latter is a graph that is obtained from an equilateral triangle by a fractal-like construction (see Figure 1). The reason for a subdiffusive behaviour of the random walk on such graphs is that they contain plenty of "holes" of all sizes, which causes the random walk to spend more time on circumventing the obstacles rather than on moving away from the origin.

Figure 1. A fragment of the graphical Sierpiński gasket

It is possible to show that (V) and (G) imply $\beta \ge 2$ (see Lemma 5.4). The assumption $\alpha > \beta$ is necessary to ensure the finiteness of the Green function. It is known that either $g(x, y)$ is finite for all $x, y$ or $g \equiv \infty$. In the first case the graph $(\Gamma, \mu)$ is called transient and in the second case recurrent (e.g., $\mathbb{Z}^D$ is transient if $D \ge 3$ and recurrent otherwise). Hence, Theorem 2.1 serves only transient graphs. The question of finding equivalent conditions for sub-Gaussian estimates (UE) and (LE) is equally interesting for recurrent graphs. Note that the graph in Figure 1 is recurrent.∗ Indeed, the volume function on this graph obviously admits the estimate $V(x, r) \le C r^2$, which implies the recurrence (see [18], [66]). Alternatively, one can see directly that $\alpha < \beta$ because the parameters $\alpha$ and $\beta$ for the Sierpiński gasket are $\alpha = \log 3/\log 2$ and $\beta = \log 5/\log 2$ (see [4]). Some hints on the recurrent case are given in Section 4.

∗ Plenty of examples of transient graphs and fractals with sub-Gaussian heat kernel bounds can be found in [4], [7], and [8].

3. Preliminaries
If $P$ is the Markov operator of a weighted graph $(\Gamma, \mu)$ and if $I$ is the identity operator, then $\Delta := P - I$ is called the Laplace operator of $(\Gamma, \mu)$. For any set $A \subset \Gamma$, denote by $\overline{A}$ the set containing all vertices of $A$ and all their neighbors. If a function $f$ is defined on $\overline{A}$, then $\Delta f$ is defined on $A$ and
\[ \Delta f(x) = \sum_{y \sim x} P(x, y) f(y) - f(x) = \frac{1}{\mu(x)} \sum_{y \in \Gamma} (\nabla_{xy} f) \, \mu_{xy}, \tag{3.1} \]

where $\nabla_{xy} f := f(y) - f(x)$. Note that although the summation in the second sum in (3.1) runs over all vertices $y$, the summand is nonvanishing only if $y \sim x$. The following is a discrete analogue of the Green formula: for any finite set $A$ and for all functions $f$ and $g$ defined on $\overline{A}$,
\[ \sum_{x \in A} \Delta f(x) g(x) \mu(x) = \sum_{x \in A,\, y \notin A} (\nabla_{xy} f) g(x) \mu_{xy} - \frac{1}{2} \sum_{x, y \in A} (\nabla_{xy} f)(\nabla_{xy} g) \mu_{xy}. \tag{3.2} \]
We say that a function $v$ is harmonic in set $A$ if $v$ is defined in $\overline{A}$ and $\Delta v = 0$ in $A$. Similarly, we say that a function $v$ is superharmonic if $\Delta v \le 0$. Observe that the inequality $\Delta v \le 0$ is equivalent to
\[ v(x) \ge \sum_{y \sim x} P(x, y) v(y). \]
The latter implies, in particular, that the infimum of a family of superharmonic functions is again superharmonic.

For any nonempty set $A \subset \Gamma$, let $c_0(A)$ be the set of functions on $\Gamma$ whose support is finite and is in $A$. Denote by $\Delta_A$ the Laplace operator with the vanishing Dirichlet boundary condition on $A$; that is,
\[ \Delta_A f(x) := \begin{cases} \Delta f(x), & x \in A, \\ 0, & x \notin A. \end{cases} \]
The operator $\Delta_A$ is symmetric with respect to the measure $\mu$ and is nonpositive definite. Moreover, it is essentially self-adjoint in $L^2(A, \mu)$. For a finite set $A$, denote by $|A|$ its cardinality. If $A$ is finite and nonempty, then the operator $-\Delta_A$ has $|A|$ nonnegative eigenvalues that we enumerate in increasing order and denote as follows:
\[ \lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_{|A|}(A). \]
It is known that all eigenvalues $\lambda_i(A)$ lie in the interval $[0, 2]$ and that $\lambda_1(A) \in [0, 1]$ (see, e.g., [19], [22, Sec. 3.3]). The smallest eigenvalue $\lambda_1(A)$ admits the variational definition
\[ \lambda_1(A) = \inf_{f \in c_0(A)} \frac{-(\Delta f, f)}{(f, f)} = \inf_{f \in c_0(A)} \frac{(1/2) \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy}}{\sum_x f^2(x) \mu(x)}, \tag{3.3} \]
where
\[ (f, g) := \sum_{x \in \Gamma} f(x) g(x) \mu(x). \]

If $A = B(x, R)$, then we write, for simplicity, $\lambda(x, R) := \lambda_1(B(x, R))$.

Given a nonempty set $A \subset \Gamma$, let $X_n^A$ be the random walk on $(\Gamma, \mu)$ with the killing condition outside $A$. Its Markov operator $P^A(x, y)$ is defined by
\[ P^A(x, y) := \begin{cases} P(x, y), & x, y \in A, \\ 0, & \text{otherwise}. \end{cases} \]
The transition function $P_n^A(x, y)$ of $X_n^A$ is defined inductively: $P_0^A(x, y) = \delta_{xy}$ and
\[ P_{n+1}^A(x, y) = \sum_{z \in \Gamma} P_n^A(x, z) P^A(z, y) = \sum_{z \in \Gamma} P^A(x, z) P_n^A(z, y). \tag{3.4} \]
As easily follows from (3.4), the function $u_n(x) = P_n^A(x, y)$ satisfies in $A \times \mathbb{N}$ the discrete heat equation
\[ u_{n+1} - u_n = \Delta_A u_n. \tag{3.5} \]
The heat kernel $p_n^A(x, y)$ of $X_n^A$ is defined by
\[ p_n^A(x, y) := \frac{P_n^A(x, y)}{\mu(y)}. \]
As follows from (2.1), $p^A$ is symmetric in $x$ and $y$. In particular, the kernel $p_n^A(x, y)$ satisfies heat equation (3.5) both in $(n, x)$ and $(n, y)$. If $f(x)$ is a function on $A$, then the function
\[ u_n(x) := P_n^A f(x) = \sum_{y \in A} p_n^A(x, y) f(y) \mu(y) \]
solves, in $A \times \mathbb{N}$, heat equation (3.5) with initial data $u_0 = f$ and boundary data $u_n(x) = 0$ if $x \notin A$.

The Green function of $X_n^A$ is defined by
\[ G^A(x, y) := \sum_{n=0}^{\infty} P_n^A(x, y). \]
The alternative definition is that the function $G^A(x, y)$ is the infimum of all positive fundamental solutions of the Laplace equation in $A$. If the Green function is finite, then, for any $y \in A$, we have $\Delta_A G^A(\cdot, y) = -\delta_y$. The opposite case, when $G^A(x, y) \equiv +\infty$, is equivalent to the recurrence of the process $X_n^A$. The Green kernel $g^A(x, y)$ is defined by
\[ g^A(x, y) = \frac{G^A(x, y)}{\mu(y)} = \sum_{n=0}^{\infty} p_n^A(x, y). \]

Clearly, the Green kernel is symmetric in $x, y$. Therefore, if $g^A$ is finite, then $g^A$ is superharmonic in $A$ with respect to both $x$ and $y$, and it is harmonic away from the diagonal $x = y$. Observe that if $\mu(x) \simeq 1$ (which in particular follows from (V)), then $G^A(x, y) \simeq g^A(x, y)$ and $p_n^A(x, y) \simeq P_n^A(x, y)$. It is easy to see that kernels $p_n^A(x, y)$ and $g^A(x, y)$ increase when $A$ is enlarged and tend to global kernels $p_n(x, y)$ and $g(x, y)$ (defined in Section 2) as an increasing sequence of sets $A$ exhausts $\Gamma$.

If $A$ is finite and nonempty, then it makes sense to consider the Dirichlet problem in $A$,
\[ \begin{cases} \Delta u = f & \text{in } A, \\ u = h & \text{in } \overline{A} \setminus A, \end{cases} \tag{3.6} \]
where $f$ and $h$ are given functions on $A$ and $\overline{A} \setminus A$, respectively. As follows easily from the maximum principle, the solution $u$ exists and is unique. For a finite set $A$, $c_0(A)$ is identified with all functions on $A$ extended by zero outside $A$. Then the equation $\Delta_A u = f$, where $u$ and $f$ are in $c_0(A)$, is equivalent to Dirichlet problem (3.6) with $h = 0$. Its solution is given by means of the Green operator $G^A$ as follows:
\[ u(x) = -G^A f(x) = -\sum_{y} G^A(x, y) f(y). \tag{3.7} \]
In other words, we have $G^A = (-\Delta_A)^{-1}$.

For any set $A \subset \Gamma$ and a point $x \in \Gamma$, define the mean exit time $E_A(x)$ by
\[ E_A(x) := \sum_{y \in A} G^A(x, y). \tag{3.8} \]
As follows from the above discussion, the function $E_A(x)$ solves the following boundary value problem in $A$:
\[ \begin{cases} \Delta u = -1 & \text{in } A, \\ u = 0 & \text{outside } A. \end{cases} \tag{3.9} \]
Denote by $T_A$ the first exit time from set $A$ for the process $X_n$; that is, $T_A := \min\{k : X_k \notin A\}$. We claim that $E_A(x) = \mathbb{E}_x(T_A)$, which justifies the term "mean exit time" for $E_A$. Indeed, $T_A$ coincides with the cardinality of all $n = 0, 1, 2, \ldots$ for which $X_n^A$ is in $A$; that is,
\[ T_A = \sum_{n=0}^{\infty} \mathbf{1}_{\{X_n^A \in A\}}, \]
whence
\[ \mathbb{E}_x(T_A) = \sum_{n=0}^{\infty} \mathbb{P}_x\bigl(X_n^A \in A\bigr) = \sum_{n=0}^{\infty} \sum_{y \in A} P_n^A(x, y) = \sum_{y \in A} G^A(x, y) = E_A(x). \]
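Boundary value problem (3.9) gives a direct way to compute mean exit times on a finite set. The sketch below is an illustration under assumptions made here, not part of the paper: it takes the simple random walk on $\mathbb{Z}$ and the ball $A = B(0, R) = \{y : |y| < R\}$, rewrites $\Delta u = -1$ as the linear system $(I - P^A) u = 1$, and solves it exactly. The helper names `solve_linear_system` and `mean_exit_times` are hypothetical.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination over exact rationals."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def mean_exit_times(radius):
    """Solve (I - P^A) u = 1 on A = B(0, radius) in Z, with u = 0 outside A."""
    vertices = list(range(-(radius - 1), radius))  # all y with |y| < radius
    index = {x: i for i, x in enumerate(vertices)}
    size = len(vertices)
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for x, i in index.items():
        matrix[i][i] = Fraction(1)
        for y in (x - 1, x + 1):
            if y in index:  # steps that leave A are killed
                matrix[i][index[y]] = Fraction(-1, 2)
    solution = solve_linear_system(matrix, [Fraction(1)] * size)
    return dict(zip(vertices, solution))


exit_times = mean_exit_times(4)
```

In this one-dimensional case the solution is $E_A(x) = R^2 - x^2$, so the value at the center is $R^2$, in line with the growth $R^\beta$ for $\beta = 2$ that the paper associates with $\mathbb{Z}^D$.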

If $A = B(x, R)$, then we use a shorter notation $E(x, R) := E_{B(x, R)}(x)$. Another function associated with the exit time is the exit probability, defined by
\[ \Psi_n^A(x) := \mathbb{P}_x\{X_k \notin A \text{ for some } k \le n\} = \mathbb{P}_x\{T_A \le n\}. \tag{3.10} \]
In other words, $\Psi_n^A(x)$ is the probability that the random walk $X_k$ started at $x$ will at least once exit $A$ by time $n$. Alternatively, $\Psi_n^A(x)$ can be defined as the solution $u_n(x)$ to the following initial boundary value problem in $A \times \mathbb{N}$:
\[ \begin{cases} u_{n+1} - u_n = \Delta u_n, \\ u_0(x) = 0, & x \in A, \\ u_n(x) = 1, & x \notin A \text{ and } n \ge 0. \end{cases} \tag{3.11} \]
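Problem (3.11) is a recursion that can be iterated directly, because $u_{n+1} - u_n = \Delta u_n$ with $\Delta = P - I$ is the same as $u_{n+1} = P u_n$ inside $A$. The following sketch assumes, for illustration only, the simple random walk on $\mathbb{Z}$ and $A = B(0, R)$; the function name `exit_probabilities` is hypothetical.

```python
from fractions import Fraction


def exit_probabilities(n, radius):
    """Iterate (3.11) for the simple random walk on Z and A = B(0, radius)."""
    inside = range(-(radius - 1), radius)       # all y with |y| < radius
    u = {x: Fraction(0) for x in inside}        # initial data: u_0 = 0 in A

    def value(current, y):
        # Boundary data: u_k = 1 outside A for every k.
        return current.get(y, Fraction(1))

    for _ in range(n):
        # Inside A the heat equation reads u_{k+1} = P u_k.
        u = {x: (value(u, x - 1) + value(u, x + 1)) / 2 for x in inside}
    return u


before = exit_probabilities(2, 3)[0]   # too few steps to leave B(0, 3) from 0
after = exit_probabilities(3, 3)[0]    # only the two straight paths exit
```

Starting from the center of $B(0, 3)$, the walk cannot exit in two steps, and in three steps it exits only along the two monotone paths, each of probability 1/8.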

If $A = B(x, R)$, then we use the shorter notation $\Psi_n(x, R) := \Psi_n^{B(x, R)}(x)$.

To conclude this section, we prove two useful consequences of condition ($p_0$):
\[ P(x, y) \ge p_0, \qquad \forall x \sim y. \tag{$p_0$} \]

PROPOSITION 3.1
If ($p_0$) holds, then, for all $x \in \Gamma$ and $R > 0$ and for some $C = C(p_0)$,
\[ V(x, R) \le C^R \mu(x). \tag{3.12} \]

Remark 3.1
Inequality (3.12) implies that, for a bounded range of $R$, $V(x, R) \simeq \mu(x)$.

Proof
Let $x \sim y$. Since $P(x, y) = \mu_{xy}/\mu(x)$ and $\mu_{xy} \le \mu(y)$, hypothesis ($p_0$) implies $p_0 \mu(x) \le \mu(y)$. Similarly, $p_0 \mu(y) \le \mu(x)$. Iterating these inequalities, we obtain, for arbitrary $x$ and $y$,
\[ p_0^{d(x, y)} \mu(y) \le \mu(x). \tag{3.13} \]
Another consequence of ($p_0$) is that any point $x$ has at most $p_0^{-1}$ neighbors. Therefore, any ball $B(x, R)$ has at most $C^R$ vertices inside. By (3.13), any point $y \in B(x, R)$ has measure at most $p_0^{-R} \mu(x)$, whence (3.12) follows.



PROPOSITION 3.2
Assume that hypothesis $(p_0)$ holds on $(\Gamma, \mu)$. Let a function $v$ be nonnegative in $\bar{A}$ and superharmonic in $A$. Then, for all points $x, y \in A$ such that $x \sim y$, we have $v(x) \simeq v(y)$.

Proof
Indeed, the superharmonicity of $v$ implies
$$v(x) \ge \sum_{z \sim x} P(x, z) v(z) \ge P(x, y) v(y),$$
whence $v(x) \ge p_0 v(y)$ by $(p_0)$. In the same way, $v(y) \ge p_0 v(x)$, whence the claim follows.

4. Outline of the proof and its consequences
The proof of Theorem 2.1 consists of many steps. Here we describe the logical order of these steps. The rest of the paper is arranged so that each section treats a certain topic corresponding to one or more steps in the proof of Theorem 2.1. Apart from conditions $(V)$, $(G)$, $(UE)$, and $(LE)$ described in Section 2, we introduce here some more lettered conditions that are widely used in the proof. We say that the Faber-Krahn inequality holds on $(\Gamma, \mu)$ if, for some positive exponent $\nu$,
$$\lambda_1(A) \ge c \mu(A)^{-1/\nu} \tag{FK}$$
for all nonempty finite sets $A \subset \Gamma$. In particular, $(FK)$ holds in $\mathbb{Z}^D$ with $\nu = D/2$. If $\Gamma$ is infinite and connected and if $\mu$ is the standard weight on $\Gamma$, then $(FK)$ automatically holds with $\nu = 1/2$ (see [9, Prop. 2.5]). We are interested in $(FK)$ with $\nu = \alpha/\beta$, where $\alpha$ and $\beta$ are the parameters from $(UE)$ and $(LE)$, in which case we have $\nu > 1$. An easy consequence of $(UE)$ is the diagonal upper estimate
$$p_n(x, x) \le C n^{-\alpha/\beta} \tag{DUE}$$
for all $x \in \Gamma$ and $n \ge 1$. Consider the following estimates for the mean exit time and the exit probability:
$$E(x, R) \simeq R^{\beta} \tag{E}$$
for all $x \in \Gamma$, $R \ge 1$, and
$$\Psi_n(x, R) \le C \exp\left(-\left(\frac{R^{\beta}}{Cn}\right)^{1/(\beta-1)}\right) \tag{$\Psi$}$$
for all $x \in \Gamma$, $R > 0$, and $n \ge 1$. For example, $(E)$ and $(\Psi)$ hold in $\mathbb{Z}^D$ with $\beta = 2$.



Part $(V) + (G) \Longrightarrow (UE)$ of Theorem 2.1 is proved by the following chain of implications:

$(V) + (G) \Longrightarrow (FK)$ (Proposition 5.5), and $(FK) \Longrightarrow (DUE)$ (Proposition 5.1);
$(V) + (G) \Longrightarrow (E)$ (Proposition 6.3), and $(E) \Longrightarrow (\Psi)$ (Proposition 7.1);
$(DUE) + (\Psi) \Longrightarrow (UE)$ (Proposition 8.1).

The relations among exponents $\alpha$, $\beta$, $\gamma$, and $\nu$ involved in all conditions are as follows:
$$\alpha - \beta = \gamma \quad \text{and} \quad \alpha/\beta = \nu.$$
Given $(DUE)$ and $(\Psi)$, one easily obtains full upper bound $(UE)$ using the approach of Barlow and Bass [6] (see Section 8). The method of obtaining Faber-Krahn inequality $(FK)$ from $(V)$ and $(G)$ is based on ideas of G. Carron [14]. The implication $(FK) \Longrightarrow (DUE)$ is a discrete modification of the approach of A. Grigor'yan [32]. The implication $(V) + (G) \Longrightarrow (E)$ was originally proved by Telcs [59], and here we give a simpler proof of it.

The crucial part of the proof of upper estimate $(UE)$ is the implication $(E) \Longrightarrow (\Psi)$. The following nearly Gaussian estimate is always true, without assuming $(E)$ or anything else:
$$\Psi_n(x, R) \le C \frac{V(x, R)}{\mu(x)} \exp\left(-\frac{R^2}{Cn}\right) \tag{4.1}$$
(see [58] and [33, p. 355]). However, (4.1) is not good enough for us even if we neglect the factor $V(x, R)/\mu(x)$ in front of the exponential. Indeed, the range of $n$ for which we apply $(\Psi)$ is $n > R$ (see the proof of Proposition 8.1). Assuming $\beta > 2$, we have in this range
$$\left(\frac{R^{\beta}}{n}\right)^{1/(\beta-1)} > \frac{R^2}{n},$$
so that $(\Psi)$ is stronger than (4.1). We provide here an entirely new argument for $(E) \Longrightarrow (\Psi)$, which is based on an investigation of solutions of the equation $\Delta v = \lambda v$. The function $v$ can be estimated by comparing it to the solution $u$ of $\Delta u = -1$ (and the latter is related to the mean exit time). On the other hand, the function $(1 + \lambda)^n v(x)$ satisfies the discrete heat equation and hence can be compared to $\Psi_n^A(x)$ by using the parabolic comparison principle (see Section 7 for details). Another proof of $(E) \Longrightarrow (\Psi)$ can be obtained by using the probabilistic method of Barlow and Bass [5], [6], [7].



Before we consider the proof of lower bound $(LE)$, let us introduce the following conditions. The near-diagonal lower estimate is
$$p_n(x, y) + p_{n+1}(x, y) \ge c n^{-\alpha/\beta} \quad \text{if } d(x, y) \le \delta n^{1/\beta} \tag{NLE}$$
for some positive constant $\delta$. Obviously, $(NLE)$ is equivalent to $(LE)$ in the range $d(x, y) \le \delta n^{1/\beta}$. As an intermediate step, we use the following diagonal lower estimate for the killed random walk:
$$p_{2n}^{B(x,R)}(x, x) \ge c n^{-\alpha/\beta} \quad \text{if } n \le \varepsilon R^{\beta} \tag{DLE}$$
for some positive constant $\varepsilon$. We say that the Harnack inequality holds on $(\Gamma, \mu)$ if, for any ball $B(x, 2R) \subset \Gamma$ and for any nonnegative function $u$ in $\overline{B(x, 2R)}$ which is harmonic in $B(x, 2R)$,
$$\max_{B(x,R)} u \le H \min_{B(x,R)} u \tag{H}$$
for some constant $H \ge 1$. The Harnack inequality reflects a certain homogeneity of the graph. For example, it holds for $\mathbb{Z}^D$ with the standard weight but fails on the connected sum of two copies of $\mathbb{Z}^D$ as well as on a binary tree.

The scheme of the proof of $(V) + (G) \Longrightarrow (LE)$ is shown in the diagram below. From the previous diagram, we already know that conditions $(FK)$ and $(E)$ follow from $(V) + (G)$, as well as the implications $(FK) \Longrightarrow (DUE)$ and $(E) \Longrightarrow (\Psi)$:

$(V) + (G) \Longrightarrow (FK)$ and $(E)$ (Propositions 5.5, 6.3);
$(G) \Longrightarrow (H)$ (Proposition 10.1);
$(FK) \Longrightarrow (DUE)$ (Proposition 5.1);
$(E) \Longrightarrow (\Psi)$ (Proposition 7.1);
$(\Psi) + (V) \Longrightarrow (DLE)$ (Proposition 9.1);
$(DUE) \Longrightarrow$ [deriv] (Proposition 12.3);
$(H) \Longrightarrow$ [osc] (Proposition 11.2);
[deriv] $+ (DLE) + (E) +$ [osc] $\Longrightarrow (NLE)$ (Proposition 13.1);
$(NLE) + (V) \Longrightarrow (LE)$ (Proposition 13.2).

The central point in the diagram is Proposition 13.1, where $(NLE)$ is obtained from $(DUE)$, $(DLE)$, $(E)$, and $(H)$. The proof goes through the intermediate steps that are denoted here by [osc] and [deriv]. The former refers to oscillation inequality (11.3) obtained from $(H)$ in Propositions 11.1 and 11.2, and the latter refers to upper estimate (12.5) for $|p_{n+2} - p_n|$ obtained from $(DUE)$ in Proposition 12.3. The idea of obtaining $(NLE)$ by means of an elliptic Harnack inequality seems to have appeared independently in papers by P. Auscher [2], [3], Barlow and Bass [6], [7], [8], and W. Hebisch and L. Saloff-Coste [38]. Basically, one views the heat equation for the heat kernel as an elliptic equation
$$\Delta(p_n + p_{n+1}) = f, \quad \text{where } f = p_{n+2} - p_n.$$

The elliptic Harnack inequality and the upper bound for E(x, r ) allow one to estimate the oscillation of pn + pn+1 via f . (In the continuous setting, the latter argument is classical and is due to J. Moser [49].) On the other hand, the on-diagonal upper bound for pn implies a suitable estimate for the discrete time derivative pn+2 − pn . The fact that (DU E) implies a certain estimate of the time derivative of the heat kernel is well known. In the context of manifolds it goes back to S. Cheng, Li, and Yau [17] and E. Davies [26], [27] (see also [34]); in the discrete setting it follows from the results of E. Carlen, S. Kusuoka, and D. Stroock [13] and T. Coulhon and Saloff-Coste [23]; and in the setting of fractals it is proved by Barlow and Bass [7]. Having an upper bound for the oscillation of pn + pn+1 and the on-diagonal lower bound for pn + pn+1 , one obtains (N L E). The final step in the proof—the implication (N L E) + (V ) =⇒ (L E)—is done by using the classical chaining argument of Moser [50] and D. Aronson [1]. The method of obtaining (DL E) from (9) and (V ) used in Proposition 9.1 is well known. Its various modifications can be found in [6], [11], [21], [24], [48], [56], and possibly in other places. The claim that Green kernel estimate (G) implies elliptic Harnack inequality (H ) would not surprise experts. In the context of the uniformly elliptic operators in R D , this was first observed by E. Landis [46, p. 145–146] and then was elaborated by N. Krylov and M. Safonov [43] and E. Fabes and Stroock [29]. However, this claim becomes rather nontrivial for arbitrary graphs (and manifolds) because of topological difficulties. We provide here a new, simple, and general proof of the implication (G) =⇒ (H ), which is based on the potential theoretic approach of A. Boukricha [12]. Finally, the converse implication (U E) + (L E) =⇒ (V ) + (G) is quite straightforward and is proved in Proposition 15.1. 
As a consequence of the above diagrams, we see that the following equivalence takes place:
$$(FK) + (V) + (E) + (H) \iff (UE) + (LE).$$
It is possible to show that this equivalence is also true for recurrent graphs. Furthermore, Faber-Krahn inequality $(FK)$ turns out to follow from $(V) + (E) + (H)$, so that
$$(V) + (E) + (H) \iff (UE) + (LE). \tag{4.2}$$
Condition $(H)$ ensures here a necessary homogeneity of the graph, whereas $(V)$ and $(E)$ provide the exponents $\alpha$ and $\beta$, respectively. Another consequence of the proof is that
$$(V) + (UE) + (H) \iff (UE) + (LE) \tag{4.3}$$
(see Remark 15.1). There are a number of conditions given in terms of capacities, eigenvalues, and so on, which can replace $(E)$ or $(UE)$ in (4.2) and (4.3), respectively. In the presence of $(V)$ and $(H)$, the purpose of the other condition is to recover the exponent $\beta$ in $(UE)$ and $(LE)$. Note that if $\beta = 2$, then $(UE)$ in (4.3) can be replaced by $(DUE)$ (cf. [38]). The complete proofs of (4.2), (4.3), and other related statements will be given elsewhere.

5. The Faber-Krahn inequality and on-diagonal upper bounds
Recall that a Faber-Krahn inequality holds on $(\Gamma, \mu)$ if there are constants $c > 0$ and $\nu > 0$ such that, for all nonempty finite sets $A \subset \Gamma$,
$$\lambda_1(A) \ge c \mu(A)^{-1/\nu}. \tag{FK}$$

We discuss here relationships between eigenvalue estimates like $(FK)$ and estimates of the Green kernel, heat kernel, and volume growth. The outcome is the following implications:
$$(V) + (G) \Longrightarrow (FK) \Longrightarrow (DUE),$$
which are contained in Propositions 5.5 and 5.1, respectively, and which constitute a part of the proof of Theorem 2.1.

PROPOSITION 5.1
Let $(\Gamma, \mu)$ satisfy $(p_0)$, and let $\nu$ be a positive number. Then the following conditions are equivalent:
(a) Faber-Krahn inequality $(FK)$;
(b) the on-diagonal heat kernel upper bound, for all $x \in \Gamma$ and $n \ge 1$,
$$p_n(x, x) \le C n^{-\nu}; \tag{DUE}$$
(c) the estimate of the level sets of the Green kernel, for all $x \in \Gamma$ and $t > 0$,
$$\mu\{y : g(x, y) > t\} \le C t^{-\nu/(\nu-1)}, \tag{5.1}$$
provided $\nu > 1$.

The analogue of Proposition 5.1 for manifolds was proved by Carron [14]. The equivalence (a) $\iff$ (b) was also proved in [32] for heat kernels on manifolds, and in [20, Prop. V.1] for random walks satisfying in addition the condition $\inf_x P(x, x) > 0$. We provide a detailed proof only for the implications (a) $\Longrightarrow$ (b) and (c) $\Longrightarrow$ (a), which we use in this paper. The implication (b) $\Longrightarrow$ (c) can be proved in the following way. By a theorem of N. Varopoulos [63], $(DUE)$ implies a Sobolev inequality. Then one applies an argument of [14, Prop. 1.14] (adapted to the discrete setting) to show that (5.1) follows from the Sobolev inequality.

Note that our proof of (a) $\Longrightarrow$ (b) goes through for any $\nu > 0$. If $\nu > 1$, then one could apply the approach of [14] using a Sobolev inequality as an intermediate step between (a) and (b). In general, we use instead a Nash-type inequality that is obtained in the following lemma.

LEMMA 5.2
Let $(\Gamma, \mu)$ be a weighted graph (which is not necessarily connected). Assume that, for any nonempty finite set $A \subset \Gamma$,
$$\lambda_1(A) \ge \Lambda(\mu(A)), \tag{5.2}$$
where $\Lambda(\cdot)$ is a nonnegative nonincreasing function on $(0, \infty)$. Let $f(x)$ be a nonnegative function on $\Gamma$ with finite support. Denote
$$\sum_{x \in \Gamma} f(x) \mu(x) = a \quad \text{and} \quad \sum_{x \in \Gamma} f^2(x) \mu(x) = b.$$
Then, for any $s > 0$,
$$\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy} \ge (b - 2sa)\, \Lambda(a/s). \tag{5.3}$$

Proof
If $b - 2sa < 0$, then (5.3) trivially holds. So, we can assume in the sequel that
$$s \le \frac{b}{2a}. \tag{5.4}$$
Since $b \le a \max f$, (5.4) implies $s < \max f$ and, therefore, the set
$$A_s = \{x \in \Gamma : f(x) > s\}$$
is nonempty (see Figure 2).

[Figure 2. Sets $A$ and $A_s = \{f > s\}$]

Consider the function $h = (f - s)_+$. This function belongs to $c_0(A_s)$, whence we obtain, by variational property (3.3) of eigenvalues,
$$\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} h)^2 \mu_{xy} \ge \lambda_1(A_s) \sum_{x \in \Gamma} h^2(x) \mu(x). \tag{5.5}$$
Let us estimate all terms in (5.5) via $f$. We start with the obvious inequality
$$f^2 \le (f - s)_+^2 + 2sf = h^2 + 2sf,$$
which holds for any $s \ge 0$. It implies $h^2 \ge f^2 - 2sf$, whence
$$\sum_{x \in \Gamma} h^2(x) \mu(x) \ge b - 2sa. \tag{5.6}$$
The definition of $A_s$ implies $\mu(A_s) \le a/s$, whence, by (5.2),
$$\lambda_1(A_s) \ge \Lambda(\mu(A_s)) \ge \Lambda(a/s). \tag{5.7}$$
Clearly, we also have
$$\sum_{x \sim y} (\nabla_{xy} h)^2 \mu_{xy} \le \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy}.$$
Combining this with (5.7), (5.6), and (5.5), we obtain (5.3).

We apply Lemma 5.2 for the function $\Lambda(v) = c v^{-1/\nu}$. Choosing $s = b/(4a)$ in (5.3), we obtain
$$\frac{1}{2} \sum_{x \sim y} (\nabla_{xy} f)^2 \mu_{xy} \ge c\, a^{-2/\nu} b^{1+1/\nu}. \tag{5.8}$$
This is a discrete version of the Nash inequality (cf. [51], [13]).



Proof of (a) $\Longrightarrow$ (b) in Proposition 5.1
Step 1. Let $f$ be a nonnegative function on $\Gamma$ with finite support. Denote, for simplicity,
$$b = \sum_{x \in \Gamma} f^2(x) \mu(x) \quad \text{and} \quad b' = \sum_{x \in \Gamma} [Pf(x)]^2 \mu(x),$$
where $P$ is the Markov operator of $(\Gamma, \mu)$. Then we have
$$b - b' = (f, f)_{L^2(\Gamma,\mu)} - (Pf, Pf)_{L^2(\Gamma,\mu)} = (f, (I - P^2) f)_{L^2(\Gamma,\mu)}.$$
Clearly, $Q := P^2$ is also a Markov operator on $\Gamma$ reversible with respect to $\mu$, and it is associated with another structure of a weighted graph on the set $\Gamma$. Denote this weighted graph by $(\Gamma^*, \mu^*)$. As a set, $\Gamma^*$ coincides with $\Gamma$, and the measures $\mu$ and $\mu^*$ on vertices are the same. On the other hand, points $x, y$ are connected by an edge on $\Gamma^*$ if there is a path of length 2 from $x$ to $y$ in $\Gamma$, and the weight $\mu^*_{xy}$ on edges of $\Gamma^*$ is defined by
$$\mu^*_{xy} = Q(x, y) \mu(x).$$
Denote by $\Delta^*$ the Laplace operator of $(\Gamma^*, \mu^*)$. Then $\Delta^* = P^2 - I$ and, by Green formula (3.2),
$$b - b' = -\sum_{x \in \Gamma} f(x)\, \Delta^* f(x) \mu(x) = \frac{1}{2} \sum_{x,y \in \Gamma} (\nabla_{xy} f)^2 \mu^*_{xy}. \tag{5.9}$$
Step 2. If $A$ is a nonempty finite subset of $\Gamma$, then [22, Lem. 4.3] says that$^{*}$
$$\lambda_1^*(A) \ge \lambda_1(\bar{A}), \tag{5.10}$$
where $\lambda_1^*(A)$ is the first eigenvalue of $-\Delta^*_A$. By Faber-Krahn inequality $(FK)$ for the graph $(\Gamma, \mu)$, we obtain
$$\lambda_1^*(A) \ge c \mu(\bar{A})^{-1/\nu}. \tag{5.11}$$
Since $(p_0)$ and Proposition 3.1 imply
$$\mu(\bar{A}) \le \sum_{x \in A} V(x, 2) \le C \sum_{x \in A} \mu(x) = C \mu(A) = C \mu^*(A),$$
(5.11) yields $(FK)$ for the graph $(\Gamma^*, \mu^*)$.

$^{*}$ The proof of (5.10) is based on variational property (3.3) and on the fact that all eigenvalues of $-\Delta_A$ belong to the interval $[\lambda_1(A), 2 - \lambda_1(A)]$.

Remark 5.1
The only place where $(p_0)$ is used in the proof of (a) $\Longrightarrow$ (b) is to ensure that $\mu(\bar{A}) \le C \mu(A)$. If this inequality holds for another reason, then the rest of the proof goes in the same way.



Step 3. For some fixed $y \in \Gamma$, denote $f_n(x) = p_n(x, y)$ and
$$b_n = \sum_{x \in \Gamma} f_n^2(x) \mu(x) = p_{2n}(y, y).$$
Then $f_{n+1} = P f_n$ and we obtain, by (5.9),
$$b_n - b_{n+1} = \frac{1}{2} \sum_{x,z \in \Gamma} (\nabla_{xz} f_n)^2 \mu^*_{xz}.$$
The graph $(\Gamma^*, \mu^*)$ satisfies $(FK)$ so that Lemma 5.2 can be applied. Since
$$\sum_{x \in \Gamma} f_n(x) \mu(x) = \sum_{x \in \Gamma} P_n(y, x) = 1,$$
(5.8) yields
$$\frac{1}{2} \sum_{x,z \in \Gamma} (\nabla_{xz} f_n)^2 \mu^*_{xz} \ge c\, b_n^{1+1/\nu},$$
whence
$$b_n - b_{n+1} \ge c\, b_n^{1+1/\nu}. \tag{5.12}$$
In particular, we see that $b_n > b_{n+1}$. Next we apply the elementary inequality
$$\nu(x - y) \ge \frac{x^{\nu} - y^{\nu}}{x^{\nu-1} + y^{\nu-1}}, \tag{5.13}$$
which is true for all $x > y > 0$ and $\nu > 0$. Taking $x = b_{n+1}^{-1/\nu}$ and $y = b_n^{-1/\nu}$, we obtain, from (5.13) and (5.12),
$$\nu\bigl(b_{n+1}^{-1/\nu} - b_n^{-1/\nu}\bigr) \ge \frac{b_{n+1}^{-1} - b_n^{-1}}{b_{n+1}^{-(\nu-1)/\nu} + b_n^{-(\nu-1)/\nu}} = \frac{b_n - b_{n+1}}{b_{n+1} b_n^{1/\nu} + b_n b_{n+1}^{1/\nu}} \ge \frac{c\, b_n^{1+1/\nu}}{2 b_n^{1+1/\nu}} = \frac{c}{2},$$
whence
$$b_{n+1}^{-1/\nu} - b_n^{-1/\nu} \ge \frac{c}{2\nu} = \text{const}.$$
Summing up this inequality in $n$, we conclude that $b_n^{-1/\nu} \ge cn$ and $b_n \le C n^{-\nu}$. Since $b_n = p_{2n}(y, y)$, we have proved that, for all $y \in \Gamma$ and $n \ge 1$,
$$p_{2n}(y, y) \le C n^{-\nu}, \tag{5.14}$$
which is $(DUE)$ for all even times.


Step 4. By the semigroup identity, we have, for any $0 < k < m$,
$$p_m(x, y) = \sum_{z \in \Gamma} p_{m-k}(x, z)\, p_k(z, y) \mu(z). \tag{5.15}$$
In particular, if $m = 2n$, $k = n$, and $y = x$, then
$$p_{2n}(x, x) = \sum_{z \in \Gamma} p_n^2(x, z) \mu(z). \tag{5.16}$$
On the other hand, (5.15), the Cauchy-Schwarz inequality, and (5.16) imply
$$p_{2n}(x, y) = \sum_{z \in \Gamma} p_n(x, z)\, p_n(z, y) \mu(z) \le \Bigl[\sum_{z \in \Gamma} p_n^2(x, z) \mu(z)\Bigr]^{1/2} \Bigl[\sum_{z \in \Gamma} p_n^2(y, z) \mu(z)\Bigr]^{1/2},$$
whence
$$p_{2n}(x, y) \le p_{2n}(x, x)^{1/2}\, p_{2n}(y, y)^{1/2}. \tag{5.17}$$
Together with (5.14), this yields $p_{2n}(x, y) \le C n^{-\nu}$ for all $x, y \in \Gamma$. This implies $(DUE)$ also for odd times if we observe that, by (5.15) and (2.2),
$$p_{2n+1}(x, y) = \sum_{z \in \Gamma} p_{2n}(x, z) P(y, z) \le \max_{z \in \Gamma} p_{2n}(x, z). \tag{5.18}$$
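Inequalities (5.17) and (5.18) use only reversibility, so they can be sanity-checked on any small weighted graph. The following sketch is an illustration, not part of the proof: the 4-vertex graph `W` and the function name `heat_kernel` are hypothetical, and the graph is finite, whereas the paper works with infinite graphs. The heat kernel is computed as $p_n(x, y) = P_n(x, y)/\mu(y)$ with $\mu(x) = \sum_y \mu_{xy}$.

```python
def heat_kernel(weights, n):
    """Return p_n(x, y) = P_n(x, y) / mu(y) for a finite weighted graph.

    `weights` is a symmetric matrix of edge weights mu_xy (0 means no edge).
    """
    size = len(weights)
    mu = [sum(row) for row in weights]                      # mu(x) = sum_y mu_xy
    P = [[weights[x][y] / mu[x] for y in range(size)] for x in range(size)]
    # Start from the identity matrix and multiply by P n times.
    Pn = [[1.0 if x == y else 0.0 for y in range(size)] for x in range(size)]
    for _ in range(n):
        Pn = [[sum(Pn[x][z] * P[z][y] for z in range(size)) for y in range(size)]
              for x in range(size)]
    return [[Pn[x][y] / mu[y] for y in range(size)] for x in range(size)]


# A hypothetical 4-vertex weighted graph: the path 0-1-2-3 plus the edge 0-2.
W = [[0, 1, 2, 0],
     [1, 0, 3, 0],
     [2, 3, 0, 1],
     [0, 0, 1, 0]]

p_even = heat_kernel(W, 4)   # p_{2n} with n = 2
p_odd = heat_kernel(W, 5)    # p_{2n+1}
ok_517 = all(p_even[x][y] ** 2 <= p_even[x][x] * p_even[y][y] + 1e-12
             for x in range(4) for y in range(4))
ok_518 = all(p_odd[x][y] <= max(p_even[x]) + 1e-12
             for x in range(4) for y in range(4))
print(ok_517, ok_518)
```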

Proof of (c) $\Longrightarrow$ (a) in Proposition 5.1
Let $A$ be a nonempty finite subset of $\Gamma$, and let $f \in c_0(A)$ be the first eigenfunction of $-\Delta_A$. We may assume that $f \ge 0$. Let us normalize $f$ so that $\max f = 1$, and let $x_0 \in A$ be the maximum point of $f$. The equation $-\Delta_A f = \lambda_1(A) f$ implies, by (3.7),
$$f(x) = \lambda_1(A) \sum_{y \in A} G_A(x, y) f(y),$$
whence, for $x = x_0$,
$$1 = \lambda_1(A) \sum_{y \in A} G_A(x_0, y) f(y) \le \lambda_1(A) \sum_{y \in A} G_A(x_0, y)$$
and
$$\lambda_1(A) \ge \Bigl(\max_{x \in A} \sum_{y \in A} G_A(x, y)\Bigr)^{-1}. \tag{5.19}$$



On the other hand, for any $x \in A$,
$$\sum_{y \in A} G_A(x, y) = \sum_{y \in A} g_A(x, y) \mu(y) = \int_0^{\infty} \mu\{g_A(x, \cdot) > t\}\, dt.$$
Fix some $t_0 > 0$, and estimate the integral above using (5.1), $g_A \le g$, and the fact that $\mu\{g_A(x, \cdot) > t\} \le \mu(A)$. Then we obtain
$$\sum_{y \in A} G_A(x, y) \le \int_0^{t_0} \mu(A)\, dt + \int_{t_0}^{\infty} C t^{-\nu/(\nu-1)}\, dt = \mu(A) t_0 + C t_0^{-1/(\nu-1)}.$$
Let us choose $t_0 \simeq \mu(A)^{-(\nu-1)/\nu}$ to equate the two terms on the right-hand side, whence
$$\sum_{y \in A} G_A(x, y) \le C \mu(A)^{1/\nu}. \tag{5.20}$$
Finally, (5.20) and (5.19) imply $(FK)$.

The second result of this section is preceded by two lemmas. We say that a weighted graph $(\Gamma, \mu)$ satisfies the doubling volume condition if
$$V(x, 2R) \le C V(x, R), \quad \forall x \in \Gamma,\ R > 0. \tag{D}$$

Clearly, $(D)$ is a weaker assumption than $(V)$.

LEMMA 5.3
If $(\Gamma, \mu)$ satisfies $(D)$, then, for all $x \in \Gamma$ and $R > 0$,
$$\lambda(x, R) \le \frac{C}{R^2}. \tag{5.21}$$

Proof
Let us apply variational property (3.3) with the test function
$$f(y) = (R - d(x, y))_+ \in c_0(B(x, R)).$$
Since $|\nabla_{yz} f| \le 1$, (3.3) and $(D)$ imply
$$\lambda(x, R) \le \frac{(1/2) \sum_{y \sim z} (\nabla_{yz} f)^2 \mu_{yz}}{\sum_y f^2(y) \mu(y)} \le \frac{C'}{R^2} \frac{V(x, R)}{V(x, R/2)} \le \frac{C}{R^2},$$
which was to be proved.


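The next lemma was proved in [59], but we give here a shorter proof.

LEMMA 5.4
Let $(\Gamma, \mu)$ satisfy $(p_0)$. If $(V)$ and $(G)$ hold, with some positive parameters $\alpha$ and $\gamma$, then $\alpha - \gamma \ge 2$.

Proof
By (5.19), and since $B(x, R) \subset B(y, 2R)$ for any $y \in B(x, R)$, we have
$$\lambda(x, R)^{-1} \le \max_{y \in B(x,R)} \sum_{z \in B(y,2R)} G(y, z). \tag{5.22}$$
By $(G)$ and Proposition 3.2, $G(y, y)$ is uniformly bounded from above. Using $(G)$ to estimate $G(y, z)$ for $y \ne z$ and $(V)$, we obtain
$$\sum_{z \in B(y,2R)} G(y, z) = G(y, y) + \sum_{i=-1}^{\lceil \log_2 R \rceil}\ \sum_{z \in B(y, 2^{-i}R) \setminus B(y, 2^{-i-1}R)} g(y, z) \mu(z)$$
$$\le C + C \sum_{i=-1}^{\lceil \log_2 R \rceil} \bigl(2^{-i} R\bigr)^{-\gamma} V(y, 2^{-i} R) \le C \Bigl(1 + \sum_{i=-1}^{\lceil \log_2 R \rceil} \bigl(2^{-i} R\bigr)^{\alpha - \gamma}\Bigr). \tag{5.23}$$
A straightforward computation of sum (5.23) yields, for large $R$,
$$\sum_{z \in B(y,2R)} G(y, z) \le C \begin{cases} R^{\alpha-\gamma}, & \alpha > \gamma, \\ \log_2 R, & \alpha = \gamma, \\ 1, & \alpha < \gamma. \end{cases} \tag{5.24}$$
Combining (5.22) and (5.24), we obtain
$$\lambda(x, R) \ge c \begin{cases} R^{-(\alpha-\gamma)}, & \alpha > \gamma, \\ (\log_2 R)^{-1}, & \alpha = \gamma, \\ 1, & \alpha < \gamma. \end{cases} \tag{5.25}$$
By Lemma 5.3, we have (5.21), which together with (5.25) implies $\alpha - \gamma \ge 2$.

PROPOSITION 5.5
Let $(\Gamma, \mu)$ satisfy $(p_0)$. If $(V)$ and $(G)$ hold with some positive parameters $\alpha$ and $\gamma$, then Faber-Krahn inequality $(FK)$ holds with the parameter $\nu = \alpha/(\alpha - \gamma)$.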



Proof
Note that, by Lemma 5.4, we have $\alpha > \gamma$ so that $\nu$ is positive and, moreover, $\nu > 1$. Let us verify that
$$\mu\{y : g(x, y) > t\} \le \text{const}\ t^{-\alpha/\gamma}. \tag{5.26}$$
Then (5.1) would follow with $\nu = \alpha/(\alpha - \gamma)$, which implies $(FK)$, by Proposition 5.1. The upper bound in $(G)$ and $(p_0)$ implies that, for all $x, y$ (including the case $x = y$; see Proposition 3.2),
$$g(x, y) \le C \min(1, d(x, y)^{-\gamma}). \tag{5.27}$$
If $t \ge C$, then the set $\{y : g(x, y) > t\}$ is empty, and (5.26) is trivially true. Assume now that $t < C$. Then (5.27) implies
$$\mu\{y : g(x, y) > t\} \le \mu\{y : d(x, y) < (t/C)^{-1/\gamma}\} = V(x, (t/C)^{-1/\gamma}).$$
Since $R := (t/C)^{-1/\gamma} \ge 1$, we can apply here the upper bound from $(V)$ and obtain (5.26).

6. The mean exit time and the Green kernel
The purpose of this section is to verify part $(V) + (G) \Longrightarrow (E)$ of the proof of Theorem 2.1. Recall that $(E)$ stands for the condition
$$E(x, R) \simeq R^{\beta}, \quad \forall x \in \Gamma,\ R \ge 1. \tag{E}$$
Alongside the mean exit time $E_A(x)$, consider the maximal mean exit time $\bar{E}_A$ defined by
$$\bar{E}_A := \sup_y E_A(y). \tag{6.1}$$
If $A = B(x, R)$, then we write $\bar{E}(x, R) := \bar{E}_{B(x,R)}$. We also use the following hypothesis:
$$\bar{E}(x, R) \le C E(x, R), \quad \forall x \in \Gamma,\ R > 0. \tag{$\bar{E}$}$$

PROPOSITION 6.1
The upper bound in $(E)$ implies, for all $x \in \Gamma$ and $R \ge 1$,
$$\bar{E}(x, R) \le C R^{\beta}. \tag{6.2}$$
The lower bound in $(E)$ implies
$$\bar{E}(x, R) \ge c R^{\beta}. \tag{6.3}$$
Consequently, $(E)$ implies $(\bar{E})$ and
$$\bar{E}(x, R) \simeq R^{\beta}. \tag{6.4}$$


Proof
To show (6.2), let us observe that, for any point $y \in B(x, R)$, we have $B(x, R) \subset B(y, 2R)$, whence
$$\bar{E}(x, R) = \sup_{y \in B(x,R)} E_{B(x,R)}(y) \le \sup_{y \in B(x,R)} E_{B(y,2R)}(y) = \sup_{y \in B(x,R)} E(y, 2R) \le C R^{\beta}.$$
Lower bound (6.3) is obvious by $E \le \bar{E}$. Finally, $(\bar{E})$ follows from $(E)$ and (6.4) if $R \ge 1$, and $(\bar{E})$ holds trivially if $R < 1$.

PROPOSITION 6.2
For any nonempty finite set $A \subset \Gamma$, we have
$$\lambda_1(A) \ge (\bar{E}_A)^{-1}. \tag{6.5}$$

Proof
Indeed, this is a combination of (5.19) and the definition of $\bar{E}$ (see (3.8) and (6.1)).

The next statement was proved in [59].

PROPOSITION 6.3
Let $(\Gamma, \mu)$ satisfy $(p_0)$. If $(V)$ and $(G)$ hold, with some positive parameters $\alpha$ and $\gamma$, then $(E)$ holds as well with $\beta = \alpha - \gamma$.

Proof
Denote $A = B(x, R)$. Applying (3.8), the obvious inequality $g_A \le g$, as well as $(V)$ and $(G)$, we obtain (cf. (5.23) and (5.24))
$$E(x, R) = \sum_{y \in A} g_A(x, y) \mu(y) \le \sum_{y \in A} g(x, y) \mu(y) \le C R^{\alpha - \gamma}.$$
Observe that, by Lemma 5.4, we already know that $\alpha > \gamma$. For the lower bound of $E(x, R)$, let us prove that
$$g_A(x, y) \ge c\, d(x, y)^{-\gamma}, \quad \forall y \in B(x, \varepsilon R) \setminus \{x\}, \tag{6.6}$$
provided $\varepsilon > 0$ is small enough. Consider the function $u(y) = g(x, y) - g_A(x, y)$, which is harmonic in $A$. By the maximum principle, its maximum is attained at the boundary of $A$, whence, by $(G)$,
$$0 \le u(y) \le C R^{-\gamma}.$$
Therefore,
$$g_A(x, y) = g(x, y) - u(y) \ge c\, d(x, y)^{-\gamma} - C R^{-\gamma}. \tag{6.7}$$
If $R$ is large enough and if $d(x, y) \le \varepsilon R$ with a small enough $\varepsilon$, then the second term in (6.7) is absorbed by the first one, whence (6.6) follows. Summing up (6.6) over $y$, we obtain (cf. (5.23) and (5.24))
$$E(x, R) = \sum_{y \in A} g_A(x, y) \mu(y) \ge \sum_{y \in B(x, \varepsilon R) \setminus \{x\}} g_A(x, y) \mu(y) \ge c R^{\alpha - \gamma}.$$
If $R$ is not big enough, then the above argument does not work. However, in this case we argue as follows. If the random walk starts at $x$, then $T_{B(x,R)} \ge R$. Hence, we always have $E(x, R) = \mathbb{E}_x(T_{B(x,R)}) \ge R$, which yields the lower bound in $(E)$, provided $R \le \text{const}$.

Assuming that $(V)$ and $(E)$ hold, there are the following general relations between the exponents $\alpha$ and $\beta$: if the graph is transient, then $2 \le \beta \le \alpha$; and if it is recurrent, then $2 \le \beta \le \alpha + 1$ (see [59]; see also [60], [61] for various definitions of dimensions of graphs).

7. Sub-Gaussian term
The following statement is crucial for obtaining the off-diagonal upper bound of the heat kernel. It contains the part $(E) \Longrightarrow (\Psi)$ of the proof of Theorem 2.1.

PROPOSITION 7.1
Assume that the graph $(\Gamma, \mu)$ possesses property $(E)$. Then, for all $x \in \Gamma$, $R > 0$, and $n \ge 1$, we have
$$\Psi_n(x, R) \le C \exp\left(-\left(\frac{R^{\beta}}{Cn}\right)^{1/(\beta-1)}\right). \tag{$\Psi$}$$

We start with the following lemma.

LEMMA 7.2
Assume that hypothesis $(\bar{E})$ holds on $(\Gamma, \mu)$. Let $A = B(x_0, r)$ be an arbitrary ball on $\Gamma$, and let $v$ be a function on $\bar{A}$ such that $0 \le v \le 1$. Suppose that $v$ satisfies in $A$ the equation
$$\Delta v = \lambda v, \tag{7.1}$$
where $\lambda$ is a constant such that
$$\lambda \ge (\bar{E}_A)^{-1}. \tag{7.2}$$
Then
$$v(x_0) \le 1 - \varepsilon, \tag{7.3}$$
where $\varepsilon > 0$ depends on the constants in hypothesis $(\bar{E})$ (see Figure 3).

[Figure 3. The value of the function $v$ at the point $x_0$ does not exceed $1 - \varepsilon$]

Proof
Denote for simplicity $u(x) = E_A(x)$, and recall that $u \in c_0(A)$ and $\Delta u = -1$ in $A$ (cf. (3.9)). Also, denote
$$\lambda_0 := (\bar{E}_A)^{-1} = \frac{1}{\max u}.$$
Consider the function $w = 1 - (\lambda_0/2) u$. Then $1/2 \le w \le 1$ and, in $A$,
$$\Delta w = \frac{\lambda_0}{2} \le \lambda_0 w \le \lambda w.$$
Since $v \le 1$ and $w = 1$ outside $A$, the maximum principle for the operator $\Delta - \lambda$ implies that $v \le w$ in $A$. In particular,
$$v(x_0) \le w(x_0) = 1 - \frac{\lambda_0 u(x_0)}{2} \le 1 - \frac{u(x_0)}{2 \max u}.$$
Hypothesis $(\bar{E})$ yields
$$\frac{u(x_0)}{\max u} = \frac{E(x_0, r)}{\bar{E}(x_0, r)} \ge c,$$
whence (7.3) follows.

LEMMA 7.3
Assume that $(\Gamma, \mu)$ satisfies $(E)$. Let $A = B(x_0, R)$ be an arbitrary ball on $\Gamma$, and let $v$ be a function on $\bar{A}$ such that $0 \le v \le 1$. If $v$ satisfies, in $A$, equation (7.1) with a constant $\lambda$ such that
$$C R^{-\beta} \le \lambda < \bar{\lambda}, \tag{7.4}$$
then
$$v(x_0) \le \exp\bigl(-c \lambda^{1/\beta} R\bigr). \tag{7.5}$$
Here $\bar{\lambda}$ is an arbitrary constant, $C$ is some constant depending on condition $(E)$, and $c > 0$ is some constant depending on $\bar{\lambda}$ and on condition $(E)$.

[Figure 4. The points $x_i$ where $v(x)$ takes the maximum values]

Proof
Condition $(E)$ implies $(\bar{E})$ and $\bar{E}(x, R) \simeq R^{\beta}$ (see Proposition 6.1). Choose the constant $C$ in (7.4) so big that the lower bound in (7.4) implies $\lambda \ge \bar{E}(x, R)^{-1}$. Then, by Lemma 7.2, we obtain $v(x_0) \le 1 - \varepsilon$. If we have, in addition,
$$\lambda^{1/\beta} R \le \text{const}, \tag{7.6}$$
then (7.5) is trivially satisfied. In particular, if $R$ is in the bounded range, then (7.6) is true because $\lambda$ is bounded from above by (7.4). Hence, we may assume in the sequel that
$$R > C' \quad \text{and} \quad \lambda > C'' R^{-\beta}, \tag{7.7}$$
with large enough constants $C'$ and $C''$ (in particular, $C'' \gg C$). The point of the present lemma is that it improves the previous one for this range of $R$ and $\lambda$.

Choose a number $r$ from the equation $\lambda = C r^{-\beta}$, where $C$ is the same constant as in (7.4). The above argument shows that Lemma 7.2 applies in any ball of radius $r$. Let $x_i$, $i \ge 1$, be a point in the ball $B(x_0, (r+1)i)$ in which $v$ takes the maximum value in this ball, and denote $m_i = v(x_i)$ (see Figure 4). For $i = 0$, we set $m_0 = v(x_0)$. For each $i \ge 0$, consider the ball $A_i = B(x_i, r)$. Since
$$\bar{A}_i \subset B(x_i, r + 1) \subset B(x_0, (r+1)(i+1)),$$
we have
$$\max_{\bar{A}_i} v \le m_{i+1}.$$
Applying Lemma 7.2 to the function $v/m_{i+1}$ in the ball $A_i$, we obtain $m_i \le (1 - \varepsilon) m_{i+1}$. Iterating this inequality $k := \lfloor R/(r+1) \rfloor$ times and using $m_k \le 1$, we conclude
$$v(x_0) = m_0 \le (1 - \varepsilon)^k. \tag{7.8}$$
By conditions (7.7) and (7.4) and by the choice of $r$, we have
$$k \simeq \frac{R}{r} \simeq \lambda^{1/\beta} R,$$
so that (7.8) implies (7.5).

LEMMA 7.4
Assume that $(\Gamma, \mu)$ satisfies $(E)$. Let $A = B(x_0, R)$ be an arbitrary ball on $\Gamma$, and let $w_n(x)$ be a function in $\bar{A} \times \mathbb{N}$ such that $0 \le w \le 1$. Suppose that $w$ solves in $A \times \mathbb{N}$ the heat equation
$$w_{n+1} - w_n = \Delta w_n \tag{7.9}$$
with initial data $w_0 \equiv 0$ in $A$ (see Figure 5). Then, for all $n \ge 1$,
$$w_n(x_0) \le \exp\left(-c \left(\frac{R^{\beta}}{n}\right)^{1/(\beta-1)} + 1\right). \tag{7.10}$$

[Figure 5. The value of the function $w$ at the point $(x_0, n)$ is affected by the initial value $w = 0$ and by the boundary condition $w \le 1$]

Proof
First, consider two trivial cases. If $R^{\beta} \le Cn$, then (7.10) is true just by $w \le 1$, provided $c$ is small enough. Since $\Delta w(x)$ depends only on the immediate neighbors of $x$, one gets by induction that $w_k(x) = 0$ for all $x \in B(x_0, R - k)$. Therefore, if $R > n$, then $w_n(x_0) = 0$, and (7.10) is true again. Hence, we may assume in the sequel that, for a large enough $C$,
$$C n^{1/\beta} < R \le n. \tag{7.11}$$
Fix some $\lambda > 0$, and find a function $v(x)$ on $\bar{A}$ solving the boundary value problem
$$\begin{cases} \Delta v = \lambda v & \text{in } A, \\ v = 1 & \text{in } \bar{A} \setminus A. \end{cases}$$
The function $u_n(x) := (1 + \lambda)^n v(x)$ solves heat equation (7.9) and satisfies the following boundary conditions: $u_n(x) \ge 1$ for $x \in \bar{A} \setminus A$ and $u_0(x) \ge 0$ for $x \in A$. By the parabolic comparison principle, we have $w \le u$. Assume for a moment that $\lambda$ satisfies hypothesis (7.4) of Lemma 7.3. Then we estimate $v(x_0)$ by (7.5) and obtain
$$w_n(x_0) \le (1 + \lambda)^n v(x_0) \le \exp\bigl(\lambda n - c \lambda^{1/\beta} R\bigr).$$
Now, choose $\lambda$ from the condition $c \lambda^{1/\beta} R = 2 \lambda n$; that is,
$$\lambda = \left(\frac{cR}{2n}\right)^{\beta/(\beta-1)}. \tag{7.12}$$
As follows from (7.11), this particular $\lambda$ satisfies (7.4). Therefore, the above application of Lemma 7.3 is justified, and we obtain
$$w_n(x_0) \le \exp(-\lambda n) = \exp\left(-c' \left(\frac{R^{\beta}}{n}\right)^{1/(\beta-1)}\right),$$
finishing the proof.

Proof of Proposition 7.1
Denote $A = B(x_0, R)$. By (3.11), the function $w_n(x) := \Psi_n^A(x)$ satisfies all the hypotheses of Lemma 7.4. Hence, $(\Psi)$ follows from (7.10).

8. Off-diagonal upper bound of the heat kernel
Here we prove the following implication:
$$(FK) + (E) \Longrightarrow (UE), \tag{8.1}$$


which finishes the proof of the heat kernel upper bound in Theorem 2.1. Indeed, together with the implications $(V) + (G) \Longrightarrow (FK)$ (Proposition 5.5) and $(V) + (G) \Longrightarrow (E)$ (Proposition 6.3), (8.1) yields the part $(V) + (G) \Longrightarrow (UE)$ of Theorem 2.1.

PROPOSITION 8.1
On any graph $(\Gamma, \mu)$, we have
$$(DUE) + (\Psi) \Longrightarrow (UE). \tag{8.2}$$
In particular, if $(p_0)$ holds on $(\Gamma, \mu)$, then
$$(FK) + (E) \Longrightarrow (UE). \tag{8.3}$$

Proof
By Proposition 5.1, $(p_0)$ and $(FK)$ imply $(DUE)$. By Proposition 7.1, $(E)$ implies $(\Psi)$. Hence, implication (8.3) is a consequence of (8.2). To prove (8.2), let us fix some points $x, y \in \Gamma$ and denote $r = d(x, y)/2$. Since the balls $B(x, r)$ and $B(y, r)$ do not intersect, the semigroup identity (5.15) and the symmetry of the heat kernel imply, for any triple of nonnegative integers $k, m, n$ such that $k + m = n$,
$$p_n(x, y) \le \sum_{z \notin B(x,r)} p_m(x, z)\, p_k(z, y) \mu(z) + \sum_{z \notin B(y,r)} p_m(x, z)\, p_k(z, y) \mu(z)$$
$$\le \sup_z p_k(z, y) \sum_{z \notin B(x,r)} P_m(x, z) + \sup_z p_m(x, z) \sum_{z \notin B(y,r)} P_k(y, z)$$
$$= \sup_z p_k(y, z)\, \mathbb{P}_x(X_m \notin B(x, r)) + \sup_z p_m(x, z)\, \mathbb{P}_y(X_k \notin B(y, r)).$$
As follows from definition (3.10) of $\Psi$,
$$\mathbb{P}_x(X_m \notin B(x, r)) \le \Psi_m(x, r).$$
Hence, we obtain the following general inequality, which is true for all reversible random walks:
$$p_n(x, y) \le \sup_z p_k(y, z)\, \Psi_m(x, r) + \sup_z p_m(x, z)\, \Psi_k(y, r). \tag{8.4}$$



As follows from (5.17), diagonal upper bound $(DUE)$ implies, for all $x, y \in \Gamma$,
$$p_n(x, y) \le C n^{-\alpha/\beta}, \tag{8.5}$$
provided $n$ is even. Using inequality (5.18), we see that (8.5) also holds for odd $n$. Assuming $n \ge 2$, choosing $k \simeq m \simeq n/2$, and applying (8.5) and $(\Psi)$ to estimate the right-hand side of (8.4), we obtain $(UE)$. If $n = 1$, then $(UE)$ follows trivially from (8.5) and the fact that $p_n(x, y) = 0$ whenever $d(x, y) > n$.

9. On-diagonal lower bound
In this section we prove part $(\Psi) + (V) \Longrightarrow (DLE)$ of Theorem 2.1.

PROPOSITION 9.1
Assume that hypothesis $(\Psi)$ holds on $(\Gamma, \mu)$. For arbitrary $x \in \Gamma$ and $R > 0$, denote $A = B(x, R)$. Then the following on-diagonal lower bound is true:
$$p_{2n}^A(x, x) \ge \frac{c}{V(x, C n^{1/\beta})}, \tag{9.1}$$
provided $n \le \varepsilon R^{\beta}$, where $\varepsilon$ is a sufficiently small positive constant depending only on the constants from $(\Psi)$. If in addition $(V)$ holds, then
$$p_{2n}^A(x, x) \ge c n^{-\alpha/\beta}, \quad \forall n \le \varepsilon R^{\beta}. \tag{DLE}$$

Remark 9.1
Since $p_{2n} \ge p_{2n}^A$ for any $A = B(x, R)$, inequality $(DLE)$ implies $p_{2n}(x, x) \ge c n^{-\alpha/\beta}$ for all positive integers $n$.

Proof
Let us fix some $r \in (0, R)$ and denote $B = B(x, r)$. Since $p^B \le p^A$, it suffices to prove (9.1) for $p^B$ instead of $p^A$, for some $r < R$. Semigroup identity (5.15) for $p^B$ and the Cauchy-Schwarz inequality imply
$$p_{2n}^B(x, x) = \sum_{z \in B} p_n^B(x, z)^2 \mu(z) \ge \frac{1}{\mu(B)} \Bigl(\sum_{z \in B} p_n^B(x, z) \mu(z)\Bigr)^2. \tag{9.2}$$
Let us observe that
$$\sum_{z \in B} p_n^B(\cdot, z) \mu(z) + \Psi_n^B(\cdot) = 1. \tag{9.3}$$
Indeed, the first term in (9.3) is the probability that the random walk $X_k$ stays in $B$ up to the time $k = n$, whereas $\Psi_n^B$ is the probability of the opposite event.



By hypothesis $(\Psi)$, we have
$$\Psi_n^B(x) = \Psi_n(x, r) \le C \exp\left(-\left(\frac{r^{\beta}}{Cn}\right)^{1/(\beta-1)}\right). \tag{9.4}$$
Choosing $r = C n^{1/\beta}$ for large enough $C$ and assuming $n \le \varepsilon R^{\beta}$ for sufficiently small $\varepsilon > 0$ (the latter ensures $r < R$), we obtain, from (9.4), $\Psi_n(x, r) \le 1/2$, whence, by (9.3),
$$\sum_{z \in B} p_n^B(x, z) \mu(z) \ge \frac{1}{2}.$$
Therefore, (9.2) yields
$$p_{2n}^B(x, x) \ge \frac{1/4}{V(x, r)} = \frac{1/4}{V(x, C n^{1/\beta})},$$
finishing the proof.

10. The Harnack inequality and the Green kernel
Recall that the weighted graph $(\Gamma, \mu)$ satisfies the elliptic Harnack inequality if, for all $x \in \Gamma$, $R > 0$, and for any nonnegative function $u$ in $\overline{B(x, 2R)}$ which is harmonic in $B(x, 2R)$,
$$\max_{B(x,R)} u \le H \min_{B(x,R)} u \tag{H}$$
with some constant $H > 1$. In this section we establish that $(H)$ is implied by condition $(G)$. Recall that the latter refers to
$$g(x, y) \simeq d(x, y)^{-\gamma}, \quad \forall x \ne y. \tag{G}$$
Consider the following annulus Harnack inequality for the Green kernel: for all $x \in \Gamma$ and $R > 1$,
$$\max_{y \in A(x,R)} g(x, y) \le C \min_{y \in A(x,R)} g(x, y), \tag{HG}$$
where $A(x, R) := B(x, R) \setminus B(x, R/2)$.

PROPOSITION 10.1
Assume that $(p_0)$ holds and the graph $(\Gamma, \mu)$ is transient. Then
$$(G) \Longrightarrow (HG) \Longrightarrow (H).$$
Since the implication $(G) \Longrightarrow (HG)$ is obvious, we need to prove only the second implication. The main part of the proof is contained in the following lemma.



[Figure 6. The sets $B = U_0$, $A = U_2 \setminus U_1$, and $U = U_3$]

LEMMA 10.2
Let $U_0 \subset U_1 \subset U_2 \subset U_3$ be a sequence of finite sets in $\Gamma$ such that $\bar{U}_i \subset U_{i+1}$, $i = 0, 1, 2$. Denote $A = U_2 \setminus U_1$, $B = U_0$, and $U = U_3$. Then, for any function $u$ that is nonnegative in $\bar{U}_2$ and harmonic in $U_2$, we have
$$\max_B u \le H \min_B u, \tag{10.1}$$
where
$$H := \max_{x \in B} \max_{y \in B} \max_{z \in A} \frac{G_U(y, z)}{G_U(x, z)} \tag{10.2}$$
(see Figure 6).

Remark 10.1
Note that no a priori assumption has been made about the graph $(\Gamma, \mu)$ (except for connectedness and unboundedness). If the graph is transient, then, by exhausting $\Gamma$ by a sequence of finite sets $U$, we can replace $G_U$ in (10.2) by $G$. Note also that, without loss of generality, one can take $U_2 = \bar{U}_1$.

Proof
The following potential-theoretic argument is borrowed from [12]. We use the notation of Section 3. Given a nonnegative harmonic function $u$ in $U_2$, denote by $S_u$ the following class of superharmonic functions:
$$S_u = \{v : v \ge 0 \text{ in } \bar{U},\ \Delta v \le 0 \text{ in } U, \text{ and } v \ge u \text{ in } U_1\}.$$
Define the function $w$ on $\bar{U}$ by
$$w(x) = \min\{v(x) : v \in S_u\}. \tag{10.3}$$

Figure 7. The function $u$, a function $v \in S_u$, and the function $w = \min_{S_u} v$. The latter is harmonic in $U_1$ and in $U \setminus \overline{U_1}$.

Clearly, $w \in S_u$. Since the function $u$ itself is also in $S_u$, we have $w \le u$ in $U$. On the other hand, by definition of $S_u$, $w \ge u$ in $\overline{U_1}$, whence we see that $u = w$ in $\overline{U_1}$ (see Figure 7). In particular, it suffices to prove (10.1) for $w$ instead of $u$.

Let us show that $w \in c_0(U)$. Indeed, let $v(x) = E_U(x)$. Then, by (3.9) and the strong minimum principle, $v$ is superharmonic and strictly positive in $U$. Hence, for a large enough constant $C$, we have $Cv \ge u$ in $\overline{U_1}$, whence $Cv \in S_u$ and $w \le Cv$. Since $v = 0$ in $\overline{U} \setminus U$, this implies $w = 0$ in $\overline{U} \setminus U$ and $w \in c_0(U)$.

Denote $f := -\Delta w$, and observe that $f \ge 0$ in $U$. Since $w \in c_0(U)$, we have, for any $x \in U$,

$$w(x) = \sum_{z \in U} G_U(x, z) f(z). \tag{10.4}$$

Next we prove that $f = 0$ outside $A$ so that the summation in (10.4) can be restricted to $z \in A$. Given this, we obtain, for all $x, y \in B$,

$$\frac{w(y)}{w(x)} = \frac{\sum_{z \in A} G_U(y, z) f(z)}{\sum_{z \in A} G_U(x, z) f(z)} \le H,$$

whence (10.1) follows.

We are left to verify that $w$ is harmonic in $U_1$ and outside $\overline{U_1}$. Indeed, if $x \in U_1$, then $\Delta w(x) = \Delta u(x) = 0$ because $w = u$ in $\overline{U_1}$. Let $\Delta w(x) \ne 0$ for some $x \in U \setminus \overline{U_1}$. Since $w$ is superharmonic, we have $\Delta w(x) < 0$ and

$$w(x) > Pw(x) = \sum_{y \sim x} P(x, y) w(y).$$

Consider the function $w'$, which is equal to $w$ everywhere in $\overline{U}$ except for the point $x$; and $w'$ at $x$ is defined to satisfy

$$w'(x) = \sum_{y \sim x} P(x, y) w'(y).$$

Clearly, $w'(x) < w(x)$, and $w'$ is superharmonic in $U$. Since $w' = w = u$ in $\overline{U_1}$, we have $w' \in S_u$. Hence, by definition (10.3) of $w$, $w \le w'$ in $U$, which contradicts $w(x) > w'(x)$.

Proof of Proposition 10.1
Now we assume (HG) and prove (H). Given any ball $B(x_0, 2R)$ with $R > 4$ and a nonnegative harmonic function $u$ in $B(x_0, 2R)$, define the sequence of radii $R_0 = R$, $R_1 = 3R/2$, and $R_2 = 2R$, and denote $U_i = B(x_0, R_i)$ for $i = 0, 1, 2$ and $U_3 = \Gamma$. By Lemma 10.2, we have inequality (10.1), which implies (H), provided we can show that the Harnack constant $H$ from (10.2) is bounded from above, uniformly in $x_0$ and $R$. Indeed, if $x, y \in B(x_0, R)$ and $z \in A = B(x_0, 2R) \setminus B(x_0, 3R/2)$, then both distances $d(z, x)$ and $d(z, y)$ are between $R/2$ and $7R/2$. By iterating (HG) in the annuli centered at $z$, we obtain

$$\frac{G(y, z)}{G(x, z)} = \frac{g(z, y)}{g(z, x)} \le \text{const},$$

whence we see that $H$ is indeed uniformly bounded from above. The condition $R > 4$, which we have imposed above, ensures that $\overline{U_i} \subset U_{i+1}$, which is required for Lemma 10.2. If $R \le 4$, then (H) simply follows from $(p_0)$ and Proposition 3.2.

11. Oscillation inequalities

For any nonempty finite set $U$ and a function $u$ on $U$, denote

$$\operatorname{osc}_U u := \max_U u - \min_U u.$$

The purpose of this section is to prove estimate (11.3), which provides the step $(H) \implies [\text{osc}]$ of the proof of Theorem 2.1.

PROPOSITION 11.1
Assume that elliptic Harnack inequality (H) holds on $(\Gamma, \mu)$. Then, for any $\varepsilon > 0$, there exists $\sigma = \sigma(\varepsilon, H) < 1$ such that, for any ball $B(x, R)$ and for any function $u$ defined in $\overline{B(x, R)}$ and harmonic in $B(x, R)$, we have

$$\operatorname{osc}_{B(x,\sigma R)} u \le \varepsilon \operatorname{osc}_{B(x,R)} u. \tag{11.1}$$


Proof
Fix a ball $B(x, R)$, and denote for simplicity $B_r = B(x, r)$. Let us prove that, for any $r \in (0, R/3]$,

$$\operatorname{osc}_{B_r} u \le (1 - \delta) \operatorname{osc}_{B_{3r}} u, \tag{11.2}$$

where $\delta = \delta(H) \in (0, 1)$. Then (11.1) follows from (11.2) by iterating. If $r \le 1$, then the left-hand side of (11.2) vanishes and (11.2) is trivially satisfied. If $r > 1$, then $\overline{B_{2r}} \subset B_{3r}$, and the function $u - \min_{B_{3r}} u$ is nonnegative in $\overline{B_{2r}}$ and harmonic in $B_{2r}$. Applying Harnack inequality (H) to this function, we obtain

$$\max_{B_r} u - \min_{B_{3r}} u \le H \left( \min_{B_r} u - \min_{B_{3r}} u \right)$$

and

$$\operatorname{osc}_{B_r} u \le (H - 1) \left( \min_{B_r} u - \min_{B_{3r}} u \right).$$

Similarly, we have

$$\operatorname{osc}_{B_r} u \le (H - 1) \left( \max_{B_{3r}} u - \max_{B_r} u \right).$$

Summing up these two inequalities, we conclude

$$\operatorname{osc}_{B_r} u \le C \left( \operatorname{osc}_{B_{3r}} u - \operatorname{osc}_{B_r} u \right),$$

whence (11.2) follows.

PROPOSITION 11.2
Assume that elliptic Harnack inequality (H) holds on $(\Gamma, \mu)$. Let $u \in c_0(B(x, R))$ satisfy in $B(x, R)$ the equation $\Delta u = f$. Then, for any positive $r < R$,

$$\operatorname{osc}_{B(x,\sigma r)} u \le 2 \left( \overline{E}(x, r) + \varepsilon \overline{E}(x, R) \right) \max |f|, \tag{11.3}$$

where $\sigma$ and $\varepsilon$ are the same as in Proposition 11.1.

Proof
Denote for simplicity $B_r = B(x, r)$. By definition of the Green function, we have

$$u(y) = -\sum_{z \in B_R} G_{B_R}(y, z) f(z),$$

whence, using (3.8), we obtain

$$\max |u| \le \overline{E}(x, R) \max |f|.$$

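Figure 8. The functions $u$ and $v$ in the case $f \le 0$

Let $v \in c_0(B_r)$ solve the Dirichlet problem $\Delta v = f$ in $B_r$ (see Figure 8). In the same way, we have

$$\max |v| \le \overline{E}(x, r) \max |f|.$$

The function $w = u - v$ is harmonic in $B_r$ whence, by Proposition 11.1,

$$\operatorname{osc}_{B_{\sigma r}} w \le \varepsilon \operatorname{osc}_{B_r} w.$$

Since $w = u$ on $\overline{B_r} \setminus B_r$, the maximum principle implies that

$$\operatorname{osc}_{\overline{B_r}} w = \operatorname{osc}_{\overline{B_r} \setminus B_r} w = \operatorname{osc}_{\overline{B_r} \setminus B_r} u \le 2 \max |u|.$$

Hence,

$$\operatorname{osc}_{B_{\sigma r}} u \le \operatorname{osc}_{B_{\sigma r}} v + \operatorname{osc}_{B_{\sigma r}} w \le 2 \max |v| + 2\varepsilon \max |u| \le 2 \left( \overline{E}(x, r) + \varepsilon \overline{E}(x, R) \right) \max |f|,$$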

which was to be proved.

12. Time derivative of the heat kernel

Given a function $u_n(x)$ on $\Gamma \times \mathbb{N}$, by the "time derivative" of $u$ we mean the difference

$$\partial_n u := u_{n+2} - u_n.$$

The main result of this section is Proposition 12.3, which provides upper bound (12.5) for $\partial_n p$ and thus constitutes the part $(DUE) \implies [\text{deriv}]$ of the proof of Theorem 2.1. The crucial point is that $\partial_n p$ decays as $n \to \infty$ faster than $p_n$.

The analogue of the time derivative in the discrete case is $\partial_n p = p_{n+2} - p_n$ rather than $p_{n+1} - p_n$. Indeed, in $\mathbb{Z}^D$ (as well as in any other bipartite graph), $p_n(x, x) = 0$ if $n$ is odd. Therefore, the difference $p_{n+1}(x, x) - p_n(x, x)$ is equal either to $p_{n+1}(x, x)$ or to $-p_n(x, x)$, and hence it decays as $n \to \infty$ at the same rate as $p_n(x, x)$.

PROPOSITION 12.1
Let $A$ be a nonempty finite subset of $\Gamma$, and let $f$ be a function on $A$. Define

$$u_n(x) = P_n^A f(x).$$

Then, for all integers $1 \le k \le n$,

$$\|\partial_n u\|_{L^2(A,\mu)} \le \frac{1}{k} \|u_{n-k}\|_{L^2(A,\mu)}.$$

Proof
The proof follows the argument from [17]. Let $\varphi_1, \varphi_2, \ldots, \varphi_{|A|}$ be the eigenfunctions of the Laplace operator $-\Delta_A$, and let $\lambda_1, \lambda_2, \ldots, \lambda_{|A|}$ be the corresponding eigenvalues. Let us normalize the $\varphi_i$ to form an orthonormal basis in $L^2(A, \mu)$. The function $f$ can be expanded in this basis:

$$f = \sum_i c_i \varphi_i.$$

Since $P^A = I - (-\Delta_A)$, we obtain

$$u_n = \sum_i c_i \rho_i^n \varphi_i, \tag{12.1}$$

where $\rho_i := 1 - \lambda_i$ are eigenvalues of the Markov operator $P^A$. From (12.1), we obtain

$$u_n - u_{n+2} = \sum_i c_i \left( 1 - \rho_i^2 \right) \rho_i^n \varphi_i$$

and

$$\|u_n - u_{n+2}\|^2_{L^2(A,\mu)} = \sum_i c_i^2 \left( 1 - \rho_i^2 \right)^2 \rho_i^{2n}. \tag{12.2}$$

Note that $|\rho_i| \le 1$ and hence $\rho_i^2 \in [0, 1]$. For any $a \in [0, 1]$, we have

$$1 \ge (1 + a + a^2 + \cdots + a^k)(1 - a) \ge k a^k (1 - a),$$

whence

$$(1 - a) a^k \le \frac{1}{k}.$$

Applying this inequality for $a = \rho_i^2$, we obtain, from (12.2),

$$\|u_n - u_{n+2}\|^2_{L^2(A,\mu)} \le \frac{1}{k^2} \sum_i c_i^2 \rho_i^{2(n-k)} = \frac{1}{k^2} \|u_{n-k}\|^2_{L^2(A,\mu)},$$

which was to be proved.

PROPOSITION 12.2
Let $A$ be a nonempty finite subset of $\Gamma$. Then, for all $x, y \in A$,

$$\left| \partial_n p^A(x, y) \right| \le \frac{1}{k} \sqrt{p^A_{2m}(x, x) \, p^A_{2(n-m-k)}(y, y)} \tag{12.3}$$

for all positive integers $n, m, k$ such that $m + k \le n$.

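Proof
From semigroup identity (5.15) for $p^A$, we obtain

$$\partial_n p^A(x, y) = \sum_{z \in A} p_m^A(x, z) \, \partial_{n-m} p^A(z, y) \mu(z),$$

whence

$$\left| \partial_n p^A(x, y) \right| \le \left\| p_m^A(x, \cdot) \right\|_{L^2(A,\mu)} \left\| \partial_{n-m} p^A(y, \cdot) \right\|_{L^2(A,\mu)}.$$

By Proposition 12.1,

$$\left\| \partial_{n-m} p^A(y, \cdot) \right\|_{L^2(A,\mu)} \le \frac{1}{k} \left\| p^A_{n-m-k}(y, \cdot) \right\|_{L^2(A,\mu)}$$

for any $1 \le k \le n - m$. Since

$$\left\| p_m^A(x, \cdot) \right\|^2_{L^2(A,\mu)} = \sum_{z \in A} p_m^A(x, z)^2 \mu(z) = p_{2m}^A(x, x),$$

we obtain (12.3).

PROPOSITION 12.3
Suppose that (DUE) holds; that is, for all $x \in \Gamma$ and $n \ge 1$,

$$p_n(x, x) \le C n^{-\nu}. \tag{12.4}$$

Then, for all $x, y \in \Gamma$ and $n \ge 1$,

$$|\partial_n p(x, y)| \le C n^{-\nu-1}. \tag{12.5}$$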

Proof
First, assume $n > 3$. Then we can choose $k$ and $m$ in (12.3) so that $k \simeq m \simeq n/3$ and $n - m - k \simeq n/3$. As follows from (12.4), for any nonempty finite set $A \subset \Gamma$,

$$p^A_{2m}(x, x) \le C n^{-\nu} \quad \text{and} \quad p^A_{2(n-m-k)}(y, y) \le C n^{-\nu},$$

whence, by Proposition 12.2, $|\partial_n p^A(x, y)| \le C n^{-\nu-1}$. By letting $A \to \Gamma$, we obtain (12.5). If $n \le 3$, then (12.5) follows from the trivial inequality $|\partial_n p| \le p_n + p_{n+2}$ and the fact that (12.4) implies a similar bound for $p_n(x, y)$ (cf. (5.17) and (5.18)).
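The reason for using the two-step difference can be seen numerically on the simplest bipartite example. The sketch below is not taken from the paper: it assumes the simple random walk on $\mathbb{Z}$, uses the closed form $P_n(0,0) = \binom{n}{n/2} 2^{-n}$ for even $n$ (which differs from the heat kernel $p_n(0,0)$ only by the constant factor $\mu(0)$), and the function names are illustrative. It compares the relative size of the one-step and two-step differences.

```python
from math import comb


def return_probability(n: int) -> float:
    """P_n(0, 0) for the simple random walk on Z; zero when n is odd."""
    if n % 2:
        return 0.0
    return comb(n, n // 2) / 2 ** n


def one_step_difference(n: int) -> float:
    """P_{n+1}(0, 0) - P_n(0, 0)."""
    return return_probability(n + 1) - return_probability(n)


def two_step_difference(n: int) -> float:
    """P_{n+2}(0, 0) - P_n(0, 0), the discrete time derivative used in the text."""
    return return_probability(n + 2) - return_probability(n)


for n in (10, 100, 1000):
    p = return_probability(n)
    # Relative sizes: the one-step ratio stays at 1, the two-step ratio is 1 / (n + 2).
    print(n, abs(one_step_difference(n)) / p, abs(two_step_difference(n)) / p)
```

For even $n$ the one-step difference equals $-P_n(0,0)$, so it decays no faster than the kernel itself, whereas the two-step difference gains the extra factor $1/(n+2)$, in line with the bound $n^{-\nu-1}$ of (12.5) with $\nu = 1/2$.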


13. Off-diagonal lower bound

An important intermediate step in proving the lower estimate (LE) is a near-diagonal lower estimate

$$p_n(x, y) + p_{n+1}(x, y) \ge c n^{-\alpha/\beta} \tag{NLE}$$

for all $x, y \in \Gamma$ and $n \ge 1$ such that

$$d(x, y) \le \delta n^{1/\beta}. \tag{13.1}$$

In this section we finish the proof of lower bound (LE) in Theorem 2.1 as on the following diagram:

$$(V) + (G) \implies (FK) + (V) + (E) + (H) \implies (NLE) + (V) \implies (LE).$$

The first implication here is given by Propositions 5.5, 6.3, and 10.1, whereas the other two are proved below. Let us recall that (DLE) refers to the lower bound

$$p_{2n}^{B(x,R)}(x, x) \ge c n^{-\alpha/\beta}, \quad \forall n \le \varepsilon R^\beta, \tag{DLE}$$

with some small enough $\varepsilon > 0$, and (DUE) refers to the upper bound

$$p_n(x, x) \le C n^{-\alpha/\beta}. \tag{DUE}$$

Denote for simplicity by $(E\le)$ the upper bound in (E); that is,

$$E(x, R) \le C R^\beta, \quad \forall x \in \Gamma, \ R \ge 1. \tag{$E\le$}$$

PROPOSITION 13.1
For any graph $(\Gamma, \mu)$, we have

$$(DUE) + (DLE) + (E\le) + (H) \implies (NLE). \tag{13.2}$$

Consequently, if $(p_0)$ holds on $(\Gamma, \mu)$, then

$$(FK) + (V) + (E) + (H) \implies (NLE). \tag{13.3}$$

Proof
Let us first show how the second claim follows from the first one. Recall that, by Proposition 5.1, $(FK) \implies (DUE)$; by Proposition 7.1, $(E) \implies (\Psi)$; and, by Proposition 9.1, $(\Psi) + (V) \implies (DLE)$. Hence, the hypotheses of (13.3) imply the hypotheses of (13.2). To prove (13.2), fix $x \in \Gamma$, $n \ge 1$, and set

$$R = \left( \frac{n}{\varepsilon} \right)^{1/\beta} \tag{13.4}$$

for a small enough positive $\varepsilon$. So far we only assume that $\varepsilon$ satisfies (DLE), but later one more upper bound on $\varepsilon$ is imposed. Denote $A = B(x, R)$, and introduce the function

$$u(y) := p_n^A(x, y) + p_{n+1}^A(x, y).$$

By hypothesis (DLE), we have $u(x) \ge c n^{-\alpha/\beta}$. Let us show that

$$|u(x) - u(y)| \le \frac{c}{2} n^{-\alpha/\beta} \tag{13.5}$$

for all $y$ such that $d(x, y) \le \delta n^{1/\beta}$, which would imply $u(y) \ge (c/2) n^{-\alpha/\beta}$ and hence prove (NLE). The function $u(y)$ is in the class $c_0(A)$ and solves the equation $\Delta u(y) = f(y)$ where

$$f(y) := p_{n+2}^A(x, y) - p_n^A(x, y).$$

On-diagonal upper bound (DUE) implies, by Proposition 12.3,

$$\max_y |f(y)| \le \frac{C}{n^{\alpha/\beta+1}}. \tag{13.6}$$

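By (H) and Proposition 11.2, we have, for any $0 < r < R$ and for some $\sigma \in (0, 1)$,

$$\operatorname{osc}_{B(x,\sigma r)} u \le 2 \left( \overline{E}(x, r) + \varepsilon^2 \overline{E}(x, R) \right) \max |f|. \tag{13.7}$$

By Proposition 6.1, $(E\le)$ implies a similar upper bound for $\overline{E}$. Estimating $\max |f|$ by (13.6), we obtain, from (13.7),

$$\operatorname{osc}_{B(x,\sigma r)} u \le C \frac{r^\beta + \varepsilon^2 R^\beta}{n^{\alpha/\beta+1}}.$$

Choosing $r$ to satisfy $r^\beta = \varepsilon^2 R^\beta$ and substituting from (13.4) $n = \varepsilon R^\beta$, we obtain

$$\operatorname{osc}_{B(x,\sigma r)} u \le C \frac{\varepsilon^2 R^\beta}{n^{\alpha/\beta+1}} = C \varepsilon n^{-\alpha/\beta},$$

which implies

$$\operatorname{osc}_{B(x,\sigma r)} u \le \frac{c}{2} n^{-\alpha/\beta}, \tag{13.8}$$

provided $\varepsilon$ is small enough. Note that

$$\sigma r = \sigma \varepsilon^{2/\beta} R = \sigma \varepsilon^{2/\beta} \left( \frac{n}{\varepsilon} \right)^{1/\beta} = \sigma \varepsilon^{1/\beta} n^{1/\beta} = \delta n^{1/\beta},$$

where $\delta := \sigma \varepsilon^{1/\beta}$. Hence, (13.8) implies (13.5), provided $d(x, y) \le \delta n^{1/\beta}$, which was to be proved.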
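The bookkeeping of the radii at the end of this proof can be checked with a few lines of arithmetic. The sketch below is only an illustration: the numerical values of $n$, $\beta$, $\varepsilon$, and $\sigma$ are arbitrary assumptions, and the function name is hypothetical. It rebuilds $R$ from (13.4), picks $r$ with $r^\beta = \varepsilon^2 R^\beta$, and compares $\sigma r$ with $\delta n^{1/\beta}$ for $\delta = \sigma \varepsilon^{1/\beta}$.

```python
from math import isclose


def near_diagonal_radius(n: float, beta: float, eps: float, sigma: float) -> float:
    """Return sigma * r, where R is given by (13.4) and r^beta = eps^2 * R^beta."""
    big_radius = (n / eps) ** (1.0 / beta)                      # R from (13.4)
    small_radius = (eps ** 2 * big_radius ** beta) ** (1.0 / beta)
    return sigma * small_radius


# Illustrative parameter values (assumptions, not taken from the paper).
n, beta, eps, sigma = 1000.0, 2.5, 0.01, 0.3
delta = sigma * eps ** (1.0 / beta)
print(near_diagonal_radius(n, beta, eps, sigma), delta * n ** (1.0 / beta))
print(isclose(near_diagonal_radius(n, beta, eps, sigma), delta * n ** (1.0 / beta)))
```

The two printed radii agree up to rounding, which is the identity $\sigma r = \delta n^{1/\beta}$ used to pass from (13.8) to (13.5).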


The final step in proving part $(V) + (G) \implies (LE)$ of Theorem 2.1 is covered by the following statement. Denote by $(V\ge)$ the lower bound in (V); that is,

$$V(x, R) \ge c R^\alpha, \quad \forall x \in \Gamma, \ R \ge 1. \tag{13.9}$$

PROPOSITION 13.2
Assume that $(\Gamma, \mu)$ satisfies $(p_0)$. Then

$$(NLE) + (V\ge) \implies (LE).$$

We precede the proof with the following lemmas. Denote for simplicity

$$\tilde{P}_n = P_n + P_{n+1}, \tag{13.10}$$

where $P_n$ is the $n$-convolution power of the Markov operator $P$. In particular, we have

$$P_n P_m = P_{n+m}. \tag{13.11}$$

We need a replacement for this property for the operator $\tilde{P}_n$, which is stated in Lemma 13.5.

LEMMA 13.3
Assume that $(p_0)$ holds on $(\Gamma, \mu)$. Then, for all integers $n \ge l \ge 1$ such that

$$n \equiv l \pmod 2, \tag{13.12}$$

we have

$$P_l(x, y) \le C^{n-l} P_n(x, y) \tag{13.13}$$

for all $x, y \in \Gamma$, with a constant $C = C(p_0)$.

Proof
By semigroup property (5.15), we have

$$P_{k+2}(x, y) = \sum_{z \in \Gamma} P_k(x, z) P_2(z, y) \ge P_k(x, y) P_2(y, y).$$

Using $(p_0)$, we obtain

$$P_2(y, y) = \sum_{z \sim y} P(y, z) P(z, y) \ge p_0 \sum_{z \sim y} P(y, z) = p_0,$$

whence $P_{k+2}(x, y) \ge p_0 P_k(x, y)$. Iterating this inequality, we obtain (13.13) with $C = p_0^{-1/2}$.
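The two-step inequality $P_{k+2}(x, y) \ge p_0 P_k(x, y)$ behind Lemma 13.3 can be verified exactly on a small example. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the simple random walk on a finite cycle (the paper works with infinite graphs, but the proof of the inequality only uses the semigroup property and $(p_0)$), takes $p_0 = 1/2$, and works in exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction


def cycle_walk(size):
    """Transition matrix of the simple random walk on a cycle: each step has probability 1/2."""
    half = Fraction(1, 2)
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for x in range(size):
        matrix[x][(x + 1) % size] = half
        matrix[x][(x - 1) % size] = half
    return matrix


def multiply(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def two_step_bound_holds(walk, p0, max_k):
    """Check P_{k+2}(x, y) >= p0 * P_k(x, y) for all x, y and all 1 <= k <= max_k."""
    size = len(walk)
    current = walk                                   # P_k, starting with k = 1
    ahead = multiply(multiply(walk, walk), walk)     # P_{k+2}, starting with P_3
    for _ in range(max_k):
        for x in range(size):
            for y in range(size):
                if ahead[x][y] < p0 * current[x][y]:
                    return False
        current = multiply(current, walk)
        ahead = multiply(ahead, walk)
    return True


print(two_step_bound_holds(cycle_walk(6), Fraction(1, 2), 10))
```

On the cycle one has $P_2(y, y) = 1/2$, so the inequality holds with $p_0 = 1/2$ for every $k$; iterating it $(n - l)/2$ times gives (13.13) with $C = p_0^{-1/2}$.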

LEMMA 13.4
Assume that $(\Gamma, \mu)$ satisfies $(p_0)$. Then, for all integers $n \ge l \ge 1$ and all $x, y \in \Gamma$,

$$\tilde{P}_l(x, y) \le C^{n-l} \tilde{P}_n(x, y), \tag{13.14}$$

where $C = C(p_0)$.

Remark 13.1
Note that no parity condition is required here in contrast to condition (13.12) of Lemma 13.3.

Proof
This is an immediate consequence of Lemma 13.3 because both $P_l(x, y)$ and $P_{l+1}(x, y)$ can be estimated from above via either $P_n(x, y)$ or $P_{n+1}(x, y)$ depending on the parity of $n$ and $l$.

LEMMA 13.5
Assume that $(\Gamma, \mu)$ satisfies $(p_0)$. Then, for all $n, m \in \mathbb{N}$ and $x, y \in \Gamma$, we have the following inequality:

$$\tilde{P}_n \tilde{P}_m(x, y) \le C \tilde{P}_{n+m+1}(x, y), \tag{13.15}$$

where $C = C(p_0)$.

Proof
Observe that, by (13.10) and (13.11),

$$\tilde{P}_n \tilde{P}_m = (P_n + P_{n+1})(P_m + P_{m+1}) = P_{n+m} + 2P_{n+m+1} + P_{n+m+2}.$$

By Lemma 13.3, $P_{n+m}(x, y) \le C P_{n+m+2}(x, y)$, whence

$$\tilde{P}_n \tilde{P}_m(x, y) \le C \left( P_{n+m+1} + P_{n+m+2} \right)(x, y) = C \tilde{P}_{n+m+1}(x, y).$$

LEMMA 13.6
Assume that $(\Gamma, \mu)$ satisfies $(p_0)$. Then, for all $x, y \in \Gamma$ and $k, m, n \in \mathbb{N}$ such that $n \ge km + k - 1$, we have the following inequality:

$$\left( \tilde{P}_m \right)^k (x, y) \le C^{n-km} \tilde{P}_n(x, y). \tag{13.16}$$

Proof
By induction, (13.15) implies

$$\left( \tilde{P}_m \right)^k (x, y) \le C^{k-1} \tilde{P}_{km+k-1}(x, y).$$

From inequality (13.14) with $l = km + k - 1$, we obtain

$$\tilde{P}_{km+k-1}(x, y) \le C^{n-km-(k-1)} \tilde{P}_n(x, y),$$

whence (13.16) follows.

Proof of Proposition 13.2
Since $\tilde{P}_n(x, y) = (p_n(x, y) + p_{n+1}(x, y)) \mu(y)$, (NLE) can be stated as follows:

$$\tilde{P}_n(x, y) \ge c n^{-\alpha/\beta} \mu(y) \quad \text{if } d(x, y) \le \delta n^{1/\beta}. \tag{13.17}$$

The required (LE) takes the form

$$\tilde{P}_n(x, y) \ge c n^{-\alpha/\beta} \mu(y) \exp\left( -\left( \frac{d^\beta(x, y)}{cn} \right)^{1/(\beta-1)} \right). \tag{13.18}$$

To prove (13.18), fix $x, y \in \Gamma$, $n \ge d(x, y)$, and consider the following cases:

Case 1: $d(x, y) \le \delta n^{1/\beta}$,
Case 2: $\delta n^{1/\beta} < d(x, y) \le \varepsilon n$,
Case 3: $\varepsilon n < d(x, y) \le n$.

Here $\delta$ is the constant from (13.17) and $\varepsilon > 0$ is a small constant to be chosen later. In the first case, (13.18) coincides with (13.17). In the third case, (13.18) becomes

$$\tilde{P}_n(x, y) \ge c n^{-\alpha/\beta} \mu(y) \exp(-Cn), \tag{13.19}$$

which can be deduced directly from $(p_0)$. Indeed, depending on the parity of $n$, there is a path from $x$ to $y$ of length either $n$ or $n + 1$. The $P_x$-probability that the random walk follows this path is at least $p_0^{n+1}$, whence $\tilde{P}_n(x, y) \ge \exp(-Cn)$. This implies (13.19), using the fact that $\mu(y) \le C$. The latter is proved as follows. Take, in (13.17), $x \sim y$ and $n \simeq \delta^{-\beta}$. Then (13.17) implies $1 \ge \tilde{P}_n(x, y) \ge c \delta^\alpha \mu(y)$, whence $\mu(y) \le C$.

Consider the main second case. Denote $d = d(x, y)$, take a positive integer $k$ such that

$$k \le d, \tag{13.20}$$

and define $m$ by

$$m = \left[ \frac{n}{k} \right] - 1. \tag{13.21}$$

Figure 9. The chain of balls $B(o_i, r)$

Since $k \le d \le \varepsilon n$, we see that $n/k \ge \varepsilon^{-1}$ and that $m$ is positive. Since $n \ge k(m + 1)$, Lemma 13.6 applies and yields

$$C^{n-mk} \tilde{P}_n(x, y) \ge \left( \tilde{P}_m \right)^k (x, y). \tag{13.22}$$

In order to estimate $(\tilde{P}_m)^k(x, y)$, observe that there exists a sequence $o_1, o_2, \ldots, o_k$ of points on $\Gamma$ such that $x = o_1$, $y = o_k$, and, for all $i = 1, 2, \ldots, k - 1$,

$$d(o_i, o_{i+1}) \le \frac{d(x, y)}{k} =: r \tag{13.23}$$

(see Figure 9). Clearly, we have

$$\left( \tilde{P}_m \right)^k (x, y) \ge \sum_{z_1 \in B(o_1, r)} \cdots \sum_{z_{k-1} \in B(o_{k-1}, r)} \tilde{P}_m(x, z_1) \tilde{P}_m(z_1, z_2) \cdots \tilde{P}_m(z_{k-1}, y). \tag{13.24}$$

Assume that we have, in addition,

$$3r \le \delta m^{1/\beta}. \tag{13.25}$$

Since $d(z_{i-1}, z_i) \le 3r$, each $\tilde{P}_m(z_{i-1}, z_i)$ can be estimated by (13.17) as follows:

$$\tilde{P}_m(z_{i-1}, z_i) \ge c m^{-\alpha/\beta} \mu(z_i).$$

The same applies to $\tilde{P}_m(x, z_1)$ and $\tilde{P}_m(z_{k-1}, y)$. Using the lower bound of volume (13.9), we obtain, from (13.22) and (13.24),

$$C^{n-mk} \tilde{P}_n(x, y) \ge \left( c m^{-\alpha/\beta} \right)^{k} V(o_1, r) \cdots V(o_{k-1}, r) \mu(y) \ge c^k m^{-(\alpha/\beta)k} r^{\alpha(k-1)} \mu(y).$$

Hence,

$$\tilde{P}_n(x, y) \ge c^{n-mk+k} m^{-(\alpha/\beta)k} r^{\alpha(k-1)} \ge c^k m^{-\alpha/\beta} \left( \frac{r}{m^{1/\beta}} \right)^{\alpha(k-1)}, \tag{13.26}$$

where we have used the fact that $n - mk + k \le 3k$, which follows from (13.21). Before we go further, let us specify the choice of $k$ to ensure that both (13.20) and (13.25) hold. Using definitions (13.21) and (13.23) of $m$ and $r$, we see that (13.25) is equivalent to

$$C \frac{d}{k} \le \delta \left( \frac{n}{k} \right)^{1/\beta}$$

or

$$k \ge C \delta^{-\beta/(\beta-1)} \left( \frac{d^\beta}{n} \right)^{1/(\beta-1)}. \tag{13.27}$$

Let $k$ be the minimal possible integer satisfying (13.27). By the hypothesis $d \ge \delta n^{1/\beta}$, we have

$$k \simeq \left( \frac{d^\beta}{n} \right)^{1/(\beta-1)}. \tag{13.28}$$

Condition (13.20) follows from the hypothesis $n \ge \varepsilon^{-1} d$, provided $\varepsilon$ is small enough. From (13.28), (13.21), and (13.25), we obtain

$$m \simeq \left( \frac{n}{d} \right)^{\beta/(\beta-1)} \quad \text{and} \quad r \simeq \left( \frac{n}{d} \right)^{1/(\beta-1)}.$$

Hence, by (13.26) and $m \le n/k$,

$$\tilde{P}_n(x, y) \ge c^k m^{-\alpha/\beta} \ge n^{-\alpha/\beta} k^{\alpha/\beta} \exp(-Ck) \ge n^{-\alpha/\beta} \exp(-C'k).$$

Substituting here $k$ from (13.28), we obtain (13.18).

14. Parity matters

Let us recall that (LE) contains the estimate for $p_n + p_{n+1}$ rather than for $p_n$. In this section we discuss to what extent it is possible to estimate $p_n$ from below. In general, there is no lower bound for $p_n(x, y)$ for the parity reason. Indeed, on any bipartite graph, the length of any path from $x$ to $y$ has the same parity as $d(x, y)$. Therefore, $p_n(x, y) = 0$ if $n \not\equiv d(x, y) \pmod 2$. We immediately obtain the following result for bipartite graphs.

PROPOSITION 14.1
If $(\Gamma, \mu)$ is bipartite and satisfies (LE), then

$$p_n(x, y) \ge c n^{-\alpha/\beta} \exp\left( -\left( \frac{d(x, y)^\beta}{cn} \right)^{1/(\beta-1)} \right) \tag{14.1}$$

for all $x, y \in \Gamma$ and $n \ge 1$ such that

$$n \ge d(x, y) \quad \text{and} \quad n \equiv d(x, y) \pmod 2. \tag{14.2}$$

Proof
Indeed, assuming (14.2), $n + 1$ and $d(x, y)$ have different parities whence $p_{n+1}(x, y) = 0$, and (14.1) follows from (LE).

If there is enough "mixing of parity" in the graph, then one does get the lower bound regardless of the parity of $n$ and $d(x, y)$.

PROPOSITION 14.2
Assume that graph $(\Gamma, \mu)$ satisfies $(p_0)$, (LE), and the following "mixing" condition: there is an odd positive integer $n_0$ such that

$$\inf_{x \in \Gamma} P_{n_0}(x, x) > 0. \tag{14.3}$$

Then lower bound (14.1) holds for all $n > n_0$ and $x, y \in \Gamma$, provided $n \ge d(x, y)$.

For example, if $n_0 = 1$, then hypothesis (14.3) means that each point $x \in \Gamma$ has a loop edge $xx$. If $n_0 = 3$ and there are no loops, then (14.3) means that, for each point $x \in \Gamma$, there is an edge triangle $xy$, $yz$, $zx$. This property holds, in particular, for the graphical Sierpiński gasket (see Figure 1).

Proof
By (9.2) we obtain, for any positive integer $m$,

$$p_{2m}(x, x) \ge \frac{1}{V(x, m+1)} \left( \sum_{z \in B(x, m+1)} p_m(x, z) \mu(z) \right)^2 = \frac{1}{V(x, m+1)}.$$

Condition $(p_0)$ and Proposition 3.1 imply $V(x, m + 1) \le C^{m+1} \mu(x)$, whence

$$P_{2m}(x, x) = p_{2m}(x, x) \mu(x) \ge C^{-m-1}.$$

Since we use this lower estimate only for the bounded range of $m \le m_0$, we can rewrite it as

$$P_{2m}(x, x) \ge c, \tag{14.4}$$

where $c = c(m_0) > 0$. Assuming $n > n_0$, we have, by semigroup property (5.15),

$$p_n(x, y) = \sum_{z \in \Gamma} p_{n-n_0}(x, z) P_{n_0}(z, y) \ge p_{n-n_0}(x, y) P_{n_0}(y, y) \tag{14.5}$$

and, in the same way,

$$p_n(x, y) \ge p_{n-n_0+1}(x, y) P_{n_0-1}(y, y). \tag{14.6}$$

By hypothesis (14.3), we can estimate $P_{n_0}(y, y)$ from below by a positive constant. Also, $P_{n_0-1}(y, y)$ is bounded below by a constant, as in (14.4). Hence, adding up (14.5) and (14.6), we obtain

$$p_n(x, y) \ge c \left( p_{n-n_0}(x, y) + p_{n-n_0+1}(x, y) \right). \tag{14.7}$$

The right-hand side of (14.7) can be estimated from below by (LE), whence (14.1) follows.

Finally, let us show an example that explains why in general one cannot replace $p_n + p_{n+1}$ in (LE) by $p_n$, even assuming the parity condition $n \equiv d(x, y) \pmod 2$.

Figure 10. Every path of odd length from $x$ to $y$ goes through $o$ and $\xi$

Example 14.1
Let $(\Gamma, \mu)$ be $\mathbb{Z}^D$ with the standard weight $\mu_{xy} = 1$ for $x \sim y$, and let $D > 4$. We modify $\Gamma$ by adding one more edge $\xi$ of weight 1, which connects the origin $o = (0, 0, \ldots, 0)$ to the point $(1, 1, 0, 0, \ldots, 0)$, and we denote the new graph by $(\Gamma', \mu')$. Clearly, the volume growth and the Green kernel on $(\Gamma', \mu')$ are of the same order

as on $(\Gamma, \mu)$; that is,

$$V(x, r) \simeq r^D \quad \text{and} \quad g(x, y) \simeq d(x, y)^{2-D}.$$

Hence, for both graphs one has, by Theorem 2.1,

$$p_n(x, y) \le C n^{-D/2} \exp\left( -\frac{d^2(x, y)}{Cn} \right) \tag{14.8}$$

and a similar lower bound (LE) for $p_n + p_{n+1}$. Since $\mathbb{Z}^D$ is bipartite, we have for $(\Gamma, \mu)$, by Proposition 14.1,

$$p_n(x, y) \ge c n^{-D/2} \exp\left( -\frac{d^2(x, y)}{cn} \right) \quad \text{if } n \ge d(x, y) \text{ and } n \equiv d(x, y) \pmod 2. \tag{14.9}$$

Let us show that $(\Gamma', \mu')$ does not satisfy (14.9). Fix some (large) odd integer $m$, and consider points $x = (m, m, 0, 0, \ldots, 0)$ and $y = -x$ (see Figure 10). The distance $d(x, y)$ on $\Gamma$ is equal to $4m$, whereas the distance $d'(x, y)$ on $\Gamma'$ is $4m - 1$, due to the shortcut $\xi$. Denote $n = m^2$. Then $n \equiv d'(x, y) \pmod 2$ and $n > d'(x, y)$. Let us estimate from above $p_n(x, y)$ on $(\Gamma', \mu')$ and show that it does not satisfy lower bound (14.9). Since $n$ is odd and all odd paths from $x$ to $y$ have to go through the edge $\xi$, the strong Markov property yields

$$p_n(x, y) = \sum_{k=0}^{n} P_x(\tau = k) \, p_{n-k}(o, y), \tag{14.10}$$

where $\tau$ is the first time the random walk hits the point $o$. If $n - k < m$, then $p_{n-k}(o, y) = 0$. If $n - k \ge m$, then we estimate $p_{n-k}(o, y)$ by (14.8) as follows:

$$p_{n-k}(o, y) \le \frac{C}{(n - k)^{D/2}} \le \frac{C}{m^{D/2}}.$$

Therefore, (14.10) implies

$$p_n(x, y) \le C m^{-D/2} P_x\{\tau < \infty\}.$$

The $P_x$-probability to hit $o$ is of the order $g(x, o) \simeq m^{2-D}$. Hence, we obtain

$$p_n(x, y) \le C m^{-(3D/2-2)} = C n^{-(3D/4-1)} = o(n^{-D/2})$$

so that lower bound (14.9) cannot hold. A more careful argument shows that, in fact, $p_n(x, y) \simeq n^{-(D-1)}$.
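Two elementary ingredients of Example 14.1 can be checked by direct computation. The sketch below is an illustration only, with hypothetical function names: it runs a breadth-first search in the box $[-m, m]^2$ of the first two coordinates (an assumption that is harmless here, since shortest paths between $x$ and $y$ need not leave this box) to confirm the distances $4m$ and $4m - 1$, and it compares the decay exponent $3D/4 - 1$ with $D/2$.

```python
from collections import deque


def corner_distance(m, with_shortcut):
    """Graph distance from (m, m) to (-m, -m) inside the box [-m, m]^2 of Z^2.

    With with_shortcut=True the extra edge between (0, 0) and (1, 1) is added.
    """
    start, goal = (m, m), (-m, -m)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if (a, b) == goal:
            return distance[(a, b)]
        neighbours = [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]
        if with_shortcut and (a, b) == (0, 0):
            neighbours.append((1, 1))
        if with_shortcut and (a, b) == (1, 1):
            neighbours.append((0, 0))
        for point in neighbours:
            if max(abs(point[0]), abs(point[1])) <= m and point not in distance:
                distance[point] = distance[(a, b)] + 1
                queue.append(point)
    raise ValueError("goal not reached")


def exponent_gap(dimension):
    """Difference (3D/4 - 1) - D/2; positive exactly when D > 4."""
    return (3 * dimension / 4 - 1) - dimension / 2


m = 5
print(corner_distance(m, False), corner_distance(m, True))
print([exponent_gap(dimension) for dimension in (4, 5, 6, 8)])
```

For $m = 5$ the two distances are $20$ and $19$, and $n = m^2 = 25$ has the same parity as $19$. The exponent gap vanishes at $D = 4$ and is positive for $D > 4$, which is why the example requires $D > 4$.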


15. Consequences of the heat kernel estimates

Here we prove the remaining part of Theorem 2.1, as stated in the next proposition.

PROPOSITION 15.1
Assuming $(p_0)$, we have

$$(LE) + (UE) \implies (V) + (G).$$

Proof
The Green kernel is related to the heat kernel by

$$g(x, y) = \sum_{n=0}^{\infty} p_n(x, y). \tag{15.1}$$

Let $x \ne y$. Then $p_0(x, y) = 0$, and the upper bound (UE) for $p_n$ implies the upper bound for $g$ as follows:

$$g(x, y) \le C \sum_{n=1}^{\infty} n^{-\alpha/\beta} \exp\left( -c \left( \frac{d^\beta}{n} \right)^{1/(\beta-1)} \right),$$

where $d = d(x, y)$. By estimating the sum via an integral, we obtain $g(x, y) \le C d^{-\gamma}$ with $\gamma = \alpha - \beta$. Similarly, one proves the lower bound $g(x, y) \ge c d^{-\gamma}$ using (LE) and the obvious consequence of (15.1):

$$g(x, y) \ge \frac{1}{2} \sum_{n=1}^{\infty} \left( p_n(x, y) + p_{n+1}(x, y) \right).$$
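The step "estimating the sum via an integral" can be illustrated numerically. The sketch below is not from the paper; it assumes the sample values $\alpha = 4$, $\beta = 2$, and $c = 1$, for which the comparison integral is explicit: $\int_0^\infty t^{-2} e^{-d^2/t} \, dt = d^{-2}$, so the sum should behave like $d^{-\gamma}$ with $\gamma = \alpha - \beta = 2$. The function name and the truncation point are hypothetical choices.

```python
from math import exp


def green_upper_sum(d, alpha, beta, c=1.0, terms=200_000):
    """Truncated sum of n^(-alpha/beta) * exp(-c * (d^beta / n)^(1/(beta-1))) over n >= 1."""
    power = 1.0 / (beta - 1.0)
    return sum(
        n ** (-alpha / beta) * exp(-c * (d ** beta / n) ** power)
        for n in range(1, terms + 1)
    )


# Sample exponents (assumptions for illustration): alpha = 4, beta = 2, so gamma = 2.
for d in (10.0, 20.0):
    total = green_upper_sum(d, alpha=4.0, beta=2.0)
    # If the sum scales like d^(-gamma), the rescaled value stays close to a constant.
    print(d, total, total * d ** 2)
```

With these sample values the rescaled quantity stays close to $1$ as $d$ doubles, which is the scaling $d^{-\gamma}$ claimed for the Green kernel.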

Let us prove the upper bound for the volume

$$V(x, R) \le C R^\alpha \tag{$V\le$}$$

for any $x \in \Gamma$ and $R \ge 1$. Indeed, for any $n \in \mathbb{N}$, we have

$$\sum_{y \in \Gamma} p_n(x, y) \mu(y) \equiv 1, \tag{15.2}$$

whence

$$\sum_{y \in B(x,R)} \left( p_n(x, y) + p_{n+1}(x, y) \right) \mu(y) \le 2$$

and

$$V(x, R) \le 2 \left( \inf_{y \in B(x,R)} \left( p_n(x, y) + p_{n+1}(x, y) \right) \right)^{-1}.$$

Taking $n \simeq R^\beta$ and applying (LE), we see that the inf is bounded below by $c n^{-\alpha/\beta} \simeq R^{-\alpha}$ whence $(V\le)$ follows.

Let us prove the lower bound for the volume

$$V(x, R) \ge c R^\alpha. \tag{$V\ge$}$$

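We first show that (UE) and $(V\le)$ imply the following inequality:

$$\sum_{y \notin B(x,R)} p_n(x, y) \mu(y) \le \frac{1}{2}, \quad \forall n \le \varepsilon R^\beta, \tag{15.3}$$

provided $\varepsilon > 0$ is sufficiently small. Denoting $R_k = 2^k R$, we have

$$\sum_{y \notin B(x,R)} p_n(x, y) \mu(y) \le C \sum_{y \notin B(x,R)} n^{-\alpha/\beta} \exp\left( -c \left( \frac{d(x, y)^\beta}{n} \right)^{1/(\beta-1)} \right) \mu(y)$$

$$\le C \sum_{k=0}^{\infty} \sum_{y \in B(x,R_{k+1}) \setminus B(x,R_k)} n^{-\alpha/\beta} \exp\left( -c \left( \frac{R_k^\beta}{n} \right)^{1/(\beta-1)} \right) \mu(y)$$

$$\le C \sum_{k=0}^{\infty} R_k^\alpha n^{-\alpha/\beta} \exp\left( -c \left( \frac{R_k^\beta}{n} \right)^{1/(\beta-1)} \right)$$

$$= C \sum_{k=0}^{\infty} \left( \frac{2^k R}{n^{1/\beta}} \right)^{\alpha} \exp\left( -c \left( \frac{2^k R}{n^{1/\beta}} \right)^{\beta/(\beta-1)} \right). \tag{15.4}$$

If $R/n^{1/\beta}$ is large enough, then the right-hand side of (15.4) is majorized by a geometric series, and the sum can be made arbitrarily small, in particular, smaller than $1/2$. From (15.2) and (15.3), we conclude that

$$\sum_{y \in B(x,R)} p_n(x, y) \mu(y) \ge \frac{1}{2}, \tag{15.5}$$

whence

$$V(x, R) \ge \frac{1}{2} \left( \sup_{y \in B(x,R)} p_n(x, y) \right)^{-1}.$$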

Finally, choosing $n = [\varepsilon R^\beta]$ and using the upper bound $p_n(x, y) \le C n^{-\alpha/\beta}$, we obtain $(V\ge)$. This argument works only if $\varepsilon R^\beta \ge 1$. Let us now prove $(V\ge)$ for the opposite case when $\varepsilon R^\beta < 1$. To that end, define $R_0$ by $\varepsilon R_0^\beta = 1$. Then we have $R < R_0$. By hypothesis $(p_0)$ and Proposition 3.1, we have $V(x, R_0) \le C \mu(x)$. Combining with the lower bound $(V\ge)$ for $V(x, R_0)$, we obtain $\mu(x) \ge c > 0$. In particular, for any $R > 0$, we have $V(x, R) \ge c$, which implies $(V\ge)$ for the bounded range of $R$.
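The claim that the right-hand side of (15.4) becomes small once $R/n^{1/\beta}$ is large can be seen on sample values. The sketch below is an illustration under assumptions that are not in the paper: it takes $\alpha = 4$, $\beta = 2$, $c = 1$, and drops the unspecified constant $C$ in front of the sum; the function name and the number of terms are hypothetical.

```python
from math import exp


def dyadic_tail_sum(s, alpha, beta, c=1.0, terms=40):
    """Sum over k of (2^k s)^alpha * exp(-c * (2^k s)^(beta/(beta-1))), with s = R / n^(1/beta)."""
    exponent = beta / (beta - 1.0)
    return sum(
        (2 ** k * s) ** alpha * exp(-c * (2 ** k * s) ** exponent)
        for k in range(terms)
    )


# Sample exponents (assumptions for illustration): alpha = 4, beta = 2.
for s in (1.0, 2.0, 3.0, 4.0):
    print(s, dyadic_tail_sum(s, alpha=4.0, beta=2.0))
```

With these sample values the sum exceeds $1/2$ at $s = 1$ and is already far below $1/2$ at $s = 3$; the first term dominates, since the later terms decay much faster than a geometric series.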


Remark 15.1
Using a similar argument, one can also show the following implication:

$$(V) + (UE) + (H) \implies (LE). \tag{15.6}$$

Indeed, as we have seen in the proof of Proposition 15.1, (UE) implies $(G\le)$, which, together with (V), is enough to obtain $(E\le)$ (see Proposition 6.3). From (UE) and (V), one obtains the diagonal lower bound $p_{2n}(x, x) \ge c n^{-\alpha/\beta}$. Indeed, from (9.2) and (15.5) with $R = C n^{1/\beta}$, we deduce

$$p_{2n}(x, x) \ge \frac{1}{V(x, R)} \left( \sum_{y \in B(x,R)} p_n(x, y) \mu(y) \right)^2 \ge \frac{1}{4V(x, R)} \simeq n^{-\alpha/\beta}.$$

From this estimate, one gets (DLE) (see [56]; the argument is similar to the proof of (6.6)). Also, (DUE) follows trivially from (UE). Hence, having (DUE), (DLE), $(E\le)$, and (H), we obtain (NLE) by Proposition 13.1, and then we deduce (LE) from $(NLE) + (V)$ by Proposition 13.2.

Implication (15.6) yields that $(V) + (UE) + (H)$ is equivalent to either of our main conditions $(V) + (G)$ and $(UE) + (LE)$. Indeed, we have

$$(V) + (G) \implies (V) + (UE) + (H) \implies (UE) + (LE),$$

where the first implication follows by Theorem 2.1 and Proposition 10.1, and the second is the same as (15.6). We are left to close the circle by Theorem 2.1 or Proposition 15.1.

Appendix. The list of the lettered conditions

Here we provide a list of the lettered conditions frequently used in the paper. The relations among the exponents $\alpha, \beta, \gamma, \nu$ are as follows:

$$\alpha > \beta \ge 2, \quad \gamma = \alpha - \beta, \quad \text{and} \quad \nu = \alpha/\beta.$$

In all conditions, $n$ is an arbitrary positive integer, $R$ is an arbitrary positive real number, and $x, y$ are arbitrary points on $\Gamma$, subject to additional restrictions if any. The constants $C, c, \delta, \varepsilon, p_0$ are positive. We have the following list:

$$V(x, R) \simeq R^\alpha, \quad \forall R \ge 1, \tag{V}$$

$$E(x, R) \simeq R^\beta, \quad \forall R \ge 1, \tag{E}$$

$$g(x, y) \simeq d(x, y)^{-\gamma}, \quad x \ne y, \tag{G}$$

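$$V(x, 2R) \le C V(x, R), \tag{D}$$

$$\overline{E}(x, R) \le C E(x, R), \tag{$\overline{E}$}$$

$$\lambda_1(A) \ge c \mu(A)^{-1/\nu} \quad \text{for all nonempty finite sets } A \subset \Gamma, \tag{FK}$$

$$p_n(x, x) \le C n^{-\nu}, \tag{DUE}$$

$$p_n(x, y) \le C n^{-\alpha/\beta} \exp\left( -\left( \frac{d(x, y)^\beta}{Cn} \right)^{1/(\beta-1)} \right), \tag{UE}$$

$$(p_n + p_{n+1})(x, y) \ge c n^{-\alpha/\beta} \exp\left( -\left( \frac{d(x, y)^\beta}{cn} \right)^{1/(\beta-1)} \right) \quad \text{if } n \ge d(x, y), \tag{LE}$$

$$p_{2n}^{B(x,R)}(x, x) \ge c n^{-\alpha/\beta} \quad \text{if } n \le \varepsilon R^\beta, \tag{DLE}$$

$$p_n(x, y) + p_{n+1}(x, y) \ge c n^{-\alpha/\beta} \quad \text{if } d(x, y) \le \delta n^{1/\beta}, \tag{NLE}$$

$$\Psi_n(x, R) := P_x\left( T_{B(x,R)} \le n \right) \le C \exp\left( -\left( \frac{R^\beta}{Cn} \right)^{1/(\beta-1)} \right), \tag{$\Psi$}$$

$$P(x, y) \ge p_0 \quad \text{if } x \sim y, \tag{$p_0$}$$

$$\max_{B(x,R)} u \le H \min_{B(x,R)} u \tag{H}$$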

for any function $u$ nonnegative in $\overline{B(x, 2R)}$ and harmonic in $B(x, 2R)$.

References

[1] D. G. ARONSON, Non-negative solutions of linear parabolic equations, Ann. Scuola Norm. Sup. Pisa (3) 22 (1968), 607–694, MR 55:8553; Addendum, Ann. Scuola Norm. Sup. Pisa (3) 25 (1971), 221–228. MR 55:8554 467
[2] P. AUSCHER, Regularity theorems and heat kernel for elliptic operators, J. London Math. Soc. (2) 54 (1996), 284–296. MR 97f:35034 467
[3] P. AUSCHER and T. COULHON, Gaussian lower bounds for random walks from elliptic regularity, Ann. Inst. H. Poincaré Probab. Statist. 35 (1999), 605–630. MR 2000m:60086 467

[4] M. T. BARLOW, "Diffusions on fractals" in Lectures on Probability Theory and Statistics (Saint-Flour, France, 1995), Lecture Notes in Math. 1690, Springer, Berlin, 1998, 1–121. MR 2000a:60148 452, 458, 459
[5] M. T. BARLOW and R. F. BASS, The construction of Brownian motion on the Sierpiński carpet, Ann. Inst. H. Poincaré Probab. Statist. 25 (1989), 225–257. MR 91d:60183 465
[6] M. T. BARLOW and R. F. BASS, Transition densities for Brownian motion on the Sierpiński carpet, Probab. Theory Related Fields 91 (1992), 307–330. MR 93k:60203 465, 467
[7] M. T. BARLOW and R. F. BASS, Brownian motion and harmonic analysis on Sierpinski carpets, Canad. J. Math. 51 (1999), 673–744. MR 2000i:60083 453, 459, 465, 467
[8] M. T. BARLOW and R. F. BASS, "Random walks on graphical Sierpinski carpets" in Random Walks and Discrete Potential Theory (Cortona, Italy, 1997), Sympos. Math. 39, Cambridge Univ. Press, Cambridge, 1999, 26–55. MR CMP 1 802 425 453, 458, 459, 467
[9] M. T. BARLOW, T. COULHON, and A. GRIGOR'YAN, Manifolds and graphs with slow heat kernel decay, to appear in Invent. Math. 458, 464
[10] M. T. BARLOW and E. A. PERKINS, Brownian motion on the Sierpiński gasket, Probab. Theory Related Fields 79 (1988), 543–623. MR 89g:60241 453
[11] I. BENJAMINI, I. CHAVEL, and E. A. FELDMAN, Heat kernel lower bounds on Riemannian manifolds using the old ideas of Nash, Proc. London Math. Soc. (3) 72 (1996), 215–240. MR 97c:58150 467
[12] A. BOUKRICHA, Das Picard-Prinzip und verwandte Fragen bei Störung von harmonischen Räumen, Math. Ann. 239 (1979), 247–270. MR 81h:31018 467, 486
[13] E. A. CARLEN, S. KUSUOKA, and D. W. STROOCK, Upper bounds for symmetric Markov transition functions, Ann. Inst. H. Poincaré Probab. Statist. 23 (1987), 245–287. MR 88i:35066 467, 470
[14] G. CARRON, "Inégalités isopérimétriques de Faber-Krahn et conséquences" in Actes de la table ronde de géométrie différentielle (Luminy, France, 1992), Semin. Congr. 1, Soc. Math. France, Montrouge, 1996, 205–232. MR 97m:58198 465, 469
[15] I. CHAVEL, Eigenvalues in Riemannian Geometry, Pure Appl. Math. 115, Academic Press, Orlando, 1984. MR 86g:58140 452
[16] I. CHAVEL, Isoperimetric inequalities and heat diffusion on Riemannian manifolds, lecture notes, 1999. 452
[17] S. Y. CHENG, P. LI, and S.-T. YAU, On the upper estimate of the heat kernel of a complete Riemannian manifold, Amer. J. Math. 103 (1981), 1021–1063. MR 83c:58083 467, 491
[18] S. Y. CHENG and S.-T. YAU, Differential equations on Riemannian manifolds and their geometric applications, Comm. Pure Appl. Math. 28 (1975), 333–354. MR 52:6608 459
[19] F. R. K. CHUNG, Spectral Graph Theory, CBMS Regional Conf. Ser. in Math. 92, Amer. Math. Soc., Providence, 1997. MR 97k:58183 460
[20] T. COULHON, Ultracontractivity and Nash type inequalities, J. Funct. Anal. 141 (1996), 510–539. MR 97j:47055 469
[21] T. COULHON and A. GRIGOR'YAN, On-diagonal lower bounds for heat kernels and Markov chains, Duke Math. J. 89 (1997), 133–199. MR 98e:58159 467
[22] T. COULHON and A. GRIGOR'YAN, Random walks on graphs with regular volume growth, Geom. Funct. Anal. 8 (1998), 656–701. MR 99e:60153 458, 460, 471
[23] T. COULHON and L. SALOFF-COSTE, Puissances d'un opérateur régularisant, Ann. Inst. H. Poincaré Probab. Statist. 26 (1990), 419–436. MR 91j:43002 467
[24] T. COULHON and L. SALOFF-COSTE, Minorations pour les chaînes de Markov unidimensionnelles, Probab. Theory Related Fields 97 (1993), 423–431. MR 95b:60085 467
[25] E. B. DAVIES, Heat Kernels and Spectral Theory, Cambridge Tracts in Math. 92, Cambridge Univ. Press, Cambridge, 1989. MR 90e:35123 452
[26] E. B. DAVIES, Pointwise bounds on the space and time derivatives of heat kernels, J. Operator Theory 21 (1989), 367–378. MR 90k:58214 467
[27] E. B. DAVIES, Non-Gaussian aspects of heat kernel behaviour, J. London Math. Soc. (2) 55 (1997), 105–125. MR 97i:58169 467
[28] T. DELMOTTE, Parabolic Harnack inequality and estimates of Markov chains on graphs, Rev. Mat. Iberoamericana 15 (1999), 181–232. MR 2000b:35103 453, 458
[29] E. B. FABES and D. W. STROOCK, A new proof of Moser's parabolic Harnack inequality using the old ideas of Nash, Arch. Rational Mech. Anal. 96 (1986), 327–338. MR 88b:35037 467
[30] S. GOLDSTEIN, "Random walks and diffusion on fractals" in Percolation Theory and Ergodic Theory of Infinite Particle Systems (Minneapolis, 1984/85), ed. H. Kesten, IMA Vol. Math. Appl. 8, Springer, New York, 1987, 121–129. MR 88g:60245 453
[31] A. A. GRIGOR'YAN, The heat equation on non-compact Riemannian manifolds (in Russian), Mat. Sb. 182, no. 1 (1991), 55–87; English translation in Math. USSR-Sb. 72, no. 1 (1992), 47–77. MR 92h:58189 453
[32] A. A. GRIGOR'YAN, Heat kernel upper bounds on a complete non-compact manifold, Rev. Mat. Iberoamericana 10 (1994), 395–452. MR 96b:58107 465, 469
[33] A. A. GRIGOR'YAN, Integral maximum principle and its applications, Proc. Roy. Soc. Edinburgh Sect. A 124 (1994), 353–362. MR 95c:35045 465
[34] A. A. GRIGOR'YAN, Upper bounds of derivatives of the heat kernel on an arbitrary complete manifold, J. Funct. Anal. 127 (1995), 363–389. MR 96a:58183 467
[35] A. A. GRIGOR'YAN, "Estimates of heat kernels on Riemannian manifolds" in Spectral Theory and Geometry (Edinburgh, 1998), ed. B. Davies and Yu. Safarov, London Math. Soc. Lecture Note Ser. 273, Cambridge Univ. Press, Cambridge, 1999, 140–225. MR 2001b:58040 452
[36] B. M. HAMBLY and T. KUMAGAI, Transition density estimates for diffusion processes on post critically finite self-similar fractals, Proc. London Math. Soc. (3) 78 (1999), 431–458. MR 99m:60118 453
[37] W. HEBISCH and L. SALOFF-COSTE, Gaussian estimates for Markov chains and random walks on groups, Ann. Probab. 21 (1993), 673–709. MR 94m:60144 452, 458
[38] W. HEBISCH and L. SALOFF-COSTE, On the relation between elliptic and parabolic Harnack inequalities, preprint, 2000. 467, 468


[39] O. D. JONES, Transition probabilities for the simple random walk on the Sierpiński graph, Stochastic Process. Appl. 61 (1996), 45–69. MR 97b:60115 453, 458
[40] J. KIGAMI, Harmonic calculus on p.c.f. self-similar sets, Trans. Amer. Math. Soc. 335 (1993), 721–755. MR 93d:39008 453
[41] J. KIGAMI, Harmonic calculus on limits of networks and its application to dendrites, J. Funct. Anal. 128 (1995), 48–86. MR 96e:60130 453
[42] J. KIGAMI and M. L. LAPIDUS, Weyl's problem for the spectral distribution of Laplacians on p.c.f. self-similar fractals, Comm. Math. Phys. 158 (1993), 93–125. MR 94m:58225 453
[43] N. V. KRYLOV and M. V. SAFONOV, A certain property of solutions of parabolic equations with measurable coefficients (in Russian), Izv. Akad. Nauk SSSR Ser. Mat. 44, no. 1 (1980), 161–175, 239; English translation in Math. USSR-Izv. 16 (1981), 151–164. MR 83c:35059 467
[44] T. KUMAGAI, Estimates of transition densities for Brownian motion on nested fractals, Probab. Theory Related Fields 96 (1993), 205–224. MR 94e:60068 453
[45] S. KUSUOKA and Z. X. YIN [X. Y. ZHOU], Dirichlet forms on fractals: Poincaré constant and resistance, Probab. Theory Related Fields 93 (1992), 169–196. MR 94e:60069 453
[46] E. M. LANDIS, The Second Order Equations of Elliptic and Parabolic Type (in Russian), Izdat. "Nauka," Moscow, 1971, MR 47:9044; English translation in Transl. Math. Monogr. 171, Amer. Math. Soc., Providence, 1998. MR 98k:35034 467
[47] P. LI and S.-T. YAU, On the parabolic kernel of the Schrödinger operator, Acta Math. 156 (1986), 153–201. MR 87f:58156 452
[48] F. LUST-PIQUARD, Lower bounds on $\|K^n\|_{1\to\infty}$ for some contractions $K$ of $L^2(\mu)$, with applications to Markov operators, Math. Ann. 303 (1995), 699–712. MR 96m:47055 458, 467
[49] J. MOSER, On Harnack's theorem for elliptic differential equations, Comm. Pure Appl. Math. 14 (1961), 577–591. MR 28:2356 467
[50] J. MOSER, A Harnack inequality for parabolic differential equations, Comm. Pure Appl. Math. 17 (1964), 101–134, MR 28:2357; Correction, Comm. Pure Appl. Math. 20 (1967), 231–236. MR 34:3121 467
[51] J. NASH, Continuity of solutions of parabolic and elliptic equations, Amer. J. Math. 80 (1958), 931–954. MR 20:6592 470
[52] F. O. PORPER and S. D. ÈĬDEL'MAN, Two-side estimates of fundamental solutions of second-order parabolic equations and some applications (in Russian), Uspekhi Mat. Nauk 39, no. 3 (1984), 107–156; English translation in Russian Math. Surveys 39, no. 3 (1984), 119–179. MR 86b:35078 452
[53] L. SALOFF-COSTE, A note on Poincaré, Sobolev, and Harnack inequalities, Internat. Math. Res. Notices 1992, 27–38. MR 93d:58158 453
[54] L. SALOFF-COSTE, Isoperimetric inequalities and decay of iterated kernels for almost-transitive Markov chains, Combin. Probab. Comput. 4 (1995), 419–442. MR 97c:60171 458
[55] R. SCHOEN and S.-T. YAU, Lectures on Differential Geometry, Conf. Proc. Lecture

510

[56]

[57] [58] [59] [60] [61] [62]

[63] [64] [65]

[66] [67]


Notes Geom. Topology 1, International Press, Cambridge, Mass., 1994. MR 97d:53001 452 D. W. STROOCK, “Estimates on the heat kernel for second order divergence form operators” in Probability Theory (Singapore, 1989), ed. L. H. Y. Chen, K. P. Choi, K. Hu, and J. H. Lou, de Gruyter, Berlin, 1992, 29–44. MR 93m:35092 467, 505 K-T. STURM, Analysis on local Dirichlet spaces, III: The parabolic Harnack inequality, J. Math. Pures Appl. (9) 75 (1996), 273–297. MR 97k:31010 454 M. TAKEDA, On a martingale method for symmetric diffusion processes and its applications, Osaka J. Math. 26 (1989), 605–623. MR 91d:60193 465 A. TELCS, Random walks on graphs, electric networks and fractals, Probab. Theory Related Fields 82 (1989), 435–449. MR 90h:60065 453, 458, 465, 475, 477, 478 , Spectra of graphs and fractal dimensions, I, Probab. Theory Related Fields 85 (1990), 489–497. MR 91k:60075 478 , Spectra of graphs and fractal dimensions, II, J. Theoret. Probab. 8 (1995), 77–96. MR 96d:60107 478 , Transition probability estimates for reversible Markov chains, Electron. Comm. Probab. 5 (2000), 29–37, http://www.math.washington.edu/˜ejpecp/ MR 2001b:60088 458 N. TH. VAROPOULOS, Hardy-Littlewood theory for semigroups, J. Funct. Anal. 63 (1985), 240–260. MR 87a:31011 469 , Analysis on nilpotent groups, J. Funct. Anal. 66 (1986), 406–431. MR 88h:22014 452 N. TH. VAROPOULOS, L. SALOFF-COSTE, and T. COULHON, Analysis and Geometry on Groups, Cambridge Tracts in Math. 100, Cambridge Univ. Press, Cambridge, 1992. MR 95f:43008 452 W. WOESS, Random walks on infinite graphs and groups—a survey on selected topics, Bull. London Math. Soc. 26 (1994), 1–60. MR 94i:60081 452, 459 X. Y. ZHOU, Resistance dimension, random walk dimension and fractal dimension, J. Theoret. Probab. 6 (1993), 635–652. MR 95d:60119 453

Grigor’yan Department of Mathematics, Imperial College, 180 Queen’s Gate, London SW7 2BZ, United Kingdom; [email protected]
Telcs International Management Center, Graduate School of Business, Budapest, Zrinyi u. 14, Budapest H-1051, Hungary; [email protected]


THE GLOBAL NILPOTENT VARIETY IS LAGRANGIAN VICTOR GINZBURG

Abstract
The purpose of this paper is to present a short elementary proof of a theorem due to G. Faltings and G. Laumon, which says that the global nilpotent cone is a Lagrangian substack in the cotangent bundle of the moduli space of G-bundles on a complex compact curve. This result plays a crucial role in the geometric Langlands program (see [BD]) since it ensures that the D-modules on the moduli space of G-bundles whose characteristic variety is contained in the global nilpotent cone are automatically holonomic and, in particular, have finite length.

Let $(M, \omega)$ be a smooth symplectic algebraic variety. A (possibly singular) algebraic subvariety $Y \subset M$ is said to be isotropic, respectively, Lagrangian, if the tangent space $T_yY$ at any regular point $y \in Y$ is an isotropic, respectively, Lagrangian, vector subspace in the symplectic vector space $T_yM$. (We always assume $Y$ to be reduced but not necessarily irreducible.) The following characterisation of isotropic subvarieties, proved, for example, in [CG, Prop. 1.3.30], is used later: $Y \subset M$ is isotropic if and only if, for any smooth locally closed subvariety $W \subset Y$, we have $\omega|_W = 0$. (Here $W$ is possibly contained in the singular locus of $Y$.) An advantage of this characterisation is that it allows us to extend the notion of “being isotropic” from algebraic subvarieties to semialgebraic constructible subsets. Thus, we call a semialgebraic constructible subset $Y \subset M$ isotropic if $\omega|_W = 0$ for any smooth locally closed algebraic variety $W \subset Y$.

Now, let $M$ be a smooth stack that can be locally presented as $p : \widetilde{M} \twoheadrightarrow M$, where $\widetilde{M}$ is a smooth algebraic variety and $p$ is a smooth surjective morphism; for example, $M$ is locally isomorphic to the quotient of a smooth algebraic variety modulo an algebraic action of an algebraic group (see, e.g., [LMB]). We have a natural diagram
$$T^*\widetilde{M} \hookleftarrow T^*M \times_M \widetilde{M} \twoheadrightarrow T^*M.$$

DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3, Received 16 August 2000. Revision received 13 October 2000.
2000 Mathematics Subject Classification. Primary 53D12, 14D20.


A substack $Z \subset T^*M$ is said to be constructible, respectively, isotropic or Lagrangian, if $Z \times_M \widetilde{M}$ is a constructible, respectively, isotropic or Lagrangian, subset of $T^*\widetilde{M}$ relative to the standard symplectic structure on the cotangent bundle of a smooth variety.

Let $X$ be a smooth complex compact connected algebraic curve of genus $g > 1$, and let $G$ be a complex semisimple* group. Below we write $\mathrm{Bun}_G$ for the moduli space of principal algebraic $G$-bundles on $X$, regarded as a stack (cf. [LMB]) rather than a scheme. In particular, no stability conditions on principal bundles $P \in \mathrm{Bun}_G$ are imposed. Given a principal $G$-bundle $P$, let $\mathfrak{g}_P$ and $\mathfrak{g}^*_P$ denote the associated vector bundles corresponding to the adjoint and coadjoint representations of $G$, respectively.

Let $\mathcal{P}$ be the universal bundle on $\mathrm{Bun}_G \times X$, and let $p : \mathrm{Bun}_G \times X \to \mathrm{Bun}_G$ be the projection. The cotangent stack, $T^*\mathrm{Bun}_G$, is a stack (see [BD, Sec. 1.1.1]) that is relatively representable over $\mathrm{Bun}_G$ by the affine spectrum of the sheaf of algebras $\mathrm{Sym}(R^1p_*\mathfrak{g}_{\mathcal{P}})$, the symmetric algebra of the first derived pushforward sheaf. Note that for all $i > 1$ we have $R^ip_*\mathfrak{g}_{\mathcal{P}} = 0$ since $\dim X = 1$. Hence, the formation $R^1p_*(-)$ is right-exact and therefore commutes with base change. For a scheme $S$ and a morphism $S \to \mathrm{Bun}_G$, write $P(S)$ for the pullback of the universal bundle to $S \times X$. Using the base change, one obtains the following (stack version of the) Kodaira-Spencer formula for the set of $S$-points of the fiber of $T^*\mathrm{Bun}_G$ over $P(S)$:
$$\begin{aligned}
T^*_{P(S)}\mathrm{Bun}_G &= \Gamma\bigl(S, \mathcal{H}om_{\mathrm{Coh}}(R^1p_*\mathfrak{g}_{P(S)}, \mathcal{O}_S)\bigr) = \Gamma\bigl(S, \mathcal{H}om_{D^b(\mathrm{Coh})}(Rp_*\mathfrak{g}_{P(S)}[1], \mathcal{O}_S)\bigr) \\
&= \Gamma\bigl(S, \mathcal{H}om_{D^b(\mathrm{Coh})}(\mathfrak{g}_{P(S)}, Rp^!\mathcal{O}_S[-1])\bigr) = \Gamma(S \times X, \mathfrak{g}^*_{P(S)} \otimes \Omega_{X \times S/S}) \\
&\simeq \Gamma(S \times X, \mathfrak{g}_{P(S)} \otimes \Omega_{X \times S/S}),
\end{aligned} \tag{1}$$
where the second isomorphism exploits the fact that the complex $Rp_*\mathfrak{g}_{P(S)}[1]$ is concentrated in nonpositive degrees and the last isomorphism uses the identification $\mathfrak{g}^*_P \simeq \mathfrak{g}_P$ induced by the Killing form on $\mathfrak{g}$.
* It is not hard to extend our results to any reductive group, but that would lead to unpleasant dimension shifts in various formulas below, so we restrict ourselves to the semisimple case.

Write $\mathcal{N}$ for the nilpotent cone in $\mathfrak{g}$, the zero variety of the set of $\mathrm{Ad}\,G$-invariant polynomials on $\mathfrak{g}$ without constant term. Choose a Borel subgroup $B \subset G$ with Lie algebra $\mathfrak{b}$, and let $\mathfrak{n}$ denote the nilradical of $\mathfrak{b}$. B. Kostant proved in [Ko] (see also [CG, Chap. 6]) that $\mathcal{N}$ is equal, as a subscheme of $\mathfrak{g}$, to the image of the Springer resolution, the morphism $G \times_B \mathfrak{n} \to \mathfrak{g}$ given by the assignment $(g, x) \mapsto \mathrm{Ad}\,g(x)$.

Given a scheme $S$ and a $G$-bundle $P$ on $S \times X$, we choose local trivialisations of the vector bundles $\mathfrak{g}_P$ and $\Omega_X$, and we view a local section $x$ of $\mathfrak{g}_P \otimes \Omega_X$ as a function $S \times X \to \mathfrak{g}$. The section $x$ is called nilpotent if the corresponding function gives a morphism $S \times X \to \mathcal{N} \subset \mathfrak{g}$. The notion of a nilpotent section does not depend on the choices of trivialisations involved. Following Laumon [La], define the global nilpotent cone as a closed (nonreduced) substack $\mathcal{N}ilp \subset T^*\mathrm{Bun}_G$ whose set of $S$-points is
$$\mathcal{N}ilp(S) = \{(P, x) \mid x \in \Gamma(S \times X, \mathfrak{g}_P \otimes \Omega_X),\ x \text{ is a nilpotent section}\},$$
where $P$ runs over $G$-bundles on $S \times X$.

MAIN THEOREM
$\mathcal{N}ilp$ is a Lagrangian substack in $T^*\mathrm{Bun}_G$.

Remark. This theorem was first proved, in the special case $G = SL_n$, by Laumon [La]. His argument cannot be generalised to arbitrary semisimple groups. In the general case, the theorem was proved by Faltings [Fa, Th. II.5]. The proof below seems to be more elementary than that of Faltings; it is based on nothing but a few general results of symplectic geometry. Another proof of the theorem is given in [BD]. That proof is more complicated; however, it potentially leads to a description of the irreducible components of $\mathcal{N}ilp$.

We begin with a few general lemmas. Let $(M_1, \omega_1)$ and $(M_2, \omega_2)$ be complex algebraic symplectic manifolds, and let $\mathrm{pr}_i : M_1 \times M_2 \to M_i$ be the projections. We regard $M_1 \times M_2$ as a symplectic manifold with symplectic form $\mathrm{pr}_1^*\omega_1 - \mathrm{pr}_2^*\omega_2$, involving the minus sign on the second factor. The following result is a special case of [CG, Prop. 2.7.51].

LEMMA 1
Let $\Lambda_1 \subset M_1$ and $\Lambda \subset M_1 \times M_2$ be smooth algebraic isotropic subvarieties. Then $\mathrm{pr}_2\bigl(\mathrm{pr}_1^{-1}(\Lambda_1) \cap \Lambda\bigr) \subset M_2$ is an isotropic subvariety.

Proof
Set $Y := \mathrm{pr}_1^{-1}(\Lambda_1) \cap \Lambda$. Simple linear algebra shows that, for any $y \in Y$, the image of the tangent map $(\mathrm{pr}_2)_* : T_yY \to T_{\mathrm{pr}_2(y)}M_2$ is isotropic. We use the characterisation of isotropic subvarieties mentioned at the beginning of the paper. Let $W \subset \Lambda_2 := \mathrm{pr}_2(Y)$ be an irreducible smooth subvariety. Observe that the map $\mathrm{pr}_2 : \mathrm{pr}_2^{-1}(W) \cap Y \to W$ is surjective. Hence, there exists a nonempty smooth Zariski-open dense subset $Y' \subset \bigl(\mathrm{pr}_2^{-1}(W) \cap Y\bigr)_{\mathrm{red}}$ such that the restriction $\mathrm{pr}_2 : Y' \to W$ has surjective differential at any point of $Y'$. Therefore, the tangent space at the generic point of $W$ is isotropic. Whence the tangent space at every point of $W$ is isotropic by continuity. It follows that any smooth subvariety of $\Lambda_2$ is isotropic, and the lemma follows.
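The “simple linear algebra” step in the proof of Lemma 1 can be spelled out as follows. This is only a sketch, and the splitting of tangent vectors into components $u = (u_1, u_2)$ is notation introduced here for illustration, not taken from the paper. For $u, v \in T_yY$, write $u = (u_1, u_2)$ and $v = (v_1, v_2)$ with $u_i, v_i$ tangent to $M_i$. Both vectors are tangent to $\Lambda$, which is isotropic for $\mathrm{pr}_1^*\omega_1 - \mathrm{pr}_2^*\omega_2$, and $u_1, v_1$ are tangent to the isotropic subvariety $\Lambda_1$. Hence
$$\omega_2(u_2, v_2) = \omega_1(u_1, v_1) - (\mathrm{pr}_1^*\omega_1 - \mathrm{pr}_2^*\omega_2)(u, v) = 0 - 0 = 0,$$
so the image $(\mathrm{pr}_2)_*(T_yY)$ is an isotropic subspace of $T_{\mathrm{pr}_2(y)}M_2$.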


Given a manifold $N$, we write $\lambda_N$ for the canonical 1-form on $T^*N$, usually denoted “$p\,dq$,” such that $d\lambda_N$ is the canonical symplectic 2-form on $T^*N$. Let $f : N_1 \to N_2$ be a morphism of smooth algebraic varieties. Identify $T^*(N_1 \times N_2)$ with $T^*N_1 \times T^*N_2$ via the standard map multiplied by $(-1)$ on the factor $T^*N_2$. The canonical 1-form on $T^*(N_1 \times N_2)$ becomes, under the above identification, equal to $\mathrm{pr}_1^*(\lambda_{N_1}) - \mathrm{pr}_2^*(\lambda_{N_2})$. We endow $T^*N_1 \times T^*N_2$ with the corresponding symplectic form $\mathrm{pr}_1^*(d\lambda_{N_1}) - \mathrm{pr}_2^*(d\lambda_{N_2})$; it is induced from the canonical symplectic form on $T^*(N_1 \times N_2)$ via the identification. Introduce the following closed subvariety:
$$Y_f = \{((n_1, \alpha_1), (n_2, \alpha_2)) \in T^*N_1 \times T^*N_2 \mid n_2 = f(n_1),\ \alpha_1 = 0 = f^*(\alpha_2)\}. \tag{3}$$

LEMMA 2
The image of $Y_f$ under the second projection $\mathrm{pr}_2 : T^*N_1 \times T^*N_2 \to T^*N_2$ is an isotropic subvariety in $T^*N_2$.

Proof
Using the above explained identification of $T^*(N_1 \times N_2)$ with $T^*N_1 \times T^*N_2$ involving a sign, the conormal bundle to the graph of $f$ can be written as the subvariety
$$\Lambda = \{((n_1, \alpha_1), (n_2, \alpha_2)) \in T^*N_1 \times T^*N_2 \mid n_2 = f(n_1),\ \alpha_1 = f^*(\alpha_2)\}.$$
Observe that the canonical 1-form $\mathrm{pr}_1^*(\lambda_{N_1}) - \mathrm{pr}_2^*(\lambda_{N_2})$ on $T^*N_1 \times T^*N_2$ vanishes identically on $\Lambda$. Hence, $\Lambda$ is an isotropic subvariety, and we may apply Lemma 1 to $M_1 = T^*N_1$, $M_2 = T^*N_2$, and $\Lambda_1 = T^*_{N_1}N_1$ = zero-section, and to $\Lambda$ above. Observe now that we have by definition $Y_f = \Lambda \cap \mathrm{pr}_1^{-1}(T^*_{N_1}N_1)$. Hence, by Lemma 1 the subvariety $\mathrm{pr}_2(Y_f)$ is isotropic.
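The vanishing of the canonical 1-form on the conormal bundle $\Lambda$ can be checked pointwise. The following is a sketch; the symbols $\dot n_1, \dot n_2$ for the base components of a tangent vector are our own notation. Let $\xi$ be a tangent vector to $\Lambda$ at the point $((n_1, \alpha_1), (n_2, \alpha_2))$, with base components $\dot n_1 \in T_{n_1}N_1$ and $\dot n_2 \in T_{n_2}N_2$. Differentiating the relation $n_2 = f(n_1)$ gives $\dot n_2 = df(\dot n_1)$, and $\alpha_1 = f^*(\alpha_2)$ holds on $\Lambda$, so
$$(\mathrm{pr}_1^*\lambda_{N_1} - \mathrm{pr}_2^*\lambda_{N_2})(\xi) = \langle \alpha_1, \dot n_1 \rangle - \langle \alpha_2, \dot n_2 \rangle = \langle f^*\alpha_2, \dot n_1 \rangle - \langle \alpha_2, df(\dot n_1) \rangle = 0.$$
Taking the exterior derivative then shows that the symplectic form $\mathrm{pr}_1^*(d\lambda_{N_1}) - \mathrm{pr}_2^*(d\lambda_{N_2})$ restricts to zero on any smooth locally closed subvariety of $\Lambda$.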

LEMMA 3
If $N_1$ and $N_2$ are smooth algebraic stacks and $f : N_1 \to N_2$ is a representable morphism of finite type, then the assertion of Lemma 2 remains valid.

Proof
Due to locality of the claim, we may (and do) assume that $N_2$ is quasi-compact. Let $\widetilde{N}_2$ be a smooth algebraic variety, and let $\widetilde{N}_2 \to N_2$ be a smooth surjective equidimensional morphism. Set $\widetilde{N}_1 := \widetilde{N}_2 \times_{N_2} N_1$. Note that the set $Y_f \subset T^*N_1 \times T^*N_2$ defined in (3) may be viewed as a subset in $T^*N_2 \times_{N_2} N_1$. Therefore, we have
$$Y_f \times_{N_2} \widetilde{N}_2 \subset T^*N_2 \times_{N_2} \widetilde{N}_2 \times_{N_2} N_1 = T^*N_2 \times_{N_2} \widetilde{N}_1.$$
We must show that the image of $Y_f \times_{N_2} \widetilde{N}_2$ is an isotropic subvariety in $T^*\widetilde{N}_2$. Let $F : \widetilde{N}_1 \to \widetilde{N}_2$ be the natural morphism, and let $Y_F \subset T^*\widetilde{N}_2 \times_{\widetilde{N}_2} \widetilde{N}_1$ be the corresponding subvariety of Lemma 2. Observe that $T^*\widetilde{N}_2 \times_{\widetilde{N}_2} \widetilde{N}_1 = T^*\widetilde{N}_2 \times_{N_2} N_1$ and $Y_f \times_{N_2} \widetilde{N}_2 = Y_F$. Hence, Lemma 2 applied to $F$ shows that the image of $Y_F = Y_f \times_{N_2} \widetilde{N}_2$ in $T^*\widetilde{N}_2$ is isotropic. The claim follows.

Choose a Borel subgroup $B \subset G$ with Lie algebra $\mathfrak{b}$, and let $\mathfrak{n}$ denote the nilradical of $\mathfrak{b}$. Given a field $K$ of characteristic zero, write $G(K)$, $B(K)$, $\mathfrak{b}(K)$, and so on, for the corresponding sets of $K$-rational points. The following result seems to be well known; it is included here for the reader’s convenience.

LEMMA 4
For any field $K \supset \mathbb{C}$ and any $x \in \mathcal{N}(K)$, there exists an element $g \in G(K)$ such that $\mathrm{Ad}\,g(x) \in \mathfrak{n}(K)$.

Proof
Since the Jacobson-Morozov theorem holds for any field of characteristic zero, one may find an $\mathfrak{sl}_2$-triple $(x, h, x^-) \subset \mathfrak{g}(K)$ associated to the given nilpotent element $x \in \mathfrak{g}(K)$. The eigenspaces of the semisimple endomorphism $\mathrm{ad}\,h : \mathfrak{g} \to \mathfrak{g}$ corresponding to nonnegative eigenvalues span a parabolic subalgebra $\mathfrak{p}_x \subset \mathfrak{g}$, which is defined over $K$. Writing $\mathfrak{u}_x$ for the nilradical of $\mathfrak{p}_x$, by construction we have $x \in \mathfrak{u}_x(K)$. Clearly, if $\mathfrak{b}_x \subset \mathfrak{p}_x$ is a Borel subalgebra defined over $K$ and $\mathfrak{n}_x$ is its nilradical, then $x \in \mathfrak{u}_x(K) \subset \mathfrak{n}_x(K)$. Thus, it suffices to prove that the parabolic $\mathfrak{p}_x$ contains a Borel subalgebra defined over $K$. To this end, let $\mathcal{P}$ denote the partial flag variety of all parabolics in $\mathfrak{g}$ of type $\mathfrak{p}_x$. There is a unique $\mathfrak{p} \in \mathcal{P}(K)$ such that $\mathfrak{p} \supset \mathfrak{b}$, where $\mathfrak{b}$ is our fixed Borel subalgebra. Now, the group $G(K)$ acts transitively on $\mathcal{P}(K)$. (This follows easily from the Bruhat decomposition; see [Ja].) We deduce that there exists $g \in G(K)$ such that $\mathrm{Ad}\,g(\mathfrak{p}) = \mathfrak{p}_x$. But then $\mathrm{Ad}\,g(\mathfrak{b}) \subset \mathfrak{p}_x$ is a Borel subalgebra defined over $K$, and we are done.

Let $f_1, \ldots, f_r$ ($r = \mathrm{rk}\,\mathfrak{g}$) be a set of homogeneous free generators of $\mathbb{C}[\mathfrak{g}]^G$, the algebra of $G$-invariant polynomials on $\mathfrak{g}$. Let $d_i = \deg f_i$ be the exponents of $\mathfrak{g}$. Following N. Hitchin [Hi], we put
$$\mathrm{Hitch} := \Gamma(X, \Omega_X^{\otimes d_1}) \oplus \cdots \oplus \Gamma(X, \Omega_X^{\otimes d_r}).$$
This is an affine space of dimension equal to $\dim \mathrm{Bun}_G$. (At this point it is used (see [Hi]) that genus($X$) is greater than 1.) Hitchin has defined a morphism $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ by assigning to any pair $(s, P) \in T^*\mathrm{Bun}_G$, where $P \in \mathrm{Bun}_G$ and $s \in T^*_P\mathrm{Bun}_G \simeq \Gamma(X, \mathfrak{g}_P \otimes \Omega_X)$ (see (1)), the element $\pi(s, P) = \oplus_{i=1}^r f_i(s) \in \mathrm{Hitch}$. It is immediate from the construction that the global nilpotent variety is the fiber of $\pi$ over the zero element $0 \in \mathrm{Hitch}$.
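The dimension claim for the Hitchin base can be sanity-checked numerically. The sketch below rests on two standard facts that are assumptions here rather than statements from the text: by Riemann-Roch, $\dim \Gamma(X, \Omega_X^{\otimes d}) = (2d - 1)(g - 1)$ when $g > 1$ and $d \geq 2$, and $\dim \mathrm{Bun}_G = (g - 1)\dim G$. The function names and the degree lists (2, ..., n for $SL_n$; 2, 8, 12, 14, 18, 20, 24, 30 for $E_8$) are supplied for illustration only.

```python
def hitchin_base_dimension(degrees, genus):
    """Sum of dim Gamma(X, Omega^d) over the invariant degrees d.

    Assumes genus > 1 and every degree >= 2, so that Riemann-Roch gives
    dim Gamma(X, Omega^d) = (2d - 1)(genus - 1).
    """
    if genus < 2 or any(d < 2 for d in degrees):
        raise ValueError("formula only valid for genus > 1 and degrees >= 2")
    return sum((2 * d - 1) * (genus - 1) for d in degrees)


def bun_g_dimension(group_dimension, genus):
    """Dimension of the stack Bun_G, assumed to be (genus - 1) * dim G."""
    return group_dimension * (genus - 1)


if __name__ == "__main__":
    # SL_n: invariant degrees 2, ..., n and dim SL_n = n^2 - 1.
    for n in range(2, 7):
        for genus in range(2, 6):
            degrees = list(range(2, n + 1))
            assert hitchin_base_dimension(degrees, genus) == bun_g_dimension(n * n - 1, genus)
    # E_8: dim 248.
    e8_degrees = [2, 8, 12, 14, 18, 20, 24, 30]
    assert hitchin_base_dimension(e8_degrees, 2) == bun_g_dimension(248, 2)
    print("dim Hitch == dim Bun_G in all tested cases")
```

The check works because the degrees satisfy $\sum_i (2d_i - 1) = \dim G$, which is where semisimplicity (all $d_i \geq 2$) and the genus assumption enter.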


Remark. Hitchin actually worked in the setup of stable Higgs bundles and not in the setup of stacks. But his construction of the map $\pi$ extends to the stack setup verbatim. We make no use of any additional properties of the map $\pi$ established in [Hi].

LEMMA 5
$\mathcal{N}ilp$ is an isotropic substack in $T^*\mathrm{Bun}_G$.

Proof
Write $N_1 = \mathrm{Bun}_B$ for the moduli stack of principal $B$-bundles. By an old result of G. Harder [Ha], any $G$-bundle on a curve has a $B$-reduction; hence the natural morphism of stacks $f : \mathrm{Bun}_B \to \mathrm{Bun}_G$ is surjective. Let $P$ be an algebraic $G$-bundle on the curve $X$, and let $s$ be a nilpotent regular section of $\mathfrak{g}_P \otimes \Omega_X$. Harder showed, further, using a key rationality result of R. Steinberg [St], that the bundle $P$ is locally trivial in the Zariski topology. Thus, trivializing $P$ on the generic point of $X$, one may identify the restriction of $s$ to the generic point with a nilpotent element of $\mathfrak{g}(K)$, where $K = \mathbb{C}(X)$ is the field of rational functions on $X$. Hence, Lemma 4 implies that there exists a $B$-reduction of $P$ over the generic point of $X$ such that $s \in \mathfrak{n}_P \otimes \Omega_X$. Here, a $B$-reduction is a section of the associated bundle $B\backslash P$. The fibers of $B\backslash P$ being projective varieties (isomorphic to $B\backslash G$), any section of $B\backslash P$ defined over the generic point of $X$ extends to the whole of $X$. Thus, there exists a $B$-reduction over $X$ of the $G$-bundle $P$ such that $s \in \mathfrak{n}_P \otimes \Omega_X$.

Further, let $k \supset \mathbb{C}$ be a field, and let $S = \mathrm{Spec}(k)$. For any $P \in \mathrm{Bun}_B(k)$, we have
$$T^*_P\mathrm{Bun}_B = H^1(X, \mathfrak{b}_P)^* = H^0(X, \mathfrak{b}^*_P \otimes \Omega_X) = H^0\bigl(X, (\mathfrak{g}_P/\mathfrak{n}_P) \otimes \Omega_X\bigr).$$
It follows that in the notation of Lemma 3, for $N_1 = \mathrm{Bun}_B$ and $N_2 = \mathrm{Bun}_G$, we have $\mathcal{N}ilp = \mathrm{pr}_2(Y_f)$. Observe further that $\mathrm{Bun}_B$ is the union of a countable family of open substacks, each of finite type over $\mathrm{Bun}_G$. Thus, Lemma 3 implies that $\mathcal{N}ilp$ is the union of a countable family of isotropic substacks.

We claim that if an algebraic stack $\mathcal{S}$ is the union of a countable family $\{\mathcal{S}_i\}_{i \in \mathbb{N}}$ of locally closed substacks, then any quasi-compact substack of $\mathcal{S}$ can be covered by finitely many $\mathcal{S}_i$’s. To prove this, we may assume without loss of generality that $\mathcal{S}$ is itself quasi-compact. Choose a smooth surjective morphism $\widetilde{S} \twoheadrightarrow \mathcal{S}$, where $\widetilde{S}$ is a scheme of finite type. By our assumptions there exists a countable family $\{\widetilde{S}_i\}_{i \in \mathbb{N}}$ of locally closed subschemes of $\widetilde{S}$ such that $\widetilde{S} = \cup_i \widetilde{S}_i$. Hence, there exists $n \gg 0$ such that $\widetilde{S}$ equals the union of the closures of $\widetilde{S}_1, \ldots, \widetilde{S}_n$, thanks to the Baire theorem. Hence, $\widetilde{S}_1 \cup \cdots \cup \widetilde{S}_n$ is Zariski dense in $\widetilde{S}$, and $\dim\bigl(\widetilde{S} \smallsetminus (\widetilde{S}_1 \cup \cdots \cup \widetilde{S}_n)\bigr) < \dim \widetilde{S}$. Arguing by induction on $\dim \widetilde{S}$, we deduce that $\widetilde{S}$ is covered by finitely many $\widetilde{S}_i$’s. Hence, for any field $k \supset \mathbb{C}$, the quasi-compact set $\mathcal{S}(k)$ is covered by finitely many subsets $\mathcal{S}_i(k)$.

Thus, we have proved that any open quasi-compact substack of $\mathcal{N}ilp$ can be covered by finitely many isotropic substacks. This implies that $\mathcal{N}ilp$ is itself isotropic.

PROPOSITION 1
There is an equality $\dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$.

Proof
We observe that, since $\mathrm{Bun}_G$ is an equidimensional smooth stack, each irreducible component of $T^*\mathrm{Bun}_G$ has dimension greater than or equal to $2\dim \mathrm{Bun}_G$. To see this, one may replace $\mathrm{Bun}_G$ by an open quasi-compact substack $Y$, which admits a smooth surjective morphism $p : \widetilde{Y} \twoheadrightarrow Y$, where $\widetilde{Y}$ is a smooth algebraic variety and fibers of $p$ are purely $m$-dimensional. Then, $T^*Y \times_Y \widetilde{Y}$ is a closed subscheme of $T^*\widetilde{Y}$ locally defined by $m$ equations. Hence, for each irreducible component $T^*_j$ of $T^*Y$ we find
$$\dim T^*_j = \dim(T^*_j \times_Y \widetilde{Y}) - m \geq (\dim T^*\widetilde{Y} - m) - m = 2(\dim \widetilde{Y} - m) = 2\dim Y.$$
It follows that each irreducible component of any fiber of the Hitchin morphism $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ has dimension greater than or equal to $2\dim \mathrm{Bun}_G - \dim \mathrm{Hitch} = \dim \mathrm{Bun}_G$. But we have proved that each component of $\mathcal{N}ilp = \pi^{-1}(0)$ is an isotropic subvariety. Thus, $\dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$.

Remark. Although the inequality $\dim T^*\mathrm{Bun}_G \geq 2\dim \mathrm{Bun}_G$ is no longer true if $X$ has genus one or zero, it has been shown in [BD] that the stack $\mathcal{N}ilp$ still has pure dimension equal to $\dim \mathrm{Bun}_G$.

COROLLARY 1
Every irreducible component of any fiber of $\pi$ has dimension equal to $\dim \mathrm{Bun}_G$. In particular, the Hitchin morphism $\pi$ is flat.

Proof
There is a natural $\mathbb{C}^*$-action on Hitch such that $t \in \mathbb{C}^*$ acts on the direct summand $\Gamma(X, \Omega_X^{\otimes d_i}) \subset \mathrm{Hitch}$ via multiplication by $t^{d_i}$. The map $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ is $\mathbb{C}^*$-equivariant relative to the above-defined $\mathbb{C}^*$-action on Hitch and to the standard $\mathbb{C}^*$-action on $T^*\mathrm{Bun}_G$ by dilations along the fibers, respectively. Clearly, zero is the only fixed point of the $\mathbb{C}^*$-action on Hitch and, moreover, it is contained in the closure of any other $\mathbb{C}^*$-orbit on Hitch. It follows, since the dimension of any irreducible component of any fiber is less than or equal to the dimension of the special fiber, that for any $h \in \mathrm{Hitch}$ we have $\dim \pi^{-1}(h) \leq \dim \pi^{-1}(0)$. But for curves of genus greater than 1, Proposition 1 yields $\dim \pi^{-1}(0) = \dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$. Thus, for any $h \in \mathrm{Hitch}$ we get $\dim \pi^{-1}(h) \leq \dim \mathrm{Bun}_G$. On the other hand, the dimension of each irreducible component of any fiber of the morphism $\pi : T^*\mathrm{Bun}_G \to \mathrm{Hitch}$ is no less than $\dim T^*\mathrm{Bun}_G - \dim \mathrm{Hitch} = \dim \mathrm{Bun}_G$. This proves the opposite inequality.

The proof of the main theorem is completed by the following stronger result.

THEOREM 1
The stack $T^*\mathrm{Bun}_G$ is a local complete intersection, and $\mathcal{N}ilp$ is a Lagrangian complete intersection in $T^*\mathrm{Bun}_G$.

Proof
The claims being local, we may replace $\mathrm{Bun}_G$ by an open quasi-compact substack $Y$ which admits a presentation of the form $p : \widetilde{Y} \twoheadrightarrow Y$, where $\widetilde{Y}$ is a smooth algebraic variety and $p$ is a smooth surjective morphism with fibers of pure dimension $m$. As we have observed in the proof of Proposition 1, $T^*Y \times_Y \widetilde{Y}$ is a closed subscheme of $T^*\widetilde{Y}$ locally defined by $m$ equations and, moreover, $\dim T^*Y \geq 2\dim \mathrm{Bun}_G$. On the other hand, since fibers of the Hitchin map $\pi$ have dimension equal to $\dim \mathrm{Bun}_G$, we find $\dim T^*Y \leq \dim \mathrm{Bun}_G + \dim \mathrm{Hitch} = 2\dim \mathrm{Bun}_G$. This proves that the stack $T^*\mathrm{Bun}_G$ is a local complete intersection. Further, $\mathcal{N}ilp$ being the zero fiber of the surjective morphism $\pi$, it is defined by $\dim \mathrm{Bun}_G$ equations in $T^*\mathrm{Bun}_G$. The equality $\dim \mathcal{N}ilp = \dim \mathrm{Bun}_G$ (Proposition 1) implies that $\mathcal{N}ilp$ is a complete intersection in $T^*\mathrm{Bun}_G$. Finally, since $\mathcal{N}ilp$ is equidimensional (Corollary 1), Lemma 5 implies that $\mathcal{N}ilp$ is Lagrangian.

Acknowledgments. I am grateful to V. Drinfeld for his invaluable help and also for his extreme patience while answering my numerous foolish questions.

References
[BD] A. BEILINSON and V. DRINFELD, Quantization of Hitchin’s integrable system and Hecke eigensheaves, preprint, 2000, http://www.math.uchicago.edu/~benzvi
[CG] N. CHRISS and V. GINZBURG, Representation Theory and Complex Geometry, Birkhäuser, Boston, 1997. MR 98i:22021
[Fa] G. FALTINGS, Stable G-bundles and projective connections, J. Algebraic Geom. 2 (1993), 507–568. MR 94i:14015
[Ha] G. HARDER, Halbeinfache Gruppenschemata über Dedekindringen, Invent. Math. 4 (1967), 165–191. MR 37:1378
[Hi] N. HITCHIN, Stable bundles and integrable systems, Duke Math. J. 54 (1987), 91–114. MR 88i:58068
[Ja] J. C. JANTZEN, Representations of Algebraic Groups, Pure Appl. Math. 131, Academic Press, Boston, 1987. MR 89c:20001
[Ko] B. KOSTANT, Lie group representations on polynomial rings, Amer. J. Math. 85 (1963), 327–404. MR 28:1252
[La] G. LAUMON, Un analogue global du cône nilpotent, Duke Math. J. 57 (1988), 647–671. MR 90a:14012
[LMB] G. LAUMON and L. MORET-BAILLY, Champs algébriques, Ergeb. Math. Grenzgeb. (3) 39, Springer, Berlin, 2000. MR CMP 1 771 927
[St] R. STEINBERG, Regular elements of semisimple algebraic groups, Inst. Hautes Études Sci. Publ. Math. 25 (1965), 49–80. MR 31:4788

University of Chicago, Department of Mathematics, Chicago, Illinois 60637, USA; [email protected]


SHARP ESTIMATES FOR THE ARITHMETIC NULLSTELLENSATZ TERESA KRICK, LUIS MIGUEL PARDO, AND MARTÍN SOMBRA

Abstract
We present sharp estimates for the degree and the height of the polynomials in the Nullstellensatz over the integer ring Z. The result improves previous work of P. Philippon, C. Berenstein and A. Yger, and T. Krick and L. M. Pardo. We also present degree and height estimates of intrinsic type, which depend mainly on the degree and the height of the input polynomial system. As an application we derive an effective arithmetic Nullstellensatz for sparse polynomial systems. The proof of these results relies heavily on the notion of local height of an affine variety defined over a number field. We introduce this notion and study its basic properties.

Contents
Introduction
1. Height of polynomials and varieties
   1.1. Height of polynomials
   1.2. Height of varieties
2. Estimates for local and global heights
   2.1. Estimates for Chow forms
   2.2. Basic properties of the height
   2.3. Local height of norms and traces
3. An effective arithmetic Nullstellensatz
   3.1. Division modulo complete intersection ideals
   3.2. An effective arithmetic Nullstellensatz
4. Intrinsic type estimates
   4.1. Equations in general position
   4.2. An intrinsic arithmetic Nullstellensatz
References

DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3, Received 11 February 2000. Revision received 30 October 2000. 2000 Mathematics Subject Classification. Primary 11G35; Secondary 13P10. Krick and Sombra’s work partially supported by Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Buenos Aires Ciencia y Técnica, and Agencia Nacional de Promoción Científica y Tecnológica (Argentina), and by the Mathematical Sciences Research Institute at Berkeley (USA). Sombra’s work also partially supported by National Science Foundation grant number DMS-97-29992 to the Institute for Advanced Study, Princeton, New Jersey. Pardo’s work partially supported by PB 96-0671-C02-02 (Spain), and by Centre National de la Recherche Scientifique 1026 Mathématiques Effectives, Développements Informatiques, Calcul, Ingénierie et Systèmes (France).

Introduction
Hilbert Nullstellensatz is a cornerstone of algebraic geometry. In a simplified form, its statement is the following: Let $f_1, \ldots, f_s \in \mathbb{Z}[x_1, \ldots, x_n]$ be polynomials such that the equation system
$$f_1(x) = 0, \ \ldots, \ f_s(x) = 0 \tag{1}$$
has no solution in $\mathbb{C}^n$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \ldots, g_s \in \mathbb{Z}[x_1, \ldots, x_n]$ satisfying the Bézout identity
$$a = g_1 f_1 + \cdots + g_s f_s. \tag{2}$$
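As a toy illustration of the Bézout identity (2), chosen here for concreteness and not taken from the paper, consider $n = 1$, $s = 2$, $f_1 = x$, and $f_2 = x - 2$. These have no common zero in $\mathbb{C}$, and
$$2 = 1 \cdot f_1 + (-1) \cdot f_2,$$
so one may take $a = 2$, $g_1 = 1$, $g_2 = -1$. The integer $a$ cannot be replaced by 1 with integer coefficients: modulo 2 both $f_1$ and $f_2$ reduce to $x$, so every combination $g_1 f_1 + g_2 f_2$ with $g_i \in \mathbb{Z}[x]$ has even constant term. This is why the arithmetic statement carries a denominator $a$ whose height must be estimated alongside the $g_i$.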

As for many central results in commutative algebra and algebraic geometry, the Nullstellensatz is an existential noneffective statement. The estimation of both the degree and the height of polynomials satisfying identity (2) became an important and widely considered question. Effective versions of Hilbert Nullstellensatz apply to a wide range of situations in number theory and theoretical computer science. In particular, they decide the consistency of a given polynomial system. In their arithmetic presentation they apply to Łojasiewicz inequalities (see [51], [26]) and to the consistency problem over finite fields (see [28], [22]).

Let $h(f)$ denote the height of an arbitrary polynomial $f \in \mathbb{Z}[x_1, \ldots, x_n]$, defined as the logarithm of the maximum modulus of its coefficients. The main result of this paper is the following effective arithmetic Nullstellensatz.

THEOREM 1
Let $f_1, \ldots, f_s \in \mathbb{Z}[x_1, \ldots, x_n]$ be polynomials without common zeros in $\mathbb{C}^n$. Set $d := \max_i \deg f_i$ and $h := \max_i h(f_i)$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \ldots, g_s \in \mathbb{Z}[x_1, \ldots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \leq 4\,n\,d^n$,
• $h(a), h(g_i) \leq 4\,n\,(n+1)\,d^n\,\bigl(h + \log s + (n+7)\log(n+1)\,d\bigr)$.

As we see below, this result substantially improves all previously known estimates for the arithmetic Nullstellensatz.
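To get a feel for the size of the bounds in Theorem 1, they can be evaluated numerically. The following is a minimal sketch: the function name `theorem1_bounds` is our own, and the use of the natural logarithm is an assumption (the text only says "the logarithm").

```python
import math


def theorem1_bounds(n, d, h, s):
    """Evaluate the degree and height bounds of Theorem 1.

    n: number of variables, d: maximum degree of the f_i,
    h: maximum height of the f_i, s: number of polynomials.
    Returns (degree_bound, height_bound), with natural logarithms assumed.
    """
    degree_bound = 4 * n * d ** n
    height_bound = 4 * n * (n + 1) * d ** n * (
        h + math.log(s) + (n + 7) * math.log(n + 1) * d
    )
    return degree_bound, height_bound


if __name__ == "__main__":
    # Hypothetical input sizes, chosen only to show the growth in n.
    for n in (2, 3, 4):
        deg, height = theorem1_bounds(n=n, d=3, h=10.0, s=n + 1)
        print(f"n={n}: deg g_i <= {deg}, h(a), h(g_i) <= {height:.1f}")
```

Both bounds are dominated by the factor $d^n$, which the lower-bound example discussed in the text shows cannot be avoided in general.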


The following variant of a well-known example due to D. Masser and to Philippon (see [6]) yields a lower bound for any general degree and height estimate. Set
$$f_1 := x_1^d, \quad f_2 := x_1 x_n^{d-1} - x_2^d, \quad \ldots, \quad f_{n-1} := x_{n-2} x_n^{d-1} - x_{n-1}^d, \quad f_n := x_{n-1} x_n^{d-1} - H$$
for any positive integers $n$, $d$, and $H$. These are polynomials of degree $d$ and height bounded by $h := \log H$ without common zeros in $\mathbb{C}^n$. Let $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \ldots, g_n \in \mathbb{Z}[x_1, \ldots, x_n]$ such that
$$a = g_1 f_1 + \cdots + g_n f_n.$$
Specializing this identity at $x_1 := H^{d^{n-2}} t^{d^{n-1}-1}, \ldots, x_{n-1} := H\,t^{d-1}, x_n := 1/t$, we obtain
$$a = g_1\bigl(H^{d^{n-2}} t^{d^{n-1}-1}, \ldots, H\,t^{d-1}, 1/t\bigr)\, H^{d^{n-1}} t^{d^n - d}.$$
We conclude that $\deg g_1 \geq d^n - d$ and $h(a) \geq d^{n-1} h$. In fact, a modified version of this example gives the improved lower bound $h(a) \geq d^n h$ (see Example 3.10). This shows that our estimate is essentially optimal.

The earlier work on the effective Nullstellensatz dealt with the degree bounds. Let $k$ be a field, and let $\bar{k}$ be its algebraic closure; let $f_1, \ldots, f_s \in k[x_1, \ldots, x_n]$ be polynomials of degree bounded by $d$ without common zeros in $\bar{k}^n$. In 1926, G. Hermann [25] (see also [23], [43]) proved that there exist $g_1, \ldots, g_s \in k[x_1, \ldots, x_n]$ such that
$$1 = g_1 f_1 + \cdots + g_s f_s$$
with $\deg g_i f_i \leq 2\,(2d)^{2^{n-1}}$.

After a conjecture of O.-H. Keller and W. Gröbner, this estimate was dramatically improved by W. Brownawell [6] to
$$\deg g_i f_i \leq n^2 d^n + n\,d$$
in case char($k$) = 0, while L. Caniglia, A. Galligo, and J. Heintz [7] showed that $\deg g_i f_i \leq d^{n^2}$ holds in the general case. These results were then independently refined by J. Kollár [29] and by N. Fitchas and Galligo [12] to
$$\deg g_i f_i \leq \max\{3, d\}^n,$$
which is optimal in case $d \geq 3$. For $d = 2$, M. Sombra [53] recently showed that the bound $\deg g_i f_i \leq 2^{n+1}$ holds.

Now, let us consider the height aspect: assume that $f_1, \ldots, f_s \in \mathbb{Z}[x_1, \ldots, x_n]$ are polynomials of degree and height bounded, respectively, by $d$ and $h$. The previous degree bound reduces Bézout identity (2) to a system of $\mathbb{Q}$-linear equations. Applying Cramer rule to this linear system, one obtains an estimate for the height of $a$ and the polynomials $g_i$ of type $s\,d^{n^2}(h + \log s + d)$.
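The specialization used in the Masser and Philippon type example above can be verified with exact rational arithmetic. The sketch below (function names are ours, for illustration) evaluates $f_1, \ldots, f_n$ at the point $x_i = H^{d^{n-1-i}} t^{d^{n-i}-1}$ for $i < n$, $x_n = 1/t$, and checks that $f_2, \ldots, f_n$ vanish while $f_1$ equals $H^{d^{n-1}} t^{d^n - d}$. It assumes $n \geq 2$ and a nonzero rational $t$.

```python
from fractions import Fraction


def specialization_point(n, d, H, t):
    """Point (x_1, ..., x_n) with x_i = H^(d^(n-1-i)) * t^(d^(n-i) - 1), x_n = 1/t."""
    t = Fraction(t)
    point = [
        Fraction(H) ** (d ** (n - 1 - i)) * t ** (d ** (n - i) - 1)
        for i in range(1, n)
    ]
    point.append(1 / t)
    return point


def evaluate_system(n, d, H, point):
    """Values of f_1 = x_1^d, f_i = x_{i-1} x_n^(d-1) - x_i^d, f_n = x_{n-1} x_n^(d-1) - H."""
    x = [None] + list(point)  # shift to 1-based indexing as in the text
    values = [x[1] ** d]
    for i in range(2, n):
        values.append(x[i - 1] * x[n] ** (d - 1) - x[i] ** d)
    values.append(x[n - 1] * x[n] ** (d - 1) - H)
    return values


if __name__ == "__main__":
    n, d, H, t = 4, 3, 2, Fraction(5, 7)
    values = evaluate_system(n, d, H, specialization_point(n, d, H, t))
    # f_2, ..., f_n vanish at the specialized point.
    assert all(v == 0 for v in values[1:])
    # f_1 becomes H^(d^(n-1)) * t^(d^n - d).
    assert values[0] == Fraction(H) ** (d ** (n - 1)) * t ** (d ** n - d)
    print("specialization verified for n=4, d=3")
```

Since every term except $g_1 f_1$ drops out, the identity $a = g_1 f_1 + \cdots + g_n f_n$ forces $g_1$ to cancel the factor $t^{d^n - d}$ and forces $H^{d^{n-1}}$ to divide $a$, which is exactly how the two lower bounds arise.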



However, it was soon conjectured that the true height bound should be much smaller. Philippon [48] obtained the following sharper estimate for the denominator $a$ in the Bézout equation:
$$\deg g_i \leq (n+2)\,d^n, \qquad h(a) \leq \kappa(n)\,d^n\,(h + d),$$
where $\kappa(n)$ depends exponentially on $n$. The first essential progress on height estimates for all the polynomials $g_i$ was achieved by Berenstein and Yger [2], who obtained
$$\deg g_i \leq n\,(2n+1)\,d^n, \qquad h(a), h(g_i) \leq \lambda(n)\,d^{8n+3}\,(h + \log s + d \log d),$$
where $\lambda(n)$ is a (nonexplicit) constant that depends exponentially on $n$. Their proof relies on the previous work of Philippon and on techniques from complex analysis. Later on, Krick and Pardo [31], [32] obtained
$$\deg g_i \leq (n\,d)^{c\,n}, \qquad h(a), h(g_i) \leq (n\,d)^{c\,n}\,(h + \log s + d),$$

where c is a universal constant (c ≤ 35). Their proof, based on duality theory for Gorenstein algebras, is completely algebraic. Finally, Berenstein and Yger [3] improved their height bound to λ(n) d 4n+2 (h + log s + d ) and extended it to the case when Z is replaced by an arbitrary diophantine ring. It should be said, however, that the possibility of such an extension was already clear from the arguments of [32]. We refer the reader to the surveys [58], [1], and [45] for a broad introduction to the history of the effective Nullstellensatz, main results, and open questions. Aside from degree and height estimates, there is a strong current area of research on computational issues (see [19], [13], [32], [18], [17], [22]). There are other results in the recent research papers [50], [30], and [9]. With respect to previous work, in this paper we improve in an almost optimal way the dependence of the height estimate on d n and we eliminate the extraneous exponential constants depending on n. We remark that the polynomials arising in Theorem 1 are a slight variant of the polynomials which appear in [32] and can thus be effectively computed by their algorithm. Although the exponential behavior of the degree and height estimates is—in the worst case—unavoidable, it has been observed that there are many particular instances in which these estimates can be essentially improved. This has motivated the introduction of parameters associated to the input system which identify special families whose behavior with respect to our problem is polynomial instead of exponential. In this spirit, M. Giusti, Heintz, J. Morais, J. Morgenstern, and Pardo [18] introduced the notion of degree of a polynomial system f 1 , . . . , f s . Roughly speaking, this parameter measures the degree of the varieties cut out by f 1 , . . . , f i for



$i = 1, \dots, s-1$. It was soon realized that the degrees in the Nullstellensatz can be controlled in terms of this parameter, giving rise to the so-called “intrinsic Nullstellensätze” (see [18], [33], [17], [52]).

Recently K. Hägele, Morais, Pardo, and Sombra [22] (see also [21]) obtained an arithmetic analogue of these intrinsic Nullstellensätze. To this end, they introduced the notion of height of a polynomial system, the arithmetic analogue of the degree of the system. They obtained degree and height estimates which depend polynomially on the number of variables and on the degree, height, and complexity of the input system. This result followed from their study of the computational complexity of the Nullstellensatz.

In this paper we obtain a dramatic improvement over this result, bringing it to an (apparently) almost optimal form. In particular, we show that the dependence on the degree and the height of the system is linear, and we eliminate the influence of the complexity of the input.

THEOREM 2
Let $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{C}^n$. Set $d := \max_i \deg f_i$ and $h := \max_i h(f_i)$. Let $\delta$ and $\eta$ denote the degree and the height of the polynomial system $f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, \delta$,
• $h(a), h(g_i) \le (n+1)^2\, d\, \big( 2\,\eta + (h + \log s)\,\delta + 21\,(n+1)^2\, d \log(d+1)\, \delta \big)$.

Since $\delta \le d^{n-1}$ and $\eta \le n\, d^{n-1} \big(h + \log s + 3\, n\,(n+1)\, d\big)$ (see Lemma 4.8), one recovers from this statement essentially the same estimates as those of Theorem 1. However, we remark that Theorem 2 is a more flexible result, as there are many situations in which the degree and the height of the input system are smaller than the Bézout bounds. When this is the case, it yields a much more accurate estimate (see Sec. 4.2.2).

An example of the situation when both the degree and the height of the system are smaller than the expected worst-case bounds is the sparse case. To state the result, we first need to introduce some standard notation. The support $\mathrm{Supp}(f_1, \dots, f_s)$ of a polynomial system $f_1, \dots, f_s \in \mathbb{C}[x_1, \dots, x_n]$ is defined as the set of exponents of all the nonzero monomials of all the $f_i$, and the Newton polytope $N(f_1, \dots, f_s) \subset \mathbb{R}^n$ is the convex hull of this support. The (normalized) volume of $f_1, \dots, f_s$ equals $n!$ times the volume of the corresponding Newton polytope.
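The definition of the normalized volume can be made concrete in two variables, where $n!$ times the area of the Newton polytope is simply the absolute value of the shoelace sum over the hull vertices. The following is a minimal sketch under that assumption ($n = 2$ only); the function names `convex_hull` and `normalized_volume_2d` are illustrative and do not come from the paper.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def normalized_volume_2d(support):
    """2! times the area of the convex hull of a set of exponent vectors in Z^2."""
    hull = convex_hull(support)
    twice_area = 0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1  # shoelace term
    return abs(twice_area)


# Support of the system 1, x1, x2, f with f = x1^2 x2^2 + x1 x2 + 1 (degree d = 4).
sparse_support = [(0, 0), (1, 0), (0, 1), (2, 2), (1, 1)]
# Support with the same degree whose Newton polytope is the full simplex.
dense_support = [(0, 0), (4, 0), (0, 4), (1, 1)]

print(normalized_volume_2d(sparse_support), normalized_volume_2d(dense_support))
```

In this toy case the sparse system has normalized volume 4, well below the value $d^n = 16$ attained by the full simplex of degree 4.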



The notions of Newton polytope and volume of a polynomial system give a sharper characterization of its monomial structure than the degree alone. These concepts were introduced in the context of root counting by D. Bernstein [4] and A. Kushnirenko [35] and are now at the basis of sparse elimination theory (see, e.g., [56]). As an application of Theorem 2 we derive the following effective arithmetic Nullstellensatz for sparse polynomial systems.

COROLLARY 3
Let $f_1, \dots, f_s \in \mathbb{Z}[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{C}^n$. Set $d := \max_i \deg f_i$ and $h := \max_i h(f_i)$. Let $\mathcal{V}$ denote the volume of the polynomial system $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, \mathcal{V}$,
• $h(a), h(g_i) \le 2\,(n+1)^3\, d\, \mathcal{V}\, \big(h + \log s + 2^{2n+3}\, d \log(d+1)\big)$.

The crucial observation here is that both the degree and the height of a polynomial system are essentially controlled by the normalized volume. This follows from an adequate arithmetic version of the Bernstein-Kushnirenko theorem (see Prop. 2.12). Our result then follows from Theorem 2 in a straightforward way.

As before, we can apply the worst-case bound $\mathcal{V} \le d^n$ to recover from this result an estimate similar to the one presented in Theorem 1. However, this result gives sharper estimates for both the degree and the height when the input system is sparse (see Ex. 4.13).

The sparse aspect in the Nullstellensatz was previously considered by J. Canny and I. Emiris [8, Th. 8.2] for the case of $n+1$ $n$-variate Laurent polynomials without common roots at toric infinity. Their result is the sparse analogue of F. Macaulay’s effective Nullstellensatz [40]. The first general sparse Nullstellensatz was obtained by Sombra [53]. In both cases the authors give bounds for the Newton polytopes of the output polynomials in terms of the Newton polytopes of the input ones. We refer to the original papers for the exact statements.

It is quite difficult to make a definite comparison between these results and ours. Our result does not give sharp bounds for Newton polytopes. On the other hand, our degree estimate for the general case is better, while the height estimate is completely new.

The key ingredient in our treatment of the arithmetic Nullstellensatz is the notion of local height of a variety defined over a number field $K$.



Let $V \subset \mathbb{A}^n(\overline{\mathbb{Q}})$ be an equidimensional affine variety defined over $K$. For each absolute value $v$ over $K$, we introduce the local height $h_v(V)$ of $V$ at $v$ as a Mahler measure of a suitable normalized Chow form of $V$. This is consistent with the Faltings height $h(V)$ of $V$, namely,
\[ h(V) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} N_v\, h_v(V), \]
where $M_K$ denotes the set of canonical absolute values of $K$ and $N_v$ the multiplicity of $v$.

We study the basic properties of this notion. In particular, we are able to estimate the local height of the trace and the norm of a polynomial $f \in K[x_1, \dots, x_n]$ with respect to an integral extension $K[\mathbb{A}^r] \hookrightarrow K[V]$. We also obtain local analogues of many of the global results of J.-B. Bost, H. Gillet, and C. Soulé [5] and Philippon [49].

Our proof of the arithmetic Nullstellensatz is based on duality theory for Gorenstein algebras (trace formula). This technique was introduced in the context of the effective Nullstellensatz in [19] and [13]. Here we follow mostly the lines of J. Sabia and P. Solernó [50] and of Krick and Pardo [32]. The trace formula allows us to perform division modulo complete intersection ideals, with good control of the degree and height of the involved polynomials. Local arithmetic intersection theory plays, with respect to the height estimates, the role of classical intersection theory with respect to the degree bounds.

Finally, we remark that all of our results are valid not just for $\mathbb{Q}$ but for arbitrary number fields. In fact, the general analysis over number fields is necessary to obtain the sharpest estimates for the case $K := \mathbb{Q}$. We also remark that the estimates in the general version of Theorem 1 (Theorem 3.6) do not depend on the involved number field.

The outline of the paper is the following: In Section 1, we recall the basic definitions and properties of the height of polynomials, and we introduce the notion of local height of a variety defined over a number field. In Section 2, we derive useful estimates for the local heights of the trace and the norm of a polynomial in $K[V]$, and we study the behavior of the local heights of the intersection of a variety with a hypersurface. In Section 3, we recall the basic facts of duality theory which are useful in our context, and we prove Theorem 1.
In Section 4, we focus on the intrinsic and sparse versions of the arithmetic Nullstellensatz.

1. Height of polynomials and varieties
Throughout this paper $\mathbb{Q}$ denotes the field of rational numbers, $\mathbb{Z}$ the ring of rational integers, $K$ a number field, and $\mathcal{O}_K$ its ring of integers. We also denote by $\mathbb{R}$ the field of real numbers, by $\mathbb{C}$ the field of complex numbers, by $k$ an arbitrary field, and by $\bar{k}$ an algebraic closure of $k$. As usual, $\mathbb{A}^n$ and $\mathbb{P}^n$ denote, respectively, the affine and the projective space of $n$ dimensions over $\bar{k}$.

For every rational prime $p$ we denote by $|\cdot|_p$ the $p$-adic absolute value over $\mathbb{Q}$ such that $|p|_p = p^{-1}$. We also denote the ordinary absolute value over $\mathbb{Q}$ by $|\cdot|_\infty$ or simply by $|\cdot|$. These form a complete set of independent absolute values over $\mathbb{Q}$; we identify the set $M_{\mathbb{Q}}$ of these absolute values with the set $\{\infty,\ p;\ p \text{ prime}\}$.

For $v \in M_{\mathbb{Q}}$ we denote by $\mathbb{Q}_v$ the completion of $\mathbb{Q}$ with respect to the absolute value $v$. In case $v = \infty$ we have $\mathbb{Q}_\infty = \mathbb{R}$, while in case $v = p$ is prime we have that $\mathbb{Q}_p$ is the $p$-adic field. There exists a unique extension of $v$ to an absolute value over the algebraic closure $\overline{\mathbb{Q}}_v$. We denote by $\mathbb{C}_v$ the completion of $\overline{\mathbb{Q}}_v$ with respect to this absolute value. This field is algebraically closed and complete with respect to the induced absolute value, which we also denote by $v$. We have $\mathbb{C}_\infty = \mathbb{C}$.

1.1. Height of polynomials
In this section we introduce the different measures for the size of a multivariate polynomial, both over $\mathbb{C}_v$ and over a number field. We establish the link between the different notions and study their basic properties.

1.1.1. Height of polynomials over $\mathbb{C}_v$
We fix an absolute value $v \in \{\infty,\ p;\ p \text{ prime}\}$ for the rest of this section. Let $\mathcal{A} \subset \mathbb{C}_v$ be a finite set. We denote by $|\mathcal{A}|_v := \max\{|a|_v,\ a \in \mathcal{A}\}$ its absolute value. Then we define the (logarithmic) height of $\mathcal{A}$ as $h_v(\mathcal{A}) := \max\{0, \log |\mathcal{A}|_v\}$, that is, $h_v(\mathcal{A}) = \log |\{1\} \cup \mathcal{A}|_v$.

For a polynomial $f = \sum_\alpha a_\alpha x^\alpha \in \mathbb{C}_v[x_1, \dots, x_n]$, we define its absolute value $|f|_v$ as the absolute value of its set of coefficients, that is, $|f|_v := \max_\alpha \{|a_\alpha|_v\}$. In the same way we define the height $h_v(f)$ of $f$ as the height of its set of coefficients: $h_v(f) := \max\{0, \log |f|_v\}$.
When $v = \infty$, that is, when $f$ has complex coefficients, we make use of the (logarithmic) Mahler measure of $f$ defined as
\[ m(f) := \int_0^1 \cdots \int_0^1 \log |f(e^{2\pi i t_1}, \dots, e^{2\pi i t_n})| \, dt_1 \cdots dt_n. \]
This integral is well defined, as $\log |f|$ is a plurisubharmonic function on $\mathbb{C}^n$ (see [39, Appendix I]).



The Mahler measure was introduced by D. Lehmer [37] for the case of a univariate polynomial $f := a_d \prod_{i=1}^d (x - \alpha_i) \in \mathbb{C}[x]$ as
\[ m(f) = \log |a_d| + \sum_{i=1}^d \max\{0, \log |\alpha_i|\}. \]
The link between both expressions of $m(f)$ is given by Jensen’s formula. The general case was introduced and studied by K. Mahler [41].

The key property of the Mahler measure is its additivity: $m(f g) = m(f) + m(g)$. We have the following relation between $\log |f|$ and $m(f)$:
\[ -\log(n+1) \deg f \le m(f) - \log |f| \le \log(n+1) \deg f. \tag{1.1} \]
The right inequality follows from the definition of $m$ and the fact that the number of monomials of $f$ is bounded by $\binom{n + \deg f}{n} \le (n+1)^{\deg f}$. For the left inequality, we refer to [47, Lem. 1.13] and its proof. When $f$ has total degree bounded by 1, the inequality is refined to $\log |f| \le m(f)$. Also, for any degree, $m(f(x_1, \dots, x_{n-1}, 0)) \le m(f)$.

We make frequent use of the following more precise relation.

LEMMA 1.1
Let $f \in \mathbb{C}[X_1, \dots, X_r]$ be a polynomial in $r$ groups of $n_i$ variables each, for $i = 1, \dots, r$. Let $d_i$ denote the degree of $f$ in the group of variables $X_i$. Then
\[ -\sum_{i=1}^r \log(n_i + 1)\, d_i \le m(f) - \log |f| \le \sum_{i=1}^r \log(n_i + 1)\, d_i. \]

Proof
The right inequality follows directly from the definition of $m(f)$ and the fact that we can bound by $\prod_i (n_i + 1)^{d_i}$ the number of monomials of $f$. Thus we only consider the left inequality.

Let $f_{\alpha_1 \cdots \alpha_i} \in \mathbb{C}[X_{i+1}, \dots, X_r]$ denote the coefficient of $f$ with respect to the monomial $X_1^{\alpha_1} \cdots X_i^{\alpha_i}$. Applying inequality (1.1), we obtain for all $(\xi_{i+1}, \dots, \xi_r) \in \mathbb{C}^{n_{i+1} + \cdots + n_r}$,
\[ \log |f_{\alpha_1 \cdots \alpha_{i-1}}(X_i, \xi_{i+1}, \dots, \xi_r)| \le m\big(f_{\alpha_1 \cdots \alpha_{i-1}}(X_i, \xi_{i+1}, \dots, \xi_r)\big) + \log(n_i + 1)\, d_i. \]
We have $|f_{\alpha_1 \cdots \alpha_{i-1}}(X_i, \xi_{i+1}, \dots, \xi_r)| = \max_{\alpha_i} |f_{\alpha_1 \cdots \alpha_i}(\xi_{i+1}, \dots, \xi_r)|$. We integrate both sides of the last inequality on $S_1^{n_{i+1} + \cdots + n_r}$, and we deduce
\[ \max\{m(f_{\alpha_1 \cdots \alpha_i})\ ;\ \alpha_i \in \mathbb{Z}^{n_i}\} \le m(f_{\alpha_1 \cdots \alpha_{i-1}}) + \log(n_i + 1)\, d_i. \]



We apply this relation recursively, and we obtain
\[ \log |f| = \max\{m(f_{\alpha_1 \cdots \alpha_r})\ ;\ \alpha_1 \in \mathbb{Z}^{n_1}, \dots, \alpha_r \in \mathbb{Z}^{n_r}\} \le m(f) + \sum_{i=1}^r \log(n_i + 1)\, d_i. \]
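The two expressions for the univariate Mahler measure, and inequality (1.1) with $n = 1$, can be checked numerically. The sketch below is only an illustration under simplifying assumptions: the polynomial is given by its leading coefficient and roots, no root lies on the unit circle, and the integral is approximated by a midpoint rule. The helper names are hypothetical.

```python
import cmath
import math


def coeffs_from_roots(lead, roots):
    """Coefficients (constant term first) of lead * prod (x - r)."""
    coeffs = [complex(lead)]
    for r in roots:
        new = [0j] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += -r * c      # contribution of the factor -r
            new[k + 1] += c       # contribution of the factor x
        coeffs = new
    return coeffs


def mahler_from_roots(lead, roots):
    """Lehmer's formula: log|a_d| + sum of max(0, log|alpha_i|)."""
    return math.log(abs(lead)) + sum(math.log(max(1.0, abs(r))) for r in roots)


def mahler_by_integration(coeffs, samples=4000):
    """Midpoint-rule approximation of the integral of log|f| over the unit circle."""
    total = 0.0
    for k in range(samples):
        z = cmath.exp(2j * math.pi * (k + 0.5) / samples)
        value = sum(c * z ** j for j, c in enumerate(coeffs))
        total += math.log(abs(value))
    return total / samples


# f = 2 (x - 3)(x - 1/2) = 2 x^2 - 7 x + 3
lead, roots = 2, [3, 0.5]
coeffs = coeffs_from_roots(lead, roots)
m_roots = mahler_from_roots(lead, roots)                  # log 6
m_integral = mahler_by_integration(coeffs)
log_abs_f = math.log(max(abs(c) for c in coeffs))         # log 7
bound = math.log(2) * (len(coeffs) - 1)                   # log(n + 1) deg f with n = 1

print(m_roots, m_integral, abs(m_roots - log_abs_f) <= bound)
```

Here both computations give $m(f) = \log 6$ up to rounding, while $\log |f| = \log 7$, so the difference stays well inside the band $\pm 2 \log 2$ allowed by (1.1).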

Let $f \in \mathbb{C}[X_1, \dots, X_r]$ be a multihomogeneous polynomial in $r$ groups of $n_i + 1$ variables each, and set $f^a$ for a dehomogenization of $f$ with respect to these groups of variables. Then $m(f^a) = m(f)$ and $\log |f^a| = \log |f|$. Thus the estimates of the preceding lemma also hold for $f$.

Next we introduce the (logarithmic) $S_n$-Mahler measure of a polynomial $f \in \mathbb{C}[x_1, \dots, x_n]$ as
\[ m(f; S_n) := \int_{S_n} \log |f(x)| \, \mu_n(x), \]
where $S_n := \{(z_1, \dots, z_n) \in \mathbb{C}^n : |z_1|^2 + \cdots + |z_n|^2 = 1\}$ is the unit sphere in $\mathbb{C}^n$, and $\mu_n$ is the measure of total mass 1, invariant with respect to the unitary group $U(n)$.

More generally, let $f \in \mathbb{C}[X_1, \dots, X_r]$ be a polynomial in $r$ groups of $n$ variables each. Its $S_n^r$-Mahler measure is then defined as
\[ m(f; S_n^r) := \int_{S_n^r} \log |f(X)| \, \mu_n^r(X) \]
with $S_n^r := S_n \times \cdots \times S_n$. This alternative Mahler measure was introduced by Philippon [49, I]. With this notation the ordinary Mahler measure $m(f)$ of $f \in \mathbb{C}[x_1, \dots, x_n]$ coincides with $m(f; S_1^n)$. When $f \in \mathbb{C}$ is a constant, we agree that $m(f; S_n^0) = \log |f|$.

The $S_n^r$-Mahler measure is related to the ordinary Mahler measure by the following inequalities (see [38, Th. 4]):
\[ 0 \le m(f) - m(f; S_n^r) \le r\, d \sum_{i=1}^{n-1} \frac{1}{2i}, \tag{1.2} \]

where $d$ is a bound for the degree of $f$ in each group of variables.

Finally, we summarize in the following lemma the basic properties of the notion of height of polynomials in $\mathbb{C}_v[x_1, \dots, x_n]$.

LEMMA 1.2
Let $v \in M_{\mathbb{Q}}$ and $f_1, \dots, f_s \in \mathbb{C}_v[x_1, \dots, x_n]$.
(1) If $v = \infty$, then
  (a) $h_\infty(\sum_i f_i) \le \max_i \{h_\infty(f_i)\} + \log s$;
  (b) $h_\infty(\prod_{i=1}^s f_i) \le \sum_{i=1}^s h_\infty(f_i) + \log(n+1) \sum_{i=1}^{s-1} \deg f_i$; $h_\infty(f_1 f_2) \le h_\infty(f_1) + h_\infty(f_2) + \log(n+1) \min\{\deg f_1, \deg f_2\}$.
  (c) Let $g \in \mathbb{C}[y_1, \dots, y_s]$. Set $d := \max_i \{\deg f_i\}$ and $h_\infty := \max_i \{h_\infty(f_i)\}$. Then $h_\infty\big(g(f_1, \dots, f_s)\big) \le h_\infty(g) + \deg g\, \big(h_\infty + \log(s+1) + \log(n+1)\, d\big)$;
  (d) $\log |\prod_i f_i|_\infty \ge \sum_i \log |f_i|_\infty - 2 \log(n+1) \sum_i \deg f_i$.
(2) If $v = p$ for some prime $p$, then
  (a) $h_p(\sum_i f_i) \le \max_i \{h_p(f_i)\}$;
  (b) $h_p(\prod_i f_i) \le \sum_i h_p(f_i)$.
  (c) Let $g \in \mathbb{C}_p[y_1, \dots, y_s]$. Set $d := \max_i \{\deg f_i\}$ and $h_p := \max_i \{h_p(f_i)\}$. Then $h_p\big(g(f_1, \dots, f_s)\big) \le h_p(g) + \deg g\; h_p$;
  (d) $\log |\prod_i f_i|_p = \sum_i \log |f_i|_p$.

Proof
The different behavior for $v = \infty$ and $v = p$ is simply due to the fact that $|\cdot|_p$ is non-Archimedean, that is, it verifies the stronger inequality $|a + b|_p \le \max\{|a|_p, |b|_p\}$ for any $a, b \in \mathbb{C}_p$. Inequalities (1.a), (1.b), (2.a), and (2.b) are now immediate from the definition of $h_v$.

For (1.c) and (2.c), let us first consider the case $v = \infty$. Set $c(n) := \log(n+1)$. First we compute $h_v(f_1^{\alpha_1} \cdots f_s^{\alpha_s})$ for the exponent $(\alpha_1, \dots, \alpha_s)$ of a monomial of $g$. Applying (1.b), we obtain
\[ h_\infty(f_1^{\alpha_1} \cdots f_s^{\alpha_s}) \le \big(c(n)\, d + h_\infty\big) \sum_i \alpha_i \le \big(c(n)\, d + h_\infty\big) \deg g. \]
The polynomial $g$ has at most $(s+1)^{\deg g}$ monomials, and so
\[ h_\infty\big(g(f_1, \dots, f_s)\big) \le h_\infty(g) + \big(c(n)\, d + h_\infty\big) \deg g + c(s) \deg g. \]
The case $v \ne \infty$ follows in a similar way.

For (1.d), we apply inequality (1.1) directly:
\[ \sum_i \log |f_i|_\infty \le \sum_i m(f_i) + c(n) \sum_i \deg f_i = m\Big(\prod_i f_i\Big) + c(n) \sum_i \deg f_i \le \log \Big|\prod_i f_i\Big|_\infty + 2\, c(n) \sum_i \deg f_i. \]
For (2.d), the Gauss lemma implies that $\sum_i \log |f_i|_p = \log |\prod_i f_i|_p$.
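Part (2.d) of the lemma, the multiplicativity of the $p$-adic absolute value of polynomials given by the Gauss lemma, can be illustrated with exact arithmetic on integer polynomials. This is a minimal sketch restricted to one variable and integer coefficients; the helper names are illustrative assumptions.

```python
from fractions import Fraction


def ord_p(n, p):
    """Exponent of the prime p in the nonzero integer n."""
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def abs_p_poly(coeffs, p):
    """|f|_p: the largest p-adic absolute value among the nonzero coefficients."""
    return max(Fraction(1, p ** ord_p(a, p)) for a in coeffs if a != 0)


def multiply(f, g):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod


f = [4, 2]          # 4 + 2x
g = [6, 3]          # 6 + 3x
fg = multiply(f, g)  # 24 + 24x + 6x^2

for p in (2, 3, 5):
    # Gauss lemma: |f g|_p = |f|_p |g|_p
    print(p, abs_p_poly(fg, p), abs_p_poly(f, p) * abs_p_poly(g, p))
```

For the Archimedean absolute value no such equality holds in general, which is why (1.d) only gives an inequality with the correction term $2 \log(n+1) \sum_i \deg f_i$.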



We make frequent use of the following particular case of the previous lemma: Let $(f_{ij})_{ij}$ be an $(s \times s)$-matrix of polynomials in $\mathbb{C}_v[x_1, \dots, x_n]$ of degrees and heights bounded by $d$ and $h_v$, respectively. From Lemma 1.2(a,b) we obtain
• $h_\infty\big(\det(f_{ij})_{ij}\big) \le s\, \big(h_\infty + \log s + d \log(n+1)\big)$,
• $h_p\big(\det(f_{ij})_{ij}\big) \le s\, h_p$.

1.1.2. Height of polynomials over a number field
The set $M_K$ of absolute values over $K$ which extend the absolute values in $M_{\mathbb{Q}}$ is called the canonical set. We denote by $M_K^\infty$ the set of Archimedean absolute values in $M_K$, that is, the absolute values extending $\infty$.

If $v \in M_K$ extends an absolute value $v_0 \in M_{\mathbb{Q}}$ (which is denoted by $v \mid v_0$), there exists a (not necessarily unique) embedding $\sigma_v : K \hookrightarrow \mathbb{C}_{v_0}$ corresponding to $v$, that is, such that $|a|_v = |\sigma_v(a)|_{v_0}$ for every $a \in K$.

In the $p$-adic case, there is a one-to-one correspondence $\mathcal{P} \mapsto v(\mathcal{P})$ between prime ideals of $\mathcal{O}_K$ which divide $p$ and absolute values extending $p$, defined by
\[ |a|_{v(\mathcal{P})} := p^{-\mathrm{ord}_{\mathcal{P}}(a)/e_{\mathcal{P}}} = N(\mathcal{P})^{-\mathrm{ord}_{\mathcal{P}}(a)/(e_{\mathcal{P}} f_{\mathcal{P}})} \]
for $a \in K^*$. Here $\mathrm{ord}_{\mathcal{P}}(a)$ denotes the order of $\mathcal{P}$ in the factorization of $a$, and $N(\mathcal{P})$ denotes the norm of the ideal $\mathcal{P}$. Also, $e_{\mathcal{P}} := \mathrm{ord}_{\mathcal{P}}(p)$ denotes the ramification index, and $f_{\mathcal{P}} := [\mathcal{O}_K/\mathcal{P} : \mathbb{Z}/(p)]$ denotes the residual degree of the prime ideal $\mathcal{P}$. Note that $a \in \mathcal{O}_K$ if and only if $\log |a|_v \le 0$ for every $v \in M_K \setminus M_K^\infty$.

We denote by $K_v$ the completion of $K$ in $\mathbb{C}_{v_0}$. The local degree of $K$ at $v$ is defined as $N_v := [K_v : \mathbb{Q}_{v_0}]$, and it coincides with the number of different embeddings $\sigma : K \hookrightarrow \mathbb{C}_{v_0}$ which correspond to $v$. When $v$ is Archimedean, $K_v$ is either $\mathbb{R}$ or $\mathbb{C}$, and $N_v$ equals 1 or 2 accordingly. When $v$ is non-Archimedean, $N_v = e_{\mathcal{P}} f_{\mathcal{P}}$, where $\mathcal{P}$ is the prime ideal that corresponds to $v$. In any case,
\[ [K : \mathbb{Q}] = \sum_{v \mid v_0} N_v \]
for $v_0 \in M_{\mathbb{Q}}$. The canonical set $M_K$ satisfies the product formula with multiplicities $N_v$:
\[ \prod_{v \in M_K} |a|_v^{N_v} = 1, \qquad \forall\, a \in K^*. \tag{1.3} \]
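For $K = \mathbb{Q}$, where every multiplicity $N_v$ equals 1, the product formula (1.3) can be verified with exact rational arithmetic. The following sketch assumes that special case only; the function names are illustrative.

```python
from fractions import Fraction


def abs_p(a, p):
    """p-adic absolute value of a nonzero rational a, as an exact Fraction."""
    a = Fraction(a)
    num, den = abs(a.numerator), a.denominator
    k = 0
    while num % p == 0:
        num //= p
        k += 1
    while den % p == 0:
        den //= p
        k -= 1
    return Fraction(1, p) ** k


def prime_divisors(n):
    """Distinct prime divisors of a positive integer, by trial division."""
    primes = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def product_over_all_places(a):
    """Product of |a|_v over v = infinity and all primes (only finitely many differ from 1)."""
    a = Fraction(a)
    result = abs(a)
    for p in set(prime_divisors(abs(a.numerator)) + prime_divisors(a.denominator)):
        result *= abs_p(a, p)
    return result


print(product_over_all_places(Fraction(-45, 28)))
```

Only the primes dividing the numerator or the denominator contribute a factor different from 1, so the product is finite and can be computed exactly.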

Let A ⊂ K be a finite set. Let v ∈ M K be an absolute value that extends v0 ∈ MQ , and let σv be an embedding corresponding to v. The local absolute value



of $\mathcal{A}$ at $v$ is defined as $|\mathcal{A}|_v := |\sigma_v(\mathcal{A})|_{v_0} = \max\{|\sigma_v(a)|_{v_0},\ a \in \mathcal{A}\}$. Then we define the local height of $\mathcal{A}$ as
\[ h_v(\mathcal{A}) := \max\{0, \log |\mathcal{A}|_v\} = h_{v_0}(\sigma_v(\mathcal{A})). \]
We note that this notion behaves well with respect to extensions. Let $K \hookrightarrow L$ be a finite extension, and let $w \in M_L$ be an absolute value extending $v$. Then $h_v(\mathcal{A}) = h_w(\mathcal{A})$.

For a polynomial $f = \sum_\alpha a_\alpha x^\alpha \in K[x_1, \dots, x_n]$, we define the local absolute value of $f$ at $v$ (denoted by $|f|_v$) as the absolute value at $v$ of its set of coefficients, and the local height of $f$ at $v$ (denoted by $h_v(f)$) as the local height at $v$ of its set of coefficients.

Finally, we define the (global) height of a finite set $\mathcal{A} \subset K$ as
\[ h(\mathcal{A}) := \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} N_v\, h_v(\mathcal{A}). \]
In classical terms this is the affine height of $\mathcal{A}$; if we set $\mathcal{A} := \{a_1, \dots, a_N\}$, then $h(\mathcal{A})$ equals the Weil absolute height of the point $(1 : a_1 : \cdots : a_N) \in \mathbb{P}^N$. Because of the imposed normalization, this quantity does not depend on the field $K$ in which we consider the set $\mathcal{A}$. This allows us to extend the definition of $h$ to subsets of $\overline{\mathbb{Q}}$.

We also define the (global) height of $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ as the global height of its set of coefficients; that is,
\[ h(f_1, \dots, f_s) := \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} N_v \max_i h_v(f_i). \tag{1.4} \]
We have $h_v(a) \le h_v(\mathcal{A})$ for every $a \in \mathcal{A}$ and every $v \in M_K$, and so
\[ \max_{a \in \mathcal{A}} h(a) \le h(\mathcal{A}). \]
In case $\mathcal{A} \subset \mathcal{O}_K$, we have that $h_v(\mathcal{A}) = 0$ for every $v \in M_K \setminus M_K^\infty$, and so $h(\mathcal{A}) = (1/[K : \mathbb{Q}]) \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A})$. We also have $h_v(\mathcal{A}) \le [K : \mathbb{Q}] \max_{a \in \mathcal{A}} h(a)$ for all $v \in M_K$ and hence
\[ h(\mathcal{A}) \le [K : \mathbb{Q}] \max_{a \in \mathcal{A}} h(a). \]

Both inequalities are sharp. Equality is attained in the first one when, for instance, A has only one element.



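For the second one, set $\mathcal{A} = \{1 + \sqrt{2},\ 1 - \sqrt{2}\} \subset \mathbb{Q}(\sqrt{2})$. Then $h(\mathcal{A}) = \log(1 + \sqrt{2})$ while $h(1 + \sqrt{2}) = h(1 - \sqrt{2}) = (1/2) \log(1 + \sqrt{2})$. Hence $h(\mathcal{A}) = 2 \max_{a \in \mathcal{A}} h(a)$. More generally, if $a \in \mathbb{C}$ is a Pisot number, namely, an algebraic integer such that $|a| > 1$ and all its other conjugates lie inside the unit disk, and $K := \mathbb{Q}(a)$ is Galois, then, for $\mathcal{A} := \{\sigma(a) : \sigma \in \mathrm{Gal}(K/\mathbb{Q})\} \subset K$, we have $h(\mathcal{A}) = [K : \mathbb{Q}]\, h(a)$.

Let $a = m/n \in \mathbb{Q}^*$ be a rational number, where $m \in \mathbb{Z}$ and $n \in \mathbb{N}$ are coprime. Then $h(a) = \log \max\{|m|, n\}$, that is, the height of $a$ controls the size of both the minimal numerator and the minimal denominator of $a$. More generally, let $\mathcal{A} \subset \mathbb{Q}$ be a finite set, and let $b \in \mathbb{N}$ be a minimal common denominator for all the elements of $\mathcal{A}$. Then $h(\mathcal{A}) = \log \max\{|b \mathcal{A}|, b\}$. The following is the analogous statement for the general case.

LEMMA 1.3
Let $\mathcal{A} \subset K$ be a finite set. Then there exist $b \in \mathbb{Z} \setminus \{0\}$ and $\mathcal{B} \subset \mathcal{O}_K$ such that
\[ b\, \mathcal{A} = \mathcal{B}, \qquad h(\mathcal{A}) \le h(\{b\} \cup \mathcal{B}) \le [K : \mathbb{Q}]\, h(\mathcal{A}). \]

Proof
Let $v \in M_K \setminus M_K^\infty$, and set $\mathcal{P}$ for the corresponding prime ideal of $\mathcal{O}_K$. Let $a_v \in \mathcal{A}$ be such that $h_v(\mathcal{A}) = h_v(a_v)$, and set $c(\mathcal{P}) = \max\{0, -\mathrm{ord}_{\mathcal{P}}(a_v)\}$. Then $\mathrm{ord}_{\mathcal{P}}(a) \ge -c(\mathcal{P})$ for every $a \in \mathcal{A}$, and $h_v(\mathcal{A}) = c(\mathcal{P}) \log N(\mathcal{P}) / (e_{\mathcal{P}} f_{\mathcal{P}})$. Set
\[ b := \prod_{\mathcal{P}} N(\mathcal{P})^{c(\mathcal{P})}, \qquad \mathcal{B} := \{b\, a\ ;\ a \in \mathcal{A}\}, \]
where $\mathcal{P}$ runs over all prime ideals of $\mathcal{O}_K$. Clearly $b \in \mathbb{N} \setminus \{0\}$. We have $\mathrm{ord}_{\mathcal{P}}(b) = e_{\mathcal{P}} f_{\mathcal{P}}\, c(\mathcal{P})$, and so $\mathrm{ord}_{\mathcal{P}}(b\, a) \ge e_{\mathcal{P}} f_{\mathcal{P}}\, c(\mathcal{P}) - c(\mathcal{P}) \ge 0$ for every $a \in \mathcal{A}$. Hence $\mathcal{B} \subset \mathcal{O}_K$.

For $v \in M_K^\infty$ we have $h_v(\{b\} \cup \mathcal{B}) = h_v(\mathcal{A}) + \log b$, and so
\[ h(\{b\} \cup \mathcal{B}) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v\, h_v(\{b\} \cup \mathcal{B}) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A}) + \log b. \]
We have
\[ \log b = \sum_{\mathcal{P}} c(\mathcal{P}) \log N(\mathcal{P}) = \sum_{v \notin M_K^\infty} N_v\, h_v(\mathcal{A}), \]
and therefore
\[ h(\{b\} \cup \mathcal{B}) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A}) + \sum_{v \notin M_K^\infty} N_v\, h_v(\mathcal{A}) \le [K : \mathbb{Q}]\, h(\mathcal{A}). \]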


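On the other hand, we have $h_v(\mathcal{A}) + \log |b|_v \le h_v(\{b\} \cup \mathcal{B})$ for all $v \in M_K$. Applying the product formula (1.3) to $b$, we obtain
\[ h(\mathcal{A}) = \frac{1}{[K : \mathbb{Q}]} \sum_v N_v \big( h_v(\mathcal{A}) + \log |b|_v \big) \le \frac{1}{[K : \mathbb{Q}]} \sum_v N_v\, h_v(\{b\} \cup \mathcal{B}) = h(\{b\} \cup \mathcal{B}). \]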
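For $K = \mathbb{Q}$ the height of a finite set of rationals can be computed in two ways: from the formula $h(\mathcal{A}) = \log \max\{|b\mathcal{A}|, b\}$ with $b$ the minimal common denominator, and as the sum of the local heights over $v = \infty$ and the primes dividing $b$. The sketch below compares both, assuming inputs that `fractions.Fraction` accepts; the function names are illustrative.

```python
from fractions import Fraction
from math import gcd, log


def common_denominator(fracs):
    """Least common multiple of the denominators (fractions are in lowest terms)."""
    b = 1
    for q in fracs:
        b = b * q.denominator // gcd(b, q.denominator)
    return b


def height_via_denominator(values):
    """log max{|b A|, b} with b the minimal common denominator."""
    fracs = [Fraction(v) for v in values]
    b = common_denominator(fracs)
    scaled = [abs(int(q * b)) for q in fracs]
    return log(max(scaled + [b]))


def height_via_local_heights(values):
    """Sum of h_v(A) over v = infinity and over the primes dividing the denominators."""
    fracs = [Fraction(v) for v in values]
    biggest = max(abs(q) for q in fracs)
    total = log(float(biggest)) if biggest > 1 else 0.0   # Archimedean local height
    b = common_denominator(fracs)
    p = 2
    while b > 1:
        exponent = 0
        while b % p == 0:
            b //= p
            exponent += 1
        total += exponent * log(p)   # h_p(A) = (largest power of p in a denominator) * log p
        p += 1
    return total


A = [Fraction(7, 2), Fraction(1, 3)]
print(height_via_denominator(A), height_via_local_heights(A), log(21))
```

For a single rational $m/n$ in lowest terms both functions return $\log \max\{|m|, n\}$.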

Finally, let $a \in \overline{\mathbb{Q}}^*$ be a nonzero algebraic number, and set $p_a \in \mathbb{Z}[t]$ for its primitive minimal polynomial. We have $h(a) = m(p_a)/\deg a$. More generally, the height of a finite set can be seen as the height of the minimal polynomial of a generic linear combination of its elements. This gives a partial motivation for the notion of global height of a finite set.

LEMMA 1.4
Let $\mathcal{A} := \{a_1, \dots, a_N\} \subset K$ be a finite set, and set
\[ p_{\mathcal{A}} := \prod_\sigma \big(u_0 + \sigma(a_1)\, u_1 + \cdots + \sigma(a_N)\, u_N\big) \in \mathbb{Q}[u_0, \dots, u_N], \]
where the product is taken over all $\mathbb{Q}$-embeddings $\sigma : K \hookrightarrow \overline{\mathbb{Q}}$. Then
\[ -\log(N+1) \le h(\mathcal{A}) - h(p_{\mathcal{A}})/[K : \mathbb{Q}] \le \log(N+1). \]

Proof
Set $L(u) := u_0 + a_1 u_1 + \cdots + a_N u_N \in K[u]$, so that
\[ p_{\mathcal{A}} = \prod_\sigma \sigma(L). \]
For $v_0 \in M_{\mathbb{Q}}$ we choose an inclusion $\overline{\mathbb{Q}} \hookrightarrow \mathbb{C}_{v_0}$. Then for each $v \in M_K$ such that $v \mid v_0$ there are $N_v$ embeddings $\sigma : K \hookrightarrow \overline{\mathbb{Q}}$ which correspond to it. We note that for each such $\sigma$, $\log |\sigma(L)| = h_v(\mathcal{A})$ holds. Applying Lemma 1.2(b), we obtain
\[ h_\infty(p_{\mathcal{A}}) \le \sum_\sigma \log |\sigma(L)| + [K : \mathbb{Q}] \log(N+1) = \sum_{v \in M_K^\infty} N_v\, h_v(\mathcal{A}) + [K : \mathbb{Q}] \log(N+1). \]
In the same way we obtain $h_p(p_{\mathcal{A}}) \le \sum_{v \mid p} N_v\, h_v(\mathcal{A})$ for $p$ prime, and hence
\[ h(p_{\mathcal{A}}) \le \sum_{v \in M_K} N_v\, h_v(\mathcal{A}) + [K : \mathbb{Q}] \log(N+1) = [K : \mathbb{Q}] \big( h(\mathcal{A}) + \log(N+1) \big). \]



On the other hand, $\log |\sigma(L)| \le m(\sigma(L))$ for every $\sigma$, as $L$ has total degree 1. Thus
\[ [K : \mathbb{Q}]\, h(\mathcal{A}) = \sum_{v \in M_K} N_v\, h_v(\mathcal{A}) = \sum_\sigma \log |\sigma(L)|_\infty + \sum_p \sum_\sigma \log |\sigma(L)|_p \le m(p_{\mathcal{A}}) + \sum_p h_p(p_{\mathcal{A}}) \le h(p_{\mathcal{A}}) + [K : \mathbb{Q}] \log(N+1) \]
by application of Lemma 1.2(d) and inequality (1.1), and the definition of the height.

1.2. Height of varieties
In this section we introduce the notions of local and global height of an affine variety defined over a number field. To this end, we recall the basic facts about the degree and the Chow form of varieties. As an important particular case, we study the height of an affine toric variety.

1.2.1. Degree of varieties
Let $k$ be an arbitrary field, and let $V \subset \mathbb{A}^n$ be an affine equidimensional variety of dimension $r$. We recall that the degree of $V$ is defined as the number of points in the intersection of $V$ with a generic linear variety of dimension $n - r$. This coincides with the sum of the degrees of its irreducible components.

For an arbitrary variety $V \subset \mathbb{A}^n$ we set $V = \cup_i V_i$ for its decomposition into equidimensional varieties. Following Heintz [23], we define the degree of $V$ as
\[ \deg V := \sum_i \deg V_i. \]
For $V = \emptyset$ we agree that $\deg V := 1$. This is a positive integer, and we have $\deg V = 1$ if and only if $V$ is a linear variety. The degree of a hypersurface equals the degree of any generator of its defining ideal. The degree of a finite variety equals its cardinality.

For a linear morphism $\varphi : \mathbb{A}^n \to \mathbb{A}^m$ and a variety $V \subset \mathbb{A}^n$, we have $\deg \overline{\varphi(V)} \le \deg V$, where $\overline{\varphi(V)}$ denotes the Zariski closure of $\varphi(V)$ in $\mathbb{A}^m$.

The basic aspect of this notion of degree is its behavior with respect to intersections. It verifies the Bézout inequality
\[ \deg(V \cap W) \le \deg V \deg W \]



for $V, W \subset \mathbb{A}^n$, without any restriction on the intersection type of $V$ and $W$ (see [23, Th. 1], [14, Exam. 8.4.6]).

1.2.2. Normalization of Chow forms
Let $V \subset \mathbb{A}^n$ be an affine equidimensional variety of dimension $r$ defined over a field $k$. Let $F_V$ be a Chow form of $V$, that is, a Chow form of its projective closure $\overline{V} \subset \mathbb{P}^n$. This is a squarefree polynomial over $k$ in $r + 1$ groups $U_0, \dots, U_r$ of $n + 1$ variables each. It is multihomogeneous of degree $D := \deg V$ in each group of variables, and it is uniquely determined up to a scalar factor. In the case when $V$ is irreducible, $F_V$ is an irreducible polynomial, and in the general case of an equidimensional variety, the product of Chow forms of its irreducible components is a Chow form of $V$.

In order to avoid this indeterminacy of $F_V$, we fix one of its coefficients under a technical assumption on the variety $V$. For ease of reference, we state it as follows.

ASSUMPTION 1.5
We assume that the projection $\pi_V : V \to \mathbb{A}^r$ defined by $x \mapsto (x_1, \dots, x_r)$ verifies $\# \pi_V^{-1}(0) = \deg V$.

This assumption implies that $\pi_V : V \to \mathbb{A}^r$ is a dominant map of degree $\deg V$, by the theorem on the dimension of fibers. Later on, we prove that in fact this assumption implies that the projection $\pi_V$ is finite, that is, the variables $x_1, \dots, x_r$ are in Noether normal position with respect to $V$ (Lemma 2.14). We remark that the previous condition is satisfied by any variety under a generic linear change of variables.

Each group of variables $U_i$ is associated to the coefficients of a generic linear form $L_i(U_i) := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n$. The main feature of a Chow form is that
\[ F_V(\nu_0, \dots, \nu_r) = 0 \iff \overline{V} \cap \{L_0^h(\nu_0) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \ne \emptyset \]
holds for $\nu_i \in \bar{k}^{n+1}$. Here $L_i^h := U_{i0} x_0 + \cdots + U_{in} x_n$ stands for the homogenization of $L_i$.

Assumption 1.5 implies that $\overline{V} \cap \{x_1 = 0\} \cap \cdots \cap \{x_r = 0\}$ is a zero-dimensional variety of $\mathbb{P}^n$ lying in the affine space $\{x_0 \ne 0\}$. Set $e_i$ for the $(i+1)$th vector of the canonical basis of $k^{n+1}$. Then $F_V(e_0, \dots, e_r)$, that is, the coefficient of the monomial $U_{00}^D \cdots U_{rr}^D$, is nonzero. We then define the (normalized) Chow form $\mathrm{Ch}_V$ of $V$ by fixing the choice of $F_V$ through the condition $\mathrm{Ch}_V(e_0, \dots, e_r) = 1$. Under this normalization, $\mathrm{Ch}_V$ equals the product of the normalized Chow forms of the irreducible components of $V$.



1.2.3. Height of varieties over $\mathbb{C}_v$
Let $v \in \{\infty,\ p;\ p \text{ prime}\}$ be an absolute value over $\mathbb{Q}$, and let $V \subset \mathbb{A}^n(\mathbb{C}_v)$ be an equidimensional variety of dimension $r$ which satisfies Assumption 1.5. We introduce the height of $V$ as a Mahler measure of its normalized Chow form.

Definition 1.6
The height of the affine variety $V \subset \mathbb{A}^n(\mathbb{C}_v)$ is defined as
\[ h_v(V) := m(\mathrm{Ch}_V; S_{n+1}^{r+1}) + (r+1) \Big( \sum_{i=1}^n \frac{1}{2i} \Big) \deg V \]
in case $v = \infty$ and as $h_v(V) := h_v(\mathrm{Ch}_V)$ in case $v = p$ for some prime $p$.

This definition coincides in the non-Archimedean cases with the local height of $\overline{V} \subset \mathbb{P}^n$ with respect to the divisors $\mathrm{div}(x_0), \dots, \mathrm{div}(x_r) \in \mathrm{Div}(\mathbb{P}^n)$ as it is introduced in [20, Sec. 9]. In general, it is also closely related to Philippon’s local height of a projective variety (see [49, II]).

Let us consider some examples:
• We have that $h_\infty(\mathbb{A}^n(\mathbb{C}))$ equals the Stoll number $\sum_{i=1}^n \sum_{j=1}^i 1/2j$, while $h_p(\mathbb{A}^n(\mathbb{C}_p)) = 0$. This follows from [5, Lem. 3.3.1], [55, Th. 3], [49, I, Th. 2], and the fact that $\mathrm{Ch}_{\mathbb{A}^n} = \det(U_0, \dots, U_n)$.
• Let $V \subset \mathbb{A}^n(\mathbb{C}_v)$ be a hypersurface verifying Assumption 1.5, defined by a squarefree polynomial $f \in \mathbb{C}_v[x_1, \dots, x_n]$. Then the coefficient of the monomial $x_n^{\deg V}$ is nonzero, and we can suppose without loss of generality that it equals 1. Let $f^h$ denote the homogenization of $f$. Then
\[ h_v(V) = m(f^h; S_{n+1}) + \Big( \sum_{i=1}^{n-1} \sum_{j=1}^i \frac{1}{2j} \Big) \deg V \]
in case $v = \infty$, while in case $v = p$ for some prime $p$, $h_v(V) = h_v(f)$ (see [49, I, Cor. 4]).
• In case $V = \{\xi\}$ for some $\xi \in \mathbb{A}^n$, we have (see, e.g., [49, I, Prop. 4])
\[ h_\infty(V) = \frac{1}{2} \log(1 + |\xi_1|^2 + \cdots + |\xi_n|^2), \qquad h_p(V) = h_p(\xi). \]


1.2.4. Height of varieties over a number field
Let $V \subset \mathbb{A}^n(\overline{\mathbb{Q}})$ be an equidimensional variety of dimension $r$ defined over a number field $K$. We define the (global) height $h(V)$ of $V$ as the Faltings height (see [11]) of its projective closure $\overline{V} \subset \mathbb{P}^n$. It verifies the identity
\[ h(V) = \frac{1}{[K : \mathbb{Q}]} \Big( \sum_{v \in M_K^\infty} N_v\, m\big(\sigma_v(F_V); S_{n+1}^{r+1}\big) + \sum_{v \notin M_K^\infty} N_v \log |F_V|_v \Big) + (r+1) \Big( \sum_{i=1}^n \frac{1}{2i} \Big) \deg V, \]
where $F_V$ denotes any Chow form of $V$ (see [55, Th. 3], [49, I, Th. 2]). Following Philippon [49, III], we introduce $h$ through this identity without appealing to Arakelov theory.

For an arbitrary affine variety, we define its (global) height as the sum of the heights of its equidimensional components. It coincides with the sum of the heights of its irreducible components. We agree that $h(\emptyset) := 0$.

We also introduce the local counterpart of this notion. Let $v \in M_K$ be an absolute value over $K$, and suppose that $V$ satisfies Assumption 1.5. Let $v_0 \in M_{\mathbb{Q}}$ be such that $v \mid v_0$, and let $\sigma_v : K_v \to \mathbb{C}_{v_0}$ be an embedding corresponding to $v$. We define the local height of $V$ at $v$ as $h_v(V) := h_{v_0}(\sigma_v(V))$. This is consistent with the global height:
\[ h(V) = \frac{1}{[K : \mathbb{Q}]} \sum_{v \in M_K} N_v\, h_v(V). \]
The global height $h$ is related to the height $h_{BGS}$ of Bost, Gillet, and Soulé by the formula
\[ h(V) = h_{BGS}(V) + \Big( \sum_{i=1}^r \sum_{j=1}^i \frac{1}{2j} \Big) \deg V \]
(see [5, Prop. 4.1.2 (i)]). It is also related to the height $\widetilde{h}$ introduced in [17] in terms of the so-called geometric solution of a variety. They are polynomially equivalent (see [54, Th. 1.3.26]), namely,
\[ \widetilde{h}(V) \le (n \deg V\, h(V))^c, \qquad h(V) \le (n \deg V\, \widetilde{h}(V))^c, \]
for some constant $c > 0$.

We have $h(V) \ge \big(\sum_{i=1}^r \sum_{j=1}^i 1/2j\big) \deg V$, with equality only in the case when $V$ is defined by the vanishing of $n - r$ standard coordinates (see [5, Th. 5.2.3]). For instance, $h(\mathbb{A}^n) = \sum_{i=1}^n \sum_{j=1}^i 1/2j$. In particular, $h(V) \ge 0$.



This notion of height satisfies the arithmetic Bézout inequality (see [5, Th. 5.5.1 (iii)], [49, III, Th. 3])
\[ h(V \cap W) \le h(V) \deg W + \deg V\, h(W) + c \deg V \deg W \]
for $V, W \subset \mathbb{A}^n(\overline{\mathbb{Q}})$, with
\[ c := \sum_{i=0}^{\dim V} \sum_{j=0}^{\dim W} \frac{1}{2(i + j + 1)} + \Big( n - \frac{\dim V + \dim W}{2} \Big) \log 2. \]
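To get a feeling for the size of the constant $c$, the right-hand side of the arithmetic Bézout inequality can be evaluated for given dimensions, degrees, and heights. This is a plain numerical sketch of the formula as stated above; the function and parameter names are illustrative.

```python
from math import log


def bezout_constant(n, dim_v, dim_w):
    """c = sum_{i=0}^{dim V} sum_{j=0}^{dim W} 1/(2(i+j+1)) + (n - (dim V + dim W)/2) log 2."""
    double_sum = sum(
        1.0 / (2 * (i + j + 1))
        for i in range(dim_v + 1)
        for j in range(dim_w + 1)
    )
    return double_sum + (n - (dim_v + dim_w) / 2) * log(2)


def arithmetic_bezout_bound(n, dim_v, deg_v, height_v, dim_w, deg_w, height_w):
    """Upper bound h(V) deg W + deg V h(W) + c deg V deg W for h(V cap W)."""
    c = bezout_constant(n, dim_v, dim_w)
    return height_v * deg_w + deg_v * height_w + c * deg_v * deg_w


# Two plane curves (n = 2, dim V = dim W = 1): c = 7/6 + log 2.
print(bezout_constant(2, 1, 1))
print(arithmetic_bezout_bound(2, 1, 3, 2.0, 1, 2, 1.5))
```

The constant depends only on $n$ and the two dimensions, so for fixed ambient dimension the bound is bilinear in the pairs (degree, height) of $V$ and $W$.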

1.2.5. Height of affine toric varieties

Now we consider the case of affine toric varieties. The height estimate obtained here is crucial in our treatment of the sparse arithmetic Nullstellensatz (see Corollary 4.12). In what follows we recall some basic notation and results on affine toric varieties and sparse resultants. References are [15] and [57].

Let $\mathcal{A} = \{\alpha_1, \dots, \alpha_N\} \subset \mathbb{Z}^n$ be a finite set of integer vectors. Let $r := \dim \mathcal{A}$ denote the dimension of $\mathcal{A}$, that is, the dimension of the free $\mathbb{Z}$-module $\mathbb{Z}\mathcal{A}$. We normalize the volume form of $\mathbb{R}\mathcal{A}$ so that any elementary simplex of the lattice $\mathbb{Z}\mathcal{A}$ has volume 1. The (normalized) volume $\mathrm{Vol}(\mathcal{A})$ of $\mathcal{A}$ is defined as the volume of the convex hull $\mathrm{Conv}(\{0\} \cup \mathcal{A})$ with respect to this volume form. In case $\mathbb{Z}\mathcal{A} = \mathbb{Z}^n$, $\mathrm{Vol}(\mathcal{A})$ then equals $n!$ times the volume of $\mathrm{Conv}(\{0\} \cup \mathcal{A})$ with respect to the Euclidean volume form of $\mathbb{R}^n$.

We associate to the set $\mathcal{A}$ a map $(\overline{\mathbb{Q}}^*)^n \to \overline{\mathbb{Q}}^N$ defined by $\xi \mapsto (\xi^{\alpha_1}, \dots, \xi^{\alpha_N})$. The Zariski closure of the image of this map is the affine toric variety $X_{\mathcal{A}} \subset \mathbb{A}^N$. This is an irreducible variety of dimension $r$ and degree $\mathrm{Vol}(\mathcal{A})$.

For $i = 0, \dots, r$, we denote by $U_i$ a group of variables indexed by the elements of $\mathcal{A}$ and we set

$$F_i := \sum_{\alpha \in \mathcal{A}} U_{i\alpha} \, x^{\alpha}$$

for the generic Laurent polynomial with support contained in $\mathcal{A}$. Let $W \subset (\mathbb{P}^{N-1})^{r+1} \times (\overline{\mathbb{Q}}^*)^n$ be the incidence variety of $F_0, \dots, F_r$ in $(\overline{\mathbb{Q}}^*)^n$, that is,

$$W = \{ (\nu_0, \dots, \nu_r; \xi) \, ; \ F_i(\nu_i)(\xi) = 0 \ \ \forall i \},$$

and let $\pi : (\mathbb{P}^{N-1})^{r+1} \times (\overline{\mathbb{Q}}^*)^n \to (\mathbb{P}^{N-1})^{r+1}$ be the canonical projection. Then $\pi(W)$ is an irreducible variety of codimension 1. Any of its defining polynomials $R_{\mathcal{A}} \in \mathbb{Q}[U_0, \dots, U_r]$ is called the $\mathcal{A}$-resultant or sparse resultant, and it coincides with a Chow form of the affine toric variety $X_{\mathcal{A}}$ (see [27]). It is a multihomogeneous polynomial of degree $\mathrm{Vol}(\mathcal{A})$ in each group of variables, and it is uniquely defined up to its sign, if we assume it to be a primitive polynomial with integer coefficients.

We obtain the following bound for the height of $X_{\mathcal{A}}$. Our argument relies on the Canny-Emiris determinantal formula for the sparse resultant (see [8]).
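As an illustration of the normalized volume recalled above, here is a small sketch for the planar case $n = 2$, under the assumption $\mathbb{Z}\mathcal{A} = \mathbb{Z}^2$, so that $\mathrm{Vol}(\mathcal{A})$ is $2!$ times the Euclidean area of $\mathrm{Conv}(\{0\} \cup \mathcal{A})$. The helper names `convex_hull` and `normalized_volume_2d` are ours; the hull is computed with the standard monotone chain method and the area with the shoelace formula.

```python
def convex_hull(points):
    """Monotone chain convex hull; returns the vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def normalized_volume_2d(support):
    """2! times the Euclidean area of Conv({0} u support).
    Assumption: the support generates the full lattice Z^2."""
    hull = convex_hull(list(support) + [(0, 0)])
    twice_area = 0
    for k in range(len(hull)):
        x1, y1 = hull[k]
        x2, y2 = hull[(k + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1  # shoelace formula gives twice the area
    return abs(twice_area)


# Support of a dense polynomial of degree 3 in two variables.
dense_support = [(i, j) for i in range(4) for j in range(4) if i + j <= 3]
print(normalized_volume_2d(dense_support))
```

For the dense support of degree $d$ the hull is the triangle with vertices $(0,0)$, $(d,0)$, $(0,d)$, whose normalized volume is $d^2$, in agreement with the Bézout number of two plane curves of degree $d$.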



PROPOSITION 1.7
Let $\mathcal{A} \subset \mathbb{Z}^n$ be a finite set of dimension $r$ and cardinality $\#\mathcal{A} \ge 2$. Then

$$h(X_{\mathcal{A}}) \le 2^{2r+2} \log(\#\mathcal{A}) \, \mathrm{Vol}(\mathcal{A}).$$

Proof
Let $R_{\mathcal{A}}$ denote the $\mathcal{A}$-resultant, which we assume to be primitive with integer coefficients. Thus

$$h(X_{\mathcal{A}}) = m(R_{\mathcal{A}}; S_{N+1}^{r+1}) + (r + 1) \Big( \sum_{i=1}^{N} \frac{1}{2i} \Big) \mathrm{Vol}(\mathcal{A}),$$

and so it suffices to estimate the $S_{N+1}^{r+1}$-Mahler measure of $R_{\mathcal{A}}$.

Let $\mathcal{M}$ be the Canny-Emiris matrix associated to the generic polynomial system $F_0, \dots, F_r$. This is a nonsingular square matrix of order $M$, where $M$ denotes the cardinality of the set

$$\mathcal{E} := \big( (r + 1) Q + \varepsilon \big) \cap \mathbb{Z}^n.$$

Here $Q := \mathrm{Conv}(\{0\} \cup \mathcal{A})$, and $\varepsilon \in \mathbb{R}^n$ is any vector such that each point in $\mathcal{E}$ is contained in the interior of a cell in a given triangulation of the polytope $(r + 1) Q$. In particular, $\varepsilon$ can be arbitrarily chosen in a nonempty open set of $\mathbb{R}^n$. Every nonzero entry of $\mathcal{M}$ is a variable $U_{i\alpha}$. In fact, each row has exactly $N$ nonzero entries, which consist of the variables in some group $U_i$. We refer to [8, Sec. 4] for the precise construction.

Thus $\det \mathcal{M} \in \mathbb{Z}[U_0, \dots, U_r]$ is a multihomogeneous polynomial of total degree $M$ and height bounded by $M \log N$. This polynomial is a nonzero multiple of the sparse resultant $R_{\mathcal{A}}$ (see [8, Th. 5.2]). The assumption that $R_{\mathcal{A}}$ is primitive implies that $\det \mathcal{M} / R_{\mathcal{A}}$ lies in $\mathbb{Z}[U_0, \dots, U_r]$, and so $m(R_{\mathcal{A}}) \le m(\det \mathcal{M})$.

Let $\{T_j\}_{j \in J}$ be a unimodular triangulation of $Q$, so that $\{(r + 1) T_j\}_{j \in J}$ is a triangulation of $(r + 1) Q$. For every $\varepsilon \in \mathbb{R}^n$, the set of integer points contained in $(r + 1) T_j + \varepsilon$ is in correspondence with a subset of those of $(r + 1) T_j$. Moreover, for a generic choice of $\varepsilon$ we lose at least the set of integer points in a facet of codimension 1. Thus

$$\# \big( ((r + 1) T_j + \varepsilon) \cap \mathbb{Z}^n \big) \le \# \big( r \, T_j \cap \mathbb{Z}^n \big) = \binom{2r}{r} \le 2^{2r-1},$$

and so

$$M \le \sum_{j \in J} \# \big( ((r + 1) T_j + \varepsilon) \cap \mathbb{Z}^n \big) \le 2^{2r-1} \, \#J = 2^{2r-1} \, \mathrm{Vol}(\mathcal{A}).$$
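The lattice point count used in the last two inequalities can be checked by brute force. The sketch below is only a sanity check under the assumption that a unimodular simplex of dimension $r$ is modeled by the standard simplex of $\mathbb{R}^r$ (any unimodular simplex is lattice-equivalent to it); the function name is ours.

```python
from itertools import product
from math import comb


def lattice_points_in_dilated_simplex(r):
    """Count the points x in Z^r with x_i >= 0 and x_1 + ... + x_r <= r,
    that is, the integer points of r times the standard r-simplex."""
    return sum(1 for x in product(range(r + 1), repeat=r) if sum(x) <= r)


for r in range(1, 6):
    count = lattice_points_in_dilated_simplex(r)
    # Compare with the closed formula binom(2r, r) and the bound 2^(2r-1).
    print(r, count, comb(2 * r, r), 2 ** (2 * r - 1))
```

For small $r$ the enumeration reproduces the central binomial coefficient $\binom{2r}{r}$, which stays below $2^{2r-1}$.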

Applying Lemma 1.1, we obtain

$$m(R_{\mathcal{A}}) \le \log |\det \mathcal{M}| + \deg(\det \mathcal{M}) \log N \le 2 M \log N \le 2^{2r} \log N \, \mathrm{Vol}(\mathcal{A}).$$

We conclude that

$$h(X_{\mathcal{A}}) = m(R_{\mathcal{A}}; S_{N+1}^{r+1}) + (r + 1) \Big( \sum_{i=1}^{N} \frac{1}{2i} \Big) \mathrm{Vol}(\mathcal{A}) \le m(R_{\mathcal{A}}) + 2 (r + 1) \log N \, \mathrm{Vol}(\mathcal{A}) \le 2^{2r+1} \log N \, \mathrm{Vol}(\mathcal{A}),$$

as $N = \#\mathcal{A} \ge 2$.

In case $\mathcal{A} \subset (\mathbb{Z}_{\ge 0})^n$, that is, when $F_0, \dots, F_r$ are polynomials, we set $d := \max\{|\alpha| : \alpha \in \mathcal{A}\} = \deg F_0$. We then have $N \le \binom{d+n}{n} \le (n + 1)^d$ and so

$$h(X_{\mathcal{A}}) \le 2^{2r+1} \log(n + 1) \, d \, \mathrm{Vol}(\mathcal{A}).$$

2. Estimates for local and global heights

In this chapter we study the basic properties of local and global heights that we need for our purposes. The key result is a precise estimate for the local height of the trace and the norm of a polynomial $f \in K[x_1, \dots, x_n]$ with respect to an integral extension $K[\mathbb{A}^r] \hookrightarrow K[V]$. We also study some of the basic properties of the height of a variety, in particular, its behavior under intersection with hypersurfaces and under affine maps.

2.1. Estimates for Chow forms

In this section we recall the notion of generalized Chow forms of a variety in the sense of Philippon [47], and we prove a technical estimate for its local height.

2.1.1. Generalized Chow forms

Let $V \subset \mathbb{A}^n$ be an affine equidimensional variety of dimension $r$ and degree $D$ defined over a field $k$. For $d \in \mathbb{N}$ we denote by $U(d)_0$ a group of $\binom{d+n}{n}$ variables. Also, for $1 \le i \le r$ we denote by $U_i$ a group of $n + 1$ variables, and we set $U(d) := \{U(d)_0, U_1, \dots, U_r\}$. Set

$$F := \sum_{|\alpha| \le d} U(d)_{0\alpha} \, x^{\alpha}, \qquad L_i := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n$$

for the generic polynomials in $n$ variables of degree $d$ and 1 associated, respectively, to $U(d)_0$ and $U_i$.
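The group $U(d)_0$ is indexed by the exponent vectors $\alpha$ with $|\alpha| \le d$. As a quick sanity check of the count $\binom{d+n}{n}$ and of the elementary bound $\binom{d+n}{n} \le (n + 1)^d$, one can enumerate these exponents directly. The function name below is ours and the enumeration is deliberately naive.

```python
from itertools import product
from math import comb


def exponents_up_to_degree(n, d):
    """All exponent vectors alpha in (Z_{>=0})^n with |alpha| <= d,
    that is, the index set of the coefficients of the generic polynomial F."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]


n, d = 3, 4
alphas = exponents_up_to_degree(n, d)
print(len(alphas), comb(d + n, n), (n + 1) ** d)
```

The first two printed numbers coincide, and both are bounded by the third.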



Set $N := \binom{d+n}{n} + r (n + 1)$, and let $W \subset \mathbb{A}^N \times V$ be the incidence variety of $F, L_1, \dots, L_r$ with respect to $V$; that is,

$$W := \big\{ (\nu(d)_0, \nu_1, \dots, \nu_r; \xi) \, ; \ \xi \in V, \ F(\nu(d)_0)(\xi) = 0, \ L_i(\nu_i)(\xi) = 0, \ 1 \le i \le r \big\}.$$

Let $\pi : \mathbb{A}^N \times \mathbb{A}^n \to \mathbb{A}^N$ denote the canonical projection. Then $\pi(W) \subset \mathbb{A}^N$ is a hypersurface (see [47, Prop. 1.5]), and any of its defining equations $F_{d,V} \in k[U(d)]$ is called a generalized Chow form or a $d$-Chow form of $V$. A $d$-Chow form is uniquely defined up to a scalar factor. It shares many properties with the usual Chow form, which corresponds to the case $d = 1$. We have

$$F_{d,V}(\nu(d)_0, \nu_1, \dots, \nu_r) = 0 \iff \overline{V} \cap \{F^h(\nu(d)_0) = 0\} \cap \{L_1^h(\nu_1) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \ne \emptyset$$

for $\nu(d)_0 \in \overline{k}^{\binom{d+n}{n}}$ and $\nu_i \in \overline{k}^{n+1}$. Here $\overline{V} \subset \mathbb{P}^n$ denotes the projective closure of $V$, while $F^h$ and $L_i^h$ stand for the homogenization of $F$ and $L_i$, respectively.

A $d$-Chow form $F_{d,V} \in k[U(d)]$ is a multihomogeneous polynomial of degree $D$ in the group of variables $U(d)_0$ and of degree $d D$ in each group $U_i$ (see [47, Lem. 1.8]). When $V$ is an irreducible variety, $F_{d,V}$ is an irreducible polynomial of $k[U(d)]$. When $V$ is equidimensional, it coincides with the product of $d$-Chow forms of its irreducible components.

Now, let $U_0$ be another group of $n + 1$ variables, and consider the morphism

$$\varrho_d : k[U(d)] \to k[U_0, U_1, \dots, U_r]$$

defined by $\varrho_d(F) = L_0^d$ and $\varrho_d(L_i) = L_i$ for $i = 1, \dots, r$, where $L_0$ stands for the generic linear form associated to $U_0$. In other terms,

$$\varrho_d(U(d)_{0\alpha}) = \binom{d}{\alpha} U_{00}^{d - |\alpha|} U_{01}^{\alpha_1} \cdots U_{0n}^{\alpha_n}, \qquad \text{where } \binom{d}{\alpha} := \frac{d!}{(d - |\alpha|)! \, \alpha_1! \cdots \alpha_n!},$$

for $|\alpha| \le d$, and $\varrho_d(U_{ij}) = U_{ij}$ for $i = 1, \dots, r$ and $j = 0, \dots, n$. This morphism gives the following relation between a $d$-Chow form $F_{d,V}$ and the usual one (see [47, proof of Prop. 2.8]).

LEMMA 2.1
Let $V \subset \mathbb{A}^n$ be an equidimensional variety. Then $\varrho_d(F_{d,V}) = \lambda \, F_V^d$ for some $\lambda \in k^*$.
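Before turning to the proof, the explicit description of $\varrho_d$ on the variables $U(d)_{0\alpha}$ can be checked on numbers: by the multinomial theorem, substituting the stated images into $F$ must give $L_0^d$. The following sketch does this with exact integer arithmetic; the function names are ours, and the evaluation point is an arbitrary choice.

```python
from itertools import product
from math import factorial


def multinomial(d, alpha):
    """The coefficient d! / ((d - |alpha|)! * alpha_1! * ... * alpha_n!)."""
    denom = factorial(d - sum(alpha))
    for a in alpha:
        denom *= factorial(a)
    return factorial(d) // denom


def rho_d_of_F(u0, x, d):
    """Evaluate sum_{|alpha| <= d} rho_d(U(d)_{0 alpha}) * x^alpha at integers,
    with u0 = (u00, u01, ..., u0n) and x = (x1, ..., xn)."""
    n = len(x)
    total = 0
    for alpha in product(range(d + 1), repeat=n):
        if sum(alpha) > d:
            continue
        term = multinomial(d, alpha) * u0[0] ** (d - sum(alpha))
        for j in range(n):
            term *= u0[j + 1] ** alpha[j] * x[j] ** alpha[j]
        total += term
    return total


def L0_power(u0, x, d):
    """Evaluate (u00 + u01*x1 + ... + u0n*xn)^d."""
    return (u0[0] + sum(u * xi for u, xi in zip(u0[1:], x))) ** d


u0, x, d = (2, 3, -1), (5, 7), 3
print(rho_d_of_F(u0, x, d), L0_power(u0, x, d))
```

At the chosen point the linear form takes the value $2 + 3 \cdot 5 - 7 = 10$, so both numbers equal $10^3$.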



Proof
It is enough to consider the case when $V$ is irreducible. Set $r := \dim V$. The polynomials $\varrho_d(F_{d,V})$ and $F_V$ both have the same zero locus: let $\nu_i \in \mathbb{A}^{n+1}$ for $i = 0, \dots, r$. As $\varrho_d(F_{d,V}) = F_{d,V}\big( (\varrho_d(U(d)_{0\alpha}))_{\alpha}, U_1, \dots, U_r \big)$, then $\varrho_d(F_{d,V})(\nu_0, \dots, \nu_r) = 0$ if and only if

$$\overline{V} \cap \{\varrho_d(F^h)(\nu_0) = 0\} \cap \{L_1^h(\nu_1) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \ne \emptyset,$$

that is, if and only if

$$\overline{V} \cap \{L_0^h(\nu_0)^d = 0\} \cap \{L_1^h(\nu_1) = 0\} \cap \cdots \cap \{L_r^h(\nu_r) = 0\} \ne \emptyset,$$

which is clearly equivalent to $F_V(\nu_0, \dots, \nu_r) = 0$. On the other hand, as $V$ is irreducible, $F_V$ is an irreducible polynomial, and thus $\varrho_d(F_{d,V})$ is a power of $F_V$ (modulo a constant $\lambda$). Since $\deg F_V = (r + 1) \deg V$ and $\deg \varrho_d(F_{d,V}) = (r + 1) \, d \deg V$, we derive that $\varrho_d(F_{d,V}) = \lambda \, F_V^d$ for some $\lambda \in k^*$.

Now, assume that $V$ satisfies Assumption 1.5. Then

$$\overline{V} \cap \{x_0^d = 0\} \cap \{x_1 = 0\} \cap \cdots \cap \{x_r = 0\} = \emptyset.$$

Setting $e(d)_{\alpha}$ and $e_i$ for the $\alpha$-vector and the $(i + 1)$-vector of the canonical bases of $k^{\binom{d+n}{n}}$ and $k^{n+1}$, respectively, we infer that $F_{d,V}(e(d)_0, e_1, \dots, e_r)$, that is, the coefficient of the monomial $U(d)_{00}^{D} \, U_{11}^{dD} \cdots U_{rr}^{dD}$, is nonzero. We define the (normalized) $d$-Chow form $\mathcal{C}h_{d,V}$ of $V$ by fixing the choice of $F_{d,V}$ with the condition

$$\mathcal{C}h_{d,V}(e(d)_0, e_1, \dots, e_r) = 1.$$

In the previous construction, $U(d)_{00}^{D} \, U_{11}^{dD} \cdots U_{rr}^{dD}$ is the only monomial of $k[U(d)]$ which maps through $\varrho_d$ to $U_{00}^{dD} \, U_{11}^{dD} \cdots U_{rr}^{dD}$. The imposed normalizations then imply

$$\varrho_d(\mathcal{C}h_{d,V}) = \mathcal{C}h_V^d.$$

2.1.2. An estimate for generalized Chow forms

The following technical result is crucial to our local height estimates for the trace and the norm of a polynomial (see Sec. 2.3.2), as well as for the intersection of a variety with a hypersurface (see Sec. 2.2.2). The proof follows the lines of [47, Prop. 2.8].

We adopt the following convention: Let $f \in k[x_1, \dots, x_n]$ be a polynomial of degree $d$. We denote by $F_{d,V}(f)$ and $\mathcal{C}h_{d,V}(f)$ the specialization of $U(d)_0$ into the coefficients of $f$ in $F_{d,V}$ and $\mathcal{C}h_{d,V}$, respectively.

LEMMA 2.2
Let $V \subset \mathbb{A}^n(\mathbb{C}_v)$ be an equidimensional variety of dimension $r$ which satisfies Assumption 1.5. Let $f \in \mathbb{C}_v[x_1, \dots, x_n]$. Then
• $m\big(\mathcal{C}h_{\deg f, V}(f); S_{n+1}^{r}\big) + r \big( \sum_{i=1}^{n} 1/(2i) \big) \deg f \deg V \le \deg f \, h_v(V) + h_v(f) \deg V + \log(n + 1) \deg f \deg V$ for $v = \infty$,
• $h_v\big(\mathcal{C}h_{\deg f, V}(f)\big) \le \deg f \, h_v(V) + h_v(f) \deg V$ for $v = p$ for some prime $p$.

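We need the following lemma in order to treat the non-Archimedean case.

LEMMA 2.3
Let $g \in \mathbb{C}_p[y_1, \dots, y_m]$, and let $\Omega \subset \mathbb{A}^m(\mathbb{C}_p)$ be a Zariski open set. Then

$$|g|_p = \max \{ |g(\nu)|_p \, ; \ \nu \in \Omega, \ |\nu|_p = 1 \}.$$

Proof
For $q \in \mathbb{N}$ we denote by $G_q$ the set of $q$-th roots of 1 in $\overline{\mathbb{Q}} \hookrightarrow \mathbb{C}_p$. Let $\alpha = (\alpha_1, \dots, \alpha_m) \in \mathbb{Z}^m$ such that $|\alpha_i| < q$. Then

$$\sum_{\xi \in G_q^m} \xi^{\alpha} = \begin{cases} 0 & \text{if } \alpha \ne 0, \\ q^m & \text{if } \alpha = 0. \end{cases}$$

Set $g = \sum_{\alpha} a_{\alpha} y^{\alpha}$. Let $q > \deg g$ such that $|q|_p = 1$, that is, $p \nmid q$. Then for any $\omega = (\omega_1, \dots, \omega_m) \in (\mathbb{C}_p^*)^m$ we have

$$a_{\alpha} = \frac{1}{\omega^{\alpha} \, q^m} \sum_{\xi \in G_q^m} g(\omega \, \xi) \, \xi^{-\alpha}.$$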
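The coefficient formula above is a discrete Fourier inversion over the group $G_q^m$. The following sketch illustrates it numerically, with the complex numbers standing in for $\mathbb{C}_p$ (an assumption made only so that the computation can be run in floating point; the $p$-adic absolute value plays no role in the identity itself). The names `coefficient_by_averaging` and `example_g` are ours.

```python
import cmath
from itertools import product


def coefficient_by_averaging(g, alpha, omega, q):
    """Recover the coefficient a_alpha of a polynomial g in m variables from
    its values at the points omega * xi, with xi running over G_q^m.
    Assumption: q is larger than the degree of g."""
    m = len(alpha)
    roots = [cmath.exp(2j * cmath.pi * k / q) for k in range(q)]
    total = 0
    for xi in product(roots, repeat=m):
        point = [w * z for w, z in zip(omega, xi)]
        weight = 1
        for z, a in zip(xi, alpha):
            weight *= z ** (-a)  # the factor xi^(-alpha)
        total += g(*point) * weight
    omega_alpha = 1
    for w, a in zip(omega, alpha):
        omega_alpha *= w ** a
    return total / (omega_alpha * q ** m)


def example_g(y1, y2):
    """g = 3 + 2*y1*y2^2 - 5*y1^2, a polynomial of degree 3."""
    return 3 + 2 * y1 * y2 ** 2 - 5 * y1 ** 2


# q = 4 exceeds deg g = 3; omega is an arbitrary point with nonzero coordinates.
print(coefficient_by_averaging(example_g, (1, 2), (2.0, 0.5), 4))
```

Up to rounding, the call returns the coefficient of $y_1 y_2^2$ in the example polynomial; exponents that do not occur in $g$ give zero.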

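From the previous expression we derive that for each $\omega \in S := \{\omega \, ; \ |\omega_i|_p = 1\}$ there exists $\xi_{\omega} \in G_q^m$ such that $|g|_p \le \max_{\xi} |g(\omega \, \xi)|_p = |g(\omega \, \xi_{\omega})|_p$. But on the other hand, $|g(\omega \, \xi_{\omega})|_p \le \max_{\alpha} |a_{\alpha}|_p = |g|_p$. Thus $|g|_p = |g(\omega \, \xi_{\omega})|_p$. The set $S$ is Zariski dense in $\mathbb{A}^m(\mathbb{C}_p)$, and as $G_q^m$ is finite, the set $\{\omega \, \xi_{\omega} \, ; \ \omega \in S\} \cap \Omega$ is also dense and, in particular, is nonempty. For any $\nu_0$ in this set we have $|g|_p = |g(\nu_0)|_p$ and therefore

$$|g|_p \le \max \{ |g(\nu)|_p \, ; \ \nu \in \Omega, \ |\nu|_p = 1 \}.$$

The other inequality is straightforward.

Proof of Lemma 2.2
First, we consider the case when $V$ is a zero-dimensional variety. We may assume without loss of generality that $V$ is irreducible; that is, $V = \{\xi\}$ for some $\xi = (\xi_1, \dots, \xi_n) \in \mathbb{C}_v^n$. Set $d := \deg f$. Then

$$\mathcal{C}h_V = L(\xi) := U_0 + U_1 \xi_1 + \cdots + U_n \xi_n, \qquad \mathcal{C}h_{d,V} = F(\xi) := \sum_{\alpha} U_{\alpha} \, \xi^{\alpha},$$

where $L$ and $F$ denote generic polynomials in $n$ variables of degree 1 and $d$, respectively. Then

$$h_{\infty}(F(\xi)) = \log \max_{|\alpha| \le d} \{ |\xi^{\alpha}| \} = \log \max_{i} \{ 1, |\xi_i|^d \} = d \, h_{\infty}(L(\xi)) \le d \, m\big(L(\xi); S_{n+1}\big) + d \sum_{i=1}^{n} \frac{1}{2i}.$$

The last line follows from inequality (1.2). Now, a direct computation shows that

$$h_{\infty}\big(\mathcal{C}h_{d,V}(f)\big) \le h_{\infty}\big(F(\xi)\big) + h_{\infty}(f) + \log(n + 1) \, d.$$

In this case, $\mathcal{C}h_{d,V}(f) \in \mathbb{C}$ and so

$$m\big(\mathcal{C}h_{d,V}(f)\big) = h_{\infty}\big(\mathcal{C}h_{d,V}(f)\big) \le d \, m\big(L(\xi); S_{n+1}\big) + d \sum_{i=1}^{n} \frac{1}{2i} + h_{\infty}(f) + \log(n + 1) \, d \le d \, h_{\infty}(V) + h_{\infty}(f) + \log(n + 1) \, d.$$

Analogously, $h_p\big(F(\xi)\big) \le d \, h_p\big(L(\xi)\big)$ and so $h_p\big(\mathcal{C}h_{d,V}(f)\big) \le d \, h_p(V) + h_p(f)$.

Now, we consider the general case. Set $\nu = (\nu_1, \dots, \nu_r) \in \mathbb{C}_v^{r(n+1)}$, $L(\nu_i) := \nu_{i0} + \nu_{i1} x_1 + \cdots + \nu_{in} x_n$, and

$$V(\nu) := V \cap V\big(L(\nu_1), \dots, L(\nu_r)\big) \subset \mathbb{A}^n(\mathbb{C}_v).$$

Then $V(\nu)$ is a zero-dimensional variety of degree $\deg V$ for $\nu$ in a Zariski open set $\Omega_v$ of $\mathbb{A}^{r(n+1)}(\mathbb{C}_v)$. Let $\nu \in \Omega_v$. By [47, Prop. 2.4] there exist $\lambda_{\nu}, \theta_{\nu} \in \mathbb{C}_v^*$ such that

$$\mathcal{C}h_{V(\nu)}(U_0) = \lambda_{\nu} \, \mathcal{C}h_V(U_0, \nu), \qquad \mathcal{C}h_{d,V(\nu)}\big(U(d)_0\big) = \theta_{\nu} \, \mathcal{C}h_{d,V}\big(U(d)_0, \nu\big), \tag{2.1}$$

where $\mathcal{C}h_V(U_0, \nu)$, $\mathcal{C}h_{d,V}\big(U(d)_0, \nu\big)$ stand for the specialization of $U_1, \dots, U_r$ into $\nu_1, \dots, \nu_r$. Applying the morphism $\varrho_d$ linking the $d$-Chow form with the usual one, we obtain

$$\mathcal{C}h_{V(\nu)}^d = \varrho_d\big(\mathcal{C}h_{d,V(\nu)}\big) = \theta_{\nu} \, \varrho_d\big(\mathcal{C}h_{d,V}(\nu)\big) = \theta_{\nu} \, \mathcal{C}h_V^d(\nu),$$

and so $\theta_{\nu} = \lambda_{\nu}^d$ in identities (2.1).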


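We consider the case $v = \infty$. Any Zariski closed set of $\mathbb{A}^{r(n+1)}(\mathbb{C})$ intersects $S_{n+1}^{r}$ in a set of $\mu_{n+1}^{r}$-measure zero, and so the previous relation holds for almost every $\nu \in S_{n+1}^{r}$, which means that for those $\nu$, $\mathcal{C}h_{d,V}(f, \nu) = \mathcal{C}h_{d,V(\nu)}(f) / \lambda_{\nu}^d$. Therefore

$$\begin{aligned}
m\big(\mathcal{C}h_{d,V}(f); S_{n+1}^{r}\big) &= \int_{S_{n+1}^{r}} \log |\mathcal{C}h_{d,V}(f, \nu)| \, \mu_{n+1}^{r} \\
&= \int_{S_{n+1}^{r}} \big( \log |\mathcal{C}h_{d,V(\nu)}(f)| - d \log |\lambda_{\nu}| \big) \, \mu_{n+1}^{r} \\
&\le \int_{S_{n+1}^{r}} \big( d \, h_{\infty}(V(\nu)) + h_{\infty}(f) \deg V(\nu) + \log(n + 1) \, d \deg V(\nu) - d \log |\lambda_{\nu}| \big) \, \mu_{n+1}^{r} \\
&= d \int_{S_{n+1}^{r}} \big( m\big(\mathcal{C}h_{V(\nu)}(U_0); S_{n+1}\big) - \log |\lambda_{\nu}| \big) \, \mu_{n+1}^{r} + \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V + h_{\infty}(f) \deg V + \log(n + 1) \, d \deg V \\
&= d \int_{S_{n+1}^{r}} m\big(\mathcal{C}h_V(U_0, \nu); S_{n+1}\big) \, \mu_{n+1}^{r} + \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V + h_{\infty}(f) \deg V + \log(n + 1) \, d \deg V \\
&= d \, h_{\infty}(V) + h_{\infty}(f) \deg V + \log(n + 1) \, d \deg V - r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V.
\end{aligned}$$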

The case $v = p$ follows analogously from the previous lemma, identities (2.1), and the zero-dimensional case. As before, let $\Omega_v \subset \mathbb{A}^{r(n+1)}(\mathbb{C}_v)$ be a Zariski open set such that $\nu \in \Omega_v$ implies that $V(\nu)$ is a zero-dimensional variety of degree $\deg V$. By Lemma 2.3 we can take $\nu \in \Omega_v$ such that $|\nu|_p = 1$ and $|\mathcal{C}h_{d,V}(f)|_p = |\mathcal{C}h_{d,V}(f, \nu)|_p$. Thus

$$\begin{aligned}
\log |\mathcal{C}h_{d,V}(f)|_p &= \log |\mathcal{C}h_{d,V(\nu)}(f)|_p - d \log |\lambda_{\nu}|_p \\
&\le d \log |\mathcal{C}h_{V(\nu)}|_p + h_p(f) \deg V - d \log |\lambda_{\nu}|_p \\
&= d \log |\mathcal{C}h_V(U_0, \nu)|_p + h_p(f) \deg V \\
&\le d \log |\mathcal{C}h_V|_p + h_p(f) \deg V.
\end{aligned}$$

The hypothesis that $V$ satisfies Assumption 1.5 is essential in order to properly normalize the involved Chow forms and to define the local height of $V$. If we disregard normalization, we obtain altogether the following global result.

LEMMA 2.4
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over a number field $K$, and let $F_{d,V}$ be a $d$-Chow form of $V$. Let $f \in K[x_1, \dots, x_n]$ be a polynomial of degree $d$. Then

$$\frac{1}{[K : \mathbb{Q}]} \Big( \sum_{v \in M_K^{\infty}} N_v \, m\big(\sigma_v(F_{d,V}(f)); S_{n+1}^{r}\big) + \sum_{v \notin M_K^{\infty}} N_v \log |F_{d,V}(f)|_v \Big) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V \le d \, h(V) + h(f) \deg V + \log(n + 1) \, d \deg V.$$

Proof
Note first that the product formula implies that the left-hand side of the inequality does not depend on the choice of the $d$-Chow form $F_{d,V}$. In the case when $V$ is zero-dimensional, it satisfies Assumption 1.5 trivially. Thus the result follows from direct application of the previous lemma. For the general case, we let $F_{d,V}$ be an arbitrary $d$-Chow form of $V$ and we choose $F_V$ such that $\varrho_d(F_{d,V}) = F_V^d$ holds. Fix an absolute value $v \in M_K$. Following the notation in the proof of the previous lemma, for any $\nu \in \Omega_v$ there exists $\lambda_{\nu} \in \mathbb{C}_v^*$ such that

$$\mathcal{C}h_{V(\nu)}(U_0) = \lambda_{\nu} \, F_V(U_0, \nu), \qquad \mathcal{C}h_{d,V(\nu)}\big(U(d)_0\big) = \lambda_{\nu}^d \, F_{d,V}\big(U(d)_0, \nu\big).$$

We then proceed as in the previous lemma, and we obtain the corresponding estimate for $v$. Adding up these estimates, we derive the estimate in terms of the height of the variety.

2.2. Basic properties of the height

We derive some of the basic properties of the notion of height of a variety. In particular, we study the behavior of the height of a variety under intersection with a hypersurface and under an affine map. We also obtain an arithmetic version of the Bernstein-Kushnirenko theorem.



2.2.1. Height of varieties under affine maps

Let $\varphi : \mathbb{A}^n \to \mathbb{A}^m$ be a regular map defined by polynomials $\varphi_1, \dots, \varphi_m \in K[x_1, \dots, x_n]$. We recall that the height of $\varphi$ is defined as $h(\varphi) := h(\varphi_1, \dots, \varphi_m)$. We obtain the following estimate for the height of the image of a variety under an affine map.

PROPOSITION 2.5
Let $V \subset \mathbb{A}^n$ be a variety of dimension $r$, and let $\varphi : \mathbb{A}^n \to \mathbb{A}^N$ be an affine map. Then

$$h\big(\varphi(V)\big) \le h(V) + (r + 1) \big( h(\varphi) + 8 \log(n + N + 1) \big) \deg V.$$

The proof of this result follows from the study of the particular cases of a linear projection and an injective affine map. The following estimate for the height of a linear projection of a variety generalizes [11, Prop. 2.10] and [5, Sec. 3.3.2]. Its proof is essentially based on the description of the Chow form of such a projection variety, due to P. Pedersen and B. Sturmfels [46, Prop. 4.1].

LEMMA 2.6
Let $V \subset \mathbb{A}^n \times \mathbb{A}^m$ be a variety of dimension $r$, and let $\pi : \mathbb{A}^n \times \mathbb{A}^m \to \mathbb{A}^n$ denote the projection $(x, y) \mapsto x$. Then

$$h\big(\pi(V)\big) \le h(V) + 3 (r + 1) \log(n + m + 1) \deg V.$$

Proof
We assume without loss of generality that $V$ is irreducible. Set $W := \pi(V) \subset \mathbb{A}^n$ and $s := \dim W$. The case $s = r$ follows directly from [46, Prop. 4.1]. In this case, there exists a partial monomial order $\prec$ such that

$$F_W \mid \operatorname{init} F_V,$$

where $\operatorname{init} F_V$ denotes the initial polynomial of $F_V$ with respect to $\prec$. In particular, $\operatorname{init} F_V$ is the sum of some of the terms in the monomial expansion of $F_V$.

The general case $s \le r$ reduces to the previous one. We choose standard coordinates $z_{s+1}, \dots, z_r$ of $\mathbb{A}^m$ such that the projection

$$\varpi : \mathbb{A}^n \times \mathbb{A}^m \to \mathbb{A}^n \times \mathbb{A}^{r-s}, \qquad (x, y) \mapsto (x, z)$$

verifies $\dim Z = r$ for $Z := \varpi(V)$. Let $\varrho : \mathbb{A}^n \times \mathbb{A}^{r-s} \to \mathbb{A}^n$ denote the canonical projection. Then $F_Z \mid \operatorname{init} F_V$, $\pi = \varrho \circ \varpi$, and $W = \varrho(Z)$. We have that $\varrho^{-1}(\xi) = \{\xi\} \times \mathbb{A}^{r-s}$ for $\xi \in \varrho(Z)$ by the theorem of dimension of fibers. Thus $Z = W \times \mathbb{A}^{r-s}$, and, in particular,

$$i(W) = Z \cap V(z_{s+1}, \dots, z_r) \subset \mathbb{A}^n \times \mathbb{A}^{r-s},$$

where $i$ denotes the canonical inclusion $\mathbb{A}^n \hookrightarrow \mathbb{A}^n \times \mathbb{A}^{r-s}$. We have $\deg W = \deg Z$ and so $F_W := F_Z(z_{s+1}, \dots, z_r)$ is a Chow form of $W$ (see [47, Prop. 2.4]).

Now, we estimate the height of $F_W$. Let $K$ be a number field of definition of $V$, and set $\operatorname{init} F_V = Q \, F_Z$ for some polynomial $Q$. From the proof of [47, Lem. 1.12(v)], there is a nonzero coefficient $\lambda$ of $Q$ such that $\log |\lambda|_v \le m(\sigma_v(Q))$ for all $v \in M_K^{\infty}$. Clearly $\log |\lambda|_v \le \log |Q|_v$ also holds for all $v \notin M_K^{\infty}$. Thus

$$m\big(\sigma_v(F_Z)\big) \le m\big(\sigma_v(\operatorname{init} F_V)\big) - \log |\lambda|_v$$

for $v \in M_K^{\infty}$, while $\log |F_Z|_v \le \log |\operatorname{init} F_V|_v - \log |\lambda|_v$ for $v \notin M_K^{\infty}$.

Let $v \in M_K^{\infty}$. From [47, Lem. 1.13] we obtain $m\big(\sigma_v(F_W)\big) \le m\big(\sigma_v(F_Z)\big)$. Hence

$$\begin{aligned}
m\big(\sigma_v(F_W); S_{n+1}^{s+1}\big) &\le m\big(\sigma_v(F_W)\big) \le m\big(\sigma_v(\operatorname{init} F_V)\big) - \log |\lambda|_v \\
&\le \log |\operatorname{init} F_V|_v + (r + 1) \log(n + m + 1) \deg V - \log |\lambda|_v \\
&\le \log |F_V|_v + (r + 1) \log(n + m + 1) \deg V - \log |\lambda|_v \\
&\le m\big(\sigma_v(F_V); S_{n+m+1}^{r+1}\big) + (r + 1) \Big( \sum_{i=1}^{n+m} \frac{1}{2i} \Big) \deg V + 2 (r + 1) \log(n + m + 1) \deg V - \log |\lambda|_v
\end{aligned}$$

by application of Lemma 1.1 and inequality (1.2). In case $v \notin M_K^{\infty}$ we have analogously $\log |F_W|_v \le \log |F_V|_v - \log |\lambda|_v$, and so

$$h(W) \le h(V) + (s + 1) \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) \deg V + 2 (r + 1) \log(n + m + 1) \deg V \le h(V) + 3 (r + 1) \log(n + m + 1) \deg V.$$

The following lemma is a variant of [49, I, Prop. 7].



LEMMA 2.7
Let $V \subset \mathbb{A}^m$ be a variety of dimension $r$, and let $\psi : \mathbb{A}^m \to \mathbb{A}^n$ be an injective affine map. Then

$$h\big(\psi(V)\big) \le h(V) + (r + 1) \big( h(\psi) + 5 \log(n + 1) \big) \deg V.$$

Proof
We assume again without loss of generality that $V$ is irreducible. Let $K$ be a number field of definition of both $V$ and $\psi$, and set $\psi(x) = a + A \, x$ for some $(m \times n)$-matrix $A$ of maximal rank and $a \in K^n$. Then let $\psi^* : \mathbb{A}^{n+1} \to \mathbb{A}^{m+1}$ be the linear map $y \mapsto (a, A)^t \, y$ defined by the transpose of the matrix associated to $\psi$. Set $W := \psi(V)$, and let $\overline{V} \subset \mathbb{P}^m$, $\overline{W} \subset \mathbb{P}^n$ denote the projective closures of $V$ and $W$, respectively.

For $i = 0, \dots, r$ we let $\nu_i \in \overline{\mathbb{Q}}^{n+1}$, and we set $L^h(\nu_i) := \nu_{i0} x_0 + \cdots + \nu_{in} x_n$ for the homogenization of the associated linear form. Then $F_W(\nu_0, \dots, \nu_r) = 0$ if and only if there exists $\xi \in \overline{V}$ such that $\psi(\xi)$ lies in the linear space determined by $\nu_0, \dots, \nu_r$. Equivalently, $\xi$ lies in the linear space determined by $\psi^*(\nu_0), \dots, \psi^*(\nu_r)$. We conclude that $F_W = F_V \circ (\psi^*)^{r+1}$.

Let $v \in M_K^{\infty}$. Then

$$\begin{aligned}
m\big(\sigma_v(F_W); S_{n+1}^{r+1}\big) &\le \log |F_W|_v + (r + 1) \log(n + 1) \deg V \\
&\le \log |F_V|_v + (r + 1) \big( h_v(\psi) + 2 \log(n + 1) \big) \deg V + (r + 1) \log(n + 1) \deg V \\
&\le m\big(\sigma_v(F_V)\big) + (r + 1) \log(m + 1) \deg V + (r + 1) \big( h_v(\psi) + 3 \log(n + 1) \big) \deg V \\
&\le m\big(\sigma_v(F_V); S_{m+1}^{r+1}\big) + \Big( \sum_{i=1}^{m} \frac{1}{2i} \Big) (r + 1) \deg V + (r + 1) \big( h_v(\psi) + 4 \log(n + 1) \big) \deg V.
\end{aligned}$$

Here we have applied Lemma 1.1, inequality (1.2), and the proof of Lemma 1.2(c), using the fact that the number of monomials of $F_V$ is bounded by $(n + 1)^{(r+1) \deg V}$. In case $v \notin M_K^{\infty}$ we obtain analogously $\log |F_W|_v \le \log |F_V|_v + (r + 1) \, h_v(\psi) \deg V$, and hence

$$h\big(\psi(V)\big) \le h(V) + (r + 1) \big( h(\psi) + 5 \log(n + 1) \big) \deg V.$$



Proof of Proposition 2.5
Let $\psi : \mathbb{A}^n \to \mathbb{A}^N \times \mathbb{A}^n$ be the injective map $x \mapsto \big(\varphi(x), x\big)$. Then $\varphi$ decomposes as $\varphi = \pi \circ \psi$, where $\pi : \mathbb{A}^N \times \mathbb{A}^n \to \mathbb{A}^N$ denotes the canonical projection. Thus

$$\begin{aligned}
h\big(\varphi(V)\big) &\le h\big(\psi(V)\big) + 3 (r + 1) \log(n + N + 1) \deg \psi(V) \\
&\le h(V) + (r + 1) \big( h(\psi) + 5 \log(n + N + 1) \big) \deg V + 3 (r + 1) \log(n + N + 1) \deg V \\
&= h(V) + (r + 1) \big( h(\varphi) + 8 \log(n + N + 1) \big) \deg V.
\end{aligned}$$

2.2.2. Local height of the intersection of varieties

We obtain the following estimate for the local height of the intersection of a variety with a hypersurface. This is a consequence of our previous estimate for generalized Chow forms. This result can be seen as the local analogue of [47, Prop. 2.8], and its proof closely follows it.

PROPOSITION 2.8
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over a number field $K$. Let $f \in K[x_1, \dots, x_n]$ be a polynomial that is not a zero divisor in $K[V]$. We assume that both $V$ and $V \cap V(f)$ satisfy Assumption 1.5. Then there exists $\lambda \in K^*$ such that
• $h_v\big(V \cap V(f)\big) \le \deg f \, h_v(V) + h_v(f) \deg V + \log(n + 1) \deg f \deg V - \log |\lambda|_v$ for $v \in M_K^{\infty}$,
• $h_v\big(V \cap V(f)\big) \le \deg f \, h_v(V) + h_v(f) \deg V - \log |\lambda|_v$ for $v \notin M_K^{\infty}$.

Proof
Set $d := \deg f$ and $W := V \cap V(f) \subset \mathbb{A}^n$. By [47, Prop. 2.4] there exists $Q \in K[U_1, \dots, U_r] \setminus \{0\}$ such that

$$\mathcal{C}h_{d,V}(f) = Q \, \mathcal{C}h_W.$$

Then, as in the proof of Lemma 2.6, there exists a nonzero coefficient $\lambda$ of $Q$ such that $\log |\lambda|_v \le m\big(\sigma_v(Q)\big)$ for all $v \in M_K^{\infty}$ and $\log |\lambda|_v \le \log |Q|_v$ for all $v \notin M_K^{\infty}$.

Now, let $v \in M_K^{\infty}$. From inequality (1.2) we obtain

$$\log |\lambda|_v \le m\big(\sigma_v(Q)\big) \le m\big(\sigma_v(Q); S_{n+1}^{r}\big) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) (d \deg V - \deg W)$$

since $Q$ has degree $d \deg V - \deg W$ in each group of variables. Then

$$\begin{aligned}
h_v(W) &= m\big(\sigma_v(\mathcal{C}h_W); S_{n+1}^{r}\big) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) \deg W \\
&= m\big(\sigma_v(\mathcal{C}h_{d,V}(f)); S_{n+1}^{r}\big) + r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) d \deg V - m\big(\sigma_v(Q); S_{n+1}^{r}\big) - r \Big( \sum_{i=1}^{n} \frac{1}{2i} \Big) (d \deg V - \deg W) \\
&\le d \, h_v(V) + h_v(f) \deg V + \log(n + 1) \, d \deg V - \log |\lambda|_v
\end{aligned}$$

by straightforward application of Lemma 2.2. The case $v \notin M_K^{\infty}$ follows in an analogous way.

Proposition 2.8 can be immediately generalized to families of polynomials.

COROLLARY 2.9
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $K$. Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials that form a complete intersection in $V$. We assume that $V \cap V(f_1, \dots, f_i)$ satisfies Assumption 1.5 for $i = 0, \dots, s$. Set $d_i := \deg f_i$. Then there exists $\lambda \in K^*$ such that
• $h_v\big(V \cap V(f_1, \dots, f_s)\big) \le \Big( h_v(V) + \big( \sum_i h_v(f_i)/d_i \big) \deg V + s \log(n + 1) \deg V \Big) \prod_i d_i - \log |\lambda|_v$ for $v \in M_K^{\infty}$,
• $h_v\big(V \cap V(f_1, \dots, f_s)\big) \le \Big( h_v(V) + \big( \sum_i h_v(f_i)/d_i \big) \deg V \Big) \prod_i d_i - \log |\lambda|_v$ for $v \notin M_K^{\infty}$.

Proof
We just consider the case when $v$ is Archimedean, as the other one follows similarly. From the preceding result we obtain

$$\begin{aligned}
h_v\big(V \cap V(f_1, \dots, f_i)\big) \le{} & d_i \, h_v\big(V \cap V(f_1, \dots, f_{i-1})\big) + h_v(f_i) \deg\big(V \cap V(f_1, \dots, f_{i-1})\big) \\
& + \log(n + 1) \, d_i \deg\big(V \cap V(f_1, \dots, f_{i-1})\big) - \log |\lambda_i|_v
\end{aligned}$$

for some $\lambda_i \in K^*$. For the final estimate we apply iteratively this inequality and we set $\lambda := \prod_{i=1}^{s} \lambda_i^{d_{i+1} \cdots d_s}$.

COROLLARY 2.10
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials that form a complete intersection in $\mathbb{A}^n$. We assume that $V(f_1, \dots, f_i)$ satisfies Assumption 1.5 for $i = 1, \dots, s$. Set $d_i := \deg f_i$. Then there exists $\lambda \in K^*$ such that
• $h_v\big(V(f_1, \dots, f_s)\big) \le \Big( \sum_i h_v(f_i)/d_i + (n + s) \log(n + 1) \Big) \prod_i d_i - \log |\lambda|_v$ for $v \in M_K^{\infty}$,
• $h_v\big(V(f_1, \dots, f_s)\big) \le \Big( \sum_i h_v(f_i)/d_i \Big) \prod_i d_i - \log |\lambda|_v$ for $v \notin M_K^{\infty}$.

Proof
We apply the previous result to $V := \mathbb{A}^n$, using the fact that

$$h_{\infty}(\mathbb{A}^n) = \sum_{i=1}^{n} \sum_{j=1}^{i} \frac{1}{2j} \le n \log(n + 1), \qquad h_p(\mathbb{A}^n) = 0.$$
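The elementary inequality $\sum_{i=1}^{n} \sum_{j=1}^{i} 1/(2j) \le n \log(n + 1)$ used in this proof can be checked numerically for small $n$. The sketch below only evaluates both sides in floating point; the function name is ours.

```python
import math


def height_of_affine_space(n):
    """The double sum sum_{i=1}^{n} sum_{j=1}^{i} 1/(2j), which is the value
    of the Archimedean local height of affine n-space stated in the text."""
    return sum(1.0 / (2 * j) for i in range(1, n + 1) for j in range(1, i + 1))


for n in (1, 2, 5, 10):
    print(n, height_of_affine_space(n), n * math.log(n + 1))
```

The double sum is half the sum of the first $n$ harmonic numbers, so it grows like $(n \log n)/2$, comfortably below the stated bound.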

The following corollary is the global counterpart of the previous results. It can be seen as an arithmetic analogue of [24, Prop. 2.3]. We remark that in the global situation we do not need to assume Assumption 1.5 for the intermediate varieties. In particular, $f_1, \dots, f_s$ do not need to be a complete intersection in $V$.

COROLLARY 2.11
Let $V \subset \mathbb{A}^n$ be a variety of dimension $r$, and let $f_1, \dots, f_s \in \overline{\mathbb{Q}}[x_1, \dots, x_n]$. Set $d_i := \deg f_i$, $h := h(f_1, \dots, f_s)$, and $n_0 := \min\{r, s\}$. We assume that $d_1 \ge \cdots \ge d_s$ holds. Then

$$h\big(V \cap V(f_1, \dots, f_s)\big) \le \Big( h(V) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg V + n_0 \log(n + 1) \deg V \Big) \prod_{i=1}^{n_0} d_i.$$

Proof
We proceed by induction on $(r, s)$ with respect to the product order on $\mathbb{N} \times \mathbb{N}$ defined by $(r, s) \succeq (r', s') \Leftrightarrow r \ge r'$ and $s \ge s'$. The cases when $r = 0$ or $s = 0$ are both trivial. Now, let $r, s \ge 1$; we assume that the statement holds for all $(r', s') \preceq (r, s)$ such that $(r', s') \ne (r, s)$.

Let $V = \cup_C \, C$ be the decomposition of $V$ into irreducible components. In case $C \subset V(f_s)$ we have that $C \cap V(f_1, \dots, f_s) = C \cap V(f_1, \dots, f_{s-1})$ and by the inductive hypothesis,

$$h\big(C \cap V(f_1, \dots, f_s)\big) \le \Big( h(C) + \Big( \sum_{i=1}^{m_0} \frac{1}{d_i} \Big) h \deg C + m_0 \log(n + 1) \deg C \Big) \prod_{i=1}^{m_0} d_i$$

with $m_0 := \min\{r, s - 1\}$.

In case $C \not\subset V(f_s)$ we have either $C \cap V(f_s) = \emptyset$ or $\dim C \cap V(f_s) \le r - 1$. The first case is trivial. For the second case, we proceed as in the proof of Proposition 2.8, applying Lemma 2.4 instead of Lemma 2.2, and we obtain

$$h\big(C \cap V(f_s)\big) \le d_s \, h(C) + h \deg C + \log(n + 1) \, d_s \deg C.$$

Since $\dim(C \cap V(f_s)) = r - 1$, we can apply the inductive hypothesis to the variety $C \cap V(f_s)$, and we obtain

$$\begin{aligned}
h\big(C \cap V(f_1, \dots, f_s)\big) &\le \Big( h\big(C \cap V(f_s)\big) + \Big( \sum_{i=1}^{n_0 - 1} \frac{1}{d_i} \Big) h \deg\big(C \cap V(f_s)\big) + (n_0 - 1) \log(n + 1) \deg\big(C \cap V(f_s)\big) \Big) \prod_{i=1}^{n_0 - 1} d_i \\
&\le \Big( h(C) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg C + n_0 \log(n + 1) \deg C \Big) \prod_{i=1}^{n_0} d_i.
\end{aligned}$$

Finally,

$$\begin{aligned}
h\big(V \cap V(f_1, \dots, f_s)\big) &\le \sum_{C} h\big(C \cap V(f_1, \dots, f_s)\big) \\
&\le \sum_{C} \Big( h(C) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg C + n_0 \log(n + 1) \deg C \Big) \prod_{i=1}^{n_0} d_i \\
&= \Big( h(V) + \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h \deg V + n_0 \log(n + 1) \deg V \Big) \prod_{i=1}^{n_0} d_i.
\end{aligned}$$

With the same notation as in Corollary 2.11, for $V := \mathbb{A}^n$ we obtain

$$h\big(V(f_1, \dots, f_s)\big) \le \Big( \Big( \sum_{i=1}^{n_0} \frac{1}{d_i} \Big) h + (n + n_0) \log(n + 1) \Big) \prod_{i=1}^{n_0} d_i.$$
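To see how this last bound behaves on concrete data, the following sketch evaluates its right-hand side. It assumes, as in Corollary 2.11 applied to $V = \mathbb{A}^n$, that the degrees are sorted decreasingly and that $n_0 = \min\{n, s\}$; the function name and the sample values are ours, and `h` is simply a number supplied by the user, not a height computed from actual polynomials.

```python
import math


def height_bound_affine(n, degrees, h):
    """Right-hand side of the bound for h(V(f_1, ..., f_s)) in affine n-space:
    ((sum_{i<=n0} 1/d_i) * h + (n + n0) * log(n + 1)) * prod_{i<=n0} d_i,
    where d_1 >= ... >= d_s and n0 = min(n, s)."""
    d = sorted(degrees, reverse=True)
    n0 = min(n, len(d))
    top = d[:n0]  # only the n0 largest degrees enter the bound
    return (sum(1.0 / di for di in top) * h + (n + n0) * math.log(n + 1)) * math.prod(top)


# Two plane curves of degrees 2 and 3 with height parameter h = 6.
print(height_bound_affine(2, [2, 3], 6.0))
```

In the example the bound equals $30 + 24 \log 3$: the first summand is linear in $h$, while the second depends only on $n$ and the degrees.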

2.2.3. An arithmetic Bernstein-Kushnirenko theorem

From our estimate for the height of an affine toric variety (see Proposition 1.7) and the previous results of this section, we derive the following arithmetic version of the Bernstein-Kushnirenko theorem. We refer to Section 1.2.5 for the notation.

PROPOSITION 2.12
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$, and set

$$\mathcal{A} := \mathrm{Supp}(1, x_1, \dots, x_n, f_1, \dots, f_s) \subset (\mathbb{Z}_{\ge 0})^n.$$

Also, set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Then
• $\deg V(f_1, \dots, f_s) \le \mathrm{Vol}(\mathcal{A})$,
• $h\big(V(f_1, \dots, f_s)\big) \le \big( n \, h + 2^{2(n+1)} \log(n + 1) \, d \big) \mathrm{Vol}(\mathcal{A})$.

Proof
Set $\mathcal{A} := \{\alpha_1, \dots, \alpha_N\}$. The case $N = 1$ is trivial, and so we assume $N \ge 2$. We also assume that $\alpha_1, \dots, \alpha_n$ are the vectors of the canonical basis of $\mathbb{R}^n$. The map $\varphi_{\mathcal{A}} : \mathbb{A}^n \to \mathbb{A}^N$ induces an isomorphism between $\mathbb{A}^n$ and the affine toric variety $X_{\mathcal{A}} \subset \mathbb{A}^N$. The projection map $\pi_{\mathcal{A}} : \mathbb{A}^N \to \mathbb{A}^n$ defined by $y \mapsto (y_1, \dots, y_n)$ restricted to $X_{\mathcal{A}}$ is the inverse map of $\varphi_{\mathcal{A}}$.

For $i = 1, \dots, s$ we set $f_i = \sum_{j=1}^{N} a_{ij} \, x^{\alpha_j}$, and we let

$$\ell_i := \sum_{j=1}^{N} a_{ij} \, y_j \in K[y_1, \dots, y_N]$$

be the associated linear form. Set $V := V(f_1, \dots, f_s) \subset \mathbb{A}^n$ and $W := X_{\mathcal{A}} \cap V(\ell_1, \dots, \ell_s) \subset \mathbb{A}^N$. We have $\varphi_{\mathcal{A}}(V) = W$, and so $V = \pi_{\mathcal{A}}(W)$. Then

$$\deg V \le \deg W \le \deg X_{\mathcal{A}} = \mathrm{Vol}(\mathcal{A})$$

and

$$\begin{aligned}
h(V) &\le h(W) + 3 (n + 1) \log(N + 1) \deg W \\
&\le h(X_{\mathcal{A}}) + n \, h \deg(X_{\mathcal{A}}) + 4 (n + 1) \log(N + 1) \deg(X_{\mathcal{A}}) \\
&\le \big( n \, h + 2^{2n+1} \log N + 4 (n + 1) \log(N + 1) \big) \mathrm{Vol}(\mathcal{A})
\end{aligned}$$

by successive application of Lemma 2.6, Corollary 2.11, and Proposition 1.7. Finally, $N \le \binom{d+n}{n}$, and so $h(V) \le \big( n \, h + 2^{2(n+1)} \log(n + 1) \, d \big) \mathrm{Vol}(\mathcal{A})$.

It seems that the factor $2^{2n}$ in the estimate of $h(X_{\mathcal{A}})$ is superfluous. If this is the case, the above estimate can be considerably improved. V. Maillot has recently obtained another estimate for the height of the isolated points of $V(f_1, \dots, f_s)$, which is more precise in some particular cases (see [42, Cor. 8.2.3]).

2.3. Local height of norms and traces

Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$ defined over a field $k$ which satisfies Assumption 1.5. As we see below, this implies that the projection $\pi_V : V \to \mathbb{A}^r$ defined by $x \mapsto (x_1, \dots, x_r)$ is finite (see Lemma 2.14). Set $L := k(\mathbb{A}^r)$ and $M := L \otimes_{k[\mathbb{A}^r]} k[V]$, so that $M$ is a finite $L$-algebra of dimension $D$.

Let $f \in k[x_1, \dots, x_n]$. We identify $f \in k[V]$ with the multiplication map $M \to M$ defined by $q \mapsto f \, q$. From the Hamilton-Cayley theorem we derive that the characteristic polynomial $\mathcal{X}_f \in L[t]$ of this map verifies $\mathcal{X}_f(f) = 0$.



The fact that the inclusion $\pi_V^* : k[\mathbb{A}^r] \hookrightarrow k[V]$ is integral implies that the minimal polynomial $m_f$ of this map lies in $k[\mathbb{A}^r][t]$. We have that $\mathcal{X}_f \mid m_f^D$ in $L[t]$, and so Gauss lemma implies that $\mathcal{X}_f$ lies in fact in $k[\mathbb{A}^r][t]$. Moreover, the natural map $k[V] \to M$ is an inclusion, as $V$ is an equidimensional variety, and so $\mathcal{X}_f(f) = 0$ in $k[V]$.

Set $\mathcal{X}_f = t^D + b_{D-1} t^{D-1} + \cdots + b_0 \in k[\mathbb{A}^r][t]$. Then the norm $N_V(f)$ and the trace $\mathrm{Tr}_V(f)$ of $f$ are defined as

$$N_V(f) := (-1)^D \, b_0 \in k[\mathbb{A}^r], \qquad \mathrm{Tr}_V(f) := -b_{D-1} \in k[\mathbb{A}^r].$$

They equal, respectively, the determinant and the trace of the $L$-linear map $f : M \to M$. We also define the adjoint polynomial $f^*$ of $f$ as

$$f^* := (-1)^{D-1} \big( f^{D-1} + b_{D-1} f^{D-2} + \cdots + b_1 \big) \in k[x_1, \dots, x_n].$$

From the identity $\mathcal{X}_f(f) = 0$ we obtain that $f^* f = N_V(f)$ in $k[V]$. The key result of this section is a precise bound for the height of the norm and the trace of a polynomial in the case when $k$ is a number field.

2.3.1. Characteristic polynomials

Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$ defined over $k$. We keep notation as in Section 2.1.1: for $d \in \mathbb{N}$ we denote by $F := \sum_{|\alpha| \le d} U(d)_{0\alpha} \, x^{\alpha}$ and $L_i := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n$ the generic polynomial of degree $d$ and 1 associated to the group of variables $U(d)_0$ and $U_i$, respectively. As before, we set $U(d) := \{U(d)_0, U_1, \dots, U_r\}$ and $N := \binom{d+n}{n} + r (n + 1)$. Also, we introduce an additional group $T := \{T_0, \dots, T_r\}$ of $r + 1$ variables which correspond to the coordinate functions of $\mathbb{A}^{r+1}$.

We consider the map

$$\psi : \mathbb{A}^N \times \mathbb{A}^n \to \mathbb{A}^N \times \mathbb{A}^{r+1}, \qquad \big(\nu(d), \xi\big) \mapsto \big(\nu(d), F(\nu(d)_0)(\xi), L_1(\nu_1)(\xi), \dots, L_r(\nu_r)(\xi)\big),$$

where $\nu(d) := (\nu(d)_0, \nu_1, \dots, \nu_r) \in \mathbb{A}^N$ and $\xi \in \mathbb{A}^n$. Then the Zariski closure $\overline{\psi(\mathbb{A}^N \times V)} \subset \mathbb{A}^N \times \mathbb{A}^{r+1}$ is a hypersurface, and any of its defining equations $P_{d,V} \in k[U(d)][T]$ is called a $d$-characteristic polynomial of $V$. Also, we define the characteristic polynomial of $V$ by $P_V := P_{1,V}$.

A $d$-characteristic polynomial is uniquely defined up to a scalar factor. In the case when $V$ is irreducible, $\overline{\psi(\mathbb{A}^N \times V)}$ is an irreducible hypersurface and thus $P_{d,V}$ is an irreducible polynomial. When $V$ is equidimensional, it coincides with the product of $d$-characteristic polynomials of its irreducible components.
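Returning for a moment to the norm, trace, and adjoint polynomial defined at the beginning of Section 2.3, the identity $f^* f = N_V(f)$ can be illustrated on a toy example. The sketch below assumes the simplest situation, a zero-dimensional variety ($r = 0$), namely $V = V(x^2 - 2) \subset \mathbb{A}^1$ over $\mathbb{Q}$, so that $k[V]$ has basis $1, x$ and multiplication by $f = 1 + x$ is given by a $2 \times 2$ matrix. The characteristic polynomial is computed with the Faddeev-LeVerrier recursion, which is our choice of method and not taken from the text; all function names are ours.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def char_poly(m):
    """Coefficients [b_0, ..., b_{D-1}, 1] of det(t*I - m), Faddeev-LeVerrier."""
    n = len(m)
    m = [[Fraction(x) for x in row] for row in m]
    coeffs = [Fraction(1)]  # leading coefficient first, then b_{D-1}, b_{D-2}, ...
    aux = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        aux = mat_mul(m, aux)
        for i in range(n):
            aux[i][i] += coeffs[-1]
        prod = mat_mul(m, aux)
        coeffs.append(-sum(prod[i][i] for i in range(n)) / k)
    return coeffs[::-1]


def norm_trace_adjoint(m):
    """Norm (-1)^D b_0, trace -b_{D-1}, and the matrix of the adjoint
    (-1)^(D-1) (f^(D-1) + b_{D-1} f^(D-2) + ... + b_1), evaluated by Horner."""
    d = len(m)
    b = char_poly(m)
    fm = [[Fraction(x) for x in row] for row in m]
    acc = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
    for k in range(d - 1, 0, -1):
        acc = mat_mul(acc, fm)
        for i in range(d):
            acc[i][i] += b[k]
    sign = (-1) ** (d - 1)
    adjoint = [[sign * entry for entry in row] for row in acc]
    return (-1) ** d * b[0], -b[d - 1], adjoint


# Multiplication by f = 1 + x on Q[x]/(x^2 - 2) in the basis (1, x):
# f*1 = 1 + x and f*x = 2 + x give the columns (1, 1) and (2, 1).
mult_f = [[1, 2], [1, 1]]
norm, trace, adjoint = norm_trace_adjoint(mult_f)
print(norm, trace, adjoint, mat_mul(adjoint, mult_f))
```

Here the characteristic polynomial is $t^2 - 2t - 1$, the adjoint is multiplication by $1 - x$, and the product of the two matrices is the norm times the identity, as the identity $f^* f = N_V(f)$ predicts.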



The following construction links the characteristic polynomial of a variety with its generalized Chow form. Set

$$\zeta(d)_{0\alpha} := \begin{cases} U(d)_{00} - T_0 & \text{for } \alpha = 0, \\ U(d)_{0\alpha} & \text{for } \alpha \ne 0. \end{cases}$$

Analogously, for $i = 1, \dots, r$ we set $\zeta_{i0} := U_{i0} - T_i$ and $\zeta_{ij} := U_{ij}$ for $j \ne 0$. Finally, we set $\zeta(d) := \big(\zeta(d)_0, \zeta_1, \dots, \zeta_r\big)$.

LEMMA 2.13
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$. Let $F_{d,V}$ be a $d$-Chow form of $V$. Then $F_{d,V} \circ \zeta(d)$ is a $d$-characteristic polynomial of $V$.

Proof
It is enough to consider the case when $V$ is irreducible. Let $P_{d,V}$ be a $d$-characteristic polynomial of $V$. For $(\nu(d), \xi) \in \mathbb{A}^N \times V$ we set

$$\vartheta := \big( F(\nu(d)_0)(\xi), L_1(\nu_1)(\xi), \dots, L_r(\nu_r)(\xi) \big) \in \mathbb{A}^{r+1},$$

so that $P_{d,V}\big(\nu(d)\big)(\vartheta) = 0$. We observe that

$$\xi \in V \cap \{F(\nu(d)_0)(x) = \vartheta_0\} \cap \{L_1(\nu_1)(x) = \vartheta_1\} \cap \cdots \cap \{L_r(\nu_r)(x) = \vartheta_r\} \subset \mathbb{A}^n.$$

In particular, this variety is nonempty, and so we infer that $\big(F_{d,V} \circ \zeta(d)\big)\big(\nu(d), \vartheta\big) = 0$. This implies that $P_{d,V} \mid F_{d,V} \circ \zeta(d)$, as $P_{d,V}$ is an irreducible polynomial. On the other hand, $F_{d,V} \circ \zeta(d)$ is also irreducible, as it is multihomogeneous and $\big(F_{d,V} \circ \zeta(d)\big)\big(U(d), 0\big) = F_{d,V}\big(U(d)\big)$. We conclude that $P_{d,V}$ and $F_{d,V} \circ \zeta(d)$ coincide up to a factor in $k^*$.

The previous construction shows that a $d$-characteristic polynomial of $V$ is multihomogeneous of degree $D$ in the group of variables $U(d)_0 \cup \{T_0\}$ and of degree $d D$ in each group $U_i \cup \{T_i\}$.

Set $k_d := k\big(U(d)\big)$, and set

$$\phi : \mathbb{A}^n(k_d) \to \mathbb{A}^{r+1}(k_d), \qquad x \mapsto \big( F(x), L_1(x), \dots, L_r(x) \big).$$

Then $P_{d,V} \in k_d[T]$ is also a minimal equation for the hypersurface $\phi(V)$, and by Bézout inequality we have also $\deg_T P_{d,V} \le d D$ (see, e.g., [50, Prop. 1]).

We assume from now on that $V$ satisfies Assumption 1.5, that is, $\# \pi_V^{-1}(0) = \deg V$. In order to avoid the indeterminacy of the $d$-characteristic polynomial, we fix it as

$$P_{d,V} := (-1)^D \, \mathcal{C}h_{d,V} \circ \zeta(d).$$



In particular, we set PV := (−1) D ChV ◦ ζ (1) for the characteristic polynomial of V. Set PV := a D T0D + · · · + a0 for the expansion of PV with respect to T0 . We have that PV is multihomogeneous of degree D in each group Ui ∪ {Ti }. This implies that a D lies in fact in k[U1 , . . . , Ur ] and is multihomogeneous of degree D in each Ui for i = 1, . . . , r . D in Ch , and the imposed Moreover, a D coincides with the coefficient of U00 V normalization on ChV implies that a D (e1 , . . . , er ) = ChV (e0 , e1 , . . . , er ) = 1. We extend the morphism %d of Section 2.1.1 to a morphism k[U (d)][T ] → k[U0 , . . . , Ur ][T ] defining %d U (d)00 − T0 := (U00 − T0 )d and %d (Ti ) := Ti for 1 ≤ i ≤ r . In other terms, d X d d− j j %d (T0 ) = (−1) j−1 U00 T0 . j j=1

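We obtain
$$\varrho_d(P_{d,V}) = \varrho_d\big((-1)^D\, \mathrm{Ch}_{d,V} \circ \zeta(d)\big) = (-1)^D \big(\mathrm{Ch}_V \circ \zeta(1)\big)^d = (-1)^{(d+1)D}\, P_V^{\,d}.$$
Now, set
$$P_{d,V} = a_{d,D}\, T_0^D + \cdots + a_{d,0}$$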

for the expansion of $P_{d,V}$ with respect to $T_0$. The previous remark implies that $a_{d,D} = \varrho_d(a_{d,D}) = a_D^{\,d}$. In particular, $a_{d,D} \in k[U_1, \ldots, U_r]$ and $a_{d,D}(e_1, \ldots, e_r) = 1$. The following lemma allows us to obtain a characteristic polynomial of $f \in k[x_1, \ldots, x_n]$ from the $d$-characteristic polynomial of the variety $V$. We introduce the following convention: Given a polynomial $f \in k[x_1, \ldots, x_n]$ of degree $d$ and linear forms $\ell_1, \ldots, \ell_r \in k[x_1, \ldots, x_n]$, we denote by $P_{d,V}(f, \ell_1, \ldots, \ell_r)$ the specialization of the variables in $U(d)$ into the coefficients of $f, \ell_1, \ldots, \ell_r$.

LEMMA 2.14
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ and degree $D$ which satisfies Assumption 1.5. Then the projection $\pi_V : V \to \mathbb{A}^r$ is finite. Moreover, for a polynomial $f \in k[x_1, \ldots, x_n]$ of degree $d$, the characteristic polynomial of $f$ is given by
$$X_f = P_{d,V}(f, e_1, \ldots, e_r)(t, x_1, \ldots, x_r) \in k[\mathbb{A}^r][t].$$



Proof We have that PV (U0 , . . . , Ur )(L 0 , . . . , L r ) = 0 in k[U ] ⊗ k[V ], and so PV (e j , e1 , . . . , er )(t, x1 , . . . , xr ) ∈ k[Ar ][t]

is a monic equation for $x_j$ in $k[V]$, for $j = r + 1, \ldots, n$. Thus the projection $\pi_V$ is finite. For the second assertion, set $P_F(t) := P_{d,V}\big(U(d)_0, e_1, \ldots, e_r\big)(t, x_1, \ldots, x_r) \in k[U(d)_0][\mathbb{A}^r][t]$. This is a polynomial of degree $D$. It is monic with respect to $t$, as $a_{d,D} \in k[U_1, \ldots, U_r]$ and $a_{d,D}(e_1, \ldots, e_r) = 1$. We have $P_F(F) = 0$ in $k[U(d)_0] \otimes k[V]$. Now, let $m_F$ be the monic minimal polynomial of $F$. Let $U'(d)_0$ be a group of $\binom{d+n-r}{n-r}$ variables, and set $F_0$ for the generic polynomial of degree $d$ in the variables $x_{r+1}, \ldots, x_n$. Then $m_F\big(U'(d)_0, 0\big) \in k[U'(d)_0][t]$ is an equation for $F_0$ over $\pi_V^{-1}(0)$. Since $\pi_V^{-1}(0)$ is a zero-dimensional variety of degree $D$ and $F_0$ separates its points, we infer that $\deg_{T_0} m_F = D$, and so $P_F = m_F$. Finally, we obtain $X_f = X_F(f) = P_F(f) = P_{d,V}(f, e_1, \ldots, e_r)(t, x_1, \ldots, x_r)$.

2.3.2. Estimates for norms and traces
Now, we prove the announced estimates for the height of the norm and the trace of a polynomial.

LEMMA 2.15
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $K$ which satisfies Assumption 1.5. Let $f \in K[x_1, \ldots, x_n]$. Then
• $\deg N_V(f) \le \deg f \, \deg V$,
• $h_v(N_V(f)) \le \deg f \, h_v(V) + h_v(f) \deg V + (r + 1) \log(n + 1) \deg f \, \deg V$ for $v \in M_K^\infty$,
• $h_v(N_V(f)) \le \deg f \, h_v(V) + h_v(f) \deg V$ for $v \notin M_K^\infty$.

Proof We keep notation as in Section 2.3.1. Set d := deg f and D := deg V . We then have NV ( f ) = (−1) D Pd,V ( f, e1 , . . . , er )(0, x1 , . . . , xr ) = Chd,V ( f, e1 − e0 x1 , . . . , er − e0 xr )



by Lemmas 2.14 and 2.13. Then $\deg N_V(f) \le \deg_T P_{d,V} \le d D$. From the previous expression we also obtain that the coefficients of $N_V(f)$ are some of the coefficients of $\mathrm{Ch}_{d,V}(f)$, and so $|N_V(f)|_v \le |\mathrm{Ch}_{d,V}(f)|_v$ for every absolute value $v$ of $K$. Let $v \in M_K^\infty$. Then
$$\log |N_V(f)|_v \le \log |\mathrm{Ch}_{d,V}(f)|_v \le m\big(\sigma_v(\mathrm{Ch}_{d,V}(f)); S_{n+1}^{r}\big) + r \Big(\sum_{i=1}^{n} \frac{1}{2i}\Big) d D + r \log(n + 1)\, d D \le d\, h_v(V) + h_v(f)\, D + (r + 1) \log(n + 1)\, d D$$
by inequalities (1.1) and (1.2) and Lemma 2.2. In a similar way we obtain $h_v(N_V(f)) \le d\, h_v(V) + h_v(f)\, D$ for $v \notin M_K^\infty$.

The proof of the following lemma follows closely that of [50, Lem. 9]. We slightly improve the degree estimate obtained therein, and we get the corresponding height estimate.

LEMMA 2.16
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $K$ which satisfies Assumption 1.5. Let $f, g \in K[x_1, \ldots, x_n]$ such that $f$ is not a zero divisor in $K[V]$. Set $d := \max\{\deg f, \deg g\}$ and $h_v := \max\{h_v(f), h_v(g)\}$ for $v \in M_K$. Then
• $\deg \mathrm{Tr}_V(f^* g) \le d \deg V$,
• $h_v(\mathrm{Tr}_V(f^* g)) \le d\, h_v(V) + (h_v + \log 2) \deg V + (r + 1) \log(n + 1)\, d \deg V$ for $v \in M_K^\infty$,
• $h_v(\mathrm{Tr}_V(f^* g)) \le d\, h_v(V) + h_v \deg V$ for $v \notin M_K^\infty$.

Proof Let D := deg V , and let t be a new variable. Then K [x1 , . . . , xr , t] ,→ K [V × A1 ] is again an integral inclusion and NV ×A1 (t − f ∗ g) = X f ∗ g (t). Set Q(t) := NV ×A1 (t f − g) ∈ K [x1 , . . . , xr , t]. Since f ∗ f = NV ( f ), we have that NV ( f ∗ ) = NV ( f ) D−1 , and so NV ( f ) D−1 Q = NV ( f ∗ ) Q = X f ∗ g NV ( f )t . Set Q = c D t D +· · ·+c0 with ci ∈ K [Ar ]. The last identity then implies TrV ( f ∗ g) = −c D−1 .



Set $q > D$, and let $G_q$ denote the group of $q$-roots of 1. Then $Q(\omega) = N_V(\omega f - g)$ for $\omega \in G_q$, and so
$$\mathrm{Tr}_V(f^* g) = -\frac{1}{q} \sum_{\omega \in G_q} N_V(\omega f - g)\, \omega^{1-D}.$$
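The formula above recovers the coefficient $c_{D-1}$ of $Q$ by averaging the values $Q(\omega)$ over the $q$-th roots of 1. The mechanism can be sketched numerically as follows. This is an illustration only: `Q` is an arbitrary integer polynomial standing in for $N_{V \times \mathbb{A}^1}(t f - g)$, and the function name is hypothetical.

```python
import cmath

def coeff_via_roots_of_unity(Q, k, q):
    """Coefficient of t^k in Q (coefficients listed lowest degree first),
    computed as (1/q) * sum over q-th roots of unity w of Q(w) * w^(-k).
    This is valid whenever q > deg Q and 0 <= k <= deg Q."""
    total = 0
    for m in range(q):
        w = cmath.exp(complex(0, 2 * cmath.pi * m / q))
        value = sum(c * w ** i for i, c in enumerate(Q))
        total += value * w ** (-k)
    return total / q

Q = [5, -3, 0, 7, 2]   # Q(t) = 5 - 3t + 7t^3 + 2t^4, so D = 4
D = len(Q) - 1
q = D + 1              # any q > D works
trace_like = -coeff_via_roots_of_unity(Q, D - 1, q)   # equals -c_{D-1}
```

The choice $q > D$ matters: the average picks out every coefficient $c_i$ with $i \equiv D - 1 \pmod q$, and only $i = D - 1$ occurs when $q > D$.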

From Lemma 2.15 we get $\deg \mathrm{Tr}_V(f^* g) \le d D$. For $v \in M_K^\infty$, we then obtain
$$h_v(\mathrm{Tr}_V(f^* g)) \le \max_{\omega \in G_q} h_v(N_V(\omega f - g)) \le d\, h_v(V) + (h_v + \log 2)\, D + (r + 1) \log(n + 1)\, d D.$$
Analogously, for $v \notin M_K^\infty$ we take $q > D$ such that $|q|_v = 1$, and we obtain $h_v(\mathrm{Tr}_V(f^* g)) \le d\, h_v(V) + h_v D$.

3. An effective arithmetic Nullstellensatz
In this section we obtain the announced estimates for the arithmetic Nullstellensatz over the ring of integers of a number field $K$. Theorem 1 in the introduction corresponds to the case $K := \mathbb{Q}$. These estimates depend on the number of variables and on the degree and height of the input polynomials.

3.1. Division modulo complete intersection ideals
A crucial tool in our treatment of the arithmetic Nullstellensatz is the trace formula. One of its outstanding features is that it performs effective division modulo complete intersection ideals (see [19], [13], [32], [50], [17], [22]). In this section we apply the trace formula to obtain sharp height estimates in the division procedure.

3.1.1. Trace formula
We describe in what follows the basic aspects of duality theory for complete intersection algebras that we need in the sequel. We refer to E. Kunz [34, Appendix F] for a more complete presentation of this theory. Let $k$ be a perfect field, and set $A := k[t_1, \ldots, t_r]$ and $A[x] := A[x_1, \ldots, x_n]$. Let $F := \{F_1, \ldots, F_n\} \subset A[x]$ be a reduced complete intersection that defines a radical ideal $(F)$ of dimension $r$. We consider the $A$-algebra $B := A[x]/(F) = A[x_1, \ldots, x_n]/(F_1, \ldots, F_n)$. We assume that the inclusion $A \hookrightarrow B$ is finite; that is, the variables $t_1, \ldots, t_r$ are in Noether normal position with respect to the variety $V := V(F) \subset \mathbb{A}^{r+n}$. This is



the case, for instance, if $V$ satisfies Assumption 1.5. Thus $B$ is a projective $A$-module that turns out to be free of rank bounded by $\deg V$ by the Quillen-Suslin theorem. The dual $A$-module $B^* := \mathrm{Hom}_A(B, A)$ can be seen as a $B$-module with scalar multiplication defined by $(f \cdot \tau)(g) := \tau(f g)$ for $f, g \in B$ and $\tau \in B^*$. It is a free $B$-module of rank 1, and any of its generators is called a trace of $B$. The following construction yields a trace $\sigma$ canonically associated to the complete intersection $F$. We take new variables $y := \{y_1, \ldots, y_n\}$, and we set $F_i^{(x)} := F_i(x) \in A[x]$ and $F_i^{(y)} := F_i(y) \in A[y]$. Then $F_i^{(y)} - F_i^{(x)}$ belongs to the ideal $(y_1 - x_1, \ldots, y_n - x_n)$, and so there exist (nonunique) $P_{ij} \in A[x, y]$ such that
$$F_i^{(y)} - F_i^{(x)} = \sum_{j=1}^{n} P_{ij}(x, y)\, (y_j - x_j)$$

for $i = 1, \ldots, n$. We consider the determinant $\Delta \in A[x, y]$ of the square matrix $(P_{ij})_{ij}$, and we write it as
$$\Delta = \sum_m a_m b_m$$
with $a_m \in A[x]$ and $b_m \in A[y]$. Again, the polynomials $a_m, b_m$ are not uniquely defined. The polynomial $\Delta \in A[x, y]$ is called a pseudo-Jacobian determinant of the complete intersection $F$. Set $c_m := b_m(x) \in A[x]$. Then there exists a unique trace $\sigma \in B^*$ such that for $g \in A[x]$,
$$\bar{g} = \sum_m \sigma(\bar{g}\, \bar{a}_m)\, \bar{c}_m$$
where the bar denotes class modulo $(F)$. This identity is known as the trace formula. Let $J := \det(\partial F_i / \partial x_j)_{ij}$ be the Jacobian determinant of the complete intersection $F$ with respect to the variables $x_1, \ldots, x_n$. Then the following identity, which justifies the name of pseudo-Jacobian for $\Delta$, holds:
$$J = \sum_m a_m c_m.$$
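For intuition, the construction can be carried out by hand in the simplest setting. The sketch below assumes $n = 1$, $r = 0$, $A = \mathbb{Q}$, and a monic $F = \sum_k A_k x^k$ of degree $D$. Then $\Delta = (F(y) - F(x))/(y - x) = \sum_{m=0}^{D-1} a_m(x)\, y^m$ with $a_m = \sum_{k > m} A_k x^{k-1-m}$ and $c_m = x^m$. It also relies on a classical fact that is not stated in the text and is used here as an assumption: for monic univariate $F$, the trace $\sigma$ is the linear form returning the coefficient of $x^{D-1}$ of the reduced representative. All function names are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_mod(p, F):
    """Remainder of p modulo the monic polynomial F, padded to length deg F."""
    D = len(F) - 1
    r = [Fraction(c) for c in p]
    while len(r) > D:
        lead = r.pop()                 # leading coefficient being eliminated
        shift = len(r) - D
        for i in range(D):
            r[shift + i] -= lead * F[i]
    return r + [Fraction(0)] * (D - len(r))

def sigma(h, F):
    """Assumed form of the trace sigma: coefficient of x^(D-1) of h mod F."""
    return poly_mod(h, F)[len(F) - 2]

def a_polys(F):
    """a_m = sum_{k > m} A_k x^(k-1-m), the coefficients of y^m in Delta."""
    D = len(F) - 1
    return [[Fraction(F[k]) for k in range(m + 1, D + 1)] for m in range(D)]

def trace_formula(g, F):
    """Coefficients of sum_m sigma(g * a_m) * x^m, listed lowest degree first."""
    return [sigma(poly_mul(g, a_m), F) for a_m in a_polys(F)]

def delta_on_diagonal(F):
    """sum_m a_m(x) * c_m(x) with c_m = x^m; expected to equal F'(x)."""
    D = len(F) - 1
    out = [Fraction(0)] * D
    for m, a_m in enumerate(a_polys(F)):
        for e, c in enumerate(a_m):
            out[m + e] += c
    return out

F = [5, -2, 0, 1]        # F = x^3 - 2x + 5
g = [1, 4, 0, 0, 3]      # g = 3x^4 + 4x + 1
assert trace_formula(g, F) == poly_mod(g, F)   # trace formula reproduces g mod F
assert delta_on_diagonal(F) == [-2, 0, 3]      # J = F'(x) = 3x^2 - 2
```

In this univariate setting the $a_m$ form the dual basis of $1, x, \ldots, x^{D-1}$ with respect to $\sigma$, which is exactly what the trace formula expresses.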

The standard trace $\mathrm{Tr}_V$ is related to $\sigma$ by the equality $\mathrm{Tr}_V(g) = \sigma(J g)$ for all $g \in A[x]$.

3.1.2. A division lemma
Throughout this section we keep notation and assumptions as in the previous one, but we replace $k$ by a number field $K$. Set $d := \max_i \deg F_i$ and $h_v := \max_i h_v(F_i)$


for $v \in M_K$. Here $\deg F_i$ denotes the total degree of $F_i$ as an element of $K[t_1, \ldots, t_r][x_1, \ldots, x_n]$. We choose concrete polynomials $a_m, c_m$ which satisfy the trace formula, and we estimate their degree and local height. First, we choose the polynomials $P_{ij}$. Remarking that
$$F_i^{(y)} - F_i^{(x)} = \sum_{j=1}^{n} \big( F_i(x_1, \ldots, x_{j-1}, y_j, \ldots, y_n) - F_i(x_1, \ldots, x_j, y_{j+1}, \ldots, y_n) \big),$$
we set
$$P_{ij} := \frac{F_i(x_1, \ldots, x_{j-1}, y_j, \ldots, y_n) - F_i(x_1, \ldots, x_j, y_{j+1}, \ldots, y_n)}{y_j - x_j}.$$
Here we perform the division through the formula
$$(y_j^k - x_j^k)/(y_j - x_j) = y_j^{k-1} + y_j^{k-2} x_j + \cdots + y_j x_j^{k-2} + x_j^{k-1}.$$
We set $\Delta := \det(P_{ij})_{ij}$. Finally, we choose $b_m \in A[y]$ as the monomials in the expansion of $\Delta$ with respect to $y$, $a_m \in A[x]$ as the corresponding coefficient, and we set $c_m := b_m(x)$. Set $F_i = \sum_\alpha A_{i\alpha} x^\alpha$ with $A_{i\alpha} \in A$. Then
$$P_{ij} = \sum_\alpha A_{i\alpha}\, x_1^{\alpha_1} \cdots x_{j-1}^{\alpha_{j-1}}\, y_{j+1}^{\alpha_{j+1}} \cdots y_n^{\alpha_n}\, \big(y_j^{\alpha_j - 1} + \cdots + x_j^{\alpha_j - 1}\big) \in A[x, y].$$
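The telescoping choice of the $P_{ij}$ can be checked on a small example. The following sketch evaluates the explicit monomial formula for $P_{ij}$ at integer points and verifies $F(y) - F(x) = \sum_j P_j(x, y)\,(y_j - x_j)$. The dictionary representation and the function names are illustrative choices, and indices are 0-based in the code.

```python
def evaluate(F, point):
    """Evaluate a polynomial given as {exponent tuple: coefficient} at a point."""
    total = 0
    for alpha, coeff in F.items():
        term = coeff
        for a, v in zip(alpha, point):
            term *= v ** a
        total += term
    return total

def P(F, j, x, y):
    """Value of P_j(x, y): variables before position j come from x, variables
    after position j come from y, and the j-th variable contributes
    y_j^(k-1) + y_j^(k-2) x_j + ... + x_j^(k-1), where k is its exponent."""
    total = 0
    for alpha, coeff in F.items():
        term = coeff
        for i, a in enumerate(alpha):
            if i < j:
                term *= x[i] ** a
            elif i > j:
                term *= y[i] ** a
        k = alpha[j]
        term *= sum(y[j] ** (k - 1 - e) * x[j] ** e for e in range(k))
        total += term
    return total

# F = 3 x1^2 x2 - x2^3 + 4 x1 + 7
F = {(2, 1): 3, (0, 3): -1, (1, 0): 4, (0, 0): 7}
x, y = (2, -1), (5, 3)
lhs = evaluate(F, y) - evaluate(F, x)
rhs = sum(P(F, j, x, y) * (y[j] - x[j]) for j in range(2))
assert lhs == rhs
```

A monomial in which the $j$-th variable does not occur contributes nothing to $P_j$, since the inner sum is then empty.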

We deduce that $\deg P_{ij} \le d - 1$ and $h_v(P_{ij}) \le h_v$ for every $v \in M_K$. Then $\deg \Delta \le n (d - 1)$, and so $\deg a_m + \deg c_m \le n (d - 1)$. We have also $h_v(c_m) = 0$ and $h_v(a_m) \le h_v(\Delta)$. Finally, we can write $P_{ij} = C_0 + \cdots + C_{d-1}\, y_j^{d-1}$, where each $C_k \in A[x_1, \ldots, x_j, y_{j+1}, \ldots, y_n]$ is a polynomial in $n + r$ variables of total degree bounded by $\deg P_{ij} \le d - 1$. This implies that the number of monomials of $P_{ij}$ is bounded by $d \binom{n+r+d-1}{n+r} \le d\, (n + r + 1)^{d-1}$. Therefore, for $v \in M_K^\infty$ we have
$$h_v(a_m) \le h_v(\Delta) \le n\, h_v + (n - 1) \big( \log d + (d - 1) \log(n + r + 1) \big) + n \log n \le n \big( h_v + d \log(n + r + 1) + \log d \big). \qquad (3.1)$$
Analogously, we have $h_v(a_m) \le n\, h_v$ for $v \notin M_K^\infty$.



The following lemma is a sharp estimate for the degree and local height of the polynomials in the division procedure. It is a substantial improvement over [32, Th. 29]. We introduce the notation $\deg_t f$ and $\deg_x f$ for the degree of a polynomial $f \in A[x]$ with respect to the groups of variables $t$ and $x$, respectively.

MAIN LEMMA 3.1 (Division lemma)
Set $A := K[t_1, \ldots, t_r]$ and $A[x] := A[x_1, \ldots, x_n]$. Let $F := \{F_1, \ldots, F_n\} \subset A[x]$ be a reduced complete intersection defining a variety $V := V(F) \subset \mathbb{A}^{r+n}$ which satisfies Assumption 1.5. Set $B := K[V] = A[x]/(F)$. Let $f, g \in A[x]$ be polynomials such that $\bar{f} \in B$ is a nonzero divisor and $\bar{f} \mid \bar{g}$ in $B$. Set $d := \max\{\deg f, \deg F_1, \ldots, \deg F_n\}$ and $h_v := \max\{h_v(f), h_v(F_1), \ldots, h_v(F_n)\}$ for $v \in M_K$. Then there exist $q \in A[x]$ and $\xi \in K^*$ such that
• $\bar{q}\, \bar{f} = \bar{g}$,
• $\deg_x q \le n d$,
• $\deg q \le \deg_t g + \big(n d + \max\{(n + 1) d, \deg_x g\}\big) \deg V$,
• $h_v(q) \le h_v(g) + (n d + \max\{d, \deg_x g\})\, h_v(V) + \big( (n + 1) h_v + (r + 6) \log(n + r + 1)\, (n d + \max\{(n + 1) d, \deg_x g\}) \big) \deg V + 2 \log(r + 1) \deg_t g - \log |\xi|_v$ for $v \in M_K^\infty$,
• $h_v(q) \le h_v(g) + (n d + \max\{d, \deg_x g\})\, h_v(V) + (n + 1) h_v \deg V - \log |\xi|_v$ for $v \notin M_K^\infty$.

Proof
Set $L := K(t_1, \ldots, t_r)$ for the quotient field of $A$ and $M := L \otimes_A B$. Then $M$ is a finite $L$-algebra of dimension $\deg V$, and $\sigma$ can be uniquely extended to an $L$-linear map $\sigma : M \to L$. The fact that $B$ is a torsion-free $A$-algebra implies that the canonical map $B \to M$ is an inclusion. We only consider the case $n \ge 1$. For the case $n = 0$ we refer to Remark 3.2. Whenever it is clear from the context, we avoid explicit reference to the ring in which we are considering a given element of $A[x]$. Let $q_0 \in A[x]$ be any polynomial such that $q_0 f = g$ in $B$. We have that $f$ is a nonzero divisor in $B$, and so it is invertible in $M$. Then $q_0 = f^{-1} g$ in $M$, and therefore $\sigma(f^{-1} g\, p) = \sigma(q_0\, p) \in A$ for all $p \in A[x]$. Then we set
$$q := \sum_m \sigma(f^{-1} g\, a_m)\, c_m \in A[x].$$
The trace formula implies that $q \equiv q_0 \pmod{(F)}$, and so $q f = g$ in $B$.
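The definition of $q$ can be traced through a toy case. The sketch below assumes $n = 1$, $r = 0$, $K = \mathbb{Q}$, and $F = x^2 - 2$, so that $B = \mathbb{Q}[x]/(x^2 - 2)$, $J = 2x$, $\Delta = x + y$, $a_0 = x$, $c_0 = 1$, $a_1 = 1$, $c_1 = x$. The trace $\sigma$ is computed from the relation $\mathrm{Tr}_V(g) = \sigma(J g)$ as $\sigma(u) = \mathrm{Tr}(J^{-1} u)$. Elements $u_0 + u_1 x$ are stored as pairs, and all helper names are hypothetical.

```python
from fractions import Fraction

def mul(u, v):
    """Product in Q[x]/(x^2 - 2); (u0, u1) stands for u0 + u1*x."""
    return (u[0] * v[0] + 2 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def inv(u):
    """Inverse of a nonzero element, via the conjugate u0 - u1*x."""
    norm = u[0] * u[0] - 2 * u[1] * u[1]
    return (Fraction(u[0]) / norm, Fraction(-u[1]) / norm)

def trace(u):
    """Trace of multiplication by u on the basis 1, x."""
    return 2 * u[0]

J = (0, 2)   # Jacobian dF/dx = 2x

def sigma(u):
    """sigma(u) = Tr(J^{-1} u), from the relation Tr(g) = sigma(J g)."""
    return trace(mul(inv(J), u))

def divide(f, g):
    """q = sigma(f^{-1} g a_0) c_0 + sigma(f^{-1} g a_1) c_1, with a_0 = x, a_1 = 1."""
    h = mul(inv(f), g)
    return (sigma(mul(h, (0, 1))), sigma(mul(h, (1, 0))))

f = (1, 1)            # f = 1 + x
g = (1, 2)            # g = 1 + 2x = (1 + x)(3 - x) in B
quotient = divide(f, g)
assert mul(quotient, f) == g
```

Here the quotient comes out as $3 - x$; the point of the lemma is that the same expression also yields control on degree and height in the general case.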



Let $J \in A[x]$ denote the Jacobian determinant of the complete intersection $F$ with respect to the group of variables $x$. This is a nonzero divisor because of the Jacobian criterion (see, for instance, [10, Th. 18.15]), and so it is also invertible in $M$. Let $(J f)^*$ be the adjoint polynomial of $J f$, and set $\Lambda_m := \mathrm{Tr}_V\big((J f)^* g\, a_m\big) \in A$. We have $J f\, (J f)^* = N(J f) \in A \setminus \{0\}$, and so $\Lambda_m / N(J f) = \mathrm{Tr}\big((J f)^{-1} g\, a_m\big) = \sigma(f^{-1} g\, a_m) \in A$. In particular, $N(J f) \mid \Lambda_m$ in $A$, and we have the expression
$$q = \frac{1}{N(J f)} \sum_m \Lambda_m\, c_m.$$
In the sequel, $\xi \in K^*$ is any nonzero coefficient of $N(J f)$. Let us consider degrees. Clearly $\deg_x q \le \max_m \deg c_m \le n (d - 1) \le n d$. Next we analyze the total degree of $q$. Let $g := \sum_\alpha p_\alpha x^\alpha$ be the monomial expansion of $g$ with respect to $x$. Then
$$\Lambda_m = \sum_\alpha p_\alpha\, \mathrm{Tr}\big((J f)^* x^\alpha a_m\big), \qquad (3.2)$$

as $\mathrm{Tr}$ is an $A$-linear map. We have the estimates $\deg(J f) \le n (d - 1) + d \le (n + 1) d$ and $\deg(x^\alpha a_m) \le \deg_x g + \deg a_m$, from which we get $\deg \mathrm{Tr}\big((J f)^* x^\alpha a_m\big) \le \max\{(n + 1) d, \deg_x g + \deg a_m\} \deg V$ by Lemma 2.16. Thus
$$\deg q \le \deg_t g + \max_m \big\{ \max\{(n + 1) d, \deg_x g + \deg a_m\} \deg V + \deg c_m \big\}$$
$$\le \deg_t g + \max_m \big\{ \max\{(n + 1) d + \deg c_m, \deg_x g + \deg a_m + \deg c_m\} \big\} \deg V$$
$$\le \deg_t g + \max\{(n + 1) d + n d, \deg_x g + n d\} \deg V$$
$$\le \deg_t g + \big(n d + \max\{(n + 1) d, \deg_x g\}\big) \deg V.$$
For the rest of the proof, we use the following basic estimates several times:
$$\max\{\deg(J f), \deg(x^\alpha a_m)\} \le n d + \max\{d, \deg_x g\}, \qquad \deg \mathrm{Tr}\big((J f)^* x^\alpha a_m\big) \le (n d + \max\{d, \deg_x g\}) \deg V.$$
Finally, we estimate the local height of $q$. Let $v \in M_K^\infty$. We have $h_v(\partial F_i / \partial x_j) \le h_v + \log d$, and so
$$h_v(J) \le n\, (h_v + \log d) + (n - 1) \log(n + r + 1)\, (d - 1) + n \log n$$



≤ n h v + log(n + r + 1) d + log d . Therefore h v (J f ) ≤ n h v + log(n + r + 1) d + log d + h v + log(n + r + 1) d ≤ (n + 1) h v + (n + 1) log(n + r + 1) d + n log d

(3.3) by Lemma 1.2(b). We recall that h v (x α am ) ≤ n h v + log(n + r + 1) d + log d by inequality (3.1), and so max{h v (J f ), h v (x α am )} ≤ (n + 1) h v + (n + 1) log(n + r + 1) d + n log d. Then h v Tr (J f )∗ x α am ≤ (n d + max{d, degx g}) h v (V ) + (n + 1)h v + (n + 1) log(n + r + 1) d + n log d + log 2 deg V + (r + 1) log(n + r + 1) (n d + max{d, degx g}) deg V ≤ (n d + max{d, degx g}) h v (V ) + (n + 1) h v + (2 n + 1) log(n + r + 1) d deg V + (r + 1) log(n + r + 1) (n d + max{d, degx g}) deg V by Lemma 2.16. By considering separately the cases degx g ≤ (n + 1) d and degx g > (n + 1) d, we obtain (2 n + 1) d + (r + 1) (n d + degx g) ≤ degx g + n d + (r + 1) (n d + degx g) ≤ (r + 2) (n d + degx g). We conclude that h v Tr (J f )∗ x α am ≤ (n d + max{d, degx g}) h v (V ) + (n + 1) h v + (r + 2) log(n + r + 1) × n d + max{(n + 1) d, degx g} deg V. Hence h v (3m ) ≤ max{h v pα Tr (J f )∗ x α am } + log(n + 1) degx g α ≤ h v (g) + max{h v Tr (J f )∗ x α am } α



+ log(r + 1) (n d + max{d, degx g}) deg V + log(n + 1) degx g ≤ h v (g) + (n d + max{d, degx g})h v (V ) + (n + 1) h v + (r + 2) log(n + r + 1) × n d + max{(n + 1) d, degx g} deg V + log(r + 1) (n d + max{d, degx g}) deg V + log(n + 1) degx g ≤ h v (g) + (n d + max{d, degx g}) h v (V ) + (n + 1) h v + (r + 4) log(n + r + 1) × n d + max{(n + 1)d, degx g} deg V by application of identity (3.2) and Lemma 1.2(a,b). We have h v (q) ≤ max {h v 3m / N(J f ) } m

as each cm is a different monomial in x. Thus it remains only to estimate the local height of each 3m /N (J f ). Recall that ξ ∈ K ∗ is any nonzero coefficient of N(J f ). Then log |3m / N(J f )|v ≤ h v (3m ) + 2 log(r + 1) × degt g + (n d + max{d, degx g}) deg V

− log | N(J f )|v ≤ h v (g) + (n d + max{d, degx g}) h v (V ) + (n + 1) h v + (r + 6) log(n + r + 1) × n d + max{(n + 1)d, degx g} deg V + 2 log(r + 1) degt g − log |ξ |v

(3.4)

by Lemma 1.2(d) and the fact that log |ξ |v ≤ log | N(J f )|v . From Lemma 2.15 and inequality (3.3) we obtain log |ξ |v ≤ h v N(J f ) ≤ (n + 1) d h v (V ) + (n + 1) h v + (n + 1) × log(n + r + 1) d + n log d deg V + (r + 1) (n + 1) log(n + r + 1) d deg V ≤ (n + 1) d h v (V ) + (n + 1) h v + (r + 3) (n + 1) × log(n + r + 1) d deg V.

(3.5)



This implies that the right-hand side of inequality (3.4) is nonnegative. So the inequality also holds for $h_v\big(\Lambda_m / N(J f)\big)$ and thus for $h_v(q)$. The case $v \notin M_K^\infty$ is treated in exactly the same way. The obtained estimates do not involve any constant terms with respect to $h_v$, $h_v(g)$, and $h_v(V)$; in particular, $\deg_t g$ does not appear in the estimate. This follows simply from Lemma 1.2. In this case, inequality (3.5) reads as follows:
$$\log |\xi|_v \le (n + 1)\, d\, h_v(V) + (n + 1)\, h_v \deg V. \qquad (3.6)$$

We remark that the choice of $\xi$ is independent of $v$, and so it can be done uniformly.

Remark 3.2
Let notation be as in the previous lemma. In case $n = 0$ we have the sharper estimates
• $\deg q \le \deg g$,
• $h_v(q) \le h_v(g) + h_v + 2 \log(r + 1) \deg g - \log |\xi|_v$ for $v \in M_K^\infty$,
• $h_v(q) \le h_v(g) + h_v - \log |\xi|_v$ for $v \notin M_K^\infty$.
Here $\xi \in K^*$ denotes any nonzero coefficient of $f$. The local height estimates follow from Lemma 1.2(d) and the fact that $h_v - \log |\xi|_v \ge 0$.

3.2. An effective arithmetic Nullstellensatz

3.2.1. Estimates for the complete intersection case
The following result gives estimates for the degree and local height of the polynomials arising in the Nullstellensatz over a number field $K$ in the case when the input is a reduced weak regular sequence. It is a direct consequence of the division lemma above. These estimates depend mainly on the degree and height of the varieties successively cut out by the input polynomials. They are quite flexible, and they apply to other situations as well, as we see in Section 4. We recall that $f_1, \ldots, f_s \in K[x_1, \ldots, x_n]$ is a weak regular sequence if $f_{i+1}$ is not a zero divisor modulo the ideal $(f_1, \ldots, f_i)$ for $i = 0, \ldots, s - 1$. Furthermore, it is called reduced when all these ideals are radical.

LEMMA 3.3
Let $f_1, \ldots, f_s \in K[x_1, \ldots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$ which form a reduced weak regular sequence. Assume that $V_j := V(f_1, \ldots, f_j)$ satisfies Assumption 1.5 for $j = 1, \ldots, s - 1$. Set $d := \max_i \deg f_i$ and $h_v := \max_i h_v(f_i)$ for $v \in M_K$. Assume also $n, d \ge 2$. Then there exist $p_1, \ldots, p_s \in K[x_1, \ldots, x_n]$ and $\xi \in K^*$ such that
• $1 = p_1 f_1 + \cdots + p_s f_s$,
• $\deg p_i \le 2 n d \big(1 + \sum_{j=1}^{\min\{n,s\}-1} \deg V_j\big)$,
• $h_v(p_i) \le 2 n d \sum_{j=1}^{s-1} h_v(V_j) + \big((n + 1) h_v + 2 n (2 n + 5) \log(n + 1)\, d\big) \big(1 + \sum_{j=1}^{s-1} \deg V_j\big) - \log |\xi|_v$ for $v \in M_K^\infty$,
• $h_v(p_i) \le 2 n d \sum_{j=1}^{s-1} h_v(V_j) + (n + 1) h_v \big(1 + \sum_{j=1}^{s-1} \deg V_j\big) - \log |\xi|_v$ for $v \notin M_K^\infty$.

Proof Set Ii := I (Vi ) = ( f 1 , . . . , f i ) for i = 1, . . . , s − 1. Also, set f 0 := 0, V0 := V ( f 0 ) = An , and I0 := I (V0 ) = (0). Finally, set Ai := K [x1 , . . . , xn−i ] and Bi := K [Vi ] = K [x1 , . . . , xn ]/Ii for 0 ≤ i ≤ s − 1. The fact that Vi satisfies Assumption 1.5 implies that the inclusion Ai ,→ Bi is integral. We note that the sets of free and dependent variables of Bi have cardinality n − i and i, respectively. Also, the set of dependent variables of B j is contained in that of Bi for i ≤ j. For f ∈ K [x1 , . . . , xn ] we denote by degx(i) f the degree of f in the dependent variables xn−i+1 , . . . , xn of Bi with respect to the integral inclusion Ai ,→ Bi . For i ≤ j, the previous observation implies that degx( j) f ≤ degx(i) f . Applying the Division Lemma 3.1, we construct inductively polynomials p1 , . . . , ps . First, we take ps such that ps f s ≡ 1

(mod Is−1 ).

For 0 ≤ i ≤ s − 2 we assume that pi+2 , . . . , ps are already constructed and we set bi+1 := 1 − ( pi+2 f i+2 + · · · + ps f s ). Then f i+1 is a nonzero divisor and f i+1 | bi+1 in Bi . We again apply the division lemma to obtain pi+1 such that pi+1 f i+1 ≡ bi+1

(mod Ii ).

Continuing this procedure until i = 0, we get 1 = p1 f 1 +· · ·+ ps f s in K [x1 , . . . , xn ]. Let us analyze degrees. First, we consider the case s ≤ n. Again we proceed by induction. The estimates from the division lemma for As−1 := K [x1 , . . . , xn−(s−1) ], g := 1, and f := f s give degx(s−1) ps ≤ (s − 1) d ≤ (n − 1) d and deg ps ≤ (2 s − 1) d deg Vs−1 . Now, let 1 ≤ i ≤ s − 2. Then degx(i) pi+1 ≤ i d and deg pi+1 ≤ deg bi+1 + i d + max{(i + 1) d, degx(i) bi+1 } deg Vi ,



where
$$\deg_{x(i)} b_{i+1} \le \max_{j \ge i+2} \{\deg_{x(i)} p_j + \deg f_j\} \le \max_{j \ge i+2} \deg_{x(j-1)} p_j + d \le s\, d.$$
Applying recursively the previous inequality, we obtain
$$\deg p_{i+1} \le \max_{j \ge i+2} \deg p_j + d + (s + i)\, d \deg V_i \le (2 s - 1)\, d \deg V_{s-1} + \sum_{j=i}^{s-2} \big( d + (s + j)\, d \deg V_j \big) = (s - i - 1)\, d + \sum_{j=i}^{s-1} (s + j)\, d \deg V_j.$$
For $i = 0$ we have $p_1 \mid b_1$ and therefore $\deg p_1 \le \deg b_1 \le \max_{j \ge 2} \deg p_j + d$. Then for all $i$,
$$\deg p_i \le (s - 1)\, d + \sum_{j=1}^{s-1} (s + j)\, d \deg V_j \le 2 n d \Big(1 + \sum_{j=1}^{s-1} \deg V_j\Big).$$

Next, we consider the case $s = n + 1$. In this case $V_s$ is a zero-dimensional variety, and so $\deg p_{n+1} = \deg_{x(n)} p_{n+1} \le n d$. Let $1 \le i \le n - 1$. Then $\deg_{x(i)} p_{i+1} \le i\, d$. Again we apply recursively the previous inequality, and we get
$$\deg p_{i+1} \le \max_{j \ge i+2} \deg p_j + d + (n + 1 + i)\, d \deg V_i \le n d + \sum_{j=i}^{n-1} \big( d + (n + 1 + j)\, d \deg V_j \big) = (2 n - i)\, d + \sum_{j=i}^{n-1} (n + 1 + j)\, d \deg V_j.$$
We have also $\deg p_1 \le \deg b_1 \le \max_{j \ge 2} \deg p_j + d$. We conclude that for all $i$,
$$\deg p_i \le 2 n d + \sum_{j=1}^{n-1} (n + 1 + j)\, d \deg V_j \le 2 n d \Big(1 + \sum_{j=1}^{n-1} \deg V_j\Big).$$
Finally, we estimate the local height of these polynomials. In the rest of the proof we make repeated use of the following degree bounds:
$$\deg_{x(i-1)} p_i \le n d, \qquad \deg p_i \le 2 n d \Big(1 + \sum_{j=i-1}^{\min\{n,s\}-1} \deg V_j\Big).$$

As usual, we concentrate on the case v ∈ M K∞ ; the case v ∈ / M K∞ can be treated analogously. We apply the division lemma to As−1 := K [x1 , . . . , xn−(s−1) ], g := 1, and f := f s , and we obtain h v ( ps ) ≤ s d h v (Vs−1 ) + s h v + n − (s − 1) + 6 s + (s − 1) log(n + 1) d × deg Vs−1 − log |ξs−1 |v for some ξs−1 ∈ K ∗ . Let 1 ≤ i ≤ s − 2, and set n 0 := min{n, s}. Then there exists ξi ∈ K ∗ such that h v ( pi+1 ) ≤ h v (bi+1 ) + (i d + max{d, degx(i) bi+1 }) h v (Vi ) + (i + 1) h v + (n − i + 6) log(n + 1) × i d + max{(i + 1) d, degx(i) bi+1 } deg Vi + 2 log(n − i + 1) deg bi+1 − log |ξi |v ≤ max h v ( p j ) + h v + log(n + 1) d + log(s − i) j≥i+2

+ (s + i) d h v (Vi ) + (i + 1) h v deg Vi + (n − i + 6) (s + i) log(n + 1) d deg Vi + 2 log(n + 1) 2 n d (1 +

nX 0 −1

deg V j ) + d − log |ξi |v .

j=i+1

Applying the inductive hypothesis, we obtain s−2 X h v ( pi+1 ) ≤ s d h v (Vs−1 ) + d (s + j) h v (V j ) j=i

+ (s − i − 1) h v + h v

s−1 X

( j + 1) deg V j

j=i

+ 4 (s − i − 1) (n + 1) log(n + 1) d + log(n + 1) d

s−1 X

(n − j + 6) (s + j) deg V j

j=i

+ 4 n log(n + 1) d

nX 0 −1 j=i+1

( j − i) deg V j −

s−1 X j=i

log |ξ j |v .



For i = 0 we apply Remark 3.2. There exists ξ0 ∈ K ∗ such that h v ( p1 ) ≤ h v (b1 ) + h v + 2 log(n + 1) deg b1 − log |ξ0 |v ≤ max h v ( p j ) + 2 h v + log(n + 1) d + log s j≥2

+ 2 log(n + 1) 2 n d (1 +

nX 0 −1

deg V j ) + d − log |ξ0 |v .

j=1

We set ξ :=

Qs−1

j=0 ξ j .

h v ( p1 ) ≤ 2 n d

As 2 ≤ s ≤ n + 1, we have

s−1 X

h v (V j )

j=1

+ (n + 1) h v (1 +

s−1 X

deg V j ) + 4 n (n + 1) log(n + 1) d

j=1

+ log(n + 1) d

s−1 X

(n − j + 6) (n + 1 + j) deg V j

j=1

+ 4 n log(n + 1) d

nX 0 −1

j deg V j − log |ξ |v

j=1

≤ 2nd

s−1 X

h(V j ) + (n + 1) h v + 2 n (2 n + 5)

j=1 s−1 X × log(n + 1) d (1 + deg V j ) − log |ξ |v . j=1

This last inequality follows from the fact that 4 n j +(n− j +6) ( j +s) ≤ 2 n (2 n+5) for j ≤ n − 1, and 6 (2 n + 1) ≤ 2 n (2 n + 5) as n ≥ 2. To conclude the proof, observe that for i = 1, . . . , s − 1, inequality (3.5) guarantees that the obtained estimate for pi differs from the one for pi+1 by a positive term. Thus the same estimate holds for h v ( pi ), 1 ≤ i ≤ s. The non-Archimedean case is treated in exactly the same way. The conclusion of the proof comes in this case from inequality (3.6). By means of Bézout inequality, we can now estimate the degree and height of the varieties V j . In this way we obtain an estimate which depends only on the degree and height of the input polynomials.



COROLLARY 3.4
Let notation and assumptions be as in Lemma 3.3, and assume $n, d \ge 2$. Then there exist $p_1, \ldots, p_s \in K[x_1, \ldots, x_n]$ and $\gamma \in K^*$ such that
• $1 = p_1 f_1 + \cdots + p_s f_s$,
• $\deg p_i \le 4 n d^n$,
• $h_v(p_i) \le 4 n (n + 1)\, d^n h_v + 4 n (4 n + 5) \log(n + 1)\, d^{n+1} - \log |\gamma|_v$ for $v \in M_K^\infty$,
• $h_v(p_i) \le 4 n (n + 1)\, d^n h_v - \log |\gamma|_v$ for $v \notin M_K^\infty$.

Proof
Let us first consider degrees. From the preceding result we obtain
$$\deg(p_i) \le 2 n d \Big(1 + \sum_{j=1}^{\min\{n,s\}-1} \deg V_j\Big) \le 2 n d\, (1 + \cdots + d^{n-1}) \le 4 n d^n.$$
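The last step of this chain rests on the elementary bound $1 + \cdots + d^{n-1} \le 2 d^{n-1}$ for $d \ge 2$. A quick sanity check over a small range of $n$ and $d$ can be sketched as follows; the function name is illustrative only.

```python
def degree_bound_chain(n, d):
    """Return (1 + d + ... + d^(n-1), 2 n d (1 + ... + d^(n-1)), 4 n d^n)."""
    geometric = sum(d ** j for j in range(n))
    middle = 2 * n * d * geometric
    final = 4 * n * d ** n
    return geometric, middle, final

for n in range(2, 8):
    for d in range(2, 8):
        geometric, middle, final = degree_bound_chain(n, d)
        assert geometric <= 2 * d ** (n - 1)   # geometric sum bound, valid for d >= 2
        assert middle <= final                 # hence 2 n d (1 + ... ) <= 4 n d^n
```

For instance, with $n = 3$ and $d = 2$ the three quantities are $7$, $84$, and $96$.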

Here we applied the inequality $1 + \cdots + d^{n-1} \le 2 d^{n-1}$ to obtain the last estimate. Next we consider the local height estimates. Let $v \in M_K^\infty$. We have
$$h_v(p_i) \le 2 n d \sum_{j=1}^{s-1} h_v(V_j) + \big((n + 1) h_v + 2 n (2 n + 5) \log(n + 1)\, d\big) \Big(1 + \sum_{j=1}^{s-1} \deg V_j\Big) - \log |\xi|_v$$
for some $\xi \in K^*$. Applying Corollary 2.10, $h_v(V_j) \le j\, d^{j-1} h_v + (n + j) \log(n + 1)\, d^j - \log |\lambda_j|_v$ for some $\lambda_j \in K^*$. Therefore
$$h_v(p_i) \le 2 n d \sum_{j=1}^{s-1} \big( j\, d^{j-1} h_v + (n + j) \log(n + 1)\, d^j - \log |\lambda_j|_v \big) + \big((n + 1) h_v + 2 n (2 n + 5) \log(n + 1)\, d\big) \sum_{j=0}^{n} d^j - \log |\xi|_v$$
$$\le 4 n^2 d^n h_v + 8 n^2 \log(n + 1)\, d^{n+1} + 2 (n + 1)\, d^n h_v + 4 n (2 n + 5) \log(n + 1)\, d^{n+1} - 2 n d \sum_{j=1}^{s-1} \log |\lambda_j|_v - \log |\xi|_v$$
$$\le 4 n (n + 1)\, d^n h_v + 4 n (4 n + 5) \log(n + 1)\, d^{n+1} - \log |\gamma|_v,$$


where $\gamma \in K^*$ is defined as $\gamma := \xi \prod_{j=1}^{s-1} \lambda_j^{2 n d}$. The case $v \notin M_K^\infty$ follows analogously:
$$h_v(p_i) \le 2 n d \sum_{j=1}^{s-1} h_v(V_j) + (n + 1) h_v \Big(1 + \sum_{j=1}^{s-1} \deg V_j\Big) - \log |\xi|_v.$$
We have $h_v(V_j) \le j\, d^{j-1} h_v - \log |\lambda_j|_v$, and therefore
$$h_v(p_i) \le 2 n d \sum_{j=1}^{s-1} \big( j\, d^{j-1} h_v - \log |\lambda_j|_v \big) + (n + 1) h_v \sum_{j=0}^{n} d^j - \log |\xi|_v \le 4 n (n + 1)\, d^n h_v - \log |\gamma|_v.$$

3.2.2. Proof of Theorem 1
In order to prove Theorem 1, it remains only to put the general case into the hypothesis of Corollary 3.4. This is accomplished by replacing the input polynomials and variables by generic linear combinations. The coefficients of the linear combinations are chosen to be roots of 1. Amazingly enough, we do not need any control on the degree of the involved finite extension. Let $L$ be a finite extension of $K$, and let $B := \{e_1, \ldots, e_N\}$ be a basis of $L$ as a $K$-linear space. We recall that $B^* := \{e_1^*, \ldots, e_N^*\}$ is the dual basis of $B$ if $\mathrm{Tr}^L_K(e_i e_j^*) = 1$ for $i = j$ and zero otherwise.

LEMMA 3.5
Let $\omega \in \overline{\mathbb{Q}}$ be a primitive $p$-root of 1 for some prime $p$. Then the basis $B^* := \{ (\omega^{-j} - \omega)/p : j = 0, \ldots, p - 2 \}$ of $\mathbb{Q}(\omega)$ is dual to $B := \{ \omega^i : i = 0, \ldots, p - 2 \}$.

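Proof
A direct computation shows that for $i, j = 0, \ldots, p - 2$,
$$\mathrm{Tr}\big(\omega^i (\omega^{-j} - \omega)\big) = \sum_{l=1}^{p-1} \omega^{l i} (\omega^{-l j} - \omega^{l}) = \begin{cases} p, & \text{for } i = j, \\ 0, & \text{for } i \ne j. \end{cases}$$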
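The direct computation can be reproduced numerically for small primes. The sketch below evaluates the sum over the $p - 1$ conjugates $\omega \mapsto \omega^l$ in floating-point complex arithmetic, so equality is only checked up to rounding; the function name is illustrative.

```python
import cmath

def trace_pairing(p, i, j):
    """Sum over l = 1, ..., p-1 of w^(l*i) * (w^(-l*j) - w^l) for w = exp(2*pi*i/p)."""
    w = cmath.exp(complex(0, 2 * cmath.pi / p))
    return sum(w ** (l * i) * (w ** (-l * j) - w ** l) for l in range(1, p))

for p in (3, 5, 7, 11):
    for i in range(p - 1):
        for j in range(p - 1):
            expected = p if i == j else 0
            assert abs(trace_pairing(p, i, j) - expected) < 1e-9
```

The underlying reason is that $\sum_{l=1}^{p-1} \omega^{l m}$ equals $p - 1$ when $p \mid m$ and $-1$ otherwise, and $i + 1$ is never divisible by $p$ in the range $0 \le i \le p - 2$.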

We use this result in the following way: Let $\omega$ be a primitive $p$-root of 1, and set $L := K(\omega)$. Let us assume that $\mathbb{Q}(\omega)$ and $K$ are linearly independent and that $p$ does not divide the discriminant of $K$. Both conditions are satisfied by all but a finite number of $p$. Then $[L : K] = p - 1$ and $O_L = O_K[\omega]$ (see [36, Chap. 3, Prop. 17]). Now, let $\nu \in O_L \setminus \{0\}$. By the preceding lemma
$$\nu = \frac{1}{p} \mathrm{Tr}\big(\nu\, (1 - \omega)\big) + \cdots + \frac{1}{p} \mathrm{Tr}\big(\nu\, (\omega^{2-p} - \omega)\big)\, \omega^{p-2} \in O_K[\omega] \setminus \{0\},$$
and so there exists $0 \le j \le p - 2$ such that $\mathrm{Tr}\big(\nu\, (\omega^{-j} - \omega)\big) / p \in O_K \setminus \{0\}$.

THEOREM 3.6 (Effective arithmetic Nullstellensatz)
Let $K$ be a number field, and let $f_1, \ldots, f_s \in O_K[x_1, \ldots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \ldots, f_s)$. Then there exist $a \in O_K \setminus \{0\}$ and $g_1, \ldots, g_s \in O_K[x_1, \ldots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 4 n d^n$,
• $h(a, g_1, \ldots, g_s) \le 4 n (n + 1)\, d^n \big( h + \log s + (n + 7) \log(n + 1)\, d \big)$.

Theorem 1 in the introduction corresponds to the case $K := \mathbb{Q}$. The extremal cases $n = 1$ and $d = 1$ are both simple. We treat them directly in the following lemmas.

LEMMA 3.7
Let $\ell_1, \ldots, \ell_s \in O_K[x_1, \ldots, x_n]$ be polynomials of degree bounded by 1 without common zeros in $\mathbb{A}^n$. Set $h := h(\ell_1, \ldots, \ell_s)$. Then there exist $a \in O_K \setminus \{0\}$ and $a_1, \ldots, a_s \in O_K$ such that
• $a = a_1 \ell_1 + \cdots + a_s \ell_s$,
• $h(a, a_1, \ldots, a_s) \le (n + 1) \big( h + \log(n + 1) \big)$.

Proof
The equation $a = a_1 \ell_1 + \cdots + a_s \ell_s$ is equivalent to an $O_K$-linear system of $n + 1$ equations in $s$ unknowns. The height estimate then follows from an application of the Cramer rule.

LEMMA 3.8
Let $f_1, \ldots, f_s \in O_K[x]$ be polynomials without common zeros in $\mathbb{A}^1$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \ldots, f_s)$. Then there exist $a \in O_K \setminus \{0\}$ and $g_1, \ldots, g_s \in O_K[x]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le d - 1$,
• $h(a, g_1, \ldots, g_s) \le 2 d\, (h + d)$.

Proof
Let $f := \sum_i a_i f_i$, $g := \sum_i b_i f_i \in K[x]$ be generic linear combinations of $f_1, \ldots, f_s$. Then $f$ and $g$ are coprime polynomials, and so there exist $p, q \in K[x]$ with $\deg p < \deg g$ and $\deg q < \deg f$ such that $1 = p f + q g$.
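The univariate Bézout identity $1 = p f + q g$ with $\deg p < \deg g$ and $\deg q < \deg f$ amounts to a square linear system in the coefficients of $p$ and $q$. The following is a minimal sketch for two coprime integer polynomials, solved by exact Gauss-Jordan elimination. It clears denominators with a least common multiple rather than with the determinant used in the proof, and all names are illustrative.

```python
from fractions import Fraction
from math import lcm

def bezout_identity(f, g):
    """Solve p*f + q*g = 1 with deg p < deg g and deg q < deg f.
    Polynomials are coefficient lists, lowest degree first; f and g must be
    coprime (otherwise no pivot is found and StopIteration is raised)."""
    df, dg = len(f) - 1, len(g) - 1
    size = df + dg
    # Column k < dg holds x^k * f; column dg + k holds x^k * g; rows index powers of x.
    M = [[Fraction(0)] * size for _ in range(size)]
    for k in range(dg):
        for i, c in enumerate(f):
            M[i + k][k] = Fraction(c)
    for k in range(df):
        for i, c in enumerate(g):
            M[i + k][dg + k] = Fraction(c)
    rhs = [Fraction(1)] + [Fraction(0)] * (size - 1)
    for col in range(size):
        pivot = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        scale = 1 / M[col][col]
        M[col] = [v * scale for v in M[col]]
        rhs[col] *= scale
        for r in range(size):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
                rhs[r] -= factor * rhs[col]
    return rhs[:dg], rhs[dg:]

def clear_denominators(p, q):
    """Return (a, a*p, a*q) with integer coefficients, a = lcm of the denominators."""
    a = lcm(*(c.denominator for c in p + q))
    return a, [int(c * a) for c in p], [int(c * a) for c in q]

f = [1, 0, 1]    # x^2 + 1
g = [-2, 1]      # x - 2
p, q = bezout_identity(f, g)
a, P_int, Q_int = clear_denominators(p, q)   # 5 = 1*(x^2 + 1) + (-x - 2)*(x - 2)
```

In this example the integer $a = 5$ coincides with the resultant of $f$ and $g$ up to sign, which is the kind of quantity the Cramer rule produces.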



Expanding this identity, we see that there exist p1 , . . . , ps ∈ K [x] with deg pi ≤ d − 1 such that 1 = p1 f 1 + · · · + ps f s . Thus the above Bézout identity translates to a consistent system of K -linear equations. The number of equations and variables equal 2 d and s d, respectively. This system can be solved by the Cramer rule. The integer a is the determinant of a nonsingular (2 d × 2 d)-submatrix of the matrix of the linear system. Proof of Theorem 3.6 We assume n, d ≥ 2. Let G p ⊂ Q denote the group of p-roots of 1 for a prime p. For ai j ∈ G p and i = 1, . . . , min{n + 1, s} we set qi := ai1 f 1 + · · · + ais f s . Also, for bkl ∈ G p and k = 1, . . . , n we set yk := bk0 + bk1 x1 + · · · + bkn xn . We assume that for a specific choice of p, ai j , and bkl there exists t ≤ min{n + 1, s} such that (q1 , . . . , qi ) ⊂ K [x1 , . . . , xn ] is a radical ideal of dimension n − i for i = 1, . . . , t − 1 and 1 ∈ (q1 , . . . , qt ). We also assume that y1 , . . . , yn is a linear change of variables and that Vi := V (q1 , . . . , qi ) ⊂ An satisfies Assumption 1.5 for i = 1, . . . , t − 1 with respect to y1 , . . . , yn−i . This is guaranteed by the fact that these conditions are generically satisfied: there exists a hypersurface H of the coefficient space such that (ai j , bkl ) ∈ / H implies that q1 , . . . , qs satisfy the stated conditions with respect to the variables y1 , . . . , yn (see, for instance, [16, Ths. 3.5 and 3.7.2], [19, Sec. 3.2 ], [50, Prop. 18 and proof of Th. 19]). As ∪ p G p is Zariski dense in A1 , it follows that these coefficients can be chosen to lie in G p for some p. Moreover, p can be chosen such that for ω a primitive p-root of 1 and L := K (ω), Q(ω) and K are linearly independent and p does not divide the discriminant of K . We also refer the reader to Section 4.1, where we give a selfcontained treatment of this topic. Set b := (bk0 )k ∈ G np and B := (bkl )k,l≥1 ∈ GLn (Q), so that x = B −1 (y − b). For j = 1, . . . 
, t, set F j (y) := q j B −1 (y − b) ∈ L[y1 , . . . , yn ]. Then F1 , . . . , Ft satisfy the hypothesis of Corollary 3.4. Let γ ∈ L ∗ and P1 , . . . , Pt ∈ L[y1 , . . . , yn ] be the nonzero element and the polynomials satisfying Bézout identity we obtain there.



Now, for $i = 1, \ldots, s$, set
$$p_i := \sum_{j=1}^{t} a_{ji}\, P_j(B x + b) \in L[x_1, \ldots, x_n]$$
so that $1 = p_1 f_1 + \cdots + p_s f_s$ holds. Finally, set $\mu := (\det B)^{4 n (n+1) d^{n+1}}\, \gamma \in L^*$. By Lemma 3.5 there exists $0 \le \ell \le p - 2$ such that $\mathrm{Tr}\big(\mu\, (\omega^{-\ell} - \omega)\big) \ne 0$. We define
$$a := \mathrm{Tr}\big(\mu\, (\omega^{-\ell} - \omega)\big) / p \in K^*, \qquad g_i := \mathrm{Tr}\big(\mu\, p_i\, (\omega^{-\ell} - \omega)\big) / p \in K[x_1, \ldots, x_n]$$
for $i = 1, \ldots, s$. Then $a = g_1 f_1 + \cdots + g_s f_s$ as $f_1, \ldots, f_s \in K[x_1, \ldots, x_n]$ and $\mathrm{Tr}$ is a $K$-linear map. Aside from the degree and height bounds, we show that since $f_1, \ldots, f_s \in O_K[x_1, \ldots, x_n]$, $a \in O_K$ and $g_i \in O_K[x_1, \ldots, x_n]$. Let us first analyze degrees and local heights. As $\deg F_j \le d$, $\deg g_i \le \deg p_i \le \max_j \deg P_j \le 4 n d^n$. Now, let $v \in M_K^\infty$, and let $w \in M_L$ such that $w \mid v$. We have $h_w\big(B^{-1}(y - b)\big) \le n \log n - \log |\det B|_w$, and so
$$h_w(F_j) \le h_w(q_j) + \big( n \log n - \log |\det B|_w + 2 \log(n + 1) \big)\, d \le h_v + \log s + (n + 2) \log(n + 1)\, d - \log |\det B|_w\, d$$
by Lemma 1.2(c). From Corollary 3.4,
$$h_w(P_j) \le 4 n (n + 1)\, d^n \max_k h_w(F_k) + 4 n (4 n + 5) \log(n + 1)\, d^{n+1} - \log |\gamma|_w$$
$$\le 4 n (n + 1)\, d^n \big( h_v + \log s + (n + 2) \log(n + 1)\, d - \log |\det B|_w\, d \big) + 4 n (4 n + 5) \log(n + 1)\, d^{n+1} - \log |\gamma|_w$$
$$= 4 n (n + 1)\, d^n (h_v + \log s) + 4 n (n^2 + 7 n + 7) \log(n + 1)\, d^{n+1} - \log |\mu|_w.$$
Therefore
$$h_w(\mu\, p_i) \le \max_j h_w(P_j) + 2 \log(n + 1) \max_j \deg P_j + \log t + \log |\mu|_w$$
$$\le 4 n (n + 1)\, d^n (h_v + \log s) + 4 n (n^2 + 7 n + 7) \log(n + 1)\, d^{n+1} + 8 n \log(n + 1)\, d^n + \log(n + 1)$$
$$\le 4 n (n + 1)\, d^n \big( h_v + \log s + (n + 7) \log(n + 1)\, d \big) - \log 2 \qquad (3.7)$$

579

again by Lemma 1.2(c) and the fact d, n ≥ 2. We have 1 1 X σ µ pi (ω−` − ω) , gi = Tr µ pi (ω−` − ω) = p p σ ∈Gal L/K

and so h v (gi ) ≤ max h w (µ pi ) + log 2 w|v

≤ 4 n (n + 1) d n (h v + log s + (n + 7) log(n + 1) d). We have h w (µ) ≤ 4 n (n +1) d n (h v +log s)+4 n (n 2 +7 n +7) log(n +1) d n+1 , and so the previous estimate also holds for h v (a). Now let v ∈ / M K∞ and w | v. Analogously, we have h w (µ), h w (µ pi ) ≤ 4 n (n + 1) d n h v = 0 as f 1 , . . . , f s ∈ O K [x1 , . . . , xn ]. Then µ ∈ O L \ {0} and µ pi ∈ O L [x1 , . . . , xn ], which in turn implies that a ∈ O K \ {0} and gi ∈ O K [x1 , . . . , xn ] as desired. The global height estimate then follows from the expression X 1 Nv max{h v (a), h v (g1 ), . . . , h v (gs )}. h(a, g1 , . . . , gs ) = [K : Q] ∞ v∈M K

Remark 3.9
The fact that the bound (3.7) is uniform on $w$ for $w \mid v$ is the key that allows us to get rid of the roots of 1. This is no longer the case in our treatment of the more refined arithmetic Nullstellensätze in Section 4.

The following example improves the lower bound for a general height estimate given in the introduction and thus shows that the term $d^n h$ is unavoidable.

Example 3.10
Set
\[ f_1 := x_1 - H, \quad f_2 := x_2 - x_1^d, \ \dots, \quad f_n := x_n - x_{n-1}^d, \quad f_{n+1} := x_n^d \]
for any positive integers $d$, $H$. These are polynomials without common zeros in $\mathbb{A}^n$ of degree and height bounded, respectively, by $d$ and $h := \log H$. Let $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_{n+1} \in \mathbb{Z}[x_1, \dots, x_n]$ such that $a = g_1 f_1 + \cdots + g_{n+1} f_{n+1}$. We evaluate this identity in $(H, H^d, \dots, H^{d^{n-1}})$, and we obtain
\[ a = g_{n+1}\big(H, H^d, \dots, H^{d^{n-1}}\big)\, H^{d^n}, \]
from which we deduce $h(a) \ge d^n h$.
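The evaluation step in Example 3.10 can be checked numerically. The following is a minimal sketch in Python (the function name `example_3_10` is hypothetical and only serves this illustration): at the point $(H, H^d, \dots, H^{d^{n-1}})$ the first $n$ polynomials vanish and the last one equals $H^{d^n}$, so any integer Bézout constant $a$ is a nonzero multiple of $H^{d^n}$.

```python
def example_3_10(n, d, H):
    """Values of f_1, ..., f_{n+1} from Example 3.10 at (H, H^d, ..., H^(d^(n-1)))."""
    point = [H ** (d ** i) for i in range(n)]      # x_{i+1} = H^(d^i)
    values = [point[0] - H]                        # f_1 = x_1 - H
    values += [point[i] - point[i - 1] ** d        # f_i = x_i - x_{i-1}^d
               for i in range(1, n)]
    values.append(point[-1] ** d)                  # f_{n+1} = x_n^d
    return values


vals = example_3_10(n=3, d=2, H=5)
print(vals)
```

For $n = 3$, $d = 2$, $H = 5$ the last value is $5^{8} = H^{d^n}$, which matches the lower bound $h(a) \ge d^n h$.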



4. Intrinsic type estimates
Theorem 1 is essentially optimal in the general case. There are, however, many particular instances in which these estimates can be improved. Consider the following example:
\[ f_1 := x_1 - 1, \quad f_2 := x_2 - x_1^d, \ \dots, \quad f_n := x_n - x_{n-1}^d, \quad f_{n+1} := H - x_n^d \]
for any positive integers $d$ and $H$. These are polynomials without common zeros in $\mathbb{A}^n$ of degree and height bounded, respectively, by $d$ and $h := \log H$. Theorem 1 says that there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_{n+1} \in \mathbb{Z}[x_1, \dots, x_n]$ such that $a = g_1 f_1 + \cdots + g_{n+1} f_{n+1}$ with $\deg g_i \le 4\, n\, d^n$ and $h(a), h(g_i) \le 4n(n+1)\, d^n \big(h + (n+7)\log(n+1)\, d\big)$. However, the following Bézout identity holds:
\[ H - 1 = \frac{x_1^d - 1}{x_1 - 1} \cdots \frac{x_n^d - 1}{x_n - 1}\, f_1 + \cdots + \frac{x_n^d - 1}{x_n - 1}\, f_n + f_{n+1}. \]
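As a sanity check of the identity above, here is a small Python sketch that evaluates its right-hand side at random integer points with exact arithmetic. It assumes the reading in which the coefficient of $f_i$ (for $i \le n$) is the product of the quotients $(x_j^d - 1)/(x_j - 1)$ over $j = i, \dots, n$; the helper names are hypothetical.

```python
import random


def geometric_sum(x, d):
    # (x^d - 1)/(x - 1) written as the polynomial 1 + x + ... + x^(d-1)
    return sum(x ** k for k in range(d))


def bezout_rhs(x, d, H):
    """Right-hand side of the Bezout identity evaluated at the integer point x."""
    n = len(x)
    f = [x[0] - 1] + [x[i] - x[i - 1] ** d for i in range(1, n)] + [H - x[-1] ** d]
    total = f[n]                       # f_{n+1} enters with coefficient 1
    for i in range(n):
        coeff = 1
        for j in range(i, n):          # product of the quotients for j = i, ..., n
            coeff *= geometric_sum(x[j], d)
        total += coeff * f[i]
    return total


random.seed(0)
for _ in range(100):
    n, d, H = random.randint(1, 4), random.randint(1, 4), random.randint(1, 50)
    x = [random.randint(-5, 5) for _ in range(n)]
    assert bezout_rhs(x, d, H) == H - 1
```

Each coefficient is a product of at most $n$ factors of degree $d - 1$ with coefficients equal to 1, which is where the bounds $n(d-1)$ and $h$ come from.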

Note that the polynomials arising in this identity have degree and height bounded, respectively, by $n(d-1)$ and $h$. There is in this case an exponential gap between the a priori general estimates and the actual ones. The explanation is somewhat simple: for $i = 1, \dots, n$, the varieties
\[ V_i := V(f_1, \dots, f_i) = V(x_1 - 1, x_2 - 1, \dots, x_i - 1) \subset \mathbb{A}^n \]
verify $\deg(V_i) = 1$ and $h(V_i) \le 2\, n \log(n+1)$. Namely, both the degree and the height of the varieties successively cut out by the input polynomials are much smaller than the corresponding Bézout estimate. As the varieties $V_i$ verify the assumptions of Lemma 3.3, a direct application together with Lemma 1.3 produces the more realistic estimates
\[ \deg g_i \le 2\, n^2\, d, \qquad h(a),\, h(g_i) \le (n+1)^2\, h + 8\, n \log(n+1)\, d. \]
Based on this idea, we devote this section to the study of more refined arithmetic Nullstellensätze which can deal with such situations.

4.1. Equations in general position
This section deals with the preparation of the input data. To apply Lemma 3.3, we need to prepare the polynomials and the variables of the ambient space.

Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. For $i = 1, \dots, s$ and $a_{ij} \in \mathbb{Z}$ we set $q_i := a_{i1} f_1 + \cdots + a_{is} f_s$.



We estimate the height of rational integers $a_{ij}$ in order that there exist $t \le \min\{n+1, s\}$ such that $(q_1, \dots, q_i) \subset K[x_1, \dots, x_n]$ is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in (q_1, \dots, q_t)$.

Also, we set $y_k := b_{k0} + b_{k1} x_1 + \cdots + b_{kn} x_n$ for $k = 1, \dots, n$ and $b_{kl} \in \mathbb{Z}$. Again we want to estimate the height of rational integers $b_{kl}$ such that $V_i := V(q_1, \dots, q_i) \subset \mathbb{A}^n$ satisfies Assumption 1.5 with respect to this set of variables for $i = 1, \dots, t-1$. Namely, the projection
\[ \pi_i : V_i \to \mathbb{A}^{n-i}, \qquad x \mapsto (y_1, \dots, y_{n-i}) \]
must verify $\#\pi_i^{-1}(0) = \deg V_i$; that is, $\#\big(V_i \cap V(y_1, \dots, y_{n-i})\big) = \deg V_i$ for $i = 1, \dots, t-1$. Lemma 2.14 would then imply that the variables $y_1, \dots, y_{n-i}$ are in Noether normal position with respect to $V_i$.

It is well known that these conditions are satisfied by a generic choice of $a_{ij}$ and $b_{kl}$ (see, for instance, [19, Sec. 3.2], [50, Prop. 18 and proof of Th. 19]). We have already applied such a preparation to obtain the classic style version of the effective arithmetic Nullstellensatz presented in Theorem 3.6. There we chose roots of 1 as coefficients of the linear combinations since their existence was sufficient in our proof. However, technical reasons (see Remark 3.9) prevent us from applying the same principle in Section 4, and we need to carry out a more careful analysis.

We note that all aspects of this preparation were previously covered in the research papers [2, Sec. 4], [19, Sec. 3.2], [32, Sec. 6], and [22, Sec. 5.2]. However, the bounds presented therein are either nonexplicit or not precise enough for our purposes. Here we choose to give a self-contained presentation, which yields another proof of the existence of such linear combinations. The obtained estimates substantially improve the previously known ones.

4.1.1. An effective Bertini theorem
This section is devoted to the preparation of the polynomials. We first establish some auxiliary results.

The following is a version of the so-called shape lemma representation of a zero-dimensional radical ideal. The main difference here is that we choose a generic linear form, instead of a particular one, as a primitive element.

For a polynomial $f = c_D t^D + \cdots + c_0 \in k[t]$ we denote its discriminant by $\operatorname{discr}(f) \in k$. We recall that $\operatorname{discr}(f) \ne 0$ if and only if $c_D \ne 0$ and $f$ is squarefree, that is, $f$ has exactly $D$ distinct roots.

LEMMA 4.1 (Shape lemma)
Let $V \subset \mathbb{A}^n$ be a zero-dimensional variety defined over $k$. Let $U := (U_0, \dots, U_n)$ be



a group of $n + 1$ variables, and set $L := U_0 + U_1 x_1 + \cdots + U_n x_n$ for the associated generic linear form. Let $P \in k[U][T]$ be a characteristic polynomial of $V$. Set $P' := \partial P/\partial T \in k[U][T]$ and $\rho := \operatorname{discr}_T P \in k[U] \setminus \{0\}$. Also, set $I$ for the extension of $I(V)$ to $k[U][x]$. Then there exist $v_1, \dots, v_n \in k[U][T]$ with $\deg v_i \le \deg V - 1$ such that
\[ I_\rho = \big(P(L),\ P'(L)\, x_1 - v_1(L),\ \dots,\ P'(L)\, x_n - v_n(L)\big)_\rho \subset k[U]_\rho[x]. \]
Here $I_\rho$ denotes the localization of $I$ at $\rho$.

Proof
We note first that $I(V)$ is a radical ideal, and so $I = k[U] \otimes_k I(V)$ is also radical. We readily obtain from the definition of $P$ that $I \cap k[U][L] = (P(L))$, and so $P(L) \in I$. We can write $P(L) = \sum_\alpha a_\alpha(x)\, U^\alpha$ with $a_\alpha(x) \in I(V)$. Therefore $\partial P(L)/\partial U_i$ also lies in $I$ for all $i$. A direct computation shows that for $i = 1, \dots, n$,
\[ \partial P(L)/\partial U_i = P'(L)\, x_i - v_i(L) \]
for some $v_i \in k[U][T]$ with $\deg v_i \le \deg P - 1 = \deg V - 1$. Set
\[ J := \big(P(L),\ P'(L)\, x_1 - v_1(L),\ \dots,\ P'(L)\, x_n - v_n(L)\big) \subset k[U][x]. \]
The previous argument shows the inclusion $I \supset J$.

On the other hand, $\rho = A\, P + B\, P'$ for some $A, B \in k[U][T]$. Set $w_i := B\, v_i$. Then $x_i \equiv w_i(L)/\rho \pmod{J_\rho}$, and so for every $f \in k[U][x]$ we have that $f \equiv f\big(U, w_1(L)/\rho, \dots, w_n(L)/\rho\big)$ modulo $J_\rho$, and hence modulo $I_\rho$. For $f \in I$,
\[ \rho^{\deg f}\, f\big(U, w_1(L)/\rho, \dots, w_n(L)/\rho\big) \in I \cap k[U][L] = (P(L)), \]
which implies $I_\rho \subset J_\rho$ as desired.

Let $\nu \in k^{n+1}$ such that $\rho(\nu) \ne 0$. It follows that $I(V)$ can be represented as
\[ I(V) = \big(P(L),\ P'(L)\, x_1 - v_1(L),\ \dots,\ P'(L)\, x_n - v_n(L)\big)(\nu) \subset k[x]. \]

Now, let $f_1, \dots, f_s \in k[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. For $i = 1, \dots, s$ we let $Z_i := (Z_{i1}, \dots, Z_{is})$ denote a group of $s$ variables, and we set
\[ Q_i := Z_{i1} f_1 + \cdots + Z_{is} f_s \in k[Z][x] \]
for the associated generic linear combination of $f_1, \dots, f_s$.
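The specialized representation of Shape Lemma 4.1 can be illustrated on a finite set of rational points. The sketch below is written in Python under explicit assumptions that are not taken from the text: $k = \mathbb{Q}$, the characteristic polynomial is normalized as $P(T) = \prod_{\xi \in V} (T - L(\xi))$, and all function names are hypothetical. With this normalization $v_i(T) = \sum_{\xi} \xi_i \prod_{\eta \ne \xi} (T - L(\eta))$, and every point of $V$ is recovered as $x_i = v_i(L)/P'(L)$ as soon as the linear form separates the points, which is the condition $\rho(\nu) \ne 0$.

```python
from fractions import Fraction


def prod_except(values, t, skip):
    """Product of (t - values[k]) over all k different from skip."""
    out = Fraction(1)
    for k, value in enumerate(values):
        if k != skip:
            out *= t - value
    return out


def recover_points(points, nu):
    """Recover each point of V from the value of L = nu_0 + nu_1 x_1 + ... + nu_n x_n."""
    lam = [Fraction(nu[0]) + sum(Fraction(c) * Fraction(x) for c, x in zip(nu[1:], p))
           for p in points]
    if len(set(lam)) != len(lam):
        raise ValueError("the linear form does not separate the points (discriminant is zero)")
    recovered = []
    for t in lam:
        # P'(t) for P(T) = prod (T - lam_k)
        p_prime = sum(prod_except(lam, t, k) for k in range(len(lam)))
        # v_i(t) = sum over points of xi_i * prod_{eta != xi} (t - L(eta))
        coords = tuple(
            sum(Fraction(points[k][i]) * prod_except(lam, t, k) for k in range(len(lam))) / p_prime
            for i in range(len(points[0]))
        )
        recovered.append(coords)
    return recovered


V = [(0, 0), (1, 2), (3, 1)]
print(recover_points(V, nu=(0, 1, 1)))
```

A linear form that takes the same value on two points, such as `nu=(0, 1, -1)` on `[(0, 0), (1, 1)]`, triggers the error branch, mirroring the role of the localization at $\rho$.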



LEMMA 4.2
For $\ell = 1, \dots, s$, the ideal $(Q_1, \dots, Q_\ell)$ is a complete intersection prime ideal of $k[Z][x]$.

Proof
Set $I := (Q_1, \dots, Q_\ell)$ and $V := V(I) \subset \mathbb{A}^{s\ell} \times \mathbb{A}^n$. First we observe that $V$ is a linear bundle over $\mathbb{A}^n$: the projection
\[ \pi : V \to \mathbb{A}^n, \qquad (z, x) \mapsto x \]
is surjective, and the fibers are affine spaces of dimension $(s-1)\,\ell$. This follows from the assumption that the $f_j$ have no common zeros. This implies that $\dim V = (s-1)\,\ell + n$ because of the theorem of dimension of fibers. Namely, $Q_1, \dots, Q_\ell$ is a complete intersection, and in particular the ideal $I$ is unmixed.

Set $I = I_1 \cap \cdots \cap I_m$ for the primary decomposition of this ideal. We show that $I_j$ is prime for all $j$ and then that $m = 1$. First we have that
\[ I_{f_j} = (Q_1/f_j, \dots, Q_\ell/f_j) = (Z_{1j} + H_{1j}, \dots, Z_{\ell j} + H_{\ell j}) \]
where $H_{ij} \in k[Z_i][x]_{f_j}$ does not depend on $Z_{ij}$. Therefore
\[ \big(k[\mathbb{A}^{s\ell} \times \mathbb{A}^n]/I\big)_{f_j} \cong k[\mathbb{A}^{(s-1)\ell} \times \mathbb{A}^n]_{f_j} \]
is a domain; that is, $I_{f_j}$ is prime. We have $I_{f_j} = (I_1)_{f_j} \cap \cdots \cap (I_m)_{f_j}$, and so there exists $1 \le n(j) \le m$ such that
\[ I_{f_j} = (I_{n(j)})_{f_j}, \qquad V(I_i) \subset \{f_j = 0\} \quad \text{for } i \ne n(j). \]
In particular, $I_{n(j)} = I_{f_j} \cap k[\mathbb{A}^{s\ell} \times \mathbb{A}^n]$ is prime. The fact that $\cap_j \{f_j = 0\} = \emptyset$ ensures that $n(j)$ runs over all $1 \le i \le m$, and so $I$ is radical. The expression $I_{f_j} = (Z_{1j} + H_{1j}, \dots, Z_{\ell j} + H_{\ell j})$ implies that $\pi(V(I_{f_j})) \subset \mathbb{A}^n$ contains the dense open set $\{f_j \ne 0\}$. In particular, $V(I_{f_j})$ is not contained in any of the hypersurfaces $\{f_i = 0\}$, and so $n(j) = n(1)$ for all $j$. This implies that $m = 1$, and so $I = I_1$ is prime.

The following proposition shows that $\big(Q_1(a_1), \dots, Q_\ell(a_\ell)\big)$ is a radical ideal for a generic choice of $a_i := (a_{i1}, \dots, a_{is})$. Unlike Lemmas 4.1 and 4.2, this result does not hold for arbitrary characteristic. For instance, let $x^p,\ 1 - x^p \in \mathbb{F}_p[x]$ for some prime $p$. Then $Q_1(a_1) = b + c\, x^p$ for some $b, c \in \mathbb{F}_p$, and so $Q_1(a_1) = (b^{1/p} + c^{1/p} x)^p$ is not squarefree.
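The characteristic $p$ counterexample rests on the identity $(b + c\,x)^p = b + c\,x^p$ in $\mathbb{F}_p[x]$, where $b^{1/p} = b$ and $c^{1/p} = c$ because $b, c \in \mathbb{F}_p$. A minimal Python sketch of this check (the function name `frobenius_power` is hypothetical) expands the power with the binomial theorem and reduces the coefficients modulo $p$:

```python
from math import comb


def frobenius_power(b, c, p):
    """Coefficients (lowest degree first) of (b + c*x)^p reduced modulo the prime p."""
    return [comb(p, k) * pow(b, p - k, p) * pow(c, k, p) % p for k in range(p + 1)]


print(frobenius_power(2, 3, 5))
```

All middle binomial coefficients are divisible by $p$, so only the constant and the degree $p$ terms survive; every linear combination of $x^p$ and $1 - x^p$ is therefore a $p$-th power.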



PROPOSITION 4.3
Let $\operatorname{char}(k) = 0$, and set $I := (Q_1, \dots, Q_\ell) \subset k[Z][x]$.
• In case $I \cap k[Z] \ne \{0\}$ there exists $F \in k[Z] \setminus \{0\}$ with $\deg F \le (d+1)^\ell$ such that $F(a_1, \dots, a_\ell) \ne 0$ for $a_1, \dots, a_\ell \in k^s$ implies that $1 \in \big(Q_1(a_1), \dots, Q_\ell(a_\ell)\big)$.
• In case $I \cap k[Z] = \{0\}$ there exists $F \in k[Z] \setminus \{0\}$ with $\deg F \le 2\,(d+1)^{2\ell}$ such that $F(a_1, \dots, a_\ell) \ne 0$ for $a_1, \dots, a_\ell \in k^s$ implies that $\big(Q_1(a_1), \dots, Q_\ell(a_\ell)\big) \subset k[x]$ is a radical ideal of dimension $n - \ell$.

Proof
Set $V := V(I) \subset \mathbb{A}^{s\ell} \times \mathbb{A}^n$. We have $\dim V = (s-1)\,\ell + n$ and $\deg V \le (d+1)^\ell$.

First, we consider the case $I \cap k[Z] \ne \{0\}$. This occurs, for instance, when $\ell \ge n+1$ since then $\dim I = s\,\ell + n - \ell < \dim k[Z] = s\,\ell$. Let $\pi : \mathbb{A}^{s\ell} \times \mathbb{A}^n \to \mathbb{A}^{s\ell}$ be the canonical projection. Then $\pi(V)$ is a proper subvariety of $\mathbb{A}^{s\ell}$, and thus it is contained in a hypersurface of degree bounded by $\deg V$. This can be seen by taking a generic projection of this variety into an affine space of dimension $s\,\ell + n - \ell + 1$ (see [23, Rem. 4]). Let $F \in k[Z]$ be a defining equation of this hypersurface. Then $F \in I$ as $I$ is prime, and we have $\deg F \le (d+1)^\ell$. Thus $1 \in I_F \subset k[Z]_F[x]$, and therefore $1 \in I(a) := \big(Q_1(a_1), \dots, Q_\ell(a_\ell)\big)$ for $a \in k^{s\ell}$ such that $F(a) \ne 0$.

Next, we consider the case $I \cap k[Z] = \{0\}$. We adopt the following convention: for an ideal $J \subset k[x]$ and for $\zeta$ any new group of variables, we denote by $J^{[\zeta]}$ and $J^{(\zeta)}$ the extension of $J$ to the polynomial rings $k[\zeta][x]$ and $k(\zeta)[x]$, respectively.

We assume for the moment $\ell = n$. Then $\dim I = s\,\ell$, and so the extended ideal $I^{(Z)} \subset k(Z)[x]$ is a zero-dimensional prime ideal. We have then that $k(Z) \otimes_k I^{(Z)} \subset k(Z)[x]$ is a zero-dimensional radical ideal, as $\operatorname{char}(k) = 0$ (see [44, Th. 26.3]).

Our approach to this case is based on Shape Lemma 4.1. We determine a polynomial $F \in k[Z]$ such that $F(a) \ne 0$ implies that the shape lemma representation of $I^{(Z)}$ can be transferred to a shape lemma representation of $I(a)$.

Let $U$ be a group of $n + 1$ variables, and set $L := U_0 + U_1 x_1 + \cdots + U_n x_n$ for the associated generic linear form. Consider the morphism
\[ \Psi : \mathbb{A}^{n+1} \times \mathbb{A}^{s\ell} \times \mathbb{A}^n \to \mathbb{A}^{n+1} \times \mathbb{A}^{s\ell} \times \mathbb{A}^1, \qquad (u, z, x) \mapsto \big(u, z, L(x)\big), \]
and let $W$ be the variety defined by $I$ in $\mathbb{A}^{n+1} \times \mathbb{A}^{s\ell} \times \mathbb{A}^n$, that is, $W = \mathbb{A}^{n+1} \times V$. The Zariski closure $\overline{\Psi(W)}$ is then an irreducible hypersurface. We set $P \in k[U, Z][T]$ for one of its defining equations.

If $I^{[U](Z)}$ is the extension of $I^{(Z)}$ to $k[U](Z)[x]$, the polynomial $P$ can be equivalently defined through the condition that $P(L)$ is a generator of the principal ideal $I^{[U](Z)} \cap k[U, Z][L]$. Namely, $P$ is a characteristic polynomial of the zero-dimensional variety $W_0$ defined by $I^{(Z)}$ in $\mathbb{A}^n(k(Z))$. Let $v_1, \dots, v_n \in k[U](Z)[T]$ denote the polynomials arising in the shape lemma applied to $W_0$. From the proof of this lemma we have that
\[ \partial P(L)/\partial U_i = P'(L)\, x_i - v_i(L) \in k[U, Z][L] \]
and so $v_i \in k[U, Z][T]$. Set
\[ J := \big(P(L),\ P'(L)\, x_1 - v_1(L),\ \dots,\ P'(L)\, x_n - v_n(L)\big) \subset k[U, Z][x] \]
and $\rho := \operatorname{discr}_T P \in k[U, Z] \setminus \{0\}$. Then
\[ \big(I^{[U](Z)}\big)_\rho = \big(J^{[U](Z)}\big)_\rho \subset k[U](Z)_\rho[x]. \]
We have that both $I_\rho^{[U,Z]}$ and $J_\rho^{[U,Z]}$ are prime ideals of $k[U, Z]_\rho[x]$ with trivial intersection with the ring $k[U, Z]$. Thus they coincide, respectively, with the contraction of $I_\rho^{[U](Z)}$ and $J_\rho^{[U](Z)}$ to $k[U, Z]_\rho[x]$, and so
\[ I_\rho^{[U,Z]} = J_\rho^{[U,Z]} \subset k[U, Z]_\rho[x]. \]
Define $F \in k[Z] \setminus \{0\}$ as any of the nonzero coefficients of the monomial expansion of $\rho$ with respect to $U$. Let $a \in k^{s\ell}$ such that $F(a) \ne 0$. Then $\rho(U, a) \ne 0$ and so $P(U, a)[T]$ is squarefree in $k(U)[T]$. Then
\[ \big(I(a)^{[U]}\big)_{\rho(U,a)} = \big(P(L),\ P'(L)\, x_1 - v_1(L),\ \dots,\ P'(L)\, x_n - v_n(L)\big)(a)_{\rho(U,a)} \subset k[U]_{\rho(U,a)}[x] \]
is radical, which implies in turn that $I(a) = \big(I(a)^{[U]}\big)_{\rho(U,a)} \cap k[x]$ is a zero-dimensional radical ideal of $k[x]$, as desired.

It remains to estimate the degree of $F$. To this end, it suffices to bound the degree of $\rho$ with respect to the group of variables $Z$. We recall that $P$ was defined as a defining equation of the hypersurface $\overline{\Psi(W)}$. The map $\Psi$ is linear in the variables $Z$ and $x$, and so
\[ \deg_Z P \le \deg W = \deg V \le (d+1)^n. \]
This implies that $\deg F \le \deg_Z \rho \le \deg_Z P\,(2 \deg_Z P - 1) \le 2\,(d+1)^{2n}$.

Finally, we consider the case $\ell < n$ for $I \cap k[Z] = \{0\}$. Let $U_1, \dots, U_{n-\ell}$ be groups of $n + 1$ variables each, and set
\[ L_i := U_{i0} + U_{i1} x_1 + \cdots + U_{in} x_n \]
for $i = 1, \dots, n - \ell$. Set $U := (U_1, \dots, U_{n-\ell})$, $L := (L_1, \dots, L_{n-\ell})$, and $k_0 := k(U, L)$. The extended prime ideal $I_0 \subset k_0[Z][x_1, \dots, x_\ell]$ verifies $I_0 \cap k_0[Z] = \{0\}$ and thus falls into the previously considered case.



Thus there exists $F_0 \in k_0[Z] \setminus \{0\}$ with $\deg F_0 \le 2\,(d+1)^{2\ell}$ such that $F_0(a) \ne 0$ for $a \in k^{s\ell}$ implies that $I_0(a)$ is a radical ideal of $k_0[x_1, \dots, x_\ell]$. This implies in turn that $I(a)$ is a radical ideal of dimension $n - \ell$ of $k[x]$, as $I(a) = I_0(a) \cap k[x]$. We can assume without loss of generality that $F_0$ lies in $k[U, L][Z]$. We conclude by taking $F$ as any nonzero coefficient of the monomial expansion of $F_0$ with respect to the variables $U$ and $L$.

COROLLARY 4.4
Let $\operatorname{char}(k) = 0$, and let $f_1, \dots, f_s \in k[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$. Then there exist $t \le \min\{n+1, s\}$ and $a_1, \dots, a_t \in \mathbb{Z}^s$ such that
• $\big(Q_1(a_1), \dots, Q_i(a_i)\big)$ is a radical ideal of dimension $n - i$ for $1 \le i \le t-1$,
• $1 \in \big(Q_1(a_1), \dots, Q_t(a_t)\big)$,
• $h(a_i) \le 2\,(n+1) \log(d+1)$.

Proof
Set $t$ for the minimal $i$ such that $I_i := (Q_1, \dots, Q_i) \cap k[Z] \ne \{0\}$. Then $t \le n + 1$, and by the previous result there exists $F_t \in k[Z]$ with $\deg F_t \le (d+1)^t$ such that $F_t(a) \ne 0$ implies that $1 \in \big(Q_1(a_1), \dots, Q_t(a_t)\big)$. On the other hand, for $i < t$ we take a polynomial $F_i \in k[Z]$ of degree bounded by $2\,(d+1)^{2i}$ such that $F_i(a) \ne 0$ implies that $\big(Q_1(a_1), \dots, Q_i(a_i)\big)$ is a radical ideal of dimension $n - i$. Then we take $F := F_1 \cdots F_t$, and so
\[ \deg F \le 2\,(d+1)^2 + \cdots + 2\,(d+1)^{2(t-1)} + (d+1)^t \le (d+1)^{2n} + 2\,(d+1)^{2n} + (d+1)^{n+1} \le 4\,(d+1)^{2n}. \]
Finally, $F \ne 0$ implies there exist $a_1, \dots, a_t \in \mathbb{Z}^s$ such that $h(a_i) \le \log(\deg F)$ and $F(a) \ne 0$.

4.1.2. Effective Noether normal position
Now we devote ourselves to the preparation of the variables. For $k = 0, \dots, n$ we let $U_k := (U_{k0}, \dots, U_{kn})$ be a group of $n + 1$ variables and we set
\[ Y_k := U_{k0} + U_{k1} x_1 + \cdots + U_{kn} x_n. \]

PROPOSITION 4.5
Let $V \subset \mathbb{A}^n$ be an equidimensional variety of dimension $r$ defined over $k$. Then there exists $G \in k[U_1, \dots, U_r] \setminus \{0\}$ with $\deg_{U_k} G \le 2\,(\deg V)^2$ such that $G(b_1, \dots, b_r) \ne 0$ for $b_1, \dots, b_r \in k^{n+1}$ implies that
\[ \#\big(V \cap V(Y_1(b_1), \dots, Y_r(b_r))\big) = \deg V. \]

Proof
Let $F_V$ be a Chow form of $V$, and let $P_V \in k[U, T]$ be the characteristic polynomial of $V$ associated to $F_V$ given by Lemma 2.13. Set $D := \deg V$, and let $P_V = c_D T_0^D + \cdots + c_0$ be its expansion with respect to $T_0$. Also, set
\[ \rho := \operatorname{discr}_{T_0} P_V \in k[U_0, \dots, U_r][T_1, \dots, T_r] \setminus \{0\} \]
for the discriminant of $P_V$ with respect to $T_0$. Observe that as $P_V$ is multihomogeneous of degree $D$ in each group of variables $U_i \cup \{T_i\}$, the degree of $\rho$ in each of these groups of variables is bounded by $D\,(2D - 1)$.

Now, let $\nu_1, \dots, \nu_r \in k^{n+1}$ such that $V(\nu) := V \cap V(Y_1(\nu_1), \dots, Y_r(\nu_r))$ is a zero-dimensional variety of cardinality $D$, and let $F_{V(\nu)}$ be a Chow form of $V(\nu)$. Set $\zeta_0 := (T_0 - U_{00}, U_{01}, \dots, U_{0n})$. Then applying [47, Prop. 2.4], there exists $\lambda \in k^*$ such that
\begin{align*}
P_V(U_0, \nu_1, \dots, \nu_r)(T_0, 0, \dots, 0) &= F_V\big(\zeta_0(U_0, T_0), \nu_1, \dots, \nu_r\big) \\
&= \lambda\, F_{V(\nu)}\big(\zeta_0(U_0, T_0)\big) \\
&= \lambda\, P_{V(\nu)}(U_0)(T_0)
\end{align*}
where $P_{V(\nu)}$ is a characteristic polynomial of $V(\nu)$. This implies $P_V(U)(T_0, 0, \dots, 0) \in k[U][T_0]$ is a squarefree polynomial and so $\rho(U)(0) \ne 0$. We take $G \in k[U_1, \dots, U_r]$ as any nonzero coefficient of the expansion of $\rho(U)(0)$ with respect to $U_0$. Therefore
\[ \deg G \le \deg_{U_i} \rho(U)(0) \le D\,(2D - 1). \]
The condition $G(b) \ne 0$ implies that $\rho(U_0, b_1, \dots, b_r)(0) \ne 0$, and so $\#V(b) = D$. As we noted before, this implies that the variables $Y_1(b_1), \dots, Y_r(b_r)$ are in Noether normal position with respect to the variety $V$.
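Corollary 4.4 passes from a nonzero polynomial $F$ to an integer point with $F(a) \ne 0$ and entries of height at most $\log(\deg F)$, and the same step applies to the polynomial $G$ of Proposition 4.5. This rests on the standard fact that a nonzero polynomial of degree at most $D$ cannot vanish on the whole grid $\{0, \dots, D\}^m$. The following Python sketch makes the search explicit; the function name and the sample polynomial are hypothetical, and the exhaustive search is only meant as an illustration, since the grid has $(D+1)^m$ points.

```python
from itertools import product


def nonvanishing_point(F, num_vars, degree_bound):
    """First integer point of {0, ..., degree_bound}^num_vars at which F is nonzero."""
    for point in product(range(degree_bound + 1), repeat=num_vars):
        if F(*point) != 0:
            return point
    return None  # reached only if F vanishes on the whole grid


# Sample polynomial of degree 3 vanishing on z1 = 0, z2 = 0 and z1 = z2.
sample = lambda z1, z2: z1 * z2 * (z1 - z2)
print(nonvanishing_point(sample, num_vars=2, degree_bound=3))
```

Every coordinate of the returned point lies between 0 and the degree bound, so its height is at most the logarithm of that bound.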



COROLLARY 4.6
Let $\operatorname{char}(k) = 0$, and let $q_1, \dots, q_t \in k[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$ which form a reduced weak regular sequence. Set $d := \max_i \deg q_i$. Then there exist $b_1, \dots, b_n \in \mathbb{Z}^{n+1}$ such that $V(q_1, \dots, q_i)$ satisfies Assumption 1.5 with respect to the variables $Y_1(b_1), \dots, Y_{n-i}(b_{n-i})$ for $i = 1, \dots, t$, and
\[ h(b_k) \le 2\,(n+1) \log(d+1). \]

Proof
This follows readily from the previous result. We take $G_i$ as the polynomial corresponding to the variety $V(q_1, \dots, q_i)$, and we set $G := G_1 \cdots G_{t-1} \in k[U_1, \dots, U_n]$. We have $\deg_{U_j} G_i \le 2\, d^{2i}$, and so
\[ \deg_{U_j} G \le 2\, d^2 + \cdots + 2\, d^{2(t-1)} \le 4\, d^{2(t-1)} \le 4\, d^{2n}. \]
We conclude by taking $b_1, \dots, b_n \in \mathbb{Z}^{n+1}$ such that $h(b_i) \le \log(\deg G)$ and $G(b) \ne 0$. (If $t < n + 1$, we complete with vectors of the canonical basis of $k^{n+1}$.)

4.2. An intrinsic arithmetic Nullstellensatz
In this section we introduce the notions of degree and height of a polynomial system defined over a number field $K$. Modulo setting the input equations in general position, these parameters measure the degree and height of the varieties successively cut out. The resulting estimates for the arithmetic Nullstellensatz are linear in these parameters. As an important particular case, we derive a sparse arithmetic Nullstellensatz.

4.2.1. Intrinsic parameters
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials of degree bounded by $d$ without common zeros in $\mathbb{A}^n$. For $i = 1, \dots, s$ we let $Z_i$ denote a group of $s$ variables and we set
\[ Q_i(Z) := Z_{i1} f_1 + \cdots + Z_{is} f_s \in K[Z][x] \]
for the associated generic linear combination of $f_1, \dots, f_s$.

Let $\Gamma$ be the set of integer $(s \times s)$-matrices $a = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$ of height bounded by $2\,(n+1) \log(d+1)$ such that $I_i(a) := \big(Q_1(a_1), \dots, Q_i(a_i)\big) \subset K[x_1, \dots, x_n]$ is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in I_t(a)$ for some $t \le \min\{n+1, s\}$. Corollary 4.4 implies that $\Gamma \ne \emptyset$. For $a \in \Gamma$ we set
\[ \delta(a) := \max\{\deg V(I_i(a)) \,;\ 1 \le i \le \min\{t, n\} - 1\}, \]
\[ \eta(a) := \max\{h(V(I_i(a))) \,;\ 1 \le i \le t - 1\}. \]
We set $\Gamma_{\min} \subset \mathbb{Z}^{s \times s}$ for the subset of matrices $a \in \Gamma$ such that $\eta(a) + d\, \delta(a)$ is minimum. Finally, let $a^{\min} \in \Gamma_{\min}$ be a matrix that attains the minimum of $\delta(a)$ for $a \in \Gamma_{\min}$.

Definition 4.7
Let notation be as in the previous paragraph. Then we define the degree and the height, respectively, of the polynomial system $f_1, \dots, f_s$ as
\[ \delta(f_1, \dots, f_s) := \delta(a^{\min}), \qquad \eta(f_1, \dots, f_s) := \eta(a^{\min}). \]
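The two-stage choice of $a^{\min}$ behind Definition 4.7 is a lexicographic minimization: first the quantity $\eta(a) + d\,\delta(a)$, then $\delta(a)$. The Python sketch below illustrates only this selection rule, assuming the pairs $(\delta(a), \eta(a))$ have already been computed for finitely many candidate matrices; the labels and numerical values are made up for the illustration.

```python
def select_a_min(candidates, d):
    """candidates: list of (label, delta, eta); returns the entry playing the role of a^min."""
    best_value = min(eta + d * delta for _, delta, eta in candidates)
    # Gamma_min: candidates attaining the minimum of eta + d * delta
    gamma_min = [c for c in candidates if c[2] + d * c[1] == best_value]
    # among those, take one with minimal delta
    return min(gamma_min, key=lambda c: c[1])


# Hypothetical (label, delta, eta) triples.
candidates = [("a", 4, 10), ("b", 2, 14), ("c", 3, 12), ("e", 5, 30)]
print(select_a_min(candidates, d=2))
```

In this toy data three candidates tie on $\eta + d\,\delta = 18$, and the tie is broken by the smallest degree.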

We restrict ourselves to integer matrices of bounded height in order to keep control of the height of $Q_1(a_1), \dots, Q_t(a_t)$. The choice of $\eta(a) + d\, \delta(a)$ as the defining invariant comes from the need to estimate the degree and height simultaneously. We note that in the case when $f_1, \dots, f_s$ is already a reduced weak regular sequence we have $\eta(f_1, \dots, f_s) + d\, \delta(f_1, \dots, f_s) \le \eta(\mathrm{Id}) + d\, \delta(\mathrm{Id})$.

We can estimate these parameters through the following arithmetic Bézout inequality.

LEMMA 4.8
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d_i := \deg f_i$, and assume that $d_1 \ge \cdots \ge d_s$ holds. Set $d := d_1 = \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, set $n_0 := \min\{n, s\}$ and $n_1 := \min\{n+1, s\}$. Then
• $\delta(f_1, \dots, f_s) \le \prod_{j=1}^{n_0 - 1} d_j$,
• $\eta(f_1, \dots, f_s) \le n\,\big(h + \log s + 3\, n\,(n+1)\, d\big) \prod_{j=1}^{n_1 - 2} d_j$.

Proof
Let $a := a^{\min} = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$ be a coefficient matrix such that $\delta(f_1, \dots, f_s) = \delta(a)$ and $\eta(f_1, \dots, f_s) = \eta(a)$, and set
\[ q_i := a_{i1} f_1 + \cdots + a_{is} f_s, \qquad 1 \le i \le s. \]
Let $t \le n_1 = \min\{n+1, s\}$ be minimum such that $1 \in (q_1, \dots, q_t)$. Let $\tilde{a} \in \mathbb{Z}^{(t-1) \times s}$ be the matrix formed by the first $t - 1$ rows of $a$, and let $c \in \mathbb{Z}^{(t-1) \times s}$ be a staircase matrix equivalent to $\tilde{a}$. The polynomial system
\[ \tilde{q}_i := c_{i1} f_1 + \cdots + c_{is} f_s \]
is then equivalent to $q_1, \dots, q_{t-1}$; that is, $(\tilde{q}_1, \dots, \tilde{q}_i) = (q_1, \dots, q_i)$ for $i = 1, \dots, t-1$. Also, we have $\deg \tilde{q}_i \le d_i$, and so
\[ \delta := \max\{\deg V_i \,;\ 1 \le i \le \min\{n, t\} - 1\} \le \prod_{j=1}^{n_0 - 1} d_j. \]
We have also that each coefficient of $c$ is a subdeterminant of $\tilde{a}$. Thus
\[ \tilde{h} := h(\tilde{q}_1, \dots, \tilde{q}_{t-1}) \le h + \log s + h(c) \le h + \log s + (t-1)\big(2\,(n+1) \log(d+1) + \log(t-1)\big) \le h + \log s + n\,(3n+1)\, d, \]
and so, applying Corollary 2.11,
\begin{align*}
\eta &\le \max\{h(V_i) \,;\ 1 \le i \le \min\{n+1, t\} - 1\} \\
&\le \Big( \sum_{l=1}^{n_1 - 1} \tilde{h}/d_l + (n + n_1 - 1) \log(n+1) \Big) \prod_{j=1}^{n_1 - 1} d_j \\
&\le \Big( n\,\big(h + \log s + n\,(3n+1)\, d\big) + 2\, n \log(n+1) \Big) \prod_{j=1}^{n_1 - 2} d_j \\
&\le n\,\big(h + \log s + 3\, n\,(n+1)\, d\big) \prod_{j=1}^{n_1 - 2} d_j.
\end{align*}
We can also estimate these parameters through the following arithmetic Bernstein-Kushnirenko inequality.

LEMMA 4.9
Let $f_1, \dots, f_s \in K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, let $\mathcal{V}$ denote the volume of $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then
• $\delta(f_1, \dots, f_s) \le \mathcal{V}$,
• $\eta(f_1, \dots, f_s) \le n\, \mathcal{V}\, \big(h + \log s + 2^{2n+3}\, d\big)$.

Proof
Let $a := a^{\min} = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$, and set $q_i := a_{i1} f_1 + \cdots + a_{is} f_s$ for $i = 1, \dots, s$. Then $\operatorname{Supp}(q_i) \subset \operatorname{Supp}(f_1, \dots, f_s)$, and so $\mathcal{V}(1, x_1, \dots, x_n, q_1, \dots, q_s) \le \mathcal{V}$. Applying Proposition 2.12, we obtain $\delta \le \mathcal{V}$ and
\begin{align*}
\eta &\le n\, \big( \max_i h(q_i) + 2^{2n+2} \log(n+1)\, d \big)\, \mathcal{V} \\
&\le n\, \big( h + \log s + 2\,(n+1) \log(d+1) + 2^{2n+2} \log(n+1)\, d \big)\, \mathcal{V} \\
&\le n\, \mathcal{V}\, \big(h + \log s + 2^{2n+3}\, d\big).
\end{align*}

4.2.2. Proof of Theorem 2
Modulo the preparation of the input data, the proof of Theorem 2 follows the lines of the example introduced at the beginning of Section 4. The following is the general version of Theorem 2 over number fields.

THEOREM 4.10 (Intrinsic arithmetic Nullstellensatz)
Let $K$ be a number field, and let $f_1, \dots, f_s \in O_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, let $\delta$ and $\eta$ denote the degree and the height of the polynomial system $f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in O_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, \delta$,
• $h(a, g_1, \dots, g_s) \le (n+1)^2\, [K : \mathbb{Q}]\, d\, \big(2\, \eta + (h + \log s)\, \delta + 21\,(n+1)^2\, d \log(d+1)\, \delta\big)$.

Proof
Let $a^{\min} = (a_{ij})_{ij} \in \mathbb{Z}^{s \times s}$ be a coefficient matrix such that $\delta = \delta(a^{\min})$, $\eta = \eta(a^{\min})$, and $h(a^{\min}) \le 2\,(n+1) \log(d+1)$. We set
\[ q_i := a_{i1} f_1 + \cdots + a_{is} f_s \]
for $i = 1, \dots, s$. Then $(q_1, \dots, q_i)$ is a radical ideal of dimension $n - i$ for $i = 1, \dots, t-1$ and $1 \in (q_1, \dots, q_t)$ for some $t \le \min\{n+1, s\}$.

For $1 \le k \le n$, $0 \le l \le n$, we also let $b_{kl} \in \mathbb{Z}$ be integers with $h(b_{kl}) \le 2\,(n+1) \log(d+1)$ such that $V_i := V(q_1, \dots, q_i)$ satisfies Assumption 1.5 with respect to the variables
\[ y_k := b_{k0} + b_{k1} x_1 + \cdots + b_{kn} x_n \]
for $i = 1, \dots, t-1$. Set $b := (b_{k0})_k \in \mathbb{Z}^n$ and $B := (b_{kl})_{k,l \ge 1} \in \mathrm{GL}_n(\mathbb{Q})$, and set $\varphi : \mathbb{A}^n \to \mathbb{A}^n$ for the affine map $\varphi(x) := B\,x + b$. For $j = 1, \dots, t$ we then set
\[ F_j(y) := q_j(x) = q_j\big(\varphi^{-1}(y)\big) \in K[y_1, \dots, y_n]. \]
Thus $F_1, \dots, F_t$ are in the hypothesis of Lemma 3.3 with respect to $y_1, \dots, y_n$, and we let $P_1, \dots, P_t \in K[x_1, \dots, x_n]$ be the polynomials satisfying the Bézout identity we obtain there. Finally, for $i = 1, \dots, s$, we set
\[ p_i := \sum_{j=1}^{t} a_{ij}\, P_j\big(\varphi(x)\big) \in K[x_1, \dots, x_n]. \]
We have $1 = p_1 f_1 + \cdots + p_s f_s$.

Now we analyze the degree and the height of these polynomials. We assume $n, d \ge 2$ as the remaining cases have already been considered in Lemmas 3.7 and 3.8.

Set $W_l := V(F_1, \dots, F_l) \subset \mathbb{A}^n$ for $l = 1, \dots, t-1$. We have $W_l = \varphi(V_l)$ and so $\deg W_l = \deg V_l$. We have also $\deg F_j = \deg q_j \le d$, and so
\[ \deg p_i \le \max_j \deg P_j \le 2\, n\, d\, \Big(1 + \sum_{l=1}^{\min\{n,s\}-1} \deg W_l\Big) \le 2\, n^2\, d\, \delta \]
as $\deg W_l \le \delta$ for $l \le n - 1$.

Now, let $v \in M_K^\infty$. We have $h_\infty(\varphi) \le 2\,(n+1) \log(d+1)$ and so
\begin{align*}
h_\infty(\varphi^{-1}) &\le n\, \big(h_\infty(\varphi) + \log n\big) - \log|\det B|_\infty \\
&\le n\, \big(2\,(n+1) \log(d+1) + \log n\big) - \log|\det B|_\infty \\
&\le 3\, n\,(n+1) \log(d+1) - \log|\det B|_\infty.
\end{align*}
Set $h_v := \max_i h_v(f_i)$. Then
\begin{align*}
h_v(F_i) &\le h_v(q_i) + \big(h_\infty(\varphi^{-1}) + 2 \log(n+1)\big) \deg q_i \\
&\le h_v + 2\,(n+1) \log(d+1) + \log s + \big(3\, n\,(n+1) \log(d+1) - \log|\det B|_\infty + 2 \log(n+1)\big)\, d \\
&\le h_v + \log s + \big(n + 1 + 3\, n\,(n+1) + 2\, n\big)\, d \log(d+1) - \log|\det B|_\infty\, d \\
&\le h_v + \log s + 3\,(n+1)^2\, d \log(d+1) - \log|\det B|_\infty\, d
\end{align*}
by Lemma 1.2(c) and the fact that $\log(n+1) \le n$ and $\log(d+1) \ge 1$ for $d \ge 2$.

Next, applying Lemma 2.7, we obtain
\begin{align*}
h(W_l) &\le h(V_l) + (n - l + 1)\, \big(h(\varphi) + 5 \log(n+1)\big) \deg V_l \\
&\le h(V_l) + n\, \big(2\,(n+1) \log(d+1) + 5 \log(n+1)\big) \deg V_l \\
&\le \eta + n\,(7n + 2)\, d \log(d+1)\, \delta
\end{align*}
as $\deg W_l = \deg V_l \le d\, \delta$ and $h(V_l) \le \eta$ for $l = 1, \dots, t-1$.

By Lemma 3.3 there exists $\xi \in K^*$ such that
\begin{align*}
h_v(P_j) &\le 2\, n\, d \sum_{l=1}^{t-1} h_v(W_l) + \Big( (n+1) \max_l h_v(F_l) + 2\, n\,(2n+5) \log(n+1)\, d \Big) \Big(1 + \sum_{l=1}^{t-1} \deg W_l\Big) - \log|\xi|_v \\
&\le 2\, n\, d \sum_{l=1}^{t-1} h_v(W_l) + (n+1)^2\, (h_v + \log s)\, d\, \delta + \big(3\,(n+1)^4 + 2\, n^2\,(2n+5)\,(n+1)\big)\, d^2 \log(d+1)\, \delta - \log|\mu|_v
\end{align*}
with $\mu := (\det B)^{(n+1)^2 d^2 \delta}\, \xi \in K^*$. From the previous estimates we deduce
\begin{align*}
h_v(p_i) &\le \max_j h_v(P_j) + \big(h_\infty(\varphi) + 2 \log(n+1)\big) \max_j \deg P_j + 2\,(n+1) \log(d+1) + \log t \\
&\le 2\, n\, d \sum_l h_v(W_l) + (n+1)^2\, (h_v + \log s)\, d\, \delta + \big(3\,(n+1)^4 + 2\, n^2\,(2n+5)\,(n+1)\big)\, d^2 \log(d+1)\, \delta - \log|\mu|_v \\
&\qquad + \big(2\,(n+1) \log(d+1) + 2 \log(n+1)\big)\, 2\, n^2\, d\, \delta + 2\,(n+1) \log(d+1) + \log(n+1) \\
&\le 2\, n\, d \sum_l h_v(W_l) + (n+1)^2\, (h_v + \log s)\, d\, \delta + 7\,(n+1)^3\,(n+2)\, d^2 \log(d+1)\, \delta - \log|\mu|_v.
\end{align*}
Analogously, $h_v(p_i) \le 2\, n\, d \sum_l h_v(W_l) + (n+1)^2\, h_v\, d\, \delta - \log|\mu|_v$ for $v \notin M_K^\infty$. Hence
\begin{align*}
h(p_1, \dots, p_s) &\le 2\, n\, d \sum_l h(W_l) + (n+1)^2\, (h + \log s)\, d\, \delta + 7\,(n+1)^3\,(n+2)\, d^2 \log(d+1)\, \delta \\
&\le 2\, n^2\, d\, \eta + 2\, n^3\,(7n+2)\, d^2 \log(d+1)\, \delta + (n+1)^2\, (h + \log s)\, d\, \delta + 7\,(n+1)^3\,(n+2)\, d^2 \log(d+1)\, \delta \\
&\le 2\, n^2\, d\, \eta + (n+1)^2\, (h + \log s)\, d\, \delta + 21\,(n+1)^4\, d^2 \log(d+1)\, \delta.
\end{align*}
Finally, we apply Lemma 1.3 to obtain $a \in \mathbb{Z} \setminus \{0\}$ such that $g_i := a\, p_i \in O_K[x_1, \dots, x_n]$. Thus $a = g_1 f_1 + \cdots + g_s f_s$ and the corresponding height estimates are multiplied by $[K : \mathbb{Q}]$.

We derive from this result and Lemma 4.8 the following estimate in terms of the degree and the height of the input polynomials.



COROLLARY 4.11
Let $f_1, \dots, f_s \in O_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d_i := \deg f_i$, and assume that $d_1 \ge \cdots \ge d_s$ holds. Also, set $d := d_1 = \max_i \deg f_i$, $h := h(f_1, \dots, f_s)$, and $n_0 := \min\{n, s\}$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in O_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d \prod_{j=1}^{n_0 - 1} d_j$,
• $h(a, g_1, \dots, g_s) \le 2\,(n+1)^3\, [K : \mathbb{Q}]\, \big(h + \log s + 3\, n\,(n+7)\, d \log(d+1)\big)\, d \prod_{j=1}^{n_0 - 1} d_j$.

4.2.3. Estimates for the sparse case
Our arithmetic Bernstein-Kushnirenko inequality (see Proposition 2.12 and Lemma 4.9) shows that both the degree and the height of a system are controlled by its volume. We then derive from Theorem 4.10 the following arithmetic Nullstellensatz for sparse polynomial systems. Corollary 3 in the introduction corresponds to the case $K := \mathbb{Q}$.

COROLLARY 4.12 (Sparse arithmetic Nullstellensatz)
Let $f_1, \dots, f_s \in O_K[x_1, \dots, x_n]$ be polynomials without common zeros in $\mathbb{A}^n$. Set $d := \max_i \deg f_i$ and $h := h(f_1, \dots, f_s)$. Also, let $\mathcal{V}$ denote the volume of the polynomial system $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in O_K[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^2\, d\, \mathcal{V}$,
• $h(a, g_1, \dots, g_s) \le 2\,(n+1)^3\, [K : \mathbb{Q}]\, d\, \mathcal{V}\, \big(h + \log s + 2^{2n+3}\, d \log(d+1)\big)$.

Example 4.13
For $1 \le i \le s$ we let
\[ f_i := a_{i0} + a_{i1} x_1 + \cdots + a_{in} x_n + b_{i1}\, x_1 \cdots x_n + \cdots + b_{id}\, (x_1 \cdots x_n)^d \in \mathbb{Z}[x_1, \dots, x_n] \]
be polynomials of degree bounded by $n\,d$ without common zeros in $\mathbb{A}^n$. Set $h := \max_i h(f_i)$. Also, set
\[ P_d := \operatorname{Conv}\big(0, e_1, \dots, e_n, d\,(e_1 + \cdots + e_n)\big) \subset \mathbb{R}^n, \]
so that $P_d$ contains the Newton polytope of the polynomials $1, x_1, \dots, x_n, f_1, \dots, f_s$. Then
\[ \mathcal{V} \le \operatorname{Vol}(P_d) = n!\, d/(n-1)! = n\, d. \]
We conclude that there exist $a \in \mathbb{Z} \setminus \{0\}$ and $g_1, \dots, g_s \in \mathbb{Z}[x_1, \dots, x_n]$ such that
• $a = g_1 f_1 + \cdots + g_s f_s$,
• $\deg g_i \le 2\, n^4\, d^2$,
• $h(a), h(g_i) \le 2\, n^2\,(n+1)^3\, d^2\, \big(h + \log s + n\, 2^{2n+3}\, d \log(n\,d + 1)\big)$.
This estimate is sharper than the one given by Theorem 1.
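The volume computation of Example 4.13 can be verified with exact arithmetic. The Python sketch below assumes that $\operatorname{Vol}$ denotes $n!$ times the Euclidean volume, as the value $n!\, d/(n-1)!$ suggests, and uses the decomposition of $P_d$ into the standard simplex and the simplex spanned by $e_1, \dots, e_n$ and the apex $d\,(e_1 + \cdots + e_n)$, which is valid for $d \ge 1$; the function names are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size, result = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return result


def normalized_simplex_volume(vertices):
    """n! times the Euclidean volume of the simplex with the given n + 1 vertices."""
    base = vertices[0]
    return abs(det([[v[i] - base[i] for i in range(len(base))] for v in vertices[1:]]))


def normalized_volume_P(n, d):
    origin = [0] * n
    e = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    apex = [d] * n
    # P_d = standard simplex, glued along the facet e_1, ..., e_n to the simplex with the apex
    return normalized_simplex_volume([origin] + e) + normalized_simplex_volume(e + [apex])


print(normalized_volume_P(3, 2))
```

The first summand is always 1 and the second equals $n\,d - 1$, which recovers $\operatorname{Vol}(P_d) = n\,d$.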



References

[1] C. A. BERENSTEIN and D. C. STRUPPA, Recent improvements in the complexity of the effective Nullstellensatz, Linear Algebra Appl. 157 (1991), 203–215. MR 92m:13024
[2] C. A. BERENSTEIN and A. YGER, Effective Bezout identities in $\mathbb{Q}[z_1, \dots, z_n]$, Acta Math. 166 (1991), 69–120. MR 92f:32004
[3] C. A. BERENSTEIN and A. YGER, Residue calculus and effective Nullstellensatz, Amer. J. Math. 121 (1999), 723–796. MR 2000g:13016
[4] D. N. BERNSTEIN, The number of roots of a system of equations, Funct. Anal. Appl. 9 (1975), 183–185. MR 55:8034
[5] J.-B. BOST, H. GILLET, and C. SOULÉ, Heights of projective varieties and positive Green forms, J. Amer. Math. Soc. 7 (1994), 903–1027. MR 95j:14025
[6] W. D. BROWNAWELL, Bounds for the degrees in the Nullstellensatz, Ann. of Math. (2) 126 (1987), 577–591. MR 89b:12001
[7] L. CANIGLIA, A. GALLIGO, and J. HEINTZ, Borne simple exponentielle pour les degrés dans le théorème des zéros sur un corps de caractéristique quelconque, C. R. Acad. Sci. Paris Sér. I Math. 307 (1988), 255–258. MR 90c:12002
[8] J. CANNY and I. EMIRIS, A subdivision-based algorithm for the sparse resultant, J. ACM 47 (2000), 417–451. MR CMP 1 768 142
[9] L. EIN and R. LAZARSFELD, A geometric effective Nullstellensatz, Invent. Math. 137 (1999), 427–448. MR 2000j:14028
[10] D. EISENBUD, Commutative Algebra: With a View Toward Algebraic Geometry, Grad. Texts in Math. 150, Springer, New York, 1995. MR 97a:13001
[11] G. FALTINGS, Diophantine approximation on abelian varieties, Ann. of Math. (2) 133 (1991), 549–576. MR 93d:11066
[12] N. FITCHAS and A. GALLIGO, Nullstellensatz effectif et conjecture de Serre (théorème de Quillen-Suslin) pour le calcul formel, Math. Nachr. 149 (1990), 231–253. MR 92i:12002
[13] N. FITCHAS, M. GIUSTI, and F. SMIETANSKI, “Sur la complexité du théorème des zéros” in Approximation and Optimization in the Caribbean, II (Havana, 1993), Approx. Optim. 8, Lange, Frankfurt, 1995, 247–329. MR 97g:68091
[14] W. FULTON, Intersection Theory, Ergeb. Math. Grenzgeb. (3) 2, Springer, Berlin, 1984. MR 85k:14004
[15] I. M. GELFAND, M. M. KAPRANOV, and A. V. ZELEVINSKY, Discriminants, Resultants, and Multidimensional Determinants, Math. Theory Appl., Birkhäuser, Boston, 1994. MR 95e:14045
[16] M. GIUSTI and J. HEINTZ, “La détermination des points isolés et de la dimension d’une variété algébrique peut se faire en temps polynomial” in Computational Algebraic Geometry and Commutative Algebra (Cortona, Italy, 1991), ed. D. Eisenbud and L. Robbiano, Sympos. Math. 34, Cambridge Univ. Press, Cambridge, 1993, 216–256. MR 95a:68063

596

[17]

[18]

[19] [20] [21] [22]

[23] [24]

[25] [26] [27] [28] [29] [30] [31]

[32]

[33] [34] [35]

KRICK, PARDO, AND SOMBRA ¨ ˜ , M. GIUSTI, J. HEINTZ, K. HAGELE, J. E. MORAIS, L. M. PARDO, and J. L. MONTANA

“Lower bounds for diophantine approximations” in Algorithms for Algebra (Eindhoven, Netherlands, 1996), J. Pure Appl. Algebra 117/118 (1997), 277–317. MR 99d:68106 524, 525, 539, 562 M. GIUSTI, J. HEINTZ, J. E. MORAIS, J. MORGENSTERN, and L. M. PARDO, Straight-line programs in geometric elimination theory, J. Pure Appl. Algebra 124 (1998), 101–146. MR 99d:68128 524, 525 M. GIUSTI, J. HEINTZ, and J. SABIA, On the efficiency of effective Nullstellensätze, Comput. Complexity 3 (1993), 56–95. MR 94i:13016 524, 527, 562, 577, 581 W. GUBLER, Local heights of subvarieties over non-Archimedean fields, J. Reine Angew. Math. 498 (1998), 61–113. MR 99j:14022 538 ¨ K. HAGELE , Intrinsic height estimates for the Nullstellensatz, Ph.D. dissertation, Univ. Cantabria, Cantabria, Spain, 1998. 525 ¨ K. HAGELE, J. E. MORAIS, L. M. PARDO, and M. SOMBRA, On the intrinsic complexity of the arithmetic Nullstellensatz, J. Pure Appl. Algebra 146 (2000), 103–183. MR 2000m:14069 522, 524, 525, 562, 581 J. HEINTZ, Definability and fast quantifier elimination in algebraically closed fields, Theoret. Comput. Sci. 24 (1983), 239–277. MR 85a:68062 523, 536, 537, 584 J. HEINTZ and C.-P. SCHNORR, “Testing polynomials which are easy to compute” in Logic and Algorithmic (Zurich, 1980) Monograph. Enseign. Math. 30, Univ. Genève, Geneva, 1982, 237–254. MR 83g:12003 554 G. HERMANN, Der Frage der endlich vielen Schritte in der Theorie der Polynomideale, Math. Ann. 95 (1926), 736–788. 523 ´ and B. SHIFFMAN, A global Lojasiewicz inequality for algebraic S. JI, J. KOLLAR, varieties, Trans. Amer. Math. Soc. 329 (1992), 813–818. MR 92e:32007 522 M. M. KAPRANOV, B. STURMFELS, and A. V. ZELEVINSKY, Chow polytopes and general resultants, Duke Math. J. 67 (1992), 189–218. MR 93e:14062 540 P. KOIRAN, Hilbert’s Nullstellensatz is in the polynomial hierarchy, J. Complexity 12 (1996), 273–286. MR 98e:68109 522 ´ , Sharp effective Nullstellensatz, J. Amer. Math. Soc. 1 (1988), 963–975. J. 
KOLLAR MR 89h:12008 523 , Effective Nullstellensatz for arbitrary ideals, J. Eur. Math. Soc. (JEMS) 1 (1999), 313–337. MR 2000h:13014 524 T. KRICK and L. M. PARDO, Une approche informatique pour l’approximation diophantienne, C. R. Acad. Sci. Paris Sér. I Math. 318 (1994), 407–412. MR 95d:13033 524 , “A computational method for diophantine approximation” in Algorithms in Algebraic Geometry and Applications (Santander, Spain, 1994), Progr. Math. 143, Birkhäuser, Basel, 1996, 193–253. MR 98h:13039 524, 527, 562, 565, 581 ´ , On intrinsic bounds in the Nullstellensatz, Appl. T. KRICK, J. SABIA, and P. SOLERNO Algebra Engrg. Comm. Comput. 8 (1997), 125–134. MR 98g:13030 525 E. KUNZ, Kähler Differentials, Adv. Lectures Math., Vieweg, Braunschweig, 1986. MR 88e:14025 562 A. G. KUSHNIRENKO, Newton polytopes and the Bézout theorem, Funct. Anal. Appl.


597

10 (1976), 233–235, http://www.emis.de/ZMATH 526 [36]

S. LANG, Algebraic Number Theory, Addison-Wesley, Reading, Mass., 1970.

[37]

D. H. LEHMER, Factorization of certain cyclotomic functions, Ann. of Math. (2) 34

[38]

P. LELONG, Mesure de Mahler et calcul des constantes universelles pour les

[39]

P. LELONG and L. GRUMAN, Entire Functions of Several Complex Variables,

[40]

F. S. MACAULAY, Some formulæ in elimination, Proc. London Math. Soc. 35 (1903),

[41]

K. MAHLER, On some inequalities for polynomials in several variables, J. London

[42]

V. MAILLOT, Géométrie d’Arakelov des variétés toriques et fibrés en droites

[43]

¨ D. W. MASSER and G. WUSTHOLZ , Fields of large transcendence degree generated by

MR 44:181 575 (1933), 461–479, http://www.emis.de/ZMATH 529 polynômes de n variables, Math. Ann. 299 (1994), 673–695. MR 95g:32025 530 Grundlehren Math. Wiss. 282, Springer, Berlin, 1986. MR 87j:32001 528 3–27. 526 Math. Soc. 37 (1962), 341–344. MR 25:2036 529 intégrables, Mém. Soc. Math. Fr. (N.S.) 2000, no. 80. MR CMP 1 775 582 556

[44] [45]

[46] [47] [48] [49]

[50]

[51] [52]

[53]

values of elliptic functions, Invent. Math. 72 (1983), 407–464. MR 85g:11060 523 H. MATSUMURA, Commutative Ring Theory, Cambridge Stud. Adv. Math. 8, Cambridge Univ. Press, Cambridge, 1986. MR 88h:13001 584 L. M. PARDO, “How lower and upper complexity bounds meet in elimination theory” in Applied Algebra, Algebraic Algorithms and Error-correcting Codes (Paris, 1995), Lecture Notes Comput. Sci. 948, Springer, Berlin, 1995, 33–69. MR 99a:68097 524 P. PEDERSEN and B. STURMFELS, Product formulas for resultants and Chow forms, Math. Z. 214 (1993), 377–396. MR 94m:14068 549 ´ P. PHILIPPON, Critères pour l’indépendance algébrique, Inst. Hautes Etudes Sci. Publ. Math. 64 (1986), 5–52. MR 88h:11048 529, 542, 543, 544, 546, 550, 552, 587 , Dénominateurs dans le théorème des zéros de Hilbert, Acta Arith. 58 (1990), 1–25. MR 92i:13008 524 , Sur des hauteurs alternatives, I, Math. Ann. 289 (1991), 255–283 MR 92m:11061; II, Ann. Inst. Fourier (Grenoble) 44 (1994), 1043–1065 MR 96c:11069; III, J. Math. Pures Appl. (9) 74 (1995), 345–365. MR 97a:11098 527, 530, 538, 539, 540, 550 ´ , Bounds for traces in complete intersections and degrees in J. SABIA and P. SOLERNO the Nullstellensatz, Appl. Algebra Engrg. Comm. Comput. 6 (1995), 353–376. MR 96k:13017 524, 527, 558, 561, 562, 577, 581 ´ , Effective Lojasiewicz inequalities in semialgebraic geometry, Appl. P. SOLERNO Algebra Engrg. Comm. Comput. 2 (1991), 2–14. MR 94i:14059 522 M. SOMBRA, “Bounds for the Hilbert function of polynomial ideals and for the degrees in the Nullstellensatz” in Algorithms for Algebra (Eindhoven, Netherlands, 1996), J. Pure Appl. Algebra 117/118 (1997), 565–599. MR 98i:13032 525 , A sparse effective Nullstellensatz, Adv. in Appl. Math. 22 (1999), 271–295. MR 2000c:13041 523, 526

598

[54] [55]

[56]

[57] [58]


, Estimaciones para el teorema de ceros de Hilbert, Ph.D. dissertation, Univ. Buenos Aires, Buenos Aires, 1998. 539 C. SOULE´ , “Géometrie d’Arakelov et théorie des nombres transcendants” in Journées Arithmétiques (Luminy, 1989), Astérisque 198–200 Soc. Math. France, Montrouge, 1991, 355–371. MR 93c:14024 538, 539 B. STURMFELS, “Sparse elimination theory” in Computational Algebraic Geometry and Commutative Algebra (Cortona, Italy, 1991), Cambridge Univ. Press, Cambridge, 1993, 264–298. MR 94k:13035 526 , Gröbner Bases and Convex Polytopes, Univ. Lecture Ser. 8, Amer. Math. Soc., Providence, 1996. MR 97b:13034 540 B. TEISSIER, Résultats récents d’algèbre commutative effective, Astérisque 189–190 (1990), 107–131, Séminaire Bourbaki 1989/90, exp. no. 718. MR 92e:13015 524

Krick
Departamento de Matemática, Universidad de Buenos Aires, Ciudad Universitaria, 1428 Buenos Aires, Argentina; [email protected]

Pardo
Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, E-39071 Santander, España; [email protected]

Sombra
Departamento de Matemática, Universidad Nacional de La Plata, Calle 50 y 115, 1900 La Plata, Argentina, and School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, USA; [email protected], [email protected]


POSITIVITY IN EQUIVARIANT SCHUBERT CALCULUS WILLIAM GRAHAM

Abstract
We prove a positivity property for the cup product in the $T$-equivariant cohomology of the flag variety. This was conjectured by D. Peterson and has as a consequence a conjecture of S. Billey. The result for the flag variety follows from a more general result about algebraic varieties with an action of a solvable linear algebraic group such that the unipotent radical acts with finitely many orbits. The methods are those used by S. Kumar and M. Nori.

1. Introduction
Let $X = G/B$ be the flag variety of a complex semisimple group $G$ with $B \supset T$ a Borel subgroup and a maximal torus, respectively. The homology $H_*(X)$ has as a basis the fundamental classes $[X_w]$ of Schubert varieties $X_w \subset X$. If $\{x_w\} \subset H^*(X)$ is the corresponding dual basis for cohomology, then the cup product, expressed in this basis, has nonnegative coefficients:
\[ x_u x_v = \sum_w a_{uv}^w x_w, \tag{1.1} \]
where the $a_{uv}^w$ are nonnegative integers.

The $T$-equivariant cohomology and Chow groups of the flag variety have been described by [A], [KK], and [Br1]. One reason to study these groups is that they provide a way to compute the coefficients in the multiplication in ordinary cohomology. In addition, the equivariant groups are related to degeneracy loci in algebraic geometry (see [F2], [F3], [PR], [G]); these in turn are related to double Schubert polynomials (see [LS]), of interest in combinatorics.

Peterson [P] recently conjectured that the equivariant cohomology groups of the flag variety have a positivity property generalizing (1.1). The $T$-equivariant cohomology $H_T^*(X)$ is a free module over $H_T^*(\mathrm{pt})$ with a basis dual (in a suitable sense; see Section 2) to the equivariant fundamental classes $[X_w]_T$; again we call this basis $\{x_w\}$. Now $H_T^*(\mathrm{pt})$ is isomorphic to the polynomial ring $S(\hat T) = \mathbb{Z}[\lambda_1, \ldots, \lambda_n]$, where

DUKE MATHEMATICAL JOURNAL © 2001 Vol. 109, No. 3, Received 13 June 2000. Revision received 11 October 2000. 2000 Mathematics Subject Classification. Primary 14M, 22E. Author’s work partially supported by the National Science Foundation.


$\lambda_1, \ldots, \lambda_n$ is a basis for the free abelian group $\hat T$ of characters of $T$. Let $\alpha_1, \ldots, \alpha_n$ denote the simple roots in $\hat T$ (chosen so that the roots of $\mathfrak{b} = \operatorname{Lie} B$ are positive). In the equivariant setting, we can again expand the product $x_u x_v$ in the form (1.1), but now the $a_{uv}^w$ are in $H_T^*(\mathrm{pt})$; in other words, they are polynomials. Peterson’s conjecture is that when each $a_{uv}^w$ is written as a sum of monomials in the $\alpha_i$, the coefficients are all nonnegative. In this paper we prove the conjecture, not just for finite-dimensional flag varieties but in the general Kac-Moody setting. An immediate corollary is a conjecture of Billey [Bi].

The methods of this paper are those used by Kumar and Nori [KN]. In that paper, the authors prove the nonnegativity result (1.1) in ordinary cohomology for the flag variety of a Kac-Moody group. As they observe, the difficulty in proving this result is that in the Kac-Moody case, unlike the finite-dimensional case, the flag variety is not, in general, a homogeneous space. However, it is approximated by finite-dimensional varieties, each of which has an action of a unipotent group with finitely many orbits. The main result of [KN] is that, for such varieties, the cup product has nonnegative coefficients (with respect to a suitable basis); the result for the flag variety follows.

A similar problem arises in equivariant cohomology. The equivariant cohomology of $X$ is by definition the cohomology of a “mixed space” $X_T$, which, although infinite-dimensional, can be approximated by finite-dimensional varieties. As in the situation considered by Kumar and Nori, the space $X_T$ is not a homogeneous space. But unlike their situation, the finite-dimensional approximations to $X_T$ do not (as far as I know) have actions of unipotent groups with finitely many orbits, so we cannot apply their result. Instead, by adapting their proof to the equivariant setting and using a relation in equivariant cohomology (or Chow groups) observed by M. Brion, we are able to deduce an equivariant analogue of the main result of [KN]. The equivariant nonnegativity result for the flag variety follows immediately.

2. Preliminaries
We work with schemes over the ground field $\mathbb{C}$ and assume (to freely apply the results of [F, Chapter 19]) that all schemes considered admit closed embeddings into nonsingular schemes. For schemes with group actions, we assume that equivariant embeddings exist. We use equivariant cohomology and Borel-Moore homology with integer coefficients as our main tools; $H_* X$ denotes the Borel-Moore homology of $X$. As general references for functorial properties of Borel-Moore homology, see [F] and [FM]. For smooth varieties we could alternatively use equivariant Chow groups, but for nonsmooth varieties the Chow “cohomology” theory is not as well understood, and for this reason we use (equivariant) cohomology and Borel-Moore homology groups.

In this section we recall some basic facts about these groups (for more background, see [Br2] or [EG]). We also prove, for lack of a reference, equivariant versions of


several familiar nonequivariant results.

Let $X$ be a scheme with an action of a linear algebraic group $G$. Let $V$ be a representation of $G$, and let $U$ be an open subset of $V$ such that $G$ acts freely on $U$. View $G$ as acting on the right on $U$ and on the left on $X$; then $G$ acts on $U \times X$ by $g \cdot (u, x) = (ug^{-1}, gx)$.* Define $U \times^G X$ to be $(U \times X)/G$. The equivariant cohomology and Borel-Moore homology of $X$ are, by definition,
\[ H_G^i(X) = H^i(U \times^G X), \qquad H_j^G(X) = H_{j+2(\dim V - \dim G)}(U \times^G X), \]
provided the (complex) codimension of $V - U$ in $V$ is greater than $i/2$ (for the first equation) or $\dim X - j/2$ (for the second equation). These groups are independent of the choice of $V$ and $U$, provided the codimension condition is satisfied. For this reason we often denote $U \times^G X$ by $X_G$ (omitting $U$ from the notation). The quotient $U/G$ is a finite-dimensional approximation to the classifying space $BG$ introduced in Chow theory by B. Totaro [T]. We frequently write $BG$ when we mean such a finite-dimensional approximation. Note that $H_G^i(X)$ vanishes for negative $i$; $H_j^G(X)$ vanishes for $j > 2 \dim X$ but can be nonzero for negative $j$.

The equivariant cohomology of a point we denote by $H_G^*$. Both $H_G^*(X)$ and $H_*^G(X)$ are modules for $H_G^*$. $H_G^*(X)$ has a natural ring structure, and $H_*^G(X)$ is a module for this ring. Any $G$-stable closed subvariety $Y \subset X$ has a fundamental class $[Y]_G$ in $H_{2 \dim Y}^G(X)$. There is a natural map $\cap [X]_G : H_G^*(X) \to H_*^G(X)$; if $X$ is smooth, this is an isomorphism. In particular, we always identify $H_*^G(\mathrm{pt})$ with $H_G^*$.

Let $\pi^X : X \to \mathrm{pt}$ denote the projection. If $X$ is proper, this induces an $H_G^*$-linear map $\pi_*^X : H_*^G(X) \to H_*^G(\mathrm{pt}) \cong H_G^*$. In this case there is a pairing $(\ ,\ ) : H_G^*(X) \otimes H_*^G(X) \to H_G^*$ taking $x \otimes C$ to $\pi_*^X(x \cap C)$. We sometimes write this pairing as $\int_C x$, and, if $C = [Y]_G$, we abuse notation and write it as $\int_Y x$. The pairing has the property that, given $f : X_1 \to X_2$, we have
\[ (f^* x_2, C_1) = (x_2, f_* C_1). \tag{2.2} \]

(Proof: $(f^* x_2, C_1) = \pi_*^{X_1}(f^* x_2 \cap C_1) = \pi_*^{X_2} f_*(f^* x_2 \cap C_1) = \pi_*^{X_2}(x_2 \cap f_* C_1) = (x_2, f_* C_1)$.)

*Alternatively, we could let $G$ act on the left on $U$ and then take the diagonal action on $U \times X$.

The map $X \times^G U \to U/G$ is a fibration with fiber $X$, and pullback to a fiber yields a map $v_0 : H_G^*(X) \to H^*(X)$. There is also a Gysin morphism $H_*^G(X) \to H_*(X)$, which we again denote by $v_0$. Properties of Gysin morphisms (see [FM, Section 2.5]) imply that if $b \in H_G^*(X)$ and $a \in H_*^G(X)$, then $v_0(b \cap a) = v_0(b) \cap v_0(a)$.

A variety $X$ is said to be paved by affines if it can be written as a finite disjoint union $X = \coprod X_i^0$, where each $X_i^0$ is a locally closed subvariety isomorphic to affine space $\mathbb{A}^{d_i}$ for some $d_i$ such that, for some partial order on the indexing set,
\[ X_i \subseteq \bigcup_{j \le i} X_j^0. \]
Here $X_i$ is the closure of $X_i^0$. As is well known (see, e.g., [KN]), the Borel-Moore homology $H_*(X)$ is the free $\mathbb{Z}$-module generated by the fundamental classes $[X_i]$; the odd-dimensional Borel-Moore homology vanishes. Part (b) of the next proposition is from [A, Propositions 2.5.1 and 2.4.1].

PROPOSITION 2.1
Suppose the $G$-variety $X$ has a paving by $G$-invariant affines $X_i^0$. Then
(a) $H_*^G(X)$ is a free $H_G^*$-module with basis $\{[X_i]_G\}$.
(b) Suppose in addition that $X$ is complete and that $H_G^*$ is torsion-free and vanishes in odd degrees. Then there exist classes $x_i$ (of degree $\dim X_i$) in $H_G^*(X)$ which form a basis for $H_G^*(X)$ as $H_G^*$-module, such that the bases $\{[X_i]_G\}$ and $\{x_i\}$ are dual in the sense that $\int_{X_i} x_j = \delta_{ij}$.

Proof
(a) Let $X_k^0$ be open in $X$, and let $Y = X - X_k^0$; then there is a long exact sequence of $H_G^*$-modules
\[ \cdots \to H_{i+1}^G(X_k^0) \to H_i^G(Y) \to H_i^G(X) \to H_i^G(X_k^0) \to \cdots. \]
Since $X_k^0$ is isomorphic to affine space, $H_*^G(X_k^0)$ is a free $H_G^*$-module of rank 1, generated by $[X_k^0]_G$. Hence all the odd equivariant homology of $X_k^0$ vanishes; by induction the same holds for $Y$, and then by the long exact sequence it holds for $X$. Thus we have a short exact sequence of $H_G^*$-modules
\[ 0 \to H_*^G(Y) \to H_*^G(X) \to H_*^G(X_k^0) \to 0. \]
This is split by the $H_G^*$-linear map $H_*^G(X_k^0) \to H_*^G(X)$ taking $[X_k^0]_G$ to $[X_k]_G$. Induction implies (a).
(b) See [A, Propositions 2.5.1 and 2.4.1].

Remarks
(1) Although A. Arabia assumes that $G$ is connected, his proof is valid under the hypotheses stated. If $G$ is connected, then [Bo1, Sections 7 and 19] (cf. [A, p. 136]) shows that if $H^*(G)$ is torsion-free, then $H_G^*$ is also torsion-free and vanishes in odd degrees. In particular (as observed in [A]), this holds with coefficients in a field, and Propositions 2.1 and 2.2 are also valid with field coefficients. If $G$ is allowed to



be disconnected, with identity component $G_0$, then the finite covering $BG_0 \to BG$ induces an isomorphism $H_G^*(\mathrm{pt}; \mathbb{Q}) \simeq H_{G_0}^*(\mathrm{pt}; \mathbb{Q})^{G/G_0}$.
(2) As noted in [A], the conditions $\int_{X_i} x_j = \delta_{ij}$ imply that under the map $H_G^*(X) \to H^*(X)$ the images $x_i^0$ of $x_i$ form a basis of $H^*(X)$ dual to the basis $\{[X_i]\}$ of $H_*(X)$.

For a variety $X$ paved by $G$-invariant affines as above, we have the following description of the product on $H_G^*(X)$ in terms of the diagonal morphism. The nonequivariant version of this result was used by [KN]. The equivariant version was mentioned in [P] for the flag variety; the general proof is the same. Note that the diagonal morphism $\delta : X \to X \times X$ is $G$-equivariant ($G$ acting diagonally on $X \times X$).

PROPOSITION 2.2
Let $X$ be a $G$-variety with a paving by $G$-invariant affines $X_i^0$; assume that $H_G^*$ is torsion-free and vanishes in odd degrees. Let $X_i$ and $x_i$ be as in the previous proposition. We can write $\delta_*[X_k]_G = \sum_{i,j} a_{ij}^k [X_i \times X_j]_G$, where $a_{ij}^k \in H_G^*$. The product in $H_G^*(X)$ is given by
\[ x_i x_j = \sum_k a_{ij}^k x_k. \]

Proof
We can write $\delta_*[X_k]_G$ in the form claimed because the classes $[X_i \times X_j]_G$ form a basis for $H_*^G(X \times X)$ as $H_G^*$-module. Let $q_i : X \times X \to X$ denote the $i$th projection. As in the nonequivariant case, the product on $H_G^*(X)$ is given by $c_1 \cdot c_2 = \delta^*(q_1^* c_1 \cdot q_2^* c_2)$ for $c_1, c_2 \in H_G^*(X)$. (This can be seen by considering the composition
\[ X_G \xrightarrow{\delta_G} (X \times X)_G \cong X_G \times_{BG} X_G \overset{i}{\hookrightarrow} X_G \times X_G \]
and noting that the product on $H^*(X_G)$ is given by $\zeta_1 \cdot \zeta_2 = (i \circ \delta_G)^*(\mathrm{pr}_1^* \zeta_1 \cdot \mathrm{pr}_2^* \zeta_2)$, where $\mathrm{pr}_i : X_G \times X_G \to X_G$ is the projection and $\zeta_i \in H^*(X_G)$. Choosing $\zeta_i$ to represent $c_i \in H_G^*(X)$, the assertion follows easily.)

The preceding proposition shows that if $X$ is paved by invariant affines, then $H_G^*(X)$ and $H_*^G(X)$ are free $H_G^*$-modules with a perfect pairing
\[ (\ ,\ ) : H_G^*(X) \otimes_{H_G^*} H_*^G(X) \to H_G^*. \]


Using this, we can identify $H_G^*(X) = \operatorname{Hom}_{H_G^*}(H_*^G(X), H_G^*)$. Therefore, to show that $x_i x_j = \sum_k a_{ij}^k x_k$, it is enough to show that for all $\nu \in H_*^G(X)$ we have
\[ (x_i x_j, \nu) = \Big( \sum_k a_{ij}^k x_k, \nu \Big) = \sum_k a_{ij}^k (x_k, \nu). \]
In fact, it is enough to check this when $\nu$ is one of the basis elements $[X_k]_G$; that is, it is enough to show $(x_i x_j, [X_k]_G) = a_{ij}^k$. Now
\[ (x_i x_j, [X_k]_G) = (\delta^*(q_1^* x_i \cdot q_2^* x_j), [X_k]_G) = (q_1^* x_i \cdot q_2^* x_j, \delta_*[X_k]_G) = \sum_{m,n} a_{mn}^k (q_1^* x_i \cdot q_2^* x_j, [X_m \times X_n]_G). \]

By definition of the pairing, $(q_1^* x_i \cdot q_2^* x_j, [X_m \times X_n]_G) = \pi_*^{X \times X}(q_1^* x_i \cdot q_2^* x_j \cap [X_m \times X_n]_G)$. This is computed using the fibrations $X_G \to BG$ and $(X \times X)_G = X_G \times_{BG} X_G \xrightarrow{\pi_G} BG$. By the next lemma, the result is equal to $\pi_*^X(x_i \cap [X_m]_G) \cdot \pi_*^X(x_j \cap [X_n]_G)$, which is 1 if $i = m$ and $j = n$, and zero otherwise. We conclude that $(x_i x_j, [X_k]_G) = a_{ij}^k$, as desired.

LEMMA 2.3
Let $\rho_i : X_i \to Y$ ($i = 1, 2$) be fibrations with $\rho_i$ proper and with $\pi : X_1 \times_Y X_2 \to Y$, $q_i : X_1 \times_Y X_2 \to X_i$ the projections. Let $Z_i \subset X_i$ be closed subvarieties such that $\rho_i|_{Z_i} : Z_i \to Y$ are fibrations, and let $\alpha_i \in H^*(X_i)$. Assume that $Y$ is smooth, and identify $H_*(Y)$ with $H^*(Y)$. Then
\[ \pi_*(q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2]) = \rho_{1*}(\alpha_1 \cap [Z_1]) \cdot \rho_{2*}(\alpha_2 \cap [Z_2]), \]
where on the right-hand side the product is taken in $H^*(Y)$.

Proof
We have a Cartesian diagram
\[ \begin{array}{ccc} X_1 \times_Y X_2 & \xrightarrow{\ \Delta\ } & X_1 \times X_2 \\ \downarrow \pi & & \downarrow \Pi \\ Y & \xrightarrow{\ \delta\ } & Y \times Y \end{array} \]



Because $Y$ is smooth, $\delta$ (and hence $\Delta$) are regular embeddings, so there are Gysin maps $\delta^*$ and $\Delta^*$ on homology [F, Example 19.2.1]. These satisfy the relation $\pi_* \Delta^* = \delta^* \Pi_*$ [FM, p. 26].

Claim: In $H_*(X_1 \times_Y X_2)$,
\[ q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2] = \Delta^*((\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2])). \]
To prove this, first note that (with $\mathrm{pr}_i : X_1 \times X_2 \to X_i$ denoting the projection) $q_1^* \alpha_1 \cdot q_2^* \alpha_2 = \Delta^*(\mathrm{pr}_1^* \alpha_1 \cdot \mathrm{pr}_2^* \alpha_2) = \Delta^*(\alpha_1 \times \alpha_2)$ (cf. [M, p. 351]). Next, $[Z_1 \times_Y Z_2] = \Delta^*[Z_1 \times Z_2]$ since $Z_1 \times Z_2$ and $\Delta(X_1 \times_Y X_2)$ are subvarieties of $X_1 \times X_2$ whose intersection at smooth points is transverse. Hence (noting that $[Z_1 \times Z_2] = [Z_1] \times [Z_2]$ by [F, p. 377]),
\[ q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2] = \Delta^*(\alpha_1 \times \alpha_2) \cap \Delta^*[Z_1 \times Z_2] = \Delta^*((\alpha_1 \times \alpha_2) \cap ([Z_1] \times [Z_2])) = \Delta^*((\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2])), \]
proving the claim. To complete the proof of the lemma, we compute
\[ \begin{aligned} \pi_*(q_1^* \alpha_1 \cdot q_2^* \alpha_2 \cap [Z_1 \times_Y Z_2]) &= \pi_* \Delta^*((\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2])) \\ &= \delta^* \Pi_*((\alpha_1 \cap [Z_1]) \times (\alpha_2 \cap [Z_2])) \\ &= \delta^*(\rho_{1*}(\alpha_1 \cap [Z_1]) \times \rho_{2*}(\alpha_2 \cap [Z_2])) \\ &= \rho_{1*}(\alpha_1 \cap [Z_1]) \cdot \rho_{2*}(\alpha_2 \cap [Z_2]). \end{aligned} \]
This proves the lemma.

3. The positivity theorem
In this section we prove the positivity result about multiplication in equivariant cohomology (Theorem 3.1). As in the nonequivariant case considered by Kumar and Nori, it is deduced from a result about invariant cycles (Theorem 3.2). In the nonequivariant setting, A. Hirschowitz [H] proved that for a projective scheme with an action of a connected solvable group $B$, any effective cycle is rationally equivalent to a $B$-invariant effective cycle. Kumar and Nori gave a different proof of this result (without assuming projectivity) in the special case of unipotent groups, and the proof of Theorem 3.2 is adapted from their proof.

In this section, $T$ denotes an algebraic torus (i.e., a product of multiplicative groups $\mathbb{G}_m$) with Lie algebra $\mathfrak{t} = \operatorname{Lie} T$ and with $\hat T \subset \mathfrak{t}^*$ the group of characters of $T$. The equivariant cohomology group $H_T^*$ can be identified with the polynomial ring $S(\hat T)$, the symmetric algebra on the free abelian group $\hat T$.
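As a small illustration of the identification $H_T^* = S(\hat T)$, the following sketch works out the rank-one case using the finite-dimensional approximations of Section 2. The choice $V = \mathbb{C}^N$ with the scalar action, and the symbols $N$ and $\lambda$, are assumptions made for this example only; the sign of $\lambda$ (hyperplane class or its negative) depends on a convention that the text does not fix.

```latex
% Rank-one torus T = G_m acting on V = C^N by scalar multiplication (illustrative choice).
% T acts freely on U = V \ {0}, and V - U = {0} has codimension N in V.
\[
  U/T \;=\; (\mathbb{C}^N \setminus \{0\})/\mathbb{G}_m \;=\; \mathbb{P}^{N-1},
  \qquad
  H^*(\mathbb{P}^{N-1}) \;=\; \mathbb{Z}[\lambda]/(\lambda^N), \quad \deg \lambda = 2 .
\]
% The codimension condition N > i/2 gives, for every i < 2N,
\[
  H_T^i(\mathrm{pt}) \;=\; H^i(\mathbb{P}^{N-1}) .
\]
% Letting N grow recovers the polynomial ring on one generator:
\[
  H_T^* \;=\; \mathbb{Z}[\lambda] \;=\; S(\hat T), \qquad \hat T \cong \mathbb{Z} .
\]
```

For a torus of rank $n$ one can, under the same assumptions, take a product of $n$ such approximations, which gives $\mathbb{Z}[\lambda_1, \ldots, \lambda_n]$ in the corresponding range of degrees.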


THEOREM 3.1
Let $B$ be a connected solvable group with unipotent radical $N$ and Levi decomposition $B = TN$. Let $\alpha_1, \ldots, \alpha_d \in \hat T$ denote the weights of $T$ on $\mathfrak{n} = \operatorname{Lie} N$. Let $X$ be a complete $B$-variety on which $N$ acts with finitely many orbits $X_1^0, \ldots, X_n^0$. These are a paving of $X$ by $B$-stable affines; let $X_1, \ldots, X_n$ denote the closures, so that $\{[X_1]_T, \ldots, [X_n]_T\}$ are a basis for $H_*^T(X)$. Let $\{x_1, \ldots, x_n\}$ denote the dual basis of $H_T^*(X)$. Write
\[ x_i x_j = \sum_k a_{ij}^k x_k \]
with $a_{ij}^k \in H_T^* = S(\hat T)$. Then each $a_{ij}^k$ can be written as a sum of monomials $\alpha_1^{i_1} \cdots \alpha_d^{i_d}$, with nonnegative integer coefficients.

Note that the constant term in each $a_{ij}^k$ (i.e., the coefficient of $\alpha_1^0 \cdots \alpha_d^0$) is nonnegative by the above theorem. This is the coefficient that occurs in the multiplication in the ordinary cohomology $H^*(X)$. The reason is that our hypotheses imply $H^*(X) = H_T^*(X)/(H_T^{>0} \cdot H_T^*(X))$ (see [GKM]).

The next result is the key ingredient in the proof of Theorem 3.1. In this theorem, $N$ is not assumed to act with finitely many orbits. The result also holds with equivariant Chow groups in place of equivariant Borel-Moore homology.

THEOREM 3.2
Let $B$ be a connected solvable group with unipotent radical $N$, and let $T \subset B$ be a maximal torus, so that $B = TN$. Let $\alpha_1, \ldots, \alpha_d \in \hat T$ denote the weights of $T$ acting on $\mathfrak{n} = \operatorname{Lie} N$. Let $X$ be a scheme with a $B$-action, and let $Y$ be a $T$-stable subvariety of $X$. Then there exist $B$-stable subvarieties $Y_1, \ldots, Y_r$ of $X$ such that in $H_*^T(X)$,
\[ [Y]_T = \sum_i f_i [Y_i]_T, \]
where each $f_i \in H_T^*$ can be written as a linear combination of monomials in $\alpha_1, \ldots, \alpha_d$ with nonnegative integer coefficients.

The following lemma was pointed out to me by Michel Brion.

LEMMA 3.3
Suppose that the connected solvable group $B = TN$ acts on $X$ and that $N$ has finitely many orbits on $X$. Then each $N$-orbit is $B$-stable (in fact, the $B$-orbit of a $T$-fixed point).

Proof
$B$ has finitely many orbits on $X$ (as the subgroup $N$ does); as each $B$-orbit is $N$-stable, it is a finite union of $N$-orbits. Let $B \cdot x' \simeq B/B'$ be an orbit, where $B'$ is the stabilizer of $x'$. As each $N$-orbit is isomorphic to affine space (see, e.g., [KN]), the odd cohomology of $B \cdot x'$ vanishes, so $B'$ must contain a maximal torus of $B$. As all maximal tori of $B$ are conjugate [Bo2, Corollary 11.3], there is some $b \in B$ such that $B' = b B_1 b^{-1}$, where $B_1 \supset T$. Then $B \cdot x' = B \cdot x$, where $x = b^{-1} x'$; moreover, $B_1$ is the stabilizer of $x$. Hence $B \cdot x$ is the $N$-orbit of the $T$-fixed point $x$.

Proof of Theorem 3.1
The group $\tilde B = T \cdot (N \times N)$ (semidirect product) acts on $X \times X$ by $t \cdot (n_1, n_2)(p_1, p_2) = (t n_1 p_1, t n_2 p_2)$. The unipotent radical $N \times N$ has finitely many orbits $X_i^0 \times X_j^0$ on $X \times X$ with closures $X_i \times X_j$, so $H_*^T(X \times X)$ is a free $H_T^*$-module with basis $[X_i \times X_j]_T$. By Proposition 2.2, if $x_i x_j = \sum_k a_{ij}^k x_k$, then $\delta_*[X_k]_T = [\delta(X_k)]_T = \sum_{i,j} a_{ij}^k [X_i \times X_j]_T$. The coefficients $a_{ij}^k$ are uniquely determined by the expansion of $\delta_*[X_k]_T$ because the classes $[X_i \times X_j]_T$ are linearly independent over $H_T^*$. By Theorem 3.2, these coefficients can be written as sums of monomials in $\alpha_1, \ldots, \alpha_d$ with nonnegative integer coefficients, where $\alpha_1, \ldots, \alpha_d$ are the weights of $T$ on $\operatorname{Lie}(N \times N)$ (which are the same as the weights of $T$ on $\mathfrak{n}$).

Proof of Theorem 3.2
First, consider the case where $\dim N = 1$; then $B/T \xrightarrow{\sim} N \xrightarrow[\sim]{\varphi} \mathbb{G}_a$, where $\mathbb{G}_a \cong \mathbb{A}^1$ is the additive group. Write $\alpha = \alpha_1$. We have $B = NT$, and the map $B/T \xrightarrow{\sim} N$ sends $nT \mapsto n$. Now, $B$ acts on $B/T$ by left multiplication. Via the isomorphism of $B/T$ with $N$, we obtain an action of $B$ on $N$; the subgroup $T \subset B$ acts on $N$ by conjugation, and the subgroup $N$ acts by left multiplication. The action of $T$ by conjugation on $N$ corresponds under $\varphi$ to an action of $T$ on $\mathbb{A}^1$ with weight $\alpha$. Embed $B/T \hookrightarrow \mathbb{P}^1$ by $nT \mapsto [\varphi(n) : 1]$. The action of $B$ on $B/T$ extends to an action on $\mathbb{P}^1$; the element $tn \in B$ acts by the matrix
\[ \begin{pmatrix} \alpha(t) & \varphi(n) \\ 0 & 1 \end{pmatrix}. \]
The point $\infty = [1 : 0]$ is fixed by $B$, while the point $0 = [0 : 1]$ is fixed by $T$.

Now, $B$ acts on $B \times^T X$ by left multiplication: $b \cdot (b', x) = (bb', x)$. Under the isomorphism $\theta : B \times^T X \to B/T \times X$, taking $(b, x)$ to $(bT, bx)$, the $B$-action corresponds to the product action on $B/T \times X$. This extends to a $B$-action on $\mathbb{P}^1 \times X$. The projections $\pi : \mathbb{P}^1 \times X \to \mathbb{P}^1$ and $\rho : \mathbb{P}^1 \times X \to X$ are $B$-equivariant.

If $Y \subset X$ is a $T$-invariant subvariety, then $B \times^T Y$ is a $B$-invariant subvariety of $B \times^T X$. Let $Z$ be the Zariski closure of $\theta(B \times^T Y)$ in $\mathbb{P}^1 \times X$; $\theta(B \times^T Y)$ and $Z$ are $B$-invariant subvarieties of $\mathbb{P}^1 \times X$. Let $\pi_Z$ denote the restriction of $\pi$ to $Z$. Let $[w_0 : w_1]$ be projective coordinates on $\mathbb{P}^1$, and let $w$ be the rational function $w_0/w_1$. Let $g = \pi_Z^* w$; then $w$ (and hence $g$) are rational functions that are


$T$-eigenvectors of weight $-\alpha$. By [Br1, Theorem 2.1]* we have in $H_*^T(\mathbb{P}^1 \times X)$ the relation $[\operatorname{div}_Z g]_T = \alpha [Z]_T$. Therefore in $H_*^T X$ we have the relation
\[ \rho_*[\operatorname{div}_Z g]_T = \alpha \rho_*[Z]_T. \tag{3.3} \]
Now, $\pi_Z^{-1}(0) = \{0\} \times Y$ (cf. [KN]). Also, $\pi_Z^{-1}(\infty) = \{\infty\} \times D$, where $D$ is a subscheme of $X$. Therefore (3.3) yields $[Y]_T = [D]_T + \alpha \rho_*[Z]_T$. As $\pi_Z$ is $B$-equivariant and $\infty \in \mathbb{P}^1$ is $B$-fixed, it follows that $\{\infty\} \times D$, and hence $D$, are $B$-invariant. Each irreducible component $Y_i$ ($i = 1, \ldots, r$) of $D$ is therefore $B$-invariant (as $B$ is connected), and if $m_i$ is the multiplicity of $Y_i$ in $D$, then $[D]_T = \sum_{i=1}^r m_i [Y_i]_T$. Likewise, $\rho$ is $B$-equivariant, and $Z$ is $B$-invariant. If $Z_i$ is a component of $Z$, then the map $\rho|_{Z_i}$ of $Z_i$ onto its image in $X$ is finite if and only if the map $\rho_T|_{Z_{iT}}$ of $Z_{iT}$ onto its image in $X_T$ is finite; in that case the degrees of the maps are the same. If we list the components of $\rho(Z)$ which are finite images of components of $Z$ as $Y_{r+1}, \ldots, Y_s$, it follows that each of these components is $B$-invariant and that $\rho_*[Z]_T = \sum_{i=r+1}^s m_i [Y_i]_T$, where $m_i$ are positive integers. We conclude that
\[ [Y]_T = \sum_{i=1}^r m_i [Y_i]_T + \sum_{i=r+1}^s m_i \alpha [Y_i]_T, \tag{3.4} \]
where the $Y_i$ are $B$-invariant. This proves the result if $\dim N = 1$.

To prove the result in general, we can find a subgroup $N' \subset N$ such that $N'$ is normal in $B$ and $\dim N/N' = 1$. Let $\alpha$ be the weight of $T$ on $\operatorname{Lie}(N/N')$. Define $B' = N'T \subset B = NT$. By induction we may assume the result is true for $B'$. It is enough to show that, given a $B'$-invariant subvariety $Y \subset X$, we can write $[Y]_T$ as in (3.4), with $B$-invariant $Y_i$. For this we modify the above proof as follows. Replace $B/T$, $B \times^T X$, and $B \times^T Y$ by $B/B'$, $B \times^{B'} X$, and $B \times^{B'} Y$; the map $\theta$ now takes $B \times^{B'} X$ to $B/B' \times X$. Again $\varphi : B/B' \xrightarrow{\cong} \mathbb{G}_a = \mathbb{A}^1$, and $T$ acts by weight $\alpha$ on $\mathbb{A}^1$. We can embed $B/B' \hookrightarrow \mathbb{P}^1$ as before; the point $\infty = [1 : 0]$ is fixed by $B$, and $[0 : 1]$ is fixed by $B'$. With these modifications, (3.4) is proved as above. This proves the theorem.

*M. Brion is using the convention that if $X$ is a $T$-space, then $T$ acts on functions on $X$ by $(t \cdot f)(x) = f(tx)$, while we are using the convention that $T$ acts on functions by $(t \cdot f)(x) = f(t^{-1}x)$. Under Brion’s convention, our function $g$ would be an eigenvector of weight $\alpha$.
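The rank-one step of the proof of Theorem 3.2 can be traced through in the simplest possible case. The sketch below takes $X = \mathbb{P}^1$ with the $B$-action constructed in the proof and $Y = \{0\}$; this choice of $X$ and $Y$ is an illustrative assumption and is not an example taken from the text.

```latex
% Setup (illustrative): dim N = 1, X = P^1 with the B-action from the proof,
% Y = {0} = {[0:1]}, which is T-fixed but not B-fixed.
% theta(B x^T Y) = {(nT, n.0)} is the graph of the embedding B/T -> P^1,
% so its Zariski closure is the diagonal:
\[
  Z \;=\; \{(p,p) : p \in \mathbb{P}^1\} \;\subset\; \mathbb{P}^1 \times X .
\]
% Fibers of pi_Z over 0 and over infinity:
\[
  \pi_Z^{-1}(0) = \{0\} \times \{0\}, \qquad
  \pi_Z^{-1}(\infty) = \{\infty\} \times D, \quad D = \{\infty\} .
\]
% rho restricted to Z is an isomorphism onto X, so rho_*[Z]_T = [P^1]_T,
% and relation (3.4) becomes
\[
  [0]_T \;=\; [\infty]_T \;+\; \alpha\,[\mathbb{P}^1]_T
  \qquad \text{in } H_*^T(\mathbb{P}^1) .
\]
% Both {infinity} and P^1 are B-stable, and the coefficients are 1 and alpha.
```

As a cross-check, and assuming the usual localization to the two fixed points with tangent weights $\alpha$ at $0$ and $-\alpha$ at $\infty$, the three classes restrict to $(\alpha, 0)$, $(0, -\alpha)$, and $(1, 1)$, which is consistent with the displayed relation.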



4. Schubert varieties

4.1. Peterson’s conjecture
Let $G$ be a complex semisimple group, and let $B \supset T$ be a Borel subgroup and a maximal torus, respectively. Let $N$ be the unipotent radical of $B$; let $B^- = TN^-$ be the opposite Borel. Choose a system of positive roots so that the roots in $\mathfrak{n}$ are positive. Let $W = N(T)/T$ denote the Weyl group; we abuse notation and write $w$ for an element of $W$ and also for a representative in $N(T)$. Let $X = G/B$ denote the flag variety. The $T$-fixed points are $\{wB\}_{w \in W}$. Let $X_w^0 = N \cdot wB \subset X$, and let $Y_w^0 = N^- \cdot wB$. Then $X = \coprod_w X_w^0$ (resp., $X = \coprod_w Y_w^0$) is a decomposition of $X$ as a disjoint union of finitely many $N$ (resp., $N^-$)-orbits. Let $X_w$ and $Y_w$ denote the closures of $X_w^0$ and $Y_w^0$, and let $\{x_w\}$ and $\{y_w\}$ be the bases of $H_T^* X$ dual (in the sense of Proposition 2.1) to $\{[X_w]_T\}$ and $\{[Y_w]_T\}$. Let $\alpha_1, \ldots, \alpha_\ell$ denote the simple roots. Any weight of $T$ on $\mathfrak{n}$ (resp., $\mathfrak{n}^-$) is a nonnegative (resp., nonpositive) linear combination of the simple roots. Therefore the next corollary is an immediate consequence of Theorem 3.1.

COROLLARY 4.1
With notation as above, write $x_u x_v = \sum_w a_{uv}^w x_w$ and $y_u y_v = \sum_w b_{uv}^w y_w$, with $a_{uv}^w$ and $b_{uv}^w$ in $H_T^*$. Then $a_{uv}^w$ (resp., $b_{uv}^w$) is a linear combination of monomials in the $\alpha_i$ (resp., $-\alpha_i$), with nonnegative coefficients.
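For orientation, the smallest case $G = SL_2$, $X = \mathbb{P}^1$ can be worked out by hand. The sketch below is a sanity check, not part of the argument: it assumes that restriction to the two $T$-fixed points is injective on $H_T^*(X)$ (a standard fact for flag varieties that is not proved in the text) and that integration is computed by the usual fixed-point formula. The labels $e, s$ for the Weyl group elements and the notation of a class as a pair of restrictions are introduced here for illustration.

```latex
% G = SL_2, W = {e, s}, one simple root alpha, X = G/B = P^1.
% Orbit closures: X_e = {eB}, X_s = P^1;  Y_s = {sB}, Y_e = P^1.
% T acts on the tangent line at sB with weight alpha and at eB with weight -alpha.
% Write a class in H_T^*(X) as its pair of restrictions (at eB, at sB).
\[
  x_e = (1,\,1), \quad x_s = (0,\,\alpha), \qquad
  y_s = (1,\,1), \quad y_e = (-\alpha,\,0) .
\]
% Duality check by the fixed-point formula, e.g. for x_s:
\[
  \int_{X_e} x_s = x_s|_{eB} = 0, \qquad
  \int_{X_s} x_s = \frac{0}{-\alpha} + \frac{\alpha}{\alpha} = 1 .
\]
% Products are computed componentwise:
\[
  x_s^2 = (0,\,\alpha^2) = \alpha\, x_s, \qquad
  y_e^2 = (\alpha^2,\,0) = (-\alpha)\, y_e .
\]
% Hence a^s_{ss} = alpha and b^e_{ee} = -alpha, in agreement with Corollary 4.1.
```

Setting $\alpha = 0$ recovers the ordinary cohomology of $\mathbb{P}^1$, where the square of the point class vanishes.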

Remark
Theorem 3.1 can be applied to the varieties $X_w$ and $Y_w$, which are in general singular, to yield an analogue of Corollary 4.1 for $H_T^*(X_w)$ and $H_T^*(Y_w)$. The analogous result also holds for partial flag varieties.

Because $X$ is smooth, the map $H_T^*(X) \xrightarrow{\cap [X]_T} H_*^T(X)$ is an isomorphism. The next lemma is known (cf. [P]), but for lack of reference we give a proof.

LEMMA 4.2
The map $H_T^*(X) \xrightarrow{\cap [X]_T} H_*^T(X)$ takes $y_w$ to $[X_w]_T$.

Proof
We can identify $H_T^*(X)$ with $\operatorname{Hom}_{H_T^*}(H_*^T(X), H_T^*)$ (see the proof of Proposition 2.2). Hence any $\gamma \in H_T^*(X)$ is uniquely determined by the values $\pi_*^T(\gamma \cap h')$ as $h'$ ranges over the basis $\{[Y_{w'}]_T\}$ of $H_*^T(X)$. Now, if $\gamma \in H_T^*(X)$ satisfies $\gamma \cap [X]_T = h$, then $\gamma \cap h' = h \cdot h'$. Indeed, the intersection product on $H_*^T(X)$ satisfies the following: if $\gamma' \cap [X]_T = h'$, then $\gamma \cdot \gamma' \cap [X]_T = h \cdot h'$; but $\gamma \cdot \gamma' \cap [X]_T = \gamma \cap (\gamma' \cap [X]_T) = \gamma \cap h'$. Combining these facts, we see that to show $y_w \cap [X]_T = [X_w]_T$, it suffices to show
$$\pi_*^X([X_w]_T [Y_{w'}]_T) = \pi_*^X(y_w \cap [Y_{w'}]_T) = \delta_{ww'}.$$
Now, for any $w, w'$, the intersection $X_w \cap Y_{w'}$ is $T$-invariant and is known to satisfy $\operatorname{codim} X_w \cap Y_{w'} = \operatorname{codim} X_w + \operatorname{codim} Y_{w'}$. (Indeed, by [KL], $X_w^0 \cap Y_{w'}^0$ is irreducible and of dimension $\dim X - \operatorname{codim} X_w - \operatorname{codim} Y_{w'}$, but by [F, p. 137] each component of $X_w \cap Y_{w'}$ has at least that dimension. It follows that $X_w^0 \cap Y_{w'}^0$ is dense in $X_w \cap Y_{w'}$.) Hence $[X_w]_T [Y_{w'}]_T$ is a multiple of $[X_w \cap Y_{w'}]_T$. If $\dim X_w \cap Y_{w'} > 0$, then $\dim (X_w \cap Y_{w'})_T > \dim BT$, so $\pi_*^X([X_w \cap Y_{w'}]_T) = 0$. If $\dim X_w \cap Y_{w'} = 0$, then $w = w'$ and $X_w$ and $Y_w$ intersect with multiplicity 1 at the point $wB$ [C, Prop. 2]. Hence $\pi_T^X : X_T \to BT$ maps $(X_w \cap Y_w)_T$ isomorphically onto $BT$, and therefore $\pi_*^X([X_w]_T [Y_w]_T) = \pi_*^X([X_w \cap Y_w]_T) = 1$. This proves the lemma.

The intersection product on $H_*^T(X)$ is induced by the product on $H_T^*(X)$, via the isomorphism $\cap [X]_T$. Lemma 4.2 and Corollary 4.1 therefore imply the following corollary.

COROLLARY 4.3
The intersection product on $H_*^T(X)$ is given by $[Y_u]_T [Y_v]_T = \sum_w a^w_{uv} [Y_w]_T$ (resp., $[X_u]_T [X_v]_T = \sum_w b^w_{uv} [X_w]_T$), where each $a^w_{uv}$ (resp., $b^w_{uv}$) in $H_T^*$ is a sum of monomials in $\alpha_1, \ldots, \alpha_\ell$ (resp., $-\alpha_1, \ldots, -\alpha_\ell$), with nonnegative coefficients.

Corollaries 4.1 and 4.3 were conjectured by Dale Peterson.

4.2. Billey's conjecture

B. Kostant and S. Kumar [KK] defined functions (for each $w \in W$) $\xi^w : W \to S(\hat{T}) \subset S(\mathfrak{t}^*)$, and showed that, for any $u, v \in W$, one can write
$$\xi^u \xi^v = \sum_w p^{uv}_w\, \xi^w$$
for unique $p^{uv}_w \in S(\mathfrak{t}^*)$. Billey [Bi] observed in examples that if $\nu \in \mathfrak{t}$ satisfies $\alpha(\nu) > 0$ for all positive roots $\alpha$, then $p^{uv}_w(\nu) \ge 0$, and asked if a geometric proof was possible. Arabia [A] proved the following relation of the functions $\xi^w$ to the $T$-equivariant cohomology of the flag variety. We use the notation of the preceding subsection; thus $i_w : wB \to G/B = X$ denotes the inclusion, and $i_w^* : H_T^*(X) \to H_T^*(wB) = H_T^*$ denotes the pullback. As usual, we identify $H_T^*(X)$ with $H_*^T(X)$.



THEOREM 4.4
(1) The functions $\xi^w$ are related to pullbacks of cohomology classes by $i_u^* x_w = \xi^{w^{-1}}(u^{-1})$.
(2) The polynomials $p^{uv}_w$ are related to the multiplication in $H_T^*(X)$ by $p^{u^{-1},v^{-1}}_{w^{-1}} = a^w_{uv}$.

This is proved (in the general Kac-Moody case) in [A, Theorem 4.2.1]. We have stated this theorem using the conventions of [KK] for the functions $\xi^w$; below we explain the relationship between the conventions of [A] and [KK]. Note that (2) follows immediately from (1) since (as noted by Arabia) the pullback $\oplus\, i_w^* : H_T^*(X) \to \oplus_{w \in W} H_T^*$ is injective. As a consequence, we obtain Billey's conjecture.

COROLLARY 4.5
If $\nu \in \mathfrak{t}$ satisfies $\alpha(\nu) > 0$ for all positive roots $\alpha$, then $p^{uv}_w(\nu) \ge 0$.
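To make the statement concrete, here is a hedged sanity check for $G = SL_2$, where $W = \{e, s\}$ and every element is its own inverse, so that Theorem 4.4(1) reads $\xi^w(u) = i_u^* x_w$. The values of $\xi^e$ and $\xi^s$ used below are assumptions taken from the standard localization picture on $\mathbb{P}^1$ (with $x_e = 1$, and $x_s$ restricting to $\alpha$ at $sB$ and to $0$ at $eB$); they are not derived in the text.

```latex
% SL_2: functions on W = {e, s} written as pairs (value at e, value at s).
% Assumed values: \xi^e = (1, 1), \xi^s = (0, \alpha).
\begin{align*}
  \xi^s \xi^s &= (0,\; \alpha^2) = p^{ss}_e\, \xi^e + p^{ss}_s\, \xi^s
              = \bigl(p^{ss}_e,\; p^{ss}_e + p^{ss}_s\,\alpha\bigr), \\
  \text{hence}\quad p^{ss}_e &= 0, \qquad p^{ss}_s = \alpha . \\
\intertext{For $\nu \in \mathfrak{t}$ with $\alpha(\nu) > 0$:}
  p^{ss}_e(\nu) &= 0 \ge 0, \qquad p^{ss}_s(\nu) = \alpha(\nu) > 0 .
\end{align*}
```

The remaining products involve $\xi^e = 1$ and produce only the constants 0 and 1, so in this smallest case every $p^{uv}_w(\nu)$ is nonnegative, consistent with Corollary 4.5.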

Proof
This follows immediately from Theorem 4.4 and Corollary 4.1.

We now discuss the conventions of [A] and [KK]. Let $\mathbb{C}[W]$ denote the group algebra over $\mathbb{C}$ of $W$; let $Q$ be the quotient field of $S(\mathfrak{t}^*)$. Kostant and Kumar set $Q_W = \mathbb{C}[W] \otimes Q$; Arabia defines $Q$ and $Q_W$ with rational rather than complex coefficients, but we ignore this difference. Both [KK] and [A] define elements $\xi^w \in \operatorname{Hom}_Q(Q_W, Q)$, but with different conventions; if we use $\xi^w$ for the elements defined in [KK] and $\xi_A^w$ for the elements defined in [A], then $\xi^w = \xi_A^{w^{-1}}$. Let $F(W, Q)$ denote the set of functions from $W$ to $Q$. Both [KK] and [A] use identifications $F(W, Q) \xrightarrow{\simeq} \operatorname{Hom}_Q(Q_W, Q)$; we denote their respective identifications by

$f \mapsto f_K$, where $f_K(\delta_u \otimes 1) = f(u)$ (see [KK, (4.17)]),

$f \mapsto f_A$, where $f_A(\delta_u \otimes 1) = f(u^{-1})$ (see [A, Section 4.1]).

If we define $f^w$ and $g^w$ in $F(W, Q)$ by $f^w_K = \xi^w$, $g^w_A = \xi_A^w$, then $f^w(u) = g^{w^{-1}}(u^{-1})$. Indeed, $f^w(u) = f^w_K(\delta_u \otimes 1) = \xi^w(\delta_u \otimes 1) = \xi_A^{w^{-1}}(\delta_u \otimes 1) = g^{w^{-1}}_A(\delta_u \otimes 1) = g^{w^{-1}}(u^{-1})$. Arabia uses the injection
$$\oplus\, i_u^* : H_T^*(X) \hookrightarrow \oplus H_T^* \simeq F(W, S(\mathfrak{t}^*)) \subset F(W, Q)$$
to identify $H_T^*(X)$ with a subset of $F(W, Q)$. In his paper he proves that, under this identification, $g^w$ corresponds to what we have denoted by $x_w \in H_T^*(X)$. In [KK] there is no separate notation introduced for the $f^w$, but rather they are identified with $\xi^w$; that is, the $\xi^w$ are viewed as elements of $F(W, Q)$. If we return to their notation, we see $\xi^{w^{-1}}(u^{-1}) = i_u^* x_w$, as stated in Theorem 4.4.

Note that if we let $\xi_B^w$ denote the functions used by Billey, then $\xi_B^w(u) = \xi^{w^{-1}}(u^{-1})$. We also remark that the notation $x_w$ in this paper does not have the same meaning as it does in [A] and [KK].

4.3. The Kac-Moody case

The analogues of Corollaries 4.1 and 4.5 are also valid for flag varieties (complete or partial) of Kac-Moody groups. The key point is that such a flag variety, although in general infinite-dimensional, can be approximated by finite-dimensional varieties for which the hypotheses of Theorem 3.1 are satisfied. Indeed, this was exactly the geometric motivation of Kumar and Nori. We briefly sketch how this works in equivariant cohomology. The basic facts we need can be found in [Sl], to which we refer for a more detailed explanation of the notation.

Let $G$ be a Kac-Moody group, and let $B$ be a Borel subgroup; let $X = G/B$ denote the flag variety. The group $B$ is a proalgebraic group (inverse limit of algebraic groups), and it has a Levi decomposition $B = TN$, where $N$ is a proalgebraic prounipotent group (denoted by $U$ in [Sl] and [KN]) and $T$ is a finite-dimensional torus. The space $X$ has the structure of an ind-variety; it is realized as a union $X = \bigcup_{k \ge 0} X_k$, where each $X_k$ is a finite-dimensional variety embedded as a closed subvariety of $X_{k+1}$. Here $X_k$ is defined as follows. We have $X = \bigsqcup_w X_w^0$, realizing $X$ as a disjoint union of Schubert cells $X_w^0 = B \cdot wB$. The union is over all elements of the Weyl group $W$; each $X_w^0$ is isomorphic to the affine space $\mathbb{A}^{l(w)}$, where $l(w)$ is the length of $w$. By definition, $X_k = \bigsqcup_{l(w) \le k} X_w^0$; this is a finite-dimensional projective variety that is paved by affines.
Moreover, each $X_k$ is $B$-stable, and there exists a subgroup $N_k \subset N$, normal in $B$, such that $B_k = B/N_k$ is a finite-dimensional solvable group, and the action of $B$ on $X_k$ factors through the map $B \to B_k$. Each $X_k$ therefore satisfies the hypotheses of Theorem 3.1. As in the finite case, there is a set of simple roots $\alpha_1, \ldots, \alpha_l$ in $\mathfrak{t}^*$, and, moreover, for any $k$ every weight in $\operatorname{Lie}(N/N_k)$ is a nonnegative linear combination of simple roots.

Now, for any fixed $i$ the pullback $H_T^i(X) \to H_T^i(X_k)$ is a canonical isomorphism for $k$ sufficiently large (as the decomposition of $X$ into Schubert cells makes $X$ a CW-complex and $X_k$ contains all cells in $X$ of dimension less than or equal to $2k$, and the same is true for the mixed spaces $(X_k)_T$ and $X_T$). There is a basis $\{x_w\}$ of $H_T^*(X)$ dual to the fundamental classes $[X_w]_T$ in the sense that the pullbacks to $H_T^*(X_k)$ form a basis dual to the $[X_w]_T \in H_*^T(X_k)$, for $l(w) \le k$. This basis does not depend on $k$, as can be seen using property (2.2) of the pairing, applied to the inclusion map of $X_k$ into $X_{k+1}$. Theorem 3.1 therefore implies the following corollary, also conjectured by Peterson.



COROLLARY 4.6
With notation as above, if $X$ is the flag variety of a Kac-Moody group, with basis $\{x_w\}$ of $H_T^*(X)$, then $x_u x_v = \sum_w a^w_{uv} x_w$, with $a^w_{uv} \in H_T^*$ a linear combination of monomials in the $\alpha_i$, with nonnegative coefficients.
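The stabilization of $H^i$ invoked in the paragraph before Corollary 4.6 can be spelled out in the nonequivariant case. The following is a sketch under the assumption that cellular cohomology is applied to the Schubert cell structure, with $X_k$ the subcomplex consisting of the cells of real dimension at most $2k$; the equivariant statement in the text is obtained by the same reasoning on the mixed spaces.

```latex
% Every cell of X not contained in X_k has real dimension >= 2k + 2,
% so the relative cellular cochain complex vanishes in degrees <= 2k + 1.
\begin{gather*}
  H^j(X, X_k) = 0 \quad \text{for } j \le 2k + 1, \\
  H^i(X, X_k) \longrightarrow H^i(X) \longrightarrow H^i(X_k)
     \longrightarrow H^{i+1}(X, X_k), \\
  \text{so}\quad H^i(X) \xrightarrow{\ \cong\ } H^i(X_k)
     \quad \text{whenever } i \le 2k .
\end{gather*}
```

In particular, for a fixed degree $i$, any $k \ge i/2$ suffices in this nonequivariant sketch.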

Acknowledgments. The author would like to thank Michel Brion and James Carrell for some useful e-mail.

References

[A] A. ARABIA, Cohomologie T-équivariante de la variété de drapeaux d'un groupe de Kac-Moody, Bull. Soc. Math. France 117 (1989), 129–165. MR 90i:32042

[Bi] S. BILLEY, Kostant polynomials and the cohomology ring for G/B, Duke Math. J. 96 (1999), 205–224. MR 2000a:14060

[Bo1] A. BOREL, Sur la cohomologie des espaces fibrés principaux et des espaces homogènes de groupes de Lie compacts, Ann. of Math. (2) 57 (1953), 115–207. MR 14:490e

[Bo2] A. BOREL, Linear Algebraic Groups, 2d ed., Grad. Texts in Math. 126, Springer, New York, 1991. MR 92d:20001

[Br1] M. BRION, Equivariant Chow groups for torus actions, Transform. Groups 2 (1997), 225–267. MR 99c:14005

[Br2] M. BRION, "Equivariant cohomology and equivariant intersection theory" in Representation Theories and Algebraic Geometry (Montreal, 1997), ed. A. Broer and A. Daigneault, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 514, Kluwer, Dordrecht, 1998, 1–37. MR 99m:14005

[C] C. CHEVALLEY, "Sur les décompositions cellulaires des espaces G/B" in Algebraic Groups and Their Generalizations: Classical Methods (University Park, Pa., 1991), Proc. Sympos. Pure Math. 56, Amer. Math. Soc., Providence, 1994, 1–23. MR 95e:14041

[EG] D. EDIDIN and W. GRAHAM, Equivariant intersection theory, Invent. Math. 131 (1998), 595–634. MR 99j:14003a

[F] W. FULTON, Intersection Theory, Ergeb. Math. Grenzgeb. (3) 2, Springer, Berlin, 1984. MR 85k:14004

[F2] W. FULTON, Flags, Schubert polynomials, degeneracy loci, and determinantal formulas, Duke Math. J. 65 (1992), 381–420. MR 93e:14007

[F3] W. FULTON, Determinantal formulas for orthogonal and symplectic degeneracy loci, J. Differential Geom. 43 (1996), 276–290. MR 98d:14004

[FM] W. FULTON and R. MACPHERSON, Categorical framework for the study of singular spaces, Mem. Amer. Math. Soc. 31 (1981), no. 243. MR 83a:55015

[GKM] M. GORESKY, R. KOTTWITZ, and R. MACPHERSON, Equivariant cohomology, Koszul duality, and the localization theorem, Invent. Math. 131 (1998), 25–83. MR 99c:55009

[G] W. GRAHAM, The class of the diagonal in flag bundles, J. Differential Geom. 45 (1997), 471–487. MR 98j:14070

[H] A. HIRSCHOWITZ, Le groupe de Chow équivariant, C. R. Acad. Sci. Paris Sér. I Math. 298 (1984), 87–89. MR 85j:14007

[KL] D. KAZHDAN and G. LUSZTIG, "Schubert varieties and Poincaré duality" in Geometry of the Laplace Operator (Honolulu, 1979), Proc. Sympos. Pure Math. 36, Amer. Math. Soc., Providence, 1980, 185–203. MR 84g:14054

[KK] B. KOSTANT and S. KUMAR, The nil Hecke ring and cohomology of G/P for a Kac-Moody group G, Adv. in Math. 62 (1986), 187–237. MR 88b:17025b

[KN] S. KUMAR and M. NORI, Positivity of the cup product in cohomology of flag varieties associated to Kac-Moody groups, Internat. Math. Res. Notices 1998, 757–763. MR 99i:14061

[LS] A. LASCOUX and M.-P. SCHÜTZENBERGER, "Interpolation de Newton à plusieurs variables" in Séminaire d'algèbre Paul Dubreil et Marie-Paule Malliavin (Paris, 1983/84), Lecture Notes in Math. 1146, Springer, Berlin, 1985, 161–175. MR 88h:05020

[M] J. MUNKRES, Elements of Algebraic Topology, Addison-Wesley, Menlo Park, Calif., 1984. MR 85m:55001

[P] D. PETERSON, lectures, 1997.

[PR] P. PRAGACZ and J. RATAJSKI, Formulas for Lagrangian and orthogonal degeneracy loci: $\tilde{Q}$-polynomial approach, Compositio Math. 107 (1997), 11–87. MR 98g:14063

[Sl] P. SLODOWY, "On the geometry of Schubert varieties attached to Kac-Moody Lie algebras" in Proceedings of the 1984 Vancouver Conference in Algebraic Geometry, ed. J. Carrell, A. Geramita, and P. Russell, CMS Conf. Proc. 6, Amer. Math. Soc., Providence, 1986, 405–442. MR 87i:14043

[Sp] E. SPANIER, Algebraic Topology, McGraw-Hill, New York, 1966. MR 35:1007

[T] B. TOTARO, "The Chow ring of a classifying space" in Algebraic K-Theory (Seattle, 1997), Proc. Sympos. Pure Math. 67, Amer. Math. Soc., Providence, 1999, 249–281. MR CMP 1 743 244

University of Georgia, Department of Mathematics, Boyd Graduate Studies Research Center, Athens, Georgia 30602, USA
