A.K. Chakraborty / Physics Reports 342 (2001) 1–61
DISORDERED HETEROPOLYMERS: MODELS FOR BIOMIMETIC POLYMERS AND POLYMERS WITH FRUSTRATING QUENCHED DISORDER
Arup K. CHAKRABORTY Department of Chemical Engineering, and Department of Chemistry, University of California, Berkeley, CA 94720, USA
Physics Reports 342 (2001) 1–61
Received December 1999; editor: M.L. Klein

Contents
1. Introduction
2. Biomimetic recognition between DHPs and multifunctional surfaces
2.1. Theory of thermodynamic properties
2.2. Monte-Carlo simulations of thermodynamic properties
2.3. Kinetics of recognition due to statistical pattern matching
2.4. Connection to experiments and issues pertinent to evolution
3. Branched DHPs in the molten state – model system for studying microphase ordering in systems with quenched disorder
Acknowledgements
Appendix
References
Abstract

The ability to design and synthesize polymers that can perform functions with great specificity would impact advanced technologies in important ways. Biological macromolecules can self-assemble into motifs that allow them to perform very specific functions. Thus, in recent years, attention has been directed toward elucidating strategies that would allow synthetic polymers to perform biomimetic functions. In this article, we review recent research efforts exploring the possibility that heteropolymers with disordered sequence distributions (disordered heteropolymers) can mimic the ability of biological macromolecules to recognize patterns. Results of this body of work suggest that frustration due to competing interactions and quenched disorder may be the essential physics that can enable such biomimetic behavior. These results also show that recognition between disordered heteropolymers and multifunctional surfaces due to statistical pattern matching may be a good model to study kinetics in frustrated systems with quenched disorder. We also review work which demonstrates that disordered heteropolymers with branched architectures are good model systems to study the effects of quenched sequence disorder on microphase ordering of molten copolymers. The results we describe show that frustrating quenched disorder affects the way in which these materials form ordered nanostructures in ways which might be profitably exploited in applications. Although the focus of this review is on theoretical and computational research, we discuss connections with existing experimental work and suggest future experiments that are expected to yield further insights. © 2001 Elsevier Science B.V. All rights reserved.

PACS: 87.15.Aa; 82.35.+t

E-mail address: [email protected] (A.K. Chakraborty).
0370-1573/01/$ - see front matter. PII: S0370-1573(00)00006-5
1. Introduction

Synthetic polymers have enormously impacted societal and economic conditions because they are commonly used to manufacture a plethora of commodity products. This is one of the driving forces that continues to spur fundamental research aimed toward understanding the physics of macromolecules and learning how to chemically synthesize them. Another motivation for such research is, of course, intrinsic interest in the fascinating behavior of macromolecules. Research conducted by several physical and chemical scientists has led to substantial advances in our ability to synthesize macromolecules and understand their physical behavior.

In recent years, technological advances have begun to demand materials which exhibit very specific properties. If polymers are to continue to impact society in important ways, they must meet this need. Polymers are good candidates for materials which can perform functions with a high degree of specificity. We can make this claim because it is well established that biological macromolecules are able to carry out very specific functions. One feature that allows biological macromolecules to perform specific functions is their ability to self-assemble into particular motifs. Polymeric materials could impact advanced technologies in important ways if we could learn how to design and synthesize macromolecular systems that can self-assemble into functionally interesting structures and phases. One way to confront this challenge is to take lessons from nature, since millennia of evolution have allowed biological systems to learn how to create functionally useful self-assembled structures from polymeric building blocks. By suggesting that we take lessons from nature, we do not imply copying the detailed chemistry which allows a biological system to carry out a specific function that we seek. This would be impractical in many contexts. Rather, we suggest asking the following questions: are there underlying universalities in the design strategies that nature employs in order to mediate a certain class of functions? If so, can we exploit similar strategies to design synthetic materials that can perform the same class of functions with biomimetic specificity? The reason for the interest in universal strategies is that these may be easier to implement in synthetic systems than the detailed chemistries of natural systems, and may illuminate the essential physics. However, it is also important to realize that universal strategies will lead to lower degrees of specificity compared to situations where the detailed chemistry has been fine-tuned.

Recent work suggests that a possible design strategy employed by natural polymers to effect assembly into functionally interesting materials is to exploit multifunctionality and disordered sequence distributions (e.g., [1–3]). Disordered heteropolymers (DHPs) constitute a class of synthetic polymers that embodies these features. These are copolymers containing more than one type of monomer unit, with the monomers connected together in a disordered sequence. The monomers may also be connected with different architectures; e.g., branched versus linear connectivity. An important point is that once synthesis is complete, the sequence and the architecture cannot change in response to the environment. Since DHPs embody multifunctionality and quenched disorder, they serve as excellent vehicles to explore the suggestion that these features may be essential elements for mediating certain types of biomimetic function in synthetic systems relevant to applications. Competing interactions (due to the presence of different types of monomer units), connectivity, and the quenched character of the disordered sequence also make DHPs quintessential examples of frustrated systems.
In the latter half of the twentieth century, physical scientists interested in the condensed phase have directed considerable attention to two broad classes of problems: those involving biological phenomena (e.g., protein folding) and frustrated systems (e.g., the effects of frustrating quenched randomness in spin glass physics). In addition to being a way of exploring how biomimetic synthetic soft materials can be designed, studying DHPs also allows us to study those aspects of biopolymer behavior that may be termed physics (as distinguished from detailed chemistry) (e.g., [1–3]). DHPs of particular types (vide infra) also offer the potential for being excellent vehicles for careful experimental studies of the manifestations of frustrating quenched disorder.

In this article, we try to illustrate (via examples) how DHPs can serve as biomimetic polymers and/or as model systems to study the physics of frustrated systems with quenched randomness. We begin (Section 2) by discussing the adsorption of DHPs from solution; these considerations show that such molecules can exhibit a phenomenon akin to recognition in biological systems. These studies also suggest that the systems that we discuss may be good (and simple) model systems for experimental studies of kinetic phenomena in frustrated systems with quenched disorder. Section 2 also includes a discussion of the connections of the work we describe with experiments and some provocative ideas being considered in evolutionary biology. In Section 3, we discuss theoretical and experimental work which demonstrates that DHPs with branched architectures are good model systems to study effects of frustrating quenched disorder on microphase ordering.

This article is not a comprehensive or encyclopedic review of DHP physics. However, this article, some recent reviews of the use of these macromolecules as minimalist models to study protein folding (e.g., [1–5]), and a recent review in this journal on theoretical considerations of microphase ordering in molten DHPs with linear architectures [6] provide a glimpse of much of what is known about the physical behavior of these macromolecules.
2. Biomimetic recognition between DHPs and multifunctional surfaces

Many vital biological processes, such as transmembrane signaling, are initiated by a biopolymer (e.g., a protein) recognizing a specific pattern of binding sites that constitutes a receptor located in a certain part of the surface of a cell membrane. By recognition we imply that the protein adsorbs strongly on the pattern-matched region, and not on other parts of the surface; furthermore, it evolves to the pattern-matched region and binds strongly to it on relatively short time scales without getting trapped in long-lived metastable adsorbed states in the wrong parts of the cell surface. If synthetic polymers were able to mimic such recognition, it would indeed be useful for many advanced applications. Examples of such applications include sequence selective separation processes [7,8], the development of viral inhibition agents [9–11], and sensors.

Polymer adsorption from solution has been studied extensively in recent years (see [12,13] for recent reviews). Most studies have been concerned with the adsorption of polymers with ordered sequence distributions (e.g., homopolymers and diblock copolymers). These studies have taught us many important lessons. One lesson pertinent to our concerns can be illustrated by considering the example of a homopolymer interacting with a chemically homogeneous surface. In this case, once we have chosen the chemical identity of the polymer segments, different surfaces are characterized by the attractive energy per segment between the surface in question and the polymer segments ($E/kT$). Thus, if we plot the polymer adsorbed fraction (at equilibrium) as a function of $E/kT$, points on the abscissa correspond to different surfaces. Theoretical and experimental studies have firmly established the nature of this plot. For small values of $E/kT$, there is no adsorption because the energetic advantage associated with segmental binding is not sufficient to beat the entropic penalty associated with chain adsorption. For sufficiently large values of $E/kT$, adsorption does occur. The transition from desorbed states to adsorption is a second order phase transition for flexible chains [14]. Adsorbed polymer conformations can be characterized by loops, tails, and trains (see Fig. 1). Fluctuating the distribution of loops while maintaining the same number of contacts is favored because this increases the entropy. These loop fluctuations cause the adsorption transition to be continuous. The practical consequence is that thermodynamic discrimination between surfaces is not sharp, whereas sharp discrimination is a requisite feature for recognition.

Fig. 1. Schematic representation of an adsorbed polymer chain. The sketch provides the definitions of loops, trains, and tails.

The adsorption characteristics of diblock copolymers on striped surfaces have also been understood (e.g., [15,16]). At equilibrium, they adsorb at the interface of the stripes with each block adsorbed on the stripe that is energetically favored. This phenomenon is different from what is meant by recognition in important ways. Firstly, the chain is not localized in a region commensurate with chain dimensions. This is so because it is entropically favorable for the chain to sample the entire interface. Secondly, it seems highly likely that the diblock copolymer chains would be kinetically trapped in regions away from the interface. This is so because adsorbing one block on an energetically favorable stripe while allowing the non-adsorbed block to sample many conformations appears to be a deep free energy minimum. Thus, this system does not seem to exhibit the hallmarks of recognition either. (Much work has also been done on the behavior of molten diblock copolymer layers on patterned surfaces.
We do not discuss these studies here because the focus of this section is on adsorption from solution. Readers interested in this topic are directed to a recent review [17] and references therein.)

In short, recognition implies a sharp discrimination between different regions of a surface and localization of the chain to a relatively small pattern-matched region without getting kinetically trapped in the "wrong" parts of the surface. In biological systems it also usually entails adsorption in a particular conformation or shape. Synthetic polymers with ordered sequence distributions do not seem to exhibit these characteristics. Similar conclusions can be reached by perusing interesting studies of DHP adsorption on homogeneous surfaces [18–21] and homopolymer adsorption on chemically disordered surfaces [22].

One way to make synthetic systems mimic recognition is to copy the detailed chemistries which allow natural systems to effect recognition. This is not a practical solution in most cases. Recently, some work has been done to explore whether there are any universal strategies that may allow synthetic polymers and surfaces to mimic recognition [23–31]. (Such universal strategies may be simpler to implement in practical situations, and may also shed light on the minimal ingredients, or principles, required for synthetic systems to mimic recognition.) This body of work is the primary focus of this section.

The purpose of this work has not been to explain the physical and chemical mechanisms that allow recognition in biological systems. However, in order to deduce possible universal strategies, some coarse-grained observations about biological systems have provided the inspiration. Each protein carries a specific pattern encoded in its sequence of amino acids. In recent years, great interest in elucidating the physics of protein folding has led to many coarse-grained models for amino acid sequences in proteins. All models exhibit a common feature. In order to illustrate this feature, consider the H/P model [32] wherein amino acids are considered to belong to two classes: hydrophobic and polar. This and other models have been used to characterize protein sequences, and it has been found [33–35] that the pattern of H- and P-type moieties is usually not periodically repeating. Similarly, examination of cell and virus surfaces reveals that the chemically different binding sites that constitute receptors (which are recognized by proteins) are also not arranged in a periodically repeating pattern. These observations suggest that disorder and competing interactions (due to preferential interactions between polymer segments and surface sites) may be key ingredients for recognition between synthetic polymers and surfaces.

Heteropolymers with disordered sequences carry a pattern encoded in their sequence distribution. The information content is statistical, however, since the sequences are characterized statistically.
For example, for a two-letter DHP (say, A- and B-type segments), the simplest way to describe the disordered sequence distribution is by specifying the average fraction of segments of one type ($f$), and a quantity $\lambda$ that measures the strength of two-point correlations in the chemical identity of segments along the chain [36]. $\lambda$ is directly related to the synthetic conditions and the matrix of reaction probabilities, $P$. The elements of this matrix, $P_{ij}$, are the conditional probabilities that a segment of type $j$ directly follows a segment of type $i$. Clearly, $\lambda$ depends upon the choice of the chemical identity of the segments and synthesis conditions. Consequently, synthesis conditions and the choice of chemistry determine the statistical pattern carried by DHPs. If $\lambda > 0$, within a correlation length measured along the chain, there is a high probability of finding segments of the same type. We shall refer to such an ensemble of sequences as statistically blocky. If $\lambda < 0$, within a certain correlation length measured along the chain, there is a high probability of finding an alternating pattern of segments. The absolute magnitude of $\lambda$ measures the correlation length. For example, $\lambda = 0$ corresponds to perfectly random sequences, and $\lambda = 1$ implies the correlation length is the entire chain length. Characterization of sequence statistics using $f$ and $\lambda$ implies that we are only looking at two-point correlations to describe the statistical patterns. More elaborate statistical patterns can be described by considering higher order correlations and/or more than two types of segments.

Consider the interaction of DHPs with surfaces bearing more than one type of site, with the sites being distributed in a disordered manner. Examples of such surfaces with two kinds of sites distributed in a disordered fashion are shown schematically in Fig. 2. The distribution of these sites on the neutral surface can be characterized statistically.
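The two-parameter description of sequence statistics given above can be made concrete with a short sketch. It assumes, as an illustration that is not spelled out in the text, that a two-letter sequence is a stationary two-state Markov chain, for which the correlation parameter can be written as $\lambda = P_{AA} + P_{BB} - 1$ and the stationary fraction of A segments is $f$. The function names, chain length, and random seed are hypothetical choices made for this example.

```python
import random

def transition_matrix(f, lam):
    """Return P[i][j] = Prob(next segment is j | current is i), with 0 = A, 1 = B.

    Assumption of this sketch: a stationary two-state Markov chain with
    A-fraction f and correlation parameter lam = P_AA + P_BB - 1.
    """
    p_aa = f + lam * (1.0 - f)
    p_bb = (1.0 - f) + lam * f
    if not (0.0 <= p_aa <= 1.0 and 0.0 <= p_bb <= 1.0):
        raise ValueError("(f, lam) does not give valid probabilities")
    return [[p_aa, 1.0 - p_aa], [1.0 - p_bb, p_bb]]

def sample_sequence(n, f, lam, rng):
    """Draw one sequence of n segments as a string of 'A' and 'B'."""
    P = transition_matrix(f, lam)
    state = 0 if rng.random() < f else 1  # first segment from the stationary distribution
    seq = [state]
    for _ in range(n - 1):
        state = 0 if rng.random() < P[state][0] else 1
        seq.append(state)
    return "".join("AB"[s] for s in seq)

def same_neighbor_fraction(seq):
    """Fraction of adjacent segment pairs that have the same type."""
    return sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

rng = random.Random(0)  # fixed seed, for reproducibility only
blocky = sample_sequence(20000, 0.5, 0.6, rng)        # lam > 0: statistically blocky
alternating = sample_sequence(20000, 0.5, -0.6, rng)  # lam < 0: statistically alternating
print(same_neighbor_fraction(blocky), same_neighbor_fraction(alternating))
```

Under these assumptions, at $f = 0.5$ the expected fraction of like neighbors is $(1 + \lambda)/2$, so the blocky and alternating ensembles should differ visibly even though both have the same average composition.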
For example, a simple way to characterize such a surface would be to specify the total number density of both kinds of sites per unit area, the fraction of sites of one type, and the two-point correlation function describing how the probability of having a site of type A at position $\mathbf{r}$ is related to the probability of having a site of the same type at position $\mathbf{r}'$. Fig. 2 shows specific realizations of two surfaces bearing simple statistical patterns.

Fig. 2. Examples of statistically patterned surfaces. White represents a neutral background. The two types of "active" surface sites are depicted using light and dark grey dots. In the panel on the left, within some length scale, there is a high probability of finding sites of opposite types adjacent to each other. Such surfaces are referred to in the text as statistically alternating. In the panel on the right, within some length scale, there is a high probability of finding sites of the same type adjacent to each other. Such surfaces are referred to in the text as statistically patchy.

In nature, recognition (with all its hallmarks noted earlier) occurs when the specific pattern encoded in the biopolymer's sequence distribution and that carried by the binding sites are matched (i.e., related in a special way). DHP sequences and the surfaces we have described in the preceding paragraph carry statistical patterns. The question we now ask is: will statistically patterned surfaces be able to recognize the statistical information contained in an ensemble of DHP sequences when the statistics characterizing the DHP sequence and surface site distributions are related in a special way? In other words, is statistical pattern matching sufficient for recognition to occur? This question is interesting for three reasons: (1) the answer may tell us what the minimal ingredients are for the occurrence of a phenomenon akin to recognition; (2) if recognition can occur via statistical pattern matching, the phenomenon might be profitably exploited in applications; (3) DHPs interacting with functionalized surfaces bearing statistical patterns may be good model systems to study the physics of frustrated systems with quenched disorder.

In order to answer this question, we have to address several issues. We must determine whether competing interactions and disorder are sufficient for sharp discrimination between different statistical patterns, and whether the inherent frustration allows localization (in reasonable time scales) on a relatively small part of the surface which is statistically pattern matched. Addressing these issues requires that we study both thermodynamic and kinetic behavior. Let us begin by describing the thermodynamics.

2.1. Theory of thermodynamic properties

Srebnik et al. [23] analyzed a bare-bones version of the problem in order to examine whether frustration due to competing interactions and quenched disorder are sufficient to obtain sharp discrimination between surfaces with different statistical patterns. In this model, the two-letter DHPs are considered to be Gaussian chains in solution. The surface comprises two different types of sites on a neutral background, and each type of site interacts differently with the two types of DHP segments. In the infinitely dilute limit, the physical situation described above corresponds to the following Hamiltonian:
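$$
-\beta H = -\frac{3}{2l^2}\int_0^N dn \left(\frac{d\mathbf{r}}{dn}\right)^2 - \int_0^N dn \int d\mathbf{r}\; k(\mathbf{r})\,\delta(\mathbf{r}(n)-\mathbf{r})\,\theta(n)\,\delta(z) \tag{1}
$$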
where $\mathbf{r}(n)$ represents the chain conformation, $k(\mathbf{r})$ is the interaction strength with a surface site located at $\mathbf{r}$, the factor of $\delta(z)$ ensures that these sites live on a 2D plane, $l$ is the usual statistical segment length, and $\theta(n)$ represents the chemical identity of the $n$th segment. For a two-letter DHP, this quantity is $\pm 1$ depending upon whether we have an A- or a B-type segment. Eq. (1) implies that a surface site which exhibits attractive interactions with one type of DHP segment has an equally repulsive interaction with segments of the other type. This is assumed for simplicity. What is important is that interactions between a surface site and the two different types of segments are different.

Since the DHP sequence and the surface site distribution are disordered, $k(\mathbf{r})$ and $\theta(n)$ are fluctuating variables. Later, we shall have much to say about how different types of correlated fluctuations of these variables affect the physical behavior. For now, in order to explore the essential physics, let us consider the fluctuations in $k(\mathbf{r})$ and $\theta(n)$ to be uncorrelated. Further, in order to simplify the analysis, these fluctuations are described by Gaussian processes. This latter approximation should not affect the qualitative physics, and generalizes the results to a physical situation with many types of DHP segments and surface sites. Specifically, Srebnik et al. [23] take the surface to be neutral on average (i.e., $k$ has a mean value of zero), and the variance of the fluctuations in $k$ is $\sigma_1^2$. This implies that $\sigma_1^2$ is the only variable which measures the statistical pattern carried by the surface. Different values of this quantity represent different surfaces. Physically, $\sigma_1^2$ is proportional to the total number density of both types of sites on the neutral surface. The uncorrelated sequence distribution is described by $\theta(n)$ having a mean value of $(2f-1)$, where $f$ is the average composition of one type of segment. The variance is $\sigma_2^2$, and is also related to the average composition (it equals $4f(1-f)$). Thus, in this simple case the statistical pattern carried by the DHP sequence is measured by the average composition.

In order to proceed, we must average over the quenched sequence distribution and the fluctuations in $k$ that characterize the surface site distribution. Consider the latter issue first. If the sites on the surface can anneal in response to the adsorbing chain molecule, then the partition function is self-averaging with respect to the fluctuations in $k$. This could be the physical situation if the functional groups that represent the sites on the surface were weakly bonded to the surface. If, however, the sites on the surface cannot respond to the presence of the DHP, then the partition function is self-averaging with respect to the fluctuations in $k$ only under restricted circumstances. As has been explicated in many contexts (e.g., [37–39]), the quenched and annealed averages over disorders external to the fluid of interest are equivalent when the medium is sufficiently large, and the time of observation is long enough for the fluid (in our case, the polymer) to sample the medium. Later, we shall quantify these statements by examining results of Monte-Carlo simulations. For the moment, we carry out an analysis that holds for annealed disorders under all circumstances, and is appropriate for quenched surface disorders under the restrictions noted above. Therefore, following Feynman [40], we calculate the influence functional by averaging over the Gaussian fluctuations in $k(\mathbf{r})$. This yields the following effective Boltzmann factor:
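$$
\exp[-\beta H_{\mathrm{eff}}] = \int \mathcal{D}k(\mathbf{r})\, \exp\left[-\frac{1}{2\sigma_1^2}\int d\mathbf{r}\int d\mathbf{r}'\; k(\mathbf{r})\,\delta(\mathbf{r}-\mathbf{r}')\,k(\mathbf{r}')\,\delta(z)\,\delta(z')\right]
\times \exp\left[-\frac{3}{2l^2}\int_0^N dn\left(\frac{d\mathbf{r}}{dn}\right)^2 - \int_0^N dn\int d\mathbf{r}\; k(\mathbf{r})\,\delta(\mathbf{r}(n)-\mathbf{r})\,\theta(n)\,\delta(z)\right] \tag{2}
$$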
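Carrying out a Gaussian average of this kind rests on an elementary identity: for a zero-mean Gaussian variable $k$ with variance $\sigma^2$, $\langle e^{-ks}\rangle = e^{\sigma^2 s^2/2}$. The sketch below checks the one-variable version numerically by trapezoidal quadrature. It is only an illustration of the identity, not of the functional integral itself; the symbol $s$, the function name, and the numerical values are choices made for this example.

```python
import math

def gaussian_average_exp(s, sigma, half_width=12.0, steps=20001):
    """Trapezoidal estimate of <exp(-k*s)> for k ~ Normal(0, sigma^2)."""
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / (steps - 1)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        k = lo + i * h
        term = norm * math.exp(-k * k / (2.0 * sigma ** 2)) * math.exp(-k * s)
        # End points carry half weight in the trapezoidal rule.
        total += term if 0 < i < steps - 1 else 0.5 * term
    return total * h

s, sigma = 1.3, 0.8  # illustrative values
numeric = gaussian_average_exp(s, sigma)
exact = math.exp(0.5 * sigma ** 2 * s ** 2)
print(numeric, exact)
```

The same identity, applied mode by mode, is what turns a linear coupling to a Gaussian field into an effective quadratic term.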
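The partition function is not self-averaging with respect to the quenched sequence fluctuations under any circumstances. Replica methods [41,42] provide one way to carry out the quenched average. We will consider $f = 0.5$, thereby fixing the statistics of the DHP sequences. We then study adsorption as a function of the surface statistics (i.e., the variance of the distribution that characterizes the fluctuations in $k(\mathbf{r})$). Replicating the effective Hamiltonian in Eq. (2), and carrying out the functional integral corresponding to the average over the distribution of $\theta$, yields the following $m$-replica partition function:

$$
\langle G^m \rangle = \int \prod_{\alpha=1}^{m} \mathcal{D}\mathbf{r}_\alpha(n) \int \prod_{\alpha=1}^{m} \mathcal{D}k_\alpha(\mathbf{r})\;
\exp\left[-\frac{3}{2l^2}\sum_{\alpha=1}^{m}\int dn \left(\frac{d\mathbf{r}_\alpha}{dn}\right)^2\right]
\times \exp\left[-\frac{1}{2\sigma_1^2}\sum_{\alpha\beta}\int d\mathbf{r}\int d\mathbf{r}'\; k_\alpha(\mathbf{r})\,\delta_{\alpha\beta}\,\delta(\mathbf{r}-\mathbf{r}')\,k_\beta(\mathbf{r}')\,\delta(z)\,\delta(z')\right]
\times \exp\left[\frac{\sigma_2^2}{2}\sum_{\alpha\beta}\int dn\int d\mathbf{r}\int d\mathbf{r}'\; k_\alpha(\mathbf{r})\,k_\beta(\mathbf{r}')\,\delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\,\delta(z)\,\delta(z')\right] \tag{3}
$$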
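The last factor of Eq. (3) can be traced to the sequence average. The following is a brief sketch of that step, assuming that for $f = 0.5$ the variable $\theta(n)$ is a zero-mean Gaussian process with $\langle\theta(n)\theta(n')\rangle = \sigma_2^2\,\delta(n-n')$, where $\sigma_2^2$ denotes the variance of $\theta$. The shorthand $X(n)$ is introduced here only for this illustration.

```latex
% Sketch: average over an uncorrelated Gaussian sequence variable theta(n), f = 1/2
\begin{align*}
X(n) &\equiv \sum_{\alpha=1}^{m}\int d\mathbf{r}\; k_\alpha(\mathbf{r})\,
        \delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(z), \\
\Big\langle \exp\Big[-\int dn\,\theta(n)\,X(n)\Big]\Big\rangle_{\theta}
  &= \exp\Big[\frac{\sigma_2^2}{2}\int dn\, X(n)^2\Big] \\
  &= \exp\Big[\frac{\sigma_2^2}{2}\sum_{\alpha\beta}\int dn\int d\mathbf{r}\int d\mathbf{r}'\;
     k_\alpha(\mathbf{r})\,k_\beta(\mathbf{r}')\,
     \delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\,
     \delta(z)\,\delta(z')\Big].
\end{align*}
```

Because the same sequence is shared by all replicas, squaring $X(n)$ is what couples different replicas $\alpha$ and $\beta$.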
This replicated partition function can be written in a form that is more convenient both for thinking and computing. Define the following order parameter, $Q_{\alpha\beta}(\mathbf{r},\mathbf{r}')$, which measures the conformational overlap on the surface between the replicas:

$$
Q_{\alpha\beta}(\mathbf{r},\mathbf{r}') = \int dn\; \delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\,\delta(z)\,\delta(z') \tag{4}
$$
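A discrete analogue may help in reading Eq. (4). The sketch below counts, for two replica conformations on a lattice, the segments $n$ that sit on the surface in both replicas, indexed by the pair of surface sites they occupy. The lattice representation, the convention that $z = 0$ marks an adsorbed segment, and all names are assumptions made for this illustration.

```python
from collections import Counter

def surface_overlap(conf_a, conf_b):
    """Discrete analogue of Q_ab(r, r').

    conf_a, conf_b: lists of (x, y, z) lattice positions, one per segment.
    A segment is treated as adsorbed when z == 0 (an assumption of this sketch).
    Returns a Counter mapping ((x, y), (x', y')) to the number of segments n
    that are on the surface at (x, y) in replica a and at (x', y') in replica b.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("replicas must have the same number of segments")
    q = Counter()
    for (xa, ya, za), (xb, yb, zb) in zip(conf_a, conf_b):
        if za == 0 and zb == 0:
            q[((xa, ya), (xb, yb))] += 1
    return q

def coincident_contacts(q):
    """Number of segments adsorbed on the same surface site in both replicas."""
    return sum(count for (r, r_prime), count in q.items() if r == r_prime)

# Replica a: a two-segment train, a two-segment loop, then a return to the surface.
replica_a = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (2, 0, 1), (2, 0, 0)]
# Replica b: fully adsorbed (a single train).
replica_b = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (3, 1, 0)]

q_ab = surface_overlap(replica_a, replica_b)
print(dict(q_ab), coincident_contacts(q_ab))
```

In this picture, two replicas overlap perfectly on the surface when every adsorbed segment occupies the same site in both, and not at all when no segment does.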
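This definition allows us to rewrite Eq. (3) as a functional integral over this overlap order parameter in the following way:

$$
\langle G^m \rangle = \int \mathcal{D}Q_{\alpha\beta}\, \exp\left(-E[Q_{\alpha\beta}] + S[Q_{\alpha\beta}]\right),
$$

where

$$
E = -\ln \int \prod_{\alpha=1}^{m} \mathcal{D}k_\alpha(\mathbf{r})\, \exp\left[-\frac{1}{2}\sum_{\alpha\beta}\int d\mathbf{r}\int d\mathbf{r}'\; k_\alpha(\mathbf{r})\,P_{\alpha\beta}(\mathbf{r},\mathbf{r}')\,k_\beta(\mathbf{r}')\right],
$$

$$
S = \ln \int \prod_{\alpha=1}^{m} \mathcal{D}\mathbf{r}_\alpha(n)\, \exp\left[-\frac{3}{2l^2}\sum_{\alpha=1}^{m}\int dn\left(\frac{d\mathbf{r}_\alpha}{dn}\right)^2\right]
\times \delta\left[Q_{\alpha\beta}(\mathbf{r},\mathbf{r}') - \int dn\; \delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\right] \tag{5}
$$

and $P_{\alpha\beta}(\mathbf{r},\mathbf{r}')$ is

$$
P_{\alpha\beta}(\mathbf{r},\mathbf{r}') = \frac{\delta(\mathbf{r}-\mathbf{r}')\,\delta_{\alpha\beta}\,\delta(z)\,\delta(z')}{\sigma_1^2} - \sigma_2^2\, Q_{\alpha\beta}(\mathbf{r},\mathbf{r}') \tag{6}
$$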
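Since the integral over $k_\alpha$ that defines $E$ is Gaussian, $E$ can be reduced to a determinant. A brief sketch of this standard step follows, treating $P$ as a symmetric positive-definite matrix on a surface discretized into $M$ sites. The discretization and the symbol $M$ are assumptions of this sketch; the additive constant does not depend on $Q_{\alpha\beta}$.

```latex
% Sketch: Gaussian integral over k on a discretized surface with M sites and m replicas
\begin{align*}
\int \prod_{\alpha=1}^{m}\mathcal{D}k_\alpha\,
  \exp\Big[-\tfrac12\sum_{\alpha\beta}\int d\mathbf{r}\int d\mathbf{r}'\,
  k_\alpha(\mathbf{r})\,P_{\alpha\beta}(\mathbf{r},\mathbf{r}')\,k_\beta(\mathbf{r}')\Big]
  &= (2\pi)^{mM/2}\,(\det P)^{-1/2}, \\
E &= \tfrac12\ln\det P \;-\; \tfrac{mM}{2}\ln(2\pi).
\end{align*}
```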
The quantity $S$ is clearly the entropy associated with a given overlap order parameter field since it is the logarithm of the number of ways in which the DHP can organize itself in 3D space with the constraint that the overlap between replicas on the 2D surface is $Q_{\alpha\beta}$. Then, $E$ is the associated energy. As has been demonstrated in the context of protein folding (e.g., [1–5]) and the behavior of DHPs in 3D disordered media [43], these polymers with quenched sequence distributions can exhibit behavior akin to the random energy model (REM), the Potts glass with many states, or $p$-spin models. One consequence of this is that, under certain circumstances, the thermodynamics is determined by a few dominant conformations. This is because frustration due to competing interactions and quenched disorder makes these few conformations energetically much more favorable compared to all others. Since the physical situation that we are considering also embodies the frustrating effects of competing interactions and quenched disorder, we must allow for the possibility of such a phenomenon (we will refer to it as freezing for convenience). In fact, since the competing interactions in our case occur on a 2D plane, this effect might be enhanced. It is very important to understand that the preceding sentence does not imply that the problem we are considering is one wherein the dimensionality of space is 2. The polymer conformations can (and do) fluctuate in 3D space by forming loops and tails; only the competing interactions in this simple scenario are manifested in 2D space. We shall return to the importance of loop fluctuations later in this section.

Mathematically, allowing for the possibility of a few dominant conformations implies that we must allow for broken replica symmetry. Parisi [44] pioneered the way in which to compute and think about broken replica symmetry in the context of spin glasses. For SK spin glasses, replica symmetry is broken in a hierarchical manner [42,44]. For the REM and $p$-spin models with $p > 2$, one stage of the symmetry breaking process is sufficient for sensible calculations [42]. As noted earlier, since DHPs share some features with these models, a one-step replica symmetry breaking (RSB) scheme is a reasonable approximation (e.g., [1–5,42,43]). Replicas are divided into groups. Replicas within a group have perfect overlap on the surface, and those in different groups do not overlap at all on the surface. The energy can be computed by evaluating the logarithm of the determinant of the matrix $P_{\alpha\beta}$. Mezard and Parisi [44] have provided formulas for this quantity when there is broken replica symmetry. Using their formula and a 1-step RSB scheme the energy is computed to be:
$$
E = \frac{1}{2}\left[-\ln \sigma_1^2 + \frac{1}{x_0}\ln\left(1 - C\,\bar{p}\,x_0\right)\right], \qquad \bar{p} = p/N, \qquad C = \sigma_1^2\,\sigma_2^2\,N/A, \tag{7}
$$

where $x_0$ is the number of replicas in a group, $p$ is the number of contacts with the surface, and $A$ is the surface area of the solid. In writing Eq. (7), the density of adsorbed segments on the surface has been approximated to be uniform.

In order to compute the entropy a physically transparent method can be employed. The first quantity that we need to compute is the number of ways in which $x_0$ replicas can be arranged such that their surface conformations overlap perfectly. When polymers adsorb, the conformations are characterized by loops, trains, and tails (see Fig. 1). In the long chain limit, we may ignore tails. Let $f_{n_i}(\mathbf{r}_{i-1}-\mathbf{r}_i)$ be the probability that a loop of length $n_i$ starts at $\mathbf{r}_{i-1}$ on the surface and ends at $\mathbf{r}_i$. The loop length $n_i$ ranges from 1 to the chain length, $N$. Including loops of length 1 implies that trains are incorporated in the computation of the entropy. With this definition, the restricted partition function for $x_0$ replicas in a group can be written as:
$$
Z(\mathbf{r}_0) = \sum_{n_1}\cdots\sum_{n_p}\int d\mathbf{r}_1\cdots\int d\mathbf{r}_p\; f_{n_1}^{x_0}(\mathbf{r}_0-\mathbf{r}_1)\cdots f_{n_p}^{x_0}(\mathbf{r}_{p-1}-\mathbf{r}_p)\;\delta(n_1+n_2+\cdots+n_p-N). \tag{8}
$$

The delta function conserves total chain length while allowing loop length fluctuations. We do not integrate over the position of the first adsorbed segment, $\mathbf{r}_0$, for later mathematical convenience. The entropy for $x_0$ replicas in a group is obtained by integrating $Z$ over $\mathbf{r}_0$ and then taking the negative logarithm. In order to compute this partition function, let us first introduce a Laplace transform conjugate to $N$; i.e.,

$$
z(\mu) = \int_0^\infty dN\; z(N)\,e^{-\mu N}, \tag{9}
$$

where $\mu$ is the Laplace variable conjugate to $N$. The product of the functions that describe the loop probabilities exhibits a convolution structure, and so it is convenient to introduce a 2D Fourier transform conjugate to the 2D spatial coordinate that defines position on the surface. The Fourier–Laplace transform of the restricted partition function is

$$
Z(\mathbf{k},\mu) = \prod_{i=1}^{p}\sum_{n_i=1}^{N} f_{n_i}^{x_0}(\mathbf{k})\,e^{-\mu n_i}. \tag{10}
$$

The loop probability factors must be of two types. Following Hoeve et al. [45], the factor $f$ for a loop of unit length is taken to be

$$
f_1(r) = u\,\delta(r-l) \tag{11}
$$
where $u$ is the partition function for one adsorbed segment, and depends upon the chemical details of chain constitution. In the simple model that we are considering now, the DHP segments do not interact with each other. So, the loop probability factor for longer loops is Gaussian:

$$
f_n(r) = \frac{c}{n^{3/2}}\exp\left(-\frac{3r^2}{2nl^2}\right) \tag{12}
$$

where $c$ is a normalization constant that depends upon chain stiffness, $n$ is the loop length, and $r$ is the distance between loop ends. Substituting the Fourier transforms of the loop probability factors into Eq. (10), integrating over the position of the first adsorbed segment, and inverting the Fourier and Laplace transforms yields the partition function that we seek. Now, noting that there are $m/x_0$ groups of replicas, the entropy in Eq. (5) is calculated as the product of $m/x_0$ and the negative logarithm of this partition function. In carrying out these manipulations, the sum over loop lengths in Eq. (10) can be taken to be an integral since we are concerned with the long chain limit. Combining the result of the entropy calculation with Eq. (7) for the energy yields the following free-energy functional:
1 1 1 F" !ln p # ln(1!C p x ) ! x x N 2
N , p N ;ln q O
2pl O CO((4!3x )/2) ;[N!(p N!q)] \V O\ . COV u.M ,\OV 3x C(q(4!3x )/2) (13)
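The factors of $2\pi l^{2}$ and of the gamma function with argument $(4-3x)/2$ in Eq. (13) can be traced to two elementary integrals over a Gaussian loop factor. The following Python sketch is a minimal check under stated assumptions: it takes the loop factor to be $C\,n^{-3/2}\exp(-r^{2}/2nl^{2})$, integrates its x-th power over the surface, and compares the result with the closed form $C^{x}(2\pi l^{2}/x)\,n^{1-3x/2}$. The function names and the numerical parameters are illustrative and do not come from Srebnik et al. [23].

```python
import math


def loop_factor(r, n, C=1.0, l=1.0):
    """Gaussian loop factor, assuming an n**(-3/2) prefactor (see Eq. (12))."""
    return C * n ** -1.5 * math.exp(-r * r / (2.0 * n * l * l))


def surface_integral(n, x, C=1.0, l=1.0, steps=20000):
    """Midpoint-rule estimate of the 2D surface integral of loop_factor**x."""
    r_max = 12.0 * l * math.sqrt(n / x)  # integrand is negligible beyond this
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += 2.0 * math.pi * r * loop_factor(r, n, C, l) ** x * dr
    return total


def surface_integral_closed(n, x, C=1.0, l=1.0):
    """Closed form: C**x * (2 pi l**2 / x) * n**(1 - 3x/2)."""
    return C ** x * (2.0 * math.pi * l * l / x) * n ** (1.0 - 1.5 * x)


def laplace_over_loop_length(lam, x):
    """Integral over n of n**(1 - 3x/2) * exp(-lam * n), valid for x < 4/3."""
    s = (4.0 - 3.0 * x) / 2.0
    return math.gamma(s) / lam ** s
```

Treating the sum over loop lengths as an integral, as the text does for long chains, is what turns the power law $n^{1-3x/2}$ into the gamma function returned by `laplace_over_loop_length`.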
Srebnik et al. [23] also added a term that represents contributions from non-specific three-body repulsions to the energy. It is not essential to add this term, but at high values of the adsorbed fraction it may be necessary for stability. A mean-field solution for the order parameters, $\bar{p}$ and $x$, is obtained by extremizing the free-energy functional with respect to them. It is worth noting that the free-energy functional must be minimized with respect to $\bar{p}$ and maximized with respect to $x$. The reason for the maximization is that, when the $m \to 0$ limit is taken in the replica calculation, the lowest-order correction to the free energy evaluated at the saddle-point value is negative; this is because the dimensionality of the integral is $m(m-1)$, which is negative when the replica limit is taken [42]. A simple computer code allowed Srebnik et al. [23] to obtain the saddle-point values of the two order parameters $\bar{p}$ and $x$. By following $\bar{p}$ and $x$ we can learn about the adsorption characteristics of DHPs onto disordered multifunctional surfaces. The order parameter $\bar{p}$ is simply the fraction of adsorbed segments. It acquires values greater than zero when adsorption occurs. The order parameter $x$ has been interpreted to be $1-\sum_i P_i^{2}$, where $P_i$ is the probability with which conformation $i$ occurs. When a multitude of conformations are sampled, each of these probabilities is very small and $x$ acquires the asymptotic value of unity. This is the usual situation in polymer physics because entropic considerations lead to large conformational fluctuations. Natural polymers seem to be
designed such that, under appropriate circumstances, conformational fluctuations are suppressed and a few dominant conformations determine the thermodynamics. DHPs can also exhibit similar physics (e.g., [1–5,43]). In fact, our quest for biomimetic recognition can only be successful if adsorption occurs in a few dominant conformations. Let us fix all the parameters that determine the nature of the chains. Then, let us study the variation of the order parameters $\bar{p}$ and $x$ with C. Different values of this parameter correspond to different surfaces. Specifically, each value of C corresponds to a different total number density of sites on the surface. The number of sites per unit area on the surface can be adjusted experimentally by a number of means, the simplest being the adsorption of functional groups onto a surface from solutions of varying concentrations [7]. Fig. 3 shows how the two relevant order parameters vary with C. A uniform neutral surface corresponds to C = 0. Therefore, when this parameter is small no adsorption occurs. The order parameter $x$ is unity since in the absence of intersegment interactions and adsorption all conformations are energetically equally likely. Fig. 3 shows that above some value of C weak adsorption occurs with a multitude of conformations being sampled. The transition from no adsorption to weak adsorption is continuous. The theory also predicts that at a higher threshold value of C a sharp transition from weak to strong adsorption occurs. This adsorption transition is accompanied by $x$ becoming less than unity. This signals that the polymer adsorbs in only a few dominant conformations (at least as far as the adsorbed segments, and hence the loop structure, are concerned). As noted earlier, in the simple model that we have been considering, each point on the abscissa corresponds to a different surface. Thus, the sharp transition from weak to strong
Fig. 3. The order parameters $\bar{p}$ and $x$ plotted as a function of C. For the calculation described in the text, each point on the abscissa represents a different statistically patterned surface.
adsorption implies sharp thermodynamic discrimination between surfaces bearing different statistical patterns, one of the hallmarks of recognition. The physical reason for the sharp transition from weak to strong adsorption (and the freezing into few dominant conformations that accompanies it) can be understood by first discussing what happens when the continuous transition to weak adsorption occurs. When there are only a few sites on the surface (small C), the energetic advantage associated with chain segments binding to preferred sites is not sufficient to overcome the entropic penalty for chain adsorption. For higher values of C, adsorption occurs because now the number of sites is sufficient for the favorable energetics of preferential segmental binding to overcome the entropic penalty. At the same time, since the number of surface sites per unit area is small, it is very easy for the chain to avoid unfavorable interactions. Furthermore, as shown schematically in Fig. 4, because it is easy to avoid unfavorable interactions, the chain can obtain the same energetic advantage in many different conformations. Thus, the system minimizes free energy by sampling a multitude of conformations which have roughly the same energy. As the loading of surface sites increases, however, it becomes increasingly difficult to avoid unfavorable interactions. In fact, above some threshold loading of surface sites, most arbitrary adsorbed conformations will be subjected to many unfavorable interactions. Thus, for a sufficiently high loading (and hence, adsorbed fraction), most adsorbed conformations constitute a continuous spectrum of high energy states. However, there will be a small ensemble of conformations that are significantly lower in energy. These conformations are the few that carefully avoid unfavorable interactions (as best as possible given the disorder in the sequence and the surface site distributions). These few
Fig. 4. Schematic representation of DHPs interacting with a surface bearing just a few sites. The two panels depict different conformations with the same energy.
conformations are energetically more favorable than all others. Thus, in a manner reminiscent of the random energy model (REM) of spin glasses, the energy spectrum develops a gap between the small ensemble of conformations that adsorb in a pattern-matched way and the multitude of others. By pattern matching we mean registry between adsorbed segments and the preferred sites. As the loading of surface sites increases, the system becomes more frustrated, and the energy gap between the pattern-matched adsorbed conformations and all others increases. Beyond a threshold value of the loading, the energy gap becomes much greater than the thermal energy, $k_BT$. This causes the polymer chain to sacrifice the entropic advantage of sampling a multitude of conformations, and it adsorbs in the few pattern-matched conformations. Of course, these pattern-matched conformations are strongly adsorbed. Thus, we get a phase transition with a sharp increase in the adsorbed fraction and passage to a thermodynamic state where only a small ensemble of conformations is sampled. Mean-field theory predicts that this transition is first order. There are two free-energy minima, one corresponding to a replica symmetric solution of the equations and the other to a solution with broken replica symmetry [46]. Before the transition from weak to strong adsorption, the minimum corresponding to the replica symmetric solution is the global minimum. When the transition occurs, the solution with broken replica symmetry becomes the global minimum. It is worth remarking that, while a two-letter DHP in solution does exhibit REM-like behavior, the energy gap between the low-lying conformations and the continuous part of the spectrum is not large [5]. Significantly larger gaps are obtained for designed sequences.
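The link between an energy gap and the order parameter $x$ can be illustrated with a toy calculation. The Python sketch below evaluates $x = 1-\sum_i P_i^{2}$ for Boltzmann-weighted conformations. The two-level spectrum used here (a handful of degenerate low-energy states separated by a gap from all the others) is a hypothetical stand-in for the REM-like spectrum described in the text, chosen only because it makes the limiting values easy to read off; it is not the spectrum computed in the replica theory.

```python
import math


def conformational_order_parameter(energies, kT=1.0):
    """Return x = 1 - sum_i P_i**2 for Boltzmann probabilities P_i."""
    e_min = min(energies)  # shift energies to avoid overflow in exp
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return 1.0 - sum((w / z) ** 2 for w in weights)


def gapped_spectrum(n_states, gap, n_low=2):
    """Hypothetical spectrum: n_low states at energy -gap, the rest at zero."""
    return [-gap] * n_low + [0.0] * (n_states - n_low)


# No gap: all conformations are equally likely and x is close to unity.
x_no_gap = conformational_order_parameter(gapped_spectrum(10000, 0.0))

# Gap much larger than kT: only the low-lying states contribute and x drops.
x_large_gap = conformational_order_parameter(gapped_spectrum(10000, 30.0))
```

With two dominant conformations the second case gives a value close to 1/2, whereas the first stays within $10^{-4}$ of unity, which mirrors the statement that $x$ falls below unity only when a few conformations dominate.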
In the situation we have been considering, interaction with the disordered distribution of surface sites adds another source of frustration which makes the energy gap in a REM-like picture quite large when the surface loading exceeds a threshold value, even for random sequences and surface site distributions. At least this appears to be true for the thermodynamic behavior. We shall see later that kinetic considerations require delicate design of the statistics of the heteropolymer sequence and the surface site distribution. The preceding discussion provides a compelling physical argument for the existence of a transition from weak to strong adsorption accompanied by the adoption of a few dominant conformations when the statistics of the DHP sequence and surface site distributions are related in a special way. However, we have provided no physical reason for the transition to be sharp (or first order as predicted by the mean-field calculation). The order of the transition can be established rigorously only by carrying out a renormalization group calculation. A proper calculation of this sort has not yet been performed. From a fundamental standpoint, it is important that the order of the transition be established in a rigorous way. From a practical standpoint what is important is that the transition is sharp and hence can display one of the hallmarks of recognition: a sharp discrimination between surfaces to which the chains bind weakly and others to which they bind very strongly. The physical reason for the sharpness of the transition is suggested by a simple model of the phenomenon under consideration; Monte-Carlo simulations for finite size systems are also indicative of a first-order transition. First, let us discuss a model [26] which complements the replica field theory that we have discussed and is motivated by simple physical considerations.
Again, consider DHP chains comprised of two types of segments (A and B) interacting with a surface functionalized by two different types of sites. The segments of type A prefer to interact with one type of surface site, and those of type B exhibit the opposite preference. Thus, from an energetic standpoint, there are two types of segment–surface contacts: good and bad contacts. Good contacts are those that
involve preferred segment–surface site interactions. Let us try to develop a free-energy functional with the order parameters being the total number of adsorbed segments (p) and the number of good contacts (q). The energy corresponding to given values of p and q is:
$$\frac{E}{k_BT}=q\,\delta E+p\,E_b, \qquad (14)$$
where $\delta E$ is the energy difference between good and bad contacts, and $E_b$ is the energy of a bad contact. We now need to compute the entropy for a chain of length N and p adsorbed segments of which q are good contacts. This entropy can be partitioned into three separate contributions. Firstly, there is a "mixing" entropy ($S_{\mathrm{mix}}$) associated with the number of ways to choose p adsorbed segments out of N. There is also the entropy loss ($S_{\mathrm{bind}}$) associated with segmental binding, and the entropy ($S_{\mathrm{loop}}$) associated with loop fluctuations in the adsorbed conformations. As we shall see, the last contribution is of crucial importance. The simplest possible approximation for $S_{\mathrm{mix}}$ yields
$$\frac{S_{\mathrm{mix}}}{N}=-\bar{p}\ln\bar{p}-(1-\bar{p})\ln(1-\bar{p}), \qquad (15)$$
where $\bar{p}=p/N$ is the fraction of adsorbed segments. As is usual, the loss in entropy upon segmental binding is given by
$$\frac{S_{\mathrm{bind}}}{N}=-\omega\bar{p}, \qquad (16)$$
where $\omega$ is a constant related to chain flexibility and solution conditions. Now consider the computation of the loop entropy. When a homopolymer adsorbs on a chemically homogeneous surface, the energetic advantage associated with every segment–surface contact is the same. Thus, the adsorbed chain exhibits large fluctuations in the loop structure to maximize entropy. It is important to note that the loops live in three-dimensional space, and any description of the physics must account for the loop fluctuations properly. The problem that we are considering is distinctly different from homopolymer adsorption because the segment–surface contacts are of two types. The existence of good and bad contacts implies that there are two types of loops. There are loops associated with forming good contacts at both ends, and those that are associated with forming the other contacts. These two types of loops are fundamentally different in character. Each loop is characterized by the loop length and the distance between loop ends on the surface. For the loops associated with forming good contacts at both ends, only certain values of these quantities are allowed. This is because the two segments that correspond to the loop ends must be bound to surface sites with which they prefer to interact. The allowed loop lengths and distance between the loop ends on the surface are intimately related to the probabilities of finding certain types of sites and segments at different locations along the chain and on the surface. Thus, the allowed fluctuations of loops associated with good contacts are determined by the statistics that characterize the chain sequence and the surface site distribution. Loops associated with contacts that are not good are not restricted in this manner, and the usual
fluctuations in loop length and distance between loop ends are allowed. The above argument suggests that competing interactions and disorder cause loop fluctuations to be suppressed by the formation of good contacts. It is reasonable to suspect that if the statistics of the chain sequence and the surface site distribution are such that there is a high probability for the formation of good contacts, loop fluctuations are strongly suppressed. Suppression of loop fluctuations makes the chain effectively stiffer. Sufficiently stiff chains are known to undergo sharp (first-order) adsorption transitions [14]. These arguments suggest that strong suppression of loop fluctuations resulting from frustration due to competing interactions and quenched disorder, and statistical pattern matching, are the origins of the sharp adsorption transition. The meaning of statistical pattern matching is also made clearer; we mean that the statistics that describe the DHP sequence and the surface site distribution are such that the probability of making good contacts in certain adsorbed conformations becomes sufficiently high. In order to explore the veracity of these arguments, we need to write down a mathematical formula for the entropy corresponding to loops associated with good and bad contacts when they coexist. (For ease of reference, we shall refer to these loops as quenched and annealed, respectively.) In order to develop such a formula, it is instructive to first consider the entropy associated with each type of loop when it is the only type of loop that exists. Once these formulas are available, it is relatively straightforward to combine them properly to obtain what we seek. Let us begin with the quenched loops. Since the bare chains are non-interacting (and hence Gaussian) in our model, the loop factor for a loop of length n returning to the plane with the two ends separated by a distance d is again (see Eq. (12)) given by
$$P(n,d)=\frac{C}{n^{3/2}}\exp\!\left(-\frac{d^{2}}{2nb^{2}}\right). \qquad (17)$$
The quantities n and d depend upon the statistics that describe the chain sequence and surface site distributions. For example, in the case of uncorrelated fluctuations in the surface site distribution, the most probable value of $d\sim 1/\sigma$, where $\sigma$ is the width of the distribution and is proportional to the surface loading of both types of sites. Since we are considering a situation where only quenched loops exist, if there are q adsorbed segments, there are q quenched loops with the average loop length being N/q. Shortly, we shall see how the average loop length (and hence n) is closely related to the statistics which describe the sequence and surface site distributions. In view of the considerations noted above and Eq. (17), it is easy to write down an expression for the entropy corresponding to q quenched loops. Specifically, taking n and d to equal their most probable values gives
$$\frac{S_q}{N}=\bar{q}\ln\!\left(C\bar{q}^{3/2}\right)-\frac{a\bar{q}^{2}}{2b^{2}\sigma^{2}}, \qquad (18)$$
where $\bar{q}=q/N$. Now consider the entropy associated with forming annealed loops only. Toward this end, consider the well-known problem of a homopolymer adsorbing on a chemically homogeneous surface, since in that case the loops are annealed. Let the energy bonus for segmental adsorption be represented by a potential $\phi(z)$ which is zero everywhere except at the surface; z is the coordinate normal to the surface. The effective energy bonus for segmental adsorption, $\ln\beta$, is then given by the
following formula:
$$\ln\beta=\ln\int dz\left[e^{-\phi(z)/k_BT}-1\right]. \qquad (19)$$
As noted earlier, in this case, the loop lengths and the distance between loop ends can fluctuate with the only constraint being the fixed chain length. This problem is similar to the adsorption of a homopolymer chain (which lives in three dimensions) to a point, and its solution is presented in [14]. This method can be adapted to solve the problem we are considering. The main difference between the problem considered in [14] and our concern is that the potential is imposed by an impenetrable two-dimensional manifold rather than a point. This imposes certain additional symmetries. Exploiting these symmetries, it is easy to show [26] that the following Schrödinger-like equation describes the problem under consideration:
$$[1+\beta\,\delta(z)]\,\hat{g}\,\psi(z)=\Lambda\,\psi(z), \qquad (20)$$
where $\hat{g}$ is the standard connectivity operator, and the eigenfunction $\psi$ and the eigenvalue $\Lambda$ have their usual meaning. Very simple manipulations (described in [14]) lead to the following relationship between $\beta$ and $\Lambda$:
$$\frac{1}{\beta}=\frac{1}{2\pi}\int dk\,\frac{g(k)}{\Lambda-g(k)}, \qquad (21)$$
where k is the Fourier variable conjugate to z. We seek the entropy corresponding to the formation of P annealed loops. This can be obtained by first deriving a relationship between $\beta$ and P. In order to find this relationship, it is convenient to define a generating function $z(N,\beta)$ as follows:
$$z(N,\beta)=\sum_{P}\beta^{P}Z(N,P), \qquad (22)$$
where Z(N, P) is the partition function for a chain of length N with P annealed loops. Of course, this generating function is related to the eigenvalue in Eq. (20) in the usual way; i.e., $\Lambda^{N}=z(N,\beta)$. For adsorption problems such as this, the ground state dominance approximation is appropriate [14,47]. This implies that we may evaluate the sum in Eq. (22) using the saddle point approximation. In other words,
$$P=\beta\,\frac{\partial\ln z}{\partial\beta}. \qquad (23)$$
With this approximation, making use of the relationships between $\beta$ and $\Lambda$ (Eq. (21)) and that between $\Lambda$ and the generating function allows us to obtain the equilibrium value for P/N. Specifically, we find that
$$\frac{P}{N}=\frac{1}{\Lambda}\,\frac{\int dk\,g(k)/[\Lambda-g(k)]}{\int dk\,g(k)/[\Lambda-g(k)]^{2}}. \qquad (24)$$
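Eqs. (21) and (24) are simple to evaluate numerically once a form for $g(k)$ is chosen. The Python sketch below assumes the Gaussian connectivity factor $g(k)=\exp(-k^{2}a^{2}/6)$ with unit segment length, which is a common choice but is an assumption here rather than something specified in the text. It evaluates the adsorption parameter from Eq. (21) and the loop fraction from Eq. (24) at a given eigenvalue, written `lam` in the code, using a plain trapezoid rule.

```python
import math


def g(k, a=1.0):
    """Assumed Gaussian connectivity factor g(k) = exp(-k**2 a**2 / 6)."""
    return math.exp(-(k * a) ** 2 / 6.0)


def _integral(lam, power, k_max=20.0, steps=8000):
    """Trapezoid estimate of (1/2pi) * int dk g(k) / (lam - g(k))**power.

    Requires lam > 1 so that the denominator never vanishes (g(k) <= 1).
    """
    h = 2.0 * k_max / steps
    total = 0.0
    for i in range(steps + 1):
        k = -k_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(k) / (lam - g(k)) ** power
    return total * h / (2.0 * math.pi)


def beta_of(lam):
    """Adsorption parameter beta as a function of the eigenvalue, Eq. (21)."""
    return 1.0 / _integral(lam, 1)


def loop_fraction(lam):
    """Equilibrium P/N as a function of the eigenvalue, Eq. (24)."""
    return _integral(lam, 1) / (lam * _integral(lam, 2))
```

Scanning `lam` over values greater than one and pairing `beta_of(lam)` with `loop_fraction(lam)` traces out P/N as a function of the adsorption strength. Since $\ln z = N\ln\Lambda$, Eq. (23) says that the loop fraction should equal the logarithmic derivative of the eigenvalue with respect to the adsorption parameter, which gives a convenient finite-difference consistency check.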
The entropy is now easy to calculate as the free energy equals $-N\ln\Lambda$ and the energy bonus associated with segmental adsorption is $\ln\beta$. We find that
$$\frac{S}{N}=-\frac{F}{NT}+\frac{E}{NT}=\ln\Lambda-\frac{P}{N}\ln\!\left[\frac{1}{2\pi}\int dk\,\frac{g(k)}{\Lambda-g(k)}\right]. \qquad (25)$$
Eqs. (21) and (24) can be solved simultaneously to obtain P as a function of $\beta$, and Eqs. (21) and (25) yield the relationship between the entropy and $\beta$. Thus, we obtain the entropy as a function of P. Now, we need to compute the entropy when quenched and annealed loops coexist. On physical grounds, it is clear that the total number of segment–surface contacts is greater than or equal to the number of good contacts. This implies that the annealed loops live within the quenched ones. This is illustrated schematically in Fig. 5. The annealed loops can redistribute among the quenched ones in an unconstrained manner. It is most convenient to consider this physics in the grand canonical ensemble, whence we can say that the chemical potential of the annealed loops is the same in all the quenched loops. The chemical potential can be easily calculated from the equations we have derived so far since it equals $-\partial S/\partial P$. One finds that it is a monotonic function of P. This fact, when combined with the observation that the chemical potential of annealed loops must be equal in the quenched loops, leads to the conclusion that the concentration of annealed loops is the same in each quenched loop. By concentration, we mean the quantity $p_a$ = number of annealed loops/length of quenched loop. These remarks allow us to properly combine the formulas for the entropies of quenched and annealed loops (Eqs. (18), (21), (24) and (25)). Noting that there are q quenched loops (q good contacts), that $p_a=\bar{p}-\bar{q}$, and that $p_a$ is the same in each quenched loop leads us to the following expression for the loop entropy corresponding to p adsorbed segments of which q are good contacts:
$$\frac{S_{\mathrm{loop}}}{N}=\bar{q}\ln P(1/\bar{q},d)+s(p_a), \qquad (26)$$
where P is the probability written down in Eq. (17), and $s(p_a)$ is the entropy of annealed loops with concentration $p_a$ divided by n. The latter quantity is obtained from Eqs. (21), (24), and (25) with one
Fig. 5. Schematic depicting quenched and annealed loops. The darkly shaded loops correspond to loops with both ends being good contacts. The lightly shaded loops do not have good contacts at the end and exhibit significant fluctuations.
modification. The calculation leading to these equations considered a uniform surface. We are concerned with a situation where adsorption can only occur on particular surface sites which do not uniformly cover the surface. The concomitant entropy loss is proportional to $\ln\sigma$, and is accounted for by Chakraborty and Bratko [26]. Combining Eqs. (14) and (26), we obtain the free energy density f as a function of the order parameters $\bar{p}$ and $\bar{q}$ to be:
$$f=\bar{q}\,\delta E+(\omega+E_b)\bar{p}+\bar{p}\ln\bar{p}+(1-\bar{p})\ln(1-\bar{p})-\bar{q}\ln\!\left(C\bar{q}^{3/2}\right)+\frac{a\bar{q}^{2}}{2\sigma^{2}b^{2}}-s(\bar{p}-\bar{q})-(\bar{p}-\bar{q})\ln\sigma. \qquad (27)$$
These two order parameters can be further related by noting that
$$\bar{q}=\bar{p}\,P, \qquad (28)$$
where P is the probability of making a good contact. At infinite temperature, when energetic considerations are irrelevant, P is simply related to the statistics of the sequence and surface site distributions. Let us denote this intrinsic probability for making good contacts by $P_0$. It has been conjectured [26] that the relationship between $P_0$ and the sequence and surface site statistics is the following: $P_0=\sum_m P_{\mathrm{seq}}(m)P_{\mathrm{surf}}(m)$. Here $P_{\mathrm{seq}}(m)$ is the probability of finding a block of length m of like segments on the chain sequence, and $P_{\mathrm{surf}}(m)$ is the probability of finding a patch of size m of like sites on the surface. This conjecture has been found to be consistent with Monte-Carlo simulation results [24,25]. Note that given the statistics that describe the DHP chain sequence and surface site distributions, $P_{\mathrm{seq}}(m)$ and $P_{\mathrm{surf}}(m)$ can be easily computed. At finite temperatures, the probability of making good contacts is modified by entropic considerations, and $P_0$ must be weighted by the free energy for making good contacts in the following manner:
$$P=\frac{P_0\,e^{-\beta F_q}}{P_0\,e^{-\beta F_q}+(1-P_0)\,e^{-\beta F_a}}, \qquad (29)$$
where $F_q$ and $F_a$ are the free energies associated with quenched and annealed contacts (loops), and can be calculated from the equations derived earlier. Eqs. (27)–(29) need to be solved numerically to obtain the values of the order parameters $\bar{p}$ and $\bar{q}$ which minimize the free energy. Chakraborty and Bratko [26] have obtained this mean-field solution. The purpose of this exercise is to obtain some insight into the origin of the sharp transition. Thus, let us consider results for the simplest possible scenario: uncorrelated sequence and surface site fluctuations. In this case, $P_0$ is simply proportional to the product of the widths of the two statistical distributions. Fig. 6 shows the variation of $\bar{p}$ and $\bar{q}$ with $\sigma$, the width of the distribution that characterizes the surface site fluctuations. We have taken the statistics of the DHP sequence to be fixed in constructing Fig. 6.
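The two ingredients of Eq. (29) are easy to sketch in code. The Python fragment below computes the intrinsic probability as the overlap sum of a block-length distribution on the chain and a patch-size distribution on the surface, and then applies the finite-temperature weighting. The geometric block-length distribution is purely a hypothetical example of "statistics that describe the sequence", and the free energies passed to the weighting function are placeholder inputs rather than values derived from Eqs. (18)–(25).

```python
import math


def block_length_distribution(p_same, m_max=200):
    """Hypothetical geometric distribution for runs of like units.

    P(m) is proportional to p_same**(m - 1) for m = 1..m_max.
    """
    probs = [(1.0 - p_same) * p_same ** (m - 1) for m in range(1, m_max + 1)]
    norm = sum(probs)
    return [p / norm for p in probs]


def intrinsic_good_contact_probability(p_seq, p_surf):
    """Conjectured overlap P_0 = sum_m P_seq(m) * P_surf(m)."""
    return sum(a * b for a, b in zip(p_seq, p_surf))


def good_contact_probability(p0, f_quenched, f_annealed, beta=1.0):
    """Finite-temperature weighting of P_0 in the form of Eq. (29)."""
    w_q = p0 * math.exp(-beta * f_quenched)
    w_a = (1.0 - p0) * math.exp(-beta * f_annealed)
    return w_q / (w_q + w_a)


p0_example = intrinsic_good_contact_probability(
    block_length_distribution(0.5), block_length_distribution(0.5)
)
```

When the two free energies are equal the weighting leaves the intrinsic probability unchanged, and lowering the free energy of quenched contacts raises the probability of good contacts above its intrinsic value.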
Thus, points on the abscissa in this figure correspond to different surfaces. Fig. 6 shows that when $\sigma$ is small, both $\bar{p}$ and $\bar{q}$ are zero. This is simply a reflection of the fact that the energetic advantage associated with segmental binding is not sufficient to overcome the concomitant entropic penalty. This is because the number of binding sites available on these surfaces is not sufficient. When $\sigma$ becomes sufficiently large, adsorption does occur. However, it is
Fig. 6. The order parameters $\bar{p}$ (solid line) and $\bar{q}$ (dotted line) plotted as a function of $\sigma$. For the calculation described in the text, each point on the abscissa corresponds to a different surface.
important to note that the number of good contacts is very small in this weak adsorption limit. Above a threshold value of $\sigma$, our theory predicts a sharp transition from weak to strong adsorption. This transition coincides with a jump in $\bar{q}$, signifying that now the preponderance of contacts are good ones. The entropy is now dominated by that associated with the quenched loops, and is low because loop fluctuations are suppressed. Notice that after the sharp transition $\bar{q}$ grows faster than $\bar{p}$, and ultimately approaches $\bar{p}$. These results provide evidence for the argument made earlier that the suppression of loop fluctuations is the origin of the sharp transition from weak to strong adsorption. This is evident from the result that the number of good contacts (quenched loops) jumps at this transition. A preponderance of good contacts implies a strong suppression of loop fluctuations; i.e., we have a situation resembling the adsorption of a stiff chain, a case for which the adsorption transition is known to be first order [14]. In some ways, the phenomenon we are considering resembles protein folding (or heteropolymeric models of folding). In the latter situation, a first-order transition called the coil–globule transition occurs wherein the preponderance of contacts become native ones. This is followed by a continuous transition to the final low entropy folded state. The sharp transition we see may be considered the analog of the coil–globule transition. The fact that in the strongly adsorbed state the quenched loops dominate is very significant. The dominance of quenched loops implies that the chain adopts a small number of conformations characterized by a certain distribution of loops specific to the sequence and surface site distributions. Only small fluctuations around these conformations (shapes) of the adsorbed chain occur after the transition.
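The quenched-loop entropy that dominates after the transition can be checked directly against the Gaussian loop factor from which it is built. The Python sketch below assumes a loop factor of the form $C\,n^{-3/2}\exp(-d^{2}/2nb^{2})$, sets the loop length to $1/\bar{q}$, and takes $d^{2}=a/\sigma^{2}$, where the constant a is an assumed proportionality factor for the most probable end separation. Under those assumptions the entropy per segment, $\bar{q}\ln P(1/\bar{q},d)$, reduces to the closed form of Eq. (18).

```python
import math


def loop_factor(n, d, C=1.0, b=1.0):
    """Gaussian loop factor, assuming an n**(-3/2) prefactor (see Eq. (17))."""
    return C * n ** -1.5 * math.exp(-d * d / (2.0 * n * b * b))


def quenched_entropy_from_loop_factor(q_bar, sigma, a=1.0, C=1.0, b=1.0):
    """Entropy per segment of q_bar * N quenched loops, q_bar * ln P(1/q_bar, d)."""
    n = 1.0 / q_bar              # average loop length
    d = math.sqrt(a) / sigma     # assumed most probable end separation
    return q_bar * math.log(loop_factor(n, d, C, b))


def quenched_entropy_closed(q_bar, sigma, a=1.0, C=1.0, b=1.0):
    """Closed form: q ln(C q**1.5) - a q**2 / (2 b**2 sigma**2)."""
    return (q_bar * math.log(C * q_bar ** 1.5)
            - a * q_bar ** 2 / (2.0 * b * b * sigma ** 2))
```

For C of order one both terms are negative over the physical range of $\bar{q}$, which is the quantitative content of the statement that the entropy is low once quenched loops dominate.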
This adoption of a small class of shapes makes the phenomenon we are considering richer than other successful efforts to elucidate strategies that can localize chains to certain regions of a surface [29–31]. This feature was also signaled in the replica field theory by broken replica symmetry coinciding with the transition from weak to strong adsorption. In the simple model that we have just discussed, the structure of the quenched loops is measured only by the average length, $1/\bar{q}$. Notice that even this quantity is determined by the probability of good contacts; i.e., the statistics of the sequence and surface site distributions. This suggests that the class
of shapes that are adopted upon strong adsorption is determined by the statistical patterns on the chain and the surface. This issue of the emergence of a particular class of shapes (conformations) upon recognition will be explored in great detail later when we consider the kinetics of pattern recognition, and the suggestions of the thermodynamic model we have been considering will become vivid.

2.2. Monte-Carlo simulations of thermodynamic properties

The predictions of the models we have been considering are consistent with a series of Monte-Carlo simulations designed to compute thermodynamic properties [24,25,27]. These studies were carried out using an adaptation of the non-dynamic ensemble growth method pioneered by Higgs and Orland [48,49]. The simulations were carried out on a cubic lattice. The introduction of the lattice does affect quantitative predictions. However, the phenomenology is expected to be the same for reasons that have been explained in detail in [48]. A particular sequence is first drawn from the statistical distribution under consideration. A particular realization of the surface site distribution is also drawn from the statistical distribution under consideration. M monomers are then placed randomly with Boltzmann probabilities dictating the positional probabilities. These positions are allowed to vary between 0 and 2N where N is the length of the polymer we wish to simulate. This is equivalent to studying isolated chains confined between identical surfaces separated by a distance, 4N. One then attempts to add a second segment of type A or B (as specified by the particular realization of the sequence) at the end of each monomer. In other words, 6M trials are made. M dimers are then chosen with Boltzmann probabilities. The potential energy is determined by intersegment interactions and interactions with the surface sites; excluded volume interactions are enforced.
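The growth step described above can be sketched in a few lines of Python. This is a simplified illustration, not the code used in [24,25,27]: it keeps only a single surface at z = 0, uses a hypothetical contact energy of $-\epsilon$ times the product of the segment type and the site type (both coded as +1 or -1), omits intersegment interactions other than excluded volume, and selects the M surviving chains from the 6M trials with Boltzmann weights of the newly added segment. All names and parameter values are assumptions made for the example.

```python
import math
import random

# The six nearest-neighbor steps on a cubic lattice.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def segment_energy(pos, seg_type, surface, eps):
    """Hypothetical contact energy for a segment in the surface layer z = 0."""
    x, y, z = pos
    if z != 0:
        return 0.0
    size = len(surface)
    return -eps * seg_type * surface[x % size][y % size]


def grow_ensemble(sequence, surface, n_chains, eps=1.0, kT=1.0, rng=None):
    """Grow n_chains self-avoiding chains segment by segment."""
    rng = rng or random.Random(0)
    n = len(sequence)
    size = len(surface)
    heights = list(range(2 * n + 1))  # first monomers placed at z = 0..2N
    chains = []
    for _ in range(n_chains):
        x, y = rng.randrange(size), rng.randrange(size)
        w = [math.exp(-segment_energy((x, y, z), sequence[0], surface, eps) / kT)
             for z in heights]
        chains.append([(x, y, rng.choices(heights, weights=w)[0])])
    for seg_type in sequence[1:]:
        trials, weights = [], []
        for chain in chains:  # six trial positions per chain: 6M trials in all
            occupied = set(chain)
            ex, ey, ez = chain[-1]
            for dx, dy, dz in NEIGHBORS:
                pos = (ex + dx, ey + dy, ez + dz)
                allowed = pos[2] >= 0 and pos not in occupied
                trials.append(chain + [pos])
                weights.append(
                    math.exp(-segment_energy(pos, seg_type, surface, eps) / kT)
                    if allowed else 0.0
                )
        if sum(weights) == 0.0:
            raise RuntimeError("every trial move is blocked")
        # Keep M chains, chosen with Boltzmann probabilities (with replacement).
        chains = rng.choices(trials, weights=weights, k=n_chains)
    return chains
```

A typical call draws a sequence and a surface realization first, for example `grow_ensemble([1, -1] * 6, surface, 50)` with `surface` a square list of lists of +1 and -1 entries. Thermodynamic averages would then be taken over the returned population and over many realizations of the disorder.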
This process is continued until chains with the desired length have been grown. For M
g (Nx )!g (Nx ) , ;h (Nx , Nx )#(PM)g (Mx )g (Mx ) x !x (2p)C "6PMh (Mx , Mx )#6+PMg (Nx )g (Mx )H (Mx !Mx , Mx ) # (PM)g (Mx )g (Mx )g (Mx )h (Nx , Nx ), (A.9) and s "qb/6 for i"1, 2 and s "(q #q )b/6. The functions C (q , q , q ) are G G G (2p)C "24Nh (Nx , Nx , Nx ) , (2p)C "96PMNg (Mx )h (Nx , Nx , Nx ) , (2p)C "12+2PMNH (!Mx , Mx #Mx )[2h (Nx , Nx )#h (Nx , Nx )] # 4(PMN)g (Mx )g (Mx )h (Nx , Nx , Nx ) (PM) g (Nx )!g (Nx ) g (Nx )!g (Nx ) ! # g (Mx )g (Mx ) x !x x !x x !x 4(PMN) # g (Mx )g (Mx )[h (Nx , Nx )!h (Nx , Nx )] N(x !x ) # 2(PMN)g (Mx )g (Mx )h (Nx , Nx , Nx ), , (A.10)
(2p)C "8 3NPMg (Nx )H (Mx !Mx , Mx !Mx , Mx ) 12PM h (Nx , Nx )H (!Mx , Mx #Mx )g (Mx ) x g (Nx )!g (Nx ) H (!Mx , Mx #Mx ) # 3PMg (Mx ) x !x #
#6
(PM) g (Mx )g (Mx )g (Mx )[h (Nx , Nx ) x
! H (Nx !Nx , Nx !Nx , Nx )]
h (Nx , Nx )!h (Nx , Nx ) , # 6(PM)g (Mx )g (Mx )g (Mx ) x !x (2p)C "24PMh (Mx , Mx , Mx ) # 24PMg (Mx )H (Mx !Mx , Mx !Mx , Mx )g (Nx ) # 12PMg (Nx )H (!Mx , Mx #Mx )H (!Mx , Mx #Mx ) # 72PMg (Mx )g (Mx )h (Nx , Nx )H (!Mx , Mx #Mx ) # 24(PM)g (Mx )g (Mx )g (Mx )g (Mx )h (Nx , Nx , Nx ) , where s "qb/6 for i"1, 2, 3, and s "(q #q #q )b/6, s "(q #q )b/6, s " G G (q #q )b/6 and s "(q #q )b/6. The following functions have been de"ned for simplicity:
h (y )"
L dn exp[!(n !n )y ] dn
1!g (y ) g (y ) " , " 2 y
(A.11)
L L dn dn exp[!(n !n )y !(n !n )y ] dn
h (y , y )"
g (y )!g (y ) 1 , h (y )! " y !y y h (y , y , y )"
dn
L
dn
L
dn
L
dn exp[!(n !n )y !(n !n )y
!(n !n )y ]
g (y )!g (y ) g (y )!g (y ) 1 1 ! h (y , y )! " y !y y !y y !y y H (y , y )"
dn
L
dn exp[!n y !n y ]
1 " [g (y )!g (y #y )] , y
,
H (y , y , y )"
L L dn dn exp[!n y !n y !n y ] dn
1 " [H (y , y )!H (y #y , y )] . y Combining all these expressions yields
Q K 1 K 1 i "1# ! dql ? cl (ql ?)M(ql ?)cl (ql ?)# dql ? dql ? < < 2 6 ? [C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? ) # C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? ) # C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? ) # C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? )] #
1 dql ? dql ? dql ? [C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) 24
# C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) # C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) # C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) # C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? )]
1 K (A.12) dql ? dql @1cl (ql ?)M (ql ?)cl (ql ?)cl (ql @)M (ql @)cl (ql @)2 4 ?$@ where the matrix M is .> 2 +(q !q )x#exp[!(q !q )x]!1, (M ) (q)" H\ H H\ x H H .> exp(q x)!exp(q x) exp(!q x)!exp(!q x) G G\ ; H\ H , #2 (A.13) x x GH 1!exp(!Mx) . exp[!x(q !q )] , (M ) (q)"PMg(Mx)#2 H G x GH 1!exp(!Mx) .> . exp(!x"q !q ")!exp(!x"q !q ") H G H G\ (M ) (q)"(M ) (q)" x x G H #
Combining Eqs. (A.1) and (A.11) yields 1RK2 as a function of the order parameters and conjugate fields. We now evaluate the functional integrals over the conjugate fields using a saddle point approximation; i.e., d ln1RK2[o , o , c , c ] "0 , dc? (!ql ) d ln1RK2[o , o , c , c ] "0 . dc? (!ql )
(A.14)
Solving the above equations obtains 1S[o , o ]2"lim [1R2 [o , o ]!1]/k as a functional I of o and o :
i < dql dql [C (ql , ql )c (ql )c (ql )c (!ql !ql ) 1S[o , o ]2"! dql ol =ol # 6< 2 # C (ql , ql )c (ql )c (ql )c (!ql !ql ) # C (ql , ql )c (ql )c (ql )c (!ql !ql ) # C (ql , ql )c (ql )c (ql )c (!ql !ql )] #
1 dql dql dql [C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) 24
, and with the experimental data on the hyperfine splitting in muonium, hydrogen and deuterium. © 2001 Elsevier Science B.V. All rights reserved.
PACS: 12.20.-m; 31.30.-r; 32.10.Fn; 31.15.Ar
Keywords: Quantum electrodynamics; Bound states; Fine and hyperfine structure; Lamb shift; Radiative corrections; Recoil corrections; Radiative-recoil corrections
M.I. Eides et al. / Physics Reports 342 (2001) 63-261
1. Introduction

Light one-electron atoms are a classical subject of quantum physics. The very discovery and further progress of quantum mechanics is intimately connected to the explanation of the main features of hydrogen energy levels. Each step in the development of quantum physics led to a better understanding of bound state physics. The Bohr quantization rules of the old quantum theory were created in order to explain the existence of stable discrete energy levels. The nonrelativistic quantum mechanics of Heisenberg and Schrödinger provided a self-consistent scheme for the description of bound states. The relativistic spin one half Dirac equation quantitatively described the main experimental features of the hydrogen spectrum. Discovery of the Lamb shift [1], a subtle discrepancy between the predictions of the Dirac equation and the experimental data, triggered the development of modern relativistic quantum electrodynamics, and subsequently the Standard Model of modern physics. Despite its long and rich history the theory of atomic bound states is still very much alive today. New importance was given to bound state physics by the development of quantum chromodynamics, the modern theory of strong interactions. It was realized that all hadrons, once thought to be the elementary building blocks of matter, are themselves atom-like bound states of elementary quarks bound by the color forces. Hence, from a modern point of view, the theory of atomic bound states could be considered as a theoretical laboratory and testing ground for exploration of the subtle properties of bound state physics, free from further complications connected with the nonperturbative effects of quantum chromodynamics, which play an especially important role in the case of light hadrons.
The quantum electrodynamics and quantum chromodynamics bound state theories are so intimately intertwined today that one often finds theoretical research where new results are obtained simultaneously, say for positronium and also heavy quarkonium. The other powerful stimulus for further development of the bound state theory is provided by the spectacular experimental progress in precise measurements of atomic energy levels. It suffices to mention that the relative uncertainty of measurement of the frequency of the 1S-2S transition in hydrogen was reduced during the last decade by three orders of magnitude, from $3 \times 10^{-10}$ [2] to $3.4 \times 10^{-13}$ [3]. The relative uncertainty in measurement of the muonium hyperfine splitting was reduced recently by a factor of 3, from $3.6 \times 10^{-8}$ [4] to $1.2 \times 10^{-8}$ [5]. This experimental development was matched in recent years by rapid theoretical progress, and we feel that now is a good time to review bound state theory. The theory of hydrogenic bound states is widely described in the literature. The basics of nonrelativistic theory are contained in any textbook on quantum mechanics, and the relativistic Dirac equation and the Lamb shift are discussed in any textbook on quantum electrodynamics and quantum field theory. An excellent source for the early results is the classic book by Bethe and Salpeter [6]. The last comprehensive review of the theory [7] was published more than ten years ago. A number of reviews were published recently which contain new theoretical results [8-15]. However, a coherent discussion of the modern status of the theory, to the best of our knowledge, is missing in the literature, and we will try to provide this in the current paper. Our goal here is to present a state of the art discussion of the theory of the Lamb shift and hyperfine splitting in light hydrogenlike atoms. In the body of the paper the spin-independent corrections are discussed mainly as corrections to the hydrogen energy levels (see Fig. 1), and the
Fig. 1. Hydrogen energy levels.
theory of hyperfine splitting is discussed in the context of the hyperfine splitting in the ground state of muonium (see Fig. 2). These two simple atomic systems are singled out for practical reasons, because highly precise experimental data exist in both cases, and the most accurate theoretical results are also obtained for these cases. However, almost all formulae in this review are valid also for other light hydrogenlike systems, and some of these other applications, including muonic atoms, will be discussed in the text as well. We will present all theoretical results in the field, with emphasis on more recent results which either were not discussed in sufficient detail in the previous theoretical reviews [6,7], or simply did not exist when the reviews were written. Our emphasis on the theory means that, besides presenting an exhaustive compendium of theoretical results, we will also try to present a qualitative discussion of the origin and magnitude of different corrections to the energy levels, to give, when possible, semiquantitative estimates of expected magnitudes, and to describe the main steps of the theoretical calculations and the new effective methods which were developed in recent years. We will not attempt to present a detailed comparison of theory with the latest experimental results, leaving this task to the experimentalists. We will use the experimental results only for illustrative purposes. The paper consists of three main parts. In the introductory part we briefly remind the reader of the main characteristic features of bound state physics. Then follows a detailed discussion of the
Fig. 2. Muonium energy levels.
corrections to the energy levels which do not depend on the nuclear spin. The last third of the paper is devoted to a systematic discussion of the physics of hyperfine splitting. Different corrections to the energy levels are ordered with respect to the natural small parameters $\alpha$, $Z\alpha$, $m/M$ and nonelectrodynamic parameters like the ratio of the nucleon size to the radius of the first Bohr orbit. These parameters have a transparent physical nature in the light hydrogenlike atoms. Powers of $\alpha$ describe the order of quantum electrodynamic corrections to the energy levels, the parameter $Z\alpha$ describes the order of relativistic corrections to the energy levels, and the small mass ratio of the light and heavy particles is responsible for the recoil effects beyond the reduced mass parameter present in a relativistic bound state. Corrections which depend both on the quantum electrodynamic parameter $\alpha$ and the relativistic parameter $Z\alpha$ are ordered in a series over $\alpha$ at fixed power of $Z\alpha$, contrary to the common practice accepted in the physics of highly charged ions with large Z. This ordering is more natural from the point of view of nonrelativistic bound state physics, since all radiative corrections to a contribution of a definite order in the nonrelativistic expansion originate from the same distances and describe the same physics, while the radiative corrections to the different terms in the nonrelativistic expansion over $Z\alpha$ of the same order in $\alpha$ are generated at vastly different distances and could have drastically different magnitudes. A few remarks about our notation. All formulae below are written for the energy shifts. However, not energies but frequencies are measured in the spectroscopic experiments. The formulae for the energy shifts are converted to the respective expressions for the frequencies with the help of the
We will return to a more detailed discussion of the role of different small parameters below.
De Broglie relationship $E = h\nu$. We will ignore the difference between the energy and frequency units in our theoretical discussion. Comparison of the theoretical expressions with the experimental data will always be done in the frequency units, since transition to the energy units leads to loss of accuracy. All numerous contributions to the energy levels in different sections of this paper are generically called $\Delta E$ and as a rule do not carry any specific labels, but it is understood that they are all different. Let us mention briefly some of the closely related subjects which are not considered in this review. The physics of the high Z ions is nowadays a vast and well developed field of research, with its own problems, approaches and tools, which in many respects are quite different from the physics of low Z systems. We discuss below the numerical results obtained in the high Z calculations only when they have a direct relevance for the low Z atoms. The reader can find a detailed discussion of the high Z physics in a number of recent reviews (see, e.g., [16]). In trying to preserve a reasonable size of this review we decided to omit discussion of positronium, even though many theoretical expressions below are written in such form that for the case of equal masses they turn into the respective corrections for the positronium energy levels. Positronium is qualitatively different from hydrogen and muonium not only due to the equality of the masses of its constituents, but also because, unlike the other light atoms, there exists a whole new class of corrections to the positronium energy levels generated by the annihilation channel, which is absent in other cases. Our discussion of the new theoretical methods will be incomplete due to omission of the recently developed and now popular nonrelativistic QED (NRQED) [17], which was especially useful in the positronium calculations, but was rarely used in the hydrogen and muonium physics.
Very lucid presentations of NRQED exist in the recent literature (see, e.g., [18]).
2. Theoretical approaches to the energy levels of loosely bound systems

2.1. Nonrelativistic electron in the Coulomb field

In the first approximation, energy levels of one-electron atoms are described by the solutions of the Schrödinger equation for an electron in the field of an infinitely heavy Coulomb center with charge Z in terms of the proton charge

\[
\left(-\frac{\Delta}{2m} - \frac{Z\alpha}{r}\right)\psi(\mathbf{r}) = E_n\,\psi(\mathbf{r}),
\]
\[
\psi_{nlm}(\mathbf{r}) = R_{nl}(r)\,Y_{lm}\!\left(\frac{\mathbf{r}}{r}\right),
\]
\[
E_n = -\frac{m(Z\alpha)^2}{2n^2}, \qquad n = 1, 2, 3, \ldots,
\tag{1}
\]
where n is called the principal quantum number (we are using the system of units where $\hbar = c = 1$). Besides the principal quantum number n each state is described by the value of the orbital angular momentum $l = 0, 1, \ldots, n-1$, and the projection of the orbital angular momentum $m = 0, \pm 1, \ldots, \pm l$. In the nonrelativistic Coulomb problem all states with different orbital angular momentum but the same principal quantum number n have the same energy, and the energy levels of the Schrödinger equation in the Coulomb field are n-fold degenerate with respect to the orbital angular momentum quantum number l. As in any spherically symmetric problem, the energy levels in the Coulomb field do not depend on the projection of the orbital angular momentum on an arbitrary axis, and each energy level with given l is additionally $(2l+1)$-fold degenerate. Straightforward calculation of the characteristic values of the velocity, Coulomb potential and kinetic energy in the stationary states gives
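\[
\langle n|v^2|n\rangle = \left\langle n\left|\frac{\mathbf{p}^2}{m^2}\right|n\right\rangle = \frac{(Z\alpha)^2}{n^2},
\]
\[
\left\langle n\left|\frac{Z\alpha}{r}\right|n\right\rangle = \frac{m(Z\alpha)^2}{n^2},
\]
\[
\left\langle n\left|\frac{\mathbf{p}^2}{2m}\right|n\right\rangle = \frac{m(Z\alpha)^2}{2n^2}.
\tag{2}
\]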
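As a quick numerical illustration of Eqs. (1) and (2), the following sketch evaluates the Bohr levels, their orbital degeneracy and the characteristic expectation values for hydrogen. The function names and the rounded input values of $\alpha$ and of the electron mass in eV are assumptions introduced here for illustration only; units are $\hbar = c = 1$ and spin is ignored.

```python
# Illustrative sketch of Eqs. (1) and (2); names and rounded constants are assumptions.
ALPHA = 1 / 137.035999      # fine structure constant (rounded)
M_ELECTRON_EV = 510998.95   # electron mass in eV (rounded)


def energy_level(n, Z=1, m=M_ELECTRON_EV):
    """Schroedinger-Coulomb energy E_n = -m (Z alpha)^2 / (2 n^2), Eq. (1)."""
    return -m * (Z * ALPHA) ** 2 / (2 * n ** 2)


def degeneracy(n):
    """Number of (l, m) states at fixed n: sum over l = 0..n-1 of (2l + 1)."""
    return sum(2 * l + 1 for l in range(n))


def characteristic_values(n, Z=1, m=M_ELECTRON_EV):
    """Expectation values of Eq. (2) in the stationary state n."""
    za2 = (Z * ALPHA) ** 2
    return {
        "v2": za2 / n ** 2,                  # <v^2>, dimensionless
        "potential": m * za2 / n ** 2,       # <Z alpha / r>, in eV
        "kinetic": m * za2 / (2 * n ** 2),   # <p^2 / 2m>, in eV
    }


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, energy_level(n), degeneracy(n), characteristic_values(n))
```

The kinetic and potential values satisfy $\langle p^2/2m\rangle - \langle Z\alpha/r\rangle = E_n$, which is a simple consistency check between Eqs. (1) and (2).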
We see that, due to the smallness of the fine structure constant $\alpha$, a one-electron atom is a loosely bound nonrelativistic system and all relativistic effects may be treated as perturbations. There are three characteristic scales in the atom. The smallest is determined by the binding energy $\sim m(Z\alpha)^2$, the next is determined by the characteristic electron momenta $\sim mZ\alpha$, and the last one is of order of the electron mass m. Even in the framework of nonrelativistic quantum mechanics one can achieve a much better description of the hydrogen spectrum by taking into account the finite mass of the Coulomb center. Due to the nonrelativistic nature of the bound system under consideration, finiteness of the nucleus mass leads to substitution of the reduced mass instead of the electron mass in the formulae above. The finiteness of the nucleus mass introduces the largest energy scale in the bound system problem: the heavy particle mass.

2.2. Dirac electron in the Coulomb field

The relativistic dependence of the energy of a free classical particle on its momentum is described by the relativistic square root

\[
\sqrt{\mathbf{p}^2 + m^2} \approx m + \frac{\mathbf{p}^2}{2m} - \frac{\mathbf{p}^4}{8m^3} + \cdots.
\tag{3}
\]
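The size of the terms dropped in Eq. (3) can be checked directly. The sketch below compares the exact square root with the truncated series; the function names and the test momentum are assumptions chosen for illustration. The first neglected term of the binomial series is $+\mathbf{p}^6/(16 m^5)$.

```python
# Illustrative check of the expansion in Eq. (3); names are assumptions.
import math


def exact_energy(p, m=1.0):
    """Relativistic free-particle energy sqrt(p^2 + m^2)."""
    return math.sqrt(p * p + m * m)


def expanded_energy(p, m=1.0):
    """Series of Eq. (3) truncated after the p^4 term."""
    return m + p ** 2 / (2 * m) - p ** 4 / (8 * m ** 3)


def first_neglected_term(p, m=1.0):
    """Next term of the binomial series, + p^6 / (16 m^5)."""
    return p ** 6 / (16 * m ** 5)


if __name__ == "__main__":
    p = 0.1  # momentum in units of m, much larger than a typical atomic m * alpha
    print(exact_energy(p) - expanded_energy(p), first_neglected_term(p))
```

For a bound electron with $p \sim mZ\alpha$ the neglected term is of relative order $(Z\alpha)^6$, in line with the expansion in even powers of $Z\alpha$.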
The kinetic energy operator in the Schrödinger equation corresponds to the quadratic term in this nonrelativistic expansion, and thus the Schrödinger equation describes only the leading nonrelativistic approximation to the hydrogen energy levels.

We are interested in low-Z atoms in this paper. High-Z atoms cannot be treated as nonrelativistic systems, since an expansion in $Z\alpha$ is problematic.
The classical nonrelativistic expansion goes over $\mathbf{p}^2/m^2$. In the case of the loosely bound electron, the expansion in $\mathbf{p}^2/m^2$ corresponds to expansion in $(Z\alpha)^2$; hence, relativistic corrections are given by the expansion over even powers of $Z\alpha$. As we have seen above, from the explicit expressions for the energy levels in the Coulomb field, the same parameter $Z\alpha$ also characterizes the binding energy. For this reason, the parameter $Z\alpha$ is also often called the binding parameter, and the relativistic corrections carry the second name of binding corrections. Note that the series expansion for the relativistic corrections in the bound state problem goes literally over the binding parameter $Z\alpha$, unlike the case of the scattering problem in QED, where the expansion parameter always contains an additional factor $\pi$ in the denominator and the expansion typically goes over $\alpha/\pi$. This absence of the extra factor $\pi$ in the denominator of the expansion parameter is a typical feature of the Coulomb problem. As we will see below, in the combined expansions over $\alpha$ and $Z\alpha$, expansion over $\alpha$ at fixed power of the binding parameter $Z\alpha$ always goes over $\alpha/\pi$, as in the case of scattering. Loosely speaking one could call successive terms in the series over $Z\alpha$ the relativistic corrections, and successive terms in the expansion over $\alpha/\pi$ the loop or radiative corrections. For the bound electron, calculation of the relativistic corrections should also take into account the contributions due to its spin one half. Accounting for the spin one half does not change the fundamental fact that all relativistic (binding) corrections are described by the expansion in even powers of $Z\alpha$, as in the naive expansion of the classical relativistic square root in Eq. (3). Only the coefficients in this expansion change due to the presence of spin. A proper description of all relativistic corrections to the energy levels is given by the Dirac equation with a Coulomb source.
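The expansion in even powers of $Z\alpha$ can be checked numerically against the closed-form Dirac-Coulomb energy factor $f(n, j)$ quoted in Eqs. (4) and (5) of this section. The sketch below is only an illustration: the function names are assumptions, and a deliberately large value $Z\alpha = 0.1$ is suggested for the comparison so that the truncation error is well above floating-point noise.

```python
# Illustrative comparison of the exact Dirac factor f(n, j) with its series; names are assumptions.
import math

ALPHA = 1 / 137.035999  # fine structure constant (rounded)


def f_exact(n, j, za=ALPHA):
    """Closed form: [1 + (Za)^2 / (sqrt((j+1/2)^2 - (Za)^2) + n - j - 1/2)^2]^(-1/2)."""
    denom = math.sqrt((j + 0.5) ** 2 - za ** 2) + n - j - 0.5
    return 1.0 / math.sqrt(1.0 + (za / denom) ** 2)


def f_series(n, j, za=ALPHA):
    """Series in even powers of Za, kept through order (Za)^6."""
    k = j + 0.5
    x = za ** 2
    order2 = -x / (2 * n ** 2)
    order4 = -x ** 2 / (2 * n ** 3) * (1 / k - 3 / (4 * n))
    order6 = -x ** 3 / (8 * n ** 3) * (
        1 / k ** 3 + 3 / (n * k ** 2) + 5 / (2 * n ** 3) - 6 / (n ** 2 * k)
    )
    return 1.0 + order2 + order4 + order6


if __name__ == "__main__":
    for n, j in ((1, 0.5), (2, 0.5), (2, 1.5)):
        print(n, j, f_exact(n, j, 0.1) - f_series(n, j, 0.1))
```

Because $f$ depends only on n and j, states with the same n and j but different $l = j \pm 1/2$ come out degenerate in this formula.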
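All relativistic corrections may easily be obtained from the exact solution of the Dirac equation in the external Coulomb field (see, e.g., [19,20])

\[
E_{nj} = m f(n, j),
\tag{4}
\]

where

\[
f(n, j) = \left[1 + \frac{(Z\alpha)^2}{\left(\sqrt{(j+\tfrac{1}{2})^2 - (Z\alpha)^2} + n - j - \tfrac{1}{2}\right)^2}\right]^{-1/2}
\approx 1 - \frac{(Z\alpha)^2}{2n^2} - \frac{(Z\alpha)^4}{2n^3}\left[\frac{1}{j+\tfrac{1}{2}} - \frac{3}{4n}\right]
- \frac{(Z\alpha)^6}{8n^3}\left[\frac{1}{(j+\tfrac{1}{2})^3} + \frac{3}{n(j+\tfrac{1}{2})^2} + \frac{5}{2n^3} - \frac{6}{n^2(j+\tfrac{1}{2})}\right] + \cdots,
\tag{5}
\]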
and $j = 1/2, 3/2, \ldots, n - 1/2$ is the total angular momentum of the state. In the Dirac spectrum, energy levels with the same principal quantum number n but different total angular momentum j are split into n components of the fine structure, unlike the nonrelativistic Schrödinger spectrum where all levels with the same n are degenerate. However, not all degeneracy is lifted in the spectrum of the Dirac equation: the energy levels corresponding to the same n and j but different $l = j \pm 1/2$ remain doubly degenerate. This degeneracy is lifted by the corrections connected with the finite size of the Coulomb source, recoil contributions, and by the dominating QED loop contributions. The respective energy shifts are called the Lamb shifts
(see exact definition in Section 4.1) and will be one of the main subjects of discussion below. We would like to emphasize that the quantum mechanical (recoil and finite nuclear size) effects alone do not predict anything of the scale of the experimentally observed Lamb shift, which is thus essentially a quantum electrodynamic (field-theoretical) effect. One trivial improvement of the Dirac formula for the energy levels may easily be achieved if we take into account that, as was already discussed above, the electron motion in the Coulomb field is essentially nonrelativistic, and, hence, all contributions to the binding energy should contain as a factor the reduced mass $m_r$ of the electron-nucleus nonrelativistic system rather than the electron mass. Below we will consider the expression with the reduced mass factor

\[
E_{nj} = m + m_r\,[f(n, j) - 1],
\tag{6}
\]

rather than the naive expression in Eq. (4), as a starting point for calculation of corrections to the electron energy levels. In order to provide a solid starting point for further calculations the Dirac spectrum with the reduced mass dependence in Eq. (6) should be itself derived from QED (see Section 4.1), and not simply postulated on physical grounds as is done here.

2.3. Bethe-Salpeter equation and the effective Dirac equation

Quantum field theory provides an unambiguous way to find energy levels of any composite system. They are determined by the positions of the poles of the respective Green functions. This idea was first realized in the form of the Bethe-Salpeter (BS) equation for the two-particle Green function (see Fig. 3) [21]

\[
\hat{G} = S_0 + S_0 K_{BS} \hat{G},
\tag{7}
\]

where $S_0$ is a free two-particle Green function, the kernel $K_{BS}$ is a sum of all two-particle irreducible diagrams in Fig. 4, and $\hat{G}$ is the total two-particle Green function. At first glance the field-theoretical BS equation has nothing in common with the quantum mechanical Schrödinger and Dirac equations discussed above.
However, it is not too difficult to demonstrate that with selection of a certain subset of interaction kernels (ladder and crossed ladder), followed by some natural approximations, the BS eigenvalue equation reduces in the leading approximation, in the case of one light and one heavy constituent, to the Schrödinger or Dirac eigenvalue equations for a light particle in a field of a heavy Coulomb center. The basics of the BS equation are described in many textbooks (see, e.g., [20,22,23]), and many important results were obtained in the BS framework. However, calculations beyond the leading order in the original BS framework tend to be rather complicated and nontransparent. The reasons for these complications can be traced to the dependence of the BS wave function on the unphysical relative energy (or relative time), absence of
Fig. 3. Bethe}Salpeter equation.
Fig. 4. Kernel of the Bethe}Salpeter equation.
the exact solution in the zero-order approximation, nonreducibility of the ladder approximation to the Dirac equation when the mass of the heavy particle goes to infinity, etc. These difficulties are generated not only by the nonpotential nature of the bound state problem in quantum field theory, but also by the unphysical classification of diagrams with the help of the notion of two-body reducibility. As was known from the very beginning [21], there is a tendency to cancellation between the contributions of the ladder graphs and the graphs with crossed photons. However, in the original BS framework, these graphs are treated in profoundly different ways. It is quite natural, therefore, to seek such a modification of the BS equation that the crossed and ladder graphs play a more symmetrical role. One also would like to get rid of other drawbacks of the original BS formulation, preserving nevertheless its rigorous field-theoretical contents. The BS equation allows a wide range of modifications since one can freely modify both the zero-order propagation function and the leading order kernel, as long as these modifications are consistently taken into account in the rules for construction of the higher-order approximations, the latter being consistent with Eq. (7) for the two-particle Green function. A number of variants of the original BS equation were developed since its discovery (see, e.g., [24-28]). The guiding principle in almost all these approaches was to restructure the BS equation in such a way that it would acquire a three-dimensional form, a soluble and physically natural leading order approximation in the form of the Schrödinger or Dirac equations, and a more or less transparent and regular way for selection of the kernels relevant for calculation of the corrections of any required order. We will describe, in some detail, one such modification, an effective Dirac equation (EDE) which was derived in a number of papers [25-28].
This new equation is more convenient in many applications than the original BS equation, and we will derive some general formulae connected with this equation. The physical idea behind this approach is that in the case of a loosely bound system of two particles of different masses, the heavy particle spends almost all its life not far from its own mass shell. In such a case some kind of Dirac equation for the light particle in an external Coulomb field should be an excellent starting point for the perturbation theory expansion. Then it is convenient to choose the free two-particle propagator in the form of the product of the heavy particle mass shell projector $\Lambda$ and the free electron propagator

\[
\Lambda S(p, l, E) = 2\pi i\,\delta^{(+)}(p^2 - M^2)\,\frac{\not{p} + M}{\not{E} - \not{p} - m}\,(2\pi)^4 \delta^{(4)}(p - l),
\tag{8}
\]
where $p_\mu$ and $l_\mu$ are the momenta of the incoming and outgoing heavy particle, $E_\mu - p_\mu$ is the momentum of the incoming electron ($E = (E, \mathbf{0})$; this is the choice of the reference frame), and the $\gamma$-matrices associated with the light and heavy particles act only on the indices of the respective particle. The free propagator in Eq. (8) determines other building blocks and the form of a two-body equation equivalent to the BS equation, and the regular perturbation theory formulae in this case were obtained in [27,28].
Fig. 5. Series for the kernel of the effective Dirac equation.
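In order to derive these formulae let us first write the BS equation in Eq. (7) in an explicit form

\[
\hat{G}(p, l, E) = S_0(p, l, E) + \int \frac{d^4k}{(2\pi)^4} \int \frac{d^4q}{(2\pi)^4}\, S_0(p, k, E)\, K_{BS}(k, q, E)\, \hat{G}(q, l, E),
\tag{9}
\]

where

\[
S_0(p, l, E) = \frac{i}{\not{p} - M}\,\frac{i}{\not{E} - \not{l} - m}\,(2\pi)^4 \delta^{(4)}(p - l).
\tag{10}
\]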
The amputated two-particle Green function $G_T$ satisfies the equation

\[
G_T = K_{BS} + K_{BS} S_0 G_T.
\tag{11}
\]

A new kernel corresponding to the free two-particle propagator in Eq. (8) may be defined via this amputated two-particle Green function

\[
G_T = K + K \Lambda S\, G_T.
\tag{12}
\]

Comparing Eqs. (11) and (12) one easily obtains the diagrammatic series for the new kernel K (see Fig. 5)

\[
K(q, l, E) = [I - K_{BS}(S_0 - \Lambda S)]^{-1} K_{BS}
= K_{BS}(q, l, E) + \int \frac{d^4 r}{(2\pi)^4}\, K_{BS}(q, r, E)
\left\{ \frac{i}{\not{r} - M}\,\frac{i}{\not{E} - \not{r} - m} - 2\pi i\,\delta^{(+)}(r^2 - M^2)\,\frac{\not{r} + M}{\not{E} - \not{r} - m} \right\} K_{BS}(r, l, E) + \cdots.
\tag{13}
\]
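The operator algebra behind Eqs. (11)-(13) can be illustrated with a finite-dimensional toy model in which the kernels and propagators are replaced by small matrices. The sketch below uses arbitrary 2x2 rational matrices (purely hypothetical stand-ins for $K_{BS}$, $S_0$ and $\Lambda S$, with no physical meaning) and exact rational arithmetic to confirm that the kernel defined by the first equality in Eq. (13) reproduces the same $G_T$ through Eq. (12) as the original kernel does through Eq. (11).

```python
# Toy check of the algebra in Eqs. (11)-(13); the matrices are arbitrary stand-ins.
from fractions import Fraction as F

IDENTITY = [[F(1), F(0)], [F(0), F(1)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matadd(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def matsub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def matinv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ZeroDivisionError("singular matrix")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def solve_eq11(k_bs, s0):
    """G_T = (I - K_BS S_0)^(-1) K_BS, the formal solution of Eq. (11)."""
    return matmul(matinv(matsub(IDENTITY, matmul(k_bs, s0))), k_bs)


def kernel_eq13(k_bs, s0, lam_s):
    """K = [I - K_BS (S_0 - Lambda S)]^(-1) K_BS, first equality of Eq. (13)."""
    return matmul(matinv(matsub(IDENTITY, matmul(k_bs, matsub(s0, lam_s)))), k_bs)


def solve_eq12(k, lam_s):
    """G_T = (I - K Lambda S)^(-1) K, the formal solution of Eq. (12)."""
    return matmul(matinv(matsub(IDENTITY, matmul(k, lam_s))), k)


# Arbitrary invertible test matrices (hypothetical values).
K_BS = [[F(1, 3), F(1, 5)], [F(-1, 7), F(1, 2)]]
S0 = [[F(2), F(1, 4)], [F(1, 3), F(-1)]]
LAM_S = [[F(1, 2), F(0)], [F(1, 6), F(1, 3)]]

if __name__ == "__main__":
    K = kernel_eq13(K_BS, S0, LAM_S)
    print(solve_eq11(K_BS, S0) == solve_eq12(K, LAM_S))
```

The check works because $K^{-1} - \Lambda S = K_{BS}^{-1} - S_0$, so both forms give the same inverse of $G_T$; the real kernels are of course integral operators, and the matrices only mimic their composition law.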
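The new bound state equation is constructed for the two-particle Green function defined by the relationship

\[
G = \Lambda S + \Lambda S\, G_T\, \Lambda S.
\tag{14}
\]

The two-particle Green function G has the same poles as the initial Green function $\hat{G}$ and satisfies the BS-like equation

\[
G = \Lambda S + \Lambda S K G,
\tag{15}
\]

or, explicitly,

\[
G(p, l, E) = 2\pi i\,\delta^{(+)}(p^2 - M^2)\,\frac{\not{p} + M}{\not{E} - \not{p} - m}\,(2\pi)^4 \delta^{(4)}(p - l)
+ 2\pi i\,\delta^{(+)}(p^2 - M^2)\,\frac{\not{p} + M}{\not{E} - \not{p} - m} \int \frac{d^4 q}{(2\pi)^4}\, K(p, q, E)\, G(q, l, E).
\tag{16}
\]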
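This last equation is completely equivalent to the original BS equation, and may be easily written in a three-dimensional form

\[
\tilde{G}(p, l, E) = \frac{\not{p} + M}{\not{E} - \not{p} - m}
\left\{ (2\pi)^3 \delta^{(3)}(\mathbf{p} - \mathbf{l}) + \int \frac{d^3 q}{(2\pi)^3 2E_q}\, iK(p, q, E)\, \tilde{G}(q, l, E) \right\},
\tag{17}
\]

where all four-momenta are on the mass shell, $p^2 = l^2 = q^2 = M^2$, $E_p = \sqrt{\mathbf{p}^2 + M^2}$, and the three-dimensional two-particle Green function $\tilde{G}$ is defined as follows:

\[
G(p, l, E) = 2\pi i\,\delta^{(+)}(p^2 - M^2)\,\tilde{G}(p, l, E)\,2\pi i\,\delta^{(+)}(l^2 - M^2).
\tag{18}
\]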
Taking the residue at the bound state pole with energy $E_n$ we obtain a homogeneous equation

\[
(\not{E}_n - \not{p} - m)\,\phi(p, E_n) = (\not{p} + M) \int \frac{d^3 q}{(2\pi)^3 2E_q}\, iK(p, q, E_n)\, \phi(q, E_n).
\tag{19}
\]

Due to the presence of the heavy particle mass shell projector on the right-hand side the wave function in Eq. (19) satisfies a free Dirac equation with respect to the heavy particle indices:

\[
(\not{p} - M)\,\phi(p, E_n) = 0.
\tag{20}
\]

Then one can extract a free heavy particle spinor from the wave function in Eq. (19)

\[
\phi(p, E_n) = \sqrt{2E_n}\, U(p)\, \psi(p, E_n),
\tag{21}
\]

where

\[
U(p) = \begin{pmatrix} \sqrt{E_p + M}\; I \\[4pt] \sqrt{E_p - M}\; \dfrac{\mathbf{p}\cdot\boldsymbol{\sigma}}{|\mathbf{p}|} \end{pmatrix}.
\tag{22}
\]

Finally, the eight-component wave function $\psi(p, E_n)$ (four ordinary electron spinor indices, and two extra indices corresponding to the two-component spinor of the heavy particle) satisfies the effective Dirac equation (see Fig. 6)

\[
(\not{E}_n - \not{p} - m)\,\psi(p, E_n) = \int \frac{d^3 q}{(2\pi)^3 2E_q}\, i\tilde{K}(p, q, E_n)\, \psi(q, E_n),
\tag{23}
\]

where

\[
\tilde{K}(p, q, E_n) = \frac{\bar{U}(p)\, K(p, q, E_n)\, U(q)}{\sqrt{4E_p E_q}},
\tag{24}
\]

Fig. 6. Effective Dirac equation.
Fig. 7. Effective Dirac equation in the external Coulomb field.
$k = (E_n - p_0, -\mathbf{p})$ is the electron momentum, and the crosses on the heavy line in Fig. 6 mean that the heavy particle is on its mass shell. The inhomogeneous equation Eq. (17) also fixes the normalization of the wave function. Even though the total kernel in Eq. (23) is unambiguously defined, we still have freedom to choose the zero-order kernel $K_0$ at our convenience, in order to obtain a solvable lowest-order approximation. It is not difficult to obtain a regular perturbation theory series for the corrections to the zero-order approximation corresponding to the difference between the zero-order kernel $K_0$ and the exact kernel $K_0 + \delta K$

\[
E_n = E_n^0 + (n|i\delta K(E_n^0)|n)\,\bigl(1 + (n|i\delta K'(E_n^0)|n)\bigr)
+ (n|i\delta K(E_n^0)\, G_{n0}(E_n^0)\, i\delta K(E_n^0)|n)\,\bigl(1 + (n|i\delta K'(E_n^0)|n)\bigr) + \cdots,
\tag{25}
\]

where the summation of intermediate states goes with the weight $d^3p/[(2\pi)^3 2E_p]$ and is realized with the help of the subtracted free Green function of the EDE with the kernel $K_0$

\[
G_{n0}(E) = G_0(E) - \frac{|n)(n|}{E - E_n^0},
\tag{26}
\]

conjugation is understood in the Dirac sense, and $\delta K'(E_n^0) \equiv (d\,\delta K/dE)|_{E = E_n^0}$. The only apparent difference of the EDE Eq. (23) from the regular Dirac equation is connected with the dependence of the interaction kernels on energy. Accordingly, the perturbation theory series in Eq. (25) contains, unlike the regular nonrelativistic perturbation series, derivatives of the interaction kernels over energy. The presence of these derivatives is crucial for cancellation of the ultraviolet divergences in the expressions for the energy eigenvalues. A judicious choice of the zero-order kernel (sum of the Coulomb and Breit potentials, for more detail see, e.g., [24,25,28]) generates a solvable unperturbed EDE in the external Coulomb field in Fig. 7. The eigenfunctions of this equation may be found exactly in the form of the Dirac-Coulomb wave functions (see, e.g., [28]). For practical purposes it is often sufficient to approximate these exact wave functions by the product of the Schrödinger-Coulomb wave functions with the reduced mass and the free electron spinors which depend on the electron mass and not on the reduced mass. These functions are very convenient for calculation of the high-order corrections, and while below we will often skip some steps in the derivation of one or another high-order contribution from the EDE, we advise the reader to keep in mind that almost all calculations below are done with these unperturbed wave functions.

Strictly speaking the external field in this equation is not exactly Coulomb but also includes a transverse contribution.
3. General features of the hydrogen energy levels

3.1. Classification of corrections

The zero-order effective Dirac equation with a Coulomb source provides only an approximate description of loosely bound states in QED, but the spectrum of this Dirac equation may serve as a good starting point for obtaining more precise results. The magnetic moment of the heavy nucleus is completely ignored in the Dirac equation with a Coulomb source, and, hence, the hyperfine splitting of the energy levels is missing in its spectrum. Notice that the magnetic interaction between the nucleus and the electron may be easily described even in the framework of nonrelativistic quantum mechanics, and the respective calculation of the leading contribution to the hyperfine splitting was done a long time ago by Fermi [29]. Other corrections to the Dirac energy levels do not arise in the quantum mechanical treatment with a potential, and for their calculation, as well as for calculation of the corrections to the hyperfine splitting, field-theoretical methods are necessary. All electrodynamic corrections to the energy levels may be written in the form of the power series expansion over three small parameters $\alpha$, $Z\alpha$ and $m/M$ which determine the properties of the bound state. Accounting for the additional corrections of nonelectromagnetic origin induced by the strong and weak interactions introduces additional small parameters, namely, the ratio of the nuclear radius and the Bohr radius, the Fermi constant, etc. It should be noted that the coefficients in the power series for the energy levels might themselves be slowly varying functions (logarithms) of these parameters. Each of the small parameters above plays an important and unique role. In order to organize further discussion of different contributions to the energy levels it is convenient to classify corrections in accordance with the small parameters on which they depend.
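As a rough numerical orientation, the sketch below evaluates the three expansion parameters named above for hydrogen and muonium, together with the nuclear-size ratio for hydrogen. The function names are assumptions introduced for illustration; the rounded inputs (inverse fine structure constant, proton-to-electron and muon-to-electron mass ratios, a proton charge radius of roughly 0.84 fm and the Bohr radius in fm) are approximate present-day values, not numbers taken from this review.

```python
# Rough sizes of the expansion parameters; names and rounded inputs are assumptions.
ALPHA = 1 / 137.035999                 # fine structure constant (rounded)
MASS_RATIO = {
    "hydrogen": 1 / 1836.15267,        # m_e / m_p (rounded)
    "muonium": 1 / 206.768283,         # m_e / m_mu (rounded)
}
PROTON_RADIUS_FM = 0.84                # approximate proton charge radius
BOHR_RADIUS_FM = 52917.7               # Bohr radius in femtometres (rounded)


def small_parameters(system, Z=1):
    """Return alpha, Z*alpha and m/M for the chosen two-body system."""
    return {"alpha": ALPHA, "Zalpha": Z * ALPHA, "m_over_M": MASS_RATIO[system]}


def term_size(n_alpha, n_zalpha, n_recoil, system="hydrogen", Z=1):
    """Naive size alpha^a (Z alpha)^b (m/M)^c of a term, in units of the electron mass.

    Numerical coefficients and logarithms are ignored, so this is only an
    order-of-magnitude guide.
    """
    p = small_parameters(system, Z)
    return p["alpha"] ** n_alpha * p["Zalpha"] ** n_zalpha * p["m_over_M"] ** n_recoil


if __name__ == "__main__":
    print(small_parameters("hydrogen"))
    print(small_parameters("muonium"))
    print("nuclear size / Bohr radius:", PROTON_RADIUS_FM / BOHR_RADIUS_FM)
    print("alpha (Z alpha)^4:", term_size(1, 4, 0))
```

Since the coefficients may contain logarithms of the same parameters, such naive power counting can misjudge a contribution by an order of magnitude or more.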
Corrections which depend only on the parameter $Z\alpha$ will be called relativistic or binding corrections. Higher powers of $Z\alpha$ arise due to deviation of the theory from a nonrelativistic limit, and thus represent a relativistic expansion. All such contributions are contained in the spectrum of the effective Dirac equation in the external Coulomb field. Contributions to the energy which depend only on the small parameters $\alpha$ and $Z\alpha$ are called radiative corrections. Powers of $\alpha$ arise only from the quantum electrodynamics loops, and all associated corrections have a quantum field theory nature. Radiative corrections do not depend on the recoil factor $m/M$ and thus may be calculated in the framework of QED for a bound electron in an external field. In the respective calculations one deals only with the complications connected with the presence of quantized fields, but the two-particle nature of the bound state and all problems connected with the description of the bound states in relativistic quantum field theory still may be ignored. Corrections which depend on the mass ratio $m/M$ of the light and heavy particles reflect a deviation from the theory with an infinitely heavy nucleus. Corrections to the energy levels which depend on $m/M$ and $Z\alpha$ are called recoil corrections. They describe contributions to the energy levels which cannot be taken into account with the help of the reduced mass factor. The presence of these corrections signals that we are dealing with a truly two-body problem, rather than with a one-body problem. Leading recoil corrections in $Z\alpha$ (of order $(Z\alpha)^2 (m/M)^n$) still may be taken into account with the help of the effective Dirac equation in the external field since these corrections are induced by the
Fig. 8. Leading-order contribution to the electron radius.
one-photon exchange. This is impossible for the higher-order recoil terms, which reflect the truly relativistic two-body nature of the bound state problem. Technically, the respective contributions are induced by the Bethe-Salpeter kernels with at least two-photon exchanges, and the whole machinery of relativistic QFT is necessary for their calculation. Calculation of the recoil corrections is simplified by the absence of ultraviolet divergences, connected with the purely radiative loops.

Radiative-recoil corrections are the expansion terms in the expressions for the energy levels which depend simultaneously on the parameters $\alpha$, $m/M$ and $Z\alpha$. Their calculation requires application of all the heavy artillery of QED, since we have to account both for the purely radiative loops and for the relativistic two-body nature of the bound states.

The last class of corrections contains nonelectromagnetic corrections, effects of weak and strong interactions. The largest correction induced by the strong interaction is connected with the finiteness of the nuclear size.

Let us emphasize once more that hyperfine structure, radiative, recoil, radiative-recoil, and nonelectromagnetic corrections are all missing in the Dirac energy spectrum. Discussion of their calculations is the main topic of this review.

3.2. Physical origin of the Lamb shift

According to QED an electron continuously emits and absorbs virtual photons (see the leading order diagram in Fig. 8) and as a result its electric charge is spread over a finite volume instead of being pointlike:

$$\langle r^2\rangle = -6\left.\frac{dF_1(-k^2)}{dk^2}\right|_{k^2=0} \approx \frac{1}{m^2}\,\frac{2\alpha}{\pi}\ln\frac{m}{\rho} \approx \frac{1}{m^2}\,\frac{2\alpha}{\pi}\ln(Z\alpha)^{-2}. \tag{28}$$

In order to obtain this estimate of the electron radius we have taken into account that the electron is slightly off mass shell in the bound state. Hence, the would-be infrared divergence in the electron charge radius is cut off by its virtuality $\rho = (m^2 - p^2)/m$, which is of the order of the nonrelativistic binding energy, $\rho \approx m(Z\alpha)^2$. The finite radius of the electron generates a correction to the Coulomb potential (see, e.g., [19]) [...] cross section and the unitarity condition.
Fig. 13. Insertions of two-loop polarization operator.
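4.2.2.2. Pauli form factor contribution. Calculation of the Pauli form factor contribution closely follows the one performed in order $\alpha(Z\alpha)^4$, the only difference being that we have to employ the second-order contribution to the Pauli form factor (see Fig. 12), calculated a long time ago in [50-52] (the result of the first calculation [50] turned out to be in error):

$$F_2^{(4)}(0) = \left[\frac{3}{4}\zeta(3) - \frac{\pi^2}{2}\ln 2 + \frac{\pi^2}{12} + \frac{197}{144}\right]\left(\frac{\alpha}{\pi}\right)^2 \approx -0.328\,478\,9\ldots\left(\frac{\alpha}{\pi}\right)^2. \tag{48}$$

Then one readily obtains for the Lamb shift contribution

$$\Delta E_{l=0} = -0.328\,478\,9\ldots\,\frac{\alpha^2(Z\alpha)^4 m}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3,$$

$$\Delta E_{l\neq 0} = -0.328\,478\,9\ldots\,\frac{\alpha^2(Z\alpha)^4 m}{\pi^2 n^3}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\left(\frac{m_r}{m}\right)^2. \tag{49}$$

4.2.2.3. Polarization operator contribution. Here we use the well-known low-momentum asymptotics of the second-order polarization operator [53-55] in Fig. 13,

$$\left.\frac{\Pi(-k^2)}{k^4}\right|_{k^2=0} = -\frac{41}{162\,m^2}\left(\frac{\alpha}{\pi}\right)^2, \tag{50}$$

and obtain [53]

$$\Delta E = -\frac{82}{81}\,\frac{\alpha^2(Z\alpha)^4}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{51}$$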
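The numerical coefficient in Eq. (48) and the relation between the polarization slope in Eq. (50) and the energy shift in Eq. (51) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the factor of 4 between the slope coefficient and the shift coefficient is inferred here by comparing Eqs. (50) and (51), not quoted from the text.

```python
import math

ZETA3 = 1.2020569031595943  # Riemann zeta(3)


def pauli_two_loop():
    """Coefficient of (alpha/pi)^2 in the two-loop Pauli form factor, Eq. (48)."""
    return (0.75 * ZETA3
            - 0.5 * math.pi**2 * math.log(2)
            + math.pi**2 / 12
            + 197 / 144)


def polarization_two_loop_shift():
    """Coefficient of alpha^2 (Z alpha)^4 / (pi^2 n^3) in Eq. (51).

    Assumption: the shift coefficient is 4 times the slope coefficient
    -41/162 of Eq. (50), as the pair of equations suggests.
    """
    return 4 * (-41 / 162)


print(f"F2 two-loop coefficient : {pauli_two_loop():.7f}")
print(f"polarization coefficient: {polarization_two_loop_shift():.7f}")
```

The second number reproduces the rational factor $-82/81$ of Eq. (51).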
4.2.3. Corrections of order $\alpha^3(Z\alpha)^4 m$

4.2.3.1. Dirac form factor contribution. Calculation of the corrections of order $\alpha^3(Z\alpha)^4$ is similar to calculation of the contributions of order $\alpha^2(Z\alpha)^4$. The respective corrections depend only on the values of the three-loop form factors or their derivatives at vanishing transferred momentum. The three-loop contribution to the slope of the Dirac form factor (Fig. 14) was recently calculated analytically [56]:

$$\left.\frac{dF_1^{(6)}(-k^2)}{dk^2}\right|_{k^2=0} = -\left[\frac{25}{8}\zeta(5) - \frac{17}{24}\pi^2\zeta(3) - \frac{2929}{288}\zeta(3) - \frac{217}{9}a_4 - \frac{217}{216}\ln^4 2 - \frac{103}{1080}\pi^2\ln^2 2 + \frac{41\,671}{2160}\pi^2\ln 2 + \frac{3899}{25\,920}\pi^4 - \frac{454\,979}{38\,880}\pi^2 - \frac{77\,513}{186\,624}\right]\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^3 \approx -\frac{0.171\,72\ldots}{m^2}\left(\frac{\alpha}{\pi}\right)^3, \tag{52}$$

where

$$a_4 = \sum_{n=1}^{\infty}\frac{1}{2^n n^4}. \tag{53}$$

The respective contribution to the Lamb shift is equal to

$$\Delta E = 0.171\,72\ldots\,\frac{4\alpha^3(Z\alpha)^4}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{54}$$

Fig. 14. Examples of the three-loop contributions for the electron form factor.
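4.2.3.2. Pauli form factor contribution. For calculation of the Pauli form factor contribution to the Lamb shift, the third-order contribution to the Pauli form factor (Fig. 14), calculated numerically in [57] and analytically in [58], is used:

$$F_2^{(6)}(0) = \left[\frac{83}{72}\pi^2\zeta(3) - \frac{215}{24}\zeta(5) + \frac{100}{3}\left(a_4 + \frac{1}{24}\ln^4 2 - \frac{1}{24}\pi^2\ln^2 2\right) - \frac{239}{2160}\pi^4 + \frac{139}{18}\zeta(3) - \frac{298}{9}\pi^2\ln 2 + \frac{17\,101}{810}\pi^2 + \frac{28\,259}{5184}\right]\left(\frac{\alpha}{\pi}\right)^3 \approx 1.181\,241\,456\ldots\left(\frac{\alpha}{\pi}\right)^3. \tag{55}$$

Then one obtains for the Lamb shift

$$\Delta E_{l=0} = 1.181\,241\,456\ldots\,\frac{\alpha^3(Z\alpha)^4 m}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3,$$

$$\Delta E_{l\neq 0} = 1.181\,241\,456\ldots\,\frac{\alpha^3(Z\alpha)^4 m}{\pi^3 n^3}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\left(\frac{m_r}{m}\right)^2. \tag{56}$$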
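The analytic three-loop expressions in Eqs. (52) and (55) can be evaluated directly, which is a convenient guard against transcription errors in the long lists of rational coefficients. The following sketch assumes the standard numerical values of $\zeta(3)$ and $\zeta(5)$ and sums the series of Eq. (53) for $a_4$; the function names are hypothetical.

```python
import math

ZETA3 = 1.2020569031595943  # Riemann zeta(3)
ZETA5 = 1.0369277551433699  # Riemann zeta(5)
PI2 = math.pi**2
LN2 = math.log(2)


def a4(terms=200):
    """Partial sum of a_4 = sum_{n>=1} 1 / (2^n n^4), Eq. (53)."""
    return sum(1 / (2**n * n**4) for n in range(1, terms + 1))


def dirac_slope_three_loop():
    """Bracket of Eq. (52); the slope itself is minus this value times (alpha/pi)^3 / m^2."""
    return (25 / 8 * ZETA5 - 17 / 24 * PI2 * ZETA3 - 2929 / 288 * ZETA3
            - 217 / 9 * a4() - 217 / 216 * LN2**4
            - 103 / 1080 * PI2 * LN2**2 + 41671 / 2160 * PI2 * LN2
            + 3899 / 25920 * PI2**2 - 454979 / 38880 * PI2
            - 77513 / 186624)


def pauli_three_loop():
    """Bracket of Eq. (55), the coefficient of (alpha/pi)^3 in the Pauli form factor."""
    return (83 / 72 * PI2 * ZETA3 - 215 / 24 * ZETA5
            + 100 / 3 * (a4() + LN2**4 / 24 - PI2 * LN2**2 / 24)
            - 239 / 2160 * PI2**2 + 139 / 18 * ZETA3
            - 298 / 9 * PI2 * LN2 + 17101 / 810 * PI2 + 28259 / 5184)


print(f"Dirac slope bracket, Eq. (52): {dirac_slope_three_loop():.6f}")
print(f"Pauli form factor,   Eq. (55): {pauli_three_loop():.9f}")
```

Individual terms in these sums are of order 100 while the results are of order 1, so a single wrong sign or digit in a coefficient shows up immediately.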
4.2.3.3. Polarization operator contribution. In this case the analytic result for the low-frequency asymptotics of the third-order polarization operator (see Fig. 15) [59] is used,

$$\left.\frac{\Pi(-k^2)}{k^4}\right|_{k^2=0} = -\left[\frac{8135}{9216}\zeta(3) - \frac{\pi^2\ln 2}{15} + \frac{23\pi^2}{360} - \frac{325\,805}{373\,248}\right]\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^3 \approx -\frac{0.362\,654\,440\ldots}{m^2}\left(\frac{\alpha}{\pi}\right)^3, \tag{57}$$

and one obtains [60]

$$\Delta E = -1.450\,617\,763\ldots\,\frac{\alpha^3(Z\alpha)^4}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{58}$$

Fig. 15. Examples of the three-loop contributions to the polarization operator.
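4.2.4. Total correction of order $\alpha^n(Z\alpha)^4 m$

The total contribution of order $\alpha^n(Z\alpha)^4 m$ is given by the sum of corrections in Eqs. (40), (44), (45), (46), (49), (51), (54), (56) and (58). It is equal to

$$\Delta E_{l=0} = \left\{\frac{4}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} - \frac{4}{3}\ln k_0(n,0) + \frac{38}{45} + \left[-\frac{9}{4}\zeta(3) + \frac{3}{2}\pi^2\ln 2 - \frac{10}{27}\pi^2 - \frac{2179}{648}\right]\frac{\alpha}{\pi} + \left[\frac{85}{24}\zeta(5) - \frac{121}{72}\pi^2\zeta(3) - \frac{84\,071}{2304}\zeta(3) - \frac{71}{27}\ln^4 2 - \frac{239}{135}\pi^2\ln^2 2 + \frac{4787}{108}\pi^2\ln 2 + \frac{1591}{3240}\pi^4 - \frac{252\,251}{9720}\pi^2 + \frac{679\,441}{93\,312} - \frac{568}{9}a_4\right]\left(\frac{\alpha}{\pi}\right)^2\right\}\frac{\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m$$

$$= \left\{\frac{4}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} - \frac{4}{3}\ln k_0(n,0) + \frac{38}{45} + 0.538\,94\ldots\frac{\alpha}{\pi} + 0.417\,504\ldots\left(\frac{\alpha}{\pi}\right)^2\right\}\frac{\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m \tag{59}$$

for the S-states, and

$$\Delta E_{l\neq 0} = \left\{-\frac{4}{3}\ln k_0(n,l)\,\frac{m_r}{m} + \left[\frac{1}{2} + \left(\frac{3}{4}\zeta(3) - \frac{\pi^2}{2}\ln 2 + \frac{\pi^2}{12} + \frac{197}{144}\right)\frac{\alpha}{\pi} + \left(\frac{83}{72}\pi^2\zeta(3) - \frac{215}{24}\zeta(5) + \frac{100}{3}a_4 + \frac{25}{18}\ln^4 2 - \frac{25}{18}\pi^2\ln^2 2 - \frac{239}{2160}\pi^4 + \frac{139}{18}\zeta(3) - \frac{298}{9}\pi^2\ln 2 + \frac{17\,101}{810}\pi^2 + \frac{28\,259}{5184}\right)\left(\frac{\alpha}{\pi}\right)^2\right]\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\right\}\frac{\alpha(Z\alpha)^4 m}{\pi n^3}\left(\frac{m_r}{m}\right)^2$$

$$= \left\{-\frac{4}{3}\ln k_0(n,l)\,\frac{m_r}{m} + \left[\frac{1}{2} - 0.328\,478\,9\ldots\frac{\alpha}{\pi} + 1.181\,241\,456\ldots\left(\frac{\alpha}{\pi}\right)^2\right]\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\right\}\frac{\alpha(Z\alpha)^4 m}{\pi n^3}\left(\frac{m_r}{m}\right)^2 \tag{60}$$

for the non-S-states. Numerically, corrections of order $\alpha^n(Z\alpha)^4 m$ for the lowest energy levels give

$$\Delta E(1S) = 8\,115\,785.64\ \text{kHz},\qquad \Delta E(2S) = 1\,037\,814.43\ \text{kHz},\qquad \Delta E(2P) = -12\,846.46\ \text{kHz}. \tag{61}$$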
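The two-loop and three-loop S-state coefficients in Eq. (59) can be checked by evaluating the analytic brackets directly. The sketch below does this with standard values of $\zeta(3)$ and $\zeta(5)$ and the series for $a_4$ from Eq. (53); the function names are hypothetical and the code is meant only as a consistency check of the arithmetic.

```python
import math

ZETA3 = 1.2020569031595943  # Riemann zeta(3)
ZETA5 = 1.0369277551433699  # Riemann zeta(5)
PI2 = math.pi**2
LN2 = math.log(2)
A4 = sum(1 / (2**n * n**4) for n in range(1, 201))  # a_4 of Eq. (53)


def s_state_two_loop():
    """Coefficient of alpha/pi in the curly bracket of Eq. (59)."""
    return -9 / 4 * ZETA3 + 3 / 2 * PI2 * LN2 - 10 / 27 * PI2 - 2179 / 648


def s_state_three_loop():
    """Coefficient of (alpha/pi)^2 in the curly bracket of Eq. (59)."""
    return (85 / 24 * ZETA5 - 121 / 72 * PI2 * ZETA3 - 84071 / 2304 * ZETA3
            - 71 / 27 * LN2**4 - 239 / 135 * PI2 * LN2**2
            + 4787 / 108 * PI2 * LN2 + 1591 / 3240 * PI2**2
            - 252251 / 9720 * PI2 + 679441 / 93312 - 568 / 9 * A4)


print(f"two-loop coefficient  : {s_state_two_loop():.6f}")
print(f"three-loop coefficient: {s_state_three_loop():.6f}")
```

The three-loop value also equals the combination $4\times 0.171\,72 + 1.181\,24 - 1.450\,62$ of the coefficients in Eqs. (54), (56) and (58).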
Contributions of order $\alpha^4(Z\alpha)^4 m$ are suppressed by an extra factor $\alpha/\pi$ in comparison with the corrections of order $\alpha^3(Z\alpha)^4 m$. Their expected magnitude is at the level of hundredths of a kHz even for the 1S state in hydrogen, and they are too small to be of any phenomenological significance.
4.2.5. Heavy particle polarization contributions of order $\alpha(Z\alpha)^4 m$

We have considered above only radiative corrections containing virtual photons and electrons. However, at the current level of accuracy one also has to consider effects induced by virtual muons and the lightest strongly interacting particles. The respective corrections to the electron anomalous magnetic moment are well known [57] and are still too small to be of any practical interest for the Lamb shift calculations. Heavy particle contributions to the polarization operator numerically have the same magnitude as polarization corrections of order $\alpha^3(Z\alpha)^4$. Corrections to the low-frequency asymptotics of the polarization operator are generated by the diagrams in Fig. 16. The muon loop contribution to the polarization operator

$$\left.\frac{\Pi(-k^2)}{k^4}\right|_{k^2=0} = -\frac{\alpha}{15\pi m_\mu^2} \tag{62}$$

immediately leads (compare Eq. (45)) to an additional contribution to the Lamb shift [61,62]

$$\Delta E = -\frac{4}{15}\left(\frac{m}{m_\mu}\right)^2\frac{\alpha(Z\alpha)^4}{\pi n^3}\,m\,\delta_{l0}. \tag{63}$$

Fig. 16. Muon-loop and hadron contributions to the polarization operator.

The hadronic polarization contribution to the Lamb shift was estimated in a number of papers [61-63]. The light hadron contribution to the polarization operator may easily be estimated with the help of vector dominance,

$$\left.\frac{\Pi(-k^2)}{k^4}\right|_{k^2=0} = -\sum_{v_i}\frac{4\pi\alpha}{f_{v_i}^2 m_{v_i}^2}, \tag{64}$$

where $m_{v_i}$ are the masses of the three lowest vector mesons and the vector meson-photon vertex has the form $e\,m_{v_i}^2/f_{v_i}$.

Estimating contributions of the heavy quark flavors with the help of the free quark loops, one obtains the total hadronic vacuum polarization contribution to the Lamb shift in the form [62]

$$\Delta E = -4\left[\sum_{v_i}\frac{4\pi^2}{f_{v_i}^2 m_{v_i}^2} + \frac{2}{3}\,\frac{1}{1\ \text{GeV}^2}\right]\frac{\alpha(Z\alpha)^4}{\pi n^3}\,m^3\,\delta_{l0}. \tag{65}$$

Numerically this correction is $-3.18$ kHz for the 1S-state and $-0.40$ kHz for the 2S-state in hydrogen. A compatible but more accurate estimate for the heavy particle contribution to the 1S Lamb shift, $-3.40(7)$ kHz, was obtained in [63] from the analysis of the experimental data on the low-energy $e^+e^-$ annihilation (Table 2).

4.3. Radiative corrections of order $\alpha^n(Z\alpha)^5 m$

4.3.1. Skeleton integral approach to calculations of radiative corrections

We have seen above that calculation of the corrections of order $\alpha^n(Z\alpha)^4 m$ ($n>1$) reduces to calculation of higher-order corrections to the properties of a free electron and to the photon propagator, namely to calculation of the slope of the electron Dirac form factor and anomalous magnetic moment, and to calculation of the leading term in the low-frequency expansion of the polarization operator. Hence, these contributions to the Lamb shift are independent of any features of the bound state. A nontrivial interplay between radiative corrections and binding effects arises first in calculation of contributions of order $\alpha(Z\alpha)^5 m$, and in calculations of higher-order terms in the combined expansion over $\alpha$ and $Z\alpha$.

Calculation of the contribution of order $\alpha^n(Z\alpha)^5 m$ to the energy shift is even simpler than calculation of the leading-order contribution to the Lamb shift because the scattering approximation is sufficient in this case [64-66]. Formally this correction is induced by kernels with at least two-photon exchanges, and in analogy with the leading-order contribution one could also anticipate the appearance of irreducible kernels with a higher number of exchanges. This does not happen, however, as can be proved formally, but in fact no formal proof is needed. First one has to realize that for high exchanged momenta the expansion in $Z\alpha$ is valid, and addition of any extra exchanged photon always produces an extra power of $Z\alpha$.
Table 2. Contributions of order $\alpha^n(Z\alpha)^4 m$

| Contribution | Coefficient, in units of $4\frac{\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m \approx \frac{3\,250\,137.65(4)}{n^3}$ kHz | $\Delta E(1S)$ (kHz) | $\Delta E(2S)$ (kHz) |
|---|---|---|---|
| Bethe [32], French and Weisskopf [34], Kroll and Lamb [33] | $\left[\frac{1}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} + \frac{11}{72}\right]\delta_{l0} - \frac{1}{3}\ln k_0(n,l)$ | 7 925 175.26(9) | 1 013 988.13(1) |
| Pauli FF, $l=0$ | $\frac{1}{8}$ | 406 267.21 | 50 783.40 |
| Pauli FF, $l\neq 0$ | $\frac{j(j+1)-l(l+1)-3/4}{8\,l(l+1)(2l+1)}\,\frac{m}{m_r}$ | | |
| Vacuum polarization, Uehling [40] | $-\frac{1}{15}\delta_{l0}$ | $-216\,675.84$ | $-27\,084.48$ |
| Dirac FF, Appelquist and Brodsky [43], Barbieri et al. [48] | $\left[-\frac{3}{4}\zeta(3) + \frac{\pi^2}{2}\ln 2 - \frac{49}{432}\pi^2 - \frac{4819}{5184}\right]\frac{\alpha}{\pi}\delta_{l0} \approx 0.469\,941\,4\ldots\frac{\alpha}{\pi}\delta_{l0}$ | 3547.82 | 443.48 |
| Pauli FF, $l=0$, Sommerfield [52], Peterman [51] | $\left[\frac{3}{16}\zeta(3) - \frac{\pi^2}{8}\ln 2 + \frac{\pi^2}{48} + \frac{197}{576}\right]\frac{\alpha}{\pi} \approx -0.082\,119\,7\ldots\frac{\alpha}{\pi}$ | $-619.96$ | $-77.50$ |
| Pauli FF, $l\neq 0$, Sommerfield [52], Peterman [51] | $\left[\frac{3}{16}\zeta(3) - \frac{\pi^2}{8}\ln 2 + \frac{\pi^2}{48} + \frac{197}{576}\right]\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}\,\frac{\alpha}{\pi} \approx -0.082\,119\,7\ldots\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}\,\frac{\alpha}{\pi}$ | | |
| Vacuum polarization, Baranger et al. [53] | $-\frac{41}{162}\frac{\alpha}{\pi}\delta_{l0}$ | $-1910.67$ | $-238.83$ |
| Dirac FF, Melnikov and van Ritbergen [56] | $\left[\frac{25}{8}\zeta(5) - \frac{17}{24}\pi^2\zeta(3) - \frac{2929}{288}\zeta(3) - \frac{217}{9}a_4 - \frac{217}{216}\ln^4 2 - \frac{103}{1080}\pi^2\ln^2 2 + \frac{41\,671}{2160}\pi^2\ln 2 + \frac{3899}{25\,920}\pi^4 - \frac{454\,979}{38\,880}\pi^2 - \frac{77\,513}{186\,624}\right]\left(\frac{\alpha}{\pi}\right)^2\delta_{l0} \approx 0.171\,72\ldots\left(\frac{\alpha}{\pi}\right)^2\delta_{l0}$ | 3.01 | 0.38 |
| Pauli FF, $l=0$, Kinoshita [57], Laporta and Remiddi [58] | $\left[\frac{83}{288}\pi^2\zeta(3) - \frac{215}{96}\zeta(5) + \frac{25}{3}\left(a_4 + \frac{1}{24}\ln^4 2 - \frac{1}{24}\pi^2\ln^2 2\right) - \frac{239}{8640}\pi^4 + \frac{139}{72}\zeta(3) - \frac{149}{18}\pi^2\ln 2 + \frac{17\,101}{3240}\pi^2 + \frac{28\,259}{20\,736}\right]\left(\frac{\alpha}{\pi}\right)^2 \approx 0.295\,310\,3\ldots\left(\frac{\alpha}{\pi}\right)^2$ | 5.18 | 0.65 |
| Pauli FF, $l\neq 0$, Kinoshita [57], Laporta and Remiddi [58] | the same bracket multiplied by $\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}\left(\frac{\alpha}{\pi}\right)^2 \approx 0.295\,310\,3\ldots\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}\left(\frac{\alpha}{\pi}\right)^2$ | | |
| Vacuum polarization, Baikov and Broadhurst [59], Eides and Grotch [60] | $-\left[\frac{8135}{9216}\zeta(3) - \frac{\pi^2\ln 2}{15} + \frac{23\pi^2}{360} - \frac{325\,805}{373\,248}\right]\left(\frac{\alpha}{\pi}\right)^2\delta_{l0} \approx -0.362\,654\,4\ldots\left(\frac{\alpha}{\pi}\right)^2\delta_{l0}$ | $-6.36$ | $-0.79$ |
| Muonic polarization, Karshenboim [61], Eides and Shelyuto [62] | $-\frac{1}{15}\left(\frac{m}{m_\mu}\right)^2\delta_{l0}$ | $-5.07$ | $-0.63$ |
| Hadronic polarization, Karshenboim [61], Eides and Shelyuto [62], Friar et al. [63] | $-m^2\left[\sum_{v_i}\frac{4\pi^2}{f_{v_i}^2 m_{v_i}^2} + \frac{2}{3}\,\frac{1}{1\ \text{GeV}^2}\right]\delta_{l0}$ | $-3.18$ | $-0.40$ |

Footnote on the hadronic polarization contribution: It is not obvious that this contribution should be included in the phenomenological analysis of the Lamb shift measurements, since experimentally it is indistinguishable from an additional contribution to the proton charge radius. We will return to this problem below in Section 7.1.3.

Fig. 17. Skeleton diagram with two exchanged Coulomb photons.

Fig. 18. Radiative insertions in the electron line.

Hence, in the high-momentum region only diagrams with two exchanged photons are relevant. Treatment of the low-momentum region is greatly facilitated by a very general feature of the Feynman diagrams, namely that the infrared behavior of any radiatively corrected Feynman diagram (or, more accurately, any gauge invariant sum of Feynman diagrams) is milder than the behavior of the skeleton diagram. Consider the matrix element in momentum space of the diagrams in Fig. 17 with two exchanged Coulomb photons between the Schrödinger-Coulomb wave functions. We will take the external electron momenta to be on-shell and to have vanishing space components. It is then easy to see that the contribution of such a diagram to the Lamb shift is given by the infrared divergent integral

$$-\frac{16(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\int_0^\infty\frac{dk}{k^4}\,\delta_{l0}, \tag{66}$$
where $k$ is the dimensionless momentum of the exchanged photon measured in units of the electron mass. This divergence has a simple physical interpretation. If we do not ignore small virtualities of the external electron lines and the external wave functions, this two-Coulomb exchange adds one extra rung to the Coulomb wave function and should simply reproduce it. The naive infrared divergence above would be regularized at the characteristic atomic scale $mZ\alpha$. Hence, it is evident that the kernel with two-photon exchange is already taken into account in the effective Dirac equation above and there is no need to try to consider it as a perturbation.

Let us consider now radiative photon insertions in the electron line (see Fig. 18). Account of these corrections effectively leads to insertion of an additional factor $L(k)$ in the divergent integral above, and while this factor has at most a logarithmic asymptotic behavior at large momenta and does not spoil the ultraviolet convergence of the integral, in the low-momentum region it behaves as $L(k)\sim k^2$ (again up to logarithmic factors), and improves the low-frequency behavior of the integrand. However, the integrand is still divergent even after inclusion of the radiative corrections because the two-photon-exchange box diagram, even with radiative corrections, contains a contribution of the previous order in $Z\alpha$, namely the main contribution to the Lamb shift induced by the electron form factor. This spurious contribution may be easily removed by subtracting the leading low-momentum term from $L(k)/k^4$. The result of the subtraction is a convergent integral which is responsible for the correction of order $\alpha(Z\alpha)^5$. As an additional bonus of this approach one does not need to worry about the ultraviolet divergence of the one-loop radiative corrections. The subtraction automatically eliminates any ultraviolet divergent terms and the result is both ultraviolet and infrared finite.

Due to radiative insertions, low integration momenta (of atomic order $mZ\alpha$) are suppressed in the exchange loops and the effective integration momenta are of order $m$. Hence, one may neglect the small virtuality of external fermion lines and calculate the above diagrams with on-mass-shell external momenta. Contributions to the Lamb shift are given by the product of the square of the Schrödinger-Coulomb wave function at the origin, $|\psi(0)|^2$, and the diagram. Under these conditions the diagrams in Fig. 18 comprise a gauge invariant set and may easily be calculated.

Contributions of the diagrams with more than two exchanged Coulomb photons are of higher order in $Z\alpha$. This is obvious for the high exchanged momenta integration region. It is not difficult to demonstrate that in the Yennie gauge [67-69] contributions from the low exchanged momentum region to the matrix element with the on-shell external electron lines remain infrared finite, and hence cannot produce any correction of order $\alpha(Z\alpha)^5$. Since the sum of diagrams with the on-shell external electron lines is gauge invariant, this is true in any gauge. It is also clear that small virtuality of the external electron lines would lead to an additional suppression of the matrix element under consideration, and hence it is sufficient to consider only two-photon exchanges for calculation of all corrections of order $\alpha(Z\alpha)^5$.

The magnitude of the correction of order $\alpha(Z\alpha)^5$ may be easily estimated before the calculation is carried out. We need to take into account the skeleton factor $4m(Z\alpha)^4/n^3$ discussed above in Section 6, and multiply it by an extra factor $\alpha(Z\alpha)$. Naively, one could expect a somewhat smaller factor $\alpha(Z\alpha)/\pi$. However, it is well known that a convergent diagram with two external photons always produces an extra factor $\pi$ in the numerator, thus compensating the factor $\pi$ in the denominator generated by the radiative correction. Hence, calculation of the correction of order $\alpha(Z\alpha)^5$ should lead to a numerical factor of order unity multiplied by $4m\alpha(Z\alpha)^5/n^3$.

4.3.2. Radiative corrections of order $\alpha(Z\alpha)^5 m$

4.3.2.1. Correction induced by the radiative insertions in the electron line. This correction is generated by the sum of all possible radiative insertions in the electron line in Fig. 18. In the approach described above, one has to calculate the electron factor corresponding to the sum of all radiative corrections in the electron line, make the necessary subtraction of the leading infrared asymptote, insert the subtracted expression in the integrand in Eq. (66), and then integrate over the exchanged momentum. This leads to the result

$$\Delta E = 4\left[1 + \frac{11}{128} - \frac{1}{2}\ln 2\right]\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}, \tag{67}$$

which was first obtained in [64-66] in other approaches. Note that numerically $1 + 11/128 - \frac{1}{2}\ln 2 \approx 0.739$, in excellent agreement with the qualitative considerations above.

4.3.2.2. Correction induced by the polarization insertions in the external photons. The correction of order $\alpha(Z\alpha)^5$ induced by the polarization operator insertions in the external photon lines in Fig. 19 was obtained in [64-66] and may again be calculated in the skeleton integral approach. We will use the simplicity of the one-loop polarization operator, and perform this calculation in more detail in order to illustrate the general considerations above. For calculation of the respective contribution one has to insert the polarization operator in the skeleton integrand in Eq. (66),

$$\frac{1}{k^2}\to\frac{\alpha}{\pi}I_1(k), \tag{68}$$
Fig. 19. Polarization insertions in the Coulomb lines.
where

$$I_1(k) = \int_0^1 dv\,\frac{v^2(1-v^2/3)}{4+(1-v^2)k^2}. \tag{69}$$

Of course, the skeleton integral still diverges in the infrared after this substitution since

$$I_1(0) = \frac{1}{15}. \tag{70}$$

This linear infrared divergence $\int dk/k^2$ is effectively cut off at the characteristic atomic scale $mZ\alpha$; it lowers the power of the factor $Z\alpha$, the respective would-be divergent contribution turns out to be of order $\alpha(Z\alpha)^4$, and corresponds to the polarization part of the leading order contribution to the Lamb shift. We carry out the subtraction of the leading low-frequency asymptote of the polarization operator insertion, which corresponds to the subtraction of the leading low-frequency asymptote in the integrand for the contribution to the energy shift,

$$\tilde I_1(k) \equiv I_1(k) - I_1(0) = -\frac{k^2}{4}\int_0^1 dv\,\frac{v^2(1-v^2)(1-v^2/3)}{4+(1-v^2)k^2}, \tag{71}$$

and substitute the subtracted expression in the formula for the Lamb shift in Eq. (66). We also insert an additional factor 2 in order to take into account possible insertions of the polarization operator in both photon lines. Then

$$\Delta E = -m\left(\frac{m_r}{m}\right)^3\frac{\alpha(Z\alpha)^5}{\pi^2 n^3}\,\frac{32}{1-m^2/M^2}\int_0^\infty dk\,\frac{\tilde I_1(k)}{k^2}\,\delta_{l0}$$

$$= m\left(\frac{m_r}{m}\right)^3\frac{\alpha(Z\alpha)^5}{\pi^2 n^3}\,\frac{8}{1-m^2/M^2}\int_0^1 dv\int_0^\infty dk\,\frac{v^2(1-v^2)(1-v^2/3)}{4+(1-v^2)k^2}\,\delta_{l0}$$

$$= \frac{5}{48}\,\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{72}$$

We have restored in Eq. (72) the characteristic factor $1/(1-m^2/M^2)$ which was omitted in Eq. (66), but which naturally arises in the skeleton integral. However, it is easy to see that an error generated by the omission of this factor is only about 0.02 kHz even for the electron-line contribution to the 1S level shift, and hence this correction may be safely omitted at the present level of experimental accuracy.

4.3.2.3. Total correction of order $\alpha(Z\alpha)^5 m$. The total correction of order $\alpha(Z\alpha)^5 m$ is given by the sum of contributions in Eqs. (67) and (72):

$$\Delta E = 4\left[1 + \frac{11}{128} + \frac{5}{192} - \frac{1}{2}\ln 2\right]\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0} = 3.061\,622\ldots\,\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}$$

$$= 57\,030.70\ \text{kHz}\ (n=1),\qquad = 7128.84\ \text{kHz}\ (n=2). \tag{73}$$

Fig. 20. Six gauge invariant sets of diagrams for corrections of order $\alpha^2(Z\alpha)^5 m$.
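The polarization coefficient $5/48$ in Eq. (72) and the totals in Eq. (73) lend themselves to a short numerical check. In the sketch below the $k$ integration is done analytically, using $\int_0^\infty dk/(4+ak^2) = \pi/(4\sqrt{a})$, and the remaining $v$ integration is done numerically after the substitution $v=\sin\theta$. The value of $\alpha$, the conversion to kHz through the Table 2 unit $3\,250\,137.65$ kHz, and all function names are assumptions made for this illustration.

```python
import math

ALPHA = 1 / 137.035999          # fine structure constant (assumed value)
UNIT_TABLE2_KHZ = 3250137.65    # 4 alpha (Z alpha)^4 / pi (m_r/m)^3 m for n = 1, Z = 1


def polarization_coefficient(steps=2000):
    """Coefficient of alpha (Z alpha)^5 / n^3 (m_r/m)^3 m in Eq. (72).

    After the k integration one is left with
    (2/pi) * Integral_0^1 dv v^2 sqrt(1 - v^2) (1 - v^2/3),
    evaluated here with the midpoint rule in theta, v = sin(theta).
    """
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        s2 = math.sin((i + 0.5) * h) ** 2      # v^2
        total += s2 * (1 - s2) * (1 - s2 / 3)  # v^2 * cos^2(theta) * (1 - v^2/3)
    return (2 / math.pi) * total * h


def total_coefficient():
    """Coefficient of alpha (Z alpha)^5 / n^3 (m_r/m)^3 m in Eq. (73)."""
    return 4 * (1 + 11 / 128 + 5 / 192 - 0.5 * math.log(2))


# alpha (Z alpha)^5 / n^3 (m_r/m)^3 m in kHz for n = 1: Table 2 unit times pi*alpha/4.
unit_khz = UNIT_TABLE2_KHZ * math.pi * ALPHA / 4
shift_1s_khz = total_coefficient() * unit_khz

print(f"polarization coefficient: {polarization_coefficient():.8f} (5/48 = {5/48:.8f})")
print(f"total coefficient       : {total_coefficient():.6f}")
print(f"1S shift                : {shift_1s_khz:.2f} kHz")
```

The 2S value follows from the $1/n^3$ scaling, that is, by dividing the 1S shift by 8.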
4.3.3. Corrections of order $\alpha^2(Z\alpha)^5 m$

Corrections of order $\alpha^2(Z\alpha)^5$ have the same physical origin as corrections of order $\alpha(Z\alpha)^5$, and the scattering approximation is sufficient for their calculation [70]. We consider now corrections of higher order in $\alpha$ than in the previous section and there is a larger variety of relevant graphs. All six gauge invariant sets of diagrams [70] which produce corrections of order $\alpha^2(Z\alpha)^5$ are presented in Fig. 20. The blob called "2 loops" in Fig. 20(f) means the gauge invariant sum of diagrams with all possible insertions of two radiative photons in the electron line. All diagrams in Fig. 20 may be obtained from the skeleton diagram in Fig. 17 with the help of different two-loop radiative insertions. As in the case of the corrections of order $\alpha(Z\alpha)^5$, corrections to the energy shifts are given by the matrix elements of the diagrams in Fig. 20 calculated between free electron spinors with all external electron lines on the mass shell, projected on the respective spin states, and multiplied by the square of the Schrödinger-Coulomb wave function at the origin [70].

It should be mentioned that some of the diagrams under consideration contain contributions of the previous order in $Z\alpha$. These contributions are produced by the terms proportional to the exchanged momentum squared in the low-frequency asymptotic expansion of the radiative corrections, and are connected with integration over external photon momenta of the characteristic atomic scale $mZ\alpha$. The scattering approximation is inadequate for their calculation. In the skeleton integral approach these previous order contributions arise as powerlike infrared divergences in the final integration over the exchanged momentum. We subtract leading low-frequency terms in the low-frequency asymptotic expansions of the integrands, when necessary, and thus remove the spurious previous order contributions.

4.3.3.1. One-loop polarization insertions in the Coulomb lines. The simplest correction is induced by the diagrams in Fig. 20(a) with two insertions of the one-loop vacuum polarization in the external photon lines. The contribution to the Lamb shift is given by the insertion of the one-loop polarization operator squared, $I_1^2(k)$, in the skeleton integral in Eq. (66), and taking into account the multiplicity factor 3 one easily obtains [70-72]

$$\Delta E = -\frac{48\alpha^2(Z\alpha)^5}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m\int_0^\infty dk\,I_1^2(k)\,\delta_{l0} = -\frac{23}{378}\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{74}$$

4.3.3.2. Insertions of the irreducible two-loop polarization in the Coulomb lines. The naive insertion $1/k^2\to I_2(k)$ of the irreducible two-loop vacuum polarization operator $I_2(k)$ [54,55] in the skeleton integral in Eq. (66) would lead to an infrared divergent integral for the diagrams in Fig. 20(b). This divergence reflects the existence of the correction of the previous order in $Z\alpha$ connected with the two-loop irreducible polarization. This contribution of order $\alpha^2(Z\alpha)^4 m$ was discussed in Section 4.2.2.3, and as we have seen the respective contribution to the Lamb shift is given simply by the product of the Schrödinger-Coulomb wave function squared at the origin and the leading low-frequency term of the function, $I_2(0)$. In terms of the loop momentum integration this means that the relevant loop momenta are of the atomic scale $mZ\alpha$. Subtraction of the value $I_2(0)$ from the function $I_2(k)$ effectively removes the previous order contribution (the low momentum region) from the loop integral and one obtains the radiative correction of order $\alpha^2(Z\alpha)^5 m$ generated by the irreducible two-loop polarization operator [70-72]

$$\Delta E = -\frac{32\alpha^2(Z\alpha)^5}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m\int_0^\infty\frac{dk}{k^2}\left[I_2(k)-I_2(0)\right]\delta_{l0} = \left(\frac{52}{63}\ln 2 - \frac{25}{63}\pi + \frac{15\,647}{13\,230}\right)\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{75}$$
4.3.3.3. Insertion of one-loop electron factor in the electron line and of the one-loop polarization in the Coulomb lines. The next correction of order $\alpha^2(Z\alpha)^5$ is generated by the gauge invariant set of diagrams in Fig. 20(c). The respective analytic expression is obtained from the skeleton integral by simultaneous insertion in the integrand of the one-loop polarization function $I_1(k)$ and of the expressions corresponding to all possible insertions of the radiative photon in the electron line. It is simpler first to obtain an explicit analytic expression for the sum of all these radiative insertions in the electron line, which we call the one-loop electron factor $L(k)$ (explicit expressions for the electron factor in different forms may be found in [36,73-75]), and then to insert this electron factor in the skeleton integral. It is easy to check explicitly that the resulting integral for the radiative correction is both ultraviolet and infrared finite. The infrared finiteness nicely correlates with the physical understanding that for these diagrams there is no correction of order $\alpha^2(Z\alpha)^4$ generated at the atomic scale. The respective integral for the radiative correction was calculated both numerically and analytically [73,71,75], and the result has the following elegant form:

$$\Delta E = -\frac{32\alpha^2(Z\alpha)^5}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3 m\int_0^\infty dk\,L(k)\,I_1(k)\,\delta_{l0}$$

$$= \left[\frac{8}{3}\ln^2\frac{1+\sqrt5}{2} - \frac{872}{63}\sqrt5\,\ln\frac{1+\sqrt5}{2} + \frac{628}{63}\ln 2 - \frac{2\pi^2}{9} + \frac{67\,282}{6615}\right]\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{76}$$
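The closed-form coefficients in Eqs. (74), (75) and (76) can be converted to frequencies and compared with the corresponding entries of Table 3. The sketch below assumes a value for $\alpha$, takes the conversion unit from the header of Table 2, and uses hypothetical function names; it is a consistency check rather than an independent calculation of the integrals.

```python
import math

ALPHA = 1 / 137.035999          # fine structure constant (assumed value)
UNIT_TABLE2_KHZ = 3250137.65    # 4 alpha (Z alpha)^4 / pi (m_r/m)^3 m for n = 1, Z = 1

# alpha^2 (Z alpha)^5 / (pi n^3) (m_r/m)^3 m in kHz for n = 1:
# the Table 2 unit multiplied by alpha^2 / 4.
UNIT_KHZ = UNIT_TABLE2_KHZ * ALPHA**2 / 4

GOLDEN_LOG = math.log((1 + math.sqrt(5)) / 2)


def coeff_eq74():
    """Two one-loop polarization insertions, Eq. (74)."""
    return -23 / 378


def coeff_eq75():
    """Irreducible two-loop polarization insertion, Eq. (75)."""
    return 52 / 63 * math.log(2) - 25 / 63 * math.pi + 15647 / 13230


def coeff_eq76():
    """One-loop electron factor together with one-loop polarization, Eq. (76)."""
    return (8 / 3 * GOLDEN_LOG**2
            - 872 / 63 * math.sqrt(5) * GOLDEN_LOG
            + 628 / 63 * math.log(2)
            - 2 * math.pi**2 / 9
            + 67282 / 6615)


shifts_1s_khz = {
    "eq74": coeff_eq74() * UNIT_KHZ,
    "eq75": coeff_eq75() * UNIT_KHZ,
    "eq76": coeff_eq76() * UNIT_KHZ,
}
for name, value in shifts_1s_khz.items():
    print(f"{name}: {value:8.2f} kHz")
```

Note the strong cancellation in Eq. (76): individual terms are of order 10, while the sum is about 0.61.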
4.3.3.4. One-loop polarization insertions in the radiative electron factor. This correction is induced by the gauge invariant set of diagrams in Fig. 20(d) with the polarization operator insertions in the radiative photon. The respective radiatively corrected electron factor is given by the expression [74]

$$\mathcal{L}(k) = \int_0^1 dv\,\frac{v^2\left(1-\frac{v^2}{3}\right)}{1-v^2}\,L(k,\lambda), \tag{77}$$

where $L(k,\lambda)$ is just the one-loop electron factor used in Eq. (76) but with a finite photon mass $\lambda^2 = 4/(1-v^2)$. Direct substitution of the radiatively corrected electron factor $\mathcal{L}(k)$ in the skeleton integral in Eq. (66) would lead to an infrared divergence. This divergence reflects the existence in this case of the correction of the previous order in $Z\alpha$ generated by the two-loop insertions in the electron line. The magnitude of this previous order correction is determined by the nonvanishing value of the electron factor $\mathcal{L}(k)$ at zero,

$$\mathcal{L}(0) = -2F_1'(0) - F_2(0), \tag{78}$$

which is simply a linear combination of the slope of the two-loop Dirac form factor and the two-loop contribution to the electron anomalous magnetic moment. Subtraction of the value of the radiatively corrected electron factor at zero momentum removes this previous order contribution, which was already considered above, and leads to a finite integral for the correction of order $\alpha^2(Z\alpha)^5$ [74,71]

$$\Delta E = -\frac{16\alpha^2(Z\alpha)^5}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3 m\int_0^\infty dk\,\frac{\mathcal{L}(k)-\mathcal{L}(0)}{k^2}\,\delta_{l0} = -0.072\,90\ldots\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{79}$$
4.3.3.5. Light by light scattering insertions in the external photons. The diagrams in Fig. 20(e) with the light by light scattering insertions in the external photons do not generate corrections of the previous order in $Z\alpha$. They are both ultraviolet and infrared finite and the respective calculations are in principle quite straightforward though technically involved. Only numerical results were obtained for the contributions to the Lamb shift [71,76]

$$\Delta E = -0.122\,9\ldots\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{80}$$

4.3.3.6. Diagrams with insertions of two radiative photons in the electron line. As we have already seen, contributions of the diagrams with radiative insertions in the electron line always dominate over the contributions of the diagrams with radiative insertions in the external photon lines. This property of the diagrams is due to the gauge invariance of QED. The diagrams (radiative insertions) with the external photon lines should be gauge invariant, and as a result transverse projectors correspond to each external photon. These projectors are rational functions of external momenta, and they additionally suppress low-momentum integration regions in the integrals for energy shifts. The respective projectors are of course missing in the diagrams with insertions in the electron line. The low-momentum integration region is less suppressed in such diagrams, and hence they generate larger contributions to the energy shifts.

This general property of radiative corrections clearly manifests itself in the case of the six gauge invariant sets of diagrams in Fig. 20. By far the largest contribution of order $\alpha^2(Z\alpha)^5$ to the Lamb shift is generated by the last gauge invariant set of diagrams in Fig. 20(f), which consists of nineteen topologically different diagrams [77] presented in Fig. 21. These nineteen graphs may be obtained from the three graphs for the two-loop electron self-energy by insertion of two external photons in all possible ways. Graphs in Fig. 21(a)-(c) are obtained from the two-loop reducible electron self-energy diagram, graphs in Fig. 21(d)-(k) are the result of all possible insertions of two external photons in the rainbow self-energy diagram, and diagrams in Fig. 21(l)-(s) are connected with the overlapping two-loop self-energy graph. Calculation of the respective energy shift was initiated in [77,78], where contributions induced by the diagrams in Fig. 21(a)-(h) and in Fig. 21(l) were obtained. The contribution of all nineteen diagrams to the Lamb shift was first calculated in [79]. In the framework of the skeleton integral approach the calculation was completed in [80,62] with the result

$$\Delta E = -7.725(1)\ldots\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}, \tag{81}$$

which confirmed the one in [79] but is about two orders of magnitude more precise than the result in [79,14].

A few comments are due on the magnitude of this important result. It is sometimes claimed in the literature that it has an unexpectedly large magnitude. A brief glance at Table 3 is sufficient to convince oneself that this is not the case. For the reader who followed closely the discussion of the scales of different contributions above, it should be clear that the natural scale for the correction under discussion is set by the factor $4\alpha^2(Z\alpha)^5/(\pi n^3)\,m$. The coefficient before this factor obtained in Eq. (81) is about $-1.9$ and there is nothing unusual in its magnitude for a numerical factor corresponding to a radiative correction. It should be compared with the respective coefficient 0.739 before the factor $4\alpha(Z\alpha)^5/n^3\,m$ in the case of the electron-line contribution of the previous order in $\alpha$.
Fig. 21. Nineteen topologically different diagrams with two radiative photon insertions in the electron line.

The misunderstanding about the magnitude of the correction of order $\alpha^2(Z\alpha)^5 m$ has its roots in the idea that the expansion of energy in a series over the parameter $Z\alpha$ at fixed power of $\alpha$ should have coefficients of order one. As is clear from the numerous discussions above, however natural such an expansion might seem from the point of view of calculations performed without expansion over $Z\alpha$, there are no real reasons to expect that the coefficients would be of the same order of magnitude in an expansion of this kind. We have already seen that quite different physics is connected with the different terms in the expansion over $Z\alpha$. The terms of order $\alpha^n(Z\alpha)^4$ (and $\alpha^n(Z\alpha)^6$, as we will see below) are generated at large distances (exchanged momenta of the order of the atomic scale $mZ\alpha$) while terms of order $\alpha^n(Z\alpha)^5$ originate from small distances (exchanged momenta of the order of the electron mass $m$). Hence, it should not be concluded that there would be a simple way to figure out the relative magnitude of the successive coefficients in an expansion over $Z\alpha$. The situation is different for the expansion over $\alpha$ at fixed power of $Z\alpha$ since the physics is the same independent of the power of $\alpha$, and the respective coefficients are all of order one, as in the series for the radiative corrections in scattering problems.
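Table 3. Radiative corrections of order $\alpha^n(Z\alpha)^5 m$

| Contribution (references) | Coefficient, in units of $\frac{4\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3 m$ | $\Delta E(1S)$ (kHz) | $\Delta E(2S)$ (kHz) |
|---|---|---|---|
| Electron-line insertions: Karplus et al. [64,65], Baranger et al. [66] | $\left(1+\frac{11}{128}-\frac{1}{2}\ln 2\right)\delta_{l0}$ | 55 090.31 | 6886.29 |
| Polarization contribution: Karplus et al. [64,65], Baranger et al. [66] | $\frac{5}{192}\,\delta_{l0}$ | 1940.38 | 242.55 |
| One-loop polarization: Eides [70], Pachucki and Laporta [71,72] | $-\frac{23}{1512}\,\frac{\alpha}{\pi}\,\delta_{l0}$ | $-2.63$ | $-0.33$ |
| Two-loop polarization: Eides et al. [70], Pachucki and Laporta [71,72] | $\left(\frac{13}{63}\ln 2-\frac{25}{252}\pi+\frac{15647}{52920}\right)\frac{\alpha}{\pi}\,\delta_{l0}$ | 21.99 | 2.75 |
| One-loop polarization and electron factor: Eides and Grotch [73], Pachucki [71], Eides et al. [75] | $\left(\frac{2}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{218}{63}\sqrt5\,\ln\frac{1+\sqrt5}{2}+\frac{157}{63}\ln 2-\frac{\pi^2}{18}+\frac{33641}{13230}\right)\frac{\alpha}{\pi}\,\delta_{l0}$ | 26.45 | 3.31 |
| Polarization insertion in the electron factor: Eides and Grotch [74], Pachucki [71] | $-0.0182\,\frac{\alpha}{\pi}\,\delta_{l0}$ | $-3.15$ | $-0.39$ |
| Light by light scattering: Pachucki [71], Eides et al. [76] | $-0.0307\,\frac{\alpha}{\pi}\,\delta_{l0}$ | $-5.31$ | $-0.66$ |
| Insertions of two radiative photons in the electron line: Pachucki [79], Eides and Shelyuto [80,62] | $-1.9312(3)\,\frac{\alpha}{\pi}\,\delta_{l0}$ | $-334.24(5)$ | $-41.78$ |
| Corrections of order $\alpha^3(Z\alpha)^5 m$ (scale estimate only, see Section 4.3.4) | $(\pm 1?)\,\frac{\alpha^2}{\pi^2}\,\delta_{l0}$ | $\pm 0.4$ | $\pm 0.05$ |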
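4.3.3.7. Total correction of order $\alpha^2(Z\alpha)^5 m$. The total contribution of order $\alpha^2(Z\alpha)^5$ is given by the sum of contributions in Eqs. (74)–(76) and (79)–(81) [75]

$$\Delta E=\left[\frac{8}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{872}{63}\sqrt5\,\ln\frac{1+\sqrt5}{2}+\frac{680}{63}\ln 2-\frac{2\pi^2}{9}-\frac{25\pi}{63}+\frac{24901}{2205}-7.921(1)\right]\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}=-6.862(1)\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{82}$$

$$\Delta E=-6.862(1)\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0},\qquad \Delta E\big|_{n=1}=-296.92(4)\ \text{kHz},\qquad \Delta E\big|_{n=2}=-37.115(5)\ \text{kHz}. \tag{83}$$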
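The arithmetic behind Eq. (82) can be checked with a few lines of Python. This is only a consistency sketch: it sums the analytic terms quoted in Eq. (82), adds the numerical constant $-7.921$ taken from the same equation, and compares the result with the rounded total $-6.862$. The variable names are illustrative and do not come from the original text.

```python
import math

# Golden-ratio combination appearing in the analytic one-loop terms of Eq. (82).
phi = (1 + math.sqrt(5)) / 2

# Analytic terms of Eq. (82), in units of
# alpha^2 (Z alpha)^5 / (pi n^3) (m_r/m)^3 m.
analytic_part = (
    8 / 3 * math.log(phi) ** 2
    - 872 / 63 * math.sqrt(5) * math.log(phi)
    + 680 / 63 * math.log(2)
    - 2 * math.pi ** 2 / 9
    - 25 * math.pi / 63
    + 24901 / 2205
)

# Numerically known part (polarization insertion in the electron factor,
# light-by-light scattering, two radiative photons), as quoted in Eq. (82).
numerical_part = -7.921

total_coefficient = analytic_part + numerical_part

if __name__ == "__main__":
    print(f"analytic part     = {analytic_part:.4f}")
    print(f"total coefficient = {total_coefficient:.4f}")
```

The analytic terms alone are of order one and positive; the total is dominated by the numerically evaluated two-photon electron-line contribution.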
4.3.4. Corrections of order $\alpha^3(Z\alpha)^5 m$
Corrections of order $\alpha^3(Z\alpha)^5$ have not been considered in the literature. From the preceding discussion it is clear that their natural scale is determined by the factor $4\alpha^3(Z\alpha)^5/(\pi^2 n^3)\,m$, which is about 0.4 kHz for the 1S-state and about 0.05 kHz for the 2S-state. Taking into account the rapid experimental progress in the field, these theoretical calculations may become necessary in the future if an experimental accuracy at the level of 1 kHz in the measurement of the 1S Lamb shift is achieved.
4.4. Radiative corrections of order $\alpha^n(Z\alpha)^6 m$
4.4.1. Radiative corrections of order $\alpha(Z\alpha)^6 m$
4.4.1.1. Logarithmic contribution induced by the radiative insertions in the electron line. Unlike the corrections of order $\alpha^n(Z\alpha)^5$, corrections of order $\alpha^n(Z\alpha)^6$ depend on the large distance behavior of the wave functions. Roughly speaking this happens because in order to produce a correction containing six factors of $Z\alpha$ one needs at least three exchange photons, as in Fig. 22. The radiative photon responsible for the additional factor of $\alpha$ does not completely suppress the low-momentum region of the exchange integrals. As usual, long distance contributions turn out to be state dependent. The leading correction of order $\alpha(Z\alpha)^6$ contains a logarithm squared, which can be compared to the first power of the logarithm in the leading-order contribution to the Lamb shift. One can understand the appearance of the logarithm squared factor qualitatively. In the leading-order contribution to the Lamb shift, the logarithm was completely connected with the logarithmic infrared singularity of the electron form factor. Now we have two exchanged loops and one should anticipate the emergence of an exchanged logarithm generated by these loops.
Note that the diagram with one exchange loop (e.g., relevant for the correction of order $\alpha(Z\alpha)^5$) cannot produce a logarithm, since in the external field approximation the loop integration measure $d^3k$ is odd in the exchanged momentum, while all other factors in the exchanged integral are even in the exchanged momentum. Hence, in order to produce a logarithm, which can only arise from the dimensionless integrand, it is necessary to consider an even number of exchanged loops. These simple remarks
Fig. 22. Diagram with three spanned Coulomb photons.
may also be understood in another way if one recollects that in the relativistic corrections to the Schrödinger–Coulomb wave function each power of the logarithm is multiplied by the factor $(Z\alpha)^2$ (this is evident if one expands the exact Dirac wave function near the origin). The logarithm squared term is, of course, state independent, since the coefficient before this term is determined by the high momentum integration region, where the dependence on the principal quantum number may enter only via the value of the wave function at the origin squared. Terms linear in the large logarithm are already state dependent. Logarithmic terms were first calculated in [81–84]. For the S-states the logarithmic contribution is equal to

$$\Delta E_{\log}\big|_{l=0}=\left\{-\frac{1}{4}\ln^2\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]+\left[\frac{4}{3}\ln 2+\ln\frac{2}{n}+\psi(n+1)-\psi(1)-\frac{601}{720}-\frac{77}{180n^2}\right]\ln\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]\right\}\frac{4\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m, \tag{84}$$

where

$$\psi(n)=\sum_{k=1}^{n-1}\frac{1}{k}+\psi(1) \tag{85}$$

is the logarithmic derivative of the Euler $\Gamma$-function, $\psi(x)=\Gamma'(x)/\Gamma(x)$, $\psi(1)=-\gamma$. For non-S-states the state-independent logarithm squared term disappears and the single-logarithmic contribution has the form

$$\Delta E_{\log}\big|_{l\neq 0}=\left[\left(1-\frac{1}{n^2}\right)\left(\frac{1}{30}+\frac{1}{12}\,\delta_{j,\frac12}\right)\delta_{l1}+\frac{6-2l(l+1)/n^2}{3(2l+3)\,l(l+1)(4l^2-1)}\right]\ln\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]\frac{4\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m. \tag{86}$$
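As a hedged illustration of Eqs. (84)–(86), the sketch below evaluates the coefficients of the single logarithm. It assumes the reading of the formulas given above, with $\psi(n+1)-\psi(1)$ computed as the harmonic sum from Eq. (85). The function names are hypothetical. Multiplying by 4 converts the coefficients to units of $\alpha(Z\alpha)^6/(\pi n^3)$; the 1S value can then be compared with the coefficient $\frac{28}{3}\ln 2-\frac{21}{20}$ that appears in Eq. (90).

```python
import math
from fractions import Fraction


def harmonic(n):
    """psi(n+1) - psi(1) = sum_{k=1}^{n} 1/k, following Eq. (85)."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def single_log_coeff_s(n):
    """Single-log coefficient of Eq. (84) for nS states,
    in units of 4 alpha (Z alpha)^6 / (pi n^3) (m_r/m)^3 m."""
    return (
        4 / 3 * math.log(2)
        + math.log(2 / n)
        + float(harmonic(n))
        - 601 / 720
        - 77 / (180 * n * n)
    )


def single_log_coeff_non_s(n, l, two_j):
    """Single-log coefficient of Eq. (86) for l >= 1 (exact rational),
    in the same units; two_j is 2j, so two_j == 1 means j = 1/2."""
    first = Fraction(0)
    if l == 1:
        j_term = Fraction(1, 12) if two_j == 1 else Fraction(0)
        first = (1 - Fraction(1, n * n)) * (Fraction(1, 30) + j_term)
    second = (6 - Fraction(2 * l * (l + 1), n * n)) / (
        3 * (2 * l + 3) * l * (l + 1) * (4 * l * l - 1)
    )
    return first + second


if __name__ == "__main__":
    print("1S:", 4 * single_log_coeff_s(1))
    print("2P1/2:", 4 * single_log_coeff_non_s(2, 1, 1))
    print("2P3/2:", 4 * single_log_coeff_non_s(2, 1, 3))
```

Exact rationals are used for the non-S states because those coefficients contain no transcendental terms, which makes a comparison with tabulated fractions straightforward.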
Calculation of the state-dependent nonlogarithmic contribution of order $\alpha(Z\alpha)^6$ is a difficult task, and has not been done for an arbitrary principal quantum number $n$. The first estimate of this contribution was made in [84]. Next the problem was attacked from a different angle [85,86]. Instead of calculating corrections of order $\alpha(Z\alpha)^6$, an exact numerical calculation of all contributions with one radiative photon, without expansion over $Z\alpha$, was performed for comparatively large values of $Z$ ($n=2$), and then the result was extrapolated to $Z=1$. In this way an estimate of the sum of the contribution of order $\alpha(Z\alpha)^6$ and higher-order contributions $\alpha(Z\alpha)^7$ was obtained (for $n=2$ and $Z=1$). We will postpone discussion of the results obtained in this way to Section 4.5.1, dealing with corrections of order $\alpha(Z\alpha)^7$, and will consider here only the direct calculations of the contribution of order $\alpha(Z\alpha)^6$. An exact formula in $Z\alpha$ for all nonrecoil corrections of order $\alpha$ has the form

$$\Delta E=\langle n|\Sigma^{(2)}|n\rangle, \tag{87}$$

where $\Sigma^{(2)}$ is an "exact" second-order self-energy operator for the electron in the Coulomb field (see Fig. 23), and hence contains the unmanageable exact Dirac–Coulomb Green function. The real problem with this formula is to extract useful information from it despite the absence of a convenient expression for the Dirac–Coulomb Green function. Numerical calculation without expansion
Fig. 23. Exact second-order self-energy operator.
over $Z\alpha$, mentioned in the previous paragraph, was performed directly with the help of this formula. A more precise (than in [84]) value of the nonlogarithmic correction of order $\alpha(Z\alpha)^6$ for the 1S-state was obtained in [87,88], with the help of a specially developed "perturbation theory" for the Dirac–Coulomb Green function which expressed this function in terms of the nonrelativistic Schrödinger–Coulomb Green function [89,90]. But the real breakthrough was achieved in [91,92], where a new, very effective method of calculation was suggested and very precise values of the nonlogarithmic corrections of order $\alpha(Z\alpha)^6$ for the 1S- and 2S-states were obtained. We will briefly discuss the approach of papers [91,92] in the next subsection.
4.4.1.2. New approach to separation of the high- and low-momentum contributions. Nonlogarithmic corrections. Starting with the very first nonrelativistic consideration of the main contribution to the Lamb shift [32], separation of the contributions of high- and low-frequency radiative photons became a characteristic feature of the Lamb shift calculations. The main idea of this approach was already explained in Section 4.2.1, but we skipped over two obstacles impeding effective implementation of this idea. Both problems are connected with the effective realization of the matching procedure. In real calculations it is not always obvious how to separate the two integration regions in a consistent way, since in the high-momenta region one uses explicitly relativistic expressions, while the starting point of the calculation in the low-momenta region is the nonrelativistic dipole approximation. The problem is aggravated by the inclination to use different gauges in different regions, since the explicitly covariant Feynman gauge is the simplest one for explicitly relativistic expressions in the high-momenta region, while the Coulomb gauge is the gauge of choice in the nonrelativistic region.
In order to emphasize the seriousness of these problems it suffices to mention that incorrect matching of high- and low-frequency contributions in the initial calculations of Feynman and Schwinger led to a significant delay in the publication of the first fully relativistic Lamb shift calculation of French and Weisskopf [34] (see the fascinating description of this episode in [93]). It was a strange irony of history that due to these difficulties it became common wisdom in the sixties that it is better to try to avoid the separation of the contributions coming from different momenta regions (or different distances) than to try to invent an accurate matching procedure. A few citations are appropriate here. Bjorken and Drell [19] wrote, having in mind the separation procedure: "The reader may understandably be unhappy with this procedure ... we recommend the recent treatment of Erickson and Yennie [83,84], which avoids the division into soft and hard photons". Schwinger [55] wrote: "... there is a moral here for us. The artificial separation of high and low frequencies, which are handled in different ways, must be avoided." All this was written even though it was understood that the separation of the large and small distances was physically quite natural and the contributions coming from large and small distances have a different physical nature. However, the distrust of the
methods used for separation of the small and large distances was well justified by the lack of a regular method of separation. Apparently different methods were used for calculation of the high- and low-frequency contributions, high-frequency contributions being commonly treated in a covariant four-dimensional approach, while old-fashioned nonrelativistic perturbation theory was used for calculation of the low-frequency contributions. Matching these contributions obtained in different frameworks was an ambiguous and far from obvious procedure, more art than science. As a result, despite the fact that the methods based on separation of long- and short-distance contributions had led to some spectacular results (see, e.g., [94,95]), their self-consistency remained suspect, especially when it was necessary to calculate the contributions of higher order than in the classic works. It seemed more or less obvious that in order to facilitate such calculations one needed to develop uniform methods for treatment of both small and large distances. The actual development took, however, a different direction. Instead of rejecting the separation of high and low frequencies, more elaborate methods of matching the respective contributions were developed in the last decade, and the general attitude to separation of small and large distances radically changed. Perhaps the first step to carefully separate the long and short distances was made in [25], where the authors rearranged the old-fashioned perturbation theory in such a way that one contribution emphasized the small momentum contributions and led to a Bethe logarithm, while in the other the small momentum integration region was naturally suppressed. Matching of both contributions in this approach was more natural and automatic. However, the price for this was perhaps too high, since the high-momentum contribution was to be calculated in a three-dimensional way, thus losing all advantages of the covariant four-dimensional methods.
Almost all new approaches (the skeleton integral approach described above in Section 4.3.1 ([62] and references there), the $\epsilon$-method described in this section [91,92], the nonrelativistic approach by Khriplovich and coworkers [96], nonrelativistic QED of Caswell and Lepage [17]) not only separate the small and large distances, but try to exploit this separation most effectively. In some cases, when the whole contribution comes only from the small distances, a rather simple approach to this problem is appropriate (as in the calculation of corrections of order $\alpha^2(Z\alpha)^4$, $\alpha^3(Z\alpha)^4$, $\alpha(Z\alpha)^5$ and $\alpha^2(Z\alpha)^5$ above, more examples below) and the scattering approximation is often sufficient. In such cases, would-be infrared divergences are powerlike. They simply indicate the presence of the contributions of the previous order in $Z\alpha$ and may safely be thrown away. In other cases, when one encounters logarithms which get contributions both from the small and large distances, a more accurate approach is necessary, such as the one described below. In any case "the separation of low and high frequencies, which are handled in different ways" not only should not be avoided but turns out to be a very convenient calculational tool and clarifies the physical nature of the corrections under consideration. An effective method to separate contributions of low- and high-momenta, avoiding at the same time the problems discussed above, was suggested in [91,92]. Consider in more detail the exact expression Eq. (87) for the sum of all corrections of orders $\alpha(Z\alpha)^n m$ ($n\geq 1$) generated by the insertion of one radiative photon in the electron line

$$\Delta E=e^2\int\frac{d^4k}{(2\pi)^4 i}\,D_{\mu\nu}(k)\,\langle n|\gamma_\mu G(p'-k;\,p-k)\gamma_\nu|n\rangle, \tag{88}$$
where $G(p'-k;\,p-k)$ is the exact electron Green function in the external Coulomb field. As was noted in [91,92], one can rotate the integration contour over the frequency of the radiative photon in such a way that it encloses singularities along the positive real axis in the $\omega$ ($k_0$) plane. Then one considers separately the region $\mathrm{Re}\,\omega\leq\sigma$ (region I) and $\mathrm{Re}\,\omega\geq\sigma$ (region II), where $m(Z\alpha)^2\ll\sigma\ll m(Z\alpha)$. It is easy to see that due to the structure of the singularities of the integrand, integration over $k$ in region I also goes only over momenta smaller than $\sigma$ ($|k|\leq\sigma$), while in region II the final integration over $\omega$ cuts off all would-be infrared divergences of the integral. Hence, effective separation of high- and low-momenta integration regions is achieved in this way and, as was explained above, due to the choice of the magnitude of the parameter $\sigma$ all would-be divergences should exactly cancel in the sum of contributions of these regions. This cancellation provides an additional effective method of control of the accuracy of all calculations. It was also shown in [92] that a change of gauge in the low-frequency region changes the result of the calculations by a term linear in $\sigma$. But in any case one should discard such contributions when matching high- and low-frequency contributions. The matrix element of the self-energy operator between the exact Coulomb–Dirac wave functions is gauge invariant with respect to changes of gauge of the radiative photon [97]. Hence, it is possible to use the simple Feynman gauge for calculation of the high-momenta contribution, and the physical Coulomb gauge in the low-momenta part. It should be clear now that this method resolves all problems connected with the separation of the high- and low-momenta contributions and thus provides an effective tool for calculation of all corrections with insertion of one radiative photon in the electron line.
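The cancellation of the auxiliary scale $\sigma$ between the two regions can be illustrated on a toy integral. The sketch below is not the calculation of [91,92]; it is a hypothetical one-dimensional model, $I=\int_0^\infty 2k\,m_h^2\,dk/[(k^2+\lambda^2)(k^2+m_h^2)]$, with a low scale $\lambda$ and a high scale $m_h$ chosen here (as an assumption) to mimic $m(Z\alpha)^2$ and $m(Z\alpha)$ in units of the electron mass. In each region the integrand is simplified as appropriate for that region, and the sum is compared with the exact closed form.

```python
import math

ALPHA = 1 / 137.035999  # fine structure constant (assumed numerical value)


def exact(lam, m_h):
    """Closed form of I = int_0^inf 2 k m_h^2 dk / ((k^2+lam^2)(k^2+m_h^2))."""
    return m_h**2 / (m_h**2 - lam**2) * math.log(m_h**2 / lam**2)


def low_region(lam, sigma):
    """Region I (k <= sigma): integrand simplified for k << m_h,
    i.e. int_0^sigma 2 k dk / (k^2 + lam^2)."""
    return math.log(1 + sigma**2 / lam**2)


def high_region(m_h, sigma):
    """Region II (k >= sigma): integrand simplified for k >> lam,
    i.e. int_sigma^inf 2 m_h^2 dk / (k (k^2 + m_h^2))."""
    return math.log(1 + m_h**2 / sigma**2)


# Toy scales in units of the electron mass (Z = 1).
lam = ALPHA**2
m_h = ALPHA
sigma_0 = math.sqrt(lam * m_h)  # geometric mean, lam << sigma_0 << m_h

if __name__ == "__main__":
    print("exact:", exact(lam, m_h))
    for factor in (0.5, 1.0, 2.0):
        sigma = factor * sigma_0
        matched = low_region(lam, sigma) + high_region(m_h, sigma)
        print(f"sigma = {factor} * sigma_0: low + high = {matched:.4f}")
```

Each region separately depends logarithmically on $\sigma$, but the sum reproduces $\ln(m_h^2/\lambda^2)$ up to small power corrections in $\lambda/\sigma$ and $\sigma/m_h$, which is the kind of check on the matching that the text describes.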
The calculation performed in [91,92,98] successfully reproduced all results of order $\alpha(Z\alpha)^4$ and $\alpha(Z\alpha)^5$ and produced a high precision result for the constant of order $\alpha(Z\alpha)^6$

$$\Delta E_{e\text{-line}}(1S)=-30.92415(1)\,\frac{\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m,\qquad \Delta E_{e\text{-line}}(2S)=-31.84047(1)\,\frac{\alpha(Z\alpha)^6}{8\pi}\left(\frac{m_r}{m}\right)^3 m. \tag{89}$$

Besides the high accuracy of this result two other features should be mentioned. First, the state dependence of the constant is very weak, and second, the scale of the constant is just of the magnitude one should expect. In order to make this last point more transparent let us write the total electron-line contribution of order $\alpha(Z\alpha)^6$ to the 1S energy shift in the form

$$\Delta E(1S)=\left\{-\ln^2\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]+\left(\frac{28}{3}\ln 2-\frac{21}{20}\right)\ln\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]-30.92890\right\}\frac{\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m\approx\left\{-\ln^2\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]+5.42\,\ln\left[\frac{m(Z\alpha)^{-2}}{m_r}\right]-30.93\right\}\frac{\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m. \tag{90}$$

Now we see that the ratio of the nonlogarithmic term and the coefficient before the single-logarithmic term is about $31/5.4\approx 5.7\approx 0.6\pi^2$. It is well known that the logarithm squared terms in QED are always accompanied by the single-logarithmic and nonlogarithmic terms, and the nonlogarithmic terms are of order $\pi^2$ (in relation with the current problem see, e.g., [83,84]). This is just what happens in the present case, as we have demonstrated.
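Table 4. Nonlogarithmic coefficient $A_{60}$

| State | Reference | $A_{60}$, in units of $\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m$ | kHz |
|---|---|---|---|
| $1S_{1/2}$ | Pachucki [91,92,98] | $-30.92415(1)$ | $-1338.04$ |
| $2S_{1/2}$ | Pachucki [91,92] | $-31.84047(1)$ | $-172.21$ |
| $2P_{1/2}$ | Jentschura and Pachucki [99] | $-0.99891(1)$ | $-5.40$ |
| $2P_{3/2}$ | Jentschura and Pachucki [99] | $-0.50337(1)$ | $-2.72$ |
| $3P_{1/2}$ | Jentschura et al. [100] | $-1.14768(1)$ | $-1.84$ |
| $3P_{3/2}$ | Jentschura et al. [100] | $-0.59756(1)$ | $-0.96$ |
| $4P_{1/2}$ | Jentschura et al. [100] | $-1.19568(1)$ | $-0.81$ |
| $4P_{3/2}$ | Jentschura et al. [100] | $-0.63094(1)$ | $-0.43$ |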
Nonlogarithmic contributions of order $\alpha(Z\alpha)^6$ to the energies of the 2P, 3P and 4P-states induced by the radiative photon insertions in the electron line were obtained in the same framework in [99,100]. We have collected the respective results in Table 4 in terms of the traditionally used coefficient $A_{60}$ [83], which is defined by the relationship

$$\Delta E=A_{60}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m. \tag{91}$$
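The conversion from the coefficient $A_{60}$ of Eq. (91) to the kHz values of Table 4 can be sketched as follows. The numerical constants (fine structure constant, electron Compton frequency $mc^2/h$, electron-to-proton mass ratio) are assumed standard values and are not taken from this text; the function name is illustrative.

```python
import math

# Assumed numerical inputs (standard values, not quoted in the text).
ALPHA = 7.2973525693e-3               # fine structure constant
ELECTRON_COMPTON_FREQ_HZ = 1.23558996e20  # m c^2 / h for the electron
ELECTRON_PROTON_MASS_RATIO = 5.4461702e-4

# Reduced-mass factor (m_r/m)^3 for hydrogen.
REDUCED_MASS_CUBED = (1 / (1 + ELECTRON_PROTON_MASS_RATIO)) ** 3


def a60_to_khz(a60, n, z=1):
    """Energy shift of Eq. (91) in kHz for a given coefficient A60 and n."""
    unit_hz = (
        ALPHA * (z * ALPHA) ** 6 / (math.pi * n**3)
        * REDUCED_MASS_CUBED
        * ELECTRON_COMPTON_FREQ_HZ
    )
    return a60 * unit_hz / 1e3


if __name__ == "__main__":
    print("1S   :", round(a60_to_khz(-30.92415, 1), 2))
    print("2S   :", round(a60_to_khz(-31.84047, 2), 2))
    print("2P1/2:", round(a60_to_khz(-0.99891, 2), 2))
```

For hydrogen the unit $\alpha(Z\alpha)^6 (m_r/m)^3 m/\pi$ is roughly 43 kHz, so coefficients of order 30 for S-states translate into shifts of order 1 MHz for 1S.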
4.4.1.3. Correction induced by the radiative insertions in the external photons. There are two kernels with radiative insertions in the external photon lines which produce corrections of order $\alpha(Z\alpha)^6$ to the Lamb shift. The first is our old acquaintance, the one-loop polarization insertion in the Coulomb line in Fig. 9. Its Fourier transform is called the Uehling potential [40,101]. The second kernel contains the light-by-light scattering diagrams in Fig. 24 with three external photons originating from the Coulomb source. The sum of all closed electron loops in Fig. 25 with one photon connected with the electron line and an arbitrary number of Coulomb photons originating from the Coulomb source may be considered as a radiatively corrected Coulomb potential

[...] $\psi(r)$ . 3

We will now discuss contributions contained in Eq. (186) for different special cases.
(186)
7.3.2.1. Correction to the nS-levels. In the Schrödinger–Coulomb approximation the expression in Eq. (186) reduces to the leading nuclear size correction in Eq. (158). New results arise if we take into account Dirac corrections to the Schrödinger–Coulomb wave functions of relative order $(Z\alpha)^2$. For the nS states the product of the wave functions in Eq. (186) has the form (see, e.g., [165])

$$|\psi(r)|^2=|\psi_{Schr}(r)|^2\left\{1-(Z\alpha)^2\left[\ln\frac{2mrZ\alpha}{n}+\psi(n)+2\gamma+\frac{9}{4n^2}-\frac{1}{n}-\frac{11}{4}\right]\right\}, \tag{187}$$

and the additional contribution to the energy shift is equal to

$$\Delta E=-\frac{2(Z\alpha)^6}{3n^3}\,m_r^3\langle r^2\rangle\left[\left\langle\ln\frac{2mrZ\alpha}{n}\right\rangle+\psi(n)+2\gamma+\frac{9}{4n^2}-\frac{1}{n}-\frac{11}{4}\right]. \tag{188}$$
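A rough numerical orientation for Eq. (188) can be obtained as follows. The sketch makes two assumptions that are not part of the original formula: the averaged logarithm $\langle\ln(2mrZ\alpha/n)\rangle$ is replaced by the logarithm evaluated at the rms proton radius $0.862$ fm, and the prefactor is obtained by rescaling the leading nuclear size contribution of 1162 kHz (for 1S in hydrogen) by $(Z\alpha)^2$. The reduced electron Compton wavelength is an assumed standard value, and all names are illustrative.

```python
import math

ALPHA = 1 / 137.035999          # fine structure constant (assumed value)
COMPTON_BAR_FM = 386.159        # reduced electron Compton wavelength 1/m, in fm
EULER_GAMMA = 0.5772156649      # Euler constant, psi(1) = -gamma
R_PROTON_FM = 0.862             # proton rms charge radius used in the text
LEADING_1S_KHZ = 1162.0         # leading nuclear size contribution for 1S


def effective_log(n, r_fm, z=1):
    """ln(2 m r Z alpha / n), with <ln r> crudely replaced by ln(r_rms)."""
    return math.log(2 * r_fm * z * ALPHA / (n * COMPTON_BAR_FM))


def digamma_int(n):
    """psi(n) for positive integer n: psi(1) plus the harmonic sum up to n-1."""
    return -EULER_GAMMA + sum(1 / k for k in range(1, n))


def bracket(n, r_fm, z=1):
    """Square bracket of Eq. (188) under the rms-radius assumption."""
    return (
        effective_log(n, r_fm, z)
        + digamma_int(n)
        + 2 * EULER_GAMMA
        + 9 / (4 * n * n)
        - 1 / n
        - 11 / 4
    )


# Naive (Z alpha)^6 estimate for 1S: -(Z alpha)^2 * leading * bracket.
naive_1s_khz = -(ALPHA**2) * LEADING_1S_KHZ * bracket(1, R_PROTON_FM)

if __name__ == "__main__":
    print("effective log (1S):", round(effective_log(1, R_PROTON_FM), 2))
    print("naive estimate (kHz):", round(naive_1s_khz, 3))
```

Under these assumptions the logarithm comes out close to $-10$, and the naive estimate lands within a few percent of the 1S entry for the order $(Z\alpha)^6$ nuclear size correction listed in Table 10.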
This expression nicely illustrates the main qualitative features of the order $(Z\alpha)^6$ nuclear size contribution. First, we observe a logarithmic enhancement connected with the singularity of the Dirac wave function at small distances. Due to the smallness of the nuclear size, the effective logarithm of the ratio of the atomic size and the nuclear size is a rather large number; it is equal to about $-10$ for the 1S level in hydrogen and deuterium. The result in Eq. (188) contains all state-dependent contributions of order $(Z\alpha)^6$. A tedious third-order perturbation theory calculation [164,165] produces some additional state-independent terms, with the net result being a few percent different from the naive result above. The additional state-independent contribution beyond the naive result above has the form [167]
1r2 1r211/r2 2(Za) m ! # dr dr o(r)o(r)h("r"!"r") *E" 2 3 3n
; (r#r) ln
"r" r r r!r ! # # "r" 3"r" 3"r" 3
# 6 dr dr dr o(r)o(r)o(r)h("r"!"r")h("r"!"r")
;
r "r" r r 1 1 rr 2rr r ln ! # # # ! # . 3 "r" 45"r""r" 9 "r" "r" 36r 9r 9
(189)
Note that, unlike the leading naive terms in Eq. (188), this additional contribution depends on more detailed features of the nuclear charge distribution than simply the charge radius squared. Detailed numerical calculations in the interesting cases of hydrogen and deuterium were performed in [167]. Nuclear size contributions of order $(Z\alpha)^6$ to the energy shifts in hydrogen are given in Table 10 (all numbers in Table 10 are calculated for the proton radius $r_p=0.862(12)$ fm; see the discussion of the status of the proton radius results in Section 16.1.5) and, as discussed above, they are an order of magnitude larger than the nuclear size and polarizability contributions of the previous order in $Z\alpha$. The respective corrections to the energy levels in deuterium are even larger than in hydrogen due to the large radius of the deuteron. The nuclear size contribution of order $(Z\alpha)^6$ to the 2S–1S splitting in deuterium is equal to (we have used in this calculation the value of the deuteron charge
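Table 10. Nuclear size and structure corrections

| Contribution (references) | Formula | $\Delta E(1S)$ (kHz) | $\Delta E(2S)$ (kHz) |
|---|---|---|---|
| Leading nuclear size contribution | $\frac{2}{3n^3}(Z\alpha)^4 m_r^3\langle r^2\rangle\,\delta_{l0}$ | 1162(32) | 145(4) |
| Proton form factor contribution of order $(Z\alpha)^5$: Borisoglebsky and Trofimenko [164], Friar [165] | $-\frac{m(Z\alpha)^5}{3n^3}\,m_r^3\langle r^3\rangle_{(2)}\,\delta_{l0}$ | $-0.036$ | $-0.004$ |
| Polarizability contribution $\Delta E(nS)$: Startsev et al. [172], Khriplovich and Sen'kov [177,179,180] | $-\frac{\alpha(Z\alpha)^3}{\pi n^3}\,m_r^3 m\,[5\alpha(0,0)-\bar\beta(0,0)]\ln\frac{\Lambda}{m}$ | $-0.070(11)(7)$ | $-0.009(1)(1)$ |
| Polarizability contribution $\Delta E(nP)$: Ericson and Hüfner [169] | $-\frac{\alpha(Z\alpha)^4\,\alpha\,m^4}{2}\,\frac{3n^2-l(l+1)}{2n^5\left(l+\frac32\right)(l+1)\left(l+\frac12\right)l\left(l-\frac12\right)}$ | | |
| Nuclear size correction of order $(Z\alpha)^6$, $\Delta E(nS)$: Borisoglebsky and Trofimenko [164], Friar [165], Friar and Payne [167] | $-\frac{2(Z\alpha)^6}{3n^3}\,m_r^3\langle r^2\rangle\left[\left\langle\ln\frac{2mrZ\alpha}{n}\right\rangle+\psi(n)+2\gamma+\frac{9}{4n^2}-\frac{1}{n}-\frac{11}{4}\right]+\delta E$ | 0.709(20) | 0.095(3) |
| Nuclear size correction of order $(Z\alpha)^6$, $\Delta E(nP_j)$: Friar [165] | $\frac{(Z\alpha)^6(n^2-1)}{6n^5}\,m_r^3\langle r^2\rangle\,\delta_{j,\frac12}$ | | |
| Electron-line radiative correction: Pachucki [195], Eides and Grotch [192] | $-1.985(1)\,\frac{\alpha(Z\alpha)^5}{n^3}\,m_r^3\langle r^2\rangle\,\delta_{l0}$ | $-0.184(5)$ | $-0.023(1)$ |
| Polarization operator radiative correction: Friar [193], Hylton [194], Pachucki [195], Eides and Grotch [192] | $\frac{1}{2}\,\frac{\alpha(Z\alpha)^5}{n^3}\,m_r^3\langle r^2\rangle\,\delta_{l0}$ | 0.046(1) | 0.006 |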
radius obtained in [189] from the analysis of all available experimental data)

$$\Delta E=-3.43\ \text{kHz}, \tag{190}$$

and in hydrogen

$$\Delta E=-0.61\ \text{kHz}. \tag{191}$$
We see that the difference of these corrections gives an important contribution to the hydrogen–deuterium isotope shift.
7.3.2.2. Correction to the nP-levels. Corrections to the energies of P-levels may easily be obtained from Eq. (186). Since the P-state wave functions vanish at the origin there are no charge radius squared contributions of lower order, unlike the case of S states, and we immediately obtain [165]

$$\Delta E(nP_j)=\frac{(n^2-1)(Z\alpha)^6}{6n^5}\,m_r^3\langle r^2\rangle\,\delta_{j,\frac12}. \tag{192}$$

There exist also additional terms of order $(Z\alpha)^6$ proportional to $\langle r^4\rangle$ [165], but they are suppressed by an additional factor $m^2\langle r^2\rangle$ in comparison with the result above and may safely be omitted.
7.4. Radiative correction of order $\alpha(Z\alpha)^5\langle r^2\rangle m_r^3$ to the finite size effect
Due to the large magnitude of the leading nuclear size correction in Eq. (158), at the current level of experimental accuracy one also has to take into account radiative corrections to this effect. These radiative corrections were first discussed and greatly overestimated in [190]. The problem was almost immediately clarified in [191], where it was shown that the contribution is generated by large intermediate momenta states and is parametrically a small correction of order $\alpha(Z\alpha)^5 m_r^3\langle r^2\rangle$. On the basis of the estimate in [191] the authors of [7] expected the radiative correction to the leading nuclear charge radius contribution to be of order 10 Hz for the 1S-state in hydrogen. The large magnitude of the characteristic integration momenta [191] is quite clear. As we have seen above, in the calculation of the main proton charge contribution, the exchange momentum squared factor in the numerator connected with the proton radius cancels with a similar factor in the denominator supplied by the photon propagator. Any radiative correction behaves as $k^2$ at small momenta, and the presence of such a correction additionally suppresses small integration momenta and pushes the characteristic integration momenta into the relativistic region of order of the electron mass. Hence, the corrections may be calculated with the help of the skeleton integrals in the scattering approximation. The characteristic integration momenta in the skeleton integral are of order of the electron mass, and are still much smaller than the scale of the proton form factor. As a result the respective contribution to the energy shift depends only on the slope of the form factor.
The actual calculation essentially coincides with the calculation of the corrections of order $\alpha^2(Z\alpha)^5$ to the Lamb shift in Section 4.3.3 but is technically simpler due to the triviality of the proton form factor slope contribution in Eq. (156). There are two sources of radiative corrections to the leading nuclear size effect, namely, the diagrams with one-loop radiative insertions in the electron line in Fig. 46, and the diagrams with one-loop polarization insertions in one of the external Coulomb lines in Fig. 47.
Fig. 46. Electron-line radiative correction to the nuclear size effect. Bold dot corresponds to proton form factor slope.
Fig. 47. Coulomb-line radiative correction to the nuclear size effect. Bold dot corresponds to proton form factor slope.
7.4.1. Electron-line correction
Inserting the electron line factor [74,75] and the proton slope contribution Eq. (156) in the skeleton integral in Eq. (66), one immediately obtains [192]

$$\Delta E_{e\text{-line}}=-1.985(1)\,\frac{\alpha(Z\alpha)^5}{n^3}\,m_r^3\langle r^2\rangle\,\delta_{l0}, \tag{193}$$

where an additional factor 2 was also inserted in the skeleton integral in order to take into account all possible ways to insert the slope of the proton form factor in the Coulomb photons. In principle, this integral also admits an analytic evaluation, in the same way as was done for a more complicated integral in [75].
7.4.2. Polarization correction
Calculation of the diagrams with the polarization operator insertion proceeds exactly as in the case of the electron factor insertion. The only difference is that one inserts an additional factor 4 in the skeleton integral to take into account all possible ways to insert the polarization operator and the slope of the proton form factor in the Coulomb photons. After an easy analytic calculation one obtains [193,194,192]

$$\Delta E_{pol}=\frac{\alpha(Z\alpha)^5}{2n^3}\,m_r^3\langle r^2\rangle\,\delta_{l0}. \tag{194}$$

7.4.3. Total radiative correction
The total radiative correction to the proton size effect is given by the sum of contributions in Eqs. (193) and (194)

$$\Delta E=-1.485(1)\,\frac{\alpha(Z\alpha)^5}{n^3}\,m_r^3\langle r^2\rangle\,\delta_{l0}. \tag{195}$$
This contribution was also considered in [195]. Correcting an apparent misprint in that work, one finds the value $-1.43$ for the numerical coefficient in Eq. (195). The origin of the minor discrepancy between this value and the one in Eq. (195) is unclear, since the calculations in [195] were done without separation of the polarization operator and electron factor contributions.
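The size of the total radiative correction in Eq. (195) can be estimated by rescaling the leading nuclear size contribution, since the ratio of Eq. (195) to the leading term $\frac{2}{3n^3}(Z\alpha)^4 m_r^3\langle r^2\rangle$ is simply $\frac{3}{2}\,(-1.485)\,\alpha(Z\alpha)$. The sketch below assumes $Z=1$ and takes the leading hydrogen values 1162 kHz (1S) and 145 kHz (2S) from Table 10; the function name is illustrative.

```python
ALPHA = 1 / 137.035999  # fine structure constant (assumed value)

ELECTRON_LINE_COEFF = -1.985  # Eq. (193)
POLARIZATION_COEFF = 0.5      # Eq. (194)
TOTAL_COEFF = ELECTRON_LINE_COEFF + POLARIZATION_COEFF  # Eq. (195)


def radiative_correction_khz(leading_khz, z=1):
    """Rescale the leading nuclear size shift (2/3)(Z alpha)^4 m_r^3 <r^2>/n^3
    to the radiative correction TOTAL_COEFF * alpha (Z alpha)^5 m_r^3 <r^2>/n^3."""
    return 1.5 * TOTAL_COEFF * ALPHA * (z * ALPHA) * leading_khz


if __name__ == "__main__":
    print("total coefficient:", TOTAL_COEFF)
    print("hydrogen 1S (kHz):", round(radiative_correction_khz(1162.0), 3))
    print("hydrogen 2S (kHz):", round(radiative_correction_khz(145.0), 3))
```

Because the same $\langle r^2\rangle$ enters both the leading term and the radiative correction, this rescaling does not require a separate value of the proton radius.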
Fig. 48. Z-boson exchange diagram.
Numerically the total radiative contribution in Eq. (195) for hydrogen is equal to

$$\Delta E(1S)=-0.138\ \text{kHz},\qquad \Delta E(2S)=-0.017\ \text{kHz}, \tag{196}$$

and for deuterium

$$\Delta E(1S)=-0.841\ \text{kHz},\qquad \Delta E(2S)=-0.105\ \text{kHz}. \tag{197}$$

These contributions should be taken into account in the discussion of the hydrogen–deuterium isotope shift.
8. Weak interaction contribution
The weak interaction contribution to the Lamb shift is generated by the Z-boson exchange in Fig. 48, which may be described by the effective local low-energy Hamiltonian
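$$H^Z(L)=-\frac{16\pi\alpha}{\sin^2\theta_W\cos^2\theta_W}\left(\frac{1}{4}-\sin^2\theta_W\right)^2\frac{mM}{M_Z^2}\int d^3x\,\bigl(\psi^+(x)\psi(x)\bigr)\bigl(\Psi^+(x)\Psi(x)\bigr), \tag{198}$$

where $M_Z$ is the Z-boson mass, $\theta_W$ is the Weinberg angle, and $\psi$ and $\Psi$ are the two-component wave functions of the light and heavy particles, respectively. Then we easily obtain the weak interaction contribution to the Lamb shift in hydrogen [196]

$$\Delta E^Z(L)=-\frac{8Gm^2}{\sqrt2\,\alpha}\left(\frac{1}{4}-\sin^2\theta_W\right)^2\frac{\alpha(Z\alpha)^4 m}{\pi n^3}\,\delta_{l0}\approx-7.7\times10^{-13}\,\frac{\alpha(Z\alpha)^4 m}{\pi n^3}\,\delta_{l0}. \tag{199}$$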
This contribution is too small to be of any phenomenological significance.
9. Lamb shift in light muonic atoms
Theoretically, light muonic atoms (we will often speak about muonic hydrogen, but almost all results below are valid also for another phenomenologically interesting case, namely muonic helium) have two main special features as compared with the ordinary electronic hydrogenlike atoms, both of which are connected with the fact that the muon is about 200 times heavier than the electron. In the sections on light muonic atoms, $m$ is the muon mass, $M$ is the proton mass, and $m_e$ is the electron mass. First, the role of the radiative corrections generated by the
Fig. 49. Energy levels in muonic hydrogen.
closed electron loops is greatly enhanced, and second, the leading proton size contribution becomes the second largest individual contribution to the energy shifts after the polarization correction. The reason for an enhanced contribution of the radiatively corrected Coulomb potential in Fig. 9 may be easily explained. The characteristic distance at which the Coulomb potential is distorted by the polarization insertion is determined by the electron Compton length $1/m_e$, and in the case of electronic hydrogen it is about 137 times less than the average distance between the atomic electron and the Coulomb source $1/(m_eZ\alpha)$. This is the reason why even the leading polarization contribution to the Lamb shift in Eq. (31) is so small for ordinary hydrogen. The situation with muonic hydrogen is completely different. This time the average radius of the muon orbit is about $r_{at}\approx 1/(mZ\alpha)$ and is of order of the electron Compton length $r_C\approx 1/m_e$; the respective ratio is about $r_{at}/r_C\approx m_e/(mZ\alpha)\approx 0.7$, and the muon spends a significant part of its life inside the region of the strongly distorted Coulomb potential. Qualitatively one can say that the muon penetrates deep into the screening polarization cloud of the Coulomb center, and sees a larger unscreened charge. As a result the binding becomes stronger, and for example the 2S-level in muonic hydrogen in Fig. 49 turns out to be lower than the 2P-level [197], unlike the case of ordinary hydrogen where the order of levels is just the opposite. In this situation the polarization correction becomes by far the largest contribution to the Lamb shift in muonic hydrogen. The relative contribution of the leading proton size contribution to the Lamb shift interval in electronic hydrogen is about $10^{-4}$. It is determined mainly by the ratio of the proton size contribution to the leading logarithmically enhanced Dirac form factor slope contribution in Eq. (40) (which is much larger than the polarization contribution for electronic hydrogen).
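The scale comparison in this paragraph reduces to one line of arithmetic, sketched below. The muon-to-electron mass ratio is an assumed standard value and is not quoted in the text; reduced-mass effects are ignored, as in the order-of-magnitude estimate above.

```python
ALPHA = 1 / 137.035999            # fine structure constant (assumed value)
MUON_ELECTRON_MASS_RATIO = 206.768  # m / m_e (assumed value)


def orbit_to_compton_ratio(mass_ratio, z=1):
    """r_at / r_C = m_e / (m Z alpha) for a bound particle of mass m,
    with mass_ratio = m / m_e."""
    return 1 / (mass_ratio * z * ALPHA)


electronic = orbit_to_compton_ratio(1.0)
muonic = orbit_to_compton_ratio(MUON_ELECTRON_MASS_RATIO)

if __name__ == "__main__":
    print("electronic hydrogen:", round(electronic, 1))
    print("muonic hydrogen    :", round(muonic, 2))
```

In electronic hydrogen the orbit is about 137 electron Compton lengths away from the source, whereas in muonic hydrogen the ratio drops below one, so the muon orbit lies inside the region where the vacuum polarization distortion of the Coulomb potential is large.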
The relatively larger role of the leading proton size contribution in muonic hydrogen may also be easily understood qualitatively. Technically the leading proton radius contribution in Eq. (158) is of order $(Z\alpha)^4 m^3\langle r^2\rangle$, where $m$ is the mass of the light particle, electron or muon in the case of ordinary and muonic hydrogen, respectively. We thus see that the relative weight of the leading proton charge
contribution to the Lamb shift, in comparison with the standard nonrecoil contributions, is enhanced in muonic hydrogen by the factor $(m/m_e)^2$ in comparison with the relative weight of the leading proton charge contribution in ordinary hydrogen, and it becomes larger than all other standard nonrecoil and recoil contributions. Overall the weight of the leading proton radius contribution in the total Lamb shift in muonic hydrogen is determined by the ratio of the proton size contribution to the leading electron polarization contribution. In electronic hydrogen the ratio of the proton radius contribution and the leading polarization contribution is about $5\times10^{-3}$, and is much larger than the weight of the proton charge radius contribution in the total Lamb shift. In muonic hydrogen this ratio is $2\times10^{-2}$, four times larger than the ratio of the leading proton size contribution and the leading polarization correction in electronic hydrogen. Both the leading proton size correction and the leading vacuum polarization contribution are parametrically enhanced in muonic hydrogen, and the extra factor of four in their ratio is due to an additional accidental numerical enhancement. Below we will discuss corrections to the Lamb shift in muonic hydrogen, with an emphasis on the classic 2P–2S Lamb shift, having in mind the experiment on measurement of this interval which is now under way [198] (see also Section 16.1.10). Being interested in theory, we will consider even those corrections to the Lamb shift which are an order of magnitude smaller than the expected experimental precision of 0.008 meV. Such corrections could become phenomenologically relevant for muonic hydrogen in the future. Another reason to consider these small corrections is that many of them scale as powers of the parameter $Z$, and produce larger contributions for atoms with higher $Z$. Hence, even being too small for hydrogen, they become phenomenologically relevant for muonic helium, where $Z=2$.
9.1.
Closed electron-loop contributions of order $\alpha^n(Z\alpha)^2 m$
9.1.1. Diagrams with one external Coulomb line
9.1.1.1. Leading polarization contribution of order $\alpha(Z\alpha)^2 m$. The effects connected with the electron vacuum polarization contributions in muonic atoms were first quantitatively discussed in [199]. In electronic hydrogen the polarization loops of other leptons and hadrons considered in Section 4.2.5 played a relatively minor role, because they were additionally suppressed by the typical factors $(m_e/m)^2$. In the case of muonic hydrogen we have to deal with the polarization loops of the light electron, which are not suppressed at all. Moreover, the characteristic exchange momenta $mZ\alpha$ in muonic atoms are not small in comparison with the electron mass $m_e$, which determines the momentum scale of the polarization insertions ($m(Z\alpha)/m_e \approx 1.5$). We see that even in the simplest case the polarization loops cannot be expanded in the exchange momenta, and the radiative corrections in muonic atoms induced by the electron loops should be calculated exactly in the parameter $m(Z\alpha)/m_e$. The electron polarization insertion in the photon propagator in Fig. 9 induces a correction to the Coulomb potential, which may be easily written in the form [20]
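$$\delta V(r) = -\frac{Z\alpha}{r}\,\frac{2\alpha}{3\pi}\int_1^\infty d\zeta\, e^{-2m_e\zeta r}\left(1+\frac{1}{2\zeta^2}\right)\frac{\sqrt{\zeta^2-1}}{\zeta^2} .$$ (201)

The respective correction to the energy levels is the expectation value of this potential, $\Delta E_{nl} = \int_0^\infty dr\, r^2 R_{nl}^2(r)\,\delta V(r)$, where

$$R_{nl}(r) = 2\left(\frac{m_r Z\alpha}{n}\right)^{3/2}\sqrt{\frac{(n-l-1)!}{n[(n+l)!]^3}}\left(\frac{2m_r Z\alpha\, r}{n}\right)^{l} e^{-m_r Z\alpha r/n}\, L^{2l+1}_{n-l-1}\!\left(\frac{2m_r Z\alpha\, r}{n}\right)$$ (202)

is the radial part of the Schrödinger–Coulomb wave function in Eq. (1) (but now it depends on the reduced mass $m_r$), and $L^{2l+1}_{n-l-1}$ is the associated Laguerre polynomial, defined as in [109,200]

$$L^{2l+1}_{n-l-1}(x) = \sum_{i=0}^{n-l-1}\frac{(-1)^i[(n+l)!]^2}{i!\,(n-l-i-1)!\,(2l+i+1)!}\,x^i .$$ (203)

The radial wave functions depend on the radius only via the combination $\rho = r m_r Z\alpha$, and it is convenient to write them explicitly as functions of this dimensionless variable

$$R_{nl}(r) = 2\left(\frac{m_r Z\alpha}{n}\right)^{3/2} f_{nl}\!\left(\frac{\rho}{n}\right),$$ (204)

where

$$f_{nl}\!\left(\frac{\rho}{n}\right) \equiv \sqrt{\frac{(n-l-1)!}{n[(n+l)!]^3}}\left(\frac{2\rho}{n}\right)^{l} e^{-\rho/n}\, L^{2l+1}_{n-l-1}\!\left(\frac{2\rho}{n}\right).$$ (205)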
The explicit dependence of the leading polarization correction on the parameters becomes more transparent after transition to the dimensionless integration variable $\rho$ [199]

$$\Delta E_{nl} = -\frac{8\alpha(Z\alpha)^2}{3\pi n^3}\,Q_{nl}(\beta)\,m_r ,$$ (206)

where

$$Q_{nl}(\beta) \equiv \int_0^\infty \rho\, d\rho \int_1^\infty d\zeta\, f_{nl}^2\!\left(\frac{\rho}{n}\right) e^{-2\rho\zeta\beta}\left(1+\frac{1}{2\zeta^2}\right)\frac{\sqrt{\zeta^2-1}}{\zeta^2} ,$$ (207)

and $\beta = m_e/(m_r Z\alpha)$. The integral $Q_{nl}(\beta)$ may easily be calculated numerically for arbitrary $n$. It was calculated analytically for the lower levels $n = 1, 2, 3$ in [201,202], and later these results were confirmed numerically in [203]. Analytic results for all states with $n = l + 1$ were obtained in [204].

The leading electron vacuum polarization contribution to the Lamb shift in muonic hydrogen in Eq. (207) is of order $\alpha(Z\alpha)^2 m$. Recall that the leading vacuum polarization contribution to the Lamb shift in electronic hydrogen in Eq. (32) is of order $\alpha(Z\alpha)^4 m$. Thus, the relative magnitude of the leading polarization correction in muonic hydrogen is enhanced by the factor $1/(Z\alpha)^2 \sim (m/m_e)^2$. This means that the electronic vacuum polarization gives by far the largest contribution to the Lamb shift in muonic hydrogen. The magnitude of the energy shift in Eq. (206) is determined also by the dimensionless integral $Q_{nl}(\beta)$. At the physical value of $\beta = m_e/(m_r Z\alpha) \approx 0.7$ this integral is small ($Q_{10}(\beta) \approx 0.061$, $Q_{20}(\beta) \approx 0.056$, $Q_{21}(\beta) \approx 0.0037$) and suppresses somewhat the leading electron polarization contribution.

The expression for $Q_{nl}(\beta)$ in Eq. (207) is valid for any $\beta$; in particular, we can consider the case when $m = m_e$. Then $\beta = m/(m_r Z\alpha)$

Fig. 50. Two-loop polarization insertions in the Coulomb photon.
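The numerical evaluation of $Q_{nl}(\beta)$ mentioned above can be sketched in a few lines. The following is only an illustrative sketch, not the method of [201–204]: it builds $f_{nl}$ from Eqs. (203) and (205), does the $\rho$ integral of Eq. (207) analytically term by term, and integrates over $\zeta$ with a midpoint rule after the substitution $\zeta = 1/\sin\theta$. The mass values, the value of $\alpha$, and all function names are assumptions introduced here; they are not quoted in the text.

```python
import math

# Assumed inputs (CODATA-style values; they are not quoted in the text).
ALPHA = 1 / 137.035999
M_E, M_MU, M_P = 0.51099895, 105.6583755, 938.27208816  # masses in MeV


def laguerre_coeffs(n, l):
    """Coefficients c_i of x**i in L^{2l+1}_{n-l-1}(x), following Eq. (203)."""
    top = math.factorial(n + l) ** 2
    k = n - l - 1
    return [
        (-1) ** i * top
        / (math.factorial(i) * math.factorial(k - i) * math.factorial(2 * l + i + 1))
        for i in range(k + 1)
    ]


def rho_integral(n, l, zeta, beta):
    """Analytic integral over rho of rho * f_nl(rho/n)**2 * exp(-2*rho*zeta*beta)."""
    c = laguerre_coeffs(n, l)
    norm2 = math.factorial(n - l - 1) / (n * math.factorial(n + l) ** 3)
    a = 2.0 / n + 2.0 * zeta * beta  # total exponential slope in rho
    total = 0.0
    for i, ci in enumerate(c):
        for j, cj in enumerate(c):
            p = 2 * l + i + j  # power of (2*rho/n) in this term of f_nl**2
            total += ci * cj * (2.0 / n) ** p * math.factorial(p + 1) / a ** (p + 2)
    return norm2 * total


def q_nl(n, l, beta, steps=20000):
    """Midpoint-rule estimate of Q_nl(beta), Eq. (207), with zeta = 1/sin(theta)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        s, c = math.sin(theta), math.cos(theta)
        weight = (1 + 0.5 * s * s) * c * c / s  # measure and spectral weight combined
        total += rho_integral(n, l, 1.0 / s, beta) * weight
    return total * h


m_r = M_MU * M_P / (M_MU + M_P)  # reduced mass of muonic hydrogen, MeV
beta = M_E / (m_r * ALPHA)       # Z = 1


def vp_shift_mev(n, l):
    """Leading polarization shift of Eq. (206) in meV for Z = 1."""
    return -8 * ALPHA**3 / (3 * math.pi * n**3) * q_nl(n, l, beta) * m_r * 1e9


q10, q20, q21 = q_nl(1, 0, beta), q_nl(2, 0, beta), q_nl(2, 1, beta)
interval_2p_2s = vp_shift_mev(2, 1) - vp_shift_mev(2, 0)
print(f"beta = {beta:.4f}")
print(f"Q10 = {q10:.4f}, Q20 = {q20:.4f}, Q21 = {q21:.5f}")
print(f"leading 2P-2S polarization interval: {interval_2p_2s:.2f} meV")
```

Under these assumptions the script should give values of $Q_{nl}$ close to the ones quoted in the text, and a leading polarization contribution to the 2P–2S interval of roughly 205 meV.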
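$$V_{VP}(2m_e\zeta) = \frac{Z\alpha}{2}\left(\frac{1}{m^2}+\frac{1}{M^2}\right)\left(\pi\delta(\mathbf r) - \frac{m_e^2\zeta^2}{r}\,e^{-2m_e\zeta r}\right) - \frac{Z\alpha\, m_e^2\zeta^2}{mMr}\,e^{-2m_e\zeta r}(1-m_e\zeta r)$$
$$\quad - \frac{Z\alpha}{2mM}\,p^i\,\frac{e^{-2m_e\zeta r}}{r}\left(\delta_{ij}+\frac{r_i r_j}{r^2}(1+2m_e\zeta r)\right)p^j + \frac{Z\alpha}{r^3}\left(\frac{1}{4m^2}+\frac{1}{2mM}\right)e^{-2m_e\zeta r}(1+2m_e\zeta r)\,[\mathbf r\times\mathbf p]\cdot\boldsymbol\sigma .$$ (219)

Fig. 55. Relativistic corrections to the leading electron polarization contribution.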
Then the analogue of the Breit potential induced by the electron vacuum polarization insertion is given by the integral

$$\mathcal{V}_{VP} = \frac{2\alpha}{3\pi}\int_1^\infty d\zeta\,\frac{\sqrt{\zeta^2-1}}{\zeta^2}\left(1+\frac{1}{2\zeta^2}\right)V_{VP}(2m_e\zeta) .$$ (220)

Calculation of the leading recoil corrections of order $\alpha(Z\alpha)^4$ now becomes almost trivial. One has to take into account that in our approximation the analogue of the Breit Hamiltonian in Eq. (36) has the form [211]

$$H = \frac{p^2}{2m} + \frac{p^2}{2M} - \frac{Z\alpha}{r} + V_{Br} + \ldots$$ (221)

where $V_{Br}$ was defined in Eq. (35). Then the leading relativistic corrections of order $\alpha(Z\alpha)^4$ may be easily obtained as a sum of the first- and second-order perturbation theory contributions corresponding to the diagrams in Fig. 55 [211] $\Delta E = \langle \mathcal{V}_{VP}\rangle + \ldots$
1 p D Za a(Za) ! (f!1h(f!1)# dx(f!xf (x) , df e\KC DP d 0.0594
4a(Za) m Q5)(b) LJ pn
!0.0010
1* ion [306], and also for HFS in the 2P state [307] (see also the review in [308]).
14.2. Radiative corrections to nuclear size and recoil effects
14.2.1. Radiative-recoil corrections of order $\alpha(Z\alpha)(m/\Lambda)E_F$
Diagrams for the radiative corrections to the Zemach contribution in Figs. 103 and 104 are obtained from the diagrams in Figs. 97 and 98 by insertions of the radiative photons in the electron line or of the polarization operator in the external photon legs. Analytic expressions for the nuclear size corrections of order $\alpha(Z\alpha)E_F$ are obtained from the integral for the Zemach correction in Eq. (346) by insertions of the electron factor or the one-loop polarization operator in the integrand in Eq. (346). Effective integration momenta in Eq. (346) are determined by the scale of the proton form factor, and so we need only the leading terms in the high-momentum expansion of the polarization operator and the electron factor for calculation of the radiative corrections to the Zemach correction. The leading term in the high-momentum asymptotic expansion of the electron factor is simply a constant (see the text above Eq. (326)), and the correction to hyperfine splitting is the product of this constant and the Zemach correction [290]

$$\Delta E = \frac{5}{2}\,\frac{\alpha(Z\alpha)}{\pi}\, m\langle r\rangle_{(2)} E_F .$$ (366)

Fig. 103. Electron-line radiative correction to the Zemach contribution.

Fig. 104. Photon-line radiative correction to the Zemach contribution.

The contribution of the polarization operator is logarithmically enhanced due to the logarithmic asymptotics of the polarization operator. This logarithmically enhanced contribution is equal to the doubled product of the Zemach correction and the leading term in the polarization operator expansion (an extra factor of two is necessary to take into account the two ways to insert the polarization operator in the external photon legs in Figs. 97 and 98)

$$\Delta E = -\frac{4}{3}\,\frac{\alpha(Z\alpha)}{\pi}\,\ln\frac{\Lambda^2}{m^2}\; m\langle r\rangle_{(2)} E_F .$$ (367)

Calculation of the nonlogarithmic part of the polarization operator insertion requires more detailed information on the proton form factors, and using the dipole parametrization one obtains [290]

$$\Delta E = -\frac{4}{3}\,\frac{\alpha(Z\alpha)}{\pi}\left(\ln\frac{\Lambda^2}{m^2}-\frac{317}{105}\right) m\langle r\rangle_{(2)} E_F .$$ (368)
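As a rough numerical cross-check, Eqs. (366) and (368) can be evaluated from the Zemach correction itself. This is only a sketch under stated assumptions: the value of $m\langle r\rangle_{(2)}$ is extracted from the Zemach entry of Table 19 ($-2(Z\alpha)m\langle r\rangle_{(2)} = -42.4\times10^{-6}$), and the dipole scale $\Lambda^2 = 0.71\ \mathrm{GeV}^2$ and the electron mass are assumed standard inputs that are not quoted in this section.

```python
import math

ALPHA = 1 / 137.03599958   # fine structure constant as quoted in Eq. (385)
E_F_KHZ = 1_418_840.11     # Fermi energy from Table 19, kHz
ZEMACH_REL = -42.4e-6      # Zemach correction in units of E_F (Table 19)
M_E_GEV = 0.51099895e-3    # assumed electron mass, GeV
LAMBDA2_GEV2 = 0.71        # assumed dipole form-factor scale, GeV^2

# m<r>_(2) follows from the Zemach correction -2 (Z alpha) m <r>_(2) with Z = 1.
m_r_zemach = -ZEMACH_REL / (2 * ALPHA)

# Eq. (366): electron-line correction in units of E_F.
electron_line = 2.5 * ALPHA**2 / math.pi * m_r_zemach

# Eq. (368): photon-line correction in units of E_F.
log_term = math.log(LAMBDA2_GEV2 / M_E_GEV**2)
photon_line = -(4 / 3) * ALPHA**2 / math.pi * (log_term - 317 / 105) * m_r_zemach

print(f"electron line: {electron_line:.3e} E_F = {electron_line * E_F_KHZ:.2f} kHz")
print(f"photon line:   {photon_line:.3e} E_F = {photon_line * E_F_KHZ:.2f} kHz")
```

With these inputs the two corrections should come out close to the $0.12\times10^{-6}$ and $-0.77\times10^{-6}$ entries listed in Table 19.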
14.2.2. Radiative-recoil corrections of order $\alpha(Z\alpha)(m/M)E_F$
Radiative-recoil corrections of order $\alpha(Z\alpha)(m/M)E_F$ are similar to the radiative corrections to the Zemach contribution, and in principle admit a straightforward calculation in the framework of the skeleton integral approach. Leading logarithmic contributions of this order were considered in [289,290]. The logarithmic estimate in [290] gives

$$\Delta E = 0.11\,(2)\times10^{-6}\,E_F$$ (369)

for the contribution of the electron-line radiative insertions, and

$$\Delta E = -0.02\times10^{-6}\,E_F$$ (370)

for the contribution of the vacuum polarization insertions in the exchanged photons. Numerically these contributions are much smaller than the uncertainty of the Zemach correction.
14.2.3. Heavy particle polarization contributions
Muon and heavy particle polarization contributions to hyperfine splitting in muonium were considered in Sections 11.3.1 and 12.2.1.6. In the external field approximation the skeleton integral with the muon polarization insertion coincides with the respective integral for muonium (compare Eq. (283) and the discussion after this equation) and one easily obtains [61]

$$\Delta E = \frac{3}{4}\,\alpha(Z\alpha)\,\frac{m}{m_\mu}\,E_F .$$ (371)

This result gives a good idea of the magnitude of the muon polarization contribution, since the muon is relatively light in comparison with the scale of the proton form factor, which was ignored in this calculation. The total muon polarization contribution may be calculated without great effort, but due to its small magnitude such a calculation is of minor phenomenological significance and was never done. Only an estimate of the total muon polarization contribution exists in the literature [290]

$$\Delta E = 0.07\,(2)\times10^{-6}\,E_F .$$ (372)

Hadronic vacuum polarization in the external field approximation for the pointlike proton was also calculated in [61]. Such a calculation may serve only as an order of magnitude estimate, since neither the external field approximation nor the neglect of the proton form factor is justified in this case: the scale of the hadron polarization contribution is determined by the same $\rho$-meson mass which determines the scale of the proton form factor. Again, a more accurate calculation is feasible but does not seem to be warranted, and only an estimate of the hadronic polarization contribution appears in the literature [290]

$$\Delta E = 0.03\,(1)\times10^{-6}\,E_F .$$ (373)

14.3. Weak interaction contribution
The weak interaction contribution to hyperfine splitting in hydrogen is easily obtained by generalization of the muonium result in Eq. (343)

$$\Delta E = \frac{g_A}{1+\kappa}\,\frac{G_F}{\sqrt2}\,\frac{3mM}{4\pi Z\alpha}\,E_F \approx 5.8\times10^{-8}\,E_F \approx 0.08\ \mathrm{kHz} .$$ (374)

Two features of this result deserve some comment. First, the axial coupling constant for the composite proton is renormalized by the strong interactions and its experimental value is $g_A = 1.267$, unlike the case of the elementary muon, where it is equal to unity. Second, the signs of the weak interaction correction are different in the case of muonium and hydrogen [196].
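The size of the weak interaction term in Eq. (374) is easy to check numerically. The sketch below assumes standard values for the Fermi constant, the particle masses, and the proton anomalous magnetic moment; none of these are quoted in this section, and only $g_A = 1.267$ and $E_F$ (Table 19) are taken from the text.

```python
import math

ALPHA = 1 / 137.03599958  # fine structure constant as quoted in Eq. (385)
G_A = 1.267               # axial coupling constant quoted in the text
E_F_KHZ = 1_418_840.11    # Fermi energy from Table 19, kHz
G_F = 1.1663787e-5        # assumed Fermi constant, GeV^-2
M_E = 0.51099895e-3       # assumed electron mass, GeV
M_P = 0.93827208816       # assumed proton mass, GeV
KAPPA = 1.792847          # assumed proton anomalous magnetic moment

# Dimensionless factor multiplying E_F in Eq. (374), for Z = 1.
weak_rel = (G_A / (1 + KAPPA)) * (G_F / math.sqrt(2)) * 3 * M_E * M_P / (4 * math.pi * ALPHA)
weak_khz = weak_rel * E_F_KHZ

print(f"weak contribution: {weak_rel:.2e} E_F = {weak_khz:.3f} kHz")
```

Under these assumptions the factor comes out near $5.9\times10^{-8}$, that is about 0.08 kHz, in line with the weak interaction entry of Table 19.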
15. Hyperfine splitting in muonic hydrogen
We have considered level shifts in muonic hydrogen in Section 9 neglecting hyperfine structure. However, future measurements (see discussion in Section 16.1.10) will be done on the components of hyperfine structure, and knowledge of this hyperfine structure is crucial for comparison of the theoretical predictions for the Lamb shift in muonic hydrogen with the experimental data. We will consider below hyperfine structure in the states 2S and 2P.
15.1. Hyperfine structure of the 2S state
Due to the enhancement of the light electron loops in muonic hydrogen, they produce the largest contribution to the Lamb shift in muonic hydrogen (see Section 4.3). Unlike the Lamb shift, where the leading contribution is a radiative (loop) correction, the leading contribution to hyperfine splitting already exists at the tree level (see discussion in Section 4.4). Hence, the Fermi contribution in Eq. (271) (with the natural substitution of the heavy particle mass and anomalous magnetic moment instead of the respective muon characteristics) remains by far the largest contribution to HFS in muonic atoms. The leading electron vacuum polarization contribution to HFS, generated by the exchange of one photon with a polarization insertion in Fig. 105, is enhanced in muonic atoms and becomes the next largest individual contribution to HFS after the Fermi contribution. The reason for this enhancement is the same as for the respective enhancement in the case of the Lamb shift: electron vacuum polarization distorts the field of an external source at distances of about $1/m_e$, and for muonic atoms the wave function is concentrated in a region of comparable size determined by the Bohr radius $1/(mZ\alpha)$.
Hence, the e!ect of the electronic vacuum polarization on the HFS is much stronger in muonic atoms than in the electronic atoms where the region where the external potential is distorted by the vacuum potential is negligible in comparison with the e!ective radius of the wave function. As a result the electron vacuum polarization contribution to HFS in light muonic atoms is of order aE [309], to be compared with the leading polarization contribu$ tion in electronic hydrogen of order a(Za)E in Eq. (283). Thus, in order to translate the hyper"ne $ results for ordinary hydrogen into the results for muonic hydrogen we have to consider additional contributions which are due to the polarization insertions. Notice "rst of all that the nonrecoil results in Tables 12}17 may be directly used in the case of muonic hydrogen. Since we are interested in the contribution to HFS in 2S state we should properly restore the dependence on the principal quantum number n, which was sometimes omitted above. This dependence is known for the results in Table 12 and all corrections in Tables 13 and 14 are state independent. For the corrections in Tables 15}17 only the leading logarithmic contributions are state independent, and the exact n dependence of the other terms is often unknown. However, all corrections in these Tables are of order 1;10\ meV and are too small
Fig. 105. Electron vacuum polarization insertion in the exchanged photon.
Fig. 106. Leading electron vacuum polarization correction to HFS in muonic hydrogen.
for phenomenological goals (see discussion in Section 16.1.10). The sum of all corrections in Tables 12–17 for the 2S state is equal to

$$\Delta E = 22.8322\ \mathrm{meV} .$$ (375)

Vacuum polarization corrections of order $\alpha E_F$ to HFS may be easily calculated with the help of the spin–spin term in the Breit-like potential corresponding to an exchange of a radiatively corrected photon in Fig. 105. This spin–spin potential $\mathcal{V}_{VP}$ can be obtained by substituting the spin–spin term corresponding to a massive exchange [211]

$$V_{VP}(2m_e\zeta) = \frac{8}{3}\,\frac{Z\alpha}{mM}\,(1+a_\mu)(1+\kappa)\,(\mathbf s_\mu\cdot\mathbf s_p)\left(\pi\delta(\mathbf r) - \frac{e^{-2m_e\zeta r}}{4r^3}\,(2m_e\zeta r)^2\right)$$ (376)

in the integral in Eq. (220) instead of $V_{VP}(2m_e\zeta)$. The correction to HFS is then given by a sum of the first- and second-order perturbation theory contributions, similar to the respective contribution to the Lamb shift in Eq. (222) (see Fig. 106) $\Delta E = \langle \mathcal{V}_{VP}\rangle + \ldots$
(379)
As was discussed in Sections 14.1.1.1 and 14.1.1.2 this contribution depends on the dipole parametrization of the proton form factors, the value of the proton radius, and can probably be improved as a result of dedicated analysis.
Proton polarizability contributions of order $(Z\alpha)E_F$ discussed for electronic hydrogen in Section 14.1.1.3 are notoriously difficult to evaluate. Comparing the results for the upper boundary for the inelastic contribution in Eq. (362) and the elastic contribution from [289] discussed in Section 14.1.1.2, we see that the polarizability contribution is at the level of 10% of the elastic contribution in electronic hydrogen. As a conservative estimate it was suggested in [211] to assume that the same estimate is valid for muonic hydrogen. This assumption means that the polarizability contribution in muonic hydrogen does not exceed 0.15 meV. Collecting all contributions above we obtain the total HFS splitting in the 2S state in muonic hydrogen [211] (Table 19)

$$\Delta E = 22.745\,(15)\ \mathrm{meV} .$$ (380)

As was discussed at the end of Section 14.1.1.2, we expect that the uncertainty of this result, determined by the unknown polarizability contribution, can be reduced as a result of a new analysis.
15.2. Fine and hyperfine structure of the 2P states
The main contribution to the fine and hyperfine structure of the 2P states is described by the spin–orbit and spin–spin terms in the Breit Hamiltonian in Eq. (35) (spin–spin terms were omitted in Eq. (35)). For a proper description of the fine and hyperfine structure we have to include in the Breit potential the anomalous magnetic moments of both constituents and restore all terms which depend on the heavy particle spin. Then the relevant terms in the Breit potential have the form

$$V_{Br} = \frac{Z\alpha}{r^3}\left(\frac{1+2a_\mu}{2m^2}+\frac{1+a_\mu}{mM}\right)(\mathbf L\cdot\mathbf s_\mu) + \frac{Z\alpha}{r^3}\left(\frac{1+\kappa}{mM}+\frac{1+2\kappa}{2M^2}\right)(\mathbf L\cdot\mathbf s_p)$$
$$\quad - \frac{Z\alpha}{r^3}\,\frac{(1+\kappa)(1+a_\mu)}{mM}\left((\mathbf s_\mu\cdot\mathbf s_p) - 3\,\frac{(\mathbf s_\mu\cdot\mathbf r)(\mathbf s_p\cdot\mathbf r)}{r^2}\right).$$ (381)

The first term in this potential describes the spin–orbit interaction of the light particle, and its matrix element determines the fine structure splitting between the $2P_{1/2}$ and $2P_{3/2}$ states. The two other terms depend on the heavy particle spin. It is easy to see that these terms mix the states with the same total angular momentum $\mathbf F = \mathbf J + \mathbf s_p$ and different $J$ [310]; in our case these are the states $2P_{1/2}$ and $2P_{3/2}$ (see Fig. 49). Thus, to find the fine and hyperfine structure of the 2P states we have to solve an elementary quantum-mechanical problem of diagonalizing a simple four by four Hamiltonian, where only a two by two submatrix is nondiagonal. Before solving this problem we have to consider whether there are any other contributions to the Hamiltonian besides the terms in the Breit potential in Eq. (381). It is easy to realize that the only other contribution to the effective potential is given by the radiatively corrected one-photon exchange. The respective Breit-like potential may be obtained exactly as in Eq. (220) by integrating the spin–orbit potential corresponding to the
Table 19
Hyperfine splitting in hydrogen. $E_F = 1\,418\,840.11\ (3)\ (1)$ kHz

Contribution | Relative value (units of $E_F$) | kHz
Total nonrecoil contribution, Tables 12–15 | 1.0011360896 (19) | 1 420 452.04 (3) (1)
Proton size correction, relative order $(Z\alpha)(m/\Lambda)$, Zemach [166] | $-2(Z\alpha)m\langle r\rangle_{(2)} = -42.4\,(1.1)\times10^{-6}$ | −60.2 (1.6)
Recoil correction, relative order $(Z\alpha)(m/M)$, Arnowitt [258], Newcomb and Salpeter [259], Iddings and Platzman [292] | $5.22\,(1)\times10^{-6}$ | 7.41 (2)
Recoil correction, relative order $(Z\alpha)^2(m/M)$, Bodwin and Yennie [289] | $\frac{(Z\alpha)^2}{1+\kappa}\frac{m_r^2}{mM}\left\{\left[2(1+\kappa)+\frac{7\kappa^2}{4}\right]\ln(Z\alpha)^{-1} - \left[8(1+\kappa)-\frac{\kappa(12-11\kappa)}{4}\right]\ln 2 + \frac{65}{18} + \frac{\kappa(11+31\kappa)}{36}\right\} = 0.4585\times10^{-6}$ | 0.65
Leading logarithmic correction, relative order $(Z\alpha)^2 m^2 r_p^2$, Karshenboim [290] | $-\frac{2}{3}(Z\alpha)^2\ln(Z\alpha)^{-2}\,m^2\langle r^2\rangle = -0.002\times10^{-6}$ | −0.002
Leading logarithmic correction, relative order $(Z\alpha)^3(m/\Lambda)$, Karshenboim [290] | $-(Z\alpha)^3\ln(Z\alpha)^{-2}\,m\langle r\rangle_{(2)} = -0.01\times10^{-6}$ | −0.016
Electron-line correction, relative order $\alpha(Z\alpha)(m/\Lambda)$, Karshenboim [290] | $\frac{5}{2}\frac{\alpha(Z\alpha)}{\pi}\,m\langle r\rangle_{(2)} = 0.12\times10^{-6}$ | 0.17
Photon-line correction, relative order $\alpha(Z\alpha)(m/\Lambda)$, Karshenboim [290] | $-\frac{4}{3}\frac{\alpha(Z\alpha)}{\pi}\left(\ln\frac{\Lambda^2}{m^2}-\frac{317}{105}\right)m\langle r\rangle_{(2)} = -0.77\times10^{-6}$ | −1.10
Leading electron-line correction, relative order $\alpha(Z\alpha)(m/M)$, Karshenboim [290] | $0.11\,(2)\times10^{-6}$ | 0.16
Leading photon-line correction, relative order $\alpha(Z\alpha)(m/M)$, Karshenboim [290] | $-0.02\times10^{-6}$ | −0.03
Muon vacuum polarization, relative order $\alpha(Z\alpha)(m/m_\mu)$, Karshenboim [290,61] | $0.07\,(2)\times10^{-6}$ | 0.10 (3)
Hadron vacuum polarization, Karshenboim [290,61] | $0.03\,(1)\times10^{-6}$ | 0.04 (1)
Weak interaction contribution, Beg and Feinberg [282] | $\frac{g_A}{1+\kappa}\frac{G_F}{\sqrt2}\frac{3mM}{4\pi Z\alpha} = 0.06\times10^{-6}$ | 0.08
Total theoretical HFS | | 1 420 399.3 (1.6)
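exchange of a particle with mass $\sqrt{t} = 2m_e\zeta$

$$V_{VP}(2m_e\zeta) = \frac{Z\alpha}{r^3}\left(\frac{1+2a_\mu}{2m^2}+\frac{1+a_\mu}{mM}\right)e^{-2m_e\zeta r}(1+2m_e\zeta r)(\mathbf L\cdot\mathbf s_\mu) + \frac{Z\alpha}{r^3}\left(\frac{1+\kappa}{mM}+\frac{1+2\kappa}{2M^2}\right)e^{-2m_e\zeta r}(1+2m_e\zeta r)(\mathbf L\cdot\mathbf s_p)$$
$$\quad - \frac{Z\alpha}{r^3}\,\frac{(1+\kappa)(1+a_\mu)}{mM}\left\{\left[(\mathbf s_\mu\cdot\mathbf s_p) - 3\,\frac{(\mathbf s_\mu\cdot\mathbf r)(\mathbf s_p\cdot\mathbf r)}{r^2}\right](1+2m_e\zeta r) + \left[(\mathbf s_\mu\cdot\mathbf s_p) - \frac{(\mathbf s_\mu\cdot\mathbf r)(\mathbf s_p\cdot\mathbf r)}{r^2}\right](2m_e\zeta r)^2\right\}e^{-2m_e\zeta r} .$$ (382)

All that is left to obtain the fine and hyperfine structure of the 2P states is to diagonalize the four by four Hamiltonian with the interaction which is the sum of the Breit potential in Eq. (381) and the respective integral of the potential density in Eq. (382). This problem was solved in [211], where it was obtained

$$\Delta E(2P_{1/2}) = 7.963\ \mathrm{meV}, \qquad \Delta E(2P_{3/2}) = 3.393\ \mathrm{meV} .$$ (383)

Due to mixing of the states $2P_{1/2}$ and $2P_{3/2}$ (see Fig. 49) they are additionally shifted by $\Delta = 0.145$ meV.
16. Comparison of theory and experiment
In the numerical calculations below we will use the most precise modern values of the fundamental physical constants. The value of the Rydberg constant is [311]

$$R_\infty = 10\,973\,731.568516\ (84)\ \mathrm{m}^{-1}, \qquad \delta = 7.7\times10^{-12} ,$$ (384)

the fine structure constant is equal to [312]

$$\alpha^{-1} = 137.03599958\ (52), \qquad \delta = 3.8\times10^{-9} ,$$ (385)

the proton–electron mass ratio is equal to [313]

$$\frac{M}{m} = 1836.1526665\ (40), \qquad \delta = 2.2\times10^{-9} ,$$ (386)

the muon–electron mass ratio is equal to [5]

$$\frac{M}{m} = 206.768277\ (24), \qquad \delta = 1.2\times10^{-7} ,$$ (387)

and the deuteron–proton mass ratio is equal to [314]

$$\frac{M}{m} = 1.9990075013\ (14), \qquad \delta = 7.0\times10^{-10} .$$ (388)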
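The relative uncertainties $\delta$ quoted in Eqs. (384)–(388) follow directly from the concise "value (uncertainty)" notation, in which the digits in parentheses refer to the last digits of the value. A minimal sketch of that arithmetic is given below; the helper name `parse_concise` and the dictionary labels are introduced here purely for illustration.

```python
from decimal import Decimal


def parse_concise(text):
    """Split a string such as '137.03599958 (52)' into (value, uncertainty)."""
    value_part, unc_part = text.split("(")
    value = Decimal(value_part.replace(" ", ""))
    digits = Decimal(unc_part.strip().rstrip(")"))
    # The uncertainty digits refer to the last decimal places of the value.
    return value, digits.scaleb(value.as_tuple().exponent)


constants = {
    "Rydberg constant (1/m)": "10 973 731.568516 (84)",
    "inverse fine structure constant": "137.03599958 (52)",
    "proton-electron mass ratio": "1836.1526665 (40)",
    "muon-electron mass ratio": "206.768277 (24)",
    "deuteron-proton mass ratio": "1.9990075013 (14)",
}

relative = {}
for name, text in constants.items():
    value, unc = parse_concise(text)
    relative[name] = float(unc / value)
    print(f"{name}: relative uncertainty {relative[name]:.1e}")
```

The printed values can be compared with the $\delta$ given next to each constant above.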
16.1. Lamb shifts of the energy levels
From the theoretical point of view the accuracy of calculations is limited by the magnitude of the yet uncalculated contributions to the Lamb shift. Corrections to the P levels are now known with a higher accuracy than the corrections to the S levels, and do not limit the results of the comparison between theory and experiment.
16.1.1. Theoretical accuracy of S-state Lamb shifts
Corrections of order $\alpha^2(Z\alpha)^6$ are the largest uncalculated contributions to the energy levels for S-states. The correction of this order is a polynomial in $\ln(Z\alpha)^{-2}$, starting with the logarithm cubed term. Both the logarithm cubed term and the contribution of the logarithm squared terms to the difference $\Delta E_L(1S) - 8\Delta E_L(2S)$ are known (for more details on these corrections see the discussion in Section 4.4.2). However, the calculation of the respective contributions to the individual energy levels is still missing. Under these circumstances it is reasonable to take one half of the logarithm cubed term (which has roughly the same magnitude as the logarithm squared contribution to the interval $\Delta E_L(1S) - 8\Delta E_L(2S)$) as an estimate of the scale of all yet uncalculated logarithm squared contributions. We thus assume that the uncertainties induced by the uncalculated contributions of order $\alpha^2(Z\alpha)^6$ constitute 14 and 2 kHz for the 1S- and 2S-states, respectively. All other unknown theoretical contributions to the Lamb shift are much smaller, and 14 kHz for the 1S-state and 2 kHz for the 2S-state are reasonable estimates of the total theoretical uncertainty of the expression for the Lamb shift. Theoretical uncertainties for the higher S levels may be obtained from the 1S-state uncertainty by ignoring its state dependence and scaling it with the principal quantum number n.
16.1.2. Theoretical accuracy of P-state Lamb shifts
The Lamb shift theory of P-states is in better shape than the theory of S-states.
The largest unknown corrections to the P-state energies are the single logarithmic contributions of the form a(Za) ln(Za)\m, like the one in Eq. (111), induced by radiative insertions in the electron and external photon lines, and the uncertainty of the nonlogarithmic contributions G of order 1# a(Za)m (see Table 6). One half of the double logarithmic contribution in Eq. (109) can be taken as a fair an estimate of the magnitude of the uncalculated single logarithmic contributions of the form a(Za) ln(Za)\m. An estimate of the theoretical accuracy of the 2P Lamb shift is then about 0.08 kHz. Theoretical uncertainties for the higher non-S levels may be obtained from the 2P-state uncertainty ignoring its state-dependence and scaling it with the principal quantum number n. 16.1.3. Theoretical accuracy of the interval ¸(1S)}8¸(2S) State-independent contributions to the Lamb shift scale as 1/n and vanish in the di!erence E (1S)!8E (2S), which may be calculated more accurately than the positions of the individual * * energy levels (see discussion in Section 4.4.2.3). All main sources of theoretical uncertainty of the individual energy levels, namely, proton charge radius contributions and yet uncalculated state independent corrections to the Lamb shift vanish in this di!erence. This observation plays an important role in extracting the precise value of the 1S Lamb shift from modern highly accurate experimental data (see discussion below in Section 16.1.5). Earlier the practical usefulness of the theoretical value of the interval E (1S)}8E (2S) for extraction of the experimental value of * *
the 1S Lamb shift was impeded by the insufficient theoretical accuracy of this interval and by the insufficient accuracy of the frequency measurement. Significant progress was achieved recently in both respects, especially on the experimental side. On the theoretical side the last relatively large contribution to $E_L(1S) - 8E_L(2S)$, of order $\alpha^2(Z\alpha)^6\ln^2(Z\alpha)^{-2}$, was calculated in [115,116,119,120] (see Eq. (113) above), and the theoretical uncertainty of this interval was reduced to 5 kHz

$$\Delta \equiv E_L(1S) - 8E_L(2S) = -187\,231\ (5)\ \mathrm{kHz} .$$ (389)
16.1.4. Classic Lamb shift $2S_{1/2}$–$2P_{1/2}$
Discovery of the classic Lamb shift, i.e., the splitting of the $2S_{1/2}$ and $2P_{1/2}$ energy levels in hydrogen, triggered a new stage in the development of modern physics. In the terminology accepted in this paper the classic Lamb shift is equal to the difference of the Lamb shifts in the respective states, $\Delta E(2S_{1/2} - 2P_{1/2}) = L(2S_{1/2}) - L(2P_{1/2})$. Unlike the much larger Lamb shift in the 1S state, the classic Lamb shift is directly observable as a small splitting of energy levels which should be degenerate according to Dirac theory. This greatly simplifies comparison between theory and experiment for the classic Lamb shift, since the theoretical predictions are practically independent of the exact value of the Rydberg constant, which can be measured independently.

Many experiments on the precise measurement of the classic Lamb shift have been performed since its experimental discovery in 1947. We have collected the modern post-1979 experimental results in Table 20. Two entries in this table are changed compared to the original published experimental results [315,316]. These alterations reflect recent improvements of the theory used for extraction of the Lamb shift value from the raw experimental data.

The magnitude of the Lamb shift in [315] was derived from the ratio of the $2P_{1/2}$ decay width and the $\Delta E(2S_{1/2} - 2P_{1/2})$ energy splitting, which was directly measured by the atomic-interferometer method. The theoretical expression for the $2P_{1/2}$-state lifetime was used for extraction of the magnitude of the Lamb shift. An additional leading logarithmic correction to the width of the $2P_{1/2}$ state of relative order $\alpha(Z\alpha)^2\ln(Z\alpha)^{-2}$, not taken into account in the original analysis of the experiment, was obtained recently in [120]. This correction slightly changes the original experimental result [315] $\Delta E = 1\,057\,851.4\ (1.9)$ kHz, and we cite the corrected value in Table 20. The magnitude of the new correction [120] triggered a certain discussion in the literature [317,318]. From the phenomenological point of view the new correction [120] is so small that none of our conclusions below about the result in [315] is affected by it.

The Lamb shift value in [316] was obtained from the measurement of the interval $2P_{3/2}$–$2S_{1/2}$, and the value of the classic Lamb shift was extracted by subtraction of this energy splitting from the theoretical value of the fine structure interval $2P_{3/2}$–$2P_{1/2}$. As was first noted in [99], recent progress in the Lamb shift theory for P-states requires reconsideration of the original value $\Delta E = 1\,057\,839\ (12)$ kHz of the classic Lamb shift obtained in [316]. We assume that the total theoretical uncertainty of the fine structure interval is about 0.08 kHz (see the discussion of the accuracy of the P-state Lamb shift above in Section 16.1.2). A comparable contribution of 0.08 kHz to the uncertainty of the fine structure interval originates from the uncertainty of the most precise modern value of the fine structure constant in [312] (see Eq. (385)). Calculating the theoretical value of the fine structure interval we obtain $\Delta E(2P_{3/2} - 2P_{1/2}) = 10\,969\,041.52\ (11)$ kHz, which is different from the value $\Delta E(2P_{3/2} - 2P_{1/2}) = 10\,969\,039.4\ (2)$ kHz used in [316]. As a consequence the original experimental value [316] of the classic Lamb shift changes to the one cited in Table 20.
Table 20
Classic $2S_{1/2}$–$2P_{1/2}$ Lamb shift

Source | $\Delta E$ (kHz) | Type
Newton et al. [319] | 1 057 862 (20) | Experiment
Lundeen and Pipkin [320] | 1 057 845 (9) | Experiment
Palchikov et al. [315] | 1 057 857.6 (2.1) | Experiment
Hagley and Pipkin [316] | 1 057 842 (12) | Experiment
Wijngaarden et al. [321] | 1 057 852 (15) | Experiment
Schwob et al. [311] | 1 057 845 (3) | Exp., [322–325,3,320,316,321]
| 1 057 814 (2) (4) | Theory, $r_p = 0.805\ (11)$ fm [157]
| 1 057 833 (2) (4) | Theory, $r_p = 0.862\ (12)$ fm [158]
Weitz et al. [323] | 1 057 857 (12) | Self-consistent value
Berkeland et al. [324] | 1 057 842 (11) | Self-consistent value
Bourzeix et al. [325] | 1 057 836 (8) | Self-consistent value
Udem et al. [3] | 1 057 848 (5) | Self-consistent value
Schwob et al. [311] | 1 057 841 (5) | Self-consistent value
| 1 057 843 (2) (6) | Theory, $r_p = 0.891\ (18)$ fm
Due to a relatively large uncertainty of the result this change does not alter the conclusions below on comparison of the theory and experiment. Accuracy of the radiofrequency measurements of the classic 2S}2P Lamb shift [319, 320, 315,316,321] is limited by the large (about 100 MHz) natural width of the 2P state, and cannot be signi"cantly improved. New perspectives in reducing the experimental error bars of the classic 2S}2P Lamb shift were opened with the development of the Doppler-free two-photon laser spectroscopy for measurements of the transitions between the energy levels with di!erent principal quantum numbers. Narrow linewidth of such transitions allows very precise measurement of the respective transition frequencies, and indirect accurate determination of 2S}2P splitting from this data. The latest experimental value [311] in the "fth line of Table 20 was obtained by such methods. Both the theoretical and experimental data for the classic 2S }2P Lamb shift are collected in Table 20. Theoretical results for the energy shifts in this table contain errors in the parenthesis where the "rst error is determined by the yet uncalculated contributions to the Lamb shift, discussed above and the second re#ects the experimental uncertainty in the measurement of the proton rms charge radius. An immediate conclusion from the data in Table 20 is that the value of the proton rms radius as measured in [157] is by far too small to accommodate the experimental data on the Lamb shift. Even the larger value of the proton charge radius obtained in [158] is inconsistent with the result of the apparently most precise measurement of the 2S }2P splitting in [315]. The respective discrepancy is more than "ve standard deviations. Results of four other direct measurements of the classic Lamb shift collected in Table 20 are compatible with the theory
See more discussion of this method below in Section 16.1.5. We have used in the calculations the result in Eq. (148) for the radiative-recoil correction of order a(Za). Competing result in Eq. (149) would shift the value of the classic Lamb shift by 0.78 kHz, and would not e!ect our conclusions on the comparison between theory and experiment below.
238
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
if one uses the proton radius from [158]. Unfortunately, these results are rather widely scattered and have rather large experimental errors. Their internal consistency as well as their consistency with theory leaves much to be desired. Taken at face value the experimental results on the 2S }2P splitting indicate of an even larger value of the proton charge radius than measured in [158]. The situation with the experimental values of the proton charge radius is unsatisfactory and a new measurement is clearly warranted. We will return to the numbers in the "ve last lines in Table 20 below. 16.1.5. 1S Lamb shift Unlike the case of the classic Lamb shift above, the Lamb shift in the 1S is not amenable to a direct measurement as a splitting between certain energy levels and in principle could be extracted from the experimental data on the transition frequencies between the energy levels with di!erent principal quantum numbers. Such an approach requires very precise measurement of the gross structure intervals, and became practical only with the recent development of Doppler-free two-photon laser spectroscopy. These methods allow very precise measurements of the gross structure intervals in hydrogen with an accuracy which is limited in principle only by the small natural linewidths of respective transitions. For example, the 2S}1S transition in hydrogen is banned as a single photon process in the electric dipole and quadrupole approximations, and also in the nonrelativistic magnetic dipole approximation. As a result the natural linewidth of this transition is determined by the process with simultaneous emission of two electric dipole photons [6,20], which leads to the natural linewidth of the 2S}1S transition in hydrogen about 1.3 Hz. Many recent spectacular experimental successes where achieved in an attempt to achieve an experimental accuracy comparable with this extremely small natural linewidth. 
The intervals of gross structure are mainly determined by the Rydberg constant, and the same transition frequencies should be used both for measurement of the Rydberg constant and for measurement of the 1S Lamb shift. The first experimental task is to obtain an experimental value of the 1S Lamb shift which is independent of the precise value of the Rydberg constant. This goal may be achieved by measuring two intervals with different principal quantum numbers. One then constructs a linear combination of these intervals which is proportional to $\alpha^3 R$ (as opposed to the leading contributions to the intervals themselves, which are $\sim R$). Due to the factor $\alpha^3$, the magnitude of the Lamb shift extracted from this linear combination of measured frequencies practically does not depend on the exact value of the Rydberg constant. For example, in one of the most recent experiments [3] the measurement of the 1S Lamb shift is disentangled from the measurement of the Rydberg constant by using experimental data on two different intervals of the hydrogen gross structure [3]

$$f_{1S-2S} = 2\,466\,061\,413\,187.34\,(84)\ \mathrm{kHz}, \qquad \delta = 3.4\times10^{-13}, \qquad (390)$$

and [322,311]

$$f_{2S-8D} = 770\,649\,561\,581.1\,(5.9)\ \mathrm{kHz}, \qquad \delta = 7.7\times10^{-12}. \qquad (391)$$
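The relative uncertainties $\delta$ quoted in Eqs. (390) and (391) follow directly from the absolute uncertainties given in parentheses. A minimal Python sketch of that arithmetic (the helper name `relative_uncertainty` is ours, introduced only for illustration):

```python
def relative_uncertainty(value_khz: float, sigma_khz: float) -> float:
    """Return the one-sigma relative uncertainty sigma / value."""
    return sigma_khz / value_khz


# Frequencies and uncertainties as quoted in Eqs. (390) and (391), in kHz.
f_1s_2s = (2_466_061_413_187.34, 0.84)
f_2s_8d = (770_649_561_581.1, 5.9)

delta_1s_2s = relative_uncertainty(*f_1s_2s)
delta_2s_8d = relative_uncertainty(*f_2s_8d)

print(f"delta(1S-2S) = {delta_1s_2s:.1e}")
print(f"delta(2S-8D) = {delta_2s_8d:.1e}")
```

The two printed values can be compared with the $\delta$ entries of Eqs. (390) and (391).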
The original experimental value $f_{2S-8D} = 770\,649\,561\,585.0\,(4.9)$ kHz [322] used in [3] was revised in [311], and we give this later value in Eq. (391). The values of the Lamb shifts obtained in [3] change accordingly, and Tables 20 and 21 contain these revised values.
Theoretically these intervals are given by the expression in Eq. (39)

$$E_{1S-2S} = [E^{DR}_{2S} - E^{DR}_{1S}] + L_{2S} - L_{1S},$$
$$E_{2S-8D} = [E^{DR}_{8D} - E^{DR}_{2S}] + L_{8D} - L_{2S}, \qquad (392)$$
where $E^{DR}_{nj}$ is the leading Dirac and recoil contribution to the position of the respective energy level (the first two terms in Eq. (39)). The first differences on the right-hand side of Eqs. (392) are proportional to the Rydberg constant, which can thus simply be excluded from this system of two equations. We then obtain an equality between a linear combination of the 1S, 2S and 8D Lamb shifts and a linear combination of the experimentally measured frequencies. This relationship admits direct comparison with the Lamb shift theory without any further complication. However, to make a comparison between the results of different experiments feasible (different intervals of the hydrogen gross structure are measured in different experiments), the final experimental results are usually expressed in terms of the 1S Lamb shift. The bulk contribution to the Lamb shift scales as $1/n^3$, which allows one to use the theoretical value $L_{8D} = 71.5$ kHz for the D-state Lamb shift without loss of accuracy. Then a linear combination of the Lamb shifts in the 1S and 2S states may be directly expressed in terms of the experimental data. All other recent measurements of the 1S Lamb shift [311,323–325] also end up with an experimental number for a linear combination of the 1S, 2S and higher-level Lamb shifts.

An unbiased extraction of the 1S Lamb shift from the experimental data remains a problem even after an experimental decoupling of the Lamb shift measurement from the measurement of the Rydberg constant. Historically the most popular approach to extraction of the value of the 1S Lamb shift was to use the experimental value of the classic 2S–2P Lamb shift (see the first three lines in Table 21). Due to the large natural width of the 2P state, the experimental values of the classic Lamb shift have relatively large experimental errors (see Table 20), and unfortunately the different results are not very consistent.
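The elimination of the Rydberg constant from the two intervals of Eqs. (392) can be written out explicitly. The following is only a sketch, assuming the leading contributions are written as $E^{DR}_{n} = R\,\varepsilon_{n}$ with known dimensionless coefficients; the symbols $\varepsilon_n$ and $\kappa$ are our notation, not the paper's:

```latex
\begin{align*}
E_{1S-2S} &= R\,(\varepsilon_{2S}-\varepsilon_{1S}) + L_{2S}-L_{1S},\\
E_{2S-8D} &= R\,(\varepsilon_{8D}-\varepsilon_{2S}) + L_{8D}-L_{2S},\\
\kappa &\equiv \frac{\varepsilon_{8D}-\varepsilon_{2S}}{\varepsilon_{2S}-\varepsilon_{1S}},\\
E_{2S-8D}-\kappa\,E_{1S-2S} &= L_{8D}-L_{2S}-\kappa\,(L_{2S}-L_{1S})
  = \kappa\,L_{1S}-(1+\kappa)\,L_{2S}+L_{8D}.
\end{align*}
```

In the pure Bohr approximation $\varepsilon_n \propto -1/n^2$, so that $\kappa \approx (1/4-1/64)/(1-1/4) = 5/16$. This value is only indicative, since the actual analysis uses the full Dirac and recoil expressions of Eq. (39).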
Such a situation clearly warrants another approach to extraction of the 1S Lamb shift, one which is independent of the magnitude of the classic Lamb shift. A natural way to obtain a self-consistent value of the 1S Lamb shift, independent of the experimental data on the 2S–2P splitting, is provided by the theoretical relation between the 1S and 2S Lamb shifts discussed above in Section 16.1.3. An important advantage of the self-consistent method is that it produces an unbiased value of the $L(1S)$ Lamb shift, independent of the widely scattered experimental data on the 2S–2P interval. Spectacular experimental progress in frequency measurement now allows one to obtain self-consistent values of the 1S Lamb shift from the experimental data [3] with comparable or even better accuracy (see the five lines in Table 21 below the theoretical values in the middle of the table) than in the method based on the experimental results for the classic Lamb shift in [320,316]. The original experimental numbers from [3,311] in the fourth and fifth lines of Table 21 are averages of the self-consistent values and the values based on the classic Lamb shift. The result in [3] is based on the $f_{1S-2S}$ frequency measurement, the $f_{2S-8D}$ frequency from [322,311], and the classic Lamb shift measurements [320,316], while the result in [311] is based on the $f_{2S-8D}$ frequency measurement, as well as on the frequencies measured in [322,311,323–325,3], and the classic Lamb shift measurements [320,316,321]. The respective value of the classic 2S–2P Lamb shift is presented in the sixth line of Table 20. Unlike the other experimental numbers in Table 20, this value of the classic Lamb shift depends on other experimental results in that table.

The value of the 1S Lamb shift is also often needed for extraction of the precise value of the Rydberg constant from the experimental data, see Section 16.1.8 below.

Table 21
1S Lamb shift

                          $\Delta E$ (kHz)         Comment
Weitz et al. [323]        8 172 874 (60)           Exp., $L_{2S-2P}$ [320]
Berkeland et al. [324]    8 172 827 (51)           Exp., $L_{2S-2P}$ [320,316]
Bourzeix et al. [325]     8 172 798 (46)           Exp., $L_{2S-2P}$ [320,316]
Udem et al. [3]           8 172 851 (30)           Exp., $L_{2S-2P}$ [320,316]
Schwob et al. [311]       8 172 837 (22)           Exp., [322–325,3,320,316,321]
Theory                    8 172 605 (14) (28)      $r_p = 0.805\,(11)$ fm [157]
Theory                    8 172 754 (14) (32)      $r_p = 0.862\,(12)$ fm [158]
Weitz et al. [323]        8 172 937 (99)           Self-consistent value
Berkeland et al. [324]    8 172 819 (89)           Self-consistent value
Bourzeix et al. [325]     8 172 772 (66)           Self-consistent value
Udem et al. [3]           8 172 864 (40)           Self-consistent value
Schwob et al. [311]       8 172 805 (43)           Self-consistent value
Theory                    8 172 832 (14) (51)      $r_p = 0.891\,(18)$ fm

The experimental data on the 1S Lamb shift should be compared with the theoretical prediction

$$\Delta E_L(1S) = 8\,172\,754\,(14)\,(32)\ \mathrm{kHz}, \qquad (393)$$

calculated for $r_p = 0.862\,(12)$ fm [158]. The first error in this result is determined by the yet uncalculated contributions to the Lamb shift, and the second reflects the experimental uncertainty in the measurement of the proton rms charge radius. The experimental results in the first five lines of Table 21 seem to be systematically higher even than the theoretical value in Eq. (393), calculated with the higher experimental value of the proton charge radius [158]. One is tempted to conclude that the experimental data indicate an even higher value of the proton charge radius than the one measured in [158]. However, it is necessary to remember that the "experimental" results in the first five lines of the table are "biased", namely they depend on the experimental value of the $2S_{1/2}$–$2P_{1/2}$ Lamb shift [320,316,321]. In view of the rather large scatter of the results for the classic Lamb shift, such dependence is unwelcome. To obtain unbiased results we have calculated self-consistent values of the 1S Lamb shift, which are collected in Table 21. These values, being formally consistent, are rather widely scattered. The respective self-consistent values of the classic Lamb shift obtained from the experimental data in [323–325,3] are presented in Table 20. All experimental results (both original and self-consistent) in both tables are systematically larger than the respective theoretical predictions.
The only plausible explanation is that the true value of the proton charge radius is even larger than the one measured in [158]. At this point we can invert the problem and obtain the value of the proton charge radius

$$r_p = 0.891\,(18)\ \mathrm{fm} \qquad (394)$$
comparing the average $L(1S) = 8\,172\,832\,(25)$ kHz of the self-consistent values of the 1S Lamb shift in Table 21, based on the most precise recent frequency measurements [322,311,323–325,3], with theory. The major contribution to the uncertainty of the proton charge radius in Eq. (394) is due to the uncertainty of the self-consistent Lamb shift.

A new analysis of the low momentum transfer electron scattering data with account of the Coulomb and recoil corrections [159] resulted in the proton radius value

$$r_p = 0.880\,(15)\ \mathrm{fm}, \qquad (395)$$

in good agreement with the self-consistent value in Eq. (394). Another recent analysis of the elastic e–p scattering data resulted in an even higher value of the proton charge radius [326]

$$r_p = 0.897\,(2)\,(1)\,(3)\ \mathrm{fm}, \qquad (396)$$

where the error in the first brackets is due to statistics, the second error is due to normalization effects, and the third error reflects the model dependence. Comparing the results of these two analyses, one has to remember that the Coulomb corrections, which played the most important role in [159], were ignored in [326]. The results of [326] also depend on the specific parametrization of the nucleon form factors. Under these conditions, despite the superficial agreement between the results in Eqs. (395) and (396), the extraction of the precise value of the proton charge radius from the scattering data cannot be considered satisfactory, and further work in this field is required.

Theoretical values of the classic 2S–2P Lamb shift and of the 1S Lamb shift corresponding to the proton radius in Eq. (394) are given in the last lines of Tables 20 and 21, respectively. It is clear that there is much more consistency between these theoretical predictions and the mass of experimental data on the Lamb shifts than between experiment and the predictions based on the proton charge radius from [158] (to say nothing of the radius from [157]).
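The text does not state how the average of the self-consistent values was formed. As a hedged illustration, the sketch below assumes an inverse-variance weighted mean of the five self-consistent entries of Table 21; the function name `weighted_mean` is ours. Under that assumption the result can be compared with the quoted average of 8 172 832 (25) kHz.

```python
import math

# Self-consistent values of the 1S Lamb shift from Table 21: (value, sigma) in kHz.
self_consistent = [
    (8_172_937, 99),  # Weitz et al. [323]
    (8_172_819, 89),  # Berkeland et al. [324]
    (8_172_772, 66),  # Bourzeix et al. [325]
    (8_172_864, 40),  # Udem et al. [3]
    (8_172_805, 43),  # Schwob et al. [311]
]


def weighted_mean(data):
    """Inverse-variance weighted mean and its one-sigma uncertainty.

    Assumes the inputs are independent, which is our simplification.
    """
    weights = [1.0 / sigma**2 for _, sigma in data]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, data)) / total
    return mean, 1.0 / math.sqrt(total)


mean_khz, sigma_khz = weighted_mean(self_consistent)
print(f"L(1S) = {mean_khz:.0f} ({sigma_khz:.0f}) kHz")
```

The two most precise entries, [3] and [311], carry most of the weight, which is why the mean lies between them.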
We expect that future experiments on measurement of the proton charge radius will confirm the hydrogen Lamb shift prediction of the value of the proton charge radius in Eq. (394). Precise measurements of the Lamb shift in muonic hydrogen (see the discussion in Section 16.1.10) provide the best approach to measurement of the proton charge radius, and would allow reduction of the error bars in Eq. (394) by at least an order of magnitude.

16.1.6. Isotope shift

The methods of Doppler-free two-photon laser spectroscopy allow very precise comparison of the frequencies of the 1S–2S transitions in hydrogen and deuterium. The frequency difference

$$\Delta E = [E(2S) - E(1S)]_D - [E(2S) - E(1S)]_H \qquad (397)$$

is called the hydrogen–deuterium isotope shift. The experimental accuracy of the isotope shift measurements improved by three orders of magnitude during the period from 1989 to 1998 (see Table 22), and the uncertainty of the most recent experimental result [327] was reduced to 0.15 kHz.

The main contribution to the hydrogen–deuterium isotope shift is a pure mass effect and is determined by the term $E^{DR}_{nj}$ in Eq. (39). Other contributions coincide with the respective contributions to the Lamb shifts in Tables 1–10. Deuteron-specific corrections discussed in Section 4.1 and collected in Eqs. (170), (181), (182) and (190) should also be included in the theoretical expression for the isotope shift.
Table 22
Isotope shift

                              $\Delta E$ (kHz)
Boshier et al. [2]            67 099 433 (64)
Schmidt-Kaler et al. [330]    670 994 414 (22)
Huber et al. [327]            670 994 334.64 (15)
All yet uncalculated nonrecoil corrections to the Lamb shift almost cancel in the formula for the isotope shift, which is thus much more accurate than the theoretical expressions for the Lamb shifts. The theoretical uncertainty of the isotope shift is mainly determined by the unknown single-logarithmic and nonlogarithmic contributions of order $(Z\alpha)^7(m/M)$ and $\alpha(Z\alpha)^6(m/M)$ (see Sections 5.3 and 6.2), and also by the uncertainties of the deuteron size and structure contributions discussed in Section 7. The overall theoretical uncertainty of all contributions to the isotope shift, besides the leading proton and deuteron size corrections, does not exceed 0.8 kHz.

Theoretical predictions for the isotope shift depend strongly on the magnitude of the radiative-recoil corrections of order $\alpha(Z\alpha)^5(m/M)m$. Unfortunately, there is still an unresolved discrepancy between the theoretical results for these corrections obtained in [35,36,151] and in [152] (for more detail see the discussion in Section 6.1.1), and the difference between the respective values of the isotope shift is about 2.7 kHz, to be compared with the uncertainty of 0.15 kHz of the most recent experimental result [327]. The discrepancy between the theoretical predictions for the radiative-recoil corrections of order $\alpha(Z\alpha)^5(m/M)m$ is one of the outstanding theoretical problems, and efforts toward its resolution are necessary.

Numerically the sum of all theoretical contributions to the isotope shift, besides the leading nuclear size contributions in Eq. (158), is equal to

$$\Delta E = 670\,999\,566.1\,(1.5)\,(0.8)\ \mathrm{kHz}, \qquad (398)$$

for the $\alpha(Z\alpha)^5(m/M)m$ contribution from [35,36,151], and

$$\Delta E = 670\,999\,568.9\,(1.5)\,(0.8)\ \mathrm{kHz}, \qquad (399)$$

for the $\alpha(Z\alpha)^5(m/M)m$ contribution from [152]. The uncertainty in the first parentheses is determined by the experimental error of the electron–proton and proton–deuteron mass ratios, and the uncertainty in the second parentheses is the theoretical uncertainty discussed above.

The individual uncertainties of the proton and deuteron charge radii introduce by far the largest contributions to the uncertainty of the theoretical value of the isotope shift. The uncertainties of the charge radii are much larger than the experimental error of the isotope shift measurement or the uncertainties of the other theoretical contributions. It is sufficient to recall that the uncertainty of the 1S Lamb shift due to the experimental error of the proton charge radius is as large as 32 kHz (see Eq. (393)), even if we ignore all problems connected with the proton radius contribution (see the discussion in Sections 16.1.4 and 16.1.5). In such a situation it is natural to invert the problem and to use the high accuracy of the optical measurements and of the isotope shift theory to determine the difference of the charge radii squared of
the deuteron and proton. We obtain

$$r_D^2 - r_p^2 = 3.8193\,(01)\,(11)\,(04)\ \mathrm{fm}^2, \qquad (400)$$

using the [35,36,151] value of the $\alpha(Z\alpha)^5(m/M)m$ corrections, and

$$r_D^2 - r_p^2 = 3.8213\,(01)\,(11)\,(04)\ \mathrm{fm}^2, \qquad (401)$$

for the [152] value of the $\alpha(Z\alpha)^5(m/M)m$ corrections. Here the first contribution to the uncertainty is due to the experimental error of the isotope shift measurement, the second is due to the experimental error of the electron–proton mass ratio determination, and the third is generated by the theoretical uncertainty of the isotope shift. An improvement in the precision of the electron–proton mass ratio measurement is crucial for a more accurate determination of the difference of the charge radii squared of the deuteron and proton from the isotope shift measurements.

The difference of the deuteron and proton charge radii squared is connected to the so-called deuteron mean square matter radius (see, e.g., [328,163]), which may be extracted on the one hand from the experimental data on the low-energy nucleon–nucleon interaction, and on the other hand from experiments on low-energy elastic electron–deuteron scattering. These two kinds of experimental data used to generate inconsistent results for the deuteron matter radius, as was first discovered in [328]. The discrepancy was resolved in [189], where the Coulomb distortion in the second-order Born approximation was taken into account in the analysis of the electron–deuteron elastic scattering. This analysis was further improved in [329], where the virtual excitations of the deuteron in the electron–deuteron scattering were also considered. Now the values of the deuteron matter radius extracted from the low-energy nucleon–nucleon interaction [163] and from the low-energy elastic electron–deuteron scattering [189,329] are in agreement, and do not contradict the optical data in Eqs. (400) and (401). The isotope shift measurements are today the source of the most precise experimental data on the difference of the charge radii squared and on the deuteron matter radius.
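The inversion that leads from Eqs. (398) and (399) to Eqs. (400) and (401) can be sketched numerically. The sketch assumes that only the leading nuclear size term, $(2/3)(Z\alpha)^4 m_r^3 r^2/n^3$ for $nS$ levels, is left out of the theoretical sums. The physical constants below are standard reference values and are not taken from the text, and all function and variable names are ours.

```python
# Assumed physical constants (standard reference values, not taken from the text).
RYDBERG_FREQ_HZ = 3.2898419603e15   # R_inf * c
BOHR_RADIUS_FM = 52917.7210903      # a_0 in femtometres
MP_OVER_ME = 1836.15267             # proton/electron mass ratio
MD_OVER_ME = 3670.48297             # deuteron/electron mass ratio


def size_coefficient_khz(nucleus_over_electron: float) -> float:
    """Leading 1S nuclear-size shift per fm^2 of r^2, in kHz.

    Uses (2/3) alpha^4 m_r^3 r^2 rewritten as (4/3) R c (r/a_0)^2 (m_r/m)^3.
    """
    reduced = 1.0 / (1.0 + 1.0 / nucleus_over_electron)  # m_r / m
    return (4.0 / 3.0) * RYDBERG_FREQ_HZ / BOHR_RADIUS_FM**2 * reduced**3 / 1e3


def radius_difference_fm2(theory_no_size_khz, experiment_khz, r_p_fm=0.862):
    """Solve the leading size term of the 1S-2S isotope shift for r_D^2 - r_p^2."""
    c_h = size_coefficient_khz(MP_OVER_ME)
    c_d = size_coefficient_khz(MD_OVER_ME)
    # The size term lowers each 1S-2S interval by (7/8) c r^2 (factor 1 - 1/8).
    deficit = theory_no_size_khz - experiment_khz
    return (deficit / 0.875 - (c_d - c_h) * r_p_fm**2) / c_d


EXPERIMENT_KHZ = 670_994_334.64  # Huber et al. [327], Table 22
d_398 = radius_difference_fm2(670_999_566.1, EXPERIMENT_KHZ)  # theory of Eq. (398)
d_399 = radius_difference_fm2(670_999_568.9, EXPERIMENT_KHZ)  # theory of Eq. (399)
print(f"{d_398:.4f} fm^2, {d_399:.4f} fm^2")
```

The small term proportional to `c_d - c_h` accounts for the different reduced masses of hydrogen and deuterium; it depends only weakly on the assumed proton radius.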
In view of the unsatisfactory situation with the proton charge radius measurements, more experimental work is clearly warranted (Table 22).

16.1.7. Lamb shift in helium ion He$^+$

The theory of high-order corrections to the Lamb shift described above for H and D may also be applied to other light hydrogenlike ions. The simplest such ion for which experimental data on the classic $2S_{1/2}$–$2P_{1/2}$ Lamb shift exist is He$^+$. As measured in [125] by the quenching-anisotropy method, $L(2S_{1/2}$–$2P_{1/2}, \mathrm{He}^+) = 14\,042.52\,(16)$ MHz. A new measurement of the classic Lamb shift in He$^+$ by the anisotropy method has been completed recently [331]. In the course of this work the authors discovered a previously unsuspected source of systematic error in the earlier experiment [125]. The result of the new experiment [331] is $L(2S_{1/2}$–$2P_{1/2}, \mathrm{He}^+) = 14\,041.13\,(17)$ MHz. Besides the experimental data, this result also depends on the theoretical value of the fine structure interval $\Delta E(2P_{1/2}$–$2P_{3/2}) = 175\,593.50\,(2)$ MHz, which may easily be obtained from the theory described in this paper.

Theoretical calculation of the He$^+$ Lamb shift is straightforward with all the formulae given above. It is only necessary to recall that all contributions scale with a power of $Z$, and the terms with a high power of $Z$ are enhanced in comparison with the hydrogen case. This is particularly important for the contributions of order $\alpha(Z\alpha)^n$. One can gain in accuracy using in the theoretical
formulae the high-$Z$ results for the functions $G_{SE}(Z\alpha)$ and $G_{VP}(Z\alpha)$ [13,121], extrapolated to $Z = 2$, instead of the nonlogarithmic terms of order $\alpha(Z\alpha)^6$ from Table 5 and the terms of order $\alpha(Z\alpha)^7$ from Table 7. The theoretical uncertainty may be estimated by scaling the uncertainty of the hydrogen formulae with $Z$. After calculation we obtain $L_{th}(2S$–$2P, \mathrm{He}^+) = 14\,041.18\,(13)$ MHz, in excellent agreement with the latest experimental result [331]. Thus, as a result of the new experiment [331], the only discrepancy between the Lamb shift theory and experiment which existed in recent years has been successfully eliminated.

16.1.8. Rydberg constant

The leading contribution to the energy levels in hydrogen in Eq. (39) is clearly sensitive to the value of the Rydberg constant, and hence any measurement of a gross structure interval in hydrogen or deuterium may be used for determination of the value of the Rydberg constant, if the magnitudes of the Lamb shifts of the respective energy levels are known. In practice only the data on the 1S and 2S (or classic 2S–2P) Lamb shifts limit the accuracy of the determination of the Rydberg constant. The Lamb shifts of the higher levels are known theoretically with sufficient accuracy. All recent values of the Rydberg constant are derived from experimental data on at least two gross structure intervals in hydrogen and/or deuterium. This allows simultaneous experimental determination of both the 1S Lamb shift and the Rydberg constant from the experimental data, and makes the obtained value of the Rydberg constant virtually independent of the Lamb shift theory and, what is more important, of the controversial experimental data on the proton charge radius. Either self-consistent values of both the 1S and 2S Lamb shifts, or the direct experimental value of the classic 2S–2P Lamb shift and the respective 2S-dependent value of the 1S Lamb shift, are usually used for determination of the precise value of the Rydberg constant.
The function $G_{SE}(Z\alpha)$ is defined in Footnote 13 in Section 4.5.1. The function $G_{VP}(Z\alpha)$ is defined similarly to the corresponding function in Eq. (122), but, like the function $G_{SE}(Z\alpha)$, also includes nonlogarithmic contributions of order $\alpha(Z\alpha)^6$.
See Footnote 39.
See Footnote 39.

Recent experimental results for the Rydberg constant are collected in Table 23. A few comments are due on the latest results. The value in [322] is based on the measurement of the $f_{2S-8D}$ frequency, the $f_{1S-2S}$ frequency from [332], and the classic Lamb shift measurements [320,316]. This result should be changed due to the recent revision [311] of the $f_{2S-8D}$ frequency. The result in [3] is based on the $f_{1S-2S}$ frequency measurement, the $f_{2S-8D}$ frequency from [322], and the classic Lamb shift measurements [320,316], and should also be revised. The result in [311] is based on the $f_{2S-8D}$ frequency measurement, as well as on the frequencies measured in [322,311,323–325,3], and the classic Lamb shift measurements [320,316,321]. The results in [322,3,311] are averages, obtained from experimental data on different measured frequencies and their linear combinations in hydrogen and deuterium. In principle they depend both on the measured and on the self-consistent values of the Lamb shifts.

To get a better idea of the effect of the rather widely spread experimental data on the classic Lamb shift on the value of the Rydberg constant, and of the balance of uncertainties, one can compare the experimental results in the upper part of Table 23 with the value of the Rydberg constant which may be calculated from the experimental frequencies and the self-consistent values of the Lamb shifts. Values of the Rydberg constant calculated from the experimental transition frequencies in hydrogen in [3,311] and the average self-consistent values of the 1S and 2S–2P Lamb shifts (see Tables 20 and 21) are presented in the middle of Table 23. The first error of these values of the Rydberg constant is determined by the accuracy of the average self-consistent Lamb shifts, the second by the experimental error of the frequency measurement, and the third by the accuracy of the electron–proton mass ratio. We see that the results in the lower part of Table 23 are compatible with the results of the least-squares adjustments of all experimental data in the upper half of the table, which thus do not depend crucially on the somewhat uncertain experimental data on the 2S–2P Lamb shift. We also see that the uncertainties of the Lamb shift determination and of the frequency measurements give the largest contributions to the Rydberg constant uncertainty in most experiments.

The high accuracy of the modern experimental data and theory could allow determination of the Rydberg constant from a direct comparison between theory and experiment, without appeal to the Lamb shift results. The respective values of the Rydberg constant, calculated with the self-consistent proton radius from Eq. (394), are presented in the lower part of Table 23. The first error of these values of the Rydberg constant is determined by the accuracy of the theoretical formula, the second by the experimental error of the frequency measurement, the third by the accuracy of the electron–proton mass ratio, and the last by the proton radius uncertainty.

Table 23
Rydberg constant

                                      $R$ (cm$^{-1}$)
Andreae et al. [332]                  109 737.3156841 (42)
Nez et al. [333]                      109 737.3156830 (31)
Nez et al. [334]                      109 737.3156834 (24)
Weitz et al. [335]                    109 737.3156844 (31)
Weitz et al. [323]                    109 737.3156849 (30)
de Beauvoir et al. [322]              109 737.3156859 (10)
Udem et al. [3]                       109 737.31568639 (91)
Schwob et al. [311]                   109 737.31568516 (84)
Self-consistent Lamb [322,311]        109 737.3156858 (5) (8) (1)
Self-consistent Lamb [3]              109 737.3156852 (11) (0) (1)
Self-consistent Lamb [311]            109 737.3156846 (4) (9) (1)
Proton radius Eq. (394) [322,311]     109 737.3156843 (3) (8) (1) (9)
Proton radius Eq. (394) [3]           109 737.3156822 (6) (0) (1) (22)
Proton radius Eq. (394) [311]         109 737.3156831 (3) (9) (1) (9)
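To see the balance of uncertainties at a glance, the separate error components listed for the Rydberg constant in Table 23 can be combined into a single figure. The sketch below assumes the components are independent and adds them in quadrature; the text does not state a combination rule, so this is our assumption, and the names `rows` and `combined` are illustrative.

```python
import math

# Error components from the lower rows of Table 23,
# in units of the last quoted digit of the Rydberg constant.
rows = {
    "Self-consistent Lamb [322,311]": (5, 8, 1),
    "Self-consistent Lamb [3]": (11, 0, 1),
    "Self-consistent Lamb [311]": (4, 9, 1),
    "Proton radius Eq. (394) [322,311]": (3, 8, 1, 9),
    "Proton radius Eq. (394) [3]": (6, 0, 1, 22),
    "Proton radius Eq. (394) [311]": (3, 9, 1, 9),
}


def combined(components):
    """Root-sum-square of one-sigma components, assumed independent."""
    return math.sqrt(sum(c * c for c in components))


totals = {label: combined(parts) for label, parts in rows.items()}
for label, total in totals.items():
    print(f"{label}: {total:.1f}")
```

For the rows based on the proton radius from Eq. (394), the last component (the radius uncertainty) is the largest or joint-largest term.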
The values of the Rydberg constant in the last three lines of Table 23 are rather accurate, and will be able to compete with the other methods of determination of the Rydberg constant from the experimental data once the current controversial situation with the precise value of the proton charge radius is resolved. It is appropriate to emphasize once again that the experimental values of the Rydberg constant in the upper part of Table 23 are based on measurements of at least two intervals of the hydrogen and/or deuterium gross structure and are thus independent of the uncertain value of the proton charge radius.

16.1.9. 1S–2S transition in muonium

Starting with the pioneering work [336], Doppler-free two-photon laser spectroscopy was also applied to measurements of the gross structure interval in muonium. The experimental results [336–339] are collected in Table 24, where the error in the first brackets is due to statistics and the second error is due to systematic effects. The highest accuracy was achieved in the latest experiment [339]

$$\Delta E = 2\,455\,528\,941.0\,(9.8)\ \mathrm{MHz}. \qquad (402)$$

Table 24
1S–2S transition in muonium

                          $\Delta E$ (MHz)
Danzman et al. [336]      2 455 527 936 (120) (140)
Jungmann et al. [337]     2 455 528 016 (58) (43)
Maas et al. [338]         2 455 529 002 (33) (46)
Meyer et al. [339]        2 455 528 941.0 (9.8)
Theory                    2 455 528 934.9 (0.3)
Theoretically, muonium differs from hydrogen in two main respects. First, the nucleus in the muonium atom is an elementary structureless particle, unlike the composite proton, which is a quantum chromodynamic bound state of quarks. Hence the nuclear size and structure corrections in Table 10 do not contribute to the muonium energy levels. Second, the muon is about ten times lighter than the proton, and recoil and radiative-recoil corrections are numerically much more important for muonium than for hydrogen. In almost all other respects muonium looks exactly like hydrogen with a somewhat lighter nucleus, and the theoretical expression for the 1S–2S transition frequency may easily be obtained from the leading external field contribution in Eq. (39) and the different contributions to the energy levels collected in Tables 1–9, after a trivial substitution of the muon mass. Unlike the case of hydrogen, for muonium we cannot ignore the corrections in the last two lines of Table 2, and we have to substitute the classical elementary particle contributions in Eqs. (152) and (153) for the composite proton contribution in the fifth line of Table 9. After these modifications we obtain a theoretical prediction for the frequency of the 1S–2S transition in muonium

$$\Delta E = 2\,455\,528\,934.9\,(0.3)\ \mathrm{MHz}. \qquad (403)$$
Even though recoil corrections play an enhanced role for muonium, the discrepancy between the results for the radiative-recoil corrections of order $\alpha(Z\alpha)^5(m/M)m$ discussed in Section 6.1.1 is too small, in comparison with the uncertainty originating from the mass ratio, to affect the theoretical prediction for the gross structure interval.
The dominant contribution to the uncertainty of this theoretical result is generated by the uncertainty of the muon–electron mass ratio, and we have used the most precise value of this mass ratio [5] in our calculations (for more details see Section 16.2 below). All other contributions to the uncertainty of the theoretical prediction (the uncertainty of the Rydberg constant, the uncertainty of the theoretical expression, etc.) are at least an order of magnitude smaller. There is complete agreement between the experimental and theoretical results for the 1S–2S transition frequency in Eqs. (402) and (403), but further improvement of the experimental data is clearly warranted.
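The agreement between Eqs. (402) and (403) can be quantified by expressing the difference in units of the combined uncertainty. This is a minimal sketch; the helper `pull` is our own name, and the two uncertainties are assumed to be independent.

```python
import math


def pull(value_a, sigma_a, value_b, sigma_b):
    """Difference between two results in units of their combined one-sigma error."""
    return (value_a - value_b) / math.hypot(sigma_a, sigma_b)


# 1S-2S interval in muonium, in MHz: experiment, Eq. (402), and theory, Eq. (403).
experiment = (2_455_528_941.0, 9.8)
theory = (2_455_528_934.9, 0.3)

z = pull(*experiment, *theory)
print(f"experiment - theory = {experiment[0] - theory[0]:.1f} MHz ({z:.2f} sigma)")
```

The difference is well below one standard deviation, and the combined error is dominated entirely by the experimental uncertainty.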
16.1.10. Phenomenology of light muonic atoms

There are very few experimental results on the energy levels in light hydrogenlike muonic atoms. The classic 2P–2S Lamb shift in the muonic helium ion $(\mu\mathrm{He})^+$ was measured at CERN many years ago [340–343], and the experimental data were found to be in agreement with the existing theoretical predictions. A comprehensive theoretical review of these experimental results was given in [214], and we refer the interested reader to this review. It is necessary to mention, however, that a recent new experiment [344] failed to confirm the old experimental results. This leaves the experimental measurement of the Lamb shift in muonic helium in an uncertain state, and further experimental efforts in this direction are clearly warranted. The theoretical contributions to the Lamb shift were discussed above in Section 4.3 mainly in connection with muonic hydrogen, but the respective formulae may be used for muonic helium as well. Let us mention that some of these contributions were obtained long after publication of the review [214], and should be used in the comparison of the results of future helium experiments with theory.

There also exists a proposal for measurement of the hyperfine splitting in the ground state of muonic hydrogen with an accuracy of about $10^{-4}$ [345]. Inspired by this proposal, the hadronic vacuum polarization contribution to the ground state hyperfine splitting in muonic hydrogen was calculated in [346], where it was found to give a relative contribution of about $2\times10^{-5}$ to the hyperfine splitting. We did not include this correction in our discussion of hyperfine splitting in muonic hydrogen, mainly because it is smaller than the theoretical errors due to the polarizability contribution.

The current surge of interest in muonic hydrogen is mainly inspired by the desire to obtain a new, more precise value of the proton charge radius from a measurement of the 2P–2S Lamb shift [198].
As we have seen in Section 4.3, the leading proton radius contribution is about 2% of the total 2P–2S splitting, to be compared with the case of electronic hydrogen, where this contribution is relatively two orders of magnitude smaller, about $10^{-4}$ of the total 2P–2S splitting. Any measurement of the 2P–2S Lamb shift in muonic hydrogen with a relative error comparable with the relative error of the Lamb shift measurement in electronic hydrogen is therefore much more sensitive to the value of the proton charge radius. The natural linewidth of the 2P states in muonic hydrogen, and respectively of the 2P–2S transition, is determined by the linewidth of the 2P–1S transition, which is equal to $\Gamma = 0.077$ meV. It is planned [198] to measure the 2P–2S Lamb shift with an accuracy at the level of 10% of the natural linewidth, or with an error of about 0.008 meV, which means measuring the 2P–2S transition with a relative error of about $4\times10^{-5}$.
The total 2P–2S Lamb shift in muonic hydrogen, calculated according to the formulae in Table 11 for $r_p = 0.862\,(12)$ fm, is equal to

$$\Delta E(2P-2S) = 202.225\,(108)\ \mathrm{meV}. \qquad (404)$$
We can write this result as the difference of a theoretical number and a term proportional to the proton charge radius squared (with $r_p$ in fm)

$$\Delta E(2P-2S) = 206.108\,(4) - 5.22501\,r_p^2\ \mathrm{meV}. \qquad (405)$$
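Equation (405) is simple enough to evaluate directly, which gives a check on Eq. (404) and on the sensitivity to the proton radius. The sketch below uses first-order error propagation, $\delta r_p = \delta E / (2 \cdot 5.22501\, r_p)$; the function names are ours, and the 0.008 meV input is the planned accuracy quoted from [198].

```python
# Coefficients of Eq. (405): Delta E(2P-2S) = A - B * r_p^2, in meV with r_p in fm.
A_MEV = 206.108
B_MEV_PER_FM2 = 5.22501


def lamb_shift_mev(r_p_fm: float) -> float:
    """Evaluate Eq. (405) for a given proton rms charge radius."""
    return A_MEV - B_MEV_PER_FM2 * r_p_fm**2


def radius_error_fm(r_p_fm: float, energy_error_mev: float) -> float:
    """First-order error propagation using |d(Delta E)/d r_p| = 2 B r_p."""
    return energy_error_mev / (2.0 * B_MEV_PER_FM2 * r_p_fm)


r_p = 0.862
shift = lamb_shift_mev(r_p)                    # compare with Eq. (404)
spread = 2.0 * B_MEV_PER_FM2 * r_p * 0.012     # induced by the 0.012 fm radius error
dr = radius_error_fm(r_p, 0.008)               # planned experimental accuracy [198]
print(f"{shift:.3f} meV, +/-{spread:.3f} meV, dr/r = {dr / r_p:.2%}")
```

The spread induced by the 0.012 fm radius error can be compared with the uncertainty quoted in Eq. (404), and the resulting relative radius error is about 0.1%.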
We see from this equation that when the experiment achieves the planned accuracy of about 0.008 meV [198], this will allow determination of the proton charge radius with a relative accuracy of about 0.1%, which is about an order of magnitude better than the accuracy of the available experimental results.

The uncertainty in the sum of all theoretical contributions which are not proportional to the proton charge radius squared in Eq. (405) may be somewhat reduced. This uncertainty is determined by the uncertainties of the purely electrodynamic contributions and by the uncertainty of the nuclear polarizability contribution of order $(Z\alpha)^5 m$. Purely electrodynamic uncertainties are introduced by the uncalculated nonlogarithmic contribution of order $\alpha^2(Z\alpha)^4$ corresponding to the diagrams with radiative photon insertions in the graph for the leading electron polarization in Fig. 56, and by the uncalculated light-by-light contributions in Fig. 20(e), and may be as large as 0.004 meV. Calculation of these contributions and elimination of the respective uncertainties is the most immediate theoretical problem in the theory of muonic hydrogen. After calculation of these corrections, the uncertainty in the sum of all theoretical contributions except those directly proportional to the proton radius squared will be determined by the uncertainty of the proton polarizability contribution of order $(Z\alpha)^5$. This uncertainty of the proton polarizability contribution is currently about 0.002 meV, and it will be difficult to reduce it in the near future. If the experimental error of the measurement of the 2P–2S Lamb shift in muonic hydrogen is reduced to a comparable level, it will be possible to determine the proton radius with a relative error smaller than $3\times10^{-4}$, or with an absolute error of about $2\times10^{-4}$ fm, to be compared with the current accuracy of the proton radius measurements, which produce results with an error on the scale of 0.01 fm.

16.2. Hyperfine splitting

16.2.1. Hyperfine splitting in hydrogen

Hyperfine splitting in the ground state of hydrogen was measured precisely about thirty years ago [287,288]

$$\Delta E_{HFS}(H) = 1\,420\,405.751\,766\,7\,(9)\ \mathrm{kHz}, \qquad \delta = 6\times10^{-13}. \qquad (406)$$
For many years, this hydrogen maser measurement remained the most accurate experiment in modern physics. Only recently has Doppler-free two-photon spectroscopy achieved comparable precision [3] (see the result for the 1S–2S transition frequency in Eq. (390)).
The theoretical situation for the hyperfine splitting in hydrogen has always remained less satisfactory because of the uncertainties connected with the proton structure. The scale of hyperfine splitting in hydrogen is determined by the Fermi energy in Eq. (271),

$$E_F(H) = 1\,418\,840.11\,(3)\,(1)\ \mathrm{kHz}, \tag{407}$$

where the uncertainty in the first brackets is due to the uncertainty of the proton anomalous magnetic moment $\kappa$ measured in nuclear magnetons, and the uncertainty in the second brackets is due to the uncertainty of the fine structure constant in Eq. (385). The sum of all nonrecoil corrections to hyperfine splitting collected in Tables 12–15 is equal to

$$\Delta E_{\mathrm{HFS}}(H) = 1\,420\,452.04\,(3)\,(1)\ \mathrm{kHz}, \tag{408}$$

where the first error comes from the experimental error of the proton anomalous magnetic moment $\kappa$, and the second comes from the error in the value of the fine structure constant $\alpha$. The experimental error of $\kappa$ determines the uncertainty of the sum of all nonrecoil contributions to the hydrogen hyperfine splitting. The theoretical error of the sum of all nonrecoil contributions is about 3 Hz, at least an order of magnitude smaller than the uncertainty introduced by the proton anomalous magnetic moment $\kappa$, and we did not write it explicitly in Eq. (408). In relative units this theoretical error is about $2\times10^{-9}$, to be compared with the estimate of the same error, $1.2\times10^{-7}$, made in [289]. This reduction of the theoretical error by two orders of magnitude shows the progress achieved in calculations of nonrecoil corrections during the last decade.

The real stumbling block on the road to a more precise theory of hydrogen hyperfine splitting is the situation with the proton structure, polarizability and recoil corrections, and there has been little progress in this respect in recent years. Following tradition [289], let us compare the theoretical result without the unknown proton polarizability correction with the experimental data in the form

$$\frac{\Delta E_{\mathrm{HFS}}(H)_{\mathrm{th}} - \Delta E_{\mathrm{HFS}}(H)_{\mathrm{exp}}}{E_F(H)} = -4.5\,(1.1)\times10^{-6}. \tag{409}$$

The difference between the numbers and error estimates on the RHS of Eq. (409) and the respective numbers in [289] is due mainly to a different treatment of the form factor parametrizations and of the values of the proton radius. The new recoil and structure corrections collected in the lower part of Table 19 had a relatively small effect on the numbers on the RHS of Eq. (409). The uncertainty in Eq. (409) is dominated by the uncertainty of the Zemach correction in Eq. (347). As we discussed in Section 14.1.1.1, this uncertainty is connected with the accuracy of the dipole fit for the proton form factor and with contradictory experimental data on the proton radius. It is fair to say that the estimate of this uncertainty is to a certain extent subjective and reflects the prejudices of the authors. One might hope that new experimental data on the proton radius and the proton form factor will provide more solid ground for consideration of the Zemach correction and will allow a more reliable estimate of the difference on the LHS of Eq. (409). The result in Eq. (409) does not contradict the rigorous upper bound on the proton polarizability correction in Eq. (362). It could be understood as an indication of the relatively large magnitude of
the polarizability contribution, and as a challenge to theory to obtain a reliable estimate of the polarizability contribution on the basis of the new experimental data.

16.2.2. Hyperfine splitting in deuterium

The hyperfine splitting in the ground state of deuterium was measured with very high accuracy a long time ago [347,308]:

$$\Delta E_{\mathrm{HFS}}(D) = 327\,384.352\,521\,9\,(17)\ \mathrm{kHz}, \qquad \delta = 5.2\times10^{-12}. \tag{410}$$

The expression for the Fermi energy in Eq. (271), besides the trivial substitutions similar to the ones in the case of hydrogen, should also be multiplied by an additional factor 3/4, corresponding to the transition from a spin one half nucleus in the case of hydrogen and muonium to the spin one nucleus in the case of deuterium. The final expression for the deuterium Fermi energy has the form

$$E_F(D) = 4\alpha^2\mu_d\,\frac{m}{M_p}\left(1+\frac{m}{M_d}\right)^{-3} ch R_\infty, \tag{411}$$

where $\mu_d = 0.857\,438\,228\,4\,(94)$ is the deuteron magnetic moment in nuclear magnetons, $M_d$ is the deuteron mass, and $M_p$ is the proton mass. Numerically,

$$E_F(D) = 326\,967.678\,(4)\ \mathrm{kHz}, \tag{412}$$

where the main contribution to the uncertainty is introduced by the uncertainty of the deuteron magnetic moment measured in nuclear magnetons. As in the case of hydrogen, after trivial modifications, we can use all nonrecoil corrections in Tables 12–16 for calculations in deuterium. The sum of all nonrecoil corrections is numerically equal to

$$\Delta E_{\mathrm{HFS}}(D) = 327\,339.143\,(4)\ \mathrm{kHz}. \tag{413}$$

Unlike the proton, the deuteron is a weakly bound system, so one cannot simply use the results for the hydrogen recoil and structure corrections for deuterium. What is needed in the case of deuterium is a completely new consideration. Only one minor nuclear structure correction [348–351] was discussed in the literature for many years, but it was by far too small to explain the difference between the experimental result in Eq. (410) and the sum of nonrecoil corrections in Eq. (413):

$$\Delta E^{\mathrm{exp}}_{\mathrm{HFS}}(D) - \Delta E^{\mathrm{th}}_{\mathrm{HFS}}(D) = 45.2\ \mathrm{kHz}. \tag{414}$$

A breakthrough was achieved a few years ago when it was realized that an analytic calculation of the deuterium recoil, structure and polarizability corrections is possible in the zero range approximation [184]. The analytic result for the difference in Eq. (414) obtained in [184] is numerically equal to 44 kHz and, within the accuracy of the zero range approximation, fully explains the difference between the experimental result and the sum of the nonrecoil corrections. More accurate calculations of the nuclear effects in the deuterium hyperfine structure beyond the zero range approximation are feasible, and comparison of such results with the experimental data on the deuterium hyperfine splitting may be used as a test of the deuteron models.
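The numbers in Eqs. (412) and (414) can be checked with a short numerical sketch. It assumes that the deuterium Fermi energy has the form $E_F(D) = 4\alpha^2\mu_d (m/M_p)(1+m/M_d)^{-3} ch R_\infty$, that is, the hydrogen expression multiplied by 3/4 with the substitutions described above. The values of the fine structure constant, the mass ratios and $cR_\infty$ used below are approximate reference values supplied for illustration, not the inputs used in the text; only $\mu_d$ and the two hyperfine values are taken from the text.

```python
# Approximate reference constants (assumptions for illustration only).
ALPHA_INV = 137.035999            # inverse fine structure constant
M_E_OVER_M_P = 5.4461702e-4       # electron-to-proton mass ratio
M_E_OVER_M_D = 2.7244371e-4       # electron-to-deuteron mass ratio
C_RYDBERG_KHZ = 3.289841960e12    # c * R_infinity expressed in kHz

# Values quoted in the text.
MU_D = 0.8574382284               # deuteron magnetic moment, nuclear magnetons
HFS_EXP_KHZ = 327384.3525219      # Eq. (410)
HFS_NONRECOIL_KHZ = 327339.143    # Eq. (413)


def fermi_energy_deuterium_khz():
    """Evaluate 4 alpha^2 mu_d (m/M_p) (1 + m/M_d)^-3 c R_infinity in kHz."""
    alpha = 1.0 / ALPHA_INV
    reduced_mass_factor = (1.0 + M_E_OVER_M_D) ** -3
    return 4.0 * alpha**2 * MU_D * M_E_OVER_M_P * reduced_mass_factor * C_RYDBERG_KHZ


fermi_khz = fermi_energy_deuterium_khz()
residual_khz = HFS_EXP_KHZ - HFS_NONRECOIL_KHZ

print(f"E_F(D) is about {fermi_khz:.2f} kHz")
print(f"experiment minus nonrecoil theory: {residual_khz:.1f} kHz")
```

With these inputs the Fermi energy agrees with Eq. (412) to well within a kilohertz, and the residual reproduces the 45.2 kHz of Eq. (414), which is the quantity the zero range calculation must explain.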
16.2.3. Hyperfine splitting in muonium

Being a purely electrodynamic bound state, muonium is the best system for comparison between the hyperfine splitting theory and experiment. Unlike the case of hydrogen, the theory of hyperfine splitting in muonium is free from uncertainties generated by the hadronic nature of the proton, and is thus much more precise. The scale of hyperfine splitting is determined by the Fermi energy in Eq. (271),

$$E_F(\mathrm{Mu}) = 4\,459\,031.922\,(518)\,(34)\ \mathrm{kHz}, \tag{415}$$

where the uncertainty in the first brackets is due to the uncertainty of the muon–electron mass ratio in Eq. (387), and the uncertainty in the second brackets is due to the uncertainty of the fine structure constant in Eq. (385).

The theoretical prediction for the hyperfine splitting interval in the ground state of muonium is obtained by collecting all contributions to HFS displayed in Tables 12–18:

$$\Delta E_{\mathrm{HFS}}(\mathrm{Mu}) = 4\,463\,302.565\,(518)\,(34)\,(100)\ \mathrm{kHz}, \tag{416}$$

where the first error comes from the experimental error of the electron–muon mass ratio $m/M$, the second comes from the error in the value of the fine structure constant $\alpha$, and the third is an estimate of the yet unknown theoretical contributions. We see that the uncertainty of the muon–electron mass ratio gives by far the largest contribution both to the uncertainty of the Fermi energy and to that of the theoretical value of the ground state hyperfine splitting.

On the experimental side, hyperfine splitting in the ground state of muonium admits very precise determination due to its small natural linewidth. The lifetime of the higher energy hyperfine state with total angular momentum $F = 1$ with respect to the M1 transition to the lower state with $F = 0$ is extremely large, $\tau = 1\times10^{13}$ s, and gives a negligible contribution to the linewidth. The natural linewidth $\Gamma_\mu/h = 72.3$ kHz is completely determined by the muon lifetime $\tau_\mu \approx 2.2\times10^{-6}$ s.

A high precision value of the muonium hyperfine splitting was obtained many years ago [4]:

$$\Delta E_{\mathrm{HFS}}(\mathrm{Mu}) = 4\,463\,302.88\,(16)\ \mathrm{kHz}, \qquad \delta = 3.6\times10^{-8}. \tag{417}$$

In the latest measurement [5] this value was improved by a factor of three:

$$\Delta E_{\mathrm{HFS}}(\mathrm{Mu}) = 4\,463\,302.776\,(51)\ \mathrm{kHz}, \qquad \delta = 1.1\times10^{-8}. \tag{418}$$

The new value has an experimental error which corresponds to measuring the hyperfine energy splitting at the level of $\Delta\nu/(\Gamma_\mu/h) \approx 7\times10^{-4}$ of the natural linewidth. This is a remarkable experimental achievement.

The agreement between theory and experiment is excellent. However, the error bars of the theoretical value appear to be about an order of magnitude larger than the respective error bars of the experimental result. This is a deceptive impression. The error of the theoretical prediction in Eq. (416) is dominated by the experimental error of the value of the electron–muon mass ratio. As a result of the new experiment [5] this error was reduced threefold, but it is still by far the largest source of error in the theoretical value for the muonium hyperfine splitting.
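The linewidth figures quoted above follow from simple arithmetic, sketched below. The sketch assumes the standard relation between a natural linewidth and a lifetime, $\Gamma/h = 1/(2\pi\tau)$, and uses the rounded muon lifetime of $2.2\times10^{-6}$ s together with the experimental values and errors of Eqs. (417) and (418); the variable names are illustrative.

```python
import math

TAU_MU_S = 2.2e-6                      # rounded muon lifetime from the text, s

# Experimental values and one-sigma errors in kHz, Eqs. (417) and (418).
HFS_OLD_KHZ, ERR_OLD_KHZ = 4463302.88, 0.16
HFS_NEW_KHZ, ERR_NEW_KHZ = 4463302.776, 0.051

# Natural linewidth in frequency units: Gamma / h = 1 / (2 * pi * tau).
linewidth_khz = 1.0 / (2.0 * math.pi * TAU_MU_S) / 1.0e3

# Fraction of the natural linewidth resolved by the latest measurement.
line_fraction = ERR_NEW_KHZ / linewidth_khz

# Relative errors of the two measurements and the improvement factor.
rel_old = ERR_OLD_KHZ / HFS_OLD_KHZ
rel_new = ERR_NEW_KHZ / HFS_NEW_KHZ
improvement = ERR_OLD_KHZ / ERR_NEW_KHZ

print(f"natural linewidth: {linewidth_khz:.1f} kHz")
print(f"error as a fraction of the linewidth: {line_fraction:.1e}")
print(f"relative errors: {rel_old:.1e} (old), {rel_new:.1e} (new)")
print(f"improvement factor: {improvement:.1f}")
```

The rounded lifetime gives a linewidth close to the quoted 72.3 kHz, so the 51 Hz experimental error corresponds to locating the line center to within a few parts in ten thousand of its width.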
The estimate of the theoretical uncertainty is only two times larger than the experimental error. The largest source of theoretical error is connected with the yet uncalculated theoretical contributions to hyperfine splitting, mainly with the unknown recoil and radiative-recoil corrections. As we have already mentioned, reducing the theoretical uncertainty by an order of magnitude, to about 10 Hz, is now a realistic aim for the theory.

One may extract the electron–muon mass ratio from the experimental value of HFS and the most precise value of $\alpha$:

$$\frac{M}{m} = 206.768\,267\,2\,(23)\,(16)\,(46), \tag{419}$$

where the first error comes from the experimental error of the hyperfine splitting measurement, the second comes from the error in the value of the fine structure constant $\alpha$, and the third is an estimate of the yet unknown theoretical contributions. Combining all errors we obtain the mass ratio

$$\frac{M}{m} = 206.768\,267\,2\,(54), \qquad \delta = 2.6\times10^{-8}, \tag{420}$$

which is almost five times more accurate than the best earlier experimental value in Eq. (387). We see from Eq. (419) that the error of this indirect value of the mass ratio is dominated by the theoretical uncertainty. This sets a clear task for the theory: to reduce the contribution of the theoretical uncertainty to the error bars in Eq. (419) to a level below the two other contributions. To this end it is sufficient to calculate all contributions to HFS which are larger than 10 Hz. This would reduce the uncertainty of the indirect value of the muon–electron mass ratio by a factor of two. The recent experimental and theoretical achievements thus create a real incentive to improve the theory of HFS so that it accounts for all corrections of order 10 Hz.

Another reason to improve the HFS theory is provided by the prospect of reducing the experimental uncertainty of hyperfine splitting below the weak interaction contribution in Eq. (343). In such a case, muonium could become the first atom where a shift of atomic energy levels due to the weak interaction would be observed [352].

16.3. Summary

High-precision experiments with hydrogenlike systems have achieved a new level of accuracy in recent years, and further dramatic progress is still expected. The experimental errors of measurements of many energy shifts in hydrogen and muonium were reduced by orders of magnitude. This rapid experimental progress was matched by theoretical developments, as discussed above. The accuracy of the quantum electrodynamic theory of such classical effects as the Lamb shift in hydrogen and hyperfine splitting in muonium has increased in many cases by one or two orders of magnitude. This was achieved through the intensive work of many theorists and the development of ingenious new theoretical approaches which can be applied to the theory of bound states, not only in QED but also in other field theories, such as quantum chromodynamics. From the
phenomenological point of view, recent developments opened new perspectives for precise determination of many fundamental constants (the Rydberg constant, electron–muon mass ratio, proton charge radius, deuteron structure radius, etc.), and for comparison of the experimental and theoretical results on the Lamb shifts and hyperfine splitting. Recent progress also poses new theoretical challenges. Reduction of the theoretical error in the prediction of the 1S Lamb shift in hydrogen to the level of 1 kHz (and, respectively, of the 2S Lamb shift to several tenths of a kHz) should be considered the next stage of the theory. The theoretical error of the hyperfine splitting in muonium should be reduced to about 10 Hz. Achievement of these goals will require hard work and considerable resourcefulness, but results which years ago hardly seemed possible are now within reach.
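The payoff of the 10 Hz goal for muonium can be illustrated with the error budget of the mass ratio in Eq. (419). The sketch below combines the three quoted errors in quadrature and then repeats the combination with the theoretical component scaled down tenfold. It assumes that the errors are independent and that the theoretical component of the mass ratio error scales linearly with the theoretical error of the hyperfine splitting (100 Hz at present); both are assumptions of this example, and the function name is illustrative.

```python
import math


def combine_in_quadrature(*errors):
    """Combine independent errors as the square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))


# Errors of M/m from Eq. (419), in units of the last quoted digit (1e-7).
ERR_EXPERIMENT = 23.0   # hyperfine splitting measurement
ERR_ALPHA = 16.0        # fine structure constant
ERR_THEORY = 46.0       # unknown theoretical contributions (about 100 Hz in HFS)

total_now = combine_in_quadrature(ERR_EXPERIMENT, ERR_ALPHA, ERR_THEORY)

# Assumed linear scaling: a 10 Hz theory error is ten times smaller than 100 Hz.
err_theory_goal = ERR_THEORY * (10.0 / 100.0)
total_goal = combine_in_quadrature(ERR_EXPERIMENT, ERR_ALPHA, err_theory_goal)

print(f"combined error now: {total_now:.1f}")
print(f"combined error with 10 Hz theory: {total_goal:.1f}")
print(f"improvement factor: {total_now / total_goal:.2f}")
```

Under these assumptions the present combination rounds to the (54) of Eq. (420), and reaching the 10 Hz goal would shrink the total error by roughly a factor of two, leaving it dominated by the experimental inputs.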
Acknowledgements

Over many years, many friends and colleagues have discussed the bound state problem with us, collaborated on different projects, and shared with us their vision and insight. We are especially grateful to the late D. Yennie and M. Samuel, and to G. Adkins, M. Braun, M. Doncheski, R. Faustov, G. Drake, K. Jungmann, S. Karshenboim, Y. Khriplovich, T. Kinoshita, L. Labzowsky, P. Lepage, A. Martynenko, A. Milshtein, P. Mohr, D. Owen, K. Pachucki, V. Pal'chikov, J. Sapirstein, V. Shabaev, B. Taylor, and A. Yelkhovsky. This work was supported by the NSF grants PHY-9120102, PHY-9421408, and PHY-9900771.
References

[1] W.E. Lamb Jr., R.C. Retherford, Phys. Rev. 72 (1947) 339.
[2] M.G. Boshier, P.E.G. Baird, C.J. Foot et al., Phys. Rev. A 40 (1989) 6169.
[3] Th. Udem, A. Huber, B. Gross et al., Phys. Rev. Lett. 79 (1997) 2646.
[4] F.G. Mariam, W. Beer, P.R. Bolton et al., Phys. Rev. Lett. 49 (1982) 993.
[5] W. Liu, M.G. Boshier, S. Dhawan et al., Phys. Rev. Lett. 82 (1999) 711.
[6] H.A. Bethe, E.E. Salpeter, Quantum Mechanics of One- and Two-Electron Atoms, Springer, Berlin, 1957.
[7] J.R. Sapirstein, D.R. Yennie, in: T. Kinoshita (Ed.), Quantum Electrodynamics, World Scientific, Singapore, 1990, p. 560.
[8] H. Grotch, Found. Phys. 24 (1994) 249.
[9] V.V. Dvoeglazov, Yu.N. Tyukhtyaev, R.N. Faustov, Fiz. Elem. Chastits At. Yadra 25 (1994) 144 [Phys. Part. Nucl. 25 (1994) 58].
[10] M.I. Eides, New Developments in the Theory of Muonium Hyperfine Splitting, in: H.M. Fried, B.M. Mueller (Eds.), Quantum Infrared Physics, Proceedings of the Paris Workshop on Quantum Infrared Physics, June 6–10, 1994, World Scientific, Singapore, 1995, p. 262.
[11] T. Kinoshita, Rep. Prog. Phys. 59 (1996) 3803.
[12] J. Sapirstein, in: G.W.F. Drake (Ed.), Atomic, Molecular and Optical Physics Handbook, AIP Press, 1996, p. 327.
[13] P.J. Mohr, in: G.W.F. Drake (Ed.), Atomic, Molecular and Optical Physics Handbook, AIP Press, New York, 1996, p. 341.
[14] K. Pachucki, D. Leibfried, M. Weitz, A. Huber, W. König, T.W. Hänsch, J. Phys. B 29 (1996) 177; B 29 (1996) 1573(E).
[15] T. Kinoshita, hep-ph/9808351, Cornell preprint, 1998.
[16] P.J. Mohr, G. Plunien, G. Soff, Phys. Rep. 293 (1998) 227.
[17] W.E. Caswell, G.P. Lepage, Phys. Lett. 167B (1986) 437.
[18] T. Kinoshita, M. Nio, Phys. Rev. D 53 (1996) 4909.
[19] J.D. Bjorken, S.D. Drell, Relativistic Quantum Mechanics, McGraw-Hill, New York, 1964.
[20] V.B. Berestetskii, E.M. Lifshitz, L.P. Pitaevskii, Quantum Electrodynamics, 2nd Edition, Pergamon Press, Oxford, 1982.
[21] E.E. Salpeter, H.A. Bethe, Phys. Rev. 84 (1951) 1232.
[22] C. Itzykson, J.-B. Zuber, Quantum Field Theory, McGraw-Hill, New York, 1980.
[23] F. Gross, Relativistic Quantum Mechanics and Field Theory, Wiley, New York, 1993.
[24] H. Grotch, D.R. Yennie, Zeitsch. Phys. 202 (1967) 425.
[25] H. Grotch, D.R. Yennie, Rev. Mod. Phys. 41 (1969) 350.
[26] F. Gross, Phys. Rev. 186 (1969) 1448.
[27] L.S. Dulyan, R.N. Faustov, Teor. Mat. Fiz. 22 (1975) 314 [Theor. Math. Phys. 22 (1975) 220].
[28] P. Lepage, Phys. Rev. A 16 (1977) 863.
[29] E. Fermi, Z. Phys. 60 (1930) 320.
[30] G. Breit, Phys. Rev. 34 (1929) 553; ibid. 36 (1930) 383; ibid. 39 (1932) 616.
[31] W.A. Barker, F.N. Glover, Phys. Rev. 99 (1955) 317.
[32] H.A. Bethe, Phys. Rev. 72 (1947) 339.
[33] N.M. Kroll, W.E. Lamb, Phys. Rev. 75 (1949) 388.
[34] J.B. French, V.F. Weisskopf, Phys. Rev. 75 (1949) 1240.
[35] G. Bhatt, H. Grotch, Phys. Rev. A 31 (1985) 2794.
[36] G. Bhatt, H. Grotch, Ann. Phys. (NY) 178 (1987) 1.
[37] S.E. Haywood, J.D. Morgan III, Phys. Rev. A 32 (1985) 3179.
[38] G.W.F. Drake, R.A. Swainson, Phys. Rev. A 41 (1990) 1243.
[39] J. Schwinger, Phys. Rev. 73 (1948) 416.
[40] E.A. Uehling, Phys. Rev. 48 (1935) 55.
[41] J. Weneser, R. Bersohn, N.M. Kroll, Phys. Rev. 91 (1953) 1257.
[42] M.F. Soto, Phys. Rev. Lett. 17 (1966) 1153; Phys. Rev. A 2 (1970) 734.
[43] T. Appelquist, S.J. Brodsky, Phys. Rev. Lett. 24 (1970) 562; Phys. Rev. A 2 (1970) 2293.
[44] B.E. Lautrup, A. Peterman, E. de Rafael, Phys. Lett. 31B (1970) 577.
[45] R. Barbieri, J.A. Mignaco, E. Remiddi, Lett. Nuovo Cimento 3 (1970) 588.
[46] A. Peterman, Phys. Lett. 34B (1971) 507; ibid. 35B (1971) 325.
[47] J.A. Fox, Phys. Rev. D 3 (1971) 3228; ibid. D 4 (1971) 3229; ibid. D 5 (1972) 492.
[48] R. Barbieri, J.A. Mignaco, E. Remiddi, Nuovo Cimento A 6 (1971) 21.
[49] E.A. Kuraev, L.N. Lipatov, N.P. Merenkov, preprint LNPI 46, June 1973.
[50] R. Karplus, N.M. Kroll, Phys. Rev. 77 (1950) 536.
[51] A. Peterman, Helv. Phys. Acta 30 (1957) 407; Nucl. Phys. 3 (1957) 689.
[52] C.M. Sommerfield, Phys. Rev. 107 (1957) 328; Ann. Phys. (NY) 5 (1958) 26.
[53] M. Baranger, F.J. Dyson, E.E. Salpeter, Phys. Rev. 88 (1952) 680.
[54] G. Kallen, A. Sabry, Kgl. Dan. Vidensk. Selsk. Mat.-Fis. Medd. 29 (17) (1955).
[55] J. Schwinger, Particles, Sources and Fields, Vol. 2, Addison-Wesley, Reading, MA, 1973.
[56] K. Melnikov, T. van Ritbergen, Phys. Rev. Lett. 84 (2000) 1673.
[57] T. Kinoshita, in: T. Kinoshita (Ed.), Quantum Electrodynamics, World Scientific, Singapore, 1990, p. 218.
[58] S. Laporta, E. Remiddi, Phys. Lett. B 379 (1996) 283.
[59] P.A. Baikov, D.J. Broadhurst, preprint OUT-4102-54, hep-ph 9504398, April 1995; in: B. Denby, D. Perret-Gallix (Eds.), Proceedings of New Computing Technique in Physics Research IV, World Scientific, Singapore, 1995.
[60] M.I. Eides, H. Grotch, Phys. Rev. A 52 (1995) 3360.
[61] S.G. Karshenboim, J. Phys. B: At. Mol. Opt. Phys. 28 (1995) L77.
[62] M.I. Eides, V.A. Shelyuto, Phys. Rev. A 52 (1995) 954.
[63] J.L. Friar, J. Martorell, D.W.L. Sprung, Phys. Rev. A 59 (1999) 4061.
[64] R. Karplus, A. Klein, J. Schwinger, Phys. Rev. 84 (1951) 597.
[65] R. Karplus, A. Klein, J. Schwinger, Phys. Rev. 86 (1952) 288.
[66] M. Baranger, Phys. Rev. 84 (1951) 866; M. Baranger, H.A. Bethe, R. Feynman, Phys. Rev. 92 (1953) 482.
[67] A.A. Abrikosov, Zh. Eksp. Teor. Fiz. 30 (1956) 96 [Sov. Phys.-JETP 3 (1956) 71].
[68] H.M. Fried, D.R. Yennie, Phys. Rev. 112 (1958) 1391.
[69] S.G. Karshenboim, V.A. Shelyuto, M.I. Eides, Yad. Fiz. 47 (1988) 454 [Sov. J. Nucl. Phys. 47 (1988) 287].
[70] M.I. Eides, H. Grotch, D.A. Owen, Phys. Lett. B 294 (1992) 115.
[71] K. Pachucki, Phys. Rev. A 48 (1993) 2609.
[72] S. Laporta, as cited in [71].
[73] M.I. Eides, H. Grotch, Phys. Lett. B 301 (1993) 127.
[74] M.I. Eides, H. Grotch, Phys. Lett. B 308 (1993) 389.
[75] M.I. Eides, H. Grotch, V.A. Shelyuto, Phys. Rev. A 55 (1997) 2447.
[76] M.I. Eides, H. Grotch, P. Pebler, Phys. Lett. B 326 (1994) 197; Phys. Rev. A 50 (1994) 144.
[77] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. B 312 (1993) 358; Yad. Phys. 57 (1994) 1309 [Phys. Atom. Nuclei 57 (1994) 1240].
[78] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Yad. Phys. 57 (1994) 2246 [Phys. Atom. Nuclei 57 (1994) 2158].
[79] K. Pachucki, Phys. Rev. Lett. 72 (1994) 3154.
[80] M.I. Eides, V.A. Shelyuto, Pis'ma Zh. Eksp. Teor. Fiz. 61 (1995) 465 [JETP Letters 61 (1995) 478].
[81] A.J. Layzer, Phys. Rev. Lett. 4 (1960) 580; J. Math. Phys. 2 (1961) 292, 308.
[82] H.M. Fried, D.R. Yennie, Phys. Rev. Lett. 4 (1960) 583.
[83] G.W. Erickson, D.R. Yennie, Ann. Phys. (NY) 35 (1965) 271.
[84] G.W. Erickson, D.R. Yennie, Ann. Phys. (NY) 35 (1965) 447.
[85] G.W. Erickson, Phys. Rev. Lett. 27 (1971) 780.
[86] P.J. Mohr, Phys. Rev. Lett. 34 (1975) 1050; Phys. Rev. A 26 (1982) 2338.
[87] J.R. Sapirstein, Phys. Rev. Lett. 47 (1981) 1723.
[88] V.G. Pal'chikov, Metrologia 10 (1987) 3 (in Russian).
[89] L. Hostler, J. Math. Phys. 5 (1964) 1235.
[90] J. Schwinger, J. Math. Phys. 5 (1964) 1606.
[91] K. Pachucki, Phys. Rev. A 46 (1992) 648.
[92] K. Pachucki, Ann. Phys. (NY) 236 (1993) 1.
[93] S.S. Schweber, QED and the Men Who Made It, Princeton University Press, Princeton, 1994.
[94] E.E. Salpeter, Phys. Rev. 87 (1952) 553.
[95] T. Fulton, P.C. Martin, Phys. Rev. 95 (1954) 811.
[96] I.B. Khriplovich, A.I. Milstein, A.S. Yelkhovsky, Phys. Scr. T 46 (1993) 252.
[97] J.A. Fox, D.R. Yennie, Ann. Phys. (NY) 81 (1973) 438.
[98] K. Pachucki, as cited in [121].
[99] U. Jentschura, K. Pachucki, Phys. Rev. A 54 (1996) 1853.
[100] U. Jentschura, G. Soff, P.J. Mohr, Phys. Rev. A 56 (1997) 1739.
[101] R. Serber, Phys. Rev. 48 (1935) 49.
[102] E.H. Wichmann, N.M. Kroll, Phys. Rev. 101 (1956) 843.
[103] V.G. Ivanov, S.G. Karshenboim, Yad. Phys. 60 (1997) 333 [Phys. Atom. Nuclei 60 (1997) 270].
[104] N.L. Manakov, A.A. Nekipelov, A.G. Fainstein, Zh. Eksp. Teor. Fiz. 95 (1989) 1167 [Sov. Phys.-JETP 68 (1989) 673].
[105] M.H. Mittleman, Phys. Rev. 107 (1957) 1170.
[106] D.E. Zwanziger, Phys. Rev. 121 (1961) 1128.
[107] P.J. Mohr, At. Data Nucl. Data Tables 29 (1983) 453.
[108] P.J. Mohr, in: I.A. Sellin, D.J. Pegg (Eds.), Beam-Foil Spectroscopy, Plenum Press, New York, Vol. 1, 1976, p. 89.
[109] L.D. Landau, E.M. Lifshitz, Quantum Mechanics, 3rd Edition, Butterworth-Heinemann, Stoneham, MA, 1997.
[110] S.G. Karshenboim, Zh. Eksp. Teor. Fiz. 103 (1993) 1105 [JETP 76 (1993) 541].
[111] S. Mallampalli, J. Sapirstein, Phys. Rev. Lett. 80 (1998) 5297.
[112] I. Goidenko, L. Labzowsky, A. Nefiodov et al., Phys. Rev. Lett. 83 (1999) 2312.
[113] V.A. Yerokhin, SPB preprint, Phys. Rev. A 62 (2000) 012508.
[114] A.V. Manohar, I.W. Stewart, Phys. Rev. Lett. 85 (2000) 2248.
[115] S.G. Karshenboim, Zh. Eksp. Teor. Fiz. 109 (1996) 752 [JETP 82 (1996) 403].
[116] S.G. Karshenboim, J. Phys. B: At. Mol. Opt. Phys. 29 (1996) L29.
[117] D.R. Yennie, S.C. Frautschi, H. Suura, Ann. Phys. (NY) 13 (1961) 379.
[118] S.G. Karshenboim, Z. Phys. D 36 (1996) 11.
[119] S.G. Karshenboim, Yad. Phys. 58 (1995) 707 [Phys. Atom. Nuclei 58 (1995) 649].
[120] S.G. Karshenboim, Zh. Eksp. Teor. Fiz. 106 (1994) 414 [JETP 79 (1994) 230].
[121] U. Jentschura, P.J. Mohr, G. Soff, Phys. Rev. Lett. 82 (1999) 53.
[122] P.J. Mohr, Y.-K. Kim, Phys. Rev. A 45 (1992) 2727.
[123] P.J. Mohr, Phys. Rev. A 46 (1992) 4421.
[124] P.J. Mohr, Phys. Rev. A 44 (1991) R4089; Errata A 51 (1995) 3390.
[125] A. van Wijngaarden, J. Kwela, G.W.F. Drake, Phys. Rev. A 43 (1991) 3325.
[126] S.G. Karshenboim, Yad. Phys. 58 (1995) 309 [Phys. Atom. Nuclei 58 (1995) 262].
[127] S.G. Karshenboim, Can. J. Phys. 76 (1998) 169; Zh. Eksp. Teor. Fiz. 116 (1999) 1575 [JETP 89 (1999) 850].
[128] T. Welton, Phys. Rev. 74 (1948) 1157.
[129] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Ann. Phys. (NY) 205 (1991) 231, 291.
[130] G.W. Erickson, J. Phys. Chem. Ref. Data 6 (1977) 831.
[131] G.W. Erickson, H. Grotch, Phys. Rev. Lett. 25 (1988) 2611; 63 (1989) 1326(E).
[132] M. Doncheski, H. Grotch, D.A. Owen, Phys. Rev. A 41 (1990) 2851.
[133] M. Doncheski, H. Grotch, G.W. Erickson, Phys. Rev. A 43 (1991) 2125.
[134] I.B. Khriplovich, A.I. Milstein, A.S. Yelkhovsky, Phys. Scr. T 46 (1993) 252.
[135] R.N. Fell, I.B. Khriplovich, A.I. Milstein, A.S. Yelkhovsky, Phys. Lett. A 181 (1993) 172.
[136] K. Pachucki, H. Grotch, Phys. Rev. A 51 (1995) 1854.
[137] M.A. Braun, Zh. Eksp. Teor. Fiz. 64 (1973) 413 [Sov. Phys.-JETP 37 (1973) 211].
[138] V.M. Shabaev, Teor. Mat. Fiz. 63 (1985) 394 [Theor. Math. Phys. 63 (1985) 588].
[139] A.S. Yelkhovsky, preprint Budker INP 94-27, hep-th/9403095 (1994).
[140] L.N. Labzowsky, Proceedings of the XVII All-Union Congress on Spectroscopy, Moscow, 1972, Part 2, p. 89.
[141] J.H. Epstein, S.T. Epstein, Am. J. Phys. 30 (1962) 266.
[142] V.M. Shabaev, J. Phys. B: At. Mol. Opt. Phys. B 24 (1991) 4479.
[143] A.S. Elkhovskii, Zh. Eksp. Teor. Fiz. 110 (1996) 431; JETP 83 (1996) 230.
[144] M.I. Eides, H. Grotch, Phys. Rev. A 55 (1997) 3351.
[145] A.S. Elkhovskii, Zh. Eksp. Teor. Fiz. 113 (1998) 865 [JETP 86 (1998) 472].
[146] V.M. Shabaev, A.N. Artemyev, T. Beier et al., Phys. Rev. A 57 (1998) 4235; V.M. Shabaev, A.N. Artemyev, T. Beier, G. Soff, J. Phys. B: At. Mol. Opt. Phys. B 31 (1998) L337.
[147] E.A. Golosov, A.S. Elkhovskii, A.I. Milshtein, I.B. Khriplovich, Zh. Eksp. Teor. Fiz. 107 (1995) 393 [JETP 80 (1995) 208].
[148] I.B. Khriplovich, A.S. Yelkhovsky, Phys. Lett. 246B (1990) 520.
[149] K. Pachucki, S.G. Karshenboim, Phys. Rev. A 60 (1999) 2792.
[150] K. Melnikov, A.S. Yelkhovsky, Phys. Lett. 458B (1999) 143.
[151] G. Bhatt, H. Grotch, Phys. Rev. Lett. 58 (1987) 471.
[152] K. Pachucki, Phys. Rev. A 52 (1995) 1079.
[153] M.I. Eides, H. Grotch, V.A. Shelyuto, work in progress.
[154] M.I. Eides, H. Grotch, Phys. Rev. A 52 (1995) 1757.
[155] A.S. Yelkhovsky, preprint Budker INP 97-80, hep-ph/9710377 (1997).
[156] L.L. Foldy, Phys. Rev. 83 (1951) 688.
[157] D.J. Drickey, L.N. Hand, Phys. Rev. Lett. 9 (1962) 521; L.N. Hand, D.J. Miller, R. Wilson, Rev. Mod. Phys. 35 (1963) 335.
[158] G.G. Simon, Ch. Schmidt, F. Borkowski, V.H. Walther, Nucl. Phys. A 333 (1980) 381.
[159] R. Rosenfelder, Phys. Lett. B 479 (2000) 381.
[160] I.B. Khriplovich, A.I. Milstein, R.A. Sen'kov, Phys. Lett. A221 (1996) 370; Zh. Eksp. Teor. Fiz. 111 (1997) 1935 [JETP 84 (1997) 1054].
[161] D.A. Owen, Found. Phys. 24 (1994) 273.
[162] K. Pachucki, S.G. Karshenboim, J. Phys. B: At. Mol. Opt. Phys. 28 (1995) L221.
[163] J.L. Friar, J. Martorell, D.W.L. Sprung, Phys. Rev. A 56 (1997) 4579.
[164] L.A. Borisoglebsky, E.E. Trofimenko, Phys. Lett. 81B (1979) 175.
[165] J.L. Friar, Ann. Phys. (NY) 122 (1979) 151.
[166] C. Zemach, Phys. Rev. 104 (1956) 1771.
[167] J.L. Friar, G.L. Payne, Phys. Rev. A 56 (1997) 5173.
[168] R.N. Faustov, A.P. Martynenko, Samara State University preprint SSU-HEP-99/04, hep-ph/9904362, April 1999, Yad. Phys. 63 (2000) 915 [Phys. Atom. Nuclei 63, May 2000].
[169] T.E.O. Ericson, J. Hufner, Nucl. Phys. B 47 (1972) 205.
[170] J. Bernabeu, C. Jarlskog, Nucl. Phys. B 60 (1973) 347.
[171] J. Bernabeu, C. Jarlskog, Nucl. Phys. B 75 (1974) 59.
[172] S.A. Startsev, V.A. Petrun'kin, A.L. Khomkin, Yad. Fiz. 23 (1976) 1233 [Sov. J. Nucl. Phys. 23 (1976) 656].
[173] J. Bernabeu, C. Jarlskog, Phys. Lett. 60B (1976) 197.
[174] J.L. Friar, Phys. Rev. C 16 (1977) 1540.
[175] R. Rosenfelder, Nucl. Phys. A 393 (1983) 301.
[176] J. Bernabeu, T.E.O. Ericson, Z. Phys. A 309 (1983) 213.
[177] I.B. Khriplovich, R.A. Sen'kov, Novosibirsk preprint, nucl-th/9704043, April 1997.
[178] B.E. MacGibbon, G. Garino, M.A. Lucas et al., Phys. Rev. C 52 (1995) 2097.
[179] I.B. Khriplovich, R.A. Sen'kov, Phys. Lett. A 249 (1998) 474.
[180] I.B. Khriplovich, R.A. Sen'kov, Phys. Lett. B 481 (2000) 447.
[181] R. Rosenfelder, Phys. Lett. B 463 (1999) 317.
[182] D. Babusci, G. Giordano, G. Matone, Phys. Rev. C 57 (1998) 291.
[183] J. Martorell, D.W. Sprung, D.C. Zheng, Phys. Rev. C 51 (1995) 1127.
[184] A.I. Milshtein, I.B. Khriplovich, S.S. Petrosyan, Zh. Eksp. Teor. Fiz. 109 (1996) 1146 [JETP 82 (1996) 616].
[185] Y. Lu, R. Rosenfelder, Phys. Lett. B 319 (1993) 7; B 333 (1994) 564(E).
[186] W. Leidemann, R. Rosenfelder, Phys. Rev. C 51 (1995) 427.
[187] J.L. Friar, G.L. Payne, Phys. Rev. C 55 (1997) 2764.
[188] J.L. Friar, G.L. Payne, Phys. Rev. C 56 (1997) 619.
[189] I. Sick, D. Trautman, Nucl. Phys. A 637 (1998) 559.
[190] E. Borie, Phys. Rev. Lett. 47 (1981) 568.
[191] G.P. Lepage, D.R. Yennie, G.W. Erickson, Phys. Rev. Lett. 47 (1981) 1640.
[192] M.I. Eides, H. Grotch, Phys. Rev. A 56 (1997) R2507.
[193] J.L. Friar, Zeit. f. Physik A 292 (1979) 1; ibid. A 303 (1981) 84.
[194] D.J. Hylton, Phys. Rev. A 32 (1985) 1303.
[195] K. Pachucki, Phys. Rev. A 48 (1993) 120.
[196] M.I. Eides, Phys. Rev. A 53 (1996) 2953.
[197] J.A. Wheeler, Rev. Mod. Phys. 21 (1949) 133.
[198] F. Kottmann et al., Proposal for an Experiment at PSI R-98-03, January, 1999.
[199] A.D. Galanin, I.Ia. Pomeranchuk, Dokl. Akad. Nauk SSSR 86 (1952) 251.
[200] L. Schiff, Quantum Mechanics, 3rd Edition, McGraw-Hill, New York, 1968.
[201] A.B. Mickelwait, H.C. Corben, Phys. Rev. 96 (1954) 1145.
[202] G.E. Pustovalov, Zh. Eksp. Teor. Fiz. 32 (1957) 1519 [Sov. Phys.-JETP 5 (1957) 1234].
[203] A. Di Giacomo, Nucl. Phys. B 11 (1969) 411.
[204] R. Glauber, W. Rarita, P. Schwed, Phys. Rev. 120 (1960) 609.
[205] J. Blomkwist, Nucl. Phys. B 48 (1972) 95.
[206] K.-N. Huang, Phys. Rev. A 14 (1976) 1311.
[207] T. Kinoshita, W.B. Lindquist, Phys. Rev. D 27 (1983) 853.
[208] T. Kinoshita, W.B. Lindquist, Phys. Rev. D 27 (1983) 867.
[209] T. Kinoshita, M. Nio, Phys. Rev. Lett. 82 (1999) 3240.
[210] B.J. Laurenzi, A. Flamberg, Int. J. Quantum Chem. 11 (1977) 869.
[211] K. Pachucki, Phys. Rev. A 53 (1996) 2092.
[212] K. Pachucki, Warsaw preprint, physics/99060002, June 1999.
[213] M.K. Sundaresan, P.J.S. Watson, Phys. Rev. Lett. 29 (1972) 15.
[214] E. Borie, G.A. Rinker, Rev. Mod. Phys. 54 (1982) 67.
[215] B. Fricke, Z. Phys. 218 (1969) 495.
[216] P. Vogel, At. Data Nucl. Data Tables 14 (1974) 599.
[217] G.A. Rinker, Phys. Rev. A 14 (1976) 18.
[218] E. Borie, G.A. Rinker, Phys. Rev. A 18 (1978) 324.
[219] M.-Y. Chen, Phys. Rev. Lett. 34 (1975) 341.
[220] L. Wilets, G.A. Rinker Jr., Phys. Rev. Lett. 34 (1975) 339.
[221] D.H. Fujimoto, Phys. Rev. Lett. 35 (1975) 341.
[222] E. Borie, Nucl. Phys. A 267 (1976) 485.
[223] J. Calmet, D.A. Owen, J. Phys. B 12 (1979) 169.
[224] R. Barbieri, M. Caffo, E. Remiddi, Lett. Nuovo Cimento 7 (1973) 60.
[225] H. Suura, E. Wichmann, Phys. Rev. 105 (1957) 1930.
[226] A. Peterman, Phys. Rev. 105 (1957) 1931.
[227] H.H. Elend, Phys. Lett. 20 (1966) 682; Errata 21 (1966) 720.
[228] G. Erickson, H.H. Liu, preprint UCD-CNL-81 (1968).
[229] E. Borie, Helv. Phys. Acta 48 (1975) 671.
[230] V.N. Folomeshkin, Yad. Fiz. 19 (1974) 1157 [Sov. J. Nucl. Phys. 19 (1974) 592].
[231] M.K. Sundaresan, P.J.S. Watson, Phys. Rev. D 11 (1975) 230.
[232] V.P. Gerdt, A. Karimkhodzhaev, R.N. Faustov, Proceedings of the International Workshop on High Energy Physics and Quantum Field Theory, 1978, p. 289.
[233] E. Borie, Z. Phys. A 302 (1981) 187.
[234] R.N. Faustov, A.P. Martynenko, Samara State University preprint SSU-HEP-99/07, hep-ph/9906315, June 1999.
[235] G. Breit, Phys. Rev. 35 (1930) 1477.
[236] N. Kroll, F. Pollock, Phys. Rev. 84 (1951) 594; ibid. 86 (1952) 876.
[237] R. Karplus, A. Klein, Phys. Rev. 85 (1952) 972.
[238] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 229B (1989) 285; Pis'ma Zh. Eksp. Teor. Fiz. 50 (1989) 3 [JETP Lett. 50 (1989) 1]; Yad. Fiz. 50 (1989) 1636 [Sov. J. Nucl. Phys. 50 (1989) 1015].
[239] E.A. Terray, D.R. Yennie, Phys. Rev. Lett. 48 (1982) 1803.
[240] J.R. Sapirstein, E.A. Terray, D.R. Yennie, Phys. Rev. D 29 (1984) 2290.
[241] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 249B (1990) 519; Pis'ma Zh. Eksp. Teor. Fiz. 52 (1990) 937 [JETP Lett. 52 (1990) 317].
[242] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 268B (1991) 433; 316B (1993) 631 (E); 319B (1993) 545 (E); Yad. Fiz. 55 (1992) 466; 57 (1994) 1343 (E) [Sov. J. Nucl. Phys. 55 (1992) 257; 57 (1994) 1275 (E)].
[243] T. Kinoshita, M. Nio, Phys. Rev. Lett. 72 (1994) 3803.
[244] A. Layzer, Nuovo Cim. 33 (1964) 1538.
[245] D. Zwanziger, Nuovo Cim. 34 (1964) 77.
[246] S.J. Brodsky, G.W. Erickson, Phys. Rev. 148 (1966) 148.
[247] J.R. Sapirstein, Phys. Rev. Lett. 51 (1983) 985.
[248] K. Pachucki, Phys. Rev. A 54 (1996) 1994.
[249] T. Kinoshita, M. Nio, Phys. Rev. D 55 (1996) 7267.
[250] S.J. Brodsky, G.W. Erickson, Phys. Rev. 148 (1966) 26.
[251] J.R. Sapirstein, unpublished, as cited in [243].
[252] S.J. Brodsky, unpublished, as cited in [249].
[253] S.M. Schneider, W. Greiner, G. Soff, Phys. Rev. A 50 (1994) 118.
[254] P. Lepage, unpublished, as cited in [243].
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 [255] [256] [257] [258] [259] [260] [261] [262] [263] [264] [265] [266] [267] [268] [269] [270] [271]
[272] [273] [274] [275] [276] [277] [278] [279] [280] [281] [282] [283] [284] [285] [286] [287] [288] [289] [290] [291] [292] [293] [294] [295] [296] [297] [298] [299]
259
H. Persson, S.M. Schneider, W. Greiner et al., Phys. Rev. Lett. 76 (1996) 1433. S.A. Blundell, K.T. Cheng, J. Sapirstein, Phys. Rev. Lett. 78 (1997) 4914. P. Sinnergen, H. Persson, S. Salomoson et al., Phys. Rev. A 58 (1998) 1055. R. Arnowitt, Phys. Rev. 92 (1953) 1002. W.A. Newcomb, E.E. Salpeter, Phys. Rev. 97 (1955) 1146. G.T. Bodwin, D.R. Yennie, M.A. Gregorio, Phys. Rev. Lett. 41 (1978) 1088. W.E. Caswell, G.P. Lepage, Phys. Rev. Lett. 41 (1978) 1092. T. Fulton, D.A. Owen, W.W. Repko, Phys. Rev. Lett. 26 (1971) 61. G.T. Bodwin, D.R. Yennie, Phys. Rep. 43C (1978) 267. M.M. Sternheim, Phys. Rev. 130 (1963) 211. W.E. Caswell, G.P. Lepage, Phys. Rev. A 18 (1978) 810. G.T. Bodwin, D.R. Yennie, M.A. Gregorio, Phys. Rev. Lett. 48 (1982) 1799. G.T. Bodwin, D.R. Yennie, M.A. Gregorio, Rev. Mod Phys. 57 (1985) 723. S.G. Karshenboim, M.I. Eides, V.A. Shelyuto, Yad. Fiz. 47 (1988) 454 [Sov. J. Nucl. Phys. 47 (1988) 287]; Yad. Fiz. 48 (1988) 769 [Sov. J. Nucl. Phys. 48 (1988) 490]. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 216B (1989) 405; Yad. Fiz. 49 (1989) 493 [Sov. J. Nucl. Phys. 49 (1989) 309]. V.Yu. Brook, M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 216B (1989) 401. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 177B (1986) 425; Yad. Fiz. 44 (1986) 1118 [Sov. J. Nucl. Phys. 44 (1986) 723]; Zh. Eksp. Teor. Fiz. 92 (1987) 1188 [Sov. Phys.-JETP 65 (1987) 664]; Yad. Fiz. 48 (1988) 1039 [Sov. J. Nucl. Phys. 48 (1988) 661]. J.R. Sapirstein, E.A. Terray, D.R. Yennie, Phys. Rev. Lett. 51 (1983) 982. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 202B (1988) 572; Zh. Eksp. Teor. Fiz. 94 (1988) 42 [Sov. Phys.-JETP 67 (1988) 671]. A. Karimkhodzhaev, R.N. Faustov, Sov. J. Nucl. Phys. 53 (1991) 1012 [Sov. J. Nucl. Phys. 53 (1991) 626]. R.N. Faustov, A. Karimkhodzhaev, A.P. Martynenko, Phys. Rev. A 59 (1999) 2498; Yad. Phys. 62 (1999) 2284 [Phys. Atom. Nuclei 62 (1999) 2103]. M.I. Eides, V.A. Shelyuto, Phys. Lett. 146B (1984) 241. 
S.G. Karshenboim, M.I. Eides, V.A. Shelyuto, Yad. Fiz. 52 (1990) 1066 [Sov. J. Nucl. Phys. 52 (1990) 679]. S.L. Adler, Phys. Rev. 177 (1969) 2426. G. Li, M.A. Samuel, M.I. Eides, Phys. Rev. A 47 (1993) 876. M.I. Eides, H. Grotch, V.A. Shelyuto, Phys. Rev. D 58 (1998) 013008. J. Barclay Adams, Phys. Rev. 139 (1965) B 1050. M.A. Beg, G. Feinberg, Phys. Rev. Lett. 33 (1974) 606; 35 (1975) 130(E). W.W. Repko, Phys. Rev. D 7 (1973) 279. H. Grotch, Phys. Rev. D 9 (1974) 311. R. Alcotra, J.A. Grifols, Ann. Phys. (NY) 229 (1993) 109. M.I. Eides, Phys. Rev. A 53 (1996) 2953. H. Hellwig, R.F.C. Vessot, M.W. Levine et al., IEEE Trans. IM-19 (1970) 200. L. Essen, R.W. Donaldson, M.J. Bangham et al., Nature 229 (1971) 110. G.T. Bodwin, D.R. Yennie, Phys. Rev. D 37 (1988) 498. S.G. Karshenboim, Phys. Lett. A 225 (1997) 97. S.D. Drell, J.D. Sullivan, Phys. Rev. 154 (1967) 1477. C.K. Iddings, P.M. Platzman, Phys. Rev. 113 (1959) 192. C.K. Iddings, Phys. Rev. 138 (1965) B 446. C.K. Iddings, P.M. Platzman, Phys. Rev. 115 (1959) 919. A. Verganalakis, D. Zwanziger, Nuovo Cimento 39 (1965) 613. F. Guerin, Nuovo Cimento A 50 (1967) 1. G.M. Zinov'ev, B.V. Struminskii, R.N. Faustov et al., Yad. Fiz. 11 (1970) 1284 [Sov. J. Nucl. Phys. 11 (1970) 715]. R.N. Faustov, A.P. Martynenko, V.A. Saleev, Yad. Phys. 62 (1999) 2280 [Phys. Atom. Nuclei 62 (1999) 2099]. E. de Rafael, Phys. Lett. 37B (1971) 201.
260 [300] [301] [302] [303] [304] [305] [306] [307] [308] [309] [310] [311] [312] [313] [314] [315] [316] [317] [318] [319] [320] [321] [322] [323] [324] [325] [326] [327] [328] [329] [330] [331]
[332] [333] [334] [335] [336] [337] [338] [339] [340] [341] [342] [343] [344] [345]
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 P. GnaK dig, J. Kuti, Phys. Lett. 42B (1972) 241. V.W. Hughes, J. Kuti, Ann. Rev. Nucl. Part. Sci. 33 (1983) 611. E.E. Tro"menko, Phys. Lett. 73A (1979) 383. J.W. Heberle, H.A. Reich, P. Kusch, Phys. Rev. 101 (1956) 612. N.E. Rothery, E.A. Hessels, Phys. Rev. A 61 (2000) 044501. J.W. Heberle, H.A. Reich, P. Kusch, Phys. Rev. 104 (1956) 1585. M.H. Prior, E.C. Wang, Phys. Rev. A 16 (1977) 6. S.R. Lundeen, P.E. Jessop, F.M. Pipkin, Phys. Rev. Lett. 34 (1975) 377. N.F. Ramsey, in: T. Kinoshita (Ed.), Quantum Electrodynamics, World Scienti"c, Singapore, 1990, p. 673. M.M. Sternheim, Phys. Rev. 138 (1965) B 430. S.V. Romanov, Z. Phys. D 28 (1993) 7. C. Schwob, L. Jozefovski, B. de Beauvoir et al., Phys. Rev. Lett. 82 (1999) 4960. V.W. Hughes, T. Kinoshita, Rev. Mod. Phys. 71 (1999) S133. D.L. Farnham, R.S. Van Dyck Jr., P.B. Schwinberg, Phys Rev. Lett. 75 (1995) 3598. G. Audi, A.H. Wapstra, Nucl. Phys. A 565 (1993) 1. Yu.L. Sokolov, V.P. Yakovlev, Zh. Eksp. Teor. Fiz. 83 (1982) 15 [Sov. Phys.-JETP 56 (1982) 7]; V.G. Palchikov, Yu.L. Sokolov, V.P. Yakovlev, Pis'ma Zh. Eksp. Teor. Fiz. 38 (1983) 347 [JETP Letters 38 (1983) 418]. E.W. Hagley, F.M. Pipkin, Phys. Rev. Lett. 72 (1994) 1172. V.G. Palchikov, Yu.L. Sokolov, V.P. Yakovlev, Phys. Scr. 55 (1997) 33. S.G. Karshenboim, Phys. Scr. 57 (1998) 213. G. Newton, D.A. Andrews, P.J. Unsworth, Philos. Trans. Roy. Soc. London 290 (1979) 373. S.R. Lundeen, F.M. Pipkin, Phys. Rev. Lett. 46 (1981) 232; Metrologia 22 (1986) 9. A. van Wijngaarden, F. Holuj, G.W.F. Drake, Can. J. Phys. 76 (1998) 95. B. de Beauvoir, F. Nez, L. Julien et al., Phys. Rev. Lett. 78 (1997) 440. M. Weitz, A. Huber, F. Schmidt-Kaler et al., Phys. Rev. A 52 (1995) 2664. D.J. Berkeland, E.A. Hinds, M.G. Boshier, Phys. Rev. Lett. 75 (1995) 2470. S. Bourzeix, B. de Beauvoir, F. Nez et al., Phys. Rev. Lett. 76 (1996) 384. V.V. Ezhela, B.V. Polishcuk, Protvino preprint IHEP 99-48, hep-ph/9912401. A. Huber, Th. 
Udem, B. Gross et al., Phys. Rev. Lett. 80 (1998) 468. S. Klarsfeld, J. Martorell, J.A. Oteo et al., Nucl. Phys. A 456 (1986) 373. T. Herrmann, R. Rosenfelder, Eur. Phys. J. A 2 (1998) 29. F. Schmidt-Kaler, D. Leibfried, M. Weitz et al., Phys. Rev. Lett. 70 (1993) 2261. A. van Wijngaarden, F. Holuj, G.W.F. Drake, Post-deadline abstract submitted to DAMOP Meeting, Storrs CT, June 14}17, 2000; G.W.F. Drake, A. van Wijngaarden, abstract submitted to ICAP Hydrogen Atom II Satellite Meeting, Tuscany, June 1}3, 2000, to be published. T. Andreae, W. Konig, R. Wynands et al., Phys. Rev. Lett. 69 (1992) 1923. F. Nez, M.D. Plimmer, S. Bourzeix et al., Phys. Rev. Lett. 69 (1992) 2326. F. Nez, M.D. Plimmer, S. Bourzeix et al., Europhys. Lett. 24 (1993) 635. M. Weitz, A. Huber, F. Schmidt-Kaler et al., Phys. Rev. Lett. 72 (1994) 328. S. Chu, A.P. Mills Jr., A.G. Yodth et al., Phys. Rev. Lett. 60 (1988) 101; K. Danzman, M.S. Fee, S. Chu, Phys. Rev. A 39 (1989) 6072. K. Jungmann, P.E.G. Baird, J.R.M. Barr et al., Z. Phys. D 21 (1991) 241. F. Maas, B. Braun, H. Geerds et al., Phys. Lett. A 187 (1994) 247. V. Meyer, S.N. Bagaev, P.E.G. Baird et al., Phys. Rev. Lett. 84 (2000) 1136. A. Bertin, G. Carboni, J. Duclos et al., Phys. Lett. B 55 (1975). G. Carboni, U. Gastaldi, G. Neri et al., Nuovo Cimento A 34 (1976) 493. G. Carboni, G. Gorini, G. Torelli et al., Nucl. Phys. A 278 (1977) 381. G. Carboni, G. Gorini, E. Iacopini et al., Phys. Lett. B 73 (1978) 229. P. Hauser, H.P. von Arb, A. Biancchetti et al., Phys. Rev. A 46 (1992) 2363. D. Bakalov et al., Proceedings of the III International Symposium on Weak and Electromagnetic Interactions in Nuclei (WEIN-92) Dubna, Russia, 1992, pp. 656}662.
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 [346] [347] [348] [349] [350] [351] [352]
R.N. Faustov, A.P. Martynenko, Samara State University preprint SSU-HEP-97/03, hep-ph/9709374. D.J. Wineland, N.F. Ramsey, Phys. Rev. A 5 (1972) 821. A. Bohr, Phys. Rev. 73 (1948) 1109. F.E. Low, Phys. Rev. 77 (1950) 361. F.E. Low, E.E. Salpeter, Phys. Rev. 83 (1951) 478. D.A. Greenberg, H.M. Foley, Phys. Rev. 120 (1960) 1684. K.P. Jungmann, `Muoniuma, preprint physics/9809020, September 1998.
261
Physics Reports 342 (2001) 263–392
Techniques of replica symmetry breaking and the storage problem of the McCulloch–Pitts neuron

G. Györgyi
Institute of Theoretical Physics, Eötvös University, 1518 Budapest, Pf. 32, Hungary
Received May 2000; editor: I. Procaccia

Contents
1. Introduction and overview
1.1. Introduction
1.2. Overview
2. Artificial neural networks and spin glasses
2.1. The McCulloch–Pitts neuron and perceptrons
2.2. Associative memory
2.3. Sherrington–Kirkpatrick model
2.4. Little–Hopfield network
2.5. Pattern storage by a single neuron
2.6. Training, error measures, and retrieval
2.7. Multi-layer perceptrons
3. Statistical mechanics of pattern storage
3.1. The model
3.2. Thermodynamics
3.3. Spherical and independently distributed synapses
3.4. Neural stabilities, errors, and overlaps
4. The Parisi solution
4.1. Finite replica symmetry breaking
4.2. Finite and continuous replica symmetry breaking
5. Correlations and thermodynamical stability
5.1. Expectation values
5.2. Variations of the Parisi term
5.3. The Hessian matrix
6. Interpretation and special properties
6.1. Physical meaning of x(q)
6.2. Diagonalization of a Parisi matrix
6.3. Symmetries of Parisi's PDE
6.4. Spherical entropic term: a solvable case of Parisi's PDE
6.5. Small field expansion
7. The neuron: spherical synapses
7.1. General results
7.2. The special error measure θ(κ − y)
8. The neuron: independently distributed synapses
8.1. Free energy and stationarity condition
8.2. Variational principle
8.3. On thermodynamical stability
9. Conclusions and outlook
Acknowledgements
Appendix A. Abbreviations
Appendix B. Derivation of the replica free energy
Appendix C. Derivation of the R-RSB free energy term
Appendix D. Derivation of the PPDE by continuation
Appendix E. Multidimensional generalization of the PPDE
Appendix F. An identity between Green functions
Appendix G. PDEs for high temperature
Appendix H. Longitudinal stability for high temperatures
References

E-mail address: [email protected] (G. Györgyi).
0370-1573/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0370-1573(00)00073-9
G. Györgyi / Physics Reports 342 (2001) 263–392
Abstract

In this article we review the framework for spontaneous replica symmetry breaking. The framework is then applied to the statistical mechanical description of the storage properties of a McCulloch–Pitts neuron, i.e., a simple perceptron. It is shown that the neuron problem gives rise to the general formula that is at the core of all problems admitting Parisi's replica symmetry breaking ansatz with a one-component order parameter. The details of Parisi's method are reviewed extensively, with regard to the wide range of systems where the method may be applied. Parisi's partial differential equation and related differential equations are discussed, and the Green function technique is introduced for the calculation of replica averages, the key to determining the averages of physical quantities. The Green function of the Fokker–Planck equation due to Sompolinsky turns out to play the role of the statistical mechanical Green function in the graph rules for replica correlators. The resulting graph rules involve only tree graphs, as appropriate for a mean-field-like model. The lowest order Ward–Takahashi identity is recovered analytically and shown to lead to the Goldstone modes in continuous replica symmetry breaking phases. The need for a replica symmetry breaking theory in the storage problem of the neuron arose from the thermodynamical instability of formerly given solutions. Variational forms for the neuron's free energy are derived in terms of the order parameter function x(q), for different prior distributions of synapses. Various phases are identified, analytically in the high temperature limit and numerically in generic cases; among them is one similar to the Parisi phase in long-range interaction spin glasses. Extensive quantities like the error per pattern change slightly with respect to the known unstable solutions, but there is a significant difference in the distribution of non-extensive quantities like the synaptic overlaps and the pattern storage stability parameter. A simulation result is also reviewed and compared with the prediction of the theory. © 2001 Elsevier Science B.V. All rights reserved.

PACS: 07.05.Mh; 61.43.-j; 75.10.Nr; 84.35.+i
Keywords: Neural networks; Pattern storage; Spin glasses; Replica symmetry breaking
1. Introduction and overview*

1.1. Introduction

In the past decade and a half, statistical physical methods have yielded a rich harvest of theoretical and practical results in the exploration of artificial neural network models. In contrast to more traditional mathematical approaches, such as combinatorics, statistical data analysis, graph theory, or mathematical learning theory, the main emphasis in statistical physics lies on interconnected model neurons, considered as a physical many-body problem, in the limit of a large number of variables. The latter property renders the problems similar to statistical mechanical systems in the thermodynamic limit, that is, when the number of particles is very large. This does not necessarily mean a large number of units in a neural network: the thermodynamic limit applies also in the case of a single neuron if the number of adjustable variables, the analog of synaptic strengths of biological neurons, is sufficiently large. A much studied type of network is constructed from the McCulloch–Pitts model neuron [1], also called a single-layer, or simple, perceptron if it is operating alone as a single unit [2]. In this paper we will examine the single model neuron's ability to store, i.e., to memorize, patterns, which is crucial for the understanding of networked systems. The paper is strictly about the artificial model neuron, and does not imply biological relevance. However, the notions neuron and synapses, the latter designating coupling strength parameters, are biologically inspired, and we will use them throughout. We shall apply the statistical mechanical framework introduced by Gardner and Derrida [3–5] in 1988–89, which gave birth to a subfield of the theory of neural networks. Since then, the McCulloch–Pitts neuron has become well understood below the storage capacity, where patterns, or examples, can be perfectly stored. The region beyond it, however, has remained the subject of continuous research [6–12].
If the number of patterns exceeds the capacity, then there is no way of storing all of them. One possible approach beyond capacity is to choose a quantity to be optimized. Examples of such a quantity are the stability of the patterns (in other words, their resistance to errors during retrieval) or the number of correctly stored patterns irrespective of their stability. Such problems can be formulated by means of a cost, or energy, function, giving rise to a statistical mechanical system. In the case of minimization of the number of incorrectly stored patterns, difficulties have arisen on every front where the problem was attacked. On the one hand, the analytical method inherited from spin glass research is no longer applicable in its simplest form, that is, the so-called replica symmetric (RS) ansatz breaks down. On the other hand, near and beyond capacity numerical algorithms begin to require excessive computational power. The physical picture behind this is the roughening of the landscape of the cost function the algorithms try to minimize. Phases of similar complexity, wherein the optimum-finding algorithm, the analog of the dynamics in the statistical mechanical system, slows down to an extent that can be considered a breakdown of ergodicity, were observed in combinatorial optimization problems and still keep eluding analysis [13–15]. Several empirically hard optimization problems [13], including minimization of error beyond storage capacity for the McCulloch–Pitts neuron [16], are known to belong to the so-called non-deterministic polynomial (NP) complete class. It would be significant if some properties of the energy, or free energy, landscape of NP-complete problems could be clarified by means of statistical physical methods. The statistical physical equivalents of a few NP-complete
systems were shown, in averaged thermal equilibrium, to exhibit spin-glass-like behavior [14]. That gives rise to the belief that there may be a general connection between NP-completeness and spin glass behavior. Thus the identification and description of such thermodynamic phases may be instructive from the algorithmic viewpoint as well. It should be emphasized that NP-complete optimization problems are of diverse origin and many of their quantitative properties show little resemblance. Accordingly, those reformulated as statistical mechanical systems exhibit different thermodynamic behavior, e.g., they have different phase diagrams in averaged equilibrium. Nevertheless, through the notion of glassy phases statistical physics may provide us with a common concept for understanding at least some ingredients of NP-completeness. It is the region beyond capacity of a single McCulloch–Pitts neuron that we claim to uncover in the present paper, within the averaged statistical mechanical description of thermal equilibrium. While the theoretical framework is in some respects different from, or rather a generalization of, the techniques applied to the Ising spin glass, we can now reinforce the so far vague expectation about the appearance of a spin glass phase and deliver quantitative results. Networks beyond saturation have long been known to have complex features; here we demonstrate that even a single neuron can exhibit extreme complexity.

The present article grew out of the work with P. Reimann, presented in a letter [17]. A more extended article, still in many respects a summary of the main results, has been accepted for publication [18]. The emphasis in the present paper is twofold. On the one hand, we give a comprehensive review of the technical details of the replica symmetry breaking theory, including the so-called continuous replica symmetry breaking. At the core is Parisi's original theory, which is here technically generalized to incorporate also the neuron problem.
Furthermore, several extensions of the theory are introduced here that are applicable also to spin glasses. In the mathematical parts an educative and self-contained line of reasoning is favored over a terse style. By that we would like to fill a gap in the literature on the theory of disordered systems, in that we present the technical details to those wishing to understand Parisi's method and possibly also wishing to apply it to other problems. On the other hand, we apply the theory to the storage problem of a single neuron. Since the first statistical mechanical approach to this question, several other neural functions have been treated by statistical mechanical methods, and some of those may be more important for applications than pattern storage. However, even storage represents a strong theoretical challenge. Besides the Little–Hopfield model of auto-associative memory [19], it can be considered as a point of entry of the statistical mechanical approach into hard problems in the field of artificial neural networks, and may open the way for further applications.

On the technical side, the paper is centered on Parisi's method, successful in solving the mean equilibrium properties of the infinite range interaction Ising spin glass, the Sherrington–Kirkpatrick model [14]. It turns out that after some generalization [17,18] of the original method [20–23], this becomes adaptable to the statistical mechanical formulation by Gardner and Derrida [4,5] of the neuron problem. The single neuron with a general cost function, i.e., error measure, was introduced by Griniasty and Gutfreund [6], who called it a potential. We show that this model gives rise to the most general term that admits Parisi's solution with one order parameter function.
By the Parisi solution we understand, for now, the hierarchical structure of the order parameter matrix that gives rise to the nonlinear partial differential equation introduced by Parisi in an auxiliary role, allowing continuous replica symmetry breaking.
We would like to point out that all systems studied by means of the Parisi ansatz with one order parameter matrix, like the multi-spin interaction Ising [24] and the Potts glass near criticality [25], contained as special cases the aforementioned general term. Therefore, our results about the Parisi solution of the neuron go well beyond the scope of neural computation. Here we call the reader's attention to the fact that Parisi's method has been applied to the study of metastable states in the Sherrington–Kirkpatrick model [26], where in fact three order parameter matrices emerged. That work indicates how the continuous replica symmetry breaking solution is to be obtained there and implicitly suggests the generalization we outline in this paper. Beyond giving a comprehensive account of Parisi's framework, we shall perform a concrete field theoretical study, including the calculation of averages, graph rules involving the Green functions for the evaluation of correlation functions, analytic derivation of a Ward–Takahashi identity, and integral expressions for the generalized susceptibilities necessary to determine thermodynamic stability of the solution. The insightful works about the second- [27] and higher-order correlations [28] of the magnetization in the continuous replica symmetry breaking phase of the Sherrington–Kirkpatrick model present concrete examples of field expectation values. These are generalized by our formulation in this article, which is new even in the context of spin glass problems. With the notable exception of the Sherrington–Kirkpatrick model and the formally analogous Little–Hopfield system, where the low temperature phase was also extensively described [27–31], most studies of long-range interaction disordered systems concerned the region near criticality. The framework we present here is naturally designed for application deep within the glassy phase.
The differences between the Sherrington–Kirkpatrick and neuron models are obvious at first sight. The former is an Ising-type system, with a multiplicative two-spin interaction. In contrast, our main focus here is a spherical model neuron, i.e., one whose microstates are characterized by the synaptic couplings, continuous and arbitrary up to an overall normalization factor. The interaction between synapses is mediated by the error measure potential of Griniasty and Gutfreund [6], a function arbitrary to a large extent. In this light one may find the close analogy between disordered spin systems and the neuron model somewhat surprising. The similarity becomes apparent, however, when the statistical mechanical system is reduced to a variational problem in terms of a single order parameter function. Such a formulation has been available for the Sherrington–Kirkpatrick model, whereas we have constructed one for the single neuron. The variational framework is brief, allows a quick derivation of the stationarity relations, accounts for thermodynamic stability in a subspace called longitudinal, and is of help in numerical computations. The differences between the Sherrington–Kirkpatrick and the neuron problems may be small in the variational free energy formula, but are still the cause of technical complications for the neuron problem. The physical reasons are that, firstly, the neuron does not possess the spin flip symmetry of the spin glass without external field and, secondly, the neuron's error measure potential is more complicated than the multiplicative spin exchange energy term. Thus a few special properties of the Sherrington–Kirkpatrick model that allowed for some analytic results and simplified numerics [27,29] are absent in the neuron. Generically similar complications may arise in other spin glass variants, so the much-studied Sherrington–Kirkpatrick model is to be considered a rather special, simple case.
It is worth mentioning briefly two important areas among the many we do not treat in this paper. First and foremost, we do not discuss here the dynamical evolution of disordered systems. Since the ground-breaking early works on the dynamics of the Sherrington–Kirkpatrick model by
Sompolinsky and Zippelius [32–35], and the path-integral formulation for Ising spins by Sommers [36], many aspects of the dynamics of disordered systems have been clarified. They proved essential for the understanding also of numerical algorithms. However, one has to recognize that even averaged, stable equilibrium properties of complex phases of disordered systems are still far from clarified. The existence of many metastable states, the signature of glassy systems, and the ensuing complex nature of dynamical evolution, often termed breakdown of ergodicity, puts in doubt even the existence of thermal equilibrium. On several model systems, however, extensive numerical simulations have demonstrated that equilibrium properties, averaged over the quenched disorder, can carry physical meaning. These properties are the subject of the present article. Secondly, from the viewpoint of mathematical rigor, the replica method raises many questions that we leave unanswered. In fact, quite a few scientists view this method with suspicion, partly because the limit of "zero number of replicated systems" may seem to violate physical intuition. However, the large number of simulations confirming replica symmetric solutions, and the fewer ones supporting replica symmetry breaking, as well as the absence to this date of numerical results outright disproving the theory, should provide ground for confidence. Theoretical physics often employs methods of seemingly shaky mathematical foundations, whose confirmation may come from comparison with real or numerical experiments. Such a confirmation may then trigger rigorous clarification.
1.2. Overview

Here we outline the subsequent sections. Section 2 introduces some fundamental concepts and gives a brief historical review of neural modeling and, to a very basic extent, of the Sherrington–Kirkpatrick model of spin glasses. In Section 3 the single McCulloch–Pitts neuron is described as a statistical mechanical system following Gardner and Derrida [4,5] and Griniasty and Gutfreund [6]. Pattern storage is interpreted as an optimization problem in the space of synaptic coupling strengths, and the ensuing thermodynamic picture is outlined. The replica free energy is derived for various prior distributions of synapses, such as the spherical constraint as well as an arbitrary distribution of independent synapses. The central role of the neural local stability parameter is highlighted; its distribution gives the average error through a simple formula. Most of this section recites known concepts, with a few new details. Sections 4–6 are devoted to the Parisi solution. We start out from the "hard" term in the replica free energy of the neuron, which can be considered as a generalization of free energy terms emerging from the classic long-range interaction, disordered spin problems. In Section 4 a comprehensive presentation of the Parisi solution is given, including the derivation of Parisi's partial differential equation. It is demonstrated that this equation incorporates all finite replica symmetry breaking ansätze, besides continuous replica symmetry breaking. Parisi's partial differential equation gives rise to a collection of related partial differential equations; they are reviewed here, and several useful Green functions are presented, most prominently the Green function for Parisi's partial differential equation. Section 5 contains new results such as analytic expressions for expectation values and correlation functions of replica variables.
The eigenvalues of the Hessian of the replica free energy are discussed, determining thermodynamic stability. The Green function of Parisi's partial differential equation turns out to be the field theoretical Green function that correlators are composed of,
and allows the introduction of a graph technique. Section 6 discusses a few aspects of the Parisi solution and two particular cases where Parisi's partial differential equation can be explicitly solved. We return to the special problem of the model neuron in Sections 7 and 8, and apply the rather abstract results of the preceding sections to it. The case of continuous synapses with the spherical constraint, including the conditions of stationarity and thermodynamic stability, is analyzed in detail in Section 7. In the limit of high temperature and large number of patterns the formalism becomes easily manageable, while exhibiting a nontrivial phase diagram with three different glassy states. This section contains our variational approach, the main result being a variational free energy from which thermodynamic properties can be straightforwardly derived and numerically explored. By means of the various partial differential equations several relations about the stationary state are uncovered. The scaling required by the low temperature phase is described. The variational free energy is numerically evaluated for several characteristic parameter settings, together with the order parameter function and the probability density of local stabilities. Previous simulation data [37] were improved upon in Ref. [18], from which we redisplay the comparison of simulation results with the theoretical prediction. The case of arbitrarily distributed independent synapses is considered and the corresponding variational framework presented in Section 8. Frequently used abbreviations are listed in Appendix A. Further appendices contain more technical details. Appendix B gives the derivation of the replica free energy for synapses with spherical as well as with independent but otherwise arbitrary normalization. Appendix C shows how the starting formula of Section 4, the "hard" free energy term, emerges.
In Appendix D a short derivation of Parisi's partial differential equation is given, which requires the continuity of the order parameter function. Note that this equation is valid even in the case of discontinuities, but then the derivation, as shown in Section 4.1.2, is more involved. We do not pursue in the paper the case of vector order parameters, but give a brief account of how Parisi's partial differential equation for a vector field emerges in Appendix E. A technically useful identity between Green functions is derived in Appendix F, and the high-temperature limits of some relevant partial differential equations are presented in Appendix G. The only case where we can show longitudinal stability far from criticality is analyzed in Appendix H. As also stated in the Acknowledgment, sections with special contribution from P. Reimann are marked by an asterisk.
2. Artificial neural networks and spin glasses*

The purpose of this section is to put the often technical analysis of later parts of the paper in the wider context of neural networks and spin glasses. The central issues of this work are the intricate details of Parisi's continuous replica symmetry breaking (CRSB) scheme and the adaptation of the method to the equilibrium storage properties of a McCulloch–Pitts neuron or simple perceptron. We have made an attempt to cover the most relevant literature on these two narrower themes. On the other hand, we also mention other subjects like learning algorithms, generalization and unsupervised learning, layered perceptrons, and spin glass models, where our selection of references is far from complete, and not necessarily even representative.
G. Györgyi / Physics Reports 342 (2001) 263–392
2.1. The McCulloch–Pitts neuron and perceptrons

The model of a neuron put forth by McCulloch and Pitts in a ground-breaking paper [1] in 1943 has attracted continuous interest ever since [19]. While inspired by real neurons in the brain, it is oversimplified from the biological viewpoint. The model neuron can assume two states, one "firing", the other "quiescent". The state depends on input signals, obtained possibly from other such units, and on the coupling parameters that weight the inputs. The couplings are often termed "synaptic" in reference to the synapses, the connection points of biological neurons. Mathematically speaking, the model neuron computes the projection of an $N$-dimensional input $\mathbf{S}$ along a vector $\mathbf{J}$ of synaptic couplings and outputs $\xi = 1$ (say it fires) or $\xi = -1$ (it is quiescent) according to the sign of the product $\mathbf{J} \cdot \mathbf{S}$ as

$$\xi = \mathrm{sign}\left( \sum_{k=1}^{N} J_k S_k \right). \tag{2.1}$$
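As a minimal sketch of Eq. (2.1), the following Python snippet evaluates the output of a single model neuron. The function name `mcculloch_pitts_output` and the example vectors are hypothetical, and mapping a vanishing argument to +1 is an assumption made here, since Eq. (2.1) leaves the value of the sign function at zero unspecified.

```python
import numpy as np

def mcculloch_pitts_output(J, S):
    """Return +1 (firing) or -1 (quiescent) from the sign of J . S, cf. Eq. (2.1)."""
    field = float(np.dot(J, S))
    # Assumption: a vanishing field is mapped to +1.
    return 1 if field >= 0 else -1

# Illustrative couplings and input (not taken from the text).
J = np.array([0.5, -1.0, 2.0])
S = np.array([1, 1, -1])
print(mcculloch_pitts_output(J, S))  # field = 0.5 - 1.0 - 2.0 = -2.5
```

With these illustrative numbers the projection is negative, so the unit is quiescent.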
The argument of the sign can be extended by a constant threshold, which alternatively may be represented by one of the couplings $J_k$ if the corresponding input is fixed at $S_k = 1$. Remarkably, as McCulloch and Pitts already noticed, a sufficiently large collection of such model neurons, when properly connected and with the couplings properly set, can represent an arbitrary Boolean function. The model can be naturally extended to continuous outputs, when the sign function is replaced by a continuous transfer function, generally of sigmoid shape [19].

The next major step forward was achieved with the introduction of the perceptron concept by Rosenblatt [2]. The idea is to place a number of McCulloch–Pitts neurons into different layers, with the output of neurons in one layer serving as input for those in the next, hence the name multi-layer feedforward network. As it was intended to model vision, such a network is also called a multi-layer perceptron. The input to the network as a whole goes into the first layer, while the final output is that of the last layer. A widely applied learning concept is to try to determine appropriate synaptic couplings $\mathbf{J}$ for all the neurons so as to satisfy a prescribed set of input–output data, called training examples. In other words, the aim is to store the training examples. One of the motivations for doing so is that any systematics underlying the training examples may be approximately reproduced also on previously unseen inputs, that is, the network will be able to generalize.

The special case of a single McCulloch–Pitts unit is a single-layer perceptron, also called simple perceptron by Rosenblatt, and lately sometimes just perceptron. For the simple perceptron with binary outputs, as defined in Eq. (2.1), he proposed an explicit learning algorithm that provably converges towards a vector of synaptic couplings $\mathbf{J}$ which correctly stores the training examples, provided such a $\mathbf{J}$ exists. Simultaneously with Widrow and Hoff
[38,39], he also studied two-layer perceptrons with an adaptive second layer, using the first layer as a preprocessor with fixed (non-adaptive) synaptic couplings, without, however, being able to generalize his learning algorithm to this case. The field was driven into a crisis by the observation of Minsky and Papert [40] that the simple perceptron (2.1) is unable to realize certain elementary logical tasks. Confidence returned when the so-called error back-propagation learning algorithm began to gain wide acceptance (see [41] and further references in [19]). This algorithm trains, by examples, fully adaptive multi-layer feedforward networks with generically differentiable transfer functions. Such networks,
if chosen sufficiently large, are known to be capable of realizing arbitrary smooth input–output relations, see, e.g., [42,43]. Though this algorithm and its various descendants often converge quite slowly, and in principle one cannot exclude that they get stuck before reaching a desired state, they have been successful in a great variety of practical applications [19].

2.2. Associative memory

Besides the layered feedforward perceptron architectures, a second eminent problem in neural computation is the so-called associative memory network or attractor network. We limit our discussion to the auto-associative case, i.e., the memory network is addressable by its own content. The concept can be traced back to Refs. [44,45] and was rediscovered later (see [19] for further references). The recurrent (in contrast to feedforward) network of McCulloch–Pitts model neurons, originally suggested in [46–48], was especially suited for the task. A recurrent network contains interconnected units where signals pass through directed links that can form loops. Here the desired patterns to be stored correspond to collective states of the units in the network, and the idea is to define a discrete-time dynamics of the states so that the prescribed patterns (examples) are fixed point attractors. For a collection of $N$ model neurons (2.1), the outputs $\xi_k$, $k = 1, \ldots, N$, at a given time step $t$ are taken as new inputs $S_k(t)$ for the next time step. Denoting by $J_{ik}$ the synaptic coupling by which the $i$th neuron weights the signal $S_k(t)$ stemming from the $k$th neuron, we can write the discrete-time dynamics of neurons with binary output as
$$S_i(t+1) = \mathrm{sign}\left( \sum_{k=1}^{N} J_{ik} S_k(t) \right), \tag{2.2}$$
where self-interactions are usually excluded by setting $J_{ii} = 0$. Taking an input pattern $\mathbf{S} = \mathbf{S}(0)$ as initial condition, we understand in the definition (2.2) that the update is done sequentially, either by scanning through the $S_i$'s one after the other, $i = 1, 2, \ldots, N, 1, 2, \ldots$, or by randomly selecting the sites $i$ one after the other. Such a dynamics is supposed to evolve towards the closest attractor (hence the name attractor network). If this attractor is a fixed point, which is the case if the couplings are symmetric, $J_{ik} = J_{ki}$, a previously unseen pattern $\mathbf{S}$ can be associated with one of the stored examples, assumed to be the "most similar" of all stored patterns (hence the name associative memory network). Note that for such an associative memory network the patterns $\mathbf{S}$ have binary components $S_k = \pm 1$. In the case of units with continuous states one only requires that the length $|\mathbf{S}|$ goes like $\sqrt{N}$ for $N \to \infty$. We mention that if the synaptic couplings are non-symmetric, convergence to a fixed point is no longer certain and chaos can arise [49,50].

Given the patterns to be stored, the aim is to construct a dynamics with prescribed attractors. This is the reverse of, and possibly more difficult than, the more conventional problem of finding the attractors for a given dynamical system. If we accept a neural dynamics like that in Eq. (2.2), the task is then to set the couplings $J_{ik}$ to values that lead to the desired attractors. In his pioneering works [51,52] Little suggested an approach to this problem by giving an explicit form for the synaptic couplings $J_{ik}$ of the McCulloch–Pitts neurons, inspired by the ideas of Hebb [53] about the working of brain cells. Little defined a parallel update rule for (2.2) and included a stochastic element characterized by temperature. Hopfield's milestone contribution [54,55] consisted in reformulating the dynamics (2.2) as a sequential update algorithm, which led to an optimization
problem with an energy function. We will call the network with the dynamics (2.2) associative memory, while in the special case when the synaptic couplings $J_{ik}$ are chosen according to the Hebb rule, the name Little–Hopfield model will be used. For a neuro-physiological argument for a non-Hebbian learning rule we refer to [56]. While the associative memory network (2.2) may be appealing because it models, albeit very crudely in its details, a biological concept, its use for practical purposes is in doubt [4]. Indeed, the required storage space for the synaptic couplings is comparable to that for directly storing the patterns, and the computational effort of the retrieval dynamics (2.2) is similar to a direct comparison of a given input pattern with all the stored patterns. Only with appropriate modifications of the original setup, e.g., non-uniformly distributed patterns, may a digital implementation become advantageous [4]. For various such modifications and their possible practical use we refer to [19].
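The link between the sequential dynamics (2.2) and an energy function can be illustrated numerically. The sketch below is not taken from the references. It assumes symmetric Gaussian couplings with zero diagonal and the energy $E(\mathbf{S}) = -\frac{1}{2}\sum_{i \neq k} J_{ik} S_i S_k$, and it maps a vanishing field to +1. All function names are hypothetical. Under these assumptions each single-site update can only lower the energy or leave it unchanged, so the sweeps end in a fixed point.

```python
import numpy as np

def energy(J, S):
    # E(S) = -1/2 * sum_{i != k} J_ik S_i S_k (the diagonal of J is zero)
    return -0.5 * S @ J @ S

def sequential_sweep(J, S):
    """One ordered sweep of Eq. (2.2) over all sites; returns the number of flips."""
    flips = 0
    for i in range(len(S)):
        field = J[i] @ S
        new_state = 1 if field >= 0 else -1  # assumed tie-breaking convention
        if new_state != S[i]:
            S[i] = new_state
            flips += 1
    return flips

def run_to_fixed_point(J, S0, max_sweeps=1000):
    """Iterate sweeps until no spin changes; returns final state and energy trace."""
    S = S0.copy()
    energies = [energy(J, S)]
    for _ in range(max_sweeps):
        flips = sequential_sweep(J, S)
        energies.append(energy(J, S))
        if flips == 0:
            break
    return S, energies

rng = np.random.default_rng(0)
N = 50
A = rng.normal(size=(N, N)) / np.sqrt(N)
J = np.triu(A, 1)
J = J + J.T                      # symmetric couplings, J_ii = 0
S0 = rng.choice([-1, 1], size=N)
S_final, energies = run_to_fixed_point(J, S0)
print(len(energies) - 1, energies[0], energies[-1])
```

For non-symmetric couplings the energy argument no longer applies, in line with the remark that convergence to a fixed point is then not guaranteed.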
2.3. Sherrington–Kirkpatrick model

Spin glasses are normal metals (e.g. Cu or Au) with dilute magnetic impurities (e.g. Mn or Fe), or lattices of random mixtures of magnetic ions (e.g. $\mathrm{Eu}_x\mathrm{Sr}_{1-x}\mathrm{O}$), exhibiting a freezing transition of the spin disorder at low temperatures [57]. Due to spatial disorder, the spin interactions can be considered as random. The random sign of the interactions can be the cause of one of the basic features of spin glasses, the effect of frustration [58], when the interaction energies of all spin pairs cannot be minimized simultaneously. In a pioneering paper, Edwards and Anderson [59] introduced a simplified model of a spin glass, essentially an Ising system with randomly selected, but fixed, exchange couplings. The infinite-range interaction version of that model is called the Sherrington–Kirkpatrick (SK) model [60,61] and is considered a realization of the mean field approximation. The theoretical analysis of the SK model triggered the invention of novel statistical mechanical concepts and methods which subsequently found applications in modified spin glass models such as the random energy [62] and p-spin interaction models [63,24,64], the Heisenberg [65] and the Potts glass [66,67,25], and multi-p-spin and quantum spin glass models [68–70]. Methods inherited from spin glass theory also provided insight into many other problems, several of them originating from outside of physics. Prominent examples are various models of interfaces in random environments [71–75], granular media [76], combinatorial optimization (see [14] for an early review and [15] for a new development), game theory [77], protein and nucleic acid folding [78–82], and noise reduction in signal processing [83]. Last but not least, as we will expound in the present paper, methods first introduced for describing the equilibrium properties of the SK model are of paramount importance in the statistical mechanical approach to neural networks.
We give, therefore, a brief account of the SK model, concentrating on basic properties in thermal equilibrium. The general mathematical framework described in the main part of this paper covers the SK model as a special case. For pedagogical introductions to the calculation techniques we also refer to [74; 19, Chapter 10; 84, Chapter 3]. A detailed discussion of the physical content of the solution can be found, e.g., in [61,57,14,84]. We only mention here that the question whether the solution of the SK model provides a qualitatively appropriate description of short-range interaction spin glasses is still debated. See Refs. [85–90] for two exchanges on the subject, and [91] for a review and simulation results.
The state variables of the SK model are the Ising spins $S_i = \pm 1$, interacting via random coupling strengths $J_{ik}$ ($i, k = 1, 2, \ldots, N$). In the absence of an external magnetic field, the spin Hamiltonian is of the form

$$H_J(\mathbf{S}) = -\frac{1}{2} \sum_{i \neq k} J_{ik} S_i S_k \tag{2.3}$$

and the couplings $J_{ik}$ are independently sampled from an unbiased Gaussian distribution with variance $1/N$. The scaling by $N$ guarantees the extensivity of the energy in the thermodynamic limit $N \to \infty$. The feature that the interactions $J_{ik}$ are randomly chosen but then frozen, while the spins obey Boltzmannian thermodynamics, is summarized by our calling the $J$'s quenched variables. An important goal is then to calculate, in this limit, the free energy per spin

$$f_{\mathrm{SK}} = -\lim_{N \to \infty} (N\beta)^{-1} \ln Z_J . \tag{2.4}$$

Here $\beta = 1/(k_B T)$ is the inverse thermal energy unit and $Z_J$ is the partition sum $\sum_{\mathbf{S}} e^{-\beta H_J(\mathbf{S})}$ over all spin configurations $\mathbf{S}$. The sum over the discrete spin states $\mathbf{S}$ is often denoted by a trace as $\mathrm{Tr}_{\mathbf{S}}$. The interactions $J_{ik}$ being quenched random variables, the expression (2.4) as it stands is analytically intractable. Physically, one expects that two different realizations of the random interactions $J_{ik}$ will exhibit the same behavior at thermal equilibrium for $N \to \infty$. Mathematically, this means self-averaging of the free energy density $f_{\mathrm{SK}}$, i.e., for any randomly sampled set of the $J_{ik}$, Eq. (2.4) yields the same result with probability 1, allowing us to rewrite $\ln Z_J$ as an average $\langle \ln Z_J \rangle$ over the quenched disorder $J$. Rigorous mathematical discussions of this property for the SK and the related Little–Hopfield model can be found in Refs. [92–94].

The direct evaluation of $\langle \ln Z_J \rangle$ is difficult, but it can be considerably simplified by means of the replica method. This was independently discovered several times (see discussions in Refs. [61,95]) but has been well known only since its application to the spin glass problem by Edwards and Anderson [59]. The first step of this method consists in what has become known as the "replica trick",

$$\lim_{n \to 0} \frac{x^n - 1}{n} = \ln x . \tag{2.5}$$

Thus Eq. (2.4) can be rewritten as

$$f_{\mathrm{SK}} = \lim_{N \to \infty} \lim_{n \to 0} \frac{1 - \langle Z_J^n \rangle}{\beta N n} . \tag{2.6}$$

The name replica refers to the fact that the $n$th power of $Z_J$ is the partition function of $n$ non-interacting, identical replicas of the original system. The average $\langle \cdots \rangle$ will create interactions between the replicated systems. As a second step we interchange the two limits in (2.6), which has been proved valid for the SK model by van Hemmen and Palmer [95]. A further step consists in the assumption that it is sufficient to evaluate $\langle Z_J^n \rangle$ for integer $n$ and then interpret $n$ as a real variable ("continuation") in order to evaluate the limit $n \to 0$. The point is that the averaged partition sum $\langle Z_J^n \rangle$ with integer $n$ can be technically tackled, while with non-integer $n$ it is as intractable as the $\langle \ln Z_J \rangle$ of Eq. (2.4). The fourth step is the evaluation of $\langle Z_J^n \rangle$
by means of a saddle point approximation, becoming exact as $N \to \infty$. The detailed calculations along this program are given in [60,61] with the result

$$f_{\mathrm{SK}} = \lim_{n \to 0} \frac{1}{n} \min_{Q} f_{\mathrm{SK}}(Q) , \tag{2.7}$$

$$f_{\mathrm{SK}}(Q) = -\frac{n\beta}{4} + \frac{\beta}{4} \sum_{a \neq b} q_{ab}^2 - \beta^{-1} \ln Z_{\beta Q} . \tag{2.8}$$

Here the minimization, stemming from the saddle point approximation, runs over all symmetric $n \times n$ matrices $Q$ with elements $q_{aa} = 1$ and $-1 \le q_{ab} \le 1$ ($a, b = 1, 2, \ldots, n$ being the replica indices). Furthermore, $Z_{\beta Q}$ is formally identical to $Z_J$ if one sets $N = n$ and $J = \beta Q$, a specialty of the SK model. The function $f_{\mathrm{SK}}(Q)$ is often referred to as the replica free energy.

The practical meaning of (2.7) can be understood as follows. A direct analytical evaluation of the minimum in (2.7) for arbitrary integer $n$ is typically not feasible. Therefore, one introduces an ansatz for $Q$ with a set of variational parameters $\boldsymbol{\lambda}$ that lead to formulas explicitly containing $n$, so that continuation of formulas containing the elements of $Q$ to real $n$-values becomes feasible. Then (2.7) is to be understood first as a minimum condition for general $n$: the vanishing of the derivatives of the replica free energy $f_{\mathrm{SK}}(Q)$ with respect to the matrix elements $q_{ab}$, the so-called stationarity condition, together with the requirement of at least the absence of negative eigenvalues of the second derivative matrix, the Hessian, of $f_{\mathrm{SK}}(Q)$, i.e., the condition of local thermodynamic stability. (Here we disregarded the border case when the minimum does not satisfy stationarity, and the situation when there may be several locally stable states. Interestingly, in the SK model these cases do not occur, but they do in other systems.) These relations cannot be continued to $n = 0$ without further parametrization. But insertion of the ansatz with the variational parameters $\boldsymbol{\lambda}$ allows for the limit $n \to 0$. In this light (2.7) does not prescribe a customary minimization, but rather defines the minimum condition consisting of the aforementioned stationarity and stability relations, which await parametrization.

On the other hand, we can reverse the order of parametrizing and minimum search. The parametrization should allow us to construct $f_{\mathrm{SK}}(Q(\boldsymbol{\lambda}))$ for any $n$. The minimization condition for integer $n$ with respect to the variational parameters $\boldsymbol{\lambda}$ implies, in the generic case, the vanishing of the derivatives, and is supposed to admit continuation to real $n$-values. Closer inspection shows [14,61] that after such a continuation, in the limit $n \to 0$, the condition of local stability described above will no longer correspond to a local minimum of $f_{\mathrm{SK}}(Q(\boldsymbol{\lambda}))$ but rather to a local maximum. This can be crudely understood when one realizes that the second term on the r.h.s. of (2.8) contains $\binom{n}{2}$ independent terms, equal to the number of order parameters. The binomial coefficient $\binom{n}{2}$ changes sign when $n$ passes from $n > 1$ to $n < 1$, so for $n < 1$ one has formally a negative number of order parameters $q_{ab}$. This does not, however, cause confusion, because due to the parametrization of the matrix $Q$ we do not need to work with the elements $q_{ab}$ for $n < 1$. A similar sign change of terms obtained by expanding the third term in (2.8) changes the nature of the extremum of the free energy from minimum to maximum. The above reasoning thus leads, within a given parametrization, to

$$f_{\mathrm{SK}} = \max_{\boldsymbol{\lambda}} \lim_{n \to 0} \frac{1}{n} f_{\mathrm{SK}}(Q(\boldsymbol{\lambda})) . \tag{2.9}$$
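The sign change invoked above can be made concrete with a short worked example. As an illustrative assumption, take all off-diagonal elements equal, $q_{ab} = q$ for $a \neq b$; this particular choice is made here only to keep the algebra short. The second term of (2.8), divided by $n$ as in (2.7) and (2.9), then becomes:

```latex
\begin{align*}
\sum_{a \neq b} q_{ab}^2 &= n(n-1)\, q^2 = 2 \binom{n}{2} q^2 , \\
\frac{1}{n} \, \frac{\beta}{4} \sum_{a \neq b} q_{ab}^2
  &= \frac{\beta}{4} \, (n-1) \, q^2
  \;\longrightarrow\; -\frac{\beta}{4} \, q^2 \qquad (n \to 0) .
\end{align*}
```

The coefficient of $q^2$ is positive for $n > 1$ and negative for $n < 1$, which is the reversal that turns the minimization in (2.7) into a maximization over the variational parameters.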
This formula prescribes global maximization in $\boldsymbol{\lambda}$. So if several local maxima are found, the $f_{\mathrm{SK}}$ values there should be compared, and the global maximum within the given parametrization is thus well defined. However, we are in principle still not allowed to bypass the aforementioned local stability analysis, because a global maximum as in (2.9), within a given parametrization, may still be unstable with respect to changes in the matrix elements $q_{ab}$. Thus one should evaluate the spectrum of the Hessian matrix of $f_{\mathrm{SK}}(Q)$ and require that no negative eigenvalues exist in the limit $n \to 0$.

This leads to two prescriptions that are contradictory at first sight, namely, the minimization in (2.7), formulated as the absence of negative eigenvalues of the Hessian of $f_{\mathrm{SK}}(Q)$, and the maximization of the parametrized free energy in (2.9). Closer inspection shows, however, that there is no logical contradiction. Indeed, maximization in the restricted space of the variational parameters generically requires the negative semidefiniteness of another Hessian, the one for $f_{\mathrm{SK}}(Q(\boldsymbol{\lambda}))$ in the limit $n \to 0$. In special cases one can show that some eigenvalues of the Hessian of $f_{\mathrm{SK}}(Q)$ correspond to eigenvalues of the Hessian of $f_{\mathrm{SK}}(Q(\boldsymbol{\lambda}))$ in that limit, such that non-negativity for the former implies non-positivity for the latter [61,96–98]. Following the reasoning in Chapter 3.3 of Ref. [84], this can be intuitively understood in the way that the infinitesimal increment around an extremum of $f_{\mathrm{SK}}(Q)$ is the sum of contributions negative in number for $n < 1$, responsible for the reversal of the type of extremum. For a more recent discussion of the problem of maximization in a descendant of the SK model see Ref. [99].

Aiming at an exact solution of the original minimization problem (2.7), one should choose a variational ansatz so that it includes the global solution. In principle, a parametrization should be adopted so that it gives a maximal $f_{\mathrm{SK}}$ value over all possible parametrizations.
Verification of the global nature of a maximum found within a given parametrization is a hard problem; physical intuition for the right parametrization and comparison with reliable simulation data, if such exist, may be of guidance.

Considering that the replicated partition sum in (2.6) is symmetric under permutation of the replicas, a first guess is that the minimizing $Q$-matrix in (2.7), characterizing the state of the system at equilibrium, also exhibits this symmetry. This leads us to the replica symmetric (RS) ansatz with a single variational parameter $\lambda = q = q_{ab} \in [-1, 1]$ for all $a \neq b$, named the Edwards–Anderson order parameter. The explicit evaluation of (2.9) with such an ansatz and clarification of the physical content of the resulting RS solution has been performed in Refs. [60,61]. (For the sake of brevity we do not discuss the inclusion of an external magnetic field and that of a non-zero average of the couplings $J_{ij}$; some main concepts can be presented without them.) The local stability conditions for the RS solution have been worked out by de Almeida and Thouless (AT) in [96]. It turns out that the AT stability condition is fulfilled only for temperatures above a critical $k_B T_c = 1$; below that the RS solution is AT-unstable, implying that the replica symmetry of the system in (2.6) must be spontaneously broken by the equilibrium state of the system. Intriguingly, this instability does not announce itself at any integer $n$; it only appears as $n$ decreases from 1 towards 0 [96,95,100]. Further evidence that the RS solution is incorrect is provided by the negative ground state entropy [60] and magnetic susceptibility [101], and by its predictions for the ground state energy and the probability density of the local magnetic field, which contradict simulations, see [14].

In order to find a consistent description of the SK model at low temperatures, several replica symmetry breaking (RSB) parametrizations for the $Q$ matrix in (2.7) have been proposed [102–107].
In what can be viewed as the generalization of Blandin's one-step RSB (1-RSB) [102], Parisi
formulated on physical grounds a hierarchical structure for the $Q$ matrix (see also Section 3.3 in [14]) and introduced the so far only RSB ansatz compatible with these conditions in an ingenious series of works [108,109,20–23,110]. Depending on the number $2R + 1$ of variational parameters $\boldsymbol{\lambda}$ in this ansatz, one speaks of an $R$-step, $R = 0, 1, 2, \ldots$, RSB ansatz ($R$-RSB), and of continuous RSB (CRSB) in the limit $R \to \infty$. The RS ansatz corresponds to $R = 0$, and each higher step contains the previous ones as special cases. Later in the paper the explicit form of Parisi's ansatz for $Q$ will be given and its consequences thoroughly discussed. Following Parisi's study, mostly focusing on the region near criticality, the deep spin glass phase was also extensively analyzed within the CRSB ansatz, see Refs. [14,84]. We highlight among the non-perturbative approaches the work of Sommers and Dupont [29], where a variational free energy especially suited for numerical evaluation was constructed and used to resolve ground state properties. One of their successes was a theoretical prediction for the probability density of the local field that compared favorably to the simulation of Palmer and Pond (see Fig. 3.6 of [14]).

The generalization of the AT stability conditions to the case of an $R$-RSB solution has been developed in a series of works by De Dominicis, Kondor, and Temesvári, initiated with Ref. [98] and presented in the most general form in Ref. [111]. Due to the complicated form of these stability conditions, they could so far be verified for Parisi's solution only slightly below the AT instability. Yet it is widely believed that Parisi's solution captures the correct behavior of the SK model in the entire low-temperature regime. The global stability of Parisi's RSB ansatz has not been verified by rigorous mathematics. It is physically supported in part by the suggestive picture of hierarchical organization of states in the glassy phase.
Furthermore, it shows none of the aforesaid inconsistencies the RS solution was plagued by, and it compared satisfactorily with simulations. In fact, we do not know of any instance where the replica method with Parisi's ansatz has been applied and where, at the same time, well-founded analytical or numerical approaches are available and yield incompatible results. Neither is a case known to us which admits application of the replica method but cannot be handled in a self-consistent way by Parisi's ansatz with sufficiently many, possibly infinitely many, steps of RSB.

As an alternative to the replica method, Thouless, Anderson and Palmer [112] have established a modified form of the Bethe–Peierls method, reproducing the RS results at high temperatures, while differing from both the RS and Parisi's solution in the AT-unstable region. This approach has been further developed by Sommers [113,114] in a way that was later realized [106,115] to be equivalent, in a certain limit, to a generalized version of the RSB ansatz by Blandin and coworkers [102,105]. A second alternative method is the dynamical approach of Sompolinsky and Zippelius [32,33,116], capturing Parisi's solution in the static case [34,117]. The latter may in turn be reproduced by an iterative extension of the Blandin–Sommers scheme [106], the first step towards the correct Parisi solution. A further modified form of the Bethe–Peierls approach, the so-called cavity approach by Mézard et al. [14,118], contains the Thouless, Anderson, and Palmer equations as a special case but can also be extended to become equivalent to a Parisi ansatz with an arbitrary number of RSB steps. Again, this is not a mathematically rigorous method but rather an ansatz in combination with an intuitive physical line of reasoning, verified by self-consistency in the end.
While the physical picture is less elusive than that behind the formal $n \to 0$ limit, the equivalent replica method in conjunction with Parisi's ansatz seems to be in a more highly developed state as far as applicability for practical calculations is concerned. For instance, the self-consistency condition of the cavity approach, expected to be equivalent to the thermodynamic stability conditions of the
replica method [14], has so far been explicitly worked out only in the simplest case, corresponding to the AT stability condition for the RS state. Another formulation of the dynamics was given by Sommers [36], who devised a path-integral approach specially suited for discrete variables like Ising spins. His results are in accordance with those of Sompolinsky and Zippelius, who used a continuous spin model that, in a singular limit, also covered the case of Ising spins. A recently suggested alternative method [119] studies the $n$-dependence of $\langle Z^n \rangle$ and reiterates that different continuations to $n \to 0$ give the RS and RSB solutions, without the need of explicitly inserting Parisi's ansatz. However, because of the heuristics involved, the exact solution may be obtained only in special cases.

There is a large family of spin glass models consisting of various generalizations of the SK model that have been successfully treated by Parisi's ansatz, albeit mostly near criticality in a perturbative manner [84]. A prominent exception is Nieuwenhuizen's multi-p-spin interaction model with continuous, spherical spins [68]. The fixed $p = 2$ case is the long known spherical SK model, which can be solved within RS [84]; with multi-p-spin interactions, however, it can exhibit RSB. Remarkably, in CRSB phases the continuously increasing part of Parisi's order parameter function can be analytically calculated for any temperature. Even with a fixed $p > 2$, one can also have phases where the 1-RSB solution is exact, a situation discussed for the neuron with Ising couplings in Section 2.5.2. The multi-p-spin model has also become a test bed for equilibrium thermodynamic calculations meant to capture asymptotic states of dynamics not maximizing the free energy [99].

It is well known that for a ferromagnet, the symmetry of the system as a whole, i.e., of the Hamiltonian, is spontaneously broken by the state of the system at thermal equilibrium, accompanied by a spontaneous breaking of ergodicity [120].
Such a state can be reached by decreasing the temperature, when the system undergoes a transition from a paramagnetic phase exhibiting symmetry and ergodicity to a ferromagnet with only axial symmetry and restricted ergodicity. In the SK model described by the replica free energy (2.7), as temperature decreases, an analogous phase transition from a paramagnetic into a spin glass phase takes place [61,121,122,57], with a concomitant spontaneous breaking of ergodicity and of RS. The transition can be monitored by Parisi's variational parameters $\boldsymbol{\lambda}$ at stationarity, which thus play the role of order parameters [20,110,123]. The emerging intuitive picture of RSB is that of a very complicated, rugged free-energy landscape in some coarse-grained state space, with a large number of local minima, many of them nearly degenerate, as well as a number of global minima, separated by free-energy barriers whose height diverges in the thermodynamic limit. What is a thermal equilibrium state in ordered systems corresponds here to a global minimum, also termed ergodic component, pure thermodynamical state, or metastate. Within the Parisi solution pure states are organized according to a hierarchical, so-called ultrametric topology [30,28,124]. The ultrametric decomposition of the state space into pure states, from the practical viewpoint, helps in the calculation of non-self-averaging quantities [27,28], and is also a basic ingredient of the cavity approach in [14,118]. However, so far it has withstood rigorous mathematical treatment, and as to real spin glasses, it is the subject of ongoing controversy [91,125].

We would like to add here that, in the context of neural networks, examples are known [126–128] where there are multiple ground states, and they are grouped into disconnected regions, i.e., ergodicity is broken, while the replica method implies that RS is preserved.
The aforesaid physical picture about RSB can be maintained by distinguishing between pure states and ergodic components [126]. Furthermore, it is unclear whether it is a spontaneous symmetry breaking that takes place in those networks. In the present manuscript we
do not deal with such subtleties, and concentrate mainly on the replica method as a tool for calculation.

The replica approach in conjunction with Parisi's ansatz provides so far the most complete description of the SK model in averaged thermal equilibrium. However, this scheme, as well as the equivalent cavity approach and the static limit of the path-integral formulation, involves certain procedures which, up to now, could not be put on a rigorous mathematical basis. On the other hand, there exist a number of remarkable rigorous results concerning the SK model. In Ref. [129] it was shown that the quenched average $N^{-1} \langle \ln Z_J \rangle$ approaches the so-called annealed average $N^{-1} \ln \langle Z_J \rangle$ in the thermodynamical limit (termed strong self-averaging property) above the AT-line and in the absence of an external magnetic field. The evaluation of $N^{-1} \ln \langle Z_J \rangle$ is straightforward and reproduces the RS solution. The basic reason behind these conclusions is the vanishing of the Edwards–Anderson order parameter, so that the usual effective coupling of the replicas after averaging out the quenched disorder does not arise, i.e., $\langle Z_J^n \rangle = \langle Z_J \rangle^n$. Furthermore, some explicit bounds pertaining to the low-temperature region have been obtained in [129] which imply [92] the existence of a phase transition at the same temperature as predicted by the AT-stability criterion. In [92] it was shown by means of a rigorous version of the cavity procedure, called the martingale method in the mathematical physics literature, that if the Edwards–Anderson order parameter is self-averaging then the RS solution is exact. In [130] it was rigorously verified that this order parameter is self-averaging, and thus the RS solution is exact, if the AT stability condition is fulfilled without an external magnetic field, and also under a condition slightly stronger than the AT condition in the presence of a field. In view of this theorem, it is suggestive that an AT-stable RS solution will provide the correct result also in other systems.
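The remark above that the annealed average is straightforward to evaluate can be spelled out in a few lines. The sketch below assumes, as is standard but not stated explicitly in the text, that the couplings are symmetric, $J_{ik} = J_{ki}$, and independent for $i < k$, each Gaussian with zero mean and variance $1/N$, so that (2.3) reads $H_J(\mathbf{S}) = -\sum_{i < k} J_{ik} S_i S_k$.

```latex
\begin{align*}
\langle Z_J \rangle
  &= \sum_{\mathbf{S}} \prod_{i < k} \left\langle e^{\beta J_{ik} S_i S_k} \right\rangle
   = \sum_{\mathbf{S}} \prod_{i < k} e^{\beta^2 / (2N)}
   = 2^N \exp\!\left[ \frac{\beta^2 (N-1)}{4} \right] , \\
- \lim_{N \to \infty} (N\beta)^{-1} \ln \langle Z_J \rangle
  &= - \beta^{-1} \ln 2 - \frac{\beta}{4} .
\end{align*}
```

The second equality uses $(S_i S_k)^2 = 1$ and the Gaussian average $\langle e^{\beta J x} \rangle = e^{\beta^2 x^2 / (2N)}$; the product runs over $N(N-1)/2$ pairs. The same value is obtained from the replica free energy (2.8) divided by $n$ when all off-diagonal elements $q_{ab}$ vanish, since the replicas then decouple and the trace over the $n$ replica spins gives $2^n$.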
This theorem furthermore confirms Parisi's RSB ansatz to the extent that this ansatz reduces to the RS result if the AT condition is satisfied. Finally, from the previously discussed evidence as well as the rigorous mathematical proof in [129] that the RS solution is incorrect at low temperatures, it follows that the Edwards–Anderson order parameter is not self-averaging. This feature is indeed reproduced by the Parisi solution. Another interesting rigorous result has been obtained in Refs. [131,132] via the martingale method, namely that there exists a set of "order parameter functions" $0 \le x(q) \le 1$ such that the SK free energy can be expressed in terms of antiparabolic martingale equations, each of them involving one such function $x(q)$ and being exactly of the same form as the non-linear partial differential equation in Parisi's CRSB scheme. The remaining non-trivial step in order to complete a rigorous derivation of Parisi's CRSB solution is to show that this set of functions is effectively equivalent to a single function $x(q)$. Finally, in [133,134] certain rather strong conditions are derived that should be satisfied by the order parameter of a class of spin glass models, including the SK and short-ranged models. These constraints are indeed fulfilled by Parisi's solution but still leave room for other possibilities.

We remark that the replica method in combination with the Parisi ansatz is not restricted to the SK model and its variants; this is also one of the main reasons why this paper was written. Nevertheless, most of the above rigorous results pertain to the SK model, only some of them have so far been generalized to the Little–Hopfield network, and none but the last one to even further systems.

2.4. Little–Hopfield network

One of the main breakthroughs of the statistical physical approach to other fields was achieved on the Little–Hopfield model by the replica calculation of Amit et al. [135–137]. They considered
$M$ randomly sampled patterns $\mathbf{S}^\mu$, $\mu = 1, 2, \dots, M$, of dimension $N$, where $N$ is the number of participating neurons, for a fixed value of the so-called load parameter

$$\alpha = M/N \tag{2.10}$$

in the thermodynamical limit $N \to \infty$. The starting point of the statistical mechanical treatment is a canonical Boltzmannian formulation of the problem. A microstate is a configuration of the neuron states $S_i$, $i = 1, \dots, N$, and a pattern is considered as stored if it is a stable fixed point attractor of the dynamics (2.2). The energy function for the random, sequential dynamics (2.2) is analogous to the Hamiltonian of the SK model in (2.3) [54]. The main difference is in the exchange couplings, taken now as $J_{ij} = N^{-1} \sum_\mu S_i^\mu S_j^\mu$, called the Hebb rule. Thus the patterns $\mathbf{S}^\mu$ play the role of the quenched disorder. At positive temperatures the dynamics (2.2), the update rule for the selected neuron, is non-deterministic and usually Glauber's prescription is applied, see, e.g., Ref. [84]. The original storage problem corresponds to the zero temperature limit. Within the RS ansatz Amit et al. obtained as central result that the maximal number $M_c$ of patterns which can be stored with an error of a few percent scales as $M_c = \alpha_c N$ in the thermodynamical limit $N \to \infty$, with a critical capacity $\alpha_c \approx 0.138$. Criticality manifests itself by the drop of the overlap of a generic stationary state with the desired pattern from a value below, but close to, one to nearly zero. It was immediately noticed [135] that the AT stability condition is violated at zero temperature for all $\alpha > 0$, thus for exact results RSB is required, but already a quite small temperature restores the AT stability and thus the validity of the RS solution. Applying the 1-RSB ansatz, Crisanti et al. [138] obtained a modified critical capacity of $\alpha_c \approx 0.144$. The problem was reconsidered in the R-RSB, $R = 0, 1, 2$, analysis of Steffan and Kühn [139], who put forth a ground state capacity $\alpha_c \approx 0.1382$ based on several cross-checks of their computation.
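As a minimal illustration of the Hebb rule and of zero-temperature retrieval as described above, the following Python sketch stores a few random patterns and relaxes a corrupted copy of one of them. The function names, the zero diagonal of $J_{ij}$, the sizes $N = 120$, $M = 3$ (load far below $\alpha_c$) and the 10% initial corruption are assumptions made for this example only; they are not taken from Refs. [135–137].

```python
import random


def hebb_couplings(patterns):
    """Hebb rule J_ij = (1/N) sum_mu S_i^mu S_j^mu.

    Setting the diagonal to zero is an assumption of this sketch."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += s[i] * s[j] / n
    return J


def relax(J, state, rng, max_sweeps=50):
    """Zero-temperature random sequential dynamics S_i <- sign(sum_j J_ij S_j)."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # visit neurons in random order
            field = sum(J[i][j] * s[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # fixed point of the dynamics reached
            break
    return s


def overlap(a, b):
    """Normalized overlap (1/N) sum_i a_i b_i of two +-1 configurations."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def demo(n=120, m=3, flip_fraction=0.1, seed=0):
    """Return the overlap with pattern 0 before and after relaxation."""
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    J = hebb_couplings(patterns)
    noisy = list(patterns[0])
    for i in rng.sample(range(n), int(flip_fraction * n)):
        noisy[i] = -noisy[i]
    final = relax(J, noisy, rng)
    return overlap(noisy, patterns[0]), overlap(final, patterns[0])


if __name__ == "__main__":
    before, after = demo()
    print(f"overlap before: {before:.2f}, after relaxation: {after:.2f}")
```

At such a small load the corrupted state is expected to flow back to the stored pattern; raising `m` towards and beyond roughly $0.14\,N$ is the regime where the capacity estimates quoted above become relevant, although finite-size effects at this $N$ are strong.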
Steffan and Kühn raise the possibility that the Parisi–Toulouse hypothesis [140], implying that in a CRSB solution the magnetization in the SK model does not depend on the temperature, and believed to be exact for vanishing magnetization, holds also in the Little–Hopfield model, at least as a good approximation. In that case, they conclude, the capacity is given by the intersection of the AT line and the RS phase boundary, that is, the capacity is essentially the one calculated from the RS solution. A CRSB calculation, an extension of Parisi's solution of the SK model within the formalism of Ref. [117], was performed by Tokita [31]. The sophisticated numerical method applied to evaluate the CRSB equations showed an instability near $\alpha \approx 0.155 \pm 0.002$, which he identified as the capacity. Numerical simulations [54,135,138,141] gave estimates mostly between the aforesaid finite R-RSB and Tokita's CRSB results. However, a more recent simulation [142], including a finite-size scaling specially adapted for a discontinuous transition in the presence of quenched disorder, yielded $\alpha_c = 0.141 \pm 0.0015$, in better agreement with the former, finite R-RSB results. Given the fact that the numerical evaluation of the CRSB state to the required precision is a much more formidable task than that of R-RSB, $R = 0, 1, 2$, and that even 1-RSB computations were the subject of debate [138,139], the question of theoretical prediction may still be considered as open. The main issue here is less the precise number than the salient features of the phase diagram, like reentrance, the validity of the Parisi–Toulouse hypothesis, or what kind of RSB describes the various phases [139,31]. Tokita's framework involving the freedom of a gauge function is closely related to the variational approach for the SK model [117,29], inspired in turn by dynamical studies [34] where the static
gauge function is related to the time-dependent susceptibility. The variational framework we present in Section 7, on purely static grounds, turns out to be very similar to those, albeit without resorting to the gauge function. On the technical side, we are unaware of any non-perturbative CRSB analyses that aim at the ground state, or at least at regions with frustration far from criticality, beyond those performed for the SK model and descendants, as well as the related Little–Hopfield model. Filling this hiatus was an important motivation for the present paper. The RS results of Amit et al. have been re-derived in several different ways [143–147], based on certain assumptions which are possibly equivalent to that of RS. Alternative methods comparable to RSB, however, do not seem to be available yet. The authors of Ref. [145] speculate that their framework may admit such an extension, being based on Sommers' dynamical path-integral approach [36], which successfully reproduced some RSB features in the SK model. The following mathematically rigorous results for the Little–Hopfield model are so far available. The self-averaging property of the free energy density has been proven in [93,94]. In [148] the RS solution is rigorously derived under the assumption that the Edwards–Anderson order parameter is self-averaging, and in [130] the latter assumption is shown to hold under a condition similar to, but somewhat stronger than, the AT stability condition. Finally, a constraint similar to ultrametricity on the order parameter has been derived in [133,134], which is indeed satisfied by the RS solution at high temperatures and by Tokita's CRSB solution at low temperatures.

2.5. Pattern storage by a single neuron

As we have seen, the McCulloch–Pitts model neuron is the elementary building block of two prominent types of neural networks, the layered, feedforward perceptron and the associative memory.
Therefore, the detailed exploration of such a single neuron is an indispensable prerequisite for a satisfactory understanding of the collective behavior of networked units.

2.5.1. Continuous synaptic coupling

First we describe the case of continuous synaptic couplings, i.e., arbitrary vectors $\mathbf{J}$ in (2.1). If their norm is fixed then the term spherical couplings is often used. Note that in Eq. (2.1) the norm does not influence the output. An early remarkable result is due to Winder [149] and Cover [150] regarding the maximal number $M_c$ of input patterns for which a single McCulloch–Pitts neuron can correctly reproduce the prescribed outputs according to (2.1). This is understood as a theoretical maximum, i.e., without reference to any specific training algorithm that may be necessary to find the right couplings. For randomly sampled patterns $\mathbf{S}^\mu$, $\mu = 1, 2, \dots, M$, their result for the critical capacity $\alpha_c = M_c/N$ in the limit $N \to \infty$ approaches, with probability 1, the value $\alpha_c = 2$, a widely referenced result in artificial neural networks. An easy-to-follow account of Cover's geometrical proof, for arbitrary $N$, can be found in Section 5.7 of [19], and notable extensions have been worked out in [151–153]. A central notion for adaptive networks is the version space. This is the set of coupling vectors $\mathbf{J}$ compatible with the patterns, or examples. Intuitively, it is clear that the version space shrinks as the number of patterns increases, and beyond the capacity the version space is empty, at least with probability one in the thermodynamical limit. A breakthrough was achieved when the space of synaptic couplings of a single McCulloch–Pitts neuron was explored, following the proposition of Gardner [3], by Gardner and Derrida within
both the microcanonical [4] and canonical [5] approaches. A main novelty of the concept was in reversing the traditional analogy between spin systems and neural networks. In the Little–Hopfield model the states of the neurons form the "spin space", and the synaptic couplings are the quenched parameters. The new proposition was to consider the couplings as the configuration space for statistical mechanics, with constraints represented by randomly generated patterns to be stored, i.e., patterns which should be reproduced by an appropriate setting of the couplings; that is, to consider the version space. By the introduction of an appropriate cost, or energy, function in coupling space (further synonyms are Hamiltonian function or error measure), the stage was set for the statistical mechanical treatment. This does not restrict the study to the version space, but allows for finite temperatures, and so beyond capacity it provides a framework to describe states with a given error, including the minimal positive error of the ground state. The common ingredient in both the Little–Hopfield and the Gardner–Derrida concepts is that patterns, i.e., examples, represent the quenched disorder; otherwise they are quite different. For example, while the energy function of the Little–Hopfield network closely resembles that of the SK model, not much formal analogy exists between spin systems and synaptic coupling space. In what was a novel application of the replica method, within the RS ansatz, Gardner and Derrida reproduced, and generalized to biased pattern distributions, the Winder–Cover result. They calculated many characteristics for the region below the critical capacity $\alpha_c$, and also proved convergence of training algorithms. We note that the traditional problem of error-free storage corresponds here to the condition of zero energy in the ground state.
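The Winder–Cover capacity $\alpha_c = 2$ referred to above rests on a counting argument that is easy to evaluate exactly. The sketch below uses Cover's function-counting formula, $C(M, N) = 2 \sum_{k=0}^{N-1} \binom{M-1}{k}$, for the number of dichotomies of $M$ points in general position in $N$ dimensions that a neuron without threshold can realize. The formula is quoted from the standard literature and is not spelled out in the text above; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def cover_count(M, N):
    """Number of linearly realizable dichotomies of M points in general
    position in N dimensions (Cover's function-counting formula)."""
    return 2 * sum(comb(M - 1, k) for k in range(N))


def separable_fraction(M, N):
    """Fraction of all 2^M output assignments that a single neuron can store."""
    return Fraction(cover_count(M, N), 2 ** M)


if __name__ == "__main__":
    N = 200
    for alpha in (1.0, 1.5, 2.0, 2.5):
        M = int(alpha * N)
        print(alpha, float(separable_fraction(M, N)))
```

The fraction equals 1 for $M \le N$, is exactly $1/2$ at $M = 2N$ for every $N$, and the crossover between the two regimes sharpens around $\alpha = 2$ as $N$ grows, which is the sense in which the capacity is attained with probability 1 in the thermodynamical limit.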
If not all patterns can be accommodated by the couplings, that is, if the neuron is beyond capacity, then, depending on the choice of the Hamiltonian, various positive ground state energies arise. The thermodynamical stability of the RS solution via the AT condition [96] was formulated here by Gardner and Derrida [5] and was revised later by Bouten [9,10]. It turned out that the RS ansatz beyond the critical capacity $\alpha_c = 2$ is unstable for the much studied energy function that measures the number of patterns that are not stored, i.e., of unstable patterns. This is sometimes called the Gardner–Derrida error measure and will be in our focus in the present paper. An improved 1-RSB ansatz by Majer et al. [7] and by Erichsen and Theumann [8], as well as the subsequent 2-RSB calculation by Whyte and Sherrington [11], continued to be plagued by similar instability beyond capacity. The latter authors could prove that no finite R-RSB ansatz in the ground state, beyond capacity, can be locally stable. In the present article we propose Parisi's CRSB ansatz as an appropriate description of a single neuron beyond capacity, within the limits of an equilibrium, averaged statistical mechanical treatment. As shown in [154], the effect of frustration, manifesting itself in the spontaneous breaking of RS beyond capacity, brings along, from the viewpoint of numerical simulations, a very hard, NP-complete problem [13,16]. That means that whatever algorithm is used to find an $N$-dimensional vector of synaptic couplings $\mathbf{J}$ with the smallest possible number of misclassified examples, the time necessary for it is expected (a rigorous proof is not known) to increase faster than any power law with $N$. Simple algorithms which minimize the number of misclassifications locally, i.e., within a certain neighborhood of the initial choice for $\mathbf{J}$, are due to Wendemuth [155,37].
While his result on the error measuring the number of unstable patterns significantly overestimated the error, as demonstrated in Ref. [18] and cited in the present paper, his algorithm may still yield acceptable approximations for global minimization as predicted by the CRSB theory. We refer also to Section 7.3 in [14] for the analogous observations in the context of the SK model. Returning to generic
NP-complete problems, by admitting some random element in the algorithm, the numerical effort can be reduced to some power of $N$, hence the name non-deterministic polynomial that NP stands for. The price to be paid then is that the absolute minimum will be found only with a certain probability [156–159]. A most widely used such method is simulated annealing [160] and its descendants [161]. As pointed out in [15], the average time required for the numerical solution may undergo a dramatic change if certain parameters are varied, without changing its NP-completeness. Therefore, the so-called worst-case scenario, on which the classification as NP-complete is based, may in fact not capture very well the typical behavior, occurring with probability 1 as $N \to \infty$, of such algorithms in specific applications. Conversely, a proof that a problem can be solved deterministically within polynomial time may still allow very long times for an algorithm to converge. Nevertheless, NP-completeness is generally considered as the signature of algorithmically hard tasks. It is natural to expect that some, possibly most, of the rigorous results and alternatives to the replica method for the SK and Little–Hopfield models can be carried over to the simple perceptron. However, so far only the cavity method is available in its simplest form, equivalent to an RS solution, together with a self-consistency condition equivalent to the AT stability condition of the RS solution [162–165]. Beyond the critical capacity $\alpha_c = 2$ RS spontaneously breaks, entailing, like in the SK model, an ultrametric organization [30,28,124] of the synaptic couplings $\mathbf{J}$ that minimize, in the ground state, the number of incorrect input–output relations $\mathbf{S}^\mu$, $\xi^\mu$, $\mu = 1, 2, \dots, M$, in (2.1). Below $\alpha_c$, a complementary picture arises by introducing "cells" on the $N$-dimensional sphere of synaptic couplings

$$C_\sigma = \{ \mathbf{J} \mid |\mathbf{J}|^2 = N,\ \mathrm{sign}(\mathbf{J} \cdot \mathbf{S}^\mu) = \sigma^\mu,\ \mu = 1, 2, \dots, M \} , \tag{2.11}$$

labeled by the $2^M$ possible output sequences $\sigma = \{\sigma^\mu\}$. The idea to study the simple perceptron in terms of these cells $C_\sigma$ is to some extent already contained in Cover's geometrical derivation of the storage capacity [150] and has been employed again in [166]. An appropriate quantitative framework has been elaborated by Monasson and co-workers [126,167–169] in the context of multi-layer networks and has later been adapted to the simple perceptron in [170–172]. Based on a replica calculation, this method enables one to characterize the distribution of cell sizes $|C_\sigma|$ to exponentially leading order in $N$ in terms of a so-called multifractal spectrum, as in the thermodynamical formalism for fractals [173,174]. This multifractal analysis opens an interesting view on the storage as well as the generalization properties of the simple perceptron.

2.5.2. Ising couplings

Storage properties change considerably if one restricts the analysis to so-called Ising couplings, where each component of $\mathbf{J}$ can take only the two possible values $\pm 1$. This extra constraint is partly motivated by the fact that in a digital computer the $J_i$'s have a discrete representation. It was observed already by Gardner and Derrida [5] that a self-contained treatment by an RS ansatz of the critical storage capacity with Ising couplings is not possible within a canonical statistical mechanical approach. Krauth and Mézard performed a 1-RSB analysis with the prominent result $\alpha_c \approx 0.833$ for the critical storage capacity of the Ising perceptron [175]. Their 2-RSB explorations furthermore indicate that no new solution arises w.r.t. 1-RSB. The RS state turns out to be globally stable up to
the capacity limit, the latter being signaled by a vanishing of the entropy. This is an intriguing coincidence that could not have been foreseen by the RS analysis, because therein the point whence the entropy becomes negative is obviously only an upper limit for the capacity. The need for RSB to calculate the capacity should be contrasted with the spherical case, Section 2.5.1, where the capacity could be determined within the RS solution. The reason for the difficulty here is that the transition from perfect to imperfect storage is discontinuous for Ising couplings. Here the order parameter exhibits a jump in the sense that one of the overlaps in 1-RSB is not the continuation of the RS value when $\alpha$ passes $\alpha_c$. From the viewpoint of the order parameter such a transition can be termed first order. On the other hand, since the probability weight of the discontinuously appearing order parameter value vanishes at $\alpha_c$, the first derivative of the free energy remains continuous and only the second one jumps. The Ising neuron also demonstrates the importance of global stability of a state. The RS solution formally exists beyond the transition and stays locally stable, i.e., AT-stable, up to $\alpha = 4/\pi$. However, its free energy is smaller than that from 1-RSB, so global stability appears to be taken over by the 1-RSB solution, like in first-order transitions. It should be added that here the locally stable but globally unstable RS solution should be ruled out as a metastable state in the traditional sense because of its negative entropy. Furthermore, the 1-RSB solution is not a locally stable equilibrium state before the transition, so two spinodal points collapse onto the transition point. While a major part of the existing statistical mechanical investigations, including the SK model in (2.4) and our present study of the simple perceptron, are based on a canonical Boltzmannian formulation of the problem, Gardner's seminal calculations in Refs. [3,4] use the microcanonical ensemble.
For the Ising perceptron, this approach was adopted by Fontanari and Meir [176], reproducing Krauth and Mézard's results without going beyond RS and verifying in particular the AT stability condition [96] as well as the physical requirement of a non-negative entropy. Computing the optimal vector $\mathbf{J}$ of synaptic couplings for the Ising perceptron is an NP-complete problem [13,16] for any positive load parameter $\alpha$, as demonstrated in Refs. [177,178]. The challenge of numerically estimating the critical capacity $\alpha_c$ has been attacked by several groups, most of them verifying $\alpha_c \approx 0.833$, with the exception of Ref. [179], which is criticized by the comment in [180]. Subsequent, more extensive computations in [181,182] appear to confirm the original critical value. Below critical capacity, a multifractal analysis of the space of Ising couplings $\mathbf{J}$, inspired by the work on the spherical case [126] as discussed in the previous section, has been worked out in [170,183]. Beyond criticality, a thermodynamical stability analysis [184] suggests that 1-RSB is locally stable at and beyond $\alpha_c$. On the other hand, the microcanonical RS approach of Fontanari and Meir [176] also continues to coincide with Krauth and Mézard's results and satisfies the local thermal stability criterion of de Almeida and Thouless [96]. The above numerical and analytical findings have given rise to the observation that the Ising perceptron beyond capacity behaves quite similarly to Derrida's random energy model [62]. This system is the $p \to \infty$ limit of the $p$-spin interaction version of the SK model. In particular, the 1-RSB ansatz yields [63] indeed what is accepted as the exact solution of the problem within the canonical Boltzmannian approach, and the zero entropy condition marks the transition from RS to 1-RSB. Interestingly, as was done originally by Derrida, even the spin glass phase of the random energy model can be described by the replica method without the need to introduce the 1-RSB ansatz.
There, by a direct calculation, the mean free energy could be maximized without
dealing with spin overlaps, so this can be considered as an independent confirmation of RSB as applied later by [63]. In the case of the neuron with Ising couplings, like in the random energy model, an overlap $q_1 = 1$ arises, with probability exactly $1 - x = 1 - T/T_c$. The fact that the microcanonical formulation within RS gave the same ground state error beyond capacity [176] as the canonical 1-RSB result [175] is a further peculiarity of Ising synapses. There is no technical contradiction, however, because if $q_1 = 1$ is set, then the 1-RSB free energy becomes equivalent to the RS microcanonical entropy. This can be understood if one realizes that in the latter the temperature is essentially an extra variational parameter, taking the role of $1/x$, related to the aforesaid probability in 1-RSB. The peculiarity of the microcanonical approach was interpreted, and exploited for calculating the storage capacity of certain multi-layer perceptrons, in Ref. [185]. Further systems where stable 1-RSB phases arise, albeit generally without the zero entropy condition, are the $p$-spin interaction SK model [24], its spherical variant [64], the spherical, multi-$p$-spin interaction model [68], the Potts glass [66,67,25], and protein folding models [78–81]. The general framework in the present paper includes both continuous and Ising synaptic couplings $\mathbf{J}$. Since the case of principal interest here is Parisi's CRSB ansatz, in the quantitative numerical evaluation beyond capacity we will focus on the example of the continuous, spherical couplings. Whether or not a continuous RSB ansatz will be necessary for more general Ising networks than the McCulloch–Pitts model, e.g., in multi-layer Ising perceptrons, remains to be seen [169,186,187].

2.6. Training, error measures, and retrieval

We recall that the patterns to be stored are prescribed as pairs $\mathbf{S}^\mu$, $\xi^\mu$, $\mu = 1, \dots, M$, and the McCulloch–Pitts neuron (2.1) is required to reproduce $\xi^\mu$ in response to $\mathbf{S}^\mu$.
Next we define the so-called local stability parameters,

$$\Delta^\mu = \xi^\mu\, |\mathbf{J}|^{-1} \sum_k J_k S_k^\mu , \tag{2.12}$$

where the normalization factor $|\mathbf{J}|^{-1}$ guarantees a sensible behavior in the thermodynamical limit $N \to \infty$ if the patterns $\mathbf{S}^\mu$ are normalized to a length of the order $N^{1/2}$. Introducing an error measure on a pattern as […]

$$Q = \sum_{r=0}^{R+1} (q_r - q_{r-1})\, U_{m_r} \otimes I_{n/m_r} , \tag{4.4}$$

where the subscript $k$ to a matrix marks that it is $k \times k$, $I_k$ is the unit matrix and $U_k$ has all elements equal to unity. Furthermore,

$$q_{-1} = 0 , \qquad q_{R+1} = q_D , \tag{4.5a}$$

$$m_{R+1} = 1 \le m_R \le m_{R-1} \le \dots \le m_1 \le m_0 = n , \tag{4.5b}$$
where the integer $m_r$ is a divisor of $m_{r-1}$. In the case of the $Q$ of Section 3 there is a presumed ordering

$$q_{-1} = 0 \le q_0 \le q_1 \le \dots \le q_R \le q_{R+1} = 1 . \tag{4.6}$$

In theory, $q_r < 0$ are also possible, but in our numerical explorations of examples such $q_r$'s did not appear, so we shall consider the restriction to non-negative $q$'s part of the ansatz. For $\hat{Q}$ of Section 3 the assumption is

$$\hat{q}_{-1} = 0 \le \hat{q}_0 \le \hat{q}_1 \le \dots \le \hat{q}_R , \qquad \hat{q}_{R+1} = 0 . \tag{4.7}$$

These represent the $R$-step replica symmetry breaking scheme (R-RSB). At this stage we do not prescribe the ordering of the $q_r$'s and allow uniform diagonals $q_{aa} \equiv q_D$ of any magnitude. The quadratic form in (4.1) is then
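$$\mathbf{x} Q \mathbf{x} = \sum_{r=0}^{R+1} (q_r - q_{r-1}) \sum_{j_r=1}^{n/m_r} \Bigg( \sum_{a = m_r (j_r - 1) + 1}^{m_r j_r} x_a \Bigg)^{2} . \tag{4.8}$$

The $\varphi[\Phi(y), Q]$ of (4.1) should thus be replaced by $\varphi[\Phi(y), \mathbf{q}, \mathbf{m}]$, where the parameters in (4.4) are considered to be the elements of the vectors in the argument. By using the notation

$$Dz = \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz \tag{4.9}$$

and the identity

$$e^{-x^2/2} = \int Dz\, e^{-i z x} \tag{4.10}$$

we obtain

$$e^{n \varphi[\Phi(y), \mathbf{q}, \mathbf{m}]} = \int \frac{d^n x\, d^n y}{(2\pi)^n} \prod_{r=0}^{R+1} \prod_{j_r=1}^{n/m_r} Dz^{(r)}_{j_r}\, \exp\Bigg( -i \sum_{r=0}^{R+1} \sqrt{q_r - q_{r-1}} \sum_{j_r=1}^{n/m_r} z^{(r)}_{j_r} \sum_{a = m_r (j_r - 1) + 1}^{m_r j_r} x_a + i \sum_{a=1}^{n} x_a y_a + \sum_{a=1}^{n} \Phi(y_a) \Bigg) . \tag{4.11}$$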
Appendix C shows that the above expression equals Eq. (C.6). The limit $m_0 = n \to 0$ violates the ordering in (4.5b). In fact, experience in spin glasses [14,84] and in R-RSB, $R = 1, 2$, calculations in neural networks (see [7,11]) suggests that the $m_r$'s become less than 1 and the ordering in (4.5b) is to be reversed. This can be understood by introducing

$$x_r = \frac{n - m_r}{n - 1} \tag{4.12}$$

for arbitrary $n$ and using the $x_r$'s for parametrization instead of the $m_r$'s. The new parameter $x_r$ should not be confounded with the integration variable $x_a$ in Eq. (4.1). For integer $n$ and $m_r$'s satisfying (4.5b) we have the ordering

$$x_{R+1} = 1 \ge x_R \ge x_{R-1} \ge \dots \ge x_1 \ge x_0 = 0 . \tag{4.13}$$
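The block structure behind (4.4), the quadratic form (4.8) and the reparametrization (4.12) can be checked on a small integer example. The following Python sketch is only an illustration: the function names and the sample values $n = 12$, $m = (12, 6, 2, 1)$, $q = (0.1, 0.4, 0.7, 1)$ are assumptions of this example, and the Kronecker product in (4.4) is read as $n/m_r$ diagonal blocks of size $m_r$ filled with ones.

```python
import random


def parisi_matrix(n, m, q):
    """Matrix of (4.4): Q_ab = sum_r (q_r - q_{r-1}) [a, b in the same m_r-block].

    m = [m_0, ..., m_{R+1}] with m_0 = n and m_{R+1} = 1; q = [q_0, ..., q_{R+1}];
    q_{-1} = 0 as in (4.5a)."""
    Q = [[0.0] * n for _ in range(n)]
    q_prev = 0.0
    for m_r, q_r in zip(m, q):
        for a in range(n):
            for b in range(n):
                if a // m_r == b // m_r:
                    Q[a][b] += q_r - q_prev
        q_prev = q_r
    return Q


def quad_form_direct(x, Q):
    """x Q x evaluated element by element."""
    n = len(x)
    return sum(x[a] * Q[a][b] * x[b] for a in range(n) for b in range(n))


def quad_form_blocks(x, m, q):
    """Right-hand side of (4.8): squared block sums weighted by q_r - q_{r-1}."""
    n = len(x)
    total, q_prev = 0.0, 0.0
    for m_r, q_r in zip(m, q):
        for j in range(n // m_r):
            block_sum = sum(x[j * m_r:(j + 1) * m_r])
            total += (q_r - q_prev) * block_sum ** 2
        q_prev = q_r
    return total


def x_parameters(n, m):
    """Reparametrization (4.12): x_r = (n - m_r) / (n - 1)."""
    return [(n - m_r) / (n - 1) for m_r in m]


if __name__ == "__main__":
    n, m, q = 12, [12, 6, 2, 1], [0.1, 0.4, 0.7, 1.0]
    rng = random.Random(1)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    Q = parisi_matrix(n, m, q)
    print(quad_form_direct(x, Q), quad_form_blocks(x, m, q))
    print(x_parameters(n, m))
```

In this example the off-diagonal elements take the values $q_2$, $q_1$, $q_0$ depending on the smallest common block of the two replica indices, the diagonal equals $q_{R+1}$, and the $x_r$ increase with $r$ from 0 to 1, which is the ordering (4.13).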
Keeping the $x_r$'s fixed as $n \to 0$ defines the $n$-dependence of the $m_r$'s, and for $n = 0$ formally we get $x_r = m_r$. This explains the aforementioned practice to treat the $m_r$'s as real numbers in $[0, 1]$ with ordering reversed w.r.t. (4.5b). Eq. (C.6) becomes for $n \to 0$, in terms of the $x_r$'s,

$$\varphi[\Phi(y), \mathbf{q}, \mathbf{x}]\Big|_{n=0} = \frac{1}{x_1} \int Dz_0\, \ln \int Dz_1 \Bigg[ \int Dz_2 \cdots \Bigg[ \int Dz_{R+1} \exp \Phi\Bigg( \sum_{r=0}^{R+1} z_r \sqrt{q_r - q_{r-1}} \Bigg) \Bigg]^{x_R / x_{R+1}} \cdots \Bigg]^{x_1 / x_2} . \tag{4.14}$$

This is the general formula for R-RSB. Expression (4.14) can be written in the form of an iteration for decreasing $r$'s as
$$\psi_{r-1}(y) = \int Dz\, \psi_r\big( y + z \sqrt{q_r - q_{r-1}} \big)^{x_r / x_{r+1}} , \tag{4.15a}$$

$$\psi_R(y) = \int Dz\, e^{\Phi( y + z \sqrt{q_{R+1} - q_R} )} , \tag{4.15b}$$

or, we can set $x_{R+2} = 1$ and put the initial condition as

$$\psi_{R+1}(y) = e^{\Phi(y)} . \tag{4.16}$$

In the iterated function we omitted to mark the functional dependence on $\Phi(y)$ and $\mathbf{q}$, $\mathbf{x}$. If a difference $q_r - q_{r-1} < 0$ then the square root is imaginary. Since the Gauss measure of integrations suppresses odd powers in a Taylor expansion of the integrand, the result, if the integrals exist, will be real. The case of a non-monotonic $q_r$ sequence will be briefly discussed at the end of this section. Finally we get
$$\varphi[\Phi(y), \mathbf{q}, \mathbf{x}]\Big|_{n=0} = \frac{1}{x_1} \int Dz\, \ln \psi_0\big( z \sqrt{q_0} \big) . \tag{4.17}$$

Note that an iteration like (4.15) can also be understood, before the $n \to 0$ limit is taken, directly on Eq. (C.6), where $m_r / m_{r+1}$ is an integer. Then formally $\varphi[\Phi(y), \mathbf{q}, \mathbf{x}] = n^{-1} \ln \psi_{-1}(0)$. Hence for $n \to 0$ we recover (4.17). It is, however, an advantage that we can first take $n \to 0$ and then define the recursion (4.15) with fractional powers. Indeed, while dealing with the consequences of the recursion, the replica limit $n \to 0$ is implied and we do not have to return to the question of that limit again. It is instructive to introduce

$$\varphi_r(y) = \frac{\ln \psi_r(y)}{x_{r+1}} , \tag{4.18}$$

lending itself to the recursion

$$\varphi_{r-1}(y) = \frac{1}{x_r} \ln \int Dz\, e^{x_r \varphi_r( y + z \sqrt{q_r - q_{r-1}} )} , \tag{4.19a}$$

$$\varphi_{R+1}(y) = \Phi(y) , \tag{4.19b}$$
and yielding

$$\varphi[\Phi(y), \mathbf{q}, \mathbf{x}]\Big|_{n=0} = \int Dz\, \varphi_0\big( z \sqrt{q_0} \big) . \tag{4.20}$$
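The recursion (4.19) with the final average (4.20) is straightforward to evaluate numerically for small $R$. The Python sketch below is a minimal illustration under stated assumptions: the Gaussian measure $Dz$ is approximated by a trapezoidal rule on $[-6, 6]$, the lists `q` and `x` hold $q_0, \dots, q_{R+1}$ and $x_0, \dots, x_{R+1}$ with $x_r > 0$ for $r \ge 1$ and non-decreasing $q_r$, and all function names are hypothetical. The cost grows exponentially with $R$, so this is not meant as a practical method.

```python
import math


def gauss_rule(zmax=6.0, h=0.25):
    """Nodes and weights approximating the Gaussian measure Dz of (4.9)
    by the trapezoidal rule on [-zmax, zmax] (an illustrative choice)."""
    k = int(round(zmax / h))
    return [(i * h, h * math.exp(-0.5 * (i * h) ** 2) / math.sqrt(2.0 * math.pi))
            for i in range(-k, k + 1)]


RULE = gauss_rule()


def gauss_avg(f):
    """Approximate the Gaussian average of f(z) over Dz."""
    return sum(w * f(z) for z, w in RULE)


def phi_level(r, y, Phi, q, x):
    """phi_r(y) from (4.19): phi_{R+1} = Phi, and
    phi_r(y) = (1/x_{r+1}) ln Int Dz exp(x_{r+1} phi_{r+1}(y + z sqrt(q_{r+1} - q_r)))."""
    top = len(q) - 1  # index R+1
    if r == top:
        return Phi(y)
    step = math.sqrt(q[r + 1] - q[r])
    x_next = x[r + 1]
    inner = gauss_avg(
        lambda z: math.exp(x_next * phi_level(r + 1, y + z * step, Phi, q, x)))
    return math.log(inner) / x_next


def phi_functional(Phi, q, x):
    """Free energy term (4.20): Int Dz phi_0(z sqrt(q_0))."""
    root = math.sqrt(q[0])
    return gauss_avg(lambda z: phi_level(0, z * root, Phi, q, x))


if __name__ == "__main__":
    # 1-RSB example with a linear test function Phi(y) = y.
    q = [0.2, 0.6, 1.0]
    x = [0.0, 0.5, 1.0]
    print(phi_functional(lambda y: y, q, x))
```

A simple check is provided by a linear $\Phi(y) = a y$, for which each step of (4.19a) is a Gaussian integral and the recursion gives $\varphi = \tfrac{1}{2} a^2 \sum_{r=1}^{R+1} x_r (q_r - q_{r-1})$; for the sample parameters above this evaluates to $0.3$.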
4.1.2. Parisi's PDE

The above recursions can be viewed as a diffusion process in the presence of "kicks". Let us introduce here Parisi's order parameter function (OPF) as

$$x(q) = \sum_{i=0}^{R} (x_{i+1} - x_i)\, \theta(q - q_i) , \tag{4.21}$$

defined on the interval $[0, 1]$, where (4.6) and (4.13) are understood. With the standard notation

$$f(q^{+}) = \lim_{\epsilon \to +0} f(q + \epsilon) \tag{4.22}$$

(and analogously $f(q^{-})$ for the limit from below) we have obviously

$$x(q_r^{+}) = x_{r+1} , \qquad x(q_r^{-}) = x_r , \tag{4.23}$$

and we may set

$$x(q_r) = x_{r+1} . \tag{4.24}$$

Next we introduce the field $\psi(q, y)$ such that at $q_r$ it has the discontinuity

$$\psi(q_r^{+}, y) = \psi_r(y) , \tag{4.25a}$$

$$\psi(q_r^{-}, y) = \psi_r(y)^{x(q_r^{-}) / x(q_r^{+})} . \tag{4.25b}$$

In other words,

$$\psi(q, y)^{1/x(q)} \tag{4.26}$$

is continuous in $q$. We may set at the discontinuity

$$\psi(q_r, y) = \psi_r(y) . \tag{4.27}$$

A graphic reminder of the way $x(q)$ and $\psi(q, y)$ are defined at the discontinuity is shown in Fig. 1. Note that $r$ was converted to $q$ differently for $x_r$ and $\psi_r$, cf. Eqs. (4.24) and (4.27). All fields appearing below follow the convention (4.25a), (4.27). In the interval $(q_{r-1}, q_r)$ we define the $\psi(q, y)$ based on (4.15a) as
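$$\psi(q, y) = \int Dz\, \psi\big( q_r^{-},\, y + z \sqrt{q_r - q} \big) , \tag{4.28}$$

ensuring that (4.25a) holds for $r \to r - 1$. Relation (4.28) says that the $\psi(q, y)$ evolves in the open interval from $q_r$ to $q_{r-1}$ by the linear diffusion equation

$$\partial_q \psi = -\tfrac{1}{2} \partial_y^2 \psi . \tag{4.29}$$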
Fig. 1. Schematic behavior of $x(q)$, $\psi(q, y)$, and $\varphi(q, y)$ at a discontinuity point $q_r$. A fixed $y$ is assumed. The function $\varphi(q, y)$ is continuous in $q$ but has a discontinuous derivative. The two limits of $\psi(q_r, y)$ are related through Eqs. (4.25a) and (4.25b). The circles are placed where the function value is not taken as the limit.
Near the discontinuity of $x(q)$ another differential equation can be derived. Let us differentiate Eq. (4.26) by $q$ as

$$\partial_q \psi^{1/x} = \frac{1}{x}\, \psi^{1/x - 1} \Big( \partial_q \psi - \frac{\dot{x}}{x}\, \psi \ln \psi \Big) . \tag{4.30}$$

Since $\psi(q, y)^{1/x(q)}$ is continuous in $q$ while $\psi(q, y)$ and $x(q)$ are not, the two singular derivatives on the r.h.s. must cancel in leading order. Hence we obtain

$$\partial_q \psi = \frac{\dot{x}}{x}\, \psi \ln \psi \tag{4.31}$$

in an infinitesimal neighborhood of $q_r$. The above derivation is apparently unfounded, because at a discontinuity the rules of differentiation used in (4.30) lose meaning. However, considering (4.31) at a fixed $y$ as an ordinary differential equation separable in $q$ helps us through the discontinuity, and we obtain

$$\int_{\psi(q_r^{-}, y)}^{\psi(q_r^{+}, y)} \frac{d\psi}{\psi \ln \psi} = \int_{x(q_r^{-})}^{x(q_r^{+})} \frac{dx}{x} . \tag{4.32}$$

The integrals yield

$$\ln \ln \psi(q, y) \Big|_{q_r^{-}}^{q_r^{+}} = \ln x(q) \Big|_{q_r^{-}}^{q_r^{+}} , \tag{4.33}$$

whence by exponentiating twice we recover the continuity condition for (4.26). In conclusion, for a discontinuous $x(q)$ Eq. (4.31) can indeed be interpreted as the differential form of the prescription that (4.26) is continuous in $q$.
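For completeness, the step of exponentiating (4.33) twice can be written out explicitly. The short derivation below is a sketch that assumes $\ln \psi > 0$ on both sides of the discontinuity, so that the double logarithm is defined; it uses only (4.23), (4.25) and (4.33), with $\psi$ denoting the field introduced in (4.25).

```latex
\begin{align*}
\ln\ln\psi(q_r^{+},y) - \ln\ln\psi(q_r^{-},y)
  &= \ln x(q_r^{+}) - \ln x(q_r^{-}) \\
\Longrightarrow\quad
\frac{\ln\psi(q_r^{+},y)}{\ln\psi(q_r^{-},y)}
  &= \frac{x(q_r^{+})}{x(q_r^{-})} = \frac{x_{r+1}}{x_r} \\
\Longrightarrow\quad
\psi(q_r^{+},y)^{1/x(q_r^{+})}
  &= \psi(q_r^{-},y)^{1/x(q_r^{-})} .
\end{align*}
```

The last line states the continuity of (4.26) across $q_r$, and solving it for $\psi(q_r^{-}, y)$ with $\psi(q_r^{+}, y) = \psi_r(y)$ reproduces (4.25b).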
Concatenation of (4.29) and (4.31) gives, with regard to the initial condition (4.16), the PDE

$$\partial_q \psi = -\tfrac{1}{2} \partial_y^2 \psi + \frac{\dot{x}}{x}\, \psi \ln \psi , \tag{4.34a}$$

$$\psi(1, y) = e^{\Phi(y)} . \tag{4.34b}$$
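Indeed, at a $q_r$ the $\dot{x}(q)$ is singular, so the second term on the r.h.s. dominates and we recover (4.31), whereas within an interval $\dot{x}(q) \equiv 0$ and thus (4.29) holds. The transformation analogous to (4.18) is

$$\psi(q, y) = e^{\varphi(q, y)\, x(q)} , \tag{4.35}$$

and gives rise to

$$\partial_q \varphi = -\tfrac{1}{2} \partial_y^2 \varphi - \tfrac{1}{2}\, x\, (\partial_y \varphi)^2 , \tag{4.36a}$$

$$\varphi(1, y) = \Phi(y) . \tag{4.36b}$$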
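The passage from (4.34a) to (4.36a) can be verified by direct substitution. The following worked steps are a sketch that treats $x(q)$ as differentiable at the point considered, with $\psi$ the field of (4.34), $\varphi$ the field of (4.36), and a dot denoting the derivative by $q$.

```latex
\begin{align*}
\psi = e^{x\varphi}:\qquad
\partial_q\psi &= \psi\,\bigl(\dot{x}\,\varphi + x\,\partial_q\varphi\bigr), \\
\partial_y^2\psi &= \psi\,\bigl[x\,\partial_y^2\varphi + x^2(\partial_y\varphi)^2\bigr], \\
\frac{\dot{x}}{x}\,\psi\ln\psi &= \psi\,\dot{x}\,\varphi . \\[4pt]
\text{Inserting into (4.34a):}\qquad
\dot{x}\,\varphi + x\,\partial_q\varphi
  &= -\tfrac12\bigl[x\,\partial_y^2\varphi + x^2(\partial_y\varphi)^2\bigr] + \dot{x}\,\varphi \\
\Longrightarrow\qquad
\partial_q\varphi &= -\tfrac12\,\partial_y^2\varphi - \tfrac12\,x\,(\partial_y\varphi)^2 .
\end{align*}
```

The terms proportional to $\dot{x}$ cancel identically, which is why no derivative of $x(q)$ appears in (4.36a) and why $\varphi$ can stay continuous where $x(q)$ jumps. At $q = 1$, where $x = 1$, the substitution turns (4.34b) into (4.36b).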
It follows that when $x(q)$ has a finite discontinuity then the field $\varphi(q, y)$ is continuous in $q$, as in Fig. 1. This is in accordance with the continuity condition expressed by (4.25b). The PDE (4.36a) can be rewritten via the transformation $q = q(x)$ to one evolving in $x$, a PDE first proposed by Parisi with a special initial condition for the SK model [14,84]. In this paper we refer to (4.36) and its equivalents as Parisi's PDE, PPDE for short. When $x(q) \equiv \mathrm{const.}$, differentiation of the PPDE (4.36) in terms of $y$ gives the Burgers equation for the field $\partial_y \varphi$. Then the derivative of Eq. (4.35) by $y$ corresponds to the Cole–Hopf transformation formula [224,225], which converts the Burgers equation into the PDE for linear diffusion, here (4.34) with $\dot{x} \equiv 0$. If $x(q)$ is not a constant, (4.35) connects two non-linear PDEs. We shall refer to (4.35) as the Cole–Hopf transformation. In the case of a discontinuous initial condition $\Phi(y)$ the Cole–Hopf transformation (4.35) connects two discontinuous functions at $q = 1$, while generically diffusion smoothens the discontinuity for $q < 1$. Even if we succeed in defining the PDEs for non-differentiable initial conditions, the equivalence of Eqs. (4.34a) and (4.34b) and Eq. (4.36) is doubtful. In case of ambiguity, precedence is taken by the PDE (4.34a), (4.34b), which directly follows from the recursion (4.15). The question of discontinuity in the initial condition will be discussed later. Our main focus is the term (4.20), now also a functional of $x(q)$,

$$\varphi[\Phi(y), x(q)] = \int Dz\, \varphi\big( q_0,\, z \sqrt{q_0} \big) , \tag{4.37}$$
where $n = 0$ is implied. Note that in the interval $(0, q_0)$ $x(q) \equiv 0$, so (4.36a) becomes the PDE for linear diffusion, whose solution at $q = 0$ is given by the r.h.s. of (4.37). Thus

$$\varphi[\Phi(y), x(q)] = \varphi(0, 0) . \tag{4.38}$$

In the above PDEs $q$ is a time-like variable evolving from 1 to 0. In the context of the PDEs we will refer to $q$ as time, and the ordinary derivative by $q$ will be denoted by a dot. The above PDEs can be considered as non-linear diffusion equations in reverse time direction.

E. Ott has kindly called our attention to the Cole–Hopf transformation.
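Next we study the case of $\hat{Q}$ with Parisi elements obeying (4.7). Then the PDE obtained for the field $\hat{\psi}(\hat{q}, y)$ by continuation contains the function $\hat{x}(\hat{q})$. We obtain

$$\partial_{\hat{q}} \hat{\psi} = -\tfrac{1}{2} \partial_y^2 \hat{\psi} + \frac{\dot{\hat{x}}}{\hat{x}}\, \hat{\psi} \ln \hat{\psi} , \tag{4.39a}$$

$$\hat{\psi}(\hat{q}_R, y) = \int Dz\, e^{\Phi( y + i z \sqrt{\hat{q}_R} )} , \tag{4.39b}$$

where $\hat{\psi}(\hat{q}_R, y)$ is real due to the symmetry of the $Dz$ measure. Alternatively, with the Cole–Hopf transformation (4.35), we have

$$\partial_{\hat{q}} \hat{\varphi} = -\tfrac{1}{2} \partial_y^2 \hat{\varphi} - \tfrac{1}{2}\, \hat{x}\, (\partial_y \hat{\varphi})^2 , \tag{4.40a}$$

$$\hat{\varphi}(\hat{q}_R, y) = \ln \int Dz\, e^{\Phi( y + i z \sqrt{\hat{q}_R} )} . \tag{4.40b}$$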
The existence of the integral is a sensitive question here, because the imaginary term in the argument expresses the fact that $\exp \Phi$ is evolved by backward diffusion. The meaningfulness of the above initial condition should be checked case by case. Then the sought term is

$$\varphi[\Phi(y), \hat{Q}]\Big|_{n=0} = \varphi[\Phi(y), \hat{x}(\hat{q})] = \hat{\varphi}(0, 0) . \tag{4.41}$$

In contrast to the PDEs associated with the matrix $Q$ of naturally bounded elements, where the time span of the evolution is the unit interval, in the case of $\hat{Q}$ the PDEs' evolution interval is not fixed a priori. Now $\hat{q}$ goes from $\hat{q}_R$ to 0, where $\hat{q}_R$ itself is a thermodynamical variable subject to extremization. Finally, we emphasize that the recursive technique may be able to treat non-monotonic $q_r$ sequences. Indeed, if $q_r < q_{r-1}$ then an imaginary term would multiply $z$ in the integrand on the r.h.s. of (4.15a), but the l.h.s. would be a real function. If the integrals involved exist then there is no obstacle to extending the theory to non-monotonic $q_r$'s. Such a case did not, however, arise in our explorations. As we shall see in Section 6.1, the OPF $x(q)$ is a probability measure, a property that non-monotonicity would contradict. On the other hand, $\hat{Q}$ can be considered as associated with a non-monotonic $\hat{q}_r$ sequence. Its diagonals vanish, $\hat{q}_{aa} \equiv \hat{q}_{R+1} = 0$, and so the step from $\hat{q}_R$ to $\hat{q}_{R+1} = 0$ goes against the trend of the otherwise supposedly monotonically increasing $\hat{q}_r$ sequence, $r = 0, \dots, R$. Accordingly, an imaginary factor of $z$ appears on the r.h.s. of (4.40b), and the recursion is as meaningful as it was in the case of a monotonic $q_r$ sequence. The generalization of the picture above is straightforward for an order parameter with more components, when the structure of the free energy term remains essentially the same. We briefly discuss this case in Appendix E.

4.2. Finite and continuous replica symmetry breaking

4.2.1. The continuous limit

If the minimum of the free energy is found at an OPF given in (4.21) with $R = \infty$, then the $q_r$'s accumulate infinitesimally closely in some region. If this happens in an interval, the OPF $x(q)$ is expected to increase there strictly monotonically, given its physical interpretation as the mean probability distribution of the overlaps, as discussed in Section 6.1. Within that interval the recursions
go over to the PDEs of Section 4.1.2. In other regions in $q$, where $x(q)$ remains a step function, the recursions discussed in Section 4.1.1 can be used, but the PDEs are also still valid, as described in Section 4.1.2. In either case, the PDEs are applicable independent of whether the minimizing OPF is continuous or step-like. In Appendix D we discuss the continuation method of Ref. [226].
In physical systems so far, including spin glass and neural network models, out of finite $R$'s only $R = 0$ and $R = 1$ RSB phases were found thermodynamically stable. The significance of $2 \le R < \infty$ RSB seems to be in approximating the $R = \infty$ case. Generically, both finite and infinite $R$ states are characterized by the border values:
\[ q_{(0)} = \begin{cases} q_0 & \text{if } R < \infty, \\ \lim_{R\to\infty} q_0 & \text{if } R = \infty, \end{cases} \tag{4.42a} \]
\[ q_{(1)} = \begin{cases} q_R & \text{if } R < \infty, \\ \lim_{R\to\infty} q_R & \text{if } R = \infty, \end{cases} \tag{4.42b} \]
where $q_{(0)} \ge 0$ and $q_{(1)} \le 1$. These are delimiters of the trivial plateaus of the OPF $x(q)$ as
\[ x(q) \equiv \begin{cases} 0 & \text{if } 0 \le q < q_{(0)}, \\ 1 & \text{if } 1 \ge q \ge q_{(1)}. \end{cases} \tag{4.43} \]
The border values (4.42) apply to both the finite and infinite $R$ cases, the difference remaining in the shape of the OPF $x(q)$ within the interval $(q_{(0)}, q_{(1)})$. Here we assumed $q_{R+1} \equiv q_D = 1$; this makes the $q = 1$ value special and we will use that in the general discussion.
When $R = \infty$ a typical situation is when extremization of the free energy yields
\[ x(q) = \begin{cases} 0 & \text{if } 0 \le q < q_{(0)}, \\ x_c(q), \quad x_{(0)} \le x_c(q) < x_{(1)}, \quad 0 < \dot x_c(q) < \infty & \text{if } q_{(0)} \le q < q_{(1)}, \\ 1 & \text{if } q_{(1)} \le q \le 1 . \end{cases} \tag{4.44} \]
In words, the OPF has a strictly increasing, continuous segment $x_c(q)$ between the border values (4.42). Here $x_{(1)} = x(q_{(1)}^{-})$. The case with an OPF having a smooth, strictly increasing segment $x_c(q)$ will be referred to as continuous RSB (CRSB). Obviously, CRSB always implies $R \to \infty$. In principle, the OPF may be more complicated than (4.44), e.g., there may be non-trivial plateaus ($x \ne 0, 1$) and several $x_c(q)$ segments separated by them. So far, however, no system was found whose replica solution involved more than one strictly increasing segment $x_c(q)$, separated by a plateau or a discontinuity.
In what follows we will use the term "continuation" when we understand the $n \to 0$ limit and the usage of $x(q)$ based on Eqs. (4.12) and (4.21), allowing for but not necessarily implying CRSB. If the OPF in question is $\hat x(\hat q)$, defined analogously to (4.21) with the parameters $\{\hat q_r, \hat x_r\}$, continuation goes along similar lines.

4.2.2. Derivatives of Parisi's PDE

The iterations derived in Section 4.1.1 only describe finite R-RSB, including the $R = 0$ replica symmetric case, while the PDEs incorporate both finite and continuous RSB. We therefore study the PDEs.
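For later purposes it is worth summarizing some PDEs related to the PPDE (4.36) and its Cole-Hopf transformed Eq. (4.34). The field
\[ \mu(q, y) = \partial_y \varphi(q, y) \tag{4.45} \]
satisfies the PDE
\[ \partial_q \mu = -\tfrac{1}{2}\partial_y^2 \mu - x \mu\, \partial_y \mu , \tag{4.46a} \]
\[ \mu(1, y) = \Phi'(y) , \tag{4.46b} \]
obtained from the PPDE by differentiation in terms of $y$. One more differentiation introduces
\[ \kappa(q, y) = \partial_y \mu(q, y) , \tag{4.47} \]
which evolves according to
\[ \partial_q \kappa = -\tfrac{1}{2}\partial_y^2 \kappa - x\,(\kappa^2 + \mu\, \partial_y \kappa) , \tag{4.48a} \]
\[ \kappa(1, y) = \Phi''(y) . \tag{4.48b} \]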
Note that while the PPDE (4.36) and Eq. (4.46) are self-contained equations, in principle solvable for the respective fields, (4.48) is not such and should rather be considered as a relation between the fields $\mu(q, y)$ and $\kappa(q, y)$.
The Cole-Hopf transformation for the first derivative $\mu(q, y)$ can be conveniently defined as the field $\mu(q, y)\psi(q, y)$. This can be further differentiated to produce the Cole-Hopf transformed field for $\kappa(q, y)$. The PDEs for the transformed fields each reduce to the linear diffusion equation along plateaus of $x(q)$.

4.2.3. Linearized PDEs and their adjoints

As we shall see, in the calculation of expectation values linear PDEs associated with the above equations play an important role. A perturbation $\varphi(q, y) + \epsilon\,\vartheta(q, y)$ around a known solution $\varphi(q, y)$ of the PPDE itself satisfies the PPDE to $O(\epsilon)$ if
\[ \partial_q \vartheta = -\tfrac{1}{2}\partial_y^2 \vartheta - x \mu\, \partial_y \vartheta . \tag{4.49} \]
This equation is satisfied by $\mu(q, y)$ with initial condition (4.46b). The field
\[ \eta(q, y) = \partial_y \vartheta(q, y) \tag{4.50} \]
then evolves according to
\[ \partial_q \eta = -\tfrac{1}{2}\partial_y^2 \eta - x\, \partial_y(\mu \eta) , \tag{4.51} \]
obviously satisfied by $\kappa(q, y)$ if the initial condition is specified by (4.48b).
The field $P(q, y)$ adjoint to $\vartheta(q, y)$ and crucial in the computation of expectation values can be introduced by the requirement that
\[ \int dy\, P(q, y)\, \vartheta(q, y) \tag{4.52} \]
be independent of $q$. Differentiating by $q$, using Eq. (4.49), and partially integrating with the assumption that $P(q, y)$ falls off sufficiently fast for large $|y|$, we wind up with the PDE
\[ \partial_q P = \tfrac{1}{2}\partial_y^2 P - x\, \partial_y(\mu P) . \tag{4.53} \]
Here the time $q$ evolves in the forward direction, from 0 to 1. The equivalent of the field $P(q, y)$, evolving from the initial condition, in our notation,
\[ P(0, y) = \delta(y) , \tag{4.54} \]
was introduced by Sompolinsky in a dynamical context for the SK model [34]. In this case the average (4.52) assumes the alternative forms
\[ \int dy\, P(q, y)\, \vartheta(q, y) \equiv \int dy\, P(1, y)\, \vartheta(1, y) \equiv \vartheta(0, 0) . \tag{4.55} \]
Eq. (4.53) is in fact a Fokker-Planck equation with $x(q)\mu(q, y)$ as drift. The initial condition (4.54) is normalized to 1 and localized at the origin. Hence follows the conservation of the norm
\[ \int dy\, P(q, y) \equiv 1 , \tag{4.56} \]
and the non-negativity of the field $P(q, y)$. Thus $P(q, y)$ can be interpreted as a $q$-time-dependent probability density. We will refer to the initial value problem (4.53), (4.54), which determines Sompolinsky's probability field $P(q, y)$, as Sompolinsky's PDE (SPDE) hereafter.
Analogously, the field $S(q, y)$ adjoint to $\eta(q, y)$ satisfies
\[ \partial_q S = \tfrac{1}{2}\partial_y^2 S - x \mu\, \partial_y S , \tag{4.57} \]
which renders
\[ \int dy\, S(q, y)\, \eta(q, y) \tag{4.58} \]
constant in $q$. Obviously $\partial_y S$ satisfies the SPDE (4.53).
The Cole-Hopf transformation can be extended to $\vartheta(q, y)$. This is done by the recipe that in the intervals with $\dot x = 0$ the new field exhibits pure diffusion. Suppose that $\psi(q, y)$ satisfies (4.34), then let
\[ \nu(q, y) = \vartheta(q, y)\, \psi(q, y) , \tag{4.59} \]
whence
\[ \partial_q \nu = -\tfrac{1}{2}\partial_y^2 \nu + \frac{\dot x}{x}\, \nu \ln \psi . \tag{4.60} \]
Similarly, the analog of the Cole-Hopf transformation for the field $P(q, y)$ adjoint to $\vartheta(q, y)$ is
\[ T(q, y) = P(q, y)/\psi(q, y) , \tag{4.61} \]
satisfying
\[ \partial_q T = \tfrac{1}{2}\partial_y^2 T - \frac{\dot x}{x}\, T \ln \psi . \tag{4.62} \]
If $\dot x = 0$ then the PDEs (4.60), (4.62) indeed reduce to the equation for pure diffusion. Based on that, the $\vartheta$ and $P$ fields can be evaluated along plateaus of $x(q)$ straightforwardly.
4.2.4. Green functions

The PDEs previously considered were of the form
\[ \partial_q X(q, y) = \hat L(q, y, \partial_y)\, X(q, y) + h(q, y) , \tag{4.63} \]
where the unknown field is $X(q, y)$ and the time $q$ may evolve in either increasing or decreasing direction. The differential operator $\hat L(q, y, \partial_y)$ is possibly non-linear in $X$, may be $q$- and $y$-dependent, and contains partial derivatives by $y$. For vanishing argument $X = 0$ the operator gives zero, $\hat L\, 0 = 0$. We included the additive term $h(q, y)$ for the sake of generality; it was absent from the PDEs we encountered so far.
In what follows we shall introduce Green functions (GFs) for linear as well as non-linear PDEs. Suppose that $X(q, y)$ is the unique solution of a PDE like (4.63) with some initial condition. The GF associated with the PDE for the field $X(q, y)$ is defined as
\[ G_X(q_1, y_1; q_2, y_2) = \frac{\delta X(q_1, y_1)}{\delta X(q_2, y_2)} . \tag{4.64} \]
This may be viewed as the response of the solution $X$ at $q_1$ to an infinitesimal change of the initial condition at $q_2$. The above definition yields a retarded GF, that is, if the PDE for $X$ evolves towards increasing (decreasing) $q$ then the GF vanishes for $q_1 < q_2$ ($q_1 > q_2$). Obviously
\[ G_X(q, y_1; q, y_2) = \delta(y_1 - y_2) . \tag{4.65} \]
The chain rule for the functional derivative in (4.64) can be expressed as
\[ G_X(q_1, y_1; q_2, y_2) = \int dy\, G_X(q_1, y_1; q, y)\, G_X(q, y; q_2, y_2) , \tag{4.66} \]
where $q$ is in the interval delimited by $q_1$ and $q_2$. This is just the customary composition rule for GFs. In terms of the adjoint property, (4.66) means that the adjoint field to the GF in its fore variables is the same GF in its hind variables. The PDEs the GF satisfies in its fore and hind variables are, therefore, each other's adjoint equations.
The definition (4.64) applies both to linear and non-linear PDEs (4.63). It is the specialty of the linear PDE that $G_X(q_1, y_1; q_2, y_2)$ satisfies the same PDE in the variables $q_1, y_1$ with additive term $h(q, y) = \pm\delta(q_1 - q_2)\,\delta(y_1 - y_2)$, where the sign is $+$ if the time $q$ in the PDE (4.63) increases and $-$ if it decreases. Then the solution can be given in terms of the GFs in the usual form
\[ X(q_1, y_1) = \int dy_2\, G_X(q_1, y_1; q_2, y_2)\, X(q_2, y_2) + \int dy_2 \int_{q_2}^{q_1} dq\, G_X(q_1, y_1; q, y_2)\, h(q, y_2) . \tag{4.67} \]
If the PDE for $X$ is non-linear then $G_X(q_1, y_1; q_2, y_2)$ is the GF for the PDE that is obtained from the aforementioned PDE by linearization as performed at the beginning of Section 4.2.3. In short, the GF of a non-linear PDE is the GF of its linearized version. Note that for a non-linear PDE the GF is associated with a solution $X$ of it, for that solution usually enters some coefficients in the linearized PDE the GF satisfies.
Suppose now that the differential operator in (4.63) is $\hat L(q, \partial_y)$, i.e., it is translation invariant in $y$. Such is the case for the PPDE (4.36) and its derivatives. Then it is easy to see that
\[ Y(q, y) = \partial_y X(q, y) \tag{4.68} \]
will obey the PDE that is the linearization of the PDE for $X$. Therefore
\[ Y(q_1, y_1) = \int dy_2\, G_X(q_1, y_1; q_2, y_2)\, Y(q_2, y_2) + \int dy_2 \int_{q_2}^{q_1} dq\, G_X(q_1, y_1; q, y_2)\, \partial_{y_2} h(q, y_2) . \tag{4.69} \]
If the PDE for $X$ is non-linear then Eq. (4.67) does not but Eq. (4.69) does hold. The latter, however, is merely an identity and should not be considered as the solution producing $Y$ from an initial condition, because in order to calculate $G_X$ the knowledge of $X$ and thus that of $Y$ is necessary.
A prominent role will be played by the GF $G_\varphi(q_1, y_1; q_2, y_2)$ for the field $\varphi(q, y)$ from the PPDE (4.36). The linearization of the PPDE yielded Eq. (4.49) and the linearization of the derivative of the PPDE, Eq. (4.46), produced Eq. (4.51). Therefore the respective GFs are identical,
\[ G_\varphi(q_1, y_1; q_2, y_2) = G_\vartheta(q_1, y_1; q_2, y_2) , \tag{4.70} \]
\[ G_\mu(q_1, y_1; q_2, y_2) = G_\eta(q_1, y_1; q_2, y_2) . \tag{4.71} \]
Given the initial condition (4.54) of the SPDE, its solution is
\[ P(q, y) = G_P(q, y; 0, 0) . \tag{4.72} \]
The GFs $G_P$ and $G_\varphi$ were discussed for the SK model in Ref. [29]. Considering the constancy of (4.52) and (4.58) we have
\[ G_\varphi(q_1, y_1; q_2, y_2) = G_P(q_2, y_2; q_1, y_1) , \tag{4.73} \]
\[ G_\mu(q_1, y_1; q_2, y_2) = G_S(q_2, y_2; q_1, y_1) . \tag{4.74} \]
An identity between derivatives of GFs can be obtained from Eqs. (4.50) and (4.67) as
\[ \partial_{y_2} G_\mu(q_1, y_1; q_2, y_2) = -\partial_{y_1} G_\varphi(q_1, y_1; q_2, y_2) . \tag{4.75} \]
Because of their central significance, we display the equations the GF of the field $\varphi$ satisfies. In its fore set of arguments $G_\varphi(q_1, y_1; q_2, y_2)$ satisfies
\[ \partial_{q_1} G_\varphi = -\tfrac{1}{2}\partial_{y_1}^2 G_\varphi - x(q_1)\, \mu(q_1, y_1)\, \partial_{y_1} G_\varphi - \delta(q_1 - q_2)\,\delta(y_1 - y_2) , \tag{4.76} \]
where the differential operator on the r.h.s. is the same as on the r.h.s. of (4.49). In the hind set, with regard to the identity (4.73) and the SPDE (4.53), we obtain a PDE
\[ \partial_{q_2} G_\varphi = \tfrac{1}{2}\partial_{y_2}^2 G_\varphi - x(q_2)\, \partial_{y_2}\big(\mu(q_2, y_2)\, G_\varphi\big) + \delta(q_1 - q_2)\,\delta(y_1 - y_2) \tag{4.77} \]
whose r.h.s. contains the same differential operator as the r.h.s. of the SPDE. The norm in the second $y$ argument is conserved as
\[ \int G_\varphi(q_1, y_1; q_2, y_2)\, dy_2 \equiv 1 \tag{4.78} \]
for $q_1 \le q_2$.
Eq. (4.67) shows how a particular solution of the linear PDE with a source can be expressed by means of the GF. For example, suppose that the source field $h(q, y)$ is added to the linearized PPDE as
\[ \partial_q \vartheta = -\tfrac{1}{2}\partial_y^2 \vartheta - x \mu\, \partial_y \vartheta + h \tag{4.79} \]
and an initial condition $\vartheta(q_1, y)$ is set for some $0 < q_1 \le 1$. Then we have the solution for $0 \le q \le q_1$ in the form
\[ \vartheta(q, y) = \int dy_1\, G_\varphi(q, y; q_1, y_1)\, \vartheta(q_1, y_1) - \int_q^{q_1} dq_2 \int dy_2\, G_\varphi(q, y; q_2, y_2)\, h(q_2, y_2) . \tag{4.80} \]
The derivative field (4.45) satisfies (4.46). Thus it also satisfies the above PDE (4.79) with zero source; whence
\[ \mu(q, y) = \int dy_1\, G_\varphi(q, y; 1, y_1)\, \Phi'(y_1) . \tag{4.81} \]
Differentiation of $\mu$ gives $\kappa$ as in (4.47), which satisfies the PDE (4.48). Its solution can be expressed in terms of the GF associated with $\mu$ as
\[ \kappa(q, y) = \int dy_1\, G_\mu(q, y; 1, y_1)\, \Phi''(y_1) . \tag{4.82} \]
Note that relation (4.75) is necessary to maintain (4.47).
So far we considered the GFs of $\varphi$ and its derivative fields. It is also instructive to see their relation to the GF of the field $\psi$. Starting from the definition (4.64) of the GF and using the Cole-Hopf formula (4.35) we get
\[ G_\psi(q_1, y_1; q_2, y_2) = \frac{x(q_1)\,\psi(q_1, y_1)}{x(q_2)\,\psi(q_2, y_2)}\, G_\varphi(q_1, y_1; q_2, y_2) . \tag{4.83} \]
From the PDEs (4.76) and (4.77) for $G_\varphi$ we have for $G_\psi(q_1, y_1; q_2, y_2)$
\[ \partial_{q_1} G_\psi = -\tfrac{1}{2}\partial_{y_1}^2 G_\psi + \frac{\dot x(q_1)}{x(q_1)}\,\big(\ln \psi(q_1, y_1) + 1\big)\, G_\psi - \delta(q_1 - q_2)\,\delta(y_1 - y_2) , \tag{4.84a} \]
\[ \partial_{q_2} G_\psi = \tfrac{1}{2}\partial_{y_2}^2 G_\psi - \frac{\dot x(q_2)}{x(q_2)}\,\big(\ln \psi(q_2, y_2) + 1\big)\, G_\psi + \delta(q_1 - q_2)\,\delta(y_1 - y_2) . \tag{4.84b} \]
Eq. (4.84a) could also be obtained by linearization of the PDE (4.34a), while (4.84b) is its adjoint. These PDEs are particularly useful if $\dot x(q) = 0$, because then they reduce to pure diffusion. One can view relation (4.83) as the translation of the Cole-Hopf transformation (4.35) onto the GFs. We again see the advantage of keeping track of a Cole-Hopf transformed pair like $G_\psi$ and $G_\varphi$, because $G_\psi$ is simple for plateaus in $x(q)$ and $G_\varphi$ is useful when $\dot x(q) > 0$, especially at jumps.
Notation in subsequent sections can be shortened by the introduction of what we shall call vertex functions
\[ \Gamma_{\varphi\varphi\varphi}(q; \{q_i, y_i\}_{i=1}^{3}) = \int dy\, G_\varphi(q_1, y_1; q, y)\, G_\varphi(q, y; q_2, y_2)\, G_\varphi(q, y; q_3, y_3) , \tag{4.85} \]
\[ \Gamma_{\varphi\mu\mu}(q; \{q_i, y_i\}_{i=1}^{3}) = \int dy\, G_\varphi(q_1, y_1; q, y)\, G_\mu(q, y; q_2, y_2)\, G_\mu(q, y; q_3, y_3) . \tag{4.86} \]
The ordering $q_1 \le q \le q_2$, $q_1 \le q \le q_3$ is understood. The vertex functions satisfy the appropriate linear PDE in each pair $q_i, y_i$; furthermore, if $q$ coincides with, say, $q_j$ then the vertex functions reduce to the product of the other two GFs with $q_i$, $i \ne j$.
As shown for $q_1 < q < q_2$, $q_1 < q < q_3$ in Appendix F, we have the useful identity
\[ \partial_q \Gamma_{\varphi\varphi\varphi} = \partial_{y_2} \partial_{y_3} \Gamma_{\varphi\mu\mu} . \tag{4.87} \]
A notable consequence of that is obtained from the fact that $\mu(q, y)$ of Eq. (4.45) and $\kappa(q, y)$ of Eq. (4.48) are evolved by $G_\varphi$ and $G_\mu$, respectively, as follows from Eq. (4.69). Therefore, multiplication of (4.87) by the initial conditions $\Phi'(y_i) = \mu(1, y_i)$, for $i = 2, 3$, and integration over those $y_i$'s gives for $q_1 < q$
\[ \partial_q \int dy\, G_\varphi(q_1, y_1; q, y)\, \mu(q, y)^2 = \int dy\, G_\varphi(q_1, y_1; q, y)\, \kappa(q, y)^2 . \tag{4.88} \]
The mathematical properties of the PDEs will acquire physical meaning in subsequent chapters where thermodynamical properties are studied.

4.2.5. Evolution along plateaus

Here we collect the few obvious formulas describing the evolution of some fields along the trivial $x = 0$ and $x = 1$ plateaus, and give the GF for $\varphi$ for any plateau.
Let us consider firstly the $x = 0$ plateau, i.e., the region $0 \le q < q_{(0)}$. We recall the Cole-Hopf formula (4.35) for the field $\psi(q, y)$ to obtain
\[ \psi(q, y) \equiv 1 . \tag{4.89} \]
The field $\varphi(q, y)$ obeys the PPDE (4.36), thus is purely diffusive for $x = 0$ as
\[ \varphi(q, y) = \int Dz\, \varphi\big(q_{(0)}, y + z\sqrt{q_{(0)} - q}\big) . \tag{4.90} \]
Due to continuity of $\varphi$ in $q$ this also holds for $q = q_{(0)}$. The probability field $P(q, y)$ from the SPDE (4.53), (4.54) is the Gaussian function
\[ P(q, y) = G(y, q) , \tag{4.91} \]
where the notation
\[ G(x, \sigma) = \frac{1}{\sqrt{2\pi\sigma}} \exp\!\left(-\frac{x^2}{2\sigma}\right) \tag{4.92} \]
was used.
In the region $q_{(1)} \le q \le 1$, which is the $x = 1$ plateau, we have
\[ \psi(q, y) = \int Dz\, \exp \Phi\big(y + z\sqrt{1 - q}\big) , \tag{4.93a} \]
\[ \varphi(q, y) = \ln \psi(q, y) . \tag{4.93b} \]
The time-dependent probability field $P(q, y)$ is best evaluated along plateaus by its own version, (4.61), of the Cole-Hopf transformation. The transformed field $T(q, y)$ obeys (4.62), so it reduces to pure diffusion along a plateau. Thus, assuming the knowledge of $P(q_{(1)}, y)$ and having $\varphi(q, y)$ from (4.93) we get
\[ P(q, y) = e^{\varphi(q, y)} \int Dz\, e^{-\varphi\left(q_{(1)},\, y + z\sqrt{q - q_{(1)}}\right)}\, P\big(q_{(1)}, y + z\sqrt{q - q_{(1)}}\big) . \tag{4.94} \]
The GF for the field $\varphi$, $G_\varphi$, will be given on any plateau. Suppose that $\dot x(q) \equiv 0$ in the closed interval $[q_1, q_2]$. Then from (4.84), for a positive plateau value $x$, $G_\psi$ is a Gaussian function. Then $G_\varphi$ becomes from (4.83)
\[ G_\varphi(q_1, y_1; q_2, y_2) = e^{x\varphi(q_2, y_2) - x\varphi(q_1, y_1)}\, G(y_2 - y_1, q_2 - q_1) , \tag{4.95} \]
where the notation (4.92) has been used. The GF remains to be determined on the trivial plateau $x = 0$; it is obtained from, say, (4.76) as
\[ G_\varphi(q_1, y_1; q_2, y_2) = G(y_2 - y_1, q_2 - q_1) . \tag{4.96} \]
This is the same as we would get from (4.95) by substituting $x = 0$.
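As a quick numerical illustration of the plateau formulas, the following sketch checks that the Gaussian GF (4.96) on the $x = 0$ plateau obeys the composition rule (4.66) and the norm conservation (4.78). It is only a sketch: the integration grid, the sample values of $q$ and $y$, and the function names are arbitrary choices made here for illustration and do not come from the text.

```python
import numpy as np

def gauss(x, sigma):
    """Gaussian G(x, sigma) of Eq. (4.92); sigma is the variance."""
    return np.exp(-x**2 / (2.0 * sigma)) / np.sqrt(2.0 * np.pi * sigma)

# Integration grid for the intermediate field y (assumed wide and fine enough).
grid = np.linspace(-12.0, 12.0, 4801)
dy = grid[1] - grid[0]

# Arbitrary sample times q1 < q < q2 inside an x = 0 plateau and sample fields.
q1, q, q2 = 0.1, 0.35, 0.8
y1, y2 = 0.3, -0.7

# Left-hand side of (4.66): the GF (4.96) taken directly from q1 to q2.
lhs = gauss(y2 - y1, q2 - q1)

# Right-hand side of (4.66): two GFs joined at the intermediate time q and
# integrated over the intermediate field with a simple Riemann sum.
rhs = np.sum(gauss(grid - y1, q - q1) * gauss(y2 - grid, q2 - q)) * dy

# Norm conservation (4.78): integral over the hind field variable.
norm = np.sum(gauss(grid - y1, q2 - q1)) * dy

print(lhs, rhs, norm)
```

For a plateau with $x > 0$ the same check would additionally need the field $\varphi$ entering the exponential prefactor of (4.95); that case is not covered by this sketch.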
4.2.6. Discontinuous initial conditions

If the initial condition $\Phi(y)$ of the PPDE (4.36) is discontinuous, then special care is necessary near $q = 1$. While strictly speaking the PPDE is defined only for initial conditions twice differentiable by $y$, one may expect that for practical purposes a much less strict condition suffices. For instance, in the textbook example of pure diffusion any function whose convolution with the Gaussian GF gives a finite result can be accepted as initial condition irrespective of its differentiability. The physical picture is that diffusion smoothens steps and spikes and brings the solution into a differentiable form within an infinitesimal amount of time.
The problem with the PPDE for a discontinuous initial condition lies deeper. It can be traced back to the fact that the Cole-Hopf transformation no longer connects the two PDEs (4.34) and (4.36). Even if by means of the Dirac delta we accept differentiation through a discontinuity, the derivatives of $\psi(1, y) = \exp \Phi(y)$ and $\varphi(1, y) = \Phi(y)$ are not related by the chain rule, namely
\[ \Phi'(y)\, e^{\Phi(y)} \ne \big(e^{\Phi(y)}\big)' . \tag{4.97} \]
This can be seen easily by taking for example the step function
\[ \Phi(y) = a\,\theta(y) . \tag{4.98} \]
Then
\[ e^{\Phi(y)} = 1 + (e^a - 1)\,\theta(y) , \tag{4.99} \]
and inequality (4.97) now takes the form
\[ a\,\delta(y)\,[1 + (e^a - 1)\,\theta(y)] \ne (e^a - 1)\,\delta(y) . \tag{4.100} \]
Equality could only be restored if $\theta(y = 0)$ were chosen $a$-dependent, an artifact we do not accept. However, the derivation of the PPDE (4.36) from the PDE (4.34) is invalid if the chain rule cannot be applied.
The difficulty can be circumvented by using the explicit expressions (4.93) for the fields $\psi(q, y)$, $\varphi(q, y)$ in the interval $[q_{(1)}, 1]$. Obviously, even if there is a discontinuity (a finite step) in $\Phi(y)$, $\psi(q, y)$ and thus $\varphi(q, y)$ will become smooth for $q < 1$. For instance for (4.98), using the notation
\[ H(x) = \int_x^\infty Dz = \tfrac{1}{2}\big[1 - \mathrm{erf}(x/\sqrt{2})\big] , \tag{4.101} \]
we have
\[ \psi(q, y) = e^a + (1 - e^a)\, H\big(y/\sqrt{1 - q}\big) \tag{4.102} \]
for $q_{(1)} \le q \le 1$. This is an analytic function for $q \ne 1$ and becomes (4.99) for $q \to 1$. Then $\varphi(q, y)$ is obtained in $[q_{(1)}, 1]$ by (4.93b), also analytic for $q \ne 1$, and $\varphi(1, y)$ becomes indeed (4.98). The above formulas extend down to $q_{(1)}$. Interestingly, as we shall see later, in the limit of the ground state $T \to 0$, we have $q_{(1)} \to 1$, but the discontinuity of the fields equally disappears at $q_{(1)}$, although analyticity will not hold. Thus we have the fields for $q < 1$; the only problem that remains is that we cannot say that $\varphi(q, y)$ satisfies the PPDE (4.36) at $q = 1$, because of inequality (4.97).
The difference in nature between the $\psi$ and $\varphi$ functions for $q_{(1)} \le q \le 1$ can be illustrated by the following. The singularity of the PDEs can be tamed by considering the fields as integral kernels. Let us take an analytic function $a(y)$ such that itself and its derivatives decay sufficiently fast for large arguments and consider
\[ A_\psi(q) = \int dy\, a(y)\, \psi(q, y) . \tag{4.103} \]
Starting from (4.93a), changing the integration variable as $y \to y - z\sqrt{1 - q}$, and formally expanding in terms of $\sqrt{1 - q}$ we get
\[ A_\psi(q) = \sum_{k=0}^{\infty} \frac{(1 - q)^k}{2^k\, k!} \int dy\, a^{(2k)}(y)\, e^{\Phi(y)} , \tag{4.104} \]
where we used $\int Dz\, z^{2k+1} = 0$, $\int Dz\, z^{2k} = (2k - 1)!!$, and the notation
\[ f^{(k)}(x) = \frac{d^k f(x)}{dx^k} . \tag{4.105} \]
On the other hand, a similar procedure can be carried out for
\[ A_\varphi(q) = \int dy\, a(y)\, \varphi(q, y) , \tag{4.106} \]
a case we illustrate on (4.98). From (4.102) we have
\[ \varphi(q, y) = \ln\!\big[e^a + (1 - e^a)\, H\big(y/\sqrt{1 - q}\big)\big] \equiv \varphi\big(y/\sqrt{1 - q}\big) , \tag{4.107} \]
where the last equality defines the single-argument function $\varphi(z)$. Then
\[ A_\varphi(q) = A_\varphi(1) + \int dy\, a(y)\,\big(\varphi\big(y/\sqrt{1 - q}\big) - a\,\theta(y)\big) , \tag{4.108} \]
where $A_\varphi(1)$ was added to and subtracted from the r.h.s. Changing the integration variable as $y \to y\sqrt{1 - q}$ and formally expanding by $\sqrt{1 - q}$ we get
\[ A_\varphi(q) = A_\varphi(1) + \sum_{k=0}^{\infty} \frac{(1 - q)^{(k+1)/2}}{k!}\, a^{(k)}(0) \int dy\, y^k\, \big(\varphi(y) - a\,\theta(y)\big) . \tag{4.109} \]
Thus in leading order we have from (4.104) and (4.109)
\[ A_\psi(q) - A_\psi(1) \propto 1 - q , \tag{4.110a} \]
\[ A_\varphi(q) - A_\varphi(1) \propto \sqrt{1 - q} . \tag{4.110b} \]
So, considering the fields as integral operators in the case of non-differentiable initial conditions, we see from Eqs. (4.110a) and (4.110b) that $\psi$ does but $\varphi$ does not have a finite derivative by $q$ at $q = 1$. This explains why we could maintain the PDE for $\psi$ while the PPDE had to be given up at $q = 1$ with a non-differentiable initial condition.
If the PPDE (4.36) is ill-defined for $q = 1$ then so may be the PDEs for the derivative fields, the linearized PDEs, and the PDEs for the GFs, as discussed in Sections 4.2.2-4.2.4. We settle the ambiguity by redefining the derivative field $\mu(q, y)$ as
\[ \mu(q, y)\, x(q)\, \psi(q, y) = \partial_y \psi(q, y) , \tag{4.111} \]
so in $[q_{(1)}, 1]$, where $x(q) \equiv 1$,
\[ \mu(q, y)\, \psi(q, y) = \int Dz\, \partial_y\, e^{\Phi(y + z\sqrt{1 - q})} . \tag{4.112} \]
For a smooth $\Phi(y)$ one recovers the original definition (4.45) for any $q$. If, however, $\Phi(y)$ is discontinuous then, due to the inequality (4.97), the new formula (4.112) will, in general, differ from (4.45) at $q = 1$. The $\mu(q, y)$ from (4.112) satisfies in $[q_{(1)}, 1]$
\[ \partial_q \mu = -\tfrac{1}{2}\partial_y^2 \mu - \mu\, \partial_y \mu , \tag{4.113a} \]
\[ \mu(1, y)\, e^{\Phi(y)} = \big(e^{\Phi(y)}\big)' . \tag{4.113b} \]
The specialty here is that the derivation of Eqs. (4.113a) and (4.113b) could be done without the now invalid chain rule. The above PDE coincides with (4.46a) at $x = 1$, with an initial condition that may be different from (4.46b).
In a similar spirit it can be shown that the $\mu(q, y)$ redefined above enters the PDEs (4.76) and (4.77) for the GF $G_\varphi$, provided the latter is introduced by first giving $G_\psi$ via (4.64) and then defining $G_\varphi$ via (4.83). Note that the GF $G_\varphi$ is given in the interval $[q_{(1)}, 1]$ by (4.95) with $x = 1$, a smooth function in the $y$-arguments if both $q$ arguments are less than 1.
The continuous framework, with PDEs, was meant to be a practical reformulation of iteration (4.15). Its real use is in the $R \to \infty$ limit, when it allows more liberty in the parametrization of a finite approximation than just taking a large but finite $R$. In case of ambiguity, however, the iteration takes precedence. That argument helped us to refine our formalism of PDEs for discontinuous initial conditions.
In what follows we will use the short notation made possible by the PDE formalism as if we were dealing with a continuous initial condition $\Phi(y)$. However, if $\Phi(y)$ is discontinuous then the PPDE must not be applied at $q = 1$; rather (4.93) yields the field $\varphi(q, y)$ in $[q_{(1)}, 1]$. So although the PPDE is then not true at $q = 1$, we keep it and understand it as the above recipe. The derivative of the PPDE can be upheld with the above definition of the derivative field $\mu$, as can the PDEs for the GF $G_\varphi$. In concrete computations on a discontinuous initial condition we shall see that this takes care of most of the problem.
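The statements of this subsection for the step function (4.98) can be probed numerically. The sketch below, under assumptions chosen here for illustration only (step height $a = 1$, a Gaussian test kernel $a(y) = e^{-(y - 1/2)^2}$, simple midpoint sums, and all function names), first compares the closed form (4.102) with a direct evaluation of the Gaussian integral (4.93a), and then estimates the exponents of $1 - q$ in (4.110a) and (4.110b) from two small values of $1 - q$.

```python
import math
import numpy as np

def H(x):
    """Gaussian tail H(x) = erfc(x / sqrt(2)) / 2, cf. Eq. (4.101)."""
    return 0.5 * np.vectorize(math.erfc)(np.asarray(x, dtype=float) / math.sqrt(2.0))

def psi_closed(q, y, a):
    """Closed form (4.102) for the step initial condition Phi(y) = a*theta(y)."""
    return np.exp(a) + (1.0 - np.exp(a)) * H(y / np.sqrt(1.0 - q))

def psi_quadrature(q, y, a, dz=1e-3, zmax=10.0):
    """Direct midpoint evaluation of (4.93a) for scalar y (illustrative accuracy)."""
    n = int(round(zmax / dz))
    z = (np.arange(-n, n) + 0.5) * dz
    weight = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi) * dz   # measure Dz
    step = (y + z * np.sqrt(1.0 - q)) > 0
    return float(np.sum(weight * np.exp(a * step)))

def scaling_exponents(a=1.0, eps_pair=(1e-3, 1e-4)):
    """Estimate p in A(q) - A(1) ~ (1 - q)^p for psi and for phi = ln(psi)."""
    dy, n = 2e-4, 50000
    y = (np.arange(-n, n) + 0.5) * dy          # midpoints; y = 0 is a cell edge
    kernel = np.exp(-(y - 0.5)**2)             # hypothetical test function a(y)
    step = (y > 0).astype(float)
    d_psi, d_phi = [], []
    for eps in eps_pair:                       # eps plays the role of 1 - q
        psi = np.exp(a) + (1.0 - np.exp(a)) * H(y / np.sqrt(eps))
        d_psi.append(np.sum(kernel * (psi - np.exp(a * step))) * dy)
        d_phi.append(np.sum(kernel * (np.log(psi) - a * step)) * dy)
    log_ratio = np.log(eps_pair[0] / eps_pair[1])
    return (float(np.log(abs(d_psi[0] / d_psi[1])) / log_ratio),
            float(np.log(abs(d_phi[0] / d_phi[1])) / log_ratio))

exp_psi, exp_phi = scaling_exponents()
print(psi_closed(0.5, 0.3, 1.0), psi_quadrature(0.5, 0.3, 1.0))
print(exp_psi, exp_phi)
```

The test kernel is chosen so that both $a(0)$ and $a'(0)$ are non-zero; otherwise the leading terms of (4.104) and (4.109) would vanish and the estimated exponents would change. The estimates are expected to lie near 1 for $\psi$ and near 1/2 for $\varphi$, with deviations coming from the subleading terms of the expansions.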
5. Correlations and thermodynamical stability

5.1. Expectation values

5.1.1. Replica averages

In this section we evaluate important special cases of the generalized averages (3.19) and (3.23) within Parisi's ansatz. In what follows, generically the knowledge of $Q$, or equivalently, in the $n \to 0$ limit, that of $x(q)$ will be assumed. Practically, all fields introduced above as solutions of various PDEs, for given $x(q)$, will be considered as known and expectation values will be expressed in terms of those fields.
The pioneering works on this subject are those of de Almeida and Lage [27] and of Mézard and Virasoro [28], who evaluated the average magnetization and its low-order moments in the SK model. What follows in Section 5.1 can be viewed as the generalization of the mechanism these authors uncovered.
We shall call the variable $y$ in (4.1) the "local field". In the SK model $y$ corresponds to the local magnetic field, for the neuron it is the local stability parameter, and it is useful to have a name for it even in the present framework.
The generic formula comprising (3.19) and (3.23) is
\[ [\![ A(x, y) ]\!] = \int \frac{d^n x\, d^n y}{(2\pi)^n}\, A(x, y)\, \exp\!\Big( \sum_{a=1}^{n} \Phi(y_a) + i\, x y - \tfrac{1}{2}\, x Q x \Big) . \tag{5.1} \]
The normalizing coefficient, analogous to the prefactors in Eqs. (3.19) and (3.23), is not included here, since in the limit $n \to 0$ it becomes unity. We shall automatically disregard such factors henceforth. Furthermore, we will take $n \to 0$ silently whenever appropriate. Dependence on $\Phi$ and $Q$ is not marked on the l.h.s. The quantity (5.1) will be called the replica average of the function $A(x, y)$. Such formulas emerge in most cases when we set out to evaluate thermodynamical quantities in or near equilibrium.

5.1.2. Average of a function of a single local field

A case of import is when the quantity to be averaged depends only on the local field $y_a$ of a single replica. Such is the form of the distribution of stabilities given in Eq. (3.27) and the energy (3.29). Due to the fact that $y_a$ and $x_a$ are each other's Fourier transformed variables, the expectation values of replicated $x$'s, like in Eqs. (3.21), (3.24) and (3.25), are related to the averages of products of functions of local fields $y_a$. The latter can be straightforwardly understood once the case of a function of one $y_a$ argument is clarified. Thus we firstly focus on
\[ C_A = [\![ A(y_1) ]\!] . \tag{5.2} \]
There is no loss of generality in choosing the first replica, $a = 1$, because RSB only affects groups of two or more replicas. Within Parisi's ansatz (4.4) $C_A$ evaluates to a formula like the r.h.s. of (4.11) with the difference that here $A(y_1)$ is inserted into the integrand. In analogy with (C.2) we obtain
\[ C_A = \int \prod_{r=0}^{R+1} \prod_{j_r=1}^{n/m_r} Dz^{(r)}_{j_r}\; A\Big( \sum_{r=0}^{R+1} z^{(r)}_{j_r(1)} \sqrt{q_r - q_{r-1}} \Big)\, \exp \sum_{a=1}^{n} \Phi\Big( \sum_{r=0}^{R+1} z^{(r)}_{j_r(a)} \sqrt{q_r - q_{r-1}} \Big) . \tag{5.3} \]
We used the definition of $j_r(a)$ from Eq. (C.1). In the argument of $A$ the $j_r(1) = 1$ label was inserted for the $z^{(r)}$'s. After a reasoning similar to that followed in Section 4.1.1, again expressing the integer $m_r$ by the real $x_r$ from (4.12) and taking $n \to 0$, we arrive at the recursion
\[ \vartheta_{r-1}(y)\, \psi_{r-1}(y) = \int Dz\, \vartheta_r\big(y + z\sqrt{q_r - q_{r-1}}\big)\, \psi_r\big(y + z\sqrt{q_r - q_{r-1}}\big)^{x_r/x_{r+1}} , \tag{5.4a} \]
\[ \vartheta_{R+1}(y) = A(y) , \tag{5.4b} \]
while the iteration of $\psi_r(y)$ is defined by Eqs. (4.15) and (4.16). The final average is obtained at $r = 0$ as
\[ C_A = \int Dz\, \vartheta_0\big(z\sqrt{q_0}\big) . \tag{5.5} \]
Using the identity (D.1) we are led to the operator form
\[ \vartheta_{r-1}(y)\, \psi_{r-1}(y) = e^{\frac{1}{2}(q_r - q_{r-1})\,\partial_y^2}\, \vartheta_r(y)\, \psi_r(y)^{x_r/x_{r+1}} , \tag{5.6} \]
whence by continuation it is easy to derive the PDE
\[ \partial_q(\vartheta \psi) = -\tfrac{1}{2}\partial_y^2(\vartheta \psi) + \frac{\dot x}{x}\, \vartheta \psi \ln \psi . \tag{5.7} \]
In the spirit of Section 4.1.2 it is straightforward to show that this equation also holds for finite R-RSB. Then at discontinuities of $x(q)$ the singular second term on the r.h.s. is absorbed by the requirement that $\psi(q, y)^{1/x(q)}$ is continuous in $q$. The initial condition for $\psi(q, y)$ was previously given in (4.34b) and that for $\vartheta(q, y)$ is set by (5.4b) as
\[ \vartheta(1, y) = A(y) . \tag{5.8} \]
In Eq. (5.7) we recognize the PDE (4.60) for the field (4.59). Now we again have a product like (4.59), so the field $\vartheta(q, y)$ here also satisfies the PDE (4.49). Thus the sought average (5.5) can be written as
\[ C_A = \int Dz\, \vartheta\big(q_{(0)}, z\sqrt{q_{(0)}}\big) , \tag{5.9} \]
a functional of $\Phi(y)$ and $x(q)$, where the definition of $q_{(0)}$ by (4.42a) was used.
A practical expression for the above average involves the adjoint field $P(q, y)$, obeying the PDE (4.53) and rendering the formula (4.52) independent of $q$. Let us recall the abbreviation for the Gaussian (4.92). Then (5.9) is of the form of (4.52) at $q = q_{(0)}$ if
\[ P(q_{(0)}, y) = G(y, q_{(0)}) . \tag{5.10} \]
Given the purely diffusive evolution in the interval $(0, q_{(0)})$, this condition means that $P(0, y)$ is localized at $y = 0$, i.e. $P(q, y)$ satisfies the SPDE (4.53), (4.54), whence we can write the expectation value in the form (4.55) as
\[ C_A = \int dy\, P(1, y)\, A(y) . \tag{5.11} \]
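The equivalence of (5.9) and (5.11) can be illustrated in the simplest setting, the replica symmetric case $R = 0$, where $x(q) = 0$ for $q < q_0$ and $x(q) = 1$ for $q \ge q_0$, so that only the plateau formulas (4.91), (4.93) and (4.94) are needed and $\vartheta\psi$ is purely diffusive on the $x = 1$ plateau. The sketch below is an illustration under assumptions made here, not a computation from the text: the smooth potential $\Phi = \tanh$, the averaged function $A(y) = y^2$, the value $q_0 = 0.5$, the quadrature sizes and all function names are hypothetical choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes and weights for the weight exp(-z^2/2); dividing by
# sqrt(2*pi) turns the weights into the normalised Gaussian measure Dz.
Z, W = hermegauss(60)
W = W / np.sqrt(2.0 * np.pi)

def gauss_avg(f, y, var):
    """Approximate int Dz f(y + z*sqrt(var)) for every entry of the array y."""
    y = np.asarray(y, dtype=float)
    return (f(y[..., None] + Z * np.sqrt(var)) * W).sum(axis=-1)

def G(y, var):
    """Gaussian of Eq. (4.92) with variance var."""
    return np.exp(-y**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def average_via_theta(A, Phi, q0):
    """Eq. (5.9) for R = 0: theta*psi diffuses from q = 1 down to q0."""
    psi = lambda y: gauss_avg(lambda u: np.exp(Phi(u)), y, 1.0 - q0)
    theta = lambda y: gauss_avg(lambda u: A(u) * np.exp(Phi(u)), y, 1.0 - q0) / psi(y)
    return float(gauss_avg(theta, 0.0, q0))

def final_density(Phi, q0, y):
    """P(1, y) from Eq. (4.94), starting from P(q0, y) = G(y, q0), Eq. (5.10)."""
    psi = lambda u: gauss_avg(lambda v: np.exp(Phi(v)), u, 1.0 - q0)
    return np.exp(Phi(y)) * gauss_avg(lambda u: G(u, q0) / psi(u), y, 1.0 - q0)

PHI = np.tanh                      # hypothetical smooth initial condition Phi(y)
A = lambda y: y**2                 # hypothetical function to be averaged
Q0 = 0.5                           # hypothetical replica symmetric overlap

y = np.linspace(-8.0, 8.0, 801)
dy = y[1] - y[0]
P1 = final_density(PHI, Q0, y)

norm = float(P1.sum() * dy)                 # should be close to 1, cf. (4.56)
c_density = float((P1 * A(y)).sum() * dy)   # Eq. (5.11)
c_theta = average_via_theta(A, PHI, Q0)     # Eq. (5.9)
print(norm, c_density, c_theta)
```

The two values of the average should agree up to quadrature error, and the computed $P(1, y)$ should be normalized, which is the probabilistic reading of $P(1, y)$ as the distribution of the local field.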
Eq. (5.11) is the main result of this section. Here the initial condition (5.8) was used, which is just the function we intended to average. This expression reveals that $P(1, y)$ is the probability distribution of the quantity $y$, or, for a general $q$, $P(q, y)$ is the distribution at an intermediate stage of evolution.
Note that in [18] we gave a shorter derivation for (5.11), which avoided the use of the recursion (5.4). The reason for going the longer way here is that it straightforwardly generalizes to the case of higher-order correlation functions.

5.1.3. Correlations of functions of local fields

The expectation value of a product of functions each depending on a single local field variable reads as
\[ C_{AB \ldots Z}(a, b, \ldots, z) = [\![ A(y_a)\, B(y_b) \cdots Z(y_z) ]\!] . \tag{5.12} \]
This will be called the replica correlation function, or correlator, of the functions $A, B, \ldots, Z$ of respective local fields $y_a, y_b, \ldots, y_z$. Its "order" is the number of different local fields it contains. The natural generalization of the observations in the previous section allows us to construct formulas for the above correlation function. This will be undertaken in the present and the following two sections.
Let us first consider the second-order local field correlator
\[ C_{AB}(a, b) = [\![ A(y_a)\, B(y_b) ]\!] . \tag{5.13} \]
The Parisi ansatz allows us to parametrize $C_{AB}$ by the $q$ variable, rather than the replica indices $a$ and $b$, remnants of the $n \times n$ matrix character of $Q$. This goes as follows. Fixing the replica indices $a$ and $b$ we obtain two iterations like (5.4), with respective initial conditions $A(y)$ and $B(y)$ at $q = 1$. These we denote by $\vartheta_A$ and $\vartheta_B$, respectively. The iterations evolve until they reach an index $r(a, b)$ specified by the property that for $r \le r(a, b)$ all $j_r$ indices coincide, $j_r(a) = j_r(b)$. Here we used the definition of the labels $j_r(a)$ from Eq. (C.1), i.e., if $j_r = 1, \ldots, n/m_r$ are the labels of "boxes" of replicas that contain $m_r$ replicas then $j_r(a)$ is the "serial number" of the box containing the $a$th replica. The $r(a, b)$ marks the largest $r$ index for which the replicas $a$ and $b$ fall into the same box. Obviously, since for decreasing $r$ the box size $m_r$ increases, for any given $r \le r(a, b)$ the said replicas will fall into the same box of size $m_r$. The $r(a, b)$ will be referred to hereafter as the merger index, and is a given function of $a$ and $b$ for a given set of $m_r$'s of Eq. (4.5b).
The hierarchical organization of the replicas implies the following property. Consider three different replica indices $a$, $b$, and $c$. Then either all three merger indices coincide as $r(a, b) = r(a, c) = r(b, c)$, or two merger indices coincide and the third one is larger, e.g. $r(a, c) = r(b, c) < r(a, b)$. This is characteristic for tree-like structures, for example, a maternal genealogical scheme.
The merger index allows us to relabel the matrix elements (4.6) in the Parisi ansatz as
\[ q_{r(a,b)} = q_{ab} . \tag{5.14} \]
This we can consider as the definition of $r(a, b)$, provided that giving $q_r$ uniquely determines $r$, that is, in (4.6) strict inequalities hold.
At the juncture $r = r(a, b)$ the two aforementioned iterations, so far each obeying (5.4a), merge into one, such that the product of the two "incoming" $\vartheta_A$ and $\vartheta_B$ fields at $r = r(a, b)$ gives the initial condition for the one "outgoing" iteration, denoted by $\vartheta_{AB}$.
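The merger index and its tree property can be made concrete with a small enumeration. In the sketch below the box sizes are a hypothetical example (not taken from the text) with $n = m_0 = 12$, $m_1 = 6$, $m_2 = 2$ and $m_3 = 1$, i.e. $R = 2$; replicas are numbered from 1 and the box label is assumed to be the serial number of consecutive blocks of size $m_r$. The function names are illustrative.

```python
from itertools import combinations

def box_label(a, m):
    """Serial number of the box of size m that contains replica a (1-based)."""
    return (a - 1) // m + 1

def merger_index(a, b, box_sizes):
    """Largest r for which replicas a and b share a box of size box_sizes[r]."""
    return max(r for r, m in enumerate(box_sizes)
               if box_label(a, m) == box_label(b, m))

def tree_property_holds(box_sizes):
    """Check that for every triple of replicas the two smallest merger
    indices coincide, so the third one is equal or larger."""
    n = box_sizes[0]
    for a, b, c in combinations(range(1, n + 1), 3):
        lowest, middle, _ = sorted((merger_index(a, b, box_sizes),
                                    merger_index(a, c, box_sizes),
                                    merger_index(b, c, box_sizes)))
        if lowest != middle:
            return False
    return True

# Hypothetical box sizes m_0 >= m_1 >= ... >= m_{R+1} = 1, each dividing the previous one.
box_sizes = [12, 6, 2, 1]

print(merger_index(1, 2, box_sizes))   # replicas 1 and 2 share the innermost box of size 2
print(merger_index(1, 3, box_sizes))   # replicas 1 and 3 first meet in a box of size 6
print(merger_index(1, 7, box_sizes))   # replicas 1 and 7 only share the full set
print(tree_property_holds(box_sizes))
```

Since every pair of replicas shares the box of size $n$, the maximum is always defined; with the relabeling (5.14) the same enumeration yields the overlap $q_{ab}$ once the values $q_r$ are specified.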
That is, for r(r(a, b), again the iteration (5.4a) is to be used for 0 (y) such that at r"r(a, b) it P satis"es the initial condition
0 (y)"0 (y)0 (y) . (5.15) P?@ P?@ P?@ Such merging of 0 "elds to produce an initial condition for further evolution will turn out to be ubiquitous whenever correlators are computed. After changing from the discrete r index to the q time variable, we obtain the expectation value in a form similar to (5.9) as
C (q )" Dz 0 (q , z(q ) . P?@
(5.16)
Here we switched notation and denote the dependence on the initial a, b replica indices through q . Equivalently, replacing q by q, we get P?@ P?@
C (q)" dy P(q, y) 0 (q, y)" dy P(q, y) 0 (q, y) 0 (q, y)
(5.17)
Here only such $q$ is meaningful that equals a $q_r$ in the R-RSB ansatz, or is a limit of a $q_r$ if $R\to\infty$. However, this expression can be understood, at least formally, for all $q$'s in the interval $[0,1]$.
5.1.4. Replica correlations in terms of Green functions
It is instructive to redisplay the formulas for $C_A$ and $C_{AB}(q)$ in terms of GFs. Their natural generalization will yield the GF technique and the graphical representation for general correlation functions. The time evolution of the $\vartheta$ field can be expressed by means of the GF. Based on the relation between $P(q,y)$ and the GF given by (4.72) we can write
$$C_A = \int dy\, G_\varphi(0,0;1,y)\,A(y) . \qquad (5.18)$$
Correlators can be conveniently represented by graphs. On the simple case of $C_A$, see Fig. 2, we can illustrate the graph rules. We symbolize the GF $G_\varphi(q_1,y_1;q_2,y_2)$ by a line stretching between $q_1$ and $q_2$. Over the $y$'s appropriate integrations will be understood. If $q_1=0$ the corresponding $y_1$ is set to zero, i.e., integration is done after multiplication by a Dirac delta. Since this is always the case in our examples, we do not put any marks at $q=0$. A weight function under the integral at $q=1$, like $A(y)$ in (5.18), should be marked at the right end of the line. In sum, $C_A$ is a single line between $q=0$ and $q=1$, labeled by $A(y)$ at $q=1$. As to the second-order correlator (5.17), based on Eqs. (4.79) and (4.80) we can write $\vartheta_A$ and $\vartheta_B$ in terms of the GF and obtain
$$C_{AB}(q) = \int dy\,dy_1\,dy_2\; G_\varphi(0,0;q,y)\; G_\varphi(q,y;1,y_1)\,A(y_1)\; G_\varphi(q,y;1,y_2)\,B(y_2) . \qquad (5.19)$$
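The single-vertex formula (5.19) can be checked numerically in the simplest setting. The sketch below assumes the free case $\Phi\equiv 0$, in which $\mu\equiv 0$, the drift term of the PDE (4.49) drops out, and $G_\varphi$ reduces to a heat kernel of variance $q_2-q_1$ (this presumes a diffusion coefficient of 1/2 in (4.49)). Under these assumptions (5.19) should reproduce the Gaussian average of $A(y_a)B(y_b)$ for two unit-variance fields with covariance $q$. The function names `heat_kernel`, `evolve` and `c_ab` are illustrative and not taken from the paper.

```python
import math

def heat_kernel(u, var):
    """Gaussian density with zero mean and variance `var`, evaluated at u."""
    return math.exp(-u * u / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def evolve(f, y, var, n=401, width=8.0):
    """Return the integral of heat_kernel(u, var) * f(y + u) over u.

    This plays the role of one free-case Green function line: it carries
    the function f across a time interval of length `var`.
    """
    half = width * math.sqrt(var)
    h = 2.0 * half / (n - 1)
    total = 0.0
    for i in range(n):
        u = -half + i * h
        total += heat_kernel(u, var) * f(y + u)
    return total * h

def c_ab(q, A, B):
    """Free-case version of Eq. (5.19) for 0 < q < 1.

    The two fields evolve independently from q = 1 down to the merger
    point q, are multiplied there, and the product is averaged with the
    free-case P(q, y), a Gaussian of variance q.
    """
    def merged(y):
        return evolve(A, y, 1.0 - q) * evolve(B, y, 1.0 - q)
    return evolve(merged, 0.0, q)

if __name__ == "__main__":
    q = 0.5
    square = lambda y: y * y
    # For a bivariate Gaussian with unit variances and covariance q,
    # the average of y_a^2 * y_b^2 equals 1 + 2 q^2.
    print(c_ab(q, square, square), 1.0 + 2.0 * q * q)
```

With a nonzero $\Phi$ the heat kernel would have to be replaced by the full GF of (4.49), which is not attempted here.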
The graphic representation of (5.19) is given in Fig. 3; it consists of a single vertex. The third-order correlator $C_{ABC}(a,b,c)$, see (5.12) for notation, can be analogously calculated. We can assume without restricting generality that $r(a,b)\ge r(a,c)=r(b,c)$, and use the notation
Fig. 2. Graphical representation of Eq. (5.18) for $C_A$. The line corresponds to the GF associated with the field $\varphi$. Its two $q$-coordinates are taken at the endpoints of the line and the two $y$-coordinates are integrated over. At $q=1$ the function included in the integrand is displayed. At $q=0$ the Dirac delta $\delta(y)$, understood in the integrand and forcing the zero $y$-argument in (5.18), is not indicated, because it is present for all correlators.
Fig. 3. The correlation function $C_{AB}(q)$.
$q_1 = q_{r(a,c)} \le q_2 = q_{r(a,b)}$. The $q_i$'s, $i=1,2$, used here should not be confused with the $q_r$'s of (4.6) from the R-RSB scheme. In this case the two iterations (5.4a) with respective initial conditions $A(y)$ and $B(y)$ merge at $r(a,b)$. Switching to parametrization by $q$ means that the PDE (4.49) rather than the iteration (5.4a) is to be considered. Thus (4.49) should now be used in two copies, one with initial condition $\vartheta_A(1,y)=A(y)$ and the other with $\vartheta_B(1,y)=B(y)$. They merge at $q_2$. That means, the "incoming" fields multiply to yield a new initial condition $\vartheta_{AB}(q_2,y)=\vartheta_A(q_2,y)\,\vartheta_B(q_2,y)$, like in (5.15), and hence for $q_1\le q\le q_2$ the field $\vartheta_{AB}(q,y)$ obeys the PDE (4.49). At $q_1$ another merger takes place with the incoming field $\vartheta_C(q,y)$. This started from the initial condition $\vartheta_C(1,y)=C(y)$ and has evolved according to (4.49) until $q=q_1$. Here the product of the two incoming fields $\vartheta_{ABC}(q_1,y)=\vartheta_{AB}(q_1,y)\,\vartheta_C(q_1,y)$ becomes the initial condition at $q=q_1$ for the final stretch of evolution by (4.49) down to $q=0$. The resulting correlator is easy to formulate in terms of GFs. Indeed, (4.80) with $h\equiv 0$ gives the solution of the PDE (4.49) starting from an arbitrary initial condition, specified at an arbitrary time. Hence
$$C_{ABC}(q_1,q_2) = [\![ A(y_a)\,B(y_b)\,C(y_c) ]\!]$$
$$= \int dy_1\,dy_2\,dy_3\,dy_4\,dy_5\; P(q_1,y_1)\; G_\varphi(q_1,y_1;1,y_2)\,C(y_2)\; G_\varphi(q_1,y_1;q_2,y_3)\; G_\varphi(q_2,y_3;1,y_4)\,A(y_4)\; G_\varphi(q_2,y_3;1,y_5)\,B(y_5) . \qquad (5.20)$$
The corresponding graph is shown in Fig. 4; it has two vertices. The special case $r(a,b)=r(a,c)=r(b,c)$ corresponds to $q_1=q_2$. Then we wind up with a single vertex of altogether four legs, and accordingly, the $G_\varphi(q_1,y_1;q_2,y_3)$ in (5.20) should be replaced by $\delta(y_1-y_3)$.
A general correlator of local fields $y$ can be graphically represented starting out of the full ultrametric tree [14]. This can be visualized as a tree with $R+1$ generations of branchings, having at the $r$th generation uniformly the connectivity $m_r/m_{r+1}$. The $(R+1)$th generation has $n$ branches, to the end of each of which a "leaf" can be pinned. The leaves are labeled by the replica index $a=1,\dots,n$. Between $r=0$ and $r=1$ is the "trunk". For a (possibly large) integer number of replicas $n$ this is a well defined graph. If $n\to 0$ then the $m_r$'s cannot be held integers and possibly the $q_r$'s densely fill an interval. Thus the full tree loses graphical meaning. On the other hand, the graphs representing replica correlators can be understood as subtrees of the full tree for integer $n$, and remarkably, they remain meaningful even after continuation.
Fig. 4. The correlation function $C_{ABC}(q_1,q_2)$.
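For integer $n$ the merger index and the tree property of replica triples can be made concrete with a few lines of code. The sketch below assumes that the box sizes of Eq. (4.5b) form a nested sequence $n=m_0\ge m_1\ge\dots\ge m_{R+1}=1$ in which each size divides the previous one, and that replicas are numbered from 0 so that replica $a$ sits in box `a // m` at the level with box size `m`. The names `merger_index` and `is_ultrametric` are illustrative.

```python
from itertools import combinations

def merger_index(a, b, box_sizes):
    """Largest r for which replicas a and b lie in the same box of size box_sizes[r].

    box_sizes = [m_0, m_1, ..., m_{R+1}] is assumed nested, with m_0 = n
    and m_{R+1} = 1, so merger_index(a, a, ...) returns R + 1.
    """
    r_max = 0
    for r, m in enumerate(box_sizes):
        if a // m == b // m:
            r_max = r
    return r_max

def is_ultrametric(n, box_sizes):
    """Check that for every triple the two smallest merger indices coincide."""
    for a, b, c in combinations(range(n), 3):
        r = sorted([
            merger_index(a, b, box_sizes),
            merger_index(a, c, box_sizes),
            merger_index(b, c, box_sizes),
        ])
        if r[0] != r[1]:
            return False
    return True

if __name__ == "__main__":
    boxes = [12, 6, 2, 1]  # n = 12 replicas, R = 2
    print(merger_index(0, 1, boxes))  # same box of size 2
    print(merger_index(0, 2, boxes))  # same box of size 6 only
    print(is_ultrametric(12, boxes))
```

The check expresses that either all three merger indices of a triple coincide, or two coincide and the third is larger.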
In Figs. 2–4 we illustrated the three simplest local field correlations by graphs. There a branch connecting vertices of time coordinates, say, $q_1$ and $q_2>q_1$ was associated with $G_\varphi(q_1,y_1;q_2,y_2)$, with implied integrations over the local field coordinates. This feature holds also for higher-order correlations. Similarly to the case explained in Section 5.1.2, the iteration (5.4), or, equivalently, the PDE (4.49) emerges again. Given an interval $(q_1,q_2)$ the initial condition for a field $\vartheta$ is set at the upper border $q_2$, then $\vartheta$ undergoes evolution by the linearized PPDE (4.49), and the result is the solution at $q_1$. Since $G_\varphi(q_1,y_1;q_2,y_2)$ is the GF that produces the solution of (4.49) out of a given initial condition, it is natural to associate the GF with the branch of a graph linking $q_1$ with $q_2$. Since the GF is in fact an integral kernel, integration is to be performed over the variables $y_1$ and $y_2$ at the endpoints of the branch. This automatically yields the merging of incoming fields $\vartheta$ at a vertex to form a new initial condition, as exemplified (before continuation) for the second-order local field correlator in Eq. (5.15). Indeed, the local field $y$ associated with a vertex at $q$ of altogether three legs is the fore $y$ argument of two incoming GFs and the hind $y$ argument of one outgoing GF, so the latter evolves the product of the incoming $\vartheta$ fields towards decreasing times starting from $q$.
The graph rules for the general local field correlator $C_{AB\dots Z}(a,b,\dots,z)$, defined by (5.12), can be summarized as follows. Draw continuous lines starting out from the leaves corresponding to the replica indices $a,b,\dots,z$ along branches until the trunk is reached. Lines will merge occasionally, and in the end all lines meet at the trunk. The merging points are specified by the merger indices $r(a,b),\dots$, or equivalently, by the values $q_{r(a,b)},\dots$ from (5.14) for each pair of the replica indices we started with.
Obviously, not all such $q$'s for different replica index pairs from the set $a,b,\dots,z$ need to be different; in the extreme case all such $q$'s may be equal. The graph thus obtained is, from the topological viewpoint, uniquely determined by the given set of replica indices of a correlator. The explicit dependence on the replica indices $a,b,\dots,z$ is then no longer kept; instead they appear through the merger indices $r(a,b),\dots$, or, equivalently, $q_{r(a,b)},\dots$. This allows us to take the $n\to 0$ limit. In the end, the correlator becomes a function of all $q_{r(a,b)}$'s that can be formed from the replica indices $a,b,\dots,z$ of (5.12). Now that each branch merging has a given time $q$ value, it is useful to include the coordinate axis of $q$ with a graph. The calculation of a correlator implies evolution by the PDE (4.49), first with different $y$ variables along the respective branches, from the leaves towards the trunk. The functions $A(y), B(y),\dots,Z(y)$ are the initial conditions of this evolution until the first respective merging points. Whenever branches meet, say at a $q_i$, the fields $\vartheta_1(q_i,y)$, $\vartheta_2(q_i,y)$, etc., associated with the different incoming
lines multiply, all having a common $y$ local field. Thus a new initial condition is created for further evolution by (4.49), from $q_i$ onward to decreasing $q$'s. At the last juncture, say $q_1$, the $y$-integral of the product of the incoming fields weighted with $P(q_1,y)$ yields the correlator in question. Obviously, the branches that connect merging points can be associated with the GF $G_\varphi$ of the PDE (4.49). It follows that at a merging point of two branches the $y$-integral gives the vertex function $\Gamma_{\varphi\varphi\varphi}$ of (4.85).
It should be noted that the correlator $C_{AB\dots Z}(a,b,\dots,z)$ is now expressed as an integral expression, where the product $A(y_a)\,B(y_b)\cdots Z(y_z)$ appears in the integrand. Thus an average of the more general form
$$[\![ A(y_a,y_b,\dots,y_z) ]\!] \qquad (5.21)$$
is obtained by replacing $A(y_a)\,B(y_b)\cdots Z(y_z)$ by $A(y_a,y_b,\dots,y_z)$ in that expression. Then we lose the picture of $\vartheta$ fields independently evolving from $q=1$ by the PDE (4.49) and then merging for some smaller $q$'s, because the function $A(y_a,y_b,\dots,y_z)$ couples the $\vartheta$ fields at the outset $q=1$. In what follows we will not encounter averages (5.21) of non-factorizable functions.
In summary, a given correlation function is represented by a tree, that is, a finite subtree of the full ultrametric tree. Leaves are associated with initial conditions of the evolution by (4.49). Branches directed from larger to decreasing $q$ correspond to the GF $G_\varphi$. Each vertex, including the leaves and the bottom of the trunk, has a $q$, $y$ pair associated with it. At the leaves $q=q_{R+1}=1$, and there is integration over $y$'s in each vertex. At $q=0$ simply $y=0$ should be substituted into the final formula, so the GF of the trunk becomes just Sompolinsky's field $P$ due to (4.72). The intermediate $q$'s will be the independent variables by which we characterize the correlation function. Thus a tree uniquely defines an integral expression. Furthermore, topologically identical trees correspond to the same type of integral. Of course, two topologically identical trees can have different functions associated with their respective leaves, and then the two integrals will evaluate to different results.
Elementary combinatorics gives the number $N(K)$ of topologically different trees of $K$ leaves in terms of a recursion. Denoting the integer part of $z$ by $[z]$ we have
$$N(1) = 1 , \qquad (5.22a)$$
$$N(K) = \sum_{k=1}^{[(K-1)/2]} N(k)\,N(K-k) + \left( \left[\tfrac{K}{2}\right] - \left[\tfrac{K-1}{2}\right] \right) \tfrac12\, N\!\left(\left[\tfrac{K}{2}\right]\right) \left[ N\!\left(\left[\tfrac{K}{2}\right]\right) + 1 \right] . \qquad (5.22b)$$
The basis of this recursion is the fact that in a tree with $K$ leaves two subtrees meet at the trunk, one having $k$ and the other $K-k$ leaves. The sum is interpreted as zero for $K=2$. The second term on the r.h.s. contributes only for $K$ even; it gives the number of trees that are composed of two subtrees both having $K/2$ leaves. Some terms generated by the above recursion are $N(2)=1$, $N(3)=1$, $N(4)=2$, $N(5)=3$, $N(6)=6$, $N(7)=11$, $N(8)=23$. For $K=1,2,3$ we have $N(K)=1$, in accordance with our previous finding that in each of those cases there is only one graph, see Figs. 2–4. In deriving (5.22) we assumed that vertices have altogether three legs. In that case the number of vertices is $K-1$. If $q$'s coincide because branches shrink to a point then the number of vertices decreases and vertices with more than three legs arise. The corresponding integral expressions are consistent with the graph rules laid down before. Indeed, a branch of zero length is associated with
the GF as in (4.65), i.e., gives rise to a Dirac delta equating the local fields at its two endpoints, wherefore each remaining branch still represents a GF and the vertex with more than three legs will still have a single $y$ variable to be integrated over.
5.1.5. Replica correlations of x's
Derivatives by $q_{ab}$ of the archetypical expression (4.1) play an important role in determining thermodynamical properties. Let us introduce the expectation values (5.1) of products of $x_a$'s as
$$C_x^{(k)}(a_1,\dots,a_k) = (-i)^k\, [\![ x_{a_1} x_{a_2} \cdots x_{a_k} ]\!] . \qquad (5.23)$$
The factor $(-i)^k$ is included for later convenience. This is the correlation function of order $k$ of the variables $x_{a_j}$. Correlators of even order, $2k$, are related to the derivatives of (4.1) by the matrix elements $q_{ab}$ as
$$C_x^{(2k)}(a_1,\dots,a_{2k}) = \frac{\partial^k\, e^{n\varphi[\Phi(y),Q]}}{\partial q_{a_1 a_2} \cdots \partial q_{a_{2k-1} a_{2k}}} . \qquad (5.24)$$
Second-order correlators enter the stationarity conditions (3.21), (3.24) and (3.25), and fourth-order ones appear in studies of thermodynamical stability, as we shall see later. By partial integration (5.23) can be brought to the form of the average of products of various derivatives of $\Phi(y_a)$ as
$$C_x^{(k)}(a_1,\dots,a_k) = \int \frac{d^n x\, d^n y}{(2\pi)^n}\; e^{\,i\mathbf{x}\mathbf{y} - \frac12 \mathbf{x} Q \mathbf{x}}\; \partial_{y_{a_1}} \partial_{y_{a_2}} \cdots \partial_{y_{a_k}} \exp\Big( \sum_a \Phi(y_a) \Big) , \qquad (5.25)$$
where coinciding replica indices give rise to higher derivatives. In the special case when all $a_j$ indices are different, we have
$$C_x^{(k)}(a_1,\dots,a_k) = [\![ \Phi'(y_{a_1})\, \Phi'(y_{a_2}) \cdots \Phi'(y_{a_k}) ]\!] . \qquad (5.26)$$
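As a cross-check of the tree count used in Section 5.1.4, the recursion (5.22) can be evaluated directly. The sketch below implements it as read here: a sum over splittings of the $K$ leaves into $k$ and $K-k$ with $k<K/2$, plus, for even $K$, the number of unordered pairs of subtrees with $K/2$ leaves each. The function name `num_trees` is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_trees(K):
    """Number N(K) of topologically different trees with K leaves, Eq. (5.22)."""
    if K < 1:
        raise ValueError("K must be a positive integer")
    if K == 1:
        return 1
    # Splittings with unequal leaf numbers, k < K - k.
    total = sum(num_trees(k) * num_trees(K - k)
                for k in range(1, (K - 1) // 2 + 1))
    # For even K: unordered pairs of subtrees that both have K/2 leaves.
    if K % 2 == 0:
        half = num_trees(K // 2)
        total += half * (half + 1) // 2
    return total

if __name__ == "__main__":
    print([num_trees(K) for K in range(1, 9)])
```

The first eight values agree with those quoted in the text after (5.22), namely 1, 1, 1, 2, 3, 6, 11, 23.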
Note that in the case of a discontinuous $\Phi(y)$ we may not use the chain rule of differentiation. Therefore in (5.25) the derivatives should act directly on the exponential. Then, in the spirit of Section 4.2.6, we can conclude that $\mu(1,y)$ as defined in (4.113b) should be used in lieu of $\Phi'(y)$, so the field $\mu(q,y)$ defined in (4.111) evolves from $q=1$ down until the first merging point on its way (the first vertex to be met when coming from a leaf at $q=1$). In the following general treatment we assume a smooth $\Phi(y)$, with the note that the adaptation of the results to discontinuous ones is straightforward. Expression (5.26) is of the form (5.12), so
$$C_x^{(k)}(a_1,\dots,a_k) = C_{\Phi'\cdots\Phi'}(a_1,\dots,a_k) . \qquad (5.27)$$
We review some low-order correlators below.
5.1.6. One- and two-replica correlators of x's
The simplest case of a replica correlation function of $x$'s is the average of a single $x$. Eq. (5.27) for $k=1$ becomes independent of the single replica index and gives a formula of the type (5.10) as
$$C_x^{(1)} = C_{\Phi'} = \int dy\, P(1,y)\, \Phi'(y) . \qquad (5.28)$$
Fig. 5. The graph of $C_x^{(1)}$ is a single line.
Comparison of (4.46) and (4.49) shows that with the present initial condition $\vartheta(q,y)=\mu(q,y)$. Thus, recalling that $P(1,y)=G_\varphi(0,0;1,y)$, we get alternatively
$$C_x^{(1)} = \mu(0,0) . \qquad (5.29)$$
This is shown graphically in Fig. 5; it is a special case of Fig. 2.
Let us now turn to the correlator of two $x_a$'s as defined in (5.23). If the replica indices are different then (5.26) applies; that should be complemented to allow for coinciding indices as
$$C_x^{(2)}(a,b) = C_{\Phi'\Phi'}(a,b) + \delta_{ab}\, C_{\Phi''} . \qquad (5.30)$$
This function depends on the replica indices through the overlap $q_{r(a,b)}$ at the merger $q=q_{r(a,b)}$. The first term on the r.h.s. is a special case of the correlation function $C_{AB}(q)$ given in Eq. (5.19) with $A(y)=B(y)=\Phi'(y)$. Note, however, that the $\mu$ field satisfying (4.46) is in fact the $\vartheta$ of (4.49) starting from the initial condition $\Phi'(y)$. Therefore the two instances of convolution of the GF with $\Phi'(y)$ give $\mu^2(q,y)$ in (5.19) and we get
$$\int dy\, P(q_{r(a,b)},y)\, \mu^2(q_{r(a,b)},y) \qquad (5.31)$$
for the first term on the r.h.s. of Eq. (5.30). The second term there is of the type studied in Section 5.1.2. Note that the initial condition is by (4.48b) just $\kappa(1,y)$. Furthermore, $r(a,a)=R+1$ and $q_{aa}=q_{R+1}=1$. In summary, for the $q$-dependent two-replica correlation function we obtain
$$C_x^{(2)}(q) = \begin{cases} \int dy\, P(q,y)\, \mu^2(q,y) & \text{if } q<1 , \\ \int dy\, P(1,y)\, [\mu^2(1,y) + \kappa(1,y)] & \text{if } q=1 , \end{cases} \qquad (5.32)$$
having omitted the subscript $r(a,b)$ from $q$. Note that the second term on the r.h.s. of (5.30) contributes at $q=1$. The above formula can be abbreviated as
$$C_x^{(2)}(q) = \int dy\, P(q,y)\, [\mu^2(q,y) + \theta(q-1^-)\, \kappa(1,y)] , \qquad (5.33)$$
where the second term is non-zero only if $q=1$. We will use the shorter notation with the Heaviside function in similar cases hereafter. Fig. 6 summarizes the result graphically.
As was emphasized earlier, the correlator is meaningful for $q$ arguments at the stationary $q_{ab}$'s, or at their limits for $n\to 0$. For $q$'s where $\dot x(q)\equiv 0$ the extension of the correlators is not unique. For instance, we can write any $q_{(1)}<q<1$ (for finite R-RSB, $q_{(1)}=q_R$, and for continuation see Section 4.2.1) in lieu of $1^-$ in the argument of the Heaviside function in (5.33). Note that the
Fig. 6. The correlation function $C_x^{(2)}(q)$.
two-replica correlation function, like the fields obeying the PPDE and the PDEs described in Sections 4.2.2 and 4.2.3, does not have a plateau in $(q_{(1)},1)$. In summary, expression (5.33) is the two-replica correlation function for both the finite R-RSB and $R\to\infty$, at arguments $q$ where $\dot x(q)\ne 0$.
5.1.7. Four-replica correlators
The native form of the four-replica average is by (5.25)
$$C_x^{(4)}(a,b,c,d) = C_{\Phi'\Phi'\Phi'\Phi'}(a,b,c,d) + [\delta_{ab}\, C_{\Phi''\Phi'\Phi'}(a,c,d) + 5\ \text{comb's}] + [\delta_{ab}\delta_{cd}\, C_{\Phi''\Phi''}(a,c) + 2\ \text{comb's}] + [\delta_{abc}\, C_{\Phi'''\Phi'}(a,d) + 3\ \text{comb's}] + \delta_{abcd}\, C_{\Phi''''} . \qquad (5.34)$$
Here "comb's" stands for combinations. We used the shorthand notation that $\delta_{ab\dots c}=1$ only if all $a,b,\dots,c$ indices are equal, else $\delta_{ab\dots c}=0$. Furthermore, abbreviation (4.105) is understood.
In order to simplify notation, we switch to using $q_i$ for the parametrization of expectation values. The $q_i$'s should not be confused with the $q_r$ values introduced in (4.6) for the R-RSB scheme. There are only two essentially different correlation functions, because two topologically different trees with four leaves can be drawn. Indeed, $N(4)=2$, cf. Eq. (5.22). The graphs are shown in Fig. 7. They correspond to the first term on the r.h.s. of Eq. (5.34) and thus represent the case when all replica indices are different. Taking into account coinciding indices is somewhat involved both analytically and graphically; we give below only the formulas. The graph in Fig. 7a corresponds to
C(q , q , q )" dy P(q , y) N(q , y; q )N(q , y; q )# h(q !1\) dy P(1, y) U (y) , V (5.35)
where
$$N(q_1,y_1;q_2) = \int dy_2\, G_\varphi(q_1,y_1;q_2,y_2)\, [\mu^2(q_2,y_2) + \theta(q_2-1^-)\, \Phi''(y_2)] . \qquad (5.36)$$
Fig. 7. The four-replica correlation functions $C_x^{(4)}(q_1,q_2,q_3)$ of (a) Eq. (5.35) and (b) Eq. (5.38), when all $q_i<1$ and are different from each other. The $\Phi'(y)$ functions at the tips of the branches at $q=1$ are understood but not marked.
Note that $N(q_1,y_1;q_2)$ can be considered as a generalized two-replica correlation with extra $q_1$, $y_1$ dependence, because $N(0,0;q)=C_x^{(2)}(q)$. The inequalities
$$q_1\le q_2\le 1 , \qquad q_1\le q_3\le 1 \qquad (5.37)$$
are understood, so the last term on the r.h.s. of (5.35) is non-zero only if $q_i=1$, $i=1,2,3$. The topologically asymmetric tree of Fig. 7b is associated with
C(q , q , q )" dy dy P(q , y ) k(q , y )G (q , y ; q , y )k(q , y ) N(q , y ; q ) P V
# h(q !1\) dy dy P(q , y )k(q , y )G (q , y ; 1, y )U(y ) , (5.38) P where we assume
$$q_1\le q_2\le q_3\le 1 \qquad (5.39)$$
but also require $q_1<1$, because the case $q_1=1$ has been settled by Eq. (5.35).
In conclusion, given the GF for the linear PDE (4.49), correlation functions can be calculated in principle. Interestingly, the GF for a Fokker–Planck equation here also assumes the role of the traditional field theoretical GF. Note that this is an instance where a mean-field property transpires: the graphs to be calculated are all trees. It should be added that here the tree structure is the direct consequence of ultrametricity [14], and may carry over to non-mean-field-like systems with ultrametricity [227]. That simple form of graphs is a priori far from obvious, since there are techniques for long-range interaction systems where diagrams with loops are present [84]. In hindsight we can say that by using the GF of a Fokker–Planck equation with a non-trivial drift term, we implicitly performed a summation of infinitely many graphs of earlier approaches.
5.2. Variations of the Parisi term
The variation of the free energy term by the OPF $x(q)$ is necessary in order to formulate stationarity conditions later, and second-order variations yield the matrix of stability against fluctuation
of the OPF. In this section only the mathematical properties are investigated; the physical significance will be elucidated later.
5.2.1. First variation
The main result of Section 4 is that the ubiquitous term (4.1) boils down within the Parisi ansatz to (4.38), i.e.,
$$\lim_{n\to 0} \varphi[\Phi(y),Q] = \varphi[\Phi(y),x(q)] = \varphi(0,0) . \qquad (5.40)$$
In order to determine the variation of $\varphi(0,0)$ in terms of $x(q)$ we introduce small variations as $x\to x+\delta x$ and $\varphi\to\varphi+\delta\varphi$ and require that the varied quantities also satisfy the PPDE (4.36a) with the same initial condition (4.36b) for $\varphi+\delta\varphi$. Linearization of the PPDE in the variations gives
$$\partial_q\, \delta\varphi = -\tfrac12 \partial_y^2\, \delta\varphi - x\mu\, \partial_y\, \delta\varphi - \tfrac12 \mu^2\, \delta x , \qquad (5.41a)$$
$$\delta\varphi(1,y) = 0 , \qquad (5.41b)$$
where $\mu(q,y)=\partial_y\varphi(q,y)$ satisfies the PDE (4.46). Eq. (5.41) is an inhomogeneous, linear PDE for $\delta\varphi(q,y)$, given $x(q)$, $\delta x(q)$, and $\mu(q,y)$. Note that this is of the form of the linearized PPDE with source (4.79). Its solution is given in (4.80), whence
$$\delta\varphi(q_1,y_1) = \frac12 \int_{q_1}^{1} dq_2 \int dy_2\, G_\varphi(q_1,y_1;q_2,y_2)\, \mu^2(q_2,y_2)\, \delta x(q_2) , \qquad (5.42)$$
whence
$$\frac{\delta\varphi(q_1,y_1)}{\delta x(q_2)} = \frac12\, \theta(q_2-q_1) \int dy_2\, G_\varphi(q_1,y_1;q_2,y_2)\, \mu^2(q_2,y_2) . \qquad (5.43)$$
Thus the variation of the term (5.40) is
$$\frac{\delta\varphi(0,0)}{\delta x(q)} = \frac12 \int dy\, G_\varphi(0,0;q,y)\, \mu^2(q,y) = \frac12 \int dy\, P(q,y)\, \mu^2(q,y) . \qquad (5.44)$$
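As a consistency check of (5.44), consider the hypothetical linear choice $\Phi(y)=cy$, made here only for illustration. The check further assumes that the PPDE (4.36a) has the form $\partial_q\varphi=-\tfrac12\partial_y^2\varphi-\tfrac12 x(\partial_y\varphi)^2$, which is what the source term in the linearization (5.41a) suggests, and that $P(q,y)$ is normalized to unity, as expected for a Fokker–Planck-type field started from $\delta(y)$. The PPDE can then be solved in closed form for arbitrary $x(q)$, and both sides of (5.44) compared directly:

```latex
\begin{align*}
\varphi(q,y) &= c\,y + \frac{c^2}{2}\int_q^1 x(\bar q)\,d\bar q ,
  \qquad \mu(q,y) = \partial_y\varphi(q,y) = c , \\
\varphi(0,0) &= \frac{c^2}{2}\int_0^1 x(q)\,dq
  \quad\Longrightarrow\quad
  \frac{\delta\varphi(0,0)}{\delta x(q)} = \frac{c^2}{2} , \\
\frac12\int dy\,P(q,y)\,\mu^2(q,y) &= \frac{c^2}{2}\int dy\,P(q,y) = \frac{c^2}{2} .
\end{align*}
```

The first line satisfies the assumed PPDE because $\partial_q\varphi=-\tfrac12 c^2 x(q)$ while $\partial_y^2\varphi=0$, so the functional derivative and the right-hand side of (5.44) agree in this special case.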
Here we used the identity (4.72) between the GF and the field $P(q,y)$. It is interesting that the above formula is in fact proportional to the two-replica correlation of Eq. (5.33),
$$\frac{\delta\varphi(0,0)}{\delta x(q)} = \frac12\, C_x^{(2)}(q) \qquad (5.45)$$
for $q<1$. Since the correlation function can also be obtained by differentiation in terms of $q_{ab}$, we have by Eqs. (4.1), (5.32), and (5.44), for $q<1$
$$\frac{\delta\varphi(0,0)}{\delta x(q)} = \frac12 \lim_{n\to 0} \left. \frac{\partial\, n\varphi[\Phi(y),Q]}{\partial q_{ab}} \right|_{q_{ab}=q} . \qquad (5.46)$$
This relation tells us that if a free energy is the sum of terms (4.1) then the two stationarity conditions, one obtained by differentiation in terms of the matrix elements $q_{ab}=q$ and the other by
variation in terms of $x(q)$, are equivalent. Such is the case for the SK model, the spherical neuron, and the neuron with arbitrary, independent synapses.
In the case of a discrete R-RSB scheme (4.5), variation by $x(q)$ is made with the assumption of a plateau, i.e., $x(q)\equiv x$, $0<x<1$, in an interval $I$. Then the role of the variation will be taken over by the derivative in terms of the plateau value $x$ and of the endpoints of $I$. It is straightforward to show that
$$\frac{\partial\varphi(0,0)}{\partial x} = \frac12 \int_I C_x^{(2)}(q)\, dq \qquad (5.47)$$
results. Since the fields $P$ and $\mu$ are purely diffusive in $I$, the $q$-integral is Gaussian. On the other hand, the derivatives in terms of the endpoints are given by $C_x^{(2)}$ at the endpoints, due to Eqs. (5.46) and (5.45). If we work with an ansatz for the OPF that has both $\dot x(q)>0$ and $x(q)\equiv x$, $0<x<1$, segments, then (5.44) should be used in an interval where $\dot x(q)>0$ and (5.47) along a plateau. If $\dot x(q)>0$ at isolated points, like at the jumps in a finite R-RSB scheme, differentiation in terms of the location of those points results in (5.44) at those points.
5.2.2. Second variation
The stability of a thermodynamic state against fluctuations in the space of the OPF $x(q)$, the so-called longitudinal fluctuations, can be studied through the second variation of the free energy term (5.40). We will present here briefly the way the longitudinal Hessian can be calculated. In order to determine the variation of the first derivative (5.44), we should vary the fields $\mu$ and $P$. For $\mu$ we obtain by definition
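$$\frac{\delta\mu(q_1,y_1)}{\delta x(q_2)} = \partial_{y_1} \frac{\delta\varphi(q_1,y_1)}{\delta x(q_2)} = \frac12\, \theta(q_2-q_1) \int dy_2\, \partial_{y_1} G_\varphi(q_1,y_1;q_2,y_2)\, \mu^2(q_2,y_2) . \qquad (5.48)$$
In order to calculate the variation of the field $P$ we need to vary the SPDE (4.53). This yields
$$\partial_q\, \delta P = \tfrac12 \partial_y^2\, \delta P - x\, \partial_y(\mu\, \delta P) - x\, \partial_y(P\, \delta\mu) - \delta x\, \partial_y(\mu P) , \qquad (5.49a)$$
$$\delta P(0,y) = 0 . \qquad (5.49b)$$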
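This can be solved by using the fact that the GF for the SPDE is the reverse of $G_\varphi$. Thus
$$\delta P(q_1,y_1) = -\int_0^{q_1} dq_3 \int dy_3\, G_\varphi(q_3,y_3;q_1,y_1)\, \Big\{ x(q_3)\, \partial_{y_3}\big[ P(q_3,y_3)\, \delta\mu(q_3,y_3) \big] + \partial_{y_3}\big[ P(q_3,y_3)\, \mu(q_3,y_3) \big]\, \delta x(q_3) \Big\} . \qquad (5.50)$$
Hence the variation of $P(q_1,y_1)$ by $x(q_2)$ is straightforward to obtain, where also Eq. (5.48) should be used.
The above preliminaries allow us to express the second variation of the free energy functional. Varying (5.44) gives
$$\frac{\delta^2\varphi(0,0)}{\delta x(q_1)\, \delta x(q_2)} = \frac12 \int dy_1\, \frac{\delta P(q_1,y_1)}{\delta x(q_2)}\, \mu^2(q_1,y_1) + \int dy_1\, P(q_1,y_1)\, \mu(q_1,y_1)\, \frac{\delta\mu(q_1,y_1)}{\delta x(q_2)} . \qquad (5.51)$$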
Substitution of the variation of $P(q_1,y_1)$ and of $\mu(q_1,y_1)$ yields after some manipulations
$$\frac{\delta^2\varphi(0,0)}{\delta x(q_1)\, \delta x(q_2)} = \frac12 \int dy_1\, dy_2\; \partial_{y_1} G_\varphi(q_<,y_1;q_>,y_2)\, P(q_<,y_1)\, \mu(q_<,y_1)\, \mu^2(q_>,y_2)$$
$$\quad + \frac14 \int_0^{q_<} dq_3\, x(q_3) \int dy_1\, dy_2\, dy_3\; P(q_3,y_3)\; \partial_{y_3} G_\varphi(q_3,y_3;q_1,y_1)\; \partial_{y_3} G_\varphi(q_3,y_3;q_2,y_2)\; \mu^2(q_1,y_1)\, \mu^2(q_2,y_2) , \qquad (5.52)$$
where
$$q_< = \min(q_1,q_2) , \qquad (5.53a)$$
$$q_> = \max(q_1,q_2) . \qquad (5.53b)$$
Note the symmetry of (5.52) w.r.t. the interchange of $q_1$ and $q_2$. If we have the extremizing $x(q)$ as well as the GF $G_\varphi$, the latter yielding by (4.81) the field $\mu$, then Eq. (5.52) is an explicit expression for the second functional derivative.
5.3. The Hessian matrix
There are results in the literature on the algebraic properties of ultrametric matrices that can be straightforwardly applied to the present problem. As we shall see below, this amounts to finding, in the state described by a general OPF $x(q)$, an explicit expression for the eigenvalues of the Hessian in the so-called replicon sector, deemed to be "dangerous" from the viewpoint of thermodynamical stability.
5.3.1. Ultrametric matrices
The Hessian, or stability matrix, of the free energy term (4.1) is
$$M_{abcd} = \frac{\partial^2\, n\varphi[\Phi(y),Q]}{\partial q_{ab}\, \partial q_{cd}} . \qquad (5.54)$$
If the replica correlations of $x_a$'s as in (5.24) are thought of as moments then (5.54) is analogous to a cumulant, and can obviously be expressed as
$$M_{abcd} = [\![ x_a x_b x_c x_d ]\!] - [\![ x_a x_b ]\!]\, [\![ x_c x_d ]\!] = C_x^{(4)}(a,b,c,d) - C_x^{(2)}(a,b)\, C_x^{(2)}(c,d) . \qquad (5.55)$$
The transposition symmetry of the matrix $Q$ was understood in the above definition. The Hessian (5.54) becomes a so-called ultrametric matrix [111] once the R-RSB form (4.4) for $Q$ is substituted. Note that while constructing the stability matrix we did not differentiate in terms of the indices $x_r$. Indeed, one produces the Hessian before the hierarchical form for $Q$ is substituted, and at that stage the parameters of the R-RSB scheme do not appear.
We can now comfortably apply the results of the elaborate study by Temesvári et al. [111] about ultrametric matrices. Such matrices have four replica indices and are in essence defined by the property that they exhibit the same symmetries w.r.t. the interchange of indices as the Hessian (5.54) with a Parisi $Q$ matrix substituted in it. The theory was originally formulated for finite R-RSB [111], but, as we shall see, continuation of the formulas comes naturally. Firstly we should clarify
notation. Let us remind the reader of the merger index $r(a,b)$ defined in the R-RSB ansatz by Eq. (5.14) in Section 5.1.5. The $r(a,b)$ was denoted by $a\cap b$ in Ref. [111]. According to the convention of [111], the elements of the ultrametric matrix $M$ can be characterized in a symmetric way by four merger indices, among them three independent. Redundancy is the price paid for a symmetric definition. The new indices are
$$r_1 = r(a,b) , \qquad (5.56a)$$
$$r_2 = r(c,d) , \qquad (5.56b)$$
$$r_3 = \max[r(a,c), r(a,d)] , \qquad (5.56c)$$
$$r_4 = \max[r(b,c), r(b,d)] , \qquad (5.56d)$$
whence
$$M^{r_1 r_2}_{r_3 r_4} \equiv M_{abcd} \qquad (5.57)$$
is just a relabeling of the Hessian matrix elements.
According to [111] one can distinguish among three main invariant subspaces, or sectors, of the space of $Q$ matrices. Here we give a loosely worded brief account of the decomposition, emphasizing also the physical picture that transpires from comparison with earlier results on the SK model.
The longitudinal sector is spanned by Parisi matrices with the same set of $m_r$, or, equivalently, $x_r$ (its relation to the $m_r$ is given by (4.12)), indices as the matrix $Q$ had that was substituted into (5.54). In the general case (without restrictions like the fixing of the diagonal elements) this space has $R+1$ dimensions. The projection of the Hessian onto the longitudinal sector is an $(R+1)\times(R+1)$ matrix, whose diagonalization cannot be performed based solely on its ultrametric symmetry, but should be done differently for different free energy terms $\varphi[\Phi(y),Q]$. The longitudinal Hessian in the $R\to\infty$ limit is related to the Hessian of the functional $\varphi[\Phi(y),x(q)]$ (see Section 5.2.2). This is demonstrated by the variational stability analysis of the SK model, within the continuous RSB scheme, near the spin glass transition, as performed in Ref. [97]. The eigenvalue equation obtained by variation was recovered by taking the $R\to\infty$ limit of the eigenvalue problem within the longitudinal sector of the Hessian (5.54). The longitudinal subspace can be considered as the generalization of a deviation from the RS solution that equally has RS structure, i.e., the longitudinal eigenvector of de Almeida and Thouless (AT) [96].
The second sector has been called anomalous in Ref. [111]. It may be viewed as the generalization of the second family of AT eigenvectors. The ultrametric symmetry allowed the transformation of the Hessian restricted to this invariant subspace into a quasi-diagonal form of $n-1$ pieces of $(R+1)\times(R+1)$ matrices [111].
Some of these submatrices are identical; there are only $R$ different ones in the generic case. Again, the diagonalization of these submatrices is a task to be performed on a case-by-case basis. To our knowledge no such study has been performed for $R>1$.
The third is the so-called replicon sector. Here the ultrametric symmetry made it possible to fully diagonalize the Hessian, resulting in an explicit expression for the replicon eigenvalues in terms of Hessian matrix elements [111]. The replicon modes, the elements of this subspace, are the generalization of the eigenvectors of de Almeida and Thouless that destabilized the RS solution of the SK model. In other words, these can be thought of as responsible for replica symmetry breaking.
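The relabeling (5.56) of a Hessian element $M_{abcd}$ by four merger indices can be illustrated for integer $n$. The sketch below makes the same assumptions as any finite-$n$ picture of the Parisi boxes: nested box sizes $n=m_0\ge m_1\ge\dots\ge m_{R+1}=1$, each dividing the previous one, and replicas numbered from 0. The function names `merger_index` and `hessian_labels` are illustrative and do not come from Ref. [111].

```python
def merger_index(a, b, box_sizes):
    """Largest r for which replicas a and b lie in the same box of size box_sizes[r]."""
    r_max = 0
    for r, m in enumerate(box_sizes):
        if a // m == b // m:
            r_max = r
    return r_max

def hessian_labels(a, b, c, d, box_sizes):
    """Return (r1, r2, r3, r4) of Eq. (5.56) for the Hessian element M_abcd."""
    def r(u, v):
        return merger_index(u, v, box_sizes)
    r1 = r(a, b)
    r2 = r(c, d)
    r3 = max(r(a, c), r(a, d))
    r4 = max(r(b, c), r(b, d))
    return r1, r2, r3, r4

if __name__ == "__main__":
    boxes = [12, 6, 2, 1]  # n = 12 replicas, R = 2
    # a and c share a box of size 2, as do b and d, while the two pairs
    # meet only at the level of box size 6.
    print(hessian_labels(0, 2, 1, 3, boxes))
    # Interchanging c and d leaves the labels unchanged.
    print(hessian_labels(0, 2, 3, 1, boxes))
```

In this example $r_1=r_2$ while $r_3$ and $r_4$ are larger, which is the index pattern entering the replicon eigenvalues discussed in Section 5.3.2.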
In the stability analysis by Whyte and Sherrington [11] of the 1-RSB solution of the storage problem of the spherical neuron (by Ref. [7]) it was equally the replicon eigenvalue that caused thermodynamical instability. Note that the replicon modes were also termed ergodons by Nieuwenhuizen [68,69], due to their role in the breakdown of ergodicity in an RSB phase.
5.3.2. Replicons
The replicon sector has special physical significance, since in known cases instability there signaled the need for higher order R-RSB. The replicon eigenvalues of an ultrametric matrix can be written as [111]
$$\lambda^{r_1}_{r_2 r_3} = \sum_{s=r_2}^{R} \sum_{t=r_3}^{R} m_{s+1}\, m_{t+1} \left( M^{r_1 r_1}_{s+1,\,t+1} - M^{r_1 r_1}_{s+1,\,t} - M^{r_1 r_1}_{s,\,t+1} + M^{r_1 r_1}_{s,\,t} \right) , \qquad (5.58)$$
where $0\le r_1\le R$ and $r_1\le r_2, r_3\le R$. The $r_i$'s are no longer attached to replica labels as they had been in Eqs. (5.56). This discrete expression lends itself to continuation, when one uses parametrization by $q_{r_i}$ to relabel as
$$M(q_{r_1},q_{r_2},q_{r_3}) \equiv M^{r_1 r_1}_{r_2 r_3} . \qquad (5.59)$$
Here inequalities (5.37) are implied. Using the simpler notation of $q_i$'s for parametrization we get for the replicon eigenvalues
$$\lambda(q_1,q_2,q_3) = \int_{q_2+0}^{1} dq_2' \int_{q_3+0}^{1} dq_3'\; x(q_2')\, x(q_3')\; \partial_{q_2'} \partial_{q_3'} M(q_1,q_2',q_3') . \qquad (5.60)$$
Comparison with the sum above shows that the inequalities $q_1\le q_2, q_3\le q_R\ (=q_{(1)})$ need to hold, and, of course, the eigenvalue is defined only at those $q_i$'s where $\dot x(q_i)\ne 0$. Expression (5.60) is unambiguous even though the correlation functions and so the integrand are ill-defined over intervals where $x(q_i)$ has a plateau. In such an interval the integrand becomes a derivative and we define the quadrature as the difference between values at the endpoints of the interval. Eq. (5.60) is equivalent to a formula expressed in terms of the variable $x$ that was quoted in [223].
We call the reader's attention also to the fact that the continuation of the sum (5.58) implies that in case of ambiguity the right-hand-side limit in $q$ of the partial derivatives is to be used. This distinction is generically of no import in regions where $\infty>\dot x(q)>0$, but is necessary at steps, where the left and right limits are different. The lower integration limits in (5.60) carry the mark $+0$ for this reason. In order to simplify notation, hereafter we often omit the mark $+0$ but understand it tacitly wherever necessary.
Next we use the expression of the Hessian through correlators as given by (5.55). After inspection of how the discrete labeling was converted to continuous parametrization we get
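$$M(q_1,q_2,q_3) = C_x^{(4)}(q_1,q_2,q_3) - \big[ C_x^{(2)}(q_1) \big]^2 , \qquad (5.61)$$
where the fourth-order correlator defined in (5.35) appears. Hence the replicon spectrum is
$$\lambda(q_1,q_2,q_3) = \int_{q_2}^{1} dq_2' \int_{q_3}^{1} dq_3'\; x(q_2')\, x(q_3')\; \partial_{q_2'} \partial_{q_3'} C_x^{(4)}(q_1,q_2',q_3') . \qquad (5.62)$$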
From expression (5.35) for the correlator we obtain
$$\lambda(q_1,q_2,q_3) = \int dy\, P(q_1,y)\, \Lambda(q_1,y;q_2)\, \Lambda(q_1,y;q_3) , \qquad (5.63)$$
where by definition
$$\Lambda(q_1,y_1;q_2) = \int_{q_2}^{1} dq_2'\; x(q_2')\, \partial_{q_2'} N(q_1,y_1;q_2') . \qquad (5.64)$$
Using Eq. (5.36) for $N$ and the identity (4.88), then substituting for the product $x(q)\kappa^2(q,y)$ the other terms in Eq. (4.48a), next performing partial integration and noting that $G_\varphi$ satisfies in its hind variables the SPDE (4.53), we obtain
$$\Lambda(q_1,y_1;q_2) = \int dy_2\, G_\varphi(q_1,y_1;q_2,y_2)\, \kappa(q_2,y_2) . \qquad (5.65)$$
The replicon spectrum can be expressed equivalently by the vertex function (4.85) as
j(q , q , q )" dy dy C (q ; 0, 0; q , y ; q , y )i(q , y )i(q , y ) . PPP
(5.66)
This formula can be graphically represented, if we recall that the "eld i is produced by the GF G for the PDE (4.46a) by (4.82). Let us mark G with a dashed line, then we have the graph on I I Fig. 8. Here we reemphasize that the solution of the relevant PDEs, in particular, the "eld u(q, y) with its derivatives and the GFs are assumed to be known, so the correlation functions and the replicon spectrum are considered as resolved if they are expressed in terms of the above "elds and GFs. 5.3.3. A Ward}Takahashi identity Recent results indicate the existence of an in"nite series of identities among derivatives of a function of Q, such as a free energy term, provided this term exhibits permutation symmetry in replica indices and the derivatives are considered with a Parisi matrix substituted as argument [223,228]. An equivalent source of the same identities is a `gaugea invariance, namely, the property
Fig. 8. The replicon eigenvalue in terms of GFs. The full line is G as before, the dashed line represents G , the GF for the P I PDE (4.46a).
that the free energy term looses its dependence on the speci"c m and q values and winds up P P depending only on x(q) in the nP0 limit [228]. These relations can be considered as analogous to the Ward}Takahashi identities (WTIs), arising in "eld theory for a thermodynamical phase wherein a continuous symmetry is spontaneously broken [229]. The continuous symmetry that is held responsible for the WTIs is the replica permutation symmetry in the nP0 limit, together with the appearance of an interval in q where x(q) is continuous and strictly increasing [223,228]. In our case, the free energy term (4.1) is of the aforementioned type, so we expect the WTIs to hold. Interestingly, the lowest-order nontrivial WTI can be easily obtained based on the results expounded in the previous section. Let us consider the replicon eigenvalues (5.66) in the case of coinciding arguments q"q "q "q 4q . (5.67) The behavior of the vertex function C for coinciding q-arguments can be easily deduced from PPP the requirement that the GFs become Dirac-deltas for coinciding times. Then the replicon eigenvalue assumes the form
j(q, q, q)" dy P(q, y)i(q, y)"j(q) .
(5.68)
This is precisely the r.h.s. of the identity (4.88) at q "0, y "0, while on the l.h.s. of the same we discover the 2nd-order correlator (5.33) for q(1. Therefore j(q)"CQ (q) . (5.69) V Strictly, this formula should be taken only at q's where the correlation function is de"ned i.e. q's that are limits of some q 's in the Parisi scheme (4.6). Nevertheless, we "nd that it holds with the P smooth continuation of (5.33) and (5.68) for any 04q(1, the more so remarkable because the replicon eigenvalues were not de"ned for arguments larger than q . Our present derivation yields just one identity out of a set of in"nitely many, but its advantage is that it uses analytic forms, and it is brief due to our prior knowledge about the properties of the relevant PDEs. Note that the WTI (5.69) was obtained for a mathematical abstraction, formula (4.1), but will gain physical signi"cance once we return to thermodynamics in Sections 7 and 8.
6. Interpretation and special properties

6.1. Physical meaning of x(q)

In relation to spin glasses it has been shown that the OPF x(q) is the average probability that the overlap of two spin configurations from two different pure (macro)states is smaller than q [110]. Furthermore, this property was found to hold naturally for combinatorial optimization problems that can be mapped to various spin glass models [14]. A similar feature evidently follows from Parisi's ansatz for Q in the present neuron model, but because of its significance we briefly give the derivation. Several further consequences of the hierarchical form of Q, as discussed in [14], also carry over to the neuron in the case of RSB.
Firstly let us consider the expression (4.8), where we replace x by 1 and q by some function G ?@ F(q ) of it. We obtain, using m "nP0, ?@ 0> 1 F(q )"!F(1)# [F(q )!F(q )]m , (6.1) ?@ P P\ P n ?$@ P whence, by continuation in the sense of Section 4.2.1
1 O "!F(q )# dq FQ (q)x(q)"! dq x (q)F(q) . F(q ) (6.2) ?@ n L ?$@ Here the assumption that only non-negative q's are relevant and q "q "1 was used. 0> " A density for the o!-diagonal matrix elements of Q can be obtained by substituting the Dirac delta for F(q) as
2 d(q!q ) " dq x (q )d(q!q )"x (q) . (6.3) ?@ n(n!1) L ?@ Finally, using the notation 122 for thermal average with n replicated partition functions, also L averaged over the patterns, the mean probability density of overlaps P(q) is, by the de"nition of q , ?@ 2 2 1d(q!N\J J )2 1d(q!q )2 " . (6.4) P(q)" ? @ L ?@ L n(n!1) n(n!1) L L ?@ ?@ Since the quantity to be averaged on the r.h.s. does not depend exponentially on N, the saddle point known from the free energy calculation does not move. The average 122 can be thus obtained L by simple substitution of the saddle point value in the Dirac deltas, i.e., the 122 sign can be L removed and we obtain (6.3), i.e.,
$$P(q) = \dot{x}(q) . \tag{6.5}$$
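The relation between x(q) and the overlap density can be illustrated at finite integer n. The sketch below counts how often each value $q_r$ occurs among the off-diagonal elements of a Parisi matrix. The function name `overlap_weights`, the integer n and the example box sizes are choices made here for illustration only and do not come from the text. The weights it returns equal $(m_r - m_{r+1})/(n-1)$, which formally become $m_{r+1} - m_r$, i.e. the jumps of x(q) at $q_r$, when n → 0.

```python
from fractions import Fraction


def overlap_weights(n, m):
    """Fraction of off-diagonal elements equal to q_r, for r = 0..R.

    n : matrix size, playing the role of m_0.
    m : [m_1, ..., m_{R+1}] with m_{R+1} = 1, each size dividing the previous one.
    """
    sizes = [n] + list(m)
    counts = [0] * (len(sizes) - 1)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            # Merger index: deepest level at which a and b share a box.
            r = max(k for k, s in enumerate(sizes) if a // s == b // s)
            counts[r] += 1
    total = n * (n - 1)
    return [Fraction(c, total) for c in counts]


# Illustrative example: n = 8 with boxes of size 4, 2 and 1.
n_example, m_example = 8, [4, 2, 1]
weights = overlap_weights(n_example, m_example)
```

Exact rational arithmetic is used so that the weights can be compared directly with the closed-form expression.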
The P(q) considered here is not to be confounded with the probability "eld P(q, y) of Section 4.2.3. This interpretation of x(q) indeed restricts the physically relevant space to monotonic functions. Further consequence that should be born in mind is that q's where P(q)"0 have vanishing relative weight in the thermodynamical limit. So any quantity depending on q carries direct physical meaning only for q's where x (q)'0. This reservation will hereafter be understood. The signi"cance of the x(q) (or q(x)) order parameter in long-range interaction systems extend to the "nite range problems. Indeed, the `mean "elda q(x) plays a role also in the "eld theory of spin glasses as discussed in Ref. [223]. It should be emphasized that the distribution of overlaps for a given instance of patterns SI, P (q) I 1 is not self-averaging. So the quenched average included in 122 and so in the de"nition of P(q) L leads to loss of information about the distribution of the random variable q. 6.2. Diagonalization of a Parisi matrix Since spectral properties of Parisi matrices (4.4) play an essential role in our framework, here we brie#y review known results about them (see, e.g., Refs. [230,73]). Only the case q "q "1 will ?? " be considered here, extension to any diagonals is straightforward. The eigenvalue problem is Q*P"DP*P ,
(6.6)
where r labels the eigenvalues and eigenvectors. The simplest eigenvector belongs to r = 0 and has uniform elements, say $\mathbf{v}_0 = (1, 1, \dots, 1)$. The r = 1 subspace is spanned by vectors, orthogonal to $\mathbf{v}_0$, that are uniform over boxes of the first generation, each having $m_1$ elements. An example is $v_a = 1$ if $a = \nu_1 m_1 + 1, \dots, (\nu_1 + 1) m_1$, and $v_a = -1$ if $a = \nu_2 m_1 + 1, \dots, (\nu_2 + 1) m_1$, with distinct integers $\nu_1, \nu_2 < n/m_1$, and $v_a = 0$ for other a's. For a general r, the eigenvectors are uniform over boxes of size $m_r$ and orthogonal to all eigenvectors of lower indices, yielding the eigenvalues

$$\Delta_r = \sum_{p=r}^{R+1} m_p (q_p - q_{p-1}) . \tag{6.7}$$

The dimension of the space of vectors uniform in boxes of size $m_r$ is $n/m_r$; this space is spanned by all eigenvectors of index not larger than r. Given the fact that the r = 0 eigenvalue is nondegenerate, it follows that the degeneracy of the rth, r > 0, eigenvalue is

$$\mu_r = n \left( m_r^{-1} - m_{r-1}^{-1} \right) . \tag{6.8}$$

Continuation of (6.7) in the sense of Section 4.2.1 results in eigenvalues indexed by q as

$$\Delta(q) = \int_q^1 d\bar{q} \, x(\bar{q}) . \tag{6.9}$$

In the case of finite R-RSB, comparison with (6.7) gives

$$\Delta(q_r) = \Delta_{r+1} . \tag{6.10}$$

Thus formula (6.9) incorporates both the R-RSB case and the one when x(q) is made up of plateaus and curved segments. According to the conclusions of Section 6.1, whereas the function Δ(q) is defined for all 0 ≤ q ≤ 1, it gives eigenvalues only for q's where $\dot{x}(q) > 0$. In particular, after continuation and with the notation of Section 4.2.1, x(q) ≡ 1 in the interval $[q_{(1)}, 1]$, so we have from (6.9)

$$\Delta(q) = 1 - q . \tag{6.11}$$
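The eigenvalue formula (6.7) and the degeneracies (6.8) can be checked directly on a small Parisi matrix with integer box sizes. The following is a minimal sketch under assumptions made here: the helper names, the example values n = 8, m = (4, 2, 1) and q = (0.1, 0.4, 0.7) are illustrative and do not appear in the text. For each index r one sample eigenvector is built, uniform over boxes of size $m_r$ and summing to zero within a box of size $m_{r-1}$, and the residual of the eigenvalue equation is recorded.

```python
def parisi_matrix(n, m, q):
    """Parisi matrix with unit diagonal; m = [m_1..m_{R+1}], q = [q_0..q_R]."""
    sizes = [n] + list(m)          # m_0 = n
    levels = list(q) + [1.0]       # q_{R+1} = q_D = 1
    Q = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            # Merger index: deepest level at which a and b share a box.
            r = max(k for k, s in enumerate(sizes) if a // s == b // s)
            Q[a][b] = levels[r]
    return Q


def eigenvalue(r, n, m, q):
    """Delta_r = sum_{p=r}^{R+1} m_p (q_p - q_{p-1}), with q_{-1} = 0."""
    sizes = [n] + list(m)
    levels = [0.0] + list(q) + [1.0]   # levels[p + 1] = q_p
    return sum(sizes[p] * (levels[p + 1] - levels[p])
               for p in range(r, len(sizes)))


def degeneracy(r, n, m):
    """mu_r = n (1/m_r - 1/m_{r-1}) for r > 0, and 1 for r = 0."""
    sizes = [n] + list(m)
    return 1 if r == 0 else n // sizes[r] - n // sizes[r - 1]


def sample_eigenvector(r, n, m):
    """+1 on the first box of size m_r, -1 on the second, 0 elsewhere.

    Requires 2 * m_r <= m_{r-1}, so both boxes lie in the same parent box.
    """
    sizes = [n] + list(m)
    if r == 0:
        return [1.0] * n
    box = sizes[r]
    return [1.0 if a < box else (-1.0 if a < 2 * box else 0.0)
            for a in range(n)]


n, m, q = 8, [4, 2, 1], [0.1, 0.4, 0.7]
Q = parisi_matrix(n, m, q)
residuals = []
for r in range(len(m) + 1):
    v = sample_eigenvector(r, n, m)
    lam = eigenvalue(r, n, m, q)
    Qv = [sum(Q[a][b] * v[b] for b in range(n)) for a in range(n)]
    residuals.append(max(abs(x - lam * y) for x, y in zip(Qv, v)))

# The trace, computed from the spectrum, should equal n (unit diagonal).
trace_from_spectrum = sum(degeneracy(r, n, m) * eigenvalue(r, n, m, q)
                          for r in range(len(m) + 1))
```

The trace comparison is a simple consistency check between (6.7) and (6.8): the degeneracies add up to n and the weighted eigenvalues add up to the sum of the diagonal elements.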
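While $\Delta(q_{(1)})$ is an eigenvalue, $\Delta(q_{(1)}) = 1 - q_{(1)} = \Delta_{R+1}$, the Δ(q) from Eq. (6.11) does not have the meaning of an eigenvalue for $q > q_{(1)}$. The above results allow us to calculate the trace of a matrix function F(Q):

$$\mathrm{Tr}\, F(Q) = \sum_{r=0}^{R+1} \mu_r F(\Delta_r) = n \sum_{r=0}^{R} \frac{1}{m_r} \left[ F(\Delta_r) - F(\Delta_{r+1}) \right] + n F(\Delta_{R+1}) . \tag{6.12}$$

In the continuation process we obtain

$$\lim_{n \to 0} \frac{1}{n} \mathrm{Tr}\, F(Q) = \int_0^{q_{(1)}} dq \, F'(\Delta(q)) + F(\Delta(q_{(1)})) = \int_0^1 dq \left[ F'(\Delta(q)) - F'(1-q) \right] + F(1) = \int_0^1 dq \, F'(\Delta(q)) + F(0) . \tag{6.13}$$

Note that depending on F not all alternative forms may be meaningful, e.g., if F(x) = ln(1 − x) or F(x) = ln x then the second or the third expression is ill defined. The explicit dependence on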
$q_{(1)}$ was eliminated from the second and third formulas. These expressions stay valid also for finite R-RSB. A special case is the calculation of the determinant for (3.17c)

$$\lim_{n \to 0} \frac{1}{n} \ln \det Q = \lim_{n \to 0} \frac{1}{n} \mathrm{Tr} \ln Q = \int_0^1 dq \left[ \frac{1}{\Delta(q)} - \frac{1}{1-q} \right] , \tag{6.14}$$

where the second formula from (6.13) was used. Since the inverse of a Parisi matrix appears in the stationarity relation (3.21), we calculate it here. Because the diagonalizing transformation depends only on the $m_r$'s, but not on the $q_r$'s, the inverse of a Parisi matrix is a Parisi matrix with the same $\{m_r\}$ set. Thus the elements of the inverse matrix also depend only on the merger index r(a, b) introduced in (5.14). It is convenient to parametrize them also by q as

$$[Q^{-1}]_{ab} \equiv q^{-1}(q_{r(a,b)}) . \tag{6.15}$$

This defines a function $q^{-1}(q)$ by continuation, which has plateaus within $(q_{r-1}, q_r)$ in the R-RSB scheme. Equivalently, the inverse matrix can be represented by the inverse of $q^{-1}(q)$, the function $x^{-1}(q)$ (not to be confused with the inverse of q(x), which is x(q)). The two characteristics are related through

$$x^{-1}(q^{-1}(q)) \equiv x(q) . \tag{6.16}$$
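The determinant formula (6.14) can be compared numerically with the spectrum (6.7) and the degeneracies (6.8) evaluated at a small, non-integer n. The sketch below does this for a 1-RSB order parameter function, i.e. x(q) = 0 below $q_0$, x(q) = m on $(q_0, q_1)$ and x(q) = 1 above $q_1$. The function names, the parameter values and the midpoint quadrature are assumptions made here for illustration; they are not taken from the text.

```python
import math


def lndet_finite_n(n, m, q0, q1):
    """(1/n) ln det Q for 1-RSB, from the eigenvalues and their degeneracies."""
    d2 = 1.0 - q1                      # Delta_2, degeneracy n (1 - 1/m)
    d1 = d2 + m * (q1 - q0)            # Delta_1, degeneracy n (1/m - 1/n)
    # ln(Delta_0) - ln(Delta_1) = ln(1 + n q0 / Delta_1); log1p avoids cancellation.
    total = math.log1p(n * q0 / d1)
    total += (n / m) * math.log(d1) + n * (1.0 - 1.0 / m) * math.log(d2)
    return total / n


def delta(q, m, q0, q1):
    """Delta(q) = integral of x from q to 1 for the 1-RSB step function."""
    if q >= q1:
        return 1.0 - q
    if q >= q0:
        return (1.0 - q1) + m * (q1 - q)
    return (1.0 - q1) + m * (q1 - q0)


def lndet_continuum(m, q0, q1, steps=20000):
    """Midpoint rule for the integral in (6.14); the integrand vanishes above q1."""
    total = 0.0
    for lo, hi in ((0.0, q0), (q0, q1)):
        h = (hi - lo) / steps
        for k in range(steps):
            qq = lo + (k + 0.5) * h
            total += h * (1.0 / delta(qq, m, q0, q1) - 1.0 / (1.0 - qq))
    return total


m_par, q0_par, q1_par = 0.5, 0.2, 0.6
finite_n_value = lndet_finite_n(1e-8, m_par, q0_par, q1_par)
continuum_value = lndet_continuum(m_par, q0_par, q1_par)
```

The integration range is split at $q_0$ so that the quadrature never straddles the kink of Δ(q).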
This expresses the fact that in a finite R-RSB the set of $x_r$ indices is the same for Q and $Q^{-1}$. The spectra are in reciprocal relation: for $q \le q_{(1)}$

$$\Delta^{-1}(q^{-1}(q)) = 1/\Delta(q) , \tag{6.17}$$

whence by differentiation, using (6.9) on each side, and requiring $q^{-1}(0) = 0$, we arrive at

$$q^{-1}(q) = - \int_0^q \frac{d\bar{q}}{\Delta(\bar{q})^2} . \tag{6.18}$$

This leaves the diagonal elements $(q^{-1})_{R+1} = q^{-1}(1)$ of $Q^{-1}$ undetermined; they are obtained from the reciprocal relation of the respective eigenvalues of index R + 1, yielding

$$q^{-1}(1) = \frac{1}{1 - q_{(1)}} - \int_0^{q_{(1)}} \frac{dq}{\Delta(q)^2} . \tag{6.19}$$

An attempt to continue $q^{-1}(q)$ between $q_{(1)}$ and 1 shows that $q^{-1}(q)$ is non-monotonic. Again, relations (6.18) and (6.19) equally hold for the discrete R-RSB case, as well as when x(q) has both plateaus and curved segments, with the usual reservation that (6.18) relates matrix elements only when $\dot{x}(q) > 0$.

6.3. Symmetries of Parisi's PDE

A systematic procedure of identifying all continuous symmetries of a PDE is the so-called prolongation method [231]. Knowledge of a continuous symmetry group allows one to generate a family of other solutions out of a given solution. Via the prolongation method we find by construction that there are altogether three one-parameter transformations leaving the PPDE (4.36) invariant. The action of these symmetries on
a solution u(q, y) can be given as a one-parameter family u(s, q, y), with u(0, q, y)"u(q, y). These one-parameter families are u (s, q, y)"u(q, y#s) , (6.20a) u (s, q, y)"u(q, y)#s , (6.20b) u (s, q, y)"u(q, y!D(q)s)!ys#D(q)s , (6.20c) where D(q) is de"ned by (6.9). The fact that the above families are solutions of the PPDE (4.36), provided u(q, y) is also a solution, can also be shown by substitution. The additional statement, namely, that there are no more continuous symmetries, follows from the construction of the prolongation method that we cannot undertake to describe here. Eq. (6.20a) represents translation in y, while (6.20b) is a shift of the "eld u by a constant, these symmetries are obvious. The third one, (6.20c), is less so, it is a shift of the origin in y and of the "eld u and a `tiltinga of the "eld u in y. The symmetry transformation equally changes the initial condition. As a forward reference we note that, in the case of the energy term for the storage problem of a single neuron, the PPDE (7.4) has the error measure potential X(\O, q 4q41 . b
(7.69)
The initial condition of the PPDE (7.68) is f (t"1, y)"f (q , y) , (7.70) whence for ¹P0, after change of integration variable in (7.69) as y "y#z(1!q, we get
(y!y ) . f (t"1, y)"min X(O\O , q 4q41 .
(7.74b)
Thus the initial condition for P(t, y) in the SPDE is P(t"0, y)"P(q , y)"G(y, q ) . The stationarity condition (7.21) for q's where x (q)'0, i.e., mQ (t)q (t)!m(t)qK (t)'0
(7.75)
(7.76)
reads now as
R q (tM ) q F[t, m(t)], # dtM !a dy P(t, y)m(t, y)"0 . (7.77) D(0) D(tM ) Of course, if one has a non-trivial plateau within the t-interval (0, 1) i.e., (7.76) fails in a subinterval, then (7.77) is invalid in that subinterval and one should extremize by the parameters of the plateau extra. In the PDEs and the stationarity condition the temperature does not appear explicitly and allows for a smooth limit in case ¹P0. Assuming that we solved the above PDEs, in the scaled variables the free energy becomes f"f #af , Q C 1 q (t) dt 1 b 1q # ln , f "! ! Q D(t) 2b g 2 D(0) 2
f " Dz f (t"0, z(q ) . C
(7.78a) (7.78b) (7.78c)
The distribution of local stabilities, based on (7.74b), is
o(y)"P(q"1, y)"e\@4W Dz P(t"1, y#z(1!q ) e@D RW>X(\O .
(7.79)
It is straightforward to show that o(y) is normalized if P(t"1, y) was normalized, and the latter property follows from the fact that the SPDE preserves the normalization of its initial condition (7.75). The mean error per pattern can be calculated as
e" dy o(y)O. 2 (q!qi#q#i)(1!q)(1#q) (7.105)
The interesting feature is that the OPF has an explicit and non-perturbative form. The perturbation is in b now, and a small b apparently does not make x(q) degenerate. We shall need
if q 4q41 , D(q)" D (q),1!q #O dq x (q ) if q 4q4q , (7.106) O D (q ) if 04q4q . The leading term non-trivial in the free energy, (7.54b), depends only on the endpoints of the interval as 1!q
O dq 1 q # #ln(1!q )
(q , q )"! D (q) 2 D (q ) O O dq x (q)= Q (q)#c=(1)!c=(q ) . (7.107) #c O The replicon eigenvalues with identical arguments vanish due to the Ward}Takahashi identity, as described in Section 7.1.4, so the SG-I phase is at best marginally stable. Non-linear stability analysis is not available, but believed not to result in instability.
The fourth type of phase found here is a concatenation of a non-trivial plateau of x(q), like in 1-RSB, and a continuously increasing x (q). This CRSB spin glass state is also called SG-IV. The x (q) is again given by (7.105), but extra variational parameters w.r.t. the classic Parisi phase (SG-I) should be introduced: the value x of the plateau stretching from q to a q , and its upper border q . The OPF is given by (7.32) with x (q) as in (7.105), and 1!q if q 4q41 , if q 4q4q , D (q),1!q #O dq x (q ) O (7.108) D(q)" x (q !q)#D (q ) if q 4q4q , D "1!q #D (q )#x (q !q ) if 04q4q . The resulting free energy can be straightforwardly constructed from Eq. (7.54b) as
1 q 1 x (q !q ) O dq # # ln 1#
(q , q , q , x )"! #ln(1!q ) 2 D x D (q ) D (q) O # cx (=(q )!=(q )) O #c dq x (q)= Q (q)#c=(1)!c=(q ) . (7.109) O The specialty of the high-¹ limit is that the numerical evaluation of all spin-glass-like phases involves extremization only in a few scalars, because the x (q) is explicitly known. This has been done in Ref. [17], the results are demonstrated in the "gures there, which we redisplay for illustration. On Fig. 10 the phase diagram is shown, with one RS region and three di!erent types of
Fig. 10. Phase diagram for the potential
(7.111c)
where we have expressed q through g according to (7.66c). This maximization of the free energy functional f [*] has to be performed under the (non-holonomous) constraints x(q )50, x(q )41, and x (q)50 for q 4q4q , or equivalently, m(0)50, m(1)/bq (1)41 (cf. (7.66b)), and mQ (t)q (t)!m(t)qK (t)50 for 14t41 (cf. (7.76)). It is convenient to incorporate these constraints into an augmented free energy functional f (*) in the form of soft penalty terms: I f (*)"f [*]!k t(!m(0))!k I R ! k t(m(1)/bq (1)!1) , t(x)"xh(x)/2 .
dt t(m(t)qK (t)!mQ (t)q (t)) (7.112a) (7.112b)
Thus, by successively increasing the coe$cients k , k , and k in the course of the maximization R procedure of f (*), the respective constraints will be respected more and more rigorously. I Before we proceed, the following points are worth mentioning: (i) Like in Section 7.1.9, our only assumption on q(t) is that it should be a monotonically increasing function with q(0)"q and q(1)"q . But for concrete numerical calculations, especially at low temperatures ¹"b\, the speci"c choice (7.66a) has proven to be particularly appropriate. In any case, the implicit dependence of q(t) on the variational parameters v "q and v "g should be kept in mind: ,> q(t)"q(t; v , v ). ,>
(7.113)
(ii) In our experience, the maximization procedure typically ends not at the border of the admitted parameter regime, where the soft constraints (7.112a) come into action, but rather in the interior of this admitted region. However, in the course of the maximization this border may be visited, and, in the absence of the soft constraints in (7.112a), the maximization procedure often leaves the admitted region and eventually diverges. (iii) Strictly speaking, there are additional constraints on $v_0$ and $v_{N+1}$ associated with the restrictions $0 \le q_{(0)} < q_{(1)} \le 1$; in our experience, however, they were never in danger of being violated, with the obvious exception of cases with a stable RS solution. (iv) As in any variational ansatz, the necessary number N of parameters depends on how well the ansatz is adapted to the problem. In principle, a polynomial or piecewise linear ansatz (7.110) with a sufficiently large number N of parameters can approximate any shape of x(q) arbitrarily well. Whether or not N is sufficiently large in a given case should follow from the accuracy with which the stationarity conditions (7.21) and (7.22) are satisfied. In practice, unavoidable numerical inaccuracies make things more complicated. As has been observed already in Ref. [16] within a 2-RSB ansatz, in the neighborhood of its maximum the free energy functional $f_\mu(\mathbf{v})$ changes extremely little upon certain parameter variations, i.e., the energy landscape $f_\mu(\mathbf{v})$ is very "flat" in certain directions. In our experience, with increasing number of parameters N in (7.110), this problem becomes worse and worse in that the finite numerical accuracy gives rise to a spurious "roughness" in the already very "flat" energy landscape. As a consequence, any maximization strategy becomes slow or even fails for too large N. Similarly, the stability conditions are satisfied very well (in comparison with their numerical uncertainty) within a fairly large neighborhood of the true maximizing x(q).
As a consequence, in any speci"c case, a carefully tailored ansatz with not too many parameters has to be used and the criterion for convergence should be negligible changes in q , g, and m(t) upon re"ning the parametrization (7.110). In order to maximize the augmented free energy functional (7.112a), a good compromise between robustness against the spurious numerical "ne structure in the energy landscape and speed of convergence turned out to be a plain steepest descent procedure along the following lines: given a `workinga parameter set *, the direction of the steepest increase of f (*) is along the gradient I Rf (*)/R*. Taking into account all the implicit dependencies on * in (7.110), (7.113) and the I expression (7.20b) for the gradient of the original free energy functional, a straightforward but somewhat tedious calculation yields for the gradient of f (*) from (7.112a) the result I
FQ (t)m(t) Rq(t) M Rq (1) Rf I" dt# 2q (t) Rq q (1) Rq Rv !
M(t) m(t)
RqK (t) Rq (t) !mQ (t) dt , Rq Rq
(7.114a)
Rf F(t) Rm(t) Rm(0) M Rm(1) I" dt#k t(!m(0)) ! Rv 2 Rv Rv m(1) Rv L L L L !
M(t)
Rm(t) RmQ (t) qK (t)! q (t) dt , Rv Rv L L
(7.114b)
Rf FQ (t)m(t) Rq(t) F(1) M Rq (1) I " ! dt! Rv 2bq (t) Rq 2 bq (1) Rq ,> Rq (t) RqK (t) ! M(t) mQ (t) !m(t) dt bRq bRq where 14n4N, t(x)"xh(x), and we have introduced the quantities
(7.114c)
M "k t(m(1)/bq (1)!1)m(1)/bq (1) , (7.115a) M(t)"k t(m(t)qK (t)!mQ (t)q (t)) , (7.115b) R and used F(t) to denote the l.h.s. of Eq. (7.77) for a given m(t) function. Along this direction Rf (*)/R* of steepest increase, one now searches for the maximum, i.e., the I expression f (*#jRf (*)/R*) has to be maximized with respect to j. This implies the condition I I J(j )"0 (7.116)
for the maximizing j"j , where
Rf (*#jRf (*)/R*) Rf (*) I ) I . (7.117) J(j)" I R* R* By updating the parameter set as * C *#j
Rf (*)/R* (7.118)
I one completes one iteration step of the steepest descent procedure. This iteration scheme is then repeated until * does not appreciably change any more. Note that due to the numerical inaccuracies it makes little sense to locate the zero from (7.116) very precisely in each iteration step. Our usual strategy was based on the assumption that J(j) behaves approximately linear near its zero at j"j . If J(j) is given at two nearby j-values, one then obtains an approximation for j by
linear interpolation. One such readily available J(j)-value is that for j"0, the second one follows by choosing for j the approximation for j from the previous iteration step.
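The iteration described above (gradient direction, one-dimensional search for the zero of J along it, update, and gradual stiffening of the soft penalties) can be sketched on a toy problem. Everything specific in the block below is an assumption made for illustration: the two-parameter concave toy functional, the single constraint v[0] ≤ 1, the sequence of penalty coefficients, and the function names. The zero of J is located here by bisection rather than by the linear interpolation used in the text, because the kink of the penalty at the constraint border can mislead a single interpolation step on such a small example. Only the penalty shape, whose derivative is x θ(x), follows the text.

```python
def psi_prime(x):
    """Derivative x * theta(x) of the soft penalty psi(x) = x**2 * theta(x) / 2."""
    return x if x > 0.0 else 0.0


def grad(v, mu):
    """Gradient of the toy augmented functional.

    Toy functional (hypothetical): f(v) = -(v0 - 2)**2 - (v1 + 1)**2,
    augmented by the penalty -mu * psi(v0 - 1) for the constraint v0 <= 1.
    """
    return [-2.0 * (v[0] - 2.0) - mu * psi_prime(v[0] - 1.0),
            -2.0 * (v[1] + 1.0)]


def J(lam, v, g, mu):
    """Directional derivative along g, evaluated at v + lam * g."""
    w = [vi + lam * gi for vi, gi in zip(v, g)]
    return sum(a * b for a, b in zip(grad(w, mu), g))


def ascend(v, mu, max_steps=1000, tol=1e-20):
    """Steepest ascent with a bisection search for the zero of J."""
    for _ in range(max_steps):
        g = grad(v, mu)
        if sum(gi * gi for gi in g) < tol:
            break
        lo, hi = 0.0, 1.0
        while J(hi, v, g, mu) > 0.0:    # bracket the zero of J
            hi *= 2.0
        for _ in range(60):             # J decreases with lam (concavity)
            mid = 0.5 * (lo + hi)
            if J(mid, v, g, mu) > 0.0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        v = [vi + lam * gi for vi, gi in zip(v, g)]
    return v


# Increase the penalty coefficient step by step, reusing the previous maximum.
v_opt = [0.0, 0.0]
for mu_value in (1.0, 10.0, 100.0, 1000.0):
    v_opt = ascend(v_opt, mu_value)
```

For this toy functional the penalized maximum is at v[0] = (4 + μ)/(2 + μ), which approaches the constraint border v[0] = 1 from outside as μ grows; this mirrors the statement that the constraints are respected more and more rigorously as the coefficients are increased.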
7.2.4. The CRSB state

In Ref. [18] we presented some characteristic results, obtained by the method expounded in the previous section, for the error measure (7.94). In a non-exhaustive search we found that if the RS solution is AT-unstable, at T = 0 beyond capacity and also for some low temperatures, only a classic Parisi CRSB state emerges. Its OPF is given in (4.44), and was denoted as SG-I. We conjecture that at T = 0 the region beyond capacity is such a phase. Sufficiently high T's, where the 1-RSB and the CRSB state with a plateau (SG-IV) would have arisen, as described in Section 7.2.2, were not reached in our explorations. The scaling introduced in Section 7.1.9, and notably the introduction of the OPF m(q) = βx(q), allows the description of the CRSB state at any temperature while maintaining a smooth transition to the ground state, T = 0. Physically, the fact that x(q) → 0 at T = 0 for any q < 1 means that q = 1 with probability one. Thus freezing sets in, similar to the ground state of the SK model [29]. At the same time, the degenerate x(q) is no longer a useful OPF, because the free energy becomes a functional of m(q) instead.
Fig. 13. Scaled order parameter function x(q) for κ = 0, α = 3 at T = 0 (solid), T = 0.01 (dashed), and T = 0.1 (dotted). The first discontinuity is at $q_{(0)}$, below which the function is identically zero. The second discontinuity for T > 0 is at $q_{(1)}$, which goes to 1 for T → 0. Reprinted from Ref. [18].
In Fig. 13 the scaled OPF m(q) = βx(q) is displayed for various parameters. All parameter settings are in the AT-unstable region. This figure is the first indication, to our knowledge, of Parisi's CRSB state for low temperatures in a system that is neither a model of long-range interaction spin glasses nor closely related to one, as the Little–Hopfield network is. It is remarkable that the scaling by β makes the continuously increasing segment of the scaled OPF only weakly sensitive to the temperature. The lower end $q_{(0)}$ of this segment is equally stable, but the upper end $q_{(1)}$ shows linear temperature dependence, $1 - q_{(1)} \propto T$. The rightmost plateau's value is obviously m(1) = β. At the same parameter settings as before, the local stability density is displayed in Fig. 14. Since in the method of Section 7.2.3 the probability field P(q, y) is evaluated by the scaled SPDE (7.73) in every approximant step, we obtain the sought field in the end by (7.79). Not shown is the Dirac delta peak at T = 0, which restores normalization to one there. A gap exists at T = 0, with right border Δ = κ, in accordance with (7.98), but the gap immediately disappears for any positive T, as can be seen from (7.79). At T = 0 the density ρ(Δ) vanishes linearly at the lower edge of the gap. Comparison between the CRSB solution and earlier RS [5], 1-RSB [7,8] and 2-RSB [11] approaches shows that averaged quantities, like the mean error per pattern, do not show significant differences. The qualitative behavior of the error, namely that at T = 0 it is zero below capacity and positive beyond it, and furthermore that it increases linearly for small $\alpha - \alpha_c$, is reflected by the previous solutions. The 1- and 2-RSB ε(α) curves look the same on the resolution of a figure [11]. On the other hand, the difference is more conspicuous in the distribution of non-self-averaging quantities. The OPF x(q) is the averaged probability measure of the overlap of coupling vectors, and the definitely
Fig. 14. Density of local stabilities ρ(Δ) from theory for κ = 0, α = 3 at T = 0 (solid), T = 0.01 (dashed), and T = 0.1 (dotted). Reprinted from Ref. [18].
continuously increasing part of it in Figs. 11 and 13 shows that finite R-RSBs are qualitatively in error. A further qualitative difference can be found in the distribution of local stabilities ρ(Δ). Indeed, for finite R-RSB the ρ(Δ) exhibits a discontinuity at the lower edge of the gap. The right tendency is shown by the feature that the size of the discontinuity is smaller in the 1-RSB than in the RS solution [7].

7.2.5. Simulation

In this section we describe the simulation results from [18]. Wendemuth adapted existing algorithms for below capacity of the simple perceptron, with potentials of the form (3.9), to the region beyond it by specially dealing with patterns with positive stabilities [155], and performed a series of simulations [37]. The most sensitive part of his work was the potential with b = 0, which counts the number of unstable patterns, an NP-complete problem from the algorithmic viewpoint [154]. His data showed significant deviation from the best theoretical prediction then available, the 1-RSB calculation of Majer et al. [7]. He evaluated the probability density of local stabilities at α = 1 and κ = 1, a point known to be beyond capacity. Although the shapes roughly resembled each other, in that a gap and a peak at its right end were present, the simulation data gave systematically and discouragingly larger stabilities than predicted by theory. Essentially following Wendemuth's algorithm we redid the simulation in order to see how persistent the deviation is. The first step is to generate random patterns (3.2). We selected numbers with uniform distribution from an interval centered around zero and in the end normalized them as

$$\sum_{k=1}^{N} (S_k^\mu)^2 = N . \tag{7.119}$$
The outputs prescribed for the patterns were uniformly taken to be 1, which does not restrict generality because the components $S_k^\mu$ have random signs. The algorithm proceeds in discrete time t = 0, 1, …. At t = 0 we initialized the coupling vector according to the Hebb rule

$$J_k(0) = \mathrm{const.} \sum_{\mu=1}^{M} S_k^\mu , \tag{7.120}$$

with the constant chosen so that the Euclidean norm satisfied $|\mathbf{J}(0)|^2 = N$. At time t the local stabilities

$$\Delta^\mu(t) = \frac{\mathbf{J}(t) \cdot \mathbf{S}^\mu}{|\mathbf{J}(t)|} \tag{7.121}$$
are computed and among the unstable ones, i.e., DI(t)(i, the one with the largest DI(t) is selected. This is the least unstable pattern, characterized by the index k (t). The couplings are updated according to the rule of Wendemuth [155,37]. We took J(t#1)"J(t)#j(SI R#*S(t)) ,
(7.122)
where
*S(t)"
0
N/"J(t)"!DI R(t) J(t) "J(t)"!DI R(t)
if DI R(t)'0 , if DI R(t)(0 .
(7.123)
The j is the gain parameter, chosen in Ref. [155] as j"N\. By trial and error we found that a larger gain parameter j"N\ did not endanger overall convergence, and made the "nal approach for a given pattern, DI R(t)Pi, faster. The second row in the update rule (7.123) is Wendemuth's term introduced to specially cope with patterns with negative stability. At the next time step t#1 we again "nd the least unstable pattern with index k (t#1) and update the couplings by the above rule. The usual course of the algorithm is that the least unstable pattern is the same, k (0)"k (1)"2, until it becomes stabilized at say t !1, whence another pattern is taken for some steps, k (t )"k (t #1)"2, again until it becomes stabilized. In principle, another pattern may become least stable before the one in question is stabilized, but typically this was not the case. The above recipe is repeated until a pattern cannot be stabilized in a reasonable time. The notion of reasonable time could be quanti"ed, because the time needed to stabilize a pattern showed a systematic increase as function of the total number of patterns stabilized before. Therefore, it is a good recipe to halt the algorithm, when a pattern cannot be stabilized within a small multiple of the extrapolated convergence time. In test runs, if the last pattern could not be stabilized within twice the extrapolated convergence time, it could not within ten times of the same either. Thus we are con"dent that we exploited the possibilities of the update rule described above. Wendemuth algorithm is based on the argument that one has the highest chance to stabilize the pattern among all patterns with DI(i whose DI is closest to i. So this algorithm may maximize the number of stable patterns, by successively pushing the stability of the least unstable pattern to i from below. 
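The core loop of the procedure (find the least unstable pattern, push the couplings toward it, repeat) can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation code used for the results: the additional term that the update rule applies to patterns with negative stability is omitted, the gain value is an arbitrary illustrative choice, the halting criterion is a fixed step budget instead of the extrapolated convergence time, and all function names and parameter values are hypothetical.

```python
import math
import random


def make_patterns(M, N, rng):
    """M random patterns with components uniform around zero, normalized to |S|^2 = N."""
    patterns = []
    for _ in range(M):
        s = [rng.uniform(-1.0, 1.0) for _ in range(N)]
        norm = math.sqrt(sum(x * x for x in s))
        patterns.append([x * math.sqrt(N) / norm for x in s])
    return patterns


def stabilities(J, patterns):
    """Local stabilities J . S / |J| for all patterns (outputs all equal to +1)."""
    norm_j = math.sqrt(sum(x * x for x in J))
    return [sum(j * s for j, s in zip(J, S)) / norm_j for S in patterns]


def train(patterns, kappa, gain, max_steps):
    """Repeatedly strengthen the least unstable pattern (simplified update)."""
    N = len(patterns[0])
    # Hebb-rule initialization, rescaled so that |J|^2 = N.
    J = [sum(S[k] for S in patterns) for k in range(N)]
    scale = math.sqrt(N) / math.sqrt(sum(x * x for x in J))
    J = [x * scale for x in J]
    for _ in range(max_steps):
        unstable = [(d, i) for i, d in enumerate(stabilities(J, patterns))
                    if d < kappa]
        if not unstable:
            break
        _, i = max(unstable)            # unstable pattern with the largest stability
        J = [j + gain * s for j, s in zip(J, patterns[i])]
    return J


rng = random.Random(0)
N_units, M_patterns = 50, 20            # illustrative sizes, load M/N = 0.4
patterns = make_patterns(M_patterns, N_units, rng)
J_final = train(patterns, kappa=0.0, gain=N_units ** -0.5, max_steps=2000)
final_stabilities = stabilities(J_final, patterns)
```

The example deliberately uses κ = 0 and a load well below capacity, where the simplified update reduces to ordinary perceptron learning and is known to terminate; beyond capacity the loop would simply run until the step budget is exhausted.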
A consequence is that the remaining non-stabilized patterns with $\Delta^\mu < \kappa$ will have a relatively large distance $\kappa - \Delta^\mu$, but the latter quantity does not enter the present error measure. Nevertheless, the principle of stabilizing the least unstable pattern qualitatively resembles the
Fig. 15. Density of local stabilities ρ(Δ) at α = κ = 1. The horizontal axis is Δ, the vertical one ρ. The theoretical prediction is given by the full line. The two empirical densities are normalized histograms, taken with M = N = 500 and 1000. Reprinted from Ref. [18].
gradient descent algorithm for differentiable error measures, because every step is made in the momentarily most promising direction. The shortcomings of such algorithms in NP-complete problems are known, and we cannot be certain that the number of unstable patterns is indeed minimized. The result of the simulation at the parameter setting α = κ = 1 is shown in Fig. 15. Since κ > 0, the stability of the momentarily least unstable pattern is positive in its final approach to κ, so the second row in the update rule (7.123) does not come into play. The full line is the result of numerical extremization of the variational free energy (7.19) by the method explained in Section 7.2.3. We omitted the Dirac delta peak of the theoretical probability density at κ = 1. The dashed lines are the histograms for the local stabilities from simulation for two sizes, M = N = 500 and 1000, with proper normalization. We do not include the original data of Wendemuth [37], but mention that his histogram showed a much larger systematic error. To quantify the deviation let us consider the mean error ε, i.e., the relative number of misclassified patterns. Wendemuth's number is 0.21, the present simulation gives 0.15, while theory predicts 0.1358. Thus we are still about 10% off the theoretical value, but this is a remarkable improvement w.r.t. the previous deviation of 55%. The size of the gap from simulation is also within about 10% of the theoretical value. The simulation data reproduce, for the larger size M = N = 1000, the property that the density ρ(Δ) vanishes linearly at the lower edge of the gap. This should be contrasted with the 1-RSB result in Ref. [7], where the size of the discontinuity at the lower edge of the gap is about a third of the height of the left peak. The simulation clearly favors the CRSB solution. In summary, the theoretical and simulation data do not match perfectly; however, given the NP-completeness of the numerical problem, this does not disprove theory. We mention that the
algorithm used was primitive in that it is deterministic. Furthermore, it does not have a rigorous mathematical basis for convergence to the desired state. There is obviously room for further improvement.

8. The neuron: independently distributed synapses

8.1. Free energy and stationarity condition

In this paper we focus mostly on the spherical neuron. Since, however, the main formulas for the case of the prior distribution (3.6), where synapses are independent and obey an arbitrary distribution, follow straightforwardly from Section 4, we now briefly review them. In the course of continuation the limits

$$q_0\to q_{(0)},\qquad q_R\to q_{(1)}, \tag{8.1a}$$

$$\hat q_0\to\hat q_{(0)},\qquad \hat q_R\to\hat q_{(1)} \tag{8.1b}$$

are assumed. The corresponding free energy (3.22) can be characterized by two OPFs

$$q(x),\qquad q(0)=q_{(0)},\qquad q(1)=q_{(1)}, \tag{8.2a}$$

$$\hat q(x),\qquad \hat q(0)=\hat q_{(0)},\qquad \hat q(1)=\hat q_{(1)}. \tag{8.2b}$$

Alternatively, we can take as OPFs the respective inverses

$$x(q),\qquad \hat x(\hat q). \tag{8.3}$$

Then

$$\hat q=\hat q(x(q)), \tag{8.4}$$

or its inverse function

$$q=q(\hat x(\hat q)) \tag{8.5}$$
establishes a relation between the overlaps $q$ and $\hat q$. Concerning the energy term $f_e$, Eqs. (7.1)–(7.9) from the spherical case carry over unchanged. The entropic term (3.22d) is a transcript of (4.41) with (4.3), together with the appropriate equations that produce the averages. We introduce the field (see Eqs. (4.40a) and (4.40b))

$$\hat f(\hat q,y)=-\beta^{-1}\hat\varphi(\hat q,y) \tag{8.6}$$

to get

$$f_s[\hat x(\hat q)]=\lim_{n\to0}\frac1n f_s(\hat Q)=-\beta^{-1}\,\hat\varphi\!\left[\ln\int du\,w(u)\,e^{-\beta uy},\hat Q\right]_{n=0}=\hat f(0,0), \tag{8.7}$$

where $\hat f(\hat q,y)$ is the solution of

$$\partial_{\hat q}\hat f=-\tfrac12\,\partial_y^2\hat f+\tfrac12\,\beta\hat x\,(\partial_y\hat f)^2, \tag{8.8a}$$

$$\hat f(\hat q_{(1)},y)=-\beta^{-1}\ln\int Dz\int du\,w(u)\,\exp\!\left(-\beta u\left(y+iz\sqrt{\hat q_{(1)}}\right)\right). \tag{8.8b}$$
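Introducing

$$\hat m(\hat q,y)=\partial_y\hat f(\hat q,y), \tag{8.9}$$

we have

$$\partial_{\hat q}\hat m=-\tfrac12\,\partial_y^2\hat m+\beta\hat x\,\hat m\,\partial_y\hat m, \tag{8.10a}$$

$$\hat m(\hat q_{(1)},y)=\partial_y\hat f(\hat q_{(1)},y). \tag{8.10b}$$

Furthermore, the hatted 'susceptibility field' is

$$\hat\chi(\hat q,y)=\partial_y\hat m(\hat q,y), \tag{8.11}$$

obeying

$$\partial_{\hat q}\hat\chi=-\tfrac12\,\partial_y^2\hat\chi+\beta\hat x\,(\hat m\,\partial_y\hat\chi+\hat\chi^2), \tag{8.12a}$$

$$\hat\chi(\hat q_{(1)},y)=\partial_y^2\hat f(\hat q_{(1)},y). \tag{8.12b}$$

The probability density $\hat P(\hat q,y)$ satisfies a variant of the SPDE

$$\partial_{\hat q}\hat P=\tfrac12\,\partial_y^2\hat P+\beta\hat x\,\partial_y(\hat P\hat m), \tag{8.13a}$$

$$\hat P(0,y)=\delta(y). \tag{8.13b}$$

The interaction term (3.22c) is simplest if expressed through the functions (8.2):

$$f_i[x(q),\hat x(\hat q)]=-\frac{\beta}{2}\int_0^1 dx\,q(x)\,\hat q(x). \tag{8.14}$$

Since a function is a functional of its inverse, $f_i[\ldots]$ can be considered as a functional of $x(q)$ and $\hat x(\hat q)$. The stationarity conditions (3.24), (3.25) now read

$$q=\int dy\,\hat P(\hat q,y)\,\hat m^2(\hat q,y), \tag{8.15a}$$

$$\hat q=\alpha\int dy\,P(q,y)\,m^2(q,y), \tag{8.15b}$$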
where the connection between $q$ and $\hat q$ is established by (8.4) or (8.5). The right-hand sides are functionals of $\hat x(\hat q)$ and $x(q)$, respectively. Note that solving these equations also involves finding the starting point $\hat q_{(1)}$, in contrast to the evaluation of the energy term, where the initial condition is fixed at $q=1$. Given the solution for the stationary $x(q)$ and $\hat x(\hat q)$, by substituting them into the r.h.s. of

$$f=f_s[\hat x(\hat q)]+f_i[x(q),\hat x(\hat q)]+\alpha f_e[x(q)] \tag{8.16}$$

we obtain the final result for the mean free energy. A special case of independently distributed synapses is the clipped neuron, i.e., the neuron with discrete synapses. The most studied such model is the Ising neuron with binary synapses, which has
attracted considerable interest (see [19,12] for references). The prior distribution in the Ising case involves (3.7), so the initial conditions for the PDEs are

$$\hat f(\hat q_{(1)},y)=-\beta^{-1}\ln\cosh\beta y+\tfrac12\,\beta\hat q_{(1)}, \tag{8.17a}$$

$$\hat m(\hat q_{(1)},y)=-\tanh\beta y. \tag{8.17b}$$

The Ising neurons studied in the literature so far were reminiscent of the random energy model in that they involved at most 1-RSB [234]. However, only a few choices of the error measure potential

$$e^{n\varphi[\Phi(y),Q]}=\int\prod_{r=0}^{R+1}\prod_{j_r=1}^{n/m_r}Dz^{(r)}_{j_r}\;\prod_{a=1}^{n}\exp\Phi\!\left(\sum_{r=0}^{R+1}z^{(r)}_{j_r(a)}\sqrt{q_r-q_{r-1}}\right). \tag{C.2}$$

Note that $m_{R+1}=1$ and $j_{R+1}(a)=a$; we will substitute $j_{R+1}$ for $a$. The integrals over $z^{(R+1)}_{j_{R+1}}$ factorize as

$$e^{n\varphi[\Phi(y),Q]}=\int\prod_{r=0}^{R}\prod_{j_r=1}^{n/m_r}Dz^{(r)}_{j_r}\;\prod_{j_{R+1}=1}^{n/m_{R+1}}\int Dz^{(R+1)}_{j_{R+1}}\exp\Phi\!\left(\sum_{r=0}^{R}z^{(r)}_{j_r(j_{R+1})}\sqrt{q_r-q_{r-1}}+z^{(R+1)}_{j_{R+1}}\sqrt{q_{R+1}-q_R}\right). \tag{C.3}$$

The functions $j_r(j_{R+1})$, $r\le R$, are step-like in that they are constant for $m_R/m_{R+1}$ different $j_{R+1}$'s belonging to the same box of length $m_R$. Integrations over $z^{(R+1)}_{j_{R+1}}$'s associated with the same box give identical results. Different integrals are characterized by different $j_R$'s; this can be given as the new argument for the rest of the indices as $j_r(j_R)$, $r\le R$. We then have

$$e^{n\varphi[\Phi(y),Q]}=\int\prod_{r=0}^{R}\prod_{j_r=1}^{n/m_r}Dz^{(r)}_{j_r}\;\prod_{j_R=1}^{n/m_R}\left[\int Dz_{R+1}\exp\Phi\!\left(\sum_{r=0}^{R}z^{(r)}_{j_r(j_R)}\sqrt{q_r-q_{r-1}}+z_{R+1}\sqrt{q_{R+1}-q_R}\right)\right]^{m_R/m_{R+1}}. \tag{C.4}$$

Again, integration over a $z^{(R)}_{j_R}$ gives the same value for those $j_R$'s that define the same $j_r(j_R)$, $r\le R-1$. These can be characterized by $j_{R-1}$, and one obtains

$$e^{n\varphi[\Phi(y),Q]}=\int\prod_{r=0}^{R-1}\prod_{j_r=1}^{n/m_r}Dz^{(r)}_{j_r}\;\prod_{j_{R-1}=1}^{n/m_{R-1}}\left[\int Dz_R\left[\int Dz_{R+1}\exp\Phi\!\left(\sum_{r=0}^{R-1}z^{(r)}_{j_r(j_{R-1})}\sqrt{q_r-q_{r-1}}+\sum_{r=R}^{R+1}z_r\sqrt{q_r-q_{r-1}}\right)\right]^{m_R/m_{R+1}}\right]^{m_{R-1}/m_R}. \tag{C.5}$$

The expression can be rolled up by continuing the above reasoning and we arrive at

$$e^{n\varphi[\Phi(y),Q]}=\int Dz_0\left[\int Dz_1\left[\int Dz_2\cdots\left[\int Dz_{R+1}\exp\Phi\!\left(\sum_{r=0}^{R+1}z_r\sqrt{q_r-q_{r-1}}\right)\right]^{m_R/m_{R+1}}\cdots\right]^{m_1/m_2}\right]^{m_0/m_1}. \tag{C.6}$$
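The nested structure of (C.6) lends itself to a direct numerical check. The following is a minimal sketch, not the author's method: it evaluates the $n\to0$ limit of the nested Gaussian integrals with Gauss-Hermite quadrature, assuming that the box sizes $m_r$ have been continued to $x_r$, that $x_{R+1}=1$, and that the innermost exponent equals 1. The function name `parisi_term` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def parisi_term(Phi, q, x, n_nodes=40):
    """n -> 0 limit of the R-RSB term via nested Gaussian integrals.

    q = [q_0, ..., q_{R+1}] (increasing), x = [x_1, ..., x_{R+1}], x_{R+1} = 1.
    Phi must accept NumPy arrays. Cost grows as n_nodes**(R+2).
    """
    z, w = hermegauss(n_nodes)        # nodes/weights for the weight exp(-z^2/2)
    w = w / np.sqrt(2.0 * np.pi)      # normalise to the Gaussian measure Dz
    R = len(q) - 2
    xs = list(x) + [1.0]              # xs[r] = x_{r+1}; the appended 1 is an assumption

    def psi(r, y):
        # psi_{R+1}(y) = exp(Phi(y)); otherwise
        # psi_r(y) = int Dz psi_{r+1}(y + z sqrt(q_{r+1} - q_r))^(x_{r+1}/x_{r+2})
        if r == R + 1:
            return np.exp(Phi(y))
        step = np.sqrt(q[r + 1] - q[r])
        inner = psi(r + 1, y[..., None] + step * z) ** (xs[r] / xs[r + 1])
        return inner @ w

    # phi = (1/x_1) int Dz ln psi_0(z sqrt(q_0))
    return float(np.log(psi(0, np.sqrt(q[0]) * z)) @ w) / x[0]
```

For a linear test function $\Phi(y)=ay$ every integral is Gaussian, and for $R=1$ the sketch should reproduce $\tfrac12a^2(q_2-q_1)+\tfrac12xa^2(q_1-q_0)$, which gives a quick sanity check before nonlinear potentials are tried.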
Appendix D. Derivation of the PPDE by continuation

To the author's knowledge, Ref. [226] is the only publication on the derivation of the PPDE. However, we were not able to reproduce the derivation from that article. Furthermore, [226] required $R\to\infty$ and $q_r-q_{r-1}\to0$, conditions which we did not find necessary to prescribe. In essence, [226] proposes an iteration in a direction that is opposite to that of the recursion (4.15). We were unable to reconstruct that, mostly because the starting term was not known. In other words, we evaluated the free energy term (4.11) starting from $r=R+1$, while [226] did so from $r=0$ (in our notation). When $q_r-q_{r-1}\to0$ is assumed, our recursion yields the PPDE in the spirit of Ref. [226]. We use the identity

$$\exp\!\left(\frac{c}{2}\frac{d^2}{dy^2}\right)F(y)=\int Dz\,F(y+z\sqrt c) \tag{D.1}$$

to rewrite (4.15a) into

$$\psi_{r-1}(y)=e^{\frac12(q_r-q_{r-1})\partial_y^2}\,\psi_r(y)^{x_r/x_{r+1}}. \tag{D.2}$$
In order to produce a PDE from the recursion, the assumption of ordering for the $q_r$'s is necessary. We can then relegate the dependence on the index $r$ to dependence on the variable $q=q_r$. Continuation is then performed by replacing $q_r$ by $q$ and $\psi_r(y)$ by $\psi(q,y)$. We allow for non-trivial limits $q_{(0)}$ and $q_{(1)}$ as introduced in (4.42). The conditions (4.6) and (4.13) ensure monotonicity of $x(q)$. If we assume a smooth $x(q)$, i.e., that all $q_r-q_{r-1}\to0$ and $x_r-x_{r-1}\to0$ for $1\le r\le R+1$, then an expansion of (D.2) in the differences to lowest nontrivial order yields for $\psi(q,y)$ the PDE (4.34) in the interval $(q_{(0)},q_{(1)})$. As we found in Section 4.1.2, Eq. (4.34) and, equivalently, the PPDE (4.36) hold even if $x(q)$ is not smooth, with the right interpretation of (4.34) at discontinuities of $x(q)$. On the other hand, the author gladly acknowledges that the way he first obtained the PPDE for the general free energy term (4.1) was in the spirit of the above-discussed derivation of Ref. [226].
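The identity (D.1), which turns a Gaussian average into the action of a diffusion operator, can be checked numerically on a polynomial, where the operator series terminates. The sketch below is only an illustration under that assumption: it takes $F(y)=y^4$, for which $\exp\!\big(\tfrac c2\,d^2/dy^2\big)y^4=y^4+6cy^2+3c^2$, and compares this with a Gauss-Hermite evaluation of the Gaussian integral. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_average(F, y, c, n_nodes=20):
    """Right-hand side of (D.1): int Dz F(y + z sqrt(c))."""
    z, w = hermegauss(n_nodes)                 # weight function exp(-z^2/2)
    return float(np.sum(w * F(y + z * np.sqrt(c))) / np.sqrt(2.0 * np.pi))


def diffusion_operator_on_quartic(y, c):
    """Left-hand side of (D.1) for F(y) = y^4; the exponential series stops
    after the second-order term because the sixth derivative vanishes."""
    return y**4 + 6.0 * c * y**2 + 3.0 * c**2


lhs = diffusion_operator_on_quartic(0.8, 0.3)
rhs = gaussian_average(lambda t: t**4, 0.8, 0.3)
print(lhs, rhs)   # the two numbers should agree to rounding error
```

Gauss-Hermite quadrature with 20 nodes is exact for polynomials of this degree, so any discrepancy beyond rounding would point to an error in the identity as transcribed.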
Appendix E. Multidimensional generalization of the PPDE

We consider here the generalized free energy term

$$\varphi[\Phi(\mathbf y),\mathsf Q]=\frac1n\ln\int\frac{d^{nK}x\,d^{nK}y}{(2\pi)^{nK}}\;\prod_{a=1}^{n}\exp\Phi(y_a^1,\ldots,y_a^K)\;\exp\!\left(i\sum_{a=1}^{n}\sum_{\mu=1}^{K}x_a^\mu y_a^\mu-\frac12\sum_{a,b=1}^{n}\sum_{\mu,\nu=1}^{K}x_a^\mu q_{ab}^{\mu\nu}x_b^\nu\right), \tag{E.1}$$

where the order parameter matrix now has extra indices

$$[\mathsf Q]^{\mu\nu}_{ab}=q^{\mu\nu}_{ab}. \tag{E.2}$$

Such a situation occurs, for instance, in the treatment of thermodynamical states in vector spin glasses, or of the metastable states in the SK model. When counting the stationary states of the Thouless–Anderson–Palmer equations, Bray and Moore [26] encountered Eq. (E.1) with $K=2$ and a special $\Phi$. They displayed the corresponding PPDE but did not pursue the matter further. Since Eq. (E.1) is a straightforward generalization of the Parisi term, we briefly describe how to evaluate it. We also concisely formulate the calculation of replica correlators. The assumption of the Parisi structure for all individual submatrices of $\mathsf Q$ with fixed $\mu,\nu$ can be cast into the form

$$\mathsf Q=\sum_{r=0}^{R+1}(Q_r-Q_{r-1})\otimes U_{m_r}\otimes I_{n/m_r}. \tag{E.3}$$

Here

$$[Q_{r(a,b)}]^{\mu\nu}=q^{\mu\nu}_{r(a,b)}=q^{\mu\nu}_{ab} \tag{E.4}$$

is the symmetric $K\times K$ matrix analog of (5.14). The quadratic form in the exponent in (E.1) is now

$$\sum_{r=0}^{R+1}\sum_{\mu,\nu=1}^{K}(q_r^{\mu\nu}-q_{r-1}^{\mu\nu})\sum_{j_r=1}^{n/m_r}\;\sum_{a=m_r(j_r-1)+1}^{m_rj_r}\;\sum_{b=m_r(j_r-1)+1}^{m_rj_r}x_a^\mu x_b^\nu, \tag{E.5}$$
with $q^{\mu\nu}_{-1}=0$. Let us diagonalize the difference between subsequent $Q_r$'s as

$$Q_r-Q_{r-1}=O_r^{T}\Lambda_rO_r,\qquad r=0,\ldots,R+1, \tag{E.6}$$

where the orthogonal $K\times K$ matrix $O_r$ is made up of the eigenvectors of $Q_r-Q_{r-1}$ and $\Lambda_r$ is diagonal and has the real eigenvalues as diagonal elements. A derivation similar to that given in Section 4.1 and Appendix C yields the R-RSB term

$$\varphi[\Phi(\mathbf y),\{q_r^{\mu\nu}\},\mathbf x]\Big|_{n=0}=\frac{1}{x_1}\int D^Kz_0\,\ln\int D^Kz_1\left[\int D^Kz_2\cdots\left[\int D^Kz_{R+1}\exp\Phi\!\left(\sum_{r=0}^{R+1}z_r\Lambda_r^{1/2}O_r\right)\right]^{x_R/x_{R+1}}\cdots\right]^{x_1/x_2}. \tag{E.7}$$

Here $\Lambda_r^{1/2}$ has the square roots of the eigenvalues (possibly also imaginary numbers, the sign being irrelevant) as diagonal elements, $D^Kz$ denotes the $K$-dimensional Gaussian integration measure, and $z_r$ is a $K$-dimensional vector. The function $\Phi(y^1,\ldots,y^K)$ is naturally abbreviated by $\Phi(\mathbf y)$. The recursion

$$\psi_{r-1}(\mathbf y)=\int D^Kz\;\psi_r(\mathbf y+z\Lambda_r^{1/2}O_r)^{x_r/x_{r+1}}, \tag{E.8a}$$

$$\psi_{R+1}(\mathbf y)=e^{\Phi(\mathbf y)} \tag{E.8b}$$

evaluates (E.7) as

$$\varphi[\Phi(\mathbf y),\{q_r^{\mu\nu}\},\mathbf x]\Big|_{n=0}=\frac{1}{x_1}\int D^Kz\,\ln\psi_0(z\Lambda_0^{1/2}O_0). \tag{E.9}$$

In order to produce a PDE we need to specify a time-like variable. For practical purposes we consider the case when one diagonal element is a known constant, say $q^{11}_{R+1}=1$. Then we pick $q_r^{11}$ as time variable, call its continuation $q$, and obtain the PDE for the field $\psi(q,\mathbf y)$ in $K$ spatial dimensions as

$$\partial_q\psi=-\frac12\,\nabla_y\dot Q\,\nabla_y\psi+\frac{\dot x}{x}\,\psi\ln\psi, \tag{E.10a}$$

$$\psi(1,\mathbf y)=e^{\Phi(\mathbf y)}. \tag{E.10b}$$

Here the dot means derivative in terms of $q$, of course $[\dot Q]^{11}=1$, and $q$ evolves from 1 to 0. As in the case with one spatial dimension, in the $q$-intervals $(q_{(1)},1)$ and $(0,q_{(0)})$ we have $x(q)\equiv1$ and $x(q)\equiv0$, respectively, where $q_{(1)}=q_R^{11}$ and $q_{(0)}=q_0^{11}$. Again, by introducing

$$\varphi(q,\mathbf y)=\frac{\ln\psi(q,\mathbf y)}{x(q)} \tag{E.11}$$

we obtain the $K$-dimensional PPDE as

$$\partial_q\varphi=-\frac12\,\nabla_y\dot Q\,\nabla_y\varphi-\frac12\,x\,(\nabla_y\varphi)\,\dot Q\,(\nabla_y\varphi), \tag{E.12a}$$

$$\varphi(1,\mathbf y)=\Phi(\mathbf y). \tag{E.12b}$$
Then the sought term is

$$\varphi[\Phi(\mathbf y),\{q^{\mu\nu}\},x]\Big|_{n=0}=\varphi(0,\mathbf 0). \tag{E.13}$$

The evolution in the interval $(q_{(1)},1)$ can be solved explicitly to give

$$\varphi(q_{(1)},\mathbf y)=\ln\int D^Kz\,\exp\Phi(\mathbf y+z\Lambda_{R+1}^{1/2}O_{R+1})=\ln\int\frac{d^Kv\,d^Kw}{(2\pi)^K}\exp\!\left[\Phi(\mathbf v)+i\mathbf w(\mathbf v-\mathbf y)-\tfrac12\,\mathbf w(Q_{R+1}-Q_R)\mathbf w\right], \tag{E.14}$$

which is the initial condition for further evolution in $(0,q_{(1)})$. From the mathematical viewpoint, the problem of existence of the above expression needs to be clarified for the specific $\Phi$ in play. It typically occurs that a diagonal element of $Q_{R+1}$ is known to vanish, but for other $r$'s the same diagonal element is positive. In general, $Q_{R+1}-Q_R$ is not necessarily a positive-definite matrix. However, given the fact that Eq. (E.14) at $\mathbf y=0$ is the RS free energy (where $q_{(1)}$ is replaced by the RS value of $q$), on physical grounds we surmise that the divergence of the integral is a rare threat. In the present case there are $\tfrac12K(K+1)$ OPFs, namely, $x(q)$ and $q^{\mu\nu}(q)$, $(\mu,\nu)\ne(1,1)$ and $\mu,\nu\le K$. Expectation values

$$[\![A(\{x_a^\mu\},\{y_a^\mu\})]\!] \tag{E.15}$$

we conveniently define by inserting the function $A$ in the integrand of (E.1) and omitting the $\frac1n\ln$ from in front of the formula. The $n\to0$ limit is understood. As in one spatial dimension, the GF $G_\varphi(q_1,\mathbf y_1;q_2,\mathbf y_2)$ for the multidimensional PPDE is a key help in calculating averages of common occurrence. The GF is zero for $q_1>q_2$ and satisfies the PDE

$$\partial_{q_1}G_\varphi=-\frac12\,\nabla_{y_1}\dot Q\,\nabla_{y_1}G_\varphi-x(q_1)\,(\nabla_{y_1}\varphi(q_1,\mathbf y_1))\,\dot Q\,\nabla_{y_1}G_\varphi-\delta(q_1-q_2)\,\delta^K(\mathbf y_1-\mathbf y_2). \tag{E.16}$$

Special significance is attached to

$$P(q,\mathbf y)=G_\varphi(0,\mathbf 0;q,\mathbf y), \tag{E.17}$$

a natural generalization of the $K=1$ field. Let us introduce the derivative fields

$$\mu^\mu(q,\mathbf y)=\partial_{y^\mu}\varphi(q,\mathbf y), \tag{E.18a}$$

$$\kappa^{\mu\nu}(q,\mathbf y)=\partial_{y^\mu}\partial_{y^\nu}\varphi(q,\mathbf y). \tag{E.18b}$$

Then we can write the two-replica correlator

$$n\,\frac{\partial\varphi[\Phi(\mathbf y),\mathsf Q]}{\partial q^{\mu\nu}_{ab}}=-[\![x_a^\mu x_b^\nu]\!]\equiv C_x^{\mu\nu}(q_{r(a,b)}) \tag{E.19}$$

as

$$C_x^{\mu\nu}(q)=\int d^Ky\,P(q,\mathbf y)\left[\mu^\mu(q,\mathbf y)\,\mu^\nu(q,\mathbf y)+\theta(q-1^{-})\,\kappa^{\mu\nu}(q,\mathbf y)\right]. \tag{E.20}$$

By use of this formula the stationarity conditions for a free energy that contains a term like (E.1) can be immediately constructed.
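The Gaussian step in the multidimensional recursion of this appendix relies on the decomposition of $Q_r-Q_{r-1}$ into eigenvectors and eigenvalues. A minimal sketch of that step follows, under the assumption that $O_r$ carries the eigenvectors as rows, so that $(\Lambda_r^{1/2}O_r)^{T}(\Lambda_r^{1/2}O_r)=Q_r-Q_{r-1}$. The helper name `gaussian_step_matrix` and the sample matrices are hypothetical, chosen only to illustrate the algebra.

```python
import numpy as np


def gaussian_step_matrix(Q_r, Q_prev):
    """Return M = Lambda^{1/2} O for the symmetric difference D = Q_r - Q_prev.

    A K-vector z of independent unit Gaussians then gives a shift z @ M whose
    covariance matrix is D whenever D is positive semi-definite.
    """
    D = np.asarray(Q_r, dtype=float) - np.asarray(Q_prev, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(D)           # D = eigvecs @ diag(eigvals) @ eigvecs.T
    O = eigvecs.T                                  # rows of O are the eigenvectors
    sqrt_lam = np.sqrt(eigvals.astype(complex))    # imaginary entries for negative eigenvalues
    return sqrt_lam[:, None] * O                   # row-wise scaling: Lambda^{1/2} @ O


M = gaussian_step_matrix([[1.0, 0.3], [0.3, 0.8]], [[0.5, 0.1], [0.1, 0.6]])
print(np.real_if_close(M.T @ M))   # should reproduce the difference matrix
```

The plain transpose (not the conjugate transpose) is used in the check on purpose: with imaginary square roots for negative eigenvalues, $M^{T}M$ still reproduces the difference matrix, which mirrors the remark that the sign of the square root is irrelevant.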
Appendix F. An identity between Green functions

In this appendix we show the identity (4.87). The r.h.s. of

$$\partial_qC_{\varphi\varphi\varphi}(q;\{q_i,y_i\})=\int dy\,\Big([\partial_qG_\varphi(q_1,y_1;q,y)]\,G_\varphi(q,y;q_2,y_2)\,G_\varphi(q,y;q_3,y_3)+G_\varphi(q_1,y_1;q,y)\,[\partial_qG_\varphi(q,y;q_2,y_2)]\,G_\varphi(q,y;q_3,y_3)+G_\varphi(q_1,y_1;q,y)\,G_\varphi(q,y;q_2,y_2)\,[\partial_qG_\varphi(q,y;q_3,y_3)]\Big) \tag{F.1}$$

can be expressed by making use of the PDEs for the participating GFs. From (4.77) we have

$$\partial_qG_\varphi(q_1,y_1;q,y)=\tfrac12\,\partial_y^2G_\varphi(q_1,y_1;q,y)-x(q)\,\partial_y[\mu(q,y)\,G_\varphi(q_1,y_1;q,y)]+\delta(q-q_1)\,\delta(y-y_1), \tag{F.2}$$

and for $i=2,3$ (4.76) holds as

$$\partial_qG_\varphi(q,y;q_i,y_i)=-\tfrac12\,\partial_y^2G_\varphi(q,y;q_i,y_i)-x(q)\,\mu(q,y)\,\partial_yG_\varphi(q,y;q_i,y_i)-\delta(q-q_i)\,\delta(y-y_i). \tag{F.3}$$

Let us substitute the right-hand sides of the above PDEs into (F.1). The sum of the terms linear in $x(q)$ turns out to be a derivative with respect to $y$, so, under the plausible condition that the GFs decay for large $|y|$, integration over $y$ gives zero. The second derivatives in $y$ also cancel after partial integration, except for a remnant, which yields

$$\partial_qC_{\varphi\varphi\varphi}(q;\{q_i,y_i\})=\int dy\,G_\varphi(q_1,y_1;q,y)\,[\partial_yG_\varphi(q,y;q_2,y_2)]\,[\partial_yG_\varphi(q,y;q_3,y_3)]+\delta(q-q_1)\,G_\varphi(q_1,y_1;q_2,y_2)\,G_\varphi(q_1,y_1;q_3,y_3)-\delta(q-q_2)\,G_\varphi(q_1,y_1;q_2,y_2)\,G_\varphi(q_2,y_2;q_3,y_3)-\delta(q-q_3)\,G_\varphi(q_1,y_1;q_3,y_3)\,G_\varphi(q_3,y_3;q_2,y_2). \tag{F.4}$$

Eq. (4.75) relates derivatives of $G_\varphi$ and $G_\mu$, whence we obtain (4.87) for $q_1<q<q_2$ and $q_1<q<q_3$.

Appendix G. PDEs for high temperature

Here we record the calculation leading to the lowest-order non-trivial correction for the distribution of local stabilities at high temperatures. Assuming $P(q,y)=P_0(q,y)+\beta P_1(q,y)+O(\beta^2)$ and expanding the SPDE we obtain

$$\partial_qP_0=\tfrac12\,\partial_y^2P_0,\qquad P_0(0,y)=\delta(y), \tag{G.1a}$$

$$\partial_qP_1=\tfrac12\,\partial_y^2P_1+x\,\partial_y(P_0m_0),\qquad P_1(0,y)\equiv0. \tag{G.1b}$$
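The zeroth-order equation of the high-temperature expansion is a pure diffusion equation started from a delta function, so its solution is the heat kernel $P_0(q,y)=e^{-y^2/2q}/\sqrt{2\pi q}$. The sketch below checks this by finite differences, assuming the diffusion coefficient $\tfrac12$ (the fraction is not legible in every copy of the formula). The function names are hypothetical.

```python
import numpy as np


def heat_kernel(q, y):
    """Candidate solution P_0(q, y): Gaussian of variance q, a delta function as q -> 0."""
    return np.exp(-y**2 / (2.0 * q)) / np.sqrt(2.0 * np.pi * q)


def diffusion_residual(q, y, h=1e-3):
    """Central-difference estimate of d_q P_0 - (1/2) d_y^2 P_0."""
    d_q = (heat_kernel(q + h, y) - heat_kernel(q - h, y)) / (2.0 * h)
    d_yy = (heat_kernel(q, y + h) - 2.0 * heat_kernel(q, y) + heat_kernel(q, y - h)) / h**2
    return d_q - 0.5 * d_yy


print(diffusion_residual(0.5, 0.3))   # should be small compared with P_0 itself
```

The same finite-difference pattern could be reused to test a candidate $P_1$ once the source term $x\,\partial_y(P_0m_0)$ is known explicitly.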
Here $m_0(q,y)$ is the lowest-order approximation for the field $m(q,y)$ in (7.5); thus it satisfies (7.6) with $\beta=0$, i.e., it evolves according to pure diffusion. Using its initial condition $m_0(1,y)=$

a nuclear binding of less than 10 MeV. The radius of the disk of our galaxy is 30 kpc. A kiloparsec is about 3300 light years.
Fig. 1. Three phases in the evolutionary track of pulsars are (1) from high magnetic field and moderate rotation period to long period in about 10}10 yr, (2) to accreting X-ray neutron stars (radio silent), (3) to millisecond pulsars with low magnetic fields.

field strength, they fade from the radio sky. The concentration of canonical pulsars at longer period simply reflects the rate at which angular momentum is lost, which falls off as the rotational period increases ($dP/dt\propto1/P$ for magnetic dipole radiation). A neutron star that has disappeared from the radio sky may reappear after an indefinite time, first as an X-ray neutron star, its surface heated by accretion of matter from a less dense companion. The companion may have been acquired in the long, indefinite silent period, or the progenitor of the neutron star may have already had a low-mass companion. Angular momentum conservation assures the accretion-driven spin-up of the neutron star. This phase, the missing link between canonical and millisecond pulsars, has been discovered [4,5]. When the combination of higher frequency and a magnetic field, now much smaller either because of ohmic decay or partial destruction by the accreted material, regains a critical value, the neutron star reappears as a millisecond pulsar. Because of the significant centrifugal force in millisecond pulsars, their internal structure will change as angular momentum is radiated, though on a very long timescale [6,7]. The three stages in the evolution of neutron stars are illustrated in Fig. 1. Of course, the last two stages need not be realized by a particular neutron star. Because canonical and millisecond pulsars are the only neutron stars that have been observed, or are likely ever to be observed, our discussion is centered on them rather than the transient proto-star stage. The intermediate-stage X-ray neutron stars may eventually provide crucial information on the mass-radius relationship for neutron stars, but it is too early to say for sure [8,9]. Certainly they provide the missing link between the canonical pulsars and millisecond pulsars.
About half of all stars have companions.
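The spin-down law quoted above for magnetic dipole radiation, $dP/dt\propto1/P$, integrates to $P(t)^2=P_0^2+2kt$ with $k=P\,\dot P$ constant, and leads to the usual characteristic age $P/(2\dot P)$ when the initial period is negligible. A minimal sketch follows; the numerical values are illustrative assumptions (roughly those of a young, Crab-like pulsar), not data from this article, and the function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 3.156e7


def characteristic_age(period_s, period_derivative):
    """Age P / (2 Pdot) for dipole braking, assuming the initial period is negligible."""
    return period_s / (2.0 * period_derivative)


def period_at(t_s, initial_period_s, k):
    """Solution of P dP/dt = k:  P(t) = sqrt(P_0^2 + 2 k t)."""
    return math.sqrt(initial_period_s**2 + 2.0 * k * t_s)


P, Pdot = 0.033, 4.2e-13                      # assumed illustrative values (s, s/s)
age_s = characteristic_age(P, Pdot)
print(age_s / SECONDS_PER_YEAR)               # of order a thousand years for these inputs
```

Because $dP/dt$ falls as $P$ grows, a pulsar spends most of its radio lifetime at long periods, which is the crowding at longer period described in the text.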
The much weaker magnetic field of millisecond pulsars compared to canonical pulsars produces a much smaller torque. Consequently, millisecond pulsar rotation changes very slowly. However, though the time rate of change of the properties of these stars is much too slow to observe over a human lifetime, or even many lifetimes, the rate of change with respect to rotational frequency can be large and produce highly anomalous values of certain observables related to the spin characteristics [6,7].
2. Superdense matter and its phases

The properties and phenomena associated with dense matter have fascinated scientists since the time white dwarfs were first discovered. It was difficult even for Eddington to understand how objects that were orders of magnitude denser than earth could be formed and cool, or even of what they might consist, until the wedding of special relativity and quantum mechanics provided the understanding of degenerate Fermi systems [10–12]. Gravity crushes atoms in white dwarfs until they are ionized, and the degenerate electrons that occupy the interstices between nuclei provide the pressure that stabilizes these stars. An understanding at the level of Fermi degeneracy was available to Baade and Zwicky [13], who correctly asserted that the binding energy of closely packed nucleons could power the explosion of stars when the core collapsed at the endpoint of exothermic fusion reactions. Fermi energy of nucleons, assisted by the repulsion of nuclear forces at short distance (presumably due in part or whole to the exclusion principle among their quark constituents), provided an easy understanding of the possible existence of neutron stars. The discovery of rapidly spinning pulsars provided the evidence of the existence of superdense matter. The average density can be inferred by balancing the centrifugal and gravitational forces at the stellar surface. The fastest rotator provides a lower limit on the average density of the order of nuclear density. The inner density is much higher than the average, just as for the atmosphere on our planet. Since then, great strides have been made in theoretical explorations of the possible states of superdense matter, but little in the way of concrete evidence exists. On the one hand, the astronomer is limited to discovering and making measurements on the spectacles that are presented.
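The density bound mentioned above follows from requiring that gravity at the equator at least supplies the centripetal acceleration, $\Omega^2R\le GM/R^2$, which gives $\bar\rho=3M/(4\pi R^3)\ge3\Omega^2/(4\pi G)$ in this Newtonian estimate. The sketch below evaluates the bound; the rotation period of 1.56 ms is an assumed illustrative value for a fast millisecond pulsar, and the function name is hypothetical.

```python
import math

G_NEWTON = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2


def minimum_mean_density(period_s):
    """Newtonian lower bound on the mean density (kg/m^3) of a star that
    holds together at its equator while rotating with the given period."""
    omega = 2.0 * math.pi / period_s
    return 3.0 * omega**2 / (4.0 * math.pi * G_NEWTON)


rho_min = minimum_mean_density(1.56e-3)        # assumed period in seconds
print(rho_min, "kg/m^3 =", rho_min * 1e-3, "g/cm^3")
```

For comparison, nuclear saturation density is roughly $2.5\times10^{14}$ g/cm$^3$, so a bound of this size is within an order of magnitude of it; general-relativistic corrections and the oblate shape of a rotating star are ignored here.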
While this is a limitation, one must admire the fantastic expansion of our understanding of the universe in the last decade occasioned by the rapid development of new technologies, satellite-borne observational apparatuses and the electronic computer. On the other hand, the laboratory experimenter can control and repeat observations within limits, but as concerns superdense matter, the handicaps are also great, not least of which are the extremely short lifetime of the dense domain produced in nuclear collisions and the enormous multiplicity of produced particles, which can be observed only after the dense domain has been blown apart. In exploring the possible nature of dense matter, the theorist is limited only by the laws of nature, his imagination, and very little in the way of observational constraints. In this paper we shall discuss the properties and especially phases that may exist in superdense matter, being especially attentive to possible verifiable consequences of our exploration. The properties of superdense matter that are inferred from theory are quite remarkable; among them is the formation of crystalline structures in the mixed phase of nuclear matter and any of its high-density phases. The
crystal structure consists of the rarer of the two phases occupying crystal lattice sites in the background of the dominant phase. Such a possibility was not even imagined a decade ago [14,15]. A very promising signal of a high-density phase transition in pulsars was recently conceived [6,7]. The consequences of crystalline structure may be more subtle; however, the structure must surely affect all transport properties [15]. Indeed, recent calculations confirm a large effect of geometric structure on neutrino transport [16]. In addition, one can anticipate an effect on pulsar glitches, which are small, sudden, irregular, and unpredictable changes in rotation frequency followed by a recovery; there is great individuality in the glitch behavior of individual pulsars, as would be expected because of large changes in crystalline structure for small differences in mass [17]. So, at present, theoretical investigation goes beyond what is verifiable; but that is likely to change. Because of the potential that such a novel crystalline structure can produce observable signals, the purpose of the paper is to review the theoretical basis on which it rests. The structures should appear in any dense nuclear medium so long as (1) electric charge is one of the conserved quantities carried by the matter, (2) the matter under consideration is isospin or charge asymmetric, and (3) the dense state endures long enough to allow relaxation into the lowest energy state of the dense matter [14,15]. Neutron stars satisfy all three conditions, but the dense matter produced in nuclear collisions almost certainly does not have the time to relax. The discussion will therefore be centered on neutron stars and in particular on the quark deconfinement and the kaon condensate phase transitions. At nuclear density and at densities somewhat higher, neutron star matter consists of a charge-neutral mixture of neutrons, protons and leptons.
At moderately higher density than nuclear, some nucleons, driven by the Pauli principle, will likely be converted to hyperons. We refer to the phase containing leptons, nucleons or other members of the baryon octet as the normal phase. At higher densities, achieved either as a protoneutron star cools through neutrino diffusion and shrinks [18–20] or as a centrifugally deformed rotating neutron star loses angular momentum to magnetic dipole radiation and shrinks [6], transitions to other phases may occur. High-density phases include, besides further hyperonization, the Bose condensed (either pion or kaon) and quark deconfined phases. Each of these transitions may be of first or second order. If of first order, a 'mixed' phase consisting of spatially distinct regions of the pure phases in phase equilibrium will occur at densities intermediate between the phase of normal neutron star matter and the high-density phase. The mixed phase will extend over a finite radial region of the star, and the proportion of the normal and high-density phases will vary according to depth (pressure) in the star. The mixed phase will consist of an intricate spatial pattern of the rare phase that occupies Coulomb lattice sites immersed in the dominant phase. This was a surprising finding that pertains to the characteristics of any first-order phase transition in a substance having more than one conserved quantity (or independent component) of which one is the electric charge [14,15]. Beta-stable neutron star matter is such a substance.
The Coulomb force overwhelms gravity at the surface of a star when the ratio of net positive charge to baryon number exceeds $10^{-36}$. The limit is reduced by $m_e/m$ for negative charge.
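The charge limit in this footnote can be estimated by comparing the Coulomb repulsion and the gravitational attraction acting on a single test particle at the stellar surface. Both forces scale as $1/R^2$, so the radius drops out and the critical ratio of net charge to baryon number is $Gm_pm_{\rm test}/(k_ee^2)$. The sketch below is an order-of-magnitude illustration that assumes a star of mass $A\,m_p$ with net charge $Ze$ and ignores binding and relativistic effects; the function name is hypothetical.

```python
G_NEWTON = 6.674e-11        # m^3 kg^-1 s^-2
COULOMB_K = 8.988e9         # N m^2 C^-2
E_CHARGE = 1.602e-19        # C
M_PROTON = 1.673e-27        # kg
M_ELECTRON = 9.109e-31      # kg


def critical_charge_per_baryon(test_mass_kg):
    """Net charge per baryon, Z/A, above which a unit-charge test particle of the
    given mass is expelled:  k e^2 Z / R^2 > G (A m_p) m_test / R^2."""
    return G_NEWTON * M_PROTON * test_mass_kg / (COULOMB_K * E_CHARGE**2)


print(critical_charge_per_baryon(M_PROTON))     # net positive charge, proton expelled
print(critical_charge_per_baryon(M_ELECTRON))   # net negative charge, electron expelled
```

The second value is smaller than the first by the electron-to-proton mass ratio, in line with the footnote's statement about negative charge.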
2.1. Hyperonized matter

Hyperonized matter refers to a phase in which, at moderate density above normal nuclear density, some of the baryon charge is carried by hyperons of either sign of electric charge or neutral. Such a phase in high-density baryonic matter is an almost inevitable consequence of the Pauli principle, which will distribute baryon charge over several species so as to lower the baryon chemical potential and therefore the energy [21,22]. While the protoneutron star is still hot, such flavor-changing reactions as

$$N+N\to N+\Lambda+K \tag{1}$$

are possible. The associated kaon is free to decay unless it is driven by a phase transition to a condensed state (which we discuss later). Depending on the combination of protons and neutrons that are involved in the above reaction, all three charge states may be formed. Some of their decays are

$$K^0\to2\gamma,\qquad K^-\to\mu^-+\nu,\qquad \mu^-+K^+\to\mu^-+\mu^++\nu\to2\gamma+\nu. \tag{2}$$
The star cools by diffusion of the neutrinos and photons to the surface, where they escape. The star thus loses energy and the reactions above become irreversible; strangeness is thus locked in [21]. When the star has cooled to a point where the associated kaon cannot be produced in reactions like (1), the final state of equilibrium can be reached by weak flavor-changing reactions such as illustrated in Fig. 2. In all of the reactions cited, a neutrino is produced, and consequently, the reactions are Pauli blocked during the first 20 s in the life of the neutron star until the neutrinos have diffused to the surface and escaped. There are also neutrinoless flavor-changing weak interactions. The timescale for the normal weak interaction is 10\ s. The neutrinoless reactions are slower with a timescale of 10\ s. Obviously, the weak interactions facilitate hyperonization when the strong interaction cannot. The hyperonization phase transition is likely of second order, though in principle it could be of first order. (A relevant way of thinking of a second-order transition is that the difference of the phases is one of degree rather than of substance. Thus if the concentration of hyperons increases continuously from zero as the baryon density is increased above a critical value, the transition is of second or higher order. However if the concentration is
Fig. 2. Illustration of a weak flavor-changing reaction at the quark level. The three horizontal lines denote the three valence quarks in a baryon. The lepton symbol denotes either an electron or a muon.
discontinuous, being zero in the 'normal' phase and finite in the hyperonized phase, never tending to zero as the critical density is approached from above, the transition is of first order.) Hyperonization has several important effects in neutron stars: (1) It reduces the maximum possible mass of neutron stars, compared to models in which hyperons are absent, by as much as $\tfrac34M_\odot$ [21,23]. (2) It affects the cooling rate of neutron stars [24,25]. (3) It provides a mechanism by which protoneutron stars of mass somewhat above the limiting mass of the fully equilibrated stars may produce a supernova and then promptly subside into a low-mass ($\sim1.5$–$2M_\odot$) black hole in about 20 s [18–20] (see Ref. [26] for a discussion of the role of neutrino trapping). In fact the neutron star produced in the 1987A event may have disappeared promptly into a black hole [27].

2.2. Bose condensation

Bose condensation could be of either order, depending on the interactions, but is possibly preempted by hyperonization or by quark deconfinement [21,22]. The reason for this, as explained in detail elsewhere [21], is that the condensation of $\pi^-$ or $K^-$ is favorable only if the electron Fermi energy exceeds the effective mass in the medium of either meson. In such an event, bosons become the energetically favored agent for neutralization of protons. However, at the higher baryon densities where this might otherwise be the case, baryons of both charges appear. Since the baryon number of a star is conserved, but not the lepton number (because of neutrino diffusion out of the star), charge neutrality can be achieved among baryons only, with little or no need for charged leptons or mesons [21,22]. (This is easily understood by remarking that the energetic cost of baryons must be paid because of their conservation, but lepton Fermi energy and boson masses need not be paid when neutrality can be achieved among baryons.)
However, the question of whether kaons are likely to condense or not has not been fully explored so far for several reasons: (1) The phase transition with conservation laws properly enforced (as discussed below) is very difficult to implement when the full "botany" of baryon species is included. (2) The coupling constants of hyperons of the baryon octet are not well known. In fact the best that can be done is a constraint on the $\Lambda$ coupling [23], and the assumption that all other hyperons couple similarly, or by quark counting rules.

2.3. Quark deconfinement

The property of asymptotic freedom of quarks assures that at some sufficiently high density, the deconfinement phase transition will occur irrespective of whether either of the other phase transitions has occurred at lower density [28]. It is not known whether deconfinement is of first or second order in cold, baryon-rich hadronic matter. Lattice QCD so far has not been simulated with dynamical quarks. Models of the phase transition in which the deconfined phase is represented by the 'Bag' model [29] or any of its variations are first order [30]. Both quark deconfinement and Bose condensation can produce the same effects for neutron stars as cited above for hyperonization. The deconfinement phase transition in neutron stars was first discussed more than twenty years ago [31–36]. However, the theory was reexamined and new insights into first-order phase transitions in any complex substance were achieved, including the formation of a mixed phase with crystalline structure, as mentioned above [14,15]. In all of the earlier work, either beta equilibrium
was ignored or else charge neutrality was imposed as a local constraint on the mixed phase. It is evident that either constraint may prevent the model star from attaining its lowest-energy state. What is not necessarily obvious is that both constraints cause a first-order phase transition to be of the constant-pressure type, like the vapor–liquid transition in water [14,15]. A mixed phase of constant pressure independent of the proportion of the phases is excluded in the monotonically varying pressure environment of a gravitating body. In the second case, in which charge neutrality is imposed as a local constraint, the electron chemical potential is discontinuous between the two phases and so cannot satisfy Gibbs conditions for equilibrium. The implications for stellar structure in both cases were therefore incorrect. The implications were substantial: a neutron star composed of the two phases would have a large density discontinuity at the radial point that corresponds to the constant phase transition pressure. Pure quark matter would occupy the region interior to the discontinuity, with pure confined hadronic matter surrounding it. The mixed phase would be entirely absent in this incorrect treatment of phase equilibrium. However, when a phase transition in beta-stable neutron star matter is treated so as to respect Gibbs criteria for equilibrium, the pressure is not constant but rather is a monotonically varying function of the proportion of the two phases in equilibrium [14,15]. The density discontinuity disappears and instead, a region of mixed phase occupies a layer of possibly several kilometers in thickness between the pure phases. As was pointed out, it is not possible to simultaneously satisfy Gibbs criteria and locally imposed conservation laws (charge density vanishing identically at every point $r$) in systems containing several conserved quantities. Gibbs conditions for phase equilibrium and conservation laws can be satisfied simultaneously only when the conservation laws are imposed in a global sense ( (r) d