Random Graphs The book is devoted to the study of classical combinatorial structures, such as random graphs, permutations, and systems of random linear equations in finite fields.
The author shows how the application of the generalized scheme of allocation in the study of random graphs and permutations reduces the combinatorial problems to classical problems of probability theory on the summation of independent random variables. He concentrates on recent research by Russian mathematicians, including a discussion of equations containing an unknown permutation. This is the first English-language presentation of techniques for analyzing systems of random linear equations in finite fields. These results will interest specialists in combinatorics and probability theory
and will also be useful in applied areas of probabilistic combinatorics, such as communication theory, cryptology, and mathematical genetics. V. F. Kolchin is a leading researcher at the Steklov Institute and a professor at the Moscow Institute of Electronics and Mathematics (MIEM). He has written four books and many papers in the area of probabilistic combinatorics. His papers have been published mainly in the Russian journals Theory of Probability and Its Applications, Mathematical Notes, and Discrete Mathematics, and in the international journal Random Structures and Algorithms.
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
EDITED BY G.-C. ROTA
Editorial Board R. Doran, M. Ismail, T.-Y. Lam, E. Lutwak Volume 53
6 H. Minc Permanents
18 H. O. Fattorini The Cauchy Problem
19 G. G. Lorentz, K. Jetter, and S. D. Riemenschneider Birkhoff Interpolation
22 J. R. Bastida Field Extensions and Galois Theory
23 J. R. Cannon The One-Dimensional Heat Equation
24 S. Wagon The Banach-Tarski Paradox
25 A. Salomaa Computation and Automata
26 N. White (ed.) Theory of Matroids
27 N. H. Bingham, C. M. Goldie, and J. L. Teugels Regular Variation
28 P. P. Petrushev and V. A. Popov Rational Approximation of Real Functions
29 N. White (ed.) Combinatorial Geometries
30 M. Pohst and H. Zassenhaus Algorithmic Algebraic Number Theory
31 J. Aczel and J. Dhombres Functional Equations in Several Variables
32 M. Kuczma, B. Choczewski, and R. Ger Iterative Functional Equations
33 R. V. Ambartzumian Factorization Calculus and Geometric Probability
34 G. Gripenberg, S.-O. Londen, and O. Staffans Volterra Integral and Functional Equations
35 G. Gasper and M. Rahman Basic Hypergeometric Series
36 E. Torgersen Comparison of Statistical Experiments
37 A. Neumaier Interval Methods for Systems of Equations
38 N. Korneichuk Exact Constants in Approximation Theory
39 R. A. Brualdi and H. J. Ryser Combinatorial Matrix Theory
40 N. White (ed.) Matroid Applications
41 S. Sakai Operator Algebras in Dynamical Systems
42 W. Hodges Basic Model Theory
43 H. Stahl and V. Totik General Orthogonal Polynomials
44 R. Schneider Convex Bodies
45 G. Da Prato and J. Zabczyk Stochastic Equations in Infinite Dimensions
46 A. Björner, M. Las Vergnas, B. Sturmfels, N. White, and G. Ziegler Oriented Matroids
47 G. A. Edgar and L. Sucheston Stopping Times and Directed Processes
48 C. Sims Computation with Finitely Presented Groups
49 T. Palmer Banach Algebras and the General Theory of *-Algebras
50 F. Borceux Handbook of Categorical Algebra I
51 F. Borceux Handbook of Categorical Algebra II
52 F. Borceux Handbook of Categorical Algebra III
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
Random Graphs V. F. KOLCHIN Steklov Mathematical Institute, Moscow
CAMBRIDGE UNIVERSITY PRESS Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, Sao Paulo
Cambridge University Press The Edinburgh Building, Cambridge CB2 8RU, UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521440813
© Cambridge University Press 1999
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 1999
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data Kolchin, V. F. (Valentin Fedorovich) Random graphs / V. F. Kolchin p cm. - (Encyclopedia of mathematics and its applications; v. 53) Includes bibliographical references and index. ISBN 0 521 44081 5 hardback 1. Random graphs. I. Title. II. Series. QA166.17.K65 1999 98-24390 511'.5 - dc20 CIP ISBN 978-0-521-44081-3 hardback
Transferred to digital printing 2007
CONTENTS
Preface
1 The generalized scheme of allocation and the components of random graphs
  1.1 The probabilistic approach to enumerative combinatorial problems
  1.2 The generalized scheme of allocation
  1.3 Connectivity of graphs and the generalized scheme
  1.4 Forests of nonrooted trees
  1.5 Trees of given sizes in a random forest
  1.6 Maximum size of trees in a random forest
  1.7 Graphs with unicyclic components
  1.8 Graphs with components of two types
  1.9 Notes and references
2 Evolution of random graphs
  2.1 Subcritical graphs
  2.2 Critical graphs
  2.3 Random graphs with independent edges
  2.4 Nonequiprobable graphs
  2.5 Notes and references
3 Systems of random linear equations in GF(2)
  3.1 Rank of a matrix and critical sets
  3.2 Matrices with independent elements
  3.3 Rank of sparse matrices
  3.4 Cycles and consistency of systems of random equations
  3.5 Hypercycles and consistency of systems of random equations
  3.6 Reconstructing the true solution
  3.7 Notes and references
4 Random permutations
  4.1 Random permutations and the generalized scheme of allocation
  4.2 The number of cycles
  4.3 Permutations with restrictions on cycle lengths
  4.4 Notes and references
5 Equations containing an unknown permutation
  5.1 A quadratic equation
  5.2 Equations of prime degree
  5.3 Equations of compound degree
  5.4 Notes and references
Bibliography
Index
PREFACE
Combinatorics played an important role in the development of probability theory
and the two have continued to be closely related. Now probability theory, by offering new approaches to problems of discrete mathematics, is beginning to repay its debt to combinatorics. Among these new approaches, the methods of asymptotic analysis, which have been well developed in probability theory, can be used to solve certain complicated combinatorial problems. If the uniform distribution is defined on the set of combinatorial structures in question, then the numerical characteristics of the structures can be regarded as random variables and analyzed by probabilistic methods. By using the probabilistic approach, we restrict our attention to "typical" structures that constitute the bulk of the set, excluding the small fraction with exceptional properties. The probabilistic approach that is now widely used in combinatorics was first
formulated by V. L. Goncharov, who applied it to $S_n$, the set of all permutations of degree $n$, and to the runs in random (0,1)-sequences. S. N. Bernstein, N. V. Smirnov, and V. E. Stepanov were among those who developed probabilistic combinatorics in Russia, building on the famous Russian school of probability founded by A. A. Markov, P. L. Lyapunov, A. Ya. Khinchin, and A. N. Kolmogorov.
This book is based on results obtained primarily by Russian mathematicians and presents results on random graphs, systems of random linear equations in GF(2), random permutations, and some simple equations involving permutations. Selecting material for the book was a difficult job. Of course, this book is not a complete treatment of the topics mentioned. Some results (and their proofs) did not seem ready for inclusion in a book, and there may be relevant results that have escaped the author's attention.
There is a large body of literature on random graphs, and it is not possible to review it here. Among the probabilistic tools that have been used to analyze random structures are the method of moments, Poisson and Gaussian approximations, generating functions using the saddle-point method, Tauberian-type theorems, analysis of singularities, and martingale theory.
In the past two decades, a method called the generalized scheme of allocation has been widely used in probabilistic combinatorics. It is so named because of its connection with the problem of assigning $n$ objects randomly to $N$ cells. Let $\eta_1, \ldots, \eta_N$ be random variables that are, for example, the sizes of components of a graph. If there are independent random variables $\xi_1, \ldots, \xi_N$ so that the joint distribution of $\eta_1, \ldots, \eta_N$ for any integers $k_1, \ldots, k_N$ can be written as
$$P\{\eta_1 = k_1, \ldots, \eta_N = k_N\} = P\{\xi_1 = k_1, \ldots, \xi_N = k_N \mid \xi_1 + \cdots + \xi_N = n\},$$
where $n$ is a positive integer, then we say that $\eta_1, \ldots, \eta_N$ satisfy the generalized scheme of allocation with parameters $n$ and $N$ and independent random variables $\xi_1, \ldots, \xi_N$.
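As a small illustration of this definition, the following Python sketch checks a standard special case that is assumed here rather than taken from the text: when $n$ objects are dropped independently and uniformly into $N$ cells, the cell sizes have the multinomial distribution, and this distribution coincides with that of independent Poisson variables conditioned on their sum being $n$. The function names and the Poisson parameter `lam` are hypothetical choices made for the example.

```python
from itertools import product
from math import exp, factorial


def poisson_pmf(k, lam):
    """P{xi = k} for a Poisson random variable with parameter lam."""
    return lam**k * exp(-lam) / factorial(k)


def conditional_prob(ks, lam):
    """P{xi_1=k_1,...,xi_N=k_N | xi_1+...+xi_N = n} for i.i.d. Poisson(lam)."""
    n, N = sum(ks), len(ks)
    joint = 1.0
    for k in ks:
        joint *= poisson_pmf(k, lam)
    # The sum of N independent Poisson(lam) variables is Poisson(N * lam).
    return joint / poisson_pmf(n, N * lam)


def multinomial_prob(ks):
    """P{eta_1=k_1,...,eta_N=k_N} when n = sum(ks) objects fall uniformly into N cells."""
    n, N = sum(ks), len(ks)
    coef = factorial(n)
    for k in ks:
        coef //= factorial(k)
    return coef / N**n


def max_discrepancy(n, N, lam):
    """Largest gap between the two probabilities over all (k_1,...,k_N) summing to n."""
    return max(
        abs(conditional_prob(ks, lam) - multinomial_prob(ks))
        for ks in product(range(n + 1), repeat=N)
        if sum(ks) == n
    )
```

The discrepancy is at rounding level for any positive `lam`, which reflects the fact that the conditional distribution does not depend on the Poisson parameter.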
Graph evolution is the random process of sequentially adding new edges to a graph. For many classes of random graphs with $n$ labeled vertices and $T$ edges, the parameter $\theta = 2T/n$ plays a role of time in the process; various graph properties often change abruptly at the critical point $\theta = 1$. Graph evolution is the most fascinating object in the theory of random graphs, and it appears that it is well suited to the generalized scheme. We will show that applying generalized schemes makes it possible to analyze random graphs at different stages of their evolution and to obtain limit distributions in those cases in which only properties similar to the law of large numbers have been proved.
The theory of random equations in finite fields is shared by probability, combinatorics, and algebra. In this book, we will consider systems of linear equations in GF(2) with random coefficients. The matrix of such a system corresponds to a random graph or hypergraph; therefore, results on random graphs help to study these systems. We are sure that this application alone justifies developing the theory of random graphs.
The theory of random permutations is a well-developed branch of probabilistic combinatorics. Although Goncharov has investigated the cycle structure of a random permutation in great detail, there is still great interest in this area. We will fully describe the asymptotic behavior of $P\{\nu_n = k\}$ for the total number $\nu_n$ of cycles in a random permutation for all possible behaviors of the parameters $n$ and $k = k(n)$ as $n \to \infty$. We will also give some of the asymptotic results for the number of solutions of the equation $X^d = e$, where the unknown $X \in S_n$, $d$ is a fixed positive integer, and $e$ is the identity of the group $S_n$.
Although the generalized scheme of allocation cannot be applied to nonequiprobable graphs, we present some results in this situation by using the method of moments. The statistical applications of nonequiprobable graphs call for the development of regular methods of analyzing these structures.
The book consists of five chapters. Chapter 1 describes the generalized scheme of allocation and its applications to a random forest of nonrooted trees, a random graph consisting of unicyclic components, and a random graph with a mixture of trees and unicyclic components. In Chapter 2, these results are applied to the study of the evolution of random graphs. Chapter 3 is devoted to systems of random linear equations in GF(2). Much of this branch of probabilistic combinatorics is the work of Russian mathematicians; this is the first English-language presentation of many of the results. Random permutations are considered in Chapter 4, and Chapter 5 contains some results on permutation equations of the form $X^d = e$.
Most results presented in this book derive from work done over the past fifteen years; notes and references can be found in the last section of each chapter. (It is, of course, impossible to give a complete list in each particular area.) In addition to articles used in the text, the summary sections of all chapters include references to papers on related topics, especially those in which the same results were obtained by other methods.
We assume that the reader is familiar with basic combinatorics. This book should be accessible to those who have completed standard courses of mathematical analysis and probability theory. Section 1.1 includes a list of pertinent results from probability. This book continues in the tradition of Random Mappings [78] and differs from other treatments of random graphs in the systematic use of the generalized scheme of allocation. We hope that the chapter on systems of random linear equations in GF(2) will be of interest to a broad audience. I wish to express my sincere appreciation to G.-C. Rota, who encouraged me to write this book for the Encyclopedia of Mathematics series, even though there are already several excellent books on random graphs. My greatest concern is writing the book in English. I am indebted to the editors who have brought the text to an acceptable form. It is apparent that no amount of editing can erase the heavy Russian accent of my written English, so my special thanks go to those readers who will not be deterred by the language of the book. I greatly appreciate the support I received from my colleagues at the Steklov Mathematical Institute while I wrote this book.
1
The generalized scheme of allocation and the components of random graphs
1.1. The probabilistic approach to enumerative combinatorial problems
The solution to enumerative combinatorial problems consists in finding an exact or approximate expression for the number of combinatorial objects possessing the property under investigation. In this book, the probabilistic approach to enumerative combinatorial problems is adopted.
The fundamental notion of probability theory is the probability space $(\Omega, \mathcal{A}, P)$, where $\Omega$ is a set of arbitrary elements, $\mathcal{A}$ is a set of subsets of $\Omega$ forming a $\sigma$-algebra of events with the operations of union and intersection of sets, and $P$ is a nonnegative countably additive function defined for each event $A \in \mathcal{A}$ so that $P(\Omega) = 1$. The set $\Omega$ is called the space of elementary events and $P$ is a probability.
A random variable is a real-valued measurable function $\xi = \xi(\omega)$ defined for all $\omega \in \Omega$. Suppose $\Omega$ consists of finitely many elements. Then the probability $P$ is defined on all subsets of $\Omega$ if it is defined for each elementary event $\omega \in \Omega$. In this case, any real-valued function $\xi = \xi(\omega)$ on such a space of elementary events is a random variable.
Instead of a real-valued function, one may consider a function $f(\omega)$ taking values from some set $Y$ of arbitrary elements. Such a function $f(\omega)$ may be considered a generalization of a random variable and is called a random element of the set $Y$.
In studying combinatorial objects, we consider probability spaces that have a natural combinatorial interpretation: For the space of elementary events $\Omega$, we take the set of combinatorial objects under investigation and assign the same probability to all the elements of the set. In this case, numerical characteristics of combinatorial objects of $\Omega$ become random variables. The term "random element of the set $\Omega$" is usually used for the identity function $f(\omega) = \omega$, $\omega \in \Omega$, mapping each element of the set of combinatorial objects into itself. Since the uniform distribution is
assumed on $\Omega$, the probability that the identity function $f$ takes any fixed value $\omega$ is the same for all $\omega \in \Omega$. Hence the notion of a random combinatorial object of $\Omega$, such as the identity function $f(\omega) = \omega$, agrees with the usual notion of a random element of a set as an element sampled from all elements of the set with equal probabilities.
Note that a random combinatorial object with the same distribution could also be defined on larger probability spaces. For our purposes, however, the natural construction presented here is sufficient for the most part. The exceptions are those few cases that involve several independent random combinatorial objects and in which it would be necessary to resort to a richer probability space, such as the direct product of the natural probability spaces.
Since we use probability spaces with uniform distributions, in spite of the probabilistic terminology, the problems considered are in essence enumeration problems of combinatorial analysis. The probabilistic approach furnishes a convenient form of representation and helps us effectively use the methods of asymptotic analysis that have been well developed in the theory of probability.
Thus, in the probabilistic approach, numerical characteristics of a random combinatorial object are random variables. The main characteristic of a random variable $\xi$ is its distribution function $F(x)$ defined for any real $x$ as the probability of the event $\{\xi < x\}$, that is,
$$F(x) = P\{\xi < x\}.$$
The distribution function $F(x)$ defines a probability distribution on the real line called the distribution of the random variable $\xi$. With respect to this distribution, given a function $g(x)$, the Lebesgue-Stieltjes integral
$$\int_{-\infty}^{\infty} g(x)\,dF(x)$$
can be defined.
The probabilistic approach has advantages in the asymptotic investigations of combinatorial problems. As a rule, we have a sequence of random variables $\xi_n$, $n = 1, 2, \ldots$, each of which describes a characteristic of the random combinatorial object under consideration, and we are interested in the asymptotic behavior of the distribution functions $F_n(x) = P\{\xi_n < x\}$ as $n \to \infty$.
A sequence of distributions with distribution functions $F_n(x)$ converges weakly to a distribution with the distribution function $F(x)$ if, for any bounded continuous function $g(x)$,
$$\int_{-\infty}^{\infty} g(x)\,dF_n(x) \to \int_{-\infty}^{\infty} g(x)\,dF(x)$$
as $n \to \infty$. The weak convergence of distributions is directly connected with the pointwise convergence of the distribution functions as follows.
Theorem 1.1.1. A sequence of distribution functions $F_n(x)$ converges to a distribution function $F(x)$ at all continuity points if and only if the corresponding sequence of distributions converges weakly to the distribution with distribution function $F(x)$.
In a sense, the distribution, or the distribution function $F(x)$, characterizes the random variable $\xi$. The moments of $\xi$ are simple characteristics. If
$$\int_{-\infty}^{\infty} |x|\,dF(x)$$
exists, then
$$E\xi = \int_{-\infty}^{\infty} x\,dF(x)$$
is called the mathematical expectation, or mean, of the random variable $\xi$. Further,
$$m_r = E\xi^r = \int_{-\infty}^{\infty} x^r\,dF(x)$$
is called the $r$th moment, or the moment of $r$th order (if the integral of $|x|^r$ exists).
In probabilistic combinatorics, one usually considers nonnegative integer-valued random variables. For such a random variable, the factorial moments are natural characteristics. We denote the $r$th factorial moment by
$$m_{(r)} = E\xi(\xi - 1)\cdots(\xi - r + 1).$$
If a distribution function $F(x)$ can be represented in the form
$$F(x) = \int_{-\infty}^{x} p(u)\,du,$$
where $p(u) \ge 0$, then we say that the distribution has a density $p(u)$.
In addition to the distribution function, it is convenient to represent the distribution of an integer-valued random variable by the probabilities of its individual values. For $\xi$, we will use the notation
$$p_k = P\{\xi = k\}, \quad k = 0, 1, \ldots,$$
and for integer-valued nonnegative random variables $\xi_n$,
$$p_k^{(n)} = P\{\xi_n = k\}, \quad k = 0, 1, \ldots.$$
It is clear that
$$E\xi = \sum_{k=0}^{\infty} k p_k,$$
if this series converges. It is not difficult to see that the following assertion is true.
Theorem 1.1.2. A sequence of distributions $\{p_k^{(n)}\}$, $n = 1, 2, \ldots$, converges weakly to a distribution $\{p_k\}$ if and only if for every fixed $k = 1, 2, \ldots$, $p_k^{(n)} \to p_k$ as $n \to \infty$.
If an estimate of the probability $P\{\xi > 0\}$ is needed for a nonnegative integer-valued random variable $\xi$, then the simple inequality
$$P\{\xi > 0\} = \sum_{k=1}^{\infty} P\{\xi = k\} \le \sum_{k=1}^{\infty} k P\{\xi = k\} = E\xi \tag{1.1.1}$$
can be useful. In particular, for a sequence $\xi_n$, $n = 1, 2, \ldots$, of such random variables with $E\xi_n \to 0$ as $n \to \infty$, it follows that $P\{\xi_n > 0\} \to 0$.
Since it is generally easier to calculate the moments of a random variable than the whole distribution, one wants a criterion for the convergence of a sequence of distributions based on the corresponding moments. But, first, it should be noted that even if a random variable has moments of all orders, its distribution cannot, in general, be reconstructed on the basis of these moments, since there exist distinct distributions that have the same sequences of moments. For example, it is not difficult to confirm that for any $n = 1, 2, \ldots$,
$$\int_0^{\infty} x^n e^{-x^{1/4}} \sin x^{1/4}\,dx = 0.$$
Hence, for $-1 < a < 1$, the function
$$p_a(x) = \frac{1}{24}\, e^{-x^{1/4}} \left(1 + a \sin x^{1/4}\right)$$
is the density of a distribution on $[0, \infty)$ whose moments do not depend on $a$.
Thus the distribution functions with moments of all orders are divided into two classes: The first class contains the functions that may be uniquely reconstructed from their moments, and the second class contains the functions that cannot be reconstructed from their moments. There are several sufficient conditions for the moment problem to have a unique solution. Let
$$M_n = \int_{-\infty}^{\infty} |x|^n\,dF(x).$$
A distribution function $F(x)$ is uniquely reconstructed by the sequence $m_r$, $r = 1, 2, \ldots$, of its moments if there exists $A$ such that
$$\frac{1}{n} M_n^{1/n} \le A. \tag{1.1.2}$$
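The example of the densities $p_a(x)$ given above can be checked with a few lines of Python. This is a minimal sketch under an assumption not spelled out in the text: after the substitution $x = t^4$, both integrals reduce to the gamma-type integral $\int_0^\infty t^{s-1} e^{-(1-i)t}\,dt = \Gamma(s)/(1-i)^s$ with $s = 4n + 4$. The function names are hypothetical.

```python
from math import factorial


def sine_moment(n):
    """Integral over [0, inf) of x^n * exp(-x^(1/4)) * sin(x^(1/4)) dx.

    With x = t^4 it equals 4 * Im( Gamma(4n+4) / (1-i)^(4n+4) ).
    """
    return 4 * (factorial(4 * n + 3) / (1 - 1j) ** (4 * n + 4)).imag


def moment(n):
    """n-th moment of the density p_a: (1/24) * 4 * Gamma(4n+4), for every a."""
    return factorial(4 * n + 3) / 6


def growth(n):
    """The quantity (1/n) * M_n^(1/n) from condition (1.1.2)."""
    return moment(n) ** (1.0 / n) / n
```

Since $(1-i)^4 = -4$ is real, `sine_moment(n)` vanishes, so the moments of $p_a$ do not depend on $a$. The values of `growth(n)` increase with $n$, which is consistent with the sufficient condition (1.1.2) not being available for this family.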
The following theorem describing the so-called method of moments is applicable only to the first class of distribution functions.
Theorem 1.1.3. If distribution functions $F_n(x)$, $n = 1, 2, \ldots$, have the moments of all orders and for any fixed $r = 1, 2, \ldots$,
$$m_r^{(n)} = \int_{-\infty}^{\infty} x^r\,dF_n(x) \to m_r, \quad |m_r| < \infty,$$
as $n \to \infty$, then there exists a distribution function $F(x)$ such that for any fixed $r = 1, 2, \ldots$,
$$m_r = \int_{-\infty}^{\infty} x^r\,dF(x),$$
and from the sequence $F_n(x)$, $n = 1, 2, \ldots$, it is possible to select a subsequence $F_{n_k}(x)$, $k = 1, 2, \ldots$, that converges to $F(x)$ as $k \to \infty$ at every continuity point of $F(x)$. If the sequence $m_r$, $r = 1, 2, \ldots$, uniquely determines the distribution function $F(x)$, then $F_n(x) \to F(x)$ as $n \to \infty$ at every continuity point of $F(x)$.
Note that the normal (Gaussian) and Poisson distributions are uniquely reconstructible by their moments.
To use the method of moments, it is necessary to calculate moments of random variables. One useful method of calculating moments of integer-valued random variables is to represent them as sums of random variables that take only the values 0 and 1.
Theorem 1.1.4. If $S_n = \xi_1 + \cdots + \xi_n$, and the random variables $\xi_1, \ldots, \xi_n$ take only the values 0 and 1, then for any $m = 1, 2, \ldots, n$,
$$E\,S_n(S_n - 1)\cdots(S_n - m + 1) = \sum E\,\xi_{i_1}\cdots\xi_{i_m},$$
where the summation is taken over all different ordered sets of different indices $\{i_1, \ldots, i_m\}$, the number of which is equal to $\binom{n}{m} m!$.
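The identity of Theorem 1.1.4 already holds for each outcome before expectations are taken, so it can be verified by brute force for small $n$. The sketch below, with hypothetical function names, enumerates all 0-1 vectors of length $n$ and compares the falling factorial of their sum with the sum of products over ordered tuples of distinct indices; by linearity of expectation the statement then follows for any joint distribution of the indicators.

```python
from itertools import permutations, product
from math import comb, factorial, prod


def falling_factorial(s, m):
    """s * (s - 1) * ... * (s - m + 1)."""
    return prod(s - j for j in range(m))


def ordered_product_sum(xs, m):
    """Sum of xs[i_1] * ... * xs[i_m] over ordered m-tuples of distinct indices."""
    return sum(
        prod(xs[i] for i in idx) for idx in permutations(range(len(xs)), m)
    )


def identity_holds(n, m):
    """Check the pointwise identity for every 0-1 vector of length n."""
    return all(
        falling_factorial(sum(xs), m) == ordered_product_sum(xs, m)
        for xs in product((0, 1), repeat=n)
    )
```

For example, `identity_holds(5, 3)` runs over all 32 vectors and the 60 ordered index triples, in agreement with the count $\binom{5}{3}\,3!$.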
Generating functions also provide a useful tool for solving many problems related to distributions of nonnegative integer-valued random variables. The complex-valued function
$$\phi(z) = \phi_\xi(z) = \sum_{k=0}^{\infty} p_k z^k = E z^\xi \tag{1.1.3}$$
is called the generating function of the distribution of the random variable $\xi$. It is defined at least for $|z| \le 1$. For example, for the Poisson distribution with parameter $\lambda$, which is defined by the probabilities
$$p_k = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, \ldots,$$
the generating function is $e^{\lambda(z-1)}$.
Relation (1.1.3) determines a one-to-one correspondence between the generating functions and the distributions of nonnegative integer-valued random variables, since the distribution can be reconstructed by using the formula
$$p_k = \frac{1}{k!}\,\phi^{(k)}(0), \quad k = 0, 1, \ldots. \tag{1.1.4}$$
Generating functions are especially convenient for the investigation of sums of independent random variables. If $\xi_1, \ldots, \xi_n$ are independent nonnegative integer-valued random variables and $S_n = \xi_1 + \cdots + \xi_n$, then
$$\phi_{S_n}(z) = \phi_{\xi_1}(z) \cdots \phi_{\xi_n}(z).$$
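The product rule for generating functions can be seen on coefficient lists: multiplying two power series convolves their coefficients, which is exactly how the distribution of a sum of independent variables is formed. The following sketch uses two Poisson distributions as an assumed example; the helper names and the truncation length are illustrative choices.

```python
from math import exp, factorial


def poisson_probs(lam, size):
    """First `size` probabilities p_0, ..., p_{size-1} of the Poisson(lam) law."""
    return [lam**k * exp(-lam) / factorial(k) for k in range(size)]


def multiply(p, q):
    """Coefficients of the product of two power series, truncated to the shorter length."""
    size = min(len(p), len(q))
    return [sum(p[i] * q[k - i] for i in range(k + 1)) for k in range(size)]


def evaluate(p, z):
    """Value of the truncated series sum of p_k * z^k."""
    return sum(pk * z**k for k, pk in enumerate(p))
```

Multiplying the coefficient lists for parameters 1.5 and 2.0 reproduces the Poisson probabilities with parameter 3.5, matching $e^{1.5(z-1)} e^{2(z-1)} = e^{3.5(z-1)}$.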
The correspondence between the generating functions and the distributions is continuous in the following sense.
Theorem 1.1.5. Let $\{p_k^{(n)}\}$, $n = 1, 2, \ldots$, be a sequence of distributions. If for any $k = 0, 1, \ldots$,
$$p_k^{(n)} \to p_k$$
as $n \to \infty$, then the sequence of corresponding generating functions $\phi_n(z)$, $n = 1, 2, \ldots$, converges to the generating function of the sequence $\{p_k\}$ uniformly in any circle $|z| \le r < 1$.
In particular, if $\{p_k\}$ is a distribution, then the sequence of corresponding generating functions converges to the generating function $\phi(z)$ of the distribution $\{p_k\}$ uniformly in any circle $|z| \le r < 1$.
Theorem 1.1.6. If the sequence of generating functions $\phi_n(z)$, $n = 1, 2, \ldots$, of the distributions $\{p_k^{(n)}\}$ converges to a generating function $\phi(z)$ of a distribution $\{p_k\}$ on a set $M$ that has a limit point inside of the circle $|z| < 1$, then the distributions $\{p_k^{(n)}\}$ converge weakly to the distribution $\{p_k\}$.
Since a generating function $\phi(z) = \sum_{k=0}^{\infty} p_k z^k$ is analytic, its coefficients can be represented by the Cauchy formula
$$p_n = \frac{1}{2\pi i} \oint_C \frac{\phi(z)}{z^{n+1}}\,dz, \quad n = 0, 1, \ldots,$$
where the integral is over a contour $C$ that lies inside the domain of analyticity of $\phi(z)$ and contains the point $z = 0$.
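The Cauchy formula can be tried numerically. The sketch below is only an illustration under stated assumptions: the contour is taken to be a circle $|z| = r$ with $r = 0.5$, the integral is approximated by the trapezoidal rule in the angle, and the Poisson generating function $e^{\lambda(z-1)}$ with $\lambda = 2$ serves as the test case. On the circle $dz = iz\,d\theta$, so the formula becomes the average of $\phi(z)/z^n$ over the angle.

```python
import cmath
from math import exp, factorial, pi


def coefficient(phi, n, radius=0.5, points=256):
    """Approximate (1/(2*pi*i)) * integral of phi(z)/z^(n+1) over |z| = radius.

    With z = radius * e^(i*theta) the integral equals the mean value of
    phi(z) / z^n over theta, computed here by the trapezoidal rule.
    """
    total = 0j
    for j in range(points):
        z = radius * cmath.exp(2j * pi * j / points)
        total += phi(z) / z**n
    return (total / points).real


def poisson_gf(z, lam=2.0):
    """Generating function e^(lam*(z-1)) of the Poisson(lam) distribution."""
    return cmath.exp(lam * (z - 1))
```

A call such as `coefficient(poisson_gf, 3)` can be compared with the exact probability $2^3 e^{-2}/3!$. For asymptotics in $n$ this direct quadrature is not a substitute for the saddle-point analysis described next.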
Thus, if we are interested in the behavior of $p_n$ as $n \to \infty$, then we have to be able to estimate contour integrals of the form
$$\frac{1}{2\pi i} \int_C g(z) e^{\lambda f(z)}\,dz,$$
where $g(z)$ and $f(z)$ are analytic in the neighborhood of the curve of integration $C$ and $\lambda$ is a real parameter tending to infinity. The saddle-point method is used to estimate such integrals.
The contour of integration $C$ may be chosen in different ways. The saddle-point method requires choosing the contour $C$ in such a way that it passes through the point $z_0$, which is a root of the equation $f'(z) = 0$. Such a point is called the saddle point, since the function $\operatorname{Re} f(z)$ has a graph similar to a saddle or mountain pass. The saddle-point method requires choosing the contour of integration such that it crosses the saddle point $z_0$ in the direction of the steepest descent. However, finding such a contour and applying it are complicated problems, so for the sake of simplicity one usually does not choose the best contour, hence losing some accuracy in the remainder term when estimating the integral.
A parametric representation of the contour transforms the contour integral to an integral with a real variable of integration. Therefore the following theorem on estimating integrals with increasing parameters, based on Laplace's method, sometimes provides an answer to the initial question on estimating integrals.
Theorem 1.1.7. If the integral
$$G(\lambda) = \int_{-\infty}^{\infty} g(t) e^{\lambda f(t)}\,dt$$
converges absolutely for some $\lambda = \lambda_0$, that is,
$$\int_{-\infty}^{\infty} |g(t)|\, e^{\lambda_0 f(t)}\,dt \le M;$$
if the function $f(t)$ attains its maximum at a point $t_0$ and in a neighborhood of this point
$$f(t) = f(t_0) + a_2 (t - t_0)^2 + a_3 (t - t_0)^3 + \cdots$$
with $a_2 < 0$; if for an arbitrary small $\delta > 0$, there exists $h = h(\delta) > 0$ such that
$$f(t_0) - f(t) \ge h$$
for $|t - t_0| \ge \delta$; and if, as $t \to t_0$,
$$g(t) = c\,(t - t_0)^{2m} (1 + O(|t - t_0|)),$$
where $c$ is a nonzero constant and $m$ is a nonnegative integer, then, as $\lambda \to \infty$,
$$G(\lambda) = e^{\lambda f(t_0)} \lambda^{-m-1/2}\, c\, \alpha^{2m+1}\, \Gamma(m + 1/2) \left(1 + O(1/\sqrt{\lambda})\right),$$
where $\Gamma(x)$ is the Euler gamma function and
$$\alpha = \frac{1}{\sqrt{-a_2}}.$$
In particular, if $m = 0$, then $c = g(t_0)$, and as $\lambda \to \infty$,
$$G(\lambda) = e^{\lambda f(t_0)}\, g(t_0) \sqrt{\frac{2\pi}{-\lambda f''(t_0)}} \left(1 + O(1/\sqrt{\lambda})\right). \tag{1.1.5}$$
To demonstrate that this rather complicated theorem can really be used, let us estimate the integral
$$\Gamma(\lambda + 1) = \int_0^{\infty} x^{\lambda} e^{-x}\,dx$$
as $\lambda \to \infty$, and obtain the Stirling formula. The change of variables $x = \lambda t$ leads to the equation
$$\Gamma(\lambda + 1) = \lambda^{\lambda+1} e^{-\lambda} \int_0^{\infty} e^{-\lambda(t - 1 - \log t)}\,dt.$$
Here $g(t) = 1$, and $f(t) = -(t - 1 - \log t)$, $f(1) = 0$, $f'(1) = 0$, $f''(1) = -1$. The conditions of the theorem are fulfilled; therefore, by (1.1.5),
$$G(\lambda) = \int_0^{\infty} e^{\lambda f(t)}\,dt = \sqrt{2\pi/\lambda} \left(1 + O(1/\sqrt{\lambda})\right),$$
and for the Euler gamma function, we obtain the representation
$$\Gamma(\lambda + 1) = \lambda^{\lambda+1/2} e^{-\lambda} \sqrt{2\pi} \left(1 + O(1/\sqrt{\lambda})\right)$$
as $\lambda \to \infty$, coinciding with the Stirling formula, except for the remainder term, which can be improved to $O(1/\lambda)$.
Generating functions are only suited for nonnegative integer-valued random variables. A more universal method of proving theorems on the convergence of sequences of random variables is provided by characteristic functions. The characteristic function of a random variable $\xi$, or the characteristic function of its distribution, is defined as
$$\varphi(t) = \varphi_\xi(t) = E e^{it\xi} = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \tag{1.1.6}$$
where $-\infty < t < \infty$ and $F(x)$ is the distribution function of $\xi$.
If the $r$th moment $m_r$ exists, then the characteristic function $\varphi(t)$ is $r$ times differentiable, and $\varphi^{(r)}(0) = i^r m_r$.
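Returning briefly to the Stirling formula obtained from Theorem 1.1.7 earlier in this section, its accuracy can be examined numerically. The sketch below works with logarithms through `math.lgamma` to avoid overflow; the function name is a hypothetical choice, and the leading correction $1/(12\lambda)$ quoted after the code is the classical one, stated here as a known fact rather than derived in the text.

```python
from math import exp, lgamma, log, pi


def stirling_ratio(lam):
    """Gamma(lam + 1) divided by lam^(lam + 1/2) * e^(-lam) * sqrt(2*pi)."""
    log_stirling = (lam + 0.5) * log(lam) - lam + 0.5 * log(2 * pi)
    return exp(lgamma(lam + 1) - log_stirling)
```

The ratio tends to 1, and `lam * (stirling_ratio(lam) - 1)` stays close to $1/12$, in line with the remark that the remainder term can be improved to $O(1/\lambda)$.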
Characteristic functions are convenient for investigating sums of independent random variables, since if $S_n = \xi_1 + \cdots + \xi_n$, where $\xi_1, \ldots, \xi_n$ are independent random variables, then
$$\varphi_{S_n}(t) = \varphi_{\xi_1}(t) \cdots \varphi_{\xi_n}(t).$$
The characteristic function of the normal distribution with parameters $(m, \sigma^2)$ and density
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-m)^2/(2\sigma^2)}$$
is
$$e^{imt - \sigma^2 t^2/2}.$$
Relation (1.1.6) defines a one-to-one correspondence between characteristic functions and distributions. There are different inversion formulas that provide a formal possibility of reconstructing a distribution from its characteristic function, but they have limited practical applications. We state the simplest version of the inversion formulas.
Theorem 1.1.8. If a characteristic function $\varphi(t)$ is absolutely integrable, then the corresponding distribution has the bounded density
$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \varphi(t)\,dt.$$
The correspondence defined by (1.1.6) is continuous in the following sense.
Theorem 1.1.9. A sequence of distributions converges weakly to a limit distribution if and only if the corresponding sequence of characteristic functions $\varphi_n(t)$ converges to a continuous function $\varphi(t)$ as $n \to \infty$ at every fixed $t$, $-\infty < t < \infty$. In this case, $\varphi(t)$ is the characteristic function of the limit distribution, and the convergence $\varphi_n(t) \to \varphi(t)$ is uniform in any finite interval.
For a sequence $\xi_n$ of characteristics of random combinatorial objects, applying Theorem 1.1.9 gives the limit distribution function. But for integer-valued characteristics, one would rather have an indication of the local behavior, that is, the behavior of the probabilities of individual values. To this end the so-called local limit theorems of probability theory are used.
Let $\xi$ be an integer-valued random variable and $p_n = P\{\xi = n\}$. It is clear that $P\{\xi \in \Gamma_1\} = 1$, where $\Gamma_1$ is the lattice of all integers. If there exists a lattice $\Gamma_d$ with a span $d$ such that $P\{\xi \in \Gamma_d\} = 1$ and there is no lattice $\Gamma$ with span greater than $d$ such that $P\{\xi \in \Gamma\} = 1$, then $d$ is called the maximal span of the distribution of $\xi$. The characteristic function $\varphi(t)$ of the random variable $\xi$ is periodic with period $2\pi/d$ and $|\varphi(t)| < 1$ for $0 < t < 2\pi/d$.
For integer-valued random variables, the inversion formula has the following form:
$$p_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itn} \varphi(t)\, dt.$$

Consider the sum $S_N = \xi_1 + \dots + \xi_N$ of independent identically distributed integer-valued random variables $\xi_1, \dots, \xi_N$. When the distributions of the summands are identical and do not depend on $N$, the problem of estimating the probabilities $P\{S_N = n\}$, as $N \to \infty$, has been completely solved. If there exist sequences of centering and normalizing numbers $A_N$ and $B_N$ such that the distributions of the random variables $(S_N - A_N)/B_N$ converge weakly to some distribution, then the limit distribution has a density. Moreover, a local limit theorem holds on the lattice with a span equal to the maximal span of the distribution of the random variable $\xi_1$. If the maximal span of the distribution of $\xi_1$ is 1, then the local theorem holds on the lattice of integers.

Theorem 1.1.10. Let $\xi_1, \xi_2, \dots$ be a sequence of independent identically distributed integer-valued random variables and let there exist $A_N$ and $B_N$ such that, as $N \to \infty$, for any fixed $x$,
$$P\left\{ \frac{S_N - A_N}{B_N} \le x \right\} \to \int_{-\infty}^{x} p(u)\, du.$$
Then, if the maximal span of the distribution of $\xi_1$ is 1,
$$B_N P\{S_N = n\} - p((n - A_N)/B_N) \to 0$$
uniformly in $n$.
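The statement of Theorem 1.1.10 can be illustrated numerically. The sketch below is only an illustration under assumptions of our own choosing: Bernoulli summands with success probability 0.3 (so that $S_N$ is binomial and the maximal span is 1), $A_N = Np$, $B_N = \sqrt{Np(1-p)}$, and the standard normal density as $p(u)$. The function name `local_limit_error` is hypothetical.

```python
import math

def local_limit_error(N, p=0.3):
    """Largest gap over n between B_N * P{S_N = n} and the normal density
    evaluated at (n - A_N) / B_N, for a sum of N Bernoulli(p) variables."""
    q = 1.0 - p
    center = N * p                    # centering number A_N
    scale = math.sqrt(N * p * q)      # normalizing number B_N
    worst = 0.0
    for n in range(N + 1):
        prob = math.comb(N, n) * p**n * q**(N - n)   # exact P{S_N = n}
        z = (n - center) / scale
        density = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        worst = max(worst, abs(scale * prob - density))
    return worst

errors = {N: local_limit_error(N) for N in (25, 100, 400)}
for N, err in errors.items():
    print(N, err)
```

The printed maximal discrepancy shrinks as $N$ grows, which is the uniform convergence asserted by the theorem in this special case.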
Local limit theorems are of primary importance in what follows. Therefore, let us prove a local theorem on convergence to the normal distribution as a model for proofs of local limit theorems in more complex cases, which will be discussed later in the book.
Theorem 1.1.11. Let the independent identically distributed integer-valued random variables $\xi_1, \xi_2, \dots$ have a mathematical expectation $a$ and a positive variance $\sigma^2$. Then, if the maximal span of the distribution of $\xi_1$ is 1,
$$\sigma\sqrt{N}\, P\{\xi_1 + \dots + \xi_N = n\} - \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(n - aN)^2}{2\sigma^2 N} \right\} \to 0$$
uniformly in $n$ as $N \to \infty$.
Proof. Let
$$z = \frac{n - aN}{\sigma\sqrt{N}}$$
and
$$P_N(n) = P\{\xi_1 + \dots + \xi_N = n\}.$$
1.1 Probabilistic approach to enumerative combinatorial problems
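If $\varphi(t)$ is the characteristic function of the random variable $\xi_1$, then the characteristic function of the sum $S_N = \xi_1 + \dots + \xi_N$ is equal to $\varphi^N(t)$, and
$$\varphi^N(t) = \sum_{n=-\infty}^{\infty} P_N(n) e^{itn}.$$
By the inversion formula,
$$P_N(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itn} \varphi^N(t)\, dt. \qquad (1.1.7)$$
Let $\varphi^*(t)$ denote the characteristic function of the centered random variable $\xi_1 - a$, which equals $\varphi(t) \exp\{-ita\}$. Since $n = aN + \sigma z \sqrt{N}$, it follows from (1.1.7) that
$$P_N(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-it\sigma z\sqrt{N}} (\varphi^*(t))^N\, dt.$$
After the substitution $x = t\sigma\sqrt{N}$, this equality takes the form
$$\sigma\sqrt{N}\, P_N(n) = \frac{1}{2\pi} \int_{-\pi\sigma\sqrt{N}}^{\pi\sigma\sqrt{N}} e^{-ixz} \left(\varphi^*\left(\frac{x}{\sigma\sqrt{N}}\right)\right)^N dx. \qquad (1.1.8)$$
By the inversion formula,
$$\frac{1}{\sqrt{2\pi}} e^{-z^2/2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixz - x^2/2}\, dx. \qquad (1.1.9)$$
It follows from (1.1.8) and (1.1.9) that the difference
$$R_N = 2\pi \left( \sigma\sqrt{N}\, P_N(n) - \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \right) \qquad (1.1.10)$$
can be written as the sum of the following four integrals: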
It = LI 12 =
[...] The number of such partitions is $\binom{n-(r-1)N-1}{N-1}$. Let us define the uniform distribution on the set of these partitions by assigning the probability $\binom{n-(r-1)N-1}{N-1}^{-1}$ to each partition $n = n_1 + \dots + n_N$, $n_1, \dots, n_N \ge r$. Then $n$ can be written in the form
$$n = \eta_1 + \dots + \eta_N,$$
where the summands $\eta_1, \dots, \eta_N$ are random variables. If $n_1, \dots, n_N \ge r$ and $n_1 + \dots + n_N = n$, then
$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \binom{n-(r-1)N-1}{N-1}^{-1}.$$
The generalized scheme of allocation corresponding to this combinatorial problem is obtained if we use the geometric distribution for the distribution of the random variables $\xi_1, \dots, \xi_N$:
$$P\{\xi_1 = k\} = p^{k-r}(1-p), \quad k = r, r+1, \dots, \quad 0 < p < 1.$$
Indeed, as is easily verified,
$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \binom{n-(r-1)N-1}{N-1}^{-1},$$
since, for geometrically distributed summands,
$$P\{\xi_1 + \dots + \xi_N = n\} = \binom{n-(r-1)N-1}{N-1} p^{n-Nr} (1-p)^N.$$
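The claim that conditioning geometric summands on their sum yields the uniform distribution on partitions can be checked by direct enumeration. The following sketch uses exact rational arithmetic; the values $n = 9$, $N = 3$, $r = 2$, $p = 1/3$ and the function name `conditional_law` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_law(n, N, r, p):
    """Conditional law of (xi_1..xi_N) given xi_1+...+xi_N = n, where
    P{xi = k} = p^(k-r) (1-p) for k >= r. Also returns P{sum = n}."""
    weights = {}
    for parts in product(range(r, n + 1), repeat=N):
        if sum(parts) == n:
            w = Fraction(1)
            for k in parts:
                w *= p ** (k - r) * (1 - p)
            weights[parts] = w
    total = sum(weights.values())          # P{xi_1 + ... + xi_N = n}
    return {parts: w / total for parts, w in weights.items()}, total

n, N, r, p = 9, 3, 2, Fraction(1, 3)
law, total = conditional_law(n, N, r, p)
count = comb(n - (r - 1) * N - 1, N - 1)   # number of admissible partitions
print(len(law), count, set(law.values()))
```

Every admissible partition receives the same conditional probability, whatever value of $p$ is used.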
Example 1.2.3. Note that it is not necessary for the random variables $\xi_1, \dots, \xi_N$ in a generalized scheme to be identically distributed. Consider the following example. Draw $n$ balls at random without replacement from an urn containing $m_i$ balls of the $i$th color, $i = 1, \dots, N$. Let $\eta_i$ denote the number of balls drawn of the $i$th color, $i = 1, \dots, N$. It is easily seen that for nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$,
$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{\binom{m_1}{n_1} \cdots \binom{m_N}{n_N}}{\binom{m}{n}},$$
where $m = m_1 + \dots + m_N$.

If in the generalized scheme of allocation the random variables $\xi_1, \dots, \xi_N$ have the binomial distributions
$$P\{\xi_i = k\} = \binom{m_i}{k} p^k (1-p)^{m_i - k},$$
where $0 < p < 1$, $k = 0, 1, \dots, m_i$, $i = 1, \dots, N$, then
$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{\binom{m_1}{n_1} \cdots \binom{m_N}{n_N}}{\binom{m}{n}},$$
and the distribution of the random variables $\eta_1, \dots, \eta_N$ coincides with the conditional distribution of the independent random variables $\xi_1, \dots, \xi_N$ under the condition $\xi_1 + \dots + \xi_N = n$. Thus $\eta_1, \dots, \eta_N$ may be viewed as contents of cells in the generalized scheme of allocation, in which the random variables $\xi_1, \dots, \xi_N$ have different binomial distributions.
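Example 1.2.3 can also be verified by brute force. The sketch below compares the conditional law of independent binomial variables with the multivariate hypergeometric probabilities; the urn composition `(3, 4, 2)`, the sample size 5, the two values of $p$, and all function names are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binom_pmf(m, k, p):
    """P{xi = k} for a binomial variable with parameters m and p."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)

def conditional_binomial(ms, n, p):
    """Law of independent binomials (m_i, p) conditioned on their sum being n."""
    joint = {}
    for ks in product(*(range(m + 1) for m in ms)):
        if sum(ks) == n:
            w = Fraction(1)
            for m, k in zip(ms, ks):
                w *= binom_pmf(m, k, p)
            joint[ks] = w
    total = sum(joint.values())
    return {ks: w / total for ks, w in joint.items()}

def hypergeometric(ms, ks):
    """Probability of drawing k_i balls of color i without replacement."""
    num = 1
    for m, k in zip(ms, ks):
        num *= comb(m, k)
    return Fraction(num, comb(sum(ms), sum(ks)))

ms, n = (3, 4, 2), 5
cond_a = conditional_binomial(ms, n, Fraction(2, 5))
cond_b = conditional_binomial(ms, n, Fraction(1, 7))
print(all(cond_a[ks] == hypergeometric(ms, ks) for ks in cond_a))
```

The conditional law does not depend on $p$, which is why the parameter of the auxiliary distribution can be chosen freely in the generalized scheme.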
1.2 The generalized scheme of allocation
Example 1.2.4. In a sense, the graph $\Gamma_n$ of a random mapping consists of trees. Indeed, the graph can be naturally decomposed into connected components. Clearly, each connected component of the graph $\Gamma_n$ contains exactly one cycle. Vertices in the cycle are called cyclic. If we remove the arcs joining the cyclic vertices, then the graph turns into a forest, that is, a graph consisting of rooted trees. Recall that a rooted tree with $n + 1$ vertices is a connected undirected graph without cycles, with one special vertex called the root, and with $n$ nonroot labeled vertices. A rooted tree with $n + 1$ vertices has $n$ edges. In what follows, we view all edges of trees as directed away from the root, and the multiplicity of a vertex of a tree is defined as the number of edges emanating from it.

Let $T_n$ denote the set of all rooted trees with $n + 1$ vertices whose roots are labeled zero, and the $n$ nonroot vertices are labeled $1, 2, \dots, n$. The number of elements of the set $T_n$ is equal to $(n+1)^{n-1}$. A forest with $N$ roots and $n$ nonroot vertices is a graph, all of whose components are trees. The roots of these trees are labeled with $1, \dots, N$ and the nonroot vertices with $1, \dots, n$. We denote the set of all such forests by $T_{n,N}$. The number of elements in the set $T_{n,N}$ is $N(n+N)^{n-1}$. The number of forests in which the $k$th tree contains $n_k$ nonroot vertices, $k = 1, 2, \dots, N$, is
$$\frac{n!}{n_1! \cdots n_N!} (n_1+1)^{n_1-1} \cdots (n_N+1)^{n_N-1},$$
where the factor $n!/(n_1! \cdots n_N!)$ is the number of partitions of $n$ vertices into $N$ ordered groups, and $(n_k+1)^{n_k-1}$ is the number of trees that can be constructed from the $k$th group of vertices of each partition. Then
$$\sum_{n_1+\dots+n_N=n} \frac{n!\,(n_1+1)^{n_1-1} \cdots (n_N+1)^{n_N-1}}{n_1! \cdots n_N!} = N(n+N)^{n-1}, \qquad (1.2.9)$$
where the summation is taken over nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$.

Next, we define the uniform distribution on $T_{n,N}$. Let $\eta_k$ denote the number of nonroot vertices in the $k$th tree of a random forest in $T_{n,N}$, $k = 1, \dots, N$. For the random variables $\eta_1, \dots, \eta_N$, we have
$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{n!\,(n_1+1)^{n_1} \cdots (n_N+1)^{n_N}}{N(n+N)^{n-1}(n_1+1)! \cdots (n_N+1)!}, \qquad (1.2.10)$$
where $n_1, \dots, n_N$ are nonnegative integers and $n_1 + \dots + n_N = n$.

Let us consider independent identically distributed random variables $\xi_1, \dots, \xi_N$ for which
$$P\{\xi_1 = k\} = \frac{(k+1)^k x^k e^{-\theta(x)}}{(k+1)!}, \quad k = 0, 1, \dots, \qquad (1.2.11)$$
where the parameter $x$ lies in the interval $0 < x < e^{-1}$ and the function $\theta(x)$ is
defined as
$$\theta(x) = \sum_{k=1}^{\infty} \frac{k^{k-1}}{k!} x^k.$$
By using (1.2.9), we easily obtain
$$P\{\xi_1 + \dots + \xi_N = n\} = x^n e^{-N\theta(x)} \sum_{n_1+\dots+n_N=n} \frac{(n_1+1)^{n_1} \cdots (n_N+1)^{n_N}}{(n_1+1)! \cdots (n_N+1)!} = \frac{N(n+N)^{n-1}}{n!} x^n e^{-N\theta(x)};$$
hence, for any $x$, $0 < x < e^{-1}$, and for nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$,
$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{n!\,(n_1+1)^{n_1} \cdots (n_N+1)^{n_N}}{N(n+N)^{n-1}(n_1+1)! \cdots (n_N+1)!}. \qquad (1.2.12)$$
The right-hand sides of (1.2.10) and (1.2.12) are identical, and the joint distribution of $\eta_1, \dots, \eta_N$ coincides with the distribution of $\xi_1, \dots, \xi_N$ under the condition that $\xi_1 + \dots + \xi_N = n$. Thus, for the random variables $\eta_1, \dots, \eta_N$ and $\xi_1, \dots, \xi_N$, relation (1.2.2) holds, enabling us to study tree sizes in a random forest by using the generalized scheme of allocating particles into cells, with the random variables $\xi_1, \dots, \xi_N$ having the distribution given by (1.2.11).
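Identity (1.2.9), on which the example rests, is easy to confirm for small parameters. The sketch below evaluates the left-hand side by enumerating all ordered tuples of nonnegative integers summing to $n$; the function name `forest_sum` and the tested ranges are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def forest_sum(n, N):
    """Left-hand side of (1.2.9): sum over n_1+...+n_N = n of
    n! * prod (n_k + 1)^(n_k - 1) / n_k!, computed exactly."""
    total = Fraction(0)
    for parts in product(range(n + 1), repeat=N):
        if sum(parts) != n:
            continue
        term = Fraction(factorial(n))
        for k in parts:
            # (k + 1)^(k - 1) equals 1 when k = 0, so empty trees contribute 1
            term *= Fraction(k + 1) ** (k - 1) / factorial(k)
        total += term
    return total

for n in range(1, 6):
    for N in range(1, 4):
        print(n, N, forest_sum(n, N), N * (n + N) ** (n - 1))
```

In each printed row the enumerated sum agrees with the closed form $N(n+N)^{n-1}$.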
1.3. Connectivity of graphs and the generalized scheme

Not pretending to give an exhaustive solution, let us describe a rather general model of a random graph by using the generalized scheme of allocation. Consider the set $\Gamma_n(R)$ of all graphs with $n$ labeled vertices possessing a property $R$. We assume that connectivity is defined for the graphs from this set and that each graph is represented as a union of its connected components. In the formal treatment that follows, it may be helpful to keep in mind the graphs of random mappings or of random permutations. The former graphs consist of components that are connected directed graphs with exactly one cycle, whereas the latter graphs consist only of cycles.
Let $a_n$ denote the number of graphs in the set $\Gamma_n(R)$ and let $b_n$ be the number of connected graphs in $\Gamma_n(R)$. We denote by $\Gamma_{n,N}(R)$ the subset of graphs in $\Gamma_n(R)$ with exactly $N$ connected components. Note that the components of a graph in $\Gamma_{n,N}(R)$ are unordered, and hence we can consider only the symmetric characteristics that do not depend on the order of the components. To avoid this restriction, we consider instead the set $\bar\Gamma_{n,N}(R)$ of combinatorial objects constructed by means of all possible orderings of the components of each graph from $\Gamma_{n,N}(R)$. The elements of this set are ordered collections of $N$ components, each of which is a connected graph possessing the property $R$, and the total number of vertices in the components is equal to $n$. Since the vertices of a graph in $\Gamma_{n,N}(R)$ are labeled, all the connected components of the graph are distinct; therefore the number of elements in $\bar\Gamma_{n,N}(R)$ is equal to $N!\,a_{n,N}$, where $a_{n,N}$ is the number of elements of the set $\Gamma_{n,N}(R)$ consisting of the unordered collections of components.

Now let us impose a restriction on the property $R$ of graphs. Let a graph possess the property $R$ if and only if the property holds for each connected component. The property $R$ is then called decomposable. Set $a_0 = 1$, $b_0 = 0$ and introduce the generating functions
$$A(x) = \sum_{n=0}^{\infty} \frac{a_n x^n}{n!}, \qquad B(x) = \sum_{n=0}^{\infty} \frac{b_n x^n}{n!}.$$

Lemma 1.3.1. If the property $R$ is decomposable, then
$$a_{n,N} = \frac{n!}{N!} \sum_{n_1+\dots+n_N=n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}, \qquad (1.3.1)$$
where the summation is taken over nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$.

Proof. Let $\bar a_n(n_1, \dots, n_N)$ denote the number of graphs in $\bar\Gamma_{n,N}(R)$ with ordered components of sizes $n_1, \dots, n_N$. To construct all $\bar a_n(n_1, \dots, n_N)$ such graphs, we decompose the $n$ labeled vertices into $N$ groups so that there are $n_i$ vertices in the $i$th group, $i = 1, \dots, N$. This can be done in $n!/(n_1! \cdots n_N!)$ ways. From $n_i$ vertices, we construct a connected graph possessing the property $R$; this can be done in $b_{n_i}$ ways. Thus the number of ordered sets of connected components of sizes $n_1, \dots, n_N$ is
$$\bar a_n(n_1, \dots, n_N) = \frac{n!\, b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}.$$
Since $N$ components can be ordered in $N!$ ways, the number $a_n(n_1, \dots, n_N)$ of unordered sets, or the number of graphs in $\Gamma_{n,N}(R)$ having exactly $N$ components of sizes $n_1, \dots, n_N$, is
$$a_n(n_1, \dots, n_N) = \frac{\bar a_n(n_1, \dots, n_N)}{N!} = \frac{n!\, b_{n_1} \cdots b_{n_N}}{N!\, n_1! \cdots n_N!}. \qquad (1.3.2)$$

Lemma 1.3.2. If the property $R$ is decomposable, then
$$A(x) = e^{B(x)}.$$
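Proof. As follows from (1.3.1), the number $a_n$ of all graphs in $\Gamma_n(R)$ is
$$a_n = \sum_{N=1}^{n} a_{n,N} = \sum_{N=1}^{n} \frac{n!}{N!} \sum_{n_1+\dots+n_N=n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}. \qquad (1.3.3)$$
By dividing both sides of this equality by $n!$, multiplying by $x^n$, and summing over $n$, we get the chain of equalities
$$A(x) - 1 = \sum_{n=1}^{\infty} \frac{a_n x^n}{n!} = \sum_{n=1}^{\infty} \sum_{N=1}^{n} \frac{1}{N!} \sum_{n_1+\dots+n_N=n} \frac{b_{n_1} x^{n_1} \cdots b_{n_N} x^{n_N}}{n_1! \cdots n_N!} = \sum_{N=1}^{\infty} \frac{1}{N!} \left( \sum_{n=1}^{\infty} \frac{b_n x^n}{n!} \right)^N = e^{B(x)} - 1,$$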
which proves the lemma.

Let us define the uniform distribution on the set $\Gamma_n(R)$ and consider the random variables $\alpha_m$ equal to the number of components of size $m$ in a random graph from $\Gamma_n(R)$. The total number of components $\nu_n$ of a random graph from $\Gamma_n(R)$ is related to these variables by $\nu_n = \alpha_1 + \dots + \alpha_n$. Arrange the components in order of nondecreasing sizes and denote by $\beta_m$ the size of the $m$th component in the ordered series; if $m > \nu_n$, set $\beta_m = 0$.

We will also consider the random variables defined on the set $\bar\Gamma_{n,N}(R)$ of ordered sets of $N$ components. The ordered components labeled with the numbers from 1 to $N$ play the role of cells in the generalized scheme of allocating particles. Define the uniform distribution on $\bar\Gamma_{n,N}(R)$ and denote by $\eta_1, \dots, \eta_N$ the sizes of the ordered connected components of a random element in $\bar\Gamma_{n,N}(R)$. It is then clear that
$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{N!\, a_n(n_1, \dots, n_N)}{N!\, a_{n,N}} = \frac{a_n(n_1, \dots, n_N)}{a_{n,N}}. \qquad (1.3.4)$$

Theorem 1.3.1. If the series
$$B(x) = \sum_{n=0}^{\infty} \frac{b_n x^n}{n!} \qquad (1.3.5)$$
has a nonzero radius of convergence, then the random variables $\eta_1, \dots, \eta_N$ are the contents of cells in the generalized scheme of allocation in which the independent identically distributed random variables $\xi_1, \dots, \xi_N$ have the distribution
$$P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)}, \qquad (1.3.6)$$
where the positive value $x$ from the domain of convergence of (1.3.5) may be taken arbitrarily.
Proof. Let us find the conditional joint distribution of the random variables $\xi_1, \dots, \xi_N$ with distribution (1.3.6) under the condition $\xi_1 + \dots + \xi_N = n$. For such random variables,
$$P\{\xi_1 + \dots + \xi_N = n\} = \frac{x^n}{(B(x))^N} \sum_{n_1+\dots+n_N=n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}, \qquad (1.3.7)$$
and by virtue of (1.3.1),
$$P\{\xi_1 + \dots + \xi_N = n\} = \frac{x^n}{(B(x))^N} \frac{N!}{n!}\, a_{n,N}. \qquad (1.3.8)$$
Hence, if $n_1, \dots, n_N \ge 1$ and $n_1 + \dots + n_N = n$, then
$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{b_{n_1} \cdots b_{n_N} x^n}{n_1! \cdots n_N!\, (B(x))^N P\{\xi_1 + \dots + \xi_N = n\}} = \frac{b_{n_1} \cdots b_{n_N}\, n!}{n_1! \cdots n_N!\, N!\, a_{n,N}},$$
and according to (1.3.2),
$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\} = \frac{a_n(n_1, \dots, n_N)}{a_{n,N}}. \qquad (1.3.9)$$
From (1.3.4) and (1.3.9), we obtain the relation (1.2.2) between the random variables $\eta_1, \dots, \eta_N$ and $\xi_1, \dots, \xi_N$ in the generalized scheme of allocating particles to cells.

In the generalized scheme of allocating particles, we usually study the random variables $\mu_r(n, N)$ equal to the number of cells containing exactly $r$ particles and the order statistics $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ obtained by arranging the contents of cells in nondecreasing order. In this case, $\mu_r(n, N)$ is the number of components of size $r$, and $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ are the sizes of the components in a random element from $\bar\Gamma_{n,N}(R)$ arranged in nondecreasing order. These random variables help in studying distributions of the random variables $\alpha_1, \dots, \alpha_n$ and the associated variables defined on the set $\Gamma_n(R)$ of all graphs possessing the property $R$.
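Theorem 1.3.1 can be checked on a small case. The sketch below takes the permutation setting ($b_k = (k-1)!$, the number of cyclic permutations of $k$ points) purely as an illustrative assumption, computes the conditional law of $(\xi_1, \dots, \xi_N)$ given the sum for two different values of $x$, and compares it with the ratio of ordered counts $n!\, b_{n_1} \cdots b_{n_N}/(n_1! \cdots n_N!)$ to their total. The names `conditional_sizes` and `ordered_count` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def b(k):
    """Number of connected structures on k labeled points (here: cycles)."""
    return factorial(k - 1)

def conditional_sizes(n, N, x):
    """Law of (xi_1..xi_N) given their sum is n, with P{xi = k}
    proportional to b_k x^k / k!; the factor 1/B(x) cancels."""
    weights = {}
    for sizes in product(range(1, n + 1), repeat=N):
        if sum(sizes) == n:
            w = Fraction(1)
            for k in sizes:
                w *= Fraction(b(k), factorial(k)) * x**k
            weights[sizes] = w
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def ordered_count(n, sizes):
    """Number of ordered collections of components with the given sizes."""
    value = Fraction(factorial(n))
    for k in sizes:
        value *= Fraction(b(k), factorial(k))
    return value

n, N = 6, 3
law_a = conditional_sizes(n, N, Fraction(1, 2))
law_b = conditional_sizes(n, N, Fraction(1, 5))
total_ordered = sum(ordered_count(n, s) for s in law_a)
print(total_ordered, law_a == law_b)
```

The total of the ordered counts is $N!\,a_{n,N}$; for $n = 6$, $N = 3$ it equals $3! \cdot 225 = 1350$, where 225 is the number of permutations of six points with three cycles.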
Lemma 1.3.3. For any positive $x$ from the domain of convergence of (1.3.5),
$$P\{\nu_n = N\} = \frac{n!\,(B(x))^N}{N!\, a_n x^n} P\{\xi_1 + \dots + \xi_N = n\}. \qquad (1.3.10)$$

Proof. Relation (1.3.10) follows from (1.3.8) because $P\{\nu_n = N\} = a_{n,N}/a_n$ by definition.

It is clear by virtue of (1.3.3) that the number $a_n$ can also be expressed in terms of probabilities related to $\xi_1, \dots, \xi_N$:
$$a_n = \sum_{N=1}^{\infty} \frac{n!\,(B(x))^N}{N!\, x^n} P\{\xi_1 + \dots + \xi_N = n\}. \qquad (1.3.11)$$

Lemma 1.3.4. For any nonnegative integers $N, m_1, \dots, m_n$,
$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\} = P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}.$$
Proof. The conditional distribution on $\Gamma_n(R)$ under the condition $\nu_n = N$ is concentrated on the set $\Gamma_{n,N}(R)$ of graphs having exactly $N$ connected components and is uniform on this set. Hence,
$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\} = \frac{c_N(m_1, \dots, m_n)}{a_{n,N}}, \qquad (1.3.12)$$
where $a_{n,N}$ is the number of elements in $\Gamma_{n,N}(R)$ and $c_N(m_1, \dots, m_n)$ is the number of graphs in $\Gamma_{n,N}(R)$ such that the number of components of size $r$ is $m_r$, $r = 1, 2, \dots, n$.

Consider the above set $\bar\Gamma_{n,N}(R)$ composed of ordered sets of $N$ components. Let $\bar c_N(m_1, \dots, m_n)$ denote the number of elements in $\bar\Gamma_{n,N}(R)$ such that the number of components of size $r$ is $m_r$, $r = 1, 2, \dots, n$. It is clear that
$$P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = \frac{\bar c_N(m_1, \dots, m_n)}{\bar a_{n,N}}, \qquad (1.3.13)$$
where $\bar a_{n,N}$ is the number of elements in $\bar\Gamma_{n,N}(R)$. The assertion of the lemma follows from (1.3.12) and (1.3.13) because $\bar a_{n,N} = N!\, a_{n,N}$ and $\bar c_N(m_1, \dots, m_n) = N!\, c_N(m_1, \dots, m_n)$.

Thus, if the series (1.3.5) has a nonzero radius of convergence, then all of the random variables expressed by $\alpha_1, \dots, \alpha_n$ can be studied by using the generalized scheme of allocating particles in which the random variables $\xi_1, \dots, \xi_N$ have the distribution (1.3.6). Roughly speaking, under the condition that the number $\nu_n$ of connected components of a graph from $\Gamma_n(R)$ is $N$, the sizes of these components (under a random ordering) have the same joint distribution as the random variables $\eta_1, \dots, \eta_N$ in the generalized scheme of allocating particles that are defined by the independent random variables $\xi_1, \dots, \xi_N$ with distribution (1.3.6). Thus, for $\nu_n = N$ the random variables $\beta_1, \dots, \beta_N$ are expressed in terms of $\alpha_1, \dots, \alpha_n$ in exactly the same way as the order statistics $\eta_{(1)}, \dots, \eta_{(N)}$ in the generalized scheme of allocating particles are expressed in terms of $\mu_1(n, N), \dots, \mu_n(n, N)$. Hence, Lemma 1.3.4 implies the following assertion.

Lemma 1.3.5. For any nonnegative integers $N, k_1, \dots, k_N$,
$$P\{\beta_1 = k_1, \dots, \beta_N = k_N \mid \nu_n = N\} = P\{\eta_{(1)} = k_1, \dots, \eta_{(N)} = k_N\}. \qquad (1.3.14)$$

We now consider the joint distribution of $\mu_1(n, N), \dots, \mu_n(n, N)$.

Lemma 1.3.6. For nonnegative integers $m_1, \dots, m_n$ such that $m_1 + \dots + m_n = N$,
$$P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = \frac{n!\, b_1^{m_1} \cdots b_n^{m_n}}{m_1! \cdots m_n!\, (1!)^{m_1} \cdots (n!)^{m_n}\, a_n P\{\nu_n = N\}}. \qquad (1.3.15)$$
Proof. To obtain (1.3.15), it suffices to calculate $\bar c_N(m_1, \dots, m_n)$ in (1.3.13). It is clear that
$$\bar c_N(m_1, \dots, m_n) = \sum \bar a_n(n_1, \dots, n_N),$$
where the summation is taken over all sets $(n_1, \dots, n_N)$ containing the element $r$ exactly $m_r$ times, $r = 1, \dots, n$. The number of such sets is $N!/(m_1! \cdots m_n!)$, and for each of them, by (1.3.2),
$$\bar a_n(n_1, \dots, n_N) = \frac{n!\, b_1^{m_1} \cdots b_n^{m_n}}{(1!)^{m_1} \cdots (n!)^{m_n}}.$$
Hence,
$$\bar c_N(m_1, \dots, m_n) = \frac{N!\, n!\, b_1^{m_1} \cdots b_n^{m_n}}{m_1! \cdots m_n!\, (1!)^{m_1} \cdots (n!)^{m_n}}.$$
To obtain formula (1.3.15), it remains to note that
$$a_n P\{\nu_n = N\} = a_{n,N} = \frac{\bar a_{n,N}}{N!}.$$

Lemmas 1.3.4 and 1.3.6 enable us to express the joint distribution of the random variables $\alpha_1, \dots, \alpha_n$ in a random graph from $\Gamma_n(R)$.
Lemma 1.3.7. If $m_1, \dots, m_n$ are nonnegative integers, then
$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = \begin{cases} \dfrac{n!}{a_n} \displaystyle\prod_{r=1}^{n} \dfrac{b_r^{m_r}}{m_r!\,(r!)^{m_r}} & \text{if } \sum_{r=1}^{n} r m_r = n, \\ 0 & \text{otherwise.} \end{cases}$$

Proof. By the total probability formula,
$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = \sum_{k=1}^{n} P\{\nu_n = k\} P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = k\} = P\{\nu_n = N\} P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\},$$
where $N = m_1 + \dots + m_n$. By using Lemma 1.3.4, we find that
$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = P\{\nu_n = N\} P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}. \qquad (1.3.16)$$
It remains to note that $P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = 0$ if $m_1 + 2m_2 + \dots + n m_n \ne n$ and that equality (1.3.15) from Lemma 1.3.6 holds for the probability $P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}$ if $m_1 + \dots + m_n = N$ and $m_1 + 2m_2 + \dots + n m_n = n$. The substitution of (1.3.15) into (1.3.16) proves Lemma 1.3.7.

We now turn to some examples.
Example 1.3.1. The set $S_n$ of one-to-one mappings corresponds to the set $\Gamma_n(R)$ of graphs with $n$ vertices for which we have the property $R$: Graphs are directed with exactly one arc entering each vertex and exactly one arc emanating from each vertex. This property is decomposable. The connected components of such a graph are (directed) cycles. In this case, $a_n = n!$, $b_n = (n-1)!$, and the generating functions
$$A(x) = \frac{1}{1-x}, \qquad B(x) = -\log(1-x)$$
satisfy the relation of Lemma 1.3.2:
$$A(x) = e^{B(x)}. \qquad (1.3.17)$$
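For permutations, Lemma 1.3.7 with $a_n = n!$ and $b_r = (r-1)!$ gives the distribution of the cycle type. As an illustration, the following sketch compares that formula with a brute-force count over all permutations of five points; the choice $n = 5$ and the helper names `cycle_type` and `lemma_probability` are assumptions made for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """Return (m_1, ..., m_n), where m_r is the number of cycles of length r."""
    n = len(perm)
    seen = [False] * n
    counts = [0] * n
    for start in range(n):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            counts[length - 1] += 1
    return tuple(counts)

def lemma_probability(m):
    """Right-hand side of Lemma 1.3.7 with a_n = n!, b_r = (r - 1)!."""
    n = len(m)
    if sum(r * m_r for r, m_r in enumerate(m, start=1)) != n:
        return Fraction(0)
    value = Fraction(factorial(n), factorial(n))      # n! / a_n
    for r, m_r in enumerate(m, start=1):
        b_r = factorial(r - 1)
        value *= Fraction(b_r**m_r, factorial(m_r) * factorial(r) ** m_r)
    return value

n = 5
observed = Counter(cycle_type(p) for p in permutations(range(n)))
for m, c in sorted(observed.items()):
    print(m, Fraction(c, factorial(n)), lemma_probability(m))
```

Each observed frequency coincides with the value given by the lemma, which in this case reduces to the classical formula $\prod_r 1/(m_r!\, r^{m_r})$.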
To study the lengths of cycles of a random permutation and the associated variables, one can use the generalized scheme of allocating particles in which the random variables $\xi_1, \dots, \xi_N$ have the distribution
$$P\{\xi_1 = k\} = -\frac{x^k}{k \log(1-x)},$$
k=1,2,..., 0<x 0 such that for ItI < s and
O 1 and k such that (k - Npr(1))/(Npr(1)(1 - pr(1)))1/2
lies in any fixed finite interval. Thus the ratio in (1.5.3) tends to 1, and the asymptotics of P{µr = k} is determined by the first factor and coincides with the asymptotics of the corresponding binomial probability. This proves Theorems 1.5.1 and 1.5.2 in the third domain. The proof of Theorems 1.5.1 and 1.5.2 is now complete.
1.6. Maximum size of trees in a random forest

The results of the previous section give some information on the behavior of the maximum size $\eta_{(N)}$ of trees in a random forest from $\mathcal{F}_{n,N}$ with $T = n - N$ edges.

Indeed, if $\theta = 2T/n \to 0$ and there exists $r = r(n, N)$ such that $Np_r(\theta) \to \infty$ and $Np_{r+1}(\theta) \to \lambda$, $0 < \lambda < \infty$, then the distribution of the number $\mu_r$ of trees of size $r$ approaches a normal distribution, and the distribution of $\mu_{r+1}$ approaches
the Poisson distribution with parameter $\lambda$. This implies that the limit distribution of the random variable $\eta_{(N)}$ is concentrated on the points $r$ and $r + 1$.

If $\theta = 2T/n \to \gamma$, $\gamma > 0$, then there are infinitely many $r = r(n, N)$ such that the distribution of $\mu_r$ approaches a Poisson distribution; hence, the distribution of $\eta_{(N)}$ is scattered more and more as $\gamma$ increases. If $0 < \gamma < 1$, then the limit distribution is concentrated on a countable set of integers, whereas if $\gamma > 1$, then $\eta_{(N)}$ must be normalized to have a limit distribution, and the normalizing values tend to infinity at different rates, depending on the region of $\theta$.

Thus, it should be possible to prove the limit theorems for $\eta_{(N)}$ when $T/n \to 0$ by using results on $\mu_r$ from the previous section. But if $2T/n \to \gamma$ for $\gamma > 0$, this approach may not work, and even if it did, the proofs would not be simple.
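Therefore we choose instead to use the approach based on the generalized scheme of allocation. Let $\xi_1, \dots, \xi_N$ be random variables with distribution
$$p_k(\theta) = P\{\xi_1 = k\} = \frac{2k^{k-2}\theta^{k-1}e^{-k\theta}}{k!\,(2-\theta)}, \quad [...]$$
[...] vary such that $Np_r(\theta) \to \infty$ and $Np_{r+1}(\theta) \to \lambda$ for $0 < \lambda < \infty$, then
$$P\{\eta_{(N)} = r\} = e^{-\lambda} + o(1), \qquad P\{\eta_{(N)} = r + 1\} = 1 - e^{-\lambda} + o(1).$$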
Note that if $\lambda \ne 0$ in the conditions of the theorem, then $Np_r(\theta) \to \infty$ without any additional requirements. In particular, the conditions of the theorem are fulfilled if
$$T/n^{(r-1)/r} \to \rho, \quad 0 < \rho < \infty.$$
Under this condition, Theorem 1.6.1 was proved by Erdős and Rényi [37], whose well-known paper provided the only results on the behavior of $\eta_{(N)}$ until Britikov's work seventeen years later [28].
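Theorem 1.6.2. If $n, N \to \infty$, $\theta = 2T/n \to \gamma$, $0 < \gamma < 1$, then for any fixed integer $k$,
$$P\{\eta_{(N)} - [a] \le k\} = \exp\left\{ -\frac{(\gamma - 1 - \log\gamma)^{5/2}}{(e^{\gamma-1} - \gamma)\sqrt{2\pi}}\, e^{-(k - \{a\})(\gamma - 1 - \log\gamma)} \right\} (1 + o(1)),$$
where
$$a = \frac{\log n - \frac{5}{2}\log\log n}{\theta - 1 - \log\theta}$$
and $[a]$ and $\{a\}$ denote, respectively, the integer and fractional parts of $a$.

Theorem 1.6.3. If $n, N \to \infty$, $\theta = 2T/n \to 1$, and $N(1 - \theta)^3 \to \infty$, then for any fixed $z$,
$$P\{\beta\eta_{(N)} - u \le z\} \to e^{-e^{-z}},$$
where $\beta = -\log(\theta e^{1-\theta})$ and $u$ is the root of the equation
$$\left(\frac{2}{\pi}\right)^{1/2} N\beta^{3/2} = u^{5/2} e^u.$$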
Theorem 1.6.4. If n, N -+ 0o such that N1/3(1 - 2T/n) -+ v, -00 < v < 00, then for any fixed positive z,
)'Is (z, v),
00
P{ 11(N) r
r
The sum
$$\sum_{k>r} (\beta k)^{-5/2} e^{-\beta k} \beta$$
is an integral sum of the function $f(y) = y^{-5/2} e^{-y}$ with step $\beta \to 0$ and is approximated by the corresponding integral:
$$\sum_{k>r} (\beta k)^{-5/2} e^{-\beta k} \beta = \int_{r\beta}^{\infty} y^{-5/2} e^{-y}\, dy\, (1 + o(1)) = (r\beta)^{-5/2} e^{-r\beta} (1 + o(1)).$$
Therefore
$$NP_r = \left(\frac{2}{\pi}\right)^{1/2} N\beta^{3/2} (r\beta)^{-5/2} e^{-r\beta} (1 + o(1)). \qquad (1.6.8)$$
By definition, $r\beta = u + z + o(1)$ and
$$\left(\frac{2}{\pi}\right)^{1/2} N\beta^{3/2} = u^{5/2} e^u.$$
Substituting these expressions into (1.6.8) yields
$$NP_r = e^{-z}(1 + o(1)).$$

Now we are ready to prove the theorems of this section.
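The last step of the estimate above replaces the tail integral of $y^{-5/2}e^{-y}$ over $[r\beta, \infty)$ by $(r\beta)^{-5/2}e^{-r\beta}$, which is legitimate because $r\beta \to \infty$. A rough numerical sketch of this approximation follows; the trapezoidal rule, the truncation point 60, and the name `tail_ratio` are our own illustrative choices.

```python
import math

def tail_ratio(a, upper=60.0, steps=60000):
    """Ratio of the integral of y^(-5/2) e^(-y) over [a, inf) to
    a^(-5/2) e^(-a). After the substitution y = a + s the ratio equals
    the integral over s >= 0 of (1 + s/a)^(-5/2) e^(-s)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        weight = 0.5 if i in (0, steps) else 1.0     # trapezoidal weights
        total += weight * (1.0 + s / a) ** -2.5 * math.exp(-s)
    return total * h

for a in (20, 200, 2000):
    print(a, tail_ratio(a))
```

The ratio approaches 1 roughly like $1 - 5/(2a)$, so the relative error of the approximation vanishes as the lower limit grows.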
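Proof of Theorems 1.6.1-1.6.3. By applying Lemma 1.6.1, we find that under the conditions of Theorem 1.6.1,
$$(1 - P_r)^N \to e^{-\lambda}, \qquad (1 - P_{r-1})^N \to 0, \qquad (1 - P_{r+1})^N \to 1$$
as $N \to \infty$. These relations, together with (1.6.5), whose proof is pending, imply the assertion of Theorem 1.6.1.

Let
$$a = \frac{\log n - \frac{5}{2}\log\log n}{\theta - 1 - \log\theta},$$
and choose $r = [a] + k$, where $k$ is a fixed integer. Under the conditions of Theorem 1.6.2, $r = [a] + k \to \infty$ and according to Lemma 1.6.2,
$$NP_r = Np_r(\theta)\, c(1 - c)^{-1} (1 + o(1)),$$
where $c = \gamma e^{1-\gamma}$. It is easy to see that
$$Np_r(\theta) = \frac{2r^{r-2}\theta^{r-1}e^{-r\theta}N}{r!\,(2 - \theta)} = \frac{n\, e^{r(1 - \theta + \log\theta)}}{\theta\sqrt{2\pi}\, r^{5/2}} (1 + o(1)) = \frac{(\gamma - 1 - \log\gamma)^{5/2}}{\gamma\sqrt{2\pi}}\, e^{-(k - \{a\})(\gamma - 1 - \log\gamma)} (1 + o(1)).$$
Thus
$$NP_r = \frac{(\gamma - 1 - \log\gamma)^{5/2}\, c}{\gamma(1 - c)\sqrt{2\pi}}\, e^{-(k - \{a\})(\gamma - 1 - \log\gamma)} (1 + o(1)),$$
and consequently,
$$(1 - P_r)^N = \exp\left\{ -\frac{(\gamma - 1 - \log\gamma)^{5/2}}{(e^{\gamma-1} - \gamma)\sqrt{2\pi}}\, e^{-(k - \{a\})(\gamma - 1 - \log\gamma)} \right\} (1 + o(1)).$$

Under the conditions of Theorem 1.6.3, Lemma 1.6.3 shows that
$$NP_r \to e^{-z}$$
and
$$(1 - P_r)^N \to e^{-e^{-z}}.$$

Thus, to complete the proof of Theorems 1.6.1-1.6.3, it remains to verify (1.6.5) under each set of conditions. Since $\theta N \to \infty$ and $N(1 - \theta)^3 \to \infty$, by Theorem 1.4.1 the random sum $\zeta_N$ is asymptotically normal, and
$$P\{\zeta_N = n\} = \frac{1}{\sigma(\theta)\sqrt{2\pi N}} (1 + o(1)),$$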
where
$$\sigma^2(\theta) = D\xi_1 = \frac{2\theta}{(1 - \theta)(2 - \theta)}.$$

While estimating the asymptotic behavior of $(1 - P_r)^N$ in Lemmas 1.6.1-1.6.3, we determined the choice of $r$. We now prove the central limit theorem for the sum $\zeta_N^{(r)}$ for these choices of $r$. Set $B_N = \sigma(\theta)\sqrt{N}$. The characteristic function of the random variable $\xi_1^{(r)} - m(\theta)$, where $m(\theta) = E\xi_1$, is
$$\frac{e^{-itm(\theta)}}{1 - P_r} \sum_{k=1}^{r} p_k(\theta) e^{itk} = \frac{e^{-itm(\theta)}}{1 - P_r} \left( \varphi(t) - \sum_{k>r} p_k(\theta) e^{itk} \right),$$
where $\varphi(t)$ is the characteristic function of the random variable $\xi_1$. Hence, the characteristic function $\varphi_r(t, \theta)$ of the random variable $(\zeta_N^{(r)} - Nm(\theta))/B_N$ can be written
$$\varphi_r(t, \theta) = \frac{e^{-itNm(\theta)/B_N}}{(1 - P_r)^N}\, \varphi^N\!\left(\frac{t}{B_N}\right) \left( 1 - \sum_{k>r} p_k(\theta) e^{itk/B_N} (1 + o(1)) \right)^N.$$
According to Theorem 1.4.1, the distribution of $(\zeta_N - Nm(\theta))/B_N$ converges to the standard normal law, and consequently,
$$e^{-itNm(\theta)/B_N} \varphi^N(t/B_N) \to e^{-t^2/2}. \qquad (1.6.9)$$
It is clear that
$$\sum_{k>r} p_k(\theta) e^{itk/B_N} = P_r + \sum_{k>r} p_k(\theta)(e^{itk/B_N} - 1) = P_r + O\left( \frac{1}{B_N} \sum_{k>r} k p_k(\theta) \right),$$
and it is not difficult to prove in each of the three cases that
$$\frac{1}{B_N} \sum_{k>r} k p_k(\theta) = o(1/N). \qquad (1.6.10)$$
Estimates (1.6.9) and (1.6.10) imply that for any fixed $t$,
$$\varphi_r(t, \theta) \to e^{-t^2/2},$$
and the distribution of $(\zeta_N^{(r)} - Nm(\theta))/B_N$ converges to the standard normal distribution. The local convergence
$$P\{\zeta_N^{(r)} = n\} = \frac{1}{\sigma(\theta)\sqrt{2\pi N}} (1 + o(1))$$
needed for the proof of (1.6.5) can be proved in the standard way.

Thus the ratio in (1.6.5) tends to 1, and this, together with the estimates of $(1 - P_r)^N$, completes the proof of Theorems 1.6.1-1.6.3.
To prove Theorem 1.6.4, the following lemma is needed.

Lemma 1.6.4. If $N \to \infty$, the parameter $\theta$ in the distribution (1.6.1) equals 1, $N^{1/3}(1 - 2T/n) \to v$, and $r = zN^{2/3}$, where $z$ is a positive constant, then
$$bN^{2/3} P\{\zeta_N^{(r)} = n\} = f(z, v) + o(1),$$
where
$$f(z, y) = \exp\left\{ \frac{z^{-3/2}}{2\sqrt{\pi}} \right\} \left( p(y; 3/2, -1) + \sum_{s=1}^{\infty} \frac{(-1)^s}{s!} \left( \frac{3}{4\sqrt{\pi}} \right)^s I_s(z, y) \right)$$
and $I_s(z, y)$ is defined in Theorem 1.6.4.
Proof. As N
oo, 2kr-2e-k
Pk = Pk (1) =
ml/2 k-5/2(1
=
k-
(1.6.11)
uniformly in k > r. It is clear that
itk
1
k
1
(bN2/31
exp I bN2/3 I _ b3/2N
k512
-5/2
exp
itk {
bN2/3 1
1
bN2/3
The last sum is an integral sum of the function y-5/2 e"Y with step 1/(bN2/3); hence, 1
k5/2
p exp
itk
_
1
b3/2N
y-512 e i
J
dy+o(1)).
(1.6.12)
Set
f
3
H(t, z) =
°O
Y-5/2 e'ty dy.
Then
H(t, z)II(0, z) =
4 fy
2 -3/2
eo
3
5/2 dy =
(1.6.13)
Taking into account b = 2(2/3)2/3, we obtain from (1.6.12) and (1.6.13) that pk exp k>r
{
itk bN2/3 J _
-
2
1/2
n/
1
k>r
k/2/2
itk
exp { bN2/3
H(t, z) + o(1)
N
1+0(1) N (1.6.14)
In particular,
NPr = H(0, z)(1 + o(1)).
(1.6.15)
The characteristic function cpr(t, 1) of the random variable can be written
cpr(t, 1) =
(p (b2/3' 1) - exp
J_b2i2t13
- 2N)/(bN2/3)
tk>r
k
jPkexplbN2/3I
2 in this
where cp(t, 1) is the characteristic function of 41 Note that case. It follows from (1.6.13), (1.6.14), and Theorem 1.4.2 that
\ (Pr (t, 1) =
Nt
cPN(
213'
1I
1
-
(
l
PkexP j
bN2 3 1
k>r
N
(1+0(1) )) (1 - Pr)-N
))N (t)H(Nz)+0Nx
(1-H(0,z)+0(NII_N where *(t) is the characteristic function of the stable distribution with density p(y; 3/2, -1). Thus, for any fixed t, as N -+ oo,
$$\varphi_r(t, 1) \to g(t, z) = \psi(t) \exp\{-H(t, z) + H(0, z)\}.$$
The function $g(t, z)$ is continuous; therefore, by Theorem 1.1.9, it is a characteristic function. Since $|g(t, z)|$ is integrable, it corresponds to the density
$$f(z, y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ity} g(t, z)\, dt.$$
The span of the distribution of $\xi_1^{(r)}$ is 1; therefore, by Theorem 1.1.10, the local convergence is valid. Thus it remains to show that $f(z, y)$ has the form given in Theorem 1.6.4. Representing $e^{-H(t,z)}$ by its Taylor series gives
$$f(z, y) = e^{H(0,z)} \sum_{s=0}^{\infty} \frac{(-1)^s}{s!} f_s(z, y), \qquad (1.6.16)$$
where
$$f_s(z, y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ity} \psi(t) H^s(t, z)\, dt.$$
It is easy to see that the function $2\sqrt{\pi}\, z^{3/2} H(t, z)$ is the characteristic function of the distribution with density
$$p_z(y) = \tfrac{3}{2} z^{3/2} y^{-5/2}, \quad y \ge z. \qquad (1.6.17)$$
Therefore the function
$$(2\sqrt{\pi}\, z^{3/2})^s \psi(t) H^s(t, z)$$
is the characteristic function of the sum $\beta + \beta_1 + \dots + \beta_s$ of independent random variables, where $\beta$ has the stable law with density $p(y; 3/2, -1)$ and $\beta_1, \dots, \beta_s$ are identically distributed with density $p_z(y)$. The density of the sum $\beta + \beta_1 + \dots + \beta_s$ is
$$h_s(y) = \left(\tfrac{3}{2} z^{3/2}\right)^s I_s(z, y),$$
where $I_s(z, y)$ is defined in Theorem 1.6.4. Thus
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ity} \psi(t) H^s(t, z)\, dt = \left( \frac{3}{4\sqrt{\pi}} \right)^s I_s(z, y).$$
When we substitute this expression into (1.6.16), we find that
$$f(z, y) = e^{H(0,z)} \sum_{s=0}^{\infty} \frac{(-1)^s}{s!} \left( \frac{3}{4\sqrt{\pi}} \right)^s I_s(z, y). \qquad (1.6.18)$$
Taking into account (1.6.15), Theorem 1.4.2, and (1.6.18), we see that Theorem 1.6.4 follows from (1.6.2).

To prove Theorem 1.6.5 with the help of (1.6.2), we need to know the asymptotic behavior of large deviations of $P\{\zeta_N = n\}$. We give that information without proof (see [28]).
Lemma 1.6.5. If n, N
oo, the parameter 0 in the distribution (1.6.1) equals 1,
N(1 - 2T/n)3 - -oo, and r = n - 2N - bzN213, where z is a constant, then (Y)
_
2
n} _ Gr )
1/2
N (n - 2N)5/2
00
p(y; 3/2, -1) dy (1 + o(1)). (1.6.19)
The assertion of Theorem 1.6.5 follows from (1.6.19), Theorem 1.4.2, and the fact that $NP_r \to 0$ under the condition of Theorem 1.6.5.
1.7. Graphs with unicyclic components

A graph is called unicyclic if it is connected and contains only one cycle. The number of edges of a unicyclic graph coincides with the number of its vertices. Let $U_n$ denote the set of all graphs with $n$ vertices where every connected component is unicyclic. Any graph from $U_n$ has $n$ edges. In this section, we study the structure of a random graph from $U_n$. We follow the general approach described in Section 1.2.

As usual, denote by $u_n$ the number of graphs in $U_n$; we will study $u_n$ as $n \to \infty$. Let $b_n$ be the number of unicyclic graphs with $n$ vertices, and $b_n^{(r)}$ the number of unicyclic graphs with $n$ vertices, where the cycle has size $r$. The cycle of a unicyclic graph is nondirected; in other aspects, a unicyclic graph is similar to the connected graph of a mapping of a finite set into itself. Let $d_n$ be the number of connected graphs of mappings of a set with $n$ labeled vertices into itself, and $d_n^{(r)}$ the number of such graphs with the cycle of size $r$. It is easy to see that
$$d_n^{(1)} = n^{n-1}, \qquad d_n^{(2)} = 2\binom{n}{2} n^{n-3} = n^{n-1} - n^{n-2},$$
$$b_n^{(1)} = d_n^{(1)}, \qquad b_n^{(2)} = d_n^{(2)}, \qquad b_n^{(r)} = d_n^{(r)}/2, \quad r \ge 3.$$
Introduce the generating functions
$$d(x) = \sum_{n=1}^{\infty} \frac{d_n x^n}{n!}, \qquad B(x) = \sum_{n=1}^{\infty} \frac{b_n x^n}{n!}, \qquad c(x) = \sum_{n=1}^{\infty} \frac{n^{n-2} x^n}{n!}.$$
These functions can be represented in terms of the function
$$\theta(x) = \sum_{n=1}^{\infty} \frac{n^{n-1} x^n}{n!},$$
which is the root of the equation $\theta e^{-\theta} = x$ in the interval $[0, 1]$. This function was used in Section 1.4. Taking into account the notation introduced here and using the results of Section 1.4, we see that
$$d(x) = -\log(1 - \theta(x)), \qquad c(x) = \tfrac{1}{2}\left(1 - (1 - \theta(x))^2\right).$$
Since $b_n = b_n^{(1)} + \dots + b_n^{(n)}$, we have
$$B(x) = \sum_{n=1}^{\infty} \frac{b_n x^n}{n!} = \frac{1}{2} \sum_{n=1}^{\infty} \frac{d_n x^n}{n!} + \frac{1}{2} \sum_{n=1}^{\infty} \frac{d_n^{(1)} x^n}{n!} + \frac{1}{2} \sum_{n=1}^{\infty} \frac{d_n^{(2)} x^n}{n!}$$
$$= \tfrac{1}{2} d(x) + \theta(x) - \tfrac{1}{2} c(x) = -\tfrac{1}{2}\log(1 - \theta(x)) + \theta(x) - \tfrac{1}{4}\left(1 - (1 - \theta(x))^2\right). \qquad (1.7.1)$$

In accordance with the general model of Section 1.4, let us introduce independent identically distributed random variables $\xi_1, \dots, \xi_N$ for which
$$P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)}, \quad k = 1, 2, \dots. \qquad (1.7.2)$$
The number of graphs in $U_n$ with $N$ components can be represented in the form
$$u_{n,N} = \frac{n!}{N!} \sum_{n_1+\dots+n_N=n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!} = \frac{n!\,(B(x))^N}{N!\, x^n} P\{\xi_1 + \dots + \xi_N = n\}. \qquad (1.7.3)$$
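The counts $d_n^{(1)}$ and $d_n^{(2)}$ quoted at the beginning of this section can be confirmed by exhaustive enumeration for small $n$. The sketch below enumerates all mappings of $\{0, \dots, n-1\}$ into itself, keeps the connected ones (a mapping graph is connected exactly when it has one cycle), and tabulates them by cycle length; the function names and the range $n \le 5$ are illustrative assumptions.

```python
from itertools import product

def cycle_lengths(f):
    """Lengths of all cycles of the mapping i -> f[i]."""
    n = len(f)
    state = [0] * n          # 0 = unvisited, 1 = on current path, 2 = finished
    lengths = []
    for start in range(n):
        path = []
        j = start
        while state[j] == 0:
            state[j] = 1
            path.append(j)
            j = f[j]
        if state[j] == 1:    # the walk closed a new cycle beginning at j
            lengths.append(len(path) - path.index(j))
        for v in path:
            state[v] = 2
    return lengths

def connected_by_cycle_length(n):
    """Number of connected mapping graphs on n points, keyed by cycle length."""
    counts = {}
    for f in product(range(n), repeat=n):
        lengths = cycle_lengths(f)
        if len(lengths) == 1:
            counts[lengths[0]] = counts.get(lengths[0], 0) + 1
    return counts

for n in range(2, 6):
    counts = connected_by_cycle_length(n)
    print(n, counts.get(1), n ** (n - 1), counts.get(2), n ** (n - 1) - n ** (n - 2))
```

For each $n$ the enumerated numbers of connected mappings with a cycle of length 1 and 2 agree with $n^{n-1}$ and $n^{n-1} - n^{n-2}$, respectively.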
In what follows, we choose
$$x = (1 - 1/\sqrt{n})\, e^{-(1 - 1/\sqrt{n})}.$$

Theorem 1.7.1. As $n \to \infty$,
$$u_n = \frac{\sqrt{2\pi}\, e^{3/4}}{2^{1/4}\, \Gamma(1/4)}\, n^{n-1/4} (1 + o(1)),$$
where
$$\Gamma(p) = \int_0^{\infty} x^{p-1} e^{-x}\, dx$$
is the Euler gamma function.

Before proving Theorem 1.7.1, we will prove some auxiliary results.
Lemma 1.7.1. For $x = (1 - 1/\sqrt{n})\, e^{-(1 - 1/\sqrt{n})}$,
$$(1 - \theta(xe^{it}))^2 = \frac{1}{n} - 2it + \varepsilon_1(t) + \varepsilon_2(t, n),$$
where $\varepsilon_1(t)/t \to 0$ as $t \to 0$ uniformly in $n$ and $|\varepsilon_2(t, n)| \le 2|t|/\sqrt{n}$.

Proof. We found in Section 1.4 that
$$u(w) = (1 - \theta(w))^2 = 1 - 2\sum_{k=1}^{\infty} \frac{k^{k-2} w^k}{k!} = 1 - 2c(w), \quad |w| \le e^{-1}.$$
When we write $u(xe^{it})$ as
$$u(xe^{it}) = u(e^{-1+it}) + \frac{1}{n} + \varepsilon_2(t, n), \qquad (1.7.4)$$
it is clear that $\theta(x) = 1 - 1/\sqrt{n}$ and $\theta(e^{-1}) = 1$; therefore $u(e^{-1}) - u(x) = -1/n$. With this equality and the observation that $x < e^{-1}$, we obtain the estimates
$$|\varepsilon_2(t, n)| = |u(xe^{it}) - u(e^{-1+it}) - 1/n| = 2\left| \sum_{k=1}^{\infty} \frac{k^{k-2}(e^{-k} - x^k)(e^{itk} - 1)}{k!} \right| \le 2\sum_{k=1}^{\infty} \frac{k^{k-1}(e^{-k} - x^k)}{k!} |t| = 2|t|(\theta(e^{-1}) - \theta(x)) = 2|t|/\sqrt{n}. \qquad (1.7.5)$$
The function $u(e^{-1+it})$ has the first derivative $-2i$ at the point $t = 0$; thus, as $t \to 0$,
$$u(e^{-1+it}) = -2it + o(t). \qquad (1.7.6)$$
The assertion of the lemma follows from (1.7.4), (1.7.5), and (1.7.6).
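The two facts about $\theta(x)$ used in the proof, namely $\theta(x) = 1 - 1/\sqrt{n}$ for the chosen $x$ and hence $(1 - \theta(x))^2 = 1/n$, can be checked numerically from the power series of $\theta$. The sketch below sums the series in logarithmic form to avoid overflow; the value $n = 100$, the number of terms, and the function name `theta` are illustrative assumptions.

```python
import math

def theta(x, terms=20000):
    """Partial sum of theta(x) = sum_{k>=1} k^(k-1) x^k / k!, with each
    term evaluated through logarithms to avoid overflow."""
    total = 0.0
    for k in range(1, terms + 1):
        log_term = (k - 1) * math.log(k) + k * math.log(x) - math.lgamma(k + 1)
        total += math.exp(log_term)
    return total

n = 100
root = 1 - 1 / math.sqrt(n)          # claimed value of theta(x)
x = root * math.exp(-root)           # x = (1 - 1/sqrt(n)) e^{-(1 - 1/sqrt(n))}
value = theta(x)
print(value, root, (1 - value) ** 2, 1 / n)
```

The computed series value matches $1 - 1/\sqrt{n}$ closely, and it satisfies $\theta e^{-\theta} = x$ with $x < e^{-1}$, as required in the proof.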
Lemma 1.7.2. If $n \to \infty$, $N = \alpha\log n + o(\log n)$, where $\alpha$ is a positive constant, then
$$nP\{\xi_1 + \dots + \xi_N = k\} = \frac{1}{2^{\alpha}\Gamma(\alpha)}\, z^{\alpha-1} e^{-z/2} (1 + o(1))$$
uniformly in $k$ such that $z = k/n$ lies in any fixed interval of the form $0 < z_0$
M
bmI mix ... bm mt+...+mk=m
mk
< (B(x)
x
MI! ... Mk!
)k-1
"' m>>M/k m!
m
(1.8.12)
holds. As in Section 1.7, let $d_n$ be the number of connected graphs of single-valued mappings of a set with $n$ elements into itself and let
$$d(x) = \sum_{m=1}^{\infty}\frac{d_m x^m}{m!}.$$
Since
m-1
bm M/k C2n2T
q Tl/4
k!2TT!(1-q)
/k ,
where cl, C2 are some constants. Thus, under the conditions of the theorem, S2 = O(n 2T/(2TT!)).
We now estimate the sum Si. According to (1.8.8),
T! Fn-m,N -
T ! (n
2T m(T - m)!
- n2Txm
1-
2Tnm
uniformly in m < M = T t/a
0+00))
(1 +O(1))
rE Er
Therefore, for any fixed k = 1, 2, ... , 1
S1
= kl
M!
m<M nll+...+mk=m
n2T
In
M
X
1
(n)bml
k! 2TT!
F
...bmk
mlt ... ynkl
bmlx"'1 ... bkx"'k (1+0(1)).
m1!...Ink!
m=k
Taking into account the estimate of S2, we obtain
Sl -
n2T
1-
k! 2 T T! M=k
X
ml+...+mk=m
bmlxml ... bmkx"'k (1 + 0(1)) + 0(1) yn11...nrkl
n2T 1- (B(x))k (1+0(1)). 2TT!k!
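Combining the estimates of $S_1$ and $S_2$ yields
$$a_{n,T}(k) = \frac{n^{2T}}{2^T\,T!\,k!}\,\sqrt{1-\theta}\,(B(x))^k(1+o(1))$$
under the hypothesis of the theorem. Since $x = \theta e^{-\theta} \to \lambda e^{-\lambda}$, we also have
$$B_1(x) \to \Lambda_1, \qquad B_2(x) \to \Lambda_2, \qquad B_3(x) \to \Lambda_3.$$

Theorem 1.8.3. If $n, T \to \infty$ such that $\theta = 2T/n \to \lambda$, $0 < \lambda < 1$, then
$$a^{(i)}_{n,T} = \frac{n^{2T}}{2^T\,T!}\,\sqrt{1-\lambda}\,e^{\Lambda_i}(1+o(1)), \qquad i = 1, 2, 3,$$
where $\Lambda_i$, $i = 1, 2, 3$, are as in Theorem 1.8.2.

Proof. To obtain the asymptotics of $a_{n,T}$, we have to estimate the sum
$$a_{n,T} = \sum_{k=0}^{\infty} a_{n,T}(k). \tag{1.8.14}$$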
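After normalization, we have
$$\left(\frac{n^{2T}}{2^T\,T!}\right)^{-1} a_{n,T} = \sum_{k=0}^{\infty}\left(\frac{n^{2T}}{2^T\,T!}\right)^{-1} a_{n,T}(k), \tag{1.8.15}$$
where for any fixed $k = 0, 1, \ldots$,
$$\left(\frac{n^{2T}}{2^T\,T!}\right)^{-1} a_{n,T}(k) \to \frac{\sqrt{1-\lambda}\,(B(\lambda e^{-\lambda}))^k}{k!}$$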
as $n, T \to \infty$, $2T/n \to \lambda$, $0 < \lambda < 1$. We can pass to the limit under the sum in (1.8.15) if the series converges uniformly with respect to the parameters $n$, $T$. To see this, it suffices to obtain an estimate
$$\left(\frac{n^{2T}}{2^T\,T!}\right)^{-1} a_{n,T}(k) \le A_k \tag{1.8.16}$$
such that the series $\sum_{k=0}^{\infty} A_k$ converges. Using (1.8.8) and (1.8.9) and reasoning as we did in the proof of the estimate of $S_2$ give
E
an,T(k) =k!1 n
ml+...+mk=m
m=1
Cn2T
...bmk m! 1 nynJFn-m,Nbm1 MI!...Mk!
°O
xmbml
k!2TT! m=k ml+...+mk=m
=
bmk
MI!...mk!
cn2T (B(k))k
2TT!k!
Thus we have an estimate of the form (1.8.16) and can pass to the limit under the sum in (1.8.15) to obtain
$$a_{n,T} = \frac{n^{2T}}{2^T\,T!}\,\sqrt{1-\lambda}\,e^{B(\lambda e^{-\lambda})}(1+o(1)).$$
Depending on the set of graphs under consideration, replace $B(x)$ with $B_1(x)$, $B_2(x)$, or $B_3(x)$, and Theorem 1.8.3 is proved.
A random graph from $A_{n,T}$ has exactly $N = n - T$ trees and a random number $\varkappa_{n,T}$ of unicyclic components. We denote by $\varkappa^{(i)}_{n,T}$ the number of unicyclic components in a random graph from $A^{(i)}_{n,T}$, $i = 1, 2, 3$.

Theorem 1.8.4. If $n, T \to \infty$ such that $\theta = 2T/n \to \lambda$, $0 < \lambda < 1$, then for any $i = 1, 2, 3$ and for any fixed $k = 0, 1, \ldots$,
$$\mathbf{P}\{\varkappa^{(i)}_{n,T} = k\} = \frac{\Lambda_i^k\,e^{-\Lambda_i}}{k!}(1+o(1)),$$
where the $\Lambda_i$ are as in Theorem 1.8.2.

Proof. The assertions of the theorem follow from Theorems 1.8.2 and 1.8.3, since
$$\mathbf{P}\{\varkappa^{(i)}_{n,T} = k\} = a^{(i)}_{n,T}(k)\,/\,a^{(i)}_{n,T}.$$
Now we consider the case 0 = 2T/n -* 1. Let (on,T be the number of vertices that lie in the unicyclic components of a random graph from An`T, i = 1, 2, 3. It is clear that if we know the distribution of a characteristic of the random graph
under the condition On j = m in }the unconditional distribution can be obtained by averaging over the distribution of conT. Theorem 1.8.5. If n, T -+ oo such that e = 1 - 2T/n for any i = 1, 2, 3,
m} =
E
0 and e3n -+ oo, then
314e-Y(1
r(1/4)Y
+ o(l))
uniformly with respect to m such that y = 8 2m/2 lies in any fixed interval of the form 0 < yo < y < yi < oo,Jand there exists a constant A such that, for all in, Pictn1T=m} 0,
x} = T
(_4Ss! s
ZS(x)(1 +o(1)),
s=0
where ZS (x) is defined in Theorem 1.8.7. Finally, for any fixed z, P{Oa{Gn`T)
where 0
u < z}(1 + o(1)) = e-e z(1 +o(1)),
- u < z} =
log(9e-e), 0 = 2T/n, and u is the root of the equation 1/2
(21
N0312 =
u5/2eu.
(2.1.7)
Proof. The results of the theorem are the consequences of (2.1.5), (2.1.6), and Theorems 1.8.6, 1.8.7, 1.8.8, and 1.8.9.
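2.2. Critical graphs

Recall that a graph with $n$ vertices and $T$ edges is called critical if $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n$ tends to a constant. We have seen that many of the characteristics of the random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$, change their behavior if $\theta = 2T/n$ approaches the value 1. For example, the number of cycles, or the number of unicyclic components $\varkappa(G^{(i)}_{n,T})$, tends to zero in probability if $\theta \to 0$, has the Poisson distribution with parameter $\Lambda_i$, $i = 1, 2, 3$, respectively, if $\theta \to \lambda$, $0 < \lambda < 1$, where
$$\Lambda_1 = -\tfrac{1}{2}\log(1-\lambda) + \frac{\lambda}{2} + \frac{\lambda^2}{4},$$
$$\Lambda_2 = -\tfrac{1}{2}\log(1-\lambda) - \frac{\lambda}{2} + \frac{\lambda^2}{4},$$
$$\Lambda_3 = -\tfrac{1}{2}\log(1-\lambda) - \frac{\lambda}{2} - \frac{\lambda^2}{4},$$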
and is asymptotically normal with parameters (-1 log s, 4-1 logs) if s - 0, 4 sin oo. Thus, 0 = 1 is a singular point and one can correctly suppose that the behavior of the graphs near this point is interesting but difficult to investigate. Indeed, not much is known about the properties of critical graphs. We present here only one assertion about this behavior. Recall that A(' A. is the set of graphs with n labeled vertices and T edges that consists of trees and unicyclic components with neither loops nor multiple edges
for i = 3, without loops and with cycles of length 2 allowed for i = 2, and with cycles of lengths 1 and 2 allowed for i = 1. Theorem 2.2.1. If n, T -+ oo such that sn1/3 -+ 2 3-2/3v, where v is a con-
stant, then for any random graph G, ,, i = 1, 2, 3,
j
PIG(`) n E A(`) n, T 1ll
-
f3 3n
/^I,(1/4)
e4v3/27p(v)(1 +o(1)),
where
$$p(v) = \int_0^{\infty} y^{-3/4}\,p(-v-y;\,3/2,\,-1)\,dy$$
and $p(y; 3/2, -1)$ is the density of the stable law, introduced in Theorem 1.4.2, with the characteristic function
$$f(t) = \exp\left\{-|t|^{3/2}\,e^{i\pi t/(4|t|)}\right\}.$$
Proof. It is clear that
PIG(') nT
(') -a(i) (1) EAT{-anTlgfT,
Evolution of random graphs
98
T, and gn`T is the number of graphs in nT, i = 1, 2, 3. In accordance with Theorem 1.8.10,
where an` T is the number of graphs in
cin! en
o(1)),
an,T =
whereN=n - T, f3-e-1/4
,f3-e314 Cl
2x1'(1/4),
c2
r3-e-314
= 2 J r(1/4)'
c1
= 2V' f(1/4)
In the previous section, we proved that
$$g^{(i)}_{n,T} = \frac{n^{2T}}{2^T\,T!}\,c_i(1)(1+o(1)),$$
where $c_1(1) = e^{3/4}$, $c_2(1) = e^{-1/4}$, $c_3(1) = e^{-3/4}$.
Since T = n(1 - e)/2 and e3n -+ 8v3/9, we easily find n!. e n T.i 2 T = 2N/7-re4v3/27(1 + o(1)) 2NNI,lN--n2T
and, consequently, (i) an,T
3n
=
e4v3/27p(v)(1 +o(1)).
-17(1/4)
gnt,T
The function $p(v)$ can be represented by a convergent power series. The function
$$g(v) = p(-v) = \int_0^{\infty} y^{-3/4}\,p(v-y;\,3/2,\,-1)\,dy$$
can be thought of as the convolution of the function
gl (Y) =
Y-3 (0,
4
y > 0,
Y 3. Each construction fixes r edges that must occur. These r edges
2.3 Random graphs with independent edges
103
can occur at r fixed places of the subset labeled i, and there exist r! possibilities to assign these r edges to r places. Thus the event 1 } can be realized by one
of the (r)(r - 1)! r!/2 variants. For r > 3, each of these variants has the probability (2/n2)r. Thus,
Ear =
(T)(n)( r-1)!r! r
r
2
(-)
r
(2.3.1)
n2
It is not difficult to check that this formula is also valid for r = 1 and r = 2. It follows from (2.3.1) that
Ear
0, s
P(8218(Gn,T) < x} = E00(4ss! Zs(x)(1 + o(1)),
where Zs (x) is defined in Theorem 1.8.7; and
P{j O (Gn,T) - u < z} = P{Ptj(Gn,T) - u < z}(1 + o(1)) = e-e z(1 + o(1)), where fi
log(0e-9), 0 = 2T/n, and u is the root of the equation 1/2
(2 )
(n - T),63/2 = u5/2e".
For the same reasons, Theorem 2.2.2 can be extended to the critical graph Gn,T
Theorem 2.3.6. stant, then
If n, T -+ oo such that en1/3
P{Gn,T E An,T}
F7r2
4v 3/27
e
2 3-2/3v, where v is a con-
rvk r(2k 3+2 1
ki
nk cos 3 (1 +0(1)).
k=0
For the supercritical case where $n, T \to \infty$ such that $\varepsilon^3 n \to -\infty$, we present here only the simplest results. In the final section of this chapter, we will give a short review of what is known about the supercritical graphs. It is known that if $\theta = 2T/n \to \lambda$, $\lambda > 1$, a giant component appears in the graph $G_{n,T}$ and, with probability tending to 1, the graph consists of trees, unicyclic components, and this giant component formed by all the vertices that are not contained in trees and unicyclic components. As $2T/n$ increases, the size of the giant component increases and the number of unicyclic components decreases. If $\theta = 2T/n \to \lambda$, $1 < \lambda < \infty$, then the number of unicyclic components has a Poisson distribution. For $\theta \to \infty$, we have the following result.

Theorem 2.3.7. If $n, T \to \infty$ such that $\theta = 2T/n \to \infty$, then with probability tending to 1, there are no unicyclic components in $G_{n,T}$.
Proof. The number of unicyclic components with $r$ vertices is not greater than $c\,r^{r-1/2}$, where $c$ is a constant (see, e.g., [16]). Denote by $\varkappa_r(G_{n,T})$ the number of unicyclic components of size $r$ in $G_{n,T}$. By reasoning as in the proof of (2.3.1), we find that
Exr(Gn,T)
where the summation is taken over all distinct ordered sets M iil), ... , ir1) }, J1), ... , (Ii(im), ... , iy(m)1, jm))
of distinct indices of the form ({i1, ... , iY}, j); the set {il, ... , ir} in the index is considered an unordered set of distinct indices, and j indicates the number of the cycle formed by the vertices i1, ... , ir. We show that under the conditions of Theorem 2.4.2, for any fixed r and any fixed m > 1,
( ra2r m I .
EaYml
(2.4.14)
2r
This assertion for m = 1 follows from Corollary 2.4.2.
In order to become accustomed to the more complicated notation, we first consider the case m = 2. By (2.4.13), Ea[21 =
Pli(iii )...i(1) _i r
(1)},jl),
I
2))
i(2) =
... r
1
1}
Decompose the right-hand side sum into two sums. Let the first sum E 1 include the summands with nonintersecting sets {iii), . . . , i(r1)} and {ii2), ... , i(re)}. When we take into account that in this case 2r edges must exist to guarantee
-
(j1) 1
(j2)
1)...i(1)-5(2)...i(2) r 1
and by using Lemma 2.4.1, we obtain P { (ii... (1) - 4 i1 lr l
(2).
1r(2)
= 1j
()2TaI) . .aa2) 2 2 .. a(2) (1+0(0). .
Therefore
n)2r ((r
21)!)2 2 2 2 aia) ... ai(1)ai(2) ... a,(z) 2
E 1= ...,irl)}
x (1 + O(1/n)).
r2)})
1
r
1
r
It is clear by virtue of (2.4.9) and (2.4.10) that 2 2 2 . al(l) ... a(l)ai!2)
r
1
2
1
r
(r!)2
i( 1
1 a2 ... azr M + 1 is valid. The number of summands in the sum EM,L does not exceed nM, and the probability that L fixed edges appear in Gn,T does not exceed, by virtue of (2.4.4), the value cn-L. This implies c
c
EM,L ,
(I,', jm11')1 ... ((hs>, >
Is>),
... , (Ims', jm')))-
We decompose the sum on the right-hand side of this representation into two parts; let the sum El include only summands with the distinct elements in all (k) II , l = 1 , ... , mk, k = 1, ... , s; and let the sum E2 include all the remaining summands. For the summands of the first sum, the corresponding random variables equal l only if there exist m 1 rl + +m srr fixed edges in G,,,T. Therefore, by Lemma 2.4.1, 1
1
(Jmss s)
I(t)
=
IM(IS)
(A)mlrl++msrs
2
1
I 2
2
2
irl
11
irs
/
ait(1 1) ... a( 1 ,1 ) ... a-(.,,s) ... a(mss) 0 + 00)),
n
and, by (2.4.9), (2.4.10), and Lemma 2.4.2,
rta2rt
(
2rl
mt
)
.
...
(
rsa2rs 2rs
fs (1 + o(1)).
It remains to prove that E2 tends to zero. The summation in E2 is taken over sets of composite indices in which at least one of the elements 1, 2, ... , n is encountered at least twice. A cycle corresponds to each of the composite indices. The existence of a common element in the cycles implies that the number M of distinct vertices contained in the cycles and the number L of distinct edges involved in the cycles satisfy L > M + 1. We decompose the sum E2 into a finite number of sums EM,L
containing summands with fixed values of the parameters M and L. By virtue of (2.4.3), for each of these sums, the estimate c
c
holds because the number of summands does not exceed n`x, and the probability of L fixed edges occurring in G,,,T does not exceed cn-L. This proves Theorem 2.4.3. To prove Theorem 2.4.1, we need the following auxiliary assertion.
119
2.4 Nonequiprobable graphs
Lemma 2.4.3. be nonnegative integer-valued random variables such that for an arbitrary fixed s and arbitrary nonnegative integers k1, . . . , ks, P{l (n) 1
al' ... as e_al_..._as
- k1, ... ,1, (n)-- ks} s
k1!...kst
as n _+ oo, where al, a2.... is a fixed sequence of nonnegative numbers. Moreover, suppose
nn) - 0 as s
(2.4.20)
oo, uniformly in n, and let 00
Eak=A 0 and an arbitrary fixed m, Ame- A
nn)= m}-
j
M!
<E
for sufficiently large n. For fixed e and m, there exists s such that Ame-A,. s
_
Ame-A
m!
where As = a1 +
m!
+ as.
It is not hard to see that
Therefore, by (2.4.20), IP{inn) = m} m}l < E/3 for sufficiently large s. Finally, the conditions of the lemma yield the convergence of the distribution of S n) = l; in) + . + (for any fixed s) to the Poisson distribution with parameter + as. Therefore As = a1 + 1
m}
-
Asme -AS
m!
E
-3
for sufficiently large s. Theorem 2.4.1 follows from Theorem 2.4.3 and Lemma 2.4.3, whose conditions are satisfied when Aa2 < 1.
2.5. Notes and references

The investigation of the evolution of random graphs began when P. Erdős and A. Rényi published the results of their study [37] in 1960. Along with the basic properties of the random graph $G_{n,T}$, they discovered the effect known as a phase transition. At about the same time, V. E. Stepanov studied the graph $G_{n,p}$, as documented later [133, 134, 135]. Until recently, Stepanov's results had not seemed to receive wide recognition. In particular, Stepanov proved that if $p = c/n$, where $c$ is a constant, $c > 1$, then the size of the giant component is asymptotically normal with mean $n\alpha(c)$ and variance $n\beta(c)$, where
$$\alpha(c) = 1 - \frac{y}{c}, \qquad \beta(c) = \frac{y(1 - y/c)}{c(1-y)^2},$$
and $y < 1$ is the root of the equation $ye^{-y} = ce^{-c}$. A similar assertion for the graph $G_{n,T}$ was proved by B. Pittel [123] about twenty years later. He found that the size of the giant component of $G_{n,T}$ is asymptotically normal with parameters $n\alpha(c)$ and $n\beta(c)(1 - 2y + 2y^2/c)$ as $n, T \to \infty$ and $2T/n \to c > 1$.

Many open questions concerning the evolution of random graphs remain. The main goal of this chapter is to demonstrate the approach based on the generalized scheme of allocation in investigations of the evolution of random graphs. Section 2.1 shows that fine properties of subcritical graphs can be obtained in a rather simple and natural way, especially as concerns the behavior of subcritical graphs near the critical point. The transition phenomena for the graph $G_{n,T}$ were first considered by B. Bollobás [20]. The results presented in Section 2.1 can be found in [77]. The approach based on the generalized scheme of allocation allowed us to prove asymptotic normality of the number of unicyclic components and find the limit distribution of the maximum sizes of trees and unicyclic components.

Section 2.2 is devoted to critical graphs. The behavior of random graphs near the critical point, and especially in the critical domain where the giant component appears, is very complicated and difficult to investigate. The investigations of the behavior are far from complete, but even now the results obtained could fill another book. Much information about random graphs can be found in the fundamental work by Bollobás [21] and in the book [105], which is devoted to the evolution of random graphs. A detailed investigation of the birth of the giant component is given in [63]. Supercritical graphs are considered by Łuczak [99], who, in particular, proved that the right-hand bound of the critical domain is determined by the conditions $n, T \to \infty$, $(1 - 2T/n)^3 n \to -\infty$.

Formally, to analyze supercritical random graphs, we can use the representation of almost all such graphs as a combination of components of three types: one giant
component, trees, and unicyclic components. However, this approach is hampered by the absence of a simple formula for the number of connected graphs with n
vertices and T edges with k = T - n > 0. Note that k = T - n is equal to the number of independent cycles in the graph and is called the cyclomatic number of the graph. Denote by c(n, k) the number of connected graphs with n labeled vertices and a cyclomatic number k. It is clear that c(n, -1) is the number of trees, and by the Cayley formula, c(n, -1) = nn-2, whereas c(n, 0) is the number u of unicyclic graphs considered in Section 1.7. The numbers c(n, k) were investigated by Stepanov (see [10, 142, 143]) and E. M. Wright [151, 152] and are known as the Stepanov-Wright numbers (see [143]). As n - oo and k3/n --> 0,
$$c(n,k) = d\,(3\pi)^{1/2}\left(\frac{e}{12k}\right)^{k/2} n^{n+(3k-1)/2}(1+o(1)),$$
where, as it was proved by Meertens, $d = 1/(2\pi)$ (see Bender, Canfield, and McKay [16]). We hope that the results of the study by Bender et al. [17], who give the asymptotics of $c(n, k)$ for all regular variations of the parameters $n$ and $k$, can be used in the application of the generalized scheme to random graphs and help to bring the investigations of supercritical graphs to the level attained for the subcritical case in Section 2.1. Note that obtaining the limit distributions of numerical characteristics of supercritical graphs would be merely a problem of averaging if the joint distribution of the size of the giant component and the number of its edges were known.
The parameter $\theta = 2T/n$ plays the role of time in the evolution of random graphs. Therefore, each numerical characteristic of a random graph can be considered not only as a random variable, but also as a random process with the time parameter $\theta$. Of significant interest is the approach using the convergence of such processes. This approach is used in the recent papers [34, 62, 127]. Note that the investigations of convergence of such random processes in combinatorial problems were started by B. A. Sevastyanov [132] and Yu. V. Bolotnikov [22, 23, 24].

The random graph $G_{n,T}$ discussed in Section 2.3 was investigated by Kolchin [79, 83]. This graph provides an appropriate model of the graph corresponding to the left-hand side of a system of random congruences modulo 2 considered in the next chapter. An analogy of Theorem 2.3.8 for bipartite graphs was proved by Saltykov [131].

The nonequiprobable version of the graph $G_{n,T}$ is considered in Section 2.4, where the results of the papers [88, 66, 65] are presented. Here we use the method of moments. The lack of regular methods for an asymptotic analysis of nonequiprobable graphs makes it impossible to carry out anything approaching a complete investigation of such graphs. It seems to us that developing the methods appropriate for the analysis of nonequiprobable combinatorial structures is a problem of great importance.
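The limiting parameters in Stepanov's result quoted in these notes are easy to evaluate numerically. The following sketch is an illustration only. It reads the quoted formulas as $\alpha(c) = 1 - y/c$ and $\beta(c) = y(1 - y/c)/(c(1-y)^2)$, which is an assumption about the garbled display, and the function names are hypothetical. The root $y < 1$ of $ye^{-y} = ce^{-c}$ is found by bisection, using the fact that $ye^{-y}$ is increasing on $[0, 1]$.

```python
import math

def conjugate_root(c):
    """Root y in (0, 1) of y * exp(-y) = c * exp(-c), for c > 1, by bisection."""
    target = c * math.exp(-c)
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def giant_component_parameters(c):
    """Return (alpha, beta) under the assumed reading of the quoted formulas."""
    y = conjugate_root(c)
    alpha = 1.0 - y / c
    beta = y * (1.0 - y / c) / (c * (1.0 - y) ** 2)
    return alpha, beta

alpha, beta = giant_component_parameters(2.0)
print(alpha, beta)
```

With this reading, $\alpha$ satisfies $1 - \alpha = e^{-c\alpha}$, since $y/c = e^{y-c}$ is just the defining equation for $y$ rewritten.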
3
Systems of random linear equations in GF(2)
3.1. Rank of a matrix and critical sets

In this section, we consider systems of linear equations in GF(2), the field with elements 0 and 1. Let us begin with two examples where such systems appear.

Consider first a simple classification problem. Suppose we have a set of $n$ objects of two sorts, for example, of two different weights. We may sequentially sample pairs of the objects from the set at random, compare the weights of the objects from the chosen pair, and determine whether the weights are identical or different. The problem is to identify the objects that have the same weight, or rather, to estimate the probability of finding that solution.

For a formal description of the situation, let $\{1, 2, \ldots, n\}$ be the set of objects under consideration and let $x_j$ be the unknown type of the object $j$, $j = 1, \ldots, n$. We may assume that $x_1, \ldots, x_n$ take the values 0 and 1, depending on the class to which the object belongs. We choose a pair of objects $i(t)$ and $j(t)$ in the trial with number $t$, $t = 1, \ldots, T$, and let $b_t$ be the result of their comparison: $b_t = 0$ if their weights are identical, and $b_t = 1$ otherwise. Thus, the results of the comparisons can be written as the following system of linear equations in GF(2):
$$x_{i(t)} + x_{j(t)} = b_t, \qquad t = 1, \ldots, T. \tag{3.1.1}$$
It is clear that the system can be rewritten in the matrix form
$$AX = B,$$
where $X = (x_1, \ldots, x_n)$ and $B = (b_1, \ldots, b_T)$ are column-vectors, and the elements $a_{tj}$ of the matrix $A = \|a_{tj}\|$, $t = 1, \ldots, T$, $j = 1, \ldots, n$, are random variables whose distribution is determined by the sampling procedure. It is convenient to associate the system, or more precisely, the matrix $A$, with the random graph $G_{n,T}$ with $n$ vertices that correspond to the variables $x_1, \ldots, x_n$. The graph has $T$ edges $(i(t), j(t))$, $t = 1, \ldots, T$. Therefore the graph can have loops and multiple edges, depending on the sampling procedure.
In this chapter, we consider the characteristics of the graph Gn,T that are related to some of the properties of the system (3.1.1). It is clear that the connectedness of the graph is an important characteristic for the classification problem. Indeed, in the case where the graph is connected, we can determine all values of the variables xl, ... , xn if we set one of them equal to 0 or 1. In both cases, the partitions of the set are the same, but the system has two different solutions. In the case where the graph Gn,T is disconnected, the system has more than two solutions; therefore a complete classification is impossible.
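The role of connectedness can be illustrated with a short sketch. It is not part of the original text: the function name `classify` and the union-find representation are assumptions made for illustration. The code propagates the relations $x_i + x_j = b_t$ along the edges, reports whether the comparisons are consistent, and counts the connected components of the graph. When the data are consistent, the labels of each component can be flipped independently, so the system has $2^c$ solutions for $c$ components, and exactly two solutions when the graph is connected.

```python
def classify(n, comparisons):
    """Process comparisons (i, j, b) meaning x_i + x_j = b in GF(2), 1 <= i, j <= n.

    Returns (consistent, components, labels), where labels[v] is the value of x_v
    obtained by setting the representative of each component to 0 (labels[0] is unused).
    """
    parent = list(range(n + 1))
    parity = [0] * (n + 1)  # parity[v] = x_v + x_parent(v) in GF(2)

    def find(v):
        # Walk to the root, then compress the path and re-express parities
        # relative to the root.
        path = []
        while parent[v] != v:
            path.append(v)
            v = parent[v]
        root, acc = v, 0
        for node in reversed(path):
            acc ^= parity[node]
            parent[node] = root
            parity[node] = acc
        return root

    consistent = True
    for i, j, b in comparisons:
        ri, rj = find(i), find(j)
        if ri == rj:
            # The edge closes a cycle: the relation must already hold.
            if (parity[i] ^ parity[j]) != b:
                consistent = False
        else:
            # Merge the two components so that x_i + x_j = b holds.
            parent[ri] = rj
            parity[ri] = parity[i] ^ parity[j] ^ b

    roots = {find(v) for v in range(1, n + 1)}
    labels = [0] + [parity[v] for v in range(1, n + 1)]
    return consistent, len(roots), labels

ok, components, labels = classify(4, [(1, 2, 1), (2, 3, 0), (3, 4, 1)])
print(ok, components, labels[1:])
```

In this example the graph is a path on four vertices, so it is connected and the partition into two classes is determined; the second solution is obtained by replacing every label by its complement.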
Now let the vector $B$ consist of independent random variables that take the values 0 and 1. If the balance is out of order, the weighings can sometimes be wrong, and the variables $b_1, \ldots, b_T$ can differ from the true values. In this case, we obtain a system with distorted entries on the right-hand side that sometimes has no solution. If the balance is completely wrong, we may assume that the variables $b_1, \ldots, b_T$ do not depend on the left-hand side of the system and take the values 0 and 1 with equal probabilities. In this situation, several natural problems arise. Does the right-hand side $b_1, \ldots, b_T$ depend on the left-hand side of the system or are the sides independent? Can we reconstruct the real values of $x_1, \ldots, x_n$ in the case where the right-hand parts $b_1, \ldots, b_T$ are distorted?

Let us turn to the second example. Let a vector $(c_1, \ldots, c_n)$ in GF(2) be given. If we take an initial vector $x_1, \ldots, x_n$, then we can develop the recurring sequence $x_{n+t}$, $t = 1, 2, \ldots$, by the following recurrence relation:
$$x_{n+t} = c_1 x_t + \cdots + c_n x_{n+t-1}, \qquad t = 1, 2, \ldots. \tag{3.1.2}$$
This recurrence relation can be realized with the help of a device called a shift register, presented in Figure 3.1.1. A shift register consists of $n$ cells or stages with labels $1, 2, \ldots, n$. The $n$-dimensional (0, 1) vector of the contents of these stages is called the state of the shift register. At an initial moment, the state of the shift register under consideration is the vector $(x_1, \ldots, x_n)$. The choice of the vector $(c_1, \ldots, c_n)$ means that we choose the stages with numbers corresponding to the ones in the sequence $c_1, \ldots, c_n$ and form the mod 2 sum $x_{n+1} = c_1 x_1 + \cdots + c_n x_n$.

At the next moment, the contents of all stages are shifted to the left so that $x_n$ transfers to the stage numbered $n - 1$, $x_{n-1}$ transfers to the stage $n - 2$, and so on, $x_1$ leaves the register, and the sum $x_{n+1} = c_1 x_1 + \cdots + c_n x_n$ is placed into the stage with label $n$. Thus the state $(x_1, \ldots, x_n)$ transfers to the state $(x_2, \ldots, x_{n+1})$.
Figure 3.1.1. Shift register
The process is repeated. Thus, if $c_1, \ldots, c_n$ are given, then for any initial state $x_1, \ldots, x_n$, the recurring sequence (3.1.2) satisfies
$$x_{n+1} = c_1 x_1 + \cdots + c_n x_n,$$
$$x_{n+2} = c_1 x_2 + \cdots + c_n x_{n+1},$$
$$\cdots$$
$$x_{n+T} = c_1 x_T + \cdots + c_n x_{n+T-1}.$$
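The shift register recurrence can be simulated directly. The sketch below is an illustration, with the hypothetical function name `shift_register`; it forms the mod 2 sum over the stages selected by $(c_1, \ldots, c_n)$, shifts the state to the left, and places the new bit into the last stage, as described above.

```python
def shift_register(c, state, steps):
    """Return [x_{n+1}, ..., x_{n+steps}] for feedback vector c and initial state.

    c and state are sequences of 0/1 values of the same length n.
    """
    state = list(state)
    output = []
    for _ in range(steps):
        new_bit = 0
        for cj, xj in zip(c, state):
            new_bit ^= cj & xj           # mod 2 sum of the selected stages
        output.append(new_bit)
        state = state[1:] + [new_bit]    # shift left, new bit enters stage n
    return output

# n = 3, feedback vector (1, 0, 1), initial state (1, 0, 0)
print(shift_register([1, 0, 1], [1, 0, 0], 7))
```

For this particular feedback vector the register returns to its initial state after seven steps, so the output sequence is periodic with period 7.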
Let us change the notations and put $b_t = x_{n+t}$, $t = 1, 2, \ldots, T$, and $a_{11} = c_1, \ldots, a_{1n} = c_n$. Then the first relation becomes
$$a_{11}x_1 + \cdots + a_{1n}x_n = b_1.$$
It is clear that we can substitute $c_1 x_1 + \cdots + c_n x_n$ for $x_{n+1}$ in the second relation and obtain
$$a_{21}x_1 + \cdots + a_{2n}x_n = b_2.$$
In the same way, we obtain
$$a_{11}x_1 + \cdots + a_{1n}x_n = b_1,$$
$$\cdots \tag{3.1.3}$$
$$a_{T1}x_1 + \cdots + a_{Tn}x_n = b_T.$$
Suppose that the initial state $(x_1, \ldots, x_n)$ is unknown and we observe the sequence $b_1, \ldots, b_T$. Then we can regard relations (3.1.3) as a system of linear equations with respect to the unknowns $x_1, \ldots, x_n$. A natural question is how many observations are needed to reconstruct the initial state and to obtain all elements of the sequence $b_t$, $t = T + 1, \ldots$.

The other situation concerns the feedback points $c_1, \ldots, c_n$. Suppose we observe the sequence $b_1, \ldots, b_T$, but the vector $(c_1, \ldots, c_n)$ determining the shift register is unknown. If the number of 1's in $(c_1, \ldots, c_n)$ is $k$, then there are $\binom{n}{k}$ possibilities for this vector. If we use an exhaustive search to find the true vector that corresponds to the observed sequence, we have the following situation. If the chosen vector is true, then system (3.1.3) is consistent for any $T$, but if the vector $(c_1, \ldots, c_n)$ is wrong, then the system becomes inconsistent for some $T$. Therefore the consistency of the system (3.1.3) serves as a test for selecting the true vector.
Let us introduce the auxiliary notions of a critical set and a hypercycle for our investigations of systems of linear equations in GF(2). Note that the ordinary notions of linear algebra, such as the notion of linear independence of vectors, rank of a matrix, Cramer's rule for finding the solutions of linear systems of equations,
and so on, are extended in the obvious way to the $n$-dimensional vector space over GF(2). For example, if the rank of a $T \times n$ matrix $A = \|a_{tj}\|$ in GF(2) is $r$, then the homogeneous system of equations
$$AX = 0,$$
where $X = (x_1, \ldots, x_n)$ is the column-vector of unknowns, has exactly $n - r$ linearly independent solutions. Denote by
$$a_t = (a_{t1}, \ldots, a_{tn}), \qquad t = 1, \ldots, T,$$
the rows of the matrix $A$. If the coordinate-wise sum
$$a_{t_1} + \cdots + a_{t_m} = 0,$$
then the set $C = \{t_1, \ldots, t_m\}$ of row indices is called a critical set. If $C_1$ and $C_2$ are critical sets and $C_1 \ne C_2$, then
$$C_1 \,\Delta\, C_2 = (C_1 \cup C_2) \setminus (C_1 \cap C_2)$$
is also a critical set. Let $\varepsilon_1, \ldots, \varepsilon_s$ take the values 0 and 1. Critical sets $C_1, \ldots, C_s$ are called independent if
$$\varepsilon_1 C_1 \,\Delta\, \varepsilon_2 C_2 \,\Delta \cdots \Delta\, \varepsilon_s C_s = \emptyset$$
if and only if $\varepsilon_1 = \cdots = \varepsilon_s = 0$. Denote by $s(A)$ the maximum number of independent critical sets and by $r(A)$ the rank of the matrix $A$.
Theorem 3.1.1. For any $T \times n$ matrix $A$ in GF(2),
$$s(A) + r(A) = T.$$

Proof. We consider the homogeneous system of equations
$$A'Y = 0 \tag{3.1.4}$$
in GF(2), where $A'$ is the transpose of $A$. There is a one-to-one correspondence between the solutions of the system (3.1.4) and the critical sets: The solution $Y_{t_1,\ldots,t_m} = (y_1, \ldots, y_T)$, whose components $y_{t_1}, \ldots, y_{t_m}$ are 1 and the other components are zero, corresponds to the critical set $C = \{t_1, \ldots, t_m\}$. The linear independence of solutions corresponds to the independence of critical sets. Therefore the maximum number $s(A)$ of independent critical sets equals the maximum number of linearly independent solutions of system (3.1.4), which we know is $T - r(A)$.
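Theorem 3.1.1 can be checked by brute force on small matrices. The sketch below is an illustration with hypothetical function names; rows are encoded as integer bitmasks, which is an assumption of this example and not a convention of the text. Since the critical sets together with the empty set form a vector space of dimension $s(A)$ under symmetric difference, the number of nonempty critical sets is $2^{s(A)} - 1$, and $s(A)$ can be read off from that count.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)   # clears the leading bit of b in r if it is set
        if r:
            basis.append(r)
    return len(basis)

def count_critical_sets(rows):
    """Number of nonempty sets of row indices whose rows sum to zero in GF(2)."""
    T = len(rows)
    count = 0
    for mask in range(1, 1 << T):
        acc = 0
        for t in range(T):
            if (mask >> t) & 1:
                acc ^= rows[t]
        if acc == 0:
            count += 1
    return count

# T = 4 rows, n = 3 columns: (1,1,0), (0,1,1), (1,0,1), (1,1,0)
rows = [0b110, 0b011, 0b101, 0b110]
r = gf2_rank(rows)
critical = count_critical_sets(rows)
s = (critical + 1).bit_length() - 1   # critical + 1 == 2**s
print(r, critical, s, r + s == len(rows))
```

The enumeration is exponential in $T$, so this is only meant as a check of the identity $s(A) + r(A) = T$ on toy examples.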
In addition to the critical sets of a $T \times n$ matrix $A = \|a_{tj}\|$, we consider a hypergraph $G_A$ that is also defined by the matrix $A$. The set of vertices of the hypergraph $G_A$ is the set $\{1, \ldots, n\}$ of column indices and the set of enumerated hyperedges is the set $\{e_1, \ldots, e_T\}$, where
$$e_t = \{j : a_{tj} = 1\}, \qquad t = 1, \ldots, T.$$
Thus there exists a correspondence between a row $a_t = (a_{t1}, \ldots, a_{tn})$ and the hyperedge $e_t$, $t = 1, \ldots, T$. Note that the empty set corresponds to a row consisting of zeros. The multiplicity of a vertex $j$ in a set of hyperedges $C = \{e_{t_1}, \ldots, e_{t_m}\}$ is the number of hyperedges in $C$ that contain this vertex. A set of hyperedges $C = \{e_{t_1}, \ldots, e_{t_m}\}$ is called a hypercycle if each vertex of the hypergraph $G_A$ has an even multiplicity in $C$, in other words, if the coordinate-wise sum of rows $a_{t_1} + \cdots + a_{t_m}$ in GF(2) equals the zero vector. If each row of the matrix $A$ contains exactly two 1's, then the hypergraph $G_A$ is an ordinary graph, perhaps with multiple edges, and a hypercycle is an ordinary cycle or a union of cycles. The set of the indices of hyperedges that form a hypercycle is a critical set for the matrix $A$. Let $\varepsilon_1, \ldots, \varepsilon_s$ take the values 0 and 1. Hypercycles $C_1, \ldots, C_s$ are independent if
$$\varepsilon_1 C_1 \,\Delta\, \varepsilon_2 C_2 \,\Delta \cdots \Delta\, \varepsilon_s C_s = \emptyset$$
if and only if $\varepsilon_1 = \cdots = \varepsilon_s = 0$. Therefore the maximum number $s(A)$ of independent critical sets of the matrix $A$ equals the maximum number of independent hypercycles in $G_A$.
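The equivalence between hypercycles and zero row sums can be seen on a small example. The following sketch is illustrative; the function names `hyperedges` and `is_hypercycle` are hypothetical, and the matrix is given as a list of 0/1 rows. A family of hyperedges is a hypercycle exactly when every vertex has even multiplicity in it.

```python
from collections import Counter

def hyperedges(matrix):
    """Hyperedge e_t = {j : a_tj = 1}, with columns numbered from 1."""
    return [{j + 1 for j, a in enumerate(row) if a == 1} for row in matrix]

def is_hypercycle(edges):
    """True if every vertex has even multiplicity in the given list of hyperedges."""
    multiplicity = Counter(v for e in edges for v in e)
    return all(m % 2 == 0 for m in multiplicity.values())

A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
e = hyperedges(A)
print(is_hypercycle([e[0], e[1], e[2]]))  # rows 1, 2, 3 form a triangle
print(is_hypercycle([e[0], e[3]]))        # rows 1 and 4 form a double edge
print(is_hypercycle([e[0], e[1]]))        # a path, not a hypercycle
```

Here every row has exactly two 1's, so the hypergraph is an ordinary graph with a multiple edge, and the hypercycles found are ordinary cycles.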
3.2. Matrices with independent elements

This section deals with random matrices with independent elements. Let $A = \|a_{tj}\|$ be a $T \times n$ matrix whose elements are independent random variables taking the values 0 and 1 with equal probabilities, and let $\rho_n(T)$ be the rank of the matrix $A$ in GF(2). The following theorem is the main result of this section.

Theorem 3.2.1. Let $s \ge 0$ and $m$ be fixed integers, $m + s \ge 0$. If $n \to \infty$ and $T = n + m$, then
$$\mathbf{P}\{\rho_n(T) = n - s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty}\left(1 - \frac{1}{2^i}\right) \prod_{i=1}^{m+s}\left(1 - \frac{1}{2^i}\right)^{-1},$$
where the last product equals 1 for $m + s = 0$.

Proof. The limit theorem will be proved by using an explicit formula for $\mathbf{P}\{\rho_n(T) = n - s\}$. Denote by $\rho_n(t)$ the rank of the submatrix of $A$ which consists
of the first $t$ rows of the matrix $A$. We interpret the parameter $t$ as time and consider the process of sequential growth of the number of rows. Let $\xi_t = 1$ if the rank $\rho_n(t-1)$ increases after joining the $t$th row, and $\xi_t = 0$ if the rank preserves the previous value. It is clear that
$$\rho_n(t) = \xi_1 + \cdots + \xi_t.$$
It is not difficult to describe the probabilistic properties of the random variables $\xi_1, \ldots, \xi_T$. The event $\{\xi_t = 1\}$ means that the $t$th row is linearly independent with respect to the set of the rows with numbers $1, \ldots, t - 1$, and the event $\{\xi_t = 0\}$ means that the row with number $t$ is a linear combination of the preceding rows. If among the preceding $t - 1$ rows there are exactly $k$ linearly independent $n$-dimensional vectors, then the linear span of these $k$ vectors contains $2^k$ vectors (all linear combinations of these $k$ vectors). The matrix $A$ is constructed in such a way that each row can be obtained by sampling with replacement from a box containing all $2^n$ distinct $n$-dimensional vectors. In other words, any row of the matrix $A$ is independent of all other rows and is equal to any $n$-dimensional vector with probability $2^{-n}$. Therefore
$$\mathbf{P}\{\xi_t = 0 \mid \rho_n(t-1) = k\} = \frac{2^k}{2^n}, \qquad \mathbf{P}\{\xi_t = 1 \mid \rho_n(t-1) = k\} = 1 - \frac{2^k}{2^n}. \tag{3.2.1}$$
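The transition rule above can be iterated numerically to obtain the exact distribution of the rank for moderate $n$. The sketch below is an illustration with hypothetical function names: `rank_distribution` runs the chain in which a new row falls into the span of $k$ independent rows with probability $2^k/2^n$, and `limit_probability` evaluates the limit from Theorem 3.2.1 in the form $2^{-s(m+s)} \prod_{i>s}(1 - 2^{-i}) \prod_{i=1}^{m+s}(1 - 2^{-i})^{-1}$, with the infinite product truncated (the truncation point is an arbitrary choice of this example).

```python
def rank_distribution(n, T):
    """dist[k] = P{rank of a uniformly random T x n matrix over GF(2) equals k}."""
    dist = [0.0] * (n + 1)
    dist[0] = 1.0
    for _ in range(T):
        new = [0.0] * (n + 1)
        for k, p in enumerate(dist):
            if p == 0.0:
                continue
            stay = 2.0 ** (k - n)        # new row lies in the span of k vectors
            new[k] += p * stay
            if k < n:
                new[k + 1] += p * (1.0 - stay)
        dist = new
    return dist

def limit_probability(s, m, terms=200):
    """Limit of P{rank = n - s} for T = n + m, infinite product truncated at `terms`."""
    value = 2.0 ** (-s * (m + s))
    for i in range(s + 1, terms):
        value *= 1.0 - 2.0 ** (-i)
    for i in range(1, m + s + 1):
        value /= 1.0 - 2.0 ** (-i)
    return value

n, m, s = 40, 1, 0
exact = rank_distribution(n, n + m)[n - s]
print(exact, limit_probability(s, m))
```

Already for $n = 40$ the exact value and the limit agree to many decimal places, which reflects the geometric rate of convergence of the products involved.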
Thus the process $\rho_n(t)$ is a Markov chain with stationary transition probabilities that are given by (3.2.1). To find $\mathbf{P}\{\rho_n(T) = n - s\}$, we can sum the probabilities of all trajectories of the Markov chain that lead from the origin to the point with coordinates $(n + m, n - s)$, that is, the trajectories such that $\rho_n(0) = 0$, $\rho_n(n + m) = n - s$. If we represent a trajectory as a "broken line" with intervals of growth and horizontal intervals, we see that any such broken line has exactly $n + m - (n - s) = m + s$ horizontal intervals corresponding to $m + s$ zeros among the values of $\xi_1, \ldots, \xi_{n+m}$. The graph of the trajectory with $\xi_{t_1} = 0, \ldots, \xi_{t_{m+s}} = 0$ is illustrated in Figure 3.2.1. By using (3.2.1) and Figure 3.2.1, we can easily write an explicit formula for the probability of a particular trajectory and for the total probability.

The derivation of this probability is quite simple if $m + s = 0$. Indeed, the only trajectory with $\rho_n(0) = 0$ and $\rho_n(n + m) = n + m$ has no horizontal intervals, and at each interval the broken line increases; therefore
$$\mathbf{P}\{\rho_n(n+m) = n - s\} = \left(1 - \frac{1}{2^n}\right)\left(1 - \frac{2}{2^n}\right)\cdots\left(1 - \frac{2^{n+m-1}}{2^n}\right) = \prod_{i=-m+1}^{n}\left(1 - \frac{1}{2^i}\right),$$
Figure 3.2.1. Graph of the trajectory with $\xi_{t_1} = \cdots = \xi_{t_{m+s}} = 0$
and in the case $m + s = 0$, as $n \to \infty$,
$$\mathbf{P}\{\rho_n(n+m) = n - s\} \to \prod_{i=s+1}^{\infty}\left(1 - \frac{1}{2^i}\right).$$
This coincides with the assertion of the theorem for $m + s = 0$ because the last product equals 1. In the general case, for $m + s > 0$,
P{pn(n+m)=n-s}
T
P{ 1 = 1,...,4tl-1 = 1,4t1
1 3.
(3.4.5)
Denote by Cn,T the event that the graph Gn,T contains no unmarked cycles. We represent the event {v(Gn,T) = k, Gn,T E An,T, Cn,T} as a union of the following disjoint events: In a specific order, T trials give T fixed edges that form a graph consisting of trees and k unicyclic components, including a marked cycle. It follows from this description that Pn,T (k) = P{v(Gn,T) = k, Gn,T E .'4n,T, Cn,T } n
E
n
m!
In
MI!...Mk!k!
X M1
Mk
r1= 1
T'
/2\T
mk PrkFn-m,T-m b(rk)
..
b(rl) ml Prl
n2
/J11
211+12 >
rk= 1
where $s_1 = s_1(r_1, \ldots, r_k)$ is the number of 1's among $r_1, \ldots, r_k$, and $s_2 = s_2(r_1, \ldots, r_k)$ is the number of 2's among $r_1, \ldots, r_k$. The factor $2^{-s_1}$ appears because the probability $2n^{-2}$ is replaced by $n^{-2}$ in $s_1$ cases. The factor $2^{-s_2}$ reflects the fact that permuting trials in which two identical edges occur results in the same graph. The lemma follows from the relations (3.4.5).

Theorem 3.4.6. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, then for any fixed $k = 0, 1, \ldots$,
$$P_{n,T}(k) = \frac{(D(a))^k}{2^k k!}\sqrt{1-\lambda}\,(1 + o(1)),$$
where
$$D(x) = \sum_{m=1}^{\infty} \frac{D_m x^m}{m!}, \qquad a = \lambda e^{-\lambda}.$$
3.4 Cycles and consistency of systems of random equations
Proof. The proof is similar to the proof of Theorem 1.8.2. We partition the sum from Lemma 3.4.5 into two parts. We put $M = T^{1/4}$. It is clear that for any $x$ in the domain of convergence of the series
$$D(x) = \sum_{m=1}^{\infty} \frac{D_m x^m}{m!}$$
we have
$$\sum_{m > M}\ \sum_{m_1 + \cdots + m_k = m} \frac{D_{m_1}x^{m_1} \cdots D_{m_k}x^{m_k}}{m_1! \cdots m_k!} \le k\,(D(x))^{k-1} \sum_{m > M/k} \frac{D_m x^m}{m!}. \qquad (3.4.6)$$
Along with the function $D(x)$, let us introduce the generating function of the number of connected mappings
$$d(x) = \sum_{m=1}^{\infty} \frac{d_m x^m}{m!}.$$
The inequality
$$D(x) \le d(x) \qquad (3.4.7)$$
holds because
$$D_m = \sum_{r=1}^{m} d_m^{(r)} p_r \le \sum_{r=1}^{m} d_m^{(r)} = d_m.$$
M/k
'
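Let
$$\theta(x) = \sum_{n=1}^{\infty} \frac{n^{n-1}x^n}{n!}, \qquad a(x) = \sum_{n=0}^{\infty} \frac{n^n x^n}{n!}.$$
By Example 1.3.2 and (1.4.8),
$$d(x) = \log a(x), \qquad a(x) = (1 - \theta(x))^{-1}.$$
We put $\alpha = 2T/n$ and $x = \alpha e^{-\alpha}$ for $\alpha < 1$. Then
$$\theta(x) = \alpha, \qquad d(x) = -\log(1 - \alpha).$$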
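The identities $\theta(x) = \alpha$ and $d(x) = \log a(x) = -\log(1-\alpha)$ at $x = \alpha e^{-\alpha}$ can be checked numerically. The following is only an illustrative sketch: the function names `theta` and `a_series`, the truncation length, and the test value `alpha = 0.5` are assumptions made for this example and do not come from the text.

```python
import math

def theta(x, terms=2000):
    """Truncated series theta(x) = sum_{n>=1} n^(n-1) x^n / n!."""
    total = 0.0
    for n in range(1, terms + 1):
        # Use logarithms so that n^(n-1) and n! never overflow.
        log_term = (n - 1) * math.log(n) + n * math.log(x) - math.lgamma(n + 1)
        total += math.exp(log_term)
    return total

def a_series(x, terms=2000):
    """Truncated series a(x) = sum_{n>=0} n^n x^n / n! (the n = 0 term is 1)."""
    total = 1.0
    for n in range(1, terms + 1):
        log_term = n * math.log(n) + n * math.log(x) - math.lgamma(n + 1)
        total += math.exp(log_term)
    return total

alpha = 0.5                      # hypothetical value of 2T/n, must be < 1
x = alpha * math.exp(-alpha)     # the point x = alpha * e^(-alpha)

print("theta(x)      =", theta(x))             # should be close to alpha
print("a(x)          =", a_series(x))          # should be close to 1 / (1 - alpha)
print("log a(x)      =", math.log(a_series(x)))
print("-log(1-alpha) =", -math.log(1.0 - alpha))
```

Both series converge geometrically here because $ex = \alpha e^{1-\alpha} < 1$ for $\alpha < 1$, which is the same fact used in the estimate of the tail sums.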
Under the hypothesis of the theorem, $\alpha = 2T/n \to \lambda$, $0 < \lambda < 1$, and for $x = \alpha e^{-\alpha}$, there exists $q < 1$ such that $ex = \alpha e^{1-\alpha} \le q < 1$ for sufficiently large $n$. Therefore
$$\sum_{m > M/k} \frac{d_m x^m}{m!} \le \frac{q^{M/k}}{1 - q}. \qquad (3.4.9)$$
Using estimates (1.8.8), (1.8.9), and (3.4.6)-(3.4.9) yields
k (m) T!
S2 =
< cT!
2
T
>
1:
M> -M mt+...+mk=m
< C1 1:
n(ae-a)
1:
cln
dmxm
1
m>M/k
m)2(T-m)dm, ... dmk nn!(n (n - m)! 2T-m(T - m)! ml! ...mk!
ml!' mk!
m>M ml+...+mk=m
cln(D(x)) k
ml!...mk!
mF"-m,N
M ml+...+mk=m
(?)T 1: n2
(n)
- 1 -q
m!
T114/k
q
c2Tq
T1141k
where $c_1$, $c_2$ are constants. Thus, under the hypothesis of the theorem, $S_2 \to 0$. If $n, T \to \infty$, $2T/n \to \lambda$, $0 < \lambda < 1$, then by virtue of (1.8.7),
2T-m(T - m)!
0+00))
n2Txm 1- (1+0(1)) 2Tnm uniformly in m < M = T 1/4. Therefore, f o r any fixed k = 1 , 2, ... , T! Sl =
() 2
i
1
2 -k-k!-
X
TZ
1:
i
m <M ml++mkm
M
(n\
Dml ... Dmk
m' mF"-m'N MI!...mk!
Dmixml ... Dmkxmk (I
E m=k
mm! ' mk!
+o (1))
By using the estimate of S2, we obtain Sl
_2k1.l k!
2kk!
00
E m=k ml+...+mk=m (D(x))k(1 + 0
Dm x m1 ... Dmk x mk (1+o(1))+o(1) ml! . . . Mk! 1
(1)).
Combining the estimates of $S_1$ and $S_2$, we obtain, under the hypothesis of the theorem,
$$P_{n,T}(k) = P\{\nu(G_{n,T}) = k,\ G_{n,T} \in \mathcal{A}_{n,T},\ C_{n,T}\} = \frac{(D(x))^k}{2^k k!}\sqrt{1-\alpha}\,(1 + o(1)). \qquad (3.4.10)$$
Hence the assertion of Theorem 3.4.6 for $k \ge 1$ follows, since $x = \alpha e^{-\alpha} = (2T/n)e^{-2T/n} \to \lambda e^{-\lambda} = a$ and $D(x) \to D(a)$. We use (1.8.6) and the representation from Lemma 3.4.5 and conclude that
$$P_{n,T}(0) = \sqrt{1-\lambda}\,(1 + o(1)).$$
Corollary 3.4.1. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, then the probability $P_{n,T}$ of the event that the graph $G_{n,T}$ contains no unmarked cycles satisfies the relation
$$P_{n,T} = e^{D(a)/2}\sqrt{1-\lambda}\,(1 + o(1)).$$
Proof. We denote by $p^{(1)}_{n,T}(k)$ the probability of only marked cycles in the case where the graph has $k$ unicyclic components and all the probabilities $p_r$ are equal to 1, $r = 1, 2, \ldots$. In this case, $D(a) = d(a) = 2\Lambda = -\log(1 - \lambda)$, and Theorem 3.4.6 gives
$$p^{(1)}_{n,T}(k) \to \frac{\Lambda^k e^{-\Lambda}}{k!}, \qquad k = 0, 1, \ldots.$$
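To prove the corollary, it suffices to show that in the sum
$$P_{n,T} = \sum_{k=0}^{n} P_{n,T}(k), \qquad (3.4.11)$$
one can pass to the limit under the sum. Let us show that for any $\varepsilon > 0$, there exists $K$ such that
$$\sum_{k=K+1}^{\infty} P_{n,T}(k) < \varepsilon. \qquad (3.4.12)$$
We choose $K$ such that
$$\sum_{k=K+1}^{\infty} \frac{\Lambda^k e^{-\Lambda}}{k!} < \frac{\varepsilon}{2},$$
and for fixed $K$, we choose $n_0$ so that for $n > n_0$,
$$\Bigl|\sum_{k=0}^{K} p^{(1)}_{n,T}(k) - \sum_{k=0}^{K} \frac{\Lambda^k e^{-\Lambda}}{k!}\Bigr| < \frac{\varepsilon}{2}.$$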
Then, for $n > n_0$,
$$\Bigl|\sum_{k=K+1}^{\infty} p^{(1)}_{n,T}(k) - \sum_{k=K+1}^{\infty} \frac{\Lambda^k e^{-\Lambda}}{k!}\Bigr| = \Bigl|\sum_{k=0}^{K} p^{(1)}_{n,T}(k) - \sum_{k=0}^{K} \frac{\Lambda^k e^{-\Lambda}}{k!}\Bigr| < \frac{\varepsilon}{2},$$
and therefore
$$\sum_{k=K+1}^{\infty} p^{(1)}_{n,T}(k) < \varepsilon.$$
Since $P_{n,T}(k) \le p^{(1)}_{n,T}(k)$, estimate (3.4.12) and the validity of passing to the limit under the sum are established.

Proof of Theorems 3.4.4 and 3.4.5. A cycle leads to the inconsistency of system (3.4.1) if the sum of the right-hand sides of the subsystem corresponding to the cycle is odd. Let $p_r$ be the probability that this sum is even for a cycle of length $r$. Then $P_{n,T}(k) = p_{n,T}(k)$ for any $k = 0, 1, \ldots$. Therefore Theorems 3.4.4 and 3.4.5 are direct corollaries to Theorem 3.4.6 and the fact proved above that one can pass to the limit under the sum in (3.4.11). To prove Theorem 3.4.4, we notice that in this case, according to Lemma 3.4.1, $p_r = (1 + \Delta^r)/2$, where $\Delta = 1 - 2p$; therefore
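$$D(x) = \sum_{m=1}^{\infty} \frac{D_m x^m}{m!} = \sum_{m=1}^{\infty}\sum_{r=1}^{m} \frac{d_m^{(r)}(1 + \Delta^r)x^m}{2\,m!} = \frac12\sum_{m=1}^{\infty} \frac{d_m x^m}{m!} + \frac12\sum_{m=1}^{\infty}\sum_{r=1}^{m} \frac{d_m^{(r)}\Delta^r x^m}{m!} = \frac12\bigl(d(x) + d(x, \Delta)\bigr),$$
where
$$d(x, \Delta) = \sum_{m=1}^{\infty}\sum_{r=1}^{m} \frac{d_m^{(r)}\Delta^r x^m}{m!}. \qquad (3.4.13)$$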
Forx=ae-Q,0oofor a>1. In Section 3.4, we considered systems with at most two unknowns in each
equation. It was shown that if $T/n \to \alpha$, $0 < \alpha < 1/2$, then the distribution of the maximal number of independent critical sets, or independent cycles in the corresponding graph, approaches the Poisson distribution with parameter $\Lambda = -\tfrac12\log(1 - 2\alpha)$. As follows from Theorem 2.1.6, if $\alpha > 1/2$, then $s(A)$ tends in probability to infinity.

The case of a matrix with independent and identically distributed random elements taking the values 0 and 1 with probabilities 1/2 and the case of a matrix with at most two ones in each row, studied in Section 3.4, can be considered as the extreme cases in terms of the behavior of the rank and the maximum number of independent critical sets. In these cases, the threshold effect appears at the points $T/n = 1$ and $T/n = 1/2$, respectively. In this section, we consider an intermediate case and obtain a weaker form of the threshold effect. We consider the system of random linear equations in GF(2):
$$x_{i_1(t)} + \cdots + x_{i_r(t)} = b_t, \qquad t = 1, \ldots, T, \qquad (3.5.1)$$
where $i_1(t), \ldots, i_r(t)$, $t = 1, \ldots, T$, are independent identically distributed random variables taking the values $1, \ldots, n$ with equal probabilities, and the independent random variables $b_1, \ldots, b_T$ do not depend on the left-hand side of the system and take the values 0 and 1 with equal probabilities. If $r = 2$, we obtain the system considered in Section 3.4. In Section 3.1, we introduced the notions of critical sets for a matrix and hypercycles for the hypergraph corresponding to a matrix. Denote by $A_{r,n,T}$ the matrix
3.5 Hypercycles and consistency of systems of random equations
of system (3.5.1) and by $G_{r,n,T}$ the hypergraph with $n$ vertices and $T$ hyperedges $e_1, \ldots, e_T$ that corresponds to this matrix. Thus we consider a random hypergraph $G_{r,n,T}$ whose matrix $A = A_{r,n,T} = \|a_{tj}\|$ has the following structure. The elements $a_{tj}$, $t = 1, \ldots, T$, $j = 1, \ldots, n$, are random variables, and the rows of the matrix are independent. There are $r$ ones allocated to each row: each 1, independently of the others, is placed in each of the $n$ positions with probability $1/n$, and $a_{tj}$ equals 1 if there is an odd number of 1's in position $j$ of row $t$. Therefore, there are no more than $r$ ones in each row. For such regular hypergraphs, the following threshold property holds: if $n, T \to \infty$ such that $T/n \to \alpha$, then an abrupt change in the behavior of the rank of the matrix $A_{r,n,T}$ occurs when the parameter $\alpha$ passes the critical value $\alpha_r$. This property can be expressed in terms of the total number of hypercycles in $G_{r,n,T}$. Let $s(A_{r,n,T})$ be the maximum number of independent critical sets of $A_{r,n,T}$ or independent hypercycles of the hypergraph $G_{r,n,T}$. Then
$$S(A_{r,n,T}) = 2^{s(A_{r,n,T})} - 1$$
is the total number of critical sets or hypercycles.

In this section, we prove that the following threshold property is true for $S(A_{r,n,T})$.
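Theorem 3.5.1. Let $r \ge 3$ be fixed, $T, n \to \infty$ such that $T/n \to \alpha$. Then there exists a constant $\alpha_r$ such that $ES(A_{r,n,T}) \to 0$ for $\alpha < \alpha_r$ and $ES(A_{r,n,T}) \to \infty$ for $\alpha > \alpha_r$. The constant $\alpha_r$ is the first component of the vector that is the unique solution of the system of equations
$$e^{-x}\cosh\lambda\,\Bigl(\frac{\alpha r}{\alpha r - x}\Bigr)^{\alpha} = 1, \qquad \frac{x}{\lambda}\Bigl(\frac{\alpha r - x}{x}\Bigr)^{1/r} = 1, \qquad \lambda\tanh\lambda = x, \qquad (3.5.2)$$
with respect to the variables $\alpha$, $x$, and $\lambda$. The numerical solution of the system of equations gives us the following values of the critical constants:
$$\alpha_3 = 0.8894\ldots, \quad \alpha_4 = 0.9671\ldots, \quad \alpha_5 = 0.9891\ldots, \quad \alpha_6 = 0.9969\ldots, \quad \alpha_7 = 0.9986\ldots, \quad \alpha_8 = 0.9995\ldots.$$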
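The critical constants can be recomputed with a one-dimensional root search. The sketch below assumes that system (3.5.2) has the form $e^{-x}\cosh\lambda\,(\alpha r/(\alpha r - x))^{\alpha} = 1$, $(x/\lambda)((\alpha r - x)/x)^{1/r} = 1$, $\lambda\tanh\lambda = x$. Under that assumption, the last two equations give $x = \lambda\tanh\lambda$ and $\alpha = x(1 + \coth^r\lambda)/r$, and the first equation reduces to $\log\cosh\lambda - x + \alpha\log(1 + \tanh^r\lambda) = 0$. The function names, the bracket $[0.1, 20]$, and the use of bisection are choices made for this illustration, not taken from the text.

```python
import math

def residual(lam, r):
    """Return (F, alpha) at a trial value of lambda.

    Assumed reduction of system (3.5.2): with x = lam * tanh(lam), the second
    equation gives alpha = x * (1 + coth(lam)**r) / r, and the first equation
    becomes F = log cosh(lam) - x + alpha * log(1 + tanh(lam)**r) = 0.
    """
    t = math.tanh(lam)
    x = lam * t
    alpha = x * (1.0 + t ** (-r)) / r
    f = math.log(math.cosh(lam)) - x + alpha * math.log1p(t ** r)
    return f, alpha

def critical_alpha(r, lo=0.1, hi=20.0, iterations=200):
    """Bisection on lambda; assumes F < 0 at lo and F > 0 at hi."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid, r)[0] < 0.0:
            lo = mid
        else:
            hi = mid
    return residual(0.5 * (lo + hi), r)[1]

for r in range(3, 9):
    print(r, round(critical_alpha(r), 4))
```

The printed values can be compared with the constants $\alpha_3, \ldots, \alpha_8$ listed above.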
Expanding the solution of the system into powers of e-' yields
e-''
e-2r
artil-log2 log2(2r2
r
1
+log2-r-2)
which gives values close to the exact ones for $r \ge 4$. Let us give some auxiliary results that will be needed for the proof of Theorem 3.5.1. The total number of hypercycles $S(A_{r,n,T})$ in the hypergraph $G_{r,n,T}$ with the matrix $A_{r,n,T}$ can be represented as a sum of indicators. Let $\xi_{t_1,\ldots,t_m} = 1$ if the hypercycle $C = \{e_{t_1}, \ldots, e_{t_m}\}$ occurs in $G_{r,n,T}$, and $\xi_{t_1,\ldots,t_m} = 0$ otherwise. It is clear that $P\{\xi_{t_1,\ldots,t_m} = 1\}$ does not depend on the indices $t_1, \ldots, t_m$. Indeed, from the definition of the random hypergraph $G_{r,n,T}$, the indicator $\xi_{t_1,\ldots,t_m} = 1$ if and only if there is an even number of 1's in each column of the submatrix consisting of the rows with indices $t_1, \ldots, t_m$. The numbers of 1's in the $n$ columns of any $m$ rows, before these numbers are reduced modulo 2, have the multinomial distribution with $rm$ trials and $n$ equiprobable outcomes. Denote by $\eta_1(s, n), \ldots, \eta_n(s, n)$ the contents of the cells in the equiprobable scheme of allocating $s$ particles into $n$ cells. In this notation, the numbers of 1's in the columns of any $m$ rows, before reduction modulo 2, have a distribution that coincides with the distribution of the variables $\eta_1(rm, n), \ldots, \eta_n(rm, n)$. Therefore
$$P\{\xi_{t_1,\ldots,t_m} = 1\} = P\{\eta_1(rm, n) \in E, \ldots, \eta_n(rm, n) \in E\},$$
where $E$ is the set of even numbers, and the average number of hypercycles in $G_{r,n,T}$ can be written in the following form:
$$ES(A_{r,n,T}) = \sum_{m=1}^{T} \binom{T}{m} P_E(rm, n), \qquad (3.5.3)$$
where
$$P_E(rm, n) = P\{\eta_1(rm, n) \in E, \ldots, \eta_n(rm, n) \in E\}.$$
Thus, to estimate $ES(A_{r,n,T})$, we need to know the asymptotic behavior of $P_E(rm, n)$. We consider a more general case and obtain the asymptotic behavior of the probabilities
$$P_R(s, n) = P\{\eta_1(s, n) \in R, \ldots, \eta_n(s, n) \in R\},$$
where $R$ is a subset of the set of all nonnegative integers.
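For small parameters, $P_E(s, n)$ and the sum (3.5.3) can be evaluated exactly. The sketch below relies on the standard identity $P_E(s, n) = 2^{-n}\sum_{j=0}^{n}\binom{n}{j}(1 - 2j/n)^s$, which follows from averaging $\prod_i (1 + (-1)^{\eta_i})/2$ over the allocation; this identity is not stated in the text and is used here only as an illustration. The function names and the brute-force check are likewise assumptions of this example.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_even(s, n):
    """Exact P{all n cells hold an even count} for s particles, n equiprobable cells."""
    total = sum(comb(n, j) * Fraction(n - 2 * j, n) ** s for j in range(n + 1))
    return total / 2 ** n

def p_even_brute(s, n):
    """Same probability by enumerating all n**s allocations (tiny cases only)."""
    hits = 0
    for cells in product(range(n), repeat=s):
        counts = [0] * n
        for c in cells:
            counts[c] += 1
        if all(c % 2 == 0 for c in counts):
            hits += 1
    return Fraction(hits, n ** s)

def expected_hypercycles(r, n, T):
    """Right-hand side of (3.5.3): sum over m of C(T, m) * P_E(r*m, n)."""
    return sum(comb(T, m) * p_even(r * m, n) for m in range(1, T + 1))

print(p_even(4, 3), p_even_brute(4, 3))
print(float(expected_hypercycles(3, 20, 10)))
```

Exact rational arithmetic keeps the alternating sum free of cancellation error, at the cost of speed for large $n$ and $T$.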
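The joint distribution of the random variables $\eta_1(s, n), \ldots, \eta_n(s, n)$ can be expressed as a conditional distribution of independent random variables $\xi_1, \ldots, \xi_n$, identically distributed by the Poisson law with an arbitrary parameter $\lambda$, in the following way (see, e.g., [90]). For any nonnegative integers $s_1, \ldots, s_n$ such that $s_1 + \cdots + s_n = s$,
$$P\{\eta_1(s, n) = s_1, \ldots, \eta_n(s, n) = s_n\} = P\{\xi_1 = s_1, \ldots, \xi_n = s_n \mid \xi_1 + \cdots + \xi_n = s\}.$$
Therefore
$$P_R(s, n) = P\{\eta_1(s, n) \in R, \ldots, \eta_n(s, n) \in R\} = \frac{P\{\xi_1 \in R, \ldots, \xi_n \in R,\ \xi_1 + \cdots + \xi_n = s\}}{P\{\xi_1 + \cdots + \xi_n = s\}} = (P\{\xi_1 \in R\})^n\, \frac{P\{\xi_1 + \cdots + \xi_n = s \mid \xi_1 \in R, \ldots, \xi_n \in R\}}{P\{\xi_1 + \cdots + \xi_n = s\}}.$$
We now introduce independent identically distributed random variables $\xi_1^{(R)}, \ldots, \xi_n^{(R)}$ with the distribution
$$P\{\xi_1^{(R)} = k\} = P\{\xi_1 = k \mid \xi_1 \in R\}, \qquad k = 0, 1, \ldots.$$
It is not difficult to see that
$$P\{\xi_1 + \cdots + \xi_n = s \mid \xi_1 \in R, \ldots, \xi_n \in R\} = P\{\xi_1^{(R)} + \cdots + \xi_n^{(R)} = s\},$$
and therefore
$$P_R(s, n) = (P\{\xi_1 \in R\})^n\, \frac{P\{\xi_1^{(R)} + \cdots + \xi_n^{(R)} = s\}}{P\{\xi_1 + \cdots + \xi_n = s\}}. \qquad (3.5.4)$$
Let $x = s/n$ and choose the parameter $\lambda$ of the Poisson distribution in such a way that
$$x = E\xi_1^{(R)} = \sum_{k \in R} \frac{k\lambda^k e^{-\lambda}}{k!\,P\{\xi_1 \in R\}}.$$
Let $d$ be the maximum span of the lattice on which the set $R$ is situated and denote the lattice by $\Gamma_R$.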
Theorem 3.5.2. If s, n --* oo such that n E FR, then in any interval of the form
0<xo<x<x1 0 can be chosen arbitrarily. We now estimate T
ES(Ar,n,T) _
Q PE(rm, n).
M=1
Lemma 3.5.1. If r > 3 is fixed, and T, n -+oo such that T/n -+a, then for any s > 0, there exists S > 0 such that
T
(T)PE(rmn)<sE.
1<m (X(o)). The coordinates of $X^{(0)}$ that coincide with the corresponding coordinates of $X^*$ are called true, whereas those that do not coincide are called wrong. For the algorithm of coordinate testing to lead to the true solution, the following obvious conditions must be fulfilled. For each coordinate of the vector $X^{(0)}$, the value of $\beta(X^{(0)})$ must increase if we replace the wrong value of the coordinate by the true value, and the value of $\beta(X^{(0)})$ must strictly decrease if we replace the true value by the wrong one. We separate all the equations of the system (3.6.1) that contain $x_i$, and denote the number of such equations by $n_i$. Replacing $x_i^{(0)}$ by the opposite value changes the contribution to $\beta(X^{(0)})$ of these equations only, and each equation containing $x_i$ contributes
1 or -1. If x(°) is wrong, then the increment of 0(X(0)) due to replacing xl°) by x(°) is equal to the random variable Oi(X(°)) such that (Oj(X(°)) +ni)/2 has the binomial distribution with parameters (n1, pi), where pi is the probability that the coincidence in a fixed equation containing xi appears after substituting x( 0) for 0), provided .40) is wrong. It is not difficult to see that xI
pi = vq + (1 - v)p,
(3.6.10)
where q = P {bi = bl }, p = 1- q, and v is the probability that the second variable in the equation has the true value. The second variable takes values from the set 1xi°),
... ,
x1°)
with equal probabilities. Therefore v = (k - 1)/(n - 1), where k is the number of true coordinates of X(0), which equals (X(0)) under the assumption that (X(0)) > (X(0)). It follows from Lemma 3.6.1 and equality (3.6.10) that
(1+0)k
hi = 2(n-1)
+ C1
k 11-0 -n-1 2
(2(X(0))-n+1)0 2+ 2(n-1) 1
which we write as ISn I0
1
Pi=2"1' 2,fn-
(3.6.11)
where
(2 (X(0))
n+
n1
By assumption, n is asymptotically normal with parameters (0, 1). Therefore
P{I4nl > (O2T/(n2logn)) 1/4} -
1
(3.6.12)
because 02T/(n2logn) -+ oo. Next, we find a lower bound f o r n1, i = 1, ... , n. To this end, we take into account only the first variable in each equation. Then we obtain the classical scheme of equiprobable allocation of T particles into n cells, and by applying the corresponding results on the distribution of the minimum of contents of cells [90], we find that
P{ min ni > T/(2n)} -* 1. 1), we have
(X(c)) < 0 1 xiO) is wrong}
= P{(4(X1c)} +ni)/2 < ni/2 1 xtci is wrong)
= P{Sn; < ni/2}, where Sn, has the binomial distribution with parameters (n1, p;). From (3.6.11), we find that
P{Sn < n1/2} = P{Sn; - ESn, < When we use estimate (3.6.8) of the exponential type for the binomial distribution and take into account (3.6.12) and (3.6.13), we obtain
P{S< n/2}
exp I-
02T
02T
(n2logn)"21
In a similar way, we obtain the bound
0 1 xeis true}
exp
A2T (2T)_h/2J 2logn
Therefore an upper estimate for the probability of at least one wrong decision while testing all the coordinates of the vector X(O) is n
(Xtci) < 0 1 x,o) is wrong) +
(X(O)} > 0 1 x(° is true))
i=1
I_ 0
< 2n exp { -
T
n2
Q2T \ -1/2
\ n log n
and tends to zero under the conditions of the theorem because $\Delta^2 T/(n^2\log n) \to \infty$.

With the help of a preliminary search of the $n$-dimensional vectors, it is possible to select an initial vector $X^{(0)}$ with a large number of coordinates coinciding with the corresponding coordinates of the true solution $X^*$. If the algorithm for coordinate testing begins with this initial vector, then a much smaller number of equations is needed to reconstruct the true solution. This number is comparable to the number of edges needed for the graph $\Gamma_{n,T}$ to be connected.

Theorem 3.6.3. If $n, T \to \infty$ and $\Delta \to 0$ such that
$$\frac{\Delta^2 T}{n\log n} \to \infty,$$
then there exists an algorithm that reconstructs the true solution of system (3.6.1) with probability tending to 1.
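One pass of coordinate testing can be sketched in code. This is a minimal illustration under assumptions that are not spelled out in the surviving text: system (3.6.1) is taken to consist of equations $x_i + x_j = b_t$ over GF(2) with possibly distorted right-hand sides, $\beta(X)$ counts the equations that $X$ satisfies, and every coordinate is tested against the same initial vector $X^{(0)}$. The names `beta`, `coordinate_testing`, `x_true`, and `x_start` are hypothetical.

```python
from itertools import combinations

def beta(X, equations, b):
    """Number of equations x_i + x_j = b_t (mod 2) that the vector X satisfies."""
    return sum(1 for (i, j), bt in zip(equations, b) if (X[i] ^ X[j]) == bt)

def coordinate_testing(X0, equations, b):
    """One pass: flip every coordinate whose change alone would increase beta(X0)."""
    base = beta(X0, equations, b)
    result = list(X0)
    for i in range(len(X0)):
        trial = list(X0)
        trial[i] ^= 1                      # replace x_i by the opposite value
        if beta(trial, equations, b) > base:
            result[i] = trial[i]           # the increment is positive: accept
    return result

# Toy example: noise-free right-hand sides on the complete graph with 5 vertices.
x_true = [1, 0, 1, 1, 0]
equations = list(combinations(range(5), 2))
b = [x_true[i] ^ x_true[j] for i, j in equations]
x_start = [1, 0, 0, 1, 0]                  # one wrong coordinate (index 2)

print(coordinate_testing(x_start, equations, b))
```

In the toy example, flipping the wrong coordinate gains four coincidences, while flipping any true coordinate gains one and loses three, so only the wrong coordinate is changed.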
Proof. The algorithm, which gives the true solution under the conditions of the theorem, begins with a preliminary search of an initial vector X(°) with a large number of coincidences with the true vector X*. The choice of X(°) is determined by a search of all n-dimensional vectors. To this end, we choose the level l =
Tq-uTNIT,where q = P{br = br) = (1+0)/2anduT = 0//18, and select the vectors X for which ,B (X) > 1. Recall that 0 (X) is the number of coincident coordinates of the vector B = (b1, . . . , bT) and the vector of the right-hand sides of system (3.6.1) that are obtained when X is substituted into the left-hand side of the system. The vector X* will be selected with probability tending to 1. Indeed,
P{O(X*) < Tq - UT/} = P{ST - EST < -uTvlT}, where ST is the number of successes in T independent trials with the probability of success equal to q = (1 + 0)/2. By using estimate (3.6.8), we find that
P{f(X*) < 11
2n /3. Beginning the algorithm for coordinate testing with this vector X(O), we find, using the notations introduced in the proof of Theorem 3.6.2, that
0 I xac) is wrong) = P{S"; - ES"; < -0I4"In;/(2,1'n-)j. Using estimate (3.6.8) and taking into account that with probability tending to 1, Ion I > / /3 for the selected vector and ni > T/(2n), we find the estimate
P{-i(X(c)) < 0 I xiO) is wrong) < P{Sni
- ES, < -Ani/6}
< e-°ZT/(36n) Similarly we obtain P{4-i (X(0)) > 0 I x(o) is true}
0 < ()P{rn12
< m}.
(3.6.16)
Let tti = 1 if the edges (i + 2, 1) and (i + 2, 2) occur in rn,T, and µi = 0 otherwise; and vi = 1 if exactly one of the edges (i + 2, 1), (i + 2, 2) occurs in rn,T, and vi = 0 if the edges (i + 2, 1) and (i + 2, 2) do not occur in rn,T. The random variable m 12 can be represented as the following sum of indicators:
M12=µ1++µn-2,
3.6 Reconstructing the true solution
and m
P{m12<m}_E
Pi,...ik,
(3.6.17)
k=0 {i,,...,ik}
where is the probability that µi, , ... , µik take the value 1 and all the other random variables take the value 0. It is not difficult to see that (Mt, Nt), where
Nt = vi +...+vt,
Mr = tti +-..+µt,
is a Markov chain because (1-tt+1, vt+1) depends only on the number of edges used
to construct the random variables µl, ... , µt, vi, ... , vt, t = 1, ... , n - 2. More precisely, let
p(t I Yt-1, Zt-1) = P{l-tt = 1 I Mt-1 = Yt-1, Nt-1 = Zt-1},
q(t I Yt-1, Zt-1) = P{/-tt = 0 I Mt-1 = Yt-1, Nt-i = Zt_1}. By using this notation, we can write the probability that µi, , ... , µlk take the value 1 and all the other random variables take the value 0 in the form Pi,...ik = P{µi, = 11 ... , N-ik = 1,
/-ti = 0,
il,...,1k I V1 =Z1,...,Vn-2=Zn-2}
i
= q(I I Yo, Z0) ... q(il - I Yi,-2, Zi,-2)p(il Yi,-1, Zi,-1) I
x q(il + 1
I
I
Yi,, Zi,) ...q(n - 2 1 Yn-3, Zn-3),
where Z0 = Yo = 0, Zt = zl +
+ zt, and Yt is the number of il, ... , ik that do not exceed t. We now estimate the probabilities p(t Y, Z) and q(t Y, Z). It is clear that p(t I Y, Z) + q (t Y, Z) = 1, and the probability p(t I Y, Z) does not depend on t and equals the probability p2(s, N) that two fixed places corresponding to the I
I
I
edge (1, t), (2, t) will be occupied after allocating s = T - 2Y - Z edges into N = (2) - Z places in the classical scheme of allocation of particles. Therefore
p2(s, N) =
S!
1! 1! (s - 2)! N2
(-
2
s-2
s!
+ k+l>2,k,1 1, a
lwithm=n - N -+ oo,anda -lwith m
fixed.
Let $\alpha \to \infty$. We must select the value of the parameter $x$ so that $NE\xi_1$ is close to $n$. Since for $\xi_1$ with distribution (4.1.2),
$$E\xi_1 = -\frac{x}{(1 - x)\log(1 - x)},$$
we choose $x$, $0 < x < 1$, such that
$$-\frac{x}{(1 - x)\log(1 - x)} = \alpha, \qquad (4.2.11)$$
where $\alpha = n/N$. This equation is approximately satisfied if we take
$$x = 1 - \frac{1}{\alpha\log\alpha}.$$
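The quality of this approximation can be inspected numerically. The sketch below assumes that equation (4.2.11) reads $-x/((1-x)\log(1-x)) = \alpha$, which is the mean of the logarithmic-series distribution; it solves the equation by bisection and compares the root with $1 - 1/(\alpha\log\alpha)$. The function names and the sample values of $\alpha$ are illustrative only.

```python
import math

def log_series_mean(x):
    """Mean -x / ((1 - x) log(1 - x)) of the logarithmic-series law with parameter x."""
    return -x / ((1.0 - x) * math.log1p(-x))

def solve_parameter(alpha, iterations=200):
    """Root of log_series_mean(x) = alpha on (0, 1); the mean increases with x."""
    lo, hi = 1e-12, 1.0 - 1e-15
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_series_mean(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_parameter(alpha):
    """The explicit choice x = 1 - 1 / (alpha * log(alpha)) used in the text."""
    return 1.0 - 1.0 / (alpha * math.log(alpha))

for alpha in (10.0, 100.0, 1e4, 1e6):
    exact = solve_parameter(alpha)
    approx = approx_parameter(alpha)
    print(alpha, 1.0 - exact, 1.0 - approx, log_series_mean(approx) / alpha)
```

The last column shows how close the mean at the approximate parameter is to $\alpha$; the agreement improves only slowly as $\alpha$ grows, since the correction is of order $\log\log\alpha/\log\alpha$.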
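If $N/\log n \to \infty$, then $x = 1 - 1/(\alpha\log\alpha)$ is farther from the singular point $x = 1$ than $x = 1 - 1/n$, and therefore the normal approximation is valid for the sum $\zeta_N$.

Theorem 4.2.5. If $n, N \to \infty$ such that $N/\log n \to \infty$, $\alpha = n/N \to \infty$, the parameter $x = 1 - 1/(\alpha\log\alpha)$, and $\sigma_\alpha = \alpha\sqrt{\log\alpha}$, then
$$P\{\zeta_N = k\} = \frac{1}{\sigma_\alpha\sqrt{2\pi N}}\,e^{-z^2/2}(1 + o(1))$$
uniformly in the integers $k$ such that $z = (k - n)/(\sigma_\alpha\sqrt{N})$ lies in any fixed finite interval.

Proof. The characteristic function of the random variable $\xi_1$ is
$$\varphi_n(t) = \frac{\log(1 - xe^{it})}{\log(1 - x)}.$$
It is easy to see that for any fixed $t$, as $N/\log n \to \infty$ and $\alpha = n/N \to \infty$,
$$e^{-it\alpha/(\sigma_\alpha\sqrt{N})}\,\varphi_n\Bigl(\frac{t}{\sigma_\alpha\sqrt{N}}\Bigr) = 1 - \frac{t^2}{2N} + o\Bigl(\frac{1}{N}\Bigr).$$
Denote by $\psi_n(t)$ the characteristic function of $\zeta_N^* = (\zeta_N - n)/(\sigma_\alpha\sqrt{N})$; then under the conditions of the theorem, for any fixed $t$,
$$\psi_n(t) = \Bigl(e^{-it\alpha/(\sigma_\alpha\sqrt{N})}\,\varphi_n\Bigl(\frac{t}{\sigma_\alpha\sqrt{N}}\Bigr)\Bigr)^N \to e^{-t^2/2},$$
and the distribution of $\zeta_N^*$ converges weakly to the normal distribution with parameters (0, 1).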
Random permutations
The local convergence can be proved by standard reasoning, and we omit this technical part of the proof of Theorem 4.2.5. From Theorem 4.2.5 and representation (4.2.2), we obtain the following assertion.

Theorem 4.2.6. If $n, N \to \infty$ such that $N/\log n \to \infty$, $\alpha = n/N \to \infty$, then
$$P\{\nu_n = N\} = \frac{(-\log(1 - x))^N}{N!\,x^n\,\sigma_\alpha\sqrt{2\pi N}}\,(1 + o(1)),$$
where $x = 1 - 1/(\alpha\log\alpha)$ and $\sigma_\alpha = \alpha\sqrt{\log\alpha}$.

The following theorem for the case where $\alpha$ tends to a constant greater than 1 can be proved in the same way as Theorem 4.2.5.
Theorem 4.2.7. If $n, N \to \infty$ and there exist constants $\alpha_0$ and $\alpha_1$ such that $1 < \alpha_0 \le \alpha \le \alpha_1$, the parameter $x = x_\alpha$, where $x_\alpha$ is the unique solution of equation (4.2.11) in the interval (0, 1), and
$$\sigma_x^2 = -\frac{x_\alpha\log(1 - x_\alpha) + x_\alpha^2}{(1 - x_\alpha)^2\log^2(1 - x_\alpha)},$$
then
$$P\{\zeta_N = k\} = \frac{1}{\sigma_x\sqrt{2\pi N}}\,e^{-z^2/2}(1 + o(1))$$
uniformly in the integers $k$ such that $z = (k - n)/(\sigma_x\sqrt{N})$ lies in any fixed finite interval.

Proof. The proof is similar to the proof of Theorem 4.2.5 and we omit the details. Note only that $\alpha = E\xi_1$ and $\sigma_x^2 = D\xi_1$ for $x = x_\alpha$.
Using Theorem 4.2.7 and representation (4.2.2), we obtain the following assertion on the distribution of v,,.
oo and there exist constants ao and al such that
Theorem 4.2.8. If n, N
1 0 and satisfies conditions oo, N = p log n + o (log n). Then (1) and (2), n
1e-y/ r(p)(1 + o(1))
k} = y
uniformly in the integers k such that y = k/n lies in any fixed interval of the form
0 0, 00
pm = Pl = m} = T, PkPk+m k=1
Let 00
0(t) = Y' pmeitm = Po + 2 m
pm cos tm. m=1
It is clear that the characteristic function cp(t) of the random variable 1 is related to ep(t) by the equality cp(t) = I cp(t) I2. To estimate ip(t), we use a standard inequality
4.3 Permutations with restrictions on cycle lengths
(see, e.g., [49]): For t > 0, 00
00
pm(1 - costm) > 2
1 - cp(t) = 2 m=1
(4.3.22)
Pm, s=0 mEM,s
where
jr 27r s <m< Ms={m:2t+
37r
t
Lemma 4.3.4.
2t
2irs
+ t }.
For mo > 0,
2 j Pm ? m>mo
00 Pk PI T, 1>2mo
k=1
Proof. By using 1 = m + k as the variable of summation, we obtain 1-MO
00
2 E Pm = 2 m>mo
00
P1 Pk + E E PkPI
E PkPk+m = m>mo k=1
00
1>mo+l k=1
1=1 k=1+mo (4.3.23)
The right-hand side of (4.3.23) is estimated from below by the quantity 00
PI > Pk = 1>2mo
k=1
PI l>2mo
To see this, it is sufficient to delete the first terms in the first sum from (4.3.23), retaining 1-MO
P1 Y' Pk, k=1
1>2mo
and, in the second sum from (4.3.23), to shift the domain of summation to 2mo, giving 00
P1 T Pk> l>2mo
k=1-mo+1
which does not exceed the second sum from (4.3.23) by the monotonicity of the probabilities. Lemma 4.3.5. For 0 < t < n, 1 - cp(t) ?:
I 3
T, Pkk>7r/t
Proof. Note that the summation on the right-hand side of (4.3.22) occurs over integers $m$ from an interval of length $\pi/t$. If we enumerate intervals of such a length on the positive semi-axis starting at the point $\pi/(2t)$, the domain of summation will consist of the intervals labeled by odd numbers. Notice that the sequence of probabilities $p_k$, $k = 1, 2, \ldots$, is monotone, and the numbers of integer points in any two intervals of length $\pi/t$ differ by at most 1. Therefore each interval of length $\pi/t$ for $0 < t < \pi$ contained in the right-hand side of the sum (4.3.22) contributes not less than one-third of the total sum over the two following intervals: the interval itself and the interval adjoining it on the right, which does not belong to the initial domain of summation. (Note that, as $t \to 0$, the number of integer points in one interval increases and its contribution to the sum tends to 1/2.) Therefore, (4.3.22) implies
2
1-tp(t)>2E
pm - 3
i5-.
m>n/(2t)
s=0 mEM,
By applying Lemma 4.3.4, we obtain the assertion of Lemma 4.3.5.
It remains to estimate the sum of the form >k>,, pk from below. If we use the inequality 1 - 1/n > e-11(n we obtain xk k>a k
>
E e-k/(n-1)
J -e-y
00
1
k>ak/(n-1)n-1
>
1)
a/(n-1) Y
dy > C3
log
a
n-1 (4.3.24)
where c3 is a constant. We use Lemma 4.3.5, set a = 7rn / I t I in (4.3.24), and obtain for I t I /n < 7r,
/t
1
1 - cp (n)
E
3 l>nn/Itl
1
pt
- 3B(x)
(c3 -log
nn I ti(n - 1)
1
3B(x)
(log It I+ c4),
where c4 is a constant. Hence, we go on to estimate tp(t/n) and find that logItl+C4)112
1 log n, then ,p
(n
N
< exp S
12 log I t I +
C4
12 1ogn
} < csltl
- 1/12.
(4.3.25)
We now return to the case R c N. We retain the notation cp(t) and tp(t) for the characteristic functions and set akxk
B(x)akx, ak=k; k
kER,
1
kER
SR(k) = 0 fork V R, and SR(k) = 1 fork E R. Lemma 4.3.6. Suppose that R has the density p > 0 and satisfies conditions (1) and (2). Then, for I t I /n < 7r and N > p log n, 2
N (t
tp
1/(12r2p)
< c6Iti
n
where r is defined in condition (1) and c6 is a constant.
Proof. We revise the arguments leading to estimate (4.3.25). Inequality (4.3.22) now takes the following form: For t > 0, 1 - I tp(t)12 =
00
2
B 2 (x)
00
E L: 1:
akxkSR(k)ak+mxk+maR(k+m),
s=0 mEMM k=1
where Ms
_
m.
r
3n
2 rs
2irs
2t + t < m - 2t + t We retain only one summand in each interval of length r, replace this summand by the minimum value over the interval, and use the transition from the sum over one interval of length r to one-third of the sum over the interval of twice the length.
Then we obtain for t > 0,
1`
00
EY
3 L ak+rlx k+r[ 3
ak+mxk+mSR(k+m)
rl>n/(2t)
s=0 mEMs
Once again, we preserve only one summand in each interval of length r and get
1-
itp (t)12
= 3B2(x) 2 >
2
akx kS Y k=1 °O
3B (x) T armx 2
M=1
2
ak+rlx k+rl
R (k)
I>7r/(2tr)
rm
Y,
arm+rlx
rm+rl
1>7r/(2tr) °O
xrm xr(m+l)
2( )r2 L, m- m f l 3Bx 1>n/(2tr) m=1
The assertion of Lemma 4.3.4 is based on the monotonicity of the probabilities pk, k = 1, 2 , .... The summands of the last double sum are similar to the
summands of the sum in Lemma 4.3.4, and the values xrk/k, k = 1, 2, ... , are also monotonic. Therefore we may use Lemma 4.3.4 and obtain 1 - Iw(t)12
ea
xrl
2
xrl
log(1 -xr)
xrm
3B2(x)r2 l>Er) l
For a fixed r, the estimate (4.3.24) remains true. Therefore, by taking into account the asymptotics B (x) = p log n + o (log n) and - log(1 - xr) = log n + o (log n), we find 2
1
3r2p2logn
(log ItI +c).
Hence,
-
I
1-
logIti+c
logIti+c eXp 1-6r2 P 2log n I
3r2 p2 log n)
and for N > 2 p log n, N c6it,-1/(12r2P)
where c6 is a constant.
Proof of Theorem 4.3.2. Consider the sum r;N = l + + 4N of independent identically distributed random variables with distribution (4.3.10). As we have seen, Lemma 4.3.1 implies that, as n -+ oo and N = p log n + o(log n), the distribution of Wn converges weakly to the distribution with density
We now prove the local convergence of these distributions. For an integer k, let y = k/n. By the inversion formula, /'nn
n
nn
dt,
where co(t) is the characteristic function of the distribution (4.3.10). The density of the limit distribution at a point u > 0 can be represented by the integral up-le-n
f'(p)
__
1F 27r
1 it)pe-itu
(1 -
du.
Hence,
1e y/r(p)=It+I2+I3,
where
- IA e-lty
(PN ( t
f
_
13 = J
(1 - it)P) dt,
n)
A
12
1
e-,ty(1
1
dt,
- it)P
< IrI
e-ZtycoN
yo > 0, mclc4NgN-1 f8n
112 (E)1
0. It is easy to see that by first choosing a sufficiently large A and then a sufficiently
large n, we can make the difference being estimated arbitrarily small. Note that the difference is bounded uniformly with respect to N, and hence, there exists a constant c9 such that for y > yo > 0 and for all N,
k} < c9/n.
(4.3.26)
Proof of Theorem 4.3.3. In (4.3.8), divide the domain of summation into two parts: N1 = {N: IN - B(x)I < N2/3} and N2 = {N: IN - B(x)I > N2/3). It is not difficult to see that the assertion of Theorem 4.3.2 is fulfilled uniformly in N E N1. Therefore
n} = ell I'(p)(1 + o(1)) uniformly in N E N1, so
_+00)). e-1
ni (p)
NEN1
(1
(B(x))N e-B(x) (1 + o(1)) = N! nr'(p)
We use the estimate (4.3.26) and obtain
(B(x NEN2
N!
n} < c9 n
NEN2
Since the sum on the right-hand side of this inequality tends to zero, the total sum
in (4.3.4) equals (enF(p))-l(1 + o(1)). It remains to note that
xn = e1(1 +o(1)),
B(x) = Bn,R = kER
k
(1
- n)k.
Proof of Theorem 4.3.4. According to (4.3.3), P{vn,R = N} =
n!(B(x))'' N!xnan,R
PRN = n).
If we substitute the corresponding expressions for an,R and PRN = n), we obtain
P{vn,R = N) = (B(X
)Ne-B(x)(1
+o(1))
for N = B(x) + o(B(x)). We note that B(x) = Bn,R and that the expression obtained above holds uniformly in N such that (N - B(x))/ B(x) lies in any fixed finite interval; thus, we obtain the assertion of Theorem 4.3.4.
4.4. Notes and references

The probabilistic approach that is now commonly used in combinatorics was first formulated in an explicit form and applied in the investigations of the symmetric group $S_n$ by V. L. Goncharov [51, 52, 53]. For the random variables $\alpha_1, \ldots, \alpha_n$, he found the joint distribution (4.1.4) and the generating function (4.1.5). For the total number of cycles $\nu_n = \alpha_1 + \cdots + \alpha_n$, he proved that, as $n \to \infty$,
$$E\nu_n = \log n + \gamma + o(1), \qquad \sqrt{D\nu_n} = \sqrt{\log n} - \frac{\pi^2/12 - \gamma/2}{\sqrt{\log n}} + o\Bigl(\frac{1}{\sqrt{\log n}}\Bigr).$$
Goncharov also proved that the distribution of $(\nu_n - \log n)/\sqrt{\log n}$ converges to the standard normal distribution, and the distribution of $\alpha_r$ converges to the Poisson distribution with parameter $1/r$. Let $\beta_n$ be the length of the maximum cycle in a random permutation from $S_n$. Goncharov [51, 53] showed that
(h
P{ fJV < m) =
h
Sh(m, n),
h=0
where So (M, n) = 1,
1
Sh(m, n) =
k1...kh k1,...,kh>m
Let
Io(x,l-x)=1,
Ih(x,l-x)=
f x
dxl ... dxh xl ..xh
0 < x < 1.
Goncharov proved that, as n - oo, the random variable On In has the distribution with the density
_h
X-1
fi(x)=x1 h!) I(x,l-x), 1+A-x-A h=0 which, as is clear from the preceding formula, is defined by different analytic expressions on the sequential intervals of the form [1/(1 +A), 1/A], where X is an integer. For example, 1
1
2<x s2 for sufficiently small s. Similarly,
/2'
cosae-Q(1-cos40)-Q2(1-cos2w)/2dip
< f e1odip
3, and consider the number $T_n^{(p)}$ of solutions of equation (5.1.2).

Theorem 5.2.1. If $n \to \infty$ and $p$ is prime, $p \ge 3$, then
$$T_n^{(p)} = \Bigl(\frac{n}{e}\Bigr)^{n(1 - 1/p)} p^{-1/2}\,e^{n^{1/p}}(1 + o(1)).$$
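The asymptotic formula can be compared with exact counts. The sketch below assumes that equation (5.1.2) is $X^p = e$ in $S_n$, so that for prime $p$ the solutions are the permutations whose cycle lengths all lie in $\{1, p\}$. It uses the standard recurrence $T_m = T_{m-1} + (m-1)(m-2)\cdots(m-p+1)\,T_{m-p}$, which is not taken from the text, and a brute-force check for small $n$. All function names are hypothetical.

```python
import math
from itertools import permutations

def count_solutions(n, p):
    """Number of X in S_n with X^p = e for prime p (all cycle lengths in {1, p})."""
    t = [1] * (n + 1)                       # t[0] = 1: the empty permutation
    for m in range(1, n + 1):
        t[m] = t[m - 1]                     # element m is a fixed point
        if m >= p:
            # element m lies in a p-cycle: choose and order its p - 1 companions
            t[m] += math.perm(m - 1, p - 1) * t[m - p]
    return t[n]

def count_brute(n, p):
    """Direct enumeration of S_n (small n only)."""
    identity = tuple(range(n))
    count = 0
    for perm in permutations(range(n)):
        power = identity
        for _ in range(p):
            power = tuple(perm[i] for i in power)
        if power == identity:
            count += 1
    return count

def asymptotic_ratio(n, p):
    """Exact count divided by (n/e)^(n(1-1/p)) p^(-1/2) exp(n^(1/p)), via logarithms."""
    log_asym = n * (1 - 1 / p) * (math.log(n) - 1) - 0.5 * math.log(p) + n ** (1 / p)
    return math.exp(math.log(count_solutions(n, p)) - log_asym)

print([count_solutions(n, 3) for n in range(1, 8)])
for n in (50, 200, 800):
    print(n, asymptotic_ratio(n, 3))
```

Working with logarithms avoids overflow when the exact count is a very large integer; the ratio is printed only so that its approach to 1 can be inspected.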
Proof. The proof is almost the same as the proof of Theorem 5.1.3 given above and is also based on relation (5.2.4). For $R = \{1, p\}$,
$$B(x) = B_R(x) = x + x^p/p,$$
and the independent random variables $\xi_1, \ldots, \xi_N$ in (5.2.4) have the distribution
$$P\{\xi_1 = 1\} = \frac{x}{B(x)} = \frac{px}{px + x^p}, \qquad P\{\xi_1 = p\} = \frac{x^p}{pB(x)} = \frac{x^p}{px + x^p}.$$
We choose the parameter $x$ such that
$$x + x^p = n. \qquad (5.2.7)$$
Then
$$x = n^{1/p} - \frac{1}{p}\,n^{-1+2/p} + O(n^{-2+3/p}),$$
$$B(x) = x + x^p/p = \frac{n}{p} + \frac{p-1}{p}\,n^{1/p} + O(n^{-1+2/p}),$$
$$p(x) = \frac{x^p}{px + x^p} = 1 - pn^{-1+1/p} + O(n^{-2+2/p}),$$
$$E\xi_1 = n/B(x), \qquad D\xi_1 = (p - 1)^2\,pn^{-1+1/p}(1 + o(1)).$$
Equations containing an unknown permutation
Let
u=
p(N - B(x)) B(x)D41
A = 21ogn,
,
and divide the sum in (5.2.4) into two parts so that
T(P)-n!e B(x) n
xn
(SI + S2),
where
Si =
n},
N!
N:IulA
In the first sum, N = B(x)(1 + o(/)) and N
B N! ) e-B(x)
2 B(x) 0+00))
uniformly in the integers N such that I u I < A.
Let 4t* = (41 - 1)/(p - 1), i = 1, ... , N. The sum
N=S1 +...+cr/ has the binomial distribution with N trials and the probability of success
p(x) = xP/(px +xP) = 1 - pn-1+11P + O(n-2+2/Pl l // as n -k oo. It is clear that
n} = PK = (n - N)/(p - 1)}, n) = 0. Since E 1 = n /B (x),
and if (n - N) / (p-1) is not an integer, then P
B(x) = n/p(1 + o(1)), and
n-NE41 _ n(B(x)-N)
NDi
B(x)
ND41 nu
_
p B(x)N as n
-u(1 + 0(1))
oo and I u I < A, by using the de Moivre-Laplace theorem, we obtain
n} =
(n - N)/(p - 1)} =
2n
I
e-u2/2(1 +o(1))
uniformly in the integers N such that (n - N)/(p - 1) is an integer and I u I < A.
5.2 Equations of prime degree
Therefore
S1 = E BNt )e-B(x)PVN = n) N:Iul A,
n}
2. Put N = 4i +
+ SN It is clear that
(x+xd1 +...+xd'-1 +xd)/B(x).
We choose the parameter $x$ such that
$$x + x^{d_1} + \cdots + x^{d_{r-1}} + x^d = n, \qquad (5.3.3)$$
and in what follows, we consider the random variables $\xi_1, \ldots, \xi_N$ with distribution (5.3.2), where $x$ is the solution of this equation. By iteration, it is not difficult to determine that
$$x^d = n - n^{d_{r-1}/d} - \cdots - n^{1/d} + o(1) \qquad (5.3.4)$$
if $d$ is odd, and
$$x^d = n - n^{d_{r-1}/d} - \cdots - n^{1/d} + \tfrac12 + o(1) \qquad (5.3.5)$$
if $d$ is even. Since $T_n^{(d)} = a_{n,R}$, where $R = \{1, d_1, \ldots, d_{r-1}, d\}$, we can use the representation (5.2.1) and obtain
$$T_n^{(d)} = \frac{n!\,e^{B(x)}}{x^n}\sum_{N=1}^{\infty} \frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}. \qquad (5.3.6)$$
Therefore, to obtain the assertions of Theorem 5.3.1, it is sufficient to find the asymptotics of $P\{\zeta_N = n\}$. It is not difficult to see that
B(x) , B(x)(x + dlxd' + ... + dxd)
D1 = B(x)
- n2
B2(x) xdi
xd
dl
d
xi JId
j'
where the summation is over the integers j, which are the divisors of d. In view of (5.3.4) and (5.3.5), n j/d
B(x) = lld
J
(1 +o(1)),
(5.3.7)
as $n \to \infty$. By estimating the second and third central moments of $\xi_1$ and using the characteristic function of $\zeta_N$, we can prove that the distribution of the random variable $(\zeta_N - N E\xi_1)/\sqrt{N D\xi_1}$ converges to the normal law with parameters $(0, 1)$ as $N D\xi_1 \to \infty$. If $h$ is the maximal step of the lattice containing the set $R$, then the local limit theorem is valid on this lattice. We omit the proof of this local theorem.
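The expansions (5.3.4) and (5.3.5) for the solution of (5.3.3) can be checked numerically. The following sketch is an illustration only: it assumes that $R$ is the full set of divisors of $d$, solves (5.3.3) by bisection, and compares $x^d$ with $n - \sum_{j \mid d,\, j < d} n^{j/d}$, plus $1/2$ when $d$ is even. The function names are hypothetical.

```python
def divisors(d):
    """All positive divisors of d, in increasing order."""
    return [j for j in range(1, d + 1) if d % j == 0]


def solve_x(n, d, iters=200):
    """Solve sum_{j | d} x**j = n for x > 0 by bisection."""
    R = divisors(d)
    lo, hi = 0.0, n ** (1.0 / d)  # at x = n**(1/d) the sum already exceeds n
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(mid ** j for j in R) < n:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def x_to_d(n, d):
    """x**d, computed as n minus the lower powers to limit rounding error."""
    x = solve_x(n, d)
    return n - sum(x ** j for j in divisors(d) if j < d)


def predicted(n, d):
    """Main terms of the expansion; the o(1) remainder is dropped."""
    lower = sum(n ** (j / d) for j in divisors(d) if j < d)
    return n - lower + (0.5 if d % 2 == 0 else 0.0)


if __name__ == "__main__":
    for d in (6, 9):
        n = 1e12
        print(d, x_to_d(n, d) - predicted(n, d))
```

The printed differences are the neglected $o(1)$ terms. For even $d$ the remainder decays only like a small negative power of $n$, so a large $n$ is needed before the constant $1/2$ becomes clearly visible.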
The remaining part of the proof of Theorem 5.3.1 repeats the corresponding part of the proof of Theorem 5.1.3 from Section 5.2. We put
$$v = \frac{n - N E\xi_1}{\sqrt{N D\xi_1}}, \qquad u = \frac{d\,(N - B(x))}{\sqrt{B(x)\,D\xi_1}}, \qquad A = 2\sqrt{2\log n},$$
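and divide the sum from (5.3.6) into two parts so that
$$T(d) = \frac{n!\,e^{B(x)}}{x^n}\,(S_1 + S_2),$$
where
$$S_1 = \sum_{N:\,|u| \le A} \frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}, \qquad S_2 = \sum_{N:\,|u| > A} \frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}.$$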
It is easy to see that $N = B(x)(1 + o(1))$ for $|u| \le A = 2\sqrt{2\log n}$ and
$$v = \frac{n\,(B(x) - N)}{B(x)\sqrt{N D\xi_1}} = -u\,\bigl(1 + O(n^{\cdots})\bigr), \tag{5.3.8}$$
and by the local limit theorem,
$$P\{\zeta_N = n\} = \frac{h}{\sqrt{2\pi N D\xi_1}}\,e^{-u^2/2}\,(1 + o(1))$$
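uniformly in the integers $N$ such that $|u| \le A$ and $(n - N)/h$ is an integer. Recall that $h$ is the maximal span of the distribution of $\xi_1$. As in the proof of Theorem 5.1.3, Section 5.2, we obtain
$$S_1 = \frac{1 + o(1)}{d\sqrt{2\pi B(x)}} \cdot \frac{1}{\sqrt{2\pi}} \sum_{N:\,|u| \le A} \frac{dh}{\sqrt{B(x)\,D\xi_1}}\,e^{-u^2/2}$$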
... where $k > 2$, $m_1, \dots, m_k$ are fixed natural numbers, $X_1, \dots, X_k \in S_n$, and $e$ is the identity permutation in $S_n$, is considered in [110]. The asymptotic representation of the number of solutions $X = (X_1, \dots, X_k)$ such that $X_i X_j = X_j X_i$ for all $i \ne j$ is found.
BIBLIOGRAPHY
[1] Sh. M. Agadzhanyan. On a general method of estimating the number of graphs from given classes. Avtomatika, (1):10-21, 1981. In Russian.
[2] Sh. M. Agadzhanyan. The asymptotic formulae for the number of m-component graphs. Avtomatika, (4):27-33, 1986. In Russian.
[3] D. J. Aldous. Exchangeability and related topics. Lecture Notes in Math., 1117:1-198, 1985.
[4] D. J. Aldous. Brownian bridge asymptotics for random mappings. Adv. Appl. Probab., 24:763-764, 1992.
[5] J. Arney and E. A. Bender. Random mappings with constraints on coalescence. Pacific J. Math., 103:269-294, 1982.
[6] R. A. Arratia. Independent process approximation for random combinatorial structures. Adv. Appl. Probab., 24:764-765, 1992.
[7] R. Arratia and S. Tavaré. Limit theorems for combinatorial structures via discrete process approximations. Random Structures and Algorithms, 3:321-345, 1992.
[8] G. N. Bagaev. Distribution of the number of vertices in a component of an indecomposable mapping. Belorussian Acad. Sci. Dokl., 21(12):1061-1063, 1977. In Russian.
[9] G. N. Bagaev. Limit distributions of metric characteristics of an indecomposable random mapping. In Combinatorial and Asymptotic Analysis, pp. 55-61. Krasnoyarsk Univ., Krasnoyarsk, 1977. In Russian.
[10] G. N. Bagaev and E. F. Dmitriev. Enumeration of connected labelled bipartite graphs. Belorussian Acad. Sci. Dokl., 28:1061-1063, 1984. In Russian.
[11] G. V. Balakin. On random matrices. Theory Probab. Appl., 12:346-353, 1967. In Russian.
[12] G. V. Balakin. The distribution of random matrices over a finite field. Theory Probab. Appl., 13:631-641, 1968. In Russian.
[13] G. V. Balakin, V. I. Khokhlov, and V. F. Kolchin. Hypercycles in a random hypergraph. Discrete Math. Appl., 2:563-570, 1992.
[14] A. D. Barbour. Refined approximations for the Ewens sampling formula. Adv. Appl. Probab., 24:765, 1992.
[15] A. D. Barbour. Refined approximations for the Ewens sampling formula. Random Structures and Algorithms, 3:267-276, 1992.
[16] E. A. Bender, E. R. Canfield, and B. D. McKay. The asymptotic number of labeled connected graphs with a given number of vertices and edges. Random Structures and Algorithms, 1:127-170, 1990.
[17] E. A. Bender, E. R. Canfield, and B. D. McKay. Asymptotic properties of labeled connected graphs. Random Structures and Algorithms, 3:183-202, 1992.
[18] M. R. Best. The distribution of some variables on a symmetric group. Nederl. Akad. Wetensch. Indag. Math. Proc., 73:385-402, 1970.
[19] L. Bieberbach. Analytische Fortsetzung. Springer-Verlag, Berlin, 1955.
[20] B. Bollobás. The evolution of random graphs. Trans. Amer. Math. Soc., 286:257-274, 1984.
[21] B. Bollobás. Random Graphs. Academic Press, London, 1985.
[22] Yu. V. Bolotnikov. Convergence to the Gaussian and Poisson processes of the variable $\mu_r(n, N)$ in the classical occupancy problem. Theory Probab. Appl., 13:39-50, 1968. In Russian.
[23] Yu. V. Bolotnikov. Convergence to the Gaussian process of the number of empty cells in the classical occupancy problem. Math. Notes, 4:97-103, 1968. In Russian.
[24] Yu. V. Bolotnikov. Limit processes in a non-equiprobable scheme of allocating particles into cells. Theory Probab. Appl., 13:534-542, 1968. In Russian.
[25] Yu. V. Bolotnikov. On some classes of random variables on cycles of permutations. Math. USSR Sb., 36:87-99, 1980.
[26] Yu. V. Bolotnikov, V. N. Sachkov, and V. E. Tarakanov. Asymptotic normality of some variables connected with the cyclic structure of random permutations. Math. USSR Sb., 28:107-117, 1976.
[27] J. D. Bovey. An approximate probability distribution for the order of elements of the symmetric group. Bull. London Math. Soc., 12:41-46, 1980.
[28] V. E. Britikov. Limit theorems on the maximum size of trees in a random forest of non-rooted trees. In Probability Problems of Discrete Mathematics, pp. 84-91. MIEM, Moscow, 1987. In Russian.
[29] V. E. Britikov. The asymptotic number of forests from unrooted trees. Math. Notes, 43:387-394, 1988.
[30] V. E. Britikov. The limit behaviour of the number of trees of a given size in a random forest of nonrooted trees. In Stochastic Processes and Applications, pp. 36-41. MIEM, Moscow, 1988. In Russian.
[31] I. A. Cheplyukova. Emergence of the giant tree in a random forest. Discrete Math. Appl., 8(1):17-34, 1998.
[32] S. Chowla, I. N. Herstein, and K. Moore. On recursions connected with symmetric groups. Canad. J. Math., 3:328-334, 1951.
[33] S. Chowla, I. N. Herstein, and W. R. Scott. The solution of $x^d = 1$ in symmetric groups. Norske Vid. Selsk., 25:29-31, 1952.
[34] J. M. DeLaurentis and B. G. Pittel. Random permutations and Brownian motion. Pacific J. Math., 119:287-301, 1985.
[35] P. J. Donnelly. Labellings, size-biased permutations and the gem distribution. Adv. Appl. Probab., 24:766, 1992.
[36] P. J. Donnelly, W. J. Ewens, and S. Padmadisastra. Functionals of random mappings: Exact and asymptotic results. Adv. Appl. Probab., 23:437-455, 1991.
[37] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungarian Acad. Sci., Ser. A, 5(1-2):17-61, 1960.
[38] P. Erdős and A. Rényi. On random matrices. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:455-461, 1963.
[39] P. Erdős and P. Turán. On some problems of statistical group theory. iii. Acta Math. Acad. Hungar., 18(3-4):309-320, 1967.
[40] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoret. Pop. Biol., 3:87-112, 1972.
[41] W. J. Ewens. Sampling properties of random mappings. Adv. Appl. Probab., 24:773, 1992.
[42] M. V. Fedoryuk. Saddle Point Method. Nauka, Moscow, 1977. In Russian.
[43] W. Feller. An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York, 1966.
[44] P. Flajolet. The average height of binary trees and other simple trees. Journal of Computer and System Sciences, 25:171-213, 1982.
[45] P. Flajolet. Random tree models in the analysis of algorithms. In P. J. Courtois and G. Latouche, editors, Performance'87, pp. 171-187. North-Holland, Amsterdam, 1988.
[46] P. Flajolet, D. E. Knuth, and B. Pittel. The first cycles in an evolving graph. Discrete Math., 75:167-215, 1989.
[47] P. Flajolet and A. M. Odlyzko. Random mapping statistics. In J.-J. Quisquater and J. Vandewalle, editors, Advances in Cryptology, Lecture Notes in Computer Science, Vol. 434, pp. 329-354. Springer-Verlag, Berlin, 1990.
[48] P. Flajolet and M. Soria. Gaussian limiting distributions for the number of components in combinatorial structures. J. Combinatorial Theory, Series A, 53:165-182, 1990.
[49] B. V. Gnedenko and A. N. Kolmogorov. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA, 1949.
[50] S. W. Golomb. Shift Register Sequences. Aegean Park Press, Laguna Hills, CA, 1982.
[51] V. L. Goncharov. On the distribution of cycles in permutations. Soviet Math. Dokl., 35(9):299-301, 1942. In Russian.
[52] V. L. Goncharov. On the alternation of events in a sequence of Bernoulli trials. Soviet Math. Dokl., 36(9):295-297, 1943. In Russian.
[53] V. L. Goncharov. On the field of combinatorics. Soviet Math. Izv., Ser. Math., 8:3-48, 1944. In Russian.
[54] A. A. Grusho. Random mappings with bounded multiplicity. Theory Probab. Appl., 17:416-425, 1972.
[55] A. A. Grusho. Distribution of the height of mappings of bounded multiplicity. In Asymptotic and Enumerative Problems of Combinatorial Analysis, pp. 7-18. Krasnoyarsk Univ., Krasnoyarsk, 1976. In Russian.
[56] J. C. Hansen. Order statistics for random combinatorial structures. Adv. Appl. Probab., 24:774, 1992.
[57] B. Harris. Probability distributions related to random mappings. Ann. Math. Statist., 31:1045-1062, 1960.
[58] C. C. Heyde. A contribution to the theory of large deviations for sums of independent random variables. Z. Wahrscheinlichkeitstheorie und verw. Gebiete, 7:303-308, 1967.
[59] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58(301):13-30, 1963.
[60] I. A. Ibragimov and Yu. V. Linnik. Independent and Stationary Related Variables. Nauka, Moscow, 1965. In Russian.
[61] E. Jacobsthal. Sur le nombre d'éléments du groupe symétrique $S_n$ dont l'ordre est un nombre premier. Norske Vid. Selsk., 21:49-51, 1949.
[62] S. Janson. Multicyclic components in a random graph process. Random Structures and Algorithms, 4:71-84, 1993.
[63] S. Janson, D. E. Knuth, T. Łuczak, and B. Pittel. The birth of the giant component. Random Structures and Algorithms, 4:233-358, 1993.
[64] I. B. Kalugin. The number of cyclic points and the height of a random mapping with constraints on multiplicities of the vertices. In Abstracts of the All-Union Conference Probab. Methods in Discrete Math., pp. 35-36. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1983. In Russian.
[65] V. I. Khokhlov. On the structure of a non-uniformly distributed random graph. Adv. Appl. Probab., 24:775, 1992.
[66] V. I. Khokhlov and V. F. Kolchin. On the structure of a random graph with nonuniform distribution. In New Trends in Probab. and Statist., pp. 445-456. VSP/Mokslas, Utrecht, 1991.
[67] J. F. C. Kingman. The population structure associated with the Ewens sampling formula. Theoret. Pop. Biol., 11:274-284, 1977.
[68] A. V. Kolchin. Equations in unknown permutations. Discrete Math. Appl., 4:59-71, 1994.
[69] V. F. Kolchin. A class of limit theorems for conditional distributions. Litovsk. Mat. Sb., 8:53-63, 1968. In Russian.
[70] V. F. Kolchin. On the limiting behavior of extreme order statistics in a polynomial scheme. Theory Probab. Appl., 14:458-469, 1969.
[71] V. F. Kolchin. A problem of allocating particles into cells and cycles of random permutations. Theory Probab. Appl., 16:74-90, 1971.
[72] V. F. Kolchin. A problem of the allocation of particles in cells and random mappings. Theory Probab. Appl., 21:48-63, 1976.
[73] V. F. Kolchin. Branching processes, random trees, and a generalized scheme of arrangements of particles. Math. Notes, 21:386-394, 1977.
[74] V. F. Kolchin. Moment of degeneration of a branching process. Math. Notes, 24:954-961, 1978.
[75] V. F. Kolchin. Branching processes and random trees. In Cybernetics, Combinatorial Analysis and Graph Theory, pp. 85-97. Nauka, Moscow, 1980. In Russian.
[76] V. F. Kolchin. Asymptotic Methods of Probability Theory. MIEM, Moscow, 1984. In Russian.
[77] V. F. Kolchin. On the behavior of a random graph near a critical point. Theory Probab. Appl., 31:439-451, 1986.
[78] V. F. Kolchin. Random Mappings. Optimization Software, New York, 1986.
[79] V. F. Kolchin. Systems of Random Equations. MIEM, Moscow, 1988. In Russian.
[80] V. F. Kolchin. On the number of permutations with constraints on their cycle lengths. Discrete Math. Appl., 1:179-194, 1991.
[81] V. F. Kolchin. Cycles in random graphs and hypergraphs. Adv. Appl. Probab., 24:768, 1992.
[82] V. F. Kolchin. The number of permutations with cycle lengths from a fixed set. In Random Graphs'89, pp. 139-149. Wiley, New York, 1992.
[83] V. F. Kolchin. Consistency of a system of random congruences. Discrete Math. Appl., 3:103-113, 1993.
[84] V. F. Kolchin. A classification problem in the presence of measurement errors. Discrete Math. Appl., 4:19-30, 1994.
[85] V. F. Kolchin. Random graphs and systems of linear equations in finite fields. Random Structures and Algorithms, 5:135-146, 1994.
[86] V. F. Kolchin. Systems of random linear equations with small number of non-zero coefficients in finite fields. In Probabilistic Methods in Discrete Mathematics, pp. 295-304. VSP, Utrecht, 1997.
[87] V. F. Kolchin and V. I. Khokhlov. An allocation problem and moments of the binomial distribution. In Probab. Problems of Discrete Math., pp. 16-21. MIEM, Moscow, 1987. In Russian.
[88] V. F. Kolchin and V. I. Khokhlov. On the number of cycles in a random non-equiprobable graph. Discrete Math. Appl., 2:109-118, 1992.
[89] V. F. Kolchin and V. I. Khokhlov. A threshold effect for systems of random equations of a special form. Discrete Math. Appl., 5:425-436, 1995.
[90] V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov. Random Allocations. Wiley, New York, 1978.
[91] I. N. Kovalenko. A limit theorem for determinants in the class of Boolean functions. Soviet Math. Dokl., 161:517-519, 1965. In Russian.
[92] I. N. Kovalenko. On the limit distribution of the number of solutions of a random system of linear equations in the class of Boolean functions. Theory Probab. Appl., 12:51-61, 1967. In Russian.
[93] I. N. Kovalenko, A. A. Levitskaya, and M. N. Savchuk. Selected Problems of Probabilistic Combinatorics. Naukova Dumka, Kiev, 1986. In Russian.
[94] J. B. Kruskal. The expected number of components under a random mapping function. Amer. Math. Monthly, 61:392-397, 1954.
[95] E. Landau. Handbuch der Lehre von der Verteilung der Primzahlen, vol. 1. Teubner, Berlin, 1909.
[96] A. A. Levitskaya. Theorems on invariance of the limit behaviour of the number of solutions of a system of random linear equations over a finite ring. Cybernetics, (2):140-141, 1978. In Russian.
[97] A. A. Levitskaya. Theorems on invariance for the systems of random linear equations over an arbitrary finite ring. Soviet Math. Dokl., 263:289-291, 1982. In Russian.
[98] A. A. Levitskaya. The probability of consistency of a system of random linear equations over a finite ring. Theory Probab. Appl., 30:339-350, 1985. In Russian.
[99] T. Łuczak. Component behaviour near the critical point of the random graph process. Random Structures and Algorithms, 1:287-310, 1990.
[100] T. Łuczak. Cycles in a random graph near the critical point. Random Structures and Algorithms, 2:421-439, 1991.
[101] T. Łuczak and B. Pittel. Components of random forests. Comb. Probab. and Comput., 1:35-52, 1992.
[102] M. P. Mineev and A. I. Pavlov. On the number of permutations of a special form. Math. USSR Sb., 99:468-476, 1976. In Russian.
[103] L. Moser and M. Wyman. On the solution of $x^d = 1$ in symmetric groups. Canad. J. Math., 7:159-168, 1955.
[104] L. R. Mutafchiev. Local limit theorems for sums of power series distributed random variables and for the number of components in labelled relational structures. Random Structures and Algorithms, 3:403-426, 1992.
[105] E. Palmer. Graphical Evolution. Wiley, New York, 1985.
[106] A. I. Pavlov. On the limit distribution of the number of cycles and the logarithm of the order of a class of permutations. Math. USSR Sb., 42:539-567, 1982.
[107] A. I. Pavlov. On the number of cycles and the cycle structure of permutations from some classes. Math. USSR Sb., 46:536-556, 1984.
[108] A. I. Pavlov. On the permutations with cycle lengths from a fixed set. Theory Probab. Appl., 31:618-619, 1986. In Russian.
[109] A. I. Pavlov. Local limit theorems for the number of components of random substitutions and mappings. Theory Probab. Appl., 33:196-200, 1988. In Russian.
[110] A. I. Pavlov. The number and cycle structure of solutions of a system of equations in substitutions. Discrete Math. Appl., 1:195-218, 1991.
[111] Yu. L. Pavlov. The asymptotic distribution of maximum tree size in a random forest. Theory Probab. Appl., 22:509-520, 1977.
[112] Yu. L. Pavlov. Limit theorems for the number of trees of a given size in a random forest. Math. USSR Sb., 32:335-345, 1977.
[113] Yu. L. Pavlov. A case of limit distribution of the maximum size of a tree in a random forest. Math. Notes, 25:387-392, 1979.
[114] Yu. L. Pavlov. Limit distributions of some characteristics of random mappings with a single cycle. In Math. Problems of Modelling Complex Systems, pp. 48-55. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1979. In Russian.
[115] Yu. L. Pavlov. Limit theorems for a characteristic of a random mapping. Theory Probab. Appl., 27:829-834, 1981.
[116] Yu. L. Pavlov. Limit distributions of the height of a random forest. Theory Probab. Appl., 28:471-480, 1983.
[117] Yu. L. Pavlov. On the random mappings with constraints on the number of cycles. In Proc. Steklov Inst. Math., pp. 131-142. Nauka, Moscow, 1986.
[118] Yu. L. Pavlov. Some properties of plane planted trees. In Abstr. All-Union Conference on Discrete Math. and its Appl. to Modelling of Complex Systems, p. 14. Irkutsk State Univ., Irkutsk, 1991. In Russian.
[119] Yu. L. Pavlov. Some properties of planar planted trees. Discrete Math. Appl., 3:97-102, 1993.
[120] Yu. L. Pavlov. The limit distributions of the maximum size of a tree in a random forest. Discrete Math. Appl., 5:301-316, 1995.
[121] Yu. L. Pavlov. Limit distributions of the number of trees of a given size in a random forest. Discrete Math. Appl., 6:117-133, 1996.
[122] V. V. Petrov. Sums of Independent Random Variables. Springer-Verlag, New York, 1975.
[123] B. Pittel. On tree census and the giant component in sparse random graphs. Random Structures and Algorithms, 1:311-342, 1990.
[124] G. Pólya and G. Szegő. Aufgaben und Lehrsätze aus der Analysis. Springer-Verlag, Berlin, 1925.
[125] Yu. V. Prokhorov. Asymptotic behaviour of the binomial distribution. Uspekhi Matem. Nauk, 8(3):135-142, 1953. In Russian.
[126] J. Riordan. Combinatorial Identities. Wiley, New York, 1968.
[127] A. Ruciński and N. C. Wormald. Random graph processes with degree restrictions. Combinatorics, Probability and Computing, 1:169-180, 1992.
[128] V. N. Sachkov. Mappings of a finite set with restraints on contours and height. Theory Probab. Appl., 17:640-656, 1972.
[129] V. N. Sachkov. Random mappings with bounded height. Theory Probab. Appl., 18:120-130, 1973.
[130] V. N. Sachkov. Probability Methods in Combinatorial Analysis. Nauka, Moscow, 1978. In Russian.
[131] A. I. Saltykov. The number of components in a random bipartite graph. Discrete Math. Appl., 5:515-523, 1995.
[132] B. A. Sevastyanov. Convergence of the number of empty cells in the classical allocation problems to Gaussian and Poisson processes. Theory Probab. Appl., 12:144-154, 1967. In Russian.
[133] V. E. Stepanov. On the probability of connectedness of a random graph $G_m(t)$. Theory Probab. Appl., 15:55-67, 1970.
[134] V. E. Stepanov. Phase transition in random graphs. Theory Probab. Appl., 15:187-203, 1970.
[135] V. E. Stepanov. Structure of random graphs $G_m(x \mid h)$. Theory Probab. Appl., 17:227-242, 1972.
[136] L. Takács. On the height and widths of random rooted trees. Adv. Appl. Probab., 24:771, 1992.
[137] S. G. Tkachuk. Local limit theorems on large deviations in the case of stable limit laws. Izvestiya of Uzbek Academy of Sciences, (2):30-33, 1973. In Russian.
[138] V. A. Vatutin. Branching processes with final types of particles and random trees. Adv. Appl. Probab., 24:771, 1992.
[139] A. M. Vershik and A. A. Shmidt. Symmetric groups of high degree. Soviet Math. Dokl., 13:1190-1194, 1972.
[140] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups. i. Theory Probab. Appl., 22:78-85, 1977.
[141] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups. ii. Theory Probab. Appl., 23:36-49, 1978.
[142] V. A. Voblyi. Asymptotic enumeration of labelled connected sparse graphs with a given number of planted vertices. Discrete Analysis, 42:3-16, 1985. In Russian.
[143] V. A. Voblyi. Wright and Stepanov-Wright coefficients. Math. Notes, 42:969-974, 1987.
[144] L. M. Volynets. The number of solutions of an equation in the symmetric group. In Probab. Processes and Appl., pp. 104-109. MIEM, Moscow, 1985. In Russian.
[145] L. M. Volynets. On the number of solutions of the equation $x^s = e$ in the symmetric group. Math. Notes, 40:155-160, 1986. In Russian.
[146] L. M. Volynets. An estimate of the rate of convergence to the limit distribution for the number of cycles in a random substitution. In Probab. Problems of Discrete Math., pp. 40-46. MIEM, Moscow, 1987. In Russian.
[147] L. M. Volynets. The generalized scheme of allocation and the distribution of the number of cycles in a random substitution. In Abstracts of the Second All-Union Conf. Probab. Methods of Discrete Math., pp. 27-28. Petrozavodsk, 1988. In Russian.
[148] L. M. Volynets. The generalized scheme of allocation and the number of cycles in a random substitution. In Probab. Problems of Discrete Math., pp. 131-136. MIEM, Moscow, 1988. In Russian.
[149] L. M. Volynets. An example of a nonstandard asymptotics of the number of substitutions with restrictions on the cycle lengths. In Probab. Processes and Appl., pp. 85-90. MIEM, Moscow, 1989. In Russian.
[150] H. Wilf. The asymptotics of $e^{P(z)}$ and the number of elements of each order in $S_n$. Bull. Amer. Math. Soc., 15:228-232, 1986.
[151] E. M. Wright. The number of connected sparsely edged graphs. iii. J. Graph Theory, 4:393-407, 1980.
[152] E. M. Wright. The number of connected sparsely edged graphs. iv. J. Graph Theory, 7:219-229, 1983.
[153] A. L. Yakymiv. On the distribution of the number of cycles in random a-substitutions. In Abstracts of the Second All-Union Conference Probab. Methods in Discrete Math., p. 111. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1988. In Russian.
[154] A. L. Yakymiv. Substitutions with cycle lengths from a fixed set. Discrete Math. Appl., 1:105-116, 1991.
[155] A. L. Yakymiv. Some classes of substitutions with cycle lengths from a given set. Discrete Math. Appl., 3:213-220, 1993.
[156] N. Zierler. Linear recurring sequences. J. Soc. Ind. Appl. Math., 7:31-48, 1959.
INDEX
algorithm A2, 173
characteristic function, 8
classical scheme of allocation, 16
complete description of distribution of the number of cycles, 192
connected component, 23
connectivity, 22
critical graph, 91
critical set, 125
decomposable property, 23
distribution function, 2
equations involving permutations, 219
equations of compound degree, 235
equations of prime degree, 225
factorial moment, 3
feedback point, 124
forest, 21
forest of nonrooted trees, 30
generalized scheme of allocation, 14
generating function, 6
graphs with components of two types, 70
homogeneous system of equations, 125
hypercycle, 126
independent critical sets, 125
inversion formula, 10
length of the maximum cycle, 212
limit distribution of the number of hypercycles, 164
linearly independent solutions, 125
local limit theorem, 10
mathematical expectation, 3
maximal span, 9
maximum number of independent critical sets, 125, 135
maximum size of components, 66
maximum size of components of a random graph, 84
maximum size of trees in a random forest, 48
maximum size of trees in a random graph, 83
maximum size of trees in a random graph from $A_{n,T}$, 71
mean, 3
mean number of solutions in the equiprobable case, 132
method of coordinate testing, 168
multinomial distribution, 15
multiplicity of a vertex in a set of hyperedges, 126
nonequiprobable graph, 109
normal distribution, 9
number of components, 107, 144
number of components in $U_n$, 65
number of cycles in a random permutation, 182, 183
number of cycles of length r in a random permutation, 182
number of forests, 31
number of linearly independent solutions, 130
number of nontrivial solutions, 131
number of trees of fixed sizes, 71
number of trees with r vertices, 42
number of unicyclic components, 77, 81
number of unicyclic graphs, 58
number of vertices in the maximal unicyclic component, 81
number of vertices in unicyclic components, 71, 77
order of random permutation, 239
order statistics, 17
partition, 19
permutations with restrictions on cycle lengths, 192
Poisson distribution, 6
probability, 1
probability distribution, 2
probability of consistency, 144
probability of reconstruction of the true solution, 166
probability space, 1
problem of moments, 4
process of sequential growth of the number of rows, 127
random element, 1
random forest, 21, 30
random graph corresponding to random permutation, 181
random graph of a random permutation, 182
random graphs with independent edges, 100
random matrices with independent elements, 126
random pairwise comparisons, 164
random partitions, 30
random permutation, 28
random variable, 1
rank of matrix, 124
rank of random sparse matrices, 135
reconstructing the true solution, 165
saddle-point method, 7, 221
set of rooted trees, 21
shift register, 123
simple classification problem, 122
single-valued mapping, 18
statistical problem of testing the hypotheses $H_0$ and $H_1$, 180
subcritical graph, 91
summation of independent random variables in GF(2), 131
supercritical graph, 91
system of linear equations in GF(2), 122
system of random equations with distorted right-hand sides, 180
system with at most two unknowns in each equation, 156
threshold property, 156
total number of components, 24
total number of critical sets, 157
total number of cycles, 102, 212
total number of hypercycles, 158
unicyclic graph, 58
voting algorithm, 165
weak convergence, 2