3 Efficient Construction of Monomial DFTs
In this section we give a summary of the algorithm in [3], which constructs a monomial DFT of a supersolvable group G, given by a pc-presentation, with $O(|G| \log |G|)$ operations. One can even show that the running time is essentially proportional to the output length. For a detailed description and analysis of this algorithm we refer to [3].

3.1 PC-Presentations
Let G be a supersolvable group with chief series T as above. For $1 \le i \le n$ let $g_i$ be an element in $G_i$ not in $G_{i-1}$. With respect to $(g_1, \ldots, g_n)$, each element $g \in G$ can be expressed uniquely in normal form
$$g = g_n^{e_n} \cdot g_{n-1}^{e_{n-1}} \cdot \ldots \cdot g_1^{e_1} \qquad (0 \le e_i < p_i).$$
The multiplication in G is completely described if the normal forms of all powers $g_i^{p_i}$ and all commutators $[g_i, g_j] := g_i^{-1} g_j^{-1} g_i g_j$ are known. More formally, every supersolvable group has a power-commutator presentation (pc-presentation) of the form
$$G = \langle g_1, \ldots, g_n \mid g_i^{p_i} = u_i \ (1 \le i \le n),\ [g_i, g_j] = w_{ij} \ (1 \le i < j \le n) \rangle,$$
with primes $p_i$ as well as words $u_i \in G_{i-1}$ and $w_{ij} \in G_i$, all given in normal form. Moreover, we require the presentation to be consistent, i.e., that every word in the generators has a unique normal form. Consistent pc-presentations of this kind exactly describe the class of supersolvable groups. With respect to such a pc-presentation, an irreducible representation of the group $G_i$ is fully described by the representing matrices of the generators $g_i, \ldots, g_1$. As an example, we give a consistent pc-presentation of a supersolvable group with 128 elements, denoted by $G_{128}$. In the presentation, trivial commutator relations are omitted.
$$G_{128} = \langle g_7, g_6, g_5, g_4, g_3, g_2, g_1 \mid g_1^2 = g_2^2 = g_4^2 = g_5^2 = g_6^2 = 1,\ g_3^2 = g_1,\ g_7^2 = g_4,$$
$$[g_2, g_6] = [g_2, g_7] = [g_3, g_4] = [g_3, g_5] = [g_3, g_6] = g_1,\ [g_3, g_7] = g_2,$$
$$[g_4, g_5] = g_2 \cdot g_1,\ [g_4, g_6] = g_3 \cdot g_1,\ [g_5, g_7] = g_3,\ [g_6, g_7] = g_5 \rangle.$$
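To make the data concrete, the following Python sketch stores this pc-presentation of $G_{128}$; the exponent-tuple encoding and the names used here are our own illustrative choices, not a format prescribed by [3].

```python
# A minimal sketch of a data structure for the consistent pc-presentation of
# G128 above.  A normal form g_n^{e_n} ... g_1^{e_1} is stored as the exponent
# tuple (e_1, ..., e_n); the relation tables map each power g_i^{p_i} and each
# nontrivial commutator [g_i, g_j] to the exponent tuple of its normal form.
n = 7
primes = [2] * n                        # p_1, ..., p_7 (all equal to 2 here)

def word(*gens):
    """Exponent tuple (e_1, ..., e_n) of a normal word given by generator indices."""
    e = [0] * n
    for i in gens:
        e[i - 1] += 1
    return tuple(e)

# Power relations g_i^2 = u_i (entries omitted when g_i^2 = 1).
powers = {3: word(1), 7: word(4)}       # g_3^2 = g_1,  g_7^2 = g_4

# Nontrivial commutator relations [g_i, g_j] = w_ij for i < j.
commutators = {
    (2, 6): word(1), (2, 7): word(1),
    (3, 4): word(1), (3, 5): word(1), (3, 6): word(1), (3, 7): word(2),
    (4, 5): word(2, 1), (4, 6): word(3, 1),
    (5, 7): word(3), (6, 7): word(5),
}
```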
3.2 The Algorithm
Before describing the algorithm, we want to mention the following important points. First, the pc-presentation of G already contains all the information on the group needed in the algorithm, so no group operations are required at all. Second, even though the irreducible representations are computed over C, it turns out that the algorithm uses just integer arithmetic. Hence, we never run into numerical problems! More precisely, all matrices to be processed by
the algorithm are e-monomial, and all matrix manipulations are multiplications. Therefore, we can compute in the additive group $\mathbb{Z}_e$, which is isomorphic to the group of $e$th roots of unity in C. (One can show that the algorithm works over any field K containing such a primitive $e$th root of unity, but, for simplicity, we just consider the case K = C.) The central idea of the algorithm is based on Clifford's Theorem. In our notation it says that, given an irreducible representation F of $CG_{i-1}$, $0 < i \le n$, there are two cases:
Case 1. F extends to $p_i = [G_i : G_{i-1}]$ pairwise inequivalent irreducible representations of $CG_i$ of the same degree $\deg(F)$.
Case 2. The induction of F is an irreducible representation of $CG_i$ of degree $p_i \cdot \deg(F)$.
Furthermore, up to equivalence all irreducible representations of $CG_i$ can be obtained this way. This allows us to construct the irreducible representations of CG iteratively in a bottom-up fashion along the chief series T. However, constructing an arbitrary DFT is not what we want. We are interested in the construction of a very special set of irreducible representations, namely representations resulting in e-monomial matrices when evaluated at group elements. Suppose we have already constructed a full set of nonequivalent irreducible e-monomial representations of $CG_{i-1}$, denoted by $\mathcal{F}$. In order to construct an e-monomial $D \in \mathrm{Irrep}(G_i, T_i)$ of level i from a given e-monomial $F \in \mathcal{F}$ of level i − 1, we need to know the relation between the conjugate representation $F^{g_i}$ and the corresponding $\tilde{F} \in \mathcal{F}$ with $F^{g_i} \sim \tilde{F}$. That is the reason why the intertwining spaces $\mathrm{Int}(F^{g_i}, \tilde{F})$ come into play. It turns out that all intertwining matrices in $\mathrm{Int}(F^{g_i}, \tilde{F})$ are scalar multiples of an e-monomial matrix. In a second phase, the algorithm computes such intertwining matrices. To cut a long story short, we now give a summary of the algorithm. At level i the algorithm takes the following input:
Phase 1. $\mathcal{F} = \mathrm{Irrep}(G_{i-1}, T_{i-1})$, i.e., a full set of nonequivalent irreducible e-monomial representations of $CG_{i-1}$ such that $\bigoplus_{F \in \mathcal{F}} F$ is $T_{i-1}$-adapted.
Phase 2. For every $i − 1 < j \le n$ a permutation $\pi_j$ of $\mathcal{F}$ such that $F^{g_j} \sim \pi_j F$ for all $F \in \mathcal{F}$, as well as e-monomial matrices $X_j^F \in \mathrm{Int}(F^{g_j}, \pi_j F)$, $F \in \mathcal{F}$.
The following output is computed:
Phase 1. $\mathcal{D} = \mathrm{Irrep}(G_i, T_i)$, i.e., a full set of nonequivalent irreducible e-monomial representations of $CG_i$ such that $\bigoplus_{D \in \mathcal{D}} D$ is $T_i$-adapted.
Phase 2. For every $i < j \le n$ a permutation $\tau_j$ of $\mathcal{D}$ such that $D^{g_j} \sim \tau_j D$ for all $D \in \mathcal{D}$, as well as e-monomial matrices $Y_j^D \in \mathrm{Int}(D^{g_j}, \tau_j D)$, $D \in \mathcal{D}$.
Note that the input of level 0 is trivial, all intertwining matrices being set to 1. Level i of the algorithm consists of two phases.
Phase 1 (Computation of $\mathcal{D}$). Consider $F \in \mathcal{F}$ and its $g_i$-conjugate representation $F^{g_i}$.
Case 1. $F \sim F^{g_i}$, i.e., $\pi_i F = F$. Then, by Clifford's Theorem, there are exactly $p := p_i$ pairwise nonequivalent irreducible extensions $D_0, \ldots, D_{p-1}$ of F to $CG_i$, satisfying $D_k = \chi_k \otimes D_0$, where $\chi_0, \chi_1, \ldots, \chi_{p-1}$ are the irreducible characters of the cyclic group $G_i/G_{i-1}$. Since $D_k \downarrow G_{i-1} = F$, $k = 0, \ldots, p − 1$, in this step only the $D_k(g_i)$ have to be computed. One can show that $D_k(g_i) \in \mathrm{Int}(F^{g_i}, F)$ and $c^p (X_i^F)^p = F(g_i^p)$ with a constant $c \in \mathbb{C}^*$. The last equation has p distinct solutions $c_0, \ldots, c_{p-1} \in \mathbb{C}^*$, which can be proven to be even $e$th roots of unity. Thus the desired e-monomial matrices $D_k(g_i)$, $0 \le k < p$, just differ by a factor, which is a power of a $p_i$th root of unity, and are given by $D_k(g_i) := c_k X_i^F$.
Case 2. $F \not\sim F^{g_i}$, i.e., $\pi_i F \ne F$. Again, by Clifford's Theorem, the induced representation $F \uparrow G_i$ is irreducible and $(F \uparrow G_i) \downarrow G_{i-1} = \bigoplus_{k=0}^{p-1} F^{g_i^k}$. As $F^{g_i^k} \sim \pi_i^k F$, we know the existence of a unique irreducible representation D of $CG_i$ such that $D \downarrow G_{i-1} = \bigoplus_{k=0}^{p-1} \pi_i^k F$. This $T_i$-adapted representation is now to be computed. We already know $D(g_1), \ldots, D(g_{i-1})$ from level i − 1. Thus it remains to specify $D(g_i)$. Here, the intertwining spaces constructed at level i − 1 are to be used. If $X_k := X_i^{\pi_i^{k-1} F} \cdot \ldots \cdot X_i^{F}$, $0 \le k < p$, then
$$D(g_i) = \begin{pmatrix} & & & & Z \\ X_1 X_0^{-1} & & & & \\ & X_2 X_1^{-1} & & & \\ & & \ddots & & \\ & & & X_{p-1} X_{p-2}^{-1} & \end{pmatrix},$$
where $Z := X_0\, F(g_i^p)\, X_{p-1}^{-1}$, as shown in [3].
By these two constructions, all irreducible representations of $G_i$ up to isomorphism are obtained, and Phase 1 is complete. In addition, during the construction in Phase 1 a bipartite graph is built up in which $F \in \mathcal{F}$ and $D \in \mathcal{D}$ are linked if and only if F is a constituent of $D \downarrow G_{i-1}$. This "traceback" information is needed in the next phase. Furthermore, this information, collected over all levels $i = 1, \ldots, n$, is nothing else but the T-character graph of the group G.
Phase 2 (Computation of $\tau_j$ and $Y_j^D$). Let $F \in \mathcal{F}$ and $i < j \le n$. We have to consider the same two cases as in Phase 1.
Case 1. $\pi_i F = F$. In Phase 1, the p extensions $D_0, \ldots, D_{p-1}$ have been computed. As $D_k$ is an extension of F, one can show that $\tau_j D_k$ must be an extension of $\pi_j F$. Let $\Delta_0, \ldots, \Delta_{p-1}$ be the extensions of $\pi_j F$. Then it can be shown that $Y_j^{D_k} := X_j^F$ must be set for all k, and $\tau_j(\{D_0, \ldots, D_{p-1}\}) = \{\Delta_0, \ldots, \Delta_{p-1}\}$. Using $Y_j^{D_k}$ one can determine $\tau_j$, as is explained in [3].
Case 2. $\pi_i F \ne F$. In this case, $\tau_j(D)$ can be immediately determined, since it equals the unique $\Delta \in \mathcal{D}$ such that $\Delta \downarrow G_{i-1}$ contains $\pi_j F$ (this information is encoded in the bipartite graph built up in Phase 1). We do not want to discuss here the construction of $Y_j^D$, which is a bit delicate, but refer to [3].
3.3 An Example: G128
Figure 1 shows the character graph of the group $G_{128}$ given by its pc-presentation in Subsection 3.1. Each node represents an irreducible character and its corresponding irreducible representation. The numbers on the left-hand side indicate the levels, and the numbers on the top are the degrees of the corresponding representations of the top level. To illustrate the above algorithm, we describe the construction of the irreducible representation of level 7, denoted by D, corresponding to the circled node in Figure 1.

[Fig. 1. Character graph of $G_{128}$. Levels 0–7 are listed down the left; the degrees of the level-7 representations (1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 4, 4, 4, 4, 4, 4, 4) run across the top.]

The representation D is induced, let's say by the representation F of level 6, and is constructed in Phase 1, Case 2 of the algorithm. Suppose the algorithm has already constructed all data up to level 6, including F and the intertwining matrix $X_7^F$. Since $p = p_7 = 2$, we have $D \downarrow G_6 = F \oplus \pi_7 F$ and D is known at the generators $g_1, \ldots, g_6$. We can compute $D(g_7)$ by the formula
$$D(g_7) = \begin{pmatrix} & F(g_7^2)\, X_1^{-1} \\ X_1 X_0^{-1} & \end{pmatrix} = \begin{pmatrix} & F(g_4)\, (X_7^F)^{-1} \\ X_7^F & \end{pmatrix} = \begin{pmatrix} & & 1 & \\ & & & \omega^4 \\ 1 & & & \\ & 1 & & \end{pmatrix},$$
where $\omega$ denotes an 8th root of unity, e = 8 being the exponent of $G_{128}$. Here we have used that $X_0$ is always the identity, and that $X_1 = X_7^F$ is, in our specific example, also the identity matrix of dimension 2. Finally, $F(g_7^2) = F(g_4) = \mathrm{diag}(1, \omega^4)$ has to be computed using the power relation $g_7^2 = g_4$ of the pc-presentation.
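For concreteness, the following numpy fragment assembles $D(g_7)$ from these blocks; the complex matrices are for illustration only, since the algorithm itself would manipulate nothing but the exponents in $\mathbb{Z}_8$.

```python
import numpy as np

# Assemble D(g_7) for the G128 example: the block Z = F(g_7^2) X_1^{-1} sits in
# the upper right and X_1 X_0^{-1} in the lower left, with X_0 = X_1 = I_2 and
# F(g_7^2) = F(g_4) = diag(1, w^4) for a primitive 8th root of unity w.
w = np.exp(2j * np.pi / 8)
X0 = np.eye(2)
X1 = np.eye(2)
Z = np.diag([1, w**4]) @ np.linalg.inv(X1)

D_g7 = np.block([
    [np.zeros((2, 2)), Z],
    [X1 @ np.linalg.inv(X0), np.zeros((2, 2))],
])
print(np.round(D_g7, 3))   # 4x4 e-monomial matrix with nonzero entries 1, w^4, 1, 1
```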
3.4 Implementation
The presented algorithm has been implemented in the programming language C. The tests were run on an Intel Pentium II with 300 MHz. As we have mentioned previously, no field arithmetic is needed, but only computations in the additive group $\mathbb{Z}_e$. For simplicity, we have assumed e to be known, even though one can show that this is not necessary. The efficiency of the implementation is based on the fact that e-monomial matrices of size N can be multiplied or inverted with only N operations in $\mathbb{Z}_e$. Since any e-monomial matrix $M \in \mathbb{C}^{N \times N}$ can be written in the form $M = \pi \cdot \mathrm{diag}(\omega^{a_1}, \ldots, \omega^{a_N})$ with a permutation $\pi \in S_N$ and nonzero coefficients $\omega^{a_1}, \ldots, \omega^{a_N}$, just the 2N integers $\pi(1), \ldots, \pi(N)$ and $a_1, \ldots, a_N$ have to be stored for M. For the group G and any $r \in \mathbb{N}$ define $d_r(G) := \sum_{k=1}^{h} d_k^r$, where h denotes the number of conjugacy classes of G and $d_1, \ldots, d_h$ the degrees of the irreducible characters of G. One easily checks that the running time to write out the result of the algorithm, i.e., all matrices $D_{i,k}(g_l)$, $1 \le i \le n$, $1 \le k \le h_i$ ($h_i$ denoting the number of conjugacy classes of $G_i$), $1 \le l \le i$, is proportional to $\sum_{i=1}^{n} i \cdot d_1(G_i)$, which is bounded from above by $\sum_{i=1}^{n} \log(|G_i|) \cdot d_1(G_i)$. One can show that the number of operations of the algorithm is of this magnitude $O(\sum_{i=1}^{n} \log(|G_i|) \cdot d_1(G_i))$ with a moderate constant ≤ 20. In this sense the algorithm is nearly optimal. The following table shows the running times for some small supersolvable groups to construct all the matrices $D_{i,k}(g_l)$ as above. Here |G| is the order of G, h the number of conjugacy classes of G, o.l. the output length of the algorithm (i.e., $\sum_{i=1}^{n} \log(|G_i|) \cdot d_1(G_i)$), r.t. the running time in milliseconds, and r.t./o.l. the quotient of the last two quantities. The groups in the first three examples are direct products of the symmetric group $S_3$, the group in the fourth example is $G_{128}$ from Subsection 3.3, and the last example is concerned with a Sylow 2-subgroup of the symmetric group $S_{16}$.

G              |G|      h     o.l.     r.t. (ms)   r.t./o.l.
(S_3)^5        7776     243   13235    266         0.020
(S_3)^6        46656    729   63528    1125        0.018
(S_3)^7        279936   2187  296464   4250        0.014
G_128          128      40    280      15          0.054
Syl_2(S_16)    32768    230   30960    2156        0.069
Of course, the first three groups are of a very simple nature. However, the running time of the algorithm does not essentially depend on the complexity of the pc-presentation, but mainly on the number and degrees of the irreducible representations constituting the DFT. This is verified by the more complex example $Syl_2(S_{16})$. Therefore, the actual running times for constructing a monomial DFT of CG reflect very well the theoretical result concerning the output length.
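The 2N-integer representation suggests the following sketch of the matrix arithmetic; the (perm, exps) encoding and the function names are ours, not taken from the C implementation.

```python
# e-monomial matrix arithmetic in the compact representation described above:
# M = pi * diag(w^{a_1}, ..., w^{a_N}) is stored as (perm, exps), where column j
# carries the entry w^{exps[j]} in row perm[j].  A product or an inverse then
# costs N additions (or negations) in Z_e; no complex arithmetic is involved.

def mono_mul(A, B, e):
    """Product A*B of two e-monomial matrices in (perm, exps) form."""
    pa, ea = A
    pb, eb = B
    n = len(pa)
    # B sends e_j to w^{eb[j]} e_{pb[j]}; A then sends that to
    # w^{eb[j] + ea[pb[j]]} e_{pa[pb[j]]}.
    perm = [pa[pb[j]] for j in range(n)]
    exps = [(eb[j] + ea[pb[j]]) % e for j in range(n)]
    return perm, exps

def mono_inv(A, e):
    """Inverse of an e-monomial matrix in (perm, exps) form."""
    pa, ea = A
    n = len(pa)
    perm, exps = [0] * n, [0] * n
    for j in range(n):
        perm[pa[j]] = j                 # A e_j = w^{ea[j]} e_{pa[j]}
        exps[pa[j]] = (-ea[j]) % e      # so A^{-1} e_{pa[j]} = w^{-ea[j]} e_j
    return perm, exps

M = ([1, 0], [3, 5])                    # column 0 -> w^3 e_1, column 1 -> w^5 e_0
print(mono_mul(M, mono_inv(M, 8), 8))   # ([0, 1], [0, 0]), i.e. the identity
```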
4 Applications

4.1 Related Work
The fast DFT-generation algorithm has been used as a subroutine to solve other computational problems. Thümmel [12] has designed an algorithm that computes, from a pc-presentation of a finite p-group G, in time $O(p \cdot h \cdot |G|)$ its h conjugacy classes as well as the character table. Omrani and Shokrollahi [8] have combined the fast DFT-generation algorithm with Galois theory to construct a full set of irreducible representations of a supersolvable group G over a finite field K, $\mathrm{char}\, K \nmid |G|$, which is not assumed to contain a primitive $e$th root of unity.

4.2 Fast Convolution
FFT-algorithms allow a fast convolution in the group algebra CG along the formula $a \cdot b = D^{-1}(D(a) \cdot D(b))$, for $a, b \in CG$. (Note that the linear complexities of a DFT D and its inverse do not differ substantially, for $|L(D) − L(D^{-1})| \le |G|$, see [2].) Let $D = \bigoplus_{k=1}^{h} D_k$ be a DFT of CG and $d_k$ the degree of $D_k$. Then the convolution in CG can be performed with at most $2L(D) + L(D^{-1}) + 2\sum_{k=1}^{h} d_k^3$ arithmetic operations. Thus if D, and hence $D^{-1}$, allows a fast Fourier transform, and $d := \max_k d_k$, then convolution can be done in time $O(|G| \log |G| + d|G|)$. As $1 \le d \le |G|^{1/2}$, this constitutes a substantial improvement over the naive convolution algorithm, which performs this task in time quadratic in the order of G. Even in a very special case, a variant of this FFT-based fast convolution in CG might shed new light onto a classical problem in computational group theory. A sketch of this will be the last topic of this paper.
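As a minimal illustration of the convolution formula, the following sketch treats the abelian special case $G = \mathbb{Z}_n$, where every $d_k = 1$ and D is the classical discrete Fourier transform computed by an FFT; in the nonabelian case the pointwise product is replaced by h products of $d_k \times d_k$ Fourier coefficient matrices.

```python
import numpy as np

def group_convolve(a, b):
    """Convolution a*b in C[Z_n] via a*b = D^{-1}(D(a) . D(b))."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

a = np.array([1.0, 2.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 1.0, 0.0])
print(group_convolve(a, b).real)        # cyclic convolution of a and b in C[Z_4]
```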
4.3 DFT-Based Collection
As already mentioned, every element a in a pc-presented supersolvable group G can be expressed as a normal word: $a = g^\alpha := g_n^{\alpha_n} \cdot g_{n-1}^{\alpha_{n-1}} \cdot \ldots \cdot g_1^{\alpha_1}$. The normal form problem is to compute on input $(\alpha, \beta)$ the unique $\gamma$ with $g^\alpha \cdot g^\beta = g^\gamma$. Classical techniques for solving this problem involve various kinds of collection processes (see, e.g., [10]) or Hall polynomials combined with interpolation techniques (see, e.g., [11]). To the best of our knowledge, there is no strategy that is always superior to all other strategies. As an alternative to classical collection strategies we propose DFT-based normalization. To simplify our notation, we start with a pc-presented p-group G with corresponding chief series $T = (G_n \supset \ldots \supset G_0)$ and complete lists $\mathrm{Irrep}(G_i, T_i)$ of $T_i$-adapted e-monomial irreducible representations of $CG_i$. $D_{i,0}$ always denotes the trivial representation of $CG_i$, and $D_{i,1}$ always a nontrivial extension of $D_{i-1,0}$ satisfying $D_{i,1}(g_i) = \zeta$, where $\zeta$ is a primitive pth root of unity. On input $(\alpha, \beta)$, the algorithm proceeds in n steps (n down to 1) to compute $\gamma$. After Step i + 1, the numbers $\gamma_n, \ldots, \gamma_{i+1}$ are known. To get $\gamma_i$ in Step i, we work in $G/G_{i-1}$ by replacing $g_j$ by 1, for all j < i. Consider the word
$$w_i := g_{i+1}^{-\gamma_{i+1}} \cdots g_n^{-\gamma_n} \cdot g_n^{\alpha_n} \cdots g_i^{\alpha_i} \cdot g_n^{\beta_n} \cdots g_i^{\beta_i}.$$
By definition, $w_i \in G_i$ and $w_i \equiv g_i^{\gamma_i} \bmod G_{i-1}$. We want to compute $D_{i,1}(w_i)$, since $\gamma_i$ is determined by $D_{i,1}(w_i) = D_{i,1}(g_i^{\gamma_i}) = \zeta^{\gamma_i}$. However, since $w_i$ is expressed in all generators $g_1, \ldots, g_n$, this cannot be done directly at level i. To this end we choose a suitable representation $F \in \mathrm{Irrep}(G_n, T_n)$ whose restriction to $CG_i$ contains $D_{i,1}$ as its first irreducible constituent. Then all that remains to do is to compute the first position of the diagonal matrix $F(w_i)$, which equals $D_{i,1}(w_i) = D_{i,1}(g_i^{\gamma_i}) = \zeta^{\gamma_i}$. As
$$F(w_i) = F(g_{i+1})^{-\gamma_{i+1}} \cdots F(g_n)^{-\gamma_n} \cdot F(g_n)^{\alpha_n} \cdots F(g_i)^{\alpha_i} \cdot F(g_n)^{\beta_n} \cdots F(g_i)^{\beta_i}$$
is a product of monomial matrices and we are interested in only one entry of the final result, each factor $F(g_j)$ causes only one addition in $\mathbb{Z}_e$. Altogether, we obtain the following.
Theorem 6. Let G be a pc-presented p-group of order $p^n$ and exponent e, with corresponding chief series T. Then, given (suitable parts of) the T-adapted DFT, normalization of the product of two normal words in G can be done with at most $2pn^2$ additions in $\mathbb{Z}_e$.
Finally, we want to remark that a similar result holds for the normalization of any formula in the generators $g_1, \ldots, g_n$ of G.
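The one-addition-per-factor bookkeeping behind Theorem 6 can be sketched as follows, reusing the (perm, exps) encoding of e-monomial matrices from Subsection 3.4; the function name is ours.

```python
def track_entry(factors, start, e):
    """Apply monomial factors (rightmost acting first) to the basis vector
    e_start; return (row, exponent) with product . e_start = w^exponent e_row.
    Each factor costs a single addition in Z_e."""
    pos, a = start, 0
    for perm, exps in reversed(factors):
        a = (a + exps[pos]) % e         # one addition in Z_e
        pos = perm[pos]
    return pos, a

# For w_i as above, F(w_i) is diagonal, so tracking e_0 returns pos == 0 and
# the exponent a determining zeta^{gamma_i} = w^a.
```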
References
1. Baum, U.: Existence and efficient construction of fast Fourier transforms for supersolvable groups. Computational Complexity, 1 (1991), 235–256.
2. Baum, U., Clausen, M.: Some lower and upper complexity bounds for generalized Fourier transforms and their inverses. SIAM J. Comput., 20 (1991), 451–459.
3. Baum, U., Clausen, M.: Computing irreducible representations of supersolvable groups. Mathematics of Computation, Volume 63, Number 207 (1994), 351–359.
4. Bürgisser, P., Clausen, M., Shokrollahi, M.A.: Algebraic Complexity Theory. Grundlehren der mathematischen Wissenschaften, Volume 315, Springer-Verlag, Berlin, 1997.
5. Clausen, M., Baum, U.: Fast Fourier Transforms. BI-Wissenschaftsverlag, Mannheim, 1993.
6. Clausen, M., Baum, U.: Ein kombinatorischer Zugang zur Darstellungstheorie überauflösbarer Gruppen. Bayreuther Mathematische Schriften, 44 (1993), 99–107.
7. Morgenstern, J.: Complexité linéaire de calcul. Thèse, Univ. de Nice, 1978.
8. Omrani, A., Shokrollahi, M.A.: Computing irreducible representations of supersolvable groups over small finite fields. Mathematics of Computation, Volume 66, Number 218 (1997), 779–786.
9. Serre, J.P.: Linear Representations of Finite Groups. Graduate Texts in Mathematics, Springer, 1986.
10. Sims, C.C.: Computation with finitely presented groups. Cambridge University Press, 1994.
11. Sims, C.C.: Fast multiplication and growth in groups. ISSAC'98, Rostock, Germany (1998), 165–170.
12. Thümmel, A.: Computing character tables of p-groups. pp. 150–154 in Proceedings ISSAC'96, Zürich, Switzerland.
On Integer Programming Problems Related to Soft-Decision Iterative Decoding Algorithms

Tadao Kasami
Department of Computer Science, Faculty of Information Science, Hiroshima City University, Hiroshima, Japan
Abstract. We consider sufficient conditions for ruling out some useless iteration steps, or all subsequent iteration steps, without degradation of error performance in soft-decision iterative decoding algorithms for binary block codes used over the AWGN channel with BPSK signaling. The derivation of such sufficient conditions and the selection of centers of search regions in iterative steps are then formulated uniformly as a type of integer programming problem. Several techniques for reducing such an integer programming problem to a set of subprograms with smaller computational complexities are presented.
1 Introduction
A sufficient condition on the optimality of a decoded codeword rules out the subsequent iterations in soft-decision iterative decoding algorithms. Such a condition is a kind of early termination condition; it was first introduced in [1], based on a single candidate codeword already generated, and an improvement was presented in [2]. Furthermore, stronger sufficient conditions, denoted $\mathrm{Cond}_{S,h}$, based on at most h candidate codewords already generated, have been reported in [3,4] for h = 2 and 3, and in [5] for h = 4. $\mathrm{Cond}_{S,3}$ was effectively used to reduce the number of iterations in a low-weight-trellis-based iterative soft-decision (iterative MWTS) decoding algorithm [6]. A sufficient condition for soft-decision decoding based on ordered statistics was presented in [7,8] for h = 1 and in [9] for h = 2, and formulated in [10] for h ≥ 2. A sufficient condition for ruling out some useless test error patterns in Chase-type decoding [11] was first introduced in [12]. A sufficient condition, denoted $\mathrm{Cond}_{RO}$, for ruling out some useless next test error patterns in Chase-type decoding was derived in [13]. Simulation results [13] for several example codes show that this ruling-out condition is much more effective in reducing the number of iterations than $\mathrm{Cond}_{S,1}$, with almost the same computational complexity. An early termination condition $\mathrm{Cond}_{ET}$ is presented in [14] for Chase-type decoding algorithms. Simulation results show that $\mathrm{Cond}_{ET}$ is more effective than $\mathrm{Cond}_{S,1}$, especially as the number of test error patterns grows. The combination of $\mathrm{Cond}_{ET}$ with $\mathrm{Cond}_{RO}$ turns out to be very effective.
The paper is partially based on reference [14].
The derivation of ruling-out, termination and sufficient conditions, and the selection of centers of search regions [15] in iterative steps, are shown to be a type of integer programming problem. Several techniques for reducing such an integer programming problem to a set of subproblems with smaller computational complexities are presented.
2 Definitions
Suppose a binary block code C of length N is used for error control over the AWGN channel using BPSK signaling. Each codeword is transmitted with the same probability. Let $r = (r_1, r_2, \ldots, r_N)$ be a received sequence and let $z = (z_1, z_2, \ldots, z_N)$ be the binary hard-decision sequence obtained from r using the hard-decision function: $z_i = 1$ for $r_i > 0$ and $z_i = 0$ for $r_i \le 0$. Any soft-decision decoding scheme is devised based on r or on reliability information provided by r. For the AWGN channel and BPSK transmission, the reliability of a received symbol $r_i$ is generally measured by its magnitude $|r_i|$.

2.1 Correlation Discrepancy
For a positive integer n, let $V^n$ denote the vector space of all binary n-tuples. For $u = (u_1, u_2, \ldots, u_N) \in V^N$, the correlation between u and the received sequence r is given by $M(u) = \sum_{i=1}^{N} r_i (2u_i − 1)$. Then $M(z) = \sum_{i=1}^{N} |r_i| \ge M(u)$ for any $u \in V^N$. Define $D_{-1}(u) \triangleq \{i : u_i \ne z_i,\ 1 \le i \le N\}$ and
$$L(u) \triangleq \sum_{i \in D_{-1}(u)} |r_i| = (M(z) − M(u))/2. \qquad (2.1)$$
L(u) is called the correlation discrepancy of u with respect to z. For a subset U of $V^N$, let L[U] be defined as
$$L[\phi\ (\text{empty set})] \triangleq \infty \quad \text{and} \quad L[U] \triangleq \min_{u \in U} L(u). \qquad (2.2)$$
The assumption that $L(u) \ne L(v)$ for any different tuples u and v in $V^N$ will be called the assumption of correlation uniqueness. For a positive real number $\varepsilon$ and $u \ne v$, the probability of $|L(u) − L(v)| \le \varepsilon$ approaches zero as $\varepsilon$ approaches zero. For a nonempty set $U \subseteq V^N$, let v[U] denote a binary N-tuple $u \in U$ such that L(u) = L[U]. Under the above assumption, v[U] is unique. The following lemma holds.
Lemma 1. Under the assumption of correlation uniqueness, $L[U'] < L[U]$ for $\phi \ne U \subseteq U' \subseteq V^N$ if and only if $v[U'] \notin U$.
The maximum likelihood decoder decodes the received sequence r into the optimal codeword $c_{opt}$, for which
$$L(c_{opt}) = L[C]. \qquad (2.3)$$
If z is a codeword, then z is the optimal codeword. A candidate codeword c is said to be better (or more likely) than another candidate codeword c' if $L(c) \le L(c')$. A candidate codeword c is said to be the best if L(c) is the minimum among a specified set of candidate codewords.
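In code, the quantities of this subsection amount to a few lines; the following Python sketch (function names ours) computes z, L(u) and L[U] directly from the definitions.

```python
def hard_decision(r):
    """Hard-decision sequence z of Sec. 2: z_i = 1 iff r_i > 0."""
    return [1 if ri > 0 else 0 for ri in r]

def discrepancy(u, r):
    """Correlation discrepancy L(u) of (2.1) with respect to z."""
    z = hard_decision(r)
    return sum(abs(ri) for ri, ui, zi in zip(r, u, z) if ui != zi)

def L_of_set(U, r):
    """L[U] of (2.2); an empty U yields infinity."""
    return min((discrepancy(u, r) for u in U), default=float("inf"))
```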
2.2 Soft-Decision Iterative Decoding Algorithm
We consider the following type of soft-decision iterative decoding algorithm. Let IDA denote an algorithm of this type for a binary block code C of length N. For a received sequence r, if the binary hard-decision sequence z is in C, then z is output as the optimal decoded codeword and the decoding terminates. Otherwise, IDA starts from the first stage. At the jth stage, IDA performs the following two subprocesses (G) and (T):
(G) Generation of a Candidate Codeword: A search region at the jth stage, denoted R(j), is searched through to find the best codeword, denoted c(j). If $C \cap R(j) = \phi$, then define $c(j) \triangleq *$. For convenience, we define $L(*) \triangleq \infty$. The codeword $c(j)\ (\ne *)$ is called the candidate codeword generated at the jth stage. Define $R_p(j)$, $R_f(j)$ and $R_{IDA}$ as
$$R_p(0) \triangleq \phi \quad \text{and} \quad R_p(j) \triangleq \bigcup_{1 \le i \le j} R(i), \ \text{for } j \ge 1, \qquad (2.4)$$
$$R_f(j) \triangleq \Big(\bigcup_{i > j} R(i)\Big) \setminus R_p(j), \quad \text{and} \quad R_{IDA} \triangleq \bigcup_{j \ge 1} R(j). \qquad (2.5)$$
Define $c_{best}(j)$ to be the best codeword in $C \cap R_p(j)$. If $C \cap R_p(j) = \phi$, then define $c_{best}(j) \triangleq *$.
(T) Test of Termination Condition: A termination condition, $\mathrm{Cond}_T$, is tested. If $\mathrm{Cond}_T$ holds, then $c_{best}(j)$ is output as the decoded codeword and the decoding terminates (if '*' is output, IDA fails in decoding). If $\mathrm{Cond}_T$ is false, IDA proceeds to the (j+1)th stage.
The termination condition $\mathrm{Cond}_T$ is usually used to limit the number of iterations in a predetermined way. We will introduce a ruling-out condition and an early termination condition into IDA in Section 3.
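The stage structure of IDA can be rendered schematically as follows; is_codeword, search_best and cond_T are stand-ins for the code-specific subroutines, and hard_decision/discrepancy are as in the sketch after (2.3).

```python
import math

def ida(r, is_codeword, search_best, cond_T, max_stages):
    """Schematic IDA: is_codeword tests membership in C; search_best(j, r)
    realizes subprocess (G), returning the best codeword in C n R(j) or None
    (the '*' above); cond_T(j, c_best, L_best) is subprocess (T)."""
    z = hard_decision(r)
    if is_codeword(z):
        return z                          # z in C is already optimal
    c_best, L_best = None, math.inf
    for j in range(1, max_stages + 1):
        c = search_best(j, r)             # subprocess (G)
        if c is not None:
            L = discrepancy(c, r)
            if L < L_best:                # update c_best(j)
                c_best, L_best = c, L
        if cond_T(j, c_best, L_best):     # subprocess (T)
            break
    return c_best                         # None signals decoding failure
```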
3 Ruling-Out and Early Termination Conditions

3.1 Ruling-Out Condition
Note that $R_p(j') \setminus R_p(j−1)$ with $1 < j \le j'$ denotes the subset of $R_p(j')$ which has not been searched before the jth stage. Let $L_{RO}(j, j')$ be a lower bound such that
$$L_{RO}(j, j') \le L[C \cap (R_p(j') \setminus R_p(j−1))]. \qquad (3.1)$$
Lemma 2. If the following condition holds at the beginning of the jth stage:
$$L(c_{best}(j−1)) < L_{RO}(j, j'), \qquad (3.2)$$
then $c_{best}(j'') = c_{best}(j−1)$ for $j \le j'' \le j'$.
Hence, if the condition (3.2) holds, then the subprocesses (G) of the jth through j'th stages may be skipped, and the condition (3.2) is called a ruling-out (useless decoding subprocesses) condition.

3.2 Early Termination Condition
Let $L_{ET}(j)$ be a lower bound such that
$$L_{ET}(j) \le L[C \cap R_f(j)]. \qquad (3.3)$$
Lemma 3. Suppose the following condition holds:
$$L(c_{best}(j)) \le L_{ET}(j). \qquad (3.4)$$
Then $L(c_{best}(i)) = L(c_{best}(j))$ for any $i > j$.
Hence, there is no improvement in error performance by any further iteration if (3.4) holds. The condition (3.4) is called an early termination condition.

3.3 Sufficient Condition of Optimality
In case $R_f(j)$ in (3.3) cannot be expressed in a simple form, we can simply substitute $V^N \setminus R_p(j)$ for $R_f(j)$. Let $L_S(j)$ be a lower bound such that
$$L_S(j) \le L[C \cap (V^N \setminus R_p(j))]. \qquad (3.5)$$
Then the condition
$$L(c_{best}(j)) \le L_S(j) \qquad (3.6)$$
is a sufficient condition for the optimality of $c_{best}(j)$.
Next we consider how to derive lower bounds such as $L_S(j)$, $L_{RO}(j) \triangleq L_{RO}(j, j)$, $L_{RO}(j, j')$ and $L_{ET}(j)$ without information on C other than the minimum distance $d_{min}$ or a few smallest distances in the distance profile of C. From (3.1), (3.3), (3.5) and (2.5), the lower bounds can be derived by evaluating L[U], where U is of the following form:
$$U = C \cap (X \setminus R_p(j)) = (C \setminus R_p(j)) \cap (X \setminus R_p(j)), \quad X \subseteq V^N. \qquad (3.7)$$
The second term on the right-hand side of (3.7) is given in [13,14] and Sec. 4 for several decoding algorithms. The first term can be expressed as
$$C \setminus R_p(j) = C \setminus \Big(\bigcup_{i=1}^{j} R(i)\Big) = \bigcap_{i=1}^{j} \big(C \setminus R(i)\big). \qquad (3.8)$$
For $v \in V^N$ and a positive integer d, define
$$O_d(v) \triangleq \{x \in V^N : d_H(x, v) \le d\} \quad \text{and} \quad \bar{O}_d(v) \triangleq \{x \in V^N : d_H(x, v) \ge d\}, \qquad (3.9)$$
where $d_H(x, v)$ denotes the Hamming distance between x and v. For most iterative decoding algorithms,
$$C \setminus R_p(j) \subseteq \bigcap_{i=1}^{j} \bar{O}_{d_i}(u_i), \qquad (3.10)$$
where, if $c(i) \ne *$, then $u_i = c(i)$ and $d_i$ is $d_{min}$ or a small distance in the distance profile of C (see Example 4.6), and if $c(i) = *$, then the ith search center (see Example 4.3) and $(d_{min} + 1)/2$ can be chosen as $u_i$ and $d_i$, respectively. For positive integers n, h, $d_i$ and $u_i \in V^n$ with $1 \le i \le h$, define
$$V^n_{d_1, d_2, \ldots, d_h}(u_1, u_2, \ldots, u_h) \triangleq \bigcap_{i=1}^{h} \bar{O}_{d_i}(u_i) = \{u \in V^n : d_H(u, u_i) \ge d_i,\ \text{for } 1 \le i \le h\}. \qquad (3.11)$$
4 Examples of Soft-Decision Iterative Decoding Algorithms
For $1 \le i \le j \le N$, define $[i, j] \triangleq \{i, i+1, \ldots, j\}$, called an interval. For $x = (x_1, x_2, \ldots, x_N)$, $v = (v_1, v_2, \ldots, v_N) \in V^N$ and a nonempty set $I \subseteq [1, N]$, define
$$d_{H,I}(x, v) \triangleq |\{i \in I : x_i \ne v_i\}|. \qquad (4.1)$$
For $I = \{i_1, i_2, \ldots, i_m\}$ with $1 \le i_1 < i_2 < \cdots < i_m \le N$,
$$p_I(x) \triangleq (x_{i_1}, x_{i_2}, \ldots, x_{i_m}). \qquad (4.2)$$
We abbreviate $d_{H,[i,j]}(x, v)$, $d_{H,[1,N]}(x, v)$ and $p_{[i,j]}(x)$ as $d_{H,i,j}(x, v)$, $d_H(x, v)$ and $p_{i,j}(x)$, respectively. For simplicity, we assume that the bit positions $1, 2, \ldots, N$ are ordered according to the reliability order given as follows:
$$|r_i| \le |r_j|, \quad \text{for } 1 \le i < j \le N. \qquad (4.3)$$
For nonnegative integers s and t such that $s + 2t \le d_{min} − 1$ and $u \in V^N$, the decoding which corrects s erasures in the first s bit positions and t or fewer errors in the remaining bit positions of input u is called (s, t)-decoding with respect to u. We consider next several examples.
Example 4.1: GMD-Like Decoding: Define p as 0 for even $d_{min}$ and 1 for odd $d_{min}$. For $u \in V^N$, GMD(u) decoding is defined as follows: for $1 \le j \le \rho \triangleq (d_{min} + p)/2$, the subprocess (G) of the jth stage of GMD(u) is the $(2j − p − 1, \rho − j)$-decoding with respect to u. GMD(u) terminates at the $\rho$th stage. GMD(z) is the original GMD proposed by Forney [16]. Then, for $1 \le j \le \rho$,
$$R(j) = \{v \in V^N : d_{H,2j−p,N}(v, u) \le \rho − j\}. \qquad (4.4)$$
It follows from (2.5) and (4.4) that
$$V^N \setminus R_{GMD(u)} = \{v \in V^N : d_{H,2i−p,N}(v, u) > \rho − i,\ \text{for } 1 \le i \le \rho\}. \qquad (4.5)$$
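The stage structure of GMD(u) can be sketched as follows; ee_decode is a hypothetical bounded-distance errors-and-erasures decoder, and discrepancy is as in the sketch of Sec. 2.1.

```python
def gmd(u, r, dmin, ee_decode):
    """GMD(u), cf. (4.4): stage j erases the 2j - p - 1 least reliable
    positions (the first ones, by (4.3); 0-based here) and applies a
    (2j - p - 1, rho - j) errors-and-erasures decoding to u.  ee_decode
    returns a codeword or None."""
    p = 0 if dmin % 2 == 0 else 1
    rho = (dmin + p) // 2
    candidates = []
    for j in range(1, rho + 1):
        s = 2 * j - p - 1
        c = ee_decode(u, erasures=range(s), t=rho - j)
        if c is not None:
            candidates.append(c)
    return min(candidates, key=lambda c: discrepancy(c, r), default=None)
```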
Example 4.2: We introduce "Multiple-GMD". For a positive integer h, h-GMD consists of successive GMD($u^{(i)}$) for $1 \le i \le h$, where $u^{(1)} \triangleq z$ and
$$u^{(i)} \triangleq v\Big[\bigcap_{j=1}^{i−1} \big(V^N \setminus R_{GMD(u^{(j)})}\big)\Big], \quad \text{for } 1 < i \le h. \qquad (4.6)$$
It is shown in [17] that $u^{(2)} + z = (u_1^{(2)}, u_2^{(2)}, \ldots, u_N^{(2)})$, where
$$u_h^{(2)} = \begin{cases} 1, & \text{if } h + p \text{ is even and } 1 \le (h + p)/2 \le \rho, \\ 0, & \text{otherwise}. \end{cases} \qquad (4.7)$$
Example 4.3: Generalized Chase Algorithm [18]: Bounded distance-$t_0$ decoding around an input word $v(j) \in V^N$, called the jth search center, is used for generating c(j), where $0 < t_0 \le (d_{min} − 1)/2$. Then
$$C \setminus R(j) \subseteq \bar{O}_{d(j)}(u(j)), \qquad (4.8)$$
where
$$u(j) \triangleq \begin{cases} c(j), & \text{if } c(j) \ne *, \\ v(j), & \text{if } c(j) = *, \end{cases} \qquad (4.9)$$
$$d(j) \triangleq \begin{cases} d_{min}, & \text{if } c(j) \ne *, \\ t_0 + 1, & \text{if } c(j) = *. \end{cases} \qquad (4.10)$$
For choosing v(j), the following method is proposed in [15]: choose v(j+1) such that
$$v(j+1) = v\big[V^N_{d_1, d_2, \ldots, d_h}(u_1, u_2, \ldots, u_h)\big], \qquad (4.11)$$
where $u_1, u_2, \ldots, u_h$ are chosen from the u(i) with $1 \le i \le j$ defined by (4.9).
Example 4.4: Multiple Chase-Like Decoding Algorithm: For $1 \le \tau < N$ and $u \in V^N$, Chase(u) denotes a $2^\tau$-stage decoding algorithm whose jth stage is bounded distance-$t_0$ decoding with input $u + e(j)$, where $t_0 = \lfloor (d_{min} − 1)/2 \rfloor$ and e(j) is the jth test error pattern in $E_\tau \triangleq \{x \in V^N : p_{\tau+1,N}(x) = 0\}$. Chase(z) is the original Chase decoding algorithm [11]. Then
$$V^N \setminus R_{Chase(u)} = \{x \in V^N : d_{H,\tau+1,N}(x, u) \ge t_0 + 1\}. \qquad (4.12)$$
For a positive integer h, h-Chase consists of successive Chase($u^{(i)}$) with $1 \le i \le h$, where $u^{(i)}$, called the ith search center, is given by $u^{(1)} \triangleq z$ and, for $1 < i \le h$,
$$u^{(i)} \triangleq v\Big[\bigcap_{j=1}^{i−1} \big(V^N \setminus R_{Chase(u^{(j)})}\big)\Big] = v\big[\{x \in V^N : d_{H,\tau+1,N}(x, u^{(j)}) \ge t_0 + 1,\ \text{for } 1 \le j < i\}\big]. \qquad (4.13)$$
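A sketch of Chase(u) follows; bd_decode is a hypothetical bounded distance-$t_0$ decoder, and a practical implementation would enumerate the test patterns in a likelihood-driven order rather than lexicographically.

```python
from itertools import product

def chase(u, r, tau, t0, bd_decode):
    """Chase(u), cf. (4.12): the 2^tau test error patterns of E_tau run over
    all binary patterns on the tau least reliable (first) positions, and
    bounded distance-t0 decoding is applied to u + e(j)."""
    N = len(u)
    best, best_L = None, float("inf")
    for bits in product([0, 1], repeat=tau):
        e = list(bits) + [0] * (N - tau)        # test error pattern in E_tau
        y = [(ui + ei) % 2 for ui, ei in zip(u, e)]
        c = bd_decode(y, t0)
        if c is not None:
            L = discrepancy(c, r)
            if L < best_L:
                best, best_L = c, L
    return best
```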
Example 4.5: Decoding Based on Ordered Statistics [7]: Let C be a binary linear (N, K) code, and let $M_K = \{i_1, i_2, \ldots, i_K\}$, where $1 \le i_1 < i_2 < \cdots < i_K \le N − d_{min} + 1$, be the most reliable basis (MRB) [7]. For
$x = (x_1, x_2, \ldots, x_K) \in V^K$, there is a unique codeword $\gamma(x) = (c_1, c_2, \ldots, c_N)$ in C such that
$$c_{i_h} = z_{i_h} + x_h, \quad \text{for } 1 \le h \le K. \qquad (4.14)$$
For $0 \le h \le K$, define $V_h^K$ to be the set of those binary K-tuples whose weights are h or less. The order-h decoding [7] generates candidate codewords $\{\gamma(x) : x \in V_h^K\}$ iteratively in increasing order of the weight of x. The search region, denoted $R_{os_h}$, is the subcode of C defined by
$$R_{os_h} \triangleq \{\gamma(x) : x \in V_h^K\} = \{v \in C : d_{H,M_K}(v, z) \le h\}. \qquad (4.15)$$
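A sketch of order-h reprocessing follows; it assumes the generator matrix has already been brought into systematic form on the MRB, so that $\gamma(x)$ of (4.14) is realized by re-encoding, and the helper names are ours.

```python
from itertools import combinations

def reencode(x, G_sys):
    """gamma(x) of (4.14): encode the K information bits x with the K x N
    binary generator matrix G_sys (systematic on the MRB) over GF(2)."""
    N = len(G_sys[0])
    return [sum(xk & G_sys[k][j] for k, xk in enumerate(x)) % 2
            for j in range(N)]

def osd(z, r, G_sys, mrb, h):
    """Order-h decoding, cf. (4.15): flip at most h bits of z on the MRB
    positions mrb (0-based) and re-encode, keeping the best candidate."""
    K = len(mrb)
    info = [z[i] for i in mrb]
    best, best_L = None, float("inf")
    for w in range(h + 1):
        for flips in combinations(range(K), w):   # x in V_h^K of weight w
            x = info[:]
            for i in flips:
                x[i] ^= 1
            c = reencode(x, G_sys)
            L = discrepancy(c, r)
            if L < best_L:
                best, best_L = c, L
    return best
```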
Similarly, we can treat the decoding algorithm in [19].
Example 4.6: Iterative MWTS Decoding Algorithm: In the iterative minimum-weight subtrellis search (MWTS) decoding algorithm [6] for binary linear codes, the first candidate codeword c(1) is obtained by ordered statistics decoding of order zero or one [7]. At the jth stage with j > 1, the next candidate codeword c(j) is the best codeword in $\{v \in C : d_H(v, c(j−1)) = w_1\}$, which is obtained by MWTS around c(j−1), where $w_1$ (= $d_{min}$) is the minimum weight. Let $w_2$ be the second smallest weight of C. Then
$$R(j) = O_{w_1}(c(j−1)) \setminus \{c(j−1)\}, \quad \text{for } j > 1, \qquad (4.16)$$
and we have that $C \setminus R(1) \subseteq \bar{O}_{w_1}(c(1))$ and, for $j > 1$,
$$C \setminus R(j) = C \setminus \big(O_{w_1}(c(j−1)) \setminus \{c(j−1)\}\big) \subseteq \big(\bar{O}_{w_2}(c(j−1)) \cup \{c(j−1)\}\big) \cap \bar{O}_{w_1}(c(j)). \qquad (4.17)$$

5 Formulation as Integer Programming Problem
For the decoding algorithms stated in Sec. 4 and in [13,14], the derivation of the lower bounds $L_S$, $L_{RO}$ and $L_{ET}$ and the selection of search centers can be uniformly formulated as the following optimization problem. For a received sequence $r = (r_1, r_2, \ldots, r_N)$, a set $\Sigma \subseteq [1, h]$, nonempty sets $I_j \subseteq [1, N]$, $u_j \in V^N$, and positive integers $d_j \le N$ for $1 \le j \le h$, find the following lower bound:
$$\min L(x), \qquad (5.1)$$
where $x \in V^N$ is subject to the constraints
$$d_{H,I_j}(x, u_j) \begin{cases} \ge d_j, & \text{for } j \in \Sigma, \\ = d_j, & \text{for } j \notin \Sigma, \end{cases} \qquad (5.2)$$
where $u_1, u_2, \ldots, u_h$ are called reference words. Define $\mathbf{1} \triangleq (1, 1, \ldots, 1)$. Note that
$$d_{H,I_j}(x, u_j) \le d_j \quad \text{if and only if} \quad d_{H,I_j}(x, u_j + \mathbf{1}) \ge |I_j| − d_j, \quad \text{for } 1 \le j \le h. \qquad (5.3)$$
Define $y \triangleq x + z$ by $y_i = x_i + z_i \pmod 2$ for $1 \le i \le N$. Then
$$L(x) = \sum_{i=1}^{N} |r_i|\, y_i. \qquad (5.4)$$
For $v \in V^N$, $1 \le j \le h$ and $I \subseteq [1, N]$, define
$$D_{0,I}(v) \triangleq \{1, 2, \ldots, N\} \setminus I, \quad D_{1,I}(v) \triangleq \{i \in I : v_i = z_i\}, \qquad (5.5)$$
$$D_{−1,I}(v) \triangleq \{i \in I : v_i \ne z_i\}, \quad n_{−1,I}(v) \triangleq |D_{−1,I}(v)|. \qquad (5.6)$$
Since $d_{H,I_j}(x, u_j) = d_{H,I_j}(y, u_j + z)$,
$$d_{H,I_j}(x, u_j) = \sum_{i \in D_{1,I_j}(u_j)} y_i − \sum_{i \in D_{−1,I_j}(u_j)} y_i + n_{−1,I_j}(u_j). \qquad (5.7)$$
For a vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_h) \in \{−1, 0, 1\}^h$, define
$$D_\alpha \triangleq \bigcap_{j=1}^{h} D_{\alpha_j, I_j}(u_j), \quad n_\alpha \triangleq |D_\alpha|. \qquad (5.8)$$
Define $A \triangleq \{\alpha \in \{−1, 0, 1\}^h : D_\alpha \ne \phi\}$. Since $\bigcup_{\alpha \in A} D_\alpha = [1, N]$ and $D_\alpha \cap D_{\alpha'} = \phi$ for $\alpha \ne \alpha'$, $\bar{D} \triangleq \{D_\alpha : \alpha \in A\}$ is a partition of [1, N], and therefore $|A| \le N$. For $1 \le i \le N$ and $1 \le j \le h$, let $a_{ji}$ be the jth component of the vector $\alpha \in A$ with $i \in D_\alpha$. Then Lemma 4 follows from (5.4) to (5.8).
Lemma 4. The optimization problem (5.1) and (5.2) can be expressed as the following integer programming problem:
$$\min \sum_{i=1}^{N} |r_i|\, y_i, \qquad (5.9)$$
where $y_i \in \{0, 1\}$ is subject to the equalities
$$\sum_{i=1}^{N} a_{ji}\, y_i = \delta_j + \sigma_j, \quad \text{for } 1 \le j \le h, \qquad (5.10)$$
where $\delta_j \triangleq d_j − n_{−1,I_j}(u_j)$, and $\sigma_j$ is a slack variable such that $\sigma_j \ge 0$ for $j \in \Sigma$ and $\sigma_j = 0$ for $j \notin \Sigma$. For the above integer programming problem, denoted IPP, a binary N-tuple y which meets the constraints is called feasible, and a feasible y at which the objective function attains the minimum is called optimal. For $\alpha \in A$, define
$$q_\alpha \triangleq \sum_{i \in D_\alpha} y_i. \qquad (5.11)$$
Then
$$0 \le q_\alpha \le n_\alpha. \qquad (5.12)$$
Equalities (5.10) can be expressed as
$$\sum_{\alpha \in A} \alpha_j\, q_\alpha = \delta_j + \sigma_j, \quad \text{for } 1 \le j \le h, \qquad (5.13)$$
where $\alpha_j$ denotes the jth component of $\alpha$. Let Q denote the set of those |A|-tuples $(q_\alpha : \alpha \in A)$ over the nonnegative integers which satisfy (5.12) and (5.13). Using vector representation, equalities (5.13) are expressed as
$$\sum_{\alpha \in A} q_\alpha\, \alpha = \delta + \sigma, \qquad (5.14)$$
where $\delta = (\delta_1, \delta_2, \ldots, \delta_h)$ and $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_h)$. For a subset $X \subseteq [1, N]$ and an integer m with $m \le |X|$, let X[m] denote the set of the m smallest integers in X for $1 \le m \le |X|$, and the empty set $\phi$ for $m \le 0$. Then, from (4.3), (5.9) and (5.11), the optimal value of the objective function of the IPP can be expressed as
$$\min \sum_{i \in \bigcup_{\alpha \in A} D_\alpha[q_\alpha]} |r_i|, \qquad (5.15)$$
where the minimum is taken over Q. Define $q \triangleq (q_\alpha : \alpha \in A)$, where the components are ordered in a fixed order. Then the q defined by (5.11) from an optimal solution y of the IPP is also called optimal. Conversely, for an optimal q, the solution y whose ith bit is 1 if and only if $i \in D_\alpha[q_\alpha]$ for the $\alpha \in A$ such that $i \in D_\alpha$ is optimal, by (4.3) and (5.11). Hereafter, we consider the IPP in terms of the $q_\alpha$ with $(q_\alpha : \alpha \in A) \in Q$.
For $S \subseteq A$, $\Sigma' \subseteq \Sigma$ and an h-tuple $c = (c_1, c_2, \ldots, c_h)$ over the nonnegative integers such that $c_j = 0$ for $j \notin \Sigma'$, let IPP(S, $\Sigma'$, c), called a sub-IPP, denote the integer programming problem whose constraints are those of the IPP and the following additional ones:
$$q_\alpha = 0, \quad \text{for } \alpha \notin S, \qquad (5.16)$$
$$\sigma_j \begin{cases} \ge c_j, & \text{for } j \in \Sigma', \\ = c_j, & \text{for } j \notin \Sigma'. \end{cases} \qquad (5.17)$$
We say that the IPP can be reduced into a set of sub-IPP's if and only if there exists an optimal solution q of the IPP which is an optimal solution of one of the sub-IPP's. For example, suppose that the jth restriction is $d_H(x, u_j) \ge d_{min}$ with $u_j \in C$, and that it is enough to consider x to be a codeword of C. Since the restriction is equivalent to $d_H(x, u_j) = w_1$ or $d_H(x, u_j) \ge w_2$, the IPP can be reduced to sub-IPP's. Define $S(q) \triangleq \{\alpha \in A : q_\alpha > 0\}$.
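For tiny instances, the IPP in the variables $q_\alpha$ can be solved by direct enumeration, which makes the roles of (5.12), (5.13) and (5.15) explicit; the reductions developed below exist precisely to avoid this enumeration. The data layout (0-based positions, $\Sigma \subseteq \{1, \ldots, h\}$) is our own.

```python
from itertools import product

def solve_ipp(r, D, delta, Sigma):
    """Brute-force IPP: D maps each alpha in A (a tuple in {-1,0,1}^h) to the
    sorted list D_alpha of 0-based positions (positions sorted by reliability,
    cf. (4.3)); delta = (delta_1, ..., delta_h); Sigma holds the 1-based
    indices j whose constraint is an inequality."""
    A = list(D)
    h = len(delta)
    best = float("inf")
    # Enumerate all q with 0 <= q_alpha <= n_alpha, cf. (5.12).
    for q in product(*(range(len(D[a]) + 1) for a in A)):
        # Slack sigma_j = sum_alpha alpha_j q_alpha - delta_j, cf. (5.13).
        sigma = [sum(a[j] * qa for a, qa in zip(A, q)) - delta[j]
                 for j in range(h)]
        if all(s >= 0 if j + 1 in Sigma else s == 0
               for j, s in enumerate(sigma)):
            # Objective (5.15): the q_alpha smallest positions of each D_alpha.
            cost = sum(abs(r[i]) for a, qa in zip(A, q) for i in D[a][:qa])
            best = min(best, cost)
    return best
```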
The following lemma and a simple corollary are useful for the reduction of the IPP; Lemma 5 follows from (4.3), (5.14) and (5.15).
Lemma 5. Let q be optimal. Then (i) for any nonempty subset X of S(q), $\sum_{\alpha \in X} \alpha \le \sigma$; and (ii) for $\alpha \in S(q)$ and $\alpha' \in A$ such that $n_{\alpha'} > q_{\alpha'}$, if the $q_\alpha$th integer of $D_\alpha$ is greater than the $(q_{\alpha'} + 1)$th integer of $D_{\alpha'}$, then $\alpha − \alpha' \le \sigma$.
Corollary 1. Let q be optimal. Then (i) $\alpha \notin S(q)$ for $\alpha \in \{0, −1\}^h$; (ii) $\{\alpha, −\alpha\} \not\subseteq S(q)$ for any $\alpha \in A$, hence $|S(q)| \le \min\{N, 3^h/2\}$; (iii) at least one component of $\sigma$ is zero; and (iv) if there are $\alpha \in S(q)$ and $i \in [1, h]$ such that $\alpha_i = 1$ and $\alpha_j = −1$ for $j \ne i$, then $\sigma_i = 0$ and $\beta_i = 1$ for all $\beta \in S(q)$.
Some further details can be found in [14]. As the number of reference words grows, the computational complexity for solving the IPP grows. We may choose a relatively small subset of reference words to derive a slightly weaker but still useful lower bound. The reference words $u_1, u_2, \ldots, u_h$ in (5.2) can be partitioned into two blocks, denoted $RW_1$ and $RW_2$, in such a way that $u_j \in RW_1$ if and only if $u_j$ is chosen independently of any candidate codeword already generated. A block $RW_1$ or $RW_2$ may be empty. Examples of $RW_1$ are $\{z\}$ for $L_{RO}$ in [13], $\{z, z + \mathbf{1}\}$ for $L_{ET}$ in [14], $\{v(1), \ldots, v(j)\}$ (when decoding failures continue up to the jth stage) for finding v(j+1) in Example 4.3, $\{u^{(1)}, \ldots, u^{(i−1)}\}$ for finding $u^{(i)}$ in Example 4.4, and $\{z\}$ for $L_{ET}$ or $L_S$ in Example 4.5.
Assume that the first $h_1$ reference words are in $RW_1$. For $1 \le i \le h_1$, define $A_i \triangleq \{p_{1,i}(\alpha) : \alpha \in A\}$ and, for $\beta = (\beta_1, \beta_2, \ldots, \beta_i) \in A_i$, define $D_\beta \triangleq \bigcap_{j=1}^{i} D_{\beta_j, I_j}(u_j)$. For $v \in V^N$ and $I \subseteq [1, N]$, if $D_{a,I}(v)$ with $a \in \{−1, 0, 1\}$ is empty or an interval, then (v, I) is called interval type. For an interval I or $I = \phi$, (z, I) and $(z + \mathbf{1}, I)$ are interval type. If $u_i$ with $i > 1$ is an optimal solution of an IPP with reference words $u_1, u_2, \ldots, u_{i−1}$ such that $(u_j, I_j)$ is interval type for $1 \le j < i$, then it follows from (4.3) that $u_i$ is also interval type. Suppose $(u_j, I_j)$ is interval type for $1 \le j \le i$. Then $D_\beta$ is an interval for $\beta \in A_i$; that is, there are two integers $\nu_1(\beta)$ and $\nu_2(\beta)$ such that $D_\beta = [\nu_1(\beta), \nu_2(\beta)]$. For different $\beta$ and $\beta'$ in $A_i$, we write $\beta < \beta'$ if and only if $\nu_2(\beta) < \nu_1(\beta')$. Since $D_\beta \cap D_{\beta'} = \phi$, either $\beta < \beta'$ or $\beta' < \beta$ holds. This simplifies the solution of the IPP. For instance, (ii) of Lemma 5 becomes the following more useful version: (ii') For $1 \le i \le h$, let $\beta$ and $\beta'$ be in $A_i$ such that $\beta' < \beta$. Then, for $\alpha$ and $\alpha'$ in A such that $p_{1,i}(\alpha) = \beta$ and $p_{1,i}(\alpha') = \beta'$, either $q_\alpha = 0$, or $q_{\alpha'} = n_{\alpha'}$, or $\alpha − \alpha' \le \sigma$. Thus, it is better to choose a relatively small number of reference words in $RW_2$. Reasonable choice rules are: (i) give priority to a reference word $u_i$ with a larger $d_i$ or a smaller $L(u_i)$; and (ii) to renew the test condition, the candidate codeword generated most recently must be chosen.
Example 5.1: Consider the evaluation of $L_{S,h} \triangleq L[V^N_{d_1, d_2, \ldots, d_h}(u_1, u_2, \ldots, u_h)]$. For this case, $RW_1 = \phi$, $\Sigma = [1, h]$ and $I_j = [1, N]$ for $1 \le j \le h$. Formulas for $L_{S,2}$ and $L_{S,3}$ were presented in [3,4], and an efficient algorithm for computing $L_{S,4}$ is presented in [5]. Upper bounds on the number of real-number additions and comparisons for evaluating $L_{S,h}$ with $2 \le h \le 4$ have been derived in [20] for $2 \le h \le 3$ and in [5] for h = 4. Table 1 lists the upper bounds under the assumption (4.3), together with the number of sub-IPP's.
Tab. 1. Upper bounds on the number of real-number additions and comparisons for evaluating $L_{S,h}$, and the number of sub-IPP's, where $\delta \triangleq \min_{1 \le i \le h}\{\delta_i\}$.

h   Upper Bound             Number of sub-IPP's
2   δ − 1                   1
3   10δ − 2                 2
4   310δ² + 184δ − 1        9
Acknowledgments The author is grateful to Mr. Y. Tang for his help in preparing this paper, and to Prof. S. Lin, Prof. M. P. C. Fossorier, Prof. T. Fujiwara and Dr. T. Koumoto for their helpful suggestions.
References
1. D. J. Taipale and M. B. Pursley, "An Improvement to Generalized Minimum-Distance Decoding," IEEE Trans. Inform. Theory, vol. 37, pp. 167–172, Jan. 1991.
2. T. Kaneko, T. Nishijima, H. Inazumi and S. Hirasawa, "An Efficient Maximum-Likelihood-Decoding Algorithm for Linear Block Codes with Algebraic Decoder," IEEE Trans. Inform. Theory, vol. 40, pp. 320–327, Mar. 1994.
3. T. Kasami, T. Takata, T. Koumoto, T. Fujiwara, H. Yamamoto and S. Lin, "The Least Stringent Sufficient Condition on Optimality of Suboptimal Decoded Codewords," Technical Report of IEICE, Japan, IT94-82, Jan. 1995.
4. T. Kasami, T. Koumoto, T. Takata and S. Lin, "The Effectiveness of the Least Stringent Sufficient Condition on the Optimality of Decoded Codewords," Proc. of the 3rd Int. Symp. on Commu. Theory & Appl., pp. 324–333, Ambleside, UK, July 1995.
5. Y. Tang, T. Kasami and T. Fujiwara, "An Optimality Testing Algorithm for a Decoded Codeword of Binary Block Codes and Its Computational Complexity," to appear in Proc. of the 13th Int. Symp. AAECC, Honolulu, HI, USA, Nov. 1999.
6. T. Koumoto, T. Takata, T. Kasami and S. Lin, "A Low-Weight Trellis Based Iterative Soft-Decision Decoding Algorithm for Binary Linear Block Codes," IEEE Trans. Inform. Theory, vol. 45, pp. 731–741, Mar. 1999.
7. M. P. C. Fossorier and S. Lin, "Soft-Decision Decoding of Linear Block Codes Based on Ordered Statistics," IEEE Trans. Inform. Theory, vol. 41, pp. 1379–1396, Sept. 1995.
8. D. Gazelle and J. Snyders, "Reliability-Based Code-Search Algorithms for Maximum-Likelihood Decoding of Block Codes," IEEE Trans. Inform. Theory, vol. 43, pp. 239–249, Jan. 1995.
9. C. X. Chen, "Sufficient Conditions for the Optimality of a Codeword in Soft-Decision Decoding of Binary Linear Block Codes," Master Thesis, Univ. of Hawaii, Oct. 1998.
10. M. P. C. Fossorier, T. Koumoto, T. Takata, T. Kasami and S. Lin, "The Least Stringent Sufficient Condition on the Optimality of a Suboptimally Decoded Codeword Using the Most Reliable Basis," Proc. of ISIT, p. 430, Ulm, Germany, June 1997.
11. D. Chase, "A Class of Algorithms for Decoding Block Codes with Channel Measurement Information," IEEE Trans. Inform. Theory, vol. IT-18, pp. 170–182, Jan. 1972.
12. T. Kaneko, T. Nishijima and S. Hirasawa, "An Improvement of Soft-Decision Maximum-Likelihood Decoding Algorithm Using Hard-Decision Bounded-Distance Decoding," IEEE Trans. Inform. Theory, vol. 43, pp. 1314–1319, July 1997.
13. T. Koumoto, T. Kasami and S. Lin, "A Sufficient Condition for Ruling Out Some Useless Test Error Patterns in Iterative Decoding Algorithms," IEICE Trans. on Fundamentals, vol. E81-A, No. 2, pp. 321–326, Feb. 1998.
14. T. Kasami, Y. Tang, T. Koumoto and T. Fujiwara, "Sufficient Conditions for Ruling-Out Useless Iterative Steps in a Class of Iterative Decoding Algorithms," to appear in IEICE Trans. Fundamentals, vol. E82-A, Oct. 1999.
15. T. Koumoto and T. Kasami, "An Iterative Decoding Algorithm Based on Information of Decoding Failure," Proc. of the 20th SITA, pp. 325–328, Matsuyama, Japan, Dec. 1997.
16. G. D. Forney, Jr., "Generalized Minimum Distance Decoding," IEEE Trans. Inform. Theory, vol. IT-12, pp. 125–131, Apr. 1966.
17. T. Koumoto and T. Kasami, "Analysis and Improvement on GMD-like Decoding Algorithms," Proc. of ISITA, pp. 419–422, Mexico City, Mexico, Oct. 1998.
18. N. Tendolkar and C. P. Hartmann, "Generalization of Chase Algorithms for Soft Decision Decoding of Binary Linear Codes," IEEE Trans. Inform. Theory, vol. IT-30, pp. 714–721, Sept. 1984.
19. Y. H. Han, C. R. P. Hartman, and C. C. Chen, "Efficient Priority-First Search Maximum-Likelihood Soft-Decision Decoding of Linear Block Codes," IEEE Trans. Inform. Theory, vol. 39, pp. 1514–1523, Sept. 1993.
20. T. Kasami and T. Koumoto, "Computational Complexity for Computing Sufficient Conditions on the Optimality of a Decoded Codeword," Technical Report of NAIST, Japan, NAIST-IS-TR98008, July 1998.
Curves with Many Points and Their Applications

Ian F. Blake
Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304, USA
ifblake@hpl.hp.com
Abstract. The use of curves from algebraic geometry in applications such as coding and cryptography is now extensive. This note reviews recent developments in the construction of such curves and their application to these subjects.
1 Introduction
The use of curves from algebraic geometry, in applications such as coding and cryptography, now has an extensive literature and has been responsible for dramatic developments in both subjects. In coding theory, the use of modular curves to construct asymptotically good codes, i.e., to construct an infinite sequence of codes whose parameters asymptotically exceed the Varshamov-Gilbert bound over alphabets of size ≥ 49, was a remarkable achievement. Likewise, the introduction of the theory of elliptic curves to public key cryptography has opened up a new area of both practical and theoretical significance. This note reviews some of the current developments and directions of research in areas related to the construction of curves with many points. Of necessity it is highly selective. The term 'curves with many points' refers to curves whose set of points has order close to the maximum allowed by the Hasse-Weil theorem, namely $q + 1 + 2g\sqrt{q}$. The construction of such curves is an interesting subject in its own right. The use of such curves in coding theory results in codes of long length, relative to the dimension and distance, in comparison to codes obtained from other curves of the same genus and alphabet size. The application of elliptic curves to cryptography is to replace the more usual finite groups, such as the multiplicative group of a finite field, with the group of points on the curve under point addition. As such, the construction of curves with many points (which in the case of elliptic curves means the upper bound of $q + 1 + 2\sqrt{q}$) is less important than the size of the largest prime-order subgroup of points. However, by recourse to the prime number theorem and the fairly uniform distribution of curve orders over the interval $q + 1 \pm 2\sqrt{q}$, as the curve parameters range over all allowable values, one can argue that curves with many points are of interest in this case as well. The reason for interest in such cryptosystems is that the same level of security can be achieved with much smaller key lengths compared to the more conventional systems, resulting in more efficient implementations.
This work reviews the subject and its applications, with a particular emphasis on recent developments and directions. The next section presents some background material needed for a discussion of the subject and its applications to coding theory and cryptography, which are considered in the succeeding sections.
2 Preliminaries
A very brief introduction to the background needed to discuss the problems addressed in this work is given. More complete introductions to the material are given in [32], [13], [43] and [3]. Let $F_q$ be a finite field and $\bar{F}_q$ its algebraic closure. Let $f(x, y) \in F_q[x, y]$ be a bivariate polynomial over $F_q$ and consider the set
$$C = \{(u, v) \in \bar{F}_q \times \bar{F}_q \mid f(u, v) = 0\},$$
i.e., the set of solutions of the equation f(x, y) = 0 over $\bar{F}_q$. Each such solution is referred to as a point of the curve. The subset of these points with coordinates in $F_q$ is referred to as the set of rational points. We will only be interested in the case where $f(x, y) \in F_q[x, y]$ is an irreducible polynomial over $F_q$ and where the curve is smooth, i.e., contains no singular points, points where the formal derivatives simultaneously vanish. There is an equivalent approach to the subject in terms of function fields, for which we refer to [43]. A divisor of the curve C is a finite formal sum
$$D = \sum_{P_i \in C} m_i P_i, \quad m_i \in \mathbb{Z},$$
where only a finite number of the integers $m_i$ are nonzero. The degree of the divisor is $\sum_i m_i$. Denote by $\mathcal{D}$ the set of all divisors of C, an abelian group, and by $\mathcal{D}^0$ the set of divisors of degree 0, a subgroup of $\mathcal{D}$. Denote $D \ge 0$ if $m_i \ge 0$ for all i. The greatest common divisor (gcd) of divisors $D = \sum_i m_i P_i$ and $D' = \sum_i m'_i P_i$ is defined to be $\gcd(D, D') = \sum_i \min(m_i, m'_i) P_i − kO$, where $k = \sum_i \min(m_i, m'_i)$, i.e., the gcd is a divisor of degree 0. To a bivariate polynomial $g(u, v) \in \bar{F}_q[u, v]$ can be associated the divisor $(g) = \sum_i m_i P_i − kO \in \mathcal{D}^0$, where k is chosen so the divisor has degree 0, and the integer $m_i$ represents the order of vanishing of g at $P_i$. Associated with the rational function g(u, v)/h(u, v) is the divisor (g) − (h). A divisor of this form is referred to as a principal divisor. Clearly all such principal divisors have degree 0, and they form a subgroup $\mathcal{P}$ of $\mathcal{D}^0$. The factor group $\mathcal{D}^0/\mathcal{P}$ is referred to as the jacobian J of the curve and will play a role in the formulation of cryptosystems using hyperelliptic curves. Notice that the notions of divisors discussed above also hold for curves and their corresponding divisors defined with polynomials of a single variable. Associated with a curve is an invariant g, the genus of the curve, which is a nonnegative
integer which is related to the dimension of certain subspaces associated with divisors of the curve. A fundamental result on curves is the Hasse-Weil theorem, which states that the number of rational points, $N_1$, on a curve of genus g is bounded by the relation
$$|N_1 − (q + 1)| \le 2g\, q^{1/2}.$$
The right-hand side of this bound was improved by Serre [42] to $g\,\lfloor 2q^{1/2} \rfloor$. When the genus of the curve is large compared to q, this bound is not very effective. A more accurate bound is given in this case by a theorem of Oesterlé (reported by Schoof [38]). It can be shown that the number of solutions (of the curve defined over $F_q$) over $F_{q^r}$, r > g, is determined by $N_1, N_2, \ldots, N_g$, where $N_i$ is the number of solutions over $F_{q^i}$. The relationship is made explicit via the zeta function. Define a curve of genus g over $F_q$ with $N_1$ points to be maximal if $N_1 = q + 1 + 2g\, q^{1/2}$. To achieve this upper bound it is clearly necessary that q be a square, $q = r^2$. For a given genus g and finite field size q, let $N_q(g)$ be the maximum number of points on any curve of genus g over $F_q$; thus $N_q(g) \le q + 1 + 2g\, q^{1/2}$. A curve with this maximum number of points $N_q(g)$ will be called optimal. Some limited, specific results on the existence of certain classes of optimal and maximal curves are known (e.g., [12]). A curve that contains 'almost' $q + 1 + 2g\, q^{1/2}$ points will be referred to as a "curve with many points". Subsequent sections survey recent contributions to the problem of constructing curves with many points.
Asymptotics: Define the quantity $A(q) = \limsup_{g \to \infty} N_q(g)/g$ and notice from the Hasse-Weil bound that $A(q) \le 2q^{1/2}$ or, using the Serre improvement, $A(q) \le \lfloor 2q^{1/2} \rfloor$. A variety of upper and lower bounds on this quantity are known ([25], [44], [8], [42], [50], [40]); in particular, for q a square, $A(q) = q^{1/2} − 1$ was established by exhibiting a suitable infinite family of modular curves. Niederreiter and Xing [37] have recently established a variety of lower bounds on A(q), including the fact that, for q odd and $m \ge 3$, $A(q^m) \ge 2q/(2(2q + 1)^{1/2} + 1)$. Additionally they show that $A(2) \ge 81/317$, $A(3) \ge 62/163$, and $A(5) \ge 2/3$, improving on the previously best bounds for these cases of 2/9, 1/3 and 1/2, respectively. Many other bounds on A(q), both upper and lower, exist, but are not pursued here.
Construction Techniques: A variety of techniques have been used to establish curves with many points, and a web site containing many of the 'best' curves available is maintained [17]. The techniques have been categorized as: I. Methods from general class field theory; II. Methods from class field theory based on Drinfeld modules of rank 1; III. Fibre products of Artin-Schreier curves; IV. Towers of curves with many points; V. Miscellaneous methods such as: (i) formulas for $N_q(1)$ and $N_q(2)$; (ii) explicit curves; (iii) elliptic modular curves; (iv) Deligne-Lusztig curves; (v) quotients of curves with many points.
Numerous techniques to construct curves with many points that depend on coding theory are given in [18]. In essence, the codewords lead to families of curves, and low weight codewords lead to curves with many points. The role of coding theory in yielding linear spaces of curves with many points is then of interest.
Elliptic Curves: It can be shown [31] that an elliptic curve over $F_q$ can always be reduced to the form
$$E : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6, \quad a_i \in F_q.$$
The set of solutions of the equation over $\bar{F}_q$, together with the point at infinity O, form the points of the curve $E(\bar{F}_q)$. All such curves have genus 1. The above (Weierstrass) equation can be further reduced to one of two forms, depending on whether the characteristic of the field is even or odd. The maximum number of points such a curve of genus 1 can have over $F_q$ is completely determined ([41], [45]). For $q = p^a$, if a is an odd integer, $a \ge 3$ and $p \mid \lfloor 2q^{1/2} \rfloor$, then q is called exceptional. Then
$$N_q(1) = \begin{cases} q + 1 + \lfloor 2q^{1/2} \rfloor, & q\ \text{nonexceptional}, \\ q + \lfloor 2q^{1/2} \rfloor, & q\ \text{exceptional}. \end{cases}$$
Partial results are available for g = 2. There also exist necessary and sufficient conditions [45] for the existence of a curve with exactly N points when the characteristic does not divide (N − q − 1).
Hyperelliptic Curves: As a generalization of elliptic curves, consider the hyperelliptic curves, defined by an equation of the form
$$y^2 + h(x)y = k(x), \quad h(x), k(x) \in F_q[x],$$
where h(x) is a polynomial of degree at most g and k(x) is monic of degree exactly 2g + 1. We require the curve to be smooth, i.e., to have no singular points $(x, y) \in \bar{F}_q^2$ where both of the partial derivatives $2y + h(x) = 0$ and $h'(x)y − k'(x) = 0$ vanish ([27]). In such a case the genus of the curve is g. Notice that for elliptic curves h(x) is at most a linear polynomial and k(x) a monic cubic, and the curve is of genus 1. If $\mathrm{char}(F_q) \ne 2$, then the change of variables $x \to x$, $y \to y − h(x)/2$ transforms the equation to
$$y^2 = f(x), \quad \deg f(x) = 2g + 1.$$
If $P = (x, y)$ is a point on the curve, then $(x, −y − h(x))$ is also a point on the curve (the opposite of P); if $\mathrm{char}(F_q) = 2$ and (x, y) is on the curve, then so also is (x, y + h(x)). This point will be designated $\tilde{P}$. Just as the zeta function leads to the enumeration of points on a curve over extensions of the base field, given the number over the first g extensions, a slight modification leads to the enumeration of the number of points in the jacobian $J(F_{q^r})$, r > g, given the number of points on the curve for $r = 1, 2, \ldots, g$ [27].
3 Curves in Coding Theory
For a given divisor G on a curve C, define a subset of rational functions on the curve by
$$L(G) = \{f \in F_q(x, y) \mid (f) + G \ge 0\} \cup \{0\}.$$
Clearly L(G) is a finite-dimensional vector space over $F_q$. Denote its dimension by $\ell(G)$. A consequence of the Riemann-Roch theorem [10] is then that
$$\ell(G) \ge \deg(G) + 1 − g,$$
where g is the genus of the curve. Indeed, for a divisor G with $\deg(G) > 2g − 2$, this equation holds with equality. To construct the class of codes that will be of interest, let C be a nonsingular irreducible curve over $F_q$ of genus g, let $P_1, P_2, \ldots, P_n$ be rational points on C, and let $D = P_1 + P_2 + \cdots + P_n$, a divisor. Let G be a divisor with support disjoint from D and assume that $2g − 2 < \deg(G) < n$. Define the linear code C(D, G) over $F_q$ as the image of the linear map ([43])
$$\alpha : L(G) \longrightarrow F_q^n, \quad f \longmapsto (f(P_1), f(P_2), \ldots, f(P_n)).$$
This is often referred to in the literature as the evaluation construction of codes from curves. The parameters of the code C(D, G) are established by using the properties discussed above. The kernel of the map is the set L(G − D) and, for the restrictions on the parameters noted,
$$k = \dim C(D, G) = \dim L(G) − \dim L(G − D) = \deg(G) − g + 1.$$
The minimum distance of C(D, G) is $d \ge d^* = n − \deg(G)$, the designed minimum distance. The other standard construction for codes, the residue construction, uses the idea of differentials on the curve and will not be considered. Using the above evaluation construction of codes for elliptic curves, one can define a class of codes over $F_q$ with the parameters: length $n \le q + 1 + 2\sqrt{q}$, dimension k and minimum distance n − k. Since the distance is one away from the Singleton bound, one often refers to this as a code with 'defect' of 1. Further information on codes from elliptic curves is given in [21]. One can also use the construction on hyperelliptic curves, and Xing [46] determines certain classes of such codes whose minimum distance can be determined exactly. Other aspects of the relationship between algebraic curves and codes are touched upon below. The first establishes the connection between codewords of low weight in certain trace codes and curves with many points. The emphasis here will be on Reed-Muller (RM) codes, following [20]. The generalized RM code $R_q(s, m)$ can be defined as follows. Consider the vector space (over $F_q$)
$$P_s = \{f \in F_q[X_1, X_2, \ldots, X_m] \mid \deg(f) \le s\}$$
60
Ian F. Blake
and denote the evaluation map by β : Ps −→ Fnq , f −→ (f(v)v∈Fm . The q kernel of this map is the ideal generated by the polynomials Xiq − Xi and a polynomial f ∈ Ps is called reduced if f is an Fq linear combination of monomials dm with 0 ≤ di ≤ q − 1. The image β(Ps ) is the RM code Rq (s, m) X1d1 X2d2 · · · Xm of order s in m variables. It can be shown that the evaluation codeword corresponding to certain polynomials f can in fact be written in the form: cf = (T r(R(x)))x∈Fm
for some R(x) ∈ Fqm [x].
An element in Fqm has trace 0 if and only if it can be written in the form y q −y for some y ∈ Fqm . Now if we associate to the codeword cf the irreducible complete smooth (ArtinSchreier) curve Cf over Fqm given by the equation y q − y = R(x), then the curve has genus g(Cf ) = (q − 1)(deg(R) − 1)/2. It follows immediately that w(cf ) = q m − (Cf (Fqm ) − 1)/q, giving an interesting relationship between codewords of low weight and curves with many points. In [20] these considerations are also shown to lead in some cases to maximal curves, achieving the HasseWeil upper bound for the case of √ √ small genus i.e. ([16], [20]) g ≤ ( q − 1)2 /4 or g = (q − q)/2 (see [49]). Much of this work exploits the connection between generalized Hamming weights and curves with many points. Another contribution to the relationship between codes and curves, is the work on Niederreiter and Xing [37] on the existence of asymptotically good codes, i.e., sequences of codes whose rate/distance proﬁle asymptotically exceeds the VarshamovGilbert bound. They are able to show a result equivalent to the celebrated one [44] for q a square. Speciﬁcally it is shown that for m ≥ 3 an odd integer and r a prime power ≥ 100m3 for odd r and ≥ 576m3 for even r then there is an open interval on (0, 1) which contains the point (rm − 1)/(2rm − 1) where the VarshamovGilbert bound can be exceeded. The result is achieved by establishing lower bounds on A(q m ), some of which were mentioned earlier.
4
Curves in Cryptography
The notion from public key cryptography that will be required for our presentation is that of a oneway function, on which cryptographic protocols , such as the DiﬃeHellman key exchange, depend. Other protocols, such as digital signatures, authentication and others follow readily, depending on the function in use. The particular oneway function of interest here is that of the discrete logarithm. In the multiplicative group of the ﬁnite ﬁelds F2m or Fp , p a prime, the order of complexity of the discrete logarithm problem in both time and space, is L(1/3, c, N) where L(v, c, N) = exp{c(log N )v (log log N )1−v }
Curves with Many Points and Their Applications
61
for some constant c where N is the largest prime divisor of the multiplicative group order. For a prime ﬁeld, the constant c is approximately 1.6. This complexity function represents subexponential growth in log N . Establishing this complexity for the discrete logarithm problem uses a technique known as index calculus, which is facilitated by the group actually having two operations, residing as it does in a ﬁnite ﬁeld. To ensure the discrete logarithm problem is as diﬃcult as possible, the cyclic group order, q − 1, should have as large a prime divisor as possible. The essence of the application of curves, either elliptic or hyperelliptic, to the discrete logarithm problem, is to replace the groups previously used with additive groups deﬁned from the curves. The presentation of these groups and the group operations, is the focus of the cryptographic aspects of the problem. On an elliptic curve it is possible to deﬁne a point addition such that the sum of any two points on the curve uniquely deﬁnes a third turning the set of curve points into a commutative additive group. The identity or neutral element of the group is the point at inﬁnity O. The discrete logarithm problem in the additive group of points on the curve is: given a point P of large prime order N and a point multiple c · P = P + · · · + P (c times), determine the integer c. No subexponential algorithm is known to attack this problem. The most eﬃcient algorithm known is the so called square root attack (or babystepgiantstep) which has complexity on the order of the square root of the largest prime divisor of the group order. The essence of the problem appears to be the lack of an index calculus for it, as was possible for F∗p . A problem of some importance for the use of elliptic curves in cryptography, is the ability to determine precisely the number of points on the curve, or equivalently to determine curve parameters that yield a curve whose group order contains a large prime divisor. This socalled ‘point counting’ problem is both a theoretically and practically challenging one. The more general problem of counting points on curves of higher genus is even more so [9]. With the success of using elliptic curves for public key cryptosystems, it is natural to consider the use of other curves. A group structure is required in order to uniquely carry out the public key operations and no such addition seems possible on more general curves. However, it is possible to deﬁne an addition on the jacobian of a hyperelliptic curve and we give a few comments on this situation here. The deﬁnition of the jacobian of a curve given previously is valid for any curve. There are two reasons however why the jacobian of a hyperelliptic curve is especially interesting [27]. Since the deﬁnition of the jacobian is as a quotient group of two inﬁnite groups, it is necessary to be able to describe in an eﬃcient manner, representatives of cosets or, to be able to determine when two divisors are in the same coset, and hence identiﬁed. Secondly it is important to be able to add two elements of the jacobian and, again, identify the coset the result is in. For the jacobian of a hyperelliptic curve, these two problems have eﬀective algorithms. For the ﬁrst problem, it can be shown that every element in J can be uniquely represented as a reduced divisor deﬁned as follows:
62
Ian F. Blake
Definition 1. A divisor D = mi Pi − (∗)∞ ∈ D0 is said to be reduced if: (i) All the mi are nonnegative and m1 ≤ 1 if Pi = Pi . (ii) IfPi = Pi then Pi and Pi do not both occur in the sum. (iii) mi ≤ g. Indeed any reduced divisor can be uniquely represented as the gcd of polynomials of a certain form. Likewise, an algorithm to add two divisors is also possible (see [27], Section 5. and Appendix) although it is too detailed to give here. Hyperelliptic curve cryptosystems have been seriously investigated by several prominent researchers. To date they seem to have disadvantages with respect to eﬃciency of operations, compared to elliptic curve systems at the same level of security and they have not been developed very far. At the same time a √ subexponential algorithm of complexity O( g log g), for ﬁxed prime p, has been found for hyperelliptic curves of large genus, although no such algorithm has been found for the lower genus case.
5
Comments
The construction of curves with many points is an interesting area in its own right. Not only have these constructions given insights to the structure of such curves, they have also inﬂuenced developments in areas such as coding theory and cryptography, as touched upon in this article. New applications continue to be found such as low discrepancy sequences, sequences with low correlation and authentication codes, largely by Xing, Niederreiter and their coauthors. The continued development of constructions of such curves and their applications will be followed with interest. The set of references included here goes beyond the topics described, as a convenience for the reader wishing to pursue a topic.
References 1. L.M. Adleman, J. DeMarrais and M.D. Huang, A subexponential algorithm for discrete logarithms over the rational subgroup of the jacobians of large genus hyperelliptic curves over finite fields, in Algorithmic Number Theory, L.M. Adleman and M.D. Huang eds., Lecture Notes in Computer Science, vol. 877, SprigerVerlag, pp. 2840, 1994. 2. A. M. Barg, S. L. Katsman and M. A. Tsfasman, Algebraic Geometric codes from curves of small genus, Problems of Information Transmission, vol. 23, pp. 3438, 1987. 3. Ian F. Blake, Gadiel Seroussi and Nigel Smart, Elliptic Curves in Cryptography, Cambridge University Press, 1999. 4. D. le Brigand, Decoding of codes on hyperelliptic curves, Lecture Notes in Computer Science, G. Cohen and P. Charpin eds., vol. 514, Berlin: SpringerVerlag, pp. 126134, 1990. 5. A. Brouwer, Bounds on the minimum distance of a linear code, http://win.tue.nl/~aeb/voorlincod.html.
Curves with Many Points and Their Applications
63
6. Y. Driencourt and J.F. Michon, Elliptic codes over fields of characteristic 2, Jour. Pure and Appl. Algebra, vol. 45, pp. 1539, 1987. 7. Y. Driencourt, Some properties of elliptic codes over a field of characteristic 2, in Lecture Notes in Computer Science, Berlin: SpringerVarlag, vol. 229, 1985. 8. V.G. Drinfeld and S. Vladut, Sur le nombre de points d’une courbe alg´ebrique, Anal. Fonct. et Appl., vol. 17, pp. 6869, 1983. 9. N.D. Elkies, Elliptic and modular curves over finite fields and related computational issues, in D.A. Buell and J.T. Teitelbaum, editors, Computational Perspectives on Number Theory: Proceedings of a Conference in Honor of A.O.L. Atkin, American Mathematical Society International Press, 7, 21–76, 1998. 10. W. Fulton, Algebraic Curves: An Introduction to Algebraic Geometry, Reading, MA: AddisonWesley Publishing Company, 1969. 11. Number of rational points of a singular curve, Proc. Am. Math. Soc., vol. 126, pp. 2549  2556, 1998. 12. A. Garcia and H. Stichtenoth, A tower of ArtinSchreier extensions of function fields attaining the DrinfeldVladut bound, Inventiones Math., vol. 121, pp. 211222, 1995. 13. A. Garcia and H. Stichtenoth, Algebraic function fields over finite fields with many rational points, IEEE Trans. Information Theory, vol. 41, pp. 15481563, 1995. 14. A. Garcia and J.F. Voloch, Fermat curves over finite fields, Jour. Number Theory, vol. 30, pp. 345 356, 1988. 15. A. Garcia and H. Stichtenoth, On the asymptotic behaviour of some towers of function fields over finite fields, Jour. Number Theory, vol. 61, pp. 248273, 1996. 16. G. van der Geer and M. van der Vlugt, Curves over finite fields of characteristic 2 with many rational points, C.R. Acad. Sci., Paris, S´erie I, vol. 317, pp. 593507, 1993. 17. G. van der Geer and M. van der Vlugt, Tables of cuves with many points, preprint, available at http://www.wins.uva.nl/~geer. 18. G. van der Geer and M. van der Vlugt, How to construct curves over finite fields with many points, available at http://www.wins.uva.nl/~geer. 19. G. van der Geer and M. van der Vlugt, Constructing curves over finite fields with many points by solving linear equations, preprint. 20. G. van der Geer and M. van der Vlugt, Generalized ReedMuller codes and curves with many points, J. Number Theory, vol. 72, pp. 257268, 1998. 21. G. van der Geer, Codes and elliptic curves, in Eﬀective Methods in Algebraic Geometry, T. Mora and C. Traverso eds., Birkh¨ auser, pp. 160168, 1991. 22. V.D. Goppa, Codes associated with divisors, Prob. Info. Trans., vol. 13, pp. 2227, 1977. 23. V. D. Goppa, Codes on algebraic curves, Soviet Math. Dokl., vol. 24, pp. 170172, 1981. 24. M.D. Huang and D. Ierardi, Counting points on curves over finite fields, J. Symbolic Conp., vol. 25, pp. 121, 1998. 25. Y. Ihara, Some remarks on the number of rational points of algebraic curves over finite fields, J. Fac. Sci. Tokyo, vol. 28, pp. 721724, 1981. 26. N. Koblitz, A course in number theory and cryptography, Graduate Texts in Mathematics, vol. 114, Berlin: SpringerVerlag, 1994. 27. N. Koblitz, Algebraic aspects of Cryptography, Algorithms and Computation in Mathematics, Vol. 3, Springer, 1998. 28. Kristin Lauter, Ray class field constructions of curves over finite fields with many rational points, in Algorithmic Number Theory, Henri Cohen ed., Lecture Notes in Computer Science, vol. 1122, SprigerVerlag, 187195 1996.
64
Ian F. Blake
29. Kristin Lauter, A formula for constructing curves over finite fields with many rational points, J. Number Theory, vol. 74, pp. 5672, 1999. 30. Yu. I. Manin, What is the maximum number of points on a curve over F2 , J. Fac. Sci. Univ. Tokyo, Sec. IA, vol. 28, pp. 715720, 1981. 31. A. Menezes, Elliptic Curve Public Key Cryptosystems, Dordrecht, The Netherlands: Kluwer, 1993. 32. C. Moreno, Algebraic curves over ﬁnite ﬁelds, Cambridge Tracts in Mathematics 97, Cambridge University Press, 1993. 33. H. Niederreiter and C. Xing, Lowdiscrepancy sequences and global function fields with many rational places, Finite Fields and their Applications, vol. 2, pp. 241273, 1996. 34. H. Niederreiter and C. Xing, Cyclotomic function fields, Hilbert class fields, and global function fields with many rational places, Acta Math., vol. 79, pp. 5376, 1997. 35. H. Niederreiter and C. Xing, Explicit global function fields over the binary field with many rational places, Acta Math., vol. 75, pp. 383396, 1996. 36. H. Niederreiter and C. Xing, Drinfeld modules of rank 1 and algebraic curves with many rational points II, Acta Math., vol. 81, pp. 81100, 1997. 37. H. Niederreiter and C. Xing, Towers of global function fields with asymptotically many rational places and an improvement on the GilbertVarshamov, Math. Nachr. vol. 195, pp. 171186, 1998. 38. Ren´e Schoof, Algebraic curves and coding theory, UTM 336, Univ. of Trento, 1990. 39. R. Schoof, Families of curves and weight distributions of codes, Bull. Amer. Math. Soc., vol. 32, pp. 171183, 1996. 40. R. Schoof, Algebraic curves over F2 with many rational points, Jour. Number Theory, vol. 1, pp. 614, 1992. 41. J.P. Serre, Nombres des points des courbes alg´ebriques sur Fq , S´eminaire de Th´eorie des Nombres de Bordeaux, 1982/83, no. 22. 42. J.P. Serre, Sur le nombre des points rationnels d’une courbe alg´ebrique sur un corps fini, C.R. Acad. Sci., Paris, S´erie I, vol. 296, pp. 397402, 1983. 43. H. Stichtenoth, Algebraic Function Fields and Codes, Berlin: SpringerVerlag, 1991. 44. M.A. Tsfasman, S.G. Vladut and T. Zink, Modular curves, Shimura curves and Goppa codes better than the VarshamovGilbert bound, Math. Nach., vol. 109, pp. 2128, 1982. 45. W.C. Waterhouse, Abelian varieties over finite fields, Ann. Sci. E.N.S., vol. 2, pp. 521560, 1969. 46. C. Xing, Hyperelliptic function fields and codes, J. Pure and Appl. Alg., vol. 74, pp. 109118, 1991. 47. C. Xing and H. Stichtenoth, The genus of maximal function fields, Manuscripta Math., vol. 86, pp. 217224, 1995. 48. C. Xing and H. Niederreiter, Modules de Drinfeld et courbes alg´ebriques ayant beaucoup de points rationnels, C.R. 156 Acad. Sci., Paris, S´erie I, vol. 322, pp. 651 654, 1996. 49. C. Xing and H. Stichtenoth, The genus of maximal function fields over finite fields, Manuscripta Math., vol. 86, pp. 217224, 1995. 50. T. Zink, Degeneration of Shimura surfaces and a problem in coding theory, in Proceedings, Fundamentals of Computation Theory, Lecture Notes in Computer Science, vol. 199, L. Budach, ed., SpringerVerlag, 1986.
New Sequences of Linear Time Erasure Codes Approaching the Channel Capacity M. Amin Shokrollahi Bell Labs, Room 2C353, 700 Mountain Ave, Murray Hill, NJ 07974, USA amin@research.belllabs.com
Abstract. We will introduce a new class of erasure codes built from irregular bipartite graphs that have linear time encoding and decoding algorithms and can transmit over an erasure channel at rates arbitrarily close to the channel capacity. We also show that these codes are close to optimal with respect to the tradeoﬀ between the proximity to the channel capacity and the running time of the recovery algorithm.
1
Introduction
A linear errorcorrecting code of block length n and dimension k over a ﬁnite ﬁeld IFq —an [n, k]q code for short—is a kdimensional linear subspace of the standard vector space IFnq . The elements of the code are called codewords. To the code C there corresponds an encoding map Enc which is an isomorphism of the vector spaces IFkq and C. A sender, who wishes to transmit a vector of k elements in IFq to a receiver uses the mapping Enc to encode that vector into a codeword. The rate k/n of the code is a measure for the amount of real information in each codeword. The minimum distance of the code is the minimum Hamming distance between two distinct codewords. A linear code of block length n, dimension k, and minimum distance d over IFq is called an [n, k, d]q code. Linear codes can be used to reliably transmit information over a noisy channel. Depending on the nature of the errors imposed on the codeword during the transmission, the receiver then applies appropriate algorithms to decode the received word. In this paper, we assume that the receiver knows the position of each received symbol within the stream of all encoding symbols. We adopt as our model of losses the erasure channel, introduced by Elias [3], in which each encoding symbol is lost with a ﬁxed constant probability p in transit independent of all the other symbols. As was shown by Elias [3], the capacity of this channel equals 1 − p. It is easy to see that a code of minimum distance d is capable of recovering d−1 or less erasures. In the best case, it can recover from any set of k coordinates of the encoding which means that d − 1 = n − k. Such codes are called MDScodes. A standard class of MDScodes is given by ReedSolomon codes [10]. The connection of these codes with polynomial arithmetic allows for encoding and decoding in time O(n log 2 n log log n). (See, [2, Chapter 11.7] and [10, p. 369]). However, these codes do not reach the capacity of the erasure channel, since there is no inﬁnite sequence of such codes over a ﬁxed ﬁeld. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 65–76, 1999. c SpringerVerlag Berlin Heidelberg 1999
66
M. Amin Shokrollahi
Elias [3] showed that a random linear code can be used to transmit over the erasure channel at any rate R < 1 − p, and that encoding and decoding can be accomplished with O(n2 ) and O(n3 ) arithmetic operations, respectively. Hence, we have on the one hand codes that can be encoded and decoded faster than general linear codes, but do not reach the capacity of the erasure channel; and on the other hand we have random codes which reach the capacity but have encoding and decoding algorithms of higher complexity. The paper [1] was the ﬁrst to design codes that could come arbitrarily close to the channel capacity while having linear time encoding and decoding algorithms. Improving these results, the authors of [8] took a diﬀerent approach and designed fast lineartime algorithms for transmitting just below channel capacity. For all > 0 they were able to produce rate R = 1 − p(1 + ) codes along with decoding algorithms that could recover from the random loss of a p fraction of the transmitted symbols in time proportional to n ln(1/) with high probability, where n is the block length. These codes could also be encoded in time proportional to n ln(1/). They belong to the class of lowdensity parity check codes of Gallager [4]. In contrast to Gallager codes, however, the graphs used to construct the asymptotically good codes obtained in [8] are highly irregular. The purpose of the present paper is twofold. First, we prove a general tradeoﬀ theorem between the proximity of a given Gallager code to the channel capacity in terms of the loss fraction and the running time of the recovery algorithm of [8]. We show that in this respect, the codes constructed in that paper are close to optimal. Next, we exhibit a diﬀerent sequence of asymptotically close to optimal codes which have better parameters than the codes in [8]. An interesting feature of these codes is that the underlying bipartite graphs are right regular, i.e., all nodes on the right hand side of the graph have the same degree. Since they are theoretically better than their peers, we expect them to also perform better in practice. The organization of the paper is as follows. In the next section we will review the construction of Gallager codes. Next, we prove upper bounds on the maximum tolerable loss fraction in terms of the running time of the decoding algorithm. The last two sections are concerned with the derivation of the sequence of right regular erasure codes.
2
Codes from Bipartite Graphs
In this section, we will brieﬂy review the class of codes we are interested in, and the erasure recovery algorithm associated to them. Our codes are similar to the Gallager codes [4] in that they are built from sparse bipartite graphs. In contrast to Gallager codes, however, our codes will be constructed from graphs that have a highly irregular degree pattern on the left. Let G be a bipartite graph with n nodes on the left and n − k nodes on the right. G gives rise to a binary code of blocklength n and dimension ≥ k in the
New Sequences of Erasure Codes
67
following way: let the adjacency matrix of the graph G be given as H 0 , A= H 0 where H is some (n − k) × n matrix describing the connections in the graph. The code deﬁned by the graph is the code with parity check matrix H. A diﬀerent way of describing the code is as follows: we index the coordinate positions of the code with the n nodes on the left hand side of the graph. The code consists of all binary vectors (c1 , . . . , cn) such that for each right node in the graph the sum of the coordinate places adjacent to it equals zero. The blocklength of this code equals n, and its dimension is at least k since we are imposing n − k linear conditions on the coordinates of a codeword. Expressed in terms of the graph, the fraction of redundant symbols in a codeword is at most aL /aR where aL and aR are the average node degrees on the left and the right hand side of the graph, respectively. In other words, the rate of the code is at least 1 − aL /aR . This description of the rate will be useful in later analysis. In the following, we will assume that the rate is in fact equal to this value. This is because the statements we will prove below will become even stronger if the rate is larger. The above construction needs asymptotically O(n2 ) arithmetic operations to ﬁnd the encoding of a message of length k, if the graph is sparse. One can apply a trick to reduce the running time to O(n) by a modiﬁcation of the construction. Details can be found in [8]. Suppose now that a codeword (c1 , . . . , cn ) is sent and that certain erasures have occurred. The erasure recovery algorithm works as follows. We ﬁrst initialize the contents of the right hand nodes of G with zero. Then we collect the nonerased coordinate positions, add their value to the current value of their right neighbors, and delete the left node and all edges emanating from it from the graph. After this stage, the graph consists of the erased nodes on the left and the edges emanating from these nodes. In the next step we look for a right node in the graph of degree one, i.e., a node that has only one edge coming out of it. We transport the value of this node to its unique left neighbor , thereby recovering the value of c . We add c to the current value of all the right neighbors of , delete the edges emanating from , and repeat the process until we cannot ﬁnd a node of degree one on the right, or until all nodes on the left have been recovered. It is obvious that, on a RAM with unit cost measure, the amount of arithmetic operations to ﬁnish the algorithm is at most proportional to the number of edges in the graph, i.e., to naL , where aL is the average node degree on the left. The aim is thus to ﬁnd graphs with constant aL for which the recovery algorithm ﬁnishes successfully. The main contribution of [8] was to give an analytic condition on the maximum fraction of tolerable losses in terms of the degree distribution of the graph. More precisely, deﬁne the left and the right degree of an edge in the graph as the degree of the left, resp. right node it is emanating from. Further, denote by λi and ρi the fraction of edges degree i, and consider the of left, resp. right generating functions λ(x) := i λi xi−1 and ρ(x) := i ρi xi−1 . In the following we will call the pair (λ, ρ) a degree distribution.
68
M. Amin Shokrollahi
Table 1. Rate 1/2 codes built from heavy tail/Poisson distribution. N 8 16 27 47 79 132 221
aR 5.9266 7.0788 8.0054 9.0256 10.007 10.996 12.000
θ 5.9105 7.0729 8.0027 9.0243 10.007 10.996 12.000
δ/(1 − R) 0.91968 0.95592 0.97256 0.98354 0.98990 0.99378 0.99626
δ 0.45984 0.47796 0.48628 0.49177 0.49495 0.49689 0.49813
δˆ 0.49085 0.49609 0.49799 0.49902 0.49951 0.49975 0.49988
δ/δˆ 0.93682 0.96345 0.97648 0.98547 0.99087 0.99427 0.99650
The main theorem of [8] states that if the graph is chosen at random with degree distribution (λ, ρ) and if the erasures occur at random positions, then the above erasure recovery algorithm can correct a δfraction of losses if ρ(1 − δλ(x)) ≥ 1 − x for x ∈ [0, 1]. In a later paper [6], this condition was slightly relaxed to δλ (1 − ρ(1 − x)) < x
for x ∈ (0, δ].
(1)
The paper [8] further exhibited for any > 0 and any rate R an inﬁnite sequence of degree distributions (λ, ρ) giving rise to codes of rate at least R such that the above inequality is valid for δ = (1−R)(1−), and such that the average left degree of the graph is O(log(1/)). In other words, δ can get arbitrarily close to its optimal value (1 − R) with a “logarithmic” sacriﬁce in the running time of the algorithm. Explicitly, for any given 0 < < 1, one chooses an integer N close to 1/ and considers the pair (λN , ρN ) with λN (x) =
N −1 k 1 x , H(N − 1) k
ρN (x) = eθN ·(x−1),
(2)
k=1
where θN is chosen to make the fraction of the nodeson the left and the right N −1 equal to 1 − R, and H(N − 1) is the harmonic sum k=1 1/k. This sequence is referred to as the heavy tail/Poisson sequence in the following. Table 1 gives an example of the performance of this sequence for some codes of rate 1/2. In that table, aR denotes the average right degree, δ is the maximum tolerable loss fraction, and δˆ is a theoretical upper bound on δ, see Remark 1 following Corollary 1. In the following we will show that this sort of relationship between the average degree and the maximum tolerable loss fraction is the best possible. Furthermore, we will exhibit a new sequence of degree distributions for which the same type of relationship holds between the average degree and the maximum tolerable loss fraction.
New Sequences of Erasure Codes
3
69
Upper Bounds for the Maximum Tolerable Loss Fraction
In this section we will prove some upper bounds on the maximum tolerable loss fraction δ for which the algorithm given in the previous section is successful when applied to a random graph with a given degree distribution. Our main tool will be the inequality (1). We will ﬁrst need a preliminary result. Lemma 1. Let G be a bipartite graph with edge degree distribution (λ, ρ). Then, the average left, resp. right, node degree of G equal 1 0
1 λ(x)dx
,
1 0
1 ρ(x)dx
,
respectively. Let the polynomials Λ and R be deﬁned by x x λ(t)dt ρ(t)dt 0 , R(x) := 01 . Λ(x) := 1 λ(t)dt ρ(t)dt 0 0 Then the coeﬃcient of xi in Λ(x), resp. R(x) equals the fraction of nodes of degree i on the LHS, resp. RHS of G. Proof. Let L denote the number of nodes on the LHS of G. Then the number of edges in G equals aL L, and the number of edges of degree i equals λi aL L. Hence, the number of nodes of degree i on the LHS of the graph equals λi aL L/i, since each such node contributes to exactly i edges of degree i. Hence, the fraction of nodes of degree i equals λi aL /i. Since the sum of all these fractions equals 1, we 1 obtain a−1 L = i λi /i = 0 λ(x)dx and this yields both assertions for the LHS of the graph. The assertions on the RHS follow similarly. The following theorem shows a rather strong upper bound on δ. Theorem 1. Let λ and ρ denote the right and left edge degree distributions of a bipartite graph. Let δ be a positive real number such that δλ (1 − ρ(1 − x)) ≤ x for 0 < x ≤ δ. Then we have δ≤
aL (1 − (1 − δ)aR ) , aR
where aL and aR are the average node degrees of the graph on the left, resp. right, hand side. Proof. As a real valued function the polynomial λ(x) is strictly increasing for positive values of x. Hence it has a unique inverse λ−1 which is also strictly increasing. The ﬁrst of the above inequalities is thus equivalent to 1 − ρ(1 − x) ≤ λ−1 (x/δ)
70
M. Amin Shokrollahi
for 0 < x ≤ δ. Integrating both sides from 0 to δ and simplifying the expressions, we obtain 1−δ 1 1 ρ(x)dx − ρ(x)dx ≥ δ λ(x)dx. (Note that have
1 0
0
λ−1 (x)dx = 1 −
aL δ≤ − aL aR
1−δ 0
1 0
0
0
λ(x)dx.) Invoking the previous lemma, we thus
aL ρ(x)dx = aR
1−δ
1− 1 0
0
ρ(x)dx
ρ(x)dx
=
aL (1 − R(1 − δ)) , aR
where the polynomial R is deﬁned as in the statement of Lemma 1. To ﬁnish the proof we only need to show that R(1 − δ) ≥ (1 − δ)aR . To see this, ﬁrst note that if a1 , . . . , aM are nonnegative real numbers adding up to 1 and µ is any nonnegative real number, then ai µi ≥ µ i iai . (3) i
This follows from taking logarithms of both sides and noting that the logfunction is concave. Denoting by ai the coeﬃcients of R(x) and setting µ := 1 − δ, this gives our desired inequality: the ai are by the previous lemma the fractions of nodes of degree i, and hence i iai is the average right degree aR . Corollary 1. With the same assumptions as in the previous theorem, we have δ≤
aL (1 − (1 − aL /aR )aR ) . aR
Proof. Follows from δ ≤ aL /aR .
Remark 1. A more reﬁned upper bound for δ is given by the following. Let r denote the quantity aL /aR , i.e., one minus the rate of the code. Suppose that aR > 1/r—an assumption that is automatically satisﬁed if the average left degree is larger than one—and consider the function f(x) = x−r(1−(1−x)aR ). We have f(0) = 0, f(1) = 1 − r > 0, f (0) < 0, and f (x) has exactly one root. Hence, f has exactly one nonzero root, denoted by δˆaR ,r and this root is between 0 and 1. The previous theorem asserts that the maximum δ satisfying the inequality (1) is smaller than δˆaR ,aL /aR . The following deﬁnition is inspired by Corollary 1. Definition 1. A sequence (λm , ρm ) of degree distributions giving rise to codes of rate R ≥ R0 for a ﬁxed R0 > 0 is called asymptotically quasioptimal if there exists a constant µ (possibly depending on R) and for each m there exists a positive δm with δm λm (1 − ρm (1 − x)) ≤ x for x ∈ (0, δm ] such that the average right degree aR satisﬁes aR ≤ µ log(1 − δ/(1 − R)).
New Sequences of Erasure Codes
71
The signiﬁcance of quasioptimal is the following: we want to construct codes that can recover as many erasures as possible in as little as possible time. The running time of the erasure recovery algorithm is proportional to the blocklength of the code times aR , the average right degree. Hence, we are looking for a tradeoﬀ between aR and δ. Corollary 1 shows that we cannot make aR too small. In fact, it implies that aR ≥ log(1 − δ/(1 − R))/ log R. However, we might hope that we can make aR smaller than log(1 − δ/(1 − R)) times a (negative) constant µ. In this way, we have maintained a qualitative optimality of the code sequence in the sense that the running time increases only logarithmically with the relative increase of the erasure correction capability. The heavy tail/Poisson sequence (2) is asymptotically quasioptimal as is shown in [8]. Starting in the next section, we will introduce another quasioptimal sequence which has certain advantages over the heavy tail/Poisson sequence. We close this section by stating another useful upper bound on the maximum possible value of δ. Lemma 2. Suppose that λ and ρ are polynomials and δ is such that δλ(1−ρ(1− x)) < x for x ∈ (0, δ]. Then δ ≤ ρ (1)/λ (0). Proof. Let f(x) = δλ(1 − ρ(1 − x)) − x. By assumption, f(x) < 0 for x ∈ (0, δ]. This implies that f (0) ≤ 0, where f (x) is the derivative of f with respect to x. But f (0) = δλ (0)ρ (1) − 1, which gives the assertion.
4
Fractional Binomial Coeﬃcients
As can be seen from their description (2), the heavy tail/Poisson sequence is closely related to the Taylor series of − ln(1 − x). The new sequence that we will describe in the next section will be related to the Taylor expansion of (1 − x)α where 1/α is an integer. The coeﬃcients of this expansion are fractional binomial coeﬃcients. For this reason, we will recall some wellknown facts about these numbers. For real α and a positive integer N we deﬁne α α(α − 1) · · · (α − N + 1) := N N! α α
α 1− ··· 1 − (1 − α) . = (−1)N −1 N N −1 2 For convenience, we also deﬁne α0 := 1. We have the Taylor expansion ∞ α α 1 − (1 − x) = (4) (−1)k+1 xk . k k=1
Note that for 0 < α < 1 the coeﬃcients of the right hand power series above are all positive. Furthermore, we have N −1 α N α (−1)N +1 , (−1)k+1 = 1 − (5) α N k k=1
72
and
M. Amin Shokrollahi N −1 k=1
α (−1)N +1 α− N α (−1)k+1 = , k+1 α+1 k
(6)
as can easily be proved by induction. In the following, we will derive some estimates on the size of the binomial coeﬃcients. Proposition 1. Let α be a positive real number less than 1/2 and let N ≥ 2 be an integer. There is a constant c independent of α and N such that α α cα ≤ (−1)N +1 ≤ α+1 . α+1 N N N α N +1 Proof. First note that N (−1) i is positive. Taking its logarithm and expanding the series − ln(1 − x) = i x /i, we obtain N N 1 α2 1 α N +1 (−1) ln = ln(α/N ) − α − ···. − k 2 k2 N k=1
(7)
k=1
(Note that the series involved are absolutely convergent, so that we can rearrange the orders of the summations.) For an upper bound on this sum we replace N s k=1 1/k by 1 for s ≥ 2 and use the left hand side of the inequality ln(N ) + γ <
N 1 1 < ln(N ) + γ + , k 2N k=1
where γ is Euler’s constant [5, pp. 480–481]. For the lower bound we use the N right hand side of the above inequality and replace k=1 1/k s for s ≥ 2 by 2. This gives, after exponentiation, α α α α(2−γ−1/2N ) 2 e (1 − α) ≤ (−1)N +1 ≤ α+1 eα(1−γ) (1 − α). N N α+1 N Noting that 0 < α ≤ 1/2, we obtain our assertion.
5
Right Regular Sequences
A closer look at the proof of Theorem 1 reveals the following: one of the reasons we might obtain a lower than optimal upper bound for the largest δ satisfying (1) is that the inequality (3) is not sharp. In fact, that inequality is sharp iﬀ all the ai except for one are zero, i.e., iﬀ ρ(x) has only one nonzero coeﬃcient, i.e., iﬀ the graph has only one degree on the right hand side. We call such graphs right regular in the following. We will study in this section right regular degree distributions and design the left hand side in such a way as to obtain asymptotically quasioptimal sequences. We remark that right regular sequences
New Sequences of Erasure Codes
73
were previously studied in an unpublished manuscript [9]. Our approach diﬀers however from the one given in that paper. From a theoretical point of view, codes obtained from these sequences should perform better than the heavy tail/Poisson distribution, since they allow for a larger lossfraction for a given rate and a given average left degree. Furthermore, the analysis given in [6] suggests that the actual performance of the code is related to how accurately the neighborhood of a message node is described by a tree given by the degree distributions. For instance, the performance of regular graphs is much more sharply concentrated around the value predicted by the theoretical analysis, than the performance of irregular graphs. Hence, if the graph is right regular, one should expect a smaller variance in the actual performance than for corresponding irregular codes. For integers a ≥ 2 and N ≥ 2 we deﬁne N −1 α (−1)k+1 xk a−1 ρa (x) := x , λα,N (x) := α k=1 kα , (8) α − N N (−1)N +1 where here and in the following we set α := 1/(a − 1). Note that ρa (1) = 1, that λa,N (1) = 1 by (5), and that λa,N has positive coeﬃcients. Hence, (ρa , λa,N ) indeed deﬁnes a degree distribution. Let ν < 1 be a positive constant. The sequence we are interested in is given by the following degree distributions: ρa (x),
λa,ν −1/α (x).
(9)
First we consider the rate of the codes deﬁned by these distributions. Proposition 2. The rate R = Ra,µ of the code deﬁned by the distributions in (9) satisﬁes 1−ν 1 − cν ≤1−R≤ , 1 − cνν 1/α 1 − νν 1/α where c is the constant from Proposition 2. Proof. In the following, we denote by N the quantity ν −1/α and by λ(x) the function λa,N (x). We need to compute the integral of λ(x). We have α 1 (−1)N +1 α− N α α λ(x)dx = . α + 1 α − N N (−1)N +1 0 1 Further, 0 ρa (x)dx = 1/a = α/(α + 1). Hence, the rate of the code is α α−N N (−1)N +1 α 1− := 1 − rα,N . (10) α − N (−1)N +1 Next, we estimate rα,N using Proposition 2 again: 1−ν 1 − cν ≤ rα,N ≤ . 1 − cνν 1/α 1 − νν 1/α This gives the desired assertion on the rate.
74
M. Amin Shokrollahi
Theorem 2. The sequence of degree distributions given in (9) is asymptotically quasioptimal. Proof. Let us compute the maximum value of δ such that δλ(1 − ρ(1 − x)) < x for x ∈ (0, δ), where (λ, ρ) is the pair given in (9). We have δα α (1 − ρ(1 − x)α ) α − N N (−1)N +1 δα α = x. α−N N (−1)N +1
δλ(1 − ρ(1 − x)) <
So, for δλ(1 − ρ(1 − x)) < x we need δ≤
α−N
α
(−1)N +1 . α
N
(11)
This is, by the way, the same upper bound as the one obtained from Lemma 2. In the following we assume that δ is equal to the above upper bound, and compute δ/(1 − R), R being the rate of code estimated in Proposition 2: α α− N (−1)N +1 δ 1 = ≥ 1 − α+1 . 1−R α N Ignoring diophantine constraints, we assume in the following that N = ν −1/α . This gives δ ≥ 1 − ν (α+1)/α = 1 − ν aR , 1−R where aR = a = (α + 1)/α is the average right degree of the code. This proves the assertion. In practical situations, one is interested in designing a code of a given ﬁxed rate. Since the parameter α in the deﬁnition of the sequence in (9) is the inverse of an integer, the range of this parameter is discrete. Hence, we do not have a continuous range of rates for these codes. However, we can come arbitrarily close to our desired rate by making α smaller, i.e., by allowing high degrees on the RHS of the graph. Some examples for diﬀerent target rates are given in Table 2. The value of N has been handoptimized in these examples to come as close to the desired rate as possible. The last column in that table corresponds to the value δˆ deﬁned in Remark 1 following Corollary 1. It gives a theoretical upper bound on the best value of δ. As can be observed from the table, the maximum value of δ converges very quickly to the maximum possible value. Also, a comparison between these codes and the heavy tail/Poisson distribution in Table 1 reveals that the new codes are better in terms of the tradeoﬀ between δ and aR . However, the new codes need larger degrees on the left than the heavy tail/Poisson distribution. To obtain codes that have a ﬁxed rate, one can modify α slightly. Such examples are given in Table 3. Both parameters (α and N ) of these codes are handoptimized to give the best results.
New Sequences of Erasure Codes
75
Table 2. Right regular sequences for rates R close to 2/3, 1/2, and 1/3. N 2 3 6 11 17 27 42 64 13 29 60 12. 257 523 1058 111 349 1077 3298
6
aR 6 7 8 9 10 11 12 13 6 7 8 9 10 11 12 6 7 8 9
1−R 0.33333 0.31677 0.32886 0.33645 0.33357 0.33392 0.33381 0.33312 0.50090 0.50164 0.49965 0.49985 0.50000 0.50002 0.49999 0.66677 0.66667 0.66663 0.66669
δ/(1 − R) 0.60000 0.74537 0.88166 0.93777 0.96001 0.97502 0.98401 0.98953 0.96007 0.98251 0.99159 0.99598 0.99805 0.99904 0.99953 0.99698 0.99904 0.99969 0.99990
δ 0.20000 0.23611 0.28994 0.31551 0.32024 0.32558 0.32847 0.32963 0.48090 0.49287 0.49545 0.49784 0.49903 0.49954 0.49975 0.66475 0.66603 0.66642 0.66662
δˆ 0.29099 0.28714 0.31243 0.32690 0.32724 0.32984 0.33113 0.33134 0.49232 0.49759 0.49762 0.49885 0.49951 0.49977 0.49986 0.66584 0.66636 0.66653 0.66665
δ/δˆ 0.68731 0.82230 0.92801 0.96514 0.97860 0.98711 0.99197 0.99484 0.97679 0.99052 0.99563 0.99797 0.99904 0.99953 0.99977 0.99837 0.99950 0.99984 0.99995
Conclusions and Open Questions
In this paper we have analyzed the theoretical performance of a simple erasure recovery algorithm applied to Gallager codes by deriving upper bounds on the maximum fraction of tolerable losses given a parameter that describes the running time for encoding and decoding of the codes. We have shown that there is a tradeoﬀ between proximity to the optimal value of tolerable losses, i.e., one minus the rate of the code, and the average degree of nodes in the graph, in the sense that multiplying the average degree by a constant factor implies an exponential relative increase of the maximum tolerable loss fraction. Further, we have introduced a new sequence of graphs which are asymptotically close to optimal with respect to this criterion. These graphs are right regular and their node degree distribution on the left hand side is closely related to the power series expansion of (1 − x)α , where α is the inverse of an integer. Previously, the only known such sequence was the heavy tail/Poisson sequence introduced in [8]. We have included examples which show that the new codes tolerate a higher fraction of losses if the average degree of the graph is ﬁxed. It would be very interesting to extend the analysis given in this paper to other decoding algorithms, e.g., to the simple decoding algorithm of Gallager for error correcting codes [4,7]. Such an analysis would probably give clues for constructing inﬁnite sequences of graphs that perform asymptotically optimally with respect to the decoding algorithm in question.
76
M. Amin Shokrollahi
Table 3. Rate 1/2 codes, ρ(x) = xaR −1 , λα,N (x) as deﬁned in (8) for arbitrary α. N 11 27 59 124 256 521 1057
α 0.17662 0.16412 0.14225 0.12480 0.11112 0.09993 0.09090
aR 6 7 8 9 10 11 12
δ/(1 − R) 0.95982 0.98190 0.99142 0.99596 0.99800 0.99896 0.99950
δ 0.47991 0.49095 0.49571 0.49798 0.49900 0.49948 0.49975
δˆ 0.49134 0.49586 0.49798 0.49901 0.49951 0.49975 0.49988
δ/δˆ 0.97673 0.99009 0.99543 0.99794 0.99898 0.99945 0.99974
References 1. N. Alon and M. Luby. A linear time erasureresilient code with nearly optimal recovery. IEEE Trans. Inform. Theory, 42:1732–1736, 1996. 2. R.E. Blahut. Theory and Practice of Error Control Codes. Addison Wesley, Reading, MA, 1983. 3. P. Elias. Coding for two noisy channels. In Information Theory, Third London Symposium, pages 61–76, 1955. 4. R. G. Gallager. Low Density ParityCheck Codes. MIT Press, Cambridge, MA, 1963. 5. R.L. Graham, D.E. Knuth, and P. Patashnik. Concrete Mathematics. AddisonWesley, 1994. 6. M. Luby, M. Mitzenmacher, and M.A. Shokrollahi. Analysis of random processes via andor tree evaluation. In Proceedings of the 9th Annual ACMSIAM Symposium on Discrete Algorithms, pages 364–373, 1998. 7. M. Luby, M. Mitzenmacher, M.A. Shokrollahi, and D. Spielman. Analysis of low density codes and improved designs using irregular graphs. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pages 249–258, 1998. 8. M. Luby, M. Mitzenmacher, M.A. Shokrollahi, D. Spielman, and V. Stemann. Practical lossresilient codes. In Proceedings of the 29th annual ACM Symposium on Theory of Computing, pages 150–159, 1997. 9. M. Luby and M.A. Shokrollahi. Right regular erasure codes. Unpublished, 1998. 10. F.J. MacWilliams and N.J.A. Sloane. The Theory of ErrorCorrecting Codes. NorthHolland, 1988.
On the Theory of LowDensity Convolutional Codes Karin Engdahl, Michael Lentmaier, and Kamil Sh. Zigangirov Dept. of Information Technology, Lund University, Box 118, S22100, Sweden {karin,michael,kamil}@it.lth.se
Abstract. We introduce and analyze a new statistical ensemble of lowdensity paritycheck convolutional (LDC) codes. The result of the analysis are bounds, such as a lower bound for the free distance and upper bounds for the burst error probability of the LDC codes.
1
Introduction
In the past few years there has been an enormous interest in lowdensity convolutional (LDC) codes, combined with iterative decoding algorithms, due to that such systems have been proved to achieve low bit error probabilities even for signaltonoise ratios close to the channel capacity. The most well known results in this direction are the simulation results obtained by Berrou et al. [1] for the so called turbocodes. Lowdensity paritycheck block codes were introduced in the 60s by Gallager [2]. The generalization of Gallager’s codes to lowdensity convolutional codes was presented in [3]. Our earlier work in this area [4,5,6] focused on the mathematical description of, and a general construction method for LDC codes, together with an iterative decoding algorithm for these codes. In this paper we introduce a new statistical model of LDC codes, that can be investigated analytically. The analytic approach, that we take, is made in terms of statistical ensemble analysis, and the resulting theoretical tools are bounds on the free distance, and on the average bit and burst error probabilities of the ensemble considered, if a maximum likelihood sequence estimator were used in the decoder. We model the statistical ensemble through the use of a device that we call a Markov scrambler, and which is described in the paper. The performance analysis of the ensemble is then reduced to the calculation of a reﬁned average weight distribution through the study of a Markov process. To illustrate the method, we perform the analysis on some diﬀerent classes of LDC codes, and compare the results.
2
Code Description
To introduce the ensemble of LDC codes, we modify some of the deﬁnitions in [6]. A rate R = b/c LDC code is a convolutional code having a sparse syndrome former matrix H T , where H is the paritycheck matrix and T denotes transposition. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 77–86, 1999. c SpringerVerlag Berlin Heidelberg 1999
78
Karin Engdahl, Michael Lentmaier, and Kamil Sh. Zigangirov
If each row of H T contains ν ones and each column contains ν/ (1 − R) ones, the LDC code is called a homogeneous (ν,ν/ (1 − R))code. The code sequence v = . . . , v −1 , v 0 , v 1 , . . . , v t , . . . , where v t = vtc , . . . , v(t+1)c−1 , vi ∈ GF (2), i ∈ Z, satisﬁes the equality vH T = 0. This equation can be written in recursive form [7] as ms (t)
v t−i HiT (t) = 0 ,
(1)
i=0
where HiT (t), i = 0, 1, . . . , ms (t) are binary c × (c − b) submatrices, such that T H0T (t) has full rank, and Hm (t) = 0. Even though the syndrome former s (t) memory ms is ﬁxed in practice, we will suppose that it is a random variable depending on t. Equation (1) deﬁnes the code subblock v t if subblocks v t−1 , v t−2 , . . . , v t−ms (t) are known. We will suppose that the ﬁrst b symbols of v t coincide with the tth subblock of information symbols, ut = utb , . . . , u(t+1)b−1 , ui ∈ GF (2), at the encoder input, i.e. the encoder is systematic. In [6] a turbocode was deﬁned as a LDC code whose syndrome former matrix H T can be represented as the product of a convolutional scrambler S and the syndrome former matrix of a basic (component) code HbT , i.e. H T = SHbT . A convolutional scrambler S is deﬁned as an inﬁnite matrix S = (sij ), i, j ∈ Z, sij ∈ {0, 1}, that has one one in each column and at least one one in each row, and that satisﬁes the causality condition. If all rows have the same weight η (number of ones), then the scrambler is called homogeneous. We also consider semihomogeneous scramblers, when all even rows of H T have weight η, all odd rows have weight η − 1, and all columns have equal weight. An alternative deﬁnition of a scrambler is that it is a device that maps input ctuples onto output dtuples, d > c, by permuting the input symbols and making copies of (some of) them. The ratio d/c is called the scrambler rate Rs . The rate R of the turbocode is R = 1 − Rs (1 − Rb ), where Rb is the rate of the basic code. The memory M of the scrambler is deﬁned as the number of symbols that the decoder keeps in its memory. We have studied two classes of turbocodes, A and B. In class A, a rate Rs = d/c convolutional scrambler is followed by a rate Rb = (d − 1) /d degenerated component convolutional encoder of memory zero. (It calculates one paritycheck symbol to d − 1 input symbols.) In class B, a rate Rs = d/c convolutional scrambler is followed by a rate Rb = (d − c + b) /d component convolutional encoder. To simplify the description in this paper we consider only rate R = 1/2 LDC codes.
3
Statistical Ensemble of TurboCodes
To deﬁne a statistical ensemble of turbocodes, we ﬁrst introduce a statistical ensemble of rate Rs = d/c (c = 2) memory M Markov scramblers. The simplest way to describe this ensemble of scramblers is to represent the scrambler as a
On the Theory of LowDensity Convolutional Codes
79
box that contains M bits.1 Initially the box is ﬁlled with M dummy zeros. When the information subblock ut of b = 1 information bits enters the LDC encoder input, the encoder randomly picks d − c = c(Rs − 1) = 2(Rs − 1) bits from the box and replaces them by d − c = d − 2 new bits, generated by the encoder. For example, in case A the basic encoder picks, at each time instant t, 2(Rs − 1) bits (Rs − 1 is a positive integer) from the box and the information bit ut and calculates the paritycheck symbol of subblock v t . Then (Rs − 1) copies of both 2 bits of subblock v t are put into the box to replace the 2(Rs − 1) bits, picked before. In case B, the rate Rb = 2/3 (d = 3) component encoder picks, at each time instant t, one bit from the box and replaces it by the new information bit ut . The input of the component encoder is the information bit and the bit picked from the box. The output is the information bit and the paritycheck bit. Let at (l, d) be the mathematical expectation, in the ensemble of turbocodes, of the number of weight d paths that depart from the allzero path at time instant t and merge again with the allzero path at time (t + l). Since the statistical properties of the ensemble do not depend on t, we will skip the index t and consider the case t = 0. To calculate the average spectrum a(l, d) we can use recurrent equations. Example 1. (Class A, rate R = 1/2 LDC (2, 4)code) This code can be represented as a turbocode with a rate Rs = 4/2 (d = 4) convolutional scrambler. Let µ (µ is even), the number of ones in the scrambler, be the scrambler state at time instant t. To the allzero path corresponds the sequence of zero states. To a path which departs from the allzero path at moment t = 0 and merges again with the allzero path at moment t = l corresponds a sequence of states µ0 , µ1 , µ2 , . . . , µl , where µ0 = µl = 0 and µi = 0, 0 < i < l. Let f(µ, l, d) be the mathematical expectation of the number of paths of weight d, which start at moment t in state µ and reach state zero for the ﬁrst time at moment t + l. Since any path departing from the allzero path has to come to state µ = 2 ﬁrst, and since the transition from the zero state to state µ = 2 has weight 2, a(l, d) = f(2, l − 1, d − 2) .
(2)
The statetransition diagram of the encoder is presented in Fig. 1. All branches depart from state µ. They are labeled by the probability of the corresponding transition and the generated code subblock v t . From state µ the scrambler can come to the states µ, µ + 2 or µ − 2, as illustrated by the arrows in the ﬁgure, and the weight of the generated subblock can be 0, 1 or 2. For example, the lower right branch corresponds to the transition from state µ to state µ + 2 when the encoder generates the subblock v t = 11. This happens when the basic encoder picks two zeros from the box (with probability (M −µ)(M −µ−1) ) and a nonzero information symbol enters the encoder. Using M (M −1) this statetransition diagram we can get the following system of recurrent equa1
A more formal definition of Markov scramblers will be given in [8].
80
Karin Engdahl, Michael Lentmaier, and Kamil Sh. Zigangirov
−µ−1) ( (M −µ)(M , 00) M (M −1)
( Mµ(µ−1) , 11) (M −1) µ
(M −µ)µ (2 M , 10) (M −1)
(M −µ)µ , 01) (2 M (M −1)
µ−2 µ+2
( Mµ(µ−1) , 00) (M −1)
−µ−1) ( (M −µ)(M , 11) M (M −1)
Fig. 1. Statetransition diagram of the encoder in Example 1. tions (M − µ)(M − µ − 1) (f(µ, l − 1, d) + f(µ + 2, l − 1, d − 2)) M(M − 1) (M − µ)µ f(µ, l − 1, d − 1) (3) +4 M(M − 1) µ(µ − 1) (f(µ − 2, l − 1, d) + f(µ, l − 1, d − 2)) , µ = 0, M, l = 0 + M(M − 1) 1, l = 0, d = 0 f(0, l, d) = (4) 0, otherwise
f(µ, l, d) =
f(M, l, d) = f(M, l − 1, d − 2) + f(M − 2, l − 1, d) .
(5)
The recurrent solution of (3)(5) gives the average spectrum a(l, d). The generalization to other codes of case A (Rs = 2.5, 3, 4) is straightforward. Example 2. (Class B, rate R = 1/2 turbocode with a timeinvariant convolutional, rate Rb = 2/3, component code of syndrome former memory 2) Consider the case when the submatrices of the paritycheck matrix of the timeinvariant component code are given as H0 = 111,
H1 = 101,
H2 = 011
(6)
and the ﬁrst symbol in subblock v t = v3t , v3t+1 , v3t+2 is the information symbol ut , entering the turboencoder, v3t+1 is the bit that the component encoder picks from the scrambler box and v3t+2 is the paritycheck symbol generated by the component encoder. The syndrome former of the component encoder can be in
On the Theory of LowDensity Convolutional Codes
81
one of 4 states s, s = 0, 1, 2, 3 (0 00, 1 01, 2 10, 3 11). Together with the state of the scrambler µ, i.e. the number of ones within the box, the state s of the syndrome former of the component code deﬁnes the state of the turboencoder. By deﬁnition, the zero state corresponds to µ = 0, s = 0. Let f(µ, s, l, d) be the mathematical expectation of the number of paths of weight d, which start at some moment t in state (µ, s) and reach the zero state for the ﬁrst time at moment t + l. Since any path departing from the allzero path has to come to state (1, 1) ﬁrst, and since the transition from the zero state to this state has weight 2, the average spectrum of the turbocode is a(l, d) = f(1, 1, l − 1, d − 2) .
(7)
The statetransition diagram of the encoder is presented in Fig. 2. The branches are labeled analogously to Example 1. We emphasize, that all branches depart from state (µ, ·) and end in the state that the arrows in the ﬁgure point to. −µ ( MM ,00)
µ1 µ (M −µ ,11) ( MM
0
µ
µ+1
,01)
µ (M ,11) µ (M ,01)
−µ ( MM ,10) −µ ,00) ( MM
µ+1 µ (M
,10) µ
1 µ1 µ (M
−µ ,10) ( MM
µ+1
,00) µ ,10) (M
−µ ( MM ,01)
µ1 µ 2
µ (M ,11)
−µ ( MM ,11)
µ+1
3
µ
−µ ,01) ( MM
µ1 µ (M ,00)
Fig. 2. Statetransition diagram of the encoder in Example 2. Using this statetransition diagram we can obtain a system of recurrent equations, analogous to system (3)(5). The solution of this system gives the average spectrum a(l, d). The generalization to other codes of class B (with component codes of syndrome former memory 3,4 and 5) is straightforward.
4
Ensemble Analysis and Simulation Results
Ensemble analysis is widely used in coding theory. Often it can be hard to analyze a speciﬁc code, but easier to analyze the average behavior of a collection
82
Karin Engdahl, Michael Lentmaier, and Kamil Sh. Zigangirov
of codes. Obviously, at least one of the codes in the ensemble performs better than, or the same as, the average code. The solution of the systems of recurrent equations describing the behavior of the ensembles of turbocodes of classes A and B gives us the average spectrum of each ensemble a (l, d). Actually, in this paper we are more interested in the reﬁned average spectrum a (d) l a (l, d). We get this spectrum if we leave out the argument l in the system of recurrent equations. It is obvious that if we ﬁnd dˆ = dˆ(π), 0 ≤ π < 1, such that ˆ d−1
a (d) < 1 − π ,
(8)
d=1
at least a fraction π of the codes in the ensemble has free distance dfree not less ˆ The calculation of dˆ for diﬀerent codes gives us a Costellotype lower than d. bound for the free distance of diﬀerent codes. In Fig. 3 lower bounds for the free distance of some diﬀerent codes are given as a function of the scrambler size M. It is worth to note that for the LDC (2,4) and (2.5,5)codes of class A (the latter ones have a semihomogeneous scrambler with three ones in even rows and two ones in odd rows) and for the codes of class B, the bound grows logarithmically with M. For the LDC (3,6)code of class A it grows linearly.
90
80
70
60
d
free
50
40
30
20
10
0
500
1000
1500
2000
2500 3000 scrambler size M
3500
4000
4500
5000
Fig. 3. Lower bounds on the free distance. The dashed lines correspond to (from bottom to top) (2,4), (2.5,5) and (3,6) codes of class A. The solid lines correspond to class B codes with memory 2,3,4 and 5.
On the Theory of LowDensity Convolutional Codes
83
Using the average spectrum a (d) we can easily get an upper bound for the average burst error probability P¯B . In fact, if the transmission is over an additive white Gaussian noise (AWGN) channel with signaltonoise ratio Es /N0 , then P¯B < a (d) Dd , (9) d
where D = exp (−Es /N0 ). We note that the bound (9) can be calculated by directly solving the system of equations, recurrent in µ, with respect to the function f (µ, l, d) Dd . (10) FD (µ) = l
d
The upper bound (9) for the burst error probability can be improved if we expurgate bad codes from the ensemble, i.e. codes that have free distance less than dˆ(π), where dˆ(π) satisﬁes (8). Then the bound for the average burst error probability over the expurgated subensemble of codes (expurgated bound) is 1 a (d) Dd . P¯B,exp < π
(11)
ˆ d≥d(π)
Results of the calculation of the upper (union) bound (9) and expurgated bound (11) with π = 1/2 (only for class A codes) for the burst error probability are presented, together with simulation results of the burst error probability, in Fig. 4 and 5, and simulation results of the bit error probability are presented in Fig. 6 and 7. Note, that the bounds are average ensemble bounds for maximum likelihood decoding, while the simulations have been performed with a speciﬁc, randomly chosen, LDC code, using an iterative decoding procedure. In principle, we can use the reﬁned statetransition diagram of the ensemble for calculation of upper bounds for the bit error probability, analogously to the case of “usual” convolutional codes [7].
5
Conclusion
In this paper we presented preliminary results of the statistical analysis of a wide class of LDC codes. The results are lower bounds for the free distance and upper bounds for the error probability. The bounds for the error probability that we have calculated so far are nontrivial only for relatively large signaltonoise ratios. In future work we are planning to achieve bounds for smaller signaltonoise ratios, close to the Shannon limit.
84
Karin Engdahl, Michael Lentmaier, and Kamil Sh. Zigangirov
References 1. C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon Limit Error Correcting Coding and Decoding: TurboCodes”, Proceedings ICC93, pp. 10641070. 2. R. G. Gallager, LowDensity ParityCheck Codes, M.I.T. Press, Cambridge, Massachusetts, 1963. 3. A. Jimenez and K. Sh. Zigangirov, “Periodic TimeVarying Convolutional Codes with LowDensity ParityCheck Matrices”, IEEE Trans. on Inform. Theory, vol. IT45, no. 5, Sept. 1999. 4. K. Engdahl and K. Sh. Zigangirov, “On the Statistical Theory of TurboCodes”, Proceedings ACCT98, pp. 108111. 5. K. Engdahl and K. Sh. Zigangirov, “On LowDensity ParityCheck Convolutional Codes”, Proceedings WCC99, pp. 379392. 6. K. Engdahl and K. Sh. Zigangirov, “On the Theory of LowDensity Convolutional Codes I”, Probl. Peredach. Inform., vol. 35, no. 4, OctNovDec 1999. 7. R. Johannesson and K. Sh. Zigangirov, Fundamentals of Convolutional Coding, IEEE Press, 1999. 8. K. Engdahl, M. Lentmaier and K. Sh. Zigangirov, “On the Theory of LowDensity Convolutional Codes II”, in preparation.
(2,4)
(2.5,5)
−2
−2
10
burst error probability
burst error probability
10
−4
10
−4
10
−6
10
−6
10
1
2
3 Eb/N0 in dB
4
1
5
(3,6)
−2
2 Eb/N0 in dB
2.5
3
2.5
3
(4,8)
−2
10
burst error probability
10
burst error probability
1.5
−4
10
−6
10
−4
10
−6
10
−8
10 −8
10
1
1.5
2 Eb/N0 in dB
2.5
3
1
1.5
2 Eb/N0 in dB
Fig. 4. Class A: burst error probabilities. The solid lines show (from top to bottom) simulation results for ms = 129, 257, 513, 1025, 2049, 4097. The union bound (dasheddotted) and the expurgated bound (dotted) are shown for ms = 129.
On the Theory of LowDensity Convolutional Codes memory 3 basic code
memory 2 basic code −2
−2
10
burst error probability
burst error probability
85
−4
10
10
−4
10
−6
10
−6
10
1
0
1
0
3
2 Eb/N0 in dB
memory 4 basic code
3
2 Eb/N0 in dB
memory 5 basic code
−2
10 10
burst error probability
burst error probability
−2
−4
10
−6
10
−4
10
−6
10
−8
10 −8
10
1
0
1
0
3
2 Eb/N0 in dB
3
2 Eb/N0 in dB
Fig. 5. Class B: burst error probabilities. The solid lines show (from top to bottom) simulation results for ms = 1024, 4096, 16384. The union bound (dasheddotted) is shown for ms = 1024. (2,4)
(2.5,5)
−2
bit error probability
bit error probability
−2
10
−4
10
−4
10
−6
−6
10
10
1
2
3 Eb/N0 in dB
4
10
5
1
1.2
(3,6)
−2
1.8
1.6
1.8
−2
bit error probability
bit error probability
1.6
(4,8)
10
−4
10
−6
10
1.4 Eb/N0 in dB
10
−4
10
−6
1
1.2
1.4 Eb/N0 in dB
1.6
1.8
10
1
1.2
1.4 Eb/N0 in dB
Fig. 6. Class A: bit error probabilities. The solid lines show (from top to bottom) simulation results for ms = 129, 257, 513, 1025, 2049, 4097.
86
Karin Engdahl, Michael Lentmaier, and Kamil Sh. Zigangirov
memory 2 basic code
0
bit error probability
bit error probability 0
0.5
1 Eb/N0 in dB
1.5
−4
10
10
2
memory 4 basic code
0
0
0.5
1 Eb/N0 in dB
1.5
2
memory 5 basic code
0
10
bit error probability
10
bit error probability
−2
10
−6
−5
10
memory 3 basic code
0
10
10
−2
10
−4
10
−2
10
−4
10
−6
10 −6
10
0
0.5
1 Eb/N0 in dB
1.5
2
0
1 0.5 E /N in dB b
1.5
0
Fig. 7. Class B: bit error probabilities. The solid lines show (from top to bottom) simulation results for ms = 1024, 4096, 16384.
On the Distribution of Nonlinear Recursive Congruential Pseudorandom Numbers of Higher Orders Frances Griﬃn1, Harald Niederreiter2 , and Igor E. Shparlinski3 1
2
Department of Computing, Macquarie University Sydney, NSW 2109, Australia fgriffin@mpce.mq.edu.au Institute of Discrete Mathematics, Austrian Academy of Sciences Sonnenfelsgasse 19, A–1010 Vienna, Austria niederreiter@oeaw.ac.at 3 Department of Computing, Macquarie University Sydney, NSW 2109, Australia igor@mpce.mq.edu.au
Abstract. The nonlinear congruential method is an attractive alternative to the classical linear congruential method for pseudorandom number generation. In this paper we present a new type of discrepancy bound for sequences of stuples of successive nonlinear multiple recursive congruential pseudorandom numbers of higher orders. In particular, we generalize some recent results about recursive congruential pseudorandom numbers of ﬁrst order.
1
Introduction
In this paper we study some distribution properties of pseudorandom number generators deﬁned by a recurrence congruence modulo a prime p of the form un+1 ≡ f(un , . . . , un−m+1 ) (mod p),
n = m − 1, m, . . . ,
(1)
with some initial values u0, . . . , um−1 , where f(X1 , . . . , Xm ) is a rational function of m variables over the ﬁeld IFp of p elements. We also assume that 0 ≤ un < p, n = 0, 1, . . .. Composite moduli have also been considered in the literature, but we will restrict our attention to prime moduli. It is obvious that the sequence (1) eventually becomes periodic with some period t ≤ pm . Throughout this paper we assume that this sequence is purely periodic, that is, that un = un+t beginning with n = 0, otherwise we consider a shift of the original sequence. These nonlinear congruential generators provide a very attractive alternative to linear congruential generators and, especially in the case m = 1, have been extensively studied in the literature (see [4,15] for surveys). Although linear congruential generators are widely used, they have several wellknown deﬁciencies, Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 87–93, 1999. c SpringerVerlag Berlin Heidelberg 1999
88
Frances Griﬃn, Harald Niederreiter, and Igor E. Shparlinski
such as machinedependent upper bounds on the period length and an unfavorable lattice structure (see [9,12,13,15]). On the other hand, nonlinear congruential generators, and especially inversive congruential generators, tend to have a lesser amount of intrinsic structure and are thus preferable in this respect. Furthermore, by working with recurrences of order m > 1 in the nonlinear congruential method, we can overcome the restriction in ﬁrstorder recurrences that the period length cannot exceed the modulus. For these reasons, it is of interest to study nonlinear congruential generators of higher orders. When m = 1, for sequences of the largest possible period t = p, a number of results about the distribution of the fractions un/p in the interval [0, 1) and, more generally, about the distribution of the points un un+s−1 ,..., (2) p p in the sdimensional unit cube [0, 1)s have been obtained in the case where n runs through the full period, n = 0, . . . , p − 1. Many of these results are essentially best possible. We refer to [4,5,8,9,10,13,14,15,16,17,18,19] for more details and precise references to original papers. The case of periods t < p is of interest as well. Quite recently, in the series of papers [8,16,17,18,19] a new method has been introduced and successfully applied to the case m = 1. In the present paper we show that the same method works for nonlinear generators of arbitrary order m ≥ 1. In particular, we obtain rather weak but nontrivial bounds on the discrepancy of the points (2) when n runs over a part of the full period. In the very special case where f(X) = X e , that is, for the power generator , an alternative approach has been proposed in [6]. This approach, although it has produced quite strong results for the power generator, cannot be extended to other nonlinear generators. Moreover, a combination of the approach of [8,16,17,18,19] and the present work with some additional considerations has been used in [7] to improve the results of [6].
2
Definitions and Auxiliary Results
For a sequence of N points N
Γ = (γ1,n , . . . , γs,n )n=1 of the halfopen interval [0, 1)s , denote by ∆Γ its discrepancy, that is, TΓ (B) ∆Γ = sup − B , N B⊆[0,1)s where TΓ (B) is the number of points of the sequence Γ which hit the box B = [α1 , β1 ) × . . . × [αs , βs ) ⊆ [0, 1)s
(3)
Distribution of Nonlinear Recursive Congruential Pseudorandom Numbers
89
and the supremum is taken over all such boxes. For an integer vector a = (a1 , . . . , as ) ∈ ZZ s we put a = max ai ,
r(a) =
i=1,...,s
s
max{ai , 1}.
(4)
i=1
We need the Erd¨ os–Tur´ an–Koksma inequality (see Theorem 1.21 of [3]) for the discrepancy of a sequence of points of the sdimensional unit cube, which we present in the following form. Lemma 1. There exists a constant Cs > 0 depending only on the dimension s such that, for any integer L ≥ 1, for the discrepancy of a sequence of points (3) the bound s N 1 1 1 ∆ Γ < Cs + exp 2πi aj γj,n L N r(a) n=1 j=1 0<a≤L holds, where a, r(a) are defined by (4) and the sum is taken over all integer vectors a = (a1 , . . . , as ) ∈ ZZ s with 0 < a ≤ L. The currently best value of Cs is given in [1]. We put e(z) = exp(2πiz/p). Our second main tool is the Weil bound on exponential sums (see [2] and Chapter 5 of [11]) which we present in the following form. Lemma 2. For any nonconstant polynomial F (X1 , . . . , Xm ) ∈ IFp [X1 , . . . , Xm ] of total degree D we have the bound p e (F (x1 , . . . , xm )) < Dpm−1/2 . x1 ,...,xm =1
3
Discrepancy Bound
Let the sequence (un ) generated by (1) be purely periodic with an arbitrary period t. For an integer vector a = (a0 , . . . , as−1 ) ∈ ZZ s we introduce the exponential sum N −1 s−1 Sa (N ) = e aj un+j . n=0
j=0
We estimate these sums, and thus the discrepancy of corresponding sequences, for polynomials which satisfy a certain special property which is described below.
90
Frances Griﬃn, Harald Niederreiter, and Igor E. Shparlinski
We say that a polynomial f(X1 , . . . , Xm ) ∈ IFp [X1 , . . . , Xm ] has a dominating term if it is of the form dm + f(X1 , . . . , Xm ) = ad1 ...dm X1d1 . . . Xm
d 1 −1
...
i1 =0
d m −1
im ai1 ...im X1i1 . . . Xm
im =0
with some integers d1 ≥ 1, d2 ≥ 0, . . . , dm ≥ 0 and coeﬃcients ai1 ...im ∈ IFp with ad1 ...dm = 0. Theorem 1. If the sequence (un), given by (1) generated by a polynomial f(X1 , . . . , Xm ) ∈ IFp [X1 , . . . , Xm ] of total degree d ≥ 2 and having a dominating term, is purely periodic with period t and t ≥ N ≥ 1, then the bound
Sa (N ) = O N 1/2 pm/2 log−1/2 p max gcd(a0 ,...,as−1,p)=1
holds, where the implied constant depends only on d and s. Proof. Select any a = (a0 , . . . , as−1 ) ∈ ZZ s with gcd(a0 , . . . , as−1 , p) = 1. It is obvious that for any integer k ≥ 0 we have N −1 s−1 Sa (N ) − e aj un+k+j ≤ 2k. n=0 j=0 Therefore, for any integer K ≥ 1, KSa (N ) ≤ W + K 2 , where
N −1 K−1 N −1 K−1 s−1 s−1 e aj un+k+j ≤ e aj un+k+j . W = n=0 k=0 n=0 k=0 j=0 j=0
Deﬁne the sequence of polynomials fk (X1 , . . . , Xm ) ∈ IFp [X1 , . . . , Xm ] by the recurrence relation fk (X1 , . . . , Xm ) = f (fk−1 (X1 , . . . , Xm ), . . . , fk−m (X1 , . . . , Xm )) , k = 1, 2, . . ., where fk (X1 , . . . , Xm ) = X1−k , k = −m + 1, . . . , 0. It is easy to see that fk is a nonconstant polynomial of total degree at most dk and that un+k = fk (un, . . . , un−m+1 ), k = 1, 2, . . .. Accordingly, we obtain 2 N −1 K−1 s−1 e a f (u , . . . , u ) W2 ≤ N j k+j n n−m+1 n=0 k=0 j=0 2 K−1 s−1 ≤N e aj fk+j (w1 , . . . , wm ) j=0 w1 ,...,wm ∈Fp k=0 K−1 s−1 =N e aj (fk+j (w1 , . . . , wm ) − fl+j (w1 , . . . , wm )) . k,l=0 w1 ,...,wm ∈Fp
j=0
Distribution of Nonlinear Recursive Congruential Pseudorandom Numbers
91
If k = l, then the inner sum is trivially equal to pm . There are K such sums. Because f has a dominating term, the total degree of the polynomials fν grows strictly monotonically with ν = 1, 2, . . .. Therefore we can apply Lemma 2 to the inner sum, getting the upper bound dK+s−2 pm−1/2 for at most K 2 sums. Hence, W 2 ≤ KNpm + dK+s−2 K 2 Npm−1/2 and so
1/2 Sa (N ) = O K −1 KNpm + dK+s−2 K 2 Npm−1/2 +K
= O K −1/2 N 1/2 pm/2 + d(K+s−2)/2 N 1/2 p(m−1/2)/2 + K .
log p K = 0.4 . log d
Select
Then after simple calculations we obtain the desired result.
Let Ds (N ) denote the discrepancy of the points (2) for n = 0, . . . , N − 1. Theorem 2. If the sequence (un), given by (1) generated by a polynomial f(X1 , . . . , Xm ) ∈ IFp [X1 , . . . , Xm ] of total degree d ≥ 2 and having a dominating term, is purely periodic with period t and t ≥ N ≥ 1, then the bound
Ds (N ) = O N −1/2 pm/2 log−1/2 p (log log p)s holds, where the implied constant depends only on d and s. Proof. The statement follows from Lemma 1, taken with L = N 1/2 p−m/2 log1/2 p , and the bound of Theorem 1.
4
Remarks
The results of Theorems 1 and 2 are nontrivial only for suﬃciently large values of N , namely for t ≥ N ≥ pm log−1+ε p with some ﬁxed ε > 0, and it would be very important to extend the range of N for which Ds (N ) = o(1). We believe that Theorems 1 and 2 hold for much more general polynomials, not necessarily only polynomials with a dominating term. Obtaining such an extension would be very important. We note that for univariate polynomials this condition is automatically satisﬁed, thus our results indeed generalise those of [17]. It would be very interesting to extend the results of this paper to the case of nonlinear generators with rational functions f(X1 , . . . , Xm ) ∈ IFp (X1 , . . . , Xm ).
92
Frances Griﬃn, Harald Niederreiter, and Igor E. Shparlinski
At the beginning of this paper we have mentioned two special generators in the case m = 1, namely the inversive generator and the power generator, for which essentially stronger results than in the general case are known, see [8,16,19] and [6,7], respectively. It is desirable to understand what are the analogues of these special generators in the case m ≥ 2 and extend the results of [6,7,8,16,19] to these analogues. Finally we remark that our method works for generators modulo a composite number as well. But one should expect weaker results because instead of the very powerful Weil bound one will have to use bounds on exponential sums with composite denominator which are essentially weaker, see [20].
References 1. T. Cochrane, ‘Trigonometric approximation and uniform distribution modulo 1’, Proc. Amer. Math. Soc., 103 (1988), 695–702. 2. P. Deligne, ‘Applications de la formule des traces aux sommes trigonom´etriques’, Lect. Notes in Mathematics, SpringerVerlag, Berlin, 569 (1977), 168–232. 3. M. Drmota and R.F. Tichy, Sequences, discrepancies and applications, SpringerVerlag, Berlin, 1997. 4. J. EichenauerHerrmann, E. Herrmann and S. Wegenkittl, ‘A survey of quadratic and inversive congruential pseudorandom numbers’, Lect. Notes in Statistics, SpringerVerlag, Berlin, 127 (1998), 66–97. 5. M. Flahive and H. Niederreiter, ‘On inversive congruential generators for pseudorandom numbers’, Finite Fields, Coding Theory, and Advances in Communications and Computing (G.L. Mullen and P.J.S. Shiue, eds.), Marcel Dekker, New York, 1993, 75–80. 6. J. B. Friedlander, D. Lieman and I. E. Shparlinski, ‘On the distribution of the RSA generator’, Proc. SETA ’98 (C.S. Ding, T. Helleseth and H. Niederreiter, eds.), SpringerVerlag, Singapore, (to appear). 7. J. B. Friedlander and I. E. Shparlinski, ‘On the distribution of the power generator’, Preprint, 1999. 8. J. Gutierrez, H. Niederreiter and I. E. Shparlinski, ‘On the multidimensional distribution of inversive congruential pseudorandom numbers in parts of the period’, Monatsh. Math., (to appear). 9. D. Knuth, The art of computer programming, Vol. 2, 3rd ed., AddisonWesley, Reading, MA, 1998. 10. R. Lidl and H. Niederreiter, ‘Finite ﬁelds and their applications’, Handbook of Algebra (M. Hazewinkel, ed.), Vol. 1, Elsevier, Amsterdam, 1996, 321–363. 11. R. Lidl and H. Niederreiter, Finite fields, Cambridge University Press, Cambridge, 1997. 12. G. Marsaglia, ‘The structure of linear congruential sequences’, Applications of Number Theory to Numerical Analysis (S.K. Zaremba, ed.), Academic Press, New York, 1972, 249–285. 13. H. Niederreiter, Random number generation and quasi–Monte Carlo methods, SIAM, Philadelphia, 1992. 14. H. Niederreiter, ‘Finite ﬁelds, pseudorandom numbers, and quasirandom points’, Finite Fields, Coding Theory, and Advances in Communications and Computing (G.L. Mullen and P.J.S. Shiue, eds.), Marcel Dekker, New York, 1993, 375–394.
Distribution of Nonlinear Recursive Congruential Pseudorandom Numbers
93
15. H. Niederreiter, ‘New developments in uniform pseudorandom number and vector generation’, Lect. Notes in Statistics, SpringerVerlag, Berlin, 106 (1995), 87–120. 16. H. Niederreiter and I. E. Shparlinski, ‘On the distribution of inversive congruential pseudorandom numbers in parts of the period’, Preprint, 1998. 17. H. Niederreiter and I. E. Shparlinski, ‘On the distribution and lattice structure of nonlinear congruential pseudorandom numbers’, Finite Fields and Their Appl., 5 (1999), 246–253. 18. H. Niederreiter and I. E. Shparlinski, ‘On the distribution of pseudorandom numbers and vectors generated by inversive methods’, Preprint, 1999. 19. H. Niederreiter and I. E. Shparlinski, ‘Exponential sums and the distribution of inversive congruential pseudorandom numbers with primepower modulus’, Acta Arith., (to appear). 20. S. B. Steˇckin, ‘An estimate of a complete rational exponential sum’, Trudy Mat. Inst. Steklov., 143 (1977), 188–207 (in Russian).
A New Representation of Boolean Functions Claude Carlet1 and Philippe Guillot2 1
INRIA Projet CODES, BP 105, 78153 Le Chesnay Cedex, France and GREYC, Universit´e de Caen, France 2 ThomsonCSF Communication, 66 rue du Foss´e blanc 92231 Gennevilliers Cedex, France
Abstract. We study a representation of Boolean functions (and more generally of integervalued / complexvalued functions), not used until now in coding and cryptography, which yields more information than the currently known representations, on the combinatorial, spectral and cryptographic properties of the functions. Keywords: Boolean function, Fourier spectrum, cryptography.
1
Introduction
Let n be any positive integer. From a cryptographic and coding theoretic point of view, we are interested in Boolean (i.e. {0, 1}valued) functions deﬁned on the set F2 n of all binary words of length n. This set is viewed as an F2 vector space of dimension n (the addition in this vectorspace is mod 2 and will be denoted by ⊕). Since {0, 1} can be viewed either as a subset of Z or as the ﬁeld F2 , we need to distinguish between the addition in Z (denoted by +, the sum of r several terms b1 , · · · , br will then be denoted by i=1 bi ) and the addition r in F2 (denoted by ⊕, the sum of several terms b1 , · · · , br will be denoted by i=1 bi ). The basic Boolean functions are the aﬃne functions: f(x1 , · · · , xn ) = a1 x1 ⊕ · · · ⊕ an xn ⊕ a0 = a · x ⊕ a0 where a = (a1 , · · · , an ) ∈ F2 n and a0 ∈ F2 . The expression a · x = a1 x1 ⊕ · · · ⊕ an xn ⊕ a0 is the usual inner product in F2 n . The set of all the aﬃne functions is the ReedMuller code of length 2n and order 1 (a Boolean function is identiﬁed with the 2n long binary word of its values, assuming that some order on F2n is chosen). Important parameters of Boolean functions are: – the Hamming weight: w(f) is the size of the support of f, i.e. of the set {a ∈ F2 n  f(a) = 1}; the Hamming distance between two functions f and g is d(f, g) = w(f ⊕ g); – the Fourierspectrum, i.e. the data of all the values of the Fourier transform of f: f(a) = x∈F2 n f(x) (−1)a·x . The related Fourier spectrum of the function χf = (−1)f = 1 − 2f is equal to χ f = 1 − 2f = 2n δ0 − 2f, where δ0 is the Dirac symbol at the allzero word (recall that, in general, δa (x) equals 1 if x = a and 0 otherwise).
AMS classiﬁcation numbers: 06E30, 11T23, 94A60.
Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 94–103, 1999. c SpringerVerlag Berlin Heidelberg 1999
A New Representation of Boolean Functions
95
– the nonlinearity Nf of f, i.e. the Hamming distance between f and the set of aﬃne functions. We have Nf = 2n−1 − 12 maxa∈F2 n  χf (a); The Boolean functions whose nonlinearity is maximum are called bent functions (cf. [9]). They are interesting from a coding point of view, since they correspond to the words of length 2n whose distance to the ReedMuller code of order 1 is equal to the covering radius of this code. The bent functions on F2 n with n even have extra properties which make them also interesting from a cryptographic point of view. They are called perfect nonlinear [6] and characterized by the fact that, for any nonzero word a, the Boolean function x → f(x) ⊕ f(x ⊕ a) is balanced (i.e. takes the values 0 and 1 equally often). We describe now the representations of Boolean functions currently used in cryptography and in coding theory. The truthtable of a Boolean function f is the onedimensional table, indexed by the elements of F2 n (some order being chosen), whose entry at a ∈ F2 n is f(a). The advantage of this representation is its simplicity together with the fact that the weight of f is directly computed from w(f) = a∈F2 n f(a). But it has important drawbacks: – it does not give any information on the algebraic degree of the function and on the number of terms in its algebraic normal form (see below); – the aﬃne functions have the same complexity as any other function; – it does not directly characterize perfect nonlinear functions. The Algebraic Normal Form A.N.F. (cf. [7]): n ui f(x1 , · · · , xn ) = au xi ; au ∈ F2 u∈F2 n
(1)
i=1
is the sum, modulo 2, of monomials in which each of the n binary variables n has degree at most 1. For simplicity, we shall write xu instead of i=1 xi ui . The algebraic degree of f is the global degree of its A.N.F. The A.N.F. can be computed from the truth table with a complexity O(n 2n ) by a butterﬂy algorithm (cf. [4]); conversely, the truth table can be computed from the A.N.F. with the same complexity. We have au = xu f(x), where x = (x1 , . . . , xn )
u = (u1 , . . . , un ) ⇐⇒ ∀i ∈ {1, . . . , n} xi ≤ ui . The qualities of the A.N.F. are the following: – it leads to an important class of linear codes: the ReedMuller codes; – the complexity of the A.N.F. of a function is coherent with its algebraic complexity, i.e. its implementation with and/xor gates; but it has also some drawbacks: – there exists no simple formula which computes the weight of a function, given its A.N.F.; the only known formula, cf. [10], page 124 is complex; – a fortiori, there exists no simple formula which computes the Fourier spectrum or the nonlinearity of a function, given its A.N.F.; – it does not directly characterize perfect nonlinear functions.
96
Claude Carlet and Philippe Guillot
The Fourier spectrum deﬁned above can be also considered as a representation. It can be computed from the truth table by a butterﬂy algorithm with complexity O(n 2n ) (cf. [4]). Its qualities are the following: – the weight of f is directly given by w(f) = f(0); – the aﬃne functions are characterized by the fact that one of the values of the Fourier spectrum of χf is equal to ±2n and all the others are null; – perfect nonlinear functions are directly characterized by means of the Fourier spectrum of χf : f is perfect nonlinear if and only if, for every word a ∈ F2 n , n the value at a of the Fourier transform of χf is equal to ±2 2 . But this representation has at least two important drawbacks: – there is no simple characterization of the fact that a list (µa )a∈F2 n of integers corresponds to the Fourier spectrum of a Boolean function; – the algebraic degree cannot be directly deduced from the Fourier spectrum.
2
The Numerical Normal Form
This representation of Boolean functions (and more generally of complexvalued functions) is similar to the Algebraic Normal Form, but with numerical coefﬁcients. It has been used in the context of circuit complexity. We study this representation in a more systematic way which allows direct formulae for the weight and the Fourier spectrum. It leads to a characterization of perfect nonlinear functions and to divisibility properties of the weight. Boolean functions are characterized with a single formula by means of this representation. The overcost of such an improvement is the storage of 2n integers in the range {−2n , · · · , 2n } instead of 2n bits. Deﬁnition 1. Let f be a complexvalued function on F2 n . We call Numerical Normal Form (N.N.F.) of f, the following expression of f as a polynomial : n
ui f(x1 , · · · , xn ) = λu xi λu xu , λu ∈ C. (2) = u∈F2 n
i=1
u∈F2 n
Proposition 1. Any complexvalued function admits a unique Numerical Normal Form. Proof. The set of all complexvalued functions on F2 n , and the set of all polynomials over C in n variables such that every variable has degree at most 1 are vectorspaces of dimension 2n over C. The mapping which maps any such polynomial P (x1 , · · · , xn ) to the complexvalued function f : a ∈ F2 n → P (a1 , · · · , an ) is linear. It is enough to show that this linear mapping is surjective. For every x ∈ F2 n , the polynomial expression of the Dirac function δa (x) is equal to the product, when i ranges over {1, · · · , n} of xi if ai = 1, and of 1 − xi if ai = 0. By expanding this product, we deduce the existence of a polynomial representing δa . A polynomial representing any complexvalued function is then obtained by using the equality f(x) = a∈F2 n f(a) δa (x).
A New Representation of Boolean Functions
3 3.1
97
Relations between N.N.F. and other Representations N.N.F. and Truth Table
According to the deﬁnition of the N.N.F., the truth table of f can be recovered n from its N.N.F. by the formula f(a) = ua λu , ∀a ∈ F2 . Conversely, it is possible to derive an explicit formula giving the N.N.F. of the function by means of its truth table. Proposition 2. Let f be any complexvalued function on F2 n . For every u ∈ F2 n , the coeﬃcient λu of the monomial xu in the N.N.F. of f is:
λu = (−1)w(u) (−1)w(a)f(a) a∈F2 n  au
where w(u) denotes the Hamming weight of the word u (i.e. the number of nonzero coordinates). Proof. Since we have f(x) = a∈F2 n f(a) δa (x), it is enough to prove that the coeﬃcient of xu in the N.N.F. of the Dirac symbol δa is (−1)w(u)−w(a) if a u and 0 otherwise. by supp(a) Denoting the set {i = 1, · · · , n  ai = 1}, the value of δa (x) is equal to i∈supp(a) xi i∈ / supp(a) (1 − xi ) . Let Ia be the complement of supp(a) w(v) vi in {1, · · · , n}, we have i∈Ia (1 − xi ) = v∈F2 Ia (−1) i∈Ia xi , and the result holds, denoting by u the word whose support equals the union of those of a and v. We give now a butterﬂy algorithm for computing the coeﬃcients λu . The truth table of a function f, in n variables, is the concatenation of the truth tables of the two functions, in (n − 1) variables: f0 : (x2 , . . . , xn ) → f(0, x2 , . . . , xn ) f1 : (x2 , . . . , xn ) → f(1, x2 , . . . , xn ). Let f0 (x) = u∈F n−1 µu xu and f1 (x) = u∈F n−1 νuxu be respectively the 2 2 N.N.F.’s of f0 and f1 . It is easy to deduce from proposition 2 that λ(0,u2,... ,un) = µ(u2,... ,un) and λ(1,u2,... ,un ) = ν(u2,... ,un ) − µ(u2,... ,un ) . This leads to the F.F.T.like algorithm which computes the N.N.F. of any function with a complexity of n2n−1 elementary substractions: procedure NNF(f) for i ← 0 to n − 1 do b ← 0; repeat for x ← b to b + 2i − 1 do f[x + 2i ] ← f[x + 2i ] − f[x]; end for; b ← b + 2i+1 ; until b = 2n ; end for; end procedure
98
Claude Carlet and Philippe Guillot
In this algorithm, the function f is given by an array of 2n numbers (integer, real or complex), and thevalue of f at vector u = (u1 , . . . , un ) is the kth entry n of this array, where k = i=1 ui 2n−i−1 = u1 2n−1 + · · · + un . 3.2
N.N.F. and A.N.F.
Given the N.N.F. of a Boolean function f, the A.N.F. of f is simply equal to its N.N.F. mod 2. Conversely, the coeﬃcients of the N.N.F. can be recovered from those of the A.N.F. This uses the wellknown Poincar´e’s formula which computes the xor of a set of binary values by using usual arithmetic operations. Lemma 1 ((Poincar´ e formula)). Let a1 , . . . , am be elements of the set {0, 1}. The following formula holds: m
ai =
i=1
m
(−2)k−1
k=1
ai1 · · · aik .
1≤i1 <···
In the following theorem, the condition “{u1 , . . . , uk }  u1 ∨ · · · ∨ uk = u” means that the words u1 , . . . , uk are all distinct, in indeﬁnite order, and that the union of their supports is equal to the support of the word u. Theorem 1. Let f(x) = u∈F2 n λu xu be the N.N.F. of a Boolean function f n on F2 . Let au = λu mod 2 be the coeﬃcient of xu in its A.N.F. Then, one can retrieve λu from the av ’s with the following relation: n
λu =
2
(−2)k−1
k=1
au1 · · · auk .
(3)
{u1 ,... ,uk }  u1 ∨···∨uk =u
Proof. For every value of x, the value of f(x) is a xor of the 2n terms au xu . By applying Poincar´e’s formula, and by noticing that for all vectors u and v, one has xu xv = xu∨v , we obtain: 2n 2n
k−1 u1 ∨···∨uk k−1 f(x) = (−2) au1 ···auk x = (−2) au1 ···auk xu . k=1
u∈F2 n k=1
{u1 ,... ,uk }
{u1 ,... ,uk }  u1 ∨···∨uk =u
The latter expression is the unique N.N.F. of f and the result holds. Remark: Denoting {u1 , . . . , uk } by G and identifying any binary word u with the corresponding monomial xu , we can interpret theorem 1 as follows: let F be the set of all the monomials with coeﬃcient 1 in the A.N.F. of f; we consider all the nonempty subsets G of F and, for each of them, we denote by G the size of G, and by supp(G) the set of all the indices of those variables involved in G. Then relation (3) is equivalent to
λu = (−2)G−1 . G⊂F ; G=∅; supp(G)=supp(u)
A New Representation of Boolean Functions
99
3.3
N.N.F. and Fourier Spectrum u Let f(x) = u∈F2 n λu x be any Boolean (or complexvalued) function. For n every word a ∈ F2 , we have:
f(a) = f(x) (−1)a·x = λu xu (−1)a·x = λu (−1)a·x . x∈F2 n
x∈F2 n u∈F2 n
u∈F2 n x∈F2 n  ux
Changing x into x = x ⊕ (1, · · · , 1), we yield: f(a) =
u∈F2
n
λu
(−1)a·x =
u∈F2
n
x∈F2  xu
n
λu
(−1)a·x+w(a).
n
x∈F2  xu
Since the set {x ∈ F2 n  x u} is a (n − w(u))dimensional vectorspace, and since its orthogonal is {a ∈ F2 n  a u}, the sum x∈F2 n  xu (−1)a·x is equal to 2n−w(u) if a u and to 0 otherwise. Thus:
= (−1)w(a) f(a) 2n−w(u)λu . (4) u∈F2 n  au
In particular: w(f) = f(0) =
u∈F2
2n−w(u)λu .
(5)
n
Remark: Thanks to relations (3) and (5), the weight of a Boolean function can be expressed by means of the coeﬃcients of its A.N.F.: w(f) =
u∈F2 n
2
n
n−w(u)
2
k=1
k−1
(−2)
au1 · · · auk .
{u1 ,... ,uk }  u1 ∨···∨uk =u
This latter relation can be written w(f) = G⊂F ; G =∅ 2ν(G) (−2)G−1 , where F and G are deﬁned as in the remark following Theorem 1 and where ν(G) denotes the number of variables which are not involved in G. We obtain in this way the formula given in [10], page 124. We deduce, similarly:
χ f (a) = 2n δ0 (a) + (−1)w(a)+1 2n−w(u)+1λu . (6) u∈F2 n  au
Conversely, we can express the N.N.F. coeﬃcients of a Boolean function by means ofits Fourier coeﬃcients. According to Proposition 2, λu is equal to (−1)w(u) a∈F2 n  au (−1)w(a) f(a). Using the inverse Fourier transform, we get: λu =
(−1)w(u)
(−1)w(u)
w(a)+a·x (−1) (−1)w(a)+a·x. f (x) = f(x) 2n a∈F n 2n n n a∈F n 2 au
x∈F2
x∈F2
2 au
100
Claude Carlet and Philippe Guillot
w(a) + a · x mod 2 is linear on the vectorspace {a ∈ Since the function a → F2 n  a u}, the sum a∈F2 n  au (−1)w(a)+a·x is nonzero if and only if this linear function is null on this vectorspace. Thus a∈F2 n  au (−1)w(a)+a·x is equal to 2w(u) if u x, and to 0 otherwise. Finally:
λu = 2−n (−2)w(u)
f(x).
(7)
x∈F2 n  ux
Notice that, according to relations 6 and 7, the N.N.F. degree of a function is equal to the maximum weight of an element belonging to the Fourier transform support.
4
4.1
Characterization of the N.N.F. of Boolean Functions among those of IntegerValued Functions; Examples of such N.N.F. Characterization of Boolean Functions
From condition y ∈ {0, 1} ⇐⇒ y 2 = y, applied to all the values of f, we get: u Proposition 3. The polynomial u∈F2 n λu x ; λu ∈ Z (or R or C) is the N.N.F. of a Boolean function if and only if
∀u ∈ F2 n , λu =
v,v ∈F2 n
λv λv .
 u=v∨v
Since this condition has to be satisﬁed by the 2n vectors of F2n , it has high computational complexity. But there an integervalued is a simpler condition: function f is Boolean if and only if x∈F2 n f 2 (x) = x∈F2 n f(x). Thus: u Proposition 4. The polynomial λu ∈ Z is the N.N.F. of a u∈F2 n λu x ; Boolean function if and only if
u∈F2
4.2
2n−w(u) n
v,v ∈F2 n
 u=v∨v
λv λv =
u∈F2
2n−w(u)λu .
n
N.N.F. of Aﬃne Functions
Let f(x) = a · x ⊕ ε be any aﬃne function (a ∈ F2 n , ε ∈ {0, 1}). Using relation (3), we yield: (−1)ε (−2)w(u)−1 if u a and u = 0 λu = ε if u = 0 0 otherwise.
A New Representation of Boolean Functions
4.3
101
N.N.F. of Quadratic Functions
It can be shown that λu is equal to 0 or to ±1 if u = 0, and to 0 or to ±2i , w(u)−2 ≤ i ≤ w(u) − 1 otherwise. Conversely, this condition implies that f with 2 is quadratic, since we have au = λu mod 2 = 0 for every word of weight greater than 2. Il is well known that there exist n2 +1 orbits of the set of quadratic functions under the actions of composition with any aﬃne isomorphism and addition of any aﬃne function (cf. [7]). It is possible to read directly on the N.N.F. which orbit contains a given function. 4.4
N.N.F. of Symmetric Functions
Let r ∈ {0, · · · , n} and let f be the Boolean function whose support is the set of all the words of weight r in F2 n . According to Proposition 2, the coeﬃcient of u ∈ F2 n in the N.N.F. of f is λu = (−1)w(u)−r w(u) . Any symmetric function r (i.e. any function f(x1 , · · · , xn ) invariant under permutation of the variables) is equal to a sum of functions of this form.
5 5.1
Properties Deduced from the N.N.F. A Characterization of Perfect Nonlinear Functions
We shall need ﬁrst to show a property of the N.N.F. concerning the dual f n f (a) = 2 2 (χf)(a). Usof a perfect nonlinear function f, deﬁned on F2 n by χ 1−χ n ing relation (6) and equality f = 2 f , we have f(a) = 12 − 2 2 −1 δ0 (a) + n w(a) −w(u) 2 λu . Changing u into u in this latter relation, we (−1) u∈F2 n  au 2 obtain the N.N.F. of f by expanding the following relation: n n
n = 1 x(0,··· ,0) − 2 n2 −1 f(x) (1 − xi ) + (−1)w(x) 2w(u)− 2 λu (1 − xi )ui . 2 n i=1 i=1 u∈F2
(8) This implies that, for every u = 0, u = (1, · · · , 1), the coeﬃcient of xu in the n N.N.F. of f is divisible by 2w(u)− 2 . Proposition 5. Let f(x) = u∈F2 n λu xu be the N.N.F. if a Boolean function f on F2 n . Then f is perfect nonlinear if and only if it satisﬁes: n 1. for every u such that n2 < w(u) < n, the coeﬃcient λu is divisible by 2w(u)− 2 ; n n 2. λ(1,··· ,1) is congruent with 2 2 −1 mod 2 2 . Proof. According to Lemma 1 of [1], f is perfect nonlinear if and only if, for ≡ 2 n2 −1 mod 2 n2 . Thus, according to relation (4), conditions every a ∈ F2 n , f(a) 1. and 2. are suﬃcient for a Boolean function f to be perfect nonlinear. Conversely, assume that f is perfect nonlinear. The observation above on the N.N.F. of the dual of a perfect nonlinear function, applied to f (whose dual is f) shows that condition 1 is necessary. Condition 2 is also necessary since f(1, · · · , 1) = (−1)n λ(1,··· ,1) (from relation 4).
102
5.2
Claude Carlet and Philippe Guillot
Divisibility Properties of the N.N.F. Coeﬃcients
We ﬁrst show that, in expression (3), some terms are null, depending on the degree of f and of the number of maximal degree monomials in its A.N.F. of f. Proposition 6. Let f be a Boolean function of algebraic degree d; let r be the number of monomials of degree d in the A.N.F. of f. Then, for every u ∈ F2 n , the coeﬃcient λu of xu in its N.N.F. is equal to: n
2
λu = k=max(
w(u) d
,
(−2)k−1 w(u)−r d−1
)
au1 · · · auk .
(9)
{u1 ,... ,uk }  u1 ∨···∨uk =u
Proof. Let {u1, . . . uk } be a set of k distinct vectors of F2 n . If w(u1 ∨ · · · ∨ uk ) > w(u) kd, then exists an index j such that w(uj ) > d. Thus, for k < d , i.e. for there
k≤
w(u) d
− 1, every term of sum (3) has a null factor auj with w(uj ) > d.
Moreover, if w(u1 ∨ · · · ∨ uk ) > rd + (k − r)(d − 1) = k(d − 1) + r, then w(u)−r there exists an index j such that w(uj ) > d. Thus, for k < d−1 , i.e. for w(u)−r k≤ − 1, every term of sum (3) has a null factor auj with w(uj ) > d. d−1 Corollary 1. Under the same hypothesis as in Proposition 6, the coeﬃcient λu w(u) w(u)−r of xu in its N.N.F. is a multiple of 2 d −1 and of 2 d−1 −1 . Mc Eliece theorem on cyclic codes and Ax Theorem imply the following result on Boolean functions (cf. [7], page 447): Proposition 7. Let f be a Boolean function of degree d, then the weight of f n is a multiple of 2 d −1 . Corollary 1 and relation (5) give a new proof of this result. Indeed, according to Corollary 1 each term of the sum in relation (5) is multiple of w(u) w(u) n−w(u)+ d −1 − 1 is minimum for w(u) = n. 2 . And n − w(u) + d
We know that the bound given by Proposition 7 is tight. However, Corollary 1 improves upon it for the functions which have few monomials of highest degree in their A.N.F.: Proposition 8. Let f be a Boolean function of degree d. Assume that the numn ber of monomials of degree d in its A.N.F. is r < nd . Then n−r d−1 > d and the n−r weight of f is a multiple of 2 d−1 −1 . Proof. According to Corollary 1, each term of the sum in relation (5)is a multiple w(u) w(u)−r w(u) of 2n−w(u)+ d −1 and of 2n−w(u)+ d−1 −1 . We have n−w(u)+ d −1 ≥ w(u)−r n − w(u) + − 1 if and only if w(u) ≤ rd and it is a simple matter to d−1 w(u) w(u)−r check that max(n − w(u) + d − 1, n − w(u) + d−1 − 1) is minimum for w(u) = n and the result holds.
A New Representation of Boolean Functions
6
103
Conclusion
Since integervalued functions are characterized by an integervalued N.N.F. and thanks to Proposition 4, the N.N.F. representation allows more easily to construct a general Boolean function with prescribed combinatorial properties such as bentness or high nonlinearity. It was not possible before as Boolean functions are not easily characterized with the Fourier transform, and combinatorial properties are not directly ensured from the truth table or A.N.F. representation.
References 1. C. Carlet. Generalized Partial Spreads, IEEE Transactions on Information Theory vol 41 (1995) 14821487 2. C. Carlet and P. Guillot. A characterization of binary bent functions, Journal of Combinatorial Theory, Series A, Vol. 76, No. 2 (1996) 328335 3. J. F. Dillon. Elementary Hadamard Diﬀerence sets, Ph. D. Thesis, Univ. of Maryland (1974). 4. C.J.A. Jansen. Investigation on Nonlinear Streamcipher Systems: Construction and Evaluation Methods Philips (1989). 5. J.P.S. Kung. Source Book in Matro¨ıd Theory, Birkha¨ user (1986). 6. Meier, W. and O. Staﬀelbach. Nonlinearity Criteria for Cryptographic Functions, Advances in Cryptology, EUROCRYPT’ 89, Lecture Notes in Computer Science 434, 549562, Springer Verlag (1990). 7. F. J. Mac Williams and N. J. Sloane. The theory of errorcorrecting codes, Amsterdam, North Holland 1977. 8. GianCarlo Rota. On the foundations of Combinatorial Theory; Springer Verlag (1964) ; reprint in [5]. 9. O. S. Rothaus. On bent functions, J. Comb. Theory, 20A (1976) 300 305. 10. J. H. Van Lint. Coding Theory, Springer Verlag 201 (1971).
An Algorithm to Compute a Nearest Point in the Lattice A∗n I. Vaughan and L. Clarkson Department of Electrical and Electronic Engineering The University of Melbourne, Parkville, Victoria 3052, Australia v.clarkson@ee.unimelb.edu.au
Abstract. The lattice A∗n is an important lattice because of its covering properties in low dimensions. Conway and Sloane [3] appear to have been the first to consider the problem of computing the nearest lattice point in A∗n . They developed and later improved [4] an algorithm which is able to compute a nearest point in O n2 arithmetic steps. In this paper, a new algorithm is developed which is able to compute a nearest point in O(n log n) steps.
1
Introduction
The study of point lattices is of great importance in several areas of number theory, particularly the studies of quadratic forms, the geometry of numbers and simultaneous Diophantine approximation, and also to the practical engineering problems of quantisation and channel coding. They are also important in studying the sphere packing problem and the kissing number problem [5]. Let us now deﬁne what is meant by the term ‘point lattice’. Definition 1. Consider a set B = {b1 , b2 , . . . , bn } of linearly independent points in IRm , m n. The set Λ = {a1 b1 + a2 b2 + · · · + an bn  a1 , a2 , . . . , an ∈ ZZ} is a (point) lattice of rank n in IRm and B is a basis of Ω. The lattice that is studied in this article, known as ‘A∗n ’ following the notation of Conway and Sloane or sometimes known as ‘Voronoi’s principal lattice of the ﬁrst type’, is remarkable because of its covering properties in low dimensions [5]. The author’s interest in the lattice arises from his work in the engineering problem of pulse train deinterleaving [1,2]. The lattice A∗n reduces to the hexagonal lattice when n = 2 and to the bodycentred cubic lattice when n = 3. This is illustrated in Fig. 1. The computational problem of ﬁnding a nearest lattice point to a given point is the particular problem of interest here. van Emde Boas [7] showed that
This work was supported by the Australian Research Council under Grant S499721.
Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 104–120, 1999. c SpringerVerlag Berlin Heidelberg 1999
An Algorithm to Compute a Nearest Point in the Lattice A∗n
105
Fig. 1. Examples of the lattice A∗n for n = 2 (the hexagonal lattice) and n = 3 (the bodycentred cubic lattice).
the problem is NP complete under certain conditions when the lattice itself, or rather a basis thereof, is considered as an additional input parameter. For speciﬁc lattices, the problem is considerably easier. The problem of computing the nearest lattice point in A∗n was ﬁrst studied by Conway and Sloane [3]. By decomposing the lattice A∗n into the ﬁnite superposition of translations of its dual lattice An , the authors discovered for computing the nearest an algorithm lattice point to a given point in O n2 log n arithmetic operations. [4], Later they were able to improve the execution time of the algorithm to O n2 steps. In this paper, a new algorithm is presented which is able to compute a nearest lattice point in A∗n to a given point in O(n log n) arithmetic steps. After discussing some mathematical preliminaries in Sect. 2, we investigate in Sect. 3 the socalled Voronoi regions and relevant sets of vectors in lattices and, particularly, in A∗n . A Voronoi region surrounds each lattice point, and any point within this region is closer to the lattice point at the region’s centre than to any other lattice point. The relevant vectors are those lattice vectors which can be used to deﬁne the boundary of the Voronoi region. Having found a relevant set of vectors for A∗n , we ﬁnd it convenient in Sect. 4 to deﬁne three notions of proximity of a given point to a lattice point. These notions allow us to decompose the algorithm into three smaller algorithms, discussed in turn in Sects. 5–7, each of which calculates a successively closer lattice point to the given point until at last a nearest lattice point has been found.
106
2
I. Vaughan and L. Clarkson
Mathematical Preliminaries
In this section, we will brieﬂy discuss the notation and terminology used throughout the sequel. Firstly, we note that we will, on occasion and without notice, use elements of IRm as if they were column vectors (elements of IRm×1 ). We will use the notation IPm to refer to the set of (elementwise) permutations of the point (1, 2, . . . , m) ∈ ZZ m . We also deﬁne the parameterised set σ(·) as follows. Definition 2. We define the parameterised set σ(·) such that if x ∈ IRm and s ∈ σ(x) ⊆ IPm then xs1 xs2 . . . xsm . Finally, it is necessary to deﬁne the lattice A∗n . Definition 3. The lattice A∗n is the lattice of rank n in IRn+1 whose basis vectors are any n columns of the matrix n −1 · · · −1 −1 n · · · −1 1 (1) 11T = . . . B=I− . . n+1 .. .. . . .. −1 −1 · · · n Remark 1. The matrix B is a symmetric projection matrix, which is to say that B = B T and B 2 = B. Remark 2. Notice that bi · bj =
n n + 1 −1 n+1
if i = j, (2) otherwise,
and, more generally, that x · bj = xj
(3)
for any x ∈ BIRn+1. Remark 3. The lattice A∗n is constituted of vectors in a vector space whose dimension, n + 1, exceeds the rank of the lattice, n. Remark 4. In Figs. 1 and 2, A∗n is represented with respect to the hyperplane BIRn+1 on which the lattice vectors lie. That is, representative coordinates of the lattice points in IRn are generated by premultiplication by QT where Q ∈ IR(n+1)×n is a matrix whose columns form an orthonormal basis of BIRn+1 . In the case of A∗3 , this is followed by an orthographic projection into IR2 .
An Algorithm to Compute a Nearest Point in the Lattice A∗n
3
107
Relevant Sets of Lattice Vectors
Consider a lattice Λ in IRm and a norm · . Around the origin, there is a region V consisting of points which are closer to the origin than to any other lattice point. That is, we deﬁne V as the set V = {x ∈ IRn ; x x − v ∀v ∈ Λ} . Such a region is called the Voronoi cell of the origin. The whole of IRm can be tessellated by translating these cells by the lattice vectors. Therefore, an algorithm to ﬁnd the nearest lattice point to a given point can be interpreted as an algorithm to ﬁnd the particular translation of the Voronoi cell to which the given point belongs. Consider the Voronoi cell of the origin. It is a convex polytope with faces that lie in the hyperplanes at the midway points along the lines connecting the origin to nearby lattice points. The set of vectors which deﬁne the faces are the (Voronoi)relevant vectors of the lattice. Definition 4. For a lattice Λ in IRm , a set of nonzero lattice vectors R is relevant if, for every x ∈ IRm which satisfies x
1 2
x − r
(4)
for every r ∈ R, this same inequality is also satisfied for every r ∈ Λ. In other words, a relevant set contains lattice vectors such that, for any point which is closer to the origin than to any of the vectors in the set, that point is closer to the origin than to any other lattice point. A straightforward implication of the deﬁnition is that a lattice point v ∈ Λ is closest to some point u ∈ IRm if (4) is satisﬁed with x = u − v for each lattice point r in a relevant set. For the Euclidean norm, we note that (4) is equivalent to x · x (x − r) · (x − r) = x · x − 2x · r + r · r and, after cancellation and rearrangement of terms, it is equivalent to x·r
1 2
r
2
.
(5)
Let us now turn our attention to the lattice A∗n with the Euclidean norm. Henceforth, we will use the notation · exclusively to denote the Euclidean norm of its argument. Consider the set
m n+1 R= bsi ; s ∈ IP ,1 m n . (6) i=1
We will show that this set is relevant to A∗n .
108
I. Vaughan and L. Clarkson
Fig. 2. Examples of the Voronoi regions of the lattice A∗n for n = 2 (a hexagon) and n = 3 (a truncated octahedron). Lemma 1. Every element v ∈ A∗n can be expressed in the form v=
j n
c j bs i ,
(7)
j=1 i=1
where the cj ∈ ZZ, cj 0, j = 1, 2, . . . , n and s ∈ IPn+1 . Proof. We can express any v with respect to the basis vectors b1 , b2 , . . . , bn in the obvious way, which is to say that v=
n
aj bj ,
j=1
where the aj ∈ ZZ, j = 1, 2, . . . , n. Clearly, we can extend the summation to n + 1 terms, so that v=
n+1
aj bj
j=1
with the coeﬃcient an+1 = 0. Now, choose s ∈ σ(−a), so that as1 as2 . . . asn+1 . Then, using the fact that bsn+1 = −bs1 − bs2 − · · · − bsn ,
An Algorithm to Compute a Nearest Point in the Lattice A∗n
109
we ﬁnd that n
v=
asj − asn+1 bsj
j=1 j n
=
asj − asj+1 bsi .
j=1 i=1
Therefore, with cj = asj − asj+1 , we satisfy (7).
Lemma 2. If v, w ∈ A∗n can be expressed as v=
p
bs i ,
w=
i=1
q
bs i ,
i=1
where s ∈ IPn+1 , 1 p n and 1 q n then v · w > 0. Proof. Suppose p q. Then, v·w = =
q p
bs i · bs j
i=1 j=1 p
bs i · bs i +
i=1
=
q
bs i · bs j
j=1 j=i q
p 1 n − n + 1 j=1 n + 1 i=1 j=i
p(n + 1 − q) = >0 . n+1 Since we can use the labels v and w arbitrarily, we conclude that v · w > 0, regardless of whether p q.
Theorem 1. The set R as defined in (6) is relevant to A∗n . Proof. We prove the theorem statement by working directly from the deﬁnition of a relevant set of lattice vectors in Deﬁnition 4. Consider some x ∈ IRn+1 which satisﬁes (5) for every r ∈ R. Consider also some v ∈ A∗n . From Lemma 1, we can write v=
n j=1
cj w j
110
I. Vaughan and L. Clarkson
where each cj 0, j = 1, 2, . . . , n and wj =
j
bs i
i=1
where s ∈ IPn+1. Thus, w j ∈ R. Now, our assumption that (5) is satisﬁed implies that x·v =
n
cj x · w j j=1 n 2 1 cj w j 2 j=1
.
(8)
Furthermore, 2
v =
n n
ci cj w i · w j
i=1 j=1
=
n j=1 n
2
c2j wj + 2
n i−1
ci cj w i · w j
i=2 j=1
c2j wj
2
(9)
j=1
because w i · w j > 0 from Lemma 2. From (8) and (9) and bearing in mind that cj 0, we have x·v
1 2
v
2
,
as required.
4
Degrees of Proximity of a Lattice Point
In order to explain the workings of subsequent algorithms, we ﬁnd it convenient to deﬁne degrees of proximity of a lattice point in A∗n with respect to another point in BIRn+1. Firstly, we note the following fact. Theorem 2. Consider some x ∈ IRn+1. If v ∈ A∗n is a closest point to Bx with respect to the Euclidean norm, where B is defined in (1), then it is also a closest point to x. Proof. The proof follows from a simple decomposition of the vector x into orthogonal components. Consider a lattice point w ∈ A∗n . We have 2
2
x − w = (x − Bx) + (Bx − w)
.
An Algorithm to Compute a Nearest Point in the Lattice A∗n
111
Now, w can be expressed as w = Bz, where z ∈ ZZ n+1 . Thus, 2
x − w = (x − Bx) + B(x − z)
2
.
Consider the inner product (x − Bx) · (Bx − w). We have (x − Bx) · (Bx − w) = xT B(x − z) − xT B T B(x − z) = 0 , since B T B = B. Clearly, then, 2
2
x − w = x − Bx + Bx − w 2
2
2
x − Bx + Bx − v = x − v
2
,
as required.
From this theorem, we see that it is suﬃcient to consider only the nearest lattice points to points on the plane BIRn+1. We now deﬁne three degrees of proximity to a lattice point in A∗n with respect to points in BIRn+1 . Definition 5. Consider a lattice point v ∈ A∗n and a point y ∈ BIRn+1 . Let δ = y − v. The lattice point v is αclose to y if δi − δj  1
(10)
for all i, j = 1, 2, . . . , n + 1. The lattice point v is βclose to y if δi 
1 2
(11)
for all i = 1, 2, . . . , n + 1 and it is γclose if m m(n + 1 − m) δsi 2(n + 1)
(12)
i=1
for all m = 1, 2, . . . , n and s ∈ IPn+1. Theorem 3. If v ∈ A∗n is γclose to a point y ∈ BIRn+1 then v is a nearest lattice point to y. Proof. From (3), we ﬁnd that, with δ = y − v, m
δsi = δ · w = −δ · w
i=1
where s ∈ IPn+1, w=
m i=1
bs i
and
w = −w =
n+1 i=m+1
bs i .
112
I. Vaughan and L. Clarkson
Clearly, both w and w are elements of R as deﬁned in (6). Furthermore, it is easily conﬁrmed that w = w = 2
2
m(n + 1 − m) . n+1
Therefore, satisfaction of the inequality (12) for any particular value of s is equivalent to simultaneous satisfaction of δ·w
1 2
w
2
and
δ · w
1 2
w
2
(13)
for the corresponding vector w. Satisfaction of (12) for all values of s ∈ IPn+1 is then equivalent to satisfaction of (13) for all w ∈ R. However, from Theorem 1, we know that R is relevant to A∗n and so the origin is the closest lattice point to δ. By extension, v is the closest lattice point to y when v is γclose to y.
We now describe three algorithms. The ﬁrst algorithm takes as input a point x ∈ IRn+1 and, in O(n) arithmetic steps, outputs a point z ∈ ZZ n+1 such that Bz is αclose to Bx. The second algorithm and third algorithms both take a point x ∈ IRn+1 and a point z ∈ ZZ n+1 as inputs and, after O(n log n) operations, output a new value of z. For the second algorithm, the inputs z and x are assumed to be such that Bz and Bx are αclose and, on output, they are βclose. For the third algorithm, the input z and x are assumed to be such that Bz and Bx are βclose and, on output, they are γclose. From Theorem 3, we see that the application of these algorithms in series results in an algorithm which ﬁnds a nearest lattice point in A∗n to an input point in O(n log n) steps.
5
The First Algorithm
We will now set out to prove that the algorithm described below, which takes as its input some x ∈ IRn+1 , outputs a vector z ∈ ZZ n+1 such that the lattice point Bz is αclose to Bx. In the following, we will use the notation · to denote a function which returns a nearest integer to its real argument. Algorithm 1. 1 2 3 4 5 6
begin for i := 1 to n + 1 do zi := xi ; od; output(z); end.
Proposition 1. If x ∈ IRn+1 is input to Algorithm 1 then a point z ∈ ZZ n+1 is output such that Bz is αclose to Bx.
An Algorithm to Compute a Nearest Point in the Lattice A∗n
113
Proof. If we write y = Bx and v = Bz then we have yj − vj = (xj − xj ) − Now, − 12 xi − xi <
1 2
n+1 1 (xi − xi ) . n + 1 i=1
so we conclude that yj − vj  1. Furthermore,
(yj − vj ) − (yk − vk ) = (xj − xj ) − (xk − xk ) and so (yj − vj ) − (yk − vk ) 1. Thus, v is αclose to y, as required.
6
The Second Algorithm
In this section, we will show that Algorithm 2, listed below, given inputs x ∈ IRn+1 and z ∈ ZZ n+1 such that Bz is αclose to Bx, outputs a new value for z, say z , such that Bz is βclose to Bx. Before setting out the algorithm, we deﬁne the use of two functions, project and sortindices. The function project takes an input, say, u ∈ IRn+1 and returns Bu. Note that this function can be calculated in O(n) steps. We see this by observing that the ith element of Bu is ui − µ where µ is the average of the elements of u. The function sortindices takes as input a vector, say, u ∈ IRn+1 and returns an element s of σ(u), which is to say that it returns a vector s such that us1 us2 . . . usn+1 . This function requires O(n log n) arithmetic operations [6]. Algorithm 2. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
begin y := project(x); v := project(z); δ := y − v; s := sortindices(δ); m := 0; m while δsm+1 < n+1 − 12 do m := m + 1; zsm := zsm − 1; od; m := n + 1; m while δsm > n+1 − 12 do zsm := zsm + 1; m := m − 1; od; output(z); end.
114
I. Vaughan and L. Clarkson
Lemma 3. Let v be a lattice point of A∗n and let y be a point in BIRn+1. If v is αclose to y then, with δ = y − v, there exists a permutation s ∈ IPn+1 and an integer 0 m n + 1 such that δs2 . . . δsm ηm ηm − 1 δs1 δsm+1 δsm+2 . . . δsn+1 ηm + 1
(14)
where m 1 − . (15) n+1 2 Proof. We will prove the lemma by contradiction. We begin by choosing a value of s ∈ σ(δ), for if we did not, we could not possibly satisfy (14). However, suppose there is no value of m which will satisfy (14), even with s ∈ σ(δ). With the value m = 0 or m = n + 1, this implies that there is some index i such that δi  > 12 . Note that, because v is αclose to y, there does not exist a pair of indices i and j such that ηm =
δi < − 12
δj >
and
1 2
because this would contradict (10) of Deﬁnition 5. So assume, without loss of generality, that there exists some index i such that δi < − 12 . This implies that δs1 < − 12 and that δsn+1 < 12 . Now, we will show that, even though (14) is not satisﬁed for m = 0 or m = n + 1, there must exist some value of m with 1 m n such that the slightly weaker condition δs1 δs2 . . . δsm ηm δsm+1 δsm+2 . . . δsn+1
(16)
is satisﬁed. If this were not the case then, by applying an inductive argument from our assumption that δs1 < η0 , we would have δsi < ηi−1 for each i = 1, 2, . . . , n. However, this would imply that n+1 i=1
δsi = δsn+1 +
n
δsi <
1 2
+
i=1
n
ηi−1 =
i=1
1 1 − 0 n+1 2
whereas this sum should be zero since δ ∈ BIRn+1. Therefore, although we assume that (14) is not satisﬁed for any value of m and, as a result, we further assume that δs1 < − 12 , we have concluded that at least (16) must hold for some m with 1 m n. However, that this implies δs1 ηm δsn+1 and, because v is αclose to y, we have δsn+1 − δs1 1 so that ηm −1 δs1 ηm δsn+1 ηm +1. Finally, this implies (14), contradicting our initial assumption.
Theorem 4. Let v be a lattice point of A∗n , y be a point in BIRn+1 and let δ = v−w. If v is αclose to y then there exists some s ∈ IPn+1 and 0 m n+1 such that (14) and (15) is satisfied and, moreover, v − w is βclose to y, where w=
m i=1
bs i .
(17)
An Algorithm to Compute a Nearest Point in the Lattice A∗n
Proof. It can be readily checked that m 1 − n+1 w si = m − n+1
115
if i m, (18) otherwise.
With reference to the inequalities of (14), we then ﬁnd that the values of the elements of the diﬀerence = y − v + w = δ + w satisfy the inequality !i  12 for all i = 1, 2, . . . , n + 1. Therefore, v − w is βclose to y.
Proposition 2. If x ∈ IRn+1 and z ∈ ZZ n+1 are input to Algorithm 2 and Bz is αclose to Bx then, after O(n log n) arithmetic operations, a new value of z is output such that Bz is βclose to Bx. Proof. When line 6 of Algorithm 2 is reached, we have computed the values y = Bx, v = Bz, δ = y − v and a permutation s ∈ σ(δ). As discussed previously, the execution time to this point is dominated by the sorting operation, which requires O(n log n) arithmetic operations to complete. First of all, suppose δi  12 for all i = 1, 2, . . . , n + 1. That is, we consider the case where v is already βclose to y. Clearly, neither of the while loops on lines 7–10 or 12–15 will be entered and the value of z will be unchanged. Therefore, in this case, the output z is identical to the input z and so the output Bz is βclose to Bx, as required. Suppose there exists some index i such that δi  > 12 . That is, either δs1 < − 12 or δsn+1 > 12 . These inequalities cannot both be satisﬁed at once because v is αclose to y. This implies that exactly one of the while loops on lines 7–10 and 12–15 will be entered. Suppose the while loop which is entered is the ﬁrst one, on lines 7–10. This is equivalent to the supposition that δs1 < − 12 and δsn+1 < 12 . The while loop continues while δsm+1 < ηm , incrementing m at the end of each loop. Lemma 3 guarantees us that the loop will terminate with 1 m n, since (14) is not satisﬁed for m = 0 or m = n + 1. Thus, the loop requires O(n) steps to complete. When it terminates, (16) will be satisﬁed with 1 m n. As discussed in the proof of Lemma 3, the fact that δs1 ηm δsn+1 implies that (14) is satisﬁed for that value of m. Furthermore, by the end of the loop, the new value of z — and let us denote this new value as z — diﬀers from the input value by −1 in a given element whenever its index is in the set {s1 , s2 , . . . , sm }. That is, Bz = Bz −
m
bs i .
(19)
i=1
From Theorem 4, Bz is then βclose to Bx. Finally, suppose instead that it is the second while loop on lines 12–15 that is entered, which implies that δsn+1 > 12 and δs1 > − 12 . The argument follows along similar lines to those employed when the ﬁrst while loop is entered. We are assured that the loop will terminate with 1 m n, thus requiring
116
I. Vaughan and L. Clarkson
O(n) operations to complete. In turn, this implies that (16) will be satisﬁed with 1 m n and that, as a result, (14) will also be satisﬁed. When the loop terminates, the new value of z which has been computed, say z , diﬀers from the input value by 1 in a given element whenever its index is in the set {sm+1 , sm+2 , . . . , sn+1 }. That is, n+1
Bz = Bz +
bs i .
i=m+1
However, substituting the identity n+1
bs i = −
i=m+1
n
bs i ,
i=1
we ﬁnd we again have (19) and so, from Theorem 4, Bz is βclose to Bx.
7
The Third Algorithm
Having presented algorithms which compute an αclose lattice point in A∗n to an input point in BIRn+1 and, given an αclose point, produce one which is βclose, we now present an algorithm that can produce a γclose point from one which is βclose. Algorithm 3. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
begin y := project(x); v := project(z); δ := y − v; s := sortindices(δ); m := 0; t := 0; τ := 0; for i := 1 to n do i−1 n t := t + δsi + n+1 − 2(n+1) ; if t < τ then τ := t; m := i; fi od; for i := 1 to m do zsi := zsi − 1; od; output(z); end.
Lemma 4. A point v ∈ A∗n is γclose (a nearest lattice point) to y ∈ BIRn+1 if and only if m i=1
(δsi − pi ) 0
(20)
An Algorithm to Compute a Nearest Point in the Lattice A∗n
117
for all m = 1, 2, . . . , n + 1, where δ = y − v, s ∈ σ(δ) and p=
1 (−n, −n + 2, . . . , n − 2, n) . 2(n + 1)
(21)
Proof. First of all, let us prove that (20) is a necessary condition for v to be γclose to y. Noting that m
pi = −
i=1
m(n + 1 − m) , 2(n + 1)
we see that if (12) of Deﬁnition 5 is satisﬁed then (20) must be satisﬁed also. Having proved necessity, let us now prove suﬃciency. For any r ∈ IPn+1 and 1 m n + 1, we use the fact that the sum of any m elements of δ must be greater than or equal to the sum of the m smallest elements to obtain m
δri
i=1
m
δsi
m
i=1
pi = −
i=1
m(n + 1 − m) . 2(n + 1)
Similarly, using the fact that the sum of any m elements of δ must be less than or equal to the sum of the m largest elements, we ﬁnd that m i=1
δri
n+1
δsi = −
n+1−m
i=n+2−m
δsi −
n+1−m
i=1
i=1
pi =
m(n + 1 − m) . 2(n + 1)
Thus, we ﬁnd that if (20) is satisﬁed then, for any r ∈ IPn+1 and 1 m n, m m(n + 1 − m) δri 2(n + 1) i=1
and, since this is simply (12) of Deﬁnition 5, v is γclose to y.
Lemma 5. Consider a lattice point v ∈ A∗n and a point y ∈ BIRn+1 . Let δ = y − v and s ∈ σ(δ). Suppose v is βclose to y. If v = v − w where w=
m
bs i
(22)
i=1
for some 1 m n then, with δ = y − v , there exists some s ∈ σ(δ ) for which s1 , s2 , . . . , sn+1 = (sm+1 , sm+2 , . . . , sn+1 , s1 , s2 , . . . , sm ) . (23) That is, there exists some s ∈ σ(δ ) which is the elementwise left rotation of s by m positions.
118
I. Vaughan and L. Clarkson
Proof. Since the form of w in (22) is identical to that in (17), it follows from (18) that δs i δs j whenever 1 i j m and whenever m + 1 i j n + 1. Furthermore, since v is βclose to y, we know that δs1 − 12 and δsn+1 12 . Hence, δs 1 = δs1 + 1 −
m 1 m − n+1 2 n+1
δs n+1 = δsn+1 −
m 1 m − . n+1 2 n+1
and
This implies that δs m+1 δs m+2 . . . δs n+1 δs 1 δs 2 . . . δs m
which in turn implies (23), as required.
Corollary 1. Consider a lattice point v ∈ A∗n and a point y ∈ BIRn+1. Let δ = y − v, s ∈ σ(δ) and = (δs1 − p1 , δs2 − p2 , . . . , δsn+1 − pn+1)
(24)
where p is defined in (21). Suppose v is βclose to y. If v = v − w where w is given in (22) then, with δ = y − v , there exists some s ∈ σ(δ ) such that, when = (δs 1 − p1 , δs 2 − p2 , . . . , δs n+1 − pn+1) , is the elementwise left rotation of by m positions. Proof. From Lemma 5, we have, when 1 i n + 1 − m, m δs = δs i+m = δsi+m − i n+1 and so !i = δs i − pi = δsi+m −
2(m + i − 1) − n = δsi+m − pi+m = !i+m . 2(n + 1)
Similarly, when n + 1 − m < i n + 1, we have δs = δs i+m−n−1 = δsi+m−n−1 + 1 − i
m n+1
which implies that !i = δs i − pi = δsi+m−n−1 −
2(i + m − n − 2) − n = δsi+m−n−1 − pi+m−n−1 2(n + 1)
= !i+m−n−1 and hence is the elementwise left rotation of by m positions.
An Algorithm to Compute a Nearest Point in the Lattice A∗n
Lemma 6. Consider a point ∈ BIRn+1. If m = arg
min
j=1,2,...,n+1
j !i
119
(25)
i=1
and is the elementwise left rotation of by m positions then j
!i 0
(26)
i=1
for all 1 j n + 1. Proof. If 1 j n + 1 − m then j i=1
!i
=
m+j m !i = !i − !i 0 .
m+j i=m+1
i=1
i=1
On the other hand, if n + 1 − m < j n + 1 then we ﬁnd that j+m−n−1 n+1 j+m−n−1 m j !i = !i + !i = !i − !i 0 i=1
i=1
i=m+1
and so (26) is true for any 1 j n + 1.
i=1
i=1
Theorem 5. Consider a lattice point v ∈ A∗n and a point y ∈ BIRn+1. Let δ = y − v, s ∈ σ(δ) and let be as defined in (24). Suppose v is βclose to y. If m is defined according to (25) then v = v − w is γclose to y, where w is defined according to (22). Proof. The proof follows directly from application of Lemma 6, Corollary 1 and Lemma 4.
Proposition 3. If x ∈ IRn+1 and z ∈ ZZ n+1 are input to Algorithm 3 and Bz is βclose to Bx then, after O(n log n) arithmetic operations, a new value of z is output such that Bz is γclose to Bx. Proof. The proposition is proved merely by observing that Algorithm 3 expresses in a programmatic way the construction of a γclose point given in Theorem 5. Speciﬁcally, the ﬁrst for loop on lines 7–10 calculates the value of m as deﬁned in (25), albeit that the value of m is restricted to lie in the range 0 m n rather than 1 m n + 1 — a variation that has no mathematical implications. The second for loop on lines 11–13 constructs the new value of z, say z , such that v = Bz = v − w, where w is deﬁned as in (22). In terms of the amount of calculation required, both for loops require only O(n) arithmetic steps to complete and so the total execution time is dominated by the execution of the sortindices procedure, which requires O(n log n) operations.
120
I. Vaughan and L. Clarkson
References 1. I. Vaughan L. Clarkson. Approximation of Linear Forms by Lattice Points with Applications to Signal Processing. PhD thesis, The Australian National University, 1997. 2. I. Vaughan L. Clarkson, Stephen D. Howard, and Iven M. Y. Mareels. Parameter estimation and association from pulse timeofarrival data. Submitted to Signal Process., 1997. 3. J. H. Conway and N. J. A. Sloane. Fast quantizing and decoding algorithms for lattice quantizers and codes. IEEE Trans. Inform. Theory, IT–28(2):227–232, March 1982. 4. J. H. Conway and N. J. A. Sloane. Soft decoding techniques for codes and lattices, including the Golay code and the Leech lattice. IEEE Trans. Inform. Theory, IT– 32(1):41–50, January 1986. 5. J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. SpringerVerlag, Berlin, 1988. 6. Donald E. Knuth. The Art of Computer Programming, volume 3 (Sorting and Searching). AddisonWesley, Reading, Ma., 1973. 7. P. van Emde Boas. Another N P complete partition problem and the complexity of computing short vectors in a lattice. Technical Report 81–04, Mathematisch Instituut, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands, April 1981.
Sequences from Cocycles Kathy J. Horadam RMIT University, Melbourne, VIC 3001, Australia horadam@rmit.edu.au
Abstract. Binary and quaternary sequences with perfect periodic autocorrelation, and perfect nonlinear pm ary sequences, are both shown to equate to orthogonal coboundaries — the simplest class of orthogonal cocycles. We consider doublyindexed sequences deﬁned by cocycles. We give a new construction — a generalised multiplication — of orthogonal cocycles and show it gives perfect nonlinear sequences for parameters where 1dimensional PN sequences cannot exist.
1
Introduction
Sequences with desirable correlation or distribution properties are much sought for use in signal transmission, optical imaging and encryption. Much eﬀort has been expended on constructing and classifying sequences which are optimal or nearly optimal with respect to some measure of merit. The sequence is frequently regarded as a mapping from an index set G (such as the modular group Zv or the ﬁnite ﬁeld Fpm ) to a sequence set C (typically taking binary, complex or pm ary values). The measure of merit can be oﬀpeak correlation, or uniform distribution amongst sequence values, or nonlinearity. Here we observe that sequences with perfect periodic autocorrelation and perfect nonlinear sequences both equate to orthogonal coboundaries — the simplest class of a set of functions called orthogonal cocycles which are 2dimensional on G. This allows us to generalise from 1dimensional to 2dimensional mappings and hence consider sequences deﬁned by cocycles, which have improved performance over the optimal 1dimensional functions against these ﬁgures of merit. Cocycles are mappings ψ : G × G → C, where G and C are ﬁnite groups with C abelian, which satisfy a particular quasiassociative equation (1). They arise naturally in the topology of surfaces, in quantum dynamics, in projective representation theory, and in combinatorial design theory, as well as in the cohomology theory of groups. Increasingly, links have been found between the optimal 1dimensional sequences and diﬀerence sets [2,3,15]. In the 2dimensional case, there is a precise link: we know each orthogonal cocycle is equivalent to a semiregular central relative diﬀerence set, and conversely [17]. Semiregular relative diﬀerence sets are of interest precisely because of their good Hamming correlation properties for FHMA. For example, perfect binary
This work was supported by Australian Research Council Grant A49701206.
Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 121–130, 1999. c SpringerVerlag Berlin Heidelberg 1999
122
Kathy J. Horadam
arrays are equivalent to splitting abelian relative (4u2 , 2, 4u2 , 2u2)diﬀerence sets, and Jedwab’s generalised perfect binary arrays are equivalent to abelian relative (4t, 2, 4t, 2t)diﬀerence sets [13, Theorem 3.2]. Kumar’s ideal matrices for FHMA communications systems are the twodimensional characteristic functions of relative (v, v, v, 1)diﬀerence sets in Z2v [16], and hence are rare. In this paper we ﬁrstly give a new construction (Theorem 2) of orthogonal cocycles. Secondly, we show that orthogonal coboundaries are the same as 1dimensional sequences with perfect periodic autocorrelation and with perfect nonlinearity. We generalise these measures to 2dimensional sequences of cocycle values and prove that the distribution of values of a cocyclic sequence is invariant (Theorem 3) under the relevant equivalence operations on the cocycle. We prove that there are many 2dimensional sequences with perfect nonlinearity (Example 5) for parameters where 1dimensional PN functions cannot exist, and give constructions for AlmostPN and ∆nonlinear cocyclic sequences (Corollary 3). Earlier, the author and Perera [10] introduced a very general description of cocyclic codes in order to demonstrate the previously unrecognised (and wellhidden) presence of cocycles in several code construction techniques. Category I of these codes comprised those constructed from a cocyclic generalised Hadamard matrix. Many standard constructions of (generalised) Hadamard matrices are in fact cocyclic, and in [9] many wellknown codes are shown to be Category I. Since (generalised) Hadamard matrices determine nonlinear codes which meet the Plotkin bound, and orthogonal cocycles are equivalent to cocyclic generalised Hadamard matrices [17], the new orthogonal cocycles identiﬁed here also determine optimal nonlinear codes.
2
Cocycles and Generalised Hadamard Matrices
Throughout, G will be a ﬁnite group of order v and C will be a ﬁnite abelian group of order w. A (2dimensional) cocycle is a mapping ψ : G × G → C satisfying the cocycle equation ψ(g, h) ψ(gh, k) = ψ(g, hk) ψ(h, k), ∀g, h, k ∈ G.
(1)
This implies ψ(g, 1) = ψ(1, h) = ψ(1, 1), ∀g, h ∈ G, so we follow standard usage and consider only normalised cocycles, for which ψ(1, 1) = 1. Each cocycle ψ determines a central extension 1 → C → Eψ → G → 1 of C by G, in which the group Eψ of order vw consists of the set of ordered pairs {(a, g) : a ∈ C, g ∈ G} with multiplication (a, g)(b, h) = (abψ(g, h), gh),
(2)
and the image C × {1} of C lies in the centre of Eψ . A cocycle is naturally displayed as a Gcocyclic matrix; that is, a square matrix Mψ whose rows and columns are indexed by the elements of G under some ﬁxed ordering, and whose entry in position (g, h) is ψ(g, h). If ψ is symmetric (ψ(g, h) = ψ(h, g) always), Mψ is a symmetric matrix. We write Mψ = [ψ(g, h)]g,h∈G .
(3)
Sequences from Cocycles
123
Definition 1. A cocycle is a coboundary ∂φ if it is derived from a set mapping φ : G → C having φ(1) = 1 by the formula ∂φ(g, h) = φ(g)−1 φ(h)−1 φ(gh). Two cocycles ψ and ψ are cohomologous if there exists a coboundary ∂φ such that ψ = ψ · ∂φ. Example 1. If G = g : g v = 1 ∼ = Zv and a is an element of order n in C, then ψ(g i , g j ) = aij , for all 0 ≤ i, j ≤ v − 1, is a symmetric cocycle and Mψ is a Vandermonde matrix. If n = v, Mψ is the matrix of the Discrete Fourier Transform (or the MattsonSolomon polynomial). Example 2. If G = Zn2 and C = {±1} ∼ = Z2 then ψ(u, v) = (−1)u·v , for all u, v ∈ G, is a symmetric cocycle and Mψ is the Sylvester Hadamard matrix of order 2n . Example 3. If V is a ﬁnitedimensional vector space over a ﬁeld F and ψ is a bilinear form on V then ψ : (V, +) × (V, +) → F is a cocycle. We will term an additivelywritten abelian group “distributive” if it also carries a distributive multiplication. That is, a distributive group is an abelian group G = (G, +) (with identity 0) having a second binary operation · such that ∀g, h, k ∈ G, g · (h + k) = g · h + g · k and (h + k) · g = h · g + k · g. (We write g · h = gh when the context is clear.) Such distributive groups include ﬁnite rings and presemiﬁelds (where (G, ·) is a quasigroup; that is, for any nonzero g, h ∈ G there are unique solutions in G to gx = h and to yg = h.) In particular, Galois ﬁelds, Galois rings and semiﬁelds (such as Dickson commutative semiﬁelds) are distributive groups. If a distributive group is a ﬁnite commutative semiﬁeld but not a ﬁeld, the only ﬁeld property which does not hold is associativity of multiplication, and (see [1, VI.8.4], [14, p.269]), G is an elementary abelian group of order pa ≥ 16, with a ≥ 3. Example 4. Let G be a distributive group. The mapping µ : G × G → G deﬁned by µ(g, h) = gh, ∀g, h ∈ G, is a normalised cocycle, the multiplication cocycle. Since G is abelian, µ is symmetric ⇔ Eµ is abelian ⇔ (G, ·) is commutative. The last four examples are instances of a general class of cocycles: any mapping f : G × G → C which is homomorphic in each coordinate (that is, f(gh, k) = f(g, k)f(h, k) and f(g, hk) = f(g, h)f(g, k) always) is a normalised cocycle. In particular we can generalise the multiplication cocycle. Example 5. Let C be a distributive group. Let λ, ρ : G → C be mappings, and deﬁne µλ,ρ : G × G → C by µλ,ρ(g, h) = λ(g)ρ(h), ∀g, h ∈ G. If λ and ρ are homomorphisms then µλ,ρ is a normalised cocycle. This link is emphasised by the following partial converse.
124
Kathy J. Horadam
Corollary 1. Let C be a distributive group with a multiplicative unity 1, and let λ, ρ : G → C be mappings. If µλ,ρ is a normalised cocycle, 1 ∈ Imλ (resp. Imρ) and λ (resp. ρ) is a homomorphism then ρ (resp. λ) is a homomorphism. Proof. Since µλ,ρ is a cocycle, λ(g)ρ(h) + λ(gh)ρ(k) = λ(g)ρ(hk) + λ(h)ρ(k). If λ is a homomorphism, the result follows on setting λ(g) = 1. To link cocycles with generalised Hadamard matrices and perfect sequences we will be interested in the frequency with which a cocycle takes each value in the target abelian group C. Definition 2. Let ψ : G × G → C be a cocycle and for each g ∈ G, a ∈ C deﬁne N (g, a) = {h ∈ G : ψ(g, h) = a}. The set of frequencies {N (g, a) : g ∈ G, a ∈ C} is the distribution of ψ. A generalised Hadamard matrix GH(w, v/w) over W is a v × v matrix H with entries from the group W of order wv such that the list of quotients hij h−1 kj , , the 1 ≤ j ≤ v, contains each element of W exactly v/w times. With h∗ij = h−1 ji deﬁning matrix equation over ZW is HH ∗ = vIv + (v/w)( u)(Jv − Iv ). (4) u∈W
Any GH(w, v/w) is equivalent to a normalised GH(w, v/w) with its ﬁrst row and column consisting entirely of the unit element of W . A Hadamard matrix is a GH(2, v/2). All known examples of GH(w, v/w) require w to be a prime power. When the GH(w, v/w) H is also cocyclic; that is, W = C is abelian, wv and H = Mψ for some cocycle ψ, this is equivalent to imposing a combinatorial condition we call orthogonality, on the cocycle. Definition 3. The cocycle ψ : G × G → C is orthogonal when wv if the noninitial rows of Mψ are uniformly distributed over the elements of C; that is if, for each g = 1 ∈ G, N (g, a) = v/w, ∀a ∈ C (5) or equivalently, if in ZC, for each g = 1 ∈ G, h∈G ψ(g, h) = v/w ( a∈C a). For instance, in Example 1 if nv then ψ is orthogonal (with entries in C = a ) when n = p and v = pm , for p a prime. The cocycles of Example 2 are orthogonal. The multiplication cocycle for any ﬁnite presemiﬁeld is orthogonal. The designs corresponding to normalised GH(w, v/w) are divisible designs. When the GH matrix is also cocyclic, these designs have been completely characterised. Furthermore, they correspond to relative diﬀerence sets. A relative (v, w, k, λ)diﬀerence set [4] in a ﬁnite group E of order vw relative to a normal subgroup N of order w, is a kelement subset R of E such that the multiset of quotients d1 d−1 2 of distinct elements d1 , d2 of R contains each element of E\N exactly λ times, and contains no elements of N . (The ordinary (v, k, λ)diﬀerence sets correspond to the case N = {1}.) There is always a short exact
Sequences from Cocycles
125
sequence 1 → N → E → E/N → 1 . We will be concerned only with relative diﬀerence sets having k = v and therefore also k = wλ. A relative diﬀerence set with the latter property is termed semiregular. Theorem 1. (Equivalence Theorem) [17, Lemma 2.7, Theorem 4.3] Let wv and let Mψ be a Gcocyclic matrix with entries in C. Then Mψ is a GH(w, v/w) if and only if the design Dψ developed from {(1, g), g ∈ G} ⊂ Eψ is a divisible (v, w, v, v/w)design, class regular with respect to C, if and only if Rψ = {(1, g), g ∈ G} ⊂ Eψ is a relative (v, w, v, v/w)diﬀerence set relative to the central subgroup C × {1}, if and only if the cocycle ψ is orthogonal.
3
New Classes of Orthogonal Cocycles
In this section we give a new construction of orthogonal cocycles, and show how to obtain new orthogonal cocycles from old. In [8] generalisations of the multiplication cocycle for a ﬁnite ﬁeld were given, together with conditions under which they were orthogonal. Example 6. [8, Lemma 3.11] If F is a ﬁnite ﬁeld and G = (F, +), then the power i cocycles µi : G×G → G deﬁned by µi (g, h) = g p h, 0 ≤ i ≤ a−1, are orthogonal. (Here µ0 = µ.) Udaya [19] has pointed out that the Frobenius automorphism in this example can be replaced by an arbitrary linearised permutation polynomial. This prompts the following generalisation, giving a very large class of orthogonal cocycles which have not previously been recognised. Lemma 1. Let G be a distributive group which is a presemiﬁeld and let λ, ρ : G → G be homomorphisms. If λ (resp. ρ) is an automorphism, then µλ,ρ : G×G → G is an orthogonal cocycle if and only if ρ (resp. λ) is an automorphism. Proof. Here v = w, so v/w = 1. From Example 5, µλ,ρ is a cocycle. By symmetry we may assume λ is an automorphism, and we only need to show that µλ,ρ is orthogonal if and only if ρ is onetoone. This is clear because, given any λ(g) = 0 and k ∈ G, there is a unique solution h ∈ G to the equation λ(g)ρ(h) = k if and only if {h ∈ G : µλ,ρ(g, h) = k} = 1. The composition of an (orthogonal) cocycle G×G → C with an epimorphism C → C of abelian groups is an (orthogonal) cocycle. For example the ﬁeld multiplication cocycle projects via the relative trace function. Example 7. Let F = Fqm and K = Fq be ﬁnite ﬁelds and let T = T rF/K be the relative trace function. Then T ◦ µ : (F, +)2 → (K, +) is an orthogonal cocycle. A more general version of this example is this consequence of Lemma 1.
126
Kathy J. Horadam
Theorem 2. If G is a distributive group which is a presemiﬁeld, λ, ρ : G → G are automorphisms and γ : G → C is an epimorphism then γ ◦ µλ,ρ : G × G → C is an orthogonal cocycle. The composition of an automorphism of G acting diagonally on G × G with an (orthogonal) cocycle G × G → C is an (orthogonal) cocycle. For example, if ρ is an automorphism then µλ,ρ ◦ (ρ−1 × ρ−1) = µλ◦ρ−1 ,1 , so that µλ,ρ and µλ◦ρ−1 ,1 lie in the same orbit under diagonal Aut(G)action. When further composed with an automorphism of C, we obtain an (Aut(C) × Aut(G))action on the abelian group of cocycles which partitions it into (Aut(C) × Aut(G))orbits which either consist entirely of orthogonal cocycles or contain no orthogonal cocycles. Denote by O(ψ) the orbit of ψ under the action ψ (γ,θ) (g, h) = γ(ψ(θ(g), θ(h))), γ ∈ Aut(C), θ ∈ Aut(G).
(6)
In [7] it was shown that these orbits may be collected into bundles under a further Gaction termed shift equivalence. Definition 4. Cocycles ψ, ϕ : G × G → C are shiftequivalent, written ψ ∼s ϕ via k, if there exists k ∈ G such that ψ = ϕ · ∂ϕk , where ϕk (g) = ϕ(k, g), g ∈ G, or, equivalently, if there exists k ∈ Gsuch that ψ(g, h) = ϕ(kg, h) ϕ(k, h)−1 . The bundle B(ψ) of ψ is B(ψ) = ϕ∼s ψ O(ϕ). Shift equivalence preserves orthogonality, so these bundles consist wholly of orthogonal cocycles or of nonorthogonal cocycles. In fact, each orthogonal bundle corresponds uniquely to an equivalence class of relative diﬀerence sets, and vice versa (the Bundle Isomorphism Theorem [7]). Finally we show that the distribution of a cocycle is an invariant of the bundle containing that cocycle. It is easy to show it is an invariant of the (Aut(C) × Aut(G)) orbit of the cocycle. Write h∈G ψ(g, h) = a∈C N (g, a)a. If k ∈ G then (by (1)) ψ(kg, h)ψ(k, h)−1 = ψ(kgk −1 .k, h)ψ(k, h)−1 = (7) h∈G
h∈G
ψ(kgk
−1
h∈G −1
, k)
ψ(kgk
−1
, kh) =
N (kgk −1 , a)(ad), d = ψ(kgk −1, k)−1 .
a∈C
Hence the distribution is invariant under shift equivalence. Theorem 3. The distribution D(B(ψ)) of a cocycle ψ is an invariant of its bundle B(ψ).
4
Sequences from Cocycles
For binary sequences φ : G → {±1} indexed by an abelian group G, the (periodic) autocorrelation function A : G → Z is deﬁned [15] to be A(g) = φ(g + h)φ(h), and the sequence is a perfect binary array if A(g) = 0 h∈G
Sequences from Cocycles
127
for every g = 0. It is wellknown that a perfect binary array corresponds to a MenonHadamard diﬀerence set in G and to a splitting relative (4u2 , 2, 4u2 , 2u2 )diﬀerence set in Z2 × G, and vice versa. Perfect binary arrays are investigated as one response to the lack of examples of perfect binary sequences (for which G = Zv in the deﬁnition above). A recent survey of progress in the search for generalisations of perfect binary sequences, and their links to diﬀerence sets and divisible diﬀerence sets, appears in [15]. More generally, the autocorrelation function for a sequence φ : G → C where G and C are multiplicatively written ﬁnite groups with C abelian and φ(1) = 1, is A : G → ZC where A(g) = h∈G φ(gh)φ(h)−1 , and φ is a perfect array if A(g) = 0 for every g = 1. We see that for g = 1, φ(gh)φ(h)−1 = 0 ⇔ φ(gh)φ(h)−1 φ(g)−1 = 0 (8) h∈G
h∈G
⇔
∂φ(g, h) = 0,
(9)
h∈G
so that a perfect binary array is also equivalent to an orthogonal coboundary with C = {±1}. A perfect quaternary array has C = {±1, ±i} and in this case the orthogonal coboundaries determine the subclass of balanced perfect quaternary arrays: those for which each value of C is taken equally often. Clearly the coboundary in (9) may be replaced by a cocycle. Definition 5. A cocyclic perfect array is a cocycle ψ : G × G → C, where G is a ﬁnite group, C is a ﬁnite abelian group, such that in ZC, ∀ g = 1, ψ(g, h) = 0. (10) h∈G
It is balanced if each element of C appears equally often in the summation. Whenever a∈C a = 0 in ZC, such as in the cases of interest mentioned above, balanced cocyclic perfect arrays are identical to orthogonal cocycles, and vice versa. Hughes [12] has investigated the relationships between autocorrelation functions and orthogonal cocycles in detail. He has, in a sense, reversed the process above, deriving from each orthogonal cocycle ψ a 1dimensional sequence Eψ → C with uniformly distributed autocorrelation function for values h of Eψ away from a speciﬁed forbidden subgroup. This sequence is the characteristic function of a central relative (v, w, v, v/w)diﬀerence set. The autocorrelation is one measure of the behaviour of a sequence φ relative to its translates. Another approach is to look at the spread, as h ranges over G, of the values φ(g + h)φ(h)−1 , and to take the ﬂatness of the distribution as the measure of merit. Sequences for which this distribution is as ﬂat as possible are good potential Sbox functions because of their resistance to diﬀerential cryptanalysis. What is sought is sequences φ : Fpm → Fpm such that ∆ = max {x ∈ Fpm : φ(a + x) − φ(x) = b} a =0,b
(11)
128
Kathy J. Horadam
is as small as possible. A function φ with ∆ = 1 is perfect nonlinear (PN) and with ∆ = 2, almost perfect nonlinear (APN). It is known that for p = 2, ∆ = 2 is the best possible, while for p ≥ 3, ∆ = 1 is achievable. For p = 2, extensive current research focuses on the identiﬁcation and classiﬁcation of APN functions and their relationship to maximally nonlinear sequences (eg. [3]). It is also clear that further links to cyclic diﬀerence sets are being uncovered [2,3]. In this section, we propose a cocyclic construction of nonlinear functions with desirable distribution properties. Note that in (11), ∆ = 1 if and only if ∀a = 0, b ∈ Fpm , {x ∈ Fpm : φ(a + x) − φ(x) = b} = 1 if and only if ∀a = 0, c ∈ Fpm , {x ∈ Fpm : φ(a + x) − φ(x) − φ(a) = c} = 1 if and only if ∂φ : G × G → G, G = (Fpm , +) is an orthogonal coboundary. Corollary 2. φ : Fpm → Fpm is a PN function if and only if ∂φ is an orthogonal coboundary if and only if {(φ(g), g) : g ∈ Fpm } is a splitting relative (pm , pm , pm , 1)diﬀerence set in Fpm × Fpm relative to Fpm × {0}. In [8] the author shows that the constructions of abelian splitting relative (pm , pm , pm , 1)diﬀerence sets known to her [14,18] for odd p are isomorphic to those deﬁned by the multiplication cocycle in some commutative semiﬁeld. In these cases, µ is (necessarily) a coboundary [8, Theorem 3.8]. By Ganley’s result [5], no abelian splitting relative (2m , 2m , 2m , 1)diﬀerence set exists, so ∆ = 2 is indeed the best possible for p = 2, in contrast to the situation for odd primes. However in the same article, the author shows that the ﬁeld multiplication cocycle deﬁnes the known [6,18] abelian (nonsplitting) relative (2m , 2m , 2m , 1)diﬀerence sets, and thus, if we relax the requirement that a nonlinear function be 1dimensional, a wealth of constructions of sequences with excellent distribution properties exists. Definition 6. Let ψ : G×G → C be a cocycle and deﬁne ∆ = maxg =1,c N (g, c). We say ψ is a ∆nonlinear cocycle (brieﬂy: ψ is perfect nonlinear (PN) if ∆ = 1 and almost perfect nonlinear (APN) if ∆ = 2). Combining this deﬁnition with the results of the previous section we obtain: Corollary 3. If ψ : G × G → C is an orthogonal cocycle it is v/wnonlinear, and if γ : C → C is any monomorphism, γ ◦ ψ is v/wnonlinear. Theorem 4. If ψ is ∆nonlinear, so is every cocycle in its bundle B(ψ). We illustrate the above ideas in full detail for the small case G = C = Z22 . Example 8. If G = C = Z22 , each cocycle ψ is identiﬁed by a quadruple x, y, z, t , where x, y, z, t ∈ C. With indexing {0 = 00, 10, 01, 11}, the cocyclic matrix Mψ is 0 0 0 0 0 x z x+z . 0 z+t y y + z + t 0 x+z +t y+z x+y +t
Sequences from Cocycles
129
There are 16 bundles of cocycles [7] and on computing their distributions from a representative cocycle ψ we obtain the following table, in which bundles are grouped according to the cohomology class of ψ. The ﬁrst group (of 2 bundles) consists of the coboundaries. The ﬁrst three groups consist of symmetric cocycles (and Eψ is abelian) and the second three groups consist of nonsymmetric cocycles (and Eψ is nonabelian). The distributions are read oﬀ in order from the rows of the corresponding Mψ and abbreviated, so that for example, the third row of M a,0,c,0 is 0, c, 0, c, which is represented as 22 and the fourth is 0, a + c, c, a, which is represented as 14 . (Here distinct variables refer to distinct nonzero elements of C.) ψ 1.1 0, 0, 0, 0 1.2 0, 0, c, 0 2.1 a, 0, 0, 0 2.2 a, a, 0, 0 2.3 a, 0, c, 0 3.1 a, b, 0, 0 3.2 a, b, c, 0 4.1 0, 0, 0, d 4.2 0, 0, c, d 5.1 0, b, 0, d 5.2 0, b, c, c 5.3 0, b, b, d 6.1 a, a, 0, a 6.2 a, a, 0, d 6.3 a, a, a, d 6.4 a, a, c, a
D(B(ψ))
Eψ
{4, 4, 4, 4} Z42 2 2 2 {4, 2 , 2 , 2 } Z42 {4, 22 , 4, 22 } Z22 × Z4 {4, 22 , 22 , 22 } Z22 × Z4 {4, 14 , 22 , 14 } Z22 × Z4 {4, 22 , 22 , 14 } Z24 {4, 14 , 14 , 14 } Z24 {4, 4, 22 , 22 } Z2 × D4 {4, 22 , 22 , 14 } Z2 × D4 {4, 4, 14 , 14 } E5 [7] {4, 22 , 22 , 22 } E5 {4, 22 , 14 , 22 } E5 {4, 22 , 22 , 22 } Z4 Z4 {4, 22 , 14 , 14 } Z4 Z4 {4, 22 , 14 , 22 } Z4 Z4 {4, 14 , 14 , 14 } Z4 Z4
∆ Comments 4 2 4 2 2 2 1 4 2 4 2 2 2 2 2 1
Trivial cocycle APN ≡ best 1 − dim. function APN APN, µ for F2 + uF2 APN, µ for F2 × F2 PN, µ for F4 APN APN APN APN APN APN PN, power cocycle µ1 for F4
The 2dimensional APN functions in this table are all new sequences with distribution as good as or ﬂatter than the best 1dimensional APN functions (which must equate to coboundaries, ie those in bundle 1.2 above). The 2dimensional balanced APN functions involving a single variable (bundles 1.2, 2.2 and 6.1 above) can also be derived as the monomorphic images of orthogonal (hence APN) cocycles G × G → Z2 using Corollary 3. The 2dimensional PN functions in this table (bundles 3.2 and 6.4 above) are new sequences whose distributions achieve the theoretical limit not attainable by any 1dimensional functions. Theorem 5. Let G be the additive group of a presemiﬁeld and let λ, ρ : G → G be any automorphisms. Then µλ,ρ has distribution {N (0, 0) = v, N(0, c) = 0, c = 0, N (g, c) = 1, g = 0}, so that ∆ = 1 and µλ,ρ is a PN cocycle. There are many presemiﬁeld structures with underlying additive group G ∼ = (for example any ﬁnite ﬁeld or Dickson commutative semiﬁeld). In this case Zm p
130
Kathy J. Horadam
there are (pm − 1)(pm − p) . . . (pm − pm−1 ) automorphisms of G, each of which can be expressed as a unique linearised permutation polynomial of the ﬁnite ﬁeld structure on G. We hereby obtain new sequences whose distributions achieve the theoretical limit.
References 1. Colbourn, C. J. and Dinitz, J. H. (Eds), The CRC Handbook of Combinatorial Designs, CRC Press, Boca Raton, 1996. 2. Dillon, J. F., Multiplicative diﬀerence sets via additive characters, preprint August 1998. 3. Dobbertin, H., Kasami power functions, permutation polynomials and cyclic diﬀerence sets, NATOASI Workshop “Diﬀerence sets, sequences and their correlation properties”, August 1998. 4. Elliott, J. E. H. and Butson, A. T., Relative diﬀerence sets, Illinois J. Math. 10 (1966), 517–531. 5. Ganley, M. J., On a paper of Dembowski and Ostrom, Arch. Math. 26 (1976), 93–98. 6. Hammons, A. R. et al, The Z4 linearity of Kerdock, Preparata, Goethals, and related codes, IEEE Trans. IT 40 (1994), 301–319. 7. Horadam, K. J., Equivalence classes of central semiregular relative diﬀerence sets, preprint April 1999. 8. Horadam, K. J., Multiplication cocycles and central relative (pa , pa , pa , 1)diﬀerence sets, Research Report 7, Mathematics Department, RMIT, May 1998. 9. Horadam, K. J., Cocyclic Hadamard codes, preprint 1998, abstract in Proc. ISIT 1998, IEEE, p. 246. 10. Horadam, K. J. and Perera, A. A. I., Codes from cocycles, in AAECC12, T. Mora and H. Mattson, eds, LNCS 1255, Springer, Berlin 1997. 11. Hughes, D. R. and Piper, F. C., Projective planes, GTM 6, Springer, New York, 1973. 12. Hughes, G., Characteristic functions of relative diﬀerence sets, correlated sequences and Hadamard matrices, in AAECC13, LNCS, Springer, 1999, to appear. 13. Jedwab, J., Generalised perfect arrays and Menon diﬀerence sets, Des., Codes Cryptogr. 2 (1992) 19–68. 14. Jungnickel, D., On automorphism groups of divisible designs, Canad. J. Math. 24 (1982), 257–297. 15. Jungnickel, D. and Pott, A., Perfect and almost perfect sequences, Des., Codes Cryptogr., to appear. 16. Kumar, P. V., On the existence of square dotmatrix patterns having a speciﬁc threevalued periodiccorrelation function, IEEE Trans. IT 34 (1988) 271–277. 17. Perera, A. A. I. and Horadam, K. J., Cocyclic generalised Hadamard matrices and central relative diﬀerence sets, Des., Codes Cryptogr. 15 (1998) 187–200. 18. Pott, A., Finite Geometry and Character Theory, LNM 1601, Springer, Berlin, 1995. 19. Udaya, P., Cocyclic generalised Hadamard matrices over GF (pn ) and their related codes, Proc. AAECC13, Honolulu, Hawaii, November 1999.
On the Second Greedy Weight for Binary Linear Codes Wende Chen1 and Torleiv Kløve2 1
2
System and Control Lab., Institute of Systems Science Academia Sinica, Beijing 100080, China Department of Informatics, University of Bergen, N5020 Bergen, Norway
Abstract. The diﬀerence g2 − d2 for a binary linear [n, k, d] code C is studied. Here d2 is the smallest size of the support of a 2dimensional subcode of C and g2 is the smallest size of the support of a 2dimensional subcode of C which contains a codeword of weight d. For codes of dimension 4, the maximal value of g2 −d2 is determined. For general dimensions, upper and lower bounds on the maximal value are given.
1
Introduction
Ozarow and Wyner [6] suggested one application of linear codes to cryptology, namely to the wiretap channel of type II. For this channel, an adversary is assumed to be able to tap s bits (of his choice) of n bits transmitted. The goal for the sender is to encode k bits of information into n transmitted bits in such a way that the adversary gets as little information as possible. One of their schemes was to use the dual of an [n, k] binary linear code C. The code has 2k cosets, each representing a binary ktuple. If the sender wants to transmit k bits of information to the receiver, he selects a random vector in the corresponding coset. The channel is assumed to be noiseless, so the receiver can determine the corresponding coset of the received vector. It is assumed the adversary has full knowledge of the code, but not of the random selection of a vector in a coset. In his studies of this scheme, Wei [7] introduced a set of parameters of a binary code which he called the generalized Hamming weights. The same parameters had also been studied previously in another context [4] and has since proved important also in other contexts. For any code D, let χ(D), the support of D, be the number of positions where not all the codewords of D are zero. Further, the support weight of D is χ(D). For an [n, k] code C and any r, where 1 ≤ r ≤ k, Wei defined dr (C) = min{χ(D)  D is an (n, r) subcode of C}. In particular, the minimum distance of C is d1 (C). The weight hierarchy of C is the set {dr (C)  1 ≤ r ≤ k}. For the OzarowWyner scheme, it was shown by Wei [7] that the adversary can obtain r bits of information if and only if s ≥ dr (C). Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 131–141, 1999. c SpringerVerlag Berlin Heidelberg 1999
132
Wende Chen and Torleiv Kløve
Cohen et al. [2], [3] considered the following variation of the problem. The adversary is greedy. He first reads d = d1 positions to obtain one symbol of information as soon as possible. He then reads a minimal number of further positions to get one additional symbol of information and so on. Let gr denote the minimal number of symbols he has to read to get r symbols of information in this way. Note that g1 = d1 and gk = dk . We call the sequence (g1 , g2 , g3 , . . . , gk ) the greedy weight hierarchy. In particular, g2 is the smallest support of a 2dimensional subcode of C which contains a codeword of weight d. The cost to the adversary (in extra positions read) to get two symbols of information using this algorithm is g2 − d2 . We consider here how large g2 − d2 can be for given n, k, and d. In general, we consider three main cases: Case I: There exist two codewords c1 , c2 ∈ C, such that w(c1 ) = d and χ(c1 ) ∪ χ(c2 ) = d2 . Here and in the following w() denotes the Hamming weight. I this case g2 = d2 by definition and there is nothing more to be said. Case II: We do not have case I, but there exist three codewords c1 , c2 , c3 ∈ C, such that w(c1 ) = d, χ(c2 ) ∪ χ(c3 ) = d2 , and χ(c1 ) ∪ χ(c3 ) = g2 . Case III: We do not have case I or II. Then we consider in detail the 4dimensional subspace D generated by four codewords c1 , c2 , c3 , c4 ∈ C such that two of them generate a subspace of support d2 and the other two a subspace of support g2 which contains a codeword of weight d.
2
Case II Codes
We first consider briefly [n, 3, d] codes. Let m3 = m3 (n, d) be defined by m3 (n, d) = max g2 (D) − d2 (D)  D is an [n, 3, d] code without zeropositions . The exact value of m3 (n, d) was determined in [1]:
− n−d if n+3 ≤ d ≤ 2n−3 , d 3 4 5
4n−15 2n−3d−3 n−d 2n−3 m3 = − 3 if
(1)
For [n, 3, d] codes in general (possibly with zeropositions) which satisfy condition II, we get g2 − d2 ≤ m3 (d3 , d). We have m3 (d3 , d) = 0 for d ≥ 4d73 and 4d3 −7d in particular for d ≥ 4n ≤ 4n−7d for d ≤ 4d73 . 7 . Further, m3 (d3 , d) ≤ 6 6 d Also, m3 (d3 , d) ≤ 2 for all d. Since any [n, k, d] code which satisfy condition II contains an [n, 3, d] code which satisfy condition II, we get the following upper bound: d if 0 ≤ d ≤ 2n 2 5 , 4n−7d 2n 4n g 2 − d2 ≤ (2) if ≤ d ≤ 5 7 , 6 4n 0 d≥ 7.
On the Second Greedy Weight for Binary Linear Codes
3
133
Case III Codes
First we consider codes without zero positions and translate the problem into geometric terms. Let G be a generator matrix for an [n, k] code C. For any x ∈ GF (2)k , m(x), the value of x, will denote the number of occurrences of x as a column in G. In [5] it was shown that there is a oneone correspondence between the subspaces C of dimension r and the subspaces of GF (2)k of dimension (k − r) such that if D corresponds to U, then wS (D) + x∈U m(x) = dk . We may view the vectors as points in the projective space P G(k − 1, 2). A value assignment is a function m : P G(k − 1, 2) → N = {0, 1, 2, . . . }. For p ∈ P G(k − 1, 2) we call m(p) the value of p. A value assignment defines a generator matrix and a code (up to equivalence). We define the value of a subset S of P G(k − 1, 2) as follows: m(S) = p∈S m(p). Suppose that a value assignment m corresponding to an [n, 4, d] code is given. By definition, we have n − d = max{m(P )  P is a plane}. Let α = max{m(l)  l is a line}, β = max{m(l)  l is a line in a plane of value n − d}. Conditions III can then be stated as follows: if l is a line of value α and P is a plane of value n − d, then all lines l in P have value at most β and the lines in P meeting l have value less than β. We get g2 − d2 = α − β. Hence, we want to maximize α − β for given n and d. We denote this maximum by m4 = m4 (n, d). 8 t ✦ ✦✦ ✡❏ ✦ 10 ✦✦ ✡ ❏ ✦t ✡ ❏ ✦ ✡ ❏ ✦✦ ✦ ✦ ✡ ❏ ✦ t 2 ✡ ❏ ❉ ✡ ❏ ❉ 9✡ t ❏t12 ❉ 14 t ✡ ❏ t ❉ ❏ ❉ 11 ✡ t ✡ t15 ❏ ❉ ✡ ❏ 6 t ❉t 13 t 3 ❉ ✡ ❏ ✡ ❏ ❉ 7 ❏ ❉ ✡ ❏t ❉✡ t t 1 5 4 The space P G(3, 2) contains 15 points, 35 lines, and 15 planes. To be able to refer to these items in a simple way we find it convenient to introduce coordinates, that is, each point is given by a nonzero quadruple. To simplify the notation,
134
Wende Chen and Torleiv Kløve
we refer to the point (c1 c2 c3 c4 ) as point 8c1 + 4c2 + 2c3 + c4 and we write m(c1 c2 c3 c4 ) = x8c1 +4c2 +2c3 +c4 . For example, (0110) is named 6 and m(0110) = x6 . The space P G(3, 2) is illustrated in the figure above where all the points and some of the lines are included. Without loss of generality (renaming the points if necessary) we may assume that m(P ∗ ) = n − d, where P ∗ = {1, 2, 3, 4, 5, 6, 7} m(l+ ) = β, where l+ = {1, 2, 3}, ∗ m(l ) = α, where l∗ = {4, 8, 12}. The Range d 2n/7 Theorem 1. If d < Theorem 2. If
n+6 6
n+6 6 ,
then m4 (n, d) = 0.
≤ d and
n − d 5
≤
3n − 8d − 7 5
,
(3)
then m4 (n, d) =
6d − n − 6 . 5
Note that condition (3) is true for d 2n/7. Proof of Theorem 2: The three lines in P ∗ through point 4 all have value at most β − 1 by condition III. Hence 3(β − 1) ≥ m({1, 4, 5}) + m({2, 4, 6}) + m({3, 4, 7}) = n − d + 2x4 , and so 2x4 ≤ 3(β − 1) − (n − d).
(4)
Since {1, 2, 3, 4} ⊂ P ∗ we get β + x4 ≤ n − d. Hence, by (4) we get 2x4 ≤ 3(n − d − x4 − 1) − (n − d) which implies 5x4 ≤ 2n − 2d − 3 and so def
x4 ≤ u =
2n − 2d − 3 . 5
(5)
From (4) we also get β≥
n − d + 2x + 3 4 . 3
(6)
On the Second Greedy Weight for Binary Linear Codes
135
We have α = x4 + x8 + x12 ≤ x4 + d and so n − d − x + 3 n − d + 2x + 3 4 4 =d− α − β ≤ x4 + d − 3 3 n − d − (2n − 2d − 3)/5 + 3 n − d − u + 3 =d− ≤d− 3 3 n − d + 6 6d − n − 6 = . =d− 5 5 If 6d − n − 6 < 0, then α < β. This is impossible and so m4 = 0 in this case. This proves Theorem 1. If 6d − n − 6 ≥ 0, then a value assignment which reaches the upper bound is given as follows: xp = 0 for p ∈ {1, 2, 3, 4, 7, 8, 12},
x8 = d/2, x12 = d/2 ,
x3 x7 x4 = u α µ 0 2µ − 1 d + 2µ − 1 µ 1 2µ − 1 d + 2µ − 1 µ 0 2µ d + 2µ µ 1 2µ d + 2µ µ+1 0 2µ + 1 d + 2µ + 1
6d−n−6 in all cases. This construction Note that α − β = d − µ − 2 = 5 provided n−d 5µ 5µ + 1 5µ + 2 5µ + 3 5µ + 4
x2 µ µ µ+1 µ+1 µ+1
x1 µ+1 µ+1 µ+1 µ+1 µ+1
n − d − 1 ≥ m(P ) = x1 + x4 + x8 + x12 =
n − d + 5 5
+
β 3µ + 1 3µ + 1 3µ + 2 3µ + 2 3µ + 3 is valid
2n − 2d − 3 5
+ d,
where P is the plane {1, 4, 5, 8, 9, 12, 13}, and this is equivalent to condition (3). The Range 2n/7 d 4n/11 Theorem 3. If
2n−4 7
n − 2d 3
+
≤ d and
n − 2d + 1 3
≥
7d − 2n + 6 6
+
7d − 2n + 8 , 6
(7)
then m4 (n, d) =
d − 3 . 2
Note that (7) is satisfied for d 4n/11. Proof of Theorem 3. There are three planes which contain l∗ . These planes must have value at most n − d − 1. Hence n + 2α = m(P ) ≤ 3(n − d − 1) P l∗ ⊂P
136
Wende Chen and Torleiv Kløve
and so α≤
2n − 3d − 3 . 2
(8)
We have x4 = α − (x8 + x12 ) ≥ α − d.
(9)
Combining with (6) we get n − d + 2(α − d) + 3 α − n + 3d − 3 = α−β ≤α− 3
3 2n−3d−3
− n + 3d − 3 d − 3 2 ≤ = . 3 2
. This proves m4 (n, d) ≤ d−3 2 We now give a value assignment which attains this bound in the range given in Theorem 3. xp = 0 for p ∈ {9, 10, 11, 13, 14, 15}, x1 =
x5 =
n − 2d + 2
n − 2d + 1
n − 2d , x2 = , x3 = , 3 3 3
7d − 2n + 4 6 x4 =
, x6 =
7d − 2n + 6 6
, x7 =
7d − 2n + 8 , 6
2n − 5d − 3 d
d , x8 = , x12 = . 2 2 2
The value assignment is valid if xp ≥ 0 for all p, that is, n − 2d ≥ 0 and 7d − 2n + 4 ≥ 0; and x1 + x2 + x3 ≥ x1 + x6 + x7 which is the same as (7). The Range d 4n/11 Theorem 4. If n − d ≡ 1 (mod 7) and
2n − 3d − 3 2
≤
3n − 3d + 3 7
+ 1,
(10)
or n − d ≡ 1 (mod 7) and
2n − 3d − 3 2 then m4 (n, d) = 0.
≤
3n − 3d + 3 , 7
(11)
On the Second Greedy Weight for Binary Linear Codes
137
Theorem 5. If n − d ≡ 1 (mod 7) and
n − d − 2 7
+d≥
2n − 3d − 3 2
≥
3n − 3d + 3 7
+2
(12)
+1
(13)
then m4 (n, d) =
2n − 3d − 3 2
−
3n − 3d + 3 7
− 1.
If n − d ≡ 1 (mod 7) and
n − d − 2 7
+d≥
2n − 3d − 3 2
≥
3n − 3d + 3 7
then m4 (n, d) =
2n − 3d − 3 2
−
3n − 3d + 3 . 7
Consider first the lines in P ∗ . Three of the lines (those containing the point 4) have value at most β − 1, the remaining at most β. Since each point in P ∗ is on three of the lines, we get 3m(P ∗ ) = 3(n − d) ≤ 7β − 3
(14)
and so β≥ If n − d = 7µ + 1, then
3n−3d+3 7
3n − 3d + 3 7
.
(15)
= 3µ + 1. However, β = 3µ + 1 is not possible
in this case. Suppose the contrary. We then have two possibilities: i) 3 lines of value 3µ+ 1 and 4 of value 3µ or ii) 4 lines of value 3µ+ 1, 2 of value 3µ, and 1 of value 3µ − 1. In case i) we may assume without loss of generality that the 3 lines are {1, 2, 3}, {1, 6, 7} and {2, 5, 7}. Hence we get the following set of equations: x1 + x2 + x3 = 3µ + 1, x1 + x6 + x7 = 3µ + 1, x2 + x5 + x7 = 3µ + 1, x3 + x6 + x7 = 3µ, x1 + x4 + x5 = 3µ, x2 + x4 + x6 = 3µ, x3 + x4 + x7 = 3µ. However, this system of equations has the unique solution: 1 x1 = x2 = x7 = µ + , 2
1 x4 = µ − , 2
x3 = x5 = x6 = µ,
which is impossible since the xp are integers. Similarly, in case ii) we get a unique solution where some of the xp are nonintegers. Hence, we have the following relation: 3n − 3d + 3 if n − d ≡ 1 (mod 7), then β ≥ + 1. (16) 7
138
Wende Chen and Torleiv Kløve
Combining (14), (15) and (8), Theorem 4 follows. We also see that m4 is upper bounded by the expressions in Theorem 5. For the points in the plane P ∗ we give the following values:
n − d + 7
n − d + 6
n − d + 3
n − d − 2 x2 = x3 = x4 = x1 = 7 7 7 7 x5 =
n − d + 2
n − d + 1
x7 =
n − d + 4
7 7 . Hence (9) gives the first inequality in (12) and (13). Note that α = 2n−3d−3 2 Next, we define
x8 =
7
x6 =
2n−3d−3 − x 4
2
2
,
x12 =
2n−3d−3 − x 4 2 . 2
For the remaining points, we give the following values: xp =
d + x − 2n−3d−3 4 2 + p , 6
where p is given by the following table, where ν = n − d (mod 7) and η = d (mod 4): (ν, η) 9 10 11 13 14 15 (0,0) (4,2) (5,3) 0 1 0 0 0 0 (0,1) (3,2) 0 1 1 0 0 0 (0,2) (4,0) (5,1) 1 1 1 0 1 0 (0,3) (3,0) 1 1 1 0 1 1 (1,0) (2,0) (3,1) (5,2) (6,3) 0 0 0 0 0 0 (1,1) (2,1) 0 0 1 0 0 0 (1,2) (2,2) (3,3) (5,0) (6,1) 1 1 1 0 0 0 (1,3) (2,3) 1 1 1 0 0 1 (4,1) (6,2) 1 1 1 1 1 0 (4,3) (6,0) 1 1 0 0 0 0 The proof that this construction has the stated properties is similar to the proofs of the previous constructions. The simplest, but most tedious way, is to consider 28 cases depending on the various values of (ν, η). We skip the details. Remarks. 1) The theorems above determine m4 (n, d) for all n and d. This is easy to check, e.g. by looking at the 18 possible combinations of residues of n modulo 3 and d modulo 6 that indeed the conditions of at least one theorem above are satisfied. In some case, two theorems apply. For example both Theorems 2 and 3 can be used to show that m4 (95, 27) = 12, and both Theorems 3 and 5 show that m4 (95, 34) = 15. 2) For [n, 4, d] codes without zeropositions in general (that is, condition III may not be satisfied) can be treated in the same way. We only omit the condition “if l is a line of value α and P is a plane of value n−d, then all lines in P meeting l have value less than β”. We do not give further details.
On the Second Greedy Weight for Binary Linear Codes
4
139
General Dimensions
For [n, 4, d] codes in general (possibly with zeropositions) which satisfy condition 4 III, we get g2 − d2 ≤ m4 (d4 , d). We have m4 (d4 , d) = 0 for d ≥ 8d 15 and in 8d4 −15d 8d4 8n 8n−15d particular for d ≥ 15 . Further, m4 (d4 , d) ≤ ≤ 14 for d ≤ 15 . Also, 14 [m4 (d4 , d) ≤ d2 for all d. In particular, (2) is satisfied also for case III codes. Hence, we have the following theorem. Theorem 6. For any [n, k, d] code we have d if 0 ≤ d ≤ 2n 2 5 , 4n−7d 2n 4n g 2 − d2 ≤ if ≤ d ≤ , 5 7 6 4n 0 d≥ 7 . We next give a construction of a code without zeropositions for general k. It is again given in the projective geometry representation, that is, the points are represented by binary ktuples. For given n, k, and d, let X be a (k − 1)dimensional projective (binary) space; it contains 2k − 1 points. Let H = {(x1 , x2 , . . . , xk )  xk = 0}, which is a fixed (k − 2)dimensional subspace of X and P = {(x1 , x2 , . . . , xk )  xk−1 = xk = 0} which is a fixed (k − 3)dimensional subspace of H. Further, let Q = {(x1 , x2 , . . . , xk )  xk−2 = xk−1 = 0}, which is a fixed (k − 3)dimensional subspace of X which is not contained in H. We want an assignment m such that 1. m(H) = n − d, 2. m(H ) ≤ n − d for all subspaces H ⊂ X of dimension k − 2, 3. m(P ) ≤ m(P ) for all subspaces P of dimension k − 3 of a subspace H of dimension k − 2 and value n − d, 4. m(P ) ≤ m(Q) for all subspaces P ⊂ Xof dimension k − 3. Then d2 = m(P ) and g2 = m(Q). We assign the following values: – All points (x1 , x2 , . . . , xk−1 , xk ) of even weight are assigned the value 0. – The 2k−2 − 1 points (x1 , x2 , . . . , xk−1 , xk ) = (1, 0, . . . , 0, 0) of odd weight in H are assigned the value (d + 1)/2k−3 . – To (1, 0, . . . , 0, 0) we assign the value which makes m(H) = n − d, namely n − d − (2k−2 − 1)(d + 1)/2k−3 . – Finally, (0, 0, . . . , 0, 1) is assigned the value d and the remaining points of odd weight are assigned the value 0.
140
Wende Chen and Torleiv Kløve
For the construction to work we also assume that m(1, 0, . . . , 0, 0) ≥ (d + 1)/2k−3 , that is n ≥ d + 2k−2 (d + 1)/2k−3 . In particular, this is satisfied if n ≥ d + 2k−2 (d + 2k−3 )/2k−3 = 3d + 2k−2 . We get m(P ) = m(1, 0, . . . , 0) + (2k−3 − 1)(d + 1)/2k−3 = n − d − 2k−3 (d + 1)/2k−3 , m(Q) = d + m(1, 0, . . . , 0) + (2k−4 − 1)(d + 1)/2k−3 = n − 3 · 2k−4 (d + 1)/2k−3 . Hence m(Q) − m(P ) = d − 2k−4 (d + 1)/2k−3 ≥ d/2 − 2k−2 . Consider a subspace P of H of dimension k − 3. We have m(P ) − m(P ) = m(P \ P ) − m(P \ P ). Both P \P and P \P contain 2k−4 points of odd weight. All the points in P \P have value (d + 1)/2k−3 and all the points in P \ P have the same value except possibly (1, 0, . . . , 0, 0) which have at least this value. Hence m(P ) ≥ m(P ). If H = H is a space of dimension k − 2, then m(H ) ≤ d + m(H ∩ H) ≤ d + m(P ) = n − 2k−3 (d + 1)/2k−3 < n − d. Hence g2 −d2 = m(Q)−m(P ). We summarize this result in the following theorem. Theorem 7. If n ≥ d + 2k−2 (d + 1)/2k−3 , then there exists an [n, k, d] code C without zeropositions such that g2 − d2 = d − 2k−4 (d + 1)/2k−3 . Asymptotics Cohen et al. [3] studied the asymptotics of (g2 − d2 )/n. They used the notations δ1 = nd , δ2 = dn2 , γ2 = gn2 , and showed that for almost all codes we have H(δ2 ) + δ2 log2 3 = 2H(δ1 ) and H(γ2 ) + γ2 H(δ1 /γ2 ) + δ1 = 2H(δ1 ), where H is the binary entropy function. Solving these equations for δ2 and γ2 in terms of δ1 , we see that the corresponding γ2 − δ2 increases slowly with δ1 , reaching a maximum ≈ 0.002016618 for δ1 ≈ 0.262549 and then it decreases to zero for δ1 = 1/2. In contrast, we note that Theorems 6 and 7 show that (for fixed k), for the codes with maximum g2 − d2 we have asymptotically γ2 − δ2 =
δ1 for all δ1 ≤ 1/3 2
(for δ1 > 1/3, we only get an upper bound on the maximum).
On the Second Greedy Weight for Binary Linear Codes
141
Acknowledgments This work was supported by The National Science Foundation of China and The Norwegian Research Council.
References 1. W. Chen and T. Kløve, ”On the second greedy weight for linear codes of dimension 3”, to appear in Discrete Mathematics. 2. G. D. Cohen, S. B. Encheva and G. Z´emor, ”Antichain codes”, Proc. IEEE Intern. Symp. Inform. Theory, Cambridge, Mass, Aug. 1620, 1998, p. 232. 3. G. D. Cohen, S. B. Encheva and G. Z´emor, ”Antichain codes”, Designs, Codes and Cryptography, to appear. 4. T. Helleseth, T. Kløve, J. Mykkeltveit, ”The weight distribution of irreducible cyclic codes with block lengths n1 ((q l − 1)/N )”, Discrete Math., vol. 18, pp. 179211, 1977. 5. T. Helleseth, T. Kløve, Ø. Ytrehus, “Generalized Hamming weights of linear codes”, IEEE Trans. Inform. Theory, vol. 38, pp. 11331140, 1992. 6. L. H. Ozarow and A. D. Wyner, ”Wiretap channel II”, AT& T Bell Labs Technical Journal, vol. 63, pp. 21352157, 1984. 7. V. K. Wei, ”Generalized Hamming Weights for Linear Codes”, IEEE Trans. Inform. Theory, vol. 37, pp. 14121418, 1991.
On the Size of Identifying Codes Uri Blass1 , Iiro Honkala2 , and Simon Litsyn1 1
2
Department of Electrical Engineering – Systems TelAviv University, RamatAviv 69978, Israel blass n@netvision.net.il, litsyn@eng.tau.ac.il Department of Mathematics, University of Turku, 20014 Turku, Finland honkala@utu.fi
Abstract. A code is called tidentifying if the sets Bt (x) ∩ C are all nonempty and diﬀerent. Constructions of 1identifying codes and lower bounds on the minimum cardinality of a 1identifying code of length n are given.
1
Introduction and Basics
Denote by IF2 the binary alphabet {0, 1}. The (Hamming) distance d(x, y) between any two vectors x, y ∈ IFn2 is the number of coordinates in which they diﬀer from each other. The integer d(x, 0), where 0 denotes the allzero vector (0, 0, . . . , 0), is called the weight of x. If x, y ∈ IFn2 and d(x, y) ≤ t, then we say that x tcovers y (and vice versa). Any nonempty subset C of of IFn2 is called a binary code of length n. The covering radius of a code is the smallest integer R with the property that every vector in IFn2 is within distance R from at least one codeword. As usual, we denote for x ∈ IFn2 , Bt (x) = {y ∈ IFn2  d(y, x) ≤ t} and
St (x) = {y ∈ IFn2  d(y, x) = t}.
The purpose of this paper is to study codes that can be used for identiﬁcation in the following situation: Assume that 2n processors are arranged in the nodes of an ndimensional hypercube. A processor can check itself and all the neighbours in the hypercube within Hamming distance t, and reports YES if everything is all right and NO if there is something wrong in the processor itself or one of these neighbours. We wish to ﬁnd a code C consisting of some of the nodes in the hypercube such that if the checking is done in the corresponding processors, then based on the answers we receive, we can tell where the problem lies — assuming there is a problem in at most one of the processors. Such a code is called tidentifying. This problem has been studied in Karpovsky, Chakrabarty and Levitin [4]. More formally, a tidentifying code is deﬁned as follows. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 142–147, 1999. c SpringerVerlag Berlin Heidelberg 1999
On the Size of Identifying Codes
143
Definition 1. A binary code C of length n is called tidentifying (where t < n) if the sets Bt (x) ∩ C, x ∈ IFn2 are nonempty and diﬀerent. The fact that the sets Bt (x) are all nonempty implies that C has covering radius at most t. Moreover, for all x, y ∈ IFn2 , the condition Bt (x)∩C = Bt (y)∩C implies that x = y. The results of this paper are from [1], where more detailed proofs of our results can be found. The minimum cardinality of a tidentifying code of length n is denoted by Mt (n). Theorem 1. Assume that C is 1identifying. The direct sum {0, 1} ⊕ C is 1identifying if and only if d(c, C \ {c}) ≤ 1 for all c ∈ C. Proof. Denote the code {0, 1} ⊕ C by D. If there is a codeword c ∈ C such that d(c, C \ {c}) > 1, then clearly for both z = (0, c) and z = (1, c) we have B1 (z) ∩ D = {(0, c), (1, c)} and therefore D is not 1identifying. Assume therefore that the condition d(c, C \ {c}) ≤ 1 always holds, and let z = (a, b) ∈ IFn+1 , where a ∈ IF2 and b ∈ IFn2 , be arbitrary. If b ∈ / C, then 2 there are no codewords not beginning with a in B1 (z) ∩ D; if b ∈ C, then there is a unique codeword (a + 1, b) not beginning with a in B1 (z) ∩ D, and by the condition at least two codewords in B1 (z) ∩ D that begin with a. In both cases we can immediately identify z.
We do not know if M1 (n) ≤ 2M1 (n−1) holds — or more generally, Mt1 +t2 (n1 + n2 ) ≤ Mt1 (n1 )Mt2 (n2 ) — but we can prove the following slightly weaker result. Theorem 2. If C is 1identifying then so is C ⊕ {00, 01, 10, 11}. Proof. Assume that C ⊆ IFn2 and let z = (x, y) ∈ IFn+2 , where x ∈ IFn2 and 2 2 y ∈ IF2 , be arbitrary. Without loss of generality, assume y = 00. If x ∈ C, then B1 (z) contains the words (x, 00), (x, 01), (x, 10), and (x, 00) is the only vector covered by all of them. If x ∈ / C, then all the codewords in B1 (z) end in y and we can identify z because C is 1identifying.
2
Constructions
Theorem 3. M1 (4) = 7. Proof. The seven codewords 0000, 0001, 0010, 0101, 0110, 1011, 1101 form a 1identifying code of length four. For the proof of optimality, we refer to [1].
In the following theorems we mention the best previous upper bounds from [4] in parenthesis. Theorem 4. M1 (6) ≤ 19 (20).
144
Uri Blass, Iiro Honkala, and Simon Litsyn
Proof. Take as codewords i) the four words 010000, 001000, 000010, 000001, ii) all the words of weight two except 110000 and its cyclic shifts and iii) all the words of weight ﬁve. It is not diﬃcult to check that this code is indeed 1identifying.
Theorem 5. M1 (7) ≤ 32 (40), M1 (8) ≤ 64 (80), M1 (9) ≤ 128. Proof. In this proof all pairs (x, y) denote binary vectors with x ∈ IF52 , y ∈ IF22 . Denote C = {(x, y)  w(x) = 1} \ {(00001, 00), (00010, 01), (00100, 10), (01000, 11)}. We claim that the sets B1 (u) ∩ C where u = (a, b) and w(a) ≤ 2 are nonempty and diﬀerent. Then it is clear that the codewords of C together with their complements form a 1identifying code with 32 words. This code has the property of Theorem 1, which gives the other two upper bounds. Let u = (a, b) be ﬁxed. If w(a) = 0, then B1 (u) ∩ C consists of four or ﬁve codewords (x, b) ∈ C. If w(a) = 1, then all the codewords B1 (u) ∩ C are of the form (a, y), and as we shall see, there are at least two diﬀerent choices for y. If a = 10000, then y has the following choices: b 00 01 10 11 00 00 00 01 y 01 01 10 10 10 11 11 11 If a = 01000, the choice y = 11 is not available (because (01000, 11) was removed), but the sets of choices for y still remain diﬀerent and all contain at least two elements. The same is true in the three remaining cases. If w(a) = 2, the (one or two) codewords in B1 (u) ∩ C are of the form (x, b). If there are two codewords in B1 (u) ∩ C, we immediately know u; if there is only one, we know that the vector ending in b among the removed four vectors also has to cover u, and we again know u.
Theorem 6. For all even m ≥ 4, M1 (2m − 1) ≤ (2m + 2m−1 − 2)22
m
−2m
.
∗ Proof. (Sketch) Denote by Pm the punctured Preparata code for all even m ≥ 4. We take as codewords all the vectors x ∈ IFn2 such that d(x, c) = 1 for some ∗ ∗ c ∈ Pm together with all the vectors in Hm \ Pm we obtain a 1identifying code. For a detailed proof, see [1]. m ∗ The cardinality of Pm is 22 −2m and therefore the cardinality of our code m equals (n + (2m−1 − 1))22 −2m .
Corollary 1. M1 (15) ≤ 5632 (5760).
On the Size of Identifying Codes
3
145
Lower Bounds
The following theorem has been proved in [4]. We prove it using a short counting argument, which also leads to further results. In the proof we use the concept of excess, cf. e.g. [3, Section 6.3]: Assume that C ⊆ IFn2 has covering radius 1. If a vector x ∈ IFn2 is 1covered by exactly i + 1 codewords of C, we say that the excess E(x) (by C) on x is i. In general, the excess E(V ) on a subset V ⊆ IFn2 is deﬁned by E(V ) = E(x). x∈V Theorem 7. M1 (n) ≥
n2n . V (n, 2)
Proof. Assume that C is a 1identifying code with K = M1 (n) codewords. There may be some at most K points in the space IFn2 covered by a unique codeword; for all the other points x we have E(x) ≥ 1. Every point x with E(x) = 1 is called a son; every point x with E(x) > 1 is called a father. Every son has a unique father deﬁned as follows. A son x is covered by exactly two codewords, who necessarily have distance 1 or 2 from each other. Consequently, they both 1cover exactly one other point, which is called the father of x. Since C is 1identifying, the father must be covered by at least one more codeword and therefore automatically has excess at least 2. We now divide the space into families: all the sons that have a common father, form a family with their father. The families are disjoint; the families together with the uniquely covered points partition the whole space. There may be fathers with no sons. If the father of a family is covered by exactly i codewords, then it can have at most 2i sons. Indeed, each son is covered by exactly two codewords that also cover the father. The average excess on the points in a family whose father is covered by exactly i ≥ 3 codewords is therefore at least f(i) := ( 2i + i − 1)/( 2i + 1). This is a decreasing function on i ≥ 4; and f(3) = f(6). Assume that n ≥ 6. Then f(i) ≥ f(n) whenever 3 ≤ i ≤ n. If there were a codeword c ∈ C such that B1 (c) ⊆ C then we could remove c from the code without violating the identiﬁcation property. Since K = M1 (n), this is not the case. In other words, there are no families with i = n + 1. The total excess E(IFn2 ) trivially equals K(n + 1) − 2n . We now estimate it in a diﬀerent way. The uniquely covered points contribute nothing. The number of such points is at most K. Apart from the uniquely covered points, the remaining points form families, each of which has average excess at least f(n). Therefore K(n + 1) − 2n ≥ (2n − K)f(n),
146
Uri Blass, Iiro Honkala, and Simon Litsyn
The claimed lower bound on K now follows. For n = 5, the argument gives 6K − 25 ≥ (25 − K)f(3) and hence K ≥ 10, so the formula of the theorem also holds when n = 5. The formula also holds for n = 4 by Theorem 3, and it is easy to see that it is also valid for n = 2 and n = 3.
A code C is called eerrorcorrecting if d(a, b) ≥ 2e + 1 whenever a, b ∈ C, a = b. An eerrorcorrecting code is called perfect if it has covering radius e. Theorem 8. Assume that n > 2. Equality holds in the previous theorem if and only if there exists a perfect 2errorcorrecting code of length n. Proof. (Sketch) Assume that n ≥ 7. It has been proved in [4] that if there exists a perfect 2errorcorrecting code of length n, then equality holds in Theorem 7. From the previous proof we see that if equality holds, there are no families with i = n + 1 or i < n, i.e., all the families have i = n and, moreover, consist of n 2 + 1 points. Assume then that a code C attains the bound with equality. If the distance between any two fathers is less than ﬁve, it is not diﬃcult to show that at least one of them does not have enough sons. To prove that D is perfect, we show that the covering radius of D is at most two. Let z ∈ IFn2 be arbitrary. We claim that d(z, D) ≤ 2. If z itself is a father or a son, there is nothing to prove. Assume that z is 1covered by a unique codeword c of C. Because B1 (z) ∩ C = B1 (c) ∩ C, we know that c is covered by another codeword c ∈ C. However, since B1 (c) ∩ C and B1 (c ) ∩ C are diﬀerent and both contain c and c , at least one of them contains another codeword c ∈ C, i.e., either c or c is a father.
It is well known that there is no perfect 2errorcorrecting code of length 90, and therefore 90 · 290 M1 (90) > = 90 · 278 V (90, 2) The following optimality result gives a case where the argument of the previous theorem can be sharpened. Theorem 9. M1 (7) = 32. Proof. (Sketch) By Theorem 5 it suﬃces to prove that M1 (7) ≥ 32. Step 1: By Theorem 7, M1 (7) ≥ 31. Assume that C is a 1identifying code of length seven with K = 31 codewords. From the proof of Theorem 7 we know that if there are less than K points x ∈ IFn2 with E(x) = 0, then 120 = K(n + 1) − 2n ≥ (2n − K + 1)f(7) > 120, which is a contradiction. Hence for every c ∈ C there is a vector x ∈ B1 (c) such that E(x) = 0. Step 2: The 97 points with excess at least one are divided into families, say F1 , F2 , . . . , Fk with fathers y1 , y2 , . . . , yk , respectively. A careful analysis similar to the one used in the proof of Theorem 7 in this speciﬁc shows that we
On the Size of Identifying Codes
147
may assume that F1 , F2 , F3 are families with E(y1 ) = E(y2 ) = E(y3 ) = 6 and all have at least 21 elements. Step 3: Moreover, since n = 7, two of these three fathers are at most distance four apart, say d(y 1 , y2 ) ≤ 4. It is not diﬃcult to show that this leads to a contradiction with the fact that the families are so large or with the observation made in Step 1.
The following table presents the currently known bounds on M1 (n) when n ≤ 9. For results on M2 (n), we refer to [4] and [2]. n M1 (n) 3 a4a 4 b7b 5 c 10 a 6 c 18–19 e 7 d 32 f 8 c 56–64 f 9 c 101–128 f Key a Karpovsky et al. [4] b Theorem 3 c Theorem 7 (which is from [4]) d Theorem 9 e Theorem 4 f Theorem 5
Acknowledgement The second author would like to thank the Finnish Academy for ﬁnancial support.
References 1. U. Blass, I. Honkala, S. Litsyn: Bounds on identifying codes. Discrete Mathematics, to appear 2. U. Blass, I. Honkala, S. Litsyn: On binary codes for identiﬁcation. Journal of Combinatorial Designs, submitted 3. G. Cohen, I. Honkala, S. Litsyn, A. Lobstein: Covering Codes. Elsevier, Amsterdam (1997) 4. M. G. Karpovsky, K. Chakrabarty, L. B. Levitin: On a new class of codes for identifying vertices in graphs. IEEE Trans. Inform. Theory 44 (1998) 599–611
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups Markus P¨ uschel1, Martin R¨ otteler2 , and Thomas Beth2 1
2
Dept. of Mathematics and Computer Science, Drexel University 3141 Chestnut Street, Philadelphia, PA 19104 pueschel@ece.cmu.edu Institut f¨ ur Algorithmen und Kognitive Systeme, Universit¨ at Karlsruhe Am Fasanengarten 5, D76128 Karlsruhe, Germany {roettele,EISS Office}@ira.uka.de
Abstract. An algorithm is presented allowing the construction of fast Fourier transforms for any solvable group on a classical computer. The special structure of the recursion formula being the core of this algorithm makes it a good starting point to obtain systematically fast Fourier transforms for solvable groups on a quantum computer. The inherent structure of the Hilbert space imposed by the qubit architecture suggests to consider groups of order 2n ﬁrst (where n is the number of qubits). As an example, fast quantum Fourier transforms for all 4 classes of nonabelian 2groups with cyclic normal subgroup of index 2 are explicitly constructed in terms of quantum circuits. The (quantum) complexity of the Fourier transform for these groups of size 2n is O(n2 ) in all cases.
1
Introduction
Quantum algorithms are a recent subject and possibly of central importance in physics and computer science. It has been shown that there are problems on which a putative quantum computer could outperform every classical computer. A striking example is Shor’s factoring algorithm (see [27]). Here we address a problem used as a subroutine in almost all known quantum algorithms: The quantum Fourier transform (QFT) and its generalization to arbitrary ﬁnite groups. In classical computation there exist elaborate methods for the construction of Fourier transforms (e. g., [3], [4], [5], [6], [10], [19]), therefore it is highly interesting to adapt and modify these methods to get a quantum algorithm for the Fourier transform with a much better performance (with respect to the quantum complexity model, see Section 3) than any classical algorithm. First attempts in this direction have been proposed by Beals [2] and Høyer [16]. In this paper we present an algebraic approach using representation theory which can be seen as a ﬁrst step towards the realization of a large class of generalized Fourier transforms on a quantum computer. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 148–159, 1999. c SpringerVerlag Berlin Heidelberg 1999
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups
2
149
Representation Theory and Fourier Transforms
Fourier transforms for ﬁnite groups are an interesting and well studied topic for classical computers. We refer to [3], [6], [19], [24] as representatives for a vast number of publications. The reader not familiar with the standard notations concerning group representations should refer to these publications or to standard references like [9] or [26]. For the convenience of the reader we will brieﬂy present the terms and notations from representation theory which we are going to use and recall the deﬁnition of Fourier transforms. A representation of a ﬁnite group G of degree deg(φ) = n is a homomorphism φ : G → GLn (K) from G into the group of invertible (n × n)matrices over a ﬁeld K. We denote by 1G : g → 1 the trivial representation of G (of degree 1). If A ∈ GLn (K), then φA : g → A−1 · φ(g) · A is the conjugate of φ by A. φ and ψ are called equivalent, if φ = ψ A . If φ,ψ are representations of G, then the φ(g) 0 representation φ ⊕ ψ : g → φ(g) ⊕ ψ(g) = is called the direct sum of 0 ψ(g) φ and ψ. φ is called irreducible, if it cannot be conjugated to be a direct sum. In this paper, we will deal only with ordinary representations, i. e. the characteristic of K does not divide the group order G (Maschke condition). In this case, every representation φ can be conjugated, by a suitable matrix A, to ak direct sum of irreducible representations ρi (Maschke’s theorem), i. e. φA = i=1 ρi , which is called a decomposition of φ and A is referred to as a decomposition matrix for φ. Let φ be a representation of H ≤ G, and φ a representation of G which is equal to φ when restricted to H (φ ↓ H = φ). Then φ is called an extension of φ to G and φ is called extensible (to G). Note, that an extension does not exist in general. If φ is a representation of H ≤ G and t ∈ G then φt : h → φ(tht−1 ) is a representation of H t , called the inner conjugate of φ by t. If H ≤ G is a subgroup with transversal (i. e. a system of representatives of the right cosets of ˙ i gt−1 )  i, j = 1 . . . n], where H in G) T = (t1 , . . . , tk ), then (φ ↑T G)(g) = [φ(t j ˙ φ(x) = φ(x) for x ∈ H and the allzero matrix else, is called the induction of φ to G with transversal T . A regular representation is an induction of the form φ = (1E ↑T G) where E denotes the trivial subgroup of G. Let φ be a regular representation of G. A Fourier transform for G is any decomposition matrix A of φ with the additional property that equivalent irreducibles in the corresponding decomposition are even equal. Note, that the deﬁnition says nothing about the transversal ﬁxing φ, nor the choice of the irreducible representations. As an example let G = Zn = x  xn = 1 be the cyclic group of order n with regular representation φ = 1E ↑T G, T = (x0 , x1 , . . . , xn−1 ), n−1 and ωn a primitive nth root of unity. Then φA = i=0 ρi , where ρi : x → ωni and A = DFTn = √1n [ωnij  i, j = 0 . . . n − 1] is the (unitary) discrete Fourier transform wellknown from signal processing. If A is a Fourier transform for the group G, then we will refer to a fast Fourier transform as any fast algorithm for the multiplication with A. Of course, the term “fast” depends on the chosen complexity model. Since we are primarily interested in the realization of a fast Fourier transform on a quantum computer (QFT) we ﬁrst have to deﬁne the measure of complexity on this architecture.
150
3
Markus P¨ uschel, Martin R¨ otteler, and Thomas Beth
The Complexity Model Used in Quantum Computing
Quantum computing is a topic of recent interest which emerged after the discovery of polynomial algorithms for integer factoring and discrete logarithms by P. Shor (see [27]). The state of a quantum computer is given by a normalized vector in a Hilbert space Hn of dimension 2n , which is given the natural tensor structure Hn = C2 ⊗ . . . ⊗ C2 (n factors). The restriction of the computational space to Hilbert spaces of this particular form is motivated by the idea of a quantum register consisting of n quantum bits. A quantum bit, also called qubit, is a state corresponding to one tensor component of Hn and has the form ϕ = α0 + β1 ,
α2 + β2 = 1,
α, β ∈ C.
The possible operations this computer can carry out are the elements of the unitary group U(2n ). To study the complexity of performing unitary operations on nqubit quantum systems we introduce the following two types of computational primitives (see also [15], this volume): Local unitary operations on a qubit i are matrices of the form U (i) := 12i−1 ⊗ U ⊗ 12n−i , where U is an element of the unitary group U(2) of 2 × 2matrices. Furthermore we need operations which aﬀect two qubits at a time, the standard choice for which is the socalled controlled NOT gate (also called measurement gate) between the qubits i (control) and j (target) deﬁned by 1 1 CNOT(i,j) := 1 1
when restricted to the tensor component of the Hilbert space spanned by the qubits i and j. We assume that these socalled elementary quantum gates can be performed with cost O(1). In the graphical notation using quantum wires (for details see [1]) these transforms are written as shown in Figure 1. The lines correspond to the qubits, unaﬀected qubits are omitted, and the dot • sitting on a wire denotes the control bit. U (i) =
U
i
CNOT(2,1) =
•
2
❢
1
Fig. 1. Elementary quantum gates. These two types of gates suﬃce to generate all unitary transformations, which is the content of the following theorem from [1]. Theorem 1. The set G = {U (i) , CNOT(i,j)  U ∈ U(2), i, j = 1 . . . n, i = j} is a generating set for the unitary group U(2n ). This means that for each U ∈ U(2n ) there is a word w1 w2 . . . wk (where wi ∈ G for i = 1 . . . k is an elementary gate) such that U factorizes as U = w1 w2 . . . wk . In general only exponential upper bounds for the minimal length occurring in factorizations have been proved (see [1]) but there are many interesting classes
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups
151
of unitary matrices in U(2n ) aﬀording only polylogarithmic word length, which means that the minimal length is asymptotically O(p(n)) where p is a polynomial. In the following, we give examples of some particular unitary transforms admitting short factorizations which will be useful in the rest of the paper. – The symmetric group Sn is embedded in U(2n ) by the natural operation of Sn on the tensor components (qubits). Let τ ∈ Sn and Πτ the corresponding permutation matrix on 2n points. Then Πτ has a O(n) factorization as shown in [22]. As an example in Figure 2 the permutation (1, 3, 2) of the qubits (which corresponds to the permutation (2, 5, 3)(4, 6, 7) on the register) is factored according to (1, 3, 2) = (1, 2)(2, 3). ❅✁✁ ❅ ❅✁ ✁❅
=
•
❣
•
❣
•
❣
•
❣
•
❣
•
❣
Fig. 2. Factorization (1, 3, 2) = (1, 2)(2, 3). – Following the notation of [1] we denote a ktimes controlled U by Λk (U). As an example for the graphical notation we give in Figure 3 a Λ1 (U) gate for arbitrary U ∈ U(2n ) with normal, and a gate with inverted control bit including the represented matrix. Lemma 7.2 and Lemma 7.5 in [1] show that for U ∈ U(2) the gate Λk (U) can be realized with gate complexity O(n), for k < n − 1, and Λn−1 (U) in O(n2 ). r Λ1 (U ) = 12n ⊕ U =
.. .
U
❜
n+1 .. .
n .. .
U ⊕ 12n =
.. .
U
n+1 .. .
1
n .. . 1
Fig. 3. Controlled gates with a) normal and b) inverted control bit. – The Fourier transform DFT2n can be performed in O(n2 ) elementary operations on a quantum computer (see [27], [8]). – Let Pn ∈ S2n be the cyclic shift acting on the states of the quantum register as x → x + 1 mod 2n . The corresponding permutation matrix is the 2n cycle (1, 2, . . . , 2n ). Pn can be realized using O(n2 ) basic operations (see [13], Section 4.4). Let U ∈ U(2n ). The cost for a (single) controlled U is settled by the following Lemma 1. If U ∈ U(2n ) can be realized in O(p(n)) elementary operations then Λ1 (U) ∈ U(2n+1 ) can also be realized in O(p(n)) basic operations. Proof: First we assume without loss of generality that U is written in elementary gates. Therefore we have to show that a double controlled NOT and a single controlled U ∈ U(2) can be realized with a constant increase of length. This follows from [1].
152
4
Markus P¨ uschel, Martin R¨ otteler, and Thomas Beth
Creating Fast Fourier Transforms
In Section 2 we have explained that a Fourier transform for a group G is a decomposition matrix B for a regular representation φ of G with the additional property that equivalent irreducible summands are equal, i. e. φB = ρ1 ⊕ . . . ⊕ ρk
fulﬁlling ρi ∼ = ρj ⇒ ρi = ρj .
A “fast” Fourier transform (on a classical computer) is given by a factorization of B into a product of sparse matrices. (see [3], [6], [19], [25]). For a solvable group G, this factorization can be obtained recursively using the following idea. First, a normal subgroup of prime index (G : N ) = p is chosen. Using transitivity of induction, φ = 1E ↑ G is written as (1E ↑ N ) ↑ G (note, that we have the freedom to choose the transversals appropriately). Then 1E ↑ N , which again is a regular representation, is decomposed (by recursion) yielding a Fourier transform A for N . In the last step, B is derived from A using a recursion formula. In the following, we will explain this procedure in more detail by ﬁrst presenting the two essential theorems (without proof) and then stating the actual algorithm for deriving fast Fourier transforms for solvable groups. The special tensor structure of the recursion formula mentioned above will allow us to use the very same algorithm as a starting point to also obtain fast quantum Fourier transforms in the case G being a 2group (i. e. G is a power of 2). The statements in this section are all taken from the ﬁrst chapter of [24] where decomposition matrices for arbitrary monomial representations in general are investigated. The ﬁrst thing we need is Cliﬀord’s Theorem which explains the relationship between the irreducible representations of N and those of G. Theorem 2 (Cliﬀord). Let N ✂ G be a normal subgroup of prime index p with (cyclic) transversal T = (t0 , t1 , . . . , t(p−1)) and denote by λi : t → ωpi , i = 0 . . . p − 1, the p irreducible representations of G arising from G/N . Assume ρ is an irreducible representation of N . Then exactly one of the two following cases applies: 1. ρ ∼ = ρt and ρ has p pairwise inequivalent extensions to G. If ρ is one of them, then all are given by λi · ρ, i = 0 . . . p − 1. p−1 i 2. ρ ∼ ρt and ρ ↑T G is irreducible. Furthermore, (ρ ↑T G) ↓ N = i=0 ρt = and (λi · (ρ ↑T G))
D⊗1d
= ρ ↑T G,
D = diag(1, ωp , . . . , ωp(p−1))i .
The following theorem provides the recursion formula which had already been used in [3] to obtain fast Fourier transforms. Theorem 3. Let N ✂ G be a normal subgroup of prime index p with transversal T = (t0 , t1 , . . . , t(p−1)) and φ a representation of degree d of N . Suppose that A is matrix decomposing φ into irreducibles, i. e. φA = ρ = ρ1 ⊕ . . . ⊕ ρk and that ρ is an extension of ρ to G. Then (φ ↑T G)B =
p−1 i=0
λi · ρ,
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups
153
where λi : t → ωpi , i = 0 . . . p − 1, are the p irreducible representations of G arising from the factor group G/N , B = (1p ⊗ A) · D · (DFTp ⊗ 1d ),
and
D=
p−1
ρ(t)i .
i=0
If, in particular, ρ is a direct sum of irreducibles, then B is a decomposition matrix of φ ↑T G. In case of an cyclic group G the formula yields exactly the wellknown CooleyTukey decomposition, [7], in which D is usually called the Twiddle matrix. Assume that N ✂ G is a normal subgroup of prime index p with Fourier m transform A and decomposition φA = ρ = i=1 ρi . We can reorder the ρi , such that the ﬁrst, say k, ρi have an extension ρi to G and the other ρi occur (p−1) of inner conjugates (cf. Theorem 2, note as sequences ρi ⊕ ρti ⊕ . . . ⊕ ρti j that irreducibles ρi , ρti have the same multiplicity since φ is regular). In the ﬁrst case the extension may be calculated by Minkwitz’ formula, [21], in the latter case each sequence can be extended by ρi ↑T G (Theorem 2, Case 2). We do not state Minkwitz’ formula here, since we will not need it in the special cases treated in Section 5. Altogether we obtain an extension ρ of ρ and can apply p Theorem 3. The remaining task is to assure, that equivalent irreducibles in i=1 λi · ρ are equal. For summands of ρ of the form ρi we have that λj · ρi and ρi are inequivalent and hence there is nothing to do. For summands of ρ of the form ρi ↑T G, we conjugate λj · (ρi ↑T G) onto ρi ↑T G using Theorem 2, Case 2. Now we are ready to formulate the recursive algorithm for constructing a fast Fourier transform for a solvable group G. Algorithm 1. Let N ✂ G a normal subgroup of prime index p with transversal T = (t0 , t1 , . . . , t(p−1) ). Suppose that φ is a regular representation of N with (fast) Fourier transform A, i. e. φA = ρ1 ⊕ . . . ⊕ ρk , fulﬁlling ρi ∼ = ρj ⇒ ρi = ρj . A Fourier transform B of G with respect to the regular representation φ ↑T G can be obtained as follows. 1. Determine a permutation matrix P rearranging the ρi , i = 1 . . . k, such that the extensible ρi (i. e. those satisfying ρi = ρti ) come ﬁrst followed by the oth(p−1) . (Note: ers ordered into sequences of length p equivalent to ρi , ρti , . . . , ρti (p−1) which is established in These sequences need to be equal to ρi , ρti , . . . , ρti the next step). 2. Calculate a matrix M which is the identity on the extensibles and conjugates (p−1) the sequences of length p to make them equal to ρi , ρti , . . . , ρti . 3. Note that A · P · M is a decomposition matrix for φ, too, and let ρ = φA·P ·M . Extend ρ to G summandwise. For the extensible summands use Minkwitz’ (p−1) formula, the sequences ρi , ρti , . . . , ρti can be extended by ρi ↑T G. p−1 ρ(t)i . 4. Evaluate ρ at t and build D = i=0
154
Markus P¨ uschel, Martin R¨ otteler, and Thomas Beth
5. Construct a blockdiagonal matrix C with Theorem 2, Case 2, conjugating p−1 i=0 λi · ρ such that equivalent irreducibles are equal. C is the identity on the extended summands. Result: B = (1p ⊗ A · P · M) · D · (DFTp ⊗ 1N ) · C
(1)
is a fast Fourier transform for G.
It is obviously possible to construct fast Fourier transforms on a classical computer for any solvable group by recursive use of this algorithm. Since we restrict ourselves to the case of a quantum computer consisting of qubits, i. e. twolevel systems, we apply Algorithm 1 to obtain QFTs for 2groups (G = 2n , p = 2). In this case the two tensor products occurring in (1) ﬁt very well to yield a coarse factorization as shown in Figure 4. The lines correspond to the qubits like in Section 3 and a box ranging over more than one line denotes a matrix admitting no a priori factorization into a tensor product. The remaining problem is the realization of the matrices A, P, M, D, C in terms of elementary building blocks as presented in Section 3. At present, however, this realization remains a creative process which might be performed by hand if a certain class of groups is given. In Section 5 we will apply Algorithm 1 to a class of nonabelian 2groups. n
DFT2 .. .
A
.. .
P
.. .
M
D
.. .
.. .
C
.. .
n–1 .. . 1
Fig. 4. Coarse quantum circuit visualizing Algorithm 1 for 2groups.
5
Generating QFTs for a Class of 2Groups
In the case of G being an abelian 2group the realization of a fast quantum Fourier transform has been settled by [18]. This case is also covered by the method presented here. In this section we will apply Algorithm 1 to the class of nonabelian 2groups containing a cyclic normal subgroup of index 2. Fast quantum Fourier transforms for these groups have already been constructed by Høyer in [16]. According to [17], p. 90/91, there are for n ≥ 3 exactly four isomorphism types of nonabelian groups of order 2n+1 aﬀording a cyclic normal subgroup of order 2n : 1. 2. 3. 4.
The The The The
n
dihedral group D2n+1 = x, y  x2 = y 2 = 1, xy = x−1 . n quaternion group Q2n+1 = x, y  x2 = y 4 = 1, xy = x−1 . n n−1 group QP2n+1 = x, y  x2 = y 2 = 1, xy = x2 +1 . n n−1 quasidihedral group QD2n+1 = x, y  x2 = y 2 = 1, xy = x2 −1 .
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups
155
Observe that the extensions 1, 3, and 4 of the cyclic subgroup Z2n = x split, i. e. the groups have the structure of a semidirect product of Z2n by Z2 . The three isomorphism types correspond to the three diﬀerent embeddings of Z2 = y into (Z2n )× ∼ = Z2 × Z2n−2 . 5.1
QFT for the Dihedral Groups D2n+1
In this section we construct a QFT for the dihedral groups D2n+1 step by step according to Algorithm 1 and explicitly state the occurring quantum circuits. n Let G = D2n+1 = x, y  x2 = y 2 = 1, xy = x−1 with normal subgroup N = x ✂ G of index 2 and transversal T = (1, y). We consider the regular n representation φ = (1E ↑S N ) ↑T G of G with S = (1, x, . . . , x2 −1 ). Obviously the regular representation (1E ↑S N ) of N is decomposed by A = DFT2n into ρ0 ⊕ . . . ⊕ ρ2n −1 where ρi : x → ω2i n . Now we are ready to apply Algorithm 1 to obtain a decomposition matrix B for φ. For convenience we denote ω2n simply as ω and H = DFT2 = √12 · 11 −11 . 1. Since ρyi (x) = ρi (yxy −1 ) = ρi (x−1 ) = ρ2n −i (x) we see that there are exactly two extensible ρi namely for i = 0, 2n−1 . The sequences of inner conjugates are given by ρi , ρ2n −i , i = 0, 2n−1 . Thus we need a permutation P reordering the ρi as ρ , ρ n−1 , ρ , ρ n , . . . , ρi , ρ2n −i , . . . , ρ2n−1 −1 , ρ2n−1 +1 . 0 2 1 2 −1
extensibles
pairs of inner conjugates
This can be accomplished by the circuit given in Figure 5 since the ncycle on the qubits which is performed ﬁrst yields a decimation by two on the indices, i. e. the indices 0, . . . , 2n−1 − 1 have found their correct position. The only thing which remains to do is to perform the operation x → −x on the odd positions. This can be done by an inversion of all (odd) bits followed by a x → x + 1 shift Pn−1 on the odd states of the register. ❉ ...
❉
❉ ❉.
..
❉ ❉ ❉
.. .
x → −x
•
.. .
=
...
❢ ❢
❉ ❉.
..
❉ ❉
❢ ❢ ❉
•
Pn−1
.. .
•
Fig. 5. Ordering the irreducibles of Z2n ✂ D2n+1 . 2. M can be omitted since all the ρi are of degree 1. 3. Let φA·P = ρ. We extend ρ summandwise to ρ: – ρ0 = 1N can be extended by 1G . – ρ2n−1 can be extended through ρ2n−1 (y) = 1. – The sequences ρi ⊕ ρ2n −i , i = 0, 2n−1 can be extended by ρi ↑T G and (ρi ↑T G)(y) = ( 01 10 ).
156
Markus P¨ uschel, Martin R¨ otteler, and Thomas Beth
4. Evaluation of ρ at the transversal T yields the Twiddle matrix D = ρ(1) ⊕ ρ(y) = 12n ⊕ 12 ⊕ ( 01 10 ) ⊕ . . . ⊕ ( 01 10 ),
2n−1 −1
which is realized by the quantum circuit given in Figure 6. r .. .
❢
r ❜
r
❜ ❜ ❢
.. .
r ❜
.. .
.. .
H
Fig. 6. Twiddle matrix for D2n+1 .
❢
❜ ❜ ❢ H
Fig. 7. Equalizing inductions.
5. According to Theorem 2, Case 2, the matrix C has the following diagonal form: C = 12n ⊕ diag(1, 1, 1, −1, . . . , 1, −1),
2n−1 −1 pairs
which is realized by the quantum circuit given in Figure 7. We obtain that B = (1p ⊗ A · P · M) · D · (DFTp ⊗ 1N ) · C is a decomposition matrix for φ and a fast quantum Fourier transform for G. The whole circuit is shown in Figure 8. r
...
DFT2n
...
❢ ❢
❉ ❉ ❉.
..
❉ ❉
❢ ❢ ❉
•
Pn−1
•
.. .
❢
r ❜ ❜ ❜ ❜ ❢
r
H
r ❜ ❜ .. .
H
❢
❜ ❜ ❢ H
Fig. 8. Complete QFT circuit for the dihedral group D2n+1 . 5.2
QFT for the Groups Q2n+1 , QP2n+1 , and QD2n+1
In the following we give the circuits for the groups Q2n+1 , QP2n+1 , and QD2n+1 . In all cases we have x = N ✂ G so that Algorithm 1 has to be performed only once for the last step. For the sake of brevity we will state only those parts of the circuit which diﬀer from the dihedral group. We use the same notation as in the last section. Q2n+1 : The irreducibles ρi extend or induce in the same way as in the dihedral case. Hence the QFT only diﬀers in the Twiddle matrix D since for a not
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups
. Thus D is given by 0 1 ⊕ 12 ⊕ −1 0 ⊕ . . . ⊕ −10 10
extensible ρi we have (ρi ↑T G)(y) = D = 1 2n
157
01 −1 0
2n−1 −1
and can be realized by the circuit in Figure 9. r
r ❜
.. .
.. .
❜ ❜ 0 1 1 0
❈ ✄ ❈ ✄ ❈✄ . .. .. ✄❈ . ✄❈ ✄ ❈ ✄ ❈
0 1 1 0
Fig. 9. Twiddle matrix for Q2n+1 .
Fig. 10. Permutation for QP2n+1 .
QP2n+1 : To determine which ρi are extensible we observe ρyi (x) = ρi (yxy −1 ) = n−1 n−1 n−1 = 1 ⇔ 2  i and there ρi (x2 +1 ). Hence ρi = ρyi ⇔ ω i = ω i·(2 +1) ⇔ ω i·2 are exactly 2n−1 extensible ρi . The reordering permutation P has the easy form shown in Figure 10, and the matrix D is given by D = 12n ⊕ 12n−1 ⊕ ( 01 10 ) ⊕ . . . ⊕ ( 01 10 ),
2n−2
which is simply a double controlled NOT as visualized in Figure 11. The matrix C then is given by Figure 12. r r ...
r r ...
...
❢
H
Fig. 11. Twiddle matrix for QP2n+1 . QD2n+1 : Here we have ρyi (x) = ρi (x2 ρi = ρyi ⇔ ω i = ω i·(2
n−1
−1)
n−1
...
❢
H
Fig. 12. Equalizing for QP2n+1 . −1
) and
⇔ ω i·(2
n−1
−2)
= 1 ⇔ i = 0, 2n−1 .
Thus everything is the same as in the dihedral case beside the ordering permutation P which takes the more complicated form shown in Figure 13. Concerning the complexity of these QFTs we have the following theorem. Theorem 4. The Fourier transforms for the groups G = D2n , Q2n , QP2n , and QD2n can be performed on a quantum computer in O(log2 G) elementary operations.
158
Markus P¨ uschel, Martin R¨ otteler, and Thomas Beth
❇ .. .
❇ ❇
❇❇
❈ ✄ ❈ ✄ .. ❈✄ .. . ✄❈ . ✄ ❈ ✄ ❈ ❜
❢ ❢ ❢ r
Pn−2
. ..
r
r ❜
...
r ❜
Fig. 13. The permutation for the QD2n+1 . Proof: We can treat the four series uniformly, since the Fourier transforms all have the same decomposition pattern. First, in all cases a Fourier transform for the normal subgroup Z2n−1 is performed with cost of O(n2 ) basic operations. The reordering permutation P , the Twiddle matrix D, and the equalizing matrix C cost O(n2 ) in case of D2n , Q2n , and QD2n due to Lemma 1 and the examples in Section 3. For QP2n we need only O(1) operations for P , D, and C. All presented Fourier transforms have been implemented by the authors in the language GAP [14] using the share package AREP [12].
6
Conclusions and Outlook
A constructive algorithm has been presented allowing to attack the problem of constructing fast Fourier transforms for 2groups G on a quantum computer built up from qubits. For a certain class of nonabelian 2groups the algorithm has been successfully applied. All the QFTs created are of computational complexity O(log2 G) like in the case of the cyclic group Z2n . The main problem imposed by the implementation of certain permutation and block diagonal matrices has been solved eﬃciently. Using the recursion formula from Theorem 3 it should be possible to construct QFTs for other classes of groups as well as to realize certain signal transforms on a quantum computer by means of symmetrybased decomposition (see [24], [11], [20]).
Acknowledgments The authors are indebted to Markus Grassl for helpful comments and fruitful discussions. Part of this work was presented and completed during the 1998 ElsagBailey – I.S.I. Foundation research meeting on quantum computation. M. R. is supported by Deutsche Forschungsgemeinschaft, Graduiertenkolleg Beherrschbarkeit Komplexer Systeme under Contract No. DFGGRK 209/398.
References 1. A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter. Elementary gates for quantum computation. Physical Review A, 52(5):3457–3467, November 1995. 2. R. Beals. Quantum computation of Fourier transforms over the symmetric groups. In Proc. STOC 97, El Paso, Texas, 1997. 3. Th. Beth. Verfahren der Schnellen Fouriertransformation. Teubner, 1984.
Fast Quantum Fourier Transforms for a Class of Nonabelian Groups
159
4. Th. Beth. On the computational complexity of the general discrete Fourier transform. Theoretical Computer Science, 51:331–339, 1987. 5. M. Clausen. Fast generalized Fourier transforms. Theoretical Computer Science, 67:55–63, 1989. 6. M. Clausen and U. Baum. Fast Fourier Transforms. BIVerlag, 1993. 7. James W. Cooley and John W. Tukey. An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation, 19:297–301, 1965. 8. D. Coppersmith. An Approximate Fourier Transform Useful for Quantum Factoring. Technical Report RC 19642, IBM Research Division, 1994. 9. W.C. Curtis and I. Reiner. Methods of Representation Theory, volume 1. Interscience, 1981. 10. P. Diaconis and D. Rockmore. Eﬃcient computation of the Fourier transform on ﬁnite groups . Amer. Math. Soc., 3(2):297–332, 1990. 11. S. Egner. Zur Algorithmischen Zerlegungstheorie Linearer Transformationen mit Symmetrie. PhD thesis, Universit¨ at Karlsruhe, Informatik, 1997. 12. S. Egner and M. P¨ uschel. AREP – A Package for Constructive Representation Theory and Fast Signal Transforms. GAP share package, 1998. http://avalon.ira.uka.de/home/pueschel/arep/arep.html. 13. A. Fijany and C. P. Williams. Quantum Wavelet Transfoms: Fast Algorithms and Complete Circuits. In Proc. NASA conference QCQC 98, LNCS vol. 1509, 1998. 14. The GAP Team, Lehrstuhl D f¨ ur Mathematik, RWTH Aachen, Germany and School of Mathematical and Computational Sciences, U. St. Andrews, Scotland. GAP – Groups, Algorithms, and Programming, Version 4, 1997. 15. M. Grassl, W. Geiselmann, and Th. Beth. Quantum Reed–Solomon Codes. In Proc. of the AAECC–13 (this volume), 1999. 16. P. Høyer. Eﬃcient Quantum Transforms. LANL preprint quant–ph/9702028, February 1997. 17. B. Huppert. Endliche Gruppen, volume I. Springer, 1983. 18. A. Yu. Kitaev. Quantum Measurements and the Abelian Stabilizer Problem. LANL preprint quant–ph/9511026, November 1995. 19. D. Maslen and D. Rockmore. Generalized FFTs – A survey of some recent results. In Proceedings of IMACS Workshop in Groups and Computation, volume 28, pages 182–238, 1995. 20. T. Minkwitz. Algorithmensynthese f¨ ur lineare Systeme mit Symmetrie. PhD thesis, Universit¨ at Karlsruhe, Informatik, 1993. 21. T. Minkwitz. Extension of Irreducible Representations. AAECC, 7:391–399, 1996. 22. C. Moore and M. Nilsson. Some notes on parallel quantum computation. LANL preprint quant–ph/9804034, April 1998. 23. M. P¨ uschel. Konstruktive Darstellungstheorie und Algorithmengenerierung. PhD thesis, Universit¨ at Karlsruhe, Informatik, 1998. Translated in [24]. 24. M. P¨ uschel. Constructive representation theory and fast signal transforms. Technical Report DrexelMCS19991, Drexel University, Philadelphia, 1999. Translation of [23]. 25. D. Rockmore. Some applications of generalized FFT’s. In Proceedings of DIMACS Workshop in Groups and Computation, volume 28, pages 329–370, 1995. 26. J. P. Serre. Linear Representations of Finite Groups. Springer, 1977. 27. P. W. Shor. Algorithms for Quantum Computation: Discrete Logarithm and Factoring. In Proc. FOCS 94, pages 124–134. IEEE Computer Society Press, 1994.
Linear Codes and Rings of Matrices M. Greferath and S.E. Schmidt AT&T LabsResearch, 180 Park Ave, Florham Park, NJ 07932
Abstract. We investigate the role of matrix rings in coding theory. Using these rings we introduce an embedding technique which enables us to give new interpretations for some previous results about algebraic representations of some prominent linear codes.
Introduction Codes over rings have been enjoying growing importance during the recent decade. This is based on the observation by Nechaev [13] and Hammons et al. [9] that certain nonlinear binary codes (Kerdock, Preparata and further Codes) of high quality are linear over ZZ 4 . Since then various papers have been published many of them dealing with codes over ZZ 4 but also involving other rings (cf. [8,3,2]). Matrix rings are a particular class of noncommutative alphabets which— apart from [2]—have not yet been involved in coding theory. In what follows we therefore ﬁrst describe relations of coding theory over the latter class of rings to that over their base rings. Subsequently we consider embeddings of given rings into matrix rings and study their properties. To get prepared for the last section we consider matrix representations of algebras. Extensions of SkolemNoether type theorems based on results by Nechaev [12] show that under suitable conditions an embedding of a Kalgebra R into a matrix algebra Mm (K) is unique up to inner automorphisms of the target algebra, provided K is a commutative Artinian local ring. In the last section we apply the preparations of the preceeding ones and provide a diﬀerent view on several known code representations via matrix rings. In particular our considerations shed new light on the results in Wolfmann [16] and Pasquier [14] (cf. also Goldberg [7]) and yield similar statements for extended binary Hamming Codes and the Octacode. In what follows the reader is supposed to be familiar with basic facts of ring and module theory as well as order theory. We will frequently denote the set of all linear (left) codes of length n over a ring K—i.e. the lattice of all submodules of K K n —by L(K K n ). Hence, our use of the term lattice is the order theoretical one rather than the one usually referred to in coding theory.
Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 160–169, 1999. c SpringerVerlag Berlin Heidelberg 1999
Linear Codes and Rings of Matrices
1
161
Linear Codes under Morita Equivalence
Let K be a ring, m a positive integer, and let S denote the full ring of all (m × m)matrices over K. We establish a connection between linear codes over K and those over S referring to what is known as Morita equivalence. For a module K M let S operate on the abelian group M m via: S × M m −→ M m ((aij )i,j , (mj )j ) −→ ( aij mj )i , j
where the index domain of (aij )i,j and (mj )j are thought to be understood. Hence, S operates in the same way on M m , as it operates on K m ; the only diﬀerence is that the matrix elements are multiplied with module elements rather than ring elements. Lemma 1. For every module isomorphic to S M m .
SN
there exists a module
KM
such that
SN
is
The following lemma clariﬁes the connection between the submodule lattices involved. For an element x of a module K M let x(m) denote the mfold concatenation of x, i.e. the element (x1 , . . . , xm ) ∈ M m with xi = x for all i = 1, . . . , m. Proposition 1. 1. The mapping M −→ M m with x −→ x(m) is a semilinear embedding where the accompanying ring homomorphism K −→ S is given by the natural embedding of K in S with λ → λIm . 2. Its induced lattice mapping L(K M) −→ L(S M m ) is given by C −→ C m and is an isomorphism. Proof. It is easily veriﬁed that the mapping in question is additive and that (λx)(m) = (λIm )x(m) , for all λ ∈ K and x ∈ M, which is our ﬁrst claim. The second claim is a direct consequence. We are interested in how the property of being free carries over to the respective module. Lemma 2. Let S be the ring of (m × m)matrices over the ring K, let K M and m S N be modules with S N S M . Then S N is free of rank n if and only if K M is free of rank nm. For the coding theoretical context we now summarize what we have seen in a theorem: Theorem 1. For a ring K and its ring of all (m × m)matrices Mm (K) there is a 1 − 1correspondence between linear codes of length mn over K and linear codes of length n over Mm (K). This correspondence is induced by (semilinearly) embedding the former as their mfold concatenations into the latter.
162
M. Greferath and S.E. Schmidt
Remark 1. There is a natural relation between the respective decoding problems: 1. Any decoding scheme A for a Klinear code C can be extended to a scheme A∗ used to decode its Mm (K)linear image C m under the natural embedding established above. It proceeds in m steps at the cost of one application of A per step. It corrects any matrix error pattern each full row of which is!correctable by A. 2. Any decoding scheme A∗ for an Mm (K)linear code C m can be reduced to a scheme A used to decode its Klinear preimage C under the natural embedding. It proceeds at the cost of one application of A. It corrects any error pattern e the mfold concatenation e(m) of which is correctable by A∗ . The correspondence between codes over rings and those over their matrix rings is purely algebraic so far. It does not imply any further going equivalence since it does not involve any metrical aspect. The following considerations are devoted to this point. We establish a weight function on Mm (K) for which the semilinear embedding mentioned in Prop. 1 is an isometry. Let again S denote the ring of all (m × m)matrices over the ring K, and let wK : K −→ IR be a weight function on K, which shall be thought to be completed additively on K m . We deﬁne a function wS : S −→ IR A −→ max wK (Ai ), i=1...m
where Ai denotes the ith row of A. This kind of rowsum norm is known to satisfy the triangle inequation and is strictly positive. We have furthermore wS (A) = wS (−A) for all A ∈ S. As usual we complete wS additively to S n (n ∈ IN). Then our deﬁnition gets justiﬁed by the following statement. Proposition 2. Let wK be a weight function on K and wS the just defined weight function on S. The semilinear embedding K K nm −→ S S n with x −→ x(m) is an isometry of (K nm , wK ) into (S n , wS ). Proof. It is easily veriﬁed that wS (x(m) ) = wK (x) for all x ∈ K nm .
For the set of all linear Codes of length nm over K and those of length n over S we are now able to prove the following: Theorem 2. Let K be a unital ring, wK a weight function on K and S the ring of all (m × m)matrices over K together with the abovedefined weight function wS . For every natural number n the lattice isomorphism L(K K nm ) −→ L(S S n ) induced by the isometry K nm −→ S n preserves minimum distance. Proof. Let C be a Klinear code of length nm, and let C m be its Slinear image under the lattice mapping in question. If x is a word of minimal Kweight in C, then by Prop. 2 we have wS (x(m ) = wK (x) and therefore dS (C m ) ≤ dK (C). Conversely, if y is a word of minimal Sweight in C m , then each full row yi
Linear Codes and Rings of Matrices
163
(i = 1, . . . , m) of this matrix word is contained in C and satisﬁes by deﬁnition of wS clearly wK (yi ) ≤ wS (y). This however shows, that dK (C) ≤ dS (C m ), and therefore our claim follows. We are now able to reﬁne our statements in Rem. 1. Theorem 3. Let K be a unital ring, wK a weight function on K and S the ring of all (m × m)matrices over K together with the rowsum weight function wS . Let furthermore A be a decoding scheme for a Klinear code C of length nm and A∗ its extension for the Slinear code C m . Then A is bounded distance (with respect to wK ) if and only if A∗ is bounded distance (with respect to wS ). Proof. Let ﬁrst A correct error patterns of Kweight up to t, and let y be an Serror pattern of Sweight s ≤ t. As we saw in the proof of Thm. 2 the Kweight of each full row of y is at most s, and hence it can be corrected by A. But this shows that A∗ corrects Serror patterns of Sweight upt to t. Conversely, if A∗ corrects error patterns of Sweight up to t, and x is a Kerror pattern of Kweight s ≤ t, then by Prop. 2 combined with wS (x(m) ) = wK (x) it is immediate that x(m) can be corrected by A∗ , and therefore x can be corrected by A.
2
Code Correspondences Induced by Ring Embeddings
The situation of the foregoing section changes signiﬁcantly when considering more general embeddings. Let ϕ : R −→ S be a unital ring embedding and let n be a natural number. The given embedding clearly induces a semilinear embedding ϕ : R Rn −→ n S S deﬁned componentwise and the latter gives rise to a lattice mapping ϕ : L(R Rn ) −→ L(S S n) with C → S ϕ(C). This mapping need not be injective, as the natural embedding of ZZ into Q shows. It is immediate, however, that ϕ is (completely)joinpreserving, i.e. for any family (Ci )i∈I of submodules of R Rn there holds ϕ( i∈I Ci ) = i∈I ϕ(Ci ). Under appropriate assumptions on the relation between R and S an according statement for ﬁnite meets can be proved. Recall that S obtains the structure of an Rbimodule by setting r · s := ϕ(r)s and s · r := sϕ(r), and it is clear that (r · s) · r = r · (s · r ) for all r, r ∈ R and s ∈ S. Keeping this in mind we ﬁnd it convenient to write ϕ(C) = S ⊗ϕ R C. Lemma 3. If SR is a flat module, then ϕ : L(R Rn ) −→ L(S S n ) is (finite) meet preserving, i.e. ϕ(C ∩ D) = ϕ(C) ∩ ϕ(D) for all C, D ∈ L(R Rn ). Proof. cf. [4, §2.6].
The last statement is in particular valid, if SR is a projective module. In the coding theoretical context the ﬂatness of SR might be important under a diﬀerent aspect: Let an Rlinear left code C of length n possess a (k × n)generator matrix G and an (m × n)check matrix H. We have C = Im(G), which is usually represented by the short exact sequence G
0 −→ ker(G) −→ Rk −→ C −→ 0.
164
M. Greferath and S.E. Schmidt
Applying ϕ to this sequence and subsequent tensoring with S SR and idS results in ϕ(C) = Im(ϕ(G)) because of ⊗ϕ R being right exact. Therefore ϕ maps generator matrices for C to such for ϕ(C). On the other hand the equality C = ker(H t ) may be represented by the sequence Ht
0 −→ C −→ Rn −→ Im(H t ) −→ 0, the exactness of which is not preserved in general under the above manipulations. If however SR is presumed to be ﬂat then ϕ(C) = ker(ϕ(H t )), which shows that ϕ maps check matrices for C to such for ϕ(C). Next, we clarify under which conditions ϕ is injective, and hence an order embedding. It is known, direct decompositions (cf. [1, that tensoring preserves Lemma 19.9]), i.e. if i∈I Ci = Rn then i∈I ϕ(Ci ) = S n. For our question we therefore obtain (cf. [11, 5.2.5]): Proposition 3. The restriction of the lattice mapping ϕ : L(R Rn ) −→ L(S S n ) to the set of all direct summands of R Rn is injective. We summarize what we have seen in terms of coding theory. From [8] recall a splitting code to be a linear code having a complement in its ambient module. Theorem 4. Let ϕ : L(R Rn ) −→ L(S S n) denote the lattice mapping induced by the ring embedding ϕ : R −→ S. 1. ϕ is joinpreserving. Its restriction to the set of all splitting codes is injective. 2. If the module SR is flat then ϕ is meetpreserving. 3. If R is semisimple then ϕ is a lattice embedding. Proof. The ﬁrst two statements result from our initial remarks together with Prop. 3 and Lem. 3. For our last claim we just remark that in the semisimple case every code is splitting and every Rmodule is ﬂat because of projectivity (cf. [1, Cor. 17.4]).
3
Matrix Representations of Algebras
In what follows, let K be a commutative ring, and R be a Kalgebra. We are interested in unital embeddings R −→ Mm (K), for suitable m ∈ IN, and ﬁrst give conditions under which these always exist. Proposition 4. A unital Kalgebra embedding R −→ Mm (K) exists if and only if there exists a faithful (left module) operation of R on K m . Proof. Suppose there is an embedding σ : R −→ Mm (K). Then K m naturally obtains the structure of an Rmodule by r · x = σ(r)x for all x ∈ K m . Obviously r · K m = 0 implies r = 0, which means that R is operating faithfully on K m . If, conversely, K m is an Rmodule, then leftmultiplication by elements of R is a Kendomorphism of K m , which provides a homomorphism σ : R −→ Mm (K) which is injective if and only if R K m is a faithful module.
Linear Codes and Rings of Matrices
165
Observing that R R is always a faithful module it is now clear that if R is free of rank m over K, then there is a natural faithful operation of R on K m . According to Prop. 4 this gives rise to an embedding of R in Mm (K). For a particular case this embedding is easy to construct. Proposition 5. Let K be an Artinian commutative local ring and let f := m i f x ∈ K[x] be a monic irreducible polynomial. Then the mapping β : i i=0 K[x] −→ Mm (K), g(x) → g(X) induces a unital embedding of K[x]/f into Mm (K) where 0 0 · · · 0 −f0 1 0 · · · 0 −f1 X := 0 1 · · · 0 −f2 . .. . . . . .. .. . . . . . 0 0 · · · 1 −fm−1 The matrix X introduced in the last proposition is known as the companion matrix of the polynomial f. As we saw, it provides a particular embedding of K[x]/f into the ring Mm (K), and there immediately arises the question for a classiﬁcation of all unital embeddings of K[x]/f into the ring Mm (K). In case of K being a ﬁeld, this is wellknown as a consequence of the following theorem. For a short proof see [10, p. 222]. Theorem 5. Let K be a field, and S be a simple Kalgebra the center of which is K. Then every algebra homomorphism from a simple subalgebra R of S into R is the restriction of an inner automorphism of S. For the context at hand, there is a need for a statement clarifying the situation where K is a local ring. Problems of this type have been treated in [12] which contains a proof of the following: Theorem 6. Let K be an Artinian commutative local ring, and let f ∈ K[x] be a monic irreducible polynomial of degree m. Then each pair of unital embeddings of K[x]/f into the matrix ring Mm (K) are conjugate. We ﬁnally give four example of unital embeddings based on the previously outlined companion matrix approach. These embeddings will be used for new code representations in the following section. Example 1. 1. Let K := F2 and R := K[ξ] with ξ 2 + ξ + 1 = 0, i.e. R is the 4element ﬁeld. The companion matrix approach yields the embedding R −→ M2 (K) generated by
01 ξ → . 11 2. Let K := F2 and R := K[ξ] with ξ 3 + ξ + 1 = 0 be the 8element ﬁeld. We will make use of the embedding R −→ M3 (K) generated by −1 001 010 111 010 ξ → 1 1 0 = 0 0 1 1 0 1 0 0 1 . 010 110 100 110
166
M. Greferath and S.E. Schmidt
3. For K := F3 and the 9element ﬁeld R := K[ξ] with ξ 2 + ξ + 2 = 0 we will use the embedding R −→ M2 (K) generated by
ξ →
−1
12 10 01 10 = . 11 12 12 12
4. Let K := ZZ 4 and consider R := K[ξ] with ξ 2 + ξ + 1 = 0, i.e. the Galois ring GR(4, 2). We introduce the embedding R −→ M2 (K) generated by
ξ →
−1
21 31 03 31 = 11 10 13 10
which will play its role in a new representation of the quaternary Octacode.
4
Representations for Linear Codes
In the foregoing sections we dealt with diﬀerent kinds of ring embeddings and clariﬁed the induced correspondences between respective sets of linear codes. The goal of this section is to combine these results: Given a Kalgebra R, a natural number m, and a unital embedding R −→ Mm (K), we study a kind of representation of all codes of length n over R by codes of length nm over K according to the following diagram. [n, k]Codes over Mm (K)
.....................
Sect. 1
............ .........
[nm, km]Codes over K
...... ................ ....... .... . . . .... .... ....
....... ............. ....... .... .... .... .... ...
Sect. 2,3
[n, k]Codes over R
This may be illustrated next: Example 2. Let H(2, 3) be the extended binary Hamming Code of length 8 and dimension 4, and let S denote the ring of (2 × 2)matrices over F2 . Row manipulations on the usual parity check matrix for H(2, 3) produce a matrix which is especially good for a representation of C via the fourelement ﬁeld, namely: 10101010 0 1 0 1 0 1 0 1 H := 0 0 1 0 0 1 1 1. 00011110
Linear Codes and Rings of Matrices
167
Considering the embedding F4 = F2 [ξ] −→ M2 (F2 ) as introduced in Ex. 1(1) it is obvious that the above matrix comes from an F4 check matrix given by
1111 0 1 ξ ξ2
It checks a selfdual [4, 2, 3] code correcting one random error which carries over to the image code over M2 (F2 ), i.e. the latter code corrects up to one full matrix error. This yields more than can be directly seen from the distance properties of the code H(2, 3) that we started with: it proves in an elementary way that H(2, 3)—in addition to its single random error correction—is able to correct certain adjacent double error patterns. As a generalization we obtain: Theorem 7. For all odd r ∈ IN the extended binary Hamming code H(2, r) is induced by an F4 linear code.
Optimal Binary Codes of Length 24 – The Binary Golay Code Pasquier [14] and Wolfmann [16] show, how to construct the binary Golay code from the extended selfdual [8, 4, 5]8 ReedSolomon code over the eight element ﬁeld (cf. also [6, p. 130]). If α is a primitive element of F8 satisfying α3 = α + 1 then the elements of the F2 basis B := {α6 , α3 , α5 } of F8 satisfy the relation tr(xy) = δxy . The image of a coordinatewise binary representation of the ReedSolomon code at hand using B is the [24, 12, 8] GolayCode. Note that the matrix representation of the mapping F8 −→ F8 , x → αx is given by 111 1 1 0, 100
i.e. the matrix in Ex. 1(2). Representing all other F8 ReedSolomon codes in this way, we obtain binary linear codes of length 24 and dimensions 21, 18, 15, 9, 6, 3, respectively. Elementary row and column manipulations prove that the resulting codes have the parameters [24, 21, 2], [24, 18, 4], [24, 15, 4], [24, 9, 8], [24, 6, 8], [24, 3, 8]. According to [15] the ﬁrst 4 of them are optimal. For the [24, 12, 8] Golay code this yields a generator matrix for the Wolfmanndescription [16], which we write down as an illustration.
168
M. Greferath and S.E. Schmidt
100100100100100100100100 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 0 1 0 1 1 0 1 0 1 1 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 1 0 1 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 0 1 1 0 1 1 1 0 1 1 1 000001101110111010100011 The Ternary Golay Code Goldberg [7] gives a construction of the [12, 6, 6]Golay code as a ternary image of a selfdual code over F9 . We give a slight modiﬁcation based on the embedding in Ex. 1(3). The matrix 1 0 0 1 ξ −ξ 0 1 0 ξ 1 ξ , 0 0 1 −ξ ξ 1 is a check matrix for a kind of [6, 3, 4] (hermitian selfdual) hexacode over F9 . Its image under the given embedding produces the ternary (6 × 12)matrix 100000101221 0 1 0 0 0 0 0 1 1 1 2 2 0 0 1 0 0 0 1 2 1 0 1 2 0 0 0 1 0 0 1 1 0 1 1 1 0 0 0 0 1 0 2 1 1 2 1 0 000001221101 which by elementary row and column manipulations can be seen to be a check matrix for the extended ternary [12, 6, 6] Golay Code. The Octacode In Ex. 1(4) we introduced a unital embedding of the Galois ring GR(4, 2) into M2 (ZZ 4 ). It induces the (2 × 4)check matrix
1 0 ξ ξ2 0 1 ξ 2 −ξ for a selfdual code over GR(4, 2) (a kind of tetracode) being mapped to the (4 × 8)matrix 10002113 0 1 0 0 1 1 3 2 0 0 1 0 1 3 2 3. 00013233
Linear Codes and Rings of Matrices
169
Up to equivalence the latter matrix can be seen to be a check matrix for the Octacode.
Acknowledgments The authors like to thank AT&T LabsResearch for their support. They are in particular indebted to T. Y. Lam, E. Rains and N. J. A. Sloane for helpful discussions.
References 1. F. W. Anderson and K. R. Fuller, Rings and categories of modules, Springer Verlag, New York, Berlin, Heidelberg, 1992. 2. C. Bachoc, Applications of coding theory to the construction of modular lattices, J. Combin. Theory, Series A 78 (1997), 92–119. 3. A. Bonnecaze and P. Udaya, Cyclic codes and selfdual codes over F2 + uF2 , Preprint, 1997. 4. N. Bourbaki, Elements of mathematics, ch. Commutative algebra, AddisonWesley, Reading, MA, 1972. 5. A. E. Brouwer, Bounds on the size of linear codes, Handbook of coding theory, Vol. I, II, NorthHolland, Amsterdam, 1998, pp. 295–461. 6. P. J. Cameron and J. H. van Lint, Designs, graphs, codes and their links, Cambridge University Press, Cambridge, 1991. 7. D. Y. Goldberg, Reconstructing the ternary Golay code, J. Combin. Theory Ser. A 42 (1986), no. 2, 296–299. 8. M. Greferath, Cyclic codes over finite rings, Discrete Math. 177 (1997), no. 13, 273–277. 9. A. R. Hammons, P. V. Kumar, A. R. Calderbank, N. J. A. Sloane, and P. Sol´e, The ZZ 4 linearity of Kerdock, Preparata, Goethals and related codes, IEEE Trans. Inform. Theory 40 (1994), 301–319. 10. N. Jacobson, Basic Algebra II, W. H. Freeman and Company, New York, 1989. 11. J. Lambek, Lectures on rings and modules, Blaisdell Publ. Co., Waltham, MassachusettsTorontoLondon, 1966. 12. A. A. Nechaev, Similarity of matrices over commutative Artinian rings, J. of Soviet. Math. 33 (1986), 1221–1237. 13. A. A. Nechaev, Kerdock codes in a cyclic form, Discrete Math. Appl. 1 (1991), 365–384. 14. G. Pasquier, The binary GOLAY code obtained from an extended cyclic code over F8 , European J. Combin. 1 (1980), no. 4, 369–370. 15. V. S. Pless, W. C. Huﬀman, and R. A. Brualdi (eds.), Handbook of coding theory. Vol. I, II, NorthHolland, Amsterdam, 1998. 16. J. Wolfmann, A new construction of the binary Golay code (24, 12, 8) using a group algebra over a finite field, Discrete Math. 31 (1980), no. 3, 337–338.
On ZZ 4 Simplex Codes and Their Gray Images Mahesh C. Bhandari, Manish K. Gupta, and Arbind K. Lal Department of Mathematics, Indian Institute of Technology, Kanpur, 208 016 India {mcb,mankg,arlal}@iitk.ac.in, m.k.gupta@ieee.org
Abstract. In [7] Rains has shown that for any linear code C over ZZ 4 , dH , the minimum Hammimg distance of C and dL , the minimum Lee distance of C satisfy dH ≥ d2L . C is said to be of type α(β) if dH = d2L (dH > d2L ). In this paper we deﬁne Simplex codes of type α and β, namely, Skα and Skβ , respectively over ZZ 4 . Some fundamental properties like 2dimension, Hamming and Lee weight distributions, weight hierarchy etc. are determined for these codes. It is shown that binary images of Skα and Skβ by the Gray map give rise to some interesting binary codes.
1
Introduction
The key motivation for studying codes over ZZ 4 , the ring of integers modulo 4 is that they can be used to obtain desirable types of good binary codes. Such codes have been studied recently in connection with the construction of lattices, sequences with low correlation and in a variety of other contexts [3][5],[10], [11]. Many good nonlinear binary codes of high minimum distances have a simple description as a linear codes over ZZ 4 . Being a linear code decoding becomes simpliﬁed. A linear code C, of length n, over ZZ 4 is a submodule of ZZ n4 . The minimum Hamming distance dH of C is given by dH = min{wH (x − y) : x, y ∈ C, x = y}, where wH (x) is the number of nonzero components in x. It is widely used for error correction/detection capabilities. Another distance which is not that widely used is the Lee distance. Lee weight of an element a ∈ ZZ 4 , denoted wL (a) is the minimum of {a, 4 − a}. Lee weight of a vector x ∈ ZZ n4 is the sum of Lee weights of its components and the minimum Lee distance of C is dL = min{wL (x − y) : x, y ∈ C, x = y}. In [7] Rains has shown that for any linear code C over ZZ 4 , dH ≥
dL . 2
C is said to be a code of type α(β) if dH = d2L (dH > d2L ). Note that the Octacode is of type β while Reed Muller code of ﬁrst order is of type α. The Gray map φ : ZZ n4 → ZZ 2n 2 is the coordinate wise extension of the function from ZZ 4 to ZZ 22 deﬁned by 0 → (0, 0), 1 → (0, 1), 2 → (1, 1) and 3 → (1, 0). Thus φ(C), the image of a linear code C over ZZ 4 of length n by the Gray map is a binary code of length 2n. If φ(C) is linear then C is called ZZ 2 linear. Some well known binary nonlinear codes are images by the Gray map of linear codes over Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 170–179, 1999. c SpringerVerlag Berlin Heidelberg 1999
On ZZ 4 Simplex Codes and Their Gray Images
171
ZZ 4 . Dual code C ⊥ of C is deﬁned in the natural way. In [5] Hammons et al has shown that Kerdock and Preparata codes are dual over ZZ 4 and hence explains the duality relation between these nonlinear binary families. In this correspondence we deﬁne ZZ 4  Simplex codes of type α &β, namely, Skα and Skβ and study the images of such codes under the gray map in two ways. In [9] Satyanarayana has shown that the Lee weight of every nonzero codeword in Skα is 22k . A characterization of such constant Lee weight codes has been obtained by Carlet [2]. We determine the fundamental parameters like 2dimension, Hamming and Lee weight distributions and weight hierarchy for these codes. It is shown that both Skα and Skβ satisfy chain conditions. We also obtain some nonlinear and linear binary codes associated with these codes. Section 2 contains some preliminaries and notations. Deﬁnitions and basic parameters of ZB4 simplex codes of type α and β are given in section 3. Section 4 deals with binary images of Simplex codes of type α and β under the gray map and some fundamental properties.
2
Preliminaries and Notations
A linear code C over ZZ 4 has a generator matrix G ( rows of G generate C) of the form Ik0 A B1 + 2B2 G= (1) 0 2Ik1 2C where A, B1 , B2 and C are matrices with entries 0 or 1 and Ik is the identity matrix of order k. Two codes are said to be equivalent if one can be obtained from the other by rearrangements of columns or by multiplying one or more coordinates by a unit in ZZ 4 . For each a ∈ ZZ 4 let a ¯ be the reduction of a modulo 2 then the code C (1) = {(c¯1 , c¯2 , . . . , c¯n ) : (c1 , c2 , . . . , cn ) ∈ C} is a binary linear code called the residue code of C. Another binary linear code associated with C is the torsion code C (2) which is deﬁned by c C (2) = { : c = (c1 , . . . , cn) ∈ C and ci ≡ 0 (mod 2) for 1 ≤ i ≤ n}. 2 If k1 = 0 then C (1) = C (2) . For details and further references see [3],[8]. A vector v is a 2linear combination of the vectors v1 , v2 , . . . , vk if v = l1 v1 + . . . + lk vk with li ∈ ZZ 2 for 1 ≤ i ≤ k. A subset S = {v1 , v2 , ..., vk } of C is called a 2basis for C if for each i = 1, 2, ..., k − 1, 2vi is a 2−linear combination of vi+1 , ..., vk , 2vk = 0, C is the 2linear span of S and S is 2linearly independent [13]. The number of elements in a 2basis for C is called the 2dimension of C. It is easy to verify that the rows of the matrix Ik0 A B1 + 2B2 (2) 2B1 B = 2Ik0 2A 0 2Ik1 2C form a 2basis for the code C generated by G given in (1). The following lemma will be needed in Section 4.
172
Mahesh C. Bhandari, Manish K. Gupta, and Arbind K. Lal
Lemma 1. The images of the rows of B under the gray map φ are linearly independent over ZZ 2 . Proof. Applying the gray map φ to the rows of B with a suitable rearrangement of rows yields a binary matrix of the type 01 11 .. 0 . 0 0 01 0 0 11 0 0 0 11 . 0 0 0 0 .. 0 0 0 0 0 11
01 with k0 blocks of and k1 blocks of 1 1 . It is easy to see that the rows 11 of the the above matrix are linearly independent. A linear code C over ZZ 4 ( over ZZ 2 ) of length n, 2dimension k, minimum Hamming distance dH and minimum Lee distance dL is called an [n, k, dH , dL ] ([n, k, dH ]) or simply an [n, k] code. A necessary and suﬃcient condition for ZZ 2 linearity is given by the following theorem. Theorem 1. [5] C is ZZ 2 linear if and only if whenever c = (c1 , . . . , cn ), c = (c1 , . . . , cn ) ∈ C, 2¯ c c¯ = (2c¯1 c¯1 , . . . , 2c¯nc¯n ) ∈ C. Thus, if C is an [n, k, dH , dL ] ZZ 2 linear code over ZZ 4 then φ(C) is a binary linear [2n, k, dL ] code. Hence by the Griesmer bound for binary linear codes [6] we have n≥
k−1 1 dL . 2 i=0 2i
(3)
Note that the Octacode O8 meets the bound given by (3) even though it is not ZZ 2 linear. For 1 ≤ r ≤ k, the rth Generalized Hamming weight of C is deﬁned by dr (C) = min{wS (Dr ) : Dr is an [n, r] subcode of C}, where wS (D), called support size of D, is the number of coordinates in which some codeword of D has a nonzero entry. The set {d1 (C), d2 (C), . . . , dk (C)} is called the weight hierarchy of C. In [15] Yang et al has obtained a lemma connecting Lee weights to the support size of a subcode. Thus for any linear code C over ZZ 4 , dr (C) may also be deﬁned by dr (C) =
1 min{ wL (d) : Dr is an r − dimensional subcode of C} r 2 d∈Dr
On ZZ 4 Simplex Codes and Their Gray Images
173
The two deﬁnitions are equivalent. C is said to satisfy the chain condition if there exists a chain D1 ⊆ D2 ⊆ · · · ⊆ Dk , of subcodes of C satisfying wS (Dr ) = dr (C), 1 ≤ r ≤ k. In [14] Yang and Helleseth has shown that the Goethals code over ZZ 4 of length 8 satisﬁes chain condition. A relation between dr (C) and dr (C ⊥ ) is given by the following theorem. Theorem 2. [1] Let C be an [n, k] linear code over ZZ 4 Then {dr (C) : 1 ≤ r ≤ k} = {1, 1, 2, 2, . . . , n, n}\{n + 1 − dr (C ⊥ ) : 1 ≤ r ≤ 2n − k}.
3
ZZ 4 Simplex Codes of Type α and β
Let Gk be a k × 22k matrix over ZZ 4 consisting of distinct columns. Inductively, Gk may be written as 00 · · · 0 11 · · · 1 22 · · · 2 33 · · · 3 Gk = Gk−1 Gk−1 Gk−1 Gk−1 with G1 =[0123]. The code Skα generated by Gk has been visited earlier [9,2]. In [9] Satyanarayana has shown that the Lee weight of every nonzero codeword of Skα is 22k . While Carlet, in [2] has classiﬁed all constant Lee weight codes over ZZ 4 . Clearly, the 2dimension of Skα is 2k. The following observations are useful in determining Hamming (Lee) weight distributions of Skα. α and if i = (i, i, i, ..., i) Remark 1. If Ak−1 denotes an array of codewords in Sk−1 α then an array of all codewords of Sk is given by
Ak−1 Ak−1 Ak−1 Ak−1 Ak−1 1 + Ak−1 2 + Ak−1 3 + Ak−1 Ak−1 2 + Ak−1 Ak−1 2 + Ak−1 Ak−1 3 + Ak−1 2 + Ak−1 1 + Ak−1
Remark 2. If R1 , R2 , ..., Rk denote the rows of the matrix Gk then wH (Ri ) = 3 · 22k−2 , wH (2Ri ) = 22k−1 and wL (Ri ) = 22k = wL (2Ri ). It may be observed that each element of ZZ 4 occurs equally often in every row of Gk . In fact we have the following lemma. Lemma 2. Let c ∈ Skα . If one of the coordinates of c is a unit then every element of ZZ 4 occurs 4k−1 times as a coordinate of c. Otherwise wH (c) = 22k−1 .
174
Mahesh C. Bhandari, Manish K. Gupta, and Arbind K. Lal
α Proof. By Remark 1, any x ∈ Sk−1 gives rise to the following four codewords of α Sk
y1 = x x x x ,
y2 = x 1 + x 2 + x 3 + x ,
y3 = x 2 + x x 2 + x and y4 = x 3 + x 2 + x 1 + x . Hence by induction, the assertion follows. Let G(Sk ) ( columns consist of all nonzero binary ktuples) be a generator matrix for an [n, k] binary simplex code Sk . Then the extended binary simplex 0 code Sˆk is generated by the matrix G(Sˆk ) = ... G(S ) . Inductively, k
0 G(Sˆk ) =
00 . . . 0 11 . . . 1 ˆ ) G(Sk−1 ˆ ) . G(Sk−1
(4)
Lemma 3. The torsion code of Skα is equivalent to the 2k copies of Sˆk . Proof. Observe that the torsion code of Skα is the set of codewords obtained by replacing 2 by 1 in all 2linear combination of the rows of the matrix 00 · · · 0 22 · · · 2 00 · · · 0 22 · · · 2 2Gk = . (5) 2Gk−1 2Gk−1 2Gk−1 2Gk−1 Now the assertion follows by induction on k and by regrouping the columns in (5) according to (4). As a consequence of Lemmas 2 and 3 one obtains Hamming and Lee weight distributions of Skα . Theorem 3. The Hamming and Lee weight distribution of Skα are : 1. AH (0) = 1, AH (22k−1 ) = 2k − 1, AH (3 · 22k−2 ) = 2k (2k − 1), and 2. AL (0) = 1, AL (22k ) = 22k − 1. where AH (i) (AL (i)) denotes the number of vectors of Hamming (Lee) weight i in Skα. Proof. By Lemma 2, each nonzero codeword of Skα has Hamming weight either 3 · 4k−1 or 22k−1 and Lee weight 22k . Since dimension of the torsion code is k, there will be 2k − 1 codewords of the weight 22k−1 . Hence the number of codewords having weight 3 · 4k−1 will be 4k − 2k . Remark 3. 1. Skα is an equidistant code with respect to Lee distance whereas Sk is an equidistant binary code with respect to Hamming distance. 2. Skα is of type α.
On ZZ 4 Simplex Codes and Their Gray Images
175
Let c = (c1 , c2 , . . . , cn) ∈ C and let ωj (c) = {k : ck = j}. Then the correlation √ of c ∈ C is deﬁned as θ(c) = (ω0 (c) − ω2 (c)) + i(ω1 (c) − ω3 (c)); where i = −1 [11]. The Symmetrized weight enumerator (swe) of C over ZZ 4 is given by swe(x, y, z) = c∈C xω0 (c)y ω1 (c)+ω3(c)z ω2 (c) [8]. Let S¯kα be the punctured code of Skα obtained by deleting the zero coordinate. Then the swe of S¯kα is 2k−1 swe(x, y, z) = xn(k) + (2k − 1)(xz)n(k−1) z 1 + 2k y 2 , where n(k) = 4k − 1 and correlation of any c ∈ S¯kα is given by θ(c) = −1.
(6)
The length of Skα is large as compare to its 2dimension and increases fast with increment in its 2dimension. But one can always drop some columns from Gk in a speciﬁc manner to yield good codes over ZZ 4 in the sense of having maximum possible Lee weights for the given length and 2dimension. Let Gβk be the k × 2k−1 (2k − 1) matrix deﬁned inductively by 1111 0 2 Gβ2 = , 0123 1 1 and for k > 2 Gβk =
11 · · · 1 00 · · · 0 22 · · · 2 , Gk−1 Gβk−1 Gβk−1
α . Note that Gβk is obtained from Gk where Gk−1 is the generator matrix of Sk−1 by deleting 2k−1 (2k + 1) columns. By induction it is easy to verify that no two columns of Gβk are multiple of each other. Let Skβ be the code generated by Gβk . Note that Skβ is a [2k−1 (2k − 1), 2k] code. To determine Hamming (Lee) weight distributions of Skβ we ﬁrst make few observations. β α Remark 4. If Ak−1 (Bk−1 ) denotes an array of codewords in Sk−1 (Sk−1 ) and if β i = (i, i, . . . , i) then an array of all codewords of Sk is given by Ak−1 Bk−1 Bk−1 1 + Ak−1 Bk−1 2 + Bk−1 2 + Ak−1 Bk−1 Bk−1 3 + Ak−1 Bk−1 2 + Bk−1
Remark 5. Each row of Gβk has Hamming weight 2k−3 [3(2k − 1) + 1] and Lee weight 2k−1 (2k − 1). Proposition 1. Each row of Gβk contains 22(k−1) units and ω0 = ω2 = 2k−2 (2k−1 − 1).
176
Mahesh C. Bhandari, Manish K. Gupta, and Arbind K. Lal
Proof. Clearly, assertion holds for the ﬁrst row. Assume that the result holds for each row of Gβk−1 . Then the number of units in each row of Gβk−1 is 22(k−2). By Lemma 2, the number of units in any row of Gk−1 is 22k−3 . Hence the total number of units in any row of Gβk will be 22k−3 + 2 · 22(k−2) = 22(k−1). A similar argument holds for the number of 0’s and 2’s. Infact, similar to Skα , we have the following lemma. Lemma 4. Let c ∈ Skβ . If one of the coordinates of c is a unit then ω1 + ω3 = 22(k−1) and ω0 = ω2 = 2k−2 (2k−1 − 1). Otherwise ω0 = 2k−1 (2k−1 − 1), ω2 = 22(k−1) and ω1 = ω3 = 0. β α and y2 ∈ Sk−1 such Proof. Let x ∈ Skβ then, by Remark 4, there exist y1 ∈ Sk−1 that x can have any of the following four forms: x = (y1 y2 y2 ), x = (1 + y1 y2 2 + y2 ) Now the assertion follows by inx = (2 + y1 y2 y2 ) or x = (3 + y1 y2 2 + y2 ) duction and Lemma 2,
The proof of the following lemma, being similar to the proof of Lemma 3, is omitted. Lemma 5. The torsion code of Skβ is equivalent to the 2k−1 copies of the binary simplex code Sk . The proof of the following theorem, being similar to the proof of Theorem 3 is omitted. Theorem 4. The Hamming and Lee weight distributions of Skβ are : 1. AH (0) = 1, AH (22k−2 ) = 2k − 1, AH (2k−3 [3(2k − 1) + 1]) = 2k (2k − 1) ,and 2. AL (0) = 1, AL (22k−1 ) = 2k − 1, AL (2k−1 (2k − 1)) = 2k (2k − 1). Remark 6. 1. Skβ is of type β. 2. The correlation of each nonzero codeword of Skβ with components 0’s or 2’s is −2k−1 . 3. The swe of Skβ is given as swe(x, y, z) = xn(k) + (2k − 1)x2n(k−1) z 4
k−1
+ 2n(k)xn(k−1) y 4
k−1
z n(k−1)
where n(k) = 2k−1 (2k − 1).
4
Gray Image Families
Let C be an [n, k, dH , dL ] linear code over ZZ 4 . Then φ(C) is a binary code having 2k codewords of length 2n, and minimum Hamming distance dL . However φ(C) need not be linear. Let B be the matrix ( given in (2)) whose rows form a 2basis for C and let φ(B) be the matrix obtained from B by applying the gray map to each entry of B. The code Cφ generated by φ(B) is a [2n, k, ≥ d2L ] binary linear code. Note that φ(C) and Cφ have same number of codewords but they are not equal in general. The following proposition shows that both φ(S¯kα) and φ(Skβ ) are not linear.
On ZZ 4 Simplex Codes and Their Gray Images
177
Proposition 2. φ(S¯kα ) and φ(Skβ ) are nonlinear for all k. Proof. Let R1 , R2 , . . . Rk be the rows of the generator matrix Gk (Gβk ). Let c = Rk (R1 ) and let c = Rk−1 (Rk ) then by (6) ( Remark 6.2 ), 2¯ c c¯ ∈ / S¯kα (Skβ ). Hence, by Theorem 1, the result follows. Remark 7. 1. φ(S¯kα ) is a binary nonlinear code of length 22k+1 −2 and minimum Hamming distance 22k . It meets the Plotkin bound [6] and n < 2dH . 2. φ(Skβ ) is a binary nonlinear code of length 2k (2k −1) and minimum Hamming distance 2k−1 (2k − 1). This is an example of a code having n = 2dH [6]. 3. Even though, both S¯kα and Skβ are not ZZ 2 −linear they meet the bound given by (3). Skβ .
The next two results are about the binary linear codes obtained from S¯kα and
Theorem 5. Let C = S¯kα. Then Cφ is an [22k+1 − 2, 2k, 22k ] binary linear code consisting of two copies of binary simplex code S2k with Hamming weight distribution same as the Lee weight distribution of S¯kα . Proof. By Lemma 1, Cφ is a binary linear code of length 22k+1 − 2 and dimension 2k. Let Gk be a generator matrix of Skα in 2basis form. Then 0...0 1...1 2...2 3...3 0...0 2...2 0...0 2...2 Gk = Gk−1 Gk−1 Gk−1 Gk−1 2Gk−1 2Gk−1 2Gk−1 2Gk−1 and
0000 . . . 00 0101 . . . 01 1111 . . . 11 1010 . . . 10 φ(Gk ) = 0000 . . . 00 1111 . . . 11 0000 . . . 00 1111 . . . 11 . φ(Gk−1 ) φ(Gk−1 ) φ(Gk−1 ) φ(Gk−1 )
The proof is by induction on k. Assume that φ(Gk−1 ) yields a [22k−1 , 2(k − 1), 22k−2 ] binary code in which every nonzero codeword is of weight 22k−2 . Then the possible nonzero weight from the lower portion of the above matrix will be 4 · 22k−2 = 22k . From the structure of ﬁrst two rows of φ(Gk ), it is easy to verify that any linear combination of these rows with other rows has weight 22k . Puncturing the ﬁrst two columns and rearranging the columns yields the code having two copies of S2k . Theorem 6. Let C = Skβ . Then Cφ is the binary MacDonald code M2k,k : [22k − 2k , 2k, 22k−1 − 2k−1 ] with Hamming weight distribution same as the Lee weight distribution of Skβ . Proof. It follows by induction and is similar to the proof of Theorem 5.
178
Mahesh C. Bhandari, Manish K. Gupta, and Arbind K. Lal
The weight hierarchy of Skα and Skβ are given by the following two theorems. Theorem 7. Skα satisfies the chain condition and its weight hierarchy is given by r dr (Skα ) = 22k−i = 22k − 22k−r ; 1 ≤ r ≤ 2k. i=1
Proof. By Remark 3, Any rdimensional subcode of Skα is of constant Lee weight. Hence by deﬁnition, dr (Skα ) =
1 r (2 − 1)22k = 22k − 22k−r . 2r
Let D1 =< 2R1 >, D2 =< 2R1 , 2R2 >, D3 =< R1 , 2R1 , 2R2 >, D4 =< R1 , 2R1 , R2 , 2R2 >, . . . , D2k =< R1 , 2R1 , . . . , Rk , 2Rk > . It is easy to verify that D1 ⊆ D2 ⊆ · · · ⊆ D2k , and wS (Dr ) = dr (Skα ) for 1 ≤ r ≤ 2k. Theorem 8. Skβ satisfies the chain condition and its weight hierarchy is given by r dr (Skβ ) = n(k) − 2k−r−1 (2k − 2 2 ) 1 ≤ r ≤ 2k. where n(k) = 2k−1 (2k − 1). Proof. The proof follows by induction on k. Clearly the result holds for k = 2. β Assume that the result holds for Sk−1 . Hence if 1 ≤ r ≤ 2k − 2 then there β exists an rdimensional subcode of Sk−1 with minimum support size n(k − 1) − k−r−2 k−1 r2 (2 − 2 ). By Remark 4, 2 β α dr (Skβ ) = 2dr (Sk−1 ) + dr (Sk−1 )
(7)
α But all rdimensional subcodes of Sk−1 have constant support size (22k−2 − 2k−2−r ). Thus simplifying (7) yields the result. For r = 2k − 1 and 2k the result 2 can be easily proved. Let D1 =< 2R1 >, D2 =< R1 , 2R1 >, D4 =< R1 , 2R1 , R2 , 2R2 >, ... , D3 =< R1 , 2R1 , 2R2 >, D2k =< R1 , 2R1 , . . . , Rk , 2Rk > . Then it is easy to see that
D1 ⊆ D2 ⊆ · · · ⊆ D2k is the required chain of subcodes. The dual code of Skα is a code of length 22k and 2dimension 22k+1 − 2k, whereas the dual code of Skβ is a code of length 2k−1 (2k − 1) and 2dimension 22k −2k −2k. The Hamming and Lee weight distributions of these dual codes can be obtained with the help of Theorems 3 and 4 and the MacWilliams Identies
On ZZ 4 Simplex Codes and Their Gray Images
179
[8]. Similarly, the weight hierarchies of duals can be obtained from Theorems 2, 7 & 8. In [12] Sun and Leib have considered the dual of Skβ (k ≥ 3) in a diﬀerent context. They have used combinatorial arguments to obtain a code of length n = 2r−1(2r − 1), redundancy r and minimum squared noncoherent weight N + 1 − (N − 2)2 + 9, where N = n − 1. They have further punctured these codes to get some good codes in the sense of having larger coding gains over noncoherent detection. The results proved in sections 3 and 4 are easily extendible to simplex codes over ZZ 2s by suitably modifying the deﬁnition of the Gray map.
References 1. A.Ashikhmin, “On Generalized Hamming Weights for Galois Ring Linear Codes”, Designs, Codes and Cryptography, 14, 107126 (1998) 2. C. Carlet, “One weight ZZ 4 Linear Codes ” Int. Conf. on Coding, Crypto and related Topics, Mexico (1998), To appear in Springer Lecture Notes in Computer Science 3. J.H.Conway and N.J.A. Sloane, “SelfDual Codes over the Integers Modulo 4”, Jr. of Comb. Theory, Series A 62, 3045 (1993) 4. M. Greferath and U. Vellbinger, “On the Extended ErrorCorrecting Capabilities of the Quaternary Preparata Codes”, IEEE Trans. Inform. Theory, vol 44, No. 5, 20182019 (1998) 5. A.Hammons, P.V.Kumar, A.R.Calderbank, N.J.A.Sloane and P.Sole, “The ZZ 4 Linearity of Kerdock, Preparta, Goethals, and Related Codes”, IEEE Trans. Inform. Theory, vol 40, No. 2, 301319 (1994) 6. F.J.MacWilliams and N.J.A.Sloane, The Theory of Errorcorrecting codes, NorthHolland, New York (1977). 7. E.M.Rains, “Optimal Self Dual Codes over ZZ 4 ”, Discrete Math., vol 203, pp. 215228, (1999) 8. E.M.Rains and N.J.A.Sloane, Self Dual Codes: The Handbook of Coding Theory, Eds V. Pless et al, NorthHolland (1998) 9. C. Satyanarayana, “ Lee Metric Codes over Integer Residue Rings ”. IEEE Trans. Inform. Theory, vol 25, No. 2, pp. 250254 (1979) 10. A.G.Shanbhag, P.V.Kumar and T.Helleseth, “Improved Binary Codes and Sequence Families from ZZ 4  Linear Codes”, IEEE Trans. Inform. Theory, vol 42, No. 5, 15821587 (1994) 11. P.Sole, “A Quaternary Cyclic Code, and a Family of Quadriphase Sequences with Low Correlation Properties”, Lecture Notes in Computer Science, vol 388, SpringerVerlag, NY, 193201 (1989) 12. F.W.Sun and H.Leib, “MultiplePhase Codes for Detection Without Carrier Phase Reference”, IEEE Trans. Inform. Theory, vol 44, No. 4, 14771491 (1998) 13. V.V.Vazirani, H.Saran and B.Sundar Rajan, “An Eﬃcient Algorithm for Constructing Minimal Trellises for Codes over Finite Abelian Groups”, IEEE Trans. Inform. Theory, vol 42, No. 6, 18391854 (1996) 14. K.Yang and Tor Helleseth, “On the Weight Hierarchy of Goethals Codes over ZZ 4 ”, IEEE Trans. Inform. Theory, vol 44, No. 1, 304307 (1998) 15. K.Yang, T.Helleseth, P.V.Kumar and A.G.Shangbhag, “On the Weight Hierarchy of Kerdock Codes over ZZ 4 ”, IEEE Trans. Inform. Theory, vol 42, No. 5, 15871593 (1994)
Some Results on Generalized Concatenation of Block Codes M. Bossert1 , H. Grießer1 , J. Maucher1, and V.V. Zyablov2 1
2
University of Ulm, Dept. of Information Technology AlbertEinsteinAllee 43, 89081 Ulm, Germany Helmut.Griesser@etechnik.uniulm.de Institute for Information Transmission Problems, Russian Academy of Science B. Karetnyi per. 19, 101447, Moscow GSP4, Russia
Abstract. We consider generalized concatenation of block codes. First we give a short introduction on the notion for concatenated and errorlocating codes. Then an estimation of the hard decision error correcting capacity of concatenated codes beyond half the minimum distance is presented.
1
Introduction
Code concatenation is a useful method for constructing long codes from shorter ones. It is possible to decode these concatenated codes via their component codes, leading to a reduced decoding complexity. Moreover, many errors with weight exceeding half the minimum distance of the code can be corrected. The concatenation of block codes (CC codes) was introduced by Forney [3]. Blokh and Zyablov enhanced this deﬁnition resulting in so called generalized concatenation (GCC) that includes Forney’s approach as a special case. Later Zinov’ev [7] modiﬁed the deﬁnition of generalized concatenation in order to include nonlinear component codes. The principle of errorlocating codes (EL codes) was ﬁrst described by Wolf [6,5]. In [9,1] Zyablov investigated this class of codes and generalized the construction in a similar way as it was done for concatenated schemes. This led to so called generalized errorlocating codes (GEL codes). GEL codes are a subclass of the GCC code class given by the deﬁnition in [7]. A formal proof of this can be found in [10,11]. For a detailed introduction on generalized concatenation see [2]. Usually the error correcting and detecting capabilities of a block code are solely characterized by the minimum Hamming distance. This is justiﬁed by using an (algebraic) decoding algorithm that can decode any error pattern with a weight up to half the minimum distance and will usually fail if an error of higher weight occurs. However, if we use a decoding algorithm that allows the correction of error patterns with weight beyond half the minimum distance this
This work was supported by Deutsche Forschungsgemeinschaft (DFG), Germany.
Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 181–190, 1999. c SpringerVerlag Berlin Heidelberg 1999
182
M. Bossert et al.
should be used for an estimation of the decoding performance. We present a modiﬁed approach for the classiﬁcation of the correcting capability of concatenated schemes that leads to tighter bounds on the bit error and block error rate of the generalized concatenated code.
2
Code Definitions
A code C of length n, dimension k and minimum Hamming distance d with symbols from GF(q) will be denoted by C(n, k, d)q . For the binary case the symbol q may be omitted. Let GF(q m ) be an extension ﬁeld of order q m , q is a prime power. An element a ∈ GF(q m ) in exponential representation will be denoted by a if it is given as a vector consisting of m coeﬃcients of its polynomial representation. Consequently, a vector a ∈ GF(q m )n can be written as an m × n matrix a with entries from GF(q). 2.1
Code Partitioning
Let B be a linear code with codewords b ∈ B. A µway partition of B is con structed from µ disjoint subcodes Ba , a = 1, . . . , µ, for a Ba = B. We will denote such a partition by B/B . µ is called the order of the partition and a is a label identifying a unique subset Ba . If a set is partitioned L times by repeated partitioning of the subsets we get an Llevel partition chain B (1)/B (2)/ · · · /B (L). Subsets in such a partition chain are labeled by a multipart label, e.g. the label (a(1) , a(2) , . . . , a(L) ) denotes a subset in the last partition level of an Llevel partitioning. The subcode that contains the all zero vector (linear subcode) will be labeled with a ‘0’. d(B) denotes the minimum Hamming distance of a code. If a subcode B (l) consists of only one single codeword we deﬁne d(B (l)) = ∞. (l+1) A linear subcode B0 of the code B (l) induces a partition of B (l) into cosets. A coset is given by adding any element of this coset, say b, to the linear (l+1) (l+1) (l+1) (l+1) subcode B0 : Ba(l) = B0 + b, b ∈ B0 . Therefore b is called coset representative of the subcode with label a(l). The union of all cosets adds to (l+1) B (l): B (l) = b∈[B(l) /B(l+1) ] B0 + b, where B (l)/B (l+1) is a set of all coset representatives. There are sets of coset representatives that form a linear code and can be described by a generator matrix GB(l) /B(l+1) . GB(l) /B(l+1) describes the mapping of the label a(l) to a coset representative of level l. Hence it is (L) possible to calculate a code word of the subcode b ∈ Ba(1) ,... ,a(L) : GB(L) /B(L+1) a(L) .. · ... , where GB(1) /B(L+1) = . .
b = GTB(1) /B(L+1)
a(1)
GB(1) /B(2)
GB(1) /B(L+1) is the combination of all generator matrices of coset representatives.
Some Results on Generalized Concatenation of Block Codes
183
As the mapping done by the generator matrices is unique, there exists an inverse mapping which produces the label of a partition at level l if a coset element is given. The matrix describing this mapping is a (partial) parity check (l+1) matrix of the linear subcode B0 and will be denoted by H B(l) /B(l+1) . The combination of all the matrices H B(l) /B(l+1) , l = 1, . . . , L, deﬁnes the linear mapping from a given coset to the label: (L) H B(L) /B(L+1) a .. H B(1) /B(L+1) · b = ... , where H B(1) /B(L+1) = . . H B(1) /B(2)
a(1)
A code B(nb , nb , 1)q consisting of all possible nb tuples will be denoted by GF(q)nb and the single codeword code B(nb , 0, ∞) by {0}. Note that GB(l) /{0} = GB(l) and H GF(q)nb /B(l) = H B(l) . 2.2
Generalized Concatenated Codes
The deﬁnition of GCC codes is based on the description given by Zinov’ev in [7] and is not restricted to linear component codes. Definition 1 (GCC Code). An Llevel generalized concatenated code (GCC (l) (l) code) C consists of L outer codes A(l)(na , ka , da ) ma(l) and an inner code q
(1)
(1)
B (1)(nb , kb , db )q that is partitioned into a Llevel partition chain (1) (1) (L+1) (L+1) B (1)(nb , kb , db )q / · · · /B (L+1)(nb , kb = 0, db = ∞)q = b. The sub(l) codes of level L + 1 consist of a single codeword only. The symbol ai of the (l) lth outer code determines a subset of B (l)a(1) ,a(2) ,... ,a(l−1) . Thus the symbols ai , i
i
i
(1)
(L)
l = 1, . . . , L, of all outer codes together form the label (ai , . . . , ai ) of a unique codeword bi . The codeword of the GCC code consists of all the codewords bi , i = 1, . . . , na . The construction principle of GCC codes is illustrated in ﬁgure 1. The GCC
L (l) (l) code has length nc = na · nb , dimension kc = l=1 ka · ma , and a minimum (l) (l) Hamming distance dc ≥ minl=1,... ,L {da · db } (see [1,4]). We will assume that (l) (l) (l+1) (L+1) m a = kb − kb , kb = 0, for all l = 1, . . . , L. If L = 1 the GCC code reduces to an ordinary concatenated code. (l) Encoding can be done by ﬁrst encoding ka information symbols into a code(l) word a for l = 1, . . . , L and then mapping the label (a(1) , . . . , a(L) ) on to the codeword b. This encoding scheme, however, will not be systematic. Example 1 (GCC Code). We construct a twolevel GCC code using a partition chain of the inner binary Hamming code B (1)(7, 4, 3)/B (2)(7, 3, 4)/B (3)(7, 0, ∞) and two outer codes A(1)(7, 1, 7)2 (a repetition code) and A(2)(7, 4, 4)23 . The parameters of the generalized concatenated code construction are nc = 49, kc = (1) (1) (2) (2) ka · ma + ka · ma = 1 + 12 = 13 and dc ≥ min{7 · 3, 4 · 4} = 16.
184
M. Bossert et al. na
···
a(L)
(L)
ma
..
L outer codes
···
a(1)
(1)
ma
columnwise mapping deﬁned by a Llevel partition chain of code B (1)
···
C
nb
Fig. 1. Construction principle of generalized concatenated codes.
2.3
Generalized ErrorLocating Codes
Errorlocating codes can be deﬁned by the parity check matrices of the component codes [5]. Definition 2 (GEL Code). An Llevel generalized errorlocating code (GEL (l) (l) code) consists of L outer codes A(l)(na , ka , da ) m(l) , l = 0, . . . , L − 1, and L a q
(l)
(l)
inner codes B (l)(nb , kb , db )q , l = 1, . . . , L. Let
H B(l−1) /B(l) .. . H B(1) be a parity check matrix of the inner code B (l) and a(l) a codeword of the lth outer code A(l). Each codeword C of a GEL code in matrix form fulfills a(l) = H B(l) /B(l+1) · C, a(l) ∈ A(l)
(1)
for all l = 0, . . . , L − 1. The deﬁning equation (eq. 1) of a GEL code is equivalent to the calculation of the syndrome. Therefore the codevector a(l) is also known as syndrome. Notice that the columns of the codeword matrix C are in general not codewords of B (1) or one of its subcodes. (l) (l) The redundancies of outer and inner codes are given by ra = na − ka and (l) (l) (l) (l) (l+1) (0) rb = nb − kb respectively. We require that ma = kb − kb , kb = n b , for all l = 0, . . . , L − 1. The GEL code has length nc = na nb , redundancy
L−1 (l) (l)
L−1 (l) (l) (L) rc = l=0 ra ma and dimension kc = na kb + l=0 ka ma = nc − rc . For L = 1 the GEL code reduces to an ordinary EL coding scheme.
Some Results on Generalized Concatenation of Block Codes
185
na X (L−1) information X (0) symbols
(L−1)
kb
111111111 000000000 P 111111111 000000000 .. 000000000000 111111111111 00000000000 11111111111 000000000000 111111111111 00000000000 11111111111 111111111111111 000000000000000 P 111111111111111 000000000000000
C
nb
(L−1)
(0)
a(L−1) S = ... = H B(L) · C a(0)
(L)
ka
a(L−1)
u(L−1) (1)
ka
a(0)
u(0)
r(L−1)
(L−1)
ma
.. r(0)
(0)
ma
Fig. 2. Systematic encoding of generalized errorlocating codes. We consider systematic encoding. The encoders for the outer codes should be systematic. Further we assume that the matrices H B(l−1) /B(l) are given by (L−1) Q I 0 0 . . . 0 (L−1) b ma H B(L−1) /B(L) H B(L−2) /B(L−1) Q(L−2) I m(L−2) 0 . . . 0 b a .. .. , .. .. .. H B(L) = = . . . . . H B(1) /B(2) Q(1) I m(1) 0 b a (0) H B(1) I m(0) Qb a
(l)
(l)
where I m(l) is a ma × ma identity matrix. Note that any H B(L) describing the a partition chain and its labeling of the inner code, can be transformed into such a representation without destroying the nested partition. This is possible if row operations within H B(l−1) /B(l) and column permutations of H B(L) are allowed. Of course, permuting the columns of H B(L) will lead to an equivalent code, but without changing the properties of the concatenated code. The matrix H B(L) cannot be made systematic while preserving the partition structure. Therefore, there is still a nonsystematic mapping that combines outer and inner codes. Figure 2 illustrates the encoding process. After ﬁlling the white part of the codeword matrix C with information symbols the information part u(l) of every outer codeword can be calculated using equation 1. Note that this can be done without knowing any of the redundancy matrices in C denoted by P (l) . Then the redundancy part r (l) of the outer codewords, l = 0, . . . , L − 1, results from ordinary systematic encoding. In the last step the matrices P (l) will be calculated. If we denote the upper right submatrix by X (l) (see ﬁgure 2) then the following equation holds for all l = 0, . . . , L − 1
X (l) (l) (l) (l) (l) r = Qb I m(l) = Qb X (l) + P (l) ⇒ P (l) = r (l) − Qb X (l). (l) a P
186
M. Bossert et al.
As P (l+1) is part of X (l) that is needed for the calculation of P (l), the recursive calculation of the matrices P (l) starts with l = L − 1. Example 2 (GEL Code). A twolevel GEL code is constructed using the inner codes B (1)(7, 6, 2) and B (2)(7, 3, 4), and the outer codes A(0)(7, 1, 7)2 and A(1)(7, 4, 4)23 . A sample parity check matrix H B(2) is 1011000 1 1 1 0 1 0 0 H B(1) /B(2) H B(2) = = 1 1 0 0 0 1 0 . H B(1) 1111111 (2)
The resulting concatenated code has length nc = 49 and dimension kc = na kb + (0) (0) (1) (1) ka ma + ka ma = 34. Description of the Decoding Principle: A codeword C of the generalized concatenated code, given in matrix form, will be corrupted by some error pattern ˜ = C + E. DecodE ∈ GF(q)nb ×na after transmitting it over a noisy channel: C ing of generalized errorlocating codes can be done by means of a multistage decoding algorithm. Decoding starts with calculating the ﬁrst syndrome part ˜ Then a ˜ (0) = H B(0) /B(1) C. ˆ (0) providing the labels of the ˜ (0) is decoded into a a cosets needed to do decoding of ˜bj , j = 1, . . . , na , in some (usually) nonlinear subspace of B (0). In the second stage of the decoding ﬁrst an additional part of ˜ (1) is performed and so the syndrome (˜ a(1)) is calculated. Then decoding of a on. The information of corrupted columns of the code matrix resulting from the ˜ (1). This previous decoding step can be used to erase some of the symbols of a decoding principle gave rise to the term errorlocating codes. If decoding of the outer codes fails for any step then decoding of the concatenated code will fail, too. 2.4
The Connection between GCC and GEL Codes
A formal proof of this result can be found in [10,11]. Both code concatenation descriptions are based on the same partition principle of the inner code. The main diﬀerence is the labeling of the partition. For the case of GEL codes, the parity check matrix H B(L) is speciﬁed, whereas for GCC codes the labeling is not deﬁned. If labeling of the subcodes is done by a linear mapping and consequently can be described via a matrix GB(1) then both descriptions are identical: GTB(1) · S = C
⇐⇒ S = H B(L) · C,
provided that GTB(1) = H −1 . Since H B(L) has full rank, it is always possible to B(L) ﬁnd such a GB(1) . Therefore both constructions can be used to describe the same code: A linear Llevel GCC code can be described as a special case of an (L + 1)level GEL code and an Llevel GEL code is a special case of an (L + 1)level GCC code.
Some Results on Generalized Concatenation of Block Codes
187
Llevel GEL −→ (L+1)level GCC: An Llevel GEL code with inner component (l) (l) codes B (l)(nb , nb , db )q , l = 1 . . . , L, and outer component codes (l) (l) A(l)(na , na , da ) m(l) , l = 0 . . . , L − 1, is a special case of an (L + 1)level GCC a q
code with the additional codes B (0)(nb , nb , 1)q and A(L) (na , na , 1)
(L)
q kb
. Note that
indexing of the component codes has only been shifted by 1 compared to the original deﬁnition of GCC codes. This description reveals the minimum distance (l) (l) of the code: dc ≥ minl=0,... ,L {da db }. Example 3. The twolevel GEL code from example 2 shall be interpreted as a threelevel GCC code. For the GCC description we need two more component codes: B (0)(7, 7, 1) and A(2)(7, 7, 1)23 . The minimum distance of the code is dc ≥ (0) (1) (1) (2) min{da · 1, da · db , 1 · db } = 4. Llevel GCC −→ (L + 1)level GEL: An Llevel GCC code with inner and outer (l) (l) (l) (l) component codes B (l)(nb , nb , db )q and A(l)(na , na , da ) ma(l) , l = 1 . . . , L, is a q
special case of an (L + 1)level GEL code with the additional codes B (L+1)(nb , 0, ∞)q and A(0)(na , 0, ∞) r(1) . q
b
Example 4. An interpretation of the twolevel GCC code from example 1 with B (1)(7, 4, 3), B (2)(7, 3, 4), A(1)(7, 1, 7)2 , and A(2)(7, 4, 4)23 as a threelevel GEL code requires the additional component codes B (3)(7, 0, ∞) and A(0)(7, 0, ∞)23 . Since an Llevel GEL code is only a special case of an (L + 1)level GCC code, decoding algorithms for GCC codes can be used for GEL codes, too. Thus an algorithm of Blokh and Zyablov (see e.g. [8]) can be used to decode all error patterns whose weight does not exceed dc2−1 . However, this algorithm can correct many error patterns of higher weight, too. In the following section we will present a method for estimating the bit error rate that will take into account these additional correcting capabilities.
3
On the Hard Decision Correcting Capability of Generalized Concatenated Codes
If an algorithm is used for decoding that can correct all error patterns with a weight up to half the minimum distance of the code and in addition many patterns of higher weight, then the minimum distance is not a suﬃcient description of the correcting capability of the code. This is the case for the decoding of generalized concatenated codes with the algorithm of Blokh and Zyablov [8]. Therefore we present bounds for the bit error and the block error rate that take into account this additional error correcting capacity. Transmission over a binary symmetric channel (BSC) is assumed. The bounds depend on the decoding algorithm on which the estimation of the number of errors after decoding is based.
188
M. Bossert et al.
Estimation of Bit Error Rate: Let s = d−1 2 be the number of errors a bounded minimum distance decoder can correct. We will state an upper and a lower bound for the bit error rate of the code bits of a generalized concatenated code that can correct at least a part of all error patterns up to the weight t > s. For a good estimate, t should be chosen such that it includes most of the additional correctable error patterns. However, since t is also the maximum number of errors that can be added to a code vector in case of wrong correction, t should not be too large. Moreover the eﬀort for the calculation of the bounds increases very fast with increasing t. One has to ﬁnd a good compromise. The codewords C in matrix form consist of na columns of length nb , resulting in a total code length of nc = na · nb . Let e = (e1 , e2 , . . . , enb ) ∈ E be the distribution of the column weights of an error pattern, i.e. ei is the number of columns i errors and E is the set of all possible error distributions.
ncontaining b e = i=1 i· ei denotes the total number of errors. Now it is possible to split the bit error rate in a part that is due to all the errors of weight ≤ t, and a second part that is due to the remaining errors with weight > t: Pˆbit (t) = pˆ(e ≤ t) + pˆ(e > t).
(2)
We assume that the decoding algorithm will not add more errors than it is able to correct. Therefore we can upper bound the second term (p is the crossover probability of a BSC):
pˆ(e > t) ≤
nc 1 nc i 1 ˆ N (e)P (e) ≤ (i + t) p (1 − p)nc −i . nc nc i=t+1 i
(3)
e∈E e>t
ˆ (e) ≤ (i + t) is the maximum number of bit errors that can result, given a N weight distribution of bit errors e. P (e) = nic pi (1 − p)nc −i is the probability of this weight distribution. The ﬁrst term of equation 2 is estimated by a sum of all diﬀerent weight distributions of error patterns e ≤ t: pˆ(e ≤ t) ≤
1 ˆ 1 ˆ N (e)P (e) = N (e)U(e)pe (1 − p)nc −e , nc nc e∈E e≤t
e∈E e≤t
where the number U(e) of error patterns of a weight distribution results from the number of possibilities for the selection of erroneous columns times the number of possibilities to place the errors within these columns:
U(e) =
i−1 nb t ei na − j=1 ej nb . · ei i
i=1
i=1
Some Results on Generalized Concatenation of Block Codes
189
0
10
−5
Pbit
10
−10
10
−15
10
upper and lower bound according to eq. 2 and 4 upper bound of a two−error correcting code upper bound of a three−error correcting code
−20
10
−7
10
−6
10
−5
10
−4
−3
10
10
−2
10
−1
10
p
Fig. 3. A comparison of some bounds on the bit error rate of the concatenated code given in example 5.
ˆ has to be done individually for all e ≤ t. The correspondThe calculation of N ˇ ing lower bound Pbit (t) is given by Pˇbit (t) = p(e ≤ t) + p(e > t), nc 1 ˇ 1 nc i pˇ(e > t) ≥ (i − t) N (e)P (e) ≥ p (1 − p)nc −i , nc nc i=t+1 i
(4)
e∈E e>t
pˇ(e ≤ t) ≥
1 ˇ N (e)U(e)pe (1 − p)nc −e , nc e∈E e≤t
ˇ (e) as the minimum number of bit errors that will result from decoding. with N For p → 0 both bounds will coincide. Example 5 (Bounds on the Bit Error Rate). We consider a concatenated code of length nc = 1024 and dimension k = 988 based on a partition chain of the inner code B (0)(32, 32, 1)/B (1)(32, 31, 2)/B (2)(32, 26, 4)/B (3)(32, 21, 6) and the outer codes A(0)(32, 16, 8), A(1)(32, 29, 4)25 and A(2)(32, 31, 2)25 . As the minimum Hamming distance of the code is 6, each error of weight 1 or 2 can be corrected using the multistage decoding algorithm by Blokh and Zyablov. In addition almost all of the error patterns of weight 3—except the pattern given by e = (0, 0, 1, 0, . . . , 0)—can be corrected. This error pattern, however, seldom occurs as compared to all the correctable errors and can be detected. Thus the decoding result is similar to a threeerror correcting code. Figure 3 compares the bounds of eq. 2 and 4 for t = 3 with upper bounds on two and threeerror correcting codes (given by eq. 3, t = 2, 3).
190
M. Bossert et al.
Estimation of Block Error Rate: The block error rate is upper bounded by the error rate of an ordinary t error correcting code and the error rate that is caused by error patterns of weight ≤ t (we ignore error patterns of weight > t that can be corrected): BMD Pblock (t) ≤ Pblock (t) + Pblock (e ≤ t), n c nc i BMD p (1 − p)nc −i (t) = Pblock i i=t+1 ˜ (e)U(e)pe (1 − p)nc −e . Pblock (e ≤ t) = N e∈E e≤t
˜ (e) indicates whether a given weight distribution of an error pattern oversteps N the correcting capacity of the code: ˆ (e) > 0 1 if N ˜ N= ˆ 0 if N (e) = 0.
References 1. E. L. Blokh and V. V. Zyablov. Coding of generalized concatenated codes. Problemy Peredachi Informatsii, 10(3):45–50, July–Sept. 1974. 2. M. Bossert. Channel Coding for Telecommunications. Wiley, 1999. 3. G. D. Forney, Jr. Concatenated Codes. MIT, Cambridge, MA, 1966. 4. F. J. MacWilliams and N. J. A. Sloane. The Theory of ErrorCorrecting Codes. North Holland, Amsterdam, 1996. 5. J. K. Wolf. On codes derivable from the tensor product of check matrices. IEEE Trans. Inf. Theory, IT11:281–284, 1965. 6. J. K. Wolf and B. Elspas. Errorlocating codes—a new concept in error control. IEEE Trans. Inf. Theory, IT9:113–117, 1963. 7. V. A. Zinov’ev. Generalized cascade codes. Problemy Peredachi Informatsii, 12(1):5–15, 1976. 8. V. A. Zinov’ev and V. V. Zyablov. Decoding of nonlinear generalized cascade codes. Problemy Peredachi Informatsii, 14(2):46–52, 1978. 9. V. V. Zyablov. New interpretation of localization error codes, their error correcting capability and algorithms of decoding. In Transmission of Discrete Information over Channels with Clustered errors, pages 8–17. Nauka, Moscow, 1972. (in Russian). 10. V. V. Zyablov, J. Maucher, and M. Bossert. On the equivalence of GCC and GEL codes. In Proc. 6th Int. Workshop on Algebraic and Combinatorial Coding Theory, pages 255–259, Pskov, 1998. 11. V. V. Zyablov, J. Maucher, and M. Bossert. On the equivalence of generalized concatenated codes and generalized error location codes. To appear in IEEE Trans. Inf. Theory, 1999.
Near Optimal Decoding for TCM Using the BIVA and Trellis Shaping Qi Wang1 , Lei Wei1 , and Rodney A. Kennedy2 1
Department of Engineering, The Australian National University Qi.Wang@faceng.anu.edu.au, Lei.Wei@anu.edu.au 2 Research School of Information Sciences and Engineering The Australian National University rodney.kennedy@anu.edu.au
Abstract. In this paper a bootstrap iterative decoding technique concatenated with the Viterbi algorithm (BIVA) and trellis shaping for trelliscoded modulation (TCM) is proposed. The concept of a bootstrap decoding is introduced ﬁrst and then the new metric functions which take into account the bootstrap iterative decoding algorithm for TCM systems are derived. One and two dimensional bootstrap decoding are proposed for packet transmission. Furthermore, the trellis shaping technique is also considered to combine with such TCM schemes using the BIVA. The simulation results show that the performance of 1.25 dB away from Shannon limit can be achieved by the BIVA and trellis shaping for 256state 6 bits/T TCM scheme, with low complexity and reasonable computation.
1
Introduction
TrellisCoded Modulation (TCM) has been widely used as a combined coding and modulation technique for digital transmission over bandlimited channel. Ungerboeck has shown that signiﬁcant coding gains can be achieved using trelliscoded modulation with the Viterbi decoding algorithm over uncoded modulation without sacriﬁcing bandwidth eﬃciency on a bandlimited channel [1]. In the past decade, many variants on the basic TCM scheme have been developed to obtain higher coding gains. Bootstrap decoding [4][7] is a method which imposes algebraic constraints on streams of convolutionally encoded information sequences. Such constraints can then be made use to gather extrinsic information from other streams when one stream is decoded. In [7] Wei extent the results of [4][6] to near optimally bootstrap decoding using long convolutional codes. One of the simpliﬁed bootstrap algorithms, which only uses the Viterbi algorithm, was given in [7] and named as BIVA in [8]. In this paper, a system of concatenating trellis codes with the bootstrap decoding is proposed. We will focus on how to apply the BIVA to TCM, and one and two dimensional bootstrap structures are designed. In addition, we noted Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 191–200, 1999. c SpringerVerlag Berlin Heidelberg 1999
192
Qi Wang, Lei Wei, and Rodney A. Kennedy
that the bootstrap iterative decoding algorithm can be modiﬁed to accommodate trellis shaping [10] and full shaping gains can be achieved. The paper is organised as follows. In section 2, the principle of encoding system with bootstrap decoding is described. We then present the BIVA which combines the bootstrap iterative decoding with Viterbi algorithm in section 3, and the new metric functions for bootstrap TCM schemes are also derived. In section 4, one and two dimensional bootstrap structures are proposed. In section 5, concatenating shaping techniques with bootstrap TCM using the BIVA is considered. The numerical simulation results and discussion will be reported in section 6 to compare the performance of the BIVA with Viterbi algorithm. Lastly, in section 7, the conclusion is given.
2
Review the Bootstrap Structure
Bootstrap decoding is a method which utilises algebraic constraints across streams of convolutional encoded information sequences. Before introducing the BIVA and its decoding algorithm, let us look at the basic concept of the bootstrap decoding. In the bootstrap decoding [4], a medium size packet can be formed by organising (mb − 1) × l information symbols a block of (mb − 1) rows and l columns. Suppose [U]j,l = [Uj,1 , Uj,2 , · · · , Uj,l ] and [V]j,l = [Vj,1 , Vj,2 , · · · , Vj,l ] denote the information symbol vector and the encoded codeword vector of the j th packet stream, respectively. As usual, we will encode each packet of l × k binary information bits into codewords of length l through the same k/n convolutional encoder and each encoded symbol Vj,i , (i = 1, 2, · · · , l) has n information bits. th An mth digit will be the b packet is then generated in such way that the i th parity of the i digits of the (mb − 1) information packets, namely, the mth b packet is a modulo 2 positionbyposition sum of the above (mb − 1) information packets, i.e., Umb ,i = U1,i ⊕ U2,i ⊕ · · · ⊕ Umb −1,i ,
i = 1, 2, . . . , l.
(1)
where ⊕ denotes a modulo 2 sum. The mth b packet is therefore called the paritycheckconstraint (PCC) packet. It was shown in [7] that the free distance of such block code is doubled. The mth b PCC packet is also encoded by the same convolutional encoder. Fig. 1 shows the bootstrap structure of one block of mb rows and l columns. Because of the linearity of convolutional encoding1 , the PCC packet corresponds to a path in the coding tree whose information digits are the mod 2 sum of the information digits underlying the information packets; i.e., Vmb ,i = V1,i ⊕ V2,i ⊕ · · · ⊕ Vmb −1,i , 1
i = 1, 2, . . . , l.
(2)
All the convolutional encoders shown in this paper are assumed to be linear which guarantees that the PCC condition among the information packets still exists for coded symbols after the coding process.
Near Optimal Decoding for TCM Using the BIVA and Trellis Shaping packet 1
V1,1
V1,2
···
V1,l
packet 2
V2,1
V2,2
···
V2,l
···
Vmb −1,l
···
Vmb ,l
193
. . . packet mb − 1 packet mb
Vmb −1,1 Vmb −1,2 Vmb ,1
Vmb ,2
Fig. 1. The bootstrap structure of one block code.
Hence, all mb packets are in principle decodable. Suppose next that the mb encoded packets are sent through the channel disturbed by AWGN, and that the corresponding received signal vectors [R]j,i (j = 1, · · · , mb ; i = 1, · · · , l) are arranged by the decoder into a mb by l array. We can ﬁnd that if the jth received packet is to be decoded, the received digits of all other packets should also be taken into account, since these contain information about the transmitted digits of the jth packet (the transmitted digits are related by the paritycheckconstraint condition).
3
Bootstrap Iterative Viterbi Algorithm for TCM
In this section, we will modify the BIVA [7][8] for TCM schemes. In [7] Wei made (p) a signiﬁcant simpliﬁcation on the computation of the parity metric λj,i for convolutional codes. The modiﬁed BIVA is an appropriate to decode the bootstrap TCM scheme with low complexity. Now let us consider how to construct bootstrap TCM scheme and how to derive its parity metric. In trelliscoded modulation, the key concept is the principle of mapping by set partitioning [1]. A general TCM system is shown in Fig. 2. In the trellis encoder, k the inputed information bits are divided into two parts. One part (Ul1 , · · · , Ul ) is encoded by a convolutional encoder whose output is used to select one subset from the whole constellation; the other part (Ulk +1 , · · · , Ulk ) comprises uncoded bits which are used to determine one signal point within that selected subset. The TCM system can be combined with the forementioned bootstrap structure. Next we describe how to concatenate the bootstrap iterative decoding algorithm with VA for such TCM systems. The key idea of bootstrap decoding is to produce one sequence which follows the PCC condition using (mb −1) information sequences. But for TCM systems, the question arises whether to parity check or protect each bit in one symbol? We know that the bit error probability (BER) performance of TCM is mainly determined by the minimum squared Euclidean distance d2f ree which is the minimum of parallel transition’s squared
194
Qi Wang, Lei Wei, and Rodney A. Kennedy Ul1
. ˜ . Ulk .
Encoder
.. .
k/ n
˜ Ulk+1
Ulk
Vl1
Convolutional
.. .
.. Vln˜ .
Subset Selector
˜ Vln+1 .. Vln .
Signal Selector
S
Fig. 2. Schematic representation of a typical trellis code.
distance d2parallel and coded minimum squared Euclidean distance d2f ree,c , i.e., d2f ree = min(d2parallel, d2f ree,c ). If d2f ree,c ≤ d2parallel, we can say that the bit error caused by parallel path error can be ignored at a high signaltonoise rate (SNR), and the BER is dominated by d2f ree,c . Therefore, the performance should k are parity checked not be dramatically aﬀected if only the coded bits U 1 , · · · , U i
i
k +1 or protected and uncoded bits Ui , · · · , Uik are intact or unprotected instead of 1 protecting the whole symbol Ui , · · · , Uik . The beneﬁt of such operation is obvious — the information transmission rate is increased. However, for the TCM systems in which d2f ree,c > d2parallel, if only coded bits are protected, the performance will be much worse than the one in which the whole symbol is protected. Now let us derive the new metric function which take into account the PCC condition for TCM. In this paper, we just focus on protecting the coded bits in one symbol. Let [R]j,l = [Rj,1 , Rj,2 , · · · , Rj,l ] denote the received signal vector n+1 n of the j th packet, and (V p ; V np ) = (V 1 , · · · , V ;V , · · · , V n ) denote the ith j,i
j,i
j,i
j,i
j,i
j,i
p encoded codeword of the j th packet, where Vj,i is the output from the convonp lutional encoder and Vj,i is the uncoded part of the input symbol. If only the p coded bits are protected, then only Vj,i satisﬁes the PCC condition in the encoded codeword. Suppose that packets j1 , j2 , · · · , jmb −2 have been successfully decoded using the normal VA and now we are going to decode packet jmb −1 . The original PCC condition on the ith symbols in one block (mb packets) is p p ⊕ V2,i ⊕ · · · ⊕ Vmp b ,i = 0. Wmb ,i = V1,i
(3)
p ⊕· · ·⊕Vmp b −2,i . The receiver replaces the j th (j = 1, 2, . . . , mb − Let Wmb −2,i = V1,i 2) received packets by the estimated j th transmitted packets. So we can get the estimated PCC condition
m −2,i = V p ⊕ · · · ⊕ V p W b 1,i mb −2,i .
(4)
p p is the decision of Vj,i . Now the new likehood function for decoding where Vj,i th the (mb − 1) packet is
λmb −1,i = − log[P (R1,i , · · · , Rmb −1,i , Rmb ,i Vmp b −1,i ; Vmnpb −1,i )].
(5)
Near Optimal Decoding for TCM Using the BIVA and Trellis Shaping
195
where $V^p_{m_b-1,i}$ and $V^{np}_{m_b-1,i}$ are the coded and uncoded parts of the codeword $V_{m_b-1,i}$, respectively. Assuming $R_{1,i}, \dots, R_{m_b-1,i}$ and $R_{m_b,i}$ are independent of each other², we then have

$$\lambda_{m_b-1,i} = -\log[P(R_{m_b-1,i} \mid V^p_{m_b-1,i}; V^{np}_{m_b-1,i})] - \log[P(R_{1,i}, \dots, R_{m_b-2,i}, R_{m_b,i} \mid V^p_{m_b-1,i}; V^{np}_{m_b-1,i})] = \lambda^{(v)}_{m_b-1,i} + \lambda^{(p)}_{m_b-1,i}, \qquad (6)$$

² Actually, $R_{1,i}, \dots, R_{m_b,i}$ are weakly dependent because of the PCC condition among them.
where $\lambda^{(v)}_{m_b-1,i}$ denotes the branch metric value, which can be obtained using the VA, and $\lambda^{(p)}_{m_b-1,i}$ denotes the extrinsic metric value introduced by the PCC condition from the other packets. If $\widehat W_{m_b-2,i} = W_{m_b-2,i}$, then we have

$$\widehat W_{m_b-2,i} \oplus V^p_{m_b-1,i} \oplus V^p_{m_b,i} = 0. \qquad (7)$$
Therefore $V^p_{m_b,i}$, the coded part of the $i$th codeword in the $m_b$th packet, can be obtained through the PCC condition and the $(m_b-2)$ decoded received packets as well. However, there is no PCC relation among the uncoded parts $V^{np}_{j,i}$, so the question is how to determine the uncoded part $V^{np}_{m_b,i}$ of the $i$th codeword of the $m_b$th packet. We know that the coded part $V^p_{m_b,i}$ decides which subset will be selected in the constellation, and parallel-transition errors within the subset can be ignored at high SNRs. Therefore, after $V^p_{m_b,i}$ is determined, $V^{np}_{m_b,i}$ can be decided by selecting the point $(V^p_{m_b,i}, V^{np}_{m_b,i})$ in this subset that is closest to the received signal $R_{m_b,i}$. Now we have

$$\lambda^{(p)}_{m_b-1,i} = -\log[P(R_{1,i}, \dots, R_{m_b-2,i}, R_{m_b,i} \mid V^p_{m_b-1,i}; V^{np}_{m_b-1,i})] = -\log[P(R_{1,i}, \dots, R_{m_b-2,i}, R_{m_b,i} \mid V^p_{m_b,i} \oplus \widehat W_{m_b-2,i}; V^{np}_{m_b,i})]. \qquad (8)$$

Exact calculation of (8) is computationally very expensive and practically impossible when $m_b$ is large. Therefore we approximate (8) as

$$\lambda^{(p)}_{m_b-1,i} \approx -\log[P(R_{m_b,i} \mid V^p_{m_b,i} \oplus \widehat W_{m_b-2,i}; V^{np}_{m_b,i})]. \qquad (9)$$
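As an illustration of how (9) could be evaluated on an AWGN channel, the following sketch scores the received sample against the subset selected by the parity-determined coded label; up to an additive constant, the negative log-likelihood is a scaled squared Euclidean distance. The data layout (`subsets` as a mapping from a coded label to its signal points) and the noise variance `sigma2` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def extrinsic_metric(r_sample, coded_label, subsets, sigma2):
    """Sketch of (9): -log P(R_{m_b,i} | coded label ; nearest uncoded choice).

    The coded label (V^p xor W-hat) picks a subset; the uncoded part is
    resolved by taking the subset point closest to the received sample,
    since parallel-transition errors are ignored at high SNR.
    """
    pts = np.asarray(subsets[coded_label])       # signal points of the selected subset
    d2 = np.min(np.abs(pts - r_sample) ** 2)     # nearest-point squared distance
    return d2 / (2.0 * sigma2)                   # Gaussian -log likelihood, up to a constant
```

In the decoder this value would additionally be scaled by the factor $\alpha$ discussed next, to limit error propagation.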
It is worth mentioning that $\lambda^{(p)}_{m_b-1,i}$ will not be the correct metric, and error propagation can result, if $\widehat W_{m_b-2,i} \ne W_{m_b-2,i}$. The effect of error propagation can be reduced by scaling down the value of $\lambda^{(p)}_{m_b-1,i}$ with a scale factor $\alpha$. So far we have obtained the new metric functions for TCM systems combined with the bootstrap structure. The corresponding bootstrap iterative decoding using the Viterbi algorithm (BIVA) for TCM can be summarised as follows.

(a) $(m_b-1) \times l$ information symbols are arranged as a block of $(m_b-1)$ rows and $l$ columns, where each symbol includes $\tilde k$ coded bits and $(k - \tilde k)$ uncoded bits. Encode each row of $l$ information symbols into codewords of length $n$ using a $\tilde k/\tilde n$ convolutional encoder of memory length $\nu$. All rows are encoded through the same convolutional encoder.
(b) Generate the $m_b$th parity check row using the PCC condition (a sketch of this step follows the list). If it is only necessary to protect the coded bits, the coded bits of the $i$th symbol will be the parity of the coded bits of the $i$th symbols of the previous $(m_b-1)$ information packets, and its uncoded bits will continue to be information bits.
(c) In the receiver, decode the first $(m_b-2)$ packets based on the VA and update the PCC condition through the decoded packets.
(d) Decode the next packet based on the BIVA, using the new metric which takes into account the extrinsic information from the other packets.
(e) Update the PCC condition based on the newest decoded values.
(f) Repeat steps (d) and (e) for several iterations until the stop criterion is satisfied.
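A minimal sketch of step (b), assuming the coded bits of the information packets are held in an integer array of shape $(m_b-1) \times l \times \tilde k$ (an assumed layout, not prescribed by the paper):

```python
import numpy as np

def parity_row(coded_bits):
    """coded_bits: array of shape (m_b - 1, l, k_tilde) with entries in {0, 1}.
    Returns the coded bits of the m_b-th (parity) row; uncoded bit positions
    are not constrained and keep carrying fresh information bits."""
    return np.bitwise_xor.reduce(np.asarray(coded_bits), axis=0)
```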
4 Variation of BIVA for Packet Transmission
In this section we study BIVA decoding in a packet transmission format. In the previous sections, several dummy symbols in each packet are needed to make the decoding of the last information symbols reliable. Such dummy symbols can be eliminated using tail-biting [9]. In tail-biting, the encoder is first initialised by inputting the last $\nu$ information bits into the encoder and ignoring the output. Therefore the start and end encoder states are constrained to be identical; that is, a trellis codeword starts from the state at which it will eventually end.
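A minimal sketch of this initialisation, assuming a stateful `encoder` object with `reset()` and `step(bit)` methods (a hypothetical interface, not from the paper):

```python
def tailbiting_encode(bits, encoder, nu):
    """Feed the last nu bits first, discarding the output, so that the start
    state equals the state the encoder will reach at the end of the packet;
    then encode the whole packet normally."""
    encoder.reset()
    for b in bits[-nu:]:
        encoder.step(b)                  # initialise the state; output ignored
    return [encoder.step(b) for b in bits]
```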
4.1 One Dimensional (1D) Bootstrap Structure
We can now use the tail-biting technique in each packet to encode the whole packet of information symbols. Note, however, that much computation is required if tail-biting is applied to every packet. In this subsection we present a modification of the BIVA that cuts down this computation without affecting the error performance. The key concept of the 1D bootstrap structure is to connect all the row packets into one big packet. In other words, instead of terminating each row packet, the information symbols of all row packets continue to feed into the encoder until the last symbol. Next we consider how to update the PCC condition $\widehat W$ in the 1D BIVA. The Viterbi decoder progresses over the trellis for a certain depth (i.e., the truncation length) and then produces its decision result. Ideally, the update for one packet should be based on the decoding history of the other packets; namely, $\widehat W$ should be updated based on the newest decoded values of the other packets. In the 1D BIVA, however, the decision results of any packet cannot be determined until the whole super packet is finished. Thus, for all packets, $\widehat W$ is updated only once in every iteration. Simulation results show that the way $\widehat W$ is updated has little effect on the error performance and the iteration number.
4.2 Two Dimensional (2D) Bootstrap Structure
We can reapply the bootstrap structure to the structure of Subsection 4.1 to build up a two dimensional bootstrap structure. The main motivation is to build a block code with a large minimum free distance that is nevertheless decodable by an iterative BIVA with very low complexity. The 2D bootstrap structure is illustrated in Fig. 3.
Fig. 3. 2D bootstrap structure. [Figure: an array of packets, packet $jb$ for $j = 1, \dots, m_b$ and $b = 1, \dots, B$; the packets $j1, \dots, jB$ form super packet $j$, and the last super packet serves as the super parity.]
In the 2D structure there are two types of parity checks, which can produce two sorts of parity metrics³. The type (a) constraint provides the parity metric given in (9) for each super row packet. The type (b) constraint provides the parity metric for each column in the structure given in Fig. 3. Again, both parity metrics are used after the first iteration. Updating the parity metrics for the type (a) and (b) constraints is exactly the same as the procedure given in the preceding subsection.

³ A third type of parity check constraint can be contemplated for the 2D structure, namely a combination of row and column check sums. However, it was found that this posed computational and technical difficulties and did not significantly contribute to the performance.
5 Combining Shaping Techniques with Bootstrap TCM System
It has come to be recognised that shaping and coding are two separable and complementary components of TCM systems. In [10] it was shown that shaping gain can be achieved by using nonuniform, Gaussian-like signalling. One of the approaches, called trellis shaping, was proposed by Forney [10]. It was shown that a simple 4-state shaping code can achieve about 1.0 dB shaping gain, which is about 2/3 of the full 1.53 dB ultimate shaping gain. In this work we concentrate on trellis shaping for TCM systems using the bootstrap iterative decoding algorithm. A TCM system using the BIVA with shaping techniques is shown in Fig. 4.
Fig. 4. Coded modulation system with trellis shaping. [Figure: at the transmitter, $k_c$ bits $x_l$ enter the channel encoder $G_c$ (output $y_l$, $n_c$ bits), $r_s$ bits $s_l$ enter the shaping encoder (output $z_l$, $n_s$ bits), and $n_u$ bits $w_l$ remain uncoded; the mapping $M$ produces the signal point $a$ sent over the channel. At the receiver, a decoder recovers $\hat y_l$ and $\hat w_l$; $G_c^{-1}$ yields $\hat x_l$ ($k_c$ bits), which is re-encoded by $G_c$, and the shaping decoder yields $\hat z_l$ and $\hat s_l$.]
A $2^n$-point 2-D constellation is used to transmit $k = k_c + n_u + r_s$ information bits/T, where $k_c$, $n_u$ and $r_s$ are the numbers of channel-coded bits $x_l$, uncoded bits $w_l$ and shaping-coded bits $s_l$, respectively. The 2-D constellation is first partitioned into $2^{n_c}$ subsets using mapping by set partitioning [1]. The $n_c$ coded bits $y_l$ produced by the channel encoder $G_c$ at time $l$ specify one of the $2^{n_c}$ subsets. The 2-D constellation is also divided into $2^{n_s}$ subregions. The $n_s$ shaping bits $z_l$ produced by the shaping encoder at time $l$ specify one of the $2^{n_s}$ subregions. The $n_u$ uncoded bits $w_l$ at time $l$ specify a point $a = M(y_l, w_l, z_l)$ in the constellation for the given subregion and subset. At the receiver, a Viterbi decoder with the BIVA can be used to obtain the estimate of the transmitted information bits. To update the PCC condition $\widehat W_{j,i}$, one channel encoder $G_c$ is needed at the receiver to re-encode the decoded bits $\hat x_l$. It was shown in [7] that the free distance of the block code is doubled by the 1D bootstrap structure. We note that the free distance is determined only by the coded bits $x_l$ for those TCM schemes in which $d^2_{parallel}$ is larger than $d^2_{free,c}$. Therefore the shaping code bits $s_l$ should not affect the distance property, and the full shaping gain will be achieved when shaping techniques are combined with TCM systems using the BIVA.
6 Numerical Results
In the previous sections the BIVA for bootstrap TCM systems has been proposed, and the shaping technique integrated with such TCM was also discussed. In this section we report some simulation results comparing the performance with the Viterbi algorithm. In our simulation each super block has $20 \times 20$ packets in the 2D bootstrap structure, i.e., $m_b = 20$ and $B = 20$, and each packet includes $7\nu$ symbols. The largest iteration number is set to 100; if the stop criterion is met within 100 iterations, the decoding process is stopped. The stop criterion in bootstrap decoding is straightforward: if the coded parts of the decoded symbols (part protection) in all $m_b$ packets (1D bootstrap structure) or all $(m_b \times B)$ packets (2D bootstrap structure) satisfy the original PCC condition, i.e., $W_i = \oplus_j \widehat V_{j,i} = 0$, then the iterative decoding process is stopped.
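A compact sketch of this stop test, assuming the decoded coded bits of each bootstrap block are collected in an array whose first axis runs over the $m_b$ packets (an assumed layout):

```python
import numpy as np

def stop_criterion(blocks):
    """blocks: iterable of arrays, each of shape (m_b, ...), holding the
    decoded coded bits of one bootstrap block. Decoding stops when the XOR
    over the m_b packets vanishes at every position, i.e. the original PCC
    condition holds."""
    return all(np.bitwise_xor.reduce(np.asarray(b), axis=0).max() == 0
               for b in blocks)
```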
Fig. 5. Performance of $\nu = 8$ trellis codes at a spectral efficiency of 6 bits/T with part protection using 2-D BIVA. [Plot: BER from $10^{-3}$ down to $10^{-5}$ versus $E_b/N_0$ from 10.5 to 14 dB, with curves for the VA, the 2-D BIVA without shaping, and the 2-D BIVA with shaping.]
Figure 5 shows the performance of the 2-D BIVA with part protection for the 2-D 256-state 6 bits/T TCM scheme, combined with the trellis shaping technique. A 16-state shaping code is applied in our simulation. The 16-state and 256-state TCM codes are Ungerboeck's codes [2]. For comparison, the performance of the same TCM scheme using the VA is also reported. The results show that about 2.0 dB gross gain beyond the VA can be achieved without shaping, and about 2.7 dB gross gain with shaping. However, note that the shaping gain is smaller at low SNRs: only about 0.7 dB shaping gain was obtained in this TCM scheme with the 2D bootstrap structure (a similar case can be found in [11]). We can also see an error floor in Fig. 5. This error floor is mainly dominated by parallel-path errors at relatively low SNRs. Such errors can be reduced through multilevel error protection techniques. It is noteworthy that each block includes some non-information bits which make the real transmission rate lower than the nominal rate; that is, the parity bits introduced by the bootstrap structure offset a part of the achieved gains and result in a practical rate loss. In our case the real rate is 5.805 bits/T, and the Shannon limit for this rate is 9.76 dB. Therefore a performance 1.25 dB away from the Shannon limit at BER $= 3 \times 10^{-5}$ is achieved by such a TCM scheme. Finally, we have to point out that the scale factor $\alpha$ mentioned in Section 3 is critical to the performance and the convergence speed. The average iteration number varies with the SNR and the code. Generally, fewer iterations are required at higher SNRs or by a larger-memory code. In our simulation, for the $\nu = 8$ code, the average numbers of iterations are 45 and 13 for SNR = 11.01 dB and 11.1 dB, respectively. This shows that the peak and average complexity can be significantly cut down if we increase the SNR by 0.1 dB.
7 Conclusions
A bootstrap iterative Viterbi algorithm has been proposed, and new metric functions which take into account the extrinsic information of the bootstrap TCM systems have been derived. Additional gains for trellis codes using the BIVA instead of the VA can be obtained with low complexity and reasonable computation. For TCM schemes in which $d^2_{free,c} \le d^2_{parallel}$, protecting the coded bits instead of the whole symbol achieves most of the gain with less redundancy. 1D and 2D bootstrap structures are also designed to be suitable for packet transmission. Such bootstrap decoding can be applied to existing systems such as ADSL. Furthermore, the trellis shaping technique is employed on bootstrap TCM systems and the full shaping gain can be achieved. In the simulation, a performance 1.25 dB away from the Shannon limit is achieved by the 2-D BIVA for the 2-D 256-state 6 bits/T TCM scheme combined with trellis shaping.
Acknowledgements. The authors would like to thank Prof. D. J. Costello, Jr. for introducing us to this area, and Dr. G. D. Forney, Jr. for pointing out the importance of continuous transmission, which has been the key motivation of this research.
References
1. G. Ungerboeck: "Channel coding with multilevel/phase signals", IEEE Trans. on Inform. Theory, Vol. IT-28, pp. 55–67, Jan. 1982.
2. E. Biglieri, D. Divsalar, P. J. McLane and M. K. Simon: Introduction to Trellis-Coded Modulation with Applications, Macmillan Publishing Company, New York, 1991.
3. G. D. Forney, Jr.: "The Viterbi algorithm", Proc. of the IEEE, Vol. 61, No. 3, March 1973.
4. F. Jelinek, J. Cocke: "Bootstrap hybrid decoding for symmetrical binary input channels", Information and Control, Vol. 18, pp. 261–298, April 1971.
5. F. Jelinek: "Bootstrap trellis decoding", IEEE Trans. on Inform. Theory, Vol. 21, pp. 318–325, May 1975.
6. H. A. Cabral, D. J. Costello, Jr., P. R. Chevillat: "Bootstrap hybrid decoding using the multiple stack algorithm", Proc. of IEEE ISIT'97, p. 494, Ulm, Germany, June 29–July 4, 1997.
7. L. Wei: "Near optimal limited-search detection, Part II: convolutional codes", submitted to IEEE Trans. on Inform. Theory.
8. L. Wei: "On bootstrap iterative Viterbi algorithm", IEEE ICC'99, Vancouver, Canada, 1999.
9. G. Solomon, H. C. A. van Tilborg: "A connection between block and convolutional codes", SIAM J. of Appl. Math., Vol. 37, No. 2, Oct. 1979.
10. G. D. Forney, Jr.: "Trellis shaping", IEEE Trans. on Inform. Theory, Vol. 38, pp. 281–300, Mar. 1992.
11. S. Couturier, D. J. Costello, Jr., and F.-Q. Wang: "Sequential decoding with trellis shaping", IEEE Trans. on Inform. Theory, Vol. IT-41, No. 6, pp. 2037–2040, Nov. 1995.
An Optimality Testing Algorithm for a Decoded Codeword of Binary Block Codes and Its Computational Complexity

Yuansheng Tang¹, Tadao Kasami², and Toru Fujiwara¹

¹ Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan
² Department of Computer Science, Faculty of Information Science, Hiroshima City University, Hiroshima, Japan
Abstract. Based on h reference words, Kasami et al. proposed an integer programming problem (IPP) whose optimal value of the objective function gives a sufficient condition on the optimality of a decoded codeword. The IPP has been solved for 1 ≤ h ≤ 3. In this paper, an algorithm for solving the IPP for h = 4 is presented. The computational complexity of this algorithm is investigated.
1 Introduction
Let IDA denote a soft-decision iterative decoding algorithm for a binary block code. For most IDAs, in each successful decoding step a candidate codeword is generated by a simple decoder and an optimality testing condition is tested on the candidate codeword. When the testing condition is satisfied, the decoding iteration process is terminated and the optimal (or most likely) codeword is obtained. A number of testing conditions have been derived, such as those proposed in [1] and [2]. Recently, based on h reference words, Kasami et al. in [3,4] proposed an integer programming problem (IPP) whose optimal value of the objective function gives an optimality testing condition and can be incorporated in any IDA that is based on the generation of a sequence of candidate codewords. This testing condition with h = 3 was used effectively in the iterative decoding algorithm presented in [5]. It was shown that this testing condition can provide fast termination of the decoding iteration without degrading the error performance. It is pointed out in [6] that the approach used in the derivation of this testing condition can also be used to derive two other important conditions for IDAs. One is the ruling-out condition, which can be used to skip useless decoding steps: if the ruling-out condition holds in some decoding step, then the output of the next decoding step cannot be better (or more likely) than the best candidate codeword obtained so far. The other is a stronger version of the ruling-out condition, called the early termination condition: if it holds in some decoding step, then none of the successive decoding steps can generate a candidate codeword better than the best candidate codeword obtained so far, that is, there is no improvement in error performance
by any further iteration. In [7], the approach is also used to find a good sequence of search centers around which algebraic decodings are iterated. In [3,4] the computational complexity for solving the IPP was investigated for 1 ≤ h ≤ 3; the number of additions and comparisons of real numbers, which constitutes the majority of the computational complexity, was shown to be of order N, where N is the code length. For larger h this testing condition is stronger, while the computational complexity for solving the IPP grows. It is desirable to give effective methods to solve the IPP for relatively large h. In this paper we consider solving the IPP for h = 4. In Section 2 we briefly review the IPP proposed by Kasami et al. in [3,4] and [6]. In Section 3 an algorithm for solving the IPP for h = 4 is presented. The IPP is split into at most 9 sub-IPPs; the number of variables of each sub-IPP is half of that of the original IPP. The number of additions and comparisons of real numbers for this algorithm is shown to be of order $N^2$. In Section 4 each sub-IPP is split further into a few simpler sub-sub-IPPs that are solved by simple iterations. The proofs of the theorems and lemmas that appear in Sections 3 and 4 can be found in [8].
2 The Testing Condition of Optimality
For a positive integer $N$, let $V^N$ denote the set of all binary $N$-tuples over GF(2). Suppose a binary block code $C \subseteq V^N$ is used for error control over the AWGN channel with BPSK signaling, and $r = (r_1, r_2, \dots, r_N)$ is the received $N$-tuple at the output of a matched filter in the receiver. Let $z = (z_1, z_2, \dots, z_N)$ be the binary hard-decision $N$-tuple obtained from $r$ using the hard-decision function: $z_i = 1$ for $r_i > 0$ and $z_i = 0$ for $r_i \le 0$. $|r_i|$ indicates the reliability of $z_i$ for each $i$. For $u = (u_1, u_2, \dots, u_N) \in V^N$, define $D_1(u) \triangleq \{i : u_i \ne z_i, 1 \le i \le N\}$ and $D_0(u) \triangleq \{1, 2, \dots, N\} \setminus D_1(u)$. $L(u) \triangleq \sum_{i \in D_1(u)} |r_i|$ is called the correlation discrepancy of $u$ with respect to the hard-decision tuple $z$ [3,4]. For any subset $T$ of $V^N$, let $L[T] \triangleq \min_{u \in T} L(u)$, and write $L[\emptyset] \triangleq +\infty$. Maximum likelihood decoding (MLD) can be stated in terms of the correlation discrepancy as follows: the decoder decodes the received tuple $r$ into the optimal codeword $c_{opt} \in C$ with $L(c_{opt}) = L[C]$.

For $u \in V^N$ and a positive integer $d$, let $O_d(u) \triangleq \{v \in V^N : d_H(u, v) < d\}$, where $d_H(u, v)$ is the Hamming distance between $u$ and $v$. Let $h$ be a positive integer and $u_1, u_2, \dots, u_h$ be $h$ reference words in $V^N$. Let $R \triangleq \cup_{j=1}^{h} O_{d_j}(u_j)$, where $d_j$ is called the preassigned radius for the reference word $u_j$. Write $V^N_{d_1,d_2,\dots,d_h}(u_1, u_2, \dots, u_h) \triangleq V^N \setminus R$, that is,

$$V^N_{d_1,d_2,\dots,d_h}(u_1, u_2, \dots, u_h) = \{v \in V^N : d_H(u_j, v) \ge d_j \text{ for } 1 \le j \le h\}. \qquad (1)$$

Lemma 1. If a codeword $c_{best}$ in $R$ satisfies

$$L(c_{best}) \le L[V^N_{d_1,d_2,\dots,d_h}(u_1, u_2, \dots, u_h)], \qquad (2)$$
then the optimal codeword $c_{opt}$ must belong to the set $R$. Furthermore, we have $c_{best} = c_{opt}$ if $L(c_{best}) = L[C \cap R]$.

Lemma 1 not only defines a region which contains the optimal codeword $c_{opt}$ but also provides a sufficient testing condition for the optimality of candidates which have been generated. The selection of the reference words and the preassigned radiuses strongly affects the efficiency of the results of Lemma 1. If we incorporate it in a soft-decision iterative decoding algorithm, we can select the reference words from the previously generated candidate codewords, which are trusted to possess low correlation discrepancies, and take "the covering distance around $u_j$" assured by the decoding algorithm as the preassigned radius for the reference word $u_j$. Here we do not consider the problem of selecting the reference words and the preassigned radiuses further. The main problem we are concerned with in this paper is to evaluate $L[V^N_{d_1,d_2,\dots,d_h}(u_1, u_2, \dots, u_h)]$. Below we introduce an IPP proposed by Kasami et al. in [3,4]; the optimal value of this IPP was shown to be equal to $L[V^N_{d_1,d_2,\dots,d_h}(u_1, u_2, \dots, u_h)]$. Let $B^h$ denote the set of all $h$-tuples over $B \triangleq \{0, 1\}$. For simplicity of notation, the $h$-tuple $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_h)$ may be written as $\alpha_1\alpha_2\cdots\alpha_h$. Define

$$D_\alpha \triangleq \bigcap_{i=1}^{h} D_{\alpha_i}(u_i), \qquad n_\alpha \triangleq |D_\alpha|. \qquad (3)$$
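The sets in (3) are straightforward to compute; the following sketch is an illustrative reading of the definitions (the dict-of-index-arrays layout is an assumption, not from the paper):

```python
import numpy as np
from itertools import product

def D_alpha_sets(us, r):
    """For reference words us = [u_1, ..., u_h] and received tuple r, return
    the sets D_alpha of (3): positions i where, for every j, u_j differs from
    the hard decision z exactly when alpha_j = 1. n_alpha = len(sets[alpha])."""
    z = (np.asarray(r) > 0).astype(int)
    D1 = [np.asarray(u) != z for u in us]          # membership masks for D_1(u_j)
    sets = {}
    for alpha in product((0, 1), repeat=len(us)):
        mask = np.ones(len(z), dtype=bool)
        for j, aj in enumerate(alpha):
            mask &= D1[j] if aj == 1 else ~D1[j]
        sets[alpha] = np.flatnonzero(mask)
    return sets
```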
Let $Q^*_h$ denote the set of all $2^h$-tuples over the nonnegative integers. For $q \in Q^*_h$, each of the $2^h$ components of $q$ is referred to as $q_\alpha$ with $\alpha \in B^h$. Let $Q_h$ denote the set of all $2^h$-tuples $q \in Q^*_h$ which satisfy

$$0 \le q_\alpha \le n_\alpha, \quad \text{for all } \alpha \in B^h, \qquad (4)$$

$$\Delta(q)_i \triangleq \sum_{\alpha \in B^h} q_\alpha (-1)^{\alpha_i} \ge \delta_i \triangleq d_i - |D_1(u_i)|, \quad \text{for } i = 1, 2, \dots, h. \qquad (5)$$
Without loss of generality, we assume further that the components of the received tuple $r$ are ordered in increasing order of their absolute values:

$$|r_1| \le |r_2| \le \cdots \le |r_N|. \qquad (6)$$

For convenience, we define $r_* \triangleq +\infty$. For any subset $X \subseteq \{1, 2, \dots, N, *\}$ and integer $j$, let $X^{(j)}$ denote the set of the $j$ smallest integers in $X \setminus \{*\}$ if $1 \le j \le |X \setminus \{*\}|$, the set $X \cup \{*\}$ if $j > |X \setminus \{*\}|$, and the empty set $\emptyset$ otherwise. For $q \in Q^*_h$, let $D(q) \triangleq \cup_{\alpha \in B^h} D_\alpha^{(q_\alpha)}$ and $L'(q) \triangleq \sum_{i \in D(q)} |r_i|$. For a nonempty subset $Q' \subseteq Q^*_h$, let $L'[Q']$ denote the optimal value of the following IPP:

$P(Q')$: Minimize $\{L'(q) \mid q \in Q'\}$, i.e., $L'[Q'] \triangleq \min_{q \in Q'} L'(q)$.

For convenience, we write $L'[\emptyset] \triangleq +\infty$. A $2^h$-tuple $q \in Q'$ is called a $Q'$-optimum if $L'(q) = L'[Q']$. It was shown in [3,4] that $L[V^N_{d_1,d_2,\dots,d_h}(u_1, \dots, u_h)]$ is equal to the optimal value $L'[Q_h]$ of the IPP $P(Q_h)$ if the received tuple $r$ satisfies (6), i.e.
Theorem 1. If the received tuple $r$ satisfies (6), then

$$L'[Q_h] = L[V^N_{d_1,d_2,\dots,d_h}(u_1, u_2, \dots, u_h)]. \qquad (7)$$
If $\delta_i \le 0$ for $i = 1, 2, \dots, h$, then the all-zero $2^h$-tuple $q = 0$ belongs to $Q_h$ and $L'[Q_h] = 0$; this implies that the hard-decision tuple $z$ belongs to the set $V^N_{d_1,d_2,\dots,d_h}(u_1, \dots, u_h)$. Without loss of generality, hereafter we assume that the components of $\delta \triangleq (\delta_1, \delta_2, \dots, \delta_h)$ satisfy

$$\delta_1 \ge \delta_2 \ge \cdots \ge \delta_h, \quad \delta_1 > 0. \qquad (8)$$
For each $h$ with $1 \le h \le 3$, the IPP $P(Q_h)$ was solved in [3,4]. The main results on the IPP $P(Q_h)$ obtained in [3,4] can be summarized as the following three results:

Result 1. If $h = 1$ and (6) holds, then

$$L'[Q_1] = \sum_{i \in D_0(u_1)^{(\delta_1)}} |r_i|. \qquad (9)$$

Result 2. If $h = 2$ and (6) and (8) hold, then

$$L'[Q_2] = \sum_{i \in (D_{00} \cup D_{01}^{(\lceil(\delta_1 - \delta_2)/2\rceil)})^{(\delta_1)}} |r_i|. \qquad (10)$$

Result 3. If $h = 3$ and (6) and (8) hold, then

$$L'[Q_3] = \min_{1 \le j \le 2} \ \min_{0 \le k \le k_j} \sum_{i \in Y_j(k)} |r_i|, \qquad (11)$$

where $k_1 \triangleq \min\{\delta_1, \lceil(\delta_1 - \delta_2)/2\rceil, \lceil(\delta_1 - \delta_3)/2\rceil\}$, $k_2 \triangleq \min\{\lceil(\delta_2 + \delta_3)/2\rceil, \lceil(\delta_1 + \delta_3)/2\rceil, \lceil(\delta_1 + \delta_2)/2\rceil\}$, and

$$Y_1(k) \triangleq D_{011}^{(k)} \cup (D_{000} \cup D_{010}^{(\lceil(\delta_1 - \delta_2)/2\rceil - k)} \cup D_{001}^{(\lceil(\delta_1 - \delta_3)/2\rceil - k)})^{(\delta_1 - k)}, \qquad (12)$$

$$Y_2(k) \triangleq D_{000}^{(k)} \cup D_{100}^{(\lceil(\delta_2 + \delta_3)/2\rceil - k)} \cup D_{010}^{(\lceil(\delta_1 + \delta_3)/2\rceil - k)} \cup D_{001}^{(\lceil(\delta_1 + \delta_2)/2\rceil - k)}. \qquad (13)$$
Furthermore, for each $j$ there is a $k_j^*$ with $0 \le k_j^* \le k_j$ such that $\sum_{i \in Y_j(k)} |r_i|$ is nonincreasing in $[0, k_j^*]$ and nondecreasing in $[k_j^*, k_j]$. Since with at most 4 operations of additions and comparisons of real numbers we can determine whether $\sum_{i \in Y_j(k+1)} |r_i| - \sum_{i \in Y_j(k)} |r_i| > 0$ holds or not, the number of operations of additions and comparisons of real numbers for solving the IPP $P(Q_3)$ is of order $N$. For greater $h$ the testing condition is stronger, while the computational complexity for solving the IPP $P(Q_h)$ grows. It is pointed out in [9] that, under the assumption of correlation uniqueness, i.e., $L(u) \ne L(v)$ for any different tuples $u$ and $v$ in $V^N$, the inequality $L'[Q_{h+1}] > L'[Q_h]$ holds if and only if the $Q_h$-optimum is not in $Q_{h+1}$. We will solve the IPP for $h = 4$ in this paper.
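For concreteness, here is a small sketch of Result 1 (equation (9)); it recomputes the hard decision itself and is an illustrative reading of the formula, not code from the paper:

```python
def L_Q1(r, u1, delta1):
    """Result 1: L'[Q_1] is the sum of the delta_1 smallest |r_i| over
    D_0(u_1), the positions where u_1 agrees with the hard decision z.
    Returns 0 if delta_1 <= 0 (then z itself satisfies the constraint)."""
    z = [1 if x > 0 else 0 for x in r]
    vals = sorted(abs(x) for x, ui, zi in zip(r, u1, z) if ui == zi)
    return sum(vals[:max(delta1, 0)])
```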
3 Algorithm for Solving the IPP $P(Q_4)$
In this section we present an algorithm that solves the IPP $P(Q_4)$ by splitting it into at most 9 sub-IPPs; the number of variables of each sub-IPP is only half of that of the IPP $P(Q_4)$. For any $q \in Q^*_4$, write $S(q) \triangleq \{\alpha \in B^4 : q_\alpha \ge 1\}$. For $\alpha \in B^4$, let $w_H(\alpha)$ denote the Hamming weight of $\alpha$. Let $Y \triangleq \{\alpha \in B^4 : w_H(\alpha) \le 1\}$. For any $\ell$ with $1 \le \ell \le 4$, we call $C_\ell \triangleq \{\alpha \in B^4 : \alpha_\ell = 0\}$, $D_\ell \triangleq Y \cup \{\alpha \in B^4 : w_H(\alpha) = 2, \alpha_\ell = 1\}$ and $E_\ell \triangleq Y \cup \{\alpha \in B^4 : w_H(\alpha) = 2, \alpha_\ell = 0\}$ M-sets. For each M-set $\Xi$ there is one and only one sequence in $\Xi$, denoted $\alpha(\Xi)$, such that $\rho(\Xi) = (\rho(\Xi)_1, \rho(\Xi)_2, \rho(\Xi)_3, \rho(\Xi)_4) \triangleq \delta + \alpha(\Xi)$ is a 4-tuple over even integers or a 4-tuple over odd integers. Let $Q(C_\ell)$ denote the set of those $2^4$-tuples $q \in Q_4$ which satisfy $S(q) \subseteq C_\ell$ and $\Delta(q)_\ell = \rho(C_\ell)_\ell = \delta_\ell$. Let $Q(D_\ell)$ denote the set of those $2^4$-tuples $q \in Q_4$ which satisfy $S(q) \subseteq D_\ell$ and $\Delta(q)_j = \rho(D_\ell)_j$ for all $j$ with $j \ne \ell$. Let $Q(E_\ell)$ denote the set of those $2^4$-tuples $q \in Q_4$ which satisfy $S(q) \subseteq E_\ell$ and $\Delta(q)_j = \rho(E_\ell)_j$ for $j = 1, 2, 3, 4$.

Theorem 2. If the received $N$-tuple $r$ satisfies (6), we have

$$L'[Q_4] = L'[Q_{min}], \qquad (14)$$
where $Q_{min} \triangleq \bigcup_{\ell=1}^{4} (Q(C_\ell) \cup Q(D_\ell) \cup Q(E_\ell))$.

In general, for some M-sets $\Xi$ the sets $Q(\Xi)$ are empty or covered by the others and thus should be excluded from further consideration.

Theorem 3. If the 4-tuple $\delta$ satisfies (8), then we have

$$Q_{min} = \bigcup_{\Xi \in \{C_1\} \cup \aleph^*} Q(\Xi), \qquad (15)$$

where $\aleph^*$ is a set of M-sets defined as follows:
(i) $\emptyset$ if $\delta_2 + \delta_3 + 1 < 0$ and $\sum_{j=1}^{4} \rho(E_1)_j < 0$;
(ii) $\{E_1\}$ if $\delta_2 + \delta_3 + 1 < 0$ and $\sum_{j=1}^{4} \rho(E_1)_j \ge 0$;
(iii) $\{D_4\}$ if $\delta_2 + \delta_3 + 1 \ge 0$ and $\delta_1 + \delta_4 + 1 < 0$;
(iv) $\{E_1, D_4\}$ if $\delta_2 + \delta_3 + 1 \ge 0$, $\delta_1 + \delta_4 + 1 \ge 0$ and $\delta_2 + \delta_4 + 1 < 0$;
(v) $\{E_1, E_2, D_3, D_4\}$ if $\delta_2 + \delta_4 + 1 \ge 0$ and $\delta_3 + \delta_4 + 1 < 0$;
(vi) $\{E_1, E_2, E_3, E_4, D_2, D_3, D_4\}$ if $\delta_3 + \delta_4 + 1 \ge 0$ and $\rho(D_1)_1 > \sum_{j=2}^{4} \rho(D_1)_j$;
(vii) $\{E_1, E_2, E_3, E_4, D_1, D_2, D_3, D_4\}$ if $\rho(D_1)_1 \le \sum_{j=2}^{4} \rho(D_1)_j$.

If both (6) and (8) are valid, then from Theorems 2 and 3 we easily see that to solve the IPP $P(Q_4)$ it is only necessary to solve the IPP $P(Q(\Xi))$ for the M-sets $\Xi$ in $\{C_1\} \cup \aleph^*$. For each M-set $\Xi \in \{C_1, D_1, D_2, D_3, D_4, E_1, E_2, E_3, E_4\}$ we give a Subalgorithm-$Q(\Xi)$ solving the IPP $P(Q(\Xi))$ in the next section. Using these subalgorithms we can solve the IPP $P(Q_4)$ by the following algorithm.

Algorithm for Solving the IPP $P(Q_4)$
Input. The $N$-tuple $r$ satisfying (6). The 4-tuple $\delta = (\delta_1, \delta_2, \delta_3, \delta_4)$ defined by (5) and
satisfying (8). The $D_\alpha$ and $n_\alpha$ defined by (3) for all $\alpha \in B^4$.
Output. $L'[Q_4] = L[V^N_{d_1,d_2,d_3,d_4}(u_1, u_2, u_3, u_4)]$.
Step 1. By the definitions, determine the 4-tuples $\rho(D_1)$ and $\rho(E_1)$. Then generate the set $\aleph^*$ by Theorem 3, and solve the IPP $P(Q(\Xi))$ for each $\Xi \in \{C_1\} \cup \aleph^*$ by the Subalgorithm-$Q(\Xi)$.
Step 2. Output $\min_{\Xi \in \{C_1\} \cup \aleph^*} L'[Q(\Xi)]$ and END.

Since for all of the subalgorithms the numbers of additions and comparisons of real numbers are of order $\delta_1^2$, we easily see that the number of additions and comparisons of real numbers of the above algorithm is of order $\delta_1^2$, and thus of order $N^2$.
4 Subalgorithms for Solving the IPP's $P(Q(\Xi))$

4.1 Subalgorithm for the Evaluation of $L'[Q(C_1)]$
Let $\delta_i^C \triangleq \max\{0, (\delta_1 + \delta_i + 1)/2\}$, $i = 2, 3, 4$. For integers $d, k$ with $0 \le d \le \delta_1$, $0 \le k \le \delta_4^C$, let $R^{d,k}$ be the set of the $2^h$-tuples $q$ in $Q^*_4$ which satisfy (4) and

$$q_{0000} + q_{0001} = d, \quad \sum_{\alpha \in C_1'} q_\alpha = \delta_1 - d,$$
$$q_{0010} + q_{0011} \ge \delta_2^C - d, \quad q_{0100} + q_{0101} \ge \delta_3^C - d, \qquad (16)$$
$$q_{0000} + q_{0010} + q_{0100} + q_{0110} \ge k, \quad q_\alpha = 0 \text{ for } \alpha \notin C_1,$$

where $C_1' \triangleq C_1 \setminus \{0000, 0001\}$. Clearly $R^{d,k} \subseteq R^{d,k-1} \subseteq \cdots \subseteq R^{d,0} \subseteq Q^*_4$. Let $d_l \triangleq \max\{0, \delta_1 - \sum_{\alpha \in C_1'} n_\alpha, \delta_2^C - n_{0011} - n_{0010}, \delta_3^C - n_{0101} - n_{0100}, \delta_2^C + \delta_3^C - \delta_1\}$ and $d_r \triangleq \min\{\delta_1, n_{0000} + n_{0001}\}$; then $R^{d,0}$ is nonempty if and only if $d_l \le d \le d_r$. We can easily show that

$$Q(C_1) = \bigcup_{d_l \le d \le d_r} R^{d, \delta_4^C}. \qquad (17)$$
For any $d$ with $d_l \le d \le d_r$, let

$$v(d) \triangleq \delta_1 - d - \max\{\delta_2^C - d, 0\} - \max\{\delta_3^C - d, 0\} \ge 0, \qquad (18)$$

$$V(d) \triangleq (D_{0011} \cup D_{0010})^{(\delta_2^C - d)} \cup (D_{0101} \cup D_{0100})^{(\delta_3^C - d)}, \qquad (19)$$

$$V^*(d) \triangleq V(d) \cup (D_{0000} \cup D_{0001})^{(d)} \cup ((\cup_{\alpha \in C_1'} D_\alpha) \setminus V(d))^{(v(d))}, \qquad (20)$$

and let $q^{d,0}$ denote the $2^4$-tuple which satisfies $D(q^{d,0}) = V^*(d)$. Then $q^{d,0}$ is an $R^{d,0}$-optimum. We will show an iterative method to find an $R^{d,\delta_4^C}$-optimum, or determine $R^{d,\delta_4^C} = \emptyset$, from $q^{d,0}$. For any $2^4$-tuple $q \in Q^*_4$ and two sequences $\alpha, \alpha' \in B^4$, let $\varphi(q, \alpha, \alpha')$ denote the $2^4$-tuple which satisfies

$$\varphi(q, \alpha, \alpha')_\alpha = q_\alpha + 1, \quad \varphi(q, \alpha, \alpha')_{\alpha'} = q_{\alpha'} - 1, \quad \varphi(q, \alpha, \alpha')_{\alpha''} = q_{\alpha''} \text{ for all } \alpha'' \in B^4 \setminus \{\alpha, \alpha'\}. \qquad (21)$$
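The move (21) is just a unit transfer between two coordinates of $q$; a minimal sketch, representing $q$ as a dict keyed by 4-bit tuples (an assumed encoding):

```python
def phi(q, alpha, alpha_prime):
    """(21): return a copy of q with one unit added at coordinate alpha and
    one unit removed at coordinate alpha_prime; all other coordinates are
    unchanged."""
    q2 = dict(q)
    q2[alpha] = q2.get(alpha, 0) + 1
    q2[alpha_prime] = q2.get(alpha_prime, 0) - 1
    return q2
```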
Lemma 2. Assume $q^{d,k}$ is an $R^{d,k}$-optimum which satisfies

$$q^{d,k}_{0000} + q^{d,k}_{0010} + q^{d,k}_{0100} + q^{d,k}_{0110} = k. \qquad (22)$$

Let $\xi(q^{d,k}) \triangleq \{\varphi(q^{d,k}, \alpha, \alpha') : (\alpha, \alpha') \in A\} \cap R^{d,k+1}$, where $A \triangleq \{(0000, 0001)\} \cup (\{0010, 0100, 0110\} \times \{0011, 0101, 0111\})$. Then $R^{d,k+1} \ne \emptyset$ if and only if $\xi(q^{d,k}) \ne \emptyset$, and the $\xi(q^{d,k})$-optimum must be an $R^{d,k+1}$-optimum if $\xi(q^{d,k}) \ne \emptyset$.

Assume $q^{d,k}$ is an $R^{d,k}$-optimum which satisfies (22) and $\xi(q^{d,k}) \ne \emptyset$. Since between $L'(\varphi(q^{d,k}, \alpha, \alpha'))$ and $L'(\varphi(q^{d,k}, \beta, \beta'))$ with $\alpha = \beta$ or $\alpha' = \beta'$ we can find the smaller one with no operations of real numbers, we easily see that to determine the $\xi(q^{d,k})$-optimum it is enough to consider $L'(q) - L'(q^{d,k})$ for at most 4 tuples $q$ of $\xi(q^{d,k})$; thus with at most 7 operations of additions and comparisons of real numbers we can find an $R^{d,k+1}$-optimum from $q^{d,k}$. Hence, by iteration from $q^{d,0}$, with at most $7\delta_4^C$ operations of additions and comparisons of real numbers we can find an $R^{d,\delta_4^C}$-optimum, or determine $R^{d,\delta_4^C} = \emptyset$. With respect to (17) and the definition of the $R^{d,\delta_4^C}$-optimums, we can give the following Subalgorithm-Q(C1) to compute $L'[Q(C_1)]$.

Subalgorithm-Q(C1)
Input. The $N$-tuple $r$ satisfying (6). The 4-tuple $\delta = (\delta_1, \delta_2, \delta_3, \delta_4)$ defined by (5) and satisfying (8). The $D_\alpha$ and $n_\alpha$ defined by (3) for all $\alpha \in C_1$.
Output. $L'[Q(C_1)]$.
Step 1. Compute $\delta_j^C$, $j = 2, 3, 4$, and $d_l$ and $d_r$.
Step 2. If $d_l > d_r$, then output $+\infty$ (i.e., $Q(C_1) = \emptyset$) and END; otherwise, for each integer $d$ with $d_l \le d \le d_r$, generate the set $V^*(d)$ and determine the $2^4$-tuple $q^{d,0}$ with $D(q^{d,0}) = V^*(d)$, and then, using Lemma 2, find an $R^{d,\delta_4^C}$-optimum $q^{d,\delta_4^C}$, or determine $R^{d,\delta_4^C} = \emptyset$.
Step 3. Output $\min_{d_l \le d \le d_r} L'(q^{d,\delta_4^C})$ and END.

The number of $d$ with $d_l \le d \le d_r$ is at most $\delta_1 + 1$. For each such $d$ it needs at most $7\delta_4^C \le 7\delta_1$ operations of additions and comparisons of real numbers to find an $R^{d,\delta_4^C}$-optimum $q^{d,\delta_4^C}$ or to determine $R^{d,\delta_4^C} = \emptyset$. To compute $L'(q^{d,\delta_4^C})$ it needs $\delta_1 - 1$ additions of real numbers. Hence the total number of operations of additions and comparisons of real numbers for Subalgorithm-Q(C1) is not more than $8\delta_1(\delta_1 + 1)$.
4.2 Subalgorithm for the Evaluation of $L'[Q(D_\ell)]$
For any integer $k$ with $0 \le k \le 15$, let $b(k) = (b(k)_1, b(k)_2, b(k)_3, b(k)_4)$ denote the sequence in $B^4$ which satisfies $\sum_{j=1}^{4} b(k)_j 2^{j-1} = k$. Clearly, $Q(D_\ell)$ consists of the tuples $q$ in $Q^*_4$ which satisfy (4) and

$$\sum_{j=0}^{3} (q_{s_j} + q_{\bar s_j}) - 2(q_{s_i} + q_{\bar s_i}) = \rho(D_\ell)_{a_i}, \quad \text{for } i = 1, 2, 3,$$
$$\sum_{j=0}^{3} (q_{s_j} - q_{\bar s_j}) \ge \delta_\ell, \quad \text{and} \quad q_\alpha = 0 \text{ for } \alpha \notin D_\ell, \qquad (23)$$
where $s_0 = 0000$, $\bar s_0 = b(2^{\ell-1})$, and for $i = 1, 2, 3$

$$a_i \triangleq \begin{cases} i + \ell, & \text{if } i + \ell \le 4, \\ i + \ell - 4, & \text{otherwise,} \end{cases} \qquad s_i \triangleq b(2^{a_i - 1}), \qquad \bar s_i \triangleq b(2^{a_i - 1} + 2^{\ell - 1}). \qquad (24)$$

For $i = 1, 2, 3$, let $\delta_i^D \triangleq \lceil \sum_{j \in I_i} \rho(D_\ell)_{a_j}/2 \rceil = \lceil \sum_{j \in I_i} \delta_{a_j}/2 \rceil$, where $I_i \triangleq \{1, 2, 3\} \setminus \{i\}$ for $i = 1, 2, 3$. Let $\delta^D \triangleq \max\{0, \lceil(\delta_\ell + \sum_{j=1}^{3} \rho(D_\ell)_{a_j})/2\rceil\} = \max\{0, \lceil(\delta_\ell + \sum_{j=1}^{3} \delta_{a_j})/2\rceil\}$. For nonnegative integers $d, k$, let $\dot R^{d,k}$ denote the set of the $2^h$-tuples $q$ in $Q^*_4$ which satisfy (4) and

$$q_{s_0} + q_{\bar s_0} = d, \quad q_{\bar s_0} + q_{\bar s_1} + q_{\bar s_2} + q_{\bar s_3} \ge k,$$
$$q_{s_i} + q_{\bar s_i} = \delta_i^D - d, \quad \text{for } i = 1, 2, 3, \qquad q_\alpha = 0 \text{ for } \alpha \notin D_\ell. \qquad (25)$$

Clearly $\dot R^{d,k} \subseteq \dot R^{d,k-1} \subseteq \cdots \subseteq \dot R^{d,0} \subseteq Q^*_4$. Let $\dot d_r \triangleq \min\{n_{s_0} + n_{\bar s_0}, \delta_1^D, \delta_2^D, \delta_3^D\}$ and $\dot d_l \triangleq \max\{0, \max_{1 \le i \le 3}(\delta_i^D - n_{s_i} - n_{\bar s_i})\}$. Then we can easily prove that

$$Q(D_\ell) = \bigcup_{\dot d_l \le d \le \dot d_r} \dot R^{d, \delta^D - d}. \qquad (26)$$
For any $d$ with $\dot d_l \le d \le \dot d_r$, let $\dot q^{d,0}$ denote the $2^4$-tuple which satisfies

$$D(\dot q^{d,0}) = (D_{s_0} \cup D_{\bar s_0})^{(d)} \cup \bigcup_{1 \le i \le 3} (D_{s_i} \cup D_{\bar s_i})^{(\delta_i^D - d)}. \qquad (27)$$

Then $\dot q^{d,0}$ must be an $\dot R^{d,0}$-optimum. The following lemma suggests an iterative method for finding an $\dot R^{d,\delta^D - d}$-optimum, or determining $\dot R^{d,\delta^D - d} = \emptyset$, from $\dot q^{d,0}$.

Lemma 3. Assume $\dot q^{d,k}$ is an $\dot R^{d,k}$-optimum which satisfies

$$\dot q^{d,k}_{\bar s_0} + \dot q^{d,k}_{\bar s_1} + \dot q^{d,k}_{\bar s_2} + \dot q^{d,k}_{\bar s_3} = k. \qquad (28)$$

Let $\dot\xi(\dot q^{d,k}) \triangleq \{\varphi(\dot q^{d,k}, \bar s_j, s_j) : 0 \le j \le 3\} \cap \dot R^{d,k+1}$. Then $\dot R^{d,k+1} \ne \emptyset$ if and only if $\dot\xi(\dot q^{d,k}) \ne \emptyset$, and the $\dot\xi(\dot q^{d,k})$-optimum is an $\dot R^{d,k+1}$-optimum if $\dot\xi(\dot q^{d,k}) \ne \emptyset$.

Similarly to Subalgorithm-Q(C1), we can devise a Subalgorithm-Q(D) to compute $L'[Q(D_\ell)]$. The details of Subalgorithm-Q(D) are given in [8]. The number of operations of additions and comparisons of real numbers for Subalgorithm-Q(D) is not more than $25\delta_1(\delta_1 + 1)/2$.

4.3 Subalgorithm for the Evaluation of $L'[Q(E_\ell)]$
Clearly, $Q(E_\ell)$ consists of the $2^4$-tuples $q$ in $Q^*_4$ which satisfy (4) and

$$q_{s_0} + q_{\bar s_0} + \sum_{j \in I_i} (q_{s_j} - q_{s_j^*}) - q_{s_i} + q_{s_i^*} = \rho(E_\ell)_{a_i}, \quad \text{for } i = 1, 2, 3,$$
$$q_{s_0} - q_{\bar s_0} + \sum_{j=1}^{3} (q_{s_j} + q_{s_j^*}) = \rho(E_\ell)_\ell, \quad \text{and} \quad q_\alpha = 0 \text{ for } \alpha \notin E_\ell, \qquad (29)$$
where $s_i^* \triangleq b(15 - 2^{a_i-1} - 2^{\ell-1})$, $i = 1, 2, 3$. For $i = 1, 2, 3$, let $\delta_i^E \triangleq \lceil(\rho(E_\ell)_\ell + \rho(E_\ell)_{a_i})/2\rceil = \lceil(\delta_\ell + \delta_{a_i})/2\rceil$. Let

$$\delta^E \triangleq \delta_1^E + \delta_2^E + \delta_3^E - \delta_\ell = \begin{cases} \frac{1}{2}\sum_{j=1}^{4} \rho(E_\ell)_j = \frac{1}{2}\delta^\Sigma, & \text{if } \delta^\Sigma \text{ is even}, \\ \frac{1}{2}(\delta^\Sigma + 1), & \text{otherwise}, \end{cases} \qquad (30)$$

where $\delta^\Sigma \triangleq \sum_{j=1}^{4} \delta_j$. Then the $2^4$-tuples $q$ in $Q(E_\ell)$ can be given by

$$q_{s_i^*} = \delta_i^E - x_i, \quad q_{s_i} = \sum_{j \in I_i} x_j - w, \quad \text{for } i = 1, 2, 3,$$
$$q_{s_0} = \delta^E - w, \quad q_{\bar s_0} = 2w - x_1 - x_2 - x_3, \quad \text{and} \quad q_\alpha = 0 \text{ for } \alpha \notin E_\ell, \qquad (31)$$

where $w, x_1, x_2, x_3$ satisfy

$$\max\{0, \delta^E - n_{s_0}\} \le w \le \delta^E, \quad 0 \le 2w - x_1 - x_2 - x_3 \le n_{\bar s_0}, \qquad (32)$$
$$\max\{0, \delta_i^E - n_{s_i^*}\} \le x_i \le \delta_i^E, \quad 0 \le \sum_{j \in I_i} x_j - w \le n_{s_i}, \quad \text{for } i = 1, 2, 3. \qquad (33)$$
For $\max\{0, \delta^E - n_{s_0}\} \le w \le \delta^E$, let $\Omega(w)$ denote the set of pairs $\pi \triangleq (x_1, x_2, x_3)$ which satisfy (32) and (33). For $\pi \in \Omega(w)$, let $q^w(\pi)$ be the $2^4$-tuple of $Q(E_\ell)$ defined by (31). We write $L_w(\pi) \triangleq L'(q^w(\pi))$. For any subset $\Omega'$ of $\Omega(w)$, let $L_w[\Omega'] \triangleq \min_{\pi \in \Omega'} L_w(\pi)$, and write $L_w[\emptyset] \triangleq +\infty$. If a pair $\pi \in \Omega'$ satisfies $L_w(\pi) = L_w[\Omega']$, we call it an $\Omega'$-pair. We will show a method to evaluate $L'[Q(E_\ell)]$ by finding an $\Omega(w)$-pair for each $w$ with $\Omega(w) \ne \emptyset$.

For integers $w, x$, let $\Omega(w, x) \triangleq \{\pi \in \Omega(w) : \pi = (\cdot, \cdot, x)\}$. First, we consider determining $\Omega(w, x)$-pairs. Let $\Upsilon$ denote the set of triples $\pi' = (x_1', x_2', x_3')$ with $x_i' \in \{1, 0, -1\}$. For $\pi \in \Omega(w)$ and $\pi' \in \Upsilon$, we say that we can grow $\pi$ in the $\pi'$-direction to $\pi + \pi'$ if $L_w(\pi) > L_w(\pi + \pi')$. For any nonempty set $\Omega(w, x)$, we can find an $\Omega(w, x)$-pair by the following growth procedure.

Growth Procedure of an $\Omega(w, x)$-pair $\tau(w, x)$
Preparation. Select an arbitrary pair $\pi$ of $\Omega(w, x)$ as the seed.
Step 1. Grow $\pi$ in the $(1, 0, 0)$-direction and the $(-1, 0, 0)$-direction until we cannot do so anymore, and then go to Step 2.
Step 2. If we can grow $\pi$ in the $\pi'$-direction for some $\pi'$ of $\{(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)\}$, grow it in the $\pi'$-direction step by step until we cannot do so anymore and then go to Step 2; otherwise go to Step 3.
Step 3. Grow $\pi$ in the $(1, -1, 0)$-direction and the $(-1, 1, 0)$-direction until we cannot do so anymore, and then output $\tau(w, x) = \pi$ and END.

Lemma 4. The growth procedure outputs an $\Omega(w, x)$-pair $\tau(w, x)$ with at most $16w$ operations of additions and comparisons of real numbers.

Sometimes it is not easy to select a seed for this growth procedure; a concrete method for producing a seed is shown in [8]. However, it is not necessary to find all the $\Omega(w, x)$-pairs by the growth procedure. Indeed, the following lemma suggests a simple method for finding an $\Omega(w)$-pair from a known $\Omega(w, x)$-pair.
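Before that, here is a minimal sketch of the single-direction growth step the procedure repeats; `feasible` and `L_w` stand in for the constraints (32)-(33) and the objective, respectively, and are assumed callables rather than anything defined in the paper:

```python
def grow(pi, direction, L_w, feasible):
    """Repeatedly move pi one step in `direction` while the move stays
    feasible and strictly decreases the objective L_w; return the last
    improving point."""
    nxt = tuple(a + b for a, b in zip(pi, direction))
    while feasible(nxt) and L_w(nxt) < L_w(pi):
        pi = nxt
        nxt = tuple(a + b for a, b in zip(pi, direction))
    return pi
```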
Lemma 5. Let $\tau(w, x)$ be an $\Omega(w, x)$-pair and $\Upsilon^* \triangleq \{(0, 0, 0), (-1, 0, 0), (0, -1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0)\}$. Then
1. For $\varepsilon \in \{1, -1\}$, if $\Omega(w, x + \varepsilon) \ne \emptyset$, then there exists a triple $\pi'$ in $\Upsilon^*$ such that $\tau(w, x) + (0, 0, \varepsilon) + \varepsilon \cdot \pi'$ is an $\Omega(w, x + \varepsilon)$-pair.
2. $\tau(w, x)$ is an $\Omega(w)$-pair if and only if $L_w(\tau(w, x)) \le \min\{L_w[\Omega(w, x - 1)], L_w[\Omega(w, x + 1)]\}$.

For $\varepsilon \in \{1, -1\}$, according to Lemma 5, from an $\Omega(w, x)$-pair $\tau(w, x)$ we can get an $\Omega(w, x + \varepsilon)$-pair $\tau(w, x + \varepsilon)$ with at most 20 operations of additions and comparisons of real numbers, and determine whether $L_w(\tau(w, x)) \le L_w(\tau(w, x + \varepsilon))$ holds or not with 5 more operations of additions and comparisons of real numbers. Then, with respect to Lemmas 4 and 5 and $\Omega(w, x) = \emptyset$ for $x > w$, we can find an $\Omega(w)$-pair $\tau(w)$ with at most $16w + 25w/2$ operations of additions and comparisons of real numbers. Furthermore, since the number of additions and comparisons of real numbers for computing $L_w(\tau(w))$ is $\delta^E + \delta_1^E + \delta_2^E + \delta_3^E - 2w - 1$ and $\Omega(w) = \emptyset$ for $w > \delta^E$, we can devise a Subalgorithm-Q(E) to evaluate $L'[Q(E_\ell)]$ with at most $\sum_{w=0}^{\delta^E} (53w/2 + \delta^E + \delta_1^E + \delta_2^E + \delta_3^E) \le 63\delta_1(2\delta_1 + 1)/2$ operations of additions and comparisons of real numbers. The details of Subalgorithm-Q(E) are given in [8].
References
1. D. J. Taipale and M. B. Pursley, "An Improvement to Generalized Minimum-Distance Decoding," IEEE Trans. Inform. Theory, vol. 37, pp. 167–172, Jan. 1991.
2. T. Kaneko, T. Nishijima, H. Inazumi and S. Hirasawa, "An Efficient Maximum Likelihood Decoding Algorithm for Linear Block Codes with Algebraic Decoder," IEEE Trans. Inform. Theory, vol. 40, pp. 320–327, Mar. 1994.
3. T. Kasami, T. Takata, T. Koumoto, T. Fujiwara, H. Yamamoto and S. Lin, "The Least Stringent Sufficient Condition on Optimality of Suboptimal Decoded Codewords," Technical Report of IEICE, Japan, IT94-82, Jan. 1995.
4. T. Kasami, T. Koumoto, T. Takata and S. Lin, "The Effectiveness of the Least Stringent Sufficient Condition on Optimality of Decoded Codewords," Proc. 3rd Int. Symp. Commun. Theory & Appl., Ambleside, UK, pp. 324–333, July 1995.
5. T. Koumoto, T. Takata, T. Kasami and S. Lin, "A Low-Weight Trellis Based Iterative Soft-Decision Decoding Algorithm for Binary Linear Block Codes," IEEE Trans. Inform. Theory, vol. 45, pp. 731–741, Mar. 1999.
6. T. Kasami, Y. Tang, T. Koumoto and T. Fujiwara, "Sufficient Conditions for Ruling-Out Useless Iterative Steps in a Class of Iterative Decoding Algorithms," to appear in IEICE Trans. Fundamentals, vol. E82-A, Oct. 1999.
7. T. Koumoto and T. Kasami, "An Iterative Decoding Algorithm Based on Information of Decoding Failure," Proc. of the 20th Symp. Inform. Theory & Appl., Matsuyama, Japan, pp. 325–328, Dec. 1997.
8. Y. Tang, T. Kasami and T. Fujiwara, "An Optimality Testing Algorithm for a Decoded Codeword of Binary Block Codes," Technical Report of NAIST, Japan, NAIST-IS-TR98012, Oct. 1998.
9. T. Kasami, "On Integer Programming Problems Related to Soft-Decision Iterative Decoding Algorithms," to appear in Proc. of the 13th Int. Symp. AAECC, Honolulu, HI, USA, Nov. 1999.
Recursive MDS-Codes and Pseudogeometries

Elena Couselo¹, Santos Gonzalez¹, Victor Markov², and Alexandr Nechaev²

¹ University of Oviedo, Spain
² Center of New Informational Technologies of Moscow State University, Russia
nechaev@cnit.chem.msu.su
Abstract. In [2,4] the notion of a recursive code was introduced and some constructions of recursive MDS codes were proposed. The main result was that for any $q \notin \{2, 6\}$ (except possibly $q \in \{14, 18, 26, 42\}$) there exists a recursive MDS code in an alphabet of $q$ elements of length 4 and combinatorial dimension 2 (i.e., a recursive $[4, 2, 3]_q$-code). One of the constructions we used there was that of pseudogeometries; it enabled us to show that for any $q > 126$ (except possibly $q = 164$) there exists a recursive $[4, 2, 3]_q$-code that contains all the "constants". One part of the present note is a further application of the pseudogeometry construction, which shows that for any $q > 164$ (resp. $q > 26644$) there exists a recursive $[7, 2, 6]_q$-code (resp. $[13, 2, 12]_q$-code) containing "constants". Another result presented here is a negative one: we show that there is no nontrivial pseudogeometry consisting of 14, 18, 26 or 42 points with no lines of order 2, 3, 4 or 6, so the pseudogeometry construction cannot be applied to settle the question mentioned above. In both cases the use of a computer is essential.
Introduction

A code $K \subseteq \Omega^n$ in an alphabet $\Omega$ of $q$ elements is called $k$-recursive, $1 \le k < n$, if there exists a function $f : \Omega^k \to \Omega$ such that $K$ consists of all the rows $u(0, n-1) = (u(0), \dots, u(n-1)) \in \Omega^n$ with the property

$$u(i + k) = f(u(i, i + k - 1)), \quad i \in \overline{0, n - k - 1}.$$
In other words, $K$ is the set of all output $n$-sequences of a feedback shift register with feedback function $f$. We denote $K = K(n, f)$ and investigate the existence problem for MDS codes of this type, i.e., recursive $[n, k, n-k+1]_q$-codes. In connection with this, we consider the following three parameters:
$n(k, q)$ – the maximum length of an MDS code $K$ of (combinatorial) dimension $k$ ($|K| = q^k$) in an alphabet $\Omega$ of cardinality $q$;
$n_r(k, q)$ – the maximum length of a $k$-recursive MDS code of the same type;
$n_{ir}(k, q)$ – the maximum length of a $k$-recursive MDS code of the same type which contains all the "constants", i.e., all the words $(a, \dots, a)$, $a \in \Omega$.
It is clear that $n_{ir}(k, q) \le n_r(k, q) \le n(k, q)$.
Here we study only the case $k = 2$. In this case our problem admits the following simplification. Let us define the $i$th recursive derivation $f^{(i)}(x, y)$ of the recursive law $f(x, y)$ recursively: $f^{(0)}(x, y) = f(x, y)$, $f^{(1)}(x, y) = f(y, f(x, y))$, $f^{(i)}(x, y) = f(f^{(i-2)}(x, y), f^{(i-1)}(x, y))$. A necessary condition for $K(n, f)$ to be MDS is that $(\Omega, f(x, y))$ is a quasigroup. We say that a quasigroup $\Omega$ is $t$-stable if $(\Omega, f^{(i)})$ is also a quasigroup for $i \in \overline{1, t}$. It was proved in [2, Theorem 4] that $n_r(2, q) \ge m$ (resp., $n_{ir}(2, q) \ge m$) if and only if there exists an $(m-3)$-stable (idempotent) quasigroup $f(x, y)$ of order $q$. The aim of this note is to show how the notion of pseudogeometries can be used to provide some estimates of $n_{ir}(2, q)$. We also show some limitations of this technique. The term "pseudogeometry" was taken from [1], where pseudogeometries were used to construct Latin squares which are orthogonal to their transposes. Since the construction of the quasigroups under consideration may be reduced to the construction of orthogonal Latin squares with special properties, it was natural to use pseudogeometries for this purpose also. Now we recall that a pseudogeometry is a pair $P = (P, L)$, where $L$ is a set of nonempty subsets of the nonempty set $P$, such that for any different $x, y \in P$ there is a unique subset $L(x, y) \in L$ with $\{x, y\} \subseteq L(x, y)$. The elements of the sets $P$ and $L$ are called points and lines, respectively. We consider only finite pseudogeometries. The cardinality of a line will often be called its length. Note that if one adds to or removes from $L$ any set of one-point sets, the pair $(P, L)$ remains a pseudogeometry. We say that a pseudogeometry $P = (P, L)$ is nontrivial if $P \notin L$. The standard examples of pseudogeometries are affine and projective planes over finite fields.
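A brief sketch of these definitions, assuming the operation $f$ is given as a Python function on $\{0, \dots, q-1\}$ (an illustrative reading, not code from the paper):

```python
from itertools import product

def recursive_code(n, f, q):
    """K(n, f) for k = 2: all n-sequences with u(i+2) = f(u(i), u(i+1))."""
    code = []
    for u0, u1 in product(range(q), repeat=2):
        word = [u0, u1]
        while len(word) < n:
            word.append(f(word[-2], word[-1]))
        code.append(tuple(word))
    return code

def derivation(f):
    """First recursive derivation: f^(1)(x, y) = f(y, f(x, y))."""
    return lambda x, y: f(y, f(x, y))

def is_quasigroup(f, q):
    """f defines a quasigroup on 0..q-1 iff every row and every column of
    its Cayley table is a permutation."""
    return (all(len({f(x, y) for y in range(q)}) == q for x in range(q)) and
            all(len({f(x, y) for x in range(q)}) == q for y in range(q)))
```

For example, `is_quasigroup(lambda x, y: (x + y) % 5, 5)` returns `True`.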
1 Idempotent Quasigroups of Large Orders
For the convenience of the reader we recall the definition of the accompanying pseudogeometry of a mutually orthogonal set of Latin squares [2]. Let $q$ be a positive integer and suppose that there exists a mutually orthogonal set of $s$ Latin squares $f_1, \dots, f_s$ of order $q$, $f_i : (\overline{0, q-1})^2 \to \overline{0, q-1}$. Consider the set $P$ of all pairs $(x, y)$, $x \in \overline{0, q-1}$, $y \in \overline{-1, s}$. Define lines of two types:
(1) horizontal lines: $H_y = \{(x, y) : x \in \overline{0, q-1}\}$, for any fixed $y \in \overline{-1, s}$;
(2) skew lines: $S_{ij} = \{(i, -1), (j, 0)\} \cup \{(f_y(i, j), y) : y \in \overline{1, s}\}$, for any fixed $i, j \in \overline{0, q-1}$.
Let $H = \{H_y : y \in \overline{-1, s}\}$, $S = \{S_{ij} : i, j \in \overline{0, q-1}\}$, $L = H \cup S$. It is easy to see that $(P, L)$ is a pseudogeometry with lines of length $q$ and $s + 2$. Later we use this construction mostly in the case of primary $q$, when one can take $s = q - 1$. Note that in this case one can obtain the accompanying pseudogeometry by removing one point from the projective plane over a field of order $q$.
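A compact sketch of this construction; here `squares` is assumed to be a list of the $s$ Latin squares given as $q \times q$ nested lists (an assumed input format):

```python
def accompanying_pseudogeometry(squares, q):
    """Build the point set and the lines H and S from a mutually orthogonal
    set of Latin squares; horizontal lines have length q, skew lines s + 2."""
    s = len(squares)
    points = {(x, y) for x in range(q) for y in range(-1, s + 1)}
    horizontals = [frozenset((x, y) for x in range(q))
                   for y in range(-1, s + 1)]
    skews = [frozenset({(i, -1), (j, 0)} |
                       {(squares[y - 1][i][j], y) for y in range(1, s + 1)})
             for i in range(q) for j in range(q)]
    return points, horizontals + skews
```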
Let $(P, L)$ be an arbitrary pseudogeometry and $P'$ an arbitrary nonempty subset of the set $P$. Let $L' = \{L \cap P' : L \in L, |L \cap P'| > 1\}$. It is easy to see that $(P', L')$ is also a pseudogeometry. We shall call $(P', L')$ the reduced pseudogeometry of the pseudogeometry $(P, L)$. The following result is very close to one of the main constructions of [1]:

Theorem 1. Let $(P, L)$ be a pseudogeometry. Then $n_{ir}(2, |P|) \ge \min\{n_{ir}(2, |L|) : L \in L\}$.

✷ The proof follows that of [2, Theorem 10]. Let $t = \min\{n_{ir}(2, |L|) : L \in L\} - 3$ and define an operation $f_L : L^2 \to L$ for every $L \in L$ in such a way that $L$ becomes an idempotent $t$-stable quasigroup. Then for any $x, y \in P$ such that $x \ne y$ let $f(x, y) = f_L(x, y)$ where $L = L(x, y)$, and let $f(x, x) = x$ for any $x \in P$. It is obvious that the operation is well-defined and $(P, f)$ is an idempotent quasigroup. So if $y = f(x, y)$ then $x = y$, for $x, y \in P$. An easy induction argument shows that for any $i \in \overline{1, t}$, any $L \in L$ and any $x, y \in L$, $f_L^{(i-1)}(x, y) = f_L^{(i)}(x, y)$ implies $x = y$. So if $x \ne y$, then $f^{(i)}(x, y) = f_L^{(i)}(x, y)$ where $L = L(x, y)$, and $f^{(i)}(x, x) = x$ for any $x \in P$; hence $(P, f^{(i)})$ is again a quasigroup. The application of [2, Theorem 4] finishes the proof. ✷

Now let $I(n)$ be the set of integers $q$ such that $n_{ir}(2, q) \ge n$. It is known that $2, 3, 4, 6 \notin I(4)$ and that $q \in I(4)$ for any $q \ge 127$ (except possibly $q = 164$) [2, Corollary 8]. The authors do not know if $10 \in I(4)$, but any prime number $p \ge n$ and any primary number $q > n$ belong to $I(n)$, as well as any product of such numbers (see [2, Propositions 10, 11]). The following theorem is a slight generalization of [2, Theorem 11], but it gives many extra numbers in the search described below.

Theorem 2. Let $n$ be an integer, $n \ge 4$, and let $s, t, l, m, d_1, \dots, d_l$ be nonnegative integers such that
(a) there exist $t + l + m - 2$ mutually orthogonal Latin squares of order $s$;
(b) $s, t, t+1, \dots, t+l, d_1, \dots, d_l \in I(n)$;
(c) if $m > 0$ then also $t + l + 1, t + l + m \in I(n)$;
(d) $1 \le d_i < s$ for $i = 1, \dots, l$.
Then $q = st + d_1 + \cdots + d_l + m \in I(n)$.

✷ Consider the accompanying pseudogeometry $(P, L)$ of the given set of mutually orthogonal Latin squares. It contains $t + l + m$ horizontal lines, which we consider as numbered from the first to the $(t + l + m)$th. Take one skew line and call it the distinguished vertical line. Delete $s - d_l$ points from the $(t + l)$th horizontal line, $s - d_{l-1}$ points from the $(t + l - 1)$th one, and so on, in such a way that no point of the distinguished vertical line is removed. Then delete all the points of the remaining $m$ horizontal lines except those of the distinguished vertical line.
Let $P'$ be the set of the remaining points and $(P', L')$ the reduced pseudogeometry. The situation is illustrated by Fig. 1 (here $s = 11$, $t = 7$, $l = 1$, $d_1 = 5$, $m = 3$; empty circles represent deleted points, full circles represent the points of $P'$, and the left column of points represents the distinguished vertical line).

• • • • • • • • • • •
• • • • • • • • • • •
• • • • • • • • • • •
• • • • • • • • • • •
• • • • • • • • • • •
• • • • • • • • • • •
• • • • • • • • • • •
• • • • • ◦ ◦ ◦ ◦ ◦ ◦
• ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
• ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦
• ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦

Fig. 1. Reduced pseudogeometry.

If $L$ is a horizontal line in $L$ such that $|L \cap P'| > 1$, then $|L \cap P'|$ is either $s$ or one of the $d_i$, $i \in \overline{1, l}$, depending on $L$. If $L$ is a skew line in $L$ and it is not the distinguished vertical line, then $t \le |L \cap P'| \le t + l + 1$, since $L$ contains not more than one remaining point of each of the $l$ "shortened" horizontal lines and not more than one point of the distinguished vertical line lying on the last $m$ horizontal lines. Finally, if $L$ is the distinguished vertical line then $|L \cap P'| = |L| = t + l + m$. Now the result follows from Theorem 1. ✷

As in [2] we obtain the following

Corollary 3. If there exists a number $k$ such that $\overline{2^k - 1, 2^{2k+1}} \subset I(n)$ then $q \in I(n)$ for any $q \ge 2^k - 1$.

✷ The proof is exactly the same as that of [2, Corollary 8] and is therefore omitted. ✷

Now we outline the idea of the "experimental" part of our work. First we fixed some reasonable value $q_{max}$ and filled the array of current estimates of $n_{ir}(2, q)$ for $q \le q_{max}$ with initial values based on the known estimates for prime and primary $q$ and on the quasigroup product construction. Then the program repeatedly checked the numbers $q \in \overline{10, q_{max}}$ for the existence of numbers satisfying the conditions of Theorem 2 (with prime and primary $s$ and $l \le 3$), taking into account the estimates for the numbers found in the previous repetitions. The use of this algorithm was limited because we had to keep all the found numbers in the computer's memory. When we obtained rather long series of consecutive numbers in $I(n)$ for some $n > 4$, we passed the result to the second step of the algorithm, which used Theorem 2 only with prime and primary $s$ and with $l = 0$ (we did not need to keep all the estimates then). We summarize the computation results in the following
Theorem 4. $n_{ir}(2, q) \ge 7$ for all $q > 164$; $n_{ir}(2, q) \ge 13$ for all $q > 26644$.

One can conjecture that for any integer $n$ there exists an integer $q_0 = q_0(n)$ such that $n_{ir}(2, q) \ge n$ for all $q \ge q_0$. For $n(2, q)$ the corresponding fact is well known (see e.g. [5]): $n(2, q) \ge q^{10/143}$. Note, however, that this gives $n(2, q) \ge 7$ only for $q > 7^{14.3} \approx 1.2 \cdot 10^{12}$ and $n(2, q) \ge 13$ only for $q > 13^{14.3} \approx 8.5 \cdot 10^{15}$.
2 Nonexistence Theorem
In this section we consider the question of the existence of a pseudogeometry whose line lengths belong to a given set of integers. Recall that it is still an open question whether there exists a 1-stable quasigroup of order $q$ for $q \in \{14, 18, 26, 42\}$. On the other hand, there are no 1-stable idempotent quasigroups of order $q$ for $q \in \{2, 3, 4, 6\}$. So it would be desirable, as a first step towards a positive solution of the question by the pseudogeometry technique, to construct pseudogeometries with 14, 18, 26 and 42 points without lines of length 2, 3, 4, 6. Instead of that, we have proved a negative result. First we give a direct proof that there is no pseudogeometry with 14, 18 or 26 points without lines of length 2, 3, 4, 6. For the pseudogeometries with 42 points and the same restrictions on lines, we deduce some properties that allow us to carry out an exhaustive computer search, which shows that they do not exist either.

Let $\pi = (P, L)$ be a pseudogeometry. For any point $x \in P$ denote by $L_x$ the set of lines containing $x$, and for any $x \in P$ and $L \in L$ such that $x \notin L$ let $M(x, L) = \{L(x, y) : y \in L\}$ and $\overline M(x, L) = \bigcup\{L' : L' \in M(x, L)\}$. Let $m(\pi)$ and $M(\pi)$ be the minimal and the maximal length of a line in $\pi$, respectively. Denote by $L_i$ the set of lines of length $i$ and let $k_i = |L_i|$. We begin with some rather elementary observations.

Lemma 5. If $\pi = (P, L)$ is a pseudogeometry then

$$|P|(|P| - 1) = \sum_i i(i - 1)k_i. \qquad (2.1)$$

✷ It suffices to note that both sides of (2.1) are equal to the number of ordered pairs $(x, y) \in P \times P$ such that $x \ne y$, taking into account that every such pair belongs to $L \times L$ for exactly one line $L \in L$. ✷

Lemma 6. If $\pi = (P, L)$ is a pseudogeometry with an even (odd) number of points, then every point in $P$ belongs to an odd (even) number of lines of even length.

✷ Let $x \in P$. Then $P \setminus \{x\}$ is a disjoint union of the sets $L \setminus \{x\}$, where $L \in L_x$, so

$$|P| - 1 = \sum_{L \in L_x} (|L| - 1) \equiv k \pmod 2,$$

where $k$ is the number of lines of even length in $L_x$. ✷
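Equation (2.1) is easy to verify mechanically for a given pseudogeometry; a minimal sketch, using the Fano plane as a test case:

```python
from collections import Counter

FANO = [frozenset(l) for l in
        ({1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
         {2, 5, 7}, {3, 4, 7}, {3, 5, 6})]

def check_2_1(points, lines):
    """Lemma 5: |P|(|P| - 1) equals the sum over line lengths i of i(i-1) k_i."""
    k = Counter(len(L) for L in lines)
    return len(points) * (len(points) - 1) == \
        sum(i * (i - 1) * c for i, c in k.items())

assert check_2_1(set(range(1, 8)), FANO)   # the projective plane P_2(F_2)
```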
Lemma 7. For any nontrivial pseudogeometry $\pi = (P, L)$,

$$(m(\pi) - 1)M(\pi) + 1 \le |P|. \qquad (2.2)$$

✷ Let $L$ be a line of length $M(\pi)$. Take a point $x \in P \setminus L$ and note that

$$|\overline M(x, L)| = \Big|\{x\} \cup \bigcup_{y \in L} (L(x, y) \setminus \{x\})\Big| = 1 + \sum_{y \in L} (|L(x, y)| - 1),$$

since the union in the previous line is a disjoint one. So $|P| \ge |\overline M(x, L)| \ge 1 + (m(\pi) - 1)M(\pi)$. ✷

Note that for a projective plane equality holds in (2.2). We will say for brevity that a pseudogeometry is admissible if it is nontrivial and does not contain a line whose length is 2, 3, 4 or 6. Now we arrive at the following

Proposition 8. If $\pi = (P, L)$ is an admissible pseudogeometry, then either $|P| > 28$ or $|P| \in \{21, 25\}$.

✷ First suppose that $m(\pi) \ge 7$. Then Lemma 7 gives $|P| \ge 7 \cdot 6 + 1 = 43$. So we can suppose that $m(\pi) = 5$. Again, if $M(\pi) \ge 7$ then $|P| \ge 7 \cdot 4 + 1 = 29$, and only the case remains when all the lines have length 5. If any two lines in $\pi$ have a common point then there is equality in (2.2) and so $|P| = 21$. Suppose there are two "parallel" lines, say $L_1$ and $L_2$. Consider any line $L$ that intersects $L_1$ at some point $x$. Suppose $L \cap L_2 = \emptyset$. Then we have a disjoint union

$$\{x\} \cup (L_1 \setminus \{x\}) \cup (L \setminus \{x\}) \cup (\overline M(x, L_2) \setminus \{x\}) \subseteq P,$$

so $|P| \ge 1 + 4 + 4 + 4 \cdot 5 = 29$. So if $|P| \le 28$ then for any point $x \in L_1$ and any point $y \notin L_1$ the line $L(x, y)$ intersects $L_2$. This means that $P = (L_1 \setminus \{x\}) \cup \overline M(x, L_2)$ and $|P| = 4 + 21 = 25$. ✷

Proposition 8 means in particular that there is no hope of constructing an idempotent 1-stable quasigroup of order 14, 18 or 26 using pseudogeometries. From this point till the end of the section we suppose that $\pi = (P, L)$ is an admissible pseudogeometry with 42 points. We already know from Lemma 7 that $m(\pi) = 5$. The following lemma gives a rough estimate of $M(\pi)$.

Lemma 9. $M(\pi) < 10$.

✷ By virtue of Lemma 7, $M(\pi) < 11$, since $4 \cdot 11 + 1 > 42$. Let $L$ be a line in $\pi$ of length 10 and $x \in P \setminus L$. If some line in $M(x, L)$ contains more than 5 (and hence more than 6) points, then as in the proof of Lemma 7 one has $|P| \ge 1 + 6 + 9 \cdot 4 = 43 > 42$, a contradiction. So all the lines $L_0, \dots, L_9$ of $M(x, L)$ have length 5, so $|\overline M(x, L)| = 1 + 10 \cdot 4 = 41$. This means that there exists a point $y \in P \setminus \overline M(x, L)$. So there is a disjoint union $L(x, y) \cup (\overline M(x, L) \setminus \{x\}) \subseteq P$, so $|P| \ge |L(x, y)| + 10 \cdot 4 \ge 5 + 40 > 42$, a contradiction again. ✷

Now the only even length of a line in $\pi$ is 8, so by Lemma 6 every point in $P$ belongs to 1, 3 or 5 lines of length 8 (7 lines of length 8 with one common point would contain 50 points).

Lemma 10. Every two lines of length 8 in $\pi$ intersect each other.
✷ Suppose $L_1, L_2 \in L$, $|L_1| = |L_2| = 8$ and $L_1 \cap L_2 = \emptyset$. Consider a point $x \in P \setminus (L_1 \cup L_2)$. Take a line $L \in L_x$ of length 8. Suppose first that $L \cap L_1 \ne \emptyset$ and $L \cap L_2 = \emptyset$. As in the proof of Lemma 7, there is a disjoint union

$$\{x\} \cup (L_1 \setminus L) \cup (L \setminus \{x\}) \cup (\overline M(x, L_2) \setminus \{x\}) \subseteq P,$$

so $|P| \ge 1 + 2 \cdot 7 + 4 \cdot 8 = 47 > 42$, a contradiction. If $L \cap L_1 \ne \emptyset$ and $L \cap L_2 \ne \emptyset$ then $L \in M(x, L_2)$, so again we have a disjoint union

$$\{x\} \cup (L_1 \setminus L) \cup (\overline M(x, L_2) \setminus \{x\}) \subseteq P,$$

but now one of the lines in $M(x, L_2)$ has length 8, so $|\overline M(x, L_2) \setminus \{x\}| \ge 7 \cdot 4 + 7 = 35$ and $|P| \ge 1 + 7 + 35 = 43 > 42$. So the only remaining possibility is that the line $L$ does not intersect the lines $L_1$ and $L_2$. Let $L_3 = L$ and consider any point $y$ in $P \setminus (L_1 \cup L_2 \cup L_3)$. Again there is a line of length 8 that contains $y$, and by the preceding argument it does not intersect the lines $L_1$, $L_2$ and $L_3$. Repeating this argument, we obtain that $|P|$ is a multiple of 8, which is wrong. ✷

Lemma 11. There is no point in $\pi$ that belongs to 5 lines of length 8.

✷ Suppose on the contrary that $x \in P$ is such a point and $L_1, \dots, L_5$ are 5 lines of length 8 such that $x \in L_i$, $i \in \overline{1, 5}$. Let $M = \bigcup_{i=1}^{5} L_i$. Since $|M| = 1 + 5 \cdot 7 = 36$ and the set $P \setminus M$ is covered by disjoint sets $L(x, y) \setminus \{x\}$, where $y$ spans $P \setminus M$, of cardinality not less than 4 each, the only possibility is that $P \setminus M = L \setminus \{x\}$ for some line $L$ of length 7. Take any point $y \in L \setminus \{x\}$ and consider a line $K$ of length 8 in $L_y$. Then $K \notin \{L, L_1, \dots, L_5\}$, $|K \cap M| \le 5$ and $|K \cap L| = 1$, so $K$ contains 2 "extra" points. A contradiction. ✷

Now let us introduce some more notation. Let $N_i$ be the set of points in $P$ that belong to exactly $i$ lines of length 8.

Lemma 12. $|N_3| = 7$, $|N_1| = 35$, $k_8 = 7$, $k_9 = 0$. Moreover, the reduced pseudogeometry $\pi' = (N_3, L')$ is the projective plane $P_2(F_2)$.

✷ Let $n_i = |N_i|$, $i = 1, 3$. Then by Lemma 6 and Lemma 11 we have

$$n_1 + n_3 = 42. \qquad (2.3)$$
Counting all the points of the lines in $L_8$ as different ones, and taking into account that each point of $N_3$ is counted 3 times, we obtain

$$n_1 + 3n_3 = 8k_8. \qquad (2.4)$$

Now let $\Delta \subseteq L_8 \times L_8$ be the diagonal, $\Delta = \{(L, L) : L \in L_8\}$, and consider the map $\varphi : L_8 \times L_8 \setminus \Delta \to N_3$ defined by the rule: $\varphi(L_1, L_2)$ is the intersection point of the lines $L_1$ and $L_2$. This map is well-defined by Lemma 10. By the definition of $N_3$, $|\varphi^{-1}(x)| = 6$ for any $x \in N_3$. So

$$6n_3 = k_8(k_8 - 1). \qquad (2.5)$$
218
Elena Couselo et al.
Solving the equation system (2.3)—(2.5) and using the condition n1 ≥ 0 we obtain N3 = 7, N1 = 35, k8 = 7. Since each point in P belongs to some of 7 lines in L8 , there is no line of length 9 in L, so k9 = 0. Now consider the reduced pseudogeometry π = (N3 , L ). First check that every two diﬀerent points x, y in N3 belong to one line of length 8 in L. Really, if (L8 ∩ Lx ) ∩ (L8 ∩ Ly ) = ∅ then three lines of L8 ∩Lx have 9 intersection points with three lines of L8 ∩Lx , which is impossible. So N3 ∩ L ≤ 1 for any line L ∈ L \ L8 ), and so L  = L8  = 7. Note also that four points of N3 cannot belong to one line: every one of these 4 points would belong to two another lines of length 8 and all these 9 lines would be diﬀerent, which contradicts the equality k8 =7. Using Lemma 5 for π we have k2 + k3 = 7, 2k2 + 6k3 = 42, so k2 = 0 and k3 = 7. So we have ﬁnally that every line in L contains 3 points and every point is contained in 3 lines, while the numbers of points and lines in π are equal to 7. So π is isomorphic to the projective plane P2 (F2 ). ✷ The conﬁguration described by Lemma 12 is presented in Fig. 2, where the lines of length 8 are drawn and the points of N3 and N1 are marked.
Fig. 2. Lines of length 8 in the pseudogeometry π.
Lemma 13 Every point of N3 belongs to 5 lines of length 5 and to no line of length 7. ✷ Let x ∈ N3 and L ∈ Lx \ L8 . Then L \ {x} is contained in the union of four disjoint sets K \ N3 , K ∈ L8 \ Lx , and each of these sets has one common point with L. So L = 5. On the other hand, each of these sets contains 5 points, so Lx ∩ L5  = 5. ✷
Recursive MDSCodes and Pseudogeometries
219
Lemma 14 Every point of N1 belongs to some line of length 7. ˜ ∈ L8 \ Lx . Then M(x, L) ˜ = 36 and ✷ Let x ∈ N1 and L ∈ Lx ∩ L8 and L remaining set of 6 points cannot be the disjoint union of sets of cardinality 4. ✷ Now we pass again to the “experimental” part of our arguments. Applying Lemma 5 and Lemma 12 to π we obtain the equation 42 · 41 = 20k5 + 42k7 + 7 · 56. It has three solutions in nonnegative integers: (a) k5 = 14, k7 = 25; (b) k5 = 35, k7 = 15; (c) k5 = 56, k7 = 5. The case (a) is impossible since by Lemma 13 k5 ≥ 35. In the case (b) it is evident that every point x ∈ N1 belongs to 1 line of length 8 and to 4 lines of length 5, so it must belong to 3 lines of length 7, and that every line of length 7 must intersect every line of length 8 in some point of N1 . If the points of L \ N1 are marked by numbers 0, . . . , 4 for every line L of length 8 then every line of length 7 may be presented as vector (a0 , . . . , a6 ), were ai ∈ 0, 4. In other words the lines of length 7 must form a code of length n = 7 over an alphabet of q = 5 elements having cardinality c = 15 and distance d = 6. Such codes seem to be of independent interest because their parameters lay on Plotkin boundary: d≤
q−1 c · ·n q c−1
(see, e.g., [9, Theorem 1.1.39]). So we ﬁrst asked if such codes exist. The answer was aﬃrmative: Proposition 15 There exist codes of length n = 7 over an alphabet of q = 5 elements having cardinality c = 15 and distance d = 6. ✷ The codes in question were constructed by a computer program. ✷ But the attempt to add the lines of length 5 failed: exhaustive search showed that it is impossible. In the case (c) it is easy to deduce from Lemma 14 that the lines of length 7 do not intersect each other, and that every point in N1 belongs to 4 lines of length 5 that connect it with points of N3 and to one line of length 5 that is contained in N1 . Again these restrictions make the exhaustive search possible and again it showed that the desired conﬁguration does not exist. We can summarize the results on the pseudogeometries with 42 points as follows. Theorem 16 Any nontrivial pseudogeometry with 42 points must contain a line of length 2,3,4 or 6.
220
Elena Couselo et al.
References 1. R.K.Brayton, D.Coppersmith & J.Hoﬀman, “Self orthogonal Latin squares of all orders n = 2, 3, 6”, Bull, Amer. Math. Soc., 80 (1974), 116–119. 2. Couselo E., Gonzalez S., Markov V., Nechaev A. “Recursive MDScodes and recursively diﬀerentiable quasigroups”, Diskr. Math. and Appl., V.8, No. 3, 217247, VSP, 1998. 3. Couselo E., Gonzalez S., Markov V., Nechaev A. “Recursive MDScodes and recursively diﬀerentiable quasigroupsII”, Diskr. Math. and Appl., to appear. 4. Couselo E., Gonzalez S., Markov V., Nechaev A. “Recursive MDScodes”. In: Workshop on coding and cryptography (WCC’99). January 1114, 1999, Paris. Proceedings. 271277. 5. J.D´enes & A.D.Keedwell, “Latin Squares and their Applications”. Akad´emiai Kiad´ o, Budapest; Academic Press, New York; English Universities Press, London, 1974. 6. J.D´enes & A.D.Keedwell, “Latin squares. New developments in the theory and applications”. Annals of Discrete Mathematics, 46. NorthHolland, Amsterdam, 1991. 7. Heise W. & Quattrocci P. “Informations und codierungstheorie”, Springer, 1995, Berlin–Heidelberg. 8. MacWilliams F.J. & Sloane N.J.A. “The theory of ErrorCorrecting Codes”. Elsevier Science Publishers, B.V., 1988. North Holland Mathematical Library, Vol. 16. 9. Tsfasman M.A., Vlˇ adut¸ S.G., “Algebraicgeometric codes”, Kluwer Acad. Publ., Dordrecht–Boston–London, 1991.
Strength of MISTY1 without FL Function for Higher Order Diﬀerential Attack Hidema Tanaka, Kazuyuki Hisamatsu, and Toshinobu Kaneko Department of Electrical Engineering, Science University of Tokyo 2641 Yamazaki Noda city 2788510 Japan
Abstract. The encryption algorithm MISTY is a ”provably secure” one against Linear and Diﬀerential cryptanalysis. Since the designer showed 3 round MISTY1 without FL function is provable secure, we omit FL to estimate the strength for Higher Order Diﬀerential Attack. This attack is a chosen plain text attack and uses the value of higher order diﬀerential of output to derive an attacking equation for subkeys. The value depends on the degree of output and the degree depends on the choice of plain texts. We show that the eﬀective chosen plain text and 5 round MISTY1 without FL is attackable using 11 diﬀerent 7th order diﬀerentials. And we show the attacks to remaining subkeys by the determined subkeys and intermediate values. Intermediate value is a constant in the process of encryption. As the result, we can determine all subkey in 5 round MISTY1 without FL.
1
Introduction
Linear and Diﬀerential cryptanalysis are the powerful attack to DESlike cryptosystems. As a counter measure, the concept of ”provably secure” against them is proposed [5][7]. The encryption algorithm MISTY, proposed by Matsui in 1996, is a block cipher designed under such concept [1]. There are two types of algorithm, MISTY1 and MISTY2. MISTY has Ffunction named FO. MISTY1 has DESlike structure with 8 rounds FO function. The designer claimed that 3 round MISTY1 without FL function is enough to have provable security against Linear and Diﬀerential cryptanalysis. Since ”provable security” is guaranteed only by FO, we analyzed modiﬁed MISTY1, which has no FL and reduced number of rounds, by Higher Order Diﬀerential Attack. This is a chosen plain text attack which uses the fact that the value of higher order diﬀerential of the output does not depend on subkeys. The order for the attack depends on the chosen plain text and it aﬀects the number of plain texts and the computational cost. We show the outline of the attack in Section 2 and 3. In Section 4, we show the eﬀective chosen plain text which enables the attack to 5 round MISTY1 without FL. The attack is consisted of 3 main parts. Section 4 shows an attack using 7th order diﬀerentials to determine 4 subkeys in 5th round. Section 5 shows the estimation of intermediate values. Intermediate value is a constant in the process of encryption, which is a function of ﬁxed part of plain text and Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 221–230, 1999. c SpringerVerlag Berlin Heidelberg 1999
222
Hidema Tanaka, Kazuyuki Hisamatsu, and Toshinobu Kaneko
P : Plain text 64[bit]
❄ ❄ FO ✲ ❤ ✘✘✘ ✘✘ ❄ FO ✲ ❤ ✘✘✘ ✘✘ ❄ FO ✲ ❤ ✘✘✘ ✘✘
❄ FI
❤✛ ✘✘✘ ✘✘
❤✛ ❄
FI
FI
❤✛truncate ✘✘✘ ✘✘ Kijk✲ ❤
❤✛ ✘✘✘ ✘✘
✘✘✘ ✘✘
❄
K
C : Cipher text 64[bit] (i) MISTY without FL
❄
❤✛zeroextend ✘✘✘ ✘✘ Kijk✲ ❤
❤✛ ✘✘✘ ✘✘
·· ·
·· ·
Kijk✲ ❤ S9
(ii) Equivalent FO
S7
S9
❤✛zeroextend ✘✘✘ ✘✘ ❄
(iii) Equivalent FI
Fig. 1. The modiﬁed MISTY (where K and Kijk denotes the eqivalent subkeys).
subkeys. Section 6 shows an attack using diﬀerentials of intermediate values to determine remaining subkeys.
2
Modiﬁed MISTY1
In the following, we discuss one type of the algorithm, MISTY1. The successful attack of Linear Attack or Diﬀerential attack depends on maximum linear or diﬀerential probability. Let p be the average probability of them for F function. From the theorem shown by Nyberg and Knudsen [5], the probability for 3 round F functions equals to p2 . If the probability p2 is low enough, they call such property as ”provable secure” against Linear and Diﬀerential Attack. The designer showed that F function named FO has the probability with p ≤ 2−56 and 3 round FO function is provable secure. Though the main part of security is guaranteed by FO function, the designer added the auxillary function FL, expecting the higher security. FO function is consisted of 3 rounds FI functions. FI function is consisted of two kinds of Sboxes called S7 and S9. The degree of S7 is 3 and the degree of S9 is 2. We denote FO function in ith round as FOi, jth FI function in FOi as FIij . We attack modiﬁed MISTY1 by Higher Order Diﬀerential Attack. The modiﬁcation assumes omitting FL functions and reducing number of rounds motivated by the statement of the designer that MISTY1 without FL is enough to attain a provable security assuming more than or equals to 3 rounds. To simplify
Strength of MISTY1 without FL Function Xij
❄ FIij Yij
❄ ❤✛ ❄
Hij
(i) FIij
223
Zijk
❄ ❤✛ Xijk
Kijk
❄
K (i)
❄
Yijk
❄ ❤✛ ❄
❄
CL (X)
Hijk
H (i) (X)
❄ FO ✲ ❤
S
❄
CR (X)
(ii) kth Sbox in FIij (iii)The last round of i round MISTY
Fig. 2. (i)(ii):The input and output variables for FIij and kth Sbox, (iii):The last round of i round MISTY.
the attacking equation, we deduce the equivalent FO function and FI function (Figure 1). We denote equivalent subkeys as K and Kijk . In the following, we use input and output variables for FIij and kth Sbox as shown in Figure 2(i) and (ii).
3 3.1
Higher Order Diﬀerential Attack Higher Order Diﬀerential
Let F (X; K) be a function : GF(2)n × GF(2)s → GF(2)m . Y = F (X; K),
(X ∈ GF(2)n , Y ∈ GF(2)m , K ∈ GF(2)s )
(1)
Let (A1 , A2 , . . . , Ai ) be a set of linear independent vectors in GF(2)n and V (i) be a subspace spanned by the set. We deﬁne ∆(i) F (X; K) as the ith order diﬀerential of F (X; K) with respect to X as follows. ∆(i)F (X; K) = F (X + A; K) (2) A∈V (i)
If degX F (X; K) = N , we have the following properties. Let symbol ”]d ” be the operation which omits terms whose degree is smaller than d. Property 1. degX {F (X; K)} = N ⇒
∆(N +1)F (X; K) = 0 ∆(N )F (X; K) = ∆(N )F (X; K)]N
(3)
Property 2. Let F (X) : GF(2)n → GF(2)m . If V (n)=GF(2)n , then for any ﬁxed value f ∈ GF(2)n , ∆(n)F (X + f; K) = ∆(n)F (X; K).
224
3.2
Hidema Tanaka, Kazuyuki Hisamatsu, and Toshinobu Kaneko
Attacking Equation
Figure 2(iii) shows the last round of i round MISTY1. H (i) (X) can be calculated by cipher text CL (X), CR (X)∈ GF(2)32 and subkey K (i), as follows. H (i) (X) = F O(CL (X); K (i) ) + CR (X)
(4)
If degX H (i) (X) = N , following equation holds. ∆(N )H (i) (X) = ∆(N )F (X; K (1···(i−2)))]N
(5)
where F (·) denotes the function GF(2)32 × GF(2)75×(i−2) → GF(2)32 . K (1···(i−2)) denotes the set of keys for previous (i − 2) rounds. From equations (4) and (5), we can derive the following equation. {F O(CL (X + A); K (i) ) + CR (X + A)} = ∆(N )F (X; K (1···(i−2)) )]N (6) A∈V (N )
If the right hand of this equation can be determined for some analytical method, we can use this equation (6) as the attacking equation for K (i).
4 4.1
Attack of MISTY1 Eﬀective Chosen Plain Text
The order for Higher Order Diﬀerential Attack depends on the chosen plain text. Since the order aﬀects the number of chosen plain texts and the computational cost, it is important to search for the eﬀective chosen plain text. The plain text can be divided into 8 subblocks according to the Sboxes to be inputed to. GF(2)7 , i = even P = (X7 , X6 , . . . , X1 , X0 ), Xi ∈ (7) GF(2)9 , i = odd, (i = 0 ∼ 7) The degree of output depends on which subblock we choose as a variable. We searched for the eﬀective choice which makes the slowest increase of degree. As the result, the eﬀective one is to keep all the subblocks ﬁxed except right most 7[bit] subblock X0 . For this chosen plain text, the increase of degree by the formal analysis is shown in Figure 3. The symbol < ij > denotes that the degree of left block is i and the right block is j. 4.2
Attacking Equation Using 7th Order Diﬀerential
We have 7[bit] variable for the chosen plain text P . Let’s discuss the attack using 7th order diﬀerential. We use a subspace V (7) as follows. V (7) = (A1 , A2 , . . . , A7 ), Ai = (0, 0, . . . , 1, . . . , 0)) ∈ GF(2)64 ↑ (i − 1)th bit
(8)
Strength of MISTY1 without FL Function < 33 >
❄ X0 + H133 S9 Y221
< 22 >
❄ Z322
❄
Function F(·)
225
S9 <6>
S9 <4>
S7 < 3 >❤✛truncate
S7 < 9 >❤ ✛truncate
S7 < 6 >❤ ✛truncate
S9 <2>
S9 < 12 >
S9 <8>
❤✛zeroextend ✘✘✘ ✘✘
❤✛zeroextend ✘✘✘ ✘✘
✘✘✘ ✘✘
✘✘✘ ✘✘
❤✛zeroextend ✘✘✘ ✘✘
❤✛ H21
< 3 >❄< 3 > < 33 > (i) FI22
❤✛zeroextend ✘✘✘ ✘✘
H312 < 9 >❄< 12 > < 912 > (ii) FI31
❤✛zeroextend ✘✘✘ ✘✘ ✘✘✘ ✘✘ ❤✛zeroextend ✘✘✘ ✘✘
H322 < 6 >❄< 8 > < 68 > (ii) FI32
Fig. 3. The increase of degree by the formal analysis (To simplify the expression, we omit subkeys in these ﬁgures.)
L7 Let H32 be the left 7[bit] of the output from FO3. L7 = H312 + H322 + Z322 H32
(9)
From Property 1, the following holds. L7 ∆(7)H32 = ∆(7)(H312 + H322 + Z322 )]7 = ∆(7)H312 ]7
(10)
Let F (·) be the function GF(2)7 × GF(2)9 → GF(2)7 shown in Figure 3. H312 = F (X0 + H133 + K222 , Y221 )
(11)
Note that Y221 is a constant for the chosen plain text P . As X0 spans GF(2)7 , from Property 2, the following holds. ∆(7)H312 = ∆(7)F (X0 + H133 + K222 , Y221 ) = ∆(7)F (X0 , Y221 )
(12)
L7 From equation (10) and (12), we have 7th order diﬀerential of H32 as follows. L7 ∆(7)H32 = ∆(7)F (X0 , Y221 )]7
(13)
We calculated the Boolean expressions of H312 by using the computer algebra software REDUCE. As the result, we found the followings.
226
Hidema Tanaka, Kazuyuki Hisamatsu, and Toshinobu Kaneko
Table 1. The Boolean expression of H312 . ˆ h0 ˆ h1 ˆ h2 ˆ h3 ˆ h4 ˆ h5 ˆ h6
x0 x1 x2 x3 x4 x5 x6 + (y0 + y3 + y5 + y6 + y8 )x0 x1 x2 x3 x4 x5 + · · · + 1 (y0 + y2 + y4 + y7 )x0 x1 x2 x3 x4 x5 + · · · + y5 y7 + y5 y8 + y6 y8 + y6 x0 x1 x2 x3 x4 x5 x6 + (y0 + y2 + y4 + y5 + y7 + y8 + 1)x0 x1 x2 x3 x4 x5 + · · · + 1 x0 x1 x2 x3 x4 x5 x6 + (y0 + y3 + y4 + y6 + y8 )x0 x1 x2 x3 x4 x5 + · · · + 1 (y0 + y2 + y3 + y6 + y7 )x0 x1 x2 x3 x4 x5 + · · · + y6 y7 y8 + y7 + y8 + 1 x0 x1 x2 x3 x4 x5 x6 + (y1 + y6 + y8 + 1)x0 x1 x2 x3 x4 x5 + · · · + y8 x0 x1 x2 x3 x4 x5 x6 + (y0 + y2 + y5 + y7 + 1)x0 x1 x2 x3 x4 x5 + · · · + y6 + y7
1. The degree of H312 equals to 7. L7 2. The value of 7th order diﬀerential of H32 equals to 0x6D. 3. The coeﬃcients of terms whose degree is 6, are functions of elements in Y221 . We show a part of them in Table 1. X222 = (x6 , . . . , x0 ), (X222 = X0 + H133 + K222 ) ˆ 6 , . . . , ˆh0 ) Y221 = (y8 , . . . , y0 ), H312 = (h L7 By using ∆(7)H32 = 0x6D, the following attacking equation, with respect to K522 , K521 , K512 and K511 (32[bit] out of 75[bit]) can be derived. F O(CL (P + A); K522 , K521 , K512 , K511 ) + CR (P + A) = 0x6D (14) A∈V (7) L7 , the resultant equation is As we construct the attacking equation for 7[bit] H32 7 the vector equation on GF(2) . Note that the appropriate bit of CL (P ), CR (P ) and FO(·) is selected for the attacking equation (15).
4.3
Number of Chosen Plain Texts and Computational Cost
We adapted the algebraic method [8] for solving the attacking equation (15). We regard all the variable terms with respect to K522 , K521 , K512 and K511 as the independent variables. The attacking equation has two 9[bit] unknowns (K521 and K511 ) whose degree is 1, and two 7[bit] unknowns (K522 and K512 ) whose degree is 2. The equation can be regarded as the linear equation which has 2 × (9 + 7 + 7 C2 ) = 74 unknowns. We can deduce 7 linear equations from one 7th order diﬀerential. To solve the equation, we need 74/7 11 diﬀerent 7th order diﬀerentials. Thus we need 27 × 11 = 1, 408 chosen plain texts. To solve the attacking equation, we calculate the coeﬃcient matrix. The size 7 17 of matrix is 74 × 74, so it needs 74 × 74 times FO function operations. 7 ×2 2 7 If we make a brute force search by using 2 chosen plain texts, the computational cost is 27 × 232 = 239 . The algebraic method is 222 times faster than the brute force search (Table 2). The computer simulation took about 0.5[s] for this attack (SONY NEWS5000: CPU R4400 150[Mhz], Memory 32[M]).
Strength of MISTY1 without FL Function
227
Table 2. The compalison of the algebraic method and the blute force sarch. Number of texts Computatinal cost CPU time Algebraic method 1,408 217 about 0.5[s] Brute force sarch 256 239 
Table 3. The Boolean expression of H322 . h0 h1 h2 h3 h4 h5 h6
5 5.1
(h0 h1 + h0 h4 + h1 h3 + h1 h4 + · · · + h4 h5 + h4 h6 + h5 h6 + 1)x0 x1 x2 x3 x4 x5 · · · (h0 h3 + h0 h5 + h0 h6 + h1 h2 + h2 h3 + h2 h4 + h2 h5 + h3 h4 )x0 x1 x2 x3 x4 x5 · · · (h0 h1 + h0 + h1 h2 + h1 h3 + h1 h4 + h1 h5 + h1 h6 + · · · + h6 )x0 x1 x2 x3 x4 x5 · · · (h0 h2 + h0 h3 + h0 h5 + h1 h4 + · · · + h3 h6 + h4 h5 + h5 + h6 )x0 x1 x2 x3 x4 x5 · · · (h0 h1 + h0 h3 + h0 h4 + h0 h5 + h0 h6 + h0 + h1 h5 + · · · + h6 )x0 x1 x2 x3 x4 x5 · · · (h0 h1 + h0 h2 + h0 h3 + h0 h4 + · · · + h3 h6 + h4 h5 + h4 h6 + 1)x0 x1 x2 x3 x4 x5 · · · (h0 h2 + h0 h3 + h0 h5 + h0 h6 + h0 + h1 h2 + h1 h5 + · · · + 1)x0 x1 x2 x3 x4 x5 · · ·
Estimation of Intermediate Value Intermediate Value
There are constants which are functions of the ﬁxed part of plain text and subkeys. We call such constants as intermediate values. Since we have 11 diﬀerent 7th order diﬀerentials mentioned before, we have 11 diﬀerent constants in ﬁxed part in chosen plain texts. Thus each intermediate value has 11 diﬀerent values. In the following, we use the diﬀerentials of these to derive attacking equations for remaining subkeys. Next, we estimate intermediate values using 6th order diﬀerentials. 5.2
Attacking Equation Using 6th Order Diﬀerential
L7 By decoding the cipher texts using the derived subkeys, we can calculate H32 . From Figure 3, the degree of H322 will be 6. Let’s consider 6th order diﬀerential of equation (9). The terms whose degree is greater than 5 should be counted. L7 H32 ]6 = (H312 + H322 + Z322 )]6
(15)
We calculate Boolean expressions of H322 by REDUCE. As the result, we found the followings. 1. The degree of H322 equals to 6. 2. The coeﬃcients of the terms whose degree is 6, are functions of H213 . We show a part of them in Table 3. X222 = (x6 , . . . , x0 ), H213 = (h6 , . . . , h0 ), H322 = (h6 , . . . , h0 )
(16)
228
Hidema Tanaka, Kazuyuki Hisamatsu, and Toshinobu Kaneko
L7 L7 L7 From Table 1, 3 and equation (15), hL7 i (i = 0 ∼ 6) in H32 = (h6 , . . . , h0 ) is as follows.
hL7 i ]6 = cj · x0 · · · x6 +
6
([Y221 ]j + [H213 ]j )x\{j}
(17)
j=0
where cj ∈ GF(2) is a coeﬃcient. And where x\{j} denotes the product of x0 , · · · , x6 except xj , [Y221 ]j denotes the coeﬃcient of x\{j} in Table 1, and [H213 ]j in Table 4. We can calculate 6th order diﬀerential of hL7 i as follows. ∆
L7 (6) h V\{j} i (6)
= cj ·xj +[Y221 ]j +[H213 ]j ,
(6)
V\{j} = (A1 , A2 , . . . , Aj−1 , Aj+1 , . . . , A7 )
(18) By solving this, we can determine the intermediate values of Y221 , H213 and one bit xj of X222 . Since X222 is 7 [bit], to determine all bit, we need 7 diﬀerent 6th order diﬀerentials. One 7th order diﬀerential can derive 7 diﬀerent 6th order diﬀerentials. The equation (18) has 7 [bit] and 9[bit] unknowns whose degree is 1 and one 7 [bit] unknown whose degree is 2. So the equation has 7+9+7+7C2 = 44 unknowns. Thus we can solve equation (18) by using one 7th order diﬀerential
6
Attack to the Remaining Subkeys
We can determine intermediate values X222 , Y221 and H213 . The attack to the remaining subkeys is ; (1)Estimation of the intermediate values X212 and X213 , (2)Attack of FO1, (3)Attack of FI21 , (4)Attack of FI22 . Since we have 11 diﬀerent 7th order diﬀerentials, ﬁxed subblock Xi (i = 0), has some diﬀerent values. 6.1
Estimation of the Intermediate Values X212 and X213
In FI21 , following holds. H213 = Y213 + Y212 + Y211 + Z212
(19)
The value of Z212 is a constant determined by the plain text subblock X2 . Z212 = X2 + H122
(20)
Since H122 is a constant, the value of 1st order diﬀerential of Z212 equals to the value of 1st order diﬀerential of X2 . The value of H213 is known, we can calculate the value of 1st order diﬀerential of (19) as follows. ∆H213 = ∆Y213 + ∆Y212 + ∆Z212
(21)
Since Y213 = S9(X213 ), we can regard ∆Y213 as linear equation. In the same way, ∆Y212 as 2nd order equation. The equation (21) has 7 [bit] unknown whose degree is 2 and 9 [bit] unknown whose degree is 1. So the equation has 7 + 9 + 7 C2 = 37 unknowns. To solve this, we need 37/7 6 diﬀerent 1st order diﬀerentials. As the result, we can determine X212 and X213 .
Strength of MISTY1 without FL Function
6.2
229
Attack of FO1
The intermediate value X213 is calculated as follows. L7 L7 X213 = Z212 + K213 + S9(H11 + H12 + K211 )
(22)
Since Z212 and K213 are constants and X213 is known, we can calculate 1st order diﬀerential of this as follows. L7 L7 ∆X213 = ∆S9(H11 + H12 + K211 )
(23)
We can determine subkeys K111 , K112 , K121 , K122 and K211 by solving this equation. The intermediate value X222 is calculated as follows. L7 L7 L7 X222 = Y11 + Y12 + Y13 + K222 + X0 + X4
(24)
Since X222 , X0 and X4 are knowns, we have following. L7 L7 L7 ∆X222 = ∆Y11 + ∆Y12 + ∆Y13 + ∆X0 + ∆X4
(25)
By solving equation (26), we can determine subkeys K113 , K123 , K131 , K132 and K133 . As the result, we can determine all subkey in FO1 . So we can calculate the values of inputs to FO2. 6.3
Attack of FI21
In FI21 , X213 can be calculated as follows. X213 = Y211 + Z212 + K213
(26)
Since all subkey in FO1 and K211 are determined, the values of Z212 and Y211 can be calculated. Thus we can determine subkey K213 from equation (26). The following equation holds. X212 = K212 + Z212
(27)
The value of Z212 is known, we can determine subkey K212 . Since K211 is determined in the attack of FO1 , we can determine all subkey in FI21 . In the same way, we can attack to all subkey in FI22 . Due to the limitation of space, we omit the details.
7
Conclusion
We showed that 5 round MISTY1 without FL function, which is secure against Linear and Diﬀerential cryptanalysis, is attackable by Higher Order Diﬀerential Attack using the eﬀective chosen plain texts. Our attack is consisted of 2 attacking phases. The 1st phase is an attack using 7th order diﬀerentials to determine
230
Hidema Tanaka, Kazuyuki Hisamatsu, and Toshinobu Kaneko
4 subkeys in 5th round. This attack needs 1,408 chosen plain texts and 217 of computational cost. Our computer simulation for this phase took about 0.5[s]. The 2nd phase is an attack using intermediate values, to determine another 15 subkeys. The chosen plain texts for the 1st phase are suﬃcient for this phase. After subkeys mentioned above are determined, it is far easier to estimate the rest subkeys. We conclude that at least 6 rounds is necessary for resistance against Higher Order Diﬀerential Attack.
References 1. Matui, ”Block Encryption Algorithm MISTY”, ISEC9611(199607) 2. Shimoyama, Moriai, Kaneko, ”Improving the Higher Order Diﬀerential Attack and Cryptanalysis of the KN Cipher”, ISEC9729(199709) 3. Moriai, Shimoyama, Kaneko, ”Higher Order Attack of a CAST Cipher (II)”, SCIS9812E 4. Jakobsen, Knudsen, ”The Interpolation Attack on Block Cipher”, FSE4th International Workshop, LNCS.1008 5. Nyberg, Knudsen, ”Provable Security against Diﬀerential Cryptanalysis”, Journal of Cryptology, Vol.8no.1 (1995) 6. Shimoyama, Moriai, Kaneko, ”Improving the Higher Order Diﬀerential Attack and Cryptanalysis of the KN Cipher”, 1997 Information Security Workshop 7. Matsui, ”New Structure of Block Ciphers with Provable Security against Diﬀerential and Linear cryptanalysis”, FSE3rd International Workshop, LNCS.1039 8. Moriai, Shimoyama, Kaneko, ”Higher Order Attack of a CAST Cipher”, FSE4th International Workshop, LNCS.1372
Quantum Reed–Solomon Codes Markus Grassl, Willi Geiselmann, and Thomas Beth Institut f¨ ur Algorithmen und Kognitive Systeme Arbeitsgruppe Quantum Computing Universit¨ at Karlsruhe, Am Fasanengarten 5, 76 128 Karlsruhe, Germany
Abstract. We introduce a new class of quantum error–correcting codes derived from (classical) Reed–Solomon codes over finite fields of characteristic two. Quantum circuits for encoding and decoding based on the discrete cyclic Fourier transform over finite fields are presented.
1
Introduction
During the last years it has been shown that computers taking advantage of quantum mechanical phenomena outperform currently used computers. The striking examples are integer factoring in polynomial time (see [18]) √ and ﬁnding pre–images of an n–ary Boolean function (“searching”) in time O( 2n ) (see [12]). Quantum computers are not only of theoretical nature—there are several suggestions how to physically realize them (see, e. g., [6,7]). On the way towards building a quantum computer, one very important problem is to stabilize quantum mechanical systems since they are very vulnerable. A theory of quantum error–correcting codes has already been established (see [15]). Nevertheless, the problem of how to encode and decode quantum error–correcting codes has hardly been addressed, yet. In this paper, we present the construction of quantum error–correcting codes based on classical Reed–Solomon (RS) codes. For RS codes, many classical decoding techniques exist. RS codes can also be used in the context of erasures and for concatenated codes. Encoding and decoding of quantum RS codes is based on quantum circuits for the cyclic discrete Fourier transform over ﬁnite ﬁelds which are presented in Section 4, together with the quantum implementation of any linear transformation over ﬁnite ﬁelds. We start with some results about binary codes obtained from codes over extension ﬁelds, followed by a brief introduction to quantum computation and quantum error–correcting codes.
2 2.1
Binary Codes from Codes over F2k Bases of Finite Fields
First, we recall some facts about ﬁnite ﬁelds (see, e. g., [13]). Any ﬁnite ﬁeld of characteristic p, i. e., a ﬁnite ﬁeld Fq where q = pk , is a vector space of dimension k over Fp . For a ﬁxed basis B of Fq over Fp , any element of Fq can thus be represented by a row vector of length k over Fp . To Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 231–244, 1999. c SpringerVerlag Berlin Heidelberg 1999
232
Markus Grassl, Willi Geiselmann, and Thomas Beth
stress the dependence on the choice of the basis B, we will denote this Fp vector space homomorphism by B: Fq = Fpk → Fpk ,
a → B(a).
(1)
The multiplication with a ﬁxed element a ∈ Fq deﬁnes an Fp –linear mapping. Thus it can be written as a k × k matrix MB (a) over Fp where B(a · a) = B(a ) · MB (a). The trace of MB (a) is independent of the choice of the basis and deﬁnes an Fp –linear mapping tr: Fq → Fp ,
a → tr(a) :=
k−1
i
ap = tr(MB (a))
i=0
(for the last equality see, e. g., [9, Satz 1.24]). To be able to proceed further, we recall the deﬁnition of the dual basis. Given a basis B = (b1 , . . . , bk ) of a ﬁnite ﬁeld Fq over Fp , the dual basis of B is a basis B ⊥ = (b1 , . . . , bk ) with (2) ∀i, j: tr(bi bj ) = δij . For any basis there exists a unique dual basis (see [13, Theorem 4.1.1]). Furthermore, for any ﬁnite ﬁeld of characteristic two there exists a self–dual basis, i. e., a basis B with B ⊥ = B (see [13, Theorem 4.3.5]). For a self–dual basis B, the matrix MB (a) is symmetric. This follows from [9, Satz 1.22], where MB⊥ (a) = MB (a)t
(3)
is shown. Finally, any linear transformation A ∈ GL(n, Fpk ) can be written as a linear transformation B(A) ∈ GL(nk, Fp ) by replacing each entry aij of A by MB (aij ). Moreover, the diagram (4) is commutative, i. e., the change to the ground ﬁeld can be done after or before the linear transformation—a fact that will be essential later. A Fpnk −−−→ Fpnk B (4) B B(A)
Fpkn −−−→ Fpkn 2.2
Subfield Expansion of Linear Codes
In the following, we restrict ourselves to the case p = 2, but the results are valid for any characteristic p > 0. Definition 1. Let C = [N, K, D] denote a linear code of length N , dimension K, and minimum distance D over the field F2k , and let B = (b1 , . . . , bk ) be a basis of F2k over F2 . Then the binary expansion of C with respect to the basis B, denoted by B(C), is the linear binary code C2 = [kN, kK, d ≥ D] given by ∈C . C2 = B(C) := (cij )i,j ∈ F2kN c = j cij bj i
Quantum Reed–Solomon Codes
233
The following theorem relates the dual codes of a code and its binary expansion. Theorem 1. Let C = [N, K] be a linear code over the field F2k and let C ⊥ be its dual. Then the dual code of the binary expansion B(C) of C with respect to the basis B is the binary expansion B ⊥ (C ⊥ ) of the dual code C ⊥ with respect to the dual basis B ⊥ , i. e., the following diagram is commutative: C −→ C ⊥ basis B dual basis B⊥ B(C) −→ B ⊥ (C ⊥ ) = B(C)⊥ Proof. Let c ∈ C and d ∈ C ⊥ be arbitrary elements of the code and its dual, resp. Then
k k N N c i di = cij bj dil bl , 0= (5) i=1
i=1
j=1
l=1
where B = (b1 , . . . , bk ) is a basis of F2k over F2 and B ⊥ = (b1 , . . . , bk ) is the corresponding dual basis. Taking the trace in (5) and rewriting the summation yields N N k k k cij dil tr(bj bl ) = cij dij 0= i=1 j=1 l=1
i=1 j=1
(the last equality follows from Eq. (2)). Hence the binary expansions of the codewords c and d are orthogonal which proves that the binary expansion of C ⊥ is contained in B(C)⊥. The theorem follows from the observation that both sets have 2k(N −K) elements. Corollary 1. Let C = [N, K] be a weakly self–dual linear code over the field F2k . Then the binary expansion B(C) of C with respect to a self–dual basis B is weakly self–dual, too.
3 3.1
Quantum Error–Correcting Codes Qubits and Quantum Circuits
In this section, we give a brief introduction to quantum computation (for a more comprehensive introduction see, e. g., [2,21]). The basic unit of quantum information, a quantum bit (or short qubit), is represented by the normalized linear combination q = α0 + β1,
where α, β ∈ C, α2 + β2 = 1.
(6)
Here 0 and 1 are orthonormal basis states written in Dirac notation (see [8]). The normalization condition in Eq. (6) stems from the fact that when extracting
234
Markus Grassl, Willi Geiselmann, and Thomas Beth
classical information from the quantum system by a measurement, the results “0” and “1” occur with probability α2 and β2 , resp. A quantum register of length n is obtained by combining n qubits modeled by the n–fold tensor product (C 2 )⊗n . The canonical orthonormal basis of (C 2 )⊗n is
B := b1 ⊗ . . . ⊗ bn =: b1 . . . bn = b bi ∈ {0, 1} . Hence the state of an n qubit register is given by ψ = cb b, where cb ∈ C and b∈{0,1}n
b∈{0,1}n
cb 2 = 1.
(7)
All operations of a quantum computer are linear. Furthermore, in order to preserve the normalization condition in Eq. (7), the operations have to be unitary. Basic operations are single qubit operations and two qubit operations. A single qubit operation on the jth qubit is given by U = I2j−1 ⊗ U2 ⊗ I2n−j , where U2 ∈ U(2) is a 2 × 2 unitary matrix. Important examples for single qubit operations are the Hadamard transform H and the Pauli matrices σx , σy , σz 1 0 1 0 −i 1 0 1 1 H := √ , σy := , σz := , (8) , σx := 1 0 i 0 0 −1 2 1 −1 where i2 = −1. The most important example for a two qubit gate is the so–called controlled NOT gate (CNOT ) since any unitary operation on a 2n –dimensional space can be implemented using only single qubit operations and CNOT gates (see [1]). The transformation matrix of the CNOT gate is given by:
1 0 CNOT := 0 0
0 1 0 0
0 0 0 1
0 0 1 0
00 → 00 01 → 01 10 → 11 11 → 10
x1
•
x1
x2
✐
x1 ⊕ x2
(9)
The CNOT gate corresponds to the classical XOR gate since CNOT x1 x2 = x1 x1 ⊕ x2 . (For the graphical notation on the right hand side see, e. g., [1].) 3.2
Error Model
In the following, we brieﬂy summarize some results about quantum error–correcting codes. For a more comprehensive treatment, we refer to, e. g., [3,15]. One common assumption in the theory of quantum error–correcting codes is that errors are local, i. e., only a small number of qubits are disturbed when transmitting or storing the state of an n qubit register. The basic types of errors are bit–ﬂip errors exchanging the states 0 and 1, phase–ﬂip errors changing the relative phase of 0 and 1 by π, and their combination. The bit–ﬂip error corresponds to the Pauli matrix σx , the phase–ﬂip error to σz , and their combination to σy . It is suﬃcient to consider only this discrete set of errors in order to cope with any possible local error (see [15]). The important duality of bit–ﬂip errors and phase–ﬂip errors is shown by the following lemma.
Quantum Reed–Solomon Codes
235
Lemma 1. Bit–flip and phase–flip errors are conjugated to each other by the Hadamard transform, i. e., Hσx H −1 = σz and Hσz H −1 = σx . Errors operating on an n qubit system are represented by tensor products of Pauli matrices and identity. The weight of an error e = e1 ⊗ . . . ⊗ en , where ei ∈ {id, σx , σy , σz } is the number of local errors ei that diﬀer from identity. 3.3
Quantum Codes
Analogously to the notation C = [N, K, d] for a classical error–correcting code encoding K information symbols using N code symbols and being capable of detecting up to d − 1 errors, a quantum error–correcting code encoding K qubits using N qubits is denoted by C = [[N, K, d]]. The code C is a 2K –dimensional subspace of the 2N –dimensional complex vector space (C 2 )⊗N such that any error of weight less than d can be detected or, equivalently, any error of weight less than d/2 can be corrected. The construction of quantum Reed–Solomon codes is based on the construction of quantum error–correcting codes from weakly self–dual binary codes presented in [5] and [19,20] as summarized by the following deﬁnition and theorem. Definition 2. Let C = [N, K] be a weakly self–dual linear binary code, i. e., C ≤ C ⊥ . Furthermore, let {w j  j = 1, . . . , 2N −2K } be a system of representatives of the cosets C ⊥ /C. Then the basis states of a quantum code C = [[N, N − 2K]] are given by 1 c + w j . (10) ψj = C c∈C Theorem 2. Let d be the minimum distance of the dual code C ⊥ in Definition 2. Then the corresponding quantum code is capable of detecting up to d − 1 errors or, equivalently, is capable of correcting up to (d − 1)/2 errors. Proof. (Sketch) A general state of the quantum code is a linear combination of the states in Eq. (10), i. e., 1 αj ψj = αj c + w j = βc c. (11) ψ = C c∈C j j c∈C ⊥ A combination of bit–ﬂip and phase–ﬂip errors can be written as e
e
e = (σxb,1 σzep,1 ) ⊗ . . . ⊗ (σxb,n σzep,n ),
(12)
where eb and ep are binary vectors. The eﬀect of this error on the state (11) is βc (−1)c·ep c + eb . (13) eψ = c∈C ⊥
Computing the syndrome with respect to the binary code C ⊥ using auxiliary qubits, we obtain the state βc (−1)c·ep c + eb s(c + eb ). (14) c∈C ⊥
236
Markus Grassl, Willi Geiselmann, and Thomas Beth
As the syndrome s(c + eb ) depends only on the error eb , the state (14) is a tensor product and we can measure the syndrome without disturbing the ﬁrst part of the quantum register. Using a classical decoding algorithm for the code C ⊥ , the error vector eb is computed from the measured syndrome s(eb ). For each non–zero position of eb , a σx gate is applied to correct the error. From Lemma 1 follows that the Hadamard transform exchanges the rˆole of eb and ep in Eq. (12). Furthermore, computing the Hadamard transform of the states in (10) yields 1 H ⊗N ψj = (−1)c·wj c. (15) C ⊥ c∈C ⊥ Hence, the Hadamard transform changes the state (13) into γc (−1)c·eb c + ep . H ⊗N eψ = c∈C ⊥
The error vector ep can be determined as before. The general outline of decoding is shown in Fig. 1.
H . . .
H correction of phase–flip errors
. . .
syndrome computation
correction of bit–flip errors
H syndrome computation
. (erroneous) . encoded state . . . auxiliary qubits .
. . . H . . .
Fig. 1. General decoding scheme for a quantum error–correcting code constructed from a weakly self–dual binary code. Before we will be ready to present quantum Reed–Solomon codes, we need to show how to implement a discrete Fourier transform over a ﬁnite ﬁeld on a quantum computer.
4
Quantum Implementation of the Cyclic DFT over F2k
Recall from Section 3.1 that the state of an n qubit system can be written as ψ = cx x, where cx ∈ C and cx 2 = 1. (16) x∈F2n
x∈F2n
Hence any invertible linear transformation A ∈ GL(n, F2 ) on the binary vector a linear transformation Q(A) ∈ GL(2n , C) on the complex space F2n induces 2n vector space C = (C 2 )⊗n . The transformation Q(A) permutes the basis states x according to Q(A) : x → xA. In the following we will show how this transformation can be implemented eﬃciently using only CN OT gates.
Quantum Reed–Solomon Codes
237
Theorem 3. Let π ∈ Sn be a permutation and let P ∈ GL(n, F2 ) be the corresponding permutation matrix acting on the binary vector space F2n . Then the quantum transformation Q(P ) ∈ GL(2n , C) defined by Q(P ) : x → xP is a permutation matrix permuting the n tensor factors C 2 of the complex vector n space C 2 = (C 2 )⊗n . It can be implemented using at most 3(n − 1) CNOT gates. Proof. Any permutation π ∈ Sn on n letters can be written as product of at most n − 1 transpositions. Each transposition (i, j) can be implemented by a quantum circuit with three CNOT gates, see Fig. 2.
(i, j) =
... .. . ...
•
❣
❣
•
... .. ❣ . ...
•
i j
Fig. 2. Implementing a transposition of two qubits using three CNOT gates. Theorem 4. Let A ∈ GL(n, F2 ) be an invertible linear mapping on the binary vector space F2n . Then the quantum transformation Q(A) ∈ GL(2n , C) defined by Q(A) n: x → xA is a permutation matrix acting on the complex vector space C 2 . It can be implemented using at most n(n − 1) + 3(n − 1) CNOT gates. Proof. Any matrix A ∈ GL(n, F2 ) can be decomposed as A = P · L · U, where P is a permutation matrix and L (resp. U) is a lower (upper) triangular matrix. By Theorem 3 we need at most 3(n − 1) CNOT gates for the implementation of Q(P ). For the implementation of the lower diagonal matrix L, we use the factorization L = L1 · . . . · Ln , where Li is almost an identity matrix, but the ith row equals the ith row of L. Hence multiplication of a binary vector x with Li is given by xLi = (x1 + xi Li1 , . . . , xi−1 + xi Li,i−1 , xi , xi+1 , . . . , xn ), i. e., the jth position of x is inverted iﬀ both xi and Lij are equal to one. This translates into a sequence of at most i − 1 CNOT gates with control qubit i and target qubit j whenever Lij equals one. In total, the implementation of Q(L) needs at most n(n − 1)/2 CNOT gates. The quantum transformation Q(U) can be implemented similarly. The quantum implementation of linear mappings over an extension ﬁeld F2k can be reduced to implementing linear mappings over F2 . First, we ﬁx a basis B of F2k . By extending the homomorphism B given in Eq. (1) we obtain a homomorphism F2nk → F2kn . Vectors v ∈ F2nk are mapped to binary vectors of length kn represented by kn qubits. Similarly to Eq. (16), we get cv v = cv B(v1 ) ⊗ . . . ⊗ B(vn). (17) ψ = v∈F nk 2
v∈F nk 2
238
Markus Grassl, Willi Geiselmann, and Thomas Beth
In this representation, a linear mapping A ∈ GL(n, F2k ) corresponds to a linear mapping B(A) ∈ GL(nk, F2 ) (see Eq. (4)). In the context of quantum Reed–Solomon codes, we will use the cyclic discrete Fourier transform over F2k which can be implemented eﬃciently as a quantum circuit. Theorem 5. Let n be a divisor of 2k − 1 and let α ∈ F2k be an element of order n. Then the cyclic DFT of length n over the field F2k , given by the matrix (18) DFT = αij i,j=0,...,n−1 can be implemented on a quantum computer using O(k 2 n2 ) CNOT gates. Proof. The condition n(2k − 1) ensures that the ﬁeld F2k contains a primitive nth root of unity α. Thus, we have DFT ∈ GL(n, F2k ). Fixing a basis B of F2k , we obtain a linear transformation B(DFT) ∈ GL(nk, F2 ) which can be implemented using O(k 2 n2 ) CNOT gates using Theorem 4.
5 5.1
Quantum Reed–Solomon Codes Definition of Quantum Reed–Solomon Codes
First, we recall the deﬁnition of Reed–Solomon codes (see [16, Fig. 10.6]). Definition 3. A (classical) Reed–Solomon (RS) code of length N = 2k − 1 over the field F2k is a cyclic code with generator polynomial g(X) = (X − αb )(X − αb+1 ) . . . (X − αb+δ−2 ), where α is a primitive element of F2k , i. e., an element of order 2k − 1 = N . The dimension of the code is K = N − δ + 1 and the minimum distance is δ. Alternatively, an RS code can be described by the spectrum with respect to the cyclic discrete Fourier transform of length N over F2k , see Eq. (18). For any vector v ∈ F2Nk , the spectrum is deﬁned by := v · DFT = v(αi ) i=0,...,N −1 , v N −1 c has δ − 1 where v(X) = j=0 vj X j . Then for any codeword c of an RS code, consecutive (possibly cyclically wrapped around) zeros starting at position b. Fixing the zeros in the spectrum, all codewords can be obtained by the inverse Fourier transform, i. e., the set of codewords is given by
∈ F2Nk , · DFT−1  v vb = vb+1 = . . . = vb+δ−1 = 0 , (19) C= v where the indices are computed modulo N . Lemma 2. For b = 0 and δ > N/2 + 1, RS codes are weakly self–dual.
Quantum Reed–Solomon Codes
239
Proof. The generator polynomial of C is g(X) = (X − 1)(X − α) . . . (X − αδ−2 ). The generator polynomial of the dual code C ⊥ is the reciprocal polynomial of (X N − 1)/g(X), i. e., g ⊥ (X) = (X − α−(δ−1) )(X − α−δ ) . . . (X − α−(N −1)) = (X − α1 )(X − α2 ) . . . (X − αN −δ+1) ). For δ > N/2 + 1, N − δ + 1 ≤ δ − 2. Thus g(X)⊥ is a divisor of g(X) which proves C ≤ C ⊥ . The relation between the spectra of an RS code C and its dual is illustrated in Fig. 3. The spectrum of any codeword c ∈ C is zero at the ﬁrst δ − 1 positions, 0
1 ...
δ−2
0 ∗
0 0 0
∗
spectrum c of c ∈ C
❍ ✟ ❍ ✡✟ ❍✟ ✡ ✟ ❍ ✟ ❍ ❄✟ ✢ ❄ ✙ ✡ ❥ ~❍
∗ 0
0 ∗
! N −δ+1
∗ ∗
δ−2
!
spectrum c of c ∈ C ⊥
Fig. 3. Relation between the spectra of a Reed–Solomon code C and its dual. Positions taking arbitrary values (marked with ∗) and positions being zero are interchanged. whereas the spectrum of any codeword c ∈ C ⊥ may take any value at the corresponding positions, the ﬁrst one and the last δ − 2 positions. In contrast, the last N − δ + 1 positions of the spectrum of c ∈ C are arbitrary, and positions 1 to N − δ + 1 in the spectrum of c ∈ C ⊥ are zero. Combining Lemma 2 and Corollary 1, we are ready to deﬁne quantum Reed– Solomon codes. Definition 4. Let C = [N, K, δ] where N = 2k − 1, K = N − δ + 1, and δ > N/2 + 1 be a Reed–Solomon code over F2k (with b = 0). Furthermore, let B be a self–dual basis of F2k over F2 . Then the quantum Reed–Solomon (QRS) code is the quantum error–correcting code C of length kN derived from the weakly self–dual binary code B(C) according to Definition 2. The parameters of the QRS code are given by the following theorem. Theorem 6. The QRS code C of Definition 4 encodes k(N − 2K) qubits using kN qubits. It is able to detect at least up to K errors, i. e., the parameters are C = [[kN, k(N − 2K), d ≥ K + 1]]. Proof. The weakly self–dual binary code B(C) has length kN and dimension kK. Hence, by Deﬁnition 2 the corresponding quantum code encodes kN − 2kK = k(N − 2K) qubits. The dual code B(C ⊥) has dimension k(δ − 1) and minimum distance d ≥ K + 1. From Theorem 2 follows that the QRS code can detect up to d − 1 ≥ K errors.
240
5.2
Markus Grassl, Willi Geiselmann, and Thomas Beth
Encoding Quantum Reed–Solomon Codes
Encoding of QRS codes is based on the quantum version of the cyclic discrete Fourier transform over F2k presented in Section 4. In the sequel, let C be an RS code over F2k and let B be a self–dual basis of F2k over F2 . Furthermore, we ﬁx a primitive element α ∈ F2k . Theorem 7. Let C = [[kN, k(N −2K), d > K]] where N = 2k −1, K = N −δ+1, and δ > N/2 + 1 be a quantum Reed–Solomon code constructed from the Reed– Solomon code C = [N, K, δ] over F2k . The transformation ⊗k(δ−1) ⊗ H ⊗kK E = Q(B(DFT−1 )) · I2 operating on states of the form φ1 ⊗ . . . ⊗ φk ⊗ 0 ⊗ . . . ⊗ 0 ⊗ φk+1 ⊗ . . . ⊗ φk(N −2K) ⊗ 0 ⊗ . . . ⊗ 0 ! ! ! ! k
kK
k(N −2K−1)
kK
is an encoder for the QRS code. The corresponding quantum circuit is shown in Fig. 4. φ1 φk 0 0 φk+1 φk(N −2K) 0 0
.. .
.. .
. . .
. . . DFT−1
. . . . . .
H
. . . . . .
...
& k qubits " kK qubits
k(N − 2K − 1) qubits
" kK qubits
H
Fig. 4. Encoder for a quantum Reed–Solomon code. Proof. Similarly to Eq. (10), any basis state of the QRS code can be written as 1 ψj = B(c + w j ), C c∈C
(20)
where the coset representatives wj ∈ C ⊥ will be speciﬁed later. The ﬁrst δ − 1 positions of the spectrum of c are zero. From Eq. (19), the other positions may take any value. Thus computing the Fourier transform of the state (20) yields 1 'j ). B(0, . . . , 0, i1 , . . . , iK ) + B(w Q(B(DFT))ψj = ! C i∈FK 2k
δ−1
'j can be chosen to be zero. Without loss of generality, the last K positions of w Hence applying the Hadamard transform to the last kK qubits yields ⊗k(δ−1) 'j ). I2 ⊗ H ⊗kK · Q(B(DFT))ψj = E −1 ψj = B(w
Quantum Reed–Solomon Codes
241
'j are zero, too, since w j ∈ C ⊥ (see Furthermore, positions i = 1, . . . , K of w Fig. 3). For any set of values for the remaining positions, we get a diﬀerent coset of C in C ⊥ . 5.3
Decoding Quantum Reed–Solomon Codes
Decoding procedures for quantum Reed–Solomon codes follow the scheme of Fig. 1. The syndrome of a vector v ∈ F2Nk are positions i = 1, . . . , K of the of v which is obtained by computing the DFT of v. This syndrome, spectrum v indicating bit–ﬂip–errors, is “copied” to kK auxiliary qubits using CNOT gates. Computing the inverse Fourier transform DFT−1 returns to the original basis. After a Hadamard transform, the same circuit is used to compute the syndrome of the phase–ﬂip errors. The whole quantum circuit is shown in Fig. 5. Both the .. .
. . .
H
H
.
.
•
❝
.
.
.
❝
.. .
•
H
.. .
H
.
.
.
•
H
DFT−1
.
. . .
. . .
H
DFT
•
DFT−1
. . .
H
.. .
DFT
(erroneous) encoded state 0 kK qubits 0 0 kK qubits 0
H
.. .
H H
. . .
. . .
H
H
.. . . . .
.
.
.
❝
k qubits " kK qubits
. k(N − K − 1) . . qubits . . .
❝
"
. . .
"
"
syndrome of bit–flip errors syndrome of phase–flip errors
Fig. 5. Computation of the syndrome for a quantum Reed–Solomon code. syndrome of bit–ﬂip errors and the syndrome of phase–ﬂip errors are measured yielding classical syndrome vectors. Then the most likely positions of errors are computed using a classical algorithm, e. g., the Berlekamp–Massey algorithm or the Euclidean algorithm (see [16]). The quantum circuit in Fig. 5 can be simpliﬁed using the following theorem. Theorem 8. Let DFT denote the cyclic discrete Fourier transform of length n over the field F2k and be B a self–dual basis of F2k over F2 . Then the following identities hold: Q(B(DFT−1 )) · H ⊗kn · Q(B(DFT)) = H ⊗kn · Q(B(π)) = Q(B(π)) · H
⊗kn
(21)
,
where π is the permutation x → −x mod n. Using the factorization on the right hand side of Eq. (21) instead of the factorization on the left hand side for the implementation as a quantum circuit reduces the complexity from O(k 2 n2 ) to O(kn).
242
Markus Grassl, Willi Geiselmann, and Thomas Beth
Proof. Let D := B(DFT) denote the binary matrix obtained by replacing each entry αij of DFT by MB (αij ). For a self–dual basis, MB (αij ) is symmetric (see Eq. (3)), and the Fourier matrix is symmetric, too. Hence the matrix D is also symmetric. Using Dirac notation, the matrices read xDx = xxD−1  and H ⊗kn = (−1)x·y xy. Q(D) = x∈F2kn
x∈F2kn
x,y∈F2kn
Multiplying the matrices results in Q(D−1 )H ⊗kn Q(D) =
(−1)x·y uD−1 ux yvvD−1  ! ! kn
u∈F2kn x,y∈F2kn v∈F2
=
=δux =δyv
(−1)x·y xD−1 yD−1  =
x,y∈F2kn
(−1)xD·yD xy.
x,y∈F2kn
The inner product of xD and yD is the same as the inner product of xDDt and y. Since D is symmetric, DDt = D2 = B(DFT2 ) = B(π). Finally, we obtain Q(D−1 )H ⊗kn Q(D) (−1)x·y xyB(π) = = x,y∈F2kn
=
(−1)x·y xy
x,y∈F2kn
(−1)x·y xB(π)y =
x,y∈F2kn
vB(π)v
v∈F2kn
vvB(π)
v∈F2kn
(−1)x·y xy.
x,y∈F2kn
From Eq. (21) in the preceding theorem it follows that Q(B(DFT−1 )) · H ⊗kn = Q(B(π)) · H ⊗kn Q(B(DFT−1 )).
(22)
Using the identities (21) and (22), and conjugating the CNOT gates by the permutation of qubits Q(B(π)), we obtain the simpliﬁed quantum circuit shown in Fig. 6.
6
Example
We construct a quantum Reed–Solomon code from an RS code over the ﬁeld F8 . We choose δ = 5 and obtain an RS code C = [7, 3, 5] with generator polynomial g(X) = (X − α0 )(X − α1 )(X − α2 )(X − α3 ), where α is a primitive element of F8 fulﬁlling α3 + α + 1 = 0. The dual code C ⊥ = [7, 4, 4] is generated by g ⊥ (X) = (X − α−4 )(X − α−5 )(X − α−6 ) = (X − α3 )(X − α2 )(X − α1 ). ¿From Theorem 6, the resulting QRS code has parameters C = [[21, 3, d ≥ 4]]. As self–dual basis of F8 we choose B = (α3 , α6 , α5 ). The binary expansions of C and C ⊥ yield binary codes C2 = B(C) = [21, 9, 8] and C2⊥ = B(C ⊥) = [21, 12, 5]. Thus the QRS code has parameters C = [[21, 3, 5]].
Quantum Reed–Solomon Codes
•
. . . .. .
.
.
.
•
H
. . . . . . . . .
DFT−1
.. .
DFT
(erroneous) encoded state 0 kK qubits 0 0 kK qubits 0
.. .
❝
•
.
H
.
.
H
.
.
.
❝
•
.. .
.. k qubits . " . . kK qubits . .. . . . .
H
. . .
❝
.
.
.
❝
243
. . .
&
k(N − 2K − 1) qubits
" kK qubits "
"
syndrome of bit–flip errors syndrome of phase–flip errors
Fig. 6. Simpliﬁed quantum circuit for computing the syndrome for a QRS code.
7
Conclusion
Most quantum error–correcting codes known so far are based on classical binary codes or codes over GF (4) = F22 (see [4]). In this paper, we have demonstrated how codes over extension ﬁelds of higher degree can be used. They might prove useful, e. g., for concatenated coding. The spectral techniques for encoding and decoding presented in this paper do not only apply to Reed–Solomon codes, but in general to all cyclic codes. The main advantage of Reed–Solomon codes is that no ﬁeld extension is necessary. The same is true for all BCH codes of length n over the ﬁeld F2k where n2k − 1. In addition to the spectral techniques, cyclic codes provide a great variety of encoding/decoding principles, e. g., based on linear shift registers that can be translated into quantum algorithms (see [10]). The quantum implementation of linear mappings over ﬁnite ﬁelds presented in Section 4 enlarges the set of eﬃcient quantum subroutines. In contrast, the transforms used in most quantum algorithms—such as cyclic and generalized Fourier transforms—are deﬁned over the complex ﬁeld (see, e. g., [17]). It has to be investigated how eﬃcient fully quantum algorithms for error– correction can be obtained, e. g., using quantum versions of the Berlekamp– Massey algorithm or of the Euclidean algorithm.
Acknowledgments The authors would like to thank Martin R¨ otteler and Rainer Steinwandt for numerous stimulating discussions during the process of writing this paper. Part of this work was supported by Deutsche Forschungsgemeinschaft, Schwerpunktprogramm QIV (SPP 1078), Projekt AQUA (Be 887/131).
244
Markus Grassl, Willi Geiselmann, and Thomas Beth
References 1. A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, Elementary gates for quantum computation, Physical Review A, 52 (1995), pp. 3457–3467. 2. A. Berthiaume, Quantum Computation, in Complexity Theory Retrospective II, L. A. Hemaspaandra and A. L. Selman, eds., Springer, New York, 1997, pp. 23–51. 3. T. Beth and M. Grassl, The Quantum Hamming and Hexacodes, Fortschritte der Physik, 46 (1998), pp. 459–491. 4. A. R. Calderbank, E. M. Rains, P. W. Shor, and N. J. A. Sloane, Quantum Error Correction Via Codes over GF(4), IEEE Transactions on Information Theory, IT–44 (1998), pp. 1369–1387. 5. A. R. Calderbank and P. W. Shor, Good quantum error–correcting codes exist., Physical Review A, 54 (1996), pp. 1098–1105. 6. J. I. Cirac and P. Zoller, Quantum Computation with Cold Trapped Ions, Physical Review Letters, 74 (1995), pp. 4091–4094. 7. D. G. Cory, A. F. Fahmy, and T. F. Havel, Ensemble Quantum Computing by Nuclear Resonance Spectroscopy, Technical Report TR–10–96, B. C. M. P., Harvard Medical Medical School, Boston, Dec. 1996. 8. P. M. A. Dirac, The Principles of Quantum Mechanics, Clarendon Press, Oxford, 4th ed., 1958. 9. W. Geiselmann, Algebraische Algorithmenentwicklung am Beispiel der Arithmetik in endlichen K¨ orpern, Shaker, Aachen, 1994. Zugleich Dissertation, Universit¨ at Karlsruhe, 1993. 10. M. Grassl and T. Beth, Codierung und Decodierung zyklischer Quantencodes, in Fachtagung Informations– und Mikrosystemtechnik, B. Michaelis and H. Holub, eds., Magdeburg, 25–27 Mar. 1998. 11. M. Grassl, T. Beth, and T. Pellizzari, Codes for the Quantum Erasure Channel, Physical Review A, 56 (1997), pp. 33–38. 12. L. K. Grover, A fast quantum mechanical algorithm for database search, in Proceedings 28th Annual ACM Symposium on Theory of Computing (STOC), New York, 1996, ACM, pp. 212–219. 13. D. Jungnickel, Finite Fields, BI–Wissenschaftsverlag, Mannheim, 1993. 14. E. Knill and R. Laflamme, Concatenated Quantum Codes. LANL preprint quant–ph/9608012, 1996. , Theory of quantum error–correcting codes, Physical Review A, 55 (1997), 15. pp. 900–911. 16. F. J. MacWilliams and N. J. A. Sloane, The Theory of Error–Correcting Codes, North–Holland, Amsterdam, 1977. ¨schel, M. Ro ¨tteler, and T. Beth, Fast Quantum Fourier Transforms 17. M. Pu for a Class of Non–abelian Groups, in Proceedings AAECC–13, 1999. 18. P. W. Shor, Polynomial–Time Algorithms for Prime Factorization and Discrete Logarithms, in Proceedings 35th Annual Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society Press, Nov. 1994, pp. 124–134. 19. A. Steane, Error Correcting Codes in Quantum Theory, Physical Review Letters, 77 (1996), pp. 793–797. , Multiple Particle Interference and Quantum Error Correction, Proceedings 20. of the Royal Society London Series A, 452 (1996), pp. 2551–2577. , Quantum computing, Reports on Progress in Physics, 61 (1998), pp. 117– 21. 173.
Capacity Bounds for the 3Dimensional (0, 1) Runlength Limited Channel Zsigmond Nagy and Kenneth Zeger Department of Electrical and Computer Engineering University of California, San Diego, La Jolla CA 920930407 {nagy,zeger}@code.ucsd.edu
(3)
Abstract. The capacity C0,1 of a 3dimensional (0, 1) runlength constrained (3)
channel is shown to satisfy 0.522501741838 ≤ C0,1 ≤ 0.526880847825.
1 Introduction A binary sequence satisfies a 1dimensional (d, k) runlength constraint if there are at most k zeros in a row, and between every two consecutive ones there are at least d zeros. An ndimensional binary array is said to satisfy a (d, k) runlength constraint, if it satisfies the 1dimensional (d, k) runlength constraint along every direction parallel to a coordinate axis. Such an array is called valid. The number of valid ndimensional (d,k) arrays of size m1 × m2 × . . . × mn is denoted by Nm1 ,m2 ,...,mn and the corresponding capacity is defined as (d,k)
(n)
Cd,k =
log2 Nm1 ,m2 ,...mn . m1 ,m2 ,...mn →∞ m1 m2 · · · mn lim
(n)
(n)
By exchanging the roles of 0 and 1 it can be seen that C0,1 = C1,∞ for all n ≥ 1. A simple proof of the existence of the 2dimensional (d, k) capacities can be found in [1], and the proof can be generalized to ndimensions. It is known (e.g. see [2]) that the 1dimensional (0, 1)constrained capacity is the logarithm of the golden ratio, i.e. √ 1+ 5 (1) = 0.694242 . . . C0,1 = log2 2 and in [3] very close upper and lower bounds were given for the 2dimensional (0, 1)constrained capacity. The bounds in [3] were calculated with greater precision in [4] and are further slightly improved here by us (see Remark section at end for more details), now agreeing in 9 decimal positions: (2)
0.587891161775 ≤ C0,1 ≤ 0.587891161868 .
(1)
(2)
A lower bound of C 0,1 ≥ 0.5831 was obtained in [5] by using an implementable encod(2)
ing procedure known as “bitstuffing”. The known bounds on C 0,1 have played a useful Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 245–251, 1999. c SpringerVerlag Berlin Heidelberg 1999
246
Zsigmond Nagy and Kenneth Zeger
role in [1] for obtaining bounds on other (d, k)-constraints in two dimensions. The 3-dimensional (0, 1)-constrained bounds given in the present paper can play a similar role for obtaining different 3-dimensional bounds, and are also of theoretical interest. In fact, a recent tutorial paper [6] discusses an interesting connection between runlength constrained capacities in more than one dimension and crossword puzzles (based on work of Shannon from 1948). In the present paper we consider the 3-dimensional (0, 1) constraint, and by extending ideas from [3] our main result is to derive (in Sections 2 and 3) the following bounds on the 3-dimensional (0, 1) capacity.

Theorem 1

$$0.522501741838 \le C^{(3)}_{0,1} \le 0.526880847825$$

It is assumed henceforth in this paper that d = 0 and k = 1. Two valid $m_1 \times m_2$ rectangles can be put next to each other in 3 dimensions without violating the 3-dimensional (0, 1) constraint if they have no two zeros in the same positions. Define a transfer matrix $T_{m_1,m_2}$ to be an $N^{(0,1)}_{m_1,m_2} \times N^{(0,1)}_{m_1,m_2}$ binary matrix, such that the rows and columns are indexed by the valid 2-dimensional $m_1 \times m_2$ patterns, and an entry of $T_{m_1,m_2}$ is 1 if and only if the corresponding two rectangles can be placed next to each other in 3 dimensions without violating the (0, 1) constraint. Then,

$$N^{(0,1)}_{m_1,m_2,m_3} = \mathbf{1}' T^{m_3-1}_{m_1,m_2}\mathbf{1} = \mathbf{1}' T^{m_2-1}_{m_1,m_3}\mathbf{1} = \mathbf{1}' T^{m_1-1}_{m_2,m_3}\mathbf{1}$$

where $\mathbf{1}$ is the all-ones column vector and prime denotes transpose. The matrix $T_{m_1,m_2}$ meets the conditions of the Perron-Frobenius theorem [7], since it has nonnegative real elements and is irreducible (since the all-ones rectangle can be placed next to any valid rectangle without violating the (0, 1) constraint). Therefore the largest magnitude eigenvalue $\Lambda_{m_1,m_2}$ of $T_{m_1,m_2}$ is positive, real, and has multiplicity one. This implies that

$$\lim_{m_3\to\infty}\left(N^{(0,1)}_{m_1,m_2,m_3}\right)^{1/m_3} = \Lambda_{m_1,m_2},$$
and

$$C^{(3)}_{0,1} = \lim_{m_1,m_2,m_3\to\infty} \frac{\log_2 N^{(0,1)}_{m_1,m_2,m_3}}{m_1 m_2 m_3}
= \lim_{m_1,m_2\to\infty} \frac{\log_2 \lim_{m_3\to\infty}\left(N^{(0,1)}_{m_1,m_2,m_3}\right)^{1/m_3}}{m_1 m_2}
= \lim_{m_1,m_2\to\infty} \frac{\log_2 \Lambda_{m_1,m_2}}{m_1 m_2}
= \lim_{m_1\to\infty} \frac{\log_2 \lim_{m_2\to\infty}\Lambda^{1/m_2}_{m_1,m_2}}{m_1}
= \lim_{m_1\to\infty} \frac{\log_2 \Lambda_{m_1}}{m_1},$$

where $\Lambda_{m_1} = \lim_{m_2\to\infty}\Lambda^{1/m_2}_{m_1,m_2}$. The quantities $\frac{\log_2 \Lambda_{m_1,m_2}}{m_1 m_2}$ and $\frac{\log_2 \Lambda_{m_1}}{m_1}$ can be viewed as capacities corresponding to 3-dimensional arrays with two fixed sides (lengths $m_1$ and $m_2$), and one fixed side (length $m_1$), respectively.
Upper and lower bounds on the 3-dimensional capacity can be computed directly from the inequalities (similar to the 2-dimensional case, as noted in [4])

$$\frac{\log_2 \Lambda_{m_1,m_2}}{(m_1+1)(m_2+1)} \le C^{(3)}_{0,1} \le \frac{\log_2 \Lambda_{m_1,m_2}}{m_1 m_2}$$

but these do not yield particularly tight bounds for values of $m_1$ and $m_2$ that result in reasonable space and time complexities (e.g. Table 1 shows that the eigenvalues $\Lambda_{m_1,m_2}$ correspond to matrices with more than 40 million elements when roughly $m_1 m_2 \ge 20$). The upper and lower capacity bounds derived in this paper agree to within ±0.002 and were computed using less than 100 Mbytes of computer memory.
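To make the transfer-matrix machinery concrete, the following sketch (our illustration, not the authors' code) builds $T_{m_1,m_2}$ for small $m_1, m_2$ by enumerating the valid 2-dimensional (0, 1)-constrained patterns and testing the no-common-zero compatibility condition, and then counts $N^{(0,1)}_{m_1,m_2,m_3} = \mathbf{1}'T^{m_3-1}\mathbf{1}$ by repeated matrix-vector products.

```python
from itertools import product

def valid_patterns(m1, m2):
    """Enumerate m1 x m2 binary arrays with no two adjacent zeros
    in any row or column (the 2-D (0,1) runlength constraint)."""
    pats = []
    for bits in product([0, 1], repeat=m1 * m2):
        a = [bits[r * m2:(r + 1) * m2] for r in range(m1)]
        ok = all(a[r][c] + a[r][c + 1] > 0 for r in range(m1) for c in range(m2 - 1)) \
             and all(a[r][c] + a[r + 1][c] > 0 for r in range(m1 - 1) for c in range(m2))
        if ok:
            pats.append(bits)
    return pats

def transfer_matrix(m1, m2):
    """T[i][j] = 1 iff patterns i and j share no zero position,
    i.e., they may be stacked in the third dimension."""
    pats = valid_patterns(m1, m2)
    T = [[int(all(p[t] + q[t] > 0 for t in range(len(p)))) for q in pats] for p in pats]
    return T, len(pats)

def count_arrays(m1, m2, m3):
    """N_{m1,m2,m3} = 1' T^{m3-1} 1, computed as repeated mat-vec products."""
    T, n = transfer_matrix(m1, m2)
    v = [1] * n
    for _ in range(m3 - 1):
        v = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v)

# Example: there are 7 valid 2x2 patterns, matching "rows of T_{2,2}" in Table 1.
T, n = transfer_matrix(2, 2)
print(n)                      # 7
print(count_arrays(2, 2, 2))  # number of valid 2x2x2 arrays
```

The enumeration is exponential in $m_1 m_2$, which is exactly why the paper works with implicit eigenvalue computations rather than stored matrices for larger sizes.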
2 Lower Bound on $C^{(3)}_{0,1}$
To derive a lower bound on $C^{(3)}_{0,1}$ we generalize a method of Calkin and Wilf [3]. Since $T_{m_1,m_2}$ is a symmetric matrix, the Courant-Fischer Minimax Theorem [8, pg. 394] implies that

$$\Lambda^p_{m_1,m_2} \ge \frac{x' T^p_{m_1,m_2} x}{x' x} \qquad (3)$$

for any nonzero vector x and any integer p ≥ 0. Choosing $x = T^q_{m_1,m_2}\mathbf{1}$ for any integer q ≥ 0 gives

$$\Lambda^p_{m_1,m_2} \ge \frac{\mathbf{1}' T^{p+2q}_{m_1,m_2}\mathbf{1}}{\mathbf{1}' T^{2q}_{m_1,m_2}\mathbf{1}} = \frac{\mathbf{1}' T^{m_2-1}_{m_1,p+2q+1}\mathbf{1}}{\mathbf{1}' T^{m_2-1}_{m_1,2q+1}\mathbf{1}}. \qquad (4)$$

Thus,

$$2^{pC^{(3)}_{0,1}} = \lim_{m_1,m_2\to\infty}\Lambda^{p/(m_1 m_2)}_{m_1,m_2}
= \lim_{m_1\to\infty}\left(\lim_{m_2\to\infty}\Lambda^{p/m_2}_{m_1,m_2}\right)^{1/m_1}
\ge \lim_{m_1\to\infty}\left(\frac{\Lambda_{m_1,p+2q+1}}{\Lambda_{m_1,2q+1}}\right)^{1/m_1}
= \frac{\lim_{m_1\to\infty}\Lambda^{1/m_1}_{m_1,p+2q+1}}{\lim_{m_1\to\infty}\Lambda^{1/m_1}_{m_1,2q+1}}
= \frac{\Lambda_{p+2q+1}}{\Lambda_{2q+1}} \qquad (5)$$
and therefore for any odd integer r ≥ 1 and any integer z > r,

$$C^{(3)}_{0,1} \ge \frac{1}{z-r}\log_2\frac{\Lambda_z}{\Lambda_r}. \qquad (6)$$
This lower bound on $C^{(3)}_{0,1}$ is analogous to a 2-dimensional bound in [3], but $\Lambda_z$ and $\Lambda_r$ are not eigenvalues associated with transfer matrices of 2-dimensional arrays here, and cannot easily be computed as in the 2-dimensional case. Instead, we obtain a lower bound on $\Lambda_z$ and an upper bound on $\Lambda_r$. From (4) and (5) a lower bound on $\Lambda_z$ is

$$\Lambda_z = \lim_{m_2\to\infty}\Lambda^{1/m_2}_{z,m_2} \ge \lim_{m_2\to\infty}\left(\frac{\mathbf{1}' T^{m_2-1}_{z,v}\mathbf{1}}{\mathbf{1}' T^{m_2-1}_{z,u}\mathbf{1}}\right)^{1/((v-u)m_2)} = \left(\frac{\Lambda_{z,v}}{\Lambda_{z,u}}\right)^{1/(v-u)},$$
where u is an arbitrary positive odd integer, v > u, and $\Lambda_{z,v}$ and $\Lambda_{z,u}$ are the largest eigenvalues of the transfer matrices $T_{z,v}$ and $T_{z,u}$, respectively. To find an upper bound on the quantity $\Lambda_r$ for a given r, we apply a modified version of a method in [3]. We say that a binary matrix satisfies the (0, 1) cylindrical constraint if it satisfies the usual 2-dimensional (0, 1) constraint after joining its leftmost column to its rightmost column (i.e. the left and right columns can be put next to each other without violating the (0, 1) constraint). A binary matrix satisfies the (0, 1) toroidal constraint if it satisfies the usual 2-dimensional (0, 1) constraint after both joining its leftmost column to its rightmost column, and its top row to its bottom row.

Proposition 1 Let s be a positive even integer and let $T_{m_1,m_2}$ be the transfer matrix whose rows and columns are indexed by all (0, 1)-constrained $m_1 \times m_2$ rectangles. Let $B_{m_1,s}$ denote the transfer matrix whose rows and columns are indexed by all cylindrically (0, 1)-constrained $m_1 \times s$ rectangles. Then,

$$\mathrm{Trace}\left[T^s_{m_1,m_2}\right] = \mathbf{1}' B^{m_2-1}_{m_1,s}\mathbf{1}.$$
[Figure omitted.] Fig. 1. Cylindrically (0, 1)-constrained $m_1 \times s$ rectangles used to build cylindric $m_1 \times m_2 \times s$ arrays.
For every positive integer $m_1$ and $m_2$, and every even positive integer s, the matrix $T^s_{m_1,m_2}$ has nonnegative eigenvalues and thus any one of its eigenvalues is upper bounded by its trace. Hence,

$$\Lambda_{m_1,m_2} \le \left(\mathrm{Trace}\left[T^s_{m_1,m_2}\right]\right)^{1/s} = \left(\mathbf{1}' B^{m_2-1}_{m_1,s}\mathbf{1}\right)^{1/s} \qquad (7)$$

which gives the following upper bound on $\Lambda_r$:

$$\Lambda_r = \lim_{m_2\to\infty}\Lambda^{1/m_2}_{r,m_2} \le \lim_{m_2\to\infty}\left(\mathbf{1}' B^{m_2-1}_{r,s}\mathbf{1}\right)^{1/(s m_2)} = \xi^{1/s}_{r,s}, \qquad (8)$$

where $\xi_{r,s}$ is the largest eigenvalue of $B_{r,s}$ (note that $B_{r,s}$ satisfies the Perron-Frobenius theorem for the same reasons as $T_{m_1,m_2}$ in Section 1). The lower bound on $C^{(3)}_{0,1}$ in (6) can now be written as
$$C^{(3)}_{0,1} \ge \frac{1}{z-r}\log_2\frac{\left(\Lambda_{z,v}/\Lambda_{z,u}\right)^{1/(v-u)}}{\xi^{1/s}_{r,s}} \qquad (r \text{ and } u \text{ odd}, s \text{ even}, z > r \ge 1, v > u \ge 1, s \ge 2) \qquad (9)$$
To obtain the best possible lower bound, the right hand side of (9) should be maximized over all acceptable choices of r, z, u, v, and s, subject to the numerical computability of the quantities $\Lambda_{z,v}$, $\Lambda_{z,u}$, and $\xi_{r,s}$. Table 1 shows the largest eigenvalues of various transfer matrices which were numerically computable. From this table, the best parameters we could find for the lower bound in (9) on the capacity were r = 3, z = 4, u = 5, v = 6, and s = 10, yielding

$$C^{(3)}_{0,1} \ge \frac{1}{4-3}\log_2\frac{9346.35893701/2102.73425568}{(80481.0598379)^{1/10}} \ge 0.522501741838.$$
3 Upper Bound on $C^{(3)}_{0,1}$
Proposition 2 Let $s_1$ and $s_2$ be positive even integers and let $B^*_{s_1,s_2}$ denote the transfer matrix whose rows and columns are indexed by all toroidally (0, 1)-constrained $s_1 \times s_2$ rectangles. If $\xi^*_{s_1,s_2}$ is the largest eigenvalue of $B^*_{s_1,s_2}$, then $C^{(3)}_{0,1} \le \frac{1}{s_1 s_2}\log_2 \xi^*_{s_1,s_2}$.

Note that $B^*_{2,s_2} = B_{2,s_2}$ and thus $\xi^*_{2,s_2} = \xi_{2,s_2}$. The best parameters we were able to find (from Table 1) were $s_1 = 4$ and $s_2 = 6$, and the resulting eigenvalue gave the following upper bound:

$$C^{(3)}_{0,1} \le \frac{1}{24}\log_2 6405.69924332 \le 0.526880847825.$$
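Both final numerical bounds can be reproduced directly from the tabulated eigenvalues; the short check below (ours, using the values quoted above) evaluates the two expressions.

```python
from math import log2

# Lower bound (9) with r=3, z=4, u=5, v=6, s=10 and the Table 1 eigenvalues.
lower = (1 / (4 - 3)) * log2((9346.35893701 / 2102.73425568)
                             / 80481.0598379 ** (1 / 10))
# Upper bound from Proposition 2 with s1=4, s2=6.
upper = (1 / 24) * log2(6405.69924332)

print(lower)  # ~0.5225...
print(upper)  # ~0.52688...
```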
4 Remark

Direct computation of eigenvalues using standard linear algebra algorithms generally requires the storage of an entire matrix. This severely restricts the allowable matrix sizes, due to memory constraints on computers. By exploiting the fact that our matrices are all binary, symmetric, and easily computable, we were able to obtain the largest eigenvalues of much larger matrices. Specifically, the eigenvalues used to obtain the capacity bounds in Theorem 1 were computed using the "power method" [8, pg. 406]. Similarly, we obtained the upper bound in (1) with the power method (computing $\Lambda_{1,21}$, $\Lambda_{1,23}$, and $\xi_{1,24}$). Originally these bounds were computed in [3] as $0.587891161 \le C^{(2)}_{0,1} \le 0.588339078$ (computing $\Lambda_{1,13}$, $\Lambda_{1,15}$, and $\xi_{1,6}$) and were later improved in [4] (computing $\Lambda_{1,13}$, $\Lambda_{1,14}$, and $\xi_{1,14}$) to $0.587891161775 \le C^{(2)}_{0,1} \le 0.587891494943$. The lower bound in (1) is from [4].
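As an illustration of the memory-saving idea, here is a minimal power-method sketch (ours, not the authors' implementation): the matrix is never stored; only a function computing T·v on the fly from the pattern list is needed, so the memory cost is two vectors of length $N^{(0,1)}_{m_1,m_2}$. It reuses `valid_patterns` from the earlier sketch.

```python
def dominant_eigenvalue(matvec, n, iters=200):
    """Power method: estimate the largest eigenvalue of a symmetric
    nonnegative irreducible matrix given only through its action v -> Av."""
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        lam = max(abs(x) for x in w)   # infinity-norm normalization
        v = [x / lam for x in w]
    return lam

# Implicit mat-vec for T_{2,2}: recompute pattern compatibility on the fly.
pats = valid_patterns(2, 2)
def T_matvec(v):
    return [sum(v[j] for j, q in enumerate(pats)
                if all(p[t] + q[t] > 0 for t in range(len(p))))
            for p in pats]

print(dominant_eigenvalue(T_matvec, len(pats)))  # ~5.15632517466 = Λ_{2,2}
```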
Table 1. Largest eigenvalues of $T_{a,b}$, $B_{a,b}$, and $B^*_{a,b}$ are $\Lambda_{a,b}$, $\xi_{a,b}$, and $\xi^*_{a,b}$.

a  b   Λ_{a,b}         rows of T_{a,b}  ξ_{a,b}         rows of B_{a,b}  ξ*_{a,b}        rows of B*_{a,b}
1  1   1.61803398875   2
1  2   2.41421356237   3                2.41421356237   3
1  3   3.63138126040   5
1  4   5.45770539597   8                5.15632517466   7
1  5   8.20325919376   13
1  6   12.3298822153   21               11.5517095660   18
1  7   18.5324073775   34
1  8   27.8550990963   55               26.0579860919   47
1  9   41.8675533183   89
1  10  62.9289457252   144              58.8519350815   123
1  11  94.5852312050   233
1  12  142.166150393   377              132.947794048   322
1  13  213.682559741   610
1  14  321.175161677   987              300.345852027   843
1  15  482.741710897   1597
1  16  725.584002895   2584             678.525669346   2207
1  17  1090.58764423   4181
1  18  1639.20566742   6765             1532.89283597   5778
1  19  2463.80493521   10946
1  20  3703.21728345   17711            3463.03987027   15127
1  21  5566.11363689   28657
1  22  8366.13642876   46368            7823.53857819   39603
1  23  12574.7053170   75025
1  24  18900.3867144   121393           17674.5747630   103682
2  2   5.15632517466   7                5.15632517466   7                5.15632517466   7
2  3   11.1103016575   17
2  4   23.9250625386   41               21.9287654025   35               21.9287654025   35
2  5   51.5229210280   99
2  6   110.954925971   239              100.236549238   199              100.236549239   199
2  7   238.942175857   577
2  8   514.563569622   1393             463.203410887   1155             463.203410887   1155
2  9   1108.11608218   3363
2  10  2386.33538059   8119             2146.04060032   6727             2146.04060032   6727
2  11  5138.98917320   19601
2  12  11066.8474924   47321            9949.63685703   39203            9949.63685703   39203
3  3   34.4037405361   63
3  4   106.439377528   227              94.2548937790   181
3  5   329.331697608   827
3  6   1018.97101980   2999             884.498791440   2309
3  7   3152.75734322   10897
3  8   9754.81971205   39561            8421.60680806   30277
3  9   30181.9963196   143677
3  10  93384.9044989   521721           80481.0598378   398857
4  4   473.069084944   1234             404.943621498   933              355.525781764   743
4  5   2102.73425567   6743
4  6   9346.35893702   36787            7799.87080772   26660            6405.69924332   18995
References
1. A. Kato and K. Zeger, "On the Capacity of Two-Dimensional Run Length Constrained Channels," IEEE Trans. Inform. Theory, 1999 (to appear).
2. D. Lind and B. H. Marcus, An Introduction to Symbolic Dynamics and Coding. New York: Cambridge University Press, 1995.
3. N. J. Calkin and H. S. Wilf, "The Number of Independent Sets in a Grid Graph," SIAM Journal on Discrete Mathematics, vol. 11, pp. 54–60, February 1998.
4. W. Weeks and R. E. Blahut, "The Capacity and Coding Gain of Certain Checkerboard Codes," IEEE Trans. Inform. Theory, vol. 44, pp. 1193–1203, May 1998.
5. P. H. Siegel and J. K. Wolf, "Bit Stuffing Bounds on the Capacity of 2-Dimensional Constrained Arrays," in Proceedings of ISIT98, (MIT, Cambridge, MA), p. 323, August 1998.
6. K. A. Immink, P. H. Siegel, and J. K. Wolf, "Codes for Digital Recorders," IEEE Trans. Inform. Theory, vol. 44, pp. 2260–2299, October 1998.
7. R. B. Bapat and T. E. S. Raghavan, Nonnegative Matrices and Applications. Cambridge, United Kingdom: Cambridge University Press, 1997.
8. G. H. Golub and C. F. van Loan, Matrix Computations (3rd edition). Baltimore and London: Johns Hopkins University Press, 1996.
Rectangular Codes and Rectangular Algebra

V. Sidorenko (1), J. Maucher (2), and M. Bossert (2)
(1) Institute for Information Transmission Problems, Russian Academy of Science, B. Karetnyi per. 19, 101447, Moscow GSP-4, Russia, sid@iitp.ru
(2) University of Ulm, Dept. of Information Technology, Albert-Einstein-Allee 43, 89081 Ulm, Germany, {joma,boss}@it.e-technik.uni-ulm.de
Abstract. We investigate general properties of rectangular codes. The class of rectangular codes includes all linear, group, and many nongroup codes. We define a basis of a rectangular code. This basis gives a universal description of a rectangular code. In this paper the rectangular algebra is defined. We show that all bases of a t-rectangular code have the same cardinality. Bounds on the cardinality of a basis of a rectangular code are given.
1 Introduction

A block code C is a set of words c = (c_1, . . . , c_n) of length n over an alphabet Q = {0, 1, . . . , q − 1}. Given t ∈ [1, n − 1], split every codeword c into the head (past) p = (c_1, . . . , c_t) and the tail (future) f = (c_{t+1}, . . . , c_n), i.e., c is the concatenation of the head p and the tail f: c = pf. A set C ⊂ Q^n is called t-rectangular if the following implication is true [1] (in [2] such a set was called t-separable):

$$p_1f_1,\ p_1f_2,\ p_2f_1 \in C \ \Rightarrow\ p_2f_2 \in C. \qquad (1)$$
A set C ⊂ Q^n is called rectangular if it is t-rectangular for each t. All group codes (and hence all linear codes) are rectangular. Many famous nonlinear codes are also rectangular. This includes Hadamard, Levenshtein, and Delsarte-Goethals codes (and hence Kerdock and Nordstrom-Robinson codes) [3], and Goethals codes (and hence Preparata codes) [4]. All codewords of a linear block code having fixed Hamming weight form a rectangular set. The binary code C = {(00), (01), (10)} is the simplest example of a non-rectangular code. As an example of a rectangular code consider the binary code C = {(0000), (0011), (0101), (1000), (1011), (1101)}. The minimal trellis of the code C is shown in Fig. 1. There is a one-to-one correspondence between codewords of the code C and paths in the trellis. A trellis is called minimal if it has in each depth the minimum number of vertices among all code trellises of the given code.
The work was supported by the Russian Fundamental Research Foundation (project No. 99-01-00840) and by the Deutsche Forschungsgemeinschaft (Germany).
[Figure omitted.] Fig. 1. A trellis of a rectangular code C.

Rectangular codes have the following nice property. The minimal trellis of a rectangular code is unique, biproper [1], and minimizes the number |V| of vertices (by definition), the number |E| of edges, and the cycle rank |E| − |V| + 1. As a result the Viterbi decoding complexity of a rectangular code is minimum when using the minimal trellis of the code. In addition, the minimal code trellis gives a universal compact representation of a rectangular code. If a rectangular code has no additional structure then perhaps the minimal code trellis is the only known compact description of the code. We present another universal compact description of a rectangular set using the suggested idea of a rectangular basis. Given an arbitrary block code C, a rectangular set that includes C and has the minimum cardinality is called a rectangular closure of C and is denoted by [C]. A rectangular closure [C] is unique. We say that a set G generates a rectangular set S (G is a generating set for S) if [G] = S. A set G is called independent if for any g ∈ G, g ∉ [G \ g]. An independent set B generating a rectangular set S is called a basis of the rectangular set S. In [8] we obtained the following results. Given a rectangular set S, the Coloring algorithm was proposed to design a basis B having cardinality

$$|B| = |E| - |V| + 2, \qquad (2)$$
where |E| and |V| are the numbers of edges and vertices in the minimal trellis of S, respectively. Thus, a basis gives approximately the same compact description of a rectangular set as the minimal code trellis. Similar to trellis complexity, the cardinality of a code basis depends on the order of codeword positions. It was also shown in [8] that the Merging algorithm [5], [2] applied to a trellis of a set A generates the rectangular closure [A]. The complexity of the minimal trellis of a closure [A] is less than that of any trellis of the nonrectangular set A. This fact can be used to simplify iterative decoding algorithms [6], [7]. A similar problem of constructing a set with smallest trellis complexity was considered in [7]. The Wolf bound is not valid for nonlinear codes (even for rectangular codes). In this paper we continue the investigation of rectangular codes. In Section 2 we define the rectangular complement operation and introduce the rectangular algebra. We believe that the rectangular algebra is an interesting mathematical object. We show that the algebra does not have the exchange property of independent sets. Despite this fact we conjecture (Conjecture 12) that all bases of a rectangular code have the same cardinality (given by (2)). We propose an upper bound on the cardinality of the rectangular closure of a given set. This bound was also obtained independently by Yu. Sidorenko [10]. In Section 3 we consider t-rectangular codes. This is equivalent to considering codes of length 2. We show that Conjecture 12 is true for t-rectangular codes.
In Section 4, using results from the rectangular algebra, we propose lower and upper bounds on the cardinality of a basis of a rectangular code. The codes attaining the upper bound are called prime. A sufficient condition for a code to be prime is presented.
2 Rectangular Algebra

A universal algebra or, briefly, algebra A is a pair ⟨A; F⟩, where A is a nonvoid set and F is a family of finitary operations on A. A is called the base set. Define the base set A to be A = Q^n = Q × Q × · · · × Q, where Q is a finite alphabet. So, an element a ∈ A is a vector of length n over Q. For every t ∈ [1, n] we define a ternary partial operation of t-rectangular complement r_t : A × A × A → A as follows. If a, b, c ∈ A can be represented as the following concatenations

$$a = p_2f_1,\quad b = p_1f_1,\quad c = p_1f_2, \qquad (3)$$

where $p_i \in Q^t$, $f_i \in Q^{n-t}$, then $r_t(a, b, c) = p_2f_2$, else $r_t(a, b, c)$ is undefined.

Definition 1 The partial algebra Re_t = ⟨A; r_t⟩ is called a t-rectangular algebra.

The rectangular algebra can be defined as ⟨A; r_1, . . . , r_{n−1}⟩ having n − 1 operations. However, we can simplify the definition of the algebra as follows. Extend the alphabet Q by joining a special zero element θ, Q_θ = Q ∪ {θ}, and define the partial operation sum as follows: ∀α, β, γ ∈ Q_θ
1. α + α = θ;
2. α + β = β + α;
3. α + θ = α;
4. (α + β) + γ = α + (β + γ).
The sum of words over Q_θ is defined as componentwise sum.

Lemma 2 If r_t(a, b, c) is defined then r_t(a, b, c) = a + b + c.

Proof. If r_t(a, b, c) is defined then there exist t-words p_1, p_2 and (n − t)-words f_1, f_2 such that (3) is satisfied and r_t(a, b, c) = p_2f_2. On the other hand a + b + c = (p_2 + (p_1 + p_1))((f_1 + f_1) + f_2) = (p_2 + θ)(θ + f_2) = p_2f_2.

Corollary 3 If both r_i(a, b, c) and r_j(a, b, c) are defined then r_i(a, b, c) = r_j(a, b, c).

This Corollary allows to define the rectangular algebra using only one operation instead of n − 1 operations.

Definition 4 The rectangular complement operation r(a, b, c) is defined as follows. If there exists t such that r_t(a, b, c) is defined, then r(a, b, c) = r_t(a, b, c), else r(a, b, c) is undefined.
(Footnote: All results are valid for the base set A = Q_1 × Q_2 × · · · × Q_n, where Q_i is a finite alphabet.)
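A direct transcription of the t-rectangular complement into code may help fix ideas; the following sketch (ours, not from the paper) implements $r_t$ on words given as strings and uses it to test t-rectangularity of the two example codes from the introduction.

```python
def r_t(a, b, c, t):
    """t-rectangular complement: if a = p2 f1, b = p1 f1, c = p1 f2,
    return p2 f2; otherwise None (undefined)."""
    if a[t:] == b[t:] and b[:t] == c[:t]:
        return a[:t] + c[t:]
    return None

def is_t_rectangular(code, t):
    """C is t-rectangular iff it is closed under r_t."""
    return all(r_t(a, b, c, t) in code
               for a in code for b in code for c in code
               if r_t(a, b, c, t) is not None)

def is_rectangular(code, n):
    return all(is_t_rectangular(code, t) for t in range(1, n))

C1 = {'00', '01', '10'}                                # the non-rectangular example
C2 = {'0000', '0011', '0101', '1000', '1011', '1101'}  # the rectangular example
print(is_rectangular(C1, 2))   # False
print(is_rectangular(C2, 4))   # True
```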
Definition 5 The partial algebra Re = ⟨A; r⟩ is called the rectangular algebra.

Now we can give definitions of rectangular codes equivalent to (1).

Definition 6 A code C ⊆ A is called t-rectangular if ⟨C; r_t⟩ is a subalgebra of Re_t, i.e., C is closed under the operation r_t.

Definition 7 A code C ⊆ A is called rectangular if ⟨C; r⟩ is a subalgebra of Re, i.e., C is closed under the operation r.

Definition 8 A rectangular code of minimum cardinality that includes a set G is called the rectangular closure of G and is denoted by [G].

Some properties of rectangular codes can be immediately obtained from the theory of universal algebra [9]. For example, the intersection of rectangular codes is rectangular [1], since the intersection of subalgebras is a subalgebra [9]. The rectangular closure is unique [9]. We say that a set G generates a rectangular set C if [G] = C. How many words can be generated by a set G? The following theorem gives an upper bound for the rectangular closure of the set G.

Theorem 9 $|[G]| \le 2^{|G|-1}$.

Proof. Let G = {g_1, . . . , g_m}. Every word c ∈ C = [G] can be generated by words from the set G using a chain of rectangular complement operations. So, c = r(y_1, y_2, y_3), where

$$y_i = r(z_1^{(i)}, z_2^{(i)}, z_3^{(i)}),\quad i = 1, 2, 3,$$

and so on until we get the rectangular complement of words from the set G. Using Lemma 2 for this chain of rectangular complements we get

$$c = y_1 + y_2 + y_3,$$
$$c = z_1^{(1)} + z_2^{(1)} + z_3^{(1)} + y_2 + y_3,$$
$$\ldots$$
$$c = g_{i_1} + g_{i_2} + \ldots + g_{i_l}, \qquad (4)$$

where l is odd since one item is always replaced by 3 items. Since the sum is commutative and distributive we can rewrite (4) as

$$c = k_1g_1 + k_2g_2 + \ldots + k_mg_m,$$

where $k_i$ is an integer and kg = g + . . . + g (k times) by definition. $\sum k_i = l$ is odd. Using properties 1 and 3 of the sum operation we get

$$c = j_1g_1 + j_2g_2 + \ldots + j_mg_m,$$

where $j_i \in \{0, 1\}$ and $\sum j_i$ is odd. So, to each word c ∈ [G] corresponds a binary sequence $j_1, \ldots, j_m$ of odd Hamming weight. Hence, the number $2^{m-1} = 2^{|G|-1}$ of such sequences gives an upper bound for |[G]|. Q.E.D.
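For small examples the closure [G] can be computed by brute force, which also lets one check the bound of Theorem 9; below is such a sketch (ours), reusing r_t from the previous listing. The generating set chosen here is the set X that reappears in the exchange-property discussion below.

```python
from itertools import product as iproduct

def closure(G, n):
    """Rectangular closure [G]: close the set under r_t for every t."""
    S = set(G)
    changed = True
    while changed:
        changed = False
        for a, b, c in iproduct(list(S), repeat=3):
            for t in range(1, n):
                d = r_t(a, b, c, t)        # from the previous sketch
                if d is not None and d not in S:
                    S.add(d)
                    changed = True
    return S

G = {'100', '010', '001'}
S = closure(G, 3)
print(sorted(S))                     # this set is already closed
print(len(S) <= 2 ** (len(G) - 1))   # Theorem 9: |[G]| <= 2^{|G|-1}, True
```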
Definition 10 A set G ⊆ A is called independent if for every g ∈ G: g ∉ [G \ g].

Definition 11 A set B ⊆ A is called a rectangular basis of a rectangular code C if B is independent and [B] = C.

An important question for any universal algebra is: "Do all bases of a closed set have the same cardinality?".

Conjecture 12 All bases of a rectangular code have the same number of words.

The invariance of the number of elements in a basis follows from the following exchange property: let y, z ∉ [X] and z ∈ [X ∪ y]; then y ∈ [X ∪ z]. Unfortunately, the rectangular algebra does not have this property. To prove this, consider A = {0, 1}³, X = {(100), (010), (001)}, y = (000), z = (111). However, we still think that Conjecture 12 is true. In the next section we show that the conjecture is true for length 2 codes.
3 Rectangular Codes of Length 2

Assume that we are interested in t-rectangularity of a code C of length n for a particular t only. Denote by $Q_p$ the set of all t-heads of the code and by $Q_f$ the set of all t-tails. Each codeword of the code can be represented as a word pf, p ∈ $Q_p$, f ∈ $Q_f$. In this section we consider only length 2 codes over the alphabets $Q_p$, $Q_f$. Every rectangular code can be represented as a union of disjoint subcodes [1,2]

$$C = \bigcup_{i=1}^{m} P_i \times F_i, \qquad (5)$$

where $P_i \subseteq Q_p$, $F_i \subseteq Q_f$, $P_i \cap P_j = \emptyset$, $F_i \cap F_j = \emptyset$ for $i \ne j$, and every subcode $C_i = P_i \times F_i$ is a direct product: $C_i = \{pf : p \in P_i, f \in F_i\}$. The following theorem gives the cardinality of a basis of a direct product.

Theorem 13 Let B be a basis of a length 2 code C = P × F; then

$$|B| = |P| + |F| - 1. \qquad (6)$$

Proof. Consider a |P| × |F| rectangular array. Rows of the array are labeled by words of the set P, columns are labeled by words from F. A word c = pf from the code C will be represented by a cross in the array in row p and column f. The code C is represented by the array A(C) completely filled with crosses. Let us fill the empty array with words of a basis B of the code C, and denote the array by A(B). We say that three words a, b, c that satisfy (3) form a triangle because they occupy 2 columns and 2 rows in the array. These words generate the fourth word d = r(a, b, c) situated in the same lines. Using this procedure of cross generating, the complete array must be filled starting from A(B).
1. Every line (row or column) of the array A(B) must contain a cross. Otherwise this line will not be filled.
2. Let the first column of the array A(B) have $n_1 \ge 1$ crosses. These crosses occupy $n_1 + 1$ lines ($n_1$ rows and 1 column). Denote by $L_r$ the set of these rows and by $L_c$ the set consisting of the first column. The cells $L_r \times L_c$ of the array A(B) are filled with crosses.
3. Among the rest of the columns $F \setminus L_c$ there exists a column f having a cross in rows $L_r$. Otherwise the crosses in cells $L_r \times L_c$ are isolated and the array will not be filled; this contradicts the fact that B is a basis. The column f has exactly 1 cross in rows $L_r$, say in row p. If it had more than 1 cross then B would be dependent, because the set of crosses $L_r \times L_c$ and one cross pf generate all crosses $L_r \times f$. Join the column f to the set $L_c$ and join the rows occupied by crosses in column f to the set $L_r$. In total we join $n_f$ new occupied lines, where $n_f$ is the number of crosses in the column f. Using the column f the whole array $L_r \times L_c$ can be filled.
4. We repeat step 3 until all columns of the array are exhausted. After this we have $L_c = F$ and $L_r = P$ because the whole array must be filled. Hence the number of occupied lines will be |P| + |F|. On the other hand, by construction the number of occupied lines is

$$1 + \sum_{f\in F} n_f = 1 + |B| = |P| + |F|,$$

from where the statement of the theorem follows. Q.E.D.

Since the tail-sets $F_i$ and head-sets $P_i$ in (5) are disjoint, from Theorem 13 we get

Theorem 14 Let B be a basis of a length 2 code C that satisfies (5); then

$$|B| = \sum_i (|P_i| + |F_i| - 1). \qquad (7)$$

The minimal trellis of a length 2 code has m + 2 vertices and $\sum_i(|P_i| + |F_i|)$ edges. So, from Theorem 14 we have

Theorem 15 Let B be a basis of a length 2 rectangular code having the minimal trellis T = (V, E); then |B| = |E| − |V| + 2.

Since the minimal trellis of a rectangular code is unique, it follows from Theorem 15 that all bases of a rectangular length 2 code have the same cardinality.
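Theorem 13 can be checked experimentally: the greedy sketch below (ours, exponentially slow but fine for toy sizes) extracts an independent generating set of a direct product P × F by deleting words that lie in the closure of the others, and the resulting basis size matches |P| + |F| − 1. It reuses closure() from the earlier sketch.

```python
def extract_basis(C, n):
    """Greedily remove dependent words: g is dependent if g lies in [C \ g]."""
    B = set(C)
    for g in sorted(C):
        if g in B and g in closure(B - {g}, n):
            B.remove(g)
    return B

P, F = ['0', '1', '2'], ['0', '1']
C = {p + f for p in P for f in F}          # direct product, |C| = 6
B = extract_basis(C, 2)
print(len(B), len(P) + len(F) - 1)         # 4 4  (Theorem 13)
```

In the array picture of the proof, a basis of P × F is exactly a spanning tree of the complete bipartite graph on P and F, which is why the count is |P| + |F| − 1.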
4 Bounds on the Cardinality of a Basis

Now we return to the general case of rectangular codes of length n. What can we say about the cardinality of a basis B(C) of a code C if only |C| and Q are known? From Theorem 9 we get
Theorem 16 The cardinality of a basis B(C) of a binary rectangular code C is bounded by

$$\log_2 |C| + 1 \le |B(C)| \le |C|.$$

The lower bound can be attained for a binary alphabet. So in the binary case the bounds cannot be improved if only |C| is available. There exists a wide class of rectangular codes that attain the upper bound. We call these codes prime because they can be generated only by the complete code.

Definition 17 A rectangular code C that satisfies |B(C)| = |C| is called a prime code.

The following theorem gives a sufficient condition for a code to be prime.

Theorem 18 Let C be a rectangular code having minimum distance d(C) and diameter D(C) in the Hamming metric and

$$2d(C) > D(C), \qquad (8)$$

then |B(C)| = |C|.

The condition (8) in Theorem 18 can be replaced by d(C) > n/2. It follows from Theorem 18 and from the Coloring algorithm that |E| − |V| + 2 = |C| if (8) is satisfied. So |E| − |V| does not depend on reordering of codeword positions, and one can find a permutation-minimal trellis that minimizes simultaneously |V| and |E|, when |E| − |V| is constant (this means minimization of the Viterbi decoding complexity).
References
1. F. R. Kschischang, "The trellis structure of maximal fixed-cost codes," IEEE Trans. Inform. Theory, vol. 42, Part I, no. 6, pp. 1828–1838, Nov. 1996.
2. V. Sidorenko, "The Euler characteristic of the minimal code trellis is maximum," Problems of Inform. Transm., vol. 33, no. 1, pp. 87–93, January–March 1997.
3. V. Sidorenko, I. Martin, and B. Honary, "On the rectangularity of nonlinear block codes," IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 720–725, March 1999.
4. Y. Shany and Y. Be'ery, "On the trellis complexity of the Preparata and Goethals codes," to appear in IEEE Trans. Inform. Theory.
5. A. Vardy and F. R. Kschischang, "Proof of a conjecture of McEliece regarding the optimality of the minimal trellis," IEEE Trans. Inform. Theory, vol. 42, Part I, no. 6, pp. 2027–2034, Nov. 1996.
6. R. Lucas, M. Bossert, M. Breitbach, "Iterative soft-decision decoding of linear binary block codes," in Proceedings of the IEEE International Symposium on Information Theory and its Applications, pp. 811–814, Victoria, Canada, 1996.
7. S. Lin, T. Kasami, T. Fujiwara, and M. Fossorier, Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes, Boston: Kluwer Academic, 1998.
8. V. Sidorenko, J. Maucher, and M. Bossert, "On the Theory of Rectangular Codes," in Proc. of 6th Intern. Workshop on Algebraic and Combinatorial Coding Theory, Pskov, Russia, pp. 207–210, Sept. 1998.
9. P. M. Cohn, Universal Algebra, Harper and Row, New York, N.Y., 1965.
10. Yu. Sidorenko, "How many words can be generated by a rectangular basis," preprint (in Russian).
Decoding Hermitian Codes with Sudan's Algorithm

T. Høholdt and R. Refslund Nielsen
Technical University of Denmark, Department of Mathematics, Bldg 303, DK-2800 Lyngby, Denmark
{tom,refslund}@mat.dtu.dk
http://www.student.dtu.dk/~p938546/public.html
Abstract. We present an efficient implementation of Sudan's algorithm for list decoding Hermitian codes beyond half the minimum distance. The main ingredients are an explicit method to calculate so-called increasing zero bases, an efficient interpolation algorithm for finding the Q-polynomial, and a reduction of the problem of factoring the Q-polynomial to the problem of factoring a univariate polynomial over a large finite field.
1 Introduction

In 1997 M. Sudan [1] presented a conceptually easy algorithm for decoding Reed-Solomon codes of rates less than 1/3 beyond half the minimum distance. The method was extended to algebraic geometry codes by M.A. Shokrollahi and H. Wassermann in [2], and M. Sudan and V. Guruswami in [3] further improved the algorithm to cover all rates, both for Reed-Solomon codes and for general algebraic geometry codes. It is clear from [3] that the resulting algorithm has polynomial complexity, but also that some more work was needed to make the computations really efficient. In this paper we address that problem. The paper is organized as follows. Section 2 gives the necessary background on algebraic geometry codes, in particular the codes that come from the Hermitian curve. Section 3 gives the prerequisites on multiplicities and the concept of increasing zero bases, and Section 4 presents and proves Sudan's algorithm. Section 5 gives an efficient method for calculating increasing zero bases and Section 6 gives a fast interpolation algorithm for finding the Q-polynomial. Section 7 treats the factorization problem and reduces it to factoring a univariate polynomial over a large finite field, for which Berlekamp's algorithm [4] can be used. Section 8 is the conclusion. A version including more detailed proofs and an example can be obtained at the URL given above.
2 Hermitian Codes

Let χ be a nonsingular absolutely irreducible curve over $\mathbb{F}_q$ and let $P_1, \ldots, P_n, P_\infty$ be $\mathbb{F}_q$-rational points on χ. The curve defines an algebraic function field, $\mathbb{F}_q(\chi)$, with a discrete valuation, $v_{P_i}$, corresponding to each point (i = 1, . . . , n, ∞). More details can be found in [5].
A class of algebraic geometry codes is given by

$$C_L(P_1 + \ldots + P_n, mP_\infty) = \{(f(P_1), \ldots, f(P_n)) \mid f \in L(mP_\infty)\},$$

where $L(mP_\infty) = \{f \in \mathbb{F}_q(\chi) \mid v_{P_\infty}(f^{-1}) \le m \wedge v_{P_i}(f) \ge 0 \text{ for } i = 1, \ldots, n\}$. The length of this code is n, and if g denotes the genus of χ and 2g − 1 ≤ m < n then the dimension of the code is k = m − g + 1 and the minimum distance is lower bounded by d* = n − m, since the number of zeroes of a nonzero function cannot exceed the number of poles. The Hermitian codes over $\mathbb{F}_{q_1^2}$ are the codes defined by the above method using the Hermitian curve as χ:

$$X^{q_1+1} - Y^{q_1} - Y = 0.$$

It is well-known that this curve indeed is nonsingular and absolutely irreducible. Furthermore, the curve contains $q_1^3$ affine $\mathbb{F}_{q_1^2}$-rational points and has genus $\frac{q_1(q_1-1)}{2}$.

In this case, the point $P_\infty$ corresponds to the (unique) point at infinity on the homogenization of the Hermitian curve. In the following it will be assumed that a function field, $\mathbb{F}_q(\chi)$, is given and that $P_1, \ldots, P_n, P_\infty$ are points on a nonsingular and absolutely irreducible curve, χ.
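As a concrete instance (our sketch, taking q1 = 2 so that the field is F4 and the curve has 8 affine rational points), the following code enumerates the points of $X^3 = Y^2 + Y$ over F4 and evaluates the monomial basis of $L(mP_\infty)$ to build a generator matrix. F4 is represented as {0, 1, 2, 3} (2 encoding a primitive element w) with addition as XOR and a hand-built multiplication table.

```python
# GF(4) = {0, 1, w, w+1} encoded as 0..3; addition is XOR.
ADD = lambda a, b: a ^ b
MUL_TABLE = [[0, 0, 0, 0],
             [0, 1, 2, 3],
             [0, 2, 3, 1],   # w*w = w+1, w*(w+1) = 1
             [0, 3, 1, 2]]
MUL = lambda a, b: MUL_TABLE[a][b]

def power(a, e):
    r = 1
    for _ in range(e):
        r = MUL(r, a)
    return r

q1 = 2
# Affine rational points of the Hermitian curve x^{q1+1} = y^{q1} + y over GF(4).
points = [(x, y) for x in range(4) for y in range(4)
          if power(x, q1 + 1) == ADD(power(y, q1), y)]
assert len(points) == q1 ** 3   # 8 points, as stated above

# Monomial basis of L(m P_inf): x^a y^b with 0 <= a <= q1, pole order 2a + 3b.
def monomials_up_to(m):
    return [(a, b) for a in range(q1 + 1) for b in range(m + 1)
            if 2 * a + 3 * b <= m]

def generator_matrix(m):
    """Rows = evaluations of the monomial basis of L(m P_inf) at the points."""
    return [[MUL(power(x, a), power(y, b)) for (x, y) in points]
            for (a, b) in monomials_up_to(m)]

G = generator_matrix(4)   # length 8; genus g = 1, so dim = m - g + 1 = 4
for row in G:
    print(row)
```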
3 Prerequisites

For ℓ ≥ 2g − 1, $L(\ell P_\infty)$ is a vector space over $\mathbb{F}_q$ of dimension ℓ − g + 1. It is well known that $L(\ell P_\infty)$ has a basis $\phi_1, \ldots, \phi_{\ell-g+1}$ where the pole order at $P_\infty$ is increasing:

$$v_{P_\infty}(\phi_1^{-1}) < v_{P_\infty}(\phi_2^{-1}) < \cdots < v_{P_\infty}(\phi_{\ell-g+1}^{-1}).$$
However, the following theorem (from [3]) shows the existence of bases where the zero multiplicity of a given point – different from $P_\infty$ – is increasing. Furthermore, the proof of the theorem describes a strategy to find these bases.

Theorem 1. Let $P_i$ (i ∈ {1, . . . , n}) be a point and let ℓ ≥ 2g − 1. Then there exists a basis $\phi_{1,i}, \ldots, \phi_{\ell-g+1,i}$ of $L(\ell P_\infty)$ such that

$$v_{P_i}(\phi_{1,i}) < v_{P_i}(\phi_{2,i}) < \cdots < v_{P_i}(\phi_{\ell-g+1,i}).$$

In the following, such a basis will be called an increasing zero basis with respect to the point $P_i$.

Proof. Suppose that some basis $B = \{\phi_1, \ldots, \phi_{\ell-g+1}\}$ of $L(\ell P_\infty)$ is given. Suppose that two functions have the same valuation at $P_i$. Then one of them can be replaced by a suitable linear combination of the two having greater valuation. This can be repeated until none of the basis functions have the same valuation at $P_i$, and an increasing zero basis is obtained.
Recall that the nonnegative integers, $\mathbb{N}$, are divided into gaps and nongaps by calling $s \in \mathbb{N}$ a gap if and only if $L(sP_\infty)\setminus L((s-1)P_\infty) = \emptyset$. The number of gaps equals the genus, g, of the curve defining the function field. For $t \in \mathbb{N}$, g(t) denotes the number of gaps less than or equal to t. That is,

$$g(t) := t - \dim(L(tP_\infty)) + 1. \qquad (1)$$

Let R denote the following vector space:

$$R := \bigcup_{i=0}^{\infty} L(iP_\infty).$$

Suppose that $R = \mathrm{span}\{\phi_i \mid i \ge 1\}$ with the pole orders of the $\phi_i$'s being strictly increasing. Then $R[z] = \mathrm{span}\{\phi_i z^j \mid i \ge 1 \wedge j \ge 0\}$ (where z is transcendental over $\mathbb{F}_q(\chi)$). We will define a total ordering on these basis functions by associating a nonnegative integer – called the weight – to each function. The ordering will be parameterized by the number associated with z. Let this be denoted by ρ(z). Then the weight of the basis function $\phi_i z^j$ is given by

$$\rho(\phi_i z^j) = v_{P_\infty}(\phi_i^{-1}) + j\rho(z). \qquad (2)$$

An ordering can now be defined using some lexicographic rule to break ties, for example:

$$\phi_i z^j < \phi_a z^b \Leftrightarrow \rho(\phi_i z^j) < \rho(\phi_a z^b) \vee \left(\rho(\phi_i z^j) = \rho(\phi_a z^b) \wedge j < b\right). \qquad (3)$$

However, in this context only the weighting is important. ρ is extended to any nonzero function in R[z] by the following definition:

Definition 1. Let $f \in R[z]\setminus\{0\}$. Suppose that $f = \sum_{i,j} f_{i,j}\phi_i z^j$ and that ρ(z) is given. Then the weight of f is defined as follows, where $\rho(\phi_i z^j)$ is given by (2):

$$\rho(f) = \max\{\rho(\phi_i z^j) \mid f_{i,j} \ne 0\}.$$

The following lemmas describe the weight of the basis functions and the concept of zero multiplicities (proofs are omitted):

Lemma 1. Let ρ(z) ≥ 2g − 1 be given and suppose that the basis functions of R[z] are enumerated as $Q_0, Q_1, \ldots$ so that $\rho(Q_0) \le \rho(Q_1) \le \ldots$. Let $j \in \mathbb{N}$ be given. Then let r and t satisfy

$$\binom{r}{2}\rho(z) - (r-1)g \le j < \binom{r+1}{2}\rho(z) - rg,$$

$$tr - g(t) \le j - \left(\binom{r}{2}\rho(z) - (r-1)g\right) < (t+1)r - g(t+1),$$

where g(t) is given by (1). The weight of $Q_j$ is now given by

$$\rho(Q_j) = (r-1)\rho(z) + t.$$
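For the Hermitian curve the pole orders of the $\phi_i$ are exactly the nongaps of the numerical semigroup generated by q1 and q1 + 1, so the ordered list $Q_0, Q_1, \ldots$ of Lemma 1 can be produced explicitly; the sketch below (ours) generates the weights by brute-force enumeration, which is also a convenient way to sanity-check the closed form of Lemma 1.

```python
def nongaps(q1, bound):
    """Nongaps of the Weierstrass semigroup at P_inf, generated by q1, q1+1."""
    s = {a * q1 + b * (q1 + 1) for a in range(bound + 1) for b in range(bound + 1)}
    return sorted(v for v in s if v <= bound)

def ordered_basis_weights(q1, rho_z, count):
    """Weights rho(Q_0) <= rho(Q_1) <= ... of the basis functions phi_i z^j,
    where rho(phi_i z^j) = (pole order of phi_i) + j * rho_z, as in (2)."""
    bound = rho_z * count + count      # generous enumeration bound
    ws = [s + j * rho_z for s in nongaps(q1, bound)
          for j in range(count) if s + j * rho_z <= bound]
    return sorted(ws)[:count]

# q1 = 4 (genus 6), rho(z) = 20: the first few weights in the ordering.
print(ordered_basis_weights(4, 20, 12))   # 0, 4, 5, 8, 9, 10, 12, ...
```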
Definition 2. Let $f \in R[z]$ with $f = \sum_{j=0}^{\deg(f)} f_j(z - z_0)^j$ for some $z_0 \in \mathbb{F}_q$. Then the pair $(P_i, z_0)$ is said to be a zero of multiplicity s of f if $v_{P_i}(f_j) \ge s - j$ for all $j \le s$ and $v_{P_i}(f_j) = s - j$ for some $j \le s$.

Lemma 2. If $(P_i, z_0)$ is a zero of multiplicity s of f then $v_{P_i}(f(t + z_0)) \ge s$ for any $t \in R$ with $v_{P_i}(t) \ge 1$.

Lemma 3. Let $\phi_{1,i}, \ldots, \phi_{\ell-g+1,i}$ be an increasing zero basis with respect to $P_i$ and consider a nonzero polynomial $f \in R[z]$. f can then be written as

$$f(z) = \sum_{j=0}^{u}\sum_{k} f_{j,k}\phi_{k,i}z^j$$

with $u \ge \deg(f)$. If $\hat{f}_{j,k} := \sum_{a=j}^{u} f_{a,k}\binom{a}{j}z_0^{a-j} = 0$ for all $j + k \le s$ then $(P_i, z_0)$ is a zero of multiplicity at least s of f.
4 Sudan's Algorithm

Let B(w, r) denote the ball in $\mathbb{F}_q^n$ with center w and radius r:

$$B(w, r) := \{u \in \mathbb{F}_q^n \mid d(w, u) \le r\}.$$

The decoding problem for a code $C \subseteq \mathbb{F}_q^n$ and a received word $w \in \mathbb{F}_q^n$ can then be specified as calculating the set

$$\mathrm{dec}_\tau(w) := C \cap B(w, \tau)$$

where τ ≥ 0 is an integer. If τ is smaller than half the minimum distance, then $\mathrm{dec}_\tau(w)$ will always contain at most one codeword; however, we will allow τ to be greater than or equal to half the minimum distance. When that is the case, decoding may not be unique, since $\mathrm{dec}_\tau(w)$ may contain two or more codewords. This is therefore called list decoding, and we will refer to τ as the error-correcting capability of a decoding algorithm which is capable of calculating $\mathrm{dec}_\tau(w)$ for any received word w. The version of Sudan's algorithm given below was first presented in [3]. The algorithm can be seen as an extension of the generalization of Sudan's original algorithm to the case of algebraic geometry codes (see [2] and [1]). The extension gives an improved error-correcting capability over the original algorithm at all information rates if the code is sufficiently long. The description used here is from the presentation of Sudan's original algorithm in [6].

Algorithm 1. Input: The code $C_L(P_1 + \cdots + P_n, (k + g - 1)P_\infty)$ with k > g − 1, a received word $w = (w_1, \ldots, w_n) \in \mathbb{F}_q^n$, and a parameter s ≥ 1. Output: $\mathrm{dec}_{\tau_s}(w)$.
– Set $\rho(z) := v_{P_\infty}(\phi_k^{-1})$ and calculate $r_s$ and t as in Lemma 1 with $j := \binom{s+1}{2}n$. Now

$$\tau_s = n - \left\lfloor\frac{(r_s - 1)\rho(z) + t}{s}\right\rfloor - 1.$$

– Calculate $Q(z) \in R[z]\setminus\{0\}$ so that
1. For all i, Q has $(P_i, w_i)$ as a zero of multiplicity at least s (see Definition 2).
2. $\rho(Q) \le (r_s - 1)\rho(z) + t$.
– Factorize Q into irreducible factors.
– If z − f divides Q and $f \in L((k+g-1)P_\infty)$ and $d((f(P_1), \ldots, f(P_n)), w) \le \tau_s$ then add $(f(P_1), \ldots, f(P_n))$ to the set of candidates. That is,

$$\mathrm{dec}_{\tau_s}(w) := \{(f(P_1), \ldots, f(P_n)) \mid f \in L((k+g-1)P_\infty) \wedge (z-f)\mid Q \wedge d((f(P_1), \ldots, f(P_n)), w) \le \tau_s\}.$$

Any polynomial Q satisfying the two conditions in the algorithm will be called a Q-polynomial (with zero multiplicity s) in the following. The correctness of this algorithm must be shown by proving the existence of a Q-polynomial and proving that such a polynomial has the right factors. The existence is given by the following theorem:

Theorem 2. A Q-polynomial does exist.

Proof. By Lemma 3 it is clear that the first condition on a Q-polynomial can be written as a system of homogeneous linear equations. By Lemma 1 there is a nonzero solution satisfying the second condition.
The fact that a Q-polynomial has all the factors corresponding to codewords in $\mathrm{dec}_{\tau_s}(w)$ is proved by the following lemma and theorem:

Lemma 4. Let $f \in L(mP_\infty)$ and suppose that $f(P_i) = w_i$. Then $v_{P_i}(Q(f)) \ge s$.

Proof. Follows from the definition of Q and Lemma 2, since $v_{P_i}(f - w_i) \ge 1$.

Theorem 3. Let Q be a Q-polynomial and suppose that $f \in L((k + g - 1)P_\infty)$ with $d((f(P_1), \ldots, f(P_n)), w) \le \tau_s$. Then $(z - f) \mid Q$.

Proof. Let h = Q(f). Then $\rho(h) \le (r_s - 1)\rho(z) + t$. By Lemma 4, $v_{P_i}(h) \ge s$ for each value of i where $f(P_i) = w_i$. This happens at least $n - \tau_s$ times. So the total number of zero multiplicities of h is at least

$$\sum_{i=1}^{n} v_{P_i}(h) \ge s(n - \tau_s) = s\left(\left\lfloor\frac{(r_s-1)\rho(z)+t}{s}\right\rfloor + 1\right) > (r_s-1)\rho(z) + t \ge v_{P_\infty}(h^{-1}).$$

So h = 0 and therefore z − f must divide Q.

Remark 1. Notice that by Lemma 1 the degree of Q must be less than $r_s$. This means that Sudan's algorithm gives at most $r_s - 1$ codewords as output. So

$$|\mathrm{dec}_{\tau_s}(w)| < r_s.$$
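The decoding radius $\tau_s$ is determined purely by numerical-semigroup data, so it can be computed without any curve arithmetic. The sketch below (ours; parameter names are our own) finds $\rho(Q_j)$ by direct enumeration of the ordered basis weights, bypassing the closed form of Lemma 1, which must give the same value, and then evaluates $\tau_s$ for a sample Hermitian code.

```python
def tau_s(q1, k, s):
    """Decoding radius tau_s of Algorithm 1 for the Hermitian code
    C_L(P_1 + ... + P_n, (k+g-1)P_inf); n = q1^3, genus g = q1(q1-1)/2."""
    n = q1 ** 3
    g = q1 * (q1 - 1) // 2
    def nongaps_upto(w):
        sgp = {a * q1 + b * (q1 + 1)
               for a in range(w // q1 + 2) for b in range(w // (q1 + 1) + 2)}
        return sorted(v for v in sgp if v <= w)
    rho_z = nongaps_upto(k + g)[k - 1]     # pole order of phi_k = k-th nongap
    j = (s + 1) * s // 2 * n               # j := binom(s+1, 2) * n
    w = rho_z                              # enumerate weights up to w, doubling
    while True:
        ws = sorted(v + jj * rho_z for v in nongaps_upto(w)
                    for jj in range(w // rho_z + 1) if v + jj * rho_z <= w)
        if len(ws) > j:
            return n - ws[j] // s - 1      # tau_s = n - floor(rho(Q_j)/s) - 1
        w *= 2

# Example: q1 = 4, k = 20 (length n = 64, genus 6); tau_s grows with s,
# e.g. 13, 18, 19 for s = 1, 2, 3 under these assumptions.
for s in (1, 2, 3):
    print(s, tau_s(4, 20, s))
```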
5 Calculating Increasing Zero Bases
In principle, the method for calculating increasing zero bases which is described in the proof of Theorem 1 is perfectly fine. However, in practice there are some unsolved problems, since it is not trivial to calculate the standard representation of a function and evaluate the unit. In this section both problems are solved in the case of the Hermitian function field. Suppose that we want to calculate the standard representation of some polynomial $f \in O_{P_i}$ with respect to the point $P_i = (x_i, y_i)$ (i ∈ {1, . . . , n}). In this case $x - x_i$ is a valid local parameter in $P_i$. Since f is regular in $P_i$, the valuation of f in $P_i$ can in principle be calculated by repeatedly dividing f by $x - x_i$ until a unit (a function which evaluates to a nonzero value in $P_i$) is obtained. f can now be written as

$$f = \sum_{\alpha=0}^{q_1}\sum_{\beta\in\mathbb{N}} f_{\alpha,\beta}(x - x_i)^\alpha(y - y_i)^\beta \qquad (4)$$

where $f_{\alpha,\beta} = \sum_{j\in\mathbb{Z}} f_{\alpha,\beta,j}e^j$ with $f_{\alpha,\beta,j} \in \mathbb{F}_{q_1^2}$, $f_{\alpha,\beta,j} = 0$ for all but a finite number of values of j, and $e := (y - y_i)^{q_1-1} + 1$. It should be mentioned that this representation of f is not unique; however, that will not be a problem in this context. The idea of using this representation is that f can now be divided by $x - x_i$ in such a way that the result is a function which can be written in the same form. This is seen by noticing that in the Hermitian function field we have

$$\frac{y - y_i}{x - x_i} = \frac{(x - x_i)^{q_1} + x_i(x - x_i)^{q_1-1} + x_i^{q_1}}{e}. \qquad (5)$$

e is a unit since $e(P_i) = 1$, and furthermore $f(P_i) = \sum_{j\in\mathbb{Z}} f_{0,0,j}$. In the case where f is not a unit ($f(P_i) = 0$), let $s := \min\{j \mid f_{0,0,j} \ne 0\}$. Then

$$f_{0,0} = e^s\sum_{j\in\mathbb{Z}} f_{0,0,j}e^{j-s},$$

where $\sum_{j\in\mathbb{Z}} f_{0,0,j}e^{j-s}$ is a polynomial in y which is divisible by $y - y_i$, since f is not a unit. So with

$$h := \frac{\sum_{j\in\mathbb{Z}} f_{0,0,j}e^{j-s}}{y - y_i}$$

we have

$$\frac{f_{0,0}}{x - x_i} = e^s\frac{y - y_i}{x - x_i}h = e^{s-1}\left((x - x_i)^{q_1} + x_i(x - x_i)^{q_1-1} + x_i^{q_1}\right)h. \qquad (6)$$
This leads to the following algorithm for calculating the standard representation of a polynomial in the Hermitian function field:

Algorithm 2. Input: Polynomial f and point $P_i = (x_i, y_i)$. Output: u and m so that $m = v_{P_i}(f)$ and $f = u(x - x_i)^m$.
1. Initialization: $f^{(0)} := f = \sum_{\alpha,\beta} f_{\alpha,\beta}(x - x_i)^\alpha(y - y_i)^\beta$, m := 0.
2. If $\sum_{j\in\mathbb{Z}} f^{(m)}_{0,0,j} \ne 0$ then let $u := f^{(m)}$ and stop.
3. Use equations (6) and (5) to calculate

$$f^{(m+1)} := \frac{f^{(m)}}{x - x_i} = \frac{f^{(m)}_{0,0}}{x - x_i} + \sum_{\alpha\ge 1,\beta} f^{(m)}_{\alpha,\beta}(x - x_i)^{\alpha-1}(y - y_i)^\beta + \frac{y - y_i}{x - x_i}\sum_{\beta\ge 1} f^{(m)}_{0,\beta}(y - y_i)^{\beta-1}.$$

4. Let m := m + 1 and go to step 2.

The above algorithm for calculating standard representations provides all that is needed to implement the method for determining increasing zero bases in the proof of Theorem 1, since the unit of the standard representation is given in a form which can be evaluated directly.
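Algorithm 2 manipulates the representation (4) symbolically. As an independent cross-check (our sketch, not the paper's algorithm), one can compute $v_{P_i}(f)$ from a truncated local expansion: since $\partial/\partial Y(X^{q_1+1} - Y^{q_1} - Y) = -1$ never vanishes, y is a power series in the local parameter $t = x - x_i$ at every affine point, obtainable by Newton iteration; substituting it into f and reading off the lowest nonzero coefficient gives the valuation. This sketch fixes q1 = 2 and reuses the GF(4) helpers ADD/MUL from the earlier Hermitian-code sketch.

```python
PREC = 16   # truncation order for power series in t = x - x_i

def s_add(u, v):  return [ADD(a, b) for a, b in zip(u, v)]
def s_mul(u, v):
    w = [0] * PREC
    for i, a in enumerate(u):
        if a:
            for j, b in enumerate(v[:PREC - i]):
                w[i + j] = ADD(w[i + j], MUL(a, b))
    return w
def s_pow(u, e):
    r = [1] + [0] * (PREC - 1)
    for _ in range(e):
        r = s_mul(r, u)
    return r

def local_expansion_y(xi, yi):
    """Solve y^2 + y = x^3 for y as a series in t with y(0) = yi (q1 = 2).
    Newton iteration in characteristic 2 reads y <- y^2 + (xi + t)^3."""
    x = [xi, 1] + [0] * (PREC - 2)            # x = xi + t
    rhs = s_pow(x, 3)
    y = [yi] + [0] * (PREC - 1)
    for _ in range(PREC.bit_length() + 1):    # quadratic convergence
        y = s_add(s_mul(y, y), rhs)
    return y

def valuation(f_coeffs, xi, yi):
    """v_{P_i}(f) for f = sum f[(a,b)] x^a y^b given as a dict."""
    x = [xi, 1] + [0] * (PREC - 2)
    y = local_expansion_y(xi, yi)
    total = [0] * PREC
    for (a, b), c in f_coeffs.items():
        term = s_mul(s_pow(x, a), s_pow(y, b))
        total = s_add(total, [MUL(c, v) for v in term])
    return next((i for i, v in enumerate(total) if v), None)  # None: v >= PREC

# Example: v_{P_i}(y - yi) at (xi, yi) = (1, 2) on the curve; it equals 1 here,
# since by (5) the quotient (y - yi)/(x - xi) is a unit when xi != 0.
print(valuation({(0, 1): 1, (0, 0): 2}, 1, 2))   # 1
```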
6 Interpolation

The goal of the interpolation step is to determine a valid Q-polynomial. As mentioned in Theorem 2, such a polynomial must exist in the vector space

$$\mathrm{span}\{Q_0, \ldots, Q_\ell\},\qquad \ell := \binom{s+1}{2}n,$$

with $Q_0, \ldots, Q_\ell$ being as in Lemma 1, and a Q-polynomial can be found by solving a system of linear equations using Gaussian elimination (in the following each of these equations will be referred to as a zero-condition, see Lemma 3). However, the system has a special structure, and that can be used to make the calculations more efficient. One method for doing this is described in the following. The method is an application of the Fundamental Iterative Algorithm, along the same lines as the application in [7], Chapter 4. The Fundamental Iterative Algorithm was first presented in [8]. Let $\mathrm{ord}: R[z] \to \mathbb{N}_\infty$ be given by $\mathrm{ord}(0) = -\infty$, $\mathrm{ord}(\mathrm{span}\{Q_0\}\setminus\{0\}) = \{0\}$ and

$$\mathrm{ord}(\mathrm{span}\{Q_0, \ldots, Q_i\}\setminus\mathrm{span}\{Q_0, \ldots, Q_{i-1}\}) = \{i\}$$

for i > 0. If $f \in R[z]$ then $\mathrm{ord}(f)$ will be called the order of the function f. The task of finding a Q-polynomial can now be rephrased as the task of finding a polynomial which satisfies the zero-conditions and which is minimal with respect to ord. It will be safe to look for such a polynomial only in the vector space

$$V := \{fz^j \mid f \in R \wedge 0 \le j \le \max\{\deg(Q_i) \mid 0 \le i \le \ell\}\}.$$

The idea of the algorithm which is presented in the following is to make a partition of V and then consider the zero-conditions one by one while maintaining a list of polynomials – one from each partition class – which all satisfy the
zero-conditions considered so far, and which all are minimal within the given partition class. To do this we need, for each point, a (small) element of R[z] which has a given pair $(P_i, w_i)$ as a zero of multiplicity at least s and which, whenever it is multiplied by a polynomial, gives a result within the same partition class as that polynomial. To make this work, the order of these elements must be the same for all i ∈ {1, . . . , n}. Therefore, choose t as the smallest integer so that

$$\mathrm{ord}(\phi_{j(i),i}) = t \wedge v_{P_i}(\phi_{j(i),i}) \ge s,\quad i = 1, \ldots, n.$$

Now construct a partition of V by doing the following: Let

$$A := \mathrm{ord}(V)\setminus\{i \in \mathrm{ord}(V) \mid \exists f \in \mathrm{ord}^{-1}(\{i\})\ \exists h \in \mathrm{ord}^{-1}(\{t\}): h \mid f\}.$$

Furthermore, let $h \in \mathrm{ord}^{-1}(\{t\})$ and define

$$T := \bigcup_{i=0}^{\infty}\mathrm{ord}^{-1}(\mathrm{ord}(h^i)).$$

Notice that this definition of T does not depend on the choice of h. Let $A = \{a_1, \ldots, a_{|A|}\}$ and let

$$G_j := \mathrm{ord}^{-1}(\mathrm{ord}(\{f_j g \mid f_j \in \mathrm{ord}^{-1}(\{a_j\}) \wedge g \in T\})).$$

Now $G_1, \ldots, G_{|A|}$ is a partition of V, and furthermore, if a polynomial in some $G_j$ is multiplied by a polynomial in $\mathrm{ord}^{-1}(\{t\})$ then the result will remain in $G_j$. Finally, the following notation is needed: Let $f \in R[z]$ be written as $f = \sum_{j,k} f_{j,k}\phi_{k,i}z^j$; then

$$\mathrm{coef}(f, \phi_{k,i}z^j) := f_{j,k}.$$

Now the algorithm can be stated:

Algorithm 3. Input: Pairs $(P_1, w_1), \ldots, (P_n, w_n)$, and required zero multiplicity s. Initialize by setting $G := \{g_1, \ldots, g_{|A|}\}$ so that $\mathrm{ord}(G) = A$ (which means $\mathrm{ord}(g_j) = \min \mathrm{ord}(G_j)$ for $j = 1, \ldots, |A|$). For i = 1, . . . , n do the following: For each pair $(j, k) \in \mathbb{N}^2$ with $j + k \le s$ and $k \ge 1$ do the following: Let $G' := \{g \in G \mid \mathrm{coef}(g(z + w_i), \phi_{k,i}z^j) \ne 0\}$. If $G' \ne \emptyset$ then choose $f \in G'$ so that $\mathrm{ord}(f) = \min \mathrm{ord}(G')$ and let

$$G := (G\setminus G') \cup \{\phi_{j(i),i}f\} \cup \{\mathrm{coef}(f(z + w_i), \phi_{k,i}z^j)g - \mathrm{coef}(g(z + w_i), \phi_{k,i}z^j)f \mid g \in G'\setminus\{f\}\}.$$

After this, the result is given by the polynomial in G which is smallest with respect to ord. The correctness of this algorithm is proved by induction over the iteration steps, since G at any time holds one polynomial from each partition class satisfying the zero-conditions considered so far and minimal with respect to ord.
7 Factorization
The Q-polynomial is a polynomial in R[z] and therefore factorization is not so easy. However, fairly simple and efficient methods exist for factoring univariate polynomials over a finite field (see for example [4], Chapter 4). In this section the problem of factoring the Q-polynomial is transformed into a problem of factoring a univariate polynomial over a (large) finite field. In the case of Hermitian codes, $R = \mathbb{F}_{q_1^2}[X, Y]/\langle X^{q_1+1} - Y^{q_1} - Y\rangle$ and $R = \mathrm{span}\{x^\alpha y^\beta \mid 0 \le \alpha \le q_1 \wedge 0 \le \beta\}$. So seen this way, polynomials in $\mathrm{span}\{Q_0, \ldots, Q_\ell\}$ have degree in x smaller than $q_1 + 1$ and degree in y smaller than some integer c. Now let $f(Y) \in \mathbb{F}_{q_1^2}[Y]$ with $\deg(f) \ge c$ and

$$D_1 = \mathbb{F}_{q_1^2}[Y]/\langle f(Y)\rangle.$$

Furthermore, let $g(X) := X^{q_1+1} - Y^{q_1} - Y \bmod f$ and

$$D_2 = D_1[X]/\langle g(X)\rangle.$$

Consider the mappings $\varphi: \mathbb{F}_{q_1^2}[X, Y][z] \to D_1[X][z]$ and $\theta: D_1[X][z] \to D_2[z]$ given by

$$\varphi(h_1) = h_1 \bmod f,\qquad \theta(h_2) = h_2 \bmod g.$$

It is a well-known fact that these mappings are homomorphisms and that the composition of these mappings, θφ, is again a homomorphism. So suppose that $h \in L(mP_\infty)$ and that $z - h \mid Q$; then $\theta\varphi(z - h) \mid \theta\varphi(Q)$, and furthermore, reducing z − h modulo f and g will not change z − h, since the degree in Y of f and the degree in X of g are higher than the degrees of h in Y and X respectively. Therefore, in the factorization step, it will be sufficient to factorize θφ(Q), since this will still reveal the factors corresponding to codewords within distance $\tau_s$ from the received word. This will be very useful if f(Y) is chosen so that $D_2$ is a finite field. That will be the case if and only if $D_1$ is a finite field (f is irreducible over $\mathbb{F}_{q_1^2}$) and g(X) is irreducible over $D_1$. Suppose that f is irreducible, so that $D_1 \cong \mathbb{F}_{q_1^{2c_1}}$, where $c_1 = \deg(f)$. Notice that $g(X) = X^{q_1+1} - (y^{q_1} + y)$ then is a binomial in $\mathbb{F}_{q_1^{2c_1}}[X]$. The question is now if f can be chosen with degree at least c so that g is irreducible. That this is in fact the case is shown in the following. The following theorem, which is a special case of Theorem 3.75 of [4], is needed:

Theorem 4. Let $\omega \in \mathbb{F}_{q_1^{2c}}\setminus\{0\}$ and let e denote the order of ω. Then $X^{q_1+1} - \omega$ is irreducible over $\mathbb{F}_{q_1^{2c}}$ if and only if each prime factor of $q_1 + 1$ divides e but not $\frac{q_1^{2c}-1}{e}$.
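For the smallest Hermitian case q1 = 2 the construction can be carried out with hand-sized computations. The sketch below (ours; the choice f(Y) = Y² + Y + w is an illustrative assumption, and the gcd test is the equivalent criterion noted after the theorem) represents D1 = F4[Y]/⟨f⟩, computes the order e of $y^{q_1} + y$ in D1 by brute force, and applies the criterion. It reuses the GF(4) tables ADD/MUL from the earlier sketch.

```python
from math import gcd

q1, c = 2, 2
# f(Y) = Y^2 + Y + w over GF(4), with w encoded as 2; f is irreducible,
# so D1 = GF(16) and the reduction rule is Y^2 = Y + w (characteristic 2).

def poly_mulmod(u, v):
    """Multiply two degree-<2 polynomials over GF(4), reduce mod f."""
    w = [0, 0, 0, 0]
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] = ADD(w[i + j], MUL(a, b))
    for d in (3, 2):                    # Y^d -> Y^{d-1} + w * Y^{d-2}
        if w[d]:
            w[d - 1] = ADD(w[d - 1], w[d])
            w[d - 2] = ADD(w[d - 2], MUL(w[d], 2))
            w[d] = 0
    return w[:2]

def order(elem):
    e, p = 1, elem
    while p != [1, 0]:
        p = poly_mulmod(p, elem)
        e += 1
    return e

beta = poly_mulmod([0, 1], [0, 1])          # Y^2 in D1 ...
beta = [beta[0], ADD(beta[1], 1)]           # ... plus Y: beta = y^{q1} + y
e = order(beta)
ok = gcd(q1 + 1, (q1 ** (2 * c) - 1) // e) == 1
print(e, ok)   # e = 3, True: X^3 - beta is irreducible over D1 = GF(16)
```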
Notice that since $q_1 + 1$ divides $q_1^{2c} - 1$, the theorem states that $X^{q_1+1} - \omega$ is irreducible if and only if $\gcd(q_1 + 1, \frac{q_1^{2c}-1}{e}) = 1$, with e being as in the theorem. The existence of a polynomial as the one called f above is given by the following theorem (the proof is omitted here):

Theorem 5. Let c ≥ 1 be an integer. Then there exists a polynomial f(Y) so that $\deg(f) \ge c$ and the order e of $y^{q_1} + y$ in $\mathbb{F}_{q_1^2}[Y]/\langle f(Y)\rangle$ satisfies

$$\gcd\left(q_1 + 1, \frac{q_1^{2c} - 1}{e}\right) = 1.$$

It should be mentioned that experiments indicate that irreducible polynomials with the property described in Theorem 5 are rather common, so in practice it seems to be sufficient to generate random irreducible polynomials and check if they have the right property. However, we have no proof that this will always work.
8 Conclusion

We have demonstrated how to decode Hermitian codes efficiently beyond half the minimum distance using Sudan's algorithm. The main steps are the calculation of increasing zero bases, fast interpolation in order to determine the Q-polynomial, and a fast method of factorization. The actual complexity of the overall algorithm remains to be calculated, and the extension to more general algebraic geometry codes is a subject for future work.
References
1. Madhu Sudan: "Decoding of Reed-Solomon Codes beyond the Error-Correction Bound," Journal of Complexity 13, 180–193, 1997.
2. M. A. Shokrollahi and H. Wassermann: "List Decoding of Algebraic-Geometric Codes," IEEE Trans. Inform. Theory, vol. 45, pp. 432–437, March 1999.
3. Venkatesan Guruswami and Madhu Sudan: "Improved Decoding of Reed-Solomon and Algebraic-Geometric Codes," Proc. 39th IEEE Symp. Foundations Comput. Sci., 1998.
4. Rudolf Lidl and Harald Niederreiter: Finite Fields, Addison-Wesley, 1983.
5. Henning Stichtenoth: Algebraic Function Fields and Codes, Springer-Verlag, 1993.
6. Weishi Feng and R. E. Blahut: "Some Results on the Sudan Algorithm," Proceedings 1998 IEEE International Symposium on Information Theory, p. 57, August 1998.
7. Ralf Kötter: "On Algebraic Decoding of Algebraic-Geometric and Cyclic Codes," Department of Electrical Engineering, Linköping University, 1996.
8. G.-L. Feng and K. K. Tzeng: "A Generalization of the Berlekamp-Massey Algorithm for Multisequence Shift-Register Synthesis with Applications to Decoding Cyclic Codes," IEEE Trans. Inform. Theory, vol. 37, pp. 1274–1287, Sept. 1991.
9. H. Elbrønd Jensen, R. Refslund Nielsen, and T. Høholdt: "Performance analysis of a decoding algorithm for algebraic geometry codes," IEEE Trans. Inform. Theory, vol. 45, pp. 1712–1717, July 1999.
Computing a Basis of L(D) on an Affine Algebraic Curve with One Rational Place at Infinity

Ryutaroh Matsumoto (1) and Shinji Miura (2)
(1) Sakaniwa Lab., Dept. of Electrical and Electronic Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552 Japan, rmatsumoto@member.ams.org, http://tsk-www.ss.titech.ac.jp/~ryutaroh/
(2) Sony Corporation Information & Network Technologies Laboratories, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo, Japan

Abstract. Under the assumption that we have defining equations of an affine algebraic curve in special position with respect to a rational place Q, we propose an algorithm computing a basis of L(D) of a divisor D from an ideal basis of the ideal L(D + ∞Q) of the affine coordinate ring L(∞Q) of the given algebraic curve, where $L(D + \infty Q) := \bigcup_{i=1}^{\infty} L(D + iQ)$. Elements in the basis produced by our algorithm have pairwise distinct discrete valuations at Q, which is crucial in the construction of algebraic geometry codes. Our method is applicable to a curve embedded in an affine space of arbitrary dimension, and involves only Gaussian elimination and division of polynomials by the Gröbner basis of the ideal defining the curve.
1 Introduction

For a divisor D on an algebraic curve, there exists the associated linear space L(D). Recently we showed how to apply the Feng-Rao bound and decoding algorithm [8] for the Ω-construction of algebraic geometry codes to the L-construction, and showed examples in which the L-construction gives better linear codes than the Ω-construction in a certain range of parameters on the same curve [15]. In order to apply the Feng-Rao algorithm to an algebraic geometry code from the L-construction, we have to find a basis of the differential space Ω(−D + mQ) whose elements have pairwise distinct discrete valuations at the place Q. Finding such a basis of Ω(−D + mQ) reduces to the problem of finding a basis of L(D′) whose elements have pairwise distinct discrete valuations at Q, as described in [16]. Finding such a basis of Ω(−D + mQ) is also necessary in the precomputation of the efficient encoding method proposed in [16]. But there seems to be no algorithm capable of finding such a basis of L(D′). In this paper we present an algorithm computing such a basis.
2000 Mathematics Subject Classification. Primary 14Q05, 13P10; Secondary 94B27, 11T71, 14H05, 14C20.
An affine algebraic curve with one rational place Q at infinity is easy to handle and is used in many publications [9,17,18,19,20,21,22,23]. For a divisor D we define $L(D + \infty Q) := \bigcup_{i=1}^{\infty} L(D + iQ)$. An affine algebraic curve is said to be in special position with respect to a place Q of degree one if its affine coordinate ring is L(∞Q) and the pole orders of the coordinate variables generate the Weierstrass semigroup at Q (Definition 2). Under the assumption that we have defining equations of an affine algebraic curve in special position with respect to Q, we point out that a divisor can be represented as an ideal of L(∞Q), and we propose an efficient algorithm computing a basis of L(D). For effective divisors A and B with supp A ∩ supp B = ∅ and Q ∉ supp A ∪ supp B, there is a close relation between the linear space L(A − B + nQ) and the fractional ideal L(A − B + ∞Q) of L(∞Q), namely

$$L(A - B + nQ) = \{f \in L(A - B + \infty Q) \mid v_Q(f) \ge -n\},$$

where $v_Q$ denotes the discrete valuation at Q. When A = 0, by this relation we can compute a basis of L(−B + nQ) from a generating set of L(−B + ∞Q) as an ideal of L(∞Q) under a mild assumption. When A > 0, we find an effective divisor E such that −E + n′Q is linearly equivalent to A − B + nQ, then find a basis of L(−E + n′Q) from a generating set of the ideal L(−E + ∞Q), then find a basis of L(A − B + nQ) from that of L(−E + n′Q) using the linear equivalence. Computing an ideal basis of L(−E + ∞Q) from A − B + nQ involves computation of ideal quotients in the Dedekind domain L(∞Q), but by clever use of the properties of an affine algebraic curve in special position, our method involves only Gaussian elimination and a small number of divisions of polynomials by the Gröbner basis of the ideal defining the curve; thus it is efficient. Moreover, while the other algorithms [2,3,4,6,11,12,13,25,26] except [10] are applicable only to a plane algebraic curve, our method is applicable to a curve embedded in an affine space of arbitrary dimension. The algorithm [10] is designed for an arbitrary projective nonsingular variety whose homogeneous coordinate ring satisfies Serre's normality criterion S2 (a definition of S2 can be found in [7, Theorem 11.5]), and due to the wide applicability their method involves Buchberger's algorithm, which sometimes takes very long computation time. Due to the page limitation we had to omit all examples and most of the proofs. For the complete version, please wait for the journal paper version or send email to the first author.
2 Theoretical Basis for Computation

First we fix notations used in this paper. K denotes an arbitrary perfect field. We consider an algebraic function field F/K of one variable. $\mathbb{P}_F$ denotes the set of places in F/K. For a place P, $O_P$ (resp. $v_P$) denotes the discrete valuation ring (resp. discrete valuation) corresponding to P. Other notations follow those in Stichtenoth's textbook [24] unless otherwise specified.
In this section, we introduce theorems which play important roles in ideal computation in the affine coordinate ring of an affine algebraic curve and computation of a basis of L(D). Hereafter we fix a place Q of degree one in F/K.

2.1 Relation between Fractional Ideals of a Nonsingular Affine Coordinate Ring and Divisors in a Function Field

Definition 1. For a divisor D in F/K, we define $L(D + \infty Q) := \bigcup_{i=0}^{\infty} L(D + iQ)$. Then we have $L(\infty Q) = \bigcap_{Q \ne P \in \mathbb{P}_F} O_P$, and we can see that L(∞Q) is a Dedekind domain by [24, p.71]. By [24, Proposition III.2.9], the set of maximal ideals in L(∞Q) is $\{L(-P + \infty Q) \mid P \in \mathbb{P}_F \setminus \{Q\}\}$.

Proposition 1. For a divisor D in F/K with Q ∉ supp(D), L(−D + ∞Q) is a fractional ideal in L(∞Q). We have

$$L(-D + \infty Q) = \prod_{P \in \mathbb{P}_F} L(-P + \infty Q)^{v_P(D)}.$$

Corollary 1. For two divisors D, E with support disjoint from Q,

$$L(-D + \infty Q)L(-E + \infty Q) = L(-(D + E) + \infty Q),$$
$$L(-D + \infty Q) + L(-E + \infty Q) = L\Big(-\sum_{P \ne Q}\min\{v_P(D), v_P(E)\}P + \infty Q\Big),$$
$$L(-D + \infty Q) \cap L(-E + \infty Q) = L\Big(-\sum_{P \ne Q}\max\{v_P(D), v_P(E)\}P + \infty Q\Big),$$
$$L(-D + \infty Q) : L(-E + \infty Q) = L\Big(-\sum_{P \ne Q}\max\{0, v_P(D) - v_P(E)\}P + \infty Q\Big).$$

Proof. The assertion follows from Proposition 1 and the unique factorization of an ideal in a Dedekind domain [27, Theorem 11, Section 5.6].
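Since Corollary 1 reduces ideal arithmetic in L(∞Q) to coordinatewise min/max on divisor exponents, it can be mirrored in a few lines; in the sketch below (ours) a divisor with support away from Q is a dict mapping place labels to multiplicities, and each operation returns the divisor whose associated ideal is the result.

```python
def ideal_product(D, E):
    """L(-D+ooQ) * L(-E+ooQ) = L(-(D+E)+ooQ): exponents add."""
    return {P: D.get(P, 0) + E.get(P, 0) for P in D.keys() | E.keys()}

def ideal_sum(D, E):
    """L(-D+ooQ) + L(-E+ooQ): exponent min (ideal gcd)."""
    return {P: min(D.get(P, 0), E.get(P, 0)) for P in D.keys() | E.keys()}

def ideal_intersection(D, E):
    """L(-D+ooQ) ∩ L(-E+ooQ): exponent max (ideal lcm)."""
    return {P: max(D.get(P, 0), E.get(P, 0)) for P in D.keys() | E.keys()}

def ideal_quotient(D, E):
    """L(-D+ooQ) : L(-E+ooQ): exponent max(0, vP(D) - vP(E))."""
    return {P: max(0, D.get(P, 0) - E.get(P, 0)) for P in D.keys() | E.keys()}

D = {'P1': 2, 'P2': 1}
E = {'P1': 1, 'P3': 3}
print(ideal_quotient(D, E))   # {'P1': 1, 'P2': 1, 'P3': 0}
```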
By the facts described so far, we can show a preliminary version of our method for a basis of L(D). Let D := A − B + nQ, where A and B are eﬀective, supp A ∩ supp B = 0 and Q ∈ / supp A ∪ supp B. Suppose that generating sets of the ideal L(−A + ∞Q) and L(−B + ∞Q) are given. If A = 0, then L(D) = {x ∈ L(−B + ∞Q)  vQ (x) ≥ −n}. From the equation above if we have a basis of L(−B + ∞Q) as a Klinear space with pairwise distinct pole orders at Q, then ﬁnding a basis of L(D) from that
of L(−B + ∞Q) amounts to picking the basis elements of L(−B + ∞Q) with pole orders ≤ n. We shall show how to compute such a basis of L(−B + ∞Q) from a generating set of the ideal L(−B + ∞Q) in Theorem 1. If A ≠ 0, then choose a nonzero element f ∈ L(−A + ∞Q). Let ⟨f⟩ be the ideal generated by f in L(∞Q), and let I = (⟨f⟩L(−B + ∞Q)) : L(−A + ∞Q). Then I = L(−(f) + ∞Q)L(−B + ∞Q) : L(−A + ∞Q) = L(A − B − (f) + ∞Q). Since L(A − B − (f) + ∞Q) is an ordinary ideal of L(∞Q), we can compute a basis of L(A − B + nQ − (f)), say b_1, . . . , b_l. Then b_1/f, . . . , b_l/f is a basis of L(D) = L(A − B + nQ). We have to compute an ideal quotient in the method above; we shall show how to compute an ideal quotient with linear algebra techniques alone in Section 4.
2.2 Modules over L(∞Q)
In this subsection we shall study how we can represent an L(∞Q)-submodule of F.

Proposition 2. For a K-subspace W of L(∞Q), suppose that there is a subset {α_j}_{j ∈ v_Q(W \ {0})} ⊂ W such that v_Q(α_j) = j. Then {α_j}_{j ∈ v_Q(W \ {0})} is a K-basis of W.
Let a ∈ −v_Q(L(∞Q) \ K). Then a > 0. Fix an element x ∈ F such that (x)_∞ = aQ.

Proposition 3. For an L(∞Q)-submodule M of L(∞Q), we set b_i := min{j ∈ −v_Q(M \ {0}) | j mod a = i} for i = 0, . . . , a − 1. Choose elements y_i ∈ M such that v_Q(y_i) = −b_i. Then {y_0, y_1, . . . , y_{a−1}} generates M as a K[x]-module.

Proposition 4. We retain notations from the previous proposition. If a K-subspace W generates M as a K[x]-module, that is, M = K[x]W, then we can find the elements y_i in W for i = 0, . . . , a − 1.
3 Gröbner Bases of an Affine Algebraic Curve with a Unique Rational Place at Infinity
An affine algebraic curve with a unique rational place at infinity is convenient and has been treated by several authors independently [9,17,18,19,20,21,22,23]. We review and extend results in [19,20,23].
Definition 2. [23, Definition 11] Let an ideal I ⊂ K[X_1, . . . , X_t] define an affine algebraic curve, R := K[X_1, . . . , X_t]/I, F the quotient field of R, and Q a place of degree one in F/K. Then the affine algebraic curve defined by I is said to be in special position with respect to Q if the following conditions are met:
1. The pole divisor of X_i mod I is a multiple of Q for each i.
2. The pole orders of X_1 mod I, X_2 mod I, . . . , X_t mod I at Q generate the Weierstrass semigroup {i | L(iQ) ≠ L((i − 1)Q)} at Q. In other words, for any j ∈ {i | L(iQ) ≠ L((i − 1)Q)} there exist nonnegative integers l_1, . . . , l_t such that
\[ j = \sum_{i=1}^{t} -l_i\, v_Q(X_i \bmod I). \]
The Weierstrass form of an elliptic curve can be considered as a special case of a curve in special position.

Proposition 5. We retain notations from Definition 2. Then R = L(∞Q), and the affine algebraic curve defined by I is nonsingular.
If an algebraic curve is not in special position, the proposed method cannot be applied to it. We can put an arbitrary algebraic curve into special position using Gröbner bases if we know elements in the function field which have their unique pole at some place Q of degree one and whose pole orders generate the Weierstrass semigroup −v_Q(L(∞Q) \ {0}) [23, p. 1739]. In another direction, it is convenient to have a class of algebraic curves known to be in special position. Miura found the necessary and sufficient condition for a nonsingular nonrational affine plane curve to be in special position [19,20]. An affine algebraic set defined by F(X, Y) = 0 is a nonsingular nonrational affine algebraic curve in special position with respect to Q if and only if it is nonsingular and
\[ F(X, Y) = \alpha_{b,0} X^b + \alpha_{0,a} Y^a + \sum_{ai+bj < ab} \alpha_{i,j} X^i Y^j, \]
where α_{i,j} ∈ K, both α_{b,0} and α_{0,a} are nonzero, and a and b are relatively prime positive integers.¹ In the above situation v_Q(X mod F(X, Y)) = −a and v_Q(Y mod F(X, Y)) = −b. Miura then generalized this result to curves in an affine space of arbitrary dimension [19,20]. We use the theory of Gröbner bases; basic facts of the theory are explained in [5]. Hereafter I ⊂ K[X_1, . . . , X_t] denotes an ideal defining an algebraic curve in special position with respect to a place Q of degree one of the function field F of the curve, unless otherwise stated. We fix a monomial order ≺ on K[X_1, . . . , X_t] induced by the discrete valuation at Q. ℕ_0 denotes the set of nonnegative integers.
¹ Although all published proofs of that fact are in Japanese, an English proof can be found in [14].
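The displayed condition is mechanical to test. The following sketch is our own helper, not part of the paper (and the nonsingularity hypothesis still has to be checked separately); it verifies the shape of F(X, Y) for given weights a and b, using the Weierstrass form mentioned above as an example.

```python
# Sketch: check whether a bivariate polynomial F(X, Y), given as a dict
# {(i, j): coeff}, has the shape required for special position: nonzero
# coefficients at X^b and Y^a, gcd(a, b) = 1, and every remaining monomial
# X^i Y^j satisfying a*i + b*j < a*b.
from math import gcd

def is_special_position_form(F, a, b):
    if gcd(a, b) != 1:
        return False
    if F.get((b, 0), 0) == 0 or F.get((0, a), 0) == 0:
        return False  # the terms X^b and Y^a must both appear
    return all(a * i + b * j < a * b
               for (i, j), c in F.items()
               if c != 0 and (i, j) not in [(b, 0), (0, a)])

# Weierstrass curve Y^2 = X^3 + A*X + B, i.e. F = Y^2 - X^3 - A*X - B,
# with weights a = 2 (pole order of X) and b = 3 (pole order of Y):
F = {(0, 2): 1, (3, 0): -1, (1, 0): -4, (0, 0): -7}  # A = 4, B = 7, say
print(is_special_position_form(F, a=2, b=3))  # True
```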
Definition 3. We define X_1^{m_1} X_2^{m_2} ··· X_t^{m_t} ≺ X_1^{n_1} X_2^{n_2} ··· X_t^{n_t} if −v_Q(X_1^{m_1} ··· X_t^{m_t} mod I) < −v_Q(X_1^{n_1} ··· X_t^{n_t} mod I), or if −v_Q(X_1^{m_1} ··· X_t^{m_t} mod I) = −v_Q(X_1^{n_1} ··· X_t^{n_t} mod I) and (m_1, . . . , m_t)
Let NF(I) be the set of polynomials F ∈ K[X_1, . . . , X_t] such that the remainder on division of F by a Gröbner basis of I is F itself. An element f ∈ L(∞Q) is represented in a computer by the polynomial F ∈ NF(I) such that F mod I = f. By Propositions 6 and 7, {X^N | N ∈ B(≺)} is a K-basis of NF(I). If X_1^{n_1} X_2^{n_2} ··· X_t^{n_t} is the leading monomial of F ∈ NF(I), then v_Q(F mod I) = −a_1 n_1 − ··· − a_t n_t, because the lower terms of F with respect to the monomial order ≺ have higher discrete valuations at Q by the definition of B(≺). This easy method for computing discrete valuations is crucial in Theorem 1.
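The valuation rule just stated is a one-line computation once the weights a_k = −v_Q(X_k mod I) are known; the sketch below (our own code, with hypothetical weights) illustrates it.

```python
# Sketch of the valuation rule above: if X_1^{n_1} ... X_t^{n_t} is the
# leading monomial of F in NF(I) and a_k = -v_Q(X_k mod I), then
# -v_Q(F mod I) = a_1*n_1 + ... + a_t*n_t.

def pole_order(leading_exponents, weights):
    """Pole order at Q of an element of NF(I), read off its leading monomial."""
    return sum(a * n for a, n in zip(weights, leading_exponents))

# For a plane curve with v_Q(X) = -2 and v_Q(Y) = -3, the leading monomial
# X^2 Y of some F in NF(I) gives pole order 2*2 + 3*1 = 7:
print(pole_order((2, 1), (2, 3)))  # 7
```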
4 Fast Computation of Ideal Quotients
In this section we show how to compute various ideal operations in L(∞Q) efficiently. We retain notations from the previous section and define a := −v_Q(X_1 mod I). To make the computation most efficient, one should make a (≠ 0) as small as possible.

4.1 Representation of Ideals
Definition 6. For a nonzero ideal J ⊂ L(∞Q), we call G_0, . . . , G_{a−1} ∈ NF(I) a standard basis for J if:
1. G_0 mod I, . . . , G_{a−1} mod I belong to J.
2. −v_Q(G_i mod I) = min{j ∈ −v_Q(J \ {0}) | j mod a = i}.
Note that by definition G_i mod I ≠ 0 for i = 0, . . . , a − 1. This representation is convenient for the computation of a basis of L(D).

Theorem 1. Suppose that B is an effective divisor with v_Q(B) = 0 and G_0, . . . , G_{a−1} is a standard basis for L(−B + ∞Q). Then a basis of L(−B + nQ) is
\[ \{ X_1^i G_j \bmod I \mid v_Q(X_1^i G_j \bmod I) \ge -n \}. \]
Proof. The assertion follows from Proposition 2.
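Theorem 1 turns the basis computation into bookkeeping with pole orders. The following sketch is our own illustration (the G_j are represented only through their pole orders b_j = −v_Q(G_j mod I), and a = −v_Q(X_1 mod I)): multiplying by X_1 raises the pole order by a, so the elements X_1^i G_j with a·i + b_j ≤ n form the desired basis, with pairwise distinct pole orders by the standard-basis property.

```python
# Sketch of the basis extraction in Theorem 1 (illustrative only).

def basis_of_L(pole_orders_of_G, a, n):
    """List the pairs (i, j) with pole order a*i + b_j <= n, sorted by pole order."""
    basis = []
    for j, b in enumerate(pole_orders_of_G):
        i = 0
        while a * i + b <= n:
            basis.append((i, j, a * i + b))  # (power of X_1, index j, pole order)
            i += 1
    return sorted(basis, key=lambda e: e[2])

# Hypothetical standard basis with a = 4 and pole orders (0, 5, 6, 11):
print(basis_of_L([0, 5, 6, 11], a=4, n=13))
```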
We now describe how to compute G_0, . . . , G_{a−1} from given F_1, . . . , F_s ∈ K[X_1, . . . , X_t] such that F_1 mod I, . . . , F_s mod I generate J. For simplicity we assume that none of the F_i mod I is zero. Let T_0, . . . , T_{a−1} ∈ T(≺) satisfy −v_Q(X^{T_i} mod I) = min{j ∈ −v_Q(L(∞Q) \ {0}) | j mod a = i}. Then {X^{T_i} F_j mod I | 0 ≤ i ≤ a − 1, 1 ≤ j ≤ s} generates J as a K[X_1 mod I]-module, since {X^{T_i} mod I} generates L(∞Q) as a K[X_1 mod I]-module by Proposition 7. Let {H_l} be the set of remainders on division of X^{T_i} F_j by a
Gröbner basis of I for 0 ≤ i ≤ a − 1 and 1 ≤ j ≤ s. Then the K-vector space generated by H_1, . . . , H_{sa} generates J as a K[X_1 mod I]-module, and by Proposition 4 we can find G_0, . . . , G_{a−1} from the K-vector space generated by H_1, . . . , H_{sa}. We compute G_0, . . . , G_{a−1} by Gaussian elimination as follows. Let {B_1, B_2, . . .} = ∆(I) be indexed such that
\[ v_Q(X^{B_i} \bmod I) > v_Q(X^{B_{i+1}} \bmod I), \qquad (1) \]
and define the integer µ by the equation −v_Q(X^{B_µ} mod I) = max{−v_Q(H_i mod I) | i = 1, . . . , sa}. Write each polynomial H_i as
\[ H_i = m_{i1} X^{B_\mu} + m_{i2} X^{B_{\mu-1}} + \cdots + m_{i\mu} X^{B_1} \]
for each i. Note that X^{B_1} = 1. Consider the matrix (m_{ij}). By elementary row operations, we can transform the matrix (m_{ij}) into a form such that for any two nonzero rows the columns of their leftmost nonzero elements are different. Let (n_{ij}) be such a transformed matrix of (m_{ij}), and set
\[ E_i := \sum_{j=1}^{\mu} n_{ij} X^{B_{\mu+1-j}}. \]
Since the leading monomials of E_k and E_l are different if k ≠ l, we have v_Q(E_k mod I) ≠ v_Q(E_l mod I). Thus {v_Q(E_i mod I) | 1 ≤ i ≤ sa} equals v_Q(⟨H_1 mod I, . . . , H_{sa} mod I⟩ \ {0}), where ⟨·⟩ denotes the vector space generated by ·. Hence we can choose G_0, . . . , G_{a−1} as G_i = E_k ∉ I, where −v_Q(E_k mod I) = min{j ∈ {−v_Q(E_1 mod I), . . . , −v_Q(E_{sa} mod I)} | j mod a = i}. Since (m_{ij}) is a µ × sa matrix, the number of arithmetic operations in K required to compute (n_{ij}) from (m_{ij}) is of order O(max{µ, sa}³), and
\[ \mu \le \max\{-v_Q(H_i \bmod I) \mid i = 1, \ldots, sa\} = \max\{-v_Q(F_i \bmod I) \mid i = 1, \ldots, s\} + \max\{-v_Q(X^{T_i} \bmod I) \mid i = 0, \ldots, a-1\}. \]
These G_0, . . . , G_{a−1} have a nice property, which is convenient for the computation of an ideal quotient.

Proposition 8. Let G be a Gröbner basis for I. Then {G_0, . . . , G_{a−1}} ∪ G is a Gröbner basis for I + ⟨G_0, . . . , G_{a−1}⟩, where ⟨·⟩ denotes the ideal generated by ·.
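A minimal sketch of this elimination step follows (our own code, using the rationals as a stand-in for K; the membership test E_k ∉ I from the text is omitted for brevity). Rows are coefficient vectors of H_1, . . . , H_{sa} on the monomials X^{B_µ}, . . . , X^{B_1}, ordered by decreasing pole order left to right as in Eq. (1); after reduction, the nonzero rows have pairwise distinct leading columns, hence pairwise distinct pole orders, and we keep the row of smallest pole order in each residue class mod a.

```python
from fractions import Fraction

def distinct_pivots(rows):
    """Row-reduce until the leftmost nonzero columns are pairwise distinct."""
    pivot_of = {}                        # column index -> row already using it
    rows = [list(map(Fraction, r)) for r in rows]
    for r in range(len(rows)):
        while True:
            piv = next((c for c, x in enumerate(rows[r]) if x != 0), None)
            if piv is None or piv not in pivot_of:
                if piv is not None:
                    pivot_of[piv] = r
                break
            other = rows[pivot_of[piv]]  # cancel the shared leading column
            factor = rows[r][piv] / other[piv]
            rows[r] = [x - factor * y for x, y in zip(rows[r], other)]
    return [r for r in rows if any(x != 0 for x in r)]

def standard_basis(rows, pole_order_of_column, a):
    """residue class i -> (pole order, row) of minimal pole order in that class."""
    best = {}
    for row in distinct_pivots(rows):
        piv = next(c for c, x in enumerate(row) if x != 0)
        po = pole_order_of_column[piv]   # pole order attached to that monomial
        if po % a not in best or po < best[po % a][0]:
            best[po % a] = (po, row)
    return best
```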
If one does not need efficiency and wants to reduce the effort required to implement our algorithm, there is an alternative approach to compute a standard basis for J.

Proposition 9. Let G be a Gröbner basis for I + ⟨F_1, . . . , F_s⟩. Then we can find all elements of a standard basis for J among {the remainders on division of X^T H | T ∈ T(≺), H ∈ G}.
4.2 Ideal Quotient
Suppose that a standard basis of an ideal J_1 ⊂ L(∞Q) is {G_0, . . . , G_{a−1}}, and that an ideal J_2 ⊂ L(∞Q) is generated by {H_1 mod I, . . . , H_b mod I}, where H_i ∈ K[X_1, . . . , X_t] for each i. We would like to compute a standard basis of J_1 : J_2. Obviously J_1 ⊆ J_1 : J_2. Let F_0, . . . , F_{a−1} be a standard basis of J_1 : J_2. Each F_i is determined by the following algorithm. Each element in B(≺) is indexed as in Eq. (1).

Algorithm 2. In this algorithm, the variables are integers α, γ, a polynomial element-in-quotient ∈ K[X_1, . . . , X_t], and a polynomial candidate ∈ K(β_1, . . . , β_{α−1})[X_1, . . . , X_t], where β_j is an indeterminate over K for each j.
1. Let element-in-quotient = G_i. Find an integer α such that B_α ∈ B(≺) and X_1 X^{B_α} = LM(G_i). If there is no such α, then F_i = G_i and the algorithm terminates.
2. Let candidate = X^{B_α} + β_{α−1} X^{B_{α−1}} + ··· + β_1. Let E_j be the remainder on division of H_j × candidate by I + ⟨G_0, . . . , G_{a−1}⟩. We view E_j as a polynomial in the variables X_1, . . . , X_t over the coefficient field K(β_1, . . . , β_{α−1}). Since the Gröbner basis of I + ⟨G_0, . . . , G_{a−1}⟩ is contained in K[X_1, . . . , X_t], each coefficient of E_j is a K-linear combination of 1, β_1, . . . , β_{α−1}. Let (δ_1, . . . , δ_{α−1}) ∈ K^{α−1}. The element of L(∞Q) represented by candidate with (β_1, . . . , β_{α−1}) replaced by (δ_1, . . . , δ_{α−1}) belongs to J_1 : J_2 if and only if E_j with (β_1, . . . , β_{α−1}) replaced by (δ_1, . . . , δ_{α−1}) is zero for j = 1, . . . , b. Thus we consider the linear system of equations in the variables β_1, . . . , β_{α−1} stating that every coefficient of E_j is zero for j = 1, . . . , b. If the linear system has no solution, then element-in-quotient has the minimum pole order at Q among the elements of J_1 : J_2 whose pole order is congruent to i modulo a; thus F_i = element-in-quotient, and the algorithm terminates. Otherwise, update element-in-quotient to candidate with β_1, . . . , β_{α−1} substituted by a solution of the linear system. Find the integer γ with B_γ = B_α − (1, 0, . . . , 0), update α = γ, and repeat this process. If there is no such γ, then F_i = element-in-quotient and the algorithm terminates.

The number of iterations in the algorithm above to compute each F_i is at most
\[ a + \#\bigl(\mathrm{LM}(\langle F_0, \ldots, F_{a-1}\rangle + I) \setminus \mathrm{LM}(\langle G_0, \ldots, G_{a-1}\rangle + I)\bigr) = a + \#\bigl(\Delta(\langle G_0, \ldots, G_{a-1}\rangle + I) \setminus \Delta(\langle F_0, \ldots, F_{a-1}\rangle + I)\bigr) = a + \dim (J_1 : J_2)/J_1, \]
where (J_1 : J_2)/J_1 is the factor space of J_1 : J_2 modulo J_1. If J_1 = L(−A + ∞Q) and J_2 = L(−B + ∞Q) with divisors A ≥ B ≥ 0, then
\[ \dim (J_1 : J_2)/J_1 = \dim L(B - A + \infty Q)/L(-A + \infty Q) \quad \text{(by Corollary 1)} \]
\[ = \dim L(\infty Q)/L(-A + \infty Q) - \dim L(\infty Q)/L(B - A + \infty Q) \]
\[ = \deg A - (\deg A - \deg B) \quad \text{(by [7, Exercise 11.13])} \]
\[ = \deg B. \]

Remark 1. When one does not need efficiency, an ideal quotient can be computed in the standard way described in [5].
Acknowledgments

The first author would like to thank Mr. Arita at NEC C & C Media Laboratory. He realized Proposition 1 in email discussion with Mr. Arita on his efficient algorithm performing additions in the Jacobian of an algebraic curve [1].
References
1. S. Arita, Public-key cryptosystems with C_ab curves (1), Technical Report ISEC97-54, Institute of Electronics, Information and Communication Engineers, December 1997 (Japanese).
2. T.G. Berry, Construction of linear systems on hyperelliptic curves, J. Symbolic Comput. 26 (1998), no. 3, 315–327.
3. A. Brill and M. Noether, Ueber die algebraischen Functionen und ihre Anwendung in der Geometrie, Math. Ann. 7 (1874), 269–310.
4. J. Coates, Construction of rational functions on a curve, Proc. Cambridge Phil. Soc. 68 (1970), 105–123.
5. D. Cox, J. Little, and D. O'Shea, Ideals, varieties, and algorithms, second ed., Springer-Verlag, Berlin, 1996.
6. J.H. Davenport, On the integration of algebraic functions, Lecture Notes in Computer Science, vol. 102, Springer-Verlag, Berlin, 1981.
7. D. Eisenbud, Commutative algebra with a view toward algebraic geometry, Graduate Texts in Mathematics, vol. 150, Springer-Verlag, Berlin, 1995.
8. G.L. Feng and T.R.N. Rao, Decoding algebraic geometric codes up to the designed minimum distance, IEEE Trans. Inform. Theory 39 (1993), 36–47.
9. R. Ganong, On plane curves with one place at infinity, J. Reine Angew. Math. 307/308 (1979), 173–193.
10. D.R. Grayson and M.E. Stillman, User's manual of Macaulay2 version 0.8.41, http://www.math.uiuc.edu/Macaulay2, March 1998.
11. G. Haché and D. Le Brigand, Effective construction of algebraic geometry codes, IEEE Trans. Inform. Theory 41 (1995), no. 6, 1615–1628.
12. M.D. Huang and D. Ierardi, Efficient algorithms for the Riemann-Roch problem and for addition in the Jacobian of a curve, J. Symbolic Comput. 18 (1994), 519–539.
13. D. Le Brigand and J.J. Risler, Algorithme de Brill-Noether et codes de Goppa, Bull. Soc. Math. France 116 (1988), no. 2, 231–253.
14. R. Matsumoto, The C_ab curve, http://tskwww.ss.titech.ac.jp/~ryutaroh/cab.html, December 1998.
15. R. Matsumoto and S. Miura, On the Feng-Rao bound for the L-construction of algebraic geometry codes, submitted to IEICE Trans. Fundamentals, 1999.
16. R. Matsumoto, M. Oishi, and K. Sakaniwa, Fast encoding of algebraic geometry codes, submitted to IEEE Trans. Inform. Theory, 1999.
17. S. Miura, Algebraic geometric codes on certain plane curves, Trans. IEICE J75-A (1992), no. 11, 1735–1745 (Japanese).
18. S. Miura, Constructive theory of algebraic curves, Proc. 17th Symp. Inform. Theory and Its Appl., December 1994, pp. 461–464 (Japanese).
19. S. Miura, Ph.D. thesis, Univ. Tokyo, 1997 (Japanese).
20. S. Miura, Linear codes on affine algebraic curves, Trans. IEICE J81-A (1998), no. 10, 1398–1421 (Japanese).
21. S.C. Porter, Decoding codes arising from Goppa's construction on algebraic curves, Ph.D. thesis, Yale Univ., New Haven, CT, 1988.
22. S.C. Porter, B.Z. Shen, and R. Pellikaan, Decoding geometric Goppa codes using an extra place, IEEE Trans. Inform. Theory 38 (1992), no. 6, 1663–1676.
23. K. Saints and C. Heegard, Algebraic-geometric codes and multidimensional cyclic codes: A unified theory and algorithms for decoding using Gröbner bases, IEEE Trans. Inform. Theory 41 (1995), no. 6, 1733–1751.
24. H. Stichtenoth, Algebraic function fields and codes, Springer-Verlag, Berlin, 1993.
25. E.J. Volcheck, Computing in the Jacobian of a plane algebraic curve, Proc. Algorithmic Number Theory I, Lecture Notes in Computer Science, vol. 877, Springer-Verlag, 1994, pp. 221–233.
26. E.J. Volcheck, Addition in the Jacobian of a curve over a finite field, http://acm.org/~volcheck/, 1995.
27. O. Zariski and P. Samuel, Commutative algebra, Graduate Texts in Mathematics, vols. 28–29, Springer-Verlag, Berlin, 1975.
Critical Noise for Convergence of Iterative Probabilistic Decoding with Belief Propagation in Cryptographic Applications

Marc P.C. Fossorier¹, Miodrag J. Mihaljević², and Hideki Imai³

¹ Department of Electrical Engineering, University of Hawaii, 2540 Dole St., Holmes Hall 483, Honolulu, HI 96822, USA, marc@spectra.eng.hawaii.edu
² Mathematical Institute, Serbian Academy of Science and Arts, 11001 Belgrade, Yugoslavia, emihalje@ubbg.etf.bg.ac.yu
³ University of Tokyo, Institute of Industrial Science, 7-22-1 Roppongi, Minato-ku, Tokyo 106, Japan, imai@iis.u-tokyo.ac.jp
Abstract. In this paper, the critical noise beyond which no convergence can be expected is determined for iterative decoding with belief propagation of binary linear block codes over the binary symmetric channel. This value is derived by developing the self-composition channel model first introduced for iterative a-posteriori-probability decoding. These results are then applied to the cryptanalysis of a keystream generator based on linear feedback shift registers.
1 Introduction
Iterative decoding is a powerful method for efficient decoding of certain block codes. A number of algorithms for iterative decoding of certain binary block codes have been developed and analyzed; note that some of them are presented and considered in crypto-oriented forms. Iterative decoding techniques originated from [6], where Gallager proposed two algorithms for decoding his low-density parity-check (LDPC) codes. The first one is a simple flipping decoding approach based on the following rule: flip a bit if the majority of its checks indicates so. In this decoding scheme, the decoder computes all the parity checks and then changes any bit that is contained in more than some fixed number of unsatisfied parity-check equations. Using these new values, the parity checks are recomputed, and the process is repeated until either all parity checks are satisfied or a predetermined number of iterations
This research was supported in part by the National Science Foundation under Grant CCR-9732959, by the Science Fund of Serbia under Grant 04M02, through the Mathematical Institute, Serbian Academy of Science and Arts, and by the Japan Society for Promotion of Science (JSPS) under Contract JSPS-RFTF96P00604.
is reached. The simple flipping iterative decoding approach has also been employed and analyzed in [1,19], and in crypto-oriented forms in [21]. The second algorithm of [6] is based on the approach now known as belief propagation (BP) [16]. BP-based iterative decoding of LDPC codes has recently been considered in a number of papers [5,9]. In [9], the iterative decoding is based on the BP algorithm: for each bit, this algorithm iteratively updates the a-posteriori probability of error based on the results of the check sums intersecting on that bit and the a-priori probabilities of error associated with the bits contributing to these check sums. At the next iteration, these a-posteriori probabilities are used to re-evaluate all parity checks and become the new a-priori probabilities of error. The BP algorithm takes into account and cancels the correlations between probability values introduced by the iterative process; in fact, it would produce the exact posterior probabilities of all the bits if the bipartite graph defined by the parity-check matrix of the code contained no cycles [16]. In [5] a reduced-complexity BP-based iterative decoding algorithm is proposed and applied to decoding LDPC codes; it achieves a good performance-complexity trade-off. Finally, several results related to decoding procedures based on a-posteriori probability (APP) threshold decoding [11] and the iterative principle presented in [3, pp. 152-153] have been reported. Essentially using the underlying ideas from [3,6], a number of iterative error-correction decoding algorithms have been developed and analyzed, for example in [4,8,12,14,20]; note that the algorithms in [4,12,14] are presented in crypto-oriented forms. These APP-based algorithms are simpler than, but not as efficient as, the BP-based algorithms, since they neglect the correlations between the values updated by the iterative method. Also, an approach applicable to the previous decoding problems has been reported in [10]; it is based on approximating the posterior probabilities using a continuous optimization algorithm. As pointed out in [9], the employed iterative procedure is not optimal, but it is practical. Accordingly, the reported iterative algorithms for decoding certain binary block codes can be classified into the following three classes: (i) simple-flipping-based iterative decoding, (ii) APP-based iterative decoding, and (iii) BP-based iterative decoding. One of the main issues regarding iterative decoding is convergence through the iterations. The reported experimental results show that the iterative processing converges with high probability to the true solution, assuming sufficiently low noise, in all three classes of algorithms (i)-(iii). For algorithms from the simple flipping class (i), some analytical results related to the convergence conditions for successful iterative error-correction decoding have been reported in [6,19,21]. Some convergence analysis results related to the class of APP algorithms (ii) are reported in [2,12,15]. Analytical convergence considerations for algorithms from the BP class (iii) have mainly been based on the result from [16], or on the consideration of some simple algorithms, as reported in [6,9]. Recently, the convergence of BP-based algorithms for LDPC codes has been analyzed in detail for various channel models in [7,17].
In this paper we consider the convergence of BP-based iterative decoding algorithms in conjunction with a cryptographic application. To this end, the critical noise beyond which no convergence can be expected is determined for iterative decoding with BP of binary linear block codes over the binary symmetric channel (BSC). This critical noise value is derived based on the self-composition channel model introduced in [15] for APP decoding. These results are then applied to the analysis of certain cryptographic pseudorandom bit generators (for stream ciphers) based on linear feedback shift registers (LFSR's). The main crypto goal of this paper is to point out the gain due to the replacement of an APP-based decoding procedure by a BP-based one, assuming that all other parameters of the algorithm for the cryptanalysis are the same.
2 Preliminaries
We consider a binary linear (n, k) block code C with parity-check matrix H to be used on a BSC with crossover probability p. The effect of a BSC with error probability p is modeled by an n-dimensional binary random variable E defined over {0,1}^n with independent coordinates E_i such that Pr(E_i = 1) = p, i = 1, 2, . . . , n. Applying a codeword x = [x_i]_{i=1}^{n} ∈ C to the input of the BSC, we get the random variable Y = E ⊕ x as the received word at its output. Let y = [y_i]_{i=1}^{n} and e = [e_i]_{i=1}^{n} denote particular values of the random variables Y and E, respectively. Let P^{(j)} = [P_i^{(j)}]_{i=1}^{n} denote an error probability vector, with coordinates in [0,1], after the j-th iteration step. More precisely, P_i^{(j)} stands for the posterior probability of error for the i-th received bit after the j-th iteration step. Also, let y^{(j)} denote the modified word after the j-th iteration step. In general, an iterative probabilistic decoding algorithm (IPDA) performs the following steps (a minimal code sketch of one iteration is given at the end of this section):
1. Input: received word y.
2. Initialization: set P_i^{(0)} = p, i = 1, 2, . . . , n, and y^{(0)} = y.
3. Iterative probabilistic error-correction: for j = 1, 2, . . . , j_max:
- compute P^{(j)} as the vector of posterior error probabilities using P^{(j−1)} as the vector of prior error probabilities (see equation (2) below),
- if P_i^{(j)} > 0.5, then set y_i^{(j)} = y_i^{(j−1)} ⊕ 1 and P_i^{(j)} = 1 − P_i^{(j)}, i = 1, 2, . . . , n.
4. Output: estimated codeword x̂ = y^{(j_max)}.
The posterior error probabilities are computed by using appropriate parity checks which correspond to codewords of the dual code. We assume that for each bit, the parity checks used are orthogonal on that bit, meaning that except for that bit, every other involved bit appears in exactly one of the parity checks. For any i, i = 1, 2, . . . , n, let H^{(i)} be the matrix whose rows are the dual-code codewords corresponding to the chosen i-th set of parity checks. Let J_i(w) denote the number of rows in H^{(i)} containing exactly w + 1 ones. Given a received word y, let s_i(w) denote the number of satisfied (zero-valued) parity checks among the corresponding J_i(w) parity checks, each involving w + 1 bits, and let
s_i = [s_i(w)]_{w=1}^{n−1}. Let S_i = [S_i(w)]_{w=1}^{n−1} denote the corresponding random variable depending on the random variable E. For each i, i = 1, 2, . . . , n, let q_i(y) or, simply, q_i denote the ratio of posterior error probabilities defined by
\[ q_i = \frac{\Pr(E_i = 1 \mid H^{(i)}E = H^{(i)}y)}{1 - \Pr(E_i = 1 \mid H^{(i)}E = H^{(i)}y)}. \qquad (1) \]
Let p = [p_i]_{i=1}^{n} be the vector of prior error probabilities. Then, for orthogonal parity checks,
\[ q_i = \frac{p_i}{1-p_i} \prod_{x' \in H^{(i)}} \left( \frac{\hat{p}(x')}{1-\hat{p}(x')} \right)^{1-2\sigma(y,x')}, \qquad (2) \]
where for every codeword x' from H^{(i)}, 1 − 2p̂(x') = ∏(1 − 2p_l), the product being over all l ≠ i such that x'_l = 1, and σ(y, x') is the value of the parity check determined by x'.
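As promised above, here is a minimal sketch of one IPDA iteration with the APP update (2); the data layout (checks[i] lists the parity checks orthogonal on bit i, each as the set of positions it involves, including i) is our own illustrative choice, not the authors' code.

```python
# One iteration of the IPDA with posterior ratios computed as in Eq. (2).

def app_iteration(y, prior, checks):
    n = len(y)
    posterior = [0.0] * n
    for i in range(n):
        q = prior[i] / (1.0 - prior[i])
        for check in checks[i]:
            sigma = sum(y[l] for l in check) % 2           # parity-check value
            prod = 1.0
            for l in check:
                if l != i:
                    prod *= 1.0 - 2.0 * prior[l]
            p_hat = (1.0 - prod) / 2.0                     # 1 - 2*p_hat = prod
            q *= (p_hat / (1.0 - p_hat)) ** (1 - 2 * sigma)
        posterior[i] = q / (1.0 + q)
    # flip the bits whose posterior error probability exceeds 0.5
    y_new = [yi ^ 1 if Pi > 0.5 else yi for yi, Pi in zip(y, posterior)]
    posterior = [1.0 - Pi if Pi > 0.5 else Pi for Pi in posterior]
    return y_new, posterior
```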
3 Convergence Analysis for Iterative APP Decoding
The iterative APP decoding updates iteratively the values q_i for each bit i, i = 1, 2, . . . , n. As a result, if q_1, q_2, · · · , q_n are the values computed at iteration j, we simply substitute P^{(j)} = (q_1/(1 + q_1), q_2/(1 + q_2), · · · , q_n/(1 + q_n)) in the general algorithm presented in Section 2. For simplicity, suppose that the parity-check numbers J_i(w) are mutually equal for each w and different i, that is, J_i(w) = J(w), i = 1, 2, . . . , n, w = 1, 2, . . . , n − 1. This can be obtained by reducing the original numbers to their minimum value, which then leads to a conservative estimate of the critical noise rate. In cryptographic applications, where we deal with low-rate truncated cyclic codes, this can be a very good approximation. Furthermore, assume that all parity checks considered have the same weight, so that J(w) = J. For the convergence analysis, the particular case when p_i = p, i = 1, 2, . . . , n, turns out to be of special importance [15]. It corresponds to the first iteration step of the algorithm. In this case, for S_i = s,
\[ q_i = \frac{p}{1-p}\left(\frac{1+(1-2p)^w}{1-(1-2p)^w}\right)^{J-2s}. \qquad (3) \]
For a given value s, define the Bayes probability of error for each bit i after the first iteration step of the decoding algorithm as
\[ P_B(s) = \min\{\Pr(E_i = 1 \mid H^{(i)}y),\, 1 - \Pr(E_i = 1 \mid H^{(i)}y)\} = \min\{\Pr(E_i = 1 \mid S_i = s),\, 1 - \Pr(E_i = 1 \mid S_i = s)\} = \begin{cases} \dfrac{q_i}{1+q_i} & \text{if } q_i \le 1, \\[4pt] \dfrac{1}{1+q_i} & \text{if } q_i > 1. \end{cases} \qquad (4) \]
The average Bayes probability of error for each bit i after the first iteration step of the decoding algorithm is given by
\[ P_{B,APP}(J,w) = \sum_{s=0}^{J} P_B(s)\,P(S_i = s) = p - \sum_{s:\, q_i > 1} P(S_i = s)\,\frac{q_i - 1}{q_i + 1}, \qquad (5) \]
where q_i is given by (3), and
\[ P(S_i = s) = p\binom{J}{s} p_w^s (1-p_w)^{J-s} + (1-p)\binom{J}{s} (1-p_w)^s p_w^{J-s} \qquad (6) \]
with p_w = (1 − (1 − 2p)^w)/2. Note that if P_B(s) = Pr(E_i = 1 | H^{(i)}y) for all i = 1, · · · , n, then no a-priori decision is modified and P_{B,APP}(J, w) = p. As a result, no convergence is possible. It follows from (5) that a necessary condition for convergence is that at least one q_i > 1. As a result, (5) suggests an equivalent average BSC with crossover probability P_{B,APP}(J, w) < p obtained from self-composition of the initial BSC. We define the critical noise value as the noise level associated with the largest crossover probability p such that there exists at least one q_i > 1. This crossover probability is defined as the critical probability for iterative APP decoding and is denoted p_{crit,APP}. It is important to notice that this definition simply implies that after iteration 1, an average probability P_{B,APP}(J, w) smaller than p is achieved, but without guarantee that the iterative algorithm will converge to a zero error probability with a sufficient number of iterations. As a result, this definition differs from that of [17], where convergence to an error-free channel is considered. In other words, in this paper the critical probability is defined as the largest p for which a better channel than the original one can be obtained, while in [17] the critical probability is defined as the largest p for which an error-free channel can be obtained. Our definition can be justified by a crypto-oriented motivation: even with residual errors, an information set decoding approach can complete the iterative decoding, resulting in an unsecured cryptosystem.
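The quantities (3), (5), (6) and the resulting critical probability are straightforward to evaluate numerically. The sketch below is our own helper code, not the authors'; the last function anticipates Theorem 1 of Section 4, by which the BP figure is the APP figure computed with J − 1 checks.

```python
from math import comb

def P_B_APP(J, w, p):
    """Average Bayes error probability after iteration 1, Eqs. (3), (5), (6)."""
    r = (1.0 + (1.0 - 2.0 * p) ** w) / (1.0 - (1.0 - 2.0 * p) ** w)
    pw = (1.0 - (1.0 - 2.0 * p) ** w) / 2.0
    out = p
    for s in range(J + 1):
        q = (p / (1.0 - p)) * r ** (J - 2 * s)            # Eq. (3)
        if q > 1.0:
            Ps = (p * comb(J, s) * pw ** s * (1.0 - pw) ** (J - s)
                  + (1.0 - p) * comb(J, s) * (1.0 - pw) ** s * pw ** (J - s))
            out -= Ps * (q - 1.0) / (q + 1.0)             # Eq. (5)
    return out

def p_crit_APP(J, w, step=1e-4):
    """Largest p (on a grid) for which P_B,APP(J, w) < p, i.e. some q_i > 1."""
    p = step
    while p < 0.5 and P_B_APP(J, w, p) < p:
        p += step
    return p - step

def p_crit_BP(J, w):
    """Theorem 1 of Section 4: BP behaves like APP with J - 1 checks."""
    return p_crit_APP(J - 1, w)

print(round(p_crit_APP(3, 5), 3), round(p_crit_BP(3, 5), 3))  # ~0.091, ~0.039
```

For J = 3 and w + 1 = 6 this reproduces the first row of Table 1 in the next section.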
4 Convergence Analysis for Iterative BP Decoding
The iterative decoding based on BP updates J + 1 values for each bit i, i = 1, 2, . . . , n [9,16]. As for iterative APP decoding, for each bit i the a-posteriori value q_i evaluated from the J check sums intersecting on bit i is updated iteratively. However, for each check sum l, l = 1, 2, . . . , J, intersecting on bit i, J values q_{l,i} are also updated iteratively. The value q_{l,i} is evaluated via the J − 1 check sums other than check sum l intersecting on bit i. Therefore, the value q_{l,i} corresponds to the probability that bit i is in error, given the information obtained via checks other than check l. At iteration j, the decision about bit i is made based on q_i, which has been computed from the values q_{l,i} evaluated at iteration j − 1. As a result, the BSC obtained by self-composition of the
original BSC and corresponding to the values q_{l,i} has to be considered after iteration 1. To evaluate the value q_{l,i}, the result of check sum l is discarded. For s < J, define c(s) as the event that an unsatisfied check sum l is discarded, and for s > 0, define c(s − 1) as the event that a satisfied check sum l is discarded. Let C represent the corresponding random variable. It follows that for each bit i and each value of s, 0 ≤ s ≤ J, the ratio of posterior error probabilities
\[ q_i = \frac{\Pr(E_i = 1 \mid S_i, C)}{1 - \Pr(E_i = 1 \mid S_i, C)} \qquad (7) \]
can take one of the two following possible values:
\[ q_{i,1} = \frac{\Pr(E_i = 1 \mid S_i = s, C = c(s))}{1 - \Pr(E_i = 1 \mid S_i = s, C = c(s))} = \frac{p}{1-p}\left(\frac{1+(1-2p)^w}{1-(1-2p)^w}\right)^{J-1-2s}, \qquad (8) \]
\[ q_{i,2} = \frac{\Pr(E_i = 1 \mid S_i = s, C = c(s-1))}{1 - \Pr(E_i = 1 \mid S_i = s, C = c(s-1))} = \frac{p}{1-p}\left(\frac{1+(1-2p)^w}{1-(1-2p)^w}\right)^{J+1-2s}. \qquad (9) \]
Note that (8) is defined for s < J, while (9) is defined for s > 0. For a given value S_i = s and a given value C = c, define the Bayes probability of error for each bit i after the first iteration step of the decoding algorithm as
\[ P_B(s,c) = \min\{\Pr(E_i = 1 \mid S_i = s, C = c),\, 1 - \Pr(E_i = 1 \mid S_i = s, C = c)\} = \begin{cases} \dfrac{q_i}{1+q_i} & \text{if } q_i \le 1, \\[4pt] \dfrac{1}{1+q_i} & \text{if } q_i > 1, \end{cases} \qquad (10) \]
where q_i is defined in (7). The average Bayes probability of error for each bit i after the first iteration step of the decoding algorithm is given by
\[ P_{B,BP}(J,w) = \sum_{s}\sum_{c} P_B(s,c)\,P(S_i = s, C = c) = \sum_{s=0}^{J} P(S_i = s)\left[\frac{s}{J}\,P_B(s, c(s-1)) + \frac{J-s}{J}\,P_B(s, c(s))\right] \]
\[ = p - \left[ \sum_{s:\, q_{i,1} > 1} \frac{J-s}{J}\,P(S_i = s)\,\frac{q_{i,1}-1}{q_{i,1}+1} + \sum_{s:\, q_{i,2} > 1} \frac{s}{J}\,P(S_i = s)\,\frac{q_{i,2}-1}{q_{i,2}+1} \right]. \qquad (11) \]
Pi
(j−1)
= Pi
− f(P(j−1) ).
(12)
288
Marc P.C. Fossorier, Miodrag J. Mihaljevi´c, and Hideki Imai
Unfortunately, this result no longer applies for the self composition model (j)
Pi
(0)
= Pi
− f˜(P(j−1) ),
(13)
which has to be considered for iterative BP decoding. This fact also explains the diﬀerence between our deﬁnition of the critical noise value and that of [17]. Note ﬁnally that (12) and (13) are equivalent for j = 1. As in Section 3, (11) is associated with a critical noise value and a corresponding critical probability pcrit,BP for iterative decoding based on BP. In fact, the following theorem shows that (11) can be simply obtained from (5) by considering J − 1 check sums of weight w + 1 each, as expected. Theorem 1. Let PB,AP P (J, w) and PB,BP (J, w) represent the average Bayes probabilities of error for each biti after the first iteration step of iterative APP and BP decodings, respectively. Then PB,BP (J, w) = PB,AP P (J − 1, w).
(14)
Proof. Let s1 ≤ J − 1 represent the largest value of s such that qi,1 > 1 in (11) and deﬁne for 0 ≤ s ≤ s1 , f(J, s) = (qi,1 − 1)/(qi,1 + 1). It follows that for 1 ≤ s ≤ s1 + 1, (qi,2 − 1)/(qi,2 + 1) = f(J, s − 1) and (11) can be rewritten as s s1 +1 1 s J −s PB,BP (J, w) = p −
s=0
=p−
P (Si = s) f (J, s) +
J
s1 J −s s=0
J
s=1
P (Si = s) +
J
P (Si = s) f (J, s − 1)
s+1 P (Si = s + 1) f (J, s). J
After some algebra, we obtain s+1 J −s P (Si = s) + P (Si = s + 1) J J J −1 s J −1 J−1−s =p pw (1 − pw ) + (1 − p) (1 − pw )s pJ−1−s w s s
(15)
(16)
By comparing (15) and (16) with (5) and (6), we conclude that PB,BP (J, w) is equal to the value obtained in (5) by considering J − 1 check sums of weight w + 1 for each bit, which completes the proof. Theorem 1 suggests that iterative APP decoding converges faster than iterative decoding based on BP. However, based on the model considered, we have no guarantee that convergence to the correct solution is achieved. Note ﬁnally that both iterative APP and iterative BP decodings yield the same decision at iteration1, but diﬀerent apriori probabilities are considered to evaluate the aposteriori probabilities which determine the decisions at subsequent iterations. From (14), we readily conclude that pcrit,BP < pcrit,AP P .
(17)
Critical Noise for Convergence of Iterative Probabilistic Decoding with BP
289
Since iteration1 is common to any class of iterative decoding algorithm for the BSC model, pcrit,AP P can be viewed as an upper bound on the critical probability of such decoding methods. Note that this upper bound remains valid for the stronger convergence considered in [17]. Also, since the model considered for self concatenation in Sections 3 and 4 implicitly assumes a bitﬂipping strategy, pcrit,BP can be interpreted as a more realistic estimate for practical BP based iterative decoding schemes. Note that pcrit,BP corresponds to an instance of the model considered in Example 5 of [17]. The values pcrit,AP P and pcrit,BP for diﬀerent values J and w are given in Table 1. As expected, we observe that the Table 1. Values pcrit,AP P and pcrit,BP for diﬀerent values J and w. J w + 1 pcrit,AP P pcrit,BP 3 6 0.091 0.039 4 8 0.080 0.055 5 10 0.071 0.057 3 5 0.127 0.061 4 6 0.124 0.091 3 4 0.191 0.107
values pcrit,AP P slightly overestimate the corresponding values derived in Table 4 of [17], while the values pcrit,BP are close to the values derived in Table 2 of [17].
5
Cryptanalysis of a Keystream Generator with LFSR’s
In this section, we apply the previously obtained results for improving the reported results regarding the analysis of a class of cryptographic pseudorandom bit generators for stream ciphering systems (see [13], pp. 191197 and 205207, for example). A number of the published keystream generators are based on binary LFSR’s assuming that parts of the secret key are used to load the LFSR’s initial states. The unpredictability request, which is one of the main cryptographic requests, implies that the linearity inherent in LFSR’s should not be “visible” in the generator output. One general technique for destroying the linearity is to use several LFSR’s which run in parallel, and to generate the keystream as a nonlinear function of the outputs of the component LFSR’s. Such keystream generators are called nonlinear combination generators. A central weakness of a nonlinear combination keystream generator is demonstrated in [18]. Assuming certain nonlinear functions it is shown in [18] that there is possible to reconstruct independently initial states of the LFSR’s, i.e. parts of the secret key (and accordingly the whole secret as well) based on the correlation between the keystream generator output and the output of each of the LFSR’s. The reported approach is based on exhaustive search through all possible nonzero initial states of each LFSR. Due to the exponential complexity of this approach it is not feasible when the lengths of employed LFSR’s are sufﬁciently long. Substantial improvements of the previous approach which yield
290
Marc P.C. Fossorier, Miodrag J. Mihaljevi´c, and Hideki Imai
complexity linear with the LFSR length are proposed in [12] and [21]. Further extensions and reﬁnements of this approach called fast correlation attack, as well as its analysis are presented in a number of papers including [2,4,10,14]. 5.1
Fast Correlation Attack and Decoding
The problem of the LFSR initial state reconstruction based on the keystream generator output sequence can be considered as decoding a punctured simplex code after a BSC with crossover probability uniquely determined by the correlation between the generator output and the component LFSR output. This correlation means that the modulo2 sum of the corresponding output of the LFSR and the generator output can be considered as a realization of a binary random variable which takes value 0 and 1 with the probabilities 1 − p and p, respectively, p = 0.5. Accordingly, the problem of the LFSR initial state reconstruction given the segment of the generator output can be considered as follows: (1) The nbit segment of output sequence from the klength LSFR is a codeword of an (n, k) punctured simplex code; and (2) The corresponding nbit segment of the nonlinear combination generator output is the corresponding noisy codeword obtained through BSC with crossover probability p. Also, following the approaches from [18] and [12], note that, if the ciphertext only attack is considered, then the inﬂuence of the plaintext in the previous model yields only some increase of the parameter p. The main underlying ideas for the fast correlation attacks are based on the iterative decoding principle introduced in [6]. Accordingly, all the fast correlation attacks mentioned in the previous section could be considered as variants of iterative decoding based on either simple ﬂipping or extensions of the well known APP decoding [11]. Due to the established advantages of BP based iterative decoding over iterative APP, the main objective of this section is to report results of applications of BP based iterative decoding for realizations of the fast correlation attack. 5.2
Belief Propagation Based Fast Correlation Attack
In this section, the advantages of BP based correlation attacks with respect to APP based ones are veriﬁed by simulation. We consider the decoding of the punctured simplex code of length n = 980 and dimension k = 49 deﬁned from its cyclic dual code generated by the primitive polynomial 1 + X 9 + X 49 . It follows that w + 1 = 3, while J varies between 5 and 12 depending on the position of the bit considered. We consider the BSC associated with BPSK transmission over an additive white Gaussian noise (AWGN) channel and hard decision decoding of the received values, so that p = Q( 2Eb /N0 ), where Eb and N0 /2 represent the average energy per transmitted BPSK signal and the variance of the AWGN, respectively. Note that since the keystream sequence is “superposed” to the plaintext sequence during encryption, no redundancy is added to the message text and therefore, no normalization of the transmitted average energy with respect to the channel code rate is required, as opposed to conventional channel
Critical Noise for Convergence of Iterative Probabilistic Decoding with BP
291
coding schemes. Figures 1 and 2 depict the simulation results for iterative APP decoding 0
−0.5
−1 log10(Pe)
* : Iteration 1 + : Iteration 2 . : Iteration 3
−1.5
x : Iteration 5 o : Iteration 20 −2
−2.5 −13
−12
−11
−10
−9 −8 Eb/N0 (in dB)
−7
−6
−5
−4
Fig. 1. Iterative APP decoding of the (980,49) truncated simplex code.
0
−0.5
−1
log10(Pe)
From top to bottom at SNR = −6 dB: −1.5 * : Iteration 1 + : Iteration 2 −2 . : Iteration 3 x : Iteration 5 o : Iteration 10
−2.5
* : Iteration 15 + : Iteration 20 −3
−3.5 −16
−14
−12
−10 Eb/N0 (in dB)
−8
−6
−4
Fig. 2. Iterative BP decoding of the (980,49) truncated simplex code. and BP decoding, respectively. In both cases, a maximum of 20 iterations was performed. In these ﬁgures, the word (or key) error probability is represented as a function of the SNR Eb /N0 in dB. Each key of 49 bits is encoded in systematic form into a codeword of length 980, and at the receiver, the key is retrieved from the decoded codeword based on the systematic encoding. Indeed, the error performance could be further improved by information set decoding methods, but such improvements are beyond the scope of this paper. Based on Figures 1 and 2, we observe that the iterative BP algorithm converges slower than the iterative APP algorithm, but achieves a much better error performance. Figure 3 compares APP and BP iterative decodings after 20 iterations at SNR values for which the key starts having a nonzero probability of being recovered. We observe that this event occurs for SNR ≈ 15 dB for BP decoding and SNR ≈ 12 dB for APP decoding. The corresponding crossover probabilities are p ≈ 0.40
292
Marc P.C. Fossorier, Miodrag J. Mihaljevi´c, and Hideki Imai
and p ≈ 0.36, respectively. For w + 1 = 3, we compute pcrit,BP = 0.454 from (11) and pcrit,AP P = 0.458 from (5) for J = 12, and pcrit,BP = 0.372 and pcrit,AP P = 0.399 for J = 5. Similar results were observed for n = 490, in which case J varies between 4 and 9. For w + 1 = 3, we compute pcrit,BP = 0.437 and pcrit,AP P = 0.444 for J = 9, and pcrit,BP = 0.327 and pcrit,AP P = 0.372 for J = 4, while from simulations, we record p ≈ 0.36 for iterative BP decoding and p ≈ 0.33 for iterative APP decoding. In both cases, the theoretical values provide a relatively good estimate of the critical probabilities for the iterative BP decoding algorithm. 0
−0.005
log10(Pe)
−0.01 o : APP decoding + : BP decoding −0.015
−0.02
−0.025 −16
−15
−14
−13 −12 Eb/N0 (in dB)
−11
−10
−9
[Fig. 3. Iterative APP and BP decodings of the (980,49) code after 20 iterations: log10(Pe) versus Eb/N0 (in dB).]
6 Concluding Remarks
In this paper, characteristics of BP-based iterative decoding under very high noise have been considered. In particular, the expected noise value beyond which iterative BP decoding is not feasible for the BSC model considered has been derived. The established results have been applied to improve a cryptanalytic technique: the fast correlation attack. We have shown that under the same conditions of realization for fast correlation attacks, BP-based iterative decoding provides significant improvements upon APP-based iterative decoding.
Acknowledgments

The authors wish to thank one reviewer for pointing out the work of [7,17], as well as Tom Richardson and Rüdiger Urbanke for several interesting communications which enlightened some interpretations of our results.
References
1. A.M. Chan and F.R. Kschischang, "A simple taboo-based soft-decision decoding algorithm for expander codes," IEEE Commun. Lett., vol. 2, pp. 183–185, 1998.
2. V. Chepyzhov and B. Smeets, "On fast correlation attack on certain stream ciphers," Advances in Cryptology - EUROCRYPT '91, Lecture Notes in Computer Science, vol. 547, pp. 176–185, 1991.
3. G.C. Clark, Jr. and J.B. Cain, Error-Correcting Coding for Digital Communications. New York: Plenum Press, 1982.
4. A. Clark, J.Dj. Golić, and E. Dawson, "A comparison of fast correlation attacks," Fast Software Encryption - Cambridge '96, Lecture Notes in Computer Science, vol. 1039, pp. 145–157, 1996.
5. M.P.C. Fossorier, M. Mihaljević and H. Imai, "Reduced complexity iterative decoding of low density parity check codes based on belief propagation," IEEE Trans. Commun., vol. 47, pp. 673–680, May 1999.
6. R.G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.
7. M.G. Luby, M. Mitzenmacher, M.A. Shokrollahi and D.A. Spielman, "Analysis of low density codes and improved designs using irregular graphs," 30th ACM STOC, May 1998.
8. R. Lucas, M. Bossert, and M. Breitbach, "On iterative soft-decision decoding of linear binary block codes and product codes," IEEE Jour. Select. Areas Commun., vol. 16, pp. 276–298, Feb. 1998.
9. D.J.C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, vol. 45, pp. 399–431, Mar. 1999.
10. D.J.C. MacKay, "Free energy minimization algorithm for decoding and cryptanalysis," Electronics Lett., vol. 31, pp. 446–447, Mar. 1995.
11. J.L. Massey, Threshold Decoding. Cambridge, MA: MIT Press, 1963.
12. W. Meier and O. Staffelbach, "Fast correlation attacks on certain stream ciphers," Journal of Cryptology, vol. 1, pp. 159–176, 1989.
13. A. Menezes, P.C. van Oorschot and S.A. Vanstone, Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997.
14. M. Mihaljević and J. Golić, "A comparison of cryptanalytic principles based on iterative error-correction," Advances in Cryptology - EUROCRYPT '91, Lecture Notes in Computer Science, vol. 547, pp. 527–531, 1991.
15. M. Mihaljević and J. Golić, "A method for convergence analysis of iterative error-correction decoding," ISITA 96 - 1996 IEEE Int. Symp. Inform. Theory and Appl., Victoria, B.C., Canada, Sep. 1996, Proc. pp. 802–805, and submitted to IEEE Trans. Inform. Theory, Apr. 1997.
16. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.
17. T. Richardson and R. Urbanke, "The capacity of low-density parity check codes under message-passing decoding," submitted to IEEE Trans. Inform. Theory, Nov. 1998.
18. T. Siegenthaler, "Decrypting a class of stream ciphers using ciphertext only," IEEE Trans. Comput., vol. C-34, pp. 81–85, 1985.
19. M. Sipser and D.A. Spielman, "Expander codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1710–1722, Nov. 1996.
20. K. Yamaguchi, H. Iizuka, E. Nomura and H. Imai, "Variable threshold soft decision decoding," IEICE Trans. Elect. and Commun., vol. 72, pp. 65–74, Sep. 1989.
21. K. Zeng, C.H. Yang and T.R.N. Rao, "An improved linear syndrome method in cryptanalysis with applications," Advances in Cryptology - CRYPTO '90, Lecture Notes in Computer Science, vol. 537, pp. 34–47, 1991.
An Authentication Scheme over Non-authentic Public Channel in Information-Theoretic Secret-Key Agreement

Shengli Liu and Yumin Wang

National Key Laboratory on ISN, Xidian University, Xi'an, 710071, People's Republic of China
shengliliu@hotmail.com, ymwang@xidian.edu.cn
Abstract. It is necessary to authenticate the messages sent over an insecure and non-authentic public channel in information-theoretic secret-key agreement. For a scenario where all three parties receive the output of a binary symmetric source over independent binary symmetric channels as their initial information, an authentication scheme is proposed based on coding theory, which uses the correlated strings held by the two communicants to authenticate the messages sent over the public channel. We describe how to select the coding parameters to meet the safety requirements of the authentication scheme. This paper also illustrates with an example that when the adversary's channel is noisier than the communicants' channels during the initialization phase, such an authentication scheme always exists, and that the lower bound on the length of the authenticators is closely related to the safety parameters, the code rate of the authentication scheme, and the bit error probabilities of the independent noisy channels in the initialization phase.
1 Introduction
In the past few years, information-theoretic secret-key agreement protocols [1][2][4][5][6][7], secure against adversaries with infinite computing power, have attracted much attention, for these protocols discard the unproven assumptions on the hardness of certain computational problems, such as the discrete logarithm or the integer factoring problem, which are essential to public-key cryptographic protocols. Information-theoretic secret-key agreement over an authentic public channel takes place in a scenario where two parties Alice and Bob, who want to generate a secret key, have access to random variables X and Y, respectively, whereas the adversary Eve knows a random variable Z. The three random variables X, Y, and Z are distributed according to a distribution P_XYZ, and a key agreement protocol consists of three phases: advantage distillation, where Alice and Bob use the advantage over Eve offered by the authenticity of the public channel to generate an advantage over Eve in terms of their knowledge about each other's information; information reconciliation, where Alice and Bob agree on a
This work was supported by National Natural Science Foundation of China.
mutual string S by using error-correction techniques; and privacy amplification, where the partially secret S is transformed into a shorter, highly secret string S′. Bennett et al. [6] have shown that the length of S′ can be nearly the Rényi entropy of S given Eve's complete knowledge Z = z and the entire communication held over the public channel. Most of the protocols assume the existence of an insecure but authentic channel [2][4][5][6][7], which means that Eve can eavesdrop on the communication between Alice and Bob, but she cannot modify or introduce fraudulent messages over the channel without detection. However, the existing public channels are usually non-authentic as well as insecure. In other words, Eve can see every message and replace it by an arbitrary message of her choice, and she may even impersonate either party by fraudulently initiating a protocol execution. So it is necessary to authenticate the public discussion in a secret-key agreement over a non-authentic channel. For a scenario where all three parties receive the output of a binary symmetric source over independent binary symmetric channels as their initial information, an authentication scheme is proposed based on coding theory, which uses the correlated strings between the two communicants for authentication and makes information-theoretic secret-key agreement against active adversaries possible.
2 Secret-Key Agreement and the Scenario Employed
Generally, a key-agreement protocol consists of three phases [1]:
– An initialization phase in which Alice, Bob and Eve receive random variables X, Y and Z, respectively, which are jointly distributed according to some probability distribution P_XYZ.
– During the communication phase Alice and Bob alternate sending each other messages M_1, M_2, . . . , M_t, where Alice sends M_1, M_3, . . . and Bob sends M_2, M_4, . . .. Each message depends possibly on the sender's entire view of the protocol at the time it is sent and possibly on privately generated random bits. Let t be the total number of messages and let M^t = [M_1, . . . , M_t] denote the set of exchanged messages.
– Finally, Alice and Bob each either accepts or rejects the protocol execution, depending on whether they believe to be able to generate a secret key. If Alice accepts, she generates a key S depending on her view of the protocol. Similarly, if Bob accepts, he generates a key S′ depending on his view of the protocol (maybe with the help of privacy amplification techniques).
We consider the following scenario (see Figure 1), which is inspired by [1].
1. Initialization phase: A source (maybe a satellite) broadcasts random bits U^n = (U_0, U_1, . . . , U_{n−1}) through three independent binary symmetric channels C_A, C_B and C_E with bit error probabilities ε_A, ε_B and ε_E to Alice, Bob and Eve, who receive the random variables X^n = (X_0, X_1, . . . , X_{n−1}), Y^n =
(Y_0, Y_1, . . . , Y_{n−1}), and Z^n = (Z_0, Z_1, . . . , Z_{n−1}), respectively. This means that P_{U_i}(0) = P_{U_i}(1) = 0.5 and P_{X_iY_iZ_i|U_i} = P_{X_i|U_i} · P_{Y_i|U_i} · P_{Z_i|U_i} for i = 0, 1, . . . , n − 1, and
\[ P_{X^nY^nZ^n}(x_0, \ldots, x_{n-1}, y_0, \ldots, y_{n-1}, z_0, \ldots, z_{n-1}) = \prod_{i=0}^{n-1} P_{XYZ}(x_i, y_i, z_i). \]
2. Communication phase: Alice and Bob exchange messages over a public channel. We assume that the public channel is an ideal noiseless channel, but it is insecure and non-authentic.
3. Decision phase: Alice and Bob each either accepts or rejects the protocol execution. If both of them accept the results, they generate the final secret key.
[Fig. 1. The scenario in the information-theoretic secret-key agreement: the source U^n feeds the independent channels C_A, C_B and C_E, whose outputs X^n, Y^n and Z^n reach Alice, Bob and Eve, respectively; Alice, Bob and Eve are all connected to the public channel.]
The above scenario is a special case of the general key-agreement protocol, and it is well motivated by models such as discrete memoryless sources and channels previously considered in information theory. Such a scenario is relatively easy to analyze, and its results will be helpful to ongoing research. It is shown in [2] that such a secret-key agreement over an authentic public channel is possible even when Eve's channel is superior to the other two channels, i.e., ε_A ≥ ε_E or ε_B ≥ ε_E. But Maurer proved recently that such a secret-key agreement over a non-authentic public channel is only possible when Eve's channel is inferior to the other two channels [1]. For this reason we assume that ε_A < ε_E and ε_B < ε_E in this paper.
3 The Authentication Scheme
Some concepts and facts about coding theory are introduced ﬁrst.
Definition 1. [1] The 0-1 distance from a codeword c_1 to a codeword c_2, denoted by d(c_1 → c_2), is defined as the number of transitions from 0 to 1 when going from c_1 to c_2, not counting the transitions from 1 to 0.

The 0-1 distance of two codewords is different from the Hamming distance, and it is not symmetric, i.e., d(c_1 → c_2) ≠ d(c_2 → c_1) in general.

Definition 2. The minimum 0-1 distance of a code C, denoted by d_{0→1}(C), is defined as the smallest value among the distances of any two different codewords in C, i.e., d_{0→1}(C) = min_{i,j,\,i≠j} d(c_i → c_j), c_i, c_j ∈ C.
The minimum 0-1 distance of any conventional linear code is 0, owing to the existence of the zero codeword.

Lemma 1. Every conventional linear code of length n with minimum Hamming distance d can be converted to a code of length 2n with minimum 0-1 distance d by replacing every bit in the original code by a pair of bits, namely by replacing 0 by 01 and 1 by 10.

The authentication scheme is described as follows:
Prerequisite: Alice, Bob and Eve obtain initial information (X_0, X_1, . . .), (Y_0, Y_1, . . .) and (Z_0, Z_1, . . .) over independent binary symmetric channels with bit error probabilities ε_A, ε_B and ε_E respectively, where ε_A < ε_E and ε_B < ε_E.
Safety parameters: s, s′ > 1.
Authentication performance: The receiver accepts the sender's legitimate messages with probability at least 1 − 1/s² while rejecting Eve's fraudulent messages with probability at least 1 − 1/s′².
Protocol:
1. Let ε_AB = ε_A + ε_B − 2ε_Aε_B, ε_BE = ε_B + ε_E − 2ε_Bε_E, ε_AE = ε_A + ε_E − 2ε_Aε_E. Choose N and d from the set of pairs (N, d) satisfying
\[ N\Delta_1 \ge [s'(1 - \varepsilon_{AB} - \min(\varepsilon_{BE}, \varepsilon_{AE}))/2]^2, \qquad (1) \]
\[ s\sqrt{N\Delta_1} + s'\sqrt{(N-d)\Delta_1 + d\Delta_2} \le d(\varepsilon_{BE} - \varepsilon_{AB}), \qquad (2) \]
\[ s\sqrt{N\Delta_1} + s'\sqrt{(N-d)\Delta_1 + d\Delta_3} \le d(\varepsilon_{AE} - \varepsilon_{AB}), \qquad (3) \]
where Δ_1 = ε_AB(1 − ε_AB), Δ_2 = ε_BE(1 − ε_BE) and Δ_3 = ε_AE(1 − ε_AE), and make sure that an (N, K, d) linear code C exists for some integer K. It should be noted that if N is large enough, such an (N, K, d) linear code always exists, a fact that will be shown in the next section. The parameters N, K, d and the corresponding coding rules are public.
2. Convert the (N, K, d) linear code C to the corresponding 0-1 code C′ of length 2N with minimum 0-1 distance d, using the method provided by Lemma 1.
3. Every time the sender sends a K-bit message to the receiver, she appends an N-bit authenticator to the message. Without loss of generality, we assume that Alice wants to send a message M = (M_0, M_1, . . . , M_{K−1}) to
Bob. The authenticator must be chosen according to the following rules. First, Alice encodes the message and gets the linear codeword C = (C_0, C_1, . . . , C_{N−1}) according to the coding rules of the (N, K, d) linear code. Then she finds the corresponding 0-1 codeword C′ = (C′_0, C′_1, . . . , C′_{2N−1}) from C. If C′_{i_1} = C′_{i_2} = . . . = C′_{i_N} = 1 in the codeword, then she selects (X_{i_1}, X_{i_2}, . . . , X_{i_N}) from (X_0, X_1, . . . , X_{2N−1}) as the authenticator. Finally she appends (X_{i_1}, X_{i_2}, . . . , X_{i_N}) to (M_0, M_1, . . . , M_{K−1}) and sends it to Bob.
4. When Bob receives the message, he also derives the corresponding 0-1 codeword C′ = (C′_0, C′_1, . . . , C′_{2N−1}) from (M_0, M_1, . . . , M_{K−1}). Similarly, he determines (Y_{i_1}, Y_{i_2}, . . . , Y_{i_N}) from (C′_0, C′_1, . . . , C′_{2N−1}). Then he compares (Y_{i_1}, Y_{i_2}, . . . , Y_{i_N}) with the authenticator (X_{i_1}, X_{i_2}, . . . , X_{i_N}); if the number of differing bits, say x, is less than Nε_AB + s√(Nε_AB(1 − ε_AB)), he accepts the message, otherwise he rejects it.
5. (X_{i_1}, X_{i_2}, . . . , X_{i_N}) and (Y_{i_1}, Y_{i_2}, . . . , Y_{i_N}) should be discarded from the initial correlated strings after each message transmission, and never used again for any purpose.

Remark. The required authentication performance can be accomplished by choosing proper safety parameters.

Theorem 2. The above authentication scheme ensures that the receiver accepts the sender's legitimate messages with probability at least 1 − 1/s² while rejecting Eve's fraudulent messages with probability at least 1 − 1/s′².

Proof. Let ε_AB = ε_A + ε_B − 2ε_Aε_B be the bit error probability between corresponding bits of Alice's and Bob's strings, and let ε_AE = ε_A + ε_E − 2ε_Aε_E and ε_BE = ε_B + ε_E − 2ε_Bε_E be the bit error probabilities between corresponding bits of Alice's and Eve's and between Bob's and Eve's strings, respectively. We still assume Alice is the sender (when Bob is the sender, ε_BE should be changed into ε_AE in the following proof). If the message sent by Alice is not modified by Eve, the subscripts of (Y_{i_1}, Y_{i_2}, . . . , Y_{i_N}) determined by Bob will be consistent with those of (X_{i_1}, X_{i_2}, . . . , X_{i_N}) received by Bob, since the public channel is noiseless. Let x denote the number of differing bits between them; the expected value and the standard deviation of x are
\[ \mu = N\varepsilon_{AB} \quad \text{and} \quad \sigma = \sqrt{N\varepsilon_{AB}(1-\varepsilon_{AB})}. \]
Since Bob accepts a message only when x < Nε_AB + s√(Nε_AB(1 − ε_AB)), we get the following result from the Chebyshev inequality:
\[ \Pr\{|x - \mu| < s\sigma\} > 1 - \sigma^2/(s\sigma)^2 \;\Longrightarrow\; \Pr\{|x - N\varepsilon_{AB}| < s\sqrt{N\varepsilon_{AB}(1-\varepsilon_{AB})}\} > 1 - 1/s^2 \]
An Authentication Scheme over Nonauthentic Public Channel
299
=⇒ Pr {x < N AB + s N AB (1 − AB ) > 1 − 1/s2 , which means that Bob accepts legitimate messages with probability at least 1 − 1/s2 . When Eve has intercepted a message together with its authenticator (M0 , M1 , . . . , MK−1 )(Xi1 , Xi2 , . . . , XiN ), her best strategy for creating a new authenticator for a diﬀerent message M = (M0 , M1 , . . . , MK−1 )(hoping that it will be accepted by Bob) is to copy those bits from the received authenticator that are also contained in the new authenticator and to take as guesses for the remaining bits her copies of the bits in (Z0 , Z2 , . . . , Z2N −1 ), introducing bit errors in those bits with probability BE . The maximal probability of successful deception is hence determined by the number l of bits that Eve must guess and the total number N of bits in the forged authenticator. When Eve tries to deceive Bob, the expected value and the standard deviation of the bits in the forged authenticator that disagree with Bob’s corresponding bits are
µ = (N − l) AB + l BE and
σ =
(N − l) AB (1 − AB ) + l BE (1 − BE ) = (N − l)∆1 + l∆2 ,
where ∆1 = AB (1 − AB ) and ∆2 = BE (1 − BE ). In fact, the 01 distance from a codeword C1 to a codeword C2 is the number of bits that Eve must guess when trying to convert the authenticator corresponding to C1 into the authenticator corresponding to C2 . Since the minimum 01 distance of the code C is d, we obtain l ≥ d, which means that Eve must guess at least d bits to forge the authenticator. It is easy to prove that if
N∆1 ≥ [s (1 − AB − BE )/2]2 , the derivative function of f(x),
f(x) = ( BE − AB )x − s N∆1 − s (N − x)∆1 + x∆2 ,
is nonnegative, i.e. f (x) ≥ 0, when x ≥ 0. So we get f(l) ≥ f(d). The scheme assumes that d( BE − AB ) ≥ s N∆1 + s (N − d)∆1 + d∆2 holds, so
l( BE − AB ) ≥ s N∆1 + s (N − l)∆1 + l∆2
also holds, which means µ − µ ≥ sσ + s σ . Let x still denote the number of diﬀerent bits between the authenticator received by Eve and that forged by Eve. From the Chebyshev inequality we know
Pr {x − µ  < s σ } > 1 − σ 2 /(s σ )2 =⇒ Pr {x > µ − s σ } > 1 − 1/s 2 .
Since µ − s σ ≥ µ + sσ, we get Pr {x > µ + sσ} > 1 − 1/s 2 , i.e. Pr {x > N AB + s N AB (1 − AB ) > 1 − 1/s 2 .
Therefore Bob rejects fraudulent messages with probability at least 1 − 1/s 2 .
300
4
Shengli Liu and Yumin Wang
The Existence of the Authentication Scheme
In this section, we shows with an example that as long as A < E and B < E , it is possible to ﬁnd a proper linear code to implement the above scheme and accomplish the required authentication performance. We take extended ReedSolomon codes over a ﬁnite ﬁeld GF (2r ) [3] as an example to illustrate that it is possible to ﬁnd a proper linear code to implement the authentication scheme and accomplish the required authentication performance as long as A < E and B < E . Let N = 2r be the code length, and let information digit K = c · N where 0 < c < 1, then the minimum Hamming distance is d = (1 − c) · N + 1. When the extended ReedSolomon code is converted to the 01 code, we know that the minimum 01 distance is still d. Substituting d = (1 − c) · N + 1 to the inequality (2), we obtain ((1 − c)N + 1) · ( BE − AB ) ≥ s N∆1 + s N [c∆1 + (1 − c)∆2 ] + ∆2 − ∆1
(4)
N 0 (bit)
It is obvious that there exists a N0 to make both (1) and (4) hold for all integers N , N ≥ N0 . Therefore, we can always ﬁnd an extended RS code (N, cN, (1 − c)N + 1) to implement the authentication scheme, and the code rate is K/(K + N ) = c/(c + 1)(see Figure 2).
25000
K/(K+N)=1/3, εΑ=0.1, εΒ=0.02, εΞ=0.3 K/(K+N)=1/9, εΑ=0.1, εΒ=0.02, εΞ=0.3 K/(K+N)=1/3, εΑ=0.01, εΒ=0.02, εΞ=0.3 K/(K+N)=1/9, εΑ=0.01, εΒ=0.02, εΞ=0.3
Lower bound of authenticator length
20000
15000
10000
5000
0 2
3
4 '
5 '
Safety parameters: s ,s (s = s )
Fig. 2. The lower bound of length of authenticators N0 as function of safety parameters s, s for code rate K/(K + N ) = 1/3 and 1/9 with the bit error rate
A , B , E = 0.01, 0.02, 0.3 and 0.1,0.02,0.3.
We assume that s = s for simplicity. The dashed lines denote the case when A = 0.01, B = 0.02 and E = 0.3 while the solid lines denote the case
An Authentication Scheme over Nonauthentic Public Channel
301
when A = 0.1, B = 0.02 and E = 0.3. It is evident that the lower bound of authenticators N0 grows with increasing s, s , decreases with decreasing code rate K/(K + N ), and the larger E / A or E / B , the smaller N0 is.
5
Conclusion
When two communicants and an adversary obtain correlated information through independent binary symmetric channels from a random source, and the adversary’s channel is noisier than those of communicants, informationtheoretic secretkey agreement secure against active adversaries is always possible since an authentication scheme based on coding theory can always be implemented at the required safety level with the help of correlated strings between the two communicants. The authentication scheme based on extended RS code is simulated and the result shows that the lower bound of the length of authenticator is closely related to safety parameters, code rate and the bit error rates of the channels. Although a linear code satisfying the requirements can always be found, the scheme proposed in this paper may be not very practical since the authenticator may be too long or the code rate may be too low. So how to design a practical authentication scheme with high code rate and moderate authenticator length remains an open problem.
Acknowledgments The authors thank Dr. Wenping Ma for interesting discussions on the subject, and the referees for their valuable comments.
References 1. Maurer, U.: Informationtheoretically Secure Secretkey Agreement by NOTauthenticated Public Discussion. In: Walter Fumy(Ed.): Advances in CryptologyEUROCRYPT’97, Lecture Notes in Computer Science, Vol. 1233. SpringerVerlag, Berlin Heidelberg New York (1997) 209225 2. Maurer, U.M.: Secret Key Agreement by Public Discussion from Common Information. IEEE Transaction on Information Theory. 39(3) (1993) 733742 3. Blahut, R.E.: Theory and Practice of Error Control Codes, Reading, MA: AddisonWesley. (1983) 4. Cachin,C., Maurer,U.: Linking Information Reconciliation and Privacy Amplification. J. Cryptol. 10 (1997) 97110 5. Benett, C.H., Bessette, F., Brassard, G., Salvail,L., Smolin, J.: Experimental Quantum Cryptography. J.Cryptol. 5(1) (1992) 328 6. Bennett, C.H., Brassard, G., Crepeau,C., Maurer, U.M.: Generalized privacy amplification. IEEE Transaction on Information Theory. 41(6) (1995) 19151923 7. Brassard, G., Salvail, L.: Secretkey reconciliation by public discussion. In: Tor Helleseth(Ed.): Advances in CryptologyEurocrypt’93. Lecture Notes in Computer Science, Vol. 765. SpringerVerlag, Berlin Heidelberg New York (1994) 410423
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes and Scheduling of Parallel Processing on It Shojiro Sakata and Masazumi Kurihara The University of ElectroCommunications Department of Information and Communication Engineering 151 Chofugaoka Chofushi, Tokyo 1828585, Japan
Abstract. Since before we have proposed a systolic array architecture for implementing fast decoding algorithm of one point AG codes. In this paper we propose a revised architecture which is as its main framework a onedimensional systolic array, in details, composed of a threedimensional arrangement of processing units called cells, and present a method of complete scheduling on it, where not only our scheme has linear time complexity but also it satisfies restriction to local communication between nearest cells so that transmission delay is drastically reduced.
1
Introduction
Onepoint codes from algebraic curves are a class of algebraic geometry (AG) codes which are important because not only they have potentially better performance for longer code lengths than the conventional codes such as BCH codes and RS codes, but also they can be decoded eﬃciently. We have given several versions [1][2][3][4] of fast decoding methods for them based on the BerlekampMasseySakata (BMS) algorithm [5][6]. As the second step toward practical use of onepoint AG codes, we must devise eﬃcient hardware implementation of the decoding method. In the setting, the vector version [4] of decoding method is suitable for parallel processing by nature. In this paper, we propose a special kind of systolic array architecture for its parallel implementation. This is a revision of our papers [7][8] partly presented at ISIT95 and at ISIT97 as well as a result of these previous trials. K¨ otter [9][10] gave a parallel architecture of his fast decoding method, which has a form nearer to the onedimensional BerlekampMassey (BM) algorithm in comparison with our decoding method relying deeply on the multidimensional BMS algorithm. His architecture consisting of a set of feedback shift registers is an extension of Blahut’s implementation [11] of the BM algorithm. The shift registers have nonlocal links among delay units, which are not desirable because they give rise to long delay in transmitting data necessary for computations of the BM algorithm. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 302–313, 1999. c SpringerVerlag Berlin Heidelberg 1999
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes
303
Instead of shift register, we propose a systolic array architecture having only local links between neighboring cells (processing units). The main framework of this systolic array is onedimensional, and it consists of a series of pipelined processors along which all data are conveyed. But, since each processor is composed of a twodimensional arrangement of cells, the detailed structure is threedimensional. We assume that an array of size m is given as a set of input data, which is usually a syndrome array obtained from a given received word, where the integer m is comparable to code length. The input data are fed component by component into the cells constituting the ﬁrst or leftmost processor of our systolic array. Then, all the relevant data are processed and transmitted component by component from the cells constituting each processor to those of its rightneighboring processor, and so on. As the output from the cells of the last or the rightmost processor, we can get the necessary data for decoding, in particular, the coeﬃcients of polynomials composing a basis of the module of error locator functions, whose common zeros coincide with the error locators. Our main concern is in how to give a complete schedule of all the operations of cells constituting our systolic array, where we assume that each cell can compute by itself a small piece of intermediate values necessary for decoding and communicate the relevant data with its neighboring cells. The important issues are synchronization of all the operations of cells and local communication of the data through the links existing only between nearest cells (without any nonlocal feedback link). This is an extension of our previous work [12] on a systolic array architecture for implementing the BM algorithm of decoding RS codes. The present situation is more complex because of the multidimensional character of our fast decoding algorithm for onepoint AG codes. By this scheme we can get much reduction in time complexity of decoding with some amount of space complexity, i.e. a number of cells, compared with the serial scheme. Furthermore, our parallel scheme is more eﬃcient than K¨otter’s, in particular, w.r.t. time complexity, where ours O(m) is much better than his O(m2 ) for the input data size m.
2
Preliminaries
In this paper we use the same terminologies as in the reference[4] except for some modiﬁcations and additions. As preliminaries, we give a brief sketch of concepts and symbols appearing in the subsequent sections together with short descriptions of correspondences between the present and the previous symbols. Some important symbols used in [4] and their correspondences to the present ones are explained in brackets [ ]. A onepoint code from an algebraic curve X over a ﬁnite ﬁeld K = F q is deﬁned from a prescribed point Q on the curve X , the set P of Krational points excluding Q, an aﬃne ring K[X ] = K[φ1 , · · · , φN ]/I and a nonnegative integer m, where K[X ] is the set of all algebraic functions having no pole except at the point Q, and the functions φi , 1 ≤ i ≤ N , constitute its basis. The
304
Shojiro Sakata and Masazumi Kurihara
ring K[X ] is a Klinear space denoted as K[Σ] which is spanned by a set of N functions {xp (:= i=1 φpi i )p = (pi )1≤i≤N ∈ Σ}, where Σ is a subset of the direct product ZN 0 of the set Z0 of nonnegative nintegers. Speciﬁcally, such a code is deﬁned as C := {c = (cj )1≤j≤n ∈ K n  j=1 cj f(Pj ) = 0, f ∈ L(mQ)} for a subset L(mQ), which consists of algebraic functions f ∈ L(∞Q)(:= K[X ]) of pole order o(f) ≤ m, m ∈ Z0 , where P = {P1 , · · · , Pn }. The set O of all possible pole orders o(xp ) of functions xp corresponding to p ∈ Σ coincides with the set of all pole orders o(f) of functions f ∈ K[X ], which is a semigroup with identity 0 w.r.t. addition. Its elements are numbered in the increasing order, i.e. O = {ol l ∈ Z0 }, where o0 = 0 < o1 < o2 < · · ·. The ﬁrst nonzero pole order o1 is denoted as ρ, and a function x ∈ K[Σ] having o(x) = ρ is ﬁxed. Then, the set K[Σ] becomes a K[x]module over the univariate 0 polynomial ring K[x] provided x is regarded as an independent variable. [x = xb for a certain b0 ∈ Σ.] A basis {o(i) 1 ≤ i ≤ ρ} of the semigroup O is given by o(i) := min{ol ∈ Ool ≡ i − 1 mod ρ}, 1 ≤ i ≤ ρ, in particular o(1) = 0. [The semigroup Σ (w.r.t. vector addition) called a cylinder is identical with the union i ∪ρi=1 {bi + kb0 k ∈ Z0 } for a ρtuple bi , 1 ≤ i ≤ ρ, such that o(xb ) = o(i) , 1 ≤ i ≤ ρ.] In the context of decoding, an error syndrome array u = (ul ), l ∈ Z0 , is introduced. [On the cylinder Σ, the syndrome array is given as u = (up ), p ∈ Σ, where up = ul for o(xp ) = ol .] Furthermore, a syndrome array vector u = (i) (u(i)), 1 ≤ i ≤ ρ, accompanied with ρ component arrays u(i) = (ujk ), (j, k) ∈ Σ (i), is considered, where Σ (i) is a subset of the direct product P × Z0 for P := {1, · · · , ρ} ⊂ Z0 . [The array vector u corresponds to a syndrome array u = (up ), p ∈ 2Σ, deﬁned on the double cylinder 2Σ := {p + qp, q ∈ Σ} ⊂ ZN 0 , which is represented as an array vector u = (ui ), 1 ≤ i ≤ ρ, with component (i) array ui = (uip ), p ∈ Σ in [4], such that uip = ubi +p , p ∈ Σ, In fact, ujk =
).] As K[Σ] is a residue class ring modulo an ideal I, the ubi + bj + k b0 (= ui j b + k b0 (i) array components ujk having the same value of o¯(i) (j, k) := o(i) + o(j) + kρ ∈ O are pairwise dependent through Klinear dependence of functions xp , p ∈ 2Σ, having the same pole order o(xp ) = ol modulo the linear subspace xq o(xq ) < ol , q ∈ Σ. Thus, we introduce a pair of mappings η(l, i): Z0 × P −→ P and κ(l, i): Z0 ×P −→ Z0 deﬁned by j = η(l, i), k = κ(l, i) if and only if there exists a pair (j, k) such that ol = o¯(i) (j, k), i.e. (j, k) ∈ Σ (i) ; otherwise we denote η(l, i) = κ(l, i) = ∅, where η(l, i) = (ol −i+1 mod ρ)+1, and κ(l, i) = (ol −o(i) −oη(l,i)))/ρ (i) if η(l, i), κ(l, i) = ∅. In fact, we know only the values ujk from a given received word such that o(i) (j, k) ≤ m. Example 1 Throughout this paper we use the following code accompanied with a set of instance data to illustrate our method. We consider onepoint codes C (of codelength n = 64) over K = F 24 from the Hermitian curve X : X 5 + Y 4 Z + Y Z 4 = 0 having genus g = 6. For the point Q = (0 : 1 : 0), the linear space L(∞Q) is spanned by {xi y j 0 ≤ i, 0 ≤ j ≤ 3}, where the functions x := Z/X
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes
305
and y := Y /X have pole order o(x) = 4 and o(y) = 5, respectively. Thus, the cylinder Σ = {(i, j) ∈ Z20 j ≤ 3} onetoone corresponds to the semigroup of pole orders O = {o(xi y j )(i, j) ∈ Σ} = {0, 4, 5, 8, 9, 10, 12, 13, 14, 15, 16, 17, · · ·}, where the first pole order is ρ = 4(= o(x)), and o(1) = 0(= o(1)), o(2) = 5(= o(y)), o(3) = 10(= o(y 2 )), o(4) = 15(= o(y 3 )). For an Hermitian code over F 24 defined by L(23Q), the values of o¯(i) (j, k) := o(i) +o(j) +kρ are shown in Table 1, and the corresponding functions η(l, i) and κ(l, i) are shown in the lefthalf of Table 2, where the symbol − means the empty value ∅. (All the tables are in Appendix.) The information of error locations is contained in a set of locator functions f ∈ K[Σ]. As any function in the module K[Σ] can be expressed uniquely as j ρ sj a form f = j=1 fj xb with fj = k=0 fjk xk ∈ K[x] of degree deg(fj ) = sj (fj,sj = 0), 1 ≤ j ≤ ρ, it is represented by the corresponding polynomial vector f = (fj ), 1 ≤ j ≤ ρ. The head number HN(f ) (or HN(f)) and the head exponent HE(f ) (or HE(f)) are deﬁned from the pole order o(f), w.r.t. a ρtuple of prescribed weight vectors w i , 1 ≤ i ≤ ρ. In this paper, we restrict ourselves to the case of w i = bi , 1 ≤ i ≤ ρ. The exact error locations, which constitute a subset E of the set P, are obtained as the zeros of a socalled error locator module M(E) := {f ∈ K[Σ]f(P ) = 0, P ∈ E}. Thus, we inquire a (i) ρtuple of polynomial vectors f (i) = (fj )1≤j≤ρ , 1 ≤ i ≤ ρ, which constitute a basis of the submodule M(E), or equivalently the corresponding polynomial (i) (i) matrix F = [fj ], 1 ≤ i, j ≤ ρ, where HN(f (i) ) = i, 1 ≤ i ≤ ρ, and fj = (i) sj (i) k (i) ) is equal to k=0 fjk x ∈ K[x], 1 ≤ i, j ≤ ρ. (The head exponent HE(f (i)
(i)
(i)
(i)
sj = deg(fj ) for a certain j, 1 ≤ j ≤ ρ, s.t. o(j) + sj ρ ≥ o(j ) + sj ρ for any j , 1 ≤ j ≤ ρ.) For the code deﬁned by L(mQ), the polynomial matrix F can be obtained as a minimal polynomial matrix of the syndrome array vector up to the pole order m + 4g by the BMS algorithm including majority voting scheme. In the subsequent sections, we often dispense with the subscripts and (i) superscripts, and denote f (i), f(i, j), f(i, j, k), s(i, j), etc. instead of f (i), fj , (i)
(i)
fjk , sj , etc.
3
BMS Algorithm and Its Pipelining
The vectorversion of BMS algorithm[4] can be given in a modiﬁed form, which ﬁnds not only a pair of minimal and auxiliary polynomial matrices F (l) = [f(l, i)]1≤i≤ρ , G(l) = [g(l, i)]1≤i≤ρ but also the corresponding pair of discrepancy and auxiliary array vectors v(l) = (v(l, i))1≤i≤ρ , w(l) = (w(l, i))1≤i≤ρ iteratively at the increasing pole orders ol ∈ O, where each (lth) iteration is for the set of 3tuples (i, j, k) having the same value o¯(i) (j, k)(= ol ), l ∈ Z0 .
306
Shojiro Sakata and Masazumi Kurihara
The discrepancy and auxiliary array vectors v(l), w(l) are deﬁned by a kind of operation by the minimal and auxiliary polynomial matrices F (l), G(l), respec(i) tively, upon the syndrome array vector u as follows. Denoting u(i, j, k) := ujk , 1 ≤ i ≤ ρ, (j, k) ∈ Σ (i) , ρ s(l,i,µ) v(l, i, j, k) := f(l, i, µ, ν)u(µ, j, ν + k), µ=1 ν=1
w(l, i, j, k) :=
ρ t(l,i,µ)
g(l, i, µ, ν)u(µ, j, ν + k),
µ=1 ν=1
where the values f(l, i, j, k) (0 ≤ k ≤ s(l, i, j) := deg(f(l, i, j))), g(l, i, j, k) (0 ≤ k ≤ t(l, i, j) := deg(g(l, i, j))) are the coeﬃcients of polynomials f(l, i, j), g(l, i, j), and the values v(l, i, j, k), w(l, i, j, k) ((j, k) ∈ Σ (i)) are the components of arrays v(l, i), w(l, i), respectively. ¯ = (η(l, i), κ(l, i)) coincides The value d(f (l, i)) := v(l, i, ¯j, k¯ −s(l, i)) for (¯j, k) with the discrepancy of the minimal polynomial vector f (l, i) w.r.t. the syndrome array vector u, provided the value d(f (l, i)) does not vanish, where s(l, i) = HE(f (l, i)). The following is a modiﬁed BMS algorithm (without majority logic scheme) for a syndrome array vector given up to the pole order m. Algorithm Step 1 (Initialization) l := 0; s(0, i) := 0, 1 ≤ i ≤ ρ; f (0, i) := ei (the ith unit vector), 1 ≤ i ≤ ρ; c(0, i) := −1, 1 ≤ i ≤ ρ; g(0, i) := ∅, 1 ≤ i ≤ ρ; v(0, i, j, k) := u(i, j, k), (j, k) ∈ Σ (i) , 1 ≤ i, j ≤ ρ; w(0, i, j, k) := ∅; Step 2 (Discrepancy computation) FN := {f(l, i)d(f (l, i)) = 0, 1 ≤ i ≤ ρ}; ¯ := κ(l, i); Step 3 (Updating) for each f (l, i), 1 ≤ i ≤ ρ, ¯j := η(l, i), k df ¯ := d(f (l, i))/d(g(l, j)); dg ¯ Case A: if f (l, i) ∈ FN , for (j, k) ∈ Σ (i) or Σ (j) f(l + 1, i, j, k) := f(l, i, j, k), v(l + 1, i, j, k) := v(l, i, j, k), g(l + 1, ¯ j, j, k) := g(l, ¯j, j, k), w(l + 1, ¯j, j, k) := w(l, ¯j, j, k); Case B: if f (l, i) ∈ FN and s(l, i) ≥ k¯ − c(l, ¯j), ¯ for (j, k) ∈ Σ (i) or Σ (j) df g(l, ¯j, j, k − s(l, i) + k¯ − c(l, ¯j)), dg df ¯ − c(l, ¯j)), w(l, ¯j, j, k − s(l, i) + k v(l + 1, i, j, k) := v(l, i, j, k) − dg g(l + 1, ¯ j, j, k) := g(l, ¯j, j, k), w(l + 1, ¯j, j, k) := w(l, ¯j, j, k); ¯ − c(l, j), ¯ Case C: if f(l, i) ∈ FN and s(l, i) < k (i) (¯ j) for (j, k) ∈ Σ or Σ f(l + 1, i, j, k) := f(l, i, j, k) −
¯ + c(l, ¯j) + s(l, i)) − df g(l, ¯j, j, k), f(l + 1, i, j, k) := f(l, i, j, k − k dg ¯ + c(l, j) ¯ + s(l, i)) − df w(l, j, ¯ j, k), v(l + 1, i, j, k) := v(l, i, j, k − k dg g(l + 1, ¯ j, j, k) := f(l, i, j, k), w(l + 1, ¯j, j, k) := v(l, i, j, k);
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes
307
Cace D: if f (l, i) ∈ FN and g(l, ¯j) = ∅, for (j, k) ∈ Σ (i) or Σ (j) ¯
¯ + s(l, i) − 1), f(l + 1, i, j, k) := f(l, i, j, k − k ¯ v(l + 1, i, j, k) := v(l, i, j, k − k + s(l, i) − 1), g(l + 1, ¯ j, j, k) := f(l, i, j, k), w(l + 1, ¯j, j, k) := v(l, i, j, k); Step 4 (Termination check) l := l + 1; if ol > m then stop else go to Step 2. ¯ j, k) are quite similar to The computations of v(l + 1, i, j, k) and w(l + 1, j, those of f(l+1, i, j, k) and g(l+1, ¯j, j, k), respectively, owing to their relationships so that there are involved many redundancies in their computations. However, these duplicate computations and the redundant structure of the discrepancy and auxiliary array vectors are indispensable for eﬃcient parallel implementation of the BMS algorithm, in particular, in calculation of the discrepancy d(f (l, i)) through the array v(l, i, j, k). By introducing a pair of modiﬁcations of the mapping κ(l, i), we have a pipelined version of the above algorithm, which can be implemented easily on a kind of systolic array. First, putting κ(i) := (o(i) − i + 1)/ρ, 1 ≤ i ≤ ρ, we introduce oˇ(i) (j, k) := o(i) + kρ + j − 1, oˆ(i) (j, k) := oˇ(i) (j, k) − κ(i)ρ, where oˇ(i) (j, k) and oˆ(i) (j, k) are deﬁned only for (j, k) ∈ P × Z0 s.t. oˇ(i) (j, k), ˇ (i) and Σ ˆ (i), respecoˆ(i) (j, k) ∈ O. We denote the sets of those points (j, k) as Σ tively. Then, it is easy to see that the mappings κ ˇ (l, i) := (ol − o(i) − η(l, i) + 1)/ρ, κ ˆ (l, i) := κ ˇ(l, i) + κ(i) satisfy the following relationships ˇ (l, i) = k, η(l, i) = j; oˆ(i) (j, k) = ol ⇔ κ ˆ (l, i) = k, η(l, i) = j; oˇ(i) (j, k) = ol ⇔ κ κ(l, i) = κ ˇ (l, i) − κ(η(l, i)) = κ ˆ (l, i) − κ(η(l, i)) − κ(i). Furthermore, the values κ ˇ (l, i), κ ˆ (l, i) are nondecreasing w.r.t. l ∈ Z0 , and in particular κ ˇ (l, i) ≤ κ ˇ (l + 1, i) ≤ κ ˇ (l, i) + 1, κ ˆ(l, i) ≤ κ ˆ(l + 1, i) ≤ κ ˆ (l, i) + 1. Example 2 (Continued) In our example of code, the functions κ ˇ (l, i) and κ ˆ (l, i) are shown in the righthalf of Table 2. Finally, we deﬁne modiﬁed polynomial vectors fˇ (l, i) and arrays vˆ(l, i) having the coeﬃcients and components vˆ(l, i, j, k) := v(l, i, j, k − κ(i) − κ(j) − s(l, i)), fˇ(l, i, j, k) := f(l, i, j, k − κ ˇ (l, i) + s(l, i)),
308
Shojiro Sakata and Masazumi Kurihara
where HN (fˇ (l, i)) = i, HE(fˇ (l, i)) = κ ˇ (l, i). Then, we can show that the updatings of them and their auxiliary counterparts are given as follows: fˇ(l + 1, i, j, k) := fˇ(l, i, j, k + κ ˇ (l, i) − κ ˇ (l + 1, i)) ¯ −ˇ g (l, j, j, k + κ ˇ(l, i) − κ ˇ (l + 1, i)), 0 ≤ k ≤ κ ˇ (l + 1, i), ˇ ¯ ¯ gˇ(l + 1, j, j, k) := gˇ(l, j, j, k) or f (l, i, j, k), depending on case, df ˆ w(l, ˆ ¯j, j, k), (j, k) >T (¯j, k), dg w(l ˆ + 1, ¯ j, j, k) := w(l, ˆ ¯ j, j, k) or vˇ(l, i, j, k), depending on case. vˆ(l + 1, i, j, k) := vˆ(l, i, j, k) −
The components vˆ(l, i, j, k), w(l, ˆ i, j, k) of discrepancy and auxiliary arrays ˆ where the total ˆ (i) s.t. (j, k) ≥T (¯j, k), vˆ(l, i), w(l, ˆ i) are given for (j, k) ∈ Σ (i) order ≥T is deﬁned by (j, k) ≥T (j , k ) if and only if o (j, k) ≥ o(i) (j , k ). It is important to note that they satisfy for ¯j = η(l, i), k¯ = κ(l, i), kˇ = κ ˇ (l, i), kˆ = κ ˆ (l, i) ˇ = f(l, i, i, s(l, i)), vˆ(l, i, ¯j, k) ˆ = v(l, i, ¯j, k¯ − s(l, i)), fˇ(l, i, i, k) where the above righthand sides are equal to the head coeﬃcient of the polynomial vector f (l, i) and the discrepancy d(f (l, i)), respectively. Furthermore, both polynomial coeﬃcients fˇ(l, i, j, k) and gˇ(l, i, j, k) are synchronized in updating fˇ(l, i, j, k) to fˇ(l + 1, i, j, k), and similarly both vˆ(l, i, j, k) and w(l, ˆ i, j, k) are synchronized in updating vˆ(l, i, j, k) to vˆ(l + 1, i, j, k) as shown in the above formulae. These updating formulae are a generalization of the similar ones in parallelization of the BM algorithm[12].
4
Systolic Array and Scheduling
To implement parallelization of our algorithm, we introduce a special kind of systolic array architecture as follows, where instead of the subset {o ∈ O0 ≤ o ≤ m}, we take {l ∈ Z0 0 ≤ l ≤ m} containing gaps for the purpose of making the following discussions easy. (The functions η, κ, etc. are redeﬁned appropriately so that they are assumed to be completely deﬁned.) (1) It consists of a series of m + 1 processors: P (0), · · · , P (m), where the lth processor P (l) is connected to the (l − 1)th (leftneighboring) and the (l + 1)th (rightneighboring) processors P (l − 1) and P (l + 1), 1 ≤ l ≤ m − 1, except for the 0th (leftmost) and the mth (rightmost) ones. There are g trivial (dummy) processors working only as delayers as well as m + 1 − g eﬀective processors by which the BMS computations are executed. (2) The leftmost processor P (0) receives as input data the components of a syndrome array vector, and the rightmost processor P (m) outputs the components of the minimal polynomial matrix, from which one can get the desired basis of the error locator module. The lth processor P (l), 1 ≤ l ≤ m, receives as input the components of the modiﬁed arrays vˆ(l − 1, i), w(l ˆ − 1, i)
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes
309
ˇ (l − 1, i) from the and the coeﬃcients of the polynomial vectors fˇ (l − 1, i), g (l − 1)th processor P (l − 1) and (if it is eﬀective) calculates the components of the modiﬁed arrays vˆ(l, i), w(l, ˆ i) and the coeﬃcients of the polynomial ˇ (l, i) together with the accompanying control data such as vectors fˇ (l, i), g s(l, i), c(l, i), d(f (l, i)), d(g(l, i)), 1 ≤ i ≤ ρ. Each computation of values vˆ(l, i, j, k), etc. is a combination of one multiplication and one addition over the symbol ﬁeld K. (3) Each processor P (l), 0 ≤ l ≤ m, has ρ subprocessors S(l, i), 1 ≤ i ≤ ρ, which contain ρ cells C(l, i, j), 1 ≤ j ≤ ρ. In total, it consists of ρ2 cells C(l, i, j), 1 ≤ i, j ≤ ρ. For k ∈ Z0 , cell C(l, i, j) manipulates the four values vˆ(l, i, j, k), w(l, ˆ ¯ j, j, k), fˇ(l, i, j, k), and gˇ(l, ¯j, j, k) at a certain clock n ∈ Z0 , where ¯j = η(l, i) and n depends on l, i, j, k as well as vˆ, w, ˆ fˇ, gˇ. (4) We assume an artiﬁcial 2dimensional arrangement of ρ2 cells C(l, i, j), 1 ≤ i, j ≤ ρ. (We disregard realizability of such an eﬀective structure by the current VLSI technology.) Consequently, our systolic array architecture has a threedimensional structure having three perpendicular axes corresponding to the indices l, i, j. The arrangement of ρ2 cells in each processor P (l) is determined so that each cell C(l, i, j) is situated at the point of coordinates −1 (l, φ−1 is the inverse of l (i), j), 0 ≤ l ≤ m, 1 ≤ i, j ≤ ρ, where the mapping φl the following permutation φl of the integers 1, 2, · · · , ρ deﬁned by induction w.r.t. l. (Base) if k = 1 1, , if k=odd & k = 1 φ0 (k) := ρ − k−3 2 k + 1, if k=even 2 (Induction) for 1 ≤ l ≤ m φl−1(ρ), if φl−1(k + 1), if φl (k) := φl−1(1), if φl−1(k − 1), if
k k k k
=ρ = ρ =1 = 1
& & & &
k + l=even k + l=even k + l=odd k + l=odd
(5) The main data such as components of discrepancy arrays and coeﬃcients of polynomials, i.e. elements of the symbol ﬁeld K, are transmitted from the cells in the processor P (l) to the cells in the processor P (l+1), 0 ≤ l ≤ m−1. More precisely, these communications are made through the links which connect each cell C(l, i, j), 1 ≤ i, j ≤ ρ, in P (l) to two cells C(l − 1, i, j) and C(l − 1, i− , j) in the leftneighboring processor P (l − 1) and also to two cells C(l + 1, i, j) and C(l + 1, i+ , j) in the rightneighboring processor P (l + 1), where the numbers i− , i+ are determined uniquely by the condition that η(l, i) = η(l − 1, i− ) = η(l + 1, i+ ). These links are only between nearest cells of neighboring processors (as shown in Lemma 1 below). ˆ = (η(l, i), κ (6) By the processor P (l), for (¯j, k) ˆ(l, i)), (if it is eﬀective) the value ˆ is tested for 0 or not at cell C(l, i, ¯j). If d(f (l, i)) = vˆ(l, i, ¯j, k) ˆ vˆ(l, i, ¯ j, k) turns out to be not equal to zero, a ﬂag F1 is set to be 1 at ρ2 cells C(l, i, j),
310
Shojiro Sakata and Masazumi Kurihara
1 ≤ i, j ≤ ρ, in the subprocessor S(l, i), and kept onwards (Otherwise F1 := 0). At cell C(l, i, ¯ j) with the ﬂag F1 = 1 all the data vˆ(l, i, j, k), w(l, ˆ ¯j, j, k), ˇ ¯ f(l, i, j, k), gˇ(l, j, j, k) are processed for updating, so that the new values vˆ(l + 1, i, j, k) and fˇ(l + 1, i, j, k) are obtained at cell C(l + 1, i, j), and the ¯ new values w(l+1, ˆ j, j, k) and gˇ(l+1, ¯j, j, k) are obtained at cell C(l+1, i+, j), respectively. Provided that F1 = 1 at cell C(l, i, ¯j), another ﬂag F2 is set to be, e.g. 0, 1 or −1 (for controlling the updating) according to cases B, C or D. (7) The clock for synchronization and all the control data (ﬂags F1 , F2 , the integers s(l, i), c(l, j), etc.) are communicated between subprocessors or maintained in subprocessors. In particular, we assume that our architecture is equipped with a global clock to maintain synchronization of the operations of all the cells C(l, i, j), 0 ≤ l ≤ m, 1 ≤ i, j ≤ ρ. Except for a global link for the clock information exchange among all the processors and the control data exchange among ρ2 cells of each processor, there is nothing but local links between nearest cells of the neighboring processors as shown below. (8) Timing of the abovementioned data transmission and computation is adjusted such that manipulations of the data vˆ(l, i, j, k), etc. by ρ2 cells of the processor P (l) are done by one or two clocks later than manipulations of the corresponding data vˆ(l − 1, i, j, k), etc. having the same index k by ρ2 cells of the leftneighboring processor P (l − 1), 1 ≤ l ≤ r. The links between cells C(l − 1, i, j), 1 ≤ i, j ≤ ρ, of processor P (l − 1) and cells C(l, i, j), 1 ≤ i, j ≤ ρ, of processor P (l) are local in the sense that the following conditions are satisﬁed: −1 (a) The diﬀerences between positions φ−1 l−1 (i) and φl (i) are not greater than 1, 1 ≤ i ≤ ρ; (b) For ψl (k) := η(l, φl (k)) = (l − φl (k) + 1 mod ρ) +1, the diﬀerence −1 between values ψl−1 (i) and ψl−1 (i) is not greater than 1, 1 ≤ i ≤ ρ,
in view of the following lemma, which can be proved by induction. (Remark: It holds that φl (k) + ψl (k) ≡ l + 2 mod ρ, 1 ≤ k ≤ ρ, 0 ≤ l ≤ m.) Lemma 1 In case of l + k = odd, φl (k) = φl+1 (k − 1), 2 ≤ k ≤ ρ; φl (1) = φl+1(1); ψl (k) = ψl+1 (k + 1), 1 ≤ k ≤ ρ − 1; ψl (ρ) = ψl+1 (ρ). In case of l + k = even, φl (k) = φl+1(k + 1), 1 ≤ k ≤ ρ − 1; φl (ρ) = φl+1(ρ); ψl (k) = ψl+1 (k − 1), 2 ≤ k ≤ ρ; ψl (1) = ψl+1 (1). Finally, we can give a complete scheduling of the BMS computations on the above architecture, where a scheduling is a mapping from the set of all operations
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes
311
(calculations and other manipulations of relevant data) of the algorithm into the direct product N ×C of the set N (:= Z0 ) of clocks and the set C := {C(l, i, j)0 ≤ l ≤ m, 1 ≤ i, j ≤ ρ} of all cells. A possible scheduling is given by the mappings nv , nw , nf , ng which assign the data vˆ(l, i, j, k), w(l, ˆ i, j, k), fˇ(l, i, j, k), gˇ(l, i, j, k) to the clocks ˆ (l, i) + l + k, nw (l, i, j, k) := κ ˆ(l, i) + l + k, nv (l, i, j, k) := κ nf (l, i, j, k) := l + k, ng (l, i, j, k) := l + k, respectively, of the cell C(l, i, j). It means that the value vˆ(l, i, j, k) is manipulated, i.e. calculated and/or stored to be sent to the rightneighboring nearest cell(s) C(l + 1, i, j) (and C(l + 1, ¯j, j)) at clock n = nv (l, i, j, k), and so on. (Reˇ := κ ˇ i, i, k) ˇ of fˇ (l, i) mark: For k ˇ (l, i), computation of the head coeﬃcient f(l, ˇ In view of the properties κ is done just at clock n = l + k.) ˆ (l, i) ≤ κ ˆ (l + 1, i) ≤ κ ˆ (l, i)+ 1 and κ ˇ (l, i) ≤ κ ˇ (l + 1, i) ≤ κ ˇ (l, i)+ 1, this scheduling ensures that the set of data {ˆ v (l, i, j, k), w(l, ˆ ¯ j, j, k), fˇ(l, i, j, k), gˇ(l, ¯j, j, k)} available at cell C(l, i, j) at a certain clock n ∈ N can be available at two cells C(l + 1, i, j), C(l + 1, i+ , j) successively to obtain {ˆ v (l + 1, i, j, k), fˇ(l + 1, i, j, k)} at cell C(l + 1, i, j) and {w(l ˆ + 1, ¯ j, j, k), gˇ(l + 1, ¯ j, j, k)} at cell C(l + 1, i+ , j), at clock n + 1 ∈ N and/or at clock n + 2 under the requirement of local communication, provided that each cell has a buﬀer storage in addition to the register for the current data vˆ(l, i, j, k), etc. ˇ i, j, k) and gˇ(l, ¯i, j, k) can start just after all the The computations of f(l, input data have been fed into the 0th processor and the controlling data such as the ﬂags F1 , F2 , and the integers s(l, i), c(l, i), etc. have been ﬁxed at each subprocessor S(l, i) as a result of the computations of vˆ(l, i, j, k) and w(l, ˆ ¯i, j, k). The integer values F1 , F2 , s(l, i), c(l, i), etc. control the ﬁnite ﬁeld computations of fˇ(l, i, j, k) and gˇ(l, ¯i, j, k) during the whole process of the algorithm. Consequently, the total number of clocks required is about 3m. Thus, the time complexity is O(m) and the space complexity as the number of cells is O(ρ2 m). K¨ otter [9] gave a parallel implementation of a fast decoding algorithm of onepoint AG codes. In his architecture, he uses a set of ρ feedback shift registers each with m storage elements. That scheme has time complexity of O(m2 ) and space complexity of O(ρm). Since ρ is usually much less than m, our scheme is better than his in regard to the total complexity as well as in regard to time 1 complexity. For instance, in case of Hermitian codes where O(ρ) = O(m 3 ), our 5 time and space complexities O(m), O(m 3 ) can be compared with his O(m2 ), 4 O(m 3 ).
5
Concluding Remarks
In this paper we have proposed a special kind of systolic array architecture composed of threedimensional arrangement of processing units called cells for implementing fast decoding of onepoint AG codes and presented a method of scheduling of all the operations done by cells, where each cell executes a
312
Shojiro Sakata and Masazumi Kurihara
combination of multiplication and addition over the ﬁnite ﬁeld (symbol ﬁeld) by itself and transmits the result to his nearestneighboring cells at each clock. This architecture not only satisﬁes restriction to local communication which is required usually for any eﬃcient systolic array architecture, but also has linear time complexity. We omit discussions about incorporating majority logic scheme and some other subtle aspects for decoding into our parallel architecture.
References 1. S. Sakata, J. Junsten, Y. Madelung, H. Elbrønd Jensen, and T. Høholdt, “A fast decoding method of AG codes from MiuraKamiya curves Cab up to half the FengRao bound,” Finite Fields and Their Applications, Vol.1, No.6, pp.83–101, 1995. 2. S. Sakata, J. Junsten, Y. Madelung, H. Elbrønd Jensen, and T. Høholdt, “Fast decoding of algebraic geometric codes up to the designed distance,” IEEE Transactions on Information Theory, Vol.41, No.6, Part I, pp.1672–1677, 1995. 3. S. Sakata, H. Elbrønd Jensen, and T. Høholdt, “Generalized BerlekampMassey decoding of algebraic geometric codes up to half the FengRao bound,” IEEE Transactions on Information Theory, Vol.41, No.6, Part I, pp.1762–1768, 1995. 4. S. Sakata, “A vector version of the BMS algorithm for implementing fast erasureanderror decoding of any onepoint AGcodes,” (Eds., T. Mora, H. Mattson) Applied Algebra, Algebraic Algorithms and ErrorCorrecting Codes: Proc. AAECC12, Toulouse, France, June 1997, Lecture Notes in Computer Science, Vol. 1255, pp.291–310, Springer Verlag. 5. S. Sakata, “Finding a minimal set of linear recurring relations capable of generating a given finite twodimensional array,” J. Symbolic Computation, Vol.5, pp.321–337, 1988. 6. S. Sakata, “Extension of the BerlekampMassey algorithm to N dimensions,” Information and Computation, Vol.84, pp.207–239, 1990 7. M. Kurihara and S. Sakata, “A fast parallel decoding algorithm for onepoint AGcodes with a systolic array architecture,” presented at the 1995 IEEE International Symposium on Information Theory, Whistler, Canada, September 1995. 8. S. Sakata and M. Kurihara, “A systolic array architecture for implementing a fast parallel decoding algorithm of onepoint AGcodes,” presented at the 1997 IEEE International Symposium on Information Theory, Ulm, Germany, June 1997. 9. R. K¨ otter, “A fast parallel implementation of a BerlekampMassey algorithm for algebraic geometric codes,” Link¨ oping, Sweden, January 1995; Dissertation, Link¨ oping University, §3, pp.721–737, 1996. 10. R. K¨ otter, “Fast generalized minimum distance decoding of algebraic geometric and Reed Solomon codes,” Link¨ oping, Sweden, August 1993; IEEE Transactions on Information Theory, Vol.42, No.3, pp.721–737, 1996. 11. R. Blahut, Theory and Practice of Error Correcting Codes, Reading, MA: Addison Wesley, 1983. 12. S. Sakata and M. Kurihara, “A fast parallel implementation of the BerlekampMassey algorithm with a 1D systolic array architecture,” (Eds., G. Cohen, M. Giusti, T. Mora) Applied Algebra, Algebraic Algorithms and ErrorCorrecting Codes: Proc. AAECC11, Paris, July 1995, Lecture Notes in Computer Science, Vol. 948, pp.415–426, Springer Verlag.
A Systolic Array Architecture for Fast Decoding of OnePoint AG Codes
Appendix Table 1: Values o¯(i) (j, k) for a code.
k 0 1 2 3 4 5 6
i 1 2 3 4 j 1 2 3 4 1 2 3 4 1 2 3 4 1 2 31 0 5 10 15 5 10 15 20 10 15 20 4 9 14 19 9 14 19 14 19 8 13 18 23 13 18 23 18 23 12 17 22 17 22 22 16 21 21 20
15 20 19 23
Table 2: Functions η(l, i), κ(l, i), κ ˇ (l, i), κ ˆ (l, i).
l 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
i
ol 1 0 1 4 1 5 2 8 1 9 2 10 3 12 1 13 2 14 3 15 4 16 1 17 2 18 3 19 4 20 1 21 2 22 3 23 4
η(l, i) κ(l, i) κ ˇ (l, i) κ ˆ (l, i) 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 − − 1 − 1 2 − 1 2 3 − 1 2 3 4 1 2 3
−−0 −−1 −−0 −−2 −−1 1 −0 −−3 −−2 1 −1 2 1 0 −−4 −−3 1 −2 2 1 1 3 2 5 −−4 1 −3 2 1 2
− − 0 − 1 0 − 2 1 0 − 3 2 1 0 4 3 2
−−0 −−1 −−1 −−2 −−2 0 −2 −−3 −−3 1 −3 0 0 3 −−4 −−4 2 −4 1 1 4 0 0 5 −−5 3 −5 2 2 5
− − 0 − 1 1 − 2 2 2 − 3 3 3 4 4 4 4
−−0 −−1 −−1 −−2 −−2 0 −2 −−3 −−3 1 −3 1 0 3 −−4 −−4 2 −4 2 1 4 3 2 5 −−5 3 −5 3 2 5
− − 1 − 2 2 − 3 3 3 − 4 4 4 4 5 5 5
−− −− −− −− −− 2 − −− −− 3 − 3 3 −− −− 4 − 4 4 4 4 −− 5 − 5 5
313
Computing Weight Distributions of Convolutional Codes via Shift Register Synthesis Mehul Motani and Chris Heegard School of Electrical Engineering, Cornell University Ithaca, NY 14853 USA motani@ee.cornell.edu, heegard@ee.cornell.edu
Abstract. Weight distributions of convolutional codes are important because they permit computation of bounds on the error performance. In this paper, we present a novel approach to computing the complete weight distribution function (WDF) of a convolutional code. We compute the weight distribution series using the generalized Viterbi Algorithm (GVA) and then ﬁnd the minimum linear recursion relation in this series using the shift register synthesis algorithm (SRSA). The WDF follows from the minimum recursion. In order to generalize the use of the SRSA over certain commutative rings, we prove the key result that the set of ﬁnite recursions forms a principal ideal.
1
Introduction
Weight distributions of convolutional codes are important because they can be used to compute error performance bounds for the code. Viterbi and Omura [2] derive upper bounds for the probability of word and bit errors for a linear code in terms of the Hamming weights of all error events. Traditionally, the weight distribution function is computed using signal ﬂow graph analysis and Mason’s gain formula [3]. Other methods have been presented by Fitzpatrick and Norton [5], Onyszchuk [6], and McEliece [7]. In [8], we presented an algorithm to compute univariate weight distribution functions. This paper extends the technique to compute multivariate weight distribution functions. Our method is to use the Viterbi algorithm to generate a recursive array and use the BerlekampMassey algorithm (BMA) to ﬁnd the minimum recursion in the array. Massey showed [1] that the BMA synthesizes the minimum length shift register capable of generating the recursive array and so we will refer to it as the shift register synthesis algorithm (SRSA). The weight distribution function follows easily from the minimum recursion. We introduce the key steps of the problem by an example. We will compute the weight distribution function of the code [1 +D + D2 1 + D2 ], whose encoder and state diagram are shown in Fig. 1. We are interested in enumerating the error events for the code, i.e. those paths in the state diagram which deviate from the zerostate and remerge with
Supported in part by NSF grant # CCR9805885.
Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 314–323, 1999. c SpringerVerlag Berlin Heidelberg 1999
Computing Weight Distributions of Convolutional Codes
315
0/00
v1
00 1/11
u
0/11 1/00
D
D
01
10 0/10
1/01
v2
0/01 11
1/10
Fig. 1. Encoder and State Diagram for code with G(D) = [1 + D + D2
1 + D2 ].
the zerostate exactly once. It can be seen by inspection that the free distance of this code is 5 and that there is exactly 1 error event of output weight 5. There are 2 events of weight 6, 4 of weight 7, 8 of weight 8, and so on. The number of events forms the sequence S = {1, 2, 4, 8, · · · }. By inspection, the sequence satisﬁes the recursion: S0 = 1, Sn = 2Sn−1 for n ≥ 1. Associating the number of events of weight n with the coeﬃcient of X n , we get the weight distribution series for the code as X 5 + 2X 6 + 4X 7 + 8X 8 + · · · = X 5 (1 + 2X + 4X 2 + · · · ) = X 5
∞
Si X i .
i=0
To get the weight distribution function, consider the sum ∞ i=0
Si X i = S0 +
∞
2Si−1 X i = S0 + 2X
i=1
∞
Si−1 X i−1 = S0 + 2X
i=1
∞
Si X i .
i=0
Substituting S0 = 1 and solving, we get the output weight distribution function X5
∞ i=0
Si X i =
X5 . 1 − 2X
In the simple example above, we were able to easily generate the recursive series and identify the recursion. It is not so easy in general. The next section deﬁnes weight distribution functions. Section 3 explains some of the theory of linear recursion relations and proves the key result that the set of ﬁnite recursions is a principal ideal. Section 4 indicates how to generalize shift register synthesis to certain rings. Section 5 describes the generalized Viterbi Algorithm (GVA). Sections 6 and 7 describe how to generate the weight distribution series and compute the weight distribution function respectively. The ﬁnal section makes some concluding remarks.
316
2
Mehul Motani and Chris Heegard
Weight Distributions
We ﬁrst introduce some notation. Usually, F will denote a field and R a commutative ring with identity. R[X1 , · · · , Xn ] and R[[X1 , · · · , Xn ]] are respectively the polynomial and power series rings in n variables. We denote the set of integers as Z and the set of rational numbers as Q. An error event is a nonzero ﬁnite weight codeword for which the encoder starts in the zero state, departs, and returns to the zero state exactly once. In addition, we will call two error events equal if they are shifts of each other. The weight distribution of a convolutional code is deﬁned as the weight distribution of the set of error events. The output weight distribution series (OWDS) of a code is a power series in Z[[X]], where the coeﬃcient of X n , an , indicates the number of error events of weight n. A recursion relation can be found in this series and it can expressed as a rational function, called the output weight distribution function (OWDF): T (X) = a1 X 1 + a2 X 2 + · · · + an X n + · · · =
Ω(X) , Λ(X)
(1)
where Λ(X), Ω(X) are polynomials over Z with deg Ω(X) < deg Λ(X). The inputoutput weight distribution series (IOWDS) is a multivariate power series in Z[[X, Y ]], where the coeﬃcient of X n Y k , an,k , indicates the number of error events of weight n which correspond to input sequences of weight k. The rational function representation is called the inputoutput weight distribution function (IOWDF): T (X, Y ) =
∞ ∞ n=1 k=1
an,k X n Y k =
Ω(X, Y ) . Λ(X, Y )
(2)
Although we assumed the existence of an encoder for the convolutional code, the OWDS and OWDF are independent of the encoder and depend only on the structure of the code. Since the IOWDS and IOWDF indicate the input sequence associated with the output codeword, they do depend on the choice of encoder. Traditionally, the WDF is computed by signal ﬂow graph analysis and Mason’s gain formula [3]. From [3], we see that the denominator of the WDF is a polynomial with constant term one. In the following section, we will see that this implies that the sequence of coeﬃcients of the WDS is a shift register sequence.
3
Recursion Relations
d Let R be a ring. Let f(X) = i=0 fi X i be an element of R[X], where fi may be equal to zero for any i (e.g. f(X) = 1 − X − X 2 − 0X 3 ). The sequence {Si : i = 0, 1, 2, . . . , } with entries from R is said to satisfy the linear recursion relation speciﬁed by f if d i=0
fi Sr−i = 0 ∀ r ≥ d.
(3)
Computing Weight Distributions of Convolutional Codes
317
P f1
f2

Sd;1
fd


Sd;2
...
S0
Fig. 2. Shift Register deﬁned by f(X) = 1 +
d
i=1
fi X i .
We also call f a recursion for the sequence {Si }. Deﬁne a polynomial with constant term one to be motonic. If f ∈ R[X] is motonic and {Si } satisﬁes the recursion speciﬁed by f, then (3) implies Sr =
d
−fi Sr−i
∀ r ≥ d.
(4)
i=1
This means {Si } is a shift register sequence and it can be generated by the shift register shown in Fig. 2. of the We can also think of the sequence {Si : i = 0, 1, 2, . . . , } as an element ∞ power series ring R[[X]] by associating with it the series S(X) = i=0 Si X i . The series S(X) is often called the generating function associated with the sequence {Si }. The generating function has a nice form when S(X) comes from a shift register sequence. Suppose the sequence {Si } is a shift register sequence which satd isﬁes the recursion speciﬁed by the motonic polynomial f(X) = 1 + i=1 fi X i . Then by a similar derivation as in [9], we can write S(X) =
∞ i=0
Si X i =
d−1 i=0
d−i−1 fi X i k=0 Sk X k . f(X)
(5)
Note that the numerator depends on the d initial states of the shift register. The denominator, f(X), speciﬁes the recursion and is independent of the initial conditions. If f(X) is the minimum degree polynomial which speciﬁes the recursion, it is called the characteristic function of the series. Suppose S ∈ R[[X1 , . . . , Xn ]] and there exist f and g in R[X1 , . . . , Xn ] such that S = fg . Then f is called a finite recursion. We make the distinction between an arbitrary recursion and a ﬁnite recursion because, unlike one dimensional recursions, not all multidimensional recursions are ﬁnite. The next theorem shows that for certain commutative rings, the set of ﬁnite recursions forms a singly generated ideal in R[X1 , . . . , Xn ].
318
Mehul Motani and Chris Heegard
Theorem 1. Let R be a commutative Noetherian Unique Factorization Domain (UFD). Let K = R[X1 , . . . , Xn ], and L = R[[X1 , . . . , Xn ]]. Let S ∈ L and let I = {h ∈ K : Sh = g for some g ∈ K}. Then I is a principal (singly generated) ideal of K. Proof. The basic idea behind the proof is a follows. It is clear that I is an ideal of K. Next we show that any two elements of the ideal are multiples of some element in the ideal. By the Hilbert Basis Theorem, K is ﬁnitely generated. The result then follows by collapsing the ﬁnite basis to a single generator. Since 0S = 0, I = ∅. Suppose h1 , h2 ∈ I, a ∈ K, Sh1 = g1 , and Sh2 = g2 . Then S(h1 + h2 ) = Sh1 + Sh2 = g1 + g2 ∈ K. Also S(h1 a) = (Sh1 )a = g1 a ∈ K. So I is an ideal of K. Next, let d1 = gcd(h1 , g1 ) and d2 = gcd(h2 , g2 ). Then ∃ h1 , h2 , g1 , g2 ∈ K suchthat Sh1 = g1 Sd1h1 = d1 g1 Sh1 = g1 ⇒ ⇒ Sh2 = g2 Sd2h2 = d2 g2 Sh2 = g2 where gcd(h1 , g1 ) = 1 and gcd(h2 , g2 ) = 1. Note that the above equations imply that h1 , h2 ∈ I. Now, let d= gcd(h1 , h2 ). Then ∃ h1 , h2 ∈ K such that Sh1 = g1 Sdh1 = g1 =⇒ =⇒ g1 h2 = g2 h1 Sh2 = g2 Sdh2 = g2 Recall that sinceR is a UFD, K is a UFD. gcd(h1 , g1 ) = 1 gcd(h1 , g1 ) = 1 h h ⇒ ⇒ 1 2 gcd(h2 , g2 ) = 1 gcd(h2 , g2 ) = 1 h2 h1 So h1 = uh2 where u ∈ K is a unit. h1 = d1 h1 = d1 uh2 where h2 ∈ I. Then h1 = uh2 ⇒ h1 = uh2 ⇒ h2 = d2 h2 Finally, recall that since R is Noetherian, the Hilbert Basis Theorem implies K is Noetherian. So there exists a ﬁnite generating set for I which can be collapsed into a single generator by the above argument. So I is a principal ideal of K.
4
Generalized Shift Register Synthesis
The Shift Register Synthesis Algorithm (SRSA), as described in [1], takes an input sequence, whose elements come from a ﬁeld F , and generates the shift register of minimum length, L, which generates the sequence. The characteristic function of the shift register is a polynomial with coeﬃcients from F , degree ≤ L and constant term one. In other words, the SRSA ﬁnds the minimum degree motonic polynomial in F [X] which speciﬁes the recursion. Recalling that the set of recursions forms a principal ideal in F [X], it is easy to see that the SRSA solution is also the generator of the ideal. We want to generalize the SRSA and apply it to a sequence whose elements lie in a ring. For example, if we input an integer sequence, which has an underlying
Computing Weight Distributions of Convolutional Codes
319
motonic integer recursion, it turns out that the SRSA yields that motonic integer recursion. In this section, we show why. Recall that while Z is not a ﬁeld, it can be embedded in its ﬁeld of fractions, Q. In this case, we can consider the integer sequence as a rational sequence, and so the SRSA should yield the minimum motonic rational recursion. So we need to show that the minimum motonic rational recursion lies in Z[X]. Let S ∈ Z[[X]] ⊂ Q[[X]]. Let I = {h ∈ Z[X] : hS ∈ Z[X]} and J = {h ∈ Q[X] : hS ∈ Q[X]}. By Theorem 1, I and J are singly generated ideals. Let I = f and J = g for some f ∈ I and g ∈ J . Since Z[X] ⊂ Q[X], I ⊂ J . Figure 3 depicts the various inclusions.
Q [X ] J
Z[X ]
I f
g
Fig. 3. Z[X] ⊂ Q[X], I ⊂ J .
A simple argument shows that any generator of I also generates J : J = g ⇒ J = ug for any unit u ∈ Q[X]. By clearing the denominators, it is possible to choose u such that ug ∈ Z[X]. So ugS ∈ Z[X] which implies ug ∈ I. This means ug is a Z[X]multiple of f and so J = f. Solution via Mason’s gain formula [3] means there exists a motonic polynomial in I. This implies that there exists a motonic generator of I, which, by the preceding argument, generates J . In a singly generated ideal, generators are unique up to multiplication by a unit, and so the motonic generator, if it exists, is unique. Thus the motonic generator of J lies in Z[X] and this is precisely what the SRSA computes. The discussion above is easily generalized by replacing Z by a Noetherian UFD R and Q by the ﬁeld of fractions of R.
5
The Generalized Viterbi Algorithm (GVA)
A trellis for a convolutional encoder is an extension in time of the encoder’s state diagram. Thus a trellis is a directed graph consisting of vertices, or states, interconnected by edges, or branches. In a recent paper, McEliece [4] presented a generalized version of the Viterbi Algorithm which operates on a trellis in which the branch labels come from a semiring. Examples of semirings include
320
Mehul Motani and Chris Heegard 1
1
00
00 2
x
x 01
2
x 01
1
01
. . .
x 10
x x x
2
1
x 10
11
00 2
x
10
x x 11
x
11
Fig. 4. Trellis for code with G(D) = [1 + D + D2
1 + D2 ].
the nonpositive real numbers and the familiar ring of polynomials, Z[X]. Figure 4 depicts a trellis section of an encoder for the convolutional code which has a generator matrix given by G(D) = [1 + D + D2 1 + D2 ]. As previously mentioned, the branch labels come from the semiring. A path is a sequence of branches and the path metric of a path P is the product (“·”) of the labels of the branches in the path, taken in order. The flow between two states, S1 and S2 , is deﬁned as the sum (“+”) of the path metrics of all paths starting at S1 and ending at S2 . For example, when we use the semiring of polynomials over Z described above, the ﬂow between two states is the weight enumerator polynomial that counts the number of paths of each possible weight between the states. In this context, the generalized Viterbi Algorithm is an eﬃcient algorithm for computing the ﬂow between two states [4].
6
Using the GVA to Compute the WDS
We can use the GVA to enumerate the paths which diverge immediately from the zero state and remerge with the zero state exactly once. Let the semiring be Z[X] with the usual polynomial multiplication and addition. We label each branch in the trellis with a monomial X n , where the exponent n corresponds to the Hamming weight of the output associated with the branch. Now all error events correspond to paths which leave and end with the zero state once. Now consider a path, p1 , which stays at the zero state for a ﬁnite time, then diverges and remerges with the zero state. Then p1 corresponds to the same error event as the path, p2 , which immediately diverges from the zero state, imitates p1 and remerges with the zero state. We can modify the original trellis in the following manner to prevent overcounting paths corresponding to the same error event. In the original trellis, remove all nonzero branches which diverge from the zero state except the initial one. In addition we break the selfloop at the zero state by removing the initial branch connecting the zero states. Note that a branch can be eﬀectively removed by setting the branch label to zero. Figure 5 shows the modiﬁed trellis for the trellis of Fig. 4. Now we apply the GVA to the modiﬁed trellis, setting the initial ﬂow to 1 for the zero state and 0 for all nonzero states. After each iteration, the ﬂow at the zero state consists of terms in the WDS. It will soon be clear that the lower order
Computing Weight Distributions of Convolutional Codes 0 00
1 00
2
x
x 01
2
2
x
10
11
x
0
11
10
x
0 01
10
x
. . .
x
x x 11
01
1
x
x x
00 2
x
1
x
x x
2
01
1
x
1 00
x
0 01
1
10
1 00
321
10
x x 11
x
11
Fig. 5. Modiﬁed Trellis for code with G(D) = [1 + D + D2
1 + D2 ].
terms will stabilize after a certain number of iterations. So after every iteration the stabilized terms form a truncated version of the WDS. Let us apply the GVA to the convolutional code of Fig. 5. Table 1 shows the computations. After seven and eight iterations of the GVA, we see that the partial WDS are x5 + 2x6 + 4x7 + 4x8 + x9 and x5 + 2x6 + 4x7 + 7x8 + 5x9 + x10 respectively. Note that the ﬁrst three terms remain unchanged after the seventh iteration. From Fig. 5, we see that changes to the ﬂow at the 00 state depend on the ﬂow at all other states. After the seventh iteration, the minimum degree term at the 10 state is 3x6 . Since any path connecting 10 to 00 has path metric at least x2 , it can aﬀect only the coeﬃcients of the terms of degree eight and higher. The minimum degree term at 01 is x5 . Since any path from 01 to 00 has path metric at least x3 , it also can aﬀect only the coeﬃcients of the terms of degree eight and higher. A similar argument for 11 proves that the terms of degree seven and lower in the WDS will remain unchanged after every iteration thereafter. The above paragraphs describe how to compute the OWDS. To compute the IOWDS, we simply label each branch with a monomial from Z[X, Y ], i.e. X n Y k , where n and k correspond to the Hamming weights of the output and input respectively associated with the branch. Table 1. GVA Computations for G(D) = [1 + D + D2 State 0 00 1 01 0 10 0 11 0
1 + D2 ].
Iteration 3 4 5 6 x5 x5 + x6 x5 + 2x6 + x7 x5 + 2x6 + 3x7 + x8 x3 x4 x4 + x5 2x5 + x6 4 4 5 5 6 5 x x +x 2x + x x + 3x6 + x7 x4 x4 + x5 2x5 + x6 x5 + 3x6 + x7 7 8 00 x5 + 2x6 + 4x7 + 4x8 + x9 x5 + 2x6 + 4x7 + 7x8 + 5x9 + x10 01 x5 + 3x6 + x7 3x6 + 4x7 + x8 6 7 8 6 10 3x + 4x + x x + 6x7 + 5x8 + x9 11 3x6 + 4x7 + x8 x6 + 6x7 + 5x8 + x9 1 0 x2 0 0
2 0 0 x3 x3
322
7
Mehul Motani and Chris Heegard
Using the SRSA to Compute the WDF
We have seen that the stable terms of the GVA computations yield a truncated version of the WDS. From this partial WDS, we want to ﬁnd the recursion relation and thus compute the WDF whose series expansion yields the entire WDS. It can be shown that this is equivalent to solving the Key Equation. The Euclidean Algorithm can be used to solve the Key Equation but the SRSA provides an algorithmically appealing alternative. The SRSA synthesizes the minimum length linear feedback shift register (LFSR) which generates the WDS coeﬃcients. Theorem 1 in [1] allows one to compute an upper bound on the number, M, of required stable terms of the WDS based on a bound on the Xdegree, D, of the denominator of the WDF. Letting A be the weighted adjacency matrix of the state diagram of the code, it can be shown that D≤ max {deg(I − A)ij } and M ≤ 2D. i
j
We assume we have a noncatastrophic encoder (reasonable since catastrophic encoders are not very interesting). A noncatastrophic encoder for the code implies that the state diagram has no zero weight loops except for the selfloop at the zero state. This means there are a ﬁnite number of error events with a given Hamming weight. Let us ﬁrst compute the OWDF (as in [8]). The noncatastrophic encoder assures that the OWDS, S(X), is an element of Z[[X]]. Theorem 1 implies that the set of recursions is a principal ideal and the SRSA can be used to ﬁnd the generator of that ideal, which is the minimum motonic recursion in Z[X]. us compute the IOWDF. It is clear that the IOWDS, S(X, Y ) = Now let i j i,j Sij X Y , is an element of Z[[X, Y ]]. The IOWDS can also be represented by an inﬁnite two dimensional array over Z, and computing the IOWDF is equivalent to ﬁnding the “minimum” twodimensional recursion in this array. Although it is not clear what “minimum” means for recursions in multiple variables, Theorem 1 implies that the set of ﬁnite recursions is a principal ideal. The “minimum” recursion we are searching for is the generator of that ideal. The noncatastrophic encoder implies that S(X, Y ) is actually an element of Z[Y ][[X]], i.e. it is a univariate series with polynomial coeﬃcients. In other words, the IOWDS can be represented as a sequence of polynomials in Y . The SRSA can then be used to compute the minimum motonic recursion in this sequence. The algorithm used to compute the recursion relation using the GVA and SRSA is described in pseudocode below. Algorithm 1: label and modify trellis as described while(number of stable terms < bound) run one GVA iteration if(one more stable term) call shift register synthesis algorithm end if end while
Computing Weight Distributions of Convolutional Codes
8
323
Conclusion
In this paper, we presented a novel method to compute the WDF of a convolutional code. We used the generalized Viterbi Algorithm to compute a recursive series and shift register synthesis to ﬁnd the minimum recursion in this series. Areas of further research include issues related to the implementation of the algorithm. Since the algorithm requires symbolic computation in multivariate polynomial rings, an eﬃcient implementation is needed to compute the weight distribution of large constraint length convolutional codes. We conclude by giving the OWDF for the best rate 1/2, constraint length ν = 6 convolutional code, which has generator G = [133 171]. The coeﬃcients of the numerator Ω(X) and the denominator Λ(X) are shown below: Ω = [0 · 0 · 0 · 0 · 0 · 0 · 0 · 0 · 0 · 0 · 11 · 0 · 6 · 0 · 25 · 0 · 1 · 0 · 93 · 0 · 15 · 0 · 176 · 0 · 76 · 0 · 243 · 0 · 417 · 0 · 228 · 0 · 1156 · 0 · 49 · 0 · 2795 · 0 · 611 · 0 · 5841 · 0 · 1094 · 0 · 9575 · 0 · 1097 · 0 · 11900 · 0 · 678 · 0 · 11218 · 0 · 235 · 0 · 8068 · 0 · 18 · 0 · 4429 · 0 · 20 · 0 · 1838 · 0 · 8 · 0 · 562 · 0 · 1 · 0 · 120 · 0 · 0 · 0 · 16 · 0 · 0 · 0 · 1] Λ = [1 · 0 · 4 · 0 · 6 · 0 · 30 · 0 · 40 · 0 · 85 · 0 · 81 · 0 · 345 · 0 · 262 · 0 · 844 · 0 · 403 · 0 · 1601 · 0 · 267 · 0 · 2509 · 0 · 389 · 0 · 3064 · 0 · 2751 · 0 · 2807 · 0 · 8344 · 0 · 1960 · 0 · 16133 · 0 · 1184 · 0 · 21746 · 0 · 782 · 0 · 21403 · 0 · 561 · 0 · 15763 · 0 · 331 · 0 · 8766 · 0 · 131 · 0 · 3662 · 0 · 30 · 0 · 1123 · 0 · 3 · 0 · 240 · 0 · 0 · 0 · 32 · 0 · 0 · 0 · 2]
References 1. J. L. Massey, “ShiftRegister Synthesis and BCH Decoding”, IEEE Transactions of Information Theory, Vol. 15, pp. 122127, January 1969. 2. A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, New York: McGrawHill Book Company, 1979. 3. S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications, Englewood Cliﬀs: Prentice Hall, 1983. 4. R. J. McEliece, “On the BCJR Trellis for Linear Block Codes”, IEEE Transactions of Information Theory, Vol. 42, pp. 10721092, July 1996. 5. P. Fitzpatrick and G.H. Norton, “Linear recurring sequences and the path weight enumerator of a convolutional code” Elect. Lett., 27 (1991), pp. 9899. 6. I. Onyszchuk, “Finding the Complete Path and Weight of Convolutional Codes”, JPL TDA Progress Report, 42100, 1990. 7. R. J. McEliece, “How to Compute Weight Enumerators for Convolutional Codes”, Communications and Coding, M. Darnell and B. Honary, eds., Taunton, Somerset, England: Research Studies Press Ltd., pp. 121141, June 1998. 8. M. Motani and C. Heegard, “The Viterbi Algorithm Meets The Key Equation”, Proceedings of International Symposium on Information Theory (Boston, MA), August 1998. 9. S. Golomb, Shift Register Sequences, Revised Edition, Aegean Park Press, May, 1982.
Properties of Finite Response Input Sequences of Recursive Convolutional Codes Didier Le Ruyet1 , Hong Sun2 , and Han Vu Thien1 1
2
Conservatoire National des Arts et M´etiers, Laboratoire Signaux et Syst`emes 75141 Paris Cedex 03, France leruyet@cnam.fr Huazhong University of Science and Technology, Departement of Electronic and Information Engineering, 430074 Wuhan, China caes@blue.hust.edu.cn
Abstract. A recursive convolutional encoder can be regarded as an inﬁnite impulse response system over the Galois Field of order 2. First, in this paper, we introduce ﬁnite response input sequences for recursive convolutional codes that give ﬁnite weight output sequences. In practice, we often need to describe the ﬁnite response sequence with a certain Hamming weight. Then, diﬀerent properties of ﬁnite response input sequences are presented. It is shown that all ﬁnite response input sequences with a certain Hamming weight can be obtained in closedform expressions from the socalled basic sequences. These basic sequences are presented for important recursive convolutional encoders and some possible applications are given.
1
Introduction
Recursive convolutional codes have seldom been employed in the past because their weight enumerating function is equivalent to that of the non recursive convolutional codes [1]. But they have been renewed since they have been used to construct serial and parallel concatenated convolutional codes (turbo codes) whose performances are near Shannon limit (see [2] and [3]). The works of Battail et al. [4] have shown that recursive convolutional codes mimic random coding if the denominator polynomial is chosen as a primitive polynomial. In comparison with non recursive convolutional codes, the input sequences with ﬁnite weight are associated with output sequences with inﬁnite weight, except for a fraction of ﬁnite weight input sequences which generate ﬁnite weight output sequences. These input sequences are called ﬁnite response input sequences (FRISs). In [5], FRISs have been introduced ; the enumeration of FRISs for a Hamming weight w=2 is simple but however, no practical method to enumerate these sequences with a certain Hamming weight w greater than 2 has yet been given. The goal of this paper is to study the properties of ﬁnite response input sequences with weight w and to show how these sequences can be enumerated from one or more basic FRISs. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 324–333, 1999. c SpringerVerlag Berlin Heidelberg 1999
Properties of FRIS
325
In the next section, we recall some classical deﬁnitions of convolutional codes. The third section we give diﬀerent properties of FRIS and introduce basic FRIS. An example is given to show how these properties can be used to enumerate all the FRIS in closed form. Then, the basic FRISs are presented for some important recursive convolutional encoders. Finally, we will show how these properties can be used to ﬁnd the Hamming weight of the output sequence of any FRIS and to build interleavers for turbo codes.
2
Review of Basics
In order to keep the following expositions selfcontained, we shall introduce recursive convolutional codes and some deﬁnitions to be used later in this section. A rate 1/r recursive convolutional encoder maps the input sequence of information bits u0 , u1 , u2 , . . . into the output sequence of rdimensional code blocks y0 , y1 , y2 , . . . with yn = (y1n , y2n , ..., yrn ) . The encoder also goes through the internal state sequence s0 , s1 , s2 , ..., where each encoder state sn at time n is a Mtuple : sn = [s1n , s2n , ...sM n] . M is the number of delay cells of the encoder and sin is the state at time n of the ith delay cell. The structure of a recursive systematic convolutional encoder of rate 1/2 is shown in Fig.1. A recursive encoder can also be regarded as an inﬁnite impulse response (IIR) system over the ﬁnite ﬁeld GF(2) with input u(D) and output y(D), where D is the unitdelay operator: y(D) = u(D)G(D) with G(D) =
P1 (D) P2 (D) Pr (D) , , ..., Q(D) Q(D) Q(D)
(1)
and y(D) = (y1 (D), y2 (D), ..., yr (D)).
326
Didier Le Ruyet, Hong Sun, and Han Vu Thien
y1 n
un
D
p0
s1n
D
qM  1
q2
q1
s2 n
p1
D
p2
qM
s Mn
pM  1
pM
y2n Fig. 1. The structure of a recursive systematic convolutional encoder of rate 1/2. where Q(D) is a primitive polynomial of degree M: Q(D) = q0 + q1 D + ... + qM DM and Pi (D) is a polynomial of degree at most equal to M: Pi (D) = p0i + p1i D + ... + pM i DM . When the recursive convolutional encoder is systematic, we have y1n = un since P1 (D) = Q(D). Since Q(D) is a primitive polynomial, the encoder generates a pseudo noise (PN) sequence or a maximum length sequence. The period of the PN sequence is 2M − 1. The weight of the output sequence for one period of the PN sequence is 2M −1 [6]. An example of state diagram is shown in Fig.2 for the primitive polynomial Q(D) = 1 + D + D3 . Each edge is labeled by xwI y wO where wI and wO are respectively the weight of the corresponding input and output bit. As the edge drawn in dotted line corresponds to an input bit equal to 0, we can clearly observe the loop corresponding to the PN sequence of period 7 and that the output weight of the PN sequence is equal to 4. We say that the encoder with Q(D) is IIR, since the weightone input sequence (impulse input) produces an inﬁnite response, i.e. an inﬁnite weight output sequence. Definition 1. A finite response input sequence (FRIS) is an input sequence whose first ”1” causes the encoder state to leave the zero state S0 = [0, 0, ..., 0] at time n0 and whose last ”1” brings it back to S0 at time n0 + L − 1 (L > 0). A FRIS will produce a ﬁnite weight output sequence. These FRISs are represented by F (D).
Properties of FRIS [110 ]
y
327
x
1 [111 ] [100 ] S 1 [ 000 ] SO
y
x xy
xy
xy 1
[ 011 ]
x
xy 1
S*
y
[ 001 ] x
y
[101 ] 1
[ 010 ]
Fig. 2. The state diagram for a primitive polynomial Q(D) = 1 + D + D3 .
3
Properties of Finite Response Input Sequences (FRIS)
We have the following theorems about F (D). Theorem 1. A FRIS of a recursive convolutional encoder satisfies the equation: F (D) ≡ 0
(mod Q(D))
(mod 2) .
(2)
Proof. From (1), if and only if Q(D)u(D) (mod 2) , i. e. Q(D) is a factor of u(D) over the ﬁnite ﬁeld GF(2), then yi (D) becomes a ﬁnite order polynomial or a ﬁnite weight output sequence. Since Q(D) is a primitive polynomial, the encoder generates a maximum length sequence of period 2M −1 . We then have: D0 ≡ D2
M
−1
≡1
(mod Q(D))
(mod 2) .
(3)
− 1) (mod 2) .
(4)
Then, (2) becomes: F (D) ≡ 0
(mod Q(D))
(mod D2
M
−1
Theorem 2. If we have a FRIS F(D) of weight w noted F (w) (D): F (w)(D) = Dn1 + Dn2 + ... + Dnw
(5)
where n1 = 0 and n2 , ..., nw are any positive integer, then there exists a family of weight w FRISs : (w) Fm (D) = Dm0 (Dn1 +m1 (2 0 m2 ...mw
M
−1)
+ Dn2 +m2 (2
M
−1)
+ ... + Dnw +mw (2
M
−1)
) (6)
where m1 = 0 and m0 , m2 , ..., mw can be any integer, positive, negative, or zero.
328
Didier Le Ruyet, Hong Sun, and Han Vu Thien
Proof. From (4),(5) and (6), we obtain: (w) (D) ≡ Dm0 F (w)(D) ≡ 0 Fm 0 m2 ...mw
(mod Q(D))
(7)
(mod D2
M
−1
− 1)
(mod 2) .
This theorem tells us that if we ﬁnd any FRIS in a family, we can deduce all the FRISs of this family. We note that there are two diﬀerent kinds of FRISs called simple and complex FRISs. Definition 2. A FRIS is simple if its last ”1” solely brings back the encoder state to S0 . Otherwise, the FRIS is complex since the encoder state returns to S0 more than once. We will now choose a unique representative for each family of simple FRISs, called basic FRIS. (w)
Definition 3. F0 (D) is called a basic FRIS for weight w if and only if the following three conditions are satisfied: (w)
F0
(D)is a FRIS with the form (5) 0 < ni − ni−1 < 2
M
−1
(8)
(∀i)
(9)
nw = min .
(10)
Condition (8) means that the ﬁrst ”1” of a basic FRIS should occur at time 0; condition (9) means that after rearranging n1 , n2 , ...nw in ascendant form, the duration between two consecutive “1” should be less than 2M − 1; condition (10) means that we choose as the basic FRIS the sequence with the minimal length. The basic FRISs of a recursive convolutional encoder depend only on Q(D). We call F (w)(D) which satisﬁes conditions (8) and (9) a secondary basic (w) FRIS FS (D). The next theorem will show how to describe all the FRISs with weight w. Theorem 3. Supposing w = i wi (wi > 1), all the FRISs can be obtained in (w) (w ) the form (6) from F0 (D) and from combinations of F0 i (D). In particular for w=2 and w=3, since we have no combination by w = i wi (wi > 1), each FRIS is obtained from basic FRISs according to (6). The next theorem will give us the total number of basic FRISs for each weight w. Theorem 4. For w=2, there exists only one basic FRIS: 1 + D2 For w=3, there exists M 2 −2 basic F RISs. A33
M
−1
(11)
Properties of FRIS
For 4 ≤ w < 2M − 1, there exists M (2 − 2)(2M − 3)w−3 − Nw Aw w
basic F RISs.
329
(12)
Nw is the number of F (w) (D) which are constructed from secondary basic FRISs (w ) FS i (D) ; Anp is the number of ordered selections of p elements from a set of n elements, and c means c rounded to the nearest integer towards plus inﬁnity. Proof. Since Q(D) is a primitive polynomial of degree M, (s1 s0 = S0 ) = S1 when an input “1” occurs at time 0, where S1 = [1, 0, ...0]; and then, in the absence of an input, sn goes through all possible 2M − 1 nonzero encoder states and repeats with period 2M − 1; it returns to S0 if and only if an input “1” occurs and the current state is S ∗ = [0, ..., 0, 1]. So, if we exclude the ﬁrst “1” and the last “1” of this FRIS, the w − 2 other “1”s can occur under any state sni sni = S0 , sni = S ∗ , sni = sni−1 . Note that, for the second ”1” of the FRIS, sni−1 = s0 . Therefore, there are (2M − 2)(2M − 3)w−3 diﬀerent secondary basic FRISs including those that are constructed from F (wi ) (D); on the other hand, from (6) each family includes Aw w secondary basic FRISs if ni − ni−1 = nj − nj−1 (i = j) and possibly less than Aw w otherwise. As a result, we conclude that there exist ((2M − 2)(2M − 3)w−3 − Nw )/Aw w basic FRISs. For w = 2, there is only one basic FRIS which has the ﬁrst “1” corresponding to the leaving of the zero state to S1 and the other “1” for the return from M (2) s2M −1 = S ∗ to the zero state S0 , that is, F0 (D) = 1 + D2 −1 . For w = 3, N3 = 0, then there are (2M − 2)/A33 basic FRISs. Example 1. Supposing M=3 and Q(D) = 1 + D + D3 . (2) For w=2, since 2M − 1 = 7, F0 (D) = 1 + D7 . All weight2 FRISs can be written as follows according to (6) : (2) Fm0 m2 (D) = Dm0 (1 + D7m2 ), (3) For w=3, there exists (2M − 2)/A33 = 1 basic FRIS, F0 (D) = 1 + D + D3 . All weight3 FRISs can be written as follows according to (6) : (3) Fm0 m2 m3 (D) = Dm0 (1 + D1+7m2 + D3+7m3 ), for example, (3) F6,−1,−1 (D) = 1 + D2 + D6 , (3)
F4,0,−1(D) = 1 + D4 + D5 . (2)
For w=4, since 4 = 2+2, and F0 (D) = 1 + D7 , we have F (4)(D) which are (2) (4) combinations of secondary basic FRISs FS (D) written by F∗ (D): (4) (2) (2) F∗ (D) = F0 (D) + Dli F0 (D), li = 1, 2, ..., 6. M Clearly, here N4 = 6, ((2 − 2)(2M − 3)4−3 − N4 )/A44 = 1 and we have (4) (4) one F0 (D) that is, F0 (D) = 1 + D2 + D3 + D4 . Therefore, the following two equations describe all simple weight4 FRISs : F (4)(D) = Dm0 (1 + D2+7m2 + D3+7m3 + D4+7m4 ),
330
Didier Le Ruyet, Hong Sun, and Han Vu Thien
F∗ (D) = Dm0 [(1 + D7m1 ) + Dli (1 + D7m2 )], where m0 , mi can be any integer and li = 1, 2, ..., 6. And the following equation describe all complex weight4 FRISs: (4) Fcom (D) = Dm0 (1 + D7m1 + D7m2 + D7m3 ) where m0 , mi can be any integer and mi = mj (i = j). (4)
4
Tables
In this section, we will give a list of basic FRISs for recursive convolutional encoders with M=2, 3, 4 and 5. The following basic FRISs have been obtained from an exhaustive search since there is no known method to ﬁnd them. Table 1. Basic FRISs for M = 2 Q(D) = 1 + D + D2 . (w)
w
F0
2 3 4
1 + D3 1 + D + D2 1 + D + D3 + D4
Table 2. Basic FRISs for M = 3 Q(D) = 1 + D + D3 . (w)
w
F0
2 3 4
1 + D7 1 + D + D3 1 + D2 + D3 + D4
Table 3. Basic FRISs for M = 4 Q(D) = 1 + D + D4 . (w)
(w)
(w)
w
F0
w
F0
w
F0
2 3 3 3
1 + D15 1 + D + D4 1 + D2 + D8 1 + D5 + D10
4 4 4 4
1 + D2 + D4 + D5 1 + D5 + D6 + D7 1 + D + D3 + D7 1 + D + D5 + D8
4 4 4
1 + D3 + D6 + D8 1 + D3 + D4 + D9 1 + D4 + D8 + D10
Properties of FRIS
331
Table 4. Basic FRISs for M = 5 Q(D) = 1 + D + D2 + D3 + D5 . (w)
(w)
(w)
w
F0
w
F0
w
F0
2 3 3 3 3 3 4 4 4 4 4 4 4 4
1 + D31 1 + D3 + D8 1 + D7 + D9 1 + D + D12 1 + D6 + D16 1 + D4 + D17 1 + D4 + D5 + D6 1 + D + D4 + D7 1 + D + D3 + D10 1 + D2 + D7 + D11 1 + D6 + D8 + D11 1 + D4 + D9 + D12 1 + D8 + D10 + D12 1 + D3 + D5 + D13
4 4 4 4 4 4 4 4 4 4 4 4 4 4
1 + D2 + D12 + D13 1 + D + D5 + D14 1 + D6 + D7 + D14 1 + D2 + D8 + D14 1 + D10 + D13 + D14 1 + D4 + D8 + D15 1 + D9 + D10 + D15 1 + D5 + D11 + D15 1 + D3 + D11 + D16 1 + D9 + D14 + D16 1 + D13 + D15 + D16 1 + D + D9 + D17 1 + D7 + D12 + D17 1 + D11 + D13 + D17
4 4 4 4 4 4 4 4 4 4 4 4 4
1 + D3 + D15 + D17 1 + D5 + D16 + D17 1 + D6 + D9 + D18 1 + D5 + D12 + D18 1 + D7 + D16 + D18 1 + D3 + D7 + D19 1 + D8 + D9 + D19 1 + D4 + D10 + D19 1 + D11 + D18 + D19 1 + D10 + D17 + D20 1 + D3 + D9 + D20 1 + D6 + D13 + D21 1 + D8 + D17 + D21
5 5.1
Examples of Application Hamming Weight of the Output Sequences of Finite Input Response Sequences
In this section, we will show how to use the properties introduced above to compute the Hamming weight of the output sequence of any FRIS. Theorem 5. Consider an arbitrary FRIS of weight w F (w) (D): F (w)(D) = Dm0 (Dn1 + Dn2 + ... + Dnw ), where n1 = 0 and ni > ni−1 , (∀i). d[F (w)(D)] denotes the Hamming weight of the output sequence. We have (w)
d[F (w)(D)] = d[FS (D)] + d[P N ]
w
bi
(13)
i=2
(w)
where FS (D) is the secondary basic FRIS FS (D) = Dl1 + Dl1 +l2 + ... + D (w)
with
i li
,
l1 = 0 li ≡ ni − ni−1 − 1 (mod 2M − 1) + 1 (ni − ni−1 − li ) bi = . 2M − 1
(14)
332
Didier Le Ruyet, Hong Sun, and Han Vu Thien
d[P N ] is the weight of the output sequence for one period of the PN sequence. d[P N ] = 2M −1 since Q(D) is a primitive polynomial. This theorem tells us that we can calculate the Hamming weight of the output sequence of any FRIS from its associated secondary basic FRIS. We will now give a method to ﬁnd the Aw w secondary basic FRISs from the basic FRIS. (w) Consider a basic FRIS F0 (D) (w)
F0
(D) = Dn1 + Dn2 + ... + Dnw ,
(15)
where n1 = 0 , ni > ni−1 (∀i). From this basic FRIS, we can deduce all the simple FRIS of the family F (w) (D) = Dm0 (Dn1 +m1 (2 l1
l2
M
−1)
+ Dn2 +m2 (2
M
−1)
+ ... + Dnw +mw (2
M
−1)
) (16)
lw
= D + D + ... + D , where m1 = 0 , li = m0 + ni + mi (2M − 1)(∀i) . All the secondary basic FRISs can be obtained from the basic FRIS by permutation of n1 , n2 , ..., nw and then searching m0 , m2 , ..., mw to satisfy the inequality l1 < l2 < ... < lw and l1 = 0 , li − li−1 < 2M − 1. Example 2. Supposing M=3, w=3 and Q(D) = 1 + D + D3 . There is only one basic FRIS : (3) F0 (D) = D0 + D + D3 = 1 + D + D3 . We have (3) FS1 (D) = D0 + D3 + D1+7 = 1 + D3 + D8 with m0 =0,m2 =0, m3 =1 . (3) FS2 (D) = D−1 (D1 + D3 + D0+7 ) = 1 + D2 + D6 with m0 =1,m2=0,m3 =1 . ... 5.2
Interleaver Construction for Turbo Codes
Turbo codes are a parallel concatenation of recursive systematic convolutional codes [2]. The turbo encoder consists of two recursive convolutional codes and an interleaver of size N. An example of a turbo encoder is shown in Fig.3. The N bits information sequence u(D) is encoded twice : ﬁrstly by C1 and secondly after interleaving by C2. A tail sequence composed of M bits is added after the information sequence in order to bring the internal state of the ﬁrst encoder to the zero state. As a consequence only FRISs are allowed. So we can use the properties of FRISs for the construction of the interleaver. The interleaver should improve the weight distribution and the free distance of turbo codes. An optimal interleaver should map the input sequences u(D) which generate low weight output sequences y1 (D) with sequences v(D) which generate high weight output sequence y2 (D) and vice versa. For the construction of the interleaver, we can take into account only the input sequences u(D) which generate low weight output sequences. These sequences can be enumerated using the properties of FRISs introduced above. The
Properties of FRIS
333
u( D )
u( D)
C1
y 1( D )
C2
y 2( D )
E
v( D)
Fig. 3. The structure of a turbo encoder of rate 1/3. weight of the associated output sequence y1 (D) is calculated by using (13). The weight of the output sequence y2 (D) can also be obtained using a generalization of this principle. In [7], we have shown that these properties combined with a tree research method for construction of the interleaver can produce very good interleavers.
6
Conclusion
The ﬁnite response input sequences (FRISs) for a recursive convolutional encoder with a primitive polynomial can be deﬁned by (4). In this paper, new practical properties of FRISs with a certain Hamming weight w are presented. We have introduced the basic FRIS and shown that we could write all FRISs with weight w in closedform expressions from these basic FRISs. These properties can be employed in many applications, such as the computing of the weight enumerators of these codes and the construction of eﬃcient interleavers for turbo codes.
References 1. Forney, G. D.: Convolutional codes I: Algebraic structure. IEEE Trans. Inform. Theory IT16 (1970) 720–738 2. Berrou, C., Glavieux, A.,Thitimajshima, P.: Near Shannon limit error correcting coding and decoding : Turbocodes. Proc. of Int. Conf. on Comm., Geneva, Switzeland (1993) 1064–1070 3. Benedetto, S., Divsalar, D., Montorsi, G., Pollara, F.: Serial concatenation of interleaved codes : performance analysis, design and iterative decoding. IEEE Trans. Inform. Theory IT44 (1998) 909–926 4. Battail, C., Berrou, C.,Glavieux, A.: Pseudorandom recursive convolutional coding for nearcapacity performance. Proc. of GLOBECOM’93, Houston, Texas, USA (1993) 23–27 5. Podemski, R.,Holubowicz, W.,Berrou, C.,Battail, G.: Hamming distance spectra of turbocodes. Ann.Telecommun. 50 (1995) 790–797 6. Golomb, S. W.: Shift register sequences. revised version Aegean Park, Laguna Hills, CA (1982) 7. Le Ruyet, D., Sun, H., Vu Thien, H.: New method for the construction of interleavers. submit to Int. Symp. on Info. Theory, Sorrento, Italia (2000)
Lower Bounds for Group Covering Designs K.K.P. Chanduka, Mahesh C. Bhandari, and Arbind K. Lal Department of Mathematics Indian Institute of Technology, Kanpur, 208 016 India {mcb,arlal}@iitk.ac.in, chanduka@inf.com
Abstract. A group covering design (GCD) is a set of mn points in n disjoint groups of size m and a collection of b k−subsets, called blocks, such that every pairset not contained in the same group occurs in at least one block. For m = 1, a GCD is a covering design [5]. Particular cases of GCD’s, namely transversal covers, covering arrays, Sperner systems etc. have been extensively studied by Poljak and Tuza [22], Sloane [24], Stevens et al. [26] and others. Cohen et al. [8], [9] and Sloane [24] have also shown applications of these designs to software testing, switching networks etc.. Determining the group covering number, the minimum value of b, for given k, m and n, in general is a hard combinatorial problem. This paper determines a lower bound for b, analogous to Sch¨ onheim lower bound for covering designs [23]. It is shown that there exist two classes of GCD’s (Theorems 15 and 18) which meet these bound. Moreover, a construction of a minimum GCD from a covering design meeting the Sch¨ onheim lower bound is given. The lower bound is further improved by one for three diﬀerent classes of GCD’s. In addition, construction of group divisible designs with consecutive block sizes (Theorems 20 and 21) using properties of GCD’s are given.
1
Introduction
Let K and M be sets of positive integers and let λ be a positive integer. A triple (X, G, B) is a group divisible design (GDD), denoted GD[K, λ, M; v], if (i) X is a ﬁnite set of v elements (points); (ii) G = {G1 , G2 , . . . , Gn }, n > 1, is a partition of X with Gi  ∈ M. The elements of G are called groups; (iii) B is a collection of subsets, called blocks, of X such that B ∈ K and B ∩ G ≤ 1 for every B ∈ B and G ∈ G; (iv) every pairset {x, y} ⊂ X such that x and y belong to distinct groups is contained in exactly λ blocks. Observe that a GD[K, λ, {1}; v] is a pairwise balanced design, denoted P BD[K, λ; v]. If K = {k} and M = {m} then GD[K, λ, M; v] is often denoted by GD[k, λ, m; v]. A GD[k, λ, m; km] is called a transversal design and is denoted by T D[k, λ; m], while a GD[k, λ, 1; v] is known as a balanced incomplete block design, denoted B[k, λ; v]. If λ = 1 one usually represents GD[k, λ, m; v], T D[k, λ; m], respectively by GD[k, m; v], T D[k; m]. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 334–345, 1999. c SpringerVerlag Berlin Heidelberg 1999
Lower Bounds for Group Covering Designs
335
A pair (X, B) is said to be a (v, k, λ)−covering design, denoted AD[k, λ; v], if X = v and B is a collection of k−subsets of X such that every distinct pair of elements of X is contained in at least λ blocks of B. Let C(k, λ; v) = min B : (X, B) is an AD[k, λ; v] is called the covering number. In [23], Sch¨ onheim has shown that v λ(v − 1) ≡ α(k, λ; v) (1) C(k, λ; v) ≥ k k−1 where x is the least integer such that x ≤ x. This has been further sharpened by Hanani[14] in the following theorem. Theorem 1. [14] Let (X, B) be a covering design AD[k, λ; v]. If λ(v − 1) ≡ 0 (mod k − 1) and λv(v − 1)/(k − 1) ≡ −1 (mod k) then C(k, λ; v) ≥ α(k, λ; v) + 1. In [4], Caro and Raphael has shown that if λ = 1 and the conditions of Theorem 1 are not satisﬁed then equality in (1) is attained for v ≥ v0 (k). v v−1 Theorem 2. [4] For v ≥ v0 (k), C(k, 1; v) = k k−1 ≡ α(k, λ; v), unless k − 1 is even, (v − 1) ≡ 0 (mod k − 1) and v(v − 1)/(k − 1) ≡ −1 (mod k) in which case C(k, 1; v) = α(k, 1; v) + 1. Recently, the authors [2], Honkala [15] and Zhang [28] have used covering designs to improve lower bounds on binary covering codes. An attempt to extend the counting arguments of covering designs to study q−ary covering codes for q ≥ 3 has been made by Chanduka [5], Chen and Honkala [7]. It motivates a study of a new combinatorial design, called group covering design, which generalizes the concepts of covering designs and group divisible designs. Particular cases of group covering designs, namely, transversal covers, covering arrays, Sperner systems etc. have been extensively studied by Poljak and Tuza [22], Sloane [24], Stevens et al. [26] and others. Cohen et al. [8], [9] and Sloane [24] have also shown applications of these designs to software testing, switching networks etc. Deﬁnitions and basic properties of group covering design and group covering number are discussed in section 2 of this paper. Lower bounds for the group covering number are also obtained in this section. Section 3 contains a few lower bounds for group covering number obtained from known designs. Few constructions for a class of group covering designs are given in section 4. It is shown that for any k there exists two classes of GCD’s which meet the bound given by (5). The paper concludes by constructing some GDD’s with consecutive block sizes using properties of group covering designs. The following two theorems of Hanani [14] give necessary conditions for the existence of a GDD and a TD. Theorem 3. [14] : If a group divisible design GD[k, λ, m; v] exists then v ≥ km, v ≡ 0
(mod m), λ(v − m) ≡ 0
(mod k − 1)
(2)
336
K.K.P. Chanduka, Mahesh C. Bhandari, and Arbind K. Lal
and λv(v − m) ≡ 0
(mod k(k − 1)).
(3)
Theorem 4. [14] : If v = km, m > 1, λ = 1 and m ∈ T D(k) then m ≥ k − 1. Thus, if 1 < m < k−1 then a GD[k, m; km] does not exist. Further, Tarry [27] has shown that a GD[4, 6; 24] does not exist. Therefore, the conditions of Theorem 3 are not suﬃcient for the existence of a GD[k, λ, m; v]. However, group divisible designs with block sizes 3 and 4 are known to exist (see e.g., [3] and [14]) for all λ, m and v satisfying the necessary conditions of Theorem 3 with the exceptions of GD[4, 2; 8] and GD[4, 6; 24]. Very little is known for the case k ≥ 5 except for some special cases, see Assaf [1]. If λ = 1 and v > km then the following theorem gives a necessary condition for the existence of a group divisible design. Theorem 5. If there exists a group divisible design GD[k, m; v] with v > km then v ≥ k(k − 1) + m. Proof. Let (X, G, B) be a group divisible design GD[k, m; v] and let B ∈ B. Then there exists a group G ∈ G such that B G = φ. Let x0 ∈ G. For any point x ∈ B, let Bx be the block which contains the pairset {x0 , x}. Then for any two distinct points x and y of B, (B \{x }) (By \{x0 }) = φ. For if x 0 y0 ∈ Bx By , y0 = x0 , then {y0 , x0 } ⊆ Bx By , contradicting the fact that each pair is contained in exactly one block. Also (B \{x }) G = φ for every x 0 x ∈ B. Thus, the set G ( x∈B Bx \{x0 }) contains m + k(k − 1) points and this number cannot exceed v. Remark 1. If m = 1, the above theorem gives v ≥ k(k − 1) + 1, the famous Fisher s inequality for λ = 1. The above theorem also shows nonexistence of many group divisible designs. For example, it is easy to verify that GD[8, 2; 30], GD[9, 3; 51] and GD[10, 3; 48] do not exist. For other elementary results on GDD and covering designs the reader is referred to [10].
2
Group Covering Designs
It has been observed that a group divisible design may not always exist for certain values of v, k and m. To deal with such cases, the notion of group covering designs analogous to covering designs is introduced in this section. The study of group covering designs apart from being a generalization of covering designs ﬁnds applications in qary covering codes, q ≥ 3 (see, [5], [6] and [7]). Throughout this paper, unless otherwise stated m, n and k are assumed to be positive integers with n ≥ k ≥ 2 and the members of a pairset belong to distinct groups.
Lower Bounds for Group Covering Designs
337
Definition: A triple (X, G, B) is a group covering design (GCD), denoted GC[k, m; mn], if X = mn, G is a partition of X into m−subsets and B is a collection of k−subsets of X such that G ∩ B ≤ 1 for all G ∈ G and B ∈ B and every pairset is contained in at least one block of B. In case  G B  = 1 for G ∈ G and B ∈ B, the group covering design GC[k, m; km] is called a transversal covering design, denoted T C[k; m] (also referred to as transversal covers [26]). It follows immediately that an AD[k, 1; n] is a group covering design GC[k, 1; n] and vice versa. The number G(k, m; mn) = min B : (X, G, B)is a GC[k, m; mn] is called the group covering number. In case n = k, G(k, m; km) will be denoted by tc(k; m). A trivial lower bound for G(k, m; mn) is given by G(k, m; mn) ≥
nm2 (n − 1) . k(k − 1)
(4)
Equality in (4) is attained if there exists a GD[k, m; mn]. Analogous to Sch¨ onheim lower bound[23] for covering designs, the following theorem gives a better lower bound for the group covering number. Theorem 6. Let (X, G, B) be a group covering design GC[k, m; mn]. Then mn m(n − 1) ≡ β(k, m; mn) (say). (5) G(k, m; mn) ≥ k k−1 Proof. Let x0 ∈ X. Then the number of pairsets containing x0 is m(n − 1). If B ∈ B is a block containing x0 then B can cover (k − 1) of these pairsets. Hence the number of blocks in the group covering design containing x0 is at least m(n − 1)/(k − 1). A simple counting argument gives kG(k, m; mn) ≥ mn m(n − 1)/(k − 1) . As G(k, m; mn) is an integer, the result follows. n n−1 Remark 2. If m = 1 then C(k, 1; n) = G(k, 1; n) ≥ k k−1 which is the best known lower bound for the covering number C(k, 1; n) [23]. A similar result was proved by Chen and Honkala [7] while estimating the minimum number of (R + 2)weight codewords required to cover all 2weight words of any qary covering code of length n and covering radius R. If a group divisible design GD[k, m; mn] exists then k, m and n must satisfy 2 (n−1) ( 2) and ( 3) and the number of blocks is nm k(k−1) . This observation gives the following theorem. Theorem 7. Let k, m and n satisfy ( 2) and ( 3). If a group divisible design 2 (n−1) GD[k, m; mn] does not exist then G(k, m; mn) ≥ nm k(k−1) + 1. In particular, when n = k and 1 < m < k − 1, by Theorem 4 tc(k; m) ≥ m2 + 1. In [26], Stevens et al have shown that if m ≥ 3 and k ≥ m + 2 then tc(k; m) ≥ m2 + 3 with the only exception being tc(5; 3) = 11.
338
K.K.P. Chanduka, Mahesh C. Bhandari, and Arbind K. Lal
Let (X, G, B) be a group covering design GC[k, m; mn] and let x0 ∈ X. Following the arguments of Theorem 6, the number of blocks containing x0 , denoted f(x0 ), satisﬁes m(n − 1) f(x0 ) ≥ ≡ m0 (say). k−1 Let g(x0 ) = f(x0 ) − m0 . If g(x0 ) > 0 and m(n − 1) ≡ 0 (mod k − 1) then there exists at least one pairset {x0 , y} which is contained in two or more blocks. This proves the following lemma. Lemma 1. Let (X, G, B) be a group covering design GC[k, m; mn] and let {x, y} be a pairset. If either g(x) = 0 or g(y) = 0 and m(n − 1) ≡ 0 (mod k − 1) then the pairset {x, y} is contained in exactly one block of the group covering design. The following theorem is analogous to the result obtained by Hanani for covering designs, see [14]. Theorem 8. Let (X, G, B) be a group covering design GC[k, m; mn]. If m(n − 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −1 (mod k) then G(k, m; mn) ≥ β(k, m; mn) + 1. Proof. If possible, suppose G(k, m; mn) = β(k, m; mn). By hypothesis nm2 (n−1)+(k−1) and hence β(k, m; mn) = x∈X g(x) = 1. So there exists a k(k−1) unique x0 ∈ X such that g(x0 ) = 1 and g(x) = 0 for x = x0 . Hence by Lemma 1, there exists a pairset {x0 , u} which is contained in at least two blocks. Thus g(u) > 0 for u = x0 , a contradiction. The above theorem gives many improvements to (5), e.g., G(7, 2; 26) ≥ 16, G(7, 2; 32) ≥ 24, G(5, 4; 32) ≥ 46, G(5, 4; 52) ≥ 126, G(11, 4; 64) ≥ 36, G(5, 4; 72) ≥ 246. Lemma 2. Let (X, G, B) be a group covering design GC[k, m; mn] and let m(n− 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −2 (mod k). If G(k, m; mn) = β(k, m; mn) then there exists a unique pairset {x0 , y0 } ⊆ X which is contained in exactly k blocks and every other pairset is contained in exactly one block. 2 Proof. Note that, G(k, m; mn) = nm (n−1)+2(k−1) and x∈X g(x) = 2. k(k−1) Thus, there exists x0 ∈ X with g(x0 ) > 0. Hence there exists y0 ∈ X such that {x0 , y0 } is contained in at least two blocks. Thus g(y0 ) ≥ 1. Hence g(x0 ) = g(y0 ) = 1 and g(x) = 0 for all other x ∈ X. Hence by Lemma 1, every pairset having x0 or y0 will be contained in exactly one block. Since each block contains k 2 pairsets and k(k−1) β(k, m; mn) = nm (n−1) + (k − 1), the pairset {x0 , y0 } 2 2 2 will be contained in exactly k blocks and the result follows.
Theorem 9. Let (X, G, B) be a group covering design GC[k, m; mn] and let m(n − 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −2 (mod k). If m(n − 1)/(k − 1) < k + m − 2 then G(k, m; mn) ≥ β(k, m; mn) + 1.
Lower Bounds for Group Covering Designs
339
Proof. Suppose G(k, m; mn) = β(k, m; mn). By Lemma 2, there exists a pairset {x0 , y0 } contained in exactly k blocks of B and f(x0 ) = f(y0 ) = m0 + 1. Let y0 ∈ G. Then for every y ∈ G, y = y0 , the pairset {x0 , y} must be contained in exactly one block of B. Thus f(x0 ) ≥ k + (m − 1), a contradiction. As a consequence of the above theorem, there are many improvements to (5), e.g., G(6, 2; 22) ≥ 16, G(8, 3; 45) ≥ 35, G(5, 4; 28) ≥ 35, G(10, 4; 76) ≥ 62. Lemma 3. Let n ≥ k > 3 and let (X, G, B) be a group covering design GC[k, m; mn]. If m(n − 1) ≡ 0 (mod k − 1), nm2 (n − 1)/(k − 1) ≡ −3 (mod k) and G(k, m; mn) = β(k, m; mn) then there exists three elements x0 , y0 and z0 of X belonging to distinct groups such that each of the pairsets {x0 , y0 }, {y0 , z0 } and {z0 , x0 } is contained in 12 (k + 1) blocks of B and all other pairsets are contained in exactly one block. nm2 (n−1)+3(k−1) and x∈X g(x) = 3. Proof. Observe that G(k, m; mn) = k(k−1) Thus, there exists a point x0 ∈ X with g(x0 ) > 0. Following the arguments given in Lemma 2, there exists y0 ∈ X with g(y0 ) > 0 and x0 and y0 are in 2 two distinct groups. Note that k(k−1) β(k, m; mn) = nm (n−1) + 32 (k − 1). Now 2 2 if g(x0 ) = 2 and g(y0 ) = 1 then g(x) = 0 for all other x ∈ X and the pairset {x0 , y0 } will be contained in 32 (k − 1) + 1 blocks of B. Hence g(x0 ) = g(y0 ), a contradiction. Therefore, g(x0 ) = g(y0 ) = 1 and there exists z0 ∈ X such that g(z0 ) = 1. Hence for each x ∈ {x0 , y0 , z0 }, the number of pairsets containing x ( counting multiplicity ) that are contained in more than one block is k − 1. By Lemma 1 these (k − 1) extra pairsets containing x0 must contain either y0 or z0 . Similar statements hold for y0 and z0 . If x0 and z0 belong to the same group then since the pairset {x0 , y0 } occurs in exactly k blocks, there will be no extra pairset containing z0 , i.e., g(z0 ) = 0, a contradiction. Hence x0 , y0 and z0 must belong to distinct groups, each of the pairsets {x0 , y0 }, {y0 , z0 } and {z0 , x0 } must occur in (k + 1)/2 blocks and all other pairsets occur exactly once. The proof of the following theorem, being similar to the proof of Theorem 9 is omitted. Theorem 10. Let (X, G, B) be a group covering design GC[k, m; mn] and let m(n − 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −3 (mod k). If m(n − 1)/(k − 1) < (k − 3)/2 + m, then G(k, m; mn) ≥ β(k, m; mn) + 1. The above theorem gives improvements to (5) for many values of k, m and n; e.g., G(7, 2; 20) ≥ 10, G(9, 2; 26) ≥ 10, G(11, 2; 32) ≥ 10, G(9, 4; 52) ≥ 36. Remark 3. If m = 1, Lemma 2 and 3 give analogous results for covering designs. Theorem 11. G(5, 2; 14) ≥ 10. Proof. Suppose (X, G, B) is a group covering design GC[5, 2; 14] with 9 blocks. Let G = {G1 , . . . , G7 }. For simplicity, let Gi = {1, 2} meaning thereby that 1(2) represents the ﬁrst (second) element of the group Gi . Moreover for B ∈ B (pairset
340
K.K.P. Chanduka, Mahesh C. Bhandari, and Arbind K. Lal
P ) it is convenient to write B = x1 . . . x7 (P = x1 . . . x7 ) with xi ∈ Gi ∪{θ}. Here xj = θ will mean B(P ) does not contain any element of Gj . Since β(5, 2; 14) = 9, by Lemma 3 there exists 3 pairsets, say P1 = 11θ . . . θ, P2 = θ11θ . . . θ and P3 = 1θ1θ . . . θ, that are contained in exactly 3 blocks of B and every other pairset is contained in exactly 1 block. Following the argument given in Lemma 3, it is easy to see that for each i = 1, 2, 3 there exists exactly 4 blocks having xi = 1. Since each of the pairsets 12θ . . . θ, 21θ . . . θ and 22θ . . . θ occurs in exactly 1 block, there exists 6 blocks B1 , . . . , B6 whose ﬁrst 2 entries are 11, 11, 11, 12, 21 and 22 respectively. Since each of the pairsets P2 and P3 occurs in exactly 3 blocks and x3 = 1 for only 4 blocks, the following two possibilities arise. Case 1: B1 = 111 ∗ ∗ ∗ ∗ B2 = 111 ∗ ∗ ∗ ∗ B3 = 111 ∗ ∗ ∗ ∗ B4 = 122 ∗ ∗ ∗ ∗ B5 = 212 ∗ ∗ ∗ ∗ B6 = 221 ∗ ∗ ∗ ∗ Case 2: B1 = 111 ∗ ∗ ∗ ∗ B2 = 111 ∗ ∗ ∗ ∗ B3 = 112 ∗ ∗ ∗ ∗ B4 = 121 ∗ ∗ ∗ ∗ B5 = 211 ∗ ∗ ∗ ∗ B6 = 22γ ∗ ∗ ∗ ∗ where ∗ denotes an element of {θ, 1, 2} and γ = 1. Suppose that Bi ’s have the conﬁguration given in Case 1. Then each pairset with x1 = 1, x2 = x3 = θ must be contained in exactly one Bi , 1 ≤ i ≤ 4. Without loss of generality, let B5 = 212αβθθ and let P = 1θθαθθθ. If P is contained in B1 or B2 or B3 , say B ∗ , then the pairset Q = θ1θαθθθ will be contained in B ∗ and B5 , a contradiction. Otherwise P is contained in B4 and hence the pairset R = θθ2αθθθ is contained in B4 and B5 , a contradiction. Thus β(5, 2; 14) > 9. The proof in Case 2 follows similarly.
3
Lower Bounds for G(k, m; mn) from Known Combinatorial Designs
A relation between covering numbers and group covering numbers is given by the following theorem. Theorem 12. If n ≥ m + 1 then C(m + 1, 1; mn + 1) ≤ n + G(m + 1, m; mn). Proof. Let (Y, G, P) be a minimum group covering design GC[m+1, m; mn] and let G = {G , G , . . . , G }. Let y ∈ Y. Consider X = Y {y } and B = 1 2 n 0 0 P P , where P = {G1 {y0 }, G2 {y0 }, . . . , Gn {y0 }}. It is easy to verify that (X, B) is a covering design AD[m + 1, 1; mn + 1]. Thus C(m + 1, 1; mn +1) ≤ n + G(m + 1, m; mn). As an immediate consequence of the above theorem, we have, Corollary 1. If n ≥ m + 1 then G(m + 1, m; mn) ≥ β(m + 1, m; mn) + ∆(mn + 1; m + 1) where ∆(mn + 1; m + 1) = C(m + 1, 1; mn + 1) − α(m + 1, 1; mn + 1). Proof. Observe that α(m + 1, 1; mn + 1) − n = β(m + 1, m; mn). If ∆(mn + 1; m + 1) > 0 then the above corollary gives an improvement to (5), e.g., G(4, 3; 18) ≥ β(4, 3; 18) + 2 = 25, and G(5, 4; 28) ≥ β(5, 4; 28) + 1 = 35. Remark 4. Let n(mn + 1) ≡ −1 (mod m + 1). Then Hanani [14] has shown that C(m + 1, 1; mn + 1) ≥ α(m + 1, 1; mn + 1) + 1 and hence by Corollary 1 G(m + 1, m; mn) ≥ β(m + 1, m; mn) + 1.
Lower Bounds for Group Covering Designs
341
Theorem 13. Let t ≥ n ≥ 3. If there exists a transversal design T D[n; m] then G(n, m; m(t(n − 1) + 1)) ≤ tm2 + G(n, m(n − 1); m(n − 1)t). Proof. Let {(Xj , Gj , Bj ) : j = 1, . . . , t} be a collection of t transversal designs T D[n; m] satisfying Xi ∩ Xj = G0 for all i = j and Gj = {G0 = Gj1, Gj2 , . . . , Gjn }. Let Y = j,l,l=1 Gjl , and let G = {G1 , G2 , . . . , Gt } where Gi = 2≤l≤n Gil . Let P be a collection of n−subsets of Y such that (Y, G, P) is a minimal group covering design GC[n, m(n − 1); m(n − 1)t]. Then (X, H, B) where X = Y ∪ G0 , H = j Gj and B = P ∪ B1 ∪ · · · ∪ Bt is a GC[n, m; m((n − 1)t + 1)]. If m is a prime power then a T D[m + 1; m] exists [14]. Hence we have Corollary 2. Let m be a prime power with m = n − 1 ≥ 2 and t ≥ n. If G(n, m(n − 1); t) = β(n, m(n − 1); t) then G(n, m; t(n − 1) + 1) = β(n, m; t(n − 1) + 1). Proof. Note that β(n, m(n − 1); t) = tm2 (n − 1)(t − 1)/n. Hence by the above theorem, G(n, m; t(n − 1) + 1) ≤ β(n, m; t(n − 1) + 1) and now the result follows from (5).
4
Exact Bounds for G(k, m; mn)
Constructions of minimum group covering designs can be quite challenging. In this section we use the structure of covering designs, meeting certain conditions, to construct few classes of minimum group covering designs. In [4], Caro and Raphael have shown that for each k, the size of the block, there exists a minimum covering design. It is used to establish the existence of a minimum group covering design for any k. Recall that tc(k; m) = G(k, m; km). Using extremal set theory, Kleitman and Spencer [16] have proved the following theorem. s−1 Theorem 14. [16] Let s be a positive integer and let g(s) = s/2 −1 . Then tc(k; 2) = min{s : g(s) ≥ k}. If a covering design meeting the Sch¨ onheim bound exists then the following theorem gives a construction of a minimum group covering design. Theorem 15. Let (v − 1) ≡ 0 (mod k − 1), n = (v − 1)/(k − 1), nv ≡ −2 (mod k), and let v > k(k − 2) + 2. If there exists a minimum covering design AD[k, 1; v] with C(k, 1; v) = α(k, 1; v) then there also exists a minimum group covering design GC[k, k − 1; v − 1] with G(k, k − 1; v − 1) =
(v−1)(v−k) k(k−1)
.
Proof. Let (Y, P) be a covering design AD[k, 1; v] with C(k, 1; v) = α(k, 1; v). Then by Remark 3 there exists a unique pairset {u, v} ⊆ Y which is contained in exactly k blocks, say P1 , P2 , . . . , Pk and all other pairsets are contained in exactly one block. Observe that Pi Pj = {u, v} for all 1 ≤ i < k j ≤ k. Let S = i=1 Pi . Clearly  S = k(k − 2) + 2 and Y \S = φ. Let x0 ∈ Y \S. Since any pairset containing x0 must be contained in exactly one block of P,
342
K.K.P. Chanduka, Mahesh C. Bhandari, and Arbind K. Lal
the number of blocks containing x0 is n. Let T1 , . . . , Tn be the blocks containing x0 . Clearly G = {Ti \{x0 } : 1 ≤ i ≤ n} is a partition of X = Y \{x0 } and {u, v} ⊆ Ti for 1 ≤ i ≤ n. Let B = P\{T1 , T2 , . . . , Tn }. Note that for each B ∈ B,  Ti B ≤ 1 for 1 ≤ i ≤ n. Since (Y, P) is a covering design, every pairset {s, t} ⊆ X such that s and t are from distinct Ti \{x0 } will be contained in a block of B. Hence (X, G, B) is a group covering design GC[k, k − 1; v − 1]
(v−1)(v−k)
with G(k, k − 1; v − 1) ≤ B = α(k, 1; v) − n = . k(k−1)
v(v−k) Therefore by (5), G(k, k − 1; v − 1) ≥ k(k−1) . Hence the theorem follows.
If n ≡ 0 or 1 (mod 3) and k = 3 then Hanani [14] has shown that a GD[3, 2; 2n] exists and hence G(3, 2; 2n) = 2n(n − 1)/3 . If n ≡ 2 (mod 3) and v = 2n + 1 then Fort and Hedlund [13] have shown that there exists a covering design AD[3, 1; v] with C(3, 1; v) = α(k, 1; v). Hence by Theorem 15 the following corollary is immediate. Corollary 3. If n ≥ 3 then G(3, 2; 2n) = 2n(n − 1)/3 . If n ≡ 0 or 1 (mod 4) and k = 4 then Brouwer et al. [3] have shown that a GD[4, 3; 3n] exists and hence G(4, 3; 3n) = 3n(n − 1)/4 . If n ≡ 2 or 3 (mod 4) and v = 3n + 1 then Mills [19],[20] has shown that there exists a covering design AD[4, 1; v] with C(4, 1; v) = α(4, 1; v) with the exception of v = 19. Hence by Theorem 15 we have the following result. Corollary 4. If n ≥ 4 and n = 6 then G(4, 3; 3n) = 3n(n − 1)/4 . Remark 5. If n = 6, Corollary 1 gives G(4, 3; 18) ≥ 25 > 3n(n − 1)/4. If k = 5 and v = 4n + 1, then Mills and Mullin [21] have shown that C(5, 1; v) = α(5, 1; v) whenever n ≡ 2 (mod 5), n > 787 or n ≡ 4 (mod 5), n > 189. In each of these cases, the hypothesis of Theorem 15 are satisﬁed. Hence we have the following corollary. Corollary 5. If n ≡ 2 (mod 5), n > 787 or n ≡ 4 (mod 5), n > 189 then G(5, 4; 4n) = 4n(n − 1)/5 . Let v0 (k) be as in Theorem 2 and let v2 (k) = max{v0 (k), 3(k − 2)(k + 1)/2 + 4}. If v ≥ v2 (k), the following theorem follows from Theorems 2 and 15.
Theorem 16. Let v ≥ v2 (k), (v − 1) ≡ 0 (mod k − 1) and let v(v − 1)/(k − 1) ≡ −2 (mod k). Then there exists a GC[k, k − 1; v − 1] with (v−1)(v−k) G(k, k − 1; v − 1) = . k(k−1) Theorem 17. Let k ≥ 3, (v−1) ≡ 0 (mod k −1), n = v−1 k−1 , nv ≡ −3 (mod k), and let v > 3(k + 1)(k − 2)/2 + 3. If there exists a minimum covering design AD[k, 1; v] with C(k, 1; v)= α(k, 1; v)then there also exists a GC[k, k − 1; v − 1] with G(k, k − 1; v − 1) =
(v−1)(v−k) k(k−1)
.
Lower Bounds for Group Covering Designs
343
Proof. Let (Y, P) be a covering design AD[k, 1; v] with C(k, 1; v) = α(k, 1; v). Then by Remark 3 there exists three pairsets, say {x, y}, {y, z} and {z, x}, each of which is contained in exactly (k + 1)/2 blocks of P (not necessarily distinct) and all other pairsets are contained in exactly one block. Let B1 , B2 , . . . , Bs be the blocks containing the pairsets {x, y}, {y, z} and {z, x} and let S = ∪si=1 Bi . Note that (k + 1)(k − 3)/2 + 3 ≤ S ≤ 3(k + 1)(k − 2)/2 + 3. Since v > 3(k + 1)(k − 2)/2 + 3, X\S = φ. Let x0 ∈ X\S and let T1 , T2 , . . . , Tn be the blocks containing x0 . Let X = Y \{x0 }, G = {Ti \{x0 } : 1 ≤ i ≤ n} and let B = P\{T1 , T2 , . . . , Tn }. Following the argument given in the proof of Theorem 15 it is easy to verify that (X, G, B) is a minimum group covering design containing
(v−1)(v−k) k(k−1)
blocks.
Let v0 (k) be as in Theorem 2 and let v3 (k) = max{v0 (k), 3(k − 2)(k + 1)/2 + 4}. If v ≥ v3 (k), by Theorems 2 and 17 we have the following theorem. Theorem 18. Let k ≥ 3, v ≥ v3 (k), (v − 1) ≡ 0 (mod k − 1) and let v(v − 1)/(k − 1) ≡ −3 (mod k). Then there exists a GC[k, k − 1; v − 1] with (v−1)(v−k) G(k, k − 1; v − 1) = . k(k−1) Theorem 19. (i) G(3, 2; 8) = 8, (ii) (iii) G(5, 2; 12) = 8, (iv) G(6, 2; 14) = 7.
G(4, 2; 10)
=
8,
Proof. We use the notation for representing a block as described in Theorem 11. (i) By (5), G(3, 2; 8) ≥ 8. The following eight blocks gives the desired result. 111θ 12θ1 21θ2 222θ 1 θ 2 2 2 θ 1 1 θ 1 2 1 θ 2 1 2. (ii) By (5), G(4, 2; 10) ≥ 8. The following eight blocks gives the desired result. 2111θ 112θ1 121θ2 2222θ 1 1 θ 2 2 1 2 θ 1 1 2 θ 1 2 1 2 θ 2 1 2. (iii) By (5), G(5, 2; 12) ≥ 8. The following eight blocks gives the desired result. 12111θ 1112θ1 1121θ2 12222θ 2 1 1 θ 2 2 2 1 2 θ 1 1 2 2 θ 1 2 1 2 2 θ 2 1 2. (iv) By (5), G(6, 2; 14) ≥ 7. The following seven blocks gives the desired result. 12θ1221 21122θ1 222θ212 1122θ22 2 θ 1 1 1 2 2 θ 1 2 1 1 1 1 1 2 1 2 1 1 θ.
5
Group Divisible Designs with Consecutive Block Sizes
Recently Colbourn, Lenz, Ling, Rosa, Stinson and others have studied PBD’s with consecutive block sizes (see, [17], [11], [12] and [18]). However very little is known about GDD’s with consecutive block sizes. In this section we use Lemmas 2 and 3 to construct some GDD’s with two, three or four consecutive block sizes.
344
K.K.P. Chanduka, Mahesh C. Bhandari, and Arbind K. Lal
Theorem 20. Let (X, G, B) be a minimum group covering design GC[k, m; mn] satisfying m(n − 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −2 (mod k) and B = β(k, m; mn). Then there exist group divisible designs GD[{k − 1, k}, {m − 1, m}; mn − 1] and GD[{k − 2, k − 1, k}, {m − 1, m}; mn − 2]. Proof. By Lemma 2 there exists a unique pairset P0 = {x0 , y0 } which is contained in exactly k blocks. Let X1 = X\{x0 }, X2 = X\P0 , G1 = {G\{x0 } : G ∈ G}, G2 = {G\P0 : G ∈ G}, B1 = {B\{x0 } : B ∈ B} and let B2 = {B\P0 : B ∈ B}. Then it is easy to verify that (X1 , G1 , B1 ) is a GD[{k − 1, k}, {m − 1, m}; mn − 1] and (X2 , G2 , B2 ) is a GD[{k − 2, k − 1, k}, {m − 1, m}; mn − 2]. Theorem 21. Let n ≥ k > 3 and let (X, G, B) be a minimum group covering design GC[k, m; mn] with B = β(k, m; mn). If m(n − 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −3 (mod k) then there exist group divisible designs GD[{k − 2, k − 1, k}, {m − 1, m}; mn − 2] and GD[{k − 3, k − 2, k − 1, k}, {m − 1, m}; mn − 3]. Proof. By Lemma 3 there exists three pairsets, say {x, y}, {y, z} and {z, x}, each of which is contained in exactly (k + 1)/2 blocks. Let X1 = X\{x, y}, X2 = X\{x, y, z}, G1 = {G\{x, y} : G ∈ G}, G2 = {G\{x, y, z} : G ∈ G}, B1 = {B\{x, y} : B ∈ B} and let B2 = {B\{x, y, z} : B ∈ B}. Observe that B1 contains blocks of sizes k − 2, k − 1 or k while B2 contains blocks of sizes k − 3, k − 2, k − 1 or k. A block of size k − 3 in B2 will exist only when there exist a block containing x, y and z. It can be easily veriﬁed that (X1 , G1 , B1 ) and (X2 , G2 , B2 ) are GD[{k − 2, k − 1, k}, {m − 1, m}; mn − 2] and GD[{k − 3, k − 2, k − 1, k}, {m − 1, m}; mn − 3], respectively. It may be observed that if mn ≥ v2 (k) (v3 (k)), m(n − 1) ≡ 0 (mod k − 1) and nm2 (n − 1)/(k − 1) ≡ −2 (mod k) (≡ −3 (mod k)) then Theorem 16 ( Theorem 18 ) guarantees the existence of a minimum group covering design with β(k, m; mn) blocks.
References 1. A. M. Assaf, An application of modiﬁed group divisible designs, J. Combin. Theory Ser. A 68 (1994), 152168. 2. M. C. Bhandari, K. K. P. Chanduka and A. K. Lal, On lower bounds for covering codes, Designs, Codes and Cryptography, 15 (1998), 237243. 3. A. E. Brouwer, A. Schrijver and H. Hanani, Group divisible designs with block size four, Discrete Math. 20 (1977), 110. 4. Y. Caro and Y. Raphael, Covering graphs: The covering problem solved, J. Comb. Theory, Ser. A, 83 (1998), 273282. 5. K. K. P. Chanduka, ”Combinatorial Designs and Covering Codes”, Ph.D. thesis, Indian Institute of Technology, Kanpur, India (January 1998). 6. K. K. P. Chanduka, M. C. Bhandari and A. K. Lal, Further results on q−ary covering codes, manuscript under preparation. 7. W. Chen and I. S. Honkala, Lower bounds for qary covering codes, IEEE Trans. Inform. Theory 36 (1990), 664671.
Lower Bounds for Group Covering Designs
345
8. D. M. Cohen, S. R. Dalal, J. Parelius and G. C. Patton, The combinatorial design approach to automatic test generation, IEEE software, 13 (1996), 8388. 9. D. M. Cohen, S. R. Dalal, M. L. Fredman and G. C. Patton, The AETG systems: An approach to testing based on combinatorial design, IEEE Trans. Soft. Eng., 23 (1997), 437444. 10. C. J. Colbourn and J. H. Dinitz, editors, ”The CRC Handbook of Combinatorial Designs”, CRC Press, Boca Raton, ﬁrst edition, 1996. 11. C. J. Colbourn and A. C. H. Ling, Pairwise balanced designs with block sizes 8, 9 and 10, J. Combin. Theory Ser. A 77 (1997), 228245. 12. C. J. Colbourn, A. Rosa and D. R. Stinson, Pairwise balanced designs with block sizes three and four, Can. J. Math. 43 (1991), 673704. 13. M. K. Fort and G. A. Hedlund, Minimal covering of pairs by triples, Pacific J. Math. 8 (1958), 709719. 14. H. Hanani, Balanced incomplete block designs and related designs, Discrete Math. 11 (1975), 255369. 15. I. S. Honkala, Modiﬁed bounds for covering codes, IEEE Trans. Inform. Theory 37 (1991), 351365. 16. D. J. Kleitman and J. Spencer, Families of Kindependent sets, Discrete Math. 6 (1973), 255262. 17. H. Lenz, Some remarks on Pairwise balanced designs, Mitt. Math. Sem. Giessen, 165 (1984), 4962. 18. A. C. H. Ling, X. Zhu, C. J. Colbourn and R. C. Mullin, Pairwise balanced designs with consecutive block sizes, Designs, Codes and Cryptography, 10 (1997), 203222. 19. W. H. Mills, On the covering of pairs by quadruples I, J. Combin. Theory Ser. A 13 (1972), 5578. 20. W. H. Mills, On the covering of pairs by quadruple II, J. Combin. Theory Ser. A 15 (1973), 138166. 21. W. H. Mills and R. C. Mullin, Covering pairs by quintuples: The case v congruent to 3 (mod 4), J. Combin. Theory Ser. A 49 (1988), 308322. 22. S. Poljak and Z. Tuza, On the maximum number of qualitatively independent partitions, J. Combin. Theory Ser. A 51 (1989), 111116. 23. J. Sch¨ onheim, On coverings, Pacific J. Math. 14 (1964), 14051411. 24. N. J. A. Sloane, Covering arrays and intersecting codes, J. Combin. Designs, 1 (1993), 5163. 25. B. Stevens and E. Mendelsohn, New recursive methods for transversal covers, preprint. 26. B. Stevens, L. Moura and E. Mendelsohn, Lower bounds for transversal covers, Designs, Codes and Cryptography, 15 (1998), 279299. 27. G. Tarry, Le probl`eme des 36 oﬃciers, Compt. Rend. Assoc. Fr. Av. Sci. 1 (1900), 122123; 2 (1901), 170203. 28. Z. Zhang, Linear inequalities for covering codes: Part IPair covering inequalities, IEEE Trans. Inform. Theory 37 (1991), 573582.
Characteristic Functions of Relative Diﬀerence Sets, Correlated Sequences and Hadamard Matrices Garry Hughes Royal Melbourne Institute of Technology, Melbourne, VIC 3001, Australia garry.hughes@rmit.edu.au
Abstract. Given a cocycle, α, the concept of a sequence being α correlated provides a link between the cohomology of ﬁnite groups and various combinatorial objects: autocorrelated sequences, relative diﬀerence sets and generalised Hadamard matrices. The cohomology enables us to lift a map φ, deﬁned on a group, to a map Φ, deﬁned on an extension group, in such a way that Φ inherits some of its combinatorial properties from those of φ. For example, if φ is αcorrelated, Φ will be the characteristic function of a relative diﬀerence set in the extension group determined by α. Many wellknown results follow from choosing the appropriate extension groups and cocycles, α.
1
Introduction
It is well known that many objects of combinatorial interest are intimately connected; relative diﬀerence sets, sequences with certain autocorrelation properties and generalised Hadamard matrices for example. Recently several authors have related these objects to the cohomological theory of ﬁnite groups (see [6] and the references therein). That there is a relationship is, in hindsight, no surprise because combinatorial objects such as those mentioned are often deﬁned and studied in terms of a group and its extensions. Cohomology is a natural language to discuss group extensions. In this spirit, the work presented here will use ideas from cohomology to show the equivalence of the combinatorial objects mentioned, under certain circumstances. The equivalence is constructive showing how to pass easily from one object to another (see Theorem 3). One of the advantages of this approach is that the theory covers, in a single framework, both splitting and nonsplitting extensions (see the corollaries to Theorem 3). Another is that construction techniques for any of the objects can be proved using the most convenient equivalence (see Section 4). In Section 2 we introduce deﬁnitions that express certain distribution properties of the values of a map between groups. These deﬁnitions are fundamentally connected to each other and to objects from cohomology, namely “cocycles”. In Section 3 we review the results and deﬁnitions we need concerning cocycles and group extensions. We then show how, when given a map between groups, φ, with Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 346–354, 1999. c SpringerVerlag Berlin Heidelberg 1999
Characteristic Functions of Relative Diﬀerence Sets
347
a certain distribution property, we can lift this map to an extension group in such a way that the lifted map has a similar distribution property. The lifted map will be the characteristic function for a relative diﬀerence set in the extension group. We apply this to the split extension and the natural extension. In Section 4 we give some methods that enable us, when given such φ, to construct others. Finally we show that a Generalised Perfect Binary Array (see [2]) corresponds to a special example of the lifting of such a map φ. Throughout this paper we adopt the following notation: A will be a ﬁnite abelian group; ZA the group ring of A over the integers; if w is a vector, wi will denote its ith component.
2
Relative Orthogonality and Relative Correlation
We give two deﬁnitions that describe the distribution properties of maps between groups. The deﬁnitions are then connected to cohomology. Deﬁnition 1. Let ψ : L × L → A be a map of ﬁnite groups with A dividing L. Let H be a subgroup of L. Then we say ψ is orthogonal relative to H if, in ZA, b ∈ L − H implies ψ(b, j) = L/A a. j∈L
a∈A
Here L − H refers to the set of elements of L that are not in H. Furthermore, ψ is called orthogonal if it is orthogonal relative to the trivial subgroup. H is called a forbidden subgroup and we may alternatively express relative orthogonality by saying that, when b is not in the forbidden subgroup, the sequence {ψ(b, j) : j ∈ L} has each element of A an equal number of times. Important in what follows is the eﬀect of onto group homomorphisms on relative orthogonality. We have the following result, the ﬁrst part of which we will use later to lift orthogonality on a group to relative orthogonality on an extension of that group. Lemma 1. Let β : R → S and θ : A → A be onto homomorphisms of groups. Let ψ : S × S → A be a map of groups and H a subgroup of S. Then i) ψ ◦ (β × β) : R × R → A is orthogonal relative to ker β if and only if ψ is orthogonal. ii) If ψ is orthogonal relative to H then so is θ ◦ ψ. Proof. We give an outline of the proof of i). Let T be a transversal for the cosets of ker β in R. For b ∈ R we have, upon collecting by cosets, ψ(β(b), β(r)) =  ker β ψ(β(b), β(t)). r∈R
t∈T
Now, because β is onto, as t runs over T then β(t) runs over S, so this last sum is equal to  ker β s∈S ψ(β(b), s). The result now follows.
348
Garry Hughes
The concept of relative orthogonality is closely connected to that of the “twisted relative correlation” of sequences. The nature of this connection, which we prove in the next lemma, enables results concerning cohomology theory and relative diﬀerence sets to be interpreted in terms of correlation of sequences. We ﬁrstly need some deﬁnitions. Deﬁnition 2. Let φ : L → A and α : L × L → A be maps of groups with A dividing L. Let H be a subgroup of L. Then we say φ is αcorrelated relative to H if, in the group ring ZA, b ∈ L − H implies α(b, j)φ(j)(φ(bj))−1 = L/A a. j∈L
a∈A
Further, if H is the trivial subgroup we omit the phrase “relative to H” and if α is identically 1, we replace “1correlated” by “correlated”. So, in the deﬁnition above, we think of φ as being autocorrelated when the autocorrelation function has been “twisted” by α. We will use the following notation from cohomology theory (see [4]). If φ : L → A is a map of groups, the coboundary ∂φ : L × L → A is deﬁned by ∂φ(m, n) = φ(m)φ(n)(φ(mn))−1 , ∀m, n ∈ L. We can now establish the connection mentioned above. Lemma 2. With the notation of the above deﬁnition, φ is αcorrelated relative to H if and only if α∂φ is orthogonal relative to H. Proof. For b ∈ L, we have α(b, j)∂φ(b, j) = φ(b) α(b, j)φ(j)(φ(bj))−1 . j∈L
Noting that φ(b)
3
a∈A a
=
j∈L a∈A a
= (φ(b))−1
a∈A a,
we obtain the result.
Cohomological Aspects of Relative Orthogonality
For the remainder of this paper let G be a group of order v and we suppose w  v, where w denotes the order of A. When the “twisting” function α in the deﬁnition of αcorrelation is of a special form there is an equivalence between the existence of αcorrelated maps, generalised Hadamard matrices and relative diﬀerence sets in extension groups determined by α. The special form needed is that α is a cocycle (strictly a two dimensional cocycle with trivial action). We will deﬁne this shortly, but the following result of Perera and Horadam indicates the sort of relationship between cohomological and combinatorial objects that we are interested in. A v×v matrix, C, of elements of A, indexed by the elements of G (in some ﬁxed order), is a (v/w)generalised Hadamard matrix if, whenever 1 ≤ i = k ≤ v, the list cij c−1 kj , 1 ≤ j ≤ v has each element of A exactly v/w times (see [6] or [1]).
Characteristic Functions of Relative Diﬀerence Sets
349
Theorem 1. [6, Lemma 2.2] Let ψ : G × G → A be a cocycle and form the matrix Mψ = [ψ(x, y)] indexed by the elements x, y ∈ G (in some ﬁxed order). Then ψ is orthogonal if and only if Mψ is a (v/w)generalised Hadamard matrix. 3.1
Cocycles and Central Extensions
We summarise the deﬁnitions and results we need on cocycles and central extensions (for proofs see [4, Chapter 2]). We call the map α : G × G → A a cocycle if α(1, 1) = 1 and ∀x, y, z ∈ G it satisﬁes the equation α(x, y)α(xy, z) = α(y, z)α(x, yz). A consequence of this equation is that ∀x ∈ G we have α(x, 1) = α(1, x) = 1. The abelian group of all such cocycles under the multiplication (αα )(x, y) = α(x, y)α (x, y) is denoted Z 2 (G, A). If φ : G → A is such that −1 φ(1) = 1, the coboundary deﬁned by ∂φ(x, y) = φ(x)φ(y)(φ(xy)) ∀x, y ∈ G is in Z 2 (G, A). If α, α ∈ Z 2 (G, A) and α = α ∂φ for some φ we say α and α are cohomologous and write α ∼ α . This is an equivalence relation and the group of equivalence (cohomology) classes, α, is denoted H 2 (G, A). Cocycles determine and are determined by central extensions. Consider a central extension, R, of G by A; that is consider a short exact sequence, ι
β
1 → A → R → G → 1,
(1)
where ι(A) = ker β is a subgroup of the centre of R. We take a section, λ, of β, that is a map λ : G → R such that β(λ(x)) = x, ∀x ∈ G and λ(1) = 1. Then λ deﬁnes a cocycle fλ ∈ Z 2 (G, A) by ι(fλ (x, y)) = λ(x)λ(y)(λ(xy))−1 . We note that λ(G) is a transversal for the cosets of ι(A) in R. Equivalently we could deﬁne λ given such transversal. Diﬀerent choices of λ lead to cohomologous cocycles. Conversely, given a cocycle α ∈ Z 2(G, A) we have the central extension ι
β
1 → A → Eα → G → 1,
(2)
where Eα is the group {(a, x) : a ∈ A, x ∈ G} with a “twisted” multiplication deﬁned by (a, x)(b, y) = (abα(x, y), xy) and ι (a) = (a, 1), β (a, x) = x. We refer to this as the standard extension for α. We also refer to the section of β given by λ (x) = (1, x) as the standard section for α since it determines the cocycle fλ = α. 3.2
Extensions of Set Maps
Consider the central extension (1) and a ﬁxed section, λ, of β. In this part we take a map φ : G → A, with φ(1) = 1, and deﬁne an extension to a map from R to A. This extension will preserve some of the relative correlation properties of the original map. When φ has certain correlation properties the extension will prove to be a characteristic function of a relative diﬀerence set in R. We will establish this in the next part.
350
Garry Hughes
Any r ∈ R may be written in the form r = ι(ar )λ(xr ) where xr = β(r) and ar ∈ A is unique. Therefore, given φ : G → A we deﬁne the extension, Φλ : R → A, of φ by Φλ (r) = ar −1 φ(xr ).
(3)
If the central extension and section are the standard ones for α ∈ Z 2 (G, A) we write Φα for Φλ . We have the following properties for the extension function. Lemma 3. i) {r ∈ R : Φλ (r) = 1} = {ι(φ(x))λ(x) : x ∈ G}; ii) ∂Φλ = (fλ ∂φ) ◦ (β × β). Proof. Only ii) needs any work. For r, s ∈ R we see xrs = xr xs and, because ι(A) is in the centre of R, we also have ars = ar as fλ (xr , xs ). The result now follows from the deﬁnitions of Φλ and ∂Φλ . 3.3
Relative Diﬀerence Sets and Cocycles
We ﬁrsly deﬁne a relative diﬀerence set (rds). Suppose M is a group of order vw and N a subgroup of order w where w  v. A v element subset, D, of M is called a (v, w, v, v/w)rds in M relative to N if the list d1 d−1 2 , d1 , d2 ∈ D contains no elements of N and each element of M − N exactly v/w times. The fundamental result connecting such relative diﬀerence sets and cocycles is also due to Perera and Horadam. Theorem 2. [6, Theorem 4.1] Let ψ ∈ Z 2(G, A). Then ψ is orthogonal if and only if {(1, x) : x ∈ G} is a (v, w, v, v/w)rds in Eψ relative to A × 1. We are ﬁnally in a position to prove our main result which links the relative orthogonality and correlation properties of a map and its extension with a characteristic function for a relative diﬀerence set. Recall that G = v, A = w and w  v. Theorem 3. Consider the central extension, (1), and a section, λ, of β. Let φ : G → A be such that φ(1) = 1 and let Φλ be deﬁned by (3). Then the following are all equivalent: i) fλ ∂φ is orthogonal; ii) D = {r ∈ R : Φλ (r) = 1} is a (v, w, v, v/w)rds in R relative to ι(A) = ker β; iii) ∂Φλ is orthogonal relative to ι(A); iv) Φλ is correlated relative to ι(A); v) φ is fλ correlated. Proof. The equivalence of i) and iii) follows from Lemmas 1 and 3; that of iii) and iv) and also that of i) and v) from Lemma 2. We need only show that i) and ii) are equivalent. Let ψ = fλ ∂φ. There is an isomorphism Γ : Eψ → R given by Γ (a, x) = ι(aφ(x))λ(x) (see [4] for this). We see that Γ (a, 1) = ι(a). Let D∗ = {(1, x) : x ∈ G} ⊆ Eψ . Therefore Γ (D∗ ) = {ι(φ(x))λ(x) : x ∈ G} and, by Lemma 3, D = Γ (D∗ ). Applying the isomorphism Γ , Theorem 2 tells us that the orthogonality of ψ is equivalent to D = Γ (D∗ ) being a (v, w, v, v/w)rds in Γ (Eψ ) = R relative to Γ (A × 1) = ι(A) = ker β.
Characteristic Functions of Relative Diﬀerence Sets
351
In view of part ii) of the above equivalence, we can regard Φλ as a characteristic function for the relative diﬀerence set {ι(φ(x))λ(x) : x ∈ G} in R. If we begin with a cocycle and use the standard extension, (2), and standard section for that cocycle we obtain the following corollary. Corollary 1. Let α ∈ Z 2 (G, A) and φ : G → A with φ(1) = 1. Then the following are equivalent: i) α∂φ is orthogonal; ii) Φα (a, x) = a−1 φ(x) is correlated relative to A × 1; iii) φ is αcorrelated; iv) {(φ(x), x) : x ∈ G} is a (v, w, v, v/w)rds in Eα relative to A × 1. We note that instead of starting with a map φ we could equally well begin with a (v, w, v, v/w)rds D, in Eα , and deﬁne φ(x) = a for (a, x) ∈ D. The map φ is welldeﬁned on G and would be αcorrelated. When we choose α ≡ 1 in the above corollary we obtain some well known results on splitting relative diﬀerence sets, because E1 = A × G. Corollary 2. i) φ is correlated if and only if the set {(φ(x), x) : x ∈ G} is a (v, w, v, v/w)rds in A × G relative to A × 1. ii) If v = w, let b ∈ G and ∆b,φ : G → A be the diﬀerence operator ∆b,φ(x) = φ(x)(φ(bx))−1 . Then ∆b,φ is onto (equivalently onetoone) for every b = 1 if and only if {(φ(x), x) : x ∈ G} is a (v, v, v, 1)rds in A × G relative to A × 1. As an example of the last result; if G is the additive group of the ﬁeld GF(q), q odd, then for any e = 0, g, h ∈ GF(q) there is a “quadratic” (v, v, v, 1)rds in G × G relative to G × 0, namely {(ex2 + gx + h, x) : x ∈ G}. This is also proved, basically, in [7]. As we have seen the split extension provides certain examples. So, too, does the natural extension. Let M, N be groups of order, respectively, vw, w with N in the centre of M. Let D be a transversal for the cosets of N in M with 1 ∈ D. The natual short exact sequence id
π
1 → N → M → M/N → 1, with π(m) = mN and section λ(dN ) = d, d ∈ D, deﬁnes a cocycle fD ∈ Z 2 (M/N, N ) as follows. For d, d ∈ D we have dd N = d∗ N for unique d∗ ∈ D. So we deﬁne fD (dN, d N ) = dd (d∗ )−1 . We take the map φ on M/N to be identically 1 and extend it to M as before. Write r ∈ M uniquely as r = dr nr , dr ∈ D, nr ∈ N and deﬁne Φλ (r) = dr r−1 = n−1 r . We now have the following by using Theorems 1 and 3. Corollary 3. For N central in M and a transversal D, with 1 ∈ D, the following are equivalent: i) D is a (v, w, v, v/w)rds in M relative to N ; ii)Φλ (r) = dr r−1 is correlated relative to N ; iii) fD is a (v/w)generalised Hadamard matrix. The construction of a generalised Hadamard matrix from a relative diﬀerence set appears in [3]. We have proved the equivalence of these objects in the case of the parameters and groups above.
352
4
Garry Hughes
Base Sequences and Generalised Perfect Binary Arrays
We have seen that maps that are autocorrelated when the correlation function is “twisted” by a cocycle can be lifted to produce relative diﬀerence sets in extension groups. In view of this we call φ : G → A, where φ(1) = 1, a base sequence with respect to α ∈ Z 2 (G, A) if it satisﬁes any of the equivalent conditions of Corollary 1. In the special case of α being symmetric (that is α(x, y) = α(y, x), ∀x, y ∈ G), G being an abelian group and A having order 2, base sequences have been studied under the name Generalised Perfect Binary Arrays, or GPBAs, by Jedwab and others (see [2]). We will discuss this correspondence later. 4.1
Construction of Base Sequences
Jedwab [2] gives many construction techniques for GPBAs. Some of the techniques seem speciﬁc to the situation he is studying, but many work in more general circumstances. We present some of these. By virtue of Theorem 3 these techniques give constructions for rds in extension groups. Theorem 4. For α ∈ Z 2 (G, A) and φ : G → A, with φ(1) = 1, we have: i)If α is cohomologous to α ∈ Z 2(G, A), say α = α ∂µ, then φ is a base sequence wrt α if and only if µφ is a base sequence wrt α ; ii) If Θ : G → G is an isomorphism, φ is a base sequence wrt α if and only if φ ◦ Θ is a base sequence wrt α ◦ (Θ × Θ) ∈ Z 2 (G , A); iii) Let φ1 : G1 → A and φ2 : G2 → A. Deﬁne the tensor product, φ1 ⊗ φ2 , to be φ1 ⊗ φ2 (x1 , x2 ) = φ1 (x1 )φ2 (x2 ). Similarly given α1 ∈ Z 2 (G1 , A) and α2 ∈ Z 2 (G2 , A) deﬁne α1 ⊗ α2 . Then φ1 ⊗ φ2 is a base sequences wrt α1 ⊗ α2 if and only if φ1 , φ2 are base sequences wrt α1 , α2 respectively. iv) Let Ω : A → A be an onto homomorphism. If φ is a base sequence wrt α then Ω ◦ φ is a base sequence wrt Ω ◦ α ∈ Z 2 (G, A ). Proof. These are most easily seen using the deﬁnition: φ is a base sequence wrt α if and only if α∂φ is orthogonal. All the parts except iii) follow from Lemma 1 with the following observations: in i) α∂φ = α ∂(µφ); in ii) (α∂φ) ◦ (Θ × Θ) = α ◦ (Θ × Θ)∂(φ ◦ Θ); in iv) Ω ◦ (α∂φ) = (Ω ◦ α)∂(Ω ◦ φ). It remains only to prove iii). Suppose ψ = (α1 ⊗ α2 )∂(φ1 ⊗ φ2 ) is orthogonal. Let ψ1 = α1 ∂φ1 and 1 = x ∈ G1 . Then (x,y)∈G1×G2
ψ (x , 1), (x, y) =
(x,y)∈G1×G2
ψ1 (x , x).1 =  G2 
ψ1 (x , x).
x∈G1
From the deﬁnition of orthogonality we deduce ψ1 is orthogonal. An equivalent proof works for φ2 . The converse can be proven by a similar argument but is also a consequence of a result on the Kronecker product of generalised Hadamard matrices (see [6, Theorem 5.1]).
Characteristic Functions of Relative Diﬀerence Sets
353
Part i) of the preceeding theorem tells us that if we want a base sequence wrt some cocycle, we may as well assume the cocycle is a representative of a cohomology class in H 2 (G, A). If G ∼ = Zs1 × · · · × Zsr is abelian, representatives for cohomology classes of symmetric cocycles are easy to describe.Indeed, let Ext(G, A) = {α ∈ H 2 (G, A) : α symmetric}, then Ext(G, A) ∼ = i,j Z(si ,tj) , where A ∼ = Zt1 × · · · × Ztk and (si , tj ) refers to the greatest common divisor of si , tj (see [4] for this). In view of this fact, and the theorem above, if we seek a base sequence wrt some symmetric cocycle when G is abelian, we may as well assume G = Zps , A = Zqt for primes p, q. If p = q then we may take the cocycle to be a coboundary and we are in the situation of Corollary 2. 4.2
GPBAs
We will now outline how we may regard a GPBA as a base sequence wrt to a very speciﬁc cocycle. Full details of proofs are given in [5]. We take the deﬁnition of GPBA from [2]. Let G = Zs1 × · · · × Zsr , A = {±1} be a group of order two, z = (z1 , . . . , zr ) where zi = 0 or 1, and s = (s1 , . . . , sr ). Also, let φ : G → A be a map of groups with φ(1) = 1. If z = 0 then φ is called a GPBA(s) of type 0, or a PBA(s), if it is correlated. When z = 0 a more involved deﬁnition is needed. We deﬁne yet more groups: G∗ = Z(z1 +1)s1 × · · · × Z(zr +1)sr . Thus, the arithmetic in the ith coordinate of G∗ is mod 2si or mod si according as zi = 1 or zi = 0. Further, deﬁne the following subgroups of G∗ , H = {h ∈ G∗ : hi = 0 if zi = 0; hi = 0 or si if zi = 1}, K = {k ∈ H : k has even weight}. We now deﬁne an extension of φ to G∗ . We may write any g ∈ G∗ uniquely in the form g = x + h where x ∈ G and h ∈ H by taking x = g mod s = (g1 mod s1 , . . . , gr mod sr ) and h = g − x. Here gi mod si refers to the unique residue in the range 0, . . . , si − 1. Now put a(x) if h ∈ K 7(g) = −a(x) if h ∈ / K. Finally, then, φ is called a GPBA(s) of type z = 0 if 7 is correlated relative to H. The concept of GPBA relates to the ideas in the earlier part of this paper by taking the short exact sequence ι
β
1 → A → G∗ /K → G → 0, where ι is the homomorphism ι(A) = H/K and β the homomorphism g + K → g mod s. Using λ(x) = x + K as a section of β we have, in the notation of the previous section, 7(g) = Φλ (g + K). We may prove from the results earlier that 7 is correlated relative to H if and only if Φλ is correlated relative to H/K. The section λ deﬁnes a cocycle in Z 2 (G, A), which we call fJ , in the usual way and we have the following result.
354
Garry Hughes
Theorem 5. [5, Theorem 5.6] For any z, φ is a GPBA(s) of type z if and only if φ is a base sequence wrt fJ . Finally we note that, because of the isomorphism of Ext(G, A) mentioned above, fJ depends only on s and z, and in a particularly simple manner (for details see [5]).
References 1. D. L. Drake, Partial λgeometries and generalised Hadamard matrices, Canad. J. Math. 31 (1979), 617627. 2. J. Jedwab, Generalized Perfect Arrays and Menon Diﬀerence Sets, Des. Codes and Cryptogr. 2 (1992), 1968. 3. D. Jungnickel, On automorphism groups of divisible designs, Canad. J. Math. 34 (1982), 257297. 4. G. Karpilovsky, Projective Representations of Finite Groups, Marcel Dekker, New York, 1985. 5. G. Hughes, Cocyclic Theory of Generalized Perfect Binary Arrays, Royal Melbourne Institute of Technology, Department of Mathematics, Research Report 6, May 1998. 6. A. A. I. Perera and K. J. Horadam, Cocyclic Generalised Hadamard Matrices and Central Relative Diﬀerence Sets, Des. Codes and Cryptogr. 15 (1998), 187200. 7. A. Pott, Finite Geometry and Character Theory, Lecture Notes in Mathematics 1601, SpringerVerlag, Berlin, 1995.
Double Circulant SelfDual Codes Using FiniteField Wavelet Transforms F. Fekri, S.W. McLaughlin, R.M. Mersereau, and R.W. Schafer Center for Signal & Image Processing Georgia Institute of Technology, Atlanta, GA 303320250 {fekri,swm,rmm,rws}@ee.gatech.edu WWW home page: http://www.ee.gatech.edu/users/fekri/ Abstract. This paper presents an example of integrating recently developed ﬁniteﬁeld wavelet transforms into the study of error correcting codes. The primary goal of the paper is to demonstrate a straightforward approach to analyzing double circulant selfdual codes over any arbitrary ﬁniteﬁeld using orthogonal ﬁlter bank structures. First, we discuss the proper combining of the cyclic mother wavelet and scaling sequence to satisfy the requirement of selfdual codes. Then as an example, we describe the encoder and decoder of a (12,6,4) selfdual code, and we demonstrate the simplicity and the computation reduction that the wavelet method oﬀers for the encoding and decoding of this code. Finally, we give the mother wavelet and scaling sequence that generate the (24,12,8) Golay code.
1
Introduction
Although wavelets and ﬁlter banks over real or complex ﬁelds have been studied extensively for years, the ﬁniteﬁeld wavelet transform has received little attention because of the very limited application of this transform in the signal and image processing area. In the past there was some interest in extending wavelet transforms to the situation in which the complex ﬁeld is replaced with a ﬁnite ﬁeld. In [1] the authors show that unlike the real ﬁeld case, there is no complete factorization technique for paraunitary ﬁlter banks (FB) over GF (p), for p a prime, using degreeone and degreetwo building blocks. Relying on the Fourier transform deﬁned over GF (pr ), the authors of [2] construct a wavelet transform for ﬁnite dimensional sequences over ﬁelds with a characteristic other than 2, p = 2. In [3] an alias cancellation approach was used to design twoband ﬁlter banks over ﬁnite ﬁelds. The main problem with this approach was excluding ﬁelds of characteristic two, GF (2r ) for r ≥ 1, in which an element of order two does not exist. An extensive review of ﬁnite ﬁeld transforms can be found in [4]. Recently, a formulation of the wavelet decomposition of a vector space over any ﬁniteﬁeld has been derived in [5]. Since that formulation relies on a basis decomposition in the time domain rather than in the frequency domain, it does not require the existence of the number theoretic Fourier transform. Thus it becomes more attractive, particularly for the ﬁelds GF (2r ) having characteristic 2. Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 355–363, 1999. c SpringerVerlag Berlin Heidelberg 1999
356
F. Fekri et al.
The objective of this paper is to present an example of our attempt to bring together ﬁniteﬁeld wavelet transforms and error correcting codes and to study error control coding in the signal processing context. This uniﬁed view uncovers a rich set of signal processing techniques that can be exploited to investigate new error correcting codes, and to simplify the encoding and decoding techniques of some existing codes. Selfdual codes have been widely studied [6]. In this paper, ﬁrst we discuss a model to generate double circulant selfdual codes over any arbitrary ﬁniteﬁeld using the notion of ﬁniteﬁeld wavelet transform. Then as an example, we present an encoding and decoding method for the (12,6,4) binary selfdual code. More indepth study of double circulant selfdual codes using cyclic orthogonal and biorthogonal wavelet transforms can be found in [7]. Let FN be the vector space of N tuples over the ﬁniteﬁeld F. Then a selfdual code of length N is a subspace C of FN such that C = C ⊥ , where C ⊥ is deﬁned as: C ⊥ = {v ∈ FN : v, c = 0 ∀ c ∈ C} (1) Clearly, C has dimension M = N/2 and is called a (N, M, d) code where d is the minimum distance of the code. Since the primary focus of this paper is on linear block codes, before talking explicitly about selfdual codes, we brieﬂy summarize some results on cyclic wavelet transforms.
2
Cyclic Wavelet Transforms over the Field F
It is well known that wavelet decomposition and reconstruction can be implemented as the analysis and synthesis components of a perfect reconstruction ﬁlter bank, respectively. Figure 1 shows the analysis and synthesis bank of a twochannel perfect reconstruction ﬁlter bank in which the synthesis ﬁlters g0 (n) and g1 (n) are the scaling sequence and mother wavelet, respectively. In [5] the authors show how to decompose a vector space V over a ﬁniteﬁeld F onto two orthogonal subspaces V0 and W0 . Particularly, to have a two channel perfect reconstruction orthogonal ﬁlter bank, a design methodology is presented to obtain the analysis and synthesis ﬁlters over the ﬁelds of characteristic. Since the codewords c(n) of selfdual codes have ﬁnite even length, the vector space V is considered to be a vector space of ﬁnite dimension N = 2M. It can also be regarded as a space of periodic sequences of period N . Now, consider a twochannel perfect reconstruction orthogonal ﬁlter bank with the scaling sequence g0 (n) = {g0 (0), g0 (1), · · · , g0 (N − 1)} and the mother wavelet g1 (n) = {g1 (0), g1 (1), · · · , g1 (N − 1)}, where the mother wavelet is the time reverse of the scaling function g1 ((n))N = g0 ((−n − 1))N . It is worth noting that throughout the paper ((·))N denotes a moduloN operation, or equivalently an N point circular shift. Furthermore, the analysis ﬁlters are related to the synthesis ﬁlters by: hj ((n))N = gj ((−n))N
j = 0, 1
n = 0, . . . , N − 1.
(2)
Double Circulant SelfDual Codes Using FiniteField Wavelet Transforms
357
In the study of block codes, we will frequently use the fact that the algebra of M × M onecirculant matrices over the ﬁeld F is isomorphic to the algebra of polynomials in the ring F[z −1 ]/(z −M − 1). This isomorphism allows us to simplify the proofs of the relations using matrix notation instead of ztransforms. Therefore, we introduce a matrix representation to express the relations of the cyclic ﬁlter bank. In the analysis bank, the ﬁltering operation of periodic signals followed by decimation by a factor of two can be described using 2circulant matrices. In other words, let H0 : V → V0 and H1 : V → W0 be two linear transformations (M × N matrices in matrix notation) that project a codeword c(n) ∈ V onto two orthogonal spaces V0 and W0 with wavelet coeﬃcients x0 (n) and x1 (n), respectively. Then we have: x0 (n) = x1 (n) =
N −1 i=0 N −1
c(i)h0 ((2n − i))N = (H0 c)(n) (3) c(i)h1 ((2n − i))N = (H1 c)(n),
i=0
in which H0 and H1 are 2circulant matrices. By using (2), these matrices can be written as: gj (0) gj (1) gj (2) · · · gj (N − 1) gj (N − 2) gj (N − 1) gj (0) · · · gj (N − 3) j = 0, 1. (4) Hj = .. .. .. .. .. . . . . . gj (3) gj (4) · · · gj (1) gj (2) Similarly, in the synthesis bank, the upsampling of periodic signals by a factor of two followed by the ﬁltering operation can be described by 2circulant matrices G0 and G1 : c(n) =
M −1
x0 (i)g0 ((n − 2i))N +
i=0
= (G0 x0 )(n) + (G1 x1 )(n).
M −1
x1 (i)g1 ((n − 2i))N
i=0
(5)
Because of the relation (2), the synthesis matrices turn out to be the transposes of the analysis matrices: Gj = HjT
j = 0, 1.
From the perfect reconstruction constraint we deduce that: H0 = IN ×N . [H0T H1T ] H1
(6)
(7)
Furthermore, since the operator T = [H0T H1T ]T is a 1to1 mapping V → V0 ×W0 in the ﬁnitedimensional vector space, consequently it is onto as well. Hence T T T = I implies that T T T = I (i.e, T is unitary). Therefore: Hj HjT = IM ×M
j = 0, 1
and
H0 H1T = 0M ×M .
(8)
358
F. Fekri et al. c(n)
h 0(n)
2
h1(n)
2
x0(n)
x 1(n)
x0(n)
x 1(n)
Analysis Bank
2
g 0(n)
2
g 1(n)
+
c(n)
Synthesis Bank
Fig. 1. Diagram of the twoband ﬁlter bank.
3
Double Circulant SelfDual Codes
Assume that C ∈ FN is a double circulant selfdual code. The double circulant property requires that if c(n) = {c(0), c(1), . . . , c(N − 1)} is a codeword in C, then c((n − 2))N is also in C. Let c(n) be a codeword that is decomposed by the analysis bank of Fig. 1 to its wavelet coeﬃcients x0 (n) and x1 (n). Although we do not need the result of the following theorem in this paper, we quote the following theorem from [7] to maintain the generality of the discussion. Theorem 1. Suppose the codeword c(n) ∈ C corresponds to the message block m(n), then for any double circulant selfdual code C there exists a cyclic orthogonal wavelet transform that maps the codeword c(n) to the message block, i.e, x0 (n) = x1 (n) = m(n). 3.1
Encoder Structure
Figure 2a shows the encoder of a (N, M, d) code that maps the message block m(n) of size M = N/2 to the codeword c(n). The encoder is realized by the synthesis portion of the twoband ﬁlter bank in which go (n) and g1 (n) are an orthonormal wavelet basis over F. The parameter α ∈ F introduced in this structure will be speciﬁed later to meet the selfdual constraint. The delay z −l , 0 ≤ l ≤ M − 1 controls the minimum distance of the generated code, and will also be discussed later. From the linearity of the wavelet transform, it is obvious that the generated code is linear. Furthermore, if c(n) is a codeword associated with the message m(n), then by the property of the multirate ﬁlters, there exists a message datum m((n − 1))M that is mapped to the codeword c((n − 2))N . Therefore, the code is double circulant. It is also worth noting that using the combined scaling sequence and the mother wavelet (synthesis bank) as the encoder, guarantees that the mapping of the message m(n) onto the codeword c(n) is a onetoone mapping. This is true because the message block can be extracted by ﬁltering c(n) through h0 (n) and downsampling it by a factor of two. In the following, we show that regardless of the amount of the delay l, the encoder generates a double circulant selfdual code in F. Using the matrix notation that we developed for the cyclic ﬁlter bank in Section 2, the N × M generator matrix G of this code can be written as: G = G0 + G1 Πl ,
(9)
Double Circulant SelfDual Codes Using FiniteField Wavelet Transforms
359
in which Πl is an M × M monomial matrix, which is a permutation of the identity matrix if the ﬁeld is GF (2r ). In fact Πl is a onecirculant matrix deﬁned by its ﬁrst row that is zero everywhere except the (l + 1) position. Due to the isomorphism of the algebra of onecirculant matrices and the algebra of polynomials, the following statements can be readily proved. First, one can show that ΠlT is also a onecirculant matrix whose ﬁrst row is zero everywhere except at the (M − l + 1) position. Furthermore, it can be proved that ΠlT Πl = α2 IM ×M .
(10)
Now, recalling the necessary and suﬃcient condition for selfdual codes, it is deduced that the M columns of the generator matrix G that are linearly independent must also lie in the dual space. Since dim C = dim C ⊥ = M, then the columns of G specify a basis set for the dual space as well. Consequently the generator matrix and the paritycheck matrix of the code are the same. From the above argument we conclude that the if and only if condition of the selfdual codes is equivalent to GT G = 0. Using (9), (6), (8), and (10), we derive the following equality for the generator matrix of the encoder (Fig. 2a): GT G = I + α2 I.
(11)
Consequently, to meet the selfdual condition, we require that α2 + 1 = 0 for α ∈ F. channel error: e(n) x0(n)
m(n)
αz
2
g 0(n)
+
c(n)
c(n)
+
h0(n)
2
l
+
s(n)
α z (Ml) x1(n) 2
g 1(n)
h1(n)
2
(b) Syndrome Generator
(a) Encoder Structure
Fig. 2. Filter bank structure of the encoder and syndrome generator.
Our previous discussion of selfdual codes is valid over any arbitrary ﬁniteﬁeld. Now, let us study the (12,6,4) code as an example in the binary ﬁeld. As explained in [5], we choose the mother wavelet to be g1 (n) = {100010010101}. Therefore, the scaling sequence and the analysis ﬁlters are obtained by the relation g0 ((n))N = g1 ((−n − 1))N and (2) as: g0 (n) = {101010010001} h0 (n) = {110001001010} h1 (n) = {110101001000}.
(12)
The parameter α in GF (2) is equal to one, and the choice of the delay l determines the minimum distance of the generated code. Choosing the delay l from
360
F. Fekri et al.
the set {1, 2, 4, 5} generates a code with a minimum distance four, while the other choices reduce the minimum distance of the code to two. It is worth noting that the codes generated by diﬀerent values of the delay from the set {1, 2, 4, 5} are all equivalent to each other. By using any delay value from this set, all of the codeword weights are a multiple of two and the weight numerator of the code is: A4 = 15 , A6 = 32 , A8 = 15 , A12 = 1,
(13)
in which Ai denotes the number of codewords of weight i. In the rest of the paper, we choose delay l to be one. 3.2
Syndrome Generator and Decoder Structure
In the following we show that the structure in Fig. 2b constructs the syndrome of the code. Again, by using the relations that we developed for cyclic ﬁlter banks, we write: s(n) = (H0 + ΠlT H1 )(c + e)(n), (14) in which e(n) is the error pattern due to the communication channel. Given the selfdual requirement α2 + 1 = 0, the above equality is simpliﬁed further by combining the equations (6), (8), and (10) as: s(n) = (H0 + ΠlT H1 )e(n).
(15)
Therefore, the output of the system in Fig. 2b depends only on the error pattern. This structure can be simpliﬁed as in Fig. 3 in which h2 (n) = h0 (n) + h1 ((n + 2l))N . Now, the remaining problem is to interpolate the low (M) dimensional syndrome s(n) into the higher (N ) dimensional error pattern e(n). This is a conditional minimum distance estimation problem in the signal processing context. It is conditional because more than one error pattern is mapped into the same low dimensional syndrome. Therefore, the interpolator should choose (out of those possible valid choices) the error pattern that is most likely (has minimum weight) to achieve the maximum likelihood ML decoder performance.
channel error: e(n) c(n)
+
s (n)
e(n) s(n) h2(n)
2
00
2
u (n)
2
u (n)
+
00
s(n)
z 1 01
s (n) 01
(a) Syndrome Generator
(b) Polyphase Structure of the Syndrome Generator
Fig. 3. Polyphase representation of the syndrome generator.
Since the estimator is a signal dependent operator, there is no single operator solution for this estimation problem. Hence, to keep the decoder as simple as
Double Circulant SelfDual Codes Using FiniteField Wavelet Transforms
361
possible, we design the interpolator to be exact for all the weight one error patterns. Therefore, like the ML decoder, this decoder is guaranteed to correct all errors of weight one. Our approach to design the decoder is based on inverting the polyphase ﬁlters of the syndrome generator ﬁlter h2 (n). Figure 3b shows the polyphase structure of the syndrome generator in which u00 (n) = h2 (2n) and u01 (n) = h2 (2n + 1) are the polyphase components of h2 (n). Furthermore, let r00 (n) and r01 (n) be two ﬁlters with the ztransform satisfying: R0i (z)U0i (z) = 1 mod (z −M − 1)
i = 0, 1.
(16)
In other words, these two ﬁlters are the circular inverses of u00 (n) and u01(n), respectively. Now, let us deﬁne two sets E0 and E1 by distinguishing those errors that occur only in the even time indexes from those occur in the odd time indexes, respectively: E0 = {e(n) = {ζ0 , 0, ζ1 , 0, · · · , ζM −1 , 0} where: ζi ∈ {0, 1} ∀ i} E1 = {e(n) = {0, η0 , 0, η1 , · · · , 0, ηM −1 } where: ηi ∈ {0, 1} ∀ i} .
(17)
If e(n) ∈ E0 , then s01 (n) = 0 and s(n) = s00 (n). Therefore, we are required to invert the ﬁlter u00 (n) to estimate the error. By this argument, we realize that whenever e(n) ∈ E0 , the output at position 1 in Fig. 4 is the exact interpolation of the syndrome to e(n).
r (n) 00
~
s(n)
e(n)
Weight Computation r (n) 01
c(n)+e(n)
1 2
2
z (N1)
+
m(n) h0(n)
2
2
Conditional Interpolator
Fig. 4. Filter structure to reconstruct the message sequence. Taking into the account that half of the weight one errors belong to the set E1 , we need to identify these cases and be able to correct these errors as well. To do that, let us deﬁne the set E11 as : E11 = {e(n) : e(n) ∈ E1 & wt(e) = 1}
(18)
in which wt(e) means the weight of the error. It is clear that if e(n) ∈ E11 , then s(n) = u01 ((n − n0 ))M (n0 depends on the location of the ’1’ in e(n)) which generates a weight ﬁve output at node 1. Hence, whenever the weight of the output at node 1 is ﬁve, we select the output at node 2 as a correct estimate of the error. It is worth noting that weight ﬁve outputs at node 1 are also produced by errors from the set E05 = {e(n) : e(n) ∈ E0 & wt(e) = 5}. Consequently, the correctable errors by the decoder of Fig. 4 are from the set E = {E0 ∪ E11 } − E05 .
362
F. Fekri et al.
Table 1. The number of correctable errors (by error weight) by the ﬁlter decoder in Fig. 4 and the ML decoder. Weight of the Error Filter Method ML Decoder 1 12 12 2 15 31 3 20 20 4 15 6 1 
Table 1 gives the correctable errors by the ﬁltering technique and ML decoder. In the following we investigate the amount of extra signaltonoise ratio (SNR) that is required by the ﬁlter method to achieve the same performance as the ML decoder. Suppose pb is the probability of a bit error and perr is the word error probability. Then
Nwe can determine the word error rate by applying the formula perr = 1 − i=0 βi pib (1 − pb )N −i in which βi is the total number of correctable errors of weight i (given in Table 1). To plot the word error rate as a function of SNR, we choose DPSK and noncoherent FSK detection methods in which pb is (1/2)e−SN R and (1/2)e−SN R/2 , respectively. These graphs show that a very subtle extra SNR is required by the ﬁlter decoding method to achieve the performance of the syndrome table lookup decoder. It is important to note that the (12,6,4) selfdual code has been studied only as an example, and the wavelet method described in this paper can be used to generate selfdual codes of any length. As a ﬁnal remark we give the generator ﬁlters of the two double circulant even selfdual codes (8,4,4), and (24,12,8) [7]. The (8,4,4) code is generated by a cyclic orthogonal ﬁlter bank with the scaling sequence and the mother wavelet equal to g0 = {9D} and g1 = {4C}, respectively. Similarly, the cyclic orthogonal ﬁlter bank that generates the Golay code (24,12,8) is constructed by g0 = {A80011} and g1 = {40DD55} (with no need for delay element in Fig. 2a). Note that the ﬁlter coeﬃcients in GF (2) are represented in hexadecimal form. Furthermore, there exist several other pairs of scaling and mother wavelets that generate equivalent codes [7].
4
Conclusion
We report a new approach to study and implement selfdual codes by using ﬁniteﬁeld wavelet transforms. This method allows us to construct double circulant selfdual codes of arbitrary length over any ﬁniteﬁeld in a straightforward manner. We also introduce a decoder based on a polyphase ﬁlter inversion methodology. This decoder achieves nearly the performance of the syndrome table lookup decoder. Our approach, in addition to being a powerful tool to investigate error correcting codes, reduces the complexity and computation costs in the encoder and decoder by the polyphase implementation of the multirate ﬁlters.
Double Circulant SelfDual Codes Using FiniteField Wavelet Transforms
363
Noncoherent FSK Method
0
word error rate
10
ML Decoder Filter Method
−5
10
−10
10
8
9
10
11
12
13
14
15
SNR (db) DPSK Method
0
word error rate
10
−5
10
−10
10
−15
10
4
5
6
7
8
9
10
11
12
13
SNR (db)
Fig. 5. Comparison of the word error rate of the Filter decoding technique with that of the ML decoder method in DPSK and noncoherent FSK.
References 1. S. Phoong, and P. P. Vaidyanathan, “Paraunitary Filter Banks Over Finite Fields”, IEEE Trans. Signal Proc., vol. 45, pp. 1443–1457, June 1997. 2. G. Caire, and R. L. Grossman, and H. V. Poor, “Wavelet Transforms Associated with Finite Cyclic Groups”, IEEE Trans. on Information Theory, vol. 39, pp. 1157–1166, July 1993. 3. T. Cooklev, and A. Nishihara, and M. Sablatash, “Theory of Filter Banks over Finite Fields”, Proc. Asia Pacific conf. Circuits Syst., pp. 260–265, Taipei Taiwan, 1994. 4. F. Fekri, “Transform Representation of Finite Field Signals”, A qualifying examination report available at http://www.ee.gatech.edu/users/fekri, Georgia Institute of Technology, June, 1998. 5. F. Fekri, R. M. Mersereau, and R. W. Schafer,“Theory of Wavelet Transforms over Finite Fields”, Proc. Int. Conf. Acoust. Speech, and Signal proc., 605–608, March 1999. 6. F. J. MacWilliams, and N. J. A. Sloane, “The Theory of Error Correcting Codes, NorthHolland publishing company, 1977. 7. F. Fekri, S. W. McLaughlin, R. M. Mersereau, and R. W. Schafer, “Error Control Coding Using Finite Field Wavelet Transforms, Part III: Double Circulant Codes”, in preparation.
Linear Codes and Polylinear Recurrences over Finite Rings and Modules (A Survey) V.L. Kurakin, A.S. Kuzmin, V.T. Markov, A.V. Mikhalev, and A.A. Nechaev Department of Mechanics and Mathematics and Center of New Information Technologies of Moscow State University
Abstract. We give a short survey of the results obtained in the last several decades that develop the theory of linear codes and polylinear recurrences over ﬁnite rings and modules following the wellknown results on codes and polylinear recurrences over ﬁnite ﬁelds. The ﬁrst direction contains the general results of theory of linear codes, including: the concepts of a reciprocal code and the MacWilliams identity; comparison of linear code properties over ﬁelds and over modules; study of weight functions on ﬁnite modules, that generalize in some natural way the Hamming weight on a ﬁnite ﬁeld; the ways of representation of codes over ﬁelds by linear codes over modules. The second one develops the general theory of polylinear recurrences; describes the algebraic relations between the families of linear recurrent sequences and their periodic properties; studies the ways of obtaining “good” pseudorandom sequences from them. The interaction of these two directions leads to the results on the representation of linear codes by polylinear recurrences and to the constructions of recursive MDScodes. The common algebraic foundation for the eﬀective development of both directions is the Morita duality theory based on the concept of a quasiFrobenius module.
Introduction The theory of linear codes over ﬁnite ﬁelds is a wellestablished part of discrete mathematics. The highlights of this theory include concepts of dimension, duality, weight enumerators and their generalizations, MacWilliams identities, cyclic codes, etc... [1,21,43,63,73]. The theory of linear recurrences (LR) over ﬁelds is also well developed; it has rather more ancient historical roots and important applications in various areas of mathematics (theory of algorithms, Monte Carlo method, cryptography). In particular, the analytical formulae for the general member of an LR were deduced, as well as the algebraic relations between the families of LR, cyclic types of such families over ﬁnite ﬁelds, the distribution laws for elements on the cycles of some LR and the ways of obtaining from them pseudorandom sequences (see [79,41,33,36] and the sources cited there). We point out also some papers on the properties of polylinear sequences over Galois ﬁelds: [44,61,65,66,67,68]. The attempts to extend the mentioned results to some class of ﬁnite rings more general than that of ﬁelds for long time dealt mostly with integral residue Marc Fossorier et al. (Eds.): AAECC13, LNCS 1719, pp. 365–390, 1999. c SpringerVerlag Berlin Heidelberg 1999
366
V.L. Kurakin et al.
rings [2,3,4,6,71,72,74,75], and in the last years especially with the ring Z4 (see e.g. [46,47,20,7,30,57,76]). The last works ﬁnally proved the actuality and importance of the investigations of linear codes and polylinear recurrences over arbitrary ﬁnite rings and modules that became so active in ’70’90th [14,22,23,24,26,27,28,32,33,36,39,45], [48,49,52,53,54,55,58,59,60,70,77,78,62]. It turned out that suﬃciently deep generalization requires not the absence of zero divisors in the base ring but the so called double annihilator relations in the module (see below p.1.1). This leads to the concepts of quasiFrobenius and reciprocal module, providing a nice way to introduce for linear codes over an arbitrary ﬁnite module the main concepts of coding theory over ﬁelds so that its fundamental results remain valid. For instance, the parity check matrix and the dual code are deﬁned over the reciprocal module (having the same number of elements as the given one) and the MacWiliams identities for complete weight enumerators remain true. We apologize for the brevity of our further comments caused by the space limits which were rather severe for a paper of such kind. Due to the same cause, the reference list is not exhaustive. We only tried to cite all the specialists that worked in the area during the last several years. We apologize in advance to the authors whose names may be missed: please consider it not as a malicious intention but as the sign of our insuﬃcient competence. More complete reference lists could be found in e.g. [33,36,45,60]. We consider the following topics which are comparatively new: 1. General theory of linear codes over ﬁnite modules. 2. The comparison of linear codes over ﬁnite modules, linear spaces and ﬁelds. 3. Weightfunctions on ﬁnite modules as a generalization of Hamming weight. 4. Presentations of codes over ﬁelds by linear codes over modules. 5. General theory of polylinear recurrences over ﬁnite modules. 6. Presentations of linear codes over modules by polylinear recurrences. 7. Recursive and linear recursive MDScodes. 8. Periodic properties of polylinear recurrences over ﬁnite modules. 9. Pseudorandom sequences generated by polylinear recurrences.
1
General Theory of Linear Codes over Modules
Let R be a ﬁnite commutative ring with identity e, and R M a ﬁnite faithful module. Any submodule K < R M n is called a linear ncode over R M, its Hamming distance deﬁned as d(K) = min{α : α ∈ K \ 0}, where α is Hamming weight of α. We choose as the main criterion of correctness for the theory of such codes the presence of notions of the paritycheck matrix and the code Ko dual to the given code K, deﬁned in such a way that in particular the equality Koo = K and the MacWilliams identity for complete weight enumerators of codes K and Ko should be valid. Consider the following fundamental example. Suppose we study linear codes L < R Rn over a ring R, and we deﬁne the dual code Lo in the usual way as Lo = {β ∈ Rn : βL = 0}. Then Loo ⊇ L, but the equality Loo = L is
Linear Codes and Polylinear Recurrences over Finite Rings and Modules
367
guaranteed if and only if R is a quasiFrobenius ring [52,54] (for example principal ideal rings and in particular R = Zm are quasiFrobenius). In such cases the codes over R can be studied without codes over modules, as it was done in [3,4,7,27,69,71,72]. On the other hand, if the ring R is not quasiFrobenius then for deriving deep enough results the dual code should be built not over R but over the corresponding quasiFrobenius module R Q (see below) that has the same number of elements as R. In its turn, the code dual to a linear code over R Q is built over the ring R. In the most general case, while studying the linear codes over arbitrary ﬁnite module R M, to obtain the results close enough to that of the theory of linear codes over ﬁelds, the code dual to the given one must be deﬁned over the module ∗ R M , which is Morita dual to R M. Now we pass to the exact statements. 1.1. QuasiFrobenius Modules and the Reciprocal Modules. For any ideal I R and any submodule K < R M we deﬁne their annihilators, correspondingly in M and R, by the equalities AnnM (I) = {α ∈ M : Iα = 0} < R M; AnnR (K) = {r ∈ R : rK = 0} R. A module R M is called quasiFrobenius (a QF module), if AnnR (AnnM (I)) = I and AnnM (AnnR (K)) = K for all I R and K < R M. For any ﬁnite commutative ring R there exists a unique (up to isomorphism) QF module R Q [18,52]. It might be described as the character group Q = Hom(R, Q/Z) of the group (R, +), where the product rω ∈ Q of an element ω ∈ Q by an element r ∈ R is deﬁned by the condition rω(a) = ω(ra) for any a ∈ R. We have (Q, +) ∼ = (R, +) and Q = R. A ring R is called quasiFrobenius, if R R is a QFmodule. Now, as we continue to discuss the example given in the Introduction, deﬁne the product of the row a = (a1 , . . . , an ) ∈ Rn by the row α = (α1 , . . . , αn ) ∈ Qn as aα = a1 α1 + . . . + an αn ∈ Q, and say the the code L0 = {α ∈ Qn : Lα = 0} over R Q is dual to the linear code L < Rn , while the code K0 = {a ∈ Rn : aL = 0} over R is dual to the linear code K < R Qn . Then, in particular, we have L00 = L, K00 = K, LL0  = KKo = Rn = Qn [52,54]. This construction is generalized to the linear codes over an arbitrary ﬁnite module R M using the following concept. We call the module R M ∗ = HomR (M, Q) of all homomorphisms R M → Q reciprocal to R M (or Moritadual to R M). It may be presented also as R ˙ + α ˙ t be a direct sum Hom(M, Q/Z) in the following way. Let M = α1 +... of cyclic groups. Then any ϕ ∈ M ∗ generates t characters ϕ(αi ) = ωi ∈ Q = ∈ Hom(M, Q/Z) such that Hom(R, Q/Z), i ∈ 1, t, and unique character ϕ induces an isomorphism of Rϕ(α i ) = ωi (e), i ∈ 1, t. The correspondence ϕ → ϕ modules M ∗ → Hom(M, Q/Z). It is important to note that (M ∗ , +) ∼ = (M, +), so in particular M and M ∗ are equivalent alphabets. Let us deﬁne the product of α ∈ M by ϕ ∈ M ∗ as ϕα = ϕ(α) ∈ Q. Then for a ﬁxed α ∈ M the correspondence ϕ → ϕα induces a homomorphism R M ∗ → Q belonging to M ∗∗ = HomR (M ∗ , Q). We identify this homomorphism with α, obtaining equality M ∗∗ = M. Note that if R M = R Q is a QFmodule over R then the Rmodule M ∗ = Q∗ = HomR (Q, Q) is isomorphic to R [18,52]. In particular if R is a QFring and M = R, then there are natural identiﬁcations: Q = R and M ∗ = M = R.
368
V.L. Kurakin et al.
1.2. The Dual Code and the ParityCheck Matrix [55]. Let now K
α∈K
(1.1) According to [52] there exists a distinguishing character χ : (Q, +) → (C∗ , ·) of the module R Q, i.e. such a character that χ(K) = 1 for every nonzero submodule K < R Q. The following theorem may be considered as an extension of results of [14] for Hamming weight enumerators of dual codes over a ﬁnite abelian group to complete weight enumerators of dual codes over modules. Theorem 3 ([55]) There is the MacWilliams identity m
WKo (y) =
1 µ1 (y), ..., µ m (y)), where µ s (y) = χ(µ∗t µs )yµ∗t , s ∈ 1, m. WK ( K t=1
Linear Codes and Polylinear Recurrences over Finite Rings and Modules
369
The (Hamming) weight enumerator of a code K < M n satisﬁes the equalities WKH (x, y) = WK (x, y, ..., y),
WKHo (x, y) =
1 W H (x + (m − 1)y, x − y). K K
These results generalize the preceding results of [50,51,52] (for the case when is a QFmodule) and results of [27,7] (for the case M = R = Zm ). The last theorem implies also the result of [77] about MacWilliams identity for linear codes over a ﬁnite (in general noncomutative) Frobenius ring since the latter always can be presented as a module over a suitable commutative QFsubring. A slightly diﬀerent approach to the concept of duality was proposed in [17].
RM
2
Comparison of Codes over Fields and Modules
Here the study of linear codes over a module R M is reduced to the case when the ring R is local. In this case it is possible to build linear codes over R M which inherit the properties of linear codes over the residue ﬁeld R. But the possibilities to build the linear codes with better parameters, than those of the linear codes over the ﬁeld of M elements, are limited in some sense: every such code is majored (cf. 2.2) by a linear code over the space L of cardinality M. 2.1. Reduction to Local Rings. A ﬁnite commutative ring R is called local if its nilradical N = N(R) (the set of all nilpotent elements) is a maximal ideal of R. Then N is the unique maximal ideal of R. Any ring we consider has ˙ +R ˙ t . If e = e1 + decomposition into a direct sum of local subrings: R = R1 +... ... + et , where es ∈ Rs , then es is the identity of Rs and Rs = es R, s ∈ 1, t. The module M and the code K < R M n have corresponding decompositions: ˙ +M ˙ t , K = K1 +... ˙ +K ˙ t , where Ms = es M is an Rs module and M = M1 +... Ks = es K is a linear ncode over Ms . Proposition 4 ([60,55]) d(K) = min{d(K1 ), . . .,d(Kt )}. ¯ = R/N is a ﬁeld Let now R be a local ring with the nilradical N. Then R of elements r¯ = r + N, r ∈ R. The socle S(M) of the module R M (of the code K < R M n ) is deﬁned as the sum of all its irreducible submodules, and S(M) = AnnM (N) (S(K) = AnnK (N)) (see p. 1.1). We may consider S(M) ¯ where r¯α = rα for all r¯ ∈ R ¯ and α ∈ S(M). as a space over the ﬁeld R, Proposition 5 ([60,55]) Let R be a local ring and K < R M n . Then S(K) is a linear ncode over the space R¯ S(M) and d(K) = d(S(K)). It allows to “turn linear codes over ﬁelds into linear codes over modules”. We say that K ⊆ M n is an [n, k, d]code over M if K = Mk , d(K) = d. Proposition 6 ([55]) Let R be a local ring. If there exists a linear [n, k, d]code ¯ then there exists a linear systematic Rclosed [n, k, d]code L over the ﬁeld R, ¯ = 1 and L is a cyclic code, then K can be chosen to be K over R M. If (n, R) cyclic also.
370
V.L. Kurakin et al.
This result generalizes some particular results of papers [3,4,69,71,72]. 2.2. Linear Codes over Fields, Spaces and Modules. Let L be an elementary abelian pgroup of order q = pt , i.e. a ﬁnite linear space over GF (p). If t > 1 then there exist linear codes over L which are better than linear codes over GF (q). Let BL (n, 3) (Bq (n, 3)) be the maximum of the cardinalities of linear ncodes over (L, +) (over GF (q)) with the distance 3. r
r
−1 −1 < n ≤ pδ qq−1 − (pδ − 1) for some k ≥ 2 and Proposition 7 ([60]) If pδ−1 qq−1 t−δ δ ∈ 1, t − 1, then BL (n, 3) = p Bq (n, 3).
It is a generalization of earlier result of [25] for some cases when p = 2. The attempts to build linear codes over modules which are better than linear codes over linear spaces unexpectedly failed. We say that a ncode K over M is majored by a ncode L over some alphabet L if L = M, L ≥ K and d(L) ≥ d(K). Theorem 8 ([55]) Let R M be a module over a local ring R, and let R¯ L be a ¯ = R/N. Then any linear space of cardinality L = M over the residue ﬁeld R linear code K < R M n is majored by some linear code L < R¯ Ln . If M is a ﬁnite abelian group and L is a direct sum of elementary abelian groups of cardinality L = M, then any linear ncode over M is majored by some linear ncode over L. See also p. 7.2 below about the Asturian code.
3
Weight Functions on Finite Modules
The investigation of linear codes over modules is not so important for construction of codes which are better than codes over ﬁelds as for description of new linear representations of these codes. In [46,47] it was discovered that certain nonlinear binary codes (Kerdock codes) can be represented as linear codes over Z4 . Later in [20] a variant of this representation (the so called Gray map) was found which gives an isometry between the Leemetric space Zn4 and the Hammingmetric space F2n 2 . Meanwhile in [37,38,56] a generalized Kerdock code over Fq , q = 2l , was built as a representation of some linear code over the Galois ring GR(q 2 , 4). Independently, in [8,9] the socalled homogeneous weight on the residue class ring Zm was introduced, and the resulting metric was characterized by algebraicinformationtheoretic properties. With a suitable generalization of this weight to the ring GR(q 2 , p2 ) the Reed–Solomon map [56] isometrically embeds GR(q 2 , p2 )n into the space (Fnq q , wHam ). So the notion of homogeneous weight is closely related with the representations of codes over ﬁelds by linear codes over modules. Here we present the results of [22,24], where general notion of homogeneous weight on ﬁnite modules M over arbitrary ﬁnite (possibly noncommutative) rings R was introduced and investigated. A function w : M → R is called a weight if (W1) ∀x ∈ M : w(x) ≥ 0, w(x) = 0 ⇔ x = 0; (W2) ∀x ∈ M : w(x) = w(−x); (W3) ∀x, y ∈ M : w(x + y) ≤ w(x) + w(y).
Linear Codes and Polylinear Recurrences over Finite Rings and Modules
371
For any weight w : M → R the function ρw (x, y) = w(x−y) deﬁnes a translationinvariant metric on M, and every translationinvariant metric ρ on M arises in this way from the weight wρ (x) = ρ(x, 0). We call a function w : M → R egalitarian, if (H1) there exists ζ ∈ R such that x∈U w(x) = ζ · U for any nonzero submodule U ≤ M. This function is called homogeneous if in addition (H2) ∀x ∈ M, ∀u ∈ R∗ : w(x) = w(ux). A module R M is called weighted if it admits an egalitarian weight w. In this case it admits also a homogeneous weight: w∗ (x) = R∗ −1 · u∈R∗ w(ux). Note that Hamming weight w = wHam on R M is homogeneous if and only if the module R M is simple. In [9] the following motivation for introducing the homogeneity axiom (H1) was given. For an arbitrary weight w on R M and n ∈ N the weight wn : R M n → R deﬁned by wn (x) = w(x1 ) + · · · + w(xn ) for x = (x1 , . . . , xn ) ∈ M n turns M n into a translationinvariant metric space. Let now K be a linear code over R M, i. e. a submodule of R M n . Then the projection Ki of K onto the ith coordinate is a submodule of R M. For informationtheoretic purposes it is natural to require that Ki = 0 for every i (i. e., K is a fulllength code) and that the numbers Wi = x∈K w(xi ) satisfy the condition W1 = W2 = · · · = Wn The second condition holds for fulllength linear codes over ﬁelds. It is satisﬁed by every fulllength linear code over R M if and only if w satisﬁes (H1). We give the full description of weighted modules. Let N(R) be the nilradical of the ring R. The quotient ring R = R/N(R) has an ArtinWedderburnMolien decomposition R = R1 ⊕ R2 ⊕ · · · ⊕ Rt ,
(3.1)
where each ideal Rj is a simple subring of R. Hence there exist positive integers mj and prime powers qj (j ∈ 1, t) such that Rj is isomorphic to the matrix ring Mmj (Fqj ). The socle S(R M) of R M (p. 2.1), is an Rmodule, and S(M) = S1 ⊕ S2 ⊕ · · · ⊕ St ,
Sj = Rj M.
(3.2)
Theorem 9 A module _R M is weighted if and only if S(M) is a cyclic R̄-module (i.e. n_j ≤ m_j for j = 1, . . . , t, where n_j denotes the length of the R̄_j-module S_j) and the decomposition (3.2) does not contain F_2 ⊕ F_2 or F_2 ⊕ F_3 as a direct summand.

Corollary 10 (a) A finite abelian group of order m is weighted if and only if it is cyclic and m ≢ 0 (mod 6). (b) A faithful module _R M over a finite commutative local ring is weighted if and only if it is a QF-module.

Corollary 10(a) implies that the Constantinescu–Heise criterion [9] is true not only for cyclic groups but for all finite abelian groups. A finite ring R is called a Frobenius ring if R̄ ≅ S(_R R) as left R-modules and R̄ ≅ S(R_R) as right R-modules.
Theorem 11 For a finite ring R both modules _R R and R_R are weighted if and only if R is a Frobenius ring and the decomposition (3.1) does not contain F_2 ⊕ F_2 or F_2 ⊕ F_3. In this case the left and right homogeneous weights on R coincide.

We denote by F_R the class of all finite R-modules and define the Euler function φ_R : F_R → ℕ as φ_R(M) = |{x ∈ M : M = Rx}|, and the Möbius function µ_R : F_R → ℤ by the recursion formulae: µ_R(0) = 1, ∑_{U≤M} µ_R(U) = 0 if M ∈ F_R \ 0.

Theorem 12 For a weighted module _R M there exists a unique homogeneous weight w = w_h such that ∑_{x∈U} w_h(x) = |U| for any nonzero submodule U ≤ M. This weight has the form
w_h(x) = 1 − µ_R(Rx)/φ_R(Rx)   for all x ∈ M.   (3.3)
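For R = M = Z_m, formula (3.3) specializes to the classical Constantinescu–Heise weight: Rx is the cyclic group of order k = m/gcd(x, m), so φ_R(Rx) and µ_R(Rx) become the Euler totient φ(k) and the number-theoretic Möbius function µ(k). The following sketch (ours, not taken from [22,24]; all function names are illustrative) computes w_h on Z_m this way and checks the normalized sum condition of Theorem 12 on every nonzero subgroup.

```python
from math import gcd

def mobius(n):
    # number-theoretic Moebius function
    result, p, m = 1, 2, n
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0            # square factor
            result = -result
        p += 1
    return -result if m > 1 else result

def totient(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def w_hom(x, m):
    # homogeneous weight on Z_m via (3.3); Rx is cyclic of order m/gcd(x, m)
    k = m // gcd(x % m, m)          # gcd(0, m) = m, so w_hom(0) = 0
    return 1 - mobius(k) / totient(k)

def check_sum_condition(m):
    # every nonzero subgroup of Z_m is dZ_m for a proper divisor d of m
    for d in (d for d in range(1, m) if m % d == 0):
        U = range(0, m, d)
        assert abs(sum(w_hom(x, m) for x in U) - len(U)) < 1e-9

for m in (4, 8, 9, 15, 25):
    check_sum_condition(m)
print([w_hom(x, 4) for x in range(4)])   # [0.0, 1.0, 2.0, 1.0] = Lee weights
```

For m = 4 this reproduces the Lee weights 0, 1, 2, 1. For m divisible by 6 the function defined by (3.3) still satisfies the sum condition, but it violates the triangle inequality (W3) (on Z_6, w_h(1) + w_h(1) = 1 < w_h(2) = 3/2), in accordance with Corollary 10(a).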
4 Scaled Isometries and Presentation of Codes
We describe here a fairly general technique, based on the concept of a scaled isometry, for constructing presentations of linear codes over weighted modules, and give some examples of efficient applications of this technique. For a weighted module _R M ∈ F_R we fix some egalitarian weight w_R. It is extended to w_R^n : M^n → ℝ by setting w_R^n(x) = ∑_{i=1}^{n} w_R(x_i), and it generates a metric ρ_R^n(x, y) = w_R^n(x − y) on M^n. Let now _S N be another weighted module over some ring S with an egalitarian weight w_S. Suppose that for some d ∈ ℕ and ζ ∈ ℝ \ 0 there exists a mapping σ : M → N^d satisfying
∀a, b ∈ M : ρ_S^d(σ(a), σ(b)) = ζ · ρ_R(a, b).   (4.1)
We call σ an isometry with scale factor ζ or, for short, a scaled isometry. It induces for every n ∈ ℕ a scaled isometry σ^n : (M^n, ρ_R^n) → (N^{dn}, ρ_S^{nd}) with the same scale factor. With every code C ⊆ M^n we associate the code C' = σ^n(C) ⊆ N^{nd} and call C' a σ-representation of the code C. Note that if C is distance invariant (relative to the metric ρ_R^n) then so is C' (relative to the metric ρ_S^{nd}). If C is a linear code over _R M, we call C' a σ-linear code (and sometimes, briefly, an _R M-linear code). An _R M-linear code C' is distance invariant but may be nonlinear.

This approach allows one to construct some new good codes and to find new compact representations of some well-known codes. In [20] an isometry between (Z_4, ρ_{Z_4}) and (F_2^2, ρ_Ham) was rediscovered (the so-called Gray mapping Φ : Z_4 → F_2^2, 0 ↦ 00, 1 ↦ 01, 2 ↦ 11, 3 ↦ 10), and the term Z_4-linear code was introduced for what we call a Φ-linear code. This approach makes it possible to reprove the Z_4-linearity of the binary Kerdock code [46,47] and to prove the Z_4-linearity of the Preparata, Delsarte–Goethals and some other codes. The more general form of this mapping for Galois rings in [56,34] gives the construction of the following subsection.
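As a minimal illustration (our sketch, not part of [20]), the Gray mapping can be tabulated and its isometry property (4.1) checked exhaustively; here the homogeneous (Lee) weight on Z_4, computed in the previous sketch, plays the role of w_R, and the scale factor is ζ = 1.

```python
# Gray mapping Phi: Z_4 -> F_2^2 and an exhaustive check that it is an
# isometry from (Z_4, Lee/homogeneous metric) onto (F_2^2, Hamming metric).
PHI = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
W_LEE = {0: 0, 1: 1, 2: 2, 3: 1}      # homogeneous (= Lee) weight on Z_4

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

for a in range(4):
    for b in range(4):
        assert hamming(PHI[a], PHI[b]) == W_LEE[(a - b) % 4]
print("Phi is a scaled isometry with scale factor 1")
```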
4.1. A Generalized Kerdock Code over F_q, q = 2^l. Let R = GR(q^2, 4) be the Galois ring of characteristic 4 and cardinality q^2, q = 2^l, l ≥ 1. A generalized Kerdock code K_q(m + 1) over F_q (m odd) is a Reed–Solomon presentation of the so-called base linear code K_R(m) over the ring R.

Let S = GR(q^{2m}, 4) be an extension of degree m of the ring R. The set Γ(S) = {β ∈ S : β^{q^m} = β} is closed under multiplication and consists of q^m elements. Any element β ∈ S can be written uniquely as the sum β = β_0 + 2β_1, where β_t = γ_t(β) ∈ Γ(S), t = 0, 1. If we define ⊕ on Γ(S) by the rule u ⊕ v = γ_0(u + v), then (Γ(S), ⊕, ·) is a field isomorphic to F_{q^m}, and Γ(R) = {β ∈ R : β^q = β} is the subfield F_q of Γ(S). Let θ be a primitive element of the field Γ(S). The base code K_R(m) is defined as a linear code of length h = q^m over R consisting of all words v = (v(0) . . . v(h − 1)) such that for some ξ ∈ S, c ∈ R
v(i) = Tr_R^S(ξθ^i) + c, i = 0, . . . , h − 2,   v(h − 1) = c,   (4.2)
where Tr_R^S(x) is the trace function from S onto R: Tr_R^S(x) = ∑_σ σ(x), σ ranging over the group of automorphisms of S over R. Let now Γ(R) = {ω_0 = 0, ω_1 = e, . . . , ω_{q−1}}. Define γ_* : R → Γ(R)^q on elements r = r_0 + 2r_1 ∈ R as γ_*(r) = (r_1, r_1 ⊕ ω_1 r_0, . . . , r_1 ⊕ ω_{q−1} r_0). Then γ_*(R) is a Reed–Solomon [q, 2, q − 1]_q code over F_q, and therefore the mapping γ_* is called the RS-mapping [56]. It is easy to see that γ_* is a scaled isometry of the space R with the homogeneous weight into the Hamming space Γ(R)^q.

The code K_q(m + 1) is a concatenation of the code K_R(m) (linear over R) and the code γ_*(R) (linear over Γ(R)). It is the code of length n = q^{m+1} consisting of all words γ_*^h(u) = (γ_*(u(0)), . . . , γ_*(u(h − 1))), u ∈ K_R(m). If q = 2, i.e. R = Z_4, this code is the original binary Kerdock code [7,15].

Theorem 13 ([34]) Let m = 2λ + 1 ≥ 3. Then the code K_q(m + 1) is an R-linear (n, n^2, ((q−1)/q)(n − √n))_q code with the complete weight enumerator
W_{K_q(m+1)}(x_0, . . . , x_{q−1}) = ∑_{j=0}^{q−1} x_j^n + (q^{m+2} − q) ∏_{j=0}^{q−1} x_j^{n/q}
+ (1/2) q(q^m − 1)(q^m + q^{λ+1}) ∑_{j=0}^{q−1} x_j^{q^{λ+1}} ∏_{i=0}^{q−1} x_i^{n/q − q^λ}
+ (1/2) q(q^m − 1)(q^m − q^{λ+1}) ∑_{j=0}^{q−1} x_j^{−q^{λ+1}} ∏_{i=0}^{q−1} x_i^{n/q + q^λ}.
4.2. Presentations of the Extended Binary Golay Code [24,22]. It can be presented as a linear code over the ring R = F_2 ⊕ F_4. Note that smaller rings of this form (F_2 ⊕ F_2, F_2 ⊕ F_3) are not weighted.

Proposition 14 Let e = e_1 + e_2 be the corresponding decomposition of the identity of R, and F_4 = F_2(α). Let σ : R → F_2^3 be the F_2-linear map defined by e_1 ↦ 111, e_2 ↦ 110, α ↦ 011, and let K ≤ _R R^8 be the R-linear code with parity-check matrix
[ e_1 e_2 e   e   e 0 0 0 ]
[ e   e_1 e_2 e   0 e 0 0 ]
[ e   e   e_1 e_2 0 0 e 0 ]   (4.3)
[ e_2 e   e   e_1 0 0 0 e ]
Then the mapping σ is a scaled isometry from (R, ρ_R) onto (F_2^3, ρ_Ham^3) with scale factor 3/2, and the code σ^8(K) is a linear binary (Golay) [24, 12, 8]-code.

Another isometric representation of the same code is based on an egalitarian, but not homogeneous, weight. Let R = F_2[x]/(x^4) = F_2[z], where z = x + (x^4) is the image of x in F_2[x]/(x^4). Every a ∈ R has the unique representation a = a_0 + a_1 z + a_2 z^2 + a_3 z^3 with a_j ∈ F_2. Define τ : R → F_2^4 and w : R → ℝ by setting τ(a) = (a_0 + a_1 + a_2 + a_3, a_1 + a_3, a_2 + a_3, a_3) and w(a) = w_Ham(τ(a)). The function w is an egalitarian weight on _R R.

Proposition 15 Let K ≤ _R R^6 be the linear code with parity-check matrix
[ 1 0 0 v z z ]
[ 0 1 0 z v z ]   (4.4)
[ 0 0 1 z z v ]
where v = 1 + z^3. The code τ^6(K) is the linear (Golay) [24, 12, 8]-code over F_2.

4.3. Presentation of the Ternary Golay Code [22]. Let R = F_3[x]/(x^3) = F_3[z] with z = x + (x^3). Then any a ∈ R has the unique representation a = a_0 + a_1 z + a_2 z^2 with a_j ∈ F_3. Let now σ : R → F_3^3 be defined by σ(a) = (a_0 − a_1 + a_2, a_1 + a_2, a_2). Then w : R → ℝ, defined by w(a) = w_Ham(σ(a)), is an egalitarian weight on _R R.

Proposition 16 Let K ≤ _R R^4 be the linear code with parity-check matrix
[ 1 0 v   v^2 ]   (4.5)
[ 0 1 v^2 −v  ]
where v = 1 + z^2. The code σ^4(K) is a linear (Golay) [12, 6, 6]-code over F_3. (A computational check of this proposition is sketched at the end of this section.)

4.4. Scaled Isometry over a Commutative QF-Ring [24,22]. Let now R be a finite commutative ring with a unique minimal ideal S, i.e. R is a local quasi-Frobenius ring with soc R = S. We construct a scaled isometry from the weighted R-module _R R into a suitable Hamming space F_q^n. Let J = rad R and R̄ = R/J ≅ F_q. The set Γ(R) = {a ∈ R : a^q = a} is a full system of representatives for R̄, and thus carries a natural field structure (Γ(R), ⊕, ·). There exists a system of elements π_0, . . . , π_l ∈ R such that π_l is a generator of S and every x ∈ R has the unique representation
x = a_0 π_0 + · · · + a_l π_l   with a_i ∈ Γ(R) for i = 0, . . . , l.   (4.6)
We fix such a system (π_0, . . . , π_l) and define the functions γ_i : R → Γ(R) by setting x = ∑_{i=0}^{l} γ_i(x)π_i. Consider the ((l + 1) × q^l)-matrix G(l, q) with entries from Γ(R) whose columns are the q^l vectors (a_0, . . . , a_{l−1}, 1), where (a_0, . . . , a_{l−1}) runs through Γ(R)^l, in some fixed order. The q-ary linear [q^l, l + 1]-code with generator matrix G(l, q) is the generalized Reed–Muller code C = GRM(l, 1, q), cf. [21, Ch. 9.5]. It is a two-weight code with nonzero weights q^l − q^{l−1} and q^l.
Proposition 17 The mapping σ : R → Γ(R)^{q^l}, defined by
σ(x) = (γ_0(x), γ_1(x), . . . , γ_l(x)) · G(l, q),   (4.7)
is a scaled isometry with scale factor q^l − q^{l−1} from (R, ρ_R) onto the generalized Reed–Muller [q^l, l + 1, q^l − q^{l−1}]-code GRM(l, 1, q) over the field (Γ(R), ⊕, ·).

Some particular cases of this result can be found in [33,56,9]. For a generalization to arbitrary finite commutative local rings (using the notion of a quasi-Frobenius module) we refer to [24]. About linearly representable codes over chain rings see also [23]. We also have to mention here the works [70,78] concerning the extendibility of code isometries.
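As announced after Proposition 16, here is a computational sketch of the ternary Golay presentation (our code; it assumes we have read matrix (4.5) correctly). Elements of R = F_3[z]/(z^3) are coefficient triples; the code K is generated by letting the last two coordinates range freely and solving the two parity-check equations, and σ^4(K) is then checked to be a [12, 6, 6]-code.

```python
import itertools

P = 3  # R = F_3[z]/(z^3); elements are coefficient triples (a0, a1, a2)

def add(a, b): return tuple((x + y) % P for x, y in zip(a, b))
def neg(a):    return tuple((-x) % P for x in a)

def mul(a, b):
    # multiply modulo z^3: coefficients of degree >= 3 vanish
    c = [0, 0, 0]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < 3:
                c[i + j] = (c[i + j] + x * y) % P
    return tuple(c)

v = (1, 0, 1)            # v = 1 + z^2
v2 = mul(v, v)

def sigma(a):
    a0, a1, a2 = a       # sigma(a) = (a0 - a1 + a2, a1 + a2, a2)
    return ((a0 - a1 + a2) % P, (a1 + a2) % P, a2)

R = list(itertools.product(range(P), repeat=3))
words = []
for u3, u4 in itertools.product(R, R):
    u1 = neg(add(mul(v, u3), mul(v2, u4)))     # row 1: u1 + v*u3 + v^2*u4 = 0
    u2 = add(neg(mul(v2, u3)), mul(v, u4))     # row 2: u2 + v^2*u3 - v*u4 = 0
    words.append(sum((sigma(x) for x in (u1, u2, u3, u4)), ()))

weights = {sum(1 for c in w if c) for w in words if any(w)}
print(len(words), min(weights))   # expect 729 = 3^6 codewords, minimum weight 6
```

Since σ is F_3-linear and K is an R-submodule, σ^4(K) is an F_3-subspace, so its minimum distance equals its minimum weight.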
5 General Theory of Polylinear Recurring Sequences
As was already pointed out in the Introduction, the theory of (poly)linear recurrences over modules has been actively developing recently due to various applications, in particular in coding theory. It is worth noting that, as in the previous sections, quasi-Frobenius modules play a special role. Here we use the results of [33,36] and of the works cited therein, in particular [50,54,61,65,66,67,68,74,75,79].

5.1. Main Concepts. For k ∈ ℕ, we call any function µ : ℕ_0^k → M a k-sequence over a module _R M. We write µ = µ(z), where z = (z_1, . . . , z_k) is the row of free variables over ℕ_0. The set M^⟨k⟩ of all k-sequences over M is an R-module relative to the usual operations on functions. Let P_k = R[x_1, . . . , x_k] = R[x] be the polynomial ring in k variables. For any s = (s_1, . . . , s_k) ∈ ℕ_0^k denote the monomial x_1^{s_1} · · · x_k^{s_k} by x^s. Then any F(x) ∈ P_k has the form F(x) = ∑_s f_s x^s. We define the structure of a P_k-module on M^⟨k⟩ by F(x)µ = ν, where ν ∈ M^⟨k⟩, ν(z) = ∑_s f_s µ(z + s). An ideal I of P_k is called monic if there exist monic polynomials F_1(x), . . . , F_k(x) ∈ R[x] (of one variable) such that
F_1(x_1), . . . , F_k(x_k) ∈ I.   (5.1)
If I is a monic ideal, then the quotient ring P_k/I is finite, and vice versa. The annihilator Ann(M) = Ann_{P_k}(M) = {F(x) ∈ P_k : F(x)M = 0} of any subset M ⊂ M^⟨k⟩ in the ring P_k is an ideal of P_k. We say that a sequence µ ∈ M^⟨k⟩ is a k-linear recurring sequence (k-LRS) over a module M if I = Ann(µ) is a monic ideal. In this case, polynomials as in (5.1) are called elementary characteristic polynomials of the k-LRS µ.

Proposition 18 The set LM^⟨k⟩ of all k-LRS µ ∈ M^⟨k⟩ is a submodule of the P_k-module M^⟨k⟩. For any subset I ⊂ P_k the set
L_M(I) = {µ ∈ M^⟨k⟩ : Iµ = 0}
is also a submodule of this module. Moreover, L_M(I) ⊂ LM^⟨k⟩ if and only if the ideal (I) ✂ P_k generated by I is monic.
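A short sketch (ours) of the P_k-module action just defined: a polynomial in the shift operators is applied to a 2-sequence over Z_5, and we check that the monic polynomials x_1 − 2 and x_2 − 3 annihilate the exponential 2-sequence they should, so that this sequence is a 2-LRS.

```python
# Action of P_2 = R[x1, x2] on 2-sequences over R = Z_5, as defined above:
# (F * mu)(z) = sum_s f_s * mu(z + s).
M = 5

def mu(z1, z2):
    return (pow(2, z1, M) * pow(3, z2, M)) % M   # mu(z1, z2) = 2^z1 * 3^z2

def act(F, seq, z1, z2):
    # F is a dict {(s1, s2): coefficient} representing sum_s f_s x^s
    return sum(c * seq(z1 + s1, z2 + s2) for (s1, s2), c in F.items()) % M

F1 = {(1, 0): 1, (0, 0): -2}    # x1 - 2
F2 = {(0, 1): 1, (0, 0): -3}    # x2 - 3

assert all(act(F1, mu, a, b) == 0 and act(F2, mu, a, b) == 0
           for a in range(6) for b in range(6))
print("mu is a 2-LRS with elementary characteristic polynomials x1-2, x2-3")
```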
For a monic ideal I ✂ P_k the set L_M(I) is called a k-LRS-family over _R M. Since I · L_M(I) = 0, the P_k-module L_M(I) may be considered as a module over the ring S = P_k/I = R[θ_1, . . . , θ_k], where θ_s = x_s + I. This ring will be called the operator ring of the ideal I (of the family L_M(I)). If I = Ann(µ) for some µ ∈ M^⟨k⟩, then S is said to be the operator ring of the k-sequence µ.

5.2. Relations between LRS-Families and their Annihilators. The following relations between 1-LRS families over a field P are well known. For any monic polynomials F(x), G(x) ∈ P = P[x]:
L_P(F) + L_P(G) = L_P([F, G]);   L_P(F) ∩ L_P(G) = L_P((F, G)),
where [F, G] and (F, G) denote the lcm and gcd respectively. Any 1-LRS family M over the field P has the form M = L_P(F) for some monic polynomial F(x) ∈ P and is a cyclic P-module: M = P e_F. For any u, v ∈ LP^⟨1⟩:
v ∈ P u ⇔ M_v(x) | M_u(x),
where M_u(x) is the minimal (characteristic) polynomial of the LRS u. Any monic ideal I ⊆ P is the annihilator of some LRS over P.

Attempts to generalize these results to k-LRS families over a finite module _R M give the following. Let A_k = A_k(R) be the set of all monic ideals of the ring P_k = R[x_1, . . . , x_k] and let M_k = M_k(M) be the set of all finite P_k-submodules of the module M^⟨k⟩. In this case any element M ∈ M_k is a submodule of LM^⟨k⟩ and we have a pair of Galois correspondences
Ann : M_k → A_k,   L_M : A_k → M_k.   (5.2)
This means that for any M ∈ M_k, I ∈ A_k we get
M ⊆ L_M(Ann(M)),   I ⊆ Ann(L_M(I)).   (5.3)
These inclusions are strict in general. Moreover, we may state that
Ann(M_1 + M_2) = Ann(M_1) ∩ Ann(M_2);   L_M(I_1 + I_2) = L_M(I_1) ∩ L_M(I_2);
Ann(M_1 ∩ M_2) ⊇ Ann(M_1) + Ann(M_2);   L_M(I_1 ∩ I_2) ⊇ L_M(I_1) + L_M(I_2);   (5.4)
for any ideals I_1, I_2 ∈ A_k and modules M_1, M_2 ∈ M_k. The inclusions in (5.4) are also strict in general, but in particular we get

Proposition 19 Let I_1, I_2 be comaximal ideals of P_k. Then L_M(I_1 ∩ I_2) = L_M(I_1) ∔ L_M(I_2) (a direct sum), and it is a cyclic P_k-module if and only if the modules L_M(I_s), s = 1, 2, are cyclic.

If M = R is a field and k = 1 then the correspondences (5.2) are bijections, and the inclusions (5.3), (5.4) are equalities. The finite modules satisfying these conditions (i.e. admitting a theory of LRS analogous to the theory of LRS over a field) are described by the following theorem.
Theorem 20 ([50,52,54]) The following conditions are equivalent: (a) _R M is a QF-module; (b) the Galois correspondences (5.2) are bijective; (c) for any monic ideal I ⊆ P_k the family L_M(I) is a QF-module over the ring S = P_k/I, and any module M ∈ M_k is a QF-module over the ring S = P_k/Ann(M); (d) the inclusions (5.4) are equalities; (e) for any recurrences µ, ν ∈ LM^⟨k⟩ we have ν ∈ P_k µ ⇔ Ann(µ) ⊂ Ann(ν).

The following essential supplement to the last theorem shows an interesting connection between cyclic LRS-families and QF-rings.

Theorem 21 ([50,52,54]) Let _R Q be a quasi-Frobenius module. Then for any monic ideal I ⊆ P_k the following conditions are equivalent: (a) I = Ann(µ) for some recurrence µ ∈ LQ^⟨k⟩; (b) M = L_Q(I) is a cyclic P_k-module; (c) S = P_k/I is a quasi-Frobenius ring.

Corollary 22 Let F_1(x_1), . . . , F_k(x_k) be monic polynomials from P_k and S = P_k/(F_1(x_1), . . . , F_k(x_k)). Then S is a QF-ring if and only if R is a QF-ring.

Theorem 20(c) allows us to build a QF-module over any finite commutative ring S as a k-LRS family over some principal ideal ring. Indeed, the ring S is an extension S = R[π_1, . . . , π_k] of a principal ideal subring R ≤ S [29]. Then S = P_k/I for some monic ideal I ≤ R[x_1, . . . , x_k]. Since R is a QF-ring, Theorem 20(c) implies that L_R(I) is the required QF-module over S.

5.3. The Berlekamp–Massey Algorithm. Let _R M be a finite left module over a finite (not necessarily commutative) ring R. We say that a polynomial F(x) = x^s − c_{s−1}x^{s−1} − . . . − c_1 x − c_0 ∈ R[x] generates (from the left) a sequence u(0, l − 1) = (u(0), u(1), . . . , u(l − 1)) ∈ M^l of length l if s ≥ l, or if s < l and
u(i + s) = c_{s−1} u(i + s − 1) + . . . + c_1 u(i + 1) + c_0 u(i),   i = 0, . . . , l − s − 1.
A monic polynomial of smallest degree which generates u(0, l − 1) is called its (left) minimal polynomial. The Berlekamp–Massey algorithm finds a minimal polynomial of a sequence of length l with complexity O(l^2) operations in R and M. The Berlekamp–Massey algorithm was first described for sequences over fields in [1] and [42]. Since then many versions of this algorithm over fields and rings have been proposed (see, e.g., [19,64,68], and the bibliography in [32]). In the general form presented here, the Berlekamp–Massey algorithm was built in [32].
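The following is a minimal sketch (ours) of the classical Berlekamp–Massey synthesis over a prime field F_p, in Massey's formulation; the general version of [32] for modules over finite rings follows the same discrepancy-driven scheme but cannot rely on division, so this is an illustration only.

```python
def berlekamp_massey(u, p):
    # LFSR synthesis over F_p. Returns (L, C) with C = [1, C1, ..., CL] such
    # that u[n] + C1*u[n-1] + ... + CL*u[n-L] = 0 for all valid n; the minimal
    # polynomial is then x^L + C1*x^(L-1) + ... + CL.
    C, B = [1], [1]            # current and previous connection polynomials
    L, m, b = 0, 1, 1
    for n in range(len(u)):
        # discrepancy d = u[n] + sum_{i=1..L} C[i]*u[n-i]
        d = u[n] % p
        for i in range(1, L + 1):
            d = (d + C[i] * u[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = (d * pow(b, p - 2, p)) % p      # d / b in F_p
        T = C[:]                                # remember C before the update
        C = C + [0] * (len(B) + m - len(C)) if len(B) + m > len(C) else C
        for i, bi in enumerate(B):
            C[i + m] = (C[i + m] - coef * bi) % p
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return L, C

# Fibonacci mod 5: u(i+2) = u(i+1) + u(i), minimal polynomial x^2 - x - 1
print(berlekamp_massey([0, 1, 1, 2, 3, 0, 3, 3, 1, 4], 5))  # (2, [1, 4, 4])
```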
6 Presentations of Linear Codes by Polylinear Recurrences
Some linear codes over a finite module _R M (and all linear codes over any QF-module _R Q!) may be described in terms of polylinear recurrences over _R M. Any finite subset
F = {i_1, . . . , i_n} ⊆ ℕ_0^k   (6.1)
is called a polyhedron. Let M^F be the R-module of all functions δ : F → M. Any such function is uniquely determined by its valuation diagram δ[F] = (δ(i_1), . . . , δ(i_n)) ∈ M^n. It is clear that the module _R M^F is isomorphic to _R M^n. Of course, for any k-sequence µ ∈ M^⟨k⟩ its restriction µ[F] to F is a function of this type. For a monic ideal I ✂ P_k we put
K = L_M^F(I) = {µ[F] : µ ∈ L_M(I)}.   (6.2)
Evidently, K is a submodule of the R-module _R M^F, and by the indexing (6.1) we may consider K as a submodule of _R M^n. Thus K is a linear n-code over _R M. In general not every linear code over _R M has the form (6.2), but at the same time we have

Proposition 23 ([60]) Let _R Q be a QF-module. Then for any linear code K < _R Q^n there exist a parameter k ∈ {1, . . . , n}, a polyhedron F ⊂ ℕ_0^k of cardinality n and a monic ideal I ✂ P_k such that
K = L_Q^F(I).   (6.3)
It is interesting to find the smallest k such that the code K < _R Q^n has the representation (6.3) with the additional condition that F is a Ferrers diagram, i.e. if i ∈ F, j ∈ ℕ_0^k and j ≤ i (in each coordinate) then j ∈ F. Let us call a code K < _R M^n recursive of dimension k (or k-dim recursive) if it has a presentation (6.2) for some ideal I ✂ P_k, some Ferrers diagram F ⊆ ℕ_0^k and some ordering (6.1) of its elements. The minimal k with this property will be called the recursive dimension of the code K. A first important class of recursive codes is given by

Theorem 24 (Asturian Theorem [60]) Let K < _R M^n be a systematic linear code of rank m with parity-check matrix
    [ f_0^(1) . . . f_{m−1}^(1)  −e  0 . . . 0 ]
H = [ f_0^(2) . . . f_{m−1}^(2)   0 −e . . . 0 ]   , k = n − m.
    [   . . . . . . . . . . . .   . . . . . .  ]
    [ f_0^(k) . . . f_{m−1}^(k)   0  0 . . . −e ]_{k×n}
Then K is a k-dim recursive code satisfying the equality (6.2), where I ✂ P_k is the ideal generated by the polynomials
F_1(x_1) = x_1^m − f_{m−1}^(1) x_1^{m−1} − . . . − f_0^(1),
F_2(x_1, x_2) = x_2 − f_{m−1}^(2) x_1^{m−1} − . . . − f_0^(2), . . . ,
F_k(x_1, x_k) = x_k − f_{m−1}^(k) x_1^{m−1} − . . . − f_0^(k);
and F ⊂ ℕ_0^k is the Ferrers diagram of the form F = {0, e_1, 2e_1, . . . , m e_1, e_2, . . . , e_k}.
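Here is a small illustration of Theorem 24 (our sketch; the coefficients f = (f_0, f_1) and g = (g_0, g_1) below are hypothetical). For k = 2, m = 2, n = 4 over R = F_5, the codewords are exactly the valuation diagrams of 2-sequences on the Ferrers diagram F = {0, e_1, 2e_1, e_2}, and this agrees with the parity-check description.

```python
# Theorem 24 for k = 2, m = 2, n = 4 over R = F_5.
q = 5
f0, f1 = 2, 3        # first parity-check row (hypothetical coefficients)
g0, g1 = 1, 4        # second parity-check row (hypothetical coefficients)

code = set()
for u0 in range(q):
    for u1 in range(q):
        mu = {(0, 0): u0, (1, 0): u1}
        mu[2, 0] = (f1 * mu[1, 0] + f0 * mu[0, 0]) % q  # F1 = x1^2 - f1*x1 - f0
        mu[0, 1] = (g1 * mu[1, 0] + g0 * mu[0, 0]) % q  # F2 = x2 - g1*x1 - g0
        code.add((mu[0, 0], mu[1, 0], mu[2, 0], mu[0, 1]))

# the same code from H = [[f0, f1, -1, 0], [g0, g1, 0, -1]]
check = {(a, b, c, d)
         for a in range(q) for b in range(q) for c in range(q) for d in range(q)
         if (f0 * a + f1 * b - c) % q == 0 and (g0 * a + g1 * b - d) % q == 0}
assert code == check and len(code) == q ** 2
```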
7 Recursive and Linear Recursive MDS Codes
The connection between MDS codes, Latin squares and quasigroups is well known. We present here the results of [10,11,12], where these objects are studied under the additional, practically useful, condition of recursivity of the corresponding code.

7.1. Quality Parameters of Recursive Codes. A code K ⊂ Ω^n in an alphabet Ω of q elements is called k-recursive, 1 ≤ k < n, if there exists a function f : Ω^k → Ω such that K = K(n, f) consists of all rows u(0, n − 1) = (u(0), . . . , u(n − 1)) ∈ Ω^n with the property
u(i + k) = f(u(i, i + k − 1)),   i = 0, . . . , n − k − 1.
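A generic generator for such codes can be sketched as follows (our code; the Fibonacci-type feedback function in the example is only an illustration).

```python
from itertools import product

def recursive_code(n, k, q, f):
    # K(n, f): all output n-sequences of the feedback shift register with
    # feedback function f : Omega^k -> Omega, Omega = {0, ..., q-1}
    code = []
    for init in product(range(q), repeat=k):
        u = list(init)
        while len(u) < n:
            u.append(f(tuple(u[-k:])))
        code.append(tuple(u))
    return code

# a 2-recursive code over F_3 with f(a, b) = a + b:
K = recursive_code(4, 2, 3, lambda x: (x[0] + x[1]) % 3)
d = min(sum(a != b for a, b in zip(u, v)) for u in K for v in K if u != v)
print(len(K), d)   # 9 words of length 4 with d = 3: an MDS [4, 2, 3]_3 code
```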
In other words, K(n, f) is the set of all output n-sequences of a feedback shift register with feedback function f. In [10] MDS codes of this type, i.e. recursive [n, k, n − k + 1]_q codes, are investigated, and the following parameters are considered:
n(k, q) – the maximum length of an MDS code K of (combinatorial) dimension k (|K| = q^k) in an alphabet Ω of cardinality q;
n^r(k, q) – the maximum length of a k-recursive MDS code of the same type;
l(k, q) – the maximum length of an MDS code K of (combinatorial) dimension k which is linear over an abelian group (Ω, +) for some operation + (i.e. K is a subgroup of the abelian group Ω^n and |K| = q^k, where q = |Ω| and n ∈ ℕ); we also call such codes linear in the wide sense;
l^r(k, q) – the analog of l(k, q) for recursive codes.
For primary q (a power of a prime) one can also define:
m(k, q) – the analog of l(k, q) for codes which are linear over the field F_q;
m^r(k, q) – the analog of m(k, q) for recursive codes.
Moreover, we call the above function f(x) idempotent if it satisfies the identity f(x, . . . , x) = x (this means that all "constants" (a, . . . , a) belong to K(n, f)). Thus, in addition to the above six parameters we can introduce n^ir(k, q), l^ir(k, q) and m^ir(k, q) (the last only for primary q); "ir" means "idempotent recursive". So we have the following matrix of parameters
           [ m^ir(k, q)  m^r(k, q)  m(k, q) ]
M(k, q) =   l^ir(k, q)  l^r(k, q)  l(k, q)
            n^ir(k, q)  n^r(k, q)  n(k, q)
whose entries are non-decreasing from left to right and from top to bottom. Naturally, the first row of this matrix (put in brackets) is present only when q is primary. It is interesting to estimate and to compare the entries of M(k, q) for various values of k and q. In what follows the equality x^y(k, q) = k means that the corresponding code does not exist. A standard argument gives the following source of estimates for the entries of M(k, q):
Proposition 25 If x ∈ {l, n}, y ∈ {∅, r, ir} and k, q_1, q_2 ≥ 2 then x^y(k, q_1 q_2) ≥ min{x^y(k, q_1), x^y(k, q_2)}.

7.2. Results for k = 2. It is well known that n(2, q) = 2 + N(q), where N(q) is the maximal number of mutually orthogonal Latin q × q-squares. The latter were studied extensively (see [15,16,21]). We cite here only the following general conclusion:

Theorem 26 Let q ∈ ℕ, q > 1. Then: (a) N(q) ≥ 2 if q ∉ {2, 6}, and N(q) ≥ 3 if q ∉ {2, 3, 6, 10}; (b) N(q) ≤ q − 1; if q is primary, then N(q) = q − 1; (c) N(q_1 q_2) ≥ min{N(q_1), N(q_2)}; in particular, if q = q_1 · . . . · q_t is the canonical factorization of q then N(q) ≥ min{q_1 − 1, . . . , q_t − 1}; (d) N(q) ≥ q^{10/143} − 2.

For small values of q we have (omitting the trivial cases q = 2, 3):
M(2, 4) = [3 5 5; 3 5 5; 3 5 5],   M(2, 5) = [5 6 6; 5 6 6; 5 6 6],   M(2, 7) = [7 8 8; 7 8 8; 7 8 8].

Proposition 27 If q is a primary number then m^r(2, q) = n^r(2, q) = n(2, q) = q + 1, and
m^ir(2, q) = l^ir(2, q) = { q, if q is a prime; q − 1, if q is not a prime }
(A. Abashin, private communication).

Corollary 28 For a prime p and t > 1 there are no cyclic codes among those permutation-equivalent to Reed–Solomon [p^t, 2, p^t − 1]- or [p^t, p^t − 2, 3]-codes.

Theorem 29 ([10]) For any q > 2, except q = 6 and possibly q ∈ {14, 18, 26, 42}, n^r(2, q) ≥ 4.

In fact, the last inequality may be sharpened for many values of q. Some of the stronger estimates are easily deduced from Propositions 25 and 27; we call them standard. Other estimates are presented in

Theorem 30 ([10]) The following nonstandard lower estimates are valid:
n^r(2, q) ≥ 8 for q = 80;
n^r(2, q) ≥ 7 for q ∈ {50, 57, 58, 65, 70, 78, 84, 85, 86, 92, 94, 95, 96, 97, 98};
n^r(2, q) ≥ 6 for q ∈ {54, 62, 66, 68, 69, 74, 75, 76, 82, 87, 90, 93};
n^r(2, q) ≥ 5 for q ∈ {21, 24, 39, 44, 60};
n^ir(2, q) ≥ 7 for q ∈ {50, 57, 58, 65, 70, 78, 84, 85, 86, 92, 94, 95, 96, 97, 98};
n^ir(2, q) ≥ 5 for q ∈ {54, 62, 66, 68, 69, 74, 75, 76, 82, 87, 90, 93}.
7.3. Results for k > 2. The following estimates in the nonrecursive case are known.

Theorem 31 ([43,21]) (a) If q ≤ k then n(k, q) = k + 1. (b) If k ≤ q and q is even then n(k, q) ≤ q + k − 1. (c) If 3 ≤ k ≤ q and q is odd then n(k, q) ≤ q + k − 2. (d) If k ∈ {1, . . . , q + 1} and q is primary (i.e. a power of a prime) then m^r(k, q) ≥ q + 1. (e) n(3, q) = q + 1 for primary odd q. (f) n(3, q) = n(q − 1, q) = q + 2 for primary even q.

The "recursive version" of these results is contained in the following

Proposition 32 ([12]) If q ≤ k then l^r(k, q) = n(k, q) = k + 1 and, for primary q, m^r(k, q) = k + 1.

The well-known conjecture of Bush [5,43] states that m(k, q) = q + 1 for 2 ≤ k ≤ q, except for the cases m(3, q) = m(q − 1, q) = q + 2 for even q. For k = 3 we have m^r(3, q) = q + 1, and PC calculations give

Proposition 33 For every primary q ∈ {4, . . . , 128}, the number of linear recursive [q + 1, 3, q − 1]-codes is equal to (1/2)φ(q + 1)(q − 1).

Thus, all the codes enumerated in this proposition may be constructed in some natural way from the linear cyclic codes indicated in [43, Ch. 11, Theorem 9]. One can conjecture that this is true for all primary q. For the case k = 3, q = 4 we have the following

Proposition 34 m^r(3, 4) = 5 < l^r(3, 4) = n^r(3, 4) = m(3, 4) = n(3, 4) = 6.

One of the codes whose existence proves the preceding proposition is linear in the wide sense, with recursive function f(x_1, x_2, x_3) = αx_1^2 + αx_2 + x_3^2 over F_4 = F_2(α). We call it the Asturian code. (A computational check is sketched below.) The existence of the Asturian code has some important theoretical corollaries: (1) there exist recursive codes linear in the wide sense that are better than any recursive code linear in the classical sense, and (2) for some of the best known codes that are linear in the classical sense but not recursive, there exist recursive codes linear in the wide sense with the same parameters. However, for k = 3 the Asturian code is an exceptional example, because PC calculations give l^r(3, 8) = 9 = 8 + 1, l^r(3, 16) = 17 = 16 + 1, and the following statement is valid:

Proposition 35 (A. Abashin, private communication) If t > 4 then l^r(3, 2^t) = 2^t + 1.

Corollary 36 There are no cyclic codes among those linearly (in the wide sense) equivalent to twice-extended Reed–Solomon [2^t + 2, 3, 2^t]- or [2^t + 2, 2^t − 1, 4]-codes.
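The announced check of the Asturian code (our sketch; it assumes our transcription of f is correct, with α encoded as the integer 2 and F_4 arithmetic reduced by α^2 = α + 1). Since squaring and multiplication by α are additive in characteristic 2, K(6, f) is closed under coordinatewise addition, so it is linear in the wide sense, and the sketch confirms the MDS parameters (6, 4^3, 4)_4.

```python
from itertools import product

def mul(x, y):
    # product of 2-bit F_4 elements {0, 1, a=2, a+1=3}, reduced by a^2 = a + 1
    r = (x if y & 1 else 0) ^ ((x << 1) if y & 2 else 0)
    return r ^ 0b111 if r & 4 else r

def f(x1, x2, x3):
    # Asturian feedback function f = a*x1^2 + a*x2 + x3^2 (addition is XOR)
    return mul(2, mul(x1, x1)) ^ mul(2, x2) ^ mul(x3, x3)

K = []
for init in product(range(4), repeat=3):
    u = list(init)
    for i in range(3):                       # n = 6, k = 3
        u.append(f(u[i], u[i + 1], u[i + 2]))
    K.append(tuple(u))

dist = min(sum(a != b for a, b in zip(u, v)) for u in K for v in K if u != v)
Kset = set(K)
additive = all(tuple(a ^ b for a, b in zip(u, v)) in Kset for u in K for v in K)
print(len(K), dist, additive)   # expect 64, 4, True
```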
8 Periodic Properties of Polylinear Recurrences over Finite Modules
In the theory of LRS over Galois fields the technique of period calculation for linear recurrences and the constructions of maximal-period LRS occupy an important place. In [33,36] these results are generalized to k-LRS over finite rings and modules. Here we present only two series of results that illustrate the progress in this direction.

8.1. Periodic k-Sequences and k-LRS [33,36]. Let µ ∈ M^⟨k⟩ be a k-sequence. For fixed l ∈ ℕ_0^k, d ∈ ℕ_0^k \ 0 we call the 1-sequence µ^[l,d](z) = µ(l + dz) a regular (l, d)-extract (an extract in the direction d) of the sequence µ. We say that the sequence µ is (l, d)-periodic if µ^[l,d] is a periodic sequence, and periodic in the direction d if it is (l, d)-periodic for every l ∈ ℕ_0^k. A sequence µ ∈ M^⟨k⟩ is called a periodic (reversible) sequence if every regular (l, d)-extract of this sequence is a periodic (reversible) 1-sequence.

Proposition 37 For µ ∈ M^⟨k⟩ the following conditions are equivalent: (a) µ is a periodic (respectively, reversible) sequence; (b) the ideal Ann(µ) contains a system of elementary polynomials of the form x_1^{l_1}(x_1^{t_1} − e), . . . , x_k^{l_k}(x_k^{t_k} − e) (respectively, of the form x_1^{t_1} − e, . . . , x_k^{t_k} − e) for some l_s ∈ ℕ_0, t_s ∈ ℕ, s = 1, . . . , k; (c) µ is periodic (respectively, reversible) in each of the directions e_1, . . . , e_k, where e_s is the s-th row of the identity matrix.

Proposition 38 A k-sequence (over a finite module) is periodic if and only if it is a k-LRS.

A nonzero vector t ∈ ℕ_0^k is called a vector-period of the sequence µ ∈ M^⟨k⟩ if x^l(x^t − e)µ = 0 for some l ∈ ℕ_0^k. The subgroup P(µ) ≤ (ℤ^k, +) generated by all vector-periods of µ will be called its group of periods. If µ has no vector-periods, then P(µ) = 0.
Proposition 39 The reversibility of the sequence µ ∈ M^⟨k⟩ is equivalent to the condition: ∀i ∈ ℕ_0^k ∃j ∈ ℕ_0^k : x^j(x^i µ) = µ. If the sequence µ is reversible, then for any i ∈ ℕ_0^k the sequence ν = x^i µ is also reversible and P(ν) = P(µ); for any t ∈ P(µ) we have x^t µ = µ.

The set O(µ) of all k-sequences ν ∈ M^⟨k⟩ of the form ν = x^i µ, i ∈ ℕ_0^k, is called the trajectory of µ.

Proposition 40 A sequence µ ∈ M^⟨k⟩ is periodic if and only if its trajectory O(µ) is finite, and it is reversible if and only if O(µ) = O(x^i µ) for any i ∈ ℕ_0^k.

For a periodic sequence µ the set T(µ) of all reversible elements of its trajectory O(µ) is called the cycle of the sequence µ, and its cardinality T(µ) = |T(µ)| is called the period of the sequence µ.
The set D(µ) = O(µ) \ T(µ) is called the set of all defect elements of the trajectory O(µ), and its cardinality D(µ) = |D(µ)| is called the defect of the sequence µ. The sequence µ is said to be degenerating if it is periodic and its cycle contains only the zero sequence, i.e., T(µ) = {0}. Thus D(µ) + T(µ) = |O(µ)|, and a periodic sequence µ is reversible iff D(µ) = 0, i.e., T(µ) = O(µ). A periodic sequence is degenerating iff x^i ∈ Ann(µ) for some i ∈ ℕ_0^k.

Proposition 41 If µ ∈ M^⟨k⟩ is a periodic sequence, then T(µ) is the index [ℤ^k : P(µ)] of the subgroup P(µ) in the group (ℤ^k, +), and |O(µ)| ≤ |P_k/Ann(µ)|.

Let us denote the P_k-modules of all reversible and degenerating k-sequences over _R M by RM^⟨k⟩ and DM^⟨k⟩ respectively.

Proposition 42 LM^⟨k⟩ = RM^⟨k⟩ ∔ DM^⟨k⟩ (a direct sum).

The following result gives an interesting relation between the properties of reversible sequences and the properties of the associated rings.

Theorem 43 Let µ ∈ M^⟨k⟩, and let S = P_k/Ann(µ) be the operator ring of µ (see Sec. 5), θ_s = x_s + Ann(µ) for s = 1, . . . , k. Then the sequence µ is reversible if and only if θ_1, . . . , θ_k ∈ S*. If µ is a reversible sequence, then
T(µ) = |⟨θ_1, . . . , θ_k⟩| ≤ |S*| ≤ |S| − 1,
where ⟨θ_1, . . . , θ_k⟩ is the subgroup of the group S* generated by θ_1, . . . , θ_k. The equality T(µ) = |S*| holds if and only if
S* = ⟨θ_1, . . . , θ_k⟩.   (8.1)
If µ is a reversible sequence and Ann(µ) ∩ R = 0, then T(µ) = |S| − 1 if and only if the following three conditions hold: (a) R = F_q; (b) Ann(µ) is a maximal ideal of the ring P_k (i.e., S = F_{q^r} for some r ∈ ℕ); (c) the equality (8.1) is true.

Let _R Q be a QF-module corresponding to the ring R. A reversible sequence µ ∈ Q^⟨k⟩ is called full-cycle if its annihilator I = Ann(µ) satisfies the conditions
I ∩ R = 0;   S* = ⟨θ_1, . . . , θ_k⟩;   L_Q(I) = Sµ.
Theorem 44 Let µ be a full-cycle k-recurrence over _R Q. Then S = P_k/I is a QF-ring and for any ν ∈ L_Q(I) = Sµ:
P(µ) ⊆ P(ν);   T(ν) | T(µ);   (T(ν) = T(µ)) ⇒ (ν ∈ T(µ)).
Theorem 45 Let R < S, where S is a QF-ring. Then there exists a full-cycle LRS over the QF-module _R Q with operator ring isomorphic to S.
8.2. Linear Recurrences of Maximal Period over a Galois Ring. This is the most deeply studied class of linear recurrences over rings, and a very interesting one in various applications. Let R = GR(q^n, p^n) be a Galois ring of cardinality q^n and characteristic p^n (p a prime, q = p^r), and let F(x) ∈ P = R[x] be a monic polynomial of degree m. Then for any LRS u ∈ L_R(F(x)) the inequality T(u) ≤ (q^m − 1)p^{n−1} holds. If in this situation T(u) = (q^m − 1)p^{n−1}, then we say that the sequence u is a linear recurring sequence of maximal period (MP-recurrence) of rank m over the Galois ring R.

For any monic polynomial F(x) ∈ P there exist λ ∈ ℕ_0, t ∈ ℕ such that F(x) | x^λ(x^t − e). The minimal t with this property is called the period of F(x) and denoted by T(F). Denote by ū and F̄ the images of a sequence u and a polynomial F under the natural homomorphism R → R̄ = R/pR. Note that if deg F(x) = m, then T(F̄) ≤ T(F(x)) ≤ T(F̄)p^{n−1} ≤ (q^m − 1)p^{n−1}. If T(F) = (q^m − 1)p^{n−1}, then F is called a polynomial of maximal period (MP-polynomial) over R.

Proposition 46 Under the above assumptions an LRS u ∈ L_R(F) is an MP-recurrence of rank m if and only if F(x) is an MP-polynomial over R and ū ≠ 0.

In [33] a simple algorithm for building an MP-polynomial was presented. It has the following simplest form for p = 2: let R = GR(q^n, 2^n), q = 2^r, and let G(x) ∈ R[x] be a monic polynomial of degree m such that T(Ḡ) = q^m − 1. Let the polynomials G^[0](x), G^[1](x), . . . ∈ P be defined recursively: G^[0](x) = G(x), and if G^[k](x) = G^[k]_{(0)}(x^2) + x G^[k]_{(1)}(x^2) then
G^[k+1](x) = (−1)^m (G^[k]_{(0)}(x)^2 − x G^[k]_{(1)}(x)^2).

Theorem 47 Any polynomial of the form
F(x) = G^[r](x) + 2Δ(x),   deg Δ(x) < m,   Δ(x) ≠ 0,
is an MP-polynomial over R.

There are also some simple conditions for a polynomial over the residue ring Z_{p^n} to have maximal period.

Theorem 48 ([33]) Let F(x) = x^m + a_{m−1}x^{m−1} + . . . + a_0 be a polynomial over Z_{p^n} such that T(F̄) = p^m − 1. Then F(x) is an MP-polynomial when
1. p > 2 and: a_0^p ≢ a_0 (mod p^2), or F(x) = x^m + a_k x^k + a_0, m ≥ p − 2;
2. p = 2 and: (a) m is even and a_0 ≡ e (mod 4); or (b) m is odd and a_1 ≡ e + 2a_0 a_2 (mod 4) if ā_1 = ē, a_1 ≡ 2(e + a_0 a_2) (mod 4) if ā_1 = 0; or (c) F(x) = x^m + a_k x^k + a_0, a_k, a_0 ∈ {−e, e}, (m, a_0) ≠ (2k, e).
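The simplest instance of Theorem 47 can be verified directly (our sketch, with naive polynomial arithmetic). For R = Z_4 (q = 2, r = 1) and G(x) = x^2 + x + 1 one computes G^[1] = G, so the candidates are F = x^2 + x + 1 + 2Δ with deg Δ < 2; the period of F is the multiplicative order of x in Z_4[x]/(F), which is well defined here since F(0) is a unit.

```python
def mul_mod(a, b, F, M):
    # multiply polynomials a, b (coefficient lists, low degree first)
    # modulo the monic polynomial F and modulo the integer M
    m = len(F) - 1
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % M
    for d in range(len(r) - 1, m - 1, -1):      # reduce degree-d terms by F
        c = r[d]
        if c:
            for i in range(m + 1):
                r[d - m + i] = (r[d - m + i] - c * F[i]) % M
    r = r[:m]
    return r + [0] * (m - len(r))

def period(F, M, bound=1000):
    # order of x in (Z_M[x]/(F))^*, assuming F monic of degree >= 2, F(0) a unit
    m = len(F) - 1
    one = [1] + [0] * (m - 1)
    x = [0, 1] + [0] * (m - 2)
    cur, t = x[:], 1
    while cur != one and t <= bound:
        cur = mul_mod(cur, x, F, M)
        t += 1
    return t

G1 = [1, 1, 1]                  # G^[1] = x^2 + x + 1 for G = x^2 + x + 1
for d0 in range(2):
    for d1 in range(2):
        F = [(G1[0] + 2 * d0) % 4, (G1[1] + 2 * d1) % 4, 1]
        print(F, period(F, 4))
# Delta = 0 gives period 3 = 2^2 - 1; every Delta != 0 gives the maximal
# period (2^2 - 1) * 2 = 6, as Theorem 47 predicts.
```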
9 Pseudorandom Sequences Generated by Polylinear Recurrences
Here we restrict ourselves to some illustrations of the statement that polylinear sequences over a Galois ring are a good source of pseudorandom sequences. More detailed information on this subject is contained in [33,36].

9.1. Coordinate Sequences of an MP-Recurrence. Let u be an MP-recurrence of period (p^m − 1)p^{n−1} with minimal polynomial G(x) of degree m over the ring R = Z_{p^n}. Any term u(i) of the sequence u has the standard p-ary decomposition
u(i) = u_0(i) + u_1(i)p + . . . + u_{n−1}(i)p^{n−1},   u_s(i) ∈ {0, . . . , p − 1}.
The latter gives us n sequences u_0, . . . , u_{n−1} over the field Z_p. For sufficiently large but acceptable values of m and s the sequence u_s is a very good source of pseudorandom numbers. Of course u_s is an LRS over Z_p. Let rk u_s be the rank, or linear complexity, of u_s: the degree of its minimal polynomial. Clearly we can consider u_s as an "approximation" of a random sequence only if rk u_s is large enough. There are some estimates of this parameter for the simplest case p = 2. For k, l ∈ ℕ denote b(k, 0) = k; b(k, l) = 0 if k < 2l; and in the other cases
b(k, l) = { 1, if l is even or l = 1; k − 2l + 2, if l is odd, l ≥ 3 }.

Theorem 49 If p = 2 then
rk u_s ≤ ∑_{l=0}^{p^s−1} (l + 1) · ∑_{t=b(m,l+1)+1}^{b(m,l)} C(m, t),
where C(m, t) denotes the binomial coefficient.
Due to limits on the length of this text, we point out here only one of the earliest and simplest lower estimates of the rank.

Theorem 50 (A. Nechaev, 1982 [33]) If p = 2 then the polynomial G(x) can be chosen such that for s ∈ {3, . . . , n − 1}
rk u_s ≥ (2^{s−1} + 1)m + L(C(m, 2) + C(m, 4)) + ∑_{k=2}^{s−2} (2^{s−1} − 2^k + 1) C(m, k + 1) + 2^{s−1} C(m, 2) + s,   (9.1)
where L ∈ {0, 1} and L = 1 for m ≤ 14 and m = 20.

We conjecture that L in (9.1) is always equal to 1. Lower estimates of rk u_s for p = 2 are also given in [13]; those estimates do not contain the first and second summands of (9.1). In [33] one can find more precise rank estimates for u_s, for p = 2 as well as for p > 2. PC calculations using these estimates give, for example, that the polynomial G(x) ∈ Z_{p^n}[x] can be chosen so that:
for p = 2: if m = 11, then 3383 ≤ rk u_3 ≤ 5340 and 59703 ≤ rk u_7 ≤ 128430; if m = 31, then 1.37 · 10^7 ≤ rk u_3 ≤ 1.53 · 10^7 and 6 · 10^{10} ≤ rk u_7 ≤ 10^{11};
for p = 5: if m = 11, then 2 · 10^8 ≤ rk u_3 ≤ 10^9 and 10^{11} ≤ rk u_7 ≤ 8 · 10^{11};
for p = 11: if m = 5, then 10^6 ≤ rk u_3 ≤ 2 · 10^7 and 2 · 10^{10} ≤ rk u_7 ≤ 3 · 10^{11}.
9.2. The Distribution of Elements on a Cycle of an MP-Recurrence. Another important requirement on a pseudorandom sequence is the "uniformity" of the distribution of elements and k-tuples on its long enough segments. The results on this topic are summarized in [33,36]. We present here only one of the new results. Let F(x) be an MP-polynomial of degree m over a Galois ring R = GR(q^n, p^n), and let u ∈ L_R(F) be an MP-recurrence of period T = (q^m − 1)p^{n−1}. Let 0 ≤ i_1 ≤ . . . ≤ i_k < T be fixed integers, and let α_1, . . . , α_k be fixed elements of R. Denote by ν the number of solutions i ∈ {0, . . . , T − 1} of the system of equations u(i + i_1) = α_1, . . . , u(i + i_k) = α_k.

Theorem 51 ([26]) If the system of residues of the polynomials x̄^{i_1}, . . . , x̄^{i_k} ∈ R̄[x] modulo F̄(x) is linearly independent over the field R̄, then
|ν − T/|R|^k| ≤ p^{2(n−1)} q^{m/2} ≈ p^{3(n−1)/2} √T.

More precise results on the distribution of elements on cycles of linear recurring sequences over Z_{p^n} can be found in [35] and [33, § 27].

9.3. Weight Characteristics of MP-Recurrences over R = GR(q^2, 4). In [34,40] a full description was given of the possible distributions of elements on cycles of MP-recurrences, not only over Z_4 [33] but over any Galois ring R = GR(q^2, 4) of characteristic 4. Let F(x) ∈ R[x] be a monic polynomial of degree m such that its period T = T(F) is equal to τ = q^m − 1 (a distinguished polynomial) or to 2τ (an MP-polynomial). Let u ∈ L_R(F), u ≠ 0, and let N_u(c) be the number of solutions i ∈ {0, . . . , T − 1} of the equation u(i) = c for a given c ∈ R. The description of the possible types [N_u(c) : c ∈ R] in L_R(F) and their multiplicities is obtained, i.e. the complete weight enumerator of the code L_R^{0,T−1}(F) is described. These results were based on the presentation of the LRS u using the trace function in Galois rings [47,33,39] and on the theory of quadrics over Galois fields of characteristic 2 (see [59]). For brevity we give only the description of the possible values of N_u(c). Let λ = [m/2] be the integer part of m/2 and δ_{c,0} the Kronecker delta.

Theorem 52 If F(x) is a distinguished polynomial then for any c ∈ R
N_u(c) = q^{m−2} ± wq^{λ−1} − δ_{c,0},
where w ∈ {1, q − 1} if m = 2λ + 1, and w ∈ {0, 1, q − 1} if m = 2λ. There exist not more than 2q + 1 different types [N_u(c) : c ∈ R] in L_R(F). In particular, for nonzero c we have |N_u(c) − q^{m−2}| ≤ ((q − 1)/q) q^{[m/2]}. Note that in [31] the trigonometric-sums approach gives a rougher estimate, with q^{m/2} in the right-hand side.
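These counts are easy to observe in the smallest case (our sketch). Over R = Z_4 the polynomial F(x) = x^2 + x + 3 is an MP-polynomial (q = 2, m = 2, λ = 1, T = 6; compare the sketch after Theorem 48), and the theorem below for MP-polynomials predicts N_u(c) = 2 ± w − 2δ_{c,0} with w ∈ {0, 1, 2}.

```python
# Element counts on one cycle of the MP-recurrence with characteristic
# polynomial F(x) = x^2 + x + 3 over Z_4.
u = [0, 1]
for i in range(4):
    # F = x^2 + x + 3  <=>  u(i+2) = -u(i+1) - 3u(i) = 3u(i+1) + u(i) (mod 4)
    u.append((3 * u[-1] + u[-2]) % 4)
counts = {c: u[:6].count(c) for c in range(4)}
print(u[:6], counts)
# one period: [0, 1, 3, 2, 1, 1] -> N_u = {0: 1, 1: 3, 2: 1, 3: 1},
# i.e. w = 1 throughout: N_u(0) = 2 + 1 - 2, N_u(1) = 2 + 1, N_u(2) = N_u(3) = 2 - 1
```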
Theorem 53 If F(x) is an MP-polynomial then for any c ∈ R
N_u(c) = 2q^{m−2} ± wq^{λ−1} − 2δ_{c,0},
where w ∈ {0, 2, q − 2, q, 2(q − 1)} if m = 2λ + 1, and w ∈ {0, 1, 2, q − 1, 2(q − 1)} if m = 2λ. There exist not more than 2q + 1 different types [N_u(c) : c ∈ R] in L_R(F).

Acknowledgment. The authors thank the referee for many interesting and helpful comments and suggestions that have improved the presentation of the results.
References
1. Berlekamp E.R. Algebraic Coding Theory. McGraw-Hill, New York, 1968.
2. Berman S.D. Semisimple cyclic and abelian codes II. Cybernetics, 3 (1967), No. 3, 17–23.
3. Blake J.F. Codes over certain rings. Inform. and Control, 20 (1972), 396–404.
4. Blake J.F. Codes over integer residue rings. Inform. and Control, 29 (1972), No. 4, 295–300.
5. Bush K.A. Orthogonal arrays of index unity. Ann. Math. Stat., 23 (1952), 426–434.
6. Camion P. Abelian codes. Math. Res. Center, Univ. Wisconsin, Rept. 1059, 1970.
7. Carlet C. On Z4-duality. IEEE Trans. Inform. Theory, IT-41 (1995), No. 5, 1487–1494.
8. Constantinescu I., Heise W., Honold T. Monomial extensions of isometries between codes over Zm. Proceedings of the Vth Int. Workshop on Alg. and Comb. Coding Theory, Sozopol, Bulgaria, 1996, 98–104.
9. Constantinescu I., Heise W. A metric for codes over residue rings. Probl. of Inform. Transm., 33 (1997), No. 3.
10. Couselo E., Gonzalez S., Markov V., and Nechaev A. Recursive MDS-codes and recursively differentiable quasigroups. Discr. Math. and Appl., 8 (1998), No. 3, 217–247.
11. Couselo E., Gonzalez S., Markov V., and Nechaev A. Recursive MDS-codes and recursively differentiable k-quasigroups. Proceedings of the Sixth International Workshop on Algebraic and Combinatorial Coding Theory (ACCT-VI), September 6–12, 1998, Pskov, Russia, 78–84.
12. Couselo E., Gonzalez S., Markov V., and Nechaev A. Recursive MDS-codes. Proceedings of the WCC'99 Workshop on Coding and Cryptography, January 11–14, 1999, Paris, France, 271–278.
13. Dai Z.D., Beth T., Gollmann D. Lower bounds for the linear complexity of sequences over residue rings. Lect. Notes Comput. Sci., 473, Berlin, 1991, 189–195.
14. Delsarte P. An algebraic approach to the association schemes of coding theory. Philips Research, Rep. Suppl., 10, 1973.
15. Dénes J. & Keedwell A.D. Latin Squares and their Applications. Akadémiai Kiadó, Budapest; Academic Press, New York; English Universities Press, London, 1974.
16. Dénes J. & Keedwell A.D. Latin Squares. New Developments in the Theory and Applications. Annals of Discrete Mathematics, 46, North-Holland, Amsterdam, 1991.
17. Ericson R., Zinoviev V.A. On Fourier-invariant partitions of finite abelian groups and the MacWilliams identity for group codes. Probl. of Inform. Transm., 32 (1996), No. 1.
18. Faith C. Algebra II. Ring Theory. Springer, Berlin, 1976.
19. Fitzpatrick P., Norton G.H. The Berlekamp–Massey algorithm and linear recurring sequences over a factorial domain. Appl. Alg. in Eng., Comm. and Comp., 6 (1995), 309–323.
20. Hammons A.R., Kumar P.V., Calderbank A.R., Sloane N.J.A., Sole P. The Z4-linearity of Kerdock, Preparata, Goethals and related codes. IEEE Trans. Inform. Theory, 40 (1994), No. 2, 301–319.
21. Heise W., Quattrocci P. Informations- und Codierungstheorie. Springer, Berlin–Heidelberg, 1995.
22. Heise W., Honold Th., and Nechaev A.A. Weighted modules and representations of codes. Proceedings of the Sixth International Workshop on Algebraic and Combinatorial Coding Theory (ACCT-VI), September 6–12, 1998, Pskov, Russia, 123–129.
23. Honold Th., Landjev I. Linearly representable codes over chain rings. Proceedings of the Sixth International Workshop on Algebraic and Combinatorial Coding Theory (ACCT-VI), September 6–12, 1998, Pskov, Russia, 135–141.
24. Honold Th., Nechaev A.A. Weighted modules and presentations of codes. Probl. of Inform. Transm., to appear.
25. Hong S.J., Patel A.M. A general class of maximal codes for computer application. IEEE Trans. on Computers, C-21 (1972), No. 12, Dec., 1322–1331.
26. Kamlovskii O.V., Kuzmin A.S. Distribution of elements on cycles of linear recurring sequences over Galois rings. Russian Math. Surveys, 53 (1998), No. 2.
27. Klemm M. Über die Identität von MacWilliams für die Gewichtsfunktion von Codes. Arch. Math., 49 (1987), 400–406.
28. Klemm M. Selbstduale Codes über dem Ring der ganzen Zahlen modulo 4. Arch. Math., 53 (1989), 201–207.
29. Krull W. Algebraische Theorie der Ringe II. Math. Ann., 91 (1923), 1–46.
30. Kumar P.V., Helleseth T. Improved binary codes and sequence families from Z4-linear codes. IEEE Trans. Inform. Theory, 42 (1995), No. 5, 1562–1593.
31. Kumar P.V., Helleseth T., Calderbank A.R. An upper bound for Weil exponential sums over Galois rings and applications. IEEE Trans. Inform. Theory, 41 (1995), No. 2, 456–468.
32. Kurakin V.L. The Berlekamp–Massey algorithm over finite rings, modules and bimodules. Discrete Math. and Appl., 8 (1998), No. 5.
33. Kurakin V.L., Kuzmin A.S., Mikhalev A.V., Nechaev A.A. Linear recurring sequences over rings and modules. (Contemporary Math. and its Appl. Thematic Surveys, Vol. 10, Algebra 2, Moscow, 1994.) J. of Math. Sciences, 76 (1995), No. 6, 2793–2915.
34. Kurakin V., Kuzmin A., Nechaev A. Codes and linear recurrences over Galois rings and QF-modules of characteristic 4. Proceedings of the Sixth International Workshop on Algebraic and Combinatorial Coding Theory (ACCT-VI), September 6–12, 1998, Pskov, Russia, 166–171.
35. Kuzmin A.S. Distribution of elements on cycles of linear recurring sequences over residue rings. Russian Math. Surveys, 47 (1993), No. 6.
36. Kuzmin A.S., Kurakin V.L., Nechaev A.A. Pseudorandom and polylinear sequences. Trudy po Diskretnoi Matematike (editors V.N. Sachkov et al.), TVP, 1 (1997), 139–202.
37. Kuzmin A.S., Nechaev A.A. A construction of noise-stable codes using linear recurring sequences over Galois rings. Russian Math. Surveys, 47 (1992), No. 5.
38. Kuzmin A.S., Nechaev A.A. Linearly presented codes and Kerdock code over an arbitrary Galois field of characteristic 2. Russian Math. Surveys, 49 (1994), No. 5.
39. Kuzmin A.S., Nechaev A.A. Linear recurring sequences over Galois rings. Algebra and Logic, Plenum Publ. Corp., 34 (1995), No. 2.
40. Kuzmin A.S., Nechaev A.A. Complete weight enumerators of generalized Kerdock code and linear recursive codes over Galois rings. Proceedings of the WCC'99 Workshop on Coding and Cryptography, January 11–14, 1999, Paris, France, 332–336.
41. Lidl R. and Niederreiter H. Finite Fields. Addison-Wesley, London, 1983.
42. Massey J.L. Shift-register synthesis and BCH decoding. IEEE Trans. Inform. Theory, 15 (1969), No. 1, Part 1, 122–127.
43. MacWilliams F.J., Sloane N.J.A. The Theory of Error-Correcting Codes. North-Holland Publ. Co., 1977.
44. MacWilliams F.J., Sloane N.J.A. Pseudorandom sequences and arrays. Proc. IEEE, 64 (1976), No. 11, 1715–1729.
45. Mikhalev A.V. and Nechaev A.A. Linear recurring sequences over modules. Acta Applicandae Mathematicae, 42 (1996), 161–202.
46. Nechaev A.A. Trace function in Galois ring and noise stable codes. V All-Union Symp. on Theory of Rings, Algebras and Modules, Novosibirsk, 1982, p. 97 (in Russian).
47. Nechaev A.A. Kerdock code in a cyclic form. Diskr. Math. (USSR), 1 (1989), No. 4, 123–139 (in Russian). English translation: Discrete Math. and Appl., 1 (1991), No. 4, 365–384.
48. Nechaev A.A. Linear recurring sequences over commutative rings. Discrete Math. and Appl., 2 (1992), No. 6, 659–683.
49. Nechaev A.A. The cyclic types of linear substitutions over finite commutative rings. Mat. Sbornik, 184 (1993), No. 3, 21–56 (in Russian).
50. Nechaev A.A. Linear recurring sequences over quasi-Frobenius modules. Russian Math. Surveys, 48 (1993), No. 3.
51. Nechaev A.A. Linear codes over finite rings and QF-modules. Proceedings of the IVth Int. Workshop on Algebraic and Combinatorial Coding Theory, Novgorod, Sept. 1994, Sofia, Zakrila, 1994, 154–157.
52. Nechaev A.A. Finite quasi-Frobenius modules, applications to codes and linear recurrences. Fundamentalnaya i prikladnaya matematika, 1 (1995), No. 1, 229–254 (in Russian).
53. Nechaev A.A. Linear codes over quasi-Frobenius modules. Doklady Math. Science, MAIK Nauka/Interperiodika, 52 (1995), No. 3.
54. Nechaev A.A. Polylinear recurring sequences over modules and quasi-Frobenius modules. Proc. First Int. Tainan–Moscow Algebra Workshop, 1994, Walter de Gruyter, Berlin–N.Y., 1996, 283–298.
55. Nechaev A.A. Linear codes over modules and over spaces. MacWilliams identity. Proceedings of the 1996 IEEE Int. Symp. Inform. Theory and Appl., Victoria B.C., Canada, 1996, 35–38.
56. Nechaev A.A. and Kuzmin A.S. Linearly presentable codes. Proceedings of the 1996 IEEE Int. Symp. Inform. Theory and Appl., Victoria B.C., Canada, 1996, 31–34.
57. Nechaev A.A. and Kuzmin A.S. Z4-linearity, two approaches. Proceedings of the Vth Int. Workshop on Alg. and Comb. Coding Theory, Sozopol, Bulgaria, 1996, 212–215.
58. Nechaev A.A. and Kuzmin A.S. Formal duality of linearly presentable codes over a Galois field. Lecture Notes Comput. Sci., 1255, Springer, Berlin, 1997, 263–276.
59. Nechaev A.A., Kuzmin A.S. Trace function on a Galois ring in coding theory. Lecture Notes Comput. Sci., 1255, Springer, Berlin, 1997, 277–290.
60. Nechaev A.A., Kuzmin A.S., and Markov V.T. Linear codes over finite rings and modules. Fundamentalnaya i prikladnaya matematika, 2 (1996), No. 3, 195–254 (in Russian).
61. Nomura T., Miyakawa H., Imai H., Fukuda A. A theory of two-dimensional linear recurring arrays. IEEE Trans. Inform. Theory, 18 (1972), No. 6, 775–785.
62. Norton G., Salagean-Mandache A. On the structure and Hamming distance of linear codes over Galois rings. Proceedings of the WCC'99 Workshop on Coding and Cryptography, January 11–14, 1999, Paris, France, 337–341.
63. Peterson W.W., Weldon E.J. Error-Correcting Codes. The MIT Press, Cambridge, 1972.
64. Reeds J.A., Sloane N.J.A. Shift-register synthesis (modulo m). SIAM J. on Computing, 14 (1985), No. 3, 505–513.
65. Sakata S. On determining the independent set for doubly periodic arrays and encoding two-dimensional cyclic codes and their duals. IEEE Trans. Inform. Theory, 27 (1981), 556–565.
66. Sakata S. Synthesis of two-dimensional linear feedback shift registers and Groebner bases. Applied Algebra, Algebraic Algorithms and Error-Correcting Codes (AAECC), 1987. Lecture Notes in Comput. Sci., 356, Springer, Berlin, 1989, 394–407.
67. Sakata S. n-dimensional Berlekamp–Massey algorithm for multiple arrays and construction of multivariate polynomials with preassigned zeros. Applied Algebra, Algebraic Algorithms and Error-Correcting Codes (AAECC) (Rome, 1988). Lecture Notes in Comput. Sci., 357, Springer, Berlin, 1989, 356–376.
68. Sakata S. Two-dimensional shift register synthesis and Groebner bases for polynomial ideals over an integer residue ring. Discr. Appl. Math., 33 (1991), No. 1–3, 191–203.
69. Shankar P. On BCH codes over arbitrary integer rings. IEEE Trans. Inform. Theory, IT-25 (1979), No. 4, 480–483.
70. Solov'eva F.I., Avgustinovich S.V., Honold Th., Heise W. On the extendability of code isometries. Journ. of Geometry, 61 (1998), 3–16.
71. Spiegel E. Codes over Zm. Inform. and Control, 35 (1977), No. 1, 48–51.
72. Spiegel E. Codes over Zm, revisited. Inform. and Control, 37 (1978), No. 1, 100–104.
73. Tsfasman M.A., Vlăduţ S.G. Algebraic-Geometric Codes. Math. and its Appl. (Soviet Series), Vol. 58, Kluwer Academic Publishers, 1991, 667 p.
74. Ward M. The arithmetical theory of linear recurring series. Trans. Amer. Math. Soc., 35 (1933), No. 3, 600–628.
75. Ward M. Arithmetical properties of sequences in rings. Ann. Math., 39 (1938), 210–219.
76. Wolfmann J. Negacyclic and cyclic codes over Z4. Proceedings of the WCC'99 Workshop on Coding and Cryptography, January 11–14, 1999, Paris, France, 301–306.
77. Wood J.A. Duality for modules over finite rings and applications to coding theory. Preprint, 1996.
78. Wood J.A. Extension theorems for linear codes over finite rings. Preprint, 1996.
79. Zierler N. Linear recurring sequences. J. Soc. Ind. Appl. Math., 7 (1959), No. 1, 31–48.
Calculating Generators for Invariant Fields of Linear Algebraic Groups

Jörn Müller-Quade and Thomas Beth

Institut für Algorithmen und Kognitive Systeme, Professor Dr. Th. Beth, Arbeitsgruppe Computer Algebra, Universität Karlsruhe, Am Fasanengarten 5, 76128 Karlsruhe, Germany
Abstract. We present an algorithm to calculate generators for the invariant field k(x)^G of a linear algebraic group G from the defining equations of G. This work was motivated by an algorithm of Derksen which allows the computation of the invariant ring of a reductive group using ideal-theoretic techniques and the Reynolds operator. The method presented here does not use the Reynolds operator and hence applies to all linear algebraic groups. Like Derksen's algorithm, we start by computing the ideal vanishing on all vectors (ξ, ζ) for which ξ and ζ are on the same orbit. But then we establish a connection between this ideal and the ideal of syzygies which the generators of the field k(x) have over the invariant field. From this ideal we can calculate the generators of the invariant field, exploiting a field-ideal correspondence which has been applied to the decomposition of rational mappings before.
1 Introduction
Invariant theory has a long tradition as well as new applications. The branch of constructive invariant theory is mostly interested in questions like finding generators for the invariant ring of a given group. Much research has been done for the case of finite groups (see e.g. [10,16,14]). Invariant fields of finite groups are just the quotient fields of the corresponding invariant rings, so problems in that area were rather structural, like "is a given invariant field rational?" [9], which is an instance of the famous rationality problem. But in the case of linear algebraic groups, generating systems of invariant fields become more interesting: every invariant field has a finite generating system even if the corresponding invariant ring does not. Recently Derksen showed how to compute generators for invariant rings of infinite reductive groups using ideal theory. This motivated our study of an ideal-theoretic approach to invariant fields. This paper presents an algorithm to compute generators for the invariant field of a linear algebraic group which is given by its equations.

The paper is structured as follows: Section 2 will explain a correspondence between fields and certain syzygy ideals. This correspondence will later on allow us to pass from the ideal-theoretic part of our algorithm back to function fields.
Section 3 introduces the equivalence relation relating those points which are on the same group orbit, the so-called graph of the action. The ideal describing the Zariski closure of this equivalence relation can be computed from the equations defining the group. In Section 4 we will present two results relating the ideal describing the graph of the action to the syzygy ideal corresponding to the invariant field we are looking for. Because the field-ideal correspondence is constructive, we will be able to compute the invariant field following this path. The last section gives four examples which were computed using Maple V.
2 A Field-Ideal Correspondence
To be able to use ideal-theoretic methods to deal with function fields, we briefly restate the field-ideal correspondence which was used in [13] to find decompositions of rational mappings. Given fields k(f) = k(f_1, . . . , f_r) and k(x) = k(x_1, . . . , x_n), finitely generated over a field of constants k and both contained in some field k(V) = Quot(k[X_1, . . . , X_s]/I(V)), we formally define the syzygy ideal mentioned in the introduction.

Definition 1. Let k(f) and k(x) be fields lying over a field k of constants and let {f}, {x} denote the sets of generators of k(f) and k(x) over k, respectively. Furthermore let ∪_{x∈{x}} {Z_x} be a set of variables and k(f)[Z] be the ring of polynomials in these variables over the field k(f). Then the ideal J_{k(x)/k(f)} ⊆ k(f)[Z] of all algebraic relations of the set {x} over k(f) is defined as
⟨x − Z_x | x ∈ {x}⟩ ∩ k(f)[Z].

The ideal ⟨x − Z_x | x ∈ {x}⟩ used in the definition can also be viewed as a syzygy ideal, namely J_{k(x)/k(x)}, representing the trivial extension k(x)/k(x). There is an alternative characterization of this ideal, which was given in [12]. It is the key to the properties of the syzygy ideal which are used in the following.

Lemma 1. The ideal J_{k(x)/k(f)} equals the kernel of the specialization homomorphism Ψ_x : k(f)[Z_{x_1}, . . . , Z_{x_n}] → k(f)(x), Z_x ↦ x.

In this paper we just need a special case of the ideal J_{k(x)/k(f)}: we restrict ourselves to k(f) being a subfield of k(x). As we can think of the field k(x) and its generating system {x} as being fixed, we will just write J_{k(f)} instead of J_{k(x)/k(f)}. In this special case we can express the generators of k(f) in terms of the x_1, . . . , x_n. An alternative characterization of the ideal J_{k(f)} can then be given (following [13]). This characterization has computational advantages, and for k(x) being fixed we get a correspondence between intermediate fields of k(x)/k and the ideals J_{k(f)}. For this correspondence we need a result from [13]:
Proposition 1. Let k(f) ≤ k(x) be fields finitely generated over k and let the generators f_1, . . . , f_m of k(f) over k be expressed in x = x_1, . . . , x_r as f_1 = n_1(x)/d_1(x), . . . , f_m = n_m(x)/d_m(x). Let Z denote Z_{x_1}, . . . , Z_{x_r}; then we define an ideal
I = ⟨ n_1(Z) − (n_1(x)/d_1(x)) d_1(Z), . . . , n_m(Z) − (n_m(x)/d_m(x)) d_m(Z) ⟩,
and for d = d_1(Z) · . . . · d_m(Z) we get:
J_{k(f)} = (J_k + I) : d^∞.
Furthermore, the coefficients of a reduced Gröbner basis of J_{k(f)} · k(x)[Z] generate k(f).

In the remaining paper we will not have algebraic relations among the generators x_1, . . . , x_r of k(x). Therefore J_k will be the zero ideal and we need not worry about computing J_k. For an ideal I the saturation I : d^∞ is effective ([1], Algorithm idealdiv2), and so is the problem of representing a field element in some generators [17,8,13]. Thus the ideals of Proposition 1 can be obtained through Gröbner basis computations if generators for the field are known. We can conclude, for the fields k(x), k(f) being finitely generated over a computable field k and contained in a field k(V) = Quot(k[X_1, . . . , X_s]/I(V)), that the ideal J_{k(f)} can be computed effectively. Thus the following field-ideal correspondence is constructive:

Corollary 1. For subfields k(f) of k(x), finitely generated over a computable field k and contained in a field k(V) = Quot(k[X_1, . . . , X_s]/I(V)), the mapping
C : {k(f) | k ≤ k(f) ≤ k(x)} → {I | I ✂ k(x)[Z]},   k(f) ↦ J_{k(f)} · k(x)[Z]
is inclusion preserving and injective, and both C and C^{−1} can be computed effectively.

But the problem in this paper is to calculate field generators. Hence in the next two sections we will develop a method to compute the ideal J_{k(x)/k(x)^G} · k(x)[Z] from the ideal H_G of the defining equations of the graph of the action of a group G. With the above corollary we will then see that generators for k(x)^G can be computed effectively from H_G.
3 The Graph of the Action of a Linear Algebraic Group
In this section we give a formal definition of the equivalence relation which relates two points iff they are on the same G-orbit, and we restate the fact that the ideal H_G of the defining equations of the graph of the action can be computed effectively. Then we will state a theorem of Derksen which motivated this paper.
Definition 2. For a group $G$ acting on a finite dimensional vector space $V$ we define the following relation from $V$ to $V$: $\{(\xi,\zeta) \mid \exists g\in G : g\cdot\xi = \zeta\}$. This relation will be called the graph of the action of the group $G$. The Zariski closure of this relation will be called the closed graph of the action of the group $G$. The ideal of all polynomials in $k[X,Z]$ vanishing on the graph of the action of $G$ will be denoted by $H_G \trianglelefteq k[X,Z]$.

For the first step in our algorithm we have to compute generators for the ideal $H_G$ from the equations defining the group $G$.

Lemma 2. Let $\langle p_1,\ldots,p_s\rangle \trianglelefteq k[m_{11},m_{12},\ldots,m_{nn}]$ denote the ideal of those polynomials over $k$ vanishing on $G \subseteq k^{n\times n}$. Let $I \trianglelefteq k[m_{11},m_{12},\ldots,m_{nn},X,Z]$ denote the ideal generated by $p_1,\ldots,p_s$ and the entries of
$$\begin{pmatrix} m_{11} & \cdots & m_{1n}\\ \vdots & & \vdots\\ m_{n1} & \cdots & m_{nn}\end{pmatrix}\cdot\begin{pmatrix} Z_{x_1}\\ \vdots\\ Z_{x_n}\end{pmatrix} \;-\; \begin{pmatrix} X_1\\ \vdots\\ X_n\end{pmatrix}.$$
Then $H_G = I \cap k[X,Z]$. Hence $H_G$ can be computed using elimination.

Proof. The proof is immediate from the Extension Theorem [2]: consider the points of the graph of the action of $G$ as exactly those partial solutions which extend to a point on the variety of $I$. As $H_G$ describes the Zariski closure of these partial solutions, the ideal $H_G$ also describes the Zariski closure of the graph of the action of $G$. (A small sketch of this elimination, for a hypothetical toy group, follows after Theorem 1 below.)

But first we give an additional motivation for the study of the Zariski closure of the graph of the action of a group, namely its connection to the invariant ring of that group. Derksen was able to turn the proof of Hilbert's finiteness theorem into an algorithm (see [4]). The algorithm of Derksen makes use of the ideal defining the graph of the action and thereby establishes the connection between the two structures. Next we briefly state the result of Derksen following the presentation of Decker and De Jong [3]. Before that we restate the finiteness result of Hilbert:

Theorem 1 (Hilbert). Let $G$ be a reductive group, let $* : k[X] \to k[X]^G$, $f \mapsto f^*$ denote the Reynolds operator, and let $I_N$ be the ideal generated in $k[X]$ by the homogeneous invariants of positive degree. Then for homogeneous polynomials $f_1,\ldots,f_r$ we have $\langle f_1,\ldots,f_r\rangle = I_N$ iff $k[f_1^*,\ldots,f_r^*] = k[X]^G$. Because $k[X]$ is noetherian, the invariant ring is finitely generated.
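As announced, here is a sketch of the elimination of Lemma 2 in sympy. The toy group, the one-dimensional torus $G = \{\mathrm{diag}(t, t^{-1})\}$, is our own illustrative choice and does not appear in the paper:

```python
# Sketch of Lemma 2: H_G = I ∩ k[X, Z], computed by elimination, for the
# hypothetical torus G = { diag(t, 1/t) } cut out by m12, m21, m11*m22 - 1.
from sympy import symbols, groebner

m11, m12, m21, m22 = symbols('m11 m12 m21 m22')
Z1, Z2, X1, X2 = symbols('Z1 Z2 X1 X2')

I = [m12, m21, m11*m22 - 1,        # equations defining the group G
     m11*Z1 + m12*Z2 - X1,         # first entry of M*(Z1,Z2)^T - (X1,X2)^T
     m21*Z1 + m22*Z2 - X2]         # second entry

# a lex order with the m_ij greatest eliminates them (Extension Theorem)
G = groebner(I, m11, m12, m21, m22, Z1, Z2, X1, X2, order='lex')
HG = [g for g in G.exprs if not g.free_symbols & {m11, m12, m21, m22}]
print(HG)  # expected (up to sign): [Z1*Z2 - X1*X2], the closed graph
```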
For reductive groups it suffices to know generators of the ideal $I_N$; one can then derive a generating system of $k[X]^G$ using the Reynolds operator. The algorithm of Derksen computes generators of $I_N$ from the ideal defining the graph of the action.

Theorem 2 (Derksen). Let $H_G \trianglelefteq k[X,Z]$ denote the defining ideal of the closed graph of the action of $G$. Then $(\langle Z_1,\ldots,Z_n\rangle + H_G)\cap k[X] = I_N$.

The proof can be found in [3]. In the next section we present a variation of Derksen's algorithm which works for invariant fields of algebraic groups. This algorithm is independent of the Reynolds operator and works for non-reductive groups as well.
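Continuing the hypothetical torus example from the sketch above (a reductive group, so Theorem 2 applies), Derksen's step is a one-line computation: setting $Z_1 = Z_2 = 0$ in generators of $H_G$ yields generators of $I_N$. Again, the example is our own:

```python
# Derksen's step for the torus sketch: I_N = (<Z1, Z2> + H_G) ∩ k[X] is
# obtained by substituting Z = 0 into generators of H_G.
from sympy import symbols

Z1, Z2, X1, X2 = symbols('Z1 Z2 X1 X2')
HG = [Z1*Z2 - X1*X2]                       # from the previous sketch

IN = [h.subs({Z1: 0, Z2: 0}) for h in HG]
print(IN)  # [-X1*X2], i.e. I_N = <X1*X2>, matching k[x]^G = k[x1*x2]
```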
4 Calculating Generators for Invariant Fields
In this section we want to establish a connection between the ideal $H_G$ describing the graph of the action of a group $G$ and the ideal $J_{k(x)^G}$ corresponding to the invariant field. This connection will then be exploited to calculate invariant fields. But first we look at a result of Rosenlicht connecting generating systems of invariant fields and the graph of the action.

Theorem 3 (Rosenlicht). Let $G$ denote a linear algebraic group and let $f = (f_1,\ldots,f_r)$ be a rational mapping for which $k(f) = k(x)^G$. Then there exists a $G$-invariant open subset $U$ of $k^n$ such that $f$ is regular on $U$ and the relation $f^{-1}\circ f \subset k^n\times k^n$ restricted to $U\times U$ equals the graph of the action of $G$ restricted to $U\times U$.

For a proof see [15, Satz 2.2]. As a corollary we get:

Corollary 2. Using the same notation as above we can conclude:

1. We denote the extension of the ideal $J_{k(x)^G}$ to $k(x)[Z]$ by $J_{k(x)^G}\cdot k(x)[Z]$. Let $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$ be the ideal of all polynomials in $J_{k(x)^G}\cdot k(x)[Z]$ where the algebraically independent parameters $x_1,\ldots,x_n$ are replaced by variables $X_1,\ldots,X_n$. Then there exists an open nonempty subset $U$ of $k^n$ such that we get the following equality for the varieties restricted to $U\times U$:
$$V(H_G)|_{U\times U} = V\bigl((J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}\bigr)\big|_{U\times U}$$
2. The ideals $H_G$ and $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$ have associated prime ideals in common.
3. Let $H_G|_{X=x} \trianglelefteq k(x)[Z]$ denote the ideal generated by the image of $H_G$ under the specialization homomorphism $k[X,Z]\to k(x)[Z]$ defined by $X_i\mapsto x_i$. Then the ideal $J_{k(x)^G}\cdot k(x)[Z]$ and the ideal $H_G|_{X=x}$ have associated prime ideals in common.
Proof. For an algebraic group $G$ the graph of the action is a projection of a variety. Hence from the Extension Theorem [2, Theorem 4, §6, Chapter 3] we can conclude that there exists an open nonempty subset $U_1 \subseteq k^n$ such that the closed graph of the action $V(H_G)$ is on $U_1\times U_1$ equal to the graph of the action of $G$. Furthermore it is easy to see that there is an open nonempty subset $U_2 \subseteq k^n$ such that $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$ describes on the open set $U_2\times U_2$ the fibers of a rational mapping $f = (f_1,\ldots,f_r)$ for which $k(f) = k(x)^G$. From the theorem of Rosenlicht we can deduce the existence of an open set $U_3$ such that on $U_3\times U_3$ the fibers of $(f_1,\ldots,f_r)$ equal the graph of the action. Hence for $U = U_1\cap U_2\cap U_3$ we get the equality stated in point 1 of our corollary.

To show point 2, let $V$ denote the complement of the open set $U\times U$. To shorten notation we write $J$ for $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$ and $H$ for $H_G$ in the remaining part of the proof. We get $V(H) = (V(H)\cap U\times U)\cup(V(H)\cap V)$ and $V(J) = (V(J)\cap U\times U)\cup(V(J)\cap V)$. As $V(J)\cap U\times U = V(H)\cap U\times U$, all their associated prime ideals must be equal. Since $V(H)\cap U\times U$ and $V(H)\cap V$ are disjoint and the latter is closed, we can conclude that no associated prime of $I(V(H)\cap V)$ is contained in $I(V(H)\cap U\times U)$. Thus all primes associated to $I(V(H)\cap U\times U)$ are also associated to $H$. The same argument applies to $J$, and thus the ideals $H$ and $J$ share all prime ideals which are associated to $I(V(H)\cap U\times U)$.

For the proof of point 3 we first observe that specialization of our variables $X$ to field elements $x$ is equivalent to localizing the ring $k[X,Z]$ at the multiplicatively closed set $k[X]\setminus\{0\}$ (and then substituting the symbols $X_1,\ldots,X_n$ by the symbols $x_1,\ldots,x_n$). Since $V(H)\cap U\times U$ describes exactly the graph of the action, it does not contain any points $(0,\ldots,0,\zeta)$ with $\zeta$ not equal to the zero vector. Hence no associated prime $P$ of $I(V(H)\cap U\times U)$ contains a polynomial from $k[X]\setminus\{0\}$. Thus all prime ideals associated to $I(V(H)\cap U\times U)$ remain associated after localization (see [18, Theorem 16(b), Theorem 17, §10, Chapter IV]). It remains to show that these associated prime ideals are also associated to the localizations of $H$ and $J$. All those associated prime ideals which contain an element of $k[X]\setminus\{0\}$ become the whole ring when localized. Hence they become redundant, and all other associated prime ideals remain associated prime ideals according to [18, Theorem 17, §10, Chapter IV].

Next we briefly look at the action which the group $G$ induces on the associated prime ideals of $J_{k(x)^G}\cdot k(x)[Z]$.

Lemma 3. Let the group $G$ operate on $k(x)[Z]$ by operating on the coefficients from $k(x)$. Then the group $G$ operates transitively on the associated prime ideals of $J_{k(x)^G}\cdot k(x)[Z]$.
Proof. Let $k(x)^G_{alg}$ denote the algebraic closure of $k(x)^G$ in $k(x)$. Then the field $k(x)^G_{alg}$ is the unique maximal field lying algebraically over $k(x)^G$ and being contained in $k(x)$. As the fields $k(x)^G$ and $k(x)$ are closed under the group action of $G$, the field $k(x)^G_{alg}$ must by definition be closed under the group action of $G$, too. The group $G$ operates on $k(x)^G_{alg}$ as a group of automorphisms. Hence $G$ operates like a finite group on $k(x)^G_{alg}$, because the group of automorphisms of $k(x)^G_{alg}$ leaving $k(x)^G$ invariant must be finite. From the book of Eisenbud [5, Proposition 13.10'] we know that the group $G$ then operates transitively on the associated primes of $J_{k(x)^G}\cdot k(x)^G_{alg}[Z]$. The claim of our lemma can now be concluded by looking at the algorithm for the computation of primary decompositions given by Gianni, Trager, and Zacharias [6]. This algorithm never introduces transcendental field extensions of the ground field (here $k(x)^G$), and the primary decomposition of the ideal $J_{k(x)^G}\cdot k(x)[Z]$ is already found in $k(x)^G_{alg}[Z]$.

The relation between the ideals $H_G$ and $J_{k(x)^G}\cdot k(x)[Z]$ can now be stated:

Theorem 4. Let $H_G|_{X=x}$ denote the ideal generated by the image of $H_G$ under the specialization homomorphism $k[X,Z]\to k(x)[Z]$, $X_i\mapsto x_i$, and let $J_{k(x)^G}\cdot k(x)[Z]$ be the ideal generated by $J_{k(x)^G} \trianglelefteq k(x)^G[Z]$ in $k(x)[Z]$. Then we get the following equality:
$$H_G|_{X=x} = J_{k(x)^G}\cdot k(x)[Z]$$

Proof. The ideals $J_{k(x)^G}\cdot k(x)[Z]$ and $H_G|_{X=x}$ are invariant under the action of the group $G$ on the coefficients ($\in k(x)$). Furthermore they have (at least) one associated prime ideal in common. As the group $G$ operates transitively on the associated primes of $J_{k(x)^G}\cdot k(x)[Z]$, we can conclude that every prime associated to $J_{k(x)^G}\cdot k(x)[Z]$ is also associated to $H_G|_{X=x}$. Hence we have $H_G|_{X=x} \subseteq J_{k(x)^G}\cdot k(x)[Z]$ if the two ideals are radical. The ideal $H_G|_{X=x}$ is radical as it contains all polynomials vanishing on $G\cdot x \times G\cdot x$. The radicality of $J_{k(x)^G}\cdot k(x)[Z]$ can be deduced from $J_{k(x)^G} \trianglelefteq k(x)^G[Z]$ being prime and the field extension $k(x)/k(x)^G$ being separable [15, Section VI, Lemma 1.5], [19, Corollary to the Lemma in §11, Chapter VII]. The inclusion $J_{k(x)^G}\cdot k(x)[Z] \subseteq H_G|_{X=x}$ can be deduced from $H_G|_{X=x}$ being by construction the largest ideal vanishing on $G\cdot x\times G\cdot x$ and $J_{k(x)^G}\cdot k(x)[Z]$ also vanishing on $G\cdot x\times G\cdot x$.

This relation between the ideals $H_G$ and $J_{k(x)^G}\cdot k(x)[Z]$ can be used to calculate generators for an invariant field $k(x)^G$.

Corollary 3. Let $C$ denote the set of coefficients of a reduced Gröbner basis of the ideal $H_G|_{X=x}$. Then $k(C) = k(x)^G$.

The proof is obvious from the above theorem and Proposition 1. A small sketch of the resulting procedure follows.
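The sketch again uses the hypothetical torus $G = \{\mathrm{diag}(t,t^{-1})\}$ with $H_G = \langle X_1X_2 - Z_1Z_2\rangle$ from the earlier sketches; that sympy treats the non-generator symbols $x_1, x_2$ as coefficients in the fraction field is an assumption of this sketch, not a documented guarantee:

```python
# Corollary 3 for the torus sketch: specialize X -> x in H_G, compute a
# reduced Groebner basis over k(x1, x2), and read off the coefficients.
from sympy import symbols, groebner

x1, x2, Z1, Z2 = symbols('x1 x2 Z1 Z2')

# H_G|_{X=x} for H_G = <X1*X2 - Z1*Z2>
G = groebner([x1*x2 - Z1*Z2], Z1, Z2, order='lex')
print(G.exprs)  # expected: [Z1*Z2 - x1*x2]
# The coefficient set is {1, -x1*x2}, so k(C) = k(x1*x2) = k(x1, x2)^G.
```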
To make the treatment of the ideals complete we give a method to compute $H_G$ from $J_{k(x)^G}\cdot k(x)[Z]$. For this we need the following result:
Lemma 4. Let $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$ be the ideal of all polynomials from $J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z]$ with the algebraically independent parameters $x_1,\ldots,x_n$ replaced by the variables $X_1,\ldots,X_n$. Furthermore let $H_G \trianglelefteq k[X,Z]$ be as above. Then we get the equality:
$$\bigl(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z]\bigr)\big|_{x=X} = H_G$$
Proof. Let $q(X,Z)$ be an element of $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$. Then it vanishes on every point of the graph of the action; hence $q$ is in the ideal $H_G$ of all polynomials vanishing there.

For the other direction let $q(X,Z)$ be contained in $H_G$. Then $q(x,Z)$ is contained in $J_{k(x)^G}\cdot k(x)[Z]$ (according to Theorem 4) and, as no denominators occur, it is also contained in $J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z]$. Hence $q(X,Z)$ is an element of $(J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z])|_{x=X}$.

Using this lemma we can show the main ingredient for an algorithm to compute the ideal $H_G$ from the ideal $J_{k(x)^G}\cdot k(x)[Z]$.

Corollary 4. Let $f_1,\ldots,f_r \in k(x)[Z]$ be a generating system for $J_{k(x)^G}\cdot k(x)[Z]$ and let $p_1,\ldots,p_r \in k[x,Z]$ be this generating set after clearing denominators (multiplying by a suitable polynomial from $k[x]$). Then $H_G$ equals
$$\sum_{d\in k[x]\setminus\{0\}} \langle p_1,\ldots,p_r\rangle : d^\infty.$$
Proof. Following Lemma 4 we have to show that $\sum_{d\in k[x]\setminus\{0\}} \langle p_1,\ldots,p_r\rangle : d^\infty$ equals $J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z]$. Let $q$ be an element of $\sum_{d\in k[x]\setminus\{0\}} \langle p_1,\ldots,p_r\rangle : d^\infty$; then there exists a polynomial $d\in k[x]$ such that $dq \in \langle p_1,\ldots,p_r\rangle$, hence $dq \in \langle f_1,\ldots,f_r\rangle$, and as $\langle f_1,\ldots,f_r\rangle$ is defined over $k(x)$ we have $q \in \langle f_1,\ldots,f_r\rangle$; as $q$ is a polynomial we get $q \in \langle f_1,\ldots,f_r\rangle \cap k[x,Z]$.

Conversely let $q$ be an element of $J_{k(x)^G}\cdot k(x)[Z]\cap k[x,Z]$. Then there is a representation $q = \sum_{i=1}^r q_i f_i$, and clearing the denominators (with a $d\in k[x]$) yields a representation $dq = \sum_{i=1}^r \tilde{q}_i p_i$. Hence $dq$ is an element of $\langle p_1,\ldots,p_r\rangle$, and as $\sum_{d\in k[x]\setminus\{0\}} \langle p_1,\ldots,p_r\rangle : d^\infty$ is saturated with respect to every polynomial in $k[x]$, we have proven the above claim.

We need two more lemmata (both taken from [11]) to make the above corollary effective.

Lemma 5. 1. For a primary ideal $Q$ of a polynomial ring and a polynomial $d$ the ideal $Q : d^\infty$ equals $Q$ or $\langle 1\rangle$.
2. For an intersection $I_1\cap\cdots\cap I_t$ of ideals $I_1,\ldots,I_t$ of a polynomial ring and a polynomial $d$ we have: $(I_1\cap\cdots\cap I_t) : d^\infty = I_1 : d^\infty \cap\cdots\cap I_t : d^\infty$.
Proof. 1. If the ideal $Q$ contains a power of $d$ then $Q : d^\infty$ equals $\langle 1\rangle$. Now let $Q$ be a primary ideal with $d^n \notin Q$ for all $n$; then $Q : d^\infty \neq \langle 1\rangle$. Suppose $Q : d^\infty \neq Q$; then there exists $d^l\cdot p \in Q$ with $p\notin Q$, but because of $Q$ being primary there must be a power of $d^l$ which is an element of $Q$. This contradicts $d^n\notin Q$ for all $n$.
2. This follows from Proposition 10 in Chapter 4, §4 of [2] and the fact that there always exists an $l$ large enough such that $I : d^\infty = I : d^l$ (see Exercise 8 in Chapter 4, §4 of [2]).

The algorithm can now be derived from Corollary 4 and the following result:

Lemma 6. Let $I \trianglelefteq k[x][Z]$ and $I' \trianglelefteq k[x]$ be ideals with $I'$ prime. Then the ideal
$$\sum_{d\in k[x]\setminus I'} I : d^\infty$$
is effectively computable.

Proof. By choosing a term order fulfilling $Z \gg x$ it is decidable whether a given ideal of $k[x][Z]$ contains an element of $k[x]\setminus I'$. Let $I = Q_1\cap\cdots\cap Q_t$ be a primary decomposition and let $\mathcal{Q}$ be the set of all those primary components which do not contain an element of $k[x]\setminus I'$. We claim that
$$\sum_{d\in k[x]\setminus I'} I : d^\infty = \bigcap_{Q\in\mathcal{Q}} Q.$$
The ideal $\sum_{d\in k[x]\setminus I'} I : d^\infty$ equals $\sum_{d\in k[x]\setminus I'} (Q_1\cap\cdots\cap Q_t) : d^\infty$, which is equal to $\sum_{d\in k[x]\setminus I'} (Q_1 : d^\infty \cap\cdots\cap Q_t : d^\infty)$. For every primary component $Q_i$ which is not an element of $\mathcal{Q}$ there exists a summand where $Q_i$ is replaced by the ideal $\langle 1\rangle$. Let $d_{Q_i}$ denote a polynomial from $k[x]\setminus I'$ for which $Q_i : d_{Q_i}^\infty = \langle 1\rangle$. Since $I'$ is prime we have $\prod_{Q_i\notin\mathcal{Q}} d_{Q_i} \notin I'$; hence there exists a summand
$$Q_1 : \Bigl(\prod_{Q_i\notin\mathcal{Q}} d_{Q_i}\Bigr)^{\!\infty} \cap\cdots\cap Q_t : \Bigl(\prod_{Q_i\notin\mathcal{Q}} d_{Q_i}\Bigr)^{\!\infty}$$
in $\sum_{d\in k[x]\setminus I'} I : d^\infty$ which contains all other summands. This summand therefore equals the whole sum; by Lemma 5 it equals $\bigcap_{Q\in\mathcal{Q}} Q$, which proves the claim.
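For a toy illustration of Lemma 6 (ours, not from the paper): take $I = \langle x\,Z\rangle \trianglelefteq k[x][Z]$ and $I' = \langle 0\rangle$. The primary decomposition is $I = \langle x\rangle \cap \langle Z\rangle$; the component $\langle x\rangle$ contains $x \in k[x]\setminus I'$ and is discarded, while $\langle Z\rangle$ contains no element of $k[x]\setminus\{0\}$ and is kept. Indeed $\sum_{d\in k[x]\setminus\{0\}} I : d^\infty = I : x^\infty = \langle Z\rangle$.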
5 Four Examples

We will give four detailed examples in this section. Our first example will be the invariant field of a representation of the group $(\mathbb{C},+)$.
Example 1. For the group $G_1 = \left\{\begin{pmatrix} 1 & a\\ 0 & 1\end{pmatrix} \,\middle|\, a\in\mathbb{C}\right\}$, which is isomorphic to $(\mathbb{C},+)$, the defining equations are the linear equations $m_{11} = 1$, $m_{21} = 0$, $m_{22} = 1$. From the ideal $\langle m_{11}Z_1 + m_{12}Z_2 - X_1,\ m_{21}Z_1 + m_{22}Z_2 - X_2,\ m_{11}-1,\ m_{21},\ m_{22}-1\rangle$ describing the points $(m_{11}, m_{12}, m_{21}, m_{22}, Z_1, Z_2, X_1, X_2)$ for which
$$\begin{pmatrix} m_{11} & m_{12}\\ m_{21} & m_{22}\end{pmatrix}\cdot\begin{pmatrix} Z_1\\ Z_2\end{pmatrix} = \begin{pmatrix} X_1\\ X_2\end{pmatrix}$$
we get by elimination of the variables $m_{11}, m_{12}, m_{21}, m_{22}$ the defining equation $Z_2 - X_2$ for the closed graph of the action. Specializing the variable $X_2$ to the field element $x_2$ and computing a reduced Gröbner basis yields $\mathbb{C}(x_1,x_2)^{G_1} = \mathbb{C}(x_2)$. (A sympy sketch of this computation follows after Example 2.)

For our next example we consider the noncompact group $\mathrm{Id}_3\otimes\mathrm{GL}_2(\mathbb{C})$, where the group $\mathrm{GL}_2(\mathbb{C})$ operates synchronously on three vectors.

Example 2. The defining equations of the group $\mathrm{GL}_2(\mathbb{C})\oplus\mathrm{GL}_2(\mathbb{C})\oplus\mathrm{GL}_2(\mathbb{C})$, which consists of block matrices, are simply $m_{ij} = 0$ for all those entries $m_{ij}$ which are not within one of the three $2\times 2$ blocks. For the group $G_2 = \mathrm{Id}_3\otimes\mathrm{GL}_2(\mathbb{C})$ we have the additional equations that the following polynomials must equal zero: $m_{11}-m_{33}$, $m_{12}-m_{34}$, $m_{21}-m_{43}$, $m_{22}-m_{44}$, $m_{11}-m_{55}$, $m_{12}-m_{56}$, $m_{21}-m_{65}$, $m_{22}-m_{66}$, i.e., the blocks must be equal. Proceeding as in the last example we get the following ideal describing the closed graph of the action:
$$\begin{gathered}
Z_2Z_5X_3 - Z_2Z_3X_5 - Z_4Z_5X_1 + Z_4Z_1X_5 + Z_6X_1Z_3 - Z_6X_3Z_1,\\
-Z_2X_6Z_3 + Z_2X_4Z_5 - Z_4X_2Z_5 + Z_4Z_1X_6 + Z_6Z_3X_2 - Z_6Z_1X_4,\\
-X_2Z_5X_3 + X_2Z_3X_5 + X_4Z_5X_1 - X_4Z_1X_5 - X_6Z_3X_1 + X_6Z_1X_3,\\
Z_4X_5X_2 - Z_6X_2X_3 - Z_2X_4X_5 + Z_2X_3X_6 + X_4Z_6X_1 - Z_4X_1X_6.
\end{gathered}$$
Again specializing the variables $X_i$ to field elements $x_i$ and computing the reduced Gröbner basis
$$\Bigl\{\, Z_1 + \tfrac{x_1x_4 - x_3x_2}{x_3x_6 - x_4x_5}\,Z_5 + \tfrac{-x_1x_6 + x_5x_2}{x_3x_6 - x_4x_5}\,Z_3,\quad Z_2 + \tfrac{-x_1x_6 + x_5x_2}{x_3x_6 - x_4x_5}\,Z_4 + \tfrac{x_1x_4 - x_3x_2}{x_3x_6 - x_4x_5}\,Z_6 \,\Bigr\}$$
we get the invariant field $\mathbb{C}\bigl(\tfrac{x_1x_4 - x_3x_2}{x_3x_6 - x_4x_5},\ \tfrac{-x_1x_6 + x_5x_2}{x_3x_6 - x_4x_5}\bigr) = \mathbb{C}(x_1,x_2,x_3,x_4,x_5,x_6)^{G_2}$.
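For concreteness, the following sympy fragment reproduces the elimination of Example 1; the computer algebra system is our choice, not the paper's:

```python
# Reproducing Example 1: eliminate m11, m12, m21, m22 to obtain the
# defining equation of the closed graph of the action of G1.
from sympy import symbols, groebner

m11, m12, m21, m22 = symbols('m11 m12 m21 m22')
Z1, Z2, X1, X2 = symbols('Z1 Z2 X1 X2')

I = [m11*Z1 + m12*Z2 - X1, m21*Z1 + m22*Z2 - X2,
     m11 - 1, m21, m22 - 1]

G = groebner(I, m11, m12, m21, m22, Z1, Z2, X1, X2, order='lex')
HG = [g for g in G.exprs if not g.free_symbols & {m11, m12, m21, m22}]
print(HG)  # expected: [Z2 - X2], as stated in Example 1
```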
Our third example has a motivation from physics. In the paper [7] invariants were applied to show that two quantum states are not locally equivalent, i.e., that they do not lie on one orbit of the group $SU_2(\mathbb{C})\otimes SU_2(\mathbb{C})$ of local unitary transformations. The term "local" in this context refers to the fact that entanglement is not increased by local transformations.
This leads to the general question of how to compute invariants of tensor product groups. We will here consider the group $SL_2(\mathbb{C})\otimes SL_2(\mathbb{C})$. The group elements are of tensor product form
$$\begin{pmatrix} ae & af & be & bf\\ ag & ah & bg & bh\\ ce & cf & de & df\\ cg & ch & dg & dh \end{pmatrix}$$
where the variables $a, b, \ldots, h$ represent elements of $\mathbb{C}$. This structure is directly reflected in the defining equations.

Example 3. The defining equations for the group $G_3 = SL_2(\mathbb{C})\otimes SL_2(\mathbb{C})$ are $m_{11}-ae$, $m_{12}-af$, $m_{13}-be$, $m_{14}-bf$, $m_{21}-ag$, $m_{22}-ah$, $m_{23}-bg$, $m_{24}-bh$, $m_{31}-ce$, $m_{32}-cf$, $m_{33}-de$, $m_{34}-df$, $m_{41}-cg$, $m_{42}-ch$, $m_{43}-dg$, $m_{44}-dh$ for the group consisting of tensor product matrices, plus $ad-bc-1$, $eh-fg-1$ for the determinants of the tensor factors being one. Calculating the ideal describing the closed graph of the action of $G_3$ we get $Z_1Z_4 - Z_2Z_3 - X_1X_4 + X_3X_2$. Specializing the variables $X_i$ to field elements $x_i$, computing a reduced Gröbner basis, and extracting coefficients yields: $\mathbb{C}(x_1x_4 - x_3x_2) = \mathbb{C}(x_1,x_2,x_3,x_4)^{G_3}$.

An interesting example for calculating the ideal $H_G$ from generators of the ideal $J_{k(x)^G}$ comes from the cyclic group $G$ generated by $\omega\cdot\mathrm{Id}_n$, where $\omega$ denotes a primitive $l$th root of unity. The invariant field $k(x)^G$ can be generated by $n$ rational functions alone: $k(x)^G = k\bigl(x_1^l, \frac{x_1}{x_2}, \ldots, \frac{x_1}{x_n}\bigr)$. But to generate the invariant ring $k[x]^G$ one needs all (exponentially many) monomials of degree $l$. The ring $k[x]^G$ shows that the Noether bound on the number of ring generators one needs is tight [16, Proposition 2.1.5]. This change from few field generators to many ring generators can be observed when passing from $J_{k(x)^G}$ to $H_G$.

Example 4. For the group $G = \langle i\cdot\mathrm{Id}_3\rangle$ with the invariant field $k\bigl(x_1^4, \frac{x_1}{x_2}, \frac{x_1}{x_3}\bigr)$ the ideal $J_{k(x)^G} \trianglelefteq \mathbb{C}(x_1,x_2,x_3)[Z_1,Z_2,Z_3]$ is generated by $Z_1^4 - x_1^4$, $x_1Z_2 - x_2Z_1$, $x_1Z_3 - x_3Z_1$ (it is already saturated with respect to $Z_2Z_3$). Substituting the field elements $x_1, x_2, x_3$ by variables $X_1, X_2, X_3$ and computing a saturation with respect to all $d\in k[X_1,X_2,X_3]\setminus\{0\}$ yields the generating system
$$\begin{gathered}
-X_1^4+Z_1^4,\ -X_1^3X_2+Z_1^3Z_2,\ -X_3X_1^3+Z_1^3Z_3,\ -X_2^2X_1^2+Z_2^2Z_1^2,\ -X_3X_2X_1^2+Z_3Z_2Z_1^2,\\
-X_3^2X_1^2+Z_3^2Z_1^2,\ -X_2^3X_1+Z_2^3Z_1,\ -X_3X_2^2X_1+Z_2^2Z_3Z_1,\ -X_1X_2X_3^2+Z_2Z_3^2Z_1,\\
-X_3^3X_1+Z_3^3Z_1,\ -X_1Z_2+X_2Z_1,\ X_3Z_1-X_1Z_3,\ -X_2^4+Z_2^4,\ -X_3X_2^3+Z_2^3Z_3,\\
-X_3^2X_2^2+Z_2^2Z_3^2,\ -X_3^3X_2+Z_3^3Z_2,\ X_3Z_2-X_2Z_3,\ -X_3^4+Z_3^4.
\end{gathered}$$
Following Derksen's algorithm we set the variables $Z_1, Z_2, Z_3$ to zero and obtain all monomials of degree four. This already is a generating system for the invariant ring (a small check of this last step appears below).
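The Derksen step at the end of Example 4 can be checked mechanically; the following sympy fragment (our verification, not part of the original) substitutes $Z_1 = Z_2 = Z_3 = 0$ into the generating system listed above:

```python
# Setting Z = 0 in the 18 generators of Example 4: the three linear
# generators vanish, and the remaining ones give (up to sign) exactly the
# 15 monomials of degree four in X1, X2, X3.
from sympy import symbols

X1, X2, X3, Z1, Z2, Z3 = symbols('X1 X2 X3 Z1 Z2 Z3')

gens = [-X1**4 + Z1**4, -X1**3*X2 + Z1**3*Z2, -X3*X1**3 + Z1**3*Z3,
        -X2**2*X1**2 + Z2**2*Z1**2, -X3*X2*X1**2 + Z3*Z2*Z1**2,
        -X3**2*X1**2 + Z3**2*Z1**2, -X2**3*X1 + Z2**3*Z1,
        -X3*X2**2*X1 + Z2**2*Z3*Z1, -X1*X2*X3**2 + Z2*Z3**2*Z1,
        -X3**3*X1 + Z3**3*Z1, -X1*Z2 + X2*Z1, X3*Z1 - X1*Z3,
        -X2**4 + Z2**4, -X3*X2**3 + Z2**3*Z3, -X3**2*X2**2 + Z2**2*Z3**2,
        -X3**3*X2 + Z3**3*Z2, X3*Z2 - X2*Z3, -X3**4 + Z3**4]

IN = [g.subs({Z1: 0, Z2: 0, Z3: 0}) for g in gens]
IN = sorted({g for g in IN if g != 0}, key=str)
print(len(IN))  # 15: all monomials of degree four in X1, X2, X3
```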
Acknowledgements: We thank Rainer Steinwandt for [18,19].
References

1. Thomas Becker and Volker Weispfenning. Gröbner Bases: A Computational Approach to Commutative Algebra. In cooperation with Heinz Kredel. Graduate Texts in Mathematics. Springer, New York, 1993.
2. David Cox, John Little, and Donal O'Shea. Ideals, Varieties and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Undergraduate Texts in Mathematics. Springer, New York, 1992.
3. W. Decker and T. De Jong. Gröbner Bases and Invariant Theory. In Bruno Buchberger and Franz Winkler, editors, Gröbner Bases and Applications (Proc. of the Conference 33 Years of Gröbner Bases), volume 251 of London Mathematical Society Lecture Notes Series. Cambridge University Press, 1998.
4. Harm Derksen. Computation of reductive group invariants. Preprint, 1997.
5. David Eisenbud. Commutative Algebra with a View Toward Algebraic Geometry. Graduate Texts in Mathematics. Springer, New York, 1995.
6. P. Gianni, B. Trager, and G. Zacharias. Gröbner bases and primary decomposition of polynomial ideals. Journal of Symbolic Computation, 6:149–167, 1988.
7. M. Grassl, M. Rötteler, and Th. Beth. Computing Local Invariants of Quantum Bit Systems. Physical Review A, 58(3):1833–1839, September 1998.
8. Gregor Kemper. An Algorithm to Determine Properties of Field Extensions Lying over a Ground Field. IWR Preprint 9358, Heidelberg, October 1993.
9. Gregor Kemper. Das Noethersche Problem und generische Polynome. Dissertation, Universität Heidelberg, August 1994.
10. Gregor Kemper. Calculating Invariant Rings of Finite Groups over Arbitrary Fields. Journal of Symbolic Computation, 21(3):351–366, March 1996.
11. Jörn Müller-Quade and Thomas Beth. Computing the Intersection of Finitely Generated Fields. Presented on the ISSAC poster session, 1998. An ex