0 and a finite set R of rules of dimension m. A rewriting system (m, R) is terminating if there is no infinite sequence of rewrite steps based on R. Given a rewriting system (m, R), a VASP V of dimension m is in normal form if there is no VASP W such that V →_R W, and V has a normal form if there is a finite sequence of rewrite steps from V to some normal form.

3.2 Our Rewriting Rules and Strategies
When a rule interface is only a set of states, we give it by states labelled with capital letters Q, R, S, . . .. Given m > 0, we consider the rules given in Fig. 3, called Regular Sum (RSum), Backward Zip (BZip), Backward Swap (BSwap), Collapse (Col), Zed (Zed) and Backward Expansion (BExp). For each rule, valuations are divided: that is, the valuation (of the regular arcs) of one side is a division of the valuation (of the regular arcs) of the other side. Hence each rule is in fact a set of rules. By abuse of notation we consider rule sub-cases under the same name: BZip where one of the left-hand side regular arcs is erased (and the corresponding source and target are merged), and BExp without regular arcs on either side (where the corresponding source and target are merged). We define the rules Forward Zip (FZip), Forward Swap (FSwap) and Forward Expansion (FExp) as, respectively, BZip, BSwap and BExp applied on the reversed VASP. We denote by r⁻¹ the reversed relation of a rule r: r⁻¹ goes from the right-hand side of rule r to its left-hand side. Remark that the Zed, Col and RSum rules are their own reverses. Remark also that, by the definition of rules and rewrite steps, there is no arc other than those indicated to or from states which are not in a rule interface.

Lemma 3 (Key lemma). Let m > 0 and let r = (m, L ⊇ K ⊆ R) be one of the previously defined rules except Zed, including reversed and inverted rules. Let C be a configuration of the zero in-degree states of L (labelled Q, Q′ for Backward rules in Fig. 3), and let C′ be a configuration of the zero out-degree states of R. Then C ⇝_L C′ ⇔ C ⇝_R C′.
140
P. Jacob´e de Naurois and V. Mogbil
[Diagram omitted: Fig. 3 depicts the six rewriting rules RSum, BZip, BSwap, Col, Zed and BExp on interface states labelled Q, R, S, T.]

Fig. 3. Rewriting rules
Proof. Let r ∈ R. Since by definition C and C′ are configurations of the interface of rule r, they are configurations of R. The result is obtained by a case analysis, because a division of a split pair is the "reverse" of a division of a join pair, and conversely. Notably, the result is obvious for BSwap, Col and RSum, for all reversed Backward rules, and for inverse rules once proved for the corresponding rule. An interesting case is the BZip rule: the right implication is clear. For the converse, let us denote by c the regular arc of R, and by a and b the regular arcs of L. By definition of BZip we have a divided valuation: (v(a), v(b)) is a division of v(c). Let C = {(Q, x)}. If C ⇝_R C′ then x ≥ v(c) in R, and for any division (x1, x2) of x − v(c) we have C′ = {(R, x1), (S, x2)}. By Lemma 1 let us take a division (x1, x2) of x − v(c) such that x1 ≥ v(a) and x2 ≥ v(b). This is the requirement for reachability from C in L: we obtain C ⇝_L C′ ⇐ C ⇝_R C′. For the BZip rules, it is important to understand that only the requirement of divided valuations in VASP gives the left implication. Remark that only C ⇝_L C′ ⇒ C ⇝_R C′ holds for the rule Zed.

From a reachability point of view, we cannot use a rewriting system without knowing whether it is terminating, but (hyper- and) graph rewriting termination is undecidable [16]. Here we present a notion of strategies for a rewriting system that allows us to study termination of rewriting systems in a restricted case. Other methods are certainly useful, but we are only interested in reachability. A strategy for a rewriting system is a VASP transformation that defines when a rewrite step is performed and which rule it applies:

Definition 10 (Strategy). Given a VASP rewriting system S, a strategy is a function fS from the set of VASP to itself. A strategy fS is normalizing if, whenever V has a normal form, there is some n such that fS^n(V) is a normal form.
Given a rewriting system with a singleton rule set {r}, a strategy is usually a function that maps a VASP V with a redex to fS(V), the corresponding reduct obtained by one rewrite step from V. Sharper strategies give an order on the redexes to determine which redex is rewritten. A strategy implementation is often an algorithm for VASP traversal with a decision function choosing the rules to be applied. In what follows we use the set of rules R2 = {RSum, BZip⁻¹, FZip⁻¹, BSwap, FSwap, BExp}.
Rewriting Systems for Reachability in Vector Addition Systems with Pairs
141
We consider the following separation strategy using the R2 rules: a recursive function that maps a VASP V with a ribbon ρ to an isomorphic VASP where ρ is rewritten into a separated ribbon, by pushing every split arc away to the left of the join arcs (or conversely, pushing the join arcs away to the right, or by mixing both). The BExp rule allows us to cross, in the right direction, two arc pairs with "opposite" sharing. We are interested in the termination of the separation strategy, but the BExp rule may create new BExp redexes. However we have:

Lemma 4. Given a rewriting system (m, {BExp}), there is no infinite sequence of rewrite steps in a ribbon.

Proof. We give a sketch, generalizing VASP pairs of paired graphs to hyperarcs of hypergraphs. In this case there is a measure that decreases in every BExp redex context, so termination for this rewrite rule generalizes to hyperarcs. This gives us a bound on the number of BExp rewrite steps in the VASP, by simulating the rule for hyperarcs with a fixed maximal number of steps.
4 Reachability Relationship between VASP and VASS
We give a simple example illustrating how rewriting systems are used as a tool to obtain a reachability equivalence. With a VASP rewriting system we just need to consider an ad-hoc strategy to rewrite a separated ribbon into a VASS.

Lemma 5. Let V be a separated ribbon from {s} to {t} with valuation v. Let W be a VASP of the same dimension which consists of a single regular arc (s′, t′) valued by the sum Σ v(a) over all regular arcs a of V. We have: {(s, x)} ⇝_V {(t, y)} ⇔ {(s′, x)} ⇝_W {(t′, y)}.
Proof. Let V be a separated ribbon from {s} to {t} of dimension m > 0 and valuation v. Let W be the regular arc defined in the lemma hypothesis. Let V = (m, {RSum, Col, BZip, FZip}) be a rewriting system. Let U be the normal form of V obtained by the following strategy on V: first, from s we apply F = (→*_RSum →_BZip)* on the 2-tree with root {s}, and from t we apply the same strategy reversed, that is F′ = (→*_RSum →_FZip)*, on the reversed 2-tree with root {t}. Second, we apply →*_Col, and we finish with →*_RSum.

Corollary 2. The reachability problem for separated ribbons between states reduces to the reachability problem for VASS (via many-one reductions).

Now we give an example of a separation strategy which preserves reachability.

Lemma 6. Let V be a ribbon from {s} to {t} and let V = (m, R2) be a rewriting system. There is a separating strategy for V rewriting V into a separated ribbon W from {s′} to {t′} such that: {(s, x)} ⇝_V {(t, y)} ⇔ {(s′, x)} ⇝_W {(t′, y)}.

Proof. Let V be a ribbon from {s} to {t}. Let the separation strategy be:
For each shared source of pairs, call it d: if a path p from s to d is not a branch of a 2-tree from s, then apply BExp, using the other R2 rules in order to reduce the length of p and to reveal BExp redexes. Remark that if there is a subpath p′ of p from a shared target of a pair to a state d, then its length can always be reduced to zero by applying rules of R2 − {BExp}. It follows that in a non-separated ribbon, BExp redexes can always be revealed. If the d states to treat are chosen with smallest distance from s, then by Lemma 4 the strategy terminates. Remark that V rewrites a ribbon into a ribbon. So V is rewritten into a ribbon such that every shared-source paired state is in a branch of a 2-tree from s; in other words, this ribbon is separated. Remark that a ribbon is defined to be both a B-path and an F-path, and this is essential to ensure that the strategy terminates by building a separated ribbon: there is no Zed rule to apply to build a new BExp redex, so either one can continue the strategy preserving reachability, or we already have a separated ribbon.

Corollary 3. The reachability problem for ribbons between states reduces to the reachability problem for separated ribbons between states.

We easily extend Lemma 6 to ribbons from {s} to an arbitrary set T: the implemented separation strategy both terminates and normalizes into a separated ribbon from {s} to T (extended as expected), preserving reachability. This is generalizable to separation for ribbons between arbitrary sets when there is a bridge between S and T (Definition 4). We are interested in bridged ribbons because they are always associated to a positive promenade in a BVASS.

Lemma 7. Let V be a bridged ribbon from S to T and let V = (m, R2) be a rewriting system. There is a separating strategy for V rewriting V into a separated ribbon W from S to T such that: S ⇝_V T ⇔ S ⇝_W T.

Proof.
Let ρ be a ribbon from S to T with a bridge such that the (at most three) simply connected components are denoted by V_i^ρ, i ∈ I. Let i ∈ I. By definition V_i^ρ is a ribbon from Si to Ti where either Si or Ti is a singleton whose state is a state of the bridge. W.l.o.g. let Si = {si} be such a singleton. By the previous extension of Lemma 6, V_i^ρ is separable into W_i^ρ between the same sets such that reachability is preserved. Let W be the normal form, by the separation strategy, of the ribbon which consists of the bridge of ρ added to the ribbons W_i^ρ. Then W is the normal form of ρ, and again reachability is preserved.

Remark that there is a bridge in every ribbon from S to T if at least one of these sets is a singleton: one bridge is the arc or arc pair to or from the state of the singleton set. So bridged ribbons generalize ribbons from a singleton to a set. To compare bridged ribbon reachability to VASS reachability, we want to reduce reachability of arbitrary separated ribbons to VASS reachability. We have:
Lemma 8. Let ρ be a separated ribbon between arbitrary sets S and T. Let s and t be two states not in ρ. Let θ_{s,S} (respectively θ_{T,t}) be a VASP of the same dimension as ρ consisting of a binary tree of split pairs (respectively of join pairs) from {s} to S (respectively from T to {t}). Let W be the ribbon from {s} to {t} which consists of θ_{s,S} composed with ρ composed with θ_{T,t} (by the identity morphism on S and T). We have: if x (respectively y) is a division of {x_i}_{1≤i≤|S|} (respectively of {y_j}_{1≤j≤|T|}) then {(s_i, x_i) | s_i ∈ S} ⇝_ρ {(t_j, y_j) | t_j ∈ T} ⇔ {(s, x)} ⇝_W {(t, y)}.
Remark that the ribbon W is separable, hence there is a reduction between reachability for bridged ribbons and VASS reachability using a separation strategy. So by Lemmas 7 and 8 we have:

Corollary 4. The reachability problem for bridged ribbons reduces to the reachability problem for VASS (via many-one reductions). Hence the former is decidable.

We go a last step further with a strategy for VASP which are not ribbons: Repeat rewrite a ribbon between arbitrary sets by the separation strategy Until all ribbons are separated. This (too strong) strategy does not always terminate, sometimes for bad reasons: rules cannot be applied because of interface restrictions, for example when there is an arc to a node of the left-hand side of a rule whose target or source is not in the interface. Rewriting rules with interfaces which consist of all the states of the left-hand side are quite inextricable (from a reachability point of view). Thus we are even far from semi-decidability.

Lemma 9. Given a VASP V, if the separation strategy terminates for V in a normal form W, we have: S ⇝_V T ⇒ S ⇝_W T.

In such a normal form all ribbons are separated, so by Corollary 4 we have:

Proposition 2. Given a VASP V, if the separation strategy terminates for V, then the reachability problem for V with divisible initial and final configurations reduces to the VASS reachability problem.
5 Conclusion
We introduce a generalization of VASS called Vector Addition Systems with Pairs (VASP), obtained by pairing arcs with the same source or with the same target. These correspond to split and join transitions with a multiset of vectors. The reachability decision problem for VASP, RP_VASP, subsumes the one for BVASS (as the sub-case of VASP without split pairs), which is equivalent to the open MELL provability decision problem. There is also a natural simplification of RP_VASP that is not valid for BVASS.
We present graph rewriting systems in order to study paths in VASP. This tool permits reductions between restricted forms of VASP and VASS, preserving reachability properties. Notably, the reachability problem is decidable for VASP on which our separation strategy terminates. Other strategies, like a zipping one using the {RSum, BZip, BSwap, Col} rules and their reversed rules, can be used to obtain reachability for other kinds of VASP. With the zipping strategy the idea is to rewrite a ribbon starting from a source state and applying rules step by step on each outgoing arc (source-paired or not), synchronizing on each target-paired state by reducing the remaining branch of the ribbon before it. The other main way towards a reachability decision is to adapt the original proof of reachability for VASS to VASP. It seems approachable to obtain decidability via Karp and Miller "trees" for VASP.
References

1. Brázdil, T., Jancar, P., Kucera, A.: Reachability games on extended vector addition systems with states. CoRR abs/1002.2557 (2010)
2. de Groote, P., Guillaume, B., Salvati, S.: Vector addition tree automata. In: LICS, pp. 64–73. IEEE Computer Society, Los Alamitos (2004)
3. Demri, S., Jurdzinski, M., Lachish, O., Lazic, R.: The covering and boundedness problems for branching vector addition systems. In: Kannan, R., Kumar, K.N. (eds.) FSTTCS. LIPIcs, vol. 4, pp. 181–192 (2009)
4. Esparza, J., Nielsen, M.: Decidability issues for Petri nets – a survey. Bulletin of the EATCS 52, 244–262 (1994)
5. Gallo, G., Longo, G., Pallottino, S.: Directed hypergraphs and applications. Discrete Applied Mathematics 42(2), 177–201 (1993)
6. Ginsburg, S., Spanier, E.H.: Semigroups, Presburger formulas, and languages. Pacific Journal of Mathematics 16(2), 285–296 (1966)
7. Ginzburg, A., Yoeli, M.: Vector addition systems and regular languages. J. Comput. Syst. Sci. 20(3), 277–284 (1980)
8. Girard, J.-Y.: Linear logic. Theor. Comput. Sci. 50, 1–102 (1987)
9. Hopcroft, J.E., Pansiot, J.-J.: On the reachability problem for 5-dimensional vector addition systems. TCS 8, 135–159 (1979)
10. Karp, R.M., Miller, R.E.: Parallel program schemata. J. Comput. Syst. Sci. 3(2), 147–195 (1969)
11. Kosaraju, S.R.: Decidability of reachability in vector addition systems (preliminary version). In: STOC, pp. 267–281. ACM, New York (1982)
12. Lambert, J.-L.: A structure to decide reachability in Petri nets. Theor. Comput. Sci. 99(1), 79–104 (1992)
13. Mayr, E.W.: An algorithm for the general Petri net reachability problem. SIAM J. Comput. 13(3), 441–460 (1984)
14. Müller, H.: The reachability problem for VAS. In: Rozenberg, G., Genrich, H.J., Roucairol, G. (eds.) APN 1984. LNCS, vol. 188, pp. 376–391. Springer, Heidelberg (1985)
15. Parikh, R.: On context-free languages. J. ACM 13(4), 570–581 (1966)
16. Plump, D.: Termination of graph rewriting is undecidable. Fundam. Inform. 33(2), 201–209 (1998)
17. Reutenauer, C.: Aspects Mathématiques des Réseaux de Pétri. Masson (1989)
18. Verma, K.N., Goubault-Larrecq, J.: Karp-Miller trees for a branching extension of VASS. Discrete Mathematics & Theoretical Computer Science 7(1), 217–230 (2005)
19. Verma, K.N., Goubault-Larrecq, J.: Alternating two-way AC-tree automata. Inf. Comput. 205(6), 817–869 (2007)
The Complexity of Model Checking for Intuitionistic Logics and Their Modal Companions

Martin Mundhenk and Felix Weiß

Universität Jena, Institut für Informatik, Jena, Germany
{martin.mundhenk,felix.weiss}@uni-jena.de
Abstract. We study the model checking problem for logics whose semantics are defined using transitive Kripke models. We show that the model checking problem is P-complete for the intuitionistic logic KC. Interestingly, for its modal companion S4.2 we also obtain P-completeness even if we consider formulas with one variable only. This result is optimal, since model checking for S4 without variables is NC1-complete. The strongest variable-free modal logic with a P-complete model checking problem is K4. On the other hand, for KC formulas with one variable only we obtain a much lower complexity, namely LOGDCFL as an upper bound.
1 Introduction
We investigate the complexity of the model checking problem for intuitionistic propositional logics and for their modal companions. Intuitionistic propositional logic IPC (see e.g. [1]) is the part of classical propositional logic that goes without the use of the excluded middle a ∨ ¬a. We will use its semantical definition by Kripke models with a partially ordered set of states and a monotone valuation function. A straightforward upper bound follows from the Gödel-Tarski translation (see e.g. [2, p. 96]) that embeds intuitionistic logic into the modal logic S4. Since the model checking problem (given a formula and a model, does the model satisfy the formula, i.e. does the formula evaluate to "true" under the model?) for modal logic is in P [3], we obtain the same upper bound for the problem in intuitionistic logic. For classical propositional logic, the model checking problem can be solved in logarithmic space [4] and even better in alternating logtime [5]. Since the models for classical logic can be seen as a special case of Kripke models with one state only, we cannot expect such a low complexity for intuitionistic logic, where the models may consist of many states. More generally, we will consider the classical propositional logic PC, the intuitionistic logics LC (Gödel-Dummett logic, see [6]), KC (Jankov's logic, see [6]), IPC, and BPL (Visser's basic propositional logic [7]), and their respective modal companions S5, S4.3, S4.2, S4, and K4 (see e.g. [2] for an overview). Recall that PC ⊃ LC ⊃ KC ⊃ IPC ⊃ BPL. Our first hardness result (Theorem 2) is the P-hardness of the model checking problem for the superintuitionistic (or intermediate) logic KC. This hardness

A. Kučera and I. Potapov (Eds.): RP 2010, LNCS 6227, pp. 146–160, 2010. © Springer-Verlag Berlin Heidelberg 2010
[Diagram omitted: Fig. 1 arranges the logics K4, S4, S4.2, S4.3, S5, BPL, IPC, KC, LC, PC and their one-variable and variable-free fragments (subscripts 1 and 0) according to the legend classes P-complete, AC1-complete, "in LOGDCFL and NC1-hard", and NC1-complete, with the results for IPC1 from [10] and for PC, PC1 from [5].]

Fig. 1. Summary of results: the structure of the logics and the complexity of the model checking problem. Lower and upper bounds for the uncircled logics follow from their neighbourhoods, but non-trivial bounds are unknown.
result consequently also holds for IPC and BPL and their companions S4.2, S4, and K4. Hence, the well-known upper bound [3] turns out to also be the lower bound. Since the expressivity of intuitionistic logics seems to be much lower than that of their modal companions, it is somewhat surprising that all these logics have P-hard model checking problems. In fact, the satisfiability problems for S4.2 up to K4 are PSPACE-complete [8,9], whereas the satisfiability problem for intuitionistic logic has the same complexity as that for classical logic: both are NP-complete. We can point out some differences for the model checking problem that can be seen as a result of the greater expressivity of modal logics. This difference appears if we consider formulas with one variable only or without any variables. In Theorem 3 we show that the model checking problem remains P-hard for S4.2, even if we consider formulas with one variable only. For K4 we show P-hardness even for formulas without variables (Theorem 4). These results are in contrast to the recent result in [10] showing that the model checking problem for IPC with one variable only is AC1-complete. For KC with one variable only we will show that the complexity of model checking is even lower, namely in LOGDCFL (Theorem 7). Regarding the number of variables for S4.2 resp. S4, Theorem 3 is optimal: we show that model checking for the variable-free fragment of S4 is NC1-complete (Theorem 8).
Figure 1 summarizes our results. There, PC denotes classical propositional logic, and subscript 1 or 0 (e.g. S4.2₁) denotes the fragment with one variable only, resp. without variables. Technically, our hardness results use a reduction from the alternating graph accessibility problem Agap, one of the standard P-complete problems [11,12]. It can straightforwardly be logspace-reduced to the model checking problem for propositional modal logic by taking the alternating graph as the frame of a Kripke model (with an empty valuation function) and a formula essentially consisting of a sequence of 2 and 3 operators that simulates the search through the graph. This straightforward approach no longer works when we want to reduce to Kripke models with transitive frames, as for the modal logic S4 or intuitionistic propositional logic. On the one hand, making an alternating graph transitive destroys essential properties it has, and on the other hand, a logspace reduction does not have enough computational power to calculate the transitive closure of a directed graph.

This paper is organized as follows. In Section 2 we introduce the notations for the logics under consideration, and we show P-completeness of a graph accessibility problem for a special case of alternating graphs that will be used for our P-hardness proofs. In Section 3 we give P-hardness proofs for the model checking problems for KC, S4.2₁, and K4₀. The upper bounds are presented in Section 4. The resulting completeness results and conclusions are drawn in Section 5.
2 Preliminaries
Kripke Models. We will consider different propositional logics whose formulas are based on a countable set PROP of propositional variables (resp. atoms). A Kripke model is a triple M = (U, R, ξ), where U is a nonempty and finite set of states, R is a binary relation on U, and ξ : PROP → P(U) is the valuation function. Informally spoken, it assigns to any variable the set of states in which this variable is satisfied. (U, R) can also be seen as a directed graph; it is called a frame in this context.

Modal Propositional Logic. The language ML of modal logic is the set of all formulas of the form ϕ ::= ⊥ | p | ϕ → ϕ | 3ϕ, where p ∈ PROP. As usual, we use the abbreviations ¬ϕ := ϕ → ⊥, ⊤ := ¬⊥, ϕ ∨ ψ := (¬ϕ) → ψ, ϕ ∧ ψ := ¬(ϕ → ¬ψ), and 2ϕ := ¬3¬ϕ. The semantics is defined via Kripke models. Given a model M = (U, R, ξ) and a state s ∈ U, the satisfaction relation for modal logics |=M is defined as follows.

M, s ⊭M ⊥
M, s |=M p iff s ∈ ξ(p), for p ∈ PROP,
M, s |=M ϕ → ψ iff M, s ⊭M ϕ or M, s |=M ψ,
M, s |=M 3ϕ iff ∃t ∈ U : sRt and M, t |=M ϕ.
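As a concrete illustration of these clauses, here is a minimal model-checker sketch in Python; the tuple encoding of formulas and all names are illustrative choices of ours, not from the paper. Evaluating every subformula at every state stays polynomial in |ϕ| · |U|, in line with the P upper bound of [3].

```python
def sat(U, R, xi, s, phi):
    """Does M, s |= phi hold for M = (U, R, xi)?  R is a set of pairs,
    xi maps variables to the set of states where they hold."""
    op = phi[0]
    if op == "bot":                      # M, s never satisfies bottom
        return False
    if op == "var":                      # s must lie in xi(p)
        return s in xi.get(phi[1], set())
    if op == "imp":                      # classical implication at s
        return (not sat(U, R, xi, s, phi[1])) or sat(U, R, xi, s, phi[2])
    if op == "dia":                      # some R-successor satisfies phi
        return any(sat(U, R, xi, t, phi[1]) for t in U if (s, t) in R)
    raise ValueError("unknown connective: %r" % (op,))

# The abbreviations from the text:
def neg(a): return ("imp", a, ("bot",))
def box(a): return neg(("dia", neg(a)))   # 2a := not 3 not a
```

For example, in the two-state model U = {0, 1}, R = {(0, 1)}, ξ(p) = {1}, state 0 satisfies both 3p and 2p, while p itself fails at 0.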
A formula ϕ is satisfied by model M in state s iff M, s |=M ϕ. If it is satisfied by M in every state s of M, then we write M |=M ϕ. The modal logic defined in this way is called K (after Saul Kripke), and it is the weakest normal modal logic. We will consider the stronger modal logics K4, S4, S4.2, S4.3, and S5. The formulas in all these logics are the same as for ML. Since we are interested in formula evaluation, we use the semantics defined by Kripke models. They will be defined by properties of the frame (i.e. graph) (U, R) that is part of the model. A frame (U, R) is reflexive if xRx for all x ∈ U, and it is transitive if for all a, b, c ∈ U it follows from aRb and bRc that aRc. A reflexive and transitive frame is called a preorder. If a preorder (U, R) has the additional property that for all a, b ∈ U there exists a c ∈ U with aRc and bRc, then (U, R) is called a directed preorder. If for all a, b ∈ U holds aRb or bRa, then (U, R) is called a linear preorder. The semantics of several modal logics can be defined by restricting the class of Kripke frames under consideration. The semantics of K4 is defined by transitive frames. This means that a formula α is a theorem of K4 if and only if M |=M α for all models M whose frame is transitive. The semantics of S4 is defined by preorders, of S4.2 by directed preorders, of S4.3 by linear preorders, and of S5 by equivalence relations (symmetric preorders). For any logic L, let L_i denote its fragment with i variables only. The fragment L₀ has no variables but only the constant ⊥.

Intuitionistic Propositional Logic. The language IPC of intuitionistic propositional logic is the same as that of propositional logic PC, i.e. it is the set of all formulas of the form ϕ ::= ⊥ | p | ϕ ∧ ϕ | ϕ ∨ ϕ | ϕ → ϕ, where p ∈ PROP. As usual, we use the abbreviations ¬ϕ := ϕ → ⊥ and ⊤ := ¬⊥. Because of the semantics of intuitionistic logic, one cannot express ∧ or ∨ using → and ⊥.
The semantics is defined via Kripke models M = (U, R, ξ) that fulfill certain restrictions. Firstly, R is a preorder on U (written ≤ below), and secondly, the valuation function ξ : PROP → P(U) is monotone in the sense that for every p ∈ PROP and a, b ∈ U: if a ∈ ξ(p) and a ≤ b, then b ∈ ξ(p). We will call such models intuitionistic. Given an intuitionistic model M = (U, ≤, ξ) and a state s ∈ U, the satisfaction relation for intuitionistic logics |=I is defined as follows.

M, s ⊭I ⊥
M, s |=I p iff s ∈ ξ(p), for p ∈ PROP,
M, s |=I ϕ ∧ ψ iff M, s |=I ϕ and M, s |=I ψ,
M, s |=I ϕ ∨ ψ iff M, s |=I ϕ or M, s |=I ψ,
M, s |=I ϕ → ψ iff ∀n ≥ s: if M, n |=I ϕ then M, n |=I ψ.

An important property of intuitionistic logic is the monotonicity property: if M, s |=I ϕ then M, n |=I ϕ for all n ≥ s, for all formulas ϕ.
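The only clause that differs computationally from the classical case is implication, which quantifies over all successors n ≥ s. A minimal sketch (encoding and names are ours, assuming ≤ is given reflexive and transitive as a set of pairs and ξ is monotone):

```python
def sat_i(U, leq, xi, s, phi):
    """Does M, s |=I phi hold for the intuitionistic model M = (U, leq, xi)?"""
    op = phi[0]
    if op == "bot":
        return False
    if op == "var":
        return s in xi.get(phi[1], set())
    if op == "and":
        return sat_i(U, leq, xi, s, phi[1]) and sat_i(U, leq, xi, s, phi[2])
    if op == "or":
        return sat_i(U, leq, xi, s, phi[1]) or sat_i(U, leq, xi, s, phi[2])
    if op == "imp":
        # quantifies over ALL n >= s, including s itself (leq is reflexive)
        return all((not sat_i(U, leq, xi, n, phi[1]))
                   or sat_i(U, leq, xi, n, phi[2])
                   for n in U if (s, n) in leq)
    raise ValueError("unknown connective: %r" % (op,))
```

This makes the failure of the excluded middle tangible: with U = {0, 1}, 0 ≤ 1 and p true only at 1, the formula p ∨ ¬p fails at state 0, since p fails at 0 while ¬p (i.e. p → ⊥) fails there too because of the successor 1.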
int. logic   modal companion   frame properties
BPL          K4                transitive
IPC          S4                preorder
KC           S4.2              directed preorder
LC           S4.3              linear preorder
PC           S5                equivalence relation

Fig. 2. Intuitionistic logics, their modal companions, and the common frame properties
A formula ϕ is satisfied by an intuitionistic model M in state s iff M, s |=I ϕ. Intuitionistic propositional logic IPC is the set of IPC-formulas that are satisfied by every intuitionistic model. Notice that IPC is a proper subset of the tautologies of classical propositional logic PC.¹ The superintuitionistic (or intermediate) logics KC and LC are also subsets of the tautologies of classical propositional logic, but proper supersets of IPC. Syntactically, KC results from adding the weak law of the excluded middle ¬a ∨ ¬¬a to IPC. Its semantics is defined by Kripke frames that are directed preorders, as for S4.2. LC (also called Gödel-Dummett logic) results syntactically from adding (a → b) ∨ (b → a) to IPC. Its semantics is defined by Kripke frames that are linear preorders, as for S4.3. The logic BPL is Visser's basic propositional logic [7]. Its semantics is defined by transitive (not necessarily reflexive) Kripke models with monotone valuation functions. Hence it holds that BPL ⊆ IPC. Finally, the classical propositional logic PC can syntactically be seen as IPC plus the law of the excluded middle a ∨ ¬a. Its semantics is defined by Kripke frames that are equivalence relations, as for S5. Notice that in a Kripke frame that is an equivalence relation with a monotone valuation function, all equivalent states satisfy exactly the same formulas. Therefore, evaluating a formula ϕ in a state w of such a model is the same as evaluating ϕ in the classical propositional sense under the assignment in which exactly those variables p with w ∈ ξ(p) are set to true. The Gödel-Tarski translation (see e.g. [2, p. 96]) maps any IPC-formula α to a modal formula by inserting a 2 before every implication and every atom. For a formula α, let α^GT be its Gödel-Tarski translation. The goal of this translation is to preserve validity, i.e., α is a theorem of IPC (resp. BPL, KC, LC, PC) iff α^GT is a theorem of S4 (resp. K4, S4.2, S4.3, S5).
Therefore, S4 (resp. K4, S4.2, S4.3, S5) is called a modal companion of IPC (resp. BPL, KC, LC, PC). Figure 2 gives an overview of the intuitionistic logics and their modal companions used here. The Gödel-Tarski translation also preserves satisfaction in the different logics.

Lemma 1. Let α be a formula from IPC, and let M be an intuitionistic model with state s. Then M, s |=I α if and only if M, s |=M α^GT.
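The translation itself is a simple structural recursion: a 2 is inserted before every atom and every implication, and the other connectives are translated homomorphically. A sketch over the same tuple encoding used above (ours, not the paper's), with 2 written as a primitive `"box"` constructor for readability:

```python
def gt(phi):
    """Goedel-Tarski translation: prefix every atom and implication with box."""
    op = phi[0]
    if op == "bot":
        return ("bot",)
    if op == "var":
        return ("box", phi)                       # atoms get a box
    if op == "and":
        return ("and", gt(phi[1]), gt(phi[2]))    # unchanged structure
    if op == "or":
        return ("or", gt(phi[1]), gt(phi[2]))
    if op == "imp":
        return ("box", ("imp", gt(phi[1]), gt(phi[2])))  # implications get a box
    raise ValueError("unknown connective: %r" % (op,))
```

For instance, p → q is mapped to 2(2p → 2q).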
¹ The satisfiable formulas in intuitionistic logic are the same as in classical propositional logic.
Model Checking Problems. This paper examines the model checking problems L-Mc for logics L whose formulas are evaluated on Kripke models with different properties.

Problem: L-Mc
Input: ⟨ϕ, M, s⟩, where ϕ is an L-formula, M = (U, R, ξ) is a Kripke model for L, and s ∈ U is a state
Question: Is ϕ satisfied by M in state s?
We assume that formulas and Kripke models are encoded in a straightforward way. This means that a formula is given as text, and the graph (U, R) of a Kripke model is given by its adjacency matrix, which takes |U|² bits. Therefore, only finite Kripke models can be considered. Notice that all instances ⟨ϕ, M, s⟩ of IPC-Mc have a graph (U, R) contained in M that is a preorder; instances without this property can be assumed to be rejected. The same holds for S4-Mc and S4₁-Mc. Accordingly, KC-Mc, S4.2-Mc, and S4.2₁-Mc (resp. LC-Mc and S4.3-Mc) only have instances where the graph underlying the model is a directed preorder (resp. linear preorder). Since we only consider finite models, every directed preorder must have a maximal element. Therefore, it can easily be decided whether the model has the order property under consideration.

Complexity. We assume familiarity with the standard notions of complexity theory as, e.g., defined in [13]. In particular, we will show results for the classes LOGDCFL and P. The notion of reducibility we use is the logspace many-one reduction ≤log_m. The Gödel-Tarski translation can be seen as a reduction between the model checking problems for intuitionistic logics and their modal companions, namely BPL-Mc ≤log_m K4-Mc, IPC-Mc ≤log_m S4-Mc, KC-Mc ≤log_m S4.2-Mc, LC-Mc ≤log_m S4.3-Mc, and PC-Mc ≤log_m S5-Mc. The respective reducibilities also hold for the model checking problems for formulas with any restricted number of variables. LOGDCFL is the class of sets that are ≤log_m-reducible to deterministic context-free languages. It is also characterized as the sets decidable by a deterministic Turing machine in polynomial time and logarithmic space with the additional use of a stack. The inclusion structure of the classes under consideration is as follows.
NC1 ⊆ L ⊆ LOGDCFL ⊆ AC1 ⊆ P

Here L denotes logspace; the formula value problem for propositional logic is complete for NC1 (= alternating logarithmic time) [5], and the model checking problem for IPC₁ is complete for AC1 (= alternating logspace with a logarithmically bounded number of alternations) [10].

P-complete problems. Chandra, Kozen, and Stockmeyer [11] have shown that the alternating graph accessibility problem Agap is P-complete. In [12] it is mentioned that P-completeness also holds for a bipartite version. An alternating graph G = (V, E) is a bipartite directed graph where V = V∃ ∪ V∀ are the partitions of V. Nodes in V∃ are called existential nodes, and
M. Mundhenk and F. Weiß
nodes in V∀ are called universal nodes. The property apath_G(x, y) for nodes x, y ∈ V is defined as follows.

1) apath_G(x, x) holds for all x ∈ V
2a) for x ∈ V∃: apath_G(x, y) iff ∃z ∈ V∀: (x, z) ∈ E and apath_G(z, y)
2b) for x ∈ V∀: apath_G(x, y) iff ∀z ∈ V∃: if (x, z) ∈ E then apath_G(z, y)

The problem Agap consists of directed bipartite graphs G and nodes s, t that satisfy the property apath_G(s, t). Notice that in bipartite graphs existential and universal nodes alternate strictly.

Problem: Agap
Input: G, s, t, where G is a directed bipartite graph
Question: does apath_G(s, t) hold?
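The three clauses define apath_G as a least fixpoint, which suggests a direct polynomial-time evaluation. The following is an illustrative sketch (the function and variable names are ours, not from the paper) that iterates the clauses until stabilization:

```python
from itertools import product

def apath(v_exists, v_forall, edges, s, t):
    """Least-fixpoint evaluation of apath_G on an alternating graph
    G = (v_exists ∪ v_forall, edges); edges is a set of directed pairs."""
    nodes = v_exists | v_forall
    succ = {v: {w for (u, w) in edges if u == v} for v in nodes}
    ap = {(x, x) for x in nodes}          # clause 1): apath(x, x)
    changed = True
    while changed:
        changed = False
        for x, y in product(nodes, nodes):
            if (x, y) in ap:
                continue
            if x in v_exists:             # clause 2a): some successor works
                ok = any((z, y) in ap for z in succ[x])
            else:                         # clause 2b): all successors work
                ok = all((z, y) in ap for z in succ[x])
            if ok:
                ap.add((x, y))
                changed = True
    return (s, t) in ap
```

Each round only adds pairs, so the loop runs at most |V|² times; this mirrors the generic polynomial-time algorithm behind Theorem 1, but it is not meant to be a logspace procedure.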
Theorem 1. [11,12] Agap is P-complete under ≤log_m-reductions.

For our purposes, we need an even more restricted variant of Agap. We require that the graph is sliced. An alternating slice graph G = (V, E) is a directed bipartite acyclic graph with a bipartitioning V = V∃ ∪ V∀ and a further partitioning V = V1 ∪ V2 ∪ · · · ∪ Vm (m slices, Vi ∩ Vj = ∅ if i ≠ j), where V∃ = ⋃_{i≤m, i odd} Vi and V∀ = ⋃_{i≤m, i even} Vi, such that E ⊆ ⋃_{i=1,2,...,m−1} (Vi × Vi+1) — i.e., all edges go from slice Vi to slice Vi+1 (for i = 1, 2, . . . , m − 1). Finally, we require that all nodes in a slice graph except those in the last slice Vm have outdegree > 0.

Problem: AsAgap
Input: G, s, t, where G = (V∃ ∪ V∀, E) is a slice graph with slices V1, . . . , Vm, and s ∈ V1 ∩ V∃, t ∈ Vm ∩ V∀
Question: does apath_G(s, t) hold?
It is not hard to see that this version of the alternating graph accessibility problem remains P-complete.

Lemma 2. AsAgap is P-complete under ≤log_m-reductions.

Sketch of Proof. AsAgap is in P, since it is a special case of Agap, which is known to be in P, and since instances G, s, t where G is not a slice graph, or s ∉ V1 ∩ V∃, or t ∉ Vm ∩ V∀ can easily be identified. In order to show P-hardness of AsAgap, it suffices to find a reduction Agap ≤log_m AsAgap. For an instance G, s, t of Agap where G has n nodes it is straightforward to construct an instance Gn, s′, t′ of AsAgap using the considerations from above. If Gn, s′, t′ ∈ AsAgap, then there exists a tree that is a subgraph of Gn and witnesses this fact. This tree can directly be transformed into a witness for G, s, t ∈ Agap. If G, s, t ∈ Agap, this is also witnessed by a (finite) tree T that can be seen to consist of copies of nodes and edges of G. This tree can be trimmed in a way that on every path from the root to a leaf, every node appears at most once. Hence T induces a tree that witnesses Gn, s′, t′ ∈ AsAgap.
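The construction of Gn is only referenced here ("the considerations from above"). Purely as an illustration of the slicing idea, one can unroll a graph into slices of node copies, with every original edge connecting consecutive slices. The helper below is hypothetical and omits the padding and parity bookkeeping that the actual reduction needs to meet the outdegree and endpoint conditions:

```python
def unroll(nodes, edges, m):
    """Illustrative unrolling: copy every node v into each slice i as (v, i);
    an edge (v, w) yields ((v, i), (w, i + 1)) for consecutive slices only.
    The result trivially satisfies the slice-graph edge condition."""
    slices = [{(v, i) for v in nodes} for i in range(1, m + 1)]
    new_edges = {((v, i), (w, i + 1)) for (v, w) in edges for i in range(1, m)}
    return slices, new_edges
```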
The Complexity of Model Checking for Intuitionistic Logics
3 Lower Bounds
We now give hardness results for the model checking problem. The P-hardness proofs use logspace reductions from the P-hard problem AsAgap (Lemma 2). The slice graph is transformed into a frame to be used in an instance of the model checking problem. Since the semantics of the logics under consideration are defined by Kripke models with frames that are transitive (and reflexive), we need to produce frames that are transitive (and reflexive). The straightforward way would be to take the transitive closure of a slice graph. But we cannot compute the transitive closure of a directed graph in logspace. Fortunately, slice graphs can easily be made transitive by adding all edges that "jump" from a node to a node that is at least two slices higher. Clearly, the resulting graph is no longer a slice graph, but it is a transitive supergraph of the transitive closure of the slice graph. We then use the valuation function in order to let us rediscover in which slice a state is.

Fig. 3. A slice graph and its pseudo-transitive closure
Definition 1. Let V≥i := ⋃_{j=i,i+1,...,m} Vj, and V≤i := ⋃_{j=1,2,...,i} Vj. The pseudo-transitive closure of a slice graph G = (V, E) with V = V1 ∪ . . . ∪ Vm is the graph G′ = (V, E′) where

E′ := E ∪ ⋃_{i=1,2,...,m−2} (Vi × V≥i+2).

The reflexive and pseudo-transitive closure of G is the graph G″ = (V, E″) where

E″ := E′ ∪ {(v, v) | v ∈ V}.
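Definition 1 can be computed slice by slice; the following is an illustrative sketch (our function names), not the logspace procedure the reduction actually relies on:

```python
def pseudo_transitive_closure(slices, E):
    """slices: list of slice node sets [V1, ..., Vm] in order;
    E: edge set of a slice graph.  Returns E' of Definition 1, i.e. E plus
    all edges jumping from a slice to any slice at least two higher."""
    m = len(slices)
    E2 = set(E)
    for k in range(m - 2):                       # source slice V_{k+1} (1-based)
        targets = set().union(*slices[k + 2:])   # nodes in V_{>= (k+1)+2}
        E2 |= {(v, w) for v in slices[k] for w in targets}
    return E2

def reflexive_pseudo_transitive_closure(slices, E):
    # Add a self-loop on every node to obtain E'' from E'.
    E2 = pseudo_transitive_closure(slices, E)
    return E2 | {(v, v) for s in slices for v in s}
```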
An example of a slice graph and its pseudo-transitive closure is shown in Figure 3.

Theorem 2. KC-Mc — i.e. the model checking problem for KC — is P-hard.
Fig. 4. A slice graph G, and the model MG as constructed in the proof of Theorem 2. Pseudo-transitive edges are drawn dashed, and reflexive edges are not drawn for simplicity.
Sketch of Proof. We show AsAgap ≤log_m KC-Mc. The P-hardness of KC-Mc then follows from Lemma 2. For simplicity, we informally sketch the ideas for the reduction AsAgap ≤log_m IPC-Mc. Given an AsAgap instance G, s, t where G has m slices, let (U, R) be its reflexive and pseudo-transitive closure. The valuation function ξ is defined for the variables t, a1, . . . , am as follows: t holds exactly in the state t of the graph (ξ(t) = {t}), and the variables a1, . . . , ai hold in slice i (for i = 1, 2, . . . , m), i.e., ξ(ai) = V≥i. This yields the Kripke model MG = (U, R, ξ). Figure 4 shows a slice graph G with m = 4 slices and the Kripke model MG = (U, R, ξ) obtained from it. The fat lines indicate that apath_G(s, t) holds. The graph (U, R) is the reflexive and pseudo-transitive closure of G. The dashed lines in Figure 4 are the pseudo-transitive edges; the reflexive edges are not depicted. The formulas ψ1, . . . , ψm, ψm+1 are inductively defined as follows.

1. ψm+1 := t, and
2. ψj := ψj+1 → aj+1 for all j = m, m − 1, . . . , 1.

Notice that ψi = (· · · ((t → am+1) → am) → · · · → ai+2) → ai+1. Therefore, ψi is satisfied in all slices where ai+1 is satisfied, i.e. the slices V≥i+1. In slice Vi, ψi and ψi+1 behave like mutual complements. Say that a state v is good if apath_G(v, t) holds, and otherwise it is bad. It turns out that the good and the bad states can be distinguished using the formulas ψi as follows.

Claim. For all i = 1, 2, . . . , m and all w ∈ Vi holds:
1. if i is odd: apath_G(w, t) iff MG, w ⊨_I ψi+1, and
2. if i is even: apath_G(w, t) iff MG, w ⊨_I ψi.

For our example, this means the following.
slice(s):  in every good state holds:              in every bad state holds:
4,3:       ⊨_I t → a5                              ⊭_I t → a5
3,2:       ⊨_I (t → a5) → a4                       ⊭_I (t → a5) → a4
2,1:       ⊨_I ((t → a5) → a4) → a3                ⊭_I ((t → a5) → a4) → a3
1:         ⊨_I (((t → a5) → a4) → a3) → a2         ⊭_I . . .
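The relation ⊨_I in this example is the standard intuitionistic forcing relation. As an illustration (the encoding of formulas as nested tuples is ours), it can be checked directly from the semantic clauses on a finite model whose accessibility relation R is reflexive and transitive:

```python
def forces(R, xi, w, phi):
    """Intuitionistic forcing M, w |= phi; R is a set of pairs, assumed
    reflexive and transitive; xi maps variables to sets of states.
    Formulas: ('var', p), ('bot',), ('and', a, b), ('or', a, b), ('imp', a, b)."""
    kind = phi[0]
    if kind == 'var':
        return w in xi.get(phi[1], set())
    if kind == 'bot':
        return False
    if kind == 'and':
        return forces(R, xi, w, phi[1]) and forces(R, xi, w, phi[2])
    if kind == 'or':
        return forces(R, xi, w, phi[1]) or forces(R, xi, w, phi[2])
    if kind == 'imp':
        # an implication must hold in every R-successor (w included, by reflexivity)
        return all(not forces(R, xi, v, phi[1]) or forces(R, xi, v, phi[2])
                   for (u, v) in R if u == w)
    raise ValueError(phi)
```

Note how the clause for → quantifies over all successors; this is exactly what makes the ψi formulas sensitive to the slice structure of the model.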
Since G, s, t ∈ AsAgap iff s is a good state, it now follows that G, s, t ∈ AsAgap if and only if MG, s ⊨_I ψ1, i.e. ψ1, MG, s ∈ IPC-Mc. By the simplicity of the construction it follows that AsAgap ≤log_m IPC-Mc. In order to make this reduction work for KC-Mc, we add an additional top state, to which every state is related and in which every variable is satisfied. It follows immediately from Lemma 1 that the model checking problem for S4.2—the modal companion of KC—is P-hard, too. In fact, we can improve the result and obtain P-hardness for the model checking problem for S41—i.e. the fragment of S4 with formulas with one variable only. This result is optimal in the sense that the model checking problem for S40 is easy to solve. A formula without any variables is either satisfied by every model w.r.t. S4, or it is satisfied by no model. This is because ◊⊤ (resp. □⊤) is satisfied by every state in every model, and ◊⊥ (resp. □⊥) is satisfied by no state in any model. Essentially, the modal operators can be ignored and the remaining formula can be evaluated like a classical propositional formula—this problem is in NC1 [5].

Theorem 3. S4.21-Mc is P-hard—i.e. the model checking problem for S4.2 is P-hard, even if we consider modal formulas with one variable only.

Sketch of Proof. We show that AsAgap ≤log_m S4.21-Mc. Since AsAgap is P-hard (Lemma 2), the P-hardness of S4.21-Mc follows. For space reasons, we informally sketch the ideas for the reduction AsAgap ≤log_m S4.21-Mc below. Let G, s, t be an instance of AsAgap with a slice graph G with m slices (m even). First, we define the valuation function so that a holds in all nodes in all even slices. In order to be able to distinguish the goal node t from the other nodes, t gets a successor t′, and t′ is the only node in the new (m + 1)-st slice. Finally, we add a slice Vm+2 with some nodes between which we also have edges. For all nodes in the other slices we add edges to all nodes in Vm+2.
By the choice of edges in Vm+2 there is a node h that is the top node of this construction. We choose the valuation ξ for the nodes in Vm+2 in a way that a certain formula γ is satisfied in all states in Vm+2, and in all other states it is not satisfied. For the remaining slices V1, . . . , Vm+1 it holds that Vi ⊆ ξ(a) iff i is even. Using this alternation between slices that satisfy a and slices that satisfy ¬a, we can estimate the slice to which a state belongs, using the inductively defined formulas δi (for i = 1, 2, . . . , m):

1. δm := ◊(a ∧ ¬γ)
2. for odd i, 1 ≤ i < m: δi := ◊(¬a ∧ δi+1)
   for even i, 1 ≤ i < m: δi := ◊(a ∧ δi+1)

For x ∈ V≤m we now have that MG, x ⊨_M δi iff x ∈ V≤i.
The goal state t is the only state in Vm that satisfies a ∧ ◊(¬a ∧ ¬γ). Using the δi formulas to verify an upper bound for the slice of a state, we can now simulate the alternating graph accessibility problem by the following formulas.

1. λm := a ∧ ◊(¬a ∧ ¬γ)
2. for odd i < m: λi := ¬a ∧ ◊(δi+1 ∧ λi+1)
   for even i < m: λi := a ∧ □(δi+1 → λi+1)

It follows that G, s, t ∈ AsAgap iff MG, s ⊨_M λ1, i.e. λ1, MG, s ∈ S4.21-Mc. Since the construction of MG and λ1 from G can be computed in logarithmic space, it follows that AsAgap ≤log_m S4.21-Mc.

The reduction in the proof of Theorem 3 is not suitable for intuitionistic logics, since the constructed Kripke model lacks the monotonicity property of the variables. Moreover, in that proof we make extensive use of negation, which would have a very different meaning in intuitionistic logics. In Theorem 4 we show P-hardness of the model checking problem for the modal logic K4, even if we consider formulas without any variables.

Theorem 4. K40-Mc is P-hard.

Sketch of Proof. The P-hardness of the model checking problem for the modal logic K40 can easily be obtained using the P-hardness of AsAgap from Lemma 2. The reduction from AsAgap to K4-Mc works as follows. Let G, s, t be an instance of AsAgap where G is a slice graph with m slices. Define MG = (U, R, ξ) as follows.

– (U, R′) is the pseudo-transitive closure of G.
– R := R′ ∪ {(v, v) | v ≠ t a vertex in the top slice Vm of G}.

Informally spoken, the model MG is the pseudo-transitive closure of G in which every state in the last slice except the state t has an edge to itself. We define ϕG as follows.

– αi := ◊ · · · ◊□⊥ with m − i ◊'s, for i ∈ {2, . . . , m − 1}.
– ϕm−1 := ◊□⊥
  for odd i, m − 1 > i ≥ 1: ϕi := ◊(αi+1 ∧ ϕi+1)
  for even i, m > i > 1: ϕi := □(αi+1 → ϕi+1)
– ϕG := ϕ1

Notice that □⊥ is satisfied only in t because t is the only state without any successor. The subformula αi is satisfied in a state w if there is a path of m − i steps from w to t.

For this reason M, w ⊨_M αi implies w ∈ V≤i. With a straightforward induction it can be shown that for all w ∈ V≤i holds: MG, w ⊨_M ϕi iff apath_G(w, t). Hence it follows that G, s, t ∈ AsAgap iff MG, s ⊨_M ϕG.
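The classical modal satisfaction relation ⊨_M used in these reductions can be evaluated just as directly; a minimal sketch (our encoding of formulas as nested tuples), on which one can check that □⊥ holds exactly at successor-free states, as the proof observes:

```python
def sat(R, xi, w, phi):
    """Classical modal satisfaction M, w |= phi; R is a set of pairs,
    xi maps variables to sets of states.  Formulas: ('var', p), ('bot',),
    ('not', a), ('imp', a, b), ('dia', a), ('box', a)."""
    kind = phi[0]
    if kind == 'var':
        return w in xi.get(phi[1], set())
    if kind == 'bot':
        return False
    if kind == 'not':
        return not sat(R, xi, w, phi[1])
    if kind == 'imp':
        return (not sat(R, xi, w, phi[1])) or sat(R, xi, w, phi[2])
    succ = [v for (u, v) in R if u == w]
    if kind == 'dia':                 # ◊: some successor satisfies the body
        return any(sat(R, xi, v, phi[1]) for v in succ)
    if kind == 'box':                 # □: all successors satisfy the body
        return all(sat(R, xi, v, phi[1]) for v in succ)
    raise ValueError(phi)
```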
4 Upper Bounds
We give upper bounds for the complexity of the model checking problem for the logics under consideration. For S4, the model checking problem is in P [3]. By the properties of the Gödel-Tarski embedding of IPC into S4 (Lemma 1), the same upper bound follows immediately for IPC. The same holds for the more general logics BPL and K4.

Theorem 5. [3] The model checking problem for K4 and for BPL is in P.

Consequently, the model checking problems for the superintuitionistic logics and their modal companions can also be solved in polynomial time. We now consider logics for which this can be done even better.

Theorem 6. The model checking problem for LC is in LOGDCFL.

Proof. The idea is as follows. Let M = (U, ≼, ξ) be an LC-model. This means that ξ is monotone and (U, ≼) is a total preorder. For simplicity of notation we assume that U = {1, 2, . . . , n} and that ≼ orders these states in the intuitive way, namely 1 ≼ 2 ≼ 3 ≼ · · · ≼ n. Because of the monotonicity of intuitionistic logic, for every formula α there exists an iα ∈ {1, 2, . . . , n, n + 1} such that α is not satisfied in states 1, 2, . . . , iα − 1 and α is satisfied in states iα, iα + 1, . . . , n. If iα = n + 1, then α is not satisfied in any of the states 1, 2, . . . , n. We define a function g that maps each formula to this value. It can be defined inductively as follows.

(1) g(⊥) = n + 1
(2) for atoms α = a: g(a) = min({i | i ∈ ξ(a)} ∪ {n + 1})
(3) for α = β ∧ γ: g(β ∧ γ) = max(g(β), g(γ))
(4) for α = β ∨ γ: g(β ∨ γ) = min(g(β), g(γ))
(5) for α = β → γ: g(β → γ) = g(γ) if g(β) < g(γ), and 1 otherwise

In order to decide M, 1 ⊨_I α we calculate g(α) and check whether this value equals 1. The calculation of g(α) can be done by a depth-first search through the formula, which we consider here as a tree. The "leaves" of this tree are variables resp. ⊥. The g-values of these leaves can easily be computed in logarithmic space by inspecting the valuation function ξ. Every internal node of this tree represents a subformula of α.
The g-value of each of these nodes can be computed using the g-values of its sons as described by the inductive definition of g above. Altogether, this search can be performed deterministically in polynomial time within logarithmic space and an additional stack. This shows that the model checking problem for LC is in LOGDCFL. The model checking problem for KC1 can be reduced to that of LC1 , and by Theorem 6 it also has LOGDCFL as upper bound. The reduction relies on algebraic properties of KC1 according to [14,15] and is left out here for space reasons.
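The inductive clauses (1)–(5) of the proof of Theorem 6 translate directly into code. An illustrative sketch (our encoding of formulas as nested tuples); deciding M, 1 ⊨_I α then amounts to checking g(α) == 1:

```python
def g(phi, n, xi):
    """Threshold function from the proof of Theorem 6 for an LC-model with
    states 1..n; xi maps atoms to the (upward-closed) sets where they hold.
    Formulas: ('bot',), ('var', a), ('and', a, b), ('or', a, b), ('imp', a, b)."""
    kind = phi[0]
    if kind == 'bot':                                  # clause (1)
        return n + 1
    if kind == 'var':                                  # clause (2)
        return min(set(xi.get(phi[1], set())) | {n + 1})
    if kind == 'and':                                  # clause (3)
        return max(g(phi[1], n, xi), g(phi[2], n, xi))
    if kind == 'or':                                   # clause (4)
        return min(g(phi[1], n, xi), g(phi[2], n, xi))
    if kind == 'imp':                                  # clause (5)
        gb, gc = g(phi[1], n, xi), g(phi[2], n, xi)
        return gc if gb < gc else 1
    raise ValueError(phi)
```

The recursion depth matches the depth-first search of the proof; the stack that LOGDCFL provides is exactly what holds the pending recursive calls.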
Theorem 7. The model checking problem for KC1 is in LOGDCFL.

We obtain the same upper bound for S5-Mc.

Proposition 1. The model checking problem for S5 is in LOGDCFL.

Sketch of Proof. Let ϕ, M, s be an instance of S5-Mc for M = (U, R, ξ). Then R is a total relation on U. Therefore, every subformula of ϕ that begins with a modal operator (i.e. a subformula of the form □α or ◊α) is either satisfied in all states of U or in no state of U. Now, ϕ can be evaluated as follows. First, evaluate the subformulas □α and ◊α where α is a propositional formula without any modal operators. In order to do this, check whether α is satisfied in every resp. in some state of U. This can be done in logspace. Replace these evaluated subformulas in ϕ by the propositional constants according to their satisfaction, and evaluate the resulting formula. This must be repeated until one obtains a propositional formula that can straightforwardly be evaluated in the actual state. This process can be implemented using a top-down search through the formula, during which propositional formulas have to be evaluated in the states of U. The whole process then takes polynomial time, logarithmic space, and uses a stack for the top-down search. This shows that S5-Mc can be solved in LOGDCFL.

In Theorem 8 we show NC1-completeness of the model checking problem for the modal logic S4, even if we consider formulas without any variables. We sketch a proof for the upper bound. The NC1-hardness follows immediately from [5].

Theorem 8. The model checking problem for S40 is NC1-complete.

Sketch of Proof. Notice that S4 frames are reflexive and transitive. It is not possible to distinguish different states in a reflexive and transitive frame with a variable-free formula. Hence S40 contains exactly all variable-free formulas that can be satisfied by a reflexive and transitive Kripke model. For an S40-Mc instance M, ϕ it suffices to check whether ϕ ∈ S40.
Because we cannot distinguish different states, modal operators can be ignored. We define the operator-free version ϕof of the S40 formula ϕ as follows.

– pof = p for p ∈ {⊥, ⊤}
– (α → β)of = αof → βof
– (◊α)of = αof

It holds for an arbitrary M that M, ϕ ∈ S40-Mc iff ϕof evaluates to true. Hence it follows directly from [5] that S40-Mc is NC1-complete.
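The translation and the final propositional evaluation can be sketched as follows (our encoding; treating □α like ◊α is our addition, justified by the proof's observation that variable-free formulas cannot distinguish states of a reflexive and transitive frame):

```python
def operator_free(phi):
    """phi_of from the proof of Theorem 8, for formulas over
    ('bot',), ('top',), ('imp', a, b), ('dia', a), ('box', a)."""
    kind = phi[0]
    if kind in ('bot', 'top'):
        return phi
    if kind == 'imp':
        return ('imp', operator_free(phi[1]), operator_free(phi[2]))
    if kind in ('dia', 'box'):          # drop the modal operator
        return operator_free(phi[1])
    raise ValueError(phi)

def eval_prop(phi):
    # Classical evaluation of an operator-free {⊥, ⊤, →} formula.
    kind = phi[0]
    if kind == 'bot':
        return False
    if kind == 'top':
        return True
    return (not eval_prop(phi[1])) or eval_prop(phi[2])
```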
5 Conclusion

The upper and lower bounds from the previous sections (Theorems 2, 3, 4, and 5) combine into the following completeness results.
Theorem 9. The following problems are P-complete.
1. K40-Mc—i.e. the model checking problem for K4 and formulas without variables.
2. K41-Mc, S41-Mc, S4.21-Mc—i.e. the model checking problems for K4, S4 resp. S4.2 and formulas with one variable only.
3. KC-Mc, IPC-Mc, BPL-Mc, S4.2-Mc, S4-Mc, K4-Mc—i.e. the model checking problems for KC, IPC, BPL, S4.2, S4, and K4.

The one-variable fragment IPC1 of IPC has already been studied in depth (see [16]). Recently it was shown that model checking for IPC1 is AC1-complete [10]. Our P-hardness proof of model checking for IPC uses an arbitrary number of variables. Rybakov [17] has shown that the tautology problem for the two-variable fragment IPC2 of IPC is already PSPACE-complete. This makes it interesting to study whether model checking for IPC2 is already P-complete. O'Connor [18] gives a tautology-preserving translation from IPC formulas to those with two variables only. It is an open problem whether such a translation to IPC1 exists. From Theorem 2 and Theorem 7 it follows that we can exclude this for model checking for KC.

Theorem 10. KC-Mc ≰log_m KC1-Mc, unless P ⊆ LOGDCFL.

The Gödel-Tarski translation from intuitionistic logic into S4 and the PSPACE-hardness of the tautology problem for IPC brought up the question of a "translation" from S4 into intuitionistic logic. In fact, such a translation is expressed in terms of a reduction in [19]. Our result on the P-hardness of the model checking problem for S4.2 for formulas with one variable only (Theorem 3) and the contrasting LOGDCFL upper bound for KC1 (Theorem 7) show that those translations cannot avoid the use of additional variables (unless P ⊆ LOGDCFL).

Theorem 11. S4.21-Mc ≰log_m KC1-Mc, unless P ⊆ LOGDCFL.

All in all, the LOGDCFL upper bounds for model checking for LC, KC1, and S5 are not really satisfactory.
A LOGDCFL computation (polynomial time and logarithmic space with an additional stack) allows one to explore a formula in a top-down manner. This seems to be a very natural way to evaluate a formula. It is very surprising that for classical propositional logic the stack is not needed [4,5]. We conjecture that this is also possible for S5, and that Proposition 1 can accordingly be improved. For KC1, one can conclude from [14,15] that there are only 7 equivalence classes of formulas, and only 3 types of models: those in which all states of the model satisfy a, those in which no state satisfies a, and all others. The third type is the one that makes the difference to classical propositional logic. Nevertheless, we expect that the LOGDCFL upper bound for KC1 (Theorem 7) can be improved. Notice that the logics KC1 and LC1 are the same. In [15] it is shown that S4.31—their modal companion—has infinitely many equivalence classes of formulas. Therefore it seems possible to find a lower bound for model checking for S4.31 that is above the upper bound for KC1 and LC1.
Acknowledgements. The authors thank Steve Awodey for his introduction to intuitionistic logic and many helpful discussions, Matthias Kramer for discussing predecessors of the proofs of Theorems 2 and 6, Vitezslav Svejdar for helpful discussions about intuitionistic logic, and Thomas Schneider for his support. The authors specially thank an anonymous referee for her/his idea to improve Theorem 3 by saving one variable.
References

1. van Dalen, D.: Logic and Structure, 4th edn. Springer, Heidelberg (2004)
2. Chagrov, A., Zakharyaschev, M.: Modal Logic. Clarendon Press, Oxford (1997)
3. Fischer, M.J., Ladner, R.E.: Propositional dynamic logic of regular programs. J. Comput. Syst. Sci. 18(2), 194–211 (1979)
4. Lynch, N.A.: Log space recognition and translation of parenthesis languages. J. ACM 24(4), 583–590 (1977)
5. Buss, S.R.: The Boolean formula value problem is in ALOGTIME. In: Proc. 19th STOC, pp. 123–131. ACM Press, New York (1987)
6. Dummett, M., Lemmon, E.: Modal logics between S4 and S5. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 14(24), 250–264 (1959)
7. Visser, A.: A propositional logic with explicit fixed points. Studia Logica 40, 155–175 (1980)
8. Ladner, R.: The computational complexity of provability in systems of modal propositional logic. SIAM Journal on Computing 6(3), 467–480 (1977)
9. Spaan, E.: Complexity of Modal Logics. PhD thesis, Department of Mathematics and Computer Science, University of Amsterdam (1993)
10. Mundhenk, M., Weiß, F.: The model checking problem for intuitionistic logic with one variable is AC1-complete (2010) (unpublished manuscript)
11. Chandra, A.K., Kozen, D., Stockmeyer, L.J.: Alternation. Journal of the Association for Computing Machinery 28, 114–133 (1981)
12. Greenlaw, R., Hoover, H.J., Ruzzo, W.L.: Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, New York (1995)
13. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Reading (1994)
14. Nishimura, I.: On formulas of one variable in intuitionistic propositional calculus. J. of Symbolic Logic 25, 327–331 (1960)
15. Makinson, D.: There are infinitely many Diodorean modal functions. J. of Symbolic Logic 31(3), 406–408 (1966)
16. Gabbay, D.M.: Semantical investigations in Heyting's intuitionistic logic. D. Reidel, Dordrecht (1981)
17. Rybakov, M.N.: Complexity of intuitionistic and Visser's basic and formal logics in finitely many variables. In: Papers from the 6th conference on "Advances in Modal Logic", pp. 393–411. College Publications (2006)
18. O'Connor, M.: Embeddings into free Heyting algebras and translations into intuitionistic propositional logic. In: Artemov, S., Nerode, A. (eds.) LFCS 2007. LNCS, vol. 4514, pp. 437–448. Springer, Heidelberg (2007)
19. Fernandez, D.: A polynomial translation of S4 into intuitionistic logic. J. of Symbolic Logic 71(3), 989–1001 (2005)
Depth Boundedness in Multiset Rewriting Systems with Name Binding

Fernando Rosa-Velardo
Dpto. de Sistemas Informáticos y Computación, Universidad Complutense de Madrid
[email protected]
Abstract. In this paper we consider ν-MSR, a formalism that combines the two main existing approaches for multiset rewriting, namely MSR and CMRS. In ν-MSR we rewrite multisets of atomic formulae, in which some names may be restricted. ν-MSR are Turing complete. In particular, a very straightforward encoding of π-calculus processes can be done. Moreover, pν-PN, an extension of Petri nets in which tokens are tuples of pure names, are equivalent to ν-MSR. We know that the monadic subclass of ν-MSR is a Well Structured Transition System. Here we prove that depth-bounded ν-MSR, that is, ν-MSR systems for which the interdependence of names is bounded, are also Well Structured, by following steps analogous to those followed by R. Meyer in the case of the π-calculus. As a corollary, depth-bounded pν-PN are also WSTS, so that coverability is decidable for them.
1 Introduction
In [16] we revisited multiset rewriting with name binding, combining the two main existing approaches to the study of concurrency by means of multiset rewriting: multiset rewriting with existential quantification and constrained multiset rewriting. The paper [6] presents a meta-notation for the specification and analysis of security protocols. This meta-notation involves facts and transitions, where facts are first-order atomic formulae and transitions are given by means of rewriting rules, with a precondition and a postcondition. For instance,

A0(k), Ann(k′) → ∃x.(A1(k, x), N(enc(k′, x, k)), Ann(k′))

specifies the first rule of the Needham-Schroeder protocol. This notation gave rise to the specification language for security protocols MSR [5]. In [8] Constraint Multiset Rewriting Systems (CMRS) are defined. As in [6], facts are first-order atomic formulae, but the terms that can appear as part of such formulae must belong to a constraint system. For instance, the rule

count(x), visit → count(x + 1), enter(x + 1)

could be used to count the number of visits to a web site. For a comprehensive survey of CMRS see [9]. In CMRS there is no mechanism for name binding or name creation, so that it has to be
Research supported by the MEC Spanish project DESAFIOS10 TIN2009-14599C03-01, and Comunidad de Madrid program PROMETIDOS S2009/TIC-1465.
A. Kučera and I. Potapov (Eds.): RP 2010, LNCS 6227, pp. 161–175, 2010. © Springer-Verlag Berlin Heidelberg 2010
Fig. 1. Interconnection model between a buyer and a seller
simulated using the order in the constraint system (e.g., simulating the creation of a fresh name by taking a value greater than any of the values that have appeared so far). Thus, in an unordered version of CMRS, in which only the equality predicate between atoms is used, there is no way of ensuring that a name is fresh. In [16] we combined the features in MSR and CMRS, obtaining ν-MSR. On the one hand, we maintain the existential quantifications in [6] to keep a compositional approach, closer to that followed in process algebrae with name binding. On the other hand, we restrict terms in atomic formulae to be pure names, which can only be compared with equality or inequality, unlike the arbitrary terms over some syntax, as in [6], or terms in a constraint system, as in CMRS. We know [16] that ν-MSR is equivalent to pν-PN [18], an extension of Petri nets in which tokens are tuples of pure names, which can only be compared with each other by equality or inequality using matching variables on the arcs of the net. pν-PN are Turing complete [18], and hence so are ν-MSR. Moreover, the subclass of monadic ν-MSR is equivalent to ν-PN [17], the monadic version of pν-PN, for which tokens are just pure names. In [19] we proved that ν-PN are strict Well Structured Transition Systems (WSTS) [10,2], which allows us to conclude that so are monadic ν-MSR, and that coverability, boundedness and termination are decidable for them. However, reachability is still undecidable for ν-PN [19]. Finally, processes of the π-calculus [20] can be simulated within ν-MSR in a very natural way. This translation was inspired by the results about structurally stationary π-calculus processes [14], which can be mapped to P/T nets. Though ν-PN have better decidability properties than pν-PN, some works need the model of pν-PN to capture things like instance isolation in architectures with multiple concurrent conversations [7] or transactions in data bases [21]. The example in Fig. 1 is taken from [7].
The subnet inside the dashed line at the top represents buyer processes, and the subnet inside the dashed line at the bottom represents seller processes. The places outside the dashed lines are interface places used for communication purposes. Thus, it is interesting to find subclasses of pν-PN in which some interesting properties are decidable. In the field of process algebra, there are many recent works that look for subclasses of the π-calculus for which some properties, such as termination, are decidable [4,14,13,15,3]. In this paper we consider the results in [13] about depth-bounded π-calculus processes. Depth-boundedness is
a semantic restriction on π-calculus processes. Intuitively, a process is depth-bounded whenever the interdependence of names is bounded in every process reachable from it. As a simple example, and assuming that the reader is familiar with π-calculus syntax, if starting from some process P the processes νa1. · · · .νan.(a1 a2 | a2 a3 | · · · | ai ai+1 | · · · | an−1 an) | Qn are reachable for every n > 0, then P is a depth-unbounded process. However, the fact that processes νa.νa1. · · · .νan.(a a1 | a a2 | · · · | a ai | · · · | a an) | Qn can be reached from P for every n does not allow us to conclude that P is depth-unbounded: though an unbounded number of names can appear in reachable processes, those names do not depend on one another, as happened in the previous example. Meyer proved in [13] that depth-bounded processes are WSTS. In this paper we adapt those results to ν-MSR. More precisely, we consider depth-bounded ν-MSR, that is, ν-MSR for which the interdependence of bound names is bounded in every reachable term. We prove that this subclass of ν-MSR is well structured by following the same steps as [13]. As a corollary, we obtain the analogous result not only for the π-calculus (which is already known) but also for pν-PN. The rest of the paper is organized as follows. In Section 2 we introduce some basic definitions and notations used in the rest of the paper. Section 3 defines ν-MSR. Section 4 contains our main results: well-structuredness of depth-bounded ν-MSR, which implies decidability of coverability for pν-PN. Finally, Section 5 presents our conclusions and some directions for future work.
2 Preliminaries
We denote by MS(A) the set of finite multisets over A. A quasi-order on A is a reflexive and transitive binary relation on A. Every quasi-order ≤ defined on A induces a quasi-order ⊑ on MS(A), given by {a1, . . . , an} ⊑ {b1, . . . , bm} if there is some injective h : {1, . . . , n} → {1, . . . , m} s.t. ai ≤ bh(i) for all i ∈ {1, . . . , n}. A quasi-order ≤ is said to be a well-quasi order (wqo) if for every infinite sequence s0, s1, . . . there are i and j, with i < j, s.t. si ≤ sj. Equivalently, it is a wqo if every infinite sequence has an increasing subsequence. It is a well-known fact that the multiset order induced by a wqo ≤ is also a wqo. The set T(A) of trees over (A, ≤) is defined by T ::= a | (a, {T1, . . . , Tn}), where a ranges over A. We define the order ⪯ relating trees by a ⪯ a′ if a ≤ a′, and (a, A) ⪯ (a′, A′) if a ≤ a′ and A ⊑ A′, where ⊑ is the multiset order induced by ⪯. The mapping height(T) is defined as height(a) = 0 and height(a, {T1, . . . , Tn}) = 1 + max{height(Ti) | i = 1, . . . , n}. If we denote by T(A)n the set of trees of height less than or equal to n, then (T(A)n, ⪯) is a wqo provided (A, ≤) is a wqo. A hypergraph is a tuple G = (V, E, inc), where V is the set of vertices, E is the set of edges and, for each e ∈ E, inc(e) is the set of vertices incident to e. There is an arc between v ∈ V and e ∈ E whenever v ∈ inc(e).
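The induced multiset order (find an injection h matching every element of the smaller multiset to a dominating element of the larger one) can be checked by brute force; an illustrative sketch (our function names), with the base order passed in as a predicate:

```python
def multiset_leq(xs, ys, leq):
    """Decides {x1,...,xn} ⊑ {y1,...,ym} for the multiset order induced
    by leq: an injective matching of each x to some y with leq(x, y)."""
    def match(rest_xs, rest_ys):
        if not rest_xs:
            return True
        x, tail = rest_xs[0], rest_xs[1:]
        # try every still-unused y that dominates x
        return any(match(tail, rest_ys[:j] + rest_ys[j + 1:])
                   for j, y in enumerate(rest_ys) if leq(x, y))
    return match(list(xs), list(ys))
```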
A transition system is a tuple (S, →, s0), where S is a (possibly infinite) set of states, s0 ∈ S is the initial state, and → ⊆ S × S. We denote by →* the reflexive and transitive closure of →. The reachability problem in a transition system consists in deciding, for a given state sf, whether s0 →* sf. The termination problem consists in deciding whether there is an infinite trace s0 → s1 → s2 → · · ·. The boundedness problem consists in deciding whether the set of reachable states is finite. For any transition system (S, →, s0) endowed with a quasi-order ≤ we can define the coverability problem, which consists in deciding, given a state sf, whether there is a reachable s ∈ S s.t. sf ≤ s. A Well Structured Transition System (WSTS) is a tuple (S, →, ≤), where (S, →) is a transition system, ≤ is a decidable wqo compatible¹ with → (meaning that s1′ ≥ s1 → s2 implies that there is s2′ ≥ s2 with s1′ → s2′), and for every s we can compute (a finite representation of) the set {s′ | s′ → s″ for some s″ ≥ s}. We will refer to these properties as monotonicity of → with respect to ≤, and computability of the set of predecessors, respectively.² For WSTS, the coverability and the termination problems are decidable [2,10]. A WSTS is said to be strict if it satisfies the following strict compatibility condition: s1′ > s1 → s2 implies that there is s2′ > s2 with s1′ → s2′. For strict WSTS, also the boundedness problem is decidable [10].
3 ν-MSR
We fix a finite set of predicate symbols P, a denumerable set Id of names and a denumerable set Var of variables. We use a, b, c, . . . to range over Id, x, y, . . . to range over Var, and η, η′, . . . to range over Id ∪ Var. An atomic formula over P and Var has the form p(η1, . . . , ηn), where p ∈ P and ηi ∈ Var ∪ Id for all i. A ground atomic formula has the form p(a1, . . . , an), where p ∈ P and ai ∈ Id for all i. We use X, Y, . . . to range over atomic formulae and A, B, . . . to range over atomic ground formulae. We denote by Var(X) and Id(X) the set of variables and names appearing in X, respectively. We will write x̃ and ã to denote finite sequences of variables and names, respectively, so that we will sometimes write p(x̃) or p(ã). We sometimes use set notation with these sequences and write, for instance, x ∈ x̃ or x̃1 ∪ x̃2. A ν-MSR term is given by the following grammar:

M ::= 0 | A | M1 + M2 | νa.M

We denote by M the set of ν-MSR terms, ranged over by M, M′, . . . We define fn(M), the set of free names in M, as follows: fn(0) = ∅, fn(A) = Id(A), fn(M1 + M2) = fn(M1) ∪ fn(M2), and fn(νa.M) = fn(M) \ {a}. A rule t is an expression of the form

t : X1 + . . . + Xn → νã.(Y1 + . . . + Ym)
¹ Different compatibility conditions are discussed in [10].
² Strictly speaking, decidability of the wqo and computability of the set of predecessors are not part of the definition of WSTS, but of the so-called effective WSTS. These properties are needed to ensure decidability of coverability and termination.
Depth Boundedness in Multiset Rewriting Systems with Name Binding
st post(t) ⊆ pre(t), where pre(t) = Var(X1) ∪ · · · ∪ Var(Xn), post(t) = Var(Y1) ∪ · · · ∪ Var(Ym), and Var(t) = pre(t) ∪ post(t). A ν-MSR is a pair ⟨R, M0⟩, where M0 is the initial ν-MSR term and R is a finite set of rules. Sometimes in examples we will use commas instead of the symbol +. For instance, we will write p(x, y), q(y, y) → νa.q(x, a) instead of p(x, y) + q(y, y) → νa.q(x, a). We will identify ν-MSR terms up to ≡, the least congruence on M where α-conversion of bound names is allowed, st (M, +, 0) is a commutative monoid and:

νa.νb.M ≡ νb.νa.M    νa.0 ≡ 0    νa.(M1 + M2) ≡ νa.M1 + M2 if a ∉ fn(M2)

The first rule justifies our notation νã.M. The last rule is called name extrusion when applied from right to left. A mode for t : X1 + . . . + Xn → νã.(Y1 + . . . + Ym) is any substitution σ : Var(t) → Id. We write pre_t(σ) = σ(X1) + . . . + σ(Xn), where σ(p(η1, . . . , ηn)) = p(a1, . . . , an), with ai = σ(ηi) if ηi ∈ Var, or ai = ηi if ηi ∈ Id. To define post_t(σ) we consider a sequence b̃ of pairwise different names (of the same length as ã) with σ(Var(t)) ∩ b̃ = ∅. Then, we take σ′ = σ ∘ {ã/b̃} and post_t(σ) = νb̃.(σ′(Y1) + . . . + σ′(Ym)), where {ã/b̃} denotes the simultaneous substitution of each ai ∈ ã by the corresponding bi ∈ b̃. The transition system (M, →, M0) is given by the following rules:

(t) if σ is a mode for t, then pre_t(σ) −t→ post_t(σ);
(+) if M1 −t→ M2, then M1 + M −t→ M2 + M;
(ν) if M1 −t→ M2, then νa.M1 −t→ νa.M2;
(≡) if M1′ ≡ M1 −t→ M2 ≡ M2′, then M1′ −t→ M2′.
Rules (+) and (ν) state that transitions can happen inside a sum or inside a restriction, respectively. Rule (≡) is also standard, and formalizes that we are rewriting terms modulo ≡. Then we have a rule schema (t) for each t ∈ R. We will write M → M′ if there is t ∈ R such that M −t→ M′. As an example, let t : p(x), q(x) → νb.p(b) be a rule in R. The rewriting p(a), q(a) → νb.p(b) can take place by taking σ(x) = a, which satisfies the conditions for modes, with pre_t(σ) = p(a), q(a) and post_t(σ) = νb.p(b). In order to apply the rule t starting from p(b), q(b) we need to rename b in the right-hand side of the rule, obtaining (e.g. if we replace b by a) νa.p(a). We denote by →≢ the transition relation obtained by considering only the rules (t), (+) and (ν) above (that is, without (≡)). As in the π-calculus [20], we can consider several normal forms, which force a certain rearrangement of bound names. M is said to be in standard normal form if M = νã.(A1 + . . . + An). Every term is equivalent to some term in standard form, which can be obtained by applying the extrusion rule and α-conversion as much as necessary. The standard form is unique up to commutativity and associativity of +, α-conversion, and commutativity of the names in ã. Moreover, the transition relation is compatible with the standard form, that is, if M1 ≡ M2, M2 is in standard form and M1 −t→≢ then M2 −t→≢ [16, Prop. 1].
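On terms in standard normal form, the rule schema (t) amounts to multiset rewriting of ground atoms plus a supply of fresh names for the ν-bound variables of the right-hand side. The following Python sketch is our own illustration (the encoding of atoms as tuples and the helper `fire` are assumptions, not the paper's notation), run on the example rule t : p(x), q(x) → νb.p(b).

```python
from collections import Counter
from itertools import count

def fire(state, pre, post, fresh, sigma, names):
    """Fire a rule on a multiset of ground atoms (a term in standard
    normal form).  `pre`/`post` are atoms (predicate, args...), `fresh`
    lists the nu-bound variables of the right-hand side, `sigma` is a
    mode; `names` collects the names used so far, so that fresh names
    are chosen outside it.  Returns the new multiset, or None if the
    rule is not enabled in `state`."""
    inst = lambda atom, s: (atom[0],) + tuple(s.get(v, v) for v in atom[1:])
    state = Counter(state)
    needed = Counter(inst(a, sigma) for a in pre)
    if needed - state:                    # some precondition atom is missing
        return None
    for b in fresh:                       # instantiate nu-bound names freshly
        sigma[b] = next("n%d" % i for i in count() if "n%d" % i not in names)
        names.add(sigma[b])
    return state - needed + Counter(inst(a, sigma) for a in post)

# t : p(x), q(x) -> nu b.p(b), fired in p(a), q(a) with the mode sigma(x) = a
used = {'a'}
out = fire([('p', 'a'), ('q', 'a')],
           [('p', 'x'), ('q', 'x')], [('p', 'b')], ['b'], {'x': 'a'}, used)
assert out == Counter({('p', 'n0'): 1})
```

Renaming the bound b of the rule away from the names in the current state is exactly the side condition σ(Var(t)) ∩ b̃ = ∅ in the definition of post_t(σ).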
Let us now define restricted normal forms, which will help us to characterize depth-bounded terms. A term is in restricted form if the scope of its restrictions is minimal, that is, if all its subterms νa.(A1 + . . . + Am) satisfy a ∈ fn(Ai) for all i, so that no extrusion rule can be applied from left to right. Therefore, restricted forms can be seen as the opposite concept to standard forms.

Definition 1. Let us define ≡̇ as the least congruence on M st + is commutative and associative with 0 as identity, and ⇛ as the least binary relation on M st:
– νa.(M1 + M2) ⇛ νa.M1 + M2 if a ∉ fn(M2),
– if M1 ⇛ M2 then M1 + M ⇛ M2 + M,
– if M1 ⇛ M2 then νa.M1 ⇛ νa.M2,
– if M1′ ≡̇ M1 ⇛ M2 ≡̇ M2′ then M1′ ⇛ M2′.
M is in restricted form if there is no M′ with M ⇛ M′. We say a term M in restricted form is a fragment if it cannot be decomposed as M = M1 + M2. Any M in restricted form satisfies M = F1 + . . . + Fn with Fi fragments, and any fragment is either an atomic formula or a term of the form νa.(F1 + . . . + Fm), with Fi fragments st a ∈ fn(Fi) for all i. For instance,

M = νa.νa1. . . . .νan.(p(a, a1), . . . , p(a, an)) ⇛∗ F = νa.(νa1.p(a, a1), . . . , νan.p(a, an)).

Notice that F and each νai.p(a, ai) are fragments. The relation ⇛ is confluent up to ≡̇. Moreover, if M ⇛ M′ then M ≡ M′. Unlike the standard normal form, the restricted normal form is not compatible with the transition relation. For instance, for M and F above, the rule t : p(x, y1), p(x, y2) → q(x) satisfies M −t→≢ but not F −t→≢. However, restricted normal forms give more insight about the topology of pure names in terms. In particular, they are the basis of the proof that depth-bounded ν-MSR terms yield WSTS.
4 Depth-Bounded ν-MSR
We now consider depth-bounded ν-MSR. Intuitively, a ν-MSR is depth-bounded if names cannot appear linked in an arbitrarily long way. Thus, if every term of the form νa1. . . . .νan.(p(a1, a2), p(a2, a3), . . . , p(an−1, an)) can be reached, then the ν-MSR is not depth-bounded. However, reaching all terms of the form νa1. . . . .νan.νa.(p(a, a1), . . . , p(a, an)) does not allow us to conclude that the ν-MSR is depth-unbounded. In order to define depth-bounded ν-MSR, we define a function nest_ν that measures the nesting of restrictions (occurrences of the operator ν) in a term.

Definition 2. We define nest_ν(M) by structural induction on M:
– nest_ν(A) = nest_ν(0) = 0,
– nest_ν(M1 + M2) = max(nest_ν(M1), nest_ν(M2)),
– nest_ν(νa.M) = 1 + nest_ν(M).
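Definition 2 is a direct structural recursion, which can be sketched in Python as follows (the term encoding is our own, not the paper's: None for 0, strings for atomic formulae, lists for sums, and a tuple ('nu', a, M) for νa.M).

```python
def nest_nu(m):
    """nest_nu of Definition 2.  Terms: None is 0, a string is an atomic
    formula, a list is a sum M1 + ... + Mk, and ('nu', a, M) is nu a.M."""
    if m is None or isinstance(m, str):
        return 0
    if isinstance(m, list):                       # nesting of a sum: the max
        return max((nest_nu(x) for x in m), default=0)
    _, _, sub = m                                 # ('nu', a, M): one level more
    return 1 + nest_nu(sub)
```

For instance, νa1.νa2.νa.(p(a, a1) + p(a, a2)) has nesting 3, while the equivalent νa.(νa1.p(a, a1) + νa2.p(a, a2)) has nesting 2, matching the discussion of depth below.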
We take depth(M) = min{nest_ν(M′) | M′ ≡ M}. A ν-MSR is k-bounded if depth(M) ≤ k for any reachable M, and depth-bounded if it is k-bounded for some k ≥ 0. As explained in [13], depth measures the interdependence of restricted names. The fragment F = νa1. . . . .νan.νa.(p(a, a1), . . . , p(a, an)) satisfies nest_ν(F) = n + 1 and is equivalent to F′ = νa.(νa1.p(a, a1), . . . , νan.p(a, an)), which satisfies nest_ν(F′) = 2. In fact, it can be easily checked that depth(F) = 2.

Lemma 1. If F ≡̇ G then nest_ν(F) = nest_ν(G).

Proof. Obvious.

We said above that the relation ⇛ is confluent up to ≡̇. However, it would not be confluent if the congruence ≡̇ also allowed reordering of bound names. For instance, the term M = νa.νa1. . . . .νan.(p(a, a1), . . . , p(a, an)) satisfies M ⇛∗ νa.(νa1.p(a, a1), . . . , νan.p(a, an)), but F = νa1. . . . .νan.νa.(p(a, a1), . . . , p(a, an)) is in restricted normal form, that is, there is no G with F ⇛ G. Therefore, restricted forms are not enough to characterize the interdependence of bound names. Next, we will define for each fragment F another fragment, denoted nf(F), equivalent to F with respect to ≡. Intuitively, nf(F) is a representation of F that gives better insight into the interdependence of names in F. In order to obtain nf(F) from F we rearrange the whole set of bound names in F. For that purpose, as the first step we consider the standard normal form of F. Then, we split the set of bound names into those that appear both in A1 and outside A1, those that appear only in A1, and the rest of the names.

Definition 3. Given a fragment F, we define nf(F) in the following steps:
1. Let F ≡ νã.(A1 + . . . + An) in standard form.
2. Split ã into ã1, ã2 and ã3 so that ã1 = fn(A1) ∩ fn(A2 + . . . + An), ã2 = fn(A1) \ fn(A2 + . . . + An) and ã3 = fn(A2 + . . . + An) \ fn(A1). Then, F ≡ νã1.(νã2.A1 + νã3.(A2 + . . . + An)).
3. Let νã3.(A2 + . . . + An) ⇛∗ G1 + . . . + Gm in restricted form.
4. Compute nf(Gi) = Fi for each i ∈ {1, . . . , m}.
5. Let νã1.(νã2.A1 + F1 + . . . + Fm) ⇛∗ nf(F) in restricted form.

Whenever F ≡ νã.A (that is, whenever n = 1 above), then nf(F) is again F. Let us see how the procedure works (with n > 1) in the following example.

Example 1. Let F = νa1. . . . .νan.νa.(p(a, a1), . . . , p(a, an)).
1. F is already in standard normal form.
2. We split {a, a1, . . . , an} into ã1 = {a}, ã2 = {a1} and ã3 = {a2, . . . , an}, so that F ≡ νa.(νa1.p(a, a1) + νã3.(p(a, a2) + . . . + p(a, an))).
3. Let νã3.(p(a, a2) + . . . + p(a, an)) ⇛∗ νa2.p(a, a2) + . . . + νan.p(a, an).
4. As we have said above, nf(νai.p(a, ai)) = νai.p(a, ai).
5. We obtain F′ = νa.(νa1.p(a, a1) + νa2.p(a, a2) + . . . + νan.p(a, an)), which is already in restricted form, so that F′ is nf(F).
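Step 2 of Definition 3 is a plain partition of the bound names according to the free names of A1 and of the remaining atoms; a small Python sketch (the function name and encoding are ours, not the paper's), checked on Example 1 with n = 3:

```python
def split_names(bound, fn_a1, fn_rest):
    """Step 2 of Definition 3: partition the bound names into those shared
    by A1 and the rest (a1~), those private to A1 (a2~) and those private
    to the remaining atoms (a3~)."""
    return (bound & fn_a1 & fn_rest,
            (bound & fn_a1) - fn_rest,
            (bound & fn_rest) - fn_a1)

# Example 1 with n = 3: A1 = p(a, a1), rest = p(a, a2) + p(a, a3)
a1, a2, a3 = split_names({'a', 'a1', 'a2', 'a3'},
                         {'a', 'a1'}, {'a', 'a2', 'a3'})
assert (a1, a2, a3) == ({'a'}, {'a1'}, {'a2', 'a3'})
```

The three parts are pairwise disjoint by construction, which is what allows the rearrangement F ≡ νã1.(νã2.A1 + νã3.(A2 + . . . + An)).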
Fig. 2. Hypergraph of the fragments in Example 1 (the name a is an arc incident to every node p(a, a1), . . . , p(a, an); each name ai is incident only to p(a, ai))
Fig. 3. Hypergraph of the fragments in Example 2 (a chain: a1 and a2 are incident to p(a1, a2), a2 and a3 to p(a2, a3), and a3 and a4 to p(a3, a4))
In this example we start building nf(F) from the standard normal form νa1. . . . .νan.νa.(p(a, a1), . . . , p(a, an)), but we could have also started from another term in standard normal form in which the atomic formulae p(a, ai) were given in a different order. It can be checked that in this particular example such order does not make any difference, so that the nf(F) obtained is always the same (up to ≡̇). However, in general this is not the case, that is, the fragment nf(F) defined above is not unique. More precisely, it is only unique (up to ≡̇) after fixing a given representative of the standard normal form. The fragments of the form nf(F) are called anchored fragments in [13]. Such fragments have an anchor, an atomic formula that contains the names that are bound. More precisely, in an anchored fragment νa.(F1 + . . . + Fm) it is not only true that a is free in each Fi, but it is also free in the anchor of Fi. This fact will be used in Lemma 4 to characterize the nesting of restrictions in such fragments.

Example 2. Let
F1 = νa1, a2, a3, a4.(p(a1, a2), p(a2, a3), p(a3, a4))
F2 = νa1, a2, a3, a4.(p(a2, a3), p(a1, a2), p(a3, a4))
be two fragments in standard normal form, equivalent up to ≡̇. If we compute nf(Fi) starting from Fi for i = 1, 2, then it can be checked that
nf(F1) = νa2.(νa1.p(a1, a2) + νa3.(p(a2, a3) + νa4.p(a3, a4)))
nf(F2) = νa2, a3.(p(a2, a3) + νa1.p(a1, a2) + νa4.p(a3, a4))
Therefore, as anticipated after the previous example, nf(F) is not uniquely determined. Let us now prove some results that we will need to prove well-structuredness of depth-bounded ν-MSR.

Lemma 2. The two following facts hold:
– F ≡ nf(F) for every fragment F.
– If F1 ≡ F2 then there are nf(F1) and nf(F2) st nf(F1) ≡̇ nf(F2).
Proof. F ≡ nf(F) because all the steps in Def. 3 preserve ≡. Let us now take F1 and F2 st F1 ≡ F2. In that case, there is F in standard normal form st Fi ≡ F. Then, steps 2 to 5 in Def. 3 coincide for F1 and F2, so that nf(F1) ≡̇ nf(F2).

As in [13], we use the graph-theoretic interpretation of fragments. A fragment can be seen as a hypergraph with its atomic formulae as nodes and its names as arcs, which link all the formulae that contain that name.

Definition 4. For a term M ≡ νã.(A1 + . . . + Am) we define the hypergraph G(M) = (V, E, inc), where V = {A1, . . . , Am}, E = ã and, for e ∈ E, inc(e) is the set of atomic formulae in V in which e occurs.

Fragments correspond to connected components. M1 ≡ M2 implies that G(M1) and G(M2) are isomorphic hypergraphs. For the two fragments F = νa1. . . . .νan.νa.(p(a, a1), . . . , p(a, an)) and F′ = νa.(νa1.p(a, a1), . . . , νan.p(a, an)) seen in Example 1, since F ≡ F′ the hypergraphs obtained for them are isomorphic (see Fig. 2). The ones in Example 2 are shown in Fig. 3. A path is a finite sequence ρ = A1 a1 A2 a2 · · · an An+1 with Ai, Ai+1 ∈ inc(ai). The length of ρ is |ρ| = n, and ρ is simple whenever ai ≠ aj for i ≠ j. A simple path in the hypergraph in Fig. 2 is for instance ρ1 = p(a, a1) a1 p(a, a1) a p(a, a2) a2 p(a, a2), with length 3. Any attempt to extend that simple path results in a path that is no longer simple (since a and a2 already occur in it). Indeed, it can be checked that the length of every simple path is at most 3. In the case of the hypergraph in Fig. 3 the longest simple path, with length 4, is ρ2 = p(a1, a2) a1 p(a1, a2) a2 p(a2, a3) a3 p(a3, a4) a4 p(a3, a4). In the first place, let us see that the length of any simple path in G(F) is bounded by a value that depends only on nest_ν(F).

Lemma 3. If F is a fragment and ρ is a simple path in G(F), then |ρ| ≤ 2^{nest_ν(F)} − 1.

Proof. We prove it by structural induction on F.
If F = A then ρ = A, so that nest_ν(F) = 0 and |ρ| = 0 = 2^{nest_ν(F)} − 1. Let now F = νa.(F1 + . . . + Fn) and let ρ be a simple path in G(F). Then one of the following holds:
– ρ is a simple path in G(Fk) for some k, so that the induction hypothesis tells us that |ρ| ≤ 2^{nest_ν(Fk)} − 1 ≤ 2^{nest_ν(F)} − 1, or
– ρ = ρi a ρj, with ρi a simple path in G(Fi) and ρj a simple path in G(Fj), so that by induction we know that |ρi| ≤ 2^{nest_ν(Fi)} − 1 and |ρj| ≤ 2^{nest_ν(Fj)} − 1. Let m be st nest_ν(Fm) = max{nest_ν(Fl) | l = 1, . . . , n}. Then, |ρ| = |ρi| + |ρj| + 1 ≤ (2^{nest_ν(Fi)} − 1) + (2^{nest_ν(Fj)} − 1) + 1 ≤ 2 · 2^{nest_ν(Fm)} − 1 = 2^{nest_ν(Fm)+1} − 1 = 2^{nest_ν(F)} − 1.
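The maximal simple-path lengths discussed above can be checked mechanically; the following Python sketch is our own code (inc is given as a dictionary from names to sets of incident formulae, with our own node labels) and reproduces the value 3 claimed for the hypergraph of Fig. 2 (with n = 3).

```python
def longest_simple_path(inc):
    """inc maps each arc (a bound name) to the set of incident nodes
    (atomic formulae).  Returns the maximal length of a simple path,
    i.e., a path A1 a1 A2 ... in which no arc ai is used twice; nodes
    may repeat, as in the paths rho1 and rho2 of the text."""
    best = 0
    def extend(node, used, length):
        nonlocal best
        best = max(best, length)
        for e, nodes in inc.items():
            if e not in used and node in nodes:
                for nxt in nodes:
                    extend(nxt, used | {e}, length + 1)
    for start in {v for ns in inc.values() for v in ns}:
        extend(start, frozenset(), 0)
    return best

# Hypergraph of Fig. 2 with n = 3: the name a is incident to every p(a, ai)
fig2 = {'a': {'p1', 'p2', 'p3'}, 'a1': {'p1'}, 'a2': {'p2'}, 'a3': {'p3'}}
assert longest_simple_path(fig2) == 3    # matches the bound 2^2 - 1 of Lemma 3
```

The exhaustive search is exponential, but it suffices to illustrate the bound on the small running examples.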
For the fragments F and F′ = nf(F) in Example 1 we saw that nest_ν(F) = n + 1 and nest_ν(F′) = 2. A maximal simple path in G(F) is ρ1, with |ρ1| = 3, which satisfies |ρ1| ≤ 2^{nest_ν(F′)} − 1 = 3. In the case of the fragments seen in Example 2, both nest_ν(F1) = nest_ν(F2) = 4 and nest_ν(nf(F1)) = nest_ν(nf(F2)) = 3. Moreover, the length of the maximal simple path ρ2 is |ρ2| = 4, which satisfies |ρ2| ≤ 2^3 − 1. Next, let us see that nest_ν(nf(F)) coincides with the length of some simple path in G(F).

Lemma 4. nest_ν(nf(F)) = |ρ| for some simple path ρ in G(F).

Proof. Any nf(F) is of the form A or νa.(F1 + . . . + Fn) with a occurring in anc(Fi) for all i, where anc(A) = A and anc(νa.(F1 + . . . + Fn)) = anc(F1). Therefore, it is enough to see that for such a fragment F there is a simple path p in G(F) with nest_ν(F) = |p|. We will also prove that p starts in anc(F). We proceed by induction on F. If F = A the path p = A starts in anc(A) = A and satisfies nest_ν(A) = 0 = |p|. Let us now consider F = νa.(F1 + . . . + Fn). By definition, nest_ν(F) = 1 + max{nest_ν(F1), . . . , nest_ν(Fn)}. The induction hypothesis tells us that there are anchored paths p1, . . . , pn st pi starts at anc(Fi) and nest_ν(Fi) = |pi|. Let pm be the path with maximum length, so that nest_ν(F) = 1 + |pm|. Since pm starts in anc(Fm) and a is free both in anc(F1) and anc(Fm), p = anc(F1) a pm is an anchored path that starts in anc(F) = anc(F1), with length |p| = 1 + |pm| = nest_ν(F).

The proof of the previous result builds a path in G(F) whose arcs correspond to the bound names traversed when computing nest_ν(nf(F)). For instance, for the fragment F in Example 1 and nf(F) = νa.(νa1.p(a, a1), . . . , νan.p(a, an)) the proof builds the path p(a, a1) a p(a, ai) ai p(a, ai) (i > 1), with length 2 = nest_ν(nf(F)).
For the fragments F1 and F2 in Example 2 it builds
p(a2, a3) a2 p(a2, a3) a3 p(a3, a4) a4 p(a3, a4) and
p(a2, a3) a2 p(a2, a3) a3 p(a1, a2) a1 p(a1, a2),
respectively, both having length 3. The previous results can be combined to prove the following proposition.

Proposition 1. nest_ν(nf(F)) ≤ 2^{depth(F)} − 1.

Proof. Let G be st F ≡ G and depth(F) = nest_ν(G). Since F ≡ G, the hypergraphs G(F) and G(G) are isomorphic. By Lemma 4 there is a simple path p in G(F) st nest_ν(nf(F)) = |p|. By Lemma 3, |p| ≤ 2^{nest_ν(G)} − 1 = 2^{depth(F)} − 1, and the thesis follows.

We have obtained a bound on the nesting of restrictions in every fragment of the form nf(F) that only depends on depth(F). Next we define an order over terms that will endow ν-MSR with a well-structure.

Definition 5. We define ⊑_F as the least binary relation over fragments st:
– A ⊑_F A,
– νa.(F1 + . . . + Fn) ⊑_F νa.(G1 + . . . + Gn + G′1 + . . . + G′m) provided Fi ⊑_F Gi for all i ∈ {1, . . . , n},
– F ⊑_F G provided F ≡ F′ ⊑_F G′ ≡ G.
We also define M1 ⊑ M2 if Mi ≡ Fi,1 + . . . + Fi,ni for i = 1, 2, with n1 ≤ n2 and F1,i ⊑_F F2,i for i ∈ {1, . . . , n1}.
Fig. 4. Trees of the fragments F (left) and F′ (right) in Example 1
The order ⊑ over terms can be seen as the multiset order induced by ⊑_F over fragments. In turn, ⊑_F can be intuitively characterized using standard forms.

Lemma 5. Given two fragments F and G, F ⊑_F G holds if and only if F ≡ νã.(A1 + . . . + Am) and G ≡ νã.(A1 + . . . + Am + M).

Proof. Let F and G be st F ⊑_F G. We proceed by induction on the rules used to derive F ⊑_F G. For F = A ⊑_F A = G it is trivial. Suppose now that F = νa.(F1 + . . . + Fn) and G = νa.(G1 + . . . + Gn + G′1 + . . . + G′m) with Fi ⊑_F Gi. The induction hypothesis tells us that Fi ≡ νãi.(Σj Aij) and Gi ≡ νãi.(Σj Aij + Mi). Then, F ≡ νa, ã1, . . . , ãn.(Σij Aij) and G ≡ νa, ã1, . . . , ãn.(Σij Aij + Σi G′i + Σi Mi), which satisfy the thesis. Finally, if F ≡ F′ ⊑_F G′ ≡ G, the induction hypothesis tells us that F′ ≡ νã.(A1 + . . . + Am) and G′ ≡ νã.(A1 + . . . + Am + M), and because ≡ is transitive, the same holds for F and G. Conversely, if F ≡ νã.(A1 + . . . + Am) and G ≡ νã.(A1 + . . . + Am + M), then trivially Ai ⊑_F Ai, so that νã.(A1 + . . . + Am) ⊑_F νã.(A1 + . . . + Am + M), and we can conclude by rule (≡) that F ⊑_F G.

Let us see that depth-bounded ν-MSR are WSTS with respect to that order. In order to see that the order is a wqo, we map fragments to trees as follows.

Definition 6. Let Δ be the set of names and atomic formulae. We define T, mapping fragments to trees in T(Δ), as follows:
– T(A) = A,
– T(νa.(F1 + . . . + Fn)) = (a, {T(F1), . . . , T(Fn)}).

Figure 4 and Fig. 5 show the trees corresponding to the fragments considered in Example 1 and Example 2, respectively. The following lemma is easy to prove.

Lemma 6. nest_ν(F) = height(T(F)).

Proof. Clearly, nest_ν(A) = 0 = height(A) = height(T(A)). For a fragment F = νa.(F1 + . . . + Fn), nest_ν(F) = 1 + max{nest_ν(Fi) | i = 1, . . . , n}. By the induction hypothesis, nest_ν(Fi) = height(T(Fi)). Then, nest_ν(F) = 1 + max{height(T(Fi)) | i = 1, . . . , n} = height((a, {T(F1), . . . , T(Fn)})) = height(T(F)).
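Definition 6 and Lemma 6 can be checked mechanically on the running example; here is a Python sketch of our own (encoding: strings for atomic formulae, lists for sums, a tuple ('nu', a, M) for νa.M, and lists standing in for the multisets of subtrees).

```python
def to_tree(frag):
    """T of Definition 6: an atomic formula (a string) maps to a leaf, and
    ('nu', a, body) for nu a.(F1 + ... + Fn) maps to (a, [T(F1), ..., T(Fn)])."""
    if isinstance(frag, str):
        return frag
    _, a, body = frag
    parts = body if isinstance(body, list) else [body]
    return (a, [to_tree(p) for p in parts])

def height(t):
    """height of a tree, as in the preliminaries."""
    return 0 if isinstance(t, str) else 1 + max(height(c) for c in t[1])

def nest_nu(m):
    """nest_nu of Definition 2, on the same encoding."""
    if isinstance(m, str):
        return 0
    if isinstance(m, list):
        return max(nest_nu(x) for x in m)
    return 1 + nest_nu(m[2])

# nf(F) from Example 1 with n = 2: nu a.(nu a1.p(a,a1) + nu a2.p(a,a2))
nf = ('nu', 'a', [('nu', 'a1', 'p(a,a1)'), ('nu', 'a2', 'p(a,a2)')])
assert height(to_tree(nf)) == nest_nu(nf) == 2   # Lemma 6 on this instance
```

The equality between height ∘ T and nest_ν is exactly what lets the height bound of Prop. 1 be read as a height bound on the trees T(nf(F)).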
Fig. 5. Trees of the fragments F1 (left), nf(F1) (center) and nf(F2) (right) in Example 2
Moreover, the corresponding orders are preserved by T in the following sense:

Proposition 2. If T(F1) ⊑ T(F2) then F1 ⊑_F F2.

Proof. We proceed by induction on the rules used to derive T(F1) ⊑ T(F2). If T(F1) = A ⊑ A = T(F2) then F1 = F2 = A and trivially F1 ⊑_F F2. Otherwise, T(F1) = (a, {T1, . . . , Tn}) ⊑ (a, {T′1, . . . , T′m}) = T(F2) and {T1, . . . , Tn} ⊑′ {T′1, . . . , T′m}, so that we can assume without loss of generality that Ti ⊑ T′i for all i ∈ {1, . . . , n}. Then, F1 = νa.(F1,1 + . . . + F1,n) with T(F1,i) = Ti, and F2 = νa.(F2,1 + . . . + F2,m) with T(F2,i) = T′i. The induction hypothesis tells us that F1,i ⊑_F F2,i for all i ∈ {1, . . . , n}, which allows us to conclude that F1 ⊑_F F2.

We denote by Fn the set of fragments with depth less than or equal to n, i.e., Fn = {F fragment | depth(F) ≤ n}, and analogously we define Mn as the set of terms with depth less than or equal to n. Then we can prove the following lemma.

Lemma 7. (Fn, ⊑_F) and (Mn, ⊑) are wqos.

Proof. Let (Fi) be an infinite sequence of fragments in Fn, and let us consider the sequence of trees (T(nf(Fi))). Because every fragment is in Fn, height(T(nf(Fi))) = nest_ν(nf(Fi)) ≤ 2^{depth(Fi)} − 1 ≤ m = 2^n − 1, thanks to Lemma 6 and Prop. 1. If (T(Δ)_m, ⊑) is a wqo then there are i < j st T(nf(Fi)) ⊑ T(nf(Fj)). By Prop. 2, nf(Fi) ⊑_F nf(Fj). Finally, because Fi ≡ nf(Fi) and Fj ≡ nf(Fj), we can conclude that Fi ⊑_F Fj. Indeed, (T(Δ)_m, ⊑) is a wqo. To see it, it is enough to check that we can take Δ to be a finite set. Indeed, since fragments are depth-bounded, we can choose a finite set of names so that every name in Δ and every name in a formula in Δ is taken from that set. (Mn, ⊑) is also a wqo because ⊑_F is, and ⊑ is the multiset order induced by ⊑_F.

The proof of the previous result makes use of the fact that the order ⊑ on trees is a wqo. Therefore, if a ν-MSR is depth-bounded by n, then the set of reachable terms is contained in Mn, which is a wqo with its order.
In order to see that they are a WSTS, we still have to see that the transition relation is monotonic with respect to the considered order, and that we can compute a finite representation of the set of predecessors of a given term.
Fig. 6. A simple pν-PN
Theorem 1. Depth-bounded ν-MSR are strict WSTS.

Proof. We have to see that the defined order is monotonic with respect to the rewriting relation, and that we can compute a finite representation of the set of predecessors of a given term. The former follows from the compatibility of the transition relation with the standard normal form and Lemma 5. The latter follows from the fact that ν-MSR are finitary, that is, for a given M there are finitely many terms M′ up to ≡ st M → M′.

Since coverability, termination and also boundedness are decidable for strict WSTS [10,2], we obtain the following result as a corollary.

Corollary 1. Coverability, boundedness and termination are decidable for the class of depth-bounded ν-MSR.

In [16] we proved that π-calculus processes can be directly encoded into ν-MSR. Moreover, depth-bounded π-calculus processes correspond to depth-bounded ν-MSR.

Proposition 3 ([16]). For every π-calculus process P there is a ν-MSR H(P) (with H computable) st the transition systems induced by P and H(P) are isomorphic. Moreover, if P is a depth-bounded process then H(P) is a depth-bounded ν-MSR.

Then, thanks to the previous result, and as a corollary of Prop. 1, we can obtain the following result (which was already obtained in [13]).

Corollary 2. Depth-bounded π-calculus processes are strict WSTS. Therefore, coverability, termination and boundedness are decidable for depth-bounded π-calculus processes.

The novelty of our results lies in the fact that we can apply Prop. 1 to other formalisms that can be easily encoded within ν-MSR. This is the case for pν-PN. A pν-PN is a Petri net that manages tuples of pure names. More precisely, tokens in a pν-PN are of the form (a1, . . . , an), where each ai is a pure name [12], taken from a set Id. In order to handle names, arcs are labelled by tuples of variables, taken from a set Var.
Moreover, transitions can create fresh names, which is formalized by means of a special variable ν ∈ Var, which can only be instantiated to names that do not occur in the current state. Fig. 6 depicts a simple pν-PN and the firing of its only transition. See [18] for more details on pν-PN. In [16] we proved that ν-MSR and pν-PN are actually the same thing, so that pν-PN can be seen as a graphical representation of ν-MSR working in their standard normal form.
Proposition 4 ([16]). For every pν-PN N there is a ν-MSR K(N) (with K computable) st the transition systems induced by N and K(N) are isomorphic.

We say a pν-PN is depth-bounded if there is k st, for every reachable state M and every sequence A1, . . . , An of tokens in M in which, for every i, some name ai occurs in both Ai and Ai+1 (with the names ai pairwise different), necessarily n ≤ k. Depth-bounded pν-PN correspond to depth-bounded ν-MSR [16]. Moreover, one can check that ordinary Petri nets correspond to 0-bounded ν-MSR, and ν-PN (the monadic subclass of pν-PN) to 1-bounded ν-MSR.

Corollary 3. Depth-bounded pν-PN are strict WSTS. Therefore, coverability, boundedness and termination are decidable for the class of depth-bounded pν-PN.
5 Conclusions and Future Work
In this paper we consider a variation of the existing formalisms of concurrency based on multiset rewriting, which we call ν-MSR. We proved in [16] that they are Turing-complete, so that no interesting problem can be decided for them. Now we adapt the results in [13] in order to prove that a subclass of ν-MSR, that in which the interdependence of restricted names is bounded, is a strict Well Structured Transition System. This yields decidability of coverability, termination and also boundedness.

These results can be transferred to any formalism that can be encoded within ν-MSR. We know that π-calculus processes can be easily translated to a ν-MSR system, so that depth-bounded π-calculus processes are WSTS. This was already proved in [13]. However, we can also obtain as a corollary the strict well-structuredness of depth-bounded pν-PN. Moreover, we claim that the same result holds for spi-calculus processes [1], with an encoding analogous to the one used for the π-calculus.

We have seen that the class of depth-bounded ν-MSR has decidable coverability. However, in order to obtain such a decidability result, one needs to know a priori a bound on the nesting of restrictions in every reachable state. The paper [22] establishes how the algorithmic schema in [11] can be used to decide coverability using a forward analysis. This approach has the advantage that we do not need to know a bound on the nesting of restrictions a priori. As immediate future work, it would be interesting to find (structural) sufficient conditions for depth-boundedness of ν-MSR. In that sense, it would be useful to tighten the bound found in Prop. 1 on the nesting of a fragment.
References

1. Abadi, M., Gordon, A.D.: A Calculus for Cryptographic Protocols: The spi Calculus. Inf. Comput. 148(1), 1–70 (1999)
2. Abdulla, P.A., Cerans, K., Jonsson, B., Tsay, Y.K.: Algorithmic analysis of programs with well quasi-ordered domains. Inf. Comput. 160(1-2), 109–127 (2000)
3. Baldan, P., Bonchi, F., Gadducci, F.: Encoding asynchronous interactions using open Petri nets. In: Bravetti, M., Zavattaro, G. (eds.) CONCUR 2009. LNCS, vol. 5710, pp. 99–114. Springer, Heidelberg (2009)
4. Busi, N., Gorrieri, R.: Distributed semantics for the pi-calculus based on Petri nets with inhibitor arcs. J. Log. Algebr. Program. 78(3), 138–162 (2009)
5. Cervesato, I.: Typed MSR: Syntax and Examples. In: Gorodetski, V.I., Skormin, V.A., Popyack, L.J. (eds.) MMM-ACNS 2001. LNCS, vol. 2052, pp. 159–177. Springer, Heidelberg (2001)
6. Cervesato, I., Durgin, N.A., Lincoln, P., Mitchell, J.C., Scedrov, A.: A meta-notation for protocol analysis. In: CSFW, pp. 55–69 (1999)
7. Decker, G., Weske, M.: Instance isolation analysis for service-oriented architectures. In: IEEE SCC, vol. (1), pp. 249–256. IEEE Computer Society, Los Alamitos (2008)
8. Delzanno, G.: An overview of MSR(C): A CLP-based framework for the symbolic verification of parameterized concurrent systems. Electr. Notes Theor. Comput. Sci. 76 (2002)
9. Delzanno, G.: Constraint multiset rewriting. Technical Report DISI-TR-05-08, University of Genova (2005)
10. Finkel, A., Schnoebelen, P.: Well-structured transition systems everywhere! Theor. Comput. Sci. 256(1-2), 63–92 (2001)
11. Geeraerts, G., Raskin, J.F., Begin, L.V.: Expand, enlarge and check: New algorithms for the coverability problem of WSTS. J. Comput. Syst. Sci. 72(1), 180–203 (2006)
12. Gordon, A.D.: Notes on nominal calculi for security and mobility. In: Focardi, R., Gorrieri, R. (eds.) FOSAD 2000. LNCS, vol. 2171, pp. 262–330. Springer, Heidelberg (2001)
13. Meyer, R.: On boundedness in depth in the pi-calculus. In: Ausiello, G., Karhumäki, J., Mauri, G., Ong, C.H.L. (eds.) IFIP TCS. IFIP, vol. 273, pp. 477–489. Springer, Heidelberg (2008)
14. Meyer, R.: A theory of structural stationarity in the pi-calculus. Acta Inf. 46(2), 87–137 (2009)
15. Meyer, R., Gorrieri, R.: On the relationship between pi-calculus and finite place/transition Petri nets. In: Bravetti, M., Zavattaro, G. (eds.) CONCUR 2009. LNCS, vol. 5710, pp. 463–480. Springer, Heidelberg (2009)
16. Rosa-Velardo, F.: Multiset rewriting: a semantic framework for concurrency with name binding. In: 8th International Workshop on Rewriting Logic and its Applications, WRLA 2010. Springer, Heidelberg (to appear, 2010)
17. Rosa-Velardo, F., de Frutos-Escrig, D.: Name creation vs. replication in Petri net systems. Fundam. Inform. 88(3), 329–356 (2008)
18. Rosa-Velardo, F., de Frutos-Escrig, D.: Decidability problems in Petri nets with name creation and replication (submitted)
19. Rosa-Velardo, F., de Frutos-Escrig, D., Alonso, O.M.: On the expressiveness of Mobile Synchronizing Petri Nets. Electr. Notes Theor. Comput. Sci. 180(1), 77–94 (2007)
20. Sangiorgi, D., Walker, D.: The pi-calculus: a Theory of Mobile Processes. Cambridge University Press, Cambridge (2001)
21. van Hee, K.M., Sidorova, N., Voorhoeve, M., van der Werf, J.M.E.M.: Generation of database transactions with Petri nets. Fundam. Inform. 93(1-3), 171–184 (2009)
22. Wies, T., Zufferey, D., Henzinger, T.A.: Forward analysis of depth-bounded processes. In: Ong, L. (ed.) FOSSACS 2010. LNCS, vol. 6014, pp. 94–108. Springer, Heidelberg (2010)
Efficient Construction of Semilinear Representations of Languages Accepted by Unary NFA

Zdeněk Sawa

Center for Applied Cybernetics, Department of Computer Science, Technical University of Ostrava, 17. listopadu 15, Ostrava-Poruba, 708 33, Czech Republic
[email protected]
Abstract. Chrobak (1986) proved that a language accepted by a given nondeterministic finite automaton with a one-letter alphabet, i.e., a unary NFA, with n states can be represented as the union of O(n²) arithmetic progressions, and Martinez (2002) has shown how to compute these progressions in polynomial time. To (2009) has recently pointed out that Chrobak's construction and Martinez's algorithm, which is based on it, contain a subtle error, and has shown how they can be corrected. In this paper, a new, simpler and more efficient algorithm for the same problem is presented. The running time of the presented algorithm is O(n²(n + m)), where n is the number of states and m the number of transitions of a given unary NFA.
1 Introduction
It is well known that Parikh images of regular (and even context-free) languages are semilinear sets [7,4]. In unary languages, i.e., languages over a one-letter alphabet, words can be identified with their lengths (i.e., aⁿ can be identified with n), so the Parikh image of a unary language is just the set of lengths of the words of the language, and it can be identified with the language itself. It can be easily shown that each regular unary language can be represented as the union of a finite number of arithmetic progressions of the form {c + di | i ∈ N}, where c and d are constants specifying the offset and the period of a progression. A unary nondeterministic finite automaton (a unary NFA) is an NFA with a one-letter alphabet. Given a unary NFA A, a set of arithmetic progressions representing the language accepted by A can be computed by determinization of A; however, this straightforward approach can produce an exponential number of progressions. Chrobak [1] has shown that this exponential blowup is avoidable and that a language accepted by a unary NFA with n states can be represented as the union of O(n²) progressions of the form {c + di | i ∈ N} where c < p(n) for some p(n) ∈ O(n²) and 0 ≤ d ≤ n. The computational complexity of the
Supported by the Czech Ministry of Education, Grant No. 1M0567.
A. Kučera and I. Potapov (Eds.): RP 2010, LNCS 6227, pp. 176–182, 2010. © Springer-Verlag Berlin Heidelberg 2010
construction of these progressions was not analyzed in [1], but it can easily be seen that a naive straightforward implementation would require exponential time. Later, Martinez [5,6] has shown how the construction described in [1] can be realized in polynomial time. The exact complexity of Martinez's algorithm is O(kn⁴), where n is the number of states of the automaton and k the number of strongly connected components of its graph. The result was recently used, for example in [3,2], to obtain more efficient algorithms for some problems in automata theory and the verification of one-counter processes. In [8], To pointed out that Chrobak's construction and Martinez's algorithm (whose correctness relies on the correctness of Chrobak's construction) contain a subtle error, and he has shown modifications that correct this error. In this paper, we give a simpler and more efficient algorithm for the same problem, i.e., for computing a corresponding set of arithmetic progressions for a given unary NFA. The time complexity of the algorithm is O(n²(n + m)) and its space complexity O(n²), where n is the number of states and m the number of transitions of the unary NFA. Section 2 gives basic definitions and formulates the main result, Section 3 describes the algorithm and proves its correctness, and Section 4 contains a description of an efficient implementation of the algorithm and an analysis of its complexity.
2 Definitions and Main Result
The set of natural numbers {0, 1, 2, . . .} is denoted by N. For i, j ∈ N such that i ≤ j, [i, j] denotes the set {i, i + 1, . . . , j}, and [i, j) denotes the (possibly empty) set {i, i + 1, . . . , j − 1}. Given c, d ∈ N, an arithmetic progression is the set {c + d · i | i ∈ N}, denoted c + dN, where c is called the offset and d the period of the progression. The following definitions are standard (see e.g. [4]), except that they are specialized to the case where a one-letter alphabet is used. With such an alphabet, words can be identified with their lengths. A unary nondeterministic finite automaton (a unary NFA) is a tuple A = (Q, δ, I, F) where Q is a finite set of states, δ ⊆ Q × Q is a transition relation, and I, F ⊆ Q are the sets of initial and final states, respectively. A path of length k from q to q′, where q, q′ ∈ Q, is a sequence of states q0, q1, . . . , qk from Q where q = q0, q′ = qk, and (qi−1, qi) ∈ δ for each i ∈ [1, k]. We write q →^k q′ to denote that there exists a path of length k from q to q′. A word x ∈ N is accepted by A if q0 →^x qf for some q0 ∈ I and qf ∈ F. The language L(A) accepted by a unary NFA A is the set of all words accepted by A. We consider the following problem:

Problem: UNFA-Arith-Progressions
Input: A unary NFA A.
Output: A set {(c1, d1), (c2, d2), . . . , (ck, dk)} of pairs of natural numbers such that L(A) = (c1 + d1N) ∪ · · · ∪ (ck + dkN).
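As a concrete illustration of these definitions (not part of the paper; all names here are ours), membership x ∈ L(A) for a unary NFA can be tested by iterating the successor map x times:

```python
def accepts(delta, initial, final, x):
    """Membership test x in L(A) for a unary NFA A = (Q, delta, I, F).

    delta: set of (q, r) transition pairs; initial, final: sets of states.
    Tracks the set of states reachable from I by a path of length exactly x.
    """
    succ = {}
    for q, r in delta:
        succ.setdefault(q, set()).add(r)
    current = set(initial)
    for _ in range(x):
        current = {r for q in current for r in succ.get(q, ())}
    return bool(current & set(final))
```

For example, for the two-state NFA with transitions {(0, 1), (1, 0)}, I = F = {0}, which accepts exactly the even lengths, `accepts({(0, 1), (1, 0)}, {0}, {0}, 4)` is True.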
The main result presented in this paper is:

Theorem 1. There is an algorithm solving UNFA-Arith-Progressions with time complexity O(n²(n + m)) and space complexity O(n²), where n is the number of states and m the number of transitions of a given unary NFA. The algorithm constructs O(n²) pairs of numbers, and each constructed pair (ci, di) satisfies ci < 2n² + n and di ≤ n.
3 L(A) as Union of Arithmetic Progressions
In this section, we describe the algorithm for UNFA-Arith-Progressions and prove its correctness. In the rest of the section, we assume a fixed unary NFA A = (Q, δ, I, F) with |Q| = n.

3.1 The Algorithm
The algorithm works as follows. It computes the resulting set R of pairs of numbers that represent arithmetic progressions as the union of the following sets R1 and R2 where:
– R1 is the set of all pairs (x, 0) where x ∈ L(A) and x ∈ [0, 2n² + n), and
– R2 is the set of all pairs (c, d) where d ∈ [1, n], c ∈ [2n² − d, 2n²), and where for some q0 ∈ I, q ∈ Q, and qf ∈ F we have q0 →^n q, q →^d q, and q →^(c−n) qf (note that c ≥ n).

To compute R1, it is sufficient to test for each x ∈ [0, 2n² + n) whether x ∈ L(A), and to compute R2, it is sufficient to test for each of the O(n²) pairs (c, d), where d ∈ [1, n] and c ∈ [2n² − d, 2n²), whether the required conditions are satisfied. All these tests can easily be done in polynomial time, and we can also see that |R| ∈ O(n²). An efficient implementation of the algorithm, which avoids some recomputation by precomputing certain sets of states, is described in Section 4 together with a more detailed analysis of its complexity. The correctness of the algorithm is ensured by the following crucial lemma and its corollary; the proof of the lemma is postponed to the next subsection.

Lemma 2. Let x ≥ 2n² + n. If x ∈ L(A) then x ∈ c + dN for some (c, d) ∈ R2.

Corollary 3. Let x ∈ N. Then x ∈ L(A) iff x ∈ c + dN for some (c, d) ∈ R.

Proof. (⇒) Assume x ∈ L(A). Either x < 2n² + n, and then (x, 0) ∈ R1 and x ∈ (x + 0N) = {x}, or x ≥ 2n² + n, and then x ∈ c + dN for some (c, d) ∈ R2 by Lemma 2. (⇐) It can be easily checked that c + dN ⊆ L(A) for each (c, d) ∈ R. For (c, d) ∈ R1 this follows from the definition, and for (c, d) ∈ R2 from the observation that if q0 →^n q, q →^d q, and q →^(c−n) qf for some q0 ∈ I, q ∈ Q, and qf ∈ F (where c ≥ n), then A accepts each word from c + dN.
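A direct, unoptimized transcription of this algorithm might look as follows (an illustrative sketch; the function and variable names are ours, and reachability is recomputed per test, which the efficient implementation of Section 4 avoids):

```python
def arith_progressions(states, delta, initial, final):
    """Naive transcription of the Section 3 algorithm: returns a set of
    (offset, period) pairs whose progressions cover exactly L(A)."""
    n = len(states)
    succ = {q: set() for q in states}
    for q, r in delta:
        succ[q].add(r)

    def step(S):
        return {r for q in S for r in succ[q]}

    bound = 2 * n * n + n
    # layers[x] = states reachable from I by a path of length exactly x.
    layers = [set(initial)]
    for _ in range(bound - 1):
        layers.append(step(layers[-1]))

    # R1: all accepted lengths below 2n^2 + n, with period 0.
    R = {(x, 0) for x in range(bound) if layers[x] & set(final)}

    def reach(q, k):
        S = {q}
        for _ in range(k):
            S = step(S)
        return S

    # R2: pairs (c, d) with d in [1, n], c in [2n^2 - d, 2n^2), witnessed
    # by some q with q0 ->^n q, q ->^d q, and q ->^(c-n) qf.
    for d in range(1, n + 1):
        for c in range(2 * n * n - d, 2 * n * n):
            for q in layers[n]:
                if q in reach(q, d) and reach(q, c - n) & set(final):
                    R.add((c, d))
                    break
    return R
```

For the two-state automaton accepting the even lengths, the returned pairs cover exactly the even numbers.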
3.2 Proof of Lemma 2
The rest of this section is devoted to the proof of Lemma 2, which proceeds by the following sequence of simple propositions. The basic idea of the proof is that there exists a polynomial p(n) ∈ O(n²) such that if q1 →^x q2 for some q1, q2 ∈ Q and x ≥ p(n), then there is a path α of length x from q1 to q2 of the following form: α goes from q1 to some state q in c1 steps, then goes through a cycle of length d ∈ [1, n] several times, and then goes from q to q2 in c2 steps. Obviously x = c1 + k · d + c2 for some k ∈ N, and it will also be ensured that c1 + c2 < p(n). Every path α of length x from q1 to q2 can be transformed into the described form by the following construction: we decompose α into elementary cycles, i.e., cycles in which no state is repeated, and a simple path from q1 to q2, i.e., a path in which no state is repeated. We do this by repeatedly removing elementary cycles from α. Using this decomposition, we construct a path of the required form by selecting one elementary cycle of some length, say d, and by repeatedly "cutting out" subsets of the remaining elementary cycles whose total lengths are multiples of d, which means that they can be replaced with iterations of the selected cycle of length d. However, when cutting out cycles we must be careful, because cutting out some cycles can make other cycles unreachable. An error of this kind was made by Chrobak in [1], as pointed out by To in [8]. To ensure that no cycle becomes unreachable, we divide the elementary cycles into two categories, removable and unremovable. Only removable cycles will be cut out, and it will be ensured that it is safe to remove any subset of removable cycles. We say a sequence β0, β1, . . . , βr, where β0 is a simple path from q1 to q2 and β1, β2, . . . , βr are elementary cycles, is good if for each i ∈ [1, r] there is some j ∈ [0, i) such that βi and βj share at least one state q.

Note that from such a good sequence we can construct a path from q1 to q2, whose length is the sum of the lengths of all βi, by starting with β0 and repeatedly "pasting in" β1, β2, . . . , βr (in this order). Each cycle βi can be pasted in since it shares some state q with some βj where j < i (βi can be pasted in by splitting it at q). Note that a decomposition β0, β1, . . . , βr of an original path α, where β0 is a simple path from q1 to q2 and where β1, β2, . . . , βr are the elementary cycles in the reverse of the order in which they were removed from α (i.e., βr was removed first and β1 last), is good. We say a cycle βi, where i ∈ [1, r], is removable if for each state q of βi there is some j ∈ [0, i) such that βj contains q. A cycle βi that is not removable is unremovable. It can be easily checked that a sequence obtained from β0, β1, . . . , βr by removing an arbitrary subset of removable cycles is also good. The following proposition is the main "tool" that allows us to find a subset of removable cycles such that the sum of the lengths of the cycles in this subset is a multiple of d.
Proposition 4. Let d ≥ 1. Every sequence x1, x2, . . . , xr of natural numbers, where r ≥ d, contains a non-empty contiguous subsequence xi, xi+1, . . . , xj (where 1 ≤ i ≤ j ≤ r) such that (xi + xi+1 + · · · + xj) ≡ 0 (mod d).

Proof. Consider the sequence s0, s1, . . . , sr where si = x1 + x2 + · · · + xi for i ∈ [0, r]. There are at most d different values of si modulo d. Since r ≥ d, by the pigeonhole principle we have si ≡ sj (mod d) for some i, j such that 0 ≤ i < j ≤ r. The non-empty sequence xi+1, xi+2, . . . , xj has the required property (xi+1 + xi+2 + · · · + xj) ≡ 0 (mod d), since sj − si ≡ 0 (mod d).
Proposition 5. Let q1, q2 ∈ Q, x ∈ N, and d ∈ [1, n]. If q1 →^x q2 then q1 →^y q2 for some y ∈ [0, 2n² − n) such that y ≤ x and y ≡ x (mod d).

Proof. Assume q1 →^x q2, and let y ∈ N be the smallest number such that y ≡ x (mod d) and q1 →^y q2 (such a y exists, since y = x satisfies these properties). Let β0, β1, . . . , βr be a good decomposition of a path of length y from q1 to q2 (β0 is a simple path from q1 to q2 and βi for i ∈ [1, r] are elementary cycles). Suppose there were at least d removable cycles in this decomposition. Then, by Proposition 4, there is a non-empty subset of these removable cycles such that the sum of the lengths of the cycles in this subset is a multiple of d. By removing the cycles in this subset we obtain a good sequence, from which we can construct a path from q1 to q2 of length y′ < y where y′ ≡ y (mod d). So q1 →^(y′) q2 and y′ ≡ x (mod d), which is a contradiction, since we assumed that y is the smallest such number. This implies that the sequence β0, β1, . . . , βr contains at most d − 1 removable cycles. A cycle βi is unremovable iff it contains a state q that does not belong to any βj with j < i, which implies that there are at most n − 1 unremovable cycles (note that there is at least one state in β0). The length of β0 is at most n − 1 and the length of each elementary cycle is at most n, which implies y ≤ (n − 1) + (n − 1 + d − 1) · n < 2n² − n, since d ≤ n.
Corollary 6. Let q1 →^x q2 for some q1, q2 ∈ Q and x ∈ N. If x ≥ n then there exist q ∈ Q, c1 ∈ [0, n), d ∈ [1, n], and c2 ∈ [0, 2n² − n) such that q1 →^(c1) q, q →^d q, q →^(c2) q2, and x ∈ (c1 + c2) + dN.

Proof. By the pigeonhole principle, some q ∈ Q must be visited twice in the first n steps of a path from q1 to q2 of length x ≥ n, and so for some c1 ∈ [0, n), d ∈ [1, n], and c2′ ∈ N we have q1 →^(c1) q, q →^d q, q →^(c2′) q2, and x = c1 + d + c2′. By Proposition 5, there is some c2 ∈ [0, 2n² − n) satisfying c2 ≤ c2′, q →^(c2) q2, and c2 ≡ c2′ (mod d). So c2′ = c2 + k · d for some k ∈ N, and x = c1 + d + c2′ = (c1 + c2) + (k + 1) · d, which means that x ∈ (c1 + c2) + dN.
Proposition 7. Let q1 →^x q2 for some q1, q2 ∈ Q and x ∈ N. If x ≥ 2n² + n then there exist q ∈ Q, c ∈ [0, 2n² − n), and d ∈ [1, n], such that q1 →^n q, q →^d q, q →^c q2, and x ∈ (n + c) + dN.
Proof. Assume q1 →^x q2 where x ≥ 2n² + n. By Corollary 6, there are some q′ ∈ Q, c1 ∈ [0, n), d ∈ [1, n], c2 ∈ [0, 2n² − n), and k ∈ N such that q1 →^(c1) q′, q′ →^d q′, q′ →^(c2) q2, and x = (c1 + c2) + k · d. Let α be a path of length x from q1 to q2 that goes from q1 to q′ in c1 steps, then goes k times through a cycle β of length d, and then goes from q′ to q2 in c2 steps, and let q be the state reached after the first n steps of α. Note that since (c1 + c2) + k · d = x ≥ 2n² + n and c1 + c2 < 2n² (because c1 < n and c2 < 2n² − n), we have k · d ≥ n. Together with c1 < n, this ensures that the state q is on the cycle β, which implies q1 →^n q, q →^d q, and q →^(x−n) q2. By Proposition 5, there is some c ∈ [0, 2n² − n) such that c ≤ x − n, q →^c q2, and c ≡ x − n (mod d). This means that n + c ≡ x (mod d), and since c ≤ x − n implies n + c ≤ x, we have x ∈ (n + c) + dN.
Now we can prove Lemma 2.

Proof (of Lemma 2). Assume that x ≥ 2n² + n and x ∈ L(A), so there are some q0 ∈ I and qf ∈ F such that q0 →^x qf. By Proposition 7, there exist q ∈ Q, c′ ∈ [0, 2n² − n), and d ∈ [1, n], such that q0 →^n q, q →^d q, q →^(c′) qf, and x ∈ (n + c′) + dN. This means that for each c ∈ (n + c′) + dN such that c ≤ x, we have q →^(c−n) qf and x ∈ c + dN. In particular, there is one such c in the interval [2n² − d, 2n²), since n + c′ ∈ [n, 2n²).
4 Efficient Implementation
To avoid recomputation, the algorithm precomputes some sets. For i ∈ N we define Si = {q ∈ Q | ∃q0 ∈ I : q0 →^i q} and Ti = {q ∈ Q | ∃qf ∈ F : q →^i qf}, and for q ∈ Q we define Periods(q) = {d ∈ [1, n] | q →^d q}. In particular, the algorithm precomputes the set Sn, the sets Ti for i ∈ [2n² − 2n, 2n² − n), and Periods(q) for q ∈ Sn. To test for a given q whether q0 →^n q for some q0 ∈ I, the algorithm tests whether q ∈ Sn; to test whether q →^(c−n) qf for some qf ∈ F, it tests whether q ∈ Tc−n; and to test whether q →^d q, it tests whether d ∈ Periods(q). All these sets can be implemented as bit arrays, so operations like adding an element to a set, testing whether an element is a member of a set, and so on, can be performed in constant time. It is also obvious that for Q′ ⊆ Q, the sets Succ(Q′) = {q ∈ Q | ∃q′ ∈ Q′ : (q′, q) ∈ δ} and Pre(Q′) = {q ∈ Q | ∃q′ ∈ Q′ : (q, q′) ∈ δ} can be computed in time O(n + m), where m is the number of transitions (i.e., |δ| = m). Using subroutines for computing Pre and Succ, the precomputation of all necessary sets can be done in time O(n²(n + m)). For example, Sn can be precomputed by computing the sequence S0, S1, . . . , Sn where S0 = I and Si+1 = Succ(Si) for i ≥ 0; Ti can be computed by T0 = F and Ti+1 = Pre(Ti) for i ≥ 0, etc. Also, all x < 2n² + n such that x ∈ L(A) can be found in time O(n²(n + m)) by computing the sequence S0, S1, . . . , S2n²+n−1 and checking whether Sx ∩ F ≠ ∅ for x ∈ [0, 2n² + n). There are O(n²) pairs (c, d) such that d ∈ [1, n] and c ∈ [2n² − d, 2n²), and for each of them, at most n states are tested. Since the corresponding tests for one
triple (c, d, q) can be done in constant time as described above, all triples can be tested in time O(n³). We see that the overall running time of the algorithm is O(n²(n + m)). During the computation, only the values of Sn, of Ti for i ∈ [2n² − 2n, 2n² − n), and of Periods(q) for q ∈ Sn need to be stored. Obviously, O(n²) bits are sufficient to store these values. Other values are used only temporarily, can be discarded after their use, and do not take more than O(n²) bits, so the overall space complexity of the algorithm is O(n²).
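The precomputation scheme of this section can be sketched as follows (an illustrative transcription, with Python sets standing in for the paper's bit arrays; all names are ours):

```python
def unfa_arith_progressions(states, delta, initial, final):
    """Sketch of the Section 4 implementation: precompute S_n, the layers
    T_i for i in [2n^2 - 2n, 2n^2 - n), and Periods(q) for q in S_n."""
    n = len(states)
    succ = {q: set() for q in states}
    pred = {q: set() for q in states}
    for q, r in delta:
        succ[q].add(r)
        pred[r].add(q)

    def Succ(S):
        return {r for q in S for r in succ[q]}

    def Pre(S):
        return {q for r in S for q in pred[r]}

    bound = 2 * n * n + n
    R, S, Sn = set(), set(initial), set()
    # One forward sweep computes R1 and snapshots S_n on the way.
    for x in range(bound):
        if S & set(final):
            R.add((x, 0))             # R1: short accepted lengths
        if x == n:
            Sn = set(S)
        S = Succ(S)

    # Backward layers T_i (states reaching F in exactly i steps),
    # kept only for i in [2n^2 - 2n, 2n^2 - n).
    lo, hi = 2 * n * n - 2 * n, 2 * n * n - n
    T, Tlayers = set(final), {}
    for i in range(hi):
        if i >= lo:
            Tlayers[i] = set(T)
        T = Pre(T)

    # Periods(q) = {d in [1, n] | q ->^d q} for q in S_n.
    periods = {}
    for q in Sn:
        periods[q], S = set(), {q}
        for d in range(1, n + 1):
            S = Succ(S)
            if q in S:
                periods[q].add(d)

    # R2: (c, d) is included iff some q in S_n has period d and reaches F
    # in exactly c - n steps.
    for d in range(1, n + 1):
        for c in range(2 * n * n - d, 2 * n * n):
            if any(d in periods[q] and q in Tlayers[c - n] for q in Sn):
                R.add((c, d))
    return R
```

On the two-state automaton accepting the even lengths, this agrees with the naive computation of R1 ∪ R2.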
References
1. Chrobak, M.: Finite automata and unary languages. Theoretical Computer Science 47(2), 149–158 (1986)
2. Göller, S., Mayr, R., To, A.W.: On the computational complexity of verifying one-counter processes. In: LICS'09, pp. 235–244. IEEE Computer Society, Los Alamitos (2009), http://dx.doi.org/10.1109/LICS.2009.37
3. Gruber, H., Holzer, M.: Computational complexity of NFA minimization for finite and unary languages. In: Martín-Vide, C., Otto, F., Fernau, H. (eds.) LATA 2008. LNCS, vol. 5196, pp. 261–272. Springer, Heidelberg (2008)
4. Kozen, D.C.: Automata and Computability. Springer, Heidelberg (1997)
5. Martinez, A.: Efficient computation of regular expressions from unary NFAs. In: Descriptional Complexity of Formal Systems, DCFS (2002)
6. Martinez, A.: Topics in Formal Languages: String Enumeration, Unary NFAs and State Complexity. Master's thesis, University of Waterloo (2002)
7. Parikh, R.J.: On context-free languages. J. ACM 13(4), 570–581 (1966)
8. To, A.W.: Unary finite automata vs. arithmetic progressions. Information Processing Letters 109(17), 1010–1014 (2009), http://dx.doi.org/10.1016/j.ipl.2009.06.005
Efficient Graph Reachability Query Answering Using Tree Decomposition

Fang Wei
Computer Science Department, University of Freiburg, Germany
Abstract. Efficient reachability query answering in large directed graphs has been intensively investigated because of its fundamental importance in many application fields such as XML data processing, ontology reasoning and bioinformatics. In this paper, we present a novel indexing method based on the concept of tree decomposition. We show analytically that this intuitive approach is both time and space efficient. We demonstrate empirically the efficiency and the effectiveness of our method.
1 Introduction
Querying and manipulating large scale graph-like data has attracted much attention in the database community, due to the wide application areas of graph data, such as GIS, XML databases, bioinformatics, social networks, and ontologies. The problem of reachability testing in a directed graph is among the fundamental operations on graph data. Given a digraph G = (V, E) and u, v ∈ V, a reachability query, denoted as u → v, asks: is there a path from u to v? One of the fundamental queries on biological networks is, for instance, to find all genes whose expressions are directly or indirectly influenced by a given molecule [15]. Given the graph representation of the genes and regulation events, this question can also be reduced to a reachability query in a directed graph. Recently, tree decomposition methodologies have been successfully applied to shortest path query answering over undirected graphs [17]. Briefly stated, the vertices of a graph G are decomposed into a tree in which each node contains a set of vertices of G. Different from other partitioning based methods, there is overlap between the tree nodes, i.e., for any vertex v in G, there can be more than one node in the tree which contains v. However, it is required that all these nodes constitute a connected subtree (see Definition 1 for the formal definition). Based on this decomposed structure, many otherwise intractable problems can be solved efficiently if the underlying tree decomposition has bounded treewidth. In this paper we make an attempt to solve reachability problems over directed graphs by using tree decomposition based index structures. In comparison to shortest path queries, reachability query answering enjoys some nice properties. For instance, the existing BFS or DFS algorithms are highly efficient. However, these properties might cause challenging problems to occur, if

A. Kučera and I. Potapov (Eds.): RP 2010, LNCS 6227, pp. 183–197, 2010. © Springer-Verlag Berlin Heidelberg 2010
substantial improvement in time complexity is desired. Note that one extreme scheme is to store all transitive closures in the pre-processing stage, so that reachability queries can be answered in constant time. However, this requires an index of size O(n²), which is unrealistic for large scale graphs. Therefore, finding a better trade-off between time and storage is the ultimate goal of many reachability query answering algorithms. Surprisingly, we have found that the tree decomposition based methodology can be adapted to directed graphs and, moreover, that the efficiency of the query algorithm is substantially improved, based on an index that is much smaller than O(n²). Our main contributions are the following:
– Linear time tree decomposition algorithm. In spite of the theoretical importance of the tree decomposition concept, many results are practically useless due to the fact that finding a tree decomposition with optimal treewidth is an NP-hard problem w.r.t. the size of the graph. To overcome this difficulty, we propose a simple heuristic to achieve a linear time tree decomposition algorithm.
– Flexibility in balancing time and space efficiency. From the proposed tree decomposition algorithm, we discover an important correlation between the query time and the index size. This flexibility enables users to choose the best time/space trade-off according to the system requirements.

1.1 Related Work
Most current research on reachability query answering concentrates on methods that first build an index structure storing part of the transitive closure and then use it to speed up query answering, thus finding better trade-offs between index size and query answering time. These methods fall into two main groups: the first group of algorithms is based on the 2-Hop approach first proposed by Cohen et al. [6]; the second is based on the interval labeling approach by Agrawal et al. [1].

2-Hop based algorithms. The basic idea of the 2-Hop approach is to assign to each vertex v a list of vertices that can reach v, denoted Lin(v), and a list of vertices that v can reach, denoted Lout(v), so that for any two vertices u and v, u → v if and only if Lout(u) ∩ Lin(v) ≠ ∅. The goal of the algorithm is to minimize the index size, which is of the form Σv∈V (|Lout(v)| + |Lin(v)|). Clearly, if the index is available, answering a reachability query requires only two lookups. However, this optimization problem is NP-hard. Improvements on the 2-Hop algorithm can be found in [13,14]. Generally, 2-Hop based algorithms do not scale to large graphs.
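The 2-Hop query itself is a single intersection test; a minimal sketch with hand-built labels for the path 0 → 1 → 2 (computing small labels, the hard part, is the NP-hard optimization and is not shown):

```python
def reach_2hop(L_out, L_in, u, v):
    """2-Hop reachability: u -> v iff u's out-label and v's in-label
    share at least one hop vertex."""
    return bool(L_out[u] & L_in[v])

# Hand-built labels for the path 0 -> 1 -> 2 (every vertex labels itself;
# vertex 1 serves as the shared hop):
L_out = {0: {0, 1}, 1: {1}, 2: {2}}   # hops reachable from v
L_in = {0: {0}, 1: {1}, 2: {1, 2}}    # hops that can reach v
```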
Interval labeling based algorithms. Interval labeling based approaches utilize the efficient method of indexing and querying trees that has been applied to XML query processing in recent years [18]. It is well known that, given a tree, we can label each node v with an interval [start(v), end(v)], so that a reachability query can be answered by comparing the start and end labels of u and v in constant time. The labeling process takes linear time and space. The Dual Labeling algorithm proposed by Wang et al. [16] answers reachability queries in constant time. It first identifies a spanning tree of the graph and labels the vertices of the tree with pre- and post-order values. Then the transitive closure of the remaining edges is stored. Clearly, the price for the constant query time is the storage cost of t², where t is the number of non-tree edges in the graph. Therefore, the Dual Labeling approach achieves good performance only if the graph is extremely sparse, i.e., t ≪ n. Jin et al. [9] proposed a different index structure called Path Tree. Like other interval labeling based methods, they extract a tree from the original graph, but every node in the tree contains a path instead of a single vertex. This index structure is superior to the previous ones since it can encode some non-tree structures, such as grids, in an elegant way. Common to all of these algorithms is that their performance deteriorates on non-sparse graphs. In contrast, the index structure proposed in this paper scales to dense graphs as well.
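The interval labeling scheme for trees can be sketched as follows (an illustrative pre-order variant under our own naming; the cited papers may label slightly differently):

```python
def interval_label(children, root):
    """Pre-order interval labeling of a tree: each node v gets
    [start(v), end(v)] where end(v) is the largest pre-order number
    appearing in v's subtree."""
    labels, counter = {}, [0]

    def dfs(v):
        start = counter[0]
        counter[0] += 1
        for c in children.get(v, []):
            dfs(c)
        labels[v] = (start, counter[0] - 1)

    dfs(root)
    return labels

def tree_reach(labels, u, v):
    """u reaches v in the tree iff start(v) falls inside u's interval."""
    su, eu = labels[u]
    return su <= labels[v][0] <= eu
```

Labeling is one DFS (linear time and space); each query is two comparisons.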
2 Graph Indexing with Tree Decomposition

2.1 Tree Decomposition of Directed Graphs

A directed graph is defined as G = (V, E), where V = {0, 1, . . . , n − 1} is the vertex set and E ⊆ V × V is the edge set. Let n = |V| be the number of vertices and m = |E| the number of edges. The tree decomposition of a directed graph is defined as follows:

Definition 1. A tree decomposition of G = (V, E), denoted as TG, is a pair ({Xi | i ∈ I}, T), where {Xi | i ∈ I} is a collection of subsets of V and T = (I, F) is a tree such that:
1. ⋃i∈I Xi = V.
2. for every (u, v) ∈ E, there is i ∈ I such that u, v ∈ Xi.
3. for all v ∈ V, the set {i | v ∈ Xi} induces a subtree of T.

A tree decomposition contains a set of tree nodes, where each node contains a set of vertices of V. We call the sets Xi bags. Every vertex of V must occur in at least one bag (condition 1), and for every edge of E, both endpoints must occur together in at least one bag (condition 2). The third condition is usually referred to as the connectedness condition; it requires that, for a given vertex v of the graph, all the bags containing v are connected. Note that from now on, a node of the directed graph G is referred to as a vertex, and a node of the tree decomposition as a tree node or simply a node. For each tree node i, there is a bag Xi consisting of vertices. To simplify the presentation, we will sometimes use the terms node and bag interchangeably.
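The three conditions of Definition 1 translate directly into a checker (an illustrative sketch with our own names; that the decomposition's edges actually form a tree is assumed, not verified):

```python
def is_tree_decomposition(V, E, bags, tree_edges):
    """Check the three conditions of Definition 1.

    bags: dict tree node -> set of graph vertices; tree_edges: edges of T.
    """
    # Condition 1: every vertex occurs in some bag.
    if set().union(*bags.values()) != set(V):
        return False
    # Condition 2: both endpoints of every graph edge share a bag.
    for u, v in E:
        if not any(u in X and v in X for X in bags.values()):
            return False
    # Condition 3 (connectedness): the nodes containing each vertex
    # induce a connected subtree of T.
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in V:
        nodes = {i for i, X in bags.items() if v in X}
        seen, stack = set(), [next(iter(nodes))]
        while stack:                      # BFS/DFS restricted to `nodes`
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            stack.extend(j for j in adj[i] if j in nodes)
        if seen != nodes:
            return False
    return True
```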
Given any graph G, there may exist many tree decompositions fulfilling the conditions of Definition 1. However, we are interested in tree decompositions with small bags. The width of a bag is its cardinality. The width of a tree decomposition ({Xi | i ∈ I}, T) is defined as max{|Xi| | i ∈ I}¹. The treewidth of G is the minimal width over all tree decompositions of G, denoted tw(G). Note that trees and forests are precisely the structures with treewidth 2.

Example 1. Consider the graph illustrated in Figure 1(a). One of its tree decompositions is shown in Figure 1(b). Recall that only trees and forests have treewidth 2; therefore this tree decomposition is optimal and we have tw(G) = 3.
Fig. 1. The graph G (a) and one tree decomposition TG (b) with tw(G) = 3
Let G = (V, E) be a graph and TG = ({Xi | i ∈ I}, T) its tree decomposition. Due to the third condition of Definition 1, for any vertex v in V there exists an induced subtree of TG in which every bag contains v. We call it the induced subtree of v and denote it as Tv. Furthermore, we denote the root of Tv as rv and its corresponding bag as Xrv. For instance, the induced subtree of vertex 3 in Figure 1(b) contains the bags X0, X1 and X2, where r3 = 0.

2.2 Tree Path
Let G = (V, E) be a directed graph, and u, v ∈ V. We say v is reachable from u, denoted as u → v, if there is a path starting at u and ending at v of the form (u, v1, . . . , vn, v), where (u, v1), (vi, vi+1), (vn, v) ∈ E. Note that in this paper we use the more general notion of path; that is, a path is not necessarily a simple path. Let us now consider the graph vertices within the tree nodes. Since a vertex may occur in more than one bag, an occurrence can be identified with a pair {v, i}, where v is a vertex and i a node of the tree, meaning that vertex v is located in tree node i. We call such a pair a tree vertex. We now define the notions of inner edge and inter edge in the tree decomposition.

Definition 2 (Inner edge, Inter edge, Tree path). Let G = (V, E) be a directed graph and TG = ({Xi | i ∈ I}, T) its tree decomposition.
¹ The original definition of the width is max{|Xi| | i ∈ I} − 1; we deviate due to aesthetic reasons.
– The inner edges of TG are precisely the pairs of tree vertices {({u, i}, {v, i}) | (u, v) ∈ E, u, v ∈ Xi (i ∈ I)}.
– The inter edges of TG are the pairs of tree vertices of the form ({v, i}, {v, j}) where v ∈ Xi, v ∈ Xj, and either (i, j) ∈ F or (j, i) ∈ F.
– A tree path from {u, i} to {v, j} is a sequence of tree vertices connected by inner or inter edges.

Intuitively, the set of inner edges consists precisely of the edges of E, with the extra information of the bags in which they are located. For instance, the inner edges of the tree decomposition of the graph in Example 1 include ({0, 2}, {5, 2}), ({1, 3}, {2, 3}), ({2, 1}, {3, 1}), ({3, 2}, {0, 2}), . . .. Note that the same pair of vertices may occur in more than one bag. For instance, the edge (4, 3) occurs in both bags X0 and X1, so there are two inner edges: ({4, 1}, {3, 1}) and ({4, 0}, {3, 0}). As an example of inter edges, in Example 1, ({5, 0}, {5, 2}) is an inter edge, as is ({5, 2}, {5, 0}).

Lemma 1. Let G = (V, E) be a directed graph and TG = ({Xi | i ∈ I}, T) its tree decomposition. Let u, v ∈ V, and let {u, i} and {v, j} be tree vertices of TG. There is a path from u to v in G if and only if there is a tree path from {u, i} to {v, j}.

Example 2. Consider the graph in Figure 1(a). Vertex 4 reaches vertex 0 via the path (4, 1, 2, 3, 0). In the tree decomposition in Figure 1(b), there is a tree path from {4, 1} to {0, 2} as follows: {4, 1}, {4, 3}, {1, 3}, {2, 3}, {2, 1}, {3, 1}, {3, 0}, {3, 2}, {0, 2}.

2.3 Reachability Test on Tree Decomposition

With the definition of tree path, to find a path from u to v we can simply search the tree decomposition for a corresponding tree path. Moreover, in the tree decomposition we only need to consider the simple path between the corresponding tree nodes. A well known property of trees is that for any two nodes i and j in a tree there exists a unique simple path, denoted SPi,j, such that every path from i to j contains all the nodes of SPi,j.

Proposition 1. Let G = (V, E) be a directed graph and TG = ({Xi | i ∈ I}, T) its tree decomposition. Let u, v ∈ V, and let ru (resp. rv) be the root node of the induced subtree of u (resp. v). Then u → v if and only if for every node n in SPru,rv there is at least one vertex t ∈ Xn such that u → t and t → v.

Proof. The "if" direction is trivial: given a tree path from {u, i} to {v, j}, we only need to consider the inner edges. Since each inner edge ({u, i}, {v, i}) corresponds to an edge (u, v) ∈ E, the path from u to v can easily be constructed. Now we prove the "only if" direction: assume that there is a path from u to v in G. We prove the claim by induction on the length of the path.
– Basis: u reaches v with a path of length 1, that is, (u, v) ∈ E. Then there exists a node k in the tree decomposition such that u ∈ Xk and v ∈ Xk. We start from {u, i} and traverse along the induced subtree of u until we reach {u, k}. Since the induced subtree is connected, the path from {u, i} to {u, k} can be constructed with inter edges. Then we go from {u, k} to {v, k} via an inner edge. Finally we traverse from {v, k} to {v, j} along the induced subtree of v, again using only inter edges. The tree path from {u, i} to {v, j} is thus complete.
– Induction: assume that the lemma holds for paths of length at most n − 1; we prove that it holds for paths of length n. Assume there is a path from u to v of length n, where u reaches w in n − 1 steps and (w, v) ∈ E. From the induction hypothesis, there is a tree path from {u, i} to {w, l} in the tree decomposition, where l is a node of the induced subtree of w. Since (w, v) ∈ E, there is a node n′ such that w ∈ Xn′ and v ∈ Xn′. Thus {w, n′} can be reached from {w, l} with inter edges, {w, n′} can reach {v, n′} via an inner edge, and finally {v, n′} can reach {v, j} with a sequence of inter edges. This completes the proof.
Proposition 1 shows that for the reachability test from u to v, although the tree path from {u, ru} to {v, rv} may visit any node of the tree, we only need to perform reachability tests for the vertices occurring on the simple path SPru,rv. More precisely, we can take any node n of SPru,rv and check whether there is a vertex t ∈ Xn such that u → t and t → v hold. To further accelerate the query process, we can execute the reachability test along the tree path in a bottom-up manner, as shown in Figure 2. To enable the bottom-up operation, we store the transitive closure for each bag of the tree decomposition: in every bag X, for every pair of vertices x, y ∈ X, the boolean values of x → y and y → x are precomputed. The following proposition shows how the reachability queries from u to all the vertices in SPru,k can be answered.
Fig. 2. Bottom-up processing on the simple tree path
Efficient Graph Reachability Query Answering Using Tree Decomposition

Proposition 2. Let G = (V, E) be a directed graph and TG = ({Xi | i ∈ I}, T) its tree decomposition. Let u, v ∈ V. Let k be the lowest common ancestor of ru and rv. The reachability queries from u to all the vertices in SPru,k can be answered in O(w²h), where h = |SPru,k| and w is the maximal width of the bags in SPru,k.

Proof. Assume that the transitive closure in every bag from SPru,k is available. The reachability test starts with node ru. From the transitive closure, we can simply obtain the set Yru ⊆ Xru such that every vertex in Yru can be reached from u. Next, we consider ru as the child node and process its parent node, with the available reachability information. This process is executed recursively h times, until k is reached. Next we show that at each step of the processing, all the vertices in the current bag reachable from u can be found in O(w²) time, where w is the width of the current bag. Assume p is the current node, c its child node, and we have obtained Yc ⊆ Xc, where Yc contains all the vertices reachable from u. Now we have to determine the set Yp ⊆ Xp, i.e. identify all the vertices in Xp reachable from u. Let z be a vertex in Xp. We want to decide whether u → z. We have the following two cases:
1. z ∈ Xp and z ∈ Xc. Since at the child node we know whether z ∈ Yc, we set z ∈ Yp if z ∈ Yc.
2. z ∈ Xp and z ∉ Xc. This is a more complex case. We show that z ∈ Yp (i.e. z is reachable from u) if and only if there exists a vertex t such that t ∈ Xp, t ∈ Yc and t → z holds.
(a) The "if" direction is trivial.
(b) "Only if": assume that u → z holds. Since z does not occur in Xc, according to the connectedness condition, z does not occur in any bag in the subtree rooted at c. Thus the induced subtrees of u and z do not share any common node in TG. Since u → z, there is a tree path from {u, ru} to {z, rz}, and c, p ∈ SPru,rz. The tree path from {u, ru} to {z, rz} must contain an inter edge of the form ({t, c}, {t, p}), where t ∈ Xp ∩ Xc, because this is the only possible way to traverse from c to p. Clearly u → t holds. From the assumption u → z, we obtain that t → z must hold.
Given the set Yc ⊆ Xc and the transitive closure in Xp, we can obtain Yp as follows: first set Yp = Yc ∩ Xp; then for each vertex t ∈ Yp, we add the vertex s into Yp if t → s holds. Clearly the time consumption is in the worst case O(w²), where w is the width of Xp.
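One bottom-up step of this proof — computing Yp from Yc using the parent bag's transitive closure — can be rendered as a short Python sketch (our names; `tc_p` is assumed to hold the precomputed reachable ordered pairs restricted to the parent bag Xp):

```python
def step_up(Yc, Xp, tc_p):
    """Propagate the set of u-reachable vertices from child bag to parent bag.

    Yc:   vertices of the child bag known to be reachable from u
    Xp:   the parent bag (iterable of vertices)
    tc_p: set of ordered pairs (a, b) with a -> b, restricted to Xp
    """
    Yp = set(Yc) & set(Xp)          # case 1: vertices shared with the child bag
    for t in list(Yp):              # case 2: extend via the local transitive closure
        for z in Xp:
            if (t, z) in tc_p:
                Yp.add(z)
    return Yp
```

Iterating only over the initial snapshot of Yp suffices because tc_p is transitively closed, matching the O(w²) bound.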
3 Algorithms and Complexity Results
In this section, we present the detailed algorithms for both the index construction and the reachability query answering. In Section 3.1 we begin with algorithmic issues of tree decomposition from a complexity-theoretic perspective, and then justify our choice of an efficient but suboptimal decomposition algorithm. In Section 3.2 we first analyze the reachability query answering algorithm suggested by Proposition 2 of the previous section. Then we show how time and space improvements can be made to achieve higher efficiency.
F. Wei

3.1 Index Construction via Tree Decomposition
Since its introduction by Robertson and Seymour [12], the concept of tree decomposition has proved to be of great importance in computational complexity theory [4]. The theoretical significance of the tree decomposition based approach lies in the fact that many intractable problems can be solved in polynomial time (or even in linear time) for graphs with treewidth bounded by a constant. Problems which can be dealt with in this way include many well known NP-complete problems, such as Independent Set, Hamiltonian Circuit, etc. Recent applications of tree decomposition based approaches can be found in constraint satisfaction [10] and database design [7]. However, the practical usefulness of tree decomposition based approaches has been limited due to the following two problems: (1) Calculating the treewidth of a graph is hard. In fact, determining whether the treewidth of a given graph is at most a given integer w is NP-complete [2]. Although for fixed w, linear time algorithms exist to solve the decision problem "treewidth ≤ w" [3], there is a huge hidden constant factor, which prevents them from being useful in practice. There exist many heuristics and approximation algorithms for determining the treewidth; unfortunately few of them can deal with graphs containing more than 1000 nodes [11]. (2) The second problem lies in the fact that even if the treewidth can be determined, good performance still cannot be guaranteed, since the time complexity of most of the algorithms is exponential in the treewidth. Therefore, to solve really hard problems efficiently using tree decomposition based approaches, we have to require that the underlying graphs have bounded treewidth (i.e. less than 10). As far as efficiency is concerned, we can only search for an approximate solution, which yields a tree decomposition whose width is greater than the treewidth.
On the other hand, we can tolerate a tree decomposition whose width is not bounded. As we have seen from Proposition 2, the time complexity is in the worst case quadratic in the maximal bag size. We will show later in this section that our query answering algorithm does not depend on the treewidth, but on a parameter which can be enforced to be bounded, thanks to the properties of our dedicated decomposition algorithm, and on the height of the tree. Inspired by the so-called pre-processing methods of Bodlaender et al. [5], we apply reduction rules which stepwise reduce a graph to another one with fewer vertices, based on the following simple notion.

Definition 3 (Simplicial). A vertex v is simplicial in an undirected graph G if the set of neighbors of v forms a clique in G.

Figure 3 shows some special cases. If a vertex v has degree one (Figure 3(a)), then we can remove v without increasing the treewidth. Figures 3(b) and 3(c) illustrate the cases of degree 2 and 3, respectively. The main idea of our decomposition algorithm is to reduce the graph by removing the vertices one by one from the graph, and at the same time push the removed vertices onto a stack, so that later on the tree can be constructed
Fig. 3. An undirected graph containing a vertex v with degree 1 (a), 2 (b) and 3 (c)
with the information from the stack. First a vertex v with a specific degree is identified. We then check whether all its neighbors form a clique; if not, we add the missing edges to construct a clique. Then v together with its neighbors is pushed onto the stack, followed by the deletion of v and its edges from the graph. See Algorithm 2.
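The simplicial test of Definition 3 — do the neighbors of v form a clique? — can be sketched as follows (our own helper; `adj` is assumed to be an undirected adjacency map from vertex to set of neighbors):

```python
def is_simplicial(adj, v):
    """Return True iff the neighbors of v form a clique in the undirected graph adj."""
    nb = list(adj[v])
    # every unordered pair of neighbors must be adjacent
    return all(b in adj[a] for i, a in enumerate(nb) for b in nb[i + 1:])
```

A degree-1 vertex is trivially simplicial (no neighbor pair to check), which matches the Figure 3(a) case.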
Algorithm 1. tree decomp(G)
Input: G = (V, E) is a directed graph.
Output: return the tree decomposition TG.
1: Transform G into an undirected graph UG;
2: graph reduction(UG); {output the vertex stack S}
3: tree construction(S, G); {output the tree decomposition}
The program begins by removing isolated vertices and vertices with degree 1. Then, the reduction process proceeds with the vertices of degree 2, 3, . . .. We denote such a procedure of removing all the vertices with degree x as degree-x reduction.

Example 3. Consider the undirected version of the graph in Example 1. Figure 4 illustrates the reduction process. The process starts with a degree-2 reduction by removing vertex 0 and its edges, after adding the edge between 3 and 5. Vertex 0 and its neighbors are then pushed onto the stack. Next vertex 1 is removed, following the same principle as for 0. After vertex 2 is removed, a single triangle is left.

The procedure graph reduction terminates when one of the following conditions is fulfilled. (1) The graph is reduced to an empty set. For instance, if the graph contains only simple cycles, it will be reduced to an empty set after degree-2 reductions. This is usually the case for extremely sparse graphs. (2) For graphs which are not sparse, one has to define an upper bound l for the reduction, so that the program stops after the degree-l reduction. Note that as the degree increases, the effectiveness of the reduction decreases, because in the worst case we need to add x(x − 1)/2 edges in order to remove x edges.
Fig. 4. The reduction process on the undirected graph of Example 1
Algorithm 2. graph reduction(UG)
Input: UG is the undirected graph of G, l is the upper bound for the reduction.
Output: stack S and the reduced graph UG
1: initialize stack S;
2: for i = 1 to l do
3:   remove upto(i);
4: end for
5: return S, UG;
6: procedure remove upto(x)
7: while TRUE do
8:   if there exists a vertex v with degree at most x then
9:     {v1, . . . , vx} = neighbors of v;
10:    build a clique for {v1, . . . , vx};
11:    push v, v1, . . . , vx onto S;
12:    delete v and all its edges from UG;
13:  else
14:    break;
15:  end if
16: end while
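A possible Python rendering of Algorithm 2 is given below. This is a sketch under our own assumptions: `adj` is an undirected adjacency map that is mutated in place, the removal condition is degree ≤ x (matching the name "remove upto"), and the stack records each removed vertex together with its neighbor list.

```python
def graph_reduction(adj, l):
    """Reduce the undirected graph adj up to degree l; return the removal stack."""
    S = []
    for x in range(1, l + 1):
        while True:
            # find any vertex of degree at most x
            v = next((u for u in adj if len(adj[u]) <= x), None)
            if v is None:
                break
            nb = list(adj[v])
            # fill in the missing edges so the neighbors form a clique
            for i, a in enumerate(nb):
                for b in nb[i + 1:]:
                    adj[a].add(b)
                    adj[b].add(a)
            S.append((v, nb))          # push v and its neighbors
            for a in nb:               # delete v and its incident edges
                adj[a].discard(v)
            del adj[v]
    return S
```

On a simple 4-cycle, a degree-2 reduction empties the graph, as the text predicts for graphs consisting only of simple cycles.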
After the reduction process, the tree decomposition can be constructed as follows: (1) First we collect all the vertices which were not removed by the reduction process and assign this set as the bag of the tree root. The size of the root depends on the structure of the graph (i.e. how many vertices are left after the reduction). (2) The rest of the tree is generated from the information stored in stack S. Let Xc be the set of vertices {v, v1, . . . , vx} which is popped from the top of S. Here v is the removed vertex and {v1, . . . , vx} are the neighbors of v which form a clique. After the parent bag Xp which contains {v1, . . . , vx} is located in the tree, Xc is added as a child bag of Xp. This process proceeds until S is empty. Algorithm 3 illustrates the process. The last step of the tree construction process is to generate the transitive closure for every bag.

The correctness of our tree decomposition algorithm can be shown by induction on the reduction steps. Note that during the reduction process, edges
Algorithm 3. tree construction(S, G, UG')
Input: S is the stack storing the removed vertices and their neighbors, G is the directed graph, UG' is the reduced graph of UG.
Output: return the tree decomposition TG
1: construct the root of TG containing all the vertices of UG';
2: while S is not empty do
3:   pop a bag Xc = {v, v1, . . . , vx} from S;
4:   find the bag Xp containing {v1, . . . , vx};
5:   add Xc into T as the child node of Xp;
6: end while
7: generate the transitive closure in all bags;
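A minimal sketch of Algorithm 3 in Python (our names; bags are represented as sets, the tree as a parent map on bag indices, with bag 0 as the root):

```python
def tree_construction(S, root_vertices):
    """Build bags and parent pointers from the removal stack S and the root bag."""
    bags = [set(root_vertices)]     # bag 0 is the root
    parent = {0: None}
    while S:
        v, nb = S.pop()             # the last removed vertex is processed first
        # locate a bag that already contains the clique {v1, ..., vx}
        p = next(i for i, X in enumerate(bags) if set(nb) <= X)
        bags.append({v, *nb})
        parent[len(bags) - 1] = p
    return bags, parent
```

For example, if the reduction left the triangle {1, 2, 3} as the root and removed vertex 0 with neighbors {1, 3}, the bag {0, 1, 3} is attached as a child of the root.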
are inserted into the original graph. Therefore, the tree decomposition we obtain according to the algorithm is based on a graph containing extra edges. However, this does not affect the correctness proof, due to the following proposition.

Proposition 3. Let G = (V, E) and G' = (V, E') be graphs where E ⊆ E'. Then any tree decomposition of G' is a tree decomposition of G.

Proof. Let TG' be a tree decomposition of G'. By checking the three properties of Definition 1, it is obvious that TG' is also a tree decomposition of G.

3.2 Reachability Query Answering
Recall from Proposition 2 that the time complexity of the bottom-up query answering is O(w²h). This upper bound is optimal only if the following two conditions are fulfilled: (1) the treewidth of the underlying graph is bounded (that is, w² ≪ n), and (2) there is an efficient tree decomposition algorithm for it. The first condition has to be fulfilled, since otherwise the linear time BFS algorithm would be more efficient. Unfortunately, as we have seen in the previous section, given an arbitrary graph, neither (1) nor (2) can be guaranteed. Therefore, we inspect the tree decomposition heuristics applied in Section 3.1 for improvements.

From Treewidth to |R| and l. According to Algorithm 2, a graph G can be decomposed by degree-x reductions with x increasing from 1 to l. As soon as the degree-l reduction is done, all the vertices which are not yet removed are the elements of the root R of the tree decomposition. Usually, if the graph is not extremely sparse, the relationship l ≪ |R| holds. In fact, we can even enforce such a relationship by setting l small enough in the tree decomposition algorithm. Hence, the resulting tree decomposition has the following properties: (1) the root is of big size (|R|), and (2) the rest of the bags have smaller size (with upper bound l). If we inspect the bottom-up query processing more carefully, we can observe that the quadratic time computation over the root can always be avoided. To see this, let us consider the vertices u and v such that the lowest common ancestor of
ru and rv is the root R. Assume that X1 (resp. X2) is the child node of R which lies on the simple path from ru (resp. rv) to R. Suppose now that for all x ∈ X1, reach(u, x) (resp. for all y ∈ X2, reach(y, v)) have been computed. Clearly, any path from u to v has to pass through a vertex in X1 and a vertex in X2. Therefore, at the root node R, we can first compute X1 ∩ R and X2 ∩ R. Since every path from u to v has to pass through one vertex in X1 ∩ R and another vertex in X2 ∩ R, we only need to execute a nested loop over X1 ∩ R and X2 ∩ R to decide the reachability. Since both |X1| and |X2| are bounded by l, the overall time consumption is O(l²h), thus independent of |R|. Note that if both u and v are located in R, then the answer can be immediately obtained from the local transitive closure of R, which is pre-computed.
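The nested loop over X1 ∩ R and X2 ∩ R can be sketched as follows (our helper names; `tc_root` is assumed to hold the pre-computed reachable ordered pairs restricted to the root bag R, and the two input sets are the u-reachable part of X1 and the v-reaching part of X2):

```python
def reach_through_root(R, X1_reach_u, X2_reach_v, tc_root):
    """Decide u -> v by a nested loop over the root-bag separators."""
    A = X1_reach_u & R     # candidates entering R from u's side
    B = X2_reach_v & R     # candidates leaving R towards v's side
    # x == y covers the case where one root vertex serves as both separators
    return any(x == y or (x, y) in tc_root for x in A for y in B)
```

Since |A| and |B| are bounded by l, this step costs O(l²) rather than O(|R|²).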
Algorithm 4. reach(TG, u, v)
Input: TG is the tree decomposition of G and u, v are vertices in G.
Output: return TRUE if u → v, otherwise FALSE
1: c = ru = root of the induced subtree of u;
2: k = lowest common ancestor of ru and rv;
3: Ru = reachable vertices from u in Xc;
4: while c.parent ≠ k do
5:   p = c.parent;
6:   Ru = Ru ∩ Xc ∩ Xp;
7:   for all t in Ru do
8:     Rt = set of vertices reachable from t in Xp;
9:     Ru = Ru ∪ Rt;
10:  end for
11:  c = p;
12: end while
13: c = rv = root of the induced subtree of v;
14: Rv = all vertices that reach v in Xc;
15: while c.parent ≠ k do
16:   p = c.parent;
17:   Rv = Rv ∩ Xc ∩ Xp;
18:   for all t in Rv do
19:     Rt = set of vertices that reach t in Xp;
20:     Rv = Rv ∪ Rt;
21:   end for
22:   c = p;
23: end while
24: Ru = Ru ∩ Xk; Rv = Rv ∩ Xk;
25: return TRUE iff there exist x ∈ Ru and y ∈ Rv such that reach(x, y);
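The two symmetric climbing phases and the final check at the lowest common ancestor can be rendered as a compact, self-contained Python sketch. Everything here is our own naming, not the paper's code: bags are sets indexed by integers, `parent` is the tree's parent map, `tc[b]` stores the pre-computed reachable ordered pairs restricted to bag b, and `top[x]` is the root of x's induced subtree; before the final nested loop the propagated sets are intersected with Xk.

```python
def lca(parent, a, b):
    """Lowest common ancestor of bags a and b in the decomposition tree."""
    anc = set()
    x = a
    while x is not None:
        anc.add(x)
        x = parent[x]
    while b not in anc:
        b = parent[b]
    return b

def climb(R, c, k, bags, parent, tc, fwd):
    """Propagate the set R bag by bag from c up to the child of k."""
    while c != k and parent[c] != k:
        p = parent[c]
        R = R & bags[c] & bags[p]
        for t in list(R):
            for z in bags[p]:
                if ((t, z) if fwd else (z, t)) in tc[p]:
                    R.add(z)
        c = p
    return R

def reach_query(u, v, bags, parent, tc, top):
    """Return True iff u -> v, using per-bag transitive closures."""
    ru, rv = top[u], top[v]
    k = lca(parent, ru, rv)
    Ru = {x for x in bags[ru] if x == u or (u, x) in tc[ru]}   # u reaches x
    Rv = {y for y in bags[rv] if y == v or (y, v) in tc[rv]}   # y reaches v
    Ru = climb(Ru, ru, k, bags, parent, tc, True) & bags[k]
    Rv = climb(Rv, rv, k, bags, parent, tc, False) & bags[k]
    return any(x == y or (x, y) in tc[k] for x in Ru for y in Rv)
```

As a toy usage, take the chain 0 → 1 → 2 → 3 decomposed into a root bag {1, 2} with children {0, 1} and {2, 3}.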
The algorithm for reachability query answering is presented in Algorithm 4. Compared with the bottom-up query processing shown in Proposition 2, Algorithm 4 is customized with respect to our dedicated tree decomposition algorithm, in the sense that the query time complexity depends on l instead of the treewidth.

3.3 Complexity
Index construction time. For the index construction, we have to (1) generate the tree decomposition, and (2) at each tree node, generate the local transitive
closures. For (1), both the reduction step and the tree construction procedure take O(n) time. For (2), we deploy the classic BFS algorithm, which costs O(m) in the worst case. In fact, we need to run exactly one BFS procedure for each vertex in G. Therefore, the overall index construction time is O(nm).

Index size. In each bag X, for each pair of vertices u, v in X, if u reaches v, we need to store a boolean value. Thus the index size per bag is O(|X|²). Since the relationship l ≪ |R| holds, the root size |R| is dominant among all the bags. Therefore, the index size is O(|R|²). The index also contains the tree structure constructed by the tree decomposition algorithm; however, this space overhead is linear in n and can thus be ignored.

Query. The bottom-up query processing for reachability query answering takes O(l²h) time, where l is the number of the reductions and h is the height of the tree decomposition. Note that the proposed tree decomposition algorithm is independent of the treewidth of the underlying graph, since the reduction parameter l can be adjusted according to the properties of the graph. On the other hand, there is no guarantee that an optimal tree decomposition is obtained. In the worst case, if the treewidth is approximately n, there are Θ(n²) pairs to be stored, and the running time of the query algorithm is worse than that of BFS (or DFS), namely when tw(G) = Θ(|G|). Clearly our algorithm is not suitable for such graphs.
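The index construction step (2) — one BFS per vertex, with the results restricted to bag-internal pairs — can be sketched as follows (a simplification with our own names; a production version would restrict the BFS rather than run it on the full graph):

```python
from collections import deque

def build_bag_closures(adj, bags):
    """One BFS per vertex of the directed graph adj; keep only bag-internal pairs."""
    reach = {}
    for s in adj:
        seen = {s}
        q = deque([s])
        while q:
            x = q.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    q.append(y)
        reach[s] = seen
    # restrict the global reachability to ordered pairs inside each bag
    return {b: {(x, y) for x in X for y in X if x != y and y in reach[x]}
            for b, X in bags.items()}
```

This matches the O(nm) construction bound: n BFS runs of cost O(m) each, plus the per-bag restriction.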
4 Experiments
In this section we evaluate the tree decomposition method on real datasets. We are interested in the following parameters: index size, index construction time, and query time. Note that the index size is measured as the size of the transitive closures, which takes up the major part of the overall index size. Besides the standard measurements, we are also interested in the structure of the tree decomposition, which may influence the performance of the algorithm. These parameters are: the number of tree nodes (#TreeN), the number of all the vertices stored in the bags (#SumV), the height of the tree (h), the number of vertex reductions (l), and the root size of the tree (|R|). Note that we have chosen the optimal l in order to achieve the best query time performance.

We tested our algorithm on the real-world large datasets with density larger than or close to 2 used in [8]. Among them, arXiv is extracted from a dataset of citations among scientific papers from the arxiv.org website. Citeseer contains citations among scientific literature publications from the CiteSeer project, and pubmed was extracted from an XML registry of open access medical publications from the PubMed Central website. GO contains genetic terms and their relationships from the Gene Ontology project. Yago describes the structure of relationships among terms in the semantic knowledge database from the YAGO project. The details of the datasets can be found in [8].

All tests were run on an Intel(R) Core 2 Duo 2.4 GHz CPU with 2 GB of main memory. All algorithms were implemented in C++ with the Standard Template
Library (STL). A query is generated by randomly picking a pair of nodes for a reachability test. We measure the query time by answering a total of 10000 randomly generated reachability queries. We compare the query time with the linear time Breadth First Search method (BFS).

Table 1. Statistics of real graphs, the properties of the index and query performance

Graph     #V     #E     #TreeN  #SumV  h   l   |R|   Index Time(s)  Index Size  Query TD (ms)  Query BFS (ms)
Arxiv     6000   66707  4713    28300  12  30  1288  12.5           362228      49.6           449.5
Citeseer  10720  44258  8291    33411  9   8   2430  3.6            91067       8.8            135.5
Go        6793   13361  5186    19262  9   5   1608  1.2            29674       5.8            77.1
Pubmed    9000   40028  6482    26746  6   9   2519  2.9            185065      5.8            127.4
Yago      6642   42392  6161    19677  8   8   482   1.2            11673       3.2            78.9
As shown in Table 1, the time costs for query answering are substantially improved with respect to the naive BFS algorithm. As expected, there is a correlation between the index size and the root size |R| of the tree decomposition. Note that the size of the index structure should be approximately |R|²; however, we can reduce the size by storing only those pairs which are reachable from one to the other. The query time of our approach ranges between 11% (Arxiv) and 4% (Yago) of that of the naive BFS approach.
5 Conclusions and Future Work
In this paper, we introduced tree decomposition as an index structure for large directed graphs to answer reachability queries efficiently. With both theoretical and empirical analysis, we demonstrated that our approach is intuitive and efficient. The algorithms achieve good transitive closure compression rates and scale well to large graphs. In the future we plan to investigate the following problems: (1) the development of scalable tree decomposition algorithms, for which we expect to investigate more heuristics and integrate them into our implementation; and (2) how to update the index structure if the underlying graph is changed. Furthermore, we will consider on-disk algorithms for both index construction and query answering.
References
1. Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: SIGMOD (1989)
2. Arnborg, S., Corneil, D.G., Proskurowski, A.: Complexity of finding embeddings in a k-tree. SIAM J. Algebraic Discrete Methods 8(2), 277–284 (1987)
3. Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small treewidth. In: STOC (1993)
4. Bodlaender, H.L.: A tourist guide through treewidth. Acta Cybernetica 11, 1–23 (1993)
5. Bodlaender, H.L., Koster, A.M.C.A., van den Eijkhof, F.: Pre-processing rules for triangulation of probabilistic networks. Computational Intelligence 21(3), 286–305 (2005)
6. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32(5), 1338–1355 (2003)
7. Gottlob, G., Pichler, R., Wei, F.: Tractable database design through bounded treewidth. In: PODS, pp. 124–133 (2006)
8. Jin, R., Xiang, Y., Ruan, N., Fuhry, D.: 3-hop: a high-compression indexing scheme for reachability query. In: SIGMOD (2009)
9. Jin, R., Xiang, Y., Ruan, N., Wang, H.: Efficiently answering reachability queries on very large directed graphs. In: SIGMOD (2008)
10. Kask, K., Dechter, R., Larrosa, J., Dechter, A.: Unifying tree decompositions for reasoning in graphical models. Artif. Intell. 166(1-2), 165–193 (2005)
11. Koster, A.M.C.A., Bodlaender, H.L., van Hoesel, S.P.M.: Treewidth: Computational experiments. Electronic Notes in Discrete Mathematics (2001)
12. Robertson, N., Seymour, P.D.: Graph minors III: Planar tree-width. Journal of Combinatorial Theory, Series B 36, 49–64 (1984)
13. Schenkel, R., Theobald, A., Weikum, G.: HOPI: An efficient connection index for complex XML document collections. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 237–255. Springer, Heidelberg (2004)
14. Schenkel, R., Theobald, A., Weikum, G.: Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In: ICDE, pp. 360–371 (2005)
15. Trissl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: SIGMOD (2007)
16. Wang, H., He, H., Yang, J., Yu, P.S., Yu, J.X.: Dual labeling: Answering graph reachability queries in constant time. In: ICDE (2006)
17. Wei, F.: TEDI: Efficient shortest path query answering on graphs. In: SIGMOD (2010)
18. Zhang, C., Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.M.: On supporting containment queries in relational database management systems. In: SIGMOD Conference (2001)
51