LEFT-ASSOCIATIVE GRAMMAR: The Algebraic Definitions
ROLAND HAUSSER
Institut für Deutsche Philologie, Ludwig-Maximilians-Universität München, West Germany
ABSTRACT: This paper presents a formal account of the LA-grammar system informally described in the previous issue of CaT.1 LA-grammar is proven to generate all and only the recursive languages. The classes of regular, context-free, and context-sensitive languages are reconstructed in LA-grammar. The relation between LA-grammar and the associated parsers and generators is shown to be 'type transparent', and illustrated with examples. Finally, the differences between LA-grammar and Finite State Automata, RTNs, ATNs, and Predictive Analyzers are explained.
KEYWORDS: Generative Power, Recursive Languages, Context-Sensitive Languages, Context-Free Languages, Regular Languages, Finite-State Automata, Recursive-Transition Networks, Augmented-Transition Networks, Generators, Parsers, Type-Transparency, Input-Output Equivalence.
1. THE FORMAL DEFINITION
Left-associative grammars constitute a new class of formal objects for which we are going to give an algebraic definition. Let us recall some notation from set theory needed for this purpose. If X is a set, then X+ is the 'positive closure', i.e., the set of all non-empty finite concatenations of elements of X. X* is the Kleene closure of X, defined as X+ ∪ {ε}, where ε is the 'empty sequence'. The power set of X is denoted by 2^X. If X and Y are sets, then (X × Y) is the Cartesian product of X and Y, i.e., the set of ordered pairs consisting of an element of X and an element of Y. For convenience, we also identify integers with sets, i.e., n = {i | 0 < i ...
... for all w_i, w_j ∈ W, and all c_k ∈ C+, is an instance of lexical look-up only if w_i = w_j. (w_i c_k), (w_i c_l), (w_i c_m), etc., are called the 'lexical readings' of the word surface w_i ∈ W. For example, the word surface gave is mapped into the lexical readings [gave (N D A V)] and [gave (N A TO V)]. Thus, a categorized word is an ordered pair consisting of a word surface and a category; for example, [gave (N D A V)] is a categorized word.
2 Since LX is finite, each w ∈ W is related by LX to a non-empty finite set of elements of C+.
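Lexical look-up as described above can be pictured as a mapping from word surfaces to sets of categories. The following Python fragment is only a sketch under stated assumptions: the dictionary LX, the function name lexical_readings, and the encoding of categories as tuples of segment names are choices made for this illustration, and the single entry is the example gave from the text.

```python
# Toy lexicon: each word surface is related to a non-empty finite set of categories.
# Categories are encoded as tuples of category segments (an assumption of this sketch).
LX = {
    "gave": {("N", "D", "A", "V"), ("N", "A", "TO", "V")},
}

def lexical_readings(surface):
    """Return the set of categorized words (surface, category) for a word surface."""
    return {(surface, category) for category in LX.get(surface, set())}

for reading in sorted(lexical_readings("gave")):
    print(reading)
```

A word surface that is not in the lexicon simply yields the empty set of readings in this sketch.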
1.4 Definition of a Sentence Start
A sentence start is an element of (W+ × C*).
A sentence start is an ordered pair consisting of a sequence of word surfaces (called the 'surface' of the sentence start) and a category. For example, if John, read, and the are word surfaces in W, and if GQ, V are category segments in C, then [(John read the) (GQ V)] is a sentence start consisting of the surface (John read the) ∈ W+ and the category (GQ V) ∈ C+.
1.5 Definition of Surface Concatenation
Surface concatenation is the function SC from (W* × W) into W+ such that
SC((w_1 ... w_k), w_{k+1}) = (w_1 ... w_k w_{k+1}), for all w_j ∈ W.
The function SC concatenates the surfaces of the input expressions3 into the surface of the resulting sentence start. This completely regular operation gives rise to the name of Left-Associative Grammar.
1.6 Definition of a Left-Associative Rule
The i-th left-associative rule r_i is the triple (co_i, SC, rp_i).
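Surface concatenation as defined in 1.5 can be sketched in a few lines of Python. The names sc and build_surface and the encoding of surfaces as tuples of strings are assumptions of this sketch; it is meant only to show that building a surface word by word is a left-to-right fold, which is where the term left-associative comes from.

```python
from functools import reduce

def sc(surface, next_word):
    """SC: append one word surface to a (possibly empty) sequence of word surfaces."""
    return tuple(surface) + (next_word,)

def build_surface(words):
    """Fold SC over the words from left to right: SC(SC(SC((), w1), w2), w3) ..."""
    return reduce(sc, words, ())

print(sc(("John", "read"), "the"))
print(build_surface(["John", "read", "the"]))
```

Both calls produce the same three-word surface; the second one makes the strictly left-to-right grouping explicit.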
A left-associative rule r_i takes a sentence start ss and a next word nw as input, and applies the categorial operation co_i to the sentence category cat-1 and the next word category cat-2. If the input condition of the categorial operation is satisfied by (cat-1, cat-2), the application of r_i is successful and an output is derived. The output consists of the pair (rp_i, ss'), where rp_i is a rule package and ss' is a resulting sentence start. If the input condition of the categorial operation is not satisfied by (cat-1, cat-2), the application of rule r_i is not successful and no output is derived.
The rule package rp_i provided by the rule r_i contains all rules which may apply after rule r_i was successful. A rule package is defined as a set of rule names, where the name of a rule r_g is the place number g of its categorial operation co_g in the sequence CO.4 In practice, the rules are called by more mnemonic names, such as 'rule-g' or 'Fverb+main'.
The resulting sentence start ss' is derived as follows. If ss = (A_n cat-1) and nw = (w_{n+1} cat-2), then ss' = (A_{n+1} cat-3), where A_{n+1} is derived from A_n and w_{n+1} by SC, and cat-3 is derived from cat-1 and cat-2 by co_i.
The categorial operation co_i specifies the input categories and the output category by means of category expressions. Category expressions may contain variables for category segments (written as seg1, seg2, etc.), and for sequences of category segments of length ≥ 0 (written as X, Y, etc.).
3 I.e., a pair consisting of a sentence start and a next word.
4 To give computational recursion a form that is not impredicative in the mathematical sense, Professor Dana Scott suggested the use of rule names in the rule packages.
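The way a single rule maps a sentence start and a next word to a rule package and a new sentence start can be sketched as follows. This is a minimal illustration, not the author's implementation: the names Rule, apply_rule, and cancel_first, the rule names in the package, and the placeholder words and segments are all hypothetical, and the categorial operation is modelled as a Python function that returns None when its input condition is not satisfied.

```python
from typing import Callable, FrozenSet, NamedTuple, Optional, Tuple

Category = Tuple[str, ...]

class Rule(NamedTuple):
    co: Callable[[Category, Category], Optional[Category]]  # categorial operation
    rp: FrozenSet[str]                                       # rule package (rule names)

def apply_rule(rule, ss, nw):
    """Apply a rule to ss = (surface, cat-1) and nw = (word, cat-2)."""
    surface, cat1 = ss
    word, cat2 = nw
    cat3 = rule.co(cat1, cat2)
    if cat3 is None:
        return None                      # input condition not satisfied: no output
    new_surface = surface + (word,)      # surface concatenation SC
    return rule.rp, (new_surface, cat3)

def cancel_first(cat1, cat2):
    """Hypothetical operation: cancel the first segment of cat-1 if it equals cat-2."""
    if len(cat2) == 1 and cat1[:1] == cat2:
        return cat1[1:]
    return None

rule = Rule(co=cancel_first, rp=frozenset({"rule-1", "rule-2"}))
print(apply_rule(rule, (("w1", "w2"), ("x", "v")), ("w3", ("x",))))
print(apply_rule(rule, (("w1", "w2"), ("x", "v")), ("w3", ("y",))))
```

The first call succeeds and yields the rule package together with the extended sentence start; the second call yields no output because the input condition fails.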
1.7 Definition of a Category Expression
A category expression is a list consisting of zero or more category segments, zero or more segment variables, and zero or more segment-sequence variables. The empty list is represented as ε.
Examples of category expressions are (a), (a X), (X), (seg1 a X), (X a Y), where a is a category segment ∈ C. A category cat-i, cat-i ∈ C*, is compatible with a category expression CAT-i, if the structure of the category matches the pattern specified by the expression. For example, the category expression (a X) is matched by categories like (a), (a a), (a b c), etc.
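Compatibility of a category with a category expression is a small pattern-matching problem, which the following Python sketch makes concrete. The classes SegVar and SeqVar, the function match, and the tuple encoding are assumptions of this illustration. When an expression contains several sequence variables, the sketch returns the first binding it finds (shortest match first), which is a simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SegVar:
    """Variable for exactly one category segment (seg1, seg2, ...)."""
    name: str

@dataclass(frozen=True)
class SeqVar:
    """Variable for a sequence of zero or more category segments (X, Y, ...)."""
    name: str

def match(expr, cat, env=None):
    """Return variable bindings if category `cat` is compatible with `expr`, else None.

    Note that a successful match without variables returns an empty dict,
    so callers should test the result with `is not None`.
    """
    env = dict(env or {})
    if not expr:
        return env if not cat else None
    head, rest = expr[0], expr[1:]
    if isinstance(head, SeqVar):
        if head.name in env:                      # variable already bound
            bound = env[head.name]
            if cat[:len(bound)] != bound:
                return None
            return match(rest, cat[len(bound):], env)
        for n in range(len(cat) + 1):             # try every prefix length
            trial = dict(env)
            trial[head.name] = cat[:n]
            result = match(rest, cat[n:], trial)
            if result is not None:
                return result
        return None
    if not cat:
        return None
    if isinstance(head, SegVar):
        if env.get(head.name, cat[0]) != cat[0]:
            return None
        env[head.name] = cat[0]
        return match(rest, cat[1:], env)
    return match(rest, cat[1:], env) if head == cat[0] else None

a_X = ("a", SeqVar("X"))
for category in [("a",), ("a", "a"), ("a", "b", "c"), ("b",)]:
    print(category, match(a_X, category))
```

The loop reproduces the example from the text: (a X) is matched by (a), (a a), and (a b c), but not by (b).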
A category expression CAT-m is in the subset relation with the category expression CAT-n (CAT-m ⊆ CAT-n) if every category compatible with CAT-m is compatible with CAT-n, and similarly for the other set-theoretic relations such as ⊂, =, ⊇, and ⊃. A list of category segments (without any variables), e.g., (a), (a a), or ε, is regarded as a category if it occurs as part of an analyzed expression. Otherwise it is a category expression. Category expressions are represented abstractly as CAT-i, while compatible categories are represented as cat-i.
Left-associative rules are notated by expressions of the form [(ss nw) ⇒ (rp_i ss')], where ss, ss' are expressions representing sentence starts ∈ (W+ × C*), and nw is an expression representing a 'next word' ∈ (W × C+). Because surface composition SC is the same in all left-associative rules, it may be omitted in the definitions. This results in the following simplified notation of left-associative rules:
1.8 Notation of a Left-Associative Rule
The i-th rule of an LA-grammar has the form
r_i: [CAT-1 CAT-2] ⇒ [rp_i CAT-3],
where CAT-1, CAT-2, and CAT-3 are the category expressions of ss, nw, and ss', respectively, and rp_i is the rule package of r_i.
Note that surface concatenation is assumed implicitly in 1.8. If the categorial operation co_i cannot be expressed directly by the structure of the category expressions CAT-1, CAT-2, and CAT-3, a rule may be augmented with additional clauses which explain the category expressions and the relationships among them. If the categorial operation of a rule does not express a function, the rule is not considered well-formed.
2. THE DERIVATIONAL STRUCTURE OF LA-GRAMMAR
In LA-grammar a complete well-formed expression is derived in a sequence of transitions from a start state via a number of rule states to a final state. These notions have the following definitions.
2.1 Definition of the Set of Rule States
A rule state in LA-grammar is defined as a pair [rp_i CAT-3] where rp_i is a rule package and CAT-3 is the category expression specified in the output of co_i. The state associated with rule r_i is called st_i. The set of all rule states is called ST_R.
2.2 Definition of the Set of Start States
If rp_s is the start rule package of an LA-grammar, and nw ∈ LX, nw = (surf cat-i), then [rp_s cat-i] is a start state.
For the sake of efficiency and precision, the set of start states is usually not defined with all lexical categories, but with a set of initial category expressions IC, defined as the set of all the category expressions which appear as CAT-1 in any of the rules in rp_s, i.e., ST_S =def {[rp_s CAT-1] | CAT-1 ∈ IC}.
2.3 Definition of the Set of Final States
Given the set of 'final rule packages' RP_F ⊆ RP, and the associated set of states {[rp_i CAT-3_i] | rp_i ∈ RP_F}, the set of final states ST_F =def {[rp_i CAT-3_if] | CAT-3_if ⊆ CAT-3_i}.
In other words, final states are like certain rule states, except that their category expressions may be more restricted.
2.4 Definition of a State
A state is an element of the set ST_S ∪ ST_R ∪ ST_F.
The set of states in an LA-grammar is finite. If an LA-grammar has l start states and n rules, it has at most 2n + l states.
2.5 Definition of a State Token
A state (rp_i CAT-3) represents a possibly infinite number of state tokens (rp_i cat-3), where CAT-3 is a category expression and cat-3 represents categories compatible with CAT-3.
The set of state tokens is unbounded because there is no upper limit on the length of categories generated by the rules. Two state tokens are equivalent only if they consist of the same rule package and the same category token.
2.6 Definition of an Application Set
An application set in LA-grammar is defined as a pair [rp_i (cat-1 cat-2)] where rp_i is a rule package and (cat-1 cat-2) is a pair of categories.
sets are derived from
2.7 Definition
of the nw-Intake
states by means
of the following
function:
Function
The function nw-intake takes a state token st_i (0 < i < n) of the form [rp_i cat-1] and a nw of the form (surf cat-2), nw ∈ LX, and renders an application set of the form [rp_i (cat-1 cat-2)] as output.
State tokens are derived from application sets by means of the following function:
2.8 Definition of the Application Function
The function application takes an application set [rp_i (cat-1 cat-2)] as input, applies each rule j ∈ rp_i to (cat-1 cat-2), and renders a (possibly empty) set of state tokens as output.
The interaction of states, nw-intake, application sets, and application in a left-associative derivation is illustrated in the following schema:
2.9 The Recursion of a Left-Associative Derivation
STATE TOKENS                        APPLICATION SETS
[rp_s cat-1]     --NW-INTAKE-->     [rp_s (cat-1 cat-2)]       --APPLICATION-->
[rp_i cat-1']    --NW-INTAKE-->     [rp_i (cat-1' cat-2')]     --APPLICATION-->
[rp_j cat-1'']   --NW-INTAKE-->     [rp_j (cat-1'' cat-2'')]   --APPLICATION-->
[rp_k cat-1''']  ...
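The functions of 2.7 and 2.8 and their alternation in the schema 2.9 can be sketched in Python as follows. Everything named here is an assumption made for illustration: the function names nw_intake, application, and step, the toy lexicon, and the two hypothetical rules r-cancel and r-append. Rules are encoded as pairs of a categorial operation (returning None on failure) and the rule package they provide.

```python
def nw_intake(state_token, surface, lexicon):
    """Pair the state token with each lexical reading of the next word."""
    rp, cat1 = state_token
    return [(rp, (cat1, cat2)) for cat2 in sorted(lexicon.get(surface, ()))]

def application(app_set, rules):
    """Apply every rule of the rule package; return the resulting state tokens."""
    rp, (cat1, cat2) = app_set
    tokens = []
    for name in sorted(rp):
        co, next_rp = rules[name]
        cat3 = co(cat1, cat2)
        if cat3 is not None:
            tokens.append((next_rp, cat3))
    return tokens

def step(state_token, surface, lexicon, rules):
    """One step of the recursion: nw-intake followed by application."""
    return [token
            for app_set in nw_intake(state_token, surface, lexicon)
            for token in application(app_set, rules)]

# Hypothetical toy grammar, for illustration only.
RP_ALL = frozenset({"r-cancel", "r-append"})

def cancel(cat1, cat2):
    return cat1[1:] if cat1[:1] == cat2 else None

def append(cat1, cat2):
    return cat1 + cat2

RULES = {"r-cancel": (cancel, RP_ALL), "r-append": (append, RP_ALL)}
LEXICON = {"x": {("p",), ("q",)}}          # the word x has two readings

for token in step((RP_ALL, ("p", "q")), "x", LEXICON, RULES):
    print(token[1])
```

In this toy run the next word has two readings, so nw_intake yields two application sets, and for one of them both rules succeed, so three state tokens result from a single step.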
2.10 Definition of a Left-Associative Transition
A left-associative transition is a function from a state into a set of rule states. A left-associative transition is a composite function, consisting of nw-intake and application.
A left-associative transition can result in two different types of ambiguity, called lexical ambiguity and syntactic ambiguity. Each reading of a lexically ambiguous next word is represented by a category; if an nw has i readings, nw-intake will create i application sets. Syntactic ambiguity arises when, in a given application set, more than one state is generated because more than one rule in the rule package accepts the input pair. Since lexical ambiguity is associated with nw-intake and syntactic ambiguity is associated with application, both kinds of ambiguity may occur simultaneously in a transition.
2.11 Definition of the Set of Well-Formed Expressions
The set WE of well-formed expressions generated by an LA-grammar is defined as follows:
1. If rp_s is the start rule package, and nw = (surf cat-i) is in LX, then [rp_s ((surf) cat-i)] ∈ WE.
2. If (rp_i ss) ∈ WE, j ∈ rp_i, and nw, nw ∈ LX, is accepted by r_j: [(ss nw) ⇒ (rp_j ss')], then (rp_j ss') ∈ WE.
3. Nothing is in WE unless it so follows from (1) and (2).
WE is also called the reflexive-transitive closure of an LA-grammar. Intuitively, an expression is regarded as well-formed if it can be continued into a complete expression. For example, [rp-2 (aaab (bbccc))] is a well-formed expression of the language a^k b^k c^k defined below because it can be continued into the complete well-formed expression [rp-3 (aaabbbccc ε)]. The set of complete well-formed expressions of a language is characterized by the final states of its grammar. Because [rp-3 ε] is the final state of a^k b^k c^k, all well-formed expressions of that language with rp-3 as their rule package and ε as their category are considered complete.
2.12 Definition of the Set of Surfaces
The set of 'phrases' or surfaces S, S ⊆ W+, is {s | s is the surface of we ∈ WE}.
Special subsets of the well-formed expressions and the surfaces are the complete well-formed expressions and the complete surfaces, respectively, which are defined as follows.
2.13 Definition of the Set of Complete Well-Formed Expressions
The set of complete well-formed expressions CWE ⊆ WE is the set of pairs [rp_f (s_f c_f)], where [rp_f c_f] ∈ ST_F.
2.14 Definition of the Set of Complete Surfaces
The set of 'sentences' or complete surfaces CS, CS ⊆ W+, is {s_f | s_f is the surface of cwe ∈ CWE}.
3. AN EXAMPLE: THE FORMAL LANGUAGE a^k b^k c^k
An LA-grammar is usually specified by (i) a lexicon LX, (ii) a set of start states ST_S, (iii) a sequence of rules, and (iv) a set of final states ST_F. Let us illustrate this general format of LA-grammars with a simple example of a formal language, namely the context-sensitive language a^k b^k c^k.
3.1 The Definition of a^k b^k c^k 5
LX =def {[a (b c)], [b (b)], [c (c)]}
ST_S =def {[{r-1, r-2} (b c)]}
r-1: [(X) (b c)] ⇒ [{r-1, r-2} (b X c)],
r-2: [(b X c) (b)] ⇒ [{r-2, r-3} (X c)],
r-3: [(c X) (c)] ⇒ [{r-3} (X)]
ST_F =def {[rp-3 ε]}.
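The grammar 3.1 is small enough to be traced mechanically. The following Python sketch encodes its lexicon, rules, start state, and final state and follows a derivation word by word. The encoding is an assumption of this illustration: categories are tuples, the empty category is the empty tuple, rule packages are tuples of rule names, and the names tokens_after and accepts are hypothetical. The three categorial operations are written out by hand rather than derived from the category expressions.

```python
# Lexicon LX of 3.1: each word surface has exactly one category.
LX = {"a": ("b", "c"), "b": ("b",), "c": ("c",)}

def r1(cat1, cat2):
    """r-1: [(X) (b c)] => (b X c)"""
    return ("b",) + cat1 + ("c",) if cat2 == ("b", "c") else None

def r2(cat1, cat2):
    """r-2: [(b X c) (b)] => (X c)"""
    if cat2 == ("b",) and len(cat1) >= 2 and cat1[0] == "b" and cat1[-1] == "c":
        return cat1[1:-1] + ("c",)
    return None

def r3(cat1, cat2):
    """r-3: [(c X) (c)] => (X)"""
    return cat1[1:] if cat2 == ("c",) and cat1[:1] == ("c",) else None

# Rule name -> (categorial operation, rule package provided by the rule).
RULES = {
    "r-1": (r1, ("r-1", "r-2")),
    "r-2": (r2, ("r-2", "r-3")),
    "r-3": (r3, ("r-3",)),
}
START = (("r-1", "r-2"), ("b", "c"))   # the single start state in ST_S
FINAL = (("r-3",), ())                  # the final state [rp-3 ε]

def tokens_after(word):
    """Return the set of state tokens reached after reading all of `word`."""
    if not word or LX.get(word[0]) != START[1]:
        return set()
    tokens = {START}
    for ch in word[1:]:
        cat2 = LX.get(ch)               # None for unknown words: every rule fails
        successors = set()
        for rp, cat1 in tokens:
            for name in rp:
                co, out_rp = RULES[name]
                cat3 = co(cat1, cat2)
                if cat3 is not None:
                    successors.add((out_rp, cat3))
        tokens = successors
    return tokens

def accepts(word):
    """A surface is complete if the final state token is among those reached."""
    return FINAL in tokens_after(word)

print(tokens_after("aaab"))
print([w for w in ["abc", "aabbcc", "aabbc", "abcc", "aabcc"] if accepts(w)])
```

Reading aaab leads to the rule package {r-2, r-3} with the category (b b c c c), which agrees with the well-formed expression [rp-2 (aaab (bbccc))] cited in 2.11; of the five test strings, only abc and aabbcc reach the final state.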
Given example 3.1, let us consider the relation between the definition of LA-grammar as a 6-tuple <W, C, LX, CO, RP, rp_s>, and the specification of an LA-grammar in terms of LX, ST_S, a list of rules, and ST_F. The sets of word surfaces W and category segments C are implicitly characterized in the definition of LX: W =def {a, b, c} and C =