The Synt parser

Synt is a tool for automatic syntactic analysis of Czech and Slovak. The parser is based on a context-free backbone enhanced with contextual actions and performs stochastic, agenda-based, head-driven chart analysis. Its input is a morphologically annotated sentence in vertical format. Its output is a phrase-structure tree, a dependency graph, and a set of syntactic structures.

Synt grammar specification

The Synt parser uses a meta-grammar concept with three grammar forms.

Synt meta-grammar
The Synt parser uses three grammar forms, denoted G1, G2 and G3. The G1 meta-grammar form is designed for human experts. It contains high-level generative constructs that reflect natural language phenomena (such as word order constraints). It is the basis for the G2 grammar form, in which the meta-grammar rules are expanded.
The G2 grammar form consists of context-free rules with feature agreement tests and other contextual actions.
The G3 grammar form consists of standard rules of the expanded grammar, with the actions that remain to guarantee the contextual requirements.

The G1 meta-grammar form
The meta-grammar consists of global order constraints that govern the succession of given terminals. It also contains special flags that impose partial restrictions on given non-terminals and terminals on the right-hand side of a rule. Grammar rules use different arrow marks (->, -->, ==>, ===>) that specify the rule type. The meaning of the arrow form is: "the thicker and longer the arrow, the more actions can be performed during rule translation". The '->' arrow denotes an ordinary context-free grammar transcription, whereas '===>' inserts a possible intersegment between right-hand side constituents, checks the correct order of enclitics, and supplies several forms of the rule to make the verb phrase into a full sentence.

G1 combining constructs (they generate variants of the given terminals and non-terminals):

  • order()
  • rhs()
  • first()

Example

I will ask:  clause ===> order(VBU,R,VRI)

order(): generates all possible permutations of its components
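
The expansion performed by order() can be illustrated with a minimal Python sketch. It is not the Synt implementation: the function name expand_order is hypothetical, and the sketch ignores the global order constraints, enclitic checks and intersegment insertion that the real rule translation applies.

```python
from itertools import permutations

def expand_order(lhs, arrow, components):
    """Return one rule string per permutation of the order() components.

    Illustration only: the real translation also applies order
    constraints and enclitic checks, which are omitted here.
    """
    return [f"{lhs} {arrow} {' '.join(p)}" for p in permutations(components)]

# The example rule from the text: clause ===> order(VBU,R,VRI)
rules = expand_order("clause", "===>", ["VBU", "R", "VRI"])
for rule in rules:
    print(rule)
```

Three components yield six candidate rules, which is why order() keeps the G1 form compact compared with writing every word order by hand.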

first() and rhs(): are used to insert the content of the entire right-hand side of a specified non-terminal into the rule. rhs(N) inserts all possible rewritings of the non-terminal N. The resulting terms are then subject to the standard constraints, enclitic checking and inter-segmentation. first(N) ensures that N is firmly tied to the beginning.
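
A rough Python sketch of the two constructs follows. Everything in it is an assumption made for illustration: the toy rule table, the terminal name VI, both function names, and the idea that first() is combined with a permutation of the remaining items. It only shows the general behaviour described above.

```python
from itertools import permutations

# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
TOY_RULES = {
    "vi_list": [["VI"], ["VI", "R"]],
}

def expand_rhs(nonterminal, rules):
    """rhs(N): return every rewriting of N found in `rules`."""
    return [list(alternative) for alternative in rules[nonterminal]]

def expand_order_with_first(first_item, others):
    """Permute `others` while `first_item` stays tied to the beginning."""
    return [[first_item, *perm] for perm in permutations(others)]

print(expand_rhs("vi_list", TOY_RULES))
print(expand_order_with_first("VBU", ["R", "VRI"]))
```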

The grammar contains several generative constructs that start with a %list_* expression. These constructs define rule templates, which automatically produce new rules for a list of given non-terminals.

A significant portion of the grammar is made up of verb group rules, which contain frequently repeated constructions within a given verb group.

Example:

%group verbP={
  V:     verb_rule_schema($@,"(#1)")
         groupflag($1,"head"),
  VR R:  verb_rule_schema($@,"(#1 #2)")
         groupflag($1,"head"),
}
/* ctu/ptam se - I am reading/I am asking */
  clause ====> order(group(verbP), vi_list)
  verb_rule_schema($@,"#2")
  depends(getgroupflag($1,"head"), $2)

Here, the group verbP denotes two sets of non-terminals with the corresponding actions that are substituted for the expression group(verbP) on the RHS of the clause non-terminal.

flag(any string): refers to verb group members in verb_rule_schema rules:

  • defines the part of the verb group that forms a verbal object in the subsequent logical analysis
  • appears in group and rule right hand side

%marge_actions={verb_rule_schema}: gathers the arguments of the verb_rule_schema actions and merges them into one resulting action

rule levels: express how frequently grammatical phenomena occur. The higher the level, the less frequent the corresponding grammatical phenomenon is.

Example:

3: np -> adj_group
   propagate_case_number_gender($1)

This rule is of level 3. When the grammar level is set to at least 3, adjective groups are allowed to form a separate intersegment.
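
Selecting rules by level can be sketched in Python as below. The data layout, the function name active_rules and the assumption that unmarked rules have level 0 are all hypothetical; the sketch only mirrors the threshold behaviour described above.

```python
# Each rule is stored as (level, rule text).
# Assumption: a rule written without a level prefix is treated as level 0.
RULES = [
    (0, "np -> adj_group np"),
    (3, "np -> adj_group"),
]

def active_rules(rules, grammar_level):
    """Keep only the rules whose level does not exceed the grammar level."""
    return [text for level, text in rules if level <= grammar_level]

print(active_rules(RULES, 2))  # the level-3 rule is switched off
print(active_rules(RULES, 3))  # the level-3 rule is now available
```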

head() and depends(): express the dependency links between rule items. For example, depends(A,B,C) means that B and C depend on A.
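
As a minimal illustration, the links created by depends() can be modelled as (head, dependent) pairs. The Python function below is a hypothetical stand-in for the grammar action, not part of Synt.

```python
def depends(head, *dependents):
    """Return one (head, dependent) edge for every listed dependent."""
    return [(head, dependent) for dependent in dependents]

# depends(A,B,C): B and C depend on A.
edges = depends("A", "B", "C")
print(edges)
```

Collecting such edges over all rules used in an analysis is one way to picture how a dependency graph can be read off the parse.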

Second grammar form (G2)
As we have mentioned earlier, several pre-defined grammatical tests and procedures are used in the description of context actions associated with each grammatical rule of the system.

The pruning actions include:

  • grammatical case test for particular words and noun groups
  • agreement test of case in prepositional construction
  • agreement test of number and gender for relative pronouns
  • agreement test of case, number and gender for noun groups
  • type checking of logical constructions
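
The agreement tests in this list can be pictured with a small Python sketch. It assumes, purely for illustration, that each constituent carries exactly one value for case, number and gender stored in a dictionary; real morphological annotation is typically ambiguous, and the function name agree_and_propagate is hypothetical.

```python
def agree_and_propagate(items, features=("case", "number", "gender")):
    """Return the shared feature values, or None when agreement fails.

    `items` is a non-empty list of dictionaries, one per constituent.
    Returning None corresponds to pruning the analysis.
    """
    first = items[0]
    for item in items[1:]:
        if any(item[f] != first[f] for f in features):
            return None
    return {f: first[f] for f in features}

adjective = {"case": 4, "number": "sg", "gender": "n"}
noun = {"case": 4, "number": "sg", "gender": "n"}
other_noun = {"case": 2, "number": "sg", "gender": "n"}

print(agree_and_propagate([adjective, noun]))        # agreement holds
print(agree_and_propagate([adjective, other_noun]))  # case differs
```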

Example:

np -> adj_group np
      rule_schema($@, "lwtx(awtx(#1) and awtx(#2))")
      rule_schema($@, "lwtx([[awt(#1),#2],x])")

The rule schema action presents a prescription for building a logical construction out of the sub-constructions from the right-hand side.

propagate_all and agree_*_and_propagate: compute and propagate all relevant grammatical information from the selected non-terminals on the right-hand side to the one on the left-hand side of the rule.

The Expanded Grammar Form (G3)
The G3 form is produced by transforming the G2 form, together with its contextual actions, into standard rules.

http://www.fi.muni.cz/~xmedved1/synt.jpg

Possible Synt output

phrase-structure tree

http://www.fi.muni.cz/~xmedved1/strom_synt_zac.png


dependency graph

http://www.fi.muni.cz/~xmedved1/graph.png


syntactic structure

[0-7) : Tlačil      auto,
        tlačiť - V  auto - N
                    Tlačil auto,
[2-4) : ktoré          sa           pokazilo
        ktorý - PRON   byť - PRON   pokaziť - V
                 ktoré sa pokazilo.

Commands

The input for the Synt parser is a morphologically annotated sentence (annotated by majka for Czech and RFTagger for Slovak) in vertical format. To run basic syntactic analysis:

cat sentence.vert | /nlp/projekty/syntax_sk/synt_sk/synt/synt -i vertical (Slovak)
cat sentence.vert | /nlp/synt/synt/synt -i vertical (Czech)

To run syntactic analysis with phrase-structure tree output:

cat sentence.vert | /nlp/projekty/syntax_sk/synt_sk/synt/synt -i vertical -tt- | /nlp/projekty/set/set/TreeViewer/TreeViewer.py (Slovak)
cat sentence.vert | /nlp/synt/synt/synt -i vertical -tt- | /nlp/projekty/set/set/TreeViewer/TreeViewer.py (Czech)
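
For scripted use, the same invocations could be wrapped in Python roughly as follows. The paths and options are copied verbatim from the commands above. The language keys "sk" and "cs" and the function names are assumptions of this sketch, and it has not been run against an actual Synt installation.

```python
import subprocess

# Binary locations taken from the commands above.
SYNT_BINARIES = {
    "sk": "/nlp/projekty/syntax_sk/synt_sk/synt/synt",
    "cs": "/nlp/synt/synt/synt",
}

def synt_command(language, tree=False):
    """Build the argument list for a Synt run on vertical input."""
    command = [SYNT_BINARIES[language], "-i", "vertical"]
    if tree:
        # Option copied as written in the commands above.
        command.append("-tt-")
    return command

def run_synt(vertical_text, language, tree=False):
    """Feed vertical text to Synt on stdin and return its stdout."""
    result = subprocess.run(
        synt_command(language, tree),
        input=vertical_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

print(synt_command("sk"))
```

The tree output would still need to be piped into TreeViewer.py, as in the shell commands, to be displayed.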
Last modified on Jun 20, 2013, 11:27:10 AM