| 1 | |
| 2 | == The Synt parser == |
| 3 | '''Synt''' is a tool for automatic syntactic analysis of Czech and Slovak. The parser is based on a context-free backbone enhanced with contextual actions and performs stochastic agenda-based head-driven chart analysis. Its input is a morphologically annotated sentence in vertical format. Its output can be a phrase-structure tree, a dependency graph, or a set of syntactic structures. |
| 4 | |
| 5 | |
| 6 | == Synt grammar specification == |
| 7 | The Synt parser uses a meta-grammar concept with three grammar forms. |
| 8 | |
| 9 | '''Synt meta-grammar'''[[BR]] |
| 10 | The three grammar forms are denoted G1, G2 and G3. The G1 meta-grammar form is designed for human experts. It contains high-level generative constructs that reflect natural language phenomena (such as word order constraints). It is the basis for the G2 grammar form, in which the meta-grammar rules are expanded.[[BR]] |
| 11 | The G2 grammar form consists of context-free rules with feature agreement tests and other contextual actions.[[BR]] |
| 12 | The G3 grammar form consists of standard rules of the expanded grammar, together with the actions that remain to guarantee the contextual requirements. |
| 13 | |
| 14 | |
| 15 | '''The G1 meta-grammar form'''[[BR]] |
| 16 | The meta-grammar contains global order constraints that safeguard the succession of given terminals. It also contains special flags that impose partial restrictions on given non-terminals and terminals on the right-hand side of a rule. |
| 17 | Grammar rules use different arrow marks (->, -->, ==>, ===>) that specify the rule type. The arrow form can be read as: "the thicker and longer the arrow, the more actions are performed in the rule translation". The '->' arrow denotes an ordinary context-free grammar transcription, whereas '===>' inserts possible intersegments between the right-hand-side constituents, checks the correct order of enclitics and supplies several forms of the rule to make the verb phrase into a full sentence.[[BR]] |
| 18 | [[BR]] |
| 19 | |
| 20 | G1 combining constructs (they generate variants of given terminals and non-terminals): |
| 21 | • order()[[BR]] |
| 22 | • rhs()[[BR]] |
| 23 | • first()[[BR]] |
| 24 | '''Example:''' |
| 25 | {{{ |
| 26 | /* I will ask */ clause ===> order(VBU,R,VRI) |
| 27 | }}} |
| 28 | |
| 29 | ''order()'': generates all possible permutations of its components |
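As a rough illustration of what ''order()'' produces, the following Python sketch expands the example rule into one plain context-free rule per permutation. The function name `expand_order` and the string output format are assumptions made for this illustration; the real `===>` translation additionally handles enclitic order and segments between constituents, which the sketch ignores.

```python
from itertools import permutations


def expand_order(lhs, components):
    """Return one plain rule string per permutation of the components."""
    return [f"{lhs} -> {' '.join(perm)}" for perm in permutations(components)]


# Components taken from the example rule: clause ===> order(VBU,R,VRI)
rules = expand_order("clause", ["VBU", "R", "VRI"])
for rule in rules:
    print(rule)
```

Three components yield six rules, which shows why a single ''order()'' line saves a considerable amount of hand-written grammar.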
| 30 | |
| 31 | ''first()'' and ''rhs()'': insert the content of all right-hand sides of the specified non-terminal into the rule. ''rhs(N)'' inserts all possible rewritings of the non-terminal N. The resulting terms are then subject to the standard constraints, enclitic checking and inter-segmentation. ''first(N)'' ensures that N is firmly tied to the beginning. |
| 32 | |
| 33 | The grammar contains several generative constructs that start with a %list_* expression. These constructs define rule templates that automatically produce new rules for a list of given non-terminals. |
| 34 | |
| 35 | A significant portion of the grammar is made up of verb group rules, which contain frequently repeated constructions within a given verb group. |
| 36 | |
| 37 | '''Example:''' |
| 38 | |
| 39 | {{{ |
| 40 | %group verbP={ |
| 41 | V: verb_rule_schema($@,"(#1)") |
| 42 | groupflag($1,"head"), |
| 43 | VR R: verb_rule_schema($@,"(#1 #2)") |
| 44 | groupflag($1,"head"), |
| 45 | } |
| 46 | /* ctu/ptam se - I am reading/I am asking */ |
| 47 | clause ====> order(group(verbP), vi_list) |
| 48 | verb_rule_schema($@,"#2") |
| 49 | depends(getgroupflag($1,"head"), $2) |
| 50 | }}} |
| 51 | |
| 52 | Here, the group verbP denotes two sets of non-terminals with the corresponding actions that are substituted for the expression group(verbP) on the RHS of the clause non-terminal. |
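The substitution of a group into a rule can be sketched in Python as follows. The sketch assumes that each alternative of the group is kept together as one block while ''order()'' permutes it with the remaining components; actions such as verb_rule_schema and groupflag are left out. The names `VERB_P` and `expand_group_order` are hypothetical.

```python
from itertools import permutations

# The two alternatives of %group verbP from the example (actions omitted).
VERB_P = [["V"], ["VR", "R"]]


def expand_group_order(lhs, group, others):
    """Expand lhs ====> order(group(...), others...) into (lhs, rhs) pairs.

    Assumption: each group alternative moves as one block inside order().
    """
    rules = []
    for alternative in group:
        units = [tuple(alternative)] + [(symbol,) for symbol in others]
        for perm in permutations(units):
            rhs = [symbol for unit in perm for symbol in unit]
            rules.append((lhs, rhs))
    return rules


clause_rules = expand_group_order("clause", VERB_P, ["vi_list"])
for lhs, rhs in clause_rules:
    print(lhs, "->", " ".join(rhs))
```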
| 53 | |
| 54 | ''flag(any string)'': refers to verb group members in rules.[[BR]] |
| 55 | ''verb_rule_schema'':[[BR]] |
| 56 | • defines the part of the verb group that forms a verbal object in the subsequent logical analysis[[BR]] |
| 57 | • appears in groups and on the right-hand side of rules[[BR]] |
| 58 | ''%merge_actions={verb_rule_schema}'': gathers and merges the arguments of the verb_rule_schema actions into one resulting action. |
| 59 | |
| 60 | ''Rule levels'': express how frequent grammatical phenomena are. The higher the level, the less frequent the corresponding grammatical phenomenon. |
| 61 | |
| 62 | '''Example:''' |
| 63 | |
| 64 | {{{ |
| 65 | 3: np -> adj_group |
| 66 | propagate_case_number_gender($1) |
| 67 | }}} |
| 68 | |
| 69 | This rule is of level 3. When the grammar level is set to at least 3, adjective groups are allowed to form a separate intersegment. |
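Level-based filtering of this kind might look like the following Python sketch. The `Rule` type, the `active_rules` function and the convention that a rule without a level prefix has level 0 are assumptions of the sketch, not documented Synt behaviour.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    level: int
    lhs: str
    rhs: Tuple[str, ...]


def active_rules(rules, grammar_level):
    """Keep only the rules whose level does not exceed the grammar level."""
    return [rule for rule in rules if rule.level <= grammar_level]


grammar = [
    Rule(0, "np", ("adj_group", "np")),  # no level prefix: assumed level 0
    Rule(3, "np", ("adj_group",)),       # "3: np -> adj_group" from the example
]

print(len(active_rules(grammar, 2)))
print(len(active_rules(grammar, 3)))
```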
| 70 | |
| 71 | ''head()'' and ''depends()'': allow dependency links between rule items to be expressed. For example, depends(A,B,C) means that B and C depend on A. |
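The reading of depends(A,B,C) corresponds to a set of head-to-dependent edges, as in this minimal Python sketch. The function mirrors the name of the grammar action only for readability; representing edges as (head, dependent) pairs is an assumption.

```python
def depends(head, *dependents):
    """Return (head, dependent) edges: every dependent depends on head."""
    return [(head, dependent) for dependent in dependents]


edges = depends("A", "B", "C")
print(edges)
```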
| 72 | |
| 73 | |
| 74 | '''Second grammar form (G2)'''[[BR]] |
| 75 | As mentioned earlier, several predefined grammatical tests and procedures are used to describe the contextual actions associated with each grammar rule of the system. |
| 76 | |
| 77 | The pruning actions include: |
| 78 | • grammatical case test for particular words and noun groups[[BR]] |
| 79 | • agreement test of case in prepositional constructions[[BR]] |
| 80 | • agreement test of number and gender for relative pronouns[[BR]] |
| 81 | • agreement test of case, number and gender for noun groups[[BR]] |
| 82 | • type checking of logical constructions |
| 83 | |
| 84 | '''Example:''' |
| 85 | |
| 86 | {{{ |
| 87 | np -> adj_group np |
| 88 | rule_schema($@, "lwtx(awtx(#1) and awtx(#2))") |
| 89 | rule_schema($@, "lwtx([[awt(#1),#2],x])") |
| 90 | }}} |
| 91 | |
| 92 | The rule schema action gives a prescription for building a logical construction out of the sub-constructions on the right-hand side.[[BR]] |
| 93 | ''propagate_all'' and ''agree_*_and_propagate'': compute and propagate all relevant grammatical information from the selected non-terminals on the right hand side to the one on the left hand side of the rule. |
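An agreement test with propagation can be pictured as intersecting the sets of morphological readings of the right-hand-side items: an empty intersection prunes the rule, a non-empty one is propagated to the left-hand side. The Python sketch below illustrates this for case, number and gender. The function name, the tuple encoding and the incomplete toy readings are assumptions for illustration and do not reflect Synt's internal representation.

```python
def agree_case_number_gender(*reading_sets):
    """Intersect (case, number, gender) readings of all right-hand-side items.

    An empty result means the agreement test fails and the rule is pruned;
    otherwise the shared readings are what gets propagated upwards.
    """
    common = set(reading_sets[0])
    for readings in reading_sets[1:]:
        common &= set(readings)
    return common


# Toy, incomplete readings for an adjective and two nouns (illustrative only).
adjective = {("nom", "sg", "neut"), ("acc", "sg", "neut"), ("nom", "pl", "fem")}
noun = {("nom", "sg", "neut"), ("acc", "sg", "neut")}
other_noun = {("gen", "pl", "masc")}

print(agree_case_number_gender(adjective, noun))
print(agree_case_number_gender(adjective, other_noun))
```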
| 94 | |
| 95 | |
| 96 | '''The Expanded Grammar Form (G3)'''[[BR]] |
| 97 | The G3 form is obtained by transforming the G2 form, together with its contextual actions, into rules. |
| 98 | |
| 99 | [[Image(http://www.fi.muni.cz/~xmedved1/synt.jpg)]] |
| 100 | |
| 101 | |
| 102 | == Possible Synt output == |
| 103 | '''phrase-structure tree'''[[BR]] |
| 104 | |
| 105 | [[Image(http://www.fi.muni.cz/~xmedved1/strom_synt_zac.png)]][[BR]][[BR]] |
| 106 | [[BR]] |
| 107 | |
| 108 | '''dependency graph'''[[BR]] |
| 109 | |
| 110 | [[Image(http://www.fi.muni.cz/~xmedved1/graph.png)]][[BR]] |
| 111 | [[BR]] |
| 112 | [[BR]] |
| 113 | '''syntactic structure'''[[BR]] |
| 114 | |
| 115 | {{{ |
| 116 | [0-7) : Tlačil auto, |
| 117 | tlačiť - V auto - N |
| 118 | Tlačil auto, |
| 119 | [2-4) : ktoré sa pokazilo |
| 120 | ktorý - PRON byť - PRON pokaziť - V |
| 121 | ktoré sa pokazilo. |
| 122 | }}} |
| 123 | |
| 124 | |
| 125 | == Commands == |
| 126 | The input for the Synt parser is a morphologically annotated sentence in vertical format (annotated with majka for Czech and RFTagger for Slovak). |
| 127 | To run a basic syntactic analysis: [[BR]] |
| 128 | |
| 129 | {{{ |
| 130 | cat sentence.vert | /nlp/projekty/syntax_sk/synt_sk/synt/synt -i vertical  # Slovak |
| 131 | cat sentence.vert | /nlp/synt/synt/synt -i vertical  # Czech |
| 132 | }}} |
| 133 | To run a syntactic analysis with phrase-structure tree output: [[BR]] |
| 134 | |
| 135 | {{{ |
| 136 | cat sentence.vert | /nlp/projekty/syntax_sk/synt_sk/synt/synt -i vertical -tt- | /nlp/projekty/set/set/TreeViewer/TreeViewer.py  # Slovak |
| 137 | cat sentence.vert | /nlp/synt/synt/synt -i vertical -tt- | /nlp/projekty/set/set/TreeViewer/TreeViewer.py  # Czech |
| 138 | }}} |
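For scripted use, the basic analysis command can be wrapped in Python. This is a sketch under assumptions: the helper names are hypothetical, the binary paths and the `-i vertical` option are the ones listed on this page, and the output is assumed to be UTF-8 text.

```python
import subprocess

# Binary paths as given on this page.
SYNT_CZECH = "/nlp/synt/synt/synt"
SYNT_SLOVAK = "/nlp/projekty/syntax_sk/synt_sk/synt/synt"


def synt_command(binary=SYNT_CZECH):
    """Build the argument list for the basic analysis of vertical input."""
    return [binary, "-i", "vertical"]


def run_synt(vert_path, binary=SYNT_CZECH):
    """Feed a vertical file to synt on stdin and return its output as text."""
    with open(vert_path, "rb") as vert_file:
        result = subprocess.run(
            synt_command(binary),
            stdin=vert_file,
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout.decode("utf-8")  # assumption: output is UTF-8
```

A call such as `run_synt("sentence.vert", SYNT_SLOVAK)` would then correspond to the Slovak variant of the basic command.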