| 1 | = Automatic Semantic Tool (AST) = |
| 2 | |
| 3 | Full semantic analysis of natural language (NL) texts is an open |
| 4 | problem. The most comprehensive semantic systems build upon a mathematically |
| 5 | sound formalism of a selected logical system. Mostly for reasons of computability |
| 6 | and efficiency, current systems work with first-order logic (or a variant of it). |
| 7 | However, first-order logic is not appropriate for capturing higher-order |
| 8 | phenomena that occur in natural language, such as belief attitudes, direct |
| 9 | speech, or verb tenses. In our project, we develop a new tool for automatic semantic analysis |
| 10 | (AST) that emerged from (a module of) the Czech syntactic parser [https://nlp.fi.muni.cz/trac/synt SYNT]. |
| 11 | |
| 12 | |
| 13 | AST is now a standalone tool based on Transparent Intensional Logic (TIL). |
| 14 | It works with the same input files (lexicons, semantic rules, ...) that were designed and developed in SYNT. |
| 15 | |
| 16 | AST can provide a semantic analysis in the form of TIL constructions independently of the input syntactic parser and language. |
| 17 | |
| 18 | Adapting AST to a new language consists of specifying four lexicon files that describe lexical items, verb valencies, prepositional valencies and a semantic grammar. |
| 19 | |
| 20 | |
| 21 | == Input == |
| 22 | |
| 23 | To create a semantic structure of a sentence, AST needs the output of the |
| 24 | preceding syntactic analysis. This output usually takes the form of a syntactic tree. |
| 25 | |
| 26 | '''Textual form of a syntactic tree:''' |
| 27 | {{{ |
| 28 | <tree> |
| 29 | {##start## |
| 30 | {start |
| 31 | {ss |
| 32 | {clause |
| 33 | {VL<leaf><idx>0</idx><w>Jedl</w> |
| 34 | <l>jíst</l><c>k5eAaIgMnS</c></leaf>} |
| 35 | {intr |
| 36 | {adjp |
| 37 | {ADJ<leaf><idx>1</idx><w>pečené</w> |
| 38 | <l>pečený</l><c>k2eAgNnSc4</c></leaf>} |
| 39 | } |
| 40 | {np |
| 41 | {N<leaf><idx>2</idx><w>kuře</w> |
| 42 | <l>kuře</l><c>k1gNnSc4</c></leaf>} |
| 43 | } |
| 44 | } |
| 45 | } |
| 46 | } |
| 47 | {ends |
| 48 | {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> } |
| 49 | } |
| 50 | } |
| 51 | } |
| 52 | </tree> |
| 53 | }}} |
| 54 | |
| 55 | '''Corresponding graphical representation:''' |
| 56 | |
| 57 | [[Image(tree.png, 700px)]] |
| 58 | |
| 59 | Besides the tree nodes and edges, the tree contains morphological information about each word: a lemma and a PoS tag, which are used by AST for |
| 60 | deriving implicit out-of-vocabulary type information. |
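
The morphological information carried by each leaf can be pulled out of the textual tree with a few lines of Python. The following is only a minimal sketch, not code from the AST sources: the function names `extract_leaves` and `split_tag` are hypothetical, and the sketch assumes that a PoS tag is a sequence of attribute/value character pairs (as in `k1gNnSc4`), which the page itself does not state.

```python
import re

# Matches one <leaf> element: index, word form, lemma and PoS tag.
LEAF_RE = re.compile(
    r"<leaf><idx>(\d+)</idx><w>(.*?)</w>\s*<l>(.*?)</l><c>(.*?)</c></leaf>",
    re.S,
)


def extract_leaves(tree_text):
    """Return one dict per leaf found in the textual tree (hypothetical helper)."""
    return [
        {"idx": int(idx), "word": word, "lemma": lemma, "tag": tag}
        for idx, word, lemma, tag in LEAF_RE.findall(tree_text)
    ]


def split_tag(tag):
    """Split a tag into attribute/value pairs (assumed tag layout)."""
    return {tag[i]: tag[i + 1] for i in range(0, len(tag) - 1, 2)}


# Two leaves copied from the SYNT example tree.
sample = """{VL<leaf><idx>0</idx><w>Jedl</w>
<l>jíst</l><c>k5eAaIgMnS</c></leaf>}
{N<leaf><idx>2</idx><w>kuře</w>
<l>kuře</l><c>k1gNnSc4</c></leaf>}"""

leaves = extract_leaves(sample)
noun_attributes = split_tag(leaves[1]["tag"])
```

Under these assumptions, the lemma and the decoded tag attributes are the pieces of information from which an implicit type could be looked up.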
| 61 | |
| 62 | The current AST implementation is able to process inputs from the [https://nlp.fi.muni.cz/trac/synt SYNT] and [https://nlp.fi.muni.cz/trac/set SET] parsers. The preceding syntactic tree example comes from the output of the SYNT parser. |
| 63 | |
| 64 | An example of a SET tree in textual form for the Czech equivalent of the sentence "Tom wants to buy a new car but he will not buy it.": |
| 65 | |
| 66 | {{{ |
| 67 | id word:nterm lemma tag pid til schema |
| 68 | 0 N:Tom Tom k1gMnSc1;ca14 p |
| 69 | 1 V:chce chtít k5eAaImIp3nS 15 |
| 70 | 2 V:koupit koupit k5eAaPmF 16 |
| 71 | 3 ADJ:nové nový k2eAgNnSc4d1 17 |
| 72 | 4 N:auto auto k1gNnSc4 17 |
| 73 | 5 PUNCT:, , kIx 10 |
| 74 | 6 CONJ:ale ale k8xC 10 |
| 75 | 7 V:nekoupí koupit k5eNaPmIp3nS 13 |
| 76 | 8 PRON:je on k3xPp3gNnSc4 13 |
| 77 | 9 PUNCT:. . kIx. 10 |
| 78 | 10 <CLAUSE> k5eNaPmIp3nS 12 vrule_sch ( $$ $@ ) |
| 79 | 11 <CLAUSE> k5eAaImIp3nS 12 vrule_sch ( $$ $@ ) |
| 80 | 12 <SENTENCE> -1 |
| 81 | 13 <VP> koupit k5eNaPmIp3nS 10 vrule_sch_add ( $$ $@ "#1H (#2)" ) |
| 82 | 14 <VP> chtít k5eAaImIp3nS 11 vrule_sch_add ( $$ $@ "#2H (#1)" ) |
| 83 | 15 <VP> chtít k5eAaImIp3nS 14 vrule_sch_add ( $$ $@ "#1H (#2)" ) |
| 84 | 16 <VP> koupit k5eAaPmF 15 vrule_sch_add ( $$ $@ "#1H (#2)" ) |
| 85 | 17 <NP> auto k1gNnSc4 16 rule_sch ( $$ $@ "[#1,#2]" ) |
| 86 | }}} |
| 87 | |
| 88 | Visual representation of the SET structural tree: |
| 89 | |
| 90 | [[Image(set_tree.png, 700px)]] |
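
The SET output encodes the tree through the `id` and `pid` (parent id) columns, so the structure can be recovered by grouping nodes under their parents. The sketch below is illustrative only: the `(id, label, pid)` triples are transcribed by hand from the listing, token 0 is left out because its parent column is not clearly readable there, and the helper names `build_children` and `bracket` are hypothetical.

```python
from collections import defaultdict

# (id, label, pid) triples transcribed from the SET listing; token 0 is omitted.
NODES = [
    (1, "V:chce", 15), (2, "V:koupit", 16), (3, "ADJ:nové", 17),
    (4, "N:auto", 17), (5, "PUNCT:,", 10), (6, "CONJ:ale", 10),
    (7, "V:nekoupí", 13), (8, "PRON:je", 13), (9, "PUNCT:.", 10),
    (10, "<CLAUSE>", 12), (11, "<CLAUSE>", 12), (12, "<SENTENCE>", -1),
    (13, "<VP>", 10), (14, "<VP>", 11), (15, "<VP>", 14),
    (16, "<VP>", 15), (17, "<NP>", 16),
]


def build_children(nodes):
    """Map each parent id to the list of its child ids, in input order."""
    children = defaultdict(list)
    for node_id, _label, pid in nodes:
        children[pid].append(node_id)
    return children


def bracket(node_id, labels, children):
    """Render the subtree rooted at node_id as a bracketed string."""
    kids = children.get(node_id, [])
    if not kids:
        return labels[node_id]
    inner = " ".join(bracket(kid, labels, children) for kid in kids)
    return "(" + labels[node_id] + " " + inner + ")"


labels = {node_id: label for node_id, label, _pid in NODES}
children = build_children(NODES)
root = children[-1][0]  # the node whose pid is -1
print(bracket(root, labels, children))
```

The bracketed string printed at the end is just a compact textual counterpart of the picture above; the root is the single node whose `pid` is `-1`.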
| 91 | |
| 92 | |
| 93 | |
| 94 | == Language Dependent Files == |
| 95 | |
| 96 | The core of the AST system is universal and can be used for semantic analysis of any |
| 97 | language. Besides the core, the system also uses input files that are language |
| 98 | dependent and that need to be modified for each new language. |
| 99 | |
| 100 | |
| 101 | '''The Semantic Grammar''': the resulting semantic construction is built by a |
| 102 | bottom-up analysis based on the input syntactic tree provided by the syntactic |
| 103 | parser and on a semantic extension of the actual grammar used in the parsing |
| 104 | process. To know which rule was used by the parser, AST needs the semantic |
| 105 | grammar file. This file specifies the semantic actions that need |
| 106 | to be performed before the construction of a particular node is propagated to the higher |
| 107 | level in the syntactic tree. The semantic actions define which logical functions |
| 108 | correspond to each particular syntactic rule. For instance, the <np> node in the |
| 109 | graphical representation corresponds to the following rule and action: |
| 110 | |
| 111 | {{{ |
| 112 | np -> left_modif np |
| 113 | rule_schema ( "[#1,#2]" ) |
| 114 | }}} |
| 115 | |
| 116 | which says that the resulting logical construction of the left-hand side np is |
| 117 | obtained as a (logical) application of the left_modif (sub)construction to the |
| 118 | right-hand side np (sub)construction. The following figure shows how a construction is built from two subconstructions: |
| 119 | |
| 120 | [[Image(analysis.png, 700px)]] |
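
The effect of a rule schema such as `"[#1,#2]"` can be pictured as filling numbered placeholders with the constructions of the right-hand side symbols. The following is a minimal sketch under that reading, not the AST schema parser itself: `apply_schema` is a hypothetical name, placeholders are assumed to be 1-based as in the rule above, and plain lemmas stand in for real TIL sub-constructions.

```python
import re


def apply_schema(schema, subconstructions):
    """Replace each #N placeholder with the N-th sub-construction (1-based)."""
    def substitute(match):
        return subconstructions[int(match.group(1)) - 1]

    return re.sub(r"#(\d+)", substitute, schema)


# left_modif is "pečený", the right-hand side np is "kuře" (placeholder strings).
result = apply_schema("[#1,#2]", ["pečený", "kuře"])
print(result)  # [pečený,kuře]
```

In the bottom-up analysis, the string produced for a node would then serve as one of the sub-constructions of its parent.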
| 121 | |
| 122 | '''TIL Types of Lexical Items''': the second language dependent file defines lexical |
| 123 | items and their TIL types. The types are hierarchically built from four simple |
| 124 | TIL types: |
| 125 | * o: representing the truth-values, |
| 126 | * ι: class of individuals, |
| 127 | * τ: class of time moments, and |
| 128 | * ω: class of possible worlds. |
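
One way to picture how compound types are built hierarchically from the four simple ones is a small recursive data structure. This is only an illustrative sketch: the class names are hypothetical, and the notation used for function types (result type first, argument types after it, as in `(oι)`) follows the usual TIL convention and is an assumption, since this page does not spell it out.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    """One of the four simple TIL types."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Func:
    """A function type: result type followed by argument types."""
    result: "TilType"
    args: Tuple["TilType", ...]

    def __str__(self):
        return "(" + str(self.result) + "".join(str(a) for a in self.args) + ")"


TilType = Union[Base, Func]

O, IOTA, TAU, OMEGA = Base("o"), Base("ι"), Base("τ"), Base("ω")


def intension(value_type):
    """World- and time-dependent object of the given type: ((value τ) ω)."""
    return Func(Func(value_type, (TAU,)), (OMEGA,))


# A class of individuals, and the corresponding property of individuals.
individual_class = Func(O, (IOTA,))
individual_property = intension(individual_class)
print(individual_property)  # (((oι)τ)ω)
```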
| 129 | |
| 130 | AST contains rules for deriving implicit types based on the PoS tags of the input |
| 131 | words, so the lexicons need to prescribe the type only in cases that differ from |
| 132 | the implicit definition. A lexical item example for the verb "jíst" (eat) is: |
| 133 | |
| 134 | [[Image(jist.png, 300px)]] |
| 135 | |
| 136 | The exact format of the lexical item in the input file is as follows: the lemma |
| 137 | starts on a separate line. After the lemma there is a list of lines in which an |
| 138 | (optional) PoS tag filter precedes the resulting object schema (here otriv, i.e. |
| 139 | o-trivialisation) and the TIL type (here a verbal object with one ι-argument). |
| 140 | |
| 141 | '''Verb Valencies''': the next language dependent file defines verb |
| 142 | valencies together with the schema and type information for building the resulting construction from the corresponding valency frame. An example for the verb "jíst" (eat) |
| 143 | is as follows: |
| 144 | |
| 145 | {{{ |
| 146 | jíst |
| 147 | hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)] |
| 148 | }}} |
| 149 | |
| 150 | This record defines the valency of <somebody> eats <something>, given by the |
| 151 | brief valency frame hPTc4 of the object (an animate or inanimate noun phrase in |
| 152 | the accusative), and the resulting construction of the verbal object (V(v)) derived as |
| 153 | an application of the verb (!#0) to its argument (the sentence object) with possible |
| 154 | extensification (try(!#1)) and the appropriate possible world variable (V(w)). |
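
A record of this shape can be split into the lemma, the brief valency frame and the schema string with very little code. The sketch below is hedged accordingly: `parse_valency_record` is a hypothetical name, it assumes that the first line of a record is the lemma and that every following line holds a frame and a schema separated by whitespace, and it does not interpret the colon-separated parts of the schema.

```python
import re


def parse_valency_record(record):
    """Split one verb record into its lemma and a list of frame/schema pairs."""
    lines = [line.strip() for line in record.splitlines() if line.strip()]
    lemma, frame_lines = lines[0], lines[1:]
    frames = []
    for line in frame_lines:
        frame, _, schema = line.partition(" ")
        frames.append({
            "frame": frame,
            "schema": schema.strip(),
            # Placeholders referring to the verb (#0) and its arguments.
            "placeholders": re.findall(r"#\d+", schema),
        })
    return lemma, frames


record = """jíst
hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]"""

lemma, frames = parse_valency_record(record)
```

For the example record this yields the lemma `jíst`, the frame `hPTc4` and the placeholders `#0` and `#1`, which correspond to the verb and its object in the description above.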
| 155 | |
| 156 | '''Prepositional Valency Expressions''': the last file that has to be specified for |
| 157 | each language is a list of semantic mappings of prepositional phrases to |
| 158 | valency expressions based on the head preposition. For each |
| 159 | combination of a preposition and a grammatical case of the included noun |
| 160 | phrase, the file lists all possible valency slots corresponding to the prepositional phrase. For |
| 161 | instance, the record for the preposition "k" (to) is |
| 162 | |
| 163 | {{{ |
| 164 | k |
| 165 | 3 hA hH |
| 166 | }}} |
| 167 | |
| 168 | saying that "k" can introduce a prepositional phrase of a where-to direction hA |
| 169 | (e.g. "k lesu" – "to a forest"), or a modal how/what specification hH (e.g. "k večeři" |
| 170 | – "to a dinner"). |
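
Such a record maps naturally onto a lookup table keyed by preposition and grammatical case. The following is a minimal sketch, assuming that the first line of a record is the preposition and that each further line starts with the case number followed by the valency slots; `parse_prep_record` is a hypothetical name and not part of AST.

```python
def parse_prep_record(record):
    """Return {(preposition, case): [valency slots]} for one record."""
    lines = [line.split() for line in record.splitlines() if line.strip()]
    preposition = lines[0][0]
    return {
        (preposition, int(case)): slots
        for case, *slots in lines[1:]
    }


prep_slots = parse_prep_record("k\n3 hA hH")
print(prep_slots)  # {('k', 3): ['hA', 'hH']}
```

A prepositional phrase headed by "k" with a noun phrase in case 3 would then be looked up as `prep_slots[("k", 3)]` to obtain its candidate valency slots.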
| 171 | |
| 172 | = System Parts = |
| 173 | The AST system is implemented in the Python 2.7 programming language and |
| 174 | consists of seven main parts: |
| 175 | * the input parser: reads standard input, extracts tree structures and creates a tree object for each tree in the input, |
| 176 | * the grammar parser: reads the grammar file and assigns a grammar rule and the appropriate actions to each node inside the tree, |
| 177 | * the lexical item parser: reads the file with lexical item schemata and TIL types and assigns the type to each leaf in the tree structure, |
| 178 | * the schema parser: according to a logical construction schema coming with a semantic action, this module creates a construction from sub-constructions, |
| 179 | * the verb valency parser: picks the correct valency for the given sentence and triggers the schema parser on sub-constructions according to the schema coming with the valency, |
| 180 | * the prepositional valency expression parser: reads the possible valency expressions assigned to prepositional phrases used as (optional) valency slots in the actual sentence valency frame, and |
| 181 | * the sentence schema processor: if the sentence structure contains subordinate or coordinate clauses, the sentence schema processor is triggered. The |
| 182 |   sentence schemata are classified by the conjunctions used between clauses. |
| 183 | |
| 184 | = Download = |
| 185 | You can download the AST tool [[attachment:ast_til.tar.xz|here]]. |
| 186 | |