Full semantic analysis of natural language (NL) texts is an open
problem. The most comprehensive semantic systems build upon a mathematically
sound formalism of a selected logical system. Mostly for reasons of computability
and efficiency, current systems work with first-order logic (or a variant of it).
However, first-order logic is not appropriate for capturing higher-order
phenomena that occur in natural language, such as belief attitudes, direct
speech, or verb tenses. In our project, we develop a new tool for automatic semantic analysis
(AST) that emerged from a module of the Czech syntactic parser [https://nlp.fi.muni.cz/trac/synt SYNT].

AST is now a standalone tool based on Transparent Intensional Logic (TIL).
It works with the same input files (lexicons, semantic rules, ...) that were designed and developed in SYNT.

AST can provide a semantic analysis in the form of TIL constructions independently of the input syntactic parser and language.

Adapting AST to a new language consists of specifying four language dependent files that describe lexical items, verb valencies, prepositional valencies, and a semantic grammar.

== Input ==

To create the semantic structure of a sentence, AST needs the output of
a preceding syntactic analysis, usually in the form of a syntactic tree.

'''Textual form of a syntactic tree:'''
{{{
<tree>
{##start##
  {start
    {ss
      {clause
        {VL<leaf><idx>0</idx><w>Jedl</w>
          <l>jíst</l><c>k5eAaIgMnS</c></leaf>}
        {intr
          {adjp
            {ADJ<leaf><idx>1</idx><w>pečené</w>
              <l>pečený</l><c>k2eAgNnSc4</c></leaf>}
          }
          {np
            {N<leaf><idx>2</idx><w>kuře</w>
              <l>kuře</l><c>k1gNnSc4</c></leaf>}
          }
        }
      }
    }
    {ends
      {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> }
    }
  }
}
</tree>
}}}

'''Corresponding graphical representation:'''

[[Image(tree.png, 700px)]]

Besides the tree nodes and edges, the tree contains morphological information about each word: a lemma and a PoS tag, which AST uses for
deriving implicit type information for out-of-vocabulary words.
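
The leaf elements of the textual tree can be read with a few lines of Python. The following is a minimal sketch, not the input parser of AST: the `Leaf` record, the `extract_leaves` function, and the regular expression are hypothetical names introduced here, and the code targets Python 3 although AST itself is written for Python 2.7.

```python
import re
from collections import namedtuple

# Hypothetical record type for illustration; not part of AST itself.
Leaf = namedtuple("Leaf", "idx word lemma tag")

# One <leaf> element as it appears in the textual tree; the pattern
# tolerates whitespace or a line break between the individual fields.
LEAF_RE = re.compile(
    r"<leaf>\s*<idx>(\d+)</idx>\s*<w>(.*?)</w>\s*"
    r"<l>(.*?)</l>\s*<c>(.*?)</c>\s*</leaf>",
    re.S,
)


def extract_leaves(tree_text):
    """Return the leaves of a textual tree ordered by their <idx> value."""
    leaves = [
        Leaf(int(idx), word, lemma, tag)
        for idx, word, lemma, tag in LEAF_RE.findall(tree_text)
    ]
    return sorted(leaves, key=lambda leaf: leaf.idx)


# Two leaves copied from the example tree.
sample = """{VL<leaf><idx>0</idx><w>Jedl</w>
<l>jíst</l><c>k5eAaIgMnS</c></leaf>}
{N<leaf><idx>2</idx><w>kuře</w>
<l>kuře</l><c>k1gNnSc4</c></leaf>}"""

for leaf in extract_leaves(sample):
    print(leaf.idx, leaf.word, leaf.lemma, leaf.tag)
```

The sketch ignores the bracket structure of the tree; it only shows where the lemma and the PoS tag of each word are stored.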

The current AST implementation is able to process input from the [https://nlp.fi.muni.cz/trac/synt SYNT] and [https://nlp.fi.muni.cz/trac/set SET] parsers. The syntactic tree in the previous example is the output of the SYNT parser.

An example of a SET tree in textual form for the sentence "Tom chce koupit nové auto, ale nekoupí je." ("Tom wants to buy a new car but he will not buy it."):

{{{
id word:nterm lemma tag pid til schema
0 N:Tom Tom k1gMnSc1;ca14 p
1 V:chce chtít k5eAaImIp3nS 15
2 V:koupit koupit k5eAaPmF 16
3 ADJ:nové nový k2eAgNnSc4d1 17
4 N:auto auto k1gNnSc4 17
5 PUNCT:, , kIx 10
6 CONJ:ale ale k8xC 10
7 V:nekoupí koupit k5eNaPmIp3nS 13
8 PRON:je on k3xPp3gNnSc4 13
9 PUNCT:. . kIx. 10
10 <CLAUSE> k5eNaPmIp3nS 12 vrule_sch ( $$ $@ )
11 <CLAUSE> k5eAaImIp3nS 12 vrule_sch ( $$ $@ )
12 <SENTENCE> -1
13 <VP> koupit k5eNaPmIp3nS 10 vrule_sch_add ( $$ $@ "#1H (#2)" )
14 <VP> chtít k5eAaImIp3nS 11 vrule_sch_add ( $$ $@ "#2H (#1)" )
15 <VP> chtít k5eAaImIp3nS 14 vrule_sch_add ( $$ $@ "#1H (#2)" )
16 <VP> koupit k5eAaPmF 15 vrule_sch_add ( $$ $@ "#1H (#2)" )
17 <NP> auto k1gNnSc4 16 rule_sch ( $$ $@ "[#1,#2]" )
}}}
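
Assuming that the `pid` column holds the id of the parent node and that -1 marks the root, the structure of the listing can be rebuilt as sketched below. The `ROWS` list and both helper functions are hypothetical and written for this illustration only. Row 0 is left out because its `pid` value cannot be read reliably from the listing.

```python
from collections import defaultdict

# (id, label, pid) triples copied from the SET listing; row 0 is omitted
# because its pid is not legible there.
ROWS = [
    (1, "V:chce", 15), (2, "V:koupit", 16), (3, "ADJ:nové", 17),
    (4, "N:auto", 17), (5, "PUNCT:,", 10), (6, "CONJ:ale", 10),
    (7, "V:nekoupí", 13), (8, "PRON:je", 13), (9, "PUNCT:.", 10),
    (10, "<CLAUSE>", 12), (11, "<CLAUSE>", 12), (12, "<SENTENCE>", -1),
    (13, "<VP>", 10), (14, "<VP>", 11), (15, "<VP>", 14),
    (16, "<VP>", 15), (17, "<NP>", 16),
]


def build_children(rows):
    """Map each parent id to the list of its child ids, in listing order."""
    children = defaultdict(list)
    for nid, _label, pid in rows:
        children[pid].append(nid)
    return children


def path_to_root(node_id, rows):
    """Follow the pid links from node_id up to the root (whose pid is -1)."""
    parent = {nid: pid for nid, _label, pid in rows}
    path = [node_id]
    while parent[path[-1]] != -1:
        path.append(parent[path[-1]])
    return path


children = build_children(ROWS)
print(children[17])           # children of the <NP> node
print(path_to_root(4, ROWS))  # from "auto" up to <SENTENCE>
```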

Visual representation of the SET structural tree:

[[Image(set_tree.png, 700px)]]

== Language Dependent Files ==

The core of the AST system is universal and can be used for semantic analysis of any
language. Besides the core, the system uses input files that are language
dependent and need to be modified for each new language.

'''The Semantic Grammar''': the resulting semantic construction is built by
bottom-up analysis based on the input syntactic tree provided by the syntactic
parser and on a semantic extension of the actual grammar used in the parsing
process. To know which rule was used by the parser, AST needs the semantic
grammar file. This file specifies the semantic actions that need
to be performed before the construction of a particular node is propagated to the higher
level in the syntactic tree. The semantic actions define which logical functions
correspond to each particular syntactic rule. For instance, the <np> node in
the graphical representation corresponds to the following rule and action:

{{{
np -> left_modif np
rule_schema ( "[#1,#2]" )
}}}

which says that the resulting logical construction of the left-hand side np is
obtained as a (logical) application of the left_modif (sub)construction to the
right-hand side np (sub)construction. The following figure shows how a construction is built from two subconstructions:

[[Image(analysis.png, 700px)]]
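
The way a schema such as `"[#1,#2]"` combines subconstructions can be sketched as plain placeholder substitution. This is an assumption about the mechanism, not the schema parser of AST: the function name `instantiate_schema` is hypothetical, and the strings passed in are placeholders rather than real TIL constructions.

```python
import re


def instantiate_schema(schema, subconstructions):
    """Replace each #n placeholder with the n-th subconstruction (1-based)."""
    def substitute(match):
        index = int(match.group(1))
        if not 1 <= index <= len(subconstructions):
            raise ValueError("schema refers to missing subconstruction #%d" % index)
        return subconstructions[index - 1]

    return re.sub(r"#(\d+)", substitute, schema)


# left_modif is filled into #1 and the right-hand side np into #2.
print(instantiate_schema("[#1,#2]", ["pečený", "kuře"]))  # prints [pečený,kuře]
```

The sketch covers rule schemata only; the `#0` placeholder used in valency records is deliberately rejected here.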

'''TIL Types of Lexical Items''': the second language dependent file defines lexical
items and their TIL types. The types are hierarchically built from four simple
TIL types:
* o: representing the truth-values,
* ι: class of individuals,
* τ: class of time moments, and
* ω: class of possible worlds.
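
The hierarchical construction of types can be illustrated with a small data structure. The sketch assumes the usual TIL notation in which (αβ) denotes the type of functions from β to α; the `Func` class and the `intension` helper are hypothetical names, and the sketch does not describe how AST stores types internally.

```python
from collections import namedtuple


class Func(namedtuple("Func", "result args")):
    """Function type (result arg1 ... argn): maps the arguments to result."""

    def __str__(self):
        return "(" + "".join(str(t) for t in (self.result,) + self.args) + ")"


def intension(alpha):
    """Type ((alpha τ) ω): a function from worlds and times to alpha."""
    return Func(Func(alpha, ("τ",)), ("ω",))


# Base types are written as plain strings: "o", "ι", "τ", "ω".
set_of_individuals = Func("o", ("ι",))
property_of_individuals = intension(set_of_individuals)

print(set_of_individuals)        # prints (oι)
print(property_of_individuals)   # prints (((oι)τ)ω)
```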

AST contains rules for deriving implicit types based on the PoS tags of the input
words, so the lexicons need to prescribe the type only in cases that differ from
the implicit definition. A lexical item example for the verb "jíst" (eat) is:

[[Image(jist.png, 300px)]]

The exact format of a lexical item in the input file is as follows: the lemma
is on a separate line. It is followed by a list of lines in which an
(optional) PoS tag filter precedes the resulting object schema (here otriv, i.e.
o-trivialisation) and the TIL type (here a verbal object with one ι-argument).

'''Verb Valencies''': the next language dependent file defines verb
valencies together with the schema and type information for building the resulting construction from the corresponding valency frame. An example for the verb "jíst" (eat)
is as follows:

{{{
jíst
hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]
}}}

This record defines the valency of <somebody> eats <something>, given by the
brief valency frame hPTc4 of the object (an animate or inanimate noun phrase in
the accusative), and the resulting construction of the verbal object (V(v)) derived as
an application of the verb (!#0) to its argument (the sentence object) with possible
extensification (try(!#1)) and the appropriate possible world variable (V(w)).
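
A record of this shape can be split into its parts as sketched below. The sketch assumes only what the example shows: a lemma on its own line, followed by lines that hold a valency frame and a colon-separated specification. The function name `parse_valency_file` is hypothetical, and the meaning of the individual colon-separated fields is not interpreted.

```python
def parse_valency_file(text):
    """Map each verb lemma to a list of (frame, fields) records."""
    valencies = {}
    lemma = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].startswith(":"):
            if lemma is None:
                raise ValueError("valency record before any lemma: %r" % line)
            frame, spec = parts
            # Drop the leading ':' and split the colon-separated fields.
            valencies[lemma].append((frame, spec[1:].split(":")))
        else:
            # Any other non-empty line starts the records of a new lemma.
            lemma = line
            valencies.setdefault(lemma, [])
    return valencies


record = "jíst\nhPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]\n"
for frame, fields in parse_valency_file(record)["jíst"]:
    print(frame, fields)
```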

'''Prepositional Valency Expressions''': the last file that has to be specified for
each language is a list of semantic mappings of prepositional phrases to
valency expressions based on the head preposition. For each
combination of a preposition and a grammatical case of the included noun
phrase, the file lists all possible valency slots corresponding to the prepositional phrase. For
instance, the record for the preposition "k" (to), in which 3 stands for the third grammatical case (dative), is

{{{
k
3 hA hH
}}}

saying that "k" can introduce a prepositional phrase of a where-to direction hA
(e.g. "k lesu" – "to a forest"), or a modal how/what specification hH (e.g. "k večeři"
– "to a dinner").
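
Reading such records into a lookup table can be sketched as follows. The sketch assumes that a line starting with a number gives a grammatical case followed by its valency slots and that any other non-empty line names a preposition; the function name `parse_prep_valencies` is hypothetical.

```python
def parse_prep_valencies(text):
    """Map (preposition, case) pairs to the list of possible valency slots."""
    slots = {}
    preposition = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].isdigit():
            if preposition is None:
                raise ValueError("case line before any preposition: %r" % raw)
            slots[(preposition, fields[0])] = fields[1:]
        else:
            preposition = fields[0]
    return slots


table = parse_prep_valencies("k\n3 hA hH\n")
print(table[("k", "3")])  # prints ['hA', 'hH']
```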

= System Parts =
The AST system is implemented in the Python 2.7 programming language and
consists of seven main parts:
* the input parser: reads standard input, extracts tree structures, and creates a tree object for each input tree,
* the grammar parser: reads the grammar file and assigns a grammar rule and the appropriate actions to each node in the tree,
* the lexical item parser: reads the file with lexical item schemata and TIL types and assigns a type to each leaf in the tree structure,
* the schema parser: creates a construction from sub-constructions according to the logical construction schema that comes with a semantic action,
* the verb valency parser: picks the correct valency for the given sentence and triggers the schema parser on sub-constructions according to the schema that comes with the valency,
* the prepositional valency expression parser: reads the possible valency expressions assigned to prepositional phrases used as (optional) valency slots in the actual sentence valency frame, and
* the sentence schema processor: triggered if the sentence structure contains subordinate or coordinate clauses. The sentence schemata are classified by the conjunctions used between clauses.

= Download =
You can download the AST tool [[attachment:ast_til.tar.xz|here]].

This page has moved to the [https://nlp.fi.muni.cz/trac/synt/wiki/Ast SYNT Trac wiki].