Changes between Initial Version and Version 1 of Ast


Timestamp:
Mar 15, 2019, 11:46:12 AM (5 years ago)
Author:
hales
Comment:

--

  • Ast

    v1 v1  
     1= Automatic Semantic Tool (AST) =
     2
     3Full semantic analysis of natural language (NL) texts is an open
     4problem. The most comprehensive semantic systems build upon a mathematically
     5sound formalism of a selected logical system. Mostly for reasons of computability
     6and efficiency, current systems work with first-order logic (or a variant of it).
     7However, such low-order logic is not appropriate for capturing higher-order
     8phenomena that occur in natural language, such as belief attitudes, direct
     9speech, or verb tenses. In our project, we develop a new tool for automatic semantic analysis
     10(AST) that emerged from (a module of) the Czech syntactic parser [https://nlp.fi.muni.cz/trac/synt SYNT].
     11
     12
     13AST is now a standalone tool based on Transparent Intensional Logic (TIL).
     14It works with the same input files (lexicons, semantic rules, ...) that were designed and developed in SYNT.
     15
     16AST can provide a semantic analysis in the form of TIL constructions independently of the input syntactic parser and language.
     17
     18Adapting AST to a new language consists in specifying four lexicon files that describe lexical items, verb valencies, prepositional valencies and a semantic grammar.
     19
     20
     21== Input ==
     22
     23To create a semantic structure of a sentence, AST needs the output of a
     24preceding syntactic analysis. This output usually takes the form of a syntactic tree.
     25
     26'''Textual form of a syntactic tree:'''
     27{{{
     28<tree>
     29{##start##
     30  {start
     31    {ss
     32      {clause
     33        {VL<leaf><idx>0</idx><w>Jedl</w>
     34         <l>jíst</l><c>k5eAaIgMnS</c></leaf>}
     35        {intr
     36          {adjp
     37            {ADJ<leaf><idx>1</idx><w>pečené</w>
     38             <l>pečený</l><c>k2eAgNnSc4</c></leaf>}
     39          }
     40          {np
     41            {N<leaf><idx>2</idx><w>kuře</w>
     42             <l>kuře</l><c>k1gNnSc4</c></leaf>}
     43          }
     44        }
     45      }
     46    }
     47    {ends
     48      {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> }
     49    }     
     50  }
     51}
     52</tree>
     53}}}
     54
     55'''Corresponding graphical representation:'''
     56
     57[[Image(tree.png, 700px)]]
     58
     59Besides the tree nodes and edges, the tree contains morphological information about each word: a lemma and a PoS tag, which are used by AST for
     60deriving implicit out-of-vocabulary type information.
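
The morphological fields of each leaf can be pulled out of the textual tree with a regular expression. The following is only a minimal sketch in Python 3, not the AST input parser itself; the names `LEAF_RE` and `extract_leaves` are hypothetical, and the sample reuses two leaves from the tree above.

```python
import re

# Hypothetical pattern for one <leaf> element of the textual tree:
# index, word form, lemma and morphological tag.
LEAF_RE = re.compile(
    r"<leaf><idx>(\d+)</idx><w>(.*?)</w>\s*<l>(.*?)</l><c>(.*?)</c></leaf>",
    re.S,
)


def extract_leaves(tree_text):
    """Return one dict per leaf, in the order the leaves appear in the text."""
    return [
        {"idx": int(idx), "word": word, "lemma": lemma, "tag": tag}
        for idx, word, lemma, tag in LEAF_RE.findall(tree_text)
    ]


# Two leaves copied from the SYNT tree shown above.
sample = """
{VL<leaf><idx>0</idx><w>Jedl</w>
 <l>jíst</l><c>k5eAaIgMnS</c></leaf>}
{N<leaf><idx>2</idx><w>kuře</w>
 <l>kuře</l><c>k1gNnSc4</c></leaf>}
"""

leaves = extract_leaves(sample)
for leaf in leaves:
    print(leaf["idx"], leaf["word"], leaf["lemma"], leaf["tag"])
```

The sketch ignores the brace structure of the tree and only collects the leaves; recovering the non-terminal nodes would need a proper bracket parser.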
     61
     62The current AST implementation is able to process input from the [https://nlp.fi.muni.cz/trac/synt SYNT] and [https://nlp.fi.muni.cz/trac/set SET] parsers. The syntactic tree shown above is the output of the SYNT parser.
     63
     64An example of a SET tree in textual form for the sentence "Tom wants to buy a new car but he will not buy it.":
     65
     66{{{
     67id word:nterm lemma tag pid til schema
     680 N:Tom Tom k1gMnSc1;ca14 p
     691 V:chce chtít k5eAaImIp3nS 15
     702 V:koupit koupit k5eAaPmF 16
     713 ADJ:nové nový k2eAgNnSc4d1 17
     724 N:auto auto k1gNnSc4 17
     735 PUNCT:, , kIx 10
     746 CONJ:ale ale k8xC 10
     757 V:nekoupí koupit k5eNaPmIp3nS 13
     768 PRON:je on k3xPp3gNnSc4 13
     779 PUNCT:. . kIx. 10
     7810 <CLAUSE> k5eNaPmIp3nS 12 vrule_sch ( $$ $@ )
     7911 <CLAUSE> k5eAaImIp3nS 12 vrule_sch ( $$ $@ )
     8012 <SENTENCE> -1
     8113 <VP> koupit k5eNaPmIp3nS 10 vrule_sch_add ( $$ $@ "#1H (#2)" )
     8214 <VP> chtít k5eAaImIp3nS 11 vrule_sch_add ( $$ $@ "#2H (#1)" )
     8315 <VP> chtít k5eAaImIp3nS 14 vrule_sch_add ( $$ $@ "#1H (#2)" )
     8416 <VP> koupit k5eAaPmF 15 vrule_sch_add ( $$ $@ "#1H (#2)" )
     8517 <NP> auto k1gNnSc4 16 rule_sch ( $$ $@ "[#1,#2]" )
     86}}}
     87
     88Visual representation of the SET structural tree:
     89
     90[[Image(set_tree.png, 700px)]]
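
In the SET listing, each row points to its parent through the pid column, so the tree can be rebuilt from these parent pointers. The sketch below (Python 3) is an illustration under that reading of the listing, not AST code; `NODES`, `build_children` and `render` are hypothetical names. The rows are copied by hand from the listing, and node 0 is left out because its parent id is not readable there.

```python
from collections import defaultdict

# (id, label, pid) triples copied from the SET listing; pid -1 marks the root.
NODES = [
    (1, "V:chce", 15), (2, "V:koupit", 16), (3, "ADJ:nové", 17),
    (4, "N:auto", 17), (5, "PUNCT:,", 10), (6, "CONJ:ale", 10),
    (7, "V:nekoupí", 13), (8, "PRON:je", 13), (9, "PUNCT:.", 10),
    (10, "<CLAUSE>", 12), (11, "<CLAUSE>", 12), (12, "<SENTENCE>", -1),
    (13, "<VP>", 10), (14, "<VP>", 11), (15, "<VP>", 14),
    (16, "<VP>", 15), (17, "<NP>", 16),
]


def build_children(nodes):
    """Map each parent id to the ids of its children, in listing order."""
    children = defaultdict(list)
    for node_id, _label, pid in nodes:
        children[pid].append(node_id)
    return children


def render(node_id, labels, children, depth=0):
    """Return the subtree rooted at node_id as a list of indented lines."""
    lines = ["  " * depth + labels[node_id]]
    for child in children.get(node_id, []):
        lines.extend(render(child, labels, children, depth + 1))
    return lines


labels = {node_id: label for node_id, label, _pid in NODES}
children = build_children(NODES)
root = children[-1][0]  # the only node whose pid is -1
print("\n".join(render(root, labels, children)))
```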
     91
     92
     93
     94== Language Dependent Files ==
     95
     96The core of the AST system is universal and can be used for semantic analysis of any
     97language. Besides this core, the system also uses input files that are language
     98dependent and that need to be adapted for each new language.
     99
     100
     101'''The Semantic Grammar''': the resulting semantic construction is built by
     102bottom-up analysis based on the input syntactic tree provided by the syntactic
     103parser and on a semantic extension of the actual grammar used in the parsing
     104process. To know which rule was used by the parser, AST needs the semantic
     105grammar file. This file specifies the semantic actions that need
     106to be performed before the construction of a particular node is propagated to the higher
     107level in the syntactic tree. The semantic actions define which logical functions
     108correspond to each particular syntactic rule. For instance, the <np> node in
     109the graphical representation corresponds to the following rule and action:
     110
     111{{{
     112np -> left_modif np
     113rule_schema ( "[#1,#2]" )
     114}}}
     115
     116which says that the resulting logical construction of the left-hand side np is
     117obtained as a (logical) application of the left_modif (sub)construction to the
     118right-hand side np (sub)construction. The following figure shows how a construction is built from two subconstructions:
     119
     120[[Image(analysis.png, 700px)]]
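
The way a schema such as `[#1,#2]` combines sub-constructions can be sketched as simple placeholder substitution. This is only an illustration in Python 3, assuming that `#n` refers to the n-th right-hand-side symbol of the rule; the function `instantiate` is hypothetical, and the plain lemma strings stand in for real TIL sub-constructions.

```python
import re


def instantiate(schema, subconstructions):
    """Replace each #n placeholder with the n-th sub-construction (1-based)."""
    def substitute(match):
        return subconstructions[int(match.group(1)) - 1]
    return re.sub(r"#(\d+)", substitute, schema)


# np -> left_modif np, with the schema "[#1,#2]" from the rule above.
# The lemma strings are placeholders, not actual TIL constructions.
result = instantiate("[#1,#2]", ["pečený", "kuře"])
print(result)
```

A real schema processor would operate on construction objects rather than strings; string substitution is used here only to make the role of the placeholders visible.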
     121
     122'''TIL Types of Lexical Items''': the second language dependent file defines lexical
     123items and their TIL types. The types are hierarchically built from four simple
     124TIL types:
     125* o: representing the truth-values,
     126* ι: class of individuals,
     127* τ: class of time moments, and
     128* ω: class of possible worlds.
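
One possible way to model such hierarchically built types is a small recursive data structure. The sketch below (Python 3) is an assumption made for illustration: the document does not describe how AST stores types, and the names `Base`, `Func` and `intension` are hypothetical. A function type is written with the result type first, followed by the argument types. The example builds ((oι)τ)ω, the usual TIL type of a property of individuals.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    """One of the four simple types: o, ι, τ or ω."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Func:
    """A function type: the result type followed by the argument types."""
    result: "TilType"
    args: Tuple["TilType", ...]

    def __str__(self):
        return "(" + str(self.result) + "".join(str(a) for a in self.args) + ")"


TilType = Union[Base, Func]

O, IOTA, TAU, OMEGA = Base("o"), Base("ι"), Base("τ"), Base("ω")


def intension(t):
    """Make t depend on time moments and possible worlds: ((t τ) ω)."""
    return Func(Func(t, (TAU,)), (OMEGA,))


individual_set = Func(O, (IOTA,))                 # a class of individuals
individual_property = intension(individual_set)   # a property of individuals
print(individual_property)
```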
     129
     130AST contains rules for deriving implicit types based on the PoS tags of the input
     131words, so the lexicons need to prescribe the type only for cases that differ from
     132the implicit definition. A lexical item example for the verb "jíst" (eat) is:
     133
     134[[Image(jist.png, 300px)]]
     135
     136The exact format of the lexical item in the input file is as follows: the lemma
     137starts on a separate line. After the lemma there is a list of lines where an
     138(optional) PoS tag filter precedes the resulting object schema (here otriv, i.e.
     139o-trivialisation) and TIL type (here verbal object with one ι-argument).
     140
     141'''Verb Valencies''': the next language-dependent file defines verb
     142valencies together with the schema and type information for building the resulting construction from the corresponding valency frame. An example for the verb "jíst" (eat)
     143is as follows:
     144
     145{{{
     146jíst
     147hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]
     148}}}
     149
     150This record defines the valency of <somebody> eats <something>, given by the
     151brief valency frame hPTc4 of the object (an animate or inanimate noun phrase in
     152accusative), and the resulting construction of the verbal object (V(v)) derived as
     153an application of the verb (!#0) to its argument (the sentence object) with possible
     154extensification (try(!#1)) and the appropriate possible world variable (V(w)).
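
A record like this one can be split into its lemma, valency frame and colon-separated schema fields as sketched below (Python 3). The layout is inferred from the single example above, so this is an assumption rather than the format definition used by AST; `parse_valency_record` is a hypothetical name, and the meaning of the individual fields is not interpreted.

```python
def parse_valency_record(text):
    """Split a record into its lemma and a list of (frame, fields) entries.

    Assumes the first non-empty line is the lemma and every following line
    has the form '<frame> :<field>:<field>:...'.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    lemma, frame_lines = lines[0], lines[1:]
    frames = []
    for line in frame_lines:
        frame, schema = line.split(None, 1)
        # Drop the leading colon, then split the remaining fields on ':'.
        frames.append((frame, schema.lstrip(":").split(":")))
    return lemma, frames


record = """
jíst
hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]
"""

lemma, frames = parse_valency_record(record)
print(lemma)
for frame, fields in frames:
    print(frame, fields)
```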
     155
     156'''Prepositional Valency Expressions''': the last file that has to be specified for
     157each language is a list of semantic mappings of prepositional phrases to
     158valency expressions based on the head preposition. The file contains for each
     159combination of a preposition and a grammatical case of the included noun
     160phrase all possible valency slots corresponding to the prepositional phrase. For
     161instance, the record for the preposition "k" (to) is displayed as
     162
     163{{{
     164k
     1653 hA hH
     166}}}
     167
     168saying that "k" can introduce a prepositional phrase expressing a where-to direction hA
     169(e.g. "k lesu" – "to a forest"), or a modal how/what specification hH (e.g. "k večeři"
     170– "to a dinner").
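
Records of this kind map naturally onto a lookup table keyed by preposition and grammatical case. The following Python 3 sketch assumes, based only on the example above, that a line starting with a number gives the case and its valency slots, and that any other non-empty line names a preposition; `parse_prep_valencies` is a hypothetical name.

```python
def parse_prep_valencies(text):
    """Return {(preposition, case): [valency slots]} for the given records."""
    table = {}
    preposition = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].isdigit():
            # A case line: the grammatical case number followed by the slots.
            if preposition is None:
                raise ValueError("case line before any preposition")
            table[(preposition, int(fields[0]))] = fields[1:]
        else:
            # Any other non-empty line is taken to name a preposition.
            preposition = fields[0]
    return table


table = parse_prep_valencies("k\n3 hA hH\n")
print(table[("k", 3)])
```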
     171
     172= System Parts =
     173The AST system is implemented in the Python 2.7 programming language and
     174consists of seven main parts:
     175* the input parser: reads standard input, extracts tree structures and creates a tree object for each tree from the input,
     176* the grammar parser: reads the grammar file and assigns a grammar rule and the appropriate actions to each node inside the tree,
     177* the lexical item parser: reads the file with lexical item schemata and TIL types and assigns the type to each leaf in the tree structure,
     178* the schema parser: according to a logical construction schema coming with a semantic action, this module creates a construction from sub-constructions,
     179* the verb valency parser: picks the correct valency for the given sentence and triggers the schema parser on sub-constructions according to the schema coming with the valency,
     180* the prepositional valency expression parser: reads the possible valency expressions assigned to prepositional phrases used as (optional) valency slots in the actual sentence valency frame, and
     181* the sentence schema processor: if the sentence structure contains subordinate or coordinate clauses, the sentence schema processor is triggered. The
     182sentence schemata are classified by the conjunctions used between clauses.
     183
     184= Download =
     185You can download the AST tool [[attachment:ast_til.tar.xz|here]].
     186