
Version 7 (modified by xmedved1, 3 years ago)


Automatic Semantic Tool (AST)

Full semantic analysis of natural language (NL) texts is an open problem. The most comprehensive semantic systems build upon a mathematically sound formalism of a selected logical system. Mostly for reasons of computability and efficiency, current systems work with first-order logic (or a variant of it). However, a low-order logic is not appropriate for capturing higher-order phenomena that occur in natural language, such as belief attitudes, direct speech, or verb tenses. In our project, we have developed a new tool for automatic semantic analysis (AST) that emerged from (a module of) the Czech syntactic parser SYNT.

AST is now a standalone tool based on Transparent Intensional Logic (TIL). It works with the same input files (lexicons, semantic rules, ...) that were designed and developed for SYNT.

AST can provide a semantic analysis in the form of TIL constructions independently of the input syntactic parser and language.

Adaptation to a new language consists of specifying four lexicon files that describe lexical items, verb valencies, prepositional valencies, and a semantic grammar.

Input

To create the semantic structure of a sentence, AST needs the output of a previous syntactic analysis, usually in the form of a syntactic tree.

Textual form of syntactic tree:

<tree>
{##start## 
  {start 
    {ss 
      {clause 
        {VL<leaf><idx>0</idx><w>Jedl</w>
         <l>jíst</l><c>k5eAaIgMnS</c></leaf>} 
        {intr
          {adjp
            {ADJ<leaf><idx>1</idx><w>pečené</w>
             <l>pečený</l><c>k2eAgNnSc4</c></leaf>} 
          }
          {np 
            {N<leaf><idx>2</idx><w>kuře</w>
             <l>kuře</l><c>k1gNnSc4</c></leaf>} 
          }
        }
      }
    }
    {ends 
      {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> } 
    }     
  }
}
</tree> 

Corresponding graphical representation:

[Image tree.png: graphical representation of the syntactic tree]
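For processing, the bracketed textual format above can be read into a nested structure. The following sketch (the function and field names are our own, not AST's actual implementation) tokenizes the braces and collects the <idx>/<w>/<l>/<c> fields of each leaf:

```python
import re

def parse_tree(text):
    """Parse the bracketed SYNT tree format into nested node dicts.

    Each node is {'label': ..., 'children': [...]}; leaves additionally
    carry their morphological fields in 'morph'.
    """
    # Split into brace tokens and the non-empty text runs between them.
    tokens = [t for t in re.findall(r'[{}]|[^{}]+', text) if t.strip()]
    pos = [0]

    def parse_node():
        assert tokens[pos[0]] == '{'
        pos[0] += 1
        content = tokens[pos[0]].strip()
        pos[0] += 1
        node = {'label': content, 'children': []}
        if '<leaf>' in content:
            # A leaf: the label precedes <leaf>, the fields sit inside it.
            node['label'] = content.split('<leaf>', 1)[0]
            node['morph'] = dict(re.findall(r'<(idx|w|l|c)>(.*?)</\1>', content))
        while tokens[pos[0]] == '{':
            node['children'].append(parse_node())
        assert tokens[pos[0]] == '}'
        pos[0] += 1
        return node

    return parse_node()
```

Applied to the example above, this yields the ##start## root with the clause, intr, and ends subtrees nested beneath it.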

Besides the tree nodes and edges, the tree contains morphological information about each word: a lemma and a PoS tag, which are used by AST for deriving implicit out-of-vocabulary type information.

The current AST implementation is able to process input from the SYNT and SET parsers. The preceding syntactic tree example comes from the output of the SYNT parser.

An example of a SET tree in textual form for the Czech equivalent of the sentence "Tom wants to buy a new car but he will not buy it.":

id word:nterm lemma tag pid til schema
0 N:Tom Tom k1gMnSc1;ca14 p
1 V:chce chtít k5eAaImIp3nS 15
2 V:koupit koupit k5eAaPmF 16
3 ADJ:nové nový k2eAgNnSc4d1 17
4 N:auto auto k1gNnSc4 17
5 PUNCT:, , kIx 10
6 CONJ:ale ale k8xC 10
7 V:nekoupí koupit k5eNaPmIp3nS 13
8 PRON:je on k3xPp3gNnSc4 13
9 PUNCT:. . kIx. 10
10 <CLAUSE> k5eNaPmIp3nS 12 vrule_sch ( $$ $@ )
11 <CLAUSE> k5eAaImIp3nS 12 vrule_sch ( $$ $@ )
12 <SENTENCE> -1
13 <VP> koupit k5eNaPmIp3nS 10 vrule_sch_add ( $$ $@ "#1H (#2)" )
14 <VP> chtít k5eAaImIp3nS 11 vrule_sch_add ( $$ $@ "#2H (#1)" )
15 <VP> chtít k5eAaImIp3nS 14 vrule_sch_add ( $$ $@ "#1H (#2)" )
16 <VP> koupit k5eAaPmF 15 vrule_sch_add ( $$ $@ "#1H (#2)" )
17 <NP> auto k1gNnSc4 16 rule_sch ( $$ $@ "[#1,#2]" )

Visual representation of the SET structural tree:

[Image set_tree.png: visual representation of the SET structural tree]
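In the SET table, the pid column links each node to its parent, so the tree can be recovered by grouping ids under their pid (pid -1 marks the root). A minimal sketch, using the parent links of the non-terminal nodes as read off the table above:

```python
from collections import defaultdict

def children_map(edges):
    """Group node ids under their parent id (pid); pid -1 marks the root."""
    kids = defaultdict(list)
    for node_id, pid in edges:
        kids[pid].append(node_id)
    return dict(kids)

# (id, pid) parent links of the non-terminal nodes in the table above.
edges = [(10, 12), (11, 12), (12, -1), (13, 10), (14, 11),
         (15, 14), (16, 15), (17, 16)]
```

For instance, children_map(edges) puts both clauses (10 and 11) under the sentence node 12, matching the coordination in the example.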

Language Dependent Files

The core of the AST system is universal and can be used for the semantic analysis of any language. Besides the core, the system also uses input files that are language dependent and that need to be adapted for each new language.

The Semantic Grammar: the resulting semantic construction is built by a bottom-up analysis based on the input syntactic tree provided by the syntactic parser and on a semantic extension of the grammar used in the parsing process. To know which rule was used by the parser, AST needs the semantic grammar file. This file specifies the semantic actions that need to be performed before the constructions of particular nodes are propagated to the higher level in the syntactic tree. The semantic actions define which logical functions correspond to each syntactic rule. For instance, the <np> node in the graphical representation corresponds to the rule and action:

np -> left_modif np
rule_schema ( "[#1,#2]" )

which says that the resulting logical construction of the left-hand-side np is obtained as a (logical) application of the left_modif (sub)construction to the right-hand-side np (sub)construction. Building a construction from two subconstructions is shown in the following example:

[Image analysis.png: building a construction from two subconstructions]
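The schema actions can be viewed as template instantiation: each #N placeholder stands for the construction of the rule's N-th right-hand-side constituent. A minimal sketch (the function name is ours, not AST's):

```python
import re

def apply_schema(schema, subconstructions):
    """Instantiate a rule schema: #1, #2, ... are replaced by the
    constructions of the first, second, ... right-hand-side node."""
    return re.sub(r'#(\d+)',
                  lambda m: subconstructions[int(m.group(1)) - 1],
                  schema)
```

For the np rule above, instantiating "[#1,#2]" with the constructions of left_modif and np yields their combined application.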

TIL Types of Lexical Items: the second language dependent file defines lexical items and their TIL types. The types are hierarchically built from four simple TIL types:

  • o: the class of truth values,
  • ι: the class of individuals,
  • τ: the class of time moments, and
  • ω: the class of possible worlds.
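Composed types are functions built over these four bases. A minimal sketch of how the hierarchy can be modelled (the class names are ours; in TIL notation, a type (α β) is the class of functions from β to α):

```python
class Base:
    """One of the four simple TIL types."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Func:
    """A composed TIL type (alpha beta): functions from beta to alpha."""
    def __init__(self, result, arg):
        self.result, self.arg = result, arg
    def __repr__(self):
        return '({} {})'.format(self.result, self.arg)

o, iota, tau, omega = Base('o'), Base('ι'), Base('τ'), Base('ω')

# A proposition (a truth value in every world at every time) has the
# type ((o τ) ω), commonly abbreviated as o_τω.
proposition = Func(Func(o, tau), omega)
```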

AST contains rules for deriving implicit types based on the PoS tags of the input words, so the lexicons must prescribe the type only for cases that differ from the implicit definition. A lexical item example for the verb "jíst" (eat) is:

[Image jist.png: lexical item entry for the verb "jíst"]
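The derivation of implicit types can be pictured as a lookup keyed by the leading PoS category of the Brno-style tag (k1 = noun, k2 = adjective, k5 = verb, ...). The concrete default types below are illustrative placeholders only, not AST's actual built-in table:

```python
# Illustrative defaults only; AST's real implicit types may differ.
IMPLICIT_TYPES = {
    'k1': '((oι)τω)',            # noun: a property of individuals
    'k2': '(((oι)τω)((oι)τω))',  # adjective: a property modifier
    'k5': '((oι)τω)',            # (intransitive) verb: a property of individuals
}

def implicit_type(tag):
    """Pick a fall-back TIL type by the PoS category (first two tag chars).

    Returns None when no implicit type is known, i.e. when the lexicon
    must supply the type explicitly.
    """
    return IMPLICIT_TYPES.get(tag[:2])
```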

The exact format of a lexical item in the input file is as follows: the lemma starts on a separate line. After the lemma there is a list of lines where an (optional) PoS tag filter precedes the resulting object schema (here otriv, i.e. o-trivialisation) and the TIL type (here a verbal object with one ι-argument).

Verb Valencies: the next language dependent file defines verb valencies, together with the schema and type information for building the resulting construction from the corresponding valency frame. An example for the verb "jíst" (eat) is as follows:

jíst
hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]

This record defines the valency of <somebody> eats <something>, given by the brief valency frame hPTc4 of the object (an animate or inanimate noun phrase in accusative), and the resulting construction of the verbal object (V(v)) derived as an application of the verb (#0) to its argument (the sentence object) with possible extensification (try(#1)) and the appropriate possible world variable (V(w)).
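Assuming each entry follows the shape of the example (a bare lemma line opens the entry, then one line per valency frame holding the frame and its construction schema), the valency file can be read with a sketch like:

```python
def parse_valencies(text):
    """Read a valency lexicon into {lemma: [(frame, schema), ...]}.

    Assumes the layout of the example above: a line holding only the
    lemma opens a new entry; each following line holds a brief valency
    frame (e.g. hPTc4) and its construction schema.
    """
    lexicon, lemma = {}, None
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        if len(parts) == 1:               # a bare lemma line
            lemma = parts[0]
            lexicon.setdefault(lemma, [])
        else:                             # 'FRAME SCHEMA' line
            frame, schema = parts
            lexicon[lemma].append((frame, schema))
    return lexicon
```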

Prepositional Valency Expressions: the last file that has to be specified for each language is a list of semantic mappings from prepositional phrases to valency expressions based on the head preposition. For each combination of a preposition and a grammatical case of the governed noun phrase, the file lists all possible valency slots corresponding to the prepositional phrase. For instance, the record for the preposition "k" (to) is displayed as

k
3 hA hH

saying that "k" can introduce a prepositional phrase expressing a where-to direction hA (e.g. "k lesu" – "to a forest") or a modal how/what specification hH (e.g. "k večeři" – "for dinner").
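Under the same layout assumption (a bare preposition line followed by "CASE SLOT SLOT ..." lines, as in the "k" record), this file maps a (preposition, case) pair to its candidate valency slots:

```python
def parse_prep_valencies(text):
    """Read the prepositional valency file into {(preposition, case): slots}."""
    table, prep = {}, None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if len(parts) == 1:               # a preposition opens a new entry
            prep = parts[0]
        else:                             # 'CASE SLOT SLOT ...'
            table[(prep, int(parts[0]))] = parts[1:]
    return table
```

A lookup such as table[('k', 3)] then returns the slots a "k" + dative phrase may fill.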

System Parts

The AST system is implemented in the Python 2.7 programming language and consists of seven main parts:

  • the input parser: reads standard input, extracts tree structures and creates tree object for each tree from input,
  • the grammar parser: reads the grammar file and assigns a grammar rule and appropriate actions to each node inside the tree,
  • the lexical item parser: reads the file with lexical item schemata and TIL types and assigns the type to each leaf in the tree structure,
  • the schema parser: according to a logical construction schema coming with a semantic action, this module creates a construction from sub-constructions,
  • the verb valency parser: picks the correct valency for the given sentence and triggers the schema parser on sub-constructions according to the schema coming with the valency,
  • the prepositional valency expression parser: reads the possible valency expressions assigned to prepositional phrases used as (optional) valency slots in the actual sentence valency frame, and
  • the sentence schema processor: triggered if the sentence structure contains subordinate or coordinate clauses; the sentence schemata are classified by the conjunctions used between the clauses.

Download

You can download the AST tool here.