Version 3 (modified by xmedved1, 5 years ago)

Automatic Semantic Tool (AST)

Full semantic analysis of natural language (NL) texts is an open problem. The most comprehensive semantic systems build upon a mathematically sound formalism of a selected logical system. Mostly for reasons of computability and efficiency, current systems work with first-order logic (or a variant of it). However, first-order logic is not well suited to capturing higher-order phenomena that occur in natural language, such as belief attitudes, direct speech, or verb tenses. In our project, we are developing a new tool for automatic semantic analysis (AST) that emerged from (a module of) the Czech syntactic parser SYNT.

AST is now a standalone tool based on Transparent Intensional Logic (TIL). It works with the same input files (lexicons, semantic rules, ...) that were designed and developed in SYNT.

AST can provide a semantic analysis in the form of TIL constructions independently of the input syntactic parser and language.

Adapting AST to a new language consists of specifying four lexicon files that describe lexical items, verb valencies, prepositional valencies, and a semantic grammar.

Input

To create the semantic structure of a sentence, AST needs the output of a preceding syntactic analysis, usually in the form of a syntactic tree.

Textual form of a syntactic tree:

<tree>
{##start## 
  {start 
    {ss 
      {clause 
        {VL<leaf><idx>0</idx><w>Jedl</w>
         <l>jíst</l><c>k5eAaIgMnS</c></leaf>} 
        {intr
          {adjp
            {ADJ<leaf><idx>1</idx><w>pečené</w>
             <l>pečený</l><c>k2eAgNnSc4</c></leaf>} 
          }
          {np 
            {N<leaf><idx>2</idx><w>kuře</w>
             <l>kuře</l><c>k1gNnSc4</c></leaf>} 
          }
        }
      }
    }
    {ends 
      {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> } 
    }     
  }
}
</tree> 

Corresponding graphical representation:

[Figure: tree.png (image not attached)]

Besides the tree nodes and edges, the tree contains morphological information about each word: a lemma and a PoS tag, which AST uses to derive implicit type information for out-of-vocabulary words.
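The textual tree form shown above can be read with a small recursive-descent parser. The following Python sketch is not part of AST; it assumes the bracket and <leaf> layout is exactly as in the sample, and the names Node, parse_tree and leaves are hypothetical helpers introduced only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Assumed leaf layout, taken from the sample: <idx>, <w>, <l>, <c> in this order.
LEAF_RE = re.compile(
    r"<leaf>\s*<idx>(\d+)</idx>\s*<w>(.*?)</w>\s*"
    r"<l>(.*?)</l>\s*<c>(.*?)</c>\s*</leaf>",
    re.S,
)
# A node label runs until whitespace, a brace, or the start of a <leaf> element.
LABEL_RE = re.compile(r"[^\s{}<]+")


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    idx: Optional[int] = None      # word position (leaf nodes only)
    word: Optional[str] = None     # surface form
    lemma: Optional[str] = None    # base form
    tag: Optional[str] = None      # PoS tag


def parse_tree(text: str) -> Node:
    """Parse one <tree>...</tree> block into a Node hierarchy."""
    body = text.replace("<tree>", "").replace("</tree>", "")
    pos = 0

    def skip_ws() -> None:
        nonlocal pos
        while pos < len(body) and body[pos].isspace():
            pos += 1

    def parse_node() -> Node:
        nonlocal pos
        skip_ws()
        if pos >= len(body) or body[pos] != "{":
            raise ValueError(f"expected '{{' at offset {pos}")
        pos += 1
        label = LABEL_RE.match(body, pos)
        if label is None:
            raise ValueError(f"missing node label at offset {pos}")
        node = Node(label.group())
        pos = label.end()
        while True:
            skip_ws()
            if pos >= len(body):
                raise ValueError("unbalanced braces")
            if body[pos] == "}":          # end of this node
                pos += 1
                return node
            if body[pos] == "{":          # nested constituent
                node.children.append(parse_node())
                continue
            leaf = LEAF_RE.match(body, pos)
            if leaf is None:
                raise ValueError(f"unexpected content at offset {pos}")
            node.idx = int(leaf.group(1))
            node.word, node.lemma, node.tag = leaf.group(2, 3, 4)
            pos = leaf.end()

    return parse_node()


def leaves(node: Node) -> Iterator[Node]:
    """Yield the word-bearing nodes in left-to-right order."""
    if node.word is not None:
        yield node
    for child in node.children:
        yield from leaves(child)


SAMPLE = """<tree>
{##start##
  {start
    {ss
      {clause
        {VL<leaf><idx>0</idx><w>Jedl</w>
         <l>jíst</l><c>k5eAaIgMnS</c></leaf>}
        {intr
          {adjp
            {ADJ<leaf><idx>1</idx><w>pečené</w>
             <l>pečený</l><c>k2eAgNnSc4</c></leaf>}
          }
          {np
            {N<leaf><idx>2</idx><w>kuře</w>
             <l>kuře</l><c>k1gNnSc4</c></leaf>}
          }
        }
      }
    }
    {ends
      {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> }
    }
  }
}
</tree>"""

if __name__ == "__main__":
    for leaf in leaves(parse_tree(SAMPLE)):
        print(leaf.idx, leaf.word, leaf.lemma, leaf.tag)
```

Each leaf node then carries the lemma and PoS tag alongside its position in the tree, which is the information the paragraph above says AST relies on.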

Language Dependent Files

The core of the AST system is universal and can be used for semantic analysis of any language. Besides the core, the system uses language-dependent input files that need to be modified for each new language.

The Semantic Grammar: the resulting semantic construction is built bottom-up from the input syntactic tree provided by the syntactic parser and from a semantic extension of the grammar used in the parsing process. To know which rule the parser used, AST needs the semantic grammar file. This file specifies the semantic actions that must be performed before the construction of a particular node is propagated to the higher level of the syntactic tree. The semantic actions define which logical functions correspond to each syntactic rule. For instance, the <np> node in the graphical representation corresponds to the following rule and action:

np -> left_modif np
rule_schema ( "[#1,#2]" )

which says that the resulting logical construction of the left-hand side np is obtained as a (logical) application of the left_modif (sub)construction to the right-hand side np (sub)construction. Building a construction from two subconstructions is illustrated in the following example:

[Figure: analysis.png (image not attached)]
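One way to read the rule_schema action is as a template in which #1 and #2 stand for the constructions of the first and second right-hand-side constituents. The Python sketch below illustrates this reading with plain string substitution. It is an assumption made for illustration, not a description of how AST implements schemas; apply_schema is a hypothetical helper, and the subconstruction strings are placeholders rather than real TIL constructions.

```python
import re
from typing import Sequence


def apply_schema(schema: str, subconstructions: Sequence[str]) -> str:
    """Fill the #N placeholders of a schema with the N-th subconstruction (1-based)."""

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(subconstructions):
            raise IndexError(f"schema refers to #{index}, "
                             f"but only {len(subconstructions)} subconstructions given")
        return subconstructions[index - 1]

    return re.sub(r"#(\d+)", substitute, schema)


if __name__ == "__main__":
    # np -> left_modif np, rule_schema ( "[#1,#2]" )
    # LEFT_MODIF and NP are placeholder strings for the two subconstructions.
    print(apply_schema("[#1,#2]", ["LEFT_MODIF", "NP"]))
```

Because the analysis proceeds bottom-up, the result of one rule can serve as a subconstruction of the rule above it, so nested applications such as apply_schema("[#1,#2]", ["A", apply_schema("[#1,#2]", ["B", "C"])]) compose naturally.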

TIL Types of Lexical Items: the second language-dependent file defines lexical items and their TIL types. The types are built hierarchically from four simple TIL types:

  • o: representing the truth-values,
  • ι: class of individuals,
  • τ: class of time moments, and
  • ω: class of possible worlds.
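The hierarchical construction of types from these four base types can be sketched as a small data structure. The Python example below assumes the usual TIL notation, which this page does not spell out: (αβ) denotes functions from β to α, and an intension over α has the type ((ατ)ω). The class and function names (Base, Func, intension) are hypothetical and do not come from AST.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Base:
    """One of the four simple types."""
    symbol: str

    def __str__(self) -> str:
        return self.symbol


@dataclass(frozen=True)
class Func:
    """Functional type: a mapping from the argument types to the result type."""
    result: "TilType"
    args: Tuple["TilType", ...]

    def __str__(self) -> str:
        # Assumed notation: result type first, then argument types, in parentheses.
        return "(" + str(self.result) + "".join(str(a) for a in self.args) + ")"


TilType = Union[Base, Func]

O = Base("o")        # truth-values
IOTA = Base("ι")     # individuals
TAU = Base("τ")      # time moments
OMEGA = Base("ω")    # possible worlds


def intension(t: TilType) -> Func:
    """Type of a function from possible worlds to functions from times to t."""
    return Func(Func(t, (TAU,)), (OMEGA,))


if __name__ == "__main__":
    individual_class = Func(O, (IOTA,))             # a set of individuals
    individual_property = intension(individual_class)
    print(individual_class)
    print(individual_property)
```

Under these assumptions, a class of individuals prints as (oι) and a property of individuals as (((oι)τ)ω).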

AST contains rules for deriving implicit types from the PoS tags of the input words, so the lexicons need to prescribe a type only in cases that differ from the implicit definition. An example lexical item for the verb "jíst" (eat) is:

[Figure: jist.png (image not attached)]

The exact format of a lexical item in the input file is as follows: the lemma starts on a separate line. The lemma is followed by a list of lines in which an (optional) PoS tag filter precedes the resulting object schema (here otriv, i.e. o-trivialisation) and the TIL type (here a verbal object with one ι-argument).