Changes between Version 7 and Version 8 of ast


Timestamp:
Mar 15, 2019, 11:52:46 AM (3 years ago)
Author:
Ales Horak
Comment:

--

  • ast

    v7 v8  
    1 1 = Automatic Semantic Tool (AST) =
    2 2
    3 Full semantic analysis of natural language (NL) texts is an open
    4 problem. The most comprehensive semantic systems build upon a mathematically
    5 sound formalism of a selected logical system. Mostly for reasons of computability
    6 and efficiency, current systems work with first-order logic (or a variant of it).
    7 However, such a low-order logic is not appropriate for capturing higher-order
    8 phenomena that occur in natural language, such as belief attitudes, direct
    9 speech, or verb tenses. In our project, we develop a new tool for automatic semantic analysis
    10 (AST) that emerged from (a module of) the Czech syntactic parser [https://nlp.fi.muni.cz/trac/synt SYNT].
    11 
    12 
    13 AST is now a standalone tool based on Transparent Intensional Logic (TIL).
    14 It works with the same input files (lexicons, semantic rules, ...) that were designed and developed in SYNT.
    15 
    16 AST can provide a semantic analysis in the form of Transparent Intensional Logic (TIL) constructions independently of the input syntactic parser and language.
    17 
    18 Adapting AST to a new language consists in specifying four lexicon files that describe lexical items, verb valencies, prepositional valencies and a semantic grammar.
    19 
    20 
    21 == Input ==
    22 
    23 To create the semantic structure of a sentence, AST needs the output of
    24 a preceding syntactic analysis. This output usually takes the form of a syntactic tree.
    25 
    26 '''Textual form of syntactic tree:'''
    27 {{{
    28 <tree>
    29 {##start##
    30   {start
    31     {ss
    32       {clause
    33         {VL<leaf><idx>0</idx><w>Jedl</w>
    34          <l>jíst</l><c>k5eAaIgMnS</c></leaf>}
    35         {intr
    36           {adjp
    37             {ADJ<leaf><idx>1</idx><w>pečené</w>
    38              <l>pečený</l><c>k2eAgNnSc4</c></leaf>}
    39           }
    40           {np
    41             {N<leaf><idx>2</idx><w>kuře</w>
    42              <l>kuře</l><c>k1gNnSc4</c></leaf>}
    43           }
    44         }
    45       }
    46     }
    47     {ends
    48       {'.'<leaf><idx>3</idx><w>.</w><l>.</l><c>kX</c></leaf> }
    49     }     
    50   }
    51 }
    52 </tree>
    53 }}}
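The leaf records in the textual tree above can be read with a few lines of Python. This is a minimal sketch, not the AST input parser: the names `Leaf`, `LEAF_RE` and `extract_leaves` are hypothetical, and the sketch assumes every leaf carries the four elements `<idx>`, `<w>`, `<l>` and `<c>` in that order, as in the example.

```python
import re
from typing import List, NamedTuple


class Leaf(NamedTuple):
    """One terminal of the tree (hypothetical helper type)."""
    idx: int    # position of the word in the sentence
    word: str   # surface form
    lemma: str  # base form
    tag: str    # morphological (PoS) tag


# A leaf may be wrapped over two lines, so whitespace is allowed between elements.
LEAF_RE = re.compile(
    r"<leaf>\s*<idx>(\d+)</idx>\s*<w>(.*?)</w>\s*<l>(.*?)</l>\s*<c>(.*?)</c>\s*</leaf>",
    re.S,
)


def extract_leaves(tree_text: str) -> List[Leaf]:
    """Return all leaves in the order in which they occur in the text."""
    return [Leaf(int(i), w, l, c) for i, w, l, c in LEAF_RE.findall(tree_text)]


# Two leaves copied from the example tree.
SAMPLE = """{VL<leaf><idx>0</idx><w>Jedl</w>
 <l>jíst</l><c>k5eAaIgMnS</c></leaf>}
{N<leaf><idx>2</idx><w>kuře</w>
 <l>kuře</l><c>k1gNnSc4</c></leaf>}"""

leaves = extract_leaves(SAMPLE)
```

The sketch ignores the brace structure; recovering the nonterminal nodes would require a separate bracket parser.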
    54 
    55 '''Corresponding graphical representation:'''
    56 
    57 [[Image(tree.png, 700px)]]
    58 
    59 Besides the tree nodes and edges, the tree contains morphological information about each word: a lemma and a PoS tag, which are used by AST for
    60 deriving implicit out-of-vocabulary type information.
    61 
    62 The current AST implementation is able to process inputs from the [https://nlp.fi.muni.cz/trac/synt SYNT] and [https://nlp.fi.muni.cz/trac/set SET] parsers. The preceding example of a syntactic tree comes from the output of the SYNT parser.
    63 
    64 An example of a SET tree in textual form for the sentence "Tom wants to buy a new car but he will not buy it.":
    65 
    66 {{{
    67 id word:nterm lemma tag pid til schema
    68 0 N:Tom Tom k1gMnSc1;ca14 p
    69 1 V:chce chtít k5eAaImIp3nS 15
    70 2 V:koupit koupit k5eAaPmF 16
    71 3 ADJ:nové nový k2eAgNnSc4d1 17
    72 4 N:auto auto k1gNnSc4 17
    73 5 PUNCT:, , kIx 10
    74 6 CONJ:ale ale k8xC 10
    75 7 V:nekoupí koupit k5eNaPmIp3nS 13
    76 8 PRON:je on k3xPp3gNnSc4 13
    77 9 PUNCT:. . kIx. 10
    78 10 <CLAUSE> k5eNaPmIp3nS 12 vrule_sch ( $$ $@ )
    79 11 <CLAUSE> k5eAaImIp3nS 12 vrule_sch ( $$ $@ )
    80 12 <SENTENCE> -1
    81 13 <VP> koupit k5eNaPmIp3nS 10 vrule_sch_add ( $$ $@ "#1H (#2)" )
    82 14 <VP> chtít k5eAaImIp3nS 11 vrule_sch_add ( $$ $@ "#2H (#1)" )
    83 15 <VP> chtít k5eAaImIp3nS 14 vrule_sch_add ( $$ $@ "#1H (#2)" )
    84 16 <VP> koupit k5eAaPmF 15 vrule_sch_add ( $$ $@ "#1H (#2)" )
    85 17 <NP> auto k1gNnSc4 16 rule_sch ( $$ $@ "[#1,#2]" )
    86 }}}
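The shape of the SET tree can be recovered from the `id` and `pid` columns alone. The sketch below is an illustration under assumptions, not the AST reader: it assumes that `pid` is the id of the parent node, that `-1` marks the root, and that columns are separated by whitespace with empty columns simply missing (as in the listing above). The function names are hypothetical.

```python
import re
from collections import defaultdict

# A subset of the rows from the listing above.
ROWS = """\
2 V:koupit koupit k5eAaPmF 16
3 ADJ:nové nový k2eAgNnSc4d1 17
4 N:auto auto k1gNnSc4 17
12 <SENTENCE> -1
16 <VP> koupit k5eAaPmF 15 vrule_sch_add ( $$ $@ "#1H (#2)" )
17 <NP> auto k1gNnSc4 16 rule_sch ( $$ $@ "[#1,#2]" )
"""

INT_RE = re.compile(r"-?\d+")


def parse_row(line):
    """Return (id, label, pid); pid is None when no integer column is found."""
    tokens = line.split()
    node_id, label = int(tokens[0]), tokens[1]
    # Heuristic: the first purely numeric token after the label is the pid.
    # A numeric lemma would break this; a tab-aware reader would be safer.
    pid = next((int(t) for t in tokens[2:] if INT_RE.fullmatch(t)), None)
    return node_id, label, pid


def children_by_parent(text):
    """Map each parent id to the list of its child ids, in input order."""
    children = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        node_id, _label, pid = parse_row(line)
        if pid is not None and pid >= 0:
            children[pid].append(node_id)
    return dict(children)


children = children_by_parent(ROWS)
```

With these rows, node 17 (`<NP>`) receives the adjective and the noun as children, which matches the `"[#1,#2]"` schema attached to it.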
    87 
    88 Visual representation of the SET structural tree:
    89 
    90 [[Image(set_tree.png, 700px)]]
    91 
    92 
    93 
    94 == Language Dependent Files ==
    95 
    96 The core of the AST system is universal and can be used for semantic analysis of any
    97 language. Besides the core, the system also uses input files that are language
    98 dependent and that need to be modified for each new language.
    99 
    100 
    101 '''The Semantic Grammar''': the resulting semantic construction is built by
    102 a bottom-up analysis based on the input syntactic tree provided by the syntactic
    103 parser and on a semantic extension of the actual grammar used in the parsing
    104 process. To know which rule was used by the parser, AST needs the semantic
    105 grammar file. This file specifies the semantic actions that need
    106 to be performed before the construction of a particular node is propagated to the higher
    107 level in the syntactic tree. The semantic actions define which logical functions
    108 correspond to each particular syntactic rule. For instance, the <np> node in
    109 the graphical representation corresponds to the following rule and action:
    110 
    111 {{{
    112 np -> left_modif np
    113 rule_schema ( "[#1,#2]" )
    114 }}}
    115 
    116 which says that the resulting logical construction of the left-hand side np is
    117 obtained as a (logical) application of the left_modif (sub)construction to the
    118 right-hand side np (sub)construction. The following figure shows how a construction is built from two subconstructions:
    119 
    120 [[Image(analysis.png, 700px)]]
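The way a schema such as `"[#1,#2]"` combines subconstructions can be illustrated by plain placeholder substitution. This is a minimal sketch, not the AST schema parser: `apply_schema` is a hypothetical name, the subconstructions are represented as bare strings, and only plain `#n` placeholders are handled (markers such as the `H` in `"#1H (#2)"` are left untouched because their meaning is not described here).

```python
import re


def apply_schema(schema, subconstructions):
    """Replace each #n in the schema with the n-th subconstruction (1-based)."""
    def substitute(match):
        return subconstructions[int(match.group(1)) - 1]

    return re.sub(r"#(\d+)", substitute, schema)


# Placeholder strings stand in for the real TIL subconstructions
# of left_modif and np.
result = apply_schema("[#1,#2]", ["pečený", "kuře"])
```

Here `result` is the string `[pečený,kuře]`, i.e. the application of the modifier to the noun phrase written in the bracket notation of the schema.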
    121 
    122 '''TIL Types of Lexical Items''': the second language dependent file defines lexical
    123 items and their TIL types. The types are hierarchically built from four simple
    124 TIL types:
    125 * o: representing the truth-values,
    126 * ι: class of individuals,
    127 * τ: class of time moments, and
    128 * ω: class of possible worlds.
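To illustrate how composite types are built hierarchically from the four basic ones, here is a small sketch. The representation (`Func`, `render`) is hypothetical and not taken from AST, and the example type, a property of individuals written (((oι)τ)ω), is the textbook TIL type rather than an entry from the AST lexicon.

```python
from dataclasses import dataclass
from typing import Tuple, Union

BASIC_TYPES = {"o", "ι", "τ", "ω"}  # truth values, individuals, times, worlds


@dataclass(frozen=True)
class Func:
    """A function type: arguments of types `args` mapped to `result`."""
    result: "TilType"
    args: Tuple["TilType", ...]


TilType = Union[str, Func]


def render(t: TilType) -> str:
    """Write a type in the usual TIL notation: (result arg1 ... argn)."""
    if isinstance(t, str):
        if t not in BASIC_TYPES:
            raise ValueError(f"unknown basic type: {t!r}")
        return t
    return "(" + render(t.result) + "".join(render(a) for a in t.args) + ")"


# A class of individuals, and a property of individuals built on top of it.
individual_class = Func("o", ("ι",))
individual_property = Func(Func(individual_class, ("τ",)), ("ω",))
rendered = render(individual_property)
```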
    129 
    130 AST contains rules for deriving implicit types based on the PoS tags of the input
    131 words, so that the lexicons need to prescribe the type only for cases that differ from
    132 the implicit definition. A lexical item example for the verb "jíst" (eat) is:
    133 
    134 [[Image(jist.png, 300px)]]
    135 
    136 The exact format of the lexical item in the input file is as follows: the lemma
    137 starts on a separate line. After the lemma there is a list of lines where an
    138 (optional) PoS tag filter precedes the resulting object schema (here otriv, i.e.
    139 o-trivialisation) and TIL type (here verbal object with one ι-argument).
    140 
    141 '''Verb Valencies''': the next language dependent file defines verb
    142 valencies together with the schema and type information for building the resulting construction from the corresponding valency frame. An example for the verb "jíst" (eat)
    143 is as follows:
    144 
    145 {{{
    146 jíst
    147 hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]
    148 }}}
    149 
    150 This record defines the valency <somebody> eats <something>, given by the
    151 brief valency frame hPTc4 of the object (an animate or inanimate noun phrase in
    152 the accusative), and the resulting construction of the verbal object (V(v)) derived as
    153 an application of the verb (!#0) to its argument (the sentence object) with possible
    154 extensification (try(!#1)) and the appropriate possible world variable (V(w)).
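A record of this shape can be split into its parts as follows. This is only a sketch under assumptions: it assumes the lemma stands on its own line, that each following line holds one brief valency frame and its schema separated by whitespace, and that the construction template is the part after the last `=`. The function name is hypothetical, and the meaning of the individual colon-separated fields is not interpreted.

```python
def parse_valency_record(text):
    """Return (lemma, [(frame, schema), ...]) for one verb record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    lemma = lines[0]
    frames = []
    for line in lines[1:]:
        frame, schema = line.split(None, 1)  # split at the first whitespace
        frames.append((frame, schema))
    return lemma, frames


RECORD = """jíst
hPTc4 :exists:V(v):V(v):and:V(v)=[[#0,try(#1)],V(w)]
"""

lemma, frames = parse_valency_record(RECORD)
frame, schema = frames[0]
# Assumed: the construction template follows the last '=' sign.
template = schema.rsplit("=", 1)[1]
```

For the record above, `frame` is `hPTc4` and `template` is `[[#0,try(#1)],V(w)]`, the part in which `#0` stands for the verb and `#1` for its argument.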
    155 
    156 '''Prepositional Valency Expressions''': the last file that has to be specified for
    157 each language is a list of semantic mappings of prepositional phrases to
    158 valency expressions based on the head preposition. For each
    159 combination of a preposition and a grammatical case of the included noun
    160 phrase, the file lists all possible valency slots corresponding to the prepositional phrase. For
    161 instance, the record for the preposition "k" (to) is
    163 {{{
    164 k
    165 3 hA hH
    166 }}}
    167 
    168 saying that "k" with the dative (case 3) can introduce a prepositional phrase of a where-to direction hA
    169 (e.g. "k lesu" – "to a forest"), or a modal how/what specification hH (e.g. "k večeři"
    170 – "to a dinner").
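Such records map naturally onto a lookup table keyed by preposition and case. The sketch below assumes the layout of the example: a preposition on its own line, followed by lines that start with a case number and list the valency slots. `parse_prep_valencies` is a hypothetical name, not part of AST.

```python
def parse_prep_valencies(text):
    """Return {(preposition, case): [slot, ...]} for all records in the text."""
    table = {}
    preposition = None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0].isdigit():
            # A case line belongs to the most recently seen preposition.
            if preposition is None:
                raise ValueError("case line without a preceding preposition")
            table[(preposition, int(tokens[0]))] = tokens[1:]
        else:
            preposition = tokens[0]
    return table


table = parse_prep_valencies("k\n3 hA hH\n")
slots = table[("k", 3)]
```

Looking up `("k", 3)` then yields the two candidate slots `hA` and `hH` from the record above.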
    171 
    172 = System Parts =
    173 The AST system is implemented in the Python 2.7 programming language and
    174 consists of seven main parts:
    175 * the input parser: reads standard input, extracts tree structures and creates a tree object for each tree in the input,
    176 * the grammar parser: reads the grammar file and assigns a grammar rule and the appropriate actions to each node inside the tree,
    177 * the lexical item parser: reads the file with lexical item schemata and TIL types and assigns the type to each leaf in the tree structure,
    178 * the schema parser: following the logical construction schema that comes with a semantic action, this module creates a construction from sub-constructions,
    179 * the verb valency parser: picks the correct valency for the given sentence and triggers the schema parser on sub-constructions according to the schema that comes with the valency,
    180 * the prepositional valency expression parser: reads the possible valency expressions assigned to prepositional phrases used as (optional) valency slots in the actual sentence valency frame, and
    181 * the sentence schema processor: if the sentence structure contains subordinate or coordinate clauses, the sentence schema processor is triggered. The
    182 sentence schemata are classified by the conjunctions used between clauses.
    183 
    184 = Download =
    185 You can download the AST tool [[attachment:ast_til.tar.xz|here]].
    186 
      3 Moved to the [https://nlp.fi.muni.cz/trac/synt/wiki/Ast synt trac].