Changes between Version 2 and Version 3 of LexicalAnalysis


Timestamp:
Oct 11, 2013, 12:41:33 PM (11 years ago)
Author:
xmedved1
Comment:

--

  • LexicalAnalysis

    v2 v3  
    1616This new lexical analysis is based on re2c, a tool for writing very fast and flexible scanners. The input to re2c is a program written in a specific format [http://re2c.org/manual.html], which still contains C-like code. We therefore decided to transform a plain text file, containing only regular expressions and the actions to perform when each expression is matched, into a re2c-like file. For this task we created a Python script called "RE2re2c.py".
    1717
    18 The input to the script "RE2re2c.py" is a text file that contains regular expressions for the word, lemma and tag, followed by a list of actions. Strings in each part are written in quotes. These four parts are separated by a tab character ("\t").
     18The input to the script "RE2re2c.py" is a text file that contains regular expressions for the word, lemma and tag, followed by a list of actions (separated by ";"). Strings in each part are written in quotes. These four parts are separated by a tab character ("\t").
    1919
    2020{{{
     
    5858}}}
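
The input format described above (four tab-separated parts, with the actions separated by ";") can be read with a few lines of Python. This is only a minimal sketch and not the code of "RE2re2c.py": the function name `parse_rule_line` and the action names in the usage example are hypothetical, and the naive split on ";" assumes that no action contains a ";" of its own (for example inside a braced C block).

```python
def parse_rule_line(line):
    """Split one rule line into (word_re, lemma_re, tag_re, actions).

    Hypothetical helper: assumes exactly four tab-separated fields and
    that ";" only ever appears as the separator between actions.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError("expected 4 tab-separated fields, got %d" % len(fields))
    word_re, lemma_re, tag_re, action_field = fields
    # Drop empty entries so a trailing ";" does not produce an empty action.
    actions = [a.strip() for a in action_field.split(";") if a.strip()]
    return word_re, lemma_re, tag_re, actions


# Usage with made-up action names:
rule = parse_rule_line('"the"\t.*\t"k1".*\tACTION_A;ACTION_B')
```

A real parser would need to track braces or quotes before splitting the action list, since actions written as C code normally contain semicolons.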
    5959
     60In lexical rules and macros you can use the predefined variables word, lemma, morf_info, word index and lemma index. For example, to write a lexical rule for words that are not in the first position of a sentence:
     61
     62{{{
     63"k1".*  .*  {ONLY_FIRST_UPPER}    if (word_index>0){lemma->preterm=__SYNT_NTERM_NPR;lemma = word->duplicateLemma(LEMMAINDEX);}
     64}}}
     65
    6066The output of "RE2re2c.py" is a re2c-like file in which all strings from the text file are transformed into Unicode code point format. If a lexical rule contains ".*" in a regular expression, it is replaced by the special macro STRING, which is defined as follows: {{{STRING = [^\t\n\r\0]*}}} '''STRING is a reserved macro; users are not allowed to define a macro with the name STRING.'''
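
The two transformations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual code of "RE2re2c.py": the helper names are hypothetical, the `\uXXXX` escape form is only one plausible reading of "Unicode code point format", and escaped quotes inside strings are not handled.

```python
def to_code_points(text):
    """Encode every character as a code point escape.

    Assumed format: \\uXXXX for the Basic Multilingual Plane,
    \\UXXXXXXXX for anything above it.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        out.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp)
    return "".join(out)


def transform_regex(regex):
    """Rewrite quoted strings as code points and ".*" as the STRING macro."""
    out = []
    i = 0
    while i < len(regex):
        if regex[i] == '"':
            # Quoted string: find the closing quote and encode the contents.
            end = regex.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated quoted string")
            out.append('"' + to_code_points(regex[i + 1:end]) + '"')
            i = end + 1
        elif regex.startswith(".*", i):
            # ".*" outside a quoted string becomes the reserved macro.
            out.append(" STRING ")
            i += 2
        else:
            out.append(regex[i])
            i += 1
    return "".join(out).strip()
```

With these assumptions, a ".*" that appears inside a quoted string is treated as literal text and encoded, while a ".*" outside quotes is replaced by STRING.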
    6167