Changes between Version 2 and Version 3 of LexicalAnalysis
Timestamp: Oct 11, 2013, 12:41:33 PM
This new lexical analysis is based on re2c, a tool for writing very fast and very flexible scanners. The input to re2c is a program written in a very specific format [http://re2c.org/manual.html] that still contains C-like code. We therefore decided to transform a text file that contains only regular expressions, together with the actions to perform when each regular expression is matched, into a re2c-like file. For this task we created a Python script called "RE2re2c.py".

The input for the script "RE2re2c.py" is a text file that contains regular expressions for the word, the lemma and the tag, and a list of actions (separated by ";"). Strings in each part are written in quotes. The four parts are separated by a tab character ("\t").

{{{
…
}}}

In lexical rules and macros you can use the predefined variables word, lemma, morf_info, the word index (word_index) and the lemma index. For example, a lexical rule for words that are not in the first position of a sentence can be written as follows:

{{{
"k1".* .* {ONLY_FIRST_UPPER} if (word_index>0){lemma->preterm=__SYNT_NTERM_NPR;lemma = word->duplicateLemma(LEMMAINDEX);}
}}}

The output of "RE2re2c.py" is a re2c-like file in which all strings from the text file are transformed into Unicode code point format. If a regular expression in a lexical rule contains ".*", it is replaced by the special macro STRING, which is defined as follows: {{{STRING = [^\t\n\r\0]*}}} '''STRING is a reserved special macro; users are not allowed to define their own macro named STRING.'''
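The input format described above (word, lemma and tag regular expressions plus a list of actions, separated by tabs) can be illustrated with a small Python parser. This is a minimal sketch, not the actual code of "RE2re2c.py": the names LexRule, split_actions and parse_rule are hypothetical. It assumes that one rule occupies one line and that a ";" nested inside braces or double quotes belongs to a single C-like action (as in the word_index example) instead of separating two actions.

{{{#!python
from typing import List, NamedTuple


class LexRule(NamedTuple):
    """One rule of the input file (hypothetical representation)."""
    word: str
    lemma: str
    tag: str
    actions: List[str]


def split_actions(text: str) -> List[str]:
    """Split the action part on ';' that are outside braces and quotes.

    Assumption: a ';' nested inside {...} or "..." is part of one
    C-like action and does not start a new action.
    """
    actions, current = [], []
    depth, in_quotes = 0, False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            elif ch == ";" and depth == 0:
                # top-level separator: close the current action
                actions.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    actions.append("".join(current).strip())
    # drop empty actions produced by a trailing or doubled ';'
    return [a for a in actions if a]


def parse_rule(line: str) -> LexRule:
    """Parse one tab-separated rule: word, lemma, tag, actions."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        raise ValueError(f"expected 4 tab-separated parts, got {len(parts)}")
    word, lemma, tag, actions = parts
    return LexRule(word, lemma, tag, split_actions(actions))
}}}

Applied to the word_index example rule with its four parts separated by tabs, the whole if statement stays a single action because its inner ";" characters are nested inside braces.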
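The output transformation described above (quoted strings rewritten as Unicode code points, unquoted ".*" replaced by the STRING macro) can be sketched in Python as well. The exact code point format produced by "RE2re2c.py" is not specified here, so this hypothetical sketch assumes re2c's \uXXXX and \UXXXXXXXX escape sequences inside quoted strings, and it assumes quoted strings contain no escaped quotes. The names to_code_points and transform_regex are illustrative only.

{{{#!python
def to_code_points(text: str) -> str:
    r"""Encode each character as a re2c escape: \uXXXX or \UXXXXXXXX."""
    out = []
    for ch in text:
        cp = ord(ch)
        # BMP characters fit in 4 hex digits, others need 8
        out.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
    return "".join(out)


def transform_regex(regex: str) -> str:
    """Rewrite quoted strings as code points and unquoted '.*' as STRING."""
    out = []
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == '"':
            # assumption: no escaped quotes; raises ValueError if unterminated
            end = regex.index('"', i + 1)
            out.append('"' + to_code_points(regex[i + 1:end]) + '"')
            i = end + 1
        elif regex.startswith(".*", i):
            # '.*' outside a string becomes the reserved STRING macro
            out.append("STRING")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
}}}

For example, the regular expression {{{"k1".*}}} becomes {{{"\u006B\u0031"STRING}}}, while a ".*" inside quotes is treated as literal text and only encoded.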