Changes between Version 8 and Version 9 of LexicalAnalysis

Timestamp: Feb 12, 2014, 12:23:20 PM (11 years ago)
Author: xmedved1
Comment: --

LexicalAnalysis

-                      v8
+                      v9
          adv -> NUMBER '.' MONTH
 }}}
+[[BR]]
 We decided to rewrite this lexical analysis (originally written in C) in a format that anyone who knows regular expressions (mainly linguists) can understand. This change makes the system easy to adapt to another language and more flexible for future development.
 …
              "k4".*"xC".*      "dva"           .*           [0-9]+       preterm=__SYNT_NTERM_TWO;
 }}}
+[[BR]]
 The user can also put a macro in a regular expression. A macro in a lexical rule is substituted by its predefined regular expression. A macro is defined as "m=NAME RE", where NAME and RE are separated by a tab character ("\t"). In a lexical rule, the macro name is written in curly braces.
 …
                      .*      .*      {IS_NUMBER}   [0-9]+   preterm=NUM
 }}}
+[[BR]]
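The macro mechanism described above can be sketched as follows. This is only an illustration, not the actual code of "gen_lex.py": the function names `parse_macros` and `expand_macros` are hypothetical, and leaving unknown macro names untouched is an assumption (the real tool may report an error instead).

```python
import re


def parse_macros(lines):
    """Collect macro definitions of the form 'm=NAME<TAB>RE'."""
    macros = {}
    for line in lines:
        line = line.strip()
        if line.startswith("m="):
            # NAME and RE are separated by a tab character.
            name, _, regex = line[2:].partition("\t")
            macros[name] = regex.strip()
    return macros


def expand_macros(rule, macros):
    """Replace each {NAME} in a lexical rule with the macro's RE."""
    def substitute(match):
        # Assumption: unknown names are left as they are.
        return macros.get(match.group(1), match.group(0))

    # Only identifier-like names in braces are treated as macros,
    # so C code such as "{lemma->preterm=...}" is not touched.
    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, rule)


macros = parse_macros(["m=IS_NUMBER\t[0-9]+"])
rule = ".*\t.*\t{IS_NUMBER}\t[0-9]+\tpreterm=NUM"
expanded = expand_macros(rule, macros)
```

Here `expanded` holds the rule with `{IS_NUMBER}` replaced by `[0-9]+`; all other fields stay unchanged.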
 The user can write a comment for a defined macro or lexical rule. "gen_lex.py" recognizes two types of comments. The format of a comment is "#=comment".
 …
                           #=Top comment for macro specification
                           m=IS_NUMBER     [0-9]+
+}}}
+}}}
+[[BR]]
 In lexical rules and macros you can use the predefined variables word, lemma, morf_info, word index and lemma index. For example, to write a lexical rule for words that are not in the first position of a sentence:
 …
 {{{
 "k1".*  .*  {ONLY_FIRST_UPPER}   WI    if (word_index>0){lemma->preterm=__SYNT_NTERM_NPR;lemma = word->duplicateLemma(LEMMAINDEX);}
+}}}
+}}}
+[[BR]]
 The output of "gen_lex.py" is a re2c-like file, and all strings from the text file are transformed into Unicode code point format. If a lexical rule contains ".*" in an RE, then it is replaced by the special macro STRING. The STRING macro is specified as follows: {{{STRING = [^\t\n\r\0]*}}}. '''This is a special macro, and the user is not allowed to define a macro named STRING.'''
+The output of "gen_lex.py" is a re2c-like file, and all strings from the text file are transformed into Unicode code point format. In the tag regular expression part, every ".*" is replaced with the special macro '''STRING''' (specified as {{{STRING = [^\t\n\r\0]*}}}) or, if the tag part contains strings like "k1", by {{{[^\t\n\r\0k]*}}}. This modification makes the program faster when the re2c file is transformed into the final automaton written in C. '''The STRING macro must not be used in user input.'''
+In the tag regular expression part, ".*" is replaced with STRING or, if there are strings like "k1", by {{{[^\t\n\r\0k]*}}}. This modification makes the program faster when the re2c file is transformed into the final automaton written in C.
+[[BR]]
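A rough sketch of the ".*" rewriting in the tag part is given below. It is not taken from "gen_lex.py". The function name `rewrite_tag_wildcards` is hypothetical, and the rule for choosing the extra excluded characters (the first character of every quoted string in the tag part, so "k1" contributes "k") is an assumption inferred from the single example above. Emitting the bare name STRING for the macro reference is also an assumption.

```python
import re

# Characters always excluded, as in STRING = [^\t\n\r\0]*
# (kept as literal backslash escapes for the re2c output).
BASE_EXCLUDED = r"\t\n\r\0"


def rewrite_tag_wildcards(tag_re):
    """Replace every ".*" in the tag part of a lexical rule."""
    # Quoted strings such as "k1" or "xC" found in the tag part.
    literals = re.findall(r'"([^"]+)"', tag_re)
    # Assumed rule: exclude the first character of each quoted string,
    # keeping the order of appearance and dropping duplicates.
    extra = "".join(dict.fromkeys(lit[0] for lit in literals))
    if not extra:
        # No quoted strings: fall back to the STRING macro.
        return tag_re.replace(".*", "STRING")
    return tag_re.replace(".*", "[^" + BASE_EXCLUDED + extra + "]*")


plain = rewrite_tag_wildcards(".*")
narrowed = rewrite_tag_wildcards('"k1".*')
```

With these inputs, `plain` is the macro name STRING and `narrowed` is the "k1" string followed by the narrowed character class. The sketch does not handle quoted strings that themselves contain ".*" or characters that are special inside a character class.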
 The command to generate C code from the re2c file is: {{{re2c -isuF -o analyze_lex.c analyze_lex.re2c}}}. In the SYNT system, the Makefile does all of this work.
+The command to generate C code from the re2c file is: {{{re2c -isuF -o analyze_lex.c analyze_lex.re2c}}}.
 Usage of gen_lex.py: [[BR]]
+Usage of gen_lex.py: {{{gen_lex.py [-u] < INPUT > OUTPUT}}}[[BR]]
-{{{gen_lex.py [-u] < INPUT > OUTPUT}}}[[BR]]