Changes between Initial Version and Version 1 of en/WordLevelAnalysis


Timestamp: Jun 5, 2014, 10:45:34 AM
Author: xkocinc

= Word Level Analysis =

== Motivation ==

Many applications need a tool for “clustering” the word forms that appear in texts. For example, the following forms all correspond to the base form ''chladnička'' ("refrigerator"):
 * chladniček
 * chladničky
 * chladničkách
 * chladničce
 * ...

Usage:
 * Indexing, searching, keyword extraction, ...
 * Almost all other NLP tools

== Word Level Processing Data for Czech ==

For almost 12 million word forms (including colloquial forms), the data provide:
 * lemma (canonical form, dictionary form)
 * grammatical information: part of speech, number, case, etc.

The word form ''stroj'' ("machine") has 3 interpretations:
 * lemma ''stroj'', nominative
 * lemma ''stroj'', accusative
    * in both cases: noun, masculine inanimate, singular
 * lemma ''strojit''
    * verb, 2nd person, singular, imperative mood
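
The lookup described above can be sketched in Python as a mapping from a word form to its list of interpretations. This is only an illustrative sketch, not the actual tool or its data format: the names `Analysis`, `LEXICON`, `analyse`, `lemmas` and `cluster` are hypothetical, the lexicon holds only the examples from this page, and the tags are abbreviated.

```python
from collections import defaultdict
from typing import NamedTuple


class Analysis(NamedTuple):
    """One interpretation of a word form (hypothetical structure)."""
    lemma: str
    tags: tuple  # grammatical information as plain strings


# Toy lexicon with only the examples from this page; the real data cover
# almost 12 million word forms. Tags are abbreviated, and they are left
# empty for the "chladnička" forms because the page does not list them.
LEXICON = {
    "stroj": [
        Analysis("stroj", ("noun", "singular", "nominative")),
        Analysis("stroj", ("noun", "singular", "accusative")),
        Analysis("strojit", ("verb", "2nd person", "singular", "imperative")),
    ],
    "chladniček": [Analysis("chladnička", ())],
    "chladničky": [Analysis("chladnička", ())],
    "chladničkách": [Analysis("chladnička", ())],
    "chladničce": [Analysis("chladnička", ())],
}


def analyse(form):
    """Return every known interpretation of a word form (empty if unknown)."""
    return LEXICON.get(form, [])


def lemmas(form):
    """Return the set of distinct lemmas a word form may belong to."""
    return {a.lemma for a in analyse(form)}


def cluster(forms):
    """Group word forms under each lemma they can be analysed as."""
    groups = defaultdict(list)
    for form in forms:
        for lemma in sorted(lemmas(form)):
            groups[lemma].append(form)
    return dict(groups)


if __name__ == "__main__":
    for analysis in analyse("stroj"):
        print(analysis.lemma, ", ".join(analysis.tags))
    print(cluster(["chladniček", "chladničce", "stroj"]))
```

An ambiguous form such as ''stroj'' ends up in two clusters here (''stroj'' and ''strojit''), because nothing in a word-level lookup can decide between the noun and the verb reading; choosing one requires context.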