Word Level Analysis


Many applications need a tool that “clusters” the word forms appearing in texts under a common base form:

  • chladniček
  • chladničky
  • chladničkách
  • chladničce
  • ...

All of these forms <=> chladnička


Such a tool is needed for:

  • indexing, searching, keyword extraction, ...
  • almost all NLP tools
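
A minimal sketch of this kind of clustering, assuming a tiny hand-written form-to-lemma table. The names FORM_TO_LEMMA and cluster_by_lemma are hypothetical and chosen only for illustration; a real system would consult a full morphological dictionary instead.

```python
from collections import defaultdict

# Toy form-to-lemma table (an assumption for illustration only);
# it covers just the forms listed above.
FORM_TO_LEMMA = {
    "chladniček": "chladnička",
    "chladničky": "chladnička",
    "chladničkách": "chladnička",
    "chladničce": "chladnička",
}


def cluster_by_lemma(tokens):
    """Group tokens under their lemma; unknown tokens act as their own lemma."""
    clusters = defaultdict(list)
    for token in tokens:
        key = token.lower()
        lemma = FORM_TO_LEMMA.get(key, key)
        clusters[lemma].append(token)
    return dict(clusters)


if __name__ == "__main__":
    print(cluster_by_lemma(["chladničky", "chladničce", "pivo"]))
```

An index built on the lemma keys lets a query for any one form retrieve documents containing the other forms as well.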

Word Level Processing Data for Czech

For almost 12 million word forms (including colloquial forms), the data provide:

  • lemma (canonical form, dictionary form)
  • grammatical information: part of speech, number, case etc.

The word form stroj has 3 interpretations:

  • lemma stroj, nominative
    • noun, masculine inanimate, singular
  • lemma stroj, accusative
    • noun, masculine inanimate, singular
  • lemma strojit
    • verb, 2nd person, singular, imperative mood
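
The ambiguity above can be sketched as a lookup that returns every analysis of a word form rather than a single answer. The Analysis record, the ANALYSES table and the helper functions are hypothetical names used only for illustration; the feature labels are plain English, not the tag set of any particular tool.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    pos: str
    features: dict


# Toy dictionary with a single entry (an assumption for illustration).
# stroj is an inanimate masculine noun, so its nominative and
# accusative singular forms are identical.
ANALYSES = {
    "stroj": [
        Analysis("stroj", "noun", {"gender": "masculine inanimate",
                                   "number": "singular",
                                   "case": "nominative"}),
        Analysis("stroj", "noun", {"gender": "masculine inanimate",
                                   "number": "singular",
                                   "case": "accusative"}),
        Analysis("strojit", "verb", {"person": "2",
                                     "number": "singular",
                                     "mood": "imperative"}),
    ],
}


def analyze(form):
    """Return all known analyses of a word form (empty list if unknown)."""
    return ANALYSES.get(form, [])


def lemmas(form):
    """Return the distinct lemmas of a word form, sorted alphabetically."""
    return sorted({a.lemma for a in analyze(form)})


if __name__ == "__main__":
    for analysis in analyze("stroj"):
        print(analysis.lemma, analysis.pos, analysis.features)
```

Returning a list keeps the ambiguity explicit, so a later disambiguation step can choose among the candidates using context.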

