= Word Level Analysis =
== Motivation ==
Many applications need a tool for “clustering” the word forms that appear in texts, i.e. grouping inflected forms under a common base form:

 * chladniček
 * chladničky
 * chladničkách     <=>   chladnička
 * chladničce
 * ...

Usage:

 * Indexing, searching, keyword extraction, ...
 * Almost all other NLP tools

== Word Level Processing Data for Czech ==
The data cover almost 12 million word forms (including colloquial forms). For each word form they provide:

 * lemma (canonical form, dictionary form)
 * grammatical information: part of speech, number, case, etc.

The word form ''stroj'' has 3 interpretations:

 * lemma ''stroj'', nominative
   * noun, masculine inanimate, singular
 * lemma ''stroj'', accusative
   * noun, masculine inanimate, singular
 * lemma ''strojit''
   * verb, 2nd person, singular, imperative mood
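
The mapping from a word form to its interpretations can be pictured as a lookup table. The following is only a minimal sketch with toy data: the names `Analysis`, `LEXICON` and `analyze` are hypothetical and do not describe the actual data format or API of any analyzer.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    """One interpretation of a word form (hypothetical record layout)."""
    lemma: str
    pos: str        # part of speech
    features: dict  # remaining grammatical information


# Toy lexicon holding only the example from the text above.
LEXICON = {
    "stroj": [
        Analysis("stroj", "noun", {"number": "singular", "case": "nominative"}),
        Analysis("stroj", "noun", {"number": "singular", "case": "accusative"}),
        Analysis("strojit", "verb",
                 {"person": 2, "number": "singular", "mood": "imperative"}),
    ],
}


def analyze(word_form: str) -> list:
    """Return all known interpretations of a word form (empty list if unknown)."""
    return LEXICON.get(word_form, [])


for analysis in analyze("stroj"):
    print(analysis.lemma, analysis.pos, analysis.features)
```

A real resource with millions of word forms would need a far more compact representation than a plain dictionary; the sketch only illustrates that one word form can map to several lemma and tag combinations.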


== Possible Applications ==
Various types of analyses:
 * word form => lemma (many types of searching/indexing)
   * nebral => brát/nebrat (úplatky)
   * nejstaršího => nejstarší/starý (člověk)
   * chladnička => chladničky (as a class)
   * bavlna => bavlněný (word derivation)
 * word form/lemma + grammatical information => word form
   * e.g. salutation generation: pane Procházko
 * word form/lemma => all word forms
 * word form => lemma + full/partial grammatical information
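
The directions of analysis listed above can be sketched over a small table of (word form, lemma, case, number) entries. This is an illustrative sketch only: the table contains a few hand-picked forms, and the function names `lemmatize`, `generate` and `all_forms` are assumptions, not the interface of a real tool.

```python
from collections import defaultdict

# Toy data: (word form, lemma, case, number). Only a few forms are listed.
ENTRIES = [
    ("chladnička", "chladnička", "nominative", "singular"),
    ("chladničky", "chladnička", "genitive", "singular"),
    ("chladničce", "chladnička", "dative", "singular"),
    ("chladničky", "chladnička", "nominative", "plural"),
    ("chladniček", "chladnička", "genitive", "plural"),
    ("chladničkách", "chladnička", "locative", "plural"),
    ("Procházka", "Procházka", "nominative", "singular"),
    ("Procházko", "Procházka", "vocative", "singular"),
]

form_to_analyses = defaultdict(list)  # word form -> [(lemma, case, number)]
lemma_to_forms = defaultdict(list)    # lemma -> distinct word forms
generate_index = {}                   # (lemma, case, number) -> word form

for form, lemma, case, number in ENTRIES:
    form_to_analyses[form].append((lemma, case, number))
    if form not in lemma_to_forms[lemma]:
        lemma_to_forms[lemma].append(form)
    generate_index[(lemma, case, number)] = form


def lemmatize(form):
    """word form => sorted list of possible lemmas"""
    return sorted({lemma for lemma, _, _ in form_to_analyses.get(form, [])})


def generate(lemma, case, number):
    """lemma + grammatical information => word form (None if unknown)"""
    return generate_index.get((lemma, case, number))


def all_forms(lemma):
    """lemma => all word forms listed for it, in table order"""
    return list(lemma_to_forms.get(lemma, []))


print(lemmatize("chladničkách"))                    # ['chladnička']
print(generate("Procházka", "vocative", "singular"))  # Procházko
print(all_forms("chladnička"))
```

The `generate` call mirrors the salutation example: the vocative form ''Procházko'' is looked up from the lemma and the requested case and number.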

The analysis is very fast: approximately 1 million word forms per second.