= Word Level Analysis =

== Motivation ==

Many applications need a tool for "clustering" the word forms that appear in texts, that is, grouping all inflected forms of a word under one common form. For example, the forms
* ''chladniček''
* ''chladničky''
* ''chladničkách''
* ''chladničce''
* ...
all map to ''chladnička''.

Uses:
* indexing, searching, keyword extraction, ...
* almost all other NLP tools

== Word Level Processing Data for Czech ==

For almost 12 million word forms (including colloquial forms), the data provide:
* the lemma (canonical form, dictionary form)
* grammatical information: part of speech, number, case, etc.

The word form ''stroj'' has 3 interpretations:
* lemma ''stroj'': noun, masculine inanimate, singular, nominative
* lemma ''stroj'': noun, masculine inanimate, singular, accusative
* lemma ''strojit'': verb, 2nd person, singular, imperative mood
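The word-level data can be pictured as a mapping from each word form to its list of interpretations (lemma plus grammatical information), and the "clustering" from the motivation then amounts to grouping forms by lemma. The following is a minimal Python sketch under stated assumptions: the names <code>LEXICON</code>, <code>Interpretation</code>, <code>analyze</code>, <code>lemmas</code> and <code>cluster_by_lemma</code> are hypothetical, the tags are free-text descriptions rather than a real Czech tagset, and the toy lexicon lists only a few illustrative (not exhaustive) readings instead of the millions of forms a real resource covers.

```python
from collections import defaultdict
from typing import NamedTuple


class Interpretation(NamedTuple):
    """One reading of a word form: its lemma plus grammatical information."""
    lemma: str
    tag: str  # hypothetical free-text tag, not a real Czech tagset


# Hypothetical toy lexicon: word form -> all its interpretations.
# Readings are illustrative only and not exhaustive.
LEXICON: dict[str, list[Interpretation]] = {
    "chladnička": [Interpretation("chladnička", "noun, feminine, singular, nominative")],
    "chladničky": [
        Interpretation("chladnička", "noun, feminine, singular, genitive"),
        Interpretation("chladnička", "noun, feminine, plural, nominative"),
    ],
    "chladničce": [
        Interpretation("chladnička", "noun, feminine, singular, dative"),
        Interpretation("chladnička", "noun, feminine, singular, locative"),
    ],
    "chladniček": [Interpretation("chladnička", "noun, feminine, plural, genitive")],
    "chladničkách": [Interpretation("chladnička", "noun, feminine, plural, locative")],
    "stroj": [
        Interpretation("stroj", "noun, masculine inanimate, singular, nominative"),
        Interpretation("stroj", "noun, masculine inanimate, singular, accusative"),
        Interpretation("strojit", "verb, 2nd person, singular, imperative"),
    ],
}


def analyze(form: str) -> list[Interpretation]:
    """Return all interpretations of a word form (empty list if unknown)."""
    return LEXICON.get(form.lower(), [])


def lemmas(form: str) -> set[str]:
    """Return the set of possible lemmas of a word form."""
    return {reading.lemma for reading in analyze(form)}


def cluster_by_lemma(forms: list[str]) -> dict[str, list[str]]:
    """Group word forms under every lemma they may belong to.

    An ambiguous form (e.g. "stroj") is placed in each candidate cluster;
    unknown forms are skipped.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for form in forms:
        for lemma in sorted(lemmas(form)):
            if form not in clusters[lemma]:
                clusters[lemma].append(form)
    return dict(clusters)


if __name__ == "__main__":
    text = ["chladniček", "chladničky", "stroj", "chladničkách", "chladničce"]
    for lemma, group in cluster_by_lemma(text).items():
        print(lemma, "<=", group)
    for reading in analyze("stroj"):
        print(reading.lemma, "|", reading.tag)
```

Because an ambiguous form such as ''stroj'' is assigned to both the ''stroj'' and the ''strojit'' cluster, a real application would usually need disambiguation (for example, a tagger choosing one reading in context) before grouping.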