= Word Level Analysis =

== Motivation ==

Many applications need a tool for “clustering” the word forms that appear in texts, i.e. mapping every inflected form to its lemma. For example, the following forms all belong to the lemma ''chladnička'' (“refrigerator”):
* chladniček
* chladničky
* chladničkách
* chladničce
* ...

Usage:
* Indexing, searching, keyword extraction, ...
* Almost all other NLP tools

== Word Level Processing Data for Czech ==

For almost 12 M word forms (incl. colloquial forms), the data provide:
* lemma (canonical form, dictionary form)
* grammatical information: part of speech, number, case, etc.

The word form ''stroj'' has 3 interpretations:
* lemma ''stroj'': noun, masculine inanimate, singular, nominative
* lemma ''stroj'': noun, masculine inanimate, singular, accusative
* lemma ''strojit'': verb, 2nd person, singular, imperative mood
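
The form-to-interpretation mapping described above can be pictured as a lookup table from a word form to a list of (lemma, grammatical information) pairs. The following is only a minimal sketch under stated assumptions: the names `Analysis`, `ANALYSES`, `analyze`, `lemmas` and `cluster_by_lemma` are hypothetical and do not belong to any particular tool, the table holds just the examples from this page, and the tags for the ''chladnička'' forms are deliberately reduced to the part of speech.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Analysis(NamedTuple):
    """One interpretation of a word form (hypothetical structure)."""
    lemma: str
    tags: Tuple[str, ...]


# Toy data only: a real resource would cover millions of word forms.
ANALYSES: Dict[str, List[Analysis]] = {
    "stroj": [
        Analysis("stroj", ("noun", "masculine inanimate", "singular", "nominative")),
        Analysis("stroj", ("noun", "masculine inanimate", "singular", "accusative")),
        Analysis("strojit", ("verb", "2nd person", "singular", "imperative")),
    ],
    # Tags intentionally incomplete (part of speech only) for these forms.
    "chladniček": [Analysis("chladnička", ("noun",))],
    "chladničky": [Analysis("chladnička", ("noun",))],
    "chladničkách": [Analysis("chladnička", ("noun",))],
    "chladničce": [Analysis("chladnička", ("noun",))],
}


def analyze(form: str) -> List[Analysis]:
    """Return every known interpretation of a word form (empty if unknown)."""
    return ANALYSES.get(form, [])


def lemmas(form: str) -> List[str]:
    """Return the distinct lemmas of a word form, sorted alphabetically."""
    return sorted({analysis.lemma for analysis in analyze(form)})


def cluster_by_lemma(forms: List[str]) -> Dict[str, List[str]]:
    """Group word forms under each lemma they may belong to."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for form in forms:
        for lemma in lemmas(form):
            clusters[lemma].append(form)
    return dict(clusters)


if __name__ == "__main__":
    for analysis in analyze("stroj"):
        print(analysis.lemma, ", ".join(analysis.tags))
    print(cluster_by_lemma(["chladniček", "chladničky", "stroj"]))
```

Note that an ambiguous form such as ''stroj'' ends up in more than one cluster; choosing the right interpretation in context is a separate disambiguation step that this sketch does not attempt.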