Word Level Analysis
Motivation
Many applications need a tool for “clustering” of word forms appearing in texts:
- chladniček
- chladničky
- chladničkách
- chladničce
- ...
All of these forms map to the single base form chladnička.
Usage:
- Indexing, searching, keyword extraction, ...
- Almost all other NLP tools
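As a minimal sketch of why this matters for indexing and searching, the snippet below builds an index keyed by base form, so a query in one inflected form finds documents containing other forms. The `FORM_TO_LEMMA` table, the sample documents, and all function names are hypothetical and written by hand for illustration; a real system would look forms up in a full morphological dictionary.

```python
from collections import defaultdict

# Hypothetical toy dictionary: word form -> base form (lemma).
FORM_TO_LEMMA = {
    "chladnička": "chladnička",
    "chladniček": "chladnička",
    "chladničky": "chladnička",
    "chladničkách": "chladnička",
    "chladničce": "chladnička",
}


def lemmatize(form):
    """Return the lemma of a known form, otherwise the lowercased form itself."""
    form = form.lower()
    return FORM_TO_LEMMA.get(form, form)


def build_index(documents):
    """Map each lemma to the sorted ids of documents containing any of its forms."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            # Strip trailing/leading punctuation before the dictionary lookup.
            index[lemmatize(token.strip(".,;:!?"))].add(doc_id)
    return {lemma: sorted(ids) for lemma, ids in index.items()}


def search(index, query):
    """Look the query up by its lemma, so any inflected form can be used."""
    return index.get(lemmatize(query), [])


# Made-up sample documents.
docs = {
    1: "V chladničce není nic.",
    2: "Prodej chladniček a mrazáků.",
    3: "Nový stroj.",
}
index = build_index(docs)
print(search(index, "chladničky"))  # → [1, 2]
```

The query form chladničky appears in neither document, yet both are found because the index stores the shared base form rather than the surface forms.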
Word Level Processing Data for Czech
The data cover almost 12 million word forms (including colloquial forms). For each word form they give:
- lemma (canonical form, dictionary form)
- grammatical information: part of speech, number, case etc.
The word form stroj has 3 interpretations:
- lemma stroj, nominative (noun, masculine inanimate, singular)
- lemma stroj, accusative (noun, masculine inanimate, singular)
- lemma strojit (verb, 2nd person, singular, imperative mood)
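One way to picture such an ambiguous entry is as a lookup that returns a list of analyses rather than a single answer. The sketch below is an assumption-laden illustration: the `Analysis` record, the `ANALYSES` table, and the function names are hypothetical and do not reflect the actual data format. The gender value is written as masculine inanimate, since stroj declines as an inanimate noun (its nominative and accusative singular coincide).

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    """One interpretation of a word form (hypothetical record layout)."""
    lemma: str
    pos: str
    features: tuple


# Hand-written entries mirroring the stroj example; not real dictionary data.
ANALYSES = {
    "stroj": [
        Analysis("stroj", "noun", ("masculine inanimate", "singular", "nominative")),
        Analysis("stroj", "noun", ("masculine inanimate", "singular", "accusative")),
        Analysis("strojit", "verb", ("2nd person", "singular", "imperative")),
    ],
}


def analyze(form):
    """Return every known interpretation of a form (empty list if unknown)."""
    return ANALYSES.get(form.lower(), [])


def lemmas(form):
    """Return the distinct lemmas of a form, in order of first appearance."""
    return list(dict.fromkeys(a.lemma for a in analyze(form)))


for a in analyze("stroj"):
    print(a.lemma, a.pos, ", ".join(a.features))
print(lemmas("stroj"))  # → ['stroj', 'strojit']
```

Returning all interpretations leaves the choice between them to a later step that can use the surrounding context.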