Language Resources
For NLP we need tools, resources, and algorithms.
Types of language resources
- synonym dictionary - fuzzy searching
  - over 23,000 entries, with over 56,000 synonyms
  - Czech WordNet - 85,592 words organized in 40,919 synonym sets, plus grouping into domains/categories
  - thesaurus in Sketch Engine
- translation dictionary - multilingual searching
  - Czech-English dictionary - 54,000 entries
  - interconnected wordnets (EuroWordNet, BalkaNet) - Czech, English, Dutch, Italian, Spanish, French, Greek, Polish, Romanian, Turkish (at least 8,500 common synonym sets)
- vulgar words dictionary - detection of inappropriate behavior in discussions
  - current language (April 2013), 600 manually edited words/collocations, with rules to detect masking
- other: dictionary of toponyms? ancient surnames, genealogy? gestures, artworks...?
  - multimedia content in explanatory dictionaries (artworks, videos, recordings) for text enhancement
  - sign language dictionary with gesture videos
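Two of the uses listed above, query expansion for fuzzy/multilingual searching and detection of masked vulgar words, can be sketched as follows. This is only a minimal illustration: the dictionary entries, the substitution table, and the function names (`expand_query`, `normalize_masked`, `is_inappropriate`) are assumptions made up for the example and do not come from the actual resources or their rule sets.

```python
import re

# Toy synonym dictionary (hypothetical entries, not from the real resource).
SYNONYMS = {
    "handsome": {"good-looking", "attractive"},
    "car": {"automobile"},
}

# Toy Czech-English translation dictionary (hypothetical entries).
CS_EN = {
    "svět": {"world"},
    "kočka": {"cat"},
}


def expand_query(term, *dictionaries):
    """Return the lowercased term plus every synonym/translation found for it."""
    key = term.lower()
    expanded = {key}
    for dictionary in dictionaries:
        expanded |= dictionary.get(key, set())
    return expanded


# Character substitutions often used to mask a word (assumed rule set).
SUBSTITUTIONS = str.maketrans(
    {"@": "a", "4": "a", "3": "e", "1": "i", "!": "i", "0": "o", "$": "s"}
)


def normalize_masked(token):
    """Undo simple masking: substitutions, separators, repeated letters."""
    text = token.lower().translate(SUBSTITUTIONS)
    text = re.sub(r"[\W\d_]+", "", text)    # drop separators, digits, underscores
    text = re.sub(r"(.)\1+", r"\1", text)   # collapse runs of a repeated letter
    return text


# Placeholder blocklist; a real one would hold the manually edited entries.
BLOCKLIST = {"darn"}


def is_inappropriate(token, blocklist=BLOCKLIST):
    """Check a token against the blocklist after normalizing both sides."""
    normalized_blocklist = {normalize_masked(word) for word in blocklist}
    return normalize_masked(token) in normalized_blocklist


if __name__ == "__main__":
    print(expand_query("Handsome", SYNONYMS))
    print(expand_query("svět", SYNONYMS, CS_EN))
    print(is_inappropriate("D.a.a.r.n"), is_inappropriate("garden"))
```

A search front end could run `expand_query` on each query term before hitting the index, and a discussion filter could call `is_inappropriate` on each token of a post. Normalizing the blocklist with the same function keeps entries with doubled letters comparable after the collapse step.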
Tools for language resources processing
creating, editing, importing, connecting with other resources, visualizing
-> the DEB platform