Version 7 (modified by xkocinc, 7 years ago)

Language Resources

For NLP we need:

[image attachment; file name not preserved]

Types of language resources

  • synonym dictionary - fuzzy searching
    • over 23000 entries, with over 56000 synonyms
    • Czech WordNet - 85592 words organized in 40919 synonym sets, plus grouping into domains/categories
    • thesaurus in Sketch Engine
  • translation dictionary - multilingual searching
    • Czech-English dictionary - 54000 entries
    • interconnected wordnets (EuroWordNet, BalkaNet) - Czech, English, Dutch, Italian, Spanish, French, Greek, Polish, Romanian, Turkish (at least 8500 shared synonym sets)
  • vulgar words dictionary - detection of inappropriate behavior in discussions
    • current usage (as of April 2013): 600 manually edited words and collocations, with rules to detect masking
  • other: dictionary of toponyms? ancient surnames, genealogy? gestures, artworks...?
    • multimedia content in explanatory dictionaries (artworks, videos, recordings) for text enhancement
    • sign language dictionary with gesture videos
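
Two of the uses listed above, fuzzy searching via a synonym dictionary and detecting masked words from a blocklist, can be sketched roughly as follows. This is only an illustration under stated assumptions: the word lists, the substitution table and the function names (expand_query, normalize, find_blocked) are invented here and do not come from the actual dictionaries or their masking rules.

```python
import re

# Toy data invented for illustration; not taken from the real dictionaries.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
}
BLOCKLIST = {"darn"}

# Hypothetical masking rules: map common character substitutions back to letters.
UNMASK = str.maketrans({"@": "a", "4": "a", "0": "o", "1": "i", "3": "e", "$": "s"})


def expand_query(terms):
    """Return the query terms plus their synonyms (fuzzy search by expansion)."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= SYNONYMS.get(key, set())
    return expanded


def normalize(token):
    """Undo simple masking: substitutions, separators, repeated letters.

    ASCII-only simplification; real Czech data would need diacritics handled.
    """
    token = token.lower().translate(UNMASK)
    token = re.sub(r"[^a-z]", "", token)      # drop separators such as '.', '*', '-'
    return re.sub(r"(.)\1+", r"\1", token)    # collapse repeated letters


# Normalize the blocklist the same way, so entries with double letters still match.
BLOCKED = {normalize(word) for word in BLOCKLIST}


def find_blocked(text):
    """Return the tokens of `text` that normalize to a blocklisted word."""
    return [tok for tok in text.split() if normalize(tok) in BLOCKED]
```

For example, expand_query(["Car"]) adds "automobile" and "auto" to the search terms, and find_blocked("well d@rn it") reports the token "d@rn". Collapsing repeated letters trades some false positives for robustness against stretched spellings; a production rule set would likely be more selective.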

Tools for language resources processing

creating, editing, importing, connecting with other resources, visualizing

-> the DEB platform
