Changes between Version 9 and Version 10 of en/LanguageResources


Ignore:
Timestamp:
Jun 5, 2014, 12:55:55 PM (10 years ago)
Author:
xkocinc
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • en/LanguageResources

    v9 v10  
    2323
    2424
    25 == WordNets ==
     25== Types of language resources ==
     26
     27 * synonym dictionary - fuzzy searching
     28   * over 23000 entries, with over 56000 synonyms
     29   * Czech Wordnet - 85592 words organized in 40919 synonym sets, plus grouping to domains/categories
     30   * thesaurus in Sketch Engine
     31
     32 * translation dictionary - multilingual searching
     33   * Czech-English dictionary - 54000 entries
     34   * interconnected wordnets (!EuroWordnet, Balkanet) - Czech, English, Dutch, Italian, Spanish, French, Greek, Polish, Romanian, Turkish (at least 8500 common synonimical sets)
     35
     36 * vulgar words dictionary - detection of inappropriate behavior in discussions
     37   * current language (April 2013), 600 manually edited words/collocations, with rules to detect masking
     38
     39 * other: dictionary of toponyms? ancient surnames, genealogy? gestures, artworks...?
     40   * multimedial content in explanatory dictionaries (artworks, videos, recordings) for text enhancement
     41   * sign language dictionary with gesture videos
     42
     43
     44== !WordNets ==
    2645
    2746[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/WordNet_parallel.png)]]
     
    6685
    6786
    68 
    69 ---
    70 
    71 == Types of language resources ==
    72 
    73  * synonym dictionary - fuzzy searching
    74    * over 23000 entries, with over 56000 synonyms
    75    * Czech Wordnet - 85592 words organized in 40919 synonym sets, plus grouping to domains/categories
    76    * thesaurus in Sketch Engine
    77  * translation dictionary - multilingual searching
    78    * Czech-English dictionary - 54000 entries
    79    * interconnected wordnets (!EuroWordnet, Balkanet) - Czech, English, Dutch, Italian, Spanish, French, Greek, Polish, Romanian, Turkish (at least 8500 common synonimical sets)
    80  * vulgar words dictionary - detection of inappropriate behavior in discussions
    81    * current language (April 2013), 600 manually edited words/collocations, with rules to detect masking
    82  * other: dictionary of toponyms? ancient surnames, genealogy? gestures, artworks...?
    83    * multimedial content in explanatory dictionaries (artworks, videos, recordings) for text enhancement
    84    * sign language dictionary with gesture videos
    85 
    8687== Tools for language resources processing ==
    8788
    88 creating, editing, importing, connecting with other resources, visualizing
     89 * creating, editing, importing, connecting with other resources, visualizing
    8990
    90 -> the DEB platform
     91
     92== Language resource tools: the DEB platform ==
     93
     94 * platform for '''d'''ictionary '''e'''diting and '''b'''rowsing
     95   * strict client-server architecture
     96   * basically any XML data
     97
     98 * server
     99   * server side modules
     100   * database backend (XML database)
     101
     102 * client
     103   * lightweight
     104   * graphical interface
     105   * web interface
     106
     107 * practically used in 22 international scientific/commercial projects
     108
     109[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/world.png)]]
     110
     111
     112== Conclusions ==
     113 * language resources:
     114   * dictionaries
     115   * corpus-based thesauri
     116   * semantic networks (WordNet)
     117 * flexible and powerful tool for language resources processing: the DEB platform