Changes between Version 8 and Version 9 of en/LanguageResources


Timestamp: Jun 5, 2014, 12:11:40 PM
Author: xkocinc
Comment: --

  • en/LanguageResources

    v8 v9  
    11= Language Resources =
    22
    3 == For NLP we need ==
     3== For NLP we need: ==
    44
    55[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/nlp_tools_resources_algorithms.png)]]
     6
     7== Language resources ==
     8
      9Similar to dictionaries, but more general:
     10 * knowledge about language
     11 * knowledge about the world
     12
     13[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/cat.png)]]
     14
     15[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/slovnik_spis.cestiny.png)]]
     16
     17 * '''intended for humans:''' multilingual dictionaries, explanatory dictionaries, thesauri, encyclopedias
     18 
     19 * '''intended for computer programs:''' translation memory, knowledge bases, semantic networks
     20
     21[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/CzechWordNet.png)]]
     22
     23
     24
     25== WordNets ==
     26
     27[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/WordNet_parallel.png)]]
     28
     29 * 85,592 words organized in 40,919 synonym sets (synsets)
     30 
     31 * several relation types: subclass, part-of, translation, synonymy
     32
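A minimal sketch of how words, synonym sets, and typed relations could be represented in a program. The `Synset` class, the `synonyms` helper, the identifiers, and the toy entries are all hypothetical illustrations; they are not the actual Czech WordNet format or data. Synonymy is modeled here as membership in the same synset, and the other relation types as labeled links between synsets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Synset:
    """One synonym set: words sharing a meaning, plus typed links to other synsets."""
    synset_id: str
    words: List[str]
    # relation name (e.g. "subclass", "part-of", "translation") -> target synset ids
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def link(self, relation: str, target_id: str) -> None:
        """Record a typed relation from this synset to another one."""
        self.relations.setdefault(relation, []).append(target_id)


def synonyms(word: str, synsets: Dict[str, Synset]) -> Set[str]:
    """Return every word that shares at least one synset with `word`."""
    found: Set[str] = set()
    for synset in synsets.values():
        if word in synset.words:
            found.update(w for w in synset.words if w != word)
    return found


# Toy data, invented for illustration only.
synsets = {
    "en-car": Synset("en-car", ["car", "automobile"]),
    "en-vehicle": Synset("en-vehicle", ["vehicle"]),
    "en-wheel": Synset("en-wheel", ["wheel"]),
    "cs-auto": Synset("cs-auto", ["auto", "automobil"]),
}
synsets["en-car"].link("subclass", "en-vehicle")    # a car is a kind of vehicle
synsets["en-wheel"].link("part-of", "en-car")       # a wheel is part of a car
synsets["en-car"].link("translation", "cs-auto")    # link to the Czech synset
```

Calling `synonyms("car", synsets)` looks the word up across all synsets; the translation link is what would connect wordnets of different languages in this sketch.
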
     33
     34== Synonyms: Dictionary vs. thesaurus ==
     35
     36[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/handsome.png)]]
     37
     38 * from the contemporary language
     39 * similarity score
     40 * available for many languages
     41 * for every word used in the language
     42
     43[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/handsome_corpus.png)]]
     44
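The page does not say how the similarity score of the thesaurus is computed. As one possible illustration only, the sketch below assumes a simple distributional approach: each word is described by the words that occur near it in a corpus, and two words are scored by the cosine of their context-count vectors. The function names, the window size, and the three-sentence corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List


def context_vector(word: str, sentences: List[str], window: int = 2) -> Counter:
    """Count the tokens appearing within `window` positions of `word`."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == word:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy corpus; a real thesaurus would be built from a large corpus.
corpus = [
    "a handsome man smiled",
    "an attractive man smiled",
    "the train left early",
]
handsome = context_vector("handsome", corpus)
attractive = context_vector("attractive", corpus)
train = context_vector("train", corpus)
```

In this toy corpus, `cosine(handsome, attractive)` is higher than `cosine(handsome, train)` because the first pair shares context words. Because such a score is derived from usage rather than from an editor's judgment, it can in principle be computed for any word that occurs in the corpus.
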
     45
     46== Selected language resources at NLPC ==
     47 * 6 dictionaries of the Czech language, 512,000 entries
     48 * synonyms
     49   * Czech synonyms (K. Pala): 23,000 entries, 56,000 synonyms
     50   * Czech WordNet: 85,592 words organized in 40,919 synonym sets (synsets)
     51   * automatically generated thesaurus
     52 * translation
     53   * interconnected wordnets: Czech, English, Dutch, Italian, Spanish, French, Greek, Polish, Romanian, Turkish
     54 * specials
     55   * contemporary vulgar words (April 2013): 600 words/collocations + rules for detecting attempts to conceal them
     56   * sign language dictionary with gesture videos
     57
     58== Tools for language resources ==
     59
     60Language resources have to be:
     61 * built and continuously maintained
     62 * digitized (OCR to XML)
     63 * connected with other language resources
     64 * shared among computer programs
     65 * readable by humans
     66
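A minimal sketch of the "OCR to XML" step, assuming the OCR output has already been cleaned into one line per entry in the form `headword pos. sense; sense`. The line format, the element names (`entry`, `headword`, `pos`, `sense`), and both helper functions are hypothetical; the page does not specify an XML schema. The same XML can then be shared among programs and rendered back into a form readable by humans.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(line: str) -> ET.Element:
    """Turn one cleaned OCR line into an <entry> element (assumed line format)."""
    headword, pos, definition = line.split(maxsplit=2)
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos.rstrip(".")
    for sense in definition.split(";"):
        ET.SubElement(entry, "sense").text = sense.strip()
    return entry


def entry_to_text(entry: ET.Element) -> str:
    """Render an <entry> element as a single human-readable line."""
    senses = "; ".join(s.text for s in entry.findall("sense"))
    return f"{entry.findtext('headword')} ({entry.findtext('pos')}): {senses}"


# Serialize for exchange between programs, then parse and render for a human reader.
xml_string = ET.tostring(
    entry_to_xml("handsome adj. good-looking; attractive"), encoding="unicode"
)
readable = entry_to_text(ET.fromstring(xml_string))
```

Keeping each sense in its own element, rather than as one text blob, is what would let other resources (for example a synset in a wordnet) link to an individual sense.
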
     67
     68
     69---
    670
    771== Types of language resources ==