== Language resources ==

Similar to dictionaries, but more general; they capture:
* knowledge about language
* knowledge about the world

[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/cat.png)]]

[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/slovnik_spis.cestiny.png)]]

* '''intended for humans:''' multilingual dictionaries, explanatory dictionaries, thesauri, encyclopedias

* '''intended for computer programs:''' translation memories, knowledge bases, semantic networks

[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/CzechWordNet.png)]]

== WordNets ==

[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/WordNet_parallel.png)]]

* Czech WordNet: 85,592 words organized in 40,919 synonym sets (synsets)

* several relation types: subclass (hyponymy), part-of (meronymy), translation, synonymy

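A wordnet of this kind can be pictured as a graph of synonym sets joined by typed links. The following is a minimal sketch only, not the storage format of any actual wordnet: the class names, the synset identifiers (`en-cat`, `en-feline`, `cs-kocka`) and the toy data are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words plus typed links to other synsets."""
    synset_id: str
    words: set
    # relation name (e.g. "subclass") -> ids of the related synsets
    relations: dict = field(default_factory=dict)

    def link(self, relation, target_id):
        self.relations.setdefault(relation, set()).add(target_id)


class WordNet:
    """Hypothetical in-memory container for synsets."""

    def __init__(self):
        self.synsets = {}

    def add(self, synset):
        self.synsets[synset.synset_id] = synset

    def synonyms(self, word):
        """Synonymy: all other words sharing a synset with `word`."""
        found = set()
        for synset in self.synsets.values():
            if word in synset.words:
                found |= synset.words - {word}
        return found

    def related(self, synset_id, relation):
        """Follow one relation type (subclass, part-of, translation)."""
        targets = self.synsets[synset_id].relations.get(relation, set())
        return [self.synsets[t] for t in sorted(targets)]


# Toy data, for illustration only.
wn = WordNet()
wn.add(Synset("en-cat", {"cat", "true cat"}))
wn.add(Synset("en-feline", {"feline", "felid"}))
wn.add(Synset("cs-kocka", {"kočka"}))
wn.synsets["en-cat"].link("subclass", "en-feline")     # a cat is a kind of feline
wn.synsets["en-cat"].link("translation", "cs-kocka")   # link to the Czech synset

print(wn.synonyms("cat"))
print([s.words for s in wn.related("en-cat", "translation")])
```

Keeping the relation name as a plain string means a new relation type needs no schema change, at the cost of no validation of relation names.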
== Synonyms: Dictionary vs. thesaurus ==

[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/handsome.png)]]

* based on contemporary language
* with similarity scores
* available for many languages
* covering every word used in the language

[[Image(/trac/research/raw-attachment/wiki/en/LanguageResources/handsome_corpus.png)]]

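One common way to obtain similarity scores from a corpus is to compare the contexts in which words occur. The sketch below illustrates that general idea with raw co-occurrence counts and cosine similarity. It is an assumption-laden toy, not the method behind the thesaurus shown above: the function names, the window size and the five-sentence corpus are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def thesaurus_entry(word, vectors, top_n=3):
    """Return the `top_n` most similar words with their similarity scores."""
    target = vectors.get(word)
    if target is None:
        return []
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Toy corpus, for illustration only.
corpus = [
    "a handsome man smiled",
    "a good-looking man smiled",
    "a handsome actor arrived",
    "a good-looking actor arrived",
    "the old car arrived",
]
vectors = context_vectors(corpus)
for other, score in thesaurus_entry("handsome", vectors):
    print(f"{other}\t{score:.3f}")
```

Because the scores come only from observed usage, such a list covers any word that occurs in the corpus, but it ranks words by shared contexts rather than by strict synonymy.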
== Selected language resources at NLPC ==
* 6 dictionaries of the Czech language, 512,000 entries
* synonyms
  * Czech synonyms (K. Pala): 23,000 entries, 56,000 synonyms
  * Czech WordNet: 85,592 words organized in 40,919 synonym sets (synsets)
  * automatically generated thesaurus
* translation
  * interconnected wordnets: Czech, English, Dutch, Italian, Spanish, French, Greek, Polish, Romanian, Turkish
* specialized resources
  * contemporary vulgar words (April 2013): 600 words and collocations, plus rules for detecting disguised forms
  * sign language dictionary with gesture videos

== Tools for language resources ==

Language resources have to be:
* built and continuously maintained
* digitized (OCR to XML)
* connected with other language resources
* shared among computer programs
* readable by humans

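The requirements above suggest a single structured source, for example XML, that programs can exchange and that can also be rendered for human readers. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`entry`, `headword`, `pos`, `sense`, `definition`, `synonym`) and the sample entry are hypothetical and do not reflect any particular dictionary schema.

```python
import xml.etree.ElementTree as ET


def build_entry(headword, pos, senses):
    """Build one dictionary entry; `senses` is a list of (definition, synonyms)."""
    entry = ET.Element("entry", {"id": headword})
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    for definition, synonyms in senses:
        sense = ET.SubElement(entry, "sense")
        ET.SubElement(sense, "definition").text = definition
        for synonym in synonyms:
            ET.SubElement(sense, "synonym").text = synonym
    return entry


def render_for_humans(entry):
    """Turn the machine-readable entry into plain text for a reader."""
    lines = [f"{entry.findtext('headword')} ({entry.findtext('pos')})"]
    for number, sense in enumerate(entry.findall("sense"), start=1):
        line = f"  {number}. {sense.findtext('definition')}"
        synonyms = ", ".join(s.text for s in sense.findall("synonym"))
        if synonyms:
            line += f" [synonyms: {synonyms}]"
        lines.append(line)
    return "\n".join(lines)


# Illustrative entry; the definition text is made up for this example.
entry = build_entry(
    "handsome",
    "adjective",
    [("pleasing in appearance", ["good-looking", "attractive"])],
)

# Serialize for sharing among programs, then parse it back as a consumer would.
xml_text = ET.tostring(entry, encoding="unicode")
parsed = ET.fromstring(xml_text)
print(xml_text)
print(render_for_humans(parsed))
```

Keeping the XML as the only source and generating the readable view from it avoids maintaining two copies of the same entry.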
---