Changes between Version 2 and Version 3 of en/MainTopics
- Timestamp: May 12, 2014, 12:24:46 PM
--- en/MainTopics (Version 2)
+++ en/MainTopics (Version 3)
@@ -19,5 +19,5 @@
 
 == Corpora == #Corpora
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/corpora.png)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/corpora.png)]]
 
 A corpus is a collection of text data in electronic form. As a significant source of linguistic data, corpora make it possible to investigate many frequency-related phenomena in language, and nowadays they are an indispensable tool in NLP. In addition to corpora containing general texts, corpora for specific purposes are also produced, such as annotated, domain-specific, spoken or error corpora.
@@ -29,5 +29,5 @@
 The NLP Centre has produced a complete set of tools for creating and managing corpora, the '''Corpus Architect'''. It can store and manage corpora containing 100+ billion word tokens.
 
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/metatrans.png)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/metatrans.png)]]
 
 ''Related projects:''
@@ -52,6 +52,5 @@
 
 == Dictionaries == #Dictionaries
-
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/debII_slovniky.png, align=right)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/debII_slovniky.png, align=right)]]
 
 Dictionaries have always been a fundamental part of every linguist's basic equipment. However, handling paper dictionaries is rather inconvenient. Therefore, one of the first projects of the NLP Centre was to digitize classic dictionaries of Czech and develop a set of advanced tools for processing lexicographic data, a so-called lexicographer's workbench. This term refers to a system that enables each expert user to easily access various linguistic resources and provides them with an application interface for searching and editing data.
@@ -76,6 +75,5 @@
 
 == Morphology == #Morphology
-
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/majka_nlpportal.png, align=right)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/majka_nlpportal.png, align=right)]]
 
 Morphological analysis gives a basic insight into natural language by studying how to distinguish and generate grammatical forms of words arising through inflection (i.e., declension and conjugation). This involves considering a set of tags describing the grammatical categories of the word form concerned, most notably its base form (lemma) and paradigm. Automatic analysis of word forms in free text can be used, for instance, in grammar checker development, and can aid corpus tagging or semi-automatic dictionary compilation.
@@ -94,6 +92,5 @@
 
 == Syntactic Analysis == #Syntactic_Analysis
-
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/synt_tree.png, align=right)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/synt_tree.png, align=right)]]
 
 The goal of syntactic analysis is to determine whether the input text string is a sentence in the given (natural) language. If it is, the result of the analysis contains a description of the syntactic structure of the sentence, for example in the form of a derivation tree. Such formalizations are aimed at making computers "understand" the grammar of natural languages. Syntactic analysis can be used, for instance, when developing a punctuation corrector or dialogue systems with a natural language interface, or as a building block in a machine translation system. Czech is a language exhibiting rich inflection and free word order and thus belongs to the languages that are very hard to analyze, as it requires more grammar rules than most other languages.
@@ -112,6 +109,5 @@
 
 == Semantics == #Semantics
-
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/dict2_small.png, align=left)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/dict2_small.png, align=left)]]
 
 Semantic and pragmatic analysis make up the most complex phase of language processing, as they build on the results of all the above-mentioned disciplines. The ultimate touchstone at this level is machine translation, which has not yet been implemented for Czech with satisfactory results.
@@ -133,5 +129,5 @@
 ''Animated demonstration of the Visual Browser:''
 
-[[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/vl_anim.gif)]]
+[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/vl_anim.gif)]]
 
 == Further information == #Further_information