Changes between Version 2 and Version 3 of en/MainTopics


Timestamp:
May 12, 2014, 12:24:46 PM
Author:
xkocinc
Comment:

--

  • en/MainTopics

    v2 v3  
    1919
    2020== Corpora == #Corpora
    21 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/corpora.png)]]
     21[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/corpora.png)]]
    2222
    2323A corpus is a collection of text data in electronic form. As a significant source of linguistic data, corpora make it possible to investigate many frequency-related phenomena in language, and nowadays they are an indispensable tool in NLP. In addition to corpora containing general texts, corpora for specific purposes are also produced, such as annotated, domain-specific, spoken or error corpora.
     
    2929The NLP Centre has produced a complete set of tools for creating and managing corpora, the '''Corpus Architect'''. It can store and manage corpora containing 100+ billion word tokens.
    3030
    31 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/metatrans.png)]]
     31[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/metatrans.png)]]
    3232
    3333''Related projects:''
     
    5252
    5353== Dictionaries == #Dictionaries
    54 
    55 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/debII_slovniky.png, align=right)]]
     54[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/debII_slovniky.png, align=right)]]
    5655
    5756Dictionaries have always been a fundamental part of every linguist's basic equipment. However, handling paper dictionaries is rather inconvenient. Therefore, one of the first projects of the NLP Centre was to digitize classic dictionaries of Czech and develop a set of advanced tools for processing lexicographic data, a so-called lexicographer's workbench. This term refers to a system that enables each expert user to easily access various linguistic resources and provides them with an application interface for searching and editing data.
     
    7675
    7776== Morphology == #Morphology
    78 
    79 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/majka_nlpportal.png, align=right)]]
     77[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/majka_nlpportal.png, align=right)]]
    8078
    8179Morphological analysis gives a basic insight into natural language by studying how to distinguish and generate grammatical forms of words arising through inflection (i.e. declension and conjugation). This involves considering a set of tags describing the grammatical categories of the word form concerned, most notably its base form (lemma) and paradigm. Automatic analysis of word forms in free text can be used, for instance, in grammar checker development, and can aid corpus tagging or semi-automatic dictionary compilation.
     
    9492
    9593== Syntactic Analysis == #Syntactic_Analysis
    96 
    97 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/synt_tree.png, align=right)]]
     94[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/synt_tree.png, align=right)]]
    9895
    9996The goal of syntactic analysis is to determine whether the input text string is a sentence in the given (natural) language. If it is, the result of the analysis contains a description of the syntactic structure of the sentence, for example in the form of a derivation tree. Such formalizations are aimed at making computers "understand" the grammar of natural languages. Syntactic analysis can be used, for instance, in developing a punctuation corrector or dialogue systems with a natural language interface, or as a building block in a machine translation system. Czech exhibits rich inflection and free word order and is therefore among the languages that are very hard to analyze, as it requires more grammar rules than most other languages.
     
    112109
    113110== Semantics == #Semantics
    114 
    115 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/dict2_small.png, align=left)]]
     111[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/dict2_small.png, align=left)]]
    116112
    117113Semantic and pragmatic analysis make up the most complex phase of language processing, as they build on the results of all the above-mentioned disciplines. The ultimate touchstone at this level is machine translation, which has not yet been implemented for Czech with satisfactory results.
     
    133129''Animated demonstration of the Visual Browser:''
    134130
    135 [[Image(/trac/research/raw-attachment/wiki/cs/MainTopics/vl_anim.gif)]]
     131[[Image(/trac/research/raw-attachment/wiki/en/MainTopics/vl_anim.gif)]]
    136132
    137133== Further information == #Further_information