Changes between Version 1 and Version 2 of en/Research/Analysis


Timestamp:
Feb 22, 2019, 10:43:41 AM (5 years ago)
Author:
Ales Horak
Comment:

--

  • en/Research/Analysis

    v1 v2  
    1111Morphological analysis gives a basic insight into natural language by studying how to distinguish and generate grammatical forms of words arising through inflection (i.e. declension and conjugation). This involves considering a set of tags describing the grammatical categories of the word form concerned, most notably its base form (lemma) and paradigm. Automatic analysis of word forms in free text can be used, for instance, in grammar checker development, and can aid corpus tagging or semi-automatic dictionary compilation.
    1212
    13 The NLP laboratory has produced a general morphological analyzer for Czech, ajka, which covers a vocabulary of over 6 million word forms. It has further served as a base for a similar analyzer for Slovak, the fispell grammar checker, the czaccent converter of ASCII text to text with diacritics, and an interactive interface for the Jabber instant-messaging protocol.
     13The NLP laboratory has produced a general [[https://nlp.fi.muni.cz/projekty/ajka/|morphological analyzer for Czech, ajka]], which covers a vocabulary of over 6 million word forms. It has further served as a base for a similar analyzer for Slovak, the fispell grammar checker, the czaccent converter of ASCII text to text with diacritics, and an interactive interface for the Jabber instant-messaging protocol.
    1414
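The two directions of morphological analysis described above (recognizing a word form and generating one) can be sketched with a toy lookup table. This is only an illustration: the lexicon, the tag names (`pos`, `case`, `number`) and the functions `analyze` and `generate` are hypothetical and do not reflect the data format or interface of ajka.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str   # base form of the word
    tags: dict   # grammatical categories (hypothetical tag names)


# Hypothetical toy lexicon: word form -> all of its possible analyses.
TOY_LEXICON = {
    "žena": [Analysis("žena", {"pos": "noun", "case": "nom", "number": "sg"})],
    "ženy": [
        Analysis("žena", {"pos": "noun", "case": "gen", "number": "sg"}),
        Analysis("žena", {"pos": "noun", "case": "nom", "number": "pl"}),
    ],
}


def analyze(word_form):
    """Return every (lemma, tags) reading of a word form; empty list if unknown."""
    return TOY_LEXICON.get(word_form.lower(), [])


def generate(lemma, **wanted):
    """Return the word forms of `lemma` whose tags match all requested categories."""
    forms = set()
    for form, analyses in TOY_LEXICON.items():
        for analysis in analyses:
            if analysis.lemma == lemma and all(
                analysis.tags.get(key) == value for key, value in wanted.items()
            ):
                forms.add(form)
    return sorted(forms)
```

Note that `analyze("ženy")` returns two readings, which shows why a single word form may stay ambiguous until its context is considered.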
    1515== Syntactic Analysis
     
    1717The goal of syntactic analysis is to determine whether the input text string is a sentence in the given (natural) language. If it is, the result of the analysis contains a description of the syntactic structure of the sentence, for example in the form of a derivation tree. Such formalizations are aimed at making computers "understand" relationships between words (and indirectly between the corresponding people, things, and actions). Syntactic analysis can be utilized, for instance, when developing a punctuation corrector or dialogue systems with a natural language interface, or as a building block in a machine translation system. Czech is a language exhibiting rich inflection and free word order and thus requires more grammar rules than most other languages. Accordingly, it is one of the languages that are very hard to analyze.
    1818
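As a minimal sketch of what "deciding whether the input is a sentence and returning a derivation tree" means, the following toy top-down parser works over a tiny English context-free grammar. The grammar, the lexicon and the function names are assumptions made for illustration; they are unrelated to the grammar or algorithm used by synt.

```python
# Hypothetical toy grammar: nonterminal -> list of possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Hypothetical toy lexicon: word -> part-of-speech symbol.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "sees": "V", "sleeps": "V",
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` derives tokens[pos:next_pos]."""
    if symbol not in GRAMMAR:
        # Part-of-speech symbol: it must match the word at the current position.
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            yield (symbol, tokens[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Each partial result holds the subtrees built so far and the next position.
        partial = [([], pos)]
        for child in rhs:
            partial = [
                (children + [subtree], nxt)
                for children, p in partial
                for subtree, nxt in parse(child, tokens, p)
            ]
        for children, nxt in partial:
            yield (symbol, *children), nxt


def derivation_tree(sentence):
    """Return a derivation tree covering the whole sentence, or None if there is none."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None
```

A return value of `None` corresponds to the answer "not a sentence of the language"; any other result is a nested tuple that mirrors the derivation tree. A realistic grammar for Czech would need far more rules, as the text notes.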
    19 The NLP laboratory is developing the synt syntactic analyzer. According to tests performed on large corpora, synt reaches a recall of 92 % and a precision of 84 %. For educational purposes we have a simple syntactic analyzer Zuzana, which is capable of visualizing several types of derivation trees.
     19The NLP laboratory is developing the [[https://nlp.fi.muni.cz/projekty/wwwsynt/|synt syntactic analyzer]]. According to tests performed on large corpora, **synt** reaches a recall of 92 % and a precision of 84 %. For educational purposes we have a simple [[https://nlp.fi.muni.cz/projekty/zuzana/|syntactic analyzer Zuzana]], which is capable of visualizing several types of derivation trees.
    2020
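The recall and precision figures quoted above are sometimes combined into a single score, the F1 measure (the harmonic mean of the two). The page itself does not report an F1 value; the short sketch below only shows how one could be derived from the two published numbers, and the function name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the text: precision 84 %, recall 92 %.
synt_f1 = f1_score(0.84, 0.92)  # approximately 0.878
```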
    2121== Semantic Analysis
     
    2323Semantic and pragmatic analysis make up the most complex phase of language processing, as they build on the results of all the above-mentioned disciplines. Based on knowledge about the structure of words and sentences, the meaning of words, phrases, sentences and texts is determined, and subsequently also their purpose and consequences. From the computational point of view, no adequate general solutions have been proposed for this area. There are many open theoretical problems, and in practice, major problems are caused by errors at lower processing levels. The ultimate touchstone at this level is machine translation, which hasn't been implemented for Czech with satisfactory results yet.
    2424
    25 One of the long-term projects of the NLP laboratory is the use of Transparent Intensional Logic (TIL) as a semantic representation of knowledge and subsequently as a transfer language in automatic machine translation. At the current stage, it is realistic to process knowledge only in a simpler form, so considerably less complex tasks have been addressed, such as machine translation for a restricted domain (e.g. official documents and weather reports) or semi-automatic machine translation between closely related languages. The resources exploited in these applications are corpora, semantic nets, and electronic dictionaries.
     25One of the long-term projects of the NLP laboratory is the use of [[https://www.researchgate.net/publication/266521618_Procedural_Seman-_tics_for_Hyperintensional_Logic_Foundations_and_Applications_of_Transparent_Intensional_Logic|Transparent Intensional Logic]] ([[http://www.cs.vsb.cz/duzi/aleph.pdf|TIL]]) as a [[https://www.fi.muni.cz/~hales/disert/|semantic representation of knowledge]] and subsequently as a transfer language in automatic machine translation. At the current stage, it is realistic to process knowledge only in a simpler form, so considerably less complex tasks have been addressed, such as machine translation for a restricted domain (e.g. official documents and weather reports) or semi-automatic machine translation between closely related languages. The resources exploited in these applications are corpora, semantic nets, and electronic dictionaries.
    2626== Knowledge Representation
    2727
    2828Not all information needed for processing texts is encoded in the structure of language. In order to understand the content of texts properly, it is often necessary to possess certain knowledge about the world, either general (e.g. that birds can fly, or that a key is required to open a locked door) or very specific expert knowledge that the reader is expected to be familiar with (e.g. in a mathematical journal, that an even number higher than 2 can't be a prime). Seemingly, the greatest challenge in this field is not to gather the knowledge, but to represent and structure it in a suitable way, to search it efficiently, and to use it to infer further knowledge. In essence, these goals correspond to the task of constructing artificial intelligence, which is without any doubt one of the biggest and most interesting topics of modern science.
    2929
    30 In the field of representation of meaning and knowledge we should mention the notable contribution of NLP laboratory members to the !EuroWordNet and Balkanet projects, which were aimed at building a multilingual !WordNet-like semantic net. Further, the laboratory has developed the DEB (Dictionary Editor and Browser) platform, which makes it possible to efficiently browse and search the !WordNet semantic net and also to edit it in a comfortable way. With regard to the success of this platform, its large-scale use within the !WordNet Grid project has been considered.
     30In the field of representation of meaning and knowledge we should mention the notable contribution of NLP laboratory members to the !EuroWordNet and Balkanet projects, which were aimed at building a multilingual !WordNet-like semantic net. Further, the laboratory has developed the [[https://deb.fi.muni.cz/|DEB (Dictionary Editor and Browser) platform]], which makes it possible to efficiently browse and search the !WordNet semantic net and also to edit it in a comfortable way. With regard to the success of this platform, its large-scale use within the !WordNet Grid project has been considered.
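The three tasks named in this section (representing knowledge, searching it, and inferring further knowledge) can be illustrated with a very small semantic net. The sketch below is an assumption-laden toy: the concepts, the `HYPERNYM` and `PROPERTIES` tables and both functions are hypothetical and are not the data model or API of WordNet or the DEB platform.

```python
# Hypothetical toy semantic net: concept -> its more general concept (hypernym).
HYPERNYM = {
    "sparrow": "bird",
    "bird": "animal",
    "key": "tool",
}

# Hypothetical general knowledge attached directly to some concepts.
PROPERTIES = {
    "bird": {"can_fly"},
    "animal": {"is_alive"},
}


def hypernym_chain(concept):
    """Search the net upwards and return all ancestors, nearest first."""
    chain = []
    seen = {concept}
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        if concept in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def inferred_properties(concept):
    """Infer properties by inheriting everything known about the ancestors."""
    props = set(PROPERTIES.get(concept, set()))
    for ancestor in hypernym_chain(concept):
        props |= PROPERTIES.get(ancestor, set())
    return props
```

Nothing is stored about sparrows directly, yet `inferred_properties("sparrow")` contains `"can_fly"` because the fact is inherited from `"bird"`. Such simple inheritance admits no exceptions (a penguin would also "fly"), which hints at why structuring knowledge well is harder than gathering it.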