Changes between Version 12 and Version 13 of documentation


Timestamp:
Feb 17, 2010, 4:55:54 PM (14 years ago)
Author:
Vojtěch Kovář
Comment:

--

  • documentation

    v12 v13  
    3 3 == Introduction ==
    4 4
    5 SET is an open source tool for syntax analysis of natural languages. It is based on the principle of detection of important patterns in the text and incremental segmentation of the sentence. Its core consists of a set of patterns (or rules) and a parsing engine that analyses the input sentence according to given rules. Currently, SET is distributed with a set of rules for parsing the Czech language, containing about 150 rules. A simple tree viewer for displaying parser output is also present.
      5 SET is an open source tool for syntax analysis of natural languages. It is based on the principle of detection of important patterns in the text and incremental segmentation of the sentence. Its core consists of a set of patterns (or rules) and a parsing engine that analyses the input sentence according to given rules. Currently, SET is distributed with a set of rules for parsing the Czech language, containing about 220 rules. A tree viewer for displaying parser output is also present.
    6 6
    7 7 == System features ==
    8 8
    9 The system is able to parse a morphologically tagged sentence in the vertical (BRIEF) format, i.e. one token per line, in word - lemma - tag order. At present, the morphological tagging must be disambiguated and the tags are expected in the attribute format, as used by the [http://nlp.fi.muni.cz/projekty/ajka ajka] morphological analyser. Examples of correct input files: [attachment:s1.txt sentence 1], [attachment:s2.txt sentence 2] (BRIEF format, in Czech).
      9 The system is able to parse a morphologically annotated sentence in the vertical (or BRIEF) format, i.e. one token per line, in word - lemma - tag order. Ambiguous comma-delimited lemmas and tags are allowed in the input. The tags can be either in the attribute format, as used by the [http://nlp.fi.muni.cz/projekty/ajka ajka] morphological analyser, or in the positional format, as described [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html here]. Examples of correct input files: [attachment:s1.txt example 1], [attachment:s2.txt example 2], [attachment:s3.txt example 3], [attachment:s4.txt example 4] (in Czech).
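
The vertical input format described above can be illustrated with a small reader. This is only a sketch, not part of SET: the names `Token`, `split_alternatives` and `parse_vertical` are hypothetical, and it assumes that the three columns are separated by whitespace, which the text does not state. Ambiguous lemmas and tags are split on commas, as the description allows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One line of vertical input: a word with its candidate lemmas and tags."""
    word: str
    lemmas: List[str]
    tags: List[str]


def split_alternatives(field: str) -> List[str]:
    """Split a comma-delimited field into its alternatives.

    If splitting would produce an empty alternative (for example when the
    field is the comma token itself), the field is kept whole instead.
    This guard is a heuristic chosen for the sketch.
    """
    parts = field.split(",")
    return parts if all(parts) else [field]


def parse_vertical(text: str) -> List[Token]:
    """Read one token per line in word - lemma - tag order.

    Assumption: columns are separated by whitespace. Blank lines are skipped.
    """
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(fields)}"
            )
        word, lemma_field, tag_field = fields
        tokens.append(
            Token(word, split_alternatives(lemma_field), split_alternatives(tag_field))
        )
    return tokens
```

Calling `parse_vertical` on the contents of one of the attached example files should yield one `Token` per input line, provided the files follow the whitespace-separated layout assumed here. The reader does not interpret the tags, so it works the same way for the attribute and the positional tag format.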
    10 10
    11 11 As the output, the system returns syntactic information found in the input sentence in several possible formats: