Changes between Version 13 and Version 14 of documentation


Timestamp:
Feb 17, 2010, 6:00:13 PM
Author:
Vojtěch Kovář
Comment:

"System Features" updated

  • documentation

    v13 v14  
    77== System features ==
    88
    9 The system is able to parse a morphologically annotated sentence in the vertical (or BRIEF) format, i.e. one token per line, in word - lemma - tag order. Ambiguous comma-delimited lemmas and tags are allowed in the input. The tags can be either in the attribute format, as used by the [http://nlp.fi.muni.cz/projekty/ajka ajka] morphological analyser, or in the positional format, as described [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html here]. Examples of correct input files: [attachment:s1.txt example 1], [attachment:s2.txt example 2], [attachment:s3.txt example 3], [attachment:s4.txt example 4] (in Czech).
     9The system is able to parse a morphologically annotated sentence in the vertical (or BRIEF) format, i.e. one token per line, in word - lemma - tag order. Ambiguous comma-delimited lemmas and tags are allowed in the input. The tags can be either in the attribute format, as used by the [http://nlp.fi.muni.cz/projekty/ajka ajka] morphological analyser, or in the positional format, as described [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html here] (in this case, the {{{--postags}}} switch needs to be used). Examples of correct input files: [attachment:s1.txt example 1], [attachment:s2.txt example 2], [attachment:s3.txt example 3], [attachment:s4.txt example 4] (in Czech).
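
The input format described above can be read with a few lines of Python. This is only a minimal sketch, not part of the parser itself. It assumes the three columns are separated by whitespace (the text does not name the delimiter), and the names `Token`, `split_ambiguous` and `parse_vertical` are hypothetical. It does not interpret the tags, so it works the same for the attribute and the positional format.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    lemmas: List[str]  # one or more candidate lemmas
    tags: List[str]    # one or more candidate tags


def split_ambiguous(field: str) -> List[str]:
    """Split a comma-delimited field into its alternatives.

    A field with no usable alternatives (for example a bare ",")
    is kept as a single value.
    """
    parts = [p for p in field.split(",") if p]
    return parts or [field]


def parse_vertical(text: str) -> List[Token]:
    """Parse vertical-format text: one token per line, word - lemma - tag."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # Assumption: columns are whitespace-separated.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"expected word, lemma and tag, got: {line!r}")
        word, lemma_field, tag_field = parts
        tokens.append(
            Token(word, split_ambiguous(lemma_field), split_ambiguous(tag_field))
        )
    return tokens
```

Each returned `Token` keeps all lemma and tag alternatives, so a caller can decide how to handle the ambiguity.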
    1010
    1111As the output, the system returns syntactic information found in the input sentence in several possible formats:
     
    1919    * '''Dependency trees'''
    2020      Full syntactic trees containing only dependency elements, corresponding to the formalism used by the [http://ufal.mff.cuni.cz Institute of Formal and Applied Linguistics] in Prague. In text form, the trees are printed on stdout; they can also be displayed in the graphic module.
     21    * '''Phrasal trees'''
     22      Full syntactic trees in the constituent format. In text form, the trees are printed on stdout in one of two possible encodings; they can also be displayed in the graphic module.
     23    * '''Collocations'''
     24      Pairs of words in dependency relations. Pairs of lemmas and tags are also output.
     25    * '''Phrases'''
     26      Word chunks that form phrases in the sentence. These can be output in two possible formats.
    2127
    22 In the text form, the output trees are encoded by set of lines, each of them representing one node of the resulting tree. Each line contains four TAB-delimited fields:
     28In text form, the output trees are encoded as a set of lines, each representing one node of the resulting tree. Each line contains four TAB-delimited fields:
    2329
    2430    * Node ID (integer number)
     
    2632    * Node dependency ID (integer number)
    2733    * Dependency type ('p' or 'd', for phrasal or dependency edge)
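
The four-field node lines listed above could be loaded as follows. This is a hedged sketch, not the system's own code: `Node` and `parse_tree` are hypothetical names, and because the second field is not shown in this diff, it is kept as an uninterpreted string called `second_field`.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    node_id: int       # node ID
    second_field: str  # meaning not shown in this excerpt; kept verbatim
    head_id: int       # node dependency ID
    edge_type: str     # 'p' (phrasal edge) or 'd' (dependency edge)


def parse_tree(text: str) -> List[Node]:
    """Parse text-form tree output: one node per line, four TAB-delimited fields."""
    nodes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 TAB-delimited fields, got: {line!r}")
        node_id, second, head_id, edge_type = fields
        if edge_type not in ("p", "d"):
            raise ValueError(f"unknown dependency type: {edge_type!r}")
        nodes.append(Node(int(node_id), second, int(head_id), edge_type))
    return nodes
```

Rejecting lines with the wrong field count or an unknown edge type makes a malformed output file fail early instead of producing a silently wrong tree.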
     34
     35Phrasal trees can also be printed in the "LAA" format (using the {{{--laa}}} switch) that enables the trees to be compared using the Leaf Ancestor Assessment metric, as described [http://www.grsampson.net/RLeafAnc.html here].
    2836
    2937== Program usage ==