Changes between Version 13 and Version 14 of documentation

Timestamp: Feb 17, 2010, 6:00:13 PM
Author: Vojtěch Kovář
Comment: "System Features" updated

documentation

-                      v13
+                      v14
 == System features ==
 The system is able to parse a morphologically annotated sentence in the vertical (or BRIEF) format, i.e. one token per line, in word - lemma - tag order. Ambiguous comma-delimited lemmas and tags are allowed in the input. The tags can be either in the attribute format, as used by the [http://nlp.fi.muni.cz/projekty/ajka ajka] morphological analyser, or in the positional format, as described [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html here]. Examples of correct input files: [attachment:s1.txt example 1], [attachment:s2.txt example 2], [attachment:s3.txt example 3], [attachment:s4.txt example 4] (in Czech).
+The system is able to parse a morphologically annotated sentence in the vertical (or BRIEF) format, i.e. one token per line, in word - lemma - tag order. Ambiguous comma-delimited lemmas and tags are allowed in the input. The tags can be either in the attribute format, as used by the [http://nlp.fi.muni.cz/projekty/ajka ajka] morphological analyser, or in the positional format, as described [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html here] (in this case, the {{{--postags}}} switch needs to be used). Examples of correct input files: [attachment:s1.txt example 1], [attachment:s2.txt example 2], [attachment:s3.txt example 3], [attachment:s4.txt example 4] (in Czech).
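The input format described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the three columns are assumed to be separated by whitespace, the helper name `parse_vertical` and the `Token` type are hypothetical, and the sample tokens and tags are made-up placeholders, not taken from the attached example files.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    lemmas: List[str]  # one or more lemmas (ambiguous analyses)
    tags: List[str]    # one or more tags (ambiguous analyses)


def parse_vertical(text: str) -> List[Token]:
    """Parse one-token-per-line input in word - lemma - tag order.

    Assumption: columns are whitespace-separated; ambiguous lemmas
    and tags are comma-delimited, as the description states.
    """
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(parts)}")
        word, lemma_field, tag_field = parts
        tokens.append(Token(word, lemma_field.split(","), tag_field.split(",")))
    return tokens


# Placeholder data for illustration only.
sample = "word1\tlemma1\ttagA\nword2\tlemma2a,lemma2b\ttagB,tagC\n"
for token in parse_vertical(sample):
    print(token.word, token.lemmas, token.tags)
```

Note that this sketch does not handle a token that is itself a comma; how the real system escapes such tokens is not stated here.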
 As the output, the system returns syntactic information found in the input sentence in several possible formats:
 …
     * '''Dependency trees'''
       Full syntactic trees containing only dependency elements, corresponding to the formalism used by the [http://ufal.mff.cuni.cz Institute of Formal and Applied Linguistics] in Prague. In text form, the trees are printed to stdout; they can also be displayed in the graphic module.
+    * '''Phrasal trees'''
+      Full syntactic trees in the constituent format. In text form, the trees are printed to stdout in one of two possible encodings; they can also be displayed in the graphic module.
+    * '''Collocations'''
+      Pairs of words in dependency relations. Pairs of lemmas and tags are also output.
+    * '''Phrases'''
+      Word chunks that form phrases in the sentence. These can be output in two possible formats.
 In the text form, the output trees are encoded as a set of lines, each representing one node of the resulting tree. Each line contains four TAB-delimited fields:
+The output trees in the text form are encoded as a set of lines, each representing one node of the resulting tree. Each line contains four TAB-delimited fields:
     * Node ID (integer number)
 …
     * Node dependency ID (integer number)
     * Dependency type ('p' or 'd', for phrasal or dependency edge)
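The four-field line format listed above could be read back into a simple structure as follows. This is a hedged sketch, not the system's own code: the names `TreeNode` and `parse_tree_lines` are hypothetical, the second field is elided in the list above and is therefore kept as opaque text, and the sample lines are invented for illustration.

```python
from typing import List, NamedTuple


class TreeNode(NamedTuple):
    node_id: int       # field 1: node ID
    second_field: str  # field 2: not specified above, kept as opaque text
    head_id: int       # field 3: node dependency ID
    edge_type: str     # field 4: 'p' (phrasal) or 'd' (dependency)


def parse_tree_lines(text: str) -> List[TreeNode]:
    """Parse TAB-delimited tree output, one node per line."""
    nodes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        node_id, second, head_id, edge_type = fields
        if edge_type not in ("p", "d"):
            raise ValueError(f"line {line_no}: unknown edge type {edge_type!r}")
        nodes.append(TreeNode(int(node_id), second, int(head_id), edge_type))
    return nodes


# Invented sample lines; real output may look different.
sample = "0\tx\t1\td\n1\ty\t2\tp\n"
for node in parse_tree_lines(sample):
    print(node)
```

Keeping the edge type as a validated one-character string makes it easy to filter the node list into phrasal and dependency edges afterwards.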
+Phrasal trees can also be printed in the "LAA" format (using the {{{--laa}}} switch) that enables the trees to be compared using the Leaf Ancestor Assessment metric, as described [http://www.grsampson.net/RLeafAnc.html here].
 == Program usage ==