ParsingCzech

Timestamp: Nov 10, 2022, 11:27:03 PM
Author: Ales Horak
Comment: --

private/NlpInPracticeCourse/ParsingCzech

-                      v30
+                      v31
 cd ukol_ia161-parsing
 }}}
+. Choose the language you want to work with. The default is Czech (`cs`); to switch to English (`en`), edit `Makefile`:
+{{{
+nano Makefile
+}}}
+ and change the first line to
+{{{
+LANGUAGE=en
+}}}
 . Test the prepared program that analyses 100 selected sentences
 {{{
 …
 }}}
  Exit the diff by pressing `q`.[[br]]
+ You can watch the two trees with (`python-qt4` must be installed in the system)
+ You may inspect the tagged vertical text with
+ {{{
+ make vert SENTENCE=00009
+}}}
+ You can view the two trees with the following command (`python3-tk` must be installed on the system)
  {{{
 make view SENTENCE=00009
 …
 }}}
 . Look at the files:
  * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
  * `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
  * `data/trees/set_pdt2_etest-sel100` - 100 trees output from SET by running `make set_trees`
  * `grammar.set` - the grammar used in running SET
+ * `data/vert/pdt2_etest` or `ud21_gum_dev` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset] for Czech and the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset] for English
+ * `data/trees/pdt2_etest` or `ud21_gum_dev` - 100 gold standard dependency trees from the Prague Dependency Treebank or the Universal Dependencies GUM corpus
+ * `data/trees/set_pdt2_etest` or `set_ud21_gum_dev` - 100 trees output from SET by running `make set_trees`
+ * `grammar-cs.set` or `grammar-en.set` - the grammar used in running SET
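
The language switch described in the steps above can also be done without opening an editor. The following is a minimal sketch, assuming (as the step states) that the first line of `Makefile` is the `LANGUAGE=` assignment; the helper name `set_language` is hypothetical and not part of the course materials.

```shell
# Hypothetical helper: rewrite the first line of a Makefile to LANGUAGE=<code>.
# Assumes the first line holds the LANGUAGE assignment.
set_language() {
    case "$2" in
        cs|en) ;;                                   # the two languages named above
        *) echo "unsupported language: $2" >&2; return 1 ;;
    esac
    # -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed
    sed -i.bak "1s/.*/LANGUAGE=$2/" "$1"
}

# Usage (run inside ukol_ia161-parsing):
#   set_language Makefile en
```

The original file is kept as `Makefile.bak`, so the change is easy to undo.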
 == Assignment ==
 . Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
+. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the Czech grammar are in the [raw-attachment:tagset.pdf Brno tagset].
 . Develop a better grammar by repeating this cycle:
 {{{
 edit grammar.set # use your favourite editor
+nano grammar.set # or use your favourite editor
 make set_trees
 make compare
 }}}
  to improve the original UAS
 . Write the final UAS in `grammar.set`
+. Write the final UAS in `grammar-cs.set` or `grammar-en.set`
 {{{
 # This is the SET grammar for Czech used in IA161 course
+#
 # ===========   resulting UAS =  66.1 %  ===================
+# ===========   resulting UAS =  66.9 %  ===================
 }}}
 . Upload your `grammar.set` to the homework vault.
+. Upload your `grammar-cs.set` or `grammar-en.set` to the homework vault.
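
For orientation, UAS (unlabeled attachment score) is the percentage of tokens whose predicted head matches the head in the gold standard tree. The sketch below only illustrates that calculation; it is not the script behind `make compare`. It assumes a simplified, hypothetical input format (two text files with one head index per token per line), which is not the format of the course's tree files, and the function name `uas` is likewise hypothetical.

```shell
# uas GOLD PRED
# GOLD and PRED hold one head index per line, one line per token
# (hypothetical simplified format, for illustration only).
uas() {
    paste "$1" "$2" | awk '
        # count only lines where both files supply a head index
        NF == 2 { total++; if ($1 == $2) correct++ }
        END     { if (total > 0) printf "%.1f\n", 100 * correct / total }'
}

# Usage:
#   uas gold_heads.txt predicted_heads.txt
```

With gold heads `2 0 2 3` and predicted heads `2 0 2 2`, three of the four tokens are attached correctly, so the function prints `75.0`.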