ParsingCzech

Timestamp:: Nov 5, 2023, 8:46:00 PM (20 months ago)
Author:: Ales Horak
Comment:: --

private/NlpInPracticeCourse/ParsingCzech

-                      v31
+                      v32
 == Practical Session ==
 We will develop/adjust the grammar of the SET parser.
+We will develop/adjust the grammar of the SET parser (for English or Czech).
 . Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with evaluation dataset]]
 …
 cd ukol_ia161-parsing
 }}}
 . Choose the language you want to work with. The default is Czech (`cs`) which can be changed to English (`en`) via editing `Makefile`:
+. [optional] Choose the language you want to work with. The default is English (`en`) which can be changed to Czech (`cs`) via editing `Makefile`:
 {{{
 nano Makefile
 }}}
  change the first line to
+ if you want to work with Czech, change the first line to
 {{{
 LANGUAGE=en
+LANGUAGE=cs
 }}}
 . Test the prepared program that analyses 100 selected sentences
 …
  The output should be
 {{{
 ./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
 UAS =  66.1 %
+./compare_dep_trees.py data/trees/ud21_gum_dev data/trees/set_ud21_gum_dev
+UAS =  55.4 %
 }}}
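 The UAS (unlabeled attachment score) reported above is the percentage of tokens whose predicted head matches the gold-standard head. The following is only a minimal sketch of that computation, not the code of `compare_dep_trees.py`: the function name `uas` and the in-memory representation (one list of head indices per sentence, with 0 for the root) are assumptions made for illustration, and the real script reads the tree files under `data/trees/`.

```python
from typing import Sequence


def uas(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Return the unlabeled attachment score in percent.

    Each sentence is a list of head indices, one per token
    (hypothetical representation; 0 marks the root).
    """
    if len(gold) != len(pred):
        raise ValueError("different number of sentences")
    total = 0
    correct = 0
    for g_heads, p_heads in zip(gold, pred):
        if len(g_heads) != len(p_heads):
            raise ValueError("sentence length mismatch")
        total += len(g_heads)
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(1 for g, p in zip(g_heads, p_heads) if g == p)
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct / total


# Toy data: two sentences, five tokens, one wrongly attached token.
gold_heads = [[2, 0, 2], [0, 1]]
pred_heads = [[2, 0, 1], [0, 1]]
print(f"UAS = {uas(gold_heads, pred_heads):5.1f} %")  # prints: UAS =  80.0 %
```

 Since the score is aggregated over all tokens, a grammar change that fixes attachments in long sentences moves the UAS more than one that only affects short sentences.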
  You can see detailed evaluation (sentence by sentence) with
 …
  You can watch differences for one tree with
 {{{
 make diff SENTENCE=00009
+make diff SENTENCE=academic_librarians-10
 }}}
+ The left window (`ud21_gum_dev/academic_librarians-10`) shows the
+ expected ground truth; the right window (`set_ud21_gum_dev/academic_librarians-10`) displays the current parsing result (to be improved by you).[[br]]
  Exit the diff by pressing `q`.[[br]]
  You may inspect the tagged vertical text with
  {{{
  make vert SENTENCE=00009
+ make vert SENTENCE=academic_librarians-10
 }}}
  You can watch the two trees with (`python3-tk` must be installed in the system)
  {{{
 make view SENTENCE=00009
+make view SENTENCE=academic_librarians-10
 }}}
  For remote tree view, you may run
+ For remote tree view (i.e. inspecting the trees on a different computer), you may run
  {{{
 make html SENTENCE=00009
+make html SENTENCE=academic_librarians-10
 }}}
  And point your browser to the `html/index.html` file. [[br]]
  You can extract the text of the sentence easily with
  {{{
 make text SENTENCE=00009
+make text SENTENCE=academic_librarians-10
 }}}
  English translation of the Czech sentences can be obtained via
  {{{
 make texttrans SENTENCE=00009
+make texttrans SENTENCE=academic_librarians-10
 }}}
 . Look at the files:
+. Look at the files (you may use the `mc` file manager, exit it with `Esc+0`):
  * `data/vert/pdt2_etest` or `ud21_gum_dev` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset] for Czech and the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset] for English.
  * `data/trees/pdt2_etest` or `ud21_gum_dev` - 100 gold standard dependency trees from the Prague Dependency Treebank or the Universal Dependencies GUM corpus
 …
 == Assignment ==
 . Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the Czech grammar are in the [raw-attachment:tagset.pdf Brno tagset].
+. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the English grammar `grammar-en.set` follow the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset]; those in the Czech grammar `grammar-cs.set` follow the [raw-attachment:tagset.pdf Brno tagset].
 . Develop a better grammar - repeat the process:
 {{{
 nano grammar.set # or use your favourite editor
+nano grammar-en.set # or use your favourite editor
 make set_trees
 make compare
 …
 . Write the final UAS in `grammar-cs.set` or `grammar-en.set`
 {{{
 # This is the SET grammar for Czech used in IA161 course
+# This is the SET grammar for English used in IA161 course
+#
 # ===========   resulting UAS =  66.9 %  ===================