Changes between Version 30 and Version 31 of private/NlpInPracticeCourse/ParsingCzech


Timestamp:
Nov 10, 2022, 11:27:03 PM (17 months ago)
Author:
Ales Horak

  • private/NlpInPracticeCourse/ParsingCzech

    v30 v31  
    3131cd ukol_ia161-parsing
    3232}}}
     331. Choose the language you want to work with. The default is Czech (`cs`), which can be changed to English (`en`) by editing the `Makefile`:
     34{{{
     35nano Makefile
     36}}}
     37 Change the first line to:
     38{{{
     39LANGUAGE=en
     40}}}
    33411. Test the prepared program that analyses 100 selected sentences
    3442{{{
     
    5058}}}
    5159 Exit the diff by pressing `q`.[[br]]
    52  You can view the two trees with the following command (`python-qt4` must be installed on the system)
     60 You may inspect the tagged vertical text with
     61 {{{
     62 make vert SENTENCE=00009
     63}}}
     64 You can view the two trees with the following command (`python3-tk` must be installed on the system)
    5365 {{{
    5466make view SENTENCE=00009
     
    6880}}}
    69811. Look at the files:
    70  * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
    71  * `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
    72  * `data/trees/set_pdt2_etest-sel100` - 100 trees output from SET by running `make set_trees`
    73  * `grammar.set` - the grammar used in running SET
     82 * `data/vert/pdt2_etest` or `ud21_gum_dev` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset] for Czech and the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset] for English
     83 * `data/trees/pdt2_etest` or `ud21_gum_dev` - 100 gold standard dependency trees from the Prague Dependency Treebank or the Universal Dependencies GUM corpus
     84 * `data/trees/set_pdt2_etest` or `set_ud21_gum_dev` - 100 trees output from SET by running `make set_trees`
     85 * `grammar-cs.set` or `grammar-en.set` - the grammar used in running SET
    7486
    7587== Assignment ==
    7688
    77 1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
     891. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the Czech grammar are in the [raw-attachment:tagset.pdf Brno tagset].
    78901. Develop a better grammar by repeating the process:
    7991{{{
    80 edit grammar.set # use your favourite editor
     92nano grammar.set # or use your favourite editor
    8193make set_trees
    8294make compare
    8395}}}
    8496 to improve the original UAS
    85 1. Write the final UAS in `grammar.set`
     971. Write the final UAS in `grammar-cs.set` or `grammar-en.set`
    8698{{{
    8799# This is the SET grammar for Czech used in IA161 course
    88100#
    89 # ===========   resulting UAS =  66.1 %  ===================
     101# ===========   resulting UAS =  66.9 %  ===================
    90102}}}
    91 1. Upload your `grammar.set` to the homework vault.
     1031. Upload your `grammar-cs.set` or `grammar-en.set` to the homework vault.
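
The UAS reported in the grammar header is the unlabeled attachment score: the percentage of tokens whose predicted head matches the head in the gold standard tree. The page does not show how `make compare` computes it, so the following is only a minimal sketch of the standard definition. The function name and the toy head lists are hypothetical and are not taken from the course scripts.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Return the UAS in percent for two equally long lists of head indices.

    Each list holds, for every token, the index of its head (0 = root).
    This follows the textbook definition; the course tooling may differ,
    for example in how it treats punctuation.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted trees must have the same number of tokens")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    # Count the tokens whose predicted head equals the gold head.
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return 100.0 * correct / len(gold_heads)


# Hypothetical five-token sentence: the parser attaches token 4 to the wrong head.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]
uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.1f} %")
```

For a whole test set, the correct and total token counts would typically be summed over all sentences before dividing, rather than averaging the per-sentence scores.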