Changes between Version 31 and Version 32 of private/NlpInPracticeCourse/ParsingCzech
Timestamp: Nov 5, 2023, 8:46:00 PM
--- private/NlpInPracticeCourse/ParsingCzech (v31)
+++ private/NlpInPracticeCourse/ParsingCzech (v32)
@@ -17,5 +17,5 @@
 == Practical Session ==
 
-We will develop/adjust the grammar of the SET parser.
+We will develop/adjust the grammar of the SET parser (for English or Czech).
 
 1. Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with evaluation dataset]]
@@ -31,11 +31,11 @@
 cd ukol_ia161-parsing
 }}}
-1. Choose the language you want to work with. The default is Czech (`cs`) which can be changed to English (`en`) via editing `Makefile`:
+1. [optional] Choose the language you want to work with. The default is English (`en`) which can be changed to Czech (`cs`) via editing `Makefile`:
 {{{
 nano Makefile
 }}}
-change the first line to
+if you want to work with Czech, change the first line to
 {{{
-LANGUAGE=en
+LANGUAGE=cs
 }}}
 1. Test the prepared program that analyses 100 selected sentences
@@ -46,6 +46,6 @@
 The output should be
 {{{
-./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
-UAS = 66.1%
+./compare_dep_trees.py data/trees/ud21_gum_dev data/trees/set_ud21_gum_dev
+UAS = 55.4 %
 }}}
 You can see detailed evaluation (sentence by sentence) with
@@ -55,29 +55,31 @@
 You can watch differences for one tree with
 {{{
-make diff SENTENCE=00009
+make diff SENTENCE=academic_librarians-10
 }}}
+The left window with `ud21_gum_dev/academic_librarians-10` shows the
+expected ground truth, the right window of `set_ud21_gum_dev/academic_librarians-10` displays the current parsing result (to be improved by you).[[br]]
 Exit the diff by pressing `q`.[[br]]
 You may inspect the tagged vertical text with
 {{{
-make vert SENTENCE=00009
+make vert SENTENCE=academic_librarians-10
 }}}
 You can watch the two trees with (`python3-tk` must be installed in the system)
 {{{
-make view SENTENCE=00009
+make view SENTENCE=academic_librarians-10
 }}}
-For remote tree view, you may run
+For remote tree view (i.e. inspecting the trees on different computer), you may run
 {{{
-make html SENTENCE=00009
+make html SENTENCE=academic_librarians-10
 }}}
 And point your browser to the `html/index.html` file. [[br]]
 You can extract the text of the sentence easily with
 {{{
-make text SENTENCE=00009
+make text SENTENCE=academic_librarians-10
 }}}
 English translation of the Czech sentences can be obtained via
 {{{
-make texttrans SENTENCE=00009
+make texttrans SENTENCE=academic_librarians-10
 }}}
-1. Look at the files:
+1. Look at the files (you may use `mc` file manager, exit it with `Esc+0`):
 * `data/vert/pdt2_etest` or `ud21_gum_dev` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset] for Czech and the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset] for English
 * `data/trees/pdt2_etest` or `ud21_gum_dev` - 100 gold standard dependency trees from the Prague Dependency Treebank or the Universal Dependencies GUM corpus
@@ -87,8 +89,8 @@
 == Assignment ==
 
-1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the Czech grammar are in the [raw-attachment:tagset.pdf Brno tagset].
+1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the English `grammar-en.set` follow the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset] and in the Czech grammar `grammar-cs.set` the [raw-attachment:tagset.pdf Brno tagset].
 1. Develop better grammar - repeat the process:
 {{{
-nano grammar.set # or use your favourite editor
+nano grammar-en.set # or use your favourite editor
 make set_trees
 make compare
@@ -97,5 +99,5 @@
 1. Write the final UAS in `grammar-cs.set` or `grammar-en.set`
 {{{
-# This is the SET grammar for Czech used in IA161 course
+# This is the SET grammar for English used in IA161 course
 #
 # =========== resulting UAS = 66.9 % ===================
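The evaluation step reports an unlabeled attachment score (UAS). This is the share of tokens whose head in the parser's tree matches the head in the gold-standard tree. The Python sketch below shows one way to compute it. It is not the code of `compare_dep_trees.py`. It assumes each tree is given as a list of head indices, with 0 for the root. That representation is chosen for illustration only and is not the format of the files in `data/trees/`. The function name `uas` is hypothetical. This page does not say whether the course script averages over tokens (as here) or over sentences, or whether it skips punctuation.

```python
"""Hypothetical sketch of an unlabeled attachment score (UAS) computation.

Assumption: each tree is a list of head indices, one per token
(1-based token positions, 0 marks the root). This is an illustrative
representation, not the file format read by compare_dep_trees.py.
"""

from typing import Sequence


def uas(gold: Sequence[Sequence[int]], predicted: Sequence[Sequence[int]]) -> float:
    """Return UAS in percent, micro-averaged over all tokens of all sentences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_heads, pred_heads in zip(gold, predicted):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("sentence length mismatch between gold and predicted trees")
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))
        total += len(gold_heads)
    return 100.0 * correct / total if total else 0.0


if __name__ == "__main__":
    # Toy data: two sentences, 7 tokens in total.
    gold = [[2, 0, 2], [0, 1, 1, 3]]
    pred = [[2, 0, 1], [0, 1, 3, 3]]
    print(f"UAS = {uas(gold, pred):.1f} %")
```

On the toy data, 5 of the 7 heads match, so the script prints `UAS = 71.4 %`.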
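The assignment repeats the same cycle: edit the grammar, run `make set_trees`, then run `make compare`. The small helper below is a hypothetical sketch that reruns both targets and extracts the score from the output. It makes two assumptions. First, it must be run from the `ukol_ia161-parsing` directory. Second, `make compare` must print a line of the form `UAS = 55.4 %`. The page shows both the `66.1%` and `55.4 %` spellings, so the pattern accepts an optional space. `parse_uas` and `rebuild_and_score` are illustrative names and are not part of the course package.

```python
"""Hypothetical helper for the grammar development loop (not part of the course package).

Assumes it is run inside the ukol_ia161-parsing directory and that
`make compare` prints a line such as "UAS = 55.4 %".
"""

import re
import subprocess
from typing import Optional

# Accepts both "UAS = 66.1%" and "UAS = 55.4 %".
UAS_PATTERN = re.compile(r"UAS\s*=\s*(\d+(?:\.\d+)?)\s*%")


def parse_uas(output: str) -> Optional[float]:
    """Return the last UAS value found in the command output, or None."""
    matches = UAS_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


def rebuild_and_score() -> Optional[float]:
    """Re-parse the sentences with the current grammar and return the UAS."""
    # Regenerate the SET trees from the edited grammar file.
    subprocess.run(["make", "set_trees"], check=True)
    # Compare against the gold trees and capture the report.
    result = subprocess.run(["make", "compare"], check=True, capture_output=True, text=True)
    print(result.stdout, end="")
    return parse_uas(result.stdout)


if __name__ == "__main__":
    score = rebuild_and_score()
    print("UAS not found in output" if score is None else f"current UAS: {score:.1f} %")
```

Taking the last match lets the helper tolerate extra lines that `make` echoes before the report. The command output is echoed unchanged, so the sentence-level details remain visible.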