Changes between Version 30 and Version 31 of private/NlpInPracticeCourse/ParsingCzech
- Timestamp: Nov 10, 2022, 11:27:03 PM
```diff
--- private/NlpInPracticeCourse/ParsingCzech (version 30)
+++ private/NlpInPracticeCourse/ParsingCzech (version 31)
@@ -31,4 +31,12 @@
 cd ukol_ia161-parsing
 }}}
+1. Choose the language you want to work with. The default is Czech (`cs`) which can be changed to English (`en`) via editing `Makefile`:
+{{{
+nano Makefile
+}}}
+change the first line to
+{{{
+LANGUAGE=en
+}}}
 1. Test the prepared program that analyses 100 selected sentences
 {{{
@@ -50,5 +58,9 @@
 }}}
 Exit the diff by pressing `q`.[[br]]
-You can watch the two trees with (`python-qt4` must be installed in the system)
+You may inspect the tagged vertical text with
+{{{
+make vert SENTENCE=00009
+}}}
+You can watch the two trees with (`python3-tk` must be installed in the system)
 {{{
 make view SENTENCE=00009
@@ -68,24 +80,24 @@
 }}}
 1. Look at the files:
-* `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
-* `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
-* `data/trees/set_pdt2_etest-sel100` - 100 trees output from SET by running `make set_trees`
-* `grammar.set` - the grammar used in running SET
+* `data/vert/pdt2_etest` or `ud21_gum_dev` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset] for Czech and the [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank tagset] for English
+* `data/trees/pdt2_etest` or `ud21_gum_dev` - 100 gold standard dependency trees from the Prague Dependency Treebank or the Universal Dependencies GUM corpus
+* `data/trees/set_pdt2_etest` or `set_ud21_gum_dev` - 100 trees output from SET by running `make set_trees`
+* `grammar-cs.set` or `grammar-en.set` - the grammar used in running SET
 
 == Assignment ==
 
-1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
+1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the Czech grammar are in the [raw-attachment:tagset.pdf Brno tagset].
 1. Develop better grammar - repeat the process:
 {{{
-edit grammar.set #use your favourite editor
+nano grammar.set # or use your favourite editor
 make set_trees
 make compare
 }}}
 to improve the original UAS
-1. Write the final UAS in `grammar.set`
+1. Write the final UAS in `grammar-cs.set` or `grammar-en.set`
 {{{
 # This is the SET grammar for Czech used in IA161 course
 #
-# =========== resulting UAS = 66.1% ===================
+# =========== resulting UAS = 66.9% ===================
 }}}
-1. Upload your `grammar.set` to the homework vault.
+1. Upload your `grammar-cs.set` or `grammar-en.set` to the homework vault.
```
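Version 31 adds an English option that is selected by editing the first line of `Makefile`. If you prefer a non-interactive edit to opening `nano`, the following is a minimal shell sketch. It assumes (the page does not confirm this) that line 1 of `Makefile` has the form `LANGUAGE=<code>`, and the function name `set_language` is hypothetical.

```shell
# Hypothetical helper: rewrite line 1 of a makefile to LANGUAGE=<code>.
# Assumes line 1 already starts with "LANGUAGE="; any other line 1 is left as-is.
set_language() {
  makefile=$1   # path to the Makefile, e.g. ./Makefile
  lang=$2       # language code: cs or en
  # Write to a temporary file and move it back, because the in-place
  # flag of sed (-i) behaves differently on GNU and BSD systems.
  sed "1s/^LANGUAGE=.*/LANGUAGE=$lang/" "$makefile" > "$makefile.tmp" &&
    mv "$makefile.tmp" "$makefile"
}
```

For example, `set_language Makefile en` switches to English and `set_language Makefile cs` switches back. With GNU make you could instead leave the file untouched and run e.g. `make set_trees LANGUAGE=en`, since variables assigned on the command line override ordinary assignments in a makefile; whether that is enough here depends on how this particular `Makefile` uses `LANGUAGE`, which the page does not show.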
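The assignment loop (`make set_trees`, then `make compare`) is meant to raise the UAS that is written into the grammar header. UAS, the unlabeled attachment score, is the percentage of tokens whose predicted head is the same as in the gold-standard tree, regardless of the dependency label. The sketch below only illustrates that arithmetic; it is not the course's `make compare` script, and both its input format (two aligned files with one head index per line) and the function name `uas` are hypothetical.

```shell
# Hypothetical helper: unlabeled attachment score (UAS) from two files.
# Assumed input: one head index per line (0 = root), tokens in the same
# order in both files; lines blank in both files (sentence breaks) are skipped.
uas() {
  gold=$1   # gold-standard heads
  pred=$2   # heads predicted by the parser
  # paste joins the two files line by line; awk splits on whitespace.
  paste "$gold" "$pred" | awk '
    NF == 0 { next }                     # blank line in both files
    { total++; if ($1 == $2) correct++ } # count tokens and correct heads
    END {
      if (total == 0) { print "no tokens"; exit 1 }
      printf "%.1f%%\n", 100 * correct / total
    }'
}
```

For instance, gold heads `2 0 2 2` against predicted heads `2 0 1 2` give 3 correct heads out of 4, so `uas gold.txt pred.txt` prints `75.0%`.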