= Parsing of Czech: Between Rules and Stats =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

== State of the Art ==

=== References ===

 1. PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
 1. CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
 1. DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.

== Practical Session ==

In this session we will develop and adjust the grammar of the SET parser.

 1. Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with evaluation dataset]]:
 {{{
wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
 }}}
 1. Unzip the downloaded file:
 {{{
unzip ukol_ia161-parsing.zip
 }}}
 1. Go to the unzipped folder:
 {{{
cd ukol_ia161-parsing
 }}}
 1. Run the prepared program, which analyses 100 selected sentences:
 {{{
make set_trees
make compare
 }}}
 The output should be:
 {{{
./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
UAS =  66.1 %
 }}}
 You can see a detailed, sentence-by-sentence evaluation with
 {{{
make compare SENTENCES=1
 }}}
 You can view the differences for a single tree with
 {{{
make diff SENTENCE=00009
 }}}
 Exit the diff by pressing `q`.[[br]]
 You can view the two trees (`python-qt4` must be installed on the system) with
 {{{
make view SENTENCE=00009
 }}}
 You can easily extract the text of the sentence (e.g. for Google Translate) with
 {{{
make text SENTENCE=00009
 }}}
 1. Look at the files:
   * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
   * `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
   * `data/trees/set_pdt2_etest-sel100` - 100 trees produced by SET when running `make set_trees`
   * `grammar.set` - the grammar used in running SET

== Assignment ==

 1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
 1. Develop a better grammar by repeating the following cycle:
 {{{
edit grammar.set # use your favourite editor
make set_trees
make compare
 }}}
 The goal is to improve on the original UAS.
 1. Write the final UAS in `grammar.set`:
 {{{
# This is the SET grammar for Czech used in IA161 course
#
# =========== resulting UAS =  66.1 % ===================
 }}}
 1. Upload your `grammar.set` to the homework vault.
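
For orientation, the UAS (unlabeled attachment score) reported by `make compare` is the percentage of tokens whose head in the parser output matches the head in the gold standard tree. The following is only a minimal sketch of that computation, not the code of `compare_dep_trees.py`. It assumes that each tree is given as a plain list of head indices (1-based, with 0 for the root) and that the score is aggregated over all tokens of all sentences. The function name `uas` and the example data are hypothetical.

```python
def uas(gold_trees, predicted_trees):
    """Return the unlabeled attachment score in percent.

    Each tree is a list of head indices, one per token (assumed
    representation: 1-based token positions, 0 marks the root).
    """
    total = 0
    correct = 0
    for gold, pred in zip(gold_trees, predicted_trees):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted tree differ in length")
        total += len(gold)
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold, pred))
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return 100.0 * correct / total


# Hypothetical 5-token sentence: the parser attaches token 4 to the wrong head.
gold = [[2, 0, 2, 5, 3]]
pred = [[2, 0, 2, 3, 3]]
print(f"UAS = {uas(gold, pred):.1f} %")  # 4 of 5 heads match
```

Under these assumptions every grammar change that fixes one head attachment raises the score by one token's share of the total, which is why the sentence-by-sentence view is useful for finding the most frequent errors.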