= Parsing of Czech: Between Rules and Stats =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

== State of the Art ==

=== References ===

 1. PEI, Wenzhe; GE, Tao; CHANG, Baobao. An Effective Neural Network Model for Graph-based Dependency Parsing. In: Proc. of ACL. 2015.
 1. CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using a Web-based Evaluation Tool. In: Proc. of ACL. 2015.
 1. DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.

== Practical Session ==

We will develop and adjust the grammar of the SET parser.

 1. Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with the evaluation dataset]]:
 {{{
wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
 }}}
 1. Unzip the downloaded file:
 {{{
unzip ukol_ia161-parsing.zip
 }}}
 1. Go to the unzipped folder:
 {{{
cd ukol_ia161-parsing
 }}}
 1. Test the prepared program that analyses 100 selected sentences:
 {{{
make set_trees
make compare
 }}}
 The output should be:
 {{{
./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
UAS = 66.1 %
 }}}
 The reported UAS (unlabeled attachment score) is the percentage of tokens whose assigned head matches the gold standard; an illustrative sketch of the metric is given at the bottom of this page.[[br]]
 You can see a detailed, sentence-by-sentence evaluation with
 {{{
make compare SENTENCES=1
 }}}
 You can view the differences for a single tree with
 {{{
make diff SENTENCE=00009
 }}}
 Exit the diff by pressing `q`.[[br]]
 You can view the two trees graphically (`python-qt4` must be installed on the system) with
 {{{
make view SENTENCE=00009
 }}}
 You can easily extract the text of a sentence (e.g. for Google Translate) with
 {{{
make text SENTENCE=00009
 }}}
 1. Look at the files:
   * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format; the tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
   * `data/trees/pdt2_etest` - 100 gold-standard dependency trees from the Prague Dependency Treebank
   * `data/trees/set_pdt2_etest-sel100` - 100 trees output by SET when running `make set_trees`
   * `grammar.set` - the grammar used when running SET

== Assignment ==

 1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are described in the [raw-attachment:tagset.pdf Brno tagset].
 1. Develop a better grammar by repeating the process
 {{{
edit grammar.set    # use your favourite editor
make set_trees
make compare
 }}}
 until you improve on the original UAS.
 1. Record the final UAS in a comment in `grammar.set`:
 {{{
# This is the SET grammar for Czech used in the IA161 course
#
# =========== resulting UAS = 66.1 % ===================
 }}}
 1. Upload your `grammar.set` to the homework vault.
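
For reference, the sketch below illustrates how the UAS metric reported by `make compare` can be computed. It is not the bundled `compare_dep_trees.py`; the simplified tree representation (a dict mapping token id to head id) is an assumption made only for this example.
 {{{
#!/usr/bin/env python3
# Illustrative UAS computation: the percentage of tokens whose
# predicted head matches the gold-standard head.
# NOTE: a hypothetical, simplified sketch, not the actual
# compare_dep_trees.py; each tree is assumed to be a dict
# mapping token id -> head id (0 = sentence root).

def uas(gold_trees, predicted_trees):
    """Compute UAS over parallel lists of {token_id: head_id} dicts."""
    correct = 0
    total = 0
    for gold, pred in zip(gold_trees, predicted_trees):
        for token_id, gold_head in gold.items():
            total += 1
            if pred.get(token_id) == gold_head:
                correct += 1
    return 100.0 * correct / total if total else 0.0

if __name__ == "__main__":
    # Toy example: two 3-token sentences; one head in the second tree is wrong.
    gold = [{1: 2, 2: 0, 3: 2}, {1: 0, 2: 1, 3: 1}]
    pred = [{1: 2, 2: 0, 3: 2}, {1: 0, 2: 1, 3: 2}]
    print("UAS = %.1f %%" % uas(gold, pred))  # prints: UAS = 83.3 %
 }}}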