= Parsing of Czech: Between Rules and Stats =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161 Advanced NLP Course]], Course Guarantee: Aleš Horák

Prepared by: Miloš Jakubíček

== State of the Art ==

=== References ===

 1. PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
 1. CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
 1. DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.

== Practical Session ==

 1. Go to http://ske.fi.muni.cz, login and create a shadow copy of the Czech Wikipedia corpus by clicking on ￼"Create grammar development corpus".
 1. Develop your own sketch grammar that will capture the following semantic relations in this corpus: hypernymy/hyponymy, meronymy/holonymy (hint: use {{{DUAL}}} directive), optionally you can develop more relations (e.g. "is-defined-as").
    Read related [https://www.sketchengine.co.uk/writing-sketch-grammars/ documentation]. Start with a couple of simple CQL queries that you pretest in the interface.
 1. You can iteratively expand the grammar, upload it into the system, have the system compute word sketches and review the results
 1. When you are happy with the grammar, logon to the {{{alba.fi.muni.cz}}} server and use the {{{dumpws}}} command to export the content of the word sketch database:

    {{{dumpws /corpora/ca/user_data/<YOUR_USERNAME_IN_SKETCH_ENGINE>/registry/<YOUR_CORPUS_ID>}}}
 5. Process the output of {{{dumpws}}} with a simple Bash or Python script to select first 100 most salient headword-collocation pairs for each relation. Upload the resulting list into the IS vault.