Parsing of Czech: Between Rules and Stats
IA161 Advanced NLP Course, Course Guarantor: Aleš Horák
Prepared by: Miloš Jakubíček
State of the Art
References
- PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
- CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
- DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.
Practical Session
- Go to http://ske.fi.muni.cz, log in and create a shadow copy of the Czech Wikipedia corpus by clicking on "Create grammar development corpus".
- Develop your own sketch grammar that will capture the following semantic relations in this corpus: hypernymy/hyponymy and meronymy/holonymy (hint: use the DUAL directive). Optionally, you can develop more relations (e.g. "is-defined-as"). Read the related documentation. Start with a couple of simple CQL queries that you pretest in the interface.
- You can iteratively expand the grammar, upload it into the system, have the system compute word sketches and review the results.
- When you are happy with the grammar, log on to the alba.fi.muni.cz server and use the dumpws command to export the content of the word sketch database:
  dumpws /corpora/ca/user_data/<YOUR_USERNAME_IN_SKETCH_ENGINE>/registry/<YOUR_CORPUS_ID>
- Process the output of dumpws with a simple Bash or Python script to select the 100 most salient headword-collocation pairs for each relation. Upload the resulting list into the IS vault.
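
The selection step can be sketched in Python roughly as follows. This is only a minimal sketch under a stated assumption: it supposes that each line of the dumpws output is tab-separated with the columns headword, relation, collocate and salience score, in that order. That layout is hypothetical and is not taken from the dumpws documentation, so inspect the real output first and adjust the column handling accordingly. The function names top_pairs and format_pairs are illustrative only.

```python
import heapq
from collections import defaultdict


def top_pairs(lines, n=100):
    """Return the n highest-scoring (score, headword, collocate) tuples per relation.

    ASSUMPTION: each input line is tab-separated as
    headword <TAB> relation <TAB> collocate <TAB> score.
    This column layout is hypothetical; check the actual dumpws output.
    """
    by_relation = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip lines that do not fit the assumed layout
        headword, relation, collocate, raw_score = fields[:4]
        try:
            score = float(raw_score)
        except ValueError:
            continue  # skip lines whose score column is not a number
        by_relation[relation].append((score, headword, collocate))
    # nlargest returns the pairs sorted from the highest score downwards
    return {rel: heapq.nlargest(n, pairs) for rel, pairs in by_relation.items()}


def format_pairs(top):
    """Render the selection as tab-separated lines, grouped by relation name."""
    out = []
    for relation in sorted(top):
        for score, headword, collocate in top[relation]:
            out.append(f"{relation}\t{headword}\t{collocate}\t{score:g}")
    return out
```

To use it, save the dumpws output to a file, open that file in a short script, pass the file object to top_pairs, and write the lines returned by format_pairs to the list you upload. Grouping by relation before ranking matters, because the task asks for the top 100 pairs of each relation separately rather than the top 100 overall.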