Parsing of Czech: Between Rules and Stats
IA161 Advanced NLP Course, Course Guarantee: Aleš Horák
Prepared by: Miloš Jakubíček
State of the Art
References
- PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
- CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
- DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.
Practical Session
We will develop/adjust the grammar of the SET parser.
- Download the SET parser with evaluation dataset
wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
- Unzip the downloaded file
unzip ukol_ia161-parsing.zip
- Go to the unzipped folder
cd ukol_ia161-parsing
- Test the prepared program that analyses 100 selected sentences
make set_trees
make compare
The output should be
./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
UAS = 66.1 %
You can see a detailed evaluation (sentence by sentence) with
make compare SENTENCES=1
You can view the differences for one tree with
make diff SENTENCE=00009
Exit the diff by pressing q.
- Look at the files:
data/vert/pdt2_etest-sel100
- 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank positional tagset
data/trees/pdt2_etest
- 100 gold standard dependency trees from the Prague Dependency Treebank
data/trees/set_pdt2_etest-sel100
- 100 trees output from SET by running make set_trees
grammar.set
- the grammar used in running SET
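The UAS value printed by make compare is the unlabeled attachment score: the share of tokens whose predicted head matches the head in the gold standard tree. The following is a minimal sketch of that idea, not the code of compare_dep_trees.py. It assumes each tree is given as a list of head indices (one per token, 0 for the root), which is a hypothetical representation and not the file format used in data/trees. It also assumes the score is counted over all tokens together rather than averaged per sentence.

```python
from typing import List


def uas(gold_heads: List[List[int]], pred_heads: List[List[int]]) -> float:
    """Return the percentage of tokens whose predicted head equals the gold head.

    Each sentence is a list of head indices (hypothetical representation):
    position i holds the index of the head of token i+1, and 0 marks the root.
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, pred_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentence differ in length")
        # Count tokens attached to the same head in both trees.
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return 100.0 * correct / total if total else 0.0


# Two toy sentences; one of the five tokens has a wrong head.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
score = uas(gold, pred)
print(f"UAS = {score:.1f} %")
```

Because only the head is compared, a grammar change that fixes the attachment of a frequent construction can raise the score even if dependency labels are ignored entirely.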
Assignment
- Study the SET documentation. The tags used in the grammar are in the Brno tagset.
- Develop a better grammar by repeating the process:
edit grammar.set # use your favourite editor
make set_trees
make compare
to improve the original UAS
- Write the final UAS in
grammar.set
# This is the SET grammar for Czech used in IA161 course
#
# =========== resulting UAS = 66.1 % ===================
- Upload your grammar.set to the homework vault.
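When reading or writing grammar rules, it can help to break a Brno tag into its attribute-value pairs. The sketch below assumes that every attribute is a single lowercase letter followed by a one-character value (an uppercase letter or a digit), as in k1gMnSc1. The function name and the small table of attribute names are illustrative only; consult the tagset documentation for the full inventory.

```python
import re
from typing import Dict

# Illustrative subset of attribute names; the full tagset defines more.
ATTRIBUTE_NAMES = {
    "k": "part of speech",
    "g": "gender",
    "n": "number",
    "c": "case",
}


def split_brno_tag(tag: str) -> Dict[str, str]:
    """Split a tag such as 'k1gMnSc1' into {'k': '1', 'g': 'M', ...}.

    Assumes one lowercase attribute letter followed by one value character.
    """
    pairs = re.findall(r"([a-z])([A-Z0-9])", tag)
    # Reject input that does not consist solely of attribute-value pairs.
    if "".join(attr + value for attr, value in pairs) != tag:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return dict(pairs)


parsed = split_brno_tag("k1gMnSc1")
for attr, value in parsed.items():
    print(f"{ATTRIBUTE_NAMES.get(attr, attr)}: {value}")
```

A dictionary keeps the lookup simple when checking which attributes a rule in grammar.set constrains, for example whether it requires agreement in case (c) or only in gender (g) and number (n).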