Version 1 (modified by Ales Horak, 5 years ago)
copied from private/AdvancedNlpCourse/ParsingCzech

Parsing of Czech: Between Rules and Stats

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.

We will develop/adjust the grammar of the SET parser.

Download the prepared archive
```
wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
```

Unzip the downloaded file
```
unzip ukol_ia161-parsing.zip
```
Go to the unzipped folder
```
cd ukol_ia161-parsing
```
Test the prepared program that analyses 100 selected sentences
```
make set_trees
make compare
```
The output should be
```
./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
UAS =  66.1 %
```
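UAS (unlabeled attachment score) is the percentage of tokens whose head in the parser output matches the head in the gold standard tree. The following is a minimal sketch of that computation, not the actual code of compare_dep_trees.py. It assumes that each tree is already available as a list of head indices (one per token, 0 for the root) and that the score is micro-averaged over all tokens; the function name `uas` is hypothetical.

```python
def uas(gold_trees, parsed_trees):
    """Return the unlabeled attachment score in percent.

    Each tree is a list of head indices, one per token (0 = root).
    The score is micro-averaged: every token counts equally.
    """
    correct = 0
    total = 0
    for gold, parsed in zip(gold_trees, parsed_trees):
        if len(gold) != len(parsed):
            raise ValueError("trees must have the same number of tokens")
        # A token is correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold, parsed))
        total += len(gold)
    return 100.0 * correct / total if total else 0.0


# Two toy sentences (made-up data): 3 of the 5 heads agree.
gold = [[2, 0, 2], [0, 1]]
parsed = [[2, 0, 1], [0, 0]]
print(f"UAS = {uas(gold, parsed):5.1f} %")
```

Because only the head of each token is compared, dependency labels play no role in this score.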
You can see detailed evaluation (sentence by sentence) with
```
make compare SENTENCES=1
```
You can view the differences for one tree with
```
make diff SENTENCE=00009
```
Exit the diff by pressing q.
You can view the two trees with (python-qt4 must be installed on the system)
```
make view SENTENCE=00009
```
You can extract the text of the sentence (e.g. for Google Translate) easily with
```
make text SENTENCE=00009
```
Look at the files:
- data/vert/pdt2_etest-sel100 - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank positional tagset
- data/trees/pdt2_etest - 100 gold standard dependency trees from the Prague Dependency Treebank
- data/trees/set_pdt2_etest-sel100 - 100 trees output from SET by running make set_trees
- grammar.set - the grammar used in running SET
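
A positional tag of the Prague Dependency Treebank is a string of 15 characters in which each position encodes one morphological category and `-` means "not applicable". The sketch below decodes such a tag into named categories. The function name `decode_positional_tag` and the short category names are hypothetical, and nothing is assumed here about how the columns of the vertical file are laid out.

```python
# Names of the 15 positions of a PDT positional tag, in order.
POSITIONS = [
    "pos", "subpos", "gender", "number", "case",
    "possgender", "possnumber", "person", "tense", "grade",
    "negation", "voice", "reserve1", "reserve2", "variant",
]


def decode_positional_tag(tag):
    """Map a 15-character positional tag to {category: value}.

    Positions filled with '-' (not applicable) are left out.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected 15 characters, got {len(tag)}: {tag!r}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


# 'NNFS1-----A----': noun, feminine, singular, nominative, affirmative.
print(decode_positional_tag("NNFS1-----A----"))
```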

Study the SET documentation. The tags used in the grammar are in the Brno tagset.
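
Tags in the Brno (attributive) tagset are written as a sequence of attribute-value pairs, for example `k1gFnSc1` (part of speech 1 = noun, gender F, number S, case 1). The following is a minimal sketch for splitting such a tag. It assumes that every attribute is a single lowercase letter followed by a one-character value, and it rejects tags that do not fit this pattern rather than guessing; the helper name `split_brno_tag` is hypothetical.

```python
import re

# One attribute letter followed by a one-character value (an assumption).
PAIR = re.compile(r"([a-z])([A-Z0-9])")


def split_brno_tag(tag):
    """Split an attributive tag such as 'k1gFnSc1' into {attribute: value}."""
    pairs = PAIR.findall(tag)
    # Make sure the matched pairs cover the whole tag, with nothing skipped.
    if "".join(attr + value for attr, value in pairs) != tag:
        raise ValueError(f"tag does not fit the attribute-value pattern: {tag!r}")
    return dict(pairs)


print(split_brno_tag("k1gFnSc1"))
```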

Develop a better grammar. Repeat the following process to improve on the original UAS:
```
edit grammar.set # use your favourite editor
make set_trees
make compare
```

Write the final UAS in grammar.set:
```
# This is the SET grammar for Czech used in IA161 course
#
# ===========   resulting UAS =  66.1 %  ===================
```
