Changes between Initial Version and Version 1 of en/AdvancedNlpCourse2019/ParsingCzech


Timestamp:
Oct 1, 2020, 3:33:53 PM
Author:
Ales Horak
Comment:

copied from private/AdvancedNlpCourse/ParsingCzech

  • en/AdvancedNlpCourse2019/ParsingCzech

    v1 v1  
     1= Parsing of Czech: Between Rules and Stats =
     2
     3[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák
     4
     5Prepared by: Miloš Jakubíček
     6
     7== State of the Art ==
     8
     9=== References ===
     10
     11 1. PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
     12 1. CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
     13 1. DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.
     14
     15== Practical Session ==
     16
     17We will develop/adjust the grammar of the SET parser.
     18
     191. Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with evaluation dataset]]
     20{{{
     21wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
     22}}}
     231. Unzip the downloaded file
     24{{{
     25unzip ukol_ia161-parsing.zip
     26}}}
     271. Go to the unzipped folder
     28{{{
     29cd ukol_ia161-parsing
     30}}}
     311. Test the prepared program that analyses 100 selected sentences
     32{{{
     33make set_trees
     34make compare
     35}}}
     36 The output should be
     37{{{
     38./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
     39UAS =  66.1 %
     40}}}
     41 You can see a detailed evaluation (sentence by sentence) with
     42{{{
     43make compare SENTENCES=1
     44}}}
     45 You can view the differences for one tree with
     46{{{
     47make diff SENTENCE=00009
     48}}}
     49 Exit the diff by pressing `q`.[[br]]
     50 You can view the two trees with the following command (`python-qt4` must be installed on the system)
     51 {{{
     52make view SENTENCE=00009
     53}}}
     54 You can easily extract the text of the sentence (e.g. for Google Translate) with
     55 {{{
     56make text SENTENCE=00009
     57}}}
     581. Look at the files:
     59 * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
     60 * `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
     61 * `data/trees/set_pdt2_etest-sel100` - 100 trees produced by SET when running `make set_trees`
     62 * `grammar.set` - the grammar used in running SET
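
The UAS (unlabeled attachment score) printed by `make compare` is the percentage of tokens whose head in the SET output matches the head in the gold standard tree. The following is a minimal Python sketch of that calculation, not the code of `compare_dep_trees.py`. It assumes each tree is already loaded as a list of head indices (one per token, `0` for the root); the function name `uas` and this in-memory representation are hypothetical, since the on-disk tree format is not described here.

```python
def uas(gold_trees, predicted_trees):
    """Return the unlabeled attachment score in percent.

    Each tree is a list of head indices, one per token (assumed format).
    """
    if len(gold_trees) != len(predicted_trees):
        raise ValueError("both sets must contain the same number of trees")

    total = 0
    correct = 0
    for gold, predicted in zip(gold_trees, predicted_trees):
        if len(gold) != len(predicted):
            raise ValueError("trees of one sentence must have the same length")
        total += len(gold)
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold, predicted))

    return 100.0 * correct / total if total else 0.0


# Two toy sentences: 5 tokens in total, 4 of them attached correctly.
gold = [[2, 0, 2], [0, 1]]
predicted = [[2, 0, 1], [0, 1]]
print(f"UAS = {uas(gold, predicted):5.1f} %")
```

Because the score is counted over tokens rather than sentences, fixing a rule that affects long sentences can move the UAS more than fixing several short ones.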
     63
     64== Assignment ==
     65
     661. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
     671. Develop a better grammar by repeating this process:
     68{{{
     69edit grammar.set # use your favourite editor
     70make set_trees
     71make compare
     72}}}
     73 to improve on the original UAS.
     741. Write the final UAS in `grammar.set`
     75{{{
     76# This is the SET grammar for Czech used in IA161 course
     77#
     78# ===========   resulting UAS =  66.1 %  ===================
     79}}}
     801. Upload your `grammar.set` to the homework vault.
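
Recording the final UAS in the header can be scripted. The sketch below is one possible way to do it in Python, under the assumption that the output of `make compare` contains a line of the form `UAS =  66.1 %` and that `grammar.set` keeps the `resulting UAS` comment line shown above. The helper names `parse_uas` and `update_header` are hypothetical and are not part of the SET distribution.

```python
import re

# Matches e.g. "UAS =  66.1 %" and captures the number.
UAS_RE = re.compile(r"UAS\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*%")


def parse_uas(compare_output):
    """Extract the UAS value (in percent) from the text printed by the comparison."""
    match = UAS_RE.search(compare_output)
    if match is None:
        raise ValueError("no UAS line found in the output")
    return float(match.group(1))


def update_header(grammar_text, uas):
    """Return grammar_text with the UAS in the 'resulting UAS' comment replaced."""
    lines = grammar_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("#") and "resulting UAS" in line:
            lines[i] = UAS_RE.sub(f"UAS = {uas:5.1f} %", line, count=1)
            return "".join(lines)
    raise ValueError("no 'resulting UAS' comment line found")


output = "./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100\nUAS =  66.1 %\n"
header = "# ===========   resulting UAS =  66.1 %  ===================\n"
print(parse_uas(output))
print(update_header(header, 70.3), end="")
```

Both helpers work on strings only, so reading the output of `make compare` and writing `grammar.set` back to disk are left to the caller.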