ParsingCzech

Timestamp: Oct 1, 2020, 3:33:53 PM
Author: Ales Horak
Comment: copied from private/AdvancedNlpCourse/ParsingCzech

en/AdvancedNlpCourse2019/ParsingCzech

                       v1
+= Parsing of Czech: Between Rules and Stats =
+[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák
+Prepared by: Miloš Jakubíček
+== State of the Art ==
+=== References ===
+. PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
+. CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
+. DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.
+== Practical Session ==
+We will develop/adjust the grammar of the SET parser.
+. Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with evaluation dataset]]
+{{{
+wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
+}}}
+. Unzip the downloaded file
+{{{
+unzip ukol_ia161-parsing.zip
+}}}
+. Go to the unzipped folder
+{{{
+cd ukol_ia161-parsing
+}}}
+. Test the prepared program that analyses 100 selected sentences
+{{{
+make set_trees
+make compare
+}}}
+ The output should be
+{{{
+./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
+UAS =  66.1 %
+}}}
+ You can see detailed evaluation (sentence by sentence) with
+{{{
+make compare SENTENCES=1
+}}}
+ You can inspect the differences for a single tree with
+{{{
+make diff SENTENCE=00009
+}}}
+ Exit the diff by pressing `q`.[[br]]
+ You can view the two trees with the following command (`python-qt4` must be installed on the system)
+ {{{
+make view SENTENCE=00009
+}}}
+ You can easily extract the text of the sentence (e.g. for Google Translate) with
+ {{{
+make text SENTENCE=00009
+}}}
+. Look at the files:
+ * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
+ * `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
+ * `data/trees/set_pdt2_etest-sel100` - 100 trees output from SET by running `make set_trees`
+ * `grammar.set` - the grammar used in running SET
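
When reading the tags in the vertical input file, it can help to split a PDT positional tag into its 15 named positions. The following is a minimal sketch, not part of the downloaded package: the function name `decode_tag` and the sample tag are illustrative assumptions, and the position names follow the PDT 2.0 m-layer manual linked above.

```python
# Names of the 15 positions of a PDT 2.0 positional tag (m-layer manual).
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]


def decode_tag(tag):
    """Map a positional tag to {position name: value}.

    Positions holding '-' (category not applicable) are left out.
    decode_tag is a hypothetical helper, not a function of the SET package.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(
            f"expected {len(POSITIONS)} characters, got {len(tag)}: {tag!r}"
        )
    return {name: ch for name, ch in zip(POSITIONS, tag) if ch != "-"}


if __name__ == "__main__":
    # Illustrative tag: noun, feminine, singular, nominative, affirmative.
    for name, value in decode_tag("NNFS1-----A----").items():
        print(f"{name:10s} {value}")
```

Note that this covers only the positional tags of the input data; the grammar itself uses the Brno tagset, which has a different, attribute-based layout.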
+== Assignment ==
+. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
+. Develop a better grammar: repeat the process
+{{{
+edit grammar.set # use your favourite editor
+make set_trees
+make compare
+}}}
+ to improve the original UAS
+. Write your final UAS into the header of `grammar.set`
+{{{
+# This is the SET grammar for Czech used in IA161 course
+#
+# ===========   resulting UAS =  66.1 %  ===================
+}}}
+. Upload your `grammar.set` to the homework vault.
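
For orientation when comparing grammar versions: UAS (unlabeled attachment score) is the percentage of tokens whose predicted head matches the head in the gold standard tree. The sketch below shows that calculation on toy data. It is an assumption-laden illustration, not the code of `compare_dep_trees.py`: the function name `uas` and the list-of-head-indices representation are made up here, and the real script may read a different file format or treat some tokens (e.g. punctuation) differently.

```python
def uas(gold_sentences, predicted_sentences):
    """Return the unlabeled attachment score in percent.

    Each argument is a list of sentences; each sentence is a list holding,
    for every token, the index of its head (0 = artificial root).
    This representation is an assumption made for the example.
    """
    if len(gold_sentences) != len(predicted_sentences):
        raise ValueError("different number of sentences")
    total = 0
    correct = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        if len(gold) != len(predicted):
            raise ValueError("sentences differ in token count")
        total += len(gold)
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold, predicted))
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return 100.0 * correct / total


if __name__ == "__main__":
    # Toy data: two sentences, five tokens, one wrongly attached token.
    gold = [[2, 0, 2], [0, 1]]
    predicted = [[2, 0, 1], [0, 1]]
    print(f"UAS = {uas(gold, predicted):5.1f} %")
```

Because the score is counted per token over the whole test set, fixing an attachment pattern that recurs in many of the 100 sentences moves the score more than fixing a rare construction.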