Changes between Initial Version and Version 1 of en/NlpInPracticeCourse/2021/ParsingCzech


Timestamp: Aug 30, 2022, 10:40:23 AM
Author: Ales Horak
Comment: copied from private/NlpInPracticeCourse/ParsingCzech

= Parsing of Czech: Between Rules and Stats =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/NlpInPracticeCourse|NLP in Practice Course]], Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

== State of the Art ==

=== References ===

 1. Zhang, Y., Zhou, H., & Li, Z. (2020). Fast and Accurate Neural CRF Constituency Parsing. arXiv preprint arXiv:2008.03736.
 1. Qi, P., Dozat, T., Zhang, Y., & Manning, C. D. (2019). Universal dependency parsing from scratch. arXiv preprint arXiv:1901.10457.
 1. Straka, M., Straková, J., & Hajič, J. (2019). Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER. In International Conference on Text, Speech, and Dialogue (pp. 137-150). Springer, Cham.
 1. Baisa, V., & Kovář, V. (2014). Information extraction for Czech based on syntactic analysis. In Vetulani, Z. and Mariani, J., editors, Human Language Technology Challenges for Computer Science and Linguistics (pp. 155-165). Springer International Publishing.

== Practical Session ==

We will develop and adjust the grammar of the SET parser.

1. Download the [[htdocs:bigdata/ukol_ia161-parsing.zip|SET parser with evaluation dataset]]
{{{
wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
}}}
1. Unzip the downloaded file
{{{
unzip ukol_ia161-parsing.zip
}}}
1. Go to the unzipped folder
{{{
cd ukol_ia161-parsing
}}}
1. Test the prepared program that analyses 100 selected sentences
{{{
make set_trees
make compare
}}}
 The output should be
{{{
./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
UAS =  66.1 %
}}}
 You can see a detailed evaluation (sentence by sentence) with
{{{
make compare SENTENCES=1
}}}
 You can view the differences for one tree with
{{{
make diff SENTENCE=00009
}}}
 Exit the diff by pressing `q`.[[br]]
 You can view the two trees with (`python-qt4` must be installed on the system)
 {{{
make view SENTENCE=00009
}}}
 For a remote tree view, you may run
 {{{
make html SENTENCE=00009
}}}
 and point your browser to the `html/index.html` file. [[br]]
 You can extract the text of the sentence with
 {{{
make text SENTENCE=00009
}}}
 An English translation of the Czech sentence can be obtained via
 {{{
make texttrans SENTENCE=00009
}}}
1. Look at the files:
 * `data/vert/pdt2_etest-sel100` - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank [https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/m-layer/html/ch02s02s01.html positional tagset]
 * `data/trees/pdt2_etest` - 100 gold standard dependency trees from the Prague Dependency Treebank
 * `data/trees/set_pdt2_etest-sel100` - 100 trees output by SET when running `make set_trees`
 * `grammar.set` - the grammar used when running SET

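The positional tags in the vertical file can be hard to read at first. As a reading aid, here is a minimal Python sketch that splits one 15-character PDT 2.0 positional tag into its named positions. The helper name `decode_tag` and the sample tag are illustrative assumptions and are not part of the provided package; the sketch also does not assume anything about the column layout of the vertical file, it only takes a tag string.

```python
# Names of the 15 positions of a PDT 2.0 positional tag (m-layer manual).
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]


def decode_tag(tag):
    """Return {position name: value} for all positions that are set (not '-').

    Hypothetical helper for illustration; not part of the SET package.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(
            f"expected {len(POSITIONS)} characters, got {len(tag)}"
        )
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


if __name__ == "__main__":
    # Illustrative tag (an assumption, not taken from the dataset):
    # noun, masculine animate, singular, nominative, affirmative.
    for name, value in decode_tag("NNMS1-----A----").items():
        print(f"{name:10s} {value}")
```

The meaning of each single-character value is listed in the positional tagset manual linked above.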
== Assignment ==

1. Study the [https://nlp.fi.muni.cz/trac/set/wiki/documentation SET documentation]. The tags used in the grammar are in the [raw-attachment:tagset.pdf Brno tagset].
1. Develop a better grammar to improve on the original UAS by repeating this cycle:
{{{
edit grammar.set # use your favourite editor
make set_trees
make compare
}}}
1. Write the final UAS into the header of `grammar.set`
{{{
# This is the SET grammar for Czech used in IA161 course
#
# ===========   resulting UAS =  66.1 %  ===================
}}}
1. Upload your `grammar.set` to the homework vault.
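
UAS (unlabeled attachment score) is commonly defined as the percentage of tokens whose predicted head matches the head in the gold standard tree. The Python sketch below illustrates that definition only. The input format (one list of head indices per sentence, with 0 for the root) and the function name `uas` are assumptions made for the example; the provided `compare_dep_trees.py` may differ in details such as how it reads the tree files, whether punctuation is counted, or how scores are averaged.

```python
def uas(gold_sentences, predicted_sentences):
    """Unlabeled attachment score in percent, counted over all tokens.

    Each sentence is a list of head indices, one per token (0 = root).
    This input format is an assumption made for the illustration.
    """
    total = 0
    correct = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        if len(gold) != len(predicted):
            raise ValueError("gold and predicted sentences differ in length")
        total += len(gold)
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold, predicted))
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return 100.0 * correct / total


if __name__ == "__main__":
    # Toy data: two sentences, five tokens, one wrongly attached token.
    gold = [[2, 0, 2], [0, 1]]
    predicted = [[2, 0, 1], [0, 1]]
    print(f"UAS = {uas(gold, predicted):5.1f} %")
```

Under this definition, every token that a grammar change attaches to the correct head raises the score, so the sentence-by-sentence output of `make compare SENTENCES=1` is a good place to look for rules worth fixing.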