
Version 20 (modified by Ales Horak, 5 years ago)

Parsing of Czech: Between Rules and Stats

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

State of the Art

References

  1. PEI, Wenzhe; GE, Tao; CHANG, Baobao. An effective neural network model for graph-based dependency parsing. In: Proc. of ACL. 2015.
  2. CHOI, Jinho D.; TETREAULT, Joel; STENT, Amanda. It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool. In: Proc. of ACL. 2015.
  3. DURRETT, Greg; KLEIN, Dan. Neural CRF Parsing. In: Proc. of ACL. 2015.

Practical Session

We will develop and adjust the grammar of the SET parser.

  1. Download the SET parser together with the evaluation dataset
    wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/ukol_ia161-parsing.zip
    
  2. Unzip the downloaded file
    unzip ukol_ia161-parsing.zip
    
  3. Go to the unzipped folder
    cd ukol_ia161-parsing
    
  4. Run the prepared program, which analyses 100 selected sentences, and evaluate its output
    make set_trees
    make compare
    
    The output should be
    ./compare_dep_trees.py data/trees/pdt2_etest data/trees/set_pdt2_etest-sel100
    UAS =  66.1 %
    
    You can see a detailed evaluation (sentence by sentence) with
    make compare SENTENCES=1
    
    You can view the differences for a single tree with
    make diff SENTENCE=00009
    
    Exit the diff by pressing q.
  5. Look at the files:
    • data/vert/pdt2_etest-sel100 - 100 input sentences in vertical format. The tag format is the Prague Dependency Treebank positional tagset
    • data/trees/pdt2_etest - 100 gold standard dependency trees from the Prague Dependency Treebank
    • data/trees/set_pdt2_etest-sel100 - 100 trees output from SET by running make set_trees
    • grammar.set - the grammar used in running SET
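
The UAS (unlabeled attachment score) printed by make compare is the percentage of tokens whose predicted head matches the head in the gold standard tree. The following is a minimal sketch of that computation, not the code of compare_dep_trees.py: it assumes each tree is already available as a list of head indices (one per token, 0 for the root), and the function name uas and the toy data are hypothetical. The real script may differ in details, for example in how it reads the tree files.

```python
from typing import Sequence


def uas(gold: Sequence[Sequence[int]], predicted: Sequence[Sequence[int]]) -> float:
    """Return the unlabeled attachment score in percent.

    Each tree is assumed to be a sequence of head indices, one per token
    (an illustrative representation, not the on-disk format used by SET).
    """
    if len(gold) != len(predicted):
        raise ValueError("different number of sentences")
    correct = 0
    total = 0
    for gold_heads, pred_heads in zip(gold, predicted):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("token count mismatch within a sentence")
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
        total += len(gold_heads)
    return 100.0 * correct / total if total else 0.0


# Toy data: two sentences, five tokens, one wrongly attached token.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
print(f"UAS = {uas(gold, pred):5.1f} %")  # prints "UAS =  80.0 %"
```

Because the score is computed over all tokens together, a grammar change that fixes a long sentence moves the UAS more than one that fixes a short sentence.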

Assignment

  1. Study the SET documentation. The tags used in the grammar are in the Brno tagset.
  2. Develop a better grammar. Repeat the following cycle until the UAS improves on the original value:
    edit grammar.set # use your favourite editor
    make set_trees
    make compare
    
  3. Write the final UAS into the comment header of grammar.set
    # This is the SET grammar for Czech used in IA161 course
    # 
    # ===========   resulting UAS =  66.1 %  ===================
    
  4. Upload your grammar.set to the homework vault.
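
Recording the final UAS in the grammar header can also be done programmatically. The sketch below is only an illustration: the helper update_uas_header is hypothetical and not part of the SET distribution, and it assumes the header line keeps the "resulting UAS = ... %" wording shown in step 3.

```python
import re

# Matches the number in a line such as
# "# ===========   resulting UAS =  66.1 %  ==================="
UAS_LINE = re.compile(r"(resulting UAS =\s*)\d+(?:\.\d+)?(\s*%)")


def update_uas_header(grammar_text: str, new_uas: float) -> str:
    """Replace the first recorded UAS value in the grammar text."""
    updated, count = UAS_LINE.subn(
        lambda m: f"{m.group(1)}{new_uas:.1f}{m.group(2)}",
        grammar_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no 'resulting UAS' line found")
    return updated


header = "# ===========   resulting UAS =  66.1 %  ===================\n"
print(update_uas_header(header, 68.4), end="")
```

To apply it to the real file, read grammar.set as text, pass the contents through the function with the value reported by make compare, and write the result back. The value 68.4 above is a placeholder, not a measured result.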
