Changes between Version 45 and Version 46 of private/NlpInPracticeCourse/AutomaticCorrection


Timestamp: Dec 11, 2023, 10:18:24 PM (5 months ago)
Author: Ales Horak
Comment: --

  • private/NlpInPracticeCourse/AutomaticCorrection

    v45 v46
     19  19
     20  20  == Practical Session ==
         21  {{{
         22  #!div class="wiki-toc" style="width: 40%"
         23  **Note:** If you are new to the [https://en.wikipedia.org/wiki/Command-line_interface command line interface] via a [https://en.wikipedia.org/wiki/Terminal_emulator terminal window], you may find the **[https://ubuntu.com/tutorials/command-line-for-beginners#3-opening-a-terminal tutorial for working in terminal]** useful.
         24  }}}
     21  25
     22  26  There are 2 tasks, you may choose one or both:
     
     33  37
     34  38
     35       1. Download the prepared script  [[raw-attachment:spell.py|spell.py]] and the training data collection  [[raw-attachment:big.txt|big.txt]].
     36       1. Test the script by running `python ./spell.py` in your working directory.
     37       1. Open it in your favourite editor and we will walk through its functionality.
         39   1. Download [htdocs:bigdata/task_ia161-spell.zip task_ia161-spell.zip] with a prepared script  `spell.py` and a training data collection  `big.txt`. Unzip it and change to the contained directory.
         40   {{{
         41  wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/task_ia161-spell.zip
         42  unzip task_ia161-spell.zip
         43  cd task_ia161-spell
         44  }}}
         45   1. Test the script by running
         46   {{{
         47  python spell.py
         48  }}}
         49   1. Open `spell.py` in your favourite editor and we will walk through its functionality.
     38  50
     39  51
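
The contents of `spell.py` are not part of this diff, so the following is only an illustrative sketch of how a corrector trained on a text collection such as `big.txt` might work. It assumes a frequency-based approach with candidates at edit distance 1. The names `words`, `edits1`, and `correct` and the inline training sentence are hypothetical and stand in for whatever the real script and data contain.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return word if known, else the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing better is known, leave the word unchanged
    return max(candidates, key=counts.__getitem__)


# Assumption: a tiny inline text replaces the real training collection.
counts = Counter(words("the spelling of the word is checked against the training text"))
print(correct("speling", counts))
print(correct("teh", counts))
```

With a real collection, `counts` would be built from the file contents instead of the inline sentence; the quality of the corrections depends directly on how well that collection covers the vocabulary.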
     
     93 105  === Task 2 ===
     94 106
     95      1. login to aurora: `ssh aurora`
     96      1. download:
     97         1. [raw-attachment:punct.set syntactic grammar] for punctuation detection for the [http://nlp.fi.muni.cz/projects/set SET parser]
     98         1. [raw-attachment:test-nopunct.txt testing text with no commas]
     99         1. [raw-attachment:eval-gold.txt evaluation text with correct punctuation]
    100         1. [raw-attachment:evalpunct_robust.py evaluation script] which computes recall and precision with both texts
        107  1. login to asteria04: `ssh asteria04`
        108  1. download [htdocs:bigdata/task_ia161-grammar.zip task_ia161-grammar.zip] containing:
        109     1. `punct.set`, the syntactic grammar for punctuation detection for the [http://nlp.fi.muni.cz/projects/set SET parser]
        110     1. `test-nopunct.txt` - testing text with no commas
        111     1. `eval-gold.txt` - evaluation text with correct punctuation
        112     1. `evalpunct_robust.py` - evaluation script which computes recall and precision with both texts
    101 113  {{{
    102      mkdir ia161-grammar
    103      cd ia161-grammar
    104      wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/punct.set
    105      wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/test-nopunct.txt
    106      wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/eval-gold.txt
    107      wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/evalpunct_robust.py
        114  wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/task_ia161-grammar.zip
        115  unzip task_ia161-grammar.zip
        116  cd task_ia161-grammar
    108 117  }}}
    109 118  1. run the parser to fill punctuation to the testing text
     
    118 127  1. evaluate the result
    119 128   {{{
    120      PYTHONIOENCODING=UTF-8 python evalpunct_robust.py eval-gold.txt test.txt > results.txt; \
        129  python evalpunct_robust.py eval-gold.txt test.txt > results.txt; \
    121 130  cat results.txt
    122 131  }}}
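
The diff does not show how `evalpunct_robust.py` computes its scores. As a rough sketch of what precision and recall mean for comma insertion, the code below compares comma positions in a gold text and a predicted text. It assumes both texts contain the same words and differ only in commas; the function names `comma_slots` and `comma_precision_recall` are hypothetical and are not taken from the actual evaluation script.

```python
def comma_slots(text):
    """Return the tokens of text and the set of token indices followed by a comma."""
    tokens, slots = [], set()
    for raw in text.split():
        word = raw.rstrip(",")
        if not word:
            # A free-standing comma is attached to the previous token.
            if tokens:
                slots.add(len(tokens) - 1)
            continue
        tokens.append(word)
        if raw.endswith(","):
            slots.add(len(tokens) - 1)
    return tokens, slots


def comma_precision_recall(gold_text, predicted_text):
    """Precision and recall of predicted comma positions against the gold text."""
    gold_tokens, gold = comma_slots(gold_text)
    pred_tokens, pred = comma_slots(predicted_text)
    if gold_tokens != pred_tokens:
        raise ValueError("texts differ in more than commas")
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = "Yes, I think, that it works."
predicted = "Yes, I think that it works."
print(comma_precision_recall(gold, predicted))
```

In this example the single predicted comma is correct (precision 1.0), but one of the two gold commas is missing (recall 0.5). A grammar that inserts commas cautiously tends toward this pattern, while an overly eager one loses precision instead.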