Changes between Version 45 and Version 46 of private/NlpInPracticeCourse/AutomaticCorrection
Timestamp: Dec 11, 2023, 10:18:24 PM (5 months ago)
private/NlpInPracticeCourse/AutomaticCorrection
--- v45
+++ v46
@@ -19,4 +19,8 @@
 
 == Practical Session ==
+{{{
+#!div class="wiki-toc" style="width: 40%"
+**Note:** If you are new to the [https://en.wikipedia.org/wiki/Command-line_interface command line interface] via a [https://en.wikipedia.org/wiki/Terminal_emulator terminal window], you may find the **[https://ubuntu.com/tutorials/command-line-for-beginners#3-opening-a-terminal tutorial for working in terminal]** useful.
+}}}
 
 There are 2 tasks, you may choose one or both:
@@ -33,7 +37,15 @@
 
 
-1. Download the prepared script [[raw-attachment:spell.py|spell.py]] and the training data collection [[raw-attachment:big.txt|big.txt]].
-1. Test the script by running `python ./spell.py` in your working directory.
-1. Open it in your favourite editor and we will walk through its functionality.
+1. Download [htdocs:bigdata/task_ia161-spell.zip task_ia161-spell.zip] with a prepared script `spell.py` and a training data collection `big.txt`. Unzip it and change to the contained directory.
+{{{
+wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/task_ia161-spell.zip
+unzip task_ia161-spell.zip
+cd task_ia161-spell
+}}}
+1. Test the script by running
+{{{
+python spell.py
+}}}
+1. Open `spell.py` in your favourite editor and we will walk through its functionality.
 
 
@@ -93,17 +105,14 @@
 === Task 2 ===
 
-1. login to aurora: `ssh aurora`
-1. download:
-   1. [raw-attachment:punct.set syntactic grammar] for punctuation detection for the [http://nlp.fi.muni.cz/projects/set SET parser]
-   1. [raw-attachment:test-nopunct.txt testing text with no commas]
-   1. [raw-attachment:eval-gold.txt evaluation text with correct punctuation]
-   1. [raw-attachment:evalpunct_robust.py evaluation script] which computes recall and precision with both texts
+1. login to asteria04: `ssh asteria04`
+1. download [htdocs:bigdata/task_ia161-grammar.zip task_ia161-grammar.zip] containing:
+   1. `punct.set`, the syntactic grammar for punctuation detection for the [http://nlp.fi.muni.cz/projects/set SET parser]
+   1. `test-nopunct.txt` - testing text with no commas
+   1. `eval-gold.txt` - evaluation text with correct punctuation
+   1. `evalpunct_robust.py` - evaluation script which computes recall and precision with both texts
 {{{
-mkdir ia161-grammar
-cd ia161-grammar
-wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/punct.set
-wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/test-nopunct.txt
-wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/eval-gold.txt
-wget https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/private/NlpInPracticeCourse/AutomaticCorrection/evalpunct_robust.py
+wget https://nlp.fi.muni.cz/trac/research/chrome/site/bigdata/task_ia161-grammar.zip
+unzip task_ia161-grammar.zip
+cd task_ia161-grammar
 }}}
 1. run the parser to fill punctuation to the testing text
@@ -118,5 +127,5 @@
 1. evaluate the result
 {{{
-PYTHONIOENCODING=UTF-8 python evalpunct_robust.py eval-gold.txt test.txt > results.txt; \
+python evalpunct_robust.py eval-gold.txt test.txt > results.txt; \
 cat results.txt
 }}}
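
The evaluation step in Task 2 reports precision and recall of the inserted commas against the gold text. As a rough illustration only, the two measures can be sketched in Python as below. This is not the code of `evalpunct_robust.py`: the function names `comma_positions` and `precision_recall` are hypothetical, and the sketch assumes both texts contain the same words in the same order and differ only in where commas appear.

```python
def comma_positions(text):
    """Return the set of word indices that are directly followed by a comma.

    Hypothetical helper: words are whitespace-separated tokens and
    commas are the only punctuation treated specially.
    """
    positions = set()
    index = -1  # index of the most recently seen word
    for token in text.replace(",", " , ").split():
        if token == ",":
            if index >= 0:  # ignore a comma that precedes any word
                positions.add(index)
        else:
            index += 1
    return positions


def precision_recall(gold_text, predicted_text):
    """Compare comma placement in two texts that share the same words."""
    gold = comma_positions(gold_text)
    predicted = comma_positions(predicted_text)
    correct = len(gold & predicted)
    # Precision: share of predicted commas that are also in the gold text.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of gold commas that were predicted.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = "Yes, I think, that it works."
    predicted = "Yes, I think that, it works."
    print(precision_recall(gold, predicted))
```

In the example, one of the two predicted commas matches a gold comma, so both measures are one half. A robust evaluation script would presumably also have to cope with texts whose tokens do not align exactly, which this sketch does not attempt.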