Changes between Version 21 and Version 22 of private/NlpInPracticeCourse/AutomaticCorrection
- Timestamp: Dec 8, 2017, 7:06:02 PM
private/NlpInPracticeCourse/AutomaticCorrection
v21 v22
24 24 2. [#task2 rule based grammar checker (punctuation) for Czech]
25 25
- 26 === Statistical spell checker for English === #task1
+ 26 == Task 1: Statistical spell checker for English == #task1
27 27
28 28 In the theoretical lesson we have become acquainted with various approaches to how spelling correctors work. Now we will get to know how a simple spellchecker based on '''edit distance''' works.
…
38 38
39 39
- 40 ==== Spellchecker functionality with examples ====
+ 40 === Spellchecker functionality with examples ===
41 41
42 42 1. The spellchecker is '''trained''' from the file `big.txt`, which is a concatenation of several public domain books from '''Project Gutenberg''' and lists of the most frequent words from '''Wiktionary''' and the '''British National Corpus'''. The function `train` stores how many times each word occurs in the text file. `NWORDS[w]` holds a count of how many times the word '''w has been seen'''.
…
75 75
76 76
- 77 ==== Task 1 ====
+ 77 === Task 1 ===
78 78 1. Create `<YOUR_FILE>`, a text file named `ia161-UCO-13.txt` where UCO is your university ID.
79 79
…
87 87
88 88
- 89 ==== Upload `<YOUR_FILE>` and edited `spell.py` ====
+ 89 === Upload `<YOUR_FILE>` and edited `spell.py` ===
90 90
- 91 === Rule based grammar checker (punctuation) for Czech === #task2
+ 91 == Task 2: Rule based grammar checker (punctuation) for Czech == #task2
92 92
93 93 The second task choice consists in adapting a specific syntactic grammar of Czech to improve the results of ''punctuation detection'', i.e. the placement of ''commas'' in the requested positions in a sentence.
94 94
- 95 ==== Task 2 ====
+ 95 === Task 2 ===
96 96
97 97 1. log in to aurora: `ssh aurora`
…
116 116 }}}
117 117 1. edit the grammar `punct.set` and add 1-2 rules to increase the coverage by 10%
+ 118 You may need to go through general information about the [https://nlp.fi.muni.cz/trac/set/wiki/documentation#Rulesstructure SET grammar format]. Information about adapting the grammar for the task of ''punctuation detection'' can be found in this [raw-attachment:tsd2014.pdf published paper].
+ 119
+ 120 Current best results achieved with an extended grammar are 91.2 % precision and 55 % recall.
118 121 1. upload the modified `punct.set` and the respective `results.txt`.
119 122
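
The training step described in the spellchecker section (a function `train` that fills `NWORDS` with word counts) can be sketched roughly as follows. This is only an illustrative sketch, not necessarily the code in the course's `spell.py`: the helper `words`, the lowercase alphabetic tokenization, and the use of `collections.Counter` are assumptions, and a short in-memory string stands in for `big.txt`.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into alphabetic tokens.

    Hypothetical helper: the tokenization rule is an assumption.
    """
    return re.findall(r"[a-z]+", text.lower())


def train(tokens):
    """Count how many times each token occurs."""
    return Counter(tokens)


# A tiny sample stands in for big.txt (assumption for illustration only).
sample = "The cat sat on the mat. The end."
NWORDS = train(words(sample))

# NWORDS[w] is the number of times the word w has been seen;
# a Counter returns 0 for words that never occurred.
for w in ("the", "cat", "dog"):
    print(w, NWORDS[w])
```

To train on a real corpus, the sample string would be replaced by the contents of the corpus file read from disk.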