AutomaticCorrection

Timestamp:: Dec 8, 2017, 7:06:02 PM (8 years ago)
Author:: Ales Horak
Comment:: --

private/NlpInPracticeCourse/AutomaticCorrection

-                      v21
+                      v22
 . [#task2 rule based grammar checker (punctuation) for Czech]
 === Statistical spell checker for English === #task1
+== Task 1: Statistical spell checker for English == #task1
 In the theoretical lesson we became acquainted with various approaches to spelling correction. Now we will look at how a simple spellchecker based on '''edit distance''' works.
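As an illustration of the edit-distance idea, the following is a minimal sketch that generates every string one edit away from a given word (one deletion, transposition, replacement, or insertion). The function name `edits1` and the lowercase English alphabet are assumptions made for this example; the course's `spell.py` may organise this differently.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return the set of strings at edit distance 1 from `word` (sketch)."""
    # Every way to split the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Drop the first character of the right part.
    deletes = [a + b[1:] for a, b in splits if b]
    # Swap the first two characters of the right part.
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    # Replace the first character of the right part with any letter.
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    # Insert any letter between the left and the right part.
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
```

A corrector built on this idea would keep only those candidates that are known words and prefer the most frequent one.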
 …
 ==== Spellchecker functionality with examples ====
+=== Spellchecker functionality with examples ===
 . The spellchecker is '''trained''' from the file `big.txt`, which is a concatenation of several public domain books from '''Project Gutenberg''' and lists of the most frequent words from '''Wiktionary''' and the '''British National Corpus'''. The function `train` stores how many times each word occurs in the text file. `NWORDS[w]` holds a count of how many times the word `w` '''has been seen'''.
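The training step described above can be sketched as follows. This is a minimal sketch, not the course's `spell.py`: the helper `words`, the use of `collections.Counter`, and the inline `sample` text (standing in for the contents of `big.txt`) are assumptions for illustration, and the real script may differ, for example in how unseen words are handled.

```python
import re
from collections import Counter


def words(text):
    """Split text into lowercase alphabetic tokens (assumed tokenisation)."""
    return re.findall(r"[a-z]+", text.lower())


def train(features):
    """Count how many times each word occurs."""
    return Counter(features)


# Hypothetical stand-in for the contents of big.txt.
sample = "The quick brown fox. The lazy dog; the end."
NWORDS = train(words(sample))
```

With this sample, `NWORDS["the"]` is 3, and a word that never occurred, such as `NWORDS["cat"]`, yields 0.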
 …
 ==== Task 1 ====
+=== Task 1 ===
 . Create `<YOUR_FILE>`, a text file named `ia161-UCO-13.txt` where UCO is your university ID.
 …
 ==== Upload `<YOUR_FILE>` and edited `spell.py` ====
+=== Upload `<YOUR_FILE>` and edited `spell.py` ===
 === Rule based grammar checker (punctuation) for Czech === #task2
+== Task 2: Rule based grammar checker (punctuation) for Czech == #task2
 The second task option consists of adapting a specific syntactic grammar of Czech to improve the results of ''punctuation detection'', i.e. the placement of ''commas'' at the required positions in a sentence.
 ==== Task 2 ====
+=== Task 2 ===
 . log in to aurora: `ssh aurora`
 …
 }}}
 . edit the grammar `punct.set` and add 1-2 rules to increase the coverage by 10%
+ You may need to go through the general information about the [https://nlp.fi.muni.cz/trac/set/wiki/documentation#Rulesstructure SET grammar format]. Information about adapting the grammar for the task of ''punctuation detection'' can be found in this [raw-attachment:tsd2014.pdf published paper].
+ The current best results achieved with an extended grammar are 91.2 % precision and 55 % recall.
 . upload the modified `punct.set` and the respective `results.txt`.
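For reference, precision and recall of comma placement can be computed from counts of correct, spurious, and missed commas. The sketch below assumes the standard definitions; the function name `precision_recall` and the counts in the usage note are hypothetical and are not taken from the course's evaluation script or from `results.txt`.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts (standard definitions)."""
    predicted = true_positives + false_positives  # commas the grammar inserted
    expected = true_positives + false_negatives   # commas in the reference text
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall
```

For example, with hypothetical counts of 55 correctly placed commas, 5 spurious ones, and 45 missed ones, `precision_recall(55, 5, 45)` gives a precision of 55/60 and a recall of 55/100.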