Changes between Version 21 and Version 22 of private/AdvancedNlpCourse/AutomaticCorrection


Timestamp:
Dec 8, 2017, 7:06:02 PM
Author:
Ales Horak
Comment:

--

  • private/AdvancedNlpCourse/AutomaticCorrection

    v21 v22  
    2424 2. [#task2 rule based grammar checker (punctuation) for Czech]
    2525
    26 === Statistical spell checker for English === #task1
     26== Task 1: Statistical spell checker for English == #task1
    2727
    2828In the theoretical lesson, we became acquainted with various approaches to spelling correction. Now we will look at how a simple spellchecker based on '''edit distance''' works.
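To make the notion of edit distance concrete, here is a minimal sketch of the plain Levenshtein distance (insertions, deletions and substitutions only). The function name `edit_distance` is hypothetical and is not taken from `spell.py`; the course script may define its edits differently, for example by also counting a swap of two adjacent letters as a single edit.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


print(edit_distance("speling", "spelling"))  # 1 (one missing letter)
print(edit_distance("kitten", "sitting"))    # 3
```

A spellchecker of this kind typically considers only known words within a small distance (1 or 2) of the misspelled word as correction candidates.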
     
    3838
    3939
    40 ==== Spellchecker functionality with examples ====
     40=== Spellchecker functionality with examples ===
    4141
    42421. The spellchecker is '''trained''' on the file `big.txt`, which is a concatenation of several public domain books from '''Project Gutenberg''' and lists of the most frequent words from '''Wiktionary''' and the '''British National Corpus'''. The function `train` stores how many times each word occurs in the text file. `NWORDS[w]` holds a count of how many times the word '''w''' has been seen.
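The training step described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `words` and the inline sample text are hypothetical, and the real `train` in `spell.py` reads `big.txt` and may be implemented differently.

```python
import re
from collections import Counter


def words(text):
    """Hypothetical tokenizer: lowercase the text and keep runs of letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def train(features):
    """Count how many times each word occurs in the given word sequence."""
    return Counter(features)


# Assumption: a short inline sample stands in for the contents of big.txt.
sample = "The cat sat on the mat. The end."
NWORDS = train(words(sample))

print(NWORDS["the"])  # 3
print(NWORDS["dog"])  # 0, an unseen word has count zero
```

Using a `Counter` means that looking up an unseen word returns 0 instead of raising an error, which is convenient when ranking candidate corrections by frequency.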
     
    7575
    7676
    77 ==== Task 1 ====
     77=== Task 1 ===
    7878 1. Create `<YOUR_FILE>`, a text file named `ia161-UCO-13.txt` where UCO is your university ID.
    7979
     
    8787
    8888
    89 ==== Upload `<YOUR_FILE>` and edited `spell.py` ====
     89=== Upload `<YOUR_FILE>` and edited `spell.py` ===
    9090
    91 === Rule based grammar checker (punctuation) for Czech === #task2
     91== Task 2: Rule based grammar checker (punctuation) for Czech == #task2
    9292
    9393The second task option consists of adapting a specific syntactic grammar of Czech to improve the results of ''punctuation detection'', i.e. the placement of ''commas'' at the required positions in a sentence.
    9494
    95 ==== Task 2 ====
     95=== Task 2 ===
    9696
    97971. log in to aurora: `ssh aurora`
     
    116116}}}
    1171171. edit the grammar `punct.set` and add 1-2 rules to increase the coverage by 10%
     118 You may need to go through general information about the [https://nlp.fi.muni.cz/trac/set/wiki/documentation#Rulesstructure SET grammar format]. Information about adapting the grammar for the task of ''punctuation detection'' can be found in this [raw-attachment:tsd2014.pdf published paper].
     119
     120 The current best results achieved with an extended grammar are 91.2 % precision and 55 % recall.
    1181211. upload the modified `punct.set` and the respective `results.txt`.
    119122
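For orientation, precision and recall for comma placement can be computed as in the sketch below. It assumes that predicted and correct commas are represented as sets of token positions; this representation and the function name `precision_recall` are illustrative assumptions, and the evaluation that produces `results.txt` may work differently.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted comma positions against gold ones."""
    true_positives = len(predicted & gold)  # commas placed where they belong
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: token positions after which a comma belongs.
gold = {3, 7, 12, 18}
predicted = {3, 12, 20}

p, r = precision_recall(predicted, gold)
print(f"precision = {p:.3f}, recall = {r:.3f}")  # precision = 0.667, recall = 0.500
```

High precision with lower recall, as in the results quoted above, means that most of the commas the grammar inserts are correct, but many required commas are still missed.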