Changes between Version 17 and Version 18 of private/NlpInPracticeCourse/AutomaticCorrection
- Timestamp: Dec 17, 2015, 11:36:00 PM
private/NlpInPracticeCourse/AutomaticCorrection
 v17  v18
  49   49    }}}
  50   50
  51       - 2. '''Edit distance 1''' is represented as function `edits1` - it represents deletion (remove one letter), a transposition (swap adjacent letters), an alteration (change one letter to another) or an insertion (add a letter).
       51  + 2. '''Edit distance 1''' is represented as function `edits1` - it represents deletion (remove one letter), a transposition (swap adjacent letters), an alteration (change one letter to another) or an insertion (add a letter). For a word of length '''n''', there will be '''n deletions''', '''n-1 transpositions''', '''26n alterations''', and '''26(n+1) insertions''', for a '''total of 54n+25'''. Example: len(edits1('something')) = 494 words.
  52   52
  53   53    {{{
   …    …
  61   61    }}}
  62   62
  63       - For a word of length '''n''', there will be '''n deletions''', '''n-1 transpositions''', '''26n alterations''', and '''26(n+1) insertions''', for a '''total of 54n+25'''. Example: len(edits1('something')) = 494 words.
  64   63
  65       - 3. '''Edit distance 2'''(`edits2`) - applied edits1 to all the results of edits1. Example: len(edits2('something')) = 114 324 words, which is a high number.
  66   64
  67       - To enhance speed we can only keep the candidates that are actually known words (`known_edits2`). Now known_edits2('something') is a set of just 4 words: {'smoothing', 'seething', 'something', 'soothing'}.
       65  + 3. '''Edit distance 2'''(`edits2`) - applied edits1 to all the results of edits1. Example: len(edits2('something')) = 114 324 words, which is a high number. To enhance speed we can only keep the candidates that are actually known words (`known_edits2`). Now known_edits2('something') is a set of just 4 words: {'smoothing', 'seething', 'something', 'soothing'}.
  68   66
  69   67    4. The function `correct` chooses as the set of candidate words the set with the '''shortest edit distance''' to the original word.
   …    …
  76   74    }}}
  77   75
  78       - 5. '''Result of the spellchecker''' is, that it takes a word as input and returns a likely correction of that word.
  79       - {{{
  80       - >>> correct('speling')
  81       - 'spelling'
  82       - >>> correct('korrecter')
  83       - 'corrector'
  84       - }}}
       76  + 5. For '''evaluation''' there are prepared two test sets - developement(`test1`) and final test set(`test2`).
       77  +
       78  +
  85   79
  86   80
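The counts quoted in item 2 of the v18 text can be checked with a short sketch. This is not the page's own `edits1` (the diff elides that code block); it is a minimal illustration that assumes a 26-letter lowercase alphabet. The helper `edit_lists` and the `vocabulary` parameter of `known_edits2` are hypothetical names introduced here, and the toy vocabulary stands in for whatever word list the course actually uses.

```python
import string

# Assumption: the alphabet is the 26 lowercase ASCII letters.
ALPHABET = string.ascii_lowercase


def edit_lists(word):
    """Return the four groups of single edits as lists (duplicates kept)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return deletes, transposes, replaces, inserts


def edits1(word):
    """All distinct strings at edit distance 1 (the set removes duplicates)."""
    return {e for group in edit_lists(word) for e in group}


def known_edits2(word, vocabulary):
    """Distance-2 candidates, keeping only strings found in `vocabulary`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in vocabulary}


word = "something"
n = len(word)

# Raw count: n + (n-1) + 26n + 26(n+1) = 54n + 25
print(sum(len(g) for g in edit_lists(word)), 54 * n + 25)  # 511 511

# Distinct count is lower because some edits coincide
print(len(edits1(word)))  # 494

# Hypothetical toy vocabulary; 'nothing' is more than two edits away
toy_vocabulary = {"smoothing", "seething", "something", "soothing", "nothing"}
print(sorted(known_edits2(word, toy_vocabulary)))
# ['seething', 'smoothing', 'something', 'soothing']
```

The formula 54n+25 counts every generated edit, which gives 511 for a nine-letter word. The figure 494 is the size of the resulting set, since edits such as replacing a letter with itself or inserting a letter next to an identical one produce the same string more than once.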