Changes between Version 17 and Version 18 of private/AdvancedNlpCourse/AutomaticCorrection


Timestamp:
Dec 17, 2015, 11:36:00 PM (6 years ago)
Author:
xsvec3
Comment:

--

  • private/AdvancedNlpCourse/AutomaticCorrection

    v17 v18  
    4949}}}
    5050
    51 2. '''Edit distance 1''' is implemented by the function `edits1`, which applies every possible deletion (remove one letter), transposition (swap adjacent letters), alteration (change one letter to another) and insertion (add a letter) to the word.
     51 2. '''Edit distance 1''' is implemented by the function `edits1`, which applies every possible deletion (remove one letter), transposition (swap adjacent letters), alteration (change one letter to another) and insertion (add a letter) to the word. For a word of length '''n''', there are '''n deletions''', '''n-1 transpositions''', '''26n alterations''', and '''26(n+1) insertions''', for a '''total of 54n+25''' candidates, some of which coincide. Example: 'something' has n = 9, which gives 511 candidates, and len(edits1('something')) = 494 distinct strings.
    5252
    5353{{{
     
    6161}}}
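The counts quoted for `edits1` can be checked with a minimal sketch. It assumes a 26-letter lowercase alphabet and follows the four operations named in the text; the helper `edit_candidates` is a hypothetical name introduced only for this illustration, not part of the course code.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 lowercase ASCII letters


def edit_candidates(word):
    """Return the four candidate lists for one edit, duplicates kept."""
    # Every way of cutting the word into a left part and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return {
        "deletions": [a + b[1:] for a, b in splits if b],
        "transpositions": [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1],
        "alterations": [a + c + b[1:] for a, b in splits if b for c in ALPHABET],
        "insertions": [a + c + b for a, b in splits for c in ALPHABET],
    }


def edits1(word):
    """Distinct strings produced by one edit; the set drops duplicates."""
    return set().union(*edit_candidates(word).values())


word = "something"
n = len(word)
counts = {name: len(items) for name, items in edit_candidates(word).items()}
print(counts)                # 9 deletions, 8 transpositions, 234 alterations, 260 insertions
print(sum(counts.values()))  # 511, which equals 54 * n + 25
print(len(edits1(word)))     # 494
```

The gap between 511 and 494 comes from duplicates: nine alterations reproduce the original word, and inserting a letter directly before or after an identical letter yields the same string.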
    6262
    63 For a word of length '''n''', there are '''n deletions''', '''n-1 transpositions''', '''26n alterations''', and '''26(n+1) insertions''', for a '''total of 54n+25''' candidates, some of which coincide. Example: 'something' has n = 9, which gives 511 candidates, and len(edits1('something')) = 494 distinct strings.
    6463
    65 3. '''Edit distance 2''' (`edits2`) applies `edits1` to every result of `edits1`. Example: len(edits2('something')) = 114,324 strings, which is too many to check quickly.
    6664
    67 To improve speed, we can keep only the candidates that are actually known words (`known_edits2`). Now known_edits2('something') is a set of just 4 words: {'smoothing', 'seething', 'something', 'soothing'}.
     65 3. '''Edit distance 2''' (`edits2`) applies `edits1` to every result of `edits1`. Example: len(edits2('something')) = 114,324 strings, which is too many to check quickly. To improve speed, we can keep only the candidates that are actually known words (`known_edits2`). Now known_edits2('something') is a set of just 4 words: {'smoothing', 'seething', 'something', 'soothing'}.
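The filtering step can be sketched as follows. This is an illustration under stated assumptions, not the course implementation: `KNOWN_WORDS` is a hypothetical toy vocabulary standing in for the real word list, and the `known` parameter is added here only to make the sketch self-contained.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 lowercase ASCII letters

# Hypothetical toy vocabulary; the real one would come from a training corpus.
KNOWN_WORDS = {"smoothing", "seething", "something", "soothing", "nothing", "spelling"}


def edits1(word):
    """Distinct strings produced by one deletion, transposition, alteration or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Apply edits1 to every result of edits1 (large: keeps every string)."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known_edits2(word, known=KNOWN_WORDS):
    """Same enumeration as edits2, but keep only strings found in the vocabulary."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in known}


print(sorted(known_edits2("something")))
# ['seething', 'smoothing', 'something', 'soothing']
```

Filtering inside the comprehension means the large intermediate set is never stored; only the few known words survive.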
    6866
    69 67 4. The function `correct` takes as its candidate set the words with the '''shortest edit distance''' to the original word.
     
    7674}}}
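A minimal sketch of this selection order, assuming a hypothetical toy frequency table `NWORDS` in place of counts learned from a corpus. Picking the most frequent word within the chosen candidate set is an assumption of this sketch; the text above only specifies the shortest-distance rule.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 lowercase ASCII letters

# Hypothetical word counts, for illustration only.
NWORDS = {"spelling": 10, "corrector": 3, "something": 7, "the": 100}


def edits1(word):
    """Distinct strings produced by one deletion, transposition, alteration or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the strings that appear in the frequency table."""
    return {w for w in words if w in NWORDS}


def known_edits2(word):
    """Known words reachable by two successive edits."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS}


def correct(word):
    """Use the first non-empty candidate set: distance 0, then 1, then 2, else the word itself."""
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    # Assumption of this sketch: break ties by the highest count in NWORDS.
    return max(candidates, key=lambda w: NWORDS.get(w, 0))


print(correct("speling"))    # spelling
print(correct("korrecter"))  # corrector
```

Because `or` returns the first non-empty set, a known word at distance 1 always wins over any word at distance 2, however frequent the latter is.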
    7775
    78 5. '''Result:''' the spellchecker takes a word as input and returns a likely correction of that word.
    79 {{{
    80 >>> correct('speling')
    81 'spelling'
    82 >>> correct('korrecter')
    83 'corrector'
    84 }}}
     76 5. For '''evaluation''', two test sets are prepared: a development set (`test1`) and a final test set (`test2`).
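One way to score a corrector against such a test set is sketched below. The data layout (a dict mapping each correct word to a space-separated string of misspellings) is an assumption, as are the names `spelltest`, `toy_tests` and `toy_correct`; the actual format of `test1` and `test2` is not shown here.

```python
def spelltest(tests, correct):
    """Return the fraction of misspellings that `correct` maps to the target word."""
    total = right = 0
    for target, misspellings in tests.items():
        for wrong in misspellings.split():
            total += 1
            if correct(wrong) == target:
                right += 1
    return right / total if total else 0.0


# Hypothetical stand-ins for a real test set and a real corrector.
toy_tests = {"spelling": "speling spellin", "corrector": "korrecter"}
toy_fixes = {"speling": "spelling", "korrecter": "corrector"}


def toy_correct(word):
    """Look the word up in a fixed table; unknown words are returned unchanged."""
    return toy_fixes.get(word, word)


accuracy = spelltest(toy_tests, toy_correct)
print(f"{accuracy:.2f}")  # 0.67 (2 of 3 misspellings corrected)
```

Tuning on the development set and reporting the score only on the final set keeps the reported accuracy from being inflated by the tuning itself.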
     77
     78
    8579
    8680