Changes between Version 42 and Version 43 of private/NlpInPracticeCourse/AutomaticCorrection
- Timestamp:
- Dec 7, 2023, 9:19:02 AM (5 months ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
private/NlpInPracticeCourse/AutomaticCorrection
v42 v43 62 62 return set(deletes + transposes + replaces + inserts) 63 63 }}} 64 1. '''Edit distance 2''' (`edits2`) - applies `edits1()` to all the results of `edits1()`. Example: `len(edits2('something')) = 114 324` words, which is a high number. To enhance speed we can only keep the candidates that are actually known words (`known_edits2()`). Now `known_edits2('something')` is a set of just 4 words: `{'smoothing', 'seething', 'something', 'soothing'}`.64 1. '''Edit distance 2''' (`edits2`) - applies `edits1()` to all the results of `edits1()`. Example: `len(edits2('something')) = 114 324` words, which is a high number. To enhance speed we can only keep the candidates that are actually known words (`known_edits2()`). Now `known_edits2('something')` is a set of just 4 words: `{'smoothing', 'seething', 'something', 'soothing'}`. 65 65 1. The function `correct()` chooses as the set of candidate words the set with the '''shortest edit distance''' to the original word. 66 66 {{{ … … 84 84 4. Modify the code of `spell.py` to increase accuracy (`pct`) at `tests2` by 10 %. You may take an inspiration from the ''Future work'' section of [http://norvig.com/spell-correct.html the Norvig's article]. Describe your changes and write your new accuracy results to `<YOUR_FILE>`. 85 85 86 5. Upload `<YOUR_FILE>` and the edited `spell.py` to the [/en/NlpInPracticeCourse homework vault (odevzdávárna)]. 86 87 87 === Upload `<YOUR_FILE>` and the edited `spell.py` ===88 88 89 89 == Task 2: Rule based grammar checker (punctuation) for Czech == #task2