Changes between Version 15 and Version 16 of private/AdvancedNlpCourse/AutomaticCorrection


Timestamp:
Dec 17, 2015, 9:56:36 PM (6 years ago)
Author:
xsvec3
Comment:

--

  • private/AdvancedNlpCourse/AutomaticCorrection

    v15 v16  
    11= Automatic language correction =
    2 
    32[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantee: Aleš Horák
    43
    5 Prepared by: Ján Švec 
     4Prepared by: Ján Švec
    65
    76== State of the Art ==
    8 
    97Language correction now has many potential applications to the large amount of informal and unedited text generated online, including web forums, tweets, blogs, and email. Automatic language correction covers many areas, including spell checking, grammar checking, and word completion.
    108
     
    1311The lesson will also answer the question "How difficult is it to develop a spell-checker?" and describe a system that performs spell-checking and autocorrection.
    1412
    15 In the end there will be a brief overview of various applications (computer software) for automatic language correction.
     13=== References ===
     14 1. CHOUDHURY, Monojit, et al. "How Difficult is it to Develop a Perfect Spell-checker? A Cross-linguistic Analysis through Complex Network Approach" Graph-Based Algorithms for Natural Language Processing, pages 81–88, Rochester, 2007. [[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=52A3B869596656C9DA285DCE83A0339F?doi=10.1.1.146.4390&rep=rep1&type=pdf|Source]]
     15 1. WHITELAW, Casey, et al. "Using the Web for Language Independent Spellchecking and Autocorrection" Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 890–899, Singapore, 2009. [[http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36180.pdf|Source]]
     16 1. GUPTA, Neha, MATHUR, Pratistha. "Spell Checking Techniques in NLP: A Survey" International Journal of Advanced Research in Computer Science and Software Engineering, volume 2, issue 12, pages 217-221, 2012. [[http://www.ijarcsse.com/docs/papers/12_December2012/Volume_2_issue_12_December2012/V2I12-0164.pdf|Source]]
     17 1. HLADEK, Daniel, STAS, Jan, JUHAR, Jozef. "Unsupervised Spelling Correction for the Slovak Text." Advances in Electrical and Electronic Engineering 11 (5), pages 392-397, 2013.  [[http://advances.utc.sk/index.php/AEEE/article/view/898|Source]]
    1618
    17 === References ===   
    18 
    19   1. CHOUDHURY, Monojit, et al. "How Difficult is it to Develop a Perfect Spell-checker? A Cross-linguistic Analysis through Complex Network Approach" Graph-Based Algorithms for Natural Language Processing, pages 81–88, Rochester, 2007. [[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=52A3B869596656C9DA285DCE83A0339F?doi=10.1.1.146.4390&rep=rep1&type=pdf|Source]]         
    20   1. WHITELAW, Casey, et al. "Using the Web for Language Independent Spellchecking and Autocorrection" Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 890–899, Singapore, 2009. [[http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36180.pdf|Source]]   
    21   1. GUPTA, Neha, MATHUR, Pratistha. "Spell Checking Techniques in NLP: A Survey" International Journal of Advanced Research in Computer Science and Software Engineering, volume 2, issue 12, pages 217-221, 2012. [[http://www.ijarcsse.com/docs/papers/12_December2012/Volume_2_issue_12_December2012/V2I12-0164.pdf|Source]] 
    22   1. HLADEK, Daniel, STAS, Jan, JUHAR, Jozef. "Unsupervised Spelling Correction for the Slovak Text." Advances in Electrical and Electronic Engineering 11 (5), pages 392-397, 2013.  [[http://advances.utc.sk/index.php/AEEE/article/view/898|Source]]
    2319== Slides ==
    2420[http://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/AdvancedNlpCourse/anlp-14-AutomaticCorrection.pdf]
    2521
    2622== Practical Session ==
     23In the theoretical lesson we became acquainted with various approaches to spelling correction. Now we will see how a simple spellchecker based on '''edit distance''' works.
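To make the notion of '''edit distance''' concrete, here is a minimal sketch of the classic Levenshtein distance computed by dynamic programming. The function name `edit_distance` is chosen for illustration and is not taken from `spell.py`. This plain variant counts only insertions, deletions, and substitutions, so a swap of two adjacent letters costs 2; correctors that also treat a transposition as a single edit use the Damerau-Levenshtein variant instead.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; row 0 is "insert j characters".
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("speling", "spelling"))
    print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at any time, so memory use grows with the length of `b` rather than with the product of both lengths.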
    2724
    28 There will be a short overview of [[https://www.languagetool.org/|LanguageTool]] - Style and Grammar checker. Students can test the language correction algorithm and evaluate it on real data. After they become acquainted with how a spelling corrector works, we will write a simple spelling corrector in Python. The spelling corrector will be trained on a large text file compiled from [[https://www.gutenberg.org/|Project Gutenberg]]. The example will be based on Peter Norvig's [[http://norvig.com/spell-correct.html|Spelling Corrector]] in python. If the student finishes early the additional task is to enhance the spelling corrector's functionality.
     25The example is based on Peter Norvig's [[http://norvig.com/spell-correct.html|Spelling Corrector]] in Python. The spelling corrector will be trained on a large text file of about a million words.
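The idea behind such a corrector can be sketched as follows: count word frequencies in a training text, generate every string within one or two edits of the input word, and return the most frequent candidate that occurs in the training text. This is a simplified illustration in the spirit of Norvig's article, not the course's `spell.py`. The one-sentence `TOY_CORPUS` stands in for `big.txt`, and the function names are illustrative assumptions.

```python
import re
from collections import Counter

# Toy stand-in for the large training file; an assumption for this sketch.
TOY_CORPUS = "the spelling corrector checks the spelling of the words in the text"

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings that are one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings that are two edits away from word."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(candidates, counts):
    """Keep only the candidates seen in the training text."""
    return {w for w in candidates if w in counts}


def correct(word, counts):
    """Prefer the word itself, then known words at distance 1, then 2."""
    candidates = (known([word], counts)
                  or known(edits1(word), counts)
                  or known(edits2(word), counts)
                  or [word])
    return max(candidates, key=lambda w: counts[w])


if __name__ == "__main__":
    counts = Counter(words(TOY_CORPUS))
    for typo in ("speling", "teh", "korrector"):
        print(typo, "->", correct(typo, counts))
```

Ranking candidates strictly by distance first and by frequency second is a deliberate simplification; it ignores how likely each individual error is.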
    2926
    30 1. Download the prepared script [[https://nlp.fi.muni.cz/trac/research/attachment/wiki/private/AdvancedNlpCourse/AutomaticCorrection/spell.py|spell.py]] and the training data collection [[https://nlp.fi.muni.cz/trac/research/attachment/wiki/private/AdvancedNlpCourse/AutomaticCorrection/big.txt|big.txt]].
    31 1. Test the script {{{ python ./spell.py }}} in your working directory.
    32 1. Open it in your favourite editor and we will walk through its functionality.
     27We will test this tool on prepared data. Your goal will be to improve the spellchecker's accuracy. If you finish early, there is a bonus question in the `Task` section.
    3328
    3429
    35 === Task ===
     30 1. Download the prepared script [[https://nlp.fi.muni.cz/trac/research/attachment/wiki/private/AdvancedNlpCourse/AutomaticCorrection/spell.py|spell.py]] and the training data collection [[https://nlp.fi.muni.cz/trac/research/attachment/wiki/private/AdvancedNlpCourse/AutomaticCorrection/big.txt|big.txt]].
     31 1. Test the script `python ./spell.py` in your working directory.
     32 1. Open it in your favourite editor and we will walk through its functionality.
    3633
    37 1. Create `<YOUR_FILE>`, a text file named ia161-UCO-14.txt where UCO is your university ID.
     34=== Task ===
     35 1. Create `<YOUR_FILE>`, a text file named ia161-UCO-14.txt where UCO is your university ID.
    3836
    39 2. Run `spell.py` with the development and final test sets (test1 and test2) and write the results in `<YOUR_FILE>`.
     37 2. Run `spell.py` with the development and final test sets (test1 and test2) and write the results in `<YOUR_FILE>`.
    4038
    41 3. Explain the results in a few words and write the explanation in `<YOUR_FILE>`.
     39 3. Explain the results in a few words and write the explanation in `<YOUR_FILE>`.
    4240
    43 4. Modify the code of `spell.py` to increase accuracy by 10 %. Write your new accuracy results to `<YOUR_FILE>`.
     41 4. Modify the code of `spell.py` to increase accuracy by 10 %. Write your new accuracy results to `<YOUR_FILE>`.
    4442
    45 5. Run the script with `verbose=True` and examine the results. Suggest at least one adjustment that would improve the spellchecker's accuracy. Write your suggestions to `<YOUR_FILE>`.
    46  
    47 -Bonus question- How could you make the implementation faster without changing the results? Write your suggestions to `<YOUR_FILE>`.
     43 5. Run the script with `verbose=True` and examine the results. Suggest at least one adjustment that would improve the spellchecker's accuracy. Write your suggestions to `<YOUR_FILE>`.
    4844
     45 -Bonus question- How could you make the implementation faster without changing the results? Write your suggestions to `<YOUR_FILE>`.
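To make "accuracy" in the tasks above concrete, the following sketch measures the fraction of misspellings that a corrector maps to the intended word. The data format (a list of `(misspelling, intended)` pairs), the function name `accuracy`, and the stand-in `toy_corrector` are assumptions made for illustration; the test sets and reporting in `spell.py` may be organized differently.

```python
def accuracy(corrector, pairs):
    """Fraction of (misspelling, intended) pairs the corrector gets right."""
    if not pairs:
        return 0.0
    hits = sum(1 for wrong, right in pairs if corrector(wrong) == right)
    return hits / len(pairs)


# Hypothetical stand-in corrector: a fixed lookup table that returns
# the input unchanged when it has no suggestion.
TOY_FIXES = {"speling": "spelling", "teh": "the", "wrod": "world"}


def toy_corrector(word):
    return TOY_FIXES.get(word, word)


# Hypothetical test pairs, not taken from the course data.
TOY_PAIRS = [
    ("speling", "spelling"),  # fixed correctly
    ("teh", "the"),           # fixed correctly
    ("wrod", "word"),         # wrong suggestion ("world")
    ("helo", "hello"),        # no suggestion at all
]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(toy_corrector, TOY_PAIRS):.0%}")
```

Keeping wrong suggestions and missing suggestions as separate cases in the test data helps when explaining results, since the two kinds of failure usually call for different fixes.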
    4946
    5047=== Upload `<YOUR_FILE>` and edited `spell.py` ===
    51 
    5248Do not forget to upload your resulting files to the [https://is.muni.cz/auth/el/1433/podzim2015/IA161/ode/59241116/ homework vault (odevzdávárna)].