Automatic language correction
IA161 Advanced NLP Course, Course Guarantee: Aleš Horák
Prepared by: Ján Švec
State of the Art
Automatic language correction (spell checking) is the process of detecting incorrectly spelled words in a text and, in some systems, suggesting corrections for them. Language correction has many potential applications today because of the large amount of informal, unedited text generated online, for example on web forums, in tweets, blogs, and email.
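The two steps named above, detection and suggestion, can be sketched in a few lines of Python. The snippet flags every token that is missing from a word list and proposes similar entries with the standard library's `difflib.get_close_matches`. The tiny `LEXICON` and the helper names `tokenize` and `check` are hypothetical and only for illustration. Note also that difflib's similarity ratio is a generic string-matching measure, not a method taken from the lesson or its references.

```python
import difflib
import re

# Hypothetical toy lexicon for illustration; a real checker would load a large word list.
LEXICON = {"the", "spelling", "checker", "finds", "errors", "in", "text",
           "and", "suggests", "fixes"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def check(text, lexicon=LEXICON, n=3):
    """Map each token missing from the lexicon to up to n similar lexicon words."""
    report = {}
    for word in tokenize(text):
        if word not in lexicon:  # detection: simple lexicon lookup
            # suggestion: lexicon words whose difflib similarity ratio is >= 0.6
            report[word] = difflib.get_close_matches(word, lexicon, n=n, cutoff=0.6)
    return report


print(check("The speling checker finds erors"))
# {'speling': ['spelling'], 'erors': ['errors']}
```

Lexicon lookup alone cannot catch real-word errors (e.g. "form" typed for "from"), because both words are valid. Handling those requires context.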
In the theoretical lesson we will introduce and compare various methods for automatically proposing and choosing a correction for an incorrectly written word. The lesson will also address the question "How difficult is it to develop a spell-checker?" and describe a system that performs spell-checking and autocorrection.
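One common way to propose and choose a correction works as follows. Compute the Levenshtein edit distance (minimum number of single-character insertions, deletions and substitutions) between the misspelled word and each lexicon entry. Then pick the closest entry, breaking ties by word frequency. This is offered as an illustrative sketch, not necessarily the method the lesson focuses on. The function names and the toy frequency table are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match if cost == 0)
        prev = curr
    return prev[-1]


def best_correction(word: str, frequencies: dict) -> str:
    """Choose the lexicon word with the smallest distance; ties go to the more frequent word."""
    return min(frequencies, key=lambda w: (levenshtein(word, w), -frequencies[w]))


# Hypothetical frequencies: "the", "then" and "than" are all one edit away from "thn",
# so the frequency tie-break selects "the".
print(best_correction("thn", {"the": 100, "then": 20, "than": 30}))  # the
```

Comparing against every lexicon entry costs one distance computation per entry for each word. Practical checkers therefore usually narrow the candidate set first, for example by generating only strings a few edits away.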
At the end, there will be a brief overview of various applications (computer software) for automatic language correction.
References
- CHOUDHURY, Monojit, et al. "How Difficult is it to Develop a Perfect Spell-checker? A Cross-linguistic Analysis through Complex Network Approach." TextGraphs-2: Graph-Based Algorithms for Natural Language Processing, pages 81–88, Rochester, 2007.
- WHITELAW, Casey, et al. "Using the Web for Language Independent Spellchecking and Autocorrection." Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 890–899, Singapore, 2009.
- GUPTA, Neha, MATHUR, Pratistha. "Spell Checking Techniques in NLP: A Survey." International Journal of Advanced Research in Computer Science and Software Engineering, volume 2, issue 12, pages 217–221, 2012.
- HLADEK, Daniel, STAS, Jan, JUHAR, Jozef. "Unsupervised Spelling Correction for the Slovak Text." Advances in Electrical and Electronic Engineering, volume 11, issue 5, pages 392–397, 2013.
Practical Session
There will be a short overview of LanguageTool, a style and grammar checker. Students can test the complete algorithm and evaluate it on real data. Once they are familiar with how a spelling corrector works, we will write a simple spelling corrector in Python. It will be trained on a large text file compiled from Project Gutenberg and will be based on Peter Norvig's spelling corrector in Python. Students who finish early can work on an additional task: enhancing the spelling corrector's functionality.
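The following is a minimal sketch in the spirit of Peter Norvig's corrector. It is not the course's `spell.py` attachment. The approach has three steps:

1. Count word frequencies in a training text such as `big.txt`.
2. Generate candidate strings one or two edits away from the input.
3. Keep the most frequent candidate that appears in the training text.

All function names are illustrative. The `parse_testset` helper assumes the test files contain lines of the form `right: wrong1 wrong2 ...`; this is an assumption about the `spell-testset` attachments, so adjust it if their format differs.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text: str) -> Counter:
    """Count lowercase word tokens in a training corpus (e.g. big.txt)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word: str) -> set:
    """All strings one deletion, transposition, replacement or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str):
    """All strings two edits away, generated lazily because there are many of them."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(candidates, counts: Counter) -> set:
    """Keep only candidates that occur in the training text."""
    return {w for w in candidates if w in counts}


def correction(word: str, counts: Counter) -> str:
    """Prefer the word itself, then known words 1 edit away, then 2; pick the most frequent."""
    candidates = (known([word], counts)
                  or known(edits1(word), counts)
                  or known(edits2(word), counts)
                  or {word})
    # Ties are broken alphabetically so the result does not depend on set ordering.
    return min(candidates, key=lambda w: (-counts[w], w))


def parse_testset(lines):
    """ASSUMED format, one entry per line: 'right: wrong1 wrong2 ...'."""
    pairs = []
    for line in lines:
        if ":" not in line:
            continue
        right, wrongs = line.split(":", 1)
        pairs.extend((right.strip(), wrong) for wrong in wrongs.split())
    return pairs


def accuracy(pairs, counts: Counter) -> float:
    """Fraction of (right, wrong) pairs for which correction(wrong) == right."""
    if not pairs:
        return 0.0
    good = sum(correction(wrong, counts) == right for right, wrong in pairs)
    return good / len(pairs)


# Possible usage with the attachments (file names from this page; format assumed above):
# counts = train(open("big.txt", encoding="utf-8").read())
# with open("spell-testset1.txt", encoding="utf-8") as f:
#     print(accuracy(parse_testset(f), counts))
```

Ranking candidates purely by corpus frequency treats every error of the same edit distance as equally likely. A natural enhancement for students who finish early is to weight candidates with an error model, for example by making some edits more likely than others.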
Attachments (10)
- IMG_20151126_112640.jpg (1.6 MB) - added 9 years ago.
- test-nopunct.txt (55.1 KB) - added 7 years ago.
- tsd2014.pdf (131.3 KB) - added 7 years ago.
- spell-testset1.txt (3.7 KB) - added 7 years ago.
- spell-testset2.txt (7.3 KB) - added 7 years ago.
- evalpunct_robust.py (1.8 KB) - added 7 years ago.
- big.txt (6.2 MB) - added 7 years ago.
- spell.py (15.8 KB) - added 4 years ago. Spelling corrector.
- eval-gold.txt (56.5 KB) - added 4 years ago.
- punct.set (3.4 KB) - added 4 years ago.