Changes between Version 1 and Version 2 of private/NlpInPracticeCourse/LanguageResourcesFromWeb
- Timestamp: Jun 5, 2015, 2:37:29 PM
--- v1
+++ v2
  = Building Language Resources from the Web =
+
+ A new topic proposal:
+ = Duplicities on the Web – deduplication and plagiarism detection =

  [[https://is.muni.cz/auth/predmet/fi/ia161|IA161 Advanced NLP Course]], Course Guarantee: Aleš Horák
  …
  Approx 3 current papers (preferably from best NLP conferences/journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as a source for the one-hour lecture:

- 1. paper1
- 1. paper2
- 1. paper3
+ 1. Chapters 19 and 20 from C. D. Manning et al. "Introduction to Information Retrieval". Cambridge University Press, 2008.
+ 1. Pomikálek, Jan. "Removing boilerplate and duplicate content from web corpora." Dissertation thesis. Masaryk University, 2011.
+ 1. HaCohen-Kerner, Yaakov, Aharon Tayeb, and Natan Ben-Dror. "Detection of simple plagiarism in computer science papers." Coling, 2010.
+ 1. !TODO another plagiarism detection paper

  == Practical Session ==
  …

  Students can also be required to generate some results of their work and hand them in to prove completing the tasks.
+
+ Resources:
+ - A set of documents and their plagiarized copies !TODO
+ - A frame script in Python for plagiarism detection !TODO
+ - A description of several basic methods for plagiarism detection evaluated by HaCohen-Kerner et al. !TODO
+
+ The task:
+ - !TODO instructions
+ - The student will choose a method for plagiarism detection and implement it as a function in the frame script.
+ - Evaluation: precision, recall, F1 (the calculation will be a part of the frame script).
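The frame script itself is still marked as !TODO, so the following is only a rough sketch of what the task might look like, not the course's actual script. The function names (`word_ngrams`, `detect_plagiarism`, `evaluate`), the n-gram size, and the threshold are all assumptions made for illustration. The detector uses word n-gram containment, which is one common baseline and not necessarily one of the methods evaluated by HaCohen-Kerner et al.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def detect_plagiarism(suspicious, source, n=3, threshold=0.5):
    """Hypothetical detector a student might plug into the frame script.

    Flags the pair when the share of the suspicious document's n-grams
    that also occur in the source (containment) reaches `threshold`.
    The default values of `n` and `threshold` are arbitrary assumptions.
    """
    susp = word_ngrams(suspicious, n)
    if not susp:
        # Text shorter than n words yields no n-grams to compare.
        return False
    containment = len(susp & word_ngrams(source, n)) / len(susp)
    return containment >= threshold


def evaluate(predicted, gold):
    """Compute precision, recall and F1 from two parallel lists of booleans."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

In such a setup, the student would replace only the body of `detect_plagiarism`, run it over every (suspicious, source) pair in the document set, and pass the resulting predictions together with the gold labels to `evaluate`.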