Changes between Version 42 and Version 43 of private/NlpInPracticeCourse/LanguageResourcesFromWeb
- Timestamp: Oct 30, 2024, 1:36:07 AM (7 months ago)
private/NlpInPracticeCourse/LanguageResourcesFromWeb
v42 v43
46  46     * which detection methods might not be able to reveal it – give reasons
47  47     * **Submit a text file containing 10 documents according to the requirements + 1 text file describing techniques used and your estimation which detection techniques may or may not work.**
    48     * If you don't have access to NLPC machines, it is permitted to submit plain text with document structures instead of a POS tagged vertical.
48  49
49  50   Or: Select a detection algorithm and implement it in Python. //The right homework if you want to learn something.//
50         * A basic detection script to extend: [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] – usage: {{{python plagiarism_simple.py < training_data.vert}}}
    51     * A basic detection script to extend: [https://colab.research.google.com/drive/1eoitLND5_URua-1IRddoQDSbkYzlSbqP the interactive version in Google Colab] or download [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] and run it on your own – {{{python plagiarism_simple.py < training_data.vert}}}.
51  52     * A bag of words + cosine similarity of word vectors approach is implemented in this script. //(For the sake of simplicity: A plagiarism cannot have more sources here.)//
52  53     * You can modify the script to