Changes between Version 32 and Version 33 of private/NlpInPracticeCourse/LanguageResourcesFromWeb
- Timestamp: Oct 27, 2021, 1:14:24 PM (2 years ago)
private/NlpInPracticeCourse/LanguageResourcesFromWeb
v32 v33
 44  44
 45  45 Or: Select a detection algorithm and implement it in Python. //The right homework if you want to learn something.//
 46     * A basic detection script to extend: [ raw-attachment:plagiarism_simple.py] -- usage: {{{python plagiarism_simple.py < training_data.vert}}}
     46 * A basic detection script to extend: [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/AdvancedNlpCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] -- usage: {{{python plagiarism_simple.py < training_data.vert}}}
 47  47 * This script implements a bag-of-words approach with cosine similarity of word vectors. //(For the sake of simplicity, a plagiarized document cannot have more than one source here.)//
 48  48 * You can modify the script to …
  …   …
 50  50 * or implement another lexical/syntactic detection approach, e.g. n-grams of words or Levenshtein distance,
 51  51 * or implement another semantic detection approach, e.g. the similarity of {{{word2vec}}} vectors.
 52     * Input format: A 3-column vertical, see above. [ raw-attachment:training_data.vert]
     52 * Input format: A 3-column vertical, see above. [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/AdvancedNlpCourse/LanguageResourcesFromWeb/training_data.vert training_data.vert]
 53  53 * Output: One plagiarism per line: id TAB detected source id TAB real source id. Evaluation line: precision, recall, F1 measure.
 54  54 * Your script will be evaluated using data made by others.
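
The bag of words + cosine similarity approach and the evaluation line mentioned in the diffed text can be sketched roughly as follows. This is not the content of plagiarism_simple.py, which is not shown here. It is a minimal illustration under stated assumptions: documents are already tokenized and held in dictionaries (the 3-column vertical format is not parsed), the function names and the threshold of 0.5 are hypothetical, each suspicious document has at most one source, and precision and recall are counted over documents whose detected source matches the real one.

```python
import math
from collections import Counter


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity of two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_sources(originals, suspicious, threshold=0.5):
    """Map each suspicious doc id to its most similar original id, or None.

    The threshold is a hypothetical value chosen for illustration only.
    """
    original_bags = {doc_id: Counter(tokens) for doc_id, tokens in originals.items()}
    detected = {}
    for doc_id, tokens in suspicious.items():
        bag = Counter(tokens)
        best_id, best_sim = None, 0.0
        for orig_id, orig_bag in original_bags.items():
            sim = cosine_similarity(bag, orig_bag)
            if sim > best_sim:
                best_id, best_sim = orig_id, sim
        detected[doc_id] = best_id if best_sim >= threshold else None
    return detected


def evaluate(detected, real):
    """Return (precision, recall, F1); None stands for 'no plagiarism'."""
    true_pos = sum(
        1 for doc_id, src in detected.items()
        if src is not None and real.get(doc_id) == src
    )
    n_detected = sum(1 for src in detected.values() if src is not None)
    n_real = sum(1 for src in real.values() if src is not None)
    precision = true_pos / n_detected if n_detected else 0.0
    recall = true_pos / n_real if n_real else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for this example only.
originals = {
    "o1": "the cat sat on the mat".split(),
    "o2": "stock prices fell sharply today".split(),
}
suspicious = {
    "s1": "the cat sat on a mat".split(),
    "s2": "completely unrelated text about cooking".split(),
}
real = {"s1": "o1", "s2": None}

detected = detect_sources(originals, suspicious)
for doc_id, src in detected.items():
    # One document per line: id TAB detected source id TAB real source id
    print(f"{doc_id}\t{src}\t{real[doc_id]}")
print("precision: %.2f, recall: %.2f, F1: %.2f" % evaluate(detected, real))
```

Replacing `cosine_similarity` with an n-gram overlap or an edit-distance score would leave the detection loop and the evaluation unchanged, which is one way to try the alternative approaches listed in the diffed text.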