Changes between Version 39 and Version 40 of private/NlpInPracticeCourse/LanguageResourcesFromWeb
Timestamp: Oct 17, 2023, 9:37:36 AM
Unmodified (v39 and v40, lines 51-52):
 * A bag of words + cosine similarity of word vectors approach is implemented in this script. //(For the sake of simplicity: A plagiarism cannot have more sources here.)//
 * You can modify the script to

Removed (v39, lines 53-55):
   * use other input attributes than the word or a combination of attributes, e.g. the lemma or the morphological tag,
   * or implement other lexical/syntactic based detection approach, e.g. n-grams of words or Levenshtein's distance,
   * or implement other semantic based detection approach, e.g. the similarity of {{{word2vec}}} vectors.

Added (v40, lines 53-56):
   * use other input attributes than the word or a combination of attributes, e.g. the lemma or the morphological tag
   * or implement other lexical/syntactic based detection approach, e.g. n-grams of words or Levenshtein's distance
   * or implement other semantic based detection approach, e.g. the similarity of {{{word2vec}}} vectors
   * or do it another way, be creative -- describe how it works in comments in the code.

Unmodified (v39 lines 56-57, v40 lines 57-58):
 * Input format: A 3-column vertical, see above. [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/training_data.vert training_data.vert]
 * Output: One plagiarism per line: id TAB detected source id TAB real source id. Evaluation line: precision, recall F1 measure.
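To illustrate the bag of words + cosine similarity idea described in this revision, here is a minimal sketch in Python. It is not the course script. It assumes that in the 3-column vertical, structure lines (such as `<doc ...>` or `<s>`) start with `<` and token lines hold tab-separated columns with the word first; `read_tokens`, `bag_of_words`, `cosine_similarity` and `detect_source` are hypothetical names chosen for this example. The `column` parameter is one way to switch the input attribute (e.g. to the lemma) as the list of suggested modifications proposes.

```python
import math
from collections import Counter


def read_tokens(lines, column=0):
    """Collect one attribute per token from a vertical text.

    Assumption: lines starting with '<' are structure tags and every other
    non-empty line is 'word<TAB>lemma<TAB>tag' (column 0 = word).
    """
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue  # skip empty lines and structure tags
        tokens.append(line.split("\t")[column])
    return tokens


def bag_of_words(tokens):
    # Lowercase so that "The" and "the" are counted as the same word
    return Counter(token.lower() for token in tokens)


def cosine_similarity(a, b):
    # Dot product over the words shared by both bags
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is not similar to anything
    return dot / (norm_a * norm_b)


def detect_source(suspicious_tokens, candidates):
    """Return the id of the most similar candidate document.

    Only one source is returned, matching the simplification that a
    plagiarism cannot have more sources.
    """
    suspicious_bag = bag_of_words(suspicious_tokens)
    best_id, best_score = None, -1.0
    for doc_id, tokens in candidates.items():
        score = cosine_similarity(suspicious_bag, bag_of_words(tokens))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id
```

Passing `column=1` to `read_tokens` would compare lemmas instead of word forms, under the column-order assumption above.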
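The lexical/syntactic alternatives suggested in this revision, n-grams of words and Levenshtein's distance, could be sketched as the similarity functions below, which could replace `cosine_similarity` in a detector that picks the best-scoring candidate. Using Jaccard overlap of word n-gram sets and normalizing the token-level edit distance by the longer document are assumptions made for this example; other scoring schemes are equally valid.

```python
def word_ngrams(tokens, n=3):
    """Set of word n-grams (tuples of n consecutive lowercased tokens)."""
    lowered = [t.lower() for t in tokens]
    return {tuple(lowered[i:i + n]) for i in range(len(lowered) - n + 1)}


def ngram_jaccard(tokens_a, tokens_b, n=3):
    # Shared n-grams divided by all distinct n-grams of both documents
    a, b = word_ngrams(tokens_a, n), word_ngrams(tokens_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def levenshtein(seq_a, seq_b):
    """Edit distance between two sequences (here: lists of tokens)."""
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(tokens_a, tokens_b):
    # 1.0 for identical token sequences, 0.0 when every token differs
    longest = max(len(tokens_a), len(tokens_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(tokens_a, tokens_b) / longest
```

The edit distance takes time proportional to the product of both document lengths, so comparing sentence by sentence rather than whole documents may be more practical on larger data.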
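The output described in this revision (one plagiarism per line: id TAB detected source id TAB real source id, followed by an evaluation line) might be produced as sketched below. The scoring rule is an assumption, since the page does not define it: a detection counts as correct when the detected source id equals the real one, `None` means no source was reported (printed as `-`, also an assumption), precision divides correct detections by reported detections, and recall divides them by all plagiarisms. The function name `evaluate` is hypothetical.

```python
def evaluate(results):
    """Format detections and append a precision/recall/F1 line.

    results: list of (doc_id, detected_source_id or None, real_source_id).
    Assumption: a detection is correct iff detected id == real id.
    """
    lines = []
    detected = correct = 0
    for doc_id, detected_id, real_id in results:
        shown = detected_id if detected_id is not None else "-"
        lines.append(f"{doc_id}\t{shown}\t{real_id}")
        if detected_id is not None:
            detected += 1
            if detected_id == real_id:
                correct += 1
    precision = correct / detected if detected else 0.0
    recall = correct / len(results) if results else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    lines.append(f"precision: {precision:.2f}, recall: {recall:.2f}, F1: {f1:.2f}")
    return lines
```

For example, with one correct detection, one wrong source and one missed plagiarism, precision is 1/2, recall is 1/3 and F1 is 0.40.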