Changes between Version 39 and Version 40 of private/NlpInPracticeCourse/LanguageResourcesFromWeb


Timestamp:
Oct 17, 2023, 9:37:36 AM (7 months ago)
Author:
xsuchom2
Comment:

--

  • private/NlpInPracticeCourse/LanguageResourcesFromWeb

    v39 v40  
    51 51    * A bag of words + cosine similarity of word vectors approach is implemented in this script. //(For the sake of simplicity: A plagiarism cannot have more sources here.)//
    52 52    * You can modify the script to
    53       * use other input attributes than the word or a combination of attributes, e.g. the lemma or the morphological tag,
    54       * or implement other lexical/syntactic based detection approach, e.g. n-grams of words or Levenshtein's distance,
    55       * or implement other semantic based detection approach, e.g. the similarity of {{{word2vec}}} vectors.
     53      * use other input attributes than the word or a combination of attributes, e.g. the lemma or the morphological tag
     54      * or implement other lexical/syntactic based detection approach, e.g. n-grams of words or Levenshtein's distance
     55      * or implement other semantic based detection approach, e.g. the similarity of {{{word2vec}}} vectors
     56      * or do it another way, be creative -- describe how it works in comments in the code.
    56 57  * Input format: A 3-column vertical, see above. [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/training_data.vert training_data.vert]
    57 58  * Output: One plagiarism per line: id TAB detected source id TAB real source id. Evaluation line: precision, recall, F1 measure.
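
The bag of words + cosine similarity idea and the evaluation line described above can be sketched as follows. This is a minimal illustration, not the course script: the function names (`bag_of_words`, `cosine_similarity`, `detect_source`, `evaluate`), the `threshold` value of 0.5, and the use of plain token lists instead of a parsed 3-column vertical file are all assumptions made for the example. Vectors here are raw word counts; the lemma or the tag could be used as the counted attribute in the same way.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count how often each token occurs (hypothetical helper)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_source(suspicious, originals, threshold=0.5):
    """Return the id of the most similar original, or None.

    `originals` maps a document id to its token list. At most one
    source is reported, matching the simplification in the text.
    The threshold is an assumed value, not taken from the course.
    """
    suspicious_bag = bag_of_words(suspicious)
    best_id, best_sim = None, 0.0
    for doc_id, tokens in originals.items():
        sim = cosine_similarity(suspicious_bag, bag_of_words(tokens))
        if sim > best_sim:
            best_id, best_sim = doc_id, sim
    return best_id if best_sim >= threshold else None


def evaluate(detected, real):
    """Compute precision, recall and F1 over source assignments.

    Both arguments map a suspicious document id to a source id,
    or to None when the document is not (detected as) a plagiarism.
    """
    true_pos = sum(
        1 for doc_id, src in detected.items()
        if src is not None and src == real.get(doc_id)
    )
    n_detected = sum(1 for src in detected.values() if src is not None)
    n_real = sum(1 for src in real.values() if src is not None)
    precision = true_pos / n_detected if n_detected else 0.0
    recall = true_pos / n_real if n_real else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

A possible use, again with made-up data: build `originals` from the original documents in the input, call `detect_source` once per suspicious document, print `id TAB detected source id TAB real source id` for each, and finish with the three numbers returned by `evaluate`.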