LanguageResourcesFromWeb

Timestamp: Oct 17, 2023, 9:38:36 AM (9 months ago)
Author: xsuchom2
Comment: --

private/NlpInPracticeCourse/LanguageResourcesFromWeb

-                      v40
+                      v41
   * For each plagiarism:
     * describe plagiarism technique(s) used
-    * which detection methods might be able to reveal it -- give reasons
-    * which detection methods might not be able to reveal it -- give reasons
+    * which detection methods might be able to reveal it – give reasons
+    * which detection methods might not be able to reveal it – give reasons
   * **Submit a text file containing 10 documents according to the requirements + 1 text file describing the techniques used and your estimate of which detection techniques may or may not work.**
 Or: Select a detection algorithm and implement it in Python. //The right homework if you want to learn something.//
-  * A basic detection script to extend: [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] -- usage: {{{python plagiarism_simple.py < training_data.vert}}}
+  * A basic detection script to extend: [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] – usage: {{{python plagiarism_simple.py < training_data.vert}}}
     * The script implements a bag-of-words approach with cosine similarity of word vectors. //(For the sake of simplicity: a plagiarism cannot have more than one source here.)//
     * You can modify the script to
 …
       * or implement another lexical/syntactic detection approach, e.g. word n-grams or Levenshtein distance
       * or implement another semantic detection approach, e.g. the similarity of {{{word2vec}}} vectors
-      * or do it another way, be creative -- describe how it works in comments in the code.
+      * or do it another way, be creative – describe how it works in comments in the code.
   * Input format: A 3-column vertical, see above. [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/training_data.vert training_data.vert]
   * Output: One plagiarism per line: id TAB detected source id TAB real source id. Evaluation line: precision, recall, F1 measure.
   * Your script will be evaluated using data made by others.
-  * Describe which plagiarism detection technique(s) were implemented -- put it in a comment in the beginning of your script.
+  * Describe which plagiarism detection technique(s) were implemented – put it in a comment in the beginning of your script.
   * **Submit the modified script (or your own script) with a short description in a comment.** (The training set output of the script is not required.)
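
As an illustration of the word n-gram option and of the output format described above, here is a minimal sketch. It is not the method of plagiarism_simple.py (which uses bag of words with cosine similarity), and it does not parse the 3-column vertical file: the in-memory dictionaries, the function names, the trigram size, the threshold of 0.1, and the use of "-" for "no source" are all assumptions made for this example.

```python
def word_ngrams(tokens, n=3):
    """Return the set of word n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_source(suspect_tokens, originals, n=3, threshold=0.1):
    """Return the id of the most similar original, or None if below threshold.

    `originals` maps a document id to its token list (hypothetical layout).
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    suspect = word_ngrams(suspect_tokens, n)
    best_id, best_score = None, 0.0
    for doc_id, tokens in originals.items():
        score = jaccard(suspect, word_ngrams(tokens, n))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id if best_score >= threshold else None


def evaluate(results):
    """Compute (precision, recall, F1) from (detected, real) source id pairs.

    None stands for "no source detected" or "not a plagiarism".
    """
    true_pos = sum(1 for d, r in results if d is not None and d == r)
    detected = sum(1 for d, _ in results if d is not None)
    real = sum(1 for _, r in results if r is not None)
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / real if real else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def format_line(doc_id, detected, real):
    """One output line: id TAB detected source id TAB real source id."""
    return "\t".join(x if x is not None else "-" for x in (doc_id, detected, real))


if __name__ == "__main__":
    # Toy data invented for this sketch; real input comes from the vertical file.
    originals = {
        "o1": "the quick brown fox jumps over the lazy dog".split(),
        "o2": "language resources can be built from the web".split(),
    }
    suspects = {
        "s1": ("a quick brown fox jumps over a sleepy dog".split(), "o1"),
        "s2": ("corpora and language resources can be built from web pages".split(), "o2"),
        "s3": ("completely unrelated sentence about cooking pasta tonight".split(), None),
    }
    results = []
    for doc_id, (tokens, real) in suspects.items():
        detected = detect_source(tokens, originals)
        results.append((detected, real))
        print(format_line(doc_id, detected, real))
    print("precision=%.2f recall=%.2f F1=%.2f" % evaluate(results))
```

Set overlap of n-grams is sensitive to word order, so it tends to catch copy-and-edit plagiarism that a pure bag-of-words comparison scores the same as a shuffled text; heavy paraphrasing will likely defeat both.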