Changes between Version 42 and Version 43 of private/NlpInPracticeCourse/LanguageResourcesFromWeb


Timestamp:
Oct 30, 2024, 1:36:07 AM (7 months ago)
Author:
xsuchom2
Comment:

Google Colab link

  • private/NlpInPracticeCourse/LanguageResourcesFromWeb

    v42  v43
    46   46      * which detection methods might not be able to reveal it – give reasons
    47   47    * **Submit a text file containing 10 documents according to the requirements + 1 text file describing the techniques used and your estimate of which detection techniques may or may not work.**
         48    * If you don't have access to NLPC machines, you may submit plain text with document structures instead of a POS-tagged vertical.
    48   49
    49   50  Or: Select a detection algorithm and implement it in Python. //The right homework if you want to learn something.//
    50         * A basic detection script to extend: [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] – usage: {{{python plagiarism_simple.py < training_data.vert}}}
         51    * A basic detection script to extend: use [https://colab.research.google.com/drive/1eoitLND5_URua-1IRddoQDSbkYzlSbqP the interactive version in Google Colab], or download [https://nlp.fi.muni.cz/trac/research/raw-attachment/wiki/en/NlpInPracticeCourse/LanguageResourcesFromWeb/plagiarism_simple.py plagiarism_simple.py] and run it yourself: {{{python plagiarism_simple.py < training_data.vert}}}.
    51   52      * This script implements a bag-of-words approach with cosine similarity of word vectors. //(For simplicity, a plagiarized document can have only one source here.)//
    52   53      * You can modify the script to
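The bag-of-words plus cosine-similarity idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of plagiarism_simple.py: the function names, the document ids and the toy sentences are hypothetical, and the input is assumed to be already tokenized (reading the vertical format is left out). In line with the simplification above, it reports at most one source per suspicious document.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_source(suspicious_tokens, originals):
    """Return (doc_id, similarity) of the single closest original.

    `originals` maps a (hypothetical) document id to its token list.
    Returns (None, 0.0) if no original shares any word.
    """
    suspicious_vec = Counter(suspicious_tokens)
    best_id, best_sim = None, 0.0
    for doc_id, tokens in originals.items():
        sim = cosine_similarity(suspicious_vec, Counter(tokens))
        if sim > best_sim:
            best_id, best_sim = doc_id, sim
    return best_id, best_sim


if __name__ == "__main__":
    # Toy data, invented for this example only.
    originals = {
        "orig1": "the cat sat on the mat".split(),
        "orig2": "stock prices fell sharply today".split(),
    }
    suspicious = "the cat sat on a mat".split()
    print(most_similar_source(suspicious, originals))
```

A real detector would still need a similarity threshold to decide whether the best match counts as plagiarism at all; the sketch only ranks candidates.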