Changes between Version 12 and Version 13 of private/AdvancedNlpCourse/LanguageResourcesFromWeb


Timestamp:
Oct 23, 2017, 2:43:42 AM (4 years ago)
Author:
xsuchom2
Comment:

--

  • private/AdvancedNlpCourse/LanguageResourcesFromWeb

    v12 v13  
    24 24 == Practical Session task ==
    25 25 === Plagiators vs. plagiarism detectors ===
    26  1. Create 5 documents (on a similar topic) and 5 plagiarisms of these documents, 10 documents in total. 100 words ≤ document length ≤ 500 words. 20% ≤ plagiarism content ≤ 75% (100% if done well).
    27  1. Select a detection algorithm and implement it in Python. At least one of your own plagiarisms must be detected and at least one must go undetected by your own script.
    28  1. Input format: POS-tagged vertical consisting of 10 structures `doc` with attributes `author`, `id`, `class`, `source`. The pair (author, id) is unique. Class is "original" or "plagiarism". Source is the id of the source (for a plagiarism) or the document's own id (for an original). For the sake of simplicity, a plagiarism cannot have more than one source here.
    29  1. Output format: One plagiarism per line: id `TAB` detected source id `TAB` real source id. Evaluation line: precision, recall, F1 measure.
    30  1. Your script will be evaluated using data made by others.
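The output format and evaluation line described in the v12 task text can be sketched as follows. This is only a minimal illustration, not the course's reference evaluator: the function names `evaluate` and `format_output`, the layout of the evaluation line, and the definitions used (a detection counts as correct when the detected source id equals the real source id; precision is taken over all reported detections, recall over the total number of real plagiarisms) are assumptions that the task text does not spell out.

```python
def evaluate(detections, total_plagiarisms):
    """Return (precision, recall, F1) for a list of detections.

    detections: list of (doc_id, detected_source_id, real_source_id) tuples.
    total_plagiarisms: number of real plagiarisms in the data (assumed known).
    """
    # Assumption: a detection is correct when both source ids match.
    correct = sum(1 for _, detected, real in detections if detected == real)
    precision = correct / len(detections) if detections else 0.0
    recall = correct / total_plagiarisms if total_plagiarisms else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def format_output(detections, total_plagiarisms):
    """One detection per line (TAB-separated), then an evaluation line."""
    lines = ["\t".join(row) for row in detections]
    precision, recall, f1 = evaluate(detections, total_plagiarisms)
    # Hypothetical layout of the evaluation line; the task only names the values.
    lines.append(f"precision: {precision:.2f}\trecall: {recall:.2f}\tF1: {f1:.2f}")
    return "\n".join(lines)
```

For example, `format_output([("p1", "o1", "o1"), ("p2", "o3", "o2")], 5)` would report two detections, one of them correct, followed by the evaluation line.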
       26 See the slides.
    31 27
    32 28 === Text processing pipelines ===