LanguageResourcesFromWeb

== Practical Session task ==
=== Plagiarists vs. plagiarism detectors ===
 1. Create 5 documents on a similar topic and 5 plagiarisms of these documents, 10 documents in total. Each document must be between 100 and 500 words long. The share of plagiarized content must be between 20 % and 75 % (up to 100 % if done well).
 2. Select a detection algorithm and implement it in Python. At least 1 of your own plagiarisms must be detected, and at least 1 must remain undetected by your own script.
 3. Input format: a POS-tagged vertical consisting of 10 structures `doc` with the attributes `author`, `id`, `class` and `source`. The pair (`author`, `id`) is unique. `class` is either "original" or "plagiarism". `source` is the id of the source document (for a plagiarism) or the document's own id (for an original). For the sake of simplicity, a plagiarism cannot have more than one source here.
 4. Output format: one plagiarism per line: id `TAB` detected source id `TAB` real source id. Evaluation line: precision, recall and F1 measure.
 5. Your script will be evaluated using data made by others.
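A minimal sketch of reading the input format in Python, assuming the usual vertical conventions: one token per line with tab-separated columns (the word form in the first column), structures written as XML-like tags such as `<doc ...>` and `</doc>`, and attribute values in double quotes. The names `Doc`, `parse_vertical` and `words` are hypothetical, not part of any provided framework.

```python
import re
from dataclasses import dataclass, field

# Opening <doc ...> tag; group 1 holds the attribute string.
DOC_OPEN = re.compile(r'^<doc\s+(.*)>\s*$')
# key="value" pairs inside a tag (double-quoted values assumed).
ATTR = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Doc:
    author: str
    id: str
    cls: str      # value of the `class` attribute: "original" or "plagiarism"
    source: str   # source id (the document's own id for an original)
    rows: list = field(default_factory=list)  # token lines split into columns

    def words(self):
        """Word forms, assuming the first column holds the word."""
        return [row[0] for row in self.rows]


def parse_vertical(lines):
    """Collect <doc> structures and their token lines from a vertical."""
    docs, current = [], None
    for raw in lines:
        line = raw.rstrip('\n')
        opening = DOC_OPEN.match(line)
        if opening:
            attrs = dict(ATTR.findall(opening.group(1)))
            current = Doc(attrs['author'], attrs['id'],
                          attrs['class'], attrs['source'])
            docs.append(current)
        elif line.startswith('</doc'):
            current = None
        elif not line.strip() or line.startswith('<'):
            continue  # skip blank lines and other structures (<s>, <g/>, ...)
        elif current is not None:
            current.rows.append(line.split('\t'))
    return docs
```

Because `parse_vertical` only iterates over lines, it accepts an open file object (opened with `encoding='utf-8'`) as well as a list of strings.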
See the slides.
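One possible detection algorithm, sketched under stated assumptions rather than prescribed by the task: word trigram containment, i.e. the share of a document's trigrams that also occur in another document. Each document is compared with all the others; it is reported as a plagiarism of its best match if the score reaches `threshold`. The trigram size, the threshold of 0.2 and the direction heuristic (the document whose trigrams are more contained in the other is taken to be the copy) are assumptions to tune. The `Doc` record is a hypothetical stand-in for the parsed `doc` structures. The evaluation counts a detection as correct only when the detected source equals the real one, which is an assumed scoring rule.

```python
from collections import namedtuple

# Hypothetical minimal document record: id, class, source id, word forms.
Doc = namedtuple('Doc', 'id cls source words')


def ngrams(words, n=3):
    """Set of lowercased word n-grams of a document."""
    words = [w.lower() for w in words]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(a, b):
    """Share of the n-grams in a that also occur in b (0.0 if a is empty)."""
    return len(a & b) / len(a) if a else 0.0


def detect(docs, n=3, threshold=0.2):
    """Map the id of each suspected plagiarism to its detected source id."""
    grams = {d.id: ngrams(d.words, n) for d in docs}
    detected = {}
    for d in docs:
        best_id, best = None, 0.0
        for o in docs:
            if o.id == d.id:
                continue
            score = containment(grams[d.id], grams[o.id])
            if score > best:
                best_id, best = o.id, score
        # Assumed direction heuristic: the copy is the document whose
        # n-grams are more contained in the other one (ties are not reported).
        if (best_id is not None and best >= threshold
                and best > containment(grams[best_id], grams[d.id])):
            detected[d.id] = best_id
    return detected


def evaluate(docs, detected):
    """Print the result lines and return (precision, recall, F1)."""
    real = {d.id: d.source for d in docs}
    gold = {d.id for d in docs if d.cls == 'plagiarism'}
    for doc_id, src in sorted(detected.items()):
        print(f'{doc_id}\t{src}\t{real[doc_id]}')
    tp = sum(1 for doc_id, src in detected.items()
             if doc_id in gold and src == real[doc_id])
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    print(f'{precision:.3f}\t{recall:.3f}\t{f1:.3f}')
    return precision, recall, f1
```

Containment is used instead of a symmetric measure such as Jaccard similarity so that the score can differ in each direction, which the direction heuristic relies on. Exact trigram matching will miss heavily paraphrased plagiarisms.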
=== Text processing pipelines ===