CorpusIndexing

Timestamp:: Nov 30, 2015, 9:02:37 AM
Author:: Miloš Jakubíček
Comment:: --

private/NlpInPracticeCourse/CorpusIndexing

-                      v1
+                      v2
 === References ===
+Approximately 3 current papers (preferably from top NLP conferences and journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as sources for the one-hour lecture:
+. paper 1
+. paper 2
+. paper 3
+. RYCHLÝ, Pavel, et al. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. 2000.
+. JAKUBÍČEK, Miloš; KILGARRIFF, Adam; RYCHLÝ, Pavel. Effective Corpus Virtualization. In: Challenges in the Management of Large Corpora (CMLC-2) Workshop Programme. p. 7.
+. JAKUBÍČEK, Miloš, et al. Fast Syntactic Searching in Very Large Corpora for Many Languages. In: PACLIC. 2010. pp. 741-747.
 == Practical Session ==
+A concrete description of the work assignment for students in the second one-hour part of the lecture. The work consists of tasks involving practical implementations of algorithms related to the current topic (probably not the state-of-the-art algorithms presented in the first part), applied to real data. Students can test the algorithms, evaluate them and possibly try some short adaptations for various subtasks.
+Students are also required to produce some results of their work and hand them in as proof that they have completed the tasks.
+. log in to aurora
+. write a program or script that finds all occurrences of a given word form, together with a small context (at least 5 preceding and 5 following words), in the vertical file {{{/corpora-fast1/vert/bnc/bnc.vert}}}
+. the script takes two arguments: the path to the vertical file and the word form to search for
+. submit the script to the IS vault
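The search task above can be solved with a single streaming pass over the file. The following is a minimal sketch in Python, not a reference solution. It assumes that each token line holds the word form in its first tab-separated column, that lines starting with {{{<}}} are structure tags (such as {{{<doc>}}}, {{{<s>}}} or {{{<g/>}}}), that the file is UTF-8 encoded, that matching is exact and case-sensitive, and that the context may cross sentence and document boundaries. The script name {{{concordance.py}}} and the helper names {{{tokens}}}, {{{concordance}}} and {{{main}}} are hypothetical.

```python
#!/usr/bin/env python3
"""Print every occurrence of a word form in a vertical file with its context.

Minimal sketch under assumptions: one token per line, the word form in the
first tab-separated column, structure tags on lines starting with '<'.
"""
import sys
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

CONTEXT = 5  # tokens on each side; the assignment asks for at least 5

Hit = Tuple[List[str], str, List[str]]  # (left context, match, right context)


def tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield the word form of each token line, skipping structure tags."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue  # e.g. <doc ...>, <s>, </s>, <g/>
        yield line.split("\t", 1)[0]


def concordance(words: Iterable[str], query: str,
                width: int = CONTEXT) -> Iterator[Hit]:
    """Yield (left, match, right) for every exact, case-sensitive match."""
    left: Deque[str] = deque(maxlen=width)  # sliding window of preceding tokens
    pending: List[Hit] = []  # matches still collecting their right context
    for word in words:
        # The current token follows every pending match: extend their contexts.
        waiting: List[Hit] = []
        for ctx_left, hit, ctx_right in pending:
            ctx_right.append(word)
            if len(ctx_right) == width:
                yield ctx_left, hit, ctx_right
            else:
                waiting.append((ctx_left, hit, ctx_right))
        pending = waiting
        if word == query:
            pending.append((list(left), word, []))
        left.append(word)
    # End of input: emit the remaining matches with a shorter right context.
    yield from pending


def main(args: List[str]) -> None:
    """Entry point; args are [path to vertical file, word form]."""
    if len(args) != 2:
        print("usage: concordance.py VERTICAL_FILE WORD", file=sys.stderr)
        return
    path, query = args
    # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
    with open(path, encoding="utf-8", errors="replace") as vert:
        for ctx_left, hit, ctx_right in concordance(tokens(vert), query):
            print(" ".join(ctx_left), f"[{hit}]", " ".join(ctx_right), sep="\t")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Example invocation on aurora: {{{python3 concordance.py /corpora-fast1/vert/bnc/bnc.vert house}}}. Each match is printed on one line as the left context, the bracketed match and the right context, separated by tabs. The script keeps only a fixed window of preceding tokens plus at most five matches still awaiting their right context, so memory use stays small even for a corpus the size of the BNC; every query, however, still reads the whole file from start to end.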