Changes between Version 1 and Version 2 of private/NlpInPracticeCourse/CorpusIndexing


Timestamp:
Nov 30, 2015, 9:02:37 AM (8 years ago)
Author:
Miloš Jakubíček
Comment:

--

  • private/NlpInPracticeCourse/CorpusIndexing

    v1 v2  
    9 9 === References ===
    10 10
    11 Approximately 3 current papers (preferably from the best NLP conferences/journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as a source for the one-hour lecture:
    12 
    13  1. paper 1
    14  1. paper 2
    15  1. paper 3
     11 1. RYCHLÝ, Pavel, et al. Korpusové manažery a jejich efektivní implementace. 2000.
     12 1. JAKUBÍČEK, Miloš; KILGARRIFF, Adam; RYCHLÝ, Pavel. Effective Corpus Virtualization. In: Challenges in the Management of Large Corpora (CMLC-2) Workshop Programme. p. 7.
     13 1. JAKUBÍČEK, Miloš, et al. Fast Syntactic Searching in Very Large Corpora for Many Languages. In: PACLIC. 2010. p. 741-747.
    16 14
    17 15 == Practical Session ==
    18 16
    19 Concrete description of the work assignment for students for the second one-hour part of the lecture. The work will consist of tasks involving practical implementations of algorithms related to the current topic (probably not the state-of-the-art algorithms mentioned in the first part), applied to real data. Students can test the algorithms, evaluate them and possibly try some short adaptations for various subtasks.
    20 
    21 Students are also required to generate some results of their work and hand them in to prove completing the tasks.
     17 1. log in to aurora
     18 1. write a program or script that finds all occurrences of a given word form, each with a small context (at least 5 preceding and 5 following words), in the vertical file {{{/corpora-fast1/vert/bnc/bnc.vert}}}
     19 1. the script takes two arguments: the path to the vertical file and the word form to search for
     20 1. submit the script to the IS vault
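
The task above could be approached along the following lines. This is only a minimal sketch, not a reference solution. It assumes the usual vertical layout: one token per line, the word form in the first tab-separated column, and structure markup such as `<s>` or `</s>` on lines of its own. The page itself does not describe the layout of `bnc.vert`, so these assumptions should be checked against the file. The names `concordance` and `main` are hypothetical.

```python
import sys
from collections import deque


def concordance(lines, word, context=5):
    """Yield (left, keyword, right) for every token equal to `word`.

    Assumes one token per line with the word form in the first
    tab-separated column; lines that look like markup (<...>) are skipped.
    """
    left = deque(maxlen=context)  # the last `context` word forms seen
    pending = []                  # hits still collecting right-hand context
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: structure tags start with "<" and end with ">".
        if not line or (line.startswith("<") and line.endswith(">")):
            continue
        token = line.split("\t", 1)[0]
        for hit in pending:
            hit[2].append(token)
        # Hits complete in the order they were found, so check the front.
        while pending and len(pending[0][2]) >= context:
            yield pending.pop(0)
        if token == word:
            pending.append((list(left), token, []))
        left.append(token)
    # Hits near the end of the file keep whatever context was available.
    for hit in pending:
        yield hit


def main(argv):
    if len(argv) != 3:
        print("usage: concordance.py VERTICAL_FILE WORD", file=sys.stderr)
        return 2
    path, word = argv[1], argv[2]
    # Assumption: UTF-8 input; undecodable bytes are replaced, not fatal.
    with open(path, encoding="utf-8", errors="replace") as vertical:
        for left, keyword, right in concordance(vertical, word):
            print(" ".join(left), "<" + keyword + ">", " ".join(right))
    return 0


# Run only when arguments are given, so the functions can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved under a hypothetical name such as `concordance.py`, it would be run as `python3 concordance.py /corpora-fast1/vert/bnc/bnc.vert house`. The file is read as a stream with a bounded window of tokens, so memory use stays small however large the corpus is. The match is an exact, case-sensitive comparison, and context is allowed to cross sentence boundaries; both choices are simplifications.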