Changes between Version 4 and Version 5 of private/AdvancedNlpCourse/CorpusIndexing


Timestamp: Oct 16, 2019, 3:45:28 PM
Author: Miloš Jakubíček
Comment: --

  • private/AdvancedNlpCourse/CorpusIndexing

    v4 v5  
    15 15 == Practical Session ==
    16 16
    17  1. (optionally) log in to aurora
    18  1. write a program or script that finds all occurrences of a given word form, each with a small context (at least 5 preceding and 5 following words), in the [[htdocs:bigdata/bnc.vert.xz|vertical file]]
    19  1. the script will take two arguments: the path to the vertical file and the word to search for [[br]]
    20  If you have logged in to aurora, you may use the fixed path to the vertical file,
    21  {{{
    22 /nlp/trac/research/htdocs/bigdata/bnc.vert
    23 }}}
    24  without the need to copy it.
    25  1. submit the script into the IS vault
     17 1. log in to alba
     18 1. inspect command-line tools that are part of manatee (rpm -ql manatee)
     19 1. inspect index files of BNC using less, od, lsclex, dumpdrev, dumpdtext
     20 1. inspect the Python API of Manatee using the provided overview
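
For reference, the search task from the v4 list (find all occurrences of a word form with at least 5 words of context on each side) could be approached roughly as follows. This is a minimal sketch, not the course's reference solution. It assumes the vertical file is UTF-8 text with one token per line, the word form in the first tab-separated column, and structure tags on lines starting with `<`. The names `read_tokens`, `kwic`, and `main` are hypothetical.

```python
import lzma
import sys
from collections import deque


def read_tokens(lines):
    """Yield the word form (first tab-separated column) of every token line."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: lines starting with "<" are structure tags such as <doc> or <s>.
        if not line or line.startswith("<"):
            continue
        yield line.split("\t", 1)[0]


def kwic(lines, word, context=5):
    """Yield (left, keyword, right) for each occurrence of `word`.

    The input is streamed, so the whole corpus never has to fit in memory.
    """
    left = deque(maxlen=context)  # the last `context` tokens seen so far
    pending = []                  # hits still collecting their right context
    for token in read_tokens(lines):
        for hit in pending:
            hit[2].append(token)
        if token == word:
            pending.append([list(left), token, []])
        # The oldest hits fill up first, so flush from the front.
        while pending and len(pending[0][2]) >= context:
            yield tuple(pending.pop(0))
        left.append(token)
    # Hits near the end of the input keep whatever right context exists.
    for hit in pending:
        yield tuple(hit)


def main(path, word):
    # Assumption: a ".xz" suffix means the file is xz-compressed.
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for left, keyword, right in kwic(f, word):
            print(" ".join(left), "[" + keyword + "]", " ".join(right))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: script.py PATH_TO_VERTICAL WORD", file=sys.stderr)
```

A linear scan like this reads the entire file for every query, which is the cost that the Manatee index files inspected in the v5 steps are meant to avoid.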