Changes between Version 5 and Version 6 of private/NlpInPracticeCourse/CorpusIndexing


Timestamp:
Nov 5, 2020, 11:33:18 AM (3 years ago)
Author:
Miloš Jakubíček
Comment:

--

  • private/NlpInPracticeCourse/CorpusIndexing

    v5 v6  
    1515== Practical Session ==
    1616
    17  1. log in to alba
    18  1. inspect command-line tools that are part of manatee (rpm -ql manatee)
    19  1. inspect index files of BNC using less, od, lsclex, dumpdrev, dumpdtext
    20  1. inspect the Python API of Manatee using the provided overview
     17Compare searching (A) plain text using grep, (B) an indexed corpus using Manatee, and (C) a corpus stored in an arbitrary SQL database.
     18Use the vertical text of the BNC, available at aurora:/corpora/vert/bnc/bnc.vert.xz.
     19
     20Search for the phrase "test case" and display a context of 10 words before and after each occurrence.
     21
     22(A) plain text
     23
     24Hint: use grep -C to display context
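The plain-text search in (A) can also be sketched in Python as a sanity check for the grep result. This is a minimal sketch, not a reference solution. It assumes the usual vertical layout (one token per line, the word form in the first tab-separated column) and treats every line starting with "<" as structure markup; the name `find_phrase` is hypothetical.

```python
from collections import deque


def find_phrase(lines, phrase, context=10):
    """Yield (left, match, right) token lists for each occurrence of phrase.

    Assumption: one token per line, word form in the first tab-separated
    column, and lines starting with "<" are structure tags, not tokens.
    """
    phrase = list(phrase)
    n = len(phrase)
    recent = deque(maxlen=context + n)  # latest tokens, including a possible match
    pending = []                        # hits still collecting their right context
    for line in lines:
        if not line.strip() or line.startswith("<"):
            continue
        word = line.rstrip("\n").split("\t", 1)[0]
        # Feed the new token to hits whose right context is not yet full.
        for hit in pending:
            if len(hit[2]) < context:
                hit[2].append(word)
        # Hits are stored in match order, so the oldest one fills up first.
        while pending and len(pending[0][2]) >= context:
            yield tuple(pending.pop(0))
        recent.append(word)
        if len(recent) >= n and list(recent)[-n:] == phrase:
            pending.append([list(recent)[:-n], list(phrase), []])
    # Hits near the end of the input keep whatever right context exists.
    for hit in pending:
        yield tuple(hit)
```

The function streams over any iterable of lines, so the compressed file could be read with `lzma.open(path, "rt", encoding="utf-8")` from the standard library without unpacking it first.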
     25
     26(B)
     27
     28The corpus is already indexed in Manatee; try:
     29
     30time corpquery bnc '[word="test"] [word="case"]'
     31
     32(C)
     33
     34Use your favourite SQL database; on aurora you can use sqlite3.
     35Hint on how to import vertical text:
     36
     37https://stackoverflow.com/questions/26065872/how-to-import-a-tsv-file-with-sqlite3
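One possible approach to (C) is sketched below with Python's built-in `sqlite3` module instead of the sqlite3 shell import. The table layout (`tokens` with columns `pos` and `word`), the index name, and the function names are assumptions made for illustration. The input is assumed to be vertical text with the word form in the first tab-separated column and structure tags starting with "<".

```python
import sqlite3


def iter_words(lines):
    """Yield word forms from vertical text, skipping structure tags (assumed to start with "<")."""
    for line in lines:
        if line.strip() and not line.startswith("<"):
            yield line.rstrip("\n").split("\t", 1)[0]


def build_index(conn, lines):
    """Store one row per token and add an index on the word column."""
    conn.execute("CREATE TABLE IF NOT EXISTS tokens (pos INTEGER PRIMARY KEY, word TEXT)")
    conn.executemany("INSERT INTO tokens (pos, word) VALUES (?, ?)",
                     enumerate(iter_words(lines)))
    conn.execute("CREATE INDEX IF NOT EXISTS tokens_word ON tokens (word)")
    conn.commit()


def search_bigram(conn, first, second, context=10):
    """Yield (position, tokens) for each hit, with up to `context` tokens on each side."""
    hits = conn.execute(
        "SELECT a.pos FROM tokens AS a JOIN tokens AS b ON b.pos = a.pos + 1 "
        "WHERE a.word = ? AND b.word = ? ORDER BY a.pos",
        (first, second),
    ).fetchall()
    for (pos,) in hits:
        rows = conn.execute(
            "SELECT word FROM tokens WHERE pos BETWEEN ? AND ? ORDER BY pos",
            (pos - context, pos + 1 + context),
        )
        yield pos, [word for (word,) in rows]
```

The self-join on `pos + 1` is what expresses "two adjacent tokens" in SQL; the index on `word` lets the database jump to the candidate positions instead of scanning the whole table.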
     38
     39For (A), (B) and (C), submit the commands you used and how long each search took.
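For shell commands, the `time` prefix shown in (B) reports the duration directly. If part of a solution is written in Python, a small helper along these lines could measure wall-clock time in a comparable way; the name `timed` is hypothetical, and this is only a sketch.

```python
import time


def timed(func, *args, **kwargs):
    """Call func once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return time.perf_counter() - start, result


# Example: time a single call and keep its result.
elapsed, total = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.3f} s")
```

If the measured function returns a lazy iterator or generator, the results have to be consumed inside the timed call (for example by wrapping it in `list`), otherwise only the setup cost is measured.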