Indexing and Searching Very Large Texts

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

State of the Art


  1. RYCHLÝ, Pavel, et al. Korpusové manažery a jejich efektivní implementace. 2000.
  2. JAKUBÍČEK, Miloš; KILGARRIFF, Adam; RYCHLÝ, Pavel. Effective Corpus Virtualization. In: Challenges in the Management of Large Corpora (CMLC-2) Workshop Programme. p. 7.
  3. JAKUBÍČEK, Miloš, et al. Fast Syntactic Searching in Very Large Corpora for Many Languages. In: PACLIC. 2010. p. 741-747.

Practical Session

  1. log in to alba
  2. inspect command-line tools that are part of manatee (rpm -ql manatee)
  3. inspect index files of BNC using less, od, lsclex, dumpdrev, dumpdtext
  4. inspect the Python API of Manatee using the provided overview
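
When looking at the index files, it can help to keep a simplified picture in mind of what a corpus index typically stores: a lexicon (word to id), the text as a sequence of ids, and a reverse index (id to positions), with positions often stored as gaps (deltas) so that they compress well. The sketch below is a toy in-memory illustration of that idea only. It is not Manatee's actual on-disk format or API, and all function names in it are hypothetical.

```python
from typing import Dict, List, Tuple


def build_index(tokens: List[str]) -> Tuple[Dict[str, int], List[int], Dict[int, List[int]]]:
    """Build a toy lexicon, id-encoded text, and reverse index (illustrative only)."""
    lexicon: Dict[str, int] = {}      # word -> numeric id, assigned in order of first occurrence
    text: List[int] = []              # corpus position -> word id
    for tok in tokens:
        wid = lexicon.setdefault(tok, len(lexicon))
        text.append(wid)

    rev: Dict[int, List[int]] = {}    # word id -> sorted list of corpus positions
    for pos, wid in enumerate(text):
        rev.setdefault(wid, []).append(pos)
    return lexicon, text, rev


def delta_encode(positions: List[int]) -> List[int]:
    """Store each position as the gap from the previous one (first gap is from 0)."""
    gaps, prev = [], 0
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps


def delta_decode(gaps: List[int]) -> List[int]:
    """Recover absolute positions from the gaps by a running sum."""
    positions, total = [], 0
    for g in gaps:
        total += g
        positions.append(total)
    return positions


def search(word: str, lexicon: Dict[str, int], rev_delta: Dict[int, List[int]]) -> List[int]:
    """Return all corpus positions of `word`, or an empty list if it is not in the lexicon."""
    wid = lexicon.get(word)
    if wid is None:
        return []
    return delta_decode(rev_delta[wid])


tokens = "the cat sat on the mat the end".split()
lexicon, text, rev = build_index(tokens)
rev_delta = {wid: delta_encode(poss) for wid, poss in rev.items()}
the_positions = search("the", lexicon, rev_delta)
print(the_positions)
```

A lookup touches only the lexicon entry and one position list, so its cost depends on the frequency of the word rather than on the size of the corpus. That is the general motivation for reverse indexes over very large texts.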
Last modified on Oct 1, 2020, 3:33:43 PM