
Version 5 (modified by Miloš Jakubíček, 5 years ago)


Indexing and Searching Very Large Texts

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

State of the Art

References

  1. RYCHLÝ, Pavel, et al. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. 2000.
  2. JAKUBÍČEK, Miloš; KILGARRIFF, Adam; RYCHLÝ, Pavel. Effective Corpus Virtualization. In: Challenges in the Management of Large Corpora (CMLC-2) Workshop Programme. p. 7.
  3. JAKUBÍČEK, Miloš, et al. Fast Syntactic Searching in Very Large Corpora for Many Languages. In: PACLIC. 2010. pp. 741-747.

Practical Session

  1. log in to alba
  2. inspect the command-line tools that are part of Manatee (rpm -ql manatee)
  3. inspect the index files of the BNC (British National Corpus) using less, od, lsclex, dumpdrev and dumpdtext
  4. inspect the Python API of Manatee using the provided overview
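As a mental model for the index files inspected in step 3, the following is a minimal in-memory sketch. It assumes, as the tool names suggest, that each positional attribute is kept as a lexicon (string-to-ID mapping), the text as a stream of integer IDs, and a reversed index mapping each ID to its sorted corpus positions. The class `ToyAttributeIndex` and its methods are hypothetical illustrations, not Manatee's Python API; Manatee's real files are compressed binary structures that this does not reproduce.

```python
from collections import defaultdict


class ToyAttributeIndex:
    """Hypothetical toy model of one positional attribute (e.g. 'word').

    NOT Manatee's API or on-disk format; it only shows the three logical
    parts: lexicon, text stream of IDs, and reversed (inverted) index.
    """

    def __init__(self, tokens):
        self.lexicon = []             # ID -> string (the kind of data lsclex lists)
        self.str2id = {}              # string -> ID
        self.text = []                # corpus as a stream of IDs (cf. dumpdtext)
        rev = defaultdict(list)       # ID -> sorted positions (cf. dumpdrev)
        for pos, tok in enumerate(tokens):
            if tok not in self.str2id:
                self.str2id[tok] = len(self.lexicon)
                self.lexicon.append(tok)
            tid = self.str2id[tok]
            self.text.append(tid)
            rev[tid].append(pos)      # positions arrive in increasing order
        self.rev = dict(rev)          # freeze: no accidental new keys later

    def freq(self, word):
        """Corpus frequency, read directly off the reversed index."""
        return len(self.positions(word))

    def positions(self, word):
        tid = self.str2id.get(word)
        return [] if tid is None else self.rev[tid]

    def phrase(self, words):
        """Start positions of an exact phrase, using only the reversed index.

        Real engines merge sorted postings lists; sets are used here for brevity.
        """
        if not words:
            return []
        candidates = set(self.positions(words[0]))
        for offset, w in enumerate(words[1:], start=1):
            candidates &= {p - offset for p in self.positions(w)}
        return sorted(candidates)

    def context(self, pos, width=2):
        """Rebuild a KWIC-style window from the ID stream and the lexicon."""
        lo, hi = max(0, pos - width), min(len(self.text), pos + width + 1)
        return " ".join(self.lexicon[i] for i in self.text[lo:hi])


if __name__ == "__main__":
    idx = ToyAttributeIndex("the cat sat on the mat and the cat slept".split())
    print(idx.phrase(["the", "cat"]))  # [0, 7]
    print(idx.context(7))              # mat and the cat slept
```

The split mirrors why such indices scale: queries touch only the postings of the words involved, while the ID stream is read only to display concordance lines.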