Indexing and Searching Very Large Texts

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

State of the Art

References

  1. RYCHLÝ, Pavel, et al. Korpusové manažery a jejich efektivní implementace. 2000.
  2. JAKUBÍČEK, Miloš; KILGARRIFF, Adam; RYCHLÝ, Pavel. Effective Corpus Virtualization. In: Challenges in the Management of Large Corpora (CMLC-2) Workshop Programme. p. 7.
  3. JAKUBÍČEK, Miloš, et al. Fast Syntactic Searching in Very Large Corpora for Many Languages. In: PACLIC. 2010. p. 741-747.

Practical Session

  1. Log in to alba
  2. Inspect the command-line tools that are part of manatee (rpm -ql manatee)
  3. Inspect the index files of the BNC using less, od, lsclex, dumpdrev and dumpdtext
  4. Inspect the Python API of Manatee using the provided overview
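
As background for step 3, the following toy sketch shows the three kinds of structure a corpus index typically holds: a lexicon (word to id), the text as a sequence of ids, and a reverse index (id to positions). It assumes, going by their names, that lsclex, dumpdrev and dumpdtext print the lexicon, the reverse index and the text respectively. It is not Manatee's on-disk format or API: every function name below is hypothetical, and the delta encoding is only a simplified illustration of storing gaps instead of absolute positions.

```python
from collections import defaultdict


def build_index(tokens):
    """Build a toy lexicon, text array and reverse index from a token list."""
    lexicon = {}                 # word string -> integer id
    text = []                    # the corpus as a sequence of word ids
    reverse = defaultdict(list)  # word id -> ascending list of positions
    for position, token in enumerate(tokens):
        # A new word gets the next free id; a known word keeps its id.
        word_id = lexicon.setdefault(token, len(lexicon))
        text.append(word_id)
        reverse[word_id].append(position)
    return lexicon, text, dict(reverse)


def delta_encode(positions):
    """Replace each ascending position by its gap from the previous one."""
    previous = 0
    gaps = []
    for position in positions:
        gaps.append(position - previous)
        previous = position
    return gaps


def delta_decode(gaps):
    """Recover absolute positions by summing the gaps."""
    total = 0
    positions = []
    for gap in gaps:
        total += gap
        positions.append(total)
    return positions


# Hypothetical nine-token "corpus" used only for illustration.
tokens = "the cat sat on the mat near the door".split()
lexicon, text, reverse = build_index(tokens)

the_id = lexicon["the"]
print("id of 'the':", the_id)
print("text as ids:", text)
print("positions of 'the':", reverse[the_id])
print("delta-encoded:", delta_encode(reverse[the_id]))
```

Looking up a word then means one lexicon lookup followed by reading a single position list, without scanning the text. Small gaps can be written in fewer bits than large absolute positions, which is the motivation for delta-style encodings in indexes over very large corpora.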

Last modified on Oct 16, 2019, 3:45:28 PM