= Indexing and Searching Very Large Texts =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Miloš Jakubíček

== State of the Art ==

=== References ===

1. RYCHLÝ, Pavel, et al. Korpusové manažery a jejich efektivní implementace. 2000.
1. JAKUBÍČEK, Miloš; KILGARRIFF, Adam; RYCHLÝ, Pavel. Effective Corpus Virtualization. In: Challenges in the Management of Large Corpora (CMLC-2) Workshop Programme. p. 7.
1. JAKUBÍČEK, Miloš, et al. Fast Syntactic Searching in Very Large Corpora for Many Languages. In: PACLIC. 2010. p. 741-747.

== Practical Session ==

1. (Optional) Log in to aurora.
1. Write a program or script that finds all occurrences of a given word form in the [[htdocs:bigdata/bnc.vert.xz|vertical file]] and prints each one with a small context (at least 5 preceding and 5 following words).
1. The script takes two arguments: the path to the vertical file and the word to search for. [[br]]
If you are logged in to aurora, you can use the fixed path to the vertical file
{{{
/nlp/trac/research/htdocs/bigdata/bnc.vert
}}}
without copying it.
1. Submit the script to the IS vault.
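
The search step in the task above could be sketched roughly as follows. This is only an illustrative starting point, not a reference solution. It assumes the vertical file has one token per line, with the word form in the first tab-separated column, and that lines starting with `<` are structural markup. Check both assumptions against the actual file. The names `tokens`, `concordance`, and `main` are hypothetical. The sketch scans the file linearly in one pass and builds no index, so memory use stays constant regardless of file size.

```python
import sys
from collections import deque

CONTEXT = 5  # words of context on each side (the task asks for at least 5)


def tokens(lines):
    """Yield word forms from vertical-format lines (layout is an assumption)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue  # skip blank lines and assumed structural markup
        yield line.split("\t", 1)[0]  # assumed: word form is the first column


def concordance(lines, word, context=CONTEXT):
    """Yield (left, keyword, right) for every occurrence of `word`."""
    left = deque(maxlen=context)  # the last `context` words seen so far
    pending = []                  # hits still collecting their right context
    for tok in tokens(lines):
        for hit in pending:
            hit[2].append(tok)
        # The oldest hit always fills up first, so emit from the front.
        while pending and len(pending[0][2]) == context:
            yield pending.pop(0)
        if tok == word:
            pending.append((list(left), tok, []))
        left.append(tok)
    # Hits too close to the end of the input keep a shorter right context.
    for hit in pending:
        yield hit


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} VERTICAL_FILE WORD", file=sys.stderr)
        return
    path, word = argv[1], argv[2]
    with open(path, encoding="utf-8", errors="replace") as f:
        for left, kw, right in concordance(f, word):
            print(" ".join(left), f"<{kw}>", " ".join(right))


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as `search.py`, the sketch would be run as `python3 search.py /nlp/trac/research/htdocs/bigdata/bnc.vert house`. To read the compressed `bnc.vert.xz` directly, `open(path, ...)` could be swapped for `lzma.open(path, "rt", encoding="utf-8")` from the standard library.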