Changes between Version 6 and Version 7 of private/NlpInPracticeCourse/ParsingCzech
- Timestamp: Oct 26, 2015, 2:44:41 PM
https://ske.fi.muni.cz/bonito/r.cgi/dumpws?corpname=user/novakjan/gramdev_czechwiki [[BR]]
[[BR]]
`gramdev_czechwiki` is the ''corpus_id'' of the Czech Wikipedia corpus. [[BR]]
Or, if you need more than 100,000 relations, use the alternative approach:
 1. Log on to the {{{alba.fi.muni.cz}}} server and use the {{{dumpws}}} command to export the content of the word sketch database: [[BR]]
    {{{dumpws /corpora/ca/user_data/<YOUR_USERNAME_IN_SKETCH_ENGINE>/registry/gramdev_czechwiki}}} [[BR]]
    You may need to ask for extra permissions to access the registry directories.
 5. Process the output of {{{dumpws}}} with a simple Bash or Python script to select the 100 most salient headword-collocation pairs for each relation. Upload the resulting list to the IS vault.
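A minimal Python sketch of the processing step is shown below. It assumes the {{{dumpws}}} output has already been flattened into one tab-separated line per pair (relation, headword, collocation, salience score); this layout is an assumption, not the documented {{{dumpws}}} format, and the function names `parse_lines`, `top_pairs` and `write_pairs` are hypothetical, so adapt the parsing to the output you actually get. The script groups the pairs by relation and keeps the 100 with the highest salience in each group.

```python
import heapq
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

# (salience, headword, collocation)
Pair = Tuple[float, str, str]


def parse_lines(lines: Iterable[str]) -> Dict[str, List[Pair]]:
    """Group pairs by relation name.

    ASSUMED (hypothetical) input format, one pair per line:
    relation<TAB>headword<TAB>collocation<TAB>salience
    """
    by_relation: Dict[str, List[Pair]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        relation, headword, collocation, salience = fields
        try:
            score = float(salience)
        except ValueError:
            continue  # skip header lines or non-numeric scores
        by_relation[relation].append((score, headword, collocation))
    return by_relation


def top_pairs(by_relation: Dict[str, List[Pair]], n: int = 100) -> Dict[str, List[Pair]]:
    """Keep the n most salient pairs of each relation, highest score first."""
    return {
        relation: heapq.nlargest(n, pairs, key=lambda pair: pair[0])
        for relation, pairs in by_relation.items()
    }


def write_pairs(selected: Dict[str, List[Pair]], out: TextIO) -> None:
    """Write the selected pairs as tab-separated lines, grouped by relation."""
    for relation in sorted(selected):
        for score, headword, collocation in selected[relation]:
            out.write(f"{relation}\t{headword}\t{collocation}\t{score:g}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are hypothetical):
    #   python top_pairs.py flattened_dumpws.tsv > top100.tsv
    with open(sys.argv[1], encoding="utf-8") as handle:
        write_pairs(top_pairs(parse_lines(handle)), sys.stdout)
```

`heapq.nlargest` returns the same result as sorting each relation's list in descending order and taking the first 100, but without fully sorting relations that contain many pairs.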