Changes between Version 6 and Version 7 of private/NlpInPracticeCourse/ParsingCzech


Timestamp: Oct 26, 2015, 2:44:41 PM
Author: Ales Horak

  • private/NlpInPracticeCourse/ParsingCzech

    v6 v7  
    2525   https://ske.fi.muni.cz/bonito/r.cgi/dumpws?corpname=user/novakjan/gramdev_czechwiki [[BR]]
    2626   [[BR]]
    27    `gramdev_czechwiki` is the `<corpus_id>` of the Czech Wikipedia corpus.
     27   `gramdev_czechwiki` is the ''corpus_id'' of the Czech Wikipedia corpus. [[BR]]
    2828   Or, if you need more than 100,000 relations, use the following alternative:
    2929  1. Log in to the {{{alba.fi.muni.cz}}} server and use the {{{dumpws}}} command to export the content of the word sketch database: [[BR]]
    30    {{{dumpws /corpora/ca/user_data/<YOUR_USERNAME_IN_SKETCH_ENGINE>/registry/gramdev_czechwiki}}}
     30   {{{dumpws /corpora/ca/user_data/<YOUR_USERNAME_IN_SKETCH_ENGINE>/registry/gramdev_czechwiki}}} [[BR]]
     31   For this, you may need to request additional access permissions to the registry directories.
    3132 5. Process the output of {{{dumpws}}} with a simple Bash or Python script to select the 100 most salient headword-collocation pairs for each relation. Upload the resulting list into the IS vault.
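
The selection in step 5 could be sketched in Python roughly as follows. The page does not describe the {{{dumpws}}} output format, so this sketch assumes one tab-separated record per line with the columns relation, headword, collocate and salience score, in that order. The column layout, the function name {{{top_pairs}}} and the file-name argument are all hypothetical and would need adjusting to the real output.

```python
import heapq
import sys
from collections import defaultdict


def top_pairs(lines, limit=100):
    """Group records by relation and keep the `limit` highest-scoring pairs.

    ASSUMPTION: each line is "relation<TAB>headword<TAB>collocate<TAB>score".
    This layout is hypothetical; adapt the unpacking to the real dumpws output.
    """
    by_relation = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip lines that do not match the assumed layout
        relation, headword, collocate, score_text = fields[:4]
        try:
            score = float(score_text)
        except ValueError:
            continue  # skip records whose score is not numeric
        by_relation[relation].append((score, headword, collocate))
    # nlargest returns the pairs sorted from highest to lowest score
    return {rel: heapq.nlargest(limit, pairs) for rel, pairs in by_relation.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python select_top.py dumpws_output.txt > top100.tsv
    with open(sys.argv[1], encoding="utf-8") as handle:
        selected = top_pairs(handle)
    for relation, pairs in selected.items():
        for score, headword, collocate in pairs:
            print(f"{relation}\t{headword}\t{collocate}\t{score}")
```

Keeping the selection in a function that accepts any iterable of lines makes it easy to try on a small hand-made sample before running it on the full export.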