Changes between Version 6 and Version 7 of private/NlpInPracticeCourse/LanguageModelling
Timestamp: Nov 2, 2015, 8:01:00 AM
We will build a simple character-based language model and generate natural-looking sentences. We need a plain text and a fast suffix-sorting tool (mksary).

== Getting necessary data and tools ==

* {{{wget nlp.fi.muni.cz/~xbaisa/cblm.tar.gz}}}
* {{{tar xvzf cblm.tar.gz}}} in your working directory
* {{{cd cblm}}}

== Training data ==

To build a new model, we need to:
* take a plain text (see the {{{data}}} directory) and process it with {{{lower.py}}},
* create a suffix array: {{{mksary INPUT.txt OUTPUT.sa}}},
* compute the prefix tree (trie): {{{python build_trie.py FILE.sa [MINFREQ] [OUTPUTFILE]}}}.

The model is stored in the resulting {{{.trie}}} file.

== Generating text ==

To generate random text, run:
{{{python alarm.py FILE.trie}}}

=== Task ===
…
* changing the generation process
* or all of the above.

Upload 10,000 random sentences to your vault. Describe your changes and tuning in a README file, where you can also include some hilarious examples of generated sentences.
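The training and generation steps above can be illustrated with a minimal, self-contained Python sketch. It is not the course's {{{build_trie.py}}} or {{{alarm.py}}}: the function names ({{{build_suffix_array}}}, {{{build_trie}}}, {{{next_char_distribution}}}, {{{generate}}}), the nested-dict node layout and the backoff rule are hypothetical choices made for illustration. The sketch assumes a maximum-likelihood character n-gram model. It sorts suffixes naively in memory instead of calling mksary, reads the frequency of every substring of up to {{{max_order}}} characters off runs of adjacent suffixes, drops n-grams rarer than {{{min_freq}}} (assumed to play a role similar to MINFREQ), and samples each next character in proportion to its count under the longest context found in the trie.

```python
import random
from itertools import groupby


def build_suffix_array(text):
    """Return the start positions of the suffixes of `text` in sorted order.

    Naive construction: it materialises every suffix as a sort key, so it
    needs O(n^2) memory and is only meant for toy inputs. For real corpora
    the course uses mksary.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def insert(root, gram, count):
    """Attach `gram` to the trie (hypothetical helper); its parent path must exist."""
    node = root
    for ch in gram[:-1]:
        node = node["children"][ch]
    node["children"][gram[-1]] = {"count": count, "children": {}}


def build_trie(text, sa, max_order, min_freq=1):
    """Count all substrings of up to `max_order` characters via the suffix array.

    In suffix-array order, all suffixes starting with the same k characters
    are adjacent, so the frequency of each k-gram is the length of its run.
    n-grams seen fewer than `min_freq` times are dropped. A (k-1)-gram is never
    rarer than its extensions, so a parent is always inserted before its child.
    """
    root = {"count": len(text), "children": {}}
    for k in range(1, max_order + 1):
        # Suffixes shorter than k have no k-character prefix; skip them.
        prefixes = (text[i:i + k] for i in sa if len(text) - i >= k)
        for gram, run in groupby(prefixes):
            count = sum(1 for _ in run)
            if count >= min_freq:
                insert(root, gram, count)
    return root


def next_char_distribution(root, history, max_order):
    """Return the children of the longest known context (simple backoff)."""
    # The longest usable context is the last max_order - 1 characters.
    first = max(0, len(history) - (max_order - 1))
    for start in range(first, len(history) + 1):
        node = root
        for ch in history[start:]:
            node = node["children"].get(ch)
            if node is None:
                break
        if node is not None and node["children"]:
            return node["children"]
    return root["children"]


def generate(root, max_order, length, seed=None, prompt=""):
    """Append `length` characters, each sampled in proportion to its count."""
    rng = random.Random(seed)
    out = prompt
    for _ in range(length):
        children = next_char_distribution(root, out, max_order)
        chars = list(children)
        weights = [children[c]["count"] for c in chars]
        out += rng.choices(chars, weights=weights)[0]
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the cat ate the rat. "
    trie = build_trie(corpus, build_suffix_array(corpus), max_order=4)
    print(generate(trie, max_order=4, length=80, seed=1))
```

Raising {{{max_order}}} makes the output reproduce longer stretches of the training text, while a higher {{{min_freq}}} prunes rare continuations and shrinks the trie. Changing how {{{generate}}} turns counts into a choice (for example, sharpening or flattening the weights) is one way to change the generating process.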