
LanguageModelling

We will build a simple character-based language model and use it to generate natural-looking sentences. We need a plain-text corpus and a fast suffix-sorting tool (mksary).
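A suffix array stores the start positions of all suffixes of a text in lexicographic order. As a result, all occurrences of any character sequence form one contiguous range, which can be found by binary search. The naive Python sketch below only illustrates the idea. It is much slower than mksary and makes no assumption about mksary's output file format. The function names {{{suffix_array}}} and {{{count_occurrences}}} are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, sorted lexicographically.

    Naive O(n^2 log n) construction; fine for toy inputs, not for a corpus.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Count occurrences of `pattern` with two binary searches over `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)                                  # [5, 3, 1, 0, 4, 2]
    print(count_occurrences(text, sa, "ana"))  # 2
```

The two searches work because truncating sorted suffixes to their first m characters keeps them in non-decreasing order.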
== Getting the necessary data and tools ==
 * download the archive: {{{wget nlp.fi.muni.cz/~xbaisa/cblm.tar.gz}}}
 * unpack it in your directory: {{{tar xvzf cblm.tar.gz}}}
 * enter the new directory: {{{cd cblm}}}
== Training data ==
To build a new model, we need to:
 * prepare a plain text (see the {{{data}}} directory) and preprocess it with {{{lower.py}}},
 * create a suffix array: {{{mksary INPUT.txt OUTPUT.sa}}},
 * compute the prefix tree: {{{python build_trie.py FILE.sa [MINFREQ] [OUTPUTFILE]}}}.
The model is stored in the {{{.trie}}} file.
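The prefix tree records how often each character sequence occurs in the training text. The sketch below assumes a nested-dictionary trie limited to a hypothetical {{{max_depth}}} and uses {{{min_freq}}} as a stand-in for the MINFREQ argument. For brevity it counts substrings directly from the text, whereas {{{build_trie.py}}} reads the suffix array. The real node layout and .trie file format are not reproduced here.

```python
def build_trie(text: str, max_depth: int = 5, min_freq: int = 2) -> dict:
    """Build a character prefix tree of all substrings of length <= max_depth.

    Each node is a dict {"count": int, "children": {char: node}}.
    Nodes seen fewer than `min_freq` times are pruned (stand-in for MINFREQ).
    """
    root = {"count": len(text), "children": {}}
    for i in range(len(text)):
        node = root
        # Walk down the trie along the substring starting at position i.
        for ch in text[i:i + max_depth]:
            child = node["children"].setdefault(ch, {"count": 0, "children": {}})
            child["count"] += 1
            node = child
    _prune(root, min_freq)
    return root


def _prune(node: dict, min_freq: int) -> None:
    """Recursively drop children whose count is below `min_freq`."""
    node["children"] = {
        ch: child
        for ch, child in node["children"].items()
        if child["count"] >= min_freq
    }
    for child in node["children"].values():
        _prune(child, min_freq)


if __name__ == "__main__":
    trie = build_trie("banana", max_depth=3, min_freq=2)
    print(sorted(trie["children"]))  # ['a', 'n']  ("b" occurs once and is pruned)
```

Pruning a node never loses a frequent descendant, because a longer sequence can never occur more often than its own prefix.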
== Generating text ==
To generate random text, run:
{{{python alarm.py FILE.trie}}}
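This page does not describe how {{{alarm.py}}} samples text. One common approach, sketched below as an assumption, picks each next character in proportion to how often it followed the longest context seen in training. When the current context is unknown, it backs off to shorter contexts. So that the sketch runs on its own, it uses a flat table of context counts built by a hypothetical {{{train}}} function instead of the trie.

```python
import random
from collections import Counter, defaultdict


def train(text: str, order: int = 4) -> dict[str, Counter]:
    """Map every context of length 0..order to a Counter of the characters that follow it."""
    model: dict[str, Counter] = defaultdict(Counter)
    for i, ch in enumerate(text):
        for k in range(order + 1):
            if i - k < 0:
                break
            model[text[i - k:i]][ch] += 1
    return model


def generate(model: dict[str, Counter], order: int, length: int, seed=None) -> str:
    """Sample `length` characters, backing off to shorter contexts when needed."""
    rng = random.Random(seed)
    out = ""
    for _ in range(length):
        # Try the longest available context first (at most `order` characters);
        # the empty context "" always exists if the training text was non-empty.
        for k in range(min(order, len(out)), -1, -1):
            context = out[len(out) - k:]
            if context in model:
                counts = model[context]
                break
        chars = list(counts)
        out += rng.choices(chars, weights=[counts[c] for c in chars])[0]
    return out


if __name__ == "__main__":
    model = train("abababab", order=2)
    print(generate(model, order=2, length=10, seed=0))
```

For example, {{{generate(train(text, order=4), order=4, length=200, seed=0)}}} returns 200 characters sampled from a model of {{{text}}}. A larger {{{order}}} tends to copy longer stretches of the training text verbatim.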
=== Task ===
 …
 * changing the generating process
 * or all of the above.
Upload 10,000 random sentences to your vault. Describe your changes and tunings in a README, where you can also include some hilarious example sentences.