Changes between Version 9 and Version 10 of private/NlpInPracticeCourse/LanguageModelling
Timestamp: Nov 2, 2015, 8:27:08 AM (8 years ago)
private/NlpInPracticeCourse/LanguageModelling
v9  v10
30  30     * {{{cd cblm}}}
31  31
    32  + Mksary
    33  +
    34  +  * {{{git clone https://github.com/lh3/libdivsufsort.git}}}
    35  +  * {{{cd libdivsufsort}}}
    36  +  * {{{mkdir build}}}
    37  +  * {{{cmake -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX="/ABSOLUTE_PATH_TO_LIBDIVSUFSORT"}}}
    38  +  * {{{make}}}
    39  +  * {{{cd ...}}}
    40  +  * {{{ln -s libdivsufsort/examples/mksary mksary}}}
    41  +
32  42    == Training data ==
33  43
34  44    To build a new model, we need
35  45     * a plain text, see {{{data}}} directory, use {{{lower.py}}}
36      -  * to create a suffix array {{{ mksary INPUT.txt OUTPUT.sa}}}
    46  +  * to create a suffix array {{{./mksary INPUT.txt OUTPUT.sa}}}
37  47     * and compute the prefix tree: {{{python build_trie.py FILE.sa [MINFREQ] [OUPUTFILE]}}}
38  48
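
The training-data steps in this revision rely on a suffix array of the lowercased text. As a rough illustration of what that structure is and why it helps with n-gram counting, here is a minimal Python sketch. It is not the algorithm used by {{{mksary}}} (libdivsufsort is far faster) and not the logic of {{{build_trie.py}}}; the function names {{{build_suffix_array}}} and {{{count_occurrences}}} are hypothetical and chosen only for this example.

```python
from typing import List


def build_suffix_array(text: bytes) -> List[int]:
    """Return the start offsets of all suffixes of `text` in sorted order.

    Naive construction by sorting suffix slices; suitable for toy inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: bytes, sa: List[int], pattern: bytes) -> int:
    """Count occurrences of `pattern` in `text` by binary search over `sa`.

    All suffixes that start with `pattern` form one contiguous run in the
    suffix array, so the count is the length of that run.
    """
    m = len(pattern)

    # Find the first suffix whose m-byte prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Find the first suffix whose m-byte prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


if __name__ == "__main__":
    # Lowercase first, mirroring the lowercasing step mentioned in the text.
    corpus = "The cat and the dog and the cat".lower().encode("utf-8")
    sa = build_suffix_array(corpus)
    for ngram in (b"the ", b"the cat", b"dog", b"bird"):
        print(ngram.decode("utf-8"), count_occurrences(corpus, sa, ngram))
```

Because every n-gram count reduces to two binary searches over the sorted suffixes, frequencies of arbitrary-length contexts can be looked up without storing a separate table per n-gram order.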