We will build a simple language model (skip-gram) which has very interesting properties. When trained properly, the word vectors obey simple vector arithmetic, e.g.
vector "king" − vector "man" + vector "woman" ~= vector "queen".
We will train this model on large Czech and English corpora and evaluate the results.
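As an illustration of this vector arithmetic, here is a minimal sketch with hand-picked toy vectors (the words, dimensions, and values are invented for the example; real skip-gram embeddings are learned from the corpus, not constructed by hand):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hand-picked 3-dimensional "embeddings": dimension 1 ~ maleness,
# dimension 2 ~ royalty, dimension 3 ~ femaleness. These toy values
# only illustrate the geometry; trained vectors look nothing like this.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.1, 0.8, 0.9],
    "apple": [0.5, 0.4, 0.5],
}

# vector "king" - vector "man" + vector "woman"
query = [k - m + w for k, m, w in
         zip(vectors["king"], vectors["man"], vectors["woman"])]

# Find the nearest word by cosine similarity, excluding the three
# input words (as word2vec's analogy evaluation does).
best = max(
    (w for w in vectors if w not in ("king", "man", "woman")),
    key=lambda w: cosine(query, vectors[w]),
)
print(best)  # queen
```

With these toy values the offset lines up exactly, so "queen" wins; with real learned vectors the match is approximate, which is why the evaluation uses nearest-neighbour search rather than exact equality.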
We will build a simple character-based language model and generate natural-looking sentences. We need a plain-text corpus and a fast suffix-sorting algorithm (mksary).
=== Task ===

Change the training process and the generation process to produce the most natural-looking sentences, either by
* pre-processing the input plain text,
* tuning the training parameters,
* changing the generation process,
* or all of the above.