19 | | 1. Download and extract the corpus of Czech Wikipedia documents: [[htdocs:bigdata/wiki.tar.bz2|wiki corpus]]. |
20 | | |
21 | | 1. Train LSA and LDA models of the corpus for various numbers of topics using Gensim. You can use this template: |
22 | | |
23 | | 1. For both LSA and LDA select the best models |
24 | | |
25 | | Students will also be required to produce some results from their work and hand them in to show that they completed the tasks. |
| 19 | 1. Download and extract the corpus of Czech Wikipedia documents: [[htdocs:bigdata/wiki.tar.bz2|wiki corpus]]. |
| 20 | 1. Train LSA and LDA models of the corpus for various numbers of topics using Gensim. You can use this template: [raw-attachment:models.py models.py]. |
| 21 | 1. For both LSA and LDA select the best model (by looking at the data or by computing perplexity of a test set for LDA). |
| 22 | 1. Select the 5 most important topics, each with its 10 most important words, give each topic a name, save them to a text file, and upload the file to odevzdavarna. |
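
The last two steps of the new version (comparing LDA models by perplexity and writing out named topics) can be sketched in plain Python. This is only an illustration under stated assumptions, not the contents of `models.py`: the helper names `perplexity_from_bound`, `format_topics`, and `save_topics`, the sample words and weights, and the file name `topics.txt` are all hypothetical. The topic list is assumed to have the shape `[(topic_id, [(word, weight), ...]), ...]`, which is what Gensim's `show_topics(formatted=False)` is understood to return, and the perplexity conversion assumes the `2^(-bound)` convention that Gensim uses when it logs a perplexity estimate from the per-word bound.

```python
def perplexity_from_bound(per_word_bound):
    """Turn a per-word likelihood bound into perplexity (lower is better).

    Assumption: the 2 ** (-bound) convention, as in Gensim's log output.
    """
    return 2.0 ** (-per_word_bound)


def format_topics(topics, names, num_topics=5, num_words=10):
    """Render named topics as text, one topic per line.

    topics: list of (topic_id, [(word, weight), ...]) pairs (assumed shape).
    names:  hand-picked topic names, in the same order as topics.
    Words are ranked by absolute weight, because LSA weights can be negative.
    """
    lines = []
    for (topic_id, words), name in zip(topics[:num_topics], names):
        ranked = sorted(words, key=lambda pair: abs(pair[1]), reverse=True)
        top_words = ", ".join(word for word, _ in ranked[:num_words])
        lines.append(f"{name} (topic {topic_id}): {top_words}")
    return "\n".join(lines) + "\n"


def save_topics(path, text):
    """Write the report as UTF-8 so Czech diacritics survive."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Hypothetical sample data, made up for illustration only.
sample_topics = [
    (0, [("praha", 0.5), ("město", -0.7), ("řeka", 0.1)]),
    (1, [("hokej", 0.9), ("liga", 0.4), ("gól", 0.3)]),
]
sample_names = ["Geography", "Sport"]

if __name__ == "__main__":
    save_topics("topics.txt", format_topics(sample_topics, sample_names))
```

When comparing LDA models trained with different numbers of topics, the one with the lowest perplexity on the held-out test set would be preferred; for LSA, which has no such measure here, the choice rests on inspecting the topics.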