TopicModelling

-                      v21
+                      v22
 . Create `<YOUR_FILE>`, a text file named `ia161-UCO-07.txt`, where UCO is your university ID.
+. Gensim is installed on `epimetheus1.fi.muni.cz` and offers faster model processing, but you can easily use your own installation.
+. Download and extract the corpus of Wikipedia documents: [[htdocs:bigdata/wiki_en.tar.bz2|English wiki corpus]].
+. Train LSA and LDA models of the corpus with various numbers of topics using Gensim. You can use this template: [raw-attachment:models.py models.py] or [[https://colab.research.google.com/drive/1nTJaNkwclqBSI6Kk6X_uUHLtViWgbe_S?usp=sharing|Google Colab]].
+. Train LSA and LDA models of the corpus with various numbers of topics using Gensim. Use [[https://colab.research.google.com/drive/1nTJaNkwclqBSI6Kk6X_uUHLtViWgbe_S?usp=sharing|Google Colab]].
 . Check the coherence for various parameters.
 . Select the best model for both LSA and LDA (by looking at the data or by coherence).
 . For each model, select the two most significant topics that make sense to you and compare your impression with their coherence scores. Name each topic, save the names in `<YOUR_FILE>`, and submit the file to the homework vault (Odevzdavarna).
-You can save the files in your home directory on NLP computers, which will be accessible on the server.
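
To make the coherence steps above more concrete, here is a minimal standard-library sketch of a UMass-style coherence score for a single topic. It is not Gensim's implementation (Gensim provides `gensim.models.CoherenceModel` for this); the function name `umass_coherence`, the `smoothing` parameter, and the toy documents are assumptions made for illustration only. The score averages, over ordered pairs of a topic's top words, the log ratio of how often both words share a document to how often the higher-ranked word occurs.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents, smoothing=1.0):
    """Mean UMass-style coherence of one topic (illustrative sketch).

    top_words: topic words ordered from most to least probable.
    documents: iterable of tokenised documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    # combinations() keeps list order, so each pair is
    # (higher-ranked word, lower-ranked word).
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # word absent from the reference corpus; skip the pair
        scores.append(math.log((doc_freq(earlier, later) + smoothing) / denom))
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical toy corpus: three documents about pets, three about finance.
docs = [
    ["cat", "dog", "pet", "food"],
    ["dog", "pet", "vet"],
    ["cat", "pet", "food"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
    ["stock", "bank", "loan"],
]

coherent_score = umass_coherence(["pet", "cat", "dog"], docs)
mixed_score = umass_coherence(["pet", "bank", "stock"], docs)
print(coherent_score, mixed_score)
```

Scores closer to zero indicate words that tend to appear in the same documents, so the topic mixing pet and finance words scores lower than the pet-only topic. When comparing your own topic names against coherence, keep in mind that the number of top words and the reference corpus both affect the value.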