Changes between Version 21 and Version 22 of private/NlpInPracticeCourse/TopicModelling


Timestamp: Oct 30, 2023, 11:35:23 AM
Author: Zuzana Nevěřilová
Comment: --

  • private/NlpInPracticeCourse/TopicModelling

    v21 v22
    19  19
    20  20  1. Create `<YOUR_FILE>`, a text file named ia161-UCO-07.txt where UCO is your university ID.
    21      1. Gensim is installed on `epimetheus1.fi.muni.cz` and offers faster model processing, but you can easily use your own installation.
    22      1. Download and extract the corpus of Wikipedia documents: [[htdocs:bigdata/wiki_en.tar.bz2|English wiki corpus]].
    23      1. Train LSA and LDA models of the corpus for various topics using Gensim. You can use this template: [raw-attachment:models.py models.py] or [[https://colab.research.google.com/drive/1nTJaNkwclqBSI6Kk6X_uUHLtViWgbe_S?usp=sharing|Google Colab]].
        21  1. Train LSA and LDA models of the corpus for various topics using Gensim. Use [[https://colab.research.google.com/drive/1nTJaNkwclqBSI6Kk6X_uUHLtViWgbe_S?usp=sharing|Google Colab]].
    24  22  1. Check the coherence for various parameters.
    25  23  1. Select the best model for both LSA and LDA (by looking at the data or by coherence).
    26  24  1. For each model, select the two most significant topics that make sense to you and compare them with the coherence score. Give them a name, save it into a `<YOUR_FILE>`, and submit it to the homework vault (Odevzdavarna).
    27
    28      You can save the files in your home directory on NLP computers, which will be accessible on the server.