Changes between Version 8 and Version 9 of private/NlpInPracticeCourse/TopicModelling


Timestamp: Nov 10, 2020, 4:09:08 PM (3 years ago)
Author: xrambous
Comment: --

  • private/NlpInPracticeCourse/TopicModelling

    v8  v9
    17  17  In this session we will use [[http://radimrehurek.com/gensim/|Gensim]] to model latent topics of Wikipedia documents. We will focus on Latent Semantic Analysis and Latent Dirichlet Allocation models.
    18  18
        19   1. Gensim is already installed on epimetheus1.fi.muni.cz and it also offers faster model processing.
    19  20   1. Download and extract the corpus of Czech Wikipedia documents: [[htdocs:bigdata/wiki.tar.bz2|wiki corpus]].
    20  21   1. Train LSA and LDA models of the corpus for various numbers of topics using Gensim. You can use this template: [raw-attachment:models.py models.py].
     
    22  23   1. Select 5 most important topics with 10 most important words, give them a name, save it into a text file and upload it into odevzdavarna.
    23  24
    24      If you want faster model processing, login to epimetheus1.fi.muni.cz. You can save the files in your home directory on NLP computers and they will be accessible on the server.
        25  You can save the files in your home directory on NLP computers and they will be accessible on the server.
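
The training and topic-selection steps in the diffed page could be sketched roughly as follows. This is not the course's `models.py` template, whose contents are not shown here. It is a minimal sketch that assumes the Wikipedia documents have already been tokenized into lists of words. The names `train_topic_models` and `format_topics` are hypothetical helpers introduced only for illustration.

```python
def train_topic_models(texts, num_topics):
    """Train LSA and LDA models on tokenized documents.

    `texts` is assumed to be a list of documents, each a list of word
    strings; how the wiki corpus is tokenized is not specified in the page.
    """
    # Imported inside the function so that the formatting helper below
    # can be used even on a machine where Gensim is not installed.
    from gensim import corpora, models

    # Map each distinct word to an integer id.
    dictionary = corpora.Dictionary(texts)
    # Convert each document to a bag-of-words vector of (word_id, count).
    bow_corpus = [dictionary.doc2bow(text) for text in texts]

    lsa = models.LsiModel(bow_corpus, id2word=dictionary, num_topics=num_topics)
    lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=num_topics)
    return lsa, lda


def format_topics(topics):
    """Turn [(topic_id, [(word, weight), ...]), ...] into one line per topic."""
    lines = []
    for topic_id, word_weights in topics:
        words = ", ".join(word for word, _weight in word_weights)
        lines.append(f"topic {topic_id}: {words}")
    return lines


# Possible usage, once a model has been trained:
#   lsa, lda = train_topic_models(texts, num_topics=20)
#   topics = lda.show_topics(num_topics=5, num_words=10, formatted=False)
#   with open("topics.txt", "w", encoding="utf-8") as out:
#       out.write("\n".join(format_topics(topics)))
```

Calling `train_topic_models` in a loop over several values of `num_topics` would cover the "various numbers of topics" part of the task. Naming the topics remains a manual step.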