Changes between Version 6 and Version 7 of private/AdvancedNlpCourse/TopicModelling


Timestamp:
Nov 3, 2015, 9:55:44 PM (6 years ago)
Author:
ymaterna
Comment:

--

  • private/AdvancedNlpCourse/TopicModelling

    v6 v7  
    17 17 In this session we will use [[http://radimrehurek.com/gensim/|Gensim]] to model latent topics of Wikipedia documents. We will focus on Latent Semantic Analysis and Latent Dirichlet Allocation models.
    18 18
    19 1. Download and extract the corpus of Czech Wikipedia documents:  [[htdocs:bigdata/wiki.tar.bz2|wiki corpus]].
    20 
    21 1. Train LSA and LDA models of the corpus for various numbers of topics using Gensim. You can use this template:
    22 
    23 1. For both the LSA and LDA select the best best models   
    24 
    25 Students will also be required to generate some results of their work and hand them in to prove completing the tasks.
     19 1. Download and extract the corpus of Czech Wikipedia documents:  [[htdocs:bigdata/wiki.tar.bz2|wiki corpus]].
     20 1. Train LSA and LDA models of the corpus for various numbers of topics using Gensim. You can use this template: [raw-attachment:models.py models.py].
     21 1. For both LSA and LDA select the best model (by looking at the data or by computing perplexity of a test set for LDA).
     22 1. Select the 5 most important topics, each with its 10 most important words, give each topic a name, save the result to a text file and upload it to the odevzdavarna.
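
The steps in the v7 list can be sketched roughly as follows. This is a minimal illustration, not the course's `models.py` template: the function names `train_models`, `lda_perplexity` and `format_topics`, the `passes` default, and the choice to train LSA on TF-IDF weights are all assumptions made for the example. It also assumes the corpus has already been tokenized into lists of words and that a recent Gensim release is installed.

```python
def train_models(texts, num_topics, passes=5):
    """Train one LSA and one LDA model on tokenized documents (lists of str)."""
    # Gensim is imported lazily so format_topics() below works without it.
    from gensim import corpora, models

    dictionary = corpora.Dictionary(texts)
    bow = [dictionary.doc2bow(text) for text in texts]
    tfidf = models.TfidfModel(bow)
    # Assumption: LSA is trained on TF-IDF weights, LDA on raw counts.
    lsa = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=num_topics)
    lda = models.LdaModel(bow, id2word=dictionary, num_topics=num_topics,
                          passes=passes)
    return dictionary, bow, lsa, lda


def lda_perplexity(lda, test_bow):
    """Perplexity estimate on held-out bag-of-words documents (lower is better)."""
    # log_perplexity() returns a per-word likelihood bound, not the perplexity.
    return 2 ** (-lda.log_perplexity(test_bow))


def format_topics(topics, names):
    """Render topics as text lines of the form 'name: word1 word2 ...'.

    `topics` has the shape returned by model.show_topics(formatted=False):
    a list of (topic_id, [(word, weight), ...]) pairs.
    `names` maps topic_id to a hand-picked label (hypothetical helper input).
    """
    lines = []
    for topic_id, words in topics:
        label = names.get(topic_id, "topic %d" % topic_id)
        lines.append("%s: %s" % (label, " ".join(word for word, _ in words)))
    return "\n".join(lines)
```

A possible workflow: call `train_models` for several values of `num_topics`, compare the LDA models with `lda_perplexity` on documents held out from training, then pass `lda.show_topics(num_topics=5, num_words=10, formatted=False)` together with your own topic names to `format_topics` and write the returned string to a text file. Which five topics count as the most important is a judgment the example does not make for you.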