Changes between Version 5 and Version 6 of private/NlpInPracticeCourse/TopicModelling
Timestamp: Nov 3, 2015, 9:23:52 PM
In this session we will use [[http://radimrehurek.com/gensim/|Gensim]] to model latent topics of Wikipedia documents. We will focus on Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) models.

 1. Download and extract the corpus of Czech Wikipedia documents: [[htdocs:bigdata/wiki.tar.bz2|wiki corpus]].
 1. Train LSA and LDA models of the corpus for various numbers of topics using Gensim. You can use this template:
 1. For both LSA and LDA, select the best models.

Students will also be required to generate some results of their work and hand them in as proof that they completed the tasks.
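The first step (downloading and extracting the corpus) can be scripted with Python's standard `tarfile` module. This is a minimal sketch, assuming the archive has already been downloaded as `wiki.tar.bz2` and that it contains UTF-8 plain-text files with one document per file; the archive layout is not described here, so adjust accordingly. The helpers `extract_corpus`, `tokenize` and `load_documents` are hypothetical names, and the letters-only tokenizer is a deliberately crude placeholder.

```python
import re
import tarfile
from pathlib import Path


def extract_corpus(archive: str, target: str) -> list[Path]:
    """Extract the regular files of a .tar.bz2 archive into `target`.

    Returns the sorted list of extracted file paths.
    """
    root = Path(target).resolve()
    root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        # Keep only regular files whose destination stays inside `root`
        # (guards against absolute paths and "../" entries in the archive).
        members = [
            m for m in tar.getmembers()
            if m.isfile() and (root / m.name).resolve().is_relative_to(root)
        ]
        tar.extractall(root, members=members)
    return sorted(root / m.name for m in members)


# Runs of Unicode letters only, so Czech diacritics are kept intact.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def load_documents(paths: list[Path]) -> list[list[str]]:
    """Read each file as one UTF-8 document (assumption) and tokenize it."""
    return [tokenize(p.read_text(encoding="utf-8", errors="replace")) for p in paths]
```

For the real corpus you would probably also want to drop Czech stop words before training, since very frequent function words tend to dominate the topics.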