= Topic identification, topic modelling =

[[https://is.muni.cz/auth/predmet/fi/ia161|IA161 Advanced NLP Course]], Course Guarantor: Aleš Horák

Prepared by: Jirka Materna

== State of the Art ==

=== References ===

Approximately three current papers (preferably from the best NLP conferences/journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as sources for the one-hour lecture:

 1. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
 1. Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101:1566–1581, 2006.
 1. S. T. Dumais, G. W. Furnas, T. K. Landauer, S. Deerwester, and R. Harshman. Using Latent Semantic Analysis to Improve Access to Textual Information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '88, pages 281–285, New York, NY, USA, 1988. ACM. ISBN 0-201-14237-6.

== Practical Session ==

In this session we will use [[http://radimrehurek.com/gensim/|Gensim]] to model latent topics of various texts. We will focus on the Latent Semantic Analysis and Latent Dirichlet Allocation models. Students will also be required to produce some results of their work and hand them in as proof that they have completed the tasks.