

Language modelling

IA161 Advanced NLP Course, Course Guarantor: Aleš Horák

Prepared by: Vít Baisa

State of the Art

The goal of a language model is to a) predict the following word or phrase based on a given text history and b) assign a probability (= score) to any possible input sentence. For decades this was done mainly by n-gram models, which have been known since WWII. Recently, however, deep learning has also made its way into language modelling, and neural networks have turned out to beat classic n-gram models.
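To make the two goals above concrete, here is a minimal sketch of a classic bigram (2-gram) model with add-one smoothing. The toy corpus, the function names (`train_bigram`, `prob`, `predict_next`, `sentence_logprob`) and the choice of add-one smoothing are assumptions made for illustration only; they are not taken from the cited papers or from the course materials.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (illustrative helper)."""
    history_counts = Counter()            # how often each word occurs as a history
    bigram_counts = defaultdict(Counter)  # bigram_counts[prev][cur]
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[prev][cur] += 1
    # Words that can be predicted: all corpus words plus the end-of-sentence marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return history_counts, bigram_counts, vocab

def prob(prev, cur, model):
    """P(cur | prev) with add-one smoothing (an assumed, very simple smoothing choice)."""
    history_counts, bigram_counts, vocab = model
    return (bigram_counts[prev][cur] + 1) / (history_counts[prev] + len(vocab))

def predict_next(prev, model):
    """Goal a): the most probable next word given a one-word history."""
    vocab = model[2]
    # sorted() makes tie-breaking deterministic.
    return max(sorted(vocab), key=lambda w: prob(prev, w, model))

def sentence_logprob(tokens, model):
    """Goal b): log-probability (score) of a whole sentence."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(prob(p, c, model)) for p, c in zip(padded, padded[1:]))

# Hypothetical toy corpus, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
model = train_bigram(corpus)

print(predict_next("the", model))
print(sentence_logprob(["the", "cat", "sat"], model))
print(sentence_logprob(["sat", "cat", "the"], model))
```

A word order seen in the corpus receives a higher score than a scrambled one. The neural models in the papers below replace these count tables with learned word vectors.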


Current papers (preferably from the best NLP conferences/journals, e.g. ACL Anthology) that will be used as sources for the one-hour lecture:

  1. Bengio, Yoshua, et al. "A neural probabilistic language model." The Journal of Machine Learning Research 3 (2003): 1137-1155.
  2. Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
  3. Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems. 2013.
  4. Chelba, Ciprian, et al. "One billion word benchmark for measuring progress in statistical language modeling." arXiv preprint arXiv:1312.3005 (2013).

Practical Session

We will build a simple language model (skip-gram) which has very interesting properties. When trained properly, the word vectors obey simple vector arithmetic, e.g. vector "king" − vector "man" + vector "woman" ~= vector "queen". We will train this model on large Czech and English corpora and evaluate the result.
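The two ideas in this paragraph can be sketched in a few lines of plain Python: how skip-gram turns running text into (centre word, context word) training pairs, and how an analogy query is answered by vector arithmetic followed by a cosine-similarity search. The helper names (`skipgram_pairs`, `cosine`, `analogy`) are hypothetical, and the 2-dimensional vectors are hand-picked for illustration. They are not the output of a trained model, and this is not the tooling used in the session.

```python
import math

def skipgram_pairs(tokens, window=2):
    """List (centre, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hand-made toy vectors (assumption): axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.1, 1.0],
    "woman": [0.1, -1.0],
    "apple": [-1.0, 0.1],
}

print(skipgram_pairs(["a", "b", "c"], window=1))
print(analogy("king", "man", "woman", toy_vectors))
```

With real embeddings the result is only approximately equal to the target vector, which is why the query is phrased as a nearest-neighbour search rather than an exact match.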