Language modelling
IA161 Advanced NLP Course, Course Guarantee: Aleš Horák
Prepared by: Vít Baisa
State of the Art
The goal of a language model is to a) predict the following word or phrase based on a given text history and b) assign a probability (a score) to any possible input sentence. This was traditionally done with n-gram models, known since World War II. Recently, however, the deep learning buzzword has penetrated language modelling as well, and it has turned out that neural networks beat classic n-gram models substantially.
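As a minimal sketch of the classic approach, the following Python snippet trains a bigram model with add-one smoothing and scores a sentence. The tiny corpus, function names, and smoothing choice are illustrative assumptions, not part of the course materials.

{{{#!python
# Minimal bigram language model sketch with add-one (Laplace) smoothing.
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        tokens = ["<s>"] + tokens + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(tokens, unigrams, bigrams):
    """Score a sentence as a product of smoothed bigram probabilities."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # add-one smoothing so unseen bigrams still get a non-zero score
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
uni, bi = train_bigram_model(corpus)
print(sentence_probability(["the", "cat", "ran"], uni, bi))
}}}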
References
Approximately three recent papers (preferably from top NLP conferences/journals, e.g. the ACL Anthology) used as sources for the one-hour lecture:
- Bengio, Yoshua, et al. "A neural probabilistic language model." The Journal of Machine Learning Research 3 (2003): 1137-1155.
- Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
- Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems. 2013.
- Chelba, Ciprian, et al. "One billion word benchmark for measuring progress in statistical language modeling." arXiv preprint arXiv:1312.3005 (2013).
Practical Session
We will build a simple language model (skip-gram) which has very interesting properties. When trained properly, the word vectors obey simple vector arithmetic, e.g. vector("king") − vector("man") + vector("woman") ≈ vector("queen"). We will train this model on large Czech and English corpora and evaluate the results.
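A minimal sketch of such a training run using gensim's Word2Vec is shown below (sg=1 selects the skip-gram architecture). The corpus file name and parameter values are illustrative assumptions; the session will use prepared Czech and English corpora.

{{{#!python
# Skip-gram training sketch with gensim (assumes gensim >= 4.0 is installed).
from gensim.models import Word2Vec

# One tokenized sentence per line in a plain-text corpus file (hypothetical path).
with open("corpus.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the word vectors ('size' in gensim < 4.0)
    window=5,          # context window size
    min_count=5,       # ignore rare words
    sg=1,              # 1 = skip-gram, 0 = CBOW
    workers=4,
)

# The king - man + woman ~ queen analogy from the session description.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
}}}

With enough training data, the nearest neighbour returned for this analogy query should be "queen" or a closely related word; the same query can then be repeated on the Czech corpus for comparison.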