wiki:private/AdvancedNlpCourse/MachineTranslation

Version 2 (modified by Vít Baisa, 6 years ago) (diff)

--

Machine translation

IA161 Advanced NLP Course, Course Guarantee: Aleš Horák

Prepared by: Vít Baisa

State of the Art

The Statistical Machine Translation consists of two main parts: a language model for a target language which is responsible for fluency and good-looking output sentences and a translation model which translates source words and phrases into target language. Both models are probability distributions and can be built using a monolingual corpus for language model and a parallel corpus for translation model.

References

Approx 3 current papers (preferably from best NLP conferences/journals, eg. ACL Anthology) that will be used as a source for the one-hour lecture:

  1. Koehn, Philipp, et al. "Moses: Open source toolkit for statistical machine translation." Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions. Association for Computational Linguistics, 2007.
  2. Koehn, Philipp, Franz Josef Och, and Daniel Marcu. "Statistical phrase-based translation." Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 2003.
  3. Denkowski, Michael, and Alon Lavie. "Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.

Practical Session

In the practical session we will try to build a small statistical machine translation system for Czech-English pair using open source tool Moses. We will use default language model trained on ententen08 -- a web corpus built at NLPC. For training of the translation model we will use OPUS2 parallel corpus. If there will be enough time we will measure the translation quality on a test data (around 100 Czech sentences).

Attachments (2)