
MachineTranslation

Timestamp:: Jun 1, 2015, 9:23:58 AM
Author:: Vít Baisa
Comment:: --


private/NlpInPracticeCourse/MachineTranslation

- v1
+ v2
 Prepared by: Vít Baisa
 == TODO until 31.5.2015 ==
+. choose particular papers for the [[#References|References]] below (they will later serve as input for the lecture)
+. prepare the [[#PracticalSession|Practical Session]]
+== State of the Art ==
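+Statistical machine translation (SMT) consists of two main parts: a language model of the target language, which is responsible for fluent, well-formed output sentences, and a translation model, which translates source words and phrases into the target language. Both models are probability distributions: the language model is estimated from a monolingual corpus and the translation model from a parallel corpus. In the classic noisy-channel formulation, a source sentence f is translated into the target sentence e that maximizes the product P(e) · P(f|e) of the language-model and translation-model probabilities.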
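To make the division of labour concrete, here is a minimal Python sketch on made-up toy data; it is an illustration under simplifying assumptions, not how Moses works internally. The language model is a bigram model with add-one smoothing estimated from a tiny English corpus. The translation model is a word-based IBM Model 1 trained with EM on a tiny Czech-English parallel corpus (Moses itself builds phrase tables from word alignments computed by GIZA++, whose training starts with IBM Model 1). Candidate translations are supplied by hand instead of by a decoder and are ranked by log P(e) + log P(f|e). All function names and data are hypothetical.

```python
import math
from collections import defaultdict


def train_bigram_lm(sentences):
    """Estimate an add-one smoothed bigram model P(w_i | w_{i-1}) from a monolingual corpus."""
    unigrams, bigrams, vocab = defaultdict(int), defaultdict(int), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # words the model can predict, incl. the end marker
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1

    def logprob(sentence):
        """log P(e) of a target sentence under the smoothed bigram model."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams.get((p, c), 0) + 1) / (unigrams.get(p, 0) + len(vocab)))
                   for p, c in zip(tokens, tokens[1:]))

    return logprob


def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(source word f | target word e) with IBM Model 1 EM."""
    pairs = [(f.split(), ["NULL"] + e.split()) for f, e in pairs]
    f_vocab = {f for fs, _ in pairs for f in fs}
    # uniform initialisation over all co-occurring word pairs
    t = {(f, e): 1.0 / len(f_vocab) for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in pairs:  # E-step: expected alignment counts
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per target word
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def ibm1_logprob(t, f_sentence, e_sentence, floor=1e-12):
    """log P(f | e) under IBM Model 1 (the sentence-length term epsilon is omitted)."""
    es = ["NULL"] + e_sentence.split()
    return sum(math.log(max(sum(t.get((f, e), 0.0) for e in es) / len(es), floor))
               for f in f_sentence.split())


def translate(f_sentence, candidates, lm, t):
    """Noisy channel: pick the candidate e that maximises log P(e) + log P(f | e)."""
    return max(candidates, key=lambda e: lm(e) + ibm1_logprob(t, f_sentence, e))


# toy data, made up for this illustration
english = ["the house is small", "the table is big", "the house is big"]
parallel = [("dům je malý", "the house is small"), ("dům je velký", "the house is big"),
            ("stůl je malý", "the table is small"), ("stůl je velký", "the table is big")]

lm = train_bigram_lm(english)
t = train_ibm1(parallel)
candidates = ["house the is small", "the table is small", "the house is small"]
print(translate("dům je malý", candidates, lm, t))  # the house is small
```

IBM Model 1 ignores word order, so "the house is small" and "house the is small" receive the same translation-model score and only the language model separates them; the translation model, in turn, is what rewards "house" over "table" as a translation of "dům".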
 === References ===
 …
 Approximately 3 current papers (preferably from the best NLP conferences/journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as a source for the one-hour lecture:
+. Koehn, Philipp, et al. "Moses: Open source toolkit for statistical machine translation." Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, 2007.
+. Koehn, Philipp, Franz Josef Och, and Daniel Marcu. "Statistical phrase-based translation." Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 2003.
+. Denkowski, Michael, and Alon Lavie. "Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
 == Practical Session ==
+A concrete description of the work assignment for students for the second one-hour part of the lecture. The work will consist of tasks involving practical implementations of algorithms related to the current topic (probably not the state-of-the-art algorithms presented in the first part), applied to real data. Students can test the algorithms, evaluate them and possibly try some short adaptations for various subtasks.
+Students may also be required to hand in some results of their work as proof that they completed the tasks.
+In the practical session we will try to build a small statistical machine translation system for the Czech-English language pair using the open-source toolkit Moses. We will use a default language model trained on ententen08, a web corpus built at NLPC. To train the translation model we will use the OPUS2 parallel corpus. If there is enough time, we will measure the translation quality on test data (around 100 Czech sentences).
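The page does not fix the evaluation metric. Assuming BLEU, the metric most commonly reported for Moses systems, the Python sketch below shows what such a measurement computes: clipped n-gram precisions for n = 1..4 combined by a geometric mean and multiplied by a brevity penalty. It assumes one whitespace-tokenised reference per test sentence and applies no smoothing; for the actual session the BLEU script distributed with Moses (multi-bleu.perl) or the Meteor metric from the references would be the more reliable choice. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence and no smoothing."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len, ref_len = hyp_len + len(h), ref_len + len(r)
        for n in range(1, max_n + 1):
            ref_ngrams = ngram_counts(r, n)
            # clipping: each hypothesis n-gram counts at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in ngram_counts(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, a zero n-gram precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # brevity penalty punishes output that is shorter than the references overall
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hypotheses = ["the house is small", "the table is big"]
references = ["the house is small", "the table is very big"]
print(round(corpus_bleu(hypotheses, references), 3))  # 0.66
```

BLEU is computed over the whole test set rather than averaged per sentence, which is why a single sentence without any matching 4-gram does not zero the score as long as other sentences contribute 4-gram matches.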