Changes between Version 1 and Version 2 of private/NlpInPracticeCourse/MachineTranslation
Timestamp: Jun 1, 2015, 9:23:58 AM
Lines prefixed with - were removed in Version 2, lines prefixed with + were added, and unprefixed lines are unmodified; … marks skipped unmodified lines.

Prepared by: Vít Baisa

- == TODO til 31.5.2015==
+ == State of the Art ==

- 1. choose particular papers for [[#References|References]] below (that will serve as input for the lecture later on)
- 1. prepare the [[#PracticalSession|Practical Session]]
-
- == State of the Art ==
+ Statistical machine translation (SMT) consists of two main parts: a language model of the target language, which is responsible for fluent, natural-looking output sentences, and a translation model, which translates source words and phrases into the target language. Both models are probability distributions; the language model can be built from a monolingual corpus and the translation model from a parallel corpus.

=== References ===
…
Approximately 3 current papers (preferably from the best NLP conferences/journals, e.g. [[https://www.aclweb.org/anthology/|ACL Anthology]]) that will be used as a source for the one-hour lecture:

- 1. paper1
- 1. paper2
- 1. paper3
+ 1. Koehn, Philipp, et al. "Moses: Open source toolkit for statistical machine translation." Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions. Association for Computational Linguistics, 2007.
+ 1. Koehn, Philipp, Franz Josef Och, and Daniel Marcu. "Statistical phrase-based translation." Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 2003.
+ 1. Denkowski, Michael, and Alon Lavie. "Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.

== Practical Session ==

- Concrete description of work assignment for students for the second one-hour part of the lecture.
- The work will consist of tasks connected with practical implementations of algorithms connected with the current topic (probably not the state-of-the-art algorithms mentioned in the first part) and with real data. Students can test the algorithms, evaluate them and possibly try some short adaptations for various subtasks.
-
- Students can also be required to generate some results of their work and hand them in to prove completing the tasks.
+ In the practical session we will try to build a small statistical machine translation system for the Czech-English language pair using the open-source toolkit Moses. We will use a default language model trained on ententen08, a web corpus built at NLPC. The translation model will be trained on the OPUS2 parallel corpus. If there is enough time, we will measure translation quality on test data (around 100 Czech sentences).
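The split into a language model and a translation model described in the State of the Art text is the classic noisy-channel formulation: for a source sentence f, choose the target sentence e that maximises P(e) * P(f | e). The Python sketch below illustrates this under strong simplifying assumptions: a hypothetical add-alpha bigram language model, a word-for-word translation table with monotone word order, exhaustive enumeration of candidates, and invented toy data. The function names are illustrative only; Moses itself translates phrases, combines these and further features in a log-linear model, and searches with a beam rather than by enumeration.

```python
import math
from collections import Counter
from itertools import product


def train_bigram_lm(sentences, alpha=0.1):
    """Return a function giving log P(sentence) under an add-alpha bigram LM."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])                      # predictable tokens only
        unigrams.update(tokens[:-1])                  # history counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)

    def logprob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * v))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return logprob


def translation_logprob(source, target, t_table):
    """log P(f | e) for a monotone word-for-word model (f_i generated by e_i)."""
    f, e = source.split(), target.split()
    if len(f) != len(e):
        return float("-inf")
    return sum(math.log(t_table.get((fi, ei), 1e-9)) for fi, ei in zip(f, e))


def decode(source, t_table, lm_logprob):
    """Return argmax_e log P(e) + log P(f | e) over all word-by-word candidates.

    Exhaustive enumeration is only feasible for toy inputs; returns None if
    some source word has no entry in the table.
    """
    options = [[e for (f, e) in t_table if f == fi] for fi in source.split()]
    best, best_score = None, float("-inf")
    for words in product(*options):
        candidate = " ".join(words)
        score = lm_logprob(candidate) + translation_logprob(source, candidate, t_table)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    # Invented toy data: an English LM corpus and a Czech->English table of P(f | e).
    lm = train_bigram_lm(["a big house", "the big house", "a large dog", "my home"])
    t_table = {("velký", "big"): 0.5, ("velký", "large"): 0.5,
               ("dům", "house"): 0.6, ("dům", "home"): 0.4}
    print(decode("velký dům", t_table, lm))
```

In the toy table, "big" and "large" explain "velký" equally well (0.5 each), so the translation model alone cannot choose between them; the language model, which has seen "big house" twice, tips the decision towards "big house". This is the division of labour between fluency and adequacy described above.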
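The word-level translation probabilities of the translation model can be estimated from a sentence-aligned parallel corpus without any word-alignment annotation, using expectation maximisation. The sketch below implements IBM Model 1, the simplest of the IBM alignment models, on three invented Czech-English sentence pairs; the function name and the data are hypothetical. In a typical Moses training pipeline this step is delegated to an external aligner such as GIZA++, which also implements the higher IBM models, and phrase pairs are then extracted from the resulting word alignments.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) from (source, target) sentence pairs with EM.

    A NULL token is added to every target sentence so that a source word
    may also be explained by "nothing".
    """
    pairs = [(f.split(), ["NULL"] + e.split()) for f, e in pairs]
    f_vocab = {w for f_sent, _ in pairs for w in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))      # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)                   # expected counts c(f, e)
        total = defaultdict(float)                   # expected counts c(e)
        for f_sent, e_sent in pairs:
            for f in f_sent:
                # E-step: share f's alignment probability among the target words
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts so that t(. | e) sums to 1
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


if __name__ == "__main__":
    corpus = [("velký dům", "big house"), ("malý dům", "small house"), ("velký pes", "big dog")]
    t = ibm_model1(corpus)
    for e in ("house", "big"):
        best_f = max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
        print(e, "->", best_f, round(t[(best_f, e)], 3))
```

After a few iterations t(dům | house) dominates the other candidates for "house", because dům is the only Czech word that occurs in both sentences containing "house"; symmetrically, velký wins for "big".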
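For measuring translation quality on the test sentences, a widely used automatic metric is BLEU, which counts n-gram overlap between system output and reference translations and penalises output that is too short. BLEU is swapped in here for illustration; the reading list cites Meteor, which additionally matches stems and synonyms and is considerably more involved. The sketch below computes unsmoothed corpus-level BLEU with one reference per sentence and whitespace tokenisation; the function names are hypothetical, and reportable scores should come from a standard scorer rather than from this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with a single reference per sentence and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0                                   # some n-gram precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the reference, else exp(1 - r/c).
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat is on the mat"]
    print(corpus_bleu(["the cat is on the mat"], refs))
    print(corpus_bleu(["the cat is on the"], refs))
```

An output identical to the reference scores 1.0; the second hypothesis has perfect n-gram precision but is one word short, so it is scaled by the brevity penalty exp(1 - 6/5), about 0.82.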