Prepared by: Vít Baisa
State of the Art
Statistical machine translation consists of two main components: a language model for the target language, which is responsible for fluent, natural-looking output sentences, and a translation model, which translates source words and phrases into the target language. Both models are probability distributions: the language model is built from a monolingual corpus, the translation model from a parallel corpus.
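Both Koehn papers below build on the standard noisy-channel formulation: for a source sentence f, the decoder searches for the target sentence e that maximizes the product of the two models,
\hat{e} = \arg\max_{e} P(e) \cdot P(f \mid e)
where P(e) is the language model and P(f | e) is the translation model.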
Approximately three current papers (preferably from leading NLP conferences and journals, e.g. the ACL Anthology) that will be used as sources for the one-hour lecture:
- Koehn, Philipp, et al. "Moses: Open source toolkit for statistical machine translation." Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions. Association for Computational Linguistics, 2007.
- Koehn, Philipp, Franz Josef Och, and Daniel Marcu. "Statistical phrase-based translation." Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 2003.
- Denkowski, Michael, and Alon Lavie. "Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
Workshop: generating a translation dictionary from parallel data
- download ia161_mt.tar.gz with the scripts and training data
- unpack it with
tar xzf ia161_mt.tar.gz
- a directory ia161_mt will be created, containing these files:
| File | Description |
|---|---|
| czech.words | 100,000 sentences from the Czech part of DGT-TM |
| czech.lemmas | 100,000 sentences (lemmatized) from the Czech part of DGT-TM |
| english.words | 100,000 sentences from the English part of DGT-TM |
| english.lemmas | 100,000 sentences (lemmatized) from the English part of DGT-TM |
| eval.py | a script for evaluating the coverage and precision of a generated dictionary against a small English-Czech dictionary |
| gnudfl.txt | a small English-Czech dictionary containing only single-word entries, restricted to words that occur in the training data |
| make_dict.py | a script for generating a translation dictionary based on co-occurrences and frequency lists |
| Makefile | rules for building the dictionary from the training data |
| par2items.py | a script for generating pairs of words (lemmas) from the parallel data |
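The .words and .lemmas files are sentence-aligned: line i of czech.words is the translation of line i of english.words. A minimal sketch of the pair extraction that par2items.py is described as doing (the actual script may differ; the function name here is illustrative):

import itertools

def word_pairs(l1_path, l2_path, limit=1000):
    """Yield (l1_word, l2_word) pairs from two sentence-aligned files."""
    with open(l1_path, encoding="utf-8") as f1, open(l2_path, encoding="utf-8") as f2:
        for i, (l1_line, l2_line) in enumerate(zip(f1, f2)):
            if i >= limit:
                break
            # pair every word of the source sentence with every word
            # of the aligned target sentence
            for w1, w2 in itertools.product(l1_line.split(), l2_line.split()):
                yield (w1, w2)

Counting how often each pair occurs, relative to the frequencies of the individual words, is the raw material a co-occurrence-based dictionary builder works with.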
Description of make
- by default, the command uses 1,000 lines of the training data and generates a dictionary based on word forms (files czech.words and english.words)
- it is possible to use the alternative files with lemmas via the parameters L1DATA and L2DATA
- it is also possible to change the number of lines used for the computation (parameter LIMIT)
- in general:
make dict [L1DATA=<file>] [L2DATA=<file>] [LIMIT=<number of lines>]
- for example:
make dict L1DATA=english.lemmas L2DATA=czech.lemmas LIMIT=10000
- the default of 1,000 lines is there for the sake of speed
- once the dictionary is generated, you can measure its precision and coverage using the script eval.py
- if you used the parameters L1DATA and L2DATA, you must repeat them:
make dict L1DATA=english.lemmas L2DATA=czech.lemmas
- after each change to the input files, the scripts, or the parameters, clean up the temporary files first
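For orientation, the metrics relate roughly as follows; a minimal sketch in Python, assuming eval.py treats the generated and reference dictionaries as sets of word pairs (its exact definitions may differ):

def evaluate(generated, reference):
    """Precision, coverage and F-score of a generated dictionary.

    Both arguments are sets of (english_word, czech_word) pairs.
    """
    correct = generated & reference
    precision = len(correct) / len(generated) if generated else 0.0
    coverage = len(correct) / len(reference) if reference else 0.0
    if precision + coverage == 0:
        return precision, coverage, 0.0
    f_score = 2 * precision * coverage / (precision + coverage)
    return precision, coverage, f_score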
Detailed description of the scripts and generated data
- Run the default make dict and look at the results:
- english.words-czech.words.dict (the resulting dictionary)
- Look at the sizes of the output files (how many lines they contain) and at their contents.
- Look at the script make_dict.py, which generates the dictionary: at key places it contains comments
- there you can change the script, add heuristics, change conditions, etc., so that the final F-score is as high as possible (one such heuristic is sketched at the end of this section)
- Change the key places of the script make_dict.py to achieve the highest possible F-score (see the evaluation with eval.py above)
- Upload all the scripts into the vault in one archive file.
- You can create it like this:
tar czf ia161_mt_<uco_or_login>.tar.gz Makefile *.py
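As an illustration of the kind of heuristic mentioned in the make_dict.py bullet above: one common way to turn co-occurrence counts and frequency lists into a dictionary is the Dice coefficient. This is a sketch under that assumption, not the actual content of make_dict.py:

from collections import Counter

def dice_dictionary(sentence_pairs, threshold=0.3):
    """Score candidate translation pairs with the Dice coefficient
    2 * c(s, t) / (c(s) + c(t)) and keep those above a threshold."""
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in sentence_pairs:
        src_words = set(src_sent.split())
        tgt_words = set(tgt_sent.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        # count each (source word, target word) co-occurrence once per sentence pair
        pair_freq.update((s, t) for s in src_words for t in tgt_words)
    dictionary = {}
    for (s, t), c in pair_freq.items():
        dice = 2 * c / (src_freq[s] + tgt_freq[t])
        if dice >= threshold:
            dictionary[(s, t)] = dice
    return dictionary

Raising the threshold typically increases precision at the cost of coverage; the F-score balances the two.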