Machine Translation
Topics
- Statistical machine translation
- Extension of translation memories
- Domain-specific machine translation
- Machine translation between close languages
- Sub-word level machine translation
Statistical machine translation
Improving statistical machine translation
- free state-of-the-art tools available (SRILM, Moses)
- baseline SMT available for everyone
- languages with a high number of word forms need special treatment, e.g.
- Czech (forms of dělat 'to do'): dělám, děláš, dělal, dělajícímu, dělaje, děláním, ...
- Hungarian (forms of magas 'high'): magas, magasabb, legmagasabb, legeslegmagasabb, ...
- language models can be enriched with linguistic knowledge
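One way to enrich a language model with linguistic knowledge is to interpolate a word bigram model with a model over morphological tags. A word-form bigram that never occurred in training then still receives probability through its tags. The toy sketch below assumes a hypothetical tagged corpus and made-up tag names (V1, V2, N4). It illustrates the general idea and is not necessarily the approach used here.

```python
from collections import Counter

# Hypothetical toy corpus of (word form, morphological tag) pairs.
CORPUS = [
    [("dělám", "V1"), ("úkol", "N4")],
    [("děláš", "V2"), ("úkol", "N4")],
    [("dělám", "V1"), ("oběd", "N4")],
]


def bigram_counts(sentences):
    """Count histories and bigrams, using <s> as the sentence start."""
    hist, bi = Counter(), Counter()
    for sent in sentences:
        seq = ["<s>"] + sent
        for prev, cur in zip(seq, seq[1:]):
            hist[prev] += 1
            bi[(prev, cur)] += 1
    return hist, bi


w_hist, w_bi = bigram_counts([[w for w, _ in s] for s in CORPUS])
t_hist, t_bi = bigram_counts([[t for _, t in s] for s in CORPUS])
emit = Counter(pair for s in CORPUS for pair in s)      # count(word, tag)
tag_count = Counter(t for s in CORPUS for _, t in s)    # count(tag)


def prob(prev, cur, lam=0.5):
    """Interpolate P(w | w_prev) with P(t | t_prev) * P(w | t), all MLE estimates."""
    (pw, pt), (cw, ct) = prev, cur
    p_word = w_bi[(pw, cw)] / w_hist[pw] if w_hist[pw] else 0.0
    p_tag_seq = t_bi[(pt, ct)] / t_hist[pt] if t_hist[pt] else 0.0
    p_emit = emit[(cw, ct)] / tag_count[ct] if tag_count[ct] else 0.0
    return lam * p_word + (1 - lam) * p_tag_seq * p_emit


# "děláš oběd" never occurs, but the tag bigram V2 -> N4 does,
# so the interpolated probability is non-zero.
print(prob(("děláš", "V2"), ("oběd", "N4")))
```

The weight `lam` is an arbitrary example value. In practice it would be tuned on held-out data.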
Word alignment matrix - from words to phrases
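Phrase pairs are usually obtained from a word alignment matrix by extracting every pair of spans that is consistent with the alignment. A span pair is consistent when no alignment point links a word inside one span to a word outside the other, and at least one point links the two spans. The simplified sketch below assumes alignments are given as (source index, target index) pairs. Unlike the full extraction in phrase-based toolkits, it does not extend phrases over unaligned target words. The example sentence and its alignment are made up.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with a word alignment.

    alignment: set of (i, j) pairs linking src[i] to tgt[j].
    """
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions aligned to the source span [s1, s2].
            tps = [j for (i, j) in alignment if s1 <= i <= s2]
            if not tps:
                continue
            t1, t2 = min(tps), max(tps)
            if t2 - t1 >= max_len:
                continue
            # Reject if a target word in [t1, t2] is aligned outside [s1, s2].
            if any(t1 <= j <= t2 and not s1 <= i <= s2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


src = "dělám domácí úkol".split()
tgt = "I do homework".split()
alignment = {(0, 0), (0, 1), (1, 2), (2, 2)}
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```

Because both domácí and úkol align to homework, neither word is extracted alone. Only the two-word span domácí úkol forms a consistent pair with homework.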
Domain-specific machine translation
- a straightforward way of increasing MT quality
- domain-specific corpora can be downloaded on demand
- separate models for each domain: sports, cooking, gardening
- one sense per domain: bat (the sports implement vs. the animal)
- applications: translation of product details and product descriptions in e-shops, manuals, warranty certificates, user interface localizations, ...
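The "one sense per domain" idea can be shown with a minimal sketch. It uses hypothetical keyword lists and one-word lexicons in place of real per-domain translation models. The domain is guessed from keyword overlap, and that domain's lexicon picks the Czech translation of bat: pálka for the sports implement, netopýr for the animal. The "nature" domain is an invented example, not one of the domains listed above.

```python
# Hypothetical domain keywords and lexicons; a real system would train
# separate translation and language models on each domain's corpus.
DOMAIN_KEYWORDS = {
    "sports": {"ball", "game", "player", "cricket", "baseball"},
    "nature": {"cave", "night", "wings", "animal", "forest"},
}
DOMAIN_LEXICON = {
    "sports": {"bat": "pálka"},
    "nature": {"bat": "netopýr"},
}


def detect_domain(sentence):
    """Pick the domain whose keywords overlap most with the sentence (ties: first listed)."""
    tokens = set(sentence.lower().split())
    return max(DOMAIN_KEYWORDS, key=lambda d: len(tokens & DOMAIN_KEYWORDS[d]))


def translate_word(word, sentence):
    """Translate a single word with the lexicon of the sentence's domain."""
    return DOMAIN_LEXICON[detect_domain(sentence)].get(word, word)


print(translate_word("bat", "the player hit the ball with a bat"))
print(translate_word("bat", "a bat flew out of the cave at night"))
```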
Machine translation between close languages
- West and South Slavic languages: Czech, Slovak, Polish, Serbian, Croatian, Slovene
- MT works mainly at the word level, since sentence structure is very similar
- differences can be described systematically by rules: hraje na klavíri <-> hraje na klavír ('plays the piano'; locative vs. accusative after na)
- billion-word corpora available for these languages
- dictionaries can be generated semi-automatically
- application: searching for duplicates across close languages (e.g. reprinted news)
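Both semi-automatic dictionary generation and the duplicate search can exploit the similar spelling of close languages. The sketch below is one possible approach, not necessarily the one used here. Candidate dictionary pairs are proposed by string similarity (Python's `difflib`), leaving non-cognates such as dělat / robiť to a human. Near-duplicate sentences are detected by character trigram overlap. The word lists, the sentences, and the 0.6 threshold are made-up examples.

```python
from difflib import SequenceMatcher


def cognate_candidates(src_words, tgt_words, threshold=0.6):
    """Propose the most similar target word for each source word.

    Pairs below the threshold are left out for manual translation.
    """
    result = {}
    for s in src_words:
        best = max(tgt_words, key=lambda t: SequenceMatcher(None, s, t).ratio())
        if SequenceMatcher(None, s, best).ratio() >= threshold:
            result[s] = best
    return result


def trigrams(text):
    """Set of character trigrams of the lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + 3] for i in range(len(text) - 2)}


def similarity(a, b):
    """Jaccard similarity of the character trigram sets of two sentences."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


czech = ["mléko", "řeka", "dům", "dělat"]
slovak = ["mlieko", "rieka", "dom", "robiť"]
print(cognate_candidates(czech, slovak))

cz = "Vláda schválila nový zákon o daních."
sk = "Vláda schválila nový zákon o daniach."
other = "Futbalisti vyhrali zápas v Bratislave."
print(similarity(cz, sk), similarity(cz, other))
```

The Czech and Slovak versions of the same news sentence share most of their trigrams. An unrelated sentence shares almost none, so a threshold on this score can flag reprinted news.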
MT quality, European languages
Sub-word level machine translation
- the SMT principle applied at the character level
- translation at the subword level (English -> Czech)
- translation across levels
- application: translation of out-of-vocabulary words
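A common way to run a phrase-based SMT toolkit at the character level is to rewrite every sentence as space-separated characters, with a special token marking the original word boundaries, and to undo this after decoding. A minimal sketch, assuming _ as the boundary token (a hypothetical choice that must not occur in the data):

```python
BOUNDARY = "_"  # hypothetical word-boundary token; must not occur in the text


def to_chars(sentence):
    """Split each word into space-separated characters; mark word boundaries."""
    return f" {BOUNDARY} ".join(" ".join(word) for word in sentence.split())


def from_chars(tokens):
    """Join characters back into words, turning boundary tokens into spaces."""
    return "".join(" " if tok == BOUNDARY else tok for tok in tokens.split())


chars = to_chars("hraje na klavír")
print(chars)              # h r a j e _ n a _ k l a v í r
print(from_chars(chars))  # hraje na klavír
```

The converted corpus is then used to train translation and language models in the same way as a word-level one. Each "phrase" becomes a character sequence, which lets the system translate words it has never seen as a whole.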
Conclusions
- generating new segments for translation memories
- domain-specific translation
- translation between close languages
- sub-word level translation