Version 2 (modified by xkocinc, 10 years ago) (diff)

Machine Translation

/trac/research/raw-attachment/wiki/en/MachineTranslation/error.png

Topics

  • Statistical machine translation
  • Extension of translation memories
  • Domain-specific machine translation
  • Machine translation between close languages
  • Sub-word level machine translation

Statistical machine translation

/trac/research/raw-attachment/wiki/en/MachineTranslation/trans.png

Improving statistical machine translation

  • free state-of-the-art tools available (SRILM, Moses)
  • baseline SMT available for everyone
  • languages with a high number of word forms need special treatment
    • Czech: dělám, děláš, dělal, dělajícímu, dělaje, děláním, ...
    • Hungarian: magas, magasabb, legmagasabb, legeslegmagasabb, ...

  • language models can be enriched with linguistic knowledge

/trac/research/raw-attachment/wiki/en/MachineTranslation/kings.png
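
How a language model scores candidate output can be sketched with a tiny bigram model. This is a minimal illustration only, not how SRILM or Moses work internally: the function names, the add-one smoothing, the `<unk>` token and the two training sentences are all assumptions made for the example.

```python
import math
from collections import Counter

UNK = "<unk>"  # assumed placeholder for words not seen in training


def train_bigram_lm(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = {"</s>", UNK}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        history_counts.update(tokens[:-1])            # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return history_counts, bigram_counts, vocab


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    history_counts, bigram_counts, vocab = model
    words = [w if w in vocab else UNK for w in sentence.split()]
    tokens = ["<s>"] + words + ["</s>"]
    size = len(vocab)
    return sum(
        math.log((bigram_counts[(a, b)] + 1) / (history_counts[a] + size))
        for a, b in zip(tokens, tokens[1:])
    )


# Toy training data (hypothetical, for illustration only).
model = train_bigram_lm(["the king rules", "the queen rules"])
fluent = log_prob("the king rules", model)
scrambled = log_prob("rules king the", model)
```

With this toy data the sentence in natural word order receives a higher score than the scrambled one, which is the signal a decoder uses to prefer fluent output.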

Word alignment matrix - from words to phrases

/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix1.png /trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix2.png /trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix3.png
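
The step from a word alignment matrix to phrases can be sketched as follows. This is a simplified version of the usual "consistent phrase pair" criterion: a source span and a target span form a phrase pair if no word inside either span is aligned to a word outside the other. The function name, the `max_len` limit and the English-French example are assumptions, and the sketch does not extend phrases over unaligned words.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with the alignment.

    alignment is a set of (source_index, target_index) pairs,
    i.e. the filled cells of the word alignment matrix.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: nothing in the target span points outside the source span.
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Illustrative example (not taken from the figures above).
src = ["the", "white", "house"]
tgt = ["la", "maison", "blanche"]
alignment = {(0, 0), (1, 2), (2, 1)}
phrases = extract_phrases(src, tgt, alignment)
```

Here "white house" pairs with "maison blanche", while "the white" yields no phrase pair because its target span would also cover "maison", which is aligned to "house".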

Domain-specific machine translation

  • straightforward way of increasing quality of MT
  • domain-specific corpora can be downloaded on demand
  • separate models for each domain: sports, cooking, gardening
  • one sense per domain: bat

/trac/research/raw-attachment/wiki/en/MachineTranslation/bat.png

  • translations of
    • product details, product descriptions in e-shops,
    • manuals, warranty certificates,
    • user interface localizations, ...
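
The "one sense per domain" idea can be sketched as a lookup that consults a domain-specific lexicon first and falls back to a general one. Everything here is a hypothetical illustration: the lexicon contents, the "zoology" domain, the choice of English to Czech, and the function name are assumptions, and a real system would use a separate translation model per domain rather than a dictionary.

```python
# Hypothetical English -> Czech lexicons, for illustration only.
DOMAIN_LEXICONS = {
    "sports": {"bat": "pálka"},      # the sports equipment
    "zoology": {"bat": "netopýr"},   # the animal
}
GENERAL_LEXICON = {"bat": "netopýr"}


def translate_term(word, domain):
    """Prefer the domain lexicon, then the general one, else copy the word."""
    domain_lexicon = DOMAIN_LEXICONS.get(domain, {})
    if word in domain_lexicon:
        return domain_lexicon[word]
    return GENERAL_LEXICON.get(word, word)


sports_sense = translate_term("bat", "sports")
fallback_sense = translate_term("bat", "cooking")
```

Within a single domain the ambiguous word maps to one translation, so no sense disambiguation is needed at translation time.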

Machine translation between close languages

  • West and South Slavic languages: Czech, Slovak, Polish, Serbian, Croatian, Slovene
  • MT works mainly at the word level because sentence structure is very similar
  • differences can be described systematically by rules: hraje na klavíri <-> hraje na klavír
  • billion-word corpora available for these languages
  • dictionaries can be generated semi-automatically
  • -> searching for duplicates in close languages (reprinted news)
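
Word-level translation between close languages can be sketched as a word-for-word substitution that keeps the word order unchanged. The single lexicon entry is taken from the example above. The function name and the policy of copying unknown words are assumptions, and a real system would derive its entries from rules and semi-automatically generated dictionaries.

```python
def translate_word_level(sentence, lexicon):
    """Replace each word using the lexicon; copy words that have no entry."""
    return " ".join(lexicon.get(word, word) for word in sentence.split())


# Only the pair from the example above; everything else is copied as-is.
lexicon = {"klavíri": "klavír"}
translated = translate_word_level("hraje na klavíri", lexicon)
```

Because the structure of the two languages is so similar, no reordering step is included in this sketch.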

MT quality, European languages

/trac/research/raw-attachment/wiki/en/MachineTranslation/lang_matrix.png

Sub-word level machine translation

  • SMT principle applied at the character level
  • translation at the sub-word level (English -> Czech)

/trac/research/raw-attachment/wiki/en/MachineTranslation/trans1.png

  • translation across levels

/trac/research/raw-attachment/wiki/en/MachineTranslation/trans2.png

  • -> translation of out-of-dictionary words
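
Applying SMT at the character level mostly requires re-tokenizing the data so that every character becomes a token. A minimal sketch of that preprocessing is shown here. The boundary symbol `_` and the function names are assumptions, and the sketch assumes the boundary symbol does not occur in the text itself.

```python
BOUNDARY = "_"  # assumed marker for the original word boundaries


def to_char_tokens(sentence):
    """Turn 'hraje na' into 'h r a j e _ n a' so each character is a token."""
    words = sentence.split()
    return " ".join(" ".join(word) for word in words).replace("  ", " ") if not words else \
        f" {BOUNDARY} ".join(" ".join(word) for word in words)


def from_char_tokens(tokens):
    """Invert to_char_tokens: join characters and restore word boundaries."""
    return "".join(tokens.split()).replace(BOUNDARY, " ")


encoded = to_char_tokens("hraje na klavír")
decoded = from_char_tokens(encoded)
```

Since the model then sees only characters, it can produce output for words that never appeared in the training dictionary.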

Conclusions

  • generating new segments for translation memories
  • domain-specific translation
  • translation between close languages
  • sub-word level translation
