Version 5 (modified by xkocinc, 9 years ago)


Machine Translation



  • Statistical machine translation
  • Extension of translation memories
  • Domain-specific machine translation
  • Machine translation between close languages
  • Sub-word level machine translation

Statistical machine translation


Improving statistical machine translation

  • free state-of-the-art tools are available (SRILM, Moses)
  • a baseline SMT system can be built by anyone
  • languages with a high number of word forms need special treatment:
    • Czech (dělat "to do"): dělám, děláš, dělal, dělajícímu, dělaje, děláním, ...
    • Hungarian (magas "tall"): magas, magasabb, legmagasabb, legeslegmagasabb, ...
  • language models can be enriched with linguistic knowledge (e.g. lemmas, morphological tags)
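The point about enriching language models can be illustrated with a minimal sketch, assuming a toy corpus in which every token carries a hand-made (word form, lemma) annotation; the corpus, the sentences and the function names are hypothetical. The same add-one smoothed bigram model is trained once on word forms and once on lemmas: a form never seen in training (dělal "he did") still yields a lemma bigram the model has seen.

```python
import math
from collections import Counter

# Hypothetical annotated corpus: each token is a (word form, lemma) pair.
# In practice the lemmas would come from a morphological analyzer.
CORPUS = [
    [("dělám", "dělat"), ("dobře", "dobře")],  # "I do well"
    [("děláš", "dělat"), ("dobře", "dobře")],  # "you do well"
]

def project(sentences, factor):
    """Keep one factor per token: 0 = word form, 1 = lemma."""
    return [[tok[factor] for tok in sent] for sent in sentences]

def train_bigram(sentences):
    """Collect history and bigram counts, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len({t for s in sentences for t in s} | {"</s>"})
    return histories, bigrams, vocab_size

def log_prob(sent, model):
    """Add-one smoothed bigram log probability of a token sequence."""
    histories, bigrams, v = model
    tokens = ["<s>"] + sent + ["</s>"]
    return sum(math.log((bigrams[(h, w)] + 1) / (histories[h] + v))
               for h, w in zip(tokens[:-1], tokens[1:]))

forms = train_bigram(project(CORPUS, 0))
lemmas = train_bigram(project(CORPUS, 1))

# "dělal dobře" ("he did well"): the form "dělal" never occurs in training,
# but its lemma "dělat" does, so the lemma model has seen the bigram.
print(forms[1][("dělal", "dobře")], lemmas[1][("dělat", "dobře")])  # 0 2
print(log_prob(["dělal", "dobře"], forms), log_prob(["dělat", "dobře"], lemmas))
```

Factored systems such as Moses combine several such factors; this sketch only contrasts two of them separately.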


Word alignment matrix - from words to phrases
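Phrase pairs are commonly read off the alignment matrix by keeping every pair of spans that is consistent with the word alignment: no word inside either span may be aligned to a word outside the other. The sketch below is a simplified version of that extraction (it does not extend phrases over unaligned words at the span edges); the sentence pair, the alignment points and the max_len limit are made up for illustration.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with a word alignment.

    alignment is a set of (i, j) pairs linking src[i] to tgt[j].
    Simplification: phrases are not extended over unaligned boundary words.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # reject if a target word in [j1, j2] is aligned outside [i1, i2]
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "dělám to dobře".split()   # hypothetical Czech sentence
tgt = "I do it well".split()
links = {(0, 0), (0, 1), (1, 2), (2, 3)}  # dělám-I, dělám-do, to-it, dobře-well
for pair in sorted(extract_phrases(src, tgt, links)):
    print(pair)
```

The full sentence pair is not extracted here only because its English side exceeds max_len.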




Domain-specific machine translation

  • a straightforward way of increasing the quality of MT
  • domain-specific corpora can be downloaded on demand
  • separate models for each domain, e.g. sports, cooking, gardening
  • one sense per domain: bat (sports equipment vs. the animal)


  • translations of
    • product details, product descriptions in e-shops,
    • manuals, warranty certificates,
    • user interface localizations, ...
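One way to use separate domain models is to route each input sentence to the domain whose language model fits it best and then translate it with that domain's model, where a word such as bat can keep a single sense. The sketch below assumes add-one smoothed unigram models trained on tiny made-up texts; the domain names, the texts and the function names are hypothetical, and a real system would use the downloaded domain corpora and stronger models.

```python
import math
from collections import Counter

# Hypothetical in-domain training texts (stand-ins for real domain corpora).
DOMAIN_TEXTS = {
    "sports": "the batter swung the bat and hit the ball over the fence",
    "zoology": "the bat is a nocturnal mammal that hunts insects in the cave",
}

def unigram_model(text):
    """Word counts and total token count of a domain text."""
    counts = Counter(text.split())
    return counts, sum(counts.values())

MODELS = {d: unigram_model(t) for d, t in DOMAIN_TEXTS.items()}
VOCAB = {w for counts, _ in MODELS.values() for w in counts}

def cross_entropy(sentence, model):
    """Average negative log2 probability per word, add-one smoothed."""
    counts, total = model
    words = sentence.split()
    v = len(VOCAB) + 1  # one extra slot for unseen word types
    return -sum(math.log2((counts[w] + 1) / (total + v)) for w in words) / len(words)

def pick_domain(sentence):
    """Route the sentence to the domain whose model fits it best."""
    return min(MODELS, key=lambda d: cross_entropy(sentence, MODELS[d]))

print(pick_domain("he dropped the bat after the last ball"))  # sports
print(pick_domain("a bat flew out of the cave"))               # zoology
```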

Machine translation between close languages

  • West and South Slavic languages: Czech, Slovak, Polish, Serbian, Croatian, Slovene
  • MT works mainly on the word level, since sentence structure is very similar
  • differences can be described systematically by rules: hraje na klavíri <-> hraje na klavír ("plays the piano", locative vs. accusative after na)
  • billion-word corpora are available for these languages
  • dictionaries can be generated semi-automatically
  • -> application: searching for duplicates across close languages (reprinted news)
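Since related words in close languages are often spelled alike, one possible way to generate dictionary candidates semi-automatically is to pair each word with its most similar word in the other language by normalized edit distance and leave the proposals for a human to check. The Czech and Slovak word lists, the 0.6 threshold and the function names below are assumptions for illustration, not necessarily the method used here.

```python
def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def similarity(a, b):
    """1.0 for identical words, lower as the edit distance grows."""
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def candidate_pairs(src_words, tgt_words, threshold=0.6):
    """Propose the most similar target word for each source word."""
    pairs = []
    for s in src_words:
        best = max(tgt_words, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            pairs.append((s, best, round(similarity(s, best), 2)))
    return pairs

czech = ["hraje", "klavír", "město", "dnes"]    # hypothetical word lists
slovak = ["hrá", "klavír", "mesto", "dnes"]
print(candidate_pairs(czech, slovak))
```

With these lists, klavír/klavír, město/mesto and dnes/dnes are proposed, while hraje/hrá (similarity 0.4) falls below the threshold: pairs like that are better covered by systematic rules.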

MT quality, European languages


Sub-word level machine translation

  • the SMT principle applied at the character level
  • translation at the subword level (English -> Czech)


  • translation across levels


  • -> translation of out-of-dictionary words
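One way to apply a word-level phrase-based SMT toolkit at the character level is to rewrite the data so that each character becomes a separate token, with a marker standing in for the original spaces; the translated character sequence is joined back afterwards. The _ marker and the function names are assumptions of this sketch, and input that already contains _ would need escaping.

```python
BOUNDARY = "_"  # hypothetical marker for original spaces

def to_char_tokens(sentence):
    """Make every character a space-separated token, marking word boundaries."""
    return " ".join(BOUNDARY if ch == " " else ch for ch in sentence.strip())

def from_char_tokens(tokens):
    """Join a (translated) character-token sequence back into words."""
    return "".join(" " if t == BOUNDARY else t for t in tokens.split())

line = to_char_tokens("plays the piano")
print(line)                    # p l a y s _ t h e _ p i a n o
print(from_char_tokens(line))  # plays the piano
```

Because the system then learns mappings between character sequences, it can also produce translations for words that never occurred in the training data.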


  • generating new segments for translation memories
  • domain-specific translation
  • translation between close languages
  • sub-word level translation
