= Machine Translation =

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/error.png)]]

== Topics ==

 * Statistical machine translation
 * Extension of translation memories
 * Domain-specific machine translation
 * Machine translation between close languages
 * Sub-word level machine translation

== Statistical machine translation ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans.png)]]

== Improving statistical machine translation ==

 * free state-of-the-art tools available (SRILM, Moses)
 * baseline SMT available for everyone
 * languages with a high number of word forms need special treatment
   * Czech: '''dělám, děláš, dělal, dělajícímu, dělaje, děláním, ...'''
   * Hungarian: '''magas, magasabb, legmagasabb, legeslegmagasabb, ...'''
 * language models can be enriched with linguistic knowledge

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/kings.png)]]

== Word alignment matrix - from words to phrases ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix1.png)]]

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix2.png)]]

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix3.png)]]

== Domain-specific machine translation ==

 * straightforward way of increasing the quality of MT
 * domain-specific corpora can be downloaded on demand
 * separate models for each domain: '''sports, cooking, gardening'''
 * one sense per domain: '''bat'''

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/bat.png)]]

 * translations of
   * product details, product descriptions in e-shops,
   * manuals, warranty certificates,
   * user interface localizations, ...
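
The "one sense per domain" idea above can be illustrated with a minimal sketch. It is not the system described on this page: the lexicons, the names `GENERAL_LEXICON`, `DOMAIN_LEXICONS` and `translate_word`, and the English -> Czech entries are all hypothetical, and simple lookup tables stand in for the separately trained per-domain models.

```python
from typing import Dict

# Hypothetical toy lexicons (English -> Czech). A real setup would train a
# separate translation model and language model on each domain corpus.
GENERAL_LEXICON: Dict[str, str] = {
    "bat": "netopýr",  # the animal, assumed here to be the default sense
}

DOMAIN_LEXICONS: Dict[str, Dict[str, str]] = {
    "sports": {"bat": "pálka"},  # the piece of sports equipment
}


def translate_word(word: str, domain: str) -> str:
    """Prefer the domain-specific sense; fall back to the general lexicon.

    Words missing from both lexicons are returned unchanged.
    """
    domain_lexicon = DOMAIN_LEXICONS.get(domain, {})
    if word in domain_lexicon:
        return domain_lexicon[word]
    return GENERAL_LEXICON.get(word, word)


if __name__ == "__main__":
    for domain in ("sports", "gardening"):
        print(domain, "->", translate_word("bat", domain))
```

Within the sports domain the ambiguous word resolves to a single sense, while a domain without its own entry falls back to the general lexicon. This fallback is an assumption of the sketch, not something stated in the source.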
== Machine translation between close languages ==

 * West and South Slavic languages: Czech, Slovak, Polish, Serbian, Croatian, Slovene
 * MT mainly on word level, structure is very similar
 * differences can be described systematically by rules: '''hraje na klavíri''' <-> '''hraje na klavír'''
 * billion-word corpora available for these languages
 * dictionaries can be generated semi-automatically
 * -> searching for duplicates in close languages (reprinted news)

== MT quality, European languages ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/lang_matrix.png)]]

== Sub-word level machine translation ==

 * SMT principle applied on character level
 * translation on subword level (English -> Czech)
   [[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans1.png)]]
 * translation across levels
   [[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans2.png)]]
 * -> translation of out-of-dictionary words

== Conclusions ==

 * generating new segments for translation memories
 * domain-specific translation
 * translation between close languages
 * sub-word level translation