Changes between Version 1 and Version 2 of en/MachineTranslation


Timestamp: Jun 5, 2014, 3:16:10 PM
Author: xkocinc

= Machine Translation =

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/error.png)]]

== Topics ==
 * Statistical machine translation
 * Extension of translation memories
 * Domain-specific machine translation
 * Machine translation between close languages
 * Sub-word level machine translation

== Statistical machine translation ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans.png)]]

== Improving statistical machine translation ==
 * free state-of-the-art tools are available (SRILM, Moses)
 * a baseline SMT system is available to everyone
 * languages with a high number of wordforms need special treatment:
   * Czech: '''dělám, děláš, dělal, dělajícímu, dělaje, děláním, ...'''
   * Hungarian: '''magas, magasabb, legmagasabb, legeslegmagasabb, ...'''
 * language models can be enriched with linguistic knowledge

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/kings.png)]]
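
The wordform problem can be illustrated on the Hungarian example above: splitting off the superlative prefixes ''leg-''/''legesleg-'' and the comparative suffix ''-abb''/''-ebb'' maps all four forms onto the single stem '''magas''', so a model sees that stem four times instead of four unrelated words once each. This is a minimal sketch assuming a hand-written affix list; the `PREFIXES`/`SUFFIXES` tables and the `segment` function are hypothetical, and a real system would rely on a morphological analyser or data-driven segmentation.

```python
# Toy affix-stripping segmenter for Hungarian adjective gradation.
# Hypothetical affix lists; a real system would use a morphological analyser.
from collections import Counter

# Longer prefix first, so "legesleg" is tried before "leg".
PREFIXES = ["legesleg", "leg"]
SUFFIXES = ["abb", "ebb"]


def segment(word: str) -> list[str]:
    """Split a wordform into [prefix+, stem, +suffix] pieces where a rule applies."""
    pieces = []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            pieces.append(prefix + "+")
            word = word[len(prefix):]
            break
    suffix_piece = None
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            suffix_piece = "+" + suffix
            word = word[: -len(suffix)]
            break
    pieces.append(word)
    if suffix_piece:
        pieces.append(suffix_piece)
    return pieces


forms = ["magas", "magasabb", "legmagasabb", "legeslegmagasabb"]
word_vocab = Counter(forms)                             # 4 distinct wordforms
piece_vocab = Counter(p for f in forms for p in segment(f))

print(segment("legeslegmagasabb"))  # ['legesleg+', 'magas', '+abb']
print(len(word_vocab))              # 4
print(piece_vocab["magas"])         # 4
```

Blind affix stripping also splits words that merely start with ''leg'' or end in ''abb'', which is why such rules need a lexicon or analyser behind them.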

== Word alignment matrix: from words to phrases ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix1.png)]]
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix2.png)]]
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix3.png)]]
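
Moving from the word alignment matrix to phrases can be sketched with the usual consistency criterion: a source span and a target span form a phrase pair if they share at least one alignment point and no word inside either span is aligned to a word outside the other span. The `extract_phrases` function and the English-Czech sentence pair with its alignment are hypothetical, and the sketch is simplified: unlike the full algorithm used in phrase-based toolkits such as Moses, it does not extend phrases over unaligned target words and has no phrase-length limit.

```python
# Simplified consistent phrase-pair extraction from a word alignment.
# Alignment points are (source_index, target_index) pairs.


def extract_phrases(src, tgt, alignment):
    """Return phrase pairs consistent with the alignment (no unaligned-word extension)."""
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, len(src)):
            # Target positions linked to the source span [s_start, s_end].
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word inside the span may link outside the source span.
            if any((t_start <= t <= t_end) and not (s_start <= s <= s_end)
                   for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


src = ["I", "play", "piano"]
tgt = ["hraji", "na", "klavír"]
# "I" and "play" -> "hraji", "piano" -> "klavír"; "na" is unaligned.
alignment = {(0, 0), (1, 0), (2, 2)}

for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
# ('I play', 'hraji')
# ('I play piano', 'hraji na klavír')
# ('piano', 'klavír')
```

The pair '''piano''' <-> '''na klavír''' is not produced, because the unaligned ''na'' is only covered when it lies between aligned target words; the unaligned-word extension of the full algorithm would add it.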

== Domain-specific machine translation ==
 * a straightforward way of increasing MT quality
 * domain-specific corpora can be downloaded on demand
 * separate models for each domain: '''sports, cooking, gardening'''
 * one sense per domain: '''bat'''

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/bat.png)]]

 * translation of
   * product details and product descriptions in e-shops,
   * manuals and warranty certificates,
   * user interface localizations, ...
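
The ''one sense per domain'' idea can be sketched as domain routing: a simple keyword classifier picks a domain, and the word is looked up in that domain's own translation table, so '''bat''' becomes ''pálka'' in a sports text and ''netopýr'' (the animal) otherwise. The domains (including the ''animals'' domain, which is not in the list above), keywords, tables and the `classify`/`translate_word` functions are all hypothetical; a real setup would train a separate full model per domain corpus and use a proper text classifier.

```python
# Toy domain routing: pick a domain by keyword overlap, then use that domain's lexicon.
# All domains, keywords and translations below are hypothetical illustration data.

DOMAIN_KEYWORDS = {
    "sports": {"ball", "game", "baseball", "cricket", "hit"},
    "animals": {"cave", "wings", "night", "fly", "mammal"},
}

# One sense per domain: the same English word gets a different Czech translation.
DOMAIN_LEXICON = {
    "sports": {"bat": "pálka", "ball": "míč"},
    "animals": {"bat": "netopýr", "cave": "jeskyně"},
}


def classify(text: str) -> str:
    """Return the domain whose keywords overlap most with the text (first wins on ties)."""
    tokens = set(text.lower().split())
    return max(DOMAIN_KEYWORDS, key=lambda d: len(tokens & DOMAIN_KEYWORDS[d]))


def translate_word(word: str, context: str) -> str:
    """Translate one word with the lexicon of the domain detected from its context."""
    domain = classify(context)
    return DOMAIN_LEXICON[domain].get(word, word)


print(translate_word("bat", "he hit the ball with a baseball bat"))  # pálka
print(translate_word("bat", "a bat flew out of the cave at night"))  # netopýr
```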

== Machine translation between close languages ==
 * West and South Slavic languages: Czech, Slovak, Polish, Serbian, Croatian, Slovene
 * MT works mainly on the word level, since the sentence structure is very similar
 * differences can be described systematically by rules:
   '''hraje na klavíri''' <-> '''hraje na klavír'''
 * billion-word corpora are available for these languages
 * dictionaries can be generated semi-automatically

 * -> searching for duplicates in close languages (reprinted news)
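
Since the structure of close languages is so similar, translation can be sketched as word-by-word lookup plus a few context rules for systematic differences, such as the noun form after ''na'' in the example above. The empty `DICTIONARY`, the single entry in `RULES` and the `translate` function are hypothetical and only reproduce that example; unknown words are copied unchanged, assuming both languages share the wordform.

```python
# Toy word-level transfer between two closely related languages.
# Dictionary and rules are hypothetical, chosen only to reproduce the example above.

# Word-by-word dictionary (left empty here; in practice it would be generated
# semi-automatically). Words missing from it are copied unchanged.
DICTIONARY: dict[str, str] = {}

# Context rules: (previous output word, source word) -> target word.
# Captures a systematic difference in the noun form after the preposition "na".
RULES = {
    ("na", "klavíri"): "klavír",
}


def translate(sentence: str) -> str:
    """Translate word by word, applying a context rule when one matches."""
    output = []
    for word in sentence.split():
        prev = output[-1] if output else None
        if (prev, word) in RULES:
            output.append(RULES[(prev, word)])
        else:
            output.append(DICTIONARY.get(word, word))
    return " ".join(output)


print(translate("hraje na klavíri"))  # hraje na klavír
```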

== MT quality for European languages ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/lang_matrix.png)]]

== Sub-word level machine translation ==
 * the SMT principle applied at the character level
 * translation at the subword level (English -> Czech)
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans1.png)]]
 * translation across levels
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans2.png)]]

 * -> translation of out-of-dictionary words
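
Translating an out-of-dictionary word at the subword level can be sketched as segmenting it into known pieces and translating piece by piece. This sketch uses greedy longest-match segmentation over a small hypothetical English -> Czech subword table (`SUBWORD_TABLE`, `segment` and `translate_oov` are illustrative names); it shows only the segment-and-translate idea, not the statistical character-level decoding described above.

```python
# Toy subword translation: greedy longest-match segmentation, then piece-wise lookup.
# The subword table is hypothetical illustration data (English -> Czech).
from typing import Optional

SUBWORD_TABLE = {
    "un": "ne",
    "believ": "uvěřitel",
    "able": "ný",
}


def segment(word: str, vocab: dict[str, str]) -> Optional[list[str]]:
    """Split word into the longest known pieces, left to right; None if impossible."""
    pieces = []
    pos = 0
    while pos < len(word):
        # Try the longest piece starting at pos first.
        for end in range(len(word), pos, -1):
            if word[pos:end] in vocab:
                pieces.append(word[pos:end])
                pos = end
                break
        else:
            return None  # no known piece starts at this position
    return pieces


def translate_oov(word: str) -> Optional[str]:
    """Translate an unknown word by concatenating translations of its pieces."""
    pieces = segment(word, SUBWORD_TABLE)
    if pieces is None:
        return None
    return "".join(SUBWORD_TABLE[p] for p in pieces)


print(segment("unbelievable", SUBWORD_TABLE))  # ['un', 'believ', 'able']
print(translate_oov("unbelievable"))           # neuvěřitelný
```

With the same table, `translate_oov("believable")` gives ''uvěřitelný''. Greedy longest match is the simplest choice, but it can fail on words where a shorter first piece would have led to a complete segmentation, which is one reason to score alternative segmentations instead.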

== Conclusions ==
 * generating new segments for translation memories
 * domain-specific translation
 * translation between close languages
 * sub-word level translation