[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/error.png)]]

== Topics ==
 * Statistical machine translation
 * Extension of translation memories
 * Domain-specific machine translation
 * Machine translation between close languages
 * Sub-word level machine translation

== Statistical machine translation ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans.png)]]
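
For reference, the textbook noisy-channel formulation of SMT picks, for a source sentence $f$, the target sentence $\hat{e}$ that maximizes the product of a translation model $P(f \mid e)$ and a target language model $P(e)$ (the kind of model SRILM builds). Phrase-based toolkits such as Moses use the log-linear generalization, a weighted sum of feature functions $h_i$ with weights $\lambda_i$; the noisy channel is the special case $h_1 = \log P(f \mid e)$, $h_2 = \log P(e)$, $\lambda_1 = \lambda_2 = 1$. The notation is the standard one from the literature, not taken from the figure.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\,P(e)

\hat{e} = \arg\max_{e} \sum_{i=1}^{M} \lambda_i \, h_i(e, f)
```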

== Improving statistical machine translation ==
 * free state-of-the-art tools are available (SRILM, Moses)
 * a baseline SMT system is available to everyone
 * languages with a large number of word forms need special treatment, e.g.
   * Czech: '''dělám, děláš, dělal, dělajícímu, dělaje, děláním, ...'''
   * Hungarian: '''magas, magasabb, legmagasabb, legeslegmagasabb, ...'''
 * language models can be enriched with linguistic knowledge

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/kings.png)]]
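
The last point can be illustrated with a toy bigram model that backs off from word forms to lemmas, so that an unseen form bigram such as ''děláš úkol'' can still be scored from other forms of the same lemmas. This is a minimal sketch in the spirit of factored language models, not the model behind the figure: the `LEMMA` table is hypothetical (a real system would use a morphological analyzer), and the fixed back-off weight `BACKOFF = 0.4` is borrowed from "stupid backoff", so the results are scores rather than normalized probabilities.

```python
from collections import Counter

# Hypothetical form -> lemma table; a real system would use a morphological analyzer.
LEMMA = {
    "dělám": "dělat", "děláš": "dělat", "dělal": "dělat",
    "úkol": "úkol", "úkoly": "úkol", "úkolu": "úkol",
}
BACKOFF = 0.4  # fixed back-off weight (assumption, as in "stupid backoff")


def lemmatize(token):
    return LEMMA.get(token, token)


class LemmaBackoffBigramLM:
    """Bigram scores over word forms, backing off to lemma bigrams when a form bigram is unseen."""

    def __init__(self, sentences):
        self.form_bigrams, self.form_history = Counter(), Counter()
        self.lemma_bigrams, self.lemma_history = Counter(), Counter()
        for sentence in sentences:
            forms = sentence.split()
            lemmas = [lemmatize(t) for t in forms]
            # History counts only include tokens that have a successor.
            self.form_history.update(forms[:-1])
            self.lemma_history.update(lemmas[:-1])
            self.form_bigrams.update(zip(forms, forms[1:]))
            self.lemma_bigrams.update(zip(lemmas, lemmas[1:]))

    def score(self, prev, word):
        count = self.form_bigrams[(prev, word)]
        if count:
            return count / self.form_history[prev]
        lp, lw = lemmatize(prev), lemmatize(word)
        count = self.lemma_bigrams[(lp, lw)]
        if count:
            return BACKOFF * count / self.lemma_history[lp]
        return 0.0


lm = LemmaBackoffBigramLM(["dělám úkoly", "dělal úkol"])
print(lm.score("dělám", "úkoly"))  # seen form bigram: 1.0
print(lm.score("děláš", "úkol"))   # unseen forms, seen lemmas: 0.4
print(lm.score("úkol", "dělám"))   # nothing seen: 0.0
```

Pooling counts at the lemma level is what lets the rare Czech form ''děláš'' benefit from the sentences containing ''dělám'' and ''dělal''.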

== Word alignment matrix: from words to phrases ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix1.png)]]
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix2.png)]]
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/word_matrix3.png)]]
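
The step from the word alignment matrix to phrases can be sketched with the usual consistency criterion: a source span and a target span form a phrase pair if they contain at least one alignment point and no word inside either span is aligned to a word outside the other. This is a minimal sketch, not the exact procedure behind the figures; `extract_phrases` and `max_len` are illustrative names, and for brevity it does not extend phrases over unaligned words at the span boundaries, which the full extraction algorithm used in phrase-based SMT does.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens; alignment: set of (i, j) pairs meaning
    src[i] is aligned to tgt[j]. Unaligned boundary words are not
    added (simplification of the full extraction algorithm).
    """
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            # Target positions linked to the source span [s1, s2].
            linked = [j for (i, j) in alignment if s1 <= i <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may link outside [s1, s2].
            if any(t1 <= j <= t2 and not s1 <= i <= s2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


src = "the house is small".split()
tgt = "ten dům je malý".split()
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```

With this monotone one-to-one alignment every contiguous span of up to four words is extracted (10 pairs, including ("the house", "ten dům")); crossing or many-to-one alignment points rule out the smaller spans that would cut through them.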

== Domain-specific machine translation ==
 * a straightforward way to increase the quality of MT
 * domain-specific corpora can be downloaded on demand
 * separate models for each domain: '''sports, cooking, gardening'''
 * one sense per domain: '''bat'''

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/bat.png)]]
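
The "one sense per domain" idea can be sketched by routing each sentence to a domain and translating with that domain's lexicon, which here stands in for a separate domain-specific translation model. Everything in the sketch is a hypothetical illustration: the two domains (`sports` and `zoology`), the keyword sets and the one-entry lexicons are invented, and choosing the domain by keyword overlap is a deliberately simple stand-in for a real domain classifier.

```python
# Hypothetical per-domain lexicons: each domain keeps a single sense of "bat".
DOMAIN_LEXICON = {
    "sports": {"bat": "pálka"},
    "zoology": {"bat": "netopýr"},
}
# Hypothetical indicator words used to guess the domain of a sentence.
DOMAIN_KEYWORDS = {
    "sports": {"ball", "game", "player", "hit", "cricket", "baseball"},
    "zoology": {"cave", "night", "wings", "flew", "insects", "animal"},
}


def detect_domain(tokens):
    """Pick the domain whose keywords overlap the sentence most (ties: first domain listed)."""
    words = set(tokens)
    return max(DOMAIN_KEYWORDS, key=lambda d: len(DOMAIN_KEYWORDS[d] & words))


def translate_word(word, sentence):
    """Translate `word` with the lexicon of the domain detected for `sentence`."""
    domain = detect_domain(sentence.lower().split())
    return DOMAIN_LEXICON[domain].get(word, word)


print(translate_word("bat", "The player hit the ball with the bat"))  # pálka
print(translate_word("bat", "A bat flew out of the cave at night"))   # netopýr
```

Words missing from the selected lexicon are passed through unchanged; a real system would fall back to a general-domain model instead.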

 * translation of:
   * product details and product descriptions in e-shops
   * manuals and warranty certificates
   * user interface localizations, ...

== Machine translation between close languages ==
 * West and South Slavic languages: Czech, Slovak, Polish, Serbian, Croatian, Slovene
 * MT works mainly at the word level, because the sentence structure is very similar
 * differences can be described systematically by rules: '''hrá na klavíri''' (Slovak) <-> '''hraje na klavír''' (Czech)
 * billion-word corpora are available for these languages
 * dictionaries can be generated semi-automatically
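
Word-level transfer plus a systematic rule can be sketched on the Slovak/Czech example: a dictionary maps each Slovak word to the corresponding Czech form (keeping the case), and a rule then fixes the one structural difference, Slovak ''hrať na'' + locative vs. Czech ''hrát na'' + accusative. The dictionary entries, the locative-to-accusative table and the function names are hypothetical; a real system would use a semi-automatically generated dictionary and a morphological generator.

```python
# Hypothetical Slovak -> Czech word dictionary; nouns keep their case
# (a Slovak locative becomes a Czech locative).
SK_TO_CS = {
    "hrá": "hraje",
    "na": "na",
    "klavíri": "klavíru",
    "gitare": "kytaře",
    "a": "a",
    "spieva": "zpívá",
}

# Czech "hrát na" takes the accusative, Slovak "hrať na" the locative.
# Hypothetical Czech locative -> accusative table for the nouns above.
CS_LOC_TO_ACC = {"klavíru": "klavír", "kytaře": "kytaru"}


def translate_words(sentence):
    """Word-level transfer: look up every token, keep unknown tokens as-is."""
    return [SK_TO_CS.get(tok, tok) for tok in sentence.split()]


def apply_case_rule(tokens):
    """Rewrite 'hraje na <locative>' to 'hraje na <accusative>'."""
    out = list(tokens)
    for i in range(len(out) - 2):
        if out[i] == "hraje" and out[i + 1] == "na":
            out[i + 2] = CS_LOC_TO_ACC.get(out[i + 2], out[i + 2])
    return out


def sk_to_cs(sentence):
    return " ".join(apply_case_rule(translate_words(sentence)))


print(sk_to_cs("hrá na klavíri"))          # hraje na klavír
print(sk_to_cs("spieva a hrá na gitare"))  # zpívá a hraje na kytaru
```

Keeping the dictionary case-preserving and putting the construction-specific change into a separate rule means each new systematic difference costs one rule rather than many dictionary entries.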

 * -> searching for duplicates across close languages (e.g. reprinted news)
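
Because close languages share much of their vocabulary, a reprinted news item can often be spotted without full translation, for example by comparing character trigrams after removing diacritics. The sketch below is one simple way to do this, not necessarily the method used in this project; the trigram size, the Jaccard measure and the threshold of 0.5 are assumptions chosen for illustration.

```python
import unicodedata


def normalize(text):
    """Lowercase and strip diacritics (e.g. 'daních' -> 'danich')."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_ngrams(text, n=3):
    """Set of character n-grams of the normalized text."""
    t = normalize(text)
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(doc_a, doc_b, threshold=0.5):
    """Flag two texts as likely duplicates; the threshold is an assumption."""
    return jaccard(char_ngrams(doc_a), char_ngrams(doc_b)) >= threshold


slovak = "Vláda schválila nový zákon o daniach"
czech = "Vláda schválila nový zákon o daních"
unrelated = "Na horách napadl první sníh"
print(is_duplicate(slovak, czech))      # True
print(is_duplicate(slovak, unrelated))  # False
```

For pairs of languages that differ more, the Slovak side could first be passed through a word-level transfer step before comparing.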

== MT quality for European languages ==

[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/lang_matrix.png)]]


== Sub-word level machine translation ==
 * the SMT principle applied at the character level
 * translation at the subword level (English -> Czech):
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans1.png)]]
 * translation across levels:
[[Image(/trac/research/raw-attachment/wiki/en/MachineTranslation/trans2.png)]]

 * -> translation of out-of-dictionary words
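
Translation of out-of-dictionary words can be illustrated with a much simpler stand-in for character-level SMT: split an unknown English word into known subword units by greedy longest match and translate each unit separately. The subword table and the function names are hypothetical, and greedy dictionary lookup is a swapped-in simplification, not a trained subword or character-level model.

```python
# Hypothetical English -> Czech subword translation table (illustration only).
SUBWORDS = {
    "bio": "bio", "nano": "nano", "techno": "techno", "logy": "logie",
    "global": "global", "ization": "izace",
}


def segment(word, table):
    """Greedy longest-match segmentation; returns None if a part is not in the table."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in table:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no known unit starts at position i
            return None
    return pieces


def translate_oov(word, table=SUBWORDS):
    """Translate an out-of-dictionary word unit by unit, or return None."""
    pieces = segment(word, table)
    if pieces is None:
        return None
    return "".join(table[p] for p in pieces)


for w in ["nanotechnology", "globalization", "biology", "cat"]:
    print(w, "->", translate_oov(w))
```

Greedy matching can miss a valid split that a shorter first unit would allow; a statistical model scores alternative segmentations instead of committing to the longest one.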

== Conclusions ==
 * generating new segments for translation memories
 * domain-specific translation
 * translation between close languages
 * sub-word level translation