

== Example of Contexts: Word Sketches ==

[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/stat.png)]]
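
Word sketches summarise a word's collocates, grouped by grammatical relation. As a rough illustration of how such a summary can be ranked, the sketch below scores each (relation, collocate) pair with logDice, the salience measure used in Sketch Engine: 14 + log2(2·f_xy / (f_x + f_y)). The triple format, the relation names `modifier` and `object_of`, and all counts are made up for the example. Plain word frequencies stand in for the relation-specific counts a real sketch grammar would use.

```python
import math
from collections import Counter

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Salience of a collocation: 14 + log2(2 * f_xy / (f_x + f_y)), at most 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def word_sketch(headword, triples, freq, top=5):
    """Group collocates of `headword` by grammatical relation, best first.

    triples: (head, relation, collocate) tuples, e.g. from a parsed corpus.
    freq:    corpus frequency of each word (a simplification, see lead-in).
    """
    pair_counts = Counter((rel, col) for head, rel, col in triples if head == headword)
    sketch = {}
    for (rel, col), f_xy in pair_counts.items():
        score = log_dice(f_xy, freq[headword], freq[col])
        sketch.setdefault(rel, []).append((col, round(score, 2)))
    # Keep only the `top` highest-scoring collocates per relation.
    return {rel: sorted(cols, key=lambda item: item[1], reverse=True)[:top]
            for rel, cols in sketch.items()}

# Made-up toy data for "pivo" (beer).
triples = ([("pivo", "modifier", "studené")] * 2
           + [("pivo", "modifier", "dobré")] * 3
           + [("pivo", "object_of", "pít")] * 4)
freq = {"pivo": 10, "studené": 6, "dobré": 50, "pít": 20}
print(word_sketch("pivo", triples, freq))
# {'modifier': [('studené', 12.0), ('dobré', 10.68)], 'object_of': [('pít', 12.09)]}
```

Note how *dobré* co-occurs more often than *studené* yet scores lower: logDice penalises collocates that are frequent everywhere.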

== Spellchecking and Diacritics Restoration ==

The data also support spellchecking and diacritics restoration:

[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/czAccent.png)]]
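
One simple way to restore diacritics with nothing more than a list of word forms and their frequencies is to index each form by its diacritics-free key and output the most frequent accented form; spelling suggestions can likewise be drawn from known forms within edit distance 1. This is a minimal sketch, not the NLP Centre implementation: it assumes a hypothetical lowercase `LEXICON` with made-up frequencies.

```python
import unicodedata
from collections import defaultdict

# Hypothetical toy lexicon: lowercase word form -> corpus frequency (made-up numbers).
LEXICON = {"být": 500, "byt": 40, "česká": 120, "republika": 80}

# Lowercase Czech letters, used to generate spelling candidates.
ALPHABET = "aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"

def strip_diacritics(word: str) -> str:
    """Drop combining marks after NFD decomposition: 'česká' -> 'ceska'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_accent_index(lexicon):
    """Map each diacritics-free key to its known forms, most frequent first."""
    groups = defaultdict(list)
    for form, freq in lexicon.items():
        groups[strip_diacritics(form)].append((freq, form))
    return {key: [form for _, form in sorted(forms, reverse=True)]
            for key, forms in groups.items()}

def restore(text, index):
    """Replace each token by its most frequent accented variant, if known."""
    return " ".join(index.get(tok, [tok])[0] for tok in text.split())

def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def suggest(word, lexicon):
    """Known word -> itself; otherwise known forms at edit distance 1, by frequency."""
    if word in lexicon:
        return [word]
    candidates = [w for w in edits1(word) if w in lexicon]
    return sorted(candidates, key=lambda w: (-lexicon[w], w))

index = build_accent_index(LEXICON)
print(restore("ceska republika", index))   # česká republika
print(suggest("republka", LEXICON))        # ['republika']
```

A frequency-only choice like this always turns *byt* (flat) into *být* (to be); telling such pairs apart requires context.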


== Universality ==

All the processes mentioned above can be:
* tuned for a specific domain using texts from that domain
* applied to languages other than Czech (Slovak, Polish, German, English, ...)
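
As a rough illustration of domain tuning, suppose a component ranks word forms by frequency, as a simple diacritics restorer might. One straightforward way to adapt it is to add counts taken from domain texts to the general counts. The sketch assumes plain-text domain documents; the names `count_forms`, `tune_frequencies` and the `domain_weight` factor are hypothetical, and all numbers are made up.

```python
import re
from collections import Counter

def count_forms(texts):
    """Count lowercase word forms in a list of plain-text documents."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

def tune_frequencies(general, domain_texts, domain_weight=10):
    """Add domain counts, scaled by the hypothetical `domain_weight`, to general counts."""
    tuned = Counter(general)
    for form, freq in count_forms(domain_texts).items():
        tuned[form] += domain_weight * freq
    return tuned

# Made-up general counts and two real-estate-style ads as the "domain".
general = Counter({"být": 100, "byt": 80})
ads = ["Prodám byt 3+1, byt je po rekonstrukci.", "Pronajmu byt v centru."]
tuned = tune_frequencies(general, ads)
print(tuned["byt"], tuned["být"])   # 110 100
```

In this toy domain *byt* (flat) now outranks *být* (to be), so a frequency-based component using the tuned counts would prefer the reading that suits property listings.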


== Latest Applications ==

Seznam.cz, Yandex.ru, Aukro.cz, Václav Havel Library
* indexing and searching

Information System of Masaryk University
* also used by other universities and schools (FHS UK, JAMU, VŠFS, ...)
* affiliated projects (theses.cz, odevzdej.cz, repozitar.cz)
* indexing, searching and plagiarism detection

“Internetová jazyková příručka” (Internet Language Reference Book)
* online reference on Czech orthography and grammar
* NLP Centre data served as the starting point for its word form tables


== Conclusions ==

Word-level text processing allows:
* determining various types of base word, which define which word forms are grouped together
* context-based ambiguity resolution
* word form generation
* spellchecking and diacritics restoration

The tools and data can be tuned to specific domains and adapted to various languages.
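
Grouping word forms under a base word is what lets indexing and searching match a query against any form of a word. Below is a minimal sketch of a lemma-keyed inverted index, assuming a hypothetical hand-written `LEMMAS` table in place of a real morphological analyser; forms missing from the table are indexed as themselves.

```python
import re
from collections import defaultdict

# Hypothetical form -> lemma table; a real system would query a morphological analyser.
LEMMAS = {
    "hrad": "hrad", "hradu": "hrad", "hrady": "hrad", "hradem": "hrad",
    "praha": "praha", "prahy": "praha", "praze": "praha",
}

def lemmatize(form: str) -> str:
    """Return the base word for a form, or the form itself if unknown."""
    return LEMMAS.get(form, form)

def build_index(docs):
    """Inverted index: lemma -> ids of documents containing any of its forms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for form in re.findall(r"\w+", text.lower()):
            index[lemmatize(form)].add(doc_id)
    return index

def search(query, index):
    """Return ids of documents containing some form of every query word."""
    lemmas = [lemmatize(w) for w in re.findall(r"\w+", query.lower())]
    if not lemmas:
        return set()
    result = set(index.get(lemmas[0], set()))
    for lemma in lemmas[1:]:
        result &= index.get(lemma, set())
    return result

docs = {1: "Prohlídka hradu v Praze", 2: "Hrady a zámky", 3: "Pražské muzeum"}
index = build_index(docs)
print(sorted(search("hrad", index)))         # [1, 2]
print(sorted(search("hrad Praha", index)))   # [1]
```

Document 3 is not found for *Praha* because the adjective *Pražské* has its own base word in this toy table; whether such words should be grouped depends on which type of base word is used.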
| 124 | |