Changes between Version 4 and Version 5 of en/WordLevelAnalysis

Timestamp: Jun 5, 2014, 10:56:28 AM
Author: xkocinc
Comment: --

en/WordLevelAnalysis

-                      v4
+                      v5
  * rules and/or statistical data describe typical contexts of nouns, verbs, etc.
  * using such information one can tell whether ''stát'' is used as a noun or as a verb
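A minimal sketch of context-based disambiguation for ''stát'', assuming a toy table of how often each preceding word occurs with the noun and the verb reading. The counts, the `CONTEXT_COUNTS` table and the `guess_tag` helper are hypothetical illustrations, not the NLP Centre tools or data.

```python
from collections import Counter

# Hypothetical counts: how often each preceding word was seen before
# "stát" used as a noun ("state") or as a verb ("to stand", "to cost").
CONTEXT_COUNTS = {
    "noun": Counter({"český": 40, "právní": 25, "pro": 30}),
    "verb": Counter({"musí": 35, "bude": 50, "může": 45}),
}


def guess_tag(prev_word, counts=CONTEXT_COUNTS, default="noun"):
    """Pick the reading whose contexts contain prev_word most often."""
    key = prev_word.lower()
    scores = {tag: ctx[key] for tag, ctx in counts.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default reading when the context was never seen.
    return best if scores[best] > 0 else default


print(guess_tag("bude"))   # "bude stát"  -> verb reading
print(guess_tag("český"))  # "český stát" -> noun reading
```

A real system would use wider contexts and much larger statistics; the single preceding word is used here only to keep the idea visible.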
+== Example of Contexts — Word Sketches ==
+[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/stat.png)]]
+== Spellchecking and Diacritics Restoration ==
+The data also allow spellchecking and diacritics restoration:
+[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/czAccent.png)]]
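Diacritics restoration from word-level data can be sketched as a lookup: each known word form is indexed under its diacritics-free spelling, and the most frequent candidate is chosen. The frequency list `FREQ` and the function names below are assumptions made for illustration, and the sketch ignores context and capitalization.

```python
import unicodedata
from collections import defaultdict

# Hypothetical toy frequency list of Czech word forms.
FREQ = {"stát": 120, "stať": 3, "může": 200, "muže": 80, "být": 300}


def strip_diacritics(word):
    """Remove combining marks, e.g. 'může' -> 'muze'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(freq):
    """Map each diacritics-free spelling to its most frequent full form."""
    candidates = defaultdict(list)
    for word, count in freq.items():
        candidates[strip_diacritics(word)].append((count, word))
    return {key: max(forms)[1] for key, forms in candidates.items()}


def restore(text, index):
    """Replace each known token; unknown tokens are left unchanged."""
    return " ".join(index.get(token, token) for token in text.split())


print(restore("stat muze byt", build_index(FREQ)))
```

Picking the most frequent form is the simplest possible choice; ambiguous pairs such as ''může'' and ''muže'' would need the context-based resolution described earlier.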
+== Universality ==
+All the mentioned processes can be
+ * tuned for a specific domain
+   * using texts from this domain
+ * applied to a language other than Czech
+   * (Slovak, Polish, German, English, ...)
+== Latest Applications ==
+Seznam.cz, Yandex.ru, Aukro.cz, Václav Havel Library
+ * indexing and searching
+Information System of Masaryk University
+ * other universities and schools (FHS UK, JAMU, VŠFS, ...)
+ * affiliate projects (theses.cz, odevzdej.cz, repozitar.cz)
+ * indexing, searching and plagiarism detection
+“Internetová jazyková příručka”
+ * online source on Czech orthography and grammar
+ * NLP Centre data were a starting point for word form tables
+== Conclusions ==
+Word level processing of texts allows:
+ * choosing among various types of base word, which determines which forms are grouped together
+ * ambiguity resolution according to the context
+ * word form generation
+ * spellchecking, diacritics restoration
+The tools and data can be domain-specific and can be adapted to various languages.