
Changes between Version 1 and Version 2 of en/ProcessingLargeTextCollections

-                      v1
+                      v2
  * '''real''' data instead of false assumptions
+== Information in Text ==
+[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/text.png)]]
+== Text collection = a text corpus ==
+ * text collection: usually referred to as a '''text corpus'''
+ * '''humanities''' → corpus linguistics, language learning
+ * '''computer science''' → effective design of specialized database management systems
+ * '''applications''' → use of ''any text'' as an information source
+== Text Corpora as Information Source ==
+[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/goal.png)]]
+== So what is a corpus? ==
+[[Image(/trac/research/raw-attachment/wiki/en/WordLevelAnalysis/what_is_corpus.png)]]
+== Corpora ==
+ * '''text type'''
+   * ''general language'' (gather domain-independent information: common-sense knowledge, global statistics, information defaults)
+   * ''domain specific'' (gather domain-specific information: terminology, in-domain knowledge, contrast with general-language texts)
+ * '''timeline'''
+   * ''synchronic'': one time period / time span (→ what is up now?)
+   * ''diachronic'': different time periods / time spans (→ what are the trends?)
+ * '''language, written/spoken, metadata annotation type,...'''
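The contrast between a ''domain specific'' and a ''general language'' corpus can be turned into a simple terminology heuristic. The minimal sketch below ranks words by how much more frequent they are in the domain corpus than in the general one. The naive tokenizer, the add-one smoothing and the names `count_tokens` and `term_candidates` are assumptions made for illustration. This page does not describe them as its method, and real systems usually work on far larger corpora and use statistics such as log-likelihood.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def count_tokens(texts: list[str]) -> tuple[Counter, int]:
    # Absolute word counts over all texts plus the total number of tokens.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts, sum(counts.values())


def term_candidates(domain: list[str], general: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Rank domain words by the ratio of their relative frequency in the
    domain corpus to their relative frequency in the general corpus.
    Add-one smoothing keeps words unseen in the general corpus finite."""
    d_counts, d_total = count_tokens(domain)
    g_counts, g_total = count_tokens(general)
    vocab = len(set(d_counts) | set(g_counts))
    scores = {}
    for word, count in d_counts.items():
        p_domain = (count + 1) / (d_total + vocab)
        p_general = (g_counts[word] + 1) / (g_total + vocab)
        scores[word] = p_domain / p_general
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy corpora (hypothetical data, for illustration only).
domain = ["the corpus contains tokens and every token has a lemma",
          "a corpus is annotated token by token"]
general = ["the weather is nice and a child is playing",
           "the news is read by many people"]
print(term_candidates(domain, general, top=2))  # first two candidates: 'token', 'corpus'
```

Function words such as ''the'' or ''is'' score low because they are common in both corpora. Words that occur only in the domain texts score high.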
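For a ''diachronic'' corpus, the question "what are the trends?" can be approached by comparing a word's relative frequency across time slices. The minimal sketch below assumes a hypothetical corpus layout of `(year, text)` pairs and treats each year as one slice. It normalizes counts per million tokens, a common convention in corpus linguistics. The function names are illustrative assumptions, not part of any tool referenced on this page.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def frequency_by_period(docs: list[tuple[int, str]], word: str) -> dict[int, float]:
    """Relative frequency of `word` per million tokens in each time slice.
    `docs` is a hypothetical list of (year, text) pairs."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for year, text in docs:
        counts[year].update(tokenize(text))
    result = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        result[year] = counts[year][word] / total * 1_000_000
    return result


def trend(freqs: dict[int, float]) -> float:
    # Difference in relative frequency between the last and the first slice.
    years = sorted(freqs)
    return freqs[years[-1]] - freqs[years[0]]


# Toy diachronic corpus (hypothetical data, for illustration only).
docs = [
    (2000, "people read the newspaper every morning"),
    (2000, "the newspaper arrives early"),
    (2020, "people read the news online"),
    (2020, "the online edition is updated hourly"),
]
freqs = frequency_by_period(docs, "online")
print(freqs)
print(trend(freqs) > 0)  # True: "online" becomes more frequent over time
```

A ''synchronic'' corpus corresponds to the special case of a single time slice, where only the global frequencies are available.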