Changes between Version 1 and Version 2 of en/SentenceLevelTextAnalysis

Timestamp:: Jun 5, 2014, 2:47:34 PM
Author:: xkocinc
Comment:: --

en/SentenceLevelTextAnalysis

-                      v1
+                      v2
 == Simon speaks about sex with Britney Spears ==
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/simon_britney.png)]]
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka1.png)]]
+== Sentence level analysis ==
+''' Natural language syntax '''
+ * describes relationships among words
+''' Automatic syntactic analysis '''
+ * revealing inter-word relationships on various levels
+ * detection of noun (prepositional, verb, ...) phrases, clauses
+* '''| Simon | spoke | about sex | with Britney Spears |'''
+* '''| Simon | spoke | about sex with Britney Spears |'''
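The two segmentations above can be written down as plain data, which makes the ambiguity explicit: the same words, two different phrase boundaries. This is only a minimal sketch; the names `reading_a`, `reading_b`, `render` and `is_valid_segmentation` are made up for illustration and do not come from any of the tools mentioned on this page.

```python
# Two candidate phrase segmentations of the same sentence (taken from the text above).
sentence = "Simon spoke about sex with Britney Spears"

# Reading A: "with Britney Spears" is a separate phrase (it modifies "spoke").
reading_a = ["Simon", "spoke", "about sex", "with Britney Spears"]
# Reading B: "with Britney Spears" belongs inside the "about ..." phrase.
reading_b = ["Simon", "spoke", "about sex with Britney Spears"]


def render(phrases):
    """Format a list of phrases in the | ... | notation used above."""
    return "| " + " | ".join(phrases) + " |"


def is_valid_segmentation(phrases, text):
    """A segmentation is valid if its phrases, joined by spaces, give back the sentence."""
    return " ".join(phrases) == text


for phrases in (reading_a, reading_b):
    assert is_valid_segmentation(phrases, sentence)
    print(render(phrases))
```

Both readings cover the sentence completely; choosing between them is the job of syntactic analysis.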
+== Syntactic trees ==
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/tree1.png)]]
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/tree2.png)]]
+== Why are we doing this? ==
+Syntactic units are carriers of meaning
+ * “in the city”
+ * the meaning of “in” or “the” alone is unclear and complicated
+ * meaning of “in the city” is simply '''where'''
+Words are not enough
+ * '''red brick house''' vs. '''brick house red''' vs. '''red house brick'''
+ * '''Honey, give me love''' vs. '''Love, give me honey'''
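A short sketch of why words alone are not enough: if a text is reduced to a bag of words (word counts without order), the two sentences above become indistinguishable. The helper `bag_of_words` is a hypothetical name used only for this illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, drop punctuation, and count the remaining words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


a = "Honey, give me love"
b = "Love, give me honey"

# Same words with the same counts, so the bags are equal ...
print(bag_of_words(a) == bag_of_words(b))
# ... although the sentences themselves differ.
print(a.lower() == b.lower())
```

Any method that looks only at word counts treats the two sentences as the same; telling them apart needs the relationships between the words.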
+Starting point for intelligent natural language applications
+ * extraction of facts & question answering
+ * logical analysis
+ * punctuation detection & grammar checking
+ * natural text generation
+ * authorship detection
+ * machine translation
+== Example: Extraction of facts ==
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka2.png)]]
+== Example: Logical analysis ==
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka3.png)]]
+== Example: Grammar checking ==
+* Let’s eat grandma!
+   * syntactic analysis
+   * detection of improbable constructions
+   * -> grandma is not a usual object of eating
+   * -> correction suggestion
+* Let’s eat, grandma!
+   * life saved :)
+[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/punctuation.jpg)]]
+Similarly with other grammar phenomena
+   “This is worth try” -> “This is worth try'''ing'''”
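The “Let’s eat grandma!” check can be sketched as follows. This is a toy illustration under strong assumptions: the verb and its object are passed in by hand (in a real system they would come from syntactic analysis), and `USUAL_OBJECTS` is an invented mini-lexicon, not data from any real corpus or tool.

```python
# Hypothetical toy lexicon: words that are usual objects of a given verb.
USUAL_OBJECTS = {"eat": {"lunch", "dinner", "soup", "bread"}}


def check_verb_object(verb, obj):
    """Return a warning if obj is not a usual object of verb, otherwise None."""
    usual = USUAL_OBJECTS.get(verb)
    if usual is None or obj in usual:
        return None
    return f"'{obj}' is not a usual object of '{verb}': missing comma?"


def suggest(sentence, verb, obj):
    """Suggest a comma between verb and obj when the pair looks improbable."""
    if check_verb_object(verb, obj) is None:
        return sentence
    return sentence.replace(f"{verb} {obj}", f"{verb}, {obj}")


print(check_verb_object("eat", "grandma"))
print(suggest("Let's eat grandma!", "eat", "grandma"))
print(suggest("Let's eat dinner!", "eat", "dinner"))
```

A realistic checker would replace the hand-written set with probabilities estimated from a large corpus.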
+== How to analyse natural language syntax? ==
+'''Prerequisites'''
+ * '''word level analysis''' (part of speech, gender, number)
+ * named entity recognition
+ * common sense information (e.g. “pregnant” goes with women only)
+'''Named entity recognition'''
+ * determine that e.g. “prof. Václav Šplíchal” is a person
+ * can be viewed as a sub-task of syntactic analysis
+'''Statistical methods'''
+ * people annotate a corpus
+ * statistical methods learn rules from the corpus
+ * universal across languages (to some extent)
+ * annotation is expensive
+ * hard to customize for different applications
+ * annotated data are usually not large enough
+'''Rule-based methods'''
+ * specialists develop a set of rules (“grammar”)
+ * not universal, depends on specialists
+ * grammar can become difficult to maintain
+ * easy to customize for different applications
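To give a feel for the rule-based approach, here is a minimal sketch of a hand-written “grammar” applied to part-of-speech tags. Everything in it is an assumption made for illustration: the tag set, the one-letter codes, the three rules and the function `chunk` are invented here and are not the rule format of Synt or SET.

```python
import re

# Hand-tagged input (word, tag); in practice the tags come from word level analysis.
tagged = [("Simon", "NOUN"), ("spoke", "VERB"), ("about", "PREP"),
          ("sex", "NOUN"), ("with", "PREP"), ("Britney", "NOUN"), ("Spears", "NOUN")]

# One-letter codes let us write the rules as regular expressions over a tag string.
CODES = {"NOUN": "N", "VERB": "V", "PREP": "P"}

# A toy grammar: (phrase label, pattern). Earlier rules take priority.
RULES = [("PP", re.compile(r"PN+")),  # preposition followed by one or more nouns
         ("NP", re.compile(r"N+")),   # one or more nouns
         ("VP", re.compile(r"V"))]    # a single verb


def chunk(tokens):
    """Scan left to right and group tokens into phrases using the first matching rule."""
    tags = "".join(CODES[tag] for _, tag in tokens)
    phrases, i = [], 0
    while i < len(tags):
        for label, pattern in RULES:
            m = pattern.match(tags, i)
            if m:
                words = " ".join(word for word, _ in tokens[i:m.end()])
                phrases.append((label, words))
                i = m.end()
                break
        else:
            # No rule applies: emit the token on its own and move on.
            phrases.append(("?", tokens[i][0]))
            i += 1
    return phrases


for label, words in chunk(tagged):
    print(label, words)
```

Customizing the analysis for another application means editing `RULES`, which is the main attraction of the rule-based approach; the cost is that a specialist has to write and maintain those rules.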
+'''Hybrids'''
+== Syntactic analysers in the NLP Centre ==
+'''Synt'''
+ * C++, fast (0.07 s/sentence)
+ * based on an expressive meta-grammar
+'''SET'''
+ * Python, slower but easily adaptable
+ * based on a set of phrase patterns
+'''Synt+SET'''
+ * rule-based backbone with statistical extensions
+ * grammars for Czech, English and Slovak
+ * accuracy 85–90 % on newspaper texts
+'''Word Sketches'''
+ * very fast shallow syntax for large corpora
+ * 31 languages
+== Conclusions ==
+Sentence level analysis
+ * detection of phrases and inter-word relationships
+ * their further processing
+Applications
+ * grammar checking
+ * information analysis of text
+ * text generation