= Sentence Level Text Analysis =

== Simon speaks about sex with Britney Spears ==

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/simon_britney.png)]]

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka1.png)]]

== Sentence level analysis ==

''' Natural language syntax '''
 * describes relationships among words

''' Automatic syntactic analysis '''
 * reveals inter-word relationships on various levels
 * detects phrases (noun, prepositional, verb, ...) and clauses

The same sentence can be split into phrases in two ways:
 * '''| Simon | spoke | about sex | with Britney Spears |'''
 * '''| Simon | spoke | about sex with Britney Spears |'''
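
The two readings of this sentence can also be written down as nested structures. The following is only a minimal sketch: the tuple encoding, the phrase labels and the helper names `words` and `phrases` are assumptions made for illustration, not a format used by any particular parser.

```python
# Two analyses of the same word sequence, encoded as nested tuples
# (label, child, child, ...). The encoding is illustrative only.
verb_attachment = (
    "S",
    ("NP", "Simon"),
    ("VP", "spoke",
        ("PP", "about", ("NP", "sex")),
        ("PP", "with", ("NP", "Britney", "Spears"))),
)
noun_attachment = (
    "S",
    ("NP", "Simon"),
    ("VP", "spoke",
        ("PP", "about",
            ("NP", "sex",
                ("PP", "with", ("NP", "Britney", "Spears"))))),
)

def words(tree):
    """Return the words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(words(child))
    return result

def phrases(tree, label):
    """Return the text of every phrase with the given label."""
    if isinstance(tree, str):
        return []
    found = [" ".join(words(tree))] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found

# Same words, different prepositional phrases.
print(words(verb_attachment) == words(noun_attachment))  # True
print(phrases(verb_attachment, "PP"))
print(phrases(noun_attachment, "PP"))
```

Both trees cover exactly the same words; they differ only in whether “with Britney Spears” attaches to the verb or to the noun.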

== Syntactic trees ==

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/tree1.png)]]

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/tree2.png)]]

== Why are we doing this? ==

Syntactic units are carriers of meaning
 * “in the city”
 * the meaning of “in” or “the” alone is unclear and complicated
 * meaning of “in the city” is simply '''where'''

Words are not enough
 * '''red brick house''' vs. '''brick house red''' vs. '''red house brick'''
 * '''Honey, give me love''' vs. '''Love, give me honey'''
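
A quick way to see why words alone are not enough is to compare the sentences as unordered bags of words. This is a minimal sketch; the helper name `bag_of_words` and its simple tokenisation (lower-casing, dropping commas, splitting on spaces) are assumptions made for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Lower-case the text, drop commas, and count the words."""
    return Counter(text.lower().replace(",", "").split())

a = "Honey, give me love"
b = "Love, give me honey"

same_bag = bag_of_words(a) == bag_of_words(b)
same_sentence = a.lower() == b.lower()
print(same_bag, same_sentence)  # True False
```

A model that only sees the bag of words cannot tell the two requests apart; the difference lies entirely in the structure.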

Starting point for intelligent natural language applications
 * extraction of facts & question answering
 * logical analysis
 * punctuation detection & grammar checking
 * natural text generation
 * authorship detection
 * machine translation 

== Example: Extraction of facts ==

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka2.png)]]


== Example: Logical analysis ==

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka3.png)]]


== Example: Grammar checking ==

 * Let’s eat grandma!
   * syntactic analysis
   * detection of improbable constructions
   * -> “grandma” is not a usual object of “eat”
   * -> correction suggestion

 * Let’s eat, grandma!
   * life saved :)
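
The steps above can be sketched in a few lines of Python. Everything here is hypothetical: the counts in `OBJECT_COUNTS` are invented for illustration (a real checker would estimate them from a large parsed corpus), and `check_object` is not part of any existing tool.

```python
# Hypothetical verb-object counts, invented for this example.
OBJECT_COUNTS = {
    ("eat", "dinner"): 120,
    ("eat", "apple"): 95,
    ("eat", "grandma"): 0,
}

def check_object(verb, noun):
    """Return a correction suggestion, or None if the object is plausible."""
    if OBJECT_COUNTS.get((verb, noun), 0) > 0:
        return None
    return f"'{verb} {noun}' is unusual; did you mean '{verb}, {noun}'?"

print(check_object("eat", "dinner"))
print(check_object("eat", "grandma"))
```

The syntactic analysis is what supplies the (verb, object) pair in the first place; the plausibility check only becomes possible once that relationship has been detected.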

[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/punctuation.jpg)]]


The same approach applies to other grammatical phenomena:
   “This is worth try” -> “This is worth try'''ing'''”


== How to analyse natural language syntax? ==

'''Prerequisites'''
 * '''word level analysis''' (part of speech, gender, number)
 * named entity recognition
 * common sense information (e.g. “pregnant” goes with women only)

'''Named entity recognition'''
 * determine that e.g. “prof. Václav Šplíchal” is a person
 * can be viewed as a sub-task of syntactic analysis

'''Statistical methods'''
 * people annotate a corpus
 * statistical methods learn rules from the corpus
 * universal across languages (to some extent)
 * annotation is expensive
 * hard to customize for different applications
 * the annotated data are usually not big enough
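
As a rough illustration of learning from an annotated corpus, the sketch below decides where a prepositional phrase attaches by counting annotated examples. The data in `ANNOTATED` are invented toy annotations, and the `train` and `predict` functions are assumptions made for this example; real statistical parsers use far richer features and far more data.

```python
from collections import Counter, defaultdict

# Invented toy annotations: (preposition, attachment), where "verb" means
# the prepositional phrase modifies the verb and "noun" the preceding noun.
ANNOTATED = [
    ("with", "verb"), ("with", "verb"), ("with", "noun"),
    ("about", "noun"), ("about", "noun"), ("about", "verb"),
    ("of", "noun"), ("of", "noun"),
]

def train(examples):
    """Count how often each preposition was annotated with each attachment."""
    counts = defaultdict(Counter)
    for preposition, attachment in examples:
        counts[preposition][attachment] += 1
    return counts

def predict(counts, preposition, default="noun"):
    """Pick the most frequent attachment seen for this preposition."""
    if preposition not in counts:
        return default  # unseen preposition: fall back to a fixed guess
    return counts[preposition].most_common(1)[0][0]

model = train(ANNOTATED)
print(predict(model, "with"))   # verb
print(predict(model, "of"))     # noun
```

The fallback for unseen prepositions shows the data problem in miniature: whatever is missing from the annotated corpus can only be guessed.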

'''Rule-based methods'''
 * specialists develop a set of rules (“grammar”)
 * not universal, depends on specialists
 * grammar can become hard to maintain
 * easy to customize for different applications
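
To give a feel for what a hand-written rule set looks like, here is a minimal sketch of a phrase chunker whose rules are regular expressions over part-of-speech tags. The tag names, the two rules and the `chunk` function are assumptions made for illustration; this is not the grammar format of any existing analyser.

```python
import re

# A tiny "grammar": each rule is a phrase label plus a regular expression
# over part-of-speech tags (every tag is followed by one space).
RULES = [
    ("PP", re.compile(r"PREP (DET )?(ADJ )*NOUN ")),
    ("NP", re.compile(r"(DET )?(ADJ )*NOUN ")),
]

def chunk(tagged):
    """Greedy left-to-right chunking; returns (label, text) pairs."""
    tags = "".join(tag + " " for _, tag in tagged)
    chunks, pos, index = [], 0, 0  # pos: character offset, index: token offset
    while index < len(tagged):
        label, length = "O", 1  # default: one word outside any phrase
        for rule_label, pattern in RULES:
            match = pattern.match(tags, pos)
            if match:
                # Number of tags matched equals the number of spaces.
                label, length = rule_label, match.group(0).count(" ")
                break
        span = tagged[index:index + length]
        chunks.append((label, " ".join(word for word, _ in span)))
        pos += sum(len(tag) + 1 for _, tag in span)
        index += length
    return chunks

sentence = [("Simon", "NOUN"), ("spoke", "VERB"), ("in", "PREP"),
            ("the", "DET"), ("city", "NOUN")]
print(chunk(sentence))
# [('NP', 'Simon'), ('O', 'spoke'), ('PP', 'in the city')]
```

Adapting such a grammar to a new application means editing the rule list, which is easy for a few rules and increasingly hard as the list grows.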

'''Hybrids'''
 * combine rule-based and statistical approaches



== Syntactic analysers in the NLP Centre ==

'''Synt'''
 * C++, fast (0.07 s/sentence)
 * based on an expressive meta-grammar 

'''SET'''
 * Python, slower but easily adaptable
 * based on a set of phrase patterns

'''Synt+SET'''
 * rule-based backbone with statistical extensions
 * grammars for Czech, English and Slovak
 * accuracy 85–90 % on newspaper texts

'''Word Sketches'''
 * very fast shallow syntax for large corpora
 * 31 languages


== Conclusions ==
Sentence level analysis
 * detection of phrases and inter-word relationships
 * their further processing

Applications
 * grammar checking
 * information analysis of text
 * text generation