Sentence Level Text Analysis
Simon speaks about sex with Britney Spears
Example
Sentence level analysis
Natural language syntax
- describes relationships among words
Automatic syntactic analysis
- revealing inter-word relationships on various levels
- detection of noun (prepositional, verb, ...) phrases, clauses
- | Simon | spoke | about sex | with Britney Spears |
- | Simon | spoke | about sex with Britney Spears |
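The two segmentations above differ in where the phrase "with Britney Spears" attaches. As a minimal sketch, a toy context-free grammar can enumerate both readings. The grammar, the lexicon and the function name `parses` are illustrative assumptions, not the grammar of any analyser mentioned on this page.

```python
from functools import lru_cache

# Toy lexicon (assumption): one category per token, multiword name pre-joined.
LEXICON = {
    "Simon": "NP", "spoke": "V", "about": "P",
    "sex": "NP", "with": "P", "Britney_Spears": "NP",
}

# Toy binary rules (assumption): parent -> left right.
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),   # prepositional phrase attaches to the verb phrase
    ("PP", "P", "NP"),
    ("NP", "NP", "PP"),   # prepositional phrase attaches to the noun phrase
]

def parses(tokens):
    """Return every bracketing of tokens that the toy grammar licenses as S."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def build(i, j, label):
        # Single token: it must carry the requested category in the lexicon.
        if j - i == 1:
            return (tokens[i],) if LEXICON.get(tokens[i]) == label else ()
        found = []
        for parent, left, right in RULES:
            if parent != label:
                continue
            # Try every split point between the two children.
            for k in range(i + 1, j):
                for l_tree in build(i, k, left):
                    for r_tree in build(k, j, right):
                        found.append(f"[{label} {l_tree} {r_tree}]")
        return tuple(found)

    return list(build(0, len(tokens), "S"))

sentence = "Simon spoke about sex with Britney_Spears".split()
for tree in parses(sentence):
    print(tree)
```

The sketch prints one tree in which the "with" phrase modifies "sex" and one in which it modifies "spoke", which is the ambiguity the example sentence was chosen to show.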
Syntactic trees
Why are we doing this?
Syntactic units are carriers of meaning
- “in the city”
- the meaning of “in” or “the” on its own is unclear and complicated
- the meaning of “in the city” is simple: it says where
Words are not enough
- red brick house vs. brick house red vs. red house brick
- Honey, give me love vs. Love, give me honey
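A small sketch of why words alone are not enough: a bag-of-words representation, which ignores order, cannot tell the example sentences apart. The helper name `bag_of_words` is an assumption made for this illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercased word tokens, discarding order and punctuation."""
    return Counter(re.findall(r"\w+", text.lower()))

a = "Honey, give me love"
b = "Love, give me honey"

# Same multiset of words, yet different sentences with different meanings.
print(bag_of_words(a) == bag_of_words(b))
print(a.lower() == b.lower())
```

The same holds for "red brick house", "brick house red" and "red house brick": all three reduce to the same bag, so the difference in meaning has to come from structure.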
Starting point for intelligent natural language applications
- extraction of facts & question answering
- logical analysis
- punctuation detection & grammar checking
- natural text generation
- authorship detection
- machine translation
Example: Extraction of facts
Example: Logical analysis
Example: Grammar checking
- Let’s eat grandma
- syntactic analysis
- detection of non-probable constructions
- -> grandma is not a usual object of eating
- -> correction suggestion
- Let’s eat, grandma
- life saved :)
Similarly with other grammar phenomena
- “This is worth try” -> “This is worth trying”
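The grandma example can be sketched in a few lines. This is a toy under loud assumptions: a regular expression stands in for real syntactic analysis, and the word lists `USUAL_OBJECTS` and `ADDRESSABLE` are invented stand-ins for knowledge that a real checker would derive from a grammar and corpus statistics.

```python
import re

# Invented toy knowledge (assumption): typical objects of a verb,
# and nouns that are commonly used to address a person.
USUAL_OBJECTS = {"eat": {"soup", "bread", "pizza", "apples"}}
ADDRESSABLE = {"grandma", "grandpa", "mum", "dad", "kids"}

def suggest_comma(sentence):
    """Return a corrected sentence, or None when nothing looks suspicious."""
    # Stand-in for parsing: "Let's <verb> <object>" with optional final mark.
    match = re.fullmatch(r"(Let[’']s) (\w+) (\w+)([.!?]?)", sentence)
    if not match:
        return None
    lets, verb, obj, end = match.groups()
    if verb.lower() not in USUAL_OBJECTS:
        return None
    improbable_object = obj.lower() not in USUAL_OBJECTS[verb.lower()]
    if improbable_object and obj.lower() in ADDRESSABLE:
        # The "object" is more plausibly a person being addressed.
        return f"{lets} {verb}, {obj}{end}"
    return None

print(suggest_comma("Let’s eat grandma"))
print(suggest_comma("Let’s eat pizza"))
```

The first call proposes the version with a comma; the second returns `None` because "pizza" is a usual object of eating.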
How to analyse natural language syntax?
Prerequisites
- word level analysis (part of speech, gender, number)
- named entity recognition
- common sense information (e.g. “pregnant” goes with women only)
Named entity recognition
- determine that e.g. “prof. Václav Šplíchal” is a person
- can be viewed as a sub-task of syntactic analysis
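As a rough sketch of the idea, a person name can be guessed from an academic title followed by capitalised words. The title list and the function `find_persons` are assumptions for illustration; real named entity recognition relies on far richer evidence.

```python
# Hypothetical list of lowercase title abbreviations (assumption).
TITLES = {"prof.", "dr.", "doc.", "ing.", "mgr."}
PUNCT = ",.;:!?"

def find_persons(text):
    """Return spans made of a title followed by one or more capitalised words."""
    tokens = text.split()
    persons = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in TITLES:
            j = i + 1
            # Extend the span while the next token looks like a name part.
            while j < len(tokens) and tokens[j].strip(PUNCT).istitle():
                j += 1
                if tokens[j - 1][-1] in PUNCT:
                    break  # punctuation closes the name
            if j > i + 1:
                persons.append(" ".join(tokens[i:j]).rstrip(PUNCT))
                i = j
                continue
        i += 1
    return persons

print(find_persons("We met prof. Václav Šplíchal yesterday."))
```

Because the match spans several tokens and yields one unit with a type, the result is in effect a small phrase, which is why the task can be seen as part of syntactic analysis.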
Statistical methods
- people annotate corpus
- statistical methods learn rules from the annotated corpus
- universal across languages (to some extent)
- annotation is expensive
- hard to customize for different applications
- data are usually not big enough
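A minimal sketch of the statistical approach, assuming an invented toy corpus in which annotators marked whether a prepositional phrase attaches to the verb or to the noun. The data, the names `train` and `predict`, and the per-preposition counting model are illustrative assumptions only.

```python
from collections import Counter, defaultdict

# Invented toy annotations (assumption):
# (verb, noun, preposition, attachment site chosen by the annotator).
ANNOTATED = [
    ("ate", "pizza", "with", "verb"),   # ate ... [with a fork]
    ("ate", "pizza", "with", "noun"),   # pizza [with olives]
    ("saw", "man", "with", "verb"),
    ("read", "book", "of", "noun"),
    ("met", "friend", "of", "noun"),
    ("went", "home", "in", "verb"),
]

def train(examples):
    """Count how often each preposition attaches to the verb or the noun."""
    counts = defaultdict(Counter)
    for _verb, _noun, prep, site in examples:
        counts[prep][site] += 1
    return counts

def predict(counts, prep, default="noun"):
    """Pick the most frequent attachment site seen for this preposition."""
    if prep not in counts:
        return default  # unseen preposition: the corpus was too small
    return counts[prep].most_common(1)[0][0]

model = train(ANNOTATED)
print(predict(model, "with"))
print(predict(model, "of"))
print(predict(model, "under"))
```

The last call shows the data problem listed above: for a preposition that never occurred in the annotated corpus, the model can only fall back to a default.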
Rule-based methods
- specialists develop a set of rules (“grammar”)
- not universal, depends on specialists
- grammar can become hard to maintain
- easy to customize for different applications
Hybrids
Syntactic analysers in the NLP Centre
Synt
- C++, fast (0.07 s/sentence)
- based on an expressive meta-grammar
SET
- Python, slower but easily adaptable
- based on a set of phrase patterns
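To give a feel for analysis driven by phrase patterns, here is a generic sketch that matches patterns over part-of-speech tags. It does not reproduce the rule format or behaviour of SET; the one-letter tag set, the two patterns and the function `chunk` are assumptions made for this illustration.

```python
import re

# Hypothetical one-letter tags (assumption):
# D = determiner, A = adjective, N = noun, P = preposition, V = verb.
# Patterns are tried in order, so the longer PP pattern comes first.
PATTERNS = [
    ("PP", re.compile(r"PD?A*N+")),
    ("NP", re.compile(r"D?A*N+")),
]

def chunk(tagged):
    """Group (word, tag) pairs into phrases wherever a pattern matches."""
    tags = "".join(tag for _word, tag in tagged)
    chunks = []
    i = 0
    while i < len(tagged):
        for label, pattern in PATTERNS:
            m = pattern.match(tags, i)
            if m:
                words = " ".join(word for word, _tag in tagged[i:m.end()])
                chunks.append((label, words))
                i = m.end()
                break
        else:
            # No pattern starts here: keep the single word with its tag.
            chunks.append((tagged[i][1], tagged[i][0]))
            i += 1
    return chunks

tagged = [("Simon", "N"), ("spoke", "V"), ("in", "P"), ("the", "D"), ("city", "N")]
print(chunk(tagged))
```

Adapting such an analyser to a new application amounts to editing the pattern list, which illustrates why a pattern-based design is easy to adapt.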
Synt+SET
- rule-based backbone with statistical extensions
- grammars for Czech, English and Slovak
- accuracy 85–90 % on newspaper texts
Word Sketches
- very fast shallow syntax for large corpora
- 31 languages
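The idea of shallow syntax over a corpus can be sketched as counting which adjectives directly precede a given noun in tagged text. This is a toy approximation and not the Word Sketches implementation; the tag names, the tiny corpus and the function `modifiers` are assumptions.

```python
from collections import Counter

def modifiers(tagged_corpus, noun):
    """Count adjectives that immediately precede the given noun."""
    counts = Counter()
    for sentence in tagged_corpus:
        # Look at each pair of neighbouring (word, tag) tokens.
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 == "ADJ" and t2 == "NOUN" and w2.lower() == noun:
                counts[w1.lower()] += 1
    return counts

# Invented toy corpus (assumption), already tagged.
corpus = [
    [("A", "DET"), ("red", "ADJ"), ("house", "NOUN"), ("stood", "VERB")],
    [("The", "DET"), ("old", "ADJ"), ("house", "NOUN"), ("fell", "VERB")],
    [("Red", "ADJ"), ("house", "NOUN"), ("again", "ADV")],
]
print(modifiers(corpus, "house").most_common())
```

A single pass over neighbouring tokens is cheap, which is what makes this kind of shallow analysis feasible on very large corpora.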
Conclusions
Sentence level analysis
- detection of phrases and inter-word relationships
- their further processing
Applications
- grammar checking
- information analysis of text
- text generation