Changes between Version 1 and Version 2 of en/SentenceLevelTextAnalysis


Timestamp:
Jun 5, 2014, 2:47:34 PM (7 years ago)
Author:
xkocinc
Comment:

--

  • en/SentenceLevelTextAnalysis

    v1 v2  
    33== Simon speaks about sex with Britney Spears ==
    44
     5[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/simon_britney.png)]]
     6
     7[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka1.png)]]
     8
     9== Sentence level analysis ==
     10
     11''' Natural language syntax '''
     12 * describes relationships among words
     13
     14''' Automatic syntactic analysis '''
     15 * revealing inter-word relationships on various levels
     16 * detection of noun (prepositional, verb, ...) phrases, clauses
     17
     18* '''| Simon | spoke | about sex | with Britney Spears |'''
     19* '''| Simon | spoke | about sex with Britney Spears |'''
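The two segmentations above can be reproduced with a very small rule-based chunker. This is only a sketch: the part-of-speech tags are assigned by hand, and the `chunk` function and its `attach_pp_to_noun` switch are hypothetical names invented for this illustration, not part of any analyser mentioned on this page.

```python
# Hand-assigned, simplified part-of-speech tags (an assumption for this sketch).
TAGGED = [
    ("Simon", "NOUN"), ("spoke", "VERB"), ("about", "PREP"), ("sex", "NOUN"),
    ("with", "PREP"), ("Britney", "NOUN"), ("Spears", "NOUN"),
]


def chunk(tagged, attach_pp_to_noun=False):
    """Group tagged words into flat phrases.

    A noun joins the current phrase when it follows a noun or a preposition;
    every other word opens a new phrase. With attach_pp_to_noun=True, a
    prepositional phrase directly following another prepositional phrase
    is merged into it.
    """
    phrases = []  # each entry is [kind, [words]]
    prev_tag = None
    for word, tag in tagged:
        if phrases and tag == "NOUN" and prev_tag in ("NOUN", "PREP"):
            phrases[-1][1].append(word)
        else:
            kind = {"PREP": "PP", "VERB": "VP"}.get(tag, "NP")
            phrases.append([kind, [word]])
        prev_tag = tag

    if attach_pp_to_noun:
        merged = []
        for kind, words in phrases:
            if merged and kind == "PP" and merged[-1][0] == "PP":
                merged[-1][1].extend(words)
            else:
                merged.append([kind, list(words)])
        phrases = merged

    return [" ".join(words) for _, words in phrases]


flat = chunk(TAGGED)
nested = chunk(TAGGED, attach_pp_to_noun=True)
print("| " + " | ".join(flat) + " |")
print("| " + " | ".join(nested) + " |")
```

Both outputs are valid phrase groupings of the same sentence; choosing between them is the attachment decision that a syntactic analyser has to make.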
     20
     21== Syntactic trees ==
     22
     23[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/tree1.png)]]
     24
     25[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/tree2.png)]]
     26
     27== Why are we doing this? ==
     28
     29Syntactic units are carriers of meaning
     30 * “in the city”
     31 * the meaning of “in” or “the” on its own is unclear and complicated
     32 * meaning of “in the city” is simply '''where'''
     33
     34Words are not enough
     35 * '''red brick house''' vs. '''brick house red''' vs. '''red house brick'''
     36 * '''Honey, give me love''' vs. '''Love, give me honey'''
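A minimal sketch of why word counts alone cannot tell these examples apart: once word order is discarded, each group collapses to the same bag of words. The helper name `bag_of_words` is an assumption made for this illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, drop commas, and count words, ignoring their order."""
    return Counter(text.lower().replace(",", "").split())


phrases = ["red brick house", "brick house red", "red house brick"]
bags = [bag_of_words(p) for p in phrases]

# All three phrases yield the same counts, although they mean different things.
all_same = all(bag == bags[0] for bag in bags)

# The same holds for the two sentences with swapped roles.
same_sentence_bag = (
    bag_of_words("Honey, give me love") == bag_of_words("Love, give me honey")
)

print(all_same, same_sentence_bag)
```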
     37
     38Starting point for intelligent natural language applications
     39 * extraction of facts & question answering
     40 * logical analysis
     41 * punctuation detection & grammar checking
     42 * natural text generation
     43 * authorship detection
     44 * machine translation
     45
     46== Example: Extraction of facts ==
     47
     48[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka2.png)]]
     49
     50
     51== Example: Logical analysis ==
     52
     53[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/ukazka3.png)]]
     54
     55
     56== Example: Grammar checking ==
     57
     58* Let’s eat grandma!
     59   * syntactic analysis
     60   * detection of improbable constructions
     61   * -> grandma is not a usual object of eating
     62   * -> correction suggestion
     63
     64* Let’s eat, grandma!
     65   * life saved :)
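The steps above can be sketched as a plausibility check on verb-object pairs. The counts below are made up for illustration (a real system would take them from a parsed corpus), and `OBJECT_COUNTS`, `is_probable_object` and `suggest` are hypothetical names, not an existing API.

```python
# Made-up verb-object counts standing in for statistics from a parsed corpus.
OBJECT_COUNTS = {
    ("eat", "dinner"): 120,
    ("eat", "soup"): 45,
    ("eat", "grandma"): 0,
}


def is_probable_object(verb, noun, min_count=1):
    """Return True when the noun was seen often enough as an object of the verb."""
    return OBJECT_COUNTS.get((verb, noun), 0) >= min_count


def suggest(verb, noun):
    """Keep the sentence if the object is probable; otherwise propose a comma.

    The comma turns the noun from an object into an address (vocative).
    """
    if is_probable_object(verb, noun):
        return f"Let's {verb} {noun}!"
    return f"Let's {verb}, {noun}!"


print(suggest("eat", "grandma"))
print(suggest("eat", "soup"))
```

The threshold `min_count` is an arbitrary choice here; in practice it would depend on corpus size and on how aggressive the checker is meant to be.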
     66
     67[[Image(/trac/research/raw-attachment/wiki/en/SentenceLevelTextAnalysis/punctuation.jpg)]]
     68
     69
     70Similarly for other grammatical phenomena:
     71   “This is worth try” -> “This is worth try'''ing'''”
     72
     73
     74== How to analyse natural language syntax? ==
     75
     76'''Prerequisites'''
     77 * '''word level analysis''' (part of speech, gender, number)
     78 * named entity recognition
     79 * common sense information (e.g. “pregnant” goes with women only)
     80
     81'''Named entity recognition'''
     82 * determine that e.g. “prof. Václav Šplíchal” is a person
     83 * can be viewed as a sub-task of syntactic analysis
     84
     85'''Statistical methods'''
     86 * people annotate a corpus
     87 * statistical methods learn rules from the corpus
     88 * universal across languages (to some extent)
     89 * annotation is expensive
     90 * hard to customize for different applications
     91 * annotated data sets are usually too small
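As a toy illustration of the statistical approach, the sketch below "learns" which phrase label goes with which sequence of part-of-speech tags by majority vote over annotated examples. The five-item corpus is invented and far too small for real use, and `learn_rules` is a hypothetical helper, not a method of any particular toolkit.

```python
from collections import Counter, defaultdict

# Tiny invented "annotated corpus": (POS-tag sequence, phrase label) pairs.
ANNOTATED = [
    (("PREP", "DET", "NOUN"), "PP"),
    (("PREP", "NOUN"), "PP"),
    (("DET", "NOUN"), "NP"),
    (("DET", "ADJ", "NOUN"), "NP"),
    (("PREP", "DET", "NOUN"), "PP"),
]


def learn_rules(annotated):
    """Map each tag sequence to the label the annotators gave it most often."""
    votes = defaultdict(Counter)
    for tags, label in annotated:
        votes[tags][label] += 1
    return {tags: counts.most_common(1)[0][0] for tags, counts in votes.items()}


rules = learn_rules(ANNOTATED)

print(rules[("PREP", "DET", "NOUN")])        # label learned for "in the city"
print(rules.get(("PREP", "ADJ", "NOUN")))    # unseen pattern: nothing learned
```

The unseen pattern shows the data problem in miniature: anything the annotators did not cover yields no rule at all.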
     92
     93'''Rule-based methods'''
     94 * specialists develop a set of rules (“grammar”)
     95 * not universal, depends on specialists
     96 * grammar can become difficult to maintain
     97 * easy to customize for different applications
     98
     99'''Hybrids'''
     100
     101
     102
     103== Syntactic analysers in the NLP Centre ==
     104
     105'''Synt'''
     106 * C++, fast (0.07 s/sentence)
     107 * based on an expressive meta-grammar
     108
     109'''SET'''
     110 * Python, slower but easily adaptable
     111 * based on a set of phrase patterns
     112
     113'''Synt+SET'''
     114 * rule-based backbone with statistical extensions
     115 * grammars for Czech, English and Slovak
     116 * accuracy 85–90 % on newspaper texts
     117
     118'''Word Sketches'''
     119 * very fast shallow syntax for large corpora
     120 * 31 languages
     121
     122
     123== Conclusions ==
     124Sentence level analysis
     125 * detection of phrases and inter-word relationships
     126 * their further processing
     127
     128Applications
     129 * grammar checking
     130 * information analysis of text
     131 * text generation