Version 7 (modified by xkocinc, 10 years ago)


Sentence Level Text Analysis

Simon speaks about sex with Britney Spears



Sentence level analysis

Natural language syntax

  • describes relationships among words

Automatic syntactic analysis

  • revealing inter-word relationships on various levels
  • detection of phrases (noun, prepositional, verb, ...) and clauses
  • | Simon | spoke | about sex | with Britney Spears |
  • | Simon | spoke | about sex with Britney Spears |
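The two bracketings above correspond to two different syntactic trees over the same words. A minimal sketch of that idea follows, with trees written as nested tuples; the labels (S, NP, VP, PP, V) and the helper names are assumptions made for illustration, not the output format of any particular analyser.

```python
# A tree is a tuple (label, child, child, ...); a leaf is a plain string.
# Reading 1: "with Britney Spears" attaches to the verb (who Simon spoke with).
verb_attachment = (
    "S",
    ("NP", "Simon"),
    ("VP",
        ("V", "spoke"),
        ("PP", "about", ("NP", "sex")),
        ("PP", "with", ("NP", "Britney", "Spears"))),
)
# Reading 2: "with Britney Spears" attaches to the noun "sex".
noun_attachment = (
    "S",
    ("NP", "Simon"),
    ("VP",
        ("V", "spoke"),
        ("PP", "about",
            ("NP", "sex", ("PP", "with", ("NP", "Britney", "Spears"))))),
)

def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def phrases(tree, label):
    """Return the text of every subtree carrying the given label."""
    if isinstance(tree, str):
        return []
    found = [" ".join(leaves(tree))] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found

print(phrases(verb_attachment, "PP"))  # ['about sex', 'with Britney Spears']
print(phrases(noun_attachment, "PP"))  # ['about sex with Britney Spears', 'with Britney Spears']
```

Both trees yield the same word sequence; only the phrase structure, and therefore the meaning, differs.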

Syntactic trees



Why are we doing this?

Syntactic units are carriers of meaning

  • “in the city”
  • the meaning of “in” or “the” in isolation is unclear and complicated
  • the meaning of “in the city” is simply a place (it answers “where?”)

Words are not enough

  • red brick house vs. brick house red vs. red house brick
  • Honey, give me love vs. Love, give me honey
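The point can be checked mechanically: a representation that only counts words (a "bag of words") cannot tell the two sentences apart. The sketch below uses a deliberately simple tokenizer, which is an assumption for illustration only.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercased words, stripping edge punctuation and ignoring order."""
    return Counter(word.strip(",.!?").lower() for word in text.split())

a = "Honey, give me love"
b = "Love, give me honey"

same_bag = bag_of_words(a) == bag_of_words(b)   # True: identical word counts
same_sentence = a.lower() == b.lower()          # False: the order differs
print(same_bag, same_sentence)
```

Word order and structure carry the difference in meaning, which is what syntactic analysis is meant to recover.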

Starting point for intelligent natural language applications

  • extraction of facts & question answering
  • logical analysis
  • punctuation detection & grammar checking
  • natural text generation
  • authorship detection
  • machine translation

Example: Extraction of facts


Example: Logical analysis


Example: Grammar checking


  • Let’s eat grandma
    • syntactic analysis
    • detection of improbable constructions
    • -> “grandma” is not a usual object of “eat”
    • -> correction suggestion
  • Let’s eat, grandma
    • life saved :)
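The checking steps above can be sketched as a toy program. Everything here is a simplifying assumption: the lexicon `TYPICAL_OBJECTS` is hypothetical and hand-made (a real checker would estimate typical verb objects from a parsed corpus), and the function only handles the three-word pattern "Let's <verb> <noun>".

```python
# Hypothetical toy lexicon: nouns that commonly occur as objects of a verb.
TYPICAL_OBJECTS = {"eat": {"lunch", "dinner", "pizza", "soup"}}

def suggest_comma(sentence):
    """Suggest a vocative comma when the noun is an unlikely object of the verb.

    Sentences outside the "Let's <verb> <noun>" pattern are returned unchanged.
    """
    words = sentence.split()
    if len(words) != 3 or words[0].lower() not in ("let's", "let’s"):
        return sentence
    verb, noun = words[1].lower(), words[2].lower()
    if verb in TYPICAL_OBJECTS and noun not in TYPICAL_OBJECTS[verb]:
        # Improbable verb-object pair: read the noun as an address instead.
        return f"{words[0]} {words[1]}, {words[2]}"
    return sentence

print(suggest_comma("Let's eat grandma"))  # Let's eat, grandma
print(suggest_comma("Let's eat pizza"))    # Let's eat pizza
```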

Similarly with other grammar phenomena

“This is worth try” -> “This is worth trying”

How to analyse natural language syntax?


  • word level analysis (part of speech, gender, number)
  • named entity recognition
  • common sense information (e.g. “pregnant” goes with women only)

Named entity recognition

  • determine that e.g. “prof. Václav Šplíchal” is a person
  • can be viewed as a sub-task of syntactic analysis
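As a rough illustration of the task, the sketch below marks a person wherever an academic title is followed by capitalized words. The title list `TITLES` and the function name are hypothetical, and the rule is far too crude for real text; practical recognizers use gazetteers, context and statistical models.

```python
TITLES = {"prof.", "dr.", "ing."}  # hypothetical and far from complete

def find_persons(text):
    """Return spans of the form '<title> <Capitalized> <Capitalized> ...'."""
    tokens = text.split()
    persons = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in TITLES:
            j = i + 1
            # Extend the span over following capitalized tokens.
            while j < len(tokens) and tokens[j][0].isupper():
                j += 1
                if tokens[j - 1][-1] in ".,;:!?":
                    break  # punctuation ends the name
            if j > i + 1:
                persons.append(" ".join(tokens[i:j]).rstrip(".,;:!?"))
                i = j
                continue
        i += 1
    return persons

print(find_persons("The lecture was given by prof. Václav Šplíchal yesterday."))
# ['prof. Václav Šplíchal']
```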

Statistical methods

  • people annotate a corpus
  • statistical methods learn rules from the corpus
  • universal across languages (to some extent)
  • annotation is expensive
  • hard to customize for different applications
  • data are usually not big enough
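A minimal sketch of the statistical approach, shown on word-level tagging rather than full syntax to keep it short: the program learns each word's most frequent tag from an annotated corpus. The three-sentence corpus and the tag names are made up for illustration; real corpora contain many thousands of annotated sentences.

```python
from collections import Counter, defaultdict

# Tiny hand-made annotated corpus of (word, tag) pairs; purely illustrative.
ANNOTATED = [
    [("the", "DET"), ("house", "NOUN"), ("is", "VERB"), ("red", "ADJ")],
    [("they", "PRON"), ("house", "VERB"), ("the", "DET"), ("guests", "NOUN")],
    [("a", "DET"), ("red", "ADJ"), ("brick", "NOUN"), ("house", "NOUN")],
]

def train(corpus):
    """Learn the most frequent tag of each word from annotated sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="NOUN"):
    """Tag words with the learned model; unseen words get the default tag."""
    return [(word, model.get(word, default)) for word in words]

model = train(ANNOTATED)
print(tag(model, ["the", "red", "house"]))
# [('the', 'DET'), ('red', 'ADJ'), ('house', 'NOUN')]
```

Nothing in the code is specific to one language, which is why such methods transfer well; the cost lies in producing the annotated data, and the model cannot handle what the data does not cover.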

Rule-based methods

  • specialists develop a set of rules (“grammar”)
  • not universal, depends on specialists
  • grammar can become hard to maintain
  • easy to customize for different applications
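For contrast, a rule-based analyser can be sketched as a handful of hand-written phrase patterns over part-of-speech tags. The tag names, the one-letter codes and the two rules below are assumptions chosen for illustration; they are not taken from any grammar mentioned in this document.

```python
import re

# Hypothetical one-letter codes for part-of-speech tags.
CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "VERB": "V", "PREP": "P"}

# The hand-written "grammar": phrase label -> pattern over tag codes.
RULES = {
    "NP": re.compile(r"D?A*N+"),    # optional determiner, adjectives, nouns
    "PP": re.compile(r"PD?A*N+"),   # preposition followed by a noun phrase
}

def find_phrases(tagged, label):
    """Return the text of all non-overlapping matches of one phrase rule."""
    # One character per token, so match offsets equal token indices.
    codes = "".join(CODES.get(tag, "X") for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in RULES[label].finditer(codes)
    ]

sentence = [("a", "DET"), ("red", "ADJ"), ("brick", "NOUN"), ("house", "NOUN"),
            ("stood", "VERB"), ("in", "PREP"), ("the", "DET"), ("city", "NOUN")]
print(find_phrases(sentence, "NP"))  # ['a red brick house', 'the city']
print(find_phrases(sentence, "PP"))  # ['in the city']
```

Adapting the analyser to a new application means editing `RULES`, which is easy for a small grammar and becomes harder as the rule set grows.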


Syntactic analysers in the NLP Centre


  • C++, fast (0.07 s/sentence)
  • based on an expressive meta-grammar


  • Python, slower but easily adaptable
  • based on a set of phrase patterns


  • rule-based backbone with statistical extensions
  • grammars for Czech, English and Slovak
  • accuracy 85–90 % on newspaper texts

Word Sketches

  • very fast shallow syntax for large corpora
  • 31 languages


Sentence level analysis

  • detection of phrases and inter-word relationships
  • their further processing


  • grammar checking
  • information analysis of text
  • text generation
