
Text Characteristics

Keyword extraction

/trac/research/raw-attachment/wiki/en/TextCharacteristics/example.png

Definition: Words used to characterise the contents of a document.

Method: Select words that appear with statistically unusual frequency in a text.
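
As an illustration of the method, here is a minimal sketch that ranks the words of a document by how much more frequent they are in it than in a reference text. The scoring formula (a smoothed ratio of per-thousand frequencies), the function names `tokenize` and `keywords`, the `smoothing` parameter, and the two sample texts are all assumptions made for this example; the page does not specify which statistic is used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keywords(document, reference, top_n=5, smoothing=1.0):
    """Rank words of `document` by a smoothed frequency ratio against `reference`.

    Assumption: "statistically unusual frequency" is approximated here by
    (doc_rate + smoothing) / (ref_rate + smoothing), with rates per 1000 tokens.
    """
    doc_counts = Counter(tokenize(document))
    ref_counts = Counter(tokenize(reference))
    doc_total = sum(doc_counts.values())
    # Avoid division by zero when the reference text is empty.
    ref_total = sum(ref_counts.values()) or 1

    scores = {}
    for word, count in doc_counts.items():
        doc_rate = 1000 * count / doc_total
        ref_rate = 1000 * ref_counts[word] / ref_total
        scores[word] = (doc_rate + smoothing) / (ref_rate + smoothing)

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical sample texts, chosen only to exercise the function.
reference_text = "the cat sat on the mat and the dog sat on the rug"
document_text = "the parser reads the corpus and the parser tags the corpus"

for word, score in keywords(document_text, reference_text, top_n=3):
    print(f"{word}\t{score:.2f}")
```

Words common to both texts, such as "the", score close to 1 and drop to the bottom of the ranking, while words that are frequent in the document but absent from the reference rise to the top. In practice the reference would be a large general corpus rather than a single sentence.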

Applications

  • Text classification (topic, spam)
  • Search Engine Optimisation (SEO)
  • Text filtering (job advertising, RSS)
  • Text summarization
  • Text clustering and reorganization

/trac/research/raw-attachment/wiki/en/TextCharacteristics/seo.png

Communication Pattern Analysis

/trac/research/raw-attachment/wiki/en/TextCharacteristics/text_characteristics.png

Motivation

  • Analysis of personality traits based on the author’s verbal style
  • Optimisation of communication strategies
  • Behaviour prediction

/trac/research/raw-attachment/wiki/en/TextCharacteristics/applications.png

Author’s traits

/trac/research/raw-attachment/wiki/en/TextCharacteristics/vocabulary.png

Problem Definition

/trac/research/raw-attachment/wiki/en/TextCharacteristics/auth_ver.png

/trac/research/raw-attachment/wiki/en/TextCharacteristics/auth_att.png

/trac/research/raw-attachment/wiki/en/TextCharacteristics/auth_clus.png

Author Writeprint/Stylome

/trac/research/raw-attachment/wiki/en/TextCharacteristics/collection.png

Authorship Verification

/trac/research/raw-attachment/wiki/en/TextCharacteristics/stylometry.png

Machine learning approach

/trac/research/raw-attachment/wiki/en/TextCharacteristics/simML.png

Accuracy

/trac/research/raw-attachment/wiki/en/TextCharacteristics/verification.png

Conclusions

Keyword Extraction: A brief representation of the content of a document.

Communication Pattern Analysis: An analysis of personality traits.

Authorship Recognition: Uncovering the authorship of anonymous texts.