Version 1 (modified by Zuzana Nevěřilová, 9 years ago)


Semantic Analysis

Presentation outline

Will computers ever understand us? Understanding of understanding

Aims: detection of inappropriate, naughty, vulgar, or silly posts

Use case: discussion forum, automatic detection of inappropriate posts
Common solution: word list
But: users write obfuscated words that are difficult to detect (f*king, f.u.c.k, f..k)
Better solution: word list + obfuscation rules
But: users invent new obfuscation patterns
Even better solution: word list + automatically generated thesaurus + obfuscation rules + naughty language patterns (e.g. you <adjective> <noun>!!!)
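
A minimal sketch of the "word list + obfuscation rules" step, assuming two simple rules: an inner letter may be replaced by a mask character, and separator characters may be inserted between letters. The word list entries, the MASK and SEP character sets, and the function names are hypothetical placeholders; the thesaurus and the naughty language patterns are not covered here.

```python
import re

# Hypothetical placeholder entries; a real forum would load a curated list.
WORD_LIST = ["darn", "heck"]

MASK = r"[*.#@_\-]"  # assumed: characters that stand in for a hidden letter
SEP = r"[.\-_ ]*"    # assumed: characters inserted between letters

def obfuscation_regex(word):
    """Build a regex matching the word and simple obfuscations of it."""
    first, *inner, last = word
    parts = [re.escape(first)]
    # Each inner letter may appear as itself or as a single mask character.
    for ch in inner:
        parts.append(f"(?:{re.escape(ch)}|{MASK})")
    parts.append(re.escape(last))
    # Optional separators between letters; no word characters on either side.
    return re.compile(r"(?<!\w)" + SEP.join(parts) + r"(?!\w)", re.IGNORECASE)

PATTERNS = {w: obfuscation_regex(w) for w in WORD_LIST}

def flagged_words(post):
    """Return the word-list entries detected in the post, sorted."""
    return sorted(w for w, p in PATTERNS.items() if p.search(post))
```

With these rules, "d*rn", "d..n" and "h.e.c.k" are flagged, while a pattern the rules did not anticipate, such as "d4rn", gets through. That is the weakness the outline points at: every newly invented obfuscation needs a new rule.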

Aims: text summarization

Use case: automatic abstract generation, multiple document digest, are these documents stating similar or opposite theses?
Common solution: take every first sentence in a paragraph or take every sentence containing a keyword
But: works worse on Slavic languages, does not scale well, and makes it almost impossible to detect the main thesis
Better solution: analyse text on several levels

  • as a whole discourse (sections, paragraphs, references)
  • as a sequence of sentences (each having a structure)
  • as a bag of words and keywords (in different forms, synonyms, abbreviations etc.)
  • main theses detection
  • text generation
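
For comparison with the multi-level analysis, the common solution mentioned above (first sentence of each paragraph, or every sentence containing a keyword) can be sketched in a few lines. This is only an illustration under simplifying assumptions: paragraphs are separated by blank lines, sentences end with ".", "!" or "?", and the function names are hypothetical.

```python
import re

def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", paragraph)
    return [s.strip() for s in pieces if s.strip()]

def paragraphs_of(text):
    """Assume paragraphs are separated by blank lines."""
    return [p for p in text.split("\n\n") if p.strip()]

def first_sentences(text):
    """Baseline 1: take the first sentence of every paragraph."""
    return [split_sentences(p)[0] for p in paragraphs_of(text)]

def keyword_sentences(text, keyword):
    """Baseline 2: take every sentence containing the keyword."""
    kw = keyword.lower()
    return [s for p in paragraphs_of(text)
            for s in split_sentences(p) if kw in s.lower()]
```

The keyword baseline matches exact strings only, so inflected forms of a keyword are missed. This is one plausible reason why such baselines do worse on heavily inflected languages.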

Aims: opinion mining

(this part may be replaced by content targeting)

Use case: what are people thinking about a particular product/company/idea X?
Solution: search X
But: what other names are people giving to X? what are people saying about X?
Better solution:

  • find synonyms for X
  • extract useful attributes of X (noise, weight, price, appearance)
  • generate thesauri of opinion words (weird rattle in iPhone5?)
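
The three steps above can be sketched as follows, assuming the synonyms and the opinion thesaurus already exist as small hand-written dictionaries. In the outline they would be generated automatically; here ALIASES, ATTRIBUTES and the function name are hypothetical stand-ins.

```python
import re
from collections import defaultdict

# Hypothetical synonyms for X (here: one product).
ALIASES = {"iphone 5", "iphone5"}

# Hypothetical thesaurus: attribute of X -> cue words signalling it.
ATTRIBUTES = {
    "noise": {"noise", "rattle", "loud"},
    "weight": {"weight", "heavy"},
    "price": {"price", "expensive", "cheap"},
    "appearance": {"appearance", "ugly", "pretty"},
}

def mentions_by_attribute(posts):
    """Group posts that mention X by the attribute they talk about."""
    result = defaultdict(list)
    for post in posts:
        text = post.lower()
        # Keep only posts that mention X under any known name.
        if not any(alias in text for alias in ALIASES):
            continue
        tokens = set(re.findall(r"[a-z0-9]+", text))
        for attribute, cues in ATTRIBUTES.items():
            if tokens & cues:
                result[attribute].append(post)
    return dict(result)
```

A post such as "Weird rattle in iPhone5?" ends up under "noise" even though it contains neither the official product name nor the word "noise".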

Aims: question answering

Use case: chatbot providing basic support (do you have a phone similar to Sony Xperia Z but cheaper? what is the shipping cost?)
Solution: patterns, keyword detection (Sony Xperia Z, shipping), then searching
But: no real dialogue, no real answers, just searching
Better solution: sentence structure analysis, keyword detection, coreference resolution, dialogue strategy
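
A rough sketch of the simple pattern and keyword solution, to show where it stops working. The product catalogue, the intent names and the regular expressions are hypothetical; a real chatbot would pass the result on to a search backend.

```python
import re

# Hypothetical product catalogue used for keyword detection.
PRODUCTS = ["Sony Xperia Z"]

# Hypothetical intent patterns.
INTENT_PATTERNS = {
    "shipping_cost": re.compile(r"\bshipping\b", re.IGNORECASE),
    "similar_cheaper": re.compile(r"\bsimilar to\b.*\bcheaper\b", re.IGNORECASE),
}

def parse_question(question):
    """Detect intents and product keywords in a single question."""
    intents = [name for name, pattern in INTENT_PATTERNS.items()
               if pattern.search(question)]
    products = [p for p in PRODUCTS if p.lower() in question.lower()]
    return {"intents": intents, "products": products}
```

Each question is handled in isolation. A follow-up such as "And how much is it?" yields no intent and no product, because resolving "it" needs coreference resolution and some memory of the dialogue.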

Is this real understanding? Will computers understand us? No. We don’t know what understanding is, but we know what it looks like when someone understands. Computer programs that can discover a vulgar text, summarize a text, answer questions, or “feel” emotions look like they understand our language... (in fact, this is a behaviorist approach).