= Semantic Analysis =

== Presentation outline ==

== Will computers ever understand us? Understanding of ''understanding'' ==

=== Aims: inappropriate, naughty, vulgar, silly posts detection ===
Use case: discussion forum, automatic detection of inappropriate posts[[BR]]
Common solution: word list[[BR]]
But: users obfuscate words to make them difficult to detect (f*king, f.u.c.k, f..k)[[BR]]
Better solution: word list + obfuscation rules[[BR]]
But: users invent new obfuscation patterns[[BR]]
Even better solution: word list + automatically generated thesaurus + obfuscation rules + naughty language patterns (e.g. "you !!!")

=== Aims: text summarization ===
Use case: automatic abstract generation, multi-document digests, checking whether several documents state similar or opposite theses[[BR]]
Common solution: take the first sentence of every paragraph, or take every sentence containing a keyword[[BR]]
But: works worse on Slavic languages, does not scale well, and makes it almost impossible to detect the main thesis[[BR]]
Better solution: analyse the text on several levels
 * as a whole discourse (sections, paragraphs, references)
 * as a sequence of sentences (each having a structure)
 * as a bag of words and keywords (in different forms, synonyms, abbreviations etc.)
 * main thesis detection
 * text generation

=== Aims: opinion mining ===
(this part may be replaced by ''content targeting'')

Use case: what do people think about a particular product/company/idea X?[[BR]]
Solution: search for X[[BR]]
But: what other names are people giving to X? What are people saying about X?[[BR]]
Better solution:
 * find synonyms for X
 * extract useful attributes of X (noise, weight, price, appearance)
 * generate thesauri of opinion words (e.g. a "weird rattle" in the iPhone5?)

=== Aims: question answering ===
Use case: chatbot providing basic support ("Do you have a phone similar to the Sony Xperia Z but cheaper? What is the shipping cost?")[[BR]]
Solution: patterns and keyword detection (Sony Xperia Z, shipping), then searching[[BR]]
But: no real dialogue and no real answers, just searching[[BR]]
Better solution: sentence structure analysis, keyword detection, coreference resolution, dialogue strategy

Is this real understanding? Will computers understand us? No. We don’t know what understanding is, but we know what ''it looks like'' when someone understands. Computer programs that can detect vulgar text, summarize a text, answer questions and “feel” emotions look as if they understand our language... (in fact, this is a ''behaviorist approach'').
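This is a minimal Python sketch of the ''word list + obfuscation rules'' approach from the inappropriate-posts section. Everything here is an illustrative assumption rather than a tested rule set:
 * the word list (`BANNED`)
 * the leetspeak table (`LEET`)
 * the minimum number of visible letters

Each run of non-letters inside a token is treated as a wildcard for zero or more hidden letters. As a result, `st*pid` matches `stupid` and `i.d.i.o.t` matches `idiot`. The thesaurus and the naughty-language patterns of the "even better" solution are not covered.

```python
import re

# Hypothetical word list; a real forum would load a curated, much larger list.
BANNED = {"idiot", "stupid", "moron"}

# Assumed leetspeak rule: digits that commonly stand in for letters.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})

# Punctuation that may legitimately surround a word; it is stripped first.
EDGE_PUNCT = ".,!?;:\"'()"


def token_pattern(token):
    """Regex for an obfuscated token: each run of non-letters becomes a
    wildcard that may hide zero or more letters."""
    parts = re.split(r"[^a-z]+", token)
    return re.compile("[a-z]*".join(re.escape(p) for p in parts))


def is_inappropriate(token):
    token = token.lower().translate(LEET).strip(EDGE_PUNCT)
    letters = re.sub(r"[^a-z]", "", token)
    if token == letters:
        # No obfuscation characters left: plain word-list lookup.
        return token in BANNED
    pattern = token_pattern(token)
    # Require enough visible letters so that heavily masked tokens such as
    # "a*" are not flagged.
    return any(
        len(letters) >= max(2, len(word) // 2) and pattern.fullmatch(word) is not None
        for word in BANNED
    )


def flag_post(text):
    """Return the whitespace-separated tokens that look like banned words."""
    return [tok for tok in text.split() if is_inappropriate(tok)]
```

The visible-letter threshold is the main trade-off. Lowering it would also catch heavier masking such as `s****d`, but it would flag more innocent tokens too.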
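The two ''common solutions'' for text summarization can be sketched as simple extractive baselines. The sketch assumes that paragraphs are separated by blank lines. It also assumes that a sentence ends with `.`, `!` or `?` followed by whitespace, which is a deliberately naive splitter. Keywords are matched only in exactly the given word form. This hints at one likely reason the approach works worse on highly inflected languages such as the Slavic ones.

```python
import re


def split_sentences(text):
    """Naive splitter: a sentence ends with ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lead_summary(text):
    """Baseline 1: the first sentence of every paragraph (blank-line separated)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [split_sentences(p)[0] for p in paragraphs]


def keyword_summary(text, keywords):
    """Baseline 2: every sentence containing a keyword in exactly that form."""
    wanted = {k.lower() for k in keywords}
    return [
        s for s in split_sentences(" ".join(text.split()))
        if wanted & set(re.findall(r"\w+", s.lower()))
    ]
```

Here is an example of the inflection problem. Take the keyword `kot` (Polish for "cat"). The sentence `Widzę kota.` ("I see a cat.") is not selected, because `kota` is an inflected form. Lemmatization or stemming would be needed to catch it.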
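The keyword-based chatbot from the question-answering section might look roughly like this. The product list, the FAQ entry and the `SEARCH:` reply format are hypothetical placeholders. The sketch only detects known keywords, then either returns a canned answer or turns the keywords into a search query. This illustrates the "no real dialogue" limitation: a follow-up such as "how much is it?" cannot be linked to the phone mentioned earlier without coreference resolution.

```python
import re

# Hypothetical catalogue and FAQ; the entries are placeholders, not real data.
PRODUCTS = ["Sony Xperia Z", "iPhone 5"]
FAQ = {"shipping": "<canned answer about shipping costs>"}


def answer(question):
    """Keyword-based reply: a canned FAQ answer, a search query, or a fallback."""
    q = question.lower()
    # Keyword detection: FAQ topics and known product names mentioned verbatim.
    topics = [t for t in FAQ if re.search(rf"\b{re.escape(t)}\b", q)]
    products = [p for p in PRODUCTS if p.lower() in q]
    if topics:
        return FAQ[topics[0]]
    if products:
        # "similar" and "cheaper" are ignored: the keywords simply become a search query.
        return "SEARCH: " + " | ".join(products)
    # No keyword found: pronouns such as "it" are not linked to earlier turns.
    return "Sorry, I did not understand the question."
```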