Changes between Initial Version and Version 1 of en/AdvancedNlpCourse2018/OpinionSentiment


Timestamp:
Sep 12, 2019, 11:11:15 AM
Author:
Ales Horak
Comment:

copied from private/AdvancedNlpCourse/OpinionSentiment

  • en/AdvancedNlpCourse2018/OpinionSentiment

    v1 v1  
     1= Opinion mining, sentiment analysis =
     2
     3[[https://is.muni.cz/auth/predmet/fi/ia161|IA161]] [[en/AdvancedNlpCourse|Advanced NLP Course]], Course Guarantee: Aleš Horák
     4
     5Prepared by: Zuzana Nevěřilová
     6
     7== State of the Art ==
     8
     9Sentiment analysis can be seen as a text categorization task (i.e. is the writer's opinion on a discussed topic X or Y?). It consists of detecting the topic (which can be easy in focused reviews) and detecting the sentiment (which is generally difficult). Opinions are sometimes expressed in a very subtle manner (e.g. the sentence ''How could anyone sit through this movie?'' contains no negative word) ![3]. Sentiments are usually classified simply by their polarity (positive, negative), but finer-grained classes (e.g. strongly negative) can also be recognized. Recognized opinions can also be summarized (e.g. how many people like this new iPhone design?).
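
To illustrate why subtle opinions are hard to detect, here is a minimal sketch of a word-counting polarity baseline. The tiny lexicon and the function name `lexicon_polarity` are hypothetical and serve only as an illustration; this is not a method used in the course materials.

```python
import re

# Hypothetical toy lexicon, for illustration only; real lexicons are far larger.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "boring", "terrible", "hate"}


def lexicon_polarity(text):
    """Return (number of positive words) - (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


print(lexicon_polarity("A terrible, boring movie."))                 # -2
print(lexicon_polarity("How could anyone sit through this movie?"))  # 0
```

The second sentence is clearly negative to a human reader, yet it scores 0 because none of its words appear in the lexicon.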
     10
     11=== References ===
     12
     13 1. Bing Liu. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. 2012, 5(1): 1-167. DOI: 10.2200/s00416ed1v01y201204hlt016. Draft version available at [[http://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf]]
     14 1. Bing Liu. Sentiment Analysis Tutorial. AAAI-2011, August 8, 2011. Slides available at [[http://www.cs.uic.edu/~liub/FBS/Sentiment-Analysis-tutorial-AAAI-2011.pdf]]
     15 1. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of EMNLP 2002. [[http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf]]
     16 1. Liviu P. Dinu and Iulia Iuga. The Naive Bayes classifier in opinion mining: In search of the best feature set. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 7181 of Lecture Notes in Computer Science, pages 556–567. Springer Berlin Heidelberg, 2012.
     17
     18
     19Bing Liu's References: http://www.cs.uic.edu/~liub/FBS/AAAI-2011-tutorial-references.pdf
     20
     21== Practical Session ==
     22
     23=== Czech Sentiment Analysis ===
     24
     25In this workshop, we try several methods for opinion mining. First, we will train a classifier on real Czech data from Seznam.cz (thank you for the data, Seznam.cz guys!).
     26
     27Requirements: python, NLTK module
     28
     29Files: [raw-attachment:reviews.csv reviews.csv], [raw-attachment:reviews_desamb.csv reviews_desamb.csv], [raw-attachment:reviews_czaccent.csv reviews_czaccent.csv], [raw-attachment:reviews_czaccent_desamb.csv reviews_czaccent_desamb.csv], [raw-attachment:classify.py classify.py]
     30
     311. Create `<YOUR_FILE>`, a text file named ia161-UCO-01.txt where UCO is your university ID.
     321. Open `reviews.csv` and observe the meaning of its columns (ack. to Karel Votruba, Karel.Votruba@firma.seznam.cz):
     33  1. shop ID
     34  1. positive comment
     35  1. negative comment
     36  1. price/delivery rating
     37  1. communication rating
     38  1. goods/content rating
     39  1. package/return rating
     40  1. complaints procedure rating
     411. Run classify.py.
     42  {{{
     43PYTHONIOENCODING=UTF-8 python classify.py
     44}}}
     451. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark them with + or - depending on whether they make sense. In the following example, the first feature makes sense while the second does not:
     46  * peníze +
     47  * s -
     481. Open classify.py, read the code, uncomment print statements, observe the results.
     491. Why is the classification not very good even though the accuracy is over 95 %? In particular, why is the last sentence not classified correctly? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
     501. Think of improvements in feature extraction. Currently, the program takes only words as features. Add another feature, i.e. modify the function `document_features()`.
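
As a starting point for the last step, the following is a minimal sketch of a feature extractor that adds word-pair (bigram) features on top of word-presence features. The signature of `document_features()` and the parameter `word_features` are assumptions; the actual function in `classify.py` may look different, so treat this as one possible direction rather than the expected solution.

```python
def document_features(document_words, word_features):
    """Map a tokenized review to a feature dictionary (NLTK-style).

    document_words: list of tokens of one review (assumed input format).
    word_features:  vocabulary of words to test for presence (assumed).
    """
    lowered = [word.lower() for word in document_words]
    present = set(lowered)

    # Word-presence features: one boolean per vocabulary word.
    features = {"contains(%s)" % w: (w in present) for w in word_features}

    # Added feature: adjacent word pairs, so that a negated phrase such as
    # "není dobrý" is distinguished from the single word "dobrý".
    for first, second in zip(lowered, lowered[1:]):
        features["bigram(%s %s)" % (first, second)] = True

    return features


print(document_features(["Není", "dobrý", "obchod"], ["dobrý", "peníze"]))
```

A dictionary of this shape can be paired with a label and passed to `nltk.NaiveBayesClassifier.train()`, which accepts a list of `(features, label)` tuples.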
     51
     52=== Sentiment analysis in English ===
     53
     54Second, we try the Stanford Core NLP Sentiment Pipeline.
     55
     56Requirements: Java 8, python, gigabytes of memory
     57
     58`module add jdk`
     59This adds Java 8; you can check it by typing `java -version`.
     60
     611. Download Stanford Core NLP from http://nlp.stanford.edu/software/corenlp.shtml, unzip it.
     62
     632. Try it: e.g.
     64{{{
     65java -cp "stanford-corenlp-full-2017-06-09/*" -mx5g \
     66    edu.stanford.nlp.sentiment.SentimentPipeline -stdin
     67}}}
     68 or
     69{{{
     70java -cp "stanford-corenlp-full-2017-06-09/*" -mx5g \
     71    edu.stanford.nlp.sentiment.SentimentPipeline \
     72    -file book-reviews.txt
     73}}}
     74
     75The file [raw-attachment:book-reviews.txt book-reviews.txt] is available.
     76
     773. Into how many classes does Stanford Core NLP classify the sentences? Write the answer in `<YOUR_FILE>`, together with an example of a wrong classification.
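
To get an overview of the labels the pipeline assigns, its output can be tallied with a short script. The sketch below assumes that each input sentence is printed on its own line, followed by an indented line holding its label; check this against the actual output before relying on it. The function name `tally_labels` and the sample text are hypothetical.

```python
from collections import Counter


def tally_labels(pipeline_output):
    """Count sentiment labels in the pipeline output.

    Assumption: every label is printed on its own line and is indented,
    while the sentences themselves start at the beginning of the line.
    """
    counts = Counter()
    for line in pipeline_output.splitlines():
        if line.startswith(" ") and line.strip():
            counts[line.strip()] += 1
    return counts


# Made-up sample in the assumed format, for illustration only.
sample = (
    "I loved this book.\n"
    "  Positive\n"
    "The plot drags on forever.\n"
    "  Negative\n"
    "A joy to read.\n"
    "  Positive\n"
)
print(tally_labels(sample))
```

To use it on real data, redirect the output of the `java` command to a file and pass the file contents to `tally_labels()`.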
     78
     79=== Upload `<YOUR_FILE>` ===
     80
     81Do not forget to upload your resulting file to the [wiki:en/AdvancedNlpCourse homework vault (odevzdávárna)].