OpinionSentiment

Timestamp: Sep 18, 2015, 12:53:58 PM
Author: Zuzana Nevěřilová
Comment: --

private/NlpInPracticeCourse/OpinionSentiment

-                      v7
+                      v8
 == Practical Session ==
 Train a classifier on the Movie Review Data [[http://www.cs.cornell.edu/people/pabo/movie-review-data/]]. Measure precision, recall, and F1-score.
+=== Czech Sentiment Analysis ===
+Train a classifier on e-shop evaluations provided by customers and users of www.zbozi.cz. Measure precision, recall, and F1-score.
+In this workshop, we try several methods for opinion mining. First, we will train a classifier on real Czech data from Seznam.cz (thank you for the data, Seznam.cz guys!).
+Discuss the differences between training classifiers on Czech and English data.
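Since both tasks ask for precision, recall, and F1-score, the following minimal sketch shows how the three measures can be computed from gold and predicted labels. The function name `precision_recall_f1` and the toy labels are assumptions made for illustration; they are not part of `classify.py` or of the Seznam.cz data.

```python
def precision_recall_f1(gold, predicted, positive_label):
    """Return (precision, recall, F1) with one label treated as the positive class."""
    pairs = list(zip(gold, predicted))
    # True positives: the positive label was predicted and is correct.
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: the positive label was predicted but is wrong.
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: the positive label was missed.
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels, made up for illustration only.
gold = ["pos", "pos", "pos", "neg", "neg"]
predicted = ["pos", "pos", "neg", "pos", "neg"]
precision, recall, f1 = precision_recall_f1(gold, predicted, "pos")
print(precision, recall, f1)
```

Calling the function once per label (here `"pos"` and `"neg"`) gives per-class scores, which can then be compared between the Czech and the English classifier.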
+Requirements: Python, NLTK module
+Run `module add nltk` to make the NLTK module available.
+. Create `<YOUR_FILE>`, a text file named ia161-UCO-01.txt where UCO is your university ID.
+. Open `reviews.csv` and note the meaning of each column (Ack. to Karel Votruba, Karel.Votruba@firma.seznam.cz):
+. shop ID
+. positive comment
+. negative comment
+. price/delivery rating
+. communication rating
+. goods/content rating
+. package/return rating
+. complaints procedure rating
+. Run classify.py.
+. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark each with + or - depending on whether it makes sense. In the following example, the first feature makes sense while the second does not:
+  * peníze +
+  * s -
+. Open classify.py, read the code, uncomment the print statements, and observe the results.
+. Why is the classification not very good even though the accuracy is over 95 %? In particular, why is the last sentence not classified correctly? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
+. Think of improvements in feature extraction. Currently, the program takes only words as features. Add another feature, i.e. modify the function `document_features()`.
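As a starting point for the last step, here is a minimal sketch of a feature extractor that keeps the word features and adds word bigrams. The signature, the `word_features` parameter, and the feature names are assumptions; the real `document_features()` in `classify.py` may look different, so adapt the idea rather than the code.

```python
def document_features(document_words, word_features):
    """Map a tokenized review to a feature dict of the kind NLTK classifiers accept.

    Hypothetical signature: document_words is a list of tokens,
    word_features is the vocabulary chosen for the word features.
    """
    words = set(document_words)
    features = {}
    # Baseline: one boolean feature per vocabulary word.
    for word in word_features:
        features["contains({})".format(word)] = word in words
    # Added feature: each pair of adjacent words (bigram) seen in the review.
    for first, second in zip(document_words, document_words[1:]):
        features["bigram({} {})".format(first, second)] = True
    return features


# Made-up review text, for illustration only.
tokens = "zboží nepřišlo včas".split()
features = document_features(tokens, ["zboží", "peníze"])
for name in sorted(features):
    print(name, features[name])
```

Bigrams are only one option. Other candidates worth trying are the review length or the presence of negation words; whether any of them helps has to be checked by rerunning the classifier.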
+=== Sentiment analysis in English ===
+Second, we try the Stanford Core NLP Sentiment Pipeline.
+Requirements: Java 8, Python, several gigabytes of memory (the commands below request 5 GB with `-mx5g`)
+`module add jdk`
+This adds Java 8; you can check the version by typing `java -version`.
+. Download Stanford Core NLP from http://nlp.stanford.edu/software/corenlp.shtml and unzip it.
+. Try it: e.g. `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin` or `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file book-reviews.txt`
+. Into how many classes does Stanford Core NLP classify the sentences? Write the answer in `<YOUR_FILE>`. Also write an example of a wrong classification.
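To collect the pipeline output for several files, the command above can also be started from Python. The sketch below only wraps the command line given in this section; the helper names `build_command` and `run_pipeline` are hypothetical, and the output is returned as raw text because its exact layout is not described here.

```python
import subprocess

PIPELINE_CLASS = "edu.stanford.nlp.sentiment.SentimentPipeline"


def build_command(corenlp_dir, input_file=None, max_heap="5g"):
    """Build the argument list for the sentiment pipeline (hypothetical helper)."""
    # Java expands the trailing /* in the classpath itself, so no shell is needed.
    command = ["java", "-cp", corenlp_dir + "/*", "-mx" + max_heap, PIPELINE_CLASS]
    if input_file is None:
        command.append("-stdin")  # read sentences from standard input
    else:
        command.extend(["-file", input_file])  # read sentences from a file
    return command


def run_pipeline(corenlp_dir, input_file):
    """Run the pipeline on a file and return its standard output as text."""
    result = subprocess.run(
        build_command(corenlp_dir, input_file),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


print(build_command("stanford-corenlp-full-2015-04-20", "book-reviews.txt"))
```

`run_pipeline` needs Java and the unzipped Stanford Core NLP directory; `build_command` can be checked without either.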