Changes between Version 7 and Version 8 of private/NlpInPracticeCourse/OpinionSentiment

Sep 18, 2015, 12:53:58 PM (7 years ago)
Zuzana Nevěřilová



  • private/NlpInPracticeCourse/OpinionSentiment

    v7 v8  
    2121== Practical Session ==
    23 Train a classifier on the Movie Review Data. Measure precision, recall, and F1-score.
     23=== Czech Sentiment Analysis ===
    25 Train classifier on e-shop evaluation provided by customers and users of Measure precision, recall, and F1-score.
     25In this workshop, we try several methods for opinion mining. First, we will train a classifier on real Czech data from (thank you for the data, guys!).
    27 Discuss the differences between training classifiers on Czech and English data.
     27Requirements: Python, NLTK module
     29Run `module add nltk` to make the NLTK module available.
     311. Create `<YOUR_FILE>`, a text file named ia161-UCO-01.txt where UCO is your university ID.
     321. Open `reviews.csv`, observe column meanings (Ack. to Karel Votruba,
     33  1. shop ID
     34  1. positive comment
     35  1. negative comment
     36  1. price/delivery rating
     37  1. communication rating
     38  1. goods/content rating
     39  1. package/return rating
     40  1. complaints procedure rating
     411. Run
     421. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark each with + or - depending on whether it makes sense. Example (the first feature makes sense while the second does not):
     43  * peníze +
     44  * s -
     451. Open, read the code, uncomment print statements, observe the results.
     461. Why is the classification not very good even though the accuracy is over 95%? In particular, why is the last sentence not classified correctly? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
     471. Think of improvements in feature extraction. Currently, the program takes only words as features. Add another feature, i.e. modify the function `document_features()`.
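One possible direction for the last step, as a minimal sketch: the course's `document_features()` is not shown here, so the signature below, the `word_features` parameter, and the `NEGATIONS` list are assumptions made for illustration. Besides word-presence features, it adds adjacent word pairs (bigrams) and a negation flag, and returns a plain dict of the kind NLTK classifiers accept as a feature set.

```python
# Hypothetical sketch: not the course's actual document_features().
# Assumed: each document arrives as a list of lowercased tokens.

NEGATIONS = {"ne", "není", "nikdy", "bez"}  # illustrative Czech negation cues


def document_features(tokens, word_features):
    """Return a feature dict: word presence, bigram presence, negation flag."""
    token_set = set(tokens)
    features = {}
    # Word-presence features, as the exercise describes the current program.
    for word in word_features:
        features["contains({})".format(word)] = word in token_set
    # Added feature 1: adjacent word pairs keep some local context.
    for first, second in zip(tokens, tokens[1:]):
        features["bigram({} {})".format(first, second)] = True
    # Added feature 2: does the text contain any negation cue?
    features["has_negation"] = bool(token_set & NEGATIONS)
    return features


example = document_features(["zboží", "není", "dobré"], ["dobré", "peníze"])
```

A dict like `example` can be paired with a label and passed to `nltk.NaiveBayesClassifier.train()` in place of the word-only features. Whether bigrams or the negation flag help on this data is something to measure rather than assume.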
     49=== Sentiment analysis in English ===
     51Second, we try the Stanford Core NLP Sentiment Pipeline.
     53Requirements: Java 8, Python, several gigabytes of memory
     55`module add jdk`
     56This adds Java 8; you can check it by typing `java -version`.
     581. Download Stanford Core NLP from, unzip it.
     602. Try it: e.g. `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin` or `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file book-reviews.txt`
     623. Into how many classes does Stanford Core NLP classify sentences? Write the number in `<YOUR_FILE>`, along with an example of a wrong classification.
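To answer the last question without counting by hand, the pipeline output can be tallied with a short script. This is a sketch that assumes the default output format, in which each sentence is echoed on one line and its label follows on an indented line. The `count_labels` helper and the sample text are made up for illustration.

```python
from collections import Counter


def count_labels(output_text):
    """Tally label lines in SentimentPipeline output.

    Assumption: a label is any non-empty line that starts with whitespace;
    sentence lines start in the first column.
    """
    labels = Counter()
    for line in output_text.splitlines():
        if line.strip() and line[0].isspace():
            labels[line.strip()] += 1
    return labels


# Made-up sample in the assumed format, not real pipeline output.
sample = (
    "I loved this book.\n"
    "  Positive\n"
    "The plot is dull.\n"
    "  Negative\n"
    "Great.\n"
    "  Positive\n"
)
counts = count_labels(sample)
```

Redirecting the output of the `-file book-reviews.txt` command to a file and passing its contents to `count_labels` shows which labels occur. Labels that never occur in your data will not appear, so the tally gives a lower bound on the number of classes.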