Changes between Version 7 and Version 8 of private/NlpInPracticeCourse/OpinionSentiment


Timestamp: Sep 18, 2015, 12:53:58 PM
Author: Zuzana Nevěřilová

  • private/NlpInPracticeCourse/OpinionSentiment

v7 → v8

== Practical Session ==

- Train a classifier on the Movie Review Data [[http://www.cs.cornell.edu/people/pabo/movie-review-data/]]. Measure precision, recall, and F1-score.
+ === Czech Sentiment Analysis ===

- Train a classifier on the e-shop evaluations provided by customers and users of www.zbozi.cz. Measure precision, recall, and F1-score.
+ In this workshop, we try several methods for opinion mining. First, we will train a classifier on real Czech data from Seznam.cz (thank you for the data, Seznam.cz guys!).

- Discuss the differences between training classifiers on Czech and English data.
+ Requirements: Python, the NLTK module
+
+ Run `module add nltk` to make the NLTK module available.
+ 1. Create `<YOUR_FILE>`, a text file named ia161-UCO-01.txt, where UCO is your university ID.
+ 1. Open `reviews.csv` and observe the column meanings (ack. to Karel Votruba, Karel.Votruba@firma.seznam.cz):
+   1. shop ID
+   1. positive comment
+   1. negative comment
+   1. price/delivery rating
+   1. communication rating
+   1. goods/content rating
+   1. package/return rating
+   1. complaints procedure rating
+ 1. Run classify.py.
+ 1. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark them with + or - depending on whether they make sense. Example (here the first feature makes sense while the second does not):
+   * peníze +
+   * s -
+ 1. Open classify.py, read the code, uncomment the print statements, and observe the results.
+ 1. Why is the classification not very good even though the accuracy is over 95 %? In particular, why is the last sentence not classified correctly? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
+ 1. Think of improvements in feature extraction. Currently, the program takes only words as features. Add another feature, i.e. modify the function `document_features()`.
     48
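As a hint for the steps above, here is a minimal sketch of the kind of NLTK pipeline that classify.py implements. This is an assumption, not the actual script: the file name `reviews.csv`, the column order, and the function name `document_features()` come from the exercise text, while `load_labeled`, `train`, and the `length>10` feature are hypothetical illustrations.

```python
import csv

import nltk


def document_features(words):
    # Baseline used by the exercise: every word is a boolean feature.
    features = {"contains(%s)" % w: True for w in words}
    # Hypothetical extra feature of the kind the last step asks for:
    features["length>10"] = len(words) > 10
    return features


def load_labeled(path="reviews.csv"):
    # Columns (see the list above): shop ID, positive comment,
    # negative comment, then the ratings.
    labeled = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if row[1].strip():
                labeled.append((row[1].split(), "pos"))
            if row[2].strip():
                labeled.append((row[2].split(), "neg"))
    return labeled


def train(labeled, test_fraction=0.1):
    featuresets = [(document_features(w), label) for w, label in labeled]
    cut = int(len(featuresets) * test_fraction)
    test_set, train_set = featuresets[:cut], featuresets[cut:]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print("accuracy:", nltk.classify.accuracy(classifier, test_set))
    classifier.show_most_informative_features(20)
    return classifier
```

Calling `train(load_labeled())` prints the accuracy and the most informative features, which is the output the exercise asks you to inspect and copy into `<YOUR_FILE>`.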
+ === Sentiment analysis in English ===
+
+ Second, we try the Stanford Core NLP sentiment pipeline.
+
+ Requirements: Java 8, Python, gigabytes of memory
+
+ Run `module add jdk` to add Java 8; you can check it by typing `java -version`.
+
+ 1. Download Stanford Core NLP from http://nlp.stanford.edu/software/corenlp.shtml and unzip it.
+
+ 2. Try it, e.g. `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin` or `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file book-reviews.txt`
     61
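The command above can also be driven from a script. Here is a minimal sketch, assuming the classpath of the unzipped 2015-04-20 distribution shown in step 2; the helper names `sentiment_command` and `corenlp_sentiment` are hypothetical, not part of Core NLP.

```python
import subprocess


def sentiment_command(classpath="stanford-corenlp-full-2015-04-20/*",
                      heap="-mx5g"):
    # Argv for the -stdin variant of the command shown in step 2.
    return ["java", "-cp", classpath, heap,
            "edu.stanford.nlp.sentiment.SentimentPipeline", "-stdin"]


def corenlp_sentiment(text):
    # Requires Java 8 and the unzipped Core NLP distribution; returns the
    # pipeline's raw standard output for you to inspect.
    proc = subprocess.run(sentiment_command(), input=text,
                          capture_output=True, text=True)
    return proc.stdout
```

For batch processing, swap `-stdin` for `-file book-reviews.txt` as in the second command of step 2.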
+ 3. Into how many classes does Stanford Core NLP classify the sentences? Write the number in `<YOUR_FILE>`, along with an example of a wrong classification.