Changes between Version 26 and Version 27 of private/NlpInPracticeCourse/OpinionSentiment


Timestamp:
Sep 15, 2019, 7:39:36 PM (5 years ago)
Author:
Zuzana Nevěřilová
Comment:

--

  • private/NlpInPracticeCourse/OpinionSentiment

    v26 v27  
    1515 1.  Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, Thumbs up? Sentiment Classification using Machine Learning Techniques, Proceedings of EMNLP 2002. [[http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf]]
    1616 1. Liviu P. Dinu and Iulia Iuga. The Naive Bayes classifier in opinion mining: In search of the best feature set. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 7181 of Lecture Notes in Computer Science, pages 556–567. Springer Berlin Heidelberg, 2012.
     17 1. Zhang, L. J., Wang, S., and Liu, B. (2018). Deep learning for sentiment analysis: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 8.
    1718
    1819
     
    2324=== Czech Sentiment Analysis ===
    2425
    25 In this workshop, we try several methods for opinion mining. First, we will train a classifier on real Czech data from Seznam.cz (thank you for the data, Seznam.cz guys!).
     26In this workshop, we try two methods for opinion mining. First, we use an automatic translation of Liu's Opinion Lexicon. Next, we try to compensate for the drawbacks of the translated lexicon by computing word vectors in a simple way.
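The lexicon-based method can be sketched roughly as follows. This is only an illustration, not the notebook's code: the small `POSITIVE` and `NEGATIVE` sets are made-up stand-ins for a translated opinion lexicon, and `lexicon_score` and `lexicon_label` are hypothetical helper names.

```python
import re

# Toy stand-ins for a translated opinion lexicon (assumption: a real
# lexicon is much larger and would be loaded from files).
POSITIVE = {"dobrý", "rychlý", "spokojený"}
NEGATIVE = {"špatný", "pomalý", "poškozený"}


def lexicon_score(text):
    """Return (number of positive words) - (number of negative words)."""
    # \w is Unicode-aware in Python 3, so accented Czech letters are kept.
    tokens = re.findall(r"\w+", text.lower())
    positive_hits = sum(1 for token in tokens if token in POSITIVE)
    negative_hits = sum(1 for token in tokens if token in NEGATIVE)
    return positive_hits - negative_hits


def lexicon_label(text):
    """Map the score to a coarse label; a zero score counts as neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("Rychlý a dobrý obchod"))
print(lexicon_label("Pomalý obchod, poškozený balík"))
```

Exact matching of this kind misses inflected forms: `lexicon_score("dobrá")` is 0 here because only `dobrý` is listed. This is one kind of gap that a word-similarity step could help close.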
    2627
    27 Requirements: python, NLTK module
     28Requirements: python 3, jupyter notebook, modules NLTK, scipy, numpy, pandas, sklearn
    2829
    2930Files: [raw-attachment:reviews.csv reviews.csv], [raw-attachment:reviews_desamb.csv reviews_desamb.csv], [raw-attachment:reviews_czaccent.csv reviews_czaccent.csv], [raw-attachment:reviews_czaccent_desamb.csv reviews_czaccent_desamb.csv], [raw-attachment:classify.py classify.py]
    3031
    31321. Create `<YOUR_FILE>`, a text file named ia161-UCO-01.txt where UCO is your university ID.
    32 1. Open `reviews.csv`, observe column meanings (Ack. to Karel Votruba, Karel.Votruba@firma.seznam.cz):
    33   1. shop ID
    34   1. positive comment
    35   1. negative comment
    36   1. price/delivery rating
    37   1. communication rating
    38   1. goods/content rating
    39   1. package/return rating
    40   1. complaints procedure rating
    41 1. Run classify.py.
    42   {{{
    43 PYTHONIOENCODING=UTF-8 python classify.py
    44 }}}
    45 1. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark them with + or - depending on whether they make sense. In the following example, the first feature makes sense while the second does not:
    46   * peníze +
    47   * s -
    48 1. Open classify.py, read the code, uncomment print statements, observe the results.
    49 1. Why is the classification not very good even though the accuracy is over 95 %? In particular, why is the last sentence not correctly classified? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
    50 1. Think of improvements in feature extraction. Currently, the program takes only words as features. Add another feature, i.e. modify the function `document_features()`.
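One possible direction for the last task, as a hedged sketch: the actual `document_features()` in classify.py is not reproduced here, so the signature below (a token list plus a list of known words) is an assumption. It returns a feature dictionary of the kind NLTK classifiers accept, and adds word bigrams on top of the word-presence features.

```python
def document_features(document_words, word_features):
    """Word-presence features plus word-bigram features.

    document_words: list of tokens of one review (assumed input format)
    word_features:  list of vocabulary words to test for (assumed)
    """
    words = [word.lower() for word in document_words]
    word_set = set(words)
    features = {}
    # Baseline: one boolean feature per vocabulary word.
    for word in word_features:
        features["contains({})".format(word)] = word in word_set
    # Extra feature: adjacent word pairs, so that e.g. "nebylo dobré"
    # can be told apart from "dobré" alone.
    for first, second in zip(words, words[1:]):
        features["bigram({} {})".format(first, second)] = True
    return features


example = document_features(["Zboží", "nebylo", "dobré"], ["dobré", "peníze"])
print(sorted(example))
```

Bigrams enlarge the feature space considerably, so it may be worth keeping only the frequent ones.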
    51 
    52 === Sentiment analysis in English ===
    53 
    54 Second, we try the Stanford Core NLP Sentiment Pipeline.
    55 
    56 Requirements: Java 8, python, gigabytes of memory
    57 
    58 `module add jdk`
    59 This adds Java 8; you can check it by typing `java -version`.
    60 
    61 1. Download Stanford Core NLP from http://nlp.stanford.edu/software/corenlp.shtml, unzip it.
    62 
    63 2. Try it: e.g.
    64 {{{
    65 java -cp "stanford-corenlp-full-2017-06-09/*" -mx5g \
    66     edu.stanford.nlp.sentiment.SentimentPipeline -stdin
    67 }}}
    68  or
    69 {{{
    70 java -cp "stanford-corenlp-full-2017-06-09/*" -mx5g \
    71     edu.stanford.nlp.sentiment.SentimentPipeline \
    72     -file book-reviews.txt
    73 }}}
    74 
    75 The file [raw-attachment:book-reviews.txt book-reviews.txt] is available.
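The same invocations can be driven from Python. The sketch below only assembles the command line shown above and passes it to `subprocess`. The helper names `build_sentiment_command` and `run_sentiment` are hypothetical, and the directory name is taken from the example commands. The output is returned as raw text, since its exact format is not assumed here.

```python
import subprocess


def build_sentiment_command(corenlp_dir, input_file=None, memory="5g"):
    """Build the java command for SentimentPipeline as an argument list."""
    command = [
        "java",
        "-cp", corenlp_dir + "/*",   # java expands the classpath wildcard itself
        "-mx" + memory,
        "edu.stanford.nlp.sentiment.SentimentPipeline",
    ]
    if input_file is None:
        command.append("-stdin")             # read sentences from standard input
    else:
        command.extend(["-file", input_file])
    return command


def run_sentiment(corenlp_dir, input_file):
    """Run the pipeline on a file and return its standard output as text."""
    result = subprocess.run(
        build_sentiment_command(corenlp_dir, input_file),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


print(" ".join(build_sentiment_command("stanford-corenlp-full-2017-06-09",
                                       "book-reviews.txt")))
```

`run_sentiment` needs Java and the unpacked Stanford Core NLP directory to be present; `build_sentiment_command` can be inspected without them.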
    76 
    77 3. Into how many classes does Stanford Core NLP classify the sentences? Write the answer in `<YOUR_FILE>`. Write an example of a wrong classification.
     331. Do the tasks marked in the python notebook as *TASK X*. You don't have to do the optional tasks.
    7834
    7935=== Upload `<YOUR_FILE>` ===