32 | | 1. Open `reviews.csv`, observe column meanings (Ack. to Karel Votruba, Karel.Votruba@firma.seznam.cz): |
33 | | 1. shop ID |
34 | | 1. positive comment |
35 | | 1. negative comment |
36 | | 1. price/delivery rating |
37 | | 1. communication rating |
38 | | 1. goods/content rating |
39 | | 1. package/return rating |
40 | | 1. complaints procedure rating |
41 | | 1. Run classify.py. |
42 | | {{{ |
43 | | PYTHONIOENCODING=UTF-8 python classify.py |
44 | | }}} |
45 | | 1. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark each with + or - depending on whether it makes sense. Example (here the first feature makes sense while the second does not):
46 | | * peníze + |
47 | | * s - |
48 | | 1. Open classify.py, read the code, uncomment print statements, observe the results. |
49 | | 1. Why is the classification not very good even though the accuracy is over 95 %? In particular, why is the last sentence not classified correctly? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
50 | | 1. Think of improvements in feature extraction. Currently, the program takes only words as features. Add another feature, i.e. modify the function `document_features()`. |
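One possible direction for the last step is sketched below. It is only an illustration: the real signature of `document_features()` in classify.py may differ, and the names `WORD_FEATURES` and `NEGATIONS` and the word lists in them are assumptions made up for this example. The sketch keeps the word-presence features and adds one more feature that records when a vocabulary word directly follows a negation word.

```python
# Hypothetical vocabulary; classify.py presumably builds its own from reviews.csv.
WORD_FEATURES = ["peníze", "dobrý", "špatný"]
# Illustrative Czech negation words (an assumption, not taken from classify.py).
NEGATIONS = {"ne", "není", "nikdy"}


def document_features(tokens, word_features=WORD_FEATURES):
    """Return a feature dict for one tokenized review."""
    tokens = [t.lower() for t in tokens]
    token_set = set(tokens)
    features = {}
    # Baseline: one boolean feature per vocabulary word.
    for word in word_features:
        features["contains({})".format(word)] = word in token_set
    # Added feature: a vocabulary word immediately preceded by a negation.
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in NEGATIONS and cur in word_features:
            features["negated({})".format(cur)] = True
    return features


if __name__ == "__main__":
    print(document_features(["Není", "dobrý", "obchod"]))
```

A feature dict of this shape can be passed to a classifier that accepts dictionaries of feature names and values; whether the negation feature actually helps has to be checked on the data.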
51 | | |
52 | | === Sentiment analysis in English === |
53 | | |
54 | | Second, we try the Stanford Core NLP Sentiment Pipeline. |
55 | | |
56 | | Requirements: Java 8, Python, several gigabytes of memory
57 | | |
58 | | `module add jdk` |
59 | | This adds Java 8; you can check it by typing `java -version`.
60 | | |
61 | | 1. Download Stanford Core NLP from http://nlp.stanford.edu/software/corenlp.shtml and unzip it.
62 | | |
63 | | 2. Try it: e.g. |
64 | | {{{ |
65 | | java -cp "stanford-corenlp-full-2017-06-09/*" -mx5g \ |
66 | | edu.stanford.nlp.sentiment.SentimentPipeline -stdin |
67 | | }}} |
68 | | or |
69 | | {{{ |
70 | | java -cp "stanford-corenlp-full-2017-06-09/*" -mx5g \ |
71 | | edu.stanford.nlp.sentiment.SentimentPipeline \ |
72 | | -file book-reviews.txt |
73 | | }}} |
74 | | |
75 | | The file [raw-attachment:book-reviews.txt book-reviews.txt] is available. |
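If you prefer to call the pipeline from Python, the two commands above can be wrapped as in the following sketch. The helper names `build_sentiment_command` and `run_sentiment` are hypothetical, and the sketch assumes that `java` is on the `PATH` and that the unzipped directory name matches the one used above.

```python
import subprocess


def build_sentiment_command(corenlp_dir, input_file=None, memory="5g"):
    """Build the java command line used above (hypothetical helper)."""
    cmd = [
        "java",
        "-cp", corenlp_dir + "/*",  # java expands the classpath wildcard itself
        "-mx" + memory,
        "edu.stanford.nlp.sentiment.SentimentPipeline",
    ]
    if input_file is None:
        cmd.append("-stdin")            # read sentences from standard input
    else:
        cmd += ["-file", input_file]    # read sentences from a file
    return cmd


def run_sentiment(corenlp_dir, input_file):
    """Run the pipeline on a file and return its standard output as text."""
    result = subprocess.run(
        build_sentiment_command(corenlp_dir, input_file),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_sentiment("stanford-corenlp-full-2017-06-09", "book-reviews.txt"))
```

Passing the command as a list avoids shell quoting problems with the `"/*"` classpath entry.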
76 | | |
77 | | 3. Into how many classes does Stanford Core NLP classify the sentences? Write the answer in `<YOUR_FILE>`. Also write an example of a wrong classification.
| 33 | 1. Do the tasks marked in the Python notebook as *TASK X*. You don't have to do the optional tasks.