Discuss the differences between training classifiers on Czech and English data.
Requirements: Python, NLTK module

Run `module add nltk` to make the NLTK module available.

1. Create `<YOUR_FILE>`, a text file named `ia161-UCO-01.txt`, where UCO is your university ID.
1. Open `reviews.csv` and observe the meaning of its columns (acknowledgement to Karel Votruba, Karel.Votruba@firma.seznam.cz):
    1. shop ID
    1. positive comment
    1. negative comment
    1. price/delivery rating
    1. communication rating
    1. goods/content rating
    1. package/return rating
    1. complaints procedure rating
1. Run `classify.py`.
1. What are the most informative features? Do they make sense? Write the most informative features to `<YOUR_FILE>` and mark each with + or - depending on whether it makes sense. In the following example, the first feature (peníze, "money") makes sense, while the second (the preposition s, "with") does not:
    * peníze +
    * s -
1. Open `classify.py`, read the code, uncomment the print statements, and observe the results.
1. Why is the classification not very good even though the accuracy is over 95 %? In particular, why is the last sentence not classified correctly? Write your answer at the end of `<YOUR_FILE>`. Feel free to discuss it with the group.
1. Think of improvements in feature extraction. Currently, the program uses only words as features. Add another feature by modifying the function `document_features()`.
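
One possible direction for the last step, as a minimal sketch only: it assumes `document_features()` receives a list of tokens and returns an NLTK feature dictionary (the real signature in `classify.py` may differ). Besides word presence, it adds word-bigram features, so that a pair such as "není dobrá" ("is not good") becomes a feature of its own, separate from "dobrá" alone. The `contains(...)`/`bigram(...)` feature names and the training sentences are hypothetical toy data, not taken from `reviews.csv`; only `nltk.NaiveBayesClassifier` and its `train()`, `classify()` and `show_most_informative_features()` methods come from NLTK.

```python
import nltk


def document_features(words):
    """Hypothetical extractor: word presence plus word-bigram presence."""
    words = [w.lower() for w in words]
    features = {}
    # Baseline: every word in the document is a feature.
    for w in words:
        features["contains({})".format(w)] = True
    # Extra feature: every pair of adjacent words, e.g. "není dobrá".
    for w1, w2 in zip(words, words[1:]):
        features["bigram({} {})".format(w1, w2)] = True
    return features


# Hypothetical toy data (not from reviews.csv): (tokens, label) pairs.
train = [
    ("rychlé dodání dobrá cena".split(), "pos"),
    ("dobrá komunikace rychlé dodání".split(), "pos"),
    ("pomalé dodání špatná komunikace".split(), "neg"),
    ("zboží nedorazilo špatná komunikace".split(), "neg"),
]
train_set = [(document_features(words), label) for words, label in train]
classifier = nltk.NaiveBayesClassifier.train(train_set)

if __name__ == "__main__":
    print(classifier.classify(document_features("dobrá cena".split())))
    classifier.show_most_informative_features(5)
```

Other candidate features (for example a flag for words starting with the Czech negation prefix "ne", or the review length) can be added to the same dictionary in the same way; comparing `show_most_informative_features()` before and after the change shows whether the new feature is actually used.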

=== Sentiment analysis in English ===

Second, we try the Stanford CoreNLP Sentiment Pipeline.

Requirements: Java 8, Python, several gigabytes of memory (the commands below request a 5 GB heap with `-mx5g`)

Run `module add jdk` to add Java 8; you can check the version by typing `java -version`.

1. Download Stanford CoreNLP from http://nlp.stanford.edu/software/corenlp.shtml and unzip it.

2. Try it, e.g. interactively with `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -stdin`, or on a file with `java -cp "stanford-corenlp-full-2015-04-20/*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file book-reviews.txt`. Adjust the directory name if you downloaded a different version.

3. Into how many classes does Stanford CoreNLP classify the sentences? Write the answer in `<YOUR_FILE>`, together with an example of a wrong classification.
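
When looking for wrong classifications, it can help to drive the pipeline from Python and list each sentence next to its predicted label. This is a minimal sketch, not part of the assignment: `run_pipeline()` and `pair_sentences()` are hypothetical helpers, the command line is the one given above, and the parser assumes that the pipeline prints each sentence on its own line followed by its label on an indented line. Check this assumption against your actual output and adjust the parser if it differs.

```python
import subprocess
from collections import Counter

# Classpath and heap flag copied from the command above; the wildcard is
# expanded by the JVM itself, so no shell is needed.
CORENLP_CP = "stanford-corenlp-full-2015-04-20/*"


def run_pipeline(path):
    """Run SentimentPipeline on a text file and return its standard output."""
    cmd = ["java", "-cp", CORENLP_CP, "-mx5g",
           "edu.stanford.nlp.sentiment.SentimentPipeline", "-file", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def pair_sentences(output):
    """Return (sentence, label) pairs.

    ASSUMPTION: a sentence line starts without indentation and is followed
    by a line holding its label, indented with whitespace.
    """
    pairs = []
    sentence = None
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and sentence is not None:
            pairs.append((sentence, line.strip()))
            sentence = None
        else:
            sentence = line.strip()
    return pairs


if __name__ == "__main__":
    pairs = pair_sentences(run_pipeline("book-reviews.txt"))
    for sentence, label in pairs:
        print(label, "\t", sentence)
    # How often each label was predicted.
    print(Counter(label for _, label in pairs))
```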